Looking at the Mycroft API, it seems to assume that the first STT match is always correct. For example, get_response returns a single string.
From using Google’s STT API directly, I know that it returns a list of possible matches, for example [“martin”, “marty”, “martie”, “my tin”, “modern”]. So we might always get “Mandy” when the user’s name is “Mendy”, simply because “Mendy” is less common: the STT can recognise “Mendy”, it just isn’t its first guess.
Obviously, I can always do “if response in ["yes", "yea", "ya", "yup"]:”, but sometimes I will need open-ended answers.
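If the alternatives were exposed, even the closed-answer check above would get more robust: instead of testing only the first guess, a skill could walk down the ranked list and take the first one it recognises. A minimal sketch, assuming the alternatives arrive as a best-first list of strings; pick_alternative is a hypothetical helper, not part of the Mycroft API.

```python
from typing import Iterable, List, Optional


def pick_alternative(alternatives: List[str], vocabulary: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked STT alternative found in a known vocabulary.

    Assumes `alternatives` is ordered best-first (hypothetical input format).
    Comparison is case-insensitive; returns None if nothing matches.
    """
    known = {word.lower() for word in vocabulary}
    for candidate in alternatives:
        if candidate.lower() in known:
            return candidate
    return None


# The "Mandy"/"Mendy" case: the first guess is wrong, a lower-ranked one matches.
print(pick_alternative(["Mandy", "Mendy", "Mindy"], ["Mendy", "Martin"]))  # Mendy
# The yes/no case: "yeah" is not in the list, but the second guess "yes" is.
print(pick_alternative(["yeah", "yes"], ["yes", "yea", "ya", "yup"]))  # yes
```

For open answers such as names, the vocabulary could be something the skill already knows (e.g. the user's contacts), which is exactly where a lower-ranked guess helps.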
So is there a way to get the list of guesses, and should we make it more accessible in the API? Thanks.
Currently the Mycroft STT backends don’t provide alternatives, even though the code is partially prepared to handle them. So I’m afraid I don’t think this is possible right now.
Historically, this is because the original STT implementation only supported one alternative. Now at least two backends support multiple alternatives (google_cloud and Kaldi were the ones I found in a quick check), so this should be updated.
Looking at it some more, speechrecognition’s google_cloud doesn’t allow multiple alternatives either…
Hi forslund, I’m not sure whether that has changed recently, but I vividly recall that when I tried to build my own voice assistant a year ago, I sent an audio file to Google’s API and it returned multiple (4 or 5) possible matches.
I hope I understand your reply correctly.
It can return multiple alternatives, but doesn’t by default (see https://cloud.google.com/speech-to-text/docs/basics); I’m not sure whether this is new.
The maxAlternatives parameter defaults to 1, and the speechrecognition Python module doesn’t expose it. I’ve just tried hacking the module to add the parameter myself, and with that I can sometimes get it to offer more alternatives.
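For reference, this is roughly what asking Google Cloud Speech-to-Text for alternatives looks like at the REST level, without going through speechrecognition. A sketch only: it builds the JSON body for the v1 speech:recognize method with maxAlternatives set explicitly and flattens a response into (transcript, confidence) pairs; it does not send anything. It assumes 16-bit linear PCM audio, the helper names are hypothetical, and the sample response is hand-written for illustration, not real output. Confidence is treated as optional because the service may only report it for the top alternative.

```python
import base64
from typing import Dict, List, Optional, Tuple


def build_recognize_request(audio_bytes: bytes, max_alternatives: int = 5,
                            language_code: str = "en-US",
                            sample_rate_hertz: int = 16000) -> Dict:
    """Build a JSON body for the v1 speech:recognize method (assumes LINEAR16 audio)."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            # Defaults to 1 on the service side when omitted, so set it explicitly.
            "maxAlternatives": max_alternatives,
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def parse_alternatives(response: Dict) -> List[Tuple[str, Optional[float]]]:
    """Flatten a recognize response into (transcript, confidence) pairs, best first.

    Only the first result is used (a short spoken answer usually yields one).
    """
    results = response.get("results", [])
    if not results:
        return []
    return [(alt.get("transcript", ""), alt.get("confidence"))
            for alt in results[0].get("alternatives", [])]


# Hand-written sample response, for illustration only.
sample = {"results": [{"alternatives": [
    {"transcript": "Mandy", "confidence": 0.82},
    {"transcript": "Mendy"},
    {"transcript": "Mindy"},
]}]}
print(parse_alternatives(sample))  # [('Mandy', 0.82), ('Mendy', None), ('Mindy', None)]
```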
I see. Thank you for that.
We could make a PR against the speechrecognition repo adding the parameter…
Yes, I think we can at least add the parameter to get_response and have it return a list instead of a str.
However, adopting this throughout the project may require wider consensus from the community, since it touches many areas, such as how an intent is triggered when all the alternative matches are considered.
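One possible shape for the intent side, purely as a sketch for discussion: run the intent parser over every STT alternative and keep the best-scoring match, with a small penalty per STT rank so the recogniser's own ordering still wins ties. Everything here is hypothetical: keyword_parser is a toy stand-in, not Adapt’s API, and best_intent and rank_penalty are illustrative names. Real parsers such as Adapt have their own interfaces and confidence scales.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy intent parser type: utterance -> (intent_name, confidence) or None.
IntentParser = Callable[[str], Optional[Tuple[str, float]]]


def keyword_parser(intents: Dict[str, List[str]]) -> IntentParser:
    """Toy parser: confidence = fraction of an intent's keywords in the utterance."""
    def parse(utterance: str) -> Optional[Tuple[str, float]]:
        words = set(utterance.lower().split())
        best = None
        for name, keywords in intents.items():
            score = sum(k in words for k in keywords) / len(keywords)
            if score > 0 and (best is None or score > best[1]):
                best = (name, score)
        return best
    return parse


def best_intent(alternatives: List[str], parse: IntentParser,
                rank_penalty: float = 0.05) -> Optional[Tuple[str, str, float]]:
    """Try every alternative; return (utterance, intent, score) for the best one."""
    best = None
    for rank, utterance in enumerate(alternatives):
        match = parse(utterance)
        if match is None:
            continue
        intent, confidence = match
        # Lower-ranked STT guesses pay a small penalty per rank.
        score = confidence - rank * rank_penalty
        if best is None or score > best[2]:
            best = (utterance, intent, score)
    return best


parse = keyword_parser({"weather": ["weather", "today"], "timer": ["set", "timer"]})
result = best_intent(["what's the whether today", "what's the weather today"], parse)
print(result[1])  # weather
```

The trade-off is cost and risk: every utterance is parsed N times, and a too-small penalty could let an unlikely transcription trigger a confident but wrong intent, which is part of why this needs wider discussion.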
We don’t actually have a “change proposal” process, but this suggested enhancement would be a good way to develop and validate a “change proposal” process.
Hi Forslund, because I am new to this, could you do the PR this time, so I can review it and learn the submission process for Mycroft? Thanks.
Sure. I suppose other projects have a specific section in their forums for discussing core change proposals. If we don’t have one yet, we can certainly use this thread for that purpose.
I am a bit surprised that the community is not more concerned about this alternative-matching question. There are many occasions where the first result is not what was expected, due to STT inaccuracy or the user’s pronunciation. At the same time, the first 5 matches will probably cover 90% of the cases, so this would be a quick win for accuracy.
I understand that Adapt returns possible matches with confidence levels. Is that why this is not a major problem for Mycroft, because it also considers the alternatives from STT?