I use a local install of DeepSpeech and, while usable, the recognition is not perfect. So it can happen that I have to repeat my favourite phrase “open radio swiss jazz from favourites” once or twice after Mycroft tells me “I’m sorry but I did not understand”.
But alas, not without first saying “hey mycroft” again. Which really makes me sound more like an AI than him…
A more natural dialog would be possible if Mycroft allowed another recognition attempt right after, without the need for the wake word. I know that this is generally possible within a skill using the Conversational Context. But there doesn’t seem to be an option to make the general error dialog behave this way, is there? I’m happy to put this in a feature request but wanted to confirm that this isn’t already possible.
This option does not exist.
While your current setup is causing you trouble, the issue seems to be more on the DeepSpeech (DS) side of things than on the Mycroft side. Have you considered fine-tuning a DS model for yourself to try and improve recognition? You’d probably have better luck in the short (and long) term fixing that issue.
Thought so. Thanks for confirming that.
In hindsight it was a bad idea to mention my setup at all. Please consider the flow of dialog without thinking about my particular situation (which definitely could use improvements on the STT side of things).
There will always be interactions where Mycroft won’t understand you and you want to try again. Fewer as STT keeps getting better, but still. Mycroft himself sometimes asks you to “rephrase”. Don’t you agree that it would make for a more natural dialog if you could just do that without prepending it with another “hey mycroft”?
“Natural” interactions may come someday, but for now you’d have to write a fallback skill to handle this sort of thing, and that would present a whole new set of problems. Also, don’t neglect the STT piece; working on it would certainly improve your experience.
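To make the idea concrete, here is a rough sketch of the dialog flow such a fallback would need: when an utterance isn't understood, listen again right away instead of waiting for the wake word. This is plain Python, not Mycroft code. `converse`, `understand` and `listen_again` are made-up names for illustration; in a real fallback skill the listening part would presumably go through something like the skill's `get_response()`, but I haven't tried that.

```python
def converse(first_utterance, understand, listen_again, max_retries=2):
    """Try to understand an utterance; on failure, listen again without a wake word.

    understand:   callable(str) -> result, or None if not understood
                  (hypothetical stand-in for intent parsing)
    listen_again: callable() -> str, or None if the user stayed silent
                  (hypothetical stand-in for reopening the microphone)
    """
    utterance = first_utterance
    for attempt in range(max_retries + 1):
        result = understand(utterance)
        if result is not None:
            return result
        if attempt == max_retries:
            break  # out of retries, give up
        # Instead of going back to sleep, listen again right away.
        utterance = listen_again()
        if utterance is None:
            break  # nobody answered, give up
    return None


# Toy usage: the first attempt is misheard, the retry succeeds.
intents = {"open radio swiss jazz from favourites": "play:radio swiss jazz"}
retries = iter(["open radio swiss jazz from favourites"])
result = converse("open radio swiss chess", intents.get, lambda: next(retries, None))
```

Capping the retries matters: without a limit, background noise could keep the microphone open indefinitely, which is probably one of the "new problems" mentioned above.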
FWIW, I started recording my utterances and found out that in my setup Mycroft regularly stops listening before the end of the sentence, which results in poor STT performance. I’m now looking to fix that issue.
The other common problem is the PS3 Eye picking up lyrics from music when it’s playing and I give a command. IIUC, this could be fixed with either echo cancellation or ducking the music, but neither of those worked for me.
Echo cancellation on the PS3 Eye resulted in abysmal audio quality despite PS3 Eye-specific settings, while ducking music played through PulseAudio simply didn’t work.
Stopping before the end of the sentence might be connected to me setting the source volume to 300%, which I cargo-culted from the internet. The recording seemed more intelligible with that setting, but I think it’s actually messing with the NoiseTracker’s logic in mic.py. After I set the volume to 150%, the problem pretty much disappeared.
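For what it's worth, here is a toy illustration of one way too much gain can hurt an energy-based detector. This is not the NoiseTracker code from mic.py, and I can't say it is the actual cause; it only assumes that the boost is applied in software and clips at the 16-bit limit. All names and numbers are made up. The point: clipping caps the loud speech while the background noise still gets the full boost, so the gap between the two shrinks.

```python
import math

INT16_MAX = 32767


def apply_gain(samples, gain):
    """Scale samples and clip them to the signed 16-bit range."""
    return [max(-INT16_MAX, min(INT16_MAX, int(s * gain))) for s in samples]


def rms(samples):
    """Root-mean-square energy of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_to_noise_ratio(gain, speech_peak=20000, noise_peak=500, n=1600):
    """RMS ratio of a loud 'speech' tone to a quiet 'noise' tone after gain."""
    rate = 16000  # samples per second
    speech = [speech_peak * math.sin(2 * math.pi * 200 * i / rate) for i in range(n)]
    noise = [noise_peak * math.sin(2 * math.pi * 50 * i / rate) for i in range(n)]
    return rms(apply_gain(speech, gain)) / rms(apply_gain(noise, gain))


ratio_150 = speech_to_noise_ratio(1.5)  # speech peaks at 30000, no clipping
ratio_300 = speech_to_noise_ratio(3.0)  # speech would peak at 60000, so it clips
```

With these made-up numbers `ratio_300` comes out lower than `ratio_150`, meaning speech stands out less from the background at 300% than at 150%.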