For testing purposes, I am running Mycroft on low-grade hardware (a Raspberry Pi 4 with 1 GB RAM).
While testing the system, I have run into something I find regularly annoying: I say "Hey Mycroft" to trigger the system, and then must pause for 1-3 seconds before speaking my actual intent / command. If I speak sooner, the first word or two of my command gets cut off (visible in the debug terminal, which prints the returned STT).
As I envision my ideal home assistant, I want to be able to speak to it fluidly. I want to be able to converse naturally and have it be intuitive for others to use.
In an ideal scenario, I envision recording beginning prior to the wake word and continuing until the same end-of-phrase silence timeout currently used.
I think this would offer two main advantages:
- Natural flow of speech / dialog allowed for end users
- The wake word itself would be recorded, generating additional samples for NN training (particularly helpful for custom wake word models) using a confidence-based inclusion system (e.g. samples at 95%+ confidence are auto-included in the next training run)
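The confidence-based inclusion idea from the second bullet could be as simple as a threshold filter over captured samples. A minimal sketch (the 0.95 threshold comes from the proposal; the function name and the `(path, confidence)` tuple shape are my own illustrative assumptions, not an existing Mycroft API):

```python
AUTO_INCLUDE_THRESHOLD = 0.95  # 95%+ confidence auto-included, per the proposal


def select_for_training(samples):
    """Pick captured wake word samples for the next training run.

    samples: list of (wav_path, confidence) tuples, where confidence is
    the wake word engine's score for that capture. Anything at or above
    the threshold is auto-included; the rest could be held for manual
    review or simply discarded.
    """
    return [path for path, conf in samples if conf >= AUTO_INCLUDE_THRESHOLD]
```

Samples below the threshold need not be thrown away; a second, lower threshold could route them to a manual-review queue instead.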
I understand why the current system is implemented this way: the record event can only be triggered after a successful NN confirmation of the wake word. However, I'd argue this change would make the system more intuitive and user-friendly, all while generating valuable data for those who opt in!
For this to occur, my gut is we'd have to keep a rolling buffer of audio frames for ~10 sec while awaiting NN confirmation of the wake word. Upon confirmation, dump the buffer plus the ongoing recording, slicing off everything before the wake word's starting frame, and noting the frame where the wake word ends. In this way, you'd end up with a single audio file containing the trimmed wake word + command, and metadata recording where the wake word ended / command began, plus the confidence level.
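The rolling-buffer step above can be sketched with a `collections.deque` whose `maxlen` makes old frames fall off automatically. This is a standalone sketch, not Mycroft code: the frame size, sample rate, and the `on_wake_word` callback (with the engine reporting which buffered frames held the wake word) are all assumptions for illustration.

```python
from collections import deque

SAMPLE_RATE = 16000   # assumed mic sample rate (Hz)
FRAME_SAMPLES = 320   # 20 ms frames at 16 kHz (assumed frame size)
BUFFER_SECONDS = 10   # rolling history to keep, per the proposal

# A deque with maxlen acts as the rolling buffer: once it holds ~10 s
# of audio, appending a new frame silently drops the oldest one.
max_frames = (SAMPLE_RATE * BUFFER_SECONDS) // FRAME_SAMPLES
ring = deque(maxlen=max_frames)


def on_frame(frame: bytes) -> None:
    """Called for every captured audio frame, before any wake word fires."""
    ring.append(frame)


def on_wake_word(start_frame: int, end_frame: int, confidence: float):
    """Hypothetical callback: the wake word engine reports which buffered
    frames contained the wake word, plus its confidence score.

    Returns the trimmed audio (wake word + whatever follows it in the
    buffer) and the metadata described in the proposal.
    """
    frames = list(ring)
    # Slice off everything before the wake word's starting frame.
    clip = b"".join(frames[start_frame:])
    meta = {
        # frame index (within the clip) where the command begins
        "wake_word_end": end_frame - start_frame,
        "confidence": confidence,
    }
    return clip, meta
```

In practice the buffer would keep filling after confirmation until the end-of-phrase timeout fires, and only then would the clip be assembled; the sketch just shows the slice-and-annotate step.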
The entire block could be streamed or sent to STT, with post-processing of the file done locally (queued for free CPU time) or on the server (for those who opt in).
Is this something that has been discussed before? I couldn't find any reference to it in the forums. I thought I'd create a topic here, though, as opposed to an issue on GitHub. Thoughts?