Just wondering if you guys have figured out the Hotword Detection (voice activated trigger) aspect of it. Will it be Sphinx-based or something else?
We have! At least, for the time being. We have an implementation that uses pocketsphinx locally in conjunction with an extended version of the SpeechRecognition library. We’re pretty happy with the performance right now, but an alternative implementation that runs Kaldi locally (for the streaming interface and better speech end-pointing) is not off the table. I’m also working on making our extensions of SpeechRecognition non-breaking so we can contribute them back easily.
Also, we refer to this as “Wake Word Detection” internally.
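For anyone who wants to experiment with the general idea, here is a minimal sketch using the stock SpeechRecognition library with its pocketsphinx backend. It is not the extended version described above. The wake word "hey computer", the sensitivity value, and both function names are assumptions made up for illustration, and the words in the wake phrase need to exist in the pocketsphinx dictionary.

```python
def heard_wake_word(transcript, wake_word="hey computer"):
    """Return True if the decoded text contains the wake word (case-insensitive)."""
    return wake_word.lower() in transcript.lower()


def wait_for_wake_word(wake_word="hey computer", sensitivity=0.8):
    """Block until the wake word is heard on the default microphone.

    Assumes the SpeechRecognition, pocketsphinx, and PyAudio packages are installed.
    """
    # Imported lazily so heard_wake_word() works without the audio stack.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against background noise.
        recognizer.adjust_for_ambient_noise(source)
        while True:
            # Capture short phrases so each decode stays small.
            audio = recognizer.listen(source, phrase_time_limit=3)
            try:
                # keyword_entries restricts decoding to keyword spotting,
                # given as (keyword, sensitivity) pairs with sensitivity in [0, 1].
                text = recognizer.recognize_sphinx(
                    audio, keyword_entries=[(wake_word, sensitivity)]
                )
            except sr.UnknownValueError:
                # Nothing recognizable in this chunk; keep listening.
                continue
            if heard_wake_word(text, wake_word):
                return text
```

Calling `wait_for_wake_word()` blocks until the phrase is spotted. Restricting pocketsphinx to keyword spotting rather than full-vocabulary decoding is a design choice in this sketch, since the search space is much smaller.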
Thanks for the question!
Aside: If you could re-tag this, as it’s not directly related to Adapt (at least not in our current implementation), I’d appreciate it.
I’m having fun too with SpeechRecognition (can’t wait to have fun with your code).
But pocketsphinx, so far, has been reeeeally slow interpreting the audio.
I will do some more tests…