Question Please Help - Adapt - 6 Phrases - Embedded - Can it Do It?

Hello -

I am trying to evaluate the potential of the Adapt module for our purposes. I would like to listen for a few specific phrases and convert that speech to text without a cloud connection. Those phrases are STOP, WAIT, GO, HELLO, YES, and NO. That’s it.

I will have separate code that will evaluate what to do with the semantic result from those utterances (I won’t build skills with Mycroft). Is it necessary to use the cloud to evaluate this speech or can Adapt handle this on an embedded system?

Adapt is an intent parser, and it sits farther along the stack than you want to be. It works after speech-to-text: the STT results are fed to the intent parsers for evaluation against their expected input. By the time you’ve invoked Adapt, you’re halfway to writing a skill, even if you aren’t subclassing MycroftSkill.
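
To make that concrete: here is roughly what driving Adapt directly looks like, following the pattern in Adapt's own examples. Note that it consumes *text* that STT has already produced, never audio; the entity and intent names here are made up for illustration.

```python
import json

from adapt.engine import IntentDeterminationEngine
from adapt.intent import IntentBuilder

engine = IntentDeterminationEngine()

# Register each target phrase as a keyword entity of type "Command".
for word in ("stop", "wait", "go", "hello", "yes", "no"):
    engine.register_entity(word, "Command")

# An intent that fires whenever a Command keyword is present.
command_intent = IntentBuilder("CommandIntent").require("Command").build()
engine.register_intent_parser(command_intent)

# Adapt parses text, so something upstream must have already done STT.
for intent in engine.determine_intent("stop"):
    if intent.get("confidence", 0) > 0:
        print(json.dumps(intent, indent=2))
```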

If you just want to actively listen for those phrases, and nothing else, you’re looking for a wake word listener running several models, one per phrase. I’m no good with the listeners, so I won’t try to guide you, but you can start with the relevant section in the Mycroft docs, and the people who do understand the listeners are active on Mycroft’s chat server.

This would not require a cloud connection. You’d just have the listener invoke the corresponding code on the corresponding wake word.
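
As a rough illustration of that approach, PocketSphinx’s keyword-spotting mode can watch for several phrases at once, fully offline. This is just a sketch, not Mycroft’s own listener; I’m assuming the 0.1.x pocketsphinx Python bindings (the newer 5.x API differs), and `keywords.list` plus its thresholds are placeholders you would create and tune yourself.

```python
import os

import pyaudio
from pocketsphinx import Decoder, get_model_path

# keywords.list has one phrase per line with a detection threshold, e.g.:
#   stop /1e-20/
#   wait /1e-20/
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(get_model_path(), 'en-us'))
config.set_string('-dict', os.path.join(get_model_path(), 'cmudict-en-us.dict'))
config.set_string('-kws', 'keywords.list')
decoder = Decoder(config)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=1024)

decoder.start_utt()
while True:
    buf = stream.read(1024)
    decoder.process_raw(buf, False, False)
    if decoder.hyp() is not None:
        word = decoder.hyp().hypstr.strip()
        print("heard:", word)  # dispatch to your own handler here
        decoder.end_utt()      # reset and keep listening
        decoder.start_utt()
```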

If you want to activate it with a button or a separate wake word, and then have it respond to one of those six words, you’ll need to run the audio through some kind of STT. Some STT engines can run locally or be LAN-hosted; others use the cloud. Also not my area of expertise, but it’s the same deal: the docs have pointers, and people who understand the various STT engines are active in chat.
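
For that flow, a small local STT with a restricted vocabulary may be all you need. For instance, Vosk (Kaldi-based, runs on-device) accepts a grammar that limits recognition to just your six words. Again only a sketch: the `"model"` path is a placeholder for whichever Vosk model you download, and `"[unk]"` is Vosk’s catch-all for out-of-grammar speech.

```python
import json

import pyaudio
from vosk import KaldiRecognizer, Model

model = Model("model")  # e.g. an unpacked vosk-model-small-en-us directory
# Restrict recognition to the six target words; "[unk]" absorbs anything else.
rec = KaldiRecognizer(
    model, 16000,
    json.dumps(["stop", "wait", "go", "hello", "yes", "no", "[unk]"]))

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=4000)

while True:
    data = stream.read(4000)
    if rec.AcceptWaveform(data):
        result = json.loads(rec.Result())
        print("heard:", result.get("text", ""))  # hand off to your own logic
```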


Kaldi-Spotter might be worth a look for this use case.