I have only worked with Alexa skills and chatbots (text typing and button pushing) before, so please excuse this basic question. I searched the docs and the source code but could not find what I was looking for, so I am probably missing a keyword and/or some architecture understanding.
I want to have two WAKE/HOT words, one is “Order” and the other is “Dictate”.
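From what I can tell, multiple wake words would be declared in the `hotwords` section of `mycroft.conf`, one entry per word. This is only my guess at the shape of that config (the field names like `module` and `listen` are assumptions on my part, and the model files are placeholders):

```json
{
  "hotwords": {
    "order": {
      "module": "precise",
      "local_model_file": "order.pb",
      "listen": true
    },
    "dictate": {
      "module": "precise",
      "local_model_file": "dictate.pb",
      "listen": true
    }
  }
}
```

If that is roughly right, the remaining question is how each wake word gets routed to its own skill (or set of skills) afterwards.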
When I say “Order”, I will use one set of skills that understands the variety of items that are being ordered, and takes the appropriate actions.
When I say “Dictate”, I will use a different skill. The job of Dictate is to continuously perform speech-to-text until I say “End Dictate” or “Stop Dictate”.
I could not find how the system knows when to stop streaming the words I am saying and go back to waiting for the next WAKE/HOT word. Does it time out after a set amount of silence? Can I set that timeout dynamically? Does the system constantly stream to the back-end without sleeping? Does a skill's completion signal the end of the interaction?
I don’t necessarily want the system to stop streaming when I am in “Dictate” mode, but I do want it to stop streaming when I say “End Dictate” or when an “Order” skill signals that it is complete. So, how is it signaled to sleep and stop streaming to the ASR server?
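To make the question concrete, here is the control flow I have in mind, written as plain self-contained Python rather than against the real skill API (the `DictationSession` class and its `converse` hook are hypothetical names of mine, loosely modeled on the idea of a skill that intercepts every utterance while it is active):

```python
# Hypothetical sketch: a "Dictate" mode that keeps consuming
# utterances until a stop phrase is heard, then signals the
# listener to go back to sleep. Not real Mycroft/OVOS code.

STOP_PHRASES = {"end dictate", "stop dictate"}

class DictationSession:
    def __init__(self):
        self.active = False
        self.transcript = []

    def start(self):
        """Entered when the 'Dictate' wake word fires."""
        self.active = True
        self.transcript = []

    def converse(self, utterance: str) -> bool:
        """Consume every utterance while dictation is active.
        Returning True means 'I handled it'; returning False
        hands the utterance back to normal intent parsing."""
        if not self.active:
            return False
        if utterance.strip().lower() in STOP_PHRASES:
            # This is the signal I am asking about: stop
            # streaming to the ASR back-end and sleep.
            self.active = False
            return True
        self.transcript.append(utterance)
        return True


session = DictationSession()
session.start()
for heard in ["take out the trash", "buy milk", "end dictate"]:
    session.converse(heard)
print(session.transcript)  # -> ['take out the trash', 'buy milk']
print(session.active)      # -> False
```

What I cannot figure out is which real mechanism plays the role of that `active = False` line, i.e. how a skill tells the listener to stop streaming and sleep.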
Please direct me to the relevant documentation and source code so I can review them. I am already aware of these two wonderful Jarbas packages that can provide some assistance, but I am missing a key piece of the architecture puzzle needed to put it all together: https://github.com/JarbasAl/local_listener and https://github.com/JarbasAl/skill-dictation
Thanks so much. I received several intelligent responses to my queries the other day. Appreciate it.