In 2015 there are quite a few virtual assistants out there, for PC, Mac, and even mobile platforms.
The problem with these entities is that most of the time you have to trigger a listening event by saying something like…
For example, “Okay Google”, “Okay Cortana”, “Okay Ubi”, “Okay Alexa”, “Okay Ivee”.
Granted, there are some that do not need the preset “Okay”, but most of them still do.
The problem is, saying “Okay… [whatever]” kind of sounds dumb. The justification for this preset is that these entities want the device to know, without a doubt, when you are triggering a listening event. That worked wonders several years ago. In the past few years, however, speech recognition has advanced tremendously.
So it raises the question: do these entities that use speech recognition really need the “okay” preset anymore?
Will MyCroft use the “okay” preset? If so, can we change it? Will we be able to change the name of it to make it more personalized?
Fortunately, you will be able to change the “Wake Word(s)” to whatever you want. By default it might be “Mycroft” or “Hey Mycroft”. From testing, I’m pretty sure they use a two-word phrase like “Okay Google” or “Hey Cortana” in order to avoid as many false positives as possible.
In the video I saw, it looks as if Mycroft’s commands may be triggered by key words in the sentence or command. This is very cool, since I use AHK scripting to complete commands in my current AI. It seems to have a lot of potential.
Will Mycroft support multiple trigger phrases? For example, will it be able to recognize “Open Pandora” as a trigger, as well as “Hey Mycroft” or whatever else? Multiple trigger phrases are something I’ve wanted in every audio AI I’ve used; having to say a set phrase that’s not actually related to what you’re asking for really breaks the feeling of natural language.
The problem with voice recognition is that it doesn’t know what words mean, so what you are asking can prove somewhat difficult. This is why a trigger word is needed for the entity to acknowledge that you want it to do something.
Take Google Now, for example.
Its trigger is “Okay Google”; only after that phrase is recognized does it begin listening for commands.
One day, when speech recognition is more advanced, such trigger words will not be needed. For now, dropping them is not practical.
Multiple phrases are definitely a possibility, but for some of the reasons @Dominique mentioned above, arbitrary launch phrases are unlikely.
The idea behind a wake phrase is that it’s used to explicitly signal intent, but from an implementation perspective it’s also about privacy and resource consumption. Wake words are processed locally on the device, while (most likely) full utterance parsing is handled off-device. Wake words prevent us from streaming arbitrary (or all) audio to a third party, and help us explicitly enter a conversation with the device where other cues may not be available.
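That gating logic can be sketched in a few lines. This is a hypothetical Python sketch, not Mycroft’s actual implementation: the `detect_wake_word` check and the chunk-of-transcript stream are stand-ins for whatever the real on-device detector does.

```python
# Hypothetical sketch of wake-word gating: every chunk is checked
# locally, and only the utterance captured *after* a wake word is
# allowed off the device for full parsing.

WAKE_PHRASES = ("hey mycroft", "okay mycroft")

def detect_wake_word(transcript: str) -> bool:
    """Lightweight local check that runs on every chunk."""
    return transcript.strip().lower() in WAKE_PHRASES

def handle_audio(chunks):
    """Yield only the chunks that may be streamed to full STT."""
    awake = False
    for chunk in chunks:
        if not awake:
            awake = detect_wake_word(chunk)  # stays on-device
        else:
            yield chunk                      # forwarded for parsing
            awake = False                    # one utterance per wake

# Simulated stream of locally-transcribed chunks:
stream = ["play music", "hey mycroft", "open pandora", "random noise"]
print(list(handle_audio(stream)))  # only "open pandora" is forwarded
```

The point of the sketch is just that everything before the wake phrase never leaves the device.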
The largest aggravation in this sort of thing is that modelling wake-word phrases requires intimate knowledge of the speech recognizer. A lot of them use SRGS, which helps to standardize things, but it’s a set of tools with a steep learning curve.
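For reference, a minimal SRGS grammar for a wake phrase might look something like this. This is a sketch of the standard XML form of SRGS, not a grammar Mycroft actually ships; the phrases are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="wake">
  <!-- One public rule matching either of two wake phrases -->
  <rule id="wake" scope="public">
    <one-of>
      <item>hey mycroft</item>
      <item>okay mycroft</item>
    </one-of>
  </rule>
</grammar>
```

Even a toy grammar like this hints at the learning curve: the recognizer-specific tooling around loading and weighting these rules is where most of the work goes.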
In another AI I use, I rely on AutoHotKey scripting. The scripts can be set up so a command triggers when a key word is spoken. So if I want it to email me something, I just have to say any phrase that contains the word “email”: “Email that to me”, or “Send that to my email”. The word “email” is picked up, and the assigned command sends whatever I asked for to my email or as a text message. Interesting.