Mycroft Community Forum

How to recognize "custom" words in skill intent?

How do I get Mycroft to recognize words and terms that mean nothing to anyone but me? I alluded to this problem in another post.

I created a skill to map to network shares on my home network (using cifs-utils). I want to say the name of the share to Mycroft and have it connect to that particular share. The share names are English words, but have no real-world context.
For example, I have a share called “everstor” (pronounced ‘ever store’). It is a backup drive where I dump miscellaneous data, and because I keep it forever, I called it “everstor”. When I say “map to everstor” it is recognized as “Everest store” or some other inaccuracy. As a workaround, I limited the skill intent to respond to “map to network share” instead of “map to <share name>”, but now I can only map to the one network share that is hard-coded into the skill.

I read some documentation about phonemes in the custom wake word sections. Is there a way to integrate custom phonemes into skill intents? Alternatively, is there a way to create a custom global phoneme dictionary available to all (of my) skills, rather than addressing this on a skill-by-skill basis? I have read the documentation on both the Padatious and Adapt intent parsers, but neither seems to address this directly. For example, even if I use the Padatious :0 wildcard to stand in for the unknown share name in the intent utterance (e.g. “map to :0 network share”), I don’t know how to get Mycroft to recognize the actual share name when spoken by the user.

Thanks to all.

We have the same problem. The word Mycroft isn’t common in English. As a result, it is often inaccurately transcribed.

Modern speech-to-text algorithms make use of language models based on large volumes of written language.

They run audio through a two-step process. First they try to transcribe the audio directly into text, then they evaluate that text against the language model to determine if they’ve transcribed something inaccurately.

Most of the time the second step improves the transcription, but in cases where people are using words in ways that they wouldn’t be used in common speech, the second step actually makes the transcription worse.

So nearly every time I need to use a voice algorithm to transcribe the word Mycroft it instead transcribes it as Microsoft or Minecraft because those words are more commonly used.

This is actually a pretty significant problem for specialty applications. Hospitals, for example, use highly specialized dialogue that includes words not in common use. Because the language models are adapted to the language as a whole, they make it very difficult to transcribe that specialized speech accurately.

The solution is to create data sets that are customized for the application or to train existing speech and language models on additional data to enhance their accuracy.

Unfortunately the largest open source effort to create a modern speech-to-text engine ( DeepSpeech ) is not making progress as quickly as is necessary to keep open source relevant in this space. If it were, we’d be able to customize a model for applications like yours.

As it stands now we’re at the mercy of commercial products that don’t have the features that we need and don’t provide the data and training architecture necessary to improve them.

I’d say your best bet for creating a speech-to-text engine that works for your chosen vocabulary is to use either PocketSphinx or CMU Sphinx. Alternatively, you could kludge it by creating a filter that substitutes your chosen word for the strings the STT engine is inaccurately returning.
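A minimal sketch of what such a substitution filter could look like, applied to the transcribed utterance before intent parsing. The `MISHEARD` mapping and the function name are hypothetical; you would maintain the mapping yourself as you discover mis-transcriptions:

```python
# Sketch of a post-STT substitution filter (hypothetical; not part of
# mycroft-core). Maps known mis-transcriptions back to the intended word.
import re

# Hand-maintained mapping: what the STT engine returns -> what was said.
MISHEARD = {
    "everest store": "everstor",
    "ever store": "everstor",
}

def correct_transcription(utterance: str) -> str:
    """Replace known mis-transcriptions with the intended vocabulary."""
    fixed = utterance.lower()
    for wrong, right in MISHEARD.items():
        # Word boundaries keep the filter from rewriting partial matches.
        fixed = re.sub(r"\b" + re.escape(wrong) + r"\b", right, fixed)
    return fixed

print(correct_transcription("map to everest store"))  # map to everstor
```

The obvious downside is that the mapping is blind to context, which is exactly the concern raised later in this thread.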

Here at Mycroft, it would make sense for us to build a filter into our software that always substitutes Mycroft for Minecraft or Microsoft.

If you did build some code that does this substitution and did it in a reasonably elegant way, I suspect we would pull it into master. Frankly, I’ve wanted a piece of code to substitute Mycroft for Microsoft for some time now.

@Aldeaman, just a random thought on this. Maybe the best approach is to create your intent as follows:
use the trigger words “map to”, then parse the remainder of the utterance (which in your example will return “Everest store”). Next, apply fuzzy matching between the utterance remainder and the expected share name, then pick some threshold for a successful match.

network_resource_confidence = fuzzy_match(phrase_remainder, "everstor")

I have not tried this, but it might overcome the initial issues outlined by @J_Montgomery_Mycroft.

That substitution will cause more trouble than help, I think. “Mycroft” is usually in the wake word, not in the actual queries. I’d be happy to be proven wrong, and maybe I can suggest a better approach (selective substitution). What are your use cases?

“hey mycroft, what do you think of microsoft?”
“microsoft is awesome and respects your privacy!”

“hey mycroft, launch minecraft”
“im already running”
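One sketch of what “selective substitution” could mean, under the assumption that a mis-heard “Mycroft” only ever shows up in the leading, wake-word-like position: rewrite it there, and leave queries that are genuinely *about* Microsoft or Minecraft untouched. The regex and function are hypothetical illustrations, not existing mycroft-core code:

```python
# Selective substitution: only rewrite Microsoft/Minecraft -> Mycroft when
# the word sits where the assistant's name would be (start of utterance),
# so queries about Microsoft or Minecraft pass through intact.
import re

NAME_LIKE = re.compile(r"^(hey |ok )?(microsoft|minecraft)\b", re.IGNORECASE)

def selective_substitute(utterance: str) -> str:
    """Rewrite a leading mis-heard assistant name; leave the query body alone."""
    return NAME_LIKE.sub(lambda m: (m.group(1) or "") + "mycroft", utterance)

print(selective_substitute("hey minecraft, launch minecraft"))
# -> "hey mycroft, launch minecraft"
print(selective_substitute("what do you think of microsoft?"))
# -> unchanged
```

This avoids both failure modes in the example exchanges above: the query body is never rewritten, so “launch minecraft” and “what do you think of microsoft” survive intact.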