Handling of intents containing unique first or last names

Qdev · July 19, 2019, 1:16am

Hi, I’m looking into Mycroft for a development project at my organisation. I wanted to check whether Mycroft can overcome the limitations we’ve experienced when developing on Alexa and Google Assistant.

Our software serves law firms. The problem comes up when looking up client contacts for these firms, as their clients’ first and last names are often unusual and mostly not common English names.

With Alexa and Google Assistant (Dialogflow), there is a limit on how many custom names you can add to train the voice handling agent.
On top of this, we are unable to get the intent handler to resolve the client name parameter dynamically depending on the logged-in firm.
In our case, we have 5000+ firms using our software, each with 100,000 unique ways to look up a client contact.
This means we can’t just load all the client contact info into the one client name parameter.
Firstly, it would blow out the parameter slot. Secondly, the speech-to-text transcription would not be accurate, as the slot would contain millions of name variations that are not relevant to the logged-in firm.

The limitation we are experiencing with Alexa and Google Assistant is that there is no way to have the voice input handled by a custom speech-to-text model for the specific logged-in firm. This is not in the pipeline at all.
By the time Alexa or Google Assistant gives us the text of the client name, it is often wrong.

For example:

When we say “Find Aitken”, the transcribed text is “Find I can”.
When we say “Find Addley”, the transcribed text is “Find at Lee”.
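
To make the idea of firm-scoped lookup concrete, here is a minimal Python sketch that matches a transcribed name against only the logged-in firm's contacts. Everything in it is an assumption for illustration: the firm IDs, the contact lists, the `resolve_contact` helper and the 0.6 cutoff are all hypothetical, and plain string similarity from the standard library's `difflib` stands in for whatever matching a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical per-firm contact lists; a real system would load these
# from the logged-in firm's own data.
FIRM_CONTACTS = {
    "firm-a": ["Aitken", "Addley", "Brown"],
    "firm-b": ["Atkinson", "Lee"],
}


def normalize(text):
    """Lowercase and keep letters only, so "at Lee" becomes "atlee"."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def resolve_contact(firm_id, heard, cutoff=0.6):
    """Return the firm's contact most similar to the transcribed name,
    or None if no contact reaches the (assumed) similarity cutoff."""
    key = normalize(heard)
    best_name, best_score = None, 0.0
    for name in FIRM_CONTACTS.get(firm_id, []):
        score = SequenceMatcher(None, key, normalize(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= cutoff else None


# The same transcript resolves differently depending on the firm.
print(resolve_contact("firm-a", "Aitkin"))
print(resolve_contact("firm-b", "Aitkin"))
# A transcript that has drifted too far finds no match.
print(resolve_contact("firm-a", "I can"))
```

This kind of spelling-based matching only helps when the transcript is a near miss. A transcript like “I can” is too far from “Aitken” to be recovered this way, which is why the recognition step itself would need to be biased towards the firm's names.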

Please let me know if you need any extra info regarding this case. I’m happy to provide clarifications or set up a Skype call to explain the scenario.

baconator · July 19, 2019, 4:30am

You can use an STT engine that supports custom voice models. There are two major ones I know of that you can check out:

Kaldi
DeepSpeech

Both will take a good deal of tweaking and testing, I presume, to meet your needs.

rekkitcwts · July 19, 2019, 6:23am

I really like this idea. Some people have unique names for some reason (and yes it affected me, sometimes being mistaken for a girl when they use the first word of my real name).

Dominik · July 19, 2019, 8:11am

Regardless of using Google, Alexa, Kaldi, etc - this will be a hard problem for any STT engine.

As this sounds like you want to build an enterprise application, you might want to contact @J_Montgomery_Mycroft

Qdev · July 22, 2019, 2:04am

Thanks @baconator, I’ll check them out. Yes it needs to be custom voice models as it should bias towards those specific words when spoken to.

Both Alexa and Google allow you to create a type and add example values, usually up to a specific size limit. However, there is no capability to route to a custom voice model depending on who’s logged on to your system.

Hopefully these other solutions have some dynamic routing capability, where you can direct handling to a custom voice model or slot type for the logged-on user.
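
As a rough sketch of what that dynamic routing could look like, the Python below picks a per-firm speech profile at request time and hands it to the recognizer. All names here are hypothetical (`FirmSpeechProfile`, `PROFILES`, `profile_for`, `transcribe`, the model paths), and `engine` is just a placeholder callable, not the API of Kaldi, DeepSpeech or Mycroft.

```python
from dataclasses import dataclass, field


@dataclass
class FirmSpeechProfile:
    """Hypothetical per-firm settings for a speech-to-text backend."""
    model_path: str                                    # assumed location of a firm-specific model
    bias_phrases: list = field(default_factory=list)   # names recognition should favour


# Hypothetical registry keyed by firm ID.
PROFILES = {
    "firm-a": FirmSpeechProfile("models/firm-a", ["Aitken", "Addley"]),
    "firm-b": FirmSpeechProfile("models/firm-b", ["Atkinson", "Lee"]),
}

# Fallback used when a firm has no custom profile.
DEFAULT_PROFILE = FirmSpeechProfile("models/generic")


def profile_for(firm_id):
    """Select the profile for the logged-in firm, or the generic fallback."""
    return PROFILES.get(firm_id, DEFAULT_PROFILE)


def transcribe(audio, firm_id, engine):
    """Route the audio to the engine together with the firm's profile.

    `engine` is any callable taking (audio, profile) and returning text;
    it stands in for a real STT backend.
    """
    return engine(audio, profile_for(firm_id))
```

The point of the design is that the firm is resolved before recognition starts, so each request only ever competes against that firm's names and not against every firm's contacts at once.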

Qdev · July 22, 2019, 2:09am

Yes @rekkitcwts, this seems to happen a lot. I’ve asked some people working on voice assistants why this works for looking up weather, street maps and music by voice, as those would run into similar problems.

For weather, for example, cities and towns often have very unusual names.

The answer I got back from the technical teams was that weather and music are huge use cases, so the platform provider will often have assigned engineers building custom processes, training these models internally, keeping them up to date and ensuring they work well.

It sounds like those features are not ready for public use yet, or still require a bit of custom work internally.

Qdev · July 22, 2019, 2:10am

Thanks @Dominik, I’ll contact him.