Adapt - First Open Source Intent Parser Released by Mycroft A.I

Adapt doesn’t yet deal with numbers! At least, not in any helpful way. There’s an open task in our JIRA instance about making datetimes a first-class citizen, and numbers are an obvious additional case. Right now (with the exception of the tokenizer), Adapt doesn’t require any localization, and converting between phrases and numerals is something that will vary from language to language. I don’t have a good answer for you right now as to what the correct direction is.

In the short term:
You can specify a list of reasonable temperatures that are specific to your skill (twenty is ok, one hundred is death)
You can specify a regex entity that extracts numbers:
"(?P<Temperature>\d+) degrees"
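In Adapt, the named group in a regex entity becomes the entity name. The behavior of the pattern itself can be checked with plain Python `re` (the utterance string here is just an illustration):

```python
import re

# Named group "Temperature" is what Adapt would expose as the entity name.
pattern = re.compile(r"(?P<Temperature>\d+) degrees")

match = pattern.search("set the thermostat to 72 degrees")
print(match.group("Temperature"))  # 72
```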

Since (at the moment) this is primarily a command and control interface, the vocabulary sets for these skills shouldn’t be particularly large.

Good work, but what are the differences between Adapt and Api.ai (https://api.ai/)?
Why not use and improve what already exists?

Regards,
Miguel

Aside from the technical differences, which I likely couldn’t shed much light on: just quickly looking at the code on GitHub, I’m unable to find where or how they handle parsing of the requests. Do you know if their intent parser or voice request parser is open source? My first impression is that they provide SDKs for various programming languages, but their implementation isn’t actually open source.


The parser includes code that prefers wider parses (covering more of the utterance) over smaller parses. So if “tokyo weather matsuri” is known to be a song title, it should recognize it. Unfortunately, as best I can make out, it cannot work out that this is a song title just because it appears after the word “play” and in front of the word “song”.

@Raidptn Adapt is open source and not reliant on a 3rd party service. Even if https://api.ai offers free service to open source project - you are still beholden to their service being online and available for the lifetime of any project you base upon it.

This is a correct interpretation of the code. The “Adapt-y” way of implementing this is to have an index of song titles that you’d expect to recognize. A good implementation would be to get song titles from the user’s media library. A less good implementation would be to go to freebase and get a list of all songs ever written. I do not recommend the latter; people have written a lot of songs :slightly_smiling:

I tried that on my own attempt at doing what Mycroft is attempting. The trouble is that the people who provide track data are not very cooperative. One of my tracks has a title tag value of

 "Gimme! Gimme! Gimme! (A Man Af"

And my library has over 3,500 tracks in it (not all ABBA songs). So I had to add an additional table relating a special “pronounced title” string to each track, and populate it with algorithmically simplified title strings: I removed anything after a left paren and converted all punctuation to spaces. Then I was faced with the pronunciation dictionary not having “gimme” in it.
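The simplification described (drop everything from the first left paren, convert punctuation to spaces) can be sketched in a few lines; the function name is mine, not from any library:

```python
import re

def pronounced_title(title):
    """Derive a speakable 'pronounced title' from a messy tag value."""
    # Drop anything from the first left paren onward (truncated tags, remix notes).
    title = title.split("(", 1)[0]
    # Convert punctuation to spaces, then collapse runs of whitespace.
    title = re.sub(r"[^\w\s]", " ", title)
    return " ".join(title.split()).lower()

print(pronounced_title("Gimme! Gimme! Gimme! (A Man Af"))  # gimme gimme gimme
```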

Yup, these are the problems! I’ve done the same thing on past projects; music is a particularly dirty data set. The lack of “gimme” in your pronunciation dictionary is something that should be resolved with a high-quality dictation speech recognizer. There’s then the issue of using, for example, the English recognizer and trying to recognize the names of German songs.

Long story short, this stuff is hard. But hard is fun!

A way around this is not to try to select individual songs by voice command. Instead define nicely named playlists for various purposes. “Mycroft, play my christmas album.”

Another approach is to implement a “fuzzy match” algorithm that tries to match the text output by the recognizer against known category strings, using something like a Soundex hash. This requires using specified grammars with wildcards, like “Play the song [WORD…].”, or using Hamming-distance matching. A general “sounds most like which of these” match filter might be useful to several behaviors.
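As a rough stand-in for a Soundex or Hamming-distance filter, the standard library’s `difflib` already gives a “looks most like which of these” match on character-sequence similarity (the function and sample titles here are my own illustration, not part of Adapt):

```python
import difflib

def best_title(heard, titles, cutoff=0.5):
    """Return the known title most resembling the recognizer output,
    or None if nothing clears the similarity cutoff."""
    index = {t.lower(): t for t in titles}
    hits = difflib.get_close_matches(heard.lower(), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None

titles = ["Dancing Queen", "Tokyo Weather Matsuri", "Gimme Gimme Gimme"]
print(best_title("tokio whether matsuri", titles))  # Tokyo Weather Matsuri
```

A phonetic hash like Soundex would tolerate spelling-level recognizer errors even better; `difflib` is just the zero-dependency version of the idea.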

My library includes songs with titles like “川の流れのように”. And Japanese has phonemes that do not exist in English, and vice-versa.

Possible solution:
For this you could go through the music library and try to detect which languages are present (language detection need only run once, each time the music library is modified), or obtain the list of languages some other way.
Then, whenever the user asks to play a song, the spoken song title could be transcribed with each recognizer in the list of languages the music library contains. The transcription with the highest confidence is chosen, and that is the title that is searched for.

It would also be nice for people whose first language is not English but who have English music in their music library.

also: “99 Luftballons” ftw
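The highest-confidence selection proposed above can be sketched generically. The `transcribe(audio, lang)` interface returning `(text, confidence)` is hypothetical; real speech engines expose this differently:

```python
def best_transcription(audio, languages, transcribe):
    """Run every language's recognizer and keep the most confident result.

    `transcribe(audio, lang)` is a hypothetical callable returning
    (text, confidence).
    """
    results = [(lang, *transcribe(audio, lang)) for lang in languages]
    return max(results, key=lambda r: r[2])

# Stubbed recognizers for illustration only.
fake = {"en": ("river flows", 0.31), "ja": ("川の流れのように", 0.87)}
lang, text, conf = best_transcription(b"...", ["en", "ja"], lambda a, l: fake[l])
print(lang, text)  # ja 川の流れのように
```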

What about a JSON skill parser?

I think we could write most skills with just a few JSON files: one for entities, one for regexes, and some for properties.

That way we could easily write skills without touching any Python.

Edit : maybe not a good idea…

Hi, having fun with adapt.

Does it support UTF-8 well? I tried to put an “é” in a keyword and I get a weird issue in the JSON output:

{
  "Delay": "10 minutes",
  "intent_type": "timerIntent",
  "confidence": 0.4827586206896552,
  "timerKeyword": "pr\u00e9viens",  ---- instead of "préviens"
  "target": null
}
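For what it’s worth, `pr\u00e9viens` is not corruption but the ASCII-escaped form of valid JSON; any JSON parser decodes it back to “préviens”. Python’s `json` module produces this escaping by default and can be told not to:

```python
import json

data = {"timerKeyword": "préviens"}
print(json.dumps(data))                      # {"timerKeyword": "pr\u00e9viens"}
print(json.dumps(data, ensure_ascii=False))  # {"timerKeyword": "préviens"}
# Either form round-trips to the same string:
print(json.loads(json.dumps(data))["timerKeyword"])  # préviens
```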

Other question: is there a French tokenizer? For words such as “j’ajoute”, I would like “ajoute” to be a word (a keyword, actually), but it doesn’t work =)

I think a French tokenizer is pretty similar to the English one, except for this apostrophe rule (which has exceptions, such as “aujourd’hui”).
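A minimal sketch of that apostrophe rule, with an exception list, might look like this. This is my own illustration, not Adapt’s actual tokenizer:

```python
# Words where the apostrophe is lexical, not an elided clitic.
EXCEPTIONS = {"aujourd'hui"}

def fr_tokenize(text):
    tokens = []
    for word in text.split():
        w = word.replace("\u2019", "'")  # normalize curly apostrophes
        if "'" in w and w.lower() not in EXCEPTIONS:
            prefix, rest = w.split("'", 1)
            tokens.extend([prefix + "'", rest])  # "j'ajoute" -> "j'", "ajoute"
        else:
            tokens.append(w)
    return tokens

print(fr_tokenize("j’ajoute un minuteur aujourd’hui"))
```

With this split, “ajoute” comes out as its own token and could be registered as a keyword.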

Other question :slightly_smiling:
I tried registering this regex entity :

engine.register_regex_entity("(?P<NumericValue>10)%")

But it doesn’t work as it should. Any idea?

@seanfitz what do you think? ^

I’d love to know what you think “should” means here, as it appears to me that would match on the literal string 10, as opposed to any numeric value. I think you want something like

engine.register_regex_entity("(?P<NumericValue>\d+)%")

Yes, I’m sorry.

I took this example, but I could have chosen engine.register_regex_entity("(?P<NumericValue>ab)c").

Actually, at one point, when recognizing

“ab c” with “(?P<NumericValue>ab) c”, I had the right result (i.e. “ab”)

but

“abc” with “(?P<NumericValue>ab)c” returned nothing.

And I didn’t understand.
I will do some more tests if I have time :wink:
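For what it’s worth, both patterns match at the plain-regex level, so the discrepancy presumably comes from how Adapt tokenizes the utterance before applying regex entities (a guess on my part; the group name below is a stand-in):

```python
import re

# Both spacings match fine with plain re against the corresponding input.
with_space = re.search(r"(?P<NumericValue>ab) c", "ab c")
no_space = re.search(r"(?P<NumericValue>ab)c", "abc")
print(with_space.group("NumericValue"), no_space.group("NumericValue"))  # ab ab
```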

The keyword ‘play’ indicates a general purpose media playback intent is required. It makes sense to match against a list of titles in the users’ library first. Failing to find an exact match, it will need to build a larger data-set on which to search.

It seems most prudent to search this information progressively until a high-confidence match is located:
first, a list of songs by the same artists in the user’s library;
then a list of popular songs from the last 2 years;
then a list of iconic classic songs;
then repeating the process with lyrics instead of titles, as the user may well be referencing an iconic lyric rather than the official title;
stopping as soon as a high-confidence match is located.
It could run a similar match algorithm for video, audiobooks, podcasts, etc., with each media type returning its highest result. If multiple media types return results with similar confidence levels, Mycroft could ask for clarification of the media type or artist name.

Since the download/caching of media metadata is done in an incremental cycle, the sequence could be optimized in later releases or customized by the end user trivially.

I don’t see any short-term problems with 3rd-party services and directories for this information, though it would be really nice if each Mycroft could search a P2P network of other Mycrofts before going out to search the web. Mycroft units may not be very good at processing requests for each other in real time, but they could synchronize data sets of common information, like song title lists per artist, based on shared interests between the units’ users (two users both listen to the same artist, so their units maintain a synchronized list of that artist’s song titles).

The first thing I would check is if Mycroft had been asked to play the song previously.

Thinking about it, if this method were to work, a sensible thing would be to add a way to inform Mycroft that it had made a mistake, just in case it gets the wrong one, so that it doesn’t repeatedly play the wrong song every time.

Hi, I played around with adapt-parser; it’s a nice knowledge-based parser. But I would like to know what happens if the text doesn’t match any trained intent.
The engine should return JSON with 0.00 confidence, but it is throwing an exception:

'not {!r}'.format(s.__class__.__name__))
TypeError: the JSON object must be str, bytes or bytearray, not 'NoneType'

For example:
I trained the adapt engine for “Music_Intent” with some words, but then tried to parse “hi”, which throws the exception above.

Help much appreciated

Hi @avi_gaur,

I’m not 100% sure that the adapt engine will return matches with a confidence of 0. This might be the problem, but I’m not sure which statement causes the TypeError, so it’s hard to tell.

Can you show the code you’re using (pasted on pastebin or similar)? I’ve tried with the example code in adapt and I can’t quite trigger your exception.