Adapt is removing "the"?


#1

I am creating a new skill for movie trivia. I noticed that some movie names change between the recognized utterance and the data passed to my code (I don't think the problem is with the regex, but I will provide it anyway).
For example the recognized utterance is “movie info about harry potter and the chamber of secrets”.
In the code I get “harry potter and chamber of secrets”. This of course breaks my query.
My vocab contains:
tell me who is playing in
who is playing in
who's playing in
movie info about

My regex is:

    (tell me who is playing in|who is playing in|who's playing in|movie info about) (?P<MovieName>.*)

In the code I read the named group with:

    movie = message.data.get('MovieName')

I am checking with https://regex101.com/ and the MovieName named group contains the correct thing.
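For anyone who wants to repeat the regex101 check locally, here is a minimal sketch in plain Python. It only exercises the pattern itself, outside of Mycroft; the group name MovieName is taken from the message.data.get call, and the helper name movie_name is made up for the example.

```python
import re

# Pattern from the .rx file. The group name MovieName is assumed from the
# message.data.get('MovieName') call in the skill code.
PATTERN = re.compile(
    r"(tell me who is playing in|who is playing in|who's playing in|movie info about) "
    r"(?P<MovieName>.*)"
)


def movie_name(utterance):
    """Return the captured movie title, or None if the pattern does not match."""
    match = PATTERN.match(utterance)
    return match.group("MovieName") if match else None


raw = "movie info about harry potter and the chamber of secrets"
print(movie_name(raw))
```

On the raw utterance the group keeps "the", so the pattern is not what drops the article.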

What am I missing? Is somehow Adapt or some other part of Mycroft stripping the definite article? Or have I made a mistake somewhere?

Also, if this is the expected behaviour for some reason (developer questions come with if-else structures, it seems), how can I get the raw utterance as a possible workaround?


#2

utterance normalization removes the articles "the", "a" and "an" before the utterance is sent to adapt

inside an intent handler you can get the full utterance with message.data["utterance"]
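To make the effect concrete, here is a rough sketch of what article removal does to the example utterance. The function strip_articles and the ARTICLES set are hypothetical stand-ins written for this post, not mycroft-core's actual normalizer, which does more than this.

```python
# Hypothetical stand-in for the English normalization step. This is NOT
# mycroft-core's implementation, only an illustration of its effect.
ARTICLES = {"the", "a", "an"}


def strip_articles(utterance):
    """Drop English articles from a whitespace-tokenized utterance."""
    return " ".join(w for w in utterance.split() if w.lower() not in ARTICLES)


raw = "movie info about harry potter and the chamber of secrets"
print(strip_articles(raw))
```

Any regex that runs after this step never sees "the", which matches what arrives in the MovieName group.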


#3

Can this be turned off?

I have trouble getting the german phrase “schalte das licht an” (“turn the light on”) to work.

I observe a similar effect to @vonodna: my regex with this phrase works in regex101 but is not recognized by Mycroft/Adapt. I think my case is even worse, as the word "an" is a required keyword and therefore my intent is not triggered at all.

… browsing mycroft-core sources …

Hm, it looks like there is a normalize function for German (de-de) in mycroft/util/lang/parse_de.py, but there the word "an" is not removed…?

… doing some tests …
It looks like mycroft_cli_client tags the entered phrases with lang="en-us" although Mycroft is configured with lang="de-de". The wrong lang tag causes the intent parser to use parse_en.py for normalization. The same phrase spoken ("Hey mycroft, schalte das licht an") gets tagged with lang="de-de" and triggers the intent correctly.
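A small sketch of why the lang tag matters. The per-language word lists and the function normalize_sketch are assumptions made for illustration; the real normalizers in mycroft/util/lang/ do more, and the German entry is left empty here only because the point is that "an" is not in it.

```python
# Hypothetical per-language stop lists, for illustration only.
ARTICLES_BY_LANG = {
    "en-us": {"the", "a", "an"},
    "de-de": set(),  # placeholder: no words are stripped in this sketch
}


def normalize_sketch(utterance, lang):
    """Remove the words listed for `lang` from the utterance."""
    articles = ARTICLES_BY_LANG.get(lang, set())
    return " ".join(w for w in utterance.split() if w.lower() not in articles)


phrase = "schalte das licht an"
print(normalize_sketch(phrase, "en-us"))  # the required keyword "an" is gone
print(normalize_sketch(phrase, "de-de"))  # the phrase is left intact
```

With the English list applied, the German keyword "an" disappears before the intent parser sees the phrase, so an intent that requires it cannot trigger.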

EDIT: some days ago issue #1917 was opened on GitHub addressing the lang "en-us" tagging.


#4

Thanks, JarbasAI!
OK, I did it with message.data["utterance"], but now I have a regex in my code that parses the utterance and duplicates the content of my .rx file. This breaks language compatibility, because the regex is no longer part of the language-specific folders. What should I do? (I want to keep working with Adapt for the time being, not Padatious.)


#5

i really suggest you use padatious whenever regex or articles are needed instead of adapt

i agree that removing articles should be optional, but if it's just a config field then skills may behave badly depending on where they are running, since some may depend on the normalization

i will think about some mechanism to disable this per intent and make a PR…

if you really must use adapt, you can try to do something like

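    import re
    from mycroft.util.parse import normalize

    # inside the intent handler: strip every keyword that adapt matched
    leftover = normalize(message.data.get("utterance", ""), remove_articles=False)
    for token in message.data["__tags__"]:
        # escape the key so regex metacharacters in a keyword are taken literally
        leftover = re.sub(r"\b" + re.escape(token.get("key", "")) + r"\b", "", leftover)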

now leftover will have everything adapt did not consume, in your case the movie title

however, some keywords might not be replaced if there's an article in between their words. say you have a .voc with "tell time": because you did not remove articles, the utterance has "tell the time", so that keyword will not be removed like it should and the leftover will be incorrect
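Here is a self-contained sketch of both cases, with the Mycroft parts replaced by plain arguments so it runs on its own. The function name leftover_after_keywords and the keyword lists are assumptions; in a real handler the keys would come from message.data["__tags__"].

```python
import re


def leftover_after_keywords(utterance, matched_keys):
    """Remove each matched keyword (as whole words) and tidy the whitespace."""
    leftover = utterance
    for key in matched_keys:
        leftover = re.sub(r"\b" + re.escape(key) + r"\b", "", leftover)
    return " ".join(leftover.split())


# Works: the keyword appears verbatim in the un-normalized utterance.
title = leftover_after_keywords(
    "movie info about harry potter and the chamber of secrets",
    ["movie info about"],
)
print(title)

# Pitfall: adapt matched "tell time" on the normalized text, but the raw
# utterance says "tell the time", so nothing is removed.
broken = leftover_after_keywords("tell the time in london", ["tell time"])
print(broken)
```

In the second call the leftover still contains the whole utterance, which is exactly the failure mode described above.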

still want to use adapt? At this point if you do not want padatious, you are probably better off skipping the intent parsing altogether and just go with a fallback skill and parse the utterance there

you can try to match individual .voc files in your code

    def voc_match(self, utt, voc_filename, lang=None):
        """ Determine if the given utterance contains the vocabulary provided
        Checks for vocabulary match in the utterance instead of the other
        way around to allow the user to say things like "yes, please" and
        still match against "Yes.voc" containing only "yes". The method first
        checks in the current skill's .voc files and secondly the "res/text"
        folder of mycroft-core. The result is cached to avoid hitting the
        disk each time the method is called.
        Args:
            utt (str): Utterance to be tested
            voc_filename (str): Name of vocabulary file (e.g. 'yes' for
                                'res/text/en-us/yes.voc')
            lang (str): Language code, defaults to self.lang
        Returns:
            bool: True if the utterance has the given vocabulary it
        """
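The idea in that docstring (look for the vocabulary inside the utterance, not the other way around) can be sketched without Mycroft like this. voc_match_sketch is a hypothetical helper that takes the vocabulary entries as a list instead of reading a .voc file, and it is not the mycroft-core implementation.

```python
import re


def voc_match_sketch(utt, vocab_entries):
    """True if any vocabulary entry occurs as whole words inside utt."""
    return any(
        re.search(r"\b" + re.escape(entry) + r"\b", utt, re.IGNORECASE)
        for entry in vocab_entries
    )


print(voc_match_sketch("yes, please", ["yes"]))
print(voc_match_sketch("yesterday", ["yes"]))
```

The word-boundary anchors are there so that "yes" matches "yes, please" but not "yesterday".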

Padatious would have made your life easier…


#6

This seems like a terribly complicated workaround. I just wanted to make my first skill with adapt before moving to padatious, but oh well, I will try it. I read your guide https://jarbasal.github.io/posts/2017/10/skill_guidelines_1/ and I already know you are a padatious proponent. I also looked at https://mycroft.ai/documentation/padatious/ but I am not sure those will be enough for me. Where should I read more to start developing skills with padatious, and will it be possible to get help if I get stuck somewhere? Thanks again!


#7

that is a terrible workaround which i do not recommend indeed

if you are just getting started with intents i suggest this blog post i made, all about intents for a general idea of what is possible

you did hit an edge case where normalization conspires against you, most often you will be perfectly fine using adapt