
Multiple intents match an utterance


#1

I’m writing a skill for Yeelight smart lights. There are already skills for Hue lights, LIFX* and no doubt there will be many more. Most of them will recognise “Turn on the lounge light” and similar. If I have two or more light types, some will belong to one skill and others to different skills. But when I ask for a light to go on, I don’t want the utterance to match only one skill; I want it to try all possibilities until it reaches the one managing the right room. Otherwise it could fail in one skill (no room matching “lounge”, only “bedroom”) while another should succeed (the one with the lounge light). This is really a generic problem for multiple suppliers of common functionality.

The code in intent_service.py calls determine_intent() from adapt.engine’s IntentDeterminationEngine, which eventually calls _best_intent(), i.e. it only returns one match, the one with the highest confidence.

A non-solution is to ensure each skill has disjoint vocabulary - then as a user I would have to remember which room has which type of light and its associated vocabulary while the developers would have to ensure disjointness, which is probably impossible.

Alternatively, could _best_intent() act as a generator, returning successive “best” matches each time (with confidence greater than some threshold)? An intent should also be able to signal in code that it has failed, so that another intent can be called. There’s a heap of changes in doing that though - is there a better way?

*The LIFX code avoids “false triggers” by only matching to lights it has registered during initialise(). Does this update regularly, or does adding a new light need a restart of Mycroft? Otherwise, is this good practice or something that should be enforced during the acceptance testing by Mycroft?
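The generator idea above can be sketched standalone. The match tuples, skill names, and threshold below are invented for illustration; Adapt’s real _best_intent() operates on parse results, not plain tuples.

```python
def best_intents(matches, threshold=0.5):
    """Yield (confidence, skill) pairs, best first, above the threshold."""
    for confidence, skill in sorted(matches, reverse=True):
        if confidence < threshold:
            return
        yield confidence, skill

# Hypothetical matches for "turn on the lounge light":
matches = [(0.3, "weather"), (0.9, "hue-lights"), (0.85, "yeelight")]
results = list(best_intents(matches))
# The caller could try each skill in turn until one reports success.
```

The caller would stop consuming the generator as soon as a skill succeeds, so the sort cost is only paid once per utterance.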


#2

What you’re discussing is why things like homeassistant are used. There’s a skill looking for a maintainer/helper for that now, in fact: https://github.com/MycroftAI/mycroft-homeassistant


#3

What you are describing is more or less what happens for “play X” intents; there is work in progress for other queries.

PR for Common query framework

A system similar to the common play framework for negotiating the best answer to a question. Currently three levels of CQSMatchLevel exist:

EXACT: The query could be identified exactly and a response is returned.
Example: the cocktail skill could find a cocktail in the query that exists in its database.

CATEGORY: A category of questions the skill handles could be identified.
Example: the wiki-data skill can identify that the question is regarding a date of birth and finds an answer.

GENERAL: A general question and answer service could parse the question.
Example: the Wolfram Alpha skill got a match for “How tall is Abraham Lincoln”.

The skill can also return a CQSVisualMatchLevel indicating that it has visual media to go with the answer; on visual devices such as the Mark II, a bonus is given to these matches.

The query skill will send the response for the best match to the TTS, but will also invoke CQS_action(), where additional handling may occur, such as showing visual info or, in the case of Wolfram Alpha, preparing to send the sources via e-mail.
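As a standalone model of this negotiation (the enum mirrors the three levels described above; the skill names and answers are invented, and the real framework returns richer tuples than this):

```python
from enum import IntEnum

class CQSMatchLevel(IntEnum):
    GENERAL = 1   # a general Q&A service could parse the question
    CATEGORY = 2  # the skill handles this category of question
    EXACT = 3     # the query was identified exactly

def pick_best(responses):
    """Return (skill, answer) for the response with the highest match level."""
    skill, level, answer = max(responses, key=lambda r: r[1])
    return skill, answer

# Two skills bid on "When was Abraham Lincoln born?":
responses = [
    ("wolfram-alpha", CQSMatchLevel.GENERAL, "February 12, 1809"),
    ("wiki-data", CQSMatchLevel.CATEGORY, "February 12, 1809"),
]
best = pick_best(responses)
```

Here the CATEGORY bid beats the GENERAL one, so wiki-data’s answer would go to the TTS.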

How to test

Install https://github.com/mycroftai/skill-query and switch the branch on wolfram and duck-duck-go to feature/common-qa.

Make sure the questions still work.


#4

@JarbasAl do I get it right that you are suggesting a “Common Home Automation framework” that handles requests/intents like:

  • switch/toggle/dim light
  • turn device on/off
  • set/change temperature
  • shutter control
  • etc.

#5

I’m not suggesting it, I’m pointing out that it exists and is official :slight_smile:

Basically, all skills get a shot at answering a question and return a confidence for the answer.

This way all skills that handle “something” can return a confidence and the best will be chosen.

This won’t work with existing skills; they need to be written with this in mind.

Docs for common play, which works the same way, are here.

AFAIK this is very beta and there are no docs yet for the common query framework except what I posted.

The skill that handles disambiguation needs to be manually installed; the relevant code can be found here.


#6

@jannewmarch,
I already created a working yeelight skill here. Feel free to use what you may.


#7

Thanks, I’d missed that. Maybe I should use Google more :-(. I see you have hard-coded IP addresses. I’m using yeelight.discover_bulbs, and have also got testing working. I’ll put yours and mine together - you are ahead of me on flows!


#8

The two approaches suggested (homeassistant and CQS/CPS) seem to be going down the route of a “meta skill” - i.e. a skill that manages other skills in its own way. The HomeAssistant skill presumably works with any skill that has been built under the HomeAssistant framework (I may be wrong, I haven’t checked it in detail yet), while “Play…” utterances are handled by those skills written to work under the PlaybackControlSkill.

That’s definitely a viable route to go. However, there can be an indefinite number of such meta skills, depending on the framework you are using (HomeAssistant, OpenHAB, SmartThings, …). Creating a skill under such a framework means that it can be used once a meta skill is made for your digital assistant (Mycroft, Alexa, Google Assistant, …). That’s a good thing if you run an IoT framework, not helpful if you don’t.

PlayControlSkill is specific to Mycroft, and there could be many such: FastForwardControlSkill, IncreaseControlSkill, … . PlayControlSkill only handles the keyword “Play” - what about multiple keywords such as “Increase”, “Random” which might be relevant to a Play skill?

Sorry, I couldn’t understand what CQS does. Does it abstract these specific meta-skills?

One thing I like about PlayControlSkill is the separation of skills it manages into essentially

can this skill run?
run this skill

Mycroft currently does the first syntactically but not semantically, and then does the second, “run this skill.” The weather service shows the issues with this. If you ask “What is the weather in Uranus” you get a stupid answer of “6 degrees” (or similar) instead of “-357 degrees”. It would be better for the weather intent to fail or say “No I can’t run on the place Uranus”. I would say it would be better to fail meaning “can’t run” so that a (fictitious) astro-weather skill can run. Other (silly) questions like “What is the weather in roast chicken” should fail with no matching skill.

I’m suggesting that maybe Mycroft itself should handle failure rather than relying on a meta skill. Otherwise we would be looking at lots of meta skills for different cases. I’ve made an initial prototype, but not complete yet. A key part is that it requires NO changes to existing skills. Only those skills that wish to signal semantic failure need changes by adding “return False” when an intent fails.


#9

Please ignore this if I’m being naive (I’m very new to this), but why not make Mycroft ask you when he has two or more possibilities? He can then store your answer for the next time if applicable, e.g. “turn on the bedroom light” - “do you mean the A light or the B light?”. This can obviously be expanded so as to add labels on the fly, e.g. “turn on the boys’ bedroom light” - “do you mean…” etc., and then a light type will be associated with a place.


#10

From the community we have 2 main meta/bridge skills, Home Assistant and Node-RED; these interact with other software and fit your description above.

As part of mycroft-core we have 2 more “meta” skills, but these are an integral part of the workflow, so I wouldn’t really call them meta: the common play and the common query. These provide tools for other skills to behave correctly.

The “common X frameworks” introduce a slightly different workflow. Most skills run with intents; they basically announce “if the user says this, trigger me”. But if a skill registers itself with common play/query instead of using intents, then it must parse the utterance and report back how well it can solve it.

Common play will disambiguate play commands. This skill reacts to a play order and asks other skills “hey, can you play this?”. The other skills will answer with a confidence - “that exactly matches a song and artist I have”, “that matches a genre I have”, “that matches an artist I have” - then common play will trigger the skill that matched best.
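That negotiation can be modelled standalone. The skill names, matcher rules, and confidence values below are all invented; the real common play framework passes parsed phrases and match-level enums rather than bare floats.

```python
def negotiate_play(phrase, skills):
    """Ask every registered skill for a confidence; return the best bid."""
    bids = [(name, matcher(phrase)) for name, matcher in skills.items()]
    return max(bids, key=lambda bid: bid[1])

# Toy matchers: each skill scores how well it could handle the phrase.
skills = {
    "spotify": lambda p: 0.9 if "artist" in p else 0.4,  # song/artist match
    "news": lambda p: 0.8 if "news" in p else 0.1,       # genre-style match
}
winner, confidence = negotiate_play("play my favourite artist", skills)
```

Only the winning skill is then asked to actually start playback; the losers never run their handlers.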

Common query behaves the same, but for general questions: “can you answer this question?”. Other skills will also return a confidence - “I can answer birthdays”, “I can answer how-to questions”, “I am Wolfram Alpha and can answer anything”.

In case of equal confidences there is a TODO to ask the user what they meant.

You got it a bit wrong with “can this skill run?”: skills that maybe can run will register themselves, so the question is actually “how well can you solve this thing you said you usually can?”

Regarding audio playback, the playback control skill will handle next/previous/stop/resume; play is the only intent that really needs to ask individual skills which one can do it. It is a common and important enough thing that it deserved a framework independent from general questions. Skills only need to start the actual playback; audio is controlled independently.

The common query is more general purpose; it can handle mostly any case, so there is no need for lots of meta skills. It wouldn’t make sense to have a meta skill for controlling lights - just use the common query and report back “that is a question about lights, I can control lights, but I don’t know that light name”.

The problem with this is that an old or “greedy” skill may decide not to use the framework and, if badly designed, be given precedence, e.g. “trigger me whenever the word light is present!”

Skills should be designed to require specific keywords if they would collide with one of these frameworks (needing next/previous in a skill that browses a database, for example); otherwise it’s bad skill design.

“next” -> audio service will play next song
“next book” -> book library will trigger instead because of book keyword

The other thing to keep in mind is context: if we just triggered a book skill, the book keyword can be injected even if the user didn’t say it.
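A minimal sketch of that keyword injection, using a toy context store rather than Mycroft’s real ContextManager (the routing rule and skill names are invented):

```python
class ContextManager:
    """Toy context store: injected keywords persist across utterances."""

    def __init__(self):
        self.context = set()

    def inject(self, keyword):
        self.context.add(keyword)

    def route(self, utterance):
        # Treat injected keywords as if the user had said them.
        words = set(utterance.split()) | self.context
        if "book" in words:
            return "book-library"   # book keyword present (said or injected)
        return "audio-service"      # default: audio service plays next song

fresh = ContextManager().route("next")   # no context -> audio service
ctx = ContextManager()
first = ctx.route("next book")           # explicit keyword -> book skill
ctx.inject("book")                       # book skill was just triggered
second = ctx.route("next")               # bare "next" still reaches book skill
```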

I might write a blog post about these frameworks; for now you might find “all about intents” interesting.


#11

Another point that I wanted to make here as well @jannewmarch is that if an Intent fails, there are several Fallback Skills available that try and “catch” the Intent.

Fallback Skills have an order of precedence and this determines which one is triggered.

Ideally over time we could train a Fallback Skill to “know” what was meant by an Intent using machine learning techniques.
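The precedence mechanism can be modelled in a few lines. This is a standalone sketch, not Mycroft’s FallbackSkill API: the priority convention (lower number tried first) matches the description above, but the registration function, handler names, and return convention are invented.

```python
fallbacks = []

def register_fallback(priority, handler):
    """Register a fallback; lower priority numbers are tried first."""
    fallbacks.append((priority, handler))
    fallbacks.sort(key=lambda entry: entry[0])

def handle_fallback(utterance):
    """Try fallbacks in precedence order until one claims the utterance."""
    for _, handler in fallbacks:
        if handler(utterance):
            return handler.__name__
    return "unknown-intent"

def weather_fallback(utterance):
    return "weather" in utterance   # specific catcher

def catch_all(utterance):
    return False                    # declines everything in this toy

register_fallback(90, catch_all)         # tried last
register_fallback(10, weather_fallback)  # tried first

result = handle_fallback("weather on Uranus")
missed = handle_fallback("gibberish")
```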


#12

One thing that would be useful- again at a “meta” level- would be something like the “explain” facility that some SQL dialects have, i.e. “Hey Mycroft, explain turn on the lights” which would get it to say which skill thinks it can handle the job and (possibly) why.


#13

@Dominik I’m not suggesting a “Common Home Automation framework”, but Steve Penrod talks about CommonXYZ frameworks and says a CommonIoTSkill is on its way.


#14

@robgriff444 I think you will find that some of this can probably be done using the CommonQuerySkill that @JarbasAl talks about. A skill would need an intent to handle queries such as “What lights exist?” which could be asked of all the skills supporting (in this case) lights. The current light skills don’t have such an intent though.

A question: can a skill contain two classes, one subclassing from MycroftSkill like most skills do to handle most utterances, and another subclassing from QuestionsAnswersSkill specifically to work with the Common Query framework? Then they could share code. But create_skill() only returns one skill. A small code change to handle a list could work. I suppose two skills, with the second dependent on the first could also do it.
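The “return a list” idea could be sketched like this. Everything here is hypothetical: the base classes are trivial stand-ins for MycroftSkill and the Common Query base class, and the loader change is exactly the speculative tweak described above, not current mycroft-core behaviour.

```python
class MycroftSkill:        # stand-in for mycroft.MycroftSkill
    pass

class CommonQuerySkill:    # stand-in for the Common Query base class
    pass

class YeelightSkill(MycroftSkill):
    """Handles ordinary intents; shared logic could live in a common module."""

class YeelightQuerySkill(CommonQuerySkill):
    """Handles 'what lights exist?'-style queries via Common Query."""

def create_skill():
    # Speculative change: return a list of skills instead of one.
    return [YeelightSkill(), YeelightQuerySkill()]

def load_skills(factory):
    """Hypothetical loader tweak: accept a single skill or a list of skills."""
    result = factory()
    return result if isinstance(result, list) else [result]

loaded = load_skills(create_skill)
```

The alternative mentioned - two separate skills with one depending on the other - needs no core change at all, at the cost of duplicated setup.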


#15

Thanks @JarbasAl, useful comments. (I like your tutorial, wish I had found it earlier!)

from the community we have 2 main meta/bridge skills, home assistant and node red, these will interact with other software and fit your description above

The advantage of these frameworks is that once there is a skill for them, then anything running under that framework is available to Mycroft. I gather from the recent CES that vendors are now starting to write to some of the frameworks, so the number of services available to Mycroft (and Alexa and Google Assistant…) will steadily increase. But using a framework won’t be everyone’s choice.

regarding audio playback the playback control skill will handle next/previous/stop/resume, play is the only intent that really needs to ask individual skills which one can do it

Yes, my mistake - I hadn’t thought it through.

you got it a bit wrong with the “can this skill run?” , skills that maybe can run will register themselves, so the question is actually “how well can you solve this thing you said you usually can?”

Yes, I stand corrected, thanks. Can I rephrase this as “given the syntax matches (to high confidence), will the semantics match (to what confidence)?” The vocabulary should have been chosen so that the confidence level is high, but not a guarantee.

What the Common X Skills seems to be adding is a semantic layer to the syntactic pattern matching of Adapt. This is done without altering the Mycroft core code at all. What I was experimenting with was changing the Mycroft core code to add in semantic tests. Obviously that has to be done so that it doesn’t break any existing systems nor cause any time blowouts in execution.

I guess my question would be: is that worthwhile exploring, or is the current core considered off-limits? Or maybe it has been tried already?

The current algorithm in Mycroft is basically

    best intent = None
    for each matching intent
        update best intent
    if best intent != None
        execute intent
        return
    execute fallback skills

I’ve prototyped changing this to

    matching intent list = []
    for each matching intent (> some confidence)
        add to matching intent list
    sort matching intent list
    for each intent in matching intent list
        execute intent
        if execution failed
            continue with next intent
        else
            exit
    execute fallback skills
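The prototyped loop above can be written as a runnable sketch. The handler table, confidences, and failure convention (a handler returning False fails; None, the default for unchanged skills, counts as success) are illustrative, matching the proposal rather than current mycroft-core code.

```python
def handle_utterance(matches, handlers, threshold=0.5, fallback=None):
    """Try each matching intent, best first, until one succeeds."""
    candidates = sorted((m for m in matches if m[0] > threshold), reverse=True)
    for confidence, name in candidates:
        if handlers[name]() is not False:  # None (existing skills) = success
            return name
    return fallback() if fallback else None

handlers = {
    "hue": lambda: False,      # semantic failure: no room called "lounge"
    "yeelight": lambda: None,  # succeeds; existing skills return None
}
matches = [(0.8, "yeelight"), (0.9, "hue"), (0.2, "weather")]
chosen = handle_utterance(matches, handlers)
```

Here "hue" matches best syntactically but fails semantically, so the loop falls through to "yeelight" - exactly the lounge/bedroom scenario from post #1.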

Success or failure can be tested by checking whether a method returns None or something else - current intents return no value (i.e. None), so no change is needed to signal success. (I see that the fallback skills return True or False in a similar way.) So current systems will not be broken.

The changes to the Mycroft code are not extensive: return an ordered list from determine_intent() instead of a singleton, and put success or failure on the bus from wrapper() in core.py. This is caught in handle_utterance(), which will get another intent from determine_intent() if needed. The time cost is sorting and an extra bus message. Sorting is pretty efficient in Python anyway, but if needed a lazy sort could be used, as described in Lazy sorting in Python. Current skills all succeed, so only the best intent will be tried - an O(1) cost in trying intents. This cost only increases if skills that fail are tried.

Following the ideas of the Common X systems, maybe adding a semantic assessment to the core would be better:

    matching intent list = []
    for each matching intent
        add to matching intent list
    can succeed list = []
    for each intent in matching intent list
        if intent can succeed (above some threshold)
            add to can succeed list
    for each intent in can succeed list (maybe sorted?)
        execute intent
        if execution failed
            continue with next intent
        else
            exit
    execute fallback skills
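A runnable sketch of that combined scheme, under the assumptions laid out below: intents expose a can-succeed confidence defaulting to 1.0 (so unchanged skills pass the filter automatically), and execution failure is signalled by returning False. The intent records and threshold are invented for illustration.

```python
def dispatch(intents, threshold=0.5):
    """Filter intents by can-succeed confidence, then try them in order."""
    scored = [(i.get("can_succeed", 1.0), i["name"], i) for i in intents]
    candidates = sorted((s for s in scored if s[0] >= threshold),
                        key=lambda s: s[0], reverse=True)
    for _, name, intent in candidates:
        # Default execute returns None, i.e. success, like existing skills.
        if intent.get("execute", lambda: None)() is not False:
            return name
    return "fallback skills"

intents = [
    {"name": "weather", "can_succeed": 0.1},  # e.g. "weather in roast chicken"
    {"name": "lights", "can_succeed": 0.9},
    {"name": "legacy"},                       # no can_succeed -> defaults to 1.0
]
chosen = dispatch(intents)
```

Note that the unmodified "legacy" intent wins here precisely because of the 1.0 default - which is the backwards-compatibility property the proposal relies on.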

I’m assuming “can succeed” returns a confidence level of 0.0 - 1.0, like CommonQuery. This isn’t as fine-grained as CommonPlay, which can still be used; none of the existing skills will be affected.

If MycroftSkill gains a can_succeed() method returning 1.0, then only classes overriding it can signal a possible inability to handle an utterance. So existing intents work exactly as before with no changes. The extra time cost for existing skills is the round trip to evaluate the default value of 1.0, and an intent with that value will of course be executed (and should succeed). This is in addition to the sorting and extra bus traffic of the previous code.

So, is it worth experimenting or should I give up on this?


#16

I will comment later, but that’s certainly worth exploring! Count me in for testing.

also related