The CommonPlay Skill Infrastructure


#1

Originally published at: http://mycroft.ai/blog/the-commonplay-skill-infrastructure/

We recently added a new piece of software architecture to Mycroft known as a CommonPlaySkill. This is the first of a series of “Common” infrastructure pieces which will make working with Mycroft much more natural and powerful.

What is a Skill?

First, a quick review: a Skill adds new abilities to your Mycroft. Think of it like the scene in The Matrix where Neo learns Jiu-Jitsu. Plug in a skill and suddenly Mycroft has new powers. Skills have two primary pieces: intents, which define the patterns of words to listen for, and handlers, which perform an action when the intent is heard.

For example, a simple skill can handle phrases like “tell me a joke”. The skill has an intent which spells out an interest in that phrase (along with related phrases like “I want to hear a joke”, etc). That intent is connected to a handler which looks up a random joke and has Mycroft read it to you. Hilarity ensues.
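The intent/handler pairing described above can be modelled in a few lines of plain Python. This is only a toy sketch of the idea, not the Mycroft API: `INTENTS`, `dispatch`, `handle_joke`, and the joke list are all hypothetical names invented for illustration.

```python
import random

# Toy model of a skill: NOT the Mycroft API, just the intent/handler idea.
JOKES = ["I would tell you a UDP joke, but you might not get it."]


def handle_joke(utterance):
    """Handler: runs when the joke intent matches and returns what to speak."""
    return random.choice(JOKES)


# Intent: the phrases this skill is interested in, mapped to its handler.
INTENTS = {
    ("tell me a joke", "i want to hear a joke"): handle_joke,
}


def dispatch(utterance):
    """Return the matching handler's response, or None if no intent matches."""
    text = utterance.lower().strip()
    for phrases, handler in INTENTS.items():
        if any(phrase in text for phrase in phrases):
            return handler(text)
    return None
```

Note that `dispatch` decides purely on the words in the utterance, which is exactly the limitation the next section discusses.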

Why do we need CommonPlay?

Clearly, the skill system is really powerful! But it has an inherent limitation – it decides the handler purely on the word patterns. While I can easily define a pattern that captures the phrase “play something”, without a deeper understanding of that something Mycroft would be unable to distinguish which player to use purely from the words.

Here are some example phrases that illustrate the challenges:

play Zork
This one is easy – there is a game called Zork, just play it.

play the News
This one is easy too – fire up NPR!

Play Huey Lewis and the News
Looking at this naively (as if I’ve never heard Huey Lewis), is Huey Lewis a reporter or a singer? Which skill should handle this?

play The Latest Single by The Hot New Band
Even if I understand this is a song request, it is impossible to tell from these words which music service has the legal contracts in place to be able to play the music.

play Ragtime
Is this a band? A music style? A movie? Yes to all of these. What should Mycroft do?

CommonPlay Approach

A single skill (skill-playback-control) currently captures all of the “play *” style utterances, like those listed above. This skill will now query all the CommonPlay skills and give them an opportunity to respond with:
  1. I can potentially handle that request
  2. This is how confident I feel in my handling
After the CommonPlay skills respond, there are a few ways to continue. If only one skill replies, it is the winner and will handle the request. When there are multiple respondents, the highest confidence wins. If there are several with about the same confidence, we can ask the user to pick the winner.
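The selection rules above can be sketched as follows. This is a minimal illustration, assuming each skill is represented by a function that returns a confidence between 0.0 and 1.0 or `None`; the names `pick_winner`, `ask_user`, and the `tie_margin` value are assumptions made for the example, not part of the framework.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    skill: str
    confidence: float  # 0.0 .. 1.0, higher means more sure


def pick_winner(phrase, skills, ask_user, tie_margin=0.05):
    """Query every skill and pick the one that should handle the phrase.

    skills:     dict mapping skill name -> function(phrase) returning a
                confidence, or None if the skill cannot handle the phrase
    ask_user:   function(list of skill names) -> chosen skill name
    tie_margin: hypothetical threshold for "about the same confidence"
    """
    replies = []
    for name, match in skills.items():
        confidence = match(phrase)
        if confidence is not None:
            replies.append(Reply(name, confidence))

    if not replies:
        return None  # nobody can handle the request

    # Highest confidence first; a single reply trivially wins.
    replies.sort(key=lambda r: r.confidence, reverse=True)
    best = replies[0]

    # Collect everyone within the margin of the best reply.
    close = [r for r in replies if best.confidence - r.confidence <= tie_margin]
    if len(close) > 1:
        return ask_user([r.skill for r in close])
    return best.skill
```

For example, with a news skill that answers 0.8 to anything containing "news" and a music skill that answers 0.9 to "huey lewis", the phrase "play huey lewis and the news" would go to the music skill.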

Gory Details

As they say, the devil is in the details. How do you catch the query? How do you format the response? What does “confidence” mean? We wrapped all of this up in a class called CommonPlaySkill, which itself derives from the familiar MycroftSkill. To participate in the CommonPlay system you only need to derive your skill from CommonPlaySkill and override a handful of methods. Here is all you need to connect a News skill to the CommonPlay system.
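def CPS_match_query_phrase(self, phrase):
    # Claim the request only if the phrase contains the "News" vocabulary
    if self.voc_match(phrase, "News"):
        return ("news", CPSMatchLevel.TITLE)
    return None  # not a news request, so stay out of the running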
And:
def CPS_start(self, phrase, data):
    # Begin the news stream
    self.CPS_play(self.url_rss)
That’s it. The first method answers the CommonPlay query, claiming any phrase that contains the word “news”. The framework then generates a standardized confidence level from the given CPSMatchLevel and from the number of words in the phrase that were covered by the “news” title match.
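To make the confidence idea concrete, here is a rough sketch of how a match level and word coverage could be combined into one number. The `MatchLevel` enum is a stand-in for CPSMatchLevel, and the base scores and the 0.05 bonus are illustrative assumptions, not the values the framework actually uses.

```python
from enum import Enum


class MatchLevel(Enum):
    """Stand-in for CPSMatchLevel; only three illustrative levels."""
    EXACT = 1
    TITLE = 2
    GENERIC = 3


# Assumed base scores per level (illustrative numbers only).
BASE_SCORE = {
    MatchLevel.EXACT: 1.0,
    MatchLevel.TITLE: 0.8,
    MatchLevel.GENERIC: 0.5,
}


def confidence(match, phrase, level):
    """Combine the match level with how much of the phrase was matched."""
    if level is MatchLevel.EXACT:
        return 1.0
    # Fraction of the phrase's words accounted for by the match, capped at 1.
    coverage = len(match.split()) / max(len(phrase.split()), 1)
    coverage = min(coverage, 1.0)
    # Small bonus so a match that covers more of the phrase ranks higher.
    return BASE_SCORE[level] + 0.05 * coverage
```

Under these assumptions, matching “news” against “the news” scores slightly higher than matching it against “huey lewis and the news”, because the match explains more of the shorter phrase.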

The second method is invoked by the framework if the query match is determined to be the best match.

You can see the entire News Skill on Github. It also has an intent which supports a few other non-“play” phrases such as “what is the news” and “tell me the latest news”. As you can see, it has all the capabilities of a regular skill in addition to being in the CommonPlay system.

I won’t bore you with lines of code here, but you can see more examples involving complex matches on the Pandora/Pianobar Skill and the Spotify Skill.

So Much in Common

This is the first of several “Common” skill frameworks I have planned. The CommonQASkill will allow Question and Answer skills to search their databases for answers and then present the best answer found. A good example of why this is needed is the question “How old is …”. From those words alone (not knowing the specific name) you can’t tell if the best answer would be in Wikipedia, IMDb, or Wookieepedia (a Star Wars knowledge base). The best answer might even come from a skill that tracks refrigerator contents – “How old is my milk?”. The CommonQASkill framework will allow each of these skills to look at the specific query and report back how confidently it can answer that question.

A CommonIoTSkill is also coming, making it easy to combine multiple types of Internet of Things systems. They can handle identical verbal requests such as “turn on the light” by looking at the context clues, such as the location of the Mycroft unit which heard the words.

Something for Everyone

Everyone is welcome to create a Common Skill. The framework will likely evolve, but by deriving from the CommonPlaySkill class, your skill will receive the benefits of this evolution. Play on!

#2

Is there more reading / examples / information on the CommonPlay skill infrastructure?
I am trying to wrap my adolescent Python brain (I am not adolescent, only my Python skills) around how the transaction works between the CommonPlay system and my skill. What happens to the original intent builder structure? Do I just remove the word “play” from my original intent, or does the whole utterance get passed on to any skill that is registered as a CommonPlay skill? Do I need to identify my skill utterance with something that uniquely identifies my skill, such as “kodi”? Does the CommonPlay skill use my intent to determine the confidence, or do I have to determine the confidence based on the utterance passed from CommonPlay? Sorry if I am confusing the issue, but I would like to understand this a bit more before I begin carving up my existing skill(s) to support this.


#3

hey there @pcwiii - @forslund is going to do a writeup / tutorial on this soon so that we have some more documentation available on CommonPlay.


#4

The WIP can be found here: https://github.com/MycroftAI/mycroft-skills/wiki/CommonPlaySkill

Comments, questions, and suggestions are welcome :)


#5

And this doco is now live at;


#6

I have read through the documents and still have a question. When reworking an existing skill to utilize the CommonPlay structure, is the original intent builder still applied to the phrase returned by CommonPlay?


#7

No, it is not. CPS_match_query_phrase receives the phrase and then has to handle all parsing of the string itself.
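As a rough illustration of what "handle all parsing" might look like, here is a standalone sketch that picks a backend keyword out of the phrase by hand. It is not taken from any real skill: the function name, the "kodi" pattern, and the plain "TITLE" string (standing in for a CPSMatchLevel value) are all assumptions for the example.

```python
import re

# Hypothetical wording that marks a request as meant for this skill.
KODI_PATTERN = re.compile(r"\b(?:on|with|using)\s+kodi\b", re.IGNORECASE)


def match_query_phrase(phrase):
    """Return (matched_title, level_name), or None if this skill should pass.

    Mimics the shape of a CPS_match_query_phrase result, with a plain
    string standing in for the match level.
    """
    found = KODI_PATTERN.search(phrase)
    if found is None:
        return None  # no mention of the backend, let other skills answer

    # Everything except the "on kodi" part is treated as the title.
    remainder = phrase[:found.start()] + " " + phrase[found.end():]
    title = " ".join(remainder.split())  # collapse leftover whitespace
    if not title:
        return None  # nothing left to play
    return (title, "TITLE")
```

In a real skill the same logic would live inside the overridden method and return an actual CPSMatchLevel member.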