Adapt Introduction

Greetings, everyone! @seanfitz here, introducing the Adapt Intent Parser.

While we lock down licensing/code organization (i.e. before the code is up for grabs), I thought I’d share an overview that I wrote for the Mycroft crew. I’m here to answer questions about capabilities, help with integration and SDKs, and continue development of Adapt. Here’s to the exciting path ahead!

Adapt Intent Determination
The Adapt Intent Parser is a flexible and extensible intent definition and determination framework. It is intended to parse natural language text into a structured intent that can then be invoked programmatically.

Intent Modelling
In this context, an Intent is an action the system should perform. In the context of Pandora, we’ll define two actions: List Stations and Select Station (aka start playback).

With the Adapt intent builder:

list_stations_intent = IntentBuilder('pandora:list_stations')\
        .require('Browse Music Command')\
        .build()

For the above, we are describing a “List Stations” intent, which has a single requirement of a “Browse Music Command” entity.

play_music_command = IntentBuilder('pandora:select_station')\
        .require('Listen Command')\
        .require('Pandora Station')\
        .optionally('Music Keyword')\
        .build()

For the above, we are describing a “Select Station” (aka start playback) intent, which requires a “Listen Command” entity, a “Pandora Station”, and optionally a “Music Keyword” entity.
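
To make the “structured intent” from the introduction concrete: for an utterance like “play Jazz Holiday radio”, the parse result is a plain dictionary keyed by entity name, roughly along these lines (the station name and confidence value here are made up for illustration):

parsed_intent = {
    'intent_type': 'pandora:select_station',
    'Listen Command': 'play',
    'Pandora Station': 'Jazz Holiday',
    'confidence': 0.9,
    'target': None
}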

Entities
Entities are named values. Examples include:

  • Blink 182 is an Artist
  • The Big Bang Theory is a Television Show
  • Play is a Listen Command
  • Song(s) is a Music Keyword

For my Pandora implementation, there is a static set of vocabulary for Browse Music Command, Listen Command, and Music Keyword (defined by me, a native English speaker and all-around good guy). Pandora Station entities are populated via a list-stations API call to Pandora. Here’s what the vocabulary registration looks like:

def register_vocab(entity_type, entity_value):
    # a tiny bit of code 

def register_pandora_vocab(emitter):
    for v in ["stations"]:
        register_vocab('Browse Music Command', v)

    for v in ["play", "listen", "hear"]:
        register_vocab('Listen Command', v)

    for v in ["music", "radio"]:
        register_vocab('Music Keyword', v)

    for v in ["Pandora"]:
        register_vocab('Plugin Name', v)

    station_name_regex = re.compile(r"(.*) Radio")
    p = get_pandora()
    for station in p.stations:
        m = station_name_regex.match(station.get('stationName'))
        if not m:
            continue
        for match in m.groups():
            register_vocab('Pandora Station', match)
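
The body of register_vocab is intentionally elided above. As a minimal sketch of what it could do in a standalone Adapt setup (the module-level engine is an assumption of this sketch; the Mycroft integration presumably routes this through the emitter/messagebus instead):

from adapt.engine import IntentDeterminationEngine

engine = IntentDeterminationEngine()  # assumed module-level engine for this sketch

def register_vocab(entity_type, entity_value):
    # add the value to the engine's entity index under the given type;
    # note that register_entity takes the value first, then the type
    engine.register_entity(entity_value, entity_type)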

So, if I understand this correctly… vocab registration is run at startup? If that’s the case, what happens if (in this example) Pandora adds a new station after Mycroft has already started?

Hey Sil! In this example, vocabulary registration is run at startup. It’s not clearly laid out here, but the entity index (a trie) is mutable at runtime, and additional vocabulary can be added.

This is a technique known as “Known-Entity Tagging,” and as you’ve probably deduced, requires prior knowledge of all entities that are to be recognized. Typically a large set of vocabulary is bootstrapped, and a variety of systems are built around the parser to keep it up to date. Potential solutions for the “new station” use case include:

  1. An Add Station Intent, as part of Mycroft, where the new station name is added to vocabulary at run-time.
  2. Polling Pandora’s APIs (periodically) to detect changes.
  3. Allowing a speaker to teach new vocabulary to Mycroft (“Mycroft, Third Eye Blind is a Pandora station”).
  4. An explicit “Update Pandora” spoken command, to trigger new vocabulary loading.

There are plenty of others, the ideal being a PubSub mechanism allowing you to detect when various datasets have changed. Depending on the domain/data provider, this may be an option.
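
As a rough sketch of option 1, leaning on that runtime mutability (the handler name and engine argument are assumptions here, not the actual Mycroft plumbing):

def handle_add_station(engine, station_name):
    # vocabulary registered here is picked up by the next
    # determine_intent() call; no restart is required
    engine.register_entity(station_name, 'Pandora Station')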

Oh… an update command. So sudo apt update && sudo apt dist-upgrade is so 14.04 LTS now.

Soon we’ll just have to say: “Mycroft, update $HOSTNAME” and tadaaaaaaaaa.

This update command could be very useful for updating Mycroft apps as well.

I have a few ideas for how we might handle the update process. I’ll share them once I gather my thoughts and get time to type them up so that they make sense.

Indeed, yes. Webhooks or pubsubhubbub or hanging on a websocket; anything rather than polling, because polling is the devil, although I appreciate that some services aren’t enlightened enough to know this yet :slight_smile:

This is also something that could usefully be centralised; obviously some central Mycroft service can’t usefully know what’s on my Kodi box, but it ought to be able to know the list of Pandora stations and push them to all Mycrofts that care, rather than having everyone poll. (Of course you could choose to poll if you prefer.)

Will the Adapt intent parser run on the device or in the cloud? My understanding before was that an STT engine in the cloud translates the audio into text and sends it back to the device. The device then starts to interpret the text.
Now I read this blog post, and it states:

The audio is then processed in the cloud to determine both the text content and the meaning behind the content

So the intent is also evaluated in the cloud?

@avanc Using the Adapt Intent Parser, we will be able to analyze intent on the device itself.

Hey @avanc! The end implementation hasn’t really been settled on yet. Adapt allows for intent determination on device (I run it on my raspi1 at home!), but there may be some interesting use-cases we can enable using cloud (or hybrid) approaches. We will be keeping the community posted (and looking for input) as we move forward.

Great to hear. The main reason for supporting Mycroft was to have a minimum of processing running in the cloud.

Since we have feature requests that will rely on a cloud infrastructure and some users who want nothing to do with the cloud, we will try to include documentation on how to avoid using our backend if you absolutely want to. However, some useful management features may not be available if you decide to completely cut our backend out of the mix.

Everything is going to be open sourced though, so anyone can take it and spin up their own backend services if they feel it is necessary.

@ryanleesipes Will the cloud infrastructure be based on Snappy too? Will you model it with Juju?

:smile: - we are very much trying to build the cloud infrastructure around these technologies; hopefully we’ll do a blog post soon to talk about it!

Oh great :slight_smile: Can’t wait to read it; I’m very interested in Juju. Will you build your infrastructure on OpenStack?

Has any thought been given to a sort of short-term memory for the subjects and objects of the prior command? One of the things I like about Google Now is the way it can disambiguate pronouns like ‘it’ based on your prior request. For example:

  • Me: What is the capital of Nebraska?
  • GN: The capital of Nebraska is Lincoln.
  • Me: How far away is it?
  • GN: From your current location it is xxx miles away.

A possible use case related to Mycroft might be something like this:

  • Me: Mycroft what is the upstairs thermostat set to?
  • Mycroft: The upstairs thermostat is set to 68 degrees.
  • Me: Mycroft please change it to 70.
  • Mycroft: Upstairs thermostat set to 70 degrees.

Functionality like this would make interaction with Mycroft more natural and conversational. I imagine the implementation of this kind of thing would need to be closely integrated with Adapt, as it would be central to intent determination. Some sort of short-term memory for the entities of the last command could then be searched for context if the current command had some ambiguity in it; e.g. if the current command has a ‘Location Entity’ requirement but it was not found in the spoken command, the short-term memory could be checked for a ‘Location Entity’ as context for the current command. This short-term memory would of course need to be time-bound, or it could produce some odd/undesired outcomes. Perhaps only the prior command from 30 or 60 seconds ago.
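
A minimal sketch of the kind of time-bound store described above (all names here are hypothetical; nothing like this ships in Adapt today):

import time

class ShortTermMemory:
    """Remembers entities from the previous command for a limited time."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.entities = {}  # entity type -> (value, timestamp)

    def remember(self, entity_type, value):
        self.entities[entity_type] = (value, time.time())

    def recall(self, entity_type):
        entry = self.entities.get(entity_type)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]
        return None  # never seen, or older than the time bound

A parser integration could then call recall('Location Entity') only when a required entity is missing from the current utterance, keeping the remembered context strictly as a fallback.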

Any thoughts?

This is not currently implemented in Adapt, but it is on the roadmap. There are scenarios in which context can be used to influence intent determination itself, as well as scenarios where context is simply used to fill in the gaps after intent determination. It will be a fun problem to work on again :smile:

How is Adapt better than/different from the probabilistic parser (https://github.com/wit-ai/duckling) open sourced by wit.ai?

From what I understand, Duckling is agnostic about the data you want to extract, so it should be possible to write rules for intent parsing on top of it.

From a functionality perspective, it looks like the two share a lot. Duckling’s out-of-the-box modules are pretty impressive. Aside from the obvious implementation details (Clojure vs. Python), I’d say Adapt is a less mature version of Duckling (it has been open source about a year longer than Adapt).

My goals for Adapt were a little different, which will likely influence its direction going forward. I was shooting for something extremely lightweight and simple. All of Adapt is ~1k lines of code, and it was built to have <10ms parse times on a Raspberry Pi. Given those constraints, I wouldn’t expect these libraries to ever have true parity.

What ships with Duckling (as opposed to possible extensions to it) appears to be focused on datetime and natural-language numeric parsing. These can mostly be modeled with CFGs, and I would consider them an alternative (and currently more accurate) implementation of the EntityTagger inside of Adapt.

TL;DR: They do some similar things, but they’re not the same. Thanks for pointing it out, though!

So glad to see other great open source projects getting released around this tech. As @seanfitz said, I don’t see why we can’t bring some of the things that Duckling does well into Adapt. I think the great thing is that with all these different projects out there, a rising tide lifts all ships!

I had a question: Adapt appears to be a BDI (Belief, Desire, Intention) type system, similar to Siri’s Spark BDI system, which, if I remember correctly, was built using Jython. Is this system based on that one? I was trying to integrate Spark into the system I was building, but was never able to find good documentation on how to code against the Spark BDI engine…
Are there any good example Adapt scripts that we can use as templates to modify, and good example code we can use?