Mycroft Skills etiquette: How should they interact?

Concerning “undo”, I think actions that do any of the following should be undoable:

  • result in immediate or future payments (to protect from financial loss)
  • result in configuration changes (to avoid misconfigurations that drive users crazy, e.g., changing the language to one you do not speak → requires a universal keyword for undo)
  • trigger control events for Internet of Things devices (to mitigate damage or other harmful/unwanted results)

These would be my top priorities. The second one applies to all skills, while the first and third concern specific skill types only (but those are the really useful ones, I guess).
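
A minimal sketch of how such an undo log might look. Everything here is hypothetical (the `Action` and `UndoLog` names, the category strings, the settings dictionary); it is not Mycroft's API, just one way to keep reversible actions around so a universal “undo” keyword can reverse the latest one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical categories taken from the list above; not part of any real API.
UNDOABLE_KINDS = {"payment", "configuration", "iot_control"}


@dataclass
class Action:
    description: str
    kind: str                                   # e.g. "payment", "configuration", "query"
    undo: Optional[Callable[[], None]] = None   # how to reverse it, if that is possible


@dataclass
class UndoLog:
    history: List[Action] = field(default_factory=list)

    def record(self, action: Action) -> None:
        # Keep only actions in an undoable category that know how to reverse themselves.
        if action.kind in UNDOABLE_KINDS and action.undo is not None:
            self.history.append(action)

    def undo_last(self) -> str:
        # Meant to be triggered by a universal keyword, whatever the current language.
        if not self.history:
            return "Nothing to undo."
        action = self.history.pop()
        action.undo()
        return f"Undid: {action.description}"


settings = {"lang": "en-us"}
log = UndoLog()


def set_language(new_lang: str) -> None:
    old = settings["lang"]
    settings["lang"] = new_lang
    log.record(Action(f"language change to {new_lang}", "configuration",
                      undo=lambda: settings.update(lang=old)))


set_language("fi-fi")
message = log.undo_last()
print(message, "| language is now", settings["lang"])
```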

It may be helpful to see this as another instance of skill-based intent recognition, giving skills a chance to handle things first.

For example, the timer skill may be able to handle “stop timer {name}” to stop a timer by name. It may also handle just “stop”, but this should trigger a fallback if no timer exists. Similarly, the music skill could handle “stop music” or just “stop” (trigger fallback if no song is playing).

If the skills “fall back” in LIFO (last in, first out) order of use, this produces the expected behavior.
For example, with a song playing and a timer that was set afterwards now beeping:

  1. “stop” will disengage the current timer
  2. “stop” again will pause the music

The benefit of this approach is you could still say “stop music” to leave the timer alone. Additionally, Mycroft itself may have a meta skill that can handle phrases like “stop everything”.
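
The LIFO fallback described above could be sketched roughly as follows. The `StopDispatcher` class and the handler functions are made-up names for illustration, not Mycroft's real skill API; each handler returns a confirmation if it stopped something, or `None` to let the next most recently used skill try.

```python
from typing import Callable, List, Optional, Tuple

# A handler returns a spoken confirmation, or None to fall back to the next skill.
StopHandler = Callable[[], Optional[str]]


class StopDispatcher:
    def __init__(self) -> None:
        self._stack: List[Tuple[str, StopHandler]] = []  # most recently used last

    def mark_used(self, skill: str, handler: StopHandler) -> None:
        # Move the skill to the top of the stack whenever it is used.
        self._stack = [(s, h) for s, h in self._stack if s != skill]
        self._stack.append((skill, handler))

    def stop(self, target: Optional[str] = None) -> str:
        # "stop music" targets one skill; a bare "stop" walks the stack in LIFO order.
        for skill, handler in reversed(self._stack):
            if target is not None and skill != target:
                continue
            result = handler()
            if result is not None:
                return result
        return "Nothing to stop."

    def stop_everything(self) -> List[str]:
        # The "meta skill" case: give every skill a chance to stop.
        results = [handler() for _, handler in reversed(self._stack)]
        return [r for r in results if r is not None]


state = {"music_playing": True, "timer_beeping": True}


def stop_music() -> Optional[str]:
    if not state["music_playing"]:
        return None
    state["music_playing"] = False
    return "Music paused."


def stop_timer() -> Optional[str]:
    if not state["timer_beeping"]:
        return None
    state["timer_beeping"] = False
    return "Timer stopped."


dispatcher = StopDispatcher()
dispatcher.mark_used("music", stop_music)   # the song starts first
dispatcher.mark_used("timer", stop_timer)   # the timer is set afterwards

first = dispatcher.stop()    # the timer handles it
second = dispatcher.stop()   # the timer falls back, so the music handles it
third = dispatcher.stop()    # nothing left
print(first, second, third)
```

Saying “stop music” maps to `dispatcher.stop("music")`, which skips the timer entirely.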

The commentary around the word “stop” actually reminds me of a surprising interaction I had with a proprietary home-assistant device while at a friend’s place.

My friend has much of his A/V gear hooked up to his HA device, so we were able to tell it to turn on the projector and start streaming a movie to it. At some point, we put some pizzas in the oven and asked the HA device to set a timer to tell us when they were cooked, and resumed watching the movie.

At some point during the movie something interesting happened, so we wanted to pause it to discuss what we had just seen, but our attempt to pause the movie didn’t work. Puzzled, we tried again; it worked the second time.

After several minutes discussing whatever plot point had taken our interest, we resumed the movie. Some time later, we noticed a burning smell; we jumped up and pulled the pizzas out just in time! They were extra toasted around the edge, but still edible.

I’m sure you can tell what happened: our first attempt to pause the movie actually paused our cooking timer!

In this case, I don’t think his HA device had a screen, so it could be difficult for the device to choose what we wanted to pause. But even a display might not help; the HA device may not be able to determine what it is we are focused on and therefore what should be paused.

Perhaps an interaction like this would have helped:

  • Me: “pause”
  • HA: “the movie, or the timer?”

I have no idea what you would do if there were multiple timers, either with or without a movie or background music… A conundrum to be sure!

PS: if the HA device did choose one thing when there were multiple candidates, perhaps some unambiguous feedback would help; for example, the situation above would have been easy for us to resolve if my friend’s HA device told us “I’ve paused your timer”. I don’t recall there being any feedback at all, or if there was it was drowned out by the movie…
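
To make the two ideas concrete (ask when there are several candidates, and always say what was paused), here is a small sketch. The function name and the phrasing are invented for illustration; a real assistant would also have to perform the action rather than only produce the reply.

```python
from typing import List


def handle_pause(active: List[str]) -> str:
    """Return what the assistant might say in response to a bare "pause".

    `active` lists the pausable things currently running,
    e.g. ["movie", "pizza timer"].
    """
    if not active:
        return "There is nothing to pause."
    if len(active) == 1:
        # Unambiguous: act, but still give feedback about what was paused.
        return f"I've paused the {active[0]}."
    if len(active) == 2:
        return f"The {active[0]}, or the {active[1]}?"
    # Three or more candidates (several timers, say): list them all.
    options = ", ".join(f"the {name}" for name in active[:-1])
    return f"Which one: {options}, or the {active[-1]}?"


print(handle_pause(["movie", "timer"]))
print(handle_pause(["timer"]))
print(handle_pause(["movie", "pizza timer", "egg timer"]))
```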

I don’t do CrossFit anymore, but when I did the following would have been great.
For example, have Mycroft start a workout music playlist, then start a timer for 3 minutes of, say, jumping rope, with a 5-second countdown and an alarm at the end of the 3 minutes; then 1 minute of rest with a 5-second countdown and an alarm at the end; then 2 minutes of burpees with a countdown and an alarm; and repeat all of this five times in total.
“Mycroft start a workout with music, exercise announcements, countdown timers and alarms.”
“Ok. What music do you want?”
“World’s best workout playlist”
“Ok. What is your first exercise?”
“Jumping rope”
“Ok. How long?”
“3 minutes with 5 second count down at end”
“Got it. Next exercise (and time with 5 sec count down)?”
“Rest (for 1 minute with 5 sec count down)”
“Roger that. Next exercise (and time)?”
“Burpees for 2 minutes with 5 sec countdown”
“Ok. Next exercise?”
“Repeat that 5 times and stop all but let the music play”

That would have been really nice to have.
If Mycroft had also called out split times every minute (“2 minutes left”) and every 30 seconds during the last minute, that would have been even better.
If Mycroft could be connected to a heart rate monitor, it could call out or display your heart rate occasionally.
If you count your reps out loud, Mycroft could record your count and at the end tell you how many you did, where your heart rate was, and what your heart rate did while you recovered after the workout.
Or maybe there is an app I don’t know about that does all this, and you just say “Mycroft, run world’s best workout app”
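
A rough sketch of how a workout like the one above could be stored and turned into spoken cues. The `Interval` class, the cue wording, and the split-time rule (every whole minute remaining, plus a 30-second call) are assumptions for illustration; the durations are the ones from the description (3 minutes, 1 minute, 2 minutes, five rounds).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interval:
    name: str
    seconds: int
    countdown: int = 5   # length of the spoken countdown at the end, in seconds


def cues(interval: Interval) -> List[Tuple[int, str]]:
    """Return (seconds_elapsed, phrase) pairs for a single interval."""
    total = interval.seconds
    out = [(0, f"Start {interval.name}")]
    # Split times: one call per whole minute remaining (not at the very start).
    remaining = ((total - 1) // 60) * 60
    while remaining >= 60:
        minutes = remaining // 60
        unit = "minute" if minutes == 1 else "minutes"
        out.append((total - remaining, f"{minutes} {unit} left"))
        remaining -= 60
    # One extra call halfway through the last minute.
    if total > 30:
        out.append((total - 30, "30 seconds left"))
    if 0 < interval.countdown < total:
        out.append((total - interval.countdown, f"Countdown from {interval.countdown}"))
    out.append((total, "Alarm"))
    return out


def schedule(intervals: List[Interval], rounds: int) -> List[Tuple[int, str]]:
    """Flatten all rounds into one timeline with absolute offsets in seconds."""
    timeline = []
    offset = 0
    for round_no in range(1, rounds + 1):
        for interval in intervals:
            for at, phrase in cues(interval):
                timeline.append((offset + at, f"Round {round_no}: {phrase}"))
            offset += interval.seconds
    return timeline


workout = [Interval("jumping rope", 180), Interval("rest", 60), Interval("burpees", 120)]
timeline = schedule(workout, rounds=5)
for at, phrase in timeline[:6]:
    print(f"{at:>4}s  {phrase}")
```

Because the workout is plain data, naming, saving, and sharing it would amount to serializing the `workout` list.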

If you like your workout, you can name it and save it and not have to recreate it. Share workouts with friends. Work out together at the same time (with an option for separate playlists) or asynchronously.
Mycroft could also save the results of workouts if you count reps out loud: first round, 30 pushups in 1 minute, heart rate X to X during the 1-minute rest, 20 burpees in 1 minute, heart rate …; second round, pushups, heart rate, burpees, heart rate; third round, etc. Look for progress over time, compete with friends.

Could Mycroft sync to multiple headphones and play different playlists on the different headphones so people can have their own tunes?

I did not expect the year when asking for the day.

I expected the music to just play, not to receive a confirmation.

When the timer was being set, I did not expect the music to disappear; I expected it to continue at a lowered volume, like when I’m listening to music on my phone and the sound is just lowered to signal a notification.

For the stop command, I expected a question: “Stop what?” or “The music or the timer?”

Thanks for the extra feedback all - it’s all very helpful in determining the behaviour we should expect. This will be used to inform the design of the technical processes sitting behind the scenes.

I’ve just posted another video and would love your feedback on that one too.

Thanks!

Hey all, sorry for the slow response here, I’ve been out of office for a while. These are all great suggestions that we will consider in the Skills Interaction Sprint.

In the disambiguation example that @plmorel described, we are considering making some assumptions when you say “stop.” For example, if a timer or alarm is in an expired state, we think it’s safe to assume the user wants to stop the beeping timer or alarm and not the music. The beeping timer or alarm at this point would be in the perceived foreground.

@Msquared also mentioned a problem that would have benefited from disambiguation: the movie vs timer scenario. It probably would have been OK if the Assistant had ducked the audio and responded with “I’ve paused the Pizza timer.” The Assistant is making an assumption, and in this case the wrong one, but at least you know what happened. I think giving feedback when things are ambiguous gives the assistant some more agency to “guess.”

I’m not saying these two solutions are THE solution, but we do want to minimize disambiguation as much as possible if we are HIGHLY confident we have the right answer.

Also everyone should keep in mind that a lot of the solutions we will be working on in the Skills Interaction Sprint will be the mechanisms to allow these interactions to happen. If we do it correctly we can change priority, or add disambiguation, etc… to react to user feedback. Right now we need the system to be aware of these types of clashes.
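
As an illustration only, the “act when highly confident, otherwise ask” mechanism with adjustable priorities might look like the sketch below. The priority table, the margin value, and all names are assumptions made up for this example and do not describe the actual Sprint design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical, tunable weights: higher means "more likely what the user means".
DEFAULT_PRIORITY: Dict[str, float] = {
    "expired_timer": 1.0,   # a beeping timer or alarm: the perceived foreground
    "music": 0.4,
    "running_timer": 0.3,
}
CONFIDENCE_MARGIN = 0.5     # assumed lead the best candidate needs over the runner-up


@dataclass
class Candidate:
    name: str
    state: str              # key into the priority table


def resolve_stop(candidates: List[Candidate],
                 priority: Optional[Dict[str, float]] = None,
                 margin: float = CONFIDENCE_MARGIN) -> str:
    table = DEFAULT_PRIORITY if priority is None else priority
    if not candidates:
        return "Nothing to stop."
    ranked = sorted(candidates, key=lambda c: table.get(c.state, 0.0), reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return f"Stopped the {best.name}."
    lead = table.get(best.state, 0.0) - table.get(ranked[1].state, 0.0)
    if lead >= margin:
        # Confident enough: act, and say what was done so a wrong guess is visible.
        return f"Stopped the {best.name}."
    # Not confident: fall back to disambiguation.
    options = ", or ".join(f"the {c.name}" for c in ranked)
    return f"Stop {options}?"


music = Candidate("music", "music")
beeping = Candidate("pizza timer", "expired_timer")
counting = Candidate("egg timer", "running_timer")

print(resolve_stop([music, beeping]))
print(resolve_stop([music, counting]))
```

Changing the behavior in response to user feedback then means editing the table or the margin rather than the skills themselves.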

Tell Mycroft to “Silence all” or “Silence everything” for just that.
Tell Mycroft to “Silence all/everything except/but . . .” and list the audible skill(s) that should continue; everything else is stopped.
Any skills (the timer continuing to count down, cooking breakfast, returning Pluto to its rightful planetary status, etc) that are not being done through the speaker would continue.
So “silence” may work better than “stop”. Just need to train the human.
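
A small sketch of the “silence everything except …” idea. The `Activity` class and the `audible` flag are hypothetical; the point is only that silencing touches what is coming out of the speaker and leaves everything else running.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Activity:
    skill: str
    audible: bool         # currently producing sound through the speaker
    muted: bool = False


def silence(activities: List[Activity], keep: Iterable[str] = ()) -> List[str]:
    """Mute every audible activity except those named in `keep`.

    Returns the names of the skills that were silenced. Non-audible work
    (a timer still counting down, for instance) is left untouched.
    """
    keep_set = {name.lower() for name in keep}
    silenced = []
    for activity in activities:
        if activity.audible and not activity.muted and activity.skill.lower() not in keep_set:
            activity.muted = True
            silenced.append(activity.skill)
    return silenced


running = [
    Activity("music", audible=True),
    Activity("news", audible=True),
    Activity("timer", audible=False),   # counting down silently
]
# "Silence everything except music"
print(silence(running, keep=["music"]))
```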

The user may want silence to hear Mycroft’s timer when it goes off. Or the user may want complete silence and NOT hear the timer. That’s a (rare?) context-specific situation. If it’s a repeating situation, there should be a way for the user to specify audible timer alarm vs silence. Not sure what the initial default would be.

Mycroft may “introduce” itself to any new user as a helpful machine/gadget and go over things like “stop all” vs stop specific skill(s) vs “silence all/everything” vs silence all but specific noise producing skill(s). Either one long introduction or many smaller ones depending on user preference. May need optional reminders when requested.

Speaking of user preference and just for me, I don’t want to tell Mycroft “thank you” for anything. It’s going to be a helpful machine but it’s obviously not human. Even though it is obviously not human, it’s obvious that humans can treat/think about machines as human. I don’t want to ever start blurring that line between machines and humans. I don’t want other people to blur it either, but it’s their call if they want to or just don’t care.

I would also like to rename my personal Mycroft HAL (yep, from 2001: A Space Odyssey) to help me remember that. I’d actually prefer to rename my computer “computer” or “machine” (more neutral and less silly than HAL) but that would lead to too many inadvertent wake-ups. I want Mycroft to help me be better to humans, and my not treating/thinking of Mycroft as human is a good place to start.

Mycroft to me: “Hey moron, so-and-so’s birthday is next week. Try to do something human. Don’t look at me. Can’t help you with that.” Most people absolutely do not want this particular feature. Just trying to make a point.

Does Mycroft’s own speaker (depending on volume) interfere with Mycroft’s ability to hear commands?

@Msquared’s scenario is really interesting. It seems like Mycroft might benefit from some additional categorical metadata, like marking active skills that have a “duration” quality to them (such as timers, or audio streams). Mycroft could then evaluate whether there was a plurality (or not) of things that could be “stopped” on command. This would also allow skill authors who come up with new and creative things that don’t have an implicit or obvious “duration” quality to explicitly make Mycroft aware of it (Do Not Disturb mode, perhaps?).
Another complication: some kinds of skills, like media playing (indeed, probably as was the case in @Msquared’s example), are implemented in a “stateless” way (kind of like HTTP) where “PLAY” and “PAUSE” are point-in-time commands, not ongoing states/connections. So Mycroft (and/or the skill authors) may need to include some kind of “state stack” so he could keep track of whether a movie was (probably) still playing, seeing as how the most recent command he had issued was a “PLAY”.
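
One way to picture both suggestions together: skills opt in as “stoppable”, and the assistant remembers the last point-in-time command it sent to each, so it can guess what is probably still running. The class, the command names, and the mapping below are assumptions for illustration, not an existing Mycroft mechanism.

```python
from typing import Dict, List, Set


class ActivityTracker:
    """Remembers the last command sent to each skill that declared itself stoppable."""

    # Hypothetical mapping from point-in-time commands to an inferred "active" flag.
    INFERRED_ACTIVE = {"PLAY": True, "START": True, "PAUSE": False, "STOP": False}

    def __init__(self) -> None:
        self._stoppable: Set[str] = set()
        self._last_command: Dict[str, str] = {}

    def register_stoppable(self, skill: str) -> None:
        # A skill author opts in explicitly (timers, audio, "Do Not Disturb", ...).
        self._stoppable.add(skill)

    def command_sent(self, skill: str, command: str) -> None:
        self._last_command[skill] = command

    def probably_active(self) -> List[str]:
        # Best guess only: a stateless device may have changed state on its own.
        return sorted(
            skill for skill in self._stoppable
            if self.INFERRED_ACTIVE.get(self._last_command.get(skill, ""), False)
        )

    def stop_is_ambiguous(self) -> bool:
        return len(self.probably_active()) > 1


tracker = ActivityTracker()
tracker.register_stoppable("movie")
tracker.register_stoppable("timer")
tracker.command_sent("movie", "PLAY")
tracker.command_sent("timer", "START")
print(tracker.probably_active(), tracker.stop_is_ambiguous())
```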

This is largely hardware-dependent. On the Mark II we have Acoustic Echo Cancellation, which essentially subtracts the audio output from the audio input, meaning you can speak over the top of it very easily. On Mark 1, desktop, or Picroft installs (depending on the hardware) this isn’t the case, so you really have to yell to be heard.

The other way to overcome this is hardware buttons. We’ve got those on the Mark 1 and Mark II, and on desktop you can set up a key combination. Hitting this acts like a wake-word trigger, letting you issue a voice command without saying “hey mycroft”.

This may have already been suggested, but … if Mycroft is playing music and another request is initiated, the music could be muted and continue playing instead of pausing.

Hey Mark, I don’t know whether that has come up, but I can totally see that behavior happening too. Seems like it might depend on what type of media it is, e.g., music vs a podcast.
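
A tiny sketch of such a per-media policy. The media categories, the choice of which ones get paused, and the ducked volume level are all assumptions picked for illustration.

```python
from enum import Enum
from typing import Tuple


class MediaType(Enum):
    MUSIC = "music"
    PODCAST = "podcast"
    AUDIOBOOK = "audiobook"


# Hypothetical policy: speech-heavy media is paused so nothing is missed,
# while music keeps playing at reduced volume ("ducking").
PAUSE_ON_INTERRUPT = {MediaType.PODCAST, MediaType.AUDIOBOOK}
DUCKED_VOLUME = 0.2   # assumed fraction of the current volume


def on_interrupt(media: MediaType, volume: float) -> Tuple[str, float]:
    """Return (action, new_volume) to apply while the assistant listens or speaks."""
    if media in PAUSE_ON_INTERRUPT:
        return ("pause", volume)
    return ("duck", round(volume * DUCKED_VOLUME, 2))


print(on_interrupt(MediaType.MUSIC, 0.8))
print(on_interrupt(MediaType.PODCAST, 0.8))
```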