Mycroft Community Forum

Mycroft Skills etiquette: How should they interact?

We need your help again! We’re looking for feedback from anyone who has used Mycroft in the past, or wants to use it in the future. If you’re reading this and that doesn’t include you – then I have no idea why you’re here… #awks

Over the last few months we have been polishing all the base Skills – both their voice and graphical interfaces. The focus during this time has been on each Skill working as expected in isolation; however, we also know that real-world usage isn’t that straightforward. Conversations with other humans often jump between different contexts, and interacting with a voice assistant is no different.

When you speak to a voice assistant, you might start by playing some music, set a timer for the pot on the stove, find out how many teaspoons are in a tablespoon, and then want everything to shut up while you focus on the key moment of your delicious creation. Each of these is likely handled by a different Skill – your music service, the Timer Skill, Wolfram Alpha, and the Volume Skill. If each of these Skills is only designed to operate on its own, these interactions quickly become disjointed, come into conflict with each other, and ultimately can become confusing for the human trying to focus on their food.

For a seamless interaction experience that does what you want, and otherwise gets out of the way – we need to think through all the ways that people might interact with Mycroft as a whole. It’s important that this also includes how Mycroft will behave when we diverge from what’s called the “happy path” in software development. When things go wrong, how do we recover gracefully? What should Mycroft do if the user doesn’t hear what was said? If the user makes a mistake or changes their mind – how do we help them get back on track?

As you might imagine, when you start digging into “how should anything interact with anything” – it quickly spawns discussions, disagreements, new scenarios, and an endless list of new questions. To help us work through these questions, and to ensure the interaction of Skills reflects the needs and wants of all Mycroft users – we need your help.

To unravel this enormous puzzle and all its possible pathways, we want to:

  • Understand how you would expect Mycroft to behave in different circumstances
  • Consider ways we might be able to group or generalize those expectations
  • Design a technical implementation to meet these rules
  • Implement the design and gather more feedback.

When you put it in 4 dot points, it sounds so simple – and we want to make it just as simple to have your say. So to begin with we want to understand how you want Mycroft to behave. To explore this we’re going to be releasing a series of videos showing some example interactions with a Mark II unit.

For each of these videos we want you to give it a thumbs up if it’s spot on, and a thumbs down if it misses the mark. Each video will also have its own questions that we’d love your opinion on in the comments. But if you have something to say about it that doesn’t fit one of the questions, we still want to hear it.

The first video we wanted feedback on is:

  • Does the video reflect the way you would want your Mycroft unit to behave?
  • If not, how would you expect it to be different?
  • Are there other situations where you would expect the same or similar behaviour?
  • Are there situations where you wouldn’t want this behaviour?

  • Time and date - fine

  • Play music => “Just one moment while I look for that” should be shorter, e.g. “checking”
    (related though not in the video - omit reply “Playing song by artist”)

  • Timer and music both required a “stop” even though timer had gone off. Perhaps timer should stop itself at 0:00 so only one “stop” is needed.

    -Mike M


To me, that works as expected. Just a couple of suggestions:

  • The Play music response is very long. “Checking” would be better
  • Separate “stop play”, “stop timer” and “stop all” commands that could be issued at any point, regardless of what is currently shown on the screen, would be very useful
  • Is there some means of bringing up on screen a list of all the things that are happening in the background? In this example, can both the music and the timer be checked and controlled while the two are operational?

If the timer switches itself off automatically at 0:00, or shortly afterwards, you’ll miss it if you happen to be out of the room for a minute or two


Safety measure if you’re not around at that time (but still want to be reminded).
(edit: oh, I missed MichaelM’s response)
Maybe add an autostop checkbox to settingsmeta.yml/code; then it’s up to the user to decide.
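One way to reconcile “keep beeping until stopped” with an opt-in autostop setting is an auto-stop that only kicks in after a grace period. Below is a minimal Python sketch (Mycroft Skills are written in Python), assuming a hypothetical `ExpiredTimer` model with `auto_stop` and `grace_seconds` fields; it is not the Timer Skill’s actual code, and the two-minute default is only an illustration.

```python
from dataclasses import dataclass


@dataclass
class ExpiredTimer:
    """Hypothetical model of a timer that has reached 0:00 (not the real Timer Skill)."""

    name: str
    expired_at: float             # clock reading (in seconds) when the timer hit 0:00
    auto_stop: bool = False       # user setting, e.g. an "autostop" checkbox
    grace_seconds: float = 120.0  # how long to keep alerting before auto-stopping

    def should_keep_beeping(self, now: float) -> bool:
        # With auto_stop off, the timer keeps alerting until someone says "stop".
        if not self.auto_stop:
            return True
        # With auto_stop on, keep alerting for a grace period so that someone
        # who stepped out of the room for a minute or two still hears it.
        return (now - self.expired_at) < self.grace_seconds
```

For example, `ExpiredTimer("pasta", expired_at=0.0, auto_stop=True).should_keep_beeping(now=60.0)` is still `True`, while the same call with `now=180.0` returns `False`. Defaulting `auto_stop` to off keeps the “you won’t miss it” behaviour unless the user opts in.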

You’re also able to say “delete timer” (although I remember tweaking the code because it behaves unexpectedly when two timers are running and one has expired).

On a side note: I like that approach. It’s hard to get the UX(perience) right just by coding your *** off.


Good call! I have a timer that just ‘disappears’ when its sound ends, and it causes me to miss it :(.

The music plays after a mere three seconds. Given that timeframe, it should just start, no additional confirmation at all. I would suggest a “still looking” message ONLY if more than 10 seconds have passed. (You might want to make that number configurable, so one can match their own network speed.)
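The “only announce if it is slow” idea can be sketched with a timer that is cancelled when the search finishes. This assumes hypothetical `search` and `speak` callables standing in for the music lookup and text-to-speech; the 10-second default is the figure suggested above and is exposed as the configurable `threshold_s` parameter.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def search_with_feedback(
    search: Callable[[], T],
    speak: Callable[[str], None],
    threshold_s: float = 10.0,
) -> T:
    """Run a (hypothetical) music search, speaking only if it is slow.

    If the search finishes within threshold_s seconds, nothing is said and the
    caller can simply start playback. Otherwise a single "Still looking"
    message is spoken while the search continues.
    """
    notice = threading.Timer(threshold_s, speak, args=("Still looking",))
    notice.start()
    try:
        return search()
    finally:
        # Cancel the pending message; this is a no-op if it has already fired.
        notice.cancel()
```

A fast search returns silently; only a search that outlasts `threshold_s` produces the one-off message.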

It’s unclear to me from a single example how the timer and music interact. It isn’t a sort of ‘LIFO’, I hope, where only the top of the stack is open for interaction? So please also show us how to stop the music in the middle of a timer.


^Agree this could be more concise, perhaps just “Playing song” or “Here’s song” (or even “You got it!” …might be nice to have a little variation in the responses, especially if it’s a frequently played and/or quickly accessible song).

^Agree with this as well.
In terms of the general “Hey Mycroft… Stop.” command to terminate skills, I think we could make that language a little more natural. (This would involve giving Mycroft a new activation phrase to listen for in addition to “Hey Mycroft”, and I have no idea how hard that is programmatically. This idea may very well be unfeasible.)
For any skill that’s awaiting a termination command, like a timer, or for any skill that gives a verbal/visual reply to a question (e.g. “What’s the date?” / “What’s the capital of France?”), it would be great for Mycroft to recognize the termination command “Thank you Mycroft.”

  • Use case 1: “Hey Mycroft … Set a 5 minute timer.” [Timer goes off. It’s beeping loudly.] Instead of saying, “Hey Mycroft…” [wait for Mycroft sound to indicate that it’s listening] “…Stop.” the user could simply say, “Thank you Mycroft” and the timer would end.
  • Use case 2: “Hey Mycroft… What’s the longest city name in the world?” Mycroft replies something like: “The longest city name in the world is in Wales, called ‘llanfairpwllgwyngyllgogerychwyrndro—’ .” The user realizes they have made a terrible mistake, so they simply say, “Thank you Mycroft.” and Mycroft immediately stops talking.

(I noticed in the video that when asked for the date, Mycroft replied with the entire date including the year, and also left the visual reply on screen for what seemed like an excessively long time. I realized it might be helpful if, as soon as the user had gotten the information they needed, they could say, “Thank you Mycroft” and terminate both Mycroft’s verbal reply and the visual reply displayed on-screen.)

Lastly, it would be nice if there was some sort of visual indicator that Mycroft is ready to receive commands. For example, Google Assistant has a colorful border around part of the screen, which changes depending on whether its state is “ready”, “actively listening”, or “processing command” (see image below). If Mycroft had some similar visual indicator, it would help users avoid the frustration/confusion of giving Mycroft a new command while it’s still working on processing the previous one.
For example, imagine there’s a pulsating blue bar at the bottom edge of the screen to indicate Mycroft is ready. The user says “Hey Mycroft…” The blue bar lights up to indicate Mycroft is listening. The user continues, “…play song.” The blue bar undergoes some sort of animation to indicate Mycroft is processing the request. The music player appears on screen, and Mycroft says, “Playing song by artist.” Music begins to play, and the blue bar returns to its original pulsating state, indicating that Mycroft is now ready to receive further commands.
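The blue-bar walkthrough is essentially a three-state machine. Here is a minimal Python sketch of it; the state names follow the post, while the event names (`wake_word`, `utterance_end`, `request_handled`) are hypothetical labels, not Mycroft’s actual message bus events.

```python
from enum import Enum


class Indicator(Enum):
    READY = "pulsating blue bar"
    LISTENING = "bar lit up"
    PROCESSING = "bar animating"


# Allowed transitions for the hypothetical on-screen indicator.
TRANSITIONS = {
    (Indicator.READY, "wake_word"): Indicator.LISTENING,
    (Indicator.LISTENING, "utterance_end"): Indicator.PROCESSING,
    (Indicator.PROCESSING, "request_handled"): Indicator.READY,
}


def next_state(state: Indicator, event: str) -> Indicator:
    # Events that don't apply in the current state leave the indicator
    # unchanged, so a stray event cannot put the display in an undefined state.
    return TRANSITIONS.get((state, event), state)
```

Walking through “Hey Mycroft… play song” is then `READY → LISTENING → PROCESSING → READY`; adding further states (for example an error state) would only mean adding rows to the table.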


All of those are excellent ideas. I especially like “Thank you Mycroft” to instantly stop the current action


One thing to note is that “Thank you Mycroft” should always make Mycroft stop talking, but should not always terminate the skill. For example, if a timer is started, and the user for whatever reason says, “Thank you Mycroft,” before the timer has run out, that probably should not terminate the timer. As another example, if the user says, “Hey Mycroft, play song,” and Mycroft starts to reply, “Playing song by artist,” if the user says, “thank you Mycroft,” that should make Mycroft stop talking, but it should not prevent the music from playing.
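Combining this with use case 1 above, “Thank you Mycroft” would silence speech and dismiss anything actively alerting (a beeping, expired timer), while leaving running activities such as music or an unexpired timer alone. A toy Python model of that rule, where the `Assistant` class and its `alerting`/`running` sets are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Assistant:
    """Toy model of speech output plus background activities."""

    speaking: bool = False
    alerting: set[str] = field(default_factory=set)  # e.g. an expired, beeping timer
    running: set[str] = field(default_factory=set)   # e.g. music, a timer still counting down

    def thank_you(self) -> None:
        self.speaking = False  # always stop talking
        self.alerting.clear()  # acknowledge anything beeping for attention
        # Anything in self.running is deliberately left untouched.
```

Saying “thank you” while a timer beeps and music plays would end the beeping and the reply but keep the music going.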


Thanks for all your feedback already!

There’s no need to limit suggestions to what is possible in the current implementation. We are building a roadmap so if we can’t get to a particularly good idea right now it will get put on the plan.

Also I should point out that we don’t expect to reach a consensus on how Mycroft should behave. Our immediate goal is to establish a reasonable default behavior. But ultimately we want your Mycroft (or whatever you call it) to develop a “personality” that suits you. This can be accomplished in a variety of ways at the technical level: configuration options, machine learning, and installation of alternative skill implementations, for example. But for now we’re not worried about that: we’re establishing the user requirements. Implementation details come much later.

So keep those ideas coming. And if you find your opinion in the minority, just remember that we’re not looking for one ring to rule them all. We are looking to enable infinite diversity in infinite combinations.


Well, commands both to ‘stop the current skill’ and to ‘stop talking but otherwise continue’ would be useful. What those commands should be is perhaps an implementation detail.


I agree that the “play music” response is too verbose and a simple “sure” or “ok” response would be fine. Would be nice to have some contextual feedback only if the song or artist is not found, maybe with some options listed for play selection.


I am not sure it is a detail. The most intuitive way, in my view, is to say “stop” when you want your assistant to stop what it is currently doing in a narrow sense. If it talks, it should stop talking. If it displays some specific content, it should stop displaying that specific content. If it plays music, it should stop playing music, etc…

The semantic ambiguity of “stop” may very well require that Mycroft understand much more context - and ultimately also personal context (which can become personal data if Mycroft collects enough to fingerprint users). So I think that skill etiquette depends on, or at least interacts with, data privacy.

Probably, it also depends on accountability, e.g. when an assistant does something for you that you rely on. For instance, you may want Mycroft to order 10 pizzas for your birthday party. While Mycroft says that it successfully sent the order, you quickly realize you want something else and tell Mycroft to stop. Does that mean it should stop talking or cancel the order? Should there be a different “default” for skills that perform actions that may cost you money?
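One possible “different default” for skills that spend money is a short cancel window: the action is held briefly, a “stop” during that window cancels it, and after the window “stop” can only silence speech. A minimal Python sketch, assuming a hypothetical `PendingAction` wrapper around a `commit` callable; the 10-second window is arbitrary.

```python
import threading
from typing import Callable


class PendingAction:
    """Hold a costly action (e.g. a pizza order) for a short cancel window."""

    def __init__(self, commit: Callable[[], None], window_s: float = 10.0):
        self._commit = commit
        self._lock = threading.Lock()
        self._state = "pending"  # "pending" -> "committed" or "cancelled"
        self._timer = threading.Timer(window_s, self._fire)

    def _fire(self) -> None:
        # Runs on the timer thread once the window has elapsed.
        with self._lock:
            if self._state != "pending":
                return
            self._state = "committed"
        self._commit()

    def start(self) -> None:
        self._timer.start()

    def cancel(self) -> bool:
        """Return True if the action was cancelled before it was committed."""
        with self._lock:
            if self._state != "pending":
                return False
            self._state = "cancelled"
        self._timer.cancel()
        return True
```

The lock makes “stop” and the commit mutually exclusive: whichever happens first wins, so an order is never both cancelled and sent.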


In the absence of context, a layer cake of “skill hierarchy” could be applied. It’s generally obvious that when music is playing and the alarm goes off, a single “stop” command is directed at the alarm.

At least, I don’t see any case where the opposite would be expected.

I do realise that implementing “skill hierarchy” generally is no simple thing, maybe it could be a personal setting in each skill?


I think a lot of people lose track of what day of the month it is. Very few lose track of what year it is. I think it is generally a good policy to minimise clutter in Mycroft responses. If I want to quickly check the date, the ideal response is “Today is the 28th of September”. I most likely know what day of the week it is and what year it is. I can ask specifically for that info on the rare occasions I need it. Long-winded responses from a voice assistant quickly grow old. The music request should definitely not be preceded by a long-winded announcement unless, as pointed out, it is taking an unusually long time.
I like the nesting of active contexts in the demo. The alarm going off is active, so stop applies to that and not to background contexts. There could be a catch-all, as suggested, such as “Mycroft quiet” if I need immediate silence to talk to someone in the room or answer my phone. (It might be a nice skill for Mycroft to recognise a phone ringing.)
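The short date reply suggested above only needs an ordinal day and a month name. A small Python sketch; the function names are hypothetical, and the month names are hard-coded in English so the output does not depend on the system locale.

```python
import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal(n: int) -> str:
    # 11th, 12th and 13th break the 1st/2nd/3rd pattern.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def short_date_reply(day: datetime.date) -> str:
    """Answer "what's the date?" without the weekday or the year."""
    return f"Today is the {ordinal(day.day)} of {MONTHS[day.month - 1]}"
```

Any date on the 28th of September yields “Today is the 28th of September”; the weekday and year would be left for a follow-up question.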


When there’s a display then anything ambiguous or generic (like “stop”) should go to the displaying application.

If the displaying skill doesn’t handle it but others do then “do you want to stop timer1, egg timer or music” seems reasonable rather than stopping something you didn’t want stopped.
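The routing rule above (displayed skill first, otherwise ask when several skills could respond) fits in a few lines. This Python sketch also covers the display-less case raised later in the thread by passing `displayed=None`; `route_stop` and its string results are hypothetical.

```python
from typing import Optional, Sequence


def route_stop(displayed: Optional[str], active: Sequence[str]) -> str:
    """Decide what a bare "stop" should do (hypothetical policy).

    displayed: the skill currently on screen, or None on a display-less device.
    active: skills that could handle "stop", e.g. ["timer1", "egg timer", "music"].
    """
    if displayed is not None and displayed in active:
        return f"stop {displayed}"
    if not active:
        return "nothing to stop"
    if len(active) == 1:
        return f"stop {active[0]}"
    # Ambiguous: ask rather than stop something the user didn't mean.
    options = ", ".join(active[:-1]) + f" or {active[-1]}"
    return f"ask: Do you want to stop {options}?"
```

For example, with the weather on screen and three candidates, the result is the question “Do you want to stop timer1, egg timer or music?”.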

When the timer goes off it would be nice if Mycroft entered listening mode whilst it is sounding as either you’re not there to hear so it won’t matter or you’re likely to want to speak and acknowledge the timer.

some other comments that I would vote for:

  • Not speaking the year
  • Music response is too long (though 10s is an eternity)
  • Visual feedback of listening

Nitpick: The timer display of negative numbers is quite geeky if you think about it.
It should ideally say “10s ago” and maybe change colour? I’m not sure if there is space to put “went off 10s ago”.
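Both display styles discussed in the thread (the friendlier “went off 10s ago” and the terse “-10 s” defended later on) can come from one formatting function. A Python sketch with a hypothetical `compact` flag that could be exposed as a setting:

```python
def expired_label(seconds_over: int, compact: bool = False) -> str:
    """Label for a timer that went off `seconds_over` seconds ago."""
    if compact:
        return f"-{seconds_over} s"
    minutes, secs = divmod(seconds_over, 60)
    if minutes:
        return f"went off {minutes}m {secs}s ago"
    return f"went off {secs}s ago"
```

`expired_label(10)` gives “went off 10s ago”, `expired_label(75)` gives “went off 1m 15s ago”, and `expired_label(10, compact=True)` gives “-10 s”.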


Time and date were fine.
If the music could be found “quickly”, skip the intro announcement.
Stopping the timer and music:
The timer should continue until verbally stopped (good).
The issue, though, is how Mycroft should interpret potentially ambiguous commands.
If the timer was still running and the music was still playing and the command was “Hey Mycroft, stop”, Mycroft would not know which skill to stop. Relying on what is displayed is not a great option, as Mycroft may be running on a display-less device or the display can’t be seen for some reason. If it can’t reasonably determine which skill the command is meant for, Mycroft could reply “Stop timer or stop music?” and wait for the response.

About the discussion about using “Thank you Mycroft” to stop a response, it seems inconsistent with how Mycroft interactions work. Unless Mycroft asks for clarification, it is simplest if any/all commands start with “Hey Mycroft…” In the case of a rambling response to a badly stated question, simply “Hey Mycroft, stop”. In the middle of a response, there is little ambiguity as to what needs to be stopped.


Thanks for all the awesome feedback everyone!

It’s clear that there’s never going to be 1 set of behaviour that works for everyone. We’re all different and we have different expectations.

As Michael said, in the first round we want to implement good default behaviour: something that works well for most people and can be manually modified if you choose, e.g. through a setting change. In time the system should learn different users’ preferences and adapt itself, but this is a much longer-term ambition :smiley:

I won’t reply to everything directly but there’s a few things I wanted to drill into a little more…

Stopping things

Some intents are nice and simple to infer - “stop the music” you would expect to… stop the music. It gets less clear when looking at generic phrases like “stop”. If a timer or alarm is actively beeping then we might assume that this, rather than the music, is what should be stopped. But what happens if we have a Timer displayed on the screen and music playing in the background? Should we ask which thing to stop? Stop the most recently activated process (i.e. LIFO)? Have a defined priority order: expired thing > active timer > music > other? Or something else?
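To see where the two candidate rules disagree, here is a Python sketch of both: LIFO picks the most recently started activity, while the priority rule uses the order from the post (expired thing > active timer > music > other) and breaks ties by recency. `Activity`, its `kind` values and the `PRIORITY` table are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Priority order from the post: expired thing > active timer > music > other.
PRIORITY = {"expired": 0, "timer": 1, "music": 2}


@dataclass
class Activity:
    name: str
    kind: str     # "expired", "timer", "music", or anything else ("other")
    started: int  # activation order; higher means more recent


def target_lifo(acts: List[Activity]) -> Activity:
    # Most recently activated wins.
    return max(acts, key=lambda a: a.started)


def target_priority(acts: List[Activity]) -> Activity:
    # Lowest priority number wins; unknown kinds rank last as "other",
    # and ties are broken by recency.
    return min(acts, key=lambda a: (PRIORITY.get(a.kind, len(PRIORITY)), -a.started))
```

With a pasta timer set first and the radio started afterwards, LIFO would stop the radio while the priority rule would stop the timer; which one “feels right” is exactly the question being asked here.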

It is relatively easy to add more back and forth conversation and/or settings to handle situations like this - but the more we can infer these correctly and not need to bother the user with more detail, the better the experience.

As you said in a follow-up post, I can see myself saying this when a Timer has expired, or even if Mycroft is being a bit long-winded and I want to cut off the speech. But if I asked for some music, it started playing, and I said “thank you Mycroft”, the intent is not for the music to stop. Can you think of other examples where “thank you” is or isn’t a termination intent/command?

Undoing actions

This is a really interesting scenario both in terms of what “stop” should do but also around what actions should be “undo-able”? Would love to hear more thoughts or examples of things that are important to undo.

Expired Timer display

Definitely not a nitpick - this is great feedback! We want to hear it all - big and small :slight_smile:

Visual indicator of system state

Currently the LEDs on top of the unit perform this function, but the intention is to have this reflected on the screen as well. You can see the start of this if you set your device to grab the “latest” updates via home.mycroft.ai. Are there other states or information that you think could be communicated through this other than “ready”, “actively listening”, or “processing command”?

Stopping: it’s indeed nice if Mycroft could infer what you’re trying to stop, rather than (always) add questions. But note that there are options in between as well:

  1. it could infer, yet you’d still have the option to say “that’s not what I meant” (or some command like that), to nip it in the bud
  2. it could infer, and ask “OK?” (when in doubt) to get explicit confirmation
  3. only if pretty clueless it would indeed have to just ask what to stop
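The three in-between options map naturally onto confidence thresholds. This Python sketch assumes some classifier (not specified in the thread) supplies a probability per candidate target; the `resolve` function and the 0.8/0.5 thresholds are hypothetical.

```python
def resolve(scores: dict[str, float], confident: float = 0.8, plausible: float = 0.5) -> str:
    """Pick a response style for an ambiguous "stop" (hypothetical thresholds).

    scores maps each candidate target (e.g. "timer", "music") to an estimated
    probability that the user meant it.
    """
    if not scores:
        return "ask: What should I stop?"
    best = max(scores, key=scores.get)
    p = scores[best]
    if p >= confident:
        # Option 1: act; the user can still say "that's not what I meant".
        return f"do: stop {best}"
    if p >= plausible:
        # Option 2: act only after an explicit "OK?".
        return f"confirm: Stop {best}, OK?"
    # Option 3: too unsure to guess, so ask outright.
    return "ask: Stop " + " or ".join(sorted(scores)) + "?"
```

Tuning the two thresholds (or making them a setting) would trade fewer questions against more wrong guesses.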

Thank you: I think that’ll do wonders if it effectively means “shut up”, but it shouldn’t extend beyond that; otherwise it gets way too subtle, and therefore confusing.

It also means that the actual “stop” could be “stop action” (not just stop confirming). If it’s confirming that 10-pizza order, I suppose there might be a lot of text, like the total price, when they’ll arrive, etc. If that order is somehow a mistake and I want to stop it going through, I do not need further confusion around the commands…

Quick to glance: Please don’t bury information in extensive text (hopefully that simply won’t fit). I was surprised while visiting the USA by how much text there is in traffic everywhere! In Europe, we use road signs for that. IMHO that’s much more readable at a glance and less dependent on knowing a language well.

It might be a tad geeky, but “-X s” (in red?) is also very clear and remains practical for those who use timers all day long, as it is concise and quick to read. That also makes it easy to read for kids who don’t read that well yet, anyone who should wear glasses for reading but is still at breakfast, folks who don’t speak the language very well, etc.

(Anyone confused the first time will see what it means anyway by staring at it for literally an extra second: “Oh, it goes UP again…”)

Processing command: Sounds like a really good set of three simple indicators, which just didn’t come across in this video.


Good discussion around LIFO/stack of context vs. specifying context. It seems best to me that Mycroft handle both.

My remark is actually mainly about the video content. For completeness, the video could anticipate the variety and demonstrate both. So perhaps after the demo as is, the user pauses a moment, starts a different song, and starts another timer, and this time at expiration says “Mycroft, stop music”, then after a little pause, “Mycroft, stop timer”. Finally, the user asks Mycroft how long the timer ran, to which Mycroft replies “five minutes and twenty seconds” (adding the absolute value of the time overage to the original duration).
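The arithmetic in that last step is just duration minus the remaining time shown on the display: a display of -20 s on a 5-minute timer gives 300 - (-20) = 320 seconds, which equals the duration plus the absolute overage. A Python sketch with hypothetical function names; it speaks numbers as digits rather than words.

```python
def total_run_time(duration_s: int, remaining_s: int) -> int:
    """Total time a timer ran when stopped.

    remaining_s is what the display shows at the moment of stopping: positive
    if stopped early, negative (e.g. -20) if stopped after it went off.
    """
    return duration_s - remaining_s


def spoken(seconds: int) -> str:
    minutes, secs = divmod(seconds, 60)
    parts = []
    if minutes:
        parts.append(f"{minutes} minute{'s' if minutes != 1 else ''}")
    if secs:
        parts.append(f"{secs} second{'s' if secs != 1 else ''}")
    return " and ".join(parts) or "0 seconds"
```

Writing it as a subtraction means the same formula also works when a timer is stopped early, not just after it has gone off.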
