To me, that works as expected. Just a couple of suggestions:
The Play music response is very long. “Checking” would be better
Separate “stop play”, “stop timer” and “stop all” commands that could be issued at any point, regardless of what is currently shown on the screen, would be very useful
Is there some means of bringing up on screen a list of all the things that are happening in the background? In this example, can both the music and the timer be checked and controlled while the two are operational?
Safety measure if you’re not around at that time (but still want to be reminded).
(edit: oh, spaced MichaelM’s response)
Maybe add an autostop checkbox to settingsmeta.yml/code, then it’s on the user to decide.
You’re also able to say “delete timer” (although I remember I tweaked the code because it acts unexpectedly if two timers are running and one has expired)
On a side note: I like that approach. It’s hard to get the UX(perience) right just by coding your *** off
Good call! I have a timer that just ‘disappears’ at the end of the sound, and it causes me to miss it :(.
The music plays after a mere three seconds. Given that timeframe, it should just start, no additional confirmation at all. I would suggest a “still looking” message ONLY if more than 10 seconds have passed. (You might want to make that number configurable, so one can match their own network speed.)
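A minimal sketch of that rule, assuming a hypothetical helper that is given the elapsed search time. The function name, the message text and the 10-second default are illustrative only and are not taken from Mycroft's code:

```python
from typing import Optional

# Hypothetical default; the post suggests making this configurable.
DEFAULT_STILL_LOOKING_AFTER_S = 10.0


def progress_message(elapsed_s: float,
                     threshold_s: float = DEFAULT_STILL_LOOKING_AFTER_S) -> Optional[str]:
    """Return a spoken progress message, or None to stay silent."""
    if elapsed_s > threshold_s:
        return "Still looking"
    # Fast results (e.g. three seconds) get no extra announcement at all.
    return None
```

A user on a slow network could pass a larger `threshold_s`; one on a fast network could lower it.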
It’s unclear to me from a single example how timer and music interact. It isn’t a sort of ‘LIFO’, I hope, where only the top of the stack is open for interaction? So please also show us how to stop music in the middle of a timer.
^Agree this could be more concise, perhaps just “Playing song” or “Here’s song” (or even “You got it!” …might be nice to have a little variation in the responses, especially if it’s a frequently played and/or quickly accessible song).
^Agree with this as well.
In terms of the general “Hey Mycroft… Stop.” command to terminate skills, I think we could make that language a little more natural. (This would involve giving Mycroft a new activation phrase to listen for in addition to “Hey Mycroft”, and I have no idea how hard that is programmatically. This idea may very well be unfeasible.)
For any skill that’s awaiting a termination command, like a timer, or for any skill that gives a verbal/visual reply to a question (e.g. “What’s the date?” / “What’s the capital of France?”), it would be great for Mycroft to recognize the termination command “Thank you Mycroft.”
Use case 1: “Hey Mycroft … Set a 5 minute timer.” [Timer goes off. It’s beeping loudly.] Instead of saying, “Hey Mycroft…” [wait for Mycroft sound to indicate that it’s listening] “…Stop.” the user could simply say, “Thank you Mycroft” and the timer would end.
Use case 2: “Hey Mycroft… What’s the longest city name in the world?” Mycroft replies something like: “The longest city name in the world is in Wales, called ‘llanfairpwllgwyngyllgogerychwyrndro—’ .” The user realizes they have made a terrible mistake, so they simply say, “Thank you Mycroft.” and Mycroft immediately stops talking.
(I noticed in the video that when asked for the date, Mycroft replied with the entire date including year, and also left the visual reply on screen for what seemed like an excessively long time. I realized it might be helpful if, as soon as the user had gotten the information they needed, they could say, “Thank you Mycroft” and terminate both Mycroft’s verbal reply and the visual reply displayed on-screen.)
Lastly, it would be nice if there was some sort of visual indicator that Mycroft is ready to receive commands. For example, Google Assistant has a colorful border around part of the screen, which changes depending on whether its state is “ready”, “actively listening”, or “processing command” (see image below). If Mycroft had some similar visual indicator, it would help users avoid the frustration/confusion of giving Mycroft a new command while it’s still working on processing the previous one.
For example, imagine there’s a pulsating blue bar at the bottom edge of the screen to indicate Mycroft is ready. The user says “Hey Mycroft…” The blue bar lights up to indicate Mycroft is listening. The user continues, “…play song.” The blue bar undergoes some sort of animation to indicate Mycroft is processing the request. The music player appears on screen, and Mycroft says, “Playing song by artist.” Music begins to play, and the blue bar returns to its original pulsating state, indicating that Mycroft is now ready to receive further commands.
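The indicator cycle described here could be modelled as a tiny state machine. This is only a sketch: the three state names come from the post, while the event names (`wake_word`, `utterance_end`, `handled`) are made up for illustration:

```python
from enum import Enum


class IndicatorState(Enum):
    READY = "ready"
    LISTENING = "actively listening"
    PROCESSING = "processing command"


# Hypothetical events that move the indicator from one state to the next.
TRANSITIONS = {
    (IndicatorState.READY, "wake_word"): IndicatorState.LISTENING,
    (IndicatorState.LISTENING, "utterance_end"): IndicatorState.PROCESSING,
    (IndicatorState.PROCESSING, "handled"): IndicatorState.READY,
}


def next_state(state: IndicatorState, event: str) -> IndicatorState:
    """Return the new indicator state; unknown events leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```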
One thing to note is that “Thank you Mycroft” should always make Mycroft stop talking, but should not always terminate the skill. For example, if a timer is started, and the user for whatever reasons says, “Thank you Mycroft,” before the timer has run out, that probably should not terminate the timer. As another example, if the user says, “Hey Mycroft, play song,” and Mycroft starts to reply, “Playing song by artist,” if the user says, “thank you Mycroft,” that should make Mycroft stop talking, but it should not prevent the music from playing.
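To make that distinction concrete, here is a rough sketch of the proposed semantics. The `AssistantState` class and `handle_thank_you` function are hypothetical names for illustration, not part of Mycroft:

```python
from dataclasses import dataclass


@dataclass
class AssistantState:
    """Hypothetical snapshot of what the assistant is doing right now."""
    speaking: bool = False
    timer: str = "idle"          # "idle", "running" or "expired" (beeping)
    music_playing: bool = False


def handle_thank_you(state: AssistantState) -> AssistantState:
    """Apply the proposed "Thank you Mycroft" semantics."""
    state.speaking = False            # always cut off speech
    if state.timer == "expired":      # dismiss a beeping timer (use case 1)
        state.timer = "idle"
    # A running timer and playing music are deliberately left untouched.
    return state
```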
There’s no need to limit suggestions to what is possible in the current implementation. We are building a roadmap so if we can’t get to a particularly good idea right now it will get put on the plan.
Also I should point out that we don’t expect to reach a consensus on how Mycroft should behave. Our immediate goal is to establish a reasonable default behavior. But ultimately we want your Mycroft (or whatever you call it) to develop a “personality” that suits you. This can be accomplished in a variety of ways at the technical level: configuration options, machine learning and installation of alternative skill implementations, for example. But for now we’re not worried about that: we’re establishing the user requirements. Implementation details come much later.
So keep those ideas coming. And if you find your opinion in the minority, just remember that we’re not looking for one ring to rule them all. We are looking to enable infinite diversity in infinite combinations.
Well, commands both to ‘stop the current skill’ and also to ‘stop talking but otherwise continue’ would be useful. What those commands should be is perhaps an implementation detail
I agree that the “play music” response is too verbose and a simple “sure” or “ok” response would be fine. Would be nice to have some contextual feedback only if the song or artist is not found, maybe with some options listed for play selection.
I am not sure if it is a detail. The most intuitive way, in my view, is to say “stop” when you want that your assistant stops what it currently does in a narrow sense. If it talks, it should stop talking. If it displays some specific content, it should stop displaying that specific content. If it plays music, it should stop playing music, etc…
The semantic ambiguity of “stop” may very well require that Mycroft understands much more context - and ultimately also personal context (that can become personal data if Mycroft collects enough to finger print users). So, I think that skill etiquette depends on/interacts with data privacy at least.
Probably, it also depends on accountability, e.g. when an assistant does something for you that you rely on. For instance, you may want Mycroft to order 10 pizzas for your birthday party. While Mycroft utters that it successfully sent the order, you quickly realize you want something else and tell Mycroft to stop. Does that mean it should stop uttering or cancel the order? Should there be a different “default” for skills that perform something that may cost you money?
In the absence of context, a layered cake of “skill hierarchy” could be applied. It’s generally obvious that when music is playing and the alarm goes off, a single “stop” command is directed at the alarm.
At least I do not see any case where the opposite would be expected.
I do realise that implementing “skill hierarchy” generally is no simple thing, maybe it could be a personal setting in each skill?
I think a lot of people lose track of what day of the month it is. Very few lose track of what year it is. I think it is generally a good policy to minimise clutter in Mycroft responses. If I want to quickly check what the date is, then the ideal response is “today is the 28th of September”. I most likely know what day of the week it is and what year of the century it is. I can ask specifically for that info on the rare occasions I need it. Long-winded responses from a voice assistant quickly grow old. The music request should definitely not be preceded by a long-winded announcement unless, as pointed out, it is taking an unusually long time
I like the nesting of active contexts in the demo. Alarm going off is active, then stop applies to that and not to background contexts. Could have some catchall as suggested such as “mycroft quiet” if I need immediate silence to talk to someone in the room or answer my phone. (might be a nice skill for Mycroft to recognise a phone ringing)
When there’s a display then anything ambiguous or generic (like “stop”) should go to the displaying application.
If the displaying skill doesn’t handle it but others do then “do you want to stop timer1, egg timer or music” seems reasonable rather than stopping something you didn’t want stopped.
When the timer goes off it would be nice if Mycroft entered listening mode whilst it is sounding as either you’re not there to hear so it won’t matter or you’re likely to want to speak and acknowledge the timer.
some other comments that I would vote for:
Not speaking the year
Music response is too long (though 10s is an eternity)
Visual feedback of listening
Nitpick: The timer display of -ve numbers is quite geeky if you think about it.
It should ideally say “10s ago” and maybe change colour? I’m not sure if there is space to put “went off 10s ago”.
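As a rough illustration of the “10s ago” idea, assuming the timer tracks remaining time as a signed number of seconds (the function name and the `m:ss` format for a running timer are my own choices, not what the current display does):

```python
def format_timer(remaining_s: int) -> str:
    """Format a timer for display; negative values mean it already expired."""
    if remaining_s >= 0:
        minutes, seconds = divmod(remaining_s, 60)
        return f"{minutes}:{seconds:02d}"
    # Show how long ago the timer went off instead of a negative number.
    return f"{-remaining_s}s ago"
```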
Time and date were fine.
If the music could be found “quickly”, skip the intro announcement.
Stopping the timer and music:
The timer should continue until verbally stopped (good).
The issue, though, is how should Mycroft interpret potentially ambiguous commands?
If the timer was still running and the music was still playing and the command was “Hey Mycroft Stop”, Mycroft would not know which skill to stop. Relying on what is displayed is not a great option as it may be running on a display-less device or the display can’t be seen for some reason. If it can’t reasonably be determined which skill the command is meant for, Mycroft could come back and say “Stop timer or stop music?” and wait for the response.
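A small sketch of that fallback, assuming the assistant can list the names of the skills that currently have something to stop. `resolve_stop` and its return values are hypothetical:

```python
from typing import List, Tuple


def resolve_stop(active: List[str]) -> Tuple[str, str]:
    """Decide what a bare "stop" should do given the active skill names.

    Returns ("stop", name) when unambiguous, ("ask", question) when the
    user has to choose, and ("none", "") when nothing is running.
    """
    if not active:
        return ("none", "")
    if len(active) == 1:
        return ("stop", active[0])
    question = " or ".join(f"stop {name}" for name in active)
    return ("ask", question[0].upper() + question[1:] + "?")
```

With a timer and music both active, this yields the question “Stop timer or stop music?”, and it does not depend on a display being present.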
About the discussion about using “Thank you Mycroft” to stop a response, it seems inconsistent with how Mycroft interactions work. Unless Mycroft asks for clarification, it is simplest if any/all commands start with “Hey Mycroft…” In the case of a rambling response to a badly stated question, simply “Hey Mycroft, stop”. In the middle of a response, there is little ambiguity as to what needs to be stopped.
It’s clear that there’s never going to be 1 set of behaviour that works for everyone. We’re all different and we have different expectations.
As Michael said - in the first round we want to implement good default behaviour. Something that works well for most people and can be manually modified if you choose, e.g. through a setting change. In time the system should learn different users’ preferences and adapt itself - but this is a much longer term ambition
I won’t reply to everything directly but there’s a few things I wanted to drill into a little more…
Stopping things
Some intents are nice and simple to infer - “stop the music” you would expect to… stop the music. It gets less clear when looking at generic phrases like “stop”. If a timer or alarm is actively beeping then we might assume that this rather than the music is what should be stopped. But what happens if we have a Timer displayed on the screen and music playing in the background? Should we ask which thing to stop? Stop the most recently activated process (ie LIFO)? Have a defined priority order - Expired thing > active timer > music > other? or something else?
It is relatively easy to add more back and forth conversation and/or settings to handle situations like this - but the more we can infer these correctly and not need to bother the user with more detail, the better the experience.
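For comparison, the “defined priority order” option could look something like the sketch below. The category labels mirror the order given above (expired thing > active timer > music > other); the function and the idea of tagging each active process with a category are assumptions:

```python
from typing import List, Optional

# Highest priority first, as proposed in the post.
STOP_PRIORITY = ["expired", "timer", "music", "other"]


def pick_stop_target(active_kinds: List[str]) -> Optional[str]:
    """Return the kind of process a bare "stop" should apply to, if any."""
    rank = {kind: i for i, kind in enumerate(STOP_PRIORITY)}
    # Unknown kinds sort after everything in the priority list.
    ordered = sorted(active_kinds, key=lambda k: rank.get(k, len(STOP_PRIORITY)))
    return ordered[0] if ordered else None
```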
As you said in a follow up post, I can see myself saying this when a Timer has expired or even if Mycroft is being a bit long winded and I want to cut off the speech. But if I asked for some music, it started playing, and I said “thank you Mycroft” - the intent is not for the music to stop. Can you think of other examples where “thank you” is or isn’t a termination intent/command?
Undoing actions
This is a really interesting scenario both in terms of what “stop” should do but also around what actions should be “undo-able”? Would love to hear more thoughts or examples of things that are important to undo.
Expired Timer display
Definitely not a nitpick - this is great feedback! We want to hear it all - big and small
Visual indicator of system state
Currently the LEDs on top of the unit perform this function but the intention is to have this reflected on the screen as well. You can see the start of this if you set your device to grab the “latest” updates via home.mycroft.ai. Are there other states or information that you think could be communicated through this other than “ready”, “actively listening”, or “processing command”?
Stopping: it’s indeed nice if Mycroft could infer what you’re trying to stop, rather than (always) add questions. But note that there are options in between as well:
it could infer, yet you’d still have the option to say “that’s not what I meant” (or some command like that), to nip it in the bud
it could infer, and ask “OK?” (when in doubt) to get explicit confirmation
only if pretty clueless it would indeed have to just ask what to stop
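These three options map naturally onto confidence tiers. A sketch, assuming the intent resolver can produce a confidence score between 0 and 1; the threshold values are arbitrary placeholders:

```python
def stop_strategy(confidence: float,
                  act_at: float = 0.9,
                  confirm_at: float = 0.5) -> str:
    """Choose how to react to an ambiguous "stop" given inference confidence."""
    if confidence >= act_at:
        # Infer and act; the user can still say "that's not what I meant".
        return "act"
    if confidence >= confirm_at:
        # Infer, but ask "OK?" before acting.
        return "confirm"
    # Pretty clueless: ask what to stop.
    return "ask"
```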
Thank you: I think that’ll do wonders if it effectively means “shut up”, but it shouldn’t extend beyond that; it gets way too subtle = confusing.
Also means that the actual “stop” could be “stop action” (not just stop confirming). If it’s confirming that 10 pizza order, I suppose there might be a lot of text, like total price, when they’ll be there, etc. If that order is somehow a mistake and I want to stop it going through, I do not need further confusion around the commands…
Quick to glance: Please don’t bury information in extensive text (hope that simply won’t fit). I’ve been surprised while visiting the USA how much text there is in traffic everywhere! In Europe, we use road signs for that. IMHO that’s much more readable at a glance and less dependent on knowing a language well.
It might be a tad geeky, but a “-X s” (in red?) is also very clear and remains practical for those who use timers all day long, as it is concise and quick to read. Which also means easy to read for kids who don’t read that well yet, anyone who should wear glasses for proper reading but is still at breakfast, folks who don’t speak the language very well, etc.
(Anyone confused a first time will see what it means anyway by staring at it for literally an extra second - “Oh, it goes UP again…”)
Processing command: Sounds like a really good set of three simple indicators, that just didn’t come across yet over this video medium.
Good discussion around LIFO/stack of context vs. specifying context. It seems best to me that Mycroft handle both.
My remark is actually mainly about the video content. For completeness, the video could anticipate the variety and demonstrate both. So perhaps after the demo as is, the user then pauses a moment and starts a different song, and starts another timer, and this time at expiration, utters “Mycroft, stop music” and then a little pause and then “Mycroft, stop timer”. And finally, user then asks Mycroft how long the timer ran, to which Mycroft replies “five minutes and twenty seconds.” (Adding the absolute value of the time overage to original duration).
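The arithmetic behind that reply is simple. A sketch, assuming the timer knows its original duration and a signed remaining time that goes negative after expiry (both parameter names are hypothetical):

```python
from typing import Tuple


def total_runtime(duration_s: int, remaining_s: int) -> Tuple[int, int]:
    """Return (minutes, seconds) the timer has been running in total.

    After expiry remaining_s is negative, so subtracting it adds the
    overage to the original duration.
    """
    elapsed_s = duration_s - remaining_s
    return divmod(elapsed_s, 60)
```

A five-minute timer stopped 20 seconds after it expired gives `total_runtime(300, -20)`, i.e. five minutes and twenty seconds.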
Concerning “undo”, I think, actions that cause one of the following should be undo-able:
result in immediate or future payments (to protect from financial loss)
result in configuration changes (to avoid misconfigs that drive users crazy, e.g., changing the language to something you do not speak → requires a universal keyword for undo)
control events for Internet of Things devices (to mitigate damage or other harmful/unwanted results)
These would be my top priorities. The second one is universal for all skills, while the first and third concern specific skill types only (but these are the really useful ones, I guess).
It may be helpful to see this as another instance of skill-based intent recognition, giving skills a chance to handle things first.
For example, the timer skill may be able to handle “stop timer {name}” to stop a timer by name. It may also handle just “stop”, but this should trigger a fallback if no timer exists. Similarly, the music skill could handle “stop music” or just “stop” (trigger fallback if no song is playing).
If the skills “fall back” in LIFO order of use, it would produce the expected behavior.
With a song playing, then a timer set and beeping:
“stop” will disengage the current timer
“stop” again will pause the music
The benefit of this approach is you could still say “stop music” to leave the timer alone. Additionally, Mycroft itself may have a meta skill that can handle phrases like “stop everything”.
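Here is a toy model of that fallback chain, just to check that the behaviour comes out as described. The `Skill` and `StopDispatcher` classes are invented for this sketch and are not Mycroft's actual skill or fallback API:

```python
from typing import List, Optional


class Skill:
    """Toy skill that can decline a stop request when it has nothing to stop."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False

    def handle_stop(self) -> bool:
        if not self.active:
            return False      # nothing to stop: trigger the fallback
        self.active = False
        return True


class StopDispatcher:
    """Offers "stop" to skills in LIFO order of use."""

    def __init__(self) -> None:
        self._stack: List[Skill] = []   # most recently used skill is last

    def activate(self, skill: Skill) -> None:
        skill.active = True
        if skill in self._stack:
            self._stack.remove(skill)
        self._stack.append(skill)

    def stop(self, target: Optional[str] = None) -> Optional[str]:
        """Handle "stop" or "stop <target>"; return the stopped skill's name."""
        for skill in reversed(self._stack):
            if target is not None and skill.name != target:
                continue
            if skill.handle_stop():
                return skill.name
        return None

    def stop_everything(self) -> List[str]:
        """Meta command: keep stopping until no skill handles the request."""
        stopped = []
        name = self.stop()
        while name is not None:
            stopped.append(name)
            name = self.stop()
        return stopped
```

With music activated first and a timer second, two bare `stop()` calls hit the timer and then the music, while `stop("music")` leaves the timer alone.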