A few questions about Mycroft

Hi there,

First of all, I would like to thank everyone for the amazing work on this project. I discovered this voice assistant about a month ago and am rather impressed by how well it works compared to other commercial voice assistants. I am currently working on my first skill, which I hope to publish within the next couple of weeks; it will make it possible to use Apple’s Find My feature to ring a lost phone via Mycroft.

In the meantime I have a few questions and bits of feedback which I would like to share. I am fully open to helping bring these ideas to life, so if someone would be willing to point me in the right direction I would be more than happy to try to get a pull request on the way.

1. When saying the wake word, I often find that I have to wait for the chime to play before giving my request

Is there any way the recording could start sooner? I find it rather unnatural to have to pause between ‘Hey Mycroft’ and the beginning of my request; if I do not pause, the first word or two of my request gets cut off.

2. How could multi-language Precise wake words be implemented

I am bilingual and would find it very practical to be able to speak to Mycroft in two different languages (Google Assistant has similar functionality, where you can add an extra language it will listen for).
Do you think it would be feasible to run two instances of Precise in parallel on a Raspberry Pi to enable wake word detection in two different languages? I suppose we would then have to detect the language used before passing the request on. How do you think this could be accomplished?

3. How can conversation context be properly used in a skill

I saw this page in the docs which discusses exactly this; unfortunately, none of the example requests such as “How tall is John Cleese?” and “Where’s he from?” work for me on my default installation. I see how this can be effective at remembering elements of a request, such as the name of a person or a town, to enable a continuous conversation about the subject, but what about other forms of requests? Say I ask ‘What’s the weather like in London’, and then I ask ‘How about in New York’. I’m expecting Mycroft to give me the weather for New York, but how could I code this into my skill?

4. Is it possible to chain commands together

Imagine that I wanted to ask Mycroft to do two things at once, for instance ‘turn on the lights in the living room and set them to blue’. Is this something which could easily be accomplished through Mycroft? It would be especially useful for controlling IoT devices, but being able to ask Mycroft to do at least two things in one command would be generally practical.

Thanks for sticking with me until the end!

You can turn off the chime in your mycroft.conf by setting "confirm_listening": false.
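
For example, a minimal sketch of the relevant part of ~/.mycroft/mycroft.conf (merge the key into your existing file rather than replacing it):

{
  "confirm_listening": false
}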

More advanced: you can reduce the audio subsystem’s latency a little by editing /etc/pulse/default.pa and adding “tsched=0” to the “load-module module-udev-detect” line so that it looks like:

load-module module-udev-detect tsched=0

I do not want to sound too insistent, but it would be a great help if someone with the required knowledge could answer my questions.

I am trying to create my own skill using conversational context, and it would be particularly useful if my third question could be answered.

Thanks!


Hey Jack,

To follow up on some of the other questions:

2. Multi-lingual Precise
Currently, Precise can only utilize a single model at a time, and a model generally detects a single wake word (or phrase, or sound). There are some community efforts to add support for multiple wake words.

The alternative would be to train a single model on both of your wake words. Smarter people than I might have a better idea of how well that would work…
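
If you want to experiment with that, a rough sketch using the mycroft-precise command line tools might look like this (untested for a combined dataset; the directory layout, model name and epoch count are just assumptions):

# Record samples of both wake words, plus some negative samples
precise-collect
# Train one model on a data directory containing
# wake-word/ and not-wake-word/ subfolders
precise-train combined.net data/ -e 300
# Test the resulting model live against your microphone
precise-listen combined.net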

3. Conversational Context docs
These are very dated unfortunately. They’re on my list to re-work but it won’t be for a little while.

From memory, context is not currently used in the Weather Skill but it would be an ideal candidate as you’ve shown.

Some Skills that I know use context are:

These might give you a better idea of how it works in a real example.
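
In the meantime, here’s a minimal sketch of the pattern applied to your weather example. This is untested, and ‘Weather’, ‘HowAbout’ and ‘Location’ are hypothetical vocab/entity files (and current.weather a placeholder dialog file) that you would need to define in the skill:

from adapt.intent import IntentBuilder
from mycroft import MycroftSkill, intent_handler

class WeatherContextSkill(MycroftSkill):

    # Handles "what's the weather like in London"
    @intent_handler(IntentBuilder('WeatherIntent')
                    .require('Weather').require('Location'))
    def handle_weather(self, message):
        location = message.data.get('Location')
        # Activate a context so the follow-up intent below can match
        self.set_context('WeatherContext')
        self.speak_dialog('current.weather', {'location': location})

    # Handles "how about in New York", but only while WeatherContext
    # is still active from a previous weather request
    @intent_handler(IntentBuilder('WeatherFollowUpIntent')
                    .require('HowAbout').require('Location')
                    .require('WeatherContext'))
    def handle_follow_up(self, message):
        self.speak_dialog('current.weather',
                          {'location': message.data.get('Location')})

def create_skill():
    return WeatherContextSkill()

The key is the self.set_context() call: it injects ‘WeatherContext’ into the Adapt engine, so the follow-up intent only becomes matchable after a normal weather request has been handled.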

4. Chaining commands
Not currently. Jarbas had an experimental project at one stage, but I don’t think it’s in a usable state. At the moment each utterance is treated as a single intent.
