
We Need Help with Precise Sample Tagging! Please Help!


This one’s a good question for @Wolfgange


Hey folks! This tool is online and ready for further tagging. Can I get some community members to try out 1,000 or so queries? I’ve been doing them here in Palo Alto with great success.

We have processed around 21,000, but still have 79,000 to go (each query is processed several times to ensure that the data is correctly categorized).

Once we get 100,000 “Hey, Mycroft” keywords tagged, we can move on to new wakewords. We were hoping to do this by the end of the month.

Tag Here:

See Progress Here:


Yes. We are going to release that code soon (@Wolfgange?), but you need around 100,000 samples to get accurate wakeword spotting. It's probably best for us to work on the first dozen or so as a community. What wakeword are you interested in training?


@pcwii Relatively recently we released the Precise source code to the public :smiley: . You can find it here.


I tagged 300+ yesterday. The tagger hung after 184 tags, recovered, and then stopped again after about 50 more. It recovered again some time later, and I did some more.

Not sure what happened. I saw some spinning icons when it hung, and then it had problems resolving the training site; Home… and the website were working, though. So were other sites.



Thanks for that @tjoen, our system monitoring software picked that up too - not sure what the problem is, but we’ll keep digging.


I don’t think I am going to be able to convince my wife and daughter to record 100k samples for me :wink: so if the community is interested in contributing to my wake word, that would be amazing. My daughter had an imaginary friend, “Kelsey”, when she was a toddler, and now that she is much older she thinks it is cool that her imaginary friend is my AI. Currently I am using the phoneme sequence K EH L S IY. I find this works very well with my voice (about 90% of the time), but only about 10% of the time for my wife and about 25% for my daughter. I was thinking of using Snowboy to record their voices but had some trouble getting Snowboy to work. Precise would be excellent.
Thanks for all the exciting work you and your team are doing!


Maybe this tool could also add the option to tag the voice as male or female, and even whether it is an adult or a child talking. This feature could be very useful for prioritizing the female samples in the Precise tagger tool.

Also, imagine we configure Mycroft to know the names, ages and genders of the members of our family, and the assistant is then able to guess who it is talking with and personalize its answers (calendar events, music preferences, …).


Thanks for the suggestion, @Jorge. One of the things we have to be careful of is putting privacy first. The more we allow tagging of gender, and age etc, the more we would need to also ensure the privacy of the person who provided the sample.

In terms of prioritizing the gender of samples, it’s really about getting more diversity of samples - women, children, different accents, different situations. That way Precise is trained on a large sample set.


Now that the Precise tagger has moved to a different URL, does anyone know how I can keep my existing ranking on the leaderboard? I am no longer recognised on the new site (something to do with cookies, I believe).


Found out that there is a code which you copy and paste into a text box on the new site, then hit the “Import Points” button.


Hello, I found that even if I select another wake word (I use “Hey Domo”), the audio files are still for the “Hey Mycroft” wake word.
I think it should choose which audio files to use depending on the wake word, shouldn’t it?


@piretro99 - Right now we’re focused on getting the first 100,000 samples squared away for “Hey, Mycroft”, then we’ll work with the community to select the next wakeword to focus on.

Once we’ve got a good flow going with the first few wakewords, we’ll open up the system for any wakeword. In the meantime, we’d love some help with “Hey, Mycroft”.


@J_Montgomery_Mycroft Can I suggest deactivating the possibility of tagging the other ones? I find it confusing.


Hi there @piretro999, do you mean that the drop-down SELECT element is distracting? “Hey Mycroft” is the default wake word for training there; you don’t have to do anything to change or select the wake word.


@KathyReid Hi, yes, it is distracting: if we do not have to select it, why is it there with so many options?
Of course I tried to find my “Hey Domo”, and I found it, but what about the results I posted with that selection? Were they useful at all, or was I just wasting my time? (I guess the latter.) And now we are still losing time discussing whether they were useful or not.

It seems the classical “Do not press this button” :wink:


Hi everybody, I am (going to be) a new user, and maybe a contributor. I have pledged on Kickstarter, and I am following the development. I may try the Plasma desktop version while I wait for my new toy.

I have done some tagging now, around 2000 tags in a couple of weeks, a few hundred at a time.

I have a few doubts about categorization, especially the “almost/duplicate” category:

The description says to use it for “Mycroft” or “Mycroft, hey Mycroft” kinds of phrases. I took this as an example to generalize from, but I am not sure how far I should go.

Is “Say, Mycroft” a “not wake word” or an “almost” word?
What about “Hey Mike”?

Another doubt is about the positives:

I have classified as positives the instances where it was apparent the user was trying to wake Mycroft, but the pronunciation was a bit off: for example, many times I heard “Hey, Mycrof” (no “t” phoneme audible), maybe because the clip was cut a bit short, or because the speaker was not saying it. In another case, the speaker used a different pronunciation (Hey, Mycròft), stressing the o instead of the y.

So I guess I would like some clarification for my reference in the future.


My colleague @Wolfgange is best placed to respond here.


I imagine that I’m a multibillionaire who has hired a butler named Mycroft. “Yes” means I’d expect my butler to respond; “no” means he should not. “Maybe” is where there is enough confusion that I wouldn’t fire him either way.

This means I’ll accept some bad pronunciations, so long as it is obvious that my butler was intended to respond. I might have guests from a foreign country where English is not native and they can’t pronounce the sounds correctly.

This also means that “when I say hey mycroft the device…” is a no: even though “hey mycroft” is in there, the context says to ignore it.


Thank you, that was helpful.