Some of my female family members often have problems activating Mycroft (I believe this is a well-known issue), so I wanted to train the default hey-mycroft model on their voices using Precise.
I stumbled over a few problems and now basically have two questions.
First of all, is it currently possible to install Precise at all? I don’t know, maybe I’m just missing something, but I couldn’t install Precise properly on various machines (Ubuntu x86, Ubuntu ARM, Raspberry Pi 3 and 4). Some of the time some commands worked (precise-collect and precise-listen), but I could never get all of them to work properly.
And installing one of the community skills for training the wakeword hasn’t been working for me either, but I haven’t yet tried very hard.
Secondly, as far as I understand (I have no experience at all with AI, neural networks, etc.), Precise can only continue training from a .net file (a Keras model), not from a TensorFlow .pb one. Is there any other well-trained .net file available apart from the ones in this repo? When I ran precise-listen with the hey-mycroft.train.net from this repo, the results were considerably worse than with the default hey-mycroft.pb model.
Thanks in advance for your help!
You can find some more models in
Yes, thanks for the hint👍!
For those who aren’t happy with “hey mycroft” this will be a great resource, but I don’t want to use a different wake word (they work worse than the default one, at least for me), and I also don’t want to train my own from scratch. Instead I want to continue training the default “hey mycroft” model with my own recordings, so that it understands my family’s voices better. Is this even possible? I assumed it is, because there are some skills that aim to provide this functionality (GitHub - MycroftAI/mycroft-precise-trainer: A skill to train a precise model for a specific voice (WIP) and GitHub - gras64/wake-word-skill: Just train a new wakeword), and there is even a blog post about exactly the functionality I’m looking for (Hey Mycroft, Listen to Me! - Mycroft).
Make sure you are signed up to donate your samples, first off. Second, for the interim, you can try the trainer skill to see if it helps. Third, you might actually end up better off having the family record a bunch of samples for the wake word you want and uploading them to the community repo, and we can model that for you, as well.
Last, there’s a beta model that has some more-specific sampling done to try and improve responsiveness for non-average-male voices. I don’t have a link handy, but one of the mycroft guys should be along in a day or so with that if you want to try it.
I am having the exact same issue as above … linux-stable-Mycroft … doesn’t recognize my voice (60+ female) … “Hey Mycroft” works every time if I use a recording of my husband’s voice saying it on my iPhone.
I’ve opted in to the open dataset … but since it never recognizes me saying “Hey Mycroft”, is it capturing my voice at all? (And since I am trying to sound more like my husband in my attempts to get it to recognize me, I’m not sure I want it to capture all the times it’s not my normal voice.)
I’ve ordered a dev Mark II (May) … I was hoping to get this worked out before it arrives … or will the Mark II be significantly different with regard to Precise and wake words?
You suggested recording our own voice samples and uploading them … could we do that for “Hey Mycroft”?
There’s a beta model you can try using: precise-data/hey-mycroft-001 at production_models · MycroftAI/precise-data · GitHub
You’ll have to download and configure it manually. It’s supposed to work better with non-average-male voices.
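For anyone following along, the manual configuration happens in `mycroft.conf`. A sketch of what the hotword entry could look like (the file path is just an example for your own setup, and the sensitivity/trigger values are only starting points to tune):

```json
{
  "hotwords": {
    "hey mycroft": {
      "module": "precise",
      "local_model_file": "/home/pi/.mycroft/precise/hey-mycroft.pb",
      "sensitivity": 0.5,
      "trigger_level": 3
    }
  }
}
```

After saving the config, restart the Mycroft services so the listener loads the new model.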
I haven’t tried fine-tuning a hey-mycroft model ever, so I can’t speak to that. The precise tagger should also be coming back soon-ish, which will start to help as well.
Hopefully Precise will get a TFLite makeover, as Google Research have published not only how to do it but a framework to run and create your own models.
Precise’s GRU is in there, but for streaming KWS a CRNN gives greater accuracy for fewer ops and less latency, which would be my choice; just about every applicable state-of-the-art model for KWS is in the above repo with working examples.
On ARM, tensorflow-addons can be a pain to install, but as an intro to the great work above, GitHub - StuartIanNaylor/g-kws: Adaption of the Googleresearch kws repo gives install info and also contains a few simple scripts for using the trained models.
As for models and training, I am often confused by the methods prescribed by others, because when it comes to variation in classification data, less is very much more accurate.
If you are going to train for one or two voices, then use data only from those voices, not some command set from the other side of the world, unless those speakers are coming to your house to use your Mycroft.
I have read some crazy ideas where people just feed large amounts of random data of anything and everything into the not-keyword class. That is akin to advocating the IT mantra of ‘garbage in’ as a good idea; with models the mantra very much holds true, and you will likely get ‘garbage out’.
Often that noise is mixed into KW and !KW without any consideration of the resulting volumes. Noise should always be mixed at a lower volume than the classification data; otherwise it is no longer classification data but purely noise.
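To make the volume point concrete, here is a minimal sketch of mixing noise into a sample at a reduced gain (my own illustration with NumPy, not code from Precise or g-kws):

```python
import numpy as np

def mix_noise(sample: np.ndarray, noise: np.ndarray,
              noise_gain: float = 0.3) -> np.ndarray:
    """Mix background noise into a float audio sample (range -1..1).

    noise_gain < 1.0 keeps the noise quieter than the classification
    data, so the keyword still dominates the mixed clip.
    """
    # Tile the noise and trim it so it matches the sample length.
    reps = -(-len(sample) // len(noise))  # ceiling division
    tiled = np.tile(noise, reps)[: len(sample)]
    mixed = sample + noise_gain * tiled
    # Clip to the valid float-audio range to avoid wrap-around on export.
    return np.clip(mixed, -1.0, 1.0)
```

With `noise_gain` around 0.1 to 0.3 the keyword stays dominant; push it toward 1.0 and your “keyword” clips are really just noise.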
Also, for some reason the ‘Google command set’ is often used, even though not one of the voices it contains will ever use your Mycroft. Worse, the ‘Google command set’ is a benchmark dataset that deliberately contains a high proportion of bad and varied data. No model will ever hit 100% on it, and that’s the point: since there is no such thing as higher than 100% accuracy, you need a hard dataset for accuracy benchmarks, which is exactly what the ‘Google command set’ provides.
Google use the ‘command set’ to benchmark accuracy, but I am pretty damn sure it isn’t the dataset they use for their range of voice AI.
It’s actually really easy to create custom datasets, and you should use actual data captured on the device you will use, not third-party imported datasets, as there is nothing more accurate than training with your own voice.
Precise is long in the tooth and extremely heavy, as it runs full TensorFlow, while TFLite can run the same models nearly 10x faster.
I really do suggest the devs have a look at Google-KWS, as it’s end to end and just needs to be fed a chunked audio feed, since even the MFCC is embedded in the model.
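For anyone wondering what a “chunked audio feed” means in practice: a streaming model just consumes fixed-size frames of PCM one after another, keeping its own state between calls. A minimal sketch (illustrative only, not the actual g-kws code; the `model.predict_on_chunk` call is hypothetical):

```python
import numpy as np

def chunk_stream(samples: np.ndarray, chunk_size: int = 320):
    """Yield fixed-size, non-overlapping chunks of a mono audio signal.

    320 samples is 20 ms at 16 kHz; a streaming KWS model consumes one
    chunk per step and carries its internal state between chunks.
    """
    for start in range(0, len(samples) - chunk_size + 1, chunk_size):
        yield samples[start:start + chunk_size]

# Each chunk would be handed straight to the streaming model, e.g.:
# for chunk in chunk_stream(mic_buffer):
#     score = model.predict_on_chunk(chunk)  # hypothetical model call
```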
Just wanted to share my experience with this model and give some short feedback.
I’ve now used this model on 3 devices throughout my home for 4 months and it’s great! Mycroft recognizes the female voices almost as well as the male voices (way better than before; with the default model Mycroft was kind of unusable for them), and it has reduced the number of false activations a little.
For me, a sensitivity of 0.3 and a trigger level of 6 have worked best, if anyone wants to try those values.
Just one question: why isn’t this model the default for Mycroft? Does it have any hidden disadvantages?
Thanks again for the hint @baconator