For Those Having Trouble Training a New Wake Word

mycroftfan · May 17, 2022, 2:16am

All the sample files have a .wav extension.

I’ve never seen riff come up in format discussions before, so I’m sort of singling that out as sus, as the kiddos say.

I used ffmpeg in the same Ubuntu 18.04 VM I’ve been using for training and testing.

You’re welcome to upload some of your data somewhere and I can try to look at it when I get a chance.

Oh sure, awesome! Ok, I’ll message you a link.

Edit: I’m just a hobbyist too, don’t worry. If I can get it, you can get it. There’s just something goofy going on here.

Thanks, I needed this!

StuartIanNaylor · May 17, 2022, 6:33am

You can try these hey-marvin 10k KW
https://drive.google.com/file/d/1v8xfbj7bILcHn8KkTSs6G-twd3glnsbo/view?usp=sharing

hey-marvin2 100k KW
https://drive.google.com/file/d/1lm-wKZAYIGKQxtm2t6Vkrlv935ccJJfP/view?usp=sharing

Might not be the KW you want but gives you a datum to get used to training.

notkw-10k
https://drive.google.com/file/d/1V1rMgZKTUItZbzat_ZfDArg493kCkL0B/view?usp=sharing

notkw 100k
https://drive.google.com/file/d/1rxdzAS1t49tYvv8GkdwOzvamhdu3XMIC/view?usp=sharing

util-scripts
https://drive.google.com/file/d/18DMNdrwBUXw3lgDysoarfgw94rublGGx/view?usp=sharing

https://storage.googleapis.com/public-datasets-mswc/audio/en.tar.gz

JarbasAl · May 17, 2022, 2:09pm

For starters you should read the Precise tips from el-tocino, one of the first guides, which has been used and validated by the community over the years.

https://github.com/el-tocino/localcroft/blob/master/precise/Precise.md

The [precise page](https://github.com/MycroftAI/mycroft-precise/wiki/Training-your-own-wake-word#how-to-train-your-own-wake-word) has good instructions to get you started.  

#### data

You need more data. 

You should pick a wakeword with at least three syllables, or two words making three or more syllables, preferably one that does not have a lot of similar-sounding rhymes.

The more data you can collect, the better, up to about 50k samples.  I've collected over 400 total wake word and about 5000 fake word samples (including generated sounds).  If you're using local uploads, you can review those and add them to your dataset.  Once you have collected your data, try to have an 80/20 training/test split, i.e., for 100 clips, 80 go to the wakewords folder and 20 go to the test/wakewords folder.  In a ten minute span, I can, using precise-collect, record about 75 prepared words.

Slight update: The google speech commands dataset v0.2 can also be pulled down and used to supplement your not-wake-words.  This contains almost 100k samples.  When adding this to my current data, training slows down quite a bit.  Accuracy and val_acc both improve, I'm seeing val-acc reaching .999+ routinely now.  

Having a base of clean wake word samples to start with seems to work best. It is important that your core data be sourced as much as possible from your target audience.  From there it's a matter of testing to see what is best to model.  Precise modeling runs quickly, even on a cpu, so don't be afraid to start over a few times and try things. 

##### wake words

I've recorded myself quite a bit for my wake word.  I vary speed, inflection, volume, distance from mic, tone, etc. I have gotten about a dozen other folks to record samples for my model as well.  Your target audience is where you should be sourcing most of your data from.  When recording, I've used a variety of mics.  I have a cheap small diaphragm condenser that hits a usb preamp, a cheap usb mic, and a few on a PS Eye.  This isn't necessary, it's just been a matter of what was handy.

I have only recently started recording with noisy backgrounds.  Will update if I get better info.


Recently we got these cool repos. I haven’t tried them yet, but I know of people having great results with these

and

mycroftfan · May 19, 2022, 1:52am

What are these? What are you suggesting? Sorry, I’m totally lost.

mycroftfan · May 19, 2022, 1:55am

Thanks @JarbasAl . I’ve read el tocino’s guide and I have sunk a bunch of time into my training data already. I guess I’ll check those out if I just can’t get training to work with my existing data.

StuartIanNaylor · May 19, 2022, 2:06am

Ready-made KW & !KW, 10k of each, then the same sets but 100k of each for good measure.
They are already formatted and just need training; whether you use the 1st or 2nd pair is your choice.
All wavs are from MLCommons. If they don’t train and behave the same as last time, then it’s your setup, not your dataset.
If they do train, then it’s your dataset, and you should get an idea of how a reasonable set should train.
They are not Big Data good, but they are as good a ready-made binary dataset, minus noise, as you will find.

mycroftfan · May 24, 2022, 10:31pm

I used these. I put 80% in wake-word and 20% in test/wake-word.

I used these and did a similar 80/20 split between not-wake-word and test/not-wake-word.
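For anyone repeating this, here is a minimal Python sketch of that 80/20 split. The function name `split_wavs`, the `raw/...` source folders in the usage note, and the fixed random seed are my own assumptions for illustration; only the `wake-word`, `not-wake-word` and `test/...` folder names come from the Precise layout used in this thread. It copies rather than moves, so the original files stay untouched.

```python
import random
import shutil
from pathlib import Path


def split_wavs(src_dir, dataset_dir, label, test_fraction=0.2, seed=0):
    """Copy wavs from src_dir into dataset_dir/<label> and dataset_dir/test/<label>.

    label is "wake-word" or "not-wake-word". Returns (n_train, n_test).
    """
    wavs = sorted(Path(src_dir).glob("*.wav"))
    # Shuffle with a fixed seed so the split is random but repeatable.
    random.Random(seed).shuffle(wavs)
    n_test = int(len(wavs) * test_fraction)

    train_dir = Path(dataset_dir) / label
    test_dir = Path(dataset_dir) / "test" / label
    train_dir.mkdir(parents=True, exist_ok=True)
    test_dir.mkdir(parents=True, exist_ok=True)

    # The first n_test shuffled files go to test, the rest to training.
    for i, wav in enumerate(wavs):
        dest = test_dir if i < n_test else train_dir
        shutil.copy2(wav, dest / wav.name)
    return len(wavs) - n_test, n_test
```

Usage would be something like `split_wavs("raw/hey-marvin", "hey-marvin", "wake-word")` followed by `split_wavs("raw/notkw", "hey-marvin", "not-wake-word")`, where the `raw/` paths are placeholders for wherever the archives were unpacked.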

Worked. Perfectly. Dang! I guess the problem is with my data. Thank you! Back to the drawing board.

$ precise-test hey-marvin.net hey-marvin/ 2>/dev/null 
Loading wake-word...
Loading not-wake-word...
Data: <TrainData wake_words=8000 not_wake_words=9108 test_wake_words=2000 test_not_wake_words=2277>
=== False Positives ===
hey-marvin/test/not-wake-word/2819015076embroideredcommon_voice_en_18751486.wav
hey-marvin/test/not-wake-word/2827423258theobaldcommon_voice_en_18984515.wav
hey-marvin/test/not-wake-word/2606411025indianapoliscommon_voice_en_19116959.wav

=== False Negatives ===
hey-marvin/test/wake-word/c913d758-e6d2-4a35-b930-e0ebd11656f1cats.wav
hey-marvin/test/wake-word/d367a887-994e-449d-987e-22b36a5099dbcatstb.wav

=== Counts ===
False Positives: 3
True Negatives: 2274
False Negatives: 2
True Positives: 1998


=== Summary ===
4272 out of 4277
99.88%

0.13% false positives
0.10% false negatives

mycroftfan · May 24, 2022, 11:00pm

Found one problem: the wav files must all be in the top-level directories (wake-word, not-wake-word, test/wake-word, test/not-wake-word) during training. I must have missed that in one of the guides!
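In case it helps someone else, a rough Python sketch for flattening a dataset that has wavs nested in subfolders. The helper names `flatten_label_dir` and `flatten_dataset` are made up for this example, and prefixing clashing file names with the parent folder name is just one possible choice, not something any of the guides prescribe.

```python
import shutil
from pathlib import Path

# The four folders Precise reads during training and testing (per this thread).
LABEL_DIRS = ("wake-word", "not-wake-word", "test/wake-word", "test/not-wake-word")


def flatten_label_dir(label_dir):
    """Move wavs found in subfolders of label_dir up into label_dir itself.

    Returns the number of files moved.
    """
    label_dir = Path(label_dir)
    moved = 0
    # sorted() materialises the listing before any file is moved.
    for wav in sorted(label_dir.rglob("*.wav")):
        if wav.parent == label_dir:
            continue  # already at the top level
        target = label_dir / wav.name
        if target.exists():
            # Name clash: prefix with the subfolder name (illustrative choice).
            target = label_dir / f"{wav.parent.name}-{wav.name}"
        if target.exists():
            raise FileExistsError(target)
        shutil.move(str(wav), str(target))
        moved += 1
    return moved


def flatten_dataset(dataset_dir):
    """Flatten every label folder that exists; returns {folder: files_moved}."""
    dataset_dir = Path(dataset_dir)
    return {
        d: flatten_label_dir(dataset_dir / d)
        for d in LABEL_DIRS
        if (dataset_dir / d).is_dir()
    }
```

Calling `flatten_dataset("custom-hey-mycroft")` would then report how many files were pulled up in each folder. Run it on a copy of the data first, since it moves files in place.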

This seems like a way better starting point for my training / data cleanup:

$ precise-test custom-hey-mycroft.net custom-hey-mycroft/ 2>/dev/null 
Loading wake-word...
Loading not-wake-word...
Data: <TrainData wake_words=421 not_wake_words=885 test_wake_words=102 test_not_wake_words=217>
=== False Positives ===
custom-hey-mycroft/test/not-wake-word/room-noise-03-46.wav
custom-hey-mycroft/test/not-wake-word/a-01_S-25.wav
custom-hey-mycroft/test/not-wake-word/e-01_S-4.wav
custom-hey-mycroft/test/not-wake-word/room-noise-03-20.wav
custom-hey-mycroft/test/not-wake-word/m-01_S-12.wav
custom-hey-mycroft/test/not-wake-word/m-01_S-47.wav
custom-hey-mycroft/test/not-wake-word/room-noise-03-48.wav

=== False Negatives ===
custom-hey-mycroft/test/wake-word/a-02_S-11.wav
custom-hey-mycroft/test/wake-word/a-01_S.wav
custom-hey-mycroft/test/wake-word/d-02_S-23.wav
custom-hey-mycroft/test/wake-word/a-02_S-10.wav
custom-hey-mycroft/test/wake-word/a-02_S-36.wav
custom-hey-mycroft/test/wake-word/m-01_S-37.wav
custom-hey-mycroft/test/wake-word/a-02_S-26.wav
custom-hey-mycroft/test/wake-word/a-01_S-11.wav
custom-hey-mycroft/test/wake-word/m-02_S-10.wav
custom-hey-mycroft/test/wake-word/a-01_S-54.wav
custom-hey-mycroft/test/wake-word/m-02_S-4.wav
custom-hey-mycroft/test/wake-word/m-02_S-12.wav
custom-hey-mycroft/test/wake-word/a-01_S-4.wav

=== Counts ===
False Positives: 7
True Negatives: 210
False Negatives: 13
True Positives: 89


=== Summary ===
299 out of 319
93.73%

3.23% false positives
12.75% false negatives
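For reference, the summary figures can be reproduced from the counts. The formulas below are inferred from the numbers in the two precise-test outputs in this thread (they match), not taken from the Precise source, so treat them as an assumption; `summarize` is a name made up for this sketch.

```python
def summarize(tp, tn, fp, fn):
    """Recompute the precise-test style summary from the four counts."""
    total = tp + tn + fp + fn
    correct = tp + tn
    return {
        "correct": correct,
        "total": total,
        # Share of all test clips classified correctly.
        "accuracy_pct": round(100 * correct / total, 2),
        # False positives as a share of all not-wake-word test clips.
        "false_positive_pct": round(100 * fp / (fp + tn), 2),
        # False negatives as a share of all wake-word test clips.
        "false_negative_pct": round(100 * fn / (fn + tp), 2),
    }


# Counts copied from the hey-marvin and custom-hey-mycroft runs above.
print(summarize(tp=1998, tn=2274, fp=3, fn=2))
print(summarize(tp=89, tn=210, fp=7, fn=13))
```

Note that the two error percentages have different denominators: 7 false positives out of 217 not-wake-word clips is 3.23%, while 13 false negatives out of only 102 wake-word clips is 12.75%.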

precise-listen doesn’t work yet (it doesn’t seem to recognize when I say the wake word).

Update: adding all the speech_commands_v0.01 helped a bit… I am at “99.9%” with fewer false positives and fewer false negatives. precise-listen works sometimes! I’m a little suspicious of this USB mic through a VM setup, so I’m going to try this fresh model on my actual picroft.

StuartIanNaylor · May 24, 2022, 11:47pm

You probably need to turn on precise-collect so you can play back the KW it received, which is a bit hard when it’s not receiving any, I guess.
Maybe it has options to give the max amplitude and average of the incoming wav; I don’t know, as I’m not a Precise fan and don’t use it.

PS: stop using Audacity and manual editing; pip install sox and check the pysox 1.4.2 documentation, as your latest results are not good.
Maybe being forced to do it programmatically, making sure length, format, sample rate and levels are right and not full of distortion, may highlight possible problems more?
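As a rough illustration of that kind of programmatic check, here is a sketch that uses only Python’s standard `wave` module rather than pysox, so it reads and reports but does not convert anything. The expected format (16 kHz, mono, 16-bit) is taken from the arecord settings mentioned in this thread; the function name `inspect_wav` and the clipping and silence thresholds are assumptions for the example.

```python
import array
import sys
import wave


def inspect_wav(path, rate=16000, channels=1, sample_width=2):
    """Return (info, problems) for one wav file.

    info holds rate, channels, sample_width (bytes), seconds and, for
    16-bit audio, peak as a fraction of full scale.
    """
    with wave.open(str(path), "rb") as wf:
        info = {
            "rate": wf.getframerate(),
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),
            "seconds": wf.getnframes() / wf.getframerate(),
        }
        frames = wf.readframes(wf.getnframes())

    problems = []
    if info["rate"] != rate:
        problems.append(f"sample rate {info['rate']} != {rate}")
    if info["channels"] != channels:
        problems.append(f"channels {info['channels']} != {channels}")
    if info["sample_width"] != sample_width:
        problems.append(f"sample width {info['sample_width']} != {sample_width}")

    if info["sample_width"] == 2:
        # wav samples are little-endian signed 16-bit integers.
        samples = array.array("h", frames)
        if sys.byteorder == "big":
            samples.byteswap()
        peak = max((abs(s) for s in samples), default=0)
        info["peak"] = peak / 32768
        if peak >= 32767:
            problems.append("peak at full scale, possible clipping")
        if peak == 0:
            problems.append("file is silent")
    return info, problems
```

Looping this over every file in `wake-word/` and printing only the files with a non-empty `problems` list gives a quick sanity pass. `wave.open` raises `wave.Error` for files it cannot parse, which is itself a useful signal that a file is not a plain PCM wav.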

You can always connect to your vm and arecord -D plughw:1 -V mono -r 16000 -f S16_LE -c 1 /dev/null or whatever device index is your mic and the -V will display a VU meter on the cli. It doesn’t record anything but is just a level check.

PS VM or container?

Use docker ps to get the name of the existing container
Use the command docker exec -it <container name> /bin/bash to get a bash shell in the container

Why Mycroft doesn’t have a high-quality preformatted dataset available, a gap hey-marvin filled here, is a mystery to me, but at least it served its purpose.
Maybe start again but switch and make a hey-marvin by adding more samples from family members.

I did a rough and ready recording ‘boutique’ a while back: StuartIanNaylor/Dataset-builder on GitHub (KWS dataset builder for Google-streaming-kws).
That prompts on screen and the code there for augmentation and stuff will give you a load of tips.

Also just the text sentences to grab !kw from are as concise as you can get as they are ‘phonetic pangrams’ of nonsense sentences that have just about every phone & allophone in a single sentence.

mycroftfan · May 25, 2022, 12:29am

Not good how?

VM. I followed sparky-vision/mycroft-precise-tips on GitHub, which recommends Audacity over sox.

Seriously? Argggghhhh.

StuartIanNaylor · May 25, 2022, 12:43am

=== Summary ===
299 out of 319
93.73%

3.23% false positives
12.75% false negatives

It’s probably me being a KWS snob, but for me the above results are atrocious; even on the dataset I posted I would still be drilling down and cleaning at 99.88%.
The KW were the only remnants of a hard-drive mishap (formatting the wrong one and losing loads of work); otherwise I would have had a much cleaner !kw than just grabbing a selection from MLCommons for you, but hey. Likely on my own models with my own dataset I would get 100%, which actually means very little, as you’re feeding it what it has just been trained on, so it should be near 100% because it has heard those already. In a real-life environment, with conditions and input it hasn’t been trained on, things will just get worse.

Still, I guess it doesn’t matter, as with a binary model such as Precise, when you ran Hey-Marvin KW & !KW it was just voice samples and the cross entropy was quite high, so the input KW had to be quite close to the KW label.
Because any single label has no memory of any individual elements, just a graph of its input, adding !voice will lower the cross entropy and move further away from KW, making it less accurate.
It’s the catch-22 of a binary model such as this: trying to place so much variance in a single label and so little variance in another.

So if you are happy with that, go with the flow and just find out what signal Precise is getting from your mic; if that is good enough for you, it’s good enough for you.
Sparky recommends recording via Audacity, but to be honest, when you are hand-manipulating that quantity of files you are just bound to get something wrong, and that is where sox comes in for post-processing and augmentation.
I am still kicking myself for losing all my scripts, as it’s somehow more painful to develop them a 2nd time even though I can remember the gist.
I am making a model of my own in the next couple of days and will be starting with those scripts again, so I will share them when I have them done; I’ve been sulking and dodging it.

StuartIanNaylor · May 26, 2022, 11:51pm

I’m just adding those bits as I go along. Try the trim script from ProjectEars/dataset (main branch of StuartIanNaylor/ProjectEars on GitHub) as a test on your KW dataset.
Also, hopefully by the time you look there will be an augmentation folder as well.