For Those Having Trouble Training a New Wake Word

Hello friends!

After much collaboration with my friend El-Tocino and others, I’ve written up a guide to getting your custom wake-word trained, building on the available wiki entries and articles from others. Surely, I’m standing on the shoulders of giants here, so I humbly submit this guide for the use of anyone who was having as much trouble getting a custom wake-word to work as I was.

I hope this helps someone, and I appreciate feedback and suggestions.

Thank you for the write-up. I'll give it a go this week; I've been putting it off for years! PocketSphinx really wasn't that bad for me once I tuned its sensitivity.

Yes, but finding the correct levels is frustrating. My main 'Mycroft' is on a large Linux server, and its name and wake word is Genesis. However, Genesis is much more sensitive to my wife's voice, and to add to the difficulty, my wife's name is Janice, so depending on the voice level there can be some confusion.

Sounds like you need to add a bunch more not-wake-words of “Janice”.

I’m finally getting around to trying this. Thank you @sparkyvision !

I’m stuck trying to run setup.sh. I’m getting:

ERROR: This script does not work on Python 3.6 The minimum supported Python version is 3.7. Please use https://bootstrap.pypa.io/pip/3.6/get-pip.py instead.

Any ideas? I’m using an Ubuntu 18.04 virtual machine, per the instructions.

Here’s some more context:

user@mycroft-trainer:~/mycroft-precise$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:        18.04
Codename:       bionic
user@mycroft-trainer:~/mycroft-precise$ time ./setup.sh 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
cython is already the newest version (0.26.1-0.4).
libatlas-base-dev is already the newest version (3.10.3-5).
libhdf5-dev is already the newest version (1.10.0-patch1+docs-4).
libopenblas-dev is already the newest version (0.2.20+ds-4).
portaudio19-dev is already the newest version (19.6.0-1).
python3-h5py is already the newest version (2.7.1-2).
python3-scipy is already the newest version (0.19.1-2ubuntu1).
swig is already the newest version (3.0.12-1).
curl is already the newest version (7.58.0-2ubuntu3.18).
libpulse-dev is already the newest version (1:11.1-1ubuntu7.11).
python3-pip is already the newest version (9.0.1-2.3~ubuntu1.18.04.5).
0 upgraded, 0 newly installed, 0 to remove and 23 not upgraded.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2617k  100 2617k    0     0  29.7M      0 --:--:-- --:--:-- --:--:-- 29.7M
ERROR: This script does not work on Python 3.6 The minimum supported Python version is 3.7. Please use https://bootstrap.pypa.io/pip/3.6/get-pip.py instead.

real    0m0.416s
user    0m0.286s
sys     0m0.088s

This seemed to get things working again:

sudo apt install python3.7 python3.7-dev
sudo ln -fs python3.7 /usr/bin/python3
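
If you'd rather not overwrite the /usr/bin/python3 symlink directly (which can upset other system tooling), update-alternatives is a gentler way to switch. A rough sketch, assuming both interpreters live in /usr/bin; the priority numbers are arbitrary:

sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 2
sudo update-alternatives --config python3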

It doesn't really work like that: the not-wake-word samples of "Janice" are not individual elements, they just become part of the overall graph of the not-wake-word label.
The model still reacts because the wake-word label has more in common with "Janice" than the other not-wake-words do, which are substantially different.
When you're training and you see your model climb steeply and quickly in accuracy, it doesn't mean it is accurate. It just means the model can distinguish between your KW and the other labels; it is 'overfitted' and will likely be prone to false positives and negatives even though it reports near 100% accuracy.
Every item you add to a single label shifts the spectral bias of that label a fraction, and adding Janice could lower the score of the spectra of other words.

If you add another 'sounds like' label containing samples that are not in 'not kw', the 'sounds like' label has more cross-entropy with the KW. You will notice your training accuracy follows a more gradual slope and training will likely take longer, but you will get fewer false positives and negatives.
This works with any type of KW model, and you can further split 'sounds like' into start and end 'sounds likes'. Generally, models with a low label count seem very prone to overfitting; adding labels creates more work in data prep, but no additional running load, just more accuracy.

MLCommons has a huge word dataset: Multilingual Spoken Words Corpus - 50 Languages and Over 23 Million Audio Keyword Examples - MLCommons. The idea of just adding more 'not kw' is a bit 'infinity & beyond' otherwise.

I could refer you to the previously written docs you’ve ignored and/or decried that cover all this but you’re you, so carry on doing…whatever…it is you do.

Why not post something worthwhile and link the docs you mention?

I’ve trained a custom wake word, but the precise-test output looks wrong (everything is a true negative) and precise-listen isn’t able to detect me saying the wake word. Here’s the test output:

Data: <TrainData wake_words=421 not_wake_words=52678 test_wake_words=102 test_not_wake_words=13151>
=== False Positives ===


=== False Negatives ===


=== Counts ===
False Positives: 0
True Negatives: 12986
False Negatives: 0
True Positives: 0


=== Summary ===
12986 out of 12986
100.00%

0.00% false positives
0.00% false negatives

Any ideas? I created, converted, and trimmed silence with quite a few wake-word and not-wake-word recordings. I've tried precise-train about a dozen times, using invocations suggested by @sparkyvision and @baconator.

If I stop training after 6 epochs I get different test output, but I still don’t get a model that can detect the wake word.

I’m also asking for help in chat in the “general” channel–I’ve posted more details there.

I am not really sure, but I am a bit of a TensorFlow geek, and Precise is 100% TensorFlow; it is just packaged and branded as Mycroft Precise, but it is like any other TensorFlow model. Don't take my opinion for granted: go on the TensorFlow forum or check the TensorFlow documentation, because so much here is basically wrong.

wake_words=421 to not_wake_words=52678 is hugely imbalanced to start with.
421 really is far too few wake_words; it should likely have a 0 on the end of it, and for balance it should match the not_wake_words count.
As I was saying about TensorFlow models in general, it doesn't matter whether you use a GRU or some of the latest and greatest models; you are only talking a few % difference in accuracy, and the biggest contributor to accuracy is the dataset you use.

Also, you stopped after 6 epochs, which means the full dataset has only been trained on 6 times; that is far too short for training any sort of KWS. I regularly train my own TensorFlow models to around the 200-epoch mark.

To create some more KW samples, use sox to augment what you have, or concatenate 2 words so your KW quantity becomes kw1 × kw2. 'Hey' is part of the Common Voice single-word segment.
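
For the concatenation idea, a simple sox loop is enough, since sox with two input files just joins them end to end (the directory names here are only an example):

mkdir -p wake-word
for a in hey/*.wav; do
  for b in mycroft/*.wav; do
    # two inputs, one output = concatenation
    sox "$a" "$b" "wake-word/$(basename "$a" .wav)-$(basename "$b" .wav).wav"
  done
done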

Going back to this far too binary setup: KW vs not-KW as the only 'labels' or 'classes' (whatever terminology you wish to use) is never going to be great, because just pouring extra not-KW into a single label turns it into a see-saw of spectra with little cross-entropy against the KW.
What I have seen written about training and setup here is just absolutely bizarre compared to how TensorFlow works.

To put it simplistically, you are setting up two spectral graphs and TensorFlow drops your input into one or the other. With a binary setup like this you often get false positives not because the input is particularly close to the KW, but because it is just far away from not-KW.
Here, and with Precise generally, there has been a complete misconception of how TensorFlow works, how TensorFlow says models should be built, and how TensorFlow advises 'class'/'label' balance should be treated, and it has strangely been regurgitated for years.

A few members have previously tried to rectify some of the flaws: mycroft-precise-tips/README.md at main · sparky-vision/mycroft-precise-tips · GitHub
It is never going to be great, though, with a binary model where the KW is vastly overfitted and the not-KW is equally vastly underfitted because it contains so much variance.

But do not take my word for it; read up on TensorFlow, as Precise is 100% TensorFlow. Apologies that you have spent so much time creating that hugely imbalanced dataset; there is not much more that I can say.

Thanks for your reply. I’ll try re-training with fewer not-wake-word samples. Heck, I’ve come this far, why not?

98% of the not-wake-word samples are from the Google Speech Commands dataset, so thankfully I didn't waste too much time there.

The training & test data shape was recommended in @sparkyvision's mycroft-precise-tips you linked to. Note where the author claims ~51k not-wake-words and 300 wake words (they also recommend against using sox). @baconator's data shape sounds similar, see localcroft/Precise.md at master · el-tocino/localcroft · GitHub. (Dang, suddenly I'm hungry for bacon.)

Hopefully I’m using the phrase “data shape” correctly. When I say that I’m referring to the balance–the number of wake-word and not-wake-word samples.

What you are recommending (roughly the same number of wake-word and not-wake-word samples) is way different so I’d love to understand why. Unfortunately I’m pretty much lost when you mention kw / not_kw, labels, class, binary, spectra graphs, cross entropy, over-fit, under-fit. I get bits and pieces of what you wrote, but this just really isn’t my field. I guess I’ll need to read up on Tensorflow as you recommend.

The precise-test output I included was after 150 epochs. I tried stopping after 6 just as an experiment because acc and val_acc were hitting 1.000 and staying there. I also tried 300 epochs and a few different invocations of precise-train. I haven’t been able to train a model that works.

Here’s yet another try without google speech commands samples. Everything is still coming back as true negatives.

Training:

$ precise-train custom-hey-mycroft.net data/ -e 150
Data: <TrainData wake_words=421 not_wake_words=885 test_wake_words=102 test_not_wake_words=217>
Loading wake-word...

       
Loading not-wake-word...

       
Loading wake-word...

       
Loading not-wake-word...

       
Inputs shape: (620, 29, 13)
Outputs shape: (620, 1)
Test inputs shape: (217, 29, 13)
Test outputs shape: (217, 1)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
net (GRU)                    (None, 20)                2040      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 21        
=================================================================
Total params: 2,061
Trainable params: 2,061
Non-trainable params: 0
_________________________________________________________________
Train on 620 samples, validate on 217 samples
Epoch 1/150

620/620 [==============================] - 0s 370us/step - loss: 0.5040 - acc: 0.6597 - val_loss: 0.2961 - val_acc: 0.8479
Epoch 2/150

620/620 [==============================] - 0s 38us/step - loss: 0.3600 - acc: 0.7726 - val_loss: 0.2368 - val_acc: 0.9171
Epoch 3/150

620/620 [==============================] - 0s 38us/step - loss: 0.3130 - acc: 0.8226 - val_loss: 0.2005 - val_acc: 0.9447
Epoch 4/150

620/620 [==============================] - 0s 40us/step - loss: 0.2709 - acc: 0.8516 - val_loss: 0.1739 - val_acc: 0.9631
Epoch 5/150

620/620 [==============================] - 0s 39us/step - loss: 0.2492 - acc: 0.8710 - val_loss: 0.1538 - val_acc: 0.9677
Epoch 6/150

620/620 [==============================] - 0s 39us/step - loss: 0.2329 - acc: 0.8774 - val_loss: 0.1382 - val_acc: 0.9770
Epoch 7/150

620/620 [==============================] - 0s 39us/step - loss: 0.2070 - acc: 0.9129 - val_loss: 0.1252 - val_acc: 0.9770
Epoch 8/150

620/620 [==============================] - 0s 39us/step - loss: 0.1939 - acc: 0.9161 - val_loss: 0.1139 - val_acc: 0.9816
Epoch 9/150

620/620 [==============================] - 0s 39us/step - loss: 0.1862 - acc: 0.9113 - val_loss: 0.1043 - val_acc: 0.9816
Epoch 10/150

620/620 [==============================] - 0s 39us/step - loss: 0.1660 - acc: 0.9177 - val_loss: 0.0967 - val_acc: 0.9816
Epoch 11/150

620/620 [==============================] - 0s 40us/step - loss: 0.1610 - acc: 0.9419 - val_loss: 0.0894 - val_acc: 0.9816
Epoch 12/150

620/620 [==============================] - 0s 39us/step - loss: 0.1785 - acc: 0.9016 - val_loss: 0.0829 - val_acc: 0.9816
Epoch 13/150

620/620 [==============================] - 0s 39us/step - loss: 0.1425 - acc: 0.9387 - val_loss: 0.0776 - val_acc: 0.9816
Epoch 14/150

620/620 [==============================] - 0s 40us/step - loss: 0.1483 - acc: 0.9323 - val_loss: 0.0728 - val_acc: 0.9862
Epoch 15/150

620/620 [==============================] - 0s 39us/step - loss: 0.1506 - acc: 0.9258 - val_loss: 0.0680 - val_acc: 0.9862
Epoch 16/150

620/620 [==============================] - 0s 39us/step - loss: 0.1386 - acc: 0.9419 - val_loss: 0.0636 - val_acc: 0.9862
Epoch 17/150

620/620 [==============================] - 0s 41us/step - loss: 0.1133 - acc: 0.9613 - val_loss: 0.0591 - val_acc: 0.9862
Epoch 18/150

620/620 [==============================] - 0s 39us/step - loss: 0.1186 - acc: 0.9468 - val_loss: 0.0558 - val_acc: 0.9908
Epoch 19/150

620/620 [==============================] - 0s 39us/step - loss: 0.1226 - acc: 0.9484 - val_loss: 0.0527 - val_acc: 0.9908
Epoch 20/150

620/620 [==============================] - 0s 39us/step - loss: 0.1223 - acc: 0.9500 - val_loss: 0.0500 - val_acc: 0.9908
Epoch 21/150

620/620 [==============================] - 0s 40us/step - loss: 0.1102 - acc: 0.9532 - val_loss: 0.0473 - val_acc: 0.9908
Epoch 22/150

620/620 [==============================] - 0s 40us/step - loss: 0.1043 - acc: 0.9613 - val_loss: 0.0450 - val_acc: 0.9908
Epoch 23/150

620/620 [==============================] - 0s 39us/step - loss: 0.1005 - acc: 0.9548 - val_loss: 0.0427 - val_acc: 0.9908
Epoch 24/150

620/620 [==============================] - 0s 40us/step - loss: 0.1054 - acc: 0.9500 - val_loss: 0.0407 - val_acc: 0.9908
Epoch 25/150

620/620 [==============================] - 0s 39us/step - loss: 0.1088 - acc: 0.9468 - val_loss: 0.0387 - val_acc: 0.9908
Epoch 26/150

620/620 [==============================] - 0s 40us/step - loss: 0.0952 - acc: 0.9484 - val_loss: 0.0370 - val_acc: 0.9908
Epoch 27/150

620/620 [==============================] - 0s 42us/step - loss: 0.0843 - acc: 0.9645 - val_loss: 0.0354 - val_acc: 0.9908
Epoch 28/150

620/620 [==============================] - 0s 39us/step - loss: 0.0851 - acc: 0.9710 - val_loss: 0.0338 - val_acc: 0.9908
Epoch 29/150

620/620 [==============================] - 0s 40us/step - loss: 0.0824 - acc: 0.9613 - val_loss: 0.0320 - val_acc: 0.9908
Epoch 30/150

620/620 [==============================] - 0s 39us/step - loss: 0.0855 - acc: 0.9694 - val_loss: 0.0305 - val_acc: 0.9908
Epoch 31/150

620/620 [==============================] - 0s 41us/step - loss: 0.0826 - acc: 0.9613 - val_loss: 0.0291 - val_acc: 0.9908
Epoch 32/150

620/620 [==============================] - 0s 39us/step - loss: 0.0721 - acc: 0.9710 - val_loss: 0.0276 - val_acc: 0.9908
Epoch 33/150

620/620 [==============================] - 0s 39us/step - loss: 0.0832 - acc: 0.9661 - val_loss: 0.0264 - val_acc: 0.9908
Epoch 34/150

620/620 [==============================] - 0s 40us/step - loss: 0.0659 - acc: 0.9726 - val_loss: 0.0254 - val_acc: 0.9908
Epoch 35/150

620/620 [==============================] - 0s 39us/step - loss: 0.0657 - acc: 0.9726 - val_loss: 0.0242 - val_acc: 0.9954
Epoch 36/150

620/620 [==============================] - 0s 39us/step - loss: 0.0677 - acc: 0.9677 - val_loss: 0.0232 - val_acc: 0.9954
Epoch 37/150

620/620 [==============================] - 0s 40us/step - loss: 0.0584 - acc: 0.9839 - val_loss: 0.0222 - val_acc: 0.9954
Epoch 38/150

620/620 [==============================] - 0s 39us/step - loss: 0.0631 - acc: 0.9758 - val_loss: 0.0213 - val_acc: 0.9954
Epoch 39/150

620/620 [==============================] - 0s 39us/step - loss: 0.0576 - acc: 0.9790 - val_loss: 0.0202 - val_acc: 0.9954
Epoch 40/150

620/620 [==============================] - 0s 40us/step - loss: 0.0578 - acc: 0.9726 - val_loss: 0.0193 - val_acc: 0.9954
Epoch 41/150

620/620 [==============================] - 0s 41us/step - loss: 0.0598 - acc: 0.9774 - val_loss: 0.0185 - val_acc: 0.9954
Epoch 42/150

620/620 [==============================] - 0s 39us/step - loss: 0.0505 - acc: 0.9806 - val_loss: 0.0177 - val_acc: 0.9954
Epoch 43/150

620/620 [==============================] - 0s 39us/step - loss: 0.0590 - acc: 0.9823 - val_loss: 0.0169 - val_acc: 1.0000
Epoch 44/150

620/620 [==============================] - 0s 41us/step - loss: 0.0618 - acc: 0.9758 - val_loss: 0.0162 - val_acc: 1.0000
Epoch 45/150

620/620 [==============================] - 0s 39us/step - loss: 0.0630 - acc: 0.9694 - val_loss: 0.0156 - val_acc: 1.0000
Epoch 46/150

620/620 [==============================] - 0s 39us/step - loss: 0.0482 - acc: 0.9839 - val_loss: 0.0148 - val_acc: 1.0000
Epoch 47/150

620/620 [==============================] - 0s 40us/step - loss: 0.0384 - acc: 0.9871 - val_loss: 0.0142 - val_acc: 1.0000
Epoch 48/150

620/620 [==============================] - 0s 40us/step - loss: 0.0474 - acc: 0.9790 - val_loss: 0.0138 - val_acc: 1.0000
Epoch 49/150

620/620 [==============================] - 0s 39us/step - loss: 0.0421 - acc: 0.9887 - val_loss: 0.0132 - val_acc: 1.0000
Epoch 50/150

620/620 [==============================] - 0s 40us/step - loss: 0.0442 - acc: 0.9855 - val_loss: 0.0125 - val_acc: 1.0000
Epoch 51/150

620/620 [==============================] - 0s 40us/step - loss: 0.0490 - acc: 0.9855 - val_loss: 0.0119 - val_acc: 1.0000
Epoch 52/150

620/620 [==============================] - 0s 39us/step - loss: 0.0402 - acc: 0.9887 - val_loss: 0.0114 - val_acc: 1.0000
Epoch 53/150

620/620 [==============================] - 0s 41us/step - loss: 0.0415 - acc: 0.9871 - val_loss: 0.0109 - val_acc: 1.0000
Epoch 54/150

620/620 [==============================] - 0s 39us/step - loss: 0.0301 - acc: 0.9903 - val_loss: 0.0103 - val_acc: 1.0000
Epoch 55/150

620/620 [==============================] - 0s 39us/step - loss: 0.0373 - acc: 0.9855 - val_loss: 0.0099 - val_acc: 1.0000
Epoch 56/150

620/620 [==============================] - 0s 40us/step - loss: 0.0406 - acc: 0.9823 - val_loss: 0.0096 - val_acc: 1.0000
Epoch 57/150

620/620 [==============================] - 0s 40us/step - loss: 0.0379 - acc: 0.9887 - val_loss: 0.0092 - val_acc: 1.0000
Epoch 58/150

620/620 [==============================] - 0s 39us/step - loss: 0.0359 - acc: 0.9903 - val_loss: 0.0088 - val_acc: 1.0000
Epoch 59/150

620/620 [==============================] - 0s 39us/step - loss: 0.0341 - acc: 0.9919 - val_loss: 0.0083 - val_acc: 1.0000
Epoch 60/150

620/620 [==============================] - 0s 41us/step - loss: 0.0354 - acc: 0.9935 - val_loss: 0.0078 - val_acc: 1.0000
Epoch 61/150

620/620 [==============================] - 0s 39us/step - loss: 0.0296 - acc: 0.9887 - val_loss: 0.0076 - val_acc: 1.0000
Epoch 62/150

620/620 [==============================] - 0s 39us/step - loss: 0.0350 - acc: 0.9823 - val_loss: 0.0072 - val_acc: 1.0000
Epoch 63/150

620/620 [==============================] - 0s 40us/step - loss: 0.0320 - acc: 0.9855 - val_loss: 0.0069 - val_acc: 1.0000
Epoch 64/150

620/620 [==============================] - 0s 41us/step - loss: 0.0315 - acc: 0.9903 - val_loss: 0.0066 - val_acc: 1.0000
Epoch 65/150

620/620 [==============================] - 0s 40us/step - loss: 0.0278 - acc: 0.9903 - val_loss: 0.0063 - val_acc: 1.0000
Epoch 66/150

620/620 [==============================] - 0s 41us/step - loss: 0.0265 - acc: 0.9919 - val_loss: 0.0059 - val_acc: 1.0000
Epoch 67/150

620/620 [==============================] - 0s 40us/step - loss: 0.0293 - acc: 0.9903 - val_loss: 0.0056 - val_acc: 1.0000
Epoch 68/150

620/620 [==============================] - 0s 40us/step - loss: 0.0226 - acc: 0.9935 - val_loss: 0.0054 - val_acc: 1.0000
Epoch 69/150

620/620 [==============================] - 0s 40us/step - loss: 0.0201 - acc: 0.9935 - val_loss: 0.0051 - val_acc: 1.0000
Epoch 70/150

620/620 [==============================] - 0s 40us/step - loss: 0.0212 - acc: 0.9887 - val_loss: 0.0050 - val_acc: 1.0000
Epoch 71/150

620/620 [==============================] - 0s 39us/step - loss: 0.0248 - acc: 0.9919 - val_loss: 0.0047 - val_acc: 1.0000
Epoch 72/150

620/620 [==============================] - 0s 40us/step - loss: 0.0225 - acc: 0.9952 - val_loss: 0.0044 - val_acc: 1.0000
Epoch 73/150

620/620 [==============================] - 0s 41us/step - loss: 0.0157 - acc: 0.9935 - val_loss: 0.0043 - val_acc: 1.0000
Epoch 74/150

620/620 [==============================] - 0s 40us/step - loss: 0.0187 - acc: 0.9919 - val_loss: 0.0041 - val_acc: 1.0000
Epoch 75/150

620/620 [==============================] - 0s 39us/step - loss: 0.0161 - acc: 0.9968 - val_loss: 0.0040 - val_acc: 1.0000
Epoch 76/150

620/620 [==============================] - 0s 40us/step - loss: 0.0249 - acc: 0.9903 - val_loss: 0.0038 - val_acc: 1.0000
Epoch 77/150

620/620 [==============================] - 0s 40us/step - loss: 0.0140 - acc: 0.9984 - val_loss: 0.0036 - val_acc: 1.0000
Epoch 78/150

620/620 [==============================] - 0s 40us/step - loss: 0.0208 - acc: 0.9935 - val_loss: 0.0035 - val_acc: 1.0000
Epoch 79/150

620/620 [==============================] - 0s 41us/step - loss: 0.0171 - acc: 0.9952 - val_loss: 0.0033 - val_acc: 1.0000
Epoch 80/150

620/620 [==============================] - 0s 40us/step - loss: 0.0177 - acc: 0.9935 - val_loss: 0.0031 - val_acc: 1.0000
Epoch 81/150

620/620 [==============================] - 0s 39us/step - loss: 0.0149 - acc: 0.9984 - val_loss: 0.0030 - val_acc: 1.0000
Epoch 82/150

620/620 [==============================] - 0s 39us/step - loss: 0.0132 - acc: 0.9952 - val_loss: 0.0029 - val_acc: 1.0000
Epoch 83/150

620/620 [==============================] - 0s 40us/step - loss: 0.0115 - acc: 1.0000 - val_loss: 0.0028 - val_acc: 1.0000
Epoch 84/150

620/620 [==============================] - 0s 39us/step - loss: 0.0101 - acc: 0.9984 - val_loss: 0.0026 - val_acc: 1.0000
Epoch 85/150

620/620 [==============================] - 0s 40us/step - loss: 0.0105 - acc: 1.0000 - val_loss: 0.0025 - val_acc: 1.0000
Epoch 86/150

620/620 [==============================] - 0s 43us/step - loss: 0.0149 - acc: 0.9984 - val_loss: 0.0024 - val_acc: 1.0000
Epoch 87/150

620/620 [==============================] - 0s 40us/step - loss: 0.0137 - acc: 0.9952 - val_loss: 0.0023 - val_acc: 1.0000
Epoch 88/150

620/620 [==============================] - 0s 40us/step - loss: 0.0080 - acc: 0.9984 - val_loss: 0.0022 - val_acc: 1.0000
Epoch 89/150

620/620 [==============================] - 0s 41us/step - loss: 0.0131 - acc: 0.9968 - val_loss: 0.0021 - val_acc: 1.0000
Epoch 90/150

620/620 [==============================] - 0s 40us/step - loss: 0.0105 - acc: 0.9968 - val_loss: 0.0020 - val_acc: 1.0000
Epoch 91/150

620/620 [==============================] - 0s 40us/step - loss: 0.0107 - acc: 0.9952 - val_loss: 0.0020 - val_acc: 1.0000
Epoch 92/150

620/620 [==============================] - 0s 40us/step - loss: 0.0097 - acc: 1.0000 - val_loss: 0.0019 - val_acc: 1.0000
Epoch 93/150

620/620 [==============================] - 0s 39us/step - loss: 0.0057 - acc: 1.0000 - val_loss: 0.0018 - val_acc: 1.0000
Epoch 94/150

620/620 [==============================] - 0s 39us/step - loss: 0.0069 - acc: 0.9984 - val_loss: 0.0017 - val_acc: 1.0000
Epoch 95/150

620/620 [==============================] - 0s 39us/step - loss: 0.0065 - acc: 1.0000 - val_loss: 0.0016 - val_acc: 1.0000
Epoch 96/150

620/620 [==============================] - 0s 40us/step - loss: 0.0081 - acc: 0.9984 - val_loss: 0.0016 - val_acc: 1.0000
Epoch 97/150

620/620 [==============================] - 0s 39us/step - loss: 0.0098 - acc: 1.0000 - val_loss: 0.0015 - val_acc: 1.0000
Epoch 98/150

620/620 [==============================] - 0s 39us/step - loss: 0.0055 - acc: 1.0000 - val_loss: 0.0014 - val_acc: 1.0000
Epoch 99/150

620/620 [==============================] - 0s 41us/step - loss: 0.0072 - acc: 1.0000 - val_loss: 0.0013 - val_acc: 1.0000
Epoch 100/150

620/620 [==============================] - 0s 39us/step - loss: 0.0054 - acc: 0.9984 - val_loss: 0.0013 - val_acc: 1.0000
Epoch 101/150

620/620 [==============================] - 0s 39us/step - loss: 0.0086 - acc: 1.0000 - val_loss: 0.0012 - val_acc: 1.0000
Epoch 102/150

620/620 [==============================] - 0s 40us/step - loss: 0.0078 - acc: 0.9984 - val_loss: 0.0011 - val_acc: 1.0000
Epoch 103/150

620/620 [==============================] - 0s 40us/step - loss: 0.0058 - acc: 1.0000 - val_loss: 0.0010 - val_acc: 1.0000
Epoch 104/150

620/620 [==============================] - 0s 40us/step - loss: 0.0039 - acc: 0.9968 - val_loss: 9.8235e-04 - val_acc: 1.0000
Epoch 105/150

620/620 [==============================] - 0s 40us/step - loss: 0.0039 - acc: 1.0000 - val_loss: 9.3719e-04 - val_acc: 1.0000
Epoch 106/150

620/620 [==============================] - 0s 40us/step - loss: 0.0030 - acc: 1.0000 - val_loss: 8.9144e-04 - val_acc: 1.0000
Epoch 107/150

620/620 [==============================] - 0s 40us/step - loss: 0.0044 - acc: 0.9984 - val_loss: 8.3908e-04 - val_acc: 1.0000
Epoch 108/150

620/620 [==============================] - 0s 39us/step - loss: 0.0030 - acc: 1.0000 - val_loss: 7.9770e-04 - val_acc: 1.0000
Epoch 109/150

620/620 [==============================] - 0s 40us/step - loss: 0.0036 - acc: 1.0000 - val_loss: 7.4814e-04 - val_acc: 1.0000
Epoch 110/150

620/620 [==============================] - 0s 39us/step - loss: 0.0055 - acc: 0.9984 - val_loss: 6.9113e-04 - val_acc: 1.0000
Epoch 111/150

620/620 [==============================] - 0s 40us/step - loss: 0.0024 - acc: 1.0000 - val_loss: 6.4932e-04 - val_acc: 1.0000
Epoch 112/150

620/620 [==============================] - 0s 40us/step - loss: 0.0035 - acc: 1.0000 - val_loss: 6.0726e-04 - val_acc: 1.0000
Epoch 113/150

620/620 [==============================] - 0s 40us/step - loss: 0.0042 - acc: 0.9984 - val_loss: 5.8828e-04 - val_acc: 1.0000
Epoch 114/150

620/620 [==============================] - 0s 39us/step - loss: 0.0049 - acc: 1.0000 - val_loss: 5.3590e-04 - val_acc: 1.0000
Epoch 115/150

620/620 [==============================] - 0s 39us/step - loss: 0.0063 - acc: 0.9968 - val_loss: 5.1206e-04 - val_acc: 1.0000
Epoch 116/150

620/620 [==============================] - 0s 39us/step - loss: 0.0037 - acc: 1.0000 - val_loss: 4.6828e-04 - val_acc: 1.0000
Epoch 117/150

620/620 [==============================] - 0s 39us/step - loss: 0.0022 - acc: 1.0000 - val_loss: 4.3992e-04 - val_acc: 1.0000
Epoch 118/150

620/620 [==============================] - 0s 40us/step - loss: 0.0022 - acc: 1.0000 - val_loss: 4.0873e-04 - val_acc: 1.0000
Epoch 119/150

620/620 [==============================] - 0s 40us/step - loss: 0.0026 - acc: 0.9984 - val_loss: 3.5682e-04 - val_acc: 1.0000
Epoch 120/150

620/620 [==============================] - 0s 40us/step - loss: 0.0023 - acc: 1.0000 - val_loss: 3.1697e-04 - val_acc: 1.0000
Epoch 121/150

620/620 [==============================] - 0s 38us/step - loss: 0.0011 - acc: 1.0000 - val_loss: 3.0181e-04 - val_acc: 1.0000
Epoch 122/150

620/620 [==============================] - 0s 38us/step - loss: 0.0015 - acc: 1.0000 - val_loss: 2.8702e-04 - val_acc: 1.0000
Epoch 123/150

620/620 [==============================] - 0s 39us/step - loss: 0.0037 - acc: 0.9984 - val_loss: 2.6929e-04 - val_acc: 1.0000
Epoch 124/150

620/620 [==============================] - 0s 38us/step - loss: 0.0022 - acc: 1.0000 - val_loss: 2.3655e-04 - val_acc: 1.0000
Epoch 125/150

620/620 [==============================] - 0s 40us/step - loss: 0.0035 - acc: 0.9984 - val_loss: 2.2102e-04 - val_acc: 1.0000
Epoch 126/150

620/620 [==============================] - 0s 39us/step - loss: 0.0030 - acc: 1.0000 - val_loss: 1.9317e-04 - val_acc: 1.0000
Epoch 127/150

620/620 [==============================] - 0s 39us/step - loss: 4.2609e-04 - acc: 1.0000 - val_loss: 1.8849e-04 - val_acc: 1.0000
Epoch 128/150

620/620 [==============================] - 0s 40us/step - loss: 0.0020 - acc: 1.0000 - val_loss: 1.7967e-04 - val_acc: 1.0000
Epoch 129/150

620/620 [==============================] - 0s 39us/step - loss: 0.0019 - acc: 1.0000 - val_loss: 1.7138e-04 - val_acc: 1.0000
Epoch 130/150

620/620 [==============================] - 0s 40us/step - loss: 0.0017 - acc: 1.0000 - val_loss: 1.6300e-04 - val_acc: 1.0000
Epoch 131/150

620/620 [==============================] - 0s 40us/step - loss: 0.0015 - acc: 1.0000 - val_loss: 1.5270e-04 - val_acc: 1.0000
Epoch 132/150

620/620 [==============================] - 0s 40us/step - loss: 0.0013 - acc: 1.0000 - val_loss: 1.4148e-04 - val_acc: 1.0000
Epoch 133/150

620/620 [==============================] - 0s 39us/step - loss: 9.3766e-04 - acc: 1.0000 - val_loss: 1.3763e-04 - val_acc: 1.0000
Epoch 134/150

620/620 [==============================] - 0s 39us/step - loss: 3.7757e-04 - acc: 1.0000 - val_loss: 1.3148e-04 - val_acc: 1.0000
Epoch 135/150

620/620 [==============================] - 0s 40us/step - loss: 0.0014 - acc: 1.0000 - val_loss: 1.2069e-04 - val_acc: 1.0000
Epoch 136/150

620/620 [==============================] - 0s 40us/step - loss: 9.2301e-04 - acc: 1.0000 - val_loss: 1.1077e-04 - val_acc: 1.0000
Epoch 137/150

620/620 [==============================] - 0s 38us/step - loss: 9.0810e-04 - acc: 1.0000 - val_loss: 1.0398e-04 - val_acc: 1.0000
Epoch 138/150

620/620 [==============================] - 0s 39us/step - loss: 0.0014 - acc: 1.0000 - val_loss: 8.3460e-05 - val_acc: 1.0000
Epoch 139/150

620/620 [==============================] - 0s 40us/step - loss: 0.0012 - acc: 1.0000 - val_loss: 7.9932e-05 - val_acc: 1.0000
Epoch 140/150

620/620 [==============================] - 0s 40us/step - loss: 0.0019 - acc: 1.0000 - val_loss: 7.8661e-05 - val_acc: 1.0000
Epoch 141/150

620/620 [==============================] - 0s 38us/step - loss: 0.0017 - acc: 1.0000 - val_loss: 7.4887e-05 - val_acc: 1.0000
Epoch 142/150

620/620 [==============================] - 0s 39us/step - loss: 0.0011 - acc: 1.0000 - val_loss: 7.3465e-05 - val_acc: 1.0000
Epoch 143/150

620/620 [==============================] - 0s 39us/step - loss: 1.4882e-04 - acc: 1.0000 - val_loss: 7.2583e-05 - val_acc: 1.0000
Epoch 144/150

620/620 [==============================] - 0s 39us/step - loss: 5.4213e-04 - acc: 1.0000 - val_loss: 7.0473e-05 - val_acc: 1.0000
Epoch 145/150

620/620 [==============================] - 0s 39us/step - loss: 0.0025 - acc: 0.9984 - val_loss: 7.2666e-05 - val_acc: 1.0000
Epoch 146/150

620/620 [==============================] - 0s 40us/step - loss: 0.0020 - acc: 1.0000 - val_loss: 7.1411e-05 - val_acc: 1.0000
Epoch 147/150

620/620 [==============================] - 0s 40us/step - loss: 0.0020 - acc: 1.0000 - val_loss: 6.6285e-05 - val_acc: 1.0000
Epoch 148/150

620/620 [==============================] - 0s 40us/step - loss: 0.0019 - acc: 1.0000 - val_loss: 6.3273e-05 - val_acc: 1.0000
Epoch 149/150

620/620 [==============================] - 0s 40us/step - loss: 4.8341e-04 - acc: 1.0000 - val_loss: 6.1387e-05 - val_acc: 1.0000
Epoch 150/150

620/620 [==============================] - 0s 41us/step - loss: 0.0015 - acc: 1.0000 - val_loss: 5.5593e-05 - val_acc: 1.0000

Testing:

$ precise-test custom-hey-mycroft.net data/
Loading wake-word...

       
Loading not-wake-word...

       
Data: <TrainData wake_words=421 not_wake_words=885 test_wake_words=102 test_not_wake_words=217>
=== False Positives ===


=== False Negatives ===


=== Counts ===
False Positives: 0
True Negatives: 217
False Negatives: 0
True Positives: 0


=== Summary ===
217 out of 217
100.00%

0.00% false positives
0.00% false negatives

I would be more inclined to use sox to augment your KW samples and make more of those, rather than the other way around.
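
For example (assuming your KW clips live in a wake-word/ directory; the values are only illustrative, and sox has plenty of other effects worth experimenting with):

for f in wake-word/*.wav; do
  base=$(basename "$f" .wav)
  sox "$f" "wake-word/$base-slow.wav" tempo 0.9    # slightly slower
  sox "$f" "wake-word/$base-low.wav" pitch -100    # roughly a semitone lower
  sox "$f" "wake-word/$base-quiet.wav" vol 0.7     # quieter
done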

If you Google it, you can cope with imbalance (Classification on imbalanced data | TensorFlow Core), but I don't think Precise employs such mechanisms.

The only reason I am saying this is that splitting not-KW into noise (non-voice) and not-KW (voice) is simple and easy, just by adding the noise 'label'/'class' to the model.
As I started to say, even with that I have always still struggled with overfitting the KW. Two 'sounds like' labels, for the start and the end of your KW, can balance things further; they have much more cross-entropy, which forces the model to work much harder in training and ends with a far more accurate model in use.

I would download Multilingual Spoken Words | MLCommons for your language and take a few samples of as many words as you can, picking words with a similar syllable count and length, as words much longer or shorter are not really that much use.

Dunno, it must be Precise or your setup, but if you want to really get to grips with KWS: TensorFlow is by Google, and they publish a repo specifically dedicated to KWS.

You could learn much there, but purely down to the way it handles its dataset, Precise is always going to be oddly named.

PS: what is your KW, and have you played any of it back to hear exactly what you have?

Are you certain that your files are in the exact right format? Precise-train is extremely particular about the format, samples, mono files, etc. I’ve had issues where it would ignore huge chunks of files because they were in stereo.
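
If you need to force everything into one shape, sox can do the conversion in a single pass per file. A rough sketch, assuming 16 kHz / 16-bit / mono is what your setup expects:

mkdir -p converted
for f in *.wav; do
  sox "$f" -r 16000 -b 16 -c 1 "converted/$f"
done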

The fact that it's completely failing points to something other than the auditory quality of your data, and to something more fundamental.

Edit: additionally, you’re getting extremely high acc values very quickly, which smells wrong to me.

I'm assuming KW means "keyword", the thing I'm trying to detect. In Mycroft land it seems like they use the term "wake word".

I’m trying to train a model to recognize “hey mycroft” – that’s my wake word. The included/default Precise model for “hey mycroft” only detects my voice (white male) and fails to activate for many other people that try (other genders, ages, etc).

And I assume KWS is “keyword spotting”? Sigh, this is super overwhelming. I’m just a hobbyist with plenty else to do.

Have I spot-checked my data to listen to it to check that it sounds as I’d expect? Yes.

I think so. The file command says every one of my wav files is RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz, and ffprobe says they all contain Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s.

The fact that it's completely failing points to something other than the auditory quality of your data, and to something more fundamental.

Agreed.

Edit: additionally, you’re getting extremely high acc values very quickly, which smells wrong to me.

Agreed.

Dunno, it is strange, but regarding 'Hey Mycroft': if Mycroft actually released that as a dataset it would make your life easier, as you would have a datum, and you could just add to it rather than having to start from scratch.

Apologies if it's all been a bit much; my additions are merely out of curiosity as to why Precise training sets itself up to be less than it could be by applying a binary-class model.
So ignore all that; I'm sure someone can help.

PS: install Audacity and have a look at what you have in comparison to your not-KW. Something is not right, but you know that.

Are they .riff files? I believe Precise expects the extension .wav. (I realize riff is just a container format for a variety of data.)

I’ve never seen riff come up in format discussions before, so I’m sort of singling that out as sus, as the kiddos say.

You’re welcome to upload some of your data somewhere and I can try to look at it when I get a chance.

Edit: I’m just a hobbyist too, don’t worry. If I can get it, you can get it. There’s just something goofy going on here.

That’s a good idea, I’ll give that a shot!

install Audacity and have a look at what you have in comparison to your not-KW. Something is not right, but you know that

Yeah, I did the recordings in audacity in the first place (following sparkyvision’s handy guide) and tried to clean them up as much as possible before training. I also trimmed silence at the beginnings and ends of the samples I made.
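
For anyone who'd rather do that trimming from the command line, sox has a silence effect that does roughly the same thing; the threshold values below are guesses that will need tuning to your recording setup:

sox in.wav out.wav silence 1 0.1 1% reverse silence 1 0.1 1% reverse

The reverse / trim / reverse dance is just so trailing silence gets stripped as well as leading silence.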


So I tried precise-test hey-mycroft.pb data/ to test the default/included/built-in wake word model against my test data set. This looks better, and matches my experience with the default model missing some wake word activations (at least, when the voice sounds like mine).

I can’t train though, I guess precise-train can’t work on hey-mycroft.pb?

$ precise-train hey-mycroft.pb data/
Using TensorFlow backend.
Loading from hey-mycroft.pb...
Warning: Unknown model type,  hey-mycroft.pb
Traceback (most recent call last):
  File "/home/user/mycroft-precise/.venv/bin/precise-train", line 33, in <module>
    sys.exit(load_entry_point('mycroft-precise', 'console_scripts', 'precise-train')())
  File "/home/user/mycroft-precise/precise/scripts/base_script.py", line 43, in run_main
    script = cls(args)
  File "/home/user/mycroft-precise/precise/scripts/train.py", line 87, in __init__
    self.model = create_model(args.model, params)
  File "/home/user/mycroft-precise/precise/model.py", line 70, in create_model
    model = load_precise_model(model_name)
  File "/home/user/mycroft-precise/precise/model.py", line 54, in load_precise_model
    return load_keras().models.load_model(model_name)
  File "/home/user/mycroft-precise/.venv/lib/python3.7/site-packages/keras/models.py", line 237, in load_model
    with h5py.File(filepath, mode='r') as f:
  File "/home/user/mycroft-precise/.venv/lib/python3.7/site-packages/h5py/_hl/files.py", line 408, in __init__
    swmr=swmr)
  File "/home/user/mycroft-precise/.venv/lib/python3.7/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

Is there an easy way to convert this .pb file to a .net file that works with precise-train? I searched around and tried a few methods and none worked for me.