Amount of training data for custom wake word



I’m currently training my own custom wake word (WW) and I’m facing some issues:

1- Words other than the wake word are being recognized.

First, I trained the WW model using only the Public Domain Sounds Backup (PDSB) as negative examples (100 files) and 12 files containing the wake word as positive examples. This led the model to trigger on almost any sound that contained a voice, because the PDSB set is a collection of noises, not speech. Then I added 18 more positive examples and the recognition got slightly better. Then I added 25 negative examples containing words that sound similar to my WW. The model got much better but still recognizes many words that are not the intended WW.
Since the model is slowly getting better, I’m guessing that it is just a matter of more training data. I would like to know if I’m on the right path, or if the training should work much better with less data and I’m just training it wrong.

2- The model only recognizes the WW when it is pronounced very close to the mic.

I tried different mics, and the WW ‘hey mycroft’ works really well even from afar. My model only works (if at all) when the person is within 0.5 m of the mic. When I was recording the audio samples, I asked people to say the word about 5 cm from the mic. Could this be the reason? Is there a way to change the sensitivity of the model from the code that trains it? I’ve read the code but couldn’t find any parameter for that. Maybe I missed it?

It is my first time training an ML model, so any further information about this process is highly appreciated.


For 1) yes, more data, particularly more not-wake-word (nww) data (try for maybe 5:1 nww:ww). For nww samples, using rhyming words and similar-sounding words is a great idea.
For 2) record samples in a variety of ways. If all your ww samples are high-volume, clear recordings, that’s what it’s going to recognize. I used a bunch of different samples. Local wake word recording can also be turned on; those samples (good and bad) can then be sorted and used for training.
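
To make the 5:1 suggestion concrete, here is a rough sketch of the arithmetic using the counts from the original post (30 ww samples, 125 nww samples). The helper name `nww_needed` is made up for illustration and is not part of any wake word tool; the 5:1 target is only the rule of thumb mentioned above, not a hard requirement.

```python
import math


def nww_needed(ww_count, nww_count, target_ratio=5.0):
    """Return how many more not-wake-word samples are needed
    to reach target_ratio:1 (nww:ww). Hypothetical helper."""
    required = math.ceil(ww_count * target_ratio)
    return max(0, required - nww_count)


# Counts taken from the original post.
ww = 12 + 18      # wake word recordings
nww = 100 + 25    # PDSB noise files + similar-sounding words

print(f"current ratio: {nww / ww:.2f}:1")                   # current ratio: 4.17:1
print(f"extra nww samples for 5:1: {nww_needed(ww, nww)}")  # extra nww samples for 5:1: 25
```

Keep in mind that only 25 of those 125 nww files are spoken words, so the count alone probably overstates how well the negative set covers speech. Every new ww recording also raises the number of nww samples needed to hold the same ratio.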


Hi Consetto,

Hope the training is going well. In case you hadn’t seen it in our docs, baconator (aka el-tocino) wrote up a number of their learnings that might be helpful: