How to collect samples to improve wake word accuracy?

Faruk · January 27, 2021, 9:08am

Hello,

I am very new to the speech world. I am working on an offline speech recognition project, and I use mycroft-precise for the wake word. I have some specific questions. To briefly summarize what I am doing:

I am doing offline speech recognition on a Raspberry Pi 3, with mycroft-precise for the wake word. I created a two-word wake word in Turkish: “Mega yedi”, which is “Mega seven” in English. I followed the steps in the mycroft-precise GitHub repo. I recorded about 60 .wav files for the wake word, plus recordings for the not-wake-word set. The wake word works with about 85% accuracy, so sometimes I have to repeat it a few times, and it does not work with a woman's voice (I tested with one woman and she had to repeat the wake word four times). I want to solve these problems, so I will collect new voice samples of the wake word from different people and retrain mycroft-precise.
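One way to make the "85% accuracy" and "not working with a woman's voice" observations more concrete is to log every test utterance and compute a detection rate per speaker group, rather than judging from a single tester. This is a minimal sketch that assumes a hypothetical hand-written log of (group, detected) pairs; it is not part of mycroft-precise, and the group labels are only examples.

```python
from collections import defaultdict


def detection_rates(trials):
    """Return {group: fraction of wake word utterances that triggered}.

    trials is a hypothetical log of (speaker_group, detected) pairs,
    one pair per time someone said the wake word during testing.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, detected in trials:
        totals[group] += 1
        if detected:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this thread.
    trials = (
        [("male", True)] * 17 + [("male", False)] * 3
        + [("female", True)] + [("female", False)] * 3
    )
    print(detection_rates(trials))
```

Testing with several speakers per group makes a gap like this visible before and after retraining.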

So:
1- How much data should I collect? (I can collect recordings of this wake word from about 20 people. Is that enough?)
2- How many times should each of these 20 people say the wake word?
3- I searched some datasets, but many of the voice files do not match the recording conditions for the wake word: the word should be said 1.5 to 2 seconds after the recording begins. Do you know of any Turkish datasets that meet these conditions?
4- Would it be useful to record the words “mega” and “yedi” separately and train mycroft-precise on them?
5- Can wake word accuracy be improved with samples generated by text-to-speech services such as AWS Polly? After all, these are not human voices.
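For question 3, a quick way to screen third-party recordings against the 1.5 to 2 second lead-in condition is to find where sound starts in each file. This is a rough energy-based sketch, not a real voice activity detector. It assumes 16-bit mono PCM WAV (the format mycroft-precise training audio is usually recorded in) and a hypothetical -35 dBFS threshold that you would need to tune for noisy recordings.

```python
import array
import math
import sys
import wave


def speech_onset_seconds(path, frame_ms=20, threshold_db=-35.0):
    """Return the start time (s) of the first frame whose RMS level exceeds
    threshold_db relative to full scale, or None if no frame does.
    Assumes 16-bit mono PCM; the threshold is a hypothetical default."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    frame_len = max(1, rate * frame_ms // 1000)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = math.sqrt(sum(s * s for s in frame) / len(frame))
        if level > 0 and 20 * math.log10(level / 32768) > threshold_db:
            return start / rate
    return None


def has_expected_lead_in(path, min_s=1.5, max_s=2.0):
    """True if sound starts between min_s and max_s seconds into the file."""
    onset = speech_onset_seconds(path)
    return onset is not None and min_s <= onset <= max_s


if __name__ == "__main__":
    for wav_path in sys.argv[1:]:
        print(wav_path, speech_onset_seconds(wav_path), has_expected_lead_in(wav_path))
```

Files that fail the check could be trimmed or padded with silence instead of discarded, as long as the padding does not add clicks.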

Thanks for your time.

gras64 · January 28, 2021, 8:09am

Hi, it’s still a bit early for public use, but I’m working on a wake word skill (https://github.com/gras64/wake-word-skill/tree/new_exe). It checks your wake word and adds background noise. It should work on a Pi 3, but I haven’t tested it enough yet.
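Adding background noise, as the skill above does, is a common way to augment wake word recordings. Below is a generic sketch of mixing noise into a clean recording at a chosen signal-to-noise ratio, using SNR in dB = 20·log10(rms_clean / rms_noise). It is not taken from wake-word-skill, and it assumes both signals are already loaded as lists of 16-bit sample values at the same sample rate.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that 20*log10(rms(clean) / rms(added noise))
    is approximately snr_db. The noise is looped if it is shorter than the clean
    signal, and the result is rounded and clipped to the 16-bit range."""
    if not clean:
        return []
    if not noise:
        raise ValueError("noise must not be empty")
    looped = [noise[i % len(noise)] for i in range(len(clean))]
    noise_rms = rms(looped)
    if noise_rms == 0:
        return [round(c) for c in clean]
    # Noise level needed for the requested SNR, then the gain that achieves it.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [max(-32768, min(32767, round(c + gain * n))) for c, n in zip(clean, looped)]


if __name__ == "__main__":
    rate = 16000
    # A 1 s, 440 Hz tone stands in for a wake word recording.
    clean = [round(8000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(rate)]
    # Seeded Gaussian noise stands in for a background noise recording.
    rng = random.Random(0)
    noise = [rng.gauss(0, 1000) for _ in range(4000)]
    mixed = mix_at_snr(clean, noise, snr_db=10.0)
    added = [m - c for m, c in zip(mixed, clean)]
    print(round(20 * math.log10(rms(clean) / rms(added)), 2))
```

Generating several copies of each recording at different SNRs (for example 5, 10 and 20 dB) with different noise sources is a typical choice; loud noise can clip the result, which the final line guards against.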

Faruk · January 28, 2021, 8:31am

Thanks for your reply. I will try it when I have time and report the results.

baconator · January 29, 2021, 4:59am

1- How much data can you get? You’ll want more. 20 people is a good start.
2- Three or four times each, in varied manners (quicker, slower, angrier, happier, etc.), but try to keep them close to how people would “normally” say your wake word.
3- No idea, sorry.
4- No.
5- Yes, this is an easy way to generate a few more samples, but it is best to build your core data from users saying the wake word.
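Since the goal is a model that also works for voices it has not heard (such as the woman’s voice in the question), it may help to keep some speakers entirely out of training and use them only for testing. This sketch assumes a hypothetical file naming scheme like speakername-NN.wav; the two lists it returns would then go into the wake-word/ and test/wake-word/ folders of the precise data directory.

```python
import random
from collections import defaultdict


def split_by_speaker(filenames, test_fraction=0.2, seed=0):
    """Split recordings so that each speaker lands entirely in train or in test.

    Assumes hypothetical names like 'ayse-03.wav', where the text before the
    last '-' identifies the speaker.
    """
    by_speaker = defaultdict(list)
    for name in filenames:
        speaker = name.rsplit("-", 1)[0]
        by_speaker[speaker].append(name)
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)  # reproducible choice of held-out speakers
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [f for s in speakers if s not in test_speakers for f in by_speaker[s]]
    test = [f for s in test_speakers for f in by_speaker[s]]
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # 20 speakers saying the wake word 4 times each, per the answers above.
    files = [f"speaker{i:02d}-{k}.wav" for i in range(20) for k in range(4)]
    train, test = split_by_speaker(files)
    print(len(train), len(test))
```

Making sure the held-out speakers include both men and women gives a more honest picture than a random per-file split.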

I have some other info here. You will also want to record similar-sounding words for the not-wake-word dataset. I typically have people record 15 samples: 3 wake word and 12 not-wake-word. Additionally, record all activations for a couple of weeks, then add any false activations to the not-wake-word set and retrain.
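The 15-samples-per-person protocol above can be turned into a per-speaker recording script. In this sketch the not-wake-word prompts are hypothetical placeholders (a Turkish speaker should choose real similar-sounding phrases), and the filename scheme is also an assumption. With 20 speakers it yields 3 × 20 = 60 wake word and 12 × 20 = 240 not-wake-word recordings.

```python
from itertools import cycle, islice

# Varied manners for the wake word takes, following the answer to question 2.
WAKE_STYLES = ["normal", "quicker", "slower", "louder"]


def session_plan(speaker, wake_phrase, negative_prompts, n_wake=3, n_not_wake=12):
    """Return (label, prompt, filename) tuples for one speaker's session.

    negative_prompts is a caller-supplied list of similar-sounding phrases;
    prompts are reused in order if there are fewer than n_not_wake.
    """
    if not negative_prompts:
        raise ValueError("need at least one not-wake-word prompt")
    plan = []
    for i, style in enumerate(islice(cycle(WAKE_STYLES), n_wake)):
        plan.append(("wake-word", f"{wake_phrase} ({style})", f"wake-word/{speaker}-{i:02d}.wav"))
    for i, prompt in enumerate(islice(cycle(negative_prompts), n_not_wake)):
        plan.append(("not-wake-word", prompt, f"not-wake-word/{speaker}-{i:02d}.wav"))
    return plan


if __name__ == "__main__":
    negatives = ["placeholder-1", "placeholder-2", "placeholder-3"]  # hypothetical
    for label, prompt, filename in session_plan("speaker00", "mega yedi", negatives):
        print(label, prompt, filename)
```

False activations collected afterwards would simply be added to the not-wake-word folder alongside these planned recordings before retraining.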