Custom wake word false negative when using different recording device

MiniSoda · April 13, 2021, 2:52am

Data:
=== Counts ===
False Positives: 1
True Negatives: 7
False Negatives: 5
True Positives: 8

=== Summary ===
15 out of 21
71.43%

12.50% false positives
38.46% false negatives
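The summary percentages follow directly from the counts. The false-positive rate is taken over the 8 not-wake-word test clips (1/8), and the false-negative rate over the 13 wake-word test clips (5/13). Here is a small Python sketch that reproduces them. The function name is illustrative and not part of Precise.

```python
def summarize(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Recompute the test summary percentages from raw confusion counts."""
    total = tp + tn + fp + fn
    correct = tp + tn
    return {
        "correct": correct,
        "total": total,
        "accuracy_pct": 100 * correct / total,
        # share of not-wake-word clips that wrongly activated
        "fp_rate_pct": 100 * fp / (fp + tn),
        # share of wake-word clips that were missed
        "fn_rate_pct": 100 * fn / (fn + tp),
    }


stats = summarize(tp=8, tn=7, fp=1, fn=5)
print(f"{stats['correct']} out of {stats['total']}")
print(f"{stats['accuracy_pct']:.2f}%")
print(f"{stats['fp_rate_pct']:.2f}% false positives")
print(f"{stats['fn_rate_pct']:.2f}% false negatives")
```

It prints the same four figures as the summary: 15 out of 21, 71.43%, 12.50% and 38.46%. With only 13 wake-word test clips, each missed clip moves the false-negative rate by about 7.7 points.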

I collected some samples with a Sony PCM-A10, a great and handy recording device.
The model works fine when all the test samples come from it.
However, when I tried to collect more test samples with on-device mics, such as a ReSpeaker USB Mic Array 2.0 and a ReSpeaker 4-Mic HAT on an RPi 4, a lot of false negatives popped up.
I meant to collect more samples conveniently with a handheld device, but it turns out the on-device mics may not have the same recording quality.
Is there anything I can do to fix this problem?
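Before blaming the mics themselves, it is cheap to rule out a format mismatch. Different recorders often default to different sample rates or channel counts, and handheld recorders commonly save 44.1 or 48 kHz stereo. Precise is usually trained on 16 kHz, mono, 16-bit WAV, so the sketch below treats that format as an assumption. It flags any clip in a folder that differs from it. The function names are hypothetical and the check uses only Python's standard `wave` module.

```python
import wave
from pathlib import Path

# Assumed target format for training/test clips: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample


def check_wav(path: Path) -> list[str]:
    """Return a list of format problems for one WAV file (empty if it matches)."""
    try:
        wav = wave.open(str(path), "rb")
    except (wave.Error, EOFError) as exc:
        # e.g. float or compressed WAVs, or empty/truncated files
        return [f"unreadable by wave module ({exc})"]
    problems = []
    with wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wav.getframerate()} Hz")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples")
    return problems


def check_folder(folder: str) -> dict[str, list[str]]:
    """Map each mismatching .wav file under `folder` (recursively) to its problems."""
    report = {}
    for path in sorted(Path(folder).rglob("*.wav")):
        issues = check_wav(path)
        if issues:
            report[str(path)] = issues
    return report
```

For example, `check_folder("test")` lists any test clips from the ReSpeaker boards or the PCM-A10 that were saved in a different format from the training clips. If everything matches, the mismatch is more likely acoustic (mic response, distance, noise) than a format problem.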

baconator · April 13, 2021, 4:51am

How many samples are you training on vs. testing against?

MiniSoda · April 13, 2021, 5:32am

Hi baconator,
TrainData wake_words=52 not_wake_words=82 test_wake_words=13 test_not_wake_words=8

baconator · April 13, 2021, 7:30am

More data is helpful; in particular, more not-wake-word samples would probably help. If there’s a pattern to what’s activating it incorrectly, add more of those kinds of sounds. I’d also record on all the mics you can, particularly the one you’re using as the activation mic.
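Below is a minimal sketch of recording on every mic and keeping all of them in both sets. It assumes raw recordings are kept per device in a hypothetical `recordings/<mic>/<label>/*.wav` layout, where `<label>` is `wake-word` or `not-wake-word`. The sketch holds out a share of each mic's clips for testing, so every device appears in both the training and the test set. The destinations follow the usual Precise folder layout (`wake-word/`, `not-wake-word/`, `test/wake-word/`, `test/not-wake-word/`). `plan_split`, `apply_plan` and the 20% test fraction are illustrative choices.

```python
import random
import shutil
from collections import defaultdict
from pathlib import Path


def plan_split(root: str, test_fraction: float = 0.2, seed: int = 0) -> list[tuple[Path, Path]]:
    """Plan (source, relative destination) copies, stratified by mic and label."""
    groups = defaultdict(list)
    # hypothetical layout: root/<mic>/<label>/*.wav
    for path in sorted(Path(root).glob("*/*/*.wav")):
        mic, label = path.parent.parent.name, path.parent.name
        groups[(mic, label)].append(path)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    plan = []
    for (mic, label), paths in sorted(groups.items()):
        rng.shuffle(paths)
        # at least one test clip per (mic, label) group when there are 2+ clips
        n_test = max(1, round(len(paths) * test_fraction)) if len(paths) > 1 else 0
        for i, src in enumerate(paths):
            subdir = Path("test", label) if i < n_test else Path(label)
            # prefix with the mic name so clips from different devices never collide
            plan.append((src, subdir / f"{mic}-{src.name}"))
    return plan


def apply_plan(plan: list[tuple[Path, Path]], out_dir: str) -> None:
    """Copy each planned file into out_dir, creating folders as needed."""
    for src, rel in plan:
        dst = Path(out_dir) / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Usage might look like `apply_plan(plan_split("recordings"), "dataset")`. The mic-name prefix also makes it easy to see later which device the missed test clips came from.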

umsaenga · April 13, 2021, 9:20am

When I first started training a custom wake word, I also thought I should use the best-quality microphone, but I soon learned that this isn’t realistic if the device microphone is different and not as good.

While good quality helps, with a limited dataset it is best to train using the same microphone that is used for real-time detection.

I agree you need more data. Most of the time it ends up being around 4 not-wake-word samples for every wake-word sample.
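The 4:1 rule of thumb translates directly into a collection target. With the 52 training wake words posted earlier, 4:1 means about 208 not-wake-word clips, or 126 more than the current 82. The small helper below assumes the usual Precise folder names. The function names are illustrative.

```python
from pathlib import Path


def not_wake_shortfall(n_wake: int, n_not_wake: int, ratio: float = 4.0) -> int:
    """How many more not-wake-word clips are needed to reach `ratio` : 1."""
    target = round(n_wake * ratio)
    return max(0, target - n_not_wake)


def count_wavs(folder: str) -> int:
    """Count .wav files directly inside one dataset folder."""
    return sum(1 for _ in Path(folder).glob("*.wav"))


# Training counts from the TrainData line earlier in this thread
print(not_wake_shortfall(n_wake=52, n_not_wake=82))  # 126
```

On a real dataset, you could call `not_wake_shortfall(count_wavs("wake-word"), count_wavs("not-wake-word"))` after each recording session.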

You can also save false positives (during real-time detection) and add them to your dataset to keep training and improve your specificity and sensitivity.
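Here is a minimal sketch of saving false positives. It assumes 16 kHz, mono, 16-bit audio arrives in raw chunks from whatever capture loop feeds your detector. The sketch keeps a short rolling buffer and writes it to a WAV file when you notice a false activation. `ActivationRecorder`, the 1.5 s window and the `false-positives/` folder are hypothetical choices, not part of Precise. Listen to the saved clips before moving them into `not-wake-word/` and retraining, since some may be genuine activations.

```python
import time
import wave
from pathlib import Path

SAMPLE_RATE = 16000  # assumed: 16 kHz mono 16-bit PCM, same as the training data
SAMPLE_WIDTH = 2     # bytes per sample


class ActivationRecorder:
    """Keep the last `seconds` of raw audio so a false activation can be saved."""

    def __init__(self, out_dir: str = "false-positives", seconds: float = 1.5):
        self.out_dir = Path(out_dir)
        self.max_bytes = int(seconds * SAMPLE_RATE) * SAMPLE_WIDTH
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> None:
        """Append a chunk of PCM audio (the same bytes you pass to the detector)."""
        self.buffer.extend(chunk)
        # drop the oldest audio so only the most recent window is kept
        excess = len(self.buffer) - self.max_bytes
        if excess > 0:
            del self.buffer[:excess]

    def save(self) -> Path:
        """Write the current window to a uniquely named WAV file and return its path."""
        self.out_dir.mkdir(parents=True, exist_ok=True)
        path = self.out_dir / f"fp-{time.time_ns()}.wav"
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(SAMPLE_WIDTH)
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(bytes(self.buffer))
        return path
```

Because the clips come from the on-device mic itself, they directly address the mic mismatch discussed above.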

MiniSoda · April 13, 2021, 11:04am

Thanks, good advice. I will try to collect more samples through the on-device mics.

StuartIanNaylor · April 16, 2021, 4:19pm

I have put together a hacky, linear dataset-builder repo for Google-streaming-kws.

It just creates a disk-based dataset, using sox to augment 20-40 samples into 1000+.
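Below is a minimal sketch of the sox-based multiplication described above. It does not reproduce the linked repo, whose exact effects and values aren't shown here. Each source clip is run through a small grid of sox `tempo`, `pitch` and `gain` variations: 3 × 3 × 4 = 36 combinations, so 30 recorded clips become 1,080 files. The grid values and function names are assumptions you would tune.

```python
import itertools
import shutil
import subprocess
from pathlib import Path

# Illustrative variation grid (assumed values, not taken from the linked repo).
TEMPOS = ["0.9", "1.0", "1.1"]   # sox `tempo` factor: speed change without pitch change
PITCHES = ["-100", "0", "100"]   # sox `pitch` shift in cents
GAINS = ["-6", "-3", "0", "3"]   # sox `gain` in dB (positive values may clip; sox warns)


def augment_commands(src_dir: str, out_dir: str) -> list[list[str]]:
    """Build one sox command per (source clip, tempo, pitch, gain) combination."""
    cmds = []
    for src in sorted(Path(src_dir).glob("*.wav")):
        for tempo, pitch, gain in itertools.product(TEMPOS, PITCHES, GAINS):
            dst = Path(out_dir) / f"{src.stem}_t{tempo}_p{pitch}_g{gain}.wav"
            cmds.append(["sox", str(src), str(dst),
                         "tempo", tempo, "pitch", pitch, "gain", gain])
    return cmds


def run_commands(cmds: list[list[str]], out_dir: str) -> None:
    """Execute the planned sox commands, failing loudly if sox is missing."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox is not installed or not on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Because `tempo` changes the clip length, you may want to trim or pad the outputs back to the 1 s length your pipeline expects.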

You can use it as is and just drag and drop the 1 s samples, or hack it and reuse the sox methods to create your own Precise scripts.

More is definitely better, and augmenting ‘your’ voice is actually a lot better than having only a few samples, or samples of someone else’s voice.