Custom wake word false negatives when using a different recording device

Data:
=== Counts ===
False Positives: 1
True Negatives: 7
False Negatives: 5
True Positives: 8

=== Summary ===
15 out of 21
71.43%

12.50% false positives
38.46% false negatives
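For reference, the percentages in the summary appear to follow from the four counts as sketched below. This is only a minimal illustration: the function name `summarize` is made up here, and the choice of denominators (false positives over all not-wake samples, false negatives over all wake samples) is an assumption that happens to reproduce the numbers shown.

```python
def summarize(tp, fp, tn, fn):
    """Derive summary rates from raw confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Share of all test samples classified correctly.
        "accuracy": (tp + tn) / total,
        # Not-wake samples that wrongly triggered the detector.
        "false_positive_rate": fp / (fp + tn),
        # Wake samples that the detector missed.
        "false_negative_rate": fn / (fn + tp),
    }


stats = summarize(tp=8, fp=1, tn=7, fn=5)
for name, value in stats.items():
    print(f"{name}: {value:.2%}")
```

Seen this way, the 38.46% means 5 of the 13 wake-word test samples were missed.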

I collected some samples with a SONY PCM A10, a great and handy recording device.
Detection works fine when all test samples come from it.
However, when I tried collecting more test samples with on-device mics such as the ReSpeaker USB array 2.0 and the ReSpeaker HAT 4 mic on an RPi 4, a lot of false negatives popped up.
I had meant to collect more samples with a handy device, but it turns out the on-device mics may not have the same recording quality.
Is there anything I can do to fix this problem?

How many samples are you training on vs. testing against?

Hi baconator,
TrainData wake_words=52 not_wake_words=82 test_wake_words=13 test_not_wake_words=8

More data’s helpful, in particular more not wake words would probably help. If there’s a pattern to what’s activating it incorrectly, add more of those kinds of sounds. I’d record on all the mics you can as well, particularly if you’re using one as the activation mic.

When I first started training a custom wake word I also thought I should use the best-quality microphone, but I soon learned that this is not realistic if the device microphone is different and of lower quality.

While good quality helps, with a limited data set it is best to train using the same microphone that is used for real-time detection.

Agreed, you need more data. Most of the time it ends up being around 4 not-wake samples to 1 wake sample.
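As a rough check of that rule of thumb against the training counts posted earlier in the thread (52 wake, 82 not-wake), here is a small sketch. The helper name `extra_not_wake_needed` is hypothetical, and the 4:1 target is just the approximate ratio mentioned above, not a hard requirement.

```python
import math


def extra_not_wake_needed(wake, not_wake, ratio=4.0):
    """How many more not-wake samples are needed to reach ratio:1."""
    target = math.ceil(wake * ratio)
    # Never report a negative number if the target is already met.
    return max(0, target - not_wake)


print(extra_not_wake_needed(wake=52, not_wake=82))
```

With 52 wake samples the target is 208 not-wake samples, so the current set would need roughly 126 more.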

You can also save the false positives (when detecting in real time) and add them to your data set, so you can keep training and improve your specificity and sensitivity.


Thanks, good advice. I will try to collect more samples through the on-device mics.

I have just put together a hacky, linear dataset builder repo for the Google-streaming-kws.

It just creates a disk-based dataset, using sox to augment 20-40 samples into 1000+.

You can use it as is and just drag and drop the 1sec samples, or hack it and use the sox methods to create your own precise scripts.

More is definitely better, and augmenting ‘your’ voice is actually a lot better than having few samples or samples of someone else’s voice.
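To give an idea of what sox-based augmentation can look like, here is a minimal sketch. It is not the code from the repo mentioned above: the function names, file names, and parameter grids are all assumptions chosen for illustration. It only relies on the standard sox effects `pitch` (in cents), `tempo` (speed factor), and `gain` (in dB).

```python
import itertools
import shutil
import subprocess
from pathlib import Path

# Hypothetical parameter grids; widen or narrow them to taste.
PITCH_CENTS = [-200, -100, 0, 100, 200]
TEMPO_FACTORS = [0.9, 1.0, 1.1]
GAIN_DB = [-6, 0]


def build_sox_commands(src, out_dir):
    """Return one sox command per (pitch, tempo, gain) combination."""
    src, out_dir = Path(src), Path(out_dir)
    commands = []
    for pitch, tempo, gain in itertools.product(PITCH_CENTS, TEMPO_FACTORS, GAIN_DB):
        dst = out_dir / f"{src.stem}_p{pitch}_t{tempo}_g{gain}.wav"
        commands.append([
            "sox", str(src), str(dst),
            "pitch", str(pitch),
            "tempo", str(tempo),
            "gain", str(gain),
        ])
    return commands


def augment(src, out_dir):
    """Run the generated commands; requires sox to be installed."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_sox_commands(src, out_dir):
        subprocess.run(cmd, check=True)


# Building the commands does not need sox, so this part runs anywhere.
commands = build_sox_commands("wake_001.wav", "augmented")
print(len(commands), "variants per source sample")
```

With these grids each recording yields 30 variants, so 40 source clips would give 1200 files. Note that `tempo` changes the clip length, so if the model expects exactly 1 second of audio the outputs would still need trimming or padding afterwards.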