Thanks for your reply. I’ll try re-training with fewer not-wake-word samples. Heck, I’ve come this far, why not?
98% of the not-wake-word samples are from the Google Speech Commands dataset, so thankfully I didn’t waste too much time there.
The training & test data shape I used was recommended in the mycroft-precise-tips post from @sparkyvision that you linked to. Note where the author claims ~51k not-wake-words and 300 wake words (they also recommend against using sox). @baconator 's data shape sounds similar, see localcroft/Precise.md at master · el-tocino/localcroft · GitHub. (Dang, suddenly I’m hungry for bacon.)
Hopefully I’m using the phrase “data shape” correctly. When I say that, I’m referring to the balance: the relative numbers of wake-word and not-wake-word samples.
What you are recommending (roughly equal numbers of wake-word and not-wake-word samples) is very different, so I’d love to understand why. Unfortunately I’m pretty much lost when you mention kw / not_kw, labels, class, binary, spectra graphs, cross entropy, over-fit, under-fit. I get bits and pieces of what you wrote, but this really isn’t my field. I guess I’ll need to read up on TensorFlow as you recommend.
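Just so we’re talking about the same thing, here is how I’m thinking of the balance, as a rough Python sketch (the `class_balance` function is my own invention for illustration, not anything from Precise):

```python
def class_balance(n_wake, n_not_wake):
    """Return the not-wake-word : wake-word ratio.

    A ratio near 1.0 is the roughly-equal split you recommend.
    The mycroft-precise-tips numbers (~300 wake words vs ~51k
    not-wake-words) come out to about 170:1.
    """
    return n_not_wake / n_wake

# The data shape from the tips post:
print(class_balance(300, 51000))  # -> 170.0

# The roughly-equal split you suggest:
print(class_balance(300, 300))   # -> 1.0
```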
The precise-test output I included was from after 150 epochs. I also tried stopping after 6 epochs as an experiment, because acc and val_acc were hitting 1.000 and staying there, and I tried 300 epochs and a few different invocations of precise-train. I haven’t been able to train a model that works.
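The reason I cut it off at 6 epochs was that val_acc seemed pinned at 1.000, so extra epochs looked pointless. My stopping rule was basically this (my own illustration in Python, not part of precise-train; real frameworks have proper early-stopping callbacks for this):

```python
def should_stop(val_acc_history, patience=5, target=1.0):
    """Return True once val_acc has sat at `target` for `patience`
    consecutive epochs -- the situation I saw after ~6 epochs."""
    if len(val_acc_history) < patience:
        return False
    return all(v >= target for v in val_acc_history[-patience:])

# val_acc pinned at 1.000 for the last five epochs -> stop
print(should_stop([0.92, 0.98, 1.0, 1.0, 1.0, 1.0, 1.0]))  # -> True

# still moving -> keep training
print(should_stop([0.85, 0.91, 0.96, 0.98, 0.99]))  # -> False
```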