It doesn’t really work like that, as the not-wake-word examples of “Janice” are not individual elements, just part of the overall distribution captured by the not-wake-word label.
The model still reacts because a word like “Janice” sits closer to the wake-word label than to the not-wake-word label, which is dominated by words that are not Janice and are substantially different.
When you’re training and you see your model climb steeply and quickly in accuracy, that doesn’t mean it’s accurate; it just means the model can distinguish between your KW and the other labels. It is likely ‘overfitted’ and will be prone to false positives & negatives even though it reports near 100% accuracy.
Every item you add to a single label shifts the spectral bias of that label a fraction, so adding Janice could lower the score of the spectra of other words in the same label.
If you add another label of ‘sounds like’ words that are not in ‘not kw’, the ‘sounds like’ label has more cross entropy with the KW label. You will notice your training accuracy climbs on a more gradual slope and will likely need longer, but you will get fewer false positives and negatives.
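A toy sketch of why a dedicated ‘sounds like’ label carries more cross entropy than lumping those clips into ‘not kw’. The logit values below are made up for illustration (not from any real model): a confusable clip such as “Janet” produces a KW logit nearly as high as its own label’s logit, so under its own ‘sounds like’ label the per-example loss is larger and the gradient pushes the KW/sounds-like boundary harder.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_idx):
    # Negative log-likelihood of the true class.
    return -math.log(softmax(logits)[true_idx])

# Hypothetical logits for a confusable 'sounds like' clip.
# Two-label model [kw, not_kw]: the broad not-kw class absorbs it.
two_label_loss = cross_entropy([1.8, 2.0], 1)

# Three-label model [kw, not_kw, sounds_like]: the same clip now
# competes directly against the kw logit under its own label,
# so the loss (and the corrective gradient) is larger.
three_label_loss = cross_entropy([1.8, 0.2, 2.0], 2)

print(two_label_loss, three_label_loss)
```

With these toy numbers the three-label loss comes out higher, which is the extra cross entropy that slows the accuracy climb but sharpens the boundary around the KW.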
This works with any type of KW model, and you can further split ‘sounds like’ into start & end ‘sounds like’ labels. Generally, low-label-count models seem very prone to ‘overfitting’; adding labels creates much more work in data prep but no additional running load, just more accuracy.
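A quick bit of arithmetic on the “no additional running load” point: extra labels only grow the final dense layer of the classifier, which is tiny next to the rest of the network. The 128-unit penultimate layer below is a hypothetical size, not from any specific model.

```python
# Parameter count of a final dense layer: weights (hidden * labels) + biases.
hidden = 128  # hypothetical penultimate layer width

for n_labels in (2, 3, 5):  # kw/not-kw, +sounds-like, +start/end splits etc.
    params = hidden * n_labels + n_labels
    print(f"{n_labels} labels -> {params} output-layer parameters")
```

Going from 2 labels to 5 adds only a few hundred parameters, so inference cost is essentially unchanged; the real cost is the data prep for each new label.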
MLCommons has huge ‘word’ datasets (Multilingual Spoken Words Corpus - 50 Languages and Over 23 Million Audio Keyword Examples | MLCommons), as the idea of covering ‘not kw’ just by adding more examples is a bit ‘infinity & beyond’.