Hopefully Precise will get a TFLite makeover, as Google Research have published not only the method but a whole framework for running and creating your own models.
The GRU used by Precise is in there, but for streaming KWS a CRNN gives greater accuracy with fewer ops and lower latency, which would be my choice; just about every applicable state-of-the-art KWS model is in the above repo with working examples.
On Arm, tensorflow-addons can be a pain, but as an intro to that great work, GitHub - StuartIanNaylor/g-kws: Adaption of the Googleresearch kws repo gives install info and also contains a few simple scripts to use the trained models.
As for models and training, I am often confused by the methods prescribed by others, because when it comes to variation in classification data, less is very much more accurate.
If you are going to train for one or two voices, then use data only from those voices, not some command set from the other side of the world, unless those speakers are coming to your house to use your Mycroft.
I have read some crazy ideas where people just feed random large amounts of anything and everything into the not-KW class, which is akin to advocating the IT mantra of 'garbage in, garbage out' as a good idea; with models it very much holds true, and you will likely get garbage out.
That noise is mixed into KW and !KW without any consideration of the resultant volumes. Noise should always be mixed at a lower volume than the classification data, otherwise it is no longer classification data but purely noise.
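To illustrate the point about mixing noise quieter than the keyword audio, here is a minimal sketch of SNR-aware mixing. The function name and parameters are my own assumptions for illustration, not part of Precise or the g-kws repo:

```python
import numpy as np

def mix_noise(speech, noise, snr_db=10.0):
    """Mix background noise into a speech clip at a given SNR.

    A positive snr_db keeps the noise quieter than the keyword audio,
    so the clip still trains as its class rather than as pure noise.
    Hypothetical helper, not from any existing KWS tool.
    """
    # Tile or trim the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]

    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    # Scale the noise so speech power exceeds noise power by snr_db.
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    scaled = noise * (target_noise_rms / (noise_rms + 1e-12))
    mixed = speech + scaled
    # Avoid clipping when the sum exceeds full scale.
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed /= peak
    return mixed
```

At 10 dB SNR the noise sits at roughly a third of the speech amplitude, which augments the data without drowning the keyword.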
Also, for some reason the 'Google command set' is often used even though not one of the voices it contains is ever going to use your Mycroft. Worse, the 'Google command set' is a benchmark dataset that deliberately contains a high proportion of bad and varied data, so no model will ever hit 100% on it; that is the point, as it is a test of which model can do best, and since accuracy tops out at 100% you need a hard dataset for benchmarking, hence the 'Google command set'.
Google use the 'command set' to benchmark accuracy, but I am pretty damn sure that isn't the dataset they use for their range of voice AI.
It is actually really easy to create custom datasets, and you should use actual data captured on the device of use, not third-party imported datasets, as there is nothing more accurate than training with 'your' voice.
Precise is long in the tooth and extremely heavy, running full TensorFlow when TFLite can run the same models just less than 10x faster.
I really do suggest the devs have a look at Google-KWS, as it is end to end and just needs to be fed a chunked audio feed; even the MFCC is embedded into the model.
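The chunked-feed wiring amounts to very little code: push a small fixed-size block of raw samples into the streaming model on every audio callback. Here `run_model` stands in for a TFLite interpreter invoke on a streaming Google-KWS model (which keeps its own internal state and has the MFCC front end baked in); the chunk size and plumbing are assumptions for illustration, not the repo's actual API:

```python
import numpy as np

SAMPLE_RATE = 16000
CHUNK = 320  # 20 ms at 16 kHz, a typical streaming step (assumption)

def stream_kws(audio, run_model):
    """Feed raw audio to a streaming KWS model one chunk at a time.

    Because the MFCC front end is inside the model, raw samples are
    all it needs; each call returns the latest keyword score.
    """
    scores = []
    for start in range(0, len(audio) - CHUNK + 1, CHUNK):
        chunk = audio[start:start + CHUNK].astype(np.float32)
        # Shape (1, CHUNK): a batch of one chunk of raw samples.
        scores.append(run_model(chunk[np.newaxis, :]))
    return scores

# Placeholder model: reports chunk energy instead of a keyword score.
def fake_model(x):
    return float(np.mean(x ** 2))
```

In a real setup the callback of the audio capture loop would call `run_model` directly, firing the wake action whenever the score crosses a threshold.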