That’s exactly what the Assistant (generally) does. Here you can see how OVOS does it in our Mimic3 plugins. The Mimic3 API returns a wav file; we write it to disk, then the Assistant plays it. These are separate operations.
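To make the "separate operations" point concrete, here is a minimal sketch of that flow. It is not the actual plugin code: `write_wav`, `speak`, and the `synthesize` / `play` callables are hypothetical names, and the TTS request and audio playback are passed in as functions so the three steps stay visibly independent.

```python
from pathlib import Path
from typing import Callable


def write_wav(wav_bytes: bytes, wav_path: Path) -> Path:
    """Step 2: write the synthesized audio to a file."""
    wav_path.parent.mkdir(parents=True, exist_ok=True)
    wav_path.write_bytes(wav_bytes)
    return wav_path


def speak(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical: wraps the TTS API call
    play: Callable[[Path], None],        # hypothetical: wraps the audio player
    wav_path: Path,
) -> Path:
    """Run the three steps one after another, each independent of the others."""
    wav_bytes = synthesize(text)    # step 1: the TTS API returns wav data
    write_wav(wav_bytes, wav_path)  # step 2: we write it
    play(wav_path)                  # step 3: the Assistant plays the file
    return wav_path
```

In a real setup `synthesize` would be the request to the Mimic3 server and `play` would hand the path to whatever audio backend the Assistant uses. Because writing and playing are decoupled, the file can just as well be written ahead of time and played later.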
Piper’s developer was Mimic3’s developer and Mycroft’s last lead dev. Mimic3 is abandonware. The OVOS community has some public Mimic3 instances up for the time being, so Neon and OVOS devices running those plugins will keep working, but nobody is working on Mimic itself.
Mycroft had a hardcoded path for Mimic2 only. In OVOS we generalized that for every plugin, so there are 2 caches:
Runtime cache: every utterance is saved there so repeated speech is cached. It is deleted on reboot or if the device is running out of disk space.
Permanent cache: these cache files are never deleted, but they are not auto-generated either. Classic core included Mimic2 samples for default dialogs here (things that need to be spoken before Selene was available, such as pairing and wifi setup).
We also introduced a config flag “persist_cache” that will save any new utterance to the permanent cache. This is meant to be enabled only temporarily, for generating said cache.
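Roughly, the two caches and the “persist_cache” flag interact as in the sketch below. This is an illustration only, not the OVOS implementation: the `TwoTierCache` class, the directory arguments, and the lookup order are assumptions, and the `cache_name` scheme (a SHA-256 of voice plus text) is a stand-in chosen for the example, not necessarily how the real cache names its files.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def cache_name(text: str, voice: str) -> str:
    # ASSUMPTION: illustrative naming only, not the real file-naming scheme.
    digest = hashlib.sha256(f"{voice}|{text}".encode("utf-8")).hexdigest()
    return f"{digest}.wav"


class TwoTierCache:
    """Hypothetical runtime + permanent cache with a persist_cache switch."""

    def __init__(self, runtime_dir: Path, permanent_dir: Path,
                 persist_cache: bool = False) -> None:
        self.runtime_dir = Path(runtime_dir)
        self.permanent_dir = Path(permanent_dir)
        self.persist_cache = persist_cache
        for directory in (self.runtime_dir, self.permanent_dir):
            directory.mkdir(parents=True, exist_ok=True)

    def lookup(self, text: str, voice: str) -> Optional[Path]:
        """Return a cached file if either cache has one, else None."""
        name = cache_name(text, voice)
        for directory in (self.permanent_dir, self.runtime_dir):
            candidate = directory / name
            if candidate.is_file():
                return candidate
        return None

    def get_or_synthesize(self, text: str, voice: str,
                          synthesize: Callable[[str, str], bytes]) -> Path:
        """Reuse cached audio, or synthesize and store it in the right cache."""
        cached = self.lookup(text, voice)
        if cached is not None:
            return cached
        # New utterances go to the runtime cache unless persist_cache is set.
        target = self.permanent_dir if self.persist_cache else self.runtime_dir
        path = target / cache_name(text, voice)
        path.write_bytes(synthesize(text, voice))
        return path
```

With `persist_cache=False` a repeated utterance is synthesized once and then served from the runtime directory. Flipping the flag to `True` for a one-off run is how the permanent directory would get populated in this sketch.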
Do you know the algorithm used to generate the names of the wav files? I’m creating the files (prefetching ahead of when they’re needed), but then I don’t know how to retrieve them based on the text and voice (and any other needed parameters).