We first want to iron out the shortcomings mentioned above, e.g. “stop attention” and voice quality. After that the TTS & vocoder models will be published.
As discussed in the Mycroft chat with @SGee, I’ve uploaded the sample phrases from the first post, this time generated with a new “vocoder” (wavegrad) model that is still training. @Dominik and I are currently playing around with different vocoders.
It’s based on the same taco2 model as the first samples (460k steps), so the voice flow is identical but the pronunciation is different. The random noise in the background will (hopefully) go away with more training steps (wavegrad training is currently at 350k steps).
My German offline TTS model is ready to be used in Mycroft.
Here’s a video showing how to set it up.
Thanks a lot for all the hard work. It works great, but sadly it’s quite slow on my machines and therefore almost unusable. I’m not sure how to reduce the time it takes to generate the wav.
A better machine. What are you trying it on now?
Intel® Xeon® CPU E3-1246 v3 @ 3.50GHz
CPU(s) 8 x Intel(R) Core™ i7-6700T CPU @ 2.80GHz
Got a GPU on either one?
No… just the integrated ones in the CPU (Intel® Xeon® CPU E3-1246 v3 @ 3.50GHz).
I’d check whether the model can be run easily with the Griffin-Lim vocoder. That could be faster, but at lower quality.
I’ll check and report back once I know whether it works.
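To compare vocoders on a given machine, it can help to measure the real-time factor (synthesis time divided by the length of the generated audio). Below is a minimal sketch of that measurement. It is not taken from the TTS code discussed here: `real_time_factor` and `fake_synthesize` are hypothetical names, the stand-in function just returns silence, and the 22050 Hz sample rate is an assumption. Replace `fake_synthesize` with whatever call actually produces audio samples in your setup.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one synthesis call and return (rtf, audio_seconds, elapsed_seconds).

    `synthesize` is a hypothetical callable: text -> sequence of audio samples.
    An RTF below 1.0 means audio is generated faster than it plays back.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    return elapsed / audio_seconds, audio_seconds, elapsed


def fake_synthesize(text):
    # Stand-in for a real TTS call (assumption): 0.1 s of silence
    # per character at an assumed sample rate of 22050 Hz.
    return [0.0] * (len(text) * 2205)


rtf, audio_s, elapsed_s = real_time_factor(fake_synthesize, "Hallo Welt", 22050)
print(f"audio: {audio_s:.2f}s, synthesis: {elapsed_s:.3f}s, RTF: {rtf:.3f}")
```

Running the same phrase through each vocoder and comparing the RTF values should show whether a switch to Griffin-Lim is worth the loss in quality on CPU-only hardware.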
Thanks a lot. What kind of hardware did you use in your video? Which chip is the developer kit based on?