Mycroft Community Forum

Free German TTS voice for Mycroft (sneak preview)

For Tacotron, a GPU would be ideal. I use NVIDIA GT 1030s; they don’t draw much when idle, and fanless models are available. Yes, this necessitates running a host with them in it 24/7, but for quality and speed you’re going to have to make some trade-offs.

We’re quickly approaching a point where a CPU can be used instead of a GPU, so this answer may change in the next year.

Very recently an article popped up on how to set up a Windows (easily reproducible on Linux) DeepSpeech server for Mycroft.

Yet the Mozilla-trained models seem a little different, using a pbmm file and a separate scorer.

Like the exclusion :grin:

Got a stripped naked 1080 (only 2GB dedicated though)

I already thought that’d catch someone’s eye :grin: .
I’m trying to be “nice”, though whether I’m succeeding is for “the other guys” to say. :wink:

I can gladly confirm that Thorsten is a nice guy, too. :smile:

As it turns out, pbmm is an mmap-able format for inference. The fairly easy process of converting pb to pbmm is described here
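For anyone who wants to script that conversion, here is a minimal sketch. It assumes TensorFlow's `convert_graphdef_memmapped_format` tool is already built or downloaded and on your PATH; the file names are placeholders, not anything from this thread.

```python
import shutil
import subprocess

# TensorFlow utility used for the pb -> pbmm conversion; assumed to be on PATH.
TOOL = "convert_graphdef_memmapped_format"


def pbmm_command(pb_path, pbmm_path):
    """Build the command line that converts a frozen .pb graph to .pbmm."""
    return [TOOL, f"--in_graph={pb_path}", f"--out_graph={pbmm_path}"]


def convert(pb_path, pbmm_path):
    """Run the conversion; raises if the tool is missing or exits non-zero."""
    if shutil.which(TOOL) is None:
        raise FileNotFoundError(f"{TOOL} not found on PATH")
    subprocess.run(pbmm_command(pb_path, pbmm_path), check=True)


# Show the command for placeholder file names without running it.
print(" ".join(pbmm_command("output_graph.pb", "output_graph.pbmm")))
```

Call `convert(...)` with your own paths once the tool is installed.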

This would be for DeepSpeech, not TTS.

The DeepSpeech server is serving STT/TTS. I just don’t think it will run another model type.

DeepSpeech just does STT. Tacotron is TTS.

This is about the latter.

@baconator
Oh OK, now that I’ve dug a little deeper, I see that the article is talking about two servers, with the second model already packaged, so I hadn’t recognized it as such.

So, STT aside: is the TTS serving (described in the how-to) still viable? Or what would you suggest?

Is STT modeling sourced from one speaker beneficial?

@Thorsten @Dominik Have you planned to upload the model?

If you’re referring to https://github.com/mozilla/TTS/tree/master/TTS/server then yes, this is still viable and I just sent off a package earlier today using that.
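For reference, a rough sketch of how a client could request audio from that server. It assumes the server is running locally on port 5002 and answers `GET /api/tts?text=...` with WAV data, which is how the demo server in that repository behaved as far as I know; check `server.py` in your checkout if it differs. The function names and the output path are made up for illustration.

```python
import urllib.parse
import urllib.request


def tts_url(text, host="localhost", port=5002):
    """Build the request URL (assumed endpoint: /api/tts with a 'text' parameter)."""
    query = urllib.parse.urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize_to_file(text, out_path="out.wav", host="localhost", port=5002):
    """Fetch the synthesized audio and write it to out_path (needs a running server)."""
    with urllib.request.urlopen(tts_url(text, host, port)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server up, `synthesize_to_file("Hallo Welt")` should leave a playable `out.wav` in the current directory.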

Possibly for that one person. A wider set of submitted data would almost certainly help, even if the bulk of the data was from one person.

We first want to iron out the shortcomings mentioned above, e.g. “stop attention” and voice quality. After that the TTS & vocoder models will be published.

As discussed in the Mycroft chat with @SGee, I’ve uploaded the same sample phrases as in the first post, generated with a newly trained vocoder model (WaveGrad). @Dominik and I are currently playing around with different vocoders.

It’s based on the same Tacotron 2 model as the first samples (460k steps), so the voice flow is identical, but it’s pronounced differently. The random background noise will (hopefully) go away with more training steps (the WaveGrad training is currently at 350k steps).

My German offline TTS model is ready to be used in Mycroft.

Here’s a video showing how to set it up.

Thanks a lot for all the hard work. It works great, but sadly it’s quite slow on my machines and therefore almost unusable, and I’m not sure how to reduce the time it takes to generate the WAV.
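To put a number on "slow", one option is the real-time factor: synthesis time divided by the duration of the generated audio (below 1.0 means faster than real time). A minimal sketch, assuming you have some callable that turns text into WAV bytes; the `synthesize` parameter here is a hypothetical stand-in, not part of Mycroft or the TTS server.

```python
import io
import time
import wave


def wav_duration_seconds(wav_bytes):
    """Length of a WAV clip in seconds, read from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def real_time_factor(synthesize, text):
    """Time one synthesis call and divide by the audio length.

    synthesize: any callable mapping text -> WAV bytes (hypothetical stand-in).
    """
    start = time.perf_counter()
    wav_bytes = synthesize(text)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(wav_bytes)
```

Comparing this value across machines or vocoders gives a fairer picture than raw seconds, since longer sentences naturally take longer.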

A better machine. What are you trying it on now?

Intel® Xeon® CPU E3-1246 v3 @ 3.50GHz
or a
CPU(s) 8 x Intel(R) Core™ i7-6700T CPU @ 2.80GHz

Got a GPU on either one?

No… just the integrated ones in the CPU (Intel® Xeon® CPU E3-1246 v3 @ 3.50GHz).

I’d check whether the model can easily be run with the Griffin-Lim vocoder. That could be faster, but with lower quality.
I’ll check and report back once I know it’s working.

Thanks a lot. What kind of hardware did you use in your video? Which chip is the developer kit based on?