Free German TTS voice for Mycroft (sneak preview)

Dominik · September 21, 2020, 10:07pm

We first want to iron out the shortcomings mentioned above, e.g. “stop attention” and voice quality. After that the TTS & vocoder models will be published.

Thorsten · December 9, 2020, 10:04pm

As discussed in the Mycroft chat with @SGee, I’ve uploaded the same sample phrases as in the first post, this time generated with a newly trained vocoder model (WaveGrad). @Dominik and I are currently experimenting with different vocoders.

It’s based on the same Tacotron 2 model as the first samples (460k steps), so the voice flow is identical, but the pronunciation sounds different. The random background noise will (hopefully) go away with more training steps (WaveGrad training is currently at 350k steps).

Thorsten · August 24, 2021, 4:29pm

My German offline TTS model is ready to be used in Mycroft.

Here’s a video showing how to set it up.

gekoch · August 28, 2021, 10:39am

Thanks a lot for all the hard work. It works great, but sadly it is quite slow on my machines and therefore almost unusable. I’m not sure how to reduce the time it takes to generate the WAV.
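
One way to put a number on "quite slow" is the real-time factor: synthesis time divided by the length of the generated audio. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and `synthesize` stands for whatever call produces the WAV bytes on your setup; nothing here comes from the thread itself.

```python
import io
import time
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback length of a WAV file given as raw bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, wav_bytes: bytes) -> float:
    """Synthesis time divided by audio length; below 1.0 is faster than real time."""
    return synthesis_seconds / wav_duration_seconds(wav_bytes)


def timed(synthesize, text: str):
    """Call a synthesis function (assumed: text -> WAV bytes) and time it."""
    start = time.perf_counter()
    wav_bytes = synthesize(text)
    elapsed = time.perf_counter() - start
    return wav_bytes, elapsed
```

A real-time factor well above 1.0 means the machine needs longer to generate a sentence than it takes to speak it, which makes it easier to compare CPUs or vocoders.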

baconator · August 28, 2021, 6:12pm

A better machine. What are you trying it on now?

gekoch · August 28, 2021, 6:14pm

Intel® Xeon® CPU E3-1246 v3 @ 3.50GHz
or a
CPU(s) 8 x Intel(R) Core™ i7-6700T CPU @ 2.80GHz

baconator · August 28, 2021, 6:20pm

Got a GPU on either one?

gekoch · August 28, 2021, 8:37pm

No… just the integrated ones in the CPU (Intel® Xeon® CPU E3-1246 v3 @ 3.50GHz)

Thorsten · August 29, 2021, 6:50pm

I’ll check whether the model can easily be run with the Griffin-Lim vocoder. That could be faster, but with lower quality.
I’ll report back once I know whether it works.

gekoch · August 30, 2021, 4:59am

Thanks a lot. What kind of hardware did you use in your video? Which chip is the developer kit based on?

Thorsten · March 8, 2022, 6:54pm

Guude,

@Dominik and I are still working on the next release of the “Thorsten” voice, to be used as offline TTS for Mycroft. (@Olaf, thanks for supporting us with compute power for the HiFi-GAN training.)

Some German sentences taken from Mycroft skills can be heard here:

Thorsten · August 14, 2022, 2:48pm

Even though this thread is a little outdated because of @synesthesiam’s fantastic work on Mimic 3, I’m still working on providing a free, high(er)-quality TTS voice.

I’ve trained a new model and am unsure which of these two variations to release.
Please give the samples a listen and let me know which variation you like more.

gekoch · September 27, 2022, 12:34pm

WOW it sounds really great!

Is it possible to run it as a Docker container?
Then we could access it over http://localhost:5002/api/tts?text=Hallo%20wie%20geht%20es%20dir
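
A request like the one above can be sketched in Python using only the standard library. This assumes a TTS server is listening on localhost:5002 and answers /api/tts?text=... with WAV audio; the function names and the output filename are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_tts_url(text: str, base: str = "http://localhost:5002") -> str:
    """Build the request URL, percent-encoding spaces and umlauts in the text."""
    query = urlencode({"text": text}, quote_via=quote)
    return f"{base}/api/tts?{query}"


def fetch_wav(text: str, out_path: str = "out.wav", timeout: float = 60.0) -> str:
    """Request speech for `text` and save the response body (assumed to be WAV)."""
    with urlopen(build_tts_url(text), timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling build_tts_url("Hallo wie geht es dir") reproduces the URL from the post. fetch_wav only works while the server is actually running.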

OpenGuru07 · September 27, 2022, 2:15pm

How about providing a plugin for the OpenVoiceOS TTS engines, with whatever your model supports as a text-to-speech engine?

SGee · September 27, 2022, 5:20pm

I have the newest (0.8.0) coqui-tts Docker image successfully deployed (GPU-enabled). I can’t speak for CPU-only, but that should be straightforward with the given Dockerfile.

GPU, on the other hand, needs changes to the Dockerfile and to setup.py directly in the source code if nvcr.io/nvidia/pytorch:22.08-py3 is used. I only got it to work with a conda install (NVIDIA uses conda); otherwise PyTorch throws exceptions left and right.

The plugin for the Coqui server is in the making. (But you would need to build Coqui TTS from a fork of mine; it needs an API addition and the possibility to define a conf.json that is loaded at startup.)

gekoch · September 27, 2022, 5:56pm

Thanks a lot.
At least I got the Docker container running, but I’m not sure how to build the image so that it uses TTS_VERSION=0.8.0, like he mentions here:

SGee · September 27, 2022, 6:10pm

You don’t necessarily need synesthesiam’s template. Just clone Coqui TTS and try building it from their Dockerfile.

docker build -t <somename> .
run it with
docker run -it -v "<dir/from/your/host>:/root/.local/share/tts" -p 5002:5002 --entrypoint 'tts-server' "<somename>"

gekoch · September 27, 2022, 7:12pm

Thanks a lot for your help.
docker build produces an image, but when I try to start it I get this error:

Traceback (most recent call last):
  File "/venv/bin/tts-server", line 5, in <module>
    from TTS.server.server import main
  File "/root/TTS/server/server.py", line 13, in <module>
    from TTS.config import load_config
  File "/root/TTS/config/__init__.py", line 10, in <module>
    from TTS.config.shared_configs import *
  File "/root/TTS/config/shared_configs.py", line 5, in <module>
    from trainer import TrainerConfig
  File "/venv/lib/python3.8/site-packages/trainer/__init__.py", line 3, in <module>
    from trainer.model import *
  File "/venv/lib/python3.8/site-packages/trainer/model.py", line 4, in <module>
    import torch
  File "/venv/lib/python3.8/site-packages/torch/__init__.py", line 655, in <module>
    from ._tensor import Tensor
  File "/venv/lib/python3.8/site-packages/torch/_tensor.py", line 15, in <module>
    from torch.overrides import (
  File "/venv/lib/python3.8/site-packages/torch/overrides.py", line 33, in <module>
    from torch._C import (
ImportError: cannot import name '_set_torch_function_mode' from 'torch._C' (/venv/lib/python3.8/site-packages/torch/_C.cpython-38-x86_64-linux-gnu.so)

Probably a simple problem but I first need to dig deeper and learn a lot…
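
An ImportError raised from inside torch itself, as in the traceback above, can point to a mixed installation in which torch's Python files and its compiled torch._C extension come from different versions. A minimal diagnostic sketch, run inside the container, lists the installed versions without importing torch. The package names are assumptions and may differ in your image.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package names; adjust to what the image actually installs.
    for name, version in installed_versions(["torch", "torchaudio", "TTS", "trainer"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the reported torch version does not match what the Dockerfile or the base image was meant to provide, reinstalling torch cleanly in that environment is a reasonable first thing to try.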

SGee · September 27, 2022, 7:28pm

sent you the code to basically use conda instead of pip

gekoch · September 27, 2022, 8:04pm

Thanks a lot for all the help. The Docker container is now running; however, the voice sounds really strange. Maybe someone knows why it sounds that way.

EDIT:
arg, I did use the old tts_models/de/thorsten/tacotron2-DCA model instead of the new tts_models/de/thorsten/tacotron2-DDC
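
To avoid picking the wrong model by accident, the model name can be passed explicitly when starting the server. The sketch below only assembles the docker run command from earlier in the thread and appends a model name. It assumes that tts-server accepts a --model_name option, as in Coqui TTS releases around 0.8.0; the image name and host directory are placeholders.

```python
import shlex


def docker_run_command(image, host_model_dir,
                       model_name="tts_models/de/thorsten/tacotron2-DDC",
                       port=5002):
    """Build the argument list for starting tts-server with an explicit model."""
    return [
        "docker", "run", "-it",
        "-v", f"{host_model_dir}:/root/.local/share/tts",
        "-p", f"{port}:{port}",
        "--entrypoint", "tts-server",
        image,
        # Arguments after the image name are passed to the entrypoint.
        "--model_name", model_name,
    ]


if __name__ == "__main__":
    # Placeholder image name and host directory; replace with your own.
    print(shlex.join(docker_run_command("my-coqui-tts", "/srv/tts-models")))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run as it is.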