Introducing Mimic 3

Interesting results. Edresson is a knowledgeable person when it comes to TTS; see, for example, his work on YourTTS.

@synesthesiam Would it make sense for me to apply my audio preprocessing chain (the one I used for Thorsten-DE) to the TTS-Portuguese-Corpus?

Looks like he’s contributing his TTS knowledge to the Coqui.ai project.

I wonder if I need to just train a model directly on characters rather than trying to use a phonemizer. My most recent attempt in Mimic 3 used the pt-br voice from espeak-ng.

Looking at Edresson’s model config, his audio settings are a bit different from mine. For example, his sample rate is 20000 instead of 22050, and he’s using “preemphasis” which appears to filter the audio before training. So maybe the problem is my naive use of the data directly without enough preprocessing?
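For anyone unfamiliar with pre-emphasis: it is usually a simple first-order high-pass filter, y[n] = x[n] - coef * x[n-1], applied to the waveform before features are extracted. Here is a minimal sketch in plain Python. The function name and the coefficient of 0.97 are my own assumptions (0.97 is a commonly used default; I have not taken it from Edresson’s config), and this is not the code used by either project.

```python
def preemphasis(samples, coef=0.97):
    """Apply y[n] = x[n] - coef * x[n-1] to a list of audio samples.

    The first sample has no predecessor, so it is passed through unchanged.
    coef=0.97 is an assumed, commonly used default, not a value from any
    specific model config.
    """
    if not samples:
        return []
    filtered = [float(samples[0])]
    for previous, current in zip(samples, samples[1:]):
        filtered.append(current - coef * previous)
    return filtered


if __name__ == "__main__":
    # A constant (low-frequency) signal is strongly attenuated after the
    # first sample, while rapid changes pass through mostly intact.
    print(preemphasis([1.0, 1.0, 1.0, -1.0, 1.0]))
```

The effect is to boost high frequencies relative to low ones, which is one way the training audio could differ from data fed in directly.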

Hi, sorry for bothering you. Do you have a plan for releasing the training source code yet? Thank you, I look forward to hearing from you.

I’ve published a video on my YouTube channel showing all the ways to install and run Mimic 3, plus the first steps for synthesizing audio via the CLI or the local web UI :slight_smile:.

Just in case it’s interesting for you.

Hi @rostom132, no bother :slight_smile:
I do have a plan, but no definite release date yet. The training software as it is right now is the result of a year and a half of experimentation, including a lot of dead ends. I’m cleaning it up now and removing a lot of the unused code. My hope is to make it work closely with Mimic Studio.

Thanks for the awesome video, @Thorsten! I was very happy to see all the installation methods worked out well :slight_smile:

For anyone curious about the SSML volume, it currently just goes from 0-100%. Something I definitely need to fix :+1:

You’re welcome @synesthesiam :slight_smile:

In general a volume value between 0 and 100 makes sense, but just in case you want your TTS voice “yelling” at you a higher value could make sense too :wink:.
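To illustrate what a value above 100 would mean in practice, here is a rough sketch that treats the volume as a percentage gain on 16-bit PCM samples and clips the result. The function name `scale_volume` is hypothetical, and this is only an assumption about how such a setting could behave, not how Mimic 3 handles SSML volume internally.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def scale_volume(samples, volume_percent):
    """Scale 16-bit PCM samples by volume_percent / 100.

    Values above 100 amplify the signal; results are clipped to the int16
    range, so very high values will distort rather than get louder.
    """
    if volume_percent < 0:
        raise ValueError("volume_percent must be non-negative")
    gain = volume_percent / 100.0
    return [max(INT16_MIN, min(INT16_MAX, round(sample * gain))) for sample in samples]


if __name__ == "__main__":
    print(scale_volume([1000, -1000], 50))
    print(scale_volume([20000, -20000], 200))
```

Note the clipping: "yelling" only works while the original audio leaves some headroom.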

Hello,
My final goal is to record enough audio for a Slovak voice and later use the espeak-ng phonemizer with those recordings to train the voice.
Slovak is somewhat similar to Czech, and since some Czech recordings are available, perhaps I can first learn the whole process with the Czech data and then move on to recording Slovak.
There are some Czech voices for Festival with available recordings.
For example, here is the list of words: voice-czech-ph/words at master · brailcom/voice-czech-ph · GitHub
And here are the actual recordings: voice-czech-machac/wav at master · brailcom/voice-czech-machac · GitHub
These are so-called diphone voices for Festival, so the recordings consist of a list of words where each word features a specific preselected syllable used for training.
Would it be doable, and does it make sense, to use these recorded words to train a Mimic 3 Czech voice? Is it likely to give better results than Festival?

If I manage to record or otherwise source a reasonable number of recordings for a Slovak voice, let’s say a few hours, will I be able to train the voice on my own laptop or desktop computer, or does it need much more power?
I see how the LJ Speech and Thorsten German recordings are structured. Should I move on to recording, or do I need to understand how it all works first? In other words, does my experiment with the existing Czech recordings make sense?

Greetings

Peter

Hi @pvagner, welcome :slight_smile:

The Czech recordings are a good start, but you would also need full sentence recordings. Single words are important to include too, of course – you can hear Mimic 3 struggle with them for many voices because the whole dataset was just full sentences.

It might, but Mimic 3 needs full sentences to get the right inflection and pacing of a sentence. I don’t know for sure, but I’d guess that single words only would result in an unnatural sounding voice.

For training, I’ve struggled to train voices on a GTX 1060 6GB. I’d recommend something with at least 8-10+ GB of VRAM. If you’re willing to provide the dataset with an open license, I’d be happy to train the voice for you on the GPUs I have here at home :slight_smile:

I’ve been working on a tool that might save you some time. It takes text from existing corpora like the Oscar corpus and tries to create a small, phonetically balanced list of sentences to read. It looks like Oscar has Slovak; would you be willing to help me create a dataset? I need a native speaker to review sentences and ensure they make sense and (because they come from the internet) are not advertisements for adult material :see_no_evil:
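To give a rough idea of what "phonetically balanced" selection can look like, here is a small greedy sketch. It is not the tool itself: the function name `select_sentences` is made up for illustration, and it uses character pairs as a crude stand-in for phonemes or diphones, which a real tool would get from a phonemizer such as espeak-ng.

```python
def select_sentences(sentences, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered units.

    Units here are lowercase character pairs (an assumed stand-in for
    phoneme pairs). Selection stops early once no remaining sentence
    contributes anything new.
    """

    def units(sentence):
        text = "".join(c for c in sentence.lower() if c.isalpha() or c == " ")
        return {text[i:i + 2] for i in range(len(text) - 1)}

    covered = set()
    chosen = []
    remaining = [(sentence, units(sentence)) for sentence in sentences]

    while remaining and len(chosen) < max_sentences:
        # Pick the sentence with the largest number of new units.
        best = max(remaining, key=lambda item: len(item[1] - covered))
        if not best[1] - covered:
            break  # nothing new left to gain
        chosen.append(best[0])
        covered |= best[1]
        remaining.remove(best)

    return chosen


if __name__ == "__main__":
    corpus = ["Dobrý deň", "Ako sa máš", "Dobrý deň, ako sa máš"]
    print(select_sentences(corpus, max_sentences=2))
```

A native speaker would still need to review whatever such a selection produces, since coverage says nothing about whether a sentence is sensible or appropriate.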

Just in case you are using Home Assistant and would like to use it with Mimic 3.

Hello again,
Excuse me for the late reply.
Yes, I am definitely interested. I don’t have access to such powerful machines, so I accept that I won’t be able to train it myself. Still, I’d be happy to move forward with this, so I’d like to help create the text we should read. That way we can ensure the licensing is correct.

Hello.
I am planning to use mimic3 as the TTS engine in my project, which will use Qt Speech (with the help of speech-dispatcher) on an RPi4/CM4. To build the OS image I use boot2qt (Yocto-based). I see mimic1 in Yocto. Is there any plan to support mimic3? (I’m a newbie with Yocto and can’t do it myself yet :slight_smile: )

Have you had any success so far? I’m getting the same messages and problems.

I installed using the .deb on Linux Mint 21. The default voice works very well. Nice!

mimic3-download hangs after everything (I think) has been downloaded. Example: "mimic3-download 'en_US/*'" reaches 100% on generator.onnx and hangs forever. Ctrl-C (SIGINT) generates:

Traceback (most recent call last):
  File "mimic3.py", line 36, in
  File "mimic3_tts/download.py", line 237, in main
  File "mimic3_tts/download.py", line 134, in download_voice
  File "http/client.py", line 458, in read
  File "http/client.py", line 502, in readinto
  File "socket.py", line 704, in readinto
  File "ssl.py", line 1241, in recv_into
  File "ssl.py", line 1099, in read

I haven’t dug into it at all, but thought someone should know.

EDIT: this was caused by IDS/IPS software on my firewall. The solution is to disable that.
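As a side note, the traceback shows the process blocked in a socket read. A generic way to turn that kind of stall into an error instead of an endless hang is to pass a timeout when opening the connection. The sketch below uses only the Python standard library; the function name `download` and the 30 second default are my own assumptions, and this is not the code in mimic3_tts/download.py.

```python
import shutil
import urllib.request


def download(url, dest_path, timeout_seconds=30.0):
    """Copy the contents of url to dest_path.

    The timeout applies to blocking socket operations, so a stalled read on
    a network connection raises an exception rather than waiting forever.
    """
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        with open(dest_path, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
```

With a timeout in place, interference such as the IDS/IPS filtering described above would show up as a timeout error, which is easier to diagnose than a silent hang.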

Running
.venv/bin/mimic3-server
presents the demo web, but the dropdown widgets for selecting locale, voice, and speaker are unpopulated.

I presume that’s a missing file? One with some js in it? mimic3_http/templates/index.html sets up a listener for selection changes, but I see nothing that loads the lists with values to select.

Very cool though.