Introducing Mimic 3

The Debian packages only support bullseye or later, unfortunately. Besides upgrading to bullseye, other options include installing from pip/source or using the Docker image.

I did not use the Debian packages, though; I installed via mycroft-pip as described at Mimic 3 - Mycroft AI, like so:

mycroft-pip install -f https://synesthesiam.github.io/prebuilt-apps mycroft-plugin-tts-mimic3

Or do you mean to say the wheels at https://synesthesiam.github.io/prebuilt-apps also only support bullseye and above?

That’s what I would have meant if I’d been paying attention :stuck_out_tongue:

So it seems the Docker image may be the only option unless you want to run Mimic 3 remotely on a different machine. If you do that, you can just use Mycroft’s MaryTTS plugin to connect to it.
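For the remote option, a mycroft.conf fragment along these lines should work. Note the host/IP and port here are assumptions (Mimic 3's MaryTTS-compatible endpoint defaults to port 59125); adjust them and the voice to your own setup:

```json
{
  "tts": {
    "module": "marytts",
    "marytts": {
      "url": "http://192.168.1.100:59125",
      "voice": "en_US/cmu-arctic_low"
    }
  }
}
```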

I have the same error. I set a speaker in the user config as suggested; the only difference is that Mycroft falls back to Mimic 1 instead of staying silent.

Hello,
I can confirm this. I set the speaker in the config as described in the documentation.

Hi, I have set up Mimic 3 using the guide. It's all set up: the speaker works and the microphone works, but it won't speak to me. I set it up using the plugin on a Raspberry Pi 4 running Raspberry Pi OS with Mycroft installed. I don't know where to get the logs from. Thanks in advance.

Any errors should be in /var/log/mycroft/audio.log

A common problem is using an outdated pip before installing the plugin. There is a dependency-of-a-dependency problem between the dateparser and regex packages. After upgrading pip, you may need to run pip install --upgrade regex as well.
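If you want to check whether those packages are outdated in your environment before reinstalling, a quick sketch (the package names are the ones mentioned above; `importlib.metadata` is in the standard library since Python 3.8):

```python
from importlib.metadata import version, PackageNotFoundError

# Packages involved in the dependency problem described above
for pkg in ("pip", "regex", "dateparser"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Compare the printed versions against PyPI; if pip itself is old, upgrade it first and reinstall the plugin.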

Thanks for all the hints, guys! Using the plugin, I could make my Mycroft Linux Mint test install work with Mimic3—works a treat!

Here’s my mycroft.conf:

{
  "max_allowed_core_version": 21.2,

  "listener": {
    "wake_word": "hey computer"
  },

  "hotwords": {
    "hey computer": {
        "module": "pocketsphinx",
        "phonemes": "HH EY . K AH M P Y UW T ER .",
        "threshold": 1e-90,
        "lang": "en-us"
    }
  },

  "tts": {
    "module": "mimic3_tts_plug",
    "mimic3_tts_plug": {
        "voice": "en_US/cmu-arctic_low",
        "speaker": "rms",
        "length_scale": 0.8,
        "noise_scale": 0.667,
        "noise_w": 0.8
    }
  }
}

I’d really love to see Brazilian Portuguese supported.

If you know of a good dataset, I’d happily train a voice :slight_smile:

I’ve not even a clue where or how I’d search for that. Any ideas?

https://edresson.github.io/TTS-Portuguese-Corpus/

According to this comment from the GitHub repo, it is Brazilian Portuguese.


@synesthesiam said previously in a Rhasspy forum thread he tried that one before and the quality isn’t good…

It was, but people told me that the voice I trained wasn’t understandable. I used this dataset: https://github.com/Edresson/TTS-Portuguese-Corpus

Do you know of any other TTS Portuguese datasets?

edit:
Listening to the recordings from the TTS experiments on the TTS-Portuguese Corpus page, in my opinion the audio results of the Portuguese TTS are perfectly fine and understandable: specifically, the results of experiments #1 and #3 on that page. Experiment #2 was also understandable but had added noise distortion.
For example, this WAV file, the result of the longest phrase from Experiment #3, is very good and highly understandable:
Hoje é fundamental encontrar a razão da existência humana ("Today it is essential to find the reason for human existence")
So @synesthesiam, I wonder why your results were not understandable when you used this dataset? The above results are pretty much the same quality as you get from the Google Translate page when generating Portuguese TTS audio. It's good!

  • Experiment 1 uses the DCTTS model, trained on the TTS-Portuguese Corpus, with the RTISI-LA vocoder (good).
  • Experiment 2 uses the Tacotron 1 model, trained on the TTS-Portuguese Corpus (bad).
  • Experiment 3 uses the Mozilla TTS model, trained on the TTS-Portuguese Corpus (very good).

Interesting results. Edresson is a knowledgeable person when it comes to TTS; see, for example, his work on YourTTS.

@synesthesiam Does it make sense to apply my audio preprocessing chain (the one I used for Thorsten-DE) to the TTS-Portuguese-Corpus?

Looks like he’s contributing his TTS knowledge to the Coqui.ai project.

I wonder if I need to just train a model directly on characters rather than trying to use a phonemizer. My most recent attempt in Mimic 3 used the pt-br voice from espeak-ng.

Looking at Edresson’s model config, his audio settings are a bit different from mine. For example, his sample rate is 20000 instead of 22050, and he’s using “preemphasis” which appears to filter the audio before training. So maybe the problem is my naive use of the data directly without enough preprocessing?
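For reference, pre-emphasis is just a first-order high-pass filter applied to the waveform before training, y[n] = x[n] − a·x[n−1]. A minimal sketch (the coefficient 0.97 is a commonly used default, not taken from Edresson's config):

```python
def preemphasis(samples, coeff=0.97):
    """First-order high-pass filter: boosts high frequencies before training."""
    out = [samples[0]]  # the first sample has no predecessor, pass it through
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out

# A constant (DC) signal is attenuated after the first sample:
print(preemphasis([1.0, 1.0, 1.0], coeff=0.5))  # [1.0, 0.5, 0.5]
```

If a model was trained on pre-emphasized audio, the inverse filter has to be applied after synthesis, so this setting has to match between training and inference.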

Hi, sorry to bother you. Do you have a plan for releasing the training source code yet? Thank you; looking forward to your information.

I’ve published a video on my YouTube channel showing all the ways to install/run Mimic 3 and the first steps to synthesize audio via the CLI or the local WebUI :slight_smile:.

Just in case it’s interesting for you.


Hi @rostom132, no bother :slight_smile:
I do have a plan, but no definite release date yet. The training software as it is right now is the result of a year and a half of experimentation, including a lot of dead ends. I’m cleaning it up now and removing a lot of the unused code. My hope is to make it work closely with Mimic Studio.


Thanks for the awesome video, @Thorsten! I was very happy to see all the installation methods worked out well :slight_smile:

For anyone curious about the SSML volume: it currently just goes from 0-100%. Something I definitely need to fix :+1:
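That percentage would go in the volume attribute of SSML's prosody element; a minimal illustrative fragment (an assumption about how Mimic 3 reads the attribute, not confirmed behavior):

```xml
<speak>
  <prosody volume="50">This sentence is spoken at half volume.</prosody>
</speak>
```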
