Introducing Mimic 3

Mimic 3: Mycroft’s newer, better, privacy-focused neural text-to-speech (TTS) engine. In human terms, that means it can run completely offline and sounds great. To top it all off, it’s open source.

Mimic 3 can:

  • Speak more than two dozen languages, with over 100 English voices available
  • Run completely offline on devices like the Raspberry Pi 4
  • Control speaking rate and variability
  • Use SSML (Speech Synthesis Markup Language) to switch voices, add pauses, and pronounce words phonetically within a single document
  • And more…

Mimic 3 is for anyone who wants:

  • To integrate a state-of-the-art text-to-speech system into just about anything;
  • A personal, offline text-to-speech system that sounds better than ever;
  • The quality of a premium text-to-speech cloud service without giving up the privacy of their data;
  • To contribute their language expertise (or voice) for everyone’s benefit; or
  • To hack on open source code.

Using Mimic 3

Obligatory self-plug:

The easiest way to make use of Mimic 3 is to grab yourself a Mycroft Mark II, which will use Mimic 3 as its standard TTS engine. Pre-orders for the first two production runs of the Mark II have sold out. Get in quick for the third production run to have your Mark II shipped in November 2022.

Mimic 3 can also be used on any existing Mycroft installation with a Raspberry Pi 4 or better using our TTS plugin. It can be the voice of your Mycroft assistant, computer, or next IoT project. It is intended to run completely offline on devices like the Raspberry Pi 4, and works out of the box with other open source software like Home Assistant and Node-RED. Mimic 3 includes a web API as well as a command-line interface, allowing it to be used easily in scripts and automations.
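
As a rough sketch of scripting against the web API, the snippet below builds a request URL for a locally running mimic3-server and fetches the audio. The port 59125 matches the server log further down in this thread. The /api/tts path and the text/voice query parameters are assumptions, so check the documentation before relying on them; the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, voice, host="localhost", port=59125):
    """Build a GET URL for a Mimic 3 web server (the endpoint path is an assumption)."""
    query = urlencode({"text": text, "voice": voice})
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize_to_file(text, voice, wav_path, **kwargs):
    """Fetch audio from a running server and write the raw bytes to wav_path."""
    with urlopen(build_tts_url(text, voice, **kwargs)) as response:
        audio = response.read()
    with open(wav_path, "wb") as wav_file:
        wav_file.write(audio)


# Only the URL is built here; synthesize_to_file needs a running server.
url = build_tts_url("Hello world.", "en_US/cmu-arctic_low")
print(url)
```

With a server running, synthesize_to_file("Hello world.", "en_US/cmu-arctic_low", "hello.wav") would be the call to try.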

Mycroft AI has pre-trained voices for Mimic 3 in 25 different languages, with over 100 individual English speakers available. Most voices are based on publicly available datasets from worldwide volunteers, such as the indefatigable Thorsten Müller. We are always on the lookout for datasets from new languages, or higher-quality datasets for a currently supported language. All it takes is one voice to make a difference for everyone!

We have also recorded a number of new datasets in a more controlled environment with phonetically-diverse data, increasing the voice quality significantly. Premium voices trained from these Mycroft datasets will be available on the Mark II, for Mycroft Members, and under commercial licensing agreements.

In addition to basic voice controls like speaking rate and variability, Mimic 3 supports a subset of Speech Synthesis Markup Language (SSML), allowing you to script who’s speaking and how. With SSML, you can create a single document that switches voices (and even languages), includes timed pauses between sentences, and manually adjusts volume, speed, etc. See the docs for details on exactly what SSML tags are currently supported.
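
To make the idea of a multi-voice document concrete, here is a minimal sketch that assembles one in Python. The speak, voice, s, and break elements come from the SSML standard; whether Mimic 3's subset accepts exactly this structure is an assumption, and build_ssml is a hypothetical helper. The two voice names are taken from elsewhere in this thread.

```python
import xml.etree.ElementTree as ET


def build_ssml(segments):
    """Build an SSML string from (voice, text, pause_after) tuples."""
    speak = ET.Element("speak")
    for voice, text, pause_after in segments:
        voice_el = ET.SubElement(speak, "voice", name=voice)
        sentence = ET.SubElement(voice_el, "s")
        sentence.text = text
        if pause_after:
            # A timed pause after the sentence, e.g. "500ms"
            ET.SubElement(voice_el, "break", time=pause_after)
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml([
    ("en_US/cmu-arctic_low", "Good morning.", "500ms"),
    ("de_DE/thorsten_low", "Guten Morgen.", None),
])
print(ssml)
```

Building the document with an XML library rather than string concatenation keeps it well-formed and escapes special characters in the text.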

See our documentation for all the options.

How does it Work?

Mimic 3 uses a cloud-quality machine learning model that has been scaled down to run on lower-end devices like the Raspberry Pi 4. Specifically, Mimic 3’s voices are phoneme- or character-based PyTorch VITS models (based on the excellent work of Jaehyeon Kim) that are exported to ONNX and run with ONNX Runtime. At runtime, Mimic 3 transforms your text into numbers that are fed to a voice model, which produces audio resembling the dataset it was trained on.
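
The "text into numbers" step can be pictured with a toy example. This is only an illustration of the idea, not Mimic 3's actual code: the symbol table and the text_to_ids helper are made up, and a real voice ships its own table.

```python
def text_to_ids(symbols, symbol_to_id, unknown_id=0):
    """Map a sequence of phonemes or characters to integer ids."""
    return [symbol_to_id.get(symbol, unknown_id) for symbol in symbols]


# Toy vocabulary for illustration only
symbol_to_id = {"_": 0, "h": 1, "ə": 2, "l": 3, "oʊ": 4}

ids = text_to_ids(["h", "ə", "l", "oʊ"], symbol_to_id)
print(ids)  # [1, 2, 3, 4]
```

The resulting list of integers is the kind of input a voice model consumes; the model then turns it into audio.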

For some languages like English and German, large public datasets for individual speakers are available. The base models trained from these datasets were used as a starting point for different voices with smaller datasets in the same language (a process called “fine-tuning”). Many different voices were trained quickly with this approach, often overnight, on a handmade server with a few RTX 3090s. We plan to release the training code soon so anyone with the right technical skills can train their own voice, with the eventual goal of integrating it into Mimic Recording Studio.

How Can I Help?

There are many ways you can help out the Mimic 3 project:

Adding a new language or voice to Mimic 3 can vary in difficulty depending on two main factors:

  1. For new languages, Mimic 3 needs to know how to normalize and phonemize text. This involves turning everything in a sentence into words (“$5” = “five dollars”) and then converting those words into units of human speech (phonemes). Projects like espeak-ng, gruut, and lingua-franca provide Mimic 3 with this capability. The phonemization step can be skipped for languages where what’s written is very close to what’s spoken.
  2. Every voice comes from a dataset, which is just text with the corresponding spoken audio. An audiobook or someone speaking a list of prepared lines can be a dataset as long as the licensing allows for it. Importantly, each sentence (audio and text) needs to be separated out for training, so there is often manual work involved that can be difficult for non-native speakers. Lastly, the dataset needs to have a good variety of sounds from the language (“phonetically diverse”). This is usually not a problem if you have a dozen hours of audio data, but becomes critical with only one or two hours.
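
The normalization step from point 1 can be sketched with a toy example. The projects named above handle far more cases; the number table and the normalize_dollars helper here are hypothetical and only cover small whole-dollar amounts.

```python
import re

# Tiny lookup table for illustration; real normalizers cover arbitrary numbers.
NUMBER_WORDS = {
    0: "zero", 1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def normalize_dollars(text):
    """Expand whole-dollar amounts like "$5" into words; leave other amounts alone."""
    def expand(match):
        amount = int(match.group(1))
        if amount not in NUMBER_WORDS:
            return match.group(0)  # outside the toy table, keep the original text
        unit = "dollar" if amount == 1 else "dollars"
        return f"{NUMBER_WORDS[amount]} {unit}"

    # The lookahead skips amounts with cents such as "$5.50".
    return re.sub(r"\$(\d+)(?!\.?\d)", expand, text)


print(normalize_dollars("That costs $5."))  # That costs five dollars.
```

After normalization, every token is a word that a phonemizer (or a character-based model) can handle.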

Read more at https://mycroft.ai

Hi,

I am trying to install Mimic 3 TTS Plugin for Mycroft (on Picroft) but I get this error:

Requirement already satisfied: requests<3,>=2 in ./mycroft-core/.venv/lib/python3.7/site-packages (from mycroft-mimic3-tts<1.0->mycroft-plugin-tts-mimic3) (2.20.0)
INFO: pip is looking at multiple versions of mycroft-plugin-tts-mimic3 to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement onnxruntime<2.0,>=1.6 (from mycroft-mimic3-tts) (from versions: none)
ERROR: No matching distribution found for onnxruntime<2.0,>=1.6

Here are some details about my installation:

(.venv) pi@picroft:~ $ python --version
Python 3.7.3
$ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster

Any idea what might be wrong and how I can fix this?

Hi @shaan7, if you’re on a 32-bit ARM machine try installing the plugin with:

pip3 install -f https://synesthesiam.github.io/prebuilt-apps/ ...

Microsoft does not release official Python wheels for 32-bit platforms, so I’ve built/collected them.
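
A quick way to check whether a machine falls into this case is to look at the architecture and pointer size of the running Python interpreter. This is a minimal sketch; needs_prebuilt_wheels is a hypothetical helper, and it encodes only the rule stated in the post above.

```python
import platform
import struct


def needs_prebuilt_wheels(machine, pointer_bits):
    """Return True for a 32-bit Python on ARM, the case described above."""
    is_arm = machine.lower().startswith(("arm", "aarch"))
    return is_arm and pointer_bits == 32


machine = platform.machine()              # e.g. "armv7l" or "aarch64"
pointer_bits = struct.calcsize("P") * 8   # 32 or 64 for this interpreter
print(machine, pointer_bits, needs_prebuilt_wheels(machine, pointer_bits))
```

Checking the pointer size matters because a 64-bit kernel can still run a 32-bit userland, in which case pip looks for 32-bit wheels.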

Hi,
I am trying to set up Mimic 3 on Picroft (arm64), but I have problems installing additional languages.

mycroft-pip install mycroft-plugin-tts-mimic3

works, and I can use an English voice.

mycroft-pip install mycroft-plugin-tts-mimic3[de]

fails with the following error message:

(.venv) pi@picroft:~ $ mycroft-pip install mycroft-plugin-tts-mimic3[de]
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Requirement already satisfied: mycroft-plugin-tts-mimic3[de] in ./mycroft-core/.venv/lib/python3.9/site-packages (0.1.4)
Requirement already satisfied: mycroft-mimic3-tts<1.0 in ./mycroft-core/.venv/lib/python3.9/site-packages (from mycroft-plugin-tts-mimic3[de]) (0.2.2)
ERROR: Could not find a version that satisfies the requirement mimic3-tts[de]; extra == "de" (from mycroft-plugin-tts-mimic3[de]) (from versions: none)
ERROR: No matching distribution found for mimic3-tts[de]; extra == "de"

my installation:

(.venv) pi@picroft:~ $ mycroft-pip --version
pip 22.1.2 from /home/pi/mycroft-core/.venv/lib/python3.9/site-packages/pip (python 3.9)
(.venv) pi@picroft:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

Thanks a lot and kind regards

Got it fixed in v0.1.5. Thanks!

Hi,
Sorry to bother you again :slight_smile: Installation went well this time, but I now get this error in the audio log:

2022-06-29 20:28:14.595 | INFO     |   787 | mycroft.tts.tts:create:609 | Loaded plugin mimic3_tts_plug
2022-06-29 20:28:14.599 | ERROR    |   787 | mycroft.tts.tts:create:618 | The selected TTS backend couldn't be loaded. Falling back to Mimic
Traceback (most recent call last):
  File "/home/pi/mycroft-core/mycroft/tts/tts.py", line 613, in create
    tts = clazz(tts_lang, tts_config)
  File "/home/pi/mycroft-core/.venv/lib/python3.9/site-packages/mycroft_plugin_tts_mimic3/__init__.py", line 110, in __init__
    self.tts.preload_voice(voice)
  File "/home/pi/mycroft-core/.venv/lib/python3.9/site-packages/mimic3_tts/tts.py", line 310, in preload_voice
    self._get_or_load_voice(key_to_load)
  File "/home/pi/mycroft-core/.venv/lib/python3.9/site-packages/mimic3_tts/tts.py", line 553, in _get_or_load_voice
    maybe_model_dir = self._download_voice(voice_key)
  File "/home/pi/mycroft-core/.venv/lib/python3.9/site-packages/mimic3_tts/tts.py", line 605, in _download_voice
    download_voice(
  File "/home/pi/mycroft-core/.venv/lib/python3.9/site-packages/mimic3_tts/download.py", line 85, in download_voice
    voice_dir = Path(voices_dir) / voice_key
  File "/usr/lib/python3.9/pathlib.py", line 1071, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib/python3.9/pathlib.py", line 696, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.9/pathlib.py", line 680, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
2022-06-29 20:28:14.603 | WARNING  |   787 | mycroft.tts.tts:clear_cache:428 | This method is deprecated, use TextToSpeechCache.clear
  Carnegie Mellon University, Copyright (c) 1999-2011, all rights reserved
  mimic developers, Copyright (c) 2016, all rights reserved
  version: mimic-1.2.0.2 ()

I think the English voice only worked last time because, while playing around, I had installed the Mimic 3 Debian package. In theory that shouldn't be needed, or am I wrong?
If I install the Debian package, the English voice works again and /usr/share/mycroft/mimic3/voices exists, but it only contains the English voice.
Thanks a lot and kind regards

Edit: I think some things are not loaded properly. I used the following workaround:
I started mimic3-server from the command line and had it say something in the voice I wanted to set up. It loaded some files, and afterwards the configured voice worked.

(.venv) pi@picroft:~ $ mimic3-server
INFO:mimic3_http.__main__:Starting web server
[2022-06-29 20:44:21 +0200] [1823] [INFO] Running on http://0.0.0.0:59125 (CTRL + C to quit)
INFO:hypercorn.error:Running on http://0.0.0.0:59125 (CTRL + C to quit)
ALIASES: 100%|██████████| 33.0/33.0 [00:00<00:00, 59.9kB/s]
LICENSE: 100%|██████████| 6.40k/6.40k [00:00<00:00, 3.21MB/s]
README.md: 100%|██████████| 10.7k/10.7k [00:00<00:00, 3.93MB/s]
README.md.in: 100%|██████████| 193/193 [00:00<00:00, 363kB/s]
SOURCE: 100%|██████████| 61.0/61.0 [00:00<00:00, 109kB/s]
VERSION: 100%|██████████| 6.00/6.00 [00:00<00:00, 11.7kB/s]
config.json: 100%|██████████| 3.71k/3.71k [00:00<00:00, 5.90MB/s]
generator.onnx: 100%|██████████| 59.9M/59.9M [00:08<00:00, 7.21MB/s]
phoneme_map.txt: 100%|██████████| 15.0/15.0 [00:00<00:00, 25.2kB/s]
phonemes.txt: 100%|██████████| 340/340 [00:00<00:00, 618kB/s]
INFO:mimic3_tts.tts:Loaded voice from /home/pi/.local/share/mycroft/mimic3/voices/de_DE/thorsten_low
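
The traceback above ends in Path(voices_dir) / voice_key with voices_dir being None. The sketch below reproduces the idea of guarding against that with a fallback directory. It is not the library's actual fix: resolve_voice_dir is a hypothetical helper, and the default directory is an assumption taken from the "Loaded voice from" line in the log above.

```python
from pathlib import Path

# Assumed default location, based on the "Loaded voice from ..." log line.
DEFAULT_VOICES_DIR = Path.home() / ".local" / "share" / "mycroft" / "mimic3" / "voices"


def resolve_voice_dir(voices_dir, voice_key):
    """Return the directory for a voice, falling back when voices_dir is None."""
    # Path(None) raises TypeError, which is the failure seen in the traceback.
    base_dir = Path(voices_dir) if voices_dir is not None else DEFAULT_VOICES_DIR
    return base_dir / voice_key


print(resolve_voice_dir(None, "de_DE/thorsten_low"))
```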

Thanks for that, installation now continues after using that URL. However, now I don’t hear the TTS output. I can hear the wakeword detection sound, and Mycroft even runs skills (so my light bulb turns on), but I can’t hear him.

There are no obvious errors in the log.

However, running mimic3 manually shows something interesting:

(.venv) pi@picroft:~ $ which mimic3
/home/pi/mycroft-core/.venv/bin/mimic3
(.venv) pi@picroft:~ $ mimic3 --help
Traceback (most recent call last):
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/numpy/core/__init__.py", line 22, in <module>
    from . import multiarray
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/numpy/core/multiarray.py", line 12, in <module>
    from . import overrides
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/numpy/core/overrides.py", line 7, in <module>
    from numpy.core._multiarray_umath import (
ImportError: /lib/arm-linux-gnueabihf/libm.so.6: version `GLIBC_2.29' not found (required by /home/pi/mycroft-core/.venv/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-arm-linux-gnueabihf.so)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pi/mycroft-core/.venv/bin/mimic3", line 5, in <module>
    from mimic3_tts.__main__ import main
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/mimic3_tts/__init__.py", line 17, in <module>
    from .tts import Mimic3Settings, Mimic3TextToSpeechSystem
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/mimic3_tts/tts.py", line 26, in <module>
    from gruut_ipa import IPA
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/gruut_ipa/__init__.py", line 2, in <module>
    from gruut_ipa.accent import GuessedPhonemes, guess_phonemes  # noqa: F401
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/gruut_ipa/accent.py", line 8, in <module>
    from gruut_ipa.distances import get_closest
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/gruut_ipa/distances.py", line 10, in <module>
    import numpy as np
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/numpy/__init__.py", line 150, in <module>
    from . import core
  File "/home/pi/mycroft-core/.venv/lib/python3.7/site-packages/numpy/core/__init__.py", line 48, in <module>
    raise ImportError(msg)
ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.7 from "/home/pi/mycroft-core/.venv/bin/python3"
  * The NumPy version is: "1.21.6"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: /lib/arm-linux-gnueabihf/libm.so.6: version `GLIBC_2.29' not found (required by /home/pi/mycroft-core/.venv/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-arm-linux-gnueabihf.so)

I checked, and Raspbian buster doesn’t have a 2.29 package for libc6, only 2.28. Should I go ahead and attempt to upgrade my Picroft to Bullseye (assuming that’s what you are building/collecting on)?
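
For anyone wanting to check this on their own system, the installed glibc version can be read from Python and compared against the 2.29 named in the error message. A minimal sketch; the two helper functions are hypothetical names.

```python
import platform


def parse_version(version):
    """Turn "2.28" into (2, 28) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def glibc_satisfies(installed, required="2.29"):
    """Return True when the installed glibc is at least the required version."""
    return parse_version(installed) >= parse_version(required)


libc_name, libc_version = platform.libc_ver()
if libc_name == "glibc" and libc_version:
    print(libc_version, glibc_satisfies(libc_version))
else:
    print("glibc version could not be detected on this system")
```

Comparing tuples of integers avoids the trap of string comparison, where "2.9" would sort after "2.29".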

I’ve seen this error crop up sometimes if you use a voice with a #speaker in the plugin config. For now, you have to set the speaker separately, as shown in the README of the MycroftAI/plugin-tts-mimic3 repository on GitHub.
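
To illustrate what "set the speaker separately" means, the sketch below splits a combined voice#speaker string into the two separate keys. The "voice" and "speaker" key names match the mycroft.conf posted later in this thread; the helper functions themselves are hypothetical.

```python
def split_voice_and_speaker(voice_setting):
    """Split "voice#speaker" into (voice, speaker); speaker is None if absent."""
    voice, separator, speaker = voice_setting.partition("#")
    return voice, (speaker if separator and speaker else None)


def plugin_config(voice_setting):
    """Build plugin settings with the speaker stored under its own key."""
    voice, speaker = split_voice_and_speaker(voice_setting)
    config = {"voice": voice}
    if speaker is not None:
        config["speaker"] = speaker
    return config


print(plugin_config("en_US/cmu-arctic_low#rms"))
# {'voice': 'en_US/cmu-arctic_low', 'speaker': 'rms'}
```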

The Debian packages do only support bullseye or later, unfortunately. Besides upgrading to bullseye, other options would include installing from pip/source or using the Docker image.

I did not use the Debian packages though. I installed via mycroft-pip, as described on the Mimic 3 documentation page (Mimic 3 - Mycroft AI), like so:

mycroft-pip install -f https://synesthesiam.github.io/prebuilt-apps mycroft-plugin-tts-mimic3

Or do you mean to say the wheels at https://synesthesiam.github.io/prebuilt-apps also only support bullseye and above?

That’s what I would have meant if I’d been paying attention :stuck_out_tongue:

So it seems the Docker image may be the only option unless you want to run Mimic 3 remotely on a different machine. If you do that, you can just use Mycroft’s MaryTTS plugin to connect to it.

I have the same error. I set a speaker in the user config as suggested, but the only difference is that Mycroft falls back to Mimic 1 instead of staying silent.

Hello,
I can confirm this. I set the speaker in the config as described in the documentation.

Any errors should be in /var/log/mycroft/audio.log

A common problem is using an outdated pip before installing the plugin. There is some dependency-of-a-dependency problem with the dateparser and regex package. After upgrading pip, you may need to do pip install --upgrade regex as well.

Thanks for all the hints, guys! Using the plugin, I could make my Mycroft Linux Mint test install work with Mimic3—works a treat!

Here’s my mycroft.conf:

{
  "max_allowed_core_version": 21.2,

  "listener": {
    "wake_word": "hey computer"
  },

  "hotwords": {
    "hey computer": {
        "module": "pocketsphinx",
        "phonemes": "HH EY . K AH M P Y UW T ER .",
        "threshold": 1e-90,
        "lang": "en-us"
    }
  },

  "tts": {
    "module": "mimic3_tts_plug",
    "mimic3_tts_plug": {
        "voice": "en_US/cmu-arctic_low",
        "speaker": "rms",
        "length_scale": 0.8,
        "noise_scale": 0.667,
        "noise_w": 0.8
    }
  }
}

I’d really love to see Brazilian Portuguese supported.

If you know of a good dataset, I’d happily train a voice :slight_smile:

I’ve not even a clue where or how I’d search for that. Any ideas?

https://edresson.github.io/TTS-Portuguese-Corpus/

According to this comment from the GitHub repo, it is Brazilian Portuguese.

@synesthesiam said previously in a Rhasspy forum thread he tried that one before and the quality isn’t good…

It was, but people told me that the voice I trained wasn’t understandable. I used this dataset: https://github.com/Edresson/TTS-Portuguese-Corpus

Do you know of any other TTS Portuguese datasets?

edit:
I listened to the recordings from the TTS experiments on the TTS-Portuguese Corpus page. In my opinion, the Portuguese TTS audio from Experiments #1 and #3 on that page is perfectly fine and understandable. Experiment #2 was also understandable but had added noise distortion.
For example, the wav file for the longest phrase from Experiment #3 is very good and highly understandable:
Hoje é fundamental encontrar a razão da existência humana
So @synesthesiam, I wonder why your results were not understandable when you used this dataset. The results above are pretty much the same quality as what you get when using the Google Translate page to generate Portuguese TTS audio. It’s good!

  • Experiment 1 uses the DCTTS model, trained on the TTS-Portuguese Corpus, and the RTISI-LA vocoder (Good).
  • Experiment 2 uses the Tacotron 1 model, trained on the TTS-Portuguese Corpus (Bad).
  • Experiment 3 uses the Mozilla TTS model, trained on the TTS-Portuguese Corpus (Very Good).