That’s what I would have meant if I’d been paying attention
So it seems the Docker image may be the only option unless you want to run Mimic 3 remotely on a different machine. If you do that, you can just use Mycroft’s MaryTTS plugin to connect to it.
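If you go the remote route: the Mimic 3 web server speaks a MaryTTS-compatible protocol (59125 is its default port, matching MaryTTS), so a mycroft.conf snippet along these lines should work. The host address and voice name below are placeholders for illustration, and I'm writing the config keys from memory, so double-check them against the docs:

```json
{
  "tts": {
    "module": "marytts",
    "marytts": {
      "url": "http://192.168.1.100:59125",
      "voice": "en_UK/apope_low"
    }
  }
}
```

Replace the IP with the machine actually running mimic3-server, and pick whatever Mimic 3 voice you've downloaded there.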
I have the same error. I set a speaker in the user config as suggested; the only difference is that Mycroft falls back to Mimic 1 instead of staying silent.
Hi, I have set up Mimic 3 using the guide, and it's all installed. The speaker works and the microphone works, but it won't speak to me. I set it up using the plugin on a Raspberry Pi 4 running Raspbian with Mycroft installed. I don't know where to get logs from. Thanks in advance.
Any errors should be in /var/log/mycroft/audio.log
A common problem is using an outdated pip before installing the plugin. There is some dependency-of-a-dependency problem with the dateparser and regex packages. After upgrading pip, you may need to run pip install --upgrade regex as well.
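In other words, something like this (the plugin package name here assumes the standard PyPI install; adjust if you installed from source):

```shell
# Upgrade pip first so modern wheels resolve correctly
pip install --upgrade pip

# Then (re)install the plugin
pip install mycroft-plugin-tts-mimic3

# If you still hit dateparser/regex errors, force-upgrade regex
pip install --upgrade regex
```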
edit:
Listening to the recordings resulting from the TTS experiments on this page: TTS-Portuguese Corpus. In my opinion the audio results of the Portuguese TTS are perfectly fine and understandable (the results of experiments #1 and #3 on that page). Experiment #2 was also understandable but had added noise distortion.
For example, this wav file, the result of the longest phrase from Experiment #3, is very good and highly understandable: Hoje é fundamental encontrar a razão da existência humana ("Today it is essential to find the reason for human existence").
So @synesthesiam, I wonder why your results were not understandable when you used this data set? The results above are pretty much the same quality as you get when using the Google Translate page to generate Portuguese TTS audio. It's good!
Experiment 1 uses the DCTTS model, trained on the TTS-Portuguese Corpus, with the RTISI-LA vocoder (Good).
Experiment 2 uses the Tacotron 1 model, trained on the TTS-Portuguese Corpus (Bad).
Experiment 3 uses the Mozilla TTS model, trained on the TTS-Portuguese Corpus (Very Good).
I wonder if I need to just train a model directly on characters rather than trying to use a phonemizer. My most recent attempt in Mimic 3 used the pt-br voice from espeak-ng.
Looking at Edresson’s model config, his audio settings are a bit different from mine. For example, his sample rate is 20000 instead of 22050, and he’s using “preemphasis” which appears to filter the audio before training. So maybe the problem is my naive use of the data directly without enough preprocessing?
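For anyone curious: pre-emphasis is just a first-order high-pass filter applied to the waveform before feature extraction. A minimal sketch (0.97 is a commonly used coefficient, not necessarily what Edresson used):

```python
def preemphasis(samples, coef=0.97):
    """Apply a first-order pre-emphasis filter: y[n] = x[n] - coef * x[n-1].

    This boosts high frequencies relative to low ones, which can make
    spectrogram features easier for a TTS model to learn from.
    """
    out = [samples[0]]  # first sample passes through unchanged
    for n in range(1, len(samples)):
        out.append(samples[n] - coef * samples[n - 1])
    return out

# A constant (DC) signal is almost entirely suppressed after the first sample
print(preemphasis([1.0, 1.0, 1.0, 1.0]))
```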
Hi @rostom132, no bother
I do have a plan, but no definite release date yet. The training software as it is right now is the result of a year and a half of experimentation, including a lot of dead ends. I'm cleaning it up now and removing a lot of the unused code. My hope is to make it work closely with Mimic Studio.
In general a volume value between 0 and 100 makes sense, but in case you want your TTS voice "yelling" at you, a higher value could make sense too.
Hello,
My final goal would be to record enough audio for a Slovak voice and then later use the espeak-ng phonemizer with those recordings to train the voice.
Slovak is somewhat similar to Czech, and given there are some Czech recordings available, perhaps I can first get to understand the whole process with the Czech data and then move on to making Slovak recordings.
There are some Czech voices for Festival with recordings available.
For example here is the list of words: voice-czech-ph/words at master · brailcom/voice-czech-ph · GitHub
And here are the actual recordings: voice-czech-machac/wav at master · brailcom/voice-czech-machac · GitHub
These are so-called diphone voices for Festival, so the recordings consist of a list of words where each word features a concrete, preselected syllable used for training.
Would it be doable, and does it make sense, to use these recorded words for training a Mimic 3 Czech voice? Is it likely to provide better results than Festival?
If I manage to record or otherwise source a reasonable number of recordings for a Slovak voice, let's say a few hours, will I be able to train the voice on my own laptop or desktop, or does it need much more power?
I see how the LJ Speech or Thorsten German recordings are structured. Should I move on to recording, or am I supposed to understand how it all works before recording? In other words, does my experiment with the existing Czech recordings make sense?
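For context, the LJ Speech metadata.csv structure I'm referring to is just a pipe-delimited file, one line per clip: the clip id (the wav filename without extension), the raw transcript, and a normalized transcript. A made-up Slovak example in the same shape (filenames and sentences invented by me):

```
sk_0001|Toto je prvá veta.|Toto je prvá veta.
sk_0002|Nahrávka číslo 2.|Nahrávka číslo dva.
```

The second line shows what the normalization column is for: digits and abbreviations get spelled out as words.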