Hello.
I’ve read that training my own voice for Mimic TTS takes, besides good microphone equipment, a huge amount of time. But before I do any further research: are there any compatible corpus files available for the German language?
Since I have never used the software: must the process be completely finished before you can hear your TTS voice, or can you already hear how it might sound after finishing a third of it?
Well, if you have too little data, some sentences may not be synthesized correctly. About 20 hours of recordings are needed for mimic2. If you have less data, it can happen that you get no usable result and have trained the model for nothing. Recording quality is also very important, so read the docs on it carefully. You can also find me in the language-de and mimic chats.
I started with your German corpus files.
I’m at phrase 195 of 30,049. When will I be able to test my TTS voice? Does a button appear once at least 20 hours of audio have been recorded?
I’m only at phrase 312 of 30,049, so there’s still some work to do.
Since I haven’t worked with Mimic before: is this the right link to follow once I have reached a good amount of speech samples?
Would you still use your existing corpus today, or would you generate a new one? Yours seems to be based primarily on phrases used by a (home) assistant.
Maybe we could generate more common, generic phrases from open content?
I used Mycroft skill sets for roughly the first 2,000 phrases, then Mozilla voice data, and everything from phrase 10,000 onward comes from Wikipedia. Preparing this data with the Wikipedia downloader and the Mycroft utils filter took me months. Rest assured: creating a Mimic voice takes a very, very long time.
Wow. It seems you really spent a lot of time creating and mixing the corpus from different sources. After all your work, shouldn’t your corpus be included in the official repo as the “official” German corpus?
Thanks for your work!
BTW: I just finished 500 phrases and I’m getting an idea of how long it will take to record all 30,000 phrases. Wow, this really takes a huge amount of time. Hope it’s worth it :-).
My average is 10.4 words per second, and so far I have recorded 19 min 26 s.
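To put these numbers in perspective, here is a rough back-of-the-envelope extrapolation. It assumes (an assumption, not something stated in the thread) that the 19 min 26 s of audio corresponds exactly to the first 500 phrases and that the remaining phrases average the same length. The function names are purely illustrative. Note that it estimates audio duration only, not the much longer time spent at the microphone including breaks and retakes.

```python
import math

# Hypothetical sketch: linear extrapolation of recorded audio length.
# Assumes the 19 min 26 s of audio covers exactly the first 500 phrases
# and that later phrases have the same average length.
TOTAL_PHRASES = 30049        # size of the German corpus
DONE_PHRASES = 500           # phrases recorded so far
DONE_SECONDS = 19 * 60 + 26  # 19 min 26 s of recorded audio
TARGET_HOURS = 20            # rough mimic2 requirement mentioned in the thread


def projected_hours(total: int, done: int, done_seconds: float) -> float:
    """Estimated audio length of the full corpus, in hours."""
    seconds_per_phrase = done_seconds / done
    return total * seconds_per_phrase / 3600


def phrases_for_target(target_hours: float, done: int, done_seconds: float) -> int:
    """Phrases needed to reach target_hours at the current average phrase length."""
    seconds_per_phrase = done_seconds / done
    return math.ceil(target_hours * 3600 / seconds_per_phrase)


if __name__ == "__main__":
    print(f"{projected_hours(TOTAL_PHRASES, DONE_PHRASES, DONE_SECONDS):.1f} h")  # 19.5 h
    print(phrases_for_target(TARGET_HOURS, DONE_PHRASES, DONE_SECONDS))          # 30875
```

Under these assumptions (about 2.3 s of audio per phrase), the whole corpus would come to roughly 19.5 hours of audio, which is just under the approximately 20 hours mentioned for mimic2.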
We’ve also found that consistency in recordings really affects the quality of the final voice model. So whilst it’s tempting to record a lot really quickly, it’s better to take your time, with regular breaks, speak at a consistent pace using the same equipment and environment, etc.
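One way to keep an eye on a consistent pace is to compare each clip’s speaking rate with the median of all clips. This is a minimal sketch of that idea. It assumes you can pair each phrase’s text with its clip duration in seconds, and the 25 % tolerance and the sample phrases and durations are hypothetical values chosen for illustration. It is not how Mimic Recording Studio computes its own statistics.

```python
import statistics


def words_per_second(text: str, duration_s: float) -> float:
    """Speaking rate of one clip: whitespace-separated words divided by duration."""
    return len(text.split()) / duration_s


def flag_inconsistent(clips, tolerance=0.25):
    """Return indices of clips whose rate deviates from the median by more than
    `tolerance` (a fraction of the median). `clips` is a list of (text, seconds)."""
    rates = [words_per_second(text, dur) for text, dur in clips]
    median = statistics.median(rates)
    return [i for i, r in enumerate(rates) if abs(r - median) / median > tolerance]


if __name__ == "__main__":
    # Hypothetical clips: (phrase text, clip duration in seconds)
    clips = [
        ("Wie spät ist es", 1.6),
        ("Schalte das Licht im Wohnzimmer ein", 2.4),
        ("Wie wird das Wetter morgen", 1.0),  # noticeably rushed
        ("Spiele etwas Musik", 1.2),
    ]
    print(flag_inconsistent(clips))  # [2]
```

Flagged clips are candidates for re-recording. A median is used instead of a mean so that a few rushed or very slow takes do not shift the reference pace.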
Definitely not a straightforward task, but it can be pretty amazing if you stick with it.