15.ai text-to-speech

MoffKalast · September 29, 2021, 8:59am

So there’s this somewhat rude guy on Twitter that’s been working on an open text-to-speech system based on very little data with absurdly good results.

I’ve been following him for a while and he’s just released it, and I’m still not quite believing how good it is:

https://15.ai

I’m not sure if there’s any chance or possibility of this ever getting integrated with Mycroft, but it sure makes the existing stuff pale in comparison.

j1nx · September 29, 2021, 2:08pm

It is not allowed to be included.

gez-mycroft · September 30, 2021, 1:04am

Hey Moff,

Thanks for posting it - interesting to see but would agree with j1nx that it’s basically unusable with Mycroft. It also seems way slower than “real-time” - at least based on generating a couple of simple samples from here.

If you’re looking for different TTS options - have you had a look at Coqui’s recent work?

MoffKalast · September 30, 2021, 8:56am

Yeah, I noticed the slow generation too, but it’s hard to say how much of it is due to queuing and waiting in the web service and how much is the actual processing. It would probably be faster locally with something like a Coral or NCS to run it, but there doesn’t seem to be an option to run it ourselves yet, as far as I’ve seen. I do hope that gets relaxed and a self-hosted version gets released eventually.

gez-mycroft wrote: “If you’re looking for different TTS options - have you had a look at Coqui’s recent work?”

I think I’ve skimmed it at some point, but the main project I’ve had on the back burner for a while is a real-life version of the curator character from the gravitas game. That character has roughly 40 min of spoken lines in game, so I’d need something closer to 15.ai’s ability to learn from very little data than the usual approaches that need 10-20 hours of recordings, which are effectively useless here.

ChanceNCounter · September 30, 2021, 11:07pm

It’s not the performance, it’s the licensing. The TOS with which you’re presented at that site is incompatible both with MycroftAI’s commercial endeavors and with the huge ecosystem of permissively-licensed software.

You could probably wire it up yourself, but distributing it would be iffy, and it would be straightforwardly illegal for MycroftAI to integrate or ship it themselves.

edit: on a personal level, this person haughtily informs us that they work on this project at MIT and that’s why it’s restrictively licensed. Meanwhile, I release all my voice-related code under permissive licenses - primarily the one developed at MIT. Somewhat rude guy? This isn’t the first person to declare that they’re protecting my users from me.

MoffKalast · October 1, 2021, 10:14am

Yeah I figured, though I guess I’m half hoping there may be some chance of a permissive version in the future. I’m not holding my breath though.

It does definitively prove that good voice synthesis from a small recording set is in fact possible, which settles an argument of impossibility that floats around here often. That’s mainly why I posted it.

He does seem like the type that would develop the best thing possible just to prove he can, then keep it walled off to spite everyone else.

gez-mycroft · October 4, 2021, 2:10am

Yeah there’s a lot of work going into this space, so I expect it to become much more common in the pretty near future.

Similar stuff is happening on the STT side of things, which will be a massive game changer for language groups with less support and less written content.

synesthesiam · October 21, 2021, 5:21pm

You might want to try out Larynx (I’m the author). It has an open source license (MIT), and over 50 voices trained from public data across 9 languages.

Here are some voice samples: Larynx Voice Samples

The Larynx web server runs locally and has a MaryTTS-compatible API, so after installation you can just follow the MaryTTS instructions in the Mycroft docs (port is 5002 by default).
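As a rough sketch of what that wiring might look like, the snippet below builds a `tts` fragment for mycroft.conf that points Mycroft's MaryTTS module at a local Larynx server, plus a MaryTTS-style request URL for a quick manual check. Only the port (5002) comes from the post above. The config keys, the `/process` path with its query parameters, and the voice name `example_voice` are assumptions for illustration, so check them against the Mycroft and Larynx docs before relying on them.

```python
import json
from urllib.parse import urlencode

LARYNX_URL = "http://localhost:5002"  # default Larynx port, per the post above
VOICE = "example_voice"               # hypothetical placeholder, not a real voice name


def mycroft_tts_fragment(url=LARYNX_URL, voice=VOICE, lang="en_US"):
    """Return an assumed 'tts' section for mycroft.conf using the marytts module."""
    return {
        "tts": {
            "module": "marytts",
            "marytts": {"url": url, "lang": lang, "voice": voice},
        }
    }


def marytts_process_url(text, url=LARYNX_URL, voice=VOICE, locale="en_US"):
    """Build a MaryTTS-style /process URL (parameter names assumed from MaryTTS)."""
    query = urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,
    })
    return f"{url}/process?{query}"


if __name__ == "__main__":
    # Print the fragment so it can be merged into mycroft.conf by hand.
    print(json.dumps(mycroft_tts_fragment(), indent=2))
    # Print a URL that can be opened in a browser or fetched with curl.
    print(marytts_process_url("Hello from Larynx"))
```

The script only prints text and never contacts the server, so it is safe to run before Larynx is installed.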

Let me know if you have any questions.

ChanceNCounter · October 21, 2021, 6:54pm

@JarbasAl is this a good candidate for pluginification?

@synesthesiam Jarbas has been dropping pip-installable STT and TTS module plugins, to minimize setup. There’s a whole plugin ecosystem in the works, so this will (presumably) be the easiest way for end users to handle things in the long run as well, though I imagine MycroftAI will drop some code of their own. We already have a plugin manager in dev, if only for the meantime. For now, we’re just pip installing these modules and plugging them into our mycroft.conf.

synesthesiam · October 21, 2021, 7:30pm

Larynx can be installed through pip, and then executed with python3 -m larynx.server

However, only one (English) voice is included by default. The rest are downloaded on request into $HOME/.local/share/larynx

Not sure if this would be a problem for a plugin – files being downloaded post-install. The voices can be manually downloaded as well, if needed.
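For anyone packaging this, a small check along these lines could show what has already been downloaded. It is a minimal sketch: the directory path comes from the post above, but nothing is assumed about the layout inside it, so the helper just lists whatever top-level entries exist. The function names are made up for this example.

```python
from pathlib import Path


def larynx_voice_dir(home=None):
    """Return the voice download directory named in the post above."""
    base = Path(home) if home is not None else Path.home()
    return base / ".local" / "share" / "larynx"


def downloaded_entries(voice_dir):
    """List top-level entries in the voice directory, or [] if it does not exist."""
    if not voice_dir.is_dir():
        return []
    return sorted(p.name for p in voice_dir.iterdir())


if __name__ == "__main__":
    directory = larynx_voice_dir()
    entries = downloaded_entries(directory)
    print(f"{directory}: {len(entries)} entries")
    for name in entries:
        print(" ", name)
```

An empty result just means nothing has been fetched into that directory yet.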

JarbasAl · October 21, 2021, 9:24pm

A plugin already exists, and it is the default option for the online female voice in OpenVoiceOS (if you select the local backend, you can choose male/female and online/offline).

ChanceNCounter · October 21, 2021, 9:34pm

Mea culpa. I did not think to look under colleague orgs.

synesthesiam · October 28, 2021, 8:46pm

Great work, @JarbasAl!

FYI, the Larynx web API also supports a ssml=true flag that indicates the input text is SSML (you can mix voices and add pauses).
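A request using that flag might be built roughly as follows. The `ssml=true` flag is from the post. The `/api/tts` path, the `text` and `voice` parameter names, and the voice name `example_voice` are assumptions for illustration and should be checked against the Larynx documentation.

```python
from urllib.parse import urlencode

# Plain SSML with a one-second pause between two sentences.
SSML = '<speak>Hello.<break time="1s"/>Goodbye.</speak>'


def larynx_ssml_url(ssml, voice, base="http://localhost:5002"):
    """Build a URL that asks the server to treat the text as SSML.

    The endpoint path and the 'text'/'voice' parameter names are assumed.
    """
    query = urlencode({"voice": voice, "text": ssml, "ssml": "true"})
    return f"{base}/api/tts?{query}"


if __name__ == "__main__":
    # 'example_voice' is a hypothetical placeholder, not a real voice name.
    print(larynx_ssml_url(SSML, "example_voice"))
```

Building the query with `urlencode` matters here because SSML contains `<`, `>`, and quotes that must be percent-encoded in a URL.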

I’ve also recently released OpenTTS 2.1. This is a collection of Docker images for 27 languages that have all of the open source text to speech systems/voices I could find. Besides the usual eSpeak/Festival/Flite, this includes neural TTS voices for:

de (German)
el (Greek)
en (English)
es (Spanish)
fi (Finnish)
fr (French)
hu (Hungarian)
it (Italian)
ja (Japanese)
ko (Korean)
nl (Dutch)
ru (Russian)
sv (Swedish)
sw (Swahili)
zh (Chinese)