Mycroft Community Forum text-to-speech

So there’s this somewhat rude guy on twitter that’s been working on an open text-to-speech based on very little data with absurdly good results.

I’ve been following him for a while and he’s just released it, and I’m still not quite believing how good it is:

I’m not sure if there’s any chance or possibility of this ever getting integrated with Mycroft, but it sure makes the existing stuff pale in comparison.

It is not allowed to be included.

Hey Moff,

Thanks for posting it - interesting to see but would agree with j1nx that it’s basically unusable with Mycroft. It also seems way slower than “real-time” - at least based on generating a couple of simple samples from here.

If you’re looking for different TTS options - have you had a look at Coqui’s recent work?

Yeah, I noticed the slow generation too, but it's hard to say how much of it is due to queuing and waiting in the web service and how much is the actual processing. It would probably be faster locally with something like a Coral or NCS to run it, but there doesn't seem to be an option to run it ourselves yet, at least none that I've seen. I do hope that gets relaxed and a self-hosted version is released eventually.


I think I've skimmed it at some point, but the main project I've had on the back burner for a while is a real-life version of the Curator character from the Gravitas game. The game has roughly 40 minutes of spoken lines, so I'd need something that can learn from far less data than the usual 10-20 hours of recordings, which is effectively useless for this.


It’s not the performance, it’s the licensing. The TOS with which you’re presented at that site is incompatible both with MycroftAI’s commercial endeavors and with the huge ecosystem of permissively-licensed software.

You could probably wire it up yourself, but distributing it would be iffy, and it would be straightforwardly illegal for MycroftAI to integrate or ship it themselves.

edit: on a personal level, this person haughtily informs us that they work on this project at MIT and that’s why it’s restrictively licensed. Meanwhile, I release all my voice-related code under permissive licenses - primarily the one developed at MIT. Somewhat rude guy? This isn’t the first person to declare that they’re protecting my users from me.


Yeah I figured, though I guess I’m half hoping there may be some chance of a permissive version in the future. I’m not holding my breath though.

It does definitively prove that good voice synthesis from a small set of recordings is in fact possible, which settles an argument about its impossibility that often floats around here. That's mainly why I posted it.

He does seem like the type that would develop the best thing possible just to prove he can, then keep it walled off to spite everyone else.


Yeah there’s a lot of work going into this space, so I expect it to become much more common in the pretty near future.

Similar work is happening on the STT side of things, which will be a massive game changer for language groups with less support and less written content.