It’s exciting to see these adapted voices getting better, particularly for users of alternative communication technologies who either had to pay thousands of dollars to get a custom voice or have the same 3 voices as everyone else who uses the same device.

Speaker identification would certainly be useful for Mycroft too…

Have you tried it out on your own voice?


It currently requires a CUDA-enabled GPU. I’m sure there is a workaround (e.g. use the torch-rocm Docker image, run it from in there, and do some code massaging), I just haven’t done that yet.
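For anyone wanting to check what their machine offers before trying it, here is a rough sketch of a device check, assuming the project is PyTorch-based (the torch-rocm image suggests so). The helper names `pick_device` and `describe_backend` are made up for illustration and are not part of the project. As far as I know, ROCm builds of PyTorch reuse the `torch.cuda` API, so the same check should cover both cases.

```python
def pick_device():
    """Return 'cuda' if PyTorch can see a GPU backend, otherwise 'cpu'."""
    try:
        import torch  # assumption: the tool is built on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def describe_backend():
    """Report which backend PyTorch was built for (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "cpu only"
    # ROCm builds set torch.version.hip; CUDA builds leave it as None.
    if getattr(torch.version, "hip", None):
        return "rocm"
    return "cuda"


if __name__ == "__main__":
    print("device:", pick_device())
    print("backend:", describe_backend())
```

If this reports `cpu only`, the code would presumably still need its hard-coded CUDA calls adjusted before it runs, which is the "code massaging" part.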

I would be really interested in anyone else’s experiences though!


If you want to try it out without dropping a lot of cash on hardware, has some suggestions and guides on how to get started on different cloud platforms.


How do you save a model and run it as a server? That’s what I want to know.

Quality is so-so based on the samples, though. Wonder if there’s like a minute version that improves things.