Speech-to-Text on Personal Server

So, I apologize if this has been asked before. I was looking at selene-backend and I didn’t see anything about a Speech-to-Text API. Is that currently not a thing for personal backends or is there something extra we need to do in order to set it up?

Edit: I’m dumb and didn’t read far enough. I saw the part about Google STT API key. Can we use our own STT setup instead? Keeping data private from big tech companies like Google is a large reason for looking into Mycroft.

You can run Mozilla’s DeepSpeech engine on a server in your home. You need to install the DeepSpeech engine itself:

and a web service to make it available across the network:

See baconator’s response here to get the web service running properly as an actual service:
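
Once the web service is up, it is worth checking it from another machine before pointing Mycroft at it. Below is a rough sketch, not a definitive client: it assumes the service accepts raw WAV data posted to an `/stt` endpoint on port 8080 and returns the transcript as plain text. The address, the port, the endpoint path, and the function names are all assumptions for illustration, so adjust them to your setup.

```python
import urllib.request

# Assumed address of the DeepSpeech web service; change to match your server.
STT_URL = "http://localhost:8080/stt"


def build_stt_request(wav_bytes: bytes, url: str = STT_URL) -> urllib.request.Request:
    """Build a POST request that carries raw WAV data as its body."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def transcribe(wav_path: str, url: str = STT_URL) -> str:
    """Send a WAV file to the service and return whatever text it replies with."""
    with open(wav_path, "rb") as wav_file:
        request = build_stt_request(wav_file.read(), url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `transcribe("test.wav")` with a short recording of your own should give you a feel for both connectivity and accuracy.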

And finally configure your mycroft.conf file:
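
A minimal sketch of that last step, written as a small Python helper so the existing settings are kept. It assumes the STT module is named `deepspeech_server` and takes a `uri` key, which is how I remember the Mycroft docs describing it, and that your user-level mycroft.conf is plain JSON without comments. The server address and the function names are made up for illustration.

```python
import json
from pathlib import Path

# Hypothetical address: point this at wherever your DeepSpeech web service listens.
DEEPSPEECH_URI = "http://192.168.1.50:8080/stt"


def with_deepspeech_stt(config: dict, uri: str) -> dict:
    """Return a copy of config whose "stt" section points at a DeepSpeech server."""
    updated = dict(config)
    stt = dict(updated.get("stt", {}))
    stt["module"] = "deepspeech_server"      # assumed module name
    stt["deepspeech_server"] = {"uri": uri}  # assumed key layout
    updated["stt"] = stt
    return updated


def update_user_config(path: Path, uri: str) -> None:
    """Merge the STT settings into the JSON config file at path."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_deepspeech_stt(current, uri), indent=2))


if __name__ == "__main__":
    # Print the fragment instead of touching any file.
    print(json.dumps(with_deepspeech_stt({}, DEEPSPEECH_URI), indent=2))
```

Running the script only prints the `stt` fragment, so you can paste it in by hand; `update_user_config` is there if you would rather have it written to the path of your user-level mycroft.conf. Restart the Mycroft services afterwards so the change is picked up.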

I did this recently, but I have to tell you that DeepSpeech’s accuracy is terrible at the moment, basically unusable. For me this comes down to two things: my Australian accent and a lack of training data for DeepSpeech. The former is here to stay, but the latter will get better with time as more people contribute data to the project. If you’re in any position to encourage others to contribute to the Mozilla Common Voice project, do so:

https://voice.mozilla.org/en/speak
