Speech-to-Text on Personal Server

StoneOfLight · February 26, 2020, 2:07am

So, I apologize if this has been asked before. I was looking at selene-backend and I didn’t see anything about a Speech-to-Text API. Is that currently not a thing for personal backends or is there something extra we need to do in order to set it up?

Edit: I’m dumb and didn’t read far enough. I saw the part about the Google STT API key. Can we use our own STT setup instead? Keeping data private from big tech companies like Google is a large reason for looking into Mycroft.

Fellhahn · February 26, 2020, 4:39am

You can run Mozilla’s DeepSpeech engine on a server in your home. You need to install the DeepSpeech engine itself:

https://github.com/mozilla/DeepSpeech/blob/master/doc/USING.rst#using-the-python-package

.. _usage-docs:

Using a Pre-trained Model
=========================

Inference using a DeepSpeech pre-trained model can be done with a client/language binding package. We have four clients/language bindings in this repository, listed below, and also a few community-maintained clients/language bindings in other repositories, listed `further down in this README <#third-party-bindings>`_.

* :ref:`The C API <c-usage>`.
* :ref:`The Python package/language binding <py-usage>`
* :ref:`The Node.JS package/language binding <nodejs-usage>`
* :ref:`The command-line client <cli-usage>`
* :github:`The .NET client/language binding <native_client/dotnet/README.rst>`

.. _runtime-deps:

Running ``deepspeech`` may require some runtime dependencies to be installed on your system already:

* ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz.
* ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
* ``libstdc++`` - Standard C++ Library implementation. Some people have had to install this manually.

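The excerpt above mentions that the clients use SoX to resample files to 16kHz, so it can be handy to check a recording's sample rate before handing it to the engine. Here is a minimal sketch using only Python's standard `wave` module. The function names `describe_wav` and `needs_resampling` are made up for illustration and are not part of DeepSpeech.

```python
import sys
import wave

# Sample rate the excerpt above says files are resampled to.
EXPECTED_RATE = 16000  # Hz


def describe_wav(path):
    """Read basic format information from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's sample rate differs from expected_rate."""
    return describe_wav(path)["sample_rate"] != expected_rate


if __name__ == "__main__":
    # Usage: python check_wav.py recording.wav [more.wav ...]
    for wav_path in sys.argv[1:]:
        info = describe_wav(wav_path)
        status = "needs resampling" if needs_resampling(wav_path) else "ok"
        print(f"{wav_path}: {info} -> {status}")
```

This only inspects the header; it does not resample anything. For files that are flagged, SoX (listed in the dependencies above) is the tool the clients rely on.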

and a web service to make it available across the network:

See baconator’s response here to get the web service running properly as an actual service:

And finally configure your mycroft.conf file:
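As a rough sketch of what that configuration can look like, the snippet below builds an `stt` section and merges it into an existing config dictionary, then prints the result as JSON. It assumes Mycroft's `deepspeech_server` STT module with a `uri` setting, which is how I recall the Mycroft documentation describing it; check the docs for your version. The server address and the helper names `build_stt_config` and `merge_config` are hypothetical.

```python
import json


def build_stt_config(server_uri):
    """Return an "stt" section pointing Mycroft at a DeepSpeech web service.

    Assumption: the module name "deepspeech_server" and its "uri" key match
    your Mycroft version.
    """
    return {
        "stt": {
            "module": "deepspeech_server",
            "deepspeech_server": {"uri": server_uri},
        }
    }


def merge_config(existing, override):
    """Recursively merge override into a copy of existing."""
    merged = dict(existing)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical existing settings and server address; replace with your own.
    current = {"lang": "en-us"}
    stt = build_stt_config("http://192.168.1.50:8080/stt")
    print(json.dumps(merge_config(current, stt), indent=2))
```

The printed JSON is what would go into your user-level mycroft.conf (typically `~/.mycroft/mycroft.conf`), followed by a restart of the Mycroft services so the new STT module is picked up.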

I did this recently, but I have to tell you that DeepSpeech’s accuracy is terrible at the moment, basically unusable. For me this comes down to two things: my Australian accent and a lack of training data for DeepSpeech. The former is here to stay, but the latter will get better with time as more people contribute data to the project. If you’re in any kind of position to encourage others to contribute to the Mozilla Common Voice project, do so:

https://voice.mozilla.org/en/speak