@jrbauer97, you are talking about DeepSpeech (https://github.com/mozilla/DeepSpeech) and the Common Voice project. They are tightly related, but different things and we (Mycroft) are working with them already to build this out.
DeepSpeech is the code that does STT using machine learning approaches. But, like all machine learning systems, it is useless without data to ‘learn’ from.
Common Voice is an attempt by Mozilla to build a body of recordings they can use to train on. It is a great idea and we encourage you to go to https://voice.mozilla.org and help with this effort. It is building a fully open (CC0 aka public domain) dataset. But this is just a dataset, it doesn’t ‘do’ anything.
Finally, there is the model that Mozilla has generated by training DeepSpeech. That is what you were referring to with the 6.5% figure. Those models are based on data from Common Voice plus several other datasets (e.g. the LibriVox recordings, TED talks, etc).
Mycroft fits into this in two ways:
- We are building another dataset for them to train on. Our users can participate in this by using the Opt In mechanism. We are building a custom license that is different from any existing license, allowing you to back out of sharing your data. This is different from the Common Voice dataset because once you release something to the public domain, you don’t get to withdraw it.
- As DeepSpeech and the trained model we jointly build mature, we are going to integrate them into the Mycroft ecosystem in two ways:
a) DeepSpeech will be the engine we use on our servers.
b) We will provide tools allowing for a simple setup of DeepSpeech on your own device or private server if you want to host your own STT service and have the computational power available to support it.
The data DeepSpeech is training on right now is decent, but it is holding STT performance back because it is too clean. Most of it was recorded by someone at a desk using a good mic in a quiet environment. For DeepSpeech to really shine, it needs data of people talking in kitchens with a stove fan running, coffee shops with lots of chatter in the background, cars with road noise, etc. And it needs lots of that data. That is where we – Mycroft AI and our community – have the unique ability to collaborate and really make this a successful joint effort.
I hope this all makes sense!