We are chasing the Watson AI XPrize, and the first project we are tackling as part of this initiative is an Open Speech-to-Text project called OpenSTT.
Check out the XPrize page on our website for more info.
To share ideas you can post them here, on this forum, using the OpenSTT category.
Congratulations, guys. Hope you get far!
Congratulations!!! I will contribute in any way I can. See ya.
Awesome, looking forward to this!
Great to hear!
I would suggest checking out Kaldi, which I think is really good to build upon (it seems to be much better than CMU Sphinx, but I don’t have tons of knowledge there).
Otherwise I think CNN + LSTM is the way to go!
Good language models are dearly needed in the Open Source Community, for sure.
PS: there seems to be at least one model for Kaldi in the open: https://github.com/alumae/kaldi-gstreamer-server
@wolfv very good feedback. I’d been looking for a Kaldi implementation out in the open and hadn’t found one. I will play with this.
Hey @wolfv, thanks very much for sharing this. We’ve looked into the Kaldi project in the past and it seems very promising! It’s at the top of our list to try.
I was wondering if you are interested in joining the Machine Learning Team. If so, let me know more about you by commenting on this topic: The Machine Learning Team
P.S. Everyone is welcome to participate in the ML Team. Join us and share your experience and interests by commenting on the above topic.
I was wondering if you guys could keep us updated on what’s been happening in this space. I didn’t see any commits on the OpenSTT GitHub account either.
Looking forward to hearing back from you guys. It’s good stuff Mycroft has taken up.
I heard about the OpenSTT Project through the Linux Luddites podcast (https://linuxluddites.com/shows/episode-78/). One quick idea for finding data to use for testing/validation is the LibriVox community. There are recordings of public domain books read by volunteers in many languages (https://librivox.org/).
There is actually a dataset for automated speech recognition based on LibriVox audiobooks, and there is a Kaldi model trained on this dataset.