Speech Recognition Accuracy

I keep thinking back to another A.I. device that just did not work out. I am talking about The Ubi.

When The Ubi first started crowdfunding I had high hopes for it. I thought it might even be a replacement for my Denise program, since it was hardware and could be plugged into any outlet. It was not until I finally received The Ubi that a harsh realization set in: The Ubi, frankly, was not that good.

Why? Well, depending on whom you ask, the reasons vary. However, one of the biggest gripes was speech recognition. The fact of the matter is, The Ubi simply did not have a very good onboard microphone.

This is one HUGE fear I have about MyCroft. I want it to succeed. However, if it ships with a cheap or otherwise poor onboard microphone, we will be seeing a lot of complaints. Being able to fire off commands by voice is one of the fundamentals of MyCroft, is it not?

I am sure that at this point the hardware roadmap for MyCroft has already been decided and set. However, I am simply saying that being able to understand our commands is important, so a good microphone is essential.

This topic may be a year too early since we will not know how good or bad the speech recognition will be until we get our hands on it.

What do you all think?


You’re not wrong. A low-quality mic can sink the whole thing. The Amazon Echo solved this problem with a combination of highly specialized hardware and software. It is the first piece of hardware of its kind to have a 7-mic array, and one of very few systems to use multi-channel speech recognition. It also has on-board beamforming (where is that speech coming from?) and echo cancellation (ignore the audio coming out of this device). Lastly, specialized acoustic models are built to aid detection of far-field utterances. Some of these things will be available to us for our initial version, like on-board noise cancellation in software. The more sophisticated hardware and software is something we’re always looking for expertise on, and we will integrate it should it become available.
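For anyone wondering what beamforming actually buys you, here is a rough delay-and-sum sketch in Python. It is only a toy (the function name, array geometry, and sample handling are made up for illustration, not anything from our codebase): each channel is delayed so that speech from a chosen direction lines up, and the aligned channels are averaged, so on-axis speech reinforces while off-axis noise partially cancels.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

def delay_and_sum(signals, mic_positions, direction, sample_rate):
    """Steer a mic array toward `direction` by delaying and averaging channels.

    signals:       (num_mics, num_samples) synchronised recordings
    mic_positions: (num_mics, 3) mic coordinates in metres
    direction:     unit vector pointing from the array toward the talker
    """
    # Mics closer to the talker hear the wavefront earlier; work out how much
    # earlier (in samples) each channel is relative to the latest-arriving one.
    delays = mic_positions @ direction / SPEED_OF_SOUND
    sample_delays = np.round(delays * sample_rate).astype(int)
    sample_delays -= sample_delays.min()  # shift everything to be non-negative

    num_mics, num_samples = signals.shape
    out = np.zeros(num_samples)
    for m in range(num_mics):
        # Delay the early channels so every copy of the speech lines up; the
        # wrap-around from np.roll is good enough for a toy example.
        out += np.roll(signals[m], sample_delays[m])
    return out / num_mics
```

Real systems do this per frequency band with fractional delays and adaptive weights, but the principle is the same.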

@ryanleesipes can speak more to specifics, but I believe we’re talking to new hardware partners specifically about the microphone.


That is good to know, @seanfitz, so thanks for that. I think in this situation a microphone array is needed. The Ubi only has one microphone. However, devices like the Echo and even the Microsoft Kinect, which use microphone arrays, have far better success.

Yes, a good microphone is essential. We realize this is of the utmost importance. We are creating an experience, and at the core of that experience is Mycroft being able to understand what is said to it. We are at the beginning of another hardware sprint and are looking at incorporating a pretty decent mic setup that I'm confident about. I'm looking forward to testing this version.

I agree with Dominique, and this is an area where I'm eager to read about progress and news.

I have a little experience designing/building beamforming microphone arrays, though not for speech frequencies, so I would assume there are some extra considerations to be made there. I've been wanting to start an open-hardware beamforming microphone project for a while now, initially as a tool to help record focus groups and other interview scenarios (e.g. one audio channel per person in the room). Unfortunately my time is extremely limited, so I don't know when I'll actually start, but I wanted to throw this out there to anyone who might be interested in identifying good microphones and other components to use. At least for me, finding "the right" components takes most of my time; once I've identified them, designing the circuit and laying it out isn't too bad.

I’m glad this topic was revisited.

Echo cancellation through a PulseAudio sink (module-echo-cancel) might be a good start. It also holds a lot of customization potential for advanced users, since sources and sinks can easily be streamed over the network (say, when the audio source is a different machine).
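To make that concrete: what an echo canceller like module-echo-cancel roughly does is adaptively estimate the echo of whatever the device is playing and subtract it from the microphone signal. Here is a toy NLMS version in Python, purely as a sketch of the principle (the function name and parameters are invented; it runs offline and skips real-world details like double-talk detection and delay estimation):

```python
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=256, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    far_end: samples the device played out (the signal that comes back as echo)
    mic:     samples the microphone captured (user speech + echo)
    Returns the echo-reduced microphone signal.
    """
    w = np.zeros(taps)        # adaptive filter modelling the echo path
    buf = np.zeros(taps)      # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)     # shift the history window
        buf[0] = far_end[n]       # newest far-end sample goes in front
        echo_estimate = w @ buf
        e = mic[n] - echo_estimate   # what remains once the echo is removed
        out[n] = e
        # Normalised LMS update: step size scaled by the buffer energy
        w += (mu / (buf @ buf + eps)) * e * buf
    return out
```

In practice you would just load the PulseAudio module and point the recognizer at the echo-cancelled source rather than rolling your own.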