Why don’t we just use IPA for direct speech processing, as an easier path to STT? We wouldn’t even need to convert the phonetic form of words and sentences into text form to process language. Fortunately, Unicode already includes IPA characters. We could remove the speech-to-text layer from Mycroft AI entirely: direct IPA processing.
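To illustrate the Unicode point: the IPA Extensions block (U+0250–U+02AF), together with ordinary Latin letters and spacing modifier letters, means a phonetic transcription is just a plain Unicode string. A minimal sketch (the transcription of "speech" as /spiːtʃ/ is an illustrative example):

```python
# IPA transcription of "speech" as a plain Unicode string.
# ʃ ("esh") is U+0283 in the IPA Extensions block;
# ː (length mark) is U+02D0 in Spacing Modifier Letters.
word = "spiːtʃ"

# Show the code point behind each symbol.
for ch in word:
    print(f"{ch!r} -> U+{ord(ch):04X}")
```

So the encoding side is already solved; the hard part, as discussed below, is getting reliable phonemes out of raw audio in the first place.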
I’m a huge fan of IPA, but I’m told that speech-to-phoneme is a lot harder and more error-prone than direct speech-to-text… I believe I may have heard this here, in fact. Maybe someone more knowledgeable than I could explain it?
I’ve often thought that if such a thing could be done, it would instantly simplify things like translation, and phoneme-to-speech would allow much more natural text-to-speech. But I don’t think it’s that simple. Also, I guess it’s just that IPA, as good as it is, still cannot fully capture the full range and nuance of the human voice.
Look at this. They use phonemes for text-to-speech services. Why not use them for speech-to-text services?