I’ve had a similar experience. After some very disappointing tests with Google’s English STT (OK, maybe it was caused by my perfect Spanglish … or not), I decided not to be ambitious. I do not need to talk about philosophy with Mycroft.
What I need are short commands (“play music”, “add to shopping list”, …). For these commands a JSGF grammar file is perfect, reaching close to 100% recognition.
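A JSGF grammar for this kind of fixed command set is just a list of alternatives. The sketch below generates one. The grammar name `commands`, the rule name `<command>` and the phrase list are illustrative assumptions, not the grammar used in the pull request. Pocketsphinx can load a grammar in this format instead of a generic statistical language model, which is why recognition of these few phrases is so reliable.

```python
from typing import Iterable


def build_jsgf(grammar_name: str, commands: Iterable[str]) -> str:
    """Return JSGF text whose single public rule accepts exactly the given phrases.

    Each entry is used as a raw JSGF expansion, so groups such as
    "remind me in ( five | ten ) minutes" are also allowed.
    """
    alternatives = " | ".join(c.strip().lower() for c in commands)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )


if __name__ == "__main__":
    # Hypothetical command set; write this text to a .gram file for pocketsphinx.
    print(build_jsgf("commands", [
        "play music",
        "add to shopping list",
        "look at wikipedia",
        "remind me in ( five | ten ) minutes",
    ]))
```

Because the recognizer can only return one of these alternatives, anything outside the list is either rejected or forced onto the closest command. That is exactly the trade-off described above.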
Only some of these commands trigger an answer followed by a free-speech recording. For example, “mycroft, look at wikipedia” can be followed by the answer “what should I look for?” and a free-speech phrase that has to be converted using a generic language model. For this reason, the pull request I made adds the possibility of switching between pocketsphinx grammars (or changing the STT service provider).
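The switching idea can be sketched as a small state machine. Commands are decoded with the grammar recognizer. A command that expects a follow-up flips the mode so that the next utterance goes to the free-speech recognizer, and then the mode flips back. `DialogManager`, `FOLLOW_UPS` and the two recognizer callables are hypothetical names for illustration, not Mycroft or pocketsphinx APIs.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Mode(Enum):
    GRAMMAR = "grammar"          # small command grammar (e.g. JSGF)
    FREE_SPEECH = "free_speech"  # generic language model or another STT provider


# Hypothetical table: commands whose answer expects one follow-up utterance.
FOLLOW_UPS: Dict[str, Tuple[str, Mode]] = {
    "look at wikipedia": ("What should I look for?", Mode.FREE_SPEECH),
}

Recognizer = Callable[[bytes], str]


class DialogManager:
    def __init__(self, grammar_stt: Recognizer, free_stt: Recognizer) -> None:
        self.grammar_stt = grammar_stt
        self.free_stt = free_stt
        self.mode = Mode.GRAMMAR

    def handle(self, audio: bytes) -> str:
        if self.mode is Mode.FREE_SPEECH:
            query = self.free_stt(audio)
            self.mode = Mode.GRAMMAR  # one free phrase, then back to commands
            return f"searching wikipedia for: {query}"
        command = self.grammar_stt(audio)
        if command in FOLLOW_UPS:
            prompt, next_mode = FOLLOW_UPS[command]
            self.mode = next_mode  # the next utterance uses a different recognizer
            return prompt
        return f"running command: {command}"


if __name__ == "__main__":
    # Toy recognizers that treat the "audio" as UTF-8 text, just for the demo.
    dm = DialogManager(grammar_stt=lambda a: a.decode(), free_stt=lambda a: a.decode())
    print(dm.handle(b"look at wikipedia"))  # asks the follow-up question
    print(dm.handle(b"alan turing"))        # decoded by the free-speech recognizer
    print(dm.handle(b"play music"))         # back on the command grammar
```

Keeping the grammar recognizer as the default means the expensive or less accurate generic model is only used for the one utterance that really needs it.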
In other cases the answer will be followed by a recording that should not be transcribed by the STT at all. For example, “mycroft, remind me in ten minutes” can be followed by “what should I remind you of?” and then a phrase that is never transcribed, because after ten minutes Mycroft will play back exactly the recorded sound, without processing or understanding it.
(Yes, I’m tired of burning the chicken. I want to say to Mycroft “remind me in 5 minutes: stop the cooking fire”. I do not need Mycroft to understand “stop the cooking fire”, just to replay it.)
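A reminder of this kind needs no STT at all: it only has to keep the recorded clip and hand it back when the time comes. Below is a minimal sketch using a time-ordered heap. `AudioReminders` is a hypothetical class. Times are passed in explicitly so the logic can be tested without waiting. Actual playback, for example through some audio player, is left out and assumed.

```python
import heapq
import itertools
from typing import List, Tuple


class AudioReminders:
    """Stores recorded clips untouched and returns them once they are due."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, bytes]] = []
        self._counter = itertools.count()  # tie-breaker so clips are never compared

    def add(self, now: float, delay_s: float, audio: bytes) -> None:
        # The clip is stored as-is: no transcription, no intent parsing.
        heapq.heappush(self._heap, (now + delay_s, next(self._counter), audio))

    def pop_due(self, now: float) -> List[bytes]:
        # Remove and return every clip whose due time has passed, earliest first.
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, audio = heapq.heappop(self._heap)
            due.append(audio)
        return due


if __name__ == "__main__":
    reminders = AudioReminders()
    reminders.add(now=0.0, delay_s=300.0, audio=b"<recorded: stop the cooking fire>")
    print(reminders.pop_due(now=299.0))  # nothing is due yet
    print(reminders.pop_due(now=300.0))  # the untouched clip, ready to be played back
```

In a running assistant, a loop would periodically call `pop_due(time.monotonic())` and send each returned clip straight to the speaker.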
Nowadays it seems very difficult to find an STT that understands all the thousands of commercial product names. For this reason, I would consider a simple skill started with “mycroft, add to shopping list”, followed by playing “what should I add?” and recording a free phrase.