Here is a dictation GUI app

I have been frustrated with Mycroft at every step of the way.
The first thing I read about it was a blog entry saying how embarrassing it was at the first launch, when nothing worked.
But NOW here was a launch of a new version which wouldn’t fail, and it happened 5 Oct 2019, just a couple of weeks ago.
So I decided to try Mycroft. And it didn’t work.
Now that I know the reasons why, it is obvious it wouldn’t work, and that it was my fault for trying to do things the Ubuntu way.
However, having fixed that, it still doesn’t work.

Anyway the basic job I want to tackle is to dictate into a word processor. Amazingly, Ubuntu doesn’t have ANY software package to do this.
So I have written a simple wxpython dictation app - a mock-up, if you like - as it has a routine mypocketsphinx() which should call something else to do the speech-to-text conversion.
I would have written more, only importing the right Python libraries is something I have never quite understood (and pip, wheel, six, tools and the rest).
It will probably be slow, but it will be faster than going round in circles with Mycroft.
It requires wxpython libraries, which you can get by:
$ sudo apt install python3-wxgtk4.0

If anyone can help finish it, I would be very grateful.
http://davekimble.net/dictation.2.py

Hi Dave,

Mycroft is definitely not intended to be used as a general dictation app, so great to hear you’ve started a project to fill this gap.

Audio troubles on Linux are certainly one of our most common frustrations, and with the range of hardware are unlikely to disappear anytime soon. If the audio device is registered by the system then we can generally get it working, but it can take a bit of configuration.

I also find dictation, or a helper for writing documents, to be a missing feature on the Linux desktop and in the LibreOffice suite.

Some time ago, I proposed a document-writing skill using the pyUNO library from LibreOffice. I even created a proof of concept sending text from a Python script to LibreOffice Writer. But I’m not as good at coding as that skill needs.

The idea was to have Mycroft start a headless instance of LibreOffice Writer and send it, line by line, what it heard. PyUNO should also allow formatting text and so on, which would help impaired (or lazy) people. There is also a Python library (PyOO) for working with spreadsheets, but I think that is not of interest here.
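
For reference, the pyUNO connection dance in my proof of concept looked roughly like this. The port number, and the assumption that soffice was started with --accept="socket,host=localhost,port=2002;urp;", are mine - this is a sketch of the plumbing, not the finished skill:

```python
# URL for connecting to a soffice instance listening on port 2002
# (assumed to have been started with the matching --accept option).
CONNECT_URL = ("uno:socket,host=localhost,port=2002;urp;"
               "StarOffice.ComponentContext")

def append_line(line: str) -> None:
    """Append one dictated line to the Writer document currently open
    in the listening soffice instance."""
    import uno  # only available in a Python that ships with LibreOffice
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(CONNECT_URL)               # remote component context
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)
    doc = desktop.getCurrentComponent()               # the open Writer document
    text = doc.getText()
    cursor = text.createTextCursor()
    text.insertString(cursor, line + "\n", False)     # append at the cursor
```

Each utterance Mycroft hears could then be passed straight to append_line().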

Finally, @JarbasAl created a [dictation skill](https://github.com/JarbasAl/skill-dictation) - I haven’t tested it - which can take the text you dictate to Mycroft and send it by email. I think it could be modified to save a txt file instead of sending it via email.

I think this should work with Mycroft, not just because I think this is an ideal skill for a voice assistant, but because there are some great developer community members here who can contribute if we can glue the basic pieces together.

I’m afraid I can’t help code this in the short term; first I need to learn, and there are other priorities to code.

that dictation skill is super old and unsupported, please don’t ask questions about it…

honestly this is not a great use case for mycroft. there are timeouts and such in the recording code - if you remain silent for a bit it just stops recording - so i suggest doing it some way that does not use the native mycroft listener. maybe once started you can monitor pulseaudio directly or something

i have directly changed the listener code a bunch of times for different projects to support this kind of thing, i might make some open source version at some point

one more thing: if you pick a route that does not enforce a maximum recording time (as i would), be sure to use a streaming STT service, else you can expect errors - most STT services enforce a limit (15 seconds or so?)
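
One way to respect such a limit without a streaming service is to split the captured audio into fixed-length chunks before sending each one to the recogniser. A stdlib-only sketch - the sample rate, sample width, and 15-second limit are assumptions, not the actual limits of any particular service:

```python
SAMPLE_RATE = 16000   # samples per second (assumed)
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit PCM (assumed)
MAX_SECONDS = 15      # assumed per-request limit of the STT service

def chunk_audio(pcm: bytes, max_seconds: int = MAX_SECONDS):
    """Yield successive slices of raw PCM, each at most max_seconds long,
    so each slice fits within a non-streaming STT service's limit."""
    chunk_bytes = SAMPLE_RATE * SAMPLE_WIDTH * max_seconds
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]

# one minute of (silent) audio splits into four 15-second chunks
chunks = list(chunk_audio(b"\x00" * (SAMPLE_RATE * SAMPLE_WIDTH * 60)))
```

A real app would cut at a silence boundary rather than mid-word, but the size arithmetic is the same.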

I certainly think what I need is inside Mycroft, but the structure of Mycroft isn’t suitable for my app. I haven’t found a diagram of the structure of Mycroft, so I’m only guessing.

I have built and installed pocketsphinx and its dependency, sphinxbase, but I can’t get my app to import anything that matches up with it. Likewise pyaudio. I must say I don’t like Python because of that impasse.

I want my app to be as simple as possible: no bells and whistles, the least number of black boxes, and very simple interfaces: STT(speech, text).
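
That single-function interface can be pinned down now and the real engine dropped in later, which is all the mypocketsphinx() stub needs to be. A stdlib-only sketch of the plug-in point - the stub backend is a placeholder, not a real recogniser:

```python
from typing import Callable

# The whole contract: take raw audio bytes, return the recognised text.
STTFunc = Callable[[bytes], str]

def stt_stub(audio: bytes) -> str:
    """Placeholder standing in for mypocketsphinx(): a real backend would
    hand the audio to pocketsphinx, speech_recognition, etc."""
    return f"<{len(audio)} bytes of audio>"

def dictate(audio: bytes, stt: STTFunc = stt_stub) -> str:
    """The GUI calls this; swapping engines just means passing a different stt."""
    return stt(audio)

print(dictate(b"\x00" * 32000))  # -> <32000 bytes of audio>
```

Any engine that can be wrapped in a bytes-in, text-out function then plugs in without the GUI changing.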

i hate to be that guy, but pocketsphinx’s accuracy is so low that it ain’t worth the trouble

pocketsphinx is only useful for limited vocabulary, and even then accuracy ain’t great

If you are looking for something simple to use, check https://github.com/Uberi/speech_recognition


What about the recorder app by the new Pixel Phone by Google?
This is why I would buy such a phone.
A leaked apk from Google imo means the tech will become open source or similar soon. It couldn’t get closer to what I would love to have, and have wanted for decades now. Maybe something for our Mycroft?
Cheers
Kelvin

I have downloaded speech_recognition and the first tests seem to work:
I said “computer” twice and it got that right!
Then I said “Hey Mycroft” and it got “play Minecraft”

$ python -m speech_recognition
ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
A moment of silence, please…
Set minimum energy threshold to 188.37136069
Say something!
Got it! Now to recognize it…
You said computer
Say something!
Got it! Now to recognize it…
Oops! Didn’t catch that
Say something!
Got it! Now to recognize it…
You said computer
Say something!
Got it! Now to recognize it…
You said play Minecraft
Say something!
Got it! Now to recognize it…

I don’t know how to “set minimum energy threshold”, and I don’t know how to write code to call it.
Its README says it can use multiple services (recommendations accepted):

  • CMU Sphinx (http://cmusphinx.sourceforge.net/wiki/) (works offline)
  • Google Speech Recognition
  • Google Cloud Speech API (https://cloud.google.com/speech/)
  • Wit.ai (https://wit.ai/)
  • Microsoft Azure Speech (https://azure.microsoft.com/en-us/services/cognitive-services/speech/)
  • Microsoft Bing Voice Recognition (deprecated) (https://www.microsoft.com/cognitive-services/en-us/speech-api)
  • Houndify API (https://houndify.com/)
  • IBM Speech to Text (http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html)
  • Snowboy Hotword Detection (https://snowboy.kitt.ai/) (works offline)

It has multiple dependencies, and I don’t know how to check if I have them.
I really don’t want this sort of thing.
QUESTION: Can it really be done entirely with Python3 code, or does it call C++ or assembler routines?
I used to write a lot in low-level languages, but my patience-level is much less these days.

New, improved https://davekimble.net/dictation.3.py .

I tried dictation2.py the other day and I couldn’t even get my microphone to work with it. I put in the nice name of it, as I saw you did with yours, but I had no luck.
I see the same in this version. :expressionless:

It must be something to do with what directory you are in when you do the pip install.
I have “successfully” pip installed pyaudio and speech_recognition, but the speech_recognition-master folder is in ~/MyData/, so my dictation.n.py script cannot find it. I don’t know where pyaudio is.

Reading https://docs.python.org/3/reference/import.html you can see that the import process is conceptually a total mess. It shouldn’t take 10 pages to define how you import things.
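
The practical upshot is that Python only ever searches the directories listed in sys.path, so a package unpacked into ~/MyData/ is invisible to a script unless that directory is added. The standard library can report whether a module is importable and where it would come from - a short stdlib-only check:

```python
import importlib.util
import sys

def where_is(module_name: str):
    """Return the file Python would import module_name from,
    or None if it is not importable from any directory on sys.path."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

# To make a package in a non-standard folder importable, prepend it:
# sys.path.insert(0, "/home/dave/MyData/speech_recognition-master")

print(where_is("json"))                # a stdlib module: prints its path
print(where_is("no_such_module_xyz"))  # prints None
```

Running where_is("speech_recognition") and where_is("pyaudio") would show immediately whether pip put them somewhere the script can actually see.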

This is a tidied-up version which either fails with “ModuleNotFoundError: No module named ‘speech_recognition’” or (with the import commented out) runs but then fails with “NameError: name ‘sr’ is not defined”.
https://davekimble.net/dictation.4.py

It isn’t really going to use a microphone until speech_recognition works.
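
That NameError follows directly from commenting out the import: in Python, the name “sr” only exists if the line “import speech_recognition as sr” actually runs. The same pattern, illustrated with a stdlib module:

```python
import json as j   # the aliased import binds the name "j" to the json module

assert j.loads("[1, 2]") == [1, 2]
# If the import line above were commented out, any use of "j" would raise
# NameError: name 'j' is not defined - the same failure mode as using "sr"
# when the speech_recognition import never executed.
```

So the two errors are really one problem: until speech_recognition is importable, nothing that refers to sr can run.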

Still simplifying the project until I can achieve something that works.
PyAudio seems to be the “do everything tool”, but it can’t identify microphone devices. Some devices, like my webcam+mic, are 1 device but 2 sub-devices, and since PyAudio doesn’t deal with sub-devices, the device uses sub-device 0, which is the webcam, not the microphone.

Given all the features put into PyAudio, it is odd that it doesn’t deal with real-world microphones.
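
For what it’s worth, PyAudio can at least enumerate every input-capable device it sees - each ALSA sub-device usually shows up as its own entry - which lets you pick the microphone index explicitly instead of relying on the default device. A sketch, assuming PyAudio is installed:

```python
def list_input_devices():
    """Return (index, name) pairs for every device PyAudio reports
    as having at least one input channel."""
    import pyaudio  # imported here so the rest of the script loads without it
    pa = pyaudio.PyAudio()
    try:
        devices = []
        for i in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(i)
            if info.get("maxInputChannels", 0) > 0:   # input-capable only
                devices.append((i, info.get("name")))
        return devices
    finally:
        pa.terminate()

# A chosen index can then be passed explicitly when opening a stream:
# stream = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=1,
#                                 rate=16000, input=True,
#                                 input_device_index=chosen_index)
```

On a webcam+mic combo, the microphone half should appear in this list as its own index, sidestepping the sub-device 0 problem.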
