Replacing cloud services for STT with local ones


#1

Is the Mycroft AI code modular enough that I can replace the current cloud STT with Kaldi or Pocketsphinx running on a local server?

It’s clear why the company uses cloud services for now and is taking the approach of developing OpenSTT, but it’s still an interesting question whether one can replace it with little effort.


#2

I’m running a fully functional Mycroft with local Pocketsphinx and Spanish language.

See “lspeech” here:

https://github.com/SoloVeniaASaludar/mycroft-core/tree/next/mycroft/client


#3

Great wiki by the way https://github.com/MycroftAI/mycroft-core/wiki/Mycroft-in-Spanish

Which pocketsphinx version do you use, 5prealpha? Since you use pocketsphinx for both wake-word recognition and STT, did you try snowboy, and is it supported? For me the latter works pretty well outside of Mycroft.


#4

The AI server doesn’t seem to be replaceable by a local one, right?


#5

The same pocketsphinx that is used and distributed with Mycroft.

I have no experience with snowboy. If you test it, please do not forget to share your experience.

I do not know what you mean by “AI server”.

As for the wiki, it is still mostly valid, but some details should be updated as soon as possible.


#6

I meant: there is a remote server defined in mycroft.conf which points to https://api.mycroft.ai. Is it possible to get rid of all cloud services and use Mycroft standalone?


#7

I think so. I’m using Mycroft fully standalone. Disabled www access, msm skill updates, remote config, etc.


#8

That sounds great. For what purpose are you using Mycroft + pocketsphinx, and what about the repository recognition accuracy?


#9

The same reasons as for all open source: freedom (including costs).

About repository accuracy, and the reason I created a new listener: I do not expect to talk about philosophy with Mycroft, and I do not need free-grammar speech all the time. I need short commands with a restricted grammar that sometimes involve a dialog, and this dialog can contain some free speech or some speech that is not meant for STT.

A restricted grammar improves accuracy a lot (the best STT nowadays has a word error rate of about 5%, so a phrase of 10 words has roughly a 40% chance of containing at least one error).
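
To see how a per-word error rate compounds over a whole phrase, here is a small sketch. It assumes that word errors are independent of each other, which is a simplification and not something measured in this thread; under that assumption a 5% word error rate gives a 10-word phrase roughly a 40% chance of containing at least one error.

```python
def phrase_error_probability(word_error_rate, num_words):
    """Probability that at least one word in the phrase is wrong,
    assuming each word fails independently (a simplification)."""
    return 1.0 - (1.0 - word_error_rate) ** num_words


if __name__ == "__main__":
    # Short commands are hit far less often than long free-form phrases.
    for n in (3, 5, 10):
        p = phrase_error_probability(0.05, n)
        print(f"{n:2d} words: {p:.0%} chance of at least one error")
```

The shorter the command, the smaller the chance that a single misrecognised word spoils it, which is one way to read the argument for small commands.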

Examples:

a) “start tv channel two” => complete command.

b) “mycroft remind me in 10 minutes”; “what to remind?”, “stop kitchen” => command followed by a recording, no STT.

c) “mycroft, look at wikipedia”; “what should I look for?”, “Alan Turing” => command followed by free-grammar STT.
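
The three cases above could be modelled roughly as follows. This is only an illustrative sketch and not how the “lspeech” client is implemented; the names FollowUp, COMMANDS, normalize and follow_up_for are hypothetical, and the command phrases are simply taken from the examples.

```python
from enum import Enum


class FollowUp(Enum):
    NONE = "none"      # complete command, nothing else is expected
    RECORD = "record"  # record the next utterance as audio, no STT
    FREE_STT = "free"  # run free-grammar STT on the next utterance


# Hypothetical command table; the phrases come from examples a) to c).
COMMANDS = {
    "start tv channel two": FollowUp.NONE,
    "mycroft remind me in 10 minutes": FollowUp.RECORD,
    "mycroft look at wikipedia": FollowUp.FREE_STT,
}


def normalize(utterance):
    """Lower-case and drop punctuation so 'Mycroft, look' matches 'mycroft look'."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def follow_up_for(utterance):
    """Return the follow-up mode for a known command, or None if unknown."""
    return COMMANDS.get(normalize(utterance))
```

Keeping the follow-up mode next to each command makes it explicit which utterances go through the restricted grammar and which are handed to free STT or just recorded.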


#10

I got it, and I can confirm that pocketsphinx performs very well with a restricted dictionary set.

I will check your code, as I am interested in the German language.


#11

Hello.
My name is Thorsten and I’m new to Mycroft.

I used the Raspberry Pi image and succeeded in using Mycroft out of the box (great piece of software, by the way).
Then I tried to use a German pocketsphinx configuration based on this wiki article (https://github.com/MycroftAI/mycroft-core/wiki/Mycroft-in-Spanish).

But with the Raspberry Pi image I have no directory structure like “/usr/share/pocketsphinx/model/” or “<your_base_dir>/mycroft/client/speech/model”. Is this wiki documentation not valid for the Raspberry Pi image?

“dpkg -l|grep -i pocket” shows that the pocketsphinx package does not seem to be installed.

Can someone please point me in the right direction?

Best regards
Thorsten


#12

Hi,

A pocketsphinx package for a language is composed of three elements:

  • one dictionary (.dict)
  • the acoustic model (several files such as “mdef” and “means”, usually in a directory called “model”)
  • one language model or grammar (.lm file).

The packages at this address usually follow this format.

To use one with Mycroft, download the package, extract/move/copy the files and end up with the file structure described in the wiki after the phrase “at the end, you must have the following directory, files and softlink:” (the softlink is not mandatory; it can be the real file instead).

When pocketsphinx is used only to detect the wake word, the dictionary and grammar are not necessary, only the acoustic model. If pocketsphinx is used for STT, the dictionary and one or more grammars (lm or jsgf) are needed. See the alternative “lspeech” client here.

This final structure of directories and files is imposed by Mycroft’s speech client. It is the same as for the English language, as can be seen here.
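
A quick way to check such a layout before starting Mycroft might look like the sketch below. The function name missing_model_parts, the “hmm” subdirectory and the list of acoustic model file names are assumptions based on typical CMU Sphinx model packages; adjust them to whatever the wiki lists for your language.

```python
from pathlib import Path

# Assumed file names, typical for CMU Sphinx acoustic model packages.
ACOUSTIC_MODEL_FILES = ("mdef", "means", "variances",
                        "transition_matrices", "feat.params")


def missing_model_parts(lang_dir, need_stt=False, hmm_subdir="hmm"):
    """List what is missing under lang_dir (for example .../model/de-de).

    Wake-word detection only needs the acoustic model; full STT
    additionally needs a dictionary and a language model.
    """
    lang_dir = Path(lang_dir)
    missing = [f"{hmm_subdir}/{name}" for name in ACOUSTIC_MODEL_FILES
               if not (lang_dir / hmm_subdir / name).is_file()]
    if need_stt:
        if not any(lang_dir.glob("*.dic")) and not any(lang_dir.glob("*.dict")):
            missing.append("dictionary (*.dic or *.dict)")
        if not any(lang_dir.glob("*.lm")):
            missing.append("language model (*.lm)")
    return missing
```

An empty list means every file the sketch knows about is present; it does not prove that the files are compatible with each other.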


#13

Hi.
Thanks for your helpful reply :-).

At first I just want to change the “wake word” to a German word. The next step would be switching the complete speech recognition to a local pocketsphinx installation.

What i have done so far:

  1. Downloaded cmusphinx-de-ptm-voxforge-5.2.tar.gz, cmusphinx-voxforge-de.lm.gz and cmusphinx-voxforge-de.dic from https://sourceforge.net/projects/cmusphinx. I used the “ptm” version because it contains a file named “sendump”, which I also found in the original “en-us/hmm” folder.

  2. Extracted and copied the files according to the wiki document.

  3. Created the following directory/file structure:

  • “/home/thorsten/mycroft-core/mycroft/client/speech/model/de-de” containing the files “cmusphinx-voxforge-de.lm” and “cmusphinx-voxforge-de.dic” (these files should not be necessary for just changing the wake-up word).

  • “/home/thorsten/mycroft-core/mycroft/client/speech/model/de-de/hmm” containing the files “feat.params”, “mdef”, “means”, “mixture_weights”, “noisedict”, “README.md”, “sendump”, “transition_matrices”, “variances”.

  4. Set "lang": "de-de" in mycroft.conf and set a German wake-up word.

  5. When starting “start.sh voice” I receive the following error:

2017-06-18 10:57:59,233 - mycroft.configuration - DEBUG - Configuration '/home/thorsten/.mycroft/mycroft.conf' not found
Carnegie Mellon University, Copyright © 1999-2011, all rights reserved
mimic developers, Copyright © 2016, all rights reserved
version: mimic-1.2.0.2 ()
Traceback (most recent call last):
  File "/home/thorsten/mycroft-core/mycroft/client/speech/main.py", line 221, in <module>
    main()
  File "/home/thorsten/mycroft-core/mycroft/client/speech/main.py", line 190, in main
    loop = RecognizerLoop()
  File "/home/thorsten/mycroft-core/mycroft/client/speech/listener.py", line 193, in __init__
    self.load_config()
  File "/home/thorsten/mycroft-core/mycroft/client/speech/listener.py", line 209, in load_config
    self.mycroft_recognizer = self.create_mycroft_recognizer(rate, lang)
  File "/home/thorsten/mycroft-core/mycroft/client/speech/listener.py", line 220, in create_mycroft_recognizer
    return LocalRecognizer(wake_word, phonemes, threshold, rate, lang)
  File "/home/thorsten/mycroft-core/mycroft/client/speech/local_recognizer.py", line 40, in __init__
    self.decoder = Decoder(self.create_config(dict_name))
  File "/home/thorsten/.virtualenvs/mycroft/local/lib/python2.7/site-packages/pocketsphinx/pocketsphinx.py", line 271, in __init__
    this = pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1

After some Google research I found that this error may occur if the “model” path is not set when using Python. But since the path is configured the same way as the “en-us” path, I don’t think this is the problem (https://github.com/cmusphinx/pocketsphinx/issues/32).

I checked the git changes you made and described in the wiki and, as far as I understand, the first two commits have been merged and the other two are closed at the moment. So I have not changed anything in the code so far. Is this necessary for just changing the wake-up word in the first step?

Additional info:

  • Just for testing I renamed the original folder “en-us” to “en-us.old” and the German folder “de-de” to “en-us”, to check whether it’s a problem with the path or with the content of the directory. Since the error is still there, it seems to be a problem with the content of the “de-de” folder.

  • (I think) I’m using the dev branch (whatever you get by default when executing this command: git clone https://github.com/MycroftAI/mycroft-core.git)

  • I’m using Ubuntu Server 16.04

  • So far I haven’t looked at “lspeech”, because I think it’s only needed for the second step.

It would be great if you could give me another hint.

Thanks so far :-).

Thorsten


#14

I failed with the above approach yesterday too. My biggest issue is the mic I use, which is not well supported Python-wise by the vendor.

I also went ahead and tried to install pocketsphinx 5prealpha with the German language. Maybe we can sync up and get this task done together, as we seem to have similar goals.


#15

As far as I can see, our common goal is to use mycroft-core with “offline” German voice recognition, using pocketsphinx for the wake-up word and for the whole recognition. So it would be great if we could support each other in solving our problems.

I’m using a cheap headset which works without problems using the default configuration (arecord and aplay work fine).
I read that installing an additional pocketsphinx is not necessary, because Mycroft has a built-in pocketsphinx installation used for recognition of the wake-up word.

I just checked that I am using the dev branch and applied all code changes mentioned in the wiki. But modifying core.py led to an error during “start.sh skills”, so I rolled this modification back.

I’m still getting nearly the same error as described here.

So is there a way I can help to make German recognition functional?


#16

Hi,

In this Mycroft source file you can see the statement:

config.set_string('-logfn', '/dev/null')

which disables the pocketsphinx log. You can modify it to point to some file, for example:

config.set_string('-logfn', '/var/log/pocket.log')

This way we can see why pocketsphinx is not starting.

My first hypothesis is that the English wake word is still configured, but it is better to go step by step, guided by the log files.
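
The log redirection above could be wrapped in a small helper that first checks whether the log file can actually be written, since a path such as /var/log/pocket.log may need to be created and given the right permissions first. This is only a sketch: enable_decoder_log is a hypothetical name, and `config` is assumed to be any object offering the set_string(key, value) call shown in the statement above.

```python
def enable_decoder_log(config, log_path):
    """Send the decoder log to log_path if that file can be written.

    `config` is any object with a set_string(key, value) method, such as
    the pocketsphinx config object used in the statement above.
    Returns True if the log was redirected, False otherwise.
    """
    try:
        # Opening in append mode creates the file if needed and fails
        # early when the directory is missing or permissions are wrong.
        with open(log_path, "a"):
            pass
    except OSError:
        return False
    config.set_string("-logfn", log_path)
    return True
```

If the helper returns False, the existing '-logfn' setting is left untouched, so the decoder keeps whatever log target it had before.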


#17

Hi.

Sorry for the delayed answer.
Setting up the logfile parameter (config.set_string('-logfn', '/var/log/pocket.log')) was really helpful. After creating the logfile and adjusting the permissions I got hints in it that confirm your “first hypothesis” :-).

[snip]

INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/thorsten/mycroft-core/mycroft/client/speech/model/de-de/hmm/means
INFO: ms_gauden.c(242): 66 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/thorsten/mycroft-core/mycroft/client/speech/model/de-de/hmm/variances
INFO: ms_gauden.c(242): 66 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(304): 3955 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /home/thorsten/mycroft-core/mycroft/client/speech/model/de-de/hmm/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5198
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4101 * 32 bytes (128 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /tmp/tmpr8TV5V
> ERROR: "dict.c", line 195: Line 1: Phone 'EY' is mising in the acoustic model; word 'hey' ignored
INFO: dict.c(213): Dictionary size 1, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 1 words read
INFO: dict.c(358): Reading filler dictionary: /home/thorsten/mycroft-core/mycroft/client/speech/model/de-de/hmm/noisedict
INFO: dict.c(213): Dictionary size 4, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 66^3 * 2 bytes (561 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 105072 bytes (102 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 105072 bytes (102 KiB) for single-phone word triphones
INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -2024, delay 10)
> ERROR: "kws_search.c", line 171: The word 'hey' is missing in the dictionary
INFO: kws_search.c(467): TOTAL kws 0.00 CPU -nan xRT
INFO: kws_search.c(470): TOTAL kws 0.00 wall -nan xRT
[/snip]
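
The “Phone ‘EY’ is mising” line suggests that the configured wake-word phonemes are not part of the German phone set. A rough pre-check could compare the phonemes against the phones used in the language’s dictionary file. This is a sketch under two assumptions: that each dictionary line has the form “WORD PHONE1 PHONE2 ...”, and that the dictionary uses the same phone set as the acoustic model. The function names are hypothetical.

```python
def phones_in_dictionary(dict_lines):
    """Collect every phone used in a Sphinx-style dictionary.

    Each line is expected to look like: WORD PHONE1 PHONE2 ...
    """
    phones = set()
    for line in dict_lines:
        parts = line.split()
        phones.update(parts[1:])  # skip the word itself
    return phones


def unknown_phones(wake_word_phonemes, known_phones):
    """Return the wake-word phonemes that are not in known_phones.

    A '.' separates words in the phoneme string and is ignored.
    """
    return [p for p in wake_word_phonemes.split()
            if p != "." and p not in known_phones]
```

With a real dictionary, the lines could come from iterating over the opened .dic file; any phoneme returned by unknown_phones is a candidate for the kind of error shown in the log.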

I double-checked my mycroft.conf file to make sure the original wake-up word “hey mycroft” isn’t in there.

// Settings used by the wake-up-word listener
// Override: REMOTE
"listener": {
  "producer": "pocketsphinx",
  "grammar": "lm",
  "sample_rate": 16000,
  "channels": 1,
  "record_wake_words": false,
  "wake_word": "charlotte",
  "phonemes": "SH AH EX L OO T AX",
  "phoneme_duration": 120,
  "standup_word": "charlotte",
  "standup_phonemes": "SH AH EX L OO T AX",
  "standup_threshold": 1e-90,
  "threshold": 1e-90,
  "multiplier": 1.0,
  "energy_ratio": 1.5
},

There are no other Mycroft config files (such as in /home/thorsten/.mycroft or /etc/mycroft.conf) and I found nothing about a cached version of the original config file.

I also checked the code patches you described in your wiki.

I would really appreciate your further support.

Thorsten


#18

I searched the files for the string “hey”, but it turns up primarily in “test” files, so that should not be the cause. I also used “strace start.sh voice” to find out which file containing “hey” is read, without success.

Update 1:
Maybe I should have read the docs in more detail. According to the thread “Changing the wake word” I should disable remote updates or configure the wake-up word on home.mycroft.ai. I will give it a try soon.


#19

Yes, remote configuration is another feature that has to be disabled. I will edit the wiki to add this point.


#20

Thank you @PasabaPorAqui for updating the wiki.
Now my German wake-up word works like a charm.

Following the wiki, I added the following two lines to the listener section for offline pocketsphinx STT.

"producer": "pocketsphinx",
"grammar": "lm",

This is the whole section.

// Settings used by the wake-up-word listener
// Override: REMOTE
"listener": {
  "producer": "pocketsphinx",
  "grammar": "lm",
  //"grammar": "jsgf",
  "sample_rate": 16000,
  "channels": 1,
  "record_wake_words": false,
  //"wake_word": "hey mycroft",
  //"phonemes": "HH EY . M AY K R AO F T",
  "wake_word": "charlotte",
  "phonemes": "SH AH EX L OO T AX",
  "standup_word": "charlotte",
  "standup_phonemes": "SH AH EX L OO T AX",
  "standup_threshold": 1e-90,
  "phoneme_duration": 120,
  "threshold": 1e-90,
  "multiplier": 1.0,
  "energy_ratio": 1.5
},

With this configuration German STT works, but only through the Mycroft cloud service. After I disabled my network interface I get no STT apart from the wake-up word; I just receive API errors because of the connection failure.

Commenting out the complete “server” block produces a syntax error on “start.sh voice”.
Writing an empty URL does not work either.
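
One possible cause of such a syntax error (not confirmed here) is a comma left dangling after the block before the commented-out one. A small checker along these lines can catch that before starting Mycroft. It is a sketch that assumes mycroft.conf is JSON with full-line // comments, as in the excerpts in this thread; load_commented_json is a hypothetical helper, not Mycroft’s own loader.

```python
import json


def load_commented_json(text):
    """Parse JSON that may contain full-line // comments.

    Only lines whose first non-blank characters are // are dropped, so a
    URL such as https://api.mycroft.ai inside a value is left untouched.
    Raises ValueError (json.JSONDecodeError) if the rest is not valid JSON.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))
```

Reading the file’s contents and passing them to this function either returns the parsed settings or raises an error whose message includes the line and column of the problem.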

The files (.dict, .lm) are in the “de” directory.

Did I miss something in the wiki (e.g. the git changes)?

Thanks for your help :-)