
Wireless conversational picroft

Been having fun with this. You guys created something really special.
After installing it on a pi 4, I added a microphone and a speaker.

Then I loaded a local version of DeepSpeech, and it is now running locally as a service. It is launched from cron and restarted if it goes down. I made the changes in the config file, but I do not know how to tell which service I am really using. Is there a way to confirm that it is using the local service?
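
For reference, a config block along these lines should point Mycroft at the local server (this assumes the deepspeech_server STT backend in mycroft-core and my server on port 8080; the exact keys may differ by version):

    "stt": {
        "module": "deepspeech_server",
        "deepspeech_server": {
            "uri": "http://localhost:8080/stt"
        }
    }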

I also added ChatScript as a local service, again restarting it if it goes down. ChatScript is an advanced conversational chatbot framework with a lot of capability built in. On top of it I added a bunch of custom code, so basically you can talk to it, it remembers you, and it is somewhat intelligent. Everything is stored locally and it learns from you, basically as triples.
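
Talking to ChatScript is just one null-delimited volley over a TCP socket per exchange. A minimal client sketch, assuming the default port 1024 and the server's default bot (not my production code, just the shape of it):

    import socket

    def chatscript_volley(message, user="pi", bot="", host="localhost", port=1024):
        # ChatScript expects a single payload: username \0 botname \0 message \0
        # An empty botname selects the default bot; an empty first message
        # asks the bot for its greeting.
        payload = f"{user}\0{bot}\0{message}\0".encode("utf-8")
        with socket.create_connection((host, port), timeout=10) as sock:
            sock.sendall(payload)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("utf-8", errors="replace").strip("\0").strip()

    print(chatscript_volley(""))                    # greeting volley
    print(chatscript_volley("My name is Sam."))
    print(chatscript_volley("What is my name?"))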

It looks like the text to speech can be local if I choose the British male voice. I have to look at this tomorrow and see if I can make this part local as well, because right now it looks like everything goes to the Mycroft servers and there is no obvious option for running it locally.
I am not at all opposed to supporting this effort, but I want an option to run it without internet if it is not connected.
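
If the British male voice really is the on-device one (mimic), then I would guess the TTS block ends up looking something like this, though I have not confirmed the exact keys yet:

    "tts": {
        "module": "mimic",
        "mimic": {
            "voice": "ap"
        }
    }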

I got everything to work end to end, but I am not sure which STT and TTS I am actually using.

With it all put together, you can talk into the Pi's microphone; it does STT and sends the text to ChatScript. ChatScript does the conversational magic and replies with text, the text is sent to the text-to-speech engine, and the audio goes out through the cable to the speaker.
Basically, it is a wireless standalone chatbot.
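
On the Mycroft side the glue is basically a fallback skill: anything no other skill claims gets forwarded to ChatScript and the reply is spoken back. A rough sketch of the shape of it (names like ChatscriptFallback are made up for illustration; my real code has more error handling):

    import socket

    from mycroft import FallbackSkill


    class ChatscriptFallback(FallbackSkill):

        def initialize(self):
            # Middling priority so real skills get first crack at the utterance.
            self.register_fallback(self.handle_chat, 75)

        def handle_chat(self, message):
            utterance = message.data.get("utterance", "")
            reply = self._ask_chatscript(utterance)
            if not reply:
                return False       # let the next fallback have a go
            self.speak(reply)
            return True            # utterance handled

        def _ask_chatscript(self, text, host="localhost", port=1024):
            # Same null-delimited volley as the client sketch above.
            payload = f"pi\0\0{text}\0".encode("utf-8")
            try:
                with socket.create_connection((host, port), timeout=10) as sock:
                    sock.sendall(payload)
                    data = b""
                    while True:
                        chunk = sock.recv(4096)
                        if not chunk:
                            break
                        data += chunk
            except OSError:
                return ""
            return data.decode("utf-8", errors="replace").strip("\0").strip()

        def shutdown(self):
            self.remove_fallback(self.handle_chat)
            super().shutdown()


    def create_skill():
        return ChatscriptFallback()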

I have some questions. How can I tell which services it is really using? I guess I could unplug the network and see if it still works?

And DeepSpeech is really slow. I am wondering if anyone has really gotten it to run quickly on a Pi.

The ChatScript engine is also running in an app called elfchat on Android and iOS (free); I am just seeing if there is another use for my code on this platform.

just some ramblings


It's a bit faster than realtime on a Pi4, but depending on your setup you could be waiting for the end of voice plus a timed silence before the audio is even presented to DeepSpeech; that adds time before the intent, then there is intent processing time, and finally the intent action such as text to speech.

I think in the new 8.0 alpha the internal KWS & VAD callback implementation of DeepSpeech might show its head, as it is on the roadmap.
It would then be streamlined for streaming, needing only the audio input, as I guess it could be used directly.

But from benchmarks it's about 1.2x realtime, still limited to a single core, and a Pi3 drops down to less than 0.5x realtime.

So yeah, it's not the fastest, but when it actually receives the audio, and how long intent action starts and complete, could also be adding what seems like huge latency in the response.
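
For reference, the streaming API in the plain deepspeech 0.7 Python package looks roughly like this (just a sketch, reading one of the sample wavs in chunks to stand in for live mic audio); fed this way, most of the inference overlaps with the speech instead of only starting after the silence timeout:

    import wave

    import numpy as np
    from deepspeech import Model

    model = Model("deepspeech-0.7.0-models.tflite")
    model.enableExternalScorer("deepspeech-0.7.0-models.scorer")

    stream = model.createStream()
    with wave.open("audio/2830-3980-0043.wav", "rb") as wav:
        while True:
            chunk = wav.readframes(1024)   # 16 kHz, 16-bit mono PCM
            if not chunk:
                break
            stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
            # stream.intermediateDecode() gives a running partial transcript
    print(stream.finishStream())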

7.1 is the current stable release as 8.x alpha is now live.

A Pi is a slow computer. DeepSpeech, even with the TFLite model, is going to be less than ideal. You can check the logs on your instance of DeepSpeech; it should be logging something somewhere. What config changes did you make? Did you use the streaming or the server instance for DS?
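
Something along these lines should show which backends the voice and audio clients actually load, assuming the stock picroft log locations (adjust the paths if yours differ):

    tail -f /var/log/mycroft/voice.log | grep -i stt    # which STT backend the voice client loads
    tail -f /var/log/mycroft/audio.log | grep -i tts    # which TTS engine gets used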

Yes, I tried the Pi 3 versus the Pi 4, and the 4 is a lot faster. I was replicating the test posted on seed. While I appreciate her post, I do not see the same results (faster than speech), although she is on an earlier version, v6, and other factors are different.

Here are two ways I tested it:

  1. Running it from the command line:
    (.venv) pi@picroft:~ $ deepspeech --model deepspeech-0.7.0-models.tflite --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav

    Loading model from file deepspeech-0.7.0-models.tflite
    TensorFlow: v1.15.0-24-gceb46aa
    DeepSpeech: v0.7.1-0-g2e9c281
    Loaded model in 0.00428s.
    Loading scorer from files deepspeech-0.7.0-models.scorer
    Loaded scorer in 0.00109s.
    Running inference.
    experience proves this
    Inference took 3.650s for 1.975s audio file.

  2. By calling the locally running server:
    (.venv) pi@picroft:~ $ time curl -X POST --data-binary @audio/4507-16021-0012.wav http://localhost:8080/stt
    why should one hall on the way
    real 0m5.030s
    user 0m0.038s
    sys 0m0.028s

I will have to look at this later, maybe there is some setting that delays the timing.
Also, I was surprised to see that the command line is significantly faster than a running instance of this service. But looking at the load time, I guess this makes sense.

I am not sure I understand this, “how long intent action starts and complete”.
I will try other audio files.

Re: DeepSpeech on a Pi, not an out-of-the-box solution for you, but check this out:

It contains streaming STT for DeepSpeech and Kaldi, and should run on a Pi.

Not integrated in mycroft-core yet (but should be fairly easy to add support for it)


Dunno, strange, as I was running 7.0 and was getting exactly what they were saying: approx 1.2x realtime for the Pi4 and < 0.5x for the Pi3.

But that was just a vanilla DeepSpeech-only install, on the couple of sample wavs they provide, and the benchmarks were bang on what was expected.
If you run with the scorer, dunno what it does, but you have to run it a couple of times and the benchmarks eventually come out right.

I was also not all that impressed by the single-core use, as a DNN is possible to multithread, it is just a bit complex.
It was just the thought that it might get to better than realtime on a Pi3, and on a Pi4, well, then you're talking.

What is also interesting for the Pi4 is the new RaspiOS 64-bit release, as it could well translate to another 20-30%.
But multithreading seems to be a touchy subject amongst the Mozilla developers.
I noobed my comments in there, though.

Re "how long intent action starts and complete": yeah, not very sensical, but whatever your input process is, it is the time it takes to get the result to the user. TTS has latency also.

Excellent! I don't suppose you have the Mycroft skill for connecting to the ChatScript chatbot on GitHub? I'd love to load both the skill and ChatScript on my animatronic and fuss around with it.

Charles, IM me directly for the code. It is a series of steps; if we can get it to work on your end, maybe we can publish something for others. I got it to work, but it took a number of steps.
info @ projectonegames.com