Speed of reacting to the wake word

Hi

I use Mycroft on a Pi 3.
When I call the wake word, I can see in the CLI that it is recognized, but it takes 1.3 seconds until I hear the sound over my 3.5mm headphones telling me that Mycroft is listening. In some videos I can see that it can react faster.
Can someone tell me if the Pi 3 is the bottleneck, or what I can improve?

Andy

If you run the command:

pactl list sinks

What latency does it report for your audio device?
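For example, something like this should print just the latency lines from that output:

pactl list sinks | grep -i latency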


0.78 seconds

State: IDLE
        Name: alsa_output.platform-soc_audio.analog-mono
        Description: Built-in Audio Analog Mono
        Driver: module-alsa-card.c
        Sample Specification: s16le 1ch 48000Hz
        Channel Map: mono
        Owner Module: 7
        Mute: no
        Volume: mono: 51773 /  79% / -6.14 dB
                balance 0.00
        Base Volume: 56210 /  86% / -4.00 dB
        Monitor Source: alsa_output.platform-soc_audio.analog-mono.monitor
        Latency: 782805 usec, configured 1365333 usec
        Flags: HARDWARE HW_MUTE_CTRL HW_VOLUME_CTRL DECIBEL_VOLUME LATENCY
        Properties:
                alsa.resolution_bits = "16"
                device.api = "alsa"
                device.class = "sound"
                alsa.class = "generic"
                alsa.subclass = "generic-mix"
                alsa.name = "bcm2835 ALSA"
                alsa.id = "bcm2835 ALSA"
                alsa.subdevice = "0"
                alsa.subdevice_name = "subdevice #0"
                alsa.device = "0"
                alsa.card = "0"
                alsa.card_name = "bcm2835 ALSA"
                alsa.long_card_name = "bcm2835 ALSA"
                alsa.driver_name = "snd_bcm2835"
                device.bus_path = "platform-soc:audio"
                sysfs.path = "/devices/platform/soc/soc:audio/sound/card0"
                device.form_factor = "internal"
                device.string = "hw:0"
                device.buffering.buffer_size = "131072"
                device.buffering.fragment_size = "131072"
                device.access_mode = "mmap+timer"
                device.profile.name = "analog-mono"
                device.profile.description = "Analog Mono"
                device.description = "Built-in Audio Analog Mono"
                alsa.mixer_name = "Broadcom Mixer"
                module-udev-detect.discovered = "1"
                device.icon_name = "audio-card"
        Ports:
                analog-output: Analog Output (priority: 9900)
        Active Port: analog-output
        Formats:
                pcm

Removing the time-based scheduler might bring it down.

Edit your /etc/pulse/default.pa and find:

load-module module-udev-detect

Replace it with this:

load-module module-udev-detect tsched=0

You might need a reboot to get these settings applied easily. That should bring the latency down.
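If you would rather not reboot, restarting PulseAudio for your user should also pick up the change (assuming the usual per-user autospawn setup, not system mode):

pulseaudio -k
pactl list sinks | grep -i latency

The first command kills the running daemon so it respawns with the new config; the second lets you check whether the latency actually went down.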

There is more tweaking you can do, but it is a bit of trial and error.
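As a starting point (just a sketch; the values below are examples, and the right ones depend on your hardware), the fragment settings in /etc/pulse/daemon.conf are worth experimenting with once tsched is off:

; /etc/pulse/daemon.conf -- example values only
default-fragments = 2
default-fragment-size-msec = 10

Smaller fragments generally mean lower latency, but go too small and you get crackling or dropouts, so change one value at a time and re-check with pactl list sinks.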


Woohoo, nice => 0.098s
Latency: 98142 usec, configured 99954 usec

Now it is much snappier. The answers still take some time, but it feels much better now.

The answers need to come from an external server, so there is nothing you can do there.

Yeah, exactly. Maybe in the future Mycroft can already start talking while the info from the internet is being delivered, so there is not that “huge” waiting time.

For the Mark-2 they make use of the streaming TTS of Google. You could do that yourself as well, but it requires an API key.

Haven’t looked into that yet myself, perhaps @forslund or @gez-mycroft can help you out with that.

thx I’m currently installing

mycroft-pip install google-cloud-speech

but this takes ages (still installing)… I already have a working API key since I use it with my OpenHAB system.

By the way, is it possible to send the text recognized by Mycroft to a URL? I'm thinking of this scenario: Mycroft passes the recognized text to the REST URL of OpenHAB, which then decides via a “rule” whether or not it should play the text via TTS (of OpenHAB) to the PulseAudio speakers.
(I know there is an OpenHAB skill, but I would like to have the text recognized by Mycroft available as text in an OpenHAB rule.)
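Something like this little script is what I have in mind (a rough, untested sketch; the item name and the OpenHAB URL are placeholders, and it assumes the mycroft-messagebus-client package is installed):

#!/usr/bin/env python3
# Rough sketch: forward every utterance Mycroft recognizes to an OpenHAB item,
# so an OpenHAB rule can decide what to do with it.
import requests
from mycroft_bus_client import MessageBusClient

# Placeholder: adjust host/port and item name to your OpenHAB setup
OPENHAB_ITEM_URL = "http://openhab.local:8080/rest/items/MycroftUtterance"

def forward_utterance(message):
    # 'utterances' is a list of possible transcriptions; take the first one
    text = message.data.get("utterances", [""])[0]
    if text:
        # Posting plain text to an item's REST endpoint sends it as a command
        requests.post(OPENHAB_ITEM_URL, data=text,
                      headers={"Content-Type": "text/plain"})

client = MessageBusClient()
client.on("recognizer_loop:utterance", forward_utterance)
client.run_forever()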

Euhm, I see I (once again) mixed up the STT and TTS stuff…

Please double check and see if it is still what you want :wink:

This is a great tip on the time-based scheduler. Will add it to the docs.

Yeah, the Mark II prototypes are using the Google Cloud Streaming STT. There is also built-in support for DeepSpeech Streaming STT, and the new IBM Watson service provides Streaming STT, but Mycroft Core hasn't yet been updated to support it.
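If you want to try the Google one on your own device, the mycroft.conf block should look roughly like this (key names from memory, so double-check against stt.py in mycroft-core; the credential is your own service account JSON):

"stt": {
  "module": "google_cloud_streaming",
  "google_cloud_streaming": {
    "credential": {
      "json": { ...your service account JSON... }
    }
  }
}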

Yes, as the RPi does not have a hardware clock, time-based scheduling is really not optimal.

I know nothing about it; is this something we should just set to 0 by default?

Depends a bit; it could be, now that we use PulseAudio by default.

But it perhaps needs some testing from different users. I have been using it for a while now and have not run into problems, but with high-bitrate music playback this might not be beneficial.

Can you help me with how to change that value so it equals 0 on my Raspberry Pi?

In /etc/pulse/default.pa add tsched=0 to the line load-module module-udev-detect so it looks like

load-module module-udev-detect tsched=0

When I did this, it replied faster, but the first word of the response seems to be cropped. Somebody wrote that 0.098s is giving great results, but I don't know how to change its value from 0 to 0.098.

Have you set some parameter in the /etc/pulse/daemon.conf file?

For every answer Mycroft generates a new mp3 that is then played to the PulseAudio server.
Now, for example, mplayer needs around 2 seconds to open the tunnel to the PulseAudio server and then play the mp3. How can we improve this?
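One thing I noticed (key name from memory, please verify against the default mycroft.conf): the command used for mp3 playback seems to be configurable, so a lighter player than mplayer might already help a bit:

"play_mp3_cmdline": "mpg123 %1"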

One way would be to have a stream open the whole time Mycroft is running, with an inaudible audio signal. Then once there is a new answer to play through the speakers, the connection is already up and running and the new “mp3” can be injected into the same stream…
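A related trick at the PulseAudio level that might get you most of the way there (untested on a Picroft image): stop the sink from being suspended when idle, so it does not have to be reopened for every answer. In /etc/pulse/default.pa, comment out the suspend module (or give it a long timeout):

### Keep the sink open instead of suspending it when idle
# load-module module-suspend-on-idle

Then restart PulseAudio as above and see if the 2-second startup delay shrinks.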