Respeaker 6 Mic Array AEC and Noise Suppression

Hi All,

I know this is not the correct Forum to ask in, but the ReSpeaker Forum is pretty bad for finding answers. I am hoping someone here can help me. I have a ReSpeaker 6 Mic Array, and it works pretty well. The last thing I want to do with it is enable Acoustic Echo Cancellation and Noise Suppression, for better wake word detection if music is playing in the background.

Has anyone got this working? The documentation is confusing. It lists example code to enable these features, but it also says that processed audio is available through the loopback interface. Does anyone know which is correct? I can't see why they would provide code to process the signal if it is already done on the board itself? I'm confused.

Hope someone can help me out here,

Thanks in advance,

Stephen

@StuartIanNaylor Is the current AEC expert.


No expert, as it's Dom with the ReSpeaker USB things.

It needs to be set to a limited channel mode; with all channels it just doesn't work.

It's all there on the ReSpeaker wiki, as it has a Python properties control app where you set things.
Channels, I think, might be set by the firmware you flash.

If it's the 6 mic for Pi, dunno, never played with it, but quite likely, like the 4 mic for Pi, the loopback channels are somewhere between outright BS and active imagination via Seeed.

Noise suppression for a voice AI is sort of pointless really, as it should be done via the MFCC process.
There are some excellent MFCC libs such as Librosa https://librosa.github.io/librosa/ but what we have is quite poor in comparison.
https://github.com/JuliaDSP/MFCC.jl is godlike, but Julia and not Python.

It's shown with an EC lib, so I presume it doesn't have hardware EC, as why mention the software one?

Just do as it says.

sudo apt-get -y install libasound2-dev libspeexdsp-dev
git clone https://github.com/voice-engine/ec.git
cd ec
make

Install the alsa fifo

git clone https://github.com/voice-engine/alsa_plugin_fifo.git
cd alsa_plugin_fifo
make && sudo make install

Copy the asound.conf to /etc/

Run ec in cli console ./ec -i plughw:1 -o plughw:1 -d 75 -f 2048

In another cli console arecord -fS16_LE -r16000 rec.wav

In another cli console aplay file_example_WAV_10MG.wav

When you stop recording you will notice EC ends.
So to make it permanent you will just have to sort the pcm names and device names, but using the examples:

sudo modprobe snd-aloop
arecord -D my-ec-output | aplay -D aloop,dev0

Make aloop,dev1 the default capture and ec will always record, but only kick in on media play.
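As a rough sketch of what that asound.conf wiring could look like (the pcm and device names here are guesses for illustration, not the names from the ec repo — snd-aloop's card is named "Loopback" by default):

```
# hypothetical /etc/asound.conf fragment
# playback goes to the real card; default capture is the far side
# of the snd-aloop loopback that ec's output is being piped into
pcm.!default {
    type asym
    playback.pcm "plughw:0"
    capture.pcm "plughw:Loopback,1"
}
```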
I think there is probably a slight mistake in the code (ec.c) where the small 10 ms filter_frame should be a power of 2 (128, 256 or 512), as FFTs like it that way. 10 ms of 16000 Hz is between 128 & 256, but Speex seems to think 20 ms, which is maybe why; I think 256 sounds a little better and 512 seems to work OK as well.
He has actually set the filter_tail to a power of 2, and actually that doesn't matter so much, but no harm in keeping to the power of 2 :slight_smile: .
So maybe do a hard edit and make.
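A quick sanity check on those frame sizes, using plain shell arithmetic (nothing assumed beyond the 16000 Hz rate):

```shell
# samples per filter frame at 16000 Hz
echo $((16000 * 10 / 1000))   # 10 ms -> prints 160 (between 128 and 256)
echo $((16000 * 20 / 1000))   # 20 ms -> prints 320 (between 256 and 512)
```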

So it all works via ALSA, kicks in only when media is played, and with the loopback will not need a restart on each recording end.

Only thing I did differently: as I noticed the Raspbian speex & speexdsp are quite old and an RC, I just downloaded the tar and compiled and installed the 1.2.0 release before making EC.
ec_hw doesn't seem to work with my 4 mic linear, but I'm not surprised, as I am unsure if it has the loopback channel that is stated in the sales lit; I think they are mixing it up with the USB 4 mic.

If someone can provide a working latency measurement of that arrangement it would be really appreciated, as my 75 ms delay is a total guess.
I tried alsabat --roundtriplatency and couldn't seem to get it to work, but it does with straight hardware. Also it probably wouldn't be correct, because we have a pipe through a loopback after EC.

Once more my annoying voice, but a default install of ec without any tinkering on a Pi4 should sound like this.


@Stephen_O_Sullivan

If you have any problems with EC then give us a shout; it's actually really easy to install.
It also has a pulseaudio default.pa.

Needs a Pi3A+ or Pi4, as the poor old single-core Zero has neither Neon nor oomph for software filters.
It requires the capture and playback to run from a single clock so they're synchronized, which means they need to be on the same card.
Otherwise the clock drift just kills the EC process.

So use the output of the card not the Pi3 3.5mm.

@StuartIanNaylor,

thanks for taking the time to respond.

Input and output are done on the same card already, and I have a Raspi 4 (4GB), so all requirements should be met.

My setup works very well with the ReSpeaker card. Just sometimes, the wake word is not captured if the music is on the loud side.

So, what do I need to do to enable the echo cancellation?

Thanks and Regards,

Stephen

Grab a fresh image of Raspbian.

PS on a Pi4 a 64-bit OS is actually about 15-20% faster, but just grab a 32-bit Raspbian Lite and give it a fresh flash, just to get to grips with things.
Install that horrid respeaker driver.

I am not even going to bother with the loopbacks on the 6 mic and just stick to what I know. I am presuming they are much the same, just hardware versions of the loopbacks that the kernel provides with snd-aloop, and you can play with those later.

I am pretty sure the echo mentions of Respeaker are snake-oil as the silicon lacks DSP.

From the repo https://github.com/voice-engine/ec do as it says.
For some reason Raspbian still includes the RC version of SpeexDSP. It's up to you, as compiling the latest build is easy; whether this step is needed or not, I will just run through it here, or just jump to the instructions at https://github.com/voice-engine/ec

git clone https://github.com/xiph/speexdsp
Grab the latest and then cd speexdsp
Add some compile tools: sudo apt-get install autotools-dev autoconf libtool pkg-config
Think it's ./autogen.sh
From memory
./configure -h will show the options; the only one you need to add is the --libdir= which is that funny /usr/lib/gnu-linux-whatever folder
so ./configure --libdir=gnu-whatever

Don't enable Neon, as it just causes errors and gets enabled anyway.

Then make
then sudo make install

That gives you the release version and installs speexdsp rather than the Raspbian RC version, but both still work.

So, whether or not you compiled and installed speexdsp, do as in https://github.com/voice-engine/ec#build :
clone EC, cd into the dir and make.
Stay in that directory when you run ./ec, as it's not in any path anywhere.

But cd .. and then do as https://github.com/voice-engine/ec#use-ec-with-alsa-plugins-as-alsa-devices

Then copy https://github.com/voice-engine/ec/blob/master/asound.conf to either /etc/asound.conf (system wide) or ~/.asoundrc (user)

Then cd back into the EC folder, do as https://github.com/voice-engine/ec#ec-for-raspberry-pi, and ignore the hardware loopback directions for now.

You need to do an aplay -l / arecord -l to get the index of the respeaker, but likely:
./ec -i plughw:0 -o plughw:1 -d 20

I tend to disable pi audio when not used as it just makes things more tidy

sudo nano /etc/modprobe.d/raspi-blacklist.conf
Add blacklist snd_bcm2835, save, reboot, and now your respeaker will be index 0 for capture and playback and the 3.5mm soc_audio will have gone.

Not essential but yeah a little more tidy when it comes to audio asound.conf and things.
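For reference, after that edit the blacklist file contains just the one line:

```
# /etc/modprobe.d/raspi-blacklist.conf
blacklist snd_bcm2835
```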

so then it becomes ./ec -i plughw:0 -o plughw:0 -d 20

Leave that open in a terminal window so you can see stdout in the window.

Open another terminal; the asound.conf we set earlier will have set the defaults, so just set the sampling rate, format and file to record to.

arecord -fS16_LE -r16000 rec.wav

In another terminal playback an example wav whilst talking into the mic.

wget https://file-examples.com/wp-content/uploads/2017/11/file_example_WAV_10MG.wav

aplay file_example_WAV_10MG.wav

When it's finished playing, go back to the arecord terminal and press ctrl+c, and that will stop recording.

EC ends when the recording ends, so it will not be accessing the card, and you can just aplay -Dplughw:0 rec.wav to check your results.

That's just the test, and it's wise to do it to make sure all is working; then we just need to add a few lines to the Mycroft start service and script, as EC only kicks in when media is playing, and that's it done.

What I suggest you do is record a non-EC capture via plughw:0, without EC running:
arecord -Dplughw:0 -fS16_LE -r16000 no-ec.wav
In a separate terminal:
aplay -Dplughw:0 file_example_WAV_10MG.wav
Talk proverbial into the mic.
ctrl+c the arecord terminal at the end.

Then run ./ec

then the same, but using the asound.conf defaults:
arecord -fS16_LE -r16000 ec.wav
In a separate terminal:
aplay file_example_WAV_10MG.wav
Talk proverbial into the mic.
ctrl+c the arecord terminal at the end.

Then compare; if all is OK then we will automate the startup. It probably seems complex, but once you have done it step by step once, it will all become fairly apparent and easy.

@StuartIanNaylor

“I am pretty sure the echo mentions of Respeaker are snake-oil as the silicon lacks DSP.”
I am beginning to think that also. It's a pity, as I was under the impression, from looking at the Seeed website, that it was possible on the card itself, which is why I bought it. But to be fair, I don't think they were being intentionally deceitful; I think they just have poor English. But this is useful information, so I can stop chasing down this rabbit hole.

I have tried setting up EC, but I could not get it working. I will try your instructions and see if I can make some progress.

By the way, I did try some of the Seeed scripts, and they did seem to work OK. So I can look at integrating them into Mycroft. But honestly, the Mycroft wake word works 95% of the time. I am not sure how much effort I am willing to invest to get it to 97/98%.

I'll try your instructions tomorrow and I'll let you know the results.

Thanks for the help,

Regards,

Stephen

I sort of view it as snake oil, as if you check the forums there is a long history of this and nothing has been done to change the sales pitch.
They do know, have known, and still continue.

My first noob purchase of the 4 mic was the same, and I found it's common for them to make claims that are really not true, or at least misleading.
The number of times that happens surely can not be by mistake, but by choice.

EC is for when you're playing media, and you will not get 95% when it is playing; you might be lucky to get 10% whilst yelling close to the mic.
It depends on the volume and position of your speaker, but often for the mic it's louder than your voice, and at that point EC is essential.

If you're not doing 'barge in' then you don't need EC; if you are playing media, use EC, or hopefully wait till it ends.

@StuartIanNaylor
I tried again this morning, but no joy.

I just have a few questions:

Do I completely overwrite the old file or just add the new values to the end of the file?

I need to shut down PulseAudio, or else I get a message saying the resource is busy. Is this normal? When the EC command is running, should I be seeing something? I see nothing change as I am recording.

The recording seems to be an empty file. It seems to record nothing.

When the EC command is running, I get no output through the speakers.

Any further tips?

By the way, for the 6 mic array, the command should look something like this:

./ec_hw -i plughw:0 -c 8 -l 7 -m 0,1,2,3

Regards,

Stephen

Start with a fresh flash of Raspbian, just to prove you're doing it right.

You're probably using the Picroft image, which I think has a really bad sound setup, or at least did the last time I looked at it.
That is how much I don't like it that I will not use it, but I'm presuming that's what you have, as you have PulseAudio installed.

Don't use ec_hw, as all it does is use the hardware loopbacks of the card, which has no real advantage over kernel loopbacks. The 4 mic is supposed to have loopbacks, but it's more Seeed snake oil, so I never used ec_hw; just from reading comments on the GitHub, they didn't get it to work.
There's zero advantage in terms of EC to having loopbacks; they just allow you to present an output as a standard source input. The kernel supports 128 of them, or was it 256, I forget, but one of the 2 :slight_smile: via modprobe snd_aloop, so whoop if you have 1 on a card. Thinking it's 128, as 16 snd-aloop cards of 8 subdevices each.
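That count is just simple arithmetic:

```shell
# 16 snd-aloop cards, each exposing 8 subdevices
echo $((16 * 8))   # prints 128
```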

We can add to an existing asound.conf, or use the pulseaudio https://github.com/voice-engine/ec/blob/master/pulse.default.pa, or even set up for PulseAudio and set ALSA to use that PulseAudio.

For some unknown reason Picroft doesn't use the default devices and sets hardware index parameters in various conf files.
Usually an app uses default unless deliberately overridden, and for some reason Picroft does it the other way round.
The setup utility has probably set those to whatever you selected during initial setup, so, to avoid getting lost with Picroft, Docker or anything else, just test on a fresh flash of Raspbian so you know everything else is default.

You can run through that once on a fresh Raspbian and it will all become apparent, and then you can choose to install on the image of your choice.

Also do an aplay -l and arecord -l and get the ALSA index of the playback and capture cards.

If an asound.conf exists we can adopt and add to it, but it will probably need changes to accommodate what it already contains.
Just post the aplay/arecord -l info and the contents of asound.conf, do a sudo mv /etc/asound.conf /etc/asound.conf.old so you have a backup, and I will post you an asound.conf that can just replace the current one.

I could check what's on the respeaker GitHub for the 6 mic, but just send your asound.conf to make sure nothing is overlooked, and really, start with a fresh Raspbian install with drivers, no PulseAudio or Picroft, so we don't get sidetracked.
Then I can leave you in the hands of Mycroft and PulseAudio expert @j1nx

Who does keep telling me he is going to test pulseaudio webrtc EC :slight_smile:

PS once more, the respeaker asound.conf https://github.com/respeaker/seeed-voicecard/blob/master/asound_6mic.conf is confusing, as why it's mapping 8 channels like that, dunno?

@StuartIanNaylor

I did do a fresh install, while I was waiting for you to answer. No pulseaudio installed, and I was able to get it working.

To be able to hear the playback, I needed to run the ./ec -i plughw:0 -o plughw:0 -d 200 command again, as it had been stopped when the recording ended.

I can now see feedback in this terminal, i.e.:
default pipe size: 65536
new pipe size: 8192
skip frames 200
Enable AEC
playback filled 160 bytes zero
No playback, bypass AEC
Enable AEC
playback filled 160 bytes zero
No playback, bypass AEC
Enable AEC
playback filled 480 bytes zero
No playback, bypass AEC
Enable AEC
playback filled 160 bytes zero
No playback, bypass AEC

and what's being played back appears to be at a lesser volume, but not totally removed.

Something I did differently this time was, when compiling speexdsp, I added ./configure --libdir=/usr/lib/gcc/arm-linux-gnueabihf/8. Before, I had left it out (just ./configure). Is this the correct directory?

I guess PulseAudio could be part of the problem. Now that I "know" what I am doing, I'll try again with a Picroft image and follow the same procedure. But I have been working on this for 5 hours now; I need a break. I'll try it out later and get back to you.

Thanks again for the help.

Stephen

Just /usr/lib/arm-linux-gnueabihf; it's just the arm-hf location for libs, as Arm is multi-arch in terms of 32/64-bit.
I don't have a Pi powered up and am doing this all from memory, which isn't great.

Yeah, it doesn't get rid of it totally, but it attenuates echo so that your voice is the predominant sound source rather than the other way round.
It's why I have asked @j1nx about WebRTC EC, as it totally gets rid of echo, but totally garbles mic input.

To keep recording and stop EC from stopping, we sudo modprobe snd-aloop.
If you aplay -l you will then see we have an aloop card.
From ec you just arecord -Dec | aplay -Dloopback0 and that keeps ec live too.
It just pipes the ec output into the loopback, that stays recording, and you just access the other side of the loopback.

A loopback just lets you play to a source so you can use that as a normal alsa source with your linux audio settings.

It's actually aloop:<card index>,<input/output>,<subdevice>, with subdevices 0-7 as there are 8 of them.
I forgot which side is input and which is output, but if the card is index 1 you can just omit the subdevice number, and :1,1 would be one side and :1,0 the other.

Also, on latency: I found on a Pi4 and Pi3 that latency was much less than 200 ms; it seemed roughly the distance to the mic via the speed of sound, but I set it to 20.

alsabat --roundtriplatency, if it will work with the respeaker drivers, as it can be a bit choosy and doesn't seem to like the ec.
Test your device's natural latency and then use that with EC.

You can just add snd-aloop to /etc/modules so it loads on boot.
I forgot the options for multiple cards, but we only need one; it would be nice to set the index, though I can not quite remember the location of those options.

Edit /lib/modprobe.d/aliases.conf and add options snd-aloop index=-2; then it will always come after your respeaker card.
For multiple aloop cards just add enable=1,1,1,1,1,1,1 before the index, with a 1 for each card, up to 16.
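Putting the two boot-time bits together, the files would look something like this (paths as given above):

```
# /etc/modules — load the loopback driver on boot
snd-aloop

# /lib/modprobe.d/aliases.conf — negative index keeps aloop after real cards
options snd-aloop index=-2
```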

In the Mycroft or any start-up script, just have the ./ec cli command with an & on the end.
Same with the redirect of ec to the loopback, and change asound.conf so the default capture is the loopback rather than eco.
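If editing the start-up scripts gets fiddly, a systemd unit could do the same job — this is a hypothetical sketch, and the ec path and device arguments are assumptions taken from the commands earlier in the thread:

```
# /etc/systemd/system/ec.service — hypothetical; adjust path and devices
[Unit]
Description=SpeexDSP echo cancellation (voice-engine/ec)

[Service]
WorkingDirectory=/home/pi/ec
ExecStart=/home/pi/ec/ec -i plughw:0 -o plughw:0 -d 20
Restart=always

[Install]
WantedBy=multi-user.target
```

The arecord | aplay redirect into the loopback could be a second unit of the same shape.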

I think this is what ec_hw is supposed to do, so you just don't have to do the aloop yourself.
But without a hardware loopback I never tried it; with software it's much the same.
I'm confused by the respeaker asound.conf, as you will see it's mapping both sides of the loopback to both playback and capture!? Dunno.

prob with your 6 mic you might want to add something like:

pcm.cap {
    type plug
    slave {
        pcm "my_card"
        channels 6
    }
    route_policy sum
}

as that will sum channels into a mono stream.

:blush: Almost thereā€¦ :crazy_face:


@StuartIanNaylor @j1nx

Looks like I cracked it!!!

Now I am able to see input coming into the EC program when running PulseAudio.

Hereā€™s the trick:

pacmd load-module module-pipe-sink sink_name=ec.sink format=s16 rate=16000 channels=1 file=/tmp/ec.input
pacmd load-module module-pipe-source source_name=ec.source format=s16 rate=16000 channels=2 file=/tmp/ec.output
pacmd set-default-sink ec.sink
pacmd set-default-source ec.source

and then start echo cancellation. It's getting late where I am, so I can't have the music too loud. But I have placed the speakers right on the mic, and I can still trigger the wake word. So, it looks like it's working! Let me see if I can replicate the results tomorrow, and configure it so that this is enabled by default.
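To make those pacmd lines permanent, the same modules can go in PulseAudio's default.pa (e.g. /etc/pulse/default.pa — the arguments below are copied straight from the pacmd commands above):

```
# load the EC pipe sink/source at PulseAudio startup
load-module module-pipe-sink sink_name=ec.sink format=s16 rate=16000 channels=1 file=/tmp/ec.input
load-module module-pipe-source source_name=ec.source format=s16 rate=16000 channels=2 file=/tmp/ec.output
set-default-sink ec.sink
set-default-source ec.source
```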

Regards,

Stephen


Good; it works quite well for barge-in, where voice would otherwise be swamped by echo.

It's Pi3A+ and above only, but your Pi4 obviously does the trick.
From your tests you will notice EC only kicks in when media is playing, so it works really well: it has quite a high load, but only runs during the generally low load of audio media playback.

What you can do with frameworks that mix and match ALSA & PulseAudio is use pulseaudio-alsa, or the ALSA pulse plugin.
Then even aplay and other ALSA-only apps will output through PulseAudio and use EC.

https://wiki.archlinux.org/index.php/PulseAudio#Expose_PulseAudio_sources,_sinks_and_mixers_to_ALSA

64bit Raspbian lite should be out soon and will gain quite a bit of performance when compiles catch up.

I think EC produces slightly better results on the Pi4, so the extra oomph must help, and it will better cope with load without introducing latency.
You don't need noise suppression, as if it were implemented it could be done as part of audio>MFCC, with no load induced other than MFCC creation.
Same with VAD, but Sonopy is actually pretty limited.

For open source, all the edge voice AI devices I know are heavily implemented on Arm.

There is a much more comprehensive MFCC lib whose code is Julia and likely extremely fast, but it would be very interesting if a C guru took the Julia code and implemented the FFT routines with https://projectne10.github.io/Ne10/

In JuliaDSP's MFCC.jl it's called SAD (Speech Activity Detection), not VAD (same thing), and it is just sad we don't have it employed.

Linto.ai have a great HMG (Hotword Model Generator) that allows you to pick MFCC settings and profiles and saves that as a JSON profile for use with their apps.

https://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/ is extremely cutting edge, which is what JuliaDSP tries to, and does, encapsulate.

PS if you are installing Mycroft, my preferred way is mycroft-core on vanilla Raspbian, as it makes a much cleaner audio implementation than Picroft.

I keep asking about WebRTC EC :slight_smile:

Due to it being more complex, it is more permissive of clock drift between input and output, which means as a waveform subtraction tool it's much more flexible than SpeexDSP.

I got the same from the Pulse dev community and just stopped harassing for a reply, but I'll lay down a challenge: can anyone provide 2 PulseAudio recordings of an echo situation with PA_EC off and on?

I have wondered, due to it being a much more complex algorithm than Speex, whether the Pi simply can not do the load in the time for the FFT routines to return valid results.

It doesn't matter, as whether with ALSA or PulseAudio, as @Stephen_O_Sullivan showed, you can get the Speex version to run, and the most important show-stopping DSP routine works with at least one version.
The permissive clock drift synchronisation that PA_EC supposedly provides could have many benefits, especially across different input/output hardware and RTP.

@JarbasAl I know this is a big big ask but your programming skills are extremely high that…

It is just a very interesting MFCC lib that far outstretches Sonopy in scope; some of the feature extraction and diarisation for ASR model switching could be used to switch not just language, but regional, gender and age models that provide greater accuracy.
Also, on the huge load hit of VAD & MFCC: because of the similarity of the FFT routine on the same input stream, I have had a hunch for a while that rather than 2 separate processes, one can be just a byproduct of the other, and it's possible to almost halve the load if the function is combined into a single lib.

You are our only hope obi @JarbasAl :slight_smile:

Also the Linto-desktop-tools for KWS is just a great tool for model creation.

If not adoption, then maybe emulate its layout and finish some of its rough edges, but the ability to evaluate and test your model on completion, and to test by playing the false positive & negative wavs, is extremely important when assessing your dataset.
I made the presumption the Google command set was actually validated and good, and it couldn't be further from the truth, with up to almost 10% of the dataset being cut and padded totally wrong.
That massively skews model accuracy.

Also, Linto seem to have copied much of what Mycroft have done, and even employ Sonopy for MFCC, as they have dropped their Arm-only Neon version for cross-platform support.


They have adopted TensorFlow 2 also, via Keras, and their accuracy results are much higher than Precise, or at least seem to be.

Maybe it's time to update Precise, maybe even think about employing PyTorch, which follows a much more Python-logic-orientated operation than the heavy reflection of C++ in Keras.
Also it's worth thinking about alternatives to a GRU, as even for Cortex-M pretty much parallel results can be expected on Cortex-A.

GRU was pretty much cutting edge but seems to have been surpassed by 2 options: ds_cnn for greater accuracy, or CRNN for performance, as ops are more than halved.
As said, even though based on MCUs, https://github.com/ARM-software/ML-KWS-for-MCU/blob/master/Deployment/Quant_guide.md has quite a bit of info on adopting the quantising and batch normalization layers of the more cutting edge models.

Also, even if only for LED bling, the DOA can be used, as in
https://github.com/voice-engine/voice-engine/tree/master/voice_engine with the various array types, such as https://github.com/voice-engine/voice-engine/blob/master/voice_engine/doa_respeaker_2mic_hat.py
There are Python versions of the SpeexDSP EC also: https://github.com/voice-engine/voice-engine/blob/master/voice_engine/ec.py