Mycroft Community Forum

Google Coral Edge TPU

Has anyone tested how much of a boost a Google Coral Edge TPU gives a Pi 4 running Picroft?
DeepSpeech would benefit, but FANN as used in Padatious wouldn’t?!?
Surprised a tensorflow-lite neural net hasn’t been preferred over FANN.
Precise is the same, I guess.

I name-drop the Google product as it will likely get a lot of support; I just wondered if someone had tried it.
I was also wondering, given the diverse load that generally runs on Mycroft, whether each mic input could trigger its own instance.
That would really throw a cat amongst the Alexa/Google pigeons… :slight_smile:

No need for a TPU/GPU: “DeepSpeech v0.6 with TensorFlow Lite runs faster than real time on a single core of a Raspberry Pi 4”.

Inference performance is not the problem here; word error rate (WER) is.
Even though the current DeepSpeech model has a WER of 7.5% in tests, this is still too high for real-world applications - unless you want to repeat your phrases multiple times until you trigger the correct intent…
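To make the WER point concrete, here is a minimal sketch of how word error rate is usually computed (word-level edit distance divided by the number of reference words), plus a rough estimate of how often a whole phrase comes through clean. The function names are made up for illustration, and the phrase estimate assumes every word fails independently, which real speech does not guarantee.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def phrase_success_probability(wer: float, n_words: int) -> float:
    """Chance that all n_words are right, ASSUMING independent per-word errors."""
    return (1.0 - wer) ** n_words


print(word_error_rate("turn on the kitchen light", "turn on the chicken light"))
print(round(phrase_success_probability(0.075, 5), 3))
```

Under that independence assumption, a 7.5% WER leaves only about a two-in-three chance that a five-word command is transcribed without any mistake, which is roughly why you end up repeating yourself.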

Is that with training or just the pre-trained models?

I should just use my USB headset and give it a go, but I’m waiting for a mic and speaker.
I have had a look at a few vids and often there is quite a delay; maybe it was a Pi 3?
I was presuming the pipeline is wake word -> STT -> intent parser -> TTS, and that apart from network latency you could accelerate the whole process.

Are there any vids showing it in action on a Pi 4?

Another quote from the article I have linked in my previous post: “It achieves a 7.5% word error rate on the LibriSpeech test clean benchmark” - this relates to the pre-trained model for DeepSpeech 0.6. LibriSpeech is not “real world talk” but based on public domain audio books.

The overall delay you observe on the Pi 3 has several causes, one being the network latency for STT (with a cloud service). Another is that this is currently not a streaming STT, so Mycroft waits for silence to detect the end of the spoken phrase. In noisy environments silence may never be detected, so recording is cut off after the maximum recording time (10 seconds, if I remember correctly). So in the worst case it takes up to 10 seconds after the wake word before the actual STT is started.
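A rough sketch of that "wait for silence or give up at the cap" logic, to show why noise pushes you to the worst case. This is not Mycroft's actual listener code: the function name, the energy threshold, the frame length and the one second of required silence are all assumptions for illustration; only the 10-second cap comes from the post above (and that was from memory).

```python
from typing import Iterable, Tuple


def record_until_endpoint(frame_energies: Iterable[float],
                          frame_seconds: float = 0.1,      # assumed frame length
                          silence_threshold: float = 0.02,  # assumed energy floor
                          silence_seconds: float = 1.0,     # assumed quiet time needed
                          max_seconds: float = 10.0) -> Tuple[float, str]:
    """Return (seconds recorded, reason the recording stopped)."""
    needed_quiet = round(silence_seconds / frame_seconds)
    max_frames = round(max_seconds / frame_seconds)
    quiet_run = 0
    heard_speech = False
    n = 0
    for n, energy in enumerate(frame_energies, start=1):
        if energy >= silence_threshold:
            heard_speech = True
            quiet_run = 0          # any loud frame resets the silence counter
        else:
            quiet_run += 1
        if heard_speech and quiet_run >= needed_quiet:
            return n * frame_seconds, "silence"
        if n >= max_frames:
            return n * frame_seconds, "timeout"
    return n * frame_seconds, "stream ended"
```

With these assumed settings, a 2-second phrase followed by quiet stops at about 3 seconds, while the same phrase over constant background noise above the threshold runs to the full 10-second cap before STT even starts.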

For DeepSpeech there is a streaming API but this only shifts the problem of “phrase ending detection” to the DeepSpeech side…

And yes, a beefier CPU like the Pi4 will speed up the intent parsing.

I was looking at the roadmap and just getting up to speed with Mycroft.

Being a noob, my assumptions are relatively blind, but it would seem there is a general shift to neural network technologies for key Mycroft components.
I was just interested in whether anyone had tried tensorflow on the Pi now that it has an alternative to GPU-based offerings.

" Identify hardware targets and create recommendations list

As rapidly evolving technologies, the requirements to run full STT and TTS services is a moving target. But recognizing the demands helps shape the system, too, so is valuable to look at early. Examine options such as:

  • Hosted STT/TTS hardware recommendations and limitations, e.g. “1 GTX 1080 can handle X Mycroft units” or “1 FPGA (brand and type need to be specced) can handle Y Mycroft units”
  • Look at lower-power options such as STT/TTS passthru with account randomization"

It was just curiosity, but I did wonder about “1 Coral Edge TPU can handle X Mycroft units”, with the Pi 4 being able to stream to multiple dumb WiFi speaker/mics.
They are still pretty pricey compared to the cost of a Pi 4, but $75 gives 4 TOPS, and being USB the Pi 4 might be able to handle two of them.
But my thoughts were along the lines of: could a Pi 4 & TPU handle 4 or more Mycroft units? It’s likely they will drop in cost, and the overall price is not far out of line with multiple Google/Amazon offerings.

So much is in the neural network domain that even parsing text semantics from webpages could be accelerated greatly.
Like this article, as a semantic-aware webcrawler would be pretty damn awesome :slight_smile:

With the roadmap and the current neural net libs chosen, as a noob I am scratching my head as to why not tensorcore-light for all?

Also, has anyone played with GPU/TPU acceleration? I’m just extremely curious how much load the TPU can handle and what the resultant load on the Pi 4 is.
You can still run CPU-based tensorcore-light, but add a TPU and you instantly gain much acceleration…

There are quite a few on the market now, but the Coral Edge just seems to be getting the most Pi focus.

From what I read here in the Mozilla forums, DeepSpeech currently will not run on the Coral Edge TPU, as it does not support the full TensorFlow Lite function set and DeepSpeech’s TfLite model requires some functions that are not supported.

Regarding TTS inference performance: Mozilla TTS Tacotron + GriffinLim vocoder, which has rather low quality, is 2-3x realtime on a GTX1080Ti (6 seconds of audio take 2-3 seconds for inference). Tacotron2 and/or higher quality vocoders like WaveRNN or WaveGlow are slower…
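For anyone comparing numbers like these, the arithmetic behind "2-3x realtime" is just audio length divided by processing time. A small sketch follows (the function names are mine, not from any TTS library); the real-time factor (RTF) convention used here is processing time over audio duration, so below 1.0 means faster than real time.

```python
def speedup_over_realtime(audio_seconds: float, inference_seconds: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    return audio_seconds / inference_seconds


def real_time_factor(audio_seconds: float, inference_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return inference_seconds / audio_seconds


# The figures quoted above: 6 seconds of audio taking 2 to 3 seconds to synthesize.
print(speedup_over_realtime(6.0, 3.0), speedup_over_realtime(6.0, 2.0))  # 2.0 3.0
print(real_time_factor(6.0, 3.0))                                         # 0.5
```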

Another issue with the smaller edge computing devices like the Coral TPU or Jetson Nano is the rather small amount of available RAM, which limits the size of the model that can be loaded. These devices can therefore usually only run one model at a time, so there is no parallel STT and TTS. (Loading a DeepSpeech model on my Jetson Nano can take up to a minute.)

Dunno, not sure about TfLite, as I think they are awaiting replies.

Dunno, haven’t a clue about RAM; with 2 KB RAM I presumed the models were not TPU based.
Are we talking about the language model? I’m not sure how the 46MB or whatever it is resides.

They seem to use Discourse more frequently, so it’s looking like you’re right.

Coral support is limited. I would not pick one up at this point. Wait for a more generally usable version or updates to current software to take advantage of its capabilities.

As for “N GPUs can handle X Mycroft units”, that’s not a very clear metric either. There are significantly more elements needed to make that a coherent equation.
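To illustrate how many moving parts such an equation has, here is a back-of-envelope sketch. Every input is a placeholder rather than a measurement, and the function name is hypothetical: the estimate needs at least the accelerator's real-time factor for the model, how often each unit is spoken to, how long the utterances are, and how much headroom you keep for bursts.

```python
def units_per_accelerator(rtf: float,
                          utterances_per_hour: float,
                          seconds_per_utterance: float,
                          headroom: float = 0.5) -> float:
    """Average-load estimate only; it ignores two people talking at the same moment.

    rtf      : processing seconds per second of audio (placeholder, must be measured)
    headroom : fraction of the accelerator you allow to be busy on average
    """
    busy_fraction_per_unit = utterances_per_hour * seconds_per_utterance * rtf / 3600.0
    return headroom / busy_fraction_per_unit


# Made-up inputs: RTF 0.5, 30 requests an hour per unit, 4 seconds each.
print(units_per_accelerator(rtf=0.5, utterances_per_hour=30, seconds_per_utterance=4))
```

With those made-up inputs the average-load answer is roughly 30 units, but change any one assumption (model, usage pattern, acceptable latency when requests collide, TTS sharing the same device) and the number moves a lot, which is the point about the metric being unclear.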

Yeah, it seems it works OK with the object identification demo.
It’s my noobness, as I thought tensorcore-lite support would mean it would support tensor-lite?!?

e.g. “1 GTX 1080 can handle X Mycroft units” or “1 FPGA (brand and type need to be specced) can handle Y Mycroft units”

That was from the Mycroft roadmap and only quoted in a similar vein of curiosity, that X or Y of a brand and type could be tested.

I always struggle getting through any ML documentation, as it always seems extremely long-winded and painful.
I was hoping it was going to be purchase and install with tensorflow-lite, which I keep calling tensorcore, but it doesn’t matter as it’s currently a cul-de-sac.
Apols for all the questions, but I’m just getting my bearings and direction.

Lots of marketing, lots of tech, shove into a casing and you have the sausage that is modern “AI”. :slight_smile:

Not so sure about the sausages in the casing; from personal experience it seems to be us outside the case trying to develop.
There is a disconnect in the knowledge hierarchy of ML, but those up at the top, close to the source, get results.
I played around with Unity and their OpenML stuff and the samples worked perfectly, but my usual approach to hacking (what I call programming) was a pointless disaster.
Model collection and hyperparameter tuning seem more like an arcane art than the simple brute force of: it works, or it doesn’t, check the error msg.

TFL and GPU currently seem to be x86_64 only, whereas the Arm/Raspberry version is native_client only.

I have read enough of lissyx’s posts that if they are failing to compile, then I am not even going to bother.

Hello, can I get faster CNN training times by using the Google Coral Dev rather than a PYNQ-Z1? Can I get faster CNN training times with the Google Coral Dev compared to a Jetson Nano? Can anyone who has used it give some advice?

Yes, but many only offer a subset of tensorflow compatibility.
The Coral works with the Google image project, but I haven’t seen another project for it.
They are coming down in price, but how restrictive their subset is still means that unless you know of a working project, prob don’t bother.

“The Coral works with the Google image project but haven’t seen another project for it.”

What do you mean? If I buy it and run custom-built tensorflow CNN code, won’t it work??

Prob not, as they offer a subset of tensorflow lite. And have you seen another project for it?
Or any mention anywhere of another project that supports it?
If you can rewrite using its specifics then yes, but no one seems to.

DeepSpeech might be, now that they are using 1.15, but prob not likely.

Is this better than the Google Coral Dev?

What I mean is: is it more flexible as far as the ML programs I can use are concerned?

No, as they are all very similar, with different compatibility issues that you will have to research yourself.
The Google Coral is as good as any, and you could take the image kit and feed it Mel-Frequency Cepstral Coefficients (MFCCs).
Basically voice images, and the standard image classification should work with that input.
I think Google were/are in the process of improving what’s available; not sure what the state of play is.
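For anyone wondering what a "voice image" looks like in practice, below is a minimal, dependency-free sketch of the textbook MFCC pipeline (window, power spectrum, mel filterbank, log, DCT). It is not taken from the Coral image kit or any Mycroft component; the function names and the sample rate, frame size and filter counts are illustrative assumptions, and the naive DFT is only there to keep it self-contained (a real project would use an FFT library).

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming window followed by a naive O(n^2) DFT; slow but dependency-free."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for f in range(1, n_filters + 1):
        left, centre, right = bins[f - 1], bins[f], bins[f + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one row of n_coeffs coefficients per frame: a small 2-D 'image'."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = power_spectrum(signal[start:start + frame_len])
        # Small constant avoids log(0) when a filter catches no energy.
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in bank]
        # DCT-II decorrelates the log filterbank energies.
        rows.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                         for m, e in enumerate(log_mel))
                     for c in range(n_coeffs)])
    return rows


# A quarter second of a 440 Hz tone at 8 kHz stands in for real speech here.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(2000)]
image = mfcc(tone)
print(len(image), len(image[0]))  # 14 frames x 13 coefficients
```

The resulting grid of numbers is what would be scaled and handed to an image classifier; whether a given accelerator's supported op set actually covers the classifier you pick is a separate question.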

Asus say they are going to ‘support’ it, but whether it’s any better or worse than the Pi offering via the Google image project, dunno.

Just don’t expect to grab DeepSpeech, compile and fly, as DeepSpeech even runs a fork of tensorflow 1.15 and I have no idea what stage support for accelerators such as that is at.

You can try, but I think it’s best to say that actual compatibility and what’s available might be big constraints.

I want one, but I think it would prob be a disappointment in terms of what I can run.

The best overall compat is with the new Nvidia RTX cards, and after that it’s all downhill, with earlier cards often needing earlier versions of tensorflow as performance is badly affected.
My graphics card is a meh GTX780, pretty old now, and I don’t even bother trying to use it.

DeepSpeech prob would benefit from an accelerator if it would work, as on a Pi at least it’s single-threaded.

That’s literally a google coral tpu plus an SBC, so a direct competitor from a different vendor.

Other than a Jetson TX2 or Xavier board from Nvidia, there’s not much in the SBC space that’s viable for anything but custom or very specific ML work yet. If you’re looking to train, get an add-in board for a desktop or go the cloud route. SBCs are inference boards.

Does the NPU that this contains have anything to do with a TPU? Is it faster than the Google Coral Dev/Asus Tinker Edge T?

TPU = Tensor Processing Unit - this can be seen as a GPU that is specialized/optimized for tensor operations (vector and matrix multiplication)

NPU = Neural Processing Unit (based on FPGA) where you can load the model directly onto the processing unit. This may give excellent performance, but the drawback is that programming is quite complicated. Most available models/algorithms are for visual processing (object detection), so this would not be my choice when it comes to speech recognition.

In absolute numbers: Rockchip RK3399pro+NPU is up to 2.4 TOPS, Google Coral TPU is rated up to 4 TOPS.