Mycroft on Nvidia Jetson Nano

Hi,

I was wondering if Mycroft can run on an Nvidia Jetson Nano with CUDA. With 128 CUDA cores and 4GB of RAM running the Linux4Tegra OS, this hardware has the potential to make Mycroft run pretty fast.

What I don’t know is whether there is an easy way to make the Mycroft software use the hundred-plus CUDA cores.

Did the Mycroft team review this option when choosing the SBC for the Mark II and III? And if so, why did it go another way? Just curious.

/Daniel

Hi Daniel,

There’s been plenty of talk about it but I haven’t seen anyone give it a go unfortunately. I’m particularly interested to hear how well they handle running a DeepSpeech STT service.

It would require some work to make use of all the available system resources.

For the Mark II this board would unfortunately be too expensive as there are a range of other costs to consider, particularly a Mic Array for which we’re using the Seeed Mic Array v2. The Mark III however…


I saw Adafruit is working on a similar board: https://blog.adafruit.com/2019/09/02/machine-learning-monday-braincraft-hat-for-raspberry-pi-and-single-board-linux-computers-adafruit-raspberry_pi-tensorflow-machinelearning-tinyml-raspberrypi/ How well could something like this work?


The Nano is a pretty beefy SBC with a small GPU on board, and lots of support. Even then, one community member has tried a bunch of tools on it and hasn’t had great luck getting them to work well. The Adafruit board doesn’t look as capable as the Nano, so probably not very well. It might be useful for wake word spotting?


Bummer. Well, it does seem like there are a fair number of people looking at it, so hopefully there will be a chipset that does the work we want at lower power than a general-purpose device. I’ve never understood what the Matrix chips were or whether they could help either, but I’m assuming people are looking at systems like those too.

Thanks Baconater!

I have a Jetson Nano and can confirm that Mycroft runs on it, but Mycroft does not utilize the GPU/CUDA cores out of the box. The CPU is an ARM Cortex-A57, which is faster than the Cortex-A53 of the RPi3. Overall, Mycroft performs a bit better on the Jetson Nano, and the 4GB of RAM helps performance as well.

One pain point is that the Jetson Nano uses the aarch64/arm64 architecture, for which you sometimes have trouble finding pre-built software packages (apt and PyPI). So expect to build packages yourself from scratch, or to be unable to use some pieces of software at all.

Right now I am testing different STT and TTS systems on the Nano.

For STT I got Kaldi, Zamia (based on Kaldi) and Mozilla DeepSpeech running. While performance is OK for most of them, the quality of the pre-trained models provided is still poor, e.g. a phrase like “turn off the light” sometimes needs five attempts before the system gets it right.

On the TTS side I got Tacotron, Tacotron2 and WaveRNN in different flavors running. Again the pre-trained models differ in quality and there is the trade-off between quality and performance, e.g. the very good sounding WaveRNN models require more than 10 seconds of processing for one second of audio.
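The quality/performance trade-off mentioned above is often expressed as a real-time factor (RTF): processing time divided by audio duration. Here is a minimal sketch using the rough WaveRNN figure from this post; the function name is only illustrative and the numbers are not a benchmark.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    Values above 1.0 mean synthesis is slower than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# "more than 10 seconds of processing for one second of audio"
rtf = real_time_factor(10.0, 1.0)
print(f"RTF = {rtf:.1f}")  # prints "RTF = 10.0"
```

For a voice assistant you would generally want an RTF well below 1.0, so that the reply starts without a noticeable pause.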


Hi Dominik, do you have code on GitHub for running Tacotron on the Jetson Nano? I am new to the Jetson Nano and new to creating my own hacks in general, so any help would be appreciated!

I tried several implementations:

The reason for choosing these was the availability of pre-trained models for inference, as training such models on the Jetson Nano is not feasible (only 4GB RAM, GPU too limited for this purpose).

For performance etc. see also my posts in the Mycroft-Chat channel ~machine-learning


Has anyone tried running the trained TensorFlow .pb through TensorRT? You can run it on the Nano or on a PC with an NVIDIA GPU. It should optimize the model for faster inference and better memory utilization, and ensure it uses all the goodness on the NVIDIA hardware.

https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html

After getting it to work stock, you can speed it up further.
Depending on the precision the model was trained at, you can also try dropping it to fp16 or int8. That could dramatically reduce the model size and memory required and speed up inference, possibly with only a nominal decrease in accuracy. If I get some time in the coming weeks, I’ll see if I can test it out myself.
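To get a feel for the size reduction, here is a back-of-the-envelope sketch. It only counts weight storage (parameters times bytes per value) and ignores activations and runtime overhead. The 47 million parameter count is a made-up example, not a measured figure for any model in this thread.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mib(num_parameters: int, precision: str) -> float:
    """Approximate weight storage in MiB at the given precision."""
    return num_parameters * BYTES_PER_VALUE[precision] / (1024 ** 2)


num_parameters = 47_000_000  # hypothetical model size, for illustration only
for precision in ("fp32", "fp16", "int8"):
    size = weight_size_mib(num_parameters, precision)
    print(f"{precision}: {size:.1f} MiB")
```

Going from fp32 to fp16 halves the weight storage and int8 quarters it. The effect on accuracy has to be measured per model.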

Hi Dominik. I noticed your new release of DeepSpeech here. Is the quality still lacking, or has it improved since last September?
(I’m asking since I’m curious whether to buy a Nano to replace Alexa or not :)
Thank you!

DeepSpeech 0.8.x as a software stack has seen a lot of improvements.
The STT “quality” still depends on a number of factors, like input signal quality (you can run a denoiser like RNNoise before passing the audio to DeepSpeech) and the language you want to use. The English pre-trained models are now at a WER (word error rate) of less than 9% (actually 6-7% if I remember correctly). For other languages the situation is different, e.g. the best German model I know of still has a WER of 15%, which in my opinion is not good enough for the Mycroft use case.
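For anyone unfamiliar with the metric: WER is the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch, assuming simple whitespace tokenization and lowercasing (real evaluation scripts usually normalize text more carefully):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One wrong word out of four reference words.
print(word_error_rate("turn off the light", "turn of the light"))  # prints 0.25
```

Note that on short commands a single misrecognized word already gives a high per-utterance WER and is often enough to break intent matching.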


I’m running Mycroft on my Jetson Nano 2GB. It runs just fine using a USB audio dongle.


Hi @Dominik ,

Could you provide an example excerpt from your mycroft.conf TTS section, to demonstrate how you are sending calls to these custom TTS setups?

What engine are you using? remote_tts.py? Have you had to modify the TTS .py file at all?

Thanks.

For STT, I have Mozilla DeepSpeech 0.9.1 running in Docker using an Nvidia Quadro P620.

Provided the microphone recording is reasonably clear and at the correct sample rate, I’ve found the accuracy with the pre-trained model to be quite good, and that’s with my Australian accent.

It would be nice to get the TTS working internal to my network as well. Currently I use neural voices from AWS Polly. They sound great but probably not great in terms of privacy.

I am using Mozilla TTS server and deepspeech-server for Mozilla Voice-STT (a.k.a. DeepSpeech), both running locally on an Nvidia Xavier-AGX.

My config

  "tts": {
    "pulse_duck": false,
    "module": "mimic2",
    "mimic2": {
      "lang": "de",
      "url": "http://10.0.0.1:5002/api/tts?text="
    }
  },
  "stt": {
    "module": "deepspeech_server",
    "deepspeech_server": {
      "uri": "http://10.0.0.1:8080/stt"
    }
  },

Note: I have customized mimic2_tts.py, replacing the line req_route = self.url + "/synthesize?text=" + sentence with req_route = self.url + sentence. The mycroft-core dev branch also has a new module, mozilla_tts.py.
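With that change, the request URL is just the configured url with the sentence appended. The sketch below shows how such a URL could be composed and fetched outside of Mycroft, for example to test the TTS server on its own. The function names are hypothetical and not part of mycroft-core, and it assumes the server replies with WAV data in the response body. The sentence is URL-encoded explicitly here because the standard library does not do that for you.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_request_url(base_url: str, sentence: str) -> str:
    """Append the URL-encoded sentence to a base URL ending in '?text='."""
    return base_url + quote(sentence)


def fetch_wav(base_url: str, sentence: str, timeout: float = 10.0) -> bytes:
    """Request synthesized audio and return the raw response bytes.

    Assumes the server answers a GET request with WAV data.
    """
    with urlopen(build_request_url(base_url, sentence), timeout=timeout) as response:
        return response.read()


url = build_request_url("http://10.0.0.1:5002/api/tts?text=", "Guten Morgen")
print(url)  # prints "http://10.0.0.1:5002/api/tts?text=Guten%20Morgen"
```

Calling fetch_wav with the same arguments and writing the result to a .wav file is a quick way to check the server before pointing Mycroft at it.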


Oh clever, Mozilla TTS and Mimic2 are both derived from Tacotron/Tacotron2, yes? So of course they’re similar enough to be interoperable. Thanks.

They’re both Tacotron derivatives, yes, but you can basically use that TTS interface with any API that returns a WAV from your call.


Another option for speech-to-text is the new mycroft-stt-plugin-vosk, which can run locally on an RPi3. But that requires the latest dev branch of mycroft-core.
