Money for hardware aside: How good can a German self-hosted instance be?

Hi, I’m currently running Rhasspy in my homelab, and while it works impressively well for something you can deploy on a Raspberry Pi, the performance just isn’t where I want it to be (I’m spoiled by Alexa). I know I could probably dial in the wakeword sensitivity just right, tweak a few commands so they’re easier to tell apart, and so on, but I’d prefer to adjust my assistant to the humans using it, not vice versa. I also want to run it completely offline; for that purpose I’d buy a used 2080 Ti or similar. I want to be able to set a custom wakeword, have on-point wakeword detection (close to no false positives or negatives), and pipe commands into NodeRED.
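For context, the NodeRED part is the easy bit: Rhasspy speaks the Hermes protocol over MQTT and publishes recognized intents on `hermes/intent/<intentName>`, so Node-RED can pick them up with an MQTT-in node directly, or you can bridge them over HTTP. A rough sketch of such a bridge (assuming paho-mqtt 1.x, Rhasspy’s default internal broker on port 12183, and a hypothetical `/rhasspy-intent` HTTP-in endpoint in Node-RED):

```python
# Hypothetical bridge: subscribe to Rhasspy's Hermes MQTT intent topics
# and forward each recognized intent to a Node-RED HTTP-in endpoint.
# Broker host/port and the /rhasspy-intent path are assumptions.
import json

import paho.mqtt.client as mqtt  # assumes paho-mqtt 1.x callback API
import requests

NODE_RED_URL = "http://localhost:1880/rhasspy-intent"  # hypothetical HTTP-in node

def on_connect(client, userdata, flags, rc):
    # Rhasspy publishes recognized intents on hermes/intent/<intentName>
    client.subscribe("hermes/intent/#")

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    # Forward intent name and slots to Node-RED for flow handling
    requests.post(NODE_RED_URL, json={
        "intent": payload["intent"]["intentName"],
        "slots": payload.get("slots", []),
    })

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 12183)  # Rhasspy's internal broker default port
client.loop_forever()
```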

Does anyone run a German offline instance on high-end hardware and can tell me how well it works for them? Thanks in advance!


Hey there, welcome to the forums!

The big limitation for completely self-hosted instances has been the speech recognition (speech-to-text, or STT). So far the systems we’ve evaluated haven’t been accurate and fast enough to run on a Raspberry Pi or similar.

If you were happy to run some decent hardware locally for this, that would be where I’d focus first. I haven’t tested it myself yet, but I’ve been hearing great things about Whisper from OpenAI. Coqui is another one to explore (it was started by former members of Mozilla’s voice team).
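If you want to kick the tires, trying Whisper on a German recording only takes a few lines (assuming the openai-whisper package; the model choice and file name here are just examples):

```python
# Quick Whisper test on a German recording
# (pip install openai-whisper; requires ffmpeg on the system)
import whisper

# "medium" is a compromise; "large" is more accurate but needs
# a GPU with enough VRAM to be practical.
model = whisper.load_model("medium")
result = model.transcribe("kommando.wav", language="de")
print(result["text"])
```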

The internal dev team at Mycroft is actually looking at this problem right now. We ideally want to make it possible for any Mark II (which is a Pi 4) to be able to run STT locally. Most importantly, this needs to be fast and accurate enough for a voice assistant.

Interested to hear if others have a more direct response to your question though.

As @gez-mycroft said, the STT is the critical part. Back in January I evaluated all the free German STT models here: german-stt-evaluation | Evaluation of STT models for german language
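If you want to reproduce such a comparison yourself, the usual metric is the word error rate (WER), which e.g. the jiwer package computes directly. A tiny sketch with made-up transcripts:

```python
# Word error rate (WER) comparison of a reference transcript against
# an STT hypothesis (pip install jiwer). Strings are invented examples.
from jiwer import wer

reference = "schalte das licht im wohnzimmer ein"
hypothesis = "schalte das licht im wohnzimmer an"

# 1 substituted word out of 6 -> WER of about 0.17
print(f"WER: {wer(reference, hypothesis):.2f}")
```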

OpenAI’s Whisper is still missing from my comparison, though. It looks very promising if you have the technical resources to run the large model…


We ideally want to make it possible for any Mark II (which is a Pi 4) to be able to run STT locally.

That would make the technology far more accessible to the general public. Great!


I don’t, at least not yet; so far I’ve only been thinking about dropping the cash on a used 2080 Ti. I’m in the fortunate financial situation of being able to afford this right now, but I’d like to know whether someone with more experience thinks that’s actually a good idea.

Thank you for the work on testing the available models btw!

Personally I wouldn’t drop the cash just to run STT. Not sure what they’re going for second-hand, but the other cost factor is needing a machine running 24/7 just for local STT.

This is where Nvidia’s Jetson Orin SBCs come into play: GPU horsepower at less than 30 W average power consumption. The current models are still a bit pricey and availability is limited, though.

I’m serving TTS (the Coqui Docker image with the big Thorsten model) on a 2080 (12 GB) on my work computer (RTF ~0.2). I coded core so it automatically checks whether the Docker container is available and uses the fallback otherwise.
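Roughly, the availability check works like this (a simplified sketch, assuming Coqui’s tts-server HTTP API on its default port 5002; the host name and `local_fallback_tts()` are placeholders for whatever core would otherwise use):

```python
# Prefer the GPU-backed Coqui TTS container, fall back if unreachable.
import requests

COQUI_URL = "http://gpu-box:5002/api/tts"  # hypothetical host running the container

def synthesize(text: str) -> bytes:
    """Return WAV audio, preferring the GPU-backed Coqui server."""
    try:
        # Coqui's tts-server answers GET /api/tts?text=... with WAV audio
        resp = requests.get(COQUI_URL, params={"text": text}, timeout=2)
        resp.raise_for_status()
        return resp.content
    except requests.RequestException:
        # Container unreachable (e.g. work computer is off): use the fallback
        return local_fallback_tts(text)

def local_fallback_tts(text: str) -> bytes:
    # Placeholder for the local fallback TTS engine
    raise NotImplementedError
```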