For the last few days I have been thinking about a “Virtual Device” system for Mycroft devices.
The idea: ESP32s connect over WiFi to a single Picroft (running as a server) and send it the recorded audio data; the Picroft processes the audio and sends back the result (TTS, music, …), which the microcontroller then outputs.
Each microcontroller would be a virtual device, treated like a normal device that I can add to my account.
Since I am familiar with programming, I would do it myself.
What I need is information on where to start and suggestions for improvement.
You should have a look at the Mycroft Messagebus docs - while the functionality you want is not there (yet), it nevertheless gives you an idea of how to hook up the ESP devices with Mycroft.
I’ve been considering something like this myself.
Easiest would be to create a separate Mycroft client (running in parallel with mycroft-speech-client) that hosts a simple socket server which the ESP32 can connect to and send wav-data to.
I created this example a long time ago: https://github.com/forslund/mycroft-wave-client
This one reads from a file but the same architecture can be used to read from a socket.
If you can get the ESP32 to record audio and send using a standard socket I can help with the mycroft side if you like.
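A minimal sketch of that socket side in Python (the port number and the close-to-end-the-recording convention are my own placeholders, not anything from mycroft-wave-client): it accepts one ESP32 connection, collects the raw wav bytes, and returns them so a client could hand them to the STT pipeline much like the wave client reads from a file.

```python
import socket

HOST, PORT = "0.0.0.0", 5005  # hypothetical port for the audio socket
CHUNK = 4096

def receive_wav(conn):
    """Collect raw bytes from one device connection until it closes."""
    buf = bytearray()
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        buf.extend(data)
    return bytes(buf)

def serve_once():
    """Accept a single ESP32 connection and return its recorded audio."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, addr = srv.accept()
        with conn:
            audio = receive_wav(conn)
        # hand `audio` to the STT pipeline here, e.g. by writing it to
        # a wav file the way mycroft-wave-client reads one from disk
        return audio
```

On the ESP32 side this only requires opening a plain TCP socket, writing the recorded samples, and closing the connection.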
I vote the ESP32 device should be called μCroft (microcroft/mucroft)
Totally calling those sherlocks
I’ve started looking for microphones for my small devices. Do you have any recommendations? I’ve been looking at https://www.electrokit.com/uploads/productfile/41016/adafruit-agc-electret-microphone-amplifier-max9814.pdf
Okay, I will PM you when I am ready.
Also voting for that name
I’m definitely not as far into this as most here, apologies if my thoughts don’t apply to this… but I had pondered if this would work as a way to do multi-person voice recognition?
For example you run three μCrofts… one to recognize your voice, one for another person (in my example it would be my wife), and the 3rd for all others… all reporting back to a main picroft… with unique accounts to allow different actions…
The idea is that every uCroft works as a virtual device so like a normal picroft.
Why wouldn’t multiple voices be possible?
From what I have read, I thought multiple voice recognition wasn’t possible on the standard setup? Sorry, like I said, I’m just getting started looking into Mycroft and haven’t pulled the trigger on setting up a device yet.
Hi there @Senkrad, unfortunately this isn’t possible at the moment, but multiple speaker recognition is something we would really like to do longer term.
This would require a few technical pieces to be in place:
We would need a way for people to record their voices - voice samples - so that Mycroft could then compare the recorded sample with someone speaking to do a match.
We would need a Wake Word listener that compared the recording of a Wake Word with the voice sample on file to determine “who” was speaking.
We would also need a way to handle the case where the voice could not be matched to a sample on record.
So, it’s doable, but requires a fair bit of work.
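To make the matching step concrete, here is a toy sketch (entirely my own illustration, not anything Mycroft ships): it assumes each voice sample has already been turned into a numeric embedding by some speaker model, which is the genuinely hard part, and just shows the compare-and-fall-through logic, including the unmatched case.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(sample, enrolled, threshold=0.8):
    """Return the best-matching enrolled speaker, or None if nobody
    clears the threshold (the 'could not be matched' case above)."""
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine(sample, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

The threshold is what decides between “this is a known speaker” and “treat as unknown”; tuning it is where most of the real-world difficulty would live.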
I’m really interested in such a development. Personally, I have a similar project based on a Nabaztag as a remote microphone/speaker.
Please, don’t forget to bump here for any progress.
Hi, I’m glad you like the idea.
I have to say that I haven’t started it yet. But I will announce progress as soon as possible. Fortunately there is more time after Christmas ^^
I still haven’t worked on converting the Nabaztag or the Karotz. But I already have a new promising gadget: a (hackable) smartwatch. It is a TTGO T-Watch 2020, designed around an ESP32. This watch has WiFi, a speaker, and a microphone.
Coupled with Mycroft, and a good watchface, we can build the effective comlink for KITT from Knight Rider.
Of course, this device is certainly not able to run a TTS or an STT itself. So it really needs a remote companion to do it. I certainly have to look at HiveMind for a possible companion. But for now, I think it does not yet exist (I imagine an endpoint receiving sound input and relaying sound output).
Secondary HiveMind person here! If I had more time (or money) there’d be a suitable HiveMind protocol available by now. I’ve already got it laid out… and I’ve barely written anything, because people keep slapping things on the message bus and it’s starting to feel almost futile.
Doesn’t interfere with the current messaging protocol in any way, but it offers puny devices an (internally) minified binary protocol for exchanging instructions, status, and so forth.
This will be done sometime between next week and never, at the rate I’m going with it.
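Not the actual wire format, obviously, since nothing is published yet, but a minified binary frame for puny devices could be as small as an opcode plus a length-prefixed payload. The opcodes and layout here are my own invention, purely to illustrate the idea:

```python
import struct

# Hypothetical frame layout (my own sketch, not the real HiveMind
# protocol): 1-byte opcode, 2-byte big-endian payload length, payload.
OP_STATUS = 0x01       # device reports its state
OP_INSTRUCTION = 0x02  # hub sends an instruction

def pack_frame(opcode, payload=b""):
    """Serialize one frame for the wire."""
    return struct.pack("!BH", opcode, len(payload)) + payload

def unpack_frame(frame):
    """Parse a frame back into (opcode, payload)."""
    opcode, length = struct.unpack("!BH", frame[:3])
    return opcode, frame[3:3 + length]
```

Three bytes of overhead per message is the kind of budget an ESP32 can live with, versus JSON on the message bus.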
On performance, on a watch, once the protocol is made suitable:
There’s not a lot Mycroft or HiveMind can do about the size or nature of audio output by a particular TTS system. If it’s realtime TTS, your watch will need something that can receive the audio stream; if it’s (as is currently much more common) pre-“recorded” and played back, you’ll get a regular audio file of indeterminate size (but usually fairly small) and you’ll need a playback mechanism.
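For the pre-“recorded” case, the hub-side half of that playback mechanism can be trivial: stream the rendered audio file to the device in fixed-size chunks it can buffer. A sketch, with the chunk size as an assumed placeholder:

```python
CHUNK = 1024  # small chunks so a constrained device can buffer them

def stream_file(path, conn):
    """Send a pre-rendered TTS audio file over a socket in fixed
    chunks; returns the total number of bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            conn.sendall(chunk)
            sent += len(chunk)
    return sent
```

The watch end then only needs to feed received chunks into its audio driver, which keeps the device-side code small.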
The biggest problem I foresee on such a small device is a wake word listener. Talk about a RAM problem. Of course, being a watch, there might be a good way to make it button-based; Mycroft can already do that, and HiveMind nodes can do it, too, so it’s just a matter of implementing the “button” at your end and figuring out how to make it available on your “home” screen.