Customizability of Mycroft for a university project

Hello guys,

We are a group of three IT students working on voice assistants for an ethics seminar, with the goal of demonstrating malicious misuse of AI. We stumbled upon Mycroft and find it quite interesting: since it is open source, it should be adaptable to our needs.
However, since we don’t have much experience working with voice assistants, we’d have a few initial questions before starting to work on our project.

  1. Is it possible to adjust the Wake Word Listener (either Precise or PocketSphinx) so that it listens constantly, without requiring any wake word at all?
  2. Does the intent parser recognize which person is currently talking to the speaker? Or can it recognize the names of certain household members?

You’d really help us by answering these. When we ask “Is it possible […] ?”, we also mean feasible within the scope of a university seminar: we can only work on this for three months, with limited time.
Thanks for your help :slightly_smiling_face:


For #2, no. This would be an interesting upgrade! There are several speaker ID tools out there, but the trick is finding one that can run quickly enough, on-device (hopefully), in parallel with the STT function, so the result can be used as part of intent handling.
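To illustrate the timing constraint above: speaker ID would have to run concurrently with STT so that the speaker label is already available when intent parsing starts. A minimal sketch using Python's stdlib threading, where `transcribe` and `identify_speaker` are hypothetical stand-ins for a real STT engine and speaker-ID model:

```python
import threading

# Hypothetical stand-ins for a real STT engine and a speaker-ID model.
def transcribe(audio):
    return "turn on the lights"

def identify_speaker(audio):
    return "alice"

def process_utterance(audio):
    """Run STT and speaker ID in parallel; intent parsing waits for both."""
    results = {}

    def run(name, fn):
        results[name] = fn(audio)

    threads = [
        threading.Thread(target=run, args=("text", transcribe)),
        threading.Thread(target=run, args=("speaker", identify_speaker)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # both results must be ready before intent handling starts
    return results

print(process_utterance(b"raw audio bytes"))
```

The slower of the two models sets the latency of the whole step, which is why an on-device speaker-ID model has to be at least as fast as the STT pass to avoid adding delay.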

Good luck.

I’m pretty sure @JarbasAl has some shelved work someplace involving voice recognition.

I supervised engineering students working with Mycroft in a one-year project. If you can program (in Python in particular), it is comparatively easy to work with, and you should get off to a fast start.

Concerning #1: if you have a wake word listener, the device already listens permanently; otherwise it could not know when you say the wake word. You can also adjust how long it keeps listening to the user after activation. But there is a limit to what makes sense, because at some point you need to convert the audio recording into text (which is the basis for intent classification).
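The difference between the two modes can be sketched as a loop: normally every audio chunk only passes through wake-word detection, and STT runs just on the chunks that follow a detection. An "always listening" variant forwards every chunk straight to STT. The sketch below uses made-up stand-in functions rather than Mycroft's real listener code:

```python
# Hypothetical stand-ins: a real setup would pull audio from a microphone
# and call a real STT engine (Mycroft normally gates this step behind
# Precise or PocketSphinx wake-word detection).
def record_chunk(i):
    return f"audio-chunk-{i}"

def speech_to_text(chunk):
    return f"transcript of {chunk}"

def always_listen(n_chunks=3):
    """Forward every audio chunk straight to STT, with no wake-word gate."""
    transcripts = []
    for i in range(n_chunks):
        chunk = record_chunk(i)  # normally: wait for a wake word here
        transcripts.append(speech_to_text(chunk))
    return transcripts

for line in always_listen():
    print(line)
```

This also shows where the practical limit comes from: with no gate, every chunk goes through STT, so the cost of transcription is paid continuously instead of only after an activation.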

For creating ethics issues intentionally, I suggest you use a custom skill; then you have full control over what happens after intent recognition. You could use spaCy (https://spacy.io/) to extract things from user utterances that Mycroft does not currently extract.
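The extraction step inside such a skill's handler might look like the following. spaCy's `nlp(text).ents` would do this properly with a trained model; the sketch below swaps in a crude stdlib regex so it stays dependency-free, and all function names are made up for illustration:

```python
import re

# Toy stand-in for spaCy entity extraction: pull capitalized tokens out
# of an utterance as candidate "names". A real skill would run spaCy's
# nlp(text).ents here instead of this regex.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+\b")

def extract_names(utterance):
    """Return capitalized words, skipping the sentence-initial word."""
    words = utterance.split()
    candidates = " ".join(words[1:])  # crude: ignore the first word
    return NAME_PATTERN.findall(candidates)

def handle_utterance(utterance):
    """What a custom skill could do after intent recognition."""
    names = extract_names(utterance)
    return {"utterance": utterance, "names": names}

print(handle_utterance("Tell Alice that Bob called"))
```

For the ethics angle, the point is that once you own the skill, everything the user says within that intent is yours to process, log, or forward, which is exactly the kind of misuse a seminar demo could highlight.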