Individual user identification?

So, while I have started following the Mycroft project with some enthusiasm for what it can do, one thing that has become apparent with many of the artificial-assistant-type devices like Cortana/Siri/Google/Jasper is the lack of identification of the individual user speaking to them.

I wondered whether, as part of the development of this AI, this is a problem that could be solved to give the best user experience.

Having watched a lot of Star Trek recently (working through the Voyager boxed set at the moment), what is interesting is that the ship's computer can identify where people are but doesn't recognise individuals speaking to it most of the time. My better half and I were sat chatting about it (even she thought the whole multi-Mycroft system looked cool), and we were wondering: does each unit perhaps need some kind of enquiry built in to identify the user?

Something like lighting is obviously user-irrelevant, but if you said to a unit "Hey Mycroft, what's my calendar looking like today?", surely it would only look at the default account set up on the device?

I guess we also considered some of the personalisation that comes with it, like the tiny things you find on Facebook and Google that identify you and your interests, such as what the weather is going to be where you normally are that day, or a relationship-type thing where it identifies you: "Hello Dave", "I'm sorry, I can't do that, Dave".

I still have a lot of reading and understanding to do, and maybe I've missed something, but I just wondered if this is something that needs to be included in an AI?

Apparently, this is quite hard to do as it is based on such a limited data set. The Star Trek computer probably also had cameras - as did HAL9000 (he could lip read quite well too!)

Well, I thought of that integration too. I was looking into face recognition for the Raspberry Pi, as I figured a simple webcam mounted on the front of a unit could maybe identify a person to add personalisation.

I guess I was wondering about the what's-possible side, as over time it might be something that can be added in. I'm still looking into camera-based face recognition, though, as it might not be good enough to go down that route.
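For anyone curious, this is roughly the sort of thing I had in mind: a minimal sketch using OpenCV's bundled Haar cascade for detection and its LBPH recogniser for identification. It assumes opencv-contrib-python is installed, and the model file, label mapping, and threshold are just my own illustration, not anything from Mycroft:

```python
# Toy face-identification sketch for a Pi-mounted webcam.
# Assumes opencv-contrib-python is installed (for cv2.face) and that an
# LBPH model has already been trained and saved -- purely illustrative.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("faces.yml")          # model trained on household members
labels = {0: "Dave", 1: "Sam"}        # hypothetical label -> name mapping

cap = cv2.VideoCapture(0)             # the webcam on the front of the unit
ret, frame = cap.read()
cap.release()

if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        label, distance = recognizer.predict(gray[y:y + h, x:x + w])
        # Lower distance = better match; the cut-off is a guess to tune.
        who = labels.get(label, "unknown") if distance < 60 else "unknown"
        print(f"Hello {who}")
```

Whether LBPH is accurate enough in the lighting a unit would actually see is exactly the "might not be good enough" question.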

I thought, as an initial idea, there could maybe be a query of who's asking, a bit like the "Please state the nature of the medical emergency" the holo-doctor on Voyager came out with the first time he was activated.

I think speaker recognition would be the way to go, as otherwise you would have to be looking into the camera.
Siri already has this, I think ( https://www.newscientist.com/article/mg22830423-100-speech-recognition-ai-identifies-you-by-voice-wherever-you-are/ ).

This is definitely a hot topic for me, and a task on the Roadmap I posted recently. I think this could be done using voice alone, no camera needed. I called it "voice printing" on the roadmap. I believe this is achievable using a registration mechanism where individual users would repeat the wake-up word (e.g. "Hey Mycroft") a few times to establish their voice print. Later, when we recognize the wake-up word, the system can check for a near match amongst registered users. Skills could then query the system to see which user they are speaking with (or "unknown" if no match is found).

Because of the consistent vocabulary of the wake-up word, this is a much simpler task than a general voice match that attempts to work from user samples of differing vocabulary. The voice-print matching would obviously have to be a little fuzzy, as no two recordings of even the same phrase will be identical. But I think this could definitely be achieved.
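To make that concrete, here is a toy sketch of the registration-and-match idea. It just averages MFCC features over a few wake-word recordings per user and does a nearest cosine match with a cut-off, returning "unknown" below it. The librosa dependency, the function names, and the 0.75 threshold are my own illustration, not a spec:

```python
# Toy "voice print" sketch: enrol users from a few wake-word recordings,
# then match a new utterance against the registered prints.
import numpy as np
import librosa

def voice_print(wav_paths):
    """Average MFCC vector over several recordings of the wake-up word."""
    feats = []
    for path in wav_paths:
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        feats.append(mfcc.mean(axis=1))   # collapse the time axis
    return np.mean(feats, axis=0)

def identify(sample_path, prints, threshold=0.75):
    """Return the best-matching registered user, or 'unknown'.

    Matching has to be fuzzy: cosine similarity against each stored
    print, with a cut-off so strangers don't match anyone.
    """
    probe = voice_print([sample_path])
    best_user, best_score = "unknown", threshold
    for user, vec in prints.items():
        score = np.dot(probe, vec) / (np.linalg.norm(probe) * np.linalg.norm(vec))
        if score > best_score:
            best_user, best_score = user, score
    return best_user

# Registration: each user repeats "Hey Mycroft" a few times.
prints = {
    "dave": voice_print(["dave1.wav", "dave2.wav", "dave3.wav"]),
    "sam": voice_print(["sam1.wav", "sam2.wav", "sam3.wav"]),
}
print(identify("next_wakeword.wav", prints))  # -> "dave", "sam" or "unknown"
```

A real implementation would want a stronger speaker embedding than mean MFCCs, but the enrol/match/"unknown" shape would stay the same, and that's what skills would query against.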

The low-dollar approach to this would be to allow individual users to have different wake-up words. But I don’t really like that solution and would rather not go down that road.

Anyone interested in tackling this?


Does anyone know if this skill ever succeeded? GitHub - TREE-Ind/skill-voice-recognition: Mycroft AI skill to enable voice recognition using Tensorflow ("skill-voice-recognition"). Or is there anything like it that actually works?

Based on the age of the files in that skill, I'm guessing it didn't work well. The tech has moved forward as well, so it's probably worth an interested party looking into it and finding a way to run it in parallel with STT parsing.
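If anyone does pick this up, a rough sketch of the "run it in parallel" idea: fork the captured audio to both jobs at once, so speaker ID adds no latency to transcription. stt_transcribe() and identify_speaker() below are placeholders standing in for whatever engines you wire up, not real Mycroft APIs:

```python
# Sketch of running speaker ID alongside STT on the same utterance.
# Both functions are stubs, not real Mycroft APIs.
from concurrent.futures import ThreadPoolExecutor

def stt_transcribe(audio: bytes) -> str:
    return "what's my calendar looking like today"   # stub

def identify_speaker(audio: bytes) -> str:
    return "dave"                                    # stub, or "unknown"

def handle_utterance(audio: bytes):
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_f = pool.submit(stt_transcribe, audio)
        user_f = pool.submit(identify_speaker, audio)
        # Both run concurrently; neither delays the other.
        return text_f.result(), user_f.result()

text, user = handle_utterance(b"...raw pcm...")
print(f"{user}: {text}")   # a skill could branch on `user` here
```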
