Please Help - Wrap Up Precise ( only 2,383 to go! )


#1

We’re close to 100,000 utterances for the Precise wake word tagger. Today we have 97,617. Once we cross that line, we can start on a second community-backed wake word.

Today is July 12. Can everyone take half an hour and tag a few hundred queries to help get us over the top?


#2

I will check a few hundred this evening (in ~8 hours). Will the categorization actually help, though? It does not look like Precise has improved since the tagging started, but maybe the algorithm (model?) needs to be tuned.


#3

@Wolfgange has been doing quite a bit of work on the model. Maybe he can weigh in here with some further details?


#4

So… apparently we did it! What’s next?


#5

There was something really funky going on with the graphs, and I see we actually crossed the 100K line a few days ago. I couldn’t figure out why all the clips I was tagging weren’t having a greater impact. The graph appeared to be updating (nearly) daily again after a couple of pauses, but the figures for the last few days have been retroactively updated since I checked 10 hours ago.

But the important thing is WE MADE IT! :smile:


#6

Now go tag some on DeepSpeech.

According to chat, another wake word will be added soon. And you can still tag on “Hey Mycroft” as well.

If you have some peculiar wake word you want to use, you can also start your own tagging locally (though it likely won’t integrate with any group effort). See: https://github.com/MycroftAI/mycroft-precise/wiki/Training-your-own-wake-word#how-to-train-your-own-wake-word


#7

@bacanator, I would, but every time I try I find there is nothing to check…
I guess there are not enough clips to verify, or there are many eager contributors. Or else the web page is not working as expected.

Edit: what I mean is I get the “NO MORE DATA TO TAG” message…
Which is odd, as the progress page says:

Total collected audio samples: 69,633
Reviewed Samples: 16,780

So it appears there is actually work to do…


#8

@mikelima - I had exactly the same problem a couple of weeks ago. I called it out on Mattermost and the boffins at Mycroft fixed the issue.

One of the devs, Michael Nguyen, fixed the problem for me. Doesn’t look like he has a profile in the forum but I’m sure the ever helpful @KathyReid can get Michael to assist.


#9

Just fixed it for @mikelima. If anyone else is getting a NO MORE DATA TO TAG message and suspects that there is still data to tag, message me on here or michael-mycroft on Mattermost. The issue comes from a bug that was fixed a couple of weeks ago, but it may persist for people who hit it before the fix. I’m working on a mechanism to detect and clean up affected accounts programmatically right now, but some may have slipped through the cracks.


#10

I’m new to Mycroft and am interested in helping with tagging. DeepSpeech is giving me the NO MORE DATA TO TAG message. Is this the same bug? I have been able to tag with Precise; however, it does not seem to keep track of my progress between sessions. Any info would be greatly appreciated :slight_smile:


#11

Hi @gooberSmash,

There’s no data in our DeepSpeech tagger at the moment. However, I’m glad you flagged the issue you’re having with Precise not tracking progress across sessions. There should be a small yellow bubble at the top right of the audio player. Am I right in assuming that this increments as you tag, but then resets if you go away and come back? Does it reset any time you log out, or does it seem to time out, e.g. overnight?

You can also validate voice samples through Mozilla’s Common Voice platform, which ultimately helps us move to DeepSpeech more quickly as well.


#12

You are correct, the count in the yellow bubble resets if I log out. I have not tested leaving a session logged in for an extended period to see if it times out. I am already active on Mozilla’s Common Voice, so that’s a great suggestion. I hope to see both projects succeed.