Mycroft mispronounces the letter 'M' when SPELLING a word unless it is the last letter of the word

Mycroft was having issues talking again, but last night her voice came back. As a test, I generally ask her what day it is. Today I also asked her to spell ‘Monday’. When saying ‘M’, it sounds like she is saying ‘misure’. I am using the Google voice, if that makes a difference.

Want a good laugh? Have her spell ‘momma’.

I don’t think this will actually help, because you can’t do letters, just words, but check out https://mimic.mycroft.ai/pronounce. This is tech that @LearnedVector created to help with the poor pronunciations that came out of the last Mimic 2 training. Check out the new voice, and see if you can pop in some words that trip Mycroft up and get them updated to a better pronunciation.

More information on this.

If ‘M’ is the last letter of the word, it is pronounced correctly. If it is in any other position, it is pronounced as ‘misure’.

if you are using google voice that’s a problem with their model, nothing you can do to fix it

even the giants hit this kind of problem, moar data isn’t always the solution :slight_smile:

in the case of mimic2 the link above helps it work around this kind of problem, but the fix is specific to each voice model and will not (usually) help with other voices

I am using the google voice, but not sure I agree with that.

If it can say the letter ‘M’ correctly at the end of a word, it should be able to say it correctly anywhere in the word. I haven’t had a chance to pull the code to look at it yet, but I would venture that something is added to each letter when the word is parsed, and that it is trimmed from or missing on the last letter in the array.

google uses either tacotron or wavenet, these are neural networks that generate sound directly from text

these are machine learning techniques used to create a voice. When a child is taught to read, they learn pronunciation rules by example. But even with lots of examples the pronunciation of a specific word often needs to be taught explicitly. For example, most children (and many adults) will mispronounce “epitome”

they learned to read from lots of examples basically, there is no parsing anywhere, the model just isn’t sure how to pronounce some words. you found some example with M, eventually you will come across other weird cases

just try giving it garbage and see the output… lots of fun to be had

I wouldn’t say it isn’t sure how to pronounce ‘some’ words. When spelling ‘mom’, it gets the first ‘m’ wrong and the last one right. So far, in every ‘normal’ word I have tried, it misses any ‘m’ before the end and gets the ‘m’ correct only when it is the last letter.

Also note this only started after the last update. Prior to this, it pronounced the ‘m’ correctly in any location.

Just a guess, but I would think it has something to do with the move from Python 2.7 to 3.

The 2.7-3 conversion was a while back.

Have a go with this to see how it says things:

I couldn’t get it to do weird things with the m’s on basic or wavenet on any of the voices you can pick there.

Ie, nonsense like
“many moms making mini mince meat pies mostly mourning mandatory rum”

Also I’m trying us-english.

The Mark I will only do a single word at a time. If you say ‘spell my mom is a ninja master’, it will spell ‘my’ and drop the rest.

I don’t think it is going to be an issue with Google, could be wrong, but don’t think so.

Don’t know the exact time she started having issues with ‘m’, but my original post was on December 18th and it was a bit before that.

Had her say the alphabet bits at a time and ‘m’ is the only one that has had issues so far.

What’s in the audio log file?

I think there is a misunderstanding of the issue. I have updated the subject for clarity.

It is not an issue with Mycroft ‘saying’ a word. It is an issue with Mycroft SPELLING letters of a word.

Examples for clarification / testing…

Phrase                                        Result
Hey Mycroft say the letter M                  M
Hey Mycroft say mom                           mom
Hey Mycroft spell M                           M
Hey Mycroft spell tom                         T  O  M
Hey Mycroft spell mom                         Misurre  O  M

And what does it show in the logs when you do this?

NOTE - The last letter being ‘spoken’ does not have a ‘period’ after it. Below are the spellings for ‘Monday’ and ‘Mom’. The only ‘m’ said correctly is the last one in ‘Mom’ (end of word).

mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: M.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: O.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: N.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: D.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: A.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: Y

mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: M.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: O.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: M

I really don’t know why this is so hard for people to grasp;)
It has been doing it for AGES now…

When spelling, Mycroft pronounces the letter “M” as “Misurre” IF the “M” is not the last letter.

This seems to be because it “sees” “M.” (so M with a full-stop/period after it) as a single thing instead of an “Emmm period” like it does for all other letters.

Thus if it is at the end of a word that it is spelling, there is no full-stop and it is pronounced as “Emmm”

It doesn’t matter what voice you are using, it always does it :slight_smile:

https://github.com/MycroftAI/skill-spelling/blob/18.08/__init__.py line 31 is where the ‘period’ is coming from.

def handle_spell(self, message):
    word = message.data.get("Word")
    spelled_word = '. '.join(word).upper()
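
For anyone following along, here is a minimal sketch of what that join expression produces. The helper names `spell_out` and `letters_with_period` are made up for illustration and are not part of the skill; only the join expression itself is taken from the snippet above.

```python
def spell_out(word):
    """Apply the same join expression as the skill snippet (hypothetical helper)."""
    return '. '.join(word).upper()


def letters_with_period(word):
    """Return the letters that end up followed by a period in the joined string."""
    return [chunk for chunk in spell_out(word).split(' ') if chunk.endswith('.')]


print(spell_out("mom"))            # M. O. M
print(letters_with_period("mom"))  # ['M.', 'O.']
```

Every letter except the last one gets a period, which lines up with the "Speaks:" log entries: the first ‘M’ of ‘mom’ is sent as "M." and the last one as plain "M".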

Going to try setting up a dev testing environment to remove / update this and see if it resolves the issue. If anyone already has one set up, it would be great to have this tested by them as well.

NOTE - editing this will require updating the tests as well, as they are looking for the period on all letters other than the last one.

Hey can I clarify, this is only when using the Google TTS engine right?

I would assume they added the periods to help space the audio out, so that it’s “M pause O pause M” rather than “M O M”, but it could be worth trying a comma instead of a period?
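
A rough sketch of what the comma idea could look like, purely as something to experiment with. `spell_with_separator` is a hypothetical name, not a function in the skill, and whether any given TTS engine actually pauses on a comma is an untested assumption.

```python
def spell_with_separator(word, separator=', '):
    """Join the letters of a word with a configurable separator (hypothetical helper)."""
    return separator.join(word).upper()


print(spell_with_separator("mom"))        # M, O, M
print(spell_with_separator("mom", '. '))  # M. O. M  (same output as the current join)
```

Keeping the separator as a parameter makes it easy to try a comma, a period, or something else against each voice without touching the rest of the handler.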

I will check with the other voices and let you know, but the logs look like each letter is sent individually to be spoken…

mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: M.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: O.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: N.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: D.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: A.
mycroft.audio.speech:mute_and_speak:119 - INFO - Speaks: Y

If this is the case, the period is not needed.
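
As a sketch of that idea, assuming each letter really can be handed to the speech layer on its own: the `speak` argument below is a stand-in callback I made up for testing, and in a real skill it would presumably be the skill's own speak method. I have not tried this against the actual skill.

```python
def spell_letter_by_letter(word, speak):
    """Send each letter to `speak` separately, with no trailing punctuation."""
    for letter in word.upper():
        speak(letter)


# Collect what would be spoken instead of calling a real TTS engine.
spoken = []
spell_letter_by_letter("mom", spoken.append)
print(spoken)  # ['M', 'O', 'M']
```

With this approach no letter ever carries a period, so the first and last ‘M’ would be sent to the voice in exactly the same form.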

I just checked. The ‘mimic’ voice does say the letter ‘M’ correctly even if it is not the last letter of the word.