American male speech glitch?

Yesterday I fired up Mycroft on my Linux system and immediately found a problem. It couldn’t talk!

After a lot of fiddling about I discovered it was something to do with the Google “Text to Voice” conversion. So I switched back to Mimic. Now the “British Male” voice, well, I don’t know where in Britain he is from, but he sounds like he has just seen a ghost, which is why I used the Google version.

So I thought, let’s give the American Male voice a go. It is much better by far, but it does have a slight feature worth a mention. As I get Mycroft to read data off my weather station, I found the new voice has a problem with decimal points. For example, say it was to say “13.4 degrees”. Google did it fine. This chap doesn’t know what to make of a decimal point, so it pauses instead. Therefore I get “Thirteen four degrees”. Of course I heard 4 degrees and, yeah, it’s not that cold here, yet. I am thinking of adding code to my skill to lift the floating point numbers out and turn “13.4” into “thirteen point four”. However, this could crop up in all sorts of ways, so it is worth raising the issue.
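Something along these lines is what I have in mind for the skill. It is only a rough sketch: `spell_decimals` is a made-up helper name, not anything in Mycroft, and it assumes the voice already copes with whole numbers, so only the decimal point and the digits after it need rewriting.

```python
import re

# Digit names used to read the fractional part one digit at a time.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# A decimal number: integer digits, a dot, then fractional digits.
DECIMAL_RE = re.compile(r"(\d+)\.(\d+)")


def spell_decimals(text):
    """Rewrite '13.4' as '13 point four' so the TTS never sees the dot."""
    def replace(match):
        whole, fraction = match.groups()
        spoken = " ".join(DIGIT_WORDS[int(d)] for d in fraction)
        return f"{whole} point {spoken}"
    return DECIMAL_RE.sub(replace, text)


print(spell_decimals("It is 13.4 degrees"))  # It is 13 point four degrees
```

The skill would pass each sentence through this before handing it to the speech call. It does nothing for negative signs or units, which would need their own handling.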

On a side issue, the American voice will eventually require membership. I see a lot of potential in Mycroft and I am happy to contribute financially, however I am very reluctant to put my credit card details on the net. Are there any plans to add PayPal? I would certainly favor that if there are.

Cheers,
Dave

Hi Dave,

thanks for reporting the issue with the American Male. It’s still early days and we keep improving the model and the preprocessing needed, so these reports are really valuable. I’ll ping @LearnedVector who does most of the work on this. It might be the chunking we do or possibly we just need to add some preprocessing detecting this case.

For the PayPal payment method I’ll ping @KathyReid who I think has a good handle on those kinds of things.

/Åke

@Darmain Hi Dave, unfortunately we don’t support PayPal currently, only credit card.

Thanks for the report. This could be one of two things, as Åke mentioned. It could be how we are chunking things or how we are preprocessing in the mimic2 service. I’ll put this in as a ticket to look at. We should have the ability to transform 13.4 to thirteen point four in Mycroft’s core util somewhere. Is this from Mycroft’s weather skill, or did you personally build this particular skill?

And thanks for this laugh :rofl:

@LearnedVector This is a skill I designed myself that feeds from my own weather station. However, you raise a good point. I tried asking for the weather for tomorrow. Mycroft reports in integer temperature values, hence no problem with decimal points. My system reports to one decimal place. However, try asking Mycroft “What is one point five times three”. The American Male will answer “Four Five”.

@KathyReid Hi Kathy, that’s a shame about PayPal. Is there any plan to add it at some point? If not, then the next question is: does your system store our credit card details once payment has been confirmed? I am sure you will understand the reasoning for this.

Unfortunately the risk of fraud with PayPal is quite high. We don’t store CC details; they are processed by the third-party company Stripe.

Okay, thanks @KathyReid, I will look into this at the weekend.

Hi. Having got two Mark 1s running with American Male and latest software, I spotted another problem with this voice.

If the text is in capitals then it sounds out the letters rather than the word. Ask it for a joke and get the one about programmers confusing Christmas and Halloween because OCT 31 = DEC 25. Mycroft says “Owe Cee Tee thirty one equals Dee Ee Cee twenty five”.

I have another skill for monitoring an environmental control system by scraping the web page it produces. The system status reads back as IDLE. You can guess what Mycroft makes of that.

As you’ve experienced, Google TTS is very good. Which I would expect. They’ve spent literally decades working on their technology and likely thousands if not tens of thousands of man-years. I recall reading a story about the early days in Google’s search technology – they had people literally reading the search queries and building special case code to help. I expect they have done the same with their TTS.

We are still in that stage, and Michael has an issue full of “bad pronunciations” that he is knocking off several each day in various ways. Just a couple days ago I requested that he add the special case of converting all uppercase to individual pronunciations, as that is normally how they are handled in written English – e.g. AI, IBM, ABC. You and I read all of those as individual letters. I still think that is a good generalization, but obviously not perfect.

The other way to improve these is via training. Much like a child learning to read, we are teaching this neural network how to convert written text to sound. Currently it is learning from about 15 hours of text spoken by one individual. I’ve recently heard that Google’s voice trains on over 10 times that. So, again, we have a bit of catching up to do.

In total the Mimic 2 technology is now about 6 months old, trained on 15 hours of data, and probably the result of around 1 man-year of effort here at Mycroft. Of course, we are standing on the shoulders of giants, but even given that I think we are making great strides.

I personally think all of this is worth the effort. I know I am discouraged at least once a week when things feel stuck and I run into frustrating glitches or bugs that I swore we were past. But I also find several times a week when something delightful or amazing happens. Each step back is definitely offset by several steps forward.

With the new Mimic, I believe we have hit upon what will be the definitive TTS solution for our community – one that is capable of scaling not just to an excellent English language voice, but also one that can be used to produce a voice for virtually any language. Most importantly, the techniques we are using can now be employed by non-experts to create new voices. Without something like this many tongues would be lost while they wait for someone to do a PhD thesis on how to convert graphemes to phonemes.

I hope you stick with us, or at least come back every few months to see the progress we are making. I think you will be pleasantly surprised.

- Steve

P.S. If you do stick around, do keep sending in bug reports. The only way we can correct issues is if we are aware of them!

@steve.penrod Hi Steve, please understand that I have great enthusiasm for this project and I do want to be part of it. I certainly want to stay around, providing everyone is happy for me to do so. I am not being critical of the design by reporting issues like this. I can see that this is immensely complex and the only way to improve the model is to keep testing and raise the issues as they are found. Please be aware that I am an embedded systems engineer with Chartered status. I have experience in electronics development, systems design and embedded software development. I therefore do understand the processes of product development and the condition that Mycroft is currently in. I do recognize that this project is not polished yet, by any means, and to be a part of its evolution would be an honor. I hope that helps, and if I can help in any way then I would be happy to. Best regards, Dave

Your help and observations are completely welcome here! I know some get frustrated when technology has glitches, while others get excited by the possibilities. I don’t want to discourage either, but I know the former might be better served by taking it slow.

Thanks again for the feedback! It has led me to some suggestions for Michael. For the TTS we are going to add a simple dictionary of the most common words and look up the all-caps words in it. We will only spell out those which aren’t in the common dictionary. This should pronounce sentences like these “correctly”:

   "I think the BEST station is CBS."

As we talked through the implementation, I think we will be doing it so that the lookup can contain pronunciation helpers, too. So it would be kinda like this:

    {
        "apple": None,
        "best": None,
        "car": None,
        "epitome": "ee-pit-oh-me",
        "oct": "october",
    }

So we’ll do a lookup for each word, but it should be fast and very effective. Even better, we can easily make this database of corrections available to the world so they can submit corrections.
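To make the idea concrete, here is a rough sketch of how such a lookup might behave. It is not the actual Mimic 2 code: the names `PRONUNCIATIONS` and `prepare_for_tts` are made up for illustration, and it assumes a value of None means "say the word as written" while a string is a pronunciation helper.

```python
import re

# Hypothetical lookup table in the shape described above:
# None means "say the word as written"; a string is a pronunciation helper.
PRONUNCIATIONS = {
    "apple": None,
    "best": None,
    "car": None,
    "epitome": "ee-pit-oh-me",
    "oct": "october",
}

# Runs of two or more capital letters, e.g. BEST, CBS, OCT.
ALL_CAPS_RE = re.compile(r"\b[A-Z]{2,}\b")


def prepare_for_tts(text, table=PRONUNCIATIONS):
    """Spell out all-caps words unless the table knows them as ordinary words."""
    def replace(match):
        word = match.group(0)
        key = word.lower()
        if key in table:
            # Known word: use its helper if it has one, else just lowercase it.
            return table[key] or key
        # Unknown all-caps word: treat as an acronym and separate the letters.
        return " ".join(word)
    return ALL_CAPS_RE.sub(replace, text)


print(prepare_for_tts("I think the BEST station is CBS."))
# I think the best station is C B S.
```

This sketch only touches all-caps words. Applying helpers such as the one for "epitome" to every word would take a second pass over the lowercase text.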

As they say, it takes a village to raise a child. :slight_smile:

Well, one for @KathyReid and @steve.penrod, but also for the team in general. As I have said above, I have great enthusiasm for this project and I would like to get up to speed on helping however I can with its development. As a result I have just paid for the yearly membership and have upgraded my account. That way I can keep working with “American Male” and see him progress as well. :grinning:
