From the silly side: Alexa vs. Mycroft on a strange question


#1

I wandered across a web page where someone asked an Amazon Echo device “Alexa, what is 10 to the power of 308?”

Alexa responded “One zero zero zero…” and switched to “oh oh oh” toward the end.

I had to try this, so “Hey Mycroft, what is 10 to the power of 308?” He did almost the same thing, except that it was “zero” all the way to the end.

No point to this, I just found it mildly amusing.


#2

10010010110100101101110100111 … 2 … Amen


#3

Actually, it should be possible to express the number a bit more concisely…
something like
one hundred billion billion… with “billion” repeated 34 times.
Or maybe one hundred million trillion trillion… with “trillion” repeated 25 times. I had to look it up to be sure but…
there are shorter notations actually:
https://en.wikipedia.org/wiki/Names_of_large_numbers

so the number should be read as
one hundred thousand centillion (short scale, where a centillion is 10^303). It should not be too difficult to produce this algorithmically.
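A minimal sketch of that idea, assuming the short scale and a deliberately partial name table (only million through decillion, plus centillion). The function name `name_power_of_ten` and the table are made up for illustration, not an existing Mycroft API:

```python
# Short-scale names keyed by their power of ten.
# Partial on purpose: a real table would list every name in between.
SHORT_SCALE_NAMES = {
    6: "million", 9: "billion", 12: "trillion", 15: "quadrillion",
    18: "quintillion", 21: "sextillion", 24: "septillion", 27: "octillion",
    30: "nonillion", 33: "decillion", 303: "centillion",
}


def name_power_of_ten(exponent):
    """Split 10**exponent into (multiplier, name).

    Picks the largest named power p in the table with p <= exponent,
    so that 10**exponent == multiplier * 10**p.
    """
    candidates = [p for p in SHORT_SCALE_NAMES if p <= exponent]
    if not candidates:
        raise ValueError("exponent is below one million")
    base = max(candidates)
    return 10 ** (exponent - base), SHORT_SCALE_NAMES[base]


# Example: 10**10 is ten billion.
multiplier, name = name_power_of_ten(10)
print(multiplier, name)
```

The multiplier still has to be spelled out by whatever number-to-words routine is already in use, and the names would need a per-language table.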


#4

Actually I’m wondering if it should be read out as scientific notation. Such as:

one point two three four times ten to the twenty-third power

or similar …


#5

Yes, ten to the power of 308 or something is probably better. Cut the threshold at about millions of billions, and go to scientific/engineering notation above that.
Spelling out the numbers is silly, however, and could be a bit of a denial of service vulnerability.
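A rough sketch of the threshold idea, assuming a cutoff of 10^15 ("millions of billions") and integer input. The name `speakable` and its parameters are hypothetical. The mantissa is truncated rather than rounded to keep the example short, and the exponent is left as digits for the TTS engine to read:

```python
SCIENTIFIC_THRESHOLD = 10 ** 15  # assumed cutoff: "millions of billions"


def speakable(n, threshold=SCIENTIFIC_THRESHOLD, digits=4):
    """Return text to hand to TTS for an integer n.

    Below the threshold the number is passed through unchanged.
    At or above it, a scientific-notation phrase is built from the
    leading `digits` significant digits (truncated, not rounded).
    """
    if abs(n) < threshold:
        return str(n)
    sign = "minus " if n < 0 else ""
    s = str(abs(n))
    exponent = len(s) - 1
    mantissa = s[0]
    fraction = s[1:digits].rstrip("0")  # drop trailing zeros
    if fraction:
        mantissa += " point " + " ".join(fraction)
    if mantissa == "1":
        return f"{sign}ten to the power of {exponent}"
    return f"{sign}{mantissa} times ten to the power of {exponent}"


print(speakable(10 ** 308))
print(speakable(1234 * 10 ** 20))
```

Because the output length no longer grows with the size of the number, this also takes care of the denial of service worry.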


#6

Fair point on the DoS. Some people around here have started discussing SSML, so I’d be interested in what those people have to say, and if we could make it an option by skill or by user.


#7

There are SSML tags that let you choose how a number is pronounced, depending on the TTS engine.

I’ve seen options to speak by digit, by name, or in scientific notation.

I think this should not rely on SSML but be a util to use when needed,

something like pronounce_as(number (int), mode (str))

pronounce_as(51, "digit") = "five one"
pronounce_as(51, "number") = "fifty one"

Modes could be: digit, short scale, long scale, scientific notation, binary, hex, and number. Number would be the default, maybe with an automatic switch to scientific notation above a threshold?
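To make the proposal concrete, here is a small English-only sketch of `pronounce_as`. It is an illustration, not existing Mycroft code: only the digit, number, binary, and hex modes are covered, "number" is limited to 0 to 999, and the helper names are invented:

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def _spell_digits(text):
    """Read a digit string one character at a time (hex letters as A-F)."""
    return " ".join(ONES[int(c)] if c.isdigit() else c.upper() for c in text)


def _spell_below_1000(n):
    """Spell 0..999 in English words."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        if n % 10:
            words.append(ONES[n % 10])
    elif n or not words:  # say "zero" only when nothing else was said
        words.append(ONES[n])
    return " ".join(words)


def pronounce_as(number, mode="number"):
    """Return a spoken form of a non-negative int in the given mode."""
    if number < 0:
        raise ValueError("sketch handles non-negative integers only")
    if mode == "digit":
        return _spell_digits(str(number))
    if mode == "binary":
        return _spell_digits(format(number, "b"))
    if mode == "hex":
        return _spell_digits(format(number, "x"))
    if mode == "number":
        if number >= 1000:
            raise ValueError("sketch only spells 0 to 999")
        return _spell_below_1000(number)
    raise ValueError(f"unsupported mode: {mode}")


print(pronounce_as(51, "digit"))   # five one
print(pronounce_as(51, "number"))  # fifty one
```

The scale and scientific modes would slot in as further branches, and the word lists are the part that would have to be swapped per language.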

In addition, these methods would need to be localized per language.

This would be super useful, for example, for a phone numbers skill, to ensure they are pronounced correctly.

Mutating the utterance based on a default setting when digits are detected could also be part of the text normalization before Adapt.
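The normalization step could look roughly like this, assuming a "digit" default and English words. The function name `normalize_digits` is hypothetical, and where it would hook into the pipeline is left open:

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def normalize_digits(utterance):
    """Replace every run of ASCII digits with its digit-by-digit reading."""
    def spell(match):
        return " ".join(DIGIT_WORDS[int(c)] for c in match.group())
    return re.sub(r"[0-9]+", spell, utterance)


print(normalize_digits("call 555 0100"))
```

A per-user or per-skill setting could pick which reading to apply here instead of always going digit by digit.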