SSML Support in Mycroft

jrwarwick · February 20, 2020, 5:42pm

What is the status of SSML support? What I could find gives a mixed picture:
Mycroft core documentation does not seem to mention it; the 2016 roadmap mentions it as planned; the more recent 2018 roadmap (the sub-roadmap for TTS) does not mention it; recent mimic release notes mention additional features; and at least one Jarbas pull request to core with SSML features seems to have been included some time ago.

And lastly, I threw a self.speak() message including some simple SSML tags and Mycroft (with classic voice) did not sound noticeably different, but at the same time, he didn’t do something weird like pronounce the tagnames or something.

If there is some level of support, I think at least a minimal starter entry in the docs would be really helpful.

jrwarwick · February 20, 2020, 6:01pm

Supplemental:

I think maybe, just maybe, on a second experiment at mycroft-cli-client, I did hear some difference after I put the whole utterance inside of “root” tag pair (I did not do that the first time). But the difference that I perceived was not very strong.
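For anyone repeating the experiment, here is a minimal sketch of wrapping an utterance in a root tag pair before speaking it. It assumes the "root" tag meant here is the standard SSML `<speak>` element; `wrap_ssml` and its `rate` parameter are hypothetical names for illustration, not part of Mycroft, and whether a given TTS engine honors `<prosody>` is not confirmed in this thread.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_ssml(text, rate=None):
    """Wrap plain text in an SSML <speak> root (hypothetical helper).

    If `rate` is given, the text is also wrapped in a <prosody> tag.
    """
    # Escape &, < and > so the result stays well-formed XML.
    body = escape(text)
    if rate is not None:
        # quoteattr() returns the value already wrapped in quotes.
        body = "<prosody rate={}>{}</prosody>".format(quoteattr(rate), body)
    return "<speak>{}</speak>".format(body)
```

Inside a skill, the result could then be passed along as `self.speak(wrap_ssml("Hello there", rate="slow"))`.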

Also, it seemed to me that rendering to audio took much, much longer. This may explain the situation: the functionality is mostly there, but we are reluctant to declare it official and supported because people might then complain about it or get a false bad impression of Mycroft's performance.

JarbasAl · February 20, 2020, 7:11pm

SSML has been supported since April 2018; it will depend on your configured TTS engine. I think mimic2 does not support SSML.

I tested with Amazon Polly only; the PR is pending (I will get it in working order again soon).

JarbasAl · February 20, 2020, 7:27pm

Related: a helper tool, the SSML builder in jarbas_utils >= 0.3.0

forslund · February 21, 2020, 12:06pm

Jarbas is correct: mimic1 supports SSML and is invoked with the ssml flag from Mycroft, so utterances with SSML tags will be rendered using them (as long as they’re supported). Mimic1 1.3 adds support for the pitch prosody tag (Mycroft still pulls in 1.2 by default; that will change as soon as I get time to repackage mimic1 for the Mark-1).

Will make a note about the docs…

jrwarwick · February 21, 2020, 4:51pm

Thanks, all that is clarifying. It would certainly be helpful in (hopefully) forthcoming docs to include those tags and properties that are supported so far.

NeonAndrii · January 29, 2021, 4:46pm

I run Mycroft using Mimic1 with SSML tags in my skill, but this seems to make no difference on the audio output–the tags are simply not pronounced. Is this a known issue? If so, could anyone suggest where to start to resolve it?

jrwarwick · January 29, 2021, 7:38pm

In forslund’s reply, he mentions invocation with the “ssml flag”, which I think is this:

github.com/MycroftAI/mimic1/blob/adf655da0399530ac1b586590257847eb61be232/main/mimic_main.c#L110

"  --help      Output usage string\n"
"  -o WAVEFILE Explicitly set output filename\n"
"  -f TEXTFILE Explicitly set input filename\n"
"  -t TEXT     Explicitly set input textstring\n"
"  -p PHONES   Explicitly set input textstring and synthesize as phones\n"
"  --set F=V   Set feature (guesses type)\n"
"  -s F=V      Set feature (guesses type)\n"
"  --seti F=V  Set int feature\n"
"  --setf F=V  Set float feature\n"
"  --sets F=V  Set string feature\n"
"  -ssml       Read input text/file in ssml mode\n"
"  -b          Benchmark mode\n"
"  -l          Loop endlessly\n"
"  -voice NAME Use voice NAME (NAME can be filename or url too)\n"
"  -voicedir NAME Directory contain voice data\n"
"  -lv         List voices available\n"
"  -add_lex FILENAME add lex addenda from FILENAME\n"
"  -pw         Print words\n"
"  -ps         Print segments\n"
"  -psdur      Print segments and their durations (end-time)\n"
"  -pr RelName Print relation RelName\n"

You should be able to test that on your own if you have ssh terminal access. Then the question becomes: how do we correctly configure mycroft to make mimic invocations with that flag (or general custom flags)?
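A rough sketch of that manual test, using only the `-ssml`, `-t` and `-o` options from the usage string above. It assumes the mimic1 binary is installed as `mimic` on the PATH; `mimic_ssml_command` and `render_ssml` are hypothetical helper names, and this is not how Mycroft itself invokes mimic.

```python
import shutil
import subprocess


def mimic_ssml_command(ssml_text, wav_path, binary="mimic"):
    """Build the argument list for a mimic run in SSML mode.

    Only options listed in the usage string are used:
    -ssml (SSML mode), -t TEXT (input string), -o WAVEFILE (output file).
    """
    return [binary, "-ssml", "-t", ssml_text, "-o", wav_path]


def render_ssml(ssml_text, wav_path, binary="mimic"):
    """Run mimic and write the audio to wav_path (assumes mimic is installed)."""
    if shutil.which(binary) is None:
        raise FileNotFoundError("{} not found on PATH".format(binary))
    # Passing a list avoids shell quoting problems with the < and > characters.
    subprocess.run(mimic_ssml_command(ssml_text, wav_path, binary), check=True)
```

Rendering the same sentence once with and once without a prosody tag, for example `render_ssml('<speak><prosody rate="slow">Hello there</prosody></speak>', "slow.wav")`, and comparing the two files by ear should show whether the tags have any effect outside of Mycroft.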