Using the Bing TTS Engine

Hi,

I am trying to use the Bing TTS engine (and the new Azure Speech service) but I was not able to figure out how to set the parameters in the mycroft.conf file. Microsoft documentation says that I need to set the following parameters (in addition to the API key "Ocp-Apim-Subscription-Key", which I already have):


POST /synthesize HTTP/1.1
Host: speech.platform.bing.com
X-Microsoft-OutputFormat: riff-8khz-8bit-mono-mulaw
Content-Type: application/ssml+xml
Content-Length: 197
Authorization: Bearer [Base64 access_token]

<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)'>Microsoft Bing Voice Output API</voice></speak>
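To make this concrete, the documented call corresponds to roughly the following raw request (just a sketch using the requests library; the token endpoint URL is what Microsoft's documentation listed for the Bing Speech API at the time and is an assumption on my part, not something Mycroft does itself):

import requests

API_KEY = "YOUR_OCP_APIM_SUBSCRIPTION_KEY"

# Exchange the subscription key for a short-lived bearer token
# (assumed legacy token endpoint for the Bing Speech API).
token = requests.post(
    "https://api.cognitive.microsoft.com/sts/v1.0/issueToken",
    headers={"Ocp-Apim-Subscription-Key": API_KEY},
).text

ssml = (
    "<speak version='1.0' xml:lang='en-US'>"
    "<voice xml:lang='en-US' xml:gender='Female' "
    "name='Microsoft Server Speech Text to Speech Voice (en-US, ZiraRUS)'>"
    "Microsoft Bing Voice Output API</voice></speak>"
)

# POST the SSML with the headers from the documentation above.
resp = requests.post(
    "https://speech.platform.bing.com/synthesize",
    headers={
        "X-Microsoft-OutputFormat": "riff-8khz-8bit-mono-mulaw",
        "Content-Type": "application/ssml+xml",
        "Authorization": "Bearer " + token,
    },
    data=ssml.encode("utf-8"),
)

with open("output.wav", "wb") as f:
    f.write(resp.content)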

I am not sure how to add these settings to mycroft.conf, which uses the following format:


{
  "tts": {
    "module": "bing",          <- set this to the name of your TTS provider (e.g. "bing", "google_cloud")
    "bing": {                  <- a sub-section named after that same TTS provider
      "lang": "en-US",         <- the IETF BCP-47 language code for your language
      "credential": {          <- some TTS engines require credentials - check the documentation for your TTS engine
        "json": {
        }
      }
    }
  }
}

Thanks in advance.

Abdulrahman

Hi there @Abdulrahman2,

To try and find an answer for you here, I had a look at the mycroft-core source code for the BingTTS class. You can see it here. Based on the source code, the BingTTS class is expecting something like:

{
  "tts": {
    "module": "bing",
    "bing": {
      "lang": "en-US",
      "api_key": "YOURAPIKEYHERE"
    }
  }
}

None of the other parameters are found in the BingTTS class.
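For background, a Mycroft TTS module only reads the keys it explicitly looks up from the matching sub-section of the "tts" block in mycroft.conf, so anything else in that block is simply ignored. Roughly (a sketch using Mycroft's Configuration helper, not the actual BingTTS code):

# Sketch only - illustrates where the "bing" sub-section of mycroft.conf ends up.
from mycroft.configuration import Configuration

config = Configuration.get()
tts_config = config.get("tts", {})
module_name = tts_config.get("module")          # e.g. "bing"
bing_config = tts_config.get(module_name, {})   # {"lang": ..., "api_key": ...}

api_key = bing_config.get("api_key")            # the credential BingTTS actually uses
lang = bing_config.get("lang", "en-US")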

Can you try the above configuration and let me know how you go?

Thanks @KathyReid.

I tried that and unfortunately it did not work.

However, I think you are correct and the problem is actually caused by the proxy, since I am connecting to the internet through a proxy server.

Even though the proxy setting is already set using the export commands and Mycroft is working fine through the proxy, it seems that Bing TTS requests are not able to pass through the proxy.
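For what it's worth, this is roughly how I would check whether a request to the Bing endpoint actually goes through the proxy (a sketch only; the proxy URL is a placeholder for my real one, and I am not sure whether the Python-Bing-TTS package honours the proxy environment variables the way the requests library does):

import os
import requests

# The proxy that Mycroft's other requests are already using (placeholder URL).
proxies = {
    "http": "http://proxy.example.com:3128",
    "https": "http://proxy.example.com:3128",
}

print("http_proxy  =", os.environ.get("http_proxy"))
print("https_proxy =", os.environ.get("https_proxy"))

try:
    # Any HTTP response at all (even an error status) means the proxy let us through.
    r = requests.get("https://speech.platform.bing.com/", proxies=proxies, timeout=10)
    print("Reached host, status:", r.status_code)
except requests.exceptions.RequestException as e:
    print("Connection failed:", e)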

Thanks,

Abdulrahman

Hmmm. That doesn’t make sense though, because it’s Mycroft that is handling the proxy requests, and the requests/responses to/from BingTTS are handled through Mycroft.

Is there anything in your voice.log, audio.log or skills.log that either:

  • confirms that this is a proxy issue
  • or provides an error relating to BingTTS?

Best, Kathy

I don’t have access to the log files right now to check the errors. I will check them later.
However, I thought Mycroft depends on this Python library (Python-Bing-TTS), as mentioned in bing_tts.py, to make the BingTTS requests.
Thanks, Abdulrahman

Yes, that’s correct, Mycroft uses python-bing-tts for abstraction.

How did you go finding the logs?

I found the following errors in the audio.log file:

  1. If the “Python-Bing-TTS” package is not installed, I get the following error in the audio.log file:

ModuleNotFoundError: No module named 'bingtts'

  2. If I use the following configuration:

"bing": {
  "lang": "en-US",
  "api_key": "[my API key]"
},

I get the following error in the audio.log file:
mycroft.audio.speech:mute_and_speak:126 - ERROR - TTS execution failed (LanguageException('Requested language en-US does not have voice Male!',)).

So, I tried to use the gender field to pass the voice name.

  3. If I use the following configuration:

"bing": {
  "lang": "en-US",
  "gender": "JessaRUS",
  "api_key": "[my API key]"
},

I get the following error in the audio.log file:

mycroft.audio.speech:mute_and_speak:126 - ERROR - TTS execution failed (TimeoutError(110, 'Connection timed out')).

So now I am not sure where the source of the error is: the proxy, or the use of the Python-Bing-TTS package.

Thanks, Abdulrahman.

Great information and troubleshooting @Abdulrahman2

It sounds like BingTTS is expecting a voice key-value pair. By default, Mycroft sets the voice to “Male” so I think that’s what’s happening.

Do you know what voices are available with BingTTS?

We could try adding a voice value like this:

"bing": {
  "lang": "en-US",
  "voice": "JessaRUS",
  "api_key": "[my API key]"
},

It’s an old thread, but I can’t access Microsoft TTS either. I got two keys from Microsoft. Neither works. Are there any new hints?

Do you get any error message (spoken, in CLI or in log files)?

According to the bing_tts implementation in mycroft-core you need to install a dependency - did you do that already?
The last time the bing_tts code was updated was 11 months ago - maybe there was a change on the MS/Bing API side?

I’ve seen the error message:
'BingTTS dependencies not installed, please run pip install git+https://github.com/westparkcom/Python-Bing-TTS.git'
so I did that. Now this message no longer appears, but there is no answer from Bing. What I’ve also seen are some differences between the Mycroft TTS documentation and the help text from @KathyReid a few posts above. And I am missing an option to put the endpoint of my Microsoft Azure account into the configuration. I guess the endpoint is the connection to my account, in combination with the key.
EDIT: Here is a link to the documentation from Microsoft. Maybe it’s useful:
https://docs.microsoft.com/de-de/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python. It’s in German, I guess you can understand it.

EDIT 2: Now I tried again to connect to Microsoft with another voice definition. These are the last audio.log lines:
2020-12-02 09:06:23.391 | INFO | 712 | mycroft.audio.__main__:main:50 | Starting Audio Services
2020-12-02 09:06:23.403 | INFO | 712 | mycroft.messagebus.client.client:on_open:114 | Connected
2020-12-02 09:06:23.410 | INFO | 712 | mycroft.audio.audioservice:get_services:61 | Loading services from /usr/lib/python3.8/site-packages/mycroft/audio/services/
2020-12-02 09:06:23.416 | INFO | 712 | mycroft.audio.audioservice:load_services:105 | Loading chromecast
2020-12-02 09:06:31.656 | INFO | 712 | mycroft.audio.audioservice:load_services:105 | Loading mopidy
2020-12-02 09:06:31.664 | INFO | 712 | mycroft.audio.audioservice:load_services:105 | Loading mplayer
2020-12-02 09:06:31.673 | INFO | 712 | mycroft.audio.audioservice:load_services:105 | Loading simple
2020-12-02 09:06:31.687 | INFO | 712 | mycroft.audio.audioservice:load_services:105 | Loading vlc
2020-12-02 09:06:32.197 | INFO | 712 | mycroft.audio.audioservice:load_services_callback:177 | Finding default backend...
2020-12-02 09:06:32.199 | INFO | 712 | mycroft.audio.audioservice:load_services_callback:181 | Found local
2020-12-02 09:06:32.214 | INFO | 712 | mycroft.audio.__main__:on_ready:30 | Audio service is ready.
2020-12-02 09:06:43.433 | INFO | 712 | mycroft.audio.speech:mute_and_speak:127 | Speak: You can get started with mycroft.
2020-12-02 09:06:43.437 | DEBUG | 712 | mycroft.tts.mimic2_tts:get_tts:232 | Generating Mimic2 TSS for: You can get started with mycroft.
2020-12-02 09:06:43.442 | DEBUG | 712 | urllib3.connectionpool | Starting new HTTPS connection (1): mimic-api.mycroft.ai:443
09:06:43.442 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): mimic-api.mycroft.ai:443
2020-12-02 09:06:44.183 | DEBUG | 712 | urllib3.connectionpool | https://mimic-api.mycroft.ai:443 "GET /synthesize?text=You%20can%20get%20started%20with%20mycroft.&visimes=True HTTP/1.1" 200 105985
09:06:44.183 - urllib3.connectionpool - DEBUG - https://mimic-api.mycroft.ai:443 "GET /synthesize?text=You%20can%20get%20started%20with%20mycroft.&visimes=True HTTP/1.1" 200 105985
2020-12-02 09:06:45.391 | DEBUG | 712 | urllib3.connectionpool | Starting new HTTPS connection (1): api.mycroft.ai:443
The last voice.log lines:
2020-02-07 16:51:03.220 | INFO | 338 | mycroft.messagebus.client.client:on_open:114 | Connected
2020-12-02 09:06:23.441 | WARNING | 338 | mycroft.client.speech.listener:run:88 | Audio contains no data.
2020-12-02 09:06:23.899 | INFO | 338 | mycroft.util.audio_utils:find_input_device:196 | Searching for input device: pulse
2020-12-02 09:06:23.912 | INFO | 338 | mycroft.client.speech.listener:create_wake_word_recognizer:328 | Creating wake word engine
2020-12-02 09:06:23.914 | INFO | 338 | mycroft.client.speech.listener:create_wake_word_recognizer:351 | Using hotword entry for hey mycroft
2020-12-02 09:06:23.917 | INFO | 338 | mycroft.client.speech.hotword_factory:load_module:403 | Loading "hey mycroft" wake word via precise
2020-12-02 09:06:27.665 | INFO | 338 | mycroft.client.speech.listener:create_wakeup_recognizer:365 | creating stand up word engine
2020-12-02 09:06:27.667 | INFO | 338 | mycroft.client.speech.hotword_factory:load_module:403 | Loading "wake up" wake word via pocketsphinx
2020-12-02 09:09:09.527 | INFO | 338 | mycroft.session:get:72 | New Session Start: bd330824-60eb-4595-bce8-5431e71effe7
2020-12-02 09:09:09.534 | INFO | 338 | mycroft.client.speech.__main__:handle_wakeword:67 | Wakeword Detected: hey mycroft
2020-12-02 09:09:10.593 | INFO | 338 | mycroft.client.speech.__main__:handle_record_begin:37 | Begin Recording...
2020-12-02 09:09:13.553 | INFO | 338 | mycroft.client.speech.__main__:handle_record_end:45 | End Recording...
2020-12-02 09:09:17.375 | INFO | 338 | mycroft.client.speech.__main__:handle_utterance:72 | Utterance: ['the institute on phone charger']

The audio.log looks like you do not have Bing-TTS configured for TTS. Can you share the tts section of your mycroft.conf?

But looking at the Microsoft documentation link you have provided, it looks to me like there was a major change in the Bing-TTS API and Mycroft’s bing_tts implementation needs an overhaul…

The tts section:

"tts": {
  "module": "bing",
  "bing": {
    "lang": "de-DE",
    "voice": "de-DE-KatjaNeural",
    "api_key": "123456789abcdef1234567"
  }
},
I tried to find some hints about what parameters must be given to the Bing service, but I don’t understand enough about programming. Maybe you can help.

The current bing_tts implementation in mycroft-core accepts the parameters "api_key", "gender" (default is "Male") and "format" (for the audio format).
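So, for the current implementation, the tts section would look roughly like this (an untested sketch - I don't have an API key, and "Female" and the output format are just example values):

"tts": {
  "module": "bing",
  "bing": {
    "lang": "de-DE",
    "gender": "Female",
    "format": "riff-16khz-16bit-mono-pcm",
    "api_key": "[your API key]"
  }
},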

Again, I think this implementation will no longer work as MS has changed the API. I don't have a Bing api-key and I don't want to register for Bing-services, otherwise I would look into it and try to apply the necessary changes - looking at MS/Bing documentation it does not look too hard to do…

Yes, you need an Azure and a Microsoft account. That’s not good, but I’m looking for an alternative to Google because it’s not reliable. I got to the next step. There is quickstart Python code which I tested successfully. The address is https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/quickstart/python/text-to-speech. You have to import a library called azure.cognitiveservices.speech (installed by pip3). With this code I heard the speech for the text I put into the input() call. It was German words with a US accent, but it works in general. In the next days I will have a look at the library to get a voice with a German accent. It would be great if you could find a way to integrate the necessary code into mycroft/picroft.

@Dominik
Now I got the right code to get a German answer from Azure Text-to-Speech:

import azure.cognitiveservices.speech as speechsdk
speech_key, service_region = "Your_Azure_Key", "westeurope"
def speech_synthesis_with_language():
    """performs speech synthesis to the default speaker with specified spoken language"""
    # Creates an instance of a speech config with specified subscription key and service region.
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    # Sets the synthesis language.
    # The full list of supported languages can be found here:
    # https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#text-to-speech
    language = "de-DE";
    speech_config.speech_synthesis_language = language
    # Creates a speech synthesizer for the specified language,
    # using the default speaker as audio output.
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    # Receives a text from console input and synthesizes it to speaker.
    while True:
        print("Enter some text that you want to speak, Ctrl-Z to exit")
        try:
            text = input()
        except EOFError:
            break
        result = speech_synthesizer.speak_text_async(text).get()
        # Check result
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech synthesized to speaker for text [{}] with language [{}]".format(text, language))
        elif result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = result.cancellation_details
            print("Speech synthesis canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("Error details: {}".format(cancellation_details.error_details))
speech_synthesis_with_language()

The module azure.cognitiveservices.speech must be installed. The only vars to define are speech_key, service_region and language. The input must be set over the messagebus, I guess, with the text of an answer from the skill. What do you think: is it possible to integrate this into Mycroft for working with Azure speech services?

Yes, that should work, and the necessary changes to the bing_tts module shouldn't be too complex.
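For example, the file-writing part could be a small helper like this (an untested sketch - the function name, the region "westeurope" and the voice "de-DE-KatjaNeural" are just examples, and this is not the actual Mycroft module code):

import azure.cognitiveservices.speech as speechsdk

def synthesize_to_wav(sentence, wav_file, api_key, region="westeurope",
                      voice="de-DE-KatjaNeural"):
    """Render `sentence` to `wav_file` with the Azure Speech SDK."""
    speech_config = speechsdk.SpeechConfig(subscription=api_key, region=region)
    speech_config.speech_synthesis_voice_name = voice
    # Write to a file instead of the default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_file)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                              audio_config=audio_config)
    result = synthesizer.speak_text_async(sentence).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError("Azure TTS failed: {}".format(result.reason))
    return wav_file

A Mycroft TTS module would then only need to call this with the sentence to speak and the target wav path.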

Changed the Mycroft Bing-TTS module to use the new Azure REST API - see PR #2775
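For reference, the REST flow that replaces the old speech.platform.bing.com endpoint looks roughly like this (a sketch against the public Azure Speech endpoints, not code taken from the PR; region, voice and output format are example values):

import requests

API_KEY = "YOUR_AZURE_SPEECH_KEY"
REGION = "westeurope"  # example region

# 1. Exchange the subscription key for a short-lived bearer token.
token = requests.post(
    "https://{}.api.cognitive.microsoft.com/sts/v1.0/issueToken".format(REGION),
    headers={"Ocp-Apim-Subscription-Key": API_KEY},
).text

# 2. POST SSML to the regional text-to-speech endpoint.
ssml = (
    "<speak version='1.0' xml:lang='de-DE'>"
    "<voice name='de-DE-KatjaNeural'>Hallo, das ist ein Test.</voice>"
    "</speak>"
)
resp = requests.post(
    "https://{}.tts.speech.microsoft.com/cognitiveservices/v1".format(REGION),
    headers={
        "Authorization": "Bearer " + token,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
    },
    data=ssml.encode("utf-8"),
)
resp.raise_for_status()

with open("azure_tts.wav", "wb") as f:
    f.write(resp.content)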
