A layer (skill or ?) after the TTS .wav generation and before output

I want to perform additional processing after the .wav file is generated by whatever TTS engine is selected. Ideally, my application would take the temp .wav file after TTS generation, process it, and overwrite it before it is picked up and played through the speaker output.

Is there a skill, module, or other existing feature I can explore for this?

If you're only looking at audio manipulation and not additional API calls and the like, you might want to customize a TTS plugin for your usage.

You can modify the play_mp3_cmdline setting in mycroft.conf to process the output file via a bash script prior to playing the audio. As an example, the default value, I believe, is:

{
 "play_mp3_cmdline": "mpg123 %1"
}            

can be modified with something like:

{
"play_mp3_cmdline": "/bin/bash /home/mycroft/myscripts/process_before_play.sh %1"
}

with a script like:

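#!/bin/bash

# Write the processed audio to a temporary file
# (process_audio stands in for your own processing command)
process_audio "$1" > /tmp/myrandom_file.mp3

# Play the processed audio
mpg123 -q "/tmp/myrandom_file.mp3"

# Delete the processed audio file after playback
rm "/tmp/myrandom_file.mp3"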

At the end of your script, just tag on the mpg123 command so that the altered audio file plays through the speaker or whatever output you use. You can do the same with play_wav_cmdline or play_ogg_cmdline.
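A slightly more defensive version of that wrapper, as a rough sketch rather than anything Mycroft ships: it uses mktemp so two utterances never share one temp file, and falls back to the untouched audio if processing fails. The names process_audio, process_and_play and the PLAYER variable are made up for illustration. process_audio just copies the file here, so swap in your real processing, and set PLAYER to whatever player matches the format you are wrapping.

```shell
#!/bin/bash
# Sketch of a wrapper for play_mp3_cmdline / play_wav_cmdline.

# HYPOTHETICAL placeholder: $1 = input file, $2 = output file.
# It only copies the audio unchanged; replace the body with real processing.
process_audio() {
    cp -- "$1" "$2"
}

# Assumption: PLAYER is the command that plays the audio format in use.
PLAYER="${PLAYER:-mpg123 -q}"

process_and_play() {
    local input="$1" tmp
    # Unique temp file so overlapping utterances do not clobber each other
    tmp="$(mktemp /tmp/processed_audio.XXXXXX)" || return 1
    if process_audio "$input" "$tmp"; then
        $PLAYER "$tmp"
    else
        # Processing failed: play the original audio instead of staying silent
        $PLAYER "$input"
    fi
    local status=$?
    # Clean up the processed copy after playback
    rm -f -- "$tmp"
    return "$status"
}

# Only act when a file path is passed in (the %1 from mycroft.conf)
if [ "$#" -gt 0 ]; then
    process_and_play "$1"
fi
```

For example, saved as /home/mycroft/myscripts/process_before_play.sh, it can be called from mycroft.conf exactly like the script above.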

So simple, but I'd never thought of that - nice one!

I like the idea.

Extending on that… The other way around? I mean, after the TTS processes, can I get the resulting text and store it? I'm thinking about auto-tagging that can be used for future learning material.

Interesting :slight_smile:

I was using my system to play sound effects, music, etc.

Does the "play_mp3_cmdline" line in mycroft.conf apply to ALL mp3 playback, or just the voice?

From what I can tell, anything that's processed by the TTS engine is temporarily cached in (in my case) "/tmp/mycroft/cache/tts/GoogleTTS/" because I am using Google's TTS. This is probably dynamic based on the TTS you are using. An audio streaming skill likely places its files in a different location if it caches for streaming. You can modify your script to only process the audio if the base directory is '/tmp/mycroft/cache/tts/'; otherwise, just play it without processing. Here's some pseudo-code:

[ "$(dirname "$1")" != "/tmp/mycroft/cache/tts/GoogleTTS" ] && { mpg123 -q "$1"; exit; }

You can copy the temporary TTS file to another location and then play it within the script.
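Putting the directory check and the copy step together, here is a minimal sketch. It assumes the cache path I see on my machine and mpg123 as the player. TTS_CACHE_DIR and is_tts_file are names made up for this example, and the processing step is left as a comment. Matching on the parent directory rather than the GoogleTTS subfolder means it should keep working if you switch TTS engines, provided your install caches under the same parent.

```shell
#!/bin/bash
# Assumption: TTS output is cached under this directory (path observed above).
TTS_CACHE_DIR="${TTS_CACHE_DIR:-/tmp/mycroft/cache/tts}"

# Return 0 if the file lives anywhere under the TTS cache directory.
is_tts_file() {
    case "$1" in
        "$TTS_CACHE_DIR"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only act when a file path is passed in (the %1 from mycroft.conf)
if [ "$#" -gt 0 ]; then
    if is_tts_file "$1"; then
        # Spoken output: work on a copy so the cached original stays intact
        copy="$(mktemp /tmp/tts_copy.XXXXXX)" || exit 1
        cp -- "$1" "$copy"
        # ... process "$copy" here with your own command ...
        mpg123 -q "$copy"
        rm -f -- "$copy"
    else
        # Music, sound effects, etc.: play untouched
        mpg123 -q "$1"
    fi
fi
```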

Edit: fixed typo in pseudo code :smiley: