I’m training on the LJSpeech dataset using a GPU, and training has reached about 310,000 steps. I’ve noticed that when I test on short phrases, the output sounds robotic, to the point where I can’t understand what it’s saying.
When I test on longer phrases, it sounds great, good enough that I might be able to use it in a production environment.
I haven’t adjusted any of the hparams other than changing the batch size from 32 to 64. I’ve posted the test phrases below. Any ideas as to why I’m getting these results on the shorter phrases?
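For reference, this is the only configuration change I made, sketched here assuming a Tacotron 2-style hparams object (the class and attribute names are illustrative, not the exact code from the repo):

```python
# Hypothetical sketch of the single hparams change described above;
# names are illustrative of a Tacotron 2-style configuration object.
class HParams:
    def __init__(self, batch_size=32):
        # 32 is a common default batch size in Tacotron 2-style configs
        self.batch_size = batch_size

hparams = HParams()
hparams.batch_size = 64  # the one change I made; everything else left at defaults
```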
Short phrase (robotic): “Hello, I am your virtual assistant”
Longer phrase (sounds great): “Hello, I am your virtual assistant. How can I help you today? You can ask me anything.”