I’m training on the LJSpeech dataset using a GPU, and training has reached about 310,000 steps. I’ve noticed that when I test on short phrases, the output sounds robotic, to the point where I can’t understand what it’s saying.
When I test on longer phrases, it sounds great, good enough that I might be able to use it in a production environment.
I haven’t adjusted any of the hparams other than changing the batch size from 32 to 64. I’ve posted the test phrases below. Any ideas as to why I’m getting these results on the shorter phrases?
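For reference, this is the only configuration change I made, sketched here assuming a Tacotron 2-style hparams object (the class and attribute names are illustrative, not the exact code from the repo):

```python
# Hypothetical sketch of the single hparams change described above;
# names are illustrative of a Tacotron 2-style configuration object.
class HParams:
    def __init__(self, batch_size=32):
        # 32 is a common default batch size in Tacotron 2-style configs
        self.batch_size = batch_size

hparams = HParams()
hparams.batch_size = 64  # the one change I made; everything else left at defaults
```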
Short phrase (robotic): “Hello, I am your virtual assistant”
Longer phrase (sounds great): “Hello, I am your virtual assistant. How can I help you today? You can ask me anything.”