Why does E2/F5 TTS require such a long training duration to converge? #683
yuekaizhang started this conversation in General
Hi, thank you for the excellent open-source TTS work.
I am curious why E2/F5-TTS requires such a long training duration to achieve good results.
Other popular TTS methods (e.g., Matcha-TTS) usually converge and produce decent audio within roughly 100k-200k training steps.
Could you provide some possible reasons?
Also, have you run any experiments or tuning on the learning rate, scheduler, etc.?