In config generation, switch to two-stage training when the mono data is too small #632

Open
Tracked by #633
gregtatum opened this issue May 24, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@gregtatum
Member

From: #620 (comment)

But I found another use case where we don't want ["one-stage" teacher training when using a pre-trained back-translations model]: if the amount of mono-trg data is too small (for example, for en-lt), we still want to use two-stage. We don't want to loop over 5M back-translated sentences.
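A minimal sketch of what this heuristic could look like in config generation; the `select_teacher_mode` helper and the sentence-count threshold are illustrative assumptions, not the pipeline's actual API:

```python
# Illustrative cutoff: the ~5M back-translated sentences available for
# en-lt were cited above as too small for one-stage training. The exact
# threshold is an assumption for the sketch.
MIN_MONO_TRG_SENTENCES = 10_000_000


def select_teacher_mode(mono_trg_sentences: int, pretrained_backtranslations: bool) -> str:
    """Pick the teacher training mode for the generated config.

    Hypothetical helper: one-stage only makes sense when a pre-trained
    back-translations model is used and there is enough mono-trg data;
    otherwise a small back-translated corpus would be looped over too
    many times, so keep the two-stage curriculum.
    """
    if pretrained_backtranslations and mono_trg_sentences >= MIN_MONO_TRG_SENTENCES:
        return "one-stage"
    return "two-stage"
```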

@eu9ene
Collaborator

eu9ene commented Jan 28, 2025

Now, with HPLT2, NLLB, and Monocleaner, we always have a lot of mono data. I'd say we should still use two-stage by default and switch to one-stage if it stops too early.
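As a rough sketch of that fallback, assuming a hypothetical `train_teacher` callable that reports how many updates it ran before stopping (not the pipeline's real interface), and an illustrative cutoff for "stops too early":

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrainResult:
    # Hypothetical: optimizer updates completed before early stopping fired.
    updates_completed: int


def train_with_fallback(train_teacher: Callable[[str], TrainResult],
                        min_updates: int = 10_000) -> None:
    """Default to two-stage; retrain one-stage if stage one stops too early.

    min_updates is an assumed threshold for "stopped too early".
    """
    result = train_teacher("two-stage")
    if result.updates_completed < min_updates:
        # Stage one converged suspiciously fast; fall back to one-stage.
        train_teacher("one-stage")
```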
