Improve retire_worker fallback by finding deployments name instead of pod name #912

Merged

RossmacD
Contributor

In the `retire_worker` fallback, find the deployment rather than the worker pod for deletion. Otherwise the deletion can be attempted with an incorrect name, which leads to workers not scaling down in adaptive mode.

I believe this was part of the issue in #910.
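
To illustrate the pod name versus deployment name mismatch described above, here is a minimal sketch. It is not the code from this PR: the helper `deployment_name_from_pod` and the example names are hypothetical, and it assumes the usual Kubernetes convention that a pod created by a Deployment is named `<deployment>-<pod-template-hash>-<suffix>`. In a real cluster, reading the pod's owner references or labels is likely more robust than parsing names.

```python
def deployment_name_from_pod(pod_name: str) -> str:
    """Guess the owning Deployment's name from a pod name.

    Hypothetical helper: assumes the pod name has the form
    <deployment>-<pod-template-hash>-<suffix>, so the last two
    dash-separated segments are dropped.
    """
    parts = pod_name.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected pod name: {pod_name!r}")
    return parts[0]


# Hypothetical in-memory stand-in for the Deployments known to the cluster.
deployments = {"example-default-worker-abc123": {"replicas": 1}}

pod_name = "example-default-worker-abc123-7d9f8b6c5d-x2k4q"

# Looking up a Deployment by the pod name finds nothing (the "404" case).
found_by_pod_name = pod_name in deployments

# Looking it up by the derived Deployment name succeeds.
found_by_deployment_name = deployment_name_from_pod(pod_name) in deployments
```

With these example names, the lookup by pod name misses while the lookup by the derived deployment name hits, which mirrors the 404 reported later in this thread.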

@jacobtomlinson merged commit 46350db into dask:main on Oct 25, 2024
9 checks passed
@briceruzand

Thanks @RossmacD, this looks like the problem I reported in #855 (comment): a 404 on deployments, because the pod name was used instead of the deployment name.
I will try this fix in the next release.

@briceruzand

I tried the fix. Thanks a lot, my Dask cluster can now scale down. 🎆

But it never scales down to 0. Do you have any idea why?
I need to make some adjustments to avoid scale up/down flapping ;-)

Do you plan to release a new version with this fix?

@briceruzand

Resolves #855

@fcourtial

fcourtial commented Jan 16, 2025

@jacobtomlinson could we make a release with this fix?

The current version prevents the Dask autoscaler from scaling down: if the cluster creates 100 workers, they are never deleted, and they are restarted because the pods come from a Deployment.

We cannot really use autoscaling because of the cost of the pods that are kept alive.

Best regards.

@jacobtomlinson
Member

Sure @fcourtial, I just tagged 2025.1.0.

@fcourtial

Thanks @jacobtomlinson!
