Support per-zone PDB #194
Comments
Internal link (apologies): this is similar to how this was implemented in https://github.com/grafana/hosted-grafana/pull/5667
There was a bit of discussion of an idea like this in #163.
A comment from Charles from that PR:
If someone needs an immediate solution until this is implemented: I solved a similar issue by running the ZDB controller from aws/zone-aware-controllers-for-k8s. You can pick up my fork, which has Go and the base image refreshed. Setup is quite straightforward, but ping me if you have questions.
Background
The classic PodDisruptionBudget in Kubernetes doesn't allow us to express rules like "restart as many pods as you need, as long as they all belong to the same zone."
Problem
This means that we're stuck with a PDB with maxUnavailable=1. For large deployments, this makes Kubernetes node recycling extremely slow, because Mimir ingesters are restarted one at a time.
Proposal
Implement our own version of the PDB in the rollout-operator, which already acts as an admission controller. When a pod is about to be disrupted, the rollout-operator checks that no pods in other zones are currently unavailable. If the only unavailable pods are in the same zone as the pod being disrupted, the disruption is allowed.
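A minimal sketch of the check described above, in Go. This is not the rollout-operator's code: the `Pod` struct and the `canDisrupt` function are hypothetical names used only for illustration, and readiness is assumed to be known up front rather than read from the Kubernetes API.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified view of a pod: only what the check needs.
type Pod struct {
	Name  string
	Zone  string
	Ready bool
}

// canDisrupt reports whether disrupting target keeps all unavailability
// inside a single zone. It denies the disruption if any pod in a zone
// other than the target's zone is currently not ready.
func canDisrupt(target Pod, pods []Pod) bool {
	for _, p := range pods {
		if p.Zone != target.Zone && !p.Ready {
			return false
		}
	}
	return true
}

func main() {
	pods := []Pod{
		{Name: "ingester-zone-a-0", Zone: "a", Ready: false},
		{Name: "ingester-zone-a-1", Zone: "a", Ready: true},
		{Name: "ingester-zone-b-0", Zone: "b", Ready: true},
	}
	// Same zone as the pod that is already down: allowed.
	fmt.Println(canDisrupt(pods[1], pods)) // true
	// Zone a is already disrupted, so a zone b pod must wait.
	fmt.Println(canDisrupt(pods[2], pods)) // false
}
```

In a real admission webhook this decision would be made on each eviction request, with the pod list and readiness fetched from the cluster.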
What about partitions?
The classic PDB isn't ideal for partitions either. With partitions, we can restart any partition replica as long as the other replica of that partition, in a different zone, is up.
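The partition rule could be sketched in Go as below. The `Replica` struct and `canDisruptReplica` function are hypothetical, and the sketch assumes each partition has replicas spread across zones and that one ready peer in another zone is enough to allow the restart.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified view of one partition replica.
type Replica struct {
	Name      string
	Partition int
	Zone      string
	Ready     bool
}

// canDisruptReplica allows disrupting target only if another replica of the
// same partition, located in a different zone, is currently ready.
func canDisruptReplica(target Replica, replicas []Replica) bool {
	for _, r := range replicas {
		if r.Partition == target.Partition && r.Zone != target.Zone && r.Ready {
			return true
		}
	}
	return false
}

func main() {
	replicas := []Replica{
		{Name: "zone-a-0", Partition: 0, Zone: "a", Ready: true},
		{Name: "zone-b-0", Partition: 0, Zone: "b", Ready: true},
		{Name: "zone-a-1", Partition: 1, Zone: "a", Ready: true},
		{Name: "zone-b-1", Partition: 1, Zone: "b", Ready: false},
	}
	// Partition 0 has a healthy peer in zone b: allowed.
	fmt.Println(canDisruptReplica(replicas[0], replicas)) // true
	// Partition 1's peer in zone b is down: denied.
	fmt.Println(canDisruptReplica(replicas[2], replicas)) // false
}
```

Unlike the per-zone check, this one ignores the state of unrelated partitions, so replicas in different zones can be restarted at the same time as long as they belong to different partitions.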