-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replication zone support for store-gateway component #1
Conversation
ac44cfe
to
42a342e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking really good, left some feedback.
I think the bulk of work (other than getting this to work) will be making sure we can support live migration of an existing deployment, so it's important that the diff with zone-awareness disabled is either empty or not significant.
affinity: | ||
{{- toYaml .Values.store_gateway.affinity | nindent 8 }} | ||
{{- $rolloutZone.affinity | toYaml | nindent 8 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to support migration from single to multi-zone, we'll need to preserve the old behavior when zone aware replication is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When last Krajo and I spoke, we weren't supporting migration from single to multi-zone or vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, that was probably true when you spoke to Krajo. The current plan is to support a live migration though. We (Grafana Labs) don't really have an option since this is the helm chart we ask our enterprise customers to use and we will need to provide an upgrade path for them in production.
It's probably more than twice as much work to include migration, so it makes sense to me if you'd rather a maintainer take over at that point.
operations/helm/charts/mimir-distributed/templates/store-gateway/store-gateway-statefulset.yaml
Show resolved
Hide resolved
affinity: | ||
podAntiAffinity: | ||
requiredDuringSchedulingIgnoredDuringExecution: | ||
- labelSelector: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could be more opinionated on the pod anti-affinity in order to simplify the configuration. What if we just always use the name and rollout-group labels for this? Do you think that would be too inflexible?
We would still need to allow users to override the nodeSelector, since different environments will have different means of distinguishing availability zones at the node level.
@@ -1879,3 +1934,7 @@ metaMonitoring: | |||
# -- Annotations to add to all monitoring.grafana.com custom resources. | |||
# Does not affect the ServiceMonitors for kubernetes metrics; use serviceMonitor.annotations for that. | |||
annotations: {} | |||
|
|||
zone_aware_replication: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention to allow overriding the zone awareness in a single place or in each component (like store-gateway above?)
@@ -303,3 +342,38 @@ Cluster name that shows up in dashboard metrics | |||
{{- print "policy/v1beta1" -}} | |||
{{- end -}} | |||
{{- end -}} | |||
|
|||
{{/* Creates dict for zone aware replication configuration */}} | |||
{{- define "mimir.zoneAwareReplicationMap" -}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is taking me a bit to understand. Is there some way we can modify the config format in values.yaml to reduce the need for this code? What I mean is, can we make the values.yaml use the same structure you expect this to return or is that not feasible for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two structures here of importance:
.Values.zone_aware_replication
.Values.<component>.zone_aware_replication.
This merges the two with component overriding non. There are many facets to what a person may want to configure for zone awareness. In our case a zone is an aws AZ. I believe for grafana its a virtual labelling and grouping of nodes. So for us we want to ensure the pods are scheduled (via antiaffinity) into the appropriate AZs (topology.kubernetes.io/zone). For grafana it would be something entirely different. There's also the affinity one may desire to prevent pods from binpacking into the same hostname. And then there's node selection. We dont use it but others may have dedicated nodes for their mimir stuff entirely, or perhaps they just want the ingesters on different nodes. We're close to that point in our prod env as our mimir ingesters are using nearly the entire RAM that can be provided by our default node groups RAM. So we nearly need to create a larger instance type just for the ingesters.
The .Values.zone_aware_replication is for those that simply want to set a common nodeSelection/affinity across all the components with the ability to overwrite that definition via the .Values.component.zone...
If we want to get rid of the .Values.zone_aware_replication entirely and force users to individually define each .Values.component.z_a_r , I'm fine with that, it would definitely simply the code, but at the expense of some users ability to configure more easily.
Unfortunately due to the nature of this feature and what is a zone, I dont think we're going to make everyone happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the aspect of the default values.yaml that the chart provides along with any values files that a user supplies. In order for values that a user supplies to overwrite the "default" configuration from the helm chart, the "map" has to be a list not a dict as helm merges dicts, but overwrites lists when dealing with combining values files and/or setting variables directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough explanation! I agree we can't make everyone happy. I think we should prioritize two things:
- Ensure the output for a given values file is relatively predictable. Nothing more annoying than running into template errors again and again when you already know exactly what you want to have in the final output.
- Ensure a reasonable amount of flexibility to support 90% of use-cases. We can't get to 100% without adding a field for everything in kubernetes which defeats the purpose of a helm chart.
With that said, I think I'd prefer to eliminate the top-level .Values.zone_aware_replication
. In most cases we will need to at least specify a different value for the app.kubernetes.io/component
label. Either store-gateway
, ingester
, alertmanager
, so I think it's probably relatively rare that a common zone configuration is possible. Same argument applies for node selectors, since these components have such different resource requirements it's unlikely users would want to move just those three onto a special node type I think.
Then in the (I assume rare) case that is needed, it's still possible, albeit with a bit of duplication.
WDYT?
64b511e
to
1e7ba5b
Compare
Signed-off-by: Ryan Dyer <[email protected]>
Signed-off-by: Ryan Dyer <[email protected]>
Signed-off-by: Ryan Dyer <[email protected]>
Signed-off-by: Ryan Dyer <[email protected]>
Signed-off-by: Ryan Dyer <[email protected]>
Signed-off-by: Ryan Dyer <[email protected]>
c828809
to
8414e49
Compare
What this PR does
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]