kueue-controller-manager not starting up #3956

kalki7 · 2025-01-10T13:23:24Z

What happened: Kueue v0.10.0 had already been deployed and was working as expected. After I set enableClusterQueueResources: true and redeployed the same version of Kueue, the manager container in kueue-controller-manager got stuck in CrashLoopBackOff. The following alert events were the only logs we were able to retrieve:

{
activity_type_name: "ViolationOpenEventv1"
policy_id: "6885206598817066570"
resource_name: "dev manager"
started_at: "1736511103"
terse_message: "Restart count for dev manager is above the threshold of 1.000 with a value of 2.000."
verbose_message: "Restart count for dev manager is above the threshold of 1.000 with a value of 2.000."
violation_id: "0.nng1d6n8dhxc"
}

{
activity_type_name: "ViolationAutoResolveEventv1"
policy_id: "6885206598817066570"
resolved_at: "1736511220"
resource_name: "dev manager"
terse_message: "Restart count for dev manager returned to normal with a value of 4.000."
verbose_message: "Restart count for dev manager returned to normal with a value of 4.000."
violation_id: "0.nng1d6n8dhxc"
}

{
activity_type_name: "ViolationOpenEventv1"
policy_id: "6885206598817066570"
resource_name: "dev manager"
started_at: "1736511257"
terse_message: "Restart count for dev manager is above the threshold of 1.000 with a value of 5.000."
verbose_message: "Restart count for dev manager is above the threshold of 1.000 with a value of 5.000."
violation_id: "0.nng1d6n8dhxc"
}

These were followed by kube-rbac-proxy logging "received interrupt, shutting down".

What you expected to happen: kueue-controller-manager to start up.

How to reproduce it (as minimally and precisely as possible): Deploy Kueue v0.10.0 using the manifest with enableClusterQueueResources: true set, and with the kueue-controller-manager resource requests and limits increased to 8 GB and 16 GB for memory and 1 and 2 for CPU.
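
For reference, a minimal sketch of the configuration change described above. It assumes the manifest embeds a Kueue Configuration object in which enableClusterQueueResources is a field of the metrics block, as in the upstream configuration API; the bindAddress value is illustrative and not taken from this report. Indentation matters here: the key has to be nested under metrics rather than placed at the top level.

```yaml
# Sketch only: fragment of the controller manager configuration.
# Assumption: enableClusterQueueResources belongs under "metrics".
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
metrics:
  bindAddress: :8080                 # illustrative value
  enableClusterQueueResources: true  # the setting enabled in this report
```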

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): 1.30.6-gke.1125000
Kueue version (use git describe --tags --dirty --always): v0.10.0
Cloud provider or hardware configuration: GCP - Google Kubernetes Engine
OS (e.g: cat /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

kannon92 · 2025-01-10T13:28:23Z

Can you try and see if you get this problem on kind?

I don't know what "dev manager" is so I worry this may be a GKE problem.

kalki7 · 2025-01-10T13:53:39Z

"dev manager" is the name of the GKE resource. However, I just disabled enableClusterQueueResources and it went back to the previous working state. So I think the problem is related to enableClusterQueueResources: true: once it is enabled, the manager container in the kueue-controller-manager deployment doesn't come up.

kannon92 · 2025-01-10T13:59:02Z

cc @mimowo @mwielgus

Anyone you can loop into this from the GKE side?

kalki7 added the kind/bug label on Jan 10, 2025
