
🐛 fix(securitygroups): Look up unmanaged NAT Gateway IPs, so provider doesn't add 0.0.0.0/0 SG rule #5198

Open: wants to merge 1 commit into base: main

@sl1pm4t commented Nov 1, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

This fixes an issue that caused the provider to add a security group rule allowing all addresses (0.0.0.0/0) access to the workload cluster API Server when a cluster was created in an unmanaged VPC. The rule was added even if the user had provided a list of source addresses in spec.controlPlaneLoadBalancer.ingressRules.

The rule was being added so that kubelets could reach the API Server, with 0.0.0.0/0 used as a fallback source when the NAT Gateway IPs were not (yet) known. Prior to this PR, in the case of an unmanaged VPC, the NAT Gateway IPs were never retrieved, so getIngressRulesToAllowKubeletToAccessTheControlPlaneLB() in securitygroups.go always fell back to 0.0.0.0/0.
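The fallback described above can be sketched in Go. This is a minimal illustration under stated assumptions, not the provider's code. The `natGateway` type and the `natGatewayPublicIPs` and `kubeletIngressCIDRs` helpers are hypothetical. The NAT Gateway records are hard-coded rather than fetched from EC2, where the assumption is that a real lookup would filter by VPC ID. The PR itself changes securitygroups.go and pkg/cloud/services/network/natgateways.go.

```go
package main

import "fmt"

// natGateway is a hypothetical, simplified stand-in for an EC2 NAT Gateway
// record; it is NOT the AWS SDK type.
type natGateway struct {
	ID        string
	VpcID     string
	State     string   // e.g. "available", "deleted"
	PublicIPs []string // Elastic IPs attached to the gateway
}

// natGatewayPublicIPs collects the public IPs of available NAT Gateways in vpcID.
// Hypothetical helper: a real implementation would query EC2 for the VPC's
// NAT Gateways instead of taking a slice.
func natGatewayPublicIPs(gateways []natGateway, vpcID string) []string {
	var ips []string
	for _, gw := range gateways {
		if gw.VpcID != vpcID || gw.State != "available" {
			continue
		}
		ips = append(ips, gw.PublicIPs...)
	}
	return ips
}

// kubeletIngressCIDRs mirrors the fallback described in the PR: one /32 per
// known NAT Gateway IP, or 0.0.0.0/0 when no IPs are known.
func kubeletIngressCIDRs(natIPs []string) []string {
	if len(natIPs) == 0 {
		return []string{"0.0.0.0/0"}
	}
	cidrs := make([]string, 0, len(natIPs))
	for _, ip := range natIPs {
		cidrs = append(cidrs, ip+"/32")
	}
	return cidrs
}

func main() {
	gateways := []natGateway{
		{ID: "nat-a", VpcID: "vpc-123", State: "available", PublicIPs: []string{"203.0.113.10"}},
		{ID: "nat-b", VpcID: "vpc-123", State: "deleted", PublicIPs: []string{"203.0.113.11"}},
		{ID: "nat-c", VpcID: "vpc-999", State: "available", PublicIPs: []string{"198.51.100.7"}},
	}

	// Before the fix: for an unmanaged VPC the IPs were never looked up,
	// so the fallback kicks in.
	fmt.Println(kubeletIngressCIDRs(nil)) // [0.0.0.0/0]

	// After the fix: the IPs are looked up even when the VPC is unmanaged.
	fmt.Println(kubeletIngressCIDRs(natGatewayPublicIPs(gateways, "vpc-123"))) // [203.0.113.10/32]
}
```

The key point is that the fallback branch is only reached when the IP list is empty, so populating the list for unmanaged VPCs is enough to avoid the 0.0.0.0/0 rule.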

Which issue(s) this PR fixes :
Fixes #5196

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emojis
  • adds unit tests
  • adds or updates e2e tests

Release note:

Action Required: If deploying clusters to an existing VPC (not managed by the AWS provider), the provider will no longer automatically create a security group rule allowing traffic from all addresses (`0.0.0.0/0`). You may need to update `AWSCluster.spec.controlPlaneLoadBalancer.ingressRules` with the source address of your Management Cluster.
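Before upgrading, it may be worth checking that the CIDR blocks in `AWSCluster.spec.controlPlaneLoadBalancer.ingressRules` cover the public address your Management Cluster uses to reach the workload API Server. The following is a minimal Go sketch that uses only the standard `net` package. The `coveredBy` helper and all addresses are illustrative assumptions. Finding your Management Cluster's actual egress IP (for example, the Elastic IP of its own NAT Gateway) is left to you.

```go
package main

import (
	"fmt"
	"net"
)

// coveredBy reports whether ip falls inside any of the given CIDR blocks.
// Hypothetical helper for checking ingress rule CIDRs; it is not part of
// the provider.
func coveredBy(ip string, cidrBlocks []string) (bool, error) {
	addr := net.ParseIP(ip)
	if addr == nil {
		return false, fmt.Errorf("invalid IP address %q", ip)
	}
	for _, block := range cidrBlocks {
		_, network, err := net.ParseCIDR(block)
		if err != nil {
			return false, fmt.Errorf("invalid CIDR block %q: %w", block, err)
		}
		if network.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Example values only: the CIDRs listed in
	// spec.controlPlaneLoadBalancer.ingressRules and the public egress IP
	// of the management cluster.
	ingressCIDRs := []string{"192.0.2.0/24", "198.51.100.25/32"}
	mgmtEgressIP := "203.0.113.40"

	ok, err := coveredBy(mgmtEgressIP, ingressCIDRs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if !ok {
		fmt.Printf("%s is not covered; consider adding %s/32 to the ingress rules\n", mgmtEgressIP, mgmtEgressIP)
	}
}
```

A /32 for a single known egress address keeps the rule as narrow as possible, which is the point of avoiding the 0.0.0.0/0 fallback in the first place.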

@k8s-ci-robot added the release-note-action-required, kind/bug, and cncf-cla: yes labels Nov 1, 2024
@k8s-ci-robot requested a review from cnmcavoy November 1, 2024 20:27
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot requested a review from nrb November 1, 2024 20:27
@k8s-ci-robot added the needs-priority, needs-ok-to-test, and size/M labels Nov 1, 2024
@k8s-ci-robot (Contributor)

Hi @sl1pm4t. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sl1pm4t changed the title from "🐛 fix(securitygroups): Don't add 0.0.0.0/0 SG rule when using unmanaged VPC." to "🐛 fix(securitygroups): Lookup unmanaged NAT Gateway IPs, so provider doesn't add 0.0.0.0/0 SG rule." Nov 1, 2024
@richardcase (Member)

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Nov 1, 2024
@sl1pm4t (Author) commented Nov 3, 2024

/retest

@sl1pm4t force-pushed the mm/unmanaged-lb-sg-rule branch from 58c52aa to f77078a November 4, 2024 05:04
@richardcase (Member)

/test ?

@k8s-ci-robot (Contributor)

@richardcase: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-aws-build
  • /test pull-cluster-api-provider-aws-build-docker
  • /test pull-cluster-api-provider-aws-test
  • /test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-aws-apidiff-main
  • /test pull-cluster-api-provider-aws-e2e
  • /test pull-cluster-api-provider-aws-e2e-blocking
  • /test pull-cluster-api-provider-aws-e2e-clusterclass
  • /test pull-cluster-api-provider-aws-e2e-conformance
  • /test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-aws-e2e-eks
  • /test pull-cluster-api-provider-aws-e2e-eks-gc
  • /test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-aws-apidiff-main
  • pull-cluster-api-provider-aws-build
  • pull-cluster-api-provider-aws-build-docker
  • pull-cluster-api-provider-aws-test
  • pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@richardcase (Member)

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@sl1pm4t (Author) commented Nov 4, 2024

/retest

@sl1pm4t (Author) commented Nov 6, 2024

/test pull-cluster-api-provider-aws-e2e

@sl1pm4t (Author) commented Nov 8, 2024

/retest

@AndiDog changed the title from "🐛 fix(securitygroups): Lookup unmanaged NAT Gateway IPs, so provider doesn't add 0.0.0.0/0 SG rule." to "🐛 fix(securitygroups): Look up unmanaged NAT Gateway IPs, so provider doesn't add 0.0.0.0/0 SG rule" Nov 19, 2024
@AndiDog (Contributor) left a comment

Change looks mostly fine. Did you manually test this? I'm wondering why tests are failing.

Review comment on pkg/cloud/services/network/natgateways.go (outdated, resolved)
@sl1pm4t (Author) commented Nov 19, 2024

> Change looks mostly fine. Did you manually test this? I'm wondering why tests are failing.

Yes, I have tested the change, and we are using it regularly in our fork.

I have not attempted to run the e2e test suite locally, though, as I'm not familiar with it and not sure how to run it against our own infrastructure.

@sl1pm4t (Author) commented Nov 19, 2024

@AndiDog I will add that the tests seem to fail on something different each time, so I'm not sure the failures are related to my change.

@sl1pm4t force-pushed the mm/unmanaged-lb-sg-rule branch from 7e91d1b to ff07d10 November 20, 2024 04:20
@sl1pm4t (Author) commented Nov 20, 2024

/retest

@joejulian (Contributor)

This is a great change! Thank you. I needed this, too.

@AndiDog (Contributor) commented Dec 19, 2024

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

@sl1pm4t (Author) commented Jan 5, 2025

/retest

Update pkg/cloud/services/network/natgateways.go

Co-authored-by: Andreas Sommer <[email protected]>
@sl1pm4t force-pushed the mm/unmanaged-lb-sg-rule branch from ff07d10 to eeeb979 January 5, 2025 20:45
@sl1pm4t (Author) commented Jan 6, 2025

/test pull-cluster-api-provider-aws-e2e

@nrb (Contributor) commented Jan 6, 2025

/test pull-cluster-api-provider-aws-e2e

@sl1pm4t (Author) commented Jan 6, 2025

/retest

@nrb (Contributor) commented Jan 7, 2025

EKS failure appears to be an account issue unrelated to our code.

Standard e2e tests were hitting #5252, but also failed `[It] [unmanaged] [functional] Workload cluster with EFS driver should pass dynamic provisioning test`. I don't think EFS should be affected by NAT gateways, so that appears to be a flake.

Looking at the PR history, I also saw one occurrence of #5242.

Kicking off the e2es again to double check the flake theory, but I think this change looks alright.

/lgtm
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot added the lgtm label Jan 7, 2025
@nrb (Contributor) commented Jan 10, 2025

Dunno why this didn't work last time, but the last run was last week.

/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot (Contributor)

@sl1pm4t: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cluster-api-provider-aws-e2e-eks | eeeb979 | link | false | /test pull-cluster-api-provider-aws-e2e-eks |
| pull-cluster-api-provider-aws-e2e | eeeb979 | link | false | /test pull-cluster-api-provider-aws-e2e |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/bug (Categorizes issue or PR as related to a bug.)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • needs-priority
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • release-note-action-required (Denotes a PR that introduces potentially breaking changes that require user action.)
  • size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The provider adds a 0.0.0.0/0 SG rule to Control Plane LB in unmanaged mode.
6 participants