Breaking change: Backend cluster DNS lookup is V4_PREFERRED #5247

Open
guydc opened this issue Feb 10, 2025 · 2 comments
Labels
area/IPv6 IPv6 related issues
Comments

guydc (Contributor) commented Feb 10, 2025

Description:
When originally introduced, Backend clusters used the V4_ONLY DNS lookup family when FQDN endpoints were configured.

With the introduction of IPv6 support, the xDS cluster translation now defaults to V4_PREFERRED when the IP family is not set for a destination:

dnsLookupFamily := clusterv3.Cluster_V4_PREFERRED

Destination settings created for Service resources have their IPFamily determined from the Service:

ds.IPFamily = getServiceIPFamily(resources.GetService(backendNamespace, string(backendRef.Name)))

The same does not happen for Backend resources, so V4_PREFERRED is now the default for clusters created from a Backend.
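A minimal sketch of the defaulting described above, assuming simplified local types. `DNSLookupFamily`, `IPFamily`, `DestinationSetting` and `lookupFamilyFor` are hypothetical stand-ins, not Envoy Gateway's real types; only the "unset IP family falls through to V4_PREFERRED" behavior comes from the issue, and the mapping for non-nil families is an illustrative assumption.

```go
package main

import "fmt"

// DNSLookupFamily is a local stand-in named after Envoy's Cluster.DnsLookupFamily
// values; the real translator uses the clusterv3.Cluster_* constants instead.
type DNSLookupFamily string

const (
	V4Only      DNSLookupFamily = "V4_ONLY"
	V6Only      DNSLookupFamily = "V6_ONLY"
	V4Preferred DNSLookupFamily = "V4_PREFERRED"
	All         DNSLookupFamily = "ALL"
)

// IPFamily is a hypothetical stand-in for a destination's IP family.
type IPFamily string

const (
	IPv4      IPFamily = "IPv4"
	IPv6      IPFamily = "IPv6"
	DualStack IPFamily = "DualStack"
)

// DestinationSetting is a simplified, hypothetical destination. IPFamily stays
// nil when the translator does not determine it, as described for Backend.
type DestinationSetting struct {
	Name     string
	IPFamily *IPFamily
}

// lookupFamilyFor starts from the V4_PREFERRED default quoted in the issue and
// only overrides it when an IP family is known. The non-nil mapping below is
// an illustrative assumption, not Envoy Gateway's exact code.
func lookupFamilyFor(ds DestinationSetting) DNSLookupFamily {
	dnsLookupFamily := V4Preferred
	if ds.IPFamily == nil {
		return dnsLookupFamily
	}
	switch *ds.IPFamily {
	case IPv4:
		dnsLookupFamily = V4Only
	case IPv6:
		dnsLookupFamily = V6Only
	case DualStack:
		dnsLookupFamily = All
	}
	return dnsLookupFamily
}

func main() {
	v4 := IPv4
	// A Service destination gets its IPFamily from the Service.
	svc := DestinationSetting{Name: "service-backend", IPFamily: &v4}
	// A Backend destination with FQDN endpoints leaves IPFamily unset.
	be := DestinationSetting{Name: "fqdn-backend"}

	fmt.Println(svc.Name, lookupFamilyFor(svc)) // service-backend V4_ONLY
	fmt.Println(be.Name, lookupFamilyFor(be))   // fqdn-backend V4_PREFERRED
}
```

The point of the sketch is the nil branch: because nothing populates IPFamily for Backend destinations, they always take the default.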

Users who specify FQDN endpoints are likely referring to resources outside the cluster. To reach these endpoints, traffic may need to traverse the cluster network, NAT gateways, etc. So, while the endpoint itself may advertise an IPv6 address, there's no guarantee that Envoy Proxy can actually establish an IPv6 connection to it. If Envoy falls back to an IPv6 address when an IPv4 address is temporarily not found, connection attempts may fail.

To restore support for V4_ONLY, Envoy Gateway can:

  • Use the current EnvoyProxy.IPFamily as a hint for the backend DNS resolution strategy. If IPv4 is set, backend clusters use V4_ONLY; otherwise, they use V4_PREFERRED.
  • Introduce a new EnvoyProxy parameter for the default backend DNS resolution strategy
  • Support a per-backend resolution strategy, using a structure similar to the Service resource's IPFamilies field
  • A mix of the options above
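One way to mix the options is a precedence chain from most to least specific. A minimal sketch, assuming a hypothetical `BackendDNSInputs` struct whose fields stand for a per-backend setting (option 3), a new EnvoyProxy default (option 2) and the existing EnvoyProxy.IPFamily used as a hint (option 1); none of these field names are existing Envoy Gateway API, and the ordering itself is an assumption.

```go
package main

import "fmt"

// DNSLookupFamily is a local stand-in named after Envoy's Cluster.DnsLookupFamily values.
type DNSLookupFamily string

const (
	V4Only      DNSLookupFamily = "V4_ONLY"
	V4Preferred DNSLookupFamily = "V4_PREFERRED"
)

// IPFamily stands in for the EnvoyProxy.IPFamily setting mentioned in the issue.
type IPFamily string

const (
	IPv4      IPFamily = "IPv4"
	IPv6      IPFamily = "IPv6"
	DualStack IPFamily = "DualStack"
)

// BackendDNSInputs collects the inputs each option would contribute. All field
// names are hypothetical; none of them is an existing Envoy Gateway API.
type BackendDNSInputs struct {
	PerBackend    *DNSLookupFamily // option 3: per-backend strategy
	ProxyDefault  *DNSLookupFamily // option 2: new EnvoyProxy default
	ProxyIPFamily *IPFamily        // option 1: EnvoyProxy.IPFamily as a hint
}

// resolveLookupFamily applies the most specific setting first and falls back
// to the current V4_PREFERRED default when nothing is configured.
func resolveLookupFamily(in BackendDNSInputs) DNSLookupFamily {
	if in.PerBackend != nil {
		return *in.PerBackend
	}
	if in.ProxyDefault != nil {
		return *in.ProxyDefault
	}
	if in.ProxyIPFamily != nil && *in.ProxyIPFamily == IPv4 {
		return V4Only
	}
	return V4Preferred
}

// ptr returns a pointer to a copy of v.
func ptr[T any](v T) *T { return &v }

func main() {
	fmt.Println(resolveLookupFamily(BackendDNSInputs{}))                         // V4_PREFERRED
	fmt.Println(resolveLookupFamily(BackendDNSInputs{ProxyIPFamily: ptr(IPv4)})) // V4_ONLY
	fmt.Println(resolveLookupFamily(BackendDNSInputs{
		ProxyIPFamily: ptr(IPv4),
		PerBackend:    ptr(V4Preferred),
	})) // V4_PREFERRED
}
```

Letting the per-backend value win means a single dual-stack FQDN backend can opt into V4_PREFERRED even when the proxy as a whole is IPv4-only; the same chain could also sit in front of the family EG derives for Service backendRefs.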

guydc added the area/IPv6 (IPv6 related issues) label on Feb 10, 2025
arkodg (Contributor) commented Feb 10, 2025

If I'm understanding this correctly, the issue here is that DNS resolution for FQDN endpoints in Backend resources has changed from V4_ONLY to V4_PREFERRED, causing traffic to route to IPv6 endpoints when IPv4 endpoints are unavailable. This doesn't sound like a breaking change: the behavior for IPv6 was undefined/not supported, and for everyone else this is actually an enhancement.

+1 to the idea of adding an ipFamilies field to the Backend for more control over DNS resolution, or adding this to the dns field in BTP (BackendTrafficPolicy): https://gateway.envoyproxy.io/docs/api/extension_types/#dns

arkodg added this to the v1.4.0-rc.1 milestone on Feb 10, 2025
guydc (Contributor, Author) commented Feb 10, 2025

It is certainly a change of default. Also, in terms of data plane behavior, there's a difference between failing to discover the IPv4 address and falling back to IPv6.

For example, Envoy's documentation on eventually consistent service discovery (https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/service_discovery#on-eventually-consistent-service-discovery) states:

Host absent / health check OK:
Envoy will route to the target host. This is very important since the design assumes that the discovery service can fail at any time. If a host continues to pass health check even after becoming absent from the discovery data, Envoy will still route. Although it would be impossible to add new hosts in this scenario, existing hosts will continue to operate normally. When the discovery service is operating normally again the data will eventually re-converge.

In other words, this change can actually create a disruption for traffic that would not occur under V4_ONLY.

I'm +1 for making this configurable for all backend refs via BTP/Cluster settings. Users may have Kubernetes Services with certain IP families and still prefer Envoy to use a lookup strategy different from the one EG determines, so this may apply to those cases as well.

Having said that, and since the behavior is undefined, can we revert to the previous behavior for Backend until we provide an explicit API for users to define the behavior they need?
