Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Datadog unix sockets for reporting traces #1615

Open
agates4 opened this issue Apr 19, 2022 · 4 comments
Open

Support Datadog unix sockets for reporting traces #1615

agates4 opened this issue Apr 19, 2022 · 4 comments

Comments

@agates4
Copy link

agates4 commented Apr 19, 2022

Supporting resources

Is your feature request related to a problem? Please describe.

Traces aren’t sent to Datadog when a unix socket is used, because Buildkite only support sending traces via HTTP.

So, even though my Buildkite Agent is configured properly to send traces via the volume mounted socket, they will not send, because the Buildkite implementation expects to use HTTP requests, defaulting to http://localhost:8126/v0.4/traces

It appears this is a bug!

Describe the solution you'd like

If one of these environment variables are set:

            - name: DD_TRACE_AGENT_URL
              value: "unix:///var/run/datadog/apm.socket"
            - name: DD_APM_RECEIVER_SOCKET
              value: "unix:///var/run/datadog/apm.socket"

Then I expect the Buildkite Agent to use that address to send traces rather than the default localhost:8126 address.

Describe alternatives you've considered

Hacking together the address by setting "HOST" to unix and "PORT" to ///var/run/datadog/apm.socket (Doesn't work because of a failure to resolve DNS)

Additional context

With these environment variables:

            - name: BUILDKITE_TRACING_BACKEND
              value: "datadog"
            - name: DD_ENV
              value: "buildkite"
            - name: DD_LOGS_INJECTION
              value: "true"
            - name: DD_APM_ENABLED
              value: "true"
            - name: DD_TRACE_ENABLED
              value: "true"
            - name: DD_TRACE_AGENT_URL
              value: "unix:///var/run/datadog/apm.socket"
            - name: DD_APM_RECEIVER_SOCKET
              value: "unix:///var/run/datadog/apm.socket"
            - name: DD_PROFILING_ENABLED
              value: "true"
            - name: DD_VERSION
              value: "1.2.2"

I receive this error at the end of the Buildkite job:

2022/04/19 17:45:28 Datadog Tracer v1.28.0 ERROR: lost 1 traces: Post \"http://localhost:8126/v0.4/traces\": dial tcp 127.0.0.1:8126: connect: connection refused (occurred: 19 Apr 22 17:45 UTC)\r
","size":80171,"header_times":[1650390251265893076,1650390251768560569,1650390251771733083,1650390253199885239,1650390253201794984,1650390253622077377,1650390307627264211,1650390318180815416,1650390328697174000]
@moskyb
Copy link
Contributor

moskyb commented Apr 28, 2022

this is a really good callout! we're going to be doing some more work around tracing and observability soon, so i'll keep this in mind for when we do.

@moskyb
Copy link
Contributor

moskyb commented May 17, 2022

kia ora @agates4! I've had a bit more of a chance to engage my brain on this this afternoon.

As far as i can tell, there's no way to tell dd-trace-go to use the UDS socket, other than either:

With this in mind, we could theoretically do something like

// in bootstrap/tracing.go

func (b *Bootstrap) startTracingDatadog(ctx context.Context) (tracetools.Span, context.Context, stopper) {
	opts := []tracer.StartOption{
		tracer.WithServiceName("buildkite_agent"),
		tracer.WithSampler(tracer.NewAllSampler()),
		tracer.WithAnalytics(true),
	}

	if sock := os.Getenv("DD_APM_RECEIVER_SOCKET"); sock != "" {
		opts = append(opts, tracer.WithUDS(sock))
	}

	// ... rest of the tracing setup
}

...butttt i'm not super keen to do something like this for a couple of reasons. The main one is that the socket location is a concern somewhat external to the agent itself; we don't really want to be explicitly handling an environment variable that's meant for the datadog agent.

my recommendation at this stage would be to open a PR against dd-trace-go that allows the UDS socket to be specified by environment variable - it's a bit weird that on that repo, some stuff can be configured with envars and some can't, and it's basically a crapshoot as to what's what.

the other option would be to create a socket at the magic location above, but (bit of a digression here) it seems like both the datadog agent and the tracing library expect the socket to exist prior to their initialisation, which seems like a weirdly hands-off approach. I would've expected the datadog agent to create the socket and have the libraries connect to it? It's all a bit odd.

I realise "go implement it yourself" isn't exactly am awesome outcome for this issue; i'm happy to discuss further and see if we can get to some sort of a happy middle ground :)

@agates4
Copy link
Author

agates4 commented May 17, 2022

Hey Moskyb, thanks for the detailed explanation!

Even if the hosts are not set, and the file exists in the magic location, the datadog client will still use localhost:8126 by default over http without UDS.

To fix this, I implemented this change and submitted a PR to the Go client
DataDog/dd-trace-go#1298

This should offload any changes from Buildkite and allow the user set environment variables to handle the rest!

@moskyb
Copy link
Contributor

moskyb commented May 17, 2022

Awesome! When that gets released, we'll update dd-trace-go to use that feature :)

I'll leave this issue open for tracking until then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants