Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(spanner): snapshot EnableOpenTelemetryMetrics value on client creation #11496

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

codygibb
Copy link

Fixes #9740

The problem with the recordGFELatencyMetricsOT function is
that it dynamically checks the value of the mutable
openTelemetryMetricsEnabled global variable, instead of
exclusively relying on immutable state. This is how we end
up in weird states where otConfig isn't fully initialized,
but somehow otel metrics are enabled, causing panics.

This PR fixes this by loading the global
openTelemetryMetricsEnabled variable exactly once on each
client's creation (i.e. snapshot the value), and then from
that point on, the client relies on the snapshotted
value for its entire lifetime: either otel metrics are
enabled and otConfig is fully initialized, or otel metrics
are disabled and otConfig isn't used. Basically, it's a bug
to call IsOpenTelemetryMetricsEnabled() more than once per
client, since the value can change between calls.

Includes a regression test, which I verified panics without
this fix:

=== RUN   TestOTMetrics_InterleavedEnableValues
--- FAIL: TestOTMetrics_InterleavedEnableValues (0.08s)
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

goroutine 26 [running]:
testing.tRunner.func1.2({0x1013b23c0, 0x14000138210})
        /opt/homebrew/Cellar/go/1.23.0/libexec/src/testing/testing.go:1632 +0x1bc
testing.tRunner.func1()
        /opt/homebrew/Cellar/go/1.23.0/libexec/src/testing/testing.go:1635 +0x334
panic({0x1013b23c0?, 0x14000138210?})
        /opt/homebrew/Cellar/go/1.23.0/libexec/src/runtime/panic.go:785 +0x124
cloud.google.com/go/spanner.recordGFELatencyMetricsOT({0x1014ad7f0, 0x14000260600}, 0x0, {0x100f02b56, 0xf}, 0x140000d2400)
        /Users/codygibb/dev/forks/google-cloud-go/spanner/ot_metrics.go:252 +0x440
cloud.google.com/go/spanner.(*txReadOnly).ReadWithOptions.func2({0x1014ad7f0, 0x14000260600}, {0x0, 0x0, 0x0}, {0x14000700700, 0x1, 0x1})
        /Users/codygibb/dev/forks/google-cloud-go/spanner/transaction.go:350 +0x4a0
cloud.google.com/go/spanner.(*resumableStreamDecoder).next(0x140006383c0, 0x140002981b0)
        /Users/codygibb/dev/forks/google-cloud-go/spanner/read.go:573 +0x2f4
cloud.google.com/go/spanner.(*RowIterator).Next(0x1400026c790)
        /Users/codygibb/dev/forks/google-cloud-go/spanner/read.go:182 +0x1b0
cloud.google.com/go/spanner.(*txReadOnly).ReadRowWithOptions(0x1400164a1a8, {0x1014ad570, 0x102029b60}, {0x100ef7d6a, 0x5}, {0x140003af710, 0x1, 0x1}, {0x140003af720, 0x1, ...}, ...)
        /Users/codygibb/dev/forks/google-cloud-go/spanner/transaction.go:419 +0xc4
cloud.google.com/go/spanner.(*txReadOnly).ReadRow(...)
        /Users/codygibb/dev/forks/google-cloud-go/spanner/transaction.go:403
cloud.google.com/go/spanner/test.TestOTMetrics_InterleavedEnableValues(0x1400013d380)
        /Users/codygibb/dev/forks/google-cloud-go/spanner/test/opentelemetry/test/ot_metrics_test.go:414 +0x1e8
testing.tRunner(0x1400013d380, 0x101484f90)
        /opt/homebrew/Cellar/go/1.23.0/libexec/src/testing/testing.go:1690 +0xe4
created by testing.(*T).Run in goroutine 1
        /opt/homebrew/Cellar/go/1.23.0/libexec/src/testing/testing.go:1743 +0x314
FAIL    cloud.google.com/go/spanner/test        0.568s
FAIL

@codygibb codygibb requested review from a team as code owners January 24, 2025 00:10
Copy link

google-cla bot commented Jan 24, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spanner: out of range [0] with length 0 when OTMetrics enabled after client creation
1 participant