agent: Add scaling event reporting #1107

Merged
merged 20 commits into from
Jan 24, 2025

sharnoff
Member

@sharnoff sharnoff commented Oct 12, 2024

This is part 2 of 2; see #1078 for the groundwork and neondatabase/cloud#15939 for the full context.

In short, this PR:

  • Adds a new package: pkg/agent/scalingevents
  • Adds new callbacks to core.State to allow it to report on scaling events and changes in desired CU.

Notes for review:

I'd like to add minio-based S3 tests to this, but it seemed like it'd be non-trivial, particularly because scaling events require that scaling actually happens, unlike the existing billing tests.

So I figured I'd open this for review in the meantime.

Also note: This PR builds on #1078 and must not be merged before it.


github-actions bot commented Oct 12, 2024

No changes to the coverage.


@sharnoff sharnoff force-pushed the sharnoff/scaling-event-reporting-2 branch from 693b601 to a3cf0fa Compare October 12, 2024 21:39
@sharnoff sharnoff force-pushed the sharnoff/scaling-event-reporting-1 branch from b70150d to 54bfb21 Compare October 12, 2024 21:53
@sharnoff sharnoff force-pushed the sharnoff/scaling-event-reporting-1 branch from f608569 to 1c71a57 Compare October 12, 2024 22:06
@sharnoff sharnoff force-pushed the sharnoff/scaling-event-reporting-2 branch from a3cf0fa to 16c0917 Compare October 12, 2024 22:16
@sharnoff sharnoff force-pushed the sharnoff/scaling-event-reporting-1 branch from a46466d to df54b37 Compare October 17, 2024 17:13
@sharnoff sharnoff force-pushed the sharnoff/scaling-event-reporting-2 branch from 16c0917 to d2b4d45 Compare October 17, 2024 17:13
Base automatically changed from sharnoff/scaling-event-reporting-1 to main November 13, 2024 16:50
This is part 2 of 2; see #1078 for the groundwork.

In short, this commit:

* Adds a new package: 'pkg/agent/scalingevents'
* Adds new callbacks to core.State to allow it to report on scaling
  events and changes in desired CU.
@sharnoff sharnoff force-pushed the sharnoff/scaling-event-reporting-2 branch from d2b4d45 to 8c60b7f Compare November 18, 2024 04:01
@sharnoff sharnoff requested review from a team and Omrigan and removed request for a team November 19, 2024 19:42
@sharnoff
Member Author

sharnoff commented Nov 19, 2024

Remaining items for me, on this:

  1. Add more thorough e2e tests
  2. Test on staging

In the meantime, it should be ok to review.

Contributor

@Omrigan Omrigan left a comment

Looks good, some questions and suggestions.

pkg/agent/config.go (outdated, resolved)
pkg/agent/core/goalcu.go (outdated, resolved)
Comment on lines +18 to +19
// This exists because Neon allows fractional compute units, while the autoscaler-agent acts on
// integer multiples of a smaller compute unit.
Contributor

I never paid attention to this difference before, I'd like to discuss it more broadly.
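
For readers unfamiliar with the distinction, here is a minimal sketch of how fractional compute units can map onto integer multiples of a smaller unit. The ratio `unitsPerCU = 4` and both function names are assumptions made for illustration; they are not taken from pkg/agent/core/goalcu.go.

```go
package main

import (
	"fmt"
	"math"
)

// unitsPerCU is an assumption for illustration: four internal units make up
// one user-facing compute unit, so one internal unit is 0.25 CU.
const unitsPerCU = 4

// toFractionalCU converts a count of internal units into user-facing CU.
func toFractionalCU(units uint32) float64 {
	return float64(units) / unitsPerCU
}

// toInternalUnits converts a user-facing CU value into internal units,
// rounding up so the result never provides less than was requested.
func toInternalUnits(cu float64) uint32 {
	return uint32(math.Ceil(cu * unitsPerCU))
}

func main() {
	fmt.Println(toFractionalCU(3))    // three internal units as fractional CU
	fmt.Println(toInternalUnits(0.6)) // 0.6 CU rounded up to whole internal units
}
```

Rounding up is only one possible policy; a real implementation may round differently depending on whether the value is a goal or a bound.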

pkg/agent/scalingevents/reporter.go (outdated, resolved)
pkg/agent/runner.go (outdated, resolved)
pkg/agent/scalingevents/clients.go (resolved)
pkg/agent/runner.go (resolved)
pkg/agent/runner.go (resolved)
Comment on lines 43 to 44
ScalingEvent ReportScalingEventCallback
DesiredScaling ReportDesiredScalingCallback
Contributor

Suggestion: maybe instead of having those as callbacks, define a new adapter interface, and pass it like this?

Passing an interface feels more idiomatic.

Member Author

@sharnoff sharnoff Jan 21, 2025

Hm, could you expand on this? (e.g., just these two? or the entire ObservabilityCallbacks struct?)

The reason I didn't already do that is because it seemed like a lot of boilerplate just to glue together different functionalities.

Contributor

I mean just these two.

I guess I am trying to figure out when it is better to have callbacks, and when it is better to introduce an adapter. Probably, what differentiates the things I previously put into ObservabilityCallbacks is that those are at the top of the call stack, while you tend to use anonymous functions for DI, essentially gluing one piece of complex code to another piece of complex code. Plus, I tend to think about callbacks as "something people are forced to use in C, because they don't have an adequate way of defining methods on an object and doing polymorphism".

But I am probably being too strict; the wiring is perfectly understandable either way. It is fine to use callbacks in Go, as long as we don't put massive amounts of business logic into it. And unlike C, we even have anonymous functions.

Member Author

@sharnoff sharnoff Jan 22, 2025

That all makes sense. I think in general you're right to call out using an interface instead, especially because I have a tendency to make big anonymous functions 😅

In this case, I would say it matches this:

It is fine to use callbacks in Go, as long as we don't put massive amounts of business logic into it.

Going to resolve this as-is for now, but we can always revisit later if you like
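
To make the two options in this thread concrete, here is a small sketch of both shapes side by side. Only the field names `ScalingEvent` and `DesiredScaling` and the struct name `ObservabilityCallbacks` come from the diff above; the payload types, the `ScalingReporter` interface, the `recorder` type, and `callbacksFrom` are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// EventInfo and DesiredInfo are hypothetical payloads for this sketch.
type EventInfo struct{ FromCU, ToCU float64 }
type DesiredInfo struct{ GoalCU float64 }

// Option 1: plain callbacks, wired up by whoever constructs the state.
type ObservabilityCallbacks struct {
	ScalingEvent   func(EventInfo)
	DesiredScaling func(DesiredInfo)
}

// Option 2: an adapter interface (hypothetical name) covering just these two.
type ScalingReporter interface {
	ReportScalingEvent(EventInfo)
	ReportDesiredScaling(DesiredInfo)
}

// recorder is a toy ScalingReporter that keeps a log of what it was told.
type recorder struct{ log []string }

func (r *recorder) ReportScalingEvent(e EventInfo) {
	r.log = append(r.log, fmt.Sprintf("scaled %.2f -> %.2f", e.FromCU, e.ToCU))
}

func (r *recorder) ReportDesiredScaling(d DesiredInfo) {
	r.log = append(r.log, fmt.Sprintf("desired %.2f", d.GoalCU))
}

// callbacksFrom shows that the two shapes convert cheaply: method values
// turn any ScalingReporter into the callback struct.
func callbacksFrom(r ScalingReporter) ObservabilityCallbacks {
	return ObservabilityCallbacks{
		ScalingEvent:   r.ReportScalingEvent,
		DesiredScaling: r.ReportDesiredScaling,
	}
}

func main() {
	r := &recorder{}
	cb := callbacksFrom(r)
	cb.ScalingEvent(EventInfo{FromCU: 1, ToCU: 2})
	cb.DesiredScaling(DesiredInfo{GoalCU: 1.5})
	fmt.Println(r.log)
}
```

Since method values bridge the two, starting with callbacks does not block a later move to an interface, which fits the decision to resolve the thread as-is.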

pkg/agent/core/state.go (outdated, resolved)
@sharnoff sharnoff assigned Omrigan and unassigned sharnoff Jan 6, 2025
Contributor

@Omrigan Omrigan left a comment

A few more questions + some previous are still open. Should be good to go soon.

pkg/api/vminfo.go (outdated, resolved)
pkg/agent/config.go (resolved)
pkg/agent/runner.go (resolved)
pkg/agent/scalingevents/clients.go (resolved)
pkg/agent/scalingevents/prommetrics.go (outdated, resolved)
@sharnoff sharnoff requested a review from Omrigan January 20, 2025 21:35
@sharnoff sharnoff assigned Omrigan and unassigned sharnoff Jan 20, 2025
Contributor

@Omrigan Omrigan left a comment

LGTM, pending a few remaining things.

pkg/agent/config.go (resolved)
pkg/agent/runner.go (resolved)
pkg/agent/runner.go (resolved)
pkg/agent/scalingevents/clients.go (resolved)
@Omrigan Omrigan assigned sharnoff and unassigned Omrigan Jan 21, 2025
@sharnoff sharnoff requested a review from Omrigan January 21, 2025 18:39
@sharnoff sharnoff assigned Omrigan and unassigned sharnoff Jan 21, 2025
@sharnoff
Member Author

One comment thread left, cc @Omrigan

@sharnoff sharnoff assigned sharnoff and unassigned Omrigan Jan 22, 2025
@sharnoff
Member Author

sharnoff commented Jan 22, 2025

Seems ready to merge (thanks for reviews!); I'm planning on testing on staging first.

Two comments to unresolve after merging, for posterity: #1107 (comment) and #1107 (comment).

@sharnoff
Member Author

Tested on staging; looks good!

@sharnoff sharnoff merged commit 4e81d97 into main Jan 24, 2025
22 checks passed
@sharnoff sharnoff deleted the sharnoff/scaling-event-reporting-2 branch January 24, 2025 21:59
sharnoff added a commit that referenced this pull request Jan 27, 2025
Follow-up to #1107, ref #1220.

Basically, this change adds support to 'pkg/reporting' (which the
autoscaler-agent uses for billing and scaling events) for cleanly
sending all remaining events on shutdown.

This has three pieces:

1. The event sender threads now send and exit when their context expires;
2. reporting.NewEventSink() now takes a parent context (or .Finish() can
   be called instead to explicitly trigger a clean exit); and
3. reporting.NewEventSink() now spawns the senders with a taskgroup.Group
   passed in by the caller, so that shutdown of the entire
   autoscaler-agent will wait for sending to complete.
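
The three pieces above can be sketched roughly as follows. This is not the 'pkg/reporting' API: `runSender` and `flushOnShutdown` are hypothetical names, and a standard-library sync.WaitGroup stands in for the taskgroup.Group mentioned in the commit message.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runSender collects events from queue. When ctx expires it drains whatever
// is still queued, sends the final batch, and exits (piece 1 above).
func runSender(ctx context.Context, wg *sync.WaitGroup, queue <-chan string, send func([]string)) {
	defer wg.Done()
	var batch []string
	for {
		select {
		case ev := <-queue:
			batch = append(batch, ev)
		case <-ctx.Done():
			for {
				select {
				case ev := <-queue:
					batch = append(batch, ev)
				default:
					send(batch)
					return
				}
			}
		}
	}
}

// flushOnShutdown queues the events, cancels the parent context (piece 2),
// and waits for the sender to finish before returning (piece 3).
func flushOnShutdown(events ...string) []string {
	ctx, cancel := context.WithCancel(context.Background())
	queue := make(chan string, len(events))
	var wg sync.WaitGroup
	var sent []string

	wg.Add(1)
	go runSender(ctx, &wg, queue, func(b []string) { sent = append(sent, b...) })

	for _, ev := range events {
		queue <- ev
	}
	cancel()  // shutdown begins
	wg.Wait() // shutdown waits for the final send
	return sent
}

func main() {
	fmt.Println(flushOnShutdown("a", "b", "c"))
}
```

The wait step is what keeps queued events from being dropped when the process exits right after cancellation.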
sharnoff added a commit that referenced this pull request Jan 28, 2025
Follow-up to #1107, ref #1220.

In short, this change:

1. Moves reporting.EventSink's event sender thread management into a new
   Run() method
2. Calls (EventSink).Run() from (billing.MetricsCollector).Run(),
   appropriately canceling when done generating events, and waiting for
   it to finish before returning.
3. Adds (scalingevents.Reporter).Run() that just calls the underlying
   (EventSink).Run(), and calls *that* in the entrypoint's taskgroup.

Or in other words, exactly what was suggested in this comment:
#1221 (comment)
sharnoff added a commit that referenced this pull request Jan 29, 2025
Must have forgotten to include handling for azure blob storage clients
as part of #1107; this change adds it.

While we're at it, unify the config checking for azure blob between
billing and scaling events (like we have for S3), and also start
checking prefixInContainer where we weren't before.