Report vstorage size delta for set() operations via telemetry #10997

Merged
17 commits merged into master from sliakh-10938-vstorage-write-metrics
Feb 21, 2025

Conversation

siarhei-agoric
Contributor

@siarhei-agoric siarhei-agoric commented Feb 12, 2025

refs: #10938

Description

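Add a size-delta metric to the vstorage keeper. The metric is reported for each Set() and Delete() storage operation, and is emitted as store_size_delta with the label storeKey="vstorage" (see the output under Testing Considerations).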

Security Considerations

none

Scaling Considerations

This change adds an explicit Get() call for each Set() / Delete() operation

Documentation Considerations

Cosmos telemetry will emit a new metric, store_size_delta, labeled with storeKey="vstorage".

Testing Considerations

Manual testing.

First terminal:

cd packages/cosmic-swingset
make
make scenario2-setup
make scenario2-run-chain-economy

Second terminal:

while true; do curl -s -G 'http://localhost:26660/metrics' | grep store_size_delta; sleep 10; done;

Output after a long startup delay:

# HELP store_size_delta store_size_delta
# TYPE store_size_delta counter
store_size_delta{storeKey="vstorage"} 61594
# HELP store_size_delta store_size_delta
# TYPE store_size_delta counter
store_size_delta{storeKey="vstorage"} 61769
# HELP store_size_delta store_size_delta
# TYPE store_size_delta counter
store_size_delta{storeKey="vstorage"} 61769
# HELP store_size_delta store_size_delta
# TYPE store_size_delta counter
store_size_delta{storeKey="vstorage"} 63002
# HELP store_size_delta store_size_delta
# TYPE store_size_delta counter
store_size_delta{storeKey="vstorage"} 71443
# HELP store_size_delta store_size_delta
# TYPE store_size_delta counter
store_size_delta{storeKey="vstorage"} 71443
# HELP store_size_delta store_size_delta
# TYPE store_size_delta counter
store_size_delta{storeKey="vstorage"} 71443

Upgrade Considerations

none

@@ -101,6 +147,7 @@ func (sh vstorageHandler) Receive(cctx context.Context, str string) (ret string,
if err != nil {
return
}
reportKVEntrySizeDelta(ctx, keeper, entry, msg.Method)
Contributor Author

do we need to add a similar call to reportKVEntrySizeDelta() in case "append": below?
if yes, what should it be reporting as it is not clear to me how exactly "append" works.

Member

We definitely need this for "append". The implementation is in vstorage/keeper/keeper.go, and the summary is that inbound data extends rather than replaces preëxisting data except at block boundaries (for which the old data is permanently committed into history and can be recovered by querying for the previous block). And as @mhofman suggests, we really want the delta calculation at that point anyway.

Member

@mhofman mhofman left a comment

I think we need to calculate the size delta closer to the DB access. Instrumenting at Receive forces extra DB operations, which are expensive. It also cannot properly account for updates to an existing entry, especially in "append" mode. The keeper already needs to read the DB, so it is a much better place to do the accounting.

@siarhei-agoric siarhei-agoric force-pushed the sliakh-10938-vstorage-write-metrics branch from 916a907 to 81840a4 Compare February 18, 2025 16:28
cloudflare-workers-and-pages bot commented Feb 18, 2025

Deploying agoric-sdk with Cloudflare Pages

Latest commit: a8e9c91
Status: ✅ Deploy successful!
Preview URL: https://d4edc90a.agoric-sdk.pages.dev
Branch Preview URL: https://sliakh-10938-vstorage-write.agoric-sdk.pages.dev

@siarhei-agoric siarhei-agoric marked this pull request as ready for review February 18, 2025 16:35
@siarhei-agoric siarhei-agoric requested a review from a team as a code owner February 18, 2025 16:35
@siarhei-agoric siarhei-agoric added the automerge:squash Automatically squash merge label Feb 18, 2025
@siarhei-agoric siarhei-agoric removed the automerge:squash Automatically squash merge label Feb 19, 2025
@siarhei-agoric siarhei-agoric added the automerge:squash Automatically squash merge label Feb 19, 2025
Member

@gibson042 gibson042 left a comment

The overall modifications seem good to me, but looking at the telemetry package source code, we should anticipate a future in which we account for all chain storage and label metrics by storeKey. I've added corresponding suggestions.

Testing coverage would be nice, but those gaps are neither new nor exacerbated by this work. But can you document the process and results of manual testing in this PR?

@siarhei-agoric siarhei-agoric removed the automerge:squash Automatically squash merge label Feb 19, 2025
@siarhei-agoric siarhei-agoric added the automerge:squash Automatically squash merge label Feb 20, 2025

@gibson042
Member

can you document the process and results of manual testing in this PR?

@siarhei-agoric where did we land on this?

@siarhei-agoric siarhei-agoric removed the automerge:squash Automatically squash merge label Feb 21, 2025
@siarhei-agoric
Contributor Author

can you document the process and results of manual testing in this PR?

@siarhei-agoric where did we land on this?

Sorry, missed that first time around. Updated the "Testing Considerations" section in OP.

@siarhei-agoric siarhei-agoric added the automerge:squash Automatically squash merge label Feb 21, 2025
Member

@mhofman mhofman left a comment

Doing this at the level of the keeper is so much cleaner, thanks for this!

It looks like cosmos-sdk always reports this telemetry in a defer; I'm wondering why.

@mhofman mhofman removed the automerge:squash Automatically squash merge label Feb 21, 2025
@mhofman
Member

mhofman commented Feb 21, 2025

Removed automerge label to let you address question. Feel free to reapply if no changes needed.

@siarhei-agoric siarhei-agoric added the automerge:squash Automatically squash merge label Feb 21, 2025
@gibson042
Member

It looks like cosmos-sdk always reports this telemetry in a defer; I'm wondering why.

My guess is to ensure that errors from attempting to update telemetry don't interfere with the business logic, although that seems of dubious value to me.

@mergify mergify bot merged commit 252b405 into master Feb 21, 2025
84 checks passed
@mergify mergify bot deleted the sliakh-10938-vstorage-write-metrics branch February 21, 2025 23:34