Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query: Make group a/b minPenalty to be configurable #8061

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

magiceses
Copy link
Contributor

Change-Id: If991daccf49d03cc116a7e0413e023ea4cb45643

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

What happened

When the THANOS QURY connects multiple Prometheus instances, if the pull time of the two Prometheus instances is too close, it may cause the data to fall, such as:
image

timestamp 1733968809.887 > 1733968809.833, but value 3829920.042 < 3829920.063

At this time, when you query the metrics through the Thanos Query's dedup param interface, the data of the Counter data type will decline (this situation is not reset to the counter), just like the following:
image
This situation is incorrect for the counter type, which will cause the Rate func calculate abnormality, just like:
image

Solution

I noticed that the logic that affects this part is in the iter.go file.

Therefore, can we force the THANOS Query by modifying the penalty time of group A/B to solve this problem that we can skip two particularly similar points (for example, the collection time of these two points is only 1s or less).

After testing, after modifying this penalty time from 0 to 999, the fall of data will not appear. Just like following:
image

Verification

After modifying, THANOS Query can run normally, but I am not sure if there will be other scenarios that will affect it, or there are other considerations for this penalty value, must to be set to 0?

Can anyone help it or other suggestions?

Change-Id: If991daccf49d03cc116a7e0413e023ea4cb45643
Signed-off-by: Magiceses <[email protected]>
@MichaHoffmann
Copy link
Contributor

We should probably pipe it through commandline flags at least!

@magiceses
Copy link
Contributor Author

We should probably pipe it through commandline flags at least!

The current patches are just a simple way of implementation. I'm not sure if it will have other effects to modify this value.

If you think that this parameter to be configurable is reasonable, I can pipe it through commandline flags.

@MichaHoffmann
Copy link
Contributor

We should probably pipe it through commandline flags at least!

The current patches are just a simple way of implementation. I'm not sure if it will have other effects to modify this value.

If you think that this parameter to be configurable is reasonable, I can pipe it through commandline flags.

Yeah j think it would be cool to be able to configure this and initial penalty through command line options!

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good luck trying to understand what the "correct" value is and to explain to regular users who aren't intimately familiar with the inner workings 😄 please focus instead on how to fix the dedup algorithm. I think a proposal is needed here instead which details everything and then we can make the correct decision

@MichaHoffmann
Copy link
Contributor

MichaHoffmann commented Jan 17, 2025

Good luck trying to understand what the "correct" value is and to explain to regular users who aren't intimately familiar with the inner workings 😄 please focus instead on how to fix the dedup algorithm. I think a proposal is needed here instead which details everything and then we can make the correct decision

I have lately wondered why we don't dedup on chunk level. That should sort out restarts but not scrape misses bit could be super easy to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants