Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[timeseries] Add Support for limit and numGroupsLimit #14945

Merged
merged 4 commits into from
Feb 3, 2025

Conversation

ankitsultana
Copy link
Contributor

@ankitsultana ankitsultana commented Jan 30, 2025

Description

Integrates with V1 Engine's limit and numGroupsLimit by adding new optional API parameters.

Pinot's Grouping Algorithm is defined here: https://docs.pinot.apache.org/users/user-guide-query/query-syntax/grouping-algorithm

One thing that isn't called out there is that for Group-By queries, the Combine operator will truncate results at QueryContext#limit.

Usage Guideline

The defaults should be decent for most of the cases, except the following

When Leaf Stage Needs to Return High Number of Series (>100k)

If you want the leaf stage to return a high number of series, increase the value of the limit parameter. Note that the numGroupsLimit will still truncate groups at segment level, so you may also want to increase that.

However, make sure that the number of TimeBuckets, multiplied by the number of series, is within a reasonable bound, to prevent high heap usage.

When Time Buckets Length is Very High

In such cases, one may want to set a lower value of limit and numGroupsLimit.

Additional Note to Language Developers

The combine operator may return more records than the limit defined in QueryContext#limit. Usually this excess is small (approximately equal to number of threads), and comes from concurrent updates to the Indexed Table.

Since the limit is passed to the RangeTimeSeriesRequest, you can add an operator to trim the number of series at any point after the Leaf stage.

Test Plan

Tested with a Quickstart, and also have some UTs.

Future Work

I am generally not a fan of the Prometheus API. Mid/Long term I'd prefer that we also support the standard SQL query API. I'll raise a separate PR or issue to discuss that broadly.

@ankitsultana ankitsultana added the timeseries-engine Tracking tag for generic time-series engine work label Jan 30, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2025

Codecov Report

Attention: Patch coverage is 40.90909% with 13 lines in your changes missing coverage. Please review.

Project coverage is 63.67%. Comparing base (59551e4) to head (8cc51e5).
Report is 1649 commits behind head on master.

Files with missing lines Patch % Lines
...roker/requesthandler/TimeSeriesRequestHandler.java 0.00% 7 Missing ⚠️
.../apache/pinot/tsdb/spi/RangeTimeSeriesRequest.java 0.00% 5 Missing ⚠️
...che/pinot/segment/spi/AggregationFunctionType.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14945      +/-   ##
============================================
+ Coverage     61.75%   63.67%   +1.92%     
- Complexity      207     1472    +1265     
============================================
  Files          2436     2710     +274     
  Lines        133233   151938   +18705     
  Branches      20636    23464    +2828     
============================================
+ Hits          82274    96754   +14480     
- Misses        44911    47909    +2998     
- Partials       6048     7275    +1227     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.65% <40.90%> (+1.94%) ⬆️
java-21 63.57% <40.90%> (+1.95%) ⬆️
skip-bytebuffers-false 63.67% <40.90%> (+1.92%) ⬆️
skip-bytebuffers-true 63.55% <40.90%> (+35.82%) ⬆️
temurin 63.67% <40.90%> (+1.92%) ⬆️
unittests 63.67% <40.90%> (+1.92%) ⬆️
unittests1 56.21% <53.33%> (+9.31%) ⬆️
unittests2 34.01% <18.18%> (+6.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ankitsultana ankitsultana marked this pull request as ready for review January 30, 2025 18:08
@@ -78,7 +80,8 @@ public BaseTimeSeriesPlanNode planQuery(RangeTimeSeriesRequest request) {
switch (command) {
case "fetch":
List<String> tokens = commands.get(commandId).subList(1, commands.get(commandId).size());
currentNode = handleFetchNode(planIdGenerator.generateId(), tokens, children, aggInfo, groupByColumns);
currentNode = handleFetchNode(planIdGenerator.generateId(), tokens, children, aggInfo, groupByColumns,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the change for promQL to pass request param in the leaf stage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you just need to set the limit and numGroupsLimit in the LeafTimeSeriesPlanNode of the returned plan

@@ -152,7 +156,11 @@ public BaseTimeSeriesPlanNode handleFetchNode(String planId, List<String> tokens
Preconditions.checkNotNull(timeColumn, "Time column not set. Set via time_col=");
Preconditions.checkNotNull(timeUnit, "Time unit not set. Set via time_unit=");
Preconditions.checkNotNull(valueExpr, "Value expression not set. Set via value=");
Map<String, String> queryOptions = new HashMap<>();
if (request.getNumGroupsLimit() > 0) {
queryOptions.put("numGroupsLimit", Integer.toString(request.getNumGroupsLimit()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can put "numGroupsLimit" in a constant class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Can address in a follow-up. This module is mostly a throw away.. we'll open source the full M3 Plugin after time series support is marked as GA.

@tibrewalpratik17 tibrewalpratik17 merged commit c7fcda9 into apache:master Feb 3, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
timeseries-engine Tracking tag for generic time-series engine work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants