[timeseries] Add Support for limit and numGroupsLimit #14945
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## master #14945 +/- ##
============================================
+ Coverage 61.75% 63.67% +1.92%
- Complexity 207 1472 +1265
============================================
Files 2436 2710 +274
Lines 133233 151938 +18705
Branches 20636 23464 +2828
============================================
+ Hits 82274 96754 +14480
- Misses 44911 47909 +2998
- Partials 6048 7275 +1227

Flags with carried forward coverage won't be shown.
@@ -78,7 +80,8 @@ public BaseTimeSeriesPlanNode planQuery(RangeTimeSeriesRequest request) {
      switch (command) {
        case "fetch":
          List<String> tokens = commands.get(commandId).subList(1, commands.get(commandId).size());
          currentNode = handleFetchNode(planIdGenerator.generateId(), tokens, children, aggInfo, groupByColumns);
          currentNode = handleFetchNode(planIdGenerator.generateId(), tokens, children, aggInfo, groupByColumns,
Is this the change for promQL to pass request param in the leaf stage?
Yeah, you just need to set the limit and numGroupsLimit in the LeafTimeSeriesPlanNode of the returned plan.
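The reply above can be sketched as follows. Note this is only an illustration: the class below is a hypothetical stand-in, and its field names are assumptions, not Pinot's actual LeafTimeSeriesPlanNode API.

```java
// Hypothetical stand-in for a leaf time-series plan node.
// Field names are illustrative assumptions, not Pinot's real API.
class LeafPlanNodeSketch {
  int limit;
  int numGroupsLimit;
}

public final class PlanLimitSketch {
  // Propagate request-level limits onto the leaf node of a returned plan,
  // as the review comment above describes.
  static LeafPlanNodeSketch applyLimits(LeafPlanNodeSketch leaf, int limit, int numGroupsLimit) {
    leaf.limit = limit;
    leaf.numGroupsLimit = numGroupsLimit;
    return leaf;
  }

  public static void main(String[] args) {
    LeafPlanNodeSketch leaf = applyLimits(new LeafPlanNodeSketch(), 100_000, 200_000);
    System.out.println(leaf.limit + " " + leaf.numGroupsLimit);
  }
}
```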
@@ -152,7 +156,11 @@ public BaseTimeSeriesPlanNode handleFetchNode(String planId, List<String> tokens
      Preconditions.checkNotNull(timeColumn, "Time column not set. Set via time_col=");
      Preconditions.checkNotNull(timeUnit, "Time unit not set. Set via time_unit=");
      Preconditions.checkNotNull(valueExpr, "Value expression not set. Set via value=");
      Map<String, String> queryOptions = new HashMap<>();
      if (request.getNumGroupsLimit() > 0) {
        queryOptions.put("numGroupsLimit", Integer.toString(request.getNumGroupsLimit()));
nit: can put "numGroupsLimit" in a constant class
Good point. Can address in a follow-up. This module is mostly throwaway; we'll open source the full M3 Plugin after time series support is marked as GA.
Description
Integrates with V1 Engine's limit and numGroupsLimit by adding new optional API parameters.
Pinot's Grouping Algorithm is defined here: https://docs.pinot.apache.org/users/user-guide-query/query-syntax/grouping-algorithm
One thing that isn't called out there is that for Group-By queries, the Combine operator will truncate results at QueryContext#limit.

Usage Guideline
The defaults should be decent for most cases, except the following:

When the Leaf Stage Needs to Return a High Number of Series (>100k)
If you want the leaf stage to return a high number of series, increase the value of the limit parameter. Note that numGroupsLimit will still truncate groups at the segment level, so you may also want to increase that. However, make sure that the number of TimeBuckets, multiplied by the number of series, is within a reasonable bound, to prevent high heap usage.
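As a rough back-of-envelope for the bound mentioned above. The per-value byte cost here is an assumed illustration, not a measured Pinot number:

```java
public final class SeriesMemorySketch {
  // Assumption for illustration only: each series stores one primitive
  // double per time bucket, ignoring per-object overhead.
  static final long BYTES_PER_VALUE = 8;

  // Raw value bytes scale with (number of series) x (number of time buckets).
  static long estimatedValueBytes(long numSeries, long numTimeBuckets) {
    return numSeries * numTimeBuckets * BYTES_PER_VALUE;
  }

  public static void main(String[] args) {
    // 100k series over 1,000 time buckets -> 800,000,000 bytes (~800 MB)
    // of raw values alone, before any per-object overhead.
    System.out.println(estimatedValueBytes(100_000, 1_000));
  }
}
```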
When the Number of Time Buckets is Very High
In such cases, one may want to set lower values for limit and numGroupsLimit.

Additional Note to Language Developers
The combine operator may return more records than the limit defined in QueryContext#limit. Usually this excess is small (approximately equal to the number of threads), and comes from concurrent updates to the Indexed Table. Since the limit is passed to the RangeTimeSeriesRequest, you can add an operator to trim the number of series at any point after the Leaf stage.

Test Plan
Tested with a Quickstart, and also have some UTs.
Future Work
I am generally not a fan of the Prometheus API. Mid/long term, I'd prefer that we also support the standard SQL query API. I'll raise a separate PR or issue to discuss that broadly.