
Part-1: Pinot Timeseries Engine SPI #13885

Merged
13 commits merged into apache:master on Sep 9, 2024

Conversation

ankitsultana (Contributor):

The design doc and the corresponding issue were raised ~2 weeks ago: #13760

You can find the complete working reference implementation here: ankitsultana#35. It also has instructions on how to run it locally. This PR has slight improvements and code cleanup over the full implementation.

@codecov-commenter commented Aug 24, 2024

Codecov Report

Attention: Patch coverage is 21.23894% with 178 lines in your changes missing coverage. Please review.

Project coverage is 57.89%. Comparing base (59551e4) to head (b284592).
Report is 1008 commits behind head on master.

Files with missing lines Patch % Lines
...in/java/org/apache/pinot/tsdb/spi/TimeBuckets.java 0.00% 30 Missing ⚠️
...a/org/apache/pinot/tsdb/spi/series/TimeSeries.java 0.00% 28 Missing ⚠️
...e/pinot/tsdb/spi/series/BaseTimeSeriesBuilder.java 0.00% 17 Missing ⚠️
...b/spi/series/TimeSeriesBuilderFactoryProvider.java 0.00% 17 Missing ⚠️
.../apache/pinot/tsdb/spi/RangeTimeSeriesRequest.java 0.00% 15 Missing ⚠️
...pinot/tsdb/spi/plan/serde/TimeSeriesPlanSerde.java 62.96% 6 Missing and 4 partials ⚠️
...tsdb/spi/series/builders/MaxTimeSeriesBuilder.java 0.00% 10 Missing ⚠️
...tsdb/spi/series/builders/MinTimeSeriesBuilder.java 0.00% 10 Missing ⚠️
.../spi/series/builders/SummingTimeSeriesBuilder.java 0.00% 9 Missing ⚠️
...ot/tsdb/spi/plan/ScanFilterAndProjectPlanNode.java 71.42% 8 Missing ⚠️
... and 6 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13885      +/-   ##
============================================
- Coverage     61.75%   57.89%   -3.86%     
- Complexity      207      219      +12     
============================================
  Files          2436     2612     +176     
  Lines        133233   143175    +9942     
  Branches      20636    21982    +1346     
============================================
+ Hits          82274    82894     +620     
- Misses        44911    53802    +8891     
- Partials       6048     6479     +431     
Flag Coverage Δ
custom-integration1 <0.01% <ø> (-0.01%) ⬇️
integration <0.01% <ø> (-0.01%) ⬇️
integration1 <0.01% <ø> (-0.01%) ⬇️
integration2 0.00% <ø> (ø)
java-11 57.86% <21.23%> (-3.85%) ⬇️
java-21 57.77% <21.23%> (-3.86%) ⬇️
skip-bytebuffers-false 57.88% <21.23%> (-3.87%) ⬇️
skip-bytebuffers-true 57.73% <21.23%> (+30.00%) ⬆️
temurin 57.89% <21.23%> (-3.86%) ⬇️
unittests 57.89% <21.23%> (-3.86%) ⬇️
unittests1 40.77% <ø> (-6.12%) ⬇️
unittests2 27.93% <21.23%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown.


@ankitsultana ankitsultana marked this pull request as ready for review August 24, 2024 02:01
@ankitsultana ankitsultana self-assigned this Aug 26, 2024
@ankitsultana ankitsultana added the timeseries-engine Tracking tag for generic time-series engine work label Aug 26, 2024
*/
package org.apache.pinot.tsdb.spi;

public class PinotTimeSeriesConfigs {
Contributor:

nit: why not use the full name PinotTimeSeriesConfiguration, following the PinotConfiguration convention?

*/
public class SeriesBlock {
private final TimeBuckets _timeBuckets;
private final Map<Long, List<Series>> _seriesMap;
Contributor:

Based on the comment, the key of the map is series ID (which is supposed to be a string type). Why use Long here?

Contributor (author):

Efficiency. String comparisons and copies are costly (interning might help but there's a lot of nuance to it).

When a series is built for the first time, we will convert the string id of the series to a Long hash and use that from there on. You can refer to this operator in the full working code: https://github.com/ankitsultana/pinot/pull/35/files#diff-8e88071ce9fc459e5fa8f4ade8f6ab9598d4edf7909ad054fd537804348944a6

Also, note that we will likely optimize this pretty soon and this representation may change significantly.
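The string-to-long conversion described above could look roughly like the following sketch. The hash function (FNV-1a here) and class name are illustrative only, not the actual operator's implementation:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: convert a series' string ID to a 64-bit hash once when
// the series is first built, then use the long as the map key (matching
// SeriesBlock's Map<Long, List<Series>>). FNV-1a is used purely for
// illustration; the reference implementation may use a different hash.
public class SeriesIdHasher {
  private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
  private static final long FNV_PRIME = 0x100000001b3L;

  public static long hash(String seriesId) {
    long h = FNV_OFFSET_BASIS;
    for (byte b : seriesId.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xffL);  // fold in one byte
      h *= FNV_PRIME;
    }
    return h;
  }
}
```

After this conversion, downstream operators compare and hash cheap 64-bit keys instead of full strings.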

Contributor (author):

^ Adding one more point: I think the API is not that clean right now. I had left a TODO in Series#hash about it.

I need more code to be merged to clean this up since I want to look at the call-sites to figure out a proper design for this.

@Jackie-Jiang (Contributor) left a comment:

Seems you are adding the ser/de layer in this PR. I don't think they belong to SPI

Resolved review threads on:
pinot-timeseries/pinot-timeseries-spi/pom.xml (4 threads, 3 outdated)
pinot-timeseries/pom.xml (1 thread, outdated)
@ankitsultana (Author):
Seems you are adding the ser/de layer in this PR. I don't think they belong to SPI

This is temporary and I can add a TODO (or create an issue). Right now I am relying on Jackson for the serde, and we unfortunately have some serde-related code in the plan nodes (JsonCreator, JsonProperty, etc.). Keeping the serde in the SPI for now allows users to test their plans easily.

I am thinking of building a better approach once the baseline implementation is ready.

Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>
/** Query is the raw query sent by the caller. */
private final String _query;
/** Start time of the time-window being queried. */
private final long _startSeconds;
Contributor:

Should we make these time-series requests have millisecond granularity? What if the step resolution is less than a second?

Contributor (author):

The granularity here controls the minimum granularity of the response and the query execution. I don't think we will ever support granularity of less than a second. I had kept this compliant with Prometheus and other time-series systems (our M3 system at Uber also uses seconds for specifying range).

private final String _aggFunction;

@JsonCreator
public AggInfo(@JsonProperty("aggFunction") String aggFunction) {
Contributor:

From the test I found that aggFunction is not nullable. Can we enforce that? Otherwise an exception is thrown deep in the stack.
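A minimal sketch of the suggested enforcement, failing fast at construction time. The class name and shape are illustrative, not the actual AggInfo:

```java
import java.util.Objects;

// Sketch of the suggested fix: validate the aggregation function name in the
// constructor so a bad request fails immediately with a clear message instead
// of throwing deep in the stack. Class name is hypothetical.
public class AggInfoSketch {
  private final String _aggFunction;

  public AggInfoSketch(String aggFunction) {
    _aggFunction = Objects.requireNonNull(aggFunction, "aggFunction must not be null");
  }

  public String getAggFunction() {
    return _aggFunction;
  }
}
```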

/** E2E timeout for the query. */
private final Duration _timeout;

public RangeTimeSeriesRequest(String engine, String query, long startSeconds, long endSeconds, long stepSeconds,
Contributor:

Should we add input validation, like endTime >= startTime?
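A sketch of what that validation could look like, using a simplified constructor (the real RangeTimeSeriesRequest takes more arguments; names follow the snippet above):

```java
// Illustrative sketch only: validate the time window and step up front so
// malformed requests are rejected at the edge rather than failing mid-query.
public class RangeRequestSketch {
  private final long _startSeconds;
  private final long _endSeconds;
  private final long _stepSeconds;

  public RangeRequestSketch(long startSeconds, long endSeconds, long stepSeconds) {
    if (endSeconds < startSeconds) {
      throw new IllegalArgumentException("endSeconds must be >= startSeconds");
    }
    if (stepSeconds <= 0) {
      throw new IllegalArgumentException("stepSeconds must be positive");
    }
    _startSeconds = startSeconds;
    _endSeconds = endSeconds;
    _stepSeconds = stepSeconds;
  }

  public long getStartSeconds() { return _startSeconds; }
  public long getEndSeconds() { return _endSeconds; }
  public long getStepSeconds() { return _stepSeconds; }
}
```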


/**
* This would typically be the leaf node of a plan-tree generated by a time-series engine's logical planner. At runtime,
* this gets compiled to a Combine Operator.
Contributor:

@ankitsultana PromQL has function operators like rate, increase, delta, etc. Can they be plugged into the leaf plan node as functions? I think we will need a different function operator. Thoughts?

Contributor (author):

Rate and other compound functions can be implemented by using "partial aggregates". I was planning to add an example after the baseline implementation is done. After this and the next PR, we will only support simple aggregates where the intermediate result has the same type as the final result. In a following PR I will add support for functions like rate, percentile, etc. Tracker: #13957
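To illustrate the partial-aggregate idea, here is a hedged sketch of how a rate-like function could keep an intermediate state (first/last sample per bucket) that differs from the final double value. All names are hypothetical, not SPI classes:

```java
// Illustrative sketch of a "partial aggregate" for a rate-style function:
// the intermediate state is (first, last) samples, which can be merged across
// servers; only the final step collapses it to a per-second rate.
public class RatePartialAggregate {
  private long _firstTime = Long.MAX_VALUE;
  private long _lastTime = Long.MIN_VALUE;
  private double _firstValue;
  private double _lastValue;

  // Partial step: fold one raw sample into the intermediate state.
  public void addSample(long timeSeconds, double value) {
    if (timeSeconds < _firstTime) { _firstTime = timeSeconds; _firstValue = value; }
    if (timeSeconds > _lastTime) { _lastTime = timeSeconds; _lastValue = value; }
  }

  // Merge step: combine two partial states (e.g. from two servers).
  public void merge(RatePartialAggregate other) {
    if (other._firstTime < _firstTime) { _firstTime = other._firstTime; _firstValue = other._firstValue; }
    if (other._lastTime > _lastTime) { _lastTime = other._lastTime; _lastValue = other._lastValue; }
  }

  // Final step: per-second rate over the observed window.
  public double finish() {
    long window = _lastTime - _firstTime;
    return window <= 0 ? 0.0 : (_lastValue - _firstValue) / window;
  }
}
```

Note this glosses over PromQL details like counter resets; the point is only that the intermediate type differs from the final result, which simple aggregates in this PR do not yet support.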

_tableName = tableName;
_timeColumn = timeColumn;
_timeUnit = timeUnit;
// TODO: This is broken technically. Adjust offset to meet TimeUnit resolution. For now use 0 offset.
Contributor:

What does the comment mean? Is it aligning the offsets to time resolution boundaries?

Contributor (author):

The issue is that timeUnit here represents time-unit of the stored time column. But the offset is always in seconds right now. We need to change the offset to match the time-unit. I am also tracking this here: #13957
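The fix being tracked amounts to converting the offset into the stored column's time unit before any arithmetic, e.g. (illustrative helper only):

```java
import java.util.concurrent.TimeUnit;

// Sketch of the conversion the TODO refers to: the offset is specified in
// seconds, but the stored time column may use a different TimeUnit, so
// convert before comparing against or shifting time-column values.
public class OffsetConverter {
  public static long offsetInTimeUnit(long offsetSeconds, TimeUnit timeColumnUnit) {
    return timeColumnUnit.convert(offsetSeconds, TimeUnit.SECONDS);
  }
}
```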


public String getEffectiveFilter(TimeBuckets timeBuckets) {
String filter = _filterExpression == null ? "" : _filterExpression;
// TODO: This is wrong. offset should be converted to seconds before arithmetic. For now use 0 offset.
Contributor:

Is the comment valid?

Contributor (author):

Yeah same as the above comment. I am tracking this issue here: #13957

* Each time-series operator would typically call either of {@link #addValue} or {@link #addValueAtIndex}. When
* the operator is done, it will call {@link #build()} to allow the builder to compute the final {@link TimeSeries}.
*/
public abstract class BaseTimeSeriesBuilder {
Contributor:

Do you see any issue if a function operator like rate or increase in PromQL is passed as the aggFunc? The name aggFunc seems a little confusing for rate. What are your thoughts?

Contributor (author):

Yeah we need to fix it. Will be picked up after E2E basic implementation.

* <b>Context:</b>We provide some ready to use implementations for some of the most common use-cases in the SPI. This
* reduces redundancy and also serves as a reference implementation for language developers.
*/
public class MinTimeSeriesBuilder extends BaseTimeSeriesBuilder {
Contributor:

Do you think we should add a series builder in the SPI which returns the raw data? The use case would be instant vectors or range vectors in PromQL.

Contributor (author):

Yup good point. Will be picking up instant vector after the E2E implementation. We will also need to add a new combine operator for it.
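A raw-data builder along those lines might look like the following sketch. Names are illustrative; the real builder would extend BaseTimeSeriesBuilder and call build() to produce a TimeSeries:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the raw-data builder discussed above: instead of
// aggregating into time buckets, it collects every (time, value) pair so the
// engine can later serve PromQL-style instant or range vectors.
public class RawValuesSeriesBuilder {
  private final List<Long> _timesSeconds = new ArrayList<>();
  private final List<Double> _values = new ArrayList<>();

  public void addValue(long timeSeconds, double value) {
    _timesSeconds.add(timeSeconds);
    _values.add(value);
  }

  public int numValues() {
    return _values.size();
  }

  public double valueAt(int index) {
    return _values.get(index);
  }
}
```

As noted above, a new combine operator would also be needed, since raw values cannot be merged the way bucketed aggregates are.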

@Jackie-Jiang Jackie-Jiang merged commit b828280 into apache:master Sep 9, 2024
22 of 23 checks passed
Comment on lines +32 to +41
/**
* We have implemented a custom serialization/deserialization mechanism for time series plans. This allows users to
* use Jackson to annotate their plan nodes as shown in {@link ScanFilterAndProjectPlanNode}, which is used for
* plan serde for broker/server communication.
* TODO: There are limitations to this and we will change this soon. Issues:
* 1. Pinot TS SPI is compiled in Pinot distribution and Jackson deps get shaded usually.
* 2. The plugins have to shade the dependency in the exact same way, which is obviously error-prone and not ideal.
*/
@InterfaceStability.Evolving
public class TimeSeriesPlanSerde {
Contributor:

This is the 3rd time we define how to serialize and deserialize plans, and we still have issues in the multi-stage query engine (i.e. I had to write a lot of code to implement physical explain in the multi-stage query engine, see #13733).

I honestly think we should consider moving to something more general that we can always use, like Substrait.

Comment on lines +42 to +48
public class TimeSeries {
private final String _id;
private final Long[] _timeValues;
private final TimeBuckets _timeBuckets;
private final Double[] _values;
private final List<String> _tagNames;
private final Object[] _tagValues;
Contributor:

I guess this is fine for now, but IIUC we are going to have tons of these objects at runtime, so thinking about memory layout is pretty important. We should plan to create different TimeSeries for different data types with specific memory layouts. My main concern is that we are using boxed arrays, which are close to twice as expensive in memory as a primitive array. Substituting _values with a double[] plus a BitSet that marks the nulls should be quite a bit cheaper in memory and faster for calculation. Same with _timeValues.

Note that I'm assuming value counts on the order of thousands at most, so BitSet should be better than RoaringBitmap.
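The layout suggested above could be sketched as follows (illustrative names, not a proposed SPI class):

```java
import java.util.BitSet;

// Sketch of the suggested memory layout: a primitive double[] for values plus
// a BitSet marking nulls, instead of a boxed Double[]. This roughly halves
// memory per series and keeps values contiguous for faster scans.
public class PrimitiveSeriesValues {
  private final double[] _values;
  private final BitSet _nulls;

  public PrimitiveSeriesValues(int numBuckets) {
    _values = new double[numBuckets];
    _nulls = new BitSet(numBuckets);
    _nulls.set(0, numBuckets);  // all buckets start as null
  }

  public void set(int index, double value) {
    _values[index] = value;
    _nulls.clear(index);
  }

  public boolean isNull(int index) {
    return _nulls.get(index);
  }

  public double get(int index) {
    return _values[index];
  }
}
```

The same pattern would apply to _timeValues with a long[].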

@ankitsultana (Author), Sep 24, 2024:

As the design doc states, optimizations are for Phase-3 and we are aiming for simplicity and completeness for now. But yes agreed on all points

6 participants