Time boundary for hybrid tables

In this recipe we’ll learn how to compute the time boundary used by Pinot brokers when processing queries for hybrid tables.

  • Schema: schema

  • Offline configuration: offline

  • Realtime configuration: realtime

  • Batch upload job spec: job spec


Makefile

The Makefile contains all of the commands needed to start up Pinot and Kafka. Run the make command below.

make recipe

This command will also:

  • Create a Kafka topic called events.

  • Create a hybrid Pinot events table.

  • Batch load data into the offline events table in Pinot.

  • Generate stream data from the Pinot schema, publish it to Kafka, and ultimately ingest it into the realtime events table in Pinot.

When you go to the table list in Pinot, you will see an events_REALTIME and an events_OFFLINE table. When you go to the query console in Pinot, you will only see one table: events.


The stream data generator will generate 1000 records into the events_REALTIME table. The batch loader will load 10 records into the events_OFFLINE table for a total of 1010 records.

If you count the number of records in the events table, you will only get 1000.


If you look at the query response stats you’ll see 1000 documents scanned from 1010 totalDocs. When querying hybrid tables, the Pinot Broker must decide which records to read from the offline table and which to read from the real-time table.
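As a rough sketch of that decision: Pinot's documentation describes the broker splitting a hybrid-table query in two, sending the offline table a filter of `timeColumn <= boundary` and the realtime table a filter of `timeColumn > boundary`. The function below only prints those two queries so the split is easy to see. The function name `split_count_query`, the time column `ts`, and the boundary value are hypothetical and are not taken from this recipe's schema.

```shell
#!/usr/bin/env bash
# Print the two per-table queries that a hybrid-table count is split into.
# Assumption: "ts" and the boundary value are placeholders for illustration.
split_count_query() {
  local table="$1" time_col="$2" boundary="$3"
  # Offline side: everything up to and including the boundary.
  printf 'SELECT count(*) FROM %s_OFFLINE WHERE %s <= %s\n' "$table" "$time_col" "$boundary"
  # Realtime side: everything strictly after the boundary.
  printf 'SELECT count(*) FROM %s_REALTIME WHERE %s > %s\n' "$table" "$time_col" "$boundary"
}

split_count_query events ts 1700000000000
```

Because the two filters never overlap, a record that exists in both tables is counted only once.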

If you run the SQL below, you’ll see that no OFFLINE segments are returned, only realtime segments.

select $segmentName, count(*) from events
group by $segmentName

Check the current time boundary:

curl "http://localhost:8099/debug/timeBoundary/events" -H "accept: application/json" 2>/dev/null | jq '.'
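The raw boundary value can be hard to read at a glance. The sketch below converts it to a UTC timestamp, under two assumptions that should be checked against your own response: that the JSON contains a `timeValue` field, and that the value is in epoch milliseconds. The helper names `millis_to_utc` and `show_time_boundary` are made up for this example, and the `date -d` syntax is GNU-specific.

```shell
#!/usr/bin/env bash
# Convert an epoch-milliseconds value to a UTC timestamp (GNU date syntax).
millis_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%dT%H:%M:%SZ'
}

# Fetch the boundary from the broker and print it in readable form.
# Assumption: the response has a "timeValue" field in epoch milliseconds.
show_time_boundary() {
  local value
  value="$(curl -s "http://localhost:8099/debug/timeBoundary/events" \
    -H "accept: application/json" | jq -r '.timeValue')"
  millis_to_utc "$value"
}
```

With the recipe running, call `show_time_boundary` before and after the update step to compare the two values.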

Execute the API call below to force Pinot to update the events table’s time boundary.

curl -X POST \
  "http://localhost:9000/tables/events/timeBoundary" \
  -H "accept: application/json" | \
  jq

Run the query below again. This time, you should see both offline and realtime segments.

select $segmentName, count(*) from events
group by $segmentName
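To re-check the record count without opening the query console, the same count can be sent to the broker's `/query/sql` endpoint. This is a minimal sketch, assuming the broker is the service listening on port 8099 (as in the time boundary check) and that the result sits at `resultTable.rows[0][0]` in the response. The helper names `sql_payload` and `count_events` are hypothetical.

```shell
#!/usr/bin/env bash
# Build the JSON body for the broker's /query/sql endpoint.
# Only safe for queries that contain no double quotes or backslashes.
sql_payload() {
  printf '{"sql": "%s"}' "$1"
}

# Send a count query to the broker and print the single result value.
count_events() {
  curl -s -X POST "http://localhost:8099/query/sql" \
    -H "Content-Type: application/json" \
    -d "$(sql_payload 'select count(*) from events')" | jq '.resultTable.rows[0][0]'
}
```

Running `count_events` after the time boundary update should return a different value from the 1000 seen earlier if offline records are now being read.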