In this recipe we'll learn how to configure a JSON index.
Pinot Version | 1.0.0 |
Schema | config/schema.json |
Real-Time Table Config | config/table.json |
This is the code for the following recipe: https://dev.startree.ai/docs/pinot/recipes/json-index
git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/json-index
Spin up a Pinot cluster using Docker Compose:
docker-compose up
Add tables and schema:
docker run \
--network jsonindex \
-v $PWD/config:/config \
apachepinot/pinot:1.0.0 AddTable \
-schemaFile /config/schema.json \
-tableConfigFile /config/table.json \
-controllerHost "pinot-controller-jsonindex" \
-exec
Import messages into Kafka:
pip install faker
python datagen.py --sleep 0.0001 2>/dev/null |
jq -cr --arg sep ø '[.uuid, tostring] | join($sep)' |
kcat -P -b localhost:9092 -t people -Kø
Query Pinot:
select *
from people
WHERE JSON_MATCH(person, '"$.address.state"=''Kentucky''')
limit 10
select count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" <> ''Kentucky''')
select json_extract_scalar(person, '$.address.state', 'STRING') AS state, count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" IN (''Kentucky'', ''Alabama'')')
GROUP BY state
ORDER BY count(*) DESC
select json_extract_scalar(person, '$.address.state', 'STRING') AS state, count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" NOT IN (''Kentucky'', ''Alabama'')')
GROUP BY state
ORDER BY count(*) DESC
select count(*)
from people
WHERE JSON_MATCH(person, '"$.address.state" IN (''Kentucky'')')
select count(*)
from people
WHERE JSON_MATCH(person, '"$.interests[0]" = ''Swimming''')