In this recipe we'll learn how to analyze the Chicago Crimes dataset with Apache Pinot and Streamlit.
| Pinot Version | 0.9.0 |
| Schema | config/schema.json |
| Table Config | config/table.json |
| Ingestion Job | config/job-spec.yml |
Clone this repository and navigate to this recipe:
git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/analyzing-chicago-crimes
Download the Chicago Crimes dataset:
curl "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD&bom=true&query=select+*" -o data/Crimes_-_2001_to_Present.csv
Set up the Python environment:
pipenv shell
pipenv install
Clean up the data so that it's sorted by the Beat column:
python data_cleanup.py
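The contents of data_cleanup.py are not shown in this recipe. As a rough idea of what sorting a CSV by one column can look like, here is a minimal sketch using only the standard library. The function name `sort_csv_by_column` and its parameters are hypothetical, and the real script may work differently (for example, it may use pandas or clean other columns too).

```python
import csv


def sort_csv_by_column(src_path, dst_path, column="Beat"):
    """Rewrite the CSV at src_path into dst_path with rows sorted by `column`.

    Hypothetical helper: not the recipe's actual data_cleanup.py.
    """
    # utf-8-sig drops the byte-order mark that the download URL asks for (bom=true).
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        # Values are compared as strings; missing values sort first.
        rows = sorted(reader, key=lambda row: row[column] or "")

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

This version holds every row in memory while sorting, so it suits a sketch better than a multi-gigabyte file; a chunked or external sort would be a safer choice for the full dataset.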
Spin up a Pinot cluster using Docker Compose:
docker-compose up
Open another terminal tab and add the crimes table:
docker exec -it manual-pinot-controller-chicago bin/pinot-admin.sh AddTable \
-tableConfigFile /config/table.json \
-schemaFile /config/schema.json \
-exec
Import the Chicago Crimes CSV file into Pinot:
docker exec -it manual-pinot-controller-chicago bin/pinot-admin.sh LaunchDataIngestionJob \
-jobSpecFile /config/job-spec.yml \
-values pinot-controller
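Once the ingestion job finishes, one way to check that rows arrived is to send a query to the Pinot broker's SQL endpoint. The sketch below makes several assumptions: that the broker is exposed on localhost at Pinot's default broker port 8099, and that the table is named crimes (the actual name is whatever config/table.json defines). The helper names `run_query` and `rows_as_dicts` are hypothetical.

```python
import json
import urllib.request

# Assumption: the Docker Compose file publishes the broker on its default port.
BROKER_URL = "http://localhost:8099/query/sql"


def rows_as_dicts(response):
    """Turn a parsed broker response into a list of {column: value} dicts."""
    if "resultTable" not in response:
        # Failed queries carry their errors in the "exceptions" field.
        raise RuntimeError(f"Query failed: {response.get('exceptions')}")
    table = response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_query(sql, broker_url=BROKER_URL):
    """POST a SQL statement to the broker and return the rows as dicts."""
    request = urllib.request.Request(
        broker_url,
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return rows_as_dicts(json.load(reply))
```

With the cluster running, `run_query("select count(*) from crimes")` should return a single row holding the number of ingested records, assuming the table name above is right.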
Run the Streamlit app:
streamlit run app.py
Navigate to http://localhost:8501/