This project transforms raw song play data and loads it into a traditional relational database, in this case Postgres, for later analysis. It also fulfills the Data Modeling with Postgres
project of the Data Engineer Nanodegree Program.
- conda
- Docker
- dataset gathered from the Million Song Dataset
- Bootstrap Python and dependencies
$ ./bootstrap_env_via_conda.sh
- Spin up localized instance of Postgres DB
$ ./respawn_db.sh
- Initialize related tables
$ python ./create_tables.py
- Place the dataset under the `./data` directory
$ python etl.py
$ jupyter notebook
# then walk through `test.ipynb` notebook
- If something goes wrong in the local database, use `respawn_db.sh` to re-initialize a new one.
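
To make the transform-and-load step more concrete, here is a minimal sketch of how one raw song record could be turned into a row for Postgres. It is not taken from `etl.py`: the `songs` table, the field names in `SONG_FIELDS`, the `song_row` helper, and the sample record are all assumptions made for illustration, and the real schema may differ.

```python
import json

# Hypothetical column order for a "songs" table; the actual schema
# created by create_tables.py may use different names.
SONG_FIELDS = ("song_id", "title", "artist_id", "year", "duration")

# Parameterized INSERT using %s placeholders (the style psycopg2 accepts).
# ON CONFLICT DO NOTHING skips rows that would violate a unique constraint.
SONG_INSERT = (
    "INSERT INTO songs (song_id, title, artist_id, year, duration) "
    "VALUES (%s, %s, %s, %s, %s) ON CONFLICT DO NOTHING"
)


def song_row(raw_json):
    """Parse one JSON song record and return its values in column order.

    Missing fields become None, which a database driver would send as NULL.
    """
    record = json.loads(raw_json)
    return tuple(record.get(field) for field in SONG_FIELDS)


# Made-up sample record, only to show the shape of the data.
sample = (
    '{"song_id": "S1", "title": "Example", "artist_id": "A1", '
    '"year": 2000, "duration": 123.4}'
)
row = song_row(sample)
```

Assuming a psycopg2 connection to the local instance, the row could then be loaded with `cur.execute(SONG_INSERT, row)` followed by a commit. Passing values as parameters, rather than formatting them into the SQL string, leaves quoting to the driver.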