- This repo uses the New York Taxi dataset from Google BigQuery to derive insights.
- All notebooks use Anaconda Python 2.
Summary Report of findings: Summary_Report.pdf
- git clone https://github.com/stephenleo87/nyc-taxi.git
- pip install -r requirements.txt
There are 4 Jupyter notebooks in this repo:
-
- All SQL queries are run on Google Cloud's Datalab platform, and the results are stored as CSV files for use by the subsequent notebooks
- This notebook is included for reference purposes only
-
- Predicts the fare amount from the available data, such as trip distance and pickup location
- TL;DR: The best prediction model achieved an RMSE of $3.5
-
- Predicts the tip percentage (tip_amount/fare_amount) from the available data, such as trip distance and pickup location
- TL;DR: The best prediction model achieved an accuracy of only 60%, indicating that the available dataset is not sufficient to predict tip percentage accurately
- 04_Destination Prediction.ipynb
- Predicts the destination (dropoff) location from the available data, such as pickup time and pickup location
- TL;DR: No good prediction model could be found, indicating that the available dataset is not sufficient to predict dropoff location
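
For reference, here is a minimal sketch of the two quantities quoted in the TL;DRs above: the RMSE used to score fare predictions and the tip percentage (tip_amount/fare_amount). It is not taken from the notebooks. The function names `rmse` and `tip_percentage` are hypothetical, the sample numbers are made up, and scaling the ratio by 100 and skipping non-positive fares are assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one value")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))


def tip_percentage(tip_amount, fare_amount):
    """tip_amount / fare_amount, expressed as a percentage (assumed scaling).

    Returns None for non-positive fares, where the ratio is undefined
    (an assumption about how such rows would be handled).
    """
    if fare_amount <= 0:
        return None
    return 100.0 * tip_amount / fare_amount


# Made-up fares in dollars, for illustration only
true_fares = [10.0, 20.0, 30.0]
predicted_fares = [12.0, 18.0, 33.0]

fare_error = rmse(true_fares, predicted_fares)  # in dollars, like the fare
tip_pct = tip_percentage(2.0, 10.0)             # a $2 tip on a $10 fare
```

RMSE is in the same unit as the target, so an RMSE of $3.5 means the fare predictions are typically off by a few dollars.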