Discussion about NYC Taxi Notebook #1
Comments
@bradcray @buddha314 @reuster986 @timothyneumann1 @ben-albrecht @jt-halbert |
I must not be a data scientist, because my head goes to things like "compute mean, median (requires sorting, right?), mode travel times" which seem trivial compared to some of your suggestions. On the other end of the spectrum, my head goes to "Figure out who owns all the taxi medallions and how much they paid for them", though I suspect that's not a task for this dataset. :D I haven't really taken the time to look through what's in the data sets yet, though. Will try to do that tomorrow, I'm being called to dinner ATM. |
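As a rough illustration of the summary statistics mentioned above, here is a minimal sketch in plain Python (not Arkouda). The sample timestamps are made up, and it assumes travel time is the difference between the `tpep_pickup_datetime` and `tpep_dropoff_datetime` fields from the data dictionary. On the sorting question: sorting is the simplest way to get a median, though a selection algorithm can find it without a full sort.

```python
from datetime import datetime
from statistics import mean, median, multimode

# Hypothetical sample of (pickup, dropoff) timestamps. In the real data these
# would come from the tpep_pickup_datetime / tpep_dropoff_datetime columns.
trips = [
    ("2020-01-01 00:10:00", "2020-01-01 00:20:00"),
    ("2020-01-01 00:15:00", "2020-01-01 00:40:00"),
    ("2020-01-01 01:00:00", "2020-01-01 01:10:00"),
    ("2020-01-01 02:30:00", "2020-01-01 02:37:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def travel_minutes(pickup, dropoff):
    """Return the trip duration in minutes."""
    delta = datetime.strptime(dropoff, FMT) - datetime.strptime(pickup, FMT)
    return delta.total_seconds() / 60.0


durations = [travel_minutes(p, d) for p, d in trips]

# Round to whole minutes before taking the mode; on raw float durations
# almost every value would be unique and the mode would be meaningless.
rounded = [round(m) for m in durations]

summary = {
    "mean": mean(durations),
    "median": median(durations),  # statistics.median sorts internally
    "mode": multimode(rounded),   # list, since ties are possible
}
print(summary)
```

The one-minute rounding for the mode is an arbitrary choice made for this sketch; a coarser bin may suit the full dataset better.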
Here are a couple of papers and articles about analysis of the NYC Taxi data: "Anonymizing NYC Taxi Data: Does It Matter?" and "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance". |
Do these datasets contain the same fields as the data from the NYC taxi Kaggle competition? We might find some interesting ideas in those notebooks. |
I think it is the same data. Thanks for the links. |
20200925: I updated the notebook a bit and uploaded HTML and PDF versions of the notebook with output. |
@hokiegeek2 I forgot to include you on this. |
I like the idea of looking at the notebooks BenA pointed to. Left to my own devices, and looking a bit at the fields that are available, I wondered whether there were correlations that could be drawn about tip amount as a percentage of fare based on length of ride or where the ride originated or time of day. Something that would try to draw some conclusion based on different axes like that. But I don't feel like I'm enough of a data scientist to know whether that's trivial or difficult or interesting. (example hypotheses: tips are more generous as a percentage of total fare for shorter rides and ones originating in Manhattan). |
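One way to start probing the tip hypotheses above is to compute tip as a percentage of fare and average it per group. The sketch below uses plain Python with invented rows, not Arkouda and not real data. The `trip_distance`, `fare_amount`, and `tip_amount` names follow the data dictionary; the `borough` value is an assumption that the pickup location ID has already been joined against the Taxi Zone Lookup Table, and the 2-mile "short ride" cutoff is arbitrary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only. "borough" assumes PULocationID has
# already been mapped through the Taxi Zone Lookup Table.
rides = [
    {"borough": "Manhattan", "trip_distance": 1.0, "fare_amount": 8.0, "tip_amount": 2.0},
    {"borough": "Manhattan", "trip_distance": 6.0, "fare_amount": 20.0, "tip_amount": 3.0},
    {"borough": "Queens", "trip_distance": 1.5, "fare_amount": 10.0, "tip_amount": 1.5},
    {"borough": "Queens", "trip_distance": 9.0, "fare_amount": 30.0, "tip_amount": 3.0},
]


def tip_pct(ride):
    """Tip as a percentage of the fare."""
    return 100.0 * ride["tip_amount"] / ride["fare_amount"]


def distance_bucket(ride, cutoff_miles=2.0):
    """Label a ride 'short' or 'long'; the cutoff is an arbitrary assumption."""
    return "short" if ride["trip_distance"] < cutoff_miles else "long"


def mean_tip_pct_by(rides, key_fn):
    """Average tip percentage per group, skipping rides with a non-positive fare."""
    groups = defaultdict(list)
    for ride in rides:
        if ride["fare_amount"] > 0:
            groups[key_fn(ride)].append(tip_pct(ride))
    return {key: mean(values) for key, values in groups.items()}


by_distance = mean_tip_pct_by(rides, distance_bucket)
by_borough = mean_tip_pct_by(rides, lambda r: r["borough"])
print(by_distance)
print(by_borough)
```

The same `mean_tip_pct_by` helper could be keyed on pickup hour to look at time of day. Group means like these only hint at a relationship; judging whether a difference is meaningful would still need a proper statistical test on the full dataset.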
This issue encapsulates discussion about the NYC Taxi data set example using Arkouda.
Notebook here
Yellow Trips Data Dictionary
NYC Yellow Taxi Trip Records Jan 2020
NYC Taxi Zone Lookup Table