Exploratory Data Analysis of the Improve_Detroit_Issues dataset, with some initial modeling:
The code is in the repository's notebooks. A summary of the insights from these notebooks is below.
- Characterize what data is included
- Inform the questions that we can and should ask: what can this data tell us?
- Perform some basic modeling of the data
Only the basics are needed:
- NumPy
- pandas
- Matplotlib
- scikit-learn
The data itself can be found at this link. Data description:
Issues submitted through the Improve Detroit mobile app. Improve Detroit is a 311 ticketing system used by many City departments and agencies to manage resident requests.
Columns of the data include the coordinates (lat, long), the status of each issue, a short description of the issue, the neighborhood, and timestamps for when the issue was opened, closed, and updated. For the following visualizations, we'll be primarily interested in the geospatial data.
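As a rough sketch of how these columns might be prepared with pandas before plotting: the column names below (`lat`, `lng`, `status`, `created_at`, `closed_at`) are assumptions for illustration and may not match the actual export, and the rows are made up rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical rows standing in for the real export; column names are assumed.
raw = pd.DataFrame({
    "lat": [42.33, 42.35, None],
    "lng": [-83.05, -83.10, -83.07],
    "status": ["Closed", "Open", "Closed"],
    "created_at": ["2017-01-03 08:00:00", "2017-02-10 09:30:00", "2017-03-01 12:00:00"],
    "closed_at": ["2017-01-05 08:00:00", None, "2017-03-04 12:00:00"],
})

# Parse the timestamp columns so date arithmetic works later on.
for col in ["created_at", "closed_at"]:
    raw[col] = pd.to_datetime(raw[col])

# Keep only rows that can be placed on a map.
issues = raw.dropna(subset=["lat", "lng"]).reset_index(drop=True)
print(issues[["lat", "lng", "status"]])
```

Dropping rows without coordinates is one possible choice; depending on how many issues lack a location, it may be worth counting them first.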
Location of the issues
Do the issues depend on the types of requests?
Some of the most frequent requests are reported by public city departments, so let's look at just the citizens' reports.
This seems to suggest that the southwestern neighborhoods of Detroit are more urban and dense. Evidently, there is still a lot more to explore. One of the main insights here is that the most common issues are reported in the lower left corner of Detroit. Consequently, a majority of the issues still left Open are found in that area. More exploration is needed to determine whether that is due to the department or to the citizens reporting the issues. Something interesting to look at next would be the time it takes to close issues, and which neighborhoods have the shortest times.
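One way the closing-time comparison could be approached is sketched below. The column names (`neighborhood`, `created_at`, `closed_at`) and the rows are assumptions made for illustration, not values from the dataset, and the median is just one reasonable summary.

```python
import pandas as pd

# Made-up rows; `neighborhood`, `created_at`, and `closed_at` are assumed column names.
issues = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "B"],
    "created_at": pd.to_datetime(
        ["2017-01-01", "2017-01-10", "2017-02-01", "2017-02-05", "2017-03-01"]
    ),
    "closed_at": pd.to_datetime(
        ["2017-01-04", "2017-01-12", "2017-02-11", None, "2017-03-21"]
    ),
})

# Time to close in days; still-open issues have no closing date and become NaN.
elapsed = issues["closed_at"] - issues["created_at"]
issues["days_to_close"] = elapsed.dt.total_seconds() / 86400

# Summarize closed issues only, shortest median closing time first.
closed = issues.dropna(subset=["days_to_close"])
median_by_hood = closed.groupby("neighborhood")["days_to_close"].median().sort_values()
print(median_by_hood)
```

The median is used here instead of the mean because a few very old tickets could otherwise dominate a neighborhood's figure.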
Does the time it takes to close issues vary by month?
Clearly it pays to look at the data before modeling! There is a seemingly obvious quadratic fit. Let's try again.
Much better, and we're getting a pretty decent R-squared as well. Evidently, there's a concave-down relationship between the month and the time it takes to close issues. This seems counterintuitive, as I would have assumed that the holiday months (late Fall and early Winter) would be when issues take the longest to close, due to fewer staff. However, it seems that the summer months take the longest, with issues in June taking more than a month to close.
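For readers who want to see what such a fit involves, here is a minimal sketch using `numpy.polyfit`. It is not necessarily the method used in the notebooks, and the monthly values are synthetic (generated from a known concave-down curve plus noise), not the Improve Detroit numbers.

```python
import numpy as np

# Synthetic monthly averages: a known concave-down curve plus noise.
rng = np.random.default_rng(0)
months = np.arange(1, 13)
days_to_close = -0.8 * (months - 6.5) ** 2 + 30 + rng.normal(0, 1.5, size=12)

# Fit y = a*x^2 + b*x + c by least squares.
a, b, c = np.polyfit(months, days_to_close, deg=2)
predicted = np.polyval([a, b, c], months)

# R-squared: share of the variance explained by the fit.
ss_res = np.sum((days_to_close - predicted) ** 2)
ss_tot = np.sum((days_to_close - days_to_close.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# A negative leading coefficient means the curve is concave down;
# the vertex -b / (2a) is the month where the fitted time peaks.
peak_month = -b / (2 * a)
print(f"a={a:.3f}, R^2={r_squared:.3f}, peak near month {peak_month:.1f}")
```

Treating the month as a number from 1 to 12 ignores that December is adjacent to January, which may matter if the real pattern is seasonal.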
Yet, this does not tell the full story, as the amount of time it takes to close may very well depend on the types of issues that show up during the year. The next step is to figure out what kinds of issues appear during which months.
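A possible starting point for that next step is a month-by-type count table. In the sketch below, `created_at` and `request_type` are assumed column names, and the issue types and dates are invented for illustration.

```python
import pandas as pd

# Invented rows; `created_at` and `request_type` are assumed column names.
issues = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-06-02", "2017-06-15", "2017-06-30"]
    ),
    "request_type": ["Pothole", "Illegal Dumping", "Pothole", "Tree Issue", "Tree Issue"],
})

# Month the issue was opened (1 = January, 12 = December).
issues["month"] = issues["created_at"].dt.month

# Rows are months, columns are issue types, cells are counts.
counts = pd.crosstab(issues["month"], issues["request_type"])
print(counts)
```

Passing `normalize="index"` to `pd.crosstab` would give each month's mix of issue types as proportions instead of raw counts.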