Data wrangling is the process of cleaning, structuring and enriching raw data into a desired format so that better decisions can be made in less time. It is becoming increasingly common in today's top businesses. Data has become more complex and unstructured, so filtering, cleaning and organising it ahead of wider analysis takes more time. At the same time, with data informing just about every business decision, business users have less time to wait on technical resources for prepared data. This calls for a shift from IT-led data preparation to a more democratised model of self-service data preparation, or data wrangling. Self-service data wrangling tools help researchers handle more complex data more easily, deliver more precise outcomes and make smarter decisions. Because of this ability, more businesses have begun to use data wrangling techniques to prepare data before analysing it.
Details
- Dataset: Ford Bike data
- Explore the data, showcase findings, and assess the data to produce a cleaner dataset.
- Present the data in a cleaner way to reveal hidden patterns.
- Communicate findings through plots, histograms and other visualization techniques.
Project Findings
The data is fairly straightforward, but some attributes combine to reveal interesting patterns and to shed light on the business side of Ford bike. Some of them are listed below:
- Among casual riders and members of the system, casual riders tend to ride for longer than members. This suggests one of two things: either the daily charge is lower than the membership fee, or tourists outnumber the people living in the city.
- Due to the pandemic, the busiest hours and days are weekend evenings. Since most people work from home on weekdays, they tend to go out only on weekends, for short rides of short duration.
- On weekdays only the evening hours are busy, which suggests that people use Ford bike to get to parks or other nearby places.
- Although the data is limited to the months of July, a decline in daily riders can be seen in the later months. This suggests that the pandemic is keeping people at home and that fewer tourists are arriving in the city and using the Ford bike system.
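
The comparisons behind these findings (ride duration by rider type, and busy hours by day of the week) can be sketched as below. This is only an illustration using the Python standard library, not the notebook's actual code: the field names `user_type`, `duration_sec` and `start_time`, the helper names, and the sample records are all assumptions made up for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Toy records invented purely for illustration (not taken from the dataset).
# The field names user_type, duration_sec and start_time are assumptions.
SAMPLE_RIDES = [
    {"user_type": "Customer", "duration_sec": 1800, "start_time": "2020-07-04 18:05:00"},
    {"user_type": "Customer", "duration_sec": 2400, "start_time": "2020-07-05 17:30:00"},
    {"user_type": "Subscriber", "duration_sec": 600, "start_time": "2020-07-06 18:10:00"},
    {"user_type": "Subscriber", "duration_sec": 900, "start_time": "2020-07-04 18:45:00"},
]


def mean_duration_by_user_type(rides):
    """Return the average trip duration in minutes for each user type."""
    durations = defaultdict(list)
    for ride in rides:
        durations[ride["user_type"]].append(float(ride["duration_sec"]) / 60)
    return {user_type: mean(values) for user_type, values in durations.items()}


def rides_by_weekday_hour(rides):
    """Count rides for each (weekday name, start hour) pair."""
    counts = Counter()
    for ride in rides:
        start = datetime.fromisoformat(ride["start_time"])
        counts[(WEEKDAYS[start.weekday()], start.hour)] += 1
    return counts


duration_by_type = mean_duration_by_user_type(SAMPLE_RIDES)
busy_slots = rides_by_weekday_hour(SAMPLE_RIDES)
```

Calling `busy_slots.most_common(3)` would then list the busiest weekday and hour combinations in whatever data is passed in.
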
Files
- readme - Markdown file that describes the findings and the structure of the data.
- exploration.ipynb - Jupyter notebook containing all the findings and assessments made on the dataset. It is divided into the following parts:
  - Data wrangling
    - Gathering data
    - Assessing data
    - Cleaning data
  - Exploratory analysis
  - Communicating findings
  - Conclusion
- slide_deck.ipynb - Jupyter notebook that presents the overall findings and visualizations using Jupyter Notebook's slide feature.
- output_toggle.tpl - Template file used when exporting the slide deck with nbconvert. With it, only the parts that matter for conveying the findings are shown. It is an interactive feature for viewing the slides in an HTML file.
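
The gathering, assessing and cleaning stages listed for exploration.ipynb could look roughly like the sketch below. It is a minimal standard-library illustration under stated assumptions, not the notebook's code: the column names, the inline CSV and the function names `gather`, `assess` and `clean` are all hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Tiny inline CSV invented for illustration; the column names are assumptions.
RAW_CSV = """user_type,duration_sec,start_time
Customer,1800,2020-07-04 18:05:00
Customer,1800,2020-07-04 18:05:00
Subscriber,,2020-07-06 18:10:00
Subscriber,900,2020-07-04 18:45:00
"""


def gather(source):
    """Gather: read CSV rows from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def assess(rows):
    """Assess: count empty values per column and exact duplicate rows."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value == "":
                missing[column] += 1
    distinct = {tuple(row.items()) for row in rows}
    return {"missing": dict(missing), "duplicates": len(rows) - len(distinct)}


def clean(rows):
    """Clean: drop duplicate and incomplete rows, then convert types."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen or "" in row.values():
            continue
        seen.add(key)
        cleaned.append({
            "user_type": row["user_type"],
            "duration_sec": int(row["duration_sec"]),
            "start_time": datetime.fromisoformat(row["start_time"]),
        })
    return cleaned


rows = gather(io.StringIO(RAW_CSV))
report = assess(rows)  # issues found before cleaning
tidy = clean(rows)     # rows that are safe to analyse
```

Keeping the three stages as separate functions mirrors the notebook's structure: the assessment report documents what is wrong before anything is changed, and cleaning works on the gathered rows without modifying them.
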