This is the code repository for Python Data Cleaning Cookbook, Second Edition, published by Packt.
Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI
The book shows you how to clean, wrangle, and view data from multiple perspectives, including dataset and column attributes. It covers common and not-so-common challenges that arise when cleaning messy data in complex situations, and it shows you how to get data into a form that supports sound decisions.
- Using OpenAI tools for various data cleaning tasks
- Producing summaries of the attributes of datasets, columns, and rows
- Anticipating data-cleaning issues when importing tabular data into pandas
- Applying validation techniques for imported tabular data
- Improving your productivity in pandas by using method chaining
- Recognizing and resolving common issues like dates and IDs
- Setting up indexes to streamline data issue identification
- Using data cleaning to prepare your data for ML and AI models
- Anticipating Data Cleaning Issues When Importing Tabular Data with pandas
- Anticipating Data Cleaning Issues When Working with HTML, JSON, and Spark Data
- Taking the Measure of Your Data
- Identifying Outliers in Subsets of Data
- Using Visualizations for the Identification of Unexpected Values
- Cleaning and Exploring Data with Series Operations
- Identifying and Fixing Missing Values
- Encoding, Transforming, and Scaling Features
- Fixing Messy Data When Aggregating
- Addressing Data Issues When Combining DataFrames
- Tidying and Reshaping Data
- Automate Data Cleaning with User-Defined Functions, Classes, and Pipelines
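As a rough illustration of the method-chaining style mentioned above, the following is a minimal sketch and is not taken from the book. The DataFrame `raw`, its column names (`id`, `visit_date`, `score`), and its values are all hypothetical, invented for this example. It chains a few standard pandas calls to normalize IDs, coerce unparseable dates and numbers to missing values, drop duplicate IDs, and set the index.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
raw = pd.DataFrame({
    "id": [" a1", "A2 ", "a2", None],
    "visit_date": ["2023-01-05", "not recorded", "2023-02-10", "2023-03-01"],
    "score": ["10", "12", "12", "7"],
})

cleaned = (
    raw
    # Rows without an ID cannot be matched to anything, so drop them
    .dropna(subset=["id"])
    .assign(
        # Normalize IDs: trim whitespace and use one letter case
        id=lambda df: df["id"].str.strip().str.upper(),
        # Unparseable dates become NaT instead of raising an error
        visit_date=lambda df: pd.to_datetime(df["visit_date"], errors="coerce"),
        # Numeric strings become numbers; bad values would become NaN
        score=lambda df: pd.to_numeric(df["score"], errors="coerce"),
    )
    # After normalization, "A2 " and "a2" are the same ID; keep the later row
    .drop_duplicates(subset=["id"], keep="last")
    # Index by ID so individual records are easy to look up
    .set_index("id")
)

print(cleaned)
```

Each step returns a new DataFrame, so `raw` is left unchanged and the whole cleaning sequence reads top to bottom. Whether to keep the first or last duplicate is a judgment call that depends on the data; `keep="last"` here is only an assumption for the sketch.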
If you feel this book is for you, get your copy today!
For the latest updates and community discussions, join the Discord server.
If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost. Simply click on the link to claim your free PDF. Free-PDF
We also provide a PDF file that has color images of the screenshots/diagrams used in this book at ColorImages
Michael Walker has worked as a data analyst for 37 years at various educational institutions. He has also taught data science, research methods, statistics, and computer programming to undergraduates since 2006. He is currently the Chief Information Officer at College Unbound in Providence, Rhode Island. Michael is also the author of Data Cleaning and Exploration with Machine Learning.