Python-Data-Cleaning-Cookbook-Second-Edition

This is the code repository for Python Data Cleaning Cookbook, Second Edition, published by Packt.

Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI

About the book

The book shows you how to clean, wrangle, and view data from multiple perspectives, including dataset and column attributes. You will work through common and not-so-common challenges in cleaning messy data and learn to manipulate data into a form that supports sound decisions.

What you will learn

  • Using OpenAI tools for various data cleaning tasks
  • Producing summaries of the attributes of datasets, columns, and rows
  • Anticipating data-cleaning issues when importing tabular data into pandas
  • Applying validation techniques for imported tabular data
  • Improving your productivity in pandas by using method chaining
  • Recognizing and resolving common issues like dates and IDs
  • Setting up indexes to streamline data issue identification
  • Using data cleaning to prepare your data for ML and AI models
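As a rough illustration of the method-chaining, date/ID, and indexing topics listed above, here is a minimal sketch. It is not taken from the book; the DataFrame, its column names (`id`, `visit_date`, `score`), and the values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
raw = pd.DataFrame(
    {
        "id": ["a1", "a2", "a2", "a3"],
        "visit_date": ["2024-01-05", "2024-02-10", "2024-02-10", "not a date"],
        "score": ["10", "12", "12", None],
    }
)

# One chained expression: drop exact duplicate rows, convert types
# (unparseable values become NaT/NaN instead of raising), then index by id.
cleaned = (
    raw
    .drop_duplicates()
    .assign(
        visit_date=lambda df: pd.to_datetime(
            df["visit_date"], format="%Y-%m-%d", errors="coerce"
        ),
        score=lambda df: pd.to_numeric(df["score"], errors="coerce"),
    )
    .set_index("id")
)

# Quick checks that surface remaining issues.
print(cleaned.index.is_unique)
print(cleaned.isna().sum())
```

Keeping the steps in one chain avoids intermediate variables, and using `errors="coerce"` turns bad values into missing values that can be counted and reviewed rather than stopping the import.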

Table of Contents

  1. Anticipating Data Cleaning Issues When Importing Tabular Data with pandas
  2. Anticipating Data Cleaning Issues When Working with HTML, JSON, and Spark Data
  3. Taking the Measure of Your Data
  4. Identifying Outliers in Subsets of Data
  5. Using Visualizations for the Identification of Unexpected Values
  6. Cleaning and Exploring Data with Series Operations
  7. Identifying and Fixing Missing Values
  8. Encoding, Transforming, and Scaling Features
  9. Fixing Messy Data When Aggregating
  10. Addressing Data Issues When Combining DataFrames
  11. Tidying and Reshaping Data
  12. Automate Data Cleaning with User-Defined Functions, Classes, and Pipelines

If you feel this book is for you, get your copy today!

Know more on the Discord server

Join the Discord server for the latest updates and community discussions.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost. Simply click on the link to claim your free PDF: Free-PDF

We also provide a PDF file that has color images of the screenshots/diagrams used in this book at ColorImages.

Get to Know the Author

Michael Walker has worked as a data analyst for 37 years at various educational institutions. He has also taught data science, research methods, statistics, and computer programming to undergraduates since 2006. He is currently the Chief Information Officer at College Unbound in Providence, Rhode Island. Michael is also the author of Data Cleaning and Exploration with Machine Learning.
