Automated_EDA

This capstone project builds an automated EDA (Exploratory Data Analysis) tool in Python that pre-processes and visualizes data according to column type. The tool aims to simplify EDA by automating the pre-processing steps and providing a visualization dashboard for each column type.

The tool will accept data in several formats, including CSV, Excel, and SQL databases. It will identify the data type of each column and apply the appropriate pre-processing steps, such as handling missing values, encoding categorical features, and scaling numerical features. It will also offer feature selection and dimensionality reduction options, making large datasets easier to analyze.

Once pre-processing is complete, the tool will generate a visualization dashboard for each column type, including histograms, box plots, and scatter plots. It will use Python's Matplotlib, Seaborn, and Plotly libraries to create interactive visualizations that the user can explore and customize.

The project involves designing a user-friendly command-line interface, implementing the data pre-processing steps, developing the visualization dashboards for each column type, and testing and debugging the tool.

Data: https://www.kaggle.com/datasets/parulpandey/us-international-air-traffic-data
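The type-detection and pre-processing steps described above could be sketched roughly as follows. This is only an illustration and not the project's actual code: it uses the standard library instead of a data-frame library, and the function names, the set of missing-value markers, the choice of median/mode imputation, min-max scaling, and integer encoding, as well as the sample columns `carrier` and `passengers`, are all assumptions made for the example.

```python
import csv
import io
import statistics
from collections import Counter

# Assumed markers for missing values; a real tool would make this configurable.
MISSING = {"", "na", "nan", "null", "none"}


def is_missing(value):
    return value is None or value.strip().lower() in MISSING


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Return 'numeric' if every non-missing value parses as a float, else 'categorical'."""
    present = [v for v in values if not is_missing(v)]
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"


def preprocess_numeric(values):
    """Impute missing values with the median, then min-max scale to [0, 1]."""
    present = [float(v) for v in values if not is_missing(v)]
    median = statistics.median(present)
    filled = [median if is_missing(v) else float(v) for v in values]
    low, high = min(filled), max(filled)
    span = high - low
    return [0.0 if span == 0 else (x - low) / span for x in filled]


def preprocess_categorical(values):
    """Impute missing values with the mode, then integer-encode the labels."""
    present = [v.strip() for v in values if not is_missing(v)]
    mode = Counter(present).most_common(1)[0][0] if present else "unknown"
    filled = [mode if is_missing(v) else v.strip() for v in values]
    codes = {label: i for i, label in enumerate(sorted(set(filled)))}
    return [codes[label] for label in filled]


def preprocess_csv(text):
    """Map each column name to (inferred type, pre-processed values)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    result = {}
    for name in (rows[0].keys() if rows else []):
        column = [row[name] for row in rows]
        kind = infer_column_type(column)
        handler = preprocess_numeric if kind == "numeric" else preprocess_categorical
        result[name] = (kind, handler(column))
    return result


# Hypothetical sample data, not taken from the Kaggle dataset.
SAMPLE = """carrier,passengers
AA,100
DL,
AA,300
,200
"""

result = preprocess_csv(SAMPLE)
for name, (kind, values) in result.items():
    print(name, kind, values)
```

Dispatching on the inferred type keeps each pre-processing rule in one place, and the same type label could later be used to pick which plots to draw for a column (for example, histograms for numeric columns).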