This repository contains data visualization programs for various datasets, written in Python.
--> Data visualization is the representation of information and data in a pictorial or graphical format (for example: charts, graphs, and maps).
--> Data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
--> Data visualization tools and technologies are essential to analyzing massive amounts of information and making data-driven decisions.
--> The concept of using pictures to understand data has been around for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.
--> Python is a high-level, general-purpose, and very popular programming language.
--> The Python programming language (currently Python 3) is used in web development, machine learning applications, and many other areas of the software industry.
--> Python is available across widely used platforms like Windows, Linux, and macOS.
--> One of Python's biggest strengths is its large standard library.
--> Colaboratory, or "Colab" for short, is a product from Google Research that allows anybody to write and execute Python code in a Jupyter notebook through the browser.
--> Visit Colab at:
--> Create an account using a Google account.
--> Once the account is created, we can start coding in Colab directly.
--> It supports Python and R.
--> Files are directly saved in Google Drive.
Description: In this experiment, we download the House Pricing dataset from Kaggle and map the values to various aesthetics using visualizations such as color, shape, and size to represent the data features.
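A minimal sketch of mapping data values to aesthetics, assuming Matplotlib and NumPy are installed. The data is synthetic and the column meanings (area, bedrooms, furnished, price) are assumptions, not the actual Kaggle columns. Position encodes area and price, color encodes price, marker size encodes the number of bedrooms, and marker shape encodes a hypothetical furnished flag.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the house pricing data; column meanings are assumed.
rng = np.random.default_rng(0)
n_houses = 100
area = rng.uniform(500, 3500, size=n_houses)   # floor area in square feet
bedrooms = rng.integers(1, 6, size=n_houses)   # 1 to 5 bedrooms
furnished = rng.random(n_houses) < 0.5         # hypothetical yes/no attribute
price = 50_000 + 120 * area + 15_000 * bedrooms + rng.normal(0, 20_000, size=n_houses)

fig, ax = plt.subplots(figsize=(7, 5))
# One scatter call per marker shape; color and size are mapped per point.
for flag, marker, label in [(False, "o", "Unfurnished"), (True, "^", "Furnished")]:
    mask = furnished == flag
    points = ax.scatter(
        area[mask], price[mask],
        c=price[mask], s=bedrooms[mask] * 20, marker=marker,
        cmap="viridis", vmin=price.min(), vmax=price.max(),  # shared color scale
        alpha=0.8, label=label,
    )
fig.colorbar(points, ax=ax, label="Price")
ax.set_xlabel("Area (sq ft)")
ax.set_ylabel("Price")
ax.set_title("Price vs. area (marker size = bedrooms)")
ax.legend()
```

In a notebook the figure is displayed at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)`.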
Description: This experiment involves using different color scales to visualize the Rainfall Prediction dataset. We explore the impact of various color palettes and their readability in different visual contexts.
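To illustrate how the choice of color scale changes readability, here is a small sketch that draws the same grid with three built-in Matplotlib colormaps. The rainfall values are randomly generated, and the 8 x 12 shape (sub-divisions by months) is an assumption for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic monthly rainfall (mm) for 8 hypothetical sub-divisions over 12 months.
rainfall = rng.gamma(shape=2.0, scale=60.0, size=(8, 12))

# "Blues" and "viridis" are sequential colormaps; "coolwarm" is diverging.
cmaps = ["Blues", "viridis", "coolwarm"]
fig, axes = plt.subplots(1, len(cmaps), figsize=(12, 3), sharey=True)
for ax, name in zip(axes, cmaps):
    image = ax.imshow(rainfall, cmap=name, aspect="auto")
    ax.set_title(name)
    ax.set_xlabel("Month index")
    fig.colorbar(image, ax=ax)
axes[0].set_ylabel("Sub-division index")
```

A diverging map such as `coolwarm` suggests a meaningful midpoint, so for a quantity like rainfall that only grows from zero, a sequential map is usually easier to read.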
Description: We create different bar plots to represent categorical variables from a given dataset, providing insights into the distribution and comparison across categories.
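A possible starting point for a bar plot of a categorical variable, assuming pandas and Matplotlib are available. The `body_style` column and its values are made up for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical column; the real dataset's columns may differ.
df = pd.DataFrame(
    {"body_style": ["sedan", "hatchback", "sedan", "wagon", "sedan", "hatchback"]}
)

# Count how often each category occurs (sorted from most to least frequent).
counts = df["body_style"].value_counts()

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values, color="steelblue")
ax.set_xlabel("Body style")
ax.set_ylabel("Number of cars")
ax.set_title("Cars per body style")
```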
Description: This experiment demonstrates how to identify skewed data, visualize its distribution, and apply transformations to remove skewness for more accurate analysis.
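The idea can be sketched as follows: measure skewness, apply a transformation, and measure again. This example uses synthetic log-normal data and a `log1p` transform; the transformation used in the actual experiment may be different.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic right-skewed data (log-normal), standing in for a real column.
values = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000), name="value")

# log1p(x) = log(1 + x) compresses the long right tail.
transformed = np.log1p(values)

skew_before = values.skew()
skew_after = transformed.skew()
print(f"Skewness before: {skew_before:.2f}, after: {skew_after:.2f}")

fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
ax_raw.hist(values, bins=40)
ax_raw.set_title("Original (right-skewed)")
ax_log.hist(transformed, bins=40)
ax_log.set_title("After log1p transform")
```

A skewness close to 0 indicates a roughly symmetric distribution; large positive values indicate a long right tail.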
Description: A time series visualization is performed on a sales dataset, showcasing trends, seasonality, and patterns in the data over time.
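As a rough sketch of a time series plot, the example below builds 36 months of synthetic sales with an upward trend and yearly seasonality, then overlays a 12-month rolling mean to expose the trend. The numbers are generated, not taken from the sales dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
months = pd.date_range("2020-01-01", periods=36, freq="MS")  # month-start dates
t = np.arange(len(months))

# Synthetic sales: linear trend + yearly seasonality + noise.
values = 200 + 5 * t + 40 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 10, size=len(months))
sales = pd.Series(values, index=months, name="sales")

# Averaging over a full year smooths out the seasonal component.
trend = sales.rolling(window=12).mean()

fig, ax = plt.subplots(figsize=(9, 4))
ax.plot(sales.index, sales.values, label="Monthly sales")
ax.plot(trend.index, trend.values, label="12-month rolling mean")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.legend()
```

The first 11 values of the rolling mean are missing because a full 12-month window is not yet available.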
Description: A scatterplot is created for a dataset, followed by recommendations for dimension reduction techniques such as PCA or t-SNE to simplify the data while preserving key information.
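To make the dimension reduction step concrete, here is a sketch of PCA written with plain NumPy (centering followed by SVD) instead of scikit-learn's `PCA` class, which the experiment itself may use. The five-feature dataset is synthetic and is constructed so that two directions carry almost all of the variance.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
# Synthetic data: 200 samples, 5 features driven by 2 hidden factors plus small noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
features = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# PCA via SVD: center the data, then project onto the top right-singular vectors.
centered = features - features.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
projected = centered @ components[:2].T

# Fraction of total variance captured by each principal component.
explained = singular_values**2 / np.sum(singular_values**2)
print("Explained variance ratio (first two):", explained[:2])

fig, ax = plt.subplots()
ax.scatter(projected[:, 0], projected[:, 1], s=12, alpha=0.7)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
```

t-SNE is not linear and cannot be written this compactly; PCA is shown here only because it fits in a few lines.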
Description: This experiment covers the use of geospatial data and applying various projections to visualize geographical datasets accurately on different types of maps.
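The effect of a projection can be sketched without any geospatial library. The example below projects the same grid of longitude/latitude points with two hand-written formulas on a unit sphere: equirectangular and spherical Mercator. This is an illustration only; in practice a library call such as GeoPandas' `to_crs` would typically handle reprojection, and the function names here are hypothetical helpers.

```python
import matplotlib.pyplot as plt
import numpy as np

def equirectangular(lon_deg, lat_deg):
    """Use longitude and latitude (in radians) directly as x and y."""
    return np.radians(lon_deg), np.radians(lat_deg)

def mercator(lon_deg, lat_deg):
    """Spherical Mercator on a unit sphere; only valid away from the poles."""
    lat = np.radians(lat_deg)
    return np.radians(lon_deg), np.log(np.tan(np.pi / 4 + lat / 2))

# A regular grid of points, 30 degrees apart in longitude and 15 in latitude.
lon, lat = np.meshgrid(np.arange(-180, 181, 30), np.arange(-75, 76, 15))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
projections = [("Equirectangular", equirectangular), ("Mercator", mercator)]
for ax, (name, project) in zip(axes, projections):
    x, y = project(lon, lat)
    ax.scatter(x, y, s=8)
    ax.set_title(name)
    ax.set_aspect("equal")
```

The rows of points stay evenly spaced in the equirectangular plot but spread apart toward high latitudes in the Mercator plot, which is why Mercator exaggerates areas near the poles.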
Description: A trend line is plotted with a confidence band to showcase the relationship between variables in a dataset, offering insights into trends and uncertainty around predictions.
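One way to draw a trend line with a confidence band is an ordinary least-squares line plus the standard error of the fitted mean. The sketch below uses synthetic data and the normal approximation (1.96) for a 95% band rather than a Student t quantile, which is a simplifying assumption; the experiment may rely on a plotting library to compute the band instead.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
# Synthetic data around the line y = 2x + 1.
x = np.linspace(0, 10, 80)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=x.size)

# Least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

# Standard error of the fitted mean at each x.
n = x.size
residuals = y - fitted
sigma = np.sqrt(np.sum(residuals**2) / (n - 2))
sxx = np.sum((x - x.mean()) ** 2)
se_mean = sigma * np.sqrt(1.0 / n + (x - x.mean()) ** 2 / sxx)

# Approximate 95% band (normal approximation, assumed adequate for n = 80).
lower = fitted - 1.96 * se_mean
upper = fitted + 1.96 * se_mean

fig, ax = plt.subplots()
ax.scatter(x, y, s=12, alpha=0.6, label="Data")
ax.plot(x, fitted, color="black", label="Trend line")
ax.fill_between(x, lower, upper, alpha=0.3, label="Approx. 95% confidence band")
ax.legend()
```

The band is narrowest near the mean of `x` and widens toward the edges, where the fitted line is less certain.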
Description: This experiment illustrates the use of partial transparency and jittering in scatter plots to handle overlapping points and improve clarity in dense data visualizations.
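A short sketch of both techniques, using made-up integer-valued data so that many points land on exactly the same coordinates. The jitter width of 0.2 and the alpha of 0.3 are arbitrary choices for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
n_points = 500
# Integer-valued data: 500 points share at most 25 distinct positions.
ratings = rng.integers(1, 6, size=n_points).astype(float)
scores = rng.integers(1, 6, size=n_points).astype(float)

# Jittering: add a small random offset so overlapping points separate.
jitter_x = ratings + rng.uniform(-0.2, 0.2, size=n_points)
jitter_y = scores + rng.uniform(-0.2, 0.2, size=n_points)

fig, (ax_raw, ax_fixed) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
ax_raw.scatter(ratings, scores)
ax_raw.set_title("Raw points (heavy overlap)")
# Partial transparency: darker regions now indicate more points.
ax_fixed.scatter(jitter_x, jitter_y, alpha=0.3)
ax_fixed.set_title("Jitter + alpha=0.3")
```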
Description: The experiment explores how different color codes (RGB, HEX, and named colors) can be applied to enhance data visualizations, improving the visual appeal and understanding of complex datasets.
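The three kinds of color codes are interchangeable in Matplotlib, which the following sketch shows by converting between them with `matplotlib.colors`. The color `tomato` and the bar heights are arbitrary choices for the illustration.

```python
import matplotlib.pyplot as plt
from matplotlib import colors

# The same color written three ways.
named = "tomato"
hex_code = colors.to_hex(named)   # named color -> HEX string
rgb = colors.to_rgb(hex_code)     # HEX string -> (r, g, b) floats in [0, 1]
print(named, hex_code, rgb)

# Convert every specification to RGBA before plotting.
specs = [named, hex_code, rgb]
swatches = [colors.to_rgba(spec) for spec in specs]

fig, ax = plt.subplots()
ax.bar(["named", "HEX", "RGB"], [1, 1, 1], color=swatches)
ax.set_title("One color, three notations")
```

All three bars come out identical, since each notation resolves to the same RGBA value.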
To install a Python library, use this command:
pip install library_name
--> Dataset is taken from:
--> A CSV file that contains house pricing data.
--> It gives the price of each house with respect to its area and other basic amenities.
--> Dataset is taken from:
--> A CSV file that contains the rainfall data.
--> Sub-division-wise monthly data for 115 years, from 1901 to 2015.
--> Dataset is taken from:
--> Business financial data provides sales, purchases, salaries and wages, and operating profit estimates for most market industries in New Zealand, and information on stocks for selected industries.
--> This collection uses a combination of survey, tax, and other administrative data.
--> Dataset is taken from:
--> CSV file which contains the sales data.
--> Dataset is taken from:
--> Dataset of minerals found around the world.
--> Dataset is taken from:
--> This contains data about various automobiles in Comma Separated Value (CSV) format.
--> The CSV file contains details of each automobile: mileage, length, and body style, among other attributes.
--> It has the following dimensions: 60 rows x 6 columns.
--> The CSV file is already preprocessed, so there is no need for data cleaning.
--> Dataset is taken from:
--> This contains data about various NBA players in Comma Separated Value (CSV) format.
--> The CSV file contains details of each player: height, weight, team, and position, among other attributes.
--> It has the following dimensions: 457 rows x 9 columns.
--> The CSV file is already preprocessed, so there is no need for data cleaning.
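The dimensions and the "already preprocessed" claim for a CSV dataset can be checked with pandas. The sketch below reads a tiny made-up CSV from a string so that it runs on its own; the rows and column names are invented, and one value is deliberately left empty to show what a file that does need cleaning looks like. For a real dataset, pass the file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up rows standing in for a real CSV file.
csv_text = """name,team,position,height,weight
Player A,Team X,PG,188,84
Player B,Team Y,C,211,
Player C,Team X,SF,201,98
"""

df = pd.read_csv(io.StringIO(csv_text))

# Dimensions of the table as (rows, columns).
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Count missing values per column; all zeros means no cleaning is needed.
missing_per_column = df.isna().sum()
needs_cleaning = bool(missing_per_column.any())
print(missing_per_column)
```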
Short description of all libraries used.
- NumPy (Numerical Python) - Provides a collection of mathematical functions that operate on arrays and matrices.
- Pandas (Panel Data / Python Data Analysis) - This library is mostly used for analyzing, cleaning, exploring, and manipulating data.
- Matplotlib - It is a data visualization and graphical plotting library.
- Seaborn - It is built on top of Matplotlib and is used to create more attractive and informative statistical graphics.
- SciPy (Scientific Python) - It is used for scientific computation. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
- Scikit-learn - It is a machine learning library that provides tools for tasks such as classification and prediction.
- GeoPandas - As the name suggests, it extends the popular data science library pandas by adding support for geospatial data.
Drop a star if you find this repository useful.
If you have any doubts or suggestions, feel free to reach out to me.
How to reach me: