madhurimarawat/Data-Visualization-using-python

Data-Visualization-using-python

This repository contains data visualization programs on various datasets, written in Python.

Data Visualization


--> Data visualization is the representation of information and data in a graphical format (for example, charts, graphs, and maps).

--> Data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data.

--> Data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions.

--> The concept of using pictures to understand data has been used for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.

Various forms of Data Visualization

About Python Programming

--> Python is a high-level, general-purpose, and very popular programming language.

--> Python (currently Python 3) is used in web development, machine learning applications, and many other areas of the software industry.

--> Python is available across widely used platforms like Windows, Linux, and macOS.

--> One of the biggest strengths of Python is its large standard library.


Mode of Execution Used: Google Colab

--> Colaboratory, or “Colab” for short, is a product from Google Research that allows anybody to write and execute Python code in a Jupyter notebook through the browser.

--> Visit Colab at: Google Colab

--> Create an account using a Google account.

--> Once the account is created, we can start coding in Colab directly.

--> It supports Python and R.

--> Files are saved directly to Google Drive.


Table Of Contents 📔 🔖 📑

Description: In this experiment, we download the House Pricing dataset from Kaggle and map its values to visual aesthetics such as color, shape, and size to represent the data features.

Description: This experiment involves using different color scales to visualize the Rainfall Prediction dataset. We explore the impact of various color palettes and their readability in different visual contexts.

Description: We create different bar plots to represent categorical variables from a given dataset, providing insights into the distribution and comparison across categories.

Description: This experiment demonstrates how to identify skewed data, visualize its distribution, and apply transformations to remove skewness for more accurate analysis.
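
As a rough illustration of the skewness experiment, the sketch below measures skewness and then reduces it with a log transform. It does not use the repository's dataset: the `prices` array is synthetic, and `sample_skewness` is a hypothetical helper written for this example (in practice `pandas.Series.skew` or `scipy.stats.skew` could be used instead).

```python
import numpy as np


def sample_skewness(values):
    """Return the biased Fisher-Pearson skewness coefficient (m3 / m2**1.5)."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (variance)
    m3 = np.mean(centered ** 3)  # third central moment
    return m3 / m2 ** 1.5


# Synthetic right-skewed data (log-normal), standing in for a real column.
rng = np.random.default_rng(seed=0)
prices = rng.lognormal(mean=10.0, sigma=1.0, size=5000)

skew_before = sample_skewness(prices)
# log1p compresses the long right tail; it assumes non-negative values.
skew_after = sample_skewness(np.log1p(prices))

print(f"skewness before: {skew_before:.2f}, after log1p: {skew_after:.2f}")
```

A skewness close to 0 suggests a roughly symmetric distribution. Other transforms, such as a square root or Box-Cox, may suit other datasets better.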

Description: A time series visualization is performed on a sales dataset, showcasing trends, seasonality, and patterns in the data over time.
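
One common way to expose the trend in a seasonal series is a rolling mean over one full season. The sketch below assumes monthly data with a 12-month cycle. The `sales` values are synthetic, and `rolling_mean` is a hypothetical helper (`pandas.Series.rolling` offers the same idea for real data).

```python
import numpy as np
import matplotlib.pyplot as plt


def rolling_mean(values, window):
    """Trailing moving average; the first window - 1 entries are NaN."""
    x = np.asarray(values, dtype=float)
    out = np.full(x.shape, np.nan)
    kernel = np.ones(window) / window
    out[window - 1:] = np.convolve(x, kernel, mode="valid")
    return out


# Synthetic monthly sales: linear trend + yearly seasonality + noise.
rng = np.random.default_rng(seed=42)
months = np.arange(48)
sales = (100 + 2.0 * months
         + 15 * np.sin(2 * np.pi * months / 12)
         + rng.normal(0, 5, size=months.size))

# Averaging over 12 months cancels the yearly cycle and leaves the trend.
smoothed = rolling_mean(sales, window=12)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, sales, label="monthly sales (synthetic)")
ax.plot(months, smoothed, label="12-month rolling mean")
ax.set_xlabel("month index")
ax.set_ylabel("sales")
ax.legend()
```

In a notebook the figure is displayed automatically; in a script, call `plt.show()` or `fig.savefig(...)`.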

Description: A scatterplot is created for a dataset, followed by recommendations for dimension reduction techniques such as PCA or t-SNE to simplify the data while preserving key information.
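
To make the PCA suggestion concrete, here is a minimal sketch that projects a multi-column dataset onto its first two principal components and plots the result. It computes PCA directly with NumPy's SVD rather than through scikit-learn, the data is synthetic, and `pca_project` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt


def pca_project(data, n_components=2):
    """Project rows of `data` onto the top principal components via SVD."""
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ vt[:n_components].T, explained[:n_components]


# Synthetic data: 5 observed features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(seed=7)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

projected, explained = pca_project(data, n_components=2)
print("explained variance ratio:", np.round(explained, 3))

fig, ax = plt.subplots()
ax.scatter(projected[:, 0], projected[:, 1], alpha=0.6)
ax.set_xlabel("principal component 1")
ax.set_ylabel("principal component 2")
```

Features on very different scales are usually standardized before PCA; that step is omitted here for brevity.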

Description: This experiment covers the use of geospatial data and applying various projections to visualize geographical datasets accurately on different types of maps.

Description: A trend line is plotted with a confidence band to showcase the relationship between variables in a dataset, offering insights into trends and uncertainty around predictions.
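
The following sketch shows one way to draw a linear trend line with a confidence band for the mean response. It is not the repository's code: the data is synthetic, `linear_trend_with_band` is a hypothetical helper, and the band uses the normal quantile 1.96 as an approximation to the exact Student-t quantile, which is reasonable only for moderately large samples.

```python
import numpy as np
import matplotlib.pyplot as plt


def linear_trend_with_band(x, y, z=1.96, n_points=200):
    """Fit y = a + b*x by least squares and return an approximate 95% band."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, deg=1)  # highest degree first
    residuals = y - (intercept + slope * x)
    residual_var = np.sum(residuals ** 2) / (n - 2)
    sxx = np.sum((x - x.mean()) ** 2)

    grid = np.linspace(x.min(), x.max(), n_points)
    trend = intercept + slope * grid
    # Standard error of the fitted mean at each grid point.
    se = np.sqrt(residual_var * (1.0 / n + (grid - x.mean()) ** 2 / sxx))
    return grid, trend, trend - z * se, trend + z * se


# Synthetic data: y depends linearly on x, with Gaussian noise.
rng = np.random.default_rng(seed=3)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(0, 2.0, size=x.size)

grid, trend, lower, upper = linear_trend_with_band(x, y)

fig, ax = plt.subplots()
ax.scatter(x, y, s=15, alpha=0.5, label="data (synthetic)")
ax.plot(grid, trend, color="black", label="trend line")
ax.fill_between(grid, lower, upper, alpha=0.3, label="approx. 95% band")
ax.legend()
```

The band is narrowest near the mean of `x` and widens toward the edges, where the fit is less certain. Seaborn's `regplot` draws a similar band using bootstrapping.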

Description: This experiment illustrates the use of partial transparency and jittering in scatter plots to handle overlapping points and improve clarity in dense data visualizations.
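
A small sketch of both techniques, assuming discrete data where many points share the same coordinates. The ratings below are synthetic, and `jitter` is a hypothetical helper that adds uniform noise of a chosen size.

```python
import numpy as np
import matplotlib.pyplot as plt


def jitter(values, amount, rng):
    """Shift each value by uniform noise in [-amount, amount]."""
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-amount, amount, size=values.shape)


# Synthetic ratings from 1 to 5 on both axes: 1000 points, at most 25 positions.
rng = np.random.default_rng(seed=1)
x = rng.integers(1, 6, size=1000)
y = rng.integers(1, 6, size=1000)

x_jittered = jitter(x, 0.2, rng)
y_jittered = jitter(y, 0.2, rng)

fig, (ax_raw, ax_jit) = plt.subplots(1, 2, figsize=(8, 4),
                                     sharex=True, sharey=True)
ax_raw.scatter(x, y)
ax_raw.set_title("raw (points overlap)")
# alpha < 1 makes dense regions appear darker than sparse ones.
ax_jit.scatter(x_jittered, y_jittered, alpha=0.3)
ax_jit.set_title("jitter + transparency")
```

The jitter amount is a trade-off: too little leaves points stacked, while too much blurs the boundaries between neighbouring categories.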

Description: The experiment explores how different color codes (RGB, HEX, and named colors) can be applied to enhance data visualizations, improving the visual appeal and understanding of complex datasets.
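
As an illustration of the three color codes, the sketch below specifies the same color as a named color, a HEX string, and an RGB tuple, then converts between them with `matplotlib.colors`. The choice of `tomato` and the dictionary layout are only for this example.

```python
import matplotlib.pyplot as plt
from matplotlib import colors

named = "tomato"                           # CSS4 named color
hex_code = colors.to_hex(named)            # HEX string
rgb_tuple = (255 / 255, 99 / 255, 71 / 255)  # RGB floats in [0, 1]

print(hex_code)
print(colors.to_rgb(hex_code))

# All three specifications resolve to the same RGBA value.
specs = {"named": named, "hex": hex_code, "rgb": rgb_tuple}

fig, ax = plt.subplots()
for label, spec in specs.items():
    ax.bar(label, 1, color=spec)
ax.set_title("One color, three specifications")
```

Matplotlib expects RGB tuples as floats between 0 and 1, so 0 to 255 values have to be divided by 255 first.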


Various Libraries in Python for Data Visualization

To install a Python library, use this command:

pip install library_name

Dataset Used

Housing Dataset

--> Dataset is taken from: Housing Dataset

--> CSV file which contains house pricing data.

--> Price of each house with respect to its area and other basic amenities.

Rainfall Prediction Dataset

--> Dataset is taken from: Rainfall Prediction Dataset

--> CSV file which contains the rainfall data.

--> Sub-division wise monthly data for 115 years from 1901-2015.

Business Dataset

--> Dataset is taken from: Business Dataset

--> Business financial data provides sales, purchases, salaries and wages, and operating profit estimates for most market industries in New Zealand, and information on stocks for selected industries.

--> This collection uses a combination of survey, tax, and other administrative data.

Sales Dataset

--> Dataset is taken from: Sales Dataset

--> CSV file which contains the sales data.

Mineral ores round the world Dataset

--> Dataset is taken from: Minerals Dataset

--> Dataset of minerals found around the world.

Automobile Dataset

--> Dataset is taken from: 🔗 Automobile Dataset

--> This contains data about various automobiles in Comma Separated Value (CSV) format.

--> The CSV file contains automobile details such as mileage, length, and body style, among other attributes.

--> Its dimensions are 60 rows × 6 columns.

--> The CSV file is already preprocessed, so there is no need for data cleaning.

NBA Players Dataset

--> Dataset is taken from: 🔗 NBA Dataset

--> This contains data about various NBA players in Comma Separated Value (CSV) format.

--> The CSV file contains player details such as height, weight, team, and position, among other attributes.

--> Its dimensions are 457 rows × 9 columns.

--> The CSV file is already preprocessed, so there is no need for data cleaning.

Libraries Used

Short description of all libraries used.

  • NumPy (Numerical Python) - Provides a collection of mathematical functions that operate on arrays and matrices.
  • Pandas (Panel Data / Python Data Analysis) - Mostly used for analyzing, cleaning, exploring, and manipulating data.
  • Matplotlib - A data visualization and graphical plotting library.
  • Seaborn - Built on top of Matplotlib and used to create more attractive and informative statistical graphics.
  • SciPy (Scientific Python) - Used for scientific computation. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
  • Scikit-learn - A machine learning library that provides tools for tasks such as classification and prediction.
  • GeoPandas - Extends the popular data science library pandas by adding support for geospatial data.

Thanks for Visiting 😄

Drop a 🌟 if you find this repository useful.

If you have any doubts or suggestions, feel free to reach out to me.

📫 How to reach me: LinkedIn, Mail