This is an end-to-end data analysis project using Python and its libraries pandas, NumPy, Matplotlib, and seaborn. I performed the full set of data analysis steps on the largest retail e-commerce orders dataset of Pakistan, from problem statement to presenting the report. The dataset contains half a million transaction records from March 2016 to August 2018.

ali-bin-kashif/Pakistan-ECommerce-Dataset-EDA

Pakistan's E-Commerce (2016-2018) Data Analysis (End-to-End Project)

Optimizing the Launch and Operation of a New E-commerce Business in Pakistan: A Data-Driven Strategy.

Problem Statement

A group of entrepreneurs wants to start a new e-commerce business in Pakistan. They have gathered data on half a million e-commerce orders placed in Pakistan from March 2016 to August 2018. As a data analyst, you have to analyze and explore the data to find useful insights, answer various analytical and research questions, and test different hypotheses with a data-driven approach, so that the entrepreneurs can make informed decisions based on what the data says.

Tasks Performed

  • Data loading and identification:

    Loaded the dataset and inspected it, i.e. its columns and rows, sampled different rows, and checked each column's data type.

  • Data cleaning:

    Handled missing and anomalous data, checked for outliers.

  • Exploratory Data Analysis:

    Explored the data and extracted valuable insights through charts and plots such as histograms, bar charts, and line plots.
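
The loading, type-checking, and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`status`, `created_at`, `grand_total`), the tiny in-memory sample, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import pandas as pd

# Made-up sample rows standing in for the real orders file.
# Column names are assumptions, not the dataset's confirmed schema.
sample = pd.DataFrame({
    "status": ["complete", "canceled", None, "complete",
               "complete", "order_refunded", "complete", "canceled"],
    "created_at": ["2016-07-01", "2016-07-02", "2016-07-03", "not a date",
                   "2016-07-05", "2016-07-06", "2016-07-07", "2016-07-08"],
    "grand_total": [1000.0, 1200.0, 900.0, 1100.0,
                    1500.0, 1800.0, 2000.0, 950000.0],
})

# Identification: shape, a few rows, and column data types.
print(sample.shape)
print(sample.head())
print(sample.dtypes)


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Parse dates, drop rows with missing key fields, and flag outliers."""
    out = df.copy()
    # Unparseable dates become NaT instead of raising an error.
    out["created_at"] = pd.to_datetime(out["created_at"], errors="coerce")
    # Drop rows where the status or the order date is missing.
    out = out.dropna(subset=["status", "created_at"])
    # Flag (rather than delete) order totals outside 1.5 * IQR.
    q1, q3 = out["grand_total"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["is_outlier"] = (
        (out["grand_total"] < q1 - 1.5 * iqr)
        | (out["grand_total"] > q3 + 1.5 * iqr)
    )
    return out.reset_index(drop=True)


cleaned = clean_orders(sample)
print(cleaned)
```

Flagging outliers in a separate column, instead of dropping them, keeps the decision about unusually large orders open for the analysis stage.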

Libraries Used

Programming Language: Python

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn

Suggestions and Key findings

  • Mobiles and Tablets is the best-selling category, along with Men's and Women's Fashion and Appliances.
  • November has the most sales, driven by sales events and campaigns such as 11.11 (Giyara-Giyara).
  • Sales are also high in May and June because of Ramadan and Eid.
  • Online payments have a high cancellation rate.
  • Easypaisa and Banks/Cards/Wallets have the highest cancellation rates.
  • Cash on Delivery orders are the most successful, with most of them marked as completed.
  • Men's and Women's Fashion are both successful categories, as they have more completions than cancellations and refunds.
  • Mobiles & Tablets is also a good category, but it carries high risk.
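
Findings like the cancellation rate per payment method and the busiest sales month come down to simple group-by aggregations. The sketch below shows one way to compute them. The column names (`payment_method`, `status`, `created_at`), the `canceled` label, and the six sample rows are hypothetical, so the printed numbers illustrate the method only and are not results from the dataset.

```python
import pandas as pd

# Made-up orders; values and labels are illustrative assumptions only.
orders = pd.DataFrame({
    "payment_method": ["cod", "cod", "cod", "Easypay", "Easypay", "cod"],
    "status": ["complete", "complete", "canceled",
               "canceled", "canceled", "complete"],
    "created_at": pd.to_datetime([
        "2016-11-11", "2016-11-11", "2016-11-12",
        "2017-05-20", "2017-11-11", "2017-06-01",
    ]),
})


def cancellation_rate(df: pd.DataFrame, by: str = "payment_method",
                      canceled_label: str = "canceled") -> pd.Series:
    """Share of canceled orders within each group, highest first."""
    is_canceled = df["status"].eq(canceled_label)
    # The mean of a boolean Series is the fraction of True values.
    return is_canceled.groupby(df[by]).mean().sort_values(ascending=False)


def orders_per_month(df: pd.DataFrame) -> pd.Series:
    """Number of orders per calendar month (1-12), pooled across years."""
    return df.groupby(df["created_at"].dt.month).size()


rates = cancellation_rate(orders)
monthly = orders_per_month(orders)
print(rates)
print("Busiest month:", monthly.idxmax())
```

Passing `by="category_name"` (or whatever the category column is called in the real data) would give the same comparison per product category.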
