Welcome to the Probability and Statistics project! 📊🔍 In this project, you'll apply the concepts you've learned in probability theory and statistics to analyze a real-world dataset and gain practical experience with the tools and techniques of data analysis. 🚀
Your mission is to analyze the provided dataset containing customer information and purchasing behavior to make informed decisions. Your goal is to identify patterns, trends, and correlations that will help your company optimize its marketing efforts and increase offer acceptance rates. 🎉
- Introduction
- Dataset
- Data Preprocessing
- Exploratory Data Analysis
- Statistical Analysis
- Conclusion
- How to Run the Project
- License
In this project, we analyze customer purchasing behavior using descriptive statistics, correlation analysis, hypothesis testing, and regression. The goal is to uncover insights from the data that support informed marketing decisions.
The dataset used in this project contains customer information and purchasing behavior. It includes various features such as:
- Customer ID: Unique identifier for each customer
- Age: Age of the customer
- Income: Annual income of the customer
- Marital Status: Marital status of the customer
- MntWines: Amount spent on wine in the last 2 years
- MntFruits: Amount spent on fruits in the last 2 years
- MntMeatProducts: Amount spent on meat in the last 2 years
- MntFishProducts: Amount spent on fish in the last 2 years
- MntSweetProducts: Amount spent on sweets in the last 2 years
- MntGoldProds: Amount spent on gold products in the last 2 years
- NumDealsPurchases: Number of purchases made with a discount
- NumWebPurchases: Number of purchases made through the company’s website
- NumCatalogPurchases: Number of purchases made using a catalogue
- NumStorePurchases: Number of purchases made directly in stores
- NumWebVisitsMonth: Number of visits to the company’s website in the last month
Data preprocessing involves cleaning and preparing the dataset for analysis. The steps include:
- Loading the dataset: Reading the CSV file into a Pandas DataFrame.
- Handling missing values: Identifying and dealing with missing data.
- Data transformation: Transforming data into a suitable format for analysis.
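The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the inline CSV is a hypothetical stand-in for the project's data file, and the median fill for `Income` and the derived `TotalSpent` column are assumptions chosen for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file; the column names
# follow the feature list in the Dataset section.
SAMPLE_CSV = """Income,MntWines,MntFruits,MntMeatProducts
58000,635,88,546
,11,1,6
71000,426,49,127
"""

SPEND_COLUMNS = ["MntWines", "MntFruits", "MntMeatProducts"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing incomes with the median and add a total-spend column."""
    out = df.copy()
    # Handling missing values: median imputation is one common choice (assumed here).
    out["Income"] = out["Income"].fillna(out["Income"].median())
    # Data transformation: an illustrative derived feature summing the spend columns.
    out["TotalSpent"] = out[SPEND_COLUMNS].sum(axis=1)
    return out


# Loading the dataset: read the CSV text into a Pandas DataFrame.
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
clean = preprocess(raw)
print(clean)
```

With the real dataset, `pd.read_csv` would take the path to the CSV file instead of the in-memory string.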
Exploratory Data Analysis (EDA) is performed to understand the dataset better. This involves:
- Descriptive statistics: Calculating mean, median, mode, and standard deviation for numerical columns.
- Data visualization: Creating various plots using Matplotlib and Seaborn to visualize the distribution and relationships in the data.
  - Histograms
  - Box plots
  - Scatter plots
  - Correlation matrix
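As a rough sketch of the descriptive statistics and correlation matrix mentioned above, the snippet below uses a small, made-up DataFrame. The values are hypothetical and serve only to show the Pandas calls; they are not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical toy data using three column names from the feature list.
df = pd.DataFrame({
    "Income": [30000, 45000, 45000, 60000, 90000],
    "MntWines": [50, 120, 150, 300, 600],
    "NumWebVisitsMonth": [8, 7, 6, 4, 2],
})

# Descriptive statistics for each numerical column.
summary = pd.DataFrame({
    "mean": df.mean(),
    "median": df.median(),
    # mode() can return several values per column; keep the first (smallest).
    "mode": df.mode().iloc[0],
    "std": df.std(),
})

# Pairwise Pearson correlations between the numerical columns.
corr = df.corr()

print(summary)
print(corr.round(2))
```

The `corr` table is what a heatmap would display, for example via `seaborn.heatmap(corr, annot=True)`, while histograms and box plots of single columns are available through `df["Income"].plot.hist()` and `df.boxplot()` when Matplotlib is installed.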
In this section, we apply statistical methods to draw inferences from the data:
- Correlation analysis: Analyzing the correlation between different variables to identify significant relationships.
- Hypothesis testing: Conducting hypothesis tests to validate assumptions and draw conclusions.
- Regression analysis: Performing regression analysis to understand the impact of different factors on the target variable.
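The three methods above could look roughly like this with SciPy. The arrays are hypothetical toy values, and the choice of tests (Pearson correlation, Welch's two-sample t-test, simple linear regression) is an assumption for illustration; the notebook may use different tests or variables.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: income in thousands and wine spending.
income = np.array([30, 40, 50, 60, 70, 80], dtype=float)
wine_spend = np.array([65, 75, 110, 115, 145, 150], dtype=float)

# Correlation analysis: Pearson's r and its two-sided p-value.
r, r_pvalue = stats.pearsonr(income, wine_spend)

# Hypothesis testing: Welch's t-test comparing mean spending of two
# hypothetical customer groups (for example, split by marital status).
group_a = np.array([100, 110, 120, 130, 140], dtype=float)
group_b = np.array([200, 210, 220, 230, 240], dtype=float)
t_stat, t_pvalue = stats.ttest_ind(group_a, group_b, equal_var=False)

# Regression analysis: simple linear regression of spending on income.
fit = stats.linregress(income, wine_spend)

print(f"Pearson r = {r:.3f} (p = {r_pvalue:.4f})")
print(f"Welch t = {t_stat:.2f} (p = {t_pvalue:.6f})")
print(f"wine_spend ~ {fit.intercept:.1f} + {fit.slope:.2f} * income")
```

A small p-value in either test only indicates that the observed relationship would be unlikely under the null hypothesis; it does not by itself establish a causal effect on purchasing behavior.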
The analysis provided insights into customer purchasing behavior, which can help the company optimize its marketing strategies. Key findings include significant correlations between certain demographic variables and purchasing patterns.
- Clone the repository:
git clone https://github.com/yourusername/Applied_Statistics_Project.git
- Navigate to the project directory:
cd Applied_Statistics_Project
- Install the required libraries:
pip install -r requirements.txt
- Run the Jupyter Notebook:
jupyter notebook Applied_Statistics_Project.ipynb