Welcome to the BetWise repository! This project covers the design and implementation of a robust data pipeline for large-scale analytics, together with a data analytics study of sports betting trends across multiple websites. The repository is structured in two key stages:
In the first stage, we design and implement a data pipeline to process customer information across multiple locations. The pipeline automates the ETL/ELT process so data flows seamlessly from the Operations Center to the Analytics Center.
Scenario:
A new customer has signed a contract that requires us to process their data and deliver valuable business insights through analytics. The analytics team has no direct access to the Operations Center, so the data must be replicated daily in batch mode through a custom API. The pipeline handles these daily transfers of high-volume transactional data, ensuring secure and scalable processing.
🔹 Pipeline Flow: Data moves from Location 1 (Operations Center) to Location 2 (Analytics Center) via a secured API that allows pulling 5,000 records per call. The pipeline is designed to automate this flow and ensure data integrity.
🔹 Technologies Used: AWS for cloud infrastructure, custom ETL/ELT strategies, secured data replication protocols.
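As a rough illustration of the batch extraction step, the sketch below pages through the API in 5,000-record chunks. The endpoint URL, authentication scheme, parameter names, and response shape are assumptions for illustration, not the actual API contract.

```python
import requests

API_URL = "https://ops-center.example.com/records"  # hypothetical endpoint
PAGE_SIZE = 5_000                                   # max records per call

def pull_daily_batch(run_date: str, token: str) -> list[dict]:
    """Pull one day's records in 5,000-record pages until the API is exhausted."""
    records, offset = [], 0
    while True:
        resp = requests.get(
            API_URL,
            headers={"Authorization": f"Bearer {token}"},
            params={"date": run_date, "limit": PAGE_SIZE, "offset": offset},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()["records"]  # assumed response key
        records.extend(page)
        if len(page) < PAGE_SIZE:      # a short page means we hit the last one
            return records
        offset += PAGE_SIZE
```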
This stage focuses on Data Analytics for a client in the sports betting industry. The analysis was conducted on a dataset containing transaction details, including player bets, odds, outcomes, and more. It surfaced three key insights:

- High Proportion of Losses in Live Bets
- Impact of Betting Odds on Outcomes
- Player Preferences in Bet Types
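To give a flavor of how the first two insights can be derived, here is a minimal pandas sketch. The column names (`bet_type`, `outcome`, `odds`) are assumptions; adjust them to the actual schema in `data.csv`.

```python
import pandas as pd

df = pd.read_csv("Stage_2_Data_Analytics/src/data.csv")

# Insight 1: share of live bets that end in a loss (column names are assumed).
live = df[df["bet_type"] == "live"]
print(f"Live bets lost: {(live['outcome'] == 'loss').mean():.1%}")

# Insight 2: win rate by odds bucket, to see how odds relate to outcomes.
df["odds_bucket"] = pd.cut(df["odds"], bins=[1, 1.5, 2, 3, 5, 100])
win_rate = df.groupby("odds_bucket", observed=True)["outcome"].apply(
    lambda s: (s == "win").mean()
)
print(win_rate)
```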
The repository is organized as follows:
```
└── 📁betWork
    └── 📁Stage_1_Data_Pipeline_Design
        └── 📁assets
            └── data_pipeline_architecture.png   # Pipeline architecture diagram
            └── data_pipeline_diagram.png        # Pipeline process flow
        └── 📁docs
            └── Stage_1_Case_1.md                # Problem statement for Stage 1
            └── stage1_pipeline_solution.pdf     # Full solution documentation in PDF
    └── 📁Stage_2_Data_Analytics
        └── 📁assets
            └── PL1.png                          # Insight #1: Visualization
            └── PL2.png                          # Insight #2: Visualization
            └── PL3.png                          # Insight #3: Visualization
            └── PL4.png                          # Insight #4: Visualization
            └── PL5.png                          # Insight #5: Visualization
        └── 📁docs
            └── Stage_2_Case_2.md                # Problem statement for Stage 2
        └── 📁src
            └── data.csv                         # Raw betting data for analysis
            └── data.zip                         # Compressed data for faster access
            └── Stage_2_Case_2_Betting_Data_Analysis_PHP.pdf  # Full analysis report in PDF
            └── Stage_2_Case_2_Betting_Data_Analysis.ipynb    # Jupyter Notebook with all code and visualizations
    └── README.md                                # You're here! 😊
```
- Cloud Platform: AWS (preferred)
- Programming Language: Python (for Data Analytics)
- Tools: Jupyter Notebooks & Plotly
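The notebook's charts are built with Plotly. As a minimal example of the pattern (with made-up numbers, not results from the analysis):

```python
import pandas as pd
import plotly.express as px

# Illustrative counts only -- see the notebook for the real figures.
summary = pd.DataFrame({
    "bet_type": ["single", "combo", "live", "system"],
    "bets": [5200, 3100, 2400, 800],
})
fig = px.bar(summary, x="bet_type", y="bets",
             title="Bets placed by type (illustrative data)")
fig.show()
```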
- Stage 1: Data Pipeline Design
  - Docs: Navigate to the `docs` folder inside `Stage_1_Data_Pipeline_Design` to find the problem statement (`Stage_1_Case_1.md`) and the full solution (`stage1_pipeline_solution.pdf`).
  - Visuals: Explore the pipeline architecture and process flow diagrams in the `assets` folder:
    - `data_pipeline_architecture.png`: Architecture diagram
    - `data_pipeline_diagram.png`: Process flow diagram
- Stage 2: Data Analytics
  - Source Code: Run the Jupyter notebook (`Stage_2_Case_2_Betting_Data_Analysis.ipynb`) in the `src` folder to reproduce the analysis and insights.
  - Data: The raw data for analysis is provided as `data.csv`, with a compressed version available as `data.zip` (see the loading sketch after this list).
  - Docs: Review the problem statement (`Stage_2_Case_2.md`) located in the `docs` folder for Stage 2.
  - Analysis Report: A full analysis report is also provided as `Stage_2_Case_2_Betting_Data_Analysis_PHP.pdf`.
  - Visuals: View the insight visualizations in the `assets` folder: `PL1.png` to `PL5.png` are graphical representations of the key insights.
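For a quick start without unpacking anything, pandas can read the compressed data directly (assuming `data.zip` contains the single `data.csv`):

```python
import pandas as pd

# Read the CSV straight out of the zip archive.
df = pd.read_csv("Stage_2_Data_Analytics/src/data.zip", compression="zip")
print(df.shape)
print(df.columns.tolist())
```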
- 📈 Detailed Data Analytics: Advanced analysis of player behavior, betting outcomes, and market trends.
- 🔄 Automated Data Pipelines: Efficient and secure ETL/ELT processes that streamline data replication.
- 🎨 Beautiful Visualizations: Eye-catching, interactive plots to convey insights clearly and engagingly.