Develop an application to demonstrate how the Automotive Industry could harness data to make informed decisions. Demonstrate the use of data analytics in identifying:
- Customer segments
- Most popular car specification combination (engine type, fuel, mileage, etc.)
- Right time to launch a new car, etc.
The user is an automotive OEM (Original Equipment Manufacturer) who wants to understand the following pain points:
- Which new car to position in the market for the target audience and what should be the expected sales?
- Should a car be positioned in a Mid SUV range and at what price?
- What updates to launch in the existing car models at a variant level or as a revamp? For example:
- Tata Harrier would like to check whether adding a petrol variant will increase sales
- Hyundai Santro would like to check whether to revamp the model or upgrade features at the variant level
- I have divided the program into three weekly sprints with one epic for each sprint
- For the first sprint, the epic is to create a working data science pipeline to meet the requirements of the user problem
- For the second sprint, the epic is to create a working dashboard which the OEMs can access to make queries
- For the third sprint, the epic is to create a dockerized setup, including the database, which can host the end-to-end solution
- Jupyter Notebook is used for the data analysis, where we have leveraged clustering and regression algorithms
- Jupyter Notebook is connected to the Django ORM layer through Pickle files, which carry the trained instances of the Scaler, PCA (Principal Component Analysis) & K-Means
- Jupyter Notebook is also connected to the Django REST APIs, which it uses to send the updated aggregated insights
- Django is used with MongoDB. The key reason to use Django is that it can work with 'pickle' files
- MongoDB is well suited to storing the large amount of unstructured data coming from the analytics pipeline
- An Angular application is used at the frontend to provide the user with a solution framework
- The Angular application also leverages REST APIs from Django
- Docker & Nginx help in better orchestration of the complete solution
- Redis is added in the last sprint to handle queue management in a separate thread for the data analytics pipeline
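The Pickle handoff described above can be sketched as follows. This is a minimal illustration, assuming the Scaler, PCA and K-Means objects are scikit-learn estimators; the synthetic 10-column data, the cluster count of 4 and the helper name `predict_segment` are hypothetical and not taken from the project.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# --- Notebook side: fit on training data, then serialize the fitted objects ---
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 10))  # synthetic stand-in for the car feature matrix

scaler = StandardScaler().fit(train)
pca = PCA(n_components=3).fit(scaler.transform(train))
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(
    pca.transform(scaler.transform(train))
)

# The project writes pickle files; in-memory bytes keep this sketch self-contained.
blobs = {
    "scaler": pickle.dumps(scaler),
    "pca": pickle.dumps(pca),
    "kmeans": pickle.dumps(kmeans),
}


# --- Django side: load the fitted objects and predict, without refitting ---
def predict_segment(rows, serialized):
    """Scale, project to 3 components and assign a cluster to each row."""
    loaded_scaler = pickle.loads(serialized["scaler"])
    loaded_pca = pickle.loads(serialized["pca"])
    loaded_kmeans = pickle.loads(serialized["kmeans"])
    reduced_rows = loaded_pca.transform(loaded_scaler.transform(rows))
    return reduced_rows, loaded_kmeans.predict(reduced_rows)


new_rows = rng.normal(size=(5, 10))
reduced, segments = predict_segment(new_rows, blobs)
print(reduced.shape, segments)
```

The three objects must be applied in the same order they were fitted (scale, then project, then cluster), and pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.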
- Docker (Tested on Docker version 20.10.14)
- Docker Compose (Tested on Docker Compose version v2.5.1)
It will just take two steps to run the project after cloning it :-)
Step 1: Creates all the docker containers
./run.sh
Step 2: Populates initial data in API for Angular Dashboard to work
(For Local Dev)=> Run the Jupyter Notebook cells to populate the APIs for dashboarding
Kindly note that all builds are done in the Dockerfile,
so the installation is easy, but the build process, especially for the Angular application, will take some time as it uses a two-stage Docker image build
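Step 2 above pushes aggregated insights from the notebook to the Django REST APIs. A rough sketch of what such a call might look like is shown here, using only the standard library. The URL, port and payload shape are assumptions made for illustration; the real routes are defined by the Django project.

```python
import json
import urllib.request

# Hypothetical endpoint; check the Django project for the actual route and port.
API_URL = "http://localhost:8000/api/insights/"


def build_insight_request(payload, url=API_URL):
    """Build (but do not send) a JSON POST request carrying aggregated insights."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up example payload: number of cars per segment id.
request = build_insight_request({"segment_counts": {"0": 120, "1": 85, "2": 43}})
print(request.get_method(), request.full_url)

# To actually send it once the containers are up:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```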
cd ./jupyter-data-analysis
pip install -r requirements
jupyter notebook
There are two instances which have to be executed for Django
- Django Main Pre-requisites: MongoDB should be running locally
cd ./segmently_django
pip install -r requirements
python manage.py runserver
- Django Huey (Consumer) Pre-requisites: Redis should be running locally
cd ./segmently_django
python manage.py run_huey
Pre-requisites: NodeJS
cd ./segmently_dashboard
npm start
- K-Means Clustering algorithm based on Principal Component Analysis (which reduces 130+ features/dimensions to 3 key features/dimensions)
- End-to-end integration with Django REST APIs, which funnels insights from Jupyter to the Angular dashboard
- Use of MongoDB to keep the unstructured data coming from the analytics pipeline
- Use of Redis to queue the job task for processing the data science pipeline on a new data configuration
- Use of Pickle files to transfer trained model functions from Jupyter to Django to predict on the data set using the training fit
- Use of Docker-Compose to make the orchestration simple
- Pickle based integration on Django Huey - Consumer Threads
- Launch Configuration to support multiple variants
- Interactive graphs and query for the OEM