Binary classification of Sber Avtopodpiska website visitors' interactions for predefined target actions. This project contains a full pipeline for data preparation and model training, as well as model deployment as an API endpoint. It also deploys supporting services: a database that stores the initial data and training results (the initial data is not included in the repository, but it is part of the db image), a scalable API endpoint, a dashboard for visualizing training results, and services for collecting and visualizing database and endpoint performance metrics.
```
Sber-Avtopodpiska
├─ .env
├─ .gitignore
├─ .pre-commit-config.yaml
├─ assets
│  └─ services.svg
├─ data
│  ├─ grafana-storage
│  └─ ru_cities.csv
├─ db-init
│  ├─ 00-postgres-init.sh
│  ├─ 01-init.sql
│  └─ 02-init.sh
├─ dev
│  ├─ dashboard
│  │  ├─ app.py
│  │  ├─ AppData.py
│  │  ├─ assets
│  │  │  └─ style.css
│  │  ├─ callbacks.py
│  │  ├─ Config.py
│  │  ├─ IdHolder.py
│  │  ├─ layout.py
│  │  ├─ utils.py
│  │  └─ wsgi.py
│  └─ train
│     ├─ config.py
│     ├─ db.py
│     ├─ main.py
│     ├─ metrics.py
│     ├─ ModelWrapper.py
│     ├─ model_config.json
│     ├─ Objectives.py
│     ├─ query.sql
│     └─ train.py
├─ docker-compose.yaml
├─ Dockerfile.api
├─ Dockerfile.base-python
├─ Dockerfile.dashboard
├─ Dockerfile.db
├─ Dockerfile.ml
├─ local
│  ├─ api.py
│  ├─ main.py
│  ├─ ModelWrapper.py
│  ├─ model.pkl
│  ├─ model_config.json
│  └─ train.py
├─ notebooks
│  ├─ EDA.ipynb
│  ├─ Model Selection.ipynb
│  └─ Preprocessing.ipynb
├─ prod
│  └─ endpoint
│     ├─ api.py
│     ├─ Config.py
│     ├─ ModelWrapper.py
│     └─ train.py
├─ prometheus.yaml
├─ README.md
└─ wait-for-it.sh
```
The Docker setup requires approximately 15 GB of RAM to run all services simultaneously, or 5.8 GB of RAM to run db + dev-train (the most memory-consuming pair; the exact value depends heavily on the training settings: model, model parameters, and resampler).
Run the following command in the root directory of the project at least once to train the model and save it to the database:

```
docker-compose up db dev-train
```

Additionally, you can include the following services in the command:
- traefik
- adminer
- grafana
- postgres-exporter
- prometheus
After that, you can start the following services:
- dev-dashboard
- endpoint
Alternatively, go to the `local` directory and run the following command to initiate the training process:

```
python main.py
```

Consider putting the respective data files (`ga_hits.csv`, `ga_sessions.csv`) under the `data` directory beforehand.
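Before kicking off a long local run, it can help to confirm those files are actually in place. A minimal pre-flight sketch (the `missing_inputs` helper and `REQUIRED_INPUTS` names are hypothetical, not part of the repository):

```python
from pathlib import Path

# Hypothetical pre-flight check: the local training script expects the raw
# Sber Avtopodpiska exports under the data/ directory.
REQUIRED_INPUTS = ("ga_hits.csv", "ga_sessions.csv")

def missing_inputs(data_dir="data"):
    """Return the names of required CSV files absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_INPUTS if not (root / name).is_file()]
```

If `missing_inputs()` returns an empty list, `python main.py` can proceed.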
You can run the API locally with the following command:

```
python -m uvicorn api:app --proxy-headers --host 127.0.0.1 --port 80
```
The API accepts the following requests:

- `GET /`, `/status` - return the endpoint status. If running in a container, additionally return the container name.
- `GET /score` - for the local endpoint only. Returns the ROC AUC score of the model.
- `POST /predict` - returns predictions for one or more items. Each item should contain `utm_*`, `device_*` and `geo_*` data. The accepted format is a dict with an `items` key that contains an array of dicts, where each dict represents one item. Example:
```json
{
  "items": [
    {
      "utm_source": false,
      "utm_medium": false,
      "utm_campaign": "isYoUwVPnRHJ",
      "utm_adcontent": "JNHcPlZPxEM",
      "utm_keyword": null,
      "device_category": "mobile",
      "device_os": null,
      "device_brand": "Nokia",
      "device_model": null,
      "device_screen_resolution": "412x823",
      "device_browser": "Chrome",
      "geo_country": "Russia",
      "geo_city": "Stavropol"
    }
  ]
}
```
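The same payload can be sent from Python; a minimal client sketch (the `predict` helper is an illustration, the `api.localhost` host comes from the Docker setup above, and the request only succeeds once the endpoint service is running):

```python
import json
import urllib.request

# Request body matching the schema shown above.
payload = {
    "items": [
        {
            "utm_source": False,
            "utm_medium": False,
            "utm_campaign": "isYoUwVPnRHJ",
            "utm_adcontent": "JNHcPlZPxEM",
            "utm_keyword": None,
            "device_category": "mobile",
            "device_os": None,
            "device_brand": "Nokia",
            "device_model": None,
            "device_screen_resolution": "412x823",
            "device_browser": "Chrome",
            "geo_country": "Russia",
            "geo_city": "Stavropol",
        }
    ]
}

def predict(body, base_url="http://api.localhost:80"):
    """POST the body to /predict and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For the local uvicorn setup, swap `base_url` for `http://127.0.0.1:80`.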
- ML - service for training models, making predictions on test data and saving models and metrics to a database.
- Dev Dashboard - service for visualizing train results. Available at http://dev-dashboard.localhost:8050.
(Dashboard example)
- Endpoint - service for making predictions on new data. Available at http://api.localhost:80.
- Prometheus - service for collecting metrics from services. Available at http://prometheus.localhost:9090.
- Grafana - service for visualizing metrics from database and API. Available at http://grafana.localhost:3000.
- DB - service for storing data. Available at http://db.localhost:5432.
- Adminer - service for database management. Available at http://adminer.localhost:8090.
- Postgres-exporter - service for collecting metrics from a database.
- Traefik - service for routing requests to services and load balancing the API. In addition, it allows collecting metrics from the API. Available at http://traefik.localhost:8080.