Commit
chg ! dedup engine
vitali-yanushchyk-valor committed Oct 7, 2024
1 parent e8eee8f commit 411d5c0
Showing 12 changed files with 58 additions and 5 deletions.
3 changes: 3 additions & 0 deletions .dockerignore
@@ -0,0 +1,3 @@
*
!pyproject.toml
!pdm.lock
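The `.dockerignore` above excludes everything (`*`) and then re-includes `pyproject.toml` and `pdm.lock`. As a result, the Docker build context holds only the two files that the Dockerfile's `COPY` step needs. Docker's documented rule is that the last matching line decides whether a file is included. The sketch below approximates that rule in Python using `fnmatch`. It is a simplification, because Docker uses Go-style matching plus `**`. `is_excluded` is a hypothetical helper, not part of Docker or this repository.

```python
from fnmatch import fnmatch

# Patterns from the .dockerignore added in this commit, in file order.
PATTERNS = ("*", "!pyproject.toml", "!pdm.lock")


def is_excluded(path: str, patterns: tuple[str, ...] = PATTERNS) -> bool:
    """Approximate .dockerignore matching: the last pattern that matches decides.

    Simplification: fnmatch stands in for Docker's Go-based matcher, which is
    adequate for the top-level names checked below.
    """
    excluded = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatch(path, glob):
            # A plain pattern excludes; a "!" pattern re-includes.
            excluded = not negated
    return excluded


candidates = ["pyproject.toml", "pdm.lock", "README.md", "docs", ".git"]
build_context = [name for name in candidates if not is_excluded(name)]
print(build_context)  # ['pyproject.toml', 'pdm.lock']
```

The docs sources never enter the image. Instead, `compose.yml` mounts the working tree at `/app` when the container starts, so `mkdocs serve` reads them from the host.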
5 changes: 3 additions & 2 deletions .gitignore
@@ -1,6 +1,7 @@
.*
~*
__pycache__
docs-output

!.github

__pycache__
!.dockerignore
4 changes: 4 additions & 0 deletions README.md
@@ -17,3 +17,7 @@ add in your .envrc
#### Start
$ mkdocs build
$ mkdocs serve

#### Using Docker compose

$ docker compose up
10 changes: 10 additions & 0 deletions compose.yml
@@ -0,0 +1,10 @@
services:

  mkdocs:
    build:
      context: .
      dockerfile: docker/Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - .:/app
19 changes: 19 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,19 @@
FROM python:3.12-slim

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
git \
libcairo2-dev libfreetype6-dev libffi-dev libjpeg-dev libpng-dev libz-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY pyproject.toml pdm.lock /app/

RUN pip install -U pip pdm \
&& pdm venv create --name for-dev 3.12 \
&& pdm sync --venv for-dev \
&& rm -rf ~/.cache/pip ~/.cache/pdm

CMD ["pdm", "run", "mkdocs", "serve", "-a", "0.0.0.0:8000", "--no-strict"]
1 change: 1 addition & 0 deletions docs/components/hde/deduplication_description.md
@@ -0,0 +1 @@
It provides users with powerful capabilities to identify and remove duplicate records within the system, ensuring that data remains clean, consistent, and reliable.
1 change: 1 addition & 0 deletions docs/components/hde/development.md
@@ -3,6 +3,7 @@
To develop the service locally, you can utilize the provided `compose.yml` file. This configuration file defines all the necessary services, including the primary application and its dependencies, to create a consistent development environment. By using **Docker Compose**, you can effortlessly spin up the entire application stack, ensuring that all components work seamlessly together.

To build and start the service, along with its dependencies, run the following command:

docker compose up --build


5 changes: 5 additions & 0 deletions docs/components/hde/did/workflow.md
@@ -1,3 +1,8 @@
---
tags:
- Deduplication
---

The Image Processing and Duplicate Detection workflow is designed to provide reliable face detection, recognition, and duplicate detection by leveraging a pre-trained deep learning model.

## Inference Mode Operation
3 changes: 2 additions & 1 deletion docs/components/hde/index.md
@@ -1,7 +1,8 @@
# Deduplication

Deduplication Engine component of the HOPE ecosystem. It provides users with powerful capabilities to identify and remove duplicate records within the system, ensuring that data remains clean, consistent, and reliable.
Deduplication Engine component of the HOPE ecosystem.

--8<-- "components/hde/deduplication_description.md"

## Repository

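After this change, `index.md` and the glossary entry share a single copy of the description. Both pull in `deduplication_description.md` through a `--8<--` line, which is the include syntax of the PyMdown Extensions Snippets extension. That extension is assumed to be enabled in the site's `mkdocs.yml`, which is not part of this diff. Below is a simplified Python stand-in for what the marker line does: it replaces each marker line with the contents of the referenced file, resolved against a base path. The base path is assumed here to be `docs/`. `expand_snippets` is hypothetical and ignores the extension's other features.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# One-line Snippets marker, e.g.: --8<-- "components/hde/deduplication_description.md"
SNIPPET_LINE = re.compile(r'^\s*--8<--\s+"([^"]+)"\s*$')


def expand_snippets(text: str, base_path: str | Path) -> str:
    """Replace each one-line snippet marker with the referenced file's contents.

    Simplified stand-in for pymdownx.snippets: no nested includes, line ranges,
    or block syntax. Paths are resolved against base_path (assumed to be docs/).
    """
    base = Path(base_path)
    lines = []
    for line in text.splitlines():
        match = SNIPPET_LINE.match(line)
        if match:
            included = (base / match.group(1)).read_text(encoding="utf-8")
            lines.append(included.rstrip("\n"))
        else:
            lines.append(line)
    return "\n".join(lines)


# Example: expand a page shaped like docs/components/hde/index.md.
with tempfile.TemporaryDirectory() as docs:
    shared = Path(docs, "components", "hde", "deduplication_description.md")
    shared.parent.mkdir(parents=True)
    shared.write_text("It provides users with powerful capabilities.\n", encoding="utf-8")
    page = '# Deduplication\n\n--8<-- "components/hde/deduplication_description.md"\n'
    print(expand_snippets(page, docs))
```

When the text is kept in one file, an edit to the description reaches both pages on the next build.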
7 changes: 6 additions & 1 deletion docs/components/hde/setup.md
@@ -1,3 +1,8 @@
---
tags:
- Deduplication
---

## Prerequisites

This project utilizes [PDM](https://pdm-project.org/) as the package manager for managing Python dependencies and environments.
@@ -78,7 +83,7 @@ This backend is used for storing locally downloaded DNN model files and encoded
##### FILE_STORAGE_DNN
This backend is dedicated to storing DNN model files. Ensure that the following two files are present in this storage:

1. *deploy.prototxt*: Defines the model architecture.
1. *deploy.prototxt.txt*: Defines the model architecture.
2. *res10_300x300_ssd_iter_140000.caffemodel*: Contains the pre-trained model weights.

The current process involves downloading files from a [GitHub repository](https://github.com/sr6033/face-detection-with-OpenCV-and-DNN) and saving them to this specific Azure Blob Storage using the command `django-admin upgrade --with-dnn-setup` or the specialized `django-admin dnnsetup` command.
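`FILE_STORAGE_DNN` must contain the two model files listed above. This commit renames the architecture file from `deploy.prototxt` to `deploy.prototxt.txt`. The sketch below is a minimal pre-flight check. It assumes the files have been mirrored to a local directory, because the service itself reads them from Azure Blob Storage and this sketch does not touch that storage. `missing_dnn_files` and `REQUIRED_DNN_FILES` are hypothetical names.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Files that setup.md (as of this commit) requires in FILE_STORAGE_DNN.
REQUIRED_DNN_FILES = (
    "deploy.prototxt.txt",                       # model architecture
    "res10_300x300_ssd_iter_140000.caffemodel",  # pre-trained weights
)


def missing_dnn_files(storage_dir: str | Path) -> list[str]:
    """Return the required model files that are not present in storage_dir.

    Hypothetical helper: it inspects a local directory, not Azure Blob Storage.
    """
    root = Path(storage_dir)
    return [name for name in REQUIRED_DNN_FILES if not (root / name).is_file()]


# Example: an empty directory is missing both files.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_dnn_files(tmp))
```

If the returned list is not empty, the documented way to populate the storage is `django-admin upgrade --with-dnn-setup` or `django-admin dnnsetup`.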
3 changes: 2 additions & 1 deletion docs/components/hde/troubleshooting.md
@@ -2,4 +2,5 @@ If you encounter issues while running the service, the **admin panel** can be a

To efficiently track and monitor errors within the application, **Sentry** is integrated as the primary tool for error logging and alerting.

For Sentry to work correctly, ensure that the **SENTRY_DSN** environment variable is set.
!!! warning "Sentry environment"
    For Sentry to work correctly, ensure that the **SENTRY_DSN** environment variable is set.
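Sentry reporting depends on `SENTRY_DSN`. A startup check can flag a missing value so that errors do not go unreported without notice. The sketch below uses only the standard library. `sentry_dsn_or_warn` is a hypothetical helper. Its return value would be passed to whatever Sentry initialization the service performs, which this diff does not show.

```python
from __future__ import annotations

import os
import warnings
from collections.abc import Mapping


def sentry_dsn_or_warn(env: Mapping[str, str] = os.environ) -> str | None:
    """Return SENTRY_DSN from env, or warn and return None if it is unset or blank.

    Hypothetical helper: the service's actual Sentry setup is not part of this diff.
    """
    dsn = env.get("SENTRY_DSN", "").strip()
    if not dsn:
        warnings.warn(
            "SENTRY_DSN is not set; errors will not be reported to Sentry.",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
    return dsn
```

Passing a plain dict as `env` makes the check easy to exercise without changing the real process environment.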
2 changes: 2 additions & 0 deletions docs/glossary/terms/process.md
@@ -26,4 +26,6 @@ Sometimes used as a term pre-intervention to talk about who we are targeting.</p

## Deduplication

--8<-- "components/hde/deduplication_description.md"

#
