Dockerized pyJedAI for integration into the KLMS.
Find all source code here.
Find all documentation here.
pyJedAI is a Python framework that aims to offer both expert and novice users robust and fast solutions for multiple types of Entity Resolution problems. It is built on state-of-the-art Python frameworks. pyJedAI constitutes the sole open-source Link Discovery tool capable of exploiting the latest breakthroughs in Deep Learning and NLP techniques that are publicly available through the Python data science ecosystem. This applies to both blocking and matching, ensuring high time efficiency, high scalability and high effectiveness without requiring any labelled instances from the user.

Example input:

    {
        "input": [
            ...
        ],
        "parameters": {
            "separator": ",",
            "id_column_name_1": "Unnamed: 0",
            "vectorizer": "st5",
            "similarity_search": "faiss",
            "top_k": 1,
            "similarity_threshold": 0.9
        },
        "minio": {
            "endpoint_url": "XXXXXXXXX",
            "id": "XXXXXXXXX",
            "key": "XXXXXXXXX",
            "bucket": "XXXXXXXXX"
        }
    }

Example output:

    {
        "message": "pyJedAI project executed successfully!",
        "output": [
            {
                "name": "List of predicted duplicates",
                "path": null
            }
        ],
        "metrics": {
            "f1": 6.294964028776979,
            "precision": 97.22222222222221,
            "recall": 3.2527881040892193
        },
        "status": 200
    }
Parameters:
- "separator": Separator (delimiter) used in the input files
- "id_column_name_1": Column containing the entity ids
- "vectorizer": Language model
- "similarity_search": Similarity search framework
- "top_k": Number of nearest neighbours (NNs) to retrieve
- "similarity_threshold": Similarity threshold for determining duplicates
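To make "top_k" and "similarity_threshold" concrete, here is a brute-force sketch of nearest-neighbour candidate generation. It is not pyJedAI's implementation: plain-Python cosine similarity over toy vectors stands in for the "st5" language model and the FAISS index, the helper names `cosine` and `nn_candidate_pairs` are hypothetical, and it assumes the threshold is applied to each retrieved neighbour.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nn_candidate_pairs(left, right, top_k=1, similarity_threshold=0.9):
    """Hypothetical brute-force stand-in for embedding + similarity search.

    left, right: dicts mapping entity id -> embedding vector.
    For each entity in `left`, take its `top_k` most similar entities in
    `right` and keep a pair only if its similarity reaches the threshold.
    """
    pairs = []
    for lid, lvec in left.items():
        # Score every right-hand entity, most similar first.
        scored = sorted(
            ((cosine(lvec, rvec), rid) for rid, rvec in right.items()),
            key=lambda item: item[0],
            reverse=True,
        )
        for sim, rid in scored[:top_k]:
            if sim >= similarity_threshold:
                pairs.append((lid, rid, sim))
    return pairs


# Toy 2-d "embeddings"; a real run would use the configured language model.
left = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
right = {"x": [0.99, 0.1], "y": [0.5, 0.5]}
print([(lid, rid) for lid, rid, _ in nn_candidate_pairs(left, right)])  # [('a', 'x')]
```

In this sketch, raising top_k lets each entity contribute more candidates, while raising similarity_threshold discards weaker ones; a similarity search framework such as FAISS typically replaces the brute-force loop with an index so the search scales to large datasets.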
Metrics (reported as percentages):
- "f1": F1 score
- "precision": Precision
- "recall": Recall
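The metrics are related: F1 is the harmonic mean of precision and recall, 2PR / (P + R). The sketch below reads an output document with the standard library and recomputes F1 from precision and recall as a consistency check. The helper names `f1_score` and `parse_output` are ours, and treating any "status" other than 200 as a failed run is an assumption made by analogy with HTTP status codes.

```python
import json
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def parse_output(text):
    """Return the metrics of an output document shaped like the example.

    Assumption: a status other than 200 means the run failed.
    """
    result = json.loads(text)
    if result.get("status") != 200:
        raise RuntimeError(f"pyJedAI run failed: {result.get('message')}")
    metrics = result["metrics"]
    # The reported F1 should match the one implied by precision and recall.
    expected = f1_score(metrics["precision"], metrics["recall"])
    if not math.isclose(metrics["f1"], expected, rel_tol=1e-6):
        raise ValueError(f"inconsistent F1: {metrics['f1']} vs {expected}")
    return metrics


# The example output from this README.
example = """{"message": "pyJedAI project executed successfully!",
 "output": [{"name": "List of predicted duplicates", "path": null}],
 "metrics": {"f1": 6.294964028776979, "precision": 97.22222222222221,
             "recall": 3.2527881040892193},
 "status": 200}"""
print(parse_output(example)["f1"])  # 6.294964028776979
```

The example also shows how the harmonic mean behaves: with precision above 97% but recall near 3%, F1 stays close to the smaller of the two values.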
Install the latest version of pyJedAI (requires Python >= 3.8):
pip install pyjedai
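To confirm the installation from Python, the standard `importlib.metadata` module (Python 3.8+) can report the installed version of the `pyjedai` distribution installed by the command above. A minimal sketch; the helper name `pyjedai_version` is ours.

```python
import sys

# pyJedAI (and importlib.metadata) need Python 3.8 or newer.
if sys.version_info < (3, 8):
    raise RuntimeError("pyJedAI requires Python >= 3.8")

from importlib.metadata import PackageNotFoundError, version


def pyjedai_version():
    """Return the installed pyjedai version string, or None if it is missing."""
    try:
        return version("pyjedai")
    except PackageNotFoundError:
        return None


print("pyjedai", pyjedai_version() or "is not installed")
```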
More on PyPI.
Set up locally:
git clone
Then go to the root directory and install the package:
cd pyJedAI
pip install .
The image is available on Docker Hub. Alternatively, clone this repository and build the image from its root directory:
docker build --no-cache -t stelar_pyjedai .
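To execute it, mount a local directory for the logs and one for the data, and pass the input and output JSON files: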
docker run -v <local-path-with-logs>:/app/logs/ -v <local-path-with-data>:/app/data/ stelar_pyjedai:latest input.json output.json
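When the container is launched from a Python workflow, the same command can be assembled as an argument list and passed to `subprocess.run`. This is a sketch with hypothetical helper names (`docker_run_command`, `run_pyjedai`); the image tag and the input.json/output.json arguments are copied from the command above. Host directories are resolved to absolute paths because `-v` treats a bare name such as `logs` as a named volume rather than a host directory.

```python
import subprocess
from pathlib import Path


def docker_run_command(logs_dir, data_dir, image="stelar_pyjedai:latest",
                       input_file="input.json", output_file="output.json"):
    """Build the `docker run` argument list used in this README."""
    logs = Path(logs_dir).resolve()  # absolute host path for the logs mount
    data = Path(data_dir).resolve()  # absolute host path for the data mount
    return [
        "docker", "run",
        "-v", f"{logs}:/app/logs/",
        "-v", f"{data}:/app/data/",
        image, input_file, output_file,
    ]


def run_pyjedai(logs_dir, data_dir, **kwargs):
    """Run the container; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(docker_run_command(logs_dir, data_dir, **kwargs), check=True)
```

Passing a list rather than a single shell string avoids quoting problems when the local paths contain spaces.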
Released under the Apache-2.0 license (see LICENSE.txt).