Integrate AI-powered OCR features into your applications
- Add new Polling method
- prefetch
- Flow to gateway conversion
- Remove CRUD operations
Merge CAREFULLY with the master branch of Jina.
- serve/runtimes/worker/request_handling.py > Added support for returning Dictionary object and not only Document
- serve/helper.py > Default GRPC options
Create folder structure
mkdir models config
Follow the installation instructions on the PyTorch website: https://pytorch.org/get-started/locally/
Install required packages with pip
$ pip install -r ./requirements/requirements.txt
Install detectron2 https://github.com/conansherry/detectron2/blob/master/INSTALL.md
Build Docker Image
DOCKER_BUILDKIT=1 docker build . -t marie-icr:1.3
DOCKER_BUILDKIT=1 docker build . -f Dockerfile -t gregbugaj/marie-icr:2.4-cuda --no-cache && docker push gregbugaj/marie-icr:2.4-cuda
docker push gregbugaj/marie-icr:2.3-cuda
DOCKER_BUILDKIT=1 docker build . -f Dockerfile -t gregbugaj/marie-icr:2.3-cuda --no-cache && docker push gregbugaj/marie-icr:2.3-cuda
docker.io/
docker stop
docker container stop $(docker container ls -aq) && docker system prune -af --volumes
cd ~/dev/marie-ai/docker-util/ && docker container stop
-v `pwd`/../cache:/opt/marie-icr/.cache:rw \
Starting in Development mode
PYTHONPATH="$PWD" python ./marie/app.py
Enable encryption
```sh
python ./app.py --enable-crypto --tls-cert ./cert.pem
```
Starting in Production mode with gunicorn
Config: [gunicorn settings](https://docs.gunicorn.org/en/stable/settings.html#settings)
gunicorn -c gunicorn.conf.py wsgi:app --log-level=debug
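The command above expects a `gunicorn.conf.py` in the working directory. A minimal sketch of such a file follows. The setting names (`bind`, `workers`, `timeout`, `loglevel`) are documented gunicorn settings, but every value, including the port, is an assumption for illustration and not the project's actual configuration.

```python
# gunicorn.conf.py -- illustrative values only, not the project's real config
import multiprocessing

# Address and port to listen on (port 5000 is an assumption)
bind = "0.0.0.0:5000"

# Assumption: each worker loads its own copy of the models, so keep the count small
workers = min(2, multiprocessing.cpu_count())

# Seconds a worker may stay silent before gunicorn restarts it
timeout = 120

# Default log level; the --log-level flag on the command line overrides it
loglevel = "debug"
```

Options passed on the command line take precedence over values in the config file.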
Activate the environment, as we used pip to install docker-compose (python -m pip install docker-compose)
source ~/environments/pytorch/bin/activate
COMPOSE_VERSION=$(curl -s https://api.github.com/repos/docker/compose/releases/latest | jq -r '.tag_name')
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p $DOCKER_CONFIG/cli-plugins
curl -SL https://github.com/docker/compose/releases/download/$COMPOSE_VERSION/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose
chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
ln -s ./config/.env.dev ./.env
docker compose down --volumes --remove-orphans && DOCKER_BUILDKIT=1 docker compose -f docker-compose.yml --project-directory . up --build --remove-orphans
Start consul server
docker compose -f ./Dockerfiles/docker-compose.yml --project-directory . up consul-server --build --remove-orphans
Start storage
docker compose --env-file ./config/.env -f ./Dockerfiles/docker-compose.s3.yml -f ./Dockerfiles/docker-compose.storage.yml --project-directory . up --build --remove-orphans
Start Marie-AI with minimal dependencies (s3, redis, consul, traefik, postgres, minio)
docker compose --env-file ./config/.env -f ./Dockerfiles/docker-compose.yml -f ./Dockerfiles/docker-compose.s3.yml -f ./Dockerfiles/docker-compose.storage.yml --project-directory . up --build --remove-orphans
Building docker container
# --no-cache
DOCKER_BUILDKIT=1 docker build . -f Dockerfile -t marie-icr:2.0 --network=host --no-cache
Building the GPU version of the framework requires PyTorch 1.10.2+cu113.
If you encounter the following warning, the installed PyTorch / CUDA build is the wrong version:
1.11.0+cu102
Using device: cuda
/opt/venv/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning:
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
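The warning fires because none of the architectures compiled into the PyTorch build match the GPU. The sketch below is a simplified version of that comparison, assuming that binaries are compatible within one major compute capability; it is not PyTorch's own code. The inputs are copied from the log above. On a live install they would come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`.

```python
from typing import Iterable, Tuple


def capability_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if any compiled sm_XY arch shares the device's major version."""
    major, _minor = capability
    compiled = [int(arch.split("_")[1]) for arch in arch_list if arch.startswith("sm_")]
    return any(sm // 10 == major for sm in compiled)


# Values taken from the warning: the RTX 3060 Laptop GPU reports sm_86
device_capability = (8, 6)
compiled_archs = ["sm_37", "sm_50", "sm_60", "sm_70"]

supported = capability_supported(device_capability, compiled_archs)
print("supported:", supported)  # prints "supported: False"
```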
DOCKER_BUILDKIT=1 docker build . -f Dockerfile -t marie-icr:2.0 --network=host --no-cache
DOCKER_BUILDKIT=1 docker build . -f Dockerfile -t gregbugaj/marie-icr:2.2-cuda --no-cache && docker push gregbugaj/marie-icr:2.2-cuda
DOCKER_BUILDKIT=1 docker build . --build-arg PIP_TAG="[standard]" -f ./Dockerfiles/gpu.Dockerfile -t marieai/marie:3.0-cuda
Install the following dependencies to ensure Docker is set up for GPU processing.
https://docs.nvidia.com/ai-enterprise/deployment-guide/dg-docker.html https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
Before continuing we need to ensure that our container is configured b
#### Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvidia/cuda:11.0-base nvidia-smi
Override the container ENTRYPOINT by using --entrypoint from the command line, and validate that the GPU works by executing nvidia-smi
docker run -it --rm --gpus all --entrypoint /bin/bash marieai/marie:3.0.31-cuda
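The same check can be scripted from inside the container. The helper below is a hypothetical sketch (the name `list_gpus` is not part of Marie-AI): it shells out to `nvidia-smi` with its CSV query flags and returns `None` when the binary is missing or fails, which usually means the container was started without `--gpus`.

```python
import shutil
import subprocess
from typing import List, Optional


def list_gpus() -> Optional[List[str]]:
    """Return one 'name, memory' line per GPU, or None if nvidia-smi is unusable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "no usable GPU detected")
```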
Remove dangling images
docker rmi -f $(docker images -f "dangling=true" -q)
Install new version of docker compose cli plugin
https://docs.docker.com/compose/install/compose-plugin/#installing-compose-on-linux-systems
Start docker compose
DOCKER_BUILDKIT=1 docker-compose up
source .env.prod && docker compose down --volumes --remove-orphans && DOCKER_BUILDKIT=1 docker compose --env-file .env.prod up -d
Cleanup containers
docker-compose down --volumes --remove-orphans
- 8500 -- Consul
- 5000 -- Traefik - Entrypoint
- 7777 -- Traefik - Dashboard
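Once the stack is up, a quick way to confirm that these ports are reachable is a plain TCP connect. This is a minimal sketch: the port numbers come from the list above, while the host `127.0.0.1` and the helper names are assumptions.

```python
import socket
from typing import Dict

# Ports from the list above; the host used below is an assumption
SERVICES = {8500: "Consul", 5000: "Traefik - Entrypoint", 7777: "Traefik - Dashboard"}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for port, name in SERVICES.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'up' if is_up else 'down'}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.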
# tests/integration/psql_storage
docker-compose --env-file .env.prod -f docker-compose.yml --project-directory . up --build --remove-orphans
## new docker compose
docker compose --env-file .env -f ./Dockerfiles/docker-compose.storage.yml up
https://hub.docker.com/_/redis https://redis.io/docs/stack/get-started/install/docker/
python -m pip install redis
```sh
docker run --name marie_redis -p 6379:6379 -d redis
docker run --rm --name marie_redis -p 6379:6379 redis
docker exec -it marie_redis sh
```
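With the container running and the `redis` Python package installed, connectivity can be checked from Python. This is a sketch under the assumption that Redis is published on `localhost:6379` as in the commands above; the helper names are hypothetical.

```python
def redis_url(host: str = "localhost", port: int = 6379, db: int = 0) -> str:
    """Build a redis:// URL for the given host, port and database index."""
    return f"redis://{host}:{port}/{db}"


def ping(url: str) -> bool:
    """Return True if the server at url answers PING."""
    import redis  # imported lazily so redis_url() works without the package

    client = redis.Redis.from_url(url, socket_connect_timeout=2)
    try:
        return bool(client.ping())
    except redis.exceptions.RedisError:
        return False


if __name__ == "__main__":
    print("redis reachable:", ping(redis_url()))
```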
black
A segmentation fault occurs with opencv-python==4.5.4.62; switching to opencv-python==4.5.4.60 fixes the issue.
connectedComponentsWithStats produces a segfault
pip install opencv-python==4.5.4.60
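To check whether an installed OpenCV build is affected without taking down the calling process, the call can be run in a child interpreter. The sketch below assumes a POSIX system, where a negative exit status means the child was killed by a signal. The tiny test image is an arbitrary choice and may not trigger the crash that real documents do.

```python
import signal
import subprocess
import sys

# Code executed in the child; it needs numpy and opencv-python installed
CV_SNIPPET = """
import numpy as np
import cv2
img = np.zeros((32, 32), dtype=np.uint8)
img[8:16, 8:16] = 255
count, labels, stats, centroids = cv2.connectedComponentsWithStats(img)
print(cv2.__version__, count)
"""


def run_isolated(code: str, timeout: float = 60.0) -> int:
    """Run code in a fresh interpreter and return its exit status."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode


def died_from_segfault(returncode: int) -> bool:
    """On POSIX, subprocess reports death by signal N as return code -N."""
    return returncode == -signal.SIGSEGV


if __name__ == "__main__":
    status = run_isolated(CV_SNIPPET)
    print("segfault" if died_from_segfault(status) else f"exit status {status}")
```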
KSQL Stream processing example KSQL
table-transformer DocumentUnderstanding [DocumentAI](https://www.microsoft.com/en-us/research/project/document-ai/)
- Implement secondary box detection method. TextFuseNet
- Implement DocFormer: End-to-End Transformer for Document Understanding DocFormer_End-to-End_Transforme
Install fairseq from source; the release version is missing convert_namespace_to_omegaconf
git clone https://github.com/pytorch/fairseq.git
cd fairseq
pip install -r requirements.txt
python setup.py build develop
- https://github.com/ShannonAI/service-streamer
- https://github.com/NVIDIA/apex
- https://github.com/pytorch/fairseq
- https://discuss.pytorch.org/t/cnn-fp16-slower-than-fp32-on-tesla-p100/12146/7
- https://discuss.pytorch.org/t/torch-cuda-amp-inferencing-slower-than-normal/123684
Fix issue
AttributeError: module 'distutils' has no attribute 'version'
python3 -m pip install setuptools==59.5.0
ImageMagick 6 policy
/etc/ImageMagick-6/policy.xml
Manually convert (burst) a multi-page TIFF into single-page TIFFs
convert *.tif -set filename:f "%[t]_%[fx:t+1]" +adjoin "%[filename:f].tif"
Load gpt2 dictionary from https://layoutlm.blob.core.windows.net/trocr/dictionaries/gpt2_with_mask.dict.txt
https://github.com/ibm-aur-nlp/PubLayNet
DocFormer: End-to-End Transformer for Document Understanding
This application uses Open Source components. You can find the source code of their open source projects along with license information in the NOTICE. We acknowledge and are grateful to these developers for their contributions to open source.
Kill a hung Docker container
ps auxw | grep $(docker container ls | grep containername | awk '{print $1}') | awk '{print $2}'
kill -9 12345678
- https://mmocr.readthedocs.io/en/latest/datasets/det.html#funsd
- https://github.com/alibaba/EasyNLP?ref=stackshare
- https://huggingface.co/spaces/rajistics/receipt_extractor/blob/main/app.py
- https://github.com/UBIAI/layoutlmv3FineTuning/blob/master/Layoutlmv3_inference/inference_handler.py
- https://powerusers.microsoft.com/t5/AI-Builder/bd-p/AIBuilder
RAY https://github.com/ray-project/ray
HAYSTACK https://github.com/deepset-ai/haystack/tree/main
docile : https://github.com/rossumai/docile/blob/ffc139e8e37505121c4b49243011ceed18653650/baselines/NER/docile_inference_NER_multilabel_layoutLMv3.py
QURATOR https://github.com/qurator-spk/eynollah
DAGSTER dagster
https://hevodata.com/signup/?step=email
https://www.marktechpost.com/2022/11/01/a-new-mlops-system-called-alaas-active-learning-as-a-service-adopts-the-philosophy-of-machine-learning-as-service-and-implements-a-server-client-architecture/ https://github.com/ocrmypdf/OCRmyPDF
https://github.com/allenai/datastore https://truss.baseten.co/reference/structure
https://docs.microsoft.com/en-us/aspnet/core/grpc/test-tools?view=aspnetcore-6.0
https://medium.com/swlh/easy-grafana-and-docker-compose-setup-d0f6f9fcec13
https://data-flair.training/blogs/spark-rdd-tutorial/
https://outerbounds.com/ https://docs.dyte.io/guides/integrating-with-webhooks
- Create volumes for
  - Torch /home/app-svc/.cache/
  - Marie /opt/marie-icr/.cache/
https://www.confluent.io/blog/prioritize-messages-in-kafka/
https://engineeringfordatascience.com/posts/pre_commit_yaml/
Auto annotation tool
https://github.com/opencv/cvat/projects/16 cvat-ai/cvat#2280
Colab notebooks
https://deci.ai/platform/ https://github.com/onepanelio/onepanel
https://github.com/jina-ai/dalle-flow https://github.com/jina-ai/clip-as-service
sudo apt purge nvidia-driver-465
sudo apt autoremove -y
sudo apt autoclean
sudo apt install nvidia-driver-525 -f
https://github.com/autogluon/autogluon/
https://developer.nvidia.com/blog/end-to-end-ai-for-nvidia-based-pcs-cuda-and-tensorrt-execution-providers-in-onnx-runtime/
https://www.educative.io/answers/what-is-the-least-connections-load-balancing-technique
- https://github.com/Ritvik19/Implemented-Data-Science/blob/main/LayoutLMv2-Document-Classification.ipynb
- https://github.com/ahmedrasheed3995/DocumentClassification
- https://www.mlexpert.io/machine-learning/tutorials/document-classification-with-layoutlmv3#easyocr
- https://github.com/AjaxMultiCommentary/ajmc/blob/0389fc6cd53514d4c988baafe2831e0623a03b37/ajmc/olr/layoutlm/layoutlm.py#L20
https://github.com/fioresxcat/VAT_245/tree/fa526ac7e2ce9bb392ca66bd86305d69caee7a86
PDF-Extract-Kit https://github.com/opendatalab/PDF-Extract-Kit?tab=readme-ov-file
https://cloud.google.com/document-ai
LLaMA2 fine-tuning https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/
ACT Testing
act -P ubuntu-20.04=catthehacker/ubuntu:act-20.04 -j build-and-push-latest-docs --secret-file act.secrets -e event.json -W .github/workflows/force-docs-build.yml --insecure-secrets
event.json
{
  "inputs": {
    "release_token": "ghp_ABC",
    "SOME_VALUE": "ABC"
  }
}
act.secrets
MARIE_CORE_RELEASE_TOKEN=ghp_ABC
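Both input files can be generated rather than edited by hand, which keeps the token in one place. This is a small sketch: the file names and keys mirror the examples above, the helper name is hypothetical, and `ghp_ABC` remains a placeholder rather than a real token.

```python
import json
import tempfile
from pathlib import Path


def write_act_files(directory: str, token: str, some_value: str = "ABC") -> None:
    """Write event.json and act.secrets in the layout shown above."""
    base = Path(directory)
    event = {"inputs": {"release_token": token, "SOME_VALUE": some_value}}
    (base / "event.json").write_text(json.dumps(event, indent=2) + "\n")
    (base / "act.secrets").write_text(f"MARIE_CORE_RELEASE_TOKEN={token}\n")


# Demonstration in a throwaway directory; use the repository root for a real run
with tempfile.TemporaryDirectory() as tmp:
    write_act_files(tmp, "ghp_ABC")
    event = json.loads((Path(tmp) / "event.json").read_text())
    secrets = (Path(tmp) / "act.secrets").read_text()

print(event["inputs"]["release_token"])  # prints "ghp_ABC"
```

Keep `act.secrets` out of version control when it holds a real token.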
pydantic 1.10.15
pydantic_core 2.10.1
git filter-repo --mailmap mailmap --force
Upgrade pydantic to the latest version
pip install pydantic --force-reinstall
Upgrade FastAPI to the latest version, or a version above 0.100.2, to fix the issue with pydantic
pip install fastapi --force-reinstall
Install bump-pydantic and run it via the bump.sh script to convert all pydantic models to the latest version
pip install bump-pydantic
./bump.sh
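The version pair listed earlier (pydantic 1.10.15 alongside pydantic_core 2.10.1) is a mixed install, since pydantic 1.x does not depend on pydantic_core. After upgrading, a check along these lines can confirm the environment is consistent. The helper names are hypothetical and the check is deliberately coarse: it only compares major versions.

```python
from importlib import metadata
from typing import Optional


def installed(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def mixed_pydantic(pydantic_version: Optional[str], core_version: Optional[str]) -> bool:
    """True when pydantic 1.x is installed next to pydantic_core."""
    if pydantic_version is None or core_version is None:
        return False
    return int(pydantic_version.split(".")[0]) < 2


# The pair from the notes above is detected as mixed
print(mixed_pydantic("1.10.15", "2.10.1"))  # prints "True"

# Against the live environment
print(mixed_pydantic(installed("pydantic"), installed("pydantic-core")))
```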
To patch JINA AI apply changes from the commits.