This standalone process application is developed in Kotlin using Camunda Workflow and Spring Boot, with JUnit 5 unit and integration test cases.
For this use case, we perform OCR on a document using the OCR Space API.
The OCR Space API provides a simple way of parsing images and multi-page PDF documents (PDF OCR). The extracted text is returned in JSON format.
To get a free API key, you can register with your email on the OCR Space site.
The free OCR API plan we use to demo the application has:
- a rate limit of 500 requests within one day per IP address to prevent accidental spamming
- a limit of 25,000 requests per month
- a max file size of 1 MB
- a max of 3 pages can be OCRed per document
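The file size and page limits above can be checked before a document is sent to the API. The following is a minimal sketch, not code from this project: the names `maxFileSizeBytes`, `maxPages` and `checkFreePlanLimits` are hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes.

```kotlin
// Free-plan limits from the list above. These names are illustrative only.
// Assumption: 1 MB is interpreted as 1024 * 1024 bytes.
val maxFileSizeBytes: Long = 1L * 1024 * 1024
val maxPages: Int = 3

// Returns a list of violations; an empty list means the document fits the free plan.
fun checkFreePlanLimits(fileSizeBytes: Long, pageCount: Int): List<String> {
    val problems = mutableListOf<String>()
    if (fileSizeBytes > maxFileSizeBytes) {
        problems += "file is $fileSizeBytes bytes, limit is $maxFileSizeBytes"
    }
    if (pageCount > maxPages) {
        problems += "document has $pageCount pages, limit is $maxPages"
    }
    return problems
}
```

Calling `checkFreePlanLimits(file.length(), pages)` before the upload would let the application reject an oversized document locally instead of spending one of the daily requests on it.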
You can check the API performance and uptime at the API status page.
Note: The API key that you received from OCR Space has to be set in the application.yaml file:
ocr-space.key = your_api_key
As per Wikipedia, Camunda Platform is an open-source workflow and decision automation platform. Camunda Platform ships with tools for creating workflow and decision models, operating deployed models in production, and allowing users to execute workflow tasks assigned to them.
It is developed in Java and released as open-source software under the terms of the Apache License. It provides:
- Business Process Model and Notation (BPMN) standard compliant workflow engine
- Decision Model and Notation (DMN) standard compliant decision engine
These can be embedded in Java applications and used from other languages via REST. For more details on Camunda Workflow usage, you can refer to the blogs below.
Below is the workflow that we use to demonstrate how Camunda Workflow performs document OCR with the OCR Space API, developed in Kotlin.
Spring-Boot: (v2.4.3)
Camunda Platform: (v7.15.0)
Camunda Platform Spring Boot Starter: (v7.15.0)
On systems running the API, it is recommended to use the port below for starting an instance of the API.
10101 - default API listening port
The section below covers the high-level tasks required to configure and deploy the API jar using Apache Maven.
- Clone the repository on your local system. By default, jars are taken from the Maven Central Repository.
- Update properties in logback-spring.xml and application.yaml, if applicable
- Build with the Maven task mvnw clean install
- Copy the .jar file from /target/ to your deployment-directory
- Environment-specific application.yaml & logback-spring.xml are to be modified and placed in deployment-directory along with the .jar, if applicable
- Start execution with java -jar camunda-ocr-<version>.jar
- Logs are generated in the deployment-directory/logs folder with file name camunda-ocr-logger.log, or as mentioned in logback-spring.xml
- You can follow the above steps to compile, build, package and install the jar using Maven.
- Use the command mvnw clean install to build and install the jar.
- Use the command mvnw spring-boot:run to run the Spring Boot application.
- Alternatively, you can navigate to the file Application.java and start the application.
- If everything works, you should see the log below in your console
2021-08-17 22:17:28,936 INFO [main] org.springframework.boot.web.embedded.tomcat.TomcatWebServer: Tomcat started on port(s): 10101 (http) with context path '/camunda-ocr'
2021-08-17 22:17:28,960 INFO [main] com.example.workflow.Application: Started Application in 21.931 seconds (JVM running for 23.514)
2021-08-17 22:17:28,966 INFO [main] org.camunda.bpm.engine.jobexecutor: ENGINE-14014 Starting up the JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor].
2021-08-17 22:17:28,969 INFO [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor: ENGINE-14018 JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor] starting to acquire jobs
- Refer to running-your-application for more help with running a Spring Boot application.
As we have used the custom type file, the default Camunda form does not allow triggering the process from Tasklist. However, we can use Cockpit to monitor the process.
- On your local machine, you can access Camunda using the Camunda Home Page URL
- Credentials are configured in the application.yaml file under the key camunda.bpm.admin-user. Default credentials are username: demo, password: demo
Cockpit helps us visualise which step our process is at. It also gives us the process instance id along with other variables saved in the process.
- You can now click on Running Process Instance > Ocr Document Process Definition to view the running processes.
Note: You may not see the running process, as it completes very quickly.
- URL to access the Swagger OpenAPI Specification
- Click on Try it out in the right corner to enable the API.
- Click Browse to upload a file.
- Click Execute to send the request. You can check the response, which is produced by triggering a Camunda workflow.
- If you want to read more about Swagger, you can go through the blog
In addition, we have added a Postman collection which can be used to test the API in your local environment.
You need an OCR Space key as described in the section above, and a document which is less than 1 MB in size and no more than 3 pages.
- In the end, all the steps we performed are tracked in the console logs. If you filter the logs using the keyword workflow-service-info, you will see the output below
Logs for End to End Process
2021-08-17 23:35:35,534 INFO [http-nio-10101-exec-1] com.example.workflow.controller.DocumentOcrController: Timestamp:1629223535534:workflow-service-info:OCR Request received:File: images.jpg
2021-08-17 23:35:35,545 INFO [http-nio-10101-exec-1] com.example.workflow.service.DocumentOcrService: Timestamp:1629223535545:workflow-service-info:Prepare & Send Request:Process instance id b8765ce0-ff85-11eb-8bbb-dc7196c5d636
2021-08-17 23:35:37,670 INFO [http-nio-10101-exec-1] com.example.workflow.service.DocumentOcrService: Timestamp:1629223537670:workflow-service-info:Document OCR:Response Data received
2021-08-17 23:35:37,685 INFO [http-nio-10101-exec-1] com.example.workflow.service.ProcessOcrResponse: Timestamp:1629223537685:workflow-service-info:Prepare & Cleanup Response:Process instance id b8765ce0-ff85-11eb-8bbb-dc7196c5d636
2021-08-17 23:35:37,707 INFO [http-nio-10101-exec-1] com.example.workflow.controller.DocumentOcrController: Timestamp:1629223537707:workflow-service-info:Camunda Workflow Completed:Instance Id: b8765ce0-ff85-11eb-8bbb-dc7196c5d636
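The keyword filter described above can also be applied programmatically. The sketch below is not part of the project: `WorkflowLogEntry` and `parseWorkflowLine` are hypothetical names, and the parser assumes every relevant line carries the `Timestamp:<epoch millis>:workflow-service-info:<step>:<detail>` layout seen in the sample logs.

```kotlin
// One parsed workflow log entry. The class and field names are illustrative.
data class WorkflowLogEntry(val epochMillis: Long, val step: String, val detail: String)

// Keyword used in the logs above to mark workflow steps.
val workflowKeyword = "workflow-service-info"

// Returns the parsed entry, or null when the line is not a workflow-service-info line.
fun parseWorkflowLine(line: String): WorkflowLogEntry? {
    val marker = "Timestamp:"
    val start = line.indexOf(marker)
    if (start < 0) return null
    // Split into at most 4 parts so that colons inside the detail are preserved.
    val parts = line.substring(start + marker.length).split(":", limit = 4)
    if (parts.size < 3 || parts[1] != workflowKeyword) return null
    val millis = parts[0].toLongOrNull() ?: return null
    return WorkflowLogEntry(millis, parts[2], parts.getOrElse(3) { "" }.trim())
}
```

Mapping the console output through `parseWorkflowLine` and dropping the nulls leaves only the workflow steps. Subtracting the first entry's `epochMillis` from the last one's gives the end-to-end duration, which is 2173 ms for the run shown above.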
Thus, we have implemented Document OCR using Kotlin and Camunda Workflow!!
To deploy the API as a Docker container, refer to Docker-Image-Deployment.