PDF2Text OCR Tool with Image to PDF and Image OCR

A simple, free, and easy-to-use tool for converting scanned PDF files and images to text using Optical Character Recognition (OCR). It runs entirely in the browser and processes files locally, letting you extract text from PDF documents and images and convert images into PDFs.

Features

OCR-powered PDF to Text Conversion: Extract text from scanned PDF files using Tesseract.js.
Image OCR: Extract text from images (JPG, PNG, etc.) using OCR technology.
Multi-language Support: Recognizes text in several languages, including English, Arabic, Spanish, French, and others.
Image to PDF Conversion: Convert images (JPG, PNG, etc.) into a PDF file.
Downloadable Output: Extracted text can be downloaded as a PDF file or plain text file.
Copy to Clipboard: The extracted text can be copied to the clipboard for easy pasting.
Local Processing: All processing happens in the browser, so your files are not uploaded to a server.
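For Image to PDF Conversion, each image has to be placed on a PDF page without distortion. The following is a minimal sketch of one common approach, not the tool's actual code. It assumes an A4 portrait page (210 × 297 mm, which is jsPDF's default format and unit) and a fixed 10 mm margin. It scales the image to fit while preserving its aspect ratio, then centers it. The name `fitImageToPage` and the margin value are hypothetical. The returned box could be passed to jsPDF's `addImage(imageData, format, x, y, width, height)`.

```javascript
// Hypothetical helper: fit an image of imgW x imgH pixels inside a page,
// preserving its aspect ratio and centering it. Sizes are in millimetres
// (jsPDF's default unit); A4 portrait and a 10 mm margin are assumptions.
function fitImageToPage(imgW, imgH, pageW = 210, pageH = 297, margin = 10) {
  if (!(imgW > 0 && imgH > 0)) {
    throw new Error("Image dimensions must be positive");
  }
  const maxW = pageW - 2 * margin;
  const maxH = pageH - 2 * margin;
  // Take the smaller scale factor so the whole image fits inside the margin box.
  const scale = Math.min(maxW / imgW, maxH / imgH);
  const width = imgW * scale;
  const height = imgH * scale;
  // Center the scaled image on the page.
  return {
    x: (pageW - width) / 2,
    y: (pageH - height) / 2,
    width,
    height,
  };
}

// Usage with jsPDF in the browser (not run here):
// const doc = new jsPDF(); // defaults: portrait, millimetres, A4
// const box = fitImageToPage(img.naturalWidth, img.naturalHeight);
// doc.addImage(img, "JPEG", box.x, box.y, box.width, box.height);
```

Choosing the smaller of the two scale factors is what keeps both wide and tall images inside the printable area. For several images, each one would typically go on its own page.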

Technologies

Tesseract.js: A JavaScript port of the Tesseract OCR engine that runs in the browser.
pdf.js: A PDF rendering engine, used here to render PDF pages to images for OCR.
jsPDF: A library to generate downloadable PDFs.
Tailwind CSS: A utility-first CSS framework for modern web design.
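pdf.js measures pages in PDF points (1/72 inch). The `scale` passed to `page.getViewport({ scale })` therefore decides how many pixels each page has when it reaches Tesseract.js. The following sketch computes that scale and the resulting canvas size for a target resolution. It assumes 300 DPI, which Tesseract's own documentation suggests as a minimum for good results; this value is not confirmed from this project. `renderScaleForDpi` and `canvasSizeForPage` are hypothetical helper names.

```javascript
// PDF user-space units are points: 72 per inch.
const POINTS_PER_INCH = 72;

// Hypothetical helper: the pdf.js viewport scale that renders a page at `dpi`.
function renderScaleForDpi(dpi) {
  return dpi / POINTS_PER_INCH;
}

// Hypothetical helper: canvas size in whole pixels for a page measured in points.
// Multiplying before dividing keeps whole-number inputs exact, so floating-point
// error in dpi / 72 cannot nudge a result just above an integer before Math.ceil.
function canvasSizeForPage(widthPt, heightPt, dpi = 300) {
  return {
    scale: renderScaleForDpi(dpi),
    width: Math.ceil((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.ceil((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// In the browser with pdf.js (not run here), roughly:
// const base = page.getViewport({ scale: 1 });           // page size in points
// const size = canvasSizeForPage(base.width, base.height, 300);
// const viewport = page.getViewport({ scale: size.scale });
// canvas.width = size.width;
// canvas.height = size.height;
// await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
```

For example, a US Letter page (612 × 792 pt) rendered at 300 DPI becomes a 2550 × 3300 pixel canvas. Higher resolutions usually improve recognition, at the cost of memory and processing time.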

Usage

1- Open the PDF2Text OCR Tool in your browser.
2- Select a PDF file by clicking the Select PDF File button.
3- Choose the OCR language from the dropdown menu.
4- Wait while the tool processes the file and extracts the text.
5- Once extraction is complete, you can:
- Copy the text to your clipboard.
- Download the extracted text as a PDF or plain text file.
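The language picked from the dropdown has to reach Tesseract.js as a Tesseract language code. Tesseract names its trained data with three-letter codes such as `eng`, `ara`, `spa`, and `fra`, and it combines several languages with `+` (for example `eng+ara`). The following is a minimal sketch, not the tool's code. It assumes the dropdown values are the display names listed under Features. `LANGUAGE_CODES` and `toTesseractLang` are hypothetical names.

```javascript
// Hypothetical mapping from dropdown labels to Tesseract trained-data codes.
const LANGUAGE_CODES = new Map([
  ["English", "eng"],
  ["Arabic", "ara"],
  ["Spanish", "spa"],
  ["French", "fra"],
]);

// Hypothetical helper: turn one or more selected labels into the "+"-joined
// language string Tesseract expects, e.g. ["English", "Arabic"] -> "eng+ara".
function toTesseractLang(labels) {
  const list = Array.isArray(labels) ? labels : [labels];
  if (list.length === 0) {
    throw new Error("Select at least one OCR language");
  }
  const codes = list.map((label) => {
    const code = LANGUAGE_CODES.get(label);
    if (code === undefined) {
      throw new Error(`Unsupported OCR language: ${label}`);
    }
    return code;
  });
  // Drop duplicates while keeping the user's order.
  return [...new Set(codes)].join("+");
}

// In the browser with Tesseract.js (not run here), for example:
// const { data } = await Tesseract.recognize(canvas, toTesseractLang(selected));
// outputEl.value = data.text;
```

Using a `Map` rather than a plain object avoids accidentally matching inherited keys such as `toString`. Rejecting unknown labels early gives a clearer error than a failed trained-data download.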

Installation

To run this project locally, follow these steps:

1- Clone the repository:

git clone https://github.com/AzozzALFiras/Pdf-OCR.git

2- Navigate to the project folder:

cd Pdf-OCR

3- Open the index.html file in your browser to use the tool locally.

License

MIT

Acknowledgments

Tesseract.js – An open-source OCR (Optical Character Recognition) library.
pdf.js – A Mozilla project for rendering PDF documents in web browsers.
jsPDF – A library for generating PDF documents using JavaScript.
Tailwind CSS – A utility-first CSS framework for building custom user interfaces.