A free, browser-based tool that converts scanned PDF files and images to text using Optical Character Recognition (OCR). All files are processed locally in the browser. The tool extracts text from PDF documents and images, and can also convert images into PDFs.
- OCR-powered PDF to Text Conversion: Extract text from scanned PDF files using Tesseract.js.
- Image OCR: Extract text from images (JPG, PNG, etc.) using OCR technology.
- Multi-language Support: Supports various languages including English, Arabic, Spanish, French, and more.
- Image to PDF Conversion: Convert images (JPG, PNG, etc.) into a PDF file.
- Downloadable Output: Extracted text can be downloaded as a PDF file or plain text file.
- Copy to Clipboard: The extracted text can be copied to the clipboard for easy pasting.
- Local Processing: All processing is done locally in the browser, so your files are not uploaded to a server.
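
As an illustration of the multi-language support listed above, the sketch below maps dropdown labels to the three-letter codes that Tesseract uses for its trained language data (`eng`, `ara`, `spa`, `fra`). The `OCR_LANGUAGES` map and the `toTesseractLang` helper are hypothetical names chosen for this example, not taken from this project's source.

```javascript
// Hypothetical lookup table: dropdown label -> Tesseract language code.
// Only the four languages named in the feature list are included here.
const OCR_LANGUAGES = new Map([
  ["English", "eng"],
  ["Arabic", "ara"],
  ["Spanish", "spa"],
  ["French", "fra"],
]);

// Returns the Tesseract code for a label, or throws if the label is unknown.
function toTesseractLang(label) {
  const code = OCR_LANGUAGES.get(label);
  if (code === undefined) {
    throw new Error(`Unsupported OCR language: ${label}`);
  }
  return code;
}
```

The returned code is the kind of value that would be passed as the language argument to Tesseract.js, for example `Tesseract.recognize(image, toTesseractLang("Arabic"))`.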
- Tesseract.js: A powerful JavaScript library for OCR.
- pdf.js: A PDF rendering engine that allows us to convert PDF pages to images.
- jsPDF: A library to generate downloadable PDFs.
- Tailwind CSS: A utility-first CSS framework for modern web design.
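
One way these libraries could fit together is sketched below: pdf.js renders each page to a canvas, and Tesseract.js recognizes the rendered image. The names `ocrPdf` and `browserDeps` and the render scale of 2 are assumptions made for illustration, not this project's actual code. The browser wiring assumes `Tesseract` is loaded as a global.

```javascript
// Runs OCR over every page of a loaded PDF and joins the page texts.
// `deps` supplies the rendering and recognition steps so the loop itself
// stays independent of the browser.
async function ocrPdf(pdf, lang, deps) {
  const texts = [];
  for (let n = 1; n <= pdf.numPages; n++) {
    const image = await deps.renderPage(pdf, n);
    const { data } = await deps.recognize(image, lang);
    texts.push(data.text.trim());
  }
  return texts.join("\n\n");
}

// Browser wiring (only usable in a page where pdf.js and Tesseract.js are loaded).
const browserDeps = {
  // Render page `n` of a pdf.js document to an off-screen canvas.
  async renderPage(pdf, n) {
    const page = await pdf.getPage(n);
    const viewport = page.getViewport({ scale: 2 }); // assumed scale for sharper OCR input
    const canvas = document.createElement("canvas");
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
    return canvas;
  },
  // Tesseract.recognize resolves to an object whose data.text holds the result.
  recognize: (image, lang) => Tesseract.recognize(image, lang),
};
```

In the browser, `pdf` would come from `await pdfjsLib.getDocument({ data }).promise`, and the call would look like `await ocrPdf(pdf, "eng", browserDeps)`.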
- Open the PDF2Text OCR Tool in your browser.
- Select a PDF file by clicking the Select PDF File button.
- Choose the language for OCR from the dropdown menu.
- Wait for the tool to process the file and extract the text.
- Once the extraction is complete, you can:
  - Copy the text to your clipboard.
  - Download the extracted text as a PDF.
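
The copy and download steps above could be implemented roughly as follows. This is a minimal sketch, not the project's actual code: `paginate`, `downloadAsPdf`, and `copyToClipboard` are hypothetical helpers, the margins and the 40-lines-per-page figure are assumed values, and the sketch assumes the jsPDF UMD build is loaded so that `window.jspdf.jsPDF` is available.

```javascript
// Splits an array of lines into chunks of at most `linesPerPage` lines.
function paginate(lines, linesPerPage) {
  const pages = [];
  for (let i = 0; i < lines.length; i += linesPerPage) {
    pages.push(lines.slice(i, i + linesPerPage));
  }
  return pages;
}

// Browser-only: writes the extracted text into a PDF and triggers a download.
function downloadAsPdf(text, filename = "extracted-text.pdf") {
  const { jsPDF } = window.jspdf;               // assumed UMD global
  const doc = new jsPDF();                      // defaults: A4 portrait, millimetre units
  const lines = doc.splitTextToSize(text, 180); // wrap to 180 mm (15 mm side margins)
  paginate(lines, 40).forEach((pageLines, i) => {
    if (i > 0) doc.addPage();
    doc.text(pageLines, 15, 20);                // start 15 mm from the left, 20 mm from the top
  });
  doc.save(filename);
}

// Browser-only: copies the extracted text using the asynchronous Clipboard API.
async function copyToClipboard(text) {
  await navigator.clipboard.writeText(text);
}
```

Keeping `paginate` separate from the jsPDF calls means the page-splitting logic can be checked without a browser.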
To run this project locally, follow these steps:
1- Clone the repository:
git clone https://github.com/AzozzALFiras/Pdf-OCR.git
2- Navigate to the project folder:
cd Pdf-OCR
3- Open the index.html file in your browser to use the tool locally.
- Tesseract.js – An open-source OCR (Optical Character Recognition) library.
- pdf.js – A Mozilla project that allows the rendering of PDF documents in a web browser.
- jsPDF – A library for generating PDF documents using JavaScript.
- Tailwind CSS – A utility-first CSS framework for building custom user interfaces.