
A simple, free tool that uses OCR to extract text from scanned PDFs and images, and that converts images to PDFs. Files are processed locally in the browser rather than uploaded to a server, which keeps documents private.


AzozzALFiras/Pdf-OCR


PDF2Text OCR Tool with Image to PDF and Image OCR

A simple, free, and easy-to-use tool that converts scanned PDF files and images to text using Optical Character Recognition (OCR). It can also convert images into PDFs. All processing happens locally in the browser, so files are never sent to a server.

Features

  • OCR-powered PDF to Text Conversion: Extract text from scanned PDF files using Tesseract.js.
  • Image OCR: Extract text directly from image files (JPG, PNG, etc.).
  • Multi-language Support: Supports various languages including English, Arabic, Spanish, French, and more.
  • Image to PDF Conversion: Convert images (JPG, PNG, etc.) into a PDF file.
  • Downloadable Output: Extracted text can be downloaded as a PDF file or plain text file.
  • Copy to Clipboard: The extracted text can be copied to the clipboard for easy pasting.
  • Local Processing: All processing happens locally in the browser; files are not uploaded to a server, which keeps documents private.
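For the image-to-PDF feature, each image has to be sized and positioned on a page. The sketch below assumes an A4 portrait page measured in millimetres (210 × 297 mm, which is jsPDF's default page) and a 10 mm margin; the `fitImageToPage` helper and the margin value are hypothetical and not taken from the project's source. It scales the image to fit while preserving its aspect ratio and centres it.

```typescript
// Hypothetical helper: fit an image onto a page while preserving its aspect ratio.
// Page sizes and the margin are in millimetres; image sizes are in pixels.

interface Placement {
  x: number;      // left offset on the page (mm)
  y: number;      // top offset on the page (mm)
  width: number;  // drawn width (mm)
  height: number; // drawn height (mm)
}

const A4_WIDTH_MM = 210;
const A4_HEIGHT_MM = 297;

function fitImageToPage(
  imgWidthPx: number,
  imgHeightPx: number,
  pageWidth: number = A4_WIDTH_MM,
  pageHeight: number = A4_HEIGHT_MM,
  margin: number = 10, // assumed margin on every side
): Placement {
  if (imgWidthPx <= 0 || imgHeightPx <= 0) {
    throw new Error("Image dimensions must be positive");
  }
  const maxWidth = pageWidth - 2 * margin;
  const maxHeight = pageHeight - 2 * margin;
  // The smaller ratio is the binding constraint, so the whole image fits.
  const scale = Math.min(maxWidth / imgWidthPx, maxHeight / imgHeightPx);
  const width = imgWidthPx * scale;
  const height = imgHeightPx * scale;
  // Centre the scaled image on the page.
  return {
    x: (pageWidth - width) / 2,
    y: (pageHeight - height) / 2,
    width,
    height,
  };
}

// A 1000 x 500 px landscape image on an A4 portrait page:
// x = 10, y = 101, width = 190, height = 95 (millimetres).
const placement = fitImageToPage(1000, 500);
```

Taking the smaller of the two ratios keeps the image within both the width and the height limits. With jsPDF, the returned values could be passed to `addImage(imageData, format, x, y, width, height)`.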

Technologies

  • Tesseract.js: A JavaScript OCR library built on a WebAssembly port of the Tesseract engine.
  • pdf.js: Mozilla's PDF rendering library, used to render each PDF page as an image for OCR.
  • jsPDF: A library for generating downloadable PDF files in the browser.
  • Tailwind CSS: A utility-first CSS framework for modern web design.
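Scanned PDFs contain images rather than text, so each page is rendered by pdf.js before Tesseract.js reads it, and the render resolution affects OCR accuracy. At scale 1, a pdf.js viewport maps one PDF unit (1/72 inch) to one canvas pixel, so a target resolution in DPI corresponds to a scale of DPI / 72. The sketch below assumes a 300 DPI target, a common recommendation for Tesseract input; the project's actual render scale is not documented here, and the helper names are hypothetical.

```typescript
// Hypothetical helpers: pick a pdf.js render scale for a target OCR resolution.
// At scale 1, one PDF unit (1/72 inch) maps to one canvas pixel.

const PDF_UNITS_PER_INCH = 72;

function renderScaleForDpi(dpi: number): number {
  if (dpi <= 0) throw new Error("DPI must be positive");
  return dpi / PDF_UNITS_PER_INCH;
}

// Canvas size in pixels for a page whose size is given in PDF units (points).
function canvasSizeForPage(
  pageWidthPt: number,
  pageHeightPt: number,
  dpi: number,
): { width: number; height: number } {
  if (dpi <= 0) throw new Error("DPI must be positive");
  // Multiply before dividing: (pt * dpi) / 72 avoids the rounding error of
  // pt * (dpi / 72), which could otherwise push Math.ceil up by one pixel.
  return {
    width: Math.ceil((pageWidthPt * dpi) / PDF_UNITS_PER_INCH),
    height: Math.ceil((pageHeightPt * dpi) / PDF_UNITS_PER_INCH),
  };
}

// US Letter is 612 x 792 points; at 300 DPI that is 2550 x 3300 pixels.
const letter = canvasSizeForPage(612, 792, 300);
const scale = renderScaleForDpi(300); // roughly 4.17
```

In pdf.js, a scale like this is what would be passed to `page.getViewport({ scale })` before rendering the page into a canvas.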

Usage

  1. Open the PDF2Text OCR Tool in your browser.
  2. Click the Select PDF File button and choose a PDF file.
  3. Choose the OCR language from the dropdown menu.
  4. Wait for the tool to process the file and extract the text.
  5. Once the extraction is complete, you can:
    • Copy the text to your clipboard.
    • Download the extracted text as a PDF.
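Tesseract identifies languages by three-letter traineddata codes (for example `eng`, `ara`, `spa`, `fra`), and Tesseract.js accepts several languages joined with `+`, such as `eng+ara`, in calls like `Tesseract.recognize(image, langs)`. The sketch below shows how a language dropdown might translate its labels into that string; the label names and the `toTesseractLangs` helper are hypothetical, since the tool's actual dropdown values are not documented in this README.

```typescript
// Hypothetical mapping from dropdown labels to Tesseract language codes.
// The codes themselves are standard Tesseract traineddata names.
const LANGUAGE_CODES: Record<string, string> = {
  English: "eng",
  Arabic: "ara",
  Spanish: "spa",
  French: "fra",
};

// Build the language string Tesseract.js expects, e.g. "eng+ara".
function toTesseractLangs(labels: string[]): string {
  const codes: string[] = [];
  for (const label of labels) {
    const code = LANGUAGE_CODES[label];
    if (code === undefined) {
      throw new Error(`Unsupported OCR language: ${label}`);
    }
    // Skip duplicates so each language model is requested only once.
    if (codes.indexOf(code) === -1) codes.push(code);
  }
  if (codes.length === 0) throw new Error("Select at least one language");
  return codes.join("+");
}

const langs = toTesseractLangs(["English", "Arabic"]); // "eng+ara"
```

Rejecting unknown labels early gives a clear error instead of a failed language-data download inside the OCR worker.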

Installation

To run this project locally, follow these steps:

  1. Clone the repository:

    git clone https://github.com/AzozzALFiras/Pdf-OCR.git

  2. Navigate to the project folder:

    cd Pdf-OCR

  3. Open the index.html file in your browser to use the tool locally.

License

MIT

Acknowledgments

  • Tesseract.js – An open-source OCR (Optical Character Recognition) library.
  • pdf.js – A Mozilla project for rendering PDF documents in a web browser.
  • jsPDF – A library for generating PDF documents using JavaScript.
  • Tailwind CSS – A utility-first CSS framework for building custom user interfaces.
