https://bizcard-text-extraction-a7jduyfnvvesxdckg8rhy6.streamlit.app/
The Biz Card Data project focuses on extracting and processing data from business card images using the EasyOCR library. By leveraging optical character recognition (OCR) techniques, we can automatically extract text from business card images and convert it into structured data for further analysis or storage. This documentation outlines the steps involved in extracting business card data and demonstrates how to use the Streamlit framework to create a user-friendly interface for the data extraction process.
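To get started with the Biz Card Data project, follow these steps:
- Install the EasyOCR library using pip: `pip install easyocr`.
- Import the EasyOCR module and create a reader for English: `import easyocr`, then `reader = easyocr.Reader(['en'])`. Creating the reader downloads the detection and recognition models the first time it runs.
- Import the PIL (Python Imaging Library) module for image handling: `from PIL import Image, ImageDraw`. PIL is provided by the Pillow package (`pip install pillow`).
- Open an image using PIL's `Image.open` function: `image = Image.open('business_card.jpg')`.
- Use the `reader.readtext` function to extract text from the image: `result = reader.readtext(np.array(image))` (with `import numpy as np`). `readtext` is documented to accept a file path, URL, bytes, or NumPy array, so converting the PIL image to an array is the safest option. By default, each item in `result` is a `(bounding_box, text, confidence)` tuple.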
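Put together, the setup steps above look roughly like the following minimal sketch. It assumes Pillow, NumPy, and EasyOCR are installed; the helper names `run_ocr` and `result_lines` and the `min_confidence` threshold are illustrative, not part of the project. Because `readtext` returns `(bounding_box, text, confidence)` tuples by default, `result_lines` simply keeps the text of each detection.

```python
from typing import List, Sequence, Tuple

# With the default detail=1, EasyOCR's readtext() yields one
# (bounding_box, text, confidence) tuple per detected text region.
OcrResult = Tuple[Sequence, str, float]


def run_ocr(image_path: str, reader) -> List[OcrResult]:
    """Open an image with Pillow and run an existing easyocr.Reader on it."""
    # Imported lazily so result_lines() works without these packages.
    import numpy as np
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    # readtext() accepts a path, URL, bytes, or NumPy array; converting the
    # PIL image avoids depending on support for PIL objects.
    return reader.readtext(np.array(image))


def result_lines(results: List[OcrResult], min_confidence: float = 0.0) -> List[str]:
    """Return the recognized strings, skipping low-confidence detections."""
    return [text.strip() for _bbox, text, conf in results if conf >= min_confidence]
```

Create the `easyocr.Reader` once and reuse it, since loading the models is the slow part, for example `result_lines(run_ocr('business_card.jpg', easyocr.Reader(['en'])), min_confidence=0.4)`, where 0.4 is an arbitrary example threshold.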
To extract data from business card images, we will perform the following steps:
- Create a function that reads text from a business card image: `def extract_biz_card_data(image_path): ...`.
- Inside the function, open the image with `Image.open` and pass it to `reader.readtext` (converted with `np.array`, or pass `image_path` directly) to obtain the text results.
- Process the extracted text to identify relevant fields such as the name, phone number, and email address.
- Return the extracted data in a structured format, such as a dictionary or a list of key-value pairs.
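A minimal sketch of `extract_biz_card_data`, assuming simple regular expressions are good enough to classify each OCR line. The patterns, the field names, and the rule that the first unclassified line is the person's name are illustrative assumptions; real cards vary widely and will need tuning. Unlike the one-argument signature above, this version also takes the `easyocr.Reader` so the models are loaded only once, and it calls `readtext(..., detail=0)`, which returns only the recognized strings.

```python
import re
from typing import Any, Dict, List

# Illustrative patterns only; tune them for the cards you actually process.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
WEBSITE_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def parse_card_lines(lines: List[str]) -> Dict[str, Any]:
    """Sort OCR text lines into card fields with simple heuristics."""
    data: Dict[str, Any] = {"name": None, "email": None, "website": None,
                            "phones": [], "other": []}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        email = EMAIL_RE.search(line)
        website = WEBSITE_RE.search(line)
        phone = PHONE_RE.search(line)
        # Each line goes to the first field whose pattern matches.
        if email:
            data["email"] = data["email"] or email.group()
        elif website:
            data["website"] = data["website"] or website.group()
        elif phone:
            data["phones"].append(phone.group())
        elif data["name"] is None:
            data["name"] = line  # assumption: the first plain line is the name
        else:
            data["other"].append(line)  # job title, company, address, ...
    return data


def extract_biz_card_data(image_path: str, reader) -> Dict[str, Any]:
    """OCR one card image and return its fields as a dictionary."""
    # detail=0 makes readtext() return a plain list of strings.
    lines = reader.readtext(image_path, detail=0)
    return parse_card_lines(lines)
```

Keeping the regex logic in `parse_card_lines` separate from the OCR call means the heuristics can be checked against hand-typed lines without loading any models.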
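To convert the extracted data into a DataFrame and store it for further analysis, follow these steps:
- Install the Pandas library: `pip install pandas`.
- Import the Pandas module: `import pandas as pd`.
- Convert the extracted data into a DataFrame: `df = pd.DataFrame(extracted_data)`, where `extracted_data` is a list of dictionaries, one per card. Wrap a single card's dictionary in a list (`pd.DataFrame([card])`), because a dictionary of scalar values alone raises a `ValueError`.
- Perform any necessary data cleaning or manipulation on the DataFrame.
- Store the DataFrame in a suitable format, such as a CSV file or a database, using the Pandas `to_csv` or `to_sql` methods. `to_sql` needs an SQLAlchemy connection or a `sqlite3` connection.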
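The DataFrame and storage steps might look like this minimal sketch, assuming each card has already been parsed into a dictionary (the sample field names are hypothetical). List-valued fields such as several phone numbers are joined into one string because CSV cells and SQL columns hold single values, and exact duplicate rows are dropped as a simple cleaning step. A standard-library `sqlite3` connection is used because `to_sql` accepts one directly; the helper names and the table name `business_cards` are assumptions.

```python
import sqlite3
from typing import Any, Dict, List

import pandas as pd


def cards_to_frame(cards: List[Dict[str, Any]]) -> pd.DataFrame:
    """Build one row per card and apply light cleaning."""
    df = pd.DataFrame(cards)
    for col in df.columns:
        # Join list values (e.g. several phone numbers) into a single string.
        df[col] = df[col].apply(lambda v: ", ".join(v) if isinstance(v, list) else v)
    # Drop exact duplicates, e.g. the same card uploaded twice.
    return df.drop_duplicates().reset_index(drop=True)


def store_cards(df: pd.DataFrame, csv_path: str, conn: sqlite3.Connection,
                table: str = "business_cards") -> None:
    """Write the cards to a CSV file and append them to a SQLite table."""
    df.to_csv(csv_path, index=False)
    df.to_sql(table, conn, if_exists="append", index=False)
```

With `if_exists="append"`, repeated runs add rows to the same table instead of replacing it; use `"replace"` if each run should start fresh.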
To create a user-friendly interface for the Biz Card Data project, we will use the Streamlit framework. Follow these steps to integrate Streamlit:
- Install Streamlit: `pip install streamlit`.
- Import the Streamlit module: `import streamlit as st`.
- Add a header to the Streamlit app: `st.header('Biz Card Data Extraction')`.
- Create a file uploader with Streamlit's `file_uploader` function. It returns `None` until a file is uploaded and a file-like object afterwards.
- Once a file has been uploaded, open it with PIL's `Image.open`.
- Pass the image to the OCR function to extract the data.
- Display the extracted data using Streamlit's `write` or `dataframe` functions.
- Customize the Streamlit app layout and appearance as desired.
- Start the app with `streamlit run` followed by the script's file name.
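The Streamlit steps above can be combined into a single script such as the minimal sketch below, launched with `streamlit run app.py` (the file name is hypothetical). It assumes Streamlit, EasyOCR, Pillow, NumPy, and pandas are installed and, for brevity, shows the raw OCR lines with their confidence scores rather than parsed fields; a parsing function such as the project's `extract_biz_card_data` could be slotted in at that point. `st.cache_resource` keeps one `easyocr.Reader` across reruns, since Streamlit re-executes the script on every interaction. The `streamlit.runtime.exists()` check at the bottom is only there so the file can also be imported, or run with plain `python`, without starting the UI.

```python
from typing import Dict, List, Sequence, Tuple


def results_to_rows(results: Sequence[Tuple[Sequence, str, float]]) -> List[Dict[str, object]]:
    """Turn EasyOCR (bbox, text, confidence) tuples into table rows."""
    return [{"text": text, "confidence": round(float(conf), 3)}
            for _bbox, text, conf in results]


def main() -> None:
    # Heavy imports live here so results_to_rows() is usable on its own.
    import easyocr
    import numpy as np
    import pandas as pd
    import streamlit as st
    from PIL import Image

    @st.cache_resource  # load the OCR models once, not on every rerun
    def load_reader():
        return easyocr.Reader(["en"])

    st.header("Biz Card Data Extraction")
    uploaded = st.file_uploader("Upload a business card image",
                                type=["png", "jpg", "jpeg"])
    if uploaded is None:  # nothing uploaded yet
        return

    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded card")
    with st.spinner("Extracting text..."):
        results = load_reader().readtext(np.array(image))
    st.dataframe(pd.DataFrame(results_to_rows(results)))


if __name__ == "__main__":
    try:
        from streamlit import runtime
    except ImportError:  # Streamlit is not installed
        runtime = None
    # Start the UI only under `streamlit run`, not plain `python app.py`.
    if runtime is not None and runtime.exists():
        main()
```

Returning early while `uploaded` is `None` replaces the "callback" idea: Streamlit reruns the whole script after each upload, so the OCR branch simply runs once a file is present.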
The Biz Card Data project provides a convenient solution for extracting and processing data from business card images. By leveraging the EasyOCR library and Streamlit framework, we can automate the extraction process and create an intuitive user interface for users to upload and extract data from their business card images. This documentation serves as a guide to set up and utilize the Biz Card Data project effectively.