Book Spine Detection System

An AI-powered system that detects book spines in images and extracts their metadata. It uses computer vision and machine learning to locate each book, read the text on its spine, and extract title and author information.

Features

Book spine detection using YOLO object detection
Image enhancement using RealESRGAN
Text extraction using Google Cloud Vision API
Metadata refinement using Google Gemini AI
Intelligent caching system to reduce API costs
Web interface for uploading and viewing results

Demo

The system detects individual book spines and extracts metadata, including title and author information.

Pipeline
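
The stages listed under Features suggest an order of operations along the lines of the sketch below. This is only an illustration under assumptions: the stage order (detect, enhance, OCR, refine), the `run_pipeline` function, and the `BookRecord` type are hypothetical and are not taken from detect.py. Each stage is passed in as a callable so the flow can be exercised without models or API keys.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BookRecord:
    """Title/author pair produced for one detected spine (hypothetical type)."""
    title: str
    author: str


def run_pipeline(
    image: object,
    detect: Callable[[object], List[object]],
    enhance: Callable[[object], object],
    ocr: Callable[[object], str],
    refine: Callable[[str], BookRecord],
) -> List[BookRecord]:
    """Run the four stages, in the assumed order, for every detected spine."""
    records = []
    for crop in detect(image):            # YOLO: one crop per spine
        sharper = enhance(crop)           # RealESRGAN upscaling
        raw_text = ocr(sharper)           # Cloud Vision text extraction
        records.append(refine(raw_text))  # Gemini metadata refinement
    return records


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without models or API keys.
    demo = run_pipeline(
        image="shelf.jpg",
        detect=lambda img: ["spine-0", "spine-1"],
        enhance=lambda crop: crop + "-x4",
        ocr=lambda crop: f"text from {crop}",
        refine=lambda text: BookRecord(title=text, author="unknown"),
    )
    for record in demo:
        print(record)
```

Passing the stages as parameters also makes it straightforward to wrap the OCR and refinement steps with a cache.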

System Requirements

Backend Requirements

Python 3.8+
CUDA-capable GPU (for YOLO detection)
Node.js and npm

Python Dependencies

torch
torchvision
opencv-python
numpy
pillow
requests
google-cloud-vision
google-cloud-gemini

External Models and APIs Required

YOLO weights file (models/yolo_weights/best.pt)
- Download the weights file from Google Drive
- Place the downloaded best.pt file in models/yolo_weights/ directory
RealESRGAN executable (models/realesrgan_portable/realesrgan-ncnn-vulkan.exe)
- Download the portable executable from Real-ESRGAN releases
- For Windows: Use realesrgan-ncnn-vulkan.exe
- For Mac/Linux: Download the appropriate version and adjust the path accordingly
Google Cloud Vision API credentials
Google Gemini API key

RealESRGAN Setup

Download the portable RealESRGAN executable for your platform
Place the executable in models/realesrgan_portable/
The system uses RealESRGAN with these default settings:
```
# Windows example
realesrgan-ncnn-vulkan.exe -i input.jpg -o output.png -n realesrgan-x4plus
```
Available models:
- realesrgan-x4plus (default)
- realesrnet-x4plus
- realesrgan-x4plus-anime (optimized for anime images)
- realesr-animevideov3 (animation video)

Note: For Mac/Linux users, adjust the executable path and filename according to your platform.
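
As a rough illustration of the platform adjustment, the following Python sketch builds the same command line shown above and picks the executable name per platform. The helper names (`executable_name`, `build_command`, `enhance`) are hypothetical, and the assumption that the Mac/Linux binary is called `realesrgan-ncnn-vulkan` (without `.exe`) should be checked against the release you download.

```python
import platform
import subprocess
from pathlib import Path
from typing import List, Optional

MODEL_DIR = Path("models/realesrgan_portable")  # location named in this README


def executable_name(system: Optional[str] = None) -> str:
    """Return the binary filename; the non-Windows name is an assumption."""
    system = system or platform.system()
    base = "realesrgan-ncnn-vulkan"
    return base + ".exe" if system == "Windows" else base


def build_command(src: str, dst: str, model: str = "realesrgan-x4plus",
                  system: Optional[str] = None) -> List[str]:
    """Assemble the argument list using the -i/-o/-n flags shown above."""
    exe = MODEL_DIR / executable_name(system)
    return [str(exe), "-i", src, "-o", dst, "-n", model]


def enhance(src: str, dst: str, model: str = "realesrgan-x4plus") -> None:
    """Run the upscaler and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(src, dst, model), check=True)
```

Keeping command construction separate from execution means the argument list can be inspected or logged without the executable being installed.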

API Setup Requirements

Google Cloud Vision API

Create a Google Cloud Project
Enable the Cloud Vision API
Create service account credentials
Download the JSON key file
Set up authentication by either:
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to your key file:
```
export GOOGLE_APPLICATION_CREDENTIALS="path/to/your/credentials.json"
```
- Or placing the JSON key file in a known location and updating the code to reference it
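
Before running the detection script, it can help to confirm that the environment variable really points at a service account key. The check below is a minimal sketch using only the standard library; `check_vision_credentials` is a hypothetical helper, not part of this project, and it only inspects the file locally without contacting Google Cloud.

```python
import json
import os
from typing import Mapping, Tuple


def check_vision_credentials(env: Mapping[str, str] = os.environ) -> Tuple[bool, str]:
    """Validate that GOOGLE_APPLICATION_CREDENTIALS names a readable JSON key."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, f"credentials file not found: {path}"
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        return False, f"could not read credentials file: {exc}"
    # Service account key files carry a "type" field with this value.
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, "file is valid JSON but is not a service account key"
    return True, "credentials look usable"


if __name__ == "__main__":
    ok, message = check_vision_credentials()
    print("OK" if ok else "ERROR", "-", message)
```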

Google Gemini API

Get a Gemini API key from Google AI Studio
Create a .env file in the backend directory
Add your Gemini API key:
```
GEMINI_API=your_api_key_here
```
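
For reference, a key stored this way could be read in Python roughly as follows. This is a standard-library sketch under assumptions: the project may load the file differently (for example with a dotenv package), the `backend/.env` default follows the step above, and `parse_env_file` and `get_gemini_key` are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(env_path: str = "backend/.env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API")
    if not key and Path(env_path).is_file():
        key = parse_env_file(env_path).get("GEMINI_API")
    if not key:
        raise RuntimeError("GEMINI_API is not set in the environment or " + env_path)
    return key
```

Failing early with a clear error is usually easier to debug than a rejected API request later in the run.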

The system uses these APIs for:

Google Cloud Vision: Text extraction from book spines
Google Gemini: Intelligent refinement of extracted text and metadata parsing

For detailed Google Cloud Vision setup instructions, visit the official documentation.

Project Structure

├── backend/
│   ├── python-scripts/
│   │   ├── detect.py            # Main detection script
│   │   ├── fetch_book_info.py   # Book metadata fetching
│   │   └── fetch_database.py    # Database operations
│   └── src/
│       └── server.js            # Backend server
├── frontend/
│   └── public/
│       ├── index.html           # Web interface
│       └── index.js             # Frontend logic
└── models/                      # AI model files

Setup Instructions

Clone the repository
Install Python dependencies:
```
pip install -r backend/requirements.txt
```
Install Node.js dependencies:
```
cd backend
npm install
```
Set up required API keys and credentials
Place model files in the appropriate directories
Configure CORS settings:
- The backend server runs on http://localhost:3000
- Frontend should be served from a live server (e.g., VS Code Live Server) at http://127.0.0.1:5500
- If using different ports, update the CORS configuration in backend/src/server.js

Usage

Command Line

python backend/python-scripts/detect.py <path_to_image>
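
To process a folder of photos, the same command can be issued once per image. The sketch below is illustrative only: `build_detect_commands` and `run_batch` are hypothetical helpers, the list of image extensions is an assumption, and it relies on detect.py accepting a single image path as shown above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

DETECT_SCRIPT = "backend/python-scripts/detect.py"  # path from this README
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}          # assumed input formats


def build_detect_commands(image_dir: str) -> List[List[str]]:
    """Return one detect.py command per image file, in sorted order."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [[sys.executable, DETECT_SCRIPT, str(p)] for p in images]


def run_batch(image_dir: str) -> None:
    """Run the commands sequentially, stopping at the first failure."""
    for command in build_detect_commands(image_dir):
        subprocess.run(command, check=True)
```

Run it from the repository root so that the relative script path resolves.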

Web Interface

Start the backend server:
```
cd backend
npm start
```
Open frontend/public/index.html in a web browser (served from the live server at http://127.0.0.1:5500 so that it matches the CORS configuration)
Upload an image containing book spines
View the detected books and extracted metadata

Output

The system generates:

Detected book metadata (title, author)
Enhanced images
Cropped individual book spine images
Annotated original image showing detections
Cached results for faster subsequent processing

Caching System

The system implements multi-level caching to improve performance and reduce API costs:

OCR results cache
Gemini API response cache
Full process results cache

Cache files are stored in the output directory structure:

output/
└── image_name/
    ├── crops/           # Cropped book spine images
    ├── ocr_cache/       # OCR results
    ├── gemini_cache/    # AI refinement results
    └── process_cache/   # Full process results
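
A file-based cache over this layout might look like the sketch below. It is an assumption-laden illustration rather than the project's implementation: the SHA-256 key, the JSON entry format, the use of the image's file stem as `image_name`, and the helper names `cache_dirs` and `cached_call` are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict

CACHE_SUBDIRS = ("crops", "ocr_cache", "gemini_cache", "process_cache")


def cache_dirs(image_path: str, output_root: str = "output") -> Dict[str, Path]:
    """Map each cache name to output/<image_name>/<subdir>, creating it."""
    base = Path(output_root) / Path(image_path).stem
    dirs = {name: base / name for name in CACHE_SUBDIRS}
    for directory in dirs.values():
        directory.mkdir(parents=True, exist_ok=True)
    return dirs


def cached_call(cache_dir: Path, payload: bytes,
                compute: Callable[[bytes], Any]) -> Any:
    """Return the stored JSON result for payload, computing it on a miss."""
    key = hashlib.sha256(payload).hexdigest()
    entry = cache_dir / f"{key}.json"
    if entry.is_file():
        return json.loads(entry.read_text(encoding="utf-8"))
    result = compute(payload)  # e.g. the paid OCR or Gemini request
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Keying entries on a hash of the input means a repeated request for the same spine crop or OCR text is served from disk, which is where the API cost saving would come from.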

License

Apache License 2.0

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributors

Min-Han Li (@MinHanLiWesley)
Yuan Kuang (@greendress2022)
Lulu Jiao (@luljia0)
Yue Zhang (@WillzDevs)

Thank you to all contributors who have helped make this project possible!
