# Chat with PDF and Images 📄🤖
A Streamlit-based app that enables users to upload PDF documents or images, extracts text from them, and allows interactive chat about the contents. This app uses Google Generative AI (Gemini API) to answer questions based on the uploaded documents.
## Features
- Upload PDF or Image (PNG, JPG, JPEG) files.
- Extract text from PDF documents or images.
- Ask questions related to the uploaded content.
- Interactive chat interface with animated styling.
- Simple, clean UI with custom CSS styles.
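One way to picture the question-answering feature is as a prompt that combines the extracted text with the user's question. The following is a minimal sketch, not the code in `app.py`: the helper name `build_prompt`, the prompt wording, and the `max_chars` budget are all assumptions made for illustration.

```python
def build_prompt(context: str, question: str, max_chars: int = 12000) -> str:
    """Combine extracted document text and a user question into one prompt.

    Hypothetical helper: the real app may phrase or size its prompt differently.
    """
    context = context.strip()
    if len(context) > max_chars:
        # Keep only the start of the document; a real app might chunk instead.
        context = context[:max_chars]
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question.strip()}\n"
    )
```

The resulting string could then be passed to the Gemini client, for example via `generate_content` on a `google.generativeai.GenerativeModel`; which model the app uses is not stated here.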
## Tech Stack
- **Streamlit**: For creating an interactive UI.
- **PyPDF2**: To extract text from PDF files.
- **Tesseract OCR**: To extract text from images.
- **Google Generative AI (Gemini API)**: For generating answers based on extracted text.
- **Pillow**: For image processing.
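The stack above suggests a simple split: PDFs go through PyPDF2 and images go through Tesseract. Below is a hedged sketch of that dispatch. The function names `detect_kind` and `extract_text` are hypothetical, and the PDF branch assumes PyPDF2 2.x or later (where `PdfReader` and `page.extract_text()` are available). Third-party imports are done lazily so the extension check works even when they are not installed.

```python
from pathlib import Path

PDF_SUFFIXES = {".pdf"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def detect_kind(filename: str) -> str:
    """Classify an uploaded file as 'pdf' or 'image' by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PDF_SUFFIXES:
        return "pdf"
    if suffix in IMAGE_SUFFIXES:
        return "image"
    raise ValueError(f"Unsupported file type: {suffix or filename!r}")


def extract_text(file_obj, filename: str) -> str:
    """Extract text from a file-like object (hypothetical helper)."""
    if detect_kind(filename) == "pdf":
        from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

        reader = PdfReader(file_obj)
        # extract_text() can return None for pages without a text layer.
        return "\n".join((page.extract_text() or "") for page in reader.pages)

    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(file_obj))
```

Scanned PDFs without a text layer will yield little or no text through the PDF branch; handling them would need OCR on rendered pages, which this sketch does not attempt.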
## Getting Started
### Prerequisites
1. Python 3.7+
2. [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) installed (ensure the path to `tesseract.exe` is correct).
3. Google Generative AI (Gemini API) key.
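To check the Tesseract prerequisite before launching the app, something like the sketch below may help. It is an assumption-laden helper, not part of the project: `find_tesseract` is a hypothetical name, and the fallback path is simply the common Windows install location.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(
    binary_name: str = "tesseract",
    extra_candidates: Iterable[str] = (DEFAULT_WINDOWS_PATH,),
) -> Optional[str]:
    """Return a path to the Tesseract executable, or None if none is found."""
    # Prefer whatever is on PATH (typical on Linux and macOS installs).
    on_path = shutil.which(binary_name)
    if on_path:
        return on_path
    # Otherwise fall back to explicit candidate locations.
    for candidate in extra_candidates:
        if Path(candidate).is_file():
            return candidate
    return None
```

If the function returns a path, it can be assigned to `pytesseract.pytesseract.tesseract_cmd`; if it returns `None`, Tesseract is probably not installed or lives somewhere else.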
### Installation
1. **Clone the repository:**
   ```bash
   git clone https://github.com/sahil352005/ChatWithPdf-Images.git
   cd ChatWithPdf-Images
   ```
2. **Set up a virtual environment (optional but recommended):**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```
3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
4. **Set up the `.env` file for the API key:** Create a `.env` file in the root directory and add your Gemini API key:
   ```
   GOOGLE_API_KEY=your_gemini_api_key
   ```
5. **Configure the Tesseract OCR path:** Update the path to `tesseract.exe` in `app.py`:
   ```python
   pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Update if needed
   ```
## Usage
1. **Run the application:**
   ```bash
   streamlit run app.py
   ```
2. **Upload a PDF or image:**
   - Go to the sidebar to upload a PDF or image.
   - The app will extract text and display it in the sidebar.
3. **Interact with the chat:**
   - Ask questions about the uploaded content, and the chatbot will generate responses based on the extracted text.
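The `.env` step relies on the app reading `GOOGLE_API_KEY` at startup. How `app.py` does this is not shown here; a common choice is `load_dotenv()` from the `python-dotenv` package. As a dependency-free illustration, the sketch below parses simple `KEY=VALUE` lines with the standard library only. The function name `load_env_file` is hypothetical, and the parser deliberately ignores advanced `.env` syntax such as multi-line values.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ.

    Returns the values read from the file. Variables already set in the
    environment are left untouched.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded
```

After calling it, `os.environ.get("GOOGLE_API_KEY")` should hold the key, which can then be handed to `google.generativeai.configure(api_key=...)`.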
## Project Structure
    ChatWithPdf-Images/
    ├── app.py             # Main application file
    ├── requirements.txt   # Python dependencies
    └── .env               # Contains the Gemini API key
## Requirements
Refer to `requirements.txt` for all dependencies, including:
- `streamlit`
- `PyPDF2`
- `Pillow`
- `pytesseract`
- `google-generativeai`
- `langchain`
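A quick way to confirm the dependencies are installed is to check whether each one can be imported. The sketch below is an illustrative helper, not part of the project; note that some import names differ from the pip package names (`Pillow` is imported as `PIL`, `google-generativeai` as `google.generativeai`). The names `is_importable` and `missing_packages` are hypothetical.

```python
import importlib.util

# pip package name -> module name used in `import` statements
IMPORT_NAMES = {
    "streamlit": "streamlit",
    "PyPDF2": "PyPDF2",
    "Pillow": "PIL",
    "pytesseract": "pytesseract",
    "google-generativeai": "google.generativeai",
    "langchain": "langchain",
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `google`) is missing or malformed.
        return False


def missing_packages(import_names=IMPORT_NAMES):
    """List pip package names whose modules cannot be found."""
    return [pkg for pkg, mod in import_names.items() if not is_importable(mod)]
```

Running `print(missing_packages())` inside the virtual environment should print an empty list once `pip install -r requirements.txt` has succeeded.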
## Acknowledgments
- Thanks to Google Generative AI (Gemini API) for the content generation capability.
- Icons and animations inspired by CSS libraries.
## License
This project is open-source and available under the MIT License.