PaperBrain is an intelligent research paper Q&A system that combines vector search and large language models to provide context-aware answers to research-related questions. It processes academic papers, understands their content, and generates structured, informative responses with proper citations and context.
- Smart Vector Search: Utilizes Qdrant for semantic similarity search of research papers
- Intelligent Analysis: Leverages LLaMA 3.2 for generating comprehensive, context-aware answers
- Structured Responses: Provides organized output with:
  - Main answer summary
  - Key points from papers
  - Paper citations and references
  - Analysis limitations
- Duplicate Detection: Intelligent tracking of shown papers to avoid repetition
- Analytics Dashboard: Track system usage, search patterns, and relevance metrics
- Conversation History: Maintain records of previous queries and responses
- Relevance Scoring: Clear explanation of paper matching with detailed relevance metrics
- Interactive Commands: System controls for analytics, history, and paper tracking
- Vector Store: Qdrant for efficient similarity search
- Embeddings: Nomic Embed Text for paper vectorization
- LLM Integration: LLaMA 3.2 (1B parameter model) via Ollama
- Infrastructure: Docker containerization
- Backend: Async Python with modern libraries
- API Layer: Async HTTP with HTTPX
# System requirements
- Python 3.9+
- Docker
- 4GB+ RAM for LLM operations
- Disk space for paper storage
- Clone the repository:
git clone https://github.com/ansh-info/PaperBrain.git
cd PaperBrain
- Create a virtual environment:
# Using conda
conda create --name PaperBrain python=3.11
conda activate PaperBrain
# Using venv
python -m venv env
source env/bin/activate # On Windows: .\env\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Start required services:
docker-compose up -d
- Pull required models:
docker exec ollama ollama pull llama3.2:1b
docker exec -it ollama ollama pull nomic-embed-text
# Optional: pull other models if you want them
docker exec -it ollama ollama pull mistral
python src/vector.py
- Place your markdown files in the markdowns/ directory
- System automatically processes and indexes papers
- Handles duplicate detection and tracking
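The duplicate tracking mentioned above could look roughly like the sketch below. It is an illustration only, not the code in src/vector.py: the helper names (file_digest, load_tracker, find_new_papers, mark_processed), the use of SHA-256 content hashes, and the filename-to-digest JSON layout are all assumptions.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_tracker(tracker_path: Path) -> dict:
    """Load the filename -> digest mapping, or start empty (assumed layout)."""
    if tracker_path.exists():
        return json.loads(tracker_path.read_text(encoding="utf-8"))
    return {}


def find_new_papers(markdown_dir: Path, tracker: dict) -> list:
    """List markdown files whose content has not been seen before."""
    seen = set(tracker.values())
    new_files = []
    for path in sorted(markdown_dir.glob("*.md")):
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)  # also skips identical copies within one batch
            new_files.append(path)
    return new_files


def mark_processed(paths, tracker: dict, tracker_path: Path) -> None:
    """Record the given files as processed and persist the tracker to disk."""
    for path in paths:
        tracker[path.name] = file_digest(path)
    tracker_path.write_text(json.dumps(tracker, indent=2), encoding="utf-8")
```

Hashing the content rather than comparing filenames means a renamed copy of an already indexed paper is still skipped.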
python src/llmquery.py # Or run src/query.py to query the Qdrant database without the LLM
- quit or q: Exit the program
- analytics: Display system usage statistics
- clear: Reset paper history
- history: View recent questions and responses
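One way to tell these commands apart from ordinary questions is a small parsing step before the query is sent to the model. The sketch below is hypothetical: parse_command and the "question" fallback label are assumed names, and case-insensitive matching is an assumption, not something taken from src/llmquery.py.

```python
def parse_command(user_input: str) -> str:
    """Map raw input to a command name, or 'question' for anything else."""
    text = user_input.strip().lower()
    if text in ("quit", "q"):
        return "quit"
    if text in ("analytics", "clear", "history"):
        return text
    # Anything that is not a known command is treated as a research question.
    return "question"
```

An interactive loop would then branch on the returned name and only run the search and LLM steps when it is "question".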
> What are the main approaches for discovering governing equations from data?
The system will provide:
1. Main Answer: Comprehensive summary
2. Key Points: Important findings
3. Paper Citations: Relevant sources
4. Limitations: Gaps in current knowledge
5. Relevance Scores: Why papers were selected
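The response parts listed above can be modelled as a small data structure. This is a sketch under assumptions: the class names (Citation, StructuredAnswer), the field names, and the text layout produced by render are illustrative and may not match the project's actual output.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    title: str
    relevance: float  # similarity score; the scale is an assumption
    reason: str       # short explanation of why the paper was selected


@dataclass
class StructuredAnswer:
    main_answer: str
    key_points: list = field(default_factory=list)
    citations: list = field(default_factory=list)
    limitations: list = field(default_factory=list)

    def render(self) -> str:
        """Format the answer as plain text, one section per response part."""
        lines = ["Main Answer:", self.main_answer, "", "Key Points:"]
        lines += [f"- {point}" for point in self.key_points]
        lines += ["", "Paper Citations:"]
        lines += [
            f"- {c.title} (relevance: {c.relevance:.2f}): {c.reason}"
            for c in self.citations
        ]
        lines += ["", "Limitations:"]
        lines += [f"- {item}" for item in self.limitations]
        return "\n".join(lines)
```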
research-lens/
├── docker-compose.yml
├── requirements.txt
├── README.md
├── vector.py # Paper ingestion and processing
├── llmquery.py # Main Q&A interface
├── query.py # Queries the Qdrant database without the LLM
├── markdowns/ # Paper storage directory
└── processed_papers.json # Paper tracking database
Environment variables for system configuration:
QDRANT_HOST=localhost # Qdrant server host
QDRANT_PORT=6333 # Qdrant server port
OLLAMA_HOST=localhost # Ollama server host
OLLAMA_PORT=11434 # Ollama server port
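A minimal sketch of reading these variables in Python follows. The Settings class and load_settings function are hypothetical names, and two things are assumed: the values shown above act as defaults when a variable is unset, and both services are reached over plain HTTP.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    qdrant_host: str
    qdrant_port: int
    ollama_host: str
    ollama_port: int

    @property
    def qdrant_url(self) -> str:
        # Plain http is an assumption suited to a local Docker setup.
        return f"http://{self.qdrant_host}:{self.qdrant_port}"

    @property
    def ollama_url(self) -> str:
        return f"http://{self.ollama_host}:{self.ollama_port}"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        ollama_host=env.get("OLLAMA_HOST", "localhost"),
        ollama_port=int(env.get("OLLAMA_PORT", "11434")),
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to test without touching the real environment.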
- Paper Ingestion:
  - Reads markdown files from the markdowns/ directory
  - Generates embeddings using Nomic Embed Text
  - Stores vectors and metadata in Qdrant
  - Tracks processed papers to avoid duplicates
- Query Processing:
  - Converts user query to vector
  - Performs similarity search
  - Retrieves relevant papers
  - Generates structured LLM response
- Response Generation:
  - Formats context for LLM
  - Generates structured response
  - Provides relevance explanations
  - Maintains conversation history
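The query and response steps above can be sketched in pure Python. In the real system Qdrant performs the similarity search and Ollama serves the model. Here an in-memory cosine-similarity ranking stands in for Qdrant, and the function names (cosine_similarity, top_k, build_prompt), the paper dictionary keys, the choice of cosine as the metric, and the prompt wording are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, papers, k=3):
    """Rank papers (dicts with 'title', 'text', 'vector') by similarity to the query."""
    scored = [(cosine_similarity(query_vector, p["vector"]), p) for p in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, ranked):
    """Format the retrieved papers and the question into a single LLM prompt."""
    context = "\n\n".join(
        f"[{i}] {paper['title']} (score {score:.2f})\n{paper['text']}"
        for i, (score, paper) in enumerate(ranked, start=1)
    )
    return (
        "Answer the question using only the papers below.\n"
        "Structure the answer as: Main Answer, Key Points, "
        "Paper Citations, Limitations.\n\n"
        f"Papers:\n{context}\n\nQuestion: {question}"
    )
```

Including the score next to each paper in the prompt gives the model something concrete to refer to when it explains relevance.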
- Export functionality (PDF, markdown)
- Advanced paper filtering options
- Citation network visualization
- Multi-language support
- Batch processing capabilities
- API interface for integration
- Enhanced analytics dashboard
- Custom prompt templates
Contributions are welcome! Please:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Qdrant team for vector database
- Ollama project for LLM interface
- Nomic AI for embedding model
- LLaMA team for the base model
- The markdown files were fetched using literatureSurvey
If you use this project in your research, please cite:
@software{PaperBrain_2024,
author = {Ansh Kumar and Apoorva Gupta},
title = {PaperBrain: Intelligent Research Paper Q&A System},
year = {2024},
url = {https://github.com/ansh-info/PaperBrain}
}