"infiniCite" is an advanced academic paper knowledge base that hosts a subset of the world's academic paper metadata and offers a combination of keyword and semantic search engine with chat interfaces. Users are allowed to add papers to their personal library and perform operations like sharing, merging, and commenting. The platform also provides graph-based visualizations of references, citations, and co-authorship, allowing user to visualize the relationships between papers and authors in a graph traversal format.
-
Knowledge Base Hosting: hosts a subset of academic paper metadata including the embedding vectors and tldr summaries in order to integrate with semantic search and chatbot functionalities. The platform has approximately a million articles, summing up to 90GB of data.
-
Backend Management: Django backend for user account management, offering functionalities such as library categorization of literature. It allows users to add papers and perform operations like sharing, merging, and commenting.
-
Interactive UI: Developed parts of the frontend user interface using React. Integrated D3 for graph-based visualizations, enabling users to view references, citations, and co-authorship in a graph traversal format, enhancing user interactivity and experience.
-
Search Functionality: Incorporated Elasticsearch to create an interactive chat interface in the user's knowledge base. It provides keyword search with multiple filters. Leveraging article embedding vectors and the OpenAI API, semantic search capabilities have been added to the platform.
-
Deployment: A Docker image has been crafted for the project, and it is deployed using AWS EC2.
Local or cloud PostgreSQL database is required.
# In `config/db_config.py`.
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': '<dbname>',
'USER': '<username>',
'PASSWORD': '<password>',
'HOST': '<host>',
'PORT': '5432',
}
}
Configure django secret key.
# config/django_config.py
SECRET_KEY = '<your-secret-key>
Elasticsearch is a search engine that is used to provide advanced search functionality in infiniCite. When combined with article embedding, it enables semantic search capabilities, allowing users to search for papers based on their meaning rather than just keywords. Elasticsearch can be hosted either locally or in the cloud, depending on your preference.
# config/elastic_config.py
ELASTICSEARCH_DSL = {
'default': {
'hosts': '<your_connection_string>'
},
}
The OpenAI API is used to integrate with elasticsearch to provide a chat search interface.
# config/api_keys.py
OPENAI_API_KEY = "<your_openai_api_key>"
- Install system-level dependencies
brew install libpq
brew install postgresql
- Create and activate virtual environment
infiniCite
python -m venv infiniCite
source .venv/bin/activate
pip install -r requirements.txt
- Start Django server
python manage.py runserver
- Start all services
docker-compose -f docker-compose.yaml up
- Attach to running container
docker exec -it <container_id> bash
- Run Django server (In container)
python manage.py runserver
- Stop all services (On host)
docker-compose -f docker-compose.yaml down
We welcome contributions! Please see CONTRIBUTING.md
for details.
(Include license details here.)
(Include any acknowledgments, if necessary.)