Paper Recommendation & Summarization

Developed during my master's thesis at TU Berlin, this library provides an end-to-end retrieval-augmented generation (RAG) pipeline for paper recommendation and summarization. Its purpose is to help researchers stay up to date with the latest research in their field. It therefore focuses on the discovery of new research and uses OpenAlex as its data source.

From a free-form text description (FFTD) of a research interest, the system retrieves a set of candidate papers from OpenAlex, ranks them with regard to the query, and generates summaries tailored to the user's interest.
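The three stages can be pictured as a simple composition. The following is only a sketch of the data flow, not this library's API: `Paper`, `recommend`, and the toy stand-ins for retrieval, scoring, and summarization are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Paper:
    title: str
    abstract: str


def recommend(
    fftd: str,
    retrieve: Callable[[str], list[Paper]],
    score: Callable[[str, Paper], float],
    summarize: Callable[[str, Paper], str],
    top_k: int = 3,
) -> list[tuple[Paper, str]]:
    """Retrieve candidates, rank them against the FFTD, summarize the best ones."""
    candidates = retrieve(fftd)
    ranked = sorted(candidates, key=lambda p: score(fftd, p), reverse=True)
    return [(p, summarize(fftd, p)) for p in ranked[:top_k]]


# Toy stand-ins (assumptions): a fixed candidate list instead of OpenAlex,
# word overlap instead of a ranking model, a template instead of an LLM.
CANDIDATES = [
    Paper("Graph databases", "Storage engines for graphs."),
    Paper("LLM rerankers for search", "Reranking with language models."),
]


def word_overlap(fftd: str, paper: Paper) -> float:
    query_words = set(fftd.lower().split())
    paper_words = set(f"{paper.title} {paper.abstract}".lower().split())
    return float(len(query_words & paper_words))


results = recommend(
    "llm rerankers",
    retrieve=lambda fftd: CANDIDATES,
    score=word_overlap,
    summarize=lambda fftd, p: f"{p.title}: relevant to '{fftd}'",
    top_k=1,
)
for paper, summary in results:
    print(summary)
```

In the real system each stand-in would be replaced by the corresponding component (OpenAlex retrieval, the ranking model, LLM summarization).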

Architecture

Overview

Retrieval

Summarization

Usage

The Docker image provides an execution environment for this library. For usage examples, see usage_example.ipynb or setup/test.py.

Setup

Prerequisites

- PostgreSQL instance
  - with pgvector (tested with 0.7.2 and 0.8.0)
  - with pg_bestmatch_rs (for BM25, tested with 0.0.1)
- Docker with Docker Compose
- OpenAI API key

Preparing the Database

Set up the database schema via setup/ddl.sql.
E.g. psql -U [DB_USER] -d [DB_NAME] -f setup/ddl.sql
Load OpenAlex embeddings for topic matching via setup/openalex_embeddings.sql.
E.g. psql -U [DB_USER] -d [DB_NAME] -f setup/openalex_embeddings.sql

Setup Instructions

Copy .env.example to .env and fill in the required values (database connection parameters, OpenAI API key, etc.).
Run docker compose build to obtain an image with the required dependencies.
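An incomplete .env is an easy mistake to make, so a small check for blank values can save a failed run. The sketch below assumes the settings reach the process as environment variables. The database variable names are hypothetical placeholders; use the keys that actually appear in .env.example.

```python
import os

# Hypothetical key names (assumption): replace with the keys from .env.example.
REQUIRED_VARS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD", "OPENAI_API_KEY")


def missing_settings(env, required=REQUIRED_VARS):
    """Return the required keys that are unset or blank in the given mapping."""
    return [key for key in required if not str(env.get(key, "")).strip()]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```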

Testing the Setup

You can test the setup by running:
docker compose run --rm app bash -c "python3 setup/test.py 'llm rerankers'"

This script tests all components of the system, including the database connection, database extensions, OpenAI and OpenAlex APIs, and the reranking model.

The script uses "llm rerankers" as the research interest description (FFTD), performs topic matching, and retrieves a small number (100) of candidate papers from OpenAlex. These papers are stored in the database and then ranked with respect to the FFTD using a hybrid ranking model (embedding + BM25). After reranking via setwise.heapsort, the top 5 results are printed. The top 3 are then summarized, and the summaries are printed.
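To make the ranking steps more concrete, here is a minimal sketch of the two ideas, not the code in this repository. It assumes the hybrid score is a weighted sum of min-max normalized embedding similarity and BM25 scores (the actual fusion may differ), and it illustrates setwise heapsort as an n-ary heap in which a picker chooses the most relevant item among a parent and its children. `hybrid_scores`, `setwise_top_k`, and `pick_best` are hypothetical names; in a real setwise reranker, `pick_best` would be an LLM call, while here it simply reuses the hybrid scores.

```python
from typing import Callable, Sequence


def min_max(values: Sequence[float]) -> list[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(embedding_sims, bm25_scores, alpha=0.5):
    """Weighted sum of normalized embedding and BM25 scores (assumed fusion)."""
    emb, bm25 = min_max(embedding_sims), min_max(bm25_scores)
    return [alpha * e + (1 - alpha) * b for e, b in zip(emb, bm25)]


def setwise_top_k(docs, pick_best: Callable[[list], int], k: int, arity: int = 3):
    """Return the k best docs via heapsort on an n-ary heap.

    pick_best receives a small set of docs and returns the index (within
    that set) of the most relevant one.
    """
    heap = list(docs)

    def sift_down(i: int, size: int) -> None:
        while True:
            first = arity * i + 1
            children = list(range(first, min(first + arity, size)))
            if not children:
                return
            group = [i] + children
            best = group[pick_best([heap[j] for j in group])]
            if best == i:
                return
            heap[i], heap[best] = heap[best], heap[i]
            i = best

    n = len(heap)
    for i in range((n - 2) // arity, -1, -1):  # heapify from the last parent up
        sift_down(i, n)

    top, size = [], n
    while size > 0 and len(top) < k:
        top.append(heap[0])      # current best sits at the root
        size -= 1
        heap[0] = heap[size]     # move the last leaf to the root
        sift_down(0, size)
    return top


docs = ["paper A", "paper B", "paper C", "paper D", "paper E"]
scores = hybrid_scores([0.82, 0.40, 0.75, 0.10, 0.55], [3.1, 7.4, 6.0, 0.5, 2.2])
score_of = dict(zip(docs, scores))


def pick_best(group: list) -> int:
    # Stand-in for the LLM choice: take the doc with the highest hybrid score.
    return max(range(len(group)), key=lambda j: score_of[group[j]])


top = setwise_top_k(docs, pick_best, k=3)
print(top)
```

Because only the top k items are extracted, the picker is called far fewer times than a full sort would require, which matters when each call is an LLM request.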

fa-se/llm-paper-recommendation-summarization

Folders and files

Latest commit

History

Repository files navigation

Paper Recommendation & Summarization

Architecture

Overview

Retrieval

Summarization

Usage

Setup

Prerequisites

Preparing the Database

Setup Instructions

Testing the Setup

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages