This project is a FastAPI-based text summarization service that supports multiple traditional summarization algorithms from the sumy
library. It is not based on large language models (LLMs) but uses classical approaches for summarization.
- Supports various summarization algorithms:
  - SumBasic
  - Luhn
  - Edmundson
  - LexRank
  - TextRank
  - LSA (Latent Semantic Analysis)
- Cleans the input text by removing unwanted characters and links.
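As a rough illustration of how the algorithm names above could map onto sumy, the sketch below resolves a name to the corresponding sumy summarizer class and runs it on plain text. The names `SUMMARIZERS`, `load_summarizer_class`, and `summarize` are hypothetical and are not taken from this project's code; only the sumy modules and classes are real.

```python
import importlib

# Algorithm name -> (sumy module path, class name).
# The dictionary itself is illustrative; the modules and classes exist in sumy.
SUMMARIZERS = {
    "SumBasic": ("sumy.summarizers.sum_basic", "SumBasicSummarizer"),
    "Luhn": ("sumy.summarizers.luhn", "LuhnSummarizer"),
    "Edmundson": ("sumy.summarizers.edmundson", "EdmundsonSummarizer"),
    "LexRank": ("sumy.summarizers.lex_rank", "LexRankSummarizer"),
    "TextRank": ("sumy.summarizers.text_rank", "TextRankSummarizer"),
    "LSA": ("sumy.summarizers.lsa", "LsaSummarizer"),
}


def load_summarizer_class(name):
    """Resolve an algorithm name to its sumy class, importing it lazily."""
    try:
        module_path, class_name = SUMMARIZERS[name]
    except KeyError:
        raise ValueError(f"Unknown summarizer: {name!r}") from None
    return getattr(importlib.import_module(module_path), class_name)


def summarize(text, name="LSA", sentences_count=5, language="english"):
    """Return the selected sentences joined into one string."""
    # sumy is imported here so the registry above works without it installed.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.utils import get_stop_words

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = load_summarizer_class(name)(Stemmer(language))
    # Not sufficient for Edmundson, which needs bonus_words, stigma_words,
    # and null_words to be set before it can be called.
    summarizer.stop_words = get_stop_words(language)
    sentences = summarizer(parser.document, sentences_count)
    return " ".join(str(sentence) for sentence in sentences)
```

Calling `summarize(text, name="LexRank", sentences_count=3)` would return up to three sentences chosen by LexRank. Edmundson needs its extra word lists configured first, so this sketch does not cover it as written.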
- Python 3.8+
- FastAPI
- Pydantic
- sumy
- nltk
- numpy
- Clone the repository:
  git clone https://github.com/AmirTahaMim/SumSimple
  cd <repository-directory>
- Create and activate a virtual environment:
  python -m venv env
  source env/bin/activate  # On Windows use `env\Scripts\activate`
- Install the required packages (uvicorn is needed to run the server):
  pip install fastapi uvicorn pydantic sumy nltk numpy
- Download the necessary NLTK data:
  python -c "import nltk; nltk.download('punkt')"
- Run the FastAPI server:
  uvicorn main:app --reload
- The API will be available at http://127.0.0.1:8000.
- POST /summarize/
{
"input_text": "Your base64 encoded text here",
"summarizer": "LSA",
"sentences_count": 5,
"language": "english"
}
- input_text: The text to summarize (base64 encoded).
- summarizer: The summarization algorithm to use (default: LSA). Options: "SumBasic", "Luhn", "Edmundson", "LexRank", "TextRank", "LSA".
- sentences_count: The number of sentences in the summary (default: 5).
- language: The language of the text (default: "english").
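Because `input_text` must be base64 encoded, a client has to encode the raw text before sending it. The sketch below shows one way to build the request body in Python. The helper names `build_payload` and `decode_input` are hypothetical, and UTF-8 is assumed as the text encoding, which the project does not state.

```python
import base64
import json


def build_payload(text, summarizer="LSA", sentences_count=5, language="english"):
    """Build the JSON body for POST /summarize/ from raw text."""
    # Assumption: the service expects UTF-8 bytes under the base64 layer.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {
        "input_text": encoded,
        "summarizer": summarizer,
        "sentences_count": sentences_count,
        "language": language,
    }


def decode_input(input_text):
    """Reverse the encoding, as a server would do before summarizing."""
    return base64.b64decode(input_text).decode("utf-8")


payload = build_payload("This is a sample text to summarize....", sentences_count=3)
print(json.dumps(payload, indent=2))
```

The defaults mirror the documented ones, so only the text itself is required.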
curl -X POST "http://127.0.0.1:8000/summarize/" -H "Content-Type: application/json" -d '{
"input_text": "VGhpcyBpcyBhIHNhbXBsZSB0ZXh0IHRvIHN1bW1hcml6ZS4uLi4=",
"summarizer": "LSA",
"sentences_count": 3,
"language": "english"
}'
{
"summary": "This is the summarized text."
}
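The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server is running locally at the address above and returns a JSON object with a `summary` field, as in the sample response. The function names `build_request` and `request_summary` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/summarize/"


def build_request(payload, url=API_URL):
    """Wrap a payload dict in a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_summary(payload, url=API_URL, timeout=30):
    """Send the request and return the "summary" field of the response."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["summary"]
```

`request_summary` raises `urllib.error.URLError` if the server is not reachable, so callers may want to handle that case.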
The input text is cleaned by:
- Removing lines with '*' or fewer than 50 words.
- Removing markdown and HTML links.
- Removing URLs.
- Removing extra whitespace and unwanted characters.
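The cleaning steps above can be approximated with a few regular expressions. This is only a sketch and not the project's actual implementation: the patterns, the function name `clean_text`, and the choice to drop links entirely instead of keeping their link text are assumptions, and the "unwanted characters" rule is left out because the list above does not define it.

```python
import re

# Illustrative patterns; the project's real rules may differ.
MD_LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")  # [text](url)
HTML_LINK = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
URL = re.compile(r"https?://\S+|www\.\S+")
WHITESPACE = re.compile(r"\s+")


def clean_text(text, min_words=50):
    """Drop noisy lines, strip links and URLs, and collapse whitespace."""
    kept = []
    for line in text.splitlines():
        # Skip lines containing '*' or with fewer than min_words words.
        if "*" in line or len(line.split()) < min_words:
            continue
        kept.append(line)
    cleaned = "\n".join(kept)
    # Markdown links go first so the bare-URL pattern cannot split them.
    for pattern in (MD_LINK, HTML_LINK, URL):
        cleaned = pattern.sub(" ", cleaned)
    return WHITESPACE.sub(" ", cleaned).strip()
```

Word counts are taken before links are removed, so a line that only reaches 50 words because of its links is still kept. Reversing the order would be an equally reasonable reading of the list.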
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.
This project is licensed under the MIT License.