A text-content summarizer microservice written in GoLang and using gRPC for the API

Lawrence-Godfrey/Text-Summarizer

Text Summarizer

This is a Go microservice that exposes NLP functionality through a gRPC API, with the end goal of summarizing pieces of text.

The live gRPC API can be accessed via https://summarizer.lawrences.tech

Functionality

  • Segmentation - Split text into sentences and words. This can be a difficult task due to the complexity of natural language. For example, the period character is used to denote the end of a sentence, but it can also be used to denote an abbreviation or decimal number. The segmentation service will use a combination of rules and machine learning to split text into sentences and words.
  • Named Entity Recognition - Identify important people, places, and things in the text. For example, in the sentence "George Washington was the first president of the United States", the named entities are "George Washington" and "United States". The named entity recognition service will use machine learning to identify named entities in the text.
  • Keyword Extraction - Identify the main topics of a piece of text. Here we use TF-IDF (term frequency-inverse document frequency) to identify the most important words in the text. TF-IDF is a statistical measure of how important a word is to a document within a collection of documents: it weighs how often the word appears in the document against how many documents in the collection contain the word, so words that are frequent in one document but rare across the collection score highest.
  • Summarization - Create a summary of the text. This can be done in two ways: extractive summarization and abstractive summarization. Extractive summarization involves selecting key sentences from the original text, while abstractive summarization involves generating new sentences that convey the main points.
    • Extractive Summarization - This will be done using a combination of the above services.
    • Abstractive Summarization - This will be done using an existing model, such as BART or T5.
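
The TF-IDF weighting described under Keyword Extraction can be sketched in a few lines of Go. This is a minimal illustration, not the service's actual implementation: the function names (tokenize, tfidf, topKeywords) are hypothetical, the tokenizer is a naive split on non-alphanumeric characters, and the unsmoothed weighting tf * ln(N / df) is only one of several common TF-IDF variants.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
	"unicode"
)

// tokenize lowercases the text and splits it on any rune that is not
// a letter or a digit. (Hypothetical helper, far simpler than a real
// segmentation service.)
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// tfidf returns, for each document, a map from term to TF-IDF score.
// TF is the term count divided by the document length in tokens;
// IDF is ln(N / df), where df is the number of documents containing the term.
func tfidf(docs []string) []map[string]float64 {
	tokenized := make([][]string, len(docs))
	df := make(map[string]int)
	for i, d := range docs {
		tokenized[i] = tokenize(d)
		seen := make(map[string]bool)
		for _, t := range tokenized[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}

	n := float64(len(docs))
	scores := make([]map[string]float64, len(docs))
	for i, tokens := range tokenized {
		counts := make(map[string]int)
		for _, t := range tokens {
			counts[t]++
		}
		scores[i] = make(map[string]float64)
		for t, c := range counts {
			tf := float64(c) / float64(len(tokens))
			idf := math.Log(n / float64(df[t]))
			scores[i][t] = tf * idf
		}
	}
	return scores
}

// topKeywords returns the k highest-scoring terms of one document,
// breaking ties alphabetically so the result is deterministic.
func topKeywords(scores map[string]float64, k int) []string {
	terms := make([]string, 0, len(scores))
	for t := range scores {
		terms = append(terms, t)
	}
	sort.Slice(terms, func(a, b int) bool {
		if scores[terms[a]] != scores[terms[b]] {
			return scores[terms[a]] > scores[terms[b]]
		}
		return terms[a] < terms[b]
	})
	if k > len(terms) {
		k = len(terms)
	}
	return terms[:k]
}

func main() {
	docs := []string{
		"The cat sat on the mat.",
		"The dog chased the cat.",
		"The bird sang in the tree.",
	}
	scores := tfidf(docs)
	for i := range docs {
		fmt.Println(topKeywords(scores[i], 2))
	}
}
```

In this toy collection, "the" appears in every document, so its IDF is ln(1) = 0 and it never ranks as a keyword, while a word such as "cat", which appears in two of the three documents, scores lower than words unique to a single document.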

Usage

To use the API, download the protobuf file (api/proto/summarizer.proto) and generate the gRPC client code for your language of choice. To generate the Go client code, run the script in this repository:

./scripts/genproto.sh
