Releases: pszemraj/textsum
v0.2.2 - new default model
The default model loaded when creating a Summarizer is now BEE-spoke-data/pegasus-x-base-synthsumm_open-16k, which was trained on diverse data (including some code) to be a better generalist than the previous booksum-based default.
As before, you can load the old model (or any model on the Hub) with:

```python
from textsum.summarize import Summarizer

summarizer = Summarizer(model_name_or_path="pszemraj/long-t5-tglobal-base-16384-book-summary")
```
What's Changed
Full Changelog: v0.2.1...v0.2.2
batch processing improvements
A small release with improvements to the Summarizer class for batch-processing use cases.
Let's say you've loaded your Summarizer:

```python
from textsum.summarize import Summarizer

model_name = "pszemraj/pegasus-x-large-book_synthsumm-bf16"  # a recent model
summarizer = Summarizer(model_name)
```

New features/improvements:
Smart __call__ for the Summarizer class:
- Added a smart __call__ method that automatically distinguishes between raw text input and file paths for summarization, allowing easier integration into batch processing and .map() tasks.
```python
# directly passing text to be summarized
summary_text = summarizer("This is a sample text to summarize.")
print(summary_text)

# passing a file path to be summarized
output_filepath = summarizer(
    "/path/to/textfile.extension",
    output_dir="./my-summary-stash",
)
print(output_filepath)
```
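The dispatch logic can be sketched roughly as follows (a hypothetical simplification for illustration, not the library's actual implementation):

```python
import os

def smart_call(source: str) -> str:
    """Hypothetical sketch of a smart __call__ dispatch:
    if the argument is an existing file path, summarize the file;
    otherwise, treat it as raw text to summarize."""
    if os.path.isfile(source):
        return f"summarize file: {source}"
    return f"summarize text: {source}"

print(smart_call("This is raw text."))  # summarize text: This is raw text.
```

This kind of dispatch is what lets the same callable work both for direct strings and inside `.map()` pipelines over file lists.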
Enhanced batch processing controls:
- Introduced disable_progress_bar and batch_delimiter options to improve control over batch processing and output formatting.
```python
from datasets import load_dataset

dataset = load_dataset("Trelis/tiny-shakespeare")
dataset = dataset.map(
    lambda x: {"summary": summarizer(x["text"], disable_progress_bar=True)},
    batched=False,
)  # doesn't spam you with multiple progress bars!!
print(dataset)
```
Note: you can also pass disable_progress_bar=True when instantiating the Summarizer() for cleaner inference.
You can now set the string used to join chunk summaries via the batch_delimiter argument when running inference:

```python
summary_output = summarizer(text, batch_delimiter="<I AM A DELIMITER>")
print(summary_output)
# "Summary of first chunk.<I AM A DELIMITER>Summary of second chunk.<I AM A DELIMITER>Summary of third chunk."
```
By default, the delimiter is "\n\n".
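Because long inputs are chunked and summarized chunk by chunk, a distinctive delimiter makes it trivial to recover the per-chunk summaries afterward. A plain-Python sketch of the idea (not textsum code):

```python
# join per-chunk summaries the way a custom batch_delimiter would
chunk_summaries = ["Summary of first chunk.", "Summary of second chunk."]
delimiter = "<I AM A DELIMITER>"
summary_output = delimiter.join(chunk_summaries)

# split the combined output back into the individual chunk summaries
recovered = summary_output.split(delimiter)
print(recovered)  # ['Summary of first chunk.', 'Summary of second chunk.']
```

Picking a delimiter that cannot appear in model output is what makes the round trip lossless.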
Misc
- Default parameter update: the length_penalty for inference is now 1.0 (was 0.8).
- Code cleanup across modules, mostly for readability and maintainability.
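For intuition on what this change does (this is the standard Hugging Face length-normalization formula for beam search, not textsum-specific code): a hypothesis's cumulative log-probability is divided by length ** length_penalty, so moving from 0.8 to 1.0 removes the mild bias toward shorter outputs:

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    # Hugging Face-style length normalization for beam search hypotheses
    return sum_logprobs / (length ** length_penalty)

# two hypotheses with identical per-token log-probability (-0.5 per token)
short_hyp = beam_score(-10.0, 20, length_penalty=0.8)
long_hyp = beam_score(-20.0, 40, length_penalty=0.8)
print(short_hyp > long_hyp)  # True: penalty < 1.0 favors the shorter hypothesis

# at length_penalty=1.0, the two score identically
print(beam_score(-10.0, 20, 1.0) == beam_score(-20.0, 40, 1.0))  # True
```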
What's Changed
Full Changelog: v0.2.0...v0.2.1
inference optimization ⚗
🦿 This release adds support for some features that can make inference faster:
- support for torch compile & optimum ONNX¹
- improved the textsum-dir command: more options, streamlined, etc.; added the fire package to help with that
- the saved config JSON files are now better structured to keep track of parameters, etc.
- some small adjustments to the Summarizer class
Next up: the UI app will finally get an overhaul.
¹ Please note that "support for" is not equivalent to "I have tested every long-context model with ONNX max quantization and sign off guaranteeing they will all provide accurate results". I've had some good results, but also some strange ones (with Long-T5 specifically). Test beforehand, and file an issue on the Optimum repo as needed 🙏 ↩
support for LLM.int8
On GPU, you can now load models with LLM.int8 quantization to use less memory:
```python
from textsum.summarize import Summarizer

# loads the default model in LLM.int8, taking 1/4 of the memory
summarizer = Summarizer(load_in_8bit=True)
```
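The "1/4 of the memory" figure follows from storing weights as 8-bit integers instead of 32-bit floats. A back-of-the-envelope check (the parameter count below is illustrative, not the real model's):

```python
# rough weight-memory comparison: fp32 vs LLM.int8
params = 250_000_000            # illustrative parameter count
fp32_bytes = params * 4         # float32: 4 bytes per weight
int8_bytes = params * 1         # int8: 1 byte per weight
print(fp32_bytes / int8_bytes)  # 4.0
```

Note this covers weight storage only; activations and the int8 outlier handling add some overhead in practice.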
What's Changed
Full Changelog: v0.1.3...v0.1.5
minor doc & logging updates
Improves docs and logging, and makes it easier to set the inference params from JSON.
What's Changed
Full Changelog: v0.1.2...v0.1.3
pip install textsum
Updated docs reflecting that the package is now on PyPI!

```shell
pip install textsum
```
What's Changed
Full Changelog: v0.1.1...v0.1.2
pypi
Summarizer class object
Easy-to-use Python API courtesy of a class object:
```python
from textsum.summarize import Summarizer

summarizer = Summarizer()  # loads default model and parameters
out_str = summarizer.summarize_string("This is a long string of text that will be summarized.")
print(out_str)
```
What's Changed
Full Changelog: v0.0.5...v0.1
v0.0.5
- Adds a CLI summarization workflow to summarize all text files in a directory: textsum-dir
- The gradio UI demo CLI was updated to textsum-ui
What's Changed
Full Changelog: v0.0.1...v0.0.5