A tiny Python no-strings package for translating a massive stream of texts, with native support for pre-annotated fixed spans that remain invariant for the translator.

nicolay-r/bulk-translate

bulk-translate 0.25.1

A tiny Python no-strings package for translating massive CSV/JSONL files, with native support for pre-annotated fixed spans that remain invariant for the translator.

Description

📘 More on spans

📘 bulk-translate features

Out of the box, bulk-translate provides:

  • ✅ Support for spans, for annotation and optional translation.
  • ✅ Native implementation of two translation modes:
    • fast-mode: uses extra separator characters to group all text parts into a single batch, which is split back into parts after translation.
    • accurate: translates each text part individually.
  • ✅ No strings: you're free to adopt any LM / LLM backend.
    • Supports googletrans by default.
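
The difference between the two modes can be illustrated with a minimal sketch. This is not the bulk-translate implementation: the `Span` class, the `SEP` separator, and both function names are hypothetical, and `translate` stands for any backend callable that maps a string to a string.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Span:
    """Hypothetical fixed span: carried through translation unchanged."""
    value: str


Part = Union[str, Span]
SEP = "\n"  # hypothetical separator assumed to survive translation


def translate_accurate(parts: List[Part], translate: Callable[[str], str]) -> List[Part]:
    # One backend call per text part; spans are passed through untouched.
    return [p if isinstance(p, Span) else translate(p) for p in parts]


def translate_fast(parts: List[Part], translate: Callable[[str], str], sep: str = SEP) -> List[Part]:
    # Join all text parts into a single batch and translate it in one call.
    texts = [p for p in parts if not isinstance(p, Span)]
    if not texts:
        return list(parts)
    translated = translate(sep.join(texts)).split(sep)
    # If the backend did not preserve the separators, the batch cannot be
    # split back reliably, so fall back to per-part translation.
    if len(translated) != len(texts):
        return translate_accurate(parts, translate)
    it = iter(translated)
    return [p if isinstance(p, Span) else next(it) for p in parts]
```

In this sketch the fast mode costs one backend call per text but depends on the separator surviving translation, while the accurate mode costs one call per text part.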

Installation

From PyPI:

pip install bulk-translate

or the latest version from GitHub:

pip install git+https://github.com/nicolay-r/bulk-translate

Usage

API

Command Line / Shell

NOTE: Spans are supported only in the JSON-lines format.

NOTE: Requires the source_iter package to be installed.

For the test.tsv example data, in which annotated entities are enclosed in square brackets, run:

python -m bulk_translate.translate \
    --src "test/data/test.tsv" \
    --prompt "{text}" \
    --adapter "dynamic:models/googletrans_310a.py:GoogleTranslateModel" \
    --output "test-translated.jsonl" \
    %%m \
    --src "auto" \
    --dest "ru"
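
To illustrate what "annotated entities enclosed in square brackets" means for a single text, here is a minimal sketch that splits a string into plain chunks and bracketed entities. The function name `split_bracketed` and the `(chunk, is_span)` representation are assumptions made for illustration only; the package's own parsing may differ.

```python
import re
from typing import List, Tuple


def split_bracketed(text: str) -> List[Tuple[str, bool]]:
    """Split text into (chunk, is_span) pairs.

    is_span is True for content that was enclosed in [square brackets].
    Nested brackets are not handled in this sketch.
    """
    parts: List[Tuple[str, bool]] = []
    pos = 0
    for match in re.finditer(r"\[([^\[\]]+)\]", text):
        if match.start() > pos:
            # Plain text between the previous entity and this one.
            parts.append((text[pos:match.start()], False))
        parts.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        # Trailing plain text after the last entity.
        parts.append((text[pos:], False))
    return parts


print(split_bracketed("[Anna] visited [Rome] today"))
```

Chunks flagged as spans would be the ones kept fixed, while the remaining chunks are the text parts sent to the translator.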

Powered by

The pipeline construction components were taken from AREkit.
