Merge branch 'master' into v2
PJ-Finlay committed May 6, 2024
2 parents 3beed42 + 6f1b5d5 commit d791700
Showing 8 changed files with 152 additions and 10 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -8,7 +8,7 @@ Argos Translate uses [OpenNMT](https://opennmt.net/) for translations and can be
 Argos Translate also manages automatically pivoting through intermediate languages to translate between languages that don't have a direct translation between them installed. For example, if you have a es → en and en → fr translation installed you are able to translate from es → fr as if you had that translation installed. This allows for translating between a wide variety of languages at the cost of some loss of translation quality.

 ### Supported languages
-Arabic, Azerbaijani, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Persian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian
+Arabic, Azerbaijani, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Persian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and more

 [Request a language](https://github.com/argosopentech/argos-translate/discussions/91)

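The pivoting described in the README context above (es → en plus en → fr giving es → fr) can be pictured as a shortest-path search over installed language pairs. This is an illustrative sketch only. `find_pivot_path` is a hypothetical helper and is not part of the argostranslate API. Argos Translate's own pivot logic may differ, for example in how many intermediate hops it allows.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def find_pivot_path(
    installed: Set[Tuple[str, str]], source: str, target: str
) -> Optional[List[str]]:
    """Return the shortest chain of language codes from source to target, or None."""
    if source == target:
        return [source]
    # Adjacency list: each installed model is a directed edge (from -> to).
    graph: Dict[str, List[str]] = {}
    for src, dst in installed:
        graph.setdefault(src, []).append(dst)
    # Breadth-first search, remembering which language each one was first reached from.
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        lang = queue.popleft()
        for nxt in graph.get(lang, []):
            if nxt in previous:
                continue
            previous[nxt] = lang
            if nxt == target:
                # Walk the back-pointers to rebuild the path from source to target.
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```

For example, `find_pivot_path({("es", "en"), ("en", "fr")}, "es", "fr")` returns `["es", "en", "fr"]`. Each extra language in the path is one more model pass, which is why a pivoted translation can lose quality compared with a direct one.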
@@ -156,6 +156,7 @@ rm -r ~/.local/share/argos-translate
 ## Related Projects
 - [LibreTranslate-py](https://github.com/argosopentech/LibreTranslate-py) - Python bindings for LibreTranslate
 - [MetalTranslate](https://github.com/argosopentech/MetalTranslate) - Customizable translation in C++
+- [LibreTranslate/Locomotive](https://github.com/LibreTranslate/Locomotive) - Toolkit for training/converting LibreTranslate compatible language models 🚂
 - [DesktopTranslator](https://github.com/ymoslem/DesktopTranslator) - [OpenNMT](https://opennmt.net/) based translation application
 - [LibreTranslate-rs](https://github.com/grantshandy/libretranslate-rs) - LibreTranslate Rust bindings
 - [LibreTranslate Go](https://github.com/SnakeSel/libretranslate) - LibreTranslate Golang bindings
@@ -176,11 +177,13 @@ Custom models trained on your own data are available for $1000/language (negotia
[I am also available for hire](https://www.argosopentech.com/about/) to do support, consulting, or custom software development.

## Donate
If you find this software useful donations are appreciated.
If you find this software useful donations are greatly appreciated and help to make this project sustainable.
- [GitHub Sponsor](https://github.com/sponsors/argosopentech)
- [PayPal](https://www.paypal.com/biz/fund?id=MCCFG437JP9PJ)
- Bitcoin: 16UJrmSEGojFPaqjTGpuSMNhNRSsnspFJT
- Ethereum: argosopentech.eth
- Litecoin: MCwu7RRWeCRJdsv2bXGj2nnL1xYxDBvwW5
- BCH: bitcoincash:qzvpxe8y5kq45kahqkyv3p88sjrhlymj2v6xdrj3cv

Paid supporters receive priority support.

3 changes: 1 addition & 2 deletions argostranslate/tokenizer.py
@@ -27,8 +27,7 @@ def encode(self, sentence: str) -> List[str]:
         return tokens

     def decode(self, tokens: List[str]) -> str:
-        detokenized = "".join(tokens)
-        return detokenized.replace("▁", " ")
+        return self.lazy_processor().decode_pieces(tokens)


 class BPETokenizer(Tokenizer):
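The change above replaces a hand-rolled detokenizer with SentencePiece's own `decode_pieces`. Below is a minimal pure-Python sketch of one visible difference. It assumes a model trained with SentencePiece's default dummy prefix, so the first word's piece carries a leading "▁" marker. Both functions are hypothetical illustrations, not argostranslate code. `approx_sentencepiece_decode` models only this one behavior. The real decoder also handles byte-fallback pieces and normalization, which this sketch ignores.

```python
from typing import List

SPIECE_UNDERLINE = "\u2581"  # "▁", SentencePiece's whitespace marker


def naive_decode(tokens: List[str]) -> str:
    # The removed approach: join the pieces, then turn every marker into a space.
    return "".join(tokens).replace(SPIECE_UNDERLINE, " ")


def approx_sentencepiece_decode(tokens: List[str]) -> str:
    # Simplified approximation: additionally drop the single leading space that
    # comes from the dummy prefix added before the first word.
    text = naive_decode(tokens)
    return text[1:] if text.startswith(" ") else text
```

With pieces `["▁Hello", "▁wor", "ld", "."]`, the join-and-replace version yields `" Hello world."` (note the leading space). The approximation yields `"Hello world."`.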
@@ -0,0 +1,136 @@
#! /usr/bin/env bash

# migrate files
# from Argos-Translate-LibreTranslate-2022-04-30.zip.torrent
# to Argos-Translate-LibreTranslate-2022-04-30.torrent

# https://github.com/argosopentech/argos-translate/issues/375
# restructure the torrent: Argos-Translate-LibreTranslate-2022-04-30

# Argos-Translate-LibreTranslate-2022-04-30.zip.torrent
# is one zip file, which is bad practice for torrents
# users should be able to download only some model files

# this assumes that zip files are reproducible...
# see: packing new zip file

set -e

if [ $# != 1 ]; then
  echo "error: missing arguments"
  echo "usage:"
  echo " $0 path/to/Argos-Translate-LibreTranslate-2022-04-30.zip"
  exit 1
fi

old_zip="$1"

if ! [ -e "$old_zip" ]; then
  echo "error: missing input file: $old_zip"
  exit 1
fi

if [ -e "Argos-Translate-LibreTranslate-2022-04-30" ]; then
  echo "error: output dir exists: Argos-Translate-LibreTranslate-2022-04-30"
  exit 1
fi

echo "migrating files from old torrent to new torrent. this will take about 7 minutes"

# simple integrity check
old_zip_size_expected=6747579574
old_zip_size=$(stat -L -c%s "$old_zip")
if [[ "$old_zip_size" != "$old_zip_size_expected" ]]; then
  echo "error: wrong size of input file. expected $old_zip_size_expected bytes"
  exit 1
fi

# proper integrity check
old_zip_hash_expected="b8ef6920d998454cca1ba4748c54ca2a2306e47942678f7ca81fa48a1f2bf488"
echo checking hash of input file. this will take about 2 minutes
time \
  old_zip_hash=$(sha256sum "$old_zip" | cut -d' ' -f1)
if [[ "$old_zip_hash" != "$old_zip_hash_expected" ]]; then
  echo "error: wrong hash of input file"
  echo " actual: $old_zip_hash"
  echo " expected: $old_zip_hash_expected"
  exit 1
fi

# sha256sum *.zip
new_zip_hash_expected_list="$(cat <<EOF
b15dfbb0299b352b4209c4f6226223c850659447391c60c932b7b177bdd9fd51 argos-translate-files.zip
47907f95dd3053a20bd0020c4d381ee93d27849c4db871914d76409b38f9986f argos-translate-gui.zip
b4d306e43ff8927d99693e398c4bbbcbbf8fbf1535e1d6660aa888702a122913 argos-translate.zip
7dd00400a5fecb1bdd4d19a6e8c653cc3f06ebab5693a1acec1bf1ca53cfb0de CTranslate2.zip
c81498054566c63578f42ec17781b365b601884077d4eeb30120110c035ef965 LibreTranslate-cpp.zip
ed564894f6edd3b2cd3fa94876058d37c3fc031fd75af31e7b271f7bb9e93174 libretranslate-go.zip
3628e46ad2f9fe7e5e594e59bb7899d170adcf9f875eb23db1703fce8775b4f5 LibreTranslate-init.zip
78972c3b27f4070519231c2576c2543dc334b35c64aa21bda8bae9aba608bb8b libretranslate-php.zip
0788898ae45307a64190ae7921d30b6e8fb474f88b158ad9fea7330bc934f578 LibreTranslate-py.zip
bc82e9eb95aa3f07930125ed3f1a54c81ba041feea0f854f3999a7e5b3cabb47 libretranslate-rs.zip
0ac964c2acbcaae12580a58ead4eafdb80ad5c74e83092c1f1fa592ac926cbfd LibreTranslate-sh.zip
23cbaaf8dffb2c6678080c9dad5b70c24a7640b653f92080cd0fbe70e734f82f LibreTranslate.zip
889f90650f36b085b5e68447a7815f35000ca4da65602b090cb2479d232f481a OpenNMT-py.zip
3585842220f7294dd61d6e859550069ed656c2f0c78b3c7debedc8fbbed0c171 OpenNMT-tf.zip
6556001eff3fee9847c712c1f9593785b598df9ebedb1e5cebd2baedfc98f835 sentencepiece.zip
cc80d801a238664b04d69d5fef57cf3e4ed7ce1dfbb5edf7af4edae2e2a8fe6b stanza.zip
369452ed39193b57d0c32ff553a7b495f346d42d23e76ef9cba2a0f9537f96d7 Tokenizer.zip
a35f3b9e7f52677972ecfaa5be32ace99249da427cb24082dc0d2272ba3dc108 translate-html.zip
EOF
)"

echo unpacking the old zip file. this will take about 4 minutes
time \
  unzip -q "$old_zip"

cd Argos-Translate-LibreTranslate-2022-04-30

echo packing new zip files. this will take about 1 minute
time {
  find . -mindepth 1 -maxdepth 1 -type d -printf "%P\n" |
  grep -v -x -e models -e dirs |
  while read dir; do
    if [ -e $dir.zip ]; then
      echo "error: new zip file exists: $dir.zip"
      continue
    fi
    echo packing new zip file: $dir.zip
    # already compressed files should be "stored" in the zip archives
    zip -q -r -n .zip:.xz:.gz:.bz2:.7z:.rar:.odp:.epub:.idx:.pack:.bin:.pt:.woff:.woff2:.png:.torrent $dir.zip $dir

    # check integrity
    new_zip_hash=$(sha256sum $dir.zip | cut -d' ' -f1)
    new_zip_hash_expected=$(echo "$new_zip_hash_expected_list" | grep " $dir.zip$" | cut -d' ' -f1)
    if [[ "$new_zip_hash" != "$new_zip_hash_expected" ]]; then
      echo "error: wrong hash of output file: $dir.zip"
      echo " actual: $new_zip_hash"
      echo " expected: $new_zip_hash_expected"
      echo " keeping the folder $dir"
    else
      # remove the old files
      rm -rf $dir
    fi
  done
}

cd ..

cat <<EOF
done Argos-Translate-LibreTranslate-2022-04-30
next:
open the torrent Argos-Translate-LibreTranslate-2022-04-30.torrent
either from the torrent file
https://github.com/argosopentech/argos-translate/raw/master/p2p/Argos-Translate-LibreTranslate-2022-04-30.torrent
or from the magnet link
magnet:?xt=urn:btmh:12203d4464d5ccd13dfa5d1829f7c36f4b512c07e1d3086c1f7b9bb706864b82ef6f
and set the download folder to
$(dirname "$(readlink -f Argos-Translate-LibreTranslate-2022-04-30)")
then your torrent client should use the existing files and start seeding
EOF
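The script checks the old zip in two stages: a cheap size comparison, then a full SHA-256. The same check can be written in Python for readers without `stat` and `sha256sum`. This is a sketch only. `verify_file` is a hypothetical helper that is not part of this commit, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib
import os


def verify_file(path: str, expected_size: int, expected_sha256: str,
                chunk_size: int = 1 << 20) -> None:
    """Raise ValueError if the file's size or SHA-256 digest does not match."""
    # Cheap check first, like the script's `stat -L -c%s` comparison
    # (os.path.getsize follows symlinks, as `stat -L` does).
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise ValueError(f"wrong size: expected {expected_size} bytes, got {actual_size}")
    # Full check: stream the file through SHA-256 so a multi-GB zip is never
    # held in memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    actual_sha256 = digest.hexdigest()
    if actual_sha256 != expected_sha256:
        raise ValueError(f"wrong hash: expected {expected_sha256}, got {actual_sha256}")
```

With the script's values for the old archive, the call would be `verify_file(path, 6747579574, "b8ef6920d998454cca1ba4748c54ca2a2306e47942678f7ca81fa48a1f2bf488")`.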
Binary file not shown.
6 changes: 4 additions & 2 deletions p2p/README.md
@@ -2,5 +2,7 @@
 ## IPFS links to individual language models
 - [package index](https://www.argosopentech.com/argospm/index/)

-## Torrent
-- [Argos-Translate-LibreTranslate-2022-04-30.zip.torrent](https://github.com/argosopentech/argos-translate/raw/master/p2p/Argos-Translate-LibreTranslate-2022-04-30.zip.torrent)
+## Torrents
+##### [Argos-Translate-LibreTranslate-2022-04-30.zip.torrent](https://github.com/argosopentech/argos-translate/raw/master/p2p/Argos-Translate-LibreTranslate-2022-04-30.zip.torrent)
+##### [Argos-Translate-LibreTranslate-2022-04-30.torrent (requires BitTorrent v2)](https://github.com/argosopentech/argos-translate/raw/master/p2p/Argos-Translate-LibreTranslate-2022-04-30.torrent)
+Magnet Link: magnet:?xt=urn:btmh:12203d4464d5ccd13dfa5d1829f7c36f4b512c07e1d3086c1f7b9bb706864b82ef6f
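The v2 magnet link uses a `urn:btmh:` parameter rather than the SHA-1 `urn:btih:` of a v1 link. Its value is a multihash:

- the leading `12` identifies SHA2-256;
- the following `20` is the digest length (32 bytes);
- the rest is the v2 infohash.

Below is a minimal parsing sketch. `btv2_infohash_from_magnet` is a hypothetical helper. It assumes the single-byte varint encoding, which holds for the SHA2-256 code and length.

```python
from urllib.parse import parse_qs

SHA2_256_CODE = 0x12  # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # 32-byte digest length


def btv2_infohash_from_magnet(magnet: str) -> str:
    """Return the hex BitTorrent v2 infohash carried by a magnet link's urn:btmh: parameter."""
    scheme, sep, query = magnet.partition(":?")
    if scheme != "magnet" or not sep:
        raise ValueError("not a magnet link")
    for xt in parse_qs(query).get("xt", []):
        if not xt.startswith("urn:btmh:"):
            continue
        multihash = bytes.fromhex(xt[len("urn:btmh:"):])
        if len(multihash) < 2:
            raise ValueError("truncated multihash")
        # Code and length each fit in one varint byte for sha2-256 (both < 0x80).
        code, length, digest = multihash[0], multihash[1], multihash[2:]
        if code != SHA2_256_CODE or length != SHA2_256_LEN or len(digest) != length:
            raise ValueError("unexpected multihash")
        return digest.hex()
    raise ValueError("no urn:btmh: parameter")
```

Applied to the magnet link above, this returns the 64-hex-digit infohash after the `1220` prefix. A v1-only link with just `urn:btih:` raises `ValueError`.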
7 changes: 4 additions & 3 deletions requirements.txt
@@ -1,4 +1,5 @@
-ctranslate2==3.20.0
-sentencepiece==0.1.99
+ctranslate2>=4.0,<5
+sentencepiece==0.2.0
 stanza==1.1.1
-sacremoses==0.0.53
+packaging
+sacremoses==0.0.53
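The requirements change relaxes the exact `ctranslate2==3.20.0` pin to a range. Below is a minimal sketch of which versions that range admits. It assumes plain dotted release numbers. `release_tuple` and `allowed_by_new_pin` are hypothetical helpers written only for this illustration.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    # Simplified parser: dotted integer releases such as "4.2.1" only
    # (no pre-release, post-release, dev, or local tags).
    parts = [int(p) for p in version.split(".")]
    # Drop trailing zeros so "4", "4.0" and "4.0.0" compare equal, as under PEP 440.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def allowed_by_new_pin(version: str) -> bool:
    # Mirrors the specifier `ctranslate2>=4.0,<5` from requirements.txt.
    return release_tuple("4.0") <= release_tuple(version) < release_tuple("5")
```

Under this range pip may install any 4.x release. The old pin 3.20.0 and any 5.x release are excluded. For full PEP 440 handling, the `packaging` library listed in the same file provides `SpecifierSet(">=4.0,<5").contains(version)`.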
1 change: 1 addition & 0 deletions scripts/update_to_pypi.sh
@@ -6,6 +6,7 @@ exit 1

 # Git tag version number
 # git tag -a v1.0.0
+# git push --tags

 # Run from root of project
 rm -rf build dist
2 changes: 1 addition & 1 deletion setup.py
@@ -8,7 +8,7 @@

 setup(
     name="argostranslate",
-    version="1.10.0", # Version also stored in argostranslate/__version__ # TODO: Automate this
+    version="1.9.6",
     description="Open-source neural machine translation library based on OpenNMT's CTranslate2",
     long_description=long_description,
     long_description_content_type="text/markdown",