All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed incorrect parsing in `pretrained/createReplaceDecoder()`
- [#26]: fixed failure to load pretrained Roberta tokenizer
- Fixed errors in `pretrained.FromFile`
- Upgraded `golang.org/x/text` to fix GitHub warning
- `processor.NewRobertaProcessing` added 2 new parameters
- Added `SequenceRanges` field to `Encoding` struct
- Completed list of pretokenizers, decoders, normalizers, processors
- Fixed data race in `PostProcess` and `EncodeBatch`
- Added error handling when tokenizer `Model` is nil
- Cleaned up unwanted console print-out in `processor/bert`
- Added pretrained models "Roberta", "GPT2", and "BertLargeCasedWholeWordMaskingSquad"
- #14: fixed truncation and padding not working properly
- Updated "example/truncation" and "example/pretrained"
- #13: fixed Wordpiece Decoder incorrectly joining tokens and not stripping prefix
- #12: fixed use of pointer to `Decoder` interface in `Tokenizer` struct
- Updated `example_test` and `example` in README
- #11: added `addSpecialTokens` param to `EncodeSingle`, `EncodePair`, and `Tokenizer` APIs
- Updated Changelog and README
- #10: setup pretrained subpackage to load pretrained tokenizers.
- #8: fixed `encoding.MergeWith` merging overflowing incorrectly