

Files obtained from the original BERT (TensorFlow) repository.

Annotated with named shapes and compacted using tensor shorthand operators.
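A minimal, plain-Python sketch of the named-shapes idea, assuming a simple module-level registry. `Dim`, `declare_dims`, `get_dims` and `check_shape` are hypothetical names for illustration only and are not tsalib's API. Each dimension is declared once with a short name, later looked up by that name, and a documented shape can then be checked against the runtime one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    name: str   # short name used in shape annotations, e.g. "b"
    size: int   # concrete size bound at runtime


# Hypothetical registry (an assumption of this sketch): short name -> Dim.
_REGISTRY = {}


def declare_dims(spec):
    """Parse a spec like 'b:2 t:5 d:8' and register each Dim by short name."""
    dims = []
    for item in spec.split():
        name, size = item.split(":")
        dims.append(Dim(name, int(size)))
        _REGISTRY[name] = dims[-1]
    return tuple(dims)


def get_dims(names):
    """Look up sizes of declared dims by name, e.g. 'b t' -> (2, 5)."""
    return tuple(_REGISTRY[n].size for n in names.split())


def check_shape(shape, annotation):
    """Raise if a runtime shape disagrees with a named annotation like 'b t d'."""
    expected = get_dims(annotation)
    if tuple(shape) != expected:
        raise ValueError(f"shape {tuple(shape)} does not match '{annotation}' {expected}")


B, T, D = declare_dims("b:2 t:5 d:8")  # batch, sequence length, hidden width
check_shape((2, 5, 8), "b t d")        # passes silently
print(get_dims("b t"))                 # (2, 5)
```

A check like `check_shape` is one way a mismatch between a documented shape and the actual runtime shape would surface immediately rather than several layers later.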

Benefits:

- Several cryptic shape-wrangling functions (reshape_from_matrix, reshape_to_matrix, transpose_for_scores) turn into convenient, lucid one-liners.
- The flow of shapes becomes far more apparent in the code, courtesy of both the shape annotations and the tsn arguments to warp.
- Dimension sizes no longer need to be copied around as function arguments: get_dim_vars retrieves them by name at any location.
- The annotation work uncovered inconsistencies between documented and runtime shapes, as well as duplicate definitions, in the original code.
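To make the first three benefits concrete, here is a hedged NumPy sketch. `transpose_for_scores` is written in the conventional style, modeled on the original BERT helper (with `np` standing in for `tf`). `warp_sketch` and the `DIMS` table are hypothetical stand-ins, not tsalib's `warp` or `get_dim_vars`; they show how a shorthand string such as `'bthd->bhtd'` can replace a reshape-then-transpose pair without threading sizes through every call.

```python
import numpy as np

# Hypothetical table of named dimension sizes (an assumption of this sketch):
# batch, sequence length, attention heads, per-head width.
DIMS = {"b": 2, "t": 5, "h": 4, "d": 3}


def transpose_for_scores(x, batch_size, num_heads, seq_length, width):
    """Conventional style: every size is passed in explicitly."""
    x = np.reshape(x, [batch_size, seq_length, num_heads, width])  # (b, t, h, d)
    return np.transpose(x, [0, 2, 1, 3])                           # (b, h, t, d)


def warp_sketch(x, pattern, dims=DIMS):
    """Hypothetical shorthand helper: for 'bthd->bhtd', reshape x to the named
    source axes (sizes looked up in `dims`), then permute to the target order."""
    src, dst = pattern.replace(" ", "").split("->")
    x = np.reshape(x, [dims[ax] for ax in src])
    return np.transpose(x, [src.index(ax) for ax in dst])


# A (b, t, h*d) tensor, as produced by a dense projection inside attention.
x = np.arange(2 * 5 * 4 * 3).reshape(2, 5, 4 * 3)

via_args = transpose_for_scores(x, 2, 4, 5, 3)
via_shorthand = warp_sketch(x, "bthd->bhtd")  # one line, no sizes passed around
print(via_args.shape, np.array_equal(via_args, via_shorthand))  # (2, 4, 5, 3) True
```

Keeping sizes in one named table is what removes the `batch_size, seq_length, ...` plumbing at the call site; the trade-off is a shared dependency that a real library has to manage more carefully than this global dict does.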

Code size is reduced throughout; for example, the attention_layer function shrinks from ~200 to ~175 lines.

The code can be simplified and cleaned up further.

The TSA-annotated PyTorch version of BERT is available in a separate repository here.