Files obtained from the original BERT (TensorFlow) repository, annotated with named shapes and compacted using tensor shorthand (tsn) operators.
Benefits:

- Several cryptic shape-wrangling functions (`reshape_from_matrix`, `reshape_to_matrix`, `transpose_for_scores`) turn into convenient, lucid one-liners.
- The flow of shapes becomes far more apparent in the code (courtesy of both shape annotations and `warp` tsn arguments).
- Avoids copying dimension sizes around as function arguments: `get_dim_vars` retrieves them by name at any location.
- Uncovered inconsistencies between documented and runtime shapes, as well as duplicate definitions, in the original code.
Code size is reduced throughout: the `attention_layer` function, for example, shrinks from ~200 lines to ~175.
The code can be simplified and cleaned up further.
The TSA-annotated PyTorch version of BERT is available in a separate repository here.