Fast Transformer v0.1.0 (Rishit-dagli/Fast-Transformer)

This is the initial release of Fast Transformer. It implements the Fast Transformer architecture as a subclassed TensorFlow model.

Classes

FastAttention: Implements additive attention as a TensorFlow Keras layer, and supports using relative positional encodings.
PreNorm: Applies layer normalization (normalizing the activations of the previous layer independently for each example in a batch) and then applies a wrapped function to the normalized result. Implemented as a TensorFlow Keras layer.
FeedForward: Creates a feed-forward network with two Dense layers and a GELU activation. Implemented as a TensorFlow Keras layer.
FastTransformer: Implements the Fast Transformer model by combining the other classes. Supports rotary embeddings and weight-tied projections, and converts the output to logits. Implemented as a TensorFlow Keras model.
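
To make the PreNorm and FeedForward descriptions above concrete, here is a minimal, framework-free sketch in plain Python. It is not the library's code: the function names (layer_norm, gelu, dense, feed_forward, pre_norm), the epsilon value, and the toy weights are all assumptions chosen for illustration, and the sketch omits the learned scale and offset that a Keras normalization layer would normally carry. The real classes operate on batched TensorFlow tensors.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one example's features to zero mean and (near) unit variance.

    eps is an assumed stabilizer; no learned scale/offset is applied here.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gelu(v):
    """Exact GELU: v * Phi(v), where Phi is the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def dense(x, weights, bias):
    """Affine map; `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def feed_forward(x, w1, b1, w2, b2):
    """Two dense layers with a GELU activation in between."""
    hidden = [gelu(h) for h in dense(x, w1, b1)]
    return dense(hidden, w2, b2)


def pre_norm(fn, x):
    """Normalize the input first, then apply the wrapped function."""
    return fn(layer_norm(x))


# Hypothetical toy weights: 2 inputs -> 3 hidden units -> 2 outputs.
w1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b1 = [0.0, 0.1, -0.1]
w2 = [[0.2, -0.5, 0.3], [0.7, 0.1, -0.6]]
b2 = [0.0, 0.0]

out = pre_norm(lambda v: feed_forward(v, w1, b1, w2, b2), [1.0, 3.0])
print(out)  # two output values for this single example
```

The point of wrapping the function in pre_norm is that normalization happens before the sublayer rather than after it, which is the pattern the PreNorm class name suggests.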