An open-source implementation of grouped-query attention (GQA) from the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"
Updated Dec 11, 2023 - Python
Adaptive Vision Transformer for efficient image classification, implementing dynamic token sparsification to reduce computational costs while maintaining accuracy.