Neighborhood attention (Hassani et al., 2022) is a sliding-window attention mechanism that achieves O(N) time and space complexity by localizing self-attention to each pixel's nearest neighbors. This is a reasonable approximation, especially in the lower layers of a vision model, where receptive fields are small. It is similar to Stand-Alone Self-Attention (Ramachandran et al., 2019), except for how edges are treated: the window is shifted to stay inside the image rather than shrunk, so every query attends to the same number of keys.
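For concreteness, here is a minimal 1D reference sketch of that windowing rule in PyTorch. This is not the repository's kernel; `neighborhood_attention_1d` and its shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def neighborhood_attention_1d(q, k, v, kernel_size):
    """Naive 1D reference: each query attends to exactly `kernel_size` keys.

    q, k, v: (batch, heads, length, head_dim). Near the edges the window is
    shifted to stay in bounds rather than shrunk, which is the edge-handling
    detail that separates neighborhood attention from SASA.
    """
    B, H, L, D = q.shape
    assert L >= kernel_size
    scale = D ** -0.5
    out = torch.empty_like(q)
    for i in range(L):
        # Clamp the window start so the window always holds kernel_size keys.
        start = min(max(i - kernel_size // 2, 0), L - kernel_size)
        k_win = k[:, :, start:start + kernel_size]  # (B, H, K, D)
        v_win = v[:, :, start:start + kernel_size]
        attn = torch.einsum("bhd,bhkd->bhk", q[:, :, i] * scale, k_win)
        out[:, :, i] = torch.einsum("bhk,bhkd->bhd", F.softmax(attn, dim=-1), v_win)
    return out
```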
While the mechanism achieves linear time complexity in theory, the official repository provides a tiled implementation that materializes the attention matrix, causing unnecessary data movement between HBM and SRAM on the GPU. This repository provides a fused implementation in the spirit of FlashAttention (Dao et al., 2022), which avoids materializing the attention matrix to save both time and memory.
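The core of the fused approach is the same online-softmax trick FlashAttention uses: stream over key blocks while maintaining a running max, denominator, and value accumulator, so the attention row never has to exist in full. Below is a toy single-query sketch of that idea; the function name and layout are illustrative, not this repo's API:

```python
import torch

def fused_window_attention(q, k_win, v_win, block=2):
    """Toy single-query sketch of the FlashAttention-style online softmax.

    q: (head_dim,); k_win, v_win: (kernel_size, head_dim). Key blocks are
    streamed through running accumulators, so the (kernel_size,) attention
    row is never stored in full. The real kernel also tiles queries and
    keeps these accumulators in SRAM.
    """
    scale = q.shape[-1] ** -0.5
    m = torch.tensor(float("-inf"))  # running max of the logits
    l = torch.tensor(0.0)            # running softmax denominator
    acc = torch.zeros_like(q)        # running weighted sum of values
    for s in range(0, k_win.shape[0], block):
        logits = (k_win[s:s + block] @ q) * scale  # (block,)
        m_new = torch.maximum(m, logits.max())
        correction = torch.exp(m - m_new)  # rescale old accumulators
        p = torch.exp(logits - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ v_win[s:s + block]
        m = m_new
    return acc / l
```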
Forward- and backward-pass comparison of the fused and original NATTEN implementations, run with batch size 4, 4 heads, head dimension 128, and kernel size 5. The parameter on the x-axis is the side length of the square image, so the number of tokens is the square of this value. All experiments were run on an A100 GPU.
| Forward Pass | Backward Pass |
|---|---|
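A benchmark of this shape can be reproduced with a small CUDA-event harness like the sketch below. This is an assumption about the setup, not the script used for the plots, and `fused_na2d` / `natten_na2d` are placeholder names for the two ops under test:

```python
import torch

def time_fn(fn, *args, iters=50):
    """Time forward + backward for one op with CUDA events (ms/iter)."""
    for _ in range(5):  # warm-up
        fn(*args).sum().backward()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args).sum().backward()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Setup matching the caption: batch 4, 4 heads, head_dim 128; the number
# of tokens is image_size ** 2.
B, H, D, image_size = 4, 4, 128, 32
q, k, v = (torch.randn(B, H, image_size, image_size, D,
                       device="cuda", requires_grad=True) for _ in range(3))
# Placeholder calls (pass kernel_size=5 however each op expects it):
# ms_fused = time_fn(fused_na2d, q, k, v)
# ms_orig  = time_fn(natten_na2d, q, k, v)
```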