- Rank of attention in GATv1 is unconditional on query node
- Because of this, some graph problems cannot be expressed by GAT
- Proof GAT doesn't compute a dynamic type of attention. It's only static.
- Static means for any query node, the attention is monotonic with respect to its neighbors.
- Static attention:
- Family of functions computing scoring for a set of key vectors and query vectors. For every f there's always a highest scoring key.
- This is limiting because regardless of the query, every function has a key that's always selected
- BUT different keys have different relevance to different queries in real problems. How to express this?
- Dynamic attention:
- Every key has different scoring based on the query node. Notice dynamic attention families will have strict subsets of static attention.
- There exists a j_max wo that a_2(Wh_j) is maximal for all j in V.
- In that case, due to the monotonicity of LeakyReLU, for every query node i, the node j_max is leading to the maximal value of the distribution.
- To avoid this, we can apply the "a" learned attention layer after applying the LeakyReLU non linearity.
- Certain problems, like the following, cannot be learned with static attention (proven in appendix).