How attentive are GATs (GATv2)


Key ideas

  • In GAT (v1), the ranking of attention scores over keys does not depend on the query node
  • Because of this, some graph problems cannot be expressed by GAT

Introduction

  • Proof that GAT does not compute a dynamic form of attention; it only computes static attention.
    • Static means that the attention scores rank the neighbors in the same fixed order for every query node (the ranking is monotonic in the keys and independent of the query).


Preliminaries

  • Generic GNN layer: $h_i^{(l)} = f\big(h_i^{(l-1)}, \mathrm{AGGREGATE}(\{h_j^{(l-1)} : j \in \mathcal{N}_i\})\big)$
  • GAT attention scoring: $e(h_i, h_j) = \mathrm{LeakyReLU}\big(a^\top [W h_i \,\Vert\, W h_j]\big)$
  • GAT attention function: $\alpha_{ij} = \mathrm{softmax}_j\big(e(h_i, h_j)\big) = \frac{\exp(e(h_i, h_j))}{\sum_{j' \in \mathcal{N}_i} \exp(e(h_i, h_{j'}))}$
  • GAT layer: $h_i' = \sigma\big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \cdot W h_j\big)$ (sketched in code below)
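
To make these equations concrete, here is a minimal single-head PyTorch sketch. This is my own illustration, not the paper's code: `GATLayer`, the dense $[N, N]$ attention, and the boolean adjacency mask with self-loops are all assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head GAT layer; dense [N, N] attention for clarity (a sketch)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)                  # shared transform W
        self.a = nn.Parameter(torch.randn(2 * d_out) / d_out ** 0.5) # attention vector a

    def forward(self, h, adj):
        # h: [N, d_in] node features; adj: [N, N] boolean mask with self-loops
        Wh = self.W(h)                                    # [N, d_out]
        a1, a2 = self.a[: Wh.size(1)], self.a[Wh.size(1):]
        # e(h_i, h_j) = LeakyReLU(a^T [Wh_i || Wh_j]) = LeakyReLU(a1^T Wh_i + a2^T Wh_j)
        e = F.leaky_relu((Wh @ a1).unsqueeze(1) + (Wh @ a2).unsqueeze(0), 0.2)
        e = e.masked_fill(~adj, float("-inf"))            # attend only to neighbors
        alpha = torch.softmax(e, dim=-1)                  # alpha_ij = softmax_j e(h_i, h_j)
        return F.elu(alpha @ Wh)                          # h_i' = sigma(sum_j alpha_ij W h_j)
```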

Static vs dynamic & limited expressivity of GAT

  • Static attention:
    • A family of functions scoring a set of key vectors against query vectors computes static attention if, for every function f in the family, there is a single highest-scoring key that wins for every query.
    • This is limiting: regardless of the query, each function always selects the same key.
    • BUT in real problems, different keys have different relevance to different queries. How can we express this?
  • Dynamic attention:
    • The highest-scoring key can differ per query: for any desired mapping from queries to keys, some function in the family gives each query's mapped key the top score. Dynamic attention is therefore strictly more expressive than static attention (see the sketch after this list).
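
A tiny numerical check of the static property (my own illustration, not from the paper): in GAT's scoring, the key-only term $a_2^\top W h_j$ alone determines the ranking, and LeakyReLU is strictly increasing, so the argmax key is identical for every query.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, d = 6, 4
Wh = torch.randn(N, d)                    # stand-in for the transformed features W h
a1, a2 = torch.randn(d), torch.randn(d)   # the split attention vector a = [a1 || a2]

# e[i, j] = LeakyReLU(a1^T Wh_i + a2^T Wh_j)
e = F.leaky_relu((Wh @ a1).unsqueeze(1) + (Wh @ a2).unsqueeze(0), 0.2)
top = e.argmax(dim=1)
print(top)                                # the same key index repeated N times
assert (top == top[0]).all()              # j_max is identical for every query
```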

Need for a new scoring function

Screenshot 2022-06-05 at 13 24 53

  • There exists a $j_{\max}$ such that $a_2^\top W h_{j_{\max}}$ is maximal over all $j \in V$.
  • In that case, because LeakyReLU is monotonic, node $j_{\max}$ attains the highest attention score for every query node $i$.
  • To avoid this, we can apply the learned attention vector $a$ after the LeakyReLU non-linearity rather than before it, which gives GATv2's scoring function (see the equation and sketch below).

$$e_{\mathrm{GATv2}}(h_i, h_j) = a^\top \, \mathrm{LeakyReLU}\big(W \, [h_i \,\Vert\, h_j]\big)$$
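
For contrast, a sketch of this scoring function in the same PyTorch style (again my own illustration; `GATv2Scoring` is a hypothetical name, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATv2Scoring(nn.Module):
    """GATv2 scoring: e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j])."""
    def __init__(self, d_in, d_out):
        super().__init__()
        # splitting W = [W1 || W2] over the concatenation gives identical scores
        # (W [h_i || h_j] = W1 h_i + W2 h_j) without materializing all N^2 pairs
        self.W1 = nn.Linear(d_in, d_out, bias=False)
        self.W2 = nn.Linear(d_in, d_out, bias=False)
        self.a = nn.Parameter(torch.randn(d_out) / d_out ** 0.5)

    def forward(self, h):
        # because a is applied AFTER the non-linearity, the score is a joint
        # function of query and key, so the top key can differ per query
        x = F.leaky_relu(self.W1(h).unsqueeze(1) + self.W2(h).unsqueeze(0), 0.2)
        return x @ self.a                 # [N, N] scores e(h_i, h_j)
```

With this scoring, the assert from the earlier snippet would generally fail: the argmax key now depends on the query.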

Evaluation

  • Certain problems, like the one below, cannot be learned with static attention (proven in the paper's appendix).

[Figure: the paper's DictionaryLookup problem, a bipartite graph where each query node must attend to a different key node]