Question about causal attention in decoder #31

Open
Porthoos opened this issue Mar 19, 2024 · 1 comment

@Porthoos

Hello, I'm interested in your work, and I have a question that isn't explained in the paper.
I noticed that the causal attention in the decoder uses a different structure from a normal transformer (sketched in the code below):

  1. MAT's causal attention uses the encoder output as the 'Query' and the decoder self-attention output as the 'Key' and 'Value', whereas a normal transformer's cross-attention uses the decoder self-attention output as the 'Query' and the encoder output as the 'Key' and 'Value'.
  2. MAT's residual connection after the causal attention adds the encoder output, whereas a normal transformer's residual connection adds the decoder self-attention output.

I have circled this in the figure. Is there a reason for changing the structure like this?
[screenshot: decoder attention block circled in the architecture figure]
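For concreteness, here is a minimal PyTorch-style sketch of the two wirings as I understand them (the module and variable names are placeholders of mine, not taken from the MAT code):

```python
import torch.nn as nn

class SecondDecoderAttention(nn.Module):
    """Sketch of the decoder's second attention block, in both wirings."""

    def __init__(self, n_embd, n_head, mat_style=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln = nn.LayerNorm(n_embd)
        self.mat_style = mat_style

    def forward(self, act_rep, obs_rep, attn_mask=None):
        # act_rep: output of the decoder's masked self-attention
        # obs_rep: output of the encoder
        if self.mat_style:
            # MAT-style: encoder output is the query, decoder output is
            # key/value, and the residual adds the encoder output.
            out, _ = self.attn(query=obs_rep, key=act_rep, value=act_rep,
                               attn_mask=attn_mask)
            return self.ln(obs_rep + out)
        # Standard transformer: decoder output is the query, encoder output
        # is key/value, and the residual adds the decoder output.
        out, _ = self.attn(query=act_rep, key=obs_rep, value=obs_rep,
                           attn_mask=attn_mask)
        return self.ln(act_rep + out)
```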

@morning9393
Collaborator

Hiya, thanks for your attention. Here we exchange the usage of obs_rep, which differs from the classical transformer. The intuitive reason is that, for the MARL problems considered here, the obs_rep from the encoder contains more information than the act_rep from the first attention block in the decoder (unlike in traditional NLP tasks such as translation, where the two carry roughly the same amount of information). That said, the second attention block in the decoder could in principle be removed, and the results should be theoretically the same with respect to the advantage decomposition.
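To illustrate one possible reading of that last remark (a hypothetical simplification, not the released implementation; all names are illustrative), the decoder block could keep only the masked self-attention and fold obs_rep in with a plain residual sum:

```python
import torch.nn as nn

class SimplifiedDecodeBlock(nn.Module):
    """Hypothetical decoder block with the second (cross) attention removed."""

    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, act_rep, obs_rep, causal_mask):
        # Masked self-attention over the (shifted) action sequence only.
        out, _ = self.attn(act_rep, act_rep, act_rep, attn_mask=causal_mask)
        x = self.ln1(act_rep + out)
        # obs_rep is folded in by a simple residual sum instead of a second
        # attention block; the per-agent action head then reads from x.
        x = self.ln2(obs_rep + x)
        return x + self.mlp(x)
```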
