A poll on the Embedding class #121

gitfourteen · 2024-03-22T09:11:49Z

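    import math
    import torch.nn as nn

    class Embeddings(nn.Module):
        def __init__(self, d_model, vocab):
            super(Embeddings, self).__init__()
            # Lookup table mapping token ids to d_model-dimensional vectors
            self.lut = nn.Embedding(vocab, d_model)
            self.d_model = d_model

        def forward(self, x):
            # Multiply the looked-up vectors by sqrt(d_model)
            return self.lut(x) * math.sqrt(self.d_model)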

Does anyone know why the lookup-table output is multiplied by the constant $\sqrt{d_{\text{model}}}$ before it is returned?


kuraga · 2024-09-14T18:49:47Z

> In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to (cite). In the embedding layers, we multiply those weights by $\sqrt{d_{\text{model}}}$.
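A minimal sketch of the weight sharing the quote describes, assuming toy sizes and plain Python lists instead of PyTorch tensors. The names `W`, `embed`, and `pre_softmax_logits` are hypothetical, and the Xavier-style initialization is an assumption. It only illustrates that one matrix serves both roles and that just the embedding lookup is multiplied by $\sqrt{d_{\text{model}}}$, while the pre-softmax projection uses the unscaled rows.

```python
import math
import random

# Hypothetical toy sizes, chosen only for illustration.
VOCAB, D_MODEL = 10, 16

random.seed(0)
# One shared weight matrix W with shape (VOCAB, D_MODEL).
# Assumption: Xavier-style uniform init; real models may initialize differently.
limit = math.sqrt(6.0 / (VOCAB + D_MODEL))
W = [[random.uniform(-limit, limit) for _ in range(D_MODEL)] for _ in range(VOCAB)]


def embed(token_id):
    """Embedding layer: look up row `token_id` of W and scale it by sqrt(d_model)."""
    scale = math.sqrt(D_MODEL)
    return [w * scale for w in W[token_id]]


def pre_softmax_logits(h):
    """Pre-softmax linear layer: reuse the same W (logits = W @ h), with no scaling."""
    return [sum(w_j * h_j for w_j, h_j in zip(row, h)) for row in W]


x = embed(3)                    # scaled copy of row 3 of W
logits = pre_softmax_logits(x)  # one score per vocabulary entry
print(len(x), len(logits))      # prints "16 10"
```

In PyTorch the same sharing is usually done by pointing two modules at one `nn.Parameter`, e.g. `proj.weight = emb.weight` for `emb = nn.Embedding(vocab, d_model)` and `proj = nn.Linear(d_model, vocab, bias=False)`, since both weights have shape `(vocab, d_model)`.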

gitfourteen · 2024-09-15T12:03:28Z

> In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to (cite). In the embedding layers, we multiply those weights by $\sqrt{d_{\text{model}}}$.

Thanks. Does the paper explain mathematically why the embeddings are multiplied by the constant $\sqrt{d_{\text{model}}}$?

kuraga · 2024-09-21T11:42:59Z

Not sure, but my DL lectures said it was "for numerical stability" :)
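The paper does not give a derivation. One commonly repeated explanation (an interpretation, not something the paper states) is about scale: the sinusoidal positional encodings added right after the embedding have a fixed norm of $\sqrt{d_{\text{model}}/2}$, because each sine/cosine pair contributes $\sin^2 + \cos^2 = 1$, while embedding rows shared with the output projection tend to be small. The sketch below compares norms under an assumed initialization with per-entry standard deviation $1/\sqrt{d_{\text{model}}}$; that initialization and the helper names `positional_encoding` and `norm` are hypothetical.

```python
import math
import random

D_MODEL = 512  # model width of the paper's base configuration


def positional_encoding(pos, d_model):
    """Sinusoidal encoding: sin on even indices, cos on odd indices (d_model must be even)."""
    pe = [0.0] * d_model
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe[i] = math.sin(angle)
        pe[i + 1] = math.cos(angle)
    return pe


def norm(v):
    """Euclidean (L2) norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


random.seed(0)
# Assumption: embedding entries ~ N(0, 1/d_model), so the row's expected squared norm is 1.
row = [random.gauss(0.0, 1.0 / math.sqrt(D_MODEL)) for _ in range(D_MODEL)]
scaled = [x * math.sqrt(D_MODEL) for x in row]

pe_norm = norm(positional_encoding(7, D_MODEL))
print(f"||PE||             = {pe_norm:.3f}")  # 16.000, i.e. sqrt(512 / 2)
print(f"||row||            = {norm(row):.3f}")
print(f"||row * sqrt(d)||  = {norm(scaled):.3f}")
```

Under this assumption the unscaled row has norm near 1, far below the positional encoding, while the scaled row lands on the same order (roughly $\sqrt{512} \approx 22.6$). With a different initialization, such as PyTorch's `nn.Embedding` default of $\mathcal{N}(0, 1)$, the comparison changes, so this is a heuristic rather than a proof.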
