
Commit

changed to proper Xavier initialization, existing implementation was … (#1927)

Summary:
…resulting in a large negative bias, which was killing all gradients through the following ReLU. https://paperswithcode.com/method/xavier-initialization

Pull Request resolved: #1927

Reviewed By: davidberard98

Differential Revision: D49754019

Pulled By: xuzhao9

fbshipit-source-id: 436676afed9bcc0f464cd1b25465444a98a52b5a
eknag authored and facebook-github-bot committed Sep 29, 2023
1 parent 3f11b81 commit 827f90b
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions torchbenchmark/models/dlrm/dlrm_s_pytorch.py
@@ -149,8 +149,7 @@ def create_mlp(self, ln, sigmoid_layer):
     mean = 0.0  # std_dev = np.sqrt(variance)
     std_dev = np.sqrt(2 / (m + n))  # np.sqrt(1 / m) # np.sqrt(1 / n)
     W = np.random.normal(mean, std_dev, size=(m, n)).astype(np.float32)
-    std_dev = np.sqrt(1 / m)  # np.sqrt(2 / (m + 1))
-    bt = np.random.normal(mean, std_dev, size=m).astype(np.float32)
+    bt = np.zeros(m).astype(np.float32)  # see upstream PR at https://github.com/facebookresearch/dlrm/pull/358
     # approach 1
     LL.weight.data = torch.tensor(W, requires_grad=True)
     LL.bias.data = torch.tensor(bt, requires_grad=True)
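
For context, a minimal sketch of the same fix applied to a standalone torch.nn.Linear layer. The helper name and the layer sizes below are hypothetical, not code from this repository; it only mirrors the pattern in the diff: weights drawn with Xavier (Glorot) scaling, std = sqrt(2 / (fan_in + fan_out)), and a zero bias so the following ReLU is not pushed into its dead region by a large negative offset.

    import numpy as np
    import torch

    def init_linear_xavier(layer: torch.nn.Linear) -> None:
        # Hypothetical helper for illustration; mirrors the initialization
        # in the diff: W ~ N(0, sqrt(2 / (fan_in + fan_out))), bias = 0.
        n, m = layer.in_features, layer.out_features  # fan_in, fan_out
        std_dev = np.sqrt(2 / (m + n))
        # nn.Linear stores weight as (out_features, in_features) == (m, n)
        W = np.random.normal(0.0, std_dev, size=(m, n)).astype(np.float32)
        bt = np.zeros(m, dtype=np.float32)  # zero bias keeps the following ReLU active
        layer.weight.data = torch.tensor(W, requires_grad=True)
        layer.bias.data = torch.tensor(bt, requires_grad=True)

    # usage (sizes are illustrative)
    layer = torch.nn.Linear(128, 64)
    init_linear_xavier(layer)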
