
The output of BitLinear is quite abnormal #35

Closed
Jiangxg opened this issue Mar 5, 2024 · 6 comments


Jiangxg commented Mar 5, 2024

Describe the bug
I printed the mean and variance of the tensor y in example.py.
They are abnormal, as follows:

mean and var of BitLinear output:
-0.567935049533844
1149.9969482421875

To double-check, I printed the mean and variance of the outputs from Linear and BitLinear simultaneously.

mean and var of Linear output:
0.012186492793262005
0.33256232738494873
mean and var of BitLinear output:
0.9070871472358704
992.69384765625

I believe there are mistakes in the implementation of BitLinear in bitnet/bitlinear.py.

To Reproduce
Steps to reproduce the behavior:

  1. Print the mean and variance of y in example.py.
  2. Insert output_linear = torch.nn.functional.linear(x, self.weight, self.bias) at bitnet/bitlinear.py line 129, then print the mean and variance of output_linear (a comparison sketch is shown below).
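
For reference, here is a minimal sketch of the comparison, assuming BitLinear is imported as in example.py (from bitnet import BitLinear) and takes (in_features, out_features) like torch.nn.Linear; the input shape is illustrative:

```python
import torch
from torch import nn
from bitnet import BitLinear  # as used in example.py

torch.manual_seed(0)
x = torch.randn(10, 512)  # illustrative input; adjust to match example.py

bit_linear = BitLinear(512, 512)
linear = nn.Linear(512, 512)

y_bit = bit_linear(x)
y_fp = linear(x)

print("mean and var of Linear output:")
print(y_fp.mean().item(), y_fp.var().item())
print("mean and var of BitLinear output:")
print(y_bit.mean().item(), y_bit.var().item())
```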

Jiangxg added the bug Something isn't working label Mar 5, 2024
Jiangxg changed the title from "[BUG] The output of BitLinear is quite abnormal" to "The output of BitLinear is quite abnormal" Mar 5, 2024

suzuke commented Mar 16, 2024

The implementation of this BitLinear is completely wrong: not only does it not follow the process outlined in the BitNet paper, it also misunderstands all of the computational principles. I don't understand why it still receives so many stars.


suzuke commented Mar 16, 2024

Gamma, beta, and alpha are calculated from the weights and input before quantization. These parameters are then used for weight binarization and input quantization. The binarized weights and quantized input go through a linear operation to produce the output, which is then dequantized using the previously calculated gamma and beta. It is not meaningful to calculate gamma and beta separately for the quantization and dequantization stages, and the grouping implementation here is entirely nonsensical.
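
For reference, a minimal sketch of the forward pass described in the BitNet paper (absmax activation quantization, sign binarization of zero-centered weights, and a single dequantization by beta * gamma / Qb at the end). Variable names are illustrative, not the repository's API, and the paper's SubLN normalization of x before quantization is omitted:

```python
import torch
import torch.nn.functional as F

def bitlinear_forward(x, weight, bits=8, eps=1e-5):
    Qb = 2 ** (bits - 1)

    # weight binarization (BitNet paper): center, take sign, keep the scale beta
    alpha = weight.mean()
    beta = weight.abs().mean()          # beta = ||W||_1 / (n * m)
    w_bin = torch.sign(weight - alpha)  # roughly {-1, +1}; exact zeros map to 0 here

    # absmax activation quantization to b bits, keep the scale gamma
    gamma = x.abs().max()               # gamma = ||x||_inf
    x_q = torch.clamp(x * Qb / (gamma + eps), -Qb + eps, Qb - eps)

    # low-precision linear, then a single dequantization with beta and gamma
    y = F.linear(x_q, w_bin)
    return y * beta * gamma / Qb
```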


2020zyc commented Mar 16, 2024

> Gamma, beta, and alpha are calculated from the weights and input before quantization. These parameters are then used for weight binarization and input quantization. The binarized weights and quantized input go through a linear operation to produce the output, which is then dequantized using the previously calculated gamma and beta. It is not meaningful to calculate gamma and beta separately for the quantization and dequantization stages, and the grouping implementation here is entirely nonsensical.

Hi, I don't understand what you are saying. Could you explain more?
The code just calculates gamma/beta dynamically in the quantization stage, then uses those two statistics to dequantize the activation.
There is no extra calculation of gamma/beta in the dequantization stage.
You can of course move that calculation out of the quantization stage, but you still need to compute gamma/beta dynamically.


2020zyc commented Mar 16, 2024

> Gamma, beta, and alpha are calculated from the weights and input before quantization. These parameters are then used for weight binarization and input quantization. The binarized weights and quantized input go through a linear operation to produce the output, which is then dequantized using the previously calculated gamma and beta. It is not meaningful to calculate gamma and beta separately for the quantization and dequantization stages, and the grouping implementation here is entirely nonsensical.

Another implementation is BIT-Transformers. I don't know how its BitLinear works, especially the forward function: there is no obvious beta/gamma and no dequantization of the output. Can you make sense of this code? Thanks.

[screenshot of the BIT-Transformers BitLinear forward function]
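
Not having seen that forward function in detail, one common pattern that would explain the missing beta/gamma and the missing output dequantization is to fold the scales back into the quantized tensors and use a straight-through estimator, so forward() only ever sees already-rescaled values. A hedged sketch of that pattern (not necessarily what BIT-Transformers does; function names are illustrative):

```python
import torch
import torch.nn.functional as F

def weight_quant(w):
    # binarize around the mean, then fold the scale beta straight back in
    beta = w.abs().mean()
    return torch.sign(w - w.mean()) * beta

def activation_quant(x, bits=8):
    Qb = 2 ** (bits - 1)
    gamma = x.abs().max().clamp(min=1e-5)
    # quantize to integer levels and immediately rescale, so no later dequant is needed
    return (x * Qb / gamma).round().clamp(-Qb, Qb - 1) * gamma / Qb

def bitlinear_forward(x, w):
    # straight-through estimator: forward uses the quantized values,
    # backward sees the identity, and the output is already at full-precision scale
    x_q = x + (activation_quant(x) - x).detach()
    w_q = w + (weight_quant(w) - w).detach()
    return F.linear(x_q, w_q)
```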


suzuke commented Mar 16, 2024

The issues I mentioned have been addressed in commit 6cdb2ea.


Jiangxg commented Mar 18, 2024

> The issues I mentioned have been addressed in commit 6cdb2ea.

Yes, most of the problems have been addressed. There is still a bug in the grouping implementation; I am working on that.
