BLS12: Use mul/add rather than pippengers for low pair count G1/G2 MSM #226

Merged

@garyschulte garyschulte commented Dec 13, 2024

For the BLS12-381 G1 and G2 MSM precompiles, use a MUL/ADD loop for low pair counts: the overhead of gnark-crypto's Pippenger implementation makes it slower than a straight MUL/ADD loop until the pair count exceeds 2.
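As a rough illustration of the dispatch described above, here is a minimal sketch on a toy group (integers modulo a prime under addition) rather than real curve points. Everything in it is an assumption made for illustration: the function names, the 4-bit window, the 32-bit scalar size, and the `MUL_ADD_THRESHOLD` constant are hypothetical and are not taken from this PR or from gnark-crypto. It only shows that a straight MUL/ADD loop and a bucket (Pippenger-style) method compute the same sum, and that the bucket method does a fixed amount of per-window work no matter how few pairs it is given.

```python
MODULUS = 1_000_000_007   # toy group: integers mod a prime, under addition
MUL_ADD_THRESHOLD = 2     # hypothetical cutoff, mirroring the "exceeds 2" claim
WINDOW_BITS = 4           # hypothetical window size for the bucket method
SCALAR_BITS = 32          # toy scalars are assumed to fit in 32 bits


def add(a, b):
    """Group operation of the toy group."""
    return (a + b) % MODULUS


def mul(point, scalar):
    """Scalar multiplication by double-and-add, using only the group operation."""
    acc = 0
    while scalar > 0:
        if scalar & 1:
            acc = add(acc, point)
        point = add(point, point)
        scalar >>= 1
    return acc


def mul_add_msm(pairs):
    """Straight loop: one scalar multiplication and one addition per pair."""
    total = 0
    for point, scalar in pairs:
        total = add(total, mul(point, scalar))
    return total


def bucket_msm(pairs):
    """Bucket method: process the scalars one window at a time, high bits first."""
    result = 0
    for shift in range(SCALAR_BITS - WINDOW_BITS, -1, -WINDOW_BITS):
        # Shift the accumulated result left by one window.
        for _ in range(WINDOW_BITS):
            result = add(result, result)
        # Sort points into buckets by their digit in this window.
        buckets = [0] * (1 << WINDOW_BITS)
        for point, scalar in pairs:
            digit = (scalar >> shift) & ((1 << WINDOW_BITS) - 1)
            buckets[digit] = add(buckets[digit], point)
        # Running-sum trick: computes sum(digit * buckets[digit]) with additions only.
        running = window_sum = 0
        for digit in range(len(buckets) - 1, 0, -1):
            running = add(running, buckets[digit])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


def msm(pairs):
    """Use the straight loop for small inputs, the bucket method otherwise."""
    if len(pairs) <= MUL_ADD_THRESHOLD:
        return mul_add_msm(pairs)
    return bucket_msm(pairs)


pairs = [(3, 5), (7, 11), (13, 17)]
print(mul_add_msm(pairs), bucket_msm(pairs), msm(pairs))
```

In this toy version the bucket method always walks 8 windows of 16 buckets, even for a single pair, which is the kind of fixed cost that a threshold-based dispatch avoids for small inputs.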

Added unit tests and data sources for determining the 'sweet spot' on various architectures (mainly Apple Silicon and x86_64).

Testing on Intel shows MUL/ADD is faster than Pippenger up to 2 pairs with uncapped parallelism, and up to about 6 pairs with 1 task.

Intel i7-10700F:

G1 MSM ret 0 	pair count: 1 	mulAdd: 149 ms 	1task: 344 ms 	uncapped:218 ms
G1 MSM ret 0 	pair count: 2 	mulAdd: 247 ms 	1task: 463 ms 	uncapped:264 ms
G1 MSM ret 0 	pair count: 3 	mulAdd: 376 ms 	1task: 537 ms 	uncapped:316 ms
G1 MSM ret 0 	pair count: 4 	mulAdd: 508 ms 	1task: 611 ms 	uncapped:370 ms
G1 MSM ret 0 	pair count: 5 	mulAdd: 651 ms 	1task: 695 ms 	uncapped:423 ms
G1 MSM ret 0 	pair count: 6 	mulAdd: 777 ms 	1task: 771 ms 	uncapped:472 ms
G1 MSM ret 0 	pair count: 7 	mulAdd: 906 ms 	1task: 846 ms 	uncapped:521 ms
G1 MSM ret 0 	pair count: 8 	mulAdd: 1036 ms 	1task: 917 ms 	uncapped:571 ms
G1 MSM ret 0 	pair count: 9 	mulAdd: 1159 ms 	1task: 985 ms 	uncapped:624 ms
G1 MSM ret 0 	pair count: 10 	mulAdd: 1293 ms 	1task: 1065 ms 	uncapped:674 ms
G1 MSM ret 0 	pair count: 11 	mulAdd: 1433 ms 	1task: 1131 ms 	uncapped:743 ms
G1 MSM ret 0 	pair count: 12 	mulAdd: 1587 ms 	1task: 1194 ms 	uncapped:773 ms
G1 MSM ret 0 	pair count: 13 	mulAdd: 1692 ms 	1task: 1259 ms 	uncapped:831 ms
G1 MSM ret 0 	pair count: 14 	mulAdd: 1840 ms 	1task: 1330 ms 	uncapped:877 ms
G1 MSM ret 0 	pair count: 15 	mulAdd: 1933 ms 	1task: 1394 ms 	uncapped:931 ms
G1 MSM ret 0 	pair count: 16 	mulAdd: 2077 ms 	1task: 1469 ms 	uncapped:988 ms

G2 MSM ret 0 	pair count: 1 	mulAdd: 263 ms 	1task: 619 ms 	uncapped:436 ms
G2 MSM ret 0 	pair count: 2 	mulAdd: 479 ms 	1task: 852 ms 	uncapped:495 ms
G2 MSM ret 0 	pair count: 3 	mulAdd: 752 ms 	1task: 1026 ms 	uncapped:550 ms
G2 MSM ret 0 	pair count: 4 	mulAdd: 929 ms 	1task: 1153 ms 	uncapped:613 ms
G2 MSM ret 0 	pair count: 5 	mulAdd: 1168 ms 	1task: 1310 ms 	uncapped:750 ms
G2 MSM ret 0 	pair count: 6 	mulAdd: 1477 ms 	1task: 1507 ms 	uncapped:807 ms
G2 MSM ret 0 	pair count: 7 	mulAdd: 1629 ms 	1task: 1556 ms 	uncapped:801 ms
G2 MSM ret 0 	pair count: 8 	mulAdd: 1854 ms 	1task: 1680 ms 	uncapped:869 ms
G2 MSM ret 0 	pair count: 9 	mulAdd: 2066 ms 	1task: 1802 ms 	uncapped:940 ms
G2 MSM ret 0 	pair count: 10 	mulAdd: 2331 ms 	1task: 1938 ms 	uncapped:1013 ms
G2 MSM ret 0 	pair count: 11 	mulAdd: 2554 ms 	1task: 2045 ms 	uncapped:1082 ms
G2 MSM ret 0 	pair count: 12 	mulAdd: 2788 ms 	1task: 2141 ms 	uncapped:1155 ms
G2 MSM ret 0 	pair count: 13 	mulAdd: 3031 ms 	1task: 2286 ms 	uncapped:1283 ms
G2 MSM ret 0 	pair count: 14 	mulAdd: 3350 ms 	1task: 2406 ms 	uncapped:1303 ms
G2 MSM ret 0 	pair count: 15 	mulAdd: 3487 ms 	1task: 2465 ms 	uncapped:1381 ms
G2 MSM ret 0 	pair count: 16 	mulAdd: 3713 ms 	1task: 2645 ms 	uncapped:1425 ms
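The crossover points can be read off the output mechanically. The following is a small sketch, not part of the PR: the regular expression, the function names, and the column constants are assumptions made here, and the embedded rows are the first eight Intel G1 lines copied from the output above.

```python
import re

# First eight Intel G1 rows, copied from the benchmark output above.
LOG = """\
G1 MSM ret 0 pair count: 1 mulAdd: 149 ms 1task: 344 ms uncapped:218 ms
G1 MSM ret 0 pair count: 2 mulAdd: 247 ms 1task: 463 ms uncapped:264 ms
G1 MSM ret 0 pair count: 3 mulAdd: 376 ms 1task: 537 ms uncapped:316 ms
G1 MSM ret 0 pair count: 4 mulAdd: 508 ms 1task: 611 ms uncapped:370 ms
G1 MSM ret 0 pair count: 5 mulAdd: 651 ms 1task: 695 ms uncapped:423 ms
G1 MSM ret 0 pair count: 6 mulAdd: 777 ms 1task: 771 ms uncapped:472 ms
G1 MSM ret 0 pair count: 7 mulAdd: 906 ms 1task: 846 ms uncapped:521 ms
G1 MSM ret 0 pair count: 8 mulAdd: 1036 ms 1task: 917 ms uncapped:571 ms
"""

# Tuple layout produced by parse(): (pair_count, mul_add, one_task, uncapped)
MUL_ADD, ONE_TASK, UNCAPPED = 1, 2, 3

ROW = re.compile(
    r"pair count:\s*(\d+)\s+mulAdd:\s*(\d+) ms\s+1task:\s*(\d+) ms\s+uncapped:\s*(\d+) ms"
)


def parse(log):
    """Extract (pair_count, mul_add, one_task, uncapped) tuples from log text."""
    rows = []
    for line in log.splitlines():
        match = ROW.search(line)
        if match:
            rows.append(tuple(int(group) for group in match.groups()))
    return rows


def mul_add_wins_through(rows, column):
    """Largest n such that mulAdd is strictly faster at every pair count up to n."""
    best = 0
    for row in sorted(rows):
        if row[MUL_ADD] < row[column]:
            best = row[0]
        else:
            break
    return best


rows = parse(LOG)
print("vs uncapped:", mul_add_wins_through(rows, UNCAPPED))
print("vs 1 task:  ", mul_add_wins_through(rows, ONE_TASK))
```

On these rows the sketch reports 2 against the uncapped run and 5 against the single-task run. The 6-pair G1 row is close to a tie (777 ms vs 771 ms), and in the G2 output MUL/ADD is still ahead at 6 pairs (1477 ms vs 1507 ms) and behind at 7 (1629 ms vs 1556 ms).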

Testing on Mac shows MUL/ADD is faster up to 2 pairs with uncapped parallelism and up to 7 pairs with 1 task.

Mac M2 Max:

G1 MSM ret 0 	pair count: 1 	mulAdd: 286 ms 	1task: 415 ms 	uncapped:234 ms
G1 MSM ret 0 	pair count: 2 	mulAdd: 246 ms 	1task: 543 ms 	uncapped:293 ms
G1 MSM ret 0 	pair count: 3 	mulAdd: 396 ms 	1task: 704 ms 	uncapped:363 ms
G1 MSM ret 0 	pair count: 4 	mulAdd: 525 ms 	1task: 777 ms 	uncapped:414 ms
G1 MSM ret 0 	pair count: 5 	mulAdd: 693 ms 	1task: 855 ms 	uncapped:471 ms
G1 MSM ret 0 	pair count: 6 	mulAdd: 799 ms 	1task: 988 ms 	uncapped:528 ms
G1 MSM ret 0 	pair count: 7 	mulAdd: 941 ms 	1task: 1036 ms 	uncapped:570 ms
G1 MSM ret 0 	pair count: 8 	mulAdd: 1150 ms 	1task: 1108 ms 	uncapped:632 ms
G1 MSM ret 0 	pair count: 9 	mulAdd: 1273 ms 	1task: 1190 ms 	uncapped:730 ms
G1 MSM ret 0 	pair count: 10 	mulAdd: 1351 ms 	1task: 1284 ms 	uncapped:818 ms
G1 MSM ret 0 	pair count: 11 	mulAdd: 1488 ms 	1task: 1383 ms 	uncapped:778 ms
G1 MSM ret 0 	pair count: 12 	mulAdd: 1625 ms 	1task: 1426 ms 	uncapped:845 ms
G1 MSM ret 0 	pair count: 13 	mulAdd: 1778 ms 	1task: 1509 ms 	uncapped:926 ms
G1 MSM ret 0 	pair count: 14 	mulAdd: 1896 ms 	1task: 1588 ms 	uncapped:939 ms
G1 MSM ret 0 	pair count: 15 	mulAdd: 2022 ms 	1task: 1647 ms 	uncapped:990 ms
G1 MSM ret 0 	pair count: 16 	mulAdd: 2170 ms 	1task: 1742 ms 	uncapped:1042 ms


G2 MSM ret 0 	pair count: 1 	mulAdd: 470 ms 	1task: 868 ms 	uncapped:523 ms
G2 MSM ret 0 	pair count: 2 	mulAdd: 650 ms 	1task: 1244 ms 	uncapped:650 ms
G2 MSM ret 0 	pair count: 3 	mulAdd: 972 ms 	1task: 1534 ms 	uncapped:759 ms
G2 MSM ret 0 	pair count: 4 	mulAdd: 1370 ms 	1task: 1783 ms 	uncapped:916 ms
G2 MSM ret 0 	pair count: 5 	mulAdd: 1759 ms 	1task: 2031 ms 	uncapped:963 ms
G2 MSM ret 0 	pair count: 6 	mulAdd: 2095 ms 	1task: 2261 ms 	uncapped:1048 ms
G2 MSM ret 0 	pair count: 7 	mulAdd: 2424 ms 	1task: 2450 ms 	uncapped:1137 ms
G2 MSM ret 0 	pair count: 8 	mulAdd: 2835 ms 	1task: 2647 ms 	uncapped:1245 ms
G2 MSM ret 0 	pair count: 9 	mulAdd: 3126 ms 	1task: 2826 ms 	uncapped:1383 ms
G2 MSM ret 0 	pair count: 10 	mulAdd: 3449 ms 	1task: 3070 ms 	uncapped:1394 ms
G2 MSM ret 0 	pair count: 11 	mulAdd: 3883 ms 	1task: 3276 ms 	uncapped:1551 ms
G2 MSM ret 0 	pair count: 12 	mulAdd: 4045 ms 	1task: 3348 ms 	uncapped:1516 ms
G2 MSM ret 0 	pair count: 13 	mulAdd: 4374 ms 	1task: 3531 ms 	uncapped:1615 ms
G2 MSM ret 0 	pair count: 14 	mulAdd: 4698 ms 	1task: 3796 ms 	uncapped:1686 ms
G2 MSM ret 0 	pair count: 15 	mulAdd: 5021 ms 	1task: 3830 ms 	uncapped:1747 ms
G2 MSM ret 0 	pair count: 16 	mulAdd: 5419 ms 	1task: 4041 ms 	uncapped:1832 ms
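One way to make the "overhead" argument concrete is to fit a straight line, time ≈ fixed + per_pair × pairs, to each column. The sketch below does this for the Mac M2 Max G2 rows above using ordinary least squares. It is an illustration only: the linear model and the function name `fit_line` are assumptions made here, and the fitted intercept is a crude proxy for fixed overhead, not a measurement of it.

```python
# Mac M2 Max G2 timings in ms for pair counts 1..16, copied from the output above.
MUL_ADD = [470, 650, 972, 1370, 1759, 2095, 2424, 2835,
           3126, 3449, 3883, 4045, 4374, 4698, 5021, 5419]
UNCAPPED = [523, 650, 759, 916, 963, 1048, 1137, 1245,
            1383, 1394, 1551, 1516, 1615, 1686, 1747, 1832]


def fit_line(ys):
    """Least-squares fit of ys against pair counts 1..len(ys).

    Returns (intercept, slope), read here as (fixed cost, cost per pair).
    """
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


fixed_ma, per_pair_ma = fit_line(MUL_ADD)
fixed_pip, per_pair_pip = fit_line(UNCAPPED)

# Pair count at which the two fitted lines intersect.
crossover = (fixed_pip - fixed_ma) / (per_pair_ma - per_pair_pip)

print(f"mulAdd:   fixed {fixed_ma:.0f} ms, per pair {per_pair_ma:.0f} ms")
print(f"uncapped: fixed {fixed_pip:.0f} ms, per pair {per_pair_pip:.0f} ms")
print(f"fitted lines cross near {crossover:.1f} pairs")
```

Under this model the MUL/ADD loop has a much smaller fixed cost but a steeper per-pair cost than the uncapped Pippenger run, and the two fitted lines cross a little below 2 pairs, which is in line with the tie at 2 pairs (650 ms each) in the G2 output.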

Signed-off-by: garyschulte <garyschulte@gmail.com>
@garyschulte garyschulte merged commit 8bb655d into hyperledger:main Dec 17, 2024
12 checks passed
@garyschulte garyschulte deleted the feature/non-pip-for-low-pair-count branch December 17, 2024 01:56
2 participants