Reorder load and scaling code to allow latency hiding for block-wise scaled GEMMs #3728
Job | Run time |
---|---|
4s | |
18m 51s | |
1h 12m 24s | |
1h 26m 52s | |
1h 21m 23s | |
1h 37m 6s | |
1h 32m 6s | |
30s | |
44s | |
41s | |
37s | |
37s | |
29s | |
7h 32m 24s |