Add microbenchmark for layer normalization and improve latency #32394
windows.yml
on: pull_request
Windows-CUDA-12: 44m 41s
Vcpkg: 1m 38s