add mkl_fft and llvmlite to software stack? #71

Open
moustakas opened this issue Aug 6, 2024 · 2 comments

Comments

@moustakas (Member)

As part of a major performance-focused restructuring of https://github.com/desihub/fastspecfit that I'm doing with @jdbuhler, we would like to request that two additional packages, mkl_fft and llvmlite, be added to the DESI conda stack.

However, these packages are not compatible with the currently pinned version of mkl=2020.0. There's a comment in https://github.com/desihub/desiconda/blob/main/conf/conda-pkgs.sh that says:

mkl=2020.0 because that is the last version that guarantees bitwise identical output for bitwise identical input

Is there a report, thread, or ticket which discusses this issue?

The latest version of mkl is 2023.1.0. Can we test the effect of upgrading this package?
https://anaconda.org/anaconda/mkl/files

@jdbuhler commented Aug 6, 2024 via email

@sbailey (Contributor) commented Aug 12, 2024

Context on pinning mkl=2020.0: when porting to Cori, we found that numpy/scipy.linalg.eigh gave non-reproducible answers even when run back-to-back on the same input, in the same Python script, on the same machine. @lastephey traced this to an MKL issue and put together a reproducer at https://github.com/lastephey/eigh-mkl-bug. She reported it to Intel via our internal contacts on the NESAP team, and they filed internal Intel tickets about it, concluding that

Thank you for the detailed reproducer! I was able to reproduce the behavior, but this is not a bug. Intel MKL does not guarantee bit-wise identical results by default, as there may be a performance impact to do so. However, Intel MKL does offer a conditional numerical reproducibility feature that will provide reproducible results, subject to some limitations. For instance, there are some codepaths in Intel MKL that rely on aligned data, but if CNR mode is enabled then those codepaths are not taken regardless of data alignment.

If I enable CNR mode via
export MKL_CBWR=AUTO
then the test passes.

You can read more about reproducibility:
https://software.intel.com/content/www/us/en/develop/articles/introduction-to-the-conditional-numerical-reproducibility-cnr.html
https://software.intel.com/content/www/us/en/develop/documentation/mkl-linux-developer-guide/top/obtaining-numerically-reproducible-results/getting-started-with-conditional-numerical-reproducibility.html

If run-to-run bit-wise reproducibility is needed for Python, then perhaps CNR mode should be enabled by default when using Intel MKL.

Their argument is basically that reproducibility at the machine round-off level is meaningless and we shouldn't worry about it, favoring performance gains instead. In practice, that makes testing a huge pain because the output always changes and you have to check in detail every time whether the change was bad or not.
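To make the failure mode concrete, here is a minimal sketch in the spirit of the linked reproducer (not the reproducer itself; the matrix size and seed are arbitrary choices for illustration). With an MKL-backed numpy and CNR mode disabled, the bitwise comparison below can fail even though both calls are identical; with CNR enabled, or with a non-MKL BLAS, it passes.

```python
import numpy as np

# Build a symmetric matrix so np.linalg.eigh applies.
# Size and seed are arbitrary (illustrative only).
rng = np.random.default_rng(42)
n = 200
a = rng.standard_normal((n, n))
a = a + a.T

# Two identical calls: same input, same process, same machine.
w1, v1 = np.linalg.eigh(a)
w2, v2 = np.linalg.eigh(a)

# With MKL and no CNR mode, this can be False at the round-off level.
identical = np.array_equal(w1, w2) and np.array_equal(v1, v2)
print("bitwise identical:", identical)
```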

While they were sorting out the export MKL_CBWR=AUTO option, we found that mkl=2020.0 was the last version that did not have this problem, so we pinned that version and haven't seriously explored the MKL_CBWR=AUTO option since (its performance impact, whether it really fixes the problem in all cases, etc.). At some point this is going to bite us, since our other required packages will eventually need a newer MKL, but we're not there yet.
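For anyone who does want to experiment with the CNR option: MKL reads MKL_CBWR when the library is first initialized, so the variable must be in the environment before MKL loads, either exported in the shell before launching Python or set at the very top of the script, before importing numpy/scipy. A minimal sketch:

```python
import os

# Must run before numpy/scipy (and hence MKL) is imported;
# setting it afterwards has no effect on an already-loaded MKL.
os.environ["MKL_CBWR"] = "AUTO"

import numpy as np  # if this numpy is MKL-backed, CNR mode is now in effect
```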

Regarding mkl_fft, in general I'm hesitant to bring in more required dependencies, and we're already in a bit of dependency hell with the desiconda packages, so there has to be a pretty strong case for the improvement offered by mkl_fft. I also view MKL-specific dependencies as risky since the future isn't guaranteed to be Intel/MKL-compatible, e.g. I wouldn't be surprised if some future machine used something like https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/ and wasn't 100% MKL compatible and we use some MKL-alternative instead. Ideally that would be transparent at the numpy/scipy level, but if we find that we need to install MKL-specific dependencies and our code doesn't work if they aren't installed (vs. just running faster if they are installed), that's a warning flag for future maintainability.
