-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use size_t
in mr2d_malloc
instead of index-bound Int
#85
Conversation
In C, the type of the number of bytes passed to `malloc` *is* of type `size_t`. The current code ties that number of bytes to the general "Index type" (`Int`) which is defined at compile time and does fail on 32-bits systems setting `Int` as 64-bits. Additionally, this artificially limits 64-bit-element aux matrices to 2GB instead of 16GB with 32-bits signed indices.
At least at home, all tests besides the two listed in issue #69 pass ( |
My |
@julielangou Can I kindly ask for you opinion on this as a maintainer? The gist of it is that we have this signature This PR operates this transformation: The main advantage of this on 64-bit system is allowing the use of the full signed 32-bit indexing range instead of |
The gist of it is that we have this signature `mr2d_malloc(Int n)`, bounding the internal `malloc` to `Int`, which is Scalapack's indexing type. This is not how `malloc` is defined and means that maximum AUX matrix size is artifically limited on 64-bit systems to 2GB. This PR operates this transformation: `mr2d_malloc(Int n) -> mr2d_malloc(size_t n)`, yet ensures `Int -> size_t` conversion does not involve Int-negative values and ensures that no 64-bit values are passed to `malloc` on 32-bit systems. The main advantage of this on 64-bit system allows the use of the **full** _signed_ 32-bit indexing range instead of `range / element size`. E.g. the max AUX matrix size is now 16GB instead of 2GB previously. With this, "standard" 32-bit Scalapack and Blas/Lapack can still be used in programs like GPAW. Details (and full commit history) are in Reference-ScaLAPACK/scalapack#85 which does not seem to receive much attention. This patch is a way to provide that feature to conda users in the meantime.
Hello, @gdonval, @julielangou and @langou. I am asking you about status of this MR? Do one need to make fixes to it's code or can it be merged as is? Pgemr2d is important function for some applications, which rely of scalapack SVD decomposition and some other functions. Current implementation does not allow to solve problems with large input sizes with optimal MPI configurations. |
If tests still pass, this should be fine. |
The gist of it is that we have this signature `mr2d_malloc(Int n)`, bounding the internal `malloc` to `Int`, which is Scalapack's indexing type. This is not how `malloc` is defined and means that maximum AUX matrix size is artifically limited on 64-bit systems to 2GB. This PR operates this transformation: `mr2d_malloc(Int n) -> mr2d_malloc(size_t n)`, yet ensures `Int -> size_t` conversion does not involve Int-negative values and ensures that no 64-bit values are passed to `malloc` on 32-bit systems. The main advantage of this on 64-bit system allows the use of the **full** _signed_ 32-bit indexing range instead of `range / element size`. E.g. the max AUX matrix size is now 16GB instead of 2GB previously. With this, "standard" 32-bit Scalapack and Blas/Lapack can still be used in programs like GPAW. Details (and full commit history) are in Reference-ScaLAPACK/scalapack#85 which does not seem to receive much attention. This patch is a way to provide that feature to conda users in the meantime.
In C, the type of the number of bytes passed to
malloc
is of typesize_t
.The current code ties that number of bytes to the general "Index type" (
Int
) which is defined at compile time and does fail on 32-bits systems settingInt
as 64-bits if the number exceeds the 32-bit range.Additionally, this artificially limits 64-bit-element aux matrices to 2GB instead of 16GB with 32-bits signed indices. This is a limit I have reached.
Changing the type back to
size_t
fixes all of that.