Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valgrind errors with struct assignment on ARM and PPC64LE #776

Open
gmaxwell opened this issue Jul 28, 2020 · 29 comments
Open

Valgrind errors with struct assignment on ARM and PPC64LE #776

gmaxwell opened this issue Jul 28, 2020 · 29 comments

Comments

@gmaxwell
Copy link
Contributor

For some time there has been a clang issue where clang emits a memcpy of a struct to itself and valgrind flags it as undefined behaviour. Clang developers gave the response that when the compiler does it, nothing is undefined behaviour. This is response is mostly true, but memcpy is a libc defined function and can change its behaviour out from under the compiler (and has in the past, it used to work okay with overlapping memory) -- in fact, memcpy can be replaced by dynamic linker at runtime (at the direction of the caller).

In any case wisdom of clang's behaviour aside, valgrind can't tell if it was the code or the compiler (which is also part of why valgrind is useful for finding compiler induced non-constant time behaviour).

For this library it is a practical problem both because valgrind errors are rightfully frightening to developers who would call the library but also because some people want to make running valgrind part of "make check" which would potentially elevate these useless errors to build failures (and teach users to ignore these sorts of reports).

It's also the case that where these issues are triggered is unstable and highly environment specific.

E.g. PPC64LE Fedora 32 Clang 10.0.0-2 -Os fails but other optimization levels do not.

Checkout this matrix of configurations on ARM8:

CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 1 errors from 1 contexts 
CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 2 errors from 2 contexts 
CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 2 errors from 2 contexts 
CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=64bit --with-field=64bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=gcc CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 2 errors from 2 contexts 
CC=gcc CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-O2 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-Os -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 
CC=clang CFLAGS='-O3 -g' ./configure --with-bignum=no --enable-experimental --enable-endomorphism --enable-module-ecdh --enable-module-recovery --with-scalar=32bit --with-field=32bit --disable-openssl-tests && make clean && make -j4 valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
ERROR SUMMARY: 0 errors from 0 contexts 

So out of 24 configurations, 4 fail.

Each of the failing cases look like:

==45040== Source and destination overlap in memcpy(0x1ffeffefe0, 0x1ffeffefe0, 84)
==45040==    at 0x484F1D4: __GI_memcpy (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==45040==    by 0x486B5B3: secp256k1_ge_neg (group_impl.h:95)
==45040==    by 0x4870737: secp256k1_ecmult_const.constprop.0 (ecmult_const_impl.h:260)
==45040==    by 0x4872A6B: secp256k1_ecdh (main_impl.h:53)
==45040==    by 0x109137: main (valgrind_ctime_test.c:73)
==45040== 
==45040== Source and destination overlap in memcpy(0x1ffeffefe0, 0x1ffeffefe0, 84)
==45040==    at 0x484F1D4: __GI_memcpy (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==45040==    by 0x486B5B3: secp256k1_ge_neg (group_impl.h:95)
==45040==    by 0x487075F: secp256k1_ecmult_const.constprop.0 (ecmult_const_impl.h:266)
==45040==    by 0x4872A6B: secp256k1_ecdh (main_impl.h:53)
==45040==    by 0x109137: main (valgrind_ctime_test.c:73)

(The clang build with only one error gets only the ecmult_const_impl.h:260 one, because :266 is endo only)

So there are failures with both GCC and Clang Os only. GCC does it for 32 and 64 only with endo. Clang does it only with 64 bit but doesn't care about endo.

GCC previously had a bug a long time ago on this which was closed as resolved: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=19410

Bug in LLVM tracker: https://bugs.llvm.org/show_bug.cgi?id=11763

I think this is a serious impedement to using valgrind except as a development/review tool, because the location of these issues is unpredictable, depends on random build flags, depends on the compiler version, and could come and go due to unrelated changes to the codebase. There are a large number of struct assignments in the code too, so a large number of places where one could crop up.

@real-or-random
Copy link
Contributor

@gmaxwell
Copy link
Contributor Author

I think that won't work well: they can show up in 100 different places, they'll move around, etc. Obviously we're not going to be maintaining a separate file with every struct assignment in the codebase. :P

I think suppressions might be workable for our CI purposes where we're targeting a small set of compilers and if we move something we can up just fix it on the spot. We also should be concerned about callers: I don't want to know how many security critical programs went years without being tested under valgrind just because they linked openssl and it returned ub tainted randomness.

For something like the ct timer tester I think we could make it just totally suppress the memcpy arguments check-- that's not what we're looking for with that test.

@real-or-random
Copy link
Contributor

I think that won't work well: they can show up in 100 different places, they'll move around, etc. Obviously we're not going to be maintaining a separate file with every struct assignment in the codebase. :P

The syntax is pretty clever, we could for example suppress it in secp256k1_ge_neg.

I think suppressions might be workable for our CI purposes where we're targeting a small set of compilers and if we move something we can up just fix it on the spot. We also should be concerned about callers: I don't want to know how many security critical programs went years without being tested under valgrind just because they linked openssl and it returned ub tainted randomness.

For something like the ct timer tester I think we could make it just totally suppress the memcpy arguments check-- that's not what we're looking for with that test.

I was thinking more into these directions. That the user can't use valgrind reliably is sad but not our fault and don't see what we could do about it except document it. And maybe report a valgrind bug. The only related bug I could find is https://bugs.kde.org/show_bug.cgi?id=148543.

@gmaxwell
Copy link
Contributor Author

Yeah, we should talk to the valgrind authors. This isn't technically a valgrind bug and GCC has previously fixed the behaviour. But LLVM/Clang indicate that they will not change the behaviour and instead require that they be used with a libc that can self memcpy safely, and this isn't a crazy position.

The memcpy-to-self seems like something that is pretty unlikely to be an actual bug, so perhaps at a minimum valgrind could be convinced to classify that as a different kind of error from general overlap.

@gmaxwell
Copy link
Contributor Author

Do we know anyone that is involved in the C standard? ... perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :) I normally don't consider the testing of something even started until a compiler bug or two is reported... getting the language changed would be a new bar to meet in the future. :)

@elichai
Copy link
Contributor

elichai commented Jul 28, 2020

Do we know anyone that is involved in the C standard? ... perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :) I normally don't consider the testing of something even started until a compiler bug or two is reported... getting the language changed would be a new bar to meet in the future. :)

I think I might know someone on twitter, I'll post a tweet and start tagging :)

@real-or-random
Copy link
Contributor

Some more references:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667

https://web.archive.org/web/20180926060810/http://valgrind.10908.n7.nabble.com/memcpy-x-x-n-is-not-OK-clang-llvm-vs-memcheck-td44514.html

I'm not very motivated to talk to Valgrind after this thread but yeah, it's old, hm.

Do we know anyone that is involved in the C standard? ... perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :)

I don't believe that this is a language issue. I tend to agree with the clang devs here.

@gmaxwell
Copy link
Contributor Author

gmaxwell commented Jul 28, 2020

Clang devs don't have any control over libc. But if they think must-support-memcpy-self is a reasonable requirement for memcpy, then isn't that a case that it's a reasonable implementation defined behaviour for memcpy? ::shrugs:: That is in fact defacto what they're relying on now-- that they're only used with a memcpy that has an implementation defined behaviour for self copies.

You shouldn't let that thread throw you on valgrind-- the person with the argument for not changing valgrind doesn't appear to listed as a valgrind author, could just be an internet hothead. I'm not sure what change I'd ask from valgrind other than making it so you can separately suppress memcpy-self from overlaps (if that isn't already the case). I wouldn't want it to stop returning errors for actual bugs in source code.

[Especially now that I see that thread pointed out a plausible memcpy construction that would break self assignment.]

@real-or-random
Copy link
Contributor

real-or-random commented Jul 28, 2020

Clang devs don't have any control over libc. But if they think must-support-memcpy-self is a reasonable requirement for memcpy, then isn't that a case that it's a reasonable implementation defined behaviour for memcpy? ::shrugs::

I think this would miss the point. Clang and libc together form a C implementation, and things like "implementation-defined" apply to C programs calling memcpy and not to internal calls between one part of a C implementation and another part, these are literally implementation details. You're right of course, it's somewhat crazy for clang to rely on it if you control only half of the C implementation, and rely on undocumented behavior provided by the other part. (I don't know enough to judge about the potential real issues on PowerPC.)

You shouldn't let that thread throw you on valgrind-- the person with the argument for not changing valgrind doesn't appear to listed as a valgrind author, could just be an internet hothead. I'm not sure what change I'd ask from valgrind other than making it so you can separately suppress memcpy-self from overlaps (if that isn't already the case). I wouldn't want it to stop returning errors for actual bugs in source code.

Fair. Do you want to give a try? Otherwise I can do it next week or so, I guess I don't have the time right now.

@gmaxwell
Copy link
Contributor Author

it's somewhat crazy for clang to rely on it if you control only half of the C implementation,

Yeah. That's also my point. They're perfectly right that the "implementation" (complier + libc) can do whatever it wants, I agree. But relying on undocumented libc behaviour is not wise, esp because glibc has changed before -- memcpy on overlapping regions used to work.

Fair. Do you want to give a try? Otherwise I can do it next week or so, I guess I don't have the time right now.

I'm going to seek some friendly input for a few days first, but sure.

@elichai
Copy link
Contributor

elichai commented Oct 19, 2020

I can no longer reproduce this, not with clang or with gcc, I'm not sure if it is because of a code change or new compilers or valgrind changed

$ clang --version
clang version 11.0.0 (https://github.com/llvm/llvm-project.git 176249bd6732a8044d457092ed932768724a6f06)
Target: x86_64-unknown-linux-gnu

$ gcc --version
gcc (GCC) 10.2.0

$ CC=clang CFLAGS='-Os' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --enable-module-schnorrsig --enable-module-extrakeys --disable-openssl-tests && make clean && make valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
==3791018== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

$ CC=gcc CFLAGS='-Os' ./configure --with-bignum=no --enable-experimental --enable-module-ecdh --enable-module-recovery --enable-module-schnorrsig --enable-module-extrakeys --disable-openssl-tests && make clean &&make valgrind_ctime_test && libtool --mode=execute valgrind ./valgrind_ctime_test
==3793460== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@real-or-random
Copy link
Contributor

@elichai I'm not sure if we ever had this issue on x86_64. The matrix in the initial comment is on ARM8.

@elichai
Copy link
Contributor

elichai commented Oct 19, 2020

@elichai I'm not sure if we ever had this issue on x86_64. The matrix in the initial comment is on ARM8.

Oops I forgot that part. you're right.

@real-or-random
Copy link
Contributor

This seems to happen with clang 7 also on -O2 builds:
https://app.shippable.com/github/real-or-random/secp256k1/runs/9/29/console (I was playing around with Shippable as a CI)
Not sure what the crucial difference here is to Travis, which also runs clang 7. Maybe it's that Shippable uses an older valgrind version,

@real-or-random
Copy link
Contributor

Fair. Do you want to give a try? Otherwise I can do it next week or so, I guess I don't have the time right now.

I'm going to seek some friendly input for a few days first, but sure.

@gmaxwell Did you reach out to them?

@real-or-random
Copy link
Contributor

So we see this now with GCC too:
ElementsProject/lightning#4203

There's an age-old open GCC bug: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=32667
The proposed solution in clang is simply to document the issue: https://reviews.llvm.org/D86993

So far we've seen this only in secp256k1_ge_neg in this line:

*r = *a;

As long as this the case, I believe we should just mitigate this in our code. Here are a few things we could do:

  • Copy all struct members separately and hope the compiler does not compile it into a memcpy. This seems to work for secp256k1_gej_neg here:

    secp256k1/src/group_impl.h

    Lines 247 to 251 in c6b6b8f

    static void secp256k1_gej_neg(secp256k1_gej *r, const secp256k1_gej *a) {
    r->infinity = a->infinity;
    r->x = a->x;
    r->y = a->y;
    r->z = a->z;
  • Prepend if (r != a).
  • Split secp256k1_ge_neg in secp256k1_ge_neg_noalias and secp256k1_ge_neg_inplace. (I'm sure there are better names.) This should even be negligibly faster.
  • Replace the struct assignment by a memmove.
  • Add valgrind suppressions.
  • Ignore the issue and blame the ecosystem.

The first four look all reasonable to me and I'd probably ACK any of those. If you ask me, we should just do the first. In the interest of avoiding bikeshedding, I would open the PR but it'll be easier if someone with an ARM system can do it because I can't test (without compiling for my phone, which is a PITA).

@gmaxwell
Copy link
Contributor Author

or split into inplace and neg, and neg branches then calls inplace. Most usage would just use inplace directly.

@elichai
Copy link
Contributor

elichai commented Nov 19, 2020

  • Prepend if (r != a).

I'd really like to avoid pointer comparison, AFAIU comparing 2 pointers that point to different objects(ala not in the same array) is UB in C.

@real-or-random
Copy link
Contributor

real-or-random commented Nov 19, 2020

  • Prepend if (r != a).

I'd really like to avoid pointer comparison, AFAIU comparing 2 pointers that point to different objects(ala not in the same array) is UB in C.

No, it's defined. The ==/!= operators never yield UB. See http://port70.net/~nsz/c/c99/n1256.html#6.5.9p6

@jonasnick
Copy link
Contributor

So far we've seen this only in secp256k1_ge_neg in this line:

With default configuration flags, ARMv8 and GCC 10.2.0, I get the Source and destination overlap in memcpy valgrind error in secp256k1_gej_add_var.

@real-or-random
Copy link
Contributor

@jonasnick Ok, damn. Is this the only instance?

I'm not sure how to move forward then. I'm tempted to say "if it's only in two places, then let's fix it" but I worry that we'll end up replacing every struct assignment in the code base.

At least we've seen this only with ge and gej. I hope this is because they're the largest structs we have in the code and their size is maybe not nicely aligned to register sizes due to the infinity int. (Then it makes more sense to call memcpy for the compiler instead of optimize using other instructions, I've seen this when looking into zeroing secrets.) And we only have a handful of struct assignments in group_impl.h, so I think we should still try to eliminate them.

We should still benchmark if it makes a difference. Or directly split the function into two variants. If the compiler inserts memcpy in ARM with -O2, this may be a performance issue even now (secp256k1_gej_add_var is used in a loop in ecmults...)

@real-or-random real-or-random changed the title Valgrind errors with struct assignment in Os builds Valgrind errors with struct assignment on ARM Jun 7, 2021
@TheBlueMatt
Copy link
Contributor

This also occurs with on ppcel64.

@real-or-random real-or-random changed the title Valgrind errors with struct assignment on ARM Valgrind errors with struct assignment on ARM and PPC64LE Jun 7, 2021
@real-or-random
Copy link
Contributor

This also occurs with on ppcel64.

Right! This was in fact noted in the opening comment here but I mis-renamed the issue...

@real-or-random
Copy link
Contributor

real-or-random commented Jun 8, 2021

I've looked at the output of powerpc64le-linux-gnu-objdump -drwCSl .libs/libsecp256k1.so | grep -B10 memcpy@@ (for -O2) and it tells me that most calls to libc's memcpy correspond to actual memcpy calls in our code. But around 10 calls are struct assignments and a handful of them may be problematic because they may be a self-copy. All of those cases are ge or gej structs. (I haven't looked at ARM again but I believe it's similar there...)

I think the scope of the issue is still small but this is more than the two code locations we've seen before. :/

I suggest to add _ge_clone and _gej_clone methods, use them in the corresponding places, and then we can see how to implement them such that self-memcpys won't show up, e.g., using an if or memmove. This enables us to benchmark the code and see whether a particular implementation affects performance. (edit: And if we don't want to make changes due to performance, this makes it at least easier to write a Valgrind suppression rule or wrap the changes with if(RUNNING_ON_VALGRIND)).

All of this is acceptable, I think. But it's an ugly hack.

@jonasnick
Copy link
Contributor

Fwiw, if we do this hack we should also test in CI that we actually use these methods (preferably by running the tests in valgrind on the affected architectures).

@real-or-random
Copy link
Contributor

Fwiw, if we do this hack we should also test in CI that we actually use these methods (preferably by running the tests in valgrind on the affected architectures).

I agree though it's hard to run Valgrind on foreign architectures. Valgrind and qemu-user don't go together... We could pay for some EC2 ARM instances and hand them to Cirrus CI (that is supported). But even this won't cover PowerPC. No matter what, this will be a project on its own.

For now, I'd prefer some clever grep on the assembly.

@real-or-random
Copy link
Contributor

For now, I'd prefer some clever grep on the assembly.

Well ok, the issue with this is that this needs to be something which is statically checkable. This leaves us with only two options:

  • memmove
  • assigning all struct members individually

In both cases, we'd hope the compiler won't turn it into a memcpy.

And in both cases, it's probably better to ensure that all of the field members are initialized to avoid UB (or any discussion on whether this is UB or not, which may be worse than the actual UB :P).

So we think we should first merge #791 .

@real-or-random
Copy link
Contributor

I've run into this again in #1433, see https://cirrus-ci.com/build/5276023932059648.

There's at least one new case
==3653== Source and destination overlap in memcpy(0xfed0e388, 0xfed0e388, 148)
==3653==    at 0x486DA10: memcpy (vg_replace_strmem.c:1120)
==3653==    by 0x1689FF: secp256k1_ecmult_pippenger_wnaf (ecmult_impl.h:567)
==3653==    by 0x1689FF: secp256k1_ecmult_pippenger_batch (ecmult_impl.h:710)
==3653==    by 0x16BF5F: secp256k1_ecmult_pippenger_batch_single (ecmult_impl.h:729)
==3653==    by 0x169ABF: test_ecmult_multi (tests.c:4583)
==3653==    by 0x12BDA7: run_ecmult_multi_tests (tests.c:5137)
==3653==    by 0x12BDA7: main (tests.c:7588)

(Note:

==3653== Source and destination overlap in memcpy(0xfed13708, 0xfed13708, 148)
==3653==    at 0x486DA10: memcpy (vg_replace_strmem.c:1120)
==3653==    by 0x124E5F: test_ge (tests.c:3813)
==3653==    by 0x124E5F: run_ge (tests.c:3984)
==3653==    by 0x124E5F: main (tests.c:7575)

No useful debug info here, but I tracked this down to the known *r = *a in secp256k1_gej_add_var. )

@real-or-random
Copy link
Contributor

Do we know anyone that is involved in the C standard? ... perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :)

LLVM people are working on a draft DR: https://reviews.llvm.org/D86993#4525896

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants