-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault in SparseArrayLookup; CAAction's protocol version is invalid #283
Comments
@gcasa Is this the right way to file this, or should I move this to a different repo or to |
Do you have a reduced test case (ideally linking nothing but the runtime)? It looks as if it may be trying to send a message in +load to a class that is not yet loaded. |
I think I might have forgotten to link something. I don't have a minimal test case -- as I said, I have no idea where this came from. |
I looked and it seems like I have linked everything. |
Is the order of the +load messages defined? It doesn't occur in the libs-opal testcases, so if this is the reason for the segfault, then the load message order is different. |
@davidchisnall I believe this is the problem... + (void)load
{
[CGImageDestination registerDestinationClass: self];
} |
I changed those to |
Sending a message to a superclass in +load should be fine. Sending one to any other class that’s defined in the same library should also be fine. I wonder if somehow the selector for the message being sent is not resolved. That could result in an out of bounds access in the uninitialised stable. |
I have changed
@davidchisnall Might this be related? The abort appears in pthreads in |
Does Apple objc4 work on Linux? If it does, it might be possible to test on objc4 to see if the issue is in libobjc2. |
@davidchisnall How can I compile libobjc2 with debug symbols included? |
I passed |
I'm kinda busy, but here's my last GDB session if anyone wants to take a look:
|
@davidchisnall The issue appears to be that the isa of CAAction is 0x0, which is not a valid protocol version. I don't know why it ended up this way. |
What is the -fobjc-runtime= flag that you’re passing to clang? |
@davidchisnall
|
I can try switching to the newly released v2.2.1 |
I'm building like: export CC=clang-14
export CXX=clang++-14
export CXXFLAGS="-std=c++11"
export RUNTIME_VERSION=gnustep-2.1
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
export LD=/usr/bin/ld.gold
export LDFLAGS="-fuse-ld=/usr/bin/ld.gold -L/usr/local/lib"
rm -Rf build
mkdir build && cd build
cmake ../ \
-DCMAKE_C_COMPILER=${CC} \
-DCMAKE_CXX_COMPILER=${CXX} \
-DCMAKE_ASM_COMPILER=${CC} \
-DCMAKE_LINKER=${LD} \
-DUSE_GOLD_LINKER=YES \
-DCMAKE_MODULE_LINKER_FLAGS="${LDFLAGS}" \
-DTESTS=OFF \
-DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_EXPORT_COMPILE_COMMANDS=1
cmake --build .
sudo -E make install
sudo ldconfig |
The issue still appears in v2.2.1. |
clang 14 is pretty old, but should be fine. There are a few fixes for the v2 ABI after that, but I think they matter only on Windows. I wonder if the +load is somehow being called before we've loaded the |
@davidchisnall Is there anything that I should run in
I think this issue is unrelated to the
|
It runs into |
@davidchisnall Are you available to look at this more or not? Should I post this to |
I’m completely confused by this. It looks as if your binary contains things that shouldn’t be possible. |
I have tested again with a new installation of GNUstep on the same computer with clang-18, which results in the same segfault in SparseArrayLookup. I'll go apply my patches to Opal and Boron and see what happens. |
I should also try to do this on a clean Debian system, but I currently don't have the disk space to do that. |
I get "Unknown protocol versionaborted" error again, with my Opal and Boron patches applied on clang-18. |
By the way, for most repositories that I haven't patched I'm on the latest commit to master as of May 31. I built it with |
Is there something unusual about CAAction's protocol definition? (Is it missing a definition?) I can't usefully help debug this without a test case that doesn't depend on a big pile of other libraries. I hadn't noticed previously that you were using gold. Does the same thing happen if you use lld? |
There’s probably a compiler bug that is causing a non-adopted protocol to be emitted, but I’m not sure how it’s generated but not in the correct section. Can you send me the preprocessed source for the compilation unit that generates the .o file that contains the symbol for this protocol? |
Do you know how to find the associated compilation unit? |
nm will dump the symbols for each .o file, you should be able to find the CAAction protocol in that list for one file. Once you know that, find the .m file that generated it and add -E and -o {some temp file} to the compile command. |
Ok, thanks. |
I found I'll attach the output of |
(for myself) for file in *.o; do echo $file; nm $file; done | grep CAAction |
This is especially perplexing because nothing in libs-quartzcore implements the protocol CAAction, there are just methods that take and return CAAction values (returned values are ones that the user previously provided to libs-quartzcore, so libs-quartzcore is not producing any id itself). Despite that, CAAnimation ends up with |
Does something in that file do |
Here is the preprocessed source (I posted it above, but might not have been clear): https://gist.github.com/ethanc8/46be627ad36326f9b32843fe6409d77d The word |
Wait, I didn't see this: @interface CAAnimation : NSObject <NSCoding, NSCopying, CAAction, CAMediaTiming> |
Yup, looks like the protocol is adopted, so it is correct for it to be emitted in this file. But for some reason it's not initialised correctly. Looks like a compiler bug. If you can create a reduced test case that would be helpful (delete bits of the file, compile with |
Ok, thanks. I'm not sure whether it's a compiler bug, since it seems to be initialised correctly for my test case which creates classes and categories implementing CAAction and instantiates CAAnimation. |
The original $._OBJC_PROTOCOL_CAAction = comdat any
// a few hundred lines later
@52 = private unnamed_addr constant [9 x i8] c"CAAction\00", align 1
@._OBJC_PROTOCOL_CAAction = global { ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr } { ptr inttoptr (i32 4 to ptr), ptr @52, ptr @.objc_protocol_list.10, ptr @.objc_protocol_method_list.11, ptr @.objc_protocol_method_list.13, ptr @.objc_protocol_method_list.12, ptr @.objc_protocol_method_list.14, ptr null, ptr null, ptr null, ptr null }, section "__objc_protocols", comdat, align 8
// a few lines later
@.objc_protocol_list.20 = internal global { ptr, i64, [4 x ptr] } { ptr null, i64 4, [4 x ptr] [ptr @._OBJC_PROTOCOL_NSCoding, ptr @._OBJC_PROTOCOL_NSCopying, ptr @._OBJC_PROTOCOL_CAAction, ptr @._OBJC_PROTOCOL_CAMediaTiming] }, align 8 I've attached the full LLVM IR to https://gist.github.com/ethanc8/d02ef13441e1eb58d372cf650fa3f63b |
That looks correct. The |
Which load address are you talking about? |
Ok, I set a watchpoint at
|
If I try to forcefully reset it back to 4, it does:
and then fails with |
Okay, this looks like a load-order problem. The line above the one in your gdb session is Most of the paths that should be here are guarded on having loaded the runtime. Can you print a backtrace there? I suspect the simplest thing to do is treat the built-in classes as special and define them in C rather than relying on the normal load sequence. Does the library defining this protocol explicitly link libobjc2? If it does, we should see the libobjc2 constructors run first (I think). |
Here's the backtrace:
|
All of the libraries and apps should be linking libobjc2: $ gnustep-config --objc-libs -ldispatch -pthread -fexceptions -rdynamic -fobjc-runtime=gnustep-2.1 -fblocks -L/home/ethan/GNUstep/Library/Libraries -L/usr/GNUstep/Local/Library/Libraries -L/usr/GNUstep/System/Library/Libraries -lpthread -lobjc -lm |
Generate these classes using the structures that the runtime expects internally, rather than relying on the Objective-C compiler. This change means that they can always be the latest version, even if the runtime is compiled with an older compiler, and ensures that the `Protocol` class is always available, independent of global constructor ordering between libraries. Fixes #283
Please can you test #294 and see if this fixes it for you? |
I will try to do so tonight, but I'm a bit busy today. |
Yes, it seems to be working now. Thanks! Do you want me to confirm anything in the debugger for you? I have just noticed that it no longer aborts with "Unknown protocol version". |
No, that’s great. Thanks. |
Generate these classes using the structures that the runtime expects internally, rather than relying on the Objective-C compiler. This change means that they can always be the latest version, even if the runtime is compiled with an older compiler, and ensures that the `Protocol` class is always available, independent of global constructor ordering between libraries. Fixes #283
We still need to figure out what caused the segfault in SparseArrayLookup. |
Hello 👋, I have encountered a segfault in
SparseArrayLookup
during the initialization of my port of the application GitY. Here is the backtrace from GDB:Interestingly, I have not encountered this when running any of the libs-opal tests, such as
images
. I have confirmed with GDB thatimages
calls+[CGImageDestinationTIFF load]
without segfaulting. Therefore, I don't know in which component this came from.The Objective-C components that are linked to by
GitY
are:It's possible that one of those has messed something up in their
+load
methods, but I'm not exactly sure. If needed, I can send you binaries of my GNUstep installation, compilelibobjc2
with different flags, peek around in GDB, or whatever else would be needed.The text was updated successfully, but these errors were encountered: