Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move linker input file parsing to the frontend #21700

Merged
merged 40 commits into from
Oct 24, 2024
Merged

move linker input file parsing to the frontend #21700

merged 40 commits into from
Oct 24, 2024

Conversation

andrewrk
Copy link
Member

@andrewrk andrewrk commented Oct 15, 2024

Moves GNU ld script processing to the frontend to join the relevant library lookup logic, making the libraries within subject to the same search criteria as all the other libraries.

This change is necessary so that the compiler has knowledge of all linker inputs at the start of compilation, so that linking and compilation can begin at the same time. Finding out about linker inputs during flush() is too late. This branch fully removes lib_dirs from being passed to the ELF linking code.

This unfortunately means doing file system access on all .so files when targeting ELF to determine if they are linker scripts, so I introduced an opt-in CLI flag to enable .so scripts. When a GNU ld script is encountered, the error message instructs users about the CLI flag that will immediately solve their problem, which is passing -fallow-so-scripts to zig build. This means that users who don't have funky libraries, or at least avoid linking against them, don't have to pay the cost of these file system accesses.

This branch additionally moves all object file, archive, and shared object file parsing to happen during the compilation pipeline rather than in flush().

@kubkon
Copy link
Member

kubkon commented Oct 15, 2024

Nice, I like where this is headed. I will follow-up with a rewrite of handling dylibs (and dylib re-exports) for MachO since it's somewhat similar in spirit to GNU ld script processing in that parsing dylibs may add more dylibs into the pipeline. In MachO case though, I think this will never be an optional step due to the fact that Apple dylibs enforce this. What do you think?

@alexrp
Copy link
Member

alexrp commented Oct 15, 2024

Doesn't zig cc also need a way to enable this flag? (Or perhaps have it on by default?)

@andrewrk
Copy link
Member Author

I'm looking into moving library lookup into Compilation creation, which would address both of your comments and allow reverting the -fallow-so-scripts thing, however, the data dependency here is creation of the Compilation.Config, which depends on any_dyn_libs, which depends on looking up -l arguments. For example one might pass -lfoo, and if that resolves to libfoo.a then we end up with link_mode == .static in the Compilation.Config, otherwise if it resolves to libfoo.so, we end up with link_mode == .dynamic.

One possibility here is to guess the link mode heuristically, and then if the guess ends up being wrong, fail the compilation with an error message explaining the user needs to explicitly specify -dynamic. This would happen fairly infrequently if the heuristic was to assume any -l arg which is not libc (and whose search query does not preclude dynamic linking) implies dynamic linking.

@kubkon
Copy link
Member

kubkon commented Oct 15, 2024

I'm looking into moving library lookup into Compilation creation, which would address both of your comments and allow reverting the -fallow-so-scripts thing, however, the data dependency here is creation of the Compilation.Config, which depends on any_dyn_libs, which depends on looking up -l arguments. For example one might pass -lfoo, and if that resolves to libfoo.a then we end up with link_mode == .static in the Compilation.Config, otherwise if it resolves to libfoo.so, we end up with link_mode == .dynamic.

One possibility here is to guess the link mode heuristically, and then if the guess ends up being wrong, fail the compilation with an error message explaining the user needs to explicitly specify -dynamic. This would happen fairly infrequently if the heuristic was to assume any -l arg which is not libc (and whose search query does not preclude dynamic linking) implies dynamic linking.

Hmm, I wonder how safe it would be to assume that -l by definition resolves to a dynamic library. What if the libs for a project such as SDL are packaged as both libfoo.a and libfoo.so and exist in the same location? I don't know how likely that would be on Linux, but on macOS this is fairly common.

@alexrp
Copy link
Member

alexrp commented Oct 15, 2024

I don't know how likely that would be on Linux

Very common; /usr/lib/x86_64-linux-gnu is full of .a and .so files on my Ubuntu 23.10 system.

@andrewrk
Copy link
Member Author

This is why linkSystemLibrary provides these options:

This allows the user to have the traditional behavior, which allows for ambiguity in how libraries are linked, or to specify no_fallback and be certain that a library is linked according to the explicitly specified mode. When no_fallback is used, the compiler would not have to to guess the link mode, since it would not depend on library name resolution.

@andrewrk
Copy link
Member Author

andrewrk commented Oct 16, 2024

The "guess the link mode" heuristic idea doesn't work. This logic must go into the CLI - including dylibs if those require resolution with respect to library directories - in order for the cache system to function. The two possibilities are:

  1. (status quo) do the resolution with every CLI invocation before checking the cache
  2. make the cache also observe changes to every library search directory

For example, if you have -L a -L b -lfoo and this file system:

a/
b/libfoo.a

then you add a file and run the same CLI again:

a/libfoo.a
b/libfoo.a

Currently, this works correctly. If library lookup moves later in the pipeline than CLI, it would have a false positive cache hit, and the changes to the cache system that would be necessary to correct this are undesirable.

Note that in master branch, this exact example is broken if -lfoo appears within a GNU ld script, but it is fixed with this branch.

@andrewrk andrewrk force-pushed the cli-lib-dirs branch 3 times, most recently from a4d65a9 to a903727 Compare October 19, 2024 07:11
@andrewrk andrewrk changed the title move ld script processing to the frontend move linker input file parsing to the frontend Oct 22, 2024
@andrewrk andrewrk force-pushed the cli-lib-dirs branch 3 times, most recently from 96991c4 to fcc0f10 Compare October 23, 2024 21:52
this allows it to be used by the frontend
other than a compile error, specifically
along with the relevant logic, making the libraries within subject to
the same search criteria as all the other libraries.

this unfortunately means doing file system access on all .so files when
targeting ELF to determine if they are linker scripts, however, I have a
plan to address this.
The compiler defaults this value to off so that users whose system
shared libraries are all ELF files don't have to pay the cost of
checking every file to find out if it is a text file instead.

When a GNU ld script is encountered, the error message instructs users
about the CLI flag that will immediately solve their problem.
* Compilation.objects changes to Compilation.link_inputs which stores
  objects, archives, windows resources, shared objects, and strings
  intended to be put directly into the dynamic section. Order is now
  preserved between all of these kinds of linker inputs. If it is
  determined the order does not matter for a particular kind of linker
  input, that item should be moved to a different array.
* rename system_libs to windows_libs
* untangle library lookup from CLI types
* when doing library lookup, instead of using access syscalls, go ahead
  and open the files and keep the handles around for passing to the
  cache system and the linker.
* during library lookup and cache file hashing, use positioned reads to
  avoid affecting the file seek position.
* library directories are opened in the CLI and converted to Directory
  objects, warnings emitted for those that cannot be opened.
By making it a field of link.Elf, it is now accessible without a data
dependency on `files`, fixing a race condition with the codegen thread
and linker thread.
and don't look for glibc files on windows
so we don't have to do an entire compilation unit for just this silly
symbol
it is incredible how many bad ideas glibc is bundled into one project.
unstable sort is always better if you have no ties
This test does not pass in master branch either if you flip the object
order around.
If the "is darwin" check is moved below the libc_installation check
below, error.LibCInstallationMissingCrtDir is returned from
lci.resolveCrtPaths().

This should be revisited because it makes sense to check
libc_installation first even on darwin.

Anyway for now this more closely matches logic from master branch.
Unfortunately it's not a complete solution, so a follow-up commit will
need to do something more drastic like not do the linker task queue at
the same time as codegen task queue.

From that point, it is possible to do more work at the same time but
that should be a separate branch. This one has gotten big enough.
don't wait for AstGen and C source files to complete before starting to
build compiler_rt and libfuzzer
these tasks have some shared data dependencies so they cannot be done
simultaneously. Future work should untangle these data dependencies so
that more can be done in parallel.

for now this commit ensures correctness by making linker input parsing
and codegen tasks part of the same queue.
The safety lock needs to happen after check()
which, in this branch causes a miscompilation because it would get sent
to the linker.
@andrewrk andrewrk merged commit c563ba6 into master Oct 24, 2024
10 checks passed
@andrewrk andrewrk deleted the cli-lib-dirs branch October 24, 2024 05:56
@andrewrk andrewrk added the release notes This PR should be mentioned in the release notes. label Oct 24, 2024
@i18nsite
Copy link

break the cargo zigbuild
rust-cross/cargo-zigbuild#290

@alexrp
Copy link
Member

alexrp commented Oct 26, 2024

@i18nsite please open an issue for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes This PR should be mentioned in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants