Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Unit::new_with_abbreviations() to allow the user to implement custom abbreviations cache #677

Merged
merged 1 commit into from
Sep 11, 2023

Conversation

al13n321
Copy link
Contributor

@al13n321 al13n321 commented Sep 10, 2023

I noticed that, on one binary, iterating over .debug_info units was slow because it re-parsed the same abbreviations multiple times. Adding a cache sped it up from 1.5 seconds to 0.37 seconds.

This particular binary had 33109 compilation units with 5513 unique debug_abbrev_offset values, so each Abbreviations was parsed 6 times on average.

Gimli's AbbreviationsCache was ineffective because:

/// Currently this only caches the abbreviations for offset 0,                                                                                                                                                                                                                                                         
/// since this is a common case in which abbreviations are reused.                                                                                                                                                                                                                                                     
/// This strategy may change in future if there is sufficient need.                                                                                                                                                                                                                                                    

(In this case the abbreviations at offset 0 were used by only one unit, and all the reused abbreviations were at other offsets.)

This PR doesn't change AbbreviationsCache, but adds a trivial provision to allow the user to do their own caching of abbreviations. (A trivial HashMap works fine in my case, but idk whether it would be good enough for all gimli users.)

src/read/dwarf.rs Outdated Show resolved Hide resolved
@philipc
Copy link
Collaborator

philipc commented Sep 10, 2023

Do you know which toolchain invocation results in the abbreviations being like this?

@al13n321
Copy link
Contributor Author

I don't know what compiler flags or tools are responsible, but it's open source and should be reproducible (with considerable effort).

The binary is ClickHouse (https://github.com/ClickHouse/ClickHouse), can be downloaded like this:

curl 'https://builds.clickhouse.com/master/amd64/clickhouse' -o clickhouse && chmod a+x clickhouse && ./clickhouse local -q "select * from system.build_options"

It's mostly C++, built with clang 16, statically linked with a kitchen sink of libraries (~100 of them). Some flags:

CXX_FLAGS	 --gcc-toolchain=./cmake/linux/../../contrib/sysroot/linux-x86_64 -fdiagnostics-color=always -Xclang -fuse-ctor-homing -Wno-enum-constexpr-conversion -fsized-deallocation  -gdwarf-aranges -pipe -mssse3 -msse4.1 -msse4.2 -mpclmul -mpopcnt -fasynchronous-unwind-tables -ffile-prefix-map=.=. -ftime-trace -falign-functions=32 -mbranches-within-32B-boundaries -fdiagnostics-absolute-paths -fstrict-vtable-pointers -Wall -Wextra -Wframe-larger-than=65536 -Weverything -Wpedantic -Wno-zero-length-array -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-c++20-compat -Wno-sign-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-ctad-maybe-unsupported -Wno-disabled-macro-expansion -Wno-documentation-unknown-command -Wno-double-promotion -Wno-exit-time-destructors -Wno-float-equal -Wno-global-constructors -Wno-missing-prototypes -Wno-missing-variable-declarations -Wno-padded -Wno-switch-enum -Wno-undefined-func-template -Wno-unused-template -Wno-vla -Wno-weak-template-vtables -Wno-weak-vtables -Wno-thread-safety-negative -Wno-enum-constexpr-conversion -Wno-unsafe-buffer-usage -O2 -g -DNDEBUG -O3 -g -gdwarf-4  -flto=thin -fwhole-program-vtables -fno-pie
LINK_FLAGS	 --gcc-toolchain=./cmake/linux/../../contrib/sysroot/linux-x86_64 --ld-path=/usr/bin/ld.lld-16 -Wl,--no-export-dynamic -Wl,--gdb-index -Wl,--build-id=sha1 -no-pie -Wl,-no-pie  -flto=thin -fwhole-program-vtables

(The flags that stood out to me: -gdwarf-4 -gdwarf-aranges -Wl,--gdb-index -flto=thin -fwhole-program-vtables)

If you build it locally with default flags, the binary has 7708 units and 5999 abbreviations. The official release binary downloaded like above has 34629 units and 5572 abbreviations. Idk what causes the difference - I guess just different compiler/linker flags?

There's also this interesting postprocessing step, where sections of the binary are compressed, and the decompression code is compiled in: https://github.com/ClickHouse/ClickHouse/blob/master/utils/self-extracting-executable/CMakeLists.txt . Hard to imagine how it can 4x the number of units, but who knows.

That's all I could gather quickly. If you don't have good guesses based on that, and if it's important to you, I can spend some more time investigating, but would prefer to move on.

@philipc
Copy link
Collaborator

philipc commented Sep 10, 2023

Thanks for the info. I'll think about improving AbbreviationsCache to handle this later.

This change looks fine. The CI failure is unrelated, and I'll merge after I've fixed it.

@philipc philipc merged commit 2d3ad32 into gimli-rs:master Sep 11, 2023
18 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants