scanelf: allow user to choose whether to scan .symtab or .dynsym #9

Open: wants to merge 2 commits into master


qookei commented Oct 28, 2022

This pull request adds a new command line switch (-H/--how) that
allows the user to select whether scanelf should look for symbols in
.symtab or .dynsym.

This is done because the sections contain different sets
of symbols (which partially overlap), and the symbol names
differ (.symtab includes the version suffix in the symbol name itself),
which would cause scanelf to behave differently before and after
stripping the executable.

Additionally, it adds two modes of looking for symbols in .dynsym:

  • unversioned - look at the .dynsym entries as they are,
  • versioned - also look at .gnu.version to figure out the symbol
    version, and suffix the symbol name with it before matching,
    emulating the behavior of .symtab.
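The versioned mode comes down to turning a .dynsym entry plus its .gnu.version value into the name@VERSION / name@@VERSION spelling that .symtab uses. Below is a minimal C sketch, not the PR's code. It assumes the version string has already been resolved from .gnu.version_d/.gnu.version_r and applies the hidden-bit rule described later in this thread. `format_versioned` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values from the GNU symbol-versioning scheme, defined locally so the
 * sketch does not depend on a particular <elf.h>. */
#define VERSYM_HIDDEN  0x8000u  /* top bit: non-default ("@") version */
#define VERSYM_VERSION 0x7fffu  /* low 15 bits: version index */
#define VER_NDX_LOCAL  0u
#define VER_NDX_GLOBAL 1u

/* Write `name` suffixed with its version, spelled the way .symtab does.
 * `versym`  : this symbol's .gnu.version entry
 * `vername` : version string already resolved elsewhere (may be NULL)
 * `defined` : nonzero if the symbol's st_shndx != SHN_UNDEF
 * Returns snprintf's result (length that would have been written). */
int format_versioned(char *out, size_t outsz, const char *name,
                     uint16_t versym, const char *vername, int defined)
{
    uint16_t ndx = versym & VERSYM_VERSION;

    /* Local/global indices carry no version string. */
    if (ndx == VER_NDX_LOCAL || ndx == VER_NDX_GLOBAL || vername == NULL)
        return snprintf(out, outsz, "%s", name);

    /* "@@" marks the default version of a defined symbol; hidden
     * (non-default) versions and undefined references use "@". */
    const char *sep = (defined && !(versym & VERSYM_HIDDEN)) ? "@@" : "@";
    return snprintf(out, outsz, "%s%s%s", name, sep, vername);
}
```

Using a single @ for undefined references matches how readelf prints them and how the fputs@GLIBC_2.2.5 reference appears in .symtab further down the thread. Treat that as our reading, not something this PR specifies.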

Note that this does not handle the case where the dynamic section needs
to be scanned to find the version information, as that case appears to
not be fully functional anyway (no DT_GNU_HASH support, and most ELFs
these days don't have DT_HASH). Consider it a TODO if need be :^).
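For context on that TODO: with DT_HASH, the number of dynamic symbols is simply the table's nchain word. DT_GNU_HASH only records where the hashed symbols start, so the count has to be recovered by walking the chain that starts at the highest bucket index. The sketch below shows both cases. It assumes the hash section is already mapped, 4-byte aligned and in host byte order, and the function names are hypothetical.

```c
#include <stdint.h>

/* Number of .dynsym entries implied by a DT_HASH table. */
uint32_t sysv_hash_nsyms(const uint32_t *hash)
{
    return hash[1]; /* hash[0] = nbucket, hash[1] = nchain */
}

/* Number of .dynsym entries implied by a DT_GNU_HASH table.
 * gh: the table as 32-bit words; elfclass64: nonzero for ELFCLASS64,
 * where each Bloom filter word is 64 bits (two 32-bit words) wide. */
uint32_t gnu_hash_nsyms(const uint32_t *gh, int elfclass64)
{
    uint32_t nbuckets   = gh[0];
    uint32_t symoffset  = gh[1];
    uint32_t bloom_size = gh[2];
    /* gh[3] is bloom_shift, not needed for counting */
    const uint32_t *buckets = gh + 4 + bloom_size * (elfclass64 ? 2 : 1);
    const uint32_t *chain   = buckets + nbuckets;

    /* Find the highest symbol index that any bucket starts at. */
    uint32_t last = 0;
    for (uint32_t i = 0; i < nbuckets; i++)
        if (buckets[i] > last)
            last = buckets[i];
    if (last < symoffset)
        return symoffset; /* no hashed symbols at all */

    /* Follow that chain; bit 0 set marks the chain's final entry. */
    while ((chain[last - symoffset] & 1) == 0)
        last++;
    return last + 1;
}
```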

Fixes bug 847493.

Thanks to @ArsenArsen for the idea for the fix.

thesamesam (Member)

I don't know enough about ELF yet to give a good review by myself. cc @SoapGentoo @xen0n

thesamesam (Member)

oh, and ofc @vapier

qookei (Author) commented Oct 28, 2022

Fixed the wrong version name being extracted from .gnu.version_d, and fixed an inconsistency with readelf output regarding single vs. double @ (getrpcent_r@@GLIBC_2.1.2 vs. getrpcent_r@GLIBC_2.0), which is determined by the uppermost bit of the version number in .gnu.version.
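Resolving a .gnu.version index to a name means walking the Verdef chain in .gnu.version_d and reading the first Verdaux of the matching entry. Any later Verdaux entries name parent versions, so reading one of those would return the wrong string. The self-contained sketch below is not the PR's code. It assumes the file's byte order matches the host's and that the caller has already masked off the hidden bit. Index 1 is the VER_FLG_BASE entry naming the object itself, so callers would normally treat it as unversioned rather than look it up. Undefined references are versioned via .gnu.version_r instead, which this sketch does not cover.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* On-disk layouts of Elf32/Elf64_Verdef and _Verdaux (identical for
 * both classes), mirrored here so the sketch is self-contained. */
struct verdef  { uint16_t vd_version, vd_flags, vd_ndx, vd_cnt;
                 uint32_t vd_hash, vd_aux, vd_next; };
struct verdaux { uint32_t vda_name, vda_next; };

/* Return the version name for index `ndx` from a .gnu.version_d
 * section, or NULL if no entry matches. `strtab` is the section named
 * by the verdef section's sh_link (normally .dynstr). */
const char *verdef_name(const unsigned char *sec, size_t secsz,
                        const char *strtab, size_t strsz, uint16_t ndx)
{
    size_t off = 0;
    for (;;) {
        struct verdef vd;
        if (off + sizeof vd > secsz)
            return NULL;                    /* truncated section */
        memcpy(&vd, sec + off, sizeof vd);  /* avoid unaligned reads */

        if (vd.vd_ndx == ndx) {
            /* The FIRST Verdaux holds this version's own name. */
            struct verdaux vda;
            size_t aux = off + vd.vd_aux;
            if (aux + sizeof vda > secsz)
                return NULL;
            memcpy(&vda, sec + aux, sizeof vda);
            return vda.vda_name < strsz ? strtab + vda.vda_name : NULL;
        }
        if (vd.vd_next == 0)
            return NULL;                    /* last entry, no match */
        off += vd.vd_next;
    }
}
```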

qookei and others added 2 commits October 28, 2022 23:53
Signed-off-by: Kacper Słomiński <kacper.slominski72@gmail.com>
This commit adds a new command line switch (-H/--how) that allows the
user to select whether scanelf should look for symbols in .symtab or
.dynsym.

This is done because the sections contain different sets
of symbols (which partially overlap), and the symbol names
differ (.symtab includes the version suffix in the symbol name itself),
which would cause scanelf to behave differently before and after
stripping the executable.

Additionally, it adds two modes of looking for symbols in .dynsym:
 - unversioned - look at the .dynsym entries as they are,
 - versioned - also look at .gnu.version to figure out the symbol
   version, and suffix the symbol name with it before matching,
   emulating the behavior of .symtab.

Note that this does not handle the case where the dynamic section needs
to be scanned to find the version information, as that case appears to
not be fully functional anyway (no DT_GNU_HASH support, and most ELFs
these days don't have DT_HASH). Consider it a TODO if need be :^).

Bug: https://bugs.gentoo.org/847493

Co-authored-by: Arsen Arsenović <arsen@aarsen.me>
Signed-off-by: Kacper Słomiński <kacper.slominski72@gmail.com>
qookei (Author) commented Oct 28, 2022

Also added a sign off to both commits.

vapier (Member) commented Oct 30, 2022

scanelf has a bit of a short-option problem already. let's not add to it by using a short-option just because it's free. i don't think --how will see that much usage to justify grabbing one.

in theory, the debug symbols (.symtab/.strtab) should be a superset of the runtime symbols (.dynsym/.dynstr). it used to be that way before commit efb0dff, then i changed it to try and guess based on who had more symbols. i don't recall what ELFs i was looking at at the time, but i'd believe there's still weirdness out there. i wonder if we should always scan both if they're found, and complain if there's a mismatch (i.e. one claims symbol foo lives at address X, but the other claims symbol foo lives at address Y, but we always defer to the runtime set).
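Scanning both tables and complaining when they disagree could look roughly like the sketch below. It assumes a hypothetical flattened `struct sym` array for each table, with names already normalized. As noted later in the thread, .symtab names carry version suffixes that .dynsym names lack. The sketch is quadratic for clarity. Same-named local symbols in .symtab, such as two static functions with the same name, could cause false reports, so a real version would probably compare global symbols only.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened symbol: normalized name and st_value. */
struct sym {
    const char *name;
    uint64_t    addr;
};

/* Compare each runtime (.dynsym) symbol with the first same-named
 * debug (.symtab) symbol. Report address disagreements to `warn`
 * (if non-NULL) and return how many there were. The runtime value
 * stays authoritative; this only diagnoses. */
size_t cross_check(const struct sym *dyn, size_t ndyn,
                   const struct sym *tab, size_t ntab, FILE *warn)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < ndyn; i++) {
        for (size_t j = 0; j < ntab; j++) {
            if (strcmp(dyn[i].name, tab[j].name) != 0)
                continue;
            if (dyn[i].addr != tab[j].addr) {
                mismatches++;
                if (warn)
                    fprintf(warn, "%s: .dynsym says 0x%" PRIx64
                            ", .symtab says 0x%" PRIx64 "\n",
                            dyn[i].name, dyn[i].addr, tab[j].addr);
            }
            break; /* only the first same-named .symtab entry */
        }
    }
    return mismatches;
}
```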

the poor symbol version matching has been a known problem for a long time -- it's in the TODO. i think we can/should handle it, but i'd like to disconnect it from the debug-vs-runtime symbol issue, and not involve --how at all. the only time versioned symbol info comes up is with the -s/--symbol option, and we have flexibility in that syntax.

off the top of my head, the states are:

  • an ELF has an undefined reference to ...
    • ... an unversioned symbol (e.g. sem_init)
    • ... a versioned symbol (e.g. sem_init@GLIBC_2.0)
  • an ELF defines ...
    • ... an unversioned symbol (e.g. sem_init)
    • ... a non-default versioned symbol (e.g. sem_init@GLIBC_2.0)
    • ... a default versioned symbol (e.g. sem_init@@GLIBC_2.34)

so if someone is using --symbol sem_init, what is the expected matching behavior ? imo it would be to match all of those. if they specify --symbol sem_init@GLIBC_2.0, then we would only match that symbol exactly.
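The matching rule proposed here fits in a few lines: a bare name matches every versioned spelling, while a name containing @ must match exactly. The sketch below assumes candidate names are spelled the .symtab way and covers plain string comparison only. Any other matching modes scanelf's -s supports are out of scope. `symbol_matches` is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

/* Match a --symbol query against a name spelled the .symtab way
 * ("name", "name@VER" or "name@@VER").
 *  - query without '@': compare base names only, so "sem_init"
 *    matches sem_init, sem_init@GLIBC_2.0 and sem_init@@GLIBC_2.34;
 *  - query with '@': require an exact match. */
bool symbol_matches(const char *query, const char *sym)
{
    if (strchr(query, '@') != NULL)
        return strcmp(query, sym) == 0;

    size_t base = strcspn(sym, "@");   /* length before any version */
    return strlen(query) == base && strncmp(query, sym, base) == 0;
}
```

Under this literal rule, --symbol sem_init@GLIBC_2.34 would not match the default sem_init@@GLIBC_2.34. Whether it should remains an open design question.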

that leaves the question of expected output behavior. if we scanned for sem_init, what should the default display be ? today we emit non-versioned symbols:

$ scanelf -Bs sem_init /lib/libc.so.6 
ET_DYN sem_init,sem_init,sem_init /lib/libc.so.6 

i'm kind of inclined to keep that.

if people want to include the symbol version info in the output when passing an unversioned symbol, maybe we extend the output format. let's add [modifiers] between the % and the s so you could write %[v]s and it would include the version info.

$ scanelf -Bs sem_init -F '%[v]s %F' /lib/libc.so.6
sem_init@GLIBC_2.1,sem_init@GLIBC_2.0,sem_init@@GLIBC_2.34 /lib/libc.so.6

if they scanned for sem_init@..., i guess the default would be to still emit sem_init, and if they wanted the version info, they'd have to use the -F option with %[v]s ? i guess we could put it behind the -v flag too since the default (non-F) output isn't really meant to be machine readable.
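Parsing the proposed %[v]s syntax only requires checking for a bracketed modifier list immediately after the %. The sketch below is hypothetical: both `parse_conversion` and the choice to reject unknown modifiers are assumptions, not existing scanelf behaviour.

```c
/* Parse the part of a format directive that follows '%', accepting an
 * optional "[modifiers]" list before the conversion character.
 * On success returns the conversion character (e.g. 's'), sets
 * *versioned if the 'v' modifier was given, and stores the position
 * just past the conversion in *end. Returns 0 on malformed input or
 * an unknown modifier. */
char parse_conversion(const char *p, int *versioned, const char **end)
{
    *versioned = 0;
    if (*p == '[') {
        for (++p; *p != ']'; ++p) {
            switch (*p) {
            case 'v':  *versioned = 1; break;
            case '\0': return 0;   /* unterminated modifier list */
            default:   return 0;   /* unknown modifier */
            }
        }
        ++p; /* skip ']' */
    }
    if (*p == '\0')
        return 0;                  /* missing conversion character */
    *end = p + 1;
    return *p;
}
```

Rejecting unknown modifiers, rather than silently ignoring them, leaves room to add more later without changing what existing format strings mean.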

is there another scenario we need to cover ?

ArsenArsen (Member)

> in theory, the debug symbols (.symtab/.strtab) should be a superset of the runtime symbols (.dynsym/.dynstr). it used to be that way before commit efb0dff, then i changed it to try and guess based on who had more symbols. i don't recall what ELFs i was looking at at the time, but i'd believe there's still weirdness out there. i wonder if we should always scan both if they're found, and complain if there's a mismatch (i.e. one claims symbol foo lives at address X, but the other claims symbol foo lives at address Y, but we always defer to the runtime set).

That is reasonable, yes, and should be doable (with some caveats: see below).

> the poor symbol version matching has been a known problem for a long time -- it's in the TODO. i think we can/should handle it, but i'd like to disconnect it from the debug-vs-runtime symbol issue, and not involve --how at all. the only time versioned symbol info comes up is with the -s/--symbol option, and we have flexibility in that syntax.

> off the top of my head, the states are:
>
>   • an ELF has an undefined reference to ...
>     • ... an unversioned symbol (e.g. sem_init)
>     • ... a versioned symbol (e.g. sem_init@GLIBC_2.0)
>   • an ELF defines ...
>     • ... an unversioned symbol (e.g. sem_init)
>     • ... a non-default versioned symbol (e.g. sem_init@GLIBC_2.0)
>     • ... a default versioned symbol (e.g. sem_init@@GLIBC_2.34)

> so if someone is using --symbol sem_init, what is the expected matching behavior ? imo it would be to match all of those. if they specify --symbol sem_init@GLIBC_2.0, then we would only match that symbol exactly.

> that leaves the question of expected output behavior. if we scanned for sem_init, what should the default display be ? today we emit non-versioned symbols:
>
> $ scanelf -Bs sem_init /lib/libc.so.6
> ET_DYN sem_init,sem_init,sem_init /lib/libc.so.6
>
> i'm kind of inclined to keep that.

Thing is, .symtab/.strtab contents (even the matching parts) don't exactly align with .dynsym/.dynstr precisely due to symbol versioning:

[i] ~/hello/build$ poke --quiet hello
#!!# var hello = Elf64_File @ 0#B
#!!# var strtab = hello.get_sections_by_name(".strtab")[0]
#!!# var dynstr = hello.get_sections_by_name(".dynstr")[0]
#!!# dump :from strtab.sh_offset + (strtab.sh_size - 128#B)
76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
0003031b: 6f63 0073 7472 6e6c 656e 3100 7365 746c  oc.strnlen1.setl
0003032b: 6f63 616c 655f 6e75 6c6c 0078 6e6d 616c  ocale_null.xnmal
0003033b: 6c6f 6300 5f5f 6374 7970 655f 625f 6c6f  loc.__ctype_b_lo
0003034b: 6340 474c 4942 435f 322e 3300 6963 6f6e  c@GLIBC_2.3.icon
0003035b: 765f 6f70 656e 4047 4c49 4243 5f32 2e32  v_open@GLIBC_2.2
0003036b: 2e35 0073 7464 6572 7240 474c 4942 435f  .5.stderr@GLIBC_
0003037b: 322e 322e 3500 5f5f 7370 7269 6e74 665f  2.2.5.__sprintf_
0003038b: 6368 6b40 474c 4942 435f 322e 332e 3400  chk@GLIBC_2.3.4.
#!!# dump :from dynstr.sh_offset + (dynstr.sh_size - 512#B)
76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
00000b1a: 0073 7472 6e6c 656e 006d 6273 696e 6974  .strnlen.mbsinit
00000b2a: 0073 7464 6f75 7400 6f70 7461 7267 0072  .stdout.optarg.r
00000b3a: 6561 6c6c 6f63 005f 5f73 7072 696e 7466  ealloc.__sprintf
00000b4a: 5f63 686b 005f 6578 6974 0062 696e 6474  _chk._exit.bindt
00000b5a: 6578 7464 6f6d 6169 6e00 5f5f 6670 7269  extdomain.__fpri
00000b6a: 6e74 665f 6368 6b00 6d61 6c6c 6f63 005f  ntf_chk.malloc._
00000b7a: 5f6c 6962 635f 7374 6172 745f 6d61 696e  _libc_start_main
00000b8a: 0069 7377 7072 696e 7400 7374 6465 7272  .iswprint.stderr
#!!# 

(didn't search for the same symbols, but the pattern is visible nevertheless)

... which means that the default behaviour before depended on whether .symtab/.strtab is present:

[i] ~/hello/build 1 $ scanelf -Bs fputs -F "%F  %s" hello hello_s
hello	fputs@GLIBC_2.2.5
hello_s	fputs
[i] ~/hello/build$ 
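One way to make that output independent of stripping is to split a .symtab-style name into its bare name and version before printing. The default display would then show fputs in both cases, and the version would remain available for something like the proposed %[v]s output. The helper below is hypothetical and assumes a name contains at most one @ or @@ separator.

```c
#include <stddef.h>
#include <string.h>

/* Split a .symtab-style name ("fputs@GLIBC_2.2.5",
 * "sem_init@@GLIBC_2.34" or plain "fputs") into its parts.
 * Sets *baselen to the length of the bare name, *ver to the version
 * string (NULL if unversioned) and *is_default to 1 for "@@". */
void split_versioned(const char *s, size_t *baselen, const char **ver,
                     int *is_default)
{
    const char *at = strchr(s, '@');

    *is_default = 0;
    if (at == NULL) {
        *baselen = strlen(s);
        *ver = NULL;
        return;
    }
    *baselen = (size_t)(at - s);
    if (at[1] == '@') {
        *is_default = 1;
        *ver = at + 2;
    } else {
        *ver = at + 1;
    }
}
```

Printing with printf("%.*s", (int)baselen, s) then emits only the bare name.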

I guess we could decode .dynsym/.dynstr names into the name@[@]VERSION form before matching, the way .symtab/.strtab spell them?

Hmm, that might be even more compatible.

> if people want to include the symbol version info in the output when passing an unversioned symbol, maybe we extend the output format. let's add [modifiers] between the % and the s so you could write %[v]s and it would include the version info.
>
> $ scanelf -Bs sem_init -F '%[v]s %F' /lib/libc.so.6
> sem_init@GLIBC_2.1,sem_init@GLIBC_2.0,sem_init@@GLIBC_2.34 /lib/libc.so.6
>
> if they scanned for sem_init@..., i guess the default would be to still emit sem_init, and if they wanted the version info, they'd have to use the -F option with %[v]s ? i guess we could put it behind the -v flag too since the default (non-F) output isn't really meant to be machine readable.

Yeah, this idea sounds okay.

> is there another scenario we need to cover ?

One more thing: maybe it'd be a good idea to detect whether @ is used in a -s match and, if not, ignore versioning when matching? This is at the crux of the original bug that brought us here, by the way, since some checks included @, which caused matching to fail. WDYT?
