-
-
Notifications
You must be signed in to change notification settings - Fork 470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LoongArch: support relaxation of pcalau12i/ld.d to pcalau12i/addi.d or pcaddi #1322
Conversation
elf/arch-loongarch.cc
Outdated
u32 insn2 = *(ul32 *)(isec.contents.data() + rels[i].r_offset + 4); | ||
|
||
// relax pcalau12i/ld.d to pcalau12i/addi.d | ||
if (get_rd(insn1) != get_rd(insn2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more reliable, check rj is also necessary.
elf/arch-loongarch.cc
Outdated
|
||
i64 dist = compute_distance(ctx, sym, isec, r); | ||
// the second phase: relax pcalau12i/addi.d to pcaddi | ||
if (dist % 4 == 0 && -(1 << 21) < dist && dist < (1 << 21)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an eage case "-(1<<21) == dist".
rels[i+2].r_offset == rel.r_offset + 4) { | ||
u32 insn1 = *(ul32 *)(contents.data() + rel.r_offset); | ||
u32 insn2 = *(ul32 *)(contents.data() + rels[i+2].r_offset); | ||
u32 rd = get_rd(insn1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a range check for pcalau12i.
79b1e2d
to
3688c43
Compare
// pcalau12i/ld.d has been relaxed to pcalau12i/addi.d | ||
// reloc the pcalau12i as R_LARCH_PLACA_HI20 | ||
write_j20(loc, hi20(S + A, P)); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should directly write to loc + 4
and then i += 3
here, instead of doing the same check in case R_LARCH_GOT_PC_LO12
.
elf/arch-loongarch.cc
Outdated
sym.is_local(ctx) && | ||
!sym.is_absolute() && | ||
!sym.is_ifunc() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you want to do the same as we do for R_RISCV_GOT_HI20
. Did you check that in arch-riscv.cc? We have sym.is_pcrel_linktime_const(ctx)
to check if the symbol's PC-relative address is link-time constant.
elf/arch-loongarch.cc
Outdated
!sym.is_absolute() && | ||
!sym.is_ifunc() && | ||
ctx.arg.relax && | ||
rels[i+1].r_type == R_LARCH_RELAX && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want to pay more attention to the details -- add spaces around +
. This is how we format it in other places.
rels[i+3].r_type == R_LARCH_RELAX && | ||
rels[i+2].r_type == R_LARCH_GOT_PC_LO12 && | ||
rels[i+2].r_offset == rel.r_offset + 4) { | ||
u32 insn1 = *(ul32 *)(contents.data() + rel.r_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also indentation.
test/elf/loongarch64_relax-got.sh
Outdated
.globl get_sym1, get_sym2, get_sym3, get_sym4, get_sym5 | ||
get_sym1: | ||
la.global $a0, sym1 | ||
ld.w $a0, $a0, 0 | ||
ret | ||
get_sym2: | ||
la.global $a0, sym2 | ||
ld.w $a0, $a0, 0 | ||
ret | ||
get_sym3: | ||
la.global $a0, sym3 | ||
ld.w $a0, $a0, 0 | ||
ret | ||
get_sym4: | ||
la.global $a0, sym4 | ||
ld.w $a0, $a0, 0 | ||
ret | ||
get_sym5: | ||
la.global $a0, sym5 | ||
ld.w $a0, $a0, 0 | ||
ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need five symbols for this test, as they are testing exactly the same relaxation.
elf/arch-loongarch.cc
Outdated
if (i >= 2 && get_r_delta(i-1) - get_r_delta(i-2) == 4) { | ||
// pcalau12i/ld.d has been relaxed to pcalau12i/addi.d, and | ||
// then the pair has been relaxed to pcaddi. | ||
// loc stores 'ld.d', rewrite ld.d with pcaddi | ||
*(ul32 *)loc = 0x1800'0000 | get_rd(*(ul32 *)loc); | ||
write_j20(loc, (S + A - P) >> 2); | ||
} else { | ||
if (i >= 2 && | ||
i + 1 < rels.size() && | ||
sym.is_local(ctx) && | ||
!sym.is_absolute() && | ||
!sym.is_ifunc() && | ||
ctx.arg.relax && | ||
rels[i-1].r_type == R_LARCH_RELAX && | ||
rels[i+1].r_type == R_LARCH_RELAX && | ||
rels[i-2].r_type == R_LARCH_GOT_PC_HI20 && | ||
rels[i-2].r_offset == rel.r_offset - 4) { | ||
u32 insn1 = *(ul32 *)(contents.data() + rels[i-2].r_offset); | ||
u32 insn2 = *(ul32 *)(contents.data() + rel.r_offset); | ||
u32 rd = get_rd(insn1); | ||
|
||
if (rd == get_rd(insn2)) { | ||
// pcalau12i/ld.d has been relaxed to pcalau12i/addi.d | ||
// rewrite the ld.d with addi.d | ||
*(ul32 *)loc = 0x02c00000 | rd | (rd << 5); | ||
write_k12(loc, S + A); | ||
break; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can revert this change.
Do not care about the last two commits now, there are still some issues, I'll commit the new code once I've tested it out. |
6d525b7
to
3094d7d
Compare
I'll fix a few minor issues after merging this change. Next time, please pay more attention to indentation. Thank you! |
Yeah, Sorry for that. |
Thank you for doing this anyway. Did you have anything other than this, or is this the last one? |
TLS LD/GD relaxation is the next plan. |
As far as I know, LoongArch (wrongly) handles LD as a synonym for GD, so there should be only GD. I believe you guys' long-term plan is to support TLSDESC and make it the default. Is this correct? By the way, are you guys actually using mold as a linker, or just doing this for completeness? Just curious. |
Yeah, you are right, rui. At this stage, the implementation of ld is consistent with gd; to be precise, ld is synonymous with gd.
Yes.
Since we often need to build other tools via clang, mold greatly improves the speed of building clang for us. Also software like postgres is being tried out. |
I actually don't think you can relax TLS GD/LD on LoongArch, because calls to |
You're right, actually tls_desc is currently supported by loongarch, though it hasn't been introduced on mold yet. Considering the negligible performance gain from saving an instruction, it seems there is no need to apply relaxation on tls. |
Then maybe you want to add TLSDESC support? |
Yeah, I want have a try. |
If we get symval by accessing to local got, then we can omit one memory loading.
The first phase: pcalau12i/ld.d to pcala12i/addi.d
The second phase: pcalau12i/addi.d to pcaddi