Commit ad961eab authored by H.J. Lu

gold: Support x86-64 TLS code sequences without PLT

There are extensions to the x86-64 psABI:

https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI

to call __tls_get_addr via the GOT:

call *__tls_get_addr@GOTPCREL(%rip)

Since the direct call (e8 rel32) is 5 bytes long and the indirect call
(ff 15 disp32) is 6 bytes long, the extra byte must be handled properly.

For the general dynamic model, one 0x66 prefix before the call
instruction is removed to make room for the indirect call.  For the
local dynamic model, we simply use the 6-byte indirect call.
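
The one-byte difference can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the standard x86-64 encodings, counted with their opcode bytes (e8 rel32 for the direct call, ff 15 disp32 for the indirect call, 48 8d 3d disp32 for the leaq), and every name in it is hypothetical rather than taken from gold.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative byte counts for standard x86-64 encodings.
// All names below are hypothetical and do not appear in gold.
constexpr std::size_t kLeaRip = 7;        // 48 8d 3d disp32: leaq sym(%rip),%rdi
constexpr std::size_t kData16 = 1;        // 0x66 prefix
constexpr std::size_t kRex64 = 1;         // 0x48 prefix
constexpr std::size_t kDirectCall = 5;    // e8 rel32
constexpr std::size_t kIndirectCall = 6;  // ff 15 disp32

// General dynamic (LP64):
//   .byte 0x66; leaq; <padding> 0x66 prefixes; rex64; call
constexpr std::size_t gd_length(std::size_t padding, std::size_t call) {
  return kData16 + kLeaRip + padding * kData16 + kRex64 + call;
}

// Local dynamic: leaq; call, with no padding prefixes to trade away.
constexpr std::size_t ld_length(std::size_t call) {
  return kLeaRip + call;
}

// Dropping one 0x66 keeps the general dynamic sequence at 16 bytes.
static_assert(gd_length(2, kDirectCall) == 16, "PLT form");
static_assert(gd_length(1, kIndirectCall) == 16, "GOT form");

// The local dynamic sequence grows from 12 to 13 bytes.
static_assert(ld_length(kDirectCall) == 12, "PLT form");
static_assert(ld_length(kIndirectCall) == 13, "GOT form");
```

The general dynamic sequence therefore keeps its size, while the local dynamic sequence is one byte longer, which is why the linker has to emit a longer replacement for it.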

The TLS linker optimization is updated to recognize the new instruction
patterns.  For the local dynamic to local exec transition, we generate
four 0x66 prefixes, instead of three, before the mov instruction for
64-bit, and a 5-byte nop, instead of a 4-byte one, before the mov
instruction for 32-bit (x32).
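
The recognition and rewrite described above can be sketched as a pair of standalone helpers. The byte patterns are the ones that appear in this patch's changes to x86_64.cc. The function names is_gd_call and ld_to_le, and their signatures, are hypothetical and are not part of gold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helpers mirroring the checks in this patch; not gold's API.

// `p` points just past the 4-byte displacement of the leaq, as `view + 4`
// does in tls_gd_to_ie and tls_gd_to_le.
inline bool is_gd_call(const unsigned char* p) {
  // 66 66 48 e8: two data16 prefixes, rex64, direct call
  // 66 48 ff:    one data16 prefix, rex64, first byte of the indirect call
  return std::memcmp(p, "\x66\x66\x48\xe8", 4) == 0
         || std::memcmp(p, "\x66\x48\xff", 3) == 0;
}

// Writes the local dynamic -> local exec replacement into `out`, which must
// hold at least 13 bytes. Returns the number of bytes written, or 0 if
// `call_opcode` is neither 0xe8 (direct) nor 0xff (indirect).
inline std::size_t ld_to_le(bool is_64bit, unsigned char call_opcode,
                            unsigned char* out) {
  static const unsigned char mov64[] =  // movq %fs:0,%rax
      {0x64, 0x48, 0x8b, 0x04, 0x25, 0, 0, 0, 0};
  static const unsigned char mov32[] =  // movl %fs:0,%eax
      {0x64, 0x8b, 0x04, 0x25, 0, 0, 0, 0};
  static const unsigned char nop4[] =   // nopl 0x0(%rax)
      {0x0f, 0x1f, 0x40, 0x00};

  std::size_t total;
  if (call_opcode == 0xe8)
    total = 12;  // leaq (7 bytes) + direct call (5 bytes)
  else if (call_opcode == 0xff)
    total = 13;  // leaq (7 bytes) + indirect call (6 bytes)
  else
    return 0;

  std::size_t n = 0;
  if (is_64bit) {
    // Pad with three or four 0x66 prefixes in front of the mov.
    n = total - sizeof(mov64);
    std::memset(out, 0x66, n);
    std::memcpy(out + n, mov64, sizeof(mov64));
  } else {
    // A 4-byte nop, widened to 5 bytes by a 0x66 prefix when needed.
    if (total == 13)
      out[n++] = 0x66;
    std::memcpy(out + n, nop4, sizeof(nop4));
    n += sizeof(nop4);
    std::memcpy(out + n, mov32, sizeof(mov32));
  }
  return total;
}
```

In the patch itself the replacement is written in place starting at view - 3, so it overwrites the leaq opcode bytes as well as the call.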

	PR gold/20216
	* configure.ac (DEFAULT_TARGET_X86_64_OR_X32): New
	AM_CONDITIONAL.
	* configure: Regenerated.
	* x86_64.cc (Target_x86_64<size>::Relocate::relocate): Allow
	R_X86_64_GOTPCRELX relocation against __tls_get_addr.
	(Target_x86_64<size>::Relocate::tls_gd_to_ie): Support indirect
	call to __tls_get_addr.
	(Target_x86_64<size>::Relocate::tls_gd_to_le): Likewise.
	(Target_x86_64<size>::Relocate::tls_ld_to_le): Likewise.
	* testsuite/Makefile.am (check_PROGRAMS): Add pr20216a_test,
	pr20216b_test, pr20216c_test, pr20216d_test, pr20216e_test.
	(pr20216a_test_SOURCES): New.
	(pr20216a_test_DEPENDENCIES): Likewise.
	(pr20216a_test_CFLAGS): Likewise.
	(pr20216a_test_LDFLAGS): Likewise.
	(pr20216a_test_LDADD): Likewise.
	(pr20216b_test_SOURCES): Likewise.
	(pr20216b_test_DEPENDENCIES): Likewise.
	(pr20216b_test_CFLAGS): Likewise.
	(pr20216b_test_LDFLAGS): Likewise.
	(pr20216b_test_LDADD): Likewise.
	(pr20216c_test_SOURCES): Likewise.
	(pr20216c_test_DEPENDENCIES): Likewise.
	(pr20216c_test_CFLAGS): Likewise.
	(pr20216c_test_LDFLAGS): Likewise.
	(pr20216c_test_LDADD): Likewise.
	(pr20216d_test_SOURCES): Likewise.
	(pr20216d_test_DEPENDENCIES): Likewise.
	(pr20216d_test_CFLAGS): Likewise.
	(pr20216d_test_LDFLAGS): Likewise.
	(pr20216d_test_LDADD): Likewise.
	(pr20216e_test_SOURCES): Likewise.
	(pr20216e_test_DEPENDENCIES): Likewise.
	(pr20216e_test_CFLAGS): Likewise.
	(pr20216e_test_LDFLAGS): Likewise.
	(pr20216e_test_LDADD): Likewise.
	(pr20216a.so): Likewise.
	(pr20216b.so): Likewise.
	(pr20216_gd.o): Likewise.
	(pr20216_ld.o): Likewise.
	(MOSTLYCLEANFILES): Add pr20216a.so pr20216b.so.
	* testsuite/Makefile.in: Regenerated.
	* testsuite/pr20216_def.c: New file.
	* testsuite/pr20216_gd.S: Likewise.
	* testsuite/pr20216_ld.S: Likewise.
	* testsuite/pr20216_main.c: Likewise.
parent 9bf74fb2
2016-06-29 H.J. Lu <hongjiu.lu@intel.com>
PR gold/20216
* configure.ac (DEFAULT_TARGET_X86_64_OR_X32): New
AM_CONDITIONAL.
* configure: Regenerated.
* x86_64.cc (Target_x86_64<size>::Relocate::relocate): Allow
R_X86_64_GOTPCRELX relocation against __tls_get_addr.
(Target_x86_64<size>::Relocate::tls_gd_to_ie): Support indirect
call to __tls_get_addr.
(Target_x86_64<size>::Relocate::tls_gd_to_le): Likewise.
(Target_x86_64<size>::Relocate::tls_ld_to_le): Likewise.
* testsuite/Makefile.am (check_PROGRAMS): Add pr20216a_test,
pr20216b_test, pr20216c_test, pr20216d_test, pr20216e_test.
(pr20216a_test_SOURCES): New.
(pr20216a_test_DEPENDENCIES): Likewise.
(pr20216a_test_CFLAGS): Likewise.
(pr20216a_test_LDFLAGS): Likewise.
(pr20216a_test_LDADD): Likewise.
(pr20216b_test_SOURCES): Likewise.
(pr20216b_test_DEPENDENCIES): Likewise.
(pr20216b_test_CFLAGS): Likewise.
(pr20216b_test_LDFLAGS): Likewise.
(pr20216b_test_LDADD): Likewise.
(pr20216c_test_SOURCES): Likewise.
(pr20216c_test_DEPENDENCIES): Likewise.
(pr20216c_test_CFLAGS): Likewise.
(pr20216c_test_LDFLAGS): Likewise.
(pr20216c_test_LDADD): Likewise.
(pr20216d_test_SOURCES): Likewise.
(pr20216d_test_DEPENDENCIES): Likewise.
(pr20216d_test_CFLAGS): Likewise.
(pr20216d_test_LDFLAGS): Likewise.
(pr20216d_test_LDADD): Likewise.
(pr20216e_test_SOURCES): Likewise.
(pr20216e_test_DEPENDENCIES): Likewise.
(pr20216e_test_CFLAGS): Likewise.
(pr20216e_test_LDFLAGS): Likewise.
(pr20216e_test_LDADD): Likewise.
(pr20216a.so): Likewise.
(pr20216b.so): Likewise.
(pr20216_gd.o): Likewise.
(pr20216_ld.o): Likewise.
(MOSTLYCLEANFILES): Add pr20216a.so pr20216b.so.
* testsuite/Makefile.in: Regenerated.
* testsuite/pr20216_def.c: New file.
* testsuite/pr20216_gd.S: Likewise.
* testsuite/pr20216_ld.S: Likewise.
* testsuite/pr20216_main.c: Likewise.
2016-06-29 Alan Modra <amodra@gmail.com>
* script_test_12.t: Delete .plt, specify 64k page size.
......
......@@ -690,6 +690,8 @@ DEFAULT_TARGET_MIPS_FALSE
DEFAULT_TARGET_MIPS_TRUE
DEFAULT_TARGET_TILEGX_FALSE
DEFAULT_TARGET_TILEGX_TRUE
DEFAULT_TARGET_X86_64_OR_X32_FALSE
DEFAULT_TARGET_X86_64_OR_X32_TRUE
DEFAULT_TARGET_X32_FALSE
DEFAULT_TARGET_X32_TRUE
DEFAULT_TARGET_X86_64_FALSE
......@@ -3539,6 +3541,14 @@ else
DEFAULT_TARGET_X32_FALSE=
fi
if test "$target_x86_64" = "yes" -o "$target_x32" = "yes"; then
DEFAULT_TARGET_X86_64_OR_X32_TRUE=
DEFAULT_TARGET_X86_64_OR_X32_FALSE='#'
else
DEFAULT_TARGET_X86_64_OR_X32_TRUE='#'
DEFAULT_TARGET_X86_64_OR_X32_FALSE=
fi
if test "$targ_obj" = "tilegx"; then
DEFAULT_TARGET_TILEGX_TRUE=
DEFAULT_TARGET_TILEGX_FALSE='#'
......@@ -7837,6 +7847,10 @@ if test -z "${DEFAULT_TARGET_X32_TRUE}" && test -z "${DEFAULT_TARGET_X32_FALSE}"
as_fn_error "conditional \"DEFAULT_TARGET_X32\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5
fi
if test -z "${DEFAULT_TARGET_X86_64_OR_X32_TRUE}" && test -z "${DEFAULT_TARGET_X86_64_OR_X32_FALSE}"; then
as_fn_error "conditional \"DEFAULT_TARGET_X86_64_OR_X32\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5
fi
if test -z "${DEFAULT_TARGET_TILEGX_TRUE}" && test -z "${DEFAULT_TARGET_TILEGX_FALSE}"; then
as_fn_error "conditional \"DEFAULT_TARGET_TILEGX\" was never defined.
Usually this means the macro was only invoked conditionally." "$LINENO" 5
......
......@@ -237,6 +237,8 @@ for targ in $target $canon_targets; do
fi
AM_CONDITIONAL(DEFAULT_TARGET_X86_64, test "$target_x86_64" = "yes")
AM_CONDITIONAL(DEFAULT_TARGET_X32, test "$target_x32" = "yes")
AM_CONDITIONAL(DEFAULT_TARGET_X86_64_OR_X32,
test "$target_x86_64" = "yes" -o "$target_x32" = "yes")
AM_CONDITIONAL(DEFAULT_TARGET_TILEGX, test "$targ_obj" = "tilegx")
AM_CONDITIONAL(DEFAULT_TARGET_MIPS, test "$targ_obj" = "mips")
DEFAULT_TARGET=${targ_obj}
......
......@@ -1147,6 +1147,59 @@ x32_overflow_pc32.err: x32_overflow_pc32.o gcctestdir/ld
endif DEFAULT_TARGET_X86_64
if DEFAULT_TARGET_X86_64_OR_X32
check_PROGRAMS += pr20216a_test
pr20216a_test_SOURCES = pr20216_main.c pr20216_def.c
pr20216a_test_DEPENDENCIES = pr20216_gd.o pr20216_ld.o gcctestdir/ld gcctestdir/as
pr20216a_test_CFLAGS = -Bgcctestdir/ -fPIE
pr20216a_test_LDFLAGS = -Bgcctestdir/ -Wl,-R,.
pr20216a_test_LDADD = pr20216_gd.o pr20216_ld.o
check_PROGRAMS += pr20216b_test
pr20216b_test_SOURCES = pr20216_main.c pr20216_def.c
pr20216b_test_DEPENDENCIES = pr20216_gd.o pr20216_ld.o gcctestdir/ld gcctestdir/as
pr20216b_test_CFLAGS = -Bgcctestdir/ -fPIE
pr20216b_test_LDFLAGS = -pie -Bgcctestdir/ -Wl,-R,.
pr20216b_test_LDADD = pr20216_gd.o pr20216_ld.o
check_PROGRAMS += pr20216c_test
pr20216c_test_SOURCES = pr20216_main.c pr20216_def.c
pr20216c_test_DEPENDENCIES = pr20216_gd.o pr20216_ld.o gcctestdir/ld gcctestdir/as
pr20216c_test_CFLAGS = -Bgcctestdir/ -fPIE
pr20216c_test_LDFLAGS = -static -Bgcctestdir/ -Wl,-R,.
pr20216c_test_LDADD = pr20216_gd.o pr20216_ld.o
check_PROGRAMS += pr20216d_test
pr20216d_test_SOURCES = pr20216_main.c pr20216_def.c
pr20216d_test_DEPENDENCIES = pr20216a.so gcctestdir/ld gcctestdir/as
pr20216d_test_CFLAGS = -Bgcctestdir/ -fPIE
pr20216d_test_LDFLAGS = -Bgcctestdir/ -Wl,-R,.
pr20216d_test_LDADD = pr20216a.so
check_PROGRAMS += pr20216e_test
pr20216e_test_SOURCES = pr20216_main.c
pr20216e_test_DEPENDENCIES = pr20216_gd.o pr20216_ld.o pr20216b.so gcctestdir/ld gcctestdir/as
pr20216e_test_CFLAGS = -Bgcctestdir/ -fPIE
pr20216e_test_LDFLAGS = -Bgcctestdir/ -Wl,-R,.
pr20216e_test_LDADD = pr20216_gd.o pr20216_ld.o pr20216b.so
MOSTLYCLEANFILES += pr20216a.so pr20216b.so
pr20216a.so: pr20216_gd.o pr20216_ld.o gcctestdir/ld
$(LINK) -Bgcctestdir/ -shared pr20216_gd.o pr20216_ld.o
pr20216b.so: pr20216_def.o gcctestdir/ld
$(LINK) -Bgcctestdir/ -shared pr20216_def.o
pr20216_gd.o: pr20216_gd.S
$(COMPILE) -c -o $@ $<
pr20216_ld.o: pr20216_ld.S
$(COMPILE) -c -o $@ $<
endif DEFAULT_TARGET_X86_64_OR_X32
if DEFAULT_TARGET_I386
check_SCRIPTS += i386_mov_to_lea.sh
......
__thread int gd = 1;
.text
.p2align 4,,15
.globl get_gd
.type get_gd, @function
get_gd:
subq $8, %rsp
#ifdef __LP64__
.byte 0x66
#endif
leaq gd@tlsgd(%rip), %rdi
.byte 0x66
rex64
call *__tls_get_addr@GOTPCREL(%rip)
addq $8, %rsp
ret
.size get_gd, .-get_gd
.text
.p2align 4,,15
.globl set_gd
.type set_gd, @function
set_gd:
pushq %rbx
movl %edi, %ebx
#ifdef __LP64__
.byte 0x66
#endif
leaq gd@tlsgd(%rip), %rdi
.value 0x6666
rex64
call __tls_get_addr@PLT
movl %ebx, (%rax)
popq %rbx
ret
.size set_gd, .-set_gd
.text
.p2align 4,,15
.globl test_gd
.type test_gd, @function
test_gd:
pushq %rbx
movl %edi, %ebx
#ifdef __LP64__
.byte 0x66
#endif
leaq gd@tlsgd(%rip), %rdi
.byte 0x66
rex64
call *__tls_get_addr@GOTPCREL(%rip)
cmpl %ebx, (%rax)
popq %rbx
sete %al
movzbl %al, %eax
ret
.size test_gd, .-test_gd
.section .note.GNU-stack,"",@progbits
.text
.p2align 4,,15
.globl get_ld
.type get_ld, @function
get_ld:
subq $8, %rsp
leaq ld@tlsld(%rip), %rdi
call __tls_get_addr@PLT
addq $8, %rsp
addq $ld@dtpoff, %rax
ret
.size get_ld, .-get_ld
.text
.p2align 4,,15
.globl set_ld
.type set_ld, @function
set_ld:
pushq %rbx
movl %edi, %ebx
leaq ld@tlsld(%rip), %rdi
call *__tls_get_addr@GOTPCREL(%rip)
movl %ebx, ld@dtpoff(%rax)
popq %rbx
ret
.size set_ld, .-set_ld
.text
.p2align 4,,15
.globl test_ld
.type test_ld, @function
test_ld:
pushq %rbx
movl %edi, %ebx
leaq ld@tlsld(%rip), %rdi
call *__tls_get_addr@GOTPCREL(%rip)
cmpl %ebx, ld@dtpoff(%rax)
popq %rbx
sete %al
movzbl %al, %eax
ret
.size test_ld, .-test_ld
.section .tbss,"awT",@nobits
.align 4
.type ld, @object
.size ld, 4
ld:
.zero 4
.section .note.GNU-stack,"",@progbits
#include <stdlib.h>
extern int * get_gd (void);
extern void set_gd (int);
extern int test_gd (int);
extern int * get_ld (void);
extern void set_ld (int);
extern int test_ld (int);
int
main ()
{
int *p;
p = get_gd ();
set_gd (3);
if (*p != 3 || !test_gd (3))
abort ();
p = get_ld ();
set_ld (4);
if (*p != 4 || !test_ld (4))
abort ();
return 0;
}
......@@ -3505,6 +3505,7 @@ Target_x86_64<size>::Relocate::relocate(
   if (this->skip_call_tls_get_addr_)
     {
       if ((r_type != elfcpp::R_X86_64_PLT32
+           && r_type != elfcpp::R_X86_64_GOTPCRELX
            && r_type != elfcpp::R_X86_64_PLT32_BND
            && r_type != elfcpp::R_X86_64_PC32_BND
            && r_type != elfcpp::R_X86_64_PC32)
......@@ -4169,16 +4170,23 @@ Target_x86_64<size>::Relocate::tls_gd_to_ie(
 {
   // For SIZE == 64:
   // .byte 0x66; leaq foo@tlsgd(%rip),%rdi;
-  // .word 0x6666; rex64; call __tls_get_addr
+  // .word 0x6666; rex64; call __tls_get_addr@PLT
+  // ==> movq %fs:0,%rax; addq x@gottpoff(%rip),%rax
+  // .byte 0x66; leaq foo@tlsgd(%rip),%rdi;
+  // .word 0x66; rex64; call *__tls_get_addr@GOTPCREL(%rip)
   // ==> movq %fs:0,%rax; addq x@gottpoff(%rip),%rax
   // For SIZE == 32:
   // leaq foo@tlsgd(%rip),%rdi;
-  // .word 0x6666; rex64; call __tls_get_addr
+  // .word 0x6666; rex64; call __tls_get_addr@PLT
+  // ==> movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
+  // leaq foo@tlsgd(%rip),%rdi;
+  // .word 0x66; rex64; call *__tls_get_addr@GOTPCREL(%rip)
   // ==> movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
   tls::check_range(relinfo, relnum, rela.get_r_offset(), view_size, 12);
   tls::check_tls(relinfo, relnum, rela.get_r_offset(),
-                 (memcmp(view + 4, "\x66\x66\x48\xe8", 4) == 0));
+                 (memcmp(view + 4, "\x66\x66\x48\xe8", 4) == 0
+                  || memcmp(view + 4, "\x66\x48\xff", 3) == 0));
   if (size == 64)
     {
......@@ -4225,16 +4233,23 @@ Target_x86_64<size>::Relocate::tls_gd_to_le(
 {
   // For SIZE == 64:
   // .byte 0x66; leaq foo@tlsgd(%rip),%rdi;
-  // .word 0x6666; rex64; call __tls_get_addr
+  // .word 0x6666; rex64; call __tls_get_addr@PLT
+  // ==> movq %fs:0,%rax; leaq x@tpoff(%rax),%rax
+  // .byte 0x66; leaq foo@tlsgd(%rip),%rdi;
+  // .word 0x66; rex64; call *__tls_get_addr@GOTPCREL(%rip)
   // ==> movq %fs:0,%rax; leaq x@tpoff(%rax),%rax
   // For SIZE == 32:
   // leaq foo@tlsgd(%rip),%rdi;
-  // .word 0x6666; rex64; call __tls_get_addr
+  // .word 0x6666; rex64; call __tls_get_addr@PLT
+  // ==> movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
+  // leaq foo@tlsgd(%rip),%rdi;
+  // .word 0x66; rex64; call *__tls_get_addr@GOTPCREL(%rip)
   // ==> movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
   tls::check_range(relinfo, relnum, rela.get_r_offset(), view_size, 12);
   tls::check_tls(relinfo, relnum, rela.get_r_offset(),
-                 (memcmp(view + 4, "\x66\x66\x48\xe8", 4) == 0));
+                 (memcmp(view + 4, "\x66\x66\x48\xe8", 4) == 0
+                  || memcmp(view + 4, "\x66\x48\xff", 3) == 0));
   if (size == 64)
     {
......@@ -4362,6 +4377,13 @@ Target_x86_64<size>::Relocate::tls_ld_to_le(
   // For SIZE == 32:
   // ... leq foo@dtpoff(%rax),%reg
   // ==> nopl 0x0(%rax); movl %fs:0,%eax ... leaq x@tpoff(%rax),%rdx
+  // leaq foo@tlsld(%rip),%rdi; call *__tls_get_addr@GOTPCREL(%rip)
+  // For SIZE == 64:
+  // ... leq foo@dtpoff(%rax),%reg
+  // ==> .word 0x6666; .byte 0x6666; movq %fs:0,%rax ... leaq x@tpoff(%rax),%rdx
+  // For SIZE == 32:
+  // ... leq foo@dtpoff(%rax),%reg
+  // ==> nopw 0x0(%rax); movl %fs:0,%eax ... leaq x@tpoff(%rax),%rdx
   tls::check_range(relinfo, relnum, rela.get_r_offset(), view_size, -3);
   tls::check_range(relinfo, relnum, rela.get_r_offset(), view_size, 9);
......@@ -4369,12 +4391,25 @@ Target_x86_64<size>::Relocate::tls_ld_to_le(
   tls::check_tls(relinfo, relnum, rela.get_r_offset(),
                  view[-3] == 0x48 && view[-2] == 0x8d && view[-1] == 0x3d);
-  tls::check_tls(relinfo, relnum, rela.get_r_offset(), view[4] == 0xe8);
+  tls::check_tls(relinfo, relnum, rela.get_r_offset(),
+                 view[4] == 0xe8 || view[4] == 0xff);
-  if (size == 64)
-    memcpy(view - 3, "\x66\x66\x66\x64\x48\x8b\x04\x25\0\0\0\0", 12);
+  if (view[4] == 0xe8)
+    {
+      if (size == 64)
+        memcpy(view - 3, "\x66\x66\x66\x64\x48\x8b\x04\x25\0\0\0\0", 12);
+      else
+        memcpy(view - 3, "\x0f\x1f\x40\x00\x64\x8b\x04\x25\0\0\0\0", 12);
+    }
   else
-    memcpy(view - 3, "\x0f\x1f\x40\x00\x64\x8b\x04\x25\0\0\0\0", 12);
+    {
+      if (size == 64)
+        memcpy(view - 3, "\x66\x66\x66\x66\x64\x48\x8b\x04\x25\0\0\0\0",
+               13);
+      else
+        memcpy(view - 3, "\x66\x0f\x1f\x40\x00\x64\x8b\x04\x25\0\0\0\0",
+               13);
+    }
   // The next reloc should be a PLT32 reloc against __tls_get_addr.
   // We can skip it.
......