objtool crashes on clang output (drivers/hwmon/pmbus/adm1275.o)

4 views
Skip to first unread message

Arnd Bergmann

unread,
Jul 11, 2019, 8:40:24 AM7/11/19
to Josh Poimboeuf, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
During randconfig testing with clang-9, I came across an object file
that makes objtool segfault, see attachment. Let me know if you need
more information to
debug this.

I also get a ton of objtool warnings building random configurations, but Nick
mentioned that there is still a bug related to asm-goto in the build I'm using
that may be the root cause. Once I have a fixed clang-9 build, I can have a look
at those as well.

Arnd
adm1275.o
config.gz

Josh Poimboeuf

unread,
Jul 11, 2019, 1:26:26 PM7/11/19
to Arnd Bergmann, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
Seg fault fix:

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 27818a93f0b1..ad18f8ef905a 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -902,7 +902,7 @@ static int add_switch_table(struct objtool_file *file, struct instruction *insn,
struct rela *table, struct rela *next_table)
{
struct rela *rela = table;
- struct instruction *alt_insn;
+ struct instruction *alt_insn, *prev_insn;
struct alternative *alt;
struct symbol *pfunc = insn->func->pfunc;
unsigned int prev_offset = 0;
@@ -924,6 +924,20 @@ static int add_switch_table(struct objtool_file *file, struct instruction *insn,
if (!alt_insn)
break;

+ if (!alt_insn->func) {
+ /*
+ * Clang 9 has a quirk where a switch table may have
+ * unused entries in the middle of the table which
+ * point to just past the end of the function. They're
+ * still part of the table but can be ignored.
+ */
+ prev_insn = list_prev_entry(alt_insn, list);
+ if (prev_insn->func && prev_insn->func->pfunc == pfunc)
+ goto skip;
+
+ break;
+ }
+
/* Make sure the jmp dest is in the function or subfunction: */
if (alt_insn->func->pfunc != pfunc)
break;
@@ -936,6 +950,7 @@ static int add_switch_table(struct objtool_file *file, struct instruction *insn,

alt->insn = alt_insn;
list_add_tail(&alt->list, &insn->alts);
+skip:
prev_offset = rela->offset;
}

Arnd Bergmann

unread,
Jul 11, 2019, 5:00:27 PM7/11/19
to Josh Poimboeuf, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
On Thu, Jul 11, 2019 at 7:26 PM Josh Poimboeuf <jpoi...@redhat.com> wrote:
>
> On Thu, Jul 11, 2019 at 02:40:06PM +0200, Arnd Bergmann wrote:
> > During randconfig testing with clang-9, I came across an object file
> > that makes objtool segfault, see attachment. Let me know if you need
> > more information to
> > debug this.
> >
> > I also get a ton of objtool warnings building random configurations, but Nick
> > mentioned that there is still a bug related to asm-goto in the build I'm using
> > that may be the root cause. Once I have a fixed clang-9 build, I can have a look
> > at those as well.
>
> Seg fault fix:

Thanks for the fix! testing it over night now, will let you know tomorrow
if problems remain.

I wonder if this is also related to several warnings I get about switch
tables like:

drivers/usb/misc/sisusbvga/sisusb.o: warning: objtool:
sisusb_write_mem_bulk()+0x588: can't find switch jump table

drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_mem_input_v.o:
warning: objtool: dce_mem_input_v_program_pte_vm()+0x46e: can't find
switch jump table
drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_opp_csc_v.o:
warning: objtool: dce110_opp_v_set_csc_default()+0x714: can't find
switch jump table
drivers/gpu/drm/nouveau/nvkm/subdev/clk/nv50.o: warning: objtool:
nv50_clk_read()+0x15c: can't find switch jump table
drivers/media/dvb-frontends/cxd2880/cxd2880_tnrdmd_dvbt2.o: warning:
objtool: x_tune_dvbt2_demod_setting()+0x992: can't find switch jump
table
drivers/media/tuners/mt2063.o: warning: objtool:
MT2063_SetReceiverMode()+0x24d: can't find switch jump table
drivers/mmc/host/tifm_sd.o: warning: objtool: tifm_sd_exec()+0x7e:
can't find switch jump table
drivers/mtd/nand/raw/fsl_ifc_nand.o: warning: objtool:
fsl_ifc_nand_probe()+0x4c7: can't find switch jump table
drivers/net/can/at91_can.o: warning: objtool: at91_irq()+0x347: can't
find switch jump table
drivers/net/phy/phylink.o: warning: objtool:
phylink_mac_config()+0x2b5: can't find switch jump table
drivers/regulator/max8973-regulator.o: warning: objtool:
max8973_probe()+0x736: can't find switch jump table
drivers/regulator/tps80031-regulator.o: warning: objtool:
tps80031_regulator_probe()+0x143: can't find switch jump table
drivers/tty/cyclades.o: warning: objtool: cy_set_line_char()+0x86c:
can't find switch jump table
drivers/tty/serial/jsm/jsm_cls.o: warning: objtool: cls_param()+0x10b:
can't find switch jump table
drivers/tty/serial/jsm/jsm_neo.o: warning: objtool: neo_param()+0x151:
can't find switch jump table
drivers/usb/core/hub.o: warning: objtool: hub_probe()+0x920: can't
find switch jump table
drivers/usb/misc/sisusbvga/sisusb.o: warning: objtool:
sisusb_write_mem_bulk()+0x4db: can't find switch jump table
kernel/rcu/tree.o: warning: objtool: rcu_note_context_switch()+0x6b8:
can't find switch jump table
lib/zstd/decompress.o: warning: objtool:
ZSTD_decodeLiteralsBlock()+0x5e: can't find switch jump table

If you want to have a look, I can provide object files and/or reduced test
cases for this. My guess is that it is unrelated to the warnings that Nick
saw for asm-goto.

Arnd

Jann Horn

unread,
Jul 11, 2019, 5:05:02 PM7/11/19
to Arnd Bergmann, Josh Poimboeuf, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
I was playing around with building the kernel with LLVM a few months
ago and used this local patch, but didn't get around to submitting
upstream because I couldn't reproduce the problem for some reason. I
think the warnings you're getting sound like what I saw back then:
https://gist.github.com/thejh/0434662728afb95d72455bf30ece5817

Quoting the commit message from that patch:

====
With clang from git master, code can be generated where a function contains
two indirect jump instructions that use the same switch table. To deal with
this case and similar ones properly, convert the switch table parsing to
use two passes:
====

Does that sound like what you're seeing?

Arnd Bergmann

unread,
Jul 11, 2019, 5:29:50 PM7/11/19
to Jann Horn, Josh Poimboeuf, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
On Thu, Jul 11, 2019 at 11:05 PM 'Jann Horn' via Clang Built Linux
<clang-bu...@googlegroups.com> wrote:
> I was playing around with building the kernel with LLVM a few months
> ago and used this local patch, but didn't get around to submitting
> upstream because I couldn't reproduce the problem for some reason. I
> think the warnings you're getting sound like what I saw back then:
> https://gist.github.com/thejh/0434662728afb95d72455bf30ece5817
>
> Quoting the commit message from that patch:
>
> ====
> With clang from git master, code can be generated where a function contains
> two indirect jump instructions that use the same switch table. To deal with
> this case and similar ones properly, convert the switch table parsing to
> use two passes:
> ====
>
> Does that sound like what you're seeing?

Yes, that is exactly right, and your patch seems to address the problem
for the cases I tried so far (will know more after a night of randconfig
testing).

Tested-by: Arnd Bergmann <ar...@arndb.de>

Arnd

Josh Poimboeuf

unread,
Jul 11, 2019, 7:20:55 PM7/11/19
to Jann Horn, Arnd Bergmann, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
Thanks Jann, I like this approach. Hopefully it also works with GCC.

The switch tables (and jump tables in general) have been a hot topic in
objtool lately. I have several other patches pending which touch this
code. I'll integrate your patch with the others and try to do some more
testing in GCC.

--
Josh

Arnd Bergmann

unread,
Jul 12, 2019, 3:51:53 AM7/12/19
to Jann Horn, Josh Poimboeuf, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
I no longer see any of the "can't find switch jump table" in last
nights randconfig
builds. I do see one other rare warning, see attached object file:

fs/reiserfs/do_balan.o: warning: objtool: replace_key()+0x158: stack
state mismatch: cfa1=7+40 cfa2=7+56
fs/reiserfs/do_balan.o: warning: objtool: balance_leaf()+0x2791: stack
state mismatch: cfa1=7+176 cfa2=7+192
fs/reiserfs/ibalance.o: warning: objtool: balance_internal()+0xe8f:
stack state mismatch: cfa1=7+240 cfa2=7+248
fs/reiserfs/ibalance.o: warning: objtool:
internal_move_pointers_items()+0x36f: stack state mismatch: cfa1=7+152
cfa2=7+144
fs/reiserfs/lbalance.o: warning: objtool:
leaf_cut_from_buffer()+0x58b: stack state mismatch: cfa1=7+128
cfa2=7+112
fs/reiserfs/lbalance.o: warning: objtool:
leaf_copy_boundary_item()+0x7a9: stack state mismatch: cfa1=7+104
cfa2=7+96
fs/reiserfs/lbalance.o: warning: objtool:
leaf_copy_items_entirely()+0x3d2: stack state mismatch: cfa1=7+120
cfa2=7+128

I suspect this comes from the calls to the __reiserfs_panic() noreturn function,
but have not actually looked at the object file.

Arnd
lbalance.o

Josh Poimboeuf

unread,
Jul 12, 2019, 9:57:59 AM7/12/19
to Arnd Bergmann, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
On Fri, Jul 12, 2019 at 09:51:35AM +0200, Arnd Bergmann wrote:
> I no longer see any of the "can't find switch jump table" in last
> nights randconfig
> builds. I do see one other rare warning, see attached object file:
>
> fs/reiserfs/do_balan.o: warning: objtool: replace_key()+0x158: stack
> state mismatch: cfa1=7+40 cfa2=7+56
> fs/reiserfs/do_balan.o: warning: objtool: balance_leaf()+0x2791: stack
> state mismatch: cfa1=7+176 cfa2=7+192
> fs/reiserfs/ibalance.o: warning: objtool: balance_internal()+0xe8f:
> stack state mismatch: cfa1=7+240 cfa2=7+248
> fs/reiserfs/ibalance.o: warning: objtool:
> internal_move_pointers_items()+0x36f: stack state mismatch: cfa1=7+152
> cfa2=7+144
> fs/reiserfs/lbalance.o: warning: objtool:
> leaf_cut_from_buffer()+0x58b: stack state mismatch: cfa1=7+128
> cfa2=7+112
> fs/reiserfs/lbalance.o: warning: objtool:
> leaf_copy_boundary_item()+0x7a9: stack state mismatch: cfa1=7+104
> cfa2=7+96
> fs/reiserfs/lbalance.o: warning: objtool:
> leaf_copy_items_entirely()+0x3d2: stack state mismatch: cfa1=7+120
> cfa2=7+128
>
> I suspect this comes from the calls to the __reiserfs_panic() noreturn function,
> but have not actually looked at the object file.

Looking at one of the examples:

2346: 0f 85 6a 01 00 00 jne 24b6 <leaf_copy_items_entirely+0x3a8>
...
23b1: e9 2a 01 00 00 jmpq 24e0 <leaf_copy_items_entirely+0x3d2>
...
24b6: 31 ff xor %edi,%edi
24b8: 48 c7 c6 00 00 00 00 mov $0x0,%rsi
24bb: R_X86_64_32S .rodata.str1.1
24bf: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
24c2: R_X86_64_32S .rodata.str1.1+0x127b
24c6: 48 c7 c1 00 00 00 00 mov $0x0,%rcx
24c9: R_X86_64_32S .rodata.str1.1+0x1679
24cd: 41 b8 90 01 00 00 mov $0x190,%r8d
24d3: 49 c7 c1 00 00 00 00 mov $0x0,%r9
24d6: R_X86_64_32S .rodata.str1.1+0x127b
24da: b8 00 00 00 00 mov $0x0,%eax
24df: 55 push %rbp
24e0: 41 52 push %r10
24e2: e8 00 00 00 00 callq 24e7 <leaf_item_bottle>
24e3: R_X86_64_PC32 __reiserfs_panic-0x4

Objtool is correct this time: There *is* a stack state mismatch at
0x24e0. The stack size is different at 0x24e0, depending on whether it
came from 0x2346 or from 0x23b1.

In this case it's not a problem for code flow, because the basic block
is a dead end.

But it *is* a problem for unwinding. The location of the previous stack
frame is nondeterministic.

And that's extra important for calls to noreturn functions, because they
often dump the stack before exiting.

So it looks like a compiler bug to me.

--
Josh

Arnd Bergmann

unread,
Jul 12, 2019, 10:19:19 AM7/12/19
to Josh Poimboeuf, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
The change below would shut up the warnings, and presumably avoid
the unwinding problem as well. Should I submit that for inclusion,
or should we try to fix clang first?

Arnd

diff --git a/fs/reiserfs/prints.c b/fs/reiserfs/prints.c
index 9fed1c05f1f4..da996eaaebac 100644
--- a/fs/reiserfs/prints.c
+++ b/fs/reiserfs/prints.c
@@ -387,7 +387,6 @@ void __reiserfs_panic(struct super_block *sb,
const char *id,
else
printk(KERN_WARNING "REISERFS panic: %s%s%s: %s\n",
id ? id : "", id ? " " : "", function, error_buf);
- BUG();
}

void __reiserfs_error(struct super_block *sb, const char *id,
@@ -397,8 +396,10 @@ void __reiserfs_error(struct super_block *sb,
const char *id,

BUG_ON(sb == NULL);

- if (reiserfs_error_panic(sb))
+ if (reiserfs_error_panic(sb)) {
__reiserfs_panic(sb, id, function, error_buf);
+ BUG();
+ }

if (id && id[0])
printk(KERN_CRIT "REISERFS error (device %s): %s %s: %s\n",
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index e5ca9ed79e54..f5bd17ee21f6 100644
--- a/fs/reiserfs/reiserfs.h
+++ b/fs/reiserfs/reiserfs.h
@@ -3185,10 +3185,9 @@ void unfix_nodes(struct tree_balance *);

/* prints.c */
void __reiserfs_panic(struct super_block *s, const char *id,
- const char *function, const char *fmt, ...)
- __attribute__ ((noreturn));
+ const char *function, const char *fmt, ...);
#define reiserfs_panic(s, id, fmt, args...) \
- __reiserfs_panic(s, id, __func__, fmt, ##args)
+ do { __reiserfs_panic(s, id, __func__, fmt, ##args); BUG(); } while (0)
void __reiserfs_error(struct super_block *s, const char *id,
const char *function, const char *fmt, ...);
#define reiserfs_error(s, id, fmt, args...) \

Josh Poimboeuf

unread,
Jul 12, 2019, 10:29:34 AM7/12/19
to Arnd Bergmann, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Nick Desaulniers
That should work, though I guess it's up to the reiserfs maintainers.

The issue still needs to get fixed in clang regardless. There are other
noreturn functions in the kernel and this problem could easily pop back
up.

--
Josh

Nick Desaulniers

unread,
Jul 12, 2019, 12:59:14 PM7/12/19
to Arnd Bergmann, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Josh Poimboeuf
Sure, thanks for the report. Arnd, can you help us get a more minimal
test case to understand the issue better?
--
Thanks,
~Nick Desaulniers

Arnd Bergmann

unread,
Jul 12, 2019, 4:41:16 PM7/12/19
to Nick Desaulniers, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Josh Poimboeuf
On Fri, Jul 12, 2019 at 6:59 PM 'Nick Desaulniers' via Clang Built
Linux <clang-bu...@googlegroups.com> wrote:
> > The issue still needs to get fixed in clang regardless. There are other
> > noreturn functions in the kernel and this problem could easily pop back
> > up.
>
> Sure, thanks for the report. Arnd, can you help us get a more minimal
> test case to understand the issue better?

I reduced it to this testcase:

int a, b;
void __reiserfs_panic(int, ...) __attribute__((noreturn));
void balance_internal() {
if (a)
__reiserfs_panic(0, "", __func__, "", 2, __func__, a);
if (b)
__reiserfs_panic(0, "", __func__, "", 5, __func__, a, 0);
}

https://godbolt.org/z/Byfvmx

$ clang-8 -mstack-alignment=8 -S ibalance.c -Wall -Os -c
$ objtool orc generate ibalance.o
ibalance.o: warning: objtool: balance_internal()+0x61: stack state
mismatch: cfa1=7+8 cfa2=7+16

Arnd

Nick Desaulniers

unread,
Jul 16, 2019, 4:24:39 PM7/16/19
to Josh Poimboeuf, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux, Arnd Bergmann
On Fri, Jul 12, 2019 at 1:41 PM Arnd Bergmann <ar...@arndb.de> wrote:
>
> On Fri, Jul 12, 2019 at 6:59 PM 'Nick Desaulniers' via Clang Built
> Linux <clang-bu...@googlegroups.com> wrote:
> > > The issue still needs to get fixed in clang regardless. There are other
> > > noreturn functions in the kernel and this problem could easily pop back
> > > up.
> >
> > Sure, thanks for the report. Arnd, can you help us get a more minimal
> > test case to understand the issue better?
>
> I reduced it to this testcase:
>
> int a, b;
> void __reiserfs_panic(int, ...) __attribute__((noreturn));
> void balance_internal() {
> if (a)
> __reiserfs_panic(0, "", __func__, "", 2, __func__, a);
> if (b)
> __reiserfs_panic(0, "", __func__, "", 5, __func__, a, 0);
> }
>
> https://godbolt.org/z/Byfvmx

Is this the same issue as Josh pointed out? IIUC, Josh pointed to a
jump destination that was past a `push %rbp`, and I don't see it in
your link. (Or, did I miss it?)
--
Thanks,
~Nick Desaulniers

Arnd Bergmann

unread,
Jul 16, 2019, 6:05:32 PM7/16/19
to Nick Desaulniers, Josh Poimboeuf, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux
On Tue, Jul 16, 2019 at 10:24 PM 'Nick Desaulniers' via Clang Built
Linux <clang-bu...@googlegroups.com> wrote:
>
> On Fri, Jul 12, 2019 at 1:41 PM Arnd Bergmann <ar...@arndb.de> wrote:
> >
> > On Fri, Jul 12, 2019 at 6:59 PM 'Nick Desaulniers' via Clang Built
> > Linux <clang-bu...@googlegroups.com> wrote:
> > > > The issue still needs to get fixed in clang regardless. There are other
> > > > noreturn functions in the kernel and this problem could easily pop back
> > > > up.
> > >
> > > Sure, thanks for the report. Arnd, can you help us get a more minimal
> > > test case to understand the issue better?
> >
> > I reduced it to this testcase:
> >
> > int a, b;
> > void __reiserfs_panic(int, ...) __attribute__((noreturn));
> > void balance_internal() {
> > if (a)
> > __reiserfs_panic(0, "", __func__, "", 2, __func__, a);
> > if (b)
> > __reiserfs_panic(0, "", __func__, "", 5, __func__, a, 0);
> > }
> >
> > https://godbolt.org/z/Byfvmx
>
> Is this the same issue as Josh pointed out? IIUC, Josh pointed to a
> jump destination that was past a `push %rbp`, and I don't see it in
> your link. (Or, did I miss it?)

I think it can be any push. The point is that the stack is different
between the two branches leading up to the noreturn call.

Arnd

Josh Poimboeuf

unread,
Jul 16, 2019, 7:03:42 PM7/16/19
to Arnd Bergmann, Nick Desaulniers, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux
Right.

--
Josh

Nick Desaulniers

unread,
Jul 18, 2019, 6:37:00 PM7/18/19
to Josh Poimboeuf, Arnd Bergmann, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux
So if I remove the `-mstack-alignment=8` command line flag, it looks
like the stack depth will still differ on calls to __reiserfs_panic,
but now the call is not shared (two separate code paths):
https://godbolt.org/z/tvkXwK. Is that ok or also bad?

I'm getting the feeling that `-mstack-alignment=8` might have some
issues once we start pushing parameters on the stack. How many can we
use registers for in x86 before resorting to the stack, and does the
function being variadic affect this? (if not, maybe a test case
without variadic and many-parameters would not conflate the issue?)
--
Thanks,
~Nick Desaulniers

Josh Poimboeuf

unread,
Jul 18, 2019, 7:20:19 PM7/18/19
to Nick Desaulniers, Arnd Bergmann, Jann Horn, Peter Zijlstra, Linux Kernel Mailing List, clang-built-linux
That looks ok. I'm not sure whether removing the stack alignment would
fix it though, you might have just gotten lucky.

> I'm getting the feeling that `-mstack-alignment=8` might have some
> issues once we start pushing parameters on the stack. How many can we
> use registers for in x86 before resorting to the stack, and does the
> function being variadic affect this? (if not, maybe a test case
> without variadic and many-parameters would not conflate the issue?)

Yeah, I think calling a variadic function (or a function with more than
6 args) does have something to do with it, because then some arguments
have to be passed on the stack.

--
Josh
Reply all
Reply to author
Forward
0 new messages