[llvm-dev] broken C code only when optimized "-O2"

73 views
Skip to first unread message

Adrian Moreno via llvm-dev

unread,
Dec 21, 2021, 11:30:22 AM12/21/21
to llvm...@lists.llvm.org
Hello,

I need some help understanding what might be wrong with a piece of code from the
openvswitch project. By ${subject} I'm not suggesting there's a problem in
clang, gcc also shows the same behavior so it's likely our code is broken. I am
kindly asking for help to understand/troubleshoot the problem.

Summary: It seems that certain interaction between two main openvswitch data
structures, when optimized ("-O2 -flto=auto") is broken.
The two data structures are:

hmap: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/hmap.h
list: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/list.h

I've reproduced the problem outside of openvswitch daemon using a short C
program (attached)

Code snippet:

struct bond {
struct hmap members;
};

struct member {
struct hmap_node hmap_node;
int order;
struct ovs_list elem;
};

int main() {
int ret = 0;
struct member *member, *member1, *member2;
struct bond *bond;
struct ovs_list start = {0};

bond = malloc(sizeof *bond);
memset(bond, 0, sizeof (struct bond));
hmap_init(&bond->members);

member1 = malloc(sizeof *member1);
member2 = malloc(sizeof *member2);
memset(member1, 0, sizeof (struct member));
memset(member2, 0, sizeof (struct member));

member1->order = 3;
member2->order = 2;

hmap_insert(&bond->members, &member1->hmap_node, (uint32_t)(uintptr_t)member1);
hmap_insert(&bond->members, &member2->hmap_node, (uint32_t)(uintptr_t)member2);

ovs_list_init(&start);
HMAP_FOR_EACH (member, hmap_node, &bond->members) {
/*
* Insert member in start (sorted)
* */
struct member *pos;
LIST_FOR_EACH (pos, elem, &start) {
if (member->order > pos->order) {
break;
}
}
// TESTED: If I add this printf, the problem disappears
//printf("Inserting member: %p\n", member);
ovs_list_insert(&pos->elem, &member->elem);
}

/* I've inserted two members into the 'start' list.
* first and last have to be either member1 or member2
* */
if ((first != member1 && first != member2) || (last != member1 && last !=
member2)) {
printf("list is broken!\n");
}

}


What I know for now:
* -fno-strict-aliasing does not fix it
* Only happens with "-O2 -flto=auto"
* If I define 'ovs_list *start' and change the code to use the pointer directly
and not '&start' the problem disappears. It seems that the LIST_FOR_EACH macros
prefer an lvalue rather than "&" but I don't get why.
* I'm not able to reproduce without using hmap _and_ ovs_list.
* If I add a compiler barrier (or a call to an external function) after the
loop, the problem disappears (e.g printf), the problem disappears.
* If I add -fsanitize=undefined the problem disappears!

I'd really appreciate any hint or idea to try to understand this problem.

Thanks in advanced.

--
Adrián Moreno
example.c

Sterling Augustine via llvm-dev

unread,
Dec 21, 2021, 12:32:15 PM12/21/21
to Adrian Moreno, llvm...@lists.llvm.org
The thing to do with problems like this is to run with clang's sanitizers, which diagnose undefined behavior, memory errors, and other such issues that often show up only under optimization. A cursory look at all the casting in this example makes me think there is undefined behavior, but there very well could be another type of failure.



Adrián Moreno_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Adrian Moreno via llvm-dev

unread,
Dec 21, 2021, 1:23:09 PM12/21/21
to Sterling Augustine, llvm...@lists.llvm.org
Thanks Sterling

On 12/21/21 18:31, Sterling Augustine wrote:
> The thing to do with problems like this is to run with clang's sanitizers, which
> diagnose undefined behavior, memory errors, and other such issues that often
> show up only under optimization. A cursory look at all the casting in this
> example makes me think there is undefined behavior, but there very well could be
> another type of failure.

Interestingly, when I enable UndefinedBehaviorSanitizer, my problem disappears
(and sanitizer does not report anything). In fact, the issue is quite elisuve,
adding a function call, a compiler barrier or not inlining a single function
changes the behavior of the program.

MemorySanitizer does print an error that I'm investigating:

==762381==WARNING: MemorySanitizer: use-of-uninitialized-value
[Detaching after fork from child process 764910]
#0 0x4b1871 in hmap_next__
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:379:13
#1 0x4af81f in hmap_first
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:391:12
#2 0x4adf3a in main /home/amorenoz/dev/bugs/2014942/example.c:82:5
#3 0x7ffff77c055f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
#4 0x7ffff77c060b in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2d60b)
#5 0x42c244 in _start (/home/amorenoz/devel/bugs/2014942/example_+0x42c244)

Uninitialized value was stored to memory at
#0 0x4b17d8 in hmap_next__
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:378:27
#1 0x4af81f in hmap_first
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:391:12
#2 0x4adf3a in main /home/amorenoz/dev/bugs/2014942/example.c:82:5
#3 0x7ffff77c055f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)

Uninitialized value was created by a heap allocation
#0 0x45c942 in malloc (/home/amorenoz/devel/bugs/2014942/example_+0x45c942)
#1 0x4b4c9d in xmalloc__ /home/amorenoz/code/ovs/lib/util.c:137:15
#2 0x4b4c9d in xmalloc /home/amorenoz/code/ovs/lib/util.c:172:12

>
> https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
> <https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html>
> https://clang.llvm.org/docs/AddressSanitizer.html
> <https://clang.llvm.org/docs/AddressSanitizer.html>
> https://clang.llvm.org/docs/MemorySanitizer.html

> llvm...@lists.llvm.org <mailto:llvm...@lists.llvm.org>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>

--
Adrián Moreno

Dimitry Andric via llvm-dev

unread,
Dec 21, 2021, 1:25:11 PM12/21/21
to Adrian Moreno, llvm...@lists.llvm.org
Not for me:

% clang -g -fsanitize=undefined -I/Users/dim/tmp/vswitch/foo/include example.c -o example -L/Users/dim/tmp/vswitch/foo/lib -lopenvswitch

% ./example
start: 0x16ee6f618
example.c:78:5: runtime error: applying zero offset to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:78:5 in
example.c:78:5: runtime error: applying zero offset to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:78:5 in
first: 0x600003200270
last: 0x6000032002a0

It looks like the HMAP_FOR_EACH() macro uses null pointer arithmetic. The problem appears to be in https://github.com/openvswitch/ovs/blob/master/include/openvswitch/util.h#L94 :

/* Given OBJECT of type pointer-to-structure, expands to the offset of MEMBER
* within an instance of the structure.
*
* The GCC-specific version avoids the technicality of undefined behavior if
* OBJECT is null, invalid, or not yet initialized. This makes some static
* checkers (like Coverity) happier. But the non-GCC version does not actually
* dereference any pointer, so it would be surprising for it to cause any
* problems in practice.
*/
#ifdef __GNUC__
#define OBJECT_OFFSETOF(OBJECT, MEMBER) offsetof(typeof(*(OBJECT)), MEMBER)
#else
#define OBJECT_OFFSETOF(OBJECT, MEMBER) \
((char *) &(OBJECT)->MEMBER - (char *) (OBJECT))
#endif

The comment is incorrect here, because dereferencing a null pointer, as done in *(OBJECT), is definitely undefined behavior.

-Dimitry

signature.asc

James Dutton via llvm-dev

unread,
Dec 21, 2021, 1:45:27 PM12/21/21
to Adrian Moreno, llvm-dev
On Tue, 21 Dec 2021 at 16:30, Adrian Moreno via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> Hello,
>
> I need some help understanding what might be wrong with a piece of code from the
> openvswitch project. By ${subject} I'm not suggesting there's a problem in
> clang, gcc also shows the same behavior so it's likely our code is broken. I am
> kindly asking for help to understand/troubleshoot the problem.
>
> Summary: It seems that certain interaction between two main openvswitch data
> structures, when optimized ("-O2 -flto=auto") is broken.
> The two data structures are:
>
> hmap: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/hmap.h
> list: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/list.h
>
> I've reproduced the problem outside of openvswitch daemon using a short C
> program (attached)
>
> Code snippet:
>
...

It is quite obvious that the code is wrong.
Can't you spot the problem with this?:


member1 = malloc(sizeof *member1);
member2 = malloc(sizeof *member2);
memset(member1, 0, sizeof (struct member));
memset(member2, 0, sizeof (struct member));

Adrian Moreno via llvm-dev

unread,
Dec 21, 2021, 1:48:26 PM12/21/21
to James Dutton, llvm-dev

On 12/21/21 19:44, James Dutton wrote:
> On Tue, 21 Dec 2021 at 16:30, Adrian Moreno via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>>
>> Hello,
>>
>> I need some help understanding what might be wrong with a piece of code from the
>> openvswitch project. By ${subject} I'm not suggesting there's a problem in
>> clang, gcc also shows the same behavior so it's likely our code is broken. I am
>> kindly asking for help to understand/troubleshoot the problem.
>>
>> Summary: It seems that certain interaction between two main openvswitch data
>> structures, when optimized ("-O2 -flto=auto") is broken.
>> The two data structures are:
>>
>> hmap: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/hmap.h
>> list: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/list.h
>>
>> I've reproduced the problem outside of openvswitch daemon using a short C
>> program (attached)
>>
>> Code snippet:
>>
> ...
>
> It is quite obvious that the code is wrong.
> Can't you spot the problem with this?:
> member1 = malloc(sizeof *member1);
> member2 = malloc(sizeof *member2);
> memset(member1, 0, sizeof (struct member));
> memset(member2, 0, sizeof (struct member));
>

Sorry, I pasted that code snippet from a semi-uncooked version. Removing the
pointer dereference in the sizeof (as in the attached version) has no effect on
the problem.

--
Adrián Moreno

via llvm-dev

unread,
Dec 21, 2021, 4:05:17 PM12/21/21
to james....@gmail.com, amor...@redhat.com, llvm...@lists.llvm.org
> It is quite obvious that the code is wrong.
> Can't you spot the problem with this?:

Please keep your comments positive and helpful, in accordance with
the standards of the LLVM community.

> member1 = malloc(sizeof *member1);
> member2 = malloc(sizeof *member2);
> memset(member1, 0, sizeof (struct member));
> memset(member2, 0, sizeof (struct member));

If you're referring to the dereference in (sizeof *member1),
the operand of sizeof is unevaluated, which AFAICT means it
won't trigger undefined behavior for being a null dereference.
--paulr

Adrian Moreno via llvm-dev

unread,
Dec 22, 2021, 2:34:06 AM12/22/21
to Dimitry Andric, llvm...@lists.llvm.org

Thanks for running it!

Thanks Dimitry for spotting it.

So you were running in a non GNUC system? I was testing on Fedora so we were
using different implementations of this macro. I will raise it with the
openvswitch community that the non GNUC is being flagged by UBSan as an error.

For the _GNUC_ version, is using *(OBJECT) inside the "typeof" operand also
undefined behavior?


> -Dimitry
>

--
Adrián Moreno

Adrian Moreno via llvm-dev

unread,
Dec 22, 2021, 5:05:40 AM12/22/21
to Sterling Augustine, llvm...@lists.llvm.org
Hi,

I have a suspicion of what could be causing the problem, I'd like to run through
you to get your feedback from the compiler pov.

One of the things that originally felt smelly was that the fact that the macros
that iterate the list elements assume the "struct ovs_list" element is embedded
into another "struct member":

struct member *pos = 0;

for ((((pos) = ((struct member *) (void *) ((uintptr_t)(void *)((&start)->next)
- __builtin_offsetof (struct member, elem)))));
&(pos)->elem != (&start);
((pos) = ((struct member *) (void *) ((uintptr_t)(void
*)((pos)->elem.next) - __builtin_offsetof (struct member , elem)
)))) {
[... use pos ]
}

however, the first node in the list ("start") is a "struct ovs_list" defined in
the stack and _not_ embedded into a "struct member".

Therefore, "(&start)->next - __builtin_offsetof (struct member, elem)" actually
points to somewhere in the stack that contains who knows what. (Note: initially
start->next = start)

In fact, if I make the struct offsets zero:

struct member {
struct ovs_list elem;
[ ... ]
int order;
};

... the code works. However if I use:

struct member {
int padding[10];
struct ovs_list elem;
[ ... ]
int order;
};

... the code fails.

Does anything of what I'm saying make sense so far?

If the code inside the loop just made use of "pos" through "(&pos->elem)" then
the compiler could(?) be ok with it but the loop actually contains:


if (member->order > pos->order) {
break;
}

So here I do not know what the compiler would think about "pos" if it happens to
point to some invalid stack address.

Thanks for the help.

--
Adrián Moreno

Dimitry Andric via llvm-dev

unread,
Dec 22, 2021, 7:10:19 AM12/22/21
to Adrian Moreno, llvm-dev
On 22 Dec 2021, at 08:33, Adrian Moreno <amor...@redhat.com> wrote:
>
> On 12/21/21 19:25, Dimitry Andric wrote:
>> On 21 Dec 2021, at 17:30, Adrian Moreno via llvm-dev <llvm...@lists.llvm.org> wrote:
>>>
>>> I need some help understanding what might be wrong with a piece of code from the openvswitch project. By ${subject} I'm not suggesting there's a problem in clang, gcc also shows the same behavior so it's likely our code is broken. I am kindly asking for help to understand/troubleshoot the problem.
...
Yes, I have run this on macOS 12.1 with Apple clang-1300.0.29.30, and on FreeBSD 14-CURRENT, with clang 13.0.0. In both cases, turning optimization on or off does not make a difference for the UBSan reports. I only added -g to be able to debug.

Note also that neither on macOS nor on FreeBSD do I ever get the program to print "list is broken!". It does not matter which optimization level I choose. So here you see that the system's memory layout will likely have an influence on the result.

Since you told us that adding LTO flags makes the problem appear, maybe this is related? I know that on FreeBSD, none of the libraries you link in are stored in LLVM bitcode (yet, at least:), and I suspect that this is also the case on macOS.

I attached another version of the example, with the external functions from openvswitch inlined, so it should only depend on system headers, and won't need any external libraries. Maybe this makes it easier to reason about the problem. Again, on macOS and FreeBSD this shows UBSan messages, but never "list is broken!".


> For the _GNUC_ version, is using *(OBJECT) inside the "typeof" operand also undefined behavior?

(Note that clang also defines __GNUC__ so this block will be used for clang too.)

If OBJECT ever becomes a null pointer, as in the example, then it is undefined.

And I don't think an argument "I'm only doing a typeof(*nullptr), not really accessing it" works... :)

-Dimitry
example-inlined.c
signature.asc

Jessica Clarke via llvm-dev

unread,
Dec 22, 2021, 8:01:19 AM12/22/21
to Dimitry Andric, llvm-dev

typeof(*NULL) should be fine, just like sizeof(*NULL), it’s not a “real” dereference.

At least, -Wnull-dereference isn’t emitted for either.

Jess

Adrian Moreno via llvm-dev

unread,
Dec 22, 2021, 10:07:47 AM12/22/21
to Jessica Clarke, Dimitry Andric, llvm-dev

Thanks very much Dimitry. With your inlined example clang does work but gcc
still fails. So clearly the linking does influence.

>>
>>> For the _GNUC_ version, is using *(OBJECT) inside the "typeof" operand also undefined behavior?
>>
>> (Note that clang also defines __GNUC__ so this block will be used for clang too.)
>>
>> If OBJECT ever becomes a null pointer, as in the example, then it is undefined.
>>
>> And I don't think an argument "I'm only doing a typeof(*nullptr), not really accessing it" works... :)
>
> typeof(*NULL) should be fine, just like sizeof(*NULL), it’s not a “real” dereference.
>
> At least, -Wnull-dereference isn’t emitted for either.
>
> Jess
>

After manually expanding the macros and replacing all the typeof(*nullptr) with
the actual type, the behavior of both inlined and statically linked versions
remains the same.

What UBSan is flagging is the use of:
((void*)0)) - __builtin_offsetof(struct member, hmap_node))))

So we clearly need to change that to avoid the potential undefined behavior

Also tried a patch based on the theory I've posted in the other subthread (that
one possible issue is that I'm using a "struct ovs_list" allocated in the stack
and treating them as if it was embedded inside a "struct member", which
generates potential accesses to arbitrary stack addresses). The patch below
fixes both clang and gcc in all optimization levels.

But I wouldn't like to stop the investigation until I understand all the
potential problems (these macros and data structures are wildly used in
openvswitch).

--- example-inlined.c 2021-12-22 16:05:12.584459052 +0100
+++ example-inlined-patched.c 2021-12-22 15:55:21.168212637 +0100
@@ -318,7 +320,9 @@
//struct ovs_list sstart;
//struct ovs_list start = &start;

- struct ovs_list start = {0};
+ // Using a member to hold the ovs_list start
+ struct member start_member = {0};
+ struct ovs_list *start = &start_member.elem;

bond = malloc(sizeof (struct bond));


memset(bond, 0, sizeof (struct bond));

@@ -334,14 +338,14 @@


hmap_insert(&bond->members, &member1->hmap_node,
(uint32_t)(uintptr_t)member1);
hmap_insert(&bond->members, &member2->hmap_node,
(uint32_t)(uintptr_t)member2);

- printf("start: %p\n", &start);
- ovs_list_init(&start);
+ printf("start: %p\n", start);
+ ovs_list_init(start);


HMAP_FOR_EACH (member, hmap_node, &bond->members) {
/*
* Insert member in start (sorted)
* */
struct member *pos;

- LIST_FOR_EACH (pos, elem, &start) {
+ LIST_FOR_EACH (pos, elem, start) {


if (member->order > pos->order) {
break;
}

@@ -351,13 +355,14 @@
ovs_list_insert(&pos->elem, &member->elem);
}

+
// TESTED: If, instead of using an hmap to iterate through bond->members,
I add them
// one by one, the problem disappears:
//insert_ordered(&start, member1);
//insert_ordered(&start, member2);

- struct member *first = CONTAINER_OF(ovs_list_front(&start), struct member,
elem);
- struct member *last = CONTAINER_OF(ovs_list_back(&start), struct member, elem);
+ struct member *first = CONTAINER_OF(ovs_list_front(start), struct member,
elem);
+ struct member *last = CONTAINER_OF(ovs_list_back(start), struct member, elem);

printf("first: %p \nlast: %p\n", first, last);


/* I've inserted two members into the 'start' list.

@@ -366,7 +371,7 @@


if ((first != member1 && first != member2) || (last != member1 && last !=
member2)) {
printf("list is broken!\n");

printf("Start: %p. start->next: %p, start->next->next: %p,
start->prev: %p\n",
- &start, start.next, start.next->next, start.prev);
+ start, start->next, start->next->next, start->prev);
ret = 1;
goto out;


--
Adrián Moreno

Reply all
Reply to author
Forward
0 new messages