Adrián Moreno_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
On 12/21/21 18:31, Sterling Augustine wrote:
> The thing to do with problems like this is to run with clang's sanitizers, which
> diagnose undefined behavior, memory errors, and other such issues that often
> show up only under optimization. A cursory look at all the casting in this
> example makes me think there is undefined behavior, but there very well could be
> another type of failure.
Interestingly, when I enable UndefinedBehaviorSanitizer, my problem disappears
(and sanitizer does not report anything). In fact, the issue is quite elisuve,
adding a function call, a compiler barrier or not inlining a single function
changes the behavior of the program.
MemorySanitizer does print an error that I'm investigating:
==762381==WARNING: MemorySanitizer: use-of-uninitialized-value
[Detaching after fork from child process 764910]
#0 0x4b1871 in hmap_next__
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:379:13
#1 0x4af81f in hmap_first
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:391:12
#2 0x4adf3a in main /home/amorenoz/dev/bugs/2014942/example.c:82:5
#3 0x7ffff77c055f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
#4 0x7ffff77c060b in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2d60b)
#5 0x42c244 in _start (/home/amorenoz/devel/bugs/2014942/example_+0x42c244)
Uninitialized value was stored to memory at
#0 0x4b17d8 in hmap_next__
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:378:27
#1 0x4af81f in hmap_first
/home/amorenoz/code/ovs/out/include/openvswitch/hmap.h:391:12
#2 0x4adf3a in main /home/amorenoz/dev/bugs/2014942/example.c:82:5
#3 0x7ffff77c055f in __libc_start_call_main (/lib64/libc.so.6+0x2d55f)
Uninitialized value was created by a heap allocation
#0 0x45c942 in malloc (/home/amorenoz/devel/bugs/2014942/example_+0x45c942)
#1 0x4b4c9d in xmalloc__ /home/amorenoz/code/ovs/lib/util.c:137:15
#2 0x4b4c9d in xmalloc /home/amorenoz/code/ovs/lib/util.c:172:12
>
> https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
> <https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html>
> https://clang.llvm.org/docs/AddressSanitizer.html
> <https://clang.llvm.org/docs/AddressSanitizer.html>
> https://clang.llvm.org/docs/MemorySanitizer.html
> llvm...@lists.llvm.org <mailto:llvm...@lists.llvm.org>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
--
Adrián Moreno
It is quite obvious that the code is wrong.
Can't you spot the problem with this?:
member1 = malloc(sizeof *member1);
member2 = malloc(sizeof *member2);
memset(member1, 0, sizeof (struct member));
memset(member2, 0, sizeof (struct member));
On 12/21/21 19:44, James Dutton wrote:
> On Tue, 21 Dec 2021 at 16:30, Adrian Moreno via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>>
>> Hello,
>>
>> I need some help understanding what might be wrong with a piece of code from the
>> openvswitch project. By ${subject} I'm not suggesting there's a problem in
>> clang, gcc also shows the same behavior so it's likely our code is broken. I am
>> kindly asking for help to understand/troubleshoot the problem.
>>
>> Summary: It seems that certain interaction between two main openvswitch data
>> structures, when optimized ("-O2 -flto=auto") is broken.
>> The two data structures are:
>>
>> hmap: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/hmap.h
>> list: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/list.h
>>
>> I've reproduced the problem outside of openvswitch daemon using a short C
>> program (attached)
>>
>> Code snippet:
>>
> ...
>
> It is quite obvious that the code is wrong.
> Can't you spot the problem with this?:
> member1 = malloc(sizeof *member1);
> member2 = malloc(sizeof *member2);
> memset(member1, 0, sizeof (struct member));
> memset(member2, 0, sizeof (struct member));
>
Sorry, I pasted that code snippet from a semi-uncooked version. Removing the
pointer dereference in the sizeof (as in the attached version) has no effect on
the problem.
--
Adrián Moreno
Thanks for running it!
Thanks Dimitry for spotting it.
So you were running in a non GNUC system? I was testing on Fedora so we were
using different implementations of this macro. I will raise it with the
openvswitch community that the non GNUC is being flagged by UBSan as an error.
For the _GNUC_ version, is using *(OBJECT) inside the "typeof" operand also
undefined behavior?
> -Dimitry
>
--
Adrián Moreno
I have a suspicion of what could be causing the problem, I'd like to run through
you to get your feedback from the compiler pov.
One of the things that originally felt smelly was that the fact that the macros
that iterate the list elements assume the "struct ovs_list" element is embedded
into another "struct member":
struct member *pos = 0;
for ((((pos) = ((struct member *) (void *) ((uintptr_t)(void *)((&start)->next)
- __builtin_offsetof (struct member, elem)))));
&(pos)->elem != (&start);
((pos) = ((struct member *) (void *) ((uintptr_t)(void
*)((pos)->elem.next) - __builtin_offsetof (struct member , elem)
)))) {
[... use pos ]
}
however, the first node in the list ("start") is a "struct ovs_list" defined in
the stack and _not_ embedded into a "struct member".
Therefore, "(&start)->next - __builtin_offsetof (struct member, elem)" actually
points to somewhere in the stack that contains who knows what. (Note: initially
start->next = start)
In fact, if I make the struct offsets zero:
struct member {
struct ovs_list elem;
[ ... ]
int order;
};
... the code works. However if I use:
struct member {
int padding[10];
struct ovs_list elem;
[ ... ]
int order;
};
... the code fails.
Does anything of what I'm saying make sense so far?
If the code inside the loop just made use of "pos" through "(&pos->elem)" then
the compiler could(?) be ok with it but the loop actually contains:
if (member->order > pos->order) {
break;
}
So here I do not know what the compiler would think about "pos" if it happens to
point to some invalid stack address.
Thanks for the help.
--
Adrián Moreno
typeof(*NULL) should be fine, just like sizeof(*NULL), it’s not a “real” dereference.
At least, -Wnull-dereference isn’t emitted for either.
Jess
Thanks very much Dimitry. With your inlined example clang does work but gcc
still fails. So clearly the linking does influence.
>>
>>> For the _GNUC_ version, is using *(OBJECT) inside the "typeof" operand also undefined behavior?
>>
>> (Note that clang also defines __GNUC__ so this block will be used for clang too.)
>>
>> If OBJECT ever becomes a null pointer, as in the example, then it is undefined.
>>
>> And I don't think an argument "I'm only doing a typeof(*nullptr), not really accessing it" works... :)
>
> typeof(*NULL) should be fine, just like sizeof(*NULL), it’s not a “real” dereference.
>
> At least, -Wnull-dereference isn’t emitted for either.
>
> Jess
>
After manually expanding the macros and replacing all the typeof(*nullptr) with
the actual type, the behavior of both inlined and statically linked versions
remains the same.
What UBSan is flagging is the use of:
((void*)0)) - __builtin_offsetof(struct member, hmap_node))))
So we clearly need to change that to avoid the potential undefined behavior
Also tried a patch based on the theory I've posted in the other subthread (that
one possible issue is that I'm using a "struct ovs_list" allocated in the stack
and treating them as if it was embedded inside a "struct member", which
generates potential accesses to arbitrary stack addresses). The patch below
fixes both clang and gcc in all optimization levels.
But I wouldn't like to stop the investigation until I understand all the
potential problems (these macros and data structures are wildly used in
openvswitch).
--- example-inlined.c 2021-12-22 16:05:12.584459052 +0100
+++ example-inlined-patched.c 2021-12-22 15:55:21.168212637 +0100
@@ -318,7 +320,9 @@
//struct ovs_list sstart;
//struct ovs_list start = &start;
- struct ovs_list start = {0};
+ // Using a member to hold the ovs_list start
+ struct member start_member = {0};
+ struct ovs_list *start = &start_member.elem;
bond = malloc(sizeof (struct bond));
memset(bond, 0, sizeof (struct bond));
@@ -334,14 +338,14 @@
hmap_insert(&bond->members, &member1->hmap_node,
(uint32_t)(uintptr_t)member1);
hmap_insert(&bond->members, &member2->hmap_node,
(uint32_t)(uintptr_t)member2);
- printf("start: %p\n", &start);
- ovs_list_init(&start);
+ printf("start: %p\n", start);
+ ovs_list_init(start);
HMAP_FOR_EACH (member, hmap_node, &bond->members) {
/*
* Insert member in start (sorted)
* */
struct member *pos;
- LIST_FOR_EACH (pos, elem, &start) {
+ LIST_FOR_EACH (pos, elem, start) {
if (member->order > pos->order) {
break;
}
@@ -351,13 +355,14 @@
ovs_list_insert(&pos->elem, &member->elem);
}
+
// TESTED: If, instead of using an hmap to iterate through bond->members,
I add them
// one by one, the problem disappears:
//insert_ordered(&start, member1);
//insert_ordered(&start, member2);
- struct member *first = CONTAINER_OF(ovs_list_front(&start), struct member,
elem);
- struct member *last = CONTAINER_OF(ovs_list_back(&start), struct member, elem);
+ struct member *first = CONTAINER_OF(ovs_list_front(start), struct member,
elem);
+ struct member *last = CONTAINER_OF(ovs_list_back(start), struct member, elem);
printf("first: %p \nlast: %p\n", first, last);
/* I've inserted two members into the 'start' list.
@@ -366,7 +371,7 @@
if ((first != member1 && first != member2) || (last != member1 && last !=
member2)) {
printf("list is broken!\n");
printf("Start: %p. start->next: %p, start->next->next: %p,
start->prev: %p\n",
- &start, start.next, start.next->next, start.prev);
+ start, start->next, start->next->next, start->prev);
ret = 1;
goto out;
--
Adrián Moreno