c++ concurrent map


Francesco Nigro

Feb 8, 2016, 4:52:27 AM
to mechanical-sympathy

Gil Tene

Feb 8, 2016, 12:05:50 PM
to mechanical-sympathy
Cool example of using u-GC (QSBR) in a concurrent algorithm to gain scalability, while keeping the GC logic internal to the data structure (internal except that each thread using the code needs to periodically call something).


Francesco Nigro

Feb 8, 2016, 1:11:19 PM
to mechanical-sympathy
And the consume memory semantics to exploit better performance on weak-memory-model CPUs... it's interesting for a Java programmer like me to read code like this :) It helps to understand what Java could gain by adding a little bit of control for certain operations...

Todd Montgomery

Feb 8, 2016, 8:52:19 PM
to mechanical-sympathy
I spent a while looking at read-copy-update (RCU) and QSBR for Aeron's C++ API. I opted not to use any of those techniques due to the need to have users make idle calls for reclamation, which is difficult for a library to impose effectively. But hooking pthread calls and some other options are quite valid, and from a systems perspective it is trivial to add these types of reclamation calls to an idle strategy. I'm a proponent of the general approach because of what it opens up.

It's worth noting that C++11 atomics are quite laborious in comparison to the JMM. The weaker memory orders for std::atomic are... tricky, but more useful than most in the C++ community seem willing to go along with. I'm not sure Java could actually leverage them, honestly; I'm not sure the optimizations they enable are things the current optimizers in Java are able to do much with. So much data dependency.



--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Eloff

Feb 8, 2016, 9:14:11 PM
to mechanica...@googlegroups.com
Looking at the performance graphs, they only go up to 6 threads and show clearly sub-linear scaling after 4 threads. While the performance is amazing, it might be premature to give it the advantage over some of the others that seem to scale better; with large numbers of cores this implementation might need further tuning or may no longer be competitive. The Intel TBB hash map, at least back when I tried it several years ago, was actually pretty slow; I easily created a concurrent hash map that got better performance. However, at high core counts the situation could be different. Several years is also a very long time in this business, and my experience may no longer be relevant.


Vitaly Davidovich

Feb 8, 2016, 9:21:40 PM
to mechanica...@googlegroups.com


These are coming to Java by way of VarHandle, with the exception of memory_order_consume I think.

By "so much data dependency" you mean pointer/memory chasing? 



Todd Montgomery

Feb 8, 2016, 9:49:02 PM
to mechanical-sympathy
Hmmm. I might have to look at the latest VarHandle changes then; I hadn't noticed them before.

Yes, sorry for being imprecise. When I was doing some optimizations using memory_order_relaxed a while back (not in Aeron, though), the more indirection there was (specifically pointer chasing), the fewer optimizations could be done. In one case, a change to use std::make_shared was enough to allow the optimizer to do its thing. So I've started to dig down and look when I have the chance. I haven't been able to find real analogues in the JVM yet, i.e. the same kinds of things don't do anything useful.

Aeron's usage of relaxed is in the rate reporter and tied to the samples only, which was an easy optimization. But you can play with heap-allocating the RateReporter and see if it gets the same optimizations. When I checked, clang pre-7 didn't optimize as well.

Rajiv Kurian

Feb 9, 2016, 12:11:32 AM
to mechanical-sympathy
I last looked about a year or so ago, and at that time neither GCC nor Clang optimized memory_order_consume in cases where it could actually be optimized (both preferring to just treat it like memory_order_acquire). Not that it even matters much for x86. Though it seems there are a few papers out there aiming to narrow its definition and actually make it implementable - http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4036.pdf



Rajiv Kurian

Feb 9, 2016, 12:27:04 AM
to mechanical-sympathy
The only good theoretical use of memory_order_relaxed I find is updating counters, progress bars, etc., where you give the compiler more room to play around with dependencies than with memory_order_release. Do you have any short examples where this actually generates better code? On x86, at least, both GCC and Clang generate a "lock add" for fetch_add in both relaxed and release mode. Similarly, they generate a simple "mov" for a store in both relaxed and release mode. So the only place I see it generating better code is some affordance gained by playing around with dependencies.

What I found weird is that neither GCC nor Clang optimizes a loop of stores in release or relaxed mode down to a single store. As far as I can tell that transformation is totally valid.



Rajiv Kurian

Feb 9, 2016, 12:28:22 AM
to mechanical-sympathy
Assembly for my claims - http://goo.gl/5rU0l3

Todd Montgomery

Feb 9, 2016, 3:31:54 AM
to mechanical-sympathy
Hi Rajiv.

I can talk about one example that you can play with yourself that I mentioned. That is the Aeron C++ RateReporter.

I would be very surprised if a small example were ideal for exploring this. The more "fluid" code nearby, the better, which with Aeron apps is easy.

The Aeron C++ RateReporter is just a reporter of the last second's rate. So yes, it is in essence similar in purpose to what you mention.

As I didn't keep notes from this back in the summer, an anecdote will have to suffice. The reporter was originally not stack allocated (where it is used) and used the default memory order. The throughput test rate was about the same as the Java rate (about 22M msgs/sec or so at that point). Using relaxed was a small improvement (1-2%, IIRC). It wasn't until moving the reporter to stack allocation that everything fell into place (about a 15% improvement, IIRC, to about 26M). I tend to make changes and run perf as I iterate, so I remember this quite well. And as I mentioned, this was pre clang 7; I noticed some additional optimizations are there now with the latest Mac OS X and mentioned it in the gitter room. This definitely enabled other code to be rearranged quite well. Being curious, I played around with it enough to determine that reducing indirection was key. So yes, it is mostly as you mention: enabling further optimizations.

BTW, there are a lot more optimizations left to look into on the C++ side of Aeron.

The other case I can't go into detail on yet. Once I understand why it does what it does, make sure it isn't doing something unsafe, and get it done, I will, though. However, I can mention that the outer class holding the std::atomic using relaxed is the one being instantiated with std::make_shared (and the shared_ptr isn't contended), and this combo seems to allow something to fall into place. Because it is night and day different when using make_shared vs. new, it is definitely enabling something. Still digging into it to understand.

There are numerous places where atomicity is the only _real_ requirement and ordering can be relaxed. Ordering is an illusion and must be imposed.



Vitaly Davidovich

Feb 9, 2016, 7:06:41 AM
to mechanica...@googlegroups.com


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
The only good theoretical use of memory_order_relaxed I find is updating counters, progress bars etc where you give the compiler more room to play around with dependencies than with memory_order_release. Do you have any short examples where this actually generates better code? On x86 at least both GCC and Clang generate a "lock add" for fetch_and_add in both relaxed and release mode. Similarly they generate a simple "mov" for a store operation for both relaxed and release mode. So the only place where I see it generating better code is some affordance through playing around with dependencies.
Yeah, I don't think x86 is the best example for the weaker-order operations given the CPU has a pretty strong memory model (only store-load can be reordered). Relaxed mode can be good if it allows other scheduling to be done around the load/store, but again, given the x86 OoO engine there's not much juice to be squeezed there.

On weaker memory models, these things can be more useful. For example, some archs have a weak CAS which doesn't act like a full fence (unlike x86). Other archs don't order stores, which means relaxed and release stores will issue different sets of instructions. Other archs, like Itanium, are in-order, so the compiler does the scheduling - I suspect one would see a lot more code motion opened up by using weak-order operations.


What I found weird is that neither GCC nor Clang optimize a loop of stores in release or relaxed mode to a single store. As far as I can tell that transformation is totally valid.
Why do you think this is a valid transformation? AFAICT all the ordered operations require the compiler to actually perform them; it can schedule other things around them and omit CPU fences in some cases, but the operation itself must be performed (like volatile, except with atomicity).



Vitaly Davidovich

Feb 9, 2016, 7:12:14 AM
to mechanica...@googlegroups.com


My understanding is m_o_consume is really meant for archs like Alpha that don't respect data-dependent loads through a pointer, which are rare (is there anything else besides Alpha with that property?).



Francesco Nigro

Feb 9, 2016, 7:13:09 AM
to mechanical-sympathy
Maybe using a DEC Alpha, a PowerPC, or an ARM-based CPU it would be possible to see the gains from such a memory order...
If I have time, I'll try to build a test on my little Raspberry Pi...


Il giorno martedì 9 febbraio 2016 13:06:41 UTC+1, Vitaly Davidovich ha scritto:


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
The only good theoretical use of memory_order_relaxed I find is updating counters, progress bars etc where you give the compiler more room to play around with dependencies than with memory_order_release. Do you have any short examples where this actually generates better code? On x86 at least both GCC and Clang generate a "lock add" for fetch_and_add in both relaxed and release mode. Similarly they generate a simple "mov" for a store operation for both relaxed and release mode. So the only place where I see it generating better code is some affordance through playing around with dependencies.
Yeah I don't think x86 is the best example for the weaker order operations given the cpu has pretty strong memory model (only store-load can be reordered).  Relaxed mode can be good if it allows other scheduling to be done around the load/store, but again, given x86 OoO engine there's not much juice to be squeezed there.

On weaker memory models, these things can be more useful.  For example, some archs have a weak CAS which doesn't act like a full fence (unlike x86).  Other archs don't order stores which means a relaxed and release store will issue different set of instructions.  Other archs like Itanium are in-order so compiler does scheduling - I suspect one would see a lot more different code motion opened up by using weak order operations.


What I found weird is that neither GCC nor Clang optimize a loop of stores in release or relaxed mode to a single store. As far as I can tell that transformation is totally valid.
Why do you think this is a valid transformation? AFAICT all the ordered instructions require the compiler to actually perform them; they can schedule other things around them and omit cpu fences in some of them, but the operation must be performed (like volatile, except with atomicity). 


On Monday, February 8, 2016 at 6:49:02 PM UTC-8, Todd L. Montgomery wrote:
Hmmm. Might have to look at the latest VarHandles changes then. Didn't notice them before.

Yes, sorry for being imprecise. As I was doing some optimizations before using memory_order_relaxed (not Aeron, though), the more indirection (specifically pointer chasing),
the less optimizations could be done. In one case, a change to use std::make_shared was enough to allow the optimizer to do its thing. So, I've started to dig down and look when I
have the chance. Haven't been able to really find analogues in the JVM yet. i.e. same kinds of things don't do anything useful.

Aeron usage of relaxed is in the rate reporter and tied to the samples only. Which was an easy optimization. But you can play with heap allocating the RateReporter and
see if that has the same optimizations. When I checked clang pre-7, it didn't optimize as well then.

On Mon, Feb 8, 2016 at 6:21 PM, Vitaly Davidovich <vit...@gmail.com> wrote:


On Monday, February 8, 2016, Todd Montgomery <tm...@nard.net> wrote:
I spent a while looking at read-copy-update (RCU) and QSBR for Aeron's C++ API. Opted not to use any of those techniques due to the need to have users make idle calls for reclamation. Which is difficult for a library to impose effectively. But hooking pthread calls and some other options are quite valid. And at a systems perspective, it is trivial to add these type of reclamation calls to an idle strategy. I'm a proponent of the general approach because of what it opens up.

It's worth noting that C++11 atomics are quite laborious in comparison to the JMM. The weaker memory models for std::atomic are.... tricky. But more useful than most in the C++ community seem to want to go along with. But not sure Java could actually leverage them, honestly. I'm not sure if the optimizations they enable are things the current optimizers in Java are able to do much with. So much data dependency.
These are coming to Java by way of VarHandle, with the exception of memory_order_consume I think.

By "so much data dependency" you mean pointer/memory chasing? 
On Mon, Feb 8, 2016 at 10:11 AM, Francesco Nigro <nigr...@gmail.com> wrote:
And the memory consume semantic to exploit better perf on weak memory model CPUs...it's interesting to read code like this for a java programmer like me :) It help to understand what java could gain adding a little bit of control for certain operation...

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsub...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


Francesco Nigro

unread,
Feb 9, 2016, 7:25:40 AM
to mechanical-sympathy
Here http://preshing.com/20141124/fixing-gccs-implementation-of-memory_order_consume/
For particular archs the implementation of the "heavy strategy" is faulty... I have to check whether the bug is still open...

On Tuesday, February 9, 2016 at 13:12:14 UTC+1, Vitaly Davidovich wrote:


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
I last looked about a year or so ago, and at that time neither GCC nor Clang optimized memory_order_consume in cases where it could actually be optimized (both preferring to just treat it like memory_order_acquire). Not that it even matters much for x86. Though it seems there are a few papers out there to narrow its definition and actually make it implementable - http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4036.pdf
My understanding is m_o_consume is really meant for archs like Alpha that don't respect data dependent loads through a pointer, which are rare (is there something else besides Alpha with such a property?).



Rajiv Kurian

unread,
Feb 9, 2016, 9:16:05 AM
to mechanical-sympathy
The default memory order is memory_order_seq_cst, which might also have played havoc. Looking forward to an Aeron driver in C++ :)
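A small illustration of that default (my example, not from the thread): omitting the ordering argument means sequential consistency, which is typically the most expensive store.

```cpp
#include <atomic>

std::atomic<long> counter{0};

// No ordering argument: defaults to memory_order_seq_cst.
// On x86 this is typically compiled to an xchg (or mov + mfence).
void store_default(long v) { counter.store(v); }

// Explicit release: typically a plain mov on x86.
void store_release(long v) { counter.store(v, std::memory_order_release); }
```

The two calls are observationally similar here, but the seq_cst default carries a full-fence cost that is easy to pay by accident.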

Rajiv Kurian

unread,
Feb 9, 2016, 9:23:51 AM
to mechanical-sympathy


On Tuesday, February 9, 2016 at 4:06:41 AM UTC-8, Vitaly Davidovich wrote:


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
The only good theoretical use of memory_order_relaxed I find is updating counters, progress bars, etc., where you give the compiler more room to play around with dependencies than with memory_order_release. Do you have any short examples where this actually generates better code? On x86 at least, both GCC and Clang generate a "lock add" for fetch_add in both relaxed and release mode. Similarly they generate a simple "mov" for a store operation in both relaxed and release mode. So the only place where I see it generating better code is some affordance through playing around with dependencies.
Yeah, I don't think x86 is the best example for the weaker-order operations given that the CPU has a pretty strong memory model (only store-load reordering is allowed).  Relaxed mode can be good if it allows other scheduling to be done around the load/store, but again, given the x86 OoO engine, there's not much juice to be squeezed there.

On weaker memory models, these things can be more useful.  For example, some archs have a weak CAS which doesn't act like a full fence (unlike x86).  Other archs don't order stores, which means relaxed and release stores will issue different sets of instructions.  Other archs, like Itanium, are in-order, so the compiler does the scheduling; I suspect one would see a lot more code motion opened up by using weak-order operations.


What I found weird is that neither GCC nor Clang optimize a loop of stores in release or relaxed mode to a single store. As far as I can tell that transformation is totally valid.
Why do you think this is a valid transformation? AFAICT all the ordered instructions require the compiler to actually perform them; they can schedule other things around them and omit cpu fences in some of them, but the operation must be performed (like volatile, except with atomicity). 
They are valid transformations because there are no intervening writes to/reads from other parts of memory and so no consumer can see the effects of such a transformation. Further it does not hinder the forward progress guarantee. For memory_order_release it could be incremented by 100 but that increment could not be reordered with previous writes to other pieces of memory by the compiler. For memory_order_relaxed even that is not a constraint so I was hoping to see this as an improvement. Here is a paper showing other valid transformations - http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
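A minimal sketch of the loop under discussion (my own example): with no intervening accesses to other memory, collapsing it to a single store would be legal by the argument above, yet the compilers of the day emit every store.

```cpp
#include <atomic>

std::atomic<int> progress{0};

// One hundred relaxed stores with no intervening memory operations.
// Fusing them into a single progress.store(100, relaxed) would be a
// legal transformation, but GCC and Clang (as observed in this thread)
// perform all one hundred stores.
void run_to_completion() {
    for (int i = 1; i <= 100; ++i) {
        progress.store(i, std::memory_order_relaxed);
    }
}
```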



Rajiv Kurian

unread,
Feb 9, 2016, 9:29:09 AM
to mechanical-sympathy


On Tuesday, February 9, 2016 at 4:12:14 AM UTC-8, Vitaly Davidovich wrote:


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
I last looked about a year or so ago, and at that time neither GCC nor Clang optimized memory_order_consume in cases where it could actually be optimized (both preferring to just treat it like memory_order_acquire). Not that it even matters much for x86. Though it seems there are a few papers out there to narrow its definition and actually make it implementable - http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4036.pdf
My understanding is m_o_consume is really meant for archs like Alpha that don't respect data dependent loads through a pointer, which are rare (is there something else besides Alpha with such a property?).
 
And ARMv7 too: no 'dmb ish' needed when there is a data dependency. And PowerPC too, in case people still care.
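For reference, a sketch of the publish/consume pattern being discussed (names are my own invention):

```cpp
#include <atomic>

struct Config { int value; };

std::atomic<Config*> g_config{nullptr};

// Writer: release-publish so the pointee's contents happen-before
// the pointer becomes visible to readers.
void publish(Config* c) {
    g_config.store(c, std::memory_order_release);
}

// Reader: consume orders only loads that are data-dependent on the
// loaded pointer. On ARM/POWER that ordering is free (no dmb ish or
// lwsync needed), whereas acquire would require a fence. In practice,
// as noted earlier in the thread, compilers promote consume to acquire.
int read_value() {
    Config* c = g_config.load(std::memory_order_consume);
    return c ? c->value : -1;
}
```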



Vitaly Davidovich

unread,
Feb 9, 2016, 10:01:22 AM
to mechanical-sympathy
That paper is informative -- thanks! I was under the impression that atomics subsume volatile semantics, but apparently that's not the case.

At any rate, not that this is surprising given the loop transform isn't done, but even the simple case of (manual unroll-like thing):

#include <atomic>

std::atomic<int> x;

void f() {
    x.store(1, std::memory_order_relaxed);
    x.store(2, std::memory_order_relaxed);
    x.store(3, std::memory_order_relaxed);
}

isn't optimized (this is called out in the paper as well, albeit using increment as an example).



Rajiv Kurian

unread,
Feb 9, 2016, 11:50:24 AM
to mechanical-sympathy
Yeah I feel like some of these transformations are not done because of the assumption that atomics subsume volatile semantics.

They are somewhat problematic though. One can imagine a progress bar updated from some download thread using release order, with the UI thread reading it and showing it on screen. The compiler is free to figure out that your progress bar atomic goes from 0 to 100 and set it directly to 100, kind of killing the experience. Given how release works, I think the compiler could move things (like the real work of downloading) above the release store and set it directly to 100.
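A hypothetical sketch of that scenario (names mine): nothing in the model obliges the reader to observe any intermediate value, so a compiler fusing the stores into a single store of 100 would be indistinguishable, to a conforming program, from unlucky scheduling.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> percent{0};

void downloader() {
    for (int i = 1; i <= 100; ++i) {
        // ... transfer one chunk of the file ...
        percent.store(i, std::memory_order_release);
    }
}

void ui_loop() {
    int last = 0;
    while (last < 100) {
        int now = percent.load(std::memory_order_acquire);
        if (now != last) {
            last = now;  // repaint the bar; may legally jump straight to 100
        }
    }
}
```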

Vitaly Davidovich

unread,
Feb 9, 2016, 12:29:26 PM
to mechanica...@googlegroups.com


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
Yeah I feel like some of these transformations are not done because of the assumption that atomics subsume volatile semantics.
My guess is they simply play conservative; given the relative recency of atomics and the serious problems miscompiles can cause, I don't blame them :).  That paper alludes to ongoing research to determine what exact transformations are safe. 

They are some what problematic though. One can imagine upgrading a progress bar from some download thread using release order and the UI thread reading this and showing it on the screen. The compiler is free to figure out that your progress bar atomic goes from 0 - 100 and set it directly to 100, kind of killing the experience. Given how release works I think the compiler could bring move things (like the real work of downloading) above the release store and set it directly to 100.
Well yeah, but it's a data race to begin with if relaxed mode is used; e.g. the consumer may not see increments of the progress bar simply due to scheduling.  If it's racily observing some counter then anything goes, which means 0 to 100 is possible.  That being said, the conservative (and safe, although not as performant) approach is to treat it like volatile in this regard.  It sounds like over time compilers will get more aggressive there.

Avi Kivity

unread,
Feb 9, 2016, 12:38:57 PM
to mechanica...@googlegroups.com



Starting conservatively and gradually getting more aggressive has its own pitfalls -- people grow to expect the existing behavior, then moan when it changes.

Vitaly Davidovich

unread,
Feb 9, 2016, 12:56:28 PM
to mechanical-sympathy
Starting conservatively and gradually getting more aggressive has its own pitfalls -- people grow to expect the existing behavior, then moan when it changes.

Agreed, but people complain no matter what.

Rajiv Kurian

unread,
Feb 9, 2016, 1:12:06 PM
to mechanical-sympathy


On Tuesday, February 9, 2016 at 9:29:26 AM UTC-8, Vitaly Davidovich wrote:


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
Yeah I feel like some of these transformations are not done because of the assumption that atomics subsume volatile semantics.
My guess is they simply play conservative; given the relative recency of atomics and the serious problems miscompiles can cause, I don't blame them :).  That paper alludes to ongoing research to determine what exact transformations are safe. 

They are some what problematic though. One can imagine upgrading a progress bar from some download thread using release order and the UI thread reading this and showing it on the screen. The compiler is free to figure out that your progress bar atomic goes from 0 - 100 and set it directly to 100, kind of killing the experience. Given how release works I think the compiler could bring move things (like the real work of downloading) above the release store and set it directly to 100.
Well yeah, but it's a data race to begin with if relaxed mode is used; e.g consumer may not see increments of the progress bar simply due to scheduling.  If it's racily observing some counter then anything goes, which means 0 to 100 is possible.  That being said, the conservative (and safe although not as performant) approach is to treat it like volatile in this regard.  It sounds like over time compilers will get more aggressive there.
This will happen even in release mode, not just relaxed mode, since operations after the release store might be brought above it but not the other way round. It is a data race, exactly! The compiler is free to do this reordering per the C++11 memory model. The problem is that people grow used to the existing behavior, which as it stands is super conservative, and after these optimizations programs break very subtly. This is not a simple "the programmer was doing it wrong so they deserved it" thing. Real software will be written with these wrong assumptions, and it will break when the compiler decides to take advantage of the spec to optimize further. For a good parallel, see all the software that has broken subtly (or sometimes very obviously) when the compiler took advantage of undefined behavior to create nasal demons. Clang has become especially aggressive in recent years and has gone as far as optimizing any function with undefined behavior to a "rep ret". Starting conservative and going aggressive later means real programs will break down the line with a new compiler, and people will shout at the compiler writers, who will either revert their changes or leave us with broken programs.


Vitaly Davidovich

unread,
Feb 9, 2016, 1:26:27 PM
to mechanical-sympathy
Starting conservative and going aggressive later means real programs will break down the line with a new compiler and people will shout at the compiler writers and they will revert their changes or we'll be left with broken programs.

I'd expect compiler devs to take possible breakage into account when determining if further optimizations are worth the risk.  But improving the optimizer (or stdlib for that matter) almost always carries some risk of breaking someone, for various definitions of "breaking" (races, UB, perf degradation, etc).  That's the price of progress, IMHO.  What's a better alternative?


Rajiv Kurian

unread,
Feb 9, 2016, 1:59:34 PM
to mechanical-sympathy
Compiler devs are way too eager to win benchmark games to do anything else. Look at all the (illegal but working) programs they've broken through optimizations: null pointer checks elided because there was a dereference before, range checks elided because x + 1 > x is assumed to always be true, a handwritten memcpy "optimized" into the std lib memcpy, which is not async-signal-safe, thus breaking code. I wouldn't trust them so much. Yeah, it is the price of progress. The alternative is to be conservative and let developers have control. I know how to write 100 to memory directly; don't do it for me. C and C++ are now so far from being simple to understand when it comes to predicting the assembly. Of course it was never true that C was portable assembly, but it is less true now than ever.
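For the "x + 1 > x" point, a minimal hypothetical illustration (not from any project mentioned here): signed overflow is undefined behavior, so an optimizer may assume it never happens, fold the comparison to true, and delete the intended overflow guard entirely.

```cpp
// Hypothetical overflow guard. Because signed overflow is UB, the
// compiler may treat 'x + 1 > x' as always true and remove the branch
// below, so this does NOT reliably protect against INT_MAX.
int guarded_increment(int x) {
    if (x + 1 > x) {   // may be folded to 'true' at -O2
        return x + 1;
    }
    return x;          // guard path the optimizer is allowed to erase
}
```

The well-defined way to write this guard is to compare against INT_MAX before adding, rather than testing the result of the addition.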


On Tuesday, February 9, 2016 at 10:26:27 AM UTC-8, Vitaly Davidovich wrote:
Starting conservative and going aggressive later means real programs will break down the line with a new compiler and people will shout at the compiler writers and they will revert their changes or we'll be left with broken programs.

I'd expect compiler devs to take possible breakage into account when determining if further optimizations are worth the risk.  But improving the optimizer (or stdlib for that matter) almost always carries some risk of breaking someone, for various definitions of "breaking" (races, UB, perf degradation, etc).  That's the price of progress, IMHO.  What's a better alternative?
On Tue, Feb 9, 2016 at 1:12 PM, Rajiv Kurian <geet...@gmail.com> wrote:


On Tuesday, February 9, 2016 at 9:29:26 AM UTC-8, Vitaly Davidovich wrote:


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:
Yeah I feel like some of these transformations are not done because of the assumption that atomics subsume volatile semantics.
My guess is they simply play conservative; given the relative recency of atomics and the serious problems miscompiles can cause, I don't blame them :).  That paper alludes to ongoing research to determine what exact transformations are safe. 

They are some what problematic though. One can imagine upgrading a progress bar from some download thread using release order and the UI thread reading this and showing it on the screen. The compiler is free to figure out that your progress bar atomic goes from 0 - 100 and set it directly to 100, kind of killing the experience. Given how release works I think the compiler could bring move things (like the real work of downloading) above the release store and set it directly to 100.
Well yeah, but it's a data race to begin with if relaxed mode is used; e.g consumer may not see increments of the progress bar simply due to scheduling.  If it's racily observing some counter then anything goes, which means 0 to 100 is possible.  That being said, the conservative (and safe although not as performant) approach is to treat it like volatile in this regard.  It sounds like over time compilers will get more aggressive there.
This will happen even in release mode and not just relaxed mode since operations after the release store might be brought above it but not the other way round. It is a data race - exactly!. The compiler is free to do this reordering per the C++11 memory model. The problem is that people grow used to the existing behavior which as it stands is super conservative and after these optimizations programs break very subtly. This is not a simple "The programmer was doing it wrong so they deserved it" thing. Real software will be written with these wrong assumptions and they'll break when the compiler decides to take advantage of the spec to optimize further. For a good parallel one can see all the software that has subtly (or sometimes very obviously) broken when the compiler took advantage of undefined behavior to create nasal demons. Clang has become especially aggressive in the recent years and has gone as far as optimizing any function with undefined behavior to a "rep ret". Starting conservative and going aggressive later means real programs will break down the line with a new compiler and people will shout at the compiler writers and they will revert their changes or we'll be left with broken programs.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsub...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


--
Sent from my phone

Vitaly Davidovich

Feb 9, 2016, 2:21:17 PM
to mechanical-sympathy
Compiler devs are way too eager to win benchmark games to do anything else

Sure, but this is not specific to compilers.

Look at all the programs (illegal but working) that they've broken through optimizations: null pointer checks elided because there was a dereference before, range checks elided because x + 1 > x is always assumed to be true, "optimizing" your handwritten memcpy into a standard library memcpy that is not async-signal-safe, thus breaking code.

Sure, but using these same semantics in other contexts is what allows C/C++ compilers to achieve better performance.  A lot of these cases fall out from many iterations of other optimizations being done, particularly things like inlining.  Things like a memcpy, or the memset'ing of a stack buffer to 0, being elided and breaking security are more questionable, but outside of those contexts I don't see anyone complaining about those optimizations taking place.  If anything is to blame, it's C and C++ semantics (e.g. signed overflow being UB, for your example above).  The compiler is just exploiting those things by saying "this cannot happen, so I don't need to check it".  Yes, it can create breakage.  Use safer languages then, with more defined behavior, but be prepared to take some performance hit.

The alternative is to be conservative and let developers have control. I know how to write 100 to memory directly; don't do it for me.

You were just saying how you wished the compiler would fuse a loop with atomic stores! :) What does "conservative" actually mean? The most conservative is no optimization at all - emit code exactly as written.  Again, the compiler is making phased/iterative optimizations, so even though you know how to write 100 directly to memory, after some other optimizations and "obfuscation" the compiler may realize that's exactly what some code is doing via a loop.  You'd rather have it leave that alone and let the loop run? Again, I'm not entirely sure what the proposal is.  The issue isn't the compiler, but the language semantics that it's working with.  If you're saying C and C++ could have saner/safer semantics for some things, then I agree, but that's not a compiler issue per se.

Let's take this train of thought a bit further -- should the CPUs then be dumbed down in case someone is doing something wrong but getting away with it by chance today? If you're a library writer, should it be OK for users to code to the implementation rather than the interface/spec and thus handcuff you on further evolution of your library?
 


Todd Montgomery

Feb 9, 2016, 2:56:27 PM
to mechanical-sympathy
To be perfectly honest, I don't mind optimizations breaking assumptions made for UB. If you are relying on UB, you get what you get.

Rajiv Kurian

Feb 9, 2016, 2:57:12 PM
to mechanical-sympathy
Yes, the language semantics are to blame - no argument there. The fact that C/C++ is chock full of UB instead of implementation-defined behavior is to blame. As a sad side note, undefined behavior has now even crept into the so-called language-agnostic backends like LLVM. We are at a state where it is not even possible to verify that a C/C++ program does not invoke UB. I agree with you, we need a different language with more well-defined semantics. I know that every transformation the compiler is doing is legal - my only point was that it breaks real programs, and compiler devs know it will and still don't care. The memset and memcpy examples are particularly grievous and illustrate my point better - writing my own memset does not cause UB, but the compiler changes my program's semantics a lot. Another example is that a memset to 0 followed by free results in the memset being elided - again no UB, but the compiler being too smart, totally defeating the programmer's intention to make things secure. The fact that compilers are so aware of the standard library is somewhat weird to me.

Thanks for bringing up the CPU example - I was going to bring that up myself. What does a CPU do when you do something illegal? It will either fail to decode an illegal instruction or throw an exception. It won't silently recompile your code to eliminate checks, return early, or call your parents. Dividing by zero on a CPU causes a divide-by-zero exception. Shifting by more than the width of an integer has implementation-defined behavior. Neither of these "errors" can cause your CPU to issue instructions that will delete your hard drive. Programmers are fine with implementation-defined behavior (note: it HAS to be defined by all valid implementations, even though not by the standard) IMHO, and not undefined behavior, which can cause anything, including erasing your hard drive or making demons fly out of your nose.

I wasn't quite arguing for the transformations - I was surprised that the compilers didn't do it given all the other transformations they do. I know how difficult it is for compilers to warn users of UB when they take advantage of it especially because of how it falls out of other passes like inlining etc. But it goes to show that compiler writers will prioritize optimizations for UB instead of building the (really really complex) infrastructure required to pass information between optimization passes so that undefined behavior can be pointed out. I want a new language - have had enough of C :)

Vitaly Davidovich

Feb 9, 2016, 3:54:45 PM
to mechanical-sympathy
Yes the language semantics are to be blamed - no argument there. The fact that C/C++ is chock full of UB instead of implementation defined behavior is to be blamed.

I'm not sure implementation defined is so much better (it is better though) than UB.  It makes code non portable across compilers, or perhaps even across compiler versions as each version may have different implementation-defined semantics.

As a sad side note undefined behavior has now even crept into the so called language agnostic backends like LLVM. We are at a state that it is not even possible to verify that a C/C++ program does not invoke UB. I agree with you, we need a different language with more well defined semantics. I know that every transformation that the compiler is doing is legal - my only point was that it breaks real programs and compiler devs know it will and they still don't care.

While I understand how some optimizations the compiler does can seem hostile to the developer, I don't think it's because compiler devs don't care.  They simply look for code patterns/shapes, and then optimize them the exact same way -- the compiler cannot differentiate between a security-sensitive zero'ing of a stack buffer vs needless zero'ing that occurred because a lot of code was inlined, constants folded and propagated, branches folded/pruned out, calls devirtualized, and you're left with a pointless zero'ing once the dust settles.  So it seems like the correct/reasonable thing to do here is find ways to communicate to the compiler that some piece of code cannot be optimized a certain way.  AFAIK, there are security specific zero'ing calls in various C runtimes.  But that's the gist of the issue -- compiler simply does not have enough info to determine true intent.
 

The memset, memcpy examples are particularly grievous and illustrates my point better - writing my own memset does not cause UB, but the compiler changes my program's semantics a lot. Another example is that memset to 0 followed by free results in the memset being elided - again no UB but it's just the compiler being too smart, totally defeating the programmers intention to make things secure. The fact that compilers are so aware of the standard library is somewhat weird to me.

This is yet again another case of compiler pattern matching without knowing the context.  memset to 0 followed by free can just as well fall out of other optimizations, and it generally makes sense to elide the zero'ing just like it makes sense to remove other needless operations that weren't needless on their own but become such after sufficient prior optimization.

Thanks for bringing up the CPU example - I was going to bring that up myself. What does a CPU do when you do something illegal? - it will either fail to decode if it is an illegal instruction or  throw an exception. It won't silently recompile your code to eliminate checks, return early or call your parents. Dividing by zero in a cpu causes a divide by zero exception. Shifting by more than the width of an integer has implementation defined behavior. Neither of these "errors" can cause your cpu to issue instructions that will delete your hard-drive. Programmers are fine with implementation defined behavior (note: it HAS to be defined by all valid implementations even though not by the standard) IMHO and not undefined behavior which can cause anything including erasing your hard drive or making demons fly out of your nose.

I wasn't thinking of CPUs in terms of them doing completely nonsensical things in the face of illegal instructions, traps, etc.  Instead, I was thinking of them doing things out of order, specifically reordering memory loads/stores with respect to their program order.  If someone is writing racy programs that happen to work today, should the CPU vendor be disallowed from making further reorder optimizations for fear of breaking someone whose code happens to be working today by virtue of current CPU reordering? This is analogous to more aggressively optimizing things like memory_order_relaxed.

I wasn't quite arguing for the transformations - I was surprised that the compilers didn't do it given all the other transformations they do. I know how difficult it is for compilers to warn users of UB when they take advantage of it especially because of how it falls out of other passes like inlining etc. But it goes to show that compiler writers will prioritize optimizations for UB instead of building the (really really complex) infrastructure required to pass information between optimization passes so that undefined behavior can be pointed out.

I think they're trying to get better at this, both from compiler warnings and the toolchains including address sanitizers, thread sanitizers, UB sanitizers, etc.  Compiler writers are like anyone else in terms of exploiting constraints -- if you're programming something with certain constraints on the domain, it behooves you to take advantage of them to optimize performance, at least in release builds.  Now, if those constraints happen to be hard to program against for the users or error prone, well then the blame is with the constraints (i.e. C/C++ spec).

But let's look at the other side of the token - Java.  It has a lot more safety and defined behavior, but poor performance model.  The JIT now needs to pull heroics to try and recover some of the performance loss.  It's able to do that in some cases, but not nearly all of them leading to people opting out of those safety checks.  I do think that safe-by-default with explicit opt-in escape hatches is a better default than fully unsafe.  Its optimizer isn't nearly as aggressive because it has to ensure the semantics are preserved with optimizations applied (i.e. java code running in interpreter must behave the same way as JIT compiled code, modulo speed).  A quick and simple example of such an artifact is supposing you have:

class Bar {
    final Foo _foo;

    Bar(Foo foo) { _foo = foo; }

    int bar() {
        return _foo.doSomething();
    }
}

class Foo {
    int doSomething() { return 8; }
}

JIT inlines doSomething and returns 8 from bar(), but it still performs an explicit null check of _foo.  Instance final fields aren't treated as compile-time constants (but, mind you, *static* final fields are!) because there are frameworks that set them via reflection even though that's not defined behavior.  So now Hotspot engineers are thinking of clever ways of addressing this by speculatively treating them as constants, but detecting such writes to them dynamically and deopt'ing the compiled code.  But now that's a lot of complexity (which may bring about its own bugs) to perpetuate users' bad behavior which was never allowed but happened to work.  The "rest of us" who aren't doing such things get worse performance in the meantime.
 
So basically every language will have some warts.  C/C++ tend to care a lot more about performance than safety vs, say, Java and they focus more on exploiting that.  Java cares more about safety/portability/uniformity, and will always lean towards those things if there's ever a safety vs performance tradeoff to be made.  No matter what, with both languages having so many users, there will always be people who are unhappy with the direction :).
 
I want a new language - have had enough of C :)

You're not alone, but thus far there's no suitable replacement for the niches C occupies (things like Rust look promising, but it'll need to attain some critical mass of users before gaining sufficient momentum to make a dent here, IMO).



Rajiv Kurian

Feb 9, 2016, 5:02:12 PM
to mechanical-sympathy


On Tuesday, February 9, 2016 at 12:54:45 PM UTC-8, Vitaly Davidovich wrote:
Yes the language semantics are to be blamed - no argument there. The fact that C/C++ is chock full of UB instead of implementation defined behavior is to be blamed.

I'm not sure implementation defined is so much better (it is better though) than UB.  It makes code non portable across compilers, or perhaps even across compiler versions as each version may have different implementation-defined semantics.
Implementation-defined is a lot better IMHO. We can't have a single standard behavior for everything because it varies by hardware. Shifting by more than the size of a register has different semantics on PowerPC vs x86, for example. I'd be fine living with a couple of extra masks, though. If one is writing cross-platform code, then one has to be aware of these features, just like one has to be aware of endianness for many applications, or of a platform's specific vector instructions, or of the FMA instructions on a DSP. C programmers already have to do this for different platforms to get the best out of them.

As a sad side note undefined behavior has now even crept into the so called language agnostic backends like LLVM. We are at a state that it is not even possible to verify that a C/C++ program does not invoke UB. I agree with you, we need a different language with more well defined semantics. I know that every transformation that the compiler is doing is legal - my only point was that it breaks real programs and compiler devs know it will and they still don't care.

While I understand how some optimizations the compiler does can seem hostile to the developer, I don't think it's because compiler devs don't care.  They simply look for code patterns/shapes, and then optimize them the exact same way -- the compiler cannot differentiate between a security-sensitive zero'ing of a stack buffer vs needless zero'ing that occurred because a lot of code was inlined, constants folded and propagated, branches folded/pruned out, calls devirtualized, and you're left with a pointless zero'ing once the dust settles.  So it seems like the correct/reasonable thing to do here is find ways to communicate to the compiler that some piece of code cannot be optimized a certain way.  AFAIK, there are security specific zero'ing calls in various C runtimes.  But that's the gist of the issue -- compiler simply does not have enough info to determine true intent.
I don't agree. malloc, free, memset etc. are part of the standard library and not part of the C language. The free call will ultimately transform to some code that writes some bits in the memory allocator. The C compiler cannot possibly know the semantics of my program and that the zeroing is redundant. I could preload my own memory allocator that requires zeroing before free. C programmers are in a constant battle with compilers over how to trick the compiler into doing the right thing, or passing in new and shiny (NOT) flags like -fno-tree-loop-distribute-patterns to get rid of these optimizations. Further, most of the optimizations that they do are not making programs any faster, at least not in the right way. I don't understand why so many people are okay with undefined behavior. Oh, you are trying to shift a 32-bit register by 33? I won't give you a compiler error, and many times not even a warning - but I'll silently optimize your entire program to a rep ret. This is not acceptable behavior. If it is undefined behavior and the compiler knows it well enough to take advantage of it, it should report it. This is the basic expectation one has of typed programs. Yes, it is technically difficult, but users don't care. I want correct first, then fast. 'rep ret' is faster than 'sal', but who cares about that kind of speed?
 

The memset, memcpy examples are particularly grievous and illustrates my point better - writing my own memset does not cause UB, but the compiler changes my program's semantics a lot. Another example is that memset to 0 followed by free results in the memset being elided - again no UB but it's just the compiler being too smart, totally defeating the programmers intention to make things secure. The fact that compilers are so aware of the standard library is somewhat weird to me.

This is yet again another case of compiler pattern matching without knowing the context.  memset to 0 followed by free can just as well fall out of other optimizations, and it generally makes sense to elide the zero'ing just like it makes sense to remove other needless operations that weren't needless on their own but become such after sufficient prior optimization.
Again memset is a std lib concept, as is free. The compiler is not free to assume what free means. 

Thanks for bringing up the CPU example - I was going to bring that up myself. What does a CPU do when you do something illegal? - it will either fail to decode if it is an illegal instruction or  throw an exception. It won't silently recompile your code to eliminate checks, return early or call your parents. Dividing by zero in a cpu causes a divide by zero exception. Shifting by more than the width of an integer has implementation defined behavior. Neither of these "errors" can cause your cpu to issue instructions that will delete your hard-drive. Programmers are fine with implementation defined behavior (note: it HAS to be defined by all valid implementations even though not by the standard) IMHO and not undefined behavior which can cause anything including erasing your hard drive or making demons fly out of your nose.

I wasn't thinking of CPUs in terms of them doing completely nonsensical things in the face of illegal instructions, traps, etc.  Instead, I was thinking of them doing things out of order, specifically reordering memory loads/stores with respect to their program order.  If someone is writing racy programs that happen to work today, should the CPU vendor be disallowed from making further reorder optimizations for fear of breaking someone whose code happens to be working today by virtue of current CPU reordering? This is analogous to more aggressively optimizing things like memory_order_relaxed.
Racy programs have undefined ordering, but they can't delete your hard drive, can they? Undefined ordering is different from undefined behavior. A shift by 33 on a 32-bit register is very well defined on Intel; in C it means nasal demons. Would it be okay for any CPU vendor to have such a large hole in their spec? John Regehr and others have (in jest, I hope) proposed compilers that actually delete hard drives to teach C programmers a lesson :)

I wasn't quite arguing for the transformations - I was surprised that the compilers didn't do it given all the other transformations they do. I know how difficult it is for compilers to warn users of UB when they take advantage of it especially because of how it falls out of other passes like inlining etc. But it goes to show that compiler writers will prioritize optimizations for UB instead of building the (really really complex) infrastructure required to pass information between optimization passes so that undefined behavior can be pointed out.

I think they're trying to get better at this, both from compiler warnings and the toolchains including address sanitizers, thread sanitizers, UB sanitizers, etc.  Compiler writers are like anyone else in terms of exploiting constraints -- if you're programming something with certain constraints on the domain, it behooves you to take advantage of them to optimize performance, at least in release builds.  Now, if those constraints happen to be hard to program against for the users or error prone, well then the blame is with the constraints (i.e. C/C++ spec).
 
The only constraint that I know of that actually leads to an improvement in speed in a regular C program is eliding checks for wrap of signed integers, because the compiler can then auto-vectorize. Auto-vectorization itself is rarely more than code bloat, but that is a different story. Everything else that I can think of can be legally optimized to 'rep ret' (and is, by Clang at least), and what programmer cares about the speed of 'rep ret'? Sanitizers are great. I love them, but I don't like having to run my program to find out whether it is valid - especially after I have run the compiler for 15 minutes.

But let's look at the other side of the token - Java.  It has a lot more safety and defined behavior, but poor performance model.  The JIT now needs to pull heroics to try and recover some of the performance loss.  It's able to do that in some cases, but not nearly all of them leading to people opting out of those safety checks.  I do think that safe-by-default with explicit opt-in escape hatches is a better default than fully unsafe.  Its optimizer isn't nearly as aggressive because it has to ensure the semantics are preserved with optimizations applied (i.e. java code running in interpreter must behave the same way as JIT compiled code, modulo speed).  A quick and simple example of such an artifact is supposing you have:

final Foo _foo;

int bar() {
   return _foo.doSomething();
}

class Foo {
   int doSomething() { return 8;}
}

JIT inlines doSomething and returns 8 from bar(), but it still performs an explicit null check of _foo.  Instance final fields aren't treated as compile-time constants (but, mind you, *static* final fields are!) because there are frameworks that set them via reflection even though that's not defined behavior.  So now Hotspot engineers are thinking of clever ways of addressing this by speculatively treating them as constants, but detecting such writes to them dynamically and deopt'ing the compiled code.  But now that's a lot of complexity (which may bring about its own bugs) to perpetuate users' bad behavior which was never allowed but happened to work.  The "rest of us" who aren't doing such things get worse performance in the meantime.
 
So basically every language will have some warts.  C/C++ tend to care a lot more about performance than safety vs, say, Java and they focus more on exploiting that.  Java cares more about safety/portability/uniformity, and will always lean towards those things if there's ever a safety vs performance tradeoff to be made.  No matter what, with both languages having so many users, there will always be people who are unhappy with the direction :).

C's spec is too full of undefined behavior to have any confidence about software written in it. Fast comes after correct. The reason it has worked so far is that the compilers were less aggressive, the people who cared were very strict about checking assembly, and then sheer luck. All the vulnerabilities in software written by expert C programmers kind of go against the whole sentiment of "I am good enough to write C without UB, so don't penalize me for the less smart people". It is easy to get swept up by this sentiment, but if Ulrich Drepper can write undefined behavior into glibc, and OpenSSL has a new problem every other day, then I am pretty sure I will write it too. As the joke goes - C is not a portable assembler, it is a portable undefined behavior generator. And now all of these undefined behaviors don't require a seg fault or a trap - they can be silently "optimized" to a no-op. What a great world to live in :) I am no fan of Java, but I am glad they chose safety over a couple of cycles here and there. If they had SIMD intrinsics, I am pretty sure one could push it to within a couple of percent of C and call it a day. The speed issues seem to be much more about lack of layout control and the wanton use of heap allocation than anything else. The reflection example is truly horrific, but another language can come around that is still AOT and has well-defined or at least implementation-defined semantics. I have been keeping an eye on WebAssembly, and so far it seems like the best option I've seen.
 
I want a new language - have had enough of C :)

You're not alone, but thus far there's no suitable replacement for the niches C occupies (things like Rust look promising, but it'll need to attain some critical mass of users before gaining sufficient momentum to make a dent here, IMO.).
Rust is right now (hopefully not forever) tied pretty tightly to the LLVM toolchain, so they are exposed to the same kinds of issues. In fact, the memcpy optimization left people dumbfounded on their mailing list when they first hit it. They do seem to have a much smaller UB surface area than C. Given that they are not much slower than C, it kind of goes to show that all this UB "optimization" makes very little difference outside of a benchmark. I like the language a lot and would gladly use it once it gets a bit more mature. It is, however, extremely difficult for me to not use the heap in steady state in Rust; shutting up the borrow checker is so tough otherwise :(



Dan Eloff

Feb 9, 2016, 5:19:01 PM
to mechanica...@googlegroups.com
I think that's fair; "If you dance barefoot on the broken glass of undefined behavior, you've got to expect the occasional cut," as someone once put it quite colorfully. But the problem I see is that there is so much UB and nothing is done about it. Changing the rules to make UB defined is fully backwards compatible ("it works" fits nicely inside the Venn diagram circle of undefined).

It's possible that people want to keep the UB because some of these UB rules let compilers make optimizations that they otherwise could not - for example, I've heard it said that signed overflow being UB lets compilers optimize loops in ways they can't with unsigned integers (they can assume the loop will terminate). However, I don't find that a likely explanation.

The more likely explanation is that hardware can be wildly varied and C and C++ are meant to run everywhere.

But instead of coming up with all these "silly" extensions to C++ like user-defined literal suffixes, why don't people add some extensions to limit the UB? Is it not reasonable for me to want to give the compiler an argument meaning "don't treat things as UB that are actually well defined on the compilation target hardware"? Then suddenly signed overflow is defined on any platform I care to compile for. If I then compile on something exotic, my code has UB, but that's my bad. Or, alternately, allow me to specify which UB I want treated as defined, just like I can enable and disable specific optimizations via compiler flags. It's my bad if I compile with signed overflow allowed for a target which, e.g., traps on signed overflow.

There seems no reason to me that we can't have a stricter C++ without having to create a new language.


Dan Eloff

Feb 9, 2016, 5:25:35 PM
to mechanica...@googlegroups.com
Racy programs have undefined ordering, but they can't delete your hard drive can they?

They can delete your hard drive. They could even delete you, if the program controls things connected to the real world, like the brakes on your car or a nuclear missile in a silo :)


To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


--
Sent from my phone


Rajiv Kurian

unread,
Feb 9, 2016, 6:04:30 PM2/9/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 2:25:35 PM UTC-8, Daniel Eloff wrote:
Racy programs have undefined ordering, but they can't delete your hard drive can they?

They can delete your hard drive. They could even delete you, if the program controls things connected to the real world, like the brakes on your car or a nuclear missile in a silo :)

I've read that article. The important difference is that x86 is not conspiring to change your code; it's the compiler. Do you like that C is legally allowed to call your launch_nuclear_missile() routine because your unrelated code for updating a counter had a race? Most of the points in that article have been explained by Hans Boehm, and the transformations that could go wrong are done by the compiler, not the hardware. Quoting him: "Several prior research contributions [15, 9] have explored the problem of distinguishing "benign" and harmful data races to make it easier for programmers to focus on a subset of the output from a data race detector. Here we argue that, although such a distinction makes sense at the machine code level, it does not make sense at the C or C++ source code level." (emphasis mine).



Vitaly Davidovich

unread,
Feb 9, 2016, 6:13:31 PM2/9/16
to mechanica...@googlegroups.com
The CPU will not change your code, but it can execute it in a different order, leading to undefined behavior from the application's point of view. It may not cause things like buffer overflows or double frees in managed languages, but it may make you send bogus orders to an exchange, corrupt a database, turn off a pacemaker, make the autopilot on a plane do something bogus, and so on. If you rely on what memory_order_relaxed does today (what started this discussion), your code will break if that changes. The bottom line is that if you rely on unspecified behavior, you can get undefined behavior in your system. If you rely on implementation details of a library, your code will break when those change. How exactly the bug manifests itself is secondary.

Dan Eloff

unread,
Feb 9, 2016, 6:14:17 PM2/9/16
to mechanica...@googlegroups.com
While the compiler is definitely the usual suspect, I think there are plenty of ways x86 (without any CPU bugs) could still call your launch_nuclear_missile() routine in a program with data races, without the compiler having thrown you under the bus first. Programs with data races are undefined behavior even without any compiler optimizations enabled.


Todd Montgomery

unread,
Feb 9, 2016, 6:15:47 PM2/9/16
to mechanical-sympathy
Some UB is for things the compiler can't foresee or can't truly detect. Check out the UB associated with const_cast. http://en.cppreference.com/w/cpp/language/const_cast
Then imagine if that had to be defined for all possible access patterns.

Personally, I can deal with UB. I can find the rules and I can, for the most part, understand them. Whether it be a hole in the spec or just avoiding a compatibility nightmare.

This has been a great discussion. Since we are bashing on C++ some, thought I would throw some stuff out for thought.

I've got to say, though, the memcpy/memset optimization behavior is not new. Variables and actions being optimized away is not new. Relying on side effects has always been a dicey idea, no matter the language. Java has the same issues with variables that can be optimized out, which is common in micro-benchmarks. It should not be surprising that stdlib is optimized, since it is now part of the C++ spec and has been part of the C spec for a very long time. It has been optimized in many compilers for a while.


In fact, these issues are so well known and understood that there are CERT advisories attached to some of these and similar practices. Some links.


And if you think that Java is a secure language.... yeah, read through 


After doing research for NASA in software safety for a number of years, I can say there is no such thing as a safe/secure language/system/process, except in the head of some marketing person.

Rajiv Kurian

unread,
Feb 9, 2016, 6:31:51 PM2/9/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 3:13:31 PM UTC-8, Vitaly Davidovich wrote:
The CPU will not change your code, but it can execute it in a different order, leading to undefined behavior from the application's point of view. It may not cause things like buffer overflows or double frees in managed languages, but it may make you send bogus orders to an exchange, corrupt a database, turn off a pacemaker, make the autopilot on a plane do something bogus, and so on. If you rely on what memory_order_relaxed does today (what started this discussion), your code will break if that changes. The bottom line is that if you rely on unspecified behavior, you can get undefined behavior in your system. If you rely on implementation details of a library, your code will break when those change. How exactly the bug manifests itself is secondary.
 
I think we agree that one should not depend on the implementation details of a library. I also agree that most of the changes the C compiler makes are legal, because of the lax spec. The lax spec is what I am against. It means that operations perfectly defined by the underlying hardware are potentially converted to nasal demons instead of causing compiler errors. Please note the big difference: x86 allows "sal eax, 33", but C compilers choose to convert a shift by 33 into a 'rep ret' without a single warning. So the compiler identifies the problem, takes advantage of it, and chooses NOT to inform you about it. And you are okay with that?
 

Todd Montgomery

unread,
Feb 9, 2016, 6:50:18 PM2/9/16
to mechanical-sympathy
Are you sure about that? Yeah, I know the spec may be a bit lax. The language is: "In any case, the behavior is undefined if rhs is negative or is greater than or equal to the number of bits in the promoted lhs." However, most compilers I have seen do warn when the warning levels are set appropriately, e.g.

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    uint32_t value = 42;
    uint32_t value2 = value << 33;
    uint32_t value3 = value >> 33;
}

$ gcc --std=c11 -Wall t.c

t.c:9:14: warning: unused variable 'value3' [-Wunused-variable]
    uint32_t value3 = value >> 33;
             ^
t.c:8:14: warning: unused variable 'value2' [-Wunused-variable]
    uint32_t value2 = value << 33;
             ^
t.c:8:29: warning: shift count >= width of type [-Wshift-count-overflow]
    uint32_t value2 = value << 33;
                            ^  ~~
t.c:9:29: warning: shift count >= width of type [-Wshift-count-overflow]
    uint32_t value3 = value >> 33;
                            ^  ~~
4 warnings generated.


Rajiv Kurian

unread,
Feb 9, 2016, 6:51:27 PM2/9/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 3:15:47 PM UTC-8, Todd L. Montgomery wrote:
Some UB is for things the compiler can't foresee or can't truly detect. Check out the UB associated with const_cast. http://en.cppreference.com/w/cpp/language/const_cast
Then imagine if that had to be defined for all possible access patterns.
Any UB the compiler knowingly decides to take advantage of during compilation (through optimizing away and/or nasal demons) could be reported as an error if the compiler writers tried hard enough. The const_cast behavior you linked to cannot, so it's different. The fact that const_cast exists at all is all one needs to know about C++.

Personally, I can deal with UB. I can find the rules and I can, for the most part understand them. Whether it be a hole in the spec or just avoiding a compatibility nightmare.
Personally, I can't. Given the bugs from just null checks optimized away in Linux, OpenSSL, glibc, etc., I am guessing I am not the only one.
 

This has been a great discussion. Since we are bashing on C++ some, thought I would throw some stuff out for thought.

I've got to say, though, the memcpy/memset optimization behavior is not new. Variables and actions being optimized away is not new. Relying on side effects has always been a dicey idea, no matter the language. Java has the same issues with variables that can be optimized out, which is common in micro-benchmarks. It should not be surprising that stdlib is optimized, since it is now part of the C++ spec and has been part of the C spec for a very long time. It has been optimized in many compilers for a while.
How is memset a side effect? Unless the compiler can prove that the memsetted memory is never read again (it can't, because the memory is actually reused by the allocator), it is not free to optimize away the memset. It can't even look at the C code for the free call, given that I could load it at run time. Java does a rigorous job of proving there are no observable effects before optimizing away those variables.
"Do not depend on undefined behavior" is easier said than done. Just as compilers find it difficult to report undefined behavior (but have no difficulty exploiting it), undefined behavior can arise from multiple levels of inlining of the user's code, and a human might not be able to track it as well as the compiler. Why else are smart people like Ulrich Drepper and the maintainers of OpenSSL writing so much UB?


And if you think that Java is a secure language.... yeah, read through 

That seems like a long list. A lot of it seems to be about vulnerabilities like SQL injection, which is not a Java thing. Others seem to be about catching exceptions, which is a best practice. None of the points seem to be undefined behavior. Java has some implementation-defined behavior, which is very different from C++'s nasal-demons undefined behavior.

Rajiv Kurian

unread,
Feb 9, 2016, 6:56:01 PM2/9/16
to mechanical-sympathy
Yes, I am sure. Allow me to demonstrate - http://goo.gl/Hjiaji. This is why it is not easy for humans to figure out undefined behavior: inlining causes undefined behavior that is not present in any of the individual functions. Now imagine I pass the wrong number to code in another module whose source I have never checked. The compiler might do LTO, inline my call, and then take advantage of undefined behavior without even telling me about it.

Rajiv Kurian

unread,
Feb 9, 2016, 6:57:48 PM2/9/16
to mechanical-sympathy
Here is the output with -Wall -Wextra -pedantic-errors: http://goo.gl/Lp8cm4

Todd Montgomery

unread,
Feb 9, 2016, 7:08:21 PM2/9/16
to mechanical-sympathy
On Tue, Feb 9, 2016 at 3:51 PM, Rajiv Kurian <geet...@gmail.com> wrote:

"Do not depend on undefined behavior" is easier said than done. Just as compilers find it difficult to report undefined behavior (but have no difficulty exploiting it), undefined behavior can arise from multiple levels of inlining of the user's code, and a human might not be able to track it as well as the compiler. Why else are smart people like Ulrich Drepper and the maintainers of OpenSSL writing so much UB?

Writing quality code isn't about being smart so much as being rigorous and relentless. Having seen OpenSSL, it grew a lot of cruft over the years. It has issues. A lot of them. Some obvious, some not. And anyone can make mistakes.

If we started blaming languages as the root of all evil, we'd better start with JavaScript, because there are some horrendous web apps out there. :)
 


And if you think that Java is a secure language.... yeah, read through 

That seems like a long list. A lot of it seems to be about vulnerabilities like SQL injection, which is not a Java thing. Others seem to be about catching exceptions, which is a best practice. None of the points seem to be undefined behavior. Java has some implementation-defined behavior, which is very different from C++'s nasal-demons undefined behavior.

Java is a language controlled by a single company. C++ is a standards committee. This has pluses and minuses and compromises.

To be more specific around Java, check into the layers of buffer, etc. around https://www.securecoding.cert.org/confluence/display/java/MSC59-J.+Limit+the+lifetime+of+sensitive+data
Which is similar in nature. Different beasts, but similar security intention.

Rajiv Kurian

unread,
Feb 9, 2016, 7:19:57 PM2/9/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 4:08:21 PM UTC-8, Todd L. Montgomery wrote:


On Tue, Feb 9, 2016 at 3:51 PM, Rajiv Kurian <geet...@gmail.com> wrote:

"Do not depend on undefined behavior" is easier said than done. Just as compilers find it difficult to report undefined behavior (but have no difficulty exploiting it), undefined behavior can arise from multiple levels of inlining of the user's code, and a human might not be able to track it as well as the compiler. Why else are smart people like Ulrich Drepper and the maintainers of OpenSSL writing so much UB?

Writing quality code isn't about being smart so much as being rigorous and relentless. Having seen OpenSSL, it grew a lot of cruft over the years. It has issues. A lot of them. Some obvious, some not. And anyone can make mistakes.

When experienced coders are regularly tapping into UB, one can blame the language. It is a weird mix of too low level but not low level enough to be specific. We deserve better. It's been more than 50 years.
If we started blaming languages as the root of all evil, we'd better start with JavaScript, because there are some horrendous web apps out there. :)
I write and read C++ regularly, so I am more passionate about blaming it. I am sure I'd be blaming JavaScript if I wrote it more often.
 


And if you think that Java is a secure language.... yeah, read through 

That seems like a long list. A lot of it seems to be about vulnerabilities like SQL injection, which is not a Java thing. Others seem to be about catching exceptions, which is a best practice. None of the points seem to be undefined behavior. Java has some implementation-defined behavior, which is very different from C++'s nasal-demons undefined behavior.

Java is a language controlled by a single company. C++ is a standards committee. This has pluses and minuses and compromises.
Preaching to the choir. Like others have said before, I reckon that changing these undefined behaviors to implementation-defined behavior would already make a big difference. Almost no real-world optimization opportunities would be lost, and if a few were, there are always ways to gain them back.

To be more specific around Java, check into the layers of buffer, etc. around https://www.securecoding.cert.org/confluence/display/java/MSC59-J.+Limit+the+lifetime+of+sensitive+data
Which is similar in nature. Different beasts, but similar security intention.
All of them seem to stem from Java's garbage collection and objects not disappearing as soon as they go out of scope. The compiler is not eliminating memsets; there is a very straightforward way of writing zeros into a buffer in Java. C's compilers eliminate zeroing of memory even when it is actually read later, once it is handed back by the memory allocator. Moreover, given that there is little (if any) undefined behavior in Java, it doesn't do any of the crazy things that C compilers allow.

Todd Montgomery

unread,
Feb 9, 2016, 7:23:15 PM2/9/16
to mechanical-sympathy
int is platform-dependent in size; it must be "at least" 16 bits. http://en.cppreference.com/w/c/language/arithmetic_types
And yes, some platforms do have 16-bit ints and some 64-bit ints. I'm a fan of using fixed-size types.

Interesting: if you play around with that, it seems to warn only when it can find a constant/literal that is passed directly to "<<", i.e. change shift45 to return z << 45 and the warning shows up.
To me this suggests the compiler can't infer that particular warning past the function declaration. Which isn't surprising, really.




Rajiv Kurian

unread,
Feb 9, 2016, 7:40:19 PM2/9/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 4:23:15 PM UTC-8, Todd L. Montgomery wrote:
int is platform dependent in size. It must be "at least" 16-bits. http://en.cppreference.com/w/c/language/arithmetic_types
And, yes, some platforms do have 16 bit and some 64 bit ints. I'm a fan of using fixed size types.
Doesn't change anything. Try int32_t or uint32_t.

Interesting: if you play around with that, it seems to warn only when it can find a constant/literal that is passed directly to "<<", i.e. change shift45 to return z << 45 and the warning shows up.
To me this suggests the compiler can't infer that particular warning past the function declaration. Which isn't surprising, really.
It can definitely infer past the function declaration, as it has so helpfully inlined the function, figured out the UB, and replaced my code with a 'ret'. The problem is that the subsystem of the compiler that gives the warnings (the front end) is not the same as the one making the optimization decision (the backend). Further, the backend pass that makes this decision is unpredictable too. Compilers could track the transformations through the front end and each of the backend passes, but that would be a gigantic effort. Anyway, it goes to show how difficult it is to know whether your code has undefined behavior. A library you use might be perfectly fine, your code in isolation might be completely fine, but combine them (or more), mix in a magic sprinkling of inlining, and your program might have undefined behavior.



Todd Montgomery

unread,
Feb 9, 2016, 7:42:10 PM2/9/16
to mechanical-sympathy
On Tue, Feb 9, 2016 at 4:19 PM, Rajiv Kurian <geet...@gmail.com> wrote:
"Do not depend on undefined behavior" is easier said than done. Just as compilers find it difficult to report undefined behavior (but have no difficulty exploiting it), undefined behavior can arise from multiple levels of inlining of the user's code, and a human might not be able to track it as well as the compiler. Why else are smart people like Ulrich Drepper and the maintainers of OpenSSL writing so much UB?

Writing quality code isn't about being smart so much as being rigorous and relentless. Having seen OpenSSL, I can say it grew a lot of cruft over the years. It has issues. A lot of them.
Some obvious. Some not. And also, anyone can make mistakes.

When experienced coders are regularly tapping into UB, one can blame the language. It is a weird mix: too low level, yet not low level enough to be specific. We deserve better. It's been more than 50 years.

I can't argue with that. It's a firearm without any safety. It's wise to carry it carefully.
 
 
If we started blaming a language as the root of all evil, we had better start with JavaScript, because there are some horrendous web apps out there. :)
I write and read C++ regularly so I am more passionate about blaming it. I am sure I'd be blaming Javascript if I wrote it more often.

I tend to not blame the language. 

In C/C++, when I find something amiss, I check the spec and look into it. Usually I find out I've been doing something wrong and the spec says  don't do it. To which I just sigh and learn. I don't like it, I just accept it. Call me resigned. 

In Java, I tend to look at the doc, see basically it told me what to expect, then sigh and learn. Except when the optimizer won't inline. Then I use coarse language and throw things. :)
 
 


And if you think that Java is a secure language.... yeah, read through 

That seems like a long list. A lot of it seems to be about vulnerabilities like SQL injection, which is not a Java thing. Others seem to be about catching exceptions, which is a best practice. None of the points seem to be undefined behavior. Java has some implementation-defined behavior, which is very different from nasal-demons C++ undefined behavior.

To be more specific around Java, check into the layers of buffer, etc. around https://www.securecoding.cert.org/confluence/display/java/MSC59-J.+Limit+the+lifetime+of+sensitive+data
Which is similar in nature. Different beasts, but similar security intention.
All of them stem from Java garbage collection and objects not disappearing as soon as they are out of scope. The compiler is not eliminating memsets; there is a very straightforward way of writing zeros into a buffer in Java. C compilers eliminate zeroing out of memory even when it is actually read later, once handed back by the memory allocator. Moreover, given there is little (if any) undefined behavior in Java, it doesn't do any of the crazy things that C compilers allow one to do.

True. But my point was that Java isn't really as secure a language as we might think. The biggest security vulnerability isn't the language. It's the developers' (lack of) rigor. And I would also extend that to safety.

Todd Montgomery

unread,
Feb 9, 2016, 7:47:40 PM2/9/16
to mechanical-sympathy
By infer, I did mean the warning subsystem. We could argue about the merits of how best to surface this type of thing so that the optimization and the warning are both happy. But your point is well taken. Sometimes the compiler can warn and sometimes it can't, just because of how it is implemented. And with gray areas of the spec, it is going to be messy.


Rajiv Kurian

unread,
Feb 9, 2016, 7:50:53 PM2/9/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 4:42:10 PM UTC-8, Todd L. Montgomery wrote:


On Tue, Feb 9, 2016 at 4:19 PM, Rajiv Kurian <geet...@gmail.com> wrote:
"Do not depend on undefined behavior" is easier said than done. Just as compilers find it difficult to report undefined behavior (but have no difficulty exploiting it), undefined behavior can arise from multiple levels of inlining of the user's code, and a human might not be able to track it as well as the compiler. Why else are smart people like Ulrich Drepper and the maintainers of OpenSSL writing so much UB?

Writing quality code isn't about being smart so much as being rigorous and relentless. Having seen OpenSSL, I can say it grew a lot of cruft over the years. It has issues. A lot of them.
Some obvious. Some not. And also, anyone can make mistakes.

When experienced coders are regularly tapping into UB, one can blame the language. It is a weird mix: too low level, yet not low level enough to be specific. We deserve better. It's been more than 50 years.

I can't argue with that. It's a firearm without any safety. It's wise to carry it carefully.
 
 
If we started blaming a language as the root of all evil, we had better start with JavaScript, because there are some horrendous web apps out there. :)
I write and read C++ regularly so I am more passionate about blaming it. I am sure I'd be blaming Javascript if I wrote it more often.

I tend to not blame the language. 

In C/C++, when I find something amiss, I check the spec and look into it. Usually I find out I've been doing something wrong and the spec says  don't do it. To which I just sigh and learn. I don't like it, I just accept it. Call me resigned. 

In Java, I tend to look at the doc, see basically it told me what to expect, then sigh and learn. Except when the optimizer won't inline. Then I use coarse language and throw things. :)

Meh, I blame the language. The language being the spec, that is. The language is specific about when it is NOT specific - which doesn't help much, because it is ultimately not very specific. I like this test that I read on Twitter the other day: would you use a compiler which will give you 4x speed, but if you have a single undefined behavior anywhere it will kill your entire family? I'd choose no if I were given the choice :)

Rajiv Kurian

unread,
Feb 9, 2016, 8:03:02 PM2/9/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 4:47:40 PM UTC-8, Todd L. Montgomery wrote:
By infer, I did mean the warning subsystem. We could argue about the merits of how best to surface this type of thing so that the optimization and the warning are both happy. But your point is well taken. Sometimes the compiler can warn and sometimes it can't, just because of how it is implemented. And with gray areas of the spec, it is going to be messy.
Yes, how to surface this will be technically challenging. Ultimately I think every undefined behavior that the compiler can detect (and optimize away) should be a compiler error. It will be challenging to surface that as an actionable error to the user, given they will have to be told about the multiple levels of inlining etc. that led to the UB. Or we could make things implementation-defined, which would be a huge upgrade over UB IMHO.

Todd Montgomery

unread,
Feb 9, 2016, 8:30:21 PM2/9/16
to mechanical-sympathy
On Tue, Feb 9, 2016 at 4:50 PM, Rajiv Kurian <geet...@gmail.com> wrote:

Meh, I blame the language. The language being the spec, that is. The language is specific about when it is NOT specific - which doesn't help much, because it is ultimately not very specific. I like this test that I read on Twitter the other day: would you use a compiler which will give you 4x speed, but if you have a single undefined behavior anywhere it will kill your entire family? I'd choose no if I were given the choice :)


Then you shouldn't choose Java either, since its behavior isn't defined in the face of random memory corruption due to hardware failure. ;)

Those types of arguments aren't very convincing as they narrow the choice to a logical fallacy and it's also a strawman. It could be rephrased many ways to limit the choice to whatever you wish (response time, security, etc.)

Having had to help verify systems and assess risk where loss of life and craft was a non-trivial possibility, I wish the choice of language were that simple. But it sadly isn't. All specs are incomplete, all languages are flawed in some way given arbitrary requirements.


Rajiv Kurian

unread,
Feb 9, 2016, 8:41:17 PM2/9/16
to mechanical-sympathy
That's a strawman if I ever heard one. All languages are crap - so every choice is equally valid. I don't much like Java, but it is definitely tougher to shoot yourself in the foot with it. Does hardware failure affect Java measurably more than anything else? I am not sure about that. Do security problems from UB affect C code more than Java? I think so. My hope is for a new language, like I've said many times on this thread - not to replace C with Java. Till then I can only hope that I don't fall into one of the hundreds of traps that people much more experienced than me have.

Todd Montgomery

unread,
Feb 9, 2016, 8:59:37 PM2/9/16
to mechanical-sympathy
My apologies. Didn't mean to give the impression that I think all languages are crap. Just that the choice of language is not simple.
The point about specs being incomplete, flawed, etc. was intended as a separate point. Worked on too many standards docs, read too many. I'm cynical.

For the record, I don't have a horse in the race. I've used a lot of C/C++. And within the last 7 years done a good bit of Java. But I constantly look at other
languages anyway. I set a personal goal to learn at least one new language every year.

Does hardware failure affect C more than Java? Of course not. I was just pointing out there is always some undefined nature.
Do security problems from UB affect C more than Java? I think so, too. Sadly.

Really enjoyed the discussion, Rajiv. Thanks!



Rajiv Kurian

unread,
Feb 9, 2016, 9:08:36 PM2/9/16
to mechanical-sympathy
Same here - also apologies if I was rude (I am pretty certain I was). Same here - I don't have much of a horse in the race. I started as a Java hater after writing quite a bit of C. But after a couple of years of Java I do see why it is a good trade-off for some software. I have made many arguments against Java and for C++ on this forum in the past - funny how things change.

Enjoyed the discussion too as always Todd.

Todd Montgomery

unread,
Feb 9, 2016, 9:18:22 PM2/9/16
to mechanical-sympathy
Pfft. You weren't rude. :)



Dan Eloff

unread,
Feb 9, 2016, 9:29:57 PM2/9/16
to mechanica...@googlegroups.com
I agree wholeheartedly: if the compiler can detect UB and abuse it, I would rather it dump an (even unhelpful) error instead. And some UB could probably just be fixed or made optional for esoteric platforms.

To get back to the original subject here, after a fascinating diversion, for those interested in RCU-style algorithms like QSBR there's a userspace RCU library:


And the QSBR source from that library can be found here:


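For readers who want the shape of the technique without digging into the library, here is a toy single-reader sketch. The names and structure are hypothetical (this is not the liburcu API): the reader periodically announces a quiescent state, and the writer waits for the reader to pass one before reclaiming memory.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Toy QSBR sketch with one writer and one reader (hypothetical names,
// not the liburcu API). The reader periodically copies the global epoch
// into its own counter; the writer bumps the epoch and waits for the
// reader to observe it before reclaiming retired memory.
std::atomic<uint64_t> global_epoch{1};
std::atomic<uint64_t> reader_seen{1};

// The periodic "idle call" each reader thread must make, as discussed
// in the thread. Between two such calls the reader may hold references
// to shared objects; at the call itself it holds none.
void quiescent_state() {
    reader_seen.store(global_epoch.load(std::memory_order_acquire),
                      std::memory_order_release);
}

// Called by the writer after unlinking an object and before freeing it:
// once the reader has passed a quiescent state under the new epoch, it
// cannot still hold a reference to the unlinked object.
void synchronize() {
    uint64_t target =
        global_epoch.fetch_add(1, std::memory_order_acq_rel) + 1;
    while (reader_seen.load(std::memory_order_acquire) < target)
        std::this_thread::yield();  // reader hasn't reached a quiescent point
}
```

A real implementation such as urcu-qsbr tracks one counter per registered reader thread and handles threads going offline, but the grace-period wait has the same shape.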


Sanjoy Das

unread,
Feb 9, 2016, 11:43:22 PM2/9/16
to mechanica...@googlegroups.com

Dan Eloff wrote:
> I agree wholeheartedly, if the compiler can detect UB and abuse it, I would rather it dump an (even unhelpful) error
> instead.

In theory this can be done, but in practice doing this without a huge
number of false positives is a major layering issue. For instance,
the component that "proves" `x s< (x+1)` usually has no idea if
simplifying that predicate will actually result in a transform that
exploits the UB on overflow. I'd say you'd have to structure a C/C++
compiler from ground up to accommodate the kind of design that would
print a diagnostic when UB has been exploited.
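For concreteness, the `x s< (x+1)` case looks like this. The folding described in the comment is what optimizing compilers typically do at -O2; it is permitted, not guaranteed:

```cpp
#include <climits>

// Signed overflow is UB, so the optimizer is entitled to fold this
// predicate to 1 for every x -- including x == INT_MAX, where a
// two's-complement wrap would have made it false.
int always_greater(int x) {
    return x + 1 > x;
}

// The unsigned version is fully defined: unsigned arithmetic wraps,
// so the predicate is genuinely false at UINT_MAX and the compiler
// cannot fold it away.
int sometimes_greater(unsigned x) {
    return x + 1 > x;
}
```

This is exactly the layering problem: the component that proves `x s< (x+1)` has no easy way to tell whether simplifying the predicate actually exploited the UB.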

Also, relevant to this thread, there have been some proposals around cutting down the amount of
UB in C/C++ to manageable levels:
http://blog.regehr.org/archives/1180.

-- Sanjoy

Francesco Nigro

unread,
Feb 10, 2016, 3:22:15 AM2/10/16
to mechanical-sympathy
For others interested in QSBR:


Anyway, the OTs in this group are always so instructive :)
I was expecting Gil to give his view of the tradeoff "compiler safeness" vs "programmer responsibility", I'm curious...

Vitaly Davidovich

unread,
Feb 10, 2016, 7:54:01 AM2/10/16
to mechanica...@googlegroups.com
I think this discussion is winding down but I'll comment on a few select things ...


On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:


On Tuesday, February 9, 2016 at 3:13:31 PM UTC-8, Vitaly Davidovich wrote:
CPU will not change your code but it can execute it in different order leading to undefined behavior from application point of view.  It may not cause things like buffer overflows or double frees in managed languages, but it may make you send bogus orders to an exchange, corrupt data(base), turn off a pacemaker, make autopilot on a plane do something bogus, and so on.  If you rely on what memory_order_relaxed does today (what started this discussion), your code will break if that changes. The bottom line is if you're relying on unspecified behavior, you can get undefined behavior in your system.  If you rely on implementation details of a library, your code will break when that changes.  How exactly the bug will manifest itself is secondary.
 
I think we agree that one should not depend on the implementation details of your library. I also agree that most of the changes the C compiler makes are legal - because of the lax spec. The lax spec is what I am against. The lax spec means that operations perfectly defined by the underlying hardware are potentially converted to nasal demons instead of  causing compiler errors. Please note the big difference - x86 allows "sal eax 33", but C compilers choose to convert a shift by 33 to a 'rep ret' without a single warning. So the compiler identifies the problem, takes advantage of it and chooses NOT to inform you about it. And you are okay with it?
No, I'm not ok with it.  Your shift45 example in this thread is a good one and illustrates a problem with the compiler itself, not just the spec (I think we agree the spec is the root enabler of this mess).

But, let's suppose overflowing shifts were implementation defined rather than UB.  In the shift example, it's highly likely that shifting an int32 by 45 is a bug.  With impl defined behavior you'll get some answer, but it will be just as wrong as compiler returning 0 since it's a bug.  That wrong answer will have undefined consequences for further execution.

The bigger issue, IMO, is the compiler can remove *subsequent* perfectly legal code if it's dominated by the illegal code.
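A minimal sketch of that effect, using the well-known null-check-deletion pattern (whether a particular compiler actually deletes the check depends on the optimizer; it is entitled to, not obliged to):

```cpp
#include <cstddef>

// The dereference dominates the null check: since dereferencing a null
// pointer is UB, the compiler may assume p != nullptr and delete the
// check below -- perfectly legal subsequent code removed because it is
// dominated by (potential) UB.
int read_or_minus_one(int* p) {
    int v = *p;
    if (p == nullptr)
        return -1;  // may be optimized away entirely
    return v;
}

// Checking before dereferencing keeps the defensive code alive.
int safe_read(int* p) {
    if (p == nullptr)
        return -1;
    return *p;
}
```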

Rajiv Kurian

unread,
Feb 10, 2016, 11:29:20 AM2/10/16
to mechanical-sympathy


On Wednesday, February 10, 2016 at 4:54:01 AM UTC-8, Vitaly Davidovich wrote:
I think this discussion is winding down but I'll comment on a few select things ...

On Tuesday, February 9, 2016, Rajiv Kurian <geet...@gmail.com> wrote:


On Tuesday, February 9, 2016 at 3:13:31 PM UTC-8, Vitaly Davidovich wrote:
CPU will not change your code but it can execute it in different order leading to undefined behavior from application point of view.  It may not cause things like buffer overflows or double frees in managed languages, but it may make you send bogus orders to an exchange, corrupt data(base), turn off a pacemaker, make autopilot on a plane do something bogus, and so on.  If you rely on what memory_order_relaxed does today (what started this discussion), your code will break if that changes. The bottom line is if you're relying on unspecified behavior, you can get undefined behavior in your system.  If you rely on implementation details of a library, your code will break when that changes.  How exactly the bug will manifest itself is secondary.
 
I think we agree that one should not depend on the implementation details of your library. I also agree that most of the changes the C compiler makes are legal - because of the lax spec. The lax spec is what I am against. The lax spec means that operations perfectly defined by the underlying hardware are potentially converted to nasal demons instead of  causing compiler errors. Please note the big difference - x86 allows "sal eax 33", but C compilers choose to convert a shift by 33 to a 'rep ret' without a single warning. So the compiler identifies the problem, takes advantage of it and chooses NOT to inform you about it. And you are okay with it?
No, I'm not ok with it.  Your shift45 example in this thread is a good one and illustrates a problem with the compiler itself, not just the spec (I think we agree the spec is the root enabler of this mess).

But, let's suppose overflowing shifts were implementation defined rather than UB.  In the shift example, it's highly likely that shifting an int32 by 45 is a bug.  With impl defined behavior you'll get some answer, but it will be just as wrong as compiler returning 0 since it's a bug.  That wrong answer will have undefined consequences for further execution.
 
I understand what you are saying but this is not necessarily true. If I am coding for x86 for example, shift by greater than bit width does what I want it to do when writing a bit map. Here is an example for a 128 bit bitmap - http://goo.gl/WvzT2c Note how the setBitWithDefensiveShift() and setBitWithoutDefensiveShift() have the exact same assembly. The compiler notices that the extra mod 64 is redundant on x86 and eliminates it in the setBitWithDefensiveShift() version. If you were to call either function with a random bitIndex your output would be exactly the same. However, when you try the 2 functions with the exact same constant numbers, the setBitWithoutDefensiveShift() invokes undefined behavior and the setBitWithDefensiveShift() one doesn't! This is pretty sad given the compiler had the smarts to know that the mod 64 was not needed to begin with and compiled both functions down to the same assembly. Now here is the version where we forbid the compiler from inlining - http://goo.gl/zmM3kv The assembly is now exactly the same for the implementation and the invocations. We are at the mercy of the compiler. So if I were in an implementation-defined world I would be fine, but in the current world I am not. Sure the code is not portable, but not all code has to be. Every time I use AVX intrinsics etc. I make my code non-portable and I am ready to accept that cost.
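Since the short links above may rot, here is a reconstruction of the kind of code being discussed. The function names follow the post, but this is a sketch, not the original source:

```cpp
#include <cstdint>

// 128-bit bitmap stored as two 64-bit words.
struct Bitmap128 {
    uint64_t words[2] = {0, 0};
};

// Defensive version: masking the shift count keeps it in [0, 63], so
// the shift is defined for any bitIndex in [0, 128). On x86 the shl
// instruction already masks the count mod 64, so the compiler can see
// the '& 63' is redundant and emit the same assembly as below.
inline void setBitWithDefensiveShift(Bitmap128& b, unsigned bitIndex) {
    b.words[bitIndex >> 6] |= uint64_t(1) << (bitIndex & 63);
}

// Non-defensive version: the shift count can reach 127, which is UB
// for a 64-bit operand even though x86 would simply mask it. With a
// constant bitIndex >= 64 the compiler may exploit the UB rather than
// emit the "obvious" shift.
inline void setBitWithoutDefensiveShift(Bitmap128& b, unsigned bitIndex) {
    b.words[bitIndex >> 6] |= uint64_t(1) << bitIndex;
}
```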

The bigger issue, IMO, is the compiler can remove *subsequent* perfectly legal code if it's dominated by the illegal code.
Exactly! The scope of undefined behavior is undefined. If you used the NASA guidelines and separated your code into modules that only talked through message passing to isolate errors, your entire program could still be legally compiled to absolute garbage by a valid C compiler even if only a single module had undefined behavior. And that undefined behavior might have been completely defined by your implementation in a way where all your modules were working as intended, which makes matters worse.

Vitaly Davidovich

unread,
Feb 10, 2016, 1:08:12 PM2/10/16
to mechanical-sympathy
I understand what you are saying but this is not necessarily true. If I am coding for x86 for example, shift by greater than bit width does what I want it to do when writing a bit map. Here is an example for a 128 bit bitmap - http://goo.gl/WvzT2c Note how the setBitWithDefensiveShift() and setBitWithoutDefensiveShift() have the exact same assembly. The compiler notices that the extra mod 64 is redundant on x86 and eliminates it in the setBitWithDefensiveShift() version. If you were to call either function with a random bitIndex your output would be exactly the same. However, when you try the 2 functions with the exact same constant numbers, the setBitWithoutDefensiveShift() invokes undefined behavior and the setBitWithDefensiveShift() one doesn't! This is pretty sad given the compiler had the smarts to know that the mod 64 was not needed to begin with and compiled both functions down to the same assembly. Now here is the version where we forbid the compiler from inlining - http://goo.gl/zmM3kv The assembly is now exactly the same for the implementation and the invocations. We are at the mercy of the compiler

Well, in the UB examples you're explicitly invoking UB.  In the no UB, you're explicitly reducing the bitIndex because you know overflow is UB.  The fact the compiler treats it as nop and generates same assembly is beside the point -- you wrote different semantics.  Having said that, I agree the compiler could do a better job of warning on this - this is another example of missing diagnostics, like discussed earlier in the thread.

Soooooo, use macros instead of inline functions! :) j/k



Rajiv Kurian

unread,
Feb 10, 2016, 2:09:00 PM2/10/16
to mechanical-sympathy


On Wednesday, February 10, 2016 at 10:08:12 AM UTC-8, Vitaly Davidovich wrote:
I understand what you are saying but this is not necessarily true. If I am coding for x86 for example, shift by greater than bit width does what I want it to do when writing a bit map. Here is an example for a 128 bit bitmap - http://goo.gl/WvzT2c Note how the setBitWithDefensiveShift() and setBitWithoutDefensiveShift() have the exact same assembly. The compiler notices that the extra mod 64 is redundant on x86 and eliminates it in the setBitWithDefensiveShift() version. If you were to call either function with a random bitIndex your output would be exactly the same. However, when you try the 2 functions with the exact same constant numbers, the setBitWithoutDefensiveShift() invokes undefined behavior and the setBitWithDefensiveShift() one doesn't! This is pretty sad given the compiler had the smarts to know that the mod 64 was not needed to begin with and compiled both functions down to the same assembly. Now here is the version where we forbid the compiler from inlining - http://goo.gl/zmM3kv The assembly is now exactly the same for the implementation and the invocations. We are at the mercy of the compiler

Well, in the UB examples you're explicitly invoking UB.  In the no UB, you're explicitly reducing the bitIndex because you know overflow is UB.  The fact the compiler treats it as nop and generates same assembly is beside the point -- you wrote different semantics.  Having said that, I agree the compiler could do a better job of warning on this - this is another example of missing diagnostics, like discussed earlier in the thread.
 
Yes, if it is UB and the compiler can detect it then it should create some noise instead of generating noops (or worse). Except I don't think it should be a warning - it should be an in-your-face "We are about to delete your code - fix it or we won't compile!!" error. This also shows that implementation-defined would be fine in this example, because the implementation (x86) is perfectly fine with the shift above register width and the code is still perfectly right for the implementation it was built for. I didn't generate separate semantics for the platform I was coding for. It is the exact same semantics (as observed by the compiler). When I memcpy a uint64_t to a uint8_t array of size 8, I get platform-defined behavior, not a 'ret'. I don't see any good reason why shift should be otherwise besides "Because the standard says so". There is no good reason to make shift by greater than register width undefined when every piece of hardware I know of defines it. Shift is just one example - detectable unaligned loads, detectable data races, detectable reads from uninitialized storage, detectable access beyond lifetime are all things that can have much better behavior when implementation-defined or unspecified (or preferably an error message) than the current nasal-demons UB. Returning 0 can hide problems forever. The compiler is allowed to return the right answer every time in your test binary and invoke the UB only in the release binary. John Regehr's blog post that Sanjoy linked to goes into more detail. I don't see any of his suggestions leading to any performance loss in real-life code.
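For reference, the memcpy case mentioned above looks like this. Which byte lands where is implementation-defined (endianness), but the copy itself is fully defined, in contrast to an oversized shift:

```cpp
#include <cstdint>
#include <cstring>

// Copying the object representation with memcpy is well-defined C++;
// the byte order you observe depends on the platform's endianness,
// but the behavior is never undefined the way an oversized shift is.
void toBytes(uint64_t v, uint8_t out[8]) {
    std::memcpy(out, &v, sizeof v);
}

uint64_t fromBytes(const uint8_t in[8]) {
    uint64_t v;
    std::memcpy(&v, in, sizeof v);
    return v;
}
```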

Only in the C/C++ community do I see people okay with this adversarial nature between compilers and programmers. The whole "you did wrong, so you must be punished" attitude doesn't make sense to me. The compiler folks went to great effort adding all these smarts to detect UB. But since reporting it would require a complete rewrite of the compiler, they decided exploiting it for "performance gains" was a better idea. The spec and its implementations are supposed to help us find our errors, not punish us for the sins we've committed. Every time there is a security bug from such UB, I see the security community (full of experienced C developers) wanting a simple C compiler (I don't think this will help) or a friendly C dialect (I do think this will help). We could rally for a better language/spec, or we could accept programs written by expert C programmers breaking with every other compiler upgrade. Like John Regehr says here - I'm pretty sure I know which one of these will happen.

Soooooo, use macros instead of inline functions! :) j/k
Don't even joke about that :) 
 



Rajiv Kurian

unread,
Feb 10, 2016, 2:10:05 PM2/10/16
to mechanical-sympathy


On Tuesday, February 9, 2016 at 8:43:22 PM UTC-8, sanjoy wrote:

Dan Eloff wrote:
 > I agree wholeheartedly, if the compiler can detect UB and abuse it, I would rather it dump an (even unhelpful) error
 > instead.

In theory this can be done, but in practice doing this without a huge
number of false positives is a major layering issue.  For instance,
the component that "proves" `x s< (x+1)` usually has no idea if
simplifying that predicate will actually result in a transform that
exploits the UB on overflow.  I'd say you'd have to structure a C/C++
compiler from ground up to accommodate the kind of design that would
print a diagnostic when UB has been exploited.
Yes, I gave up hope on that a long time ago. It is too difficult a task for current compilers to accomplish.

Also, relevant to this thread, there have been some proposals around cutting down the amount of
UB in C/C++ to manageable levels:
http://blog.regehr.org/archives/1180.
Something like this would be a big win indeed. 

-- Sanjoy

Vitaly Davidovich

unread,
Feb 10, 2016, 2:45:11 PM2/10/16
to mechanica...@googlegroups.com


On Wednesday, February 10, 2016, Rajiv Kurian <geet...@gmail.com> wrote:


On Wednesday, February 10, 2016 at 10:08:12 AM UTC-8, Vitaly Davidovich wrote:
I understand what you are saying but this is not necessarily true. If I am coding for x86 for example, shift by greater than bit width does what I want it to do when writing a bit map. Here is an example for a 128 bit bitmap - http://goo.gl/WvzT2c Note how the setBitWithDefensiveShift() and setBitWithoutDefensiveShift() have the exact same assembly. The compiler notices that the extra mod 64 is redundant on x86 and eliminates it in the setBitWithDefensiveShift() version. If you were to call either function with a random bitIndex your output would be exactly the same. However, when you try the 2 functions with the exact same constant numbers, the setBitWithoutDefensiveShift() invokes undefined behavior and the setBitWithDefensiveShift() one doesn't! This is pretty sad given the compiler had the smarts to know that the mod 64 was not needed to begin with and compiled both functions down to the same assembly. Now here is the version where we forbid the compiler from inlining - http://goo.gl/zmM3kv The assembly is now exactly the same for the implementation and the invocations. We are at the mercy of the compiler

Well, in the UB examples you're explicitly invoking UB.  In the no UB, you're explicitly reducing the bitIndex because you know overflow is UB.  The fact the compiler treats it as nop and generates same assembly is beside the point -- you wrote different semantics.  Having said that, I agree the compiler could do a better job of warning on this - this is another example of missing diagnostics, like discussed earlier in the thread.
 
Yes, if it is UB and the compiler can detect it then it should create some noise instead of generating noops (or worse). Except I don't think it should be a warning - it should be an in-your-face "We are about to delete your code - fix it or we won't compile!!" error.
In this case, yes, but compilers routinely delete your code as part of normal mundane optimization.  It would need to be smart enough to not cause false positives.  Then you probably have pass ordering issues inside the backend where the pass that does this transform no longer has access to the AST or at least unmodified IR. 

Also shows that implementation defined would be fine in this example because the implementation (x86) is perfectly fine with the shift above register width and the code is still perfectly right for the implementation it was built for. 
Good argument if you're writing assembly, but you're not. As you said earlier, it's no longer portable assembly. It may never have been in principle, since you were always targeting an abstract C machine; compilers just didn't do much optimization. If you're writing C/C++ you need to obey its rules (and those rules kinda suck by modern standards, which I agree with)

I didn't generate separate semantics for the platform I was coding for. It is the exact same semantics (as observed by the compiler). When I memcpy a uint64_t to a uint8_t array of size 8, I get platform-defined behavior, not a 'ret'. I don't see any good reason why shift should be otherwise besides "Because the standard says so". There is no good reason to make shift by greater than register width undefined when every piece of hardware I know of defines it. Shift is just one example - detectable unaligned loads, detectable data races, detectable reads from uninitialized storage, detectable access beyond lifetime are all things that can have much better behavior when implementation-defined or unspecified (or preferably an error message) than the current nasal-demons UB. Returning 0 can hide problems forever. The compiler is allowed to return the right answer every time in your test binary and invoke the UB only in the release binary. John Regehr's blog post that Sanjoy linked to goes into more detail. I don't see any of his suggestions leading to any performance loss in real-life code.

Only in the C/C++ community do I see people okay with this adversarial nature between compilers and programmers. The whole "you did wrong, so you must be punished" attitude doesn't make sense to me. The compiler folks took great effort in adding all these smarts to detect UB. But since reporting these would require a complete rewrite of the compiler - they decided exploiting it for "performance gains" was a better idea. The spec and its implementations are supposed to help us find our errors, not punish us for the sins we've committed. Every time there is a security bug from such UB, I see the security community (full of experienced C developers) wanting a simple C compiler (don't think this will help) or a friendly C dialect (I do think this will help). We could rally for a better language/spec or we could accept programs written by expert C programmers breaking every other compiler upgrade. Like John Reghr says here  - I'm pretty sure I know which one of these will happen.
I'm not sure anyone is OK with it - that's why there are attempts at creating new languages with fewer booby traps, sanitizers, static analysis tools, etc. I'm just not sure much can be done about C, realistically speaking. I think if you have to write C then you have to play by the (tricky, hostile, brittle, error-prone, confusing, etc.) rules. There are definitely C projects that are of top-notch quality despite all these issues (e.g. sqlite, postgresql, Linux, the various BSDs), so it's possible - just don't rely on the compiler holding your hand.
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Rajiv Kurian

unread,
Feb 10, 2016, 3:06:34 PM2/10/16
to mechanical-sympathy


On Wednesday, February 10, 2016 at 11:45:11 AM UTC-8, Vitaly Davidovich wrote:


On Wednesday, February 10, 2016, Rajiv Kurian <geet...@gmail.com> wrote:


On Wednesday, February 10, 2016 at 10:08:12 AM UTC-8, Vitaly Davidovich wrote:
I understand what you are saying but this is not necessarily true. If I am coding for x86, for example, shift by greater than bit width does what I want it to do when writing a bitmap. Here is an example for a 128 bit bitmap - http://goo.gl/WvzT2c Note how setBitWithDefensiveShift() and setBitWithoutDefensiveShift() have the exact same assembly. The compiler notices that the extra mod 64 is redundant on x86 and eliminates it in the setBitWithDefensiveShift() version. If you were to call either function with a random bitIndex your output would be exactly the same. However, when you try the 2 functions with the exact same constant numbers, setBitWithoutDefensiveShift() invokes undefined behavior and setBitWithDefensiveShift() doesn't! This is pretty sad given the compiler had the smarts to know that the mod 64 was not needed to begin with and compiled both functions down to the same assembly. Now here is the version where we forbid the compiler from inlining - http://goo.gl/zmM3kv The assembly is now exactly the same for the implementation and the invocations. We are at the mercy of the compiler.

Well, in the UB examples you're explicitly invoking UB.  In the no UB, you're explicitly reducing the bitIndex because you know overflow is UB.  The fact the compiler treats it as nop and generates same assembly is beside the point -- you wrote different semantics.  Having said that, I agree the compiler could do a better job of warning on this - this is another example of missing diagnostics, like discussed earlier in the thread.
 
Yes, if it is UB and the compiler can detect it then it should create some noise instead of generating noops (or worse). Except I don't think it should be a warning - it should be a in your face "We are about to delete your code - fix it or we won't compile!!" error.  
In this case, yes, but compilers routinely delete your code as part of normal mundane optimization.  It would need to be smart enough to not cause false positives.  Then you probably have pass ordering issues inside the backend where the pass that does this transform no longer has access to the AST or at least unmodified IR. 

Also shows that implementation defined would be fine in this example because the implementation (x86) is perfectly fine with the shift above register width and the code is still perfectly right for the implementation it was built for. 
Good argument if you're writing assembly, but you're not.  As you said earlier, it's no longer a portable assembly.  It may never have been in principle since you were always targeting an abstract C machine but compilers didn't do much optimization.  If you're writing C/C++ you need to obey its rules (and their rules kinda suck by modern standards, which I agree with)
Yes the rules are stupid - that's my point. Shifting by more than register-width is undefined by the spec (but not impl), but copying an int to a byte array is platform defined (even without writing assembly).Telling me I need to obey stupid rules if I want write C is redundant. We all know that - it's not helping. Quoting D.J. Bernstein - Claim that earthquakes in the behavior of "undefined" programs will  teach C programmers to stop writing such programs. "That'll show  'em!" But the reality is that this hasn't worked and won't work.

There are no definitively top-notch C projects, given that no UB detection is foolproof. Like John Regehr says, C/C++ programs are not future-proof. No amount of running sanitizers will help with that. Your sanitizer output is dependent on the code the backend generates, which changes all the time. Linux, glibc, OpenBSD, FreeBSD and other projects thought to be top notch have all had bugs in them exposed by exploitation of UB after compiler upgrades. I am sure you'll be able to find them, so I am not linking. So it might be possible to write top-notch quality code in C, but the Linux and BSD folks have not demonstrated that.

Vitaly Davidovich

unread,
Feb 10, 2016, 3:14:37 PM2/10/16
to mechanical-sympathy
Let me just ask you this after having this discussion -- why are you using C?


Rajiv Kurian

unread,
Feb 10, 2016, 3:30:51 PM2/10/16
to mechanical-sympathy

On Wednesday, February 10, 2016 at 12:14:37 PM UTC-8, Vitaly Davidovich wrote:
Let me just ask you this after having this discussion -- why are you using C?
Because I used to get paid to do it (no longer). I am hoping that with Rust and other new native languages coming up I won't have to any more. Btw, for anyone feeling like they need a good read - please read section J.2 Undefined behavior of the C standard. When I read the 13 pages (yes, 13 pages) of undefined behavior listed in that section I reevaluated my life choices. If you are still not scared - there's "Dangerous Optimizations and the Loss of Causality".
...

Rajiv Kurian

unread,
Feb 10, 2016, 3:47:05 PM2/10/16
to mechanical-sympathy
For the lazy here are some of my favorites:
1. A program in a hosted environment does not define a function named main using one of the specified forms - not a compile time error, it's UB.
2. The arguments to certain operators are such that they could produce a negative zero result, but the implementation does not support negative zeros.
3. Two declarations of the same object or function specify types that are not compatible - again not a compilation error, it's UB.
4. An unmatched ' or " character is encountered on a logical source line during tokenization. During tokenization!!
5. An exceptional condition occurs during the evaluation of an expression - how specific.
6. An exceptional condition occurs during the evaluation of an expression - hmmm
7. A structure or union is defined as containing no named members - ookay.
8. The value of an unnamed member of a structure or union is used  - why is there a compiler at all!
9. The character sequence in an #include preprocessing directive does not start with a letter - lol
10. The program modifies the string pointed to by the value returned by the setlocale function - dear god.
11. The string set up by the getenv or strerror function is modified by the program.
12. The array being searched by the bsearch function does not have its elements in proper order - not a logical error just straight up UB.

This is the tip of the iceberg. How confident is anyone now that their program doesn't invoke UB? Things many programmers would assume are compiler errors are actually UB. What would happen if gcc/clang exploited each one of these?

Vitaly Davidovich

unread,
Feb 10, 2016, 4:00:26 PM2/10/16
to mechanica...@googlegroups.com


On Wednesday, February 10, 2016, Rajiv Kurian <geet...@gmail.com> wrote:

On Wednesday, February 10, 2016 at 12:14:37 PM UTC-8, Vitaly Davidovich wrote:
Let me just ask you this after having this discussion -- why are you using C?
Because I used to get paid to do it (no longer).
Fair enough (and what I thought your answer would be).

So suppose you were going to write some systems-like or performance sensitive software (e.g database, user space networking, OS kernel, AAA game, financial exchange, flight simulator, etc) today that needs to run across a variety of hardware - what would you pick today?

Rajiv Kurian

unread,
Feb 10, 2016, 4:55:49 PM2/10/16
to mechanical-sympathy


On Wednesday, February 10, 2016 at 1:00:26 PM UTC-8, Vitaly Davidovich wrote:


On Wednesday, February 10, 2016, Rajiv Kurian <geet...@gmail.com> wrote:

On Wednesday, February 10, 2016 at 12:14:37 PM UTC-8, Vitaly Davidovich wrote:
Let me just ask you this after having this discussion -- why are you using C?
Because I used to get paid to do it (no longer).
Fair enough (and what I thought your answer would be).

So suppose you were going to write some systems-like or performance sensitive software (e.g database, user space networking, OS kernel, AAA game, financial exchange, flight simulator, etc) today that needs to run across a variety of hardware - what would you pick today?
 
SystemVerilog or VHDL - j/k. For games I think it is still C++ just because of the culture. Jonathan Blow's language looks good for games too, but I am not sure about the scope of UB in it. I know he has been frustrated by the scope of UB in C++, so I am hoping he will take a sane position on it. It also has some seriously cool features like extensive compile-time reflection, AOS -> SOA transformations with a keyword, etc. If it is a critical project like a defense system or other critical embedded systems (these usually are coded for a specific platform) I think the only good choice is assembly. Again, the invasion of C++ into this field worries me a bit. Hopefully they use their own compilers. The fact that NASA now allows C++ (a heavily neutered version) is a little worrying to me. They do have their own compilers so they can control their ecosystem very well.
For cross-platform needs I don't think there is any choice but C - not because it's the best but because it is the lowest common denominator, i.e. every h/w vendor ships a C compiler. To my mind it is the most unsafe language one can code an OS or any security-sensitive code in. Applications running on the OS (even ones with UB) have some guarantees (however the OS folks ensure them) and so the scope of the havoc they can wreak is limited by their privilege. But the OS code itself is very vulnerable. My big hope was Rust - but I am too stupid to be able to write allocation-free code in it without using unsafe. I end up either allocating or using unsafe, and AFAICT unsafe for them equals nasal demons too. I need to look at how people write Rust for embedded systems - I don't know enough about it. Even though Rust has UB, the scope of it in Rust is limited, unlike in C++. If Rust decides to take advantage of UB (please let it not be) it can only do so in the scope of an unsafe block. It can't change your safe Rust to print your resignation letter and then send it to your boss because of your unsafe Rust. Rust is also restricted in choice of platforms to the ones LLVM supports.
...

Matt Godbolt

unread,
Feb 11, 2016, 10:46:59 AM2/11/16
to mechanical-sympathy
Just to give one example where UB (or something similar) can be used for good (instead of evil) by compilers:

struct Foo {
  Foo *next; 
  
  void release() {
    Foo *tmp = 0;
    for (Foo *it = next; it; it = tmp) {
#ifndef NDEBUG
      // some kind of check/assertion on 'it'
#endif
      tmp = it->next;
    }
  }
};

void test(Foo &f) { f.release(); }
This loop doesn't vanish in release builds on GCC, but does on clang. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67809 

The reason clang can optimize it away is that an infinite loop that doesn't call any library I/O functions or synchronize on anything isn't allowed (not exactly UB, but the compiler is allowed to assume this according to the reference cited in the bug). Thus, clang assumes the loop will terminate (else it'd be UB I guess), and just drops the whole thing. GCC still does all the pointer chasing even when there's nothing to do.

I've very much enjoyed this thread, and share a few of the UB concerns with Rajiv. That said; I really don't find UB in general an issue in my day-to-day programming. Having spent some time working on compilers I do have a lot more sympathy for compiler writers and the near-impossible job they have in generating efficient, correct code. The lack of warnings for some of the more interesting cases is a usability concern I suppose. The ease-of-use ship for C++ sailed in the late 90s though... (albeit came nearer to shore with C++14 et al).

--matt



Rajiv Kurian

unread,
Feb 11, 2016, 5:11:12 PM2/11/16
to mechanical-sympathy


On Thursday, February 11, 2016 at 7:46:59 AM UTC-8, Matt Godbolt wrote:
This loop doesn't vanish in release builds on GCC, but does on clang. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67809 

The reason clang can optimize it away is that an infinite loop that doesn't call any library I/O functions or synchronize on anything isn't allowed (not exactly UB, but the compiler is allowed to assume this according to the reference cited in the bug). Thus, clang assumes the loop will terminate (else it'd be UB I guess), and just drops the whole thing. GCC still does all the pointer chasing even when there's nothing to do.
Removing "empty" loops is a possibly good optimization.  Moreover one man's optimization is another man's debugging nightmare. I don't think it is using the undefined behavior escape hatch or so I hope. This sounds a bit like optimizing away a for (int i = 0; i < 10; i++); since there are no side effects observable. Weirdly enough void spinForever() { while(1); } does not get compiled to a 'rep ret' even at -O3.

I've very much enjoyed this thread, and share a few of the UB concerns with Rajiv. That said; I really don't find UB in general an issue in my day-to-day programming. Having spent some time working on compilers I do have a lot more sympathy for compiler writers and the near-impossible job they have in generating efficient, correct code. The lack of warnings for some of the more interesting cases is a usability concern I suppose. The ease-of-use ship for C++ sailed in the late 90s though... (albeit came nearer to shore with C++14 et al).
C++14 stows the shot gun away towards the back of the store. The savvy customer can still find it and shoot themselves in the foot ;)
...

Matt Godbolt

unread,
Feb 11, 2016, 5:24:04 PM2/11/16
to mechanica...@googlegroups.com
On Thu, Feb 11, 2016 at 4:11 PM Rajiv Kurian <geet...@gmail.com> wrote:
This loop doesn't vanish in release builds on GCC, but does on clang. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67809 

The reason clang can optimize it away is that an infinite loop that doesn't call any library I/O functions or synchronize on anything isn't allowed (not exactly UB, but the compiler is allowed to assume this according to the reference cited in the bug). Thus, clang assumes the loop will terminate (else it'd be UB I guess), and just drops the whole thing. GCC still does all the pointer chasing even when there's nothing to do.
Removing "empty" loops is a possibly good optimization.  Moreover one man's optimization is another man's debugging nightmare. I don't think it is using the undefined behavior escape hatch or so I hope.

How can the compiler prove that the linked list is not circular? GCC thinks it can't, and so cannot optimize away the loop. Clang seems to subscribe to the "C++ [intro.multithread]" interpretation that an infinite loop is not allowed (without sync or external calls).
 
This sounds a bit like optimizing away a for (int i = 0; i < 10; i++); since there are no side effects observable. Weirdly enough void spinForever() { while(1); } does not get compiled to a 'rep ret' even at -O3.

It's a little different as the compiler can in principle see that "obviously" that loop will never return. It is a little confusing why clang doesn't turn it into a ret. (BTW we should really stop perpetuating the whole "rep ret" thing! It drove me mad enough to patch GCC for my company so that "rep ret" is no longer emitted...it was a perf regression on a few AMD processors about 10 years ago! It's just L1i bloat for any reasonable modern processor, Intel or not...)
 

The ease-of-use ship for C++ sailed in the late 90s though... (albeit came nearer to shore with C++14 et al).
C++14 stows the shot gun away towards the back of the store. The savvy customer can still find it and shoot themselves in the foot ;)

Hahah :) I think we've tortured these poor analogies enough :)

Vitaly Davidovich

unread,
Feb 11, 2016, 5:42:11 PM2/11/16
to mechanical-sympathy
It's a bit philosophical, but "time passing" is kind of a side effect, which may be used, e.g., to prevent timing attacks in some crypto code (this isn't the "right" way to do it, but just an illustration). Of course, just making code faster as part of standard optimization affects timing (on purpose!), but the crypto guys are consistently fighting compilers AFAIK (and dropping to assembly in cases precisely to subvert compiler optimizations).

I do agree that clang not removing while(1) is odd considering it eliminated a much less obvious loop.

BTW, if you add -march=<some intel>, gcc removes the rep prefix (is that what you fixed Matt?).


Sanjoy Das

unread,
Feb 11, 2016, 5:46:11 PM2/11/16
to mechanica...@googlegroups.com


Matt Godbolt wrote:
> How can the compiler prove that the linked list is not circular? GCC
> thinks it can't, and so cannot optimize away the loop. Clang seems to
> subscribe to the "C++ [intro.multithread]" interpretation that an
> infinite loop is not allowed (without sync or external calls).

Relevant discussion on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2015-July/088095.html


Matt Godbolt

unread,
Feb 11, 2016, 5:49:36 PM2/11/16
to mechanica...@googlegroups.com
On Thu, Feb 11, 2016 at 4:42 PM Vitaly Davidovich <vit...@gmail.com> wrote:
It's a bit philosophical, but "time passing" is kind of a side effect, which may be used, e.g., to prevent timing attacks in some crypto code (this isn't the "right" way to do it, but just illustration) .  Of course, just making code faster as part of standard optimization affects timing (on purpose!), but the crypto guys are consistently fighting compilers AFAIK (and dropping to assembly in cases precisely to subvert compiler optimizations).

Absolutely; from a philosophical point of view I completely agree. The standard's pretty clear though: http://uhdejc.com/intro.multithread/#27 seems to be an online copy of the relevant part...


I do agree that clang not removing while(1) is odd considering it eliminated a much less obvious loop.

BTW, if you add -march=<some intel>, gcc removes the rep prefix (is that what you fixed Matt?).

Yes; in GCC 5.x and above anyway (that's what I patched; someone beat me on the upstream patch) :)

Vitaly Davidovich

unread,
Feb 11, 2016, 5:51:26 PM2/11/16
to mechanical-sympathy
Sanjoy, given you work on LLVM, any idea why it removes that linked list loop but not while(1)?


Sanjoy Das

unread,
Feb 11, 2016, 6:07:41 PM2/11/16
to mechanica...@googlegroups.com


Vitaly Davidovich wrote:
> Sanjoy, given you work on LLVM, any idea why it removes that linked list
> loop but not while(1)?

There is an inconsistency in LLVM's model of divergent behavior.

E.g. if you change your source program to

struct Foo {
  Foo *next;
  void release();
};

void Foo::release() {
  Foo *tmp = 0;
  for (Foo *it = next; it; it = tmp) {
    tmp = it->next;
  }
}

void test(Foo &f) { f.release(); }


(just to force generating Foo::release out of line) then the final IR
you'll get is

"""
; Function Attrs: norecurse nounwind readonly ssp uwtable
define void @_ZN3Foo7releaseEv(%struct.Foo* nocapture %this) #0 align 2 {
entry:
br label %for.cond

for.cond: ; preds = %for.cond,
%entry
%this.pn = phi %struct.Foo* [ %this, %entry ], [ %it.0, %for.cond ]
%it.0.in = getelementptr inbounds %struct.Foo, %struct.Foo* %this.pn,
i64 0, i32 0
%it.0 = load %struct.Foo*, %struct.Foo** %it.0.in, align 8, !tbaa !2
%tobool = icmp eq %struct.Foo* %it.0, null
br i1 %tobool, label %for.cond.cleanup, label %for.cond

for.cond.cleanup: ; preds = %for.cond
ret void
}

; Function Attrs: norecurse nounwind readnone ssp uwtable
define void @_Z4testR3Foo(%struct.Foo* nocapture dereferenceable(8) %f) #1 {
entry:
ret void
}
"""


(with some additional stuff above and below).

Note that even though test() is an empty method, Foo::release isn't.



What happens here is that LLVM infers that Foo::release() is a readonly
method that does not throw exceptions (see the 'Function Attrs' bit
before it), so it figures that the unused call to release() in test() is
not required.


*However*, Foo::release() still has the loop since the LoopDeletion does
not delete potentially infinite loops:

https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/LoopDeletion.cpp#L174


This is an inconsistency in how we model divergent control flow in LLVM
IR, and the thread I linked to was an attempt to come up with a
consistent model for infinite loops.


-- Sanjoy



Vitaly Davidovich

unread,
Feb 11, 2016, 6:37:16 PM2/11/16
to mechanica...@googlegroups.com
So if I understand correctly, LLVM never actually proved the loop terminates here (it can't), but release() was marked as noexcept and readonly and the call was removed erroneously; by erroneously, I mean that it's probably not the intention unless the agreed upon model for infinite loops disallows them.

Sanjoy Das

unread,
Feb 11, 2016, 7:00:25 PM2/11/16
to mechanica...@googlegroups.com


Vitaly Davidovich wrote:

> So if I understand correctly, LLVM never actually proved the loop
> terminates here (it can't), but release() was marked as noexcept and
> readonly and the call was removed erroneously; by erroneously, I mean

If in your language infinite loops are observable things, like in
Java, and are not UB, like in C++, then it was the inference of
readonly that was broken since, clearly, the function has a side
effect. If your language is C++ then "undefined behavior" was
"exploited" to infer "readonly". This is one of the places where
semantics of C/C++ has subtly leaked into the semantics of LLVM IR
(and needs to be fixed).

Removal of a "readonly nounwind" call is, in itself, fine -- since
readonly functions are allowed to only read memory and compute
results.

This also demonstrates the layering difficulty I was talking about
earlier -- the analysis step that inferred UB is a different layer
than the one that actually used the inference to do something
interesting. The latter does not know how "readonly" was inferred,
and the former does not know if inferring readonly is going to affect
the program in an interesting way.

-- Sanjoy

Vitaly Davidovich

unread,
Feb 11, 2016, 7:05:28 PM2/11/16
to mechanica...@googlegroups.com


On Thursday, February 11, 2016, Sanjoy Das <san...@playingwithpointers.com> wrote:



This also demonstrates the layering difficulty I was talking about
earlier -- the analysis step that inferred UB is a different layer
than the one that actually used the inference to do something
interesting.  The latter does not know how "readonly" was inferred,
and the former does not know if inferring readonly is going to affect
the program in an interesting way.
Right, thanks.  I think the layering issue can also explain why the compiler sometimes makes a "hostile" optimization, as seen by the user, without any warning (and this isn't even considering the frontend vs. backend "ultimate" layering split). 

Greg Young

unread,
Feb 11, 2016, 7:26:22 PM2/11/16
to mechanica...@googlegroups.com
12. The array being searched by the bsearch function does not have its elements in proper order - not a logical error, just straight-up UB.


Lol so you want them to check at runtime?! This is a commonly discussed edge case in theorem proving...


--
Studying for the Turing test

Vitaly Davidovich

unread,
Feb 11, 2016, 8:05:26 PM2/11/16
to mechanica...@googlegroups.com
FWIW, Java says the result is undefined in this case as well.  It's a memory-safe language, so if you index out of bounds you get an exception, but that's UB in C.  C is just saying anything can happen because it can't promise memory safety, for example.  In other words, it's UB because of the other, simpler UB behaviors that such a search might elicit (null deref, overflow, infinite loop, etc.).
Sent from my phone

Rajiv Kurian

unread,
Feb 11, 2016, 8:55:55 PM2/11/16
to mechanical-sympathy


On Thursday, February 11, 2016 at 2:24:04 PM UTC-8, Matt Godbolt wrote:
On Thu, Feb 11, 2016 at 4:11 PM Rajiv Kurian <geet...@gmail.com> wrote:
This loop doesn't vanish in release builds on GCC, but does on clang. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67809 

The reason clang can optimize it away is that an infinite loop that doesn't call any library I/O functions or synchronize on anything isn't allowed (not exactly UB, but the compiler is allowed to assume this according to the reference cited in the bug). Thus, clang assumes the loop will terminate (else it'd be UB I guess), and just drops the whole thing. GCC still does all the pointer chasing even when there's nothing to do.
Removing "empty" loops is possibly a good optimization.  Then again, one man's optimization is another man's debugging nightmare. I don't think it is using the undefined-behavior escape hatch, or so I hope.

How can the compiler prove that the link list is not circular? GCC thinks it can't, and so cannot optimize away the loop. Clang seems to subscribe to the "C++ [intro.multithread]" interpretation that an infinite loop is not allowed (without sync or external calls).
 
This sounds a bit like optimizing away a for (int i = 0; i < 10; i++); since there are no side effects observable. Weirdly enough void spinForever() { while(1); } does not get compiled to a 'rep ret' even at -O3.

It's a little different as the compiler can in principle see that "obviously" that loop will never return. It is a little confusing why clang doesn't turn it into a ret. (BTW we should really stop perpetuating the whole "rep ret" thing! It drove me mad enough to patch GCC for my comp
Yes the 'rep ret' thing is a sad tragedy. 
...

Rajiv Kurian

unread,
Feb 11, 2016, 9:04:15 PM2/11/16
to mechanical-sympathy


On Thursday, February 11, 2016 at 5:05:26 PM UTC-8, Vitaly Davidovich wrote:
FWIW, Java says result is undefined in this case as well.  It's a memory safe language so if you index out of bounds you get an exception, but that's UB in C.  It's just saying anything can happen because it can't promise memory safety, for example.  In other words, it's UB because of the other more simple UB behaviors that such a search might elicit (null deref, overflow, infinite loop, etc).
Yes, that's what I would have assumed too. But if you ask the language lawyers, the standard allows the compiler to invoke UB if it can prove that your array is not in proper order, even if actually running bsearch on your improper input wouldn't cause a null deref, infinite loop, etc.

...