Nanotrusting the Nanotime and amortization.

John Hening

Apr 24, 2018, 4:44:33 PM
to mechanical-sympathy
I'm reading the great article at https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks Aleksey! :)), and I am not sure whether I understand it correctly.

First, the article compares the performance of plain and volatile writes:


Benchmark                            Mode  Samples    Mean  Mean error  Units
o.s.VolatileWriteSucks.incrPlain     avgt      250   3.589       0.025  ns/op
o.s.VolatileWriteSucks.incrVolatile  avgt      250  15.219       0.114  ns/op
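
For reference, this is roughly the shape of the benchmark behind those numbers: a minimal JMH sketch, not the exact code from the article (the field names and annotations are my guess):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class VolatileWriteSucks {
    int plainValue;               // plain field
    volatile int volatileValue;   // volatile field

    @Benchmark
    public void incrPlain() {
        plainValue++;             // plain read-modify-write
    }

    @Benchmark
    public void incrVolatile() {
        volatileValue++;          // volatile read, then volatile write
    }
}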

and then the article says:

"In real code, the heavy-weight operations are mixed with relatively low-weight ops, which amortize the costs."

My question is: what exactly does it mean to amortize the costs? My own explanation is that the amortization comes from the CPU's out-of-order execution, yes?
So even if a volatile write takes much more time than a plain write, it isn't so painful, because the CPU executes other instructions out of order (if it can).

What do you think?

Aleksey Shipilev

Apr 25, 2018, 4:52:12 AM
to mechanica...@googlegroups.com, John Hening
Yes, that's basically the gist of it: volatile writes can be heavy, especially when contended
(although contention is the first-order effect there, and non-volatile writes would suck as much),
but in real cases they mostly aren't.

Amortizing would happen even for in-order CPUs: you can have N arithmetic ops executing at sub-cycle
speed, and then an occasional speed bump with a memory barrier that takes tens or hundreds of cycles. The
larger the N, the higher the average execution speed. Obviously, it gets better with out-of-order
CPUs, but that is not a requirement.
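
Put into numbers, a back-of-the-envelope sketch (the cycle counts are made up, just to show the shape of the argument):

class AmortizationSketch {
    // N cheap ops at ~1 cycle each, plus one memory barrier at ~100 cycles
    static double avgCyclesPerOp(int n) {
        return (n * 1.0 + 100.0) / (n + 1);
    }
    // avgCyclesPerOp(10)   ~= 10.0  cycles/op
    // avgCyclesPerOp(100)  ~=  1.98 cycles/op
    // avgCyclesPerOp(1000) ~=  1.10 cycles/op
}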

It was supposed to protect readers from assuming they should avoid volatile writes because they are
"obviously" slow (hey look, 10x degradation!), while in reality it matters mostly on very optimized
fast paths, and is probably of interest only to performance fiends^W people subscribed to this list :)

Thanks,
-Aleksey

Vitaly Davidovich

Apr 25, 2018, 5:37:37 AM
to mechanica...@googlegroups.com, John Hening
On Wed, Apr 25, 2018 at 4:52 AM Aleksey Shipilev <aleksey....@gmail.com> wrote:
Yes, that's basically the gist of it: volatile writes can be heavy, especially when contended
(although contention is the first-order effect there, and non-volatile writes would suck as much),
but in real cases they mostly aren't.

Amortizing would happen even for in-order CPUs: you can have N arithmetic ops executing at sub-cycle
speed, and then an occasional speed bump with a memory barrier that takes tens or hundreds of cycles. The
larger the N, the higher the average execution speed. Obviously, it gets better with out-of-order
CPUs, but that is not a requirement.
The way I see it, in-order/out-of-order doesn’t matter here and just muddies the water.

Volatile writes kill speculation/OoO - that’s arguably their biggest cost (+ compiler optimization barrier, particularly for loops) over plain stores to shared locations.  So there’s no OoO execution across them.
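
To make the compiler-barrier point concrete, an illustrative sketch of my own (not code from the article):

class Counter {
    long plainSum;
    volatile long volatileSum;

    void addPlain(long[] values) {
        // The JIT is typically free to keep plainSum in a register here and
        // store it back just once after the loop.
        for (long v : values) {
            plainSum += v;
        }
    }

    void addVolatile(long[] values) {
        // Every iteration has to perform an actual volatile load and volatile
        // store, in order; the loop cannot be collapsed into a single store.
        // (The += is still not atomic, but that is beside the point here.)
        for (long v : values) {
            volatileSum += v;
        }
    }
}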

I think the “amortization” can be viewed simply as (which you do mention):
1) If entire processing consists of volatile write and it takes, say, 15ns, then it consumes 100% of all processing.
2) If there’s another 15ns of processing involved, then it’s 50% of overall execution.
3) If there’s 85ns of additional work, it’s now 15%.
And so on.



--
Sent from my phone

John Hening

Apr 25, 2018, 1:33:01 PM
to mechanica...@googlegroups.com
OK, now I see what you mean by amortization:

In real code, the cost of a volatile write relative to the whole execution is often negligible.


