[PATCH seastar v1] preempt: annotate for branch prediction


Avi Kivity

<avi@scylladb.com>
Jul 1, 2018, 12:07:44 PM
to seastar-dev@googlegroups.com
We can predict that need_preempt() will usually return false, helping
the compiler push the preempted path out of the hot path.
---
core/preempt.hh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/preempt.hh b/core/preempt.hh
index 44ceaf6e17..46f40bbd3a 100644
--- a/core/preempt.hh
+++ b/core/preempt.hh
@@ -28,11 +28,11 @@ extern __thread bool g_need_preempt;
 
 inline bool need_preempt() {
 #ifndef SEASTAR_DEBUG
     // prevent compiler from eliminating loads in a loop
     std::atomic_signal_fence(std::memory_order_seq_cst);
-    return g_need_preempt;
+    return __builtin_expect(g_need_preempt, false);
 #else
     return true;
 #endif
 }

--
2.14.4

Avi Kivity

<avi@scylladb.com>
Jul 1, 2018, 1:18:35 PM
to seastar-dev@googlegroups.com, scylladb-dev

On 2018-07-01 19:07, Avi Kivity wrote:
> We can predict that need_preempt() will usually return false, helping
> the compiler push the preempted path out of the hot path.
> ---

This gave 3.2% improvement on Scylla's perf_simple_query --smp 1
--operations-per-shard 2000000.

Before:

 Performance counter stats for './perf_simple_query --smp 1 --operations-per-shard 2000000':

      25991.965418      task-clock (msec)         #    0.999 CPUs utilized
           104,933      context-switches          #    0.004 M/sec
                 9      cpu-migrations            #    0.000 K/sec
             1,752      page-faults               #    0.067 K/sec
   100,123,254,159      cycles                    #    3.852 GHz
   109,713,718,186      instructions              #    1.10  insn per cycle
    21,221,815,749      branches                  #  816.476 M/sec
        50,985,863      branch-misses             #    0.24% of all branches

      26.010803667 seconds time elapsed

After:

 Performance counter stats for './perf_simple_query_likely --smp 1 --operations-per-shard 2000000':

      25219.874082      task-clock (msec)         #    0.999 CPUs utilized
           101,832      context-switches          #    0.004 M/sec
                 8      cpu-migrations            #    0.000 K/sec
             1,743      page-faults               #    0.069 K/sec
    96,720,569,733      cycles                    #    3.835 GHz
   109,634,145,825      instructions              #    1.13  insn per cycle
    21,132,762,606      branches                  #  837.941 M/sec
        50,721,943      branch-misses             #    0.24% of all branches

      25.242736852 seconds time elapsed


The only significant difference is in cycles.

Trying a different counter:

Before:

    16,195,222,232 icache.ifetch_stall
    15,932,320,858 icache.ifetch_stall
    16,078,438,255 icache.ifetch_stall

After:

    14,846,166,778 icache.ifetch_stall
    15,213,067,906 icache.ifetch_stall
    15,087,632,877 icache.ifetch_stall

While the numbers vary, the difference is about 1B stalls, or ~500
stalls/cqlop. Given that an op is about 11us on this machine, a stall is 22ns
or 80 cycles.

Avi Kivity

<avi@scylladb.com>
Jul 1, 2018, 1:27:32 PM
to seastar-dev@googlegroups.com, scylladb-dev


On 2018-07-01 20:18, Avi Kivity wrote:
>
> On 2018-07-01 19:07, Avi Kivity wrote:
>> We can predict that need_preempt() will usually return false, helping
>> the compiler push the preempted path out of the hot path.
>> ---
>
> This gave 3.2% improvement on Scylla's perf_simple_query --smp 1
> --operations-per-shard 2000000.
>

btw, we could achieve a similar improvement by dropping the
need_preempt() check from then and then_wrapped. While the optimizations
overlap a lot, we should apply both.

Glauber Costa

<glauber@scylladb.com>
Jul 1, 2018, 1:30:01 PM
to Avi Kivity, seastar-dev, scylladb-dev


On Sun, Jul 1, 2018, 7:27 PM Avi Kivity, <a...@scylladb.com> wrote:


On 2018-07-01 20:18, Avi Kivity wrote:
>
> On 2018-07-01 19:07, Avi Kivity wrote:
>> We can predict that need_preempt() will usually return false, helping
>> the compiler push the preempted path out of the hot path.
>> ---
>
> This gave 3.2% improvement on Scylla's perf_simple_query --smp 1
> --operations-per-shard 2000000.
>

btw, we could achieve a similar improvement by dropping the
need_preempt() check from then and then_wrapped. While the optimizations
overlap a lot, we should apply both.

If you remember, I saw a 3% improvement doing that once. We ended up not applying the patch out of worry that it could break something.

Avi Kivity

<avi@scylladb.com>
Jul 1, 2018, 1:32:19 PM
to Glauber Costa, seastar-dev, scylladb-dev



On 2018-07-01 20:29, Glauber Costa wrote:


On Sun, Jul 1, 2018, 7:27 PM Avi Kivity, <a...@scylladb.com> wrote:


On 2018-07-01 20:18, Avi Kivity wrote:
>
> On 2018-07-01 19:07, Avi Kivity wrote:
>> We can predict that need_preempt() will usually return false, helping
>> the compiler push the preempted path out of the hot path.
>> ---
>
> This gave 3.2% improvement on Scylla's perf_simple_query --smp 1
> --operations-per-shard 2000000.
>

btw, we could achieve a similar improvement by dropping the
need_preempt() check from then and then_wrapped. While the optimizations
overlap a lot, we should apply both.

If you remember, I saw a 3% improvement doing that once.

Yes I remember.


We ended up not applying the patch out of worry that it could break something.

We could apply it and fix breakage later. The problems will be where we use recursion for looping.

Glauber Costa

<glauber@scylladb.com>
Jul 1, 2018, 4:27:57 PM
to Avi Kivity, seastar-dev, scylladb-dev
On Sun, Jul 1, 2018 at 1:32 PM, Avi Kivity <a...@scylladb.com> wrote:



On 2018-07-01 20:29, Glauber Costa wrote:


On Sun, Jul 1, 2018, 7:27 PM Avi Kivity, <a...@scylladb.com> wrote:


On 2018-07-01 20:18, Avi Kivity wrote:
>
> On 2018-07-01 19:07, Avi Kivity wrote:
>> We can predict that need_preempt() will usually return false, helping
>> the compiler push the preempted path out of the hot path.
>> ---
>
> This gave 3.2% improvement on Scylla's perf_simple_query --smp 1
> --operations-per-shard 2000000.
>

btw, we could achieve a similar improvement by dropping the
need_preempt() check from then and then_wrapped. While the optimizations
overlap a lot, we should apply both.

If you remember, I saw a 3% improvement doing that once.

Yes I remember.

We ended up not applying the patch out of worry that it could break something.

We could apply it and fix breakage later. The problems will be where we use recursion for looping.

I can't find the patch now.
So please feel free to write & send a new version, since it is trivial.
 



Commit Bot

<bot@cloudius-systems.com>
Jul 2, 2018, 5:15:17 AM
to seastar-dev@googlegroups.com, Avi Kivity
From: Avi Kivity <a...@scylladb.com>
Committer: Paweł Dziepak <pdzi...@scylladb.com>
Branch: master

preempt: annotate for branch prediction

We can predict that need_preempt() will usually return false, helping
the compiler push the preempted path out of the hot path.
Message-Id: <2018070116073...@scylladb.com>

---
diff --git a/core/preempt.hh b/core/preempt.hh
--- a/core/preempt.hh
+++ b/core/preempt.hh
@@ -30,7 +30,7 @@ inline bool need_preempt() {
 #ifndef SEASTAR_DEBUG
     // prevent compiler from eliminating loads in a loop
     std::atomic_signal_fence(std::memory_order_seq_cst);
-    return g_need_preempt;
+    return __builtin_expect(g_need_preempt, false);
 #else
     return true;
 #endif