Analysis of perf sched for simple server.

John Hening

unread,

Apr 23, 2018, 3:33:17 PM4/23/18

to mechanica...@googlegroups.com

Hello,

1. I have a simple one-threaded tcp server written in Java. I try to measure its receive-and-process-response latency. More preceisely, the client sends (by loopback) the 128-bytes message with a timestamp in the header. The server receives a message, reads a content byte by byte and compute a difference between `now` and the timestamp from the header. The difference is more or less 6μs.

Now, I am trying to make it faster.

But, firstly I would like to examine a scheduling issue. So, I've collected results with:

perf sched record -p <PID> -- sleep 10

and then:

perf sched timehist -V

Excerpt from the result is presented below: (I've filtered it for my server thread)

                                                                    wait time  sch delay   run time
   time                  cpu    task name [tid/pid]                 (msec)     (msec)     (msec)
--------------- ------  ---------  ------------------------------  ---------  ---------  ---------

   1849.531842  [0002]    s        java[7340/7338]                     0.000      0.000     56.003
   21849.587844 [0002]    s        java[7340/7338]                     0.000      0.000     56.001
   21849.607838 [0002]    s        java[7340/7338]                     0.000      0.000     19.994
   21849.615836 [0002]    s        java[7340/7338]                     0.000      0.000      7.998
   ...............................................................................................
   21849.691834 [0002]    s        java[7340/7338]                     0.000      0.000      4.000
   21849.703837 [0001]   s         java[7340/7338]                     0.000      0.000     38.330
   21849.711838 [0005]       s     java[7340/7338]                     0.000      0.000      0.000
   21849.719834 [0005]       s     java[7340/7338]                     0.000      0.000      7.996

My question is:
How is it possible that wait_time is always zero? After all, it is impossible. My CPU is multicore but there is a lot of processes (threads) that needs CPU time. How to interpret that?
From the other hand I am gonna to reduce CPU migration. Perhaps it will help :-)

2. The second issue is:

The excerpt from perf sched script


            java  7340 [002] 21848.012360:       sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
            java  7340 [002] 21848.012375:       sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
            java  7340 [002] 21848.012391:       sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
            java  7340 [002] 21848.012406:       sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
            ...................................................................................................
         swapper     0 [007] 21848.012554:       sched:sched_wakeup: comm=java pid=7377 prio=120 target_cpu=007
         swapper     0 [007] 21848.012554:       sched:sched_wakeup: comm=java pid=7377 prio=120 target_cpu=007
            ...................................................................................................
            java  7340 [002] 21848.012555:       sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
            java  7377 [007] 21848.012582: sched:sched_stat_runtime: comm=java pid=7377 runtime=37420 [ns] vruntime=1628300433237 [ns]
            java  7340 [002] 21848.012585:       sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005
            java  7377 [007] 21848.012587:       sched:sched_switch: prev_comm=java prev_pid=7377 prev_prio=120 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120

Why my server receives sched_wakup that looks like spurious wakeups? What is a swapper?

Please explain and help :-)

Gil Tene

unread,

Apr 23, 2018, 4:41:02 PM4/23/18

to mechanical-sympathy

On Monday, April 23, 2018 at 12:33:17 PM UTC-7, John Hening wrote:

Hello,

1. I have a simple one-threaded tcp server written in Java. I try to measure its receive-and-process-response latency. More preceisely, the client sends (by loopback) the 128-bytes message with a timestamp in the header. The server receives a message, reads a content byte by byte and compute a difference between `now` and the timestamp from the header. The difference is more or less 6μs.

That "more or less" is probably a pretty a big range ;-). As in it's probably 6usec +/- 100000% or more. As in "that occasional 6msec glitch you'll be very lucky stay below is 1000x as big as your common case".

Avi Kivity

unread,

Apr 24, 2018, 4:45:49 AM4/24/18

to mechanica...@googlegroups.com, John Hening

On 2018-04-23 22:33, John Hening wrote:

Hello, 1. I have a simple one-threaded tcp server written in Java. I try to measure its receive-and-process-response latency. More preceisely, the client sends (by loopback) the 128-bytes message with a timestamp in the header. The server receives a message, reads a content byte by byte and compute a difference between `now` and the timestamp from the header. The difference is more or less 6μs. Now, I am trying to make it faster. But, firstly I would like to examine a scheduling issue. So, I've collected results with: perf sched record -p <PID> -- sleep 10 and then: perf sched timehist -V Excerpt from the result is presented below: (I've filtered it for my server thread) wait time sch delay run time time cpu task name [tid/pid] (msec) (msec) (msec) --------------- ------ --------- ------------------------------ --------- --------- --------- 1849.531842 [0002] s java[7340/7338] 0.000 0.000 56.003 21849.587844 [0002] s java[7340/7338] 0.000 0.000 56.001 21849.607838 [0002] s java[7340/7338] 0.000 0.000 19.994 21849.615836 [0002] s java[7340/7338] 0.000 0.000 7.998 ............................................................................................... 21849.691834 [0002] s java[7340/7338] 0.000 0.000 4.000 21849.703837 [0001] s java[7340/7338] 0.000 0.000 38.330 21849.711838 [0005] s java[7340/7338] 0.000 0.000 0.000 21849.719834 [0005] s java[7340/7338] 0.000 0.000 7.996

My question is:
How is it possible that wait_time is always zero? After all, it is impossible. My CPU is multicore but there is a lot of processes (threads) that needs CPU time. How to interpret that?
From the other hand I am gonna to reduce CPU migration. Perhaps it will help :-)

Maybe your setup is broken in some way. But perhaps if you unfilter the results it will make more sense.

2. The second issue is:

The excerpt from perf sched script

java 7340 [002] 21848.012360: sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005 java 7340 [002] 21848.012375: sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005 java 7340 [002] 21848.012391: sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005 java 7340 [002] 21848.012406: sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005 ................................................................................................... swapper 0 [007] 21848.012554: sched:sched_wakeup: comm=java pid=7377 prio=120 target_cpu=007 swapper 0 [007] 21848.012554: sched:sched_wakeup: comm=java pid=7377 prio=120 target_cpu=007 ................................................................................................... java 7340 [002] 21848.012555: sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005 java 7377 [007] 21848.012582: sched:sched_stat_runtime: comm=java pid=7377 runtime=37420 [ns] vruntime=1628300433237 [ns] java 7340 [002] 21848.012585: sched:sched_wakeup: comm=java pid=7511 prio=120 target_cpu=005 java 7377 [007] 21848.012587: sched:sched_switch: prev_comm=java prev_pid=7377 prev_prio=120 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120

Why my server receives sched_wakup that looks like spurious wakeups? What is a swapper?

When a cpu goes idle, it switches to the swapper (an idiosyncrasy). An interrupt then wakes up the cpu, runs the TCP stack, which decides to wake up your process. Those wakeups will be seen from any process which happens to run; swapper is simply the task that runs when nothing needs the cpu.

Peter Booth

unread,

Apr 25, 2018, 2:15:11 PM4/25/18

to mechanical-sympathy

I have a couple of suggestions:

0. It costs about 29 nanos to grab a timestamp. If you are doing it on every message sent and received you might end up with your calls to clock_gettime being your bottleneck

1. Nitsan wakart wrote a nice Java ping benchmark see http://psy-lob-saw.blogspot.com/2013/07/a-java-ping-buffet.html

which is available in git. It's worth comparing with your test and seeing if the numbers are similar. His allows you to choose between spinning, blocking, using blocking or nonblocking select

2. How many cores do you have on your host? how much traffic are you sending?

3. If you use pidstat you can see a per second count of the number of voluntary and involuntary context switches.

Reply all

Reply to author

Forward