The Linux Scheduler: a Decade of Wasted Cores


DrQ

May 14, 2016, 11:02:57 AM
to Guerrilla Capacity Planning

Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud, Vivien Quéma, and Alexandra Fedorova

As a central part of resource management, the OS thread scheduler must maintain the following simple invariant: make sure that ready threads are scheduled on available cores. As simple as it may seem, we found that this invariant is often broken in Linux. Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database.

The main contribution of this work is the discovery and analysis of these bugs, along with fixes for them. Conventional testing techniques and debugging tools are ineffective at confirming or understanding this kind of bug, because its symptoms are often evasive. To drive our investigation, we built new tools that check online for violations of the invariant and visualize scheduling activity. They are simple, easily portable across kernel versions, and run with negligible overhead. We believe that making these tools part of the kernel developers' tool belt can help keep this type of bug at bay.

Proceedings of the Eleventh European Conference on Computer Systems (EuroSys '16), London, United Kingdom, 2016.
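
The paper's checker lives in the kernel, but the core idea is easy to approximate from userspace. Here is a minimal sketch, assuming a kernel built with CONFIG_SCHED_DEBUG so that per-CPU runqueue lengths show up in /proc/sched_debug; the field names match kernels of that era and may differ on yours. (The paper only flagged violations that persisted across samples; this sketch naively prints every one it sees.)

/* invariant_check.c -- userspace approximation of an online checker
 * for the paper's invariant: no core should sit idle while another
 * core has threads waiting in its runqueue.
 * Build: cc -O2 invariant_check.c
 */
#include <stdio.h>
#include <unistd.h>

#define MAX_CPUS 256

int main(void)
{
    for (;;) {
        FILE *f = fopen("/proc/sched_debug", "r");
        if (!f) { perror("/proc/sched_debug"); return 1; }

        int nr_running[MAX_CPUS] = { 0 };
        int ncpu = 0, cur = -1;
        char line[512];

        while (fgets(line, sizeof line, f)) {
            int cpu, n;
            if (sscanf(line, "cpu#%d", &cpu) == 1 && cpu < MAX_CPUS) {
                cur = cpu;                /* entering this cpu's section */
                if (cpu + 1 > ncpu)
                    ncpu = cpu + 1;
            } else if (cur >= 0 &&
                       sscanf(line, " .nr_running : %d", &n) == 1) {
                nr_running[cur] = n;      /* rq-level count, first match */
                cur = -1;                 /* ignore nested cfs_rq counts */
            }
        }
        fclose(f);

        /* nr_running == 0 means idle; > 1 means threads are waiting */
        int idle = -1, busy = -1;
        for (int i = 0; i < ncpu; i++) {
            if (nr_running[i] == 0) idle = i;
            if (nr_running[i] > 1)  busy = i;
        }
        if (idle >= 0 && busy >= 0)
            printf("violation: cpu%d idle while cpu%d has %d runnable\n",
                   idle, busy, nr_running[busy]);

        usleep(100 * 1000);               /* sample every 100 ms */
    }
}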


Harry van der horst

May 16, 2016, 9:47:43 AM
to guerrilla-cap...@googlegroups.com
Jesus holy Christ.
This feels like 1969 again.

--
met hartelijke groeten/kind regards
harry van der Horst
M 0031643016999

DrQ

May 16, 2016, 9:51:55 AM
to Guerrilla Capacity Planning
It's worse! You can't blue-wire anything.


On Monday, May 16, 2016 at 6:47:43 AM UTC-7, Harry van der horst wrote:
Jesus holy Christ.
This feels like 1969 again.

Harry van der horst

May 17, 2016, 10:48:25 AM
to guerrilla-cap...@googlegroups.com
Is it a LINUX problem or a UNIX problem?


2016-05-16 15:51 GMT+02:00 'DrQ' via Guerrilla Capacity Planning <guerrilla-cap...@googlegroups.com>:
It's worse! You can't blue-wire anything.

David Collier-Brown

May 17, 2016, 11:07:16 AM
to guerrilla-cap...@googlegroups.com
It was discovered on Linux, but based on the description, any Unix could have it.

--dave
-- 
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
dav...@spamcop.net           |                      -- Mark Twain

Harry van der horst

May 17, 2016, 11:38:52 AM
to guerrilla-cap...@googlegroups.com
Thanks for the info
Harry

Tom Childers

May 17, 2016, 11:38:52 AM
to guerrilla-cap...@googlegroups.com
This is likely Linux-specific. It's been well known for many years that the Linux dispatcher did not scale well to multiple cores, although it has certainly improved over time.

There was an enormous amount of work on the Solaris kernel 10-20 years ago, for example, to ensure that it scaled well to 64 cores. I ran various kinds of benchmarks back then demonstrating excellent scalability in Solaris and limited scalability in Linux on the same hardware. Almost always, the application driving the benchmark would have scalability issues long before any showed up in the Solaris kernel.
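
For concreteness, here is a minimal sketch of the sync-heavy kind of microbenchmark Tom describes (the structure and parameters are my own, not from his runs). As he notes, the application's own locking often becomes the bottleneck before the kernel does; the scheduler's contribution shows up in how gracefully throughput degrades as the thread count passes the core count.

/* scale_bench.c -- crude scalability probe: N threads contending on
 * one lock for a second; run with increasing N and watch throughput.
 * Build: cc -O2 -pthread scale_bench.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;
static volatile int stop;

static void *worker(void *arg)
{
    (void)arg;
    long local = 0;
    while (!stop) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
        local++;
    }
    return (void *)local;          /* per-thread acquisition count */
}

int main(int argc, char **argv)
{
    int n = argc > 1 ? atoi(argv[1]) : 4;
    pthread_t tid[n];

    for (int i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, NULL);

    sleep(1);                      /* measure for one second */
    stop = 1;

    long total = 0;
    for (int i = 0; i < n; i++) {
        void *ret;
        pthread_join(tid[i], &ret);
        total += (long)ret;
    }
    printf("%d threads: %ld lock acquisitions/sec\n", n, total);
    return 0;
}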
-tdc

David Collier-Brown

May 17, 2016, 7:30:40 PM
to guerrilla-cap...@googlegroups.com
As a former Solarii, I'd love to see the experiment done.

--dave
[I do expect we'd do well, but metrics are an eternal challenge.]

Lance N.

May 17, 2016, 9:00:00 PM
to Guerrilla Capacity Planning
Just write your apps with a load balancer, Ruby and Redis. Both of them deliberately only use one processor, so... you're fine!

rml...@gmail.com

May 19, 2016, 6:28:59 PM
to Guerrilla Capacity Planning
I have a few observations.

  • This is a Linux-specific problem. Commercial Unix implementations dedicate many engineering hours to improving scheduler performance, driven by the desire to perform well on tactical and strategic benchmarks that require concurrency (TPC-C, TPC-H, SPECint_rate, SPECjbb, etc.).
  • There are organizations with dedicated Linux kernel engineers who improve the Linux scheduler, only to have it regress between releases. They tell me that it is a constant battle.
  • Linus is not really interested in performance.
  • Some, like Brendan Gregg, acknowledge that a problem may exist but are unsure whether they experience it (cf. https://lkml.org/lkml/2016/4/23/194).

I am personally convinced that a scheduling problem exists. I've been convinced for years that Linux performance is sub-par compared with commercial Unix, especially as it relates to response-time variance and outliers.
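
One cheap way to see those outliers is a poor man's cyclictest: ask for a fixed sleep and record how late the wakeups come back, ideally while a CPU-bound load runs alongside. A minimal sketch; the 1 ms interval and iteration count are arbitrary choices of mine.

/* jitter.c -- measure wakeup-latency variance: sleep 1 ms repeatedly
 * and record the overshoot.  Build: cc -O2 jitter.c (add -lrt on
 * older glibc for clock_gettime).
 */
#include <stdio.h>
#include <time.h>

static long long ns(const struct timespec *t)
{
    return t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
    const long interval_ns = 1000000;        /* request 1 ms sleeps */
    const int iters = 10000;
    long long worst = 0, sum = 0;

    for (int i = 0; i < iters; i++) {
        struct timespec before, after;
        struct timespec req = { 0, interval_ns };

        clock_gettime(CLOCK_MONOTONIC, &before);
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &after);

        /* anything beyond the requested interval is scheduler delay */
        long long late = ns(&after) - ns(&before) - interval_ns;
        if (late > worst)
            worst = late;
        sum += late;
    }
    printf("avg overshoot %lld us, worst %lld us\n",
           sum / iters / 1000, worst / 1000);
    return 0;
}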


Neil, thanks for forwarding the article. I expect experiments over the next few months should confirm the paper's findings in the form of improved throughput and reduced outliers.


Bob


DrQ

May 27, 2016, 5:37:48 PM
to Guerrilla Capacity Planning
It seems this paper is based on a 64-core AMD Bulldozer machine with 8 NUMA nodes (8 cores per node) and HyperTransport 3 as the interconnect.
I've never heard particularly good things about HyperTransport (in any version).

Moreover, how would a change of platform, say to Intel with hyper-threading or an ARM server chip, impact the conclusions?
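
Whichever platform you try, the first step is to dump the topology the scheduler actually sees. A minimal sketch using libnuma (assumes the libnuma development headers are installed; it reports roughly what numactl --hardware shows):

/* topo.c -- print NUMA nodes and the CPUs in each, i.e., the layout
 * the scheduler's load balancer has to cope with.
 * Build: cc topo.c -lnuma
 */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }

    int nodes = numa_num_configured_nodes();
    printf("%d NUMA node(s), %d CPU(s)\n",
           nodes, numa_num_configured_cpus());

    for (int n = 0; n < nodes; n++) {
        struct bitmask *cpus = numa_allocate_cpumask();
        if (numa_node_to_cpus(n, cpus) == 0) {
            printf("node %d cpus:", n);
            for (unsigned i = 0; i < cpus->size; i++)
                if (numa_bitmask_isbitset(cpus, i))
                    printf(" %u", i);
            printf("\n");
        }
        numa_free_cpumask(cpus);
    }
    return 0;
}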



DrQ

May 27, 2016, 6:33:32 PM
to Guerrilla Capacity Planning
I should've added that the interconnect could also be replaced with something more typical, like x-Gig switched Ethernet or InfiniBand.