Kernel IO Scheduler matters, BIG


EastGhostCom

Feb 28, 2012, 9:00:11 PM
to mongodb-user
Not a problem but FYI

While fixing a huge problem I stumbled on a big but unrelated
discovery that now seems obvious. But in case you haven't paid any
attention to it...

2.6.x kernels include 4 types of IO scheduler (noop, anticipatory
a.k.a. "as", deadline and cfq). Various distros have defaulted to
different types over the years.

Ubuntu 6.06.4 LTS (ca 2006) running kernel 2.6.15-28-SMP happened to
use the "as" type IO scheduler by default. Didn't know why until
today, but this default was causing frequent serious lag while doing
big copies (e.g., "cp /proc/kcore /dev/null") and of course big mongo
operations, particularly the auto-creation/pre-allocation of "empty"
2G files, which would drive the system load up to 40 and beyond, from
its normal of anywhere from 0.5 to 4.

After reading a bunch today and partly expecting a crash and forced
reboot (and ready for it), I tried changing on-the-fly from the "as"
type scheduler (that has been running over 700 days on this server
without reboot) to the "deadline" scheduler. Articles I read said the
"deadline" type scheduler is often great for database servers...mongo
boxes. After making the adjustment (without suffering a crash or even
hiccup), I then caused a pre-allocate and also did a dreaded
"cp /proc/kcore /dev/null", but the total load never exceeded 7 (vs 40+ !)
and responsiveness to new concomitant tasks (ls -al, etc.) was noticeably
better than under the default "as" type scheduler. Normal / idle
(non-copying, non-pre-allocating) operation saw system load hovering
around 0.15. This astonished me. I changed nothing else on the box;
all I did was "echo deadline > /sys/block/hda/queue/scheduler" and the
same for hdd (because both are used in a RAID1), and the load was cut
by over 80%.
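
For anyone repeating this, here is a minimal shell sketch of the check-and-switch step. It assumes the sysfs file lists the available schedulers on one line with the active one in square brackets (e.g. "noop anticipatory deadline [cfq]"), which is how 2.6 kernels report it. The function names and the SYSFS_ROOT override are hypothetical, made up for illustration; writing to the real file needs root.

```shell
#!/bin/sh

# Print the active scheduler from a line in the sysfs format,
# e.g. "noop anticipatory deadline [cfq]" prints "cfq".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Write a scheduler name to one block device's queue/scheduler file.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) so the
# function can be tried against a scratch directory first.
set_scheduler() {
    ss_dev=$1
    ss_sched=$2
    ss_file="${SYSFS_ROOT:-/sys}/block/$ss_dev/queue/scheduler"
    if [ ! -w "$ss_file" ]; then
        echo "cannot write $ss_file (missing device, or not root?)" >&2
        return 1
    fi
    echo "$ss_sched" > "$ss_file"
}

# Example (as root), for both disks of the RAID1 described above:
#   for dev in hda hdd; do
#       set_scheduler "$dev" deadline
#       active_scheduler "$(cat /sys/block/$dev/queue/scheduler)"
#   done
```

A value written this way only applies to the running kernel, so it would need to be reapplied after a reboot.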

Disbelieving but smiling, I changed to the "cfq" type, which is
apparently the default on latest Ubuntu releases and widely claimed to
be superior ("echo cfq > /sys/block/hda/queue/scheduler" and also
"echo cfq > /sys/block/hdd/queue/scheduler" for RAID1 mirror). Load
average immediately began dropping, from 0.15 or so...
0.10...0.08...0.04...0.02...0.01... Been testing this all day now:
The average typical load has not exceeded 0.2. Testing the dreaded
big cp and same-time mongo pre-allocate, the load does not exceed 2.2
(box has 2 cpus) and responsiveness is almost instant regardless of
disk stuff going on. So, the cfq type scheduler approaches "ideal"
system load and feel, all other things untouched and equal. Flipping
back to the "as" type causes instant lag and pain, proving it pays to
always explore, read and try new things.
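
To put numbers on a comparison like this, one rough approach is to sample the 1-minute load average while the big cp runs and keep the peak. The sketch below assumes the usual /proc/loadavg layout (three load averages first, e.g. "0.15 0.10 0.08 1/123 4567"); the helper names are hypothetical.

```shell
#!/bin/sh

# Print the 1-minute load average (first field) from a
# /proc/loadavg-style line.
load1() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Print the largest of the load samples passed as arguments.
peak_load() {
    printf '%s\n' "$@" |
        awk 'NR == 1 || $1 + 0 > max + 0 { max = $1 } END { print max }'
}

# Example: sample every 5 seconds while a test copy runs, then
# report the peak of the collected samples.
#   cp /proc/kcore /dev/null &
#   samples=""
#   while kill -0 $! 2>/dev/null; do
#       samples="$samples $(load1 "$(cat /proc/loadavg)")"
#       sleep 5
#   done
#   peak_load $samples
```

Running the same loop once per scheduler gives one peak figure each, which is easier to compare than eyeballing top.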

Good interview article w/ the guy Jens who wrote the schedulers:
http://kerneltrap.org/node/7637

EastGhostCom

Feb 29, 2012, 3:25:52 AM
to mongod...@googlegroups.com
"... as the literature and this study reports, no one scheduler can provide the best possible performance for all workloads..."



Gregor Macadam

Feb 29, 2012, 4:29:05 AM
to mongodb-user
Interesting!

On Feb 29, 8:25 am, EastGhostCom <mikes.google.acco...@brenden.com>
wrote:
> "... as the literature and this study reports, no one scheduler can provide
> the best possible performance for all workloads..."
> http://www.linuxinsight.com/ols2005_enhancements_to_linux_i_o_schedul...
>
> Workload Dependent Performance Evaluation of the Linux 2.6 I/O Schedulers
> http://www.linuxinsight.com/ols2004_workload_dependent_performance_ev...

Andy O'Neill

Feb 29, 2012, 11:38:44 AM
to mongodb-user
Ubuntu 10.04 LTS uses the 'deadline' scheduler by default in the
server edition. The desktop version uses 'cfq' by default. These are
probably reasonable defaults.


EastGhostCom

Mar 1, 2012, 12:59:48 AM
to mongod...@googlegroups.com

Found a note saying the "as" (anticipatory) scheduler is slated to be (and probably already has been) removed from the latest kernels. This pretty much says it all.

The "as" sched apparently had the elevator 'waiting around' for others who might get on instead of hastily servicing existing requests. As I found, this was devastating to performance under heavy load.