Fix for "load balancing tick" issue

l0tek

Jul 18, 2010, 3:25:49 AM
to Zen-Kernel
Until the kernel devs come up with an official fix, looks like
commenting out nohz_ratelimit(cpu) in tick-sched.c fixes (sort of) the
infamous nohz balance problem.

See:
http://lkml.org/lkml/2010/7/8/122
http://www.listware.net/201007/linux-kernel/16253-high-power-consumption-in-recent-kernels.html

Steven Barrett

Jul 18, 2010, 3:44:38 AM
to zen_k...@googlegroups.com

Nice find, l0tek. This is a severe bug, though, so I'm sure they'll at
least remove nohz_ratelimit(cpu) from the logic if they think the
longer-term fix is too invasive. Seeing how they completely revamped
the writeback code at the last minute, I don't see why they won't do
the same with this.

Triplesquarednine

Jul 21, 2010, 12:07:09 PM
to Zen-Kernel
> See:
> http://lkml.org/lkml/2010/7/8/122
> http://www.listware.net/201007/linux-kernel/16253-high-power-consumpt...

I'm a bit foggy on how to do this exactly; could you give an example?

I looked at the code, but it isn't obvious to me exactly from
"where to where" I must comment out.
I am getting the symptoms of this nohz problem...

Any help would be great.

jordan

I can downgrade and use 2.6.33, but 2.6.34 seems to be much nicer for
my applications, aside from this issue.

Thanks

l0tek

Jul 22, 2010, 12:55:18 AM
to Zen-Kernel
You just have to edit kernel/time/tick-sched.c: look for the function
tick_nohz_stop_sched_tick(int inidle) and, where it says
arch_needs_cpu(cpu), comment out nohz_ratelimit(cpu) so that it reads:

if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
    arch_needs_cpu(cpu) /* || nohz_ratelimit(cpu) */) {

However as Steven pointed out, the devs will probably come up with a
better patch for the nohz issue soon (just take a look at the work in
the latest 2.6.35 snapshot...)
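
For anyone unsure what the edit changes logically, here is a toy model in plain C. It is only a sketch: struct cpu_state and the two condition_* functions are made-up stand-ins for illustration, not kernel code, and each flag simply plays the role of the return value of the kernel predicate with the same name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: each flag mimics the result of the kernel
 * predicate with the same name. Not part of the kernel. */
struct cpu_state {
    bool rcu_needs_cpu;
    bool printk_needs_cpu;
    bool arch_needs_cpu;
    bool nohz_ratelimit;
};

/* The condition before the edit: any of the four predicates being
 * true makes the if-branch run. */
static bool condition_original(const struct cpu_state *s)
{
    return s->rcu_needs_cpu || s->printk_needs_cpu ||
           s->arch_needs_cpu || s->nohz_ratelimit;
}

/* The condition after the edit: the nohz_ratelimit term is commented
 * out, so it can no longer make the if-branch run by itself. */
static bool condition_workaround(const struct cpu_state *s)
{
    return s->rcu_needs_cpu || s->printk_needs_cpu ||
           s->arch_needs_cpu /* || s->nohz_ratelimit */;
}

/* Small self-check showing where the two versions differ. */
static void check_workaround(void)
{
    struct cpu_state only_ratelimit = { .nohz_ratelimit = true };
    struct cpu_state rcu_busy = { .rcu_needs_cpu = true };

    /* Only the rate limiter fires: the two versions disagree. */
    assert(condition_original(&only_ratelimit));
    assert(!condition_workaround(&only_ratelimit));

    /* Any other predicate fires: both versions still agree. */
    assert(condition_original(&rcu_busy));
    assert(condition_workaround(&rcu_busy));
}
```

Calling check_workaround() from a small main() runs the comparison. The point is just that the only term you touch is nohz_ratelimit(cpu); the other three checks and the opening brace stay exactly as they are.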


On Jul 21, 6:07 pm, Triplesquarednine <triplesquaredn...@gmail.com>
wrote:
> On Jul 18, 3:25 am, l0tek <lotekw...@gmail.com> wrote:
>
> > Until the kernel devs come up with an official fix, looks like
> > commenting out nohz_ratelimit(cpu) in tick-sched.c fixes (sort of) the
> > infamous nohz balance problem.
>
> > See:
> > http://lkml.org/lkml/2010/7/8/122
> > http://www.listware.net/201007/linux......

Triplesquarednine

Jul 22, 2010, 4:14:03 AM
to Zen-Kernel
thanks,

I got it!

Yes, I am sure it will be fixed soon, but that is no reason for me
not to learn something, and also fix something,
if only temporarily...

again, thanks for your help :)

ninez

l0tek

Aug 5, 2010, 2:17:21 AM
to Zen-Kernel
In the 2.6.35 changelog I see the nohz patch that Peter Zijlstra
posted in June, which basically consists of moving the nohz_ratelimit
call elsewhere. I was hoping they would use the patch he
released in July (sched: Revert nohz_ratelimit(), commit
396e894d289d69bacf5acd983c97cd6e21a14c08) instead... or am I missing
something?

l0tek

Aug 5, 2010, 5:15:12 AM
to Zen-Kernel
BTW, Arjan van de Ven's patch (per-cpu tick skew removal) was also
interesting:
http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/01096.html

Steven Barrett

Aug 5, 2010, 8:58:23 PM
to zen_k...@googlegroups.com, l0tek
Looks reasonable, I'll add this and the no_hz patch suggested above to
2.6.35 on zen.git