Sorry, it took me quite a while to realize the real root cause of the VAIO - and probably many other machines - suspend/resume regressions, which were unearthed by the dyntick / clockevents patches.
We disable a lot of ACPI/BIOS functionality during suspend, but we keep the lower idle C-states functionality active across suspend/resume. It seems that this causes trouble with certain BIOSes, but I assume that the problem is more wide spread and just not surfacing due to the various scenarios in which a machine goes into suspend/resume. I spent some quality time to figure out a set of debug mechanisms, which did not influence the problem. So it is quite likely that a lot of machines might be affected by this, but due to the configuration, interrupt scenarios, .... the problem just does not show up.
My final enlightment was, when I removed the ACPI processor module, which controls the lower idle C-states, right before resume; this worked fine all the time even without all the workaround hacks.
I really hope that this two patches finally set an end to the "jinxed VAIO heisenbug series", which started when we removed the periodic tick with the clockevents/dyntick patches.
Venki, can you please add the analogous fix to the cpuidle patch set ?
[
clockevents-remove-the-wrong-fix.patch 2K ] In a desparate attempt to fix the suspend/resume problem on Andrews VAIO I added a workaround which enforced the broadcast of the oneshot timer on resume. This was actually resolving the problem on the VAIO but was just a stupid workaround, which was not tackling the root cause: the assignement of lower idle C-States in the ACPI processor_idle code. The cpuidle patches, which utilize the dynamic tick feature and go faster into deeper C-states exposed the problem again. The correct solution is the previous patch, which prevents lower C-states across the suspend/resume.
Remove the enforcement code, including the conditional broadcast timer arming, which helped to pamper over the real problem for quite a time. The oneshot broadcast flag for the cpu, which runs the resume code can never be set at the time when this code is executed. It only gets set, when the CPU is entering a lower idle C-State.
Signed-off-by: Thomas Gleixner <t...@linutronix.de> Tested-by: Andrew Morton <a...@linux-foundation.org> Cc: Len Brown <l...@kernel.org> Cc: Venkatesh Pallipadi <venkatesh.pallip...@intel.com> Cc: Rafael J. Wysocki <r...@sisk.pl>
int tick_resume_broadcast_oneshot(struct clock_event_device *bc) { - int cpu = smp_processor_id(); - - /* - * If the CPU is marked for broadcast, enforce oneshot - * broadcast mode. The jinxed VAIO does not resume otherwise. - * No idea why it ends up in a lower C State during resume - * without notifying the clock events layer. - */ - if (cpu_isset(cpu, tick_broadcast_mask)) - cpu_set(cpu, tick_broadcast_oneshot_mask); - clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT); - - if(!cpus_empty(tick_broadcast_oneshot_mask)) - tick_broadcast_set_event(ktime_get(), 1); - - return cpu_isset(cpu, tick_broadcast_oneshot_mask); + return 0; }
[
acpi-ignore-cstates-during-suspend.patch 3K ] device_suspend() calls ACPI suspend functions, which seems to have undesired side effects on lower idle C-states. It took me some time to realize that especially the VAIO BIOSes (both Andrews jinxed UP and my elfstruck SMP one) show this effect. I'm quite sure that other bug reports against suspend/resume about turning the system into a brick have the same root cause.
After fishing in the dark for quite some time, I realized that removing the ACPI processor module before suspend (this removes the lower C-state functionality) made the problem disappear. Interestingly enough the propability of having a bricked box is influenced by various factors (interrupts, size of the ram image, ...). Even adding a bunch of printks in the wrong places made the problem go away. The previous periodic tick implementation simply pampered over the problem, which explains why the dyntick / clockevents changes made this more prominent.
We avoid complex functionality during the boot process and we have to do the same during suspend/resume. It is a similar scenario and equaly fragile.
Add suspend / resume functions to the ACPI processor code and disable the lower idle C-states across suspend/resume. Fall back to the default idle implementation (halt) instead.
Signed-off-by: Thomas Gleixner <t...@linutronix.de> Tested-by: Andrew Morton <a...@linux-foundation.org> Cc: Len Brown <l...@kernel.org> Cc: Venkatesh Pallipadi <venkatesh.pallip...@intel.com> Cc: Rafael J. Wysocki <r...@sisk.pl>
> My final enlightment was, when I removed the ACPI processor module, > which controls the lower idle C-states, right before resume; this > worked fine all the time even without all the workaround hacks.
> I really hope that this two patches finally set an end to the "jinxed > VAIO heisenbug series", which started when we removed the periodic > tick with the clockevents/dyntick patches.
Ok, so the patches look fine, but I somehow have this slight feeling that you gave up a bit too soon on the "*why* does this happen?" question.
I realize that the answer is easily "because ACPI screwed up", but I'm wondering if there's something we do to trigger that screw-up.
In particular, I also suspect that this may not really fix the problem - maybe it just makes the window sufficiently small that it no longer triggers. Because we don't necessarily understand what the real background for the problem is, I'm not sure we can say that it is solved.
The reason I say this is that I have a suspicion on what triggers it.
and here the big thing to notice is that "pm_ops->prepare()" call, which sets the wakup vector etc etc.
So maybe the real problem here is that once we've done the "->prepare()" call and ACPI has set up various stuff, we MUST NOT do any calls to any ACPI routines to set low-power states, because the stupid firmware isn't expecting it.
Now, if this is the cause, then I think your patch should indeed fix it, since you get called by the early-suspend code (which happens *before* the "->prepare()" call), but at the same time, I wonder if maybe it would be slightly "more correct" to instead of using the suspend/resume callbacks, simply do this in the "acpi_pm_prepare()" stage, since that is likely the thing that triggers it?
But hey, I think I'll apply the patches as-is. I'd just feel even better if we actually understood *why* doing the CPU Cx states is not something we can do around the suspend code!
On Sat, 2007-09-22 at 15:59 -0700, Linus Torvalds wrote: > > My final enlightment was, when I removed the ACPI processor module, > > which controls the lower idle C-states, right before resume; this > > worked fine all the time even without all the workaround hacks.
> > I really hope that this two patches finally set an end to the "jinxed > > VAIO heisenbug series", which started when we removed the periodic > > tick with the clockevents/dyntick patches.
> Ok, so the patches look fine, but I somehow have this slight feeling that > you gave up a bit too soon on the "*why* does this happen?" question.
Yeah, I gave up at the point where I was not longer able to dig deeper :)
> I realize that the answer is easily "because ACPI screwed up", but I'm > wondering if there's something we do to trigger that screw-up.
> In particular, I also suspect that this may not really fix the problem - > maybe it just makes the window sufficiently small that it no longer > triggers. Because we don't necessarily understand what the real background > for the problem is, I'm not sure we can say that it is solved.
> The reason I say this is that I have a suspicion on what triggers it.
> and here the big thing to notice is that "pm_ops->prepare()" call, which > sets the wakup vector etc etc.
> So maybe the real problem here is that once we've done the "->prepare()" > call and ACPI has set up various stuff, we MUST NOT do any calls to any > ACPI routines to set low-power states, because the stupid firmware isn't > expecting it.
That's what I suspect and deduced from the various experiments including a force the cpu into a lower c-state one, which triggered the problem fully reproducible. Note that in case of the "force a lower c-state" I verified, that the PIT was activated to avoid the local apic stops in c3 issue. But I never got an PIT interrupt. Either the box was completely stuck or I was able to recover by hitting a key, which is as well one of the unexplained phenomenons.
> Now, if this is the cause, then I think your patch should indeed fix it, > since you get called by the early-suspend code (which happens *before* the > "->prepare()" call), but at the same time, I wonder if maybe it would be > slightly "more correct" to instead of using the suspend/resume callbacks, > simply do this in the "acpi_pm_prepare()" stage, since that is likely the > thing that triggers it?
Yeah, probably that's the correct point, but I leave this to the ACPI wizards.
> But hey, I think I'll apply the patches as-is. I'd just feel even better > if we actually understood *why* doing the CPU Cx states is not something > we can do around the suspend code!
That needs some explanation of the folks who can actually look beyond the ACPI/BIOS internals.
from Mihal, who just managed to do some other magic in sligtly different context (maybe yet another anti "ACPI screwed up").
Mihai, you can find whole thread by above URL, requesting message-ids from in-reply-to or references headers of this message. _____ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
> from Mihal, who just managed to do some other magic in sligtly > different context (maybe yet another anti "ACPI screwed up").
> Mihai, you can find whole thread by above URL, requesting message-ids > from in-reply-to or references headers of this message.
From a "future behaviour" standpoint it would probably be interesting to hear whether Mihai can make his machine with not with the old IDE layer (which distributions are migrating away from) but with the ATA layer (libata) instead. It too should hopefully know about using ACPI to restore any ATA controller quirks.
On Sunday 23 September 2007, Linus Torvalds wrote:
> From a "future behaviour" standpoint it would probably be interesting to > hear whether Mihai can make his machine with not with the old IDE layer > (which distributions are migrating away from) but with the ATA layer > (libata) instead. It too should hopefully know about using ACPI to restore > any ATA controller quirks.
I switched to libata, but it behaves like the old IDE without ACPI. I did not manage to get a full dmesg (apparently all volumes are mounted r/o right after a power up from a s2ram) but I did make a picture, from which I quote (if I may say so):
" ata1.00: configured for PIO0 sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08 sd 0:0:0:0: [sda] Sense key : 0xb [current] [descriptor] Descriptor sense data with sense descriptors (in hex): 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 07 65 35 25 sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0 end_request: I/O error, dev sda, sector 124073509 ata1: EH complete sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata1.00: cmd c5/00:10:75:36:65/00:00:00:00:00/e7 tag 0 cdb 0x0 data 8192 out res 51/04:10:75:36:65/00:00:00:00:00/e7 Emask 0x1 (device error) ata1.00: configured for PIO0 ata1: EH complete "
The last six lines repeat six times after which the whole things goes from the beginning: " ata1.00: configured for PIO0 ... "
It all gets crazy the moment I (or a process) try to access the root (or any other drive), until then, everything is nice and quiet.
Mmm... in the excerpt above it says: "Write Protect is off" but when I did $ mount -o remount,rw / I got something like: "the device is write protected".
I tried to save the dmesg on a mmc, but after powering up it said: "out of disk space"
These are about all symptoms that I noticed... oh, and 'scsi_eh_0/1' enters disk-sleep often.
On Sunday, 23 September 2007 00:59, Linus Torvalds wrote:
> On Sat, 22 Sep 2007, Thomas Gleixner wrote:
> > My final enlightment was, when I removed the ACPI processor module, > > which controls the lower idle C-states, right before resume; this > > worked fine all the time even without all the workaround hacks.
> > I really hope that this two patches finally set an end to the "jinxed > > VAIO heisenbug series", which started when we removed the periodic > > tick with the clockevents/dyntick patches.
> Ok, so the patches look fine, but I somehow have this slight feeling that > you gave up a bit too soon on the "*why* does this happen?" question.
> I realize that the answer is easily "because ACPI screwed up", but I'm > wondering if there's something we do to trigger that screw-up.
> In particular, I also suspect that this may not really fix the problem - > maybe it just makes the window sufficiently small that it no longer > triggers. Because we don't necessarily understand what the real background > for the problem is, I'm not sure we can say that it is solved.
> The reason I say this is that I have a suspicion on what triggers it.
> and here the big thing to notice is that "pm_ops->prepare()" call, which > sets the wakup vector etc etc.
> So maybe the real problem here is that once we've done the "->prepare()" > call and ACPI has set up various stuff, we MUST NOT do any calls to any > ACPI routines to set low-power states, because the stupid firmware isn't > expecting it.
I think that this is the case.
> Now, if this is the cause, then I think your patch should indeed fix it, > since you get called by the early-suspend code (which happens *before* the > "->prepare()" call), but at the same time, I wonder if maybe it would be > slightly "more correct" to instead of using the suspend/resume callbacks, > simply do this in the "acpi_pm_prepare()" stage, since that is likely the > thing that triggers it?
> But hey, I think I'll apply the patches as-is. I'd just feel even better > if we actually understood *why* doing the CPU Cx states is not something > we can do around the suspend code!
> I switched to libata, but it behaves like the old IDE without ACPI. I > did not manage to get a full dmesg (apparently all volumes are mounted > r/o right after a power up from a s2ram) but I did make a picture, from > which I quote (if I may say so):
Device errors.
Libata currently (wrongly IMHO) defaults to avoiding the use of ACPI suspend/resume methods
So you also need to boot with "libata.noacpi=0" and if that works beat up Jeff a bit ..
> > I switched to libata, but it behaves like the old IDE without ACPI. I > > did not manage to get a full dmesg (apparently all volumes are mounted > > r/o right after a power up from a s2ram) but I did make a picture, from > > which I quote (if I may say so):
> Device errors.
> Libata currently (wrongly IMHO) defaults to avoiding the use of ACPI > suspend/resume methods
> So you also need to boot with "libata.noacpi=0" and if that works beat > up Jeff a bit ..
You were right, 'libata.noacpi=0' does the trick :)
It's interesting my mmc is unmounted before s2ram and then mounted back on resume.
So, to sum up, I have a working suspend-to-ram with libata. Not bad, not bad at all...
On Sat, Sep 22, 2007 at 08:11:06PM -0700, Linus Torvalds wrote: > From a "future behaviour" standpoint it would probably be interesting to > hear whether Mihai can make his machine with not with the old IDE layer > (which distributions are migrating away from) but with the ATA layer > (libata) instead. It too should hopefully know about using ACPI to restore > any ATA controller quirks.
It does, but it's disabled by default. libata needs to be loaded with noacpi=0.
> On Sat, 22 Sep 2007, Thomas Gleixner wrote: >> My final enlightment was, when I removed the ACPI processor module, >> which controls the lower idle C-states, right before resume; this >> worked fine all the time even without all the workaround hacks.
>> I really hope that this two patches finally set an end to the "jinxed >> VAIO heisenbug series", which started when we removed the periodic >> tick with the clockevents/dyntick patches.
> Ok, so the patches look fine, but I somehow have this slight feeling that > you gave up a bit too soon on the "*why* does this happen?" question.
On a closely related note: I just now submitted a patch to fix SMP-poweroff, by having it do disable_nonboot_cpus before doing poweroff.
Which has led me to thinking.. ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
Because one never knows what the other CPUs are doing at the same time, and what the side effects may be on the ACPI BIOS functions.
And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls only ever happen from CPU#0 (or the "boot" CPU)? Or do we do that already?
On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote: > Linus Torvalds wrote:
> > On Sat, 22 Sep 2007, Thomas Gleixner wrote: > >> My final enlightment was, when I removed the ACPI processor module, > >> which controls the lower idle C-states, right before resume; this > >> worked fine all the time even without all the workaround hacks.
> >> I really hope that this two patches finally set an end to the "jinxed > >> VAIO heisenbug series", which started when we removed the periodic > >> tick with the clockevents/dyntick patches.
> > Ok, so the patches look fine, but I somehow have this slight feeling that > > you gave up a bit too soon on the "*why* does this happen?" question.
> On a closely related note: I just now submitted a patch to fix SMP-poweroff, > by having it do disable_nonboot_cpus before doing poweroff.
> Which has led me to thinking.. > ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
> Because one never knows what the other CPUs are doing at the same time, > and what the side effects may be on the ACPI BIOS functions.
> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls > only ever happen from CPU#0 (or the "boot" CPU)? Or do we do that already?
The ACPI calls are serialized in the kernel, AFAICT. But the fragile situations (suspend, resume, shutdown, reboot) are probably those, where some BIOS implementation expect that certain things are not called or not active.
> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls > only ever happen from CPU#0 (or the "boot" CPU)? Or do we do that already?
The real question that matters is "does windows" which possibly someone who touches Windows can definitively answer. For APM we lock to CPU 0 and have to because its what MS did, not because of the spec. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Thomas Gleixner wrote: > On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote: .. >> On a closely related note: I just now submitted a patch to fix SMP-poweroff, >> by having it do disable_nonboot_cpus before doing poweroff.
>> Which has led me to thinking.. >> ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
>> Because one never knows what the other CPUs are doing at the same time, >> and what the side effects may be on the ACPI BIOS functions.
>> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls >> only ever happen from CPU#0 (or the "boot" CPU)? Or do we do that already?
> The ACPI calls are serialized in the kernel, AFAICT. But the fragile > situations (suspend, resume, shutdown, reboot) are probably those, where > some BIOS implementation expect that certain things are not called or > not active.
Mmm.. *do* we actually do this for reboot? I don't see it there. And how about for kexec?
On Friday, 28 September 2007 23:17, Mark Lord wrote: > Thomas Gleixner wrote: > > On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote: > .. > >> On a closely related note: I just now submitted a patch to fix SMP-poweroff, > >> by having it do disable_nonboot_cpus before doing poweroff.
> >> Which has led me to thinking.. > >> ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
> >> Because one never knows what the other CPUs are doing at the same time, > >> and what the side effects may be on the ACPI BIOS functions.
> >> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls > >> only ever happen from CPU#0 (or the "boot" CPU)? Or do we do that already?
> > The ACPI calls are serialized in the kernel, AFAICT. But the fragile > > situations (suspend, resume, shutdown, reboot) are probably those, where > > some BIOS implementation expect that certain things are not called or > > not active.
> Mmm.. *do* we actually do this for reboot? I don't see it there. > And how about for kexec?
> I'm probably just missing seeing it. Right?
Nope.
Till now, only hibernation and suspend disabled the nonboot CPUs before invoking the platform firmware.
>> On Sat, 22 Sep 2007, Thomas Gleixner wrote: >>> My final enlightment was, when I removed the ACPI processor module, >>> which controls the lower idle C-states, right before resume; this >>> worked fine all the time even without all the workaround hacks.
>>> I really hope that this two patches finally set an end to the "jinxed >>> VAIO heisenbug series", which started when we removed the periodic >>> tick with the clockevents/dyntick patches.
>> Ok, so the patches look fine, but I somehow have this slight feeling >> that you gave up a bit too soon on the "*why* does this happen?" >> question.
> On a closely related note: I just now submitted a patch to fix > SMP-poweroff, > by having it do disable_nonboot_cpus before doing poweroff.
> Which has led me to thinking.. > ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
> Because one never knows what the other CPUs are doing at the same time, > and what the side effects may be on the ACPI BIOS functions.
> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS > calls > only ever happen from CPU#0 (or the "boot" CPU)? Or do we do that > already?
Boot CPU, and AFAIK suspend is the only place which does it.
-- Bill Davidsen <david...@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot