Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

sata suspend resume ...

4 views
Skip to first unread message

Jeff Chua

unread,
Apr 19, 2006, 11:30:32 AM4/19/06
to

Any change of getting suspend/resume to work on my IBM X60s notebook.

Disk model is ...

MODEL="ATA HTS541060G9SA00"
FW_REV="MB3I"

Linux 2.6.17-rc2.

System suspends ok. Resume ok. but no disk access after that.


Thanks,
Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Arkadiusz Miskiewicz

unread,
Apr 19, 2006, 12:00:35 PM4/19/06
to
On Wednesday 19 April 2006 17:26, Jeff Chua wrote:
> Any change of getting suspend/resume to work on my IBM X60s notebook.
>
> Disk model is ...
>
> MODEL="ATA HTS541060G9SA00"
> FW_REV="MB3I"
>
> Linux 2.6.17-rc2.
>
> System suspends ok. Resume ok. but no disk access after that.

It contains AHCI, right? Then try
http://www.spinnaker.de/linux/c1320/sata-resume-2.6.16.5.patch

For some others like ata_piix ,,SATA ACPI objects support'' patch by Randy
Dunlap is needed AFAIK. See http://www.xenotime.net/linux/SATA for old
versions of these. It seems that nothing happened in this area in last 2
months. It no longer applies (parts of it are already in mailine, other parts
changed too much). Z60* ThinkPads probably need that patch.

> Thanks,
> Jeff

--
Arkadiusz Miƛkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/

Hugh Dickins

unread,
Apr 19, 2006, 12:20:11 PM4/19/06
to
On Wed, 19 Apr 2006, Jeff Chua wrote:
>
> Any change of getting suspend/resume to work on my IBM X60s notebook.
>
> Disk model is ...
>
> MODEL="ATA HTS541060G9SA00"
> FW_REV="MB3I"
>
> Linux 2.6.17-rc2.
>
> System suspends ok. Resume ok. but no disk access after that.

Not the same disk model, but I've been having similar trouble on a T43p.

I was delighted to see the MSI suspend/resume fix go into 2.6.17-rc2,
but then disappointed. A bisection found that Matt Mackall's sensible
rc1 patch, to speed up get_cmos_time, has removed what often used to be
a 2 second delay in resuming: things work well when I reinstate that
delay (1 second has proved not enough). Below is the patch I'm using -
where I've failed to resist mucking around to avoid those double calls
to get_cmos_time, sorry: really it's just mdelay(2000) needed somewhere
(until someone who knows puts in something more scientific).

Your problem, of course, is quite likely to be something else entirely;
but I thought I ought to speak up, in case it does help.

Hugh

--- 2.6.17-rc2/arch/i386/kernel/time.c 2006-03-20 05:53:29.000000000 +0000
+++ linux/arch/i386/kernel/time.c 2006-04-19 09:56:02.000000000 +0100
@@ -379,6 +379,7 @@ void notify_arch_cmos_timer(void)
}

static long clock_cmos_diff, sleep_start;
+unsigned long resume_mdelay = 2000;

static struct timer_opts *last_timer;
static int timer_suspend(struct sys_device *dev, pm_message_t state)
@@ -386,9 +387,8 @@ static int timer_suspend(struct sys_devi
/*
* Estimate time zone so that set_time can update the clock
*/
- clock_cmos_diff = -get_cmos_time();
- clock_cmos_diff += get_seconds();
sleep_start = get_cmos_time();
+ clock_cmos_diff = get_seconds() - sleep_start;
last_timer = cur_timer;
cur_timer = &timer_none;
if (last_timer->suspend)
@@ -407,10 +407,11 @@ static int timer_resume(struct sys_devic
hpet_reenable();
#endif
setup_pit_timer();
- sec = get_cmos_time() + clock_cmos_diff;
- sleep_length = (get_cmos_time() - sleep_start) * HZ;
+ mdelay(resume_mdelay);
+ sec = get_cmos_time();
+ sleep_length = (sec - sleep_start) * HZ;
write_seqlock_irqsave(&xtime_lock, flags);
- xtime.tv_sec = sec;
+ xtime.tv_sec = clock_cmos_diff + sec;
xtime.tv_nsec = 0;
jiffies_64 += sleep_length;
wall_jiffies += sleep_length;

Arkadiusz Miskiewicz

unread,
Apr 19, 2006, 1:00:25 PM4/19/06
to
On Wednesday 19 April 2006 18:13, Hugh Dickins wrote:
> On Wed, 19 Apr 2006, Jeff Chua wrote:
> > Any change of getting suspend/resume to work on my IBM X60s notebook.
> >
> > Disk model is ...
> >
> > MODEL="ATA HTS541060G9SA00"
> > FW_REV="MB3I"
> >
> > Linux 2.6.17-rc2.
> >
> > System suspends ok. Resume ok. but no disk access after that.
>
> Not the same disk model, but I've been having similar trouble on a T43p.
>
> I was delighted to see the MSI suspend/resume fix go into 2.6.17-rc2,
> but then disappointed.

Are you using ahci or ata_piix? It seem that people have success with AHCI but
not with ata_piix.

My ThinkPad Z60m has
00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA Controller (rev
03)
which afaik is AHCI capable but only if BIOS initializes it in ahci mode ;-/
Unfortunately there is no such option in BIOS (I've checked latest available
bios - 1.14).

Is it possible to initialize this controller in AHCI mode by Linux itself
without BIOS help? (where possible means ,,possible but not implemented'',
too)

> Hugh

--
Arkadiusz Miƛkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/

Hugh Dickins

unread,
Apr 19, 2006, 1:10:13 PM4/19/06
to
On Wed, 19 Apr 2006, Arkadiusz Miskiewicz wrote:
> On Wednesday 19 April 2006 18:13, Hugh Dickins wrote:
> >
> > I was delighted to see the MSI suspend/resume fix go into 2.6.17-rc2,
> > but then disappointed.
>
> Are you using ahci or ata_piix? It seem that people have success with AHCI
> but not with ata_piix.

I'm using ata_piix. But I may have misled you. I didn't mean to imply
any deficiency in the MSI suspend/resume fix: it's just that it sounded
like the answer to whatever my problem was, and probably is a good part
of the answer, but it now looks like I had more than one problem.

> My ThinkPad Z60m has
> 00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA Controller (rev
> 03)
> which afaik is AHCI capable but only if BIOS initializes it in ahci mode ;-/
> Unfortunately there is no such option in BIOS (I've checked latest available
> bios - 1.14).
>
> Is it possible to initialize this controller in AHCI mode by Linux itself
> without BIOS help? (where possible means ,,possible but not implemented'',
> too)

Someone else will have to answer that one: sorry, I've no idea.

Hugh

Matt Mackall

unread,
Apr 19, 2006, 6:00:16 PM4/19/06
to
On Wed, Apr 19, 2006 at 05:13:27PM +0100, Hugh Dickins wrote:
> On Wed, 19 Apr 2006, Jeff Chua wrote:
> >
> > Any change of getting suspend/resume to work on my IBM X60s notebook.
> >
> > Disk model is ...
> >
> > MODEL="ATA HTS541060G9SA00"
> > FW_REV="MB3I"
> >
> > Linux 2.6.17-rc2.
> >
> > System suspends ok. Resume ok. but no disk access after that.
>
> Not the same disk model, but I've been having similar trouble on a T43p.
>
> I was delighted to see the MSI suspend/resume fix go into 2.6.17-rc2,
> but then disappointed. A bisection found that Matt Mackall's sensible
> rc1 patch, to speed up get_cmos_time, has removed what often used to be
> a 2 second delay in resuming: things work well when I reinstate that
> delay (1 second has proved not enough). Below is the patch I'm using -
> where I've failed to resist mucking around to avoid those double calls
> to get_cmos_time, sorry: really it's just mdelay(2000) needed somewhere
> (until someone who knows puts in something more scientific).

That's interesting.

Just to be clear, with my changes we should never fire timers early.
Is the problem that we have a timer that didn't get deleted at suspend
time?

--
Mathematics is the supreme nostalgia of our time.

Hugh Dickins

unread,
Apr 19, 2006, 7:00:17 PM4/19/06
to
On Wed, 19 Apr 2006, Matt Mackall wrote:
> On Wed, Apr 19, 2006 at 05:13:27PM +0100, Hugh Dickins wrote:
> > On Wed, 19 Apr 2006, Jeff Chua wrote:
> > >
> > > System suspends ok. Resume ok. but no disk access after that.
> >
> > Not the same disk model, but I've been having similar trouble on a T43p.

I should have mentioned before, it's suspend to RAM I'm using, by the way.

> > I was delighted to see the MSI suspend/resume fix go into 2.6.17-rc2,
> > but then disappointed. A bisection found that Matt Mackall's sensible
> > rc1 patch, to speed up get_cmos_time, has removed what often used to be
> > a 2 second delay in resuming: things work well when I reinstate that
> > delay (1 second has proved not enough). Below is the patch I'm using -
> > where I've failed to resist mucking around to avoid those double calls
> > to get_cmos_time, sorry: really it's just mdelay(2000) needed somewhere
> > (until someone who knows puts in something more scientific).
>
> That's interesting.
>
> Just to be clear, with my changes we should never fire timers early.

Yes, the only reservation I have about your patch, entirely unrelated to
this resume issue, is that those systems which "hwclock -w" on shutdown
(do they on suspend too? haven't looked) will slowly tend to lose time.

> Is the problem that we have a timer that didn't get deleted at suspend
> time?

I don't think so, but I don't really know. On resume, the disk
goes into ata_exec_internal's 30 second timeout which ends with
"ata1: qc timeout (cmd 0xef)": nothing wrong with that timeout, anyway.

I tend to assume that it's not anything subtle, just that something
there needs a delay which it accidentally happened to get (most of
the time) from the CMOS reading, and with that gone now falls over.

I'd be able to test patches from anyone who knows what they're
doing SATA-wise, but probably not until Friday.

Hugh

Matt Mackall

unread,
Apr 19, 2006, 7:10:15 PM4/19/06
to
On Wed, Apr 19, 2006 at 11:50:55PM +0100, Hugh Dickins wrote:
> On Wed, 19 Apr 2006, Matt Mackall wrote:
> > On Wed, Apr 19, 2006 at 05:13:27PM +0100, Hugh Dickins wrote:
> > > On Wed, 19 Apr 2006, Jeff Chua wrote:
> > > >
> > > > System suspends ok. Resume ok. but no disk access after that.
> > >
> > > Not the same disk model, but I've been having similar trouble on a T43p.
>
> I should have mentioned before, it's suspend to RAM I'm using, by the way.
>
> > > I was delighted to see the MSI suspend/resume fix go into 2.6.17-rc2,
> > > but then disappointed. A bisection found that Matt Mackall's sensible
> > > rc1 patch, to speed up get_cmos_time, has removed what often used to be
> > > a 2 second delay in resuming: things work well when I reinstate that
> > > delay (1 second has proved not enough). Below is the patch I'm using -
> > > where I've failed to resist mucking around to avoid those double calls
> > > to get_cmos_time, sorry: really it's just mdelay(2000) needed somewhere
> > > (until someone who knows puts in something more scientific).
> >
> > That's interesting.
> >
> > Just to be clear, with my changes we should never fire timers early.
>
> Yes, the only reservation I have about your patch, entirely unrelated to
> this resume issue, is that those systems which "hwclock -w" on shutdown
> (do they on suspend too? haven't looked) will slowly tend to lose time.

If they weren't already using NTP, they were losing time anyway.



> > Is the problem that we have a timer that didn't get deleted at suspend
> > time?
>
> I don't think so, but I don't really know. On resume, the disk
> goes into ata_exec_internal's 30 second timeout which ends with
> "ata1: qc timeout (cmd 0xef)": nothing wrong with that timeout, anyway.
>
> I tend to assume that it's not anything subtle, just that something
> there needs a delay which it accidentally happened to get (most of
> the time) from the CMOS reading, and with that gone now falls over.
>
> I'd be able to test patches from anyone who knows what they're
> doing SATA-wise, but probably not until Friday.

I'm puzzled by 1 second not being enough. The former code should have
taken between 1+e and 2 seconds, so I'd think mdelay(1000) would work.

--
Mathematics is the supreme nostalgia of our time.

Hugh Dickins

unread,
Apr 19, 2006, 7:30:15 PM4/19/06
to
On Wed, 19 Apr 2006, Matt Mackall wrote:
> On Wed, Apr 19, 2006 at 11:50:55PM +0100, Hugh Dickins wrote:
> >
> > Yes, the only reservation I have about your patch, entirely unrelated to
> > this resume issue, is that those systems which "hwclock -w" on shutdown
> > (do they on suspend too? haven't looked) will slowly tend to lose time.
>
> If they weren't already using NTP, they were losing time anyway.

But considerably more slowly, I thought; but I could well be wrong,
it's not something I've thought a great deal about.

> > I tend to assume that it's not anything subtle, just that something
> > there needs a delay which it accidentally happened to get (most of
> > the time) from the CMOS reading, and with that gone now falls over.
>

> I'm puzzled by 1 second not being enough. The former code should have
> taken between 1+e and 2 seconds, so I'd think mdelay(1000) would work.

You're assuming that resume worked on this before: not all the time.

This laptop is new to me, with several different issues in the
suspend to RAM area (e.g. the MSI business just fixed in -rc2).
I've not had it resuming reliably until just now: so I think that
with 2.6.16 (where I started out) it would (modulo other issues)
resume successfully when the former code worked out at 2 seconds,
but not when it worked out at 1 second.

One of the few things I'm sure of is that mdelay(1000) proved to be
not enough; maybe even mdelay(2000) is not enough, and I just haven't
yet hit a case which would show that.

Hugh

Jeff Chua

unread,
Apr 19, 2006, 10:30:16 PM4/19/06
to

On Wed, 19 Apr 2006, Arkadiusz Miskiewicz wrote:
> It contains AHCI, right? Then try
> http://www.spinnaker.de/linux/c1320/sata-resume-2.6.16.5.patch

Yes, the IBM X60s is using AHCI.

libata version 1.20 loaded.
ahci 0000:00:1f.2: version 1.2
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 16 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 1.5 Gbps 0x1 impl SATA
mode
ahci 0000:00:1f.2: flags: 64bit ncq pm led clo pio slum part


> It seems that nothing happened in this area in last 2 months. It no
> longer applies (parts of it are already in mailine, other parts changed
> too much). Z60* ThinkPads probably need that patch.

You're right. I tried to patch it with 2.6.17-rc2 and it doesn't seem to
merge well. Oh well, guess have to wait for a while ...


Thanks for the pointer.

Jeff.

Arkadiusz Miskiewicz

unread,
Apr 20, 2006, 9:30:24 AM4/20/06
to
On Wednesday 19 April 2006 18:13, Hugh Dickins wrote:

> A bisection found that Matt Mackall's sensible
> rc1 patch, to speed up get_cmos_time, has removed what often used to be
> a 2 second delay in resuming: things work well when I reinstate that
> delay (1 second has proved not enough). Below is the patch I'm using -
> where I've failed to resist mucking around to avoid those double calls
> to get_cmos_time, sorry: really it's just mdelay(2000) needed somewhere
> (until someone who knows puts in something more scientific).

FYI: I've started using 2.6.17rc2 + that patch and now resume (from suspend to
ram) works well on ThinkPad Z60m! (so far several suspend/resume cycles)

--
Arkadiusz Miƛkiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/

Pavel Machek

unread,
Apr 21, 2006, 3:40:07 AM4/21/06
to
Hi!


> > Any change of getting suspend/resume to work on my IBM X60s notebook.
> >
> > Disk model is ...
> >
> > MODEL="ATA HTS541060G9SA00"
> > FW_REV="MB3I"
> >
> > Linux 2.6.17-rc2.
> >
> > System suspends ok. Resume ok. but no disk access after that.
>
> Not the same disk model, but I've been having similar trouble on a T43p.
>
> I was delighted to see the MSI suspend/resume fix go into 2.6.17-rc2,
> but then disappointed. A bisection found that Matt Mackall's sensible
> rc1 patch, to speed up get_cmos_time, has removed what often used to be
> a 2 second delay in resuming: things work well when I reinstate that
> delay (1 second has proved not enough). Below is the patch I'm using -
> where I've failed to resist mucking around to avoid those double calls
> to get_cmos_time, sorry: really it's just mdelay(2000) needed somewhere
> (until someone who knows puts in something more scientific).
>
> Your problem, of course, is quite likely to be something else entirely;
> but I thought I ought to speak up, in case it does help.

Could you

1) try if mdelay(2000) also helps?

2) binary-search on drivers to see which one breaks it?

Pavel
--
Thanks, Sharp!

Hugh Dickins

unread,
Apr 21, 2006, 9:00:25 AM4/21/06
to
On Thu, 20 Apr 2006, Pavel Machek wrote:
> > >
> > > System suspends ok. Resume ok. but no disk access after that.
> >
> > Not the same disk model, but I've been having similar trouble on a T43p.
>
> Could you
>
> 1) try if mdelay(2000) also helps?
>
> 2) binary-search on drivers to see which one breaks it?

Thanks for looking into this. But we already know mdelay(2000) works
around it, and that the failure is "ata1: qc timeout (cmd 0xef)" when
trying to resume the SATA disk (in my case, don't know about Jeff's):
so I'm confused as to what binary search to be doing. I just tried
backing out the time.c patch I sent originally, and substituting the
patch below, much closer to the heart of the problem: that works too.
This is with ata_piix, by the way; resuming from suspend to RAM.
Do let me know what else to try if you've got an idea.

Hugh

--- 2.6.17-rc2/drivers/scsi/libata-core.c 2006-04-19 09:14:11.000000000 +0100
+++ linux/drivers/scsi/libata-core.c 2006-04-21 13:19:54.000000000 +0100
@@ -4287,6 +4287,7 @@ static int ata_start_drive(struct ata_po
int ata_device_resume(struct ata_port *ap, struct ata_device *dev)
{
if (ap->flags & ATA_FLAG_SUSPENDED) {
+ mdelay(2000);
ap->flags &= ~ATA_FLAG_SUSPENDED;
ata_set_mode(ap);

Pavel Machek

unread,
Apr 21, 2006, 12:50:05 PM4/21/06
to
Hi!

> > > > System suspends ok. Resume ok. but no disk access after that.
> > >
> > > Not the same disk model, but I've been having similar trouble on a T43p.
> >
> > Could you
> >
> > 1) try if mdelay(2000) also helps?
> >
> > 2) binary-search on drivers to see which one breaks it?
>
> Thanks for looking into this. But we already know mdelay(2000) works
> around it, and that the failure is "ata1: qc timeout (cmd 0xef)"
> when

Aha, sorry, I was confused.

> trying to resume the SATA disk (in my case, don't know about Jeff's):
> so I'm confused as to what binary search to be doing. I just tried
> backing out the time.c patch I sent originally, and substituting the
> patch below, much closer to the heart of the problem: that works too.
> This is with ata_piix, by the way; resuming from suspend to RAM.
> Do let me know what else to try if you've got an idea.

Not sure why it needs time. Waiting for disk to spin up?

Will it recover from the timeout? Would sticking ata_set_mode() at the
end of timeout routine help?
Pavel


> --- 2.6.17-rc2/drivers/scsi/libata-core.c 2006-04-19 09:14:11.000000000 +0100
> +++ linux/drivers/scsi/libata-core.c 2006-04-21 13:19:54.000000000 +0100
> @@ -4287,6 +4287,7 @@ static int ata_start_drive(struct ata_po
> int ata_device_resume(struct ata_port *ap, struct ata_device *dev)
> {
> if (ap->flags & ATA_FLAG_SUSPENDED) {
> + mdelay(2000);
> ap->flags &= ~ATA_FLAG_SUSPENDED;
> ata_set_mode(ap);
> }

--
Thanks for all the (sleeping) penguins.

Hugh Dickins

unread,
Apr 21, 2006, 4:50:14 PM4/21/06
to
On Fri, 21 Apr 2006, Pavel Machek wrote:
>
> Not sure why it needs time. Waiting for disk to spin up?

I don't know, and can't hear, but doubt it (I can't see why
it'd need disk spun up just to do an ata_dev_set_xfermode).

> Will it recover from the timeout?

No, after that wait for 30 seconds, it degenerates into attempting I/O,
getting errors, remounting the root readonly, can't get much further.

> Would sticking ata_set_mode() at the end of timeout routine help?

Well, moving the ata_set_mode after the ata_start_drive does help:
then the ata_start_drive times out and fails, but that does not seem
to matter at all, and the ata_set_mode then succeeds and all is well.
I guess that amounts to what you meant; but all the same, I won't be
alone in preferring to wait 2 seconds than 30 seconds!

But you've made me try a bit harder, and the patch below, waiting for
ATA_BUSY to clear, copying a line used in several other places there,
fixes it in a much more satisfactory way than mdelay(2000). (I checked
how long it in fact was waiting, saw various waits between 0.8s and 1.3s).

This is a patch I'd not be ashamed to send Jeff Garzik cc linux-ide,
even if we can't name precisely why it's ATA_BUSY then. But I'll
give it a day or so of real-life suspend/resuming first - Arkadiusz
and I both noticed we're more likely to resume successfully after
a brief suspend, so longer suspends are needed for proper testing.

Hugh

--- 2.6.17-rc2/drivers/scsi/libata-core.c 2006-04-19 09:14:11.000000000 +0100

+++ linux/drivers/scsi/libata-core.c 2006-04-21 20:55:48.000000000 +0100
@@ -4288,6 +4288,7 @@ int ata_device_resume(struct ata_port *a


{
if (ap->flags & ATA_FLAG_SUSPENDED) {

ap->flags &= ~ATA_FLAG_SUSPENDED;
+ ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);
ata_set_mode(ap);
}
if (!ata_dev_present(dev))

Matt Mackall

unread,
Apr 21, 2006, 5:00:21 PM4/21/06
to
On Fri, Apr 21, 2006 at 09:44:59PM +0100, Hugh Dickins wrote:
> On Fri, 21 Apr 2006, Pavel Machek wrote:
> >
> > Not sure why it needs time. Waiting for disk to spin up?
>
> I don't know, and can't hear, but doubt it (I can't see why
> it'd need disk spun up just to do an ata_dev_set_xfermode).
>
> > Will it recover from the timeout?
>
> No, after that wait for 30 seconds, it degenerates into attempting I/O,
> getting errors, remounting the root readonly, can't get much further.
>
> > Would sticking ata_set_mode() at the end of timeout routine help?
>
> Well, moving the ata_set_mode after the ata_start_drive does help:
> then the ata_start_drive times out and fails, but that does not seem
> to matter at all, and the ata_set_mode then succeeds and all is well.
> I guess that amounts to what you meant; but all the same, I won't be
> alone in preferring to wait 2 seconds than 30 seconds!
>
> But you've made me try a bit harder, and the patch below, waiting for
> ATA_BUSY to clear, copying a line used in several other places there,
> fixes it in a much more satisfactory way than mdelay(2000). (I checked
> how long it in fact was waiting, saw various waits between 0.8s and 1.3s).
>
> This is a patch I'd not be ashamed to send Jeff Garzik cc linux-ide,
> even if we can't name precisely why it's ATA_BUSY then. But I'll
> give it a day or so of real-life suspend/resuming first - Arkadiusz
> and I both noticed we're more likely to resume successfully after
> a brief suspend, so longer suspends are needed for proper testing.

Well this looks like definite progress. Does kind of look like
spinning up.

> + ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);

Huh. That function actually "busy sleeps". How aptly named.

--
Mathematics is the supreme nostalgia of our time.

Pavel Machek

unread,
Apr 21, 2006, 5:20:20 PM4/21/06
to
Hi!

> This is a patch I'd not be ashamed to send Jeff Garzik cc linux-ide,
> even if we can't name precisely why it's ATA_BUSY then. But I'll
> give it a day or so of real-life suspend/resuming first - Arkadiusz
> and I both noticed we're more likely to resume successfully after
> a brief suspend, so longer suspends are needed for proper testing.

Thanks, looks good.
Pavel

> Hugh
>
> --- 2.6.17-rc2/drivers/scsi/libata-core.c 2006-04-19 09:14:11.000000000 +0100
> +++ linux/drivers/scsi/libata-core.c 2006-04-21 20:55:48.000000000 +0100
> @@ -4288,6 +4288,7 @@ int ata_device_resume(struct ata_port *a
> {
> if (ap->flags & ATA_FLAG_SUSPENDED) {
> ap->flags &= ~ATA_FLAG_SUSPENDED;
> + ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);
> ata_set_mode(ap);
> }
> if (!ata_dev_present(dev))

--

Thanks for all the (sleeping) penguins.

Jeff Garzik

unread,
Apr 21, 2006, 5:40:06 PM4/21/06
to
Hugh Dickins wrote:
> --- 2.6.17-rc2/drivers/scsi/libata-core.c 2006-04-19 09:14:11.000000000 +0100
> +++ linux/drivers/scsi/libata-core.c 2006-04-21 20:55:48.000000000 +0100
> @@ -4288,6 +4288,7 @@ int ata_device_resume(struct ata_port *a
> {
> if (ap->flags & ATA_FLAG_SUSPENDED) {
> ap->flags &= ~ATA_FLAG_SUSPENDED;
> + ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);
> ata_set_mode(ap);
> }


This is helpful to narrow down the problem, but its a bit of a layering
violation. In the current code, all functions called by
ata_device_{suspend,resume}() are high level functions, which uses
ata_qc_issue/ata_qc_complete high level API to address the device.

In contrast, ata_busy_sleep() sticks its hands deep into the host state
machine, and gives the tree a good hard shake. :) Consider that
ata_busy_sleep() doesn't make sense for unusual cases like
ATA-over-ethernet (AoE), or other tunnelled ATA transports.

It may very well be that ata_busy_sleep() is the proper solution for
your hardware, but it isn't applicable to all hardware.

So you really want an ata_make_sure_bus_is_awake_and_working() called at
that location. ata_busy_sleep()'s purpose is to bring a PATA-like bus
to the bus-idle state. So, when working on suspend/resume, the software
needs to have points at which the bus state is controlled/queried/asserted.

Jeff

Chris Ball

unread,
Apr 21, 2006, 7:40:09 PM4/21/06
to
>> On Fri, 21 Apr 2006, Hugh Dickins <hu...@veritas.com> said:

> But you've made me try a bit harder, and the patch below, waiting
> for ATA_BUSY to clear, copying a line used in several other places
> there, fixes it in a much more satisfactory way than mdelay(2000).
> (I checked how long it in fact was waiting, saw various waits
> between 0.8s and 1.3s).

FWIW, this patch fixes S3 resume for me too. I'm on an Alienware m5500
using sd_mod and ata_piix, and I think your T43p is using AHCI, so it
seems that this fixes a libata-wide problem rather than something
specific to your hardware.

- Chris.
--
Chris Ball <c...@mrao.cam.ac.uk> <http://blog.printf.net/>

Hugh Dickins

unread,
Apr 23, 2006, 8:50:09 AM4/23/06
to
On Sat, 22 Apr 2006, Chris Ball wrote:
>
> FWIW, this patch fixes S3 resume for me too. I'm on an Alienware m5500
> using sd_mod and ata_piix, and I think your T43p is using AHCI, so it
> seems that this fixes a libata-wide problem rather than something
> specific to your hardware.

Thanks for the info, that is useful; but in fact I'm ata_piix not ahci.

Hugh

Hugh Dickins

unread,
Apr 23, 2006, 9:00:09 AM4/23/06
to
Thanks for taking a look, and replying in such detail, Jeff
(I'm afraid you overestimate my understanding of this area!)

On Fri, 21 Apr 2006, Jeff Garzik wrote:
> This is helpful to narrow down the problem, but its a bit of a layering
> violation. In the current code, all functions called by
> ata_device_{suspend,resume}() are high level functions, which uses
> ata_qc_issue/ata_qc_complete high level API to address the device.

Ah, that's a pity. I see what you mean.

> In contrast, ata_busy_sleep() sticks its hands deep into the host state
> machine, and gives the tree a good hard shake. :)

Though that seems a considerable overstatement: ata_busy_sleep doesn't
shake anything, it just hangs around reading and testing a flag (in a
bewildering series of slightly different loops). I guess even reading
status must be viewed as "a good hard shake" at this upper level.

> Consider that
> ata_busy_sleep() doesn't make sense for unusual cases like ATA-over-ethernet
> (AoE), or other tunnelled ATA transports.
>
> It may very well be that ata_busy_sleep() is the proper solution for your
> hardware, but it isn't applicable to all hardware.

Can it do harm on any hardware to wait for ATA_BUSY to clear there?
Would it be less of a violation to do it in ata_scsi_device_resume?
Or should ata_piix.c have its own .resume to add this in? Or....

> So you really want an ata_make_sure_bus_is_awake_and_working() called at that
> location. ata_busy_sleep()'s purpose is to bring a PATA-like bus to the
> bus-idle state. So, when working on suspend/resume, the software needs to
> have points at which the bus state is controlled/queried/asserted.

As you can see from my questions, I haven't a clue around here. So for
now I'll just have to keep that ata_busy_sleep with the patches I apply
to my kernel, until someone with a clue makes it redundant. And it is
now there in the LKML archives for those who find it useful.

Hugh

Jeff Chua

unread,
Apr 23, 2006, 10:00:14 AM4/23/06
to
On Sun, 23 Apr 2006, Hugh Dickins wrote:

> On Sat, 22 Apr 2006, Chris Ball wrote:
>> FWIW, this patch fixes S3 resume for me too. I'm on an Alienware m5500
>> using sd_mod and ata_piix, and I think your T43p is using AHCI, so it
>> seems that this fixes a libata-wide problem rather than something
>> specific to your hardware.
>
> Thanks for the info, that is useful; but in fact I'm ata_piix not ahci.


May be just me, not matter what I tried, it still doesn't work. Closest I
can get is to use "resume=/dev/sda" on boot, able to suspend, able to
resume to X windows, can do anything, but can't access disk. ... simple
"ls" would hang. Dmesg is show SATA disk timeout.


I've tried both "piix" and "ahci". Both suspend to disk and mem.


My config ...
CONFIG_SOFTWARE_SUSPEND=y
CONFIG_PM_STD_PARTITION="/dev/sda3"
CONFIG_SUSPEND_SMP=y
CONFIG_SUSPEND2_CRYPTO=y
CONFIG_SUSPEND2=y
CONFIG_SUSPEND2_SWAPWRITER=y
CONFIG_SUSPEND2_DEFAULT_RESUME2="swap:/dev/sda3"
CONFIG_SUSPEND_SHARED=y


Tried suspending via ...

echo shutdown > /sys/power/disk; echo disk > /sys/power/state
echo platform > /sys/power/disk; echo disk > /sys/power/state
echo mem > /sys/power/state


Grub.cfg ...
linux /linux/bzc1 root=/dev/sda2 resume=/dev/sda3


To use "piix" ...
Power On -> BIOS Setup -> SATA -> select "Compatibility"

To use "ahci" ...
Power On -> BIOS Setup -> SATA -> select "AHCI"


Linux version is 2.6.17-rc2. IBM X60s is Pentium D, so SMP ... may be this
has something to do with it.

I would really like to get suspend to work, so whatever ideas you've, I'm
willing to try it.


Thanks,
Jeff.

PS. Official email jchua at fedex dot com, but simply got too much spam at
jeff...@silk.corp.fedex.com and not easy to access as I'm travelling now.

--- arch/i386/kernel/time.c.org 2006-04-21 22:32:15 +0800
+++ arch/i386/kernel/time.c 2006-04-21 22:34:01 +0800
@@ -379,6 +379,8 @@
}

static long clock_cmos_diff, sleep_start;
+unsigned long resume_mdelay = 2000;

+

static struct timer_opts *last_timer;
static int timer_suspend(struct sys_device *dev, pm_message_t state)

@@ -386,9 +388,8 @@


/*
* Estimate time zone so that set_time can update the clock
*/
- clock_cmos_diff = -get_cmos_time();
- clock_cmos_diff += get_seconds();
sleep_start = get_cmos_time();
+ clock_cmos_diff = get_seconds() - sleep_start;
last_timer = cur_timer;
cur_timer = &timer_none;
if (last_timer->suspend)

@@ -407,10 +408,11 @@


hpet_reenable();
#endif
setup_pit_timer();
- sec = get_cmos_time() + clock_cmos_diff;
- sleep_length = (get_cmos_time() - sleep_start) * HZ;
+ mdelay(resume_mdelay);
+ sec = get_cmos_time();
+ sleep_length = (sec - sleep_start) * HZ;
write_seqlock_irqsave(&xtime_lock, flags);
- xtime.tv_sec = sec;
+ xtime.tv_sec = clock_cmos_diff + sec;
xtime.tv_nsec = 0;
jiffies_64 += sleep_length;
wall_jiffies += sleep_length;

--- linux/drivers/scsi/libata-core.c.org 2006-04-23 00:27:00 +0800
+++ linux/drivers/scsi/libata-core.c 2006-04-23 00:22:47 +0800
@@ -4287,7 +4287,8 @@


int ata_device_resume(struct ata_port *ap, struct ata_device *dev)
{

if (ap->flags & ATA_FLAG_SUSPENDED) {
ap->flags &= ~ATA_FLAG_SUSPENDED;
+ ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);
ata_set_mode(ap);
}
if (!ata_dev_present(dev))

Hugh Dickins

unread,
Apr 29, 2006, 2:10:09 PM4/29/06
to
On Sun, 23 Apr 2006, Hugh Dickins wrote:
> On Fri, 21 Apr 2006, Jeff Garzik wrote:
>
> > So you really want an ata_make_sure_bus_is_awake_and_working() called at that
> > location. ata_busy_sleep()'s purpose is to bring a PATA-like bus to the
> > bus-idle state. So, when working on suspend/resume, the software needs to
> > have points at which the bus state is controlled/queried/asserted.
>
> As you can see from my questions, I haven't a clue around here. So for
> now I'll just have to keep that ata_busy_sleep with the patches I apply
> to my kernel, until someone with a clue makes it redundant. And it is
> now there in the LKML archives for those who find it useful.

I'm glad to report that my ata_busy_sleep is already unnecessary in
2.6.17-rc2-mm1 (and probably in at least -rc1-mm3 before it): unlike in
2.6.17-rc3, T43p resumes reliably from RAM with unpatched libata-core.c.
Something has gone seriously right! Thanks...

0 new messages