open-iscsi w/ suspend to ram / resume.

Iain

Dec 4, 2010, 3:31:16 PM

to open-iscsi

Hi All,

I'm currently experimenting with exporting an iscsi lun to a desktop
(running ubuntu). This generally works well, however, part of the
daily use of this desktop is that it is suspended to ram then resumed
some time later.

When going through a suspend / resume cycle I notice afterwards that
the iscsi devices are either:

* Not available without restarting the iscsi service
* Available but now have a different device name (eg. a move from sdb
to sdc)

I've been looking for documentation / bugs relating to this but I've
not found anything very helpful (Maybe I'm not looking in the right
places).

Thinking about it, I'm wondering if this particular use case is
expected to work or not. Would anyone be able to shed any light on
this?

Thanks,

Iain.

Mike Christie

Dec 6, 2010, 8:21:08 PM

to open-...@googlegroups.com, Iain

On 12/04/2010 02:31 PM, Iain wrote:
> Hi All,
>
> I'm currently experimenting with exporting an iscsi lun to a desktop
> (running ubuntu). This generally works well, however, part of the
> daily use of this desktop is that it is suspended to ram then resumed
> some time later.
>
> When going through a suspend / resume cycle I notice afterwards that
> the iscsi devices are either:
>
> * Not available without restarting the iscsi service
> * Available but now have a different device name (eg. a move from sdb
> to sdc)

The iscsi layer does not remove devices, so it might be a distro script
that is logging in and out of the target or restarting the service and
causing devices to get renamed.

The renaming should not be a problem, because you should use udev names with
iscsi. However, if the devices are actually removed and recreated (this
happens with a logout+login) and there are filesystems or apps
using the iscsi device, then you will hit problems because the underlying
kernel structs are changed. Using dm-multipath over iscsi should
work around those problems though.
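
To illustrate the "use udev names" advice, here is a minimal sketch that lists the persistent symlinks udev normally creates for iSCSI LUNs and the kernel device each one currently points at. It assumes the usual /dev/disk/by-path layout, where iSCSI entries contain "-iscsi-"; the function name is made up for this example.

```python
import os


def iscsi_persistent_names(by_path_dir="/dev/disk/by-path"):
    """Map persistent udev symlinks for iSCSI LUNs to their current device.

    Assumption: iSCSI entries in by_path_dir contain the marker "-iscsi-",
    e.g. ip-<portal>-iscsi-<target iqn>-lun-<n>.
    """
    mapping = {}
    if not os.path.isdir(by_path_dir):
        return mapping
    for name in sorted(os.listdir(by_path_dir)):
        if "-iscsi-" not in name:
            continue
        link = os.path.join(by_path_dir, name)
        # realpath follows the symlink. The target may be /dev/sdb today and
        # /dev/sdc after a logout+login, but the symlink name stays the same.
        mapping[link] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for link, dev in iscsi_persistent_names().items():
        print(f"{link} -> {dev}")
```

Mounting or opening the symlink rather than /dev/sdX means a rename from sdb to sdc goes unnoticed. It does not help if the device was removed while still in use, which is the case described above.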

>
> I've been looking for documentation / bugs relating to this but I've
> not found anything very helpful (Maybe I'm not looking in the right
> places).
>
> Thinking about it, I'm wondering if this particular use case is
> expected to work or not. Would anyone be able to shed any light on
> this?

I do not test that case. I think it works though, because I have got bug
reports on RHEL about it before. However, I just tried it on fedora 14
and the suspend did not work for me. It seems if iscsi is running,
suspend hangs waiting for something (did not dig into it).

For your setup, when resume is done, is iscsid running, and what are the
connection, session and iscsid states that get printed when you run
"iscsiadm -m session -P 3"?

What do you mean by iscsi devices not being available? Is the /dev/sdX node
gone? If you access the device, does it return IO errors?
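
As a rough complement to the iscsiadm output asked for above, the following sketch reads the session state the kernel exposes in sysfs and checks whether a given device node is still a block device. It assumes sessions appear under /sys/class/iscsi_session/session<N>/ with a "state" attribute; the helper names are invented for this example, and the sketch does not replace "iscsiadm -m session -P 3", which also reports the iscsid-internal state.

```python
import glob
import os
import stat


def session_states(sysfs_root="/sys/class/iscsi_session"):
    """Return {session name: state string} for every iSCSI session found.

    Assumption: each session directory has a readable "state" attribute.
    """
    states = {}
    pattern = os.path.join(sysfs_root, "session*", "state")
    for state_file in sorted(glob.glob(pattern)):
        session = os.path.basename(os.path.dirname(state_file))
        with open(state_file) as fh:
            states[session] = fh.read().strip()
    return states


def is_block_device(path):
    """True if path exists and is a block device node, False otherwise."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    for name, state in session_states().items():
        print(name, state)
    # /dev/sdb is only a placeholder; substitute the node you expect.
    print("/dev/sdb present:", is_block_device("/dev/sdb"))
```

Running this right after resume would show whether the node is really gone or whether it is still there but the session has not recovered.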

Ulrich Windl

Dec 7, 2010, 3:25:32 AM

to open-iscsi

>>> Iain <iain.b...@gmail.com> wrote on 04.12.2010 at 21:31 in message
<0e9b8d0d-6d2c-489d...@h17g2000pre.googlegroups.com>:

> Hi All,
>
> I'm currently experimenting with exporting an iscsi lun to a desktop
> (running ubuntu). This generally works well, however, part of the
> daily use of this desktop is that it is suspended to ram then resumed
> some time later.
>
> When going through a suspend / resume cycle I notice afterwards that
> the iscsi devices are either:
>
> * Not available without restarting the iscsi service
> * Available but now have a different device name (eg. a move from sdb
> to sdc)

Hi!

I'm no expert, but I'd expect the TCP connections (if keep-alive is used) to time out. TCP connections without keep-alive will survive when no data exchange happens during suspend. If the non-sleeping party times out a connection, the awakening party will see a connection reset on the first packet exchange. I'd expect iSCSI to handle these (i.e. re-establish a connection). I wonder why new devices are found, though.
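
For readers unfamiliar with the keep-alive mechanics mentioned here, a small sketch of how keep-alive is enabled on a plain TCP socket and roughly how long the awake side would take to declare a silent peer dead. This only illustrates generic TCP behaviour on Linux; it makes no claim about how open-iscsi or any target configures its sockets, and the function name and default values are arbitrary.

```python
import socket


def enable_keepalive(sock, idle=30, interval=5, count=3):
    """Enable TCP keep-alive and return the approximate detection time.

    After `idle` seconds without traffic the kernel starts probing, sends a
    probe every `interval` seconds, and gives up after `count` unanswered
    probes, so a silent peer is noticed after roughly idle + interval * count.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tunables are Linux-specific, so guard for other platforms.
    for opt, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        if hasattr(socket, opt):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt), value)
    return idle + interval * count


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("peer declared dead after about", enable_keepalive(s), "seconds")
    s.close()
```

A suspended machine cannot answer probes, so with settings like these the non-sleeping side would drop the connection well before a suspend of several hours ends.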

>
> I've been looking for documentation / bugs relating to this but I've
> not found anything very helpful (Maybe I'm not looking in the right
> places).
>
> Thinking about it, I'm wondering if this particular use case is
> expected to work or not. Would anyone be able to shed any light on
> this?

A good question is whether any I/O operations on iSCSI devices are attempted before the whole network stack is properly restored/re-established. I don't know the answer, sorry.

Regards,
Ulrich

Mike Christie

Dec 7, 2010, 5:12:22 PM

to open-...@googlegroups.com, Ulrich Windl

On 12/07/2010 02:25 AM, Ulrich Windl wrote:
>
> A good question is whether any I/O operations on iSCSI devices are attempted before the whole network stack is properly restored/re-established. I don't know the answer, sorry.
>

It could, but you should be ok. If the notification that the connection
is dead comes after IO is sent then we would try to send IO to the
network layer, but the iscsi layer would eventually figure things out
and end up resending the IO after it has reconnected to the target.

Mike Christie

Oct 31, 2011, 2:07:06 PM

to Vincent Pelletier, open-...@googlegroups.com

On 10/30/2011 08:01 AM, Vincent Pelletier wrote:

> On Dec 7 2010, 11:12 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
>> It could, but you should be ok. If the notification that the connection
>> is dead comes after IO is sent then we would try to send IO to the
>> network layer, but the iscsi layer would eventually figure things out
>> and end up resending the IO after it has reconnected to the target.
>

> Hi list (my first post here).
>
> I'm an iscsi noob (just bought a cheap NAS) and just succeeded in moving
> an existing Debian install to a diskless, PXE & root-on-iscsi setup.
> It works fine, except for the subject of this thread: when resuming from
> ram suspend (and probably after waiting long enough in suspend, I haven't
> done many tries so far), resume is stuck for 120s
> (node.session.timeo.replacement_timeout is set to 120s by default, maybe
> this is the timeout I encounter), then a reconnection occurs and resume
> finishes successfully. Though not without emitting a handful of kernel
> BUGs ("soft lockup - CPU#0 stuck for 22s").

Are these coming from accesses to the iscsi disk that is root? If so,
when the replacement timeout fires, do you get IO errors for the root
partition or are you using something like multipath over iscsi (I see
you have only one path but are you using it to just temporarily queue IO)?

Could you send the /var/log/messages?

>
> For the moment, I don't have an idea on how to make resume happen
> gracefully:
> - Shorten this timeout? But for what risks? Network setup is
> dead-simple: Gb ethernet with one switch and a few meters of cable.
> - I found a dmsetup suspend command, but I'm not sure I want to run this
> on a root device...
> - Get some script to be cached before suspend and executed early upon
> resume?
>

Vincent Pelletier

Oct 30, 2011, 9:01:43 AM

to Mike Christie, open-...@googlegroups.com

On Dec 7 2010, 11:12 pm, Mike Christie <micha...@cs.wisc.edu> wrote:

> It could, but you should be ok. If the notification that the connection
> is dead comes after IO is sent then we would try to send IO to the
> network layer, but the iscsi layer would eventually figure things out
> and end up resending the IO after it has reconnected to the target.

Hi list (my first post here).

I'm an iscsi noob (just bought a cheap NAS) and just succeeded in moving
an existing Debian install to a diskless, PXE & root-on-iscsi setup.
It works fine, except for the subject of this thread: when resuming from
ram suspend (and probably after waiting long enough in suspend, I haven't
done many tries so far), resume is stuck for 120s
(node.session.timeo.replacement_timeout is set to 120s by default, maybe
this is the timeout I encounter), then a reconnection occurs and resume
finishes successfully. Though not without emitting a handful of kernel
BUGs ("soft lockup - CPU#0 stuck for 22s").

For the moment, I don't have an idea on how to make resume happen
gracefully:
- Shorten this timeout? But for what risks? Network setup is
dead-simple: Gb ethernet with one switch and a few meters of cable.
- I found a dmsetup suspend command, but I'm not sure I want to run this
on a root device...
- Get some script to be cached before suspend and executed early upon
resume?

Any pointer welcome.

Regards,
--
Vincent Pelletier
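
On the "shorten this timeout?" option raised in this message, a hedged sketch: the first helper builds the iscsiadm command that updates node.session.timeo.replacement_timeout in a node record, and the second reads the recovery timeout the kernel is currently using for each session. The target name and portal below are placeholders, the sysfs attribute name "recovery_tmo" is assumed to be available on the running kernel, and a changed node record would normally only apply from the next login.

```python
import glob
import os


def replacement_timeout_cmd(target, portal, seconds):
    """Build (but do not run) the iscsiadm call that updates the timeout."""
    return [
        "iscsiadm", "-m", "node",
        "-T", target,
        "-p", portal,
        "-o", "update",
        "-n", "node.session.timeo.replacement_timeout",
        "-v", str(seconds),
    ]


def active_recovery_timeouts(sysfs_root="/sys/class/iscsi_session"):
    """Return {session name: recovery_tmo string} as reported by the kernel.

    Assumption: each session directory exposes a "recovery_tmo" attribute.
    """
    result = {}
    pattern = os.path.join(sysfs_root, "session*", "recovery_tmo")
    for path in sorted(glob.glob(pattern)):
        session = os.path.basename(os.path.dirname(path))
        with open(path) as fh:
            result[session] = fh.read().strip()
    return result


if __name__ == "__main__":
    # Placeholder target and portal, not taken from this thread.
    cmd = replacement_timeout_cmd("iqn.2011-10.example:root", "192.0.2.10:3260", 30)
    print(" ".join(cmd))
    print(active_recovery_timeouts())
```

As for the risk: with root on iscsi and no multipath layer queueing IO, a shorter timeout means IO errors reach the root filesystem sooner if the target really is unreachable, so the value is a trade-off rather than a free win.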

Vincent Pelletier

Oct 31, 2011, 4:26:47 PM

to Mike Christie, open-...@googlegroups.com

Hi.

A short update first: I don't have this problem on any later suspend attempt
(~4 so far, from a few dozen minutes of suspend to several hours).

And a disclaimer: my kernel is tainted. Nvidia proprietary driver. Yuck.
Feel free to blame the problems on it, I need a motivation to switch this
box to nouveau ;) .

On Mon, Oct 31, 2011 at 7:07 PM, Mike Christie <mich...@cs.wisc.edu> wrote:
> Are these coming from accesses to the iscsi disk that is root? If so
> when the replacement timeout fires, do you get IO errors for the root
> partition or are you using something like multipath over iscsi (I see
> you have only one path but are you using it to just temporarily queue IO)?

I don't use multipath (...at least, if lsmod | grep "multipath" -> nothing is
enough to tell I'm not). I've not configured a thing to use it.
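
The lsmod check above can be sketched in Python as follows. It assumes the device-mapper multipath module is named dm_multipath and reads /proc/modules, which is what lsmod formats; it cannot see code built into the kernel, and it says nothing about whether multipathd is configured.

```python
def loaded_modules(proc_modules="/proc/modules"):
    """Return the set of loaded module names (first field of each line)."""
    try:
        with open(proc_modules) as fh:
            return {line.split()[0] for line in fh if line.strip()}
    except OSError:
        return set()


def multipath_loaded(modules):
    """True if the dm_multipath module (assumed name) is in the set."""
    return "dm_multipath" in modules


if __name__ == "__main__":
    print("dm_multipath loaded:", multipath_loaded(loaded_modules()))
```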

> Could you send the /var/log/messages?

Attached (gzipped, as it's 250k+ extracted).
Limited from boot to shutdown (...for reboot).
Weirdly enough: last lines before suspend have a timestamp from wakeup time.
Also, the error output from wakeup is truncated, as seen on line 670:
Oct 30 06:23:22 localhost kernel: [ 1138.769133] Restarting tasks ...
95606] [<ffffffff8100a9ef>] ? do_softirq+0x3f/0x84

Note: I accidentally hit ctrl-scroll lock while trying to make the console
flood stop to get time to read - and discovered it somehow dumped scheduler
status. Sorry for the data it pushed out of the buffer.

>> For the moment, I don't have an idea on how to make resume happen
>> gracefully:

A note on my setup for that boot: actually, I wasn't completely netbooting at
that point: grub2 & /boot were on local disk, but initrd was initiating iscsi
connection. It was an intermediate setting, and I am now completely booting
off iscsi (+ TFTP):
BIOS + embedded PXE (because I don't want to reflash) ->
iPXE ("sanboot iscsi:..." maps iscsi to bios disk 0x80) -> grub2 -> linux ->
initrd reconnects to iscsi to mount /

Maybe this could explain the problem I had (maybe the kernel/suspend tools
weren't treating network gently enough for a clean resume).

Regards,
--
Vincent Pelletier

messages.gz

Mike Christie

Oct 31, 2011, 10:22:09 PM

to Vincent Pelletier, open-...@googlegroups.com

On 10/31/2011 03:26 PM, Vincent Pelletier wrote:
> Hi.
>
> A short update first: I don't have this problem on any later suspend attempt
> (~4 so far, from a few dozen minutes of suspend to several hours).
>
> And a disclaimer: my kernel is tainted. Nvidia proprietary driver. Yuck.
> Feel free to blame the problems on it, I need a motivation to switch this
> box to nouveau ;) .
>
> On Mon, Oct 31, 2011 at 7:07 PM, Mike Christie <mich...@cs.wisc.edu> wrote:
>> Are these coming from accesses to the iscsi disk that is root? If so
>> when the replacement timeout fires, do you get IO errors for the root
>> partition or are you using something like multipath over iscsi (I see
>> you have only one path but are you using it to just temporarily queue IO)?
>
> I don't use multipath (...at least, if lsmod | grep "multipath" -> nothing is
> enough to tell I'm not). I've not configured a thing to use it.
>
>> Could you send the /var/log/messages?
>
> Attached (gzipped, as it's 250k+ extracted).
> Limited from boot to shutdown (...for reboot).
> Weirdly enough: last lines before suspend have a timestamp from wakeup time.
> Also, the error output from wakeup is truncated, as seen on line 670:
> Oct 30 06:23:22 localhost kernel: [ 1138.769133] Restarting tasks ...
> 95606] [<ffffffff8100a9ef>] ? do_softirq+0x3f/0x84

Is the log you attached of the case where it took hours or one where it
now sort of works? I did not see the 120 sec issue or any soft lockups.

When you hit the problem is the network accessible and is iscsid up and
running? Can you ping the initiator box from another box on the network?

When you said "then a reconnection occurs", what did you mean? Did you
see an iscsi message indicating that we were reconnected to the target?

>
> Note: I accidentally hit ctrl-scroll lock while trying to make the console
> flood stop to get time to read - and discovered it somehow dumped scheduler
> status. Sorry for the data it pushed out of the buffer.
>
>>> For the moment, I don't have an idea on how to make resume happen
>>> gracefully:
>
> A note on my setup for that boot: actually, I wasn't completely netbooting at
> that point: grub2 & /boot were on local disk, but initrd was initiating iscsi
> connection. It was an intermediate setting, and I am now completely booting
> off iscsi (+ TFTP):
> BIOS + embedded PXE (because I don't want to reflash) ->
> iPXE ("sanboot iscsi:..." maps iscsi to bios disk 0x80) -> grub2 -> linux ->
> initrd reconnects to iscsi to mount /
>
> Maybe this could explain the problem I had (maybe the kernel/suspend tools
> weren't treating network gently enough for a clean resume).

You are doing the suspend after we have pivoted from the initramfs to
the real /, right? If so that should not be an issue.

Vincent Pelletier

Nov 1, 2011, 3:20:22 AM

to Mike Christie, open-...@googlegroups.com

On Tuesday, 01 November 2011 at 03:22:09, Mike Christie wrote:
> Is the log you attached of the case where it took hours or one where it
> now sort of works?

The log attached is the case where it took 120s to wake up from suspend to ram.

> I did not see the 120 sec issue or any soft lockups.

Weird. This comes from /var/log/messages, and indeed doesn't contain the "BUG"
log lines I mentioned. I dumped dmesg right after resume succeeded (after the
120s stall) to another log file, attached here.
It started with the same scheduler dump text, which I shortened (just kept
first and last line).

Comparing dmesg output with the messages log, I see the BUG-related dumps are
present in messages; only the "BUG" line itself is missing there and appears
only in dmesg.

I also realise the "120s" log line is not present. I found one in kern.log
which *looks* like the one I remember:
Oct 29 23:12:32 localhost kernel: [ 2403.296042] session1: session recovery timed out after 120 secs
But the timestamp is from before I even booted the machine for the session
where resume took long to succeed. I'm puzzled.
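
When hunting for lines like this one across kern.log and messages, a small parser can help line up the syslog timestamp with the bracketed kernel time, which makes it easier to tell which boot a message belongs to. The sketch below is only an illustration: the regular expression is written against the single log line quoted in this message (shown on one line, as it would be in the log file) and may need adjusting for other syslog formats.

```python
import re

# Matches e.g. "Oct 29 23:12:32 host kernel: [ 2403.296042] session1: session
# recovery timed out after 120 secs" when it is on one line.
RECOVERY_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"(?P<session>session\d+): session recovery timed out "
    r"after (?P<secs>\d+) secs"
)


def recovery_timeouts(lines):
    """Return (syslog stamp, kernel time, session, seconds) for each hit."""
    hits = []
    for line in lines:
        m = RECOVERY_RE.match(line)
        if m:
            hits.append(
                (m["stamp"], float(m["uptime"]), m["session"], int(m["secs"]))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "Oct 29 23:12:32 localhost kernel: [ 2403.296042] session1: "
        "session recovery timed out after 120 secs",
    ]
    for hit in recovery_timeouts(sample):
        print(hit)
```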

> When you hit the problem is the network accessible and is iscsid up and
> running?

I could not do a thing when the problem occurred: no shell is available until
the end of wakeup.

> Can you ping the initiator box from another box on the network?

As I said, I cannot reproduce the problem: wakeup after suspend to ram now
works as I'm used to it with a local hard disk.

> When you said "then a reconnection occurs", what did you mean? Did you
> see an iscsi message indicating that we were reconnected to the target?

I don't remember any message literally mentioning a reconnection.

Regards,
--
Vincent Pelletier

iscsi_resume_timeouts.log
