The server is Debian-based. I've had this problem with all the kernels I've
tried (2.6.18, 2.6.24, 2.6.32). In /etc/multipath.conf, no_path_retry
is set to queue.
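For reference, the relevant fragment of multipath.conf looks roughly like this (a sketch; exact section layout varies between multipath-tools versions):

```
# /etc/multipath.conf (excerpt)
defaults {
        # Queue I/O indefinitely when all paths to a LUN are lost,
        # instead of failing it back to the upper layers.
        no_path_retry    queue
}
```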
Here are snippets from the reboot log:
<snip>
Stopping multipath daemon: multipathd.
...
Saving the system clock.
Unmounting iscsi-backed filesystems: umount: /?: device is busy
umount: /: device is busy
...
Disconnecting iSCSI targets:Logging out of session [sid: 1,....
Logging out of session [sid: 2,....
Logging out of session [sid: 3,....
sd 8:0:0:0: [sde] Synchronizing SCSI cache
sd 9:0:0:0: [sdd] Synchronizing SCSI cache
sd 10:0:0:0: [sdf] Synchronizing SCSI cache
connection2:0: detected conn error (1020)
connection1:0: detected conn error (1020)
connection3:0: detected conn error (1020)
Logout of [sid: 1...successful
Logout of [sid: 2...successful
Stopping iSCSI initiator server:.
....
Cleaning up ifupdown....
Deactivating swap...done.
Shutting down LVM Volume Groups
device-mapper: multipath: Failing path 8:64.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:48.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:80.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
</snip>
The server is then stuck there indefinitely.
What can I do to avoid this problem when rebooting?
Thanks.
-- James
I found that if I set no_path_retry to its default value of 0, then the
server reboots immediately. Is it possible to get this working with
no_path_retry set to queue?
-- James
Are there file systems mounted on the multipath device?
As far as I can tell, there are *no* file systems mounted on the
multipath device. This multipath device is used by a virtual machine.
The virtual machine is turned off at that point. The 'mount' command on
the physical host does not list the multipath device as being mounted.
Here is what I have found: I ran the whole shutdown sequence manually,
i.e., running each script in /etc/rc0.d in order (with
*no_path_retry* set to *queue*). Between each shutdown script, I ran
'*multipath -f mpath5*' to try to remove the multipath device manually.
Each time I got this result:
mpath5: map in use
All the way down until I got to the last 3 scripts:
S50lvm2 -> ../init.d/lvm2
S60umountroot -> ../init.d/umountroot
S90halt -> ../init.d/halt
When the lvm2 script runs to shut down LVM, I again get the
"multipath: Failing path" messages:
Shutting down LVM Volume Groups
device-mapper: multipath: Failing path 8:48.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:80.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:64.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
That hangs indefinitely.
Now, if I do the same thing with *no_path_retry* set to *fail*, the
sequence goes similarly, except that when I run */etc/init.d/lvm2 stop*
I get the same output as above, followed by a few lines like these:
/dev/dm-9: read failed after 0 of 2048 at 0: Input/output error
end_request: I/O error, dev dm-9, sector 20971776
Then the script finishes and the reboot can proceed.
So the key seems to be the *no_path_retry* setting.
From my tests, things go much more smoothly with *no_path_retry* set
to *queue* when the connection to the iSCSI server is interrupted.
So, is it possible to get those paths to "fail" with *no_path_retry* set
to *queue*, so the reboot can continue?
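One approach that might achieve this (a sketch, assuming the map is named mpath5 as above, and run as root) is to switch the map out of queueing mode with a device-mapper message before flushing it. Note that any I/O still queued will be failed, so this should only happen after everything on top of the map is stopped:

```shell
# Stop queueing on this map and fail any outstanding I/O.
# Queued writes are discarded, so unmount/stop users of the map first.
dmsetup message mpath5 0 "fail_if_no_path"

# The flush/remove should no longer block waiting for paths.
multipath -f mpath5
```

(The reverse message, "queue_if_no_path", would put the map back into queueing mode.)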
Thanks!
-- James
I do not know if you can easily do this, and I am not sure it is safe
in your case. It seems, though, from the first iSCSI messages:
Disconnecting iSCSI targets:Logging out of session [sid: 1,....
Logging out of session [sid: 2,....
Logging out of session [sid: 3,....
sd 8:0:0:0: [sde] Synchronizing SCSI cache
sd 9:0:0:0: [sdd] Synchronizing SCSI cache
sd 10:0:0:0: [sdf] Synchronizing SCSI cache
connection2:0: detected conn error (1020)
connection1:0: detected conn error (1020)
connection3:0: detected conn error (1020)
Logout of [sid: 1...successful
Logout of [sid: 2...successful
Stopping iSCSI initiator server:.
that the iSCSI layer has logged out of the sessions and cleaned up at
its layer, so at that point no IO is going to get executed.
The problem, and the reason I do not think it is safe to just run with
no_path_retry set to 0, is that there is still IO somewhere in the
multipath/block-layer queues. When you see:
> /dev/dm-9: read failed after 0 of 2048 at 0: Input/output error
> end_request: I/O error, dev dm-9, sector 20971776
it means some IO that was in that queue failed. If it was a write to
some disk, it means you lost data.
What you (or the Debian scripts) want to do is shut down multipath first,
so the higher-level queues have flushed their data out, and then shut
down iSCSI. Alternatively, do something to flush the multipath queues,
shut multipath down, and then shut down iSCSI.
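The ordering described above could be sketched as a shutdown-script fragment (illustrative only; the exact commands depend on your multipath-tools and open-iscsi versions):

```shell
#!/bin/sh
# Sketch of a shutdown order that avoids the hang: flush multipath
# before tearing down the iSCSI sessions underneath it.

# 1. Stop everything using the maps (VMs, filesystems, LVM on top).

# 2. Flush and remove all unused multipath maps, so no IO is left
#    queued above the iSCSI devices.
multipath -F

# 3. Only then log out of the iSCSI sessions.
iscsiadm -m node --logoutall=all
```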
We are also running an open-iscsi/dm-multipath/LVM/CLVM stack on
virtualization hosts. Because of this behavior, one key point is to
never let multipath lose all of its paths.
Try adding
features "1 queue_if_no_path"
to the relevant device section of your multipath.conf.
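A minimal sketch of what that might look like (the vendor/product strings below are placeholders; match them to what your array reports):

```
devices {
        device {
                vendor   "EXAMPLE"
                product  "EXAMPLE-LUN"
                features "1 queue_if_no_path"
        }
}
```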
Regards,
Stephan
Thanks for your help.
--
James Hammer
jha...@callone.com
312-681-5052