The server is Debian-based. I've had this problem with all the kernels I've
tried (2.6.18, 2.6.24, 2.6.32). In /etc/multipath.conf, no_path_retry
is set to queue.
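For reference, the relevant fragment of multipath.conf looks roughly like this (a sketch; exact section layout varies between multipath-tools versions):

```
# /etc/multipath.conf (excerpt)
defaults {
        # Queue I/O indefinitely when all paths to a LUN are lost,
        # instead of failing it back to the upper layers.
        no_path_retry    queue
}
```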
Here are snippets from the reboot log:
<snip>
Stopping multipath daemon: multipathd.
...
Saving the system clock.
Unmounting iscsi-backed filesystems: umount: /?: device is busy
umount: /: device is busy
...
Disconnecting iSCSI targets:Logging out of session [sid: 1,....
Logging out of session [sid: 2,....
Logging out of session [sid: 3,....
sd 8:0:0:0: [sde] Synchronizing SCSI cache
sd 9:0:0:0: [sdd] Synchronizing SCSI cache
sd 10:0:0:0: [sdf] Synchronizing SCSI cache
connection2:0: detected conn error (1020)
connection1:0: detected conn error (1020)
connection3:0: detected conn error (1020)
Logout of [sid: 1...successful
Logout of [sid: 2...successful
Stopping iSCSI initiator server:.
....
Cleaning up ifupdown....
Deactivating swap...done.
Shutting down LVM Volume Groups
device-mapper: multipath: Failing path 8:64.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:48.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:80.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
</snip>
The server is then stuck there indefinitely.
What can I do to avoid this problem when rebooting?
Thanks.
-- James
I found that if I set no_path_retry to its default value of 0, then the
server reboots immediately. Is it possible to get this working with
no_path_retry set to queue?
-- James
Are there file systems mounted on the multipath device?
As far as I can tell, there are *no* file systems mounted on the
multipath device. This multipath device is used by a virtual machine.
The virtual machine is turned off at that point. The 'mount' command on
the physical host does not list the multipath device as being mounted.
Here is what I have found: I ran the whole shutdown sequence manually,
i.e., running each script in /etc/rc0.d in order (with
*no_path_retry* set to *queue*). Between each shutdown script, I ran
'*multipath -f mpath5*' to try to remove the multipath device manually.
Each time I got this result:
mpath5: map in use
All the way down until I got to the last 3 scripts:
S50lvm2 -> ../init.d/lvm2
S60umountroot -> ../init.d/umountroot
S90halt -> ../init.d/halt
When the lvm2 script runs to shut down LVM, I again get the
"multipath: Failing path" messages:
Shutting down LVM Volume Groups
device-mapper: multipath: Failing path 8:48.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:80.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:64.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
That hangs indefinitely.
Now, if I do the same thing with *no_path_retry* set to *fail*, the
sequence goes similarly, except that when I run */etc/init.d/lvm2 stop*
I get the same output as above, followed by a few lines like these:
/dev/dm-9: read failed after 0 of 2048 at 0: Input/output error
end_request: I/O error, dev dm-9, sector 20971776
Then the script finishes and the reboot can proceed.
So the key seems to be the *no_path_retry* setting.
From my tests, things go much more smoothly with *no_path_retry* set
to *queue* when the connection to the iSCSI server is interrupted.
So, is it possible to get those paths to "fail" with *no_path_retry* set
to *queue*, so the reboot can continue?
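One approach that might achieve this (a sketch, assuming the map is named mpath5 as above, and run as root) is to switch the map out of queueing mode with a device-mapper message before flushing it. Note that any I/O still queued will be failed, so this should only happen after everything on top of the map is stopped:

```shell
# Stop queueing on this map and fail any outstanding I/O.
# Queued writes are discarded, so unmount/stop users of the map first.
dmsetup message mpath5 0 "fail_if_no_path"

# The flush/remove should no longer block waiting for paths.
multipath -f mpath5
```

(The reverse message, "queue_if_no_path", would put the map back into queueing mode.)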
Thanks!
-- James
I do not know if you can easily do this, and I am not sure it is safe
in your case. It seems, though, from the first iSCSI messages:
Disconnecting iSCSI targets:Logging out of session [sid: 1,....
Logging out of session [sid: 2,....
Logging out of session [sid: 3,....
sd 8:0:0:0: [sde] Synchronizing SCSI cache
sd 9:0:0:0: [sdd] Synchronizing SCSI cache
sd 10:0:0:0: [sdf] Synchronizing SCSI cache
connection2:0: detected conn error (1020)
connection1:0: detected conn error (1020)
connection3:0: detected conn error (1020)
Logout of [sid: 1...successful
Logout of [sid: 2...successful
Stopping iSCSI initiator server:.
that the iSCSI layer has logged out of the sessions and cleaned up at
its layer, so at that point no IO is going to get executed.
The problem, and the reason I do not think it is safe to just run with
no_path_retry set to 0, is that there is still IO somewhere in the
multipath/block-layer queues. When you see:
> /dev/dm-9: read failed after 0 of 2048 at 0: Input/output error
> end_request: I/O error, dev dm-9, sector 20971776
it means some IO that was in that queue failed. If it was a write to
some disk, it means you lost data.
What you (or the Debian scripts) want to do is shut down multipath first,
so the higher-level queues have flushed their data out, and then shut
down iSCSI. Alternatively, do something to flush the multipath queues,
shut multipath down, and then shut down iSCSI.
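The ordering described above could be sketched as a shutdown-script fragment (illustrative only; the exact commands depend on your multipath-tools and open-iscsi versions):

```shell
#!/bin/sh
# Sketch of a shutdown order that avoids the hang: flush multipath
# before tearing down the iSCSI sessions underneath it.

# 1. Stop everything using the maps (VMs, filesystems, LVM on top).

# 2. Flush and remove all unused multipath maps, so no IO is left
#    queued above the iSCSI devices.
multipath -F

# 3. Only then log out of the iSCSI sessions.
iscsiadm -m node --logoutall=all
```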
We are also running an open-iscsi/dm-multipath/LVM/CLVM stack on
virtualization hosts. Because of this behavior, one key point is to
never let multipath lose all of its paths.
Try adding
features "1 queue_if_no_path"
to the relevant device section of your multipath.conf.
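A minimal sketch of what that might look like (the vendor/product strings below are placeholders; match them to what your array reports):

```
devices {
        device {
                vendor   "EXAMPLE"
                product  "EXAMPLE-LUN"
                features "1 queue_if_no_path"
        }
}
```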
Regards,
Stephan
Thanks for your help.
--
James Hammer
jha...@callone.com
312-681-5052