iscsiadm hangs when /var/lock/iscsi/lock.write exists


Tomasz Chmielewski

Feb 26, 2007, 6:51:12 AM
to open-...@googlegroups.com
While writing/testing some scripts that automatically connect to
different iSCSI targets, connect nodes etc., I noticed that iscsiadm
sometimes hangs after starting commands such as:

iscsiadm -m discovery -t sendtargets -p TARGET_IP


Running it via strace revealed:

nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EEXIST (File exists)


Simply removing "/var/lock/iscsi/lock.write" solved the issue.
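The strace output points to a lock built on link(2): the lock file /var/lock/iscsi/lock is hard-linked to lock.write, and since link() fails atomically with EEXIST when the target already exists, only one process can hold the lock while everyone else sleeps 10 ms (the nanosleep({0, 10000000}) above) and tries again. Below is a minimal sketch of that pattern, written from the trace rather than from the open-iscsi source; the function names, the dir parameter and the max_tries bound (added only so the demo terminates) are hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch of a link(2)-based lock, modeled on the strace output.
 * link() is atomic: it fails if lock.write already exists, so only one
 * process at a time can create it and thereby "own" the lock. */
int link_lock_acquire(const char *dir, int max_tries)
{
    char lock[PATH_MAX], lock_write[PATH_MAX];
    struct timespec delay = { 0, 10000000 }; /* 10 ms, as in the trace */

    snprintf(lock, sizeof(lock), "%s/lock", dir);
    snprintf(lock_write, sizeof(lock_write), "%s/lock.write", dir);

    /* Make sure the base file exists; it is the source of the link. */
    int fd = open(lock, O_RDWR | O_CREAT, 0666);
    if (fd >= 0)
        close(fd);

    for (int i = 0; i < max_tries; i++) {
        if (link(lock, lock_write) == 0)
            return 0;            /* we own the lock now */
        nanosleep(&delay, NULL); /* retry on ANY error, like the trace */
    }
    return -1; /* gave up; the traced iscsiadm never gives up */
}

/* Releasing the lock is simply removing the link. */
void link_lock_release(const char *dir)
{
    char lock_write[PATH_MAX];
    snprintf(lock_write, sizeof(lock_write), "%s/lock.write", dir);
    unlink(lock_write);
}
```

If lock.write was left behind by a process that died, link_lock_acquire() can never succeed; without the max_tries bound it would loop forever, which matches the hang described above. Removing lock.write by hand has the same effect as calling link_lock_release().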


Isn't merely checking whether "/var/lock/iscsi/lock.write" exists
too weak a test? Possibly, one could hang the system if such a file
is accidentally left behind.


This file can be left behind accidentally in at least two cases:

1.

# iscsiadm -m discovery -t sendtargets -p TARGET_IP
<Ctrl+C after the command has started but before it has finished>


2.

Reboot the machine while it is still starting up and "iscsiadm -m ..."
is being executed.
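Both cases boil down to the same thing: the process that created lock.write dies before it gets to unlink it. Ctrl+C sends SIGINT, which terminates a program without any cleanup unless it installs a handler, and a reboot may end a process with SIGKILL or with no signal at all. The minimal sketch below reproduces this with the worst case, a child killed by SIGKILL, which no handler can intercept; lock_left_behind() and the directory layout are hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child takes a link(2)-style lock and is killed
 * before it can release it. Returns 1 if lock.write is still present
 * afterwards (the lock is now stale), 0 if not, -1 on error. */
int lock_left_behind(const char *dir)
{
    char lock[PATH_MAX], lock_write[PATH_MAX];
    snprintf(lock, sizeof(lock), "%s/lock", dir);
    snprintf(lock_write, sizeof(lock_write), "%s/lock.write", dir);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(lock, O_RDWR | O_CREAT, 0666);
        if (fd >= 0)
            close(fd);
        if (link(lock, lock_write) != 0)
            _exit(1);
        raise(SIGKILL); /* die before any cleanup code can run */
        _exit(0);       /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
        return -1;
    /* Nobody unlinked the file: it has outlived its creator. */
    return access(lock_write, F_OK) == 0;
}
```

Called on an empty temporary directory it returns 1: the file survives its creator, and every later link() onto it fails with EEXIST.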


--
Tomasz Chmielewski
http://wpkg.org

Mike Christie

Feb 26, 2007, 1:30:00 PM
to open-...@googlegroups.com
Tomasz Chmielewski wrote:
> Isn't merely checking whether "/var/lock/iscsi/lock.write" exists
> too weak a test? Possibly, one could hang the system if such a file
> is accidentally left behind.
>

Yeah, you are right. Thanks for the details. Would it be acceptable to
modify iscsid to check for stale files when it starts up and remove them
if found?
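Mike's proposal amounts to a one-off cleanup step that iscsid would run at start-up, before any iscsiadm can be holding the lock. Here is a minimal sketch, assuming that no legitimate lock holder can exist at that point; remove_stale_lock() is a hypothetical name. It only helps when iscsid is restarted, which is the limitation Mike raises in his next message.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical start-up hook: remove a leftover lock.write.
 * Returns 1 if a stale file was removed, 0 if there was none,
 * -1 on any other error (errno is left as set by unlink). */
int remove_stale_lock(const char *dir)
{
    char lock_write[PATH_MAX];
    snprintf(lock_write, sizeof(lock_write), "%s/lock.write", dir);

    if (unlink(lock_write) == 0) {
        fprintf(stderr, "removed stale lock %s\n", lock_write);
        return 1;
    }
    /* ENOENT is the normal, clean case; report anything else. */
    return errno == ENOENT ? 0 : -1;
}
```

Treating ENOENT as success keeps the common case quiet, while an error such as EROFS is passed back to the caller instead of being silently ignored.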

Tomasz Chmielewski

Feb 26, 2007, 1:46:37 PM
to open-...@googlegroups.com
Mike Christie wrote:
> Tomasz Chmielewski wrote:

(...)

>> Isn't merely checking whether "/var/lock/iscsi/lock.write" exists
>> too weak a test? Possibly, one could hang the system if such a file
>> is accidentally left behind.
>>
>
> Yeah, you are right. Thanks for the details. Would it be acceptable to
> modify iscsid to check for stale files when it starts up and remove them
> if found?

I guess it is better to do it in iscsid than with an "rm -f .../stale/files" in
users' startup scripts...

Mike Christie

Feb 26, 2007, 3:39:58 PM
to open-...@googlegroups.com

Actually, I am probably wrong here. We want to handle the case where
someone kills iscsiadm and then wants to run it again without restarting
iscsid or the iscsi service. Is there a standard way for an app to detect
that a previous run was killed by something like SIGKILL and did not get
a chance to clean up after itself?
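One widely used answer is not to detect the kill at all, but to let the kernel own the lock. An advisory flock(2) lock belongs to the open file description and is dropped automatically when the last descriptor referring to it is closed, and a process killed by SIGKILL has all its descriptors closed by the kernel (fcntl(2) F_SETLK record locks are likewise released when the owning process exits). This is a different technique from the link(2) scheme in the traces, and the sketch below is only an illustration of it, not open-iscsi code; the function names are hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flock(2)-based lock. Returns an fd that holds the lock,
 * or -1 if someone else holds it (LOCK_NB: fail instead of blocking). */
int flock_lock_try(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Releasing is just closing the fd; the kernel does the same if we die. */
void flock_lock_release(int fd)
{
    close(fd);
}

/* Demo: a child takes the lock and is SIGKILLed while holding it.
 * Returns 1 if the parent can take the lock afterwards, -1 on error. */
int flock_survives_sigkill(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (flock_lock_try(path) < 0)
            _exit(1);
        raise(SIGKILL); /* no cleanup code runs */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSIGNALED(status))
        return -1;
    int fd = flock_lock_try(path);
    if (fd < 0)
        return 0;
    flock_lock_release(fd);
    return 1;
}
```

The lock file itself may stay on disk indefinitely; only the kernel-held lock matters, so a leftover file no longer blocks anyone and nothing needs cleaning up after a reboot either.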

Tomasz Chmielewski

May 9, 2007, 8:09:41 AM
to open-...@googlegroups.com, Mike Christie
Mike Christie wrote:

There's yet another case where iscsiadm will hang because of this issue:
the filesystem is read-only.

Lately I came across this while trying to measure how long I can
disconnect a diskless initiator from the target (just by pulling the
network cable).

After some time passes, the kernel detects errors, and when we reconnect,
we find that the filesystem has been remounted read-only.

All programs will work (well, unless they really really need to write
something), but iscsiadm will just hang:

# strace iscsiadm -m node -T iqn.2007-04.net.net:some.server -p 192.168.1.2
(...)
access("/var/lock/iscsi", F_OK) = 0
open("/var/lock/iscsi/lock", O_RDWR|O_CREAT, 0666) = -1 EROFS (Read-only file system)
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS (Read-only file system)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS (Read-only file system)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS (Read-only file system)
nanosleep({0, 10000000}, NULL) = 0
(...)

And it repeats there endlessly.

A pity for someone who tried to check/repair something with iscsiadm.
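In this trace the open() of the lock file already fails with EROFS, yet the loop keeps retrying link(). One way to fail fast is to check once, before any retry loop, whether the lock directory sits on a read-only mount. A minimal sketch using statvfs(3) and its ST_RDONLY flag; lock_dir_is_readonly() is a hypothetical name:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/statvfs.h>

/* Hypothetical pre-check: returns 1 if dir is on a read-only mount,
 * 0 if it is writable, -1 if statvfs() itself failed. */
int lock_dir_is_readonly(const char *dir)
{
    struct statvfs sv;
    if (statvfs(dir, &sv) != 0)
        return -1;
    /* ST_RDONLY mirrors the mount flag, which is set when the kernel
     * remounts a filesystem read-only after errors. */
    return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}
```

With such a check iscsiadm could print an error and exit instead of sleeping. It only looks at the mount flags, so it says nothing about permissions; a non-root user still needs separate handling.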

Tomasz Chmielewski

May 9, 2007, 8:13:57 AM
to open-...@googlegroups.com, Mike Christie
Tomasz Chmielewski wrote:

(...)

> All programs will work (well, unless they really really need to write
> something), but iscsiadm will just hang:
>
> # strace iscsiadm -m node -T iqn.2007-04.net.net:some.server -p 192.168.1.2
> (...)
> access("/var/lock/iscsi", F_OK) = 0
> open("/var/lock/iscsi/lock", O_RDWR|O_CREAT, 0666) = -1 EROFS (Read-only
> file system)
> link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS
> (Read-only file system)
> nanosleep({0, 10000000}, NULL) = 0
> link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS
> (Read-only file system)
> nanosleep({0, 10000000}, NULL) = 0
> link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS
> (Read-only file system)
> nanosleep({0, 10000000}, NULL) = 0
> (...)
>
> And it repeats there endlessly.
>
> A pity for someone who tried to check/repair something with iscsiadm.

A similar thing happens if you try to start that command as a non-root user:

link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EACCES (Permission denied)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EACCES (Permission denied)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EACCES (Permission denied)
nanosleep({0, 10000000}, NULL) = 0


At the very least, it should tell the user that they don't have sufficient
permissions instead of looping like that.
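What is being asked for here is a loop that only retries on the one error that really means "someone else holds the lock" (EEXIST) and gives up at once on anything else, such as EROFS or EACCES, ideally with a time limit even for EEXIST so that a stale lock.write is reported instead of waited on forever. A minimal sketch of that rule on top of the link(2) scheme from the traces; the function name, the timeout_ms parameter and the messages are hypothetical, not open-iscsi's:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical lock acquisition that only retries on EEXIST.
 * Returns 0 on success; otherwise the errno value that made it give up
 * (ETIMEDOUT if the lock stayed busy for about timeout_ms). */
int link_lock_acquire_checked(const char *dir, long timeout_ms)
{
    char lock[PATH_MAX], lock_write[PATH_MAX];
    struct timespec delay = { 0, 10000000 }; /* 10 ms between tries */

    snprintf(lock, sizeof(lock), "%s/lock", dir);
    snprintf(lock_write, sizeof(lock_write), "%s/lock.write", dir);

    int fd = open(lock, O_RDWR | O_CREAT, 0666);
    if (fd < 0) {
        int err = errno; /* e.g. EROFS or EACCES: retrying cannot help */
        fprintf(stderr, "cannot open %s: %s\n", lock, strerror(err));
        return err;
    }
    close(fd);

    for (long waited = 0; ; waited += 10) {
        if (link(lock, lock_write) == 0)
            return 0;
        if (errno != EEXIST) { /* not contention: report and stop */
            int err = errno;
            fprintf(stderr, "cannot create %s: %s\n", lock_write, strerror(err));
            return err;
        }
        if (waited >= timeout_ms) {
            fprintf(stderr, "%s still held after %ld ms, stale lock?\n",
                    lock_write, timeout_ms);
            return ETIMEDOUT;
        }
        nanosleep(&delay, NULL);
    }
}
```

A caller can pass the returned value to strerror() for its own message; returning ETIMEDOUT instead of looping is what turns a stale lock.write into an error the user can act on.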

Mike Christie

May 9, 2007, 11:46:51 AM
to Tomasz Chmielewski, open-...@googlegroups.com

Let me search for the proper way to check that, and I will add it; I do
not know it off the top of my head. I am still working on the basic lock
hang, though. I thought it would be an easy fix like you suggested, where
we restart the service, but I want to be able to recover without being
that disruptive. I am still reading up on a way to do that.
