iscsiadm -m discovery -t sendtargets -p TARGET_IP
Running it via strace revealed:
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EEXIST (File exists)
Simply removing "/var/lock/iscsi/lock.write" solved the issue.
Isn't merely checking whether the "/var/lock/iscsi/lock.write" file exists
too little? The system can end up blocked whenever such a file is
accidentally left behind.
This file can be left behind accidentally in at least two cases:
1.
# iscsiadm -m discovery -t sendtargets -p TARGET_IP
<Ctrl+C after the command has started but before it has finished>
2.
Rebooting the machine while it is still starting up and "iscsiadm -m ..."
is being executed.
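Judging only from the strace output, the lock is taken by hard-linking "lock" to "lock.write" and retrying every 10 ms while link() fails. Below is a minimal sketch of that pattern with a bounded number of retries, so that a leftover "lock.write" produces an error instead of an endless loop. The function names and the max_tries limit are assumptions for illustration, not the actual iscsiadm code.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not the real iscsiadm code: take the lock by
 * hard-linking lock_path to write_path, the pattern seen in strace.
 * Retries every 10 ms while write_path already exists (EEXIST), but only
 * max_tries times, so a stale lock.write yields an error, not a hang.
 * Returns 0 on success, or -1 with errno set. */
static int take_link_lock(const char *lock_path, const char *write_path,
                          int max_tries)
{
    const struct timespec delay = { .tv_sec = 0, .tv_nsec = 10000000 };

    /* Make sure the link source exists (the open(O_CREAT) in the trace). */
    int fd = open(lock_path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;
    close(fd);

    for (int i = 0; i < max_tries; i++) {
        if (link(lock_path, write_path) == 0)
            return 0;               /* we now hold the lock */
        if (errno != EEXIST)
            return -1;              /* e.g. EROFS or EACCES: do not retry */
        nanosleep(&delay, NULL);    /* 10 ms, as in the strace output */
    }
    errno = ETIMEDOUT;              /* never released: possibly stale */
    return -1;
}

/* Release the lock by removing the hard link. */
static int release_link_lock(const char *write_path)
{
    return unlink(write_path);
}
```

A caller would report the ETIMEDOUT case as "lock is held or stale" and exit; the right timeout is a policy choice.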
--
Tomasz Chmielewski
http://wpkg.org
Yeah, you are right. Thanks for the details. Would it be acceptable to
modify iscsid to check for stale files when it starts up and remove them
if found?
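A minimal sketch of the startup cleanup proposed here, assuming iscsid runs it once before any iscsiadm instance can be holding the lock. The function name is hypothetical. It removes a leftover "lock.write" and treats "no such file" as success.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical startup hook: remove a lock.write left behind by an
 * iscsiadm that was interrupted (Ctrl+C) or cut off by a reboot.
 * Only safe if no other iscsiadm can hold the lock at this point.
 * Returns 0 if the file was removed or did not exist, -1 otherwise. */
static int remove_stale_write_lock(const char *write_path)
{
    if (unlink(write_path) == 0) {
        fprintf(stderr, "removed stale lock %s\n", write_path);
        return 0;
    }
    if (errno == ENOENT)
        return 0;                   /* nothing to clean up */
    fprintf(stderr, "cannot remove %s: %s\n", write_path, strerror(errno));
    return -1;
}
```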
(...)
> Yeah, you are right. Thanks for the details. Would it be acceptable to
> modify iscsid to check for stale files when it starts up and remove them
> if found?
I guess it's better to do it in iscsid than with an "rm -f .../stale/files"
in the user's startup scripts...
Actually, I am probably wrong here. We want to handle the case where
someone kills iscsiadm and then wants to run it again without restarting
iscsid or the iscsi service. Is there a standard way for apps to detect
that they were killed by something like SIGKILL and did not get a chance
to clean up after themselves?
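One common answer to the SIGKILL question, offered here as a swapped-in technique rather than what iscsiadm does, is to hold an advisory lock on an open file descriptor instead of creating a file. With flock() the kernel drops the lock when the last descriptor referring to that open file is closed, which also happens when the holder dies from SIGKILL, so nothing stale is left to detect. The function names and the non-blocking option are assumptions.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical alternative to a link()-based lock: hold an exclusive
 * flock() on the lock file. The kernel releases it when the descriptor is
 * closed, including when the holder is killed with SIGKILL.
 * Returns the locked fd, or -1 with errno set (EWOULDBLOCK if another
 * holder exists and wait is 0). */
static int take_flock(const char *lock_path, int wait)
{
    /* O_CLOEXEC keeps the lock from leaking into exec'd children. */
    int fd = open(lock_path, O_RDWR | O_CREAT | O_CLOEXEC, 0666);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | (wait ? 0 : LOCK_NB)) < 0) {
        int saved = errno;          /* close() must not clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Closing the descriptor releases the lock. */
static void release_flock(int fd)
{
    close(fd);
}
```

The lock file itself may stay on disk; only the kernel-held lock matters, so a killed iscsiadm cannot block the next run.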
There's yet another case where iscsiadm hangs because of this issue: the
filesystem is read-only.
Recently I came across this while trying to measure how long I can
disconnect a diskless initiator from the target (simply by pulling the
network cable).
After some time, the kernel detects errors, and when we connect again,
we find that the filesystem has been remounted read-only.
All programs will work (well, unless they really really need to write
something), but iscsiadm will just hang:
# strace iscsiadm -m node -T iqn.2007-04.net.net:some.server -p 192.168.1.2
(...)
access("/var/lock/iscsi", F_OK) = 0
open("/var/lock/iscsi/lock", O_RDWR|O_CREAT, 0666) = -1 EROFS (Read-only file system)
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS (Read-only file system)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS (Read-only file system)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EROFS (Read-only file system)
nanosleep({0, 10000000}, NULL) = 0
(...)
And it repeats like that endlessly.
A pity for anyone who tries to check or repair something with iscsiadm.
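A read-only remount is something iscsiadm cannot fix by waiting, so one option is to detect it before entering any retry loop. A minimal sketch using POSIX statvfs() and its ST_RDONLY flag; the function name is hypothetical, and checking the lock directory up front (rather than just bailing out on the EROFS errno from link()) is only one possible design.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/statvfs.h>

/* Hypothetical pre-flight check: returns 1 if the filesystem holding dir
 * (e.g. "/var/lock/iscsi") is mounted read-only, 0 if it is writable,
 * or -1 with errno set if statvfs() itself fails. */
static int lock_dir_is_readonly(const char *dir)
{
    struct statvfs st;

    if (statvfs(dir, &st) < 0)
        return -1;
    return (st.f_flag & ST_RDONLY) ? 1 : 0;
}
```

On a result of 1, the caller would print a clear "read-only filesystem" message and exit instead of looping.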
(...)
> All programs will work (well, unless they really really need to write
> something), but iscsiadm will just hang:
> (...)
> A pity for someone who tried to check/repair something with iscsiadm.
A similar thing happens if you try to run that command as a non-root user:
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EACCES (Permission denied)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EACCES (Permission denied)
nanosleep({0, 10000000}, NULL) = 0
link("/var/lock/iscsi/lock", "/var/lock/iscsi/lock.write") = -1 EACCES (Permission denied)
nanosleep({0, 10000000}, NULL) = 0
At the very least, it should tell the user that they don't have
sufficient permissions, rather than looping like that.
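For the non-root case, a check before taking the lock could report the problem once instead of looping. A minimal sketch using access() with W_OK on the lock directory (creating "lock.write" needs write permission on that directory). Note that access() tests the real, not effective, user ID, and that it also fails with EROFS on a read-only filesystem, so the same check covers the earlier case. The function name and message wording are assumptions.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical pre-flight check before taking the lock in dir.
 * access(W_OK) fails with EACCES for a user lacking write permission, or
 * EROFS on a read-only filesystem; retrying cannot help in either case,
 * so report it once and let the caller exit.
 * Returns 0 if dir is writable, -1 (after printing a message) otherwise,
 * with errno preserved from access(). */
static int check_lock_dir_writable(const char *dir)
{
    if (access(dir, W_OK) == 0)
        return 0;
    int err = errno;
    if (err == EACCES)
        fprintf(stderr, "%s: permission denied (are you root?)\n", dir);
    else
        fprintf(stderr, "%s: %s\n", dir, strerror(err));
    errno = err;                    /* fprintf() may have changed errno */
    return -1;
}
```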
Let me search for the proper way to check that, and I will add it; I do
not know it off the top of my head. I am still working on the basic lock
hang, though. I thought it would be an easy fix like you suggested, where
we restart the service, but I want to be able to recover without being
that disruptive. I am still reading up on a way to do that.