[reposting, as the previous one seems to be lost]
Hi all,
I have a question regarding udev events when using iscsi disks.
By using "udevadm monitor" I can see that events are generated when I login and logout from an iscsi portal/resource, creating/destroying the corresponding links under /dev/.
However, I cannot see anything when the remote machine simply dies/reboots/disconnects: while "dmesg" shows the iscsi timeout expiring, I don't see anything about a removed disk (and the links under /dev/ remain unaltered, indeed). At the same time, when the remote machine and disk become available again, no reconnection events happen.
I can read here that, years ago, a patch was in progress to give better integration with udev when a device disconnects/reconnects. Did the patch get merged? Or is what I described above still the expected behavior? Can it be changed?
Thanks.
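To make the observation above concrete, here is a minimal Python sketch that filters "udevadm monitor" output down to block-device add/remove events. The line format matched by the regular expression is an assumption based on typical udevadm output, and the sample lines (device path, timestamps) are made up for illustration. On login/logout you would expect add/remove lines like these; for a target that silently dies, the report above is that none appear.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

# Assumed shape of an event line printed by "udevadm monitor", e.g.
#   KERNEL[1021.335467] add      /devices/.../block/sdb (block)
#   UDEV  [1021.351201] add      /devices/.../block/sdb (block)
LINE_RE = re.compile(
    r"^(?P<source>KERNEL|UDEV)\s*\[(?P<ts>[\d.]+)\]\s+"
    r"(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>[^)]+)\)\s*$"
)


class UEvent(NamedTuple):
    source: str      # "KERNEL" (raw uevent) or "UDEV" (after rule processing)
    timestamp: float
    action: str      # add, remove, change, ...
    devpath: str
    subsystem: str


def parse_line(line: str) -> Optional[UEvent]:
    """Return a UEvent for an event line, or None for headers and other text."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return UEvent(m["source"], float(m["ts"]), m["action"], m["devpath"], m["subsystem"])


def block_add_remove(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (action, kernel_name) for block-subsystem add/remove events
    (whole disks and partitions alike)."""
    for line in lines:
        ev = parse_line(line)
        if ev is None or ev.subsystem != "block":
            continue
        if ev.action in ("add", "remove"):
            yield ev.action, ev.devpath.rsplit("/", 1)[-1]


sample = [
    "monitor will print the received events for:",
    "KERNEL[1021.335467] add      /devices/platform/host3/session1/target3:0:0/3:0:0:0/block/sdb (block)",
    "UDEV  [1021.351201] add      /devices/platform/host3/session1/target3:0:0/3:0:0:0/block/sdb (block)",
    "KERNEL[1090.002113] remove   /devices/platform/host3/session1/target3:0:0/3:0:0:0/block/sdb (block)",
]
print(list(block_add_remove(sample)))  # [('add', 'sdb'), ('add', 'sdb'), ('remove', 'sdb')]
```

In practice you would feed it the live stream, e.g. from "udevadm monitor --subsystem-match=block" piped into the script.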
--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/open-iscsi/13d4c963-b633-4672-97d9-dd41eec5fb5b%40googlegroups.com.
Wondering myself.
On Apr 21, 2020, at 2:31 AM, Gionatan Danti <gionata...@gmail.com> wrote:
> By using "udevadm monitor" I can see that events are generated when I login and logout from an iscsi portal/resource, creating/destroying the corresponding links under /dev/.
So by running "udevadm monitor" on the initiator, you can see when a block device becomes available locally.
> However, I cannot see anything when the remote machine simply dies/reboots/disconnects: while "dmesg" shows the iscsi timeout expiring, I don't see anything about a removed disk (and the links under /dev/ remain unaltered, indeed). At the same time, when the remote machine and disk become available again, no reconnection events happen.
As someone who has had an inordinate amount of experience with the iSCSI connection breaking in the middle of production (power outage, network switch dies, wrong Ethernet cable pulled, target server hardware crashes, ...), I'd say the more info the better. Udev event triggers would help. I wonder exactly how XenServer handles this, as it seemed more resilient: XenServer host initiators do something right to recover, and I wonder how that compares to the normal iSCSI initiator.
Unfortunately, though, XenServer LVM-over-iSCSI does not pass the message along to its Linux virtual drives and VMs in the same way as to Windows VMs. When the target drives became available again, MS Windows virtual machines would gracefully recover on their own. All Linux VM filesystems went read-only, and those VMs required a forced reboot; remounting would not work.
Because of the design of iSCSI, there is no way for the initiator to know the server has gone away. The only time an initiator might figure this out is when it tries to communicate with the target.
This assumes we are not using some sort of directory service, like iSNS, which can send asynchronous notifications. But even then, the iSNS server would have to somehow know that the target went down. If the target crashed, that might be difficult to ascertain.
So in the absence of some asynchronous notification, the initiator only knows the target is not responding if it tries to talk to that target. Normally iscsid defaults to sending periodic NO-OPs to the target every 5 seconds. So if the target goes away, the initiator usually notices, even if no regular I/O is occurring.
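As a rough worked example of what the NO-OP mechanism means for detection time, here is a simplified Python model. The values mirror, as far as I know, the stock iscsid.conf defaults node.conn[0].timeo.noop_out_interval = 5 and node.conn[0].timeo.noop_out_timeout = 5 (check your own iscsid.conf). The helper names are hypothetical, and the model ignores regular I/O traffic and the kernel's exact timer handling.

```python
# Assumed defaults, mirroring the stock iscsid.conf values (seconds).
NOOP_OUT_INTERVAL = 5  # idle time before the initiator sends a NOP-Out ping
NOOP_OUT_TIMEOUT = 5   # time to wait for the NOP-In reply before failing


def worst_case_detection(interval: float = NOOP_OUT_INTERVAL,
                         timeout: float = NOOP_OUT_TIMEOUT) -> float:
    """Approximate upper bound between the last data received from a dead
    target and the connection being declared failed: wait `interval` of
    silence, send a ping, then wait `timeout` for an answer that never comes."""
    return interval + timeout


def connection_failed(last_rx: float, now: float,
                      interval: float = NOOP_OUT_INTERVAL,
                      timeout: float = NOOP_OUT_TIMEOUT) -> bool:
    """True once an idle link has gone unanswered for interval + timeout."""
    return now - last_rx >= interval + timeout


print(worst_case_detection())  # 10
```

So with these defaults an idle session should notice a dead target within roughly 10 seconds; what happens after that (retrying, then failing I/O) is governed by other settings.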
But this is where the error recovery gets tricky, because iscsi tries to handle "lossy" connections. What if the server will be right back? Maybe it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps trying to reconnect. As a matter of fact, if you stop iscsid and restart it, it sees the failed connection and retries it -- forever, by default. I actually added a configuration parameter called reopen_max, that can limit the number of retries. But there was pushback on changing the default value from 0, which is "retry forever".
So what exactly do you think the system should do when a connection "goes away"? How long does it have to be gone to be considered gone for good? If the target comes back "later", should it get the same disk name? Should we retry, and if so how much before we give up? I'm interested in your views, since it seems like a non-trivial problem to me.
So you're saying as soon as a bad connection is detected (perhaps by a NOOP), the device should go away?
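The reopen_max semantics described above (0 means "retry forever", a positive value caps the attempts) can be pictured as a small retry loop. This is a hypothetical Python sketch, not iscsid's code: try_login stands in for whatever re-establishes the session, and whether iscsid counts attempts exactly this way is an assumption.

```python
import itertools
from typing import Callable


def should_retry(attempts_so_far: int, reopen_max: int) -> bool:
    """0 means "retry forever"; any positive value caps the attempts."""
    return reopen_max == 0 or attempts_so_far < reopen_max


def reconnect(try_login: Callable[[], bool], reopen_max: int) -> bool:
    """Call the hypothetical try_login() until it succeeds or the cap is hit.
    Returns True on success, False if we gave up. Back-off/sleeping between
    attempts is omitted for brevity; with reopen_max == 0 and a target that
    never returns, this loops forever, just like the default behavior."""
    for attempt in itertools.count():
        if not should_retry(attempt, reopen_max):
            return False
        if try_login():
            return True
    return False  # unreachable: itertools.count() never ends
```

The design question in the thread is essentially which value to ship as the default: 0 keeps sessions alive across long outages, while a small cap lets the system give up and (potentially) tear the device down.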
>
> But this is where the error recovery gets tricky, because iscsi tries to
> handle "lossy" connections. What if the server will be right back? Maybe
> it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps
> trying to reconnect. As a matter of fact, if you stop iscsid and restart
> it, it sees the failed connection and retries it -- forever, by default. I
> actually added a configuration parameter called reopen_max, that can limit
> the number of retries. But there was pushback on changing the default value
> from 0, which is "retry forever".
>
> So what exactly do you think the system should do when a connection "goes
> away"? How long does it have to be gone to be considered gone for good? If
> the target comes back "later" should it get the same disk name? Should we
> retry, and if so how much before we give up? I'm interested in your views,
> since it seems like a non-trivial problem to me.
IMHO a "bus down" is a critical event affecting _all_ devices on that bus, not just a single target. Well, it might add some noise if those other targets have no I/O outstanding, but it's better to know that the bus is down before initiating a transfer than to conclude seconds later that the target seems unreachable for some unknown reason.
> So you're saying as soon as a bad connection is detected (perhaps by a
> NOOP), the device should go away?
Maybe the state should be similar to a device in power-save mode: it's not accessible right now, but should be woken up ASAP. See my earlier comparison to NFS hard mounts...
Regards,
Ulrich
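Something close to that "not accessible right now" state can already be observed. As far as I know, on reasonably recent Linux kernels a dropped iSCSI connection does not remove the SCSI disk: the device is typically "blocked" while recovery runs and becomes "transport-offline" once replacement_timeout expires, while /sys/class/iscsi_session/sessionN/state reports FAILED. That would be consistent with the missing udev remove events. Below is a hypothetical workaround sketch in Python (not an open-iscsi feature) that polls those sysfs attributes and reports transitions; the exact state strings you see may differ by kernel version.

```python
import time
from pathlib import Path
from typing import Dict, Optional, Tuple


def _read_states(base: Path, rel: str) -> Dict[str, str]:
    """Map each child of `base` to the stripped contents of child/rel, if present."""
    states: Dict[str, str] = {}
    if base.is_dir():
        for entry in sorted(base.iterdir()):
            state_file = entry / rel
            if state_file.is_file():
                states[entry.name] = state_file.read_text().strip()
    return states


def session_states(sysfs_root: str = "/sys") -> Dict[str, str]:
    """e.g. {"session1": "LOGGED_IN"}; expected to read FAILED while the link is down."""
    return _read_states(Path(sysfs_root, "class", "iscsi_session"), "state")


def disk_states(sysfs_root: str = "/sys") -> Dict[str, str]:
    """e.g. {"sdb": "running"}; expected to read blocked / transport-offline
    after a failure. Block devices without a device/state file (e.g. loop)
    are skipped."""
    return _read_states(Path(sysfs_root, "block"), "device/state")


def diff(old: Dict[str, str],
         new: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old_state, new_state)} for entries that changed,
    appeared (old_state None) or vanished (new_state None)."""
    return {name: (old.get(name), new.get(name))
            for name in sorted(old.keys() | new.keys())
            if old.get(name) != new.get(name)}


def watch(poll_seconds: float = 2.0, sysfs_root: str = "/sys") -> None:
    """Print state transitions until interrupted (Ctrl-C)."""
    prev = {**session_states(sysfs_root), **disk_states(sysfs_root)}
    while True:
        time.sleep(poll_seconds)
        cur = {**session_states(sysfs_root), **disk_states(sysfs_root)}
        for name, (was, now) in diff(prev, cur).items():
            print(f"{name}: {was} -> {now}")
        prev = cur
```

Calling watch() while pulling a target's cable should print lines such as "session1: LOGGED_IN -> FAILED" and "sdb: running -> blocked", which is the kind of signal the thread is asking udev to deliver.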
Well, for short disconnections the retry approach is surely the better one. But I naively assumed that a longer disconnection, as defined by the node.session.timeo.replacement_timeout parameter, would tear down the device with a corresponding udev event. Udev should have no problem assigning the device a sensible persistent name, right?
This opens the door to another question: from the iscsid.conf and README files I (wrongly?) understand that replacement_timeout comes into play only when the SCSI EH is running, while in the other cases different timeouts, such as node.session.err_timeo.lu_reset_timeout and node.session.err_timeo.tgt_reset_timeout, should affect the (dis)connection. However, in all my tests I only saw replacement_timeout being honored, yet I never caught a single running instance of the SCSI EH via the suggested command "iscsiadm -m session -P 3".
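Catching a transient SCSI EH run by eye is hard, so one option is to poll "iscsiadm -m session -P 3" and parse it. The Python sketch below is hypothetical: the line shapes it matches ("Recovery Timeout: 120", "Host Number: 3 State: running", "Attached scsi disk sdb State: running") are my assumption of the usual output, "Recovery Timeout" is, as far as I know, how iscsiadm reports replacement_timeout, and treating a host state containing "recovery" as "SCSI EH running" is an assumption based on the kernel's SCSI host state names.

```python
import re
import subprocess
from typing import Dict

# Assumed line shapes; spacing varies, so any run of tabs/spaces is accepted.
TIMEOUT_RE = re.compile(r"^\s*(Recovery|Target Reset|LUN Reset) Timeout:\s*(\d+)\s*$")
HOST_RE = re.compile(r"^\s*Host Number:\s*(\d+)\s+State:\s*(\S+)")
DISK_RE = re.compile(r"^\s*Attached scsi disk\s+(\S+)\s+State:\s*(\S+)")


def summarize(output: str) -> dict:
    """Collect timeouts (seconds), SCSI host states and disk states."""
    timeouts: Dict[str, int] = {}
    hosts: Dict[int, str] = {}
    disks: Dict[str, str] = {}
    for line in output.splitlines():
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[m.group(1)] = int(m.group(2))
            continue
        m = HOST_RE.match(line)
        if m:
            hosts[int(m.group(1))] = m.group(2)
            continue
        m = DISK_RE.match(line)
        if m:
            disks[m.group(1)] = m.group(2)
    return {"timeouts": timeouts, "hosts": hosts, "disks": disks}


def scsi_eh_running(summary: dict) -> bool:
    """Assumption: a host under error handling reports a state containing
    "recovery" (e.g. "recovery" or "cancel/recovery")."""
    return any("recovery" in state.lower() for state in summary["hosts"].values())


def read_live() -> dict:
    """Run iscsiadm (requires open-iscsi and usually root) and summarize it."""
    out = subprocess.run(["iscsiadm", "-m", "session", "-P", "3"],
                         capture_output=True, text=True, check=True).stdout
    return summarize(out)
```

Calling read_live() in a tight loop during a disconnect test, and logging whenever scsi_eh_running() flips, would show whether the EH ever runs and which timeouts were in effect at the time.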