Concurrent logins to different interfaces of the same iSCSI target, and login timeout


Amit Bawer

Jun 30, 2020, 9:00:03 AM
to open-iscsi
[Sorry if this message is duplicated; I haven't seen it published in the group yet.]

Hi,

I have a couple of questions regarding iscsiadm version 6.2.0.878-2:

1) Is it safe to have concurrent logins to the same target from different interfaces?
That is, running the following commands in parallel:

iscsiadm -m node -T iqn.2003-01.org.vm-18-198.iqn2 -I default -p 10.35.18.121:3260,1 -l
iscsiadm -m node -T iqn.2003-01.org.vm-18-198.iqn2 -I default -p 10.35.18.166:3260,1 -l

2) Is there a particular reason for the default values of node.conn[0].timeo.login_timeout and node.session.initial_login_retry_max?
According to the comment in iscsid.conf, a login attempt over an unreachable interface would spend 120 seconds:

# The default node.session.initial_login_retry_max is 8 and
# node.conn[0].timeo.login_timeout is 15 so we have:
#
# node.conn[0].timeo.login_timeout * node.session.initial_login_retry_max =
#                                                               120 seconds


Thanks,
Amit

Donald Williams

Jun 30, 2020, 11:55:13 AM
to open-...@googlegroups.com
Hello,
 
 Assuming that devmapper is running and MPIO is properly configured, you do want to connect to the same volume/target from different interfaces.

However, in your case you aren't specifying different interfaces; you use "default" for both, and the portals are on the same subnet, which typically means only the default NIC for that subnet will be used.

What iSCSI target are you using?  

 Regards,
Don


The Lee-Man

Jun 30, 2020, 1:00:04 PM
to open-iscsi
No, iscsiadm is not designed for parallel use. There is some locking, but IIRC there are still issues, like a single connection to the kernel?

After discovery, you should have NODE entries for each path, and you can login to both with "iscsiadm -m node -l".

As for the default timeouts and retry counts, they are of course trade-offs. In general, iSCSI can have flaky connections, since it's at the mercy of the network. In the event of a transient event, like a switch or target rebooting, the design allows reconnecting if and when the target finally comes back up, since giving up generally can mean data corruption (e.g. for a filesystem).

As the README for open-iscsi describes, you must tweak some of those numbers if you want to use multipathing, since losing one of many paths usually calls for a faster timeout, for example.
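
For example (illustrative values only; the right numbers depend on your environment), a multipath setup will often shorten settings like these in /etc/iscsi/iscsid.conf:

# Example values for a multipath setup -- tune to your environment
node.conn[0].timeo.login_timeout = 5
node.session.initial_login_retry_max = 4
node.session.timeo.replacement_timeout = 15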

The Lee-Man

Jun 30, 2020, 1:02:15 PM
to open-iscsi
On Tuesday, June 30, 2020 at 8:55:13 AM UTC-7, Donald Williams wrote:
Hello,
 
 Assuming that devmapper is running and MPIO is properly configured, you do want to connect to the same volume/target from different interfaces.

However, in your case you aren't specifying different interfaces; you use "default" for both, and the portals are on the same subnet, which typically means only the default NIC for that subnet will be used.

Yes, generally best practice requires that each component of your two paths between initiator and target is redundant. This means that, in the case of networking, you want to be on different subnets, served by different switches. You also want two different NICs on your initiator, if possible, although many times they are on the same card. But, obviously, some points are not redundant (like your initiator or target).



Donald Williams

Jun 30, 2020, 5:15:36 PM
to open-...@googlegroups.com
Re: subnets. Not all iSCSI targets operate on multiple subnets. The EqualLogic, for example, is intended for a single IP subnet scheme; multiple subnets require routing to be enabled.

Don



Amit Bawer

Jul 27, 2020, 1:38:05 PM
to open-iscsi
Thank you for your answers,

The motivation behind the original question is to reduce the waiting time for logins over different iSCSI connections when some of the portals are down.

We have a limitation on our RHEV system where all logins to the listed iSCSI targets must finish within 180 seconds in total.
In our current implementation we serialize the iscsiadm node logins one after the other, each for a specific target and portal. In such a scheme, each login waits 120 seconds if a portal is down (default 15-second login timeout * 8 login retries), so if 2 or more connections are down, we spend at least 240 seconds, which exceeds our 180-second limit and the entire operation is considered failed (RHEV-wise).

Testing [1] of the different login schemes is summarized in the following table (logins to 2 targets with 2 portals each).
It seems that logging in to all nodes after creating them, as suggested in a previous answer here, is comparable in total time to doing specific node logins concurrently (i.e. running iscsiadm -m node -T target -p portal -I interface -l in parallel per target-portal), both when all portals are online and when one portal is down:

Login scheme               Online portals   Active sessions   Total login time (s)
------------------------   --------------   ---------------   --------------------
All at once                2/2              4                 2.1
All at once                1/2              2                 120.2
Serial target-portal       2/2              4                 8.5
Serial target-portal       1/2              2                 243.5
Concurrent target-portal   2/2              4                 2.1
Concurrent target-portal   1/2              2                 120.1

Using concurrent target-portal logins seems preferable from our perspective, as it allows connecting only to the specified targets and portals without the risk of intermixing with other potential iSCSI targets.

The node creation part is kept serial in all tests here, since we have seen that it may cause iSCSI DB issues when run in parallel. But running only the node logins in parallel hasn't shown any issues in at least 1000 runs of our tests.

The question to ask here is whether this is advisable from the open-iscsi point of view.
I know I was already told that iscsiadm is racy, but does that apply to node logins as well?

The other option is to use one login-all call without parallelism, but that would have other implications on our system to consider.

Your answers would be helpful once again.

Thanks,
- Amit

Amit Bawer

Aug 6, 2020, 4:42:35 AM
to open-iscsi
Another point I'd like to ask about is the iSER fallback that we have:

Currently we check during the connection flow whether 'iser' is set in iscsi_default_ifaces in our configuration.
If it is, we first check whether it works on the server side by attempting:

iscsiadm -m node -T target -I iser -p portal -l
iscsiadm -m node -T target -I iser -p portal -u

If the login/logout works, 'iser' is kept instead of the 'default' interface setup; otherwise we fall back to 'default'.
This is used later for the actual node login.
The thing is that this check can also waste valuable time when the portal is down. Is there a way to fall back within the iscsiadm command itself, or to prefer a specific interface type when trying all/parallel logins for the same target+portal but with different interface types?
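
For reference, a simplified sketch of that probe (illustrative Python, not our exact code; run() is just a thin subprocess wrapper):

import subprocess

def run(args):
    # Raises CalledProcessError on non-zero exit.
    subprocess.run(args, check=True, capture_output=True)

def probe_iface(target, portal):
    """Return 'iser' if an iser login/logout round-trip works, else 'default'."""
    base = ["iscsiadm", "--mode", "node",
            "--targetname", target, "--interface", "iser", "--portal", portal]
    try:
        run(base + ["--login"])
        run(base + ["--logout"])
        return "iser"
    except subprocess.CalledProcessError:
        return "default"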

The Lee-Man

Aug 7, 2020, 5:29:41 PM
to open-iscsi
On Thursday, August 6, 2020 at 1:42:35 AM UTC-7, Amit Bawer wrote:
Another point I'd like to ask about is the iSER fallback that we have:

Currently we check during the connection flow whether 'iser' is set in iscsi_default_ifaces in our configuration.
If it is, we first check whether it works on the server side by attempting:

iscsiadm -m node -T target -I iser -p portal -l
iscsiadm -m node -T target -I iser -p portal -u

If the login/logout works, 'iser' is kept instead of the 'default' interface setup; otherwise we fall back to 'default'.
This is used later for the actual node login.
The thing is that this check can also waste valuable time when the portal is down. Is there a way to fall back within the iscsiadm command itself, or to prefer a specific interface type when trying all/parallel logins for the same target+portal but with different interface types?

There is no way to have the iSCSI subsystem "fall back" to default from iser given the current code. The problem is: when to fall back? Also, falling back to a secondary interface could add an additional 180 seconds if it times out as well. So it's up to the higher-level code (you, in this case) to make decisions like that.

The Lee-Man

Aug 7, 2020, 5:55:22 PM
to open-iscsi
On Monday, July 27, 2020 at 10:38:05 AM UTC-7, Amit Bawer wrote:
Thank you for your answers,

The motivation behind the original question is to reduce the waiting time for logins over different iSCSI connections when some of the portals are down.

We have a limitation on our RHEV system where all logins to the listed iSCSI targets must finish within 180 seconds in total.
In our current implementation we serialize the iscsiadm node logins one after the other, each for a specific target and portal. In such a scheme, each login waits 120 seconds if a portal is down (default 15-second login timeout * 8 login retries), so if 2 or more connections are down, we spend at least 240 seconds, which exceeds our 180-second limit and the entire operation is considered failed (RHEV-wise).

Of course these times are tunable, as the README distributed with open-iscsi suggests. But each setting has a trade-off. For example, if you shorten the timeout, you may miss connecting to a target that is just temporarily unreachable.

Testing [1] of the different login schemes is summarized in the following table (logins to 2 targets with 2 portals each).
It seems that logging in to all nodes after creating them, as suggested in a previous answer here, is comparable in total time to doing specific node logins concurrently (i.e. running iscsiadm -m node -T target -p portal -I interface -l in parallel per target-portal), both when all portals are online and when one portal is down:

Login scheme               Online portals   Active sessions   Total login time (s)
------------------------   --------------   ---------------   --------------------
All at once                2/2              4                 2.1
All at once                1/2              2                 120.2
Serial target-portal       2/2              4                 8.5
Serial target-portal       1/2              2                 243.5
Concurrent target-portal   2/2              4                 2.1
Concurrent target-portal   1/2              2                 120.1

So it looks like "All at once" is as fast as concurrent? I must be missing something. Maybe I'm misunderstanding what "all at once" means?

Using concurrent target-portal logins seems preferable from our perspective, as it allows connecting only to the specified targets and portals without the risk of intermixing with other potential iSCSI targets.

Okay, maybe that explains it. You don't trust the "all" option? You are, after all, in charge of the node database. But of course that's your choice.

The node creation part is kept serial in all tests here, since we have seen that it may cause iSCSI DB issues when run in parallel. But running only the node logins in parallel hasn't shown any issues in at least 1000 runs of our tests.

In general the heavy lifting here is done by the kernel, which has proper multi-thread locking. And I believe iscsiadm has a single lock to the kernel communication socket, so that doesn't get messed up. So I wouldn't go as far as guaranteeing that this will work, but I agree it certainly seems to reliably work.

The question to ask here is whether this is advisable from the open-iscsi point of view.
I know I was already told that iscsiadm is racy, but does that apply to node logins as well?

I guess I answered that. I wouldn't advise against it, but I also wouldn't call it best practice in general.

The other option is to use one login-all call without parallelism, but that would have other implications on our system to consider.

Such as?

Your answers would be helpful once again.

Thanks,
- Amit


You might be interested in a new feature I'm considering adding to iscsiadm to do asynchronous logins. In other words, iscsiadm, when asked to log in to one or more targets, would send the login request to the targets and then return success immediately. It is then up to the end user (you in this case) to poll for when the target actually shows up.

This would mean that your system boot could occur much more quickly, especially when using, for example, multipathing on top of two paths to a target when one path is not up. The problem is that this adds a layer of functionality needed in the client (again, you in this case), since the client has to poll for success, handle timeouts, etc. Also, this is just test code, so you could try it at your own risk. :)

If interested, let me know, and I'll point you at a repo:branch

Amit Bawer

Aug 9, 2020, 2:08:50 PM
to open-iscsi
On Saturday, August 8, 2020 at 12:55:22 AM UTC+3 The Lee-Man wrote:
On Monday, July 27, 2020 at 10:38:05 AM UTC-7, Amit Bawer wrote:
Thank you for your answers,

The motivation behind the original question is to reduce the waiting time for logins over different iSCSI connections when some of the portals are down.

We have a limitation on our RHEV system where all logins to the listed iSCSI targets must finish within 180 seconds in total.
In our current implementation we serialize the iscsiadm node logins one after the other, each for a specific target and portal. In such a scheme, each login waits 120 seconds if a portal is down (default 15-second login timeout * 8 login retries), so if 2 or more connections are down, we spend at least 240 seconds, which exceeds our 180-second limit and the entire operation is considered failed (RHEV-wise).

Of course these times are tunable, as the README distributed with open-iscsi suggests. But each setting has a trade-off. For example, if you shorten the timeout, you may miss connecting to a target that is just temporarily unreachable.

Testing [1] of the different login schemes is summarized in the following table (logins to 2 targets with 2 portals each).
It seems that logging in to all nodes after creating them, as suggested in a previous answer here, is comparable in total time to doing specific node logins concurrently (i.e. running iscsiadm -m node -T target -p portal -I interface -l in parallel per target-portal), both when all portals are online and when one portal is down:

Login scheme               Online portals   Active sessions   Total login time (s)
------------------------   --------------   ---------------   --------------------
All at once                2/2              4                 2.1
All at once                1/2              2                 120.2
Serial target-portal       2/2              4                 8.5
Serial target-portal       1/2              2                 243.5
Concurrent target-portal   2/2              4                 2.1
Concurrent target-portal   1/2              2                 120.1

So it looks like "All at once" is as fast as concurrent? I must be missing something. Maybe I'm misunderstanding what "all at once" means?

To illustrate from the test discussed above: we call login_all() after calling new_node(...) for each listed target and portal, as shown below:
...
    for target, portal in connections:
        new_node(target, portal)

    if args.concurrency:
        login_threads(connections, args.concurrency)
    else:
        login_all()
...

def new_node(target, portal):
    logging.info("Adding node for target %s portal %s", target, portal)

    run([
        "iscsiadm",
        "--mode", "node",
        "--targetname", target,
        "--interface", "default",
        "--portal", portal,
        "--op=new"])

    run([
        "iscsiadm",
        "--mode", "node",
        "--targetname", target,
        "--interface", "default",
        "--portal", portal,
        "--op=update",
        "--name", "node.startup",
        "--value", "manual"])

def login_all():
    logging.info("Login to all nodes")
    try:
        run(["iscsiadm", "--mode", "node", "--loginall=manual"])
    except Error as e:
        # Expected timeout error when there are disconnected portals.
        if e.rc != 8:
            raise
        logging.error("Some login failed: %s", e)
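
For reference, login_threads() is not included above; a minimal sketch of what it does (a thread pool over single-node logins, reusing the run() helper and logging setup from the same script) is:

from concurrent.futures import ThreadPoolExecutor

def login(target, portal):
    logging.info("Login to target %s portal %s", target, portal)
    run([
        "iscsiadm",
        "--mode", "node",
        "--targetname", target,
        "--interface", "default",
        "--portal", portal,
        "--login"])

def login_threads(connections, concurrency):
    logging.info("Login to nodes with %s workers", concurrency)
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        # Iterating the results re-raises the first login error, if any.
        list(executor.map(lambda conn: login(*conn), connections))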
 

Using concurrent target-portal logins seems preferable from our perspective, as it allows connecting only to the specified targets and portals without the risk of intermixing with other potential iSCSI targets.

Okay, maybe that explains it. You don't trust the "all" option? You are, after all, in charge of the node database. But of course that's your choice.
It's more about safety, I guess, since the connection flow may run on a machine which has other iSCSI connections set up outside of or alongside this flow.

The node creation part is kept serial in all tests here, since we have seen that it may cause iSCSI DB issues when run in parallel. But running only the node logins in parallel hasn't shown any issues in at least 1000 runs of our tests.

In general the heavy lifting here is done by the kernel, which has proper multi-thread locking. And I believe iscsiadm has a single lock to the kernel communication socket, so that doesn't get messed up. So I wouldn't go as far as guaranteeing that this will work, but I agree it certainly seems to reliably work.

The question to ask here is whether this is advisable from the open-iscsi point of view.
I know I was already told that iscsiadm is racy, but does that apply to node logins as well?

I guess I answered that. I wouldn't advise against it, but I also wouldn't call it best practice in general.

The other option is to use one login-all call without parallelism, but that would have other implications on our system to consider.

Such as?
As mentioned above, unless there is a way to specify a list of targets and portals for a single login-all command.

Your answers would be helpful once again.

Thanks,
- Amit


You might be interested in a new feature I'm considering adding to iscsiadm to do asynchronous logins. In other words, iscsiadm, when asked to log in to one or more targets, would send the login request to the targets and then return success immediately. It is then up to the end user (you in this case) to poll for when the target actually shows up.
This sounds very interesting, but it will probably be available to us only in later RHEL releases, if it is chosen to be delivered downstream.
At present it seems we can only use the login-all approach or logins in dedicated threads per target-portal.

The Lee-Man

Aug 12, 2020, 6:32:03 PM
to open-iscsi
On Sunday, August 9, 2020 at 11:08:50 AM UTC-7, Amit Bawer wrote:
...

The other option is to use one login-all call without parallelism, but that would have other implications on our system to consider.

Such as?
As mentioned above,  unless there is a way to specify a list of targets and portals for a single login (all) command.

Your answers would be helpful once again.

Thanks,
- Amit


You might be interested in a new feature I'm considering adding to iscsiadm to do asynchronous logins. In other words, iscsiadm, when asked to log in to one or more targets, would send the login request to the targets and then return success immediately. It is then up to the end user (you in this case) to poll for when the target actually shows up.
This sounds very interesting, but it will probably be available to us only in later RHEL releases, if it is chosen to be delivered downstream.
At present it seems we can only use the login-all approach or logins in dedicated threads per target-portal.

...

So you can only use RH-released packages? That's fine with me, but I'm asking you to test a new feature and see if it fixes your problems. If it helped, I would add it up here in this repo, and Red Hat would get it by default when they update, which they do regularly, as does my company (SUSE).

Just as a "side" point, I wouldn't attack your problem by manually listing nodes to login to.

It does seem as if you assume you are the only iscsi user on the system. In that case, you have complete control of the node database. Assuming your targets do not change, you can set up your node database once and never have to discover iscsi targets again. Of course if targets change, you can update your node database, but only as needed, i.e. full discovery shouldn't be needed each time you start up, unless targets are really changing all the time in your environment.

If you do discovery and have nodes in your node database you don't like, just remove them.
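
For example, something along the lines of:

iscsiadm -m node -T <target-iqn> -p <ip:port> -o delete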

Another point about your scheme: you are setting each node's 'startup' to 'manual', but manual is the default, and since you seem to own the open-iscsi code on this system, you can ensure the default is manual. Perhaps because this is a test?

So, again, I ask you: will you test the async login code? It's really not much extra work -- just a "git clone" and a "make install" (mostly). If not, the async feature may make it into iscsiadm anyway sometime soon, but I'd really prefer other testers for this feature before that.

Nir Soffer

Aug 13, 2020, 10:32:26 AM
to open-...@googlegroups.com
On Thu, Aug 13, 2020 at 1:32 AM The Lee-Man <leeman...@gmail.com> wrote:
...

So you can only use RH-released packages?

Yes, we support RHEL and CentOS now.
 
That's fine with me, but I'm asking you to test a new feature and see if it fixes your problems. If it helped, I would add it up here in this repo, and Red Hat would get it by default when they update, which they do regularly, as does my company (SUSE).

Sure, this is how we do things. But using async login is something we can use only
in a future version, maybe RHEL/CentOS 8.4, since it is probably too late for 8.3.

Just as a "side" point, I wouldn't attack your problem by manually listing nodes to login to.

It does seem as if you assume you are the only iscsi user on the system. In that case, you have complete control of the node database. Assuming your targets do not change, you can set up your node database once and never have to discover iscsi targets again. Of course if targets change, you can update your node database, but only as needed, i.e. full discovery shouldn't be needed each time you start up, unless targets are really changing all the time in your environment.

This is partly true. In oVirt, there is the vdsm daemon managing iSCSI connections, so usually only vdsm manipulates the database.

However, even in vdsm we have an issue when we attach a Cinder-based volume.
In this case we use os-brick (https://github.com/openstack/os-brick) to attach the volume, and it will discover and log in to the volume.

And of course we cannot prevent an admin from changing the database for their
valid reasons.

So being able to login/logout to specific nodes is very attractive for us. 

If you do discovery and have nodes in your node database you don't like, just remove them.

We can do this, adding and removing nodes we added, but we cannot remove nodes we did not add. It may be something added by os-brick or an administrator.

Another point about your scheme: you are setting each node's 'startup' to 'manual', but manual is the default, and since you seem to own the open-iscsi code on this system, you can ensure the default is manual. Perhaps because this is a test?

No, this is our production setup. I don't know why we specify manual, maybe
this was not the default in 2009 when this code was written, or maybe the intent
was to be explicit about it, in case the default would change?

Do you see a problem with explicit node.startup=manual?
 

So, again, I ask you: will you test the async login code? It's really not much extra work -- just a "git clone" and a "make install" (mostly). If not, the async feature may make it into iscsiadm anyway sometime soon, but I'd really prefer other testers for this feature before that.

Sure, we will test this.

Having an async login API sounds great, but my concern is how we wait for the login result. For example, with systemd many things became asynchronous, but there is no good way to wait for them. A few examples are mounts that can fail after the mount command completes, because after completion udev changes permissions on the mount, or multipath devices, which may not be ready after connecting to a target.

Can you elaborate on how you would wait for the login result, and how you would get the login error for reporting up the stack? How can you handle timeouts? This is easy to do when using a synchronous API with threads.

From our point of view we want to be able to:

    start async login process
    for result in login results:
        add result to response
    return response with connection details

This runs on every host in a cluster, and the results are returned to the oVirt engine managing the cluster.

Cheers,
Nir

Amit Bawer

Aug 18, 2020, 9:23:45 AM
to open-iscsi
Hi Lee,

Thanks for adding the async login support upstream. I've run some tests using the iscsiadm built from there and would like to ask:

1. How is it possible to gather the return status of the async logins? If I understood correctly, the proposed way is to look for the connections in the output of "iscsiadm -m session" after the async logins were launched.
Currently I am using a sampling loop, checking the output of iscsiadm -m session at 1-second intervals for the presence of the expected targets and portals, breaking when all are found or when they are still missing after the expected timeout interval, which for the default iSCSI settings is taken as:
(120 seconds timeout per connection login) * (number of connections) / (number of workers)
Is there a better way? I am not sure how to gather the error status when a connection is not able to log in in such a case.
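
For reference, the sampling loop is roughly the following (simplified; helper names are just for illustration):

import subprocess
import time

def active_sessions():
    # Lines of "iscsiadm -m session" look like:
    #   tcp: [1] 10.35.18.220:3260,1 iqn.2003-01.org.vm-18-220.iqn1 (non-flash)
    # The command exits non-zero when there are no sessions, so don't check rc.
    out = subprocess.run(["iscsiadm", "--mode", "session"],
                         capture_output=True, text=True).stdout
    found = set()
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            found.add((fields[3], fields[2]))  # (target, portal)
    return found

def wait_for_sessions(expected, timeout):
    # expected is a set of (target, portal) pairs.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if expected <= active_sessions():
            return True
        time.sleep(1)
    return False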

2. Would it also be supported for the non-login-all mode? For "iscsiadm -m node -T target -p portal -I interface --login" I get the same timeouts with and without the --no-wait flag, meaning the test waits 240 seconds when two connections are down with a single node-login worker in both cases, so I assume it currently doesn't apply to this login mode.

-- Simulating one portal down (2 connections down) with one worker, using node login without --no-wait

# python3 ./initiator.py  -j 1 -i 10.35.18.220 10.35.18.156  -d 10.35.18.156

2020-08-18 15:59:01,874 INFO    (MainThread) Removing prior sessions and nodes
2020-08-18 15:59:01,882 INFO    (MainThread) Deleting all nodes
2020-08-18 15:59:01,893 INFO    (MainThread) No active sessions
2020-08-18 15:59:01,943 INFO    (MainThread) Setting 10.35.18.156 as invalid address for target iqn.2003-01.org.vm-18-220.iqn2
2020-08-18 15:59:01,943 INFO    (MainThread) Setting 10.35.18.156 as invalid address for target iqn.2003-01.org.vm-18-220.iqn1
2020-08-18 15:59:01,943 INFO    (MainThread) Discovered connections: {('iqn.2003-01.org.vm-18-220.iqn1', '0.0.0.0:0,0'), ('iqn.2003-01.org.vm-18-220.iqn2', '0.0.0.0:0,0'), ('iqn.2003-01.org.vm-18-220.iqn2', '10.35.18.220:3260,1'), ('iqn.2003-01.org.vm-18-220.iqn1', '10.35.18.220:3260,1')}
2020-08-18 15:59:01,944 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn1 portal 0.0.0.0:0,0
2020-08-18 15:59:01,956 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn2 portal 0.0.0.0:0,0
2020-08-18 15:59:01,968 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn2 portal 10.35.18.220:3260,1
2020-08-18 15:59:01,980 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn1 portal 10.35.18.220:3260,1
2020-08-18 15:59:01,995 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn1 portal 0.0.0.0:0,0 (nowait=False)
2020-08-18 16:01:02,019 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn2 portal 0.0.0.0:0,0 (nowait=False)
2020-08-18 16:01:02,028 ERROR   (MainThread) Job failed: Command ['iscsiadm', '--mode', 'node', '--targetname', 'iqn.2003-01.org.vm-18-220.iqn1', '--interface', 'default', '--portal', '0.0.0.0:0,0', '--login'] failed rc=8 out='Logging in to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn1, portal: 0.0.0.0,0]' err='iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn1, portal: 0.0.0.0,0].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals'
2020-08-18 16:03:02,045 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn2 portal 10.35.18.220:3260,1 (nowait=False)
2020-08-18 16:03:02,053 ERROR   (MainThread) Job failed: Command ['iscsiadm', '--mode', 'node', '--targetname', 'iqn.2003-01.org.vm-18-220.iqn2', '--interface', 'default', '--portal', '0.0.0.0:0,0', '--login'] failed rc=8 out='Logging in to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn2, portal: 0.0.0.0,0]' err='iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn2, portal: 0.0.0.0,0].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals'
2020-08-18 16:03:02,321 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn1 portal 10.35.18.220:3260,1 (nowait=False)
2020-08-18 16:03:02,695 INFO    (MainThread) Connecting completed in 240.752s

-- Simulating one portal down (2 connections down) with one worker, using node login with --no-wait

# python3 ./initiator.py  -j 1 -i 10.35.18.220 10.35.18.156  -d 10.35.18.156  --nowait

2020-08-18 16:16:05,802 INFO    (MainThread) Removing prior sessions and nodes
2020-08-18 16:16:06,075 INFO    (MainThread) Deleting all nodes
2020-08-18 16:16:06,090 INFO    (MainThread) No active sessions
2020-08-18 16:16:06,130 INFO    (MainThread) Setting 10.35.18.156 as invalid address for target iqn.2003-01.org.vm-18-220.iqn2
2020-08-18 16:16:06,131 INFO    (MainThread) Setting 10.35.18.156 as invalid address for target iqn.2003-01.org.vm-18-220.iqn1
2020-08-18 16:16:06,131 INFO    (MainThread) Discovered connections: {('iqn.2003-01.org.vm-18-220.iqn2', '10.35.18.220:3260,1'), ('iqn.2003-01.org.vm-18-220.iqn1', '0.0.0.0:0,0'), ('iqn.2003-01.org.vm-18-220.iqn1', '10.35.18.220:3260,1'), ('iqn.2003-01.org.vm-18-220.iqn2', '0.0.0.0:0,0')}
2020-08-18 16:16:06,132 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn2 portal 10.35.18.220:3260,1
2020-08-18 16:16:06,147 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn1 portal 0.0.0.0:0,0
2020-08-18 16:16:06,162 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn1 portal 10.35.18.220:3260,1
2020-08-18 16:16:06,176 INFO    (MainThread) Adding node for target iqn.2003-01.org.vm-18-220.iqn2 portal 0.0.0.0:0,0
2020-08-18 16:16:06,190 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn2 portal 10.35.18.220:3260,1 (nowait=True)
2020-08-18 16:16:06,324 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn1 portal 0.0.0.0:0,0 (nowait=True)
2020-08-18 16:18:06,351 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn1 portal 10.35.18.220:3260,1 (nowait=True)
2020-08-18 16:18:06,356 ERROR   (MainThread) Job failed: Command ['iscsiadm', '--mode', 'node', '--targetname', 'iqn.2003-01.org.vm-18-220.iqn1', '--interface', 'default', '--portal', '0.0.0.0:0,0', '--login', '--no_wait'] failed rc=8 out='Logging in to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn1, portal: 0.0.0.0,0]' err='iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn1, portal: 0.0.0.0,0].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals'
2020-08-18 16:18:06,589 INFO    (login_0) Login to target iqn.2003-01.org.vm-18-220.iqn2 portal 0.0.0.0:0,0 (nowait=True)
2020-08-18 16:20:06,643 ERROR   (MainThread) Job failed: Command ['iscsiadm', '--mode', 'node', '--targetname', 'iqn.2003-01.org.vm-18-220.iqn2', '--interface', 'default', '--portal', '0.0.0.0:0,0', '--login', '--no_wait'] failed rc=8 out='Logging in to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn2, portal: 0.0.0.0,0]' err='iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.vm-18-220.iqn2, portal: 0.0.0.0,0].\niscsiadm: initiator reported error (8 - connection timed out)\niscsiadm: Could not log into all portals'
2020-08-18 16:20:06,656 INFO    (MainThread) Connecting completed in 240.524s


Thanks for helping out,
Amit

The Lee-Man

Sep 15, 2020, 4:21:37 PM
to open-iscsi
I believe the best way to check for the async session to complete is to look for results. Does a new disc show up? That sort of high-level algorithm will be the best, because it waits for what you actually care about.
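
For example, you could watch for the corresponding /dev/disk/by-path entries to appear once the session is up, something like:

ls /dev/disk/by-path/ip-<portal>-iscsi-<target-iqn>-lun-*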

If you'd like to wait until iscsiadm says the session is complete, that seems like it would be possible too, but be advised there is a finite amount of time between when (1) the session connects, (2) iscsiadm shows the session as valid, and (3) udev and friends in the operating system instantiate the disc.

By the way, how are you simulating one target is down? Just curious.

Lastly, yes, I could add code to iscsiadm to make "iscsiadm -m node -T <iqn> ... --login --no_wait" work. Would you be willing to test it (in a branch) if I did?

Amit Bawer

Sep 22, 2020, 2:45:00 AM
to open-iscsi
On Tuesday, September 15, 2020 at 11:21:37 PM UTC+3 The Lee-Man wrote:
I believe the best way to check for the async session to complete is to look for results. Does a new disc show up? That sort of high-level algorithm will be the best, because it waits for what you actually care about.

If you'd like to wait until iscsiadm says the session is complete, that seems like it would be possible too, but be advised there is a finite amount of time between when (1) the session connects, (2) iscsiadm shows the session as valid, and (3) udev and friends in the operating system instantiate the disc.
 
I think it could be helpful, because it would avoid the sampling loop having to check for the established sessions, and it could also provide the error information in case the attempt failed (such as a login timeout).


By the way, how are you simulating one target is down? Just curious.
 
Maybe it's oversimplified, but when a portal address is listed as down for the test script, it is replaced with the non-responsive address "0.0.0.0:0,0" in the resulting discovery list, so any login attempt to it times out.


Lastly, yes, I could add code to iscsiadm to make "iscsiadm -m node -T <iqn> ... --login --no_wait" work. Would you be willing to test it (in a branch) if I did?
 
Yes, I could modify the test used so far to check this mod as well.

Thanks.

Gorka Eguileor

Sep 22, 2020, 8:15:32 AM
to open-...@googlegroups.com
Hi,

For os-brick we would have to modify the library to use the async login
mechanism, because right now it's serializing iSCSI connections using an
in-process lock.

There are at least two reasons why we are serializing iSCSI
logins/logouts:

- It's easier: We don't have to be careful with race conditions between
attach/detach/cleanup on failed attach on the same targets.

- It's more robust: This is the main reason. I don't remember exactly
when/where it happened, but concurrently creating nodes and logging in
could lead to a program (iscsiadm or iscsid, I don't remember) getting
stuck forever.

It is in my TODO list to improve the connection speed by reducing the
critical section we are locking, but it's not something I'm currently
working on.
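
To illustrate the kind of critical section involved (a minimal sketch, not os-brick's actual code; the function name and subprocess call are just for illustration):

import subprocess
import threading

_login_lock = threading.Lock()

def serialized_login(target, portal):
    # Serialize all logins process-wide so concurrent attach requests
    # never run an iscsiadm login at the same time.
    with _login_lock:
        subprocess.run(
            ["iscsiadm", "-m", "node", "-T", target, "-p", portal, "--login"],
            check=True)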


> ...
> Another point about your scheme: you are setting each node's 'startup' to
> > 'manual', but manual is the default, and since you seem to own the
> > open-iscsi code on this system, you can ensure the default is manual.
> > Perhaps because this is a test?
> >
>
> No, this is our production setup. I don't know why we specify manual, maybe
> this was not the default in 2009 when this code was written, or maybe the
> intent
> was to be explicit about it, in case the default would change?
>

Yes, that's the reason. The os-brick library doesn't know if the system
has customized defaults, so it sets every configuration option that is
necessary for its correct operation explicitly.


> Do you see a problem with explicit node.startup=manual?
>

The only downside I can think of is the time spent setting it.

Cheers,
Gorka.

Gorka Eguileor

Sep 22, 2020, 8:23:24 AM
to open-...@googlegroups.com
On 21/09, Amit Bawer wrote:
>
>
> On Tuesday, September 15, 2020 at 11:21:37 PM UTC+3 The Lee-Man wrote:
>
> > I believe the best way to check for the async session to complete is to
> > look for results. Does a new disc show up? That sort of high-level
> > algorithm will be the best, because it waits for what you actually care
> > about.
> >

Hi,

In the case of os-brick (OpenStack) that would never happen on its own,
because we disable all automatic scans (setting node.session.scan to
manual) on the connections to prevent race conditions between detaching
on the host and mapping a new volume that leads to the removed volume
being discovered again.
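
For a specific node that corresponds to something like:

iscsiadm -m node -T <target> -p <portal> -o update -n node.session.scan -v manual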

So we would need polling to check that the login operation has completed
(with or without success).

Cheers,
Gorka.
