Hi Jeremy,
Well, I should use the occasion to answer this one as well. The
reason I stalled on that answer, though, is that the whole connection
caching is somewhat of a hack, and should be considered black magic.
We are not proud of it. It is a corollary of the saga.utils.pty
layer, which in itself is something that I am sure will haunt me
further in the future, and which is in desperate need of a conceptual
overhaul -- if only we could convince ourselves to invest the time to
do so...
With this disclaimer (which I can't make any stronger, I guess), see below.
On Fri, Mar 4, 2016 at 11:07 AM, Jeremy Cohen
<jeremy...@imperial.ac.uk> wrote:
> I'm submitting jobs to a remote cluster via SAGA-Python but also want to
> undertake some file operations while the jobs are running. These operations
> take place fairly frequently while jobs are running and I'd like to keep the
> number of SSH connections to a minimum and also avoid constantly creating
> and closing SSH connections for each file operation.
Full ack on the use case.
> I know that there have been various previous discussions on sharing SSH
> connections but I wondered if someone could provide some more detailed
> information about how SAGA-Python actually works with SSH connections? I saw
> information in an earlier thread
> (https://groups.google.com/d/msg/saga-users/_Ldjbo6dElg/xPguk-f4dhsJ)
> stating that using a single Service instance should result in connections
> being re-used but new channels are created for data transfers. However, I
> couldn't find any more detailed description on how SAGA-Python handles the
> multiple connections that it creates.
That information is somewhat outdated, I'm afraid...
>
> For example, I create a new SSH context and associated Session:
>
> import saga
> ctx = saga.Context("ssh")
> ctx.user_id = 'myuser'
> ctx.user_key = '/path/to/my/key'
> s = saga.Session(default=False)
> s.add_context(ctx)
>
> Now I create a service instance pointing to a remote server:
>
> svc = saga.job.Service('ssh://myserver.remote/',session=s)
>
> I see three SSH connections created to the remote node. If I then create a
> Directory instance:
>
> dir1 = saga.filesystem.Directory('sftp://myserver.remote/tmp/', session=s)
>
> ...I see a fourth SSH connection created…
>
> On creating a further two directory instances, similar to the above, no
> further SSH connections are initiated. When I then call close() on each of
> the directory instances, the fourth SSH connection seems to remain.
We basically manage a pool of connections via a radical.utils.lease_manager:
https://github.com/radical-cybertools/saga-python/blob/devel/src/saga/session.py#L121
https://github.com/radical-cybertools/radical.utils/blob/devel/src/radical/utils/lease_manager.py
On the adaptor layer, when we need a new shell connection, we ask the
lease manager for one. Example:
https://github.com/radical-cybertools/saga-python/blob/devel/src/saga/adaptors/shell/shell_file.py#L249
The lease manager (LM) will check if a shell for that target host
exists and is free to use (i.e., is not used by any other adaptor).
If that is the case, the shell is locked for use by the adaptor until
the lease is returned. If no shell exists, or none is free, AND if
the max pool size is not reached, the LM will instantiate a new
connection on the fly, add it to the pool, and hand out a lease.
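To make the mechanics concrete, here is a toy sketch of that pooling
pattern. This is NOT the radical.utils implementation (which, among
other things, keys its pools by target host) -- just an illustration
of what lease() and release() do:

  import threading

  class LeasePool(object):
      # toy sketch of the lease manager pattern described above --
      # not the radical.utils code
      def __init__(self, creator, max_size=10):
          self._creator  = creator   # factory which opens a new shell
          self._free     = list()    # idle shells, ready for lease
          self._used     = list()    # shells currently leased out
          self._max_size = max_size
          self._lock     = threading.Lock()

      def lease(self):
          with self._lock:
              if self._free:
                  shell = self._free.pop()          # reuse an idle shell
              elif len(self._used) < self._max_size:
                  shell = self._creator()           # grow the pool
              else:
                  raise RuntimeError('pool exhausted')
              self._used.append(shell)
              return shell

      def release(self, shell):
          with self._lock:
              self._used.remove(shell)  # the shell is NOT closed here --
              self._free.append(shell)  # it just becomes available again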
There is always a master channel alive per resource (although that
is, in the current configuration, not strictly necessary), so that
adds one connection to the total number of channels. That master is
not in the pool, and cannot be leased.
Now, that mechanism is not used in all places. Specifically, we
skipped places where we (lazily) assumed that the lease would be held
for a long time, or used just once, etc., and that it would not be
worthwhile to use the LM. In order to reduce the number of created
channels for your case, we would need to check where we are not using
the LM yet, and change that. That often implies some (small)
structural code changes, to make sure that:
- the time of the channel lease is short (and finite!)
- there is no assumption about the state of the channel
As to the latter point: the channel represents a remote shell, which
has a PWD, env settings, etc. While much of that shell state is
abstracted away by the PTY layer, not everything is. Specifically,
the instance using a leased shell needs to make sure that the PWD
points to the expected location before relying on it.
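In terms of the toy pool above, a lease consumer would look roughly
like this (run_sync here just stands in for however the PTY shell
layer runs a command synchronously; error handling elided):

  shell = pool.lease()
  try:
      # the previous lease holder may have left the shell in any
      # directory, so re-establish the state we depend on first
      shell.run_sync('cd /tmp/my_workdir')
      shell.run_sync('cp data.in data.out')
  finally:
      pool.release(shell)   # keep the lease short and finite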
Hmm, I hope the above clarifies somewhat what is going on under the
hood. Specifically, it should explain the behavior of the Directory
instances: they create new channels in the pool, which are not freed
after close(), but wait there for reuse by other instances.
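The practical upshot for your case: what you are doing is already the
right thing -- share a single Session, so that all Service and
Directory instances draw from the same per-host pool:

  import saga

  ctx          = saga.Context("ssh")
  ctx.user_id  = 'myuser'
  ctx.user_key = '/path/to/my/key'

  s = saga.Session(default=False)
  s.add_context(ctx)

  # all of the below share s, and thus one connection pool per host
  svc  = saga.job.Service('ssh://myserver.remote/', session=s)
  dir1 = saga.filesystem.Directory('sftp://myserver.remote/tmp/', session=s)
  dir2 = saga.filesystem.Directory('sftp://myserver.remote/data/', session=s)

  dir1.close()   # the channel goes back to the pool; the underlying
  dir2.close()   # SSH connection stays alive for reuse by later instances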
So, let's see what you make of it :)
Best, Andre.
> When I created a saga.job.Service instance pointing to ssh://localhost/, I
> see four SSH connections initiated and an SFTP connection too.
>
> Any more detailed explanation of the way the multiple connections are used
> and how one might go about making most efficient use of them would be great.
>
> Many thanks,
>
> Jeremy
>
--
99 little bugs in the code.
99 little bugs in the code.
Take one down, patch it around.
127 little bugs in the code...