Intermittent failure to read output when using SSHClientProcess.wait

6 views
Skip to first unread message

Ethan Blackwood

unread,
Jul 14, 2024, 11:32:25 PM7/14/24
to asyncssh-users
Hi,

I posted this as an issue but realized that might be the wrong forum, so copying it here:

I'm having an issue that may just be user error, as I'm new to asyncio - hoping I can get some guidance.

I'm trying to start an SSH process and then wrap a coroutine that waits for it to complete in a Task. My problem is that >50% of the time, I am reading an empty string instead of the output I expect from the `SSHCompletedProcess` object. This is a command that 100% always returns a line of output when run on the host (and I do get the output I expect about 30-40% of the time).

Here is my waiting coroutine:
```python
async def wait_for_completion(conn: asyncssh.SSHClientConnection, proc: asyncssh.SSHClientProcess, timeout: Optional[float] = None,
                              check=False, read_output=False, expect_output=False) -> asyncssh.SSHCompletedProcess:
    """
    Coroutine that monitors a process for having exited and closes it if it runs beyond timeout seconds.
    If read_output is True, stdout will be consumed and printed to the terminal. Then it cannot be read from the returned object.
    """
    start_time = time.time()
    try:
        result = None
        while True:
            if read_output:  # this is always false, will probably remove
                print(proc.collect_output())
            try:
                result = await proc.wait(check=check, timeout=0.2)
                while not proc.stdout.at_eof():
                    await asyncio.sleep(0)
                break
            except asyncssh.TimeoutError:
                if timeout is None or time.time() - start_time < timeout:
                    continue
                else:
                    raise

        # I added this as a desperate attempt, but it didn't help
        # hack to maybe fix race condition?
        assert result is not None, 'Only way to exit is for result to be assigned'
        if expect_output and not read_output:
            wait_time = 5
            start_loop = time.time()
            while not result.stdout and time.time() - start_loop < wait_time:
                await asyncio.sleep(0.2)
            if not result.stdout:
                # print('SSH command: ', result.command)
                print('Result: ', result)
                print()
                raise RuntimeError(f'Output was not assigned in {wait_time} seconds')
        return result
    finally:
        if read_output:
            print(proc.collect_output())
        proc.close()
        conn.close()
```

Here is how I'm calling it:
```python
# in start_command_on_host (note read_output/print_output is always false, will probably remove)
conn = await asyncssh.connect(host.name, username=host.user if host.user is not None else ())
proc = await conn.create_process(command, term_type='xterm-color', stderr=asyncssh.STDOUT)
return asyncio.create_task(wait_for_completion(conn, proc, timeout=timeout, check=check,
                                               read_output=print_output, expect_output=expect_output))
...
# in another coroutine
task = await start_command_on_host(command, host=host, print_output=False, timeout=timeout, check=True,
                                       use_srun=use_slurm, sbatch_opts=slurm_args, expect_output=expect_nonempty)
res = await task
output = get_string_output(res)
```

And this is the output I'm getting when it fails:
```
Result:  env: {}, command: bash -ic 'conda run -n mescore python -c '"'"'import cmcode; print(next(iter(cmcode.__path__)))'"'"'', subsystem: None, exit_status: 0, exit_signal: None, returncode: 0, stdout: , stderr:

Traceback (most recent call last):
  ... clipped ...
  File "\\proektdata\bigdata\eblackwood\2p_analysis\cmcode\caimanlab.py", line 29, in get_path_from_host
    output = await remoteops.get_output_from_command(command, host, timeout=timeout, expect_nonempty=True)
  File "\\proektdata\bigdata\eblackwood\2p_analysis\cmcode\remoteops.py", line 286, in get_output_from_command
    res = await task
  File "\\proektdata\bigdata\eblackwood\2p_analysis\cmcode\remoteops.py", line 202, in wait_for_completion
    raise RuntimeError(f'Output was not assigned in {wait_time} seconds')
RuntimeError: Output was not assigned in 5 seconds
```

Expected output (when running on remote host):
```
(base) ethan@gpu01:~$ bash -ic "conda run -n mescore python -c 'import cmcode; print(next(iter(cmcode.__path__)))'"
/mnt/bigdata/eblackwood/2p_analysis/cmcode

(base) ethan@gpu01:~$ bash -ic "conda run -n mescore python -c 'import cmcode; print(next(iter(cmcode.__path__)))'"
/mnt/bigdata/eblackwood/2p_analysis/cmcode

(base) ethan@gpu01:~$ bash -ic "conda run -n mescore python -c 'import cmcode; print(next(iter(cmcode.__path__)))'"
/mnt/bigdata/eblackwood/2p_analysis/cmcode

etc.
```

Anything I'm doing clearly wrong here, or could there be a bug?
Reply all
Reply to author
Forward
0 new messages