Hello, everyone. I've been working on a workchain that uses AiiDA and aiida-vasp. My calculations appear to be submitted correctly: they reach the cluster and show up in the queue manager (SGE). However, they eventually fail between aiida-vasp iterations with this error:
*** 3 LOG MESSAGES:
+-> ERROR at 2023-02-26 18:04:52.151421+01:00
| Traceback (most recent call last):
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/paramiko/channel.py", line 699, in recv
| out = self.in_buffer.read(nbytes, self.timeout)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/paramiko/buffered_pipe.py", line 164, in read
| raise PipeTimeout()
| paramiko.buffered_pipe.PipeTimeout
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/utils.py", line 186, in exponential_backoff_retry
| result = await coro()
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 193, in do_update
| job_info = await cancellable.with_interrupt(update_request)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/utils.py", line 93, in with_interrupt
| result = await next(wait_iter)
| File "/usr/lib/python3.8/asyncio/tasks.py", line 619, in _wait_for_one
| return f.result() # May raise f.exception().
| File "/usr/lib/python3.8/asyncio/futures.py", line 178, in result
| raise self._exception
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/utils.py", line 186, in exponential_backoff_retry
| result = await coro()
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 193, in do_update
| job_info = await cancellable.with_interrupt(update_request)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/utils.py", line 93, in with_interrupt
| result = await next(wait_iter)
| File "/usr/lib/python3.8/asyncio/tasks.py", line 619, in _wait_for_one
| return f.result() # May raise f.exception().
| File "/usr/lib/python3.8/asyncio/futures.py", line 178, in result
| raise self._exception
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/utils.py", line 186, in exponential_backoff_retry
| result = await coro()
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 193, in do_update
| job_info = await cancellable.with_interrupt(update_request)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/utils.py", line 93, in with_interrupt
| result = await next(wait_iter)
| File "/usr/lib/python3.8/asyncio/tasks.py", line 619, in _wait_for_one
| return f.result() # May raise f.exception().
| File "/usr/lib/python3.8/asyncio/futures.py", line 178, in result
| raise self._exception
| File "/usr/lib/python3.8/asyncio/tasks.py", line 280, in __step
| result = coro.send(None)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/manager.py", line 180, in updating
| await self._update_job_info()
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/manager.py", line 132, in _update_job_info
| self._jobs_cache = await self._get_jobs_from_scheduler()
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/manager.py", line 109, in _get_jobs_from_scheduler
| scheduler_response = scheduler.get_jobs(**kwargs)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/schedulers/scheduler.py", line 326, in get_jobs
| retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/transports/transport.py", line 443, in exec_command_wait
| retval, stdout_bytes, stderr_bytes = self.exec_command_wait_bytes(command=command, stdin=stdin, **kwargs)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/aiida/transports/plugins/ssh.py", line 1470, in exec_command_wait_bytes
| stdout_bytes.append(stdout.read())
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/paramiko/file.py", line 200, in read
| new_data = self._read(self._DEFAULT_BUFSIZE)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/paramiko/channel.py", line 1361, in _read
| return self.channel.recv(size)
| File "/home/user/.virtualenvs/aiida/lib/python3.8/site-packages/paramiko/channel.py", line 701, in recv
| raise socket.timeout()
| socket.timeout
+-> WARNING at 2023-02-26 19:34:11.153802+01:00
| Parsing total_energies from <aiida_vasp.parsers.content_parsers.vasprun.VasprunParser object at 0x7f1c21203d30> failed, exception: None
+-> WARNING at 2023-02-26 19:34:11.244651+01:00
| output parser returned exit code<700>: Calculation did not reach the end of execution.
The error "paramiko.buffered_pipe.PipeTimeout" is raised, and it surfaces as "socket.timeout" while AiiDA is querying the scheduler for the job list. Since paramiko is involved, I assume this is some kind of SSH-related problem. I have tried increasing the timeout in the computer setup, but to no avail, and I don't know of any other possible fixes. What could be causing this, and how could it be fixed?
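For what it's worth, this is how I read the last frames of the traceback: a read waits for command output with a timeout, and "socket.timeout" is raised when nothing arrives in time. Below is a minimal sketch of that behaviour using only the Python standard library. It is not AiiDA or paramiko code; the function name and the 0.2 s value are made up for illustration.

```python
import socket


def read_with_timeout(timeout_s: float) -> str:
    """Try to read from a peer that never sends anything (hypothetical helper)."""
    # socketpair() gives two connected local sockets; no network or SSH is involved.
    reader, silent_peer = socket.socketpair()
    try:
        # Plays the role of the timeout on the SSH channel (illustrative value).
        reader.settimeout(timeout_s)
        try:
            # Blocks until data arrives or the timeout elapses.
            reader.recv(1024)
            return "received"
        except socket.timeout:
            # Same exception type as the last line of the traceback above.
            return "timed out"
    finally:
        reader.close()
        silent_peer.close()


result = read_with_timeout(0.2)
print(result)
```

If this picture is right, the remote command that lists the jobs is simply not returning any output before the timeout expires, but I may well be misreading it.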
Many thanks,
Pol