Hello,
I am using a GCP instance that is controlled by an external autoscaler machine.
Sometimes, when I create the instance (it's based on a pre-existing image), the Python program that runs on the new instance does not recognize the GPU.
During the initialization phase, I added a reinstallation of the drivers using the script /opt/deeplearning/install-driver.sh.
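Here is a minimal sketch of that initialization step, run on the instance itself. It reinstalls the driver only when no GPU is visible. It assumes that `nvidia-smi -L` listing at least one GPU is a good enough signal that the driver works (a PyTorch program might check `torch.cuda.is_available()` instead). The helper names `gpu_visible` and `ensure_driver` are hypothetical, and the script path is the one printed in the instance's login banner.

```python
import shutil
import subprocess
from typing import List, Optional

# Path printed in the Deep Learning VM login banner.
DRIVER_SCRIPT = "/opt/deeplearning/install-driver.sh"


def gpu_visible() -> bool:
    """Hypothetical check: True if nvidia-smi exists and lists a GPU."""
    if shutil.which("nvidia-smi") is None:
        # No driver utilities on PATH: treat the GPU as not usable.
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one "GPU N: ..." line per device
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.lstrip().startswith("GPU")


def ensure_driver(dry_run: bool = False) -> Optional[List[str]]:
    """Reinstall the driver only if no GPU is visible.

    Returns the command that was (or, with dry_run, would have been) run,
    or None when a GPU is already visible.
    """
    if gpu_visible():
        return None
    cmd = ["sudo", DRIVER_SCRIPT]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(ensure_driver(dry_run=True))
```

Running the check locally on the instance sidesteps the remote SSH session entirely, which may help separate a driver problem from a paramiko/terminal problem.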
The issue is that this doesn't seem to work when I run it remotely from my autoscaler using the paramiko package in Python (the exec_command method, https://docs.paramiko.org/en/stable/api/channel.html). I had to add the get_pty=True argument to make it work, or at least I believe that is what helped.
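Below is a sketch of the remote call with paramiko's `exec_command(command, get_pty=...)`, assuming `client` is an already-connected `paramiko.SSHClient`. The helper names `run_remote` and `reinstall_driver` are hypothetical. The sketch reads all output before calling `recv_exit_status()`, because paramiko's documentation warns that waiting for the exit status first can hang when the remote output fills the channel buffer.

```python
from typing import Optional, Tuple

# Path printed in the Deep Learning VM login banner.
DRIVER_SCRIPT = "/opt/deeplearning/install-driver.sh"


def run_remote(
    client,
    command: str,
    use_pty: bool = True,
    timeout: Optional[float] = None,
) -> Tuple[int, str, str]:
    """Run a command over an already-connected paramiko.SSHClient.

    Returns (exit_status, stdout_text, stderr_text). With use_pty=True the
    remote side writes stderr to the terminal as well, so it usually ends
    up in stdout_text and stderr_text stays empty.
    """
    stdin, stdout, stderr = client.exec_command(
        command, get_pty=use_pty, timeout=timeout
    )
    # Drain output first; calling recv_exit_status() before reading can
    # stall if the remote output fills the channel buffer.
    out = stdout.read().decode("utf-8", errors="replace")
    err = stderr.read().decode("utf-8", errors="replace")
    status = stdout.channel.recv_exit_status()
    return status, out, err


def reinstall_driver(client, use_pty: bool = True) -> int:
    """Hypothetical helper: run the driver script with sudo and report."""
    status, out, err = run_remote(client, f"sudo {DRIVER_SCRIPT}", use_pty=use_pty)
    if status != 0:
        print(f"driver install failed with exit status {status}")
        print(out)
        print(err)
    return status
```

To see whether the terminal is what matters, one option is to call `reinstall_driver(client, use_pty=False)` and `reinstall_driver(client, use_pty=True)` on a fresh instance and compare the exit status and the error text. Without a PTY, stderr is kept separate, so the failure message may be more visible. A sudoers `requiretty` setting or a command inside the script that expects a terminal are plausible causes, but that is a guess, not something confirmed about this script.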
Does the installation script require an SSH session with a terminal?
Thanks
======================================
Welcome to the Google Deep Learning VM
======================================
Version: pytorch-gpu.1-11.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-21-cloud-amd64 x86_64\n)
To reinstall Nvidia driver (if needed) run:
sudo /opt/deeplearning/install-driver.sh
Linux test-3d-61bdc7b0-3fac-4bd0-9837-9f19f0046760 4.19.0-21-cloud-amd64 #1 SMP Debian 4.19.249-2 (2022-06-30) x86_64