"Azure Batch provides job scheduling and cluster management, allowing applications or algorithms to run in parallel at scale.
There’s no charge for Batch itself, only the underlying compute and other resources consumed to run your batch jobs, including applicable software license costs. For compute, Cloud Services, Linux Virtual Machines, or Windows Virtual Machines can be utilized by Batch. The standard rates for compute apply and can be viewed below and software licensing costs for batch graphics and rendering are available below. In addition, Batch allows low-priority virtual machines (VMs) to be used. Reserved Virtual Machine Instances are available when using the Azure Batch Service in User subscription pool application mode."
Full details are in the "What is Azure Batch" documentation:
https://docs.microsoft.com/en-us/azure/batch/batch-technical-overview
https://docs.microsoft.com/en-us/azure/batch/quick-create-portal
https://docs.microsoft.com/en-us/azure/batch/batch-low-pri-vms
I know Azure Batch from work but hadn't considered using it here.
It may require a minimum size or similar that ultimately doesn't save money, but it's worth checking out.
I'd also like to mention that I was told IBM offers a cloud service as well; it has a free/lite version that, from what I gather, essentially gives you ~25 hours of K80 compute time per month for free. Fairly modest, but potentially worth setting up.
/bin/bash -c 'PKG_OK=$(dpkg-query -W --showformat='${Status}\n' glances|grep "install ok installed")
echo Checking for glanceslib: $PKG_OK
if [ "" == "$PKG_OK" ]; then
echo "No glanceslib. Setting up glanceslib and all other leela-zero packages."
sudo -i && uname -a && sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && sudo apt-get -y install nvidia-driver-410 linux-headers-generic nvidia-opencl-dev && sudo apt-get -y install clinfo cmake git libboost-all-dev libopenblas-dev zlib1g-dev build-essential qtbase5-dev qttools5-dev qttools5-dev-tools libboost-dev libboost-program-options-dev opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev qt5-default qt5-qmake curl && git clone https://github.com/gcp/leela-zero && cd leela-zero && git submodule update --init --recursive && mkdir build && cd build && cmake .. && cmake --build . && cd ../autogtp && cp ../build/autogtp/autogtp . && cp ../build/leelaz . && clinfo && sudo apt-get -y install glances && sudo reboot
else
sudo -i && cd /leela-zero/autogtp && ./autogtp
fi'
Regardless of this, I think setting "wait for success" to "true" and then creating a scheduled job may do the trick, with a script like this:
/bin/bash -c 'sudo -i && uname -a && sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && sudo apt-get -y install nvidia-driver-410 linux-headers-generic nvidia-opencl-dev && sudo apt-get -y install clinfo cmake git libboost-all-dev libopenblas-dev zlib1g-dev build-essential qtbase5-dev qttools5-dev qttools5-dev-tools libboost-dev libboost-program-options-dev opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev qt5-default qt5-qmake curl && git clone https://github.com/gcp/leela-zero && cd leela-zero && git submodule update --init --recursive && mkdir build && cd build && cmake .. && cmake --build . && cd ../autogtp && cp ../build/autogtp/autogtp . && cp ../build/leelaz . && clinfo'
Then, in Azure Batch -> Jobs (or scheduled jobs), set up a job.
I will try this next time.
/bin/bash -c 'PKG_OK=$(dpkg-query -W --showformat='${Status}\n' glances|grep "install ok installed")
echo Checking for glanceslib: $PKG_OK
if [ "" == "$PKG_OK" ]; then
echo "No glanceslib. Setting up glanceslib and all other leela-zero packages."
sudo -i && uname -a && sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && sudo apt-get -y install nvidia-driver-410 linux-headers-generic nvidia-opencl-dev && sudo apt-get -y install clinfo cmake git libboost-all-dev libopenblas-dev zlib1g-dev build-essential qtbase5-dev qttools5-dev qttools5-dev-tools libboost-dev libboost-program-options-dev opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev qt5-default qt5-qmake curl && sudo apt-get -y install glances zip && clinfo && pwd && sudo reboot
else
sudo -i && uname -a && clinfo && git clone https://github.com/gcp/leela-zero && cd leela-zero && git submodule update --init --recursive && mkdir build && cd build && cmake .. && cmake --build . && cd ../autogtp && cp ../build/autogtp/autogtp . && cp ../build/leelaz . && ./autogtp -g 2
fi'
What it does:
- On the first boot (glances is not installed): install the NVIDIA driver and glances, then reboot.
- On the second and later boots (glances is installed): build and run the autogtp program.
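The two-stage logic described above could also be keyed on a marker file instead of a package query. This is a minimal sketch of that swapped-in technique, not what the script above does. The marker path, the MARKER_DIR variable and the boot_stage function are hypothetical names, and the echo lines only stand in for the real install and autogtp commands.

```shell
# Hypothetical marker location; any path that survives a reboot would do.
MARKER_DIR="${MARKER_DIR:-/var/tmp}"
MARKER="$MARKER_DIR/leela-setup-done"

# Prints "setup" if the marker file is missing (first boot),
# and "run" if it already exists (second and later boots).
boot_stage() {
    if [ -e "$1" ]; then
        echo run
    else
        echo setup
    fi
}

case "$(boot_stage "$MARKER")" in
    setup)
        # Driver and package installation would go here, followed by
        #   touch "$MARKER" && sudo reboot
        echo "first boot: install drivers, write marker, reboot"
        ;;
    run)
        # Building and starting autogtp would go here.
        echo "later boot: build and run autogtp"
        ;;
esac
```

A marker file keeps the check independent of how a given image reports package state, at the cost of one extra file to manage.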
Question 1: on Google Cloud this worked flawlessly, but on Microsoft Azure my condition is always false after the first reboot (glances is not detected as installed), so the first branch always reruns and the node reboots in an endless loop. Any idea why it doesn't work the way it does on Google Cloud?
Question 2: does the startup script have a maximum run time, or can it run indefinitely (until the node is preempted)?
Question 3: when a node is preempted, I didn't find an option to delete it automatically at preemption, as Google Cloud does (I would then want the Batch account to scale up automatically and create a new node).
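On question 1, one thing that may be worth ruling out (a guess, not a confirmed diagnosis): the script wraps everything in single quotes and then uses single quotes again around ${Status}\n. Under POSIX shell quoting rules the inner quotes close the outer string, so the calling shell expands ${Status} itself and drops the backslash. Whether Azure Batch parses its command line this way is an assumption here. The sketch below reproduces only the quoting shape with printf, so it does not need dpkg. Escaping the dollar sign inside double quotes is one way to keep the format string intact.

```shell
# Status is empty in the calling shell, as it presumably is for a start task.
Status=''

# Same shape as the script: single quotes nested inside single quotes.
# The calling shell sees three pieces: '...--showformat=' + ${Status}\n + ''
# so the inner bash receives --showformat=n instead of the format string.
broken=$(bash -c 'printf "%s" --showformat='${Status}\n'')
printf 'nested single quotes:  %s\n' "$broken"

# Double quotes inside, with the dollar sign escaped so that the inner bash
# passes the literal text ${Status}\n through.
fixed=$(bash -c 'printf "%s" --showformat="\${Status}\n"')
printf 'escaped double quotes: %s\n' "$fixed"
```

If the format string is mangled this way, dpkg-query never prints "install ok installed", the grep finds nothing, and the first branch runs on every boot.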
and :
job auto restarts