My worker microservice uses TFJS to predict video frames, running in a container on a cluster of VMs on Google Kubernetes Engine (GKE). I'm using a GPU-enabled container built on top of the `tensorflow/tensorflow:nightly-gpu` image. That image is 2.67 GB, and it takes several minutes to start up after my worker VM is ready. It looks like the NVIDIA CUDA libs are the bulk of that, at 1.78 GB + 624 MB.
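For reference, my Dockerfile looks roughly like this (simplified sketch; the real one pins versions and copies more app code, and `worker.js` / `package.json` stand in for my actual files):

```dockerfile
# Base image is ~2.67 GB, mostly CUDA libraries
FROM tensorflow/tensorflow:nightly-gpu

# The base image is Python-focused, so install Node.js on top of it
RUN apt-get update && apt-get install -y curl \
    && curl -fsSL https://deb.nodesource.com/setup_14.x | bash - \
    && apt-get install -y nodejs \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY package.json package-lock.json ./
# package.json depends on @tensorflow/tfjs-node-gpu
RUN npm ci --production
COPY . .

CMD ["node", "worker.js"]
```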
Can I minimize the CUDA installation in any way, given that I'm only using TFJS for prediction/inference (not training) with the tfjs-node-gpu backend? Are there any smaller base images that will support TFJS prediction?