What is the best way to use GPU with TFX?


박찬성

Feb 14, 2021, 11:57:00 PM
to TensorFlow Extended (TFX)
As far as I know, there is no way to assign a separate machine type to each TFX component (e.g. ExampleGen on CPU only, Trainer on GPU, ...).

My environment is TFX + Kubeflow on Cloud AI Platform (CAIP). Without "CAIP Training", I guess I would need to configure every Kubernetes node with a GPU. If that is true, it is going to cost a lot of money!

So I have switched to using "CAIP Training".

It works OK, but there is one problem.
The TFX Docker image does not come with NVIDIA/CUDA support, so I have to set that up myself by creating a custom Docker image.

I tried writing a Dockerfile like this:

```
FROM tensorflow/tensorflow:latest-gpu

FROM tensorflow/tfx:latest

ENTRYPOINT ["python3.7", "/tfx-src/tfx/scripts/run_executor.py"]
```
but it didn't work.
Whenever I try to build the Docker image, the logs show the error message below.
> Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0:

Could anyone please point me to materials on building a custom Docker image for AI Platform Training with TFX?

PS:
Just setting `scaleTier` to `BASIC_GPU` does not work either. I guess this is because the TFX entrypoint is not specified in the container designated for BASIC_GPU.
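
For context, the alternative to `BASIC_GPU` that I am considering is a `CUSTOM` scale tier that names the container image explicitly. Below is a minimal sketch of such training args as a plain Python dict, not a confirmed fix. The field names (`scaleTier`, `masterType`, `masterConfig`, `imageUri`, `acceleratorConfig`) follow the AI Platform `TrainingInput` schema. The project ID, region, image URI, machine type, and GPU type are hypothetical placeholders, and the `ai_platform_training_args` key under which the dict is passed to the Trainer is an assumption about the TFX AI Platform extension.

```python
# Hypothetical placeholders: replace with real values for your project.
PROJECT_ID = "my-gcp-project"
CUSTOM_IMAGE = "gcr.io/my-gcp-project/tfx-gpu:latest"


def build_training_args(project, image_uri, region="us-central1"):
    """Return a TrainingInput-style dict requesting one GPU on the master.

    Only the master replica gets a GPU here; other pipeline components
    would keep running on the (CPU-only) Kubeflow nodes.
    """
    return {
        "project": project,
        "region": region,
        # CUSTOM lets us choose the machine type and container ourselves,
        # instead of relying on the predefined BASIC_GPU tier.
        "scaleTier": "CUSTOM",
        "masterType": "n1-standard-4",
        "masterConfig": {
            # Custom image that is assumed to contain both CUDA and TFX.
            "imageUri": image_uri,
            "acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_K80"},
        },
    }


ai_platform_training_args = build_training_args(PROJECT_ID, CUSTOM_IMAGE)

# Assumed key name for handing the args to the Trainer's custom config.
custom_config = {"ai_platform_training_args": ai_platform_training_args}
```

With this shape, the GPU request and the custom image live in one place, so the question reduces to what has to be inside that image for TFX to run on it.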
