What is the best way to use a GPU with TFX?


박찬성

Feb 14, 2021, 11:57:00 PM

to TensorFlow Extended (TFX)

As far as I know, there is no way to allocate an individual machine type to each TFX component (for example, ExampleGen on CPU only and Trainer on GPU).

My environment is TFX + Kubeflow on Cloud AI Platform (CAIP). Without "CAIP Training", I guess I would need to configure every Kubernetes node with a GPU, and if that is true, it is going to cost a lot of money!

So I have switched to using "CAIP Training".

It works OK, but there is one problem.

The TFX Docker image does not come with NVIDIA/CUDA support, so I have to set that up myself by building a custom Docker image.

I tried writing a Dockerfile like this:

```
FROM tensorflow/tensorflow:latest-gpu
FROM tensorflow/tfx:latest
ENTRYPOINT ["python3.7", "/tfx-src/tfx/scripts/run_executor.py"]
```

but it didn't work.

Whenever I try to build the Docker image, the logs show the following error message:

> Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0:
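One possible reading of this error: in a Dockerfile with several `FROM` lines, each `FROM` starts a new build stage, and by default only the last stage becomes the final image. If so, everything from `tensorflow/tensorflow:latest-gpu`, including the CUDA libraries, is discarded. Below is a minimal Python sketch that reports which base image the final stage uses. The helper name `final_base_image` is hypothetical and only for illustration.

```python
def final_base_image(dockerfile_text: str) -> str:
    """Return the base image named by the last FROM instruction."""
    base = None
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        # Ignore blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0].upper() == "FROM":
            # Skip optional flags such as --platform=... before the image name.
            args = [p for p in parts[1:] if not p.startswith("--")]
            base = args[0]
    if base is None:
        raise ValueError("no FROM instruction found")
    return base


# The Dockerfile from this post, copied verbatim.
DOCKERFILE = """\
FROM tensorflow/tensorflow:latest-gpu
FROM tensorflow/tfx:latest
ENTRYPOINT ["python3.7", "/tfx-src/tfx/scripts/run_executor.py"]
"""

print(final_base_image(DOCKERFILE))  # prints "tensorflow/tfx:latest"
```

So the resulting image would effectively be plain `tensorflow/tfx:latest`, which matches the missing `libcudart.so.11.0` symptom.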

Could anyone point me to materials on building a custom Docker image for AI Platform Training with TFX?

PS:

Just setting `scaleTier` to `BASIC_GPU` does not work either. I guess this is because the TFX entry point is not specified in the container designated for `BASIC_GPU`.
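For context, here is a sketch of the training arguments I would expect to pass when pointing CAIP Training at a custom container. The field names (`scaleTier`, `masterType`, `masterConfig.imageUri`, `masterConfig.acceleratorConfig`) follow the AI Platform Training `TrainingInput` resource. The project ID, region, image URI, machine type, and GPU type are placeholder assumptions, and the helper `build_training_args` is hypothetical. I have not confirmed that this alone resolves the issue.

```python
def build_training_args(project: str, region: str, image_uri: str,
                        gpu_type: str = "NVIDIA_TESLA_K80",
                        gpu_count: int = 1) -> dict:
    """Build a TrainingInput-style dict for a custom-container GPU job."""
    return {
        "project": project,
        "region": region,
        # CUSTOM lets the job name its own machine type and accelerator
        # instead of relying on a predefined tier such as BASIC_GPU.
        "scaleTier": "CUSTOM",
        "masterType": "n1-standard-4",  # assumed machine type
        "masterConfig": {
            # Custom image that should contain both TFX and the CUDA libraries.
            "imageUri": image_uri,
            "acceleratorConfig": {"count": gpu_count, "type": gpu_type},
        },
    }


# Placeholder values; replace with a real project, region, and image.
training_args = build_training_args(
    project="my-gcp-project",
    region="us-central1",
    image_uri="gcr.io/my-gcp-project/tfx-gpu:latest",
)

# In TFX this dict is typically handed to the Trainer through custom_config
# under the "ai_platform_training_args" key (assumption: the CAIP Trainer
# executor from tfx.extensions.google_cloud_ai_platform is in use).
custom_config = {"ai_platform_training_args": training_args}
print(custom_config["ai_platform_training_args"]["masterConfig"]["imageUri"])
```

With this layout only the training job requests a GPU, so the Kubernetes nodes running the other components can stay CPU-only.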
