TFX Trainer component


Maisa Daoud

Nov 24, 2021, 5:08:37 PM11/24/21
to TensorFlow Extended (TFX)
Hi there,
I'm trying to replicate https://www.tensorflow.org/tfx/tutorials/tfx/gcp/vertex_pipelines_simple#set_up_variables, but with a different module_file (train.py) that imports functions from another file:
```from tools.foo import funcx```. I uploaded train.py and tools/* to the same MODULE_ROOT = 'gs://...'. However, I get errors such as ModuleNotFoundError: No module named 'tools'

Can you please give me a hint on how to solve this issue?

Regards,
Maisa 

Hannes Hapke

Nov 27, 2021, 2:48:35 PM11/27/21
to TensorFlow Extended (TFX), maisa...@servian.com
Hi Maisa, 

TL;DR: The trainer component only loads the module file from the GCS path. To get access to your dependencies, you need to provide them through other means.

You have two options:
1) Create a custom Docker image that extends the default TFX image, contains your dependencies, and sets the container's Python path.
Your custom Dockerfile could look like this:
```
# Pin this to the TFX version your pipeline uses.
FROM tensorflow/tfx:1.4.0

WORKDIR /pipeline
COPY local_path_to_your_dependencies/ ./
ENV PYTHONPATH="/pipeline:${PYTHONPATH}"
```

After you have built the Docker image and pushed it to the Google Container Registry, you can use the custom image in two ways:
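The build-and-push step might look like the following; the image URI (project and tag) is a placeholder you would replace with your own:

```shell
# Build the image from the Dockerfile above and push it to GCR.
# gcr.io/<your-project>/tfx-custom:latest is a placeholder URI.
docker build -t gcr.io/<your-project>/tfx-custom:latest .
docker push gcr.io/<your-project>/tfx-custom:latest
```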

1 A) update the container image for your Vertex AI Trainer:
```

VERTEX_TRAINING_SPEC = {
    "project": GOOGLE_CLOUD_PROJECT,
    "worker_pool_specs": [
        {
            "machine_spec": {
                "machine_type": GOOGLE_CLOUD_MACHINE_TYPE,
            },
            "replica_count": 1,
            "container_spec": {
                "image_uri": tfx_image,
            },
        }
    ],
}

trainer = tfx.extensions.google_cloud_ai_platform.Trainer(
    ...
    custom_config={
        tfx.extensions.google_cloud_ai_platform.ENABLE_UCAIP_KEY: True,
        tfx.extensions.google_cloud_ai_platform.UCAIP_REGION_KEY: YOUR_GOOGLE_CLOUD_REGION,
        tfx.extensions.google_cloud_ai_platform.TRAINING_ARGS_KEY: VERTEX_TRAINING_SPEC,
    },
)
```

1 B) Update the image for all components:
```

# If you use Kubeflow Pipelines:
kubeflow_dag_runner.KubeflowDagRunnerConfig(tfx_image=tfx_image)

# If you use Vertex AI Pipelines (Kubeflow V2 runner):
tfx.v1.orchestration.experimental.KubeflowV2DagRunnerConfig(default_image=tfx_image)
```

2) If your dependency is only one additional Python module file (not a pip package), you could concatenate the dependency code and the trainer code and then load the combined file from GCS. Otherwise, you could dynamically copy the module file to your trainer environment (https://github.com/tensorflow/tensorflow/blob/v2.7.0/tensorflow/python/lib/io/file_io.py#L518-L584) and add its location to the Python path before importing it. I believe `tf.io.gfile.copy` supports remote files, but it can't copy folders. Therefore, this option is more of a hack than a proper implementation.
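A minimal sketch of the copy-and-import variant of option 2. The helper name and staging layout here are my own, not part of TFX; inside the trainer you would pass `tf.io.gfile.copy` as the copy function so that `gs://` source paths work (it copies single files, not folders):

```python
import importlib
import os
import sys
import tempfile


def load_remote_module(remote_path, module_name, copy_fn):
    """Copy a single-file module into a local staging dir and import it.

    copy_fn is any (src, dst) file copier; use tf.io.gfile.copy when
    remote_path is a gs:// URI.
    """
    staging_dir = tempfile.mkdtemp()
    local_path = os.path.join(staging_dir, module_name + ".py")
    copy_fn(remote_path, local_path)
    # Register the staging dir on the Python path before importing.
    sys.path.insert(0, staging_dir)
    return importlib.import_module(module_name)
```

Since only single files can be copied, a package like `tools/` would need one call per module file, which is why this stays a workaround rather than a proper dependency mechanism.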

I hope this helps.

- Hannes  