TensorFlow CSV dataset multi-GPU training issue

Ashwini Padhy

Jul 18, 2020, 3:27:17 PM
to TensorFlow Community Testing
Hi,

I followed the blog post on distributed training with multiple GPUs. Training works on a single GPU, but after I added MirroredStrategy for multi-GPU training it throws the error below. I am using a CSV dataset for training.

Environment: TensorFlow 2.2.0, CUDA 10.2, cuDNN 7.6.5. GPU model and memory: Tesla T4, 15109 MB.

I followed the Google blog during development, but the GPUs are not being used. The notebook is on GitHub:

https://github.com/Akpadhy/tensorflow_model/blob/master/GPU_Testing_DL.ipynb

Any idea what could be causing this error?

TypeError: in user code:

    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:1170 _call_model_fn  *
        model_fn_results = self._model_fn(features=features, **kwargs)
    /media/ephemeral0/combined_estimator.py:39 model_fn  *
        if spec.train_op:
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/autograph/operators/control_flow.py:924 if_stmt
        basic_symbol_names, composite_symbol_names)
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/autograph/operators/control_flow.py:962 tf_if_stmt
        error_checking_orelse)
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py:507 new_func
        return func(*args, **kwargs)
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py:1177 cond
        return cond_v2.cond_v2(pred, true_fn, false_fn, name)
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py:91 cond_v2
        op_return_value=pred)
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:981 func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/autograph/operators/control_flow.py:958 error_checking_orelse
        basic_symbol_names + composite_symbol_names)
    /usr/lib/environs/e-a-2019.03-py-3.7.3/lib/python3.7/site-packages/tensorflow/python/autograph/operators/control_flow.py:295 _verify_tf_cond_vars
        ' branches.\n\n{}'.format(name, str(e)))

    TypeError: "train_op_list" does not have the same nested structure in the TRUE and FALSE branches.

    The two structures don't have the same nested structure.

    First structure: type=list str=[<tf.Tensor 'Adam/Identity:0' shape=() dtype=int64>]

    Second structure: type=NoneType str=None

    More specifically: Substructure "type=list str=[<tf.Tensor 'Adam/Identity:0' shape=() dtype=int64>]" is a sequence, while substructure "type=NoneType str=None" is not
    Entire first structure:
    [.]
    Entire second structure:
    .

Regards,
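
A note on reading the traceback: the failing line is `if spec.train_op:` at combined_estimator.py:39. When the condition of an `if` is a tensor, AutoGraph rewrites the statement as a `tf.cond`, and `tf.cond` requires every variable set in the branches to have the same structure on both sides. Here `train_op_list` is a list in the TRUE branch and `None` in the FALSE branch. The code of `combined_estimator.py` is not shown in the post, so the following is only a minimal sketch of one possible workaround under that assumption: start from an empty list and test `is not None`, which is an ordinary Python check. `FakeSpec` and `collect_train_ops` are hypothetical names, and plain strings stand in for the real train ops so the sketch runs without TensorFlow.

```python
from collections import namedtuple

# Hypothetical stand-in for tf.estimator.EstimatorSpec; only the field used here.
FakeSpec = namedtuple("FakeSpec", ["train_op"])


def collect_train_ops(specs):
    """Gather the train ops that are set, without testing an op's truth value."""
    train_op_list = []  # always a list, never None, on every code path
    for spec in specs:
        # `is not None` yields a plain Python bool, so the check does not
        # depend on the value of the op itself (unlike `if spec.train_op:`).
        if spec.train_op is not None:
            train_op_list.append(spec.train_op)
    return train_op_list


# Strings stand in for the real train ops in this sketch.
specs = [FakeSpec(train_op="train_op_a"), FakeSpec(train_op=None)]
print(collect_train_ops(specs))
```

Whether this resolves the multi-GPU error depends on what the actual `model_fn` does with `train_op_list` afterwards, which cannot be determined from the traceback alone.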