--
You received this message because you are subscribed to the Google Groups "Google Research Football" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-research-fo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-research-football/30e8e730-0b2b-4cb6-8762-2f392ce0bb48n%40googlegroups.com.
Hi Piotr,

Yet another ever so small question :-)

Do you think it would be possible to use a Kaggle notebook TPU for training? The default GPU-based setup/notebook blows up pretty quickly on, say, 1M frames. A TPU would surely allow training on more frames.

I tried modifying utils.py so that the scripts start using the TPU, but it fails further down the road (I guess some other important infrastructure pieces are missing):

I1013 13:43:37.511072 140532635748160 learner.py:304] Restoring checkpoint: /kaggle_simulations/agent/ckpt-1
2020-10-13 13:43:37.517063: W tensorflow/core/distributed_runtime/eager/remote_tensor_handle_data.cc:76] Unable to destroy remote tensor handles. If you are running a tf.function, it usually indicates some op in the graph gets an error: File system scheme '[local]' not implemented (file: '/kaggle_simulations/agent/') Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1486, in restore_v2
2020-10-13 13:43:37.520486: W ./tensorflow/core/distributed_runtime/eager/destroy_tensor_handle_node.h:58] Ignoring an error encountered when deleting remote tensors handles: Invalid argument: Unable to find the relevant tensor remote_handle: Op ID: 8317, Output num: 0
Additional GRPC error information: {"created":"@1602596617.520428187","description":"Error received from peer ipv4:10.0.0.2:8470","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Unable to find the relevant tensor remote_handle: Op ID: 8317, Output num: 0","grpc_status":3}
    dtypes)
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

--

On Tuesday, October 13, 2020 at 9:33:29 AM UTC+2 stanczyk wrote:

At the time of writing the paper, AI Platform didn't allow running experiments with TPU + multiple workers running the environment. That is why we decided to open-source GPU training instead.

TPU training could be done directly using VMs / Kubernetes, but for code simplicity we decided to provide an AI Platform example.

On Tue, Oct 13, 2020 at 9:07 AM Ranko Mosic <ranko...@gmail.com> wrote:

Hi Piotr,

I am actually interested in training SEED RL on TPUs.

I went for distributed training using AI Platform, with train_football_checkpoints.sh for example. It seems to be using a GPU (it refers to a P100).

The paper says TPUs were used for training. I wonder if there are scripts available that use TPU?

Greetings!!

On Monday, October 12, 2020 at 3:35:07 PM UTC+2 stanczyk wrote:

Looks like you have some incompatible versions of system binaries, which results in an undefined symbol. I suggest trying with a Docker image. Please have a look here: https://github.com/google-research/seed_rl/#local-machine-training-on-a-single-level

--
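A note on the "File system scheme '[local]' not implemented" line in the log above: this message typically appears when a checkpoint op is placed on the remote TPU worker, which cannot see the notebook's local disk. The thread does not confirm that this is the only missing piece, so the following is only a minimal sketch of one possible workaround, assuming TensorFlow 2.3 or newer and a tf.train.Checkpoint object. The helper names is_local_path and restore_checkpoint and the bucket name in the usage note are hypothetical and do not come from seed_rl.

```python
from urllib.parse import urlparse


def is_local_path(path: str) -> bool:
    """Return True when `path` has no remote scheme such as gs://."""
    return urlparse(path).scheme in ("", "file")


def restore_checkpoint(checkpoint, path: str):
    """Restore a tf.train.Checkpoint, keeping local file I/O on the host.

    Assumption: `checkpoint` is a tf.train.Checkpoint and the TPU runs on a
    remote worker that cannot read the notebook's local filesystem.
    """
    import tensorflow as tf  # imported lazily; needs TF 2.3 or newer

    if is_local_path(path):
        # Ask TensorFlow to run the restore ops on the local host instead of
        # the remote TPU worker.
        options = tf.train.CheckpointOptions(
            experimental_io_device="/job:localhost")
        return checkpoint.restore(path, options=options)
    # Remote paths (for example a GCS bucket) are reachable from the worker.
    return checkpoint.restore(path)
```

For example, is_local_path("/kaggle_simulations/agent/ckpt-1") is true, so the restore would be pinned to the host, while a path such as gs://some-bucket/ckpt-1 (a made-up bucket) would be restored without extra options. Writing checkpoints to a GCS bucket instead of local disk is the other commonly used route.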
To view this discussion on the web visit https://groups.google.com/d/msgid/google-research-football/cd3c543e-1a66-4da0-91c9-b3f12e46a297n%40googlegroups.com.