Benchmarking TensorFlow Code Fragments


Jacob Stevens

Mar 30, 2021, 11:48:11 AM
to TensorFlow End Users - GETTING STARTED, TUTORIALS & HOW-TO'S
Hi, 
  I would like to benchmark several different layers in TensorFlow. In PyTorch this can be done with CUDA events, which ensure synchronization. Is there a TensorFlow equivalent? I am currently measuring with the naive time.time() approach in TensorFlow, but the results look wrong. For example, when benchmarking a Conv2D layer with different batch sizes, I get roughly equal times for batch sizes {1, 64, 128, 256}. When benchmarking the same layer in PyTorch with CUDA events, however, the runtime increases with the batch size, which matches my intuition (runtime increases but throughput improves).

 What is the best lightweight way to time snippets of TensorFlow GPU code programmatically? Running the TF profiler and then inspecting the output in the browser is not ideal.
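
For reference, here is a minimal sketch of the kind of harness I have in mind. It is not a confirmed TensorFlow recommendation. It assumes the roughly equal timings come from GPU work being queued asynchronously, so the clock stops before the kernels finish, and that reading a value back to the host with .numpy() is enough to force completion. The names benchmark and conv2d_timings, the layer configuration, and the input shape are all hypothetical and chosen only for illustration.

```python
import statistics
import time


def benchmark(step, sync, warmup=3, iters=10):
    """Return the median wall-clock seconds per call of step().

    sync(result) must block until the work behind `result` has finished;
    otherwise an asynchronous backend may only be timed for the launch.
    """
    for _ in range(warmup):
        # Untimed runs absorb one-off costs (allocation, autotuning, tracing).
        sync(step())
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        sync(step())  # wait for completion before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def conv2d_timings(batch_sizes=(1, 64, 128, 256)):
    """Hypothetical usage: time one Conv2D layer for several batch sizes."""
    import tensorflow as tf  # imported lazily so benchmark() has no TF dependency

    layer = tf.keras.layers.Conv2D(64, 3, padding="same")  # illustrative config
    results = {}
    for n in batch_sizes:
        x = tf.random.normal([n, 56, 56, 64])  # illustrative NHWC input shape
        # Assumption: pulling a scalar back to the host blocks until the
        # convolution has run; reduce_sum keeps the host copy tiny.
        results[n] = benchmark(
            lambda x=x: layer(x),
            sync=lambda y: tf.reduce_sum(y).numpy(),
        )
    return results
```

Calling conv2d_timings() would return a dict mapping batch size to median seconds. The extra reduction kernel and the scalar copy are included in each sample, so this is an upper bound on the layer time rather than an exact equivalent of CUDA event timing.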