I recorded a full trace of a session run with the following code:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
_, step, summary = sess.run([train_step, global_step, summary_op],
                            options=run_options,
                            run_metadata=run_metadata)
writer.add_run_metadata(run_metadata, 'step%d' % step)
writer.add_summary(summary, step)
When I view the session run graph in TensorBoard, some nodes are marked as "unused substructure", and no stats (such as time consumption) are available for them. From the code logic, these nodes should have been executed, and since the trace level is FULL_TRACE, I would expect every op to be traced. So why are some nodes marked as "unused substructure"? Were these ops removed due to inlining?
Another question: I noticed that most of the time is consumed by nodes labeled "Excluded by default since this is a CPU thread setting up GPU kernels" (they take around 70% of the embedding_lookup node's time, about 0.2 s), but they are not shown in the graph. Is this amount of time reasonable?
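
To cross-check what TensorBoard displays, one option might be to read the per-node timings straight out of `run_metadata.step_stats`. The following is only a minimal sketch: the helper name `node_time_by_device` is hypothetical, and the `SimpleNamespace` objects are made-up stand-ins that mimic the `dev_stats`, `device`, `node_stats`, `node_name` and `all_end_rel_micros` fields of the StepStats proto, so the snippet runs without TensorFlow. The numbers in it are illustrative, not measurements from my run.

```python
from collections import defaultdict
from types import SimpleNamespace


def node_time_by_device(step_stats):
    """Sum all_end_rel_micros per (device, node_name) in a StepStats-like object."""
    totals = defaultdict(int)
    for dev in step_stats.dev_stats:
        for node in dev.node_stats:
            totals[(dev.device, node.node_name)] += node.all_end_rel_micros
    return dict(totals)


# Hypothetical stand-in data shaped like run_metadata.step_stats (not real output).
fake_stats = SimpleNamespace(dev_stats=[
    SimpleNamespace(device="/device:CPU:0", node_stats=[
        SimpleNamespace(node_name="embedding_lookup", all_end_rel_micros=140),
        SimpleNamespace(node_name="embedding_lookup", all_end_rel_micros=60),
    ]),
    SimpleNamespace(device="/device:GPU:0", node_stats=[
        SimpleNamespace(node_name="MatMul", all_end_rel_micros=30),
    ]),
])

totals = node_time_by_device(fake_stats)

# Print the most expensive entries first.
for (device, name), micros in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{micros:>8} us  {device}  {name}")
```

With a real run, the call would be `node_time_by_device(run_metadata.step_stats)`. Any graph node whose name never shows up in the result has no recorded stats for that step, which could help tell whether the "unused substructure" nodes were actually executed.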