I understand that I can report an arbitrary value as a user counter (the examples use numFoos, etc.).
What I am interested in is not only the overall time of an algorithm but also the time of a subcomponent. This is for an integration-style test where I'm hitting an API. Part of that call runs on the GPU, but there is also the time spent moving data in and out.
I'd like to know the end-to-end time but also the GPU time. Can I show a user counter that represents the average GPU time? It would be great to have outliers removed, as they would be for the overall time, but mainly I'm looking to get something on the same output line. I was printing the time for each iteration, which, as you can imagine, made the output very noisy.
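To make the request concrete, here is a minimal sketch of the reduction I have in mind, using only the standard library. `TrimmedMean` is a hypothetical helper, not part of any benchmark library, and the choice of a symmetric trim as the outlier rule is my own assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption for illustration): returns the mean of
// `samples` after dropping the lowest and highest `trim_fraction` of values.
// For example, trim_fraction = 0.1 drops 10% of the samples from each end.
double TrimmedMean(std::vector<double> samples, double trim_fraction) {
  assert(trim_fraction >= 0.0 && trim_fraction < 0.5);
  if (samples.empty()) return 0.0;

  // Sort a copy so the slowest and fastest iterations sit at the ends.
  std::sort(samples.begin(), samples.end());

  // Number of samples discarded from each end (rounded down).
  const std::size_t drop =
      static_cast<std::size_t>(samples.size() * trim_fraction);

  double sum = 0.0;
  for (std::size_t i = drop; i < samples.size() - drop; ++i) {
    sum += samples[i];
  }
  // trim_fraction < 0.5 guarantees at least one sample remains.
  return sum / static_cast<double>(samples.size() - 2 * drop);
}
```

The idea would be to record the GPU time of each iteration into a `std::vector<double>` inside the benchmark loop and, once the loop ends, store the result in a single counter. Assuming the `state.counters` map that the numFoos examples use, that would be something like `state.counters["gpu_ms"] = TrimmedMean(gpu_ms, 0.1);`, where `gpu_ms` is a placeholder name for the vector of samples.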