Duplicate events when Perfetto library is built with inlining

Wyatt Spear

Oct 22, 2025, 2:51:08 PM
to Perfetto Development - www.perfetto.dev
I'm building the v52.0 amalgamated source to add Perfetto trace output to the TAU performance analysis toolkit. It's mostly working, but when we build the Perfetto library with -O1 or higher we start seeing duplicate events in some of our tracks. Example bad trace (there should only be two events recorded in each worker thread): http://yu.nic.uoregon.edu/~wspear/perfetto/optimize/tau.perfetto.gz.18 We get valid traces (example valid trace: http://yu.nic.uoregon.edu/~wspear/perfetto/optimize/tau.perfetto.gz.22) if we build the library with -O0 or -O2 -fno-inline. (I started seeing a smaller number of duplicates in our test case with -O3 -fno-inline, so there may be something else going on here.)

I confirmed that TAU is sending the correct number of events to the Perfetto event macros, so the duplication is taking place inside the library. Our implementation is here: https://github.com/UO-OACISS/tau2/blob/master/src/Profile/TracerPerfetto.cpp I couldn't find much documentation on the Perfetto writer API, so if there's some kind of secret to making inline-safe calls, that would be useful to know about.
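A count check of this kind can be sketched as follows. This is only an illustration, not code from TAU or Perfetto: it assumes the events of a trace have already been exported as (thread name, event name, start timestamp) rows, and the function names and sample rows are hypothetical.

```python
from collections import Counter


def find_duplicate_events(rows):
    """Return {(thread, name, ts): count} for every row seen more than once.

    `rows` is an iterable of (thread, name, ts) tuples (an assumed export
    format, not something Perfetto produces directly).
    """
    counts = Counter(rows)
    return {key: n for key, n in counts.items() if n > 1}


def events_per_thread(rows):
    """Return {thread: number of events}, to compare against the expected count."""
    return dict(Counter(thread for thread, _name, _ts in rows))


# Hypothetical sample: each worker should have exactly two events.
sample_rows = [
    ("worker 1", "forward", 100),
    ("worker 1", "backward", 200),
    ("worker 2", "forward", 110),
    ("worker 2", "forward", 110),  # duplicated event
    ("worker 2", "backward", 210),
]

if __name__ == "__main__":
    print(events_per_thread(sample_rows))
    print(find_duplicate_events(sample_rows))
```

Comparing the per-thread counts against the number of events the application actually emitted shows which tracks gained extra events.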

I'm building with g++ (Ubuntu 14.2.0-4ubuntu2~24.04) 14.2.0

I would be happy to provide more information and do deeper debugging, or to open an issue on the GitHub repo if that is appropriate.

Thanks,
Wyatt Spear

Lalit Maganti

Oct 22, 2025, 3:10:57 PM
to Wyatt Spear, Perfetto Development - www.perfetto.dev
Hi Wyatt,

Thanks for reaching out!

This is indeed very weird. I don't think we've ever seen anything like this before on any of Android, Chrome or any other embedder of the SDK in or out of Google.

So one thing I notice in your code which concerns me a little bit: I see you are doing some sort of "merging of traces" by concatenating n different traces together. I cannot quite piece together from the code how exactly/why exactly you do this. Can you please add a bit more colour there? I suspect it has something to do with that and the inlining thing might be a bit of a red herring.

Specifically I'm interested in the following:
1) Are you collecting n instances of the trace in parallel on different threads? If so, why? Why not use the fact that Perfetto can collect data across the whole process with a single tracing session?
2) Does the problem repro if you only run one trace at a time and open just that trace?

FWIW, doing `cat *.trace > out.trace` is actually unspecified behaviour in most contexts. It's one of those things which works *most* of the time but isn't actually supported (except in special contexts where we can say for sure that it is safe). Taking SDK-collected traces and merging them in that way is definitely *not* defined behaviour.

The "official" way to merge traces is given by https://github.com/google/perfetto/issues/1018. There's currently an experimental implementation of it in the tree and I hope to stabilize it by the end of the year. But there are already instructions there to try it out, and I'll give you a secret instruction: the supported "programmatic" way to merge traces will be to zip or tar the traces into a file and open that.
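As a rough illustration of the zip approach (a minimal sketch, not an official Perfetto tool): the helper below writes each per-rank trace file unmodified into one archive instead of concatenating them. The function name `bundle_traces` and the file names in the usage note are hypothetical, and whether a given Perfetto build opens such an archive depends on the experimental support described in the issue above.

```python
import zipfile
from pathlib import Path


def bundle_traces(trace_paths, archive_path):
    """Store each trace file as a separate entry in one zip archive.

    Unlike `cat`, this keeps every trace intact and separate inside the
    archive. Raises ValueError if two inputs share a file name, because
    entries are stored under their base name only.
    """
    paths = [Path(p) for p in trace_paths]
    names = [p.name for p in paths]
    if len(set(names)) != len(names):
        raise ValueError("trace file names must be unique within the archive")

    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # arcname drops the directory part so the archive layout is flat.
            archive.write(path, arcname=path.name)
    return archive_path
```

For an MPI run this might be called as `bundle_traces(sorted(Path("traces").glob("rank*.perfetto")), "merged.zip")`, where the directory and file pattern are placeholders for however the per-rank traces are named.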

Hope this helps! Looking forward to hearing from you.

Best,
Lalit

Wyatt Spear

Oct 22, 2025, 3:27:13 PM
to Perfetto Development - www.perfetto.dev
Our trace merging (just file concatenation) is only used for MPI applications, which produce individual traces with their own 'thread 0' and potentially worker threads for each MPI rank. I've tested that functionality a bit and it has worked pretty well, though I will be happy to switch to a more official and correct merging strategy when it is available in a release.

That isn't happening in this example, which is a non-MPI run of an introductory PyTorch demo. The 'merging' operation still runs, superfluously, on the single rank, but it essentially just renames a single trace file.

Regards,
Wyatt

Lalit Maganti

Oct 22, 2025, 3:51:22 PM
to Wyatt Spear, Perfetto Development - www.perfetto.dev
Ah thanks for the clarification, that really helps provide context. Though it does deepen the mystery. 

I think one thing which would really help here is a more minimal repro which demonstrates the same issue and which we can try out without needing to understand the complexities of TAU. Is that possible for you to do? It would really help us come up with a theory.

Best,
Lalit
