Here's a rough overview of our pipeline:
(1) Symbolize traces and convert everything into a more manageable format [we use protos]. I assume you've already figured this out, but just in case, there's Python symbolization logic here.
(2) We have a list of "known bugs", which maps simple regexes [e.g. stack contains 'P2PSocketUdp::Init'] to crbug numbers [in this case, issue 873785].
(3) Group all allocations in all traces by stack.
(4) For each group of allocations, if no regex matches the stack and the largest allocation is > 100MB, then the group requires human follow-up.
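To make steps (2) through (4) concrete, here is a minimal Python sketch. It is not our actual code: the names `KNOWN_BUGS`, `find_known_bug` and `stacks_needing_followup` are made up for illustration, a trace is assumed to be a plain list of (stack, size in bytes) pairs rather than a proto, and "largest allocation" is read as the largest single entry in a group, with 100MB taken as 100 MiB.

```python
import re
from collections import defaultdict

# Hypothetical "known bugs" table: regex over the stack text -> crbug number.
KNOWN_BUGS = {
    r"P2PSocketUdp::Init": 873785,
}

# Assumption: "100MB" is interpreted as 100 MiB.
THRESHOLD_BYTES = 100 * 1024 * 1024


def find_known_bug(stack, known_bugs=KNOWN_BUGS):
    """Return the crbug number of the first matching regex, or None."""
    stack_text = "\n".join(stack)
    for pattern, bug in known_bugs.items():
        if re.search(pattern, stack_text):
            return bug
    return None


def stacks_needing_followup(traces, known_bugs=KNOWN_BUGS,
                            threshold=THRESHOLD_BYTES):
    """Group allocations from all traces by stack and flag unknown large ones.

    traces: iterable of traces, each a list of (stack, size_bytes) pairs,
    where stack is a tuple of symbolized frame names.
    """
    # Step (3): group all allocations in all traces by stack.
    groups = defaultdict(list)
    for allocations in traces:
        for stack, size in allocations:
            groups[stack].append(size)

    # Step (4): no known-bug match and largest allocation above threshold.
    flagged = []
    for stack, sizes in groups.items():
        largest = max(sizes)
        if find_known_bug(stack, known_bugs) is None and largest > threshold:
            flagged.append((stack, largest))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Keeping the table as a plain dict makes the final step cheap: filing a new bug just means adding one regex and its crbug number.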
We also run a parallel pipeline, in case the previous one misses something:
(5) Sort traces by total size of all allocations.
(6) Emit the top 20 traces, and for each, emit total size of all stacks matching regex signatures in (2).
e.g. Report X has 5GB total allocations, 4.7GB allocations match "macOS Preferences leak", 58MB match "LevelDB-Extensions".
(7) If there are traces with a large total allocation but no matching signatures, then they require human follow-up.
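Steps (5) through (7) could be sketched along these lines. Again, this is illustrative only: `summarize_top_traces` and `traces_needing_followup` are hypothetical names, traces are assumed to be a dict of report name to (stack, size in bytes) pairs, and the cutoff for a "large" total is left as a parameter because we don't use a fixed number for it here. Each allocation is counted under the first signature that matches, which is one possible choice.

```python
import re


def summarize_top_traces(traces, signatures, top_n=20):
    """Rank traces by total allocated bytes and break down signature matches.

    traces: dict of trace name -> list of (stack, size_bytes) pairs.
    signatures: dict of human-readable label -> regex over the stack text.
    Returns a list of (name, total_bytes, {label: matched_bytes}).
    """
    # Step (5): sort traces by total size of all allocations.
    totals = {name: sum(size for _, size in allocs)
              for name, allocs in traces.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)[:top_n]

    # Step (6): for each top trace, total the bytes matching each signature.
    report = []
    for name in ranked:
        matched = {}
        for stack, size in traces[name]:
            stack_text = "\n".join(stack)
            for label, pattern in signatures.items():
                if re.search(pattern, stack_text):
                    matched[label] = matched.get(label, 0) + size
                    break  # count under the first matching signature only
        report.append((name, totals[name], matched))
    return report


def traces_needing_followup(report, min_total_bytes):
    """Step (7): large traces in which no signature matched anything."""
    return [name for name, total, matched in report
            if total >= min_total_bytes and not matched]
```

A trace that is mostly explained by known signatures, like Report X above, shows up in the report but not in the follow-up list.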
Finally, as we find and file bugs, we update the list of "known bugs".