TL;DR:
* Consider tagging allocations in NewFoo if you want to know, in aggregate, what types of things are being leaked when you do the analysis.
* Double-check that all threads are paused. New threads can be spawned while you are still attaching to the existing ones.
* If the approach you have is working "fast enough" for you, the approach seems mostly correct modulo the above.
* I can't comment on the ELF stuff.
* Analyzers will benefit from having a detailed understanding of Go's allocator regardless.
It's been about 10 years, so I'm fuzzy on the details, but I worked on a tool to do this at Fastly, where the server processes had hundreds of gigs of resident memory and tens of thousands of threads, though the system was written in C. We already had tagged allocations, so it was easy to figure out what types the allocations were, and even what was allocated, without understanding allocator internals. I imagine this could be useful if you want to figure out what kinds of things are being leaked, even if you have more detailed knowledge of Go's allocator. In that project, I wrote a custom output format for analysis, because I was looking specifically to analyze leaks and we had thousands of maps that were irrelevant as far as holding pointers was concerned. I also have a fairly limited understanding of ELF, so I can't speak to that aspect of your tool.
I'm not clear on what counts as an "outage" for you; we drained traffic from machines under analysis so that it wouldn't cause outages. This kind of thing can take a long time, though if this program finishes within a time that is acceptable to you, maybe you can ignore some of my cautions about these aspects. The problem is that, for memory leak analysis, you need to pause allocators for the duration of instrumentation. It's usually easier to just pause everything for the duration of copying maps, which seems like what you're doing.
From a cursory look over your project, it seems most issues have been identified. I've seen tools forget to collect register state before, and registers often hold references to huge live arenas that otherwise might go unreported.
Whatever method you use, you'll need to leave the whole process, and all threads, paused for a significant period of time to traverse and write the maps to disk. If you have any in-process liveness checks, turn those off so that when ptrace detaches, your process doesn't automatically die. (Also make sure that attaching doesn't take so long that the health check thread lives long enough to kill the system while you're still waiting to attach to everything.)
You might want to do an extra pass after running FreezeAllThreads to check if any threads were created while you were busy pausing threads. I haven't done a detailed review so I'm not sure if this means you'll also have to re-scan maps to get those thread stacks etc.
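A rough sketch of what that extra pass could look like: keep re-reading `/proc/<pid>/task` and attaching to anything new until a full pass turns up no new threads. The `listTIDs` and `freezeUntilStable` helpers and the `attach` callback are names I invented; I haven't checked how FreezeAllThreads is structured, so treat this as the shape of the loop rather than a drop-in patch.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
)

// listTIDs returns the thread IDs currently listed under /proc/<pid>/task.
func listTIDs(pid int) ([]int, error) {
	entries, err := os.ReadDir(fmt.Sprintf("/proc/%d/task", pid))
	if err != nil {
		return nil, err
	}
	tids := make([]int, 0, len(entries))
	for _, e := range entries {
		tid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // skip anything that is not a numeric thread ID
		}
		tids = append(tids, tid)
	}
	return tids, nil
}

// freezeUntilStable calls list repeatedly and passes every thread ID it
// has not seen before to attach. It stops once a full pass finds no new
// threads and returns the sorted set of attached thread IDs.
func freezeUntilStable(list func() ([]int, error), attach func(tid int) error) ([]int, error) {
	seen := make(map[int]bool)
	for {
		tids, err := list()
		if err != nil {
			return nil, err
		}
		added := false
		for _, tid := range tids {
			if seen[tid] {
				continue
			}
			if err := attach(tid); err != nil {
				return nil, fmt.Errorf("attach %d: %w", tid, err)
			}
			seen[tid] = true
			added = true
		}
		if !added {
			break
		}
	}
	frozen := make([]int, 0, len(seen))
	for tid := range seen {
		frozen = append(frozen, tid)
	}
	sort.Ints(frozen)
	return frozen, nil
}

func main() {
	// Dry run against our own process. The attach callback is a no-op
	// stand-in; a real tool would do its ptrace attach and wait here.
	pid := os.Getpid()
	frozen, err := freezeUntilStable(
		func() ([]int, error) { return listTIDs(pid) },
		func(tid int) error { return nil },
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, "freeze failed:", err)
		return
	}
	fmt.Println("threads seen:", len(frozen))
}
```

A thread can also exit between being listed and being attached, so a real attach callback probably wants to tolerate that failure instead of aborting the whole pass.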
I'm not sure how you're doing core analysis, but of course having an analyzer that understands how to read Go's allocator state will be helpful.
Hope that's helpful in some capacity. But overall, this looks like a great project, and mostly correct. It really doesn't take that much code to do this kind of thing!
Kind regards,
--dho