We have more information. We changed our app from a vanilla executable to an NSApplication bundle (but we did not directly change how threads are created or spawned). This seems to have resulted in the following underlying change.
Before the change, threads were created via pthread_create. Each pthread got its own VM allocation with guard pages. When Crashpad called mach_vm_region() to find the stack bounds for a thread's SP, each thread mapped to a different VM region with a unique end address. Result: ~20 KB captured per thread.
After the change, the app is an NSApplication that links AppKit.framework, which brings in GCD/libdispatch. GCD appears to manage its thread pools by allocating stacks from shared VM regions: multiple GCD worker threads, dispatch queue threads, NSEventThread, etc. all have their stack pointers within the same large VM region. When Crashpad calls mach_vm_region() for each thread's SP, it gets the same region end address for every thread in that group.
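To make the difference concrete, here is a small model of the bounds calculation described above. It is only a sketch, not Crashpad's code: capture_range and captured_sizes are hypothetical helpers, the addresses and sizes are made up, and it assumes the capture simply runs from the thread's SP to the end of whichever region contains it.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (base, end) of a VM region, end exclusive
KB = 1024


def capture_range(sp: int, regions: List[Region]) -> Tuple[int, int]:
    """Return (sp, region_end) for the region that contains sp."""
    for base, end in regions:
        if base <= sp < end:
            return sp, end
    raise ValueError(f"no region contains {sp:#x}")


def captured_sizes(sps: List[int], regions: List[Region]) -> List[int]:
    """Bytes captured per thread under the SP-to-region-end assumption."""
    sizes = []
    for sp in sps:
        start, end = capture_range(sp, regions)
        sizes.append(end - start)
    return sizes


# Made-up layout A: each thread has its own 512 KB region, 20 KB in use.
separate = [
    (0x1000_0000, 0x1000_0000 + 512 * KB),
    (0x2000_0000, 0x2000_0000 + 512 * KB),
]
sps_separate = [end - 20 * KB for _, end in separate]

# Made-up layout B: two 512 KB stacks carved out of one 1 MB region.
shared_base = 0x3000_0000
shared = [(shared_base, shared_base + 1024 * KB)]
sps_shared = [
    shared_base + 512 * KB - 20 * KB,   # lower stack, 20 KB in use
    shared_base + 1024 * KB - 20 * KB,  # upper stack, 20 KB in use
]

if __name__ == "__main__":
    print([s // KB for s in captured_sizes(sps_separate, separate)])
    print([s // KB for s in captured_sizes(sps_shared, shared)])
```

In this model both threads in layout A capture 20 KB, while in layout B the lower thread's capture also sweeps up the whole of the upper thread's stack, because both SPs resolve to the same region end.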
The result: each captured thread stack region runs from that thread's SP all the way to the end of the large shared VM region, so the memory captured for different threads overlaps massively. I patched Crashpad to truncate the overlap, but the memory captured per thread is still fundamentally too large: the capture is not stopping at the end of the thread's own stack space.
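For illustration, this is roughly the kind of truncation I mean, as a simplified model rather than the actual patch. truncate_overlaps is a hypothetical helper and the layout is made up: three 512 KB stacks packed into one region, each with 20 KB in use, and every raw capture ending at the shared region end.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) of a captured stack range, end exclusive
KB = 1024


def truncate_overlaps(ranges: List[Range]) -> List[Range]:
    """Clip each range so it ends where the next-higher range starts.

    The result is returned sorted by start address.
    """
    ordered = sorted(ranges)
    clipped = []
    for i, (start, end) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][0]
            end = min(end, next_start)
        clipped.append((start, end))
    return clipped


# Made-up layout: one shared region holding three 512 KB stacks.
base = 0x3000_0000
region_end = base + 3 * 512 * KB
# Each thread's SP sits 20 KB below the top of its own 512 KB slot.
sps = [base + (i + 1) * 512 * KB - 20 * KB for i in range(3)]

raw = [(sp, region_end) for sp in sps]
clipped = truncate_overlaps(raw)

if __name__ == "__main__":
    print([(end - start) // KB for start, end in raw])
    print([(end - start) // KB for start, end in clipped])
```

In this model the raw captures are 1044, 532 and 20 KB, and the clipped ones are 512, 512 and 20 KB. The overlap is gone, but the two lower threads still capture 512 KB each for 20 KB of live stack, because clipping at the next thread's SP does not find where the thread's own stack actually ends.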