Only CPU profiles on OSX are broken (and can be fixed with a kernel patch); all other profiles are fine.
- pprof can track memory in a few different ways, but I believe the default is "memory in use by Go at the time the profile was made". This does *NOT* include memory already collected by the GC. The --help flag to pprof shows the various memory profile display options.
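For reference, here is a minimal sketch of writing a heap profile from inside a program with runtime/pprof; the file name mem.prof is just an example.

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// ... run the workload you want to measure ...

	f, err := os.Create("mem.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Run a GC first so the profile reflects up-to-date statistics.
	runtime.GC()
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
	// Inspect with: go tool pprof <binary> mem.prof
	// (display options such as --inuse_space / --alloc_space select what is shown)
}
```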
The GC never releases memory back to the OS; it only re-uses it. From the outside, memory use will only grow or stay level. Also, the GC only runs automatically on allocation, when the total in-use memory exceeds a previously set limit, not when objects become inactive.
If you write a program which processes data in 512MB chunks, the profiler may list only 512MB in use (depending on how the program holds on to memory), but when the GC runs it finds all the memory currently in use and sets a limit of double that for the next GC cycle. So if you happened to have 2 chunks active at the time, then at least 1GB is in use, and the GC will next run when a total of 2GB has been allocated. This means that unless your app ends before the next GC, or the GC is called manually, those 2GB of memory will eventually be used. If only one 512MB chunk is active at a later GC, the limit is set to 1GB, but the extra 1GB allocated previously is *NOT* returned to the OS by the GC.
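A rough sketch of that scenario, using runtime.ReadMemStats to print HeapAlloc (live data), NextGC (the threshold for the next automatic collection), and Sys; the 512MB chunk size is taken from the example above, so shrink it if your machine is short on memory.

```go
package main

import (
	"fmt"
	"runtime"
)

const chunkSize = 512 << 20 // 512MB

func report(tag string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-16s HeapAlloc=%4d MB  NextGC=%4d MB  Sys=%4d MB\n",
		tag, m.HeapAlloc>>20, m.NextGC>>20, m.Sys>>20)
}

func main() {
	report("start")

	a := make([]byte, chunkSize)
	b := make([]byte, chunkSize)
	fmt.Println("allocated", len(a)+len(b), "bytes in two chunks")
	report("two chunks live")

	// With ~1GB live, the next automatic GC triggers only once total
	// allocation roughly doubles, so from the outside memory keeps growing.
	a = nil // drop one chunk
	runtime.GC()
	report("one chunk live")

	// The dropped chunk is re-usable by Go, but the collector itself does
	// not hand those pages back to the OS.
	fmt.Println("still holding", len(b), "bytes")
}
```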
The scavenger runs every 5 minutes and tries to find blocks of memory (allocations are made in blocks) which are completely unused by Go code. This can be difficult, because if any byte in a block is active, the whole block remains active. If it finds any inactive blocks, it tells the OS that they are "Unused", which is slightly different from "Free". Unfortunately the OS still counts that memory against the process, but if it needs memory it will use the "Unused" group first, and since the contents don't need to be preserved (i.e. swapped out to disk), the operation is nearly as cheap as on "Free" memory.
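If you don't want to wait for the scavenger, runtime/debug.FreeOSMemory forces a GC and then asks the runtime to return as much memory to the OS as it can. A small sketch:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func report(tag string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapIdle=%d MB  HeapReleased=%d MB\n",
		tag, m.HeapIdle>>20, m.HeapReleased>>20)
}

func main() {
	buf := make([]byte, 256<<20) // 256MB of live data
	fmt.Println("allocated", len(buf), "bytes")

	buf = nil // no longer needed
	report("before")

	// Force a GC and return as much memory to the OS as possible,
	// instead of waiting for the background scavenger.
	debug.FreeOSMemory()
	report("after")
}
```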
Also, maps and slices are easy ways to lose track of memory, since you may be using only len() elements while the underlying allocation is larger. E.g. slicing an array of length 21 to take only the middle 3 elements doesn't make the remaining 18 elements disappear in any way; the whole backing array stays reachable.
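A small sketch of the slicing case; copying the elements you need into a fresh slice is one way to let the rest of the backing array be collected.

```go
package main

import "fmt"

func main() {
	big := make([]int, 21)

	// middle has len 3, but it still points into big's backing array of
	// 21 elements, so none of them can be collected while middle lives.
	middle := big[9:12]

	// Copying the part you need into a fresh slice gives it its own,
	// smaller backing array, letting the original 21 elements be freed.
	trimmed := make([]int, 3)
	copy(trimmed, big[9:12])

	fmt.Println(len(middle), cap(middle))   // 3 12 (cap runs to the end of big)
	fmt.Println(len(trimmed), cap(trimmed)) // 3 3
}
```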
The value of Sys is the total number of bytes Go has asked the system for, which in normal (non-cgo) programs should roughly match the system's own measurement; Alloc is the number of bytes Go believes are active at the moment. Even though the example functions only generate slices/maps containing 10*1MB of data, the program stabilizes at a Sys value more than 10 times that. Some of the extra space makes sense: there is more memory in use than just the slices/maps, since we store at least one 1MB element on the stack before copying it into the slice/map, and some extra allocations are needed when the slices and maps are re-sized. On a re-size, both versions must exist at the same time, so at best, with a doubling algorithm for slices, a slice of 0.5n elements and one of n elements must both be active at some point in the function; at worst it can be 2n-1 elements' worth. Adjusting the boolean options in main shows the results if you pre-allocate the slices and/or call the GC at the end of each operation. With both active, the memory use as seen in the Sys value is much better.
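The example program itself isn't reproduced here; below is a rough sketch along the same lines, with preallocate and forceGC as stand-in names (assumptions, not the original option names) for the booleans in main.

```go
package main

import (
	"fmt"
	"runtime"
)

// Stand-ins for "the boolean options in main" described above.
const (
	preallocate = false // size the slice/map up front
	forceGC     = false // run the GC after each operation
)

type mb [1 << 20]byte // a 1MB element

func fillSlice() []mb {
	var s []mb
	if preallocate {
		s = make([]mb, 0, 10)
	}
	for i := 0; i < 10; i++ {
		var e mb // the element exists here before being copied into the slice
		s = append(s, e)
	}
	return s
}

func fillMap() map[int]mb {
	var m map[int]mb
	if preallocate {
		m = make(map[int]mb, 10) // a size hint avoids some growth copies
	} else {
		m = make(map[int]mb)
	}
	for i := 0; i < 10; i++ {
		var e mb
		m[i] = e
	}
	return m
}

func report(tag string) {
	var st runtime.MemStats
	runtime.ReadMemStats(&st)
	fmt.Printf("%-6s Alloc=%3d MB  Sys=%3d MB\n", tag, st.Alloc>>20, st.Sys>>20)
}

func main() {
	report("start")
	s := fillSlice()
	if forceGC {
		runtime.GC()
	}
	report("slice")
	m := fillMap()
	if forceGC {
		runtime.GC()
	}
	report("map")
	fmt.Println(len(s), len(m)) // keep both alive until the end
}
```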
So in short, from the outside, Go programs never shrink in memory use unless the OS takes memory back, and they may be larger than you expect depending on how your program holds references to its data.