One more option to add to
Tamás' suggestions: thanks to Go's interface design, implementing a draw.Image (an image.Image with an additional Set(x, y int, c color.Color) method) is sufficient to use draw.Draw for compositing.
As such, you can pre-allocate a flat file with enough space to store 4 x 8 x width x height bytes
(4 channels for RGBA, 8 bytes per float64 channel), and have your draw.Image implementation read and write
each pixel at its offset in the file. This minimizes memory usage in exchange for a massive slowdown, as
every *pixel* operation now becomes a read or write on a file, but only one of the small files needs to be open
at a time. Memory-mapping the file (mmap) can win back some of the performance, as Tamás suggested, or you can keep the file on a RAM disk. A RAM disk recovers less of the performance than mmap, as it still requires filesystem operations per pixel, but it is not as slow as working with actual spinning media.
You would still, at some point, need to read in the entire large file to write it out in a usable format, so whether there is any significant advantage depends on how much the images overlap when composited: if the large image's area is equal to or larger than the sum of the individual images' areas, you gain little, and the more the images overlap, the more this approach saves.