In reply to the tebeka comment of 2011-09-12 10:05:35 PDT: For a PT_LOAD, as long as .p_align divides (.p_vaddr - .p_offset), it is permissible for the manager of the memory address space to expand the mapped interval to a convenient set of pages which cover the interval of addresses. It is also permissible for the manager of the address space to honor the indicated range _exactly_: the executing process must not depend on bytes that lie outside the interval [.p_vaddr, .p_vaddr + .p_memsz). For instance, the behavior of dl_iterate_phdr() might be undefined when PT_PHDR lies outside of all PT_LOAD. The decompression into memory by the UPX stub at the beginning of execution of a compressed program also depends on PT_PHDR being inside the first PT_LOAD. Thus the scheme used by the Go language processor is unreliable.
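The two conditions above can be checked mechanically. What follows is only a minimal sketch, assuming a 64-bit ELF and the Linux <elf.h> header; the helper names load_is_congruent() and phdr_inside_some_load() are hypothetical and are not part of UPX.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if .p_align divides (.p_vaddr - .p_offset),
 * i.e. the two values are congruent modulo .p_align.
 * A .p_align of 0 or 1 means "no alignment constraint". */
static bool load_is_congruent(const Elf64_Phdr *ph)
{
    if (ph->p_align <= 1)
        return true;
    return (ph->p_vaddr % ph->p_align) == (ph->p_offset % ph->p_align);
}

/* Hypothetical helper: true if the PT_PHDR interval lies entirely inside
 * the interval [.p_vaddr, .p_vaddr + .p_memsz) of some PT_LOAD.
 * Returns true when the table has no PT_PHDR at all. */
static bool phdr_inside_some_load(const Elf64_Phdr *ph, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_PHDR)
            continue;
        Elf64_Addr lo = ph[i].p_vaddr;
        Elf64_Addr hi = lo + ph[i].p_memsz;
        for (size_t j = 0; j < n; j++) {
            if (ph[j].p_type == PT_LOAD &&
                ph[j].p_vaddr <= lo &&
                hi <= ph[j].p_vaddr + ph[j].p_memsz)
                return true;
        }
        return false;   /* PT_PHDR exists but no PT_LOAD covers it */
    }
    return true;
}
```

Both helpers take program headers that have already been read into memory; reading and validating the ELF header is left out of the sketch.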
The major problem arises during "upx --decompress ./my_app.compressed". It is required that the output be identical to the original never-compressed ./my_app. At compress time UPX could expand the first PT_LOAD to cover the 0xc00 bytes of Go, by _changing_ the .p_vaddr, .p_filesz, and .p_memsz. But then the --decompress output would have those changes, and be different from the original. There is no convenient place to record the changes, and it is poor practice to add a quirk when Go's format already has problems.
The easiest way to get things to work is to modify the executable "offline", before compressing via UPX, so that PT_LOAD[0].p_offset==0. Open Watcom 1.9 on MS Windows generates ELF executables with a similar configuration. I will upload a short utility "hemfix.c" which works for that case. Click on "Attached File" near the bottom of this page [there is an invisible button there: rollover and see the pointer change].
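To illustrate the kind of offline adjustment meant here, the following is a rough sketch of widening the first PT_LOAD downward so that its .p_offset becomes 0. It is not the contents of hemfix.c; it assumes a 64-bit ELF, the Linux <elf.h> header, and program headers already loaded into memory, and the function name widen_first_load() is hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: extend the first PT_LOAD downward so that it starts
 * at file offset 0. Addresses and sizes all move by the old .p_offset, so
 * the congruence of .p_vaddr and .p_offset modulo .p_align is preserved.
 * Returns 1 if the entry was changed, 0 if .p_offset was already 0,
 * and -1 if there is no PT_LOAD or the addresses would underflow. */
static int widen_first_load(Elf64_Phdr *ph, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        Elf64_Off delta = ph[i].p_offset;
        if (delta == 0)
            return 0;
        if (ph[i].p_vaddr < delta || ph[i].p_paddr < delta)
            return -1;
        ph[i].p_offset  = 0;
        ph[i].p_vaddr  -= delta;
        ph[i].p_paddr  -= delta;
        ph[i].p_filesz += delta;
        ph[i].p_memsz  += delta;
        return 1;
    }
    return -1;
}
```

With illustrative values .p_offset=0xc00, .p_vaddr=0x400c00, .p_filesz=0x1000, .p_memsz=0x2000, the entry becomes .p_offset=0, .p_vaddr=0x400000, .p_filesz=0x1c00, .p_memsz=0x2c00. A real tool would still have to write the modified headers back to the file and confirm that the widened range does not collide with another segment.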