Re: upx compression of golang binaries

Peter Waller

Jun 23, 2014, 7:20:52 AM

to golang-dev, Russ Cox, Minux

Hi All,

I wanted to respond to minux on the locked issue https://code.google.com/p/go/issues/detail?id=6853, where he said: "re #8, I don't think it's Go's problem. upx should be made more flexible to handle this."

I'm the author of goupx, which modifies Go-compiled binaries so that upx can compress them. In practice this regularly shrinks the executable by up to 75%.

My understanding is that the upx team requires that a compressed binary can be unpacked into exactly its original form, and that there is nowhere to record the changes that would be needed to restore that form. They support ~30 binary formats and don't want to introduce a place to store this extra information for one case. (That seems reasonable to me.)

In addition, they claim that the values Go chooses for the ELF program headers are unreliable, even though they may work in practice.

I absolutely understand that the core go developers have better things to be working on and that there is an unknown risk of breakage when you change such things.

However, is fixing this at the root ruled out forever, or just "for now"? When would be an appropriate time to revisit this, if ever? What would it take for me or other people to chip in?

Thanks,

- Peter

From John Reiser, upx author at http://sourceforge.net/p/upx/bugs/195/:

In reply to the tebeka comment of 2011-09-12 10:05:35 PDT: For a PT_LOAD, as long as .p_align divides (.p_vaddr - .p_offset), it is permissible for the manager of the memory address space to expand the mapped interval to a convenient set of pages which cover the interval of addresses. It is also permissible for the manager of the address space to honor the indicated range _exactly_: the executing process must not depend on bytes that lie outside the interval [.p_vaddr, .p_memsz + .p_vaddr). For instance, dl_iterate_phdr() might be undefined when PT_PHDR lies outside of all PT_LOAD. The decompression into memory by the UPX stub at the beginning of execution of a compressed program also depends on PT_PHDR being inside the first PT_LOAD. Thus the scheme used by the Go language processor is unreliable.

The major problem arises during "upx --decompress ./my_app.compressed". It is required that the output be identical to the original never-compressed ./my_app. At compress time, UPX could expand the first PT_LOAD to cover the 0xc00 bytes of Go by _changing_ the .p_vaddr, .p_filesz, and .p_memsz. But then the --decompress output would have those changes, and be different from the original. There is no convenient place to record the changes, and it is poor practice to add a quirk when Go's format already has problems.

The easiest way to get things to work is to modify the executable "offline", before compressing via UPX, so that PT_LOAD[0].p_offset==0. Open Watcom 1.9 on MS Windows generates ELF executables with a similar configuration. I will upload a short utility "hemfix.c" which works for that case. Click on "Attached File" near the bottom of this page [there is an invisible button there: rollover and see the pointer change].

Ian Lance Taylor

Jun 23, 2014, 11:49:52 AM

to Peter Waller, golang-dev, Russ Cox, Minux

On Mon, Jun 23, 2014 at 4:20 AM, Peter Waller <pe...@scraperwiki.com> wrote:
> However, is fixing this at the root ruled out forever, or just "for now"?
> When would be an appropriate time to revisit this, if ever? What would it
> take for me or other people to chip in?

Fixing Go binaries to work with UPX is not ruled out either forever or
just for now. If the fix is to make the Go binaries more "correct,"
then it is fine. At this point I don't understand what the fix is.

Ian

Russ Cox

Jun 23, 2014, 1:26:21 PM

to Ian Lance Taylor, Peter Waller, golang-dev, Minux

Like Ian said, you have not stated the problem.

The original thread was about loads being unaligned and overlapping. Today, the ELF binaries we generate do neither of these things. This is what I see on a Linux binary built last week:

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz  MemSiz   Flg Align
      PHDR           0x000034 0x08048034 0x08048034 0x00120  0x00120  R   0x1000
      INTERP         0x000bed 0x08048bed 0x08048bed 0x00013  0x00013  R   0x1
          [Requesting program interpreter: /lib/ld-linux.so.2]
      LOAD           0x000000 0x08048000 0x08048000 0x26a3c0 0x26a3c0 R E 0x1000
      LOAD           0x26b000 0x082b3000 0x082b3000 0x292ca3 0x292ca3 R   0x1000
      LOAD           0x4fe000 0x08546000 0x08546000 0x1abc0  0x3036c  RW  0x1000
      DYNAMIC        0x4fe080 0x08546080 0x08546080 0x00098  0x00098  RW  0x4
      TLS            0x000000 0x00000000 0x00000000 0x00000  0x00008  R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000  0x00000  RW  0x4
      LOOS+5041580   0x000000 0x00000000 0x00000000 0x00000  0x00000      0x4

Everything is aligned properly, and the loads are non-overlapping.

Russ
