Re: upx compression of golang binaries

Peter Waller

Jun 23, 2014, 7:20:52 AM
to golang-dev, Russ Cox, Minux
Hi All,

I just wanted to respond to minux in this locked issue https://code.google.com/p/go/issues/detail?id=6853 where he said "re #8, I don't think it's Go's problem. upx should be made more flexible to handle this.".

I'm the author of goupx, which modifies go-compiled binaries so that they may be compressed by upx, for a saving of regularly up to 75% of the executable size, in practice.

My understanding is that the upx team require that one can exactly unpack the compressed binary into the original form and there is nowhere else to record the changes necessary to make the binary decompress into its original form. They support ~30 binary formats and they don't want to introduce somewhere to store this extra information for one case. (Which seems reasonable to me?)

In addition, they claim that the chosen values in the header are unreliable, though they may work in practice.

I absolutely understand that the core go developers have better things to be working on and that there is an unknown risk of breakage when you change such things.

However, is fixing this at the root ruled out forever, or just "for now"? When would be an appropriate time to revisit this, if ever? What would it take if myself or other people wanted to chip in?

Thanks,

- Peter

From John Reiser, upx author at http://sourceforge.net/p/upx/bugs/195/:

In reply to the tebeka comment of 2011-09-12 10:05:35 PDT: For a PT_LOAD, as long as .p_align divides (.p_vaddr - .p_offset), then it is permissible for the manager of the memory address space to expand the mapped interval to a convenient set of pages which cover the interval of addresses. It is also permissible for the manager of the address space to honor the indicated range _exactly_: the executing process must not depend on bytes that lie outside the interval [.p_vaddr, .p_memsz + .p_vaddr). For instance, dl_iterate_phdr() might be undefined when PT_PHDR lies outside of all PT_LOAD. The decompression into memory by UPX stub at beginning of execution of a compressed program also depends on PT_PHDR being inside the first PT_LOAD. Thus the scheme used by the Go language processor is unreliable.

The major problem arises during "upx --decompress ./my_app.compressed". It is required that the output be identical to the original never-compressed ./my_app. At compress time then UPX could expand the first PT_LOAD to cover the 0xc00 bytes of Go, by _changing_ the .p_vaddr, .p_filesz, and .p_memsz. But then the --decompress output would have those changes, and be different from the original. There is no convenient place to record the changes, and it is poor practice to add a quirk when Go's format already has problems.

The easiest way to get things to work is to modify the executable "offline", before compressing via UPX, so that PT_LOAD[0].p_offset==0. Open Watcom 1.9 on MS Windows generates ELF executables with a similar configuration. I will upload a short utility "hemfix.c" which works for that case. Click on "Attached File" near the bottom of this page [there is an invisible button there: rollover and see the pointer change].
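
[Editor's sketch, not part of the original message.] The two conditions John Reiser describes can be checked mechanically: .p_align should divide (.p_vaddr - .p_offset), and PT_PHDR should lie inside the first PT_LOAD. The Go sketch below does this on in-memory program headers using the standard debug/elf types. The function names are made up for illustration. listingLayout copies the PHDR and first LOAD rows of the listing that appears later in this thread; hypotheticalOffsetLayout is an invented layout whose first PT_LOAD starts 0xc00 bytes into the file, and it is only a guess at the kind of layout under discussion, not data from a real binary.

```go
package main

import (
	"debug/elf"
	"fmt"
)

// listingLayout holds the PHDR and first LOAD rows from the readelf
// listing quoted later in this thread.
func listingLayout() []elf.ProgHeader {
	return []elf.ProgHeader{
		{Type: elf.PT_PHDR, Off: 0x34, Vaddr: 0x08048034, Filesz: 0x120, Memsz: 0x120, Align: 0x1000},
		{Type: elf.PT_LOAD, Off: 0x0, Vaddr: 0x08048000, Filesz: 0x26a3c0, Memsz: 0x26a3c0, Align: 0x1000},
	}
}

// hypotheticalOffsetLayout is an invented example (an assumption, not
// taken from a real binary): the first PT_LOAD begins at file offset
// 0xc00, so the program header table is not covered by it.
func hypotheticalOffsetLayout() []elf.ProgHeader {
	return []elf.ProgHeader{
		{Type: elf.PT_PHDR, Off: 0x34, Vaddr: 0x08048034, Filesz: 0x120, Memsz: 0x120, Align: 0x1000},
		{Type: elf.PT_LOAD, Off: 0xc00, Vaddr: 0x08048c00, Filesz: 0x1000, Memsz: 0x1000, Align: 0x1000},
	}
}

// congruent reports whether p_align divides (p_vaddr - p_offset),
// written as a comparison of remainders to avoid unsigned underflow.
func congruent(p elf.ProgHeader) bool {
	if p.Align <= 1 {
		return true // no alignment constraint
	}
	return p.Vaddr%p.Align == p.Off%p.Align
}

// phdrInsideFirstLoad reports whether the PT_PHDR address range lies
// entirely within the first PT_LOAD in table order.
func phdrInsideFirstLoad(progs []elf.ProgHeader) bool {
	var phdr, load *elf.ProgHeader
	for i := range progs {
		switch progs[i].Type {
		case elf.PT_PHDR:
			if phdr == nil {
				phdr = &progs[i]
			}
		case elf.PT_LOAD:
			if load == nil {
				load = &progs[i]
			}
		}
	}
	if phdr == nil || load == nil {
		return false
	}
	return phdr.Vaddr >= load.Vaddr &&
		phdr.Vaddr+phdr.Memsz <= load.Vaddr+load.Memsz
}

func main() {
	fmt.Println("listing:     PHDR inside first LOAD:", phdrInsideFirstLoad(listingLayout()))
	fmt.Println("hypothetical: PHDR inside first LOAD:", phdrInsideFirstLoad(hypotheticalOffsetLayout()))
}
```

To run the same checks on a real file, the headers could be collected from the Progs field of the *elf.File returned by elf.Open; that step is left out here to keep the sketch self-contained.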

Ian Lance Taylor

Jun 23, 2014, 11:49:52 AM
to Peter Waller, golang-dev, Russ Cox, Minux
On Mon, Jun 23, 2014 at 4:20 AM, Peter Waller <pe...@scraperwiki.com> wrote:
>
> I just wanted to respond to minux in this locked issue
> https://code.google.com/p/go/issues/detail?id=6853 where he said "re #8, I
> don't think it's Go's problem. upx should be made more flexible to handle
> this.".
>
> I'm the author of goupx, which modifies go-compiled binaries so that they
> may be compressed by upx, for a saving of regularly up to 75% of the
> executable size, in practice.
>
> My understanding is that the upx team require that one can exactly unpack
> the compressed binary into the original form and there is nowhere else to
> record the changes necessary to make the binary decompress into its original
> form. They support ~30 binary formats and they don't want to introduce
> somewhere to store this extra information for one case. (Which seems
> reasonable to me?)
>
> In addition, they claim that the chosen values in the header are unreliable,
> though they may work in practice.
>
> I absolutely understand that the core go developers have better things to be
> working on and that there is an unknown risk of breakage when you change
> such things.
>
> However, is fixing this at the root ruled out forever, or just "for now"?
> When would be an appropriate time to revisit this, if ever? What would it
> take if myself or other people wanted to chip in?

Fixing Go binaries to work with UPX is not ruled out either forever or
just for now. If the fix is to make the Go binaries more "correct,"
then it is fine. At this point I don't understand what the fix is.

Ian

Russ Cox

Jun 23, 2014, 1:26:21 PM
to Ian Lance Taylor, Peter Waller, golang-dev, Minux
Like Ian said, you have not stated the problem.

The original thread was about loads being unaligned and overlapping. Today, the ELF binaries we generate do neither of these things. This is what I see on a Linux binary built last week:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00120 0x00120 R   0x1000
  INTERP         0x000bed 0x08048bed 0x08048bed 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x26a3c0 0x26a3c0 R E 0x1000
  LOAD           0x26b000 0x082b3000 0x082b3000 0x292ca3 0x292ca3 R   0x1000
  LOAD           0x4fe000 0x08546000 0x08546000 0x1abc0 0x3036c RW  0x1000
  DYNAMIC        0x4fe080 0x08546080 0x08546080 0x00098 0x00098 RW  0x4
  TLS            0x000000 0x00000000 0x00000000 0x00000 0x00008 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  LOOS+5041580   0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

Everything is aligned properly, and the loads are non-overlapping.

Russ
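
[Editor's sketch, not part of the original message.] The claim that the loads are aligned and non-overlapping can be checked against the three LOAD rows of the listing above. The Go sketch below is one possible reading of those two properties, and both readings are assumptions: "aligned" is taken to mean that Vaddr and Offset agree modulo Align, and "non-overlapping" that the byte ranges [Vaddr, Vaddr+MemSiz) are disjoint. The function names are hypothetical.

```go
package main

import (
	"debug/elf"
	"fmt"
	"sort"
)

// russLayout holds the three LOAD rows from the readelf listing above.
func russLayout() []elf.ProgHeader {
	return []elf.ProgHeader{
		{Type: elf.PT_LOAD, Flags: elf.PF_R | elf.PF_X, Off: 0x000000, Vaddr: 0x08048000, Filesz: 0x26a3c0, Memsz: 0x26a3c0, Align: 0x1000},
		{Type: elf.PT_LOAD, Flags: elf.PF_R, Off: 0x26b000, Vaddr: 0x082b3000, Filesz: 0x292ca3, Memsz: 0x292ca3, Align: 0x1000},
		{Type: elf.PT_LOAD, Flags: elf.PF_R | elf.PF_W, Off: 0x4fe000, Vaddr: 0x08546000, Filesz: 0x1abc0, Memsz: 0x3036c, Align: 0x1000},
	}
}

// loadsAligned reports whether every PT_LOAD has Vaddr and Off
// congruent modulo Align (assumed meaning of "aligned properly").
func loadsAligned(progs []elf.ProgHeader) bool {
	for _, p := range progs {
		if p.Type != elf.PT_LOAD || p.Align <= 1 {
			continue
		}
		if p.Vaddr%p.Align != p.Off%p.Align {
			return false
		}
	}
	return true
}

// loadsOverlap reports whether any two PT_LOAD byte ranges
// [Vaddr, Vaddr+Memsz) intersect.
func loadsOverlap(progs []elf.ProgHeader) bool {
	var loads []elf.ProgHeader
	for _, p := range progs {
		if p.Type == elf.PT_LOAD {
			loads = append(loads, p)
		}
	}
	// Sort by start address so only neighbours need comparing.
	sort.Slice(loads, func(i, j int) bool { return loads[i].Vaddr < loads[j].Vaddr })
	for i := 1; i < len(loads); i++ {
		if loads[i-1].Vaddr+loads[i-1].Memsz > loads[i].Vaddr {
			return true
		}
	}
	return false
}

func main() {
	progs := russLayout()
	fmt.Println("aligned:", loadsAligned(progs))
	fmt.Println("overlapping:", loadsOverlap(progs))
}
```

Under these assumed definitions, the values in the listing satisfy both properties.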