
how to compress executables under Unix


Kent Williams

Aug 19, 1991, 9:18:54 AM

In article <ygx...@rpi.edu> sore...@gazoo.ecse.rpi.edu (Jeffrey Sorensen) writes:
>Has anyone seen or heard of a routine that will compress object code so that
>programs can be dynamically uncompressed before running? The idea I have in
:
>I have heard of a product from PKware that does this on the PC. Is this idea
>patented? Is there any code for Unix to do this?

I can think of one quick hack:

1. Compress the program with compress. Let's call our program blob.

compress blob

2. Now put this shell script next to blob.Z in the file system:

#!/bin/sh
# decompress to a per-run temp file, run it with the original arguments,
# then clean up and hand back the program's exit status
compress -d < blob.Z > /tmp/blob$$
chmod u+x /tmp/blob$$
/tmp/blob$$ "$@"
status=$?
rm -f /tmp/blob$$
exit $status

This will get you 95% of the way there, except in cases where the executable
is supposed to be setuid or setgid something or other.
--
Kent Williams --- will...@cs.uiowa.edu
"Reality is what refuses to go away when I stop believing in it" - P.K. Dick

Brad Templeton

Aug 20, 1991, 5:32:19 PM
The paper at USENIX was boring and talked about 20 year old compression
techniques, wired into the kernel.

Doing this as an all-in-one is very system dependent, but of course
executables are all pretty system-dependent. If you want to write the
output into memory and execute the memory you have to really do some
non-portable things.

The simplest, but slower technique involves extracting to an executable
and doing an exec of the executable. That is portable, and while slow, is
not too bad if it all happens in the buffer cache.

To do this, take the source to compress, extract the decompressor from it,
and modify things to decompress from ram to a temp file, then exec the
temp file. Then find a way to link your compressed executable in with
the program perhaps by writing a program to convert it into a simple .o
with a TEXT block. (A slow method would be to output a fat assembler or
C program full of constants if you don't want to write a .o yourself)
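
To make the ``C program full of constants'' route concrete, here is a rough
sketch - purely an illustration, with the array names, the /tmp paths and the
call out to the stock compress(1) via system() all assumed rather than taken
from anything real:

/* stub.c -- hypothetical self-extracting wrapper.  A bin-to-C converter is
 * assumed to have produced blob_z[]/blob_z_len from the compressed image;
 * the stub writes the bytes out, decompresses them with compress(1), then
 * execs the result.  Cleanup of the extracted file is left out. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern unsigned char blob_z[];          /* generated: compressed image bytes */
extern unsigned long blob_z_len;        /* generated: number of bytes */

int main(int argc, char **argv)
{
    char zname[64], xname[64], cmd[200];
    FILE *f;

    sprintf(xname, "/tmp/blob%d", (int)getpid());
    sprintf(zname, "%s.Z", xname);

    f = fopen(zname, "w");
    if (f == NULL || fwrite(blob_z, 1, blob_z_len, f) != blob_z_len) {
        perror(zname);
        return 1;
    }
    fclose(f);

    sprintf(cmd, "compress -d %s && chmod u+x %s", zname, xname);
    if (system(cmd) != 0) {             /* compress -d blobNNN.Z -> blobNNN */
        fprintf(stderr, "decompression failed\n");
        return 1;
    }

    argv[0] = xname;                    /* pass our arguments straight through */
    execv(xname, argv);
    perror(xname);                      /* only reached if the exec fails */
    return 1;
}

Cleanup of the extracted file (and shared text) is left out of the sketch.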
--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

Alfred Kayser

Aug 21, 1991, 3:07:33 AM
br...@looking.on.ca (Brad Templeton) writes:
>The simplest, but slower technique involves extracting to an executable
>and doing an exec of the executable. That is portable, and while slow, is
>not too bad if it all happens in the buffer cache.
This looks nice because it is simple. The OS takes care of speeding things
up (caching and such).

>To do this, take the source to compress, extract the decompressor from it,
>and modify things to decompress from ram to a temp file, then exec the
>temp file. Then find a way to link your compressed executable in with
>the program perhaps by writing a program to convert it into a simple .o
>with a TEXT block.

Another way could be to cat the decompress program and the compressed
code together. Executing the result will just ignore the appended compressed
code, which can then be read back from the executable file that was started.
I don't know if this will work on all UNIXes, but we can give it a try!
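
A minimal sketch of that idea, under a pile of assumptions (a STUB_SIZE
constant patched in once the stub and the compressed image have been catted
together, a /tmp unpack path, and a plain copy loop standing in where the
real decompressor would go):

/* selfread.c -- hypothetical stub for the ``cat the decompressor and the
 * compressed code together'' idea: the stub opens the file it was started
 * from, seeks past its own length, and unpacks the rest to a temp file. */
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

#define STUB_SIZE 16384L        /* assumed: length of the stub part of the file */

int main(int argc, char **argv)
{
    FILE *in, *out;
    char tmp[64];
    int c;

    in = fopen(argv[0], "r");
    if (in == NULL) { perror(argv[0]); return 1; }
    fseek(in, STUB_SIZE, SEEK_SET);     /* skip ourselves, reach the payload */

    sprintf(tmp, "/tmp/unpack%d", (int)getpid());
    out = fopen(tmp, "w");
    if (out == NULL) { perror(tmp); return 1; }

    /* a real stub would run the extracted decompressor here; a byte-for-byte
     * copy stands in for it so the sketch stays self-contained */
    while ((c = getc(in)) != EOF)
        putc(c, out);
    fclose(in);
    fclose(out);
    chmod(tmp, 0700);

    argv[0] = tmp;
    execv(tmp, argv);
    perror(tmp);
    return 1;
}

One obvious weak point is argv[0]: it is not guaranteed to be a usable path
to the executable on every system, which is one reason this may not work on
all UNIXes.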
--
-- Ir. Alfred Kayser. PACS, OS/2, TCP/IP. --- Email: AKa...@et.tudelft.nl --
-- CARDIT, Delft University of Technology ------------ Tel: (31)-15-786179 --
-- P.O.Box 5031, 2600 GA Delft, The Netherlands ------ Fax: (31)-15-784898 --

Marcus J. Ranum

Aug 21, 1991, 9:39:46 AM
>br...@looking.on.ca (Brad Templeton) writes:
>>The simplest, but slower technique involves extracting to an executable
>>and doing an exec of the executable.

You'd need to think of some clever way to still support shared
text for the decompressed executables, or you'd trade one problem for
another, wouldn't you?

mjr.
--
public key: 0x9817A19372CC74818B713A5017352379C9117918A43938ED4389012918
918AC847272C1939894849A83378840412

John Bowler

Aug 22, 1991, 8:58:37 AM
In article <1991Aug20....@looking.on.ca> br...@looking.on.ca (Brad Templeton) writes:
>The paper at USENIX was boring and talked about 20 year old compression
>techniques, wired into the kernel.

Maybe, but is this relevant? The basic (RISC iX ``squeeze'') compression
algorithm gets results which are identical (or almost identical) to those
of compress (identical in terms of disc utilisation, which is what counts);
however, it compresses in 32k blocks (the page size of the target machine),
allowing demand paging, whereas compress compresses the whole file as one.
The (RISC iX) squeeze algorithm output leaves the compressed 32k sections
at the *SAME* file offsets as the original sections; this constrains the
results quite significantly:-


Compress:-        |load of bytes|

Original File:-   |Header  | Text......  | Data......  |
                  0        32k           N*32k         M*32k

Squeeze:-         |Header Tables | T000 | T000 | ... | D000 | D000 | ...|
                  0              32k    64k    96k... N*32k  +32k  +32k M*32k

So to get any reduction in disc utilisation, compress only has to reduce the
file size by at least one disc fragment (I am assuming a BSD file system,
which is the target in this case); squeeze has to save fragments at the end
or *whole disc blocks* in the middle (the text segment).

Why constrain it this way? Because we wanted executables to still be demand
paged and we couldn't afford to save (uncompressed) text pages to the swap
area. [The swap area was 8Mbyte big, and we didn't have time to ``fix''
BSD so that it didn't insist on pre-allocating pages for program data areas,
so the machine was limited to 8Mbyte of data for *all* processes; if you
start taking text pages out of this (32k at a time) you lose, or rather the
users of the system lose.] (I'm probably justifying this after the event
- the disadvantages of reconstructing the uncompressed image on the disc
are real, but whether they justify not doing it depends critically on the
actual system).
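
For illustration only, a page-in routine built around that layout might look
roughly like the following; this is a sketch under assumed names and table
sizes, with sq_decode() as a stand-in rather than the real RISC iX decoder:

/* squeeze-style demand paging sketch: compressed 32k sections sit at their
 * original file offsets; a header table records how many of those 32k bytes
 * are actually valid compressed data for each section. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SQ_PAGE (32 * 1024)

struct sq_header {
    unsigned long nsections;
    unsigned long comp_len[256];        /* assumed: compressed bytes per section */
};

/* placeholder decoder -- the real scheme would expand a prefix-coded block */
static int sq_decode(const char *src, unsigned long slen,
                     char *dst, unsigned long dlen)
{
    memcpy(dst, src, slen < dlen ? slen : dlen);
    return 0;
}

/* fill 'page' (32k) with the uncompressed contents of section 'idx' */
int sq_page_in(int fd, const struct sq_header *hdr, unsigned long idx, char *page)
{
    static char buf[SQ_PAGE];
    long off = (long)(idx + 1) * SQ_PAGE;   /* +1: the header tables fill page 0 */
    unsigned long clen;

    if (idx >= hdr->nsections)
        return -1;
    clen = hdr->comp_len[idx];
    if (lseek(fd, off, SEEK_SET) < 0)
        return -1;
    if (read(fd, buf, clen) != (long)clen)
        return -1;
    return sq_decode(buf, clen, page, SQ_PAGE);
}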

>Doing this as an all-in-one is very system dependent, but of course
>executables are all pretty system-dependent. If you want to write the
>output into memory and execute the memory you have to really do some
>non-portable things.
>
>The simplest, but slower technique involves extracting to an executable
>and doing an exec of the executable. That is portable, and while slow, is
>not too bad if it all happens in the buffer cache.

Well, the system which we had had 4Mbyte of memory. The buffer cache was
not very big. We could no more afford to write the uncompressed text to a
filing system disc partition than we could afford to write it to the swap
segment. If the system is big enough to manage to avoid writing the buffer
cache it probably has big enough discs to avoid the overheads of compressing
executables.

As you say, it is all system dependent. The constraints on this system were
disc space and page size. The disc was 56Mbyte, *including* the swap
partition, leaving 39Mbytes (just under) for the whole file system (only
one partition). Into this it was necessary to fit an almost complete
BSD4.3+NFS3+X11R2 system. Because of a slight design botch which we could
no longer reverse, the minimum executable file length was 96k; the only thing
which saved us was the fact that the BSD file system could accommodate
sequences of zeros in files by leaving holes.

Eventually we got to a position where the executables were no longer taking
up a major portion of the disc space (about 10Mbytes for /usr/bin, /bin,
/usr/ucb, or something like that). The squeeze compression scheme as
used has memory overhead, because the tables used must be retained for
later demand paging (in the u area if I remember correctly), but this is
only a few Kbytes.

>To do this, take the source to compress, extract the decompressor from it,
>and modify things to decompress from ram to a temp file, then exec the
>temp file. Then find a way to link your compressed executable in with
>the program perhaps by writing a program to convert it into a simple .o
>with a TEXT block. (A slow method would be to output a fat assembler or
>C program full of constants if you don't want to write a .o yourself)

The original scheme which Acorn used (and still uses in its proprietary
operating systems) has little memory overhead - as with this suggestion and
other suggestions in this thread a small piece of code is tacked
onto the executable which decompresses the whole executable. Most of the
code is decompressed ``in place'' (from the end downwards, with the compressed
code at the beginning) - a small amount of compressed code may need to be
copied, and some overhead is required for the encoding tables (about 6kbytes)
and the code. One obvious advantage is that the OS loads all the data in.

In general, program loading from floppy disc is speeded up very significantly
(because the time saved by loading less data more than compensates for the
time taken to execute the decompression), and even from hard disc there
are significant speed advantages. Under a system which supports background
disc IO (such as any UNIX) the advantages are not as great - the system can
be doing something else while waiting for the next lot of data from the disc.
However, speed in RISC iX does not seem to be particularly badly affected
by demand paging of squeezed data - the 32k page size is far more
significant (the kernel has to load the whole of many programs before any
execution can start).

Although the ``self reconstructing executable'' mechanism is processor and
architecture specific it is not OS specific. The constraint is that the
code is written by the program itself, which tends to mean that the code
ends up as writable. If the architecture forbids this then some OS
involvement may well be required.

Compress is not ideally suited to this. During decompression compress
has to have access to the whole of its string table, and the string tables
are notoriously large. By definition the system is constricted in disc
space, and UNIX systems with small amounts of disc space almost invariably
have little memory. Compressing executables with -b12 gives me results which
are worse than those of squeeze. Of course compress's model is wrong - the
machine instructions in question (ARM code) are word sized, and the last
nibble of more than 85% of instructions is 0xE (the ``do this always''
condition code). Using an appropriate model for compressing code and a
different model for data will probably help. Ensuring that literals (in
particular strings) are separated from the code will probably also help.

To give some numbers to this, the following quote is from the original
document describing squeeze, written in 1987 by Richard Cownie:-

``To attempt to find out how good the encoding scheme really is, I have
analysed the word frequency statistics for a number of programs, calculating
how many bits a Huffman encoding would be (excluding any frequency tables).
This gives us a good measure of the amount of redundancy in the file [viewed
as a sequence of words], and thus a lower bound on the size achievable by
any compression algorithm which treats the image just as a sequence of
words. The figures are as follows:

[[Obviously arithmetic encoding would give a better lower bound. JB]]

Program     Size    Huffman    Squeezed (without tables and code)

ab           45K       17K        25K      huff% = 17/45 = 37%
cc          239K       91K       125K                       38%
objasm      264K       81K       103K                       31%
twin         42K       16K        21K                       38%
diff         49K       16K        19K                       33%
initf       406K      150K       199K                       37%''

Richard toyed with the idea of using a more sophisticated model. The
compressibility varies from machine instruction to machine instruction, and
the output of compilers produces very regular sequences of instructions at
procedure entry (this is a consequence of the ARM procedure call
standard(s)). However, we haven't tried anything more complicated.

My current opinion (it changes from time to time :-) is that instruction
(word) based compression is *not* the real answer. There are other
approaches, which can justifiably be called ``compression'' but which use
far more information about the structure of an executable:-

1) Use an intermediate code! A shell script to do something is often
smaller than a compiled C program. If the problem matches the capabilities
of awk, awk scripts are almost invariably smaller than the native code.
Intermediate codes produced by language compilers may be better too.

2) Use shared libraries. Simple compression of code models it as a sequence
of instructions. Most code is actually a sequence of library calls, used
by a large variety of different programs. Shared libraries reduce memory
use at run time as well as disc use! The current version of RISC iX fits
the executables from BSD4.3+NFS4+X11R4(core+some contrib)+Motif1.0+
XPG3(Base)+GNU Emacs+X.desktop plus some home grown programs (692
executables including shell scripts and shared libraries) into under
20Mbyte. Similarly on a SUN OS 4 system (also using shared libraries) the
743 executables (excluding shared libraries) which I found in /usr occupy
just under 29Mbytes. The shared libraries (in /lib) occupy a further
2Mbytes, but these include symbols.

Some of the space saving in RISC iX comes from compression, but most of it
comes from the shared libraries; the shared C library (libc) occupies
208kbytes (libc.so.1.6 occupies 512k on SUN OS 4), and while not all of this
is needed for any particular C program, many programs use a substantial part.
The saving with X11 is even more noticeable: libX11 takes about 236k and
quite a high proportion of it is used by any particular program; the Motif
shared library takes 776k and *most* of it is used by even a trivial program.

These figures are cumulative; a Motif program needs Xm+X11+libc, probably
over 1Mbyte of code, so sharing this saves 1Mbyte of disc space per Motif
program. RISC iX is shipped with only 7 Motif programs, occupying 488k+776k
of disc space; if these did not use shared libraries the (unsqueezed)
executables would occupy 7.5Mbytes on the disc. I cannot believe that an
instruction based compression scheme could reduce this to less than 2Mbytes
of disc space, so even with this small number of Motif programs shared
libraries give a substantial win. (Incidentally, the shared libraries
themselves are not squeezed on RISC iX.)

3) Package core utilities more sensibly. Use hard links where possible!
One of the above 7 Motif programs (myni from X.desktop) has 5 links;
five programs with different functionality implemented using the same
code. Although the functionality is different most of the implementation
is shareable! Various parts of the BSD toolkit use this approach. System V
gathers the basic utilities into a single executable.

Other parts of our system are small executables or shell scripts which
execute the real program with different arguments to obtain slightly
different behaviours - we have one sh implementation which obeys BSD or
X/Open semantics according to a command line option. A generic executable
called ``wrapper'' sits in the X/Open utility directory and masquerades as
various X/Open utilities. If executed it executes the program in /usr/bin
with the same name as itself and the extra argument ``-V'' (;-). Thus we
deal with minor changes in semantics between X/Open and BSD programs.
Similarly there is one C compiler, but the C compiler the programmer sees
is an executable C source file (;-):-

/usr/ucb/cc:-

#/* /usr/lib/cc -ZiBSD
*/
#define _BSD_C 1

semantics are like those of the #! BSD executable shell script; the
image executed ends up as:-

/usr/lib/cc "-ZiBSD" /usr/ucb/cc <original arguments>

/usr/ucb/cc is pre-included before every source file and changes the
behaviour of the header files appropriately (so there is only one set of
header files); the -Zixxx option also causes symbol renaming to be performed
on object modules so that a single C shared library can be used, with
different entry points where the X/Open and BSD API semantics differ.

This all saves run time memory as well as disc space by permitting code
to be shared by concurrently running programs (an X/Open and a BSD C
compilation in parallel will use the same code).
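
For what it's worth, the generic ``wrapper'' described above is small enough
to sketch; the argument layout and the exact position of the -V flag here are
assumptions, not the shipped code:

/* wrapper.c -- hypothetical generic wrapper: whatever name it is invoked
 * under, it execs the /usr/bin program of the same name with an extra -V
 * argument slipped in ahead of the original arguments. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[256];
    char *newargv[66];
    char *base;
    int i;

    base = strrchr(argv[0], '/');           /* name we were invoked under */
    base = base ? base + 1 : argv[0];
    sprintf(path, "/usr/bin/%s", base);

    newargv[0] = path;
    newargv[1] = "-V";                      /* the extra flag */
    for (i = 1; i < argc && i < 64; i++)
        newargv[i + 1] = argv[i];
    newargv[i + 1] = NULL;

    execv(path, newargv);
    perror(path);                           /* only reached if the exec fails */
    return 1;
}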

4) Persuade the compiler writers to optimize for *space* rather than *time*,
where the choice exists - or write code in languages whose compilers are
intended to produce compact code (BLISS?). Again this can save both disc
space and memory at run time.

Ok, I admit none of this is particularly clean, much of it is ad hoc, but
it all helps achieve the essential disc space compression. In essence the
trick is to compress at many levels using many different models.
Compressing the instructions *does* help, but compressing at the function
call level (by sharing libraries), at the program level (by sharing
executables) and in transformations of the original source by compilers
should gain significantly more simply because the model for the compression
is better!

John Bowler (jbo...@acorn.co.uk)

Brad Templeton

Aug 22, 1991, 4:24:38 PM
John, your problem was acute, but it is still an old one not as relevant to
modern problems. Disk space costs $3 per megabyte and often less. Memory
costs about $40/megabyte mail order. The concept of "what if I only have
a few megs of swap" no longer makes sense.

LZW compress is indeed a more expensive method of decompression, and to
gain maximal compression it does require scanning a large block. As such
it is not as suitable as it could be for demand-paged blocks.

However LZ77 compression is not subject to these restrictions. Aside from
giving far better compression than LZW, it decompresses very quickly,
requiring effectively no memory, and it attains quite reasonable compression
values in blocks as small as 4K. (Your pages were 16K weren't they?)
Certainly far better than any modified prefix code scheme as you used based
on what I recall.
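
To see why the decompressor needs effectively no memory, here is a rough
LZSS-style decoder; the token format (a flag byte, then literals or
12-bit-offset/4-bit-length pairs) is something I have made up for
illustration, not any particular product's format:

#include <stddef.h>

/* LZ77/LZSS-style decoding: back-references are copied out of the output
 * already produced, so the window *is* the output buffer and no separate
 * string table is needed. */
size_t lz_decode(const unsigned char *src, size_t srclen,
                 unsigned char *dst, size_t dstlen)
{
    size_t in = 0, out = 0;

    while (in < srclen && out < dstlen) {
        unsigned flags = src[in++];         /* 8 item flags: 1 = literal */
        int bit;

        for (bit = 0; bit < 8 && in < srclen && out < dstlen; bit++) {
            if (flags & (1u << bit)) {
                dst[out++] = src[in++];     /* literal byte */
            } else {
                unsigned off, len;

                if (in + 1 >= srclen)
                    return out;
                /* 12-bit backward offset, 4-bit length (minimum match of 3) */
                off = (unsigned)((src[in] << 4) | (src[in + 1] >> 4)) + 1;
                len = (src[in + 1] & 0x0Fu) + 3;
                in += 2;
                while (len-- > 0 && out < dstlen && off <= out) {
                    dst[out] = dst[out - off];
                    out++;
                }
            }
        }
    }
    return out;                             /* number of bytes produced */
}

Decoding each compressed block independently like this is what makes it
usable for demand paged text.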

John Bowler

Aug 23, 1991, 7:49:21 AM
In article <1991Aug22.2...@looking.on.ca> br...@looking.on.ca (Brad Templeton) writes:
>John, your problem was acute, but it is still an old one not as relevant to
^^^?
>modern problems.

The problem arose around 1988/89 when we started serious UNIX development.
The problem still exists. We still sell UNIX boxes based on a machine with
a 56MByte ST506 disc and 4Mbyte of memory. Our (perhaps more sensible) box
has a 100Mbyte disc and 8Mbyte of memory, but the OS is at least twice the
size (the addition of Motif and X/Open conformance stuff).

> Disk space costs $3 per megabyte and often less. Memory
>costs about $40/megabyte mail order. The concept of "what if I only have
>a few megs of swap" no longer makes sense.

The cost of disc space depends crucially on the disc size. The optimal size
changes with time. Currently (in *Britain*) it is ~120Mbyte for a SCSI
disc, smaller for IDE. The figure you give ($3/MByte) is probably
consistent with the optimal size (I don't know what Acorn pays - such
numbers are obviously closely guarded commercial secrets). If you choose
larger you pay disproportionately more; if you put multiple discs in you
pay in other ways. Next year the size will probably be 200Mbyte - notice
that SUN have moved to this already (I'm thinking about bottom of the range
machines of course). In any case, the figures are there and easy to analyse
- if you want to sell an inexpensive system you have to ``compress''
something. We chose to compress the executables and the swap space :-).
Other people compress the range of executables by leaving most of them out :-).

>LZW compress is indeed a more expensive method of decompression, and to
>gain maximal compression it does require scanning a large block. As such
>it is not as suitable as it could be for demand-paged blocks.
>
>However LZ77 compression is not subject to these restrictions. Aside from
>giving far better compression than LZW,

I thought they tended to be within 10% of each other?

> it decompresses very quickly,
>requiring effectively no memory, and it attains quite reasonable compression
>values in blocks as small as 4K. (Your pages were 16K weren't they?)

32k - good for most forms of compression.

>Certainly far better than any modified prefix code scheme as you used based
>on what I recall.

Well, I gave the figures for squeeze versus Huffman encoding, without any
consideration of the overhead of the tables for the Huffman encoding (they
would tend to be rather large - remember 32 bit words are being compressed).

Regardless of whether there is a better method, the bottom line is that
compress compresses bytes, not words. LZ77 does too, doesn't it? What
matters *first* is the compression model. The figure about conditional
instructions should make the point - conditional instructions occupy 94% of
the instruction space, but constitute only 12% of the instructions actually
used. All data processing instructions fill 25% of the instruction space,
but cover 42% of the instructions used, so about 37% of instructions are
conditional data processing ones (it's more than this actually) but they
account for only 1.5% of the instruction space. Similar figures are
obtained for unconditional load instructions (11% of the instructions in an
executable, but under 1% of the instruction space).

Further analysis reveals that particular types of load and dp instructions
are by far the most frequently used. This is why a prefix code works. This
is (even) why byte based compression works - the model may be wrong, but
the patterns in word uses really do show up in the bytes, particularly when
you visualise each instruction as a 4 byte string.
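
The sort of measurement behind those figures is cheap to reproduce; a
throwaway sketch (assuming little-endian words and treating the whole file
as code, which the real analysis would not):

/* condhist.c -- tally the condition nibble (bits 28-31) of each 32-bit word
 * in a file of ARM code; condition 0xE is ``do this always''. */
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f;
    unsigned long count[16] = {0}, total = 0;
    unsigned char w[4];
    int i;

    if (argc < 2 || (f = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "usage: condhist file\n");
        return 1;
    }
    while (fread(w, 1, 4, f) == 4) {
        count[w[3] >> 4]++;     /* little-endian: top nibble is in the last byte */
        total++;
    }
    fclose(f);

    for (i = 0; i < 16; i++)
        printf("cond %X: %6.2f%%\n", i,
               total ? 100.0 * (double)count[i] / (double)total : 0.0);
    return 0;
}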

This doesn't detract from the fact that using the right model is a very good
idea :-). Neither does it detract from the fact that higher level models
are good too (particularly shared libraries).

The message is - get the model right, then worry about the exact compression
algorithm.

John Bowler (jbo...@acorn.co.uk)

lance.norskog

Aug 23, 1991, 9:02:47 PM
I don't want executables in the first place.

I want everything in source form with an optional incrementally
compiled version in disk cache. I want lazy incremental compilation
so that only code that gets used is stored in binary.

The concept of compilers dates from batch and card-readers.

Lance

Jacques Gelinas

Aug 22, 1991, 8:05:31 AM
From article <1991Aug21.1...@decuac.dec.com>, by m...@hussar.dco.dec.com (Marcus J. Ranum):

>>br...@looking.on.ca (Brad Templeton) writes:
>>>The simplest, but slower technique involves extracting to an executable
>>>and doing an exec of the executable.
>
> You'd need to think of some clever way to still support shared
> text for the decompressed executables, or you'd trade one problem for
> another, wouldn't you?
>
One way is to uncompress to a specific location in the file system, using
a unique path for each uncompressed executable. /usr/bin/vi could be
uncompressed to /tmp/uncomp/usr/bin/vi and executed from there. A log may
be maintained, say /tmp/uncomp/usr/bin/vi.log. This file holds the PIDs of
all processes currently using it. The last one to exit deletes the log and
the uncompressed file. A second user would not have to uncompress the
file, just add its PID to the log file.
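
A rough sketch of that bookkeeping, with locking against two users updating
the log at the same moment deliberately left out:

/* pidlog.c -- hedged sketch of the PID-log idea: before running the
 * uncompressed copy, append our PID to its .log file; afterwards rewrite
 * the log without our PID and, if nobody else is listed, remove both the
 * log and the uncompressed copy. */
#include <stdio.h>
#include <unistd.h>

void log_add(const char *logname)
{
    FILE *f = fopen(logname, "a");

    if (f != NULL) {
        fprintf(f, "%d\n", (int)getpid());
        fclose(f);
    }
}

void log_remove(const char *logname, const char *binname)
{
    FILE *in, *out;
    char tmp[256];
    int pid, left = 0;

    in = fopen(logname, "r");
    if (in == NULL)
        return;
    sprintf(tmp, "%s.new", logname);
    out = fopen(tmp, "w");
    if (out == NULL) { fclose(in); return; }

    while (fscanf(in, "%d", &pid) == 1)
        if (pid != (int)getpid()) {
            fprintf(out, "%d\n", pid);      /* keep everyone else's PID */
            left++;
        }
    fclose(in);
    fclose(out);
    rename(tmp, logname);

    if (left == 0) {                        /* we were the last user */
        unlink(logname);
        unlink(binname);
    }
}

log_add() would be called just before running /tmp/uncomp/usr/bin/vi; since
an exec replaces the caller, the caller would fork, wait for the child, and
then call log_remove().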


The really clean solution to all this already exists but is not widespread:
shared libraries. However, you have little control over how some software
suppliers package (link) their stuff.

Ed Hall

Aug 27, 1991, 12:41:19 PM

The concept of binary arithmetic is even older; what would you replace
it with?

Show me an incremental compiler that can perform useful global
optimizations as well as they do, and I'll agree that the old
"batch-oriented" compilers are entirely outdated. That time isn't
here yet--which is not to deny the advantages of other approaches.

BTW, the idea of "lazy incremental compilation" is hardly new. Some
BASIC interpreters used to use it.

More relevant to comp.compression is this question: is there anything
about source code that makes it innately more--or less--compressible
than executables?

-Ed Hall
edh...@rand.org

Rob MacLachlan

Aug 28, 1991, 6:32:12 PM
In article <1991Aug27....@rand.org> edh...@rand.org (Ed Hall) writes:
>
>Show me an incremental compiler that can perform useful global
>optimizations as well as they do, and I'll agree that the old
>"batch-oriented" compilers are entirely outdated. That time isn't
>here yet--which is not to deny the advantages of other approaches.

Well, if you mean true inter-routine optimization, then that is somewhat
difficult, but not impossible. Complex dynamic OOP languages often have
back-pointers that allow invalidated assumptions to be decached on
redefinition (e.g. the Self compiler).

>More relevant to comp.compression is this question: is there anything
>about source code that makes it innately more--or less--compressible
>than executables?

Well, compressibility ~= entropy. It seems pretty clear to me that a
compiler introduces entropy into the program implementation that is not
required by the language semantics. It does this in the process of making
arbitrary decisions, such as which register to stick a value in. A
zero-address stack-machine program has lower entropy than a program for a
general register machine.

Source code has unnecessary entropy in the form of names, comments, etc.
Semantically it makes no difference what a variable is called, but that
is a part of the source.

Rob

John Bowler

Aug 29, 1991, 6:28:48 AM
In article <1991Aug27....@rand.org> edh...@rand.org (Ed Hall) writes:
>More relevant to comp.compression is this question: is there anything
>about source code that makes it innately more--or less--compressible
>than executables?

The comments. Compilation (typically) removes information (comments, long
variable names etc) which lossless source compression cannot remove -
compilation is lossy! On the other hand some (but certainly not all)
compilation is optimised for speed at the expense of space - in-lining,
macro expansion, loop unrolling.

Numbers? For an X server implementation compiled for the ARM:-

                         size(bytes)    compress -b16
                         -----------    -------------
source                      2266k            864k
after removing comments     1384k            520k
binary                       488k            280k

(Repeated white space was also converted to a single space when the comments
were removed, none of the figures include source or code for libraries or
library headers. Binary size is total image size, but no symbols.)

John Bowler (jbo...@acorn.co.uk)
