Challenge: More Instructions in Fewer Bytes

Johann 'Myrkraverk' Oskarsson

unread,

May 29, 2012, 7:49:32 PM5/29/12

to

Hi all,

I'm sorry if this is a bit naive. Long ago, in an IRC channel far
far away, the following idea for a challenge appeared:

Write two versions of the same routine/algorithm where the one using
more instructions is fewer bytes in the resulting binary.

This may be trivial, particularly since I'm not giving any restrictions
on what the code is supposed to do.

I decided to throw this into usenet and see if the experts deem this a
worthy challenge. The goal is to have fun and come up with interesting
size optimizations. There is no "reward" as such.

Again, if this is trivial and pointless, then sorry for the bother.

Johann

Terje Mathisen

unread,

May 30, 2012, 2:27:42 AM5/30/12

to

Johann 'Myrkraverk' Oskarsson wrote:
> Hi all,
>
> I'm sorry if this is a bit naive. Long ago, in an IRC channel far
> far away, the following idea for a challenge appeared:
>
> Write two versions of the same routine/algorithm where the one using
> more instructions is fewer bytes in the resulting binary.

This is indeed trivial.

You can use multiple single-byte and two-byte opcodes in the space taken
by a single "operation reg,[reg+scale*reg+offset]" instruction.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Johann 'Myrkraverk' Oskarsson

unread,

May 30, 2012, 5:38:06 PM5/30/12

to

Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> writes:

> This is indeed trivial.
>
> You can use multiple single-byte and two-byte opcodes in the space
> taken by a single "operation reg,[reg+scale*reg+offset]" instruction.

I see. I was brought up in mips where every instruction is 32bit wide.
The concept of variable size instructions is rather new to me.

Thank you.

Rugxulo

unread,

May 30, 2012, 11:25:35 PM5/30/12

to

Hi, Myrk, :-)

On May 30, 4:38 pm, Johann 'Myrkraverk' Oskarsson

Perhaps he shouldn't have said "trivial". It can be, but of course
real life tends to lean towards complexity.

But yes, for a few simple examples, see below (and the NOPs were just
to separate things for clarity):

B800000000 mov eax,0x0 ; 5 bytes
90 nop
9C pushfd ; 1+2+1 bytes
31C0 xor eax,eax
9D popfd
90 nop
90 nop

B8FFFFFFFF mov eax,0xffffffff ; 5 bytes
90 nop
31C0 xor eax,eax ; 2+1 bytes
48 dec eax
90 nop
90 nop

83C602 add esi,byte +0x2 ; 3 bytes
90 nop
46 inc esi ; 1+1 bytes
46 inc esi
90 nop
90 nop

And for reference, the max instruction size for 8086 was 6 bytes, 286
was 10 bytes, and 386 was 15 bytes (if that tells you anything).

P.S. Keep in mind that "most" people (and compilers) don't optimize
much for size, only for speed.

Kulin

unread,

May 31, 2012, 6:55:43 AM5/31/12

to

For HLL the variable length encoding is fine because they never have to work
at the assembly level. For assembly coders the variable length encoding,
specifically Intel, is really ugly, but then so is everything else about
Intel.

RISC designs with fixed length instructions are sure nice to code assembly
on. You usually get trapped right away if you try to branch or load/store a
bad location. Doing address arithmetic in your head is also easier.

Harold Aptroot

unread,

Jun 4, 2012, 6:34:14 AM6/4/12

to

On May 31, 12:55 pm, Kulin <remai...@nospicedham.reece.net.au> wrote:

For HLL's the variable length is fine because it never matter, for
assembly the variable length is fine because you can just use labels
and let the assembler do the math. In most cases anyway. It would
rather ugly if you had to do address math yes, but you don't.

wolfgang kern

unread,

Jun 8, 2012, 2:06:03 PM6/8/12

to

Kulin posted:

Yeah. But I never had any problems with either RISC nor CISC instruction-
set on all the various MCU/CPU I ever had to work on. Sure some tools for
fixed sized instructions may look easier to implement, but as a matter of
fact I figured that an overall code-size and performance gain(size+speed)
is much better achievable with the variable OPcode-size [Z-80...x86+++].

Freqently used single byte opcodes are hard to beat by fixed sized quads,
and by the way JFYI: x86-CPUs work internal RISC-based anyway.

If you think that AMD/Intel code is ugly, then share my view about ARM
or ye olde IBM and others ... my point of view always were and still is:
what kind of shit works here? [everyting is bloated and slow like M$ :)]

__
wolfgang

Markus Wichmann

unread,

Jun 9, 2012, 5:12:56 PM6/9/12

to

Johann 'Myrkraverk' Oskarsson <myrkr...@nospicedham.yahoo.com> schrieb:

Well, yeah, it is trivial:

1. Take _any_ algorithm and copy it.
2. In the "fewer bytes" version, you add a "nop; nop" (2 bytes, 2
instructions)
3. In the "more bytes" version, you add "nop rax, [rax+4*rax+10]"
- 1 byte for REX
- 1 byte for nop
- 1 byte for ModRM
- 1 byte for SIB
=> 4 bytes, 1 instruction

So the "fewer bytes" version is 2 bytes shorter but has 1 more
instruction than the "fewer instructions" version.

HTH,
Markus