
Branch Displacement Optimization


rand...@earthlink.net

Nov 2, 2006, 7:00:04 PM
Recently, Rene "Betov" Tournois has seen the light and thought that
"small-first, grow large" is a good idea. Somehow, he thinks that this
is a new and nifty idea. Of course, the explanation for it has been in
this very newsgroup since Jan, 2004. See the following link and
follow-ups:

http://groups.google.com/group/alt.lang.asm/browse_frm/thread/528907cad91150c4/a49af5f25dd15296?lnk=gst&q=branch+displacement+optimization&rnum=4#a49af5f25dd15296

However, because this seems to be of current interest, I'll repost that
essay here:

==================================================
This is a short essay that attempts to explain the
basics of "displacement optimization" in machine
code such as on the x86. In posts I've made here
and on various newsgroups (comp.lang.asm.x86 and
alt.lang.asm) it is quite clear that many people
don't understand what this is all about, and even
those who do understand the basic issues can't
believe the problem is as difficult to solve
(optimally) as I claim it is. In this essay
I'm hoping to explain the process and the
problems in such a way as to make all this clear.


What is This All About?


The first place to start is with the question
"what is branch displacement optimization?" On
certain CPUs (e.g., the x86), certain instructions
have a limited branch range because of the way the
CPU encodes the target address of the branch. For
example, a typical "JE" instruction on the x86 is
two bytes long - one byte for the opcode and one
byte for a relative displacement. This relative
displacement allows the instruction to transfer
control to some instruction within +/- 128 bytes
(roughly) of the JE instruction.
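The arithmetic behind that range can be sketched in a few lines of Python (a toy illustration; the addresses are hypothetical, not taken from any real program):

```python
def short_je_displacement(insn_addr, target_addr):
    """Return the rel8 displacement for a two-byte JE at insn_addr,
    or None if the target is out of range.

    The CPU measures the displacement from the address of the *next*
    instruction (insn_addr + 2), so the reachable window is
    insn_addr + 2 - 128 .. insn_addr + 2 + 127."""
    disp = target_addr - (insn_addr + 2)
    return disp if -128 <= disp <= 127 else None

# A target 20 bytes past the end of the JE fits in one signed byte:
assert short_je_displacement(0x1000, 0x1016) == 20
# A backward branch works the same way:
assert short_je_displacement(0x1000, 0x0FF0) == -18
# A target 1022 bytes ahead does not fit:
assert short_je_displacement(0x1000, 0x1400) is None
```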


What happens if the target address is *not* within
range? Well, if the target is within +/- 2GB, the
386 and later processors have a special six-byte
version of JE (two-byte opcode and four-byte
displacement) that can transfer control to the
label. If you're operating on a chip earlier than
the 386, you have to emit a five-byte sequence
like the following:


jne skipJmp ;two bytes
jmp target ;three bytes
skipJmp:


In early x86 assemblers (and in a few assemblers
that have been created lately), it was the
programmer's responsibility to manually encode the
displacement sizes of these jumps. Besides being a
pain in the butt to do manually, the programmer
often *missed* several optimization opportunities
(hey, who wants to count bytes in a program?), so
the program was larger than it needed to be.


It doesn't take a huge intuitive leap to realize
that this kind of grunt work is *exactly* the kind
of thing computers were designed to handle. As
such, in the late 1980's, various assemblers
started appearing that automatically handled the
correct branch displacements (in most cases,
anyway). I don't personally recall the name of the
product, but the first time I saw this feature was
in a MASM 5.1-compatible assembler (not MASM
itself); TASM had the feature next. I believe
Microsoft added this feature in MASM v6.0. Today,
assemblers like Gas, FASM, and NASM all support
this feature (and probably others do too; I
haven't looked that closely).


Why is this optimization even necessary? Well,
it's pretty obvious that if the target location of
a jump instruction is within +/- 128 bytes, you're
wasting four or more bytes if you're not using the
smaller version of the jump instruction.
Considering that most branches turn out to be
within this range and there are generally quite a
few branches in a program, just using the long
version of each branch can make the program quite
a bit larger than it needs to be.


The Problem


"So what's the big deal?" You're probably asking.
"Why not just scan through the file during
assembly and determine the distance to the target
location and pick the appropriate size for the
branch instruction?"


The problem is, the size of a given branch
instruction may determine the size(s) of some
other branch instruction(s). Consider the
following simple code sequence:


jmp target
<exactly 125 bytes of code>
jmp someOtherTarget
target:


If "someOtherTarget" is within the +/- 128 byte
range of the second jmp instruction above, you can
encode it with only two bytes on the x86;
otherwise you will need at least three bytes
(five, in 32-bit code). If we do encode this
second jump instruction in two bytes, then we can
encode the first jump instruction above in two
bytes as well (the actual range on the x86 is +127
bytes forward or -128 bytes backward, measured
from the *next* instruction after the conditional
jump). OTOH, if the second jump
requires more than two bytes, so will the first
jump. Consider the following degenerate case:


jmp l1
<<125 bytes of code>>
jmp l2
l1:
<<125 bytes of code>>
jmp l3
l2:
<<125 bytes of code>>
jmp l4
l3:
.
.
.
etc.


The *last* jump in this sequence controls the
sizes of all the other jumps in the code! This is
because if the last one is a two-byte opcode, then
the next-to-last one can be two bytes, then the
one before that can be two bytes, then the one
before that...


Unfortunately, there is no easy way (and, as
you'll soon see, no efficient way) to determine
the size of these jump instructions in a single
pass over the object code the assembler generates.
A typical assembler will do the following:


1. Scan over all the opcodes and determine if any
branches can be reduced in size.


2. If the answer is yes, the assembler reduces the
size of those branches and adjusts all other
instructions whose target addresses and other
values have changed as a result of reducing the
size of the branch instructions.


3. If the answer was yes, repeat steps 1-3 until
you make a pass with no optimizations occurring.


Note that, in the worst case, this algorithm runs
in O(n^2) time (that is, the amount of time it
takes, worst case, is proportional to the square
of the number of branches in the program; double
the number of branches and it takes four times as
long to process them). This isn't very good. But
it isn't terrible, either.
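The three steps above can be sketched in Python. This is a toy model, not any real assembler's internals: a program is just a list of fixed-size byte runs, labels, and unconditional jmps whose size is either 2 (short) or 5 (long, 32-bit). To keep the shrink test obviously correct, each candidate jump is tentatively shortened and the whole layout recomputed:

```python
# Toy item model (hypothetical, for illustration only):
#   ('bytes', n)      n bytes of non-branch code
#   ('jmp', label)    an unconditional jmp to label
#   ('label', name)   a label definition
SHORT, LONG = 2, 5  # jmp sizes in 32-bit code

def layout(items, sizes):
    """Compute each item's address and each label's address."""
    addr, addrs, labels = 0, {}, {}
    for i, item in enumerate(items):
        addrs[i] = addr
        if item[0] == 'bytes':
            addr += item[1]
        elif item[0] == 'jmp':
            addr += sizes[i]
        else:
            labels[item[1]] = addr
    return addrs, labels

def optimize(items):
    """Start every jmp LONG, then repeat steps 1-3: shrink any jmp
    whose rel8 displacement fits, until a pass makes no change."""
    sizes = {i: LONG for i, item in enumerate(items) if item[0] == 'jmp'}
    changed = True
    while changed:                      # step 3: repeat until a clean pass
        changed = False
        for i in sizes:                 # step 1: find shrinkable branches
            if sizes[i] == SHORT:
                continue
            trial = dict(sizes)
            trial[i] = SHORT            # tentatively shrink this jmp...
            addrs, labels = layout(items, trial)
            disp = labels[items[i][1]] - (addrs[i] + SHORT)
            if -128 <= disp <= 127:     # ...keep it if the target now fits
                sizes[i] = SHORT        # step 2: commit; addresses shift
                changed = True
    return sizes

# The cascading chain from the text: shrinking the last jmp lets the
# earlier ones shrink on later passes, so all end up short.
chain = [('jmp', 'l1'), ('bytes', 125),
         ('jmp', 'l2'), ('label', 'l1'), ('bytes', 125),
         ('jmp', 'l3'), ('label', 'l2'), ('bytes', 125),
         ('label', 'l3')]
assert all(size == SHORT for size in optimize(chain).values())
```

Note how the chain needs one extra pass per jump that shrinks: that cascade is exactly where the quadratic worst case comes from.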


Though the algorithm above looks easy enough, it
turns out to be insufficient. Consider the
following trivial code sequence:


target2:
jmp target1
<< 124 bytes of object code>>
jmp target2
target1:


If we start off assuming that the branches are
long (five bytes each, in 32-bit code), then a
quick pass through
this section of code will suggest that
optimization is not possible. However, were you to
arbitrarily select one of these jumps and make it
a short jump, the other jump would also be short
and both jumps would be within range. Therefore,
an assembler that starts off using large
displacements and shrinks them down can miss
optimization opportunities such as this one.
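You can verify this deadlock directly. A minimal sketch in Python, using a hypothetical byte model (2-byte short jmp, 5-byte long jmp in 32-bit code, 124 filler bytes as above):

```python
SHORT, LONG = 2, 5  # jmp sizes in 32-bit code (hypothetical model)

def in_range(size_a, size_b):
    """Given a size for each jmp, is every SHORT jmp's rel8
    displacement representable? rel8 is measured from the end of
    the 2-byte instruction.
    Layout: target2: jmp target1 / <124 bytes> / jmp target2 / target1:"""
    b_addr = size_a + 124          # address of the second jmp
    target1 = b_addr + size_b      # label just past the second jmp
    target2 = 0                    # label at the very top
    ok_a = size_a == LONG or -128 <= target1 - (0 + SHORT) <= 127
    ok_b = size_b == LONG or -128 <= target2 - (b_addr + SHORT) <= 127
    return ok_a and ok_b

# From the all-long state, shrinking either jump *alone* fails...
assert not in_range(SHORT, LONG)   # first jmp's target lands +129 away
assert not in_range(LONG, SHORT)   # second jmp's target lands -131 away
# ...yet shrinking *both* at once is perfectly valid:
assert in_range(SHORT, SHORT)      # displacements are +126 and -128
```

A shrink-one-at-a-time pass never reaches the all-short state, because neither single shrink is valid on its own.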


Another possibility is to start with all short
displacements and expand them as necessary. The
reason many assemblers start with long
displacements and shrink them is for reasons of
robustness -- the code will work, even if it's
less optimal, if a defect causes you to miss an
optimization that could have been introduced into
the object code. OTOH, if you start with short
displacements and extend the ones that are out of
range, a defect in the "correction" process
results in defective code generation. As it turns
out, optimization isn't guaranteed when you start
small and work big, either. So those assembler
authors who chose the robust approach aren't
necessarily missing out on anything.


OTOH, *most* branches are short to begin with, so
starting off with short displacements and
increasing the size of those that are out of range
is going to be faster in practice: if only m of
the n branches ever need adjusting, and m is much
smaller than n, the work is closer to O(m^2) than
O(n^2).


The hard part for people to believe is that
starting small and working big doesn't necessarily
guarantee an optimally sized object file. In
fact, and this is the part that really blows
people away, sometimes the smallest object file is
achieved by making certain branches *larger*, even
though they could have been encoded as short
branches!


Here's a short example of some MASM source code
that demonstrates this problem:


.386
.model flat, syscall
.code
_HLAMain proc


jmpLbl: jmp target
jmpSize = $-jmpLbl
byte 32 - jmpSize*2 dup (0)
target:


_HLAMain endp


end


Here's the assembly listing:


.386
.model flat, syscall
00000000 .code
00000000 _HLAMain proc


00000000 EB 1C jmpLbl: jmp target
00000002 = 00000002 jmpSize = $-jmpLbl
00000002 0000001C [ byte 32 - jmpSize*2
dup
(0)
00
]
0000001E target:


0000001E _HLAMain endp


end


Note that the "object code" is 30 bytes long
(1Eh). Also note that the jump instruction is two
bytes long (that is, it's a short jump).


Now look at what happens when we force the jump to
be a five-byte jump:


.386
.model flat, syscall
.code
_HLAMain proc


jmpLbl: jmp near ptr target
jmpSize = $-jmpLbl
byte 32 - jmpSize*2 dup (0)
target:


_HLAMain endp


end


;;; Listing:


.386
.model flat, syscall
00000000 .code
00000000 _HLAMain proc


00000000 E9 00000016 jmpLbl: jmp near ptr target
00000005 = 00000005 jmpSize = $-jmpLbl
00000005 00000016 [ byte 32 - jmpSize*2
dup
(0)
00
]
0000001B target:


0000001B _HLAMain endp


end


Note that although the jump is now *five* bytes
long (rather than two), the object module is
actually *shorter* (27 bytes [1Bh] rather than
30).


Therefore, you cannot guarantee that you've got
the shortest possible program by simply making all
the jumps as short as they can be and letting it
go at that.


Unfortunately, as this last example demonstrates,
it's quite difficult to determine the optimal
selection of short and long branch instructions in
order to optimize your code. In fact, there have
been mathematical proofs showing that this problem
is "NP-Complete", which means that no known
efficient algorithm is guaranteed to find an
optimal solution; to guarantee optimality, you
have to try all possible combinations of short and
long displacements in the object file and select
the (or one of the) combination(s) that is the
shortest. Unfortunately, this is an intractable
problem. The algorithm given earlier runs in
quadratic time (O(n^2)), which isn't great, but is
still tractable. The best known algorithms for
NP-Complete problems run in exponential time
(i.e., O(2^n), where 'n' is the number of branches
in the program). This means
that adding a *single* branch doubles the amount
of time needed to determine the optimal output
module. To put it bluntly, no computer available
today and, indeed, no computer we can ever imagine
building will be fast enough to handle the
branches found in a reasonably-sized application
program.
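For tiny inputs, the exhaustive search can actually be written down. A Python sketch, using the same hypothetical item model as the examples above (purely illustrative; real programs have far too many branches for 2^n enumeration):

```python
from itertools import product

SHORT, LONG = 2, 5  # jmp sizes in 32-bit code

def total_size(items, sizes):
    """Total bytes under one size assignment, or None if some SHORT
    jmp's target is outside the rel8 range -128..+127."""
    addr, addrs, labels = 0, {}, {}
    for i, item in enumerate(items):       # lay out addresses
        addrs[i] = addr
        if item[0] == 'bytes':
            addr += item[1]
        elif item[0] == 'jmp':
            addr += sizes[i]
        else:
            labels[item[1]] = addr
    for i, item in enumerate(items):       # check every short jmp
        if item[0] == 'jmp' and sizes[i] == SHORT:
            disp = labels[item[1]] - (addrs[i] + SHORT)
            if not -128 <= disp <= 127:
                return None
    return addr

def brute_force(items):
    """Try all 2**n short/long combinations; return the smallest
    feasible total size -- guaranteed optimal, hopelessly slow."""
    jumps = [i for i, item in enumerate(items) if item[0] == 'jmp']
    best = None
    for combo in product((SHORT, LONG), repeat=len(jumps)):
        size = total_size(items, dict(zip(jumps, combo)))
        if size is not None and (best is None or size < best):
            best = size
    return best

# The mutually-dependent pair from earlier: only the combination with
# *both* jumps short yields the minimal 128-byte encoding.
pair = [('label', 't2'), ('jmp', 't1'), ('bytes', 124),
        ('jmp', 't2'), ('label', 't1')]
assert brute_force(pair) == 2 + 124 + 2
```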


Yet assemblers available today *do* perform branch
displacement optimization. So how come they don't
run really slowly? The answer is that these
assemblers *don't* guarantee that you'll get an
optimal program. AFAIK, for example, none of them
handle the last case I gave. Most of them simply
use the "start large, work small" or "start small,
work large" algorithms given earlier (for those
who have had a data structures or algorithm
analysis course, you can think of this as a form
of a "greedy" algorithm). As such, while they
produce *good* code (typically better than what
someone would produce by hand), they do not
necessarily produce optimal code.


In fact, some assemblers limit the amount of
optimization they do in order to keep assembly
times low. TASM is famous for this; TASM has the
/m# command-line option that lets you specify the
maximum number of passes (this is, generally, used
to actually *increase* the default number of
passes that TASM performs). I don't have any
information on MASM. However, the fact that FASM
usually produces better code than MASM suggests
that MASM stops at some point before FASM does. I
have no information on how NASM or Gas handles
branch displacement optimization, so I cannot
comment on their performance.


That's not to say that there is anything wrong
with the way these assemblers do their
"optimization." Most of the time they probably do
produce the optimal program, and when they don't,
they produce something very close to it. That's a
reasonable trade-off given the intractable
alternative.


Although branch displacement optimization is a
tedious chore and it's difficult to add to an
assembler, it is a prime example of one of those
things that computers are much better suited for
than human beings. A decade and a half ago it
might have been reasonable to expect the
programmer to manually take care of branch
displacement optimizations. But given the fact
that most reasonable assemblers today offer this
facility, it's moved into the "must-have" category
for anyone developing a serious assembler.

johnzulu

Nov 3, 2006, 1:49:55 AM

>
>Though the algorithm above looks easy enough, it
>turns out to be insufficient. Consider the
>following trivial code sequence:
>
>
>target2:
> jmp target1
> << 124 bytes of object code>>
> jmp target2
>target1:
>
>
>If we start off assuming that the branches are
>long (four bytes each), then a quick pass through
>this section of code will suggest that
>optimization is not possible. However, were you to
>arbitrarily select one of these jumps and make it
>a short jump, the other jump would also be short
>and both jumps would be within range. Therefore,
>an assembler that starts off using large
>displacements and shrinks them down can miss
>optimization opportunities such as this one.
>

Okay, so you mean that when the assembler shrinks a near jmp to a
short jmp, it shrinks the code section, which in turn affects the
optimization of the rest of the jmps?


>
>Another possibility is to start with all short
>displacements and expand them as necessary. The
>reason many assemblers start with long
>displacements and shrink them is for reasons of
>robustness -- the code will work, even if it's
>less optimal, if a defect causes you to miss an
>optimization that could have been introduced into
>the object code. OTOH, if you start with short
>displacements and extend the ones that are out of
>range, a defect in the "correction" process
>results in defective code generation. As it turns
>out, optimization isn't guaranteed when you start
>small and work big, either. So those assembler
>authors who choose the robust aren't necessarily
>missing out on something.
>
>
>OTOH, *most* branches are short to begin with, so
>starting off with short displacements and
>increasing the size of those that are out of range
>is going to be faster (O(n^2) >> O(m^2) if n>m).
>

Okay the cascading effect.

>
>The hard part for people to believe is that
>starting small and working big doesn't necessarily
>guarantee an optimal sized object file. In fact,
>and this is the part that really blows people
>away, sometimes the smallest object file is
>achieved by making certain branches larger that
>could have been encoded as short branches!
>
>
>Here's a short example of some MASM source code
>that demonstrates this problem:
>
>
> .386
> .model flat, syscall
> .code
>_HLAMain proc
>
>
>jmpLbl: jmp near ptr target

I think you mean:
jmpLbl: jmp target

Yes, because the allocated byte array becomes shorter when the
instruction becomes longer. Once I remove the "*2", the address of
target becomes the same.

Let's say that there is no instruction, so there is a full 32 bytes:

0 ins = 32 bytes
1 ins = 30 bytes
2 ins = 28 bytes
3 ins = 26 bytes
4 ins = 24 bytes
5 ins = 22 bytes


ins = number of jmp instructions
bytes = allocated filler bytes (db)

But I fail to see the application here. If you mean that the
programmer wanted *such* behaviour with the "*2", then it is agreeable
that optimization would fail here.


>
>Although branch displacement optimization is a
>tedious chore and it's difficult to add to an
>assembler, it is a prime example of one of those
>things that computers are much better suited for
>than human beings. A decade and a half ago it
>might have been reasonable to expect the
>programmer to manually take care of branch
>displacement optimizations. But given the fact
>that most reasonable assemblers today offer this
>facility, it's moved into the "must-have" category
>for anyone developing a serious assembler.

I agree with a must have feature.


John

japheth

Nov 3, 2006, 2:58:33 AM

> The hard part for people to believe is that
> starting small and working big doesn't necessarily
> guarantee an optimal sized object file. In fact,
> and this is the part that really blows people
> away, sometimes the smallest object file is
> achieved by making certain branches larger that
> could have been encoded as short branches!
>

The example supplied is utterly artificial. Surely nothing that "blows
people away". I cannot see how it demonstrates any assembler-related
problem.

Betov

Nov 3, 2006, 3:10:46 AM
"rand...@earthlink.net" <rand...@earthlink.net> wrote in
news:1162512002....@b28g2000cwb.googlegroups.com:

> 1. Scan over all the opcodes and determine if any
> branches can be reduced in size.
>
>
> 2. If the answer is yes, the assembler reduces the
> size of those branches and adjusts all other
> instructions whose target addresses and other
> values have changed as a result of reducing the
> size of the branch instructions.
>
>
> 3. If the answer was yes, repeat steps 1-3 until
> you make a pass with no optimizations occuring.

Playing the teacher again, without having ever done
anything real, clown?

Yes, this is the traditional way, known by EVERYBODY.

What i was proposing, at that time, was to NOT "repeat
steps 1-3", and this was apparently such an innovative
idea, that you attacked it, saying that this was
impossible.

Since then, i have shown that this was possible... by
DOING IT, with a Mapping Simulation of what a Multi-
Passes does, as i was proposing to do. And of course,
the result is so faster that i can say that... "it
takes no time".

Unfortunately (RosAsm being under constant development), there is a
nasty bug somewhere that i must have introduced when doing some
modification, and as i have not used this feature for a very long
time, the fix will probably take some time.

... So i CAN'T point to the facts with evidence, and this is the real
reason why you are now attacking on this topic.

I have already said that i will post a message when this
bug will be fixed. So, in between, feel free to be the nerd
you usually are: You will win on that point: You are a real
nerd.


Betov.

< http://rosasm.org >


Betov

Nov 3, 2006, 4:42:13 AM
"japheth" <ma...@japheth.de> wrote in
news:1162540713.913592.154930@m7g2000cwm.googlegroups.com:

All of this is pure propaganda, and has no relationship with any
reality concerning RosAsm, as long as the Jumps sizes in RosAsm are
under the control of the programmer.

Now, the programmer himself might have occasionally defined long
jumps where they could be short. This is a case when some kind of
"size optimization" can be considered, and i implemented it more for
the programming fun than for anything serious: Who could care about
saving a couple of Bytes, here and there, in a Win32 Application
written in Assembly?... Yes, there _are_ guys who believe that this
could mean something. Well, this is evidently absurd, but this is not
the problem. It was just a funny challenge, and an interesting logical
problem, of doing this without the traditional Multi-Passes.


Betov.

< http://rosasm.org >


hutch--

Nov 3, 2006, 8:09:21 AM
I have heard most of the theory about jump length optimisation, and I
know that most assemblers will do it automatically, but where I have
bothered to test the different lengths, I have yet to find a
difference in any algorithm that actually does anything fast.

In MASM it is easy enough to set up because of the specifiers SHORT or
NEAR so all you do is make an equate, something like,

DISTANCE equ NEAR

Use DISTANCE in an algorithm that has enough jumps in it and clock it.

Change the equate to,

DISTANCE equ SHORT

and see if there is any difference. I have yet to see any in 32-bit
code. The speed factor is almost exclusively related to memory access,
where you can see the difference in timing by reducing the number of
memory accesses.

After testing this area, I use the default assembler-calculated length
because there is no gain in specifying the jump distance in terms of
speed.

Regards,

hutch at movsd dot com

KiLVaiDeN

Nov 3, 2006, 8:31:23 AM
> rand...@earthlink.net wrote:
>
> [...]

As far as I can see, he has implemented this optimization for a while
already, so I really don't see how relevant it is to attack him on
that matter, except if it's because you want to prove "something" to
the readers.

In my opinion, there are several things that we can learn from this
post :

- Betov made a mistake thinking the optimization was useless,
didn't recognize it at first, was stubborn for some moments, and
finally implemented it, in its own way. Ok, fine, it's good to know how
Betov's way of handling optimization goes forward, and that he found
his own way to do it, more efficiently.

- Randall Hyde was aware of the optimization for a while, claimed
it was a big important defect in RosASM assembler, and now is putting
it back on the table to push the debate even further, in order to
discredit Betov once more or/and credit himself.

- My attempt at bringing peace to the table was "once more" a
failure; things eventually got even worse afterwards.

- The war between you both will never stop, unless you both take a
step back to be on common grounds and to stop the bashing : what I
doubt will ever happen.

- We now know about this optimization in detail. At least this is
something positive the post brings; if it were only that, there
wouldn't be a problem, and people would be happy with the "teaching".
But with all the war flavor accompanying the knowledge, we could
easily lose the really interesting content in favor of the "less
important" part, which is your arguments attacking RosASM or René.

Isn't it time to stop the useless bashing and have grown up discussions
on here ??

Cheers
K

rh...@cs.ucr.edu

Nov 3, 2006, 9:40:54 AM

johnzulu wrote:
> >
>
> Okay, you mean is that by decreasing the near jump to short jmp *by
> the assembler* IT will shrink the code section where by it whacks the
> others optimization for the rest of the jmps?

Yup. The example I gave was a bit contrived (basically to make it
easier to understand). Real-world examples do exist. Try searching for
some refereed professional journal articles on the subject. This is
how I learned about this subject, about a decade ago.

> >
> Yes b'cos the allocated byte array becomes much shorter when the
> instrunction becomes much longer. Once I remove the "*2" the address
> for target becomes the same.

Again, a contrived example just to prove the point. Don't get the
impression that this is the only way this happens. That being said,
also don't get the impression that this commonly happens.

>
> But I fail to see the application here.

Just a demo that it *can* happen. And if it can happen with this
contrived example (which, btw, I developed back in the days of 16-bit
segments, hence a few little irregularities), it's certainly possible
for it to happen in other instances.

> But if you mean that the
> programmer wanted *such* behaviour with "*2" it is agreeable that
> optimization would fail here.

Again, a contrived example; you're reading too much into this. All
this is, is a demonstration that the problem *does* exist. This isn't
a particular pattern I'd expect you to find in programs. Of course,
because of the degenerate-case nature of this problem, it's not like
you'll find *many* examples of failure patterns in most programs.

The purpose of this exercise is just to demonstrate that, contrary to
claims being made elsewhere, you cannot "optimize" all the branch
displacements in two passes. In the worst case, it takes O(2**n)
tries (n being the number of branches in the program).

Cheers,
Randy Hyde

rh...@cs.ucr.edu

Nov 3, 2006, 9:43:10 AM

japheth wrote:
> > The hard part for people to believe is that
> > starting small and working big doesn't necessarily
> > guarantee an optimal sized object file. In fact,
> > and this is the part that really blows people
> > away, sometimes the smallest object file is
> > achieved by making certain branches larger that
> > could have been encoded as short branches!
> >
>
> the example supplied is utterly artificial.

Of course it is. And rest assured that "real-world" examples are
fairly rare. Otherwise, no assembler would be able to do a decent,
tractable job of optimization.

> Surely nothing which "blows
> people away".

It isn't the example, but the fact that making a jump longer shortens
the code. Completely counter-intuitive.


> I cannot see how it demonstrates any assembler related
> problem.

It demonstrates that you cannot optimize all the branch displacements
in two passes, as is being claimed elsewhere.
Cheers,
Randy Hyde

Betov

Nov 3, 2006, 9:49:08 AM
"hutch--" <hu...@movsd.com> wrote in
news:1162559361.031685.9180@m73g2000cwd.googlegroups.com:

> and see if there is any difference. I have yet to see any in 32 bit
> code. The speed factor is almost exclusively related to memory access
> where you can see the diference in timing by reducing the number of
> memory accesses.
>
> After testing this area, I use the default asembler calculated length
> becuase there is no gain in specifying the jump distance in terms of
> speed.

Of course. The so-called "Jumps Size Optimization" is, as its name
says, an optimization of the _SIZE_, and in no way of the speed.
Everybody knows this.

The Topic is flat propaganda, and if i played with it in RosAsm, this
was for nothing but the programming fun, and for the challenge of a
new idea of mine (maybe i have been re-inventing yet another wheel, by
the way, but, for sure, FASM -which was the topic i and VID were
talking about- did not invent that one).

When i am talking of "speed", this is about the speed of my Method
(the _speed_ of the method i implemented inside the Assembler, for
calculating the distances, not the speed of the output binary...)

The only cases when a "Jumps Size Optimization" can have any
_practical_ effect are inside a LOOP, and beside a JECXZ, given the
scope limitations of these Instructions. And, anyway, in RosAsm
Syntax, the jumps Sizes are under the programmer's control...


Betov.

< http://rosasm.org >


rh...@cs.ucr.edu

Nov 3, 2006, 9:55:14 AM

Betov wrote:

:
>
> > 1. Scan over all the opcodes and determine if any
> > branches can be reduced in size.
> >
> >
> > 2. If the answer is yes, the assembler reduces the
> > size of those branches and adjusts all other
> > instructions whose target addresses and other
> > values have changed as a result of reducing the
> > size of the branch instructions.
> >
> >
> > 3. If the answer was yes, repeat steps 1-3 until
> > you make a pass with no optimizations occuring.
>
> Playing the teacher again, without having ever done
> anything real, clown?

Playing the antagonist again, without doing the proper research
beforehand?

>
> Yes, this is the traditional way, known by EVERYBODY.

Yet, in a different post around here, you announce what a good idea it
is to do "start small, grow large" as if that was a brand new idea.
Based on the posts to this thread, it's pretty obvious that this
information is *not* known by everybody, including yourself.

>
> What i was proposing, at that time, was to NOT "repeat
> steps 1-3", and this was apparently such an innovative
> idea, that you attacked it, saying that this was
> impossible.

Not that it is impossible, but that you won't achieve as good a
degree of optimization as if you were to repeat the steps. But your
argument is irrelevant. If I understand your most recent posts
correctly, all you're doing is encoding pointers (via a bit map?) to
all the branches in your program and running steps 1-3 over and over
again across your table of pointers (your "map"). You've sped up the
algorithm by a constant amount, which is good, but all the same
restrictions still apply.

>
> Since then, i have shown that this was possible...

Yes, just like you've shown automatic disassembly to be possible... :-)


> by
> DOING IT, with a Mapping Simulation of what a Multi-
> Passes does, as i was proposing to do.

You really don't get it, do you? All you've done is encode the source
file differently and then run your multiple passes over that encoding
rather than over the original source file. Earth to Rene: this isn't a
new idea. AFAIK, FASM works on an internal representation of the
program when optimizing the code. Yours may be more efficient, but the
basic algorithm is *still* the same. Only your lack of programming
maturity prevents you from seeing this.


> And of course,
> the result is so faster that i can say that... "it
> takes no time".

And, interestingly enough: FASM, which doesn't use your amazing
optimization, still assembles programs FASTER than RosAsm.

>
> Unfortunately, (RosAsm being under constant developments)
> Actually, there is a nasty bug somewhere that i must
> have introduced when doing some modification, and as
> i have not used this feature since a very long time,
> the fix will take probably some time.

You expect us to be surprised by this? You've still not fixed the
symbol table overflow problem after two years. As for when you
introduced it... well...
What you're really saying is that the original design wasn't very
good and you probably missed the problems with it when you first
tested it. You have a history of making pronouncements about how great
your code is (like your disassembler), only to have everyone discover
a little later on that this just isn't the case. This is just one more
example in a long line of them.


>
> ... So that i CAN'T point to the facts with evidencies,
> and this is the real reason why you are now attacking
> on this topic.

The fact that this subject was brought up in another thread, and that
you started making claims about it, was reason enough to dust off the
essay and repost it. And AFAICT, the essay isn't attacking anything.
Until your post, I had no idea that RosAsm's implementation was broken.
So I certainly couldn't have been attacking RosAsm on this issue.

>
> I have already said that i will post a message when this
> bug will be fixed. So, in between, feel free to be the nerd
> you usually are: You will win on that point: You are a real
> nerd.

Personally, I don't care. It isn't bugs in your product that annoy me
(like I really care). It's when you categorically insist that "that
isn't a bug", or when you say "this can be done in one pass" when it
cannot. If you just admit the bug, life goes on. All software has bugs
in it. I expect no less of yours.
Cheers,
Randy Hyde

rh...@cs.ucr.edu

Nov 3, 2006, 9:58:53 AM

hutch-- wrote:
>
> After testing this area, I use the default asembler calculated length
> becuase there is no gain in specifying the jump distance in terms of
> speed.

The differences in speed are *soooo* negligible. It's mainly caching
effects. Indeed, as pointed out in the thread of my original article,
you can actually improve program performance by using long jumps where
short ones would have worked before. This allows later instructions to
be aligned on a nice boundary (dword, cache line, whatever) which will
allow the program to run faster.

Alas, while the fact that branch displacement optimization is an
NP-Complete problem is mainly of theoretical interest (because the
degenerate cases so rarely occur), playing with the branches to align
the following code is a *nasty* NP-Complete problem and is intractable
even for common cases. This is one area, however, where selecting a
few instructions by hand could improve performance (over, say,
injecting NOPs into the code).
Cheers,
Randy Hyde

Betov

Nov 3, 2006, 10:01:52 AM
"KiLVaiDeN" <kilv...@gmail.com> wrote in news:1162560683.185199.58990
@b28g2000cwb.googlegroups.com:

> - Betov made a mistake thinking the optimization was useless,
> didn't recognize it at first, was stubborn for some moments, and
> finally implemented it, in its own way. Ok, fine, it's good to know how
> Betov's way of handling optimization goes forward, and that he found
> his own way to do it, more efficiently.

No. You are confusing things.

1) I have always said that i am against any kind of Code
Optimization, and that, as soon as you see a guy posting
something about Code Level Optimization you can keep 100%
sure that this is from an HLLer having never written any
significant Application in full Assembly.

All of this is about _Speed_ Optimization.

2) In RosAsm the jumps sizes are under the control of the
Programmer, and, anyway, part of the Syntax.

So, there is very poor reason for implementing a jumps
size Optimization in RosAsm.

3) Doing it the way i did was, first, for the fun of an
experiment of a new method of mine, which was some
kind of "logic challenge".

Now, of course, prior to that implementation, the
topic "RosAsm does not optimize the jumps sizes" was
one of the regular ones of Randall Hyde. But,
can you believe that i would implement anything because
of the insanities of this nerd? Oh _yes_! I did it once:

The "Hi! How are you, Randall Hyde" error message.

:))

Betov.

< http://rosasm.org >

rh...@cs.ucr.edu

Nov 3, 2006, 10:02:45 AM

KiLVaiDeN wrote:
> > rand...@earthlink.net wrote:
> >
> > [...]
>
> As long as I see, he has implemented this optimization for a while
> already, so I really don't see how relevant it is to attack him on that
> matter, except if it's because you want to prove "something" to
> readers.

???
The post is no attack, but rather, an attempt to defend against a
personal attack on me. Rene made the claim that I was just repeating
information about "start small, grow large" as if the idea had never
occurred to me (apparently, this was the first time Rene had seen the
idea, based on his positive responses in the thread).

I posted this essay because it's pretty obvious that the idea has been
forgotten now, and it's time to help spread good information about
assembly language (in this case, jump optimization).

I have to catch a train; I will respond to the rest of the post later.
Cheers,
Randy Hyde

johnzulu

Nov 3, 2006, 10:11:02 AM
On 3 Nov 2006 06:40:54 -0800, rh...@cs.ucr.edu wrote:

>
>johnzulu wrote:
>> >
>>
>> Okay, you mean is that by decreasing the near jump to short jmp *by
>> the assembler* IT will shrink the code section where by it whacks the
>> others optimization for the rest of the jmps?
>
>Yup. The example I gave was a bit contrived (basically to make it
>easier to understand). Real-world examples do exist. Try searching for
>some professional journal (refereed) articles on the subject. This is
>how I learned about this subject about a decade ago.
>

Any good sites on these articles? I would like to learn more on this.

>Again, contrived example. You're reading too much into this. All this
>is, is a demonstration that the problem *does* exist. This isn't a
>particular pattern I'd expect you to find in programs. Of course,
>because of the degenerate case nature of this problem, it's not like
>you'll find *many* examples of failure patterns in most programs.
>
>The purpose of this exercise is just to demonstrate that, contrary to
>claims being made elsewhere, you cannot "optimize" all the branch
>displacements with two passes. In the worst case, it takes O(2**n)
>passes (n being the number of branches in the program).

Yes, the problem exists and I acknowledge it. If there is a chance then
the program should cater for SUCH features *to its best possible way*.
John

Betov

Nov 3, 2006, 10:20:23 AM
rh...@cs.ucr.edu wrote in news:1162564854.761354.152540
@h48g2000cwc.googlegroups.com:

> Try searching for
> some professional journal (refereed) articles on the subject. This is
> how I learned about this subject about a decade ago.

For sure, this cannot be by doing anything by yourself, clown.

:)

Betov.

< http://rosasm.org >


Betov

Nov 3, 2006, 10:33:06 AM
rh...@cs.ucr.edu wrote in news:1162566163.887576.22330
@m7g2000cwm.googlegroups.com:

> Rene made the claim that I was just repeating
> information about "start small, grow large"

Brag what you'd like, clown. Fact is that i was discussing
with a real Asmer about a problem that may be fun to consider,
that he made me a suggestion, - which i am actually testing -,
that his suggestion was interesting, and that you repeated it
like a parrot without having been invited in the discussion,
as far as i can recall.

Now, another point is that, from what i actually see (not
yet 100% sure of it, so, i might be wrong), the "start
large, grow small" method seems to be doing it perfectly.
To be verified. As opposed to you, i am not used to post
what i have read here and there: I do it, first.


Betov.

< http://rosasm.org >


KiLVaiDeN

Nov 3, 2006, 10:37:55 AM
> rh...@cs.ucr.edu wrote:
>
> ???
> The post is no attack, but rather, an attempt to defend against a
> personal attack on me. Rene made the claim that I was just repeating
> information about "start small, grow large" as if the idea had never
> occurred to me (apparently, this was the first time Rene had seen the
> idea, based on his positive responses in the thread).

Well, it's always the same story: which came first, the chicken or the
egg? And then: who was the first to attack, Betov or Randall
Hyde? It would require an infinite amount of CPU time to comb through
the newsgroup in order to find who started attacking whom first on each
matter...

> I posted this essay because it's pretty obvious that the idea has been
> forgotten now, and it's time to help spread good information about
> assembly language (in this case, jump optimization).
>
> Have to catch a train, I will respond to the rest of the post later.
> Cheers,
> Randy Hyde

The essay is good, I don't deny that, BUT without the flaming and usual
war-related kicks it'd be way more interesting. Too much information
(especially irrelevant information) kills information.

Cheers
K

rand...@earthlink.net

Nov 3, 2006, 12:16:05 PM

Betov wrote:
> rh...@cs.ucr.edu wrote in news:1162566163.887576.22330
> @m7g2000cwm.googlegroups.com:
>
> > Rene made the claim that I was just repeating
> > information about "start small, grow large"
>
> Brag what you'd like, clown. Fact is that i was discussing
> with a real Asmer about a problem that may be fun to consider,
> that he made me a suggestion, - which i am actually testing -,
> that is suggestion was interresting, and that you repeated it
> like a parrot without having been invited in the discussion,
> as far as i can recall.

Yes, I repeated it. From two years ago.

>
> Now, another point is that, from what i actually see (not
> yet 100% sure of it, so, i might be wrong), the "start
> large, grow small" method seems to be doing it perfectly.
> To be verified. As opposed to you, i am not used to post
> what i have read here and there: I do it, first.

And I was there 10 years ago.
Enjoy your experimentation, Rene.
Cheers,
Randy Hyde

rand...@earthlink.net

Nov 3, 2006, 12:20:40 PM

KiLVaiDeN wrote:
> > rh...@cs.ucr.edu wrote:
> >
> > ???
> > The post is no attack, but rather, an attempt to defend against a
> > personal attack on me. Rene made the claim that I was just repeating
> > information about "start small, grow large" as if the idea had never
> > occurred to me (apparently, this was the first time Rene had seen the
> > idea, based on his positive responses in the thread).
>
> Well it's always the same story, who was the first, the egg or the
> chicken ? And then it's, who was the first to attack, Betov or Randall
> Hyde ? It would require an infinite amount of CPU time to compute the
> newsgroup in order to find who started to attack who first on each
> matter...

Not really. The Rene attacks started around April 2000. Go back and
have a look. I didn't start responding in kind until about two years
later.


>
> The essay is good, I don't deny that, BUT without the flaming and usual
> war related kicks, it'd be way more interesting. Too much information (
> specially irrelevant one ) kills information.

And where, in the essay, was the flaming?

Perhaps you're thinking that the introduction was a flame?


>>>>>
Recently, Rene "Betov" Tournois has seen the light and thought that
"small-first, grow large" is a good idea. Somehow, he thinks that this

is a new and nifty idea. Of course, the explanation for it has been in

this very newsgroup since Jan, 2004. See the following link and
follow-ups:

http://groups.google.com/group/alt.lang.asm/browse_frm/thread/528907c...

However, because this seems to be of current interest, I'll repost that

essay here:

<<<<<<<<<<<<<<<

Wow, mentioning Rene's name in a post is a flame. You have a very low
tolerance for such things, I suspect. You might go read the original
thread that encouraged me to repost this essay. You might read Rene's
responses in this very thread (accusing me of just "parroting" this
information). Now compare that against the paragraph above. You call
the above an "attack"? Get real.
Cheers,
Randy Hyde

rand...@earthlink.net

Nov 3, 2006, 12:43:39 PM

KiLVaiDeN wrote:
>
> In my opinion, there are several things that we can learn from this
> post :
>
> - Betov made a mistake thinking the optimization was useless,
> didn't recognize it at first, was stubborn for some moments, and
> finally implemented it, in its own way.

The essay has nothing to do with Rene's stubbornness about refusing to
believe optimization is important to some people. I don't need to
"attack" him on this point, he's making these claims in a newsgroup
where people *are* concerned about such things. People's response to
his claims is sufficient reward for his foolishness.

> Ok, fine, it's good to know how
> Betov's way of handling optimization goes forward, and that he found
> his own way to do it, more efficiently.

Again, not an issue.
What is at issue is that he claims it can be done in two passes. It
cannot. As best I can tell, he is making multiple passes and simply
doesn't realize it. Par for the course. It doesn't matter though.
Either he *is* making multiple passes (or, more correctly, executing
multiple phases), or he isn't achieving the level of optimization that
other assemblers have achieved. Given the other issues present in the
RosAsm package, this would be the *least* of my concerns (that he is
producing files that are a few bytes longer than they need to be).
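The multi-pass behavior being argued about here can be demonstrated with a contrived cascade. Everything below is an illustrative model, not RosAsm's or any real assembler's code: it assumes a 2-byte short branch with a 127-byte forward reach, a 5-byte long form, and filler sized so each branch fits only while its successor is still short.

```python
# "Start small, grow large" on a contrived cascade of n forward branches.
# Branch i lands just past branch i+1, with 125 bytes of filler, so its
# displacement is 127 (in range) while branch i+1 is short, and 130 (out
# of range) once branch i+1 grows. The last branch targets a point 128
# bytes away, seeding the cascade. Each pass therefore lengthens exactly
# one more branch, so no fixed two-pass scheme can settle them all.

SHORT, LONG, REACH = 2, 5, 127  # assumed encoding sizes and short reach

def passes_to_settle(n):
    """Return how many passes the grow-only loop needs to reach a fix point."""
    sizes = [SHORT] * n
    passes = 0
    while True:
        passes += 1
        changed = False
        for i in range(n):
            if i < n - 1:
                disp = 125 + sizes[i + 1]  # filler + next branch's size
            else:
                disp = 128                 # out of short range from the start
            if disp > REACH and sizes[i] == SHORT:
                sizes[i] = LONG
                changed = True
        if not changed:
            return passes

# Each extra branch in the chain costs one extra pass:
assert passes_to_settle(1) == 2
assert passes_to_settle(4) == 5
assert passes_to_settle(8) == 9
```

A grow-only loop like this always terminates, since sizes only ever increase; the point of the essay, though, is that finding the globally *optimal* set of sizes is a much harder problem than merely reaching this fix point.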

>
> - Randall Hyde was aware of the optimization for a while,

Yup. I think I read my first formal paper on this about 10 years ago.
And I played around with the idea on a 6502 assembler I wrote back in
the late 1970s and early 1980s, but never got the dynamic programming
stuff figured out back then.

> claimed
> it was a big important defect in RosASM assembler,

Really? Where and when was this claim made? The only person claiming
there is a defect in the RosAsm branch displacement optimization code
is Rene himself. Until he mentioned it, I was never aware of the issue.
If you mean that I'm claiming RosAsm is defective because it doesn't
generate a *perfectly optimal* object file, go back and re-read the
essay. I'm making that claim about *every* assembler out there. If
Rene is truly doing only two passes, that doesn't make his assembler
"defective", just "less optimal" than other assemblers that are
multi-phase. OTOH, based on his latest ramblings, I get the impression
(and I could be wrong) that he is making multiple passes over all those
jump statements; he just seems to be conveniently calling "looping over
my statement map" a single pass, which would be incorrect. Surely,
someone like yourself who has preached to me about asymptotic notation
would realize that making multiple passes over his "statement map" is
the same thing as making multiple passes over the source file -- it's
only a constant difference in time. He *is* to be applauded for
encoding the source file in an efficient manner that speeds up the
multi-phase approach (assuming, of course, that this is what he has
actually done), but all he has done is achieve a constant-factor
speedup. It's still multiphase. And the result is still intractable
with a suitably degenerate input file.


> and now is putting
> it back on the table to push the debate even further, in order to
> discredit Betov once more or/and credit himself.

Actually, it was Rene who started out "discrediting" and my post was
simply a defensive reflex. You seem to be out of touch with what's
going on here.

>
> - My attempt for bringing peace on the table was "once more" a
> failure, it eventually even went worse after it.

Do you honestly think you're the first person to try and "broker a
peace" around here? Sorry, dude, it's been tried before. Guga, for
example, has come to me twice in the past and said "if only you would
stop posting all this stuff that makes Rene crazy, he would stop
responding." So twice, I added Rene to my kill file. Once for something
like six months, once for about three months. It did no good
whatsoever. Rene continued with his attacks unabated. Indeed, he was
probably thrilled that he didn't have to deal with my responses to his
illogical posts. After a while, some people began to assume that what
he was posting was correct because it wasn't being challenged. I assure
you, that won't happen again.


Go back and re-read Rene's response to your apology to this newsgroup.
You'll find it telling. You're complaining to the wrong person here. I've
tried people's suggestions in the past. They've all failed. If you want
peace in this newsgroup, you should communicate with Rene, not blame it
on the people that Rene constantly attacks.

>
> - The war between you both will never stop, unless you both take a
> step back to be on common grounds and to stop the bashing : what I
> doubt will ever happen.

Of course it will never happen, because Rene won't do it. He told you
as much in your apology post. Will I step back? As I've pointed out,
I've already tried that twice. It doesn't work. And one other point,
during certain time periods that Rene has been sick, on vacation, or
gone from here for some other reason, sanity returned to this place.
It's pretty obvious where the problem with this newsgroup lies. You
should take your complaints to that source.


>
> - We now know in detail about this optimization. At least this is
> something positive the post brings, if it was only that, there wouldn't
> be a problem, and people would be happy with the "teaching"..

It really wasn't a problem until your post came along, now was it?

> But with
> all the war flavor going with the knowledge, we could easily forget the
> really interesting content in favor of the "less important" one, which
> is your arguments to attack RosASM or René.

You seem to be the one to bring up this "war" with RosAsm or Rene.
Except for your posts and Rene's posts, most of the other posts are
concerned with the technical content of the essay. So what does that
say about your posts? (we already know what to expect from Rene.)


>
> Isn't it time to stop the useless bashing and have grown up discussions
> on here ??

Physician, heal thyself!
This thread was a technical one (except, of course, for Rene's posts,
as usual) until you came along.


One other observation:
I notice you're trying to play the "high moral ground" here by
pretending to be the one that rises above all the muck. Let me suggest
that you're failing miserably at your attempt. For example, you're
quick to attack me, as you've done in this post, when you get the
slightest perception that I'm attacking Rene or RosAsm. Yet I never see
you jumping all over Rene when he calls people names, attacks other
products, posts outright lies, and so on. A little one-sided, don't you
think? Unless you change your approach real soon, the respect you
gained with your apology will quickly evaporate and I will assume that
you're just playing some sort of game in the never-ending-war of RosAsm
vs. the world.

Step back and take a look at what you're posting. And as other people
probably don't want to read this stuff, do them a favor and add "///"
to your subject line when you post. Rest assured, I'll still read it
:-)
Cheers,
Randy Hyde
