Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

More efficient routine/instruction for (binary) "constrain" ?

2 views
Skip to first unread message

Skybuck Flying

unread,
Sep 19, 2015, 12:07:44 PM9/19/15
to
Hello,

Is there a more efficient routine for the following:

procedure Constrain( var Para : integer );
begin
if Para < 0 then Para := 0;
if Para > 1 then Para := 1;
end;

The idea here is to constrain an integer to 0 if it's negative or zero, or 1
if it's positive.

I tried (Para and 1) but that leads to bugs since values can be positive but
not have bit 0 set.

So something else might be needed ! ;)

Maybe a special instruction in the future if it doesn't exist yet ?

Bye,
Skybuck.



Skybuck Flying

unread,
Sep 19, 2015, 12:12:52 PM9/19/15
to
Also if a special instruction is ever added for it I would recommend
programming languages to implement it as follows for example:

var
a : integer;
begin

A := 23523;
A := (A-1) constrain 1;
end;

Or perhaps:

var
a : integer;
begin

A := 23523;
A := (A-1) con 1;
end;

(sounds to much like "con" like scam or something)
Or perhaps

var
a : integer;
begin

A := 23523;
A := (A-1) cons 1;
end;

^ For a shorter version. though might be confusing with const.

Either con or constrain is nice. Con seems nice for short hand versions and
code efficiency.

Also notice the parameter... it might be interesting to extend the
instruction to some larger range.

constrain 1 I would call: "binary constrain" lol..

But there could be other interesting constrains.

var
a : integer;
begin
A := 23523;
A := (A-1) con 100;
end;

Bye,
Skybuck.


Skybuck Flying

unread,
Sep 19, 2015, 12:16:58 PM9/19/15
to
Hmm the procedure version can't work like that haha...

a := constrain( something-1 ); // leads to errors

It's obviou what problem is... it's not a function so new version:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then result := 0;
if Para > 1 then result := 1;
end;

That should do it.

Also nice, no more var parameter... so operations like -1 should be
possible.

Not sure if a constant version would be safe ! ;)

Here is constant version, but I am gonna stick with the stack based
(?)/temporarely version for now or so ! ;)

function Constrain( const Para : integer ) : integer;
begin
if Para < 0 then result := 0;
if Para > 1 then result := 1;
end;

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 12:18:01 PM9/19/15
to
And ofcourse I just noticed another sloppy bug.. hihi...

The greater than sign still left room for undefined results, so finally
correct version is probably:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

Bye,
Skybuck.


Skybuck Flying

unread,
Sep 19, 2015, 12:20:16 PM9/19/15
to
However now Delphi nags about might be undefined result for function...

So for now to get rid of nag:

function Constrain( Para : integer ) : integer;
begin
result := Para;
if Para < 0 then Result := 0;
if Para > 1 then Result := 1;
end;

This kinda stays more true to it's natural intention... and thus is more
correct.

Now it can also be extended with range parameters and the para will only be
adjusted when out of range.

So this version is more correct than previous ones ! ;)

Thanks Delphi compiler for your hint message ! ;) =D lol.

Bye,
Skybuck.


Melzzzzz

unread,
Sep 19, 2015, 12:21:08 PM9/19/15
to
Learn how to program first...

Skybuck Flying

unread,
Sep 19, 2015, 12:25:34 PM9/19/15
to
Strangely enough this version is actually bugged.

It looks correct but it's not...

Don't know yet, why it's not correct, nor do I care much... it does not
concern me... though out of curiosity I may investigate it later or I may
not ;)

buggy:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

Strangely enough

Constrain( something -1 )
or
Constrain( something -2)

might lead to some weird bug where 1 is returned while it should be zero ?

Hmmm... may have to investigate later...

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 12:28:16 PM9/19/15
to
Haha now I see the bug:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

If following happens:
A = 1
Constrain( A-1)

Then Para = 0;

and thus neither branch will be executed leading to an undefined result ! ;)

Nice example of a totally fucked/obfuscated function which could be used in
an obfuscation contest ! HAHA LOl.

Undefined results <- not sure if this is allowed in obfuscation contest lol
?! As long as it doesn't influence outcome too much... maybe one can get
away with it ! LOL.

Or just use it as futher obfuscations... not sure if obfuscation contests
actually required something to be computed properly ? ;)

Bye,
Skybuck.




Skybuck Flying

unread,
Sep 19, 2015, 1:30:02 PM9/19/15
to
Ah details, details, details ! ;) =D

The devil is in the details ! =D

Been out of it for a little bit...

Though 99.9% of my code was pretty good, ofcourse there are always little
buggies...

But the most nasty ones are trivial little thingies like these which can or
are easy to be overlooked ! ;) =D

Don't you agree that all these version are pretty darn tricky ?! ;)

The mission itself seems simple enough... but quite easy to frack up somehow
! ;)

There's not even an instruction for something as simple as this ? As far as
I know ?

Thus two expensive branches needed ?! ;)

One possible optimization could be:

function Constrain( Para : integer ) : integer; // * opt
begin
if Para <= 0 then
Result := 0;
else
Result := 1;
end;

Not sure if that would actually execute faster than the correct
implementation:

function Constrain( Para : integer ) : integer;
begin
result := Para;
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

The "supposedly optimized one" above (see *opt) is also less extendedable I
think...

So the two branch version might actually be required.

Though I can imagine a hard-wired version to perhaps use some kind of
gate-tricks to try and prevent the branches and such ! ;)

Perhaps some maths/muls or something could be used... I have bigger fish to
fry... so for now not going to spend time on it ! ;) =D

But the MICRO people might like thinking about this ! ;) =D

So I shove it over to them ! ;) =D

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 1:44:18 PM9/19/15
to
Some last little hint about this:

The result of "constrain" will ultimately be used to keep or dispose other
values.

So for example:

Something1 = Something2 * Constrain( Something3 - Something4 );

Perhaps there is a different solution which may not need such accurate
results.

Though in my case the 0's and 1's returned from Constrain would also be
added together for other things.

However perhaps that could have been done differently... not sure yet...
(note to self: related to field length determination).

Anyway maybe for a processor it might be enough to compute something like:

Something5 = Something3-Something4

Or all bits of Something5 together to form a result which indicates a 0 or
1.

So for example if I had to write this in Delphi without a branch using this
idea it would look something like:

function Constrain( Para : integer ) : integer; // * opt
begin
result :=
(Para shr 31) or
(Para shr 30) or
(Para shr 29) or
(Para shr 28) or
(Para shr 27) or
(Para shr 26) or
(Para shr 25) or
(Para shr 24) or
(Para shr 23) or
(Para shr 22) or
(Para shr 21) or
(Para shr 20) or
(Para shr 19) or
(Para shr 18) or
(Para shr 17) or
(Para shr 16) or
(Para shr 15) or
(Para shr 14) or
(Para shr 13) or
(Para shr 12) or
(Para shr 11) or
(Para shr 10) or
(Para shr 9) or
(Para shr 8) or
(Para shr 7) or
(Para shr 6) or
(Para shr 5) or
(Para shr 4) or
(Para shr 3) or
(Para shr 2) or
(Para shr 1) or
(Para shr 0);
end;

This may actually give a valid/decent result without requiring any branch !

Funny thing is it would currently maybe require 32x2 instructions or so...
not sure if that is worth a potential branch-miss-prediction penalty ! ;) =D

Again... this is for the MICRO people to figure out ! LOL.

But these gates could maybe be hard-wired and such... soo... yeah...

There are probably other solutions as well... let me know any other creative
ones ! ;) :)

Also try and keep it fast ! =D

Branchless would also be interesting ! ;)

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 1:48:43 PM9/19/15
to
Anyway for parallel processors having a special value which would say:
"dispose/keep" "yes/no" could be interesting.

Even for integers...

A little bit like "NAN" for floating point numbers or so.

Not sure if this makes any sense... 0 and 1 could ofcourse be used for it...

Or perhaps a special instruction(s) which can somehow use any variable with
any value and use " <> 0" as keep and " = 0" as dispose.

Or vice versa...

So my point is basically a one and zero value not necessaryily required for
keeping/disposing of other values, some special rules could be applied as
above.

Perhaps even all instructions could be modified to use some kind of "gate"
mechnism.

for example:

add a, b

A gate could be added as follows:

add a,b, gate

If the gate is negative/positive the value/computation of add a,b is kept.
If the gate is zero the computation from add is disgarded.

This could prevent these kinds of seperate constrain computations which
could be beneficial for writing parallel code ! ;)

Bye,
Skybuck.




Skybuck Flying

unread,
Sep 19, 2015, 1:51:48 PM9/19/15
to
So programmers can then write code as follows:

A[0] := B[0] + C[0], D[0];
A[1] := B[1] + C[0], D[1];
A[2] := B[2] + C[0], D[2];
A[3] := B[3] + C[0], D[3];

These can then be computed in parallel without requiring branches and all
kinds of other stuff.

Bye,
Skybuck.

Rod Pemberton

unread,
Sep 19, 2015, 4:14:03 PM9/19/15
to
On Sat, 19 Sep 2015 13:30:04 -0400, Skybuck Flying <skybu...@hotmail.com> wrote:

> One possible optimization could be:
>
> function Constrain( Para : integer ) : integer; // * opt
> begin
> if Para <= 0 then
> Result := 0;
> else
> Result := 1;
> end;
>

First, Flying Nutjob, I really don't care for you spamming
a.l.a. It's been at least a decade now. PLEASE stop.

Second, I can only reply to three Usenet groups at once,
so I dropped most of them. Hopefully, you're watching
the Delphi group since that's the only group relevant
to the question ...

Did you mean "constraint"? The word usually has a 't'
on the end, unless Delphi does something strange, like
restricting procedure name lengths to 9 characters.

Finally, with this post, you're on the correct track,
but you simply haven't proceeded far enough. You can
simplify that further.

function Constrain( Para : integer ) : integer;
begin
result := 0;
if Para > 0 then result := 1;
end;

I don't code Delphi, but I assume that is what you're
wanting, based on the code you posted.


Rod Pemberton


--
Just how many texting and calendar apps does humanity need?

David W Noon

unread,
Sep 19, 2015, 4:59:34 PM9/19/15
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 19 Sep 2015 19:30:04 +0200, in message
<70d54$55fd9b99$d47876e2$7...@news.ziggo.nl>, Skybuck Flying
(skybu...@hotmail.com) wrote:

[snip]
> There's not even an instruction for something as simple as this ?
> As far as I know ?

You don't know very far, then.

> Thus two expensive branches needed ?! ;)

Not in assembler.

> One possible optimization could be:
>
> function Constrain( Para : integer ) : integer; // * opt begin if
> Para <= 0 then Result := 0; else Result := 1; end;

If you want something faster than this, you will need some assembler.

[snip]
> The "supposedly optimized one" above (see *opt) is also less
> extendedable I think...

I don't know what that means.

> So the two branch version might actually be required.
>
> Though I can imagine a hard-wired version to perhaps use some kind
> of gate-tricks to try and prevent the branches and such ! ;)

No.

> Perhaps some maths/muls or something could be used...

No, unless you want it really slow. As Melzzzzz wrote: learn to
program first.

> I have bigger fish to fry...

You'll need a bigger boat.

Here is some code you can try. It is wrapped in Pascal, but the
function you are trying to speed up is in assembler, with zero branch
instructions. In fact, it is only 3 instructions in total.

<=======================================================================>
program dummy_test;

{$mode objfpc} {$J-} {$H-}

{$asmmode intel}
function constrain(para : integer) : integer; register; assembler;
const
one : integer = 1;
zero : integer = 0;
asm
cmp eax, 0
cmovg eax, one
cmovl eax, zero
end;

var
i : integer;
begin
for i := -10 to 10 do
writeln('i = ', i, ' constrain(i) = ', constrain(i))
end.
<=======================================================================>

I tested it with FPC on 32-bit Linux, but it might work with Delphi.
- --
Regards,

Dave [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwn...@spamtrap.ntlworld.com (David W Noon)
Remove spam trap to reply by e-mail.
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlX9zLYACgkQ9MqaUJQw2MmyNQCfc+oReiTY4EWgg6lxH8imJ+0w
OtYAoILtC4+aF1ag6ynnlEryOPcSbd0A
=19z7
-----END PGP SIGNATURE-----

Skybuck Flying

unread,
Sep 19, 2015, 8:55:56 PM9/19/15
to
Another idea which came to mind to speed it up is or two bits together
before shifting it down, to save on shifts and or's.

The way this can be done easily and even massively parallel is by using two
"half arrays" (splitting the array in two).

This can be done with nice array type casts, I will attempt to write some
code here, but not going to test it... it's late need to go to bed soon :)

I watched two cool movies today: "star trek into the darkness" (already seen
a crappy cam version of it) and "the 5th element" enjoyed that one too...
kinda funny movie.

Anyway now that that's done time to write some more code ! LOL ;) =D

I hope more reasoning is correct lol... here goes:

I was gonna write:

type
TWordArray = array[0..1] of word;
TByteArray = array[0..1] of word;

But unfortunately this cannot go down to just nimbles and half nimbles and
bits... Delphi does not support it.

So a different solution will need to be used... "divide and conquer" like
approach with shrs should be possible.

Like a binary search so here goes:

The idea now is to split up/copy the parameter into two registers and shift
one of them down and perform the or's and such.

To make it more clear they will be called top and bottom, top is to be
shifted down.

Upper part of bottom will need to be cleared... unfortunately enough ;)

Going to typecast to longword/unsigned integer just in case ! ;)

function Constrain( Para : integer ) : integer; // * opt
var
ParaTop : longword;
ParaBottom : longword;
begin

// 2x16
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 16;

// and clear upper part of bottom
ParaBottom := ParaBottom and (65535 shl 16);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x8
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 8;

// and clear upper part of bottom
ParaBottom := ParaBottom and (255 shl 8);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x4
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 4;

// and clear upper part of bottom
ParaBottom := ParaBottom and (15 shl 4);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x2
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 2;

// and clear upper part of bottom
ParaBottom := ParaBottom and (3 shl 2);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x1
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 1;

// and clear upper part of bottom
ParaBottom := ParaBottom and (1 shl 1);

// now or them together
Para := ParaTop or ParaBottom;

Result := Para;
end;

// abour 34 instructions or so would be my estimation.

Possibly an improvement over previous version.

This newer version assuming it's correct could probably be optimized
somewhat here and there ! ;) :)

Bye,
Skybuck :)

dunno

unread,
Sep 19, 2015, 9:22:43 PM9/19/15
to
You should check SET instruction. It sets the one bit of register if some
flag is set (effectively making it 1), otherwise it turns it to zero.
Consider this code:

test %eax, %eax
setg %ebx

Result is in %ebx.

> var
> i : integer;
> begin
> for i := -10 to 10 do
> writeln('i = ', i, ' constrain(i) = ', constrain(i))
> end.
> <=======================================================================>
>
> I tested it with FPC on 32-bit Linux, but it might work with Delphi.
> - --
> Regards,
>
> Dave [RLU #314465]
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> dwn...@spamtrap.ntlworld.com (David W Noon)
> Remove spam trap to reply by e-mail.
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2
>
> iEYEARECAAYFAlX9zLYACgkQ9MqaUJQw2MmyNQCfc+oReiTY4EWgg6lxH8imJ+0w
> OtYAoILtC4+aF1ag6ynnlEryOPcSbd0A
> =19z7
> -----END PGP SIGNATURE-----


--
dunno

Skybuck Flying

unread,
Sep 19, 2015, 9:41:52 PM9/19/15
to


"Rod Pemberton" wrote in message news:op.x477tvreyfako5@localhost...
Hi thanks for your reply, your code seems somewhat obvious, but some may
consider it not optimal because it involves two assignments.

The "opt" version I wrote only assigns once.

Some may consider that more optimal... so it depends on what is optimal ? ;)

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 9:48:33 PM9/19/15
to
Ok, thanks Dave ! ;) =D

This function was tested in Delphi XE7 and works just fine ! =D

function constrain(para : integer) : integer; register; assembler;
const
one : integer = 1;
zero : integer = 0;
asm
cmp eax, 0
cmovg eax, one
cmovl eax, zero
end;

One little objection against this might be the use of these special looking
instructions ?!

Which processor generation would be necessary for this function to work ?

I'd doubt this code would work on a 80486 ?!?

Not sure if cmovg was part of 486 and such ! ;) :)

Besides from that... coolest solution so far ! :)

I may have to time all functions to see which one actually works fastest...

Though perhaps the looping code might take most of the time... though I do
expect there to be a time difference ;)

To be continued... perhaps... :)

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 9:54:40 PM9/19/15
to
Another little side note about this one as I read the instruction manual.

Apperently this style not supported for 8 bit operands... that might be an
issue if 8 bit version required...

For now a 32 bit version will do ;) but in future could give troubles ! ;)

function constrain(para : integer) : integer; register; assembler;
const
one : integer = 1;
zero : integer = 0;
asm
cmp eax, 0
cmovg eax, one
cmovl eax, zero
end;

Bye,
Skybuck :)

Skybuck Flying

unread,
Sep 19, 2015, 10:06:30 PM9/19/15
to
Thanks !

You seem to be the winner for now "dunno", unless it's slow or something !
;)

Or perhaps there are other constraints/limitations... but for now.. my own
rolled version on your idea seems to work just fine:

function ConstrainV9(para : integer) : integer; register; assembler;
asm
test eax, eax
setg al
end;

(Other newsgroups added so they too can see this fine 2 instruction solution
! ;) :) and perhaps comment on it further if there are any hidden issues
?!;))

Best of all this routine seems fully backwards compatible with 80486 ! It
goes even back to 80386 compatibility... nice ! ;) =D

Bye,
Skybuck :D


Skybuck Flying

unread,
Sep 19, 2015, 10:09:17 PM9/19/15
to
Wooooooooopppss ! =D

I cheered too soon LOL.

This function also has a problem with negative numbers !

AAAAAHHH To bad for you "dunno" lol... your idea may be bogus after all ?!

Problem was I forgot to update the test code from calling ConstrainV8 to
ConstrainV9 ! LOL.

So it was testing the wrong/old function !

Now the new one is being called and it's BUGGGGGGGED ! ;) =D

function ConstrainV9(para : integer) : integer; register; assembler;
asm
test eax, eax
setg al
end;

Maybe it can be fixed somehow ?! ;) :) =D

Bye for now,
Skybuck.

dunno

unread,
Sep 19, 2015, 10:12:51 PM9/19/15
to
Oh wait. IF I remember correctly SET instruction takes only lower part of
register as argument. If that's the case, there is a need to clear the rest
of it if compiler returns only 32-bit values from inline assembly code. So
the code would be:

xor %ebx, %ebx
test %eax, %eax
setg %bl

Result is in %ebx.

If compiler returns value of %eax register, then the code would be:

test %eax, %eax
setg %bl
movzb %bl, %eax

Skybuck Flying

unread,
Sep 19, 2015, 10:16:33 PM9/19/15
to
This seems to fix it:

function ConstrainV9(para : integer) : integer; register; assembler;
asm
test eax, eax
mov eax, 0
setg al
end;

I like the above one better than this one,

// this style supports 16 bit, 32 bit and perhaps even 64 bit, will probably
not work on 80486 and not in 8 bit versions.
function ConstrainV8(para : integer) : integer; register; assembler;
const
one : integer = 1;
zero : integer = 0;
asm
cmp eax, 0
cmovg eax, one
cmovl eax, zero
end;

since the top one is shorter and more backwards compatible and doesn't
require these bloody constants :)

Me not assembler expert so not sure if adding the mov, 0 could lead to any
problems ?! ;)

I think that was main problem with top one... it didn't clear eax enough...
and could lead residue in it leading to wrong/garbaged results ! ;)

Another solution might be:

function ConstrainV10(para : integer) : integer; register; assembler;
asm
test eax, eax
setg al
and eax, 1
end;

This one is also correct, I like this one even more than the mov version...
cause the setg is right after the test... so no chance of mov screwing up
any flags ?! ;) :)

Though if for some reason the flag have to be kept then the and version
might screw that up... but that's ok... for now the flags do not need to be
preserved...

if flags would need to be preserved than maybe mov version might be nicer !
;)

Bye,
Skybuck.

dunno

unread,
Sep 19, 2015, 10:24:39 PM9/19/15
to
Yes, you have to clear upper 24 bits of register. Apparently SET is
accessing only lower 8 bits.

Try this:

test eax, eax
setg bl
movzb eax, bl

Or, if you don't want to use MOVZB, you can do it with AND:

test eax, eax
setg al
and eax, 0xFFFFFF00h


> Bye for now,
> Skybuck.


--
dunno

dunno

unread,
Sep 19, 2015, 10:27:10 PM9/19/15
to
This won't work.

> This one is also correct, I like this one even more than the mov
> version... cause the setg is right after the test... so no chance of mov
> screwing up any flags ?! ;) :)
>
> Though if for some reason the flag have to be kept then the and version
> might screw that up... but that's ok... for now the flags do not need to be preserved...
>
> if flags would need to be preserved than maybe mov version might be nicer ! ;)
>
> Bye,
> Skybuck.


--
dunno

dunno

unread,
Sep 19, 2015, 10:35:13 PM9/19/15
to
No wait. You right. But it should be OR instead of AND, otherwise result
always will be 1.

dunno

unread,
Sep 19, 2015, 10:47:59 PM9/19/15
to
Apparently I forgot how AND is working. :) It should be:

test eax, eax
setg al
or eax, 1

Bernhard Schornak

unread,
Sep 19, 2015, 11:28:44 PM9/19/15
to
Skybuck Flying wrote:


> ... unless it's slow or something !
>
> function ConstrainV9(para : integer) : integer; register; assembler;
> asm
> test eax, eax
> setg al


This will be slow because it forces register merging (EAX => AL),
see "partial register writes" in AMD's ans iNTEL's data sheets...


Greetings from Augsburg

Bernhard Schornak

Bernhard Schornak

unread,
Sep 19, 2015, 11:29:00 PM9/19/15
to
Accessing a lookup table is *much* faster...

Besides that: Why do you post HLL goobledigook to an assembler NG?

Rod Pemberton

unread,
Sep 20, 2015, 1:34:48 AM9/20/15
to
On Sat, 19 Sep 2015 21:41:54 -0400, Skybuck Flying <skybu...@hotmail.com> wrote:

> "Rod Pemberton" wrote in message news:op.x477tvreyfako5@localhost...
>> On Sat, 19 Sep 2015 13:30:04 -0400, Skybuck Flying <skybu...@hotmail.com>
>> wrote:

>>> One possible optimization could be:
>>>
>>> function Constrain( Para : integer ) : integer; // * opt
>>> begin
>>> if Para <= 0 then
>>> Result := 0;
>>> else
>>> Result := 1;
>>> end;
>>>
>>
>> Finally, with this post, you're on the correct track,
>> but you simply haven't proceeded far enough. You can
>> simplify that further.
>> function Constrain( Para : integer ) : integer;
>> begin
>> result := 0;
>> if Para > 0 then result := 1;
>> end;
>>
> Hi thanks for your reply, your code seems somewhat obvious,
> but some may consider it not optimal because it involves
> two assignments.

An if-then-else will produce more branches for the code than
for just an if-then. Reducing the branching is where the
optimization comes in. An if-then-else has to produce branches
around two sections instead of just one section for if-then.
Branches are slow on x86, in part due to misprediction, and
usually slow enough that an extra assignment is faster than
eliminated branches.

Also, most programming languages will automatically clear
certain variables, i.e., meaning that 'result' may already
be cleared upon entry to Constrain. This depends on how
Delphi initializes variables, for which I'm not familiar.

Are you able to view the assembly produced by each? If so,
then you can see the shortest solution. Typically, the
shortest solution on x86 will also be the fastest, with a
few exceptions for various x86 issues, such as partial
register stalls, branch misprediction, or slower CISC
instructions like SETcc, all of which are presented in this
thread... If you see them, rewrite.

Skybuck Flying

unread,
Sep 20, 2015, 7:47:06 AM9/20/15
to


"dunno" wrote in message news:mtl610$bh5$1...@news.mixmin.net...
I don't think so, or 1 will always set the result to 1, that is undesired.

and 1 will single out the last bit, it will be 0 if it's zero and it will be
1 if it's 1.

It works.

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 20, 2015, 7:50:17 AM9/20/15
to


"Bernhard Schornak" wrote in message news:mtl927$os0$2...@dont-email.me...
How would it look like ?

"
Besides that: Why do you post HLL goobledigook to an assembler NG?
"

It looks very much like assembler :)

Simple instructions really.

Bye,
Skybuck.

wolfgang kern

unread,
Sep 20, 2015, 9:05:06 AM9/20/15
to

"dunno" recommended:

> test %eax, %eax
> setg %bl
> movzb %bl, %eax

Yeah, this is almost what I'd have for it.

or eax,eax
setns al
movzxb eax,al

yours may be faster but needs one more reg.

And because I'm not bound to HLL's 32-bit true/false (0/1) flags,
I'd just have:

or eax,eax ;sign- and zero-bits tells enough how to proceed.
__
wolfgang

David W Noon

unread,
Sep 20, 2015, 11:58:51 AM9/20/15
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, 20 Sep 2015 03:48:35 +0200, in message
<d746f$55fe1070$d47876e2$20...@news.ziggo.nl>, Skybuck Flying
(skybu...@hotmail.com) wrote:

> Ok, thanks Dave ! ;) =D

No problems.

> Which processor generation would be necessary for this function to
> work ?

It requires a Pentium Pro (i686) or later. The Pentium Pro was
released in 1995, so it won't work on anything more than 20 years old
for Intel processors. For AMD processors you'll need a K7 (32-bit
Athlon/Duron/Thunderbird/etc.) or newer.
- --
Regards,

Dave [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwn...@spamtrap.ntlworld.com (David W Noon)
Remove spam trap to reply by e-mail.
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlX+17oACgkQ9MqaUJQw2Ml00wCgoNLgHQrP5DkIHgK+E1w7Ptnc
EpUAn0DxUjwKCtawYFYPKWBWPynM0tn1
=awzw
-----END PGP SIGNATURE-----

Skybuck Flying

unread,
Sep 20, 2015, 12:09:38 PM9/20/15
to
"
Yes, you have to clear upper 24 bits of register. Apparently SET is
accessing only lower 8 bits.
"

The test does compare all 32 bits as far as I know...

Once the test is done the only thing that is needed is a 1 or a 0 to return
as a result.

"
test eax, eax
setg bl
movzb eax, bl
"

movzb not supported by Delphi.


Or, if you don't want to use MOVZB, you can do it with AND:

"
test eax, eax
setg al
and eax, 0xFFFFFF00h
"

This is bugged for some reason...

I think this actually keeps the upper bits which would be wrong.

I think what you might be looking for is an inverted mask, which does clear
out the upper bits and keeps the lower bits:

"
test eax, eax
setg al
and eax, 0x000000FFh
"

^ This does work.

"
Apparently I forgot how AND is working. :) It should be:

test eax, eax
setg al
or eax, 1
"

No this also doesn't work lol... ofcourse or will keep the upper bits as
well.

since setg doesn't clear upper bits yet ?!

Did you have a good night sleep ?! lol... your brain seems to be a
mess/confused or you as noob... or you thought this was easy.... but
apperently it's not ! LOL.

You are forgiven though...

I guess sometimes the human brain can be easily fooled/confused by something
that looks easy/trivial... but then again it's not ! LOL.

Lazyness got to you ?! =D

Bye,
Skybuck ;) =D

Skybuck Flying

unread,
Sep 20, 2015, 12:11:49 PM9/20/15
to
"
Did you mean "constraint"? The word usually has a 't'
on the end, unless Delphi does something strange, like
restricting procedure name lengths to 9 characters.
"

No, constrain as in a verb.

"To contrain yourself"

Fits more with natural language ! ;)

Writing code as natural as possible is preferred ! ;)

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 20, 2015, 12:44:17 PM9/20/15
to


"Rod Pemberton" wrote in message news:op.x48xsgmtyfako5@localhost...
Well the nice thing about a Delphi/High Level Language code only solution is
that it could be inlined.

And then the assumption that registers are used for writing might be
false/wrong.

So in reality in such a case perhaps the writing will happen directly to
memory.

And in that case two writes might actually be slower than a branch.

With some luck it might end up in data cache but in a heavy application it
might still need writing to memory and/or heavy cache trashing could take
place.

Thus in such a situation I am not yet convinced a two branch solution would
be slower than two memory writes.

So I thought I'd point that out to you... if it's true what I wrote I am not
certain... but it could be :)

"
Also, most programming languages will automatically clear
certain variables, i.e., meaning that 'result' may already
be cleared upon entry to Constrain. This depends on how
Delphi initializes variables, for which I'm not familiar.

I think Delphi will leave eax as it is... it is used for parameter passing I
think.

"
Are you able to view the assembly produced by each? If so,
"

For what it's worth here is assembler from Delphi (compiler in debug mode
though):

Here is my version:

TestProgram.dpr.156: begin
0041A0EC 55 push ebp
0041A0ED 8BEC mov ebp,esp
0041A0EF 83C4F8 add esp,-$08
0041A0F2 8945FC mov [ebp-$04],eax
TestProgram.dpr.157: result := Para;
0041A0F5 8B45FC mov eax,[ebp-$04]
0041A0F8 8945F8 mov [ebp-$08],eax
TestProgram.dpr.158: if Para < 0 then Result := 0;
0041A0FB 837DFC00 cmp dword ptr [ebp-$04],$00
0041A0FF 7D05 jnl $0041a106
0041A101 33C0 xor eax,eax
0041A103 8945F8 mov [ebp-$08],eax
TestProgram.dpr.159: if Para >= 1 then Result := 1;
0041A106 837DFC01 cmp dword ptr [ebp-$04],$01
0041A10A 7C07 jl $0041a113
0041A10C C745F801000000 mov [ebp-$08],$00000001
TestProgram.dpr.160: end;
0041A113 8B45F8 mov eax,[ebp-$08]
0041A116 59 pop ecx
0041A117 59 pop ecx
0041A118 5D pop ebp
0041A119 C3 ret

// 18 instructions.

and here is your version:

TestProgram.dpr.204: begin
0041A0EC 55 push ebp
0041A0ED 8BEC mov ebp,esp
0041A0EF 83C4F8 add esp,-$08
0041A0F2 8945FC mov [ebp-$04],eax
TestProgram.dpr.205: result := 0;
0041A0F5 33C0 xor eax,eax
0041A0F7 8945F8 mov [ebp-$08],eax
TestProgram.dpr.206: if Para > 0 then result := 1;
0041A0FA 837DFC00 cmp dword ptr [ebp-$04],$00
0041A0FE 7E07 jle $0041a107
0041A100 C745F801000000 mov [ebp-$08],$00000001
TestProgram.dpr.207: end;
0041A107 8B45F8 mov eax,[ebp-$08]
0041A10A 59 pop ecx
0041A10B 59 pop ecx
0041A10C 5D pop ebp
0041A10D C3 ret

// 14 instructions.

"
then you can see the shortest solution. Typically, the
shortest solution on x86 will also be the fastest, with a
few exceptions for various x86 issues, such as partial
register stalls, branch misprediction, or slower CISC
instructions like SETcc, all of which are presented in this
thread... If you see them, rewrite.
"

I have yet to do performance measurements but I may do so in near future !
;) :)

Yeah the slowness of SET does ring a vague bell...

I wonder why it's so slow though ? Why has a faster implementation not come
around ?

Perhaps a bit besides the topic but still ;)

Gonna add comp.arch back to it in case any of the comp.arch guys want to
respond to that :)

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 20, 2015, 12:58:44 PM9/20/15
to
I also examined inlined versions.

Will post them here just for good measure:

Mine has 11 instructions in total:
Mine does 5 moves on first run -MaxInt as input:

TestProgram.dpr.217: vOutput := ConstrainV7(vIndex );
0041A0F9 8B45FC mov eax,[ebp-$04]
0041A0FC 8945F4 mov [ebp-$0c],eax
0041A0FF 837DFC00 cmp dword ptr [ebp-$04],$00
0041A103 7D05 jnl $0041a10a
0041A105 33C0 xor eax,eax
0041A107 8945F4 mov [ebp-$0c],eax
0041A10A 837DFC01 cmp dword ptr [ebp-$04],$01
0041A10E 7C07 jl $0041a117
0041A110 C745F401000000 mov [ebp-$0c],$00000001
0041A117 8B45F4 mov eax,[ebp-$0c]
0041A11A 8945F8 mov [ebp-$08],eax

Yours has 7 instructions in total:
Yours does 3 moves on first run -MaxInt as input:

TestProgram.dpr.217: vOutput := ConstrainV12( vIndex );
0041A0F9 33C0 xor eax,eax
0041A0FB 8945F4 mov [ebp-$0c],eax
0041A0FE 837DFC00 cmp dword ptr [ebp-$04],$00
0041A102 7E07 jle $0041a10b
0041A104 C745F401000000 mov [ebp-$0c],$00000001
0041A10B 8B45F4 mov eax,[ebp-$0c]
0041A10E 8945F8 mov [ebp-$08],eax

Actually the one I am comparing is this one:

// all are bugged except this one.
function ConstrainV7( Para : integer ) : integer; inline;
begin
result := Para;
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

Funny enough it also has double writes..

So I compared the wrong one...

The one I was talking about was this one, the optimized one:

// begin end statement blocks added I like that better for more parser
robust code ;)

function Constrain( Para : integer ) : integer; // * opt
begin
if Para <= 0 then
begin
Result := 0;
end else
begin
Result := 1;
end;
end;

So I will re-examine this one for normal call and inlined as well ! ;) :)

Normal version:

// 15 instructions:

// 4 moves, one doubtfull from ebp so might only be 3 moves.
TestProgram.dpr.212: begin
0041A0EC 55 push ebp
0041A0ED 8BEC mov ebp,esp
0041A0EF 83C4F8 add esp,-$08
0041A0F2 8945FC mov [ebp-$04],eax
TestProgram.dpr.213: if Para <= 0 then
0041A0F5 837DFC00 cmp dword ptr [ebp-$04],$00
0041A0F9 7F07 jnle $0041a102
TestProgram.dpr.215: Result := 0
0041A0FB 33C0 xor eax,eax
0041A0FD 8945F8 mov [ebp-$08],eax
0041A100 EB07 jmp $0041a109
TestProgram.dpr.218: Result := 1;
0041A102 C745F801000000 mov [ebp-$08],$00000001
TestProgram.dpr.220: end;
0041A109 8B45F8 mov eax,[ebp-$08]
0041A10C 59 pop ecx
0041A10D 59 pop ecx
0041A10E 5D pop ebp
0041A10F C3 ret

// inlined version:

// 8 instructions:

// 3 moves:
TestProgram.dpr.229: vOutput := ConstrainV13(vIndex );
0041A0F9 837DFC00 cmp dword ptr [ebp-$04],$00
0041A0FD 7F07 jnle $0041a106
0041A0FF 33C0 xor eax,eax
0041A101 8945F4 mov [ebp-$0c],eax
0041A104 EB07 jmp $0041a10d
0041A106 C745F401000000 mov [ebp-$0c],$00000001
0041A10D 8B45F4 mov eax,[ebp-$0c]
0041A110 8945F8 mov [ebp-$08],eax

Seems similar on the "moves" front :)

Bye,
Skybuck.

Melzzzzz

unread,
Sep 20, 2015, 7:39:38 PM9/20/15
to
On Sun, 20 Sep 2015 05:28:42 +0200
Bernhard Schornak <scho...@web.de> wrote:

> Skybuck Flying wrote:
>
>
> > ... unless it's slow or something !
> >
> > function ConstrainV9(para : integer) : integer; register; assembler;
> > asm
> > test eax, eax
> > setg al
>
>
> This will be slow because it forces register merging (EAX => AL),
> see "partial register writes" in AMD's ans iNTEL's data sheets...
>
>
Hmmm, I see gcc (svn trunk) generates such code for returning bool value
from function. Perhaps they know something we don't know?
eg for:
bool fun(int x) {
if (x > 5) return true;
return false;
}
gcc generated:
cmpl $5, %edi
setg %al
ret

Melzzzzz

unread,
Sep 20, 2015, 8:21:35 PM9/20/15
to
Ah I see now, it uses different registers

Bernhard Schornak

unread,
Sep 21, 2015, 3:24:53 AM9/21/15
to
Yes. ;)

MOVcc is much better if you want to set a register to a desired
value - SETcc's only advantage is its shorter instruction size,
but you buy it at the cost of a possible merge penalty.

wolfgang kern

unread,
Sep 21, 2015, 3:30:25 AM9/21/15
to

Flying Bucket posted:

> and here is your version:
>
> TestProgram.dpr.204: begin
> 0041A0EC 55 push ebp
> 0041A0ED 8BEC mov ebp,esp
> 0041A0EF 83C4F8 add esp,-$08
> 0041A0F2 8945FC mov [ebp-$04],eax
> TestProgram.dpr.205: result := 0;
> 0041A0F5 33C0 xor eax,eax
> 0041A0F7 8945F8 mov [ebp-$08],eax
> TestProgram.dpr.206: if Para > 0 then result := 1;
> 0041A0FA 837DFC00 cmp dword ptr [ebp-$04],$00
> 0041A0FE 7E07 jle $0041a107
> 0041A100 C745F801000000 mov [ebp-$08],$00000001
> TestProgram.dpr.207: end;
> 0041A107 8B45F8 mov eax,[ebp-$08]
> 0041A10A 59 pop ecx
> 0041A10B 59 pop ecx
> 0041A10C 5D pop ebp
> 0041A10D C3 ret
>
> // 14 instructions.

14 instructions, 35 byte and 4 or 5 memory accesses.

I really can't see WHY you need a stack-frame and locals here,
your only input parameter is in EAX, also returns in EAX.
And why do you alter ECX without using it.

A skilled programmer would use a decent compiler and produce:

test:
xor ecx,ecx
or eax,eax
js $+1 ;skip over inc
inc ecx
mov eax,ecx
ret

this are 9 byte and for sure much faster than yours.

or even one byte shorter and branchless:

test:
or eax,eax
setns al
movzx eax,al
ret
__
wolfgang

James Harris

unread,
Sep 21, 2015, 6:09:52 AM9/21/15
to
"Bernhard Schornak" <scho...@web.de> wrote in message
news:mtob8h$6uk$1...@dont-email.me...

...

> MOVcc is much better if you want to set a register to a desired
> value - SETcc's only advantage is its shorter instruction size,
> but you buy it at the cost of a possible merge penalty.

Another advantage of SETcc is that it is available on earlier CPUs.
Unfortunately MOVcc is not implemented on all x86-32 machines.

James

James Harris

unread,
Sep 21, 2015, 6:15:50 AM9/21/15
to
"wolfgang kern" <now...@never.at> wrote in message
news:mtobmd$bcb$1...@speranza.aioe.org...

...

> test:
> or eax,eax
> setns al movzx eax,al
> ret

Is it better to use OR or TEST? I usually use the latter to avoid extra
traffic to registers and write conflicts. For example, in this case
SETNS AL could, at least in theory, have to wait for OR EAX, EAX to
complete so it can then update the low byte of the same register.

Followups set to just alt.lang.asm.

James

Melzzzzz

unread,
Sep 21, 2015, 7:25:55 AM9/21/15
to
xeon phi does not supports cmov ;(

Bernhard Schornak

unread,
Sep 21, 2015, 8:33:03 AM9/21/15
to
Surely a fact to consider if code is written for old hardware.
I practise 64 bit programming, where MOVcc is available on all
machines capable to run in long mode. I forgot a lot about the
missing features we were badly missing in times where the 8088
was sold as the dog's bollocks... ;)

Bernhard Schornak

unread,
Sep 21, 2015, 8:33:19 AM9/21/15
to
wolfgang kern wrote:


> And why do you alter ECX without using it.


GCC treats ECX and EDX as "volatile" registers. That is: Both
must be preserved by the calling function if their content is
required on function return. A quite stupid behaviour, but it
is common practise for most C compilers.

James Harris

unread,
Sep 21, 2015, 9:12:30 AM9/21/15
to
"Bernhard Schornak" <scho...@web.de> wrote in message
news:mtotaa$97j$2...@dont-email.me...
> James Harris wrote:

...

>> Another advantage of SETcc is that it is available on earlier CPUs.
>> Unfortunately MOVcc is not implemented on all x86-32 machines.
>
>
> Surely a fact to consider if code is written for old hardware.
> I practise 64 bit programming, where MOVcc is available on all
> machines capable to run in long mode.

OK

> I forgot a lot about the
> missing features we were badly missing in times where the 8088

Just checked. According to the Nasm manual CMOV became available on the
P6. I think that means Pentium Pro and later. So the instruction was not
even available on a Pentium. That's the old Pentium, not the new CPU of
the same name. Why Intel thought it was a good idea to reuse the name is
a mystery.

James

Bernhard Schornak

unread,
Sep 21, 2015, 12:04:11 PM9/21/15
to
James Harris wrote:


> "Bernhard Schornak" <scho...@web.de> wrote in message news:mtotaa$97j$2...@dont-email.me...
>> James Harris wrote:

<snip>

> Just checked. According to the Nasm manual CMOV became available on the P6. I think that means
> Pentium Pro and later. So the instruction was not even available on a Pentium. That's the old
> Pentium, not the new CPU of the same name. Why Intel thought it was a good idea to reuse the name is
> a mystery.


The old bull (aka iNTEL) can do almost everything... ;)

Rod Pemberton

unread,
Sep 21, 2015, 6:00:42 PM9/21/15
to
On Mon, 21 Sep 2015 03:28:52 -0400, wolfgang kern <now...@never.at> wrote:

> A skilled programmer would use a decent compiler and produce:
>
> test:
> xor ecx,ecx
> or eax,eax
> js $+1 ;skip over inc
> inc ecx
> mov eax,ecx
> ret
>
> this are 9 byte and for sure much faster than yours.
>
> or even one byte shorter and branchless:
>
> test:
> or eax,eax
> setns al
> movzx eax,al
> ret

This will work, yes?

test:
neg eax
sbb eax, eax

This may cause register stalls too.

movzx is slow, but may reduce register stalls on more
recent processors. movzx isn't available for early
processors. setcc also isn't available for early
processors and is slow for early processors.

James Harris

unread,
Sep 22, 2015, 1:14:03 PM9/22/15
to
"Rod Pemberton" <b...@fasdfrewar.cdm> wrote in message
news:op.x5b13mlvyfako5@localhost...

...

> movzx isn't available for early
> processors. setcc also isn't available for early
> processors and is slow for early processors.

AIUI, for any 32-bit code MOVZX and SETcc *are* available. They were
present even on the 386.

James

Rod Pemberton

unread,
Sep 22, 2015, 5:09:30 PM9/22/15
to
On Tue, 22 Sep 2015 13:14:02 -0400, James Harris <james.h...@gmail.com> wrote:

> "Rod Pemberton" <b...@fasdfrewar.cdm> wrote in message
> news:op.x5b13mlvyfako5@localhost...

>> movzx isn't available for early
>> processors. setcc also isn't available for early
>> processors and is slow for early processors.
>
> AIUI, for any 32-bit code MOVZX and SETcc *are* available.
> They were present even on the 386.

FYI, this has no effect on what was stated.

James Harris

unread,
Sep 22, 2015, 5:52:54 PM9/22/15
to
"Rod Pemberton" <b...@fasdfrewar.cdm> wrote in message
news:op.x5dueac7yfako5@localhost...
> On Tue, 22 Sep 2015 13:14:02 -0400, James Harris
> <james.h...@gmail.com> wrote:
>
>> "Rod Pemberton" <b...@fasdfrewar.cdm> wrote in message
>> news:op.x5b13mlvyfako5@localhost...
>
>>> movzx isn't available for early
>>> processors. setcc also isn't available for early
>>> processors and is slow for early processors.
>>
>> AIUI, for any 32-bit code MOVZX and SETcc *are* available.
>> They were present even on the 386.
>
> FYI, this has no effect on what was stated.

So you meant that movzx wasn't available on 16-bit CPUs? Why would you
say that (without specifying)? All of the code in your post was 32-bit.

James

Rod Pemberton

unread,
Sep 22, 2015, 9:36:09 PM9/22/15
to
On Tue, 22 Sep 2015 17:52:53 -0400, James Harris <james.h...@gmail.com> wrote:

> "Rod Pemberton" <b...@fasdfrewar.cdm> wrote in message
> news:op.x5dueac7yfako5@localhost...
>> On Tue, 22 Sep 2015 13:14:02 -0400, James Harris
>> <james.h...@gmail.com> wrote:
>>> "Rod Pemberton" <b...@fasdfrewar.cdm> wrote in message
>>> news:op.x5b13mlvyfako5@localhost...

>>>> movzx isn't available for early
>>>> processors. setcc also isn't available for early
>>>> processors and is slow for early processors.
>>>
>>> AIUI, for any 32-bit code MOVZX and SETcc *are* available.
>>> They were present even on the 386.
>>
>> FYI, this has no effect on what was stated.
>
> So you meant that movzx wasn't available on 16-bit CPUs?
> Why would you say that (without specifying)? All of the
> code in your post was 32-bit.
>

It was just a simple statement of the facts.

Bernhard Schornak

unread,
Oct 1, 2015, 12:43:58 PM10/1/15
to
Skybuck Flying wrote:


> Hello,
>
> Is there a more efficient routine for the following:
>
> procedure Constrain( var Para : integer );
> begin
> if Para < 0 then Para := 0;
> if Para > 1 then Para := 1;
> end;
>
> The idea here is to constrain an integer to 0 if it's negative or zero, or 1 if it's positive.
>
> I tried (Para and 1) but that leads to bugs since values can be positive but not have bit 0 set.
>
> So something else might be needed ! ;)
>
> Maybe a special instruction in the future if it doesn't exist yet ?


Reading through the replies posted yet, here's my 2 cents:

NEG EAX
SHR EAX,31

Which might be the shortest and fastest solution... ;)

Skybuck Flying

unread,
Oct 1, 2015, 5:43:32 PM10/1/15
to
"
NEG EAX
SHR EAX,31
"

I think I already came up with something like that in the other thread:
"is_positive".

However that solution has the following problem:

error function is bugged at index: -2147483648

Besides from that one index it works.

Though having a 100% solution would be best ! ;)

Bye,
Skybuck.


Skybuck Flying

unread,
Oct 1, 2015, 5:46:52 PM10/1/15
to
"Skybuck Flying" wrote in message
news:d127c$560da903$d47876e2$32...@news.ziggo.nl...

"
NEG EAX
SHR EAX,31
"

I think I already came up with something like that in the other thread:
"is_positive".

Correction:

It was my other thread called:

"Re: How to write this sign function in Delphi."

Bye,
Skybuck.

Bernhard Schornak

unread,
Oct 2, 2015, 1:49:17 PM10/2/15
to
Skybuck Flying wrote:


> "
> NEG EAX
> SHR EAX,31
> "
>
> I think I already came up with something like that in the other thread: "is_positive".


Not in this thread.

(I generally ignore threads where posters answer their own
questions.)


> However that solution has the following problem:
>
> error function is bugged at index: -2147483648
>
> Besides from that one index it works.
>
> Though having a 100% solution would be best ! ;)


You have to pay 3 instructions to add this 'feature':

NEG EAX
XOR ECX, ECX
CMP EAX, 0x80000000
CMOVE EAX, ECX
SHR EAX, 31

Quite a lot to get a "100% solution"...

M Philbrook

unread,
Oct 2, 2015, 5:50:25 PM10/2/15
to
In article <mumfv2$43n$2...@dont-email.me>, scho...@web.de says...
AND EAX, EAX; ; assume value is in EAX and generate a flag.
Mov EAX, 00; ; Now zero it, should not harm flags.
JZ @done; ; Now use Z flag to finalize it.
Inc EAX, 1; ;force 1 for true if not Z flag.
@Done;

That should yeld "1" for any value not 0
and if you want to use the ZERO flag as a True/False results
you can. So you kill 2 birds with one stone and it's less code.

Unless I missed something, that should work, at least for the intel
processors.
The old 6502 series would be a different story, loading A, the
accumulator woul also set the flags, but that was just make the code
shorter :)

Jamie

Rosario19

unread,
Oct 2, 2015, 6:19:42 PM10/2/15
to
On Sat, 19 Sep 2015 18:07:46 +0200, "Skybuck Flying" wrote:
>Hello,
>
>Is there a more efficient routine for the following:
>
>procedure Constrain( var Para : integer );
>begin
>if Para < 0 then Para := 0;
>if Para > 1 then Para := 1;
>end;
>
>The idea here is to constrain an integer to 0 if it's negative or zero, or 1
>if it's positive.


procedure Constrain( var Para : integer );
begin
if Para < 0 then Para := 0;
if Para > 1 then Para := 1;
end;

efficence in what in above? more than above? for me it is enough
efficient that

>I tried (Para and 1) but that leads to bugs since values can be positive but
>not have bit 0 set.
>
>So something else might be needed ! ;)
>
>Maybe a special instruction in the future if it doesn't exist yet ?
>
>Bye,
> Skybuck.
>
>

Bernhard Schornak

unread,
Oct 3, 2015, 9:52:49 AM10/3/15
to
M Philbrook wrote:


> In article <mumfv2$43n$2...@dont-email.me>, scho...@web.de says...
>>
>> Skybuck Flying wrote:
>>
>>> "
>>> NEG EAX
>>> SHR EAX,31
>>> "
>>>
>>> I think I already came up with something like that in the other thread: "is_positive".
>>
>> Not in this thread.
>>
>> (I generally ignore threads where posters answer their own
>> questions.)
>>
>>
>>> However that solution has the following problem:
>>>
>>> error function is bugged at index: -2147483648
>>>
>>> Besides from that one index it works.
>>>
>>> Though having a 100% solution would be best ! ;)
>>
>>
>> You have to pay 3 instructions to add this 'feature':
>>
>> NEG EAX
>> XOR ECX, ECX
>> CMP EAX, 0x80000000
>> CMOVE EAX, ECX
>> SHR EAX, 31
>>
>> Quite a lot to get a "100% solution"...
>
>
> AND EAX, EAX; ; assume value is in EAX and generate a flag.
> Mov EAX, 00; ; Now zero it, should not harm flags.
> JZ @done; ; Now use Z flag to finalize it.
> Inc EAX, 1; ;force 1 for true if not Z flag.
> @Done;
>
> That should yeld "1" for any value not 0
> and if you want to use the ZERO flag as a True/False results
> you can. So you kill 2 birds with one stone and it's less code.


First : There is a(n avoidable) branch in your code... ;)

Second: The result should be zero if the input is zero or
negative - only positive numbers should return 1.

This one would do, but treats zero as positive number:

BTC EAX, 31
SHR EAX, 31

Also, a solution with MOVMSKPS was possible, but might be
slower than the integer version as long as zero has to be
treated as negative number.

M Philbrook

unread,
Oct 3, 2015, 2:41:03 PM10/3/15
to
In article <muomfm$1o4$2...@dont-email.me>, scho...@web.de says...
> >>
> >> NEG EAX
> >> XOR ECX, ECX
> >> CMP EAX, 0x80000000
> >> CMOVE EAX, ECX
> >> SHR EAX, 31
> >>
> >> Quite a lot to get a "100% solution"...
> >
> >
> > AND EAX, EAX; ; assume value is in EAX and generate a flag.
> > Mov EAX, 00; ; Now zero it, should not harm flags.
> > JZ @done; ; Now use Z flag to finalize it.
> > Inc EAX, 1; ;force 1 for true if not Z flag.
> > @Done;
> >
> > That should yeld "1" for any value not 0
> > and if you want to use the ZERO flag as a True/False results
> > you can. So you kill 2 birds with one stone and it's less code.
>
>
> First : There is a(n avoidable) branch in your code... ;)
Why? just a simple jump for the CPU.

>
> Second: The result should be zero if the input is zero or
> negative - only positive numbers should return 1.
>
It is zero if the input is 0 or -. ?
The original question was to make a logic 1 or 0 when
ever the value is 0 for logic 0 and 1 for any other
number and that included - numbers.

Must we not forget that (-trues) 16 or more bit land exist with
widows programming. and if you wanted to truely treat (-) as logic
0, which they are not, you can simply use a different flag, the rest
of the code is the same.

Getting back to the original coding that was being run.

When "AND EAX, EAX", the contents remain but flags are set.
(Z) flag gets set when there is any bits set, including the
sign.

MOV operation does not effect the flag reg, last time I knew,
so the flag status is still set.

And like before, the JZ is a simple jump? I don't see much CPU time
there. In fact I invite it.

But if it's not zero value, then the EAX gets incremented once, I
suppose one could move"1" into it, which ever is less costly.
I did make a mistake on the inc, it should of been INC EAX and you
didn't catch that?

But in the end, you have EAX reg with 0 or 1 and you also
have the Z flag set.

I just ran that code and did a usage check, your solution cost more..

WHat can I say, accept that I think I'll use my method if ever needed.
Thanks for the input and I'll drop this because all I see is wasted
code examples, ever kill of ASM code use for simple task..

Jamie

Bernhard Schornak

unread,
Oct 4, 2015, 7:26:16 AM10/4/15
to
M Philbrook wrote:


> In article <muomfm$1o4$2...@dont-email.me>, scho...@web.de says...
>>>>
>>>> NEG EAX
>>>> XOR ECX, ECX
>>>> CMP EAX, 0x80000000
>>>> CMOVE EAX, ECX
>>>> SHR EAX, 31
>>>>
>>>> Quite a lot to get a "100% solution"...
>>>
>>>
>>> AND EAX, EAX; ; assume value is in EAX and generate a flag.
>>> Mov EAX, 00; ; Now zero it, should not harm flags.
>>> JZ @done; ; Now use Z flag to finalize it.
>>> Inc EAX, 1; ;force 1 for true if not Z flag.
>>> @Done;
>>>
>>> That should yeld "1" for any value not 0
>>> and if you want to use the ZERO flag as a True/False results
>>> you can. So you kill 2 birds with one stone and it's less code.
>>
>>
>> First : There is a(n avoidable) branch in your code... ;)
> Why? just a simple jump for the CPU.


AMD optimisation guide (PDF #47414):


2.6 Branch-Prediction

To predict and accelerate branches, AMD Family 15h processors employ a
combination of next-address logic, a 2-level branch target buffer (BTB)
for branch identification and direct target prediction, a return address
stack used for predicting return addresses, an indirect target predictor
for predicting indirect jump and call addresses, a hybrid branch predictor
for predicting conditional branch directions, and a fetch window tracking
structure (BSR). Predicted-taken branches incur a 1-cycle bubble in the
branch prediction pipeline when they are predicted by the L1 BTB, and a
4-cycle bubble in the case where they are predicted by the L2 BTB. The
minimum branch misprediction penalty is 20 cycles in the case of
conditional and indirect branches and 15 cycles for unconditional direct
branches and returns.


iNTEL 64 and IA-32 Architectures Optimization Reference Manual
(PDF 248966-030):

3.4.1.1 Eliminating Branches

Eliminating branches improves performance because:

• It reduces the possibility of mispredictions.
• It reduces the number of required branch target buffer (BTB) entries.
Conditional branches, which are never taken, do not consume BTB
resources.

There are four principal ways of eliminating branches:

• Arrange code to make basic blocks contiguous.
• Unroll loops, as discussed in Section 3.4.1.7, “Loop Unrolling.”
• Use the CMOV instruction.
• Use the SETCC instruction.

The following rules apply to branch elimination:

Assembly/Compiler Coding Rule 1. (MH impact, M generality) Arrange code
to make basic blocks contiguous and eliminate unnecessary branches.

Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the SETCC
and CMOV instructions to eliminate unpredictable conditional branches
where possible. Do not do this for predictable branches. Do not use these
instructions to eliminate all unpredictable conditional branches (because
using these instructions will incur execution overhead due to the
requirement for executing both paths of a conditional branch). In addition,
converting a conditional branch to SETCC or CMOV trades off control flow
dependence for data dependence and restricts the capability of the
out-of-order engine. When tuning, note that all Intel 64 and IA-32
processors usually have very high branch prediction rates. Consistently
mispredicted branches are generally rare. Use these instructions only if
the increase in computation time is less than the expected cost of a
mispredicted branch.


>> Second: The result should be zero if the input is zero or
>> negative - only positive numbers should return 1.
>>
> It is zero if the input is 0 or -. ?
> The original question was to make a logic 1 or 0 when
> ever the value is 0 for logic 0 and 1 for any other
> number and that included - numbers.


The top level post in this thread asked this question:

>>>> Hello,
>>>>
>>>> Is there a more efficient routine for the following:
>>>>
>>>> procedure Constrain( var Para : integer );
>>>> begin
>>>> if Para < 0 then Para := 0;
>>>> if Para > 1 then Para := 1;
>>>> end;
>>>>
>>>> The idea here is to constrain an integer to 0 if it's negative or zero, or 1 if it's positive.

What you replied is a *very* fancyful interpretation of the
original task... ;)


> Must we not forget that (-trues) 16 or more bit land exist with
> widows programming. and if you wanted to truely treat (-) as logic
> 0, which they are not, you can simply use a different flag, the rest
> of the code is the same.
>
> Getting back to the original coding that was being run.
>
> When "AND EAX, EAX", the contents remain but flags are set.
> (Z) flag gets set when there is any bits set, including the
> sign.


"TEST EAX, EAX" leaves the content of EAX untouched and the
flags are set, as well.


> MOV operation does not effect the flag reg, last time I knew,
> so the flag status is still set.


Yes.


> And like before, the JZ is a simple jump? I don't see much CPU time
> there. In fact I invite it.


The usual branch prediction logic is:

1. Predict branch was not taken (1st)
2. Predict branch was taken (2nd)
3. Use prediction table for following predictions (Nth)

(3) only applies as long as that branch label is stored in
the branch prediction buffer. If the buffer was overwritten
with later branch targets, begin with (1), again. Guess how
often this happens in a multithreaded environment.

With your code you will trigger a penalty of 20 clocks each
time the function is called. After two subsequent calls, it
should be executed in three clocks, but still does not sort
out negative numbers.

My code needs 4 clocks to execute, but works as demanded by
the initial post.
0 new messages