Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

More efficient routine/instruction for (binary) "constrain" ?

4 views
Skip to first unread message

Skybuck Flying

unread,
Sep 19, 2015, 12:07:44 PM9/19/15
to
Hello,

Is there a more efficient routine for the following:

procedure Constrain( var Para : integer );
begin
if Para < 0 then Para := 0;
if Para > 1 then Para := 1;
end;

The idea here is to constrain an integer to 0 if it's negative or zero, or 1
if it's positive.

I tried (Para and 1) but that leads to bugs since values can be positive but
not have bit 0 set.

So something else might be needed ! ;)

Maybe a special instruction in the future if it doesn't exist yet ?

Bye,
Skybuck.



Skybuck Flying

unread,
Sep 19, 2015, 12:12:52 PM9/19/15
to
Also if a special instruction is ever added for it I would recommend
programming languages to implement it as follows for example:

var
a : integer;
begin

A := 23523;
A := (A-1) constrain 1;
end;

Or perhaps:

var
a : integer;
begin

A := 23523;
A := (A-1) con 1;
end;

(sounds to much like "con" like scam or something)
Or perhaps

var
a : integer;
begin

A := 23523;
A := (A-1) cons 1;
end;

^ For a shorter version. though might be confusing with const.

Either con or constrain is nice. Con seems nice for short hand versions and
code efficiency.

Also notice the parameter... it might be interesting to extend the
instruction to some larger range.

constrain 1 I would call: "binary constrain" lol..

But there could be other interesting constrains.

var
a : integer;
begin
A := 23523;
A := (A-1) con 100;
end;

Bye,
Skybuck.


Skybuck Flying

unread,
Sep 19, 2015, 12:16:58 PM9/19/15
to
Hmm the procedure version can't work like that haha...

a := constrain( something-1 ); // leads to errors

It's obviou what problem is... it's not a function so new version:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then result := 0;
if Para > 1 then result := 1;
end;

That should do it.

Also nice, no more var parameter... so operations like -1 should be
possible.

Not sure if a constant version would be safe ! ;)

Here is constant version, but I am gonna stick with the stack based
(?)/temporarely version for now or so ! ;)

function Constrain( const Para : integer ) : integer;
begin
if Para < 0 then result := 0;
if Para > 1 then result := 1;
end;

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 12:18:01 PM9/19/15
to
And ofcourse I just noticed another sloppy bug.. hihi...

The greater than sign still left room for undefined results, so finally
correct version is probably:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

Bye,
Skybuck.


Skybuck Flying

unread,
Sep 19, 2015, 12:20:16 PM9/19/15
to
However now Delphi nags about might be undefined result for function...

So for now to get rid of nag:

function Constrain( Para : integer ) : integer;
begin
result := Para;
if Para < 0 then Result := 0;
if Para > 1 then Result := 1;
end;

This kinda stays more true to it's natural intention... and thus is more
correct.

Now it can also be extended with range parameters and the para will only be
adjusted when out of range.

So this version is more correct than previous ones ! ;)

Thanks Delphi compiler for your hint message ! ;) =D lol.

Bye,
Skybuck.


Melzzzzz

unread,
Sep 19, 2015, 12:21:08 PM9/19/15
to
Learn how to program first...

Skybuck Flying

unread,
Sep 19, 2015, 12:25:34 PM9/19/15
to
Strangely enough this version is actually bugged.

It looks correct but it's not...

Don't know yet, why it's not correct, nor do I care much... it does not
concern me... though out of curiosity I may investigate it later or I may
not ;)

buggy:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

Strangely enough

Constrain( something -1 )
or
Constrain( something -2)

might lead to some weird bug where 1 is returned while it should be zero ?

Hmmm... may have to investigate later...

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 12:28:16 PM9/19/15
to
Haha now I see the bug:

function Constrain( Para : integer ) : integer;
begin
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

If following happens:
A = 1
Constrain( A-1)

Then Para = 0;

and thus neither branch will be executed leading to an undefined result ! ;)

Nice example of a totally fucked/obfuscated function which could be used in
an obfuscation contest ! HAHA LOl.

Undefined results <- not sure if this is allowed in obfuscation contest lol
?! As long as it doesn't influence outcome too much... maybe one can get
away with it ! LOL.

Or just use it as futher obfuscations... not sure if obfuscation contests
actually required something to be computed properly ? ;)

Bye,
Skybuck.




Skybuck Flying

unread,
Sep 19, 2015, 1:30:02 PM9/19/15
to
Ah details, details, details ! ;) =D

The devil is in the details ! =D

Been out of it for a little bit...

Though 99.9% of my code was pretty good, ofcourse there are always little
buggies...

But the most nasty ones are trivial little thingies like these which can or
are easy to be overlooked ! ;) =D

Don't you agree that all these version are pretty darn tricky ?! ;)

The mission itself seems simple enough... but quite easy to frack up somehow
! ;)

There's not even an instruction for something as simple as this ? As far as
I know ?

Thus two expensive branches needed ?! ;)

One possible optimization could be:

function Constrain( Para : integer ) : integer; // * opt
begin
if Para <= 0 then
Result := 0;
else
Result := 1;
end;

Not sure if that would actually execute faster than the correct
implementation:

function Constrain( Para : integer ) : integer;
begin
result := Para;
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

The "supposedly optimized one" above (see *opt) is also less extendedable I
think...

So the two branch version might actually be required.

Though I can imagine a hard-wired version to perhaps use some kind of
gate-tricks to try and prevent the branches and such ! ;)

Perhaps some maths/muls or something could be used... I have bigger fish to
fry... so for now not going to spend time on it ! ;) =D

But the MICRO people might like thinking about this ! ;) =D

So I shove it over to them ! ;) =D

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 1:44:18 PM9/19/15
to
Some last little hint about this:

The result of "constrain" will ultimately be used to keep or dispose other
values.

So for example:

Something1 = Something2 * Constrain( Something3 - Something4 );

Perhaps there is a different solution which may not need such accurate
results.

Though in my case the 0's and 1's returned from Constrain would also be
added together for other things.

However perhaps that could have been done differently... not sure yet...
(note to self: related to field length determination).

Anyway maybe for a processor it might be enough to compute something like:

Something5 = Something3-Something4

Or all bits of Something5 together to form a result which indicates a 0 or
1.

So for example if I had to write this in Delphi without a branch using this
idea it would look something like:

function Constrain( Para : integer ) : integer; // * opt
begin
result :=
(Para shr 31) or
(Para shr 30) or
(Para shr 29) or
(Para shr 28) or
(Para shr 27) or
(Para shr 26) or
(Para shr 25) or
(Para shr 24) or
(Para shr 23) or
(Para shr 22) or
(Para shr 21) or
(Para shr 20) or
(Para shr 19) or
(Para shr 18) or
(Para shr 17) or
(Para shr 16) or
(Para shr 15) or
(Para shr 14) or
(Para shr 13) or
(Para shr 12) or
(Para shr 11) or
(Para shr 10) or
(Para shr 9) or
(Para shr 8) or
(Para shr 7) or
(Para shr 6) or
(Para shr 5) or
(Para shr 4) or
(Para shr 3) or
(Para shr 2) or
(Para shr 1) or
(Para shr 0);
end;

This may actually give a valid/decent result without requiring any branch !

Funny thing is it would currently maybe require 32x2 instructions or so...
not sure if that is worth a potential branch-miss-prediction penalty ! ;) =D

Again... this is for the MICRO people to figure out ! LOL.

But these gates could maybe be hard-wired and such... soo... yeah...

There are probably other solutions as well... let me know any other creative
ones ! ;) :)

Also try and keep it fast ! =D

Branchless would also be interesting ! ;)

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 1:48:43 PM9/19/15
to
Anyway for parallel processors having a special value which would say:
"dispose/keep" "yes/no" could be interesting.

Even for integers...

A little bit like "NAN" for floating point numbers or so.

Not sure if this makes any sense... 0 and 1 could ofcourse be used for it...

Or perhaps a special instruction(s) which can somehow use any variable with
any value and use " <> 0" as keep and " = 0" as dispose.

Or vice versa...

So my point is basically a one and zero value not necessaryily required for
keeping/disposing of other values, some special rules could be applied as
above.

Perhaps even all instructions could be modified to use some kind of "gate"
mechnism.

for example:

add a, b

A gate could be added as follows:

add a,b, gate

If the gate is negative/positive the value/computation of add a,b is kept.
If the gate is zero the computation from add is disgarded.

This could prevent these kinds of seperate constrain computations which
could be beneficial for writing parallel code ! ;)

Bye,
Skybuck.




Skybuck Flying

unread,
Sep 19, 2015, 1:51:48 PM9/19/15
to
So programmers can then write code as follows:

A[0] := B[0] + C[0], D[0];
A[1] := B[1] + C[0], D[1];
A[2] := B[2] + C[0], D[2];
A[3] := B[3] + C[0], D[3];

These can then be computed in parallel without requiring branches and all
kinds of other stuff.

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 8:55:56 PM9/19/15
to
Another idea which came to mind to speed it up is or two bits together
before shifting it down, to save on shifts and or's.

The way this can be done easily and even massively parallel is by using two
"half arrays" (splitting the array in two).

This can be done with nice array type casts, I will attempt to write some
code here, but not going to test it... it's late need to go to bed soon :)

I watched two cool movies today: "star trek into the darkness" (already seen
a crappy cam version of it) and "the 5th element" enjoyed that one too...
kinda funny movie.

Anyway now that that's done time to write some more code ! LOL ;) =D

I hope more reasoning is correct lol... here goes:

I was gonna write:

type
TWordArray = array[0..1] of word;
TByteArray = array[0..1] of word;

But unfortunately this cannot go down to just nimbles and half nimbles and
bits... Delphi does not support it.

So a different solution will need to be used... "divide and conquer" like
approach with shrs should be possible.

Like a binary search so here goes:

The idea now is to split up/copy the parameter into two registers and shift
one of them down and perform the or's and such.

To make it more clear they will be called top and bottom, top is to be
shifted down.

Upper part of bottom will need to be cleared... unfortunately enough ;)

Going to typecast to longword/unsigned integer just in case ! ;)

function Constrain( Para : integer ) : integer; // * opt
var
ParaTop : longword;
ParaBottom : longword;
begin

// 2x16
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 16;

// and clear upper part of bottom
ParaBottom := ParaBottom and (65535 shl 16);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x8
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 8;

// and clear upper part of bottom
ParaBottom := ParaBottom and (255 shl 8);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x4
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 4;

// and clear upper part of bottom
ParaBottom := ParaBottom and (15 shl 4);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x2
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 2;

// and clear upper part of bottom
ParaBottom := ParaBottom and (3 shl 2);

// now or them together
Para := ParaTop or ParaBottom;

// now repeat for 2x1
ParaTop := longword(Para);
ParaBottom := longword(Para);

// now shift top half way down
ParaTop := ParaTop shr 1;

// and clear upper part of bottom
ParaBottom := ParaBottom and (1 shl 1);

// now or them together
Para := ParaTop or ParaBottom;

Result := Para;
end;

// abour 34 instructions or so would be my estimation.

Possibly an improvement over previous version.

This newer version assuming it's correct could probably be optimized
somewhat here and there ! ;) :)

Bye,
Skybuck :)

Skybuck Flying

unread,
Sep 19, 2015, 10:06:30 PM9/19/15
to
Thanks !

You seem to be the winner for now "dunno", unless it's slow or something !
;)

Or perhaps there are other constraints/limitations... but for now.. my own
rolled version on your idea seems to work just fine:

function ConstrainV9(para : integer) : integer; register; assembler;
asm
test eax, eax
setg al
end;

(Other newsgroups added so they too can see this fine 2 instruction solution
! ;) :) and perhaps comment on it further if there are any hidden issues
?!;))

Best of all this routine seems fully backwards compatible with 80486 ! It
goes even back to 80386 compatibility... nice ! ;) =D

Bye,
Skybuck :D


Skybuck Flying

unread,
Sep 19, 2015, 10:09:17 PM9/19/15
to
Wooooooooopppss ! =D

I cheered too soon LOL.

This function also has a problem with negative numbers !

AAAAAHHH To bad for you "dunno" lol... your idea may be bogus after all ?!

Problem was I forgot to update the test code from calling ConstrainV8 to
ConstrainV9 ! LOL.

So it was testing the wrong/old function !

Now the new one is being called and it's BUGGGGGGGED ! ;) =D

function ConstrainV9(para : integer) : integer; register; assembler;
asm
test eax, eax
setg al
end;

Maybe it can be fixed somehow ?! ;) :) =D

Bye for now,
Skybuck.

Skybuck Flying

unread,
Sep 19, 2015, 10:16:33 PM9/19/15
to
This seems to fix it:

function ConstrainV9(para : integer) : integer; register; assembler;
asm
test eax, eax
mov eax, 0
setg al
end;

I like the above one better than this one,

// this style supports 16 bit, 32 bit and perhaps even 64 bit, will probably
not work on 80486 and not in 8 bit versions.
function ConstrainV8(para : integer) : integer; register; assembler;
const
one : integer = 1;
zero : integer = 0;
asm
cmp eax, 0
cmovg eax, one
cmovl eax, zero
end;

since the top one is shorter and more backwards compatible and doesn't
require these bloody constants :)

Me not assembler expert so not sure if adding the mov, 0 could lead to any
problems ?! ;)

I think that was main problem with top one... it didn't clear eax enough...
and could lead residue in it leading to wrong/garbaged results ! ;)

Another solution might be:

function ConstrainV10(para : integer) : integer; register; assembler;
asm
test eax, eax
setg al
and eax, 1
end;

This one is also correct, I like this one even more than the mov version...
cause the setg is right after the test... so no chance of mov screwing up
any flags ?! ;) :)

Though if for some reason the flag have to be kept then the and version
might screw that up... but that's ok... for now the flags do not need to be
preserved...

if flags would need to be preserved than maybe mov version might be nicer !
;)

Bye,
Skybuck.

dunno

unread,
Sep 19, 2015, 10:24:39 PM9/19/15
to
Yes, you have to clear upper 24 bits of register. Apparently SET is
accessing only lower 8 bits.

Try this:

test eax, eax
setg bl
movzb eax, bl

Or, if you don't want to use MOVZB, you can do it with AND:

test eax, eax
setg al
and eax, 0xFFFFFF00h


> Bye for now,
> Skybuck.


--
dunno

dunno

unread,
Sep 19, 2015, 10:27:10 PM9/19/15
to
This won't work.

> This one is also correct, I like this one even more than the mov
> version... cause the setg is right after the test... so no chance of mov
> screwing up any flags ?! ;) :)
>
> Though if for some reason the flag have to be kept then the and version
> might screw that up... but that's ok... for now the flags do not need to be preserved...
>
> if flags would need to be preserved than maybe mov version might be nicer ! ;)
>
> Bye,
> Skybuck.


--
dunno

dunno

unread,
Sep 19, 2015, 10:35:13 PM9/19/15
to
No wait. You right. But it should be OR instead of AND, otherwise result
always will be 1.

dunno

unread,
Sep 19, 2015, 10:47:59 PM9/19/15
to
Apparently I forgot how AND is working. :) It should be:

test eax, eax
setg al
or eax, 1

Bernhard Schornak

unread,
Sep 19, 2015, 11:28:44 PM9/19/15
to
Skybuck Flying wrote:


> ... unless it's slow or something !
>
> function ConstrainV9(para : integer) : integer; register; assembler;
> asm
> test eax, eax
> setg al


This will be slow because it forces register merging (EAX => AL),
see "partial register writes" in AMD's ans iNTEL's data sheets...


Greetings from Augsburg

Bernhard Schornak

Bernhard Schornak

unread,
Sep 19, 2015, 11:29:00 PM9/19/15
to
Accessing a lookup table is *much* faster...

Besides that: Why do you post HLL goobledigook to an assembler NG?

Skybuck Flying

unread,
Sep 20, 2015, 7:47:06 AM9/20/15
to


"dunno" wrote in message news:mtl610$bh5$1...@news.mixmin.net...
I don't think so, or 1 will always set the result to 1, that is undesired.

and 1 will single out the last bit, it will be 0 if it's zero and it will be
1 if it's 1.

It works.

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 20, 2015, 7:50:17 AM9/20/15
to


"Bernhard Schornak" wrote in message news:mtl927$os0$2...@dont-email.me...
How would it look like ?

"
Besides that: Why do you post HLL goobledigook to an assembler NG?
"

It looks very much like assembler :)

Simple instructions really.

Bye,
Skybuck.

Skybuck Flying

unread,
Sep 20, 2015, 12:09:38 PM9/20/15
to
"
Yes, you have to clear upper 24 bits of register. Apparently SET is
accessing only lower 8 bits.
"

The test does compare all 32 bits as far as I know...

Once the test is done the only thing that is needed is a 1 or a 0 to return
as a result.

"
test eax, eax
setg bl
movzb eax, bl
"

movzb not supported by Delphi.


Or, if you don't want to use MOVZB, you can do it with AND:

"
test eax, eax
setg al
and eax, 0xFFFFFF00h
"

This is bugged for some reason...

I think this actually keeps the upper bits which would be wrong.

I think what you might be looking for is an inverted mask, which does clear
out the upper bits and keeps the lower bits:

"
test eax, eax
setg al
and eax, 0x000000FFh
"

^ This does work.

"
Apparently I forgot how AND is working. :) It should be:

test eax, eax
setg al
or eax, 1
"

No this also doesn't work lol... ofcourse or will keep the upper bits as
well.

since setg doesn't clear upper bits yet ?!

Did you have a good night sleep ?! lol... your brain seems to be a
mess/confused or you as noob... or you thought this was easy.... but
apperently it's not ! LOL.

You are forgiven though...

I guess sometimes the human brain can be easily fooled/confused by something
that looks easy/trivial... but then again it's not ! LOL.

Lazyness got to you ?! =D

Bye,
Skybuck ;) =D

Melzzzzz

unread,
Sep 20, 2015, 7:39:38 PM9/20/15
to
On Sun, 20 Sep 2015 05:28:42 +0200
Bernhard Schornak <scho...@web.de> wrote:

> Skybuck Flying wrote:
>
>
> > ... unless it's slow or something !
> >
> > function ConstrainV9(para : integer) : integer; register; assembler;
> > asm
> > test eax, eax
> > setg al
>
>
> This will be slow because it forces register merging (EAX => AL),
> see "partial register writes" in AMD's ans iNTEL's data sheets...
>
>
Hmmm, I see gcc (svn trunk) generates such code for returning bool value
from function. Perhaps they know something we don't know?
eg for:
bool fun(int x) {
if (x > 5) return true;
return false;
}
gcc generated:
cmpl $5, %edi
setg %al
ret

Melzzzzz

unread,
Sep 20, 2015, 8:21:35 PM9/20/15
to
Ah I see now, it uses different registers

Bernhard Schornak

unread,
Sep 21, 2015, 3:24:53 AM9/21/15
to
Yes. ;)

MOVcc is much better if you want to set a register to a desired
value - SETcc's only advantage is its shorter instruction size,
but you buy it at the cost of a possible merge penalty.

Bernhard Schornak

unread,
Oct 1, 2015, 12:43:58 PM10/1/15
to
Skybuck Flying wrote:


> Hello,
>
> Is there a more efficient routine for the following:
>
> procedure Constrain( var Para : integer );
> begin
> if Para < 0 then Para := 0;
> if Para > 1 then Para := 1;
> end;
>
> The idea here is to constrain an integer to 0 if it's negative or zero, or 1 if it's positive.
>
> I tried (Para and 1) but that leads to bugs since values can be positive but not have bit 0 set.
>
> So something else might be needed ! ;)
>
> Maybe a special instruction in the future if it doesn't exist yet ?


Reading through the replies posted yet, here's my 2 cents:

NEG EAX
SHR EAX,31

Which might be the shortest and fastest solution... ;)

Skybuck Flying

unread,
Oct 1, 2015, 5:43:32 PM10/1/15
to
"
NEG EAX
SHR EAX,31
"

I think I already came up with something like that in the other thread:
"is_positive".

However that solution has the following problem:

error function is bugged at index: -2147483648

Besides from that one index it works.

Though having a 100% solution would be best ! ;)

Bye,
Skybuck.


Skybuck Flying

unread,
Oct 1, 2015, 5:46:52 PM10/1/15
to
"Skybuck Flying" wrote in message
news:d127c$560da903$d47876e2$32...@news.ziggo.nl...

"
NEG EAX
SHR EAX,31
"

I think I already came up with something like that in the other thread:
"is_positive".

Correction:

It was my other thread called:

"Re: How to write this sign function in Delphi."

Bye,
Skybuck.

Bernhard Schornak

unread,
Oct 2, 2015, 1:49:16 PM10/2/15
to
Skybuck Flying wrote:


> "
> NEG EAX
> SHR EAX,31
> "
>
> I think I already came up with something like that in the other thread: "is_positive".


Not in this thread.

(I generally ignore threads where posters answer their own
questions.)


> However that solution has the following problem:
>
> error function is bugged at index: -2147483648
>
> Besides from that one index it works.
>
> Though having a 100% solution would be best ! ;)


You have to pay 3 instructions to add this 'feature':

NEG EAX
XOR ECX, ECX
CMP EAX, 0x80000000
CMOVE EAX, ECX
SHR EAX, 31

Quite a lot to get a "100% solution"...

M Philbrook

unread,
Oct 2, 2015, 5:50:25 PM10/2/15
to
In article <mumfv2$43n$2...@dont-email.me>, scho...@web.de says...
AND EAX, EAX; ; assume value is in EAX and generate a flag.
Mov EAX, 00; ; Now zero it, should not harm flags.
JZ @done; ; Now use Z flag to finalize it.
Inc EAX, 1; ;force 1 for true if not Z flag.
@Done;

That should yeld "1" for any value not 0
and if you want to use the ZERO flag as a True/False results
you can. So you kill 2 birds with one stone and it's less code.

Unless I missed something, that should work, at least for the intel
processors.
The old 6502 series would be a different story, loading A, the
accumulator woul also set the flags, but that was just make the code
shorter :)

Jamie

Rosario19

unread,
Oct 2, 2015, 6:19:41 PM10/2/15
to
On Sat, 19 Sep 2015 18:07:46 +0200, "Skybuck Flying" wrote:
>Hello,
>
>Is there a more efficient routine for the following:
>
>procedure Constrain( var Para : integer );
>begin
>if Para < 0 then Para := 0;
>if Para > 1 then Para := 1;
>end;
>
>The idea here is to constrain an integer to 0 if it's negative or zero, or 1
>if it's positive.


procedure Constrain( var Para : integer );
begin
if Para < 0 then Para := 0;
if Para > 1 then Para := 1;
end;

efficence in what in above? more than above? for me it is enough
efficient that

>I tried (Para and 1) but that leads to bugs since values can be positive but
>not have bit 0 set.
>
>So something else might be needed ! ;)
>
>Maybe a special instruction in the future if it doesn't exist yet ?
>
>Bye,
> Skybuck.
>
>

Bernhard Schornak

unread,
Oct 3, 2015, 9:52:49 AM10/3/15
to
M Philbrook wrote:


> In article <mumfv2$43n$2...@dont-email.me>, scho...@web.de says...
>>
>> Skybuck Flying wrote:
>>
>>> "
>>> NEG EAX
>>> SHR EAX,31
>>> "
>>>
>>> I think I already came up with something like that in the other thread: "is_positive".
>>
>> Not in this thread.
>>
>> (I generally ignore threads where posters answer their own
>> questions.)
>>
>>
>>> However that solution has the following problem:
>>>
>>> error function is bugged at index: -2147483648
>>>
>>> Besides from that one index it works.
>>>
>>> Though having a 100% solution would be best ! ;)
>>
>>
>> You have to pay 3 instructions to add this 'feature':
>>
>> NEG EAX
>> XOR ECX, ECX
>> CMP EAX, 0x80000000
>> CMOVE EAX, ECX
>> SHR EAX, 31
>>
>> Quite a lot to get a "100% solution"...
>
>
> AND EAX, EAX; ; assume value is in EAX and generate a flag.
> Mov EAX, 00; ; Now zero it, should not harm flags.
> JZ @done; ; Now use Z flag to finalize it.
> Inc EAX, 1; ;force 1 for true if not Z flag.
> @Done;
>
> That should yeld "1" for any value not 0
> and if you want to use the ZERO flag as a True/False results
> you can. So you kill 2 birds with one stone and it's less code.


First : There is a(n avoidable) branch in your code... ;)

Second: The result should be zero if the input is zero or
negative - only positive numbers should return 1.

This one would do, but treats zero as positive number:

BTC EAX, 31
SHR EAX, 31

Also, a solution with MOVMSKPS was possible, but might be
slower than the integer version as long as zero has to be
treated as negative number.

M Philbrook

unread,
Oct 3, 2015, 2:41:03 PM10/3/15
to
In article <muomfm$1o4$2...@dont-email.me>, scho...@web.de says...
> >>
> >> NEG EAX
> >> XOR ECX, ECX
> >> CMP EAX, 0x80000000
> >> CMOVE EAX, ECX
> >> SHR EAX, 31
> >>
> >> Quite a lot to get a "100% solution"...
> >
> >
> > AND EAX, EAX; ; assume value is in EAX and generate a flag.
> > Mov EAX, 00; ; Now zero it, should not harm flags.
> > JZ @done; ; Now use Z flag to finalize it.
> > Inc EAX, 1; ;force 1 for true if not Z flag.
> > @Done;
> >
> > That should yeld "1" for any value not 0
> > and if you want to use the ZERO flag as a True/False results
> > you can. So you kill 2 birds with one stone and it's less code.
>
>
> First : There is a(n avoidable) branch in your code... ;)
Why? just a simple jump for the CPU.

>
> Second: The result should be zero if the input is zero or
> negative - only positive numbers should return 1.
>
It is zero if the input is 0 or -. ?
The original question was to make a logic 1 or 0 when
ever the value is 0 for logic 0 and 1 for any other
number and that included - numbers.

Must we not forget that (-trues) 16 or more bit land exist with
widows programming. and if you wanted to truely treat (-) as logic
0, which they are not, you can simply use a different flag, the rest
of the code is the same.

Getting back to the original coding that was being run.

When "AND EAX, EAX", the contents remain but flags are set.
(Z) flag gets set when there is any bits set, including the
sign.

MOV operation does not effect the flag reg, last time I knew,
so the flag status is still set.

And like before, the JZ is a simple jump? I don't see much CPU time
there. In fact I invite it.

But if it's not zero value, then the EAX gets incremented once, I
suppose one could move"1" into it, which ever is less costly.
I did make a mistake on the inc, it should of been INC EAX and you
didn't catch that?

But in the end, you have EAX reg with 0 or 1 and you also
have the Z flag set.

I just ran that code and did a usage check, your solution cost more..

WHat can I say, accept that I think I'll use my method if ever needed.
Thanks for the input and I'll drop this because all I see is wasted
code examples, ever kill of ASM code use for simple task..

Jamie

Bernhard Schornak

unread,
Oct 4, 2015, 7:26:16 AM10/4/15
to
M Philbrook wrote:


> In article <muomfm$1o4$2...@dont-email.me>, scho...@web.de says...
>>>>
>>>> NEG EAX
>>>> XOR ECX, ECX
>>>> CMP EAX, 0x80000000
>>>> CMOVE EAX, ECX
>>>> SHR EAX, 31
>>>>
>>>> Quite a lot to get a "100% solution"...
>>>
>>>
>>> AND EAX, EAX; ; assume value is in EAX and generate a flag.
>>> Mov EAX, 00; ; Now zero it, should not harm flags.
>>> JZ @done; ; Now use Z flag to finalize it.
>>> Inc EAX, 1; ;force 1 for true if not Z flag.
>>> @Done;
>>>
>>> That should yeld "1" for any value not 0
>>> and if you want to use the ZERO flag as a True/False results
>>> you can. So you kill 2 birds with one stone and it's less code.
>>
>>
>> First : There is a(n avoidable) branch in your code... ;)
> Why? just a simple jump for the CPU.


AMD optimisation guide (PDF #47414):


2.6 Branch-Prediction

To predict and accelerate branches, AMD Family 15h processors employ a
combination of next-address logic, a 2-level branch target buffer (BTB)
for branch identification and direct target prediction, a return address
stack used for predicting return addresses, an indirect target predictor
for predicting indirect jump and call addresses, a hybrid branch predictor
for predicting conditional branch directions, and a fetch window tracking
structure (BSR). Predicted-taken branches incur a 1-cycle bubble in the
branch prediction pipeline when they are predicted by the L1 BTB, and a
4-cycle bubble in the case where they are predicted by the L2 BTB. The
minimum branch misprediction penalty is 20 cycles in the case of
conditional and indirect branches and 15 cycles for unconditional direct
branches and returns.


iNTEL 64 and IA-32 Architectures Optimization Reference Manual
(PDF 248966-030):

3.4.1.1 Eliminating Branches

Eliminating branches improves performance because:

• It reduces the possibility of mispredictions.
• It reduces the number of required branch target buffer (BTB) entries.
Conditional branches, which are never taken, do not consume BTB
resources.

There are four principal ways of eliminating branches:

• Arrange code to make basic blocks contiguous.
• Unroll loops, as discussed in Section 3.4.1.7, “Loop Unrolling.”
• Use the CMOV instruction.
• Use the SETCC instruction.

The following rules apply to branch elimination:

Assembly/Compiler Coding Rule 1. (MH impact, M generality) Arrange code
to make basic blocks contiguous and eliminate unnecessary branches.

Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the SETCC
and CMOV instructions to eliminate unpredictable conditional branches
where possible. Do not do this for predictable branches. Do not use these
instructions to eliminate all unpredictable conditional branches (because
using these instructions will incur execution overhead due to the
requirement for executing both paths of a conditional branch). In addition,
converting a conditional branch to SETCC or CMOV trades off control flow
dependence for data dependence and restricts the capability of the
out-of-order engine. When tuning, note that all Intel 64 and IA-32
processors usually have very high branch prediction rates. Consistently
mispredicted branches are generally rare. Use these instructions only if
the increase in computation time is less than the expected cost of a
mispredicted branch.


>> Second: The result should be zero if the input is zero or
>> negative - only positive numbers should return 1.
>>
> It is zero if the input is 0 or -. ?
> The original question was to make a logic 1 or 0 when
> ever the value is 0 for logic 0 and 1 for any other
> number and that included - numbers.


The top level post in this thread asked this question:

>>>> Hello,
>>>>
>>>> Is there a more efficient routine for the following:
>>>>
>>>> procedure Constrain( var Para : integer );
>>>> begin
>>>> if Para < 0 then Para := 0;
>>>> if Para > 1 then Para := 1;
>>>> end;
>>>>
>>>> The idea here is to constrain an integer to 0 if it's negative or zero, or 1 if it's positive.

What you replied is a *very* fancyful interpretation of the
original task... ;)


> Must we not forget that (-trues) 16 or more bit land exist with
> widows programming. and if you wanted to truely treat (-) as logic
> 0, which they are not, you can simply use a different flag, the rest
> of the code is the same.
>
> Getting back to the original coding that was being run.
>
> When "AND EAX, EAX", the contents remain but flags are set.
> (Z) flag gets set when there is any bits set, including the
> sign.


"TEST EAX, EAX" leaves the content of EAX untouched and the
flags are set, as well.


> MOV operation does not effect the flag reg, last time I knew,
> so the flag status is still set.


Yes.


> And like before, the JZ is a simple jump? I don't see much CPU time
> there. In fact I invite it.


The usual branch prediction logic is:

1. Predict branch was not taken (1st)
2. Predict branch was taken (2nd)
3. Use prediction table for following predictions (Nth)

(3) only applies as long as that branch label is stored in
the branch prediction buffer. If the buffer was overwritten
with later branch targets, begin with (1), again. Guess how
often this happens in a multithreaded environment.

With your code you will trigger a penalty of 20 clocks each
time the function is called. After two subsequent calls, it
should be executed in three clocks, but still does not sort
out negative numbers.

My code needs 4 clocks to execute, but works as demanded by
the initial post.
0 new messages