M Philbrook wrote:
> In article <muomfm$1o4$2...@dont-email.me>, scho...@web.de says...
>>>>
>>>> NEG EAX
>>>> XOR ECX, ECX
>>>> CMP EAX, 0x80000000
>>>> CMOVE EAX, ECX
>>>> SHR EAX, 31
>>>>
>>>> Quite a lot to get a "100% solution"...
>>>
>>>
>>> AND EAX, EAX   ; assume value is in EAX and generate a flag.
>>> MOV EAX, 0     ; Now zero it, should not harm flags.
>>> JZ @Done       ; Now use Z flag to finalize it.
>>> INC EAX        ; force 1 for true if not Z flag.
>>> @Done:
>>>
>>> That should yield "1" for any value not 0
>>> and if you want to use the ZERO flag as a True/False results
>>> you can. So you kill 2 birds with one stone and it's less code.
>>
>>
>> First : There is a(n avoidable) branch in your code... ;)
> Why? just a simple jump for the CPU.
AMD optimisation guide (PDF #47414):
2.6 Branch-Prediction
To predict and accelerate branches, AMD Family 15h processors employ a
combination of next-address logic, a 2-level branch target buffer (BTB)
for branch identification and direct target prediction, a return address
stack used for predicting return addresses, an indirect target predictor
for predicting indirect jump and call addresses, a hybrid branch predictor
for predicting conditional branch directions, and a fetch window tracking
structure (BSR). Predicted-taken branches incur a 1-cycle bubble in the
branch prediction pipeline when they are predicted by the L1 BTB, and a
4-cycle bubble in the case where they are predicted by the L2 BTB. The
minimum branch misprediction penalty is 20 cycles in the case of
conditional and indirect branches and 15 cycles for unconditional direct
branches and returns.
Intel 64 and IA-32 Architectures Optimization Reference Manual
(PDF 248966-030):
3.4.1.1 Eliminating Branches
Eliminating branches improves performance because:
• It reduces the possibility of mispredictions.
• It reduces the number of required branch target buffer (BTB) entries.
Conditional branches, which are never taken, do not consume BTB
resources.
There are four principal ways of eliminating branches:
• Arrange code to make basic blocks contiguous.
• Unroll loops, as discussed in Section 3.4.1.7, “Loop Unrolling.”
• Use the CMOV instruction.
• Use the SETCC instruction.
The following rules apply to branch elimination:
Assembly/Compiler Coding Rule 1. (MH impact, M generality) Arrange code
to make basic blocks contiguous and eliminate unnecessary branches.
Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the SETCC
and CMOV instructions to eliminate unpredictable conditional branches
where possible. Do not do this for predictable branches. Do not use these
instructions to eliminate all unpredictable conditional branches (because
using these instructions will incur execution overhead due to the
requirement for executing both paths of a conditional branch). In addition,
converting a conditional branch to SETCC or CMOV trades off control flow
dependence for data dependence and restricts the capability of the
out-of-order engine. When tuning, note that all Intel 64 and IA-32
processors usually have very high branch prediction rates. Consistently
mispredicted branches are generally rare. Use these instructions only if
the increase in computation time is less than the expected cost of a
mispredicted branch.
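As a rough illustration of what the Intel rule means by trading a control
flow dependence for a data dependence, here is a minimal C sketch. The
function names are made up for this example, and whether a compiler really
emits a jump for the first form or SETcc/CMOV for the second depends on the
compiler and its flags, so treat it as a model and not as a guarantee.

```c
#include <stdint.h>

/* Branchy form: a compiler may turn the "if" into a conditional jump. */
int32_t select_max_branchy(int32_t a, int32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Branch-free form: the comparison yields 0 or 1 (what SETcc produces),
   which is widened to an all-zeros/all-ones mask and used to blend both
   inputs. Both inputs are always evaluated, as with CMOV.
   Assumes the usual two's-complement conversion back to int32_t. */
int32_t select_max_branchless(int32_t a, int32_t b)
{
    uint32_t mask = 0u - (uint32_t)(a > b);  /* 0xFFFFFFFF if a > b, else 0 */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

The second form always pays for the mask arithmetic, which is the "execution
overhead" the manual warns about when the branch would have been predictable.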
>> Second: The result should be zero if the input is zero or
>> negative - only positive numbers should return 1.
>>
> It is zero if the input is 0 or negative?
> The original question was to produce a logic 0 whenever
> the value is 0 and a logic 1 for any other number, and
> that included negative numbers.
The top-level post in this thread asked this question:
>>>> Hello,
>>>>
>>>> Is there a more efficient routine for the following:
>>>>
>>>> procedure Constrain( var Para : integer );
>>>> begin
>>>> if Para < 0 then Para := 0;
>>>> if Para > 1 then Para := 1;
>>>> end;
>>>>
>>>> The idea here is to constrain an integer to 0 if it's negative or zero, or 1 if it's positive.
What you replied is a *very* fanciful interpretation of the
original task... ;)
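To pin down what the procedure actually asks for, here is a small C sketch
of it next to a comparison-only version. The function names are mine and
the parameter is passed by value for brevity (the Pascal original uses a
var parameter); a compiler will commonly, though not necessarily, implement
the second form with SETG and no jump.

```c
#include <stdint.h>

/* Direct translation of the Pascal procedure from the original post. */
int32_t constrain_ref(int32_t para)
{
    if (para < 0) para = 0;
    if (para > 1) para = 1;
    return para;
}

/* Same mapping in one step: a C comparison already evaluates to 0 or 1,
   so "positive -> 1, zero or negative -> 0" is simply (para > 0). */
int32_t constrain_cmp(int32_t para)
{
    return para > 0;
}
```

Both return 0 for every negative input, which is the part a plain
"non-zero gives 1" test does not reproduce.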
> We must not forget that negative trues (-1) exist in 16-or-more-bit land
> with Windows programming, and if you wanted to truly treat negative values
> as logic 0, which they are not, you can simply use a different flag; the
> rest of the code is the same.
>
> Getting back to the original coding that was being run.
>
> After "AND EAX, EAX", the contents remain but the flags are set.
> The Z flag is cleared when any bits are set, including the
> sign.
"TEST EAX, EAX" leaves the content of EAX untouched and the
flags are set, as well.
> The MOV operation does not affect the flags register, last time I knew,
> so the flag status is still set.
Yes.
> And like before, the JZ is a simple jump? I don't see much CPU time
> there. In fact I invite it.
The usual branch prediction logic is:
1. Predict branch was not taken (1st)
2. Predict branch was taken (2nd)
3. Use prediction table for following predictions (Nth)
(3) only applies as long as that branch label is stored in
the branch prediction buffer. If the buffer was overwritten
with later branch targets, begin with (1), again. Guess how
often this happens in a multithreaded environment.
With your code you trigger a misprediction penalty of 20 clocks whenever
the function is called without a usable prediction. After two consecutive
calls, it should execute in three clocks, but it still does not sort
out negative numbers.
My code needs 4 clocks to execute, but works as demanded by
the initial post.
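For anyone who wants to check the two sequences without an assembler, here
is a C model of both, one statement per instruction. It is only a sketch:
the function names are invented, the "if" statements stand in for CMOVE and
JZ, and it says nothing about clock counts, only about the results.

```c
#include <stdint.h>

/* Model of NEG / XOR / CMP / CMOVE / SHR on a 32-bit register. */
uint32_t constrain_neg_shr(int32_t value)
{
    uint32_t eax = (uint32_t)value;
    uint32_t ecx;

    eax = 0u - eax;              /* NEG EAX: positive inputs now have bit 31 set */
    ecx = 0;                     /* XOR ECX, ECX */
    if (eax == 0x80000000u)      /* CMP EAX, 0x80000000: only the most negative input */
        eax = ecx;               /* CMOVE EAX, ECX: force that case to 0 */
    eax >>= 31;                  /* SHR EAX, 31: keep only the former sign bit */
    return eax;
}

/* Model of AND / MOV / JZ / INC: yields 1 for any non-zero input. */
uint32_t nonzero_flag(int32_t value)
{
    uint32_t eax = (uint32_t)value;
    int zf = ((eax & eax) == 0); /* AND EAX, EAX sets ZF when the result is zero */

    eax = 0;                     /* MOV EAX, 0 leaves the flags alone */
    if (!zf)                     /* JZ @Done skips the increment when ZF is set */
        eax += 1;                /* INC EAX */
    return eax;
}
```

The CMP/CMOVE pair exists only for the input 0x80000000, whose negation is
itself; without it that one negative value would come out as 1.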