Fixed Point Rounding

Tricky

May 14, 2010, 10:43:40 AM

I'm using the new fixed/float library, and I'm getting a bit confused by
the resize function.

I'm using the following code:

video_us : out ufixed(7 downto 0)
..
variable weighted_pix0 : ufixed(7 downto -12);
variable weighted_pix1 : ufixed(7 downto -12);
......
video_us <= resize(weighted_pix0 + weighted_pix1, 7, 0);

Now, when the sum of the two weighted pixels is X.5, the rounded output
(round style is defaulting to fixed_round) is always X, not X+1. Is
there a rule I'm missing? Is this not what the resize function is for?
Otherwise, what's the point of the "round_style" and "overflow_style"
function arguments? Do I have to go back to the old way of rounding,
that is, adding 0.5 and then truncating?

Thanks in advance for any help.

KJ

May 14, 2010, 7:36:34 PM

On May 14, 10:43 am, Tricky <trickyh...@gmail.com> wrote:
> Im using the new float fix lib, and Im getting a bit confused with the
> resize function.
>
> Im using the following code:
>
> video_us : out ufixed(7 downto 0)
> ..
> variable weighted_pix0 : ufixed(7 downto -12);
> variable weighted_pix1 : ufixed(7 downto -12);
> ......
> video_us <= resize(weighted_pix0 + weighted_pix1, 7 ,
> 0);
>
> Now, when the sum of the 2 weighted pixels is X.5, the rounded output
> (round style is defaulting to fixed_round) is always X, not X+1. Is
> there a rule Im missing?

What you're missing is that whether X.5 rounds up or down depends on
what X is.

From the user's guide...
"round_style" defaults to fixed_round (true) that turns on the
rounding routines. If false (fixed_truncate), the number is truncated.
Rounding is done by first looking to see if the MSB of the remainder
is a “1”, AND the LSB of the unrounded result is a “1” or the lower
bits of the remainder include a “1”, the result will be rounded. This
is similar to the floating-point “round_nearest” style. The down side
is that ALL of the bits are included in the decision to round
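
As a rough illustration of the rule quoted above, the decision can be written out bit by bit. This is only a sketch: the package name round_sketch_pkg and the function round_nearest_even are hypothetical and are not taken from fixed_pkg, whose internals may differ. It assumes the discarded remainder has at least one bit, and it ignores overflow (an all-ones input wraps when incremented, whereas fixed_pkg applies its overflow_style).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package, for illustration only (not part of fixed_pkg).
package round_sketch_pkg is
  -- kept      : the unrounded result (bits that survive the resize)
  -- remainder : the bits being discarded, leftmost bit is the MSB
  function round_nearest_even (kept      : unsigned;
                               remainder : std_logic_vector) return unsigned;
end package round_sketch_pkg;

package body round_sketch_pkg is
  function round_nearest_even (kept      : unsigned;
                               remainder : std_logic_vector) return unsigned is
    -- Normalise the index range so that the MSB sits at 'high.
    -- Assumption: remainder'length >= 1.
    alias dropped   : std_logic_vector(remainder'length - 1 downto 0) is remainder;
    variable sticky : std_logic := '0';
  begin
    -- OR together every remainder bit below the MSB.
    for i in dropped'high - 1 downto 0 loop
      sticky := sticky or dropped(i);
    end loop;

    -- Round up only when the remainder is at least one half (MSB = '1') AND
    -- either the result is odd (LSB = '1') or the remainder exceeds one half.
    if dropped(dropped'high) = '1' and (kept(kept'right) = '1' or sticky = '1') then
      return kept + 1;  -- wraps on overflow in this sketch
    else
      return kept;
    end if;
  end function round_nearest_even;
end package body round_sketch_pkg;
```

With this rule an exact tie (remainder = "100...0") rounds up only when the kept value is odd, which is why the outcome for X.5 depends on X. The sticky OR over all lower remainder bits is the extra logic the guide calls the down side.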

> do I have to go back to old way
> of rounding that is +0.5 and then truncating?
>

I use the "+0.5 and then truncating" approach because it takes less
logic to implement (hence the 'down side' mentioned in the user's
guide) and my requirements haven't so far required the floating-point
“round_nearest” style

Kevin Jennings

David Bishop

May 14, 2010, 11:04:14 PM

KJ wrote:
> On May 14, 10:43 am, Tricky <trickyh...@gmail.com> wrote:
>> Im using the new float fix lib, and Im getting a bit confused with the
>> resize function.
>>
>> Im using the following code:
>>
>> video_us : out ufixed(7 downto 0)
>> ..
>> variable weighted_pix0 : ufixed(7 downto -12);
>> variable weighted_pix1 : ufixed(7 downto -12);
>> ......
>> video_us <= resize(weighted_pix0 + weighted_pix1, 7 ,
>> 0);
>>
>> Now, when the sum of the 2 weighted pixels is X.5, the rounded output
>> (round style is defaulting to fixed_round) is always X, not X+1. Is
>> there a rule Im missing?
>
> What you're missing is that whether X.5 rounds up or down depends on
> what X is.
>
> From the user's guide...
> "round_style" defaults to fixed_round (true) that turns on the
> rounding routines. If false (fixed_truncate), the number is truncated.
> Rounding is done by first looking to see if the MSB of the remainder
> is a “1”, AND the LSB of the unrounded result is a “1” or the lower
> bits of the remainder include a “1”, the result will be rounded. This
> is similar to the floating-point “round_nearest” style. The down side
> is that ALL of the bits are included in the decision to round

KJ is correct.

4.5 rounds to 4
5.5 rounds to 6

Though carrying 12 fractional bits is a bit of overkill.

>> do I have to go back to old way
>> of rounding that is +0.5 and then truncating?
>>
>
> I use the "+0.5 and then truncating" approach because it takes less
> logic to implement (hence the 'down side' mentioned in the user's
> guide) and my requirements haven't so far required the floating-point
> “round_nearest” style

Which causes data forking.
Take the numbers 1.5 to 8.5 and round by your method and add up the
error. Then do the same for my method (I wrote the fixed point packages).
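
The comparison suggested above can be sketched as a small simulation-only testbench. The entity name rounding_error_tb and the variable names are made up for illustration; the library and package names follow the code posted elsewhere in this thread and may differ in other releases of the packages. Working it out by hand, "+0.5 and truncate" rounds every one of the eight values up, so its error should sum to 8 x 0.5 = 4.0, while a round-to-nearest-even scheme alternates +0.5 and -0.5 and should sum to 0.0, assuming fixed_round behaves as the user's guide describes.

```vhdl
library ieee_proposed;
use ieee_proposed.math_utility_pkg.all;
use ieee_proposed.fixed_pkg.all;

-- Hypothetical testbench: accumulates (rounded - exact) over 1.5, 2.5, ... 8.5
entity rounding_error_tb is
end entity rounding_error_tb;

architecture sim of rounding_error_tb is
begin
  process
    variable x           : ufixed(7 downto -12);
    variable y_round     : ufixed(7 downto 0);
    variable y_half_up   : ufixed(7 downto 0);
    variable err_round   : real := 0.0;
    variable err_half_up : real := 0.0;
  begin
    for i in 1 to 8 loop
      -- i + 0.5 is exactly representable with 12 fractional bits
      x := to_ufixed(real(i) + 0.5, 7, -12);

      -- Method 1: the package's own rounding
      y_round := resize(x, 7, 0, fixed_overflow_style, fixed_round);

      -- Method 2: add 0.5, then truncate
      y_half_up := resize(x + to_ufixed(0.5, -1, -1), 7, 0,
                          fixed_overflow_style, fixed_truncate);

      err_round   := err_round   + (to_real(y_round)   - to_real(x));
      err_half_up := err_half_up + (to_real(y_half_up) - to_real(x));
    end loop;

    report "fixed_round error sum       : " & real'image(err_round);
    report "+0.5 and truncate error sum : " & real'image(err_half_up);
    wait;  -- stop the process; nothing else to simulate
  end process;
end architecture sim;
```

A nonzero sum means the rounding adds a systematic offset to the data, which matters when many rounded values are accumulated and matters much less for a single final output.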

KJ

May 15, 2010, 12:04:51 AM

On May 14, 11:04 pm, David Bishop <dbis...@vhdl.org> wrote:

> KJ wrote:
> > I use the "+0.5 and then truncating" approach because it takes less
> > logic to implement

Should've said "I *have* used +0.5...". I've also used 'fixed_round'.

>
> Which causes data forking.
> Take the numbers 1.5 to 8.5 and round by your method and add up the
> error.

Depending on the application though, this additional error for certain
input combinations might still be acceptable. "+0.5 and truncate" is
generally intermediate between 'fixed_truncate' and 'fixed_round' both
in error and logic resources to implement. Which of the three methods
is 'best' will depend on the accuracy requirements of the particular
application.

"+0.5 and truncate" is a design tradeoff that should be evaluated
versus 'fixed_truncate' and 'fixed_round'...it's just another tool in
the toolbox.

As an example of resource usage, I took Tricky's code (actual code
posted below) and ran it through Quartus 9.0 to produce the following
results:

Rounding method    Logic resources
===============    ===============
fixed_round        44
+.5_and_trunc      39
fixed_truncate     29

Kevin Jennings

--- START OF CODE
library ieee_proposed;
use ieee_proposed.math_utility_pkg.all;
use ieee_proposed.fixed_pkg.all;

entity Resizer_Adder is
  port (
    weighted_pix0 : in  ufixed(7 downto -12);
    weighted_pix1 : in  ufixed(7 downto -12);
    video_us      : out ufixed(7 downto 0));
end Resizer_Adder;

architecture rtl of Resizer_Adder is
begin
  -- Uncomment the line you would like to evaluate

  -- 1) fixed_round: the package's rounding routine
  video_us <= resize(weighted_pix0 + weighted_pix1, 7, 0,
                     fixed_overflow_style, fixed_round);

  -- 2) Add 0.5, then truncate
  -- video_us <= resize(weighted_pix0 + weighted_pix1 + to_ufixed(0.5, -1, -1), 7, 0,
  --                    fixed_overflow_style, fixed_truncate);

  -- 3) fixed_truncate: drop the fractional bits
  -- video_us <= resize(weighted_pix0 + weighted_pix1, 7, 0,
  --                    fixed_overflow_style, fixed_truncate);
end rtl;
--- END OF CODE

Tricky

May 17, 2010, 3:47:37 AM

>
> KJ is correct.
>
> 4.5 rounds to 4
> 5.5 rounds to 6
>
> Though carrying 12 bits of decimal is a bit overkill.
>
> >> do I have to go back to old way
> >> of rounding that is +0.5 and then truncating?
>
> > I use the "+0.5 and then truncating" approach because it takes less
> > logic to implement (hence the 'down side' mentioned in the user's
> > guide) and my requirements haven't so far required the floating-point
> > “round_nearest” style
>
> Which causes data forking.
> Take the numbers 1.5 to 8.5 and round by your method and add up the
> error. Then do the same for my method (I wrote the fixed point packages).

For me, data forking (if I understand correctly, is that like
compounding errors?) shouldn't be an issue because this is the final
output. The 12 bits are carried only because I have a previous divide
(by a constant 2^n, with n as a generic) with input data carrying only
4 fractional bits. 12 bits covers the worst possible case of n, and I
expect the synthesiser to clear up anything that's overkill. And
actually, for my output, I require X.5 to always round to X+1, so
there is no error.