Help with gcc inline floating point asm

Erik de Castro Lopo

Aug 20, 2001, 8:54:08 AM

Hi all,

Casting from float to int

ie float x = 12.3456 ;
int y = (int) x ;

is very slow on x86. This can apparently be sped up considerably using
something like this (Windows VC++) code:

int
float_to_int (float flt)
{   int i;
    static const double half = 0.5f;

    _asm
    {
        fld flt
        fsub half
        fistp i
    }
    return i;
}

I'm trying to convert this to inline asm for compiling with gcc. I
found this piece of documentation :

http://www.castle.net/~avly/djasm.html

and using that I came up with the following. This code compiles
without errors and doesn't crash when I run it. Unfortunately
it doesn't produce the required results either.

static int
asm_float2int (float flt)
{ static const double half = 0.5f;
int out ;

__asm__ __volatile__ ("\n\
flds %1 \n\
fsubl %2 \n\
fistps %0 \n"
: "=g" (out) /* Output */
: "g" (flt), "g" (half) /* Inputs */
) ;

return out ;
} /* asm_float2int */

The assembler output of gcc looks like this, which seems to
be OK.

.section .rodata
.align 8
.type half.12,@object
.size half.12,8
half.12:
.long 0x0,0x3fe00000
.text
.align 4
.type asm_float2int,@function
asm_float2int:
subl $24,%esp
pushl %ebp
#APP

flds 32(%esp)
fsubl half.12
fistps 12(%esp)

#NO_APP
movl 12(%esp),%edx
movl %edx,%eax
jmp .L9
.p2align 4,,7
.L9:
popl %ebp
addl $24,%esp
ret

Anybody have any idea what is going wrong and how I might fix it?

Thanks in advance,
Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo nos...@mega-nerd.com (Yes its valid)
-----------------------------------------------------------------
#!/bin/sh
unzip ; strip; touch ; finger ; mount ; gasp ;
yes ; more ; umount ; sleep ;

Phil Carmody

Aug 20, 2001, 10:25:31 AM

I'll look at what you've written again later today, with a view to
actually answering the question asked! However, in the meantime I'll
suggest an alternative method.

Can you not add a magic value which causes fractional values to drop off
the mantissa (and be rounded appropriately), then subtract the magic
value (to get back what you started with in a rounded form), then store
that as a floating point value, then read the lowest 32 bits of where
the mantissa would be placed as an integer value? There may be endian
issues, and there will also be issues of choosing the magic value
depending on your rounding mode, and whether you will be using negative
values or not.
I believe I asked how to do this on comp.lang.c.moderated a few months
back, so a dejagoogle search including my name, that newsgroup and the
search words 'rounding', 'portable', and maybe 'IEEE' may help. I can't
google due to firewall/proxy problems today.
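
A minimal sketch of the magic-number idea described above, under assumptions the post does not spell out: IEEE 754 doubles, the default round-to-nearest mode, arithmetic carried out in plain double precision, and inputs whose magnitude is below 2^31. The name magic_round and the constant 1.5 * 2^52 are illustrative choices, not taken from the thread. Copying the bits out with memcpy sidesteps the endian question as long as doubles and 64-bit integers share a byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a double to the nearest int32 without a cast.
   Assumes IEEE 754 doubles and round-to-nearest (ties to even). */
static int32_t magic_round(double x)
{
    /* 1.5 * 2^52: after the addition the unit in the last place is 1.0,
       so the FPU itself rounds away the fractional part of x. */
    const double magic = 6755399441055744.0;
    double sum = x + magic;
    uint64_t bits;

    memcpy(&bits, &sum, sizeof bits);   /* well-defined way to read the bits */

    /* The low 32 bits of the mantissa now hold the rounded value in
       two's complement form. */
    return (int32_t)(uint32_t)bits;
}
```

Under those assumptions magic_round(12.3456) gives 12 and magic_round(-2.6) gives -3. Ties go to the even neighbour, so 2.5 becomes 2 and 3.5 becomes 4.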

Phil

Joe Leherbauer

Aug 20, 2001, 11:01:48 AM

Erik de Castro Lopo <nos...@mega-nerd.net> wrote:
>
> static int
> asm_float2int (float flt)
> { static const double half = 0.5f;
> int out ;
>
> __asm__ __volatile__ ("\n\
> flds %1 \n\
> fsubl %2 \n\
> fistps %0 \n"
> : "=g" (out) /* Output */
> : "g" (flt), "g" (half) /* Inputs */
> ) ;
>
> return out ;
> } /* asm_float2int */
>

> [SNIP]

>
> Anybody have any idea what is going wrong and how I might fix it?

The operand constraint "g" can only be used for *integer* operands.
Check the gcc info doc for how to deal with x86 floating-point operands.

I haven't tested this, but it probably works:

static int
asm_float2int (float f)
{
int i;

__asm__ ("fistpl %0"
: "=m" (i) /* memory */
: "t" (f - 0.5) /* top of FPU stack = %st(0) */
: "st" /* %st = %st(0) is clobbered, i.e. popped by asm */
);

return i;
}

---
Joe Leherbauer Leherbauer at telering dot at

"Somewhere something incredible is waiting to be known."
-- Isaac Asimov

Phil Frisbie, Jr.

Aug 20, 2001, 12:20:14 PM

One Phil adding to another Phil's message :)

Phil Carmody wrote:
>
> Can you not add a magic value which causes fractional values to drop off
> the mantissa (and be rounded appropriately), then subtract the magic
> value (to get back what you started with in a rounded form), then store
> that as a floating point value, then read the lowest 32 bits of where
> the mantissa would be placed as an integer value? There may be endian
> issues, and there will also be issues of choosing the magic value
> depending on your rounding mode, and whether you will be using negative
> values or not.
> I believe I asked how to do this on comp.lang.c.moderated a few months
> back, so a dejagoogle search including my name, that newsgroup and the
> search words 'rounding', 'portable', and maybe 'IEEE' may help. I can't
> google due to firewall/proxy problems today.

I prefer not to use assembly, and stick with C instead. Here is a copy of my
header file I use to convert floats to long integers. Note that there are four
routines; two that are for any sign float, and two for unsigned floats. And for
each you can choose to round or truncate the result.

/* ftol.h by Phil Frisbie, Jr. ph...@hawksoft.com */
#ifdef __GNUC__
#define INLINE __inline__
#else
#ifdef _MSC_VER
#define INLINE __inline
#else
#define INLINE
#endif
#endif

#define FP_BITS(fp) (*(int *)&(fp))
#define FP_ABS_BITS(fp) (FP_BITS(fp)&0x7FFFFFFF)
#define FP_SIGN_BIT(fp) (FP_BITS(fp)&0x80000000)
#define FP_ONE_BITS 0x3F800000

#define FIST_FLOAT_MAGIC_S (float)(7.0f * 2097152.0f)
#define FIST_FLOAT_MAGIC_U (float)((float)(6)*(1<<21)) /* ((127+23)<<23)|(0x10<<21) */

INLINE int round(float inval)
{
float tmp = FIST_FLOAT_MAGIC_S + inval;
int res = ((FP_BITS(tmp)<<10)-0x80000000);
return res>>10;
}

INLINE int trunc(float inval) {
float tmp = FIST_FLOAT_MAGIC_S + (inval-0.4997f);
int res = ((FP_BITS(tmp)<<10)-0x80000000);
return res>>10;
}

INLINE int uround(float inval) {
float tmp = FIST_FLOAT_MAGIC_U + inval;
return FP_BITS(tmp)&0x003fffff;
}

INLINE int utrunc(float inval) {
float tmp = FIST_FLOAT_MAGIC_U + (inval-0.4997f);
return FP_BITS(tmp)&0x003fffff;
}
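
The FP_BITS macro reads a float through an int pointer, which compilers that enforce strict aliasing are free to treat as undefined. A hedged sketch of the signed rounding routine with the same magic constant, copying the bits with memcpy and doing the arithmetic on unsigned values. The name round_bits is made up, and the sketch assumes 32-bit IEEE 754 floats, round-to-nearest, and inputs of magnitude below 2^21:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical aliasing-safe variant of the signed round() above. */
static int32_t round_bits(float inval)
{
    float tmp = 14680064.0f + inval;   /* 7.0f * 2097152.0f = 1.75 * 2^23 */
    uint32_t bits;

    memcpy(&bits, &tmp, sizeof bits);

    /* Shift the exponent out of the top, then subtract the one bit of the
       magic constant's mantissa that survives the shift. What remains is
       the rounded integer held in the upper 22 bits. */
    bits = (bits << 10) - 0x80000000u;

    /* Shifting back down as a signed value restores the sign. This relies
       on an arithmetic right shift, which gcc and MSVC provide but the
       C standard leaves to the implementation. */
    return (int32_t)bits >> 10;
}
```

With these assumptions round_bits(2.4f) gives 2 and round_bits(-2.6f) gives -3.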

Phil Frisbie, Jr.
Lead Developer, Hawk Software
http://www.hawksoft.com

Kaz Kylheku

Aug 20, 2001, 12:26:28 PM

In article <3B810865...@mega-nerd.net>, Erik de Castro Lopo wrote:
>Hi all,
>
>Casting from float to int
>
> ie float x = 12.3456 ;
> int y = (int) x ;

This doesn't require a cast operator.

>is very slow on x86. This can apparently be sped up considerably using
>something like this (Windows VC++) code:

Strange. Why don't the compiler writers know something that you know?

>int
>float_to_int (float flt)
>{ int i;
> static const double half = 0.5f;

Subtracting 0.5 changes the semantics, so it's not an optimization!

>and using that I came up with the following. This code compiles
>without errors and doesn't crash when I run it. Unfortunately
>it doesn't produce the required results either.
>
>static int
>asm_float2int (float flt)
>{ static const double half = 0.5f;
> int out ;
>
> __asm__ __volatile__ ("\n\
> flds %1 \n\
> fsubl %2 \n\
> fistps %0 \n"
> : "=g" (out) /* Output */
> : "g" (flt), "g" (half) /* Inputs */
> ) ;
>
> return out ;
>} /* asm_float2int */

Try

static int
float2int (float f)
{ return f + 0.5f; }

If this emits a machine instruction sequence that you think is too
inefficient, it makes more sense to do a little work on GCC than to
litter application code with inline assembly language. Write to the
GCC mailing list or learn how to do the work yourself. If
this operation can be sped up, other people could benefit.
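
For reference, the expression `f + 0.5f` converted to int rounds to nearest only for non-negative inputs, because the conversion itself truncates toward zero. A small sketch in plain C of the difference; both helper names are hypothetical, and round_half_away is just one possible way to treat negative values:

```c
/* The suggestion from the post: add one half, then let the
   conversion to int truncate toward zero. */
static int add_half_then_truncate(float f)
{
    return (int)(f + 0.5f);
}

/* Hypothetical variant that rounds halves away from zero for
   either sign. */
static int round_half_away(float f)
{
    return (int)(f < 0.0f ? f - 0.5f : f + 0.5f);
}
```

For f = -2.7f the first function gives -2 while the second gives -3; for f = 2.7f both give 3.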

Zeljko Vrba

Aug 20, 2001, 1:05:38 PM

On Mon, 20 Aug 2001 16:20:14 GMT, Phil Frisbie, Jr. <ph...@hawksoft.com> wrote:
>
> I prefer not to use assembly, and stick with C instead. Here is a copy of my
> header file I use to convert floats to long integers. Note that there are four
> routines; two that are for any sign float, and two for unsigned floats. And for
> each you can choose to round or truncate the result.
>

Wow, a totally new concept: unsigned float !
Probably you meant positive/negative, or..?

Erik de Castro Lopo

Aug 20, 2001, 4:43:32 PM

Kaz Kylheku wrote:
>
> In article <3B810865...@mega-nerd.net>, Erik de Castro Lopo wrote:
> >Hi all,
> >
> >Casting from float to int
> >
> > ie float x = 12.3456 ;
> > int y = (int) x ;
>
> This doesn't require a cast operator.

True, but an explicit cast makes it easier to find in the code later.

> >is very slow on x86. This can apparently be sped up considerably using
> >something like this (Windows VC++) code:
>
> Strange. Why don't the compiler writers know something that you know?

Maybe they don't think this is something worth optimising. My application
is audio processing done in floating point with the samples saved to disk
as ints or shorts.

I've also done some benchmarking. This simple hack:

static int
c_hack3_float2int (float flt)
{ double dble ;
int out ;

dble = ((double) flt) + 68719476736.0 * 1.5 ;

out = ((int*) &dble)[0] >> 16 ;

return out ;
} /* c_hack3_float2int */

is nearly 4 times faster than simply casting to int. When doing this on
100000 values, this is a worthwhile optimisation. The assembly hack is supposed
to be significantly faster again!

> >int
> >float_to_int (float flt)
> >{ int i;
> > static const double half = 0.5f;
>
> Subtracting 0.5 changes the semantics, so it's not an optimization!

There are a number of methods of rounding/truncating floats to ints. For
the speedups above, many people would happily change from round to truncate
or back the other way.

That is a long-term goal. For the moment I'd just like to get the above
assembler working so I can actually benchmark this.

Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo nos...@mega-nerd.com (Yes its valid)
-----------------------------------------------------------------

"Don't be fooled by NT/Exchange propaganda. M$ Exchange is
just plain broken and NT cannot handle the sustained load
of a high-volume remote mail server"
-- Eric S. Raymond in the Fetchmail FAQ

Erik de Castro Lopo

Aug 20, 2001, 5:20:32 PM

Joe Leherbauer wrote:
>
> The operand constraint "g" can only be used for *integer* operands.
> Check the gcc info doc for how to deal with x86 floating-point operands.
>
> I haven't tested this, but it probably works:
>
> static int
> asm_float2int (float f)
> {
> int i;
>
> __asm__ ("fistpl %0"
> : "=m" (i) /* memory */
> : "t" (f - 0.5) /* top of FPU stack = %st(0) */
> : "st" /* %st = %st(0) is clobbered, i.e. popped by asm */
> );
>
> return i;
> }

Thank you Joe!

This works and is correct. My first preliminary benchmark also suggests
that it is 8 times faster than a standard cast.

I'm now going to do some thorough benchmarking and write a document comparing
the various methods I have found to convert floats to ints.

Thanks to all.

Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo nos...@mega-nerd.com (Yes its valid)
-----------------------------------------------------------------

"I would rather spend 10 hours reading someone else's source
code than 10 minutes listening to Musak waiting for technical
support which isn't."
- Dr. Greg Wettstein, Roger Maris Cancer Center

Phil Frisbie, Jr.

Aug 20, 2001, 5:38:37 PM

Yes, that was worded poorly... The second set of routines ignores the sign bit
and always return a positive integer, just the thing for many applications, and
they are faster.

Robert Redelmeier

Aug 20, 2001, 10:33:52 PM

Erik de Castro Lopo wrote:
>

> Joe Leherbauer wrote:
> > __asm__ ("fistpl %0"

>
> Thank you Joe!
>
> This works and is correct. My first preliminary benchmark
> also suggests that it is 8 times faster than a standard cast.

This probably will be correct, but be careful:
`fist` uses the rounding bits of the FP control
word. AFAIK, `c` normally sets these to `chop`,
but you may not always have/want this behaviour.
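
Which rule `fistp` follows in a given program can be checked by feeding it ties. A minimal sketch, assuming GCC-style inline asm on an x86 target; fist_round is a made-up name wrapping the asm statement quoted above:

```c
/* Hypothetical wrapper around the fistpl statement from this thread.
   x86 with GCC-style inline asm only. */
static int fist_round(float f)
{
    int i;

    __asm__ ("fistpl %0"
             : "=m" (i)   /* 32-bit integer result, stored to memory */
             : "t" (f)    /* input value in %st(0) */
             : "st");     /* fistpl pops %st(0) */
    return i;
}
```

With the x87 control word still at its power-on default (round to nearest, ties to even), fist_round(2.5f) gives 2 and fist_round(3.5f) gives 4. If the rounding field had been set to chop, the results would be 2 and 3.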

-- Robert

Joe Leherbauer

Aug 21, 2001, 3:31:18 AM

Erik de Castro Lopo <nos...@mega-nerd.net> wrote:

> Joe Leherbauer wrote:
> >
> > The operand constraint "g" can only be used for *integer* operands.
> > Check the gcc info doc for how to deal with x86 floating-point operands.
> >
> > I haven't tested this, but it probably works:
> >
> > static int
> > asm_float2int (float f)
> > {
> > int i;
> >
> > __asm__ ("fistpl %0"
> > : "=m" (i) /* memory */
> > : "t" (f - 0.5) /* top of FPU stack = %st(0) */
> > : "st" /* %st = %st(0) is clobbered, i.e. popped by asm */
> > );
> >
> > return i;
> > }
>
> Thank you Joe!
>
> This works and is correct. My first preliminary benchmark also suggests
> that it is 8 times faster than a standard cast.

We can do even a little better by avoiding "- 0.5".
At program startup set the FPU rounding mode to "truncate" like this:

/* Rounding Control */
#define FPU_RC_TRUNCATE ((1 << 11) | (1 << 10))

unsigned short fpu_cw;

/* get control word */
__asm__ __volatile__ ("fstcw %0" : "=m" (fpu_cw));

/* set rounding mode (round towards zero) */
fpu_cw |= FPU_RC_TRUNCATE;

/* set control word */
__asm__ __volatile__ ("fldcw %0" : : "m" (fpu_cw));
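
The `|=` above is enough for truncation because that mode sets both rounding-control bits. A hedged sketch of a more general helper that clears the two-bit field first, so that any of the four modes can be selected. The names are hypothetical; the bit layout (bits 10 and 11 of the x87 control word) is the one the macro above already relies on:

```c
/* Rounding-control field of the x87 control word: bits 10 and 11. */
#define FPU_RC_MASK     0x0C00u
#define FPU_RC_NEAREST  0x0000u   /* round to nearest (ties to even) */
#define FPU_RC_DOWN     0x0400u   /* round toward minus infinity */
#define FPU_RC_UP       0x0800u   /* round toward plus infinity */
#define FPU_RC_CHOP     0x0C00u   /* round toward zero (truncate) */

/* Hypothetical helper: return cw with its rounding field replaced by mode.
   Pure bit arithmetic; loading the result still needs fldcw as above. */
static unsigned short fpu_cw_with_rounding(unsigned short cw, unsigned mode)
{
    return (unsigned short)((cw & ~FPU_RC_MASK) | (mode & FPU_RC_MASK));
}
```

Starting from the usual default control word 0x037F, selecting FPU_RC_CHOP yields 0x0F7F, and selecting FPU_RC_NEAREST on that value gives 0x037F back.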

And I would turn asm_float2int into a macro:

int i;
float f;

#define FLOAT2INT(i,f) __asm__ ("fistpl %0" : "=m" (i) : "t" (f) : "st")

The following variant does not pop the top of the FPU stack (this task is
left to the compiler). It is probably faster in case you reuse `f'
afterwards:

#define FLOAT2INT(i,f) __asm__ ("fistl %0" : "=m" (i) : "t" (f))

Erik de Castro Lopo

Aug 21, 2001, 6:16:19 AM

Joe Leherbauer wrote:
>
> We can do even a little better by avoiding "- 0.5".
> At program startup set the FPU rounding mode to "truncate" like this:

Yep, figured that one out for myself :-).

<snip>

> #define FLOAT2INT(i,f) __asm__ ("fistpl %0" : "=m" (i) : "t" (f) : "st")

This is what I'm currently using. In my benchmarking program, this is 13 times
faster than a regular C cast operation.

Now I want to figure out why the regular cast is so slow!
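
One commonly cited explanation, offered here as an assumption rather than something measured in this thread: C requires the conversion to truncate, so x87 code generators typically bracket each `fistp` with two control-word loads. A rough sketch of such a sequence with GCC-style inline asm on x86; cast_like_sequence is a made-up name, and the exact instructions a given compiler emits may differ:

```c
/* Hypothetical emulation of a truncating float-to-int conversion on the
   x87: save the control word, switch to chop, convert, restore. */
static int cast_like_sequence(float f)
{
    unsigned short saved, chop;
    int i;

    __asm__ __volatile__ ("fnstcw %0" : "=m" (saved));    /* save control word */
    chop = (unsigned short)(saved | 0x0C00);              /* rounding field = truncate */
    __asm__ __volatile__ ("fldcw %0" : : "m" (chop));     /* switch to chop */
    __asm__ __volatile__ ("fistpl %0" : "=m" (i) : "t" (f) : "st");
    __asm__ __volatile__ ("fldcw %0" : : "m" (saved));    /* restore old mode */
    return i;
}
```

The result matches a plain cast, for example 2 for 2.7f and -2 for -2.7f, at the cost of two control-word loads per conversion.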

> The following variant does not pop the top of the FPU stack (this task is
> left to the compiler). It is probably faster in case you reuse `f'
> afterwards:
>
> #define FLOAT2INT(i,f) __asm__ ("fistl %0" : "=m" (i) : "t" (f))

In my application, this is actually slower. I can see that in some situations
it may be faster.

Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo nos...@mega-nerd.com (Yes its valid)
-----------------------------------------------------------------

The early bird gets the worm, but the second mouse gets the cheese.