
-ftree-vectorize in C


Terran Melconian

Jan 18, 2008, 3:46:50 PM
Good afternoon! I have been experiencing some difficulties with the
-ftree-vectorize optimization in the C front end: it does not produce
vectorized code in an example where I would have expected it to.

The version of gcc in use for all languages is 4.1.2, with some patches
from the OS distributor (Debian) whose details I do not know.

I shall begin with an example that works as I expect it to, in Fortran:

program vectorized
  real a(1024)

  do i=1,1024
    a(i)=i
  end do

  do l=1,100000
    a=a*1.00001
  end do

  do i=1,1024,64
    print *,a(i)
  end do

end program

I compile with these options:

gfortran -march=athlon64 -c -S -ftree-vectorize -O3

As expected, adding -ftree-vectorize causes "mulps" instructions to be
generated and cuts the runtime by a factor of 3 relative to compiling
without it.
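mulps is the SSE instruction that multiplies four packed single-precision floats at once, which is where the speedup comes from. As a rough illustration (not the compiler's actual output), the sketch below hand-vectorizes the inner multiply loop with the standard SSE intrinsics from <xmmintrin.h>. It assumes an x86 target with SSE, and the function name scale_sse is hypothetical.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; _mm_mul_ps maps to mulps */

/* Hypothetical helper: a[i] = a[i] * k for i in [0, n),
 * processing four floats per iteration with packed SSE operations. */
void scale_sse(float *a, size_t n, float k)
{
    __m128 vk = _mm_set1_ps(k);            /* broadcast k into all 4 lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(a + i);    /* unaligned load of 4 floats */
        _mm_storeu_ps(a + i, _mm_mul_ps(v, vk));  /* 4 multiplies at once */
    }
    for (; i < n; i++)                     /* scalar tail for leftovers */
        a[i] *= k;
}
```

The trailing scalar loop handles element counts that are not a multiple of four; for the 1024-element array in this example it never runs.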

I would have expected the same phenomenon to occur with the following C
program:

#include <stdio.h>

int main (void)
{
    float a[1024];

    for (int i=0; i<1024; i++)
        a[i]=i;

    for (int l=0; l<100000; l++)
        for (int i=0; i<1024; i++)
            a[i]=a[i]*1.00001;

    for (int i=0; i<1024; i+=64)
        printf("%g\n", a[i]);
}

but it does not. The assembly produced with -S contains no mulps
instructions, and the runtime does not decrease.

Clearly I am missing something. What changes must I make to the C code
so that it can be vectorized by the compiler?

Terran Melconian

Jan 19, 2008, 2:38:05 PM
On 2008-01-18, Terran Melconian <te_rem_ra_ov...@consistent.org> wrote:
> I would have expected the same phenomenon to occur with the following C
> program:
>
> #include <stdio.h>
>
> int main (void)
> {
> float a[1024];
>
> for (int i=0; i<1024; i++)
> a[i]=i;
>
> for (int l=0; l<100000; l++)
> for (int i=0; i<1024; i++)
> a[i]=a[i]*1.00001;
>
> for (int i=0; i<1024; i+=64)
> printf("%g\n", a[i]);
> }
>
> but it does not. Looking at the assembly with -S does not show mulps
> instructions, and the runtime does not decrease.

It appears that the problem lies in the handling of floats and
doubles. While I cannot remember the details and do not have a copy of
the standard with me at this location, I found empirically that
autovectorization worked as expected once I added an "f" suffix to the
constant 1.00001, forcing it to be interpreted as a single-precision
value. As a consequence, the results also change slightly (on the order
of 0.15% in this example).
