Optimized flags for Pentium-M (see inside)

numlock

Apr 1, 2004, 4:07:33 PM

Hi,

This is my first post.

I've been using Gentoo 2004.0, with several precompiled Pentium 4 packages,
on my Centrino laptop for a little while. It works great!

Since then I've tried many optimization switches on some of my own projects
(mainly heavy calculations), and found this command line to be the
best:

CFLAGS="-pipe -O3 -march=pentium4 -mmmx -msse -msse2 -mfpmath=sse,387
-maccumulate-outgoing-args -mno-align-stringops -fomit-frame-pointer
-ffast-math -funroll-all-loops -fsched-spec-load -fprefetch-loop-arrays
-ftracer -fmove-all-movables --param max-gcse-passes=4"

Perhaps -msse is redundant, but I'm not certain.

By the way, I'm posting this for a *constructive* purpose, i.e. to help
Pentium M users with their CFLAGS :-)

Compared to Gentoo's Pentium 4 builds, the speed improvement (tested on
bzip2, gzip and various others) is 10-15%. So far I haven't run into any
miscompiles.

Binary size grows by about 20%, which could probably be reduced by using
something less extreme than -funroll-all-loops.
For big applications, a smaller size might be preferable.

Please let me know what you think :-)

Greetings to all!

Joël

K

Apr 1, 2004, 4:59:55 PM

Someone correct me if I'm wrong, but I thought the Pentium M owed its
design more to the PIII than to the P4. The only P4 features of the PM are
the quad-pumped 400 MHz bus and SSE2. Other than that, the two designs
couldn't be further apart.

K

Stavros Christoforou

Apr 1, 2004, 6:49:47 PM

K, you are right, although it is more accurate to say that the Pentium M
is not close to either of the two. Flag-wise, however, it is closer to the
PIII than the PIV. My not-so-optimized flags are:

CFLAGS="-O3 -mcpu=pentium3 -mmmx -msse -msse2 -fomit-frame-pointer
-funroll-all-loops -pipe"

I believe it is better to use -mcpu plus the specific parameters you want
than -march, because -march sometimes breaks stuff. Everything compiles
and runs fine, even sensitive packages like ximian-openoffice.

Any comments are welcome
Stavros

Damian Kolkowski

Apr 1, 2004, 5:14:16 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

* numlock <numlock...@freesurf.ch> [2004-04-01 23:07]:


> Now I've tried many optimization switches with some of my projects
> (mainly heavy calculations stuff), and found this command line to be the
> best:
>
> CFLAGS="-pipe -O3 -march=pentium4 -mmmx -msse -msse2 -mfpmath=sse,387
> -maccumulate-outgoing-args -mno-align-stringops -fomit-frame-pointer
> -ffast-math -funroll-all-loops -fsched-spec-load -fprefetch-loop-arrays
> -ftracer -fmove-all-movables --param max-gcse-passes=4"

gcc -Q -v -march=pentium4 file.c

with file.c containing just "int main() {}". The verbose output lists
which options are already enabled.

Try reducing those flags ;-)

P.S. Try this script [1] and add -O2. Believe me, that's all you need, and
it's stable and good for everyone.

[1] http://kolkowski.no-ip.org/ftp/MY_tmp/gcc_test.sh

- --
# Damian *dEiMoS* Kołkowski # http://kolkowski.no-ip.org/ #
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAbJQ7zbCEkrLF3gMRArxeAKCeqXy/cEPZdz7zFlS6i7s9mo7gtwCeKLBz
xmViytSg3H8GbXX7xDZw1P0=
=eXVg
-----END PGP SIGNATURE-----

Damian Kolkowski

Apr 2, 2004, 11:50:43 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

* Stavros Christoforou <stg...@hotmail.com> [2004-04-02 01:49]:


> CFLAGS="-O3 -mcpu=pentium3 -mmmx -msse -msse2 -fomit-frame-pointer
> -funroll-all-loops -pipe"
>
> I believe it is better to use mcpu and the parameters you want than
> march, march sometimes breaks stuff. Everything compiles and runs fine,
> even sensitive stuff like ximian-openoffice.

Cool -mcpu=pentium3 and -march=i386 - yeachhhhhh ;-D

- --
# Damian *dEiMoS* Kołkowski # http://kolkowski.no-ip.org/ #
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAbZnnzbCEkrLF3gMRAqF6AKC+IrQIjiGlVSp1gQFq3HSgjOj0EgCgm0wi
6idbiOXeLQUv77d9dRlYQUg=
=Y4tl
-----END PGP SIGNATURE-----

numlock

Apr 2, 2004, 12:36:47 PM

Well, I totally agree that the Pentium M's architecture is very close to
that of a PIII, but when I add -mcpu=pentium3 I lose about 5% speed (in my
tests, anyway).

Regards,
Joël

numlock

Apr 2, 2004, 12:56:35 PM

Damian Kolkowski wrote:
> * numlock <numlock...@freesurf.ch> [2004-04-01 23:07]:
>
>>>Now I've tried many optimization switches with some of my projects
>>>(mainly heavy calculations stuff), and found this command line to be the
>>>best:
>>>
>>>CFLAGS="-pipe -O3 -march=pentium4 -mmmx -msse -msse2 -mfpmath=sse,387
>>>-maccumulate-outgoing-args -mno-align-stringops -fomit-frame-pointer
>>>-ffast-math -funroll-all-loops -fsched-spec-load -fprefetch-loop-arrays
>>>-ftracer -fmove-all-movables --param max-gcse-passes=4"
>
>
> gcc -Q -v -march=pentium4 file.c
>
> and file.c inside: "int main() {}".
>
> Try reduce those flags ;-)

I know a more complex flag list is not necessarily faster, but in this
case I measured the difference. Try it sometime ;-)

Admittedly, it is more likely to break stuff than yours.

Cheers
Joël

Xanadu

Apr 2, 2004, 3:16:39 PM

-Os

Seriously

-Os

