large sys time for nbody using 5l

Fango

May 30, 2011, 4:18:03 AM

to golang-nuts

nbody -n 50000000
gcc -O2 -lm nbody.c    316.08u    0.40s    389.46r
gc nbody               645.32u  686.06s   1843.30r
gc_B nbody             653.49u  640.93s   1373.40r

This was run on an ARM Cortex-A8 board [1]. I don't know why gc spent
so much time in sys. Where should I start looking?

Thanks,
Fango

[1] http://going-along.blogspot.com/2011/05/easy-going-with-arm.html

Fango

May 30, 2011, 5:00:05 AM

to golang-nuts

A run of 5 million iterations has negligible sys time, but a run of 50 million does not.

lucid@lucid-desktop:~/go/test/bench$ time ./nbody.arm6 -n 5000000
-0.169075164
-0.169083134

real 2m15.345s
user 2m9.450s
sys 0m0.030s
lucid@lucid-desktop:~/go/test/bench$ time ./nbody.arm6 -n 50000000
-0.169075164
-0.169059907

real 22m26.845s
user 10m50.470s
sys 10m43.690s

Rob 'Commander' Pike

May 30, 2011, 5:48:27 AM

to Fango, golang-nuts

Any chance it's floating-point emulation?

-rob

Fango

May 30, 2011, 5:55:25 AM

to golang-nuts

I double-checked, and it's not. Also, the problem does not show up with
5 million iterations, only with 50 million.

Russ Cox

May 30, 2011, 3:55:34 PM

to Fango, golang-nuts

Are you swapping?

Howard Fan

May 30, 2011, 9:06:21 PM

to r...@golang.org, golang-nuts

No. The ARM board has no swap partition. I also checked `top`: memory
usage stayed constant, so there are no memory leaks and, I think, no GC activity.

I just tried the same on an ARM Cortex-A9 [1], whose VFP is much faster
than the A8's. But to my astonishment, 5g floating-point crunching is 11
times slower than optimized gcc. Could this be related to the long
sys time observed above?

nbody -n 50000000
gcc -O2 -lm nbody.c     71.40u    0.00s     71.41r
gc nbody               862.53u    0.02s    862.78r
gc_B nbody             865.00u    0.05s    865.28r

[1] http://going-along.blogspot.com/2011/05/easy-going-with-arm-kung-fu-panda-2.html

On Tue, May 31, 2011 at 3:55 AM, Russ Cox <r...@golang.org> wrote:
> Are you swapping?
>

Fango

May 31, 2011, 4:41:19 AM

to golang-nuts

As it turns out, floating point in gc is NOT 11x slower than gcc. The gap
was caused by the unoptimized math.Sqrt. When I changed nbody.c and
nbody.go to use their own sqrt (which simply returns 1.0), the timings were
comparable (note that this was benchmarked on the A8, not the A9):

nbody -n 50000000
gcc -O2 -lm nbody.c    202.57u    0.03s    206.27r
gc nbody               361.91u    0.04s    368.25r
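The stub experiment can be reproduced in isolation, without the rest of nbody. The following is a rough sketch only: `stubSqrt` and `timeSqrt` are names made up for illustration, the iteration count is arbitrary, and the code uses the current Go standard library rather than the 2011 one. It times the same loop once with math.Sqrt and once with a function that just returns 1.0.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// stubSqrt mirrors the experiment: it ignores its argument and returns 1.0.
func stubSqrt(x float64) float64 { return 1.0 }

// timeSqrt calls f on 1..n and returns the elapsed wall time together with
// the sum of the results, so the calls cannot be optimized away.
func timeSqrt(f func(float64) float64, n int) (time.Duration, float64) {
	start := time.Now()
	sum := 0.0
	for i := 1; i <= n; i++ {
		sum += f(float64(i))
	}
	return time.Since(start), sum
}

func main() {
	const n = 10000000 // arbitrary; raise it on fast machines
	dReal, sReal := timeSqrt(math.Sqrt, n)
	dStub, sStub := timeSqrt(stubSqrt, n)
	fmt.Printf("math.Sqrt: %v (sum %g)\n", dReal, sReal)
	fmt.Printf("stub:      %v (sum %g)\n", dStub, sStub)
}
```

A large gap between the two lines on a given board suggests that math.Sqrt is running in software there; a small gap suggests the square root is not the bottleneck.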

That said, ARM VFP does have a VSQRT instruction, but it is not
supported in 5l/optab.c. Adding it, along with a sqrt_arm.s in pkg/math,
should be easy. I may give it a try if you guys have no time to do it in the
next couple of weeks.

Cheers,
Fango
