[Haskell-cafe] GHC strange behavior

Ruslan Evdokimov

unread,

Feb 16, 2008, 6:06:39 PM2/16/08

to haskel...@haskell.org

Hi, all!

I have strange GHC behavior. Consider the code:

import Control.Parallel

main = print (o `par` (fromInteger e) / (fromInteger o))
where
[e,o] = map sum $ map (`filter` numbers) [even, odd]
numbers = [1..10000000]

When it compiled without threaded it has 19068 ms to run, 396 Mb total
memory in use and %GC time 88.2%, the same with -threaded and +RTS -N1,
but with +RTS -N2 it takes only 3806 ms to run, 3 Mb total memory in use and
%GC time 8.1%. Why it so? It's a bug or I missed something?
I test it on dual-core Athlon X2 4200+ 2Gb running 64bit Gentoo system. gcc
4.2.2 and ghc 6.8.2.

--
Ruslan

Jonathan Cast

unread,

Feb 16, 2008, 6:19:31 PM2/16/08

to Ruslan Evdokimov, haskel...@haskell.org

On 16 Feb 2008, at 3:06 PM, Ruslan Evdokimov wrote:

> Hi, all!
>
> I have strange GHC behavior. Consider the code:
>
> import Control.Parallel
>
> main = print (o `par` (fromInteger e) / (fromInteger o))
> where
> [e,o] = map sum $ map (`filter` numbers) [even, odd]
> numbers = [1..10000000]
>
>
> When it compiled without threaded it has 19068 ms to run, 396 Mb
> total memory in use and %GC time 88.2%, the same with -
> threaded and +RTS -N1, but with +RTS -N2 it takes only 3806 ms to
> run, 3 Mb total memory in use and %GC time 8.1%. Why it so?
> It's a bug or I missed something?

Wild guess? If you leave o as a thunk, to be evaluated once the
program has e, then it has numbers, so you keep the entire 10-million
entry list in memory. Evaluating e and o in parallel allows the
system to start garbage collecting cons cells from numbers much
earlier, which reduces residency (I'd've been unsuprised at more than
two orders of magnitude). Managing the smaller heap (and especially
not having to copy numbers on each GC) then makes the garbage
collector go much faster, so you get a smaller run time.

>
> I test it on dual-core Athlon X2 4200+ 2Gb running 64bit Gentoo
> system. gcc 4.2.2 and ghc 6.8.2.

jcc

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Ruslan Evdokimov

unread,

Feb 16, 2008, 7:07:40 PM2/16/08

to Jonathan Cast, haskel...@haskell.org

2008/2/17, Jonathan Cast <jonath...@fastmail.fm>:

>
> Wild guess? If you leave o as a thunk, to be evaluated once the
> program has e, then it has numbers, so you keep the entire 10-million
> entry list in memory. Evaluating e and o in parallel allows the
> system to start garbage collecting cons cells from numbers much
> earlier, which reduces residency (I'd've been unsuprised at more than
> two orders of magnitude). Managing the smaller heap (and especially
> not having to copy numbers on each GC) then makes the garbage
> collector go much faster, so you get a smaller run time.
>

But I also tested it on P-IV 3.0 with HT and 1GB (single core) running
Windows-XP (ghc 6.8.2), and it works fine (fast & low GC) in all three
cases without significant difference. Sure it didn't runs faster with
-N2 'cause it's not dual-core.

Don Stewart

unread,

Feb 16, 2008, 7:09:41 PM2/16/08

to Ruslan Evdokimov, haskel...@haskell.org

ruslan.evdokimov:

> 2008/2/17, Jonathan Cast <jonath...@fastmail.fm>:
> >
> > Wild guess? If you leave o as a thunk, to be evaluated once the
> > program has e, then it has numbers, so you keep the entire 10-million
> > entry list in memory. Evaluating e and o in parallel allows the
> > system to start garbage collecting cons cells from numbers much
> > earlier, which reduces residency (I'd've been unsuprised at more than
> > two orders of magnitude). Managing the smaller heap (and especially
> > not having to copy numbers on each GC) then makes the garbage
> > collector go much faster, so you get a smaller run time.
> >
> But I also tested it on P-IV 3.0 with HT and 1GB (single core) running
> Windows-XP (ghc 6.8.2), and it works fine (fast & low GC) in all three
> cases without significant difference. Sure it didn't runs faster with
> -N2 'cause it's not dual-core.

What flags did you compile the code with?

Ruslan Evdokimov

unread,

Feb 16, 2008, 7:14:36 PM2/16/08

to Don Stewart, haskel...@haskell.org

2008/2/17, Don Stewart <do...@galois.com>:

>
> What flags did you compile the code with?
>

1st case:
ghc -O2 --make

2nd and 3rd cases:
ghc -O2 --make -threaded

Stefan O'Rear

unread,

Feb 16, 2008, 7:21:40 PM2/16/08

to Ruslan Evdokimov, haskel...@haskell.org

On Sun, Feb 17, 2008 at 03:07:15AM +0300, Ruslan Evdokimov wrote:
> 2008/2/17, Jonathan Cast <jonath...@fastmail.fm>:
> >
> > Wild guess? If you leave o as a thunk, to be evaluated once the
> > program has e, then it has numbers, so you keep the entire 10-million
> > entry list in memory. Evaluating e and o in parallel allows the
> > system to start garbage collecting cons cells from numbers much
> > earlier, which reduces residency (I'd've been unsuprised at more than
> > two orders of magnitude). Managing the smaller heap (and especially
> > not having to copy numbers on each GC) then makes the garbage
> > collector go much faster, so you get a smaller run time.
> >
> But I also tested it on P-IV 3.0 with HT and 1GB (single core) running
> Windows-XP (ghc 6.8.2), and it works fine (fast & low GC) in all three
> cases without significant difference. Sure it didn't runs faster with
> -N2 'cause it's not dual-core.

This makes perfect sense - -N2 tells GHC to use two threads, and if you
run two threads on a single-processor system it's implemented by running
the threads alternatingly (around 100/s for modern Linux, probably
similar for other systems). Thus, the two evaluations never get more
than a hundreth of a second out of step, and memory usage is still low.

Stefan

signature.asc

Ruslan Evdokimov

unread,

Feb 16, 2008, 7:42:19 PM2/16/08

to Stefan O'Rear, haskel...@haskell.org

2008/2/17, Stefan O'Rear <stef...@cox.net>:

> This makes perfect sense - -N2 tells GHC to use two threads, and if you
> run two threads on a single-processor system it's implemented by running
> the threads alternatingly (around 100/s for modern Linux, probably
> similar for other systems). Thus, the two evaluations never get more
> than a hundreth of a second out of step, and memory usage is still low.
>
> Stefan

Test on windows XP AthlonX2 4200+ 2Gb:

C:\imp>test
1
12328 ms

C:\imp>test +RTS -N2
1
5234 ms

C:\imp>test +RTS -N2
1
3515 ms

1st - 1 thread
2nd - 2 threads on single core (one core disabled through Task Manager)
3rd - 2 threads on different cores

Stefan O'Rear

unread,

Feb 16, 2008, 7:56:18 PM2/16/08

to Ruslan Evdokimov, haskel...@haskell.org

On Sun, Feb 17, 2008 at 03:41:52AM +0300, Ruslan Evdokimov wrote:
> 2008/2/17, Stefan O'Rear <stef...@cox.net>:
>
> > This makes perfect sense - -N2 tells GHC to use two threads, and if you
> > run two threads on a single-processor system it's implemented by running
> > the threads alternatingly (around 100/s for modern Linux, probably
> > similar for other systems). Thus, the two evaluations never get more
> > than a hundreth of a second out of step, and memory usage is still low.
> >
> > Stefan
>
> Test on windows XP AthlonX2 4200+ 2Gb:
>
> C:\imp>test
> 1
> 12328 ms
>
> C:\imp>test +RTS -N2
> 1
> 5234 ms
>
> C:\imp>test +RTS -N2
> 1
> 3515 ms
>
> 1st - 1 thread
> 2nd - 2 threads on single core (one core disabled through Task Manager)
> 3rd - 2 threads on different cores

As far as I can tell, that confirms my explanation. If you see it
differently - say how.

Stefan

signature.asc

Ruslan Evdokimov

unread,

Feb 16, 2008, 8:37:19 PM2/16/08

to Stefan O'Rear, haskel...@haskell.org

2008/2/17, Stefan O'Rear <stef...@cox.net>:

>

> As far as I can tell, that confirms my explanation. If you see it
> differently - say how.
>
> Stefan
>

Seems you're right, I changed it to:
[e,o] = map sum $ [filter even numbers, (filter odd) $ reverse numbers]

It prevents numbers from being collected and here is results:

>test.exe
1
12812 ms

>test.exe +RTS -N2
1
16671 ms