I am seeing strange GHC behavior. Consider this code:
import Control.Parallel

main = print (o `par` (fromInteger e) / (fromInteger o))
  where
    [e,o] = map sum $ map (`filter` numbers) [even, odd]
    numbers = [1..10000000]
When it is compiled without -threaded, it takes 19068 ms to run, with 396 Mb
total memory in use and %GC time of 88.2%. The result is the same with
-threaded and +RTS -N1. With +RTS -N2, however, it takes only 3806 ms to run,
with 3 Mb total memory in use and %GC time of 8.1%. Why is that? Is it a bug,
or am I missing something?
I tested it on a dual-core Athlon X2 4200+ 2Gb running a 64-bit Gentoo system,
with gcc 4.2.2 and ghc 6.8.2.
--
Ruslan
Wild guess? If you leave o as a thunk, to be evaluated once the
program has e, then that thunk still refers to numbers, so you keep
the entire 10-million entry list in memory. Evaluating e and o in
parallel allows the system to start garbage collecting cons cells from
numbers much earlier, which reduces residency (I'd've been unsurprised
at more than two orders of magnitude). Managing the smaller heap (and
especially not having to copy numbers on each GC) then makes the
garbage collector go much faster, so you get a smaller run time.
jcc
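To make the residency argument concrete, here is a minimal sketch of a variant that computes both sums in a single traversal, so that no second consumer holds on to the head of the list. It is not code from this thread: the name sumsEvenOdd is made up for illustration, and it assumes foldl' from Data.List and the BangPatterns extension.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- | Sum the even and the odd elements in one pass.
-- (Hypothetical helper, not from the original post.)
-- The bang patterns force both running totals at every step,
-- so the accumulator never builds up a chain of thunks.
sumsEvenOdd :: [Integer] -> (Integer, Integer)
sumsEvenOdd = foldl' step (0, 0)
  where
    step (!e, !o) n
      | even n    = (e + n, o)
      | otherwise = (e, o + n)

main :: IO ()
main = do
  -- The list has a single consumer, so each cons cell can be
  -- collected as soon as the fold has moved past it.
  let (e, o) = sumsEvenOdd [1 .. 10000000]
  print (fromInteger e / fromInteger o :: Double)
```

If the explanation above is right, this version should show low residency even without -threaded, since nothing keeps numbers alive while a second sum waits its turn.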
What flags did you compile the code with?
2nd and 3rd cases:
ghc -O2 --make -threaded
This makes perfect sense: -N2 tells GHC to use two threads, and if you
run two threads on a single-processor system, that is implemented by
switching between the threads (around 100 times per second for modern
Linux, probably similar for other systems). Thus, the two evaluations
never get more than a hundredth of a second out of step, and memory
usage stays low.
Stefan
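Both explanations so far depend on the order in which e and o are evaluated, and the original expression does not guarantee that order. A minimal sketch that pins it down, assuming pseq from Control.Parallel (the function name ratio is made up for illustration, and on newer GHCs the parallel package is assumed to be installed):

```haskell
import Control.Parallel (par, pseq)

-- | Ratio of the sum of the even elements to the sum of the odd ones.
-- (Hypothetical helper, not from the original post.)
-- `par` sparks o for possible parallel evaluation; `pseq` makes the
-- calling thread evaluate e before it looks at the division.
ratio :: [Integer] -> Double
ratio numbers = o `par` (e `pseq` (fromInteger e / fromInteger o))
  where
    e = sum (filter even numbers)
    o = sum (filter odd numbers)

main :: IO ()
main = print (ratio [1 .. 10000000])
```

With +RTS -N2 the spark for o may be picked up by the second thread while the main thread sums the evens, which is the situation where the two traversals stay roughly in step. With a single thread, o may only be evaluated once e is done, and numbers stays live until then.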
Test on Windows XP, Athlon X2 4200+, 2Gb:
C:\imp>test
1
12328 ms
C:\imp>test +RTS -N2
1
5234 ms
C:\imp>test +RTS -N2
1
3515 ms
1st - 1 thread
2nd - 2 threads on single core (one core disabled through Task Manager)
3rd - 2 threads on different cores
As far as I can tell, that confirms my explanation. If you see it
differently - say how.
Stefan
It seems you're right. I changed it to:
[e,o] = map sum $ [filter even numbers, (filter odd) $ reverse numbers]
This prevents numbers from being collected, and here are the results:
>test.exe
1
12812 ms
>test.exe +RTS -N2
1
16671 ms