Chapter 25 - a little test adding some I/O

piotr

Mar 18, 2009, 1:53:24 PM
to Real World Haskell Book Club
Hi,
After reading chapter 25, I wrote a very simple script to see what I
can get from Haskell on simple I/O and calculations, compared to e.g.
Python. It is very similar to the function in chapter 25: it reads a
file containing several columns of numbers and writes the mean of each
column. So the main difference is that there is I/O.

I had to use readDouble from the bytestring-lexing package to avoid
unpacking into temporary Strings, but the time is still spent in the
function converting ByteStrings to Doubles, allocating ARR_WORDS (I
don't know exactly what those are).

The script runs 30% slower than a simple Python equivalent on files
with several million numbers.
Does anyone have comments or suggestions for improvements I could
make?


-------------------------------------------------------------------------------------------------

import System.Environment (getArgs)
import Data.List (foldl', intercalate)
import qualified Data.ByteString.Lazy.Char8 as BS
import Data.ByteString.Lex.Lazy.Double (readDouble)
import Control.Parallel.Strategies (using, rnf)


-- | reads a file with several columns of numbers, computes the mean
-- of each column
main = do
    file : _ <- getArgs
    content <- BS.readFile file

    let getMeans = computeMeans . (map toDoubles) . BS.lines
        toDoubles = (map word2float) . BS.words

    putStrLn $ "means of columns: " ++ (intercalate " " $ map show $
        getMeans content)


-- | get a Double from a ByteString
word2float :: BS.ByteString -> Double
word2float b = case readDouble b of
    Nothing -> 0
    Just (k,_) -> k


-- | means of columns of the "table" of numbers
computeMeans :: [[Double]] -> [Double]
computeMeans lists = if null lists then [] else map (/len) sums
  where
    -- compute [sum] and length at the same time, using strict
    -- evaluation.
    (sums, len) = foldl' sumNext (head lists, 1) (tail lists)

    sumNext (s, l) line = (zipWith (+) s line, l+1) `using` rnf
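
Editor's note: in later releases of the parallel package, the rnf strategy used above was, as far as I know, renamed rdeepseq, and rnf itself now lives in Control.DeepSeq with a different type. The following is a minimal sketch of the same computeMeans written against the deepseq package instead. It is not the poster's code: the use of `force`, the pattern match replacing head/tail, and the sample table in `main` are assumptions made for illustration, and rows are assumed to have equal length.

```haskell
import Control.DeepSeq (force)
import Data.List (foldl')

-- | Column means of a table of numbers.
-- Assumption: every row has the same number of columns.
computeMeans :: [[Double]] -> [Double]
computeMeans []             = []
computeMeans (first : rest) = map (/ len) sums
  where
    -- Running column sums and row count, accumulated in one pass.
    (sums, len) = foldl' sumNext (first, 1) rest

    -- foldl' demands each new accumulator to WHNF; 'force' makes that
    -- demand evaluate the whole pair, including every list element.
    sumNext (s, l) row = force (zipWith (+) s row, l + 1)

main :: IO ()
main = print (computeMeans [[1, 2], [3, 4], [5, 6]])  -- prints [3.0,4.0]
```

Forcing the accumulator at every step keeps the list of sums from turning into a chain of unevaluated additions, which is what the `using` rnf in the original achieves.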

Don Stewart

Mar 18, 2009, 5:02:08 PM
to Real World Haskell Book Club
I usually write those file summing programs to take advantage of the
fact readDouble (et al) return the tail of the file. Saves a lot of
different checks:

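{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Unsafe as S
import Data.ByteString.Lex.Double

-- Sum every number on stdin. readDouble returns the parsed value
-- together with the unparsed rest of the input.
main = print . go 0 =<< S.getContents
  where
    go !n s = case readDouble s of
        Nothing       -> n
        -- S.tail skips the separator; assumes one follows every number
        Just (k,rest) -> go (n+k) (S.tail rest)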

This should run pretty well (ghc -O2 --make). Note how we sum as we go.
Now, for yours, you'll want to avoid building up all those
intermediate structures (which I imagine you aren't doing in the
Python case).
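
Editor's note: the "parse a number, then continue with the rest of the input" shape can also be sketched with nothing but base. The version below swaps in `reads` on a plain String as a stand-in for readDouble on a ByteString, so it is far slower and only illustrates the control flow. The name `sumAll` and the sample input are hypothetical.

```haskell
{-# LANGUAGE BangPatterns #-}

-- | Sum every number in the input, threading the unparsed remainder
-- through the loop instead of first splitting into lines and words.
sumAll :: String -> Double
sumAll = go 0
  where
    -- The bang keeps the running total evaluated at each step.
    go !acc s = case reads s of
      [(k, rest)] -> go (acc + k) rest  -- 'reads' skips leading whitespace
      _           -> acc                -- nothing left to parse

main :: IO ()
main = print (sumAll "1.5 2.5\n3.0\n")  -- prints 7.0
```

Because each call hands back the remainder, no intermediate list of lines, words, or Doubles is ever built.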

piotr

Mar 19, 2009, 12:14:30 PM
to Real World Haskell Book Club
Thanks !

On 18 mar, 22:02, Don Stewart <don...@gmail.com> wrote:
> fact readDouble (et al) return the tail of the file.

Corrected; with only a few columns there doesn't seem to be a noticeable
impact, though.


> Now, for yours, you'll want to avoid building up all those
> intermediate structures

In fact that was not the problem either: 77% of the time and 89%
of the allocation happen in readDouble!

So e.g. fusing the outer loop on [[Double]] had no impact at all on
performance ( http://haskell.pastebin.com/m2d8790c8 ). For me this I/O
example is still a challenge ;-)


PS. I really love your book (and your blog (and also your papers!)),
it makes me rethink my programming!

Blade Wang

Mar 20, 2009, 1:20:39 AM
to Real World Haskell Book Club

What's the meaning of '!' in 'go !n s = ...' , please?

I'm a newcomer to Haskell, thanks for the help!

^_^

piotr

Mar 20, 2009, 2:28:47 PM
to Real World Haskell Book Club
On 20 mar, 06:20, Blade Wang <K.Y.Wang.1...@gmail.com> wrote:
> What's the meaning of '!' in 'go !n s = ...' , please?

It's called a "bang pattern"; it's a syntax extension (BangPatterns in
GHC) that lets you specify that a pattern should be evaluated strictly
(to WHNF). You could also use "seq" in the function body.

In the "go" function, we don't want to build up a chain of deferred
calculations to do later; we want the result now. This is the same
issue that foldl' addresses.
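
Editor's note: as a minimal sketch of the two options mentioned above, here is the same strict accumulator loop written once with a bang pattern and once with seq. The function names and the input range are made up for the example.

```haskell
{-# LANGUAGE BangPatterns #-}

-- | Strict accumulator via a bang pattern: n is evaluated to WHNF
-- every time 'go' is entered.
sumBang :: [Double] -> Double
sumBang = go 0
  where
    go !n []       = n
    go !n (x : xs) = go (n + x) xs

-- | The same loop without the extension, forcing the new total
-- with 'seq' before recursing.
sumSeq :: [Double] -> Double
sumSeq = go 0
  where
    go n []       = n
    go n (x : xs) = let n' = n + x in n' `seq` go n' xs

main :: IO ()
main = do
  print (sumBang [1 .. 1000000])
  print (sumSeq  [1 .. 1000000])
```

Both versions give the same result; without the bang or the seq, the accumulator would grow into a long chain of pending additions before anything is evaluated.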

Blade Wang

Mar 22, 2009, 10:18:12 PM
to Real World Haskell Book Club
Thanks ~~!