[Haskell-cafe] poor performance when generating random text

14 views
Skip to first unread message

Dmitry Vyal

unread,
Oct 17, 2012, 3:07:41 AM10/17/12
to Haskell-Cafe
Hello anyone

I've written a snippet which generates a file full of random strings.
When compiled with -O2 on ghc-7.6, the generation speed is about 2Mb per
second which is on par with interpreted php. That's the fact I find
rather disappointing. Maybe I've missed something trivial? Any
suggestions and explanations are welcome. :)

% cat ext_sort.hs
import qualified Data.Text as T
import System.Random
import Control.Exception
import Control.Monad

import System.IO
import qualified Data.Text.IO as TI

gen_string g = let (len, g') = randomR (50, 450) g
in T.unfoldrN len rand_text (len, g')
where rand_text (0,_) = Nothing
rand_text (k,g) = let (c, g') = randomR ('a','z') g
in Just (c, ((k-1), g'))

write_corpus file = bracket (openFile file WriteMode) hClose $ \h -> do
let size = 100000
sequence $ replicate size $ do
g <- newStdGen
let text = gen_string g
TI.hPutStrLn h text

main = do
putStrLn "generating text corpus"
write_corpus "test.txt"



% cat ext_sort.prof
Wed Oct 17 10:59 2012 Time and Allocation Profiling Report (Final)

ext_sort +RTS -p -RTS

total time = 32.56 secs (32558 ticks @ 1000 us, 1
processor)
total alloc = 12,742,917,332 bytes (excludes profiling overheads)

COST CENTRE MODULE %time %alloc

gen_string.rand_text.(...) Main 70.7 69.8
gen_string Main 17.6 15.8
gen_string.rand_text Main 5.4 13.3
write_corpus.\ Main 4.3 0.8


individual inherited
COST CENTRE MODULE no. entries %time %alloc
%time %alloc

MAIN MAIN 67 0 0.0 0.0
100.0 100.0
main Main 135 0 0.0 0.0
100.0 100.0
write_corpus Main 137 0 0.0 0.0
100.0 100.0
write_corpus.\ Main 138 1 4.3 0.8
100.0 100.0
write_corpus.\.text Main 140 100000 0.0 0.0
95.7 99.2
gen_string Main 141 100000 17.6 15.8
95.7 99.2
gen_string.g' Main 147 100000 0.0
0.0 0.0 0.0
gen_string.rand_text Main 144 25109743 5.4 13.3
77.5 83.2
gen_string.rand_text.g' Main 148 24909743 0.6
0.0 0.6 0.0
gen_string.rand_text.(...) Main 146 25009743 70.7 69.8
70.7 69.8
gen_string.rand_text.c Main 145 25009743 0.8
0.0 0.8 0.0
gen_string.len Main 143 100000 0.0
0.0 0.0 0.0
gen_string.(...) Main 142 100000 0.6
0.3 0.6 0.3

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Gregory Collins

unread,
Oct 17, 2012, 3:36:32 AM10/17/12
to Dmitry Vyal, Haskell-Cafe
System.Random is very slow. Try the mwc-random package from Hackage.
--
Gregory Collins <gr...@gregorycollins.net>

Alfredo Di Napoli

unread,
Oct 17, 2012, 4:45:20 AM10/17/12
to Gregory Collins, Haskell-Cafe
What about this? I've tested on my pc and seems pretty fast. The trick is to generate the gen only once. Not sure if the inlines helps, though:

import qualified Data.Text as T
import System.Random.MWC
import Control.Monad
import System.IO
import Data.ByteString as B
import Data.Word (Word8)
import Data.ByteString.Char8 as CB


{- | Converts a Char to a Word8. Took from MissingH -}
c2w8 :: Char -> Word8
c2w8 = fromIntegral . fromEnum


charRangeStart :: Word8
charRangeStart = c2w8 'a'
{-# INLINE charRangeStart #-}

charRangeEnd :: Word8
charRangeEnd = c2w8 'z'
{-# INLINE charRangeEnd #-}

--genString :: Gen RealWorld -> IO B.ByteString
genString g = do
    randomLen <- uniformR (50 :: Int, 450 :: Int) g
    str <- replicateM randomLen $ uniformR (charRangeStart, charRangeEnd) g
    return $ B.pack str


writeCorpus :: FilePath -> IO [()]
writeCorpus file = withFile file WriteMode $ \h -> do
  let size = 100000
  _ <- withSystemRandom $ \gen ->
      replicateM size $ do
        text <- genString gen :: IO B.ByteString
        CB.hPutStrLn h text
  return [()]

main :: IO [()]
main =  writeCorpus "test.txt"



A.

Dmitry Vyal

unread,
Oct 17, 2012, 3:10:24 PM10/17/12
to Alfredo Di Napoli, Haskell-Cafe
On 10/17/2012 12:45 PM, Alfredo Di Napoli wrote:
> What about this? I've tested on my pc and seems pretty fast. The trick
> is to generate the gen only once. Not sure if the inlines helps, though:
>

> What about this? I've tested on my pc and seems pretty fast. The
trick is to generate the gen only once. Not sure if the inlines helps,
though
...

Wow, haskell-cafe is a wonderful place! In just a two hours program run
time automagically improved 20x ;) Thanks Alfredo, code works wonderful.
Compared to mine implementation it's 2.5 sec vs 50 sec on my laptop.
Interesting, how it compares to C now.

Inlining makes about 50x difference when code compiled without
optimization. A nice example.

Best wishes,
Dmitry

Alfredo Di Napoli

unread,
Oct 17, 2012, 3:28:53 PM10/17/12
to Dmitry Vyal, Haskell-Cafe
Glad to have been helpful :)

Bests,
Alfredo

Sent from my iPad
Reply all
Reply to author
Forward
0 new messages