Same thing seems to happen with plain Data.Binary. Also, +RTS -s -RTS
is very accurate for measuring heap usage.
--
Cheers,
Lemmih
The test case is a list of ints. It doesn't seem to be related to happstack.
--
Cheers,
Lemmih
That's one frightening hat. ^_^
We aren't concerned about efficient disk encodings. It's the memory
retention we're worried about. Serializing 1,000,000 ints requires
32megs of memory (101 with GC overhead). Serializing 5,000,000 ints
requires 186megs of memory (475 with GC overhead). Serializing the
list with 'show' runs in constant space; hopefully this could be made
true for Data.Binary as well.
--
Cheers,
Lemmih
Oh darn, I'm a fool. Why do I always realize these things right after
I send the mail. Please ignore me.
--
Cheers,
Lemmih
To show that I'm not a complete idiot, here's a chunked list instance
that runs in constant space, and a "strict" list instance that
serializes the elements before writing the length of the list. The
strict list instance is a drop-in replacement for the normal list
instance and it is about 10x more memory efficient for lazy lists of
ints. It does not run in constant space, though.
The two instances obviously only use less memory when serializing lazy
data. I guess the lesson here (if there is any) is to avoid data
structures that uses 40 bytes per entry when storing objects in the
millions.
import Data.Binary
import qualified Data.Binary.Put as B
import qualified Data.ByteString.Lazy as BS
import Control.Monad
newtype ChunkedList a = ChunkedList [a]
instance Binary a ⇒ Binary (ChunkedList a) where
put (ChunkedList lst)
= worker (toChunks lst)
where worker [] = do putWord8 0
worker (x:xs) = do putWord8 1
put x
worker xs
get = liftM ChunkedList worker
where worker = do n ← getWord8
if n == 0
then do return []
else do liftM2 (:) get worker
toChunks ∷ [a] → [[a]]
toChunks [] = []
toChunks lst = take n lst : toChunks (drop n lst)
where n = 1024
newtype StrictList a = StrictList [a]
instance Binary a ⇒ Binary (StrictList a) where
put (StrictList lst)
= do let (len,bs) = B.runPutM (builder 0 lst)
BS.length bs `seq` put len >> B.putLazyByteString bs
where builder !n [] = return n
builder !n (x:xs) = do put x
builder (n+1 ∷ Int) xs
get = liftM StrictList get
--
Cheers,
Lemmih
I'm happy to change the implementation of [a] serialization (as long as
we prefer on-disk format), or to add a chunked list newtype, since that
seems to be useful too.
-- Don
> I'm happy to change the implementation of [a] serialization (as long
> as we prefer on-disk format), or to add a chunked list newtype, since
> that seems to be useful too.
+1 for the newtype, changing the default list encoding will cause
headaches for people who use binary.
G.
--
Gregory Collins <gr...@gregorycollins.net>
Yeah, I agree. Of course the next question is whether we should
change IxSet to use the chunked list or create something like
Happstack.Data.IxSet.Chunked.
I have wanted this exact feature as well. For IxSet, you'd want to
see both the number of items and the total memory allocated for the
whole set. This would help in estimating the relationship between
number of items, number of indexes, and total size. The problem is
that I have no idea if such a thing is possible, let alone how you
might go about implementing it if it was.
Isn't heap profiling an option?
--
Cheers,
Lemmih
Yes, seems like a good case for heap profiling (with the per-type option
set, so allocs are broken down by type).
-- Don
Ok, I'll try this. I've done some heap profiling before, but maybe I
just wasn't using the right options.
I wrote about it here,
http://book.realworldhaskell.org/read/profiling-and-optimization.html#id678078
with some examples.