Serialization code seems to require excessive amounts of memory
From: stepcut <jer...@n-heptane.com>
Date: Thu, 30 Jul 2009 15:54:39 -0700 (PDT)
Local: Thurs, Jul 30 2009 6:54 pm
Subject: Serialization code seems to require excessive amounts of memory
Hello,

I just filed this bug:

http://code.google.com/p/happstack/issues/detail?id=103

This simple program:

main =
    let list  = [1..1000000] :: [Int]
        bin   = B.runPut (safePut list)
        list' = B.runGet safeGet bin :: [Int]
    in putStrLn (show . length $ takeWhile (< 10000000) list) >>
       getLine >> return ()

requires 50 times more RSS and 10 times more VIRT if you change list
in the last line to list'.

I have a real server where my checkpoint file is only 11MB, but once
it gets loaded, the server requires 500-800MB. That seems pretty
excessive. Anyone got any ideas why the serialization code bumps up
the RAM usage to so much? It's not temporary either.. in the above
example, when sitting at the getLine, it is still using all that
memory.

I am using GHC 6.10.4 on Linux.

- jeremy


 
From: Lemmih <lem...@gmail.com>
Date: Fri, 31 Jul 2009 01:33:31 +0200
Local: Thurs, Jul 30 2009 7:33 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

Same thing seems to happen with plain Data.Binary. Also, +RTS -s -RTS
is very accurate for measuring heap usage.
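
For readers who want the same figure from inside the program rather than from the +RTS -s summary, here is a minimal sketch using GHC.Stats from base. This is a swapped-in technique, not what was used in this thread. The helper name peakLiveBytes is made up for illustration, and the sketch assumes a much newer GHC than 6.10.4 (GHC.Stats in this form needs base 4.10 or later) and that the program is run with the -T RTS option; without it the helper returns Nothing.

```haskell
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Peak live heap in bytes as seen by the garbage collector.
-- Returns Nothing when RTS statistics are not enabled (no +RTS -T).
-- 'peakLiveBytes' is a hypothetical helper name, not part of any library.
peakLiveBytes :: IO (Maybe Word64)
peakLiveBytes = do
  enabled <- getRTSStatsEnabled
  if enabled
    then Just . max_live_bytes <$> getRTSStats
    else return Nothing
```

Calling peakLiveBytes right before the getLine in the test program would report the peak live heap at that point, which is closer to what the original question is about than RSS or VIRT.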

--
Cheers,
  Lemmih


 
From: MightyByte <mightyb...@gmail.com>
Date: Thu, 30 Jul 2009 16:42:36 -0700
Local: Thurs, Jul 30 2009 7:42 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
I've also been experiencing similar memory size anomalies.  My  
checkpoint file is six megs and I'm using 600+ megs of RAM.  I haven't  
yet identified the source of the problem, but this sounds very similar.

On Jul 30, 2009, at 4:33 PM, Lemmih <lem...@gmail.com> wrote:


 
From: stepcut <jer...@n-heptane.com>
Date: Fri, 31 Jul 2009 13:58:16 -0700 (PDT)
Local: Fri, Jul 31 2009 4:58 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
Cool. I started a thread on haskell-cafe, now we just have to see if
dons rises to the challenge:

http://www.haskell.org/pipermail/haskell-cafe/2009-July/064779.html

This does indeed seem to be an issue with Binary, and not anything
happstack specific.

- jeremy


 
From: Matthew Elder <m...@mattelder.org>
Date: Fri, 31 Jul 2009 14:40:52 -0700
Local: Fri, Jul 31 2009 5:40 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

Looks good stepcut, I only have one complaint.

*puts on his linux hat*

It's Linux, ok? Not GNU/Linux.

*puts on his RMS hat*

Linux is actually only the kernel that the GNU operating system runs on. You
see, the Hurd was a really advanced kernel and it was going to be the best, but
Linux beat us.

Since Linux is actually the GNU operating system, you should call it
GNU/Linux.

*thank you*

On Fri, Jul 31, 2009 at 1:58 PM, stepcut <jer...@n-heptane.com> wrote:

> Cool. I started a thread on haskell-cafe, now we just have to see if
> dons rises to the challenge:

> http://www.haskell.org/pipermail/haskell-cafe/2009-July/064779.html

> This does indeed seem to be an issue with Binary, and not anything
> happstack specific.

> - jeremy

--
Need somewhere to put your code? http://patch-tag.com
Want to build a webapp? http://happstack.com

 
From: Don Stewart <d...@galois.com>
Date: Fri, 31 Jul 2009 14:49:31 -0700
Local: Fri, Jul 31 2009 5:49 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
Is there a general task here to write a better Binary instance for IxSet
et al?

If so, I could look at it. Let me know what you need.

-- Don

matt:


 
From: MightyByte <mightyb...@gmail.com>
Date: Fri, 31 Jul 2009 15:05:35 -0700
Local: Fri, Jul 31 2009 6:05 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
Well, we need something that both won't overflow stack *and* doesn't  
impose this 60x space explosion.

On Jul 31, 2009, at 2:49 PM, Don Stewart <d...@galois.com> wrote:


 
From: Don Stewart <d...@galois.com>
Date: Fri, 31 Jul 2009 15:16:02 -0700
Local: Fri, Jul 31 2009 6:16 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
Where's the data type defined? And is there some test data?

I'd be happy to help design a good serialization instance for the
happstack team.

-- Don

mightybyte:


 
From: Lemmih <lem...@gmail.com>
Date: Sat, 1 Aug 2009 00:31:58 +0200
Local: Fri, Jul 31 2009 6:31 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

On Sat, Aug 1, 2009 at 12:16 AM, Don Stewart<d...@galois.com> wrote:

> Where's the data type defined? And is there some test data?

The test case is a list of ints. It doesn't seem to be related to happstack.

--
Cheers,
  Lemmih


 
From: Tom Tobin <korp...@korpios.com>
Date: Fri, 31 Jul 2009 17:10:52 -0500
Local: Fri, Jul 31 2009 6:10 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

On Fri, Jul 31, 2009 at 4:40 PM, Matthew Elder<m...@mattelder.org> wrote:
> *puts on his RMS hat*

That's one frightening hat.  ^_^

 
From: MightyByte <mightyb...@gmail.com>
Date: Fri, 31 Jul 2009 16:10:48 -0700
Local: Fri, Jul 31 2009 7:10 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
Dons, the code is in Happstack.Data.IxSet but I agree with lemmih that  
it should be able to be fixed outside of Happstack.

On Jul 31, 2009, at 3:31 PM, Lemmih <lem...@gmail.com> wrote:


 
From: Don Stewart <d...@galois.com>
Date: Fri, 31 Jul 2009 16:14:37 -0700
Local: Fri, Jul 31 2009 7:14 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

So the problem is how to serialise lists lazily.

There are several encodings.

    * write the length of the list, then the elements
    * interleave bits of length with elements

We choose the more space efficient disk encoding. Also, we can't change
the disk encoding of the default instances (so we can't switch away from
a length encoding).

Specific applications may wish to use different disk representations
because of other constraints. happstack should just use its own encoding
that satisfies the constraints it has.
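
To make the two encodings concrete, here is a small sketch on a list of bytes. The first function is simply the stock Data.Binary list instance (length first, then elements). The second is only one possible reading of "interleave bits of length with elements": a tag byte before every element and a 0 terminator. The names lengthPrefixed, tagged and interleaved are made up for illustration.

```haskell
import Data.Binary (Binary, encode, put)
import Data.Binary.Put (Put, putWord8, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Encoding 1: the default list instance. The length is written first
-- (as a 64-bit big-endian integer), so it must be known before any
-- element is emitted.
lengthPrefixed :: [Word8] -> BL.ByteString
lengthPrefixed = encode

-- Encoding 2 (hypothetical): a tag byte 1 before each element and a
-- tag byte 0 at the end. No length is needed up front.
tagged :: Binary a => [a] -> Put
tagged []       = putWord8 0
tagged (x : xs) = putWord8 1 >> put x >> tagged xs

interleaved :: [Word8] -> BL.ByteString
interleaved = runPut . tagged
```

For [1,2,3] the first form is 8 length bytes plus 3 element bytes, and the second is 7 bytes. For long lists the balance flips: 8 + n bytes against 2n + 1, which is the "more space efficient disk encoding" argument, while the second form can start writing before the whole list has been traversed.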

mightybyte:


 
From: Lemmih <lem...@gmail.com>
Date: Sat, 1 Aug 2009 01:41:59 +0200
Local: Fri, Jul 31 2009 7:41 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

We aren't concerned about efficient disk encodings. It's the memory
retention we're worried about. Serializing 1,000,000 ints requires
32megs of memory (101 with GC overhead). Serializing 5,000,000 ints
requires 186megs of memory (475 with GC overhead). Serializing the
list with 'show' runs in constant space; hopefully this could be made
true for Data.Binary as well.

--
Cheers,
  Lemmih


 
From: Lemmih <lem...@gmail.com>
Date: Sat, 1 Aug 2009 01:43:01 +0200
Local: Fri, Jul 31 2009 7:43 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

Oh darn, I'm a fool. Why do I always realize these things right after
I send the mail. Please ignore me.

--
Cheers,
  Lemmih


 
From: Lemmih <lem...@gmail.com>
Date: Sat, 1 Aug 2009 03:11:24 +0200
Local: Fri, Jul 31 2009 9:11 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

To show that I'm not a complete idiot, here's a chunked list instance
that runs in constant space, and a "strict" list instance that
serializes the elements before writing the length of the list. The
strict list instance is a drop-in replacement for the normal list
instance and it is about 10x more memory efficient for lazy lists of
ints. It does not run in constant space, though.

The two instances obviously only use less memory when serializing lazy
data. I guess the lesson here (if there is any) is to avoid data
structures that use 40 bytes per entry when storing objects in the
millions.

{-# LANGUAGE BangPatterns, UnicodeSyntax #-}
import Data.Binary
import qualified Data.Binary.Put as B
import qualified Data.ByteString.Lazy as BS
import Control.Monad

newtype ChunkedList a = ChunkedList [a]

-- Each chunk is written as a tag byte 1 followed by an ordinary
-- length-prefixed list; a tag byte 0 ends the stream.
instance Binary a ⇒ Binary (ChunkedList a) where
    put (ChunkedList lst)
        = worker (toChunks lst)
        where worker []     = do putWord8 0
              worker (x:xs) = do putWord8 1
                                 put x
                                 worker xs
    -- worker reads back whole chunks, so they are concatenated here.
    get = liftM (ChunkedList . concat) worker
        where worker = do n ← getWord8
                          if n == 0
                            then do return []
                            else do liftM2 (:) get worker

toChunks ∷ [a] → [[a]]
toChunks [] = []
toChunks lst = take n lst : toChunks (drop n lst)
  where n = 1024

newtype StrictList a = StrictList [a]

-- Serializes the elements first (counting them on the way), then writes
-- the count in front, so the output matches the normal list format.
instance Binary a ⇒ Binary (StrictList a) where
    put (StrictList lst)
        = do let (len,bs) = B.runPutM (builder 0 lst)
             BS.length bs `seq` put len >> B.putLazyByteString bs
      where builder !n [] = return n
            builder !n (x:xs) = do put x
                                   builder (n+1 ∷ Int) xs
    get = liftM StrictList get
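
A quick way to check a chunked encoding like the one above is a round trip. The sketch below restates the idea with plain functions instead of a Binary instance so that it stands alone; putChunked, getChunked, roundTrip and the chunk size of 1024 are illustrative names and values, not anything that exists in Data.Binary or happstack.

```haskell
import Data.Binary (Binary, get, put)
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (Put, putWord8, runPut)

-- Chunk size is an arbitrary choice for this sketch.
chunkSize :: Int
chunkSize = 1024

-- Write tag 1 plus an ordinary length-prefixed list per chunk,
-- then tag 0 to end the stream.
putChunked :: Binary a => [a] -> Put
putChunked [] = putWord8 0
putChunked xs = do
  let (chunk, rest) = splitAt chunkSize xs
  putWord8 1
  put chunk
  putChunked rest

-- Read chunks until the 0 tag and concatenate them.
getChunked :: Binary a => Get [a]
getChunked = do
  tag <- getWord8
  if tag == 0
    then return []
    else do
      chunk <- get
      rest <- getChunked
      return (chunk ++ rest)

-- Encode and immediately decode again; should be the identity.
roundTrip :: [Int] -> [Int]
roundTrip = runGet getChunked . runPut . putChunked
```

Only one chunk at a time has to be fully evaluated before bytes can be emitted, which is where the memory saving on the writing side comes from when the input list is produced lazily.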

--
Cheers,
  Lemmih


 
From: stepcut <jer...@n-heptane.com>
Date: Fri, 31 Jul 2009 19:59:14 -0700 (PDT)
Local: Fri, Jul 31 2009 10:59 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
I believe I ultimately determined that my test case was bogus. Here is
my revised test, which does a better job of keeping the whole list in
memory (which is the behavior we generally want with happstack).

import Data.Binary (decode, encode)

main :: IO ()
main =
    let list  = [1..1000000] :: [Int]
        bin   = encode list
        list' = decode bin :: [Int]
    in do putStrLn (show . length $ takeWhile (< 10000000) list)
          _ <- getLine
          -- the second traversal keeps the whole list alive across the pause
          putStrLn (show . length $ takeWhile (< 10000001) list)

This version consumes about 40MB in the control version, and 60MB when
you swap list' for list. Further testing using decodeFile/encodeFile
showed that writing to disk, restarting the app, and reloading the
disk state resulted in 50MB of space instead of 40MB, which is not too
bad.
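
The 40MB control figure lines up with the "40 bytes per entry" mentioned earlier in the thread. A back-of-the-envelope sketch, assuming a 64-bit GHC where a fully evaluated [Int] costs 3 words for each (:) cell plus 2 words for each boxed Int (the thread does not say which architecture was used, so treat the breakdown as an assumption). The function names are made up for illustration.

```haskell
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Assumed heap cost per element of an evaluated [Int] on 64-bit GHC:
-- 3 words for the cons cell + 2 words for the boxed Int = 40 bytes.
bytesPerEntry :: Int64
bytesPerEntry = 40

-- Estimated heap size of an n-element [Int].
heapEstimate :: Int64 -> Int64
heapEstimate n = n * bytesPerEntry

-- Size of the default Data.Binary encoding: an 8-byte length
-- followed by 8 bytes per Int.
encodedEstimate :: Int64 -> Int64
encodedEstimate n = 8 + 8 * n

-- The real encoded size, for comparison with the estimate.
encodedSize :: [Int] -> Int64
encodedSize = BL.length . encode
```

For 1,000,000 elements this gives about 40 MB on the heap against about 8 MB encoded, so a 5x gap between checkpoint size and loaded size is expected for this data even with no leak at all.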

For applications that use lots of Strings the best option is probably
to use Data.Text from the text library.

Now that I have a better control, I am going to do some additional
testing and see if happstack-data or IxSet are causing any unneeded
memory usage. It is quite possible that the difference between the
checkpoint file and the loaded state sizes is because the binary
format is very compact, and typical Haskell data structures are not.
The 'fix' for this is to use Haskell data structures like Data.Text
which aim to provide compact in-memory representations.

- jeremy


 
From: MightyByte <mightyb...@gmail.com>
Date: Tue, 4 Aug 2009 13:16:37 -0400
Local: Tues, Aug 4 2009 1:16 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
Is there a good way we could integrate this into Happstack as an
alternative way to serialize IxSets?


 
From: Don Stewart <d...@galois.com>
Date: Tue, 4 Aug 2009 10:33:45 -0700
Local: Tues, Aug 4 2009 1:33 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
mightybyte:

I'm happy to change the implementation of [a] serialization (as long as
we prefer on-disk format), or to add a chunked list newtype, since that
seems to be useful too.

-- Don


 
From: Gregory Collins <g...@gregorycollins.net>
Date: Tue, 04 Aug 2009 15:22:43 -0400
Local: Tues, Aug 4 2009 3:22 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

Don Stewart <d...@galois.com> writes:
> I'm happy to change the implementation of [a] serialization (as long
> as we prefer on-disk format), or to add a chunked list newtype, since
> that seems to be useful too.

+1 for the newtype, changing the default list encoding will cause
headaches for people who use binary.

G.
--
Gregory Collins <g...@gregorycollins.net>


 
From: MightyByte <mightyb...@gmail.com>
Date: Tue, 4 Aug 2009 16:15:34 -0400
Local: Tues, Aug 4 2009 4:15 pm
Subject: Re: Serialization code seems to require excessive amounts of memory

On Tue, Aug 4, 2009 at 3:22 PM, Gregory Collins<g...@gregorycollins.net> wrote:

> Don Stewart <d...@galois.com> writes:

>> I'm happy to change the implementation of [a] serialization (as long
>> as we prefer on-disk format), or to add a chunked list newtype, since
>> that seems to be useful too.

> +1 for the newtype, changing the default list encoding will cause
> headaches for people who use binary.

Yeah, I agree.  Of course the next question is whether we should
change IxSet to use the chunked list or create something like
Happstack.Data.IxSet.Chunked.

 
From: "S. Alexander Jacobson" <a...@alexjacobson.com>
Date: Fri, 07 Aug 2009 15:34:04 -0400
Local: Fri, Aug 7 2009 3:34 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
Can you clarify the issue here?  Do state types need special
customization to be serialized correctly within HAppS, as distinct from
other contexts?

-Alex-

On 8/4/09 4:15 PM, MightyByte wrote:


 
From: MightyByte <mightyb...@gmail.com>
Date: Fri, 7 Aug 2009 16:01:56 -0400
Local: Fri, Aug 7 2009 4:01 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
The issue is that Happstack's memory usage seems a little high, and it
seems that the serialization may be partly responsible for that.  I'm
currently looking for every possible way I can reduce my memory
footprint, because it is what is driving up my hosting costs the most.
 I'm not coming close to my bandwidth, CPU, and disk usage limits.  So
I would argue that Happstack needs a list serialization that runs in
constant space.

On Fri, Aug 7, 2009 at 3:34 PM, S. Alexander


 
From: "S. Alexander Jacobson" <a...@alexjacobson.com>
Date: Fri, 07 Aug 2009 19:50:47 -0400
Local: Fri, Aug 7 2009 7:50 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
We can just shift back to serializing using Read/Show.  It is a very
easy change of code.
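
For reference, a Read/Show round trip for the list-of-Int test case from this thread could look like the sketch below. showState and readState are made-up names, and this says nothing about how happstack would actually wire such a change in.

```haskell
-- Serialize with the derived/standard Show instance.
showState :: [Int] -> String
showState = show

-- Deserialize with Read. Note that 'read' calls error on malformed
-- input, so real code would probably want 'reads' or similar.
readState :: String -> [Int]
readState = read
```

The text form of [1,2,3] is the 7-character string "[1,2,3]", so for small numbers it is not necessarily larger than the binary form, but the files are in a different format from existing Binary checkpoints.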

On 8/7/09 4:01 PM, MightyByte wrote:


 
From: MightyByte <mightyb...@gmail.com>
Date: Fri, 7 Aug 2009 20:14:26 -0400
Local: Fri, Aug 7 2009 8:14 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
That sounds fine, but we need to make sure we don't break existing  
state.

On Aug 7, 2009, at 7:50 PM, "S. Alexander Jacobson" <a...@alexjacobson.com


 
From: stepcut <jer...@n-heptane.com>
Date: Sat, 8 Aug 2009 10:32:41 -0700 (PDT)
Local: Sat, Aug 8 2009 1:32 pm
Subject: Re: Serialization code seems to require excessive amounts of memory
In my experience, the tricky part so far has been figuring out why
memory is being used, and if it is justified.

If you use memory profiling to examine the memory consumption when
restoring from a checkpoint file, you find out that all the memory is
being allocated by various instances of getCopy. Which is not
surprising, but also does not tell you anything useful.

So, the first thing to do is to figure out how to get a good picture of
where all the memory is going. In my case, it may all be due to the
fact that my state has a lot of Strings in it. These 'compress' really
well when serialized (because they are converted to a utf-8
bytestring). So that could explain the big difference between the
on-disk checkpoint file size and the amount of memory used after it is
loaded.  If that is the case, then the method used for
serialization/deserialization is not going to change anything -- I need
to change my types instead, to use a more efficient representation in
memory.
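
The String effect can be put into rough numbers. The sketch below measures the encoded size of a String with Data.Binary and compares it with a lower bound for its heap size, assuming a 64-bit GHC where every (:) cell takes 3 words (24 bytes); the Char boxes may add more, or be shared for common characters, so this is a floor rather than a measurement. Function names are made up for illustration.

```haskell
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Encoded size: an 8-byte length plus the UTF-8 bytes of the text.
encodedStringSize :: String -> Int64
encodedStringSize = BL.length . encode

-- Assumed lower bound for an evaluated String on 64-bit GHC:
-- one 3-word cons cell (24 bytes) per character.
stringHeapLowerBound :: String -> Int64
stringHeapLowerBound s = 24 * fromIntegral (length s)
```

For ASCII text the encoded form approaches 1 byte per character while the in-memory String needs at least 24, so a state dominated by Strings can plausibly be 20x or more larger in RAM than on disk without any problem in the serialization code.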

Not sure how to profile this type of thing though.

- jeremy


 