Haskell Snap memory usage for large HTTP response bodies

Simon Bourne

Aug 9, 2015, 3:14:59 PM
to Snap Framework
Hey, I've been having memory usage issues getting Snap to produce very large HTTP response bodies. I've included the offending code at the end of this message. In short, test1, which delivers the data to a client, consumes a lot of memory, while test2, which prints the same data to stdout, runs in a small, constant amount of memory. More details are in the Stack Overflow question I posted:

http://stackoverflow.com/questions/31901811/haskell-snap-memory-usage-for-large-http-response-bodies

Any help is very much appreciated!

Simon

The offending code:

import Snap.Core (Snap, writeLBS, readRequestBody)
import Snap.Http.Server (quickHttpServe)
import Control.Monad.IO.Class (MonadIO(liftIO))
import qualified Data.ByteString.Lazy.Char8 as LBS (ByteString, length, replicate)

main :: IO ()
main = quickHttpServe $ site test1 where
    test1, test2 :: LBS.ByteString -> Snap ()

    -- Send ss to client
    test1 = writeLBS

    -- Print ss to stdout upon receiving request
    test2 = liftIO . print

    site write = do
        body <- readRequestBody 1000
        -- Making ss dependent on the request stops GHC from keeping a
        -- reference to ss, as pointed out by Reid Barton.
        let bodyLength = fromIntegral $ LBS.length body
        write $ ss bodyLength

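    -- ss is at least 10^12 'S' characters (~1 TB) as a lazy ByteString.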
    ss c = LBS.replicate (1000000000000 * (c + 1)) 'S'

Simon Bourne

Aug 10, 2015, 6:49:29 AM
to Snap Framework
A couple of extra bits of info:

The memory usage seems to jump up in powers of 2 rather than increasing steadily, e.g. 0.5 GB, 1 GB, 2 GB, etc.

The problem still happens with GHC 7.10.2 and snap 0.14.0.6.

Leon Smith

Aug 10, 2015, 8:42:22 AM
to snap_fr...@googlegroups.com
Quick question: do you get any data back from your request? If you do, my guess is that Snap is erroneously retaining a pointer to the start of the lazy bytestring somewhere, which would seem like a bug.

If you don't, my guess is that Snap is trying to avoid chunked encoding and/or trying to set the Content-Length header, and that it's probably not really a bug.

What are you trying to do here? If you are trying to serve up the contents of a file, then I'd strongly suggest using sendFile instead. If you are doing something a bit more dynamic, then perhaps modifyResponseBody would be appropriate. (And if you want to go the modifyResponseBody route, I would heartily recommend using Snap 1.0, even though it's not quite released.)
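For illustration, a minimal sketch of what the modifyResponseBody route can look like against the io-streams API in the 1.0 branch. The setResponseBody function, the OutputStream Builder body type, and the module layout are recalled from that branch, so treat the specifics as assumptions:

{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad                 (replicateM_)
import           Data.ByteString.Builder       (Builder, byteString)
import           Data.ByteString.Builder.Extra (flush)
import           Data.Monoid                   ((<>))
import           Snap.Core                     (Snap, modifyResponse, setResponseBody)
import           Snap.Http.Server              (quickHttpServe)
import           System.IO.Streams             (OutputStream)
import qualified System.IO.Streams             as Streams

main :: IO ()
main = quickHttpServe streamHandler

-- Stream a large body chunk by chunk. Each Builder chunk can be
-- garbage collected once it has been written to the socket, so heap
-- usage stays constant no matter how big the total response is.
streamHandler :: Snap ()
streamHandler = modifyResponse $ setResponseBody writer
  where
    writer :: OutputStream Builder -> IO (OutputStream Builder)
    writer out = do
        replicateM_ 1000000 $
            Streams.write (Just (byteString "SSSSSSSS" <> flush)) out
        return out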

Best,
Leon



Simon Bourne

Aug 10, 2015, 11:06:53 AM
to Snap Framework
Hi Leon, thanks for the info, I'll try modifyResponseBody. Where did you get Snap 1.0 from? The GitHub releases look to be one behind Hackage, which is on 0.14.0.6.

In answer to your other questions: yes, I get all of the data back eventually. It's the best part of a terabyte in total, and the virtual and resident sizes of the process remain at ~16 GB.

Snap is definitely using chunked encoding; I've checked the raw output from curl. Once the request completed, the resident and virtual sizes stayed around 15-16 GB (I left it for a few minutes). When I fired another request at it, it stayed at 15-16 GB the entire time and successfully delivered the whole terabyte of data again, i.e. the memory grew to ~15 GB during the first request and remained there throughout the second. It did go down briefly when I fired two concurrent requests at it, but they never completed (I killed the process due to excessive swapping).

The ultimate aim is to deliver potentially large (although not quite 1 TB!) query results in XMLA (XML for Analysis) format. This is just a cut-down version to illustrate the problem.

Cheers,

Simon

Leon Smith

Aug 10, 2015, 11:32:04 AM
to snap_fr...@googlegroups.com
On Mon, Aug 10, 2015 at 11:06 AM, Simon Bourne <simon...@gmail.com> wrote:

> Hi Leon, thanks for the info, I'll try modifyResponseBody. Where did you get Snap 1.0 from? The GitHub releases look to be one behind Hackage, which is on 0.14.0.6.

1.0 is available in a branch in each relevant repo on GitHub. You'll have to check out the repo, as 1.0 hasn't been tagged yet.

> In answer to your other questions: yes, I get all of the data back eventually. It's the best part of a terabyte in total, and the virtual and resident sizes of the process remain at ~16 GB.
>
> Snap is definitely using chunked encoding; I've checked the raw output from curl. Once the request completed, the resident and virtual sizes stayed around 15-16 GB (I left it for a few minutes). When I fired another request at it, it stayed at 15-16 GB the entire time and successfully delivered the whole terabyte of data again, i.e. the memory grew to ~15 GB during the first request and remained there throughout the second. It did go down briefly when I fired two concurrent requests at it, but they never completed (I killed the process due to excessive swapping).

Yeah, that would seem to be a bug, then.

> The ultimate aim is to deliver potentially large (although not quite 1 TB!) query results in XMLA (XML for Analysis) format. This is just a cut-down version to illustrate the problem.

Right now I'm working on some stuff involving HTML5 server-sent events, so I have to use modifyResponseBody. Otherwise nothing (not even the headers) will be sent until the entire event stream has been computed, which in most cases would be "never". And the new io-streams based interface is a lot saner, especially when it comes to proper resource handling in the presence of exceptions.
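As a hypothetical illustration of that pattern (assuming the same 1.0 names as the earlier sketch, plus setContentType from Snap.Core), each event is written and flushed as it is produced, so the client sees it immediately rather than when the handler returns:

{-# LANGUAGE OverloadedStrings #-}

import           Control.Concurrent            (threadDelay)
import           Control.Monad                 (forM_)
import           Data.ByteString.Builder       (Builder, byteString, intDec)
import           Data.ByteString.Builder.Extra (flush)
import           Data.Monoid                   ((<>))
import           Snap.Core                     (Snap, modifyResponse, setContentType, setResponseBody)
import           System.IO.Streams             (OutputStream)
import qualified System.IO.Streams             as Streams

sseHandler :: Snap ()
sseHandler = do
    modifyResponse $ setContentType "text/event-stream"
    modifyResponse $ setResponseBody events
  where
    -- Emit ten SSE-framed events, one per second, flushing after each
    -- so the bytes leave the server as soon as they are produced.
    events :: OutputStream Builder -> IO (OutputStream Builder)
    events out = do
        forM_ [1 :: Int .. 10] $ \i -> do
            let event = byteString "data: tick " <> intDec i <> byteString "\n\n"
            Streams.write (Just (event <> flush)) out
            threadDelay 1000000
        return out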

Best,
Leon

Gregory Collins

Aug 11, 2015, 1:57:56 PM
to snap_fr...@googlegroups.com
Hi,

Unfortunately I don't have time to dig too deeply into this issue right now, but I will say:
  • Firstly, if "writeLBS" is too strict, then that's a legitimate bug and we should fix it.
  • In general, the reason snap < 1.0 uses enumerator and snap >= 1.0 uses io-streams is that reliably streaming with lazy I/O is very difficult to get right, for precisely this reason: something non-local to your computation can all too easily retain more of the spine of the data structure than desired, leading to a huge memory blow-up (see the sketch after this list). Use the streaming abstraction we provide for you and you will not encounter this issue.
  • Point #2 means that fixing this is a pretty low priority for us. Patient: "Doctor, it hurts when I do X." Doctor: "Don't do X."
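To make the retention point concrete, a minimal sketch (plain GHC, no Snap involved): a second use of a lazy ByteString while the first is being consumed forces the whole spine, and every chunk, to stay live.

import qualified Data.ByteString.Lazy.Char8 as LBS

main :: IO ()
main = do
    let big = LBS.replicate 1000000000 'S'  -- ~1 GB, produced lazily
    -- On its own, this would run in constant space: each chunk is
    -- counted and then collected...
    print (LBS.length big)
    -- ...but this later use of big means every chunk already produced
    -- has to be retained until the length above has been computed.
    print (LBS.head big)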
G

--
Gregory Collins <gr...@gregorycollins.net>

Simon Bourne

Aug 20, 2015, 5:34:16 AM
to snap_fr...@googlegroups.com
No problems, I'll try it with 1.0 when I get a chance.

Thanks,

Simon
