a question on iterIO

Eugene Perederey

Nov 20, 2011, 5:54:24 AM
to Stanford CS240h 2011 Autumn
I want to implement my client-server interaction in an iterIO manner.
I also use Attoparsec to parse the data stream.

procRequests :: Socket -> IO ()
procRequests sock = do
  (conn, addr) <- accept sock
  (sink, src) <- iterStream conn
  -- client-server communication
  src |$ parseAndUpdateState .| sink
  procRequests sock

parseAndUpdateState :: Inum B.ByteString B.ByteString IO a
parseAndUpdateState = mkInum $ do
  dd <- data0MaxI 1000
  case parseRequest dd of
    Right req -> do
      reply <- (liftIO $ updateEntry req) -- retrieve key,value from storage here
      liftIO $ print $ serialize reply
      return $ serialize reply  -- // THIS BLOCKS
      -- return dd              -- // THIS WORKS
    Left _ -> return dd         -- should never happen

Basically, I take a chunk of ByteString, use it as a key into
System.IO.Storage, retrieve the corresponding value, and return it from
the Inum. For some reason this doesn't work; it seems like the data
doesn't go to the sink.
If I instead return the same value dd that I got from the source,
everything is fine, even though I still print the intermediate result
for debugging.

Prior to switching to System.IO.Storage, I tried IORef and MVar;
those caused stack overflows. With System.IO.Storage it seems to be
waiting for input somewhere, but I can't figure out where.
So, is it generally acceptable to iterate using intermediate IO
as a source of intermediate keys?

dm-list-clas...@scs.stanford.edu

Nov 20, 2011, 2:47:08 PM
to stanford-1...@googlegroups.com
At Sun, 20 Nov 2011 02:54:24 -0800 (PST),
Eugene Perederey wrote:
>
> I want to implement my client-server interaction in an iterIO manner.
> I also use Attoparsec to parse the data stream.

Hi, Eugene. I don't have any experience with System.IO.Storage.
However, there's some chance you are just compounding the problems if
you never understood why you were blowing out your stack last time.

Is it possible for you to devise a small self-contained example that
blows out the stack? Otherwise, it's hard to figure out what's going
on from only the excerpts of code. I'll also be around Monday and
Tuesday if you want to stop by my office and have a look.

As for the code you included:

> procRequests :: Socket -> IO ()
> procRequests sock = do
>   (conn, addr) <- accept sock
>   (sink, src) <- iterStream conn
>   -- client-server communication
>   src |$ parseAndUpdateState .| sink
>   procRequests sock
>
> parseAndUpdateState :: Inum B.ByteString B.ByteString IO a
> parseAndUpdateState = mkInum $ do
>   dd <- data0MaxI 1000
>   case parseRequest dd of
>     Right req -> do
>       reply <- (liftIO $ updateEntry req) -- retrieve key,value from storage here
>       liftIO $ print $ serialize reply
>       return $ serialize reply  -- // THIS BLOCKS
>       -- return dd              -- // THIS WORKS
>     Left _ -> return dd         -- should never happen

Several things could be happening. It could be that the reply is
large and your client is not reading the reply, and thus the TCP
connection is getting flow-controlled. It could also be that you are
transmitting the reply and the client is not transmitting another
request.

One problem is that, as written, your code may have no way of exiting.
The 0 in 'data0MaxI' means that it is okay to return 0 bytes of data.
Thus, once the client closes the connection, your server will just
spin forever unless somehow updateEntry knows how to return something
that gets serialized to 0 bytes. You could use 'dataMaxI' instead of
'data0MaxI' to throw an exception at end-of-file, which would cause
things to get cleaned up properly.
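The same end-of-file trap exists outside iterIO. Below is a minimal sketch in plain Haskell (base and bytestring only, not iterIO; `serveChunks` and `fakeReader` are hypothetical names). The read action returns an empty ByteString at end-of-file, the way `Data.ByteString.hGetSome` does, and the explicit `B.null` check is the only thing that lets the loop exit.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.IORef

-- Read chunks until the read action signals end-of-file with an empty
-- ByteString (as B.hGetSome does), passing each chunk to onChunk.
serveChunks :: IO B.ByteString -> (B.ByteString -> IO ()) -> IO ()
serveChunks readChunk onChunk = loop
  where
    loop = do
      chunk <- readChunk
      if B.null chunk
        then return ()              -- EOF: stop instead of spinning
        else onChunk chunk >> loop

-- A stand-in for a socket: yields the given chunks, then empty strings
-- forever, the way a closed connection keeps reporting end-of-file.
fakeReader :: [B.ByteString] -> IO (IO B.ByteString)
fakeReader chunks = do
  ref <- newIORef chunks
  return $ do
    cs <- readIORef ref
    case cs of
      []         -> return B.empty
      (c : rest) -> writeIORef ref rest >> return c
```

With a real connection handle `h`, the read action could be `B.hGetSome h 1000`; after the peer closes, every call returns an empty string, so a loop without the check never terminates.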

Another issue is that, if you have pipelined, back-to-back requests,
it looks like you aren't handling the residual data properly. Suppose
parseRequest consumes only 500 bytes of data; what do you do with the
rest? Or what if parseRequest needs 600 bytes of data and you happen
to have only 500? You'll throw away the 500 and get stuck next time,
reading from the middle of a request.
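A minimal sketch of what handling residual data involves, written without iterIO or attoparsec and assuming a hypothetical newline-terminated request format. `splitRequests` (a hypothetical helper) prepends the leftover bytes from the previous read to the new chunk, returns every complete request, and hands back the incomplete tail instead of discarding it. This is the kind of bookkeeping any residual-aware parser driver has to do.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (foldl')

-- Prepend the bytes left over from the previous read to the new chunk,
-- then split off every complete newline-terminated request. The trailing
-- partial request (possibly empty) is returned so the caller can keep it.
splitRequests :: B.ByteString -> B.ByteString -> ([B.ByteString], B.ByteString)
splitRequests leftover chunk = go (B.append leftover chunk) []
  where
    go buf acc =
      case B.elemIndex '\n' buf of
        Nothing -> (reverse acc, buf)           -- incomplete: keep for later
        Just i ->
          let (req, rest) = B.splitAt i buf     -- req excludes the '\n'
          in go (B.drop 1 rest) (req : acc)     -- drop the '\n' itself

-- Run a sequence of reads through splitRequests, carrying leftovers along.
requestsFromChunks :: [B.ByteString] -> ([B.ByteString], B.ByteString)
requestsFromChunks = foldl' step ([], B.empty)
  where
    step (done, left) c =
      let (reqs, left') = splitRequests left c
      in (done ++ reqs, left')
```

For example, the reads `"GET a\nGE"`, `"T b\nPU"` and `"T"` yield the requests `"GET a"` and `"GET b"`, with `"PUT"` still buffered waiting for its newline.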

If you want to invoke an attoparsec parser on the input within the
Iter monad, you might try the atto and tryAtto functions from
Data.IterIO.Atto, which should handle issues like residual input data
transparently.

David

Bryan O'Sullivan

Nov 21, 2011, 12:10:26 AM
to stanford-1...@googlegroups.com
On Sun, Nov 20, 2011 at 2:54 AM, Eugene Perederey <eugene.p...@gmail.com> wrote:

> Basically, I take a chunk of ByteString, use it as a key into
> System.IO.Storage, retrieve the corresponding value, and return it from
> the Inum. For some reason this doesn't work; it seems like the data
> doesn't go to the sink.

Well, from reading the documentation of the io-storage package, you shouldn't be using it. The package's own description doesn't make any sense, which is a strong hint that it was written by someone who didn't know what they were doing, and which doesn't give me much confidence that it does anything sensible.

> Prior to switching to System.IO.Storage, I tried IORef and MVar;
> those caused stack overflows.

Wait - what are you using any of these for in the first place? I have no context to understand this chunk of code.

But even without knowing why you're using these: in general, it's really common for newcomers to get bitten by the fact that IORef and MVar and many other Haskell containers are lazy.

In other words, this expression:

writeIORef foo (1 + 2)

puts the thunk 1+2 in the IORef, not the number 3. Here's a safer variant:

writeIORef foo $! 1 + 2

The use of $! forces 1+2 to be evaluated to WHNF before it's passed to "writeIORef foo".
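One caveat: WHNF means only the outermost constructor is evaluated. A minimal sketch using only base (`storePair` and `readSecond` are hypothetical names): writing a pair with `$!` evaluates the `(,)` constructor but leaves both components as thunks, which is why `(undefined, 1 + 2)` can be stored without an error. Nested or accumulating state needs strict fields or deeper forcing, for example with the deepseq package.

```haskell
import Data.IORef

-- ($!) evaluates its argument only to weak head normal form (WHNF). For a
-- pair, that is just the (,) constructor; both components stay unevaluated,
-- so storing an undefined component here does not crash.
storePair :: IORef (Int, Int) -> IO ()
storePair ref = writeIORef ref $! (undefined, 1 + 2)

-- Pattern-matching the pair and forcing only the second component never
-- touches the undefined first one.
readSecond :: IORef (Int, Int) -> IO Int
readSecond ref = do
  (_, b) <- readIORef ref
  return $! b
```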

Functions like modifyIORef and modifyMVar have the same behaviour. These often result in stack overflows if you don't know to look out for them.
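A minimal sketch of the difference, using only base (the `count*` functions are hypothetical). `modifyIORef` stores a growing chain of `(+ 1)` thunks that is forced only when the value is finally read; `modifyIORef'`, a strict variant added to Data.IORef in base 4.6 (after this thread), forces the new value on each update. For an MVar, returning `$! x + 1` from the `modifyMVar_` callback has the same effect. Whether the lazy version actually overflows depends on n and on the stack limit of the GHC in use.

```haskell
import Control.Concurrent.MVar
import Control.Monad (forM_)
import Data.IORef

-- Lazy: each modifyIORef stacks another (+ 1) thunk on the stored value;
-- for large n, forcing the final chain can overflow a small stack.
countLazy :: Int -> IO Int
countLazy n = do
  ref <- newIORef 0
  forM_ [1 .. n] $ \_ -> modifyIORef ref (+ 1)
  readIORef ref

-- Strict: modifyIORef' evaluates the new value to WHNF on every step,
-- so the IORef always holds a plain Int.
countStrict :: Int -> IO Int
countStrict n = do
  ref <- newIORef 0
  forM_ [1 .. n] $ \_ -> modifyIORef' ref (+ 1)
  readIORef ref

-- Same idea for an MVar: force the result inside the update function.
countMVar :: Int -> IO Int
countMVar n = do
  mv <- newMVar 0
  forM_ [1 .. n] $ \_ -> modifyMVar_ mv (\x -> return $! x + 1)
  readMVar mv
```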

Eugene Perederey

Nov 21, 2011, 4:10:49 AM
to Stanford CS240h 2011 Autumn
> writeIORef foo $! 1 + 2

Yes, now I know this too. I need some mutable storage to keep the
state of my server, which the client can change with its requests.
Actually, I think there is no problem with the storage itself, as
long as I operate in constant space.
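A minimal sketch of such server state built on a plain MVar, in line with the strictness advice above, assuming keys and values are strict ByteStrings. `Store`, `newStore`, `putEntry` and `getEntry` are hypothetical names standing in for the unseen `updateEntry`; `Data.Map.Strict` comes from the containers package.

```haskell
import Control.Concurrent.MVar
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M

-- All server state behind one MVar, so concurrent connection handlers
-- see consistent updates. Hypothetical design for illustration.
type Store = MVar (M.Map B.ByteString B.ByteString)

newStore :: IO Store
newStore = newMVar M.empty

-- Insert or overwrite a key. Data.Map.Strict.insert forces the value, and
-- ($!) forces the new map, so no chain of pending inserts accumulates.
putEntry :: Store -> B.ByteString -> B.ByteString -> IO ()
putEntry st k v = modifyMVar_ st (\m -> return $! M.insert k v m)

-- Look up a key; Nothing if it was never stored.
getEntry :: Store -> B.ByteString -> IO (Maybe B.ByteString)
getEntry st k = M.lookup k <$> readMVar st
```

Holding the whole map in one MVar serializes updates, which keeps concurrent handlers consistent at the cost of some contention.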

My problem with lazy IO is that I need to combine it properly with
parsing, and now I'm trying to use Data.IterIO.Atto for that purpose.
I want to get data from the client, reply, and close the connection.
The data flow goes like this:

source |$ byteStringToRequest .| requestToReply .| replyToByteString .| sink

where the request comes from the client, is parsed, converted to a
reply, and written back to the socket.

The problem is that with binary data I need to carry the size of the
request data chunk as part of the request itself, as in

data Request = Request Int ByteString

So when the parser has consumed the proper number of bytes I'd feed it
an empty string to terminate parsing.
I couldn't figure out how to do that with Data.IterIO.Atto: it only lets
me get the parsed request, not the result in the form Result a.

Also, I don't see any other way to implement a two-way TCP connection.
Any better ideas?


dm-list-clas...@scs.stanford.edu

Nov 21, 2011, 12:00:58 PM
to stanford-1...@googlegroups.com
At Mon, 21 Nov 2011 01:10:49 -0800 (PST),
Eugene Perederey wrote:
>
> So when the parser has consumed the proper number of bytes I'd feed it
> an empty string to terminate parsing.

Why don't you have the parser just return when it's done?

I don't know what your protocol looks like, but let's say it's a hex
number representing the length, followed by a newline, followed by
that many bytes (kind of like an HTTP chunk encoding of a single
chunk). Then you could do something like:

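import Data.Attoparsec.ByteString.Char8 (Parser, endOfLine, hexadecimal)
import qualified Data.Attoparsec.ByteString.Char8 as A

parseRequest :: Parser Request
parseRequest = do
  n <- hexadecimal   -- payload length, written in hex
  endOfLine
  s <- A.take n      -- exactly that many bytes (qualified to avoid Prelude.take)
  return $ Request n s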
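For the sending side of the same hypothetical protocol, a matching encoder is short. A minimal sketch using only base and bytestring (`encodeRequest` is a hypothetical name); it writes the length in lowercase hex, which attoparsec's `hexadecimal` accepts.

```haskell
import qualified Data.ByteString.Char8 as B
import Numeric (showHex)

-- Frame a payload as: its length in hexadecimal, a newline, then the raw
-- bytes. This is the framing parseRequest expects.
encodeRequest :: B.ByteString -> B.ByteString
encodeRequest payload =
  B.concat [B.pack (showHex (B.length payload) ""), B.pack "\n", payload]
```

For example, `encodeRequest (B.pack "hello")` produces `"5\nhello"`, and a 26-byte payload gets the header `"1a\n"`.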

But the point is that TCP does not have record boundaries, so your
protocol itself should have a notion of where the message boundaries
lie. And so the parser itself, which is interpreting the protocol,
should be able to figure out where the message boundaries are and take
just that many bytes.
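To illustrate that the boundary comes from the protocol rather than from how TCP happens to split the stream, here is the same hex-length framing decoded by hand, as a minimal sketch without attoparsec (`Decoded` and `decodeFrame` are hypothetical names). It reports `NeedMore` when the buffer holds only part of a message; otherwise it returns the payload plus the bytes after it, which belong to the next pipelined request. Unlike attoparsec's `endOfLine`, it accepts only `\n`, not `\r\n`.

```haskell
import qualified Data.ByteString.Char8 as B
import Numeric (readHex)

-- Result of trying to decode one "<hex length>\n<payload>" frame.
data Decoded
  = NeedMore                          -- buffer holds only part of a frame
  | Malformed                         -- header is not a hex number
  | Frame B.ByteString B.ByteString   -- payload, then the remaining bytes
  deriving (Eq, Show)

decodeFrame :: B.ByteString -> Decoded
decodeFrame buf =
  case B.elemIndex '\n' buf of
    Nothing -> NeedMore                        -- header line incomplete
    Just i ->
      let (hdr, afterHdr) = B.splitAt i buf
          body = B.drop 1 afterHdr             -- skip the '\n'
      in case readHex (B.unpack hdr) of
           [(n, "")]
             | B.length body < n -> NeedMore   -- payload still arriving
             | otherwise ->
                 let (payload, rest) = B.splitAt n body
                 in Frame payload rest         -- rest: next pipelined request
           _ -> Malformed
```

For example, `"5\nhel"` gives `NeedMore`, while `"5\nhello3\nabc"` gives the payload `"hello"` and leaves `"3\nabc"` for the next call.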

It seems like you are trying to have two levels of parser, one that
determines the message boundaries, and one that parses within the
message boundaries. For your case, that is probably needlessly
complex. It is occasionally useful to do something like that, but
mostly where you don't want to keep the whole message in memory. For
example, the 'foldForm' function in Data.IterIO.HTTP runs an Iter on
each input field (followed by end-of-file), so that you can pipe
uploaded files to the file system without having to keep the whole
thing in memory. Since Attoparsec keeps the whole thing in memory
anyway, there's no reason to have a second level of parser.

David

Eugene Perederey

Nov 21, 2011, 3:30:58 PM
to Stanford CS240h 2011 Autumn
> It seems like you are trying to have two levels of parser

Exactly! That's what was bothering me. I do need to modify the parser
so that it knows how much to take.
I have no more questions so far, thank you.
