Leftovers

Tom Ellis

Mar 4, 2016, 1:44:58 PM
to haskel...@googlegroups.com
I never really grasped what leftovers are. I don't understand why it
wouldn't suffice to have a "pushback pipe"

pushback :: Proxy a b (Either a b) b m r

that allows you to push "unused" 'b's back into it, to be stored in a queue.
The next 'b's then extracted from the pushback pipe will be the ones most
recently pushed in. If the queue is empty then we request a 'b' from the
other end.
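
A rough way to picture the buffer described above is the following minimal pure-Haskell sketch. It is not a real `Proxy`: the names `Pushback`, `unread`, `next` and `drain` are made up for illustration and are not part of pipes, and a plain list stands in for "the other end".

```haskell
import Data.List (unfoldr)

-- Hypothetical pure model of a pushback buffer (not part of the pipes API).
data Pushback b = Pushback
    { pushedBack :: [b]  -- returned elements, most recently pushed first
    , upstream   :: [b]  -- stands in for "the other end"
    }

-- Return an unused element to the buffer.
unread :: b -> Pushback b -> Pushback b
unread b pb = pb { pushedBack = b : pushedBack pb }

-- Take the next element: pushed-back elements first, then upstream.
next :: Pushback b -> Maybe (b, Pushback b)
next (Pushback (b:bs) us) = Just (b, Pushback bs us)
next (Pushback [] (u:us)) = Just (u, Pushback [] us)
next (Pushback [] [])     = Nothing

-- Extract everything that is left, in the order `next` would deliver it.
drain :: Pushback b -> [b]
drain = unfoldr next

main :: IO ()
main =
    case next (Pushback [] "abc") of
        Nothing      -> putStrLn "empty"
        -- Draw one element, push it back, then drain the whole stream.
        Just (x, s1) -> putStrLn (drain (unread x s1))
```

Drawing 'a' and pushing it straight back leaves the stream unchanged, so draining afterwards gives "abc" again.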

Does this make no sense? Are leftovers much more subtle than I am
realising?

Thanks,

Tom

Gabriel Gonzalez

Mar 5, 2016, 11:41:33 AM
to haskel...@googlegroups.com, tom-lists-hask...@jaguarpaw.co.uk
Yeah, leftovers handling is a little bit more subtle.  First, read this post if you haven’t already, particularly the section titled “Leftovers”:

http://www.haskellforall.com/2014/02/pipes-parse-30-lens-based-parsing.html

I can also add a more sophisticated example in addition to the one given in the above post.  Suppose that you have a pipe named `encoder` that encodes `Text` into `ByteString` and has this rough type:

    encoder :: Pipe Text ByteString ()

… which you can freely modify to add in any leftover functionality you want.

Now suppose we hook that up in this configuration:

    consumesBytes :: Pipe ByteString Out ()
    consumesText :: Pipe Text Out ()

    example :: Pipe Text Out ()
    example = do
        encoder >-> consumesBytes
        consumesText

Now think about what should happen if `consumesBytes` terminates while `encoder` is holding onto a non-empty queue of leftovers that haven’t been used up by `consumesBytes`.

Well, first off, what type of leftovers would `encoder` be holding onto?  In this case the type of leftovers will be `ByteString`s that were returned to `encoder` by the `consumesBytes` `Pipe`.  There are two possible things that `encoder` could do with those leftovers upon termination:

* (A) Discard the leftovers.  However, that means that `consumesText` will begin at the wrong position in the stream
* (B) Transform the `ByteString` leftovers into `Text` leftovers and push those further upstream before terminating

Option (B) sounds reasonable at first except that there might not be a way to transform the `ByteString` leftovers into `Text` that can be pushed further upstream, for a couple of reasons:

* The encoding might not necessarily round-trip
* Even if the encoding *did* round-trip (like UTF-8), nothing requires `consumesBytes` to consume bytes only along Unicode character boundaries

To elaborate on the latter case, assume that `encoder` received a text chunk containing a single character: "⌘".  If you UTF-8-encode that you get three bytes: "e2 8c 98".  If `consumesBytes` only consumes the first byte (i.e. "e2") then `encoder` is now holding onto two bytes in its leftovers queue, "8c 98", and there’s no longer a way to push those two bytes further upstream as `Text`, since on their own they are not valid UTF-8 and cannot be (correctly) decoded back into `Text`.  Now, if `consumesBytes` terminates there is no legitimate way for `consumesText` to begin where `consumesBytes` left off.
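
The byte-level claim above can be checked directly. This is a small sketch assuming the `text` and `bytestring` packages, with no pipes involved; `decodeStrict` is a hypothetical helper introduced only for this illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- Hypothetical helper: strict UTF-8 decoding, reporting failure as Nothing.
decodeStrict :: B.ByteString -> Maybe T.Text
decodeStrict = either (const Nothing) Just . decodeUtf8'

main :: IO ()
main = do
    let text     = T.singleton '\x2318'  -- the character "⌘"
        bytes    = encodeUtf8 text       -- e2 8c 98
        leftover = B.drop 1 bytes        -- 8c 98, once the first byte is consumed
    print (B.unpack bytes)                   -- the encoded bytes, shown in decimal
    print (decodeStrict bytes == Just text)  -- the full encoding round-trips
    print (decodeStrict leftover)            -- the two leftover bytes do not decode
```

The last line should print `Nothing`: the two remaining bytes are continuation bytes with no lead byte, so there is no `Text` value that `encoder` could hand back upstream for them.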
