Yeah, leftovers handling is a little bit more subtle. First, read this post if you haven’t already, particularly the section titled “Leftovers”:
http://www.haskellforall.com/2014/02/pipes-parse-30-lens-based-parsing.html
I can also add a more sophisticated example in addition to the one given in the above post. Suppose that you have a pipe that encodes `Text` into `ByteString` named `encoder` that has this rough type:
encoder :: Pipe Text ByteString ()
… which you can freely modify to add in any leftover functionality you want.
Now suppose we hook that up in this configuration:
consumesBytes :: Pipe ByteString Out ()
consumesText :: Pipe Text Out ()
example :: Pipe Text Out ()
example = do
    encoder >-> consumesBytes
    consumesText
Now think about what should happen if `consumesBytes` terminates while `encoder` is still holding onto a non-empty queue of leftovers that `consumesBytes` never used up.
Well, first off, what type of leftovers would `encoder` be holding onto? In this case the type of leftovers will be `ByteString`s that were returned to `encoder` by the `consumesBytes` `Pipe`. There are two possible things that `encoder` could do with those leftovers upon termination:
* (A) Discard the leftovers. However, that means that `consumesText` will begin at the wrong position in the stream
* (B) Transform the `ByteString` leftovers into `Text` leftovers and push those further upstream before terminating
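To make option (A) concrete, here is a minimal sketch of the configuration above, assuming the `pipes`, `text`, and `bytestring` packages. The bodies of `encoder`, `consumesBytes`, and `consumesText` are hypothetical stand-ins, `Out` is replaced by `String` so the results can be printed, and the types gain the monad parameter that the rough types above leave out. This `encoder` has no leftover support at all, so any bytes that `consumesBytes` does not use are simply dropped:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8)
import Pipes
import qualified Pipes.Prelude as P

-- Assumption: the simplest possible encoder, with no leftover handling.
encoder :: Monad m => Pipe Text ByteString m r
encoder = P.map encodeUtf8

-- Hypothetical consumer: looks at only the first byte of the first chunk,
-- then terminates. The rest of the chunk is discarded (option A).
consumesBytes :: Monad m => Pipe ByteString String m ()
consumesBytes = do
    bytes <- await
    yield ("bytes: " ++ show (B.unpack (B.take 1 bytes)))

-- Hypothetical consumer: reports the next text chunk it receives.
consumesText :: Monad m => Pipe Text String m ()
consumesText = do
    txt <- await
    yield ("text: " ++ show txt)

example :: Monad m => Pipe Text String m ()
example = do
    encoder >-> consumesBytes
    consumesText

main :: IO ()
main = runEffect (each ["⌘", "next"] >-> example >-> P.stdoutLn)
```

Running this prints `bytes: [226]` followed by `text: "next"`: the remaining two bytes of the first chunk are silently lost, and `consumesText` resumes at the next chunk rather than where `consumesBytes` stopped.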
Option (B) sounds reasonable at first except that there might not be a way to transform the `ByteString` leftovers into `Text` that can be pushed further upstream, for a couple of reasons:
* The encoding might not necessarily round-trip
* Even if the encoding *did* round-trip (like UTF-8), nothing requires `consumesBytes` to consume bytes only along Unicode character boundaries
To elaborate on the latter case, assume that `encoder` received a text chunk containing a single character: “⌘”. If you encode that as UTF-8 you get three bytes: “e2 8c 98”. If `consumesBytes` only consumes the first byte (i.e. “e2”) then `encoder` is now holding onto two bytes in its leftovers queue, “8c 98”, and there’s no longer a way to push those two bytes further upstream as `Text`, since on their own they cannot be (correctly) decoded back into `Text`. Now, if `consumesBytes` terminates there is no legitimate way for `consumesText` to begin where `consumesBytes` left off.
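The byte-level claim can be checked directly. The following is a small sketch using only the `text` and `bytestring` packages; the names `encoded`, `leftoverBytes`, and `decodes` are made up for illustration and are not part of any pipes API:

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- The UTF-8 encoding of the single character U+2318 ("⌘").
encoded :: B.ByteString
encoded = encodeUtf8 (T.singleton '\x2318')

-- What the encoder would be left holding if downstream consumed
-- only the first byte.
leftoverBytes :: B.ByteString
leftoverBytes = B.drop 1 encoded

-- True when the bytes form valid UTF-8 on their own.
decodes :: B.ByteString -> Bool
decodes = either (const False) (const True) . decodeUtf8'

main :: IO ()
main = do
    print (B.unpack encoded)        -- [226,140,152], i.e. e2 8c 98
    print (B.unpack leftoverBytes)  -- [140,152], i.e. 8c 98
    print (decodes encoded)         -- True
    print (decodes leftoverBytes)   -- False
```

The two leftover bytes are UTF-8 continuation bytes with no lead byte, so strict decoding rejects them and there is no `Text` value to hand back upstream.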