Opinion on two splitting functions

57 views
Skip to first unread message

Daniel Díaz

unread,
Aug 17, 2017, 8:47:31 PM8/17/17
to Haskell Pipes
Hi,

The pipes-group and streaming libraries both give you functions like "group", "groupBy" and "chunksOf" that let you segment a stream while preserving streaming. I was thinking of implementing these two additional functions:

delimit  :: (x -> a -> (x,NonEmpty [b])) 
         -> x 
         -> (x -> NonEmpty [b])
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

delimitM :: (x -> a -> Stream (Of b) m (Stream (Stream (Of b) m) m x))
         -> m x 
         -> (x -> Stream (Of b) m (Stream (Stream (Of b) m) m ()))
         -> Stream (Of a) m r 
         -> Stream (Stream (Of b) m) m r

For "delimit", the idea is that every step produces a NonEmpty list of lists along with the new internal state. The list at the head would consist of elements belonging to the "current open group". If there are more lists afterwards, they represent new groups which have been detected in the stream. For streams comprising one single continuous group,  the NonEmptys would *always* contain just one list.

"delimitM" is the monadic version, although here the types start getting quite scary.

What do you think? Would "delimit" make sense, or at this level of complexity it would be better to build the splitting function by single-stepping the source stream?

(Extra semi-related question: could the FreeT trick used in pipes-group also be applied to Sources from the conduit library?)

Gabriel Gonzalez

unread,
Aug 19, 2017, 12:34:21 PM8/19/17
to haskel...@googlegroups.com
What is the motivation for `delimit`/delimitM`?  Are these designed to be utilities for building group-like functions?

Also, the `FreeT` trick can be applied to `ConduitM`, although not `Source`

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

Daniel Díaz

unread,
Aug 19, 2017, 3:31:51 PM8/19/17
to Haskell Pipes
On Saturday, August 19, 2017 at 6:34:21 PM UTC+2, Gabriel Gonzalez wrote:
What is the motivation for `delimit`/delimitM`?  Are these designed to be utilities for building group-like functions?

Yes.

I'm trying to use Backpack to create an abstract common interface to pipes / streaming / conduit: https://github.com/danidiaz/streamy

I don't want to expose a single-stepping function like "next", but I would still like to be able to define complex splitters using only the abstract signature.


Also, the `FreeT` trick can be applied to `ConduitM`, although not `Source`

Interesting, so perhaps it would be possible to implement a conduit-group package...

Gabriel Gonzalez

unread,
Aug 19, 2017, 5:53:21 PM8/19/17
to haskel...@googlegroups.com
So one thing I want to point out is that if you change the type of `delimit` to:

    delimit :: (x -> a -> x) -> x -> (x -> NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... which might be equivalent depending on your needs, then you can simplify the type further to:

    delimit :: Fold a (NonEmpty b) -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

... where `Fold` is from the `foldl` library

You can similarly simplify `delimitM` using `FoldM` from the same library

Daniel Díaz

unread,
Sep 12, 2017, 6:16:40 PM9/12/17
to Haskell Pipes
One slight awkwardness of representing it as a Fold is that it gives the impression that the initial state may emit values downstream, which is odd for a splitter.

Anyway, I when ahead and implement the function. Here is the Backpack signature with some documentation. Here is the pipes implementation, here the streaming one, and here some simple tests.

I believe one could implement the pipes-text splitters (lines,words) this way.

Gabriel Gonzalez

unread,
Sep 15, 2017, 1:17:33 PM9/15/17
to haskel...@googlegroups.com, diaz.c...@gmail.com
Note that this applies to the type that you gave, too!  There is no way to deduce from the type that the initial state won't be used to produce a value :)

If you do want that guarantee, then perhaps you could restructure the type as:

    delimit :: (x -> a -> (x, NonEmpty b)) -> x -> Stream (Of a) m r -> Stream (Stream (Of b) m) m r

This type has some nice properties, too.  For example, the first part is equivalent to a mealy machine:

    {-# LANGUAGE ExistentialQuantification #-}

    data Mealy a b = forall s . Mealy
        { begin :: s

        , step :: a -> s -> (b, s)
        -- ^ @a -> State s b@
        }

... which implements all sorts of type classes (like `Category`, `Applicative` and `Arrow`) and supports nice lens-based combinators analogous to the ones for `Fold`

Using that type, you would have:

    delimit :: Mealy a (NonEmpty b) -> Stream (Or a) m r -> Stream (Stream (Of b) m) m r

... and you can also generalize `Mealy` to a monadic version, too for `delimitM`

However, looking at this more closely I'm not sure that your original type for `delimit` is sufficiently general to handle most use cases for grouping a stream.  If I'm reading the type correctly I think this assumes that you have one group of `b`s per original `a` in the input stream, which might not be the case.  For example, consider this function that groups a stream of text chunks into lines:

    lines :: Monad m => Stream (Of Text) m r -> Stream (Stream (Of Text)) m r

I don't see how you could implement that in terms of `delimit`

I think if you really want a sufficiently interface you should expose a `next`-like function (similar to `list-transformer` and `pipes`)

    data Step f m r = Cons (f (Stream f m r)) | Nil r

    next :: Stream f m r -> m (Step f m r)

You might still provide higher-efficiency combinators for grouping, but I think you will need the less efficient `next` primitive exposed as a fallback for the general case of grouping things

Daniel Díaz

unread,
Sep 16, 2017, 11:23:34 AM9/16/17
to Haskell Pipes
On Friday, September 15, 2017 at 7:17:33 PM UTC+2, Gabriel Gonzalez wrote:
Note that this applies to the type that you gave, too!  There is no way to deduce from the type that the initial state won't be used to produce a value :)


Ah, you're right, of course. The "done" function may emit stuff without any input. The problem is that if I remove it and go the Mealy machine route some ways of splitting seem to become impossible. For example, imagine splitting a text stream on a two-char separator (while removing the separator).
 

However, looking at this more closely I'm not sure that your original type for `delimit` is sufficiently general to handle most use cases for grouping a stream.  If I'm reading the type correctly I think this assumes that you have one group of `b`s per original `a` in the input stream, which might not be the case.  For example, consider this function that groups a stream of text chunks into lines:

    lines :: Monad m => Stream (Of Text) m r -> Stream (Stream (Of Text)) m r

I don't see how you could implement that in terms of `delimit`



Instead of the step function returning  NonEmpty b, it should return NonEmpty [b]:

delimit :: Monad m => (x -> a -> (x, NonEmpty [b])) -> (x -> NonEmpty [b]) -> x -> Stream a m r -> Groups b m r

I think this covers the case in which we encounter multiple newlines in a single block of text. It would be like:

lines :: Stream T.Text IO r -> Groups T.Text IO r
lines =
    Y.delimit (\nl txt ->
                  if T.null txt
                     then (nl,pure [])
                     else let nl' = T.last txt == '\n'
                          in case (nl,T.lines txt) of
                              (True,ls)    -> (nl', []  :| map pure ls)
                              (False,l:ls) -> (nl', [l] :| map pure ls)
                              (_,[])       -> error "never happens")
              (\_ -> pure [])
              True
 
Reply all
Reply to author
Forward
0 new messages