Multiple mapM and performance

43 views
Skip to first unread message

aqu...@gmail.com

unread,
Jun 6, 2017, 4:53:27 AM6/6/17
to Haskell Pipes
Hello, everyone. Because I'm newbie, my question may seem naive. but:
I'm doing something like:
do
 str
' <- S.mapM fun1 str
 str'' <- S.mapM fun2 str'

 
-- so on
and I suppose that real iteration (if I use `mapM` or `iterM`, map other functions of streaming) happens only one and resulting code after compilation will looks like
for item in str:
 item
' = fun1 item
 item'' = fun2 item'

 
-- so on
Am I right? Or are there some pitfalls here which we should remember to accomplish such result?


/Best regards, Paul



Gabriel Gonzalez

unread,
Jun 6, 2017, 11:26:58 AM6/6/17
to haskel...@googlegroups.com
Could you provide an example that compiles, including imports?  The reason I ask is that I'm not sure which mapM that you are using

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

aqu...@gmail.com

unread,
Jun 7, 2017, 3:32:46 AM6/7/17
to Haskell Pipes
Hello, Gabriel!

Real module (which is compilable) is big, so I'll show on email's top extraction from it (little snippets, with imports and functions) and will add compilable module (I used it for experiments) at the end of the email - because it's not very small :) So, compilable code is on the bottom of the mail.

Snippets:

...

import           Streaming
import qualified Streaming.Prelude as S

...

(|>) = flip ($)

...

    flow = items
           |> S.mapM (liftIO . doRequest connection)
           |> S.map (liftM fun1)
           |> S.zip fun2
           |> S.filter filter1
           |> S.mapM fun3

...

So, I supposed that serial "map"s/"zip"s/"filter"s will be compiled into one loop with serial application of functions-arguments of that "map"s/"zip"s/"filter"s. My Python background "says" me that maps/filters/etc are types, not functions, so they are "combinatorable": I mean no problem to combine serial flat iterations into one iteration. I am newbie in Haskell and I don't know is it true for Haskell... may be it should happens on optimization phase of compilation, I don't know.

Does it happen automatically, on GHC side, or developer should take special actions when works with Streaming/Pipes/Conduits? Is it the same for usual maps/filters from Prelude/Data.List? I supposed 99% that GHC reduces ASTs then optimizes the result and we get at the end flat simple iterations and so on, and developer should not think about such things totally. But I had a doubt crept in, so I decided to ask...

Another example is:

  where flow = getItems connection
               |> S.filter getInteresting
               |> S.map fun1
               |> fixItems fun2
               |> S.mapM fun3
               |> S.mapM fun4
...

where fixItems iterates over stream's items and "yields" them with S.yield, like this:

...

import qualified Control.Monad.Trans.State  as ST                                           
import qualified Control.Monad.Trans.Writer as W

...

fixItems :: Monad m => FixItem e ps st -> Stream (Of e) (ST.StateT st m) Result -> Stream (Of e) (ST.StateT st m) Result
fixItems fi = loop where
  loop str = do
    e <- lift $ S.next str
    e' <- case e of
      Left err -> return err
      Right (e', str') ->
        let (fixes, problems) = W.runWriter $ fixItem fi e'
            fix = mconcat fixes
        in (lift $ ST.modify $ reportProblems fi problems e')
            >> case fix of
                ItemFix ff -> (S.yield $ ff e')
                ItemSkip   -> pure ()
            >> loop str'
    return NoResult -- FIXME how to return stream result?
...


Next is compilable module. I experimented with "fixing" of stream's items; code is terrible, I suppose, but it was experiment ;):

{-# LANGUAGE FlexibleContexts #-}
module Main where

import           Control.Monad
import           Control.Monad.Trans.Reader
import           Control.Monad.Trans.State
import           Control.Monad.Trans.Writer
import           Data.Functor.Identity
import           Data.Traversable
import           Streaming
import qualified Streaming.Prelude          as S


type M = StateT [String] IO

subgen :: Int -> S.Stream (S.Of Int) M Int
subgen n =
  if odd n then do { return 0 }
  else do
    S.yield (n*100)
    S.yield (n*1000)
    lift $ modify (++["!!!"])
    return 0

gen :: S.Stream (S.Of Int) M Int
gen = do
  S.yield 0; S.yield 100; S.yield 101; S.yield 102; S.yield 103; S.yield 104; S.yield 105; S.yield 106
  liftIO $ putStrLn "enter x: "
  x <- liftIO getLine
  let n = read x::Int
  S.yield n
  lift $ modify (++["gen1"])
  lift $ modify (++["gen2"])
  return 0 -- $ do

proc1 :: S.Stream (S.Of Int) M Int -> S.Stream (S.Of Int) M Int
proc1 str = do
  st <- lift get
  loop str st
  where
    loop str st = do
      e <- lift $ S.next str
      e' <- case e of
        Left err -> return err
        Right (e', str') ->
          (if e' == 100 then (lift $ put $ st ++ ["proc1"]) else pure ()) >>
            subgen e' >> (S.yield $ e' + 123) >> loop str' st
      return 1

data Cr = Cr {
  crName  :: String
  , crAge :: Int
  } deriving Show

data FixItem a = FixItem (a -> a) | SkipItem
instance Monoid (FixItem a) where
  mempty = FixItem id
  mappend SkipItem _              = SkipItem
  mappend _ SkipItem              = SkipItem
  mappend (FixItem f) (FixItem g) = FixItem (f . g)

fixWhen :: Monad m => Bool -> m (FixItem a) -> m (FixItem a)
fixWhen cond act = if cond then act else return $ FixItem id

fixcr1 :: Int -> Writer [String] (FixItem Int)
fixcr1 n =
  let (fixes, errs) = runWriter $ sequence [
        fixWhen (n == 100) (tell ["panic:"++show n++"==100"] >> return SkipItem)
        , fixWhen (n > 100) (tell ["err1:"++show n ++">100"] >> return (FixItem (10+)))
        , fixWhen (n > 1) (tell ["err2:"++show n++">1"] >> return (FixItem (100+)))
        ]
  in
    writer (mconcat fixes, errs)

fixItems :: (Monad m, Num a) => Stream (Of Int) (StateT [String] m) a -> Stream (Of Int) (StateT [String] m) a
fixItems = loop where
  loop str = do
    e <- lift $ S.next str
    e' <- case e of
      Left err -> return err
      Right (e', str') ->
        let (fix, errs) = runWriter $ fixcr1 e'
        in (lift $ modify (++errs))
            >> case fix of
                FixItem ff -> (S.yield $ ff e')
                SkipItem   -> pure ()
            >> loop str'
    return 1


(|>) = flip ($)

main :: IO ()
main = do
  p <- runStateT flow []
  print p
  print "end."
  where flow = gen
          |> fixItems
          |> S.map (+100)
          |> S.mapM_ (liftIO . print)

and to build it I included in cabal 2 dependencies:
...
                     , transformers >= 0.5
                     , streaming
...


/Best regards, Paul

Gabriel Gonzalez

unread,
Jun 7, 2017, 12:40:35 PM6/7/17
to haskel...@googlegroups.com
They will compile successive transformations (i.e. multiple maps/mapMs/filters, for example) into a single pass over the data.  This is true for the `streaming`/`pipes`/`list-transformer`/`logict`/`conduit`/`turtle` libraries or any "ListT-done-right" implementation.  All of these libraries do not rely on any special GHC optimizations or rewrite rules.  They will still go over the data in one pass even if you disable all optimizations.

This is *not* true for `Control.Monad.mapM` or the `ListT` type from `transformers` (a.k.a. "ListT done wrong"), which is why I wanted to make sure which `mapM` you were talking about.

aquagnu

unread,
Jun 7, 2017, 3:08:31 PM6/7/17
to haskel...@googlegroups.com
> They will compile successive transformations (i.e. multiple
> maps/mapMs/filters, for example) into a single pass over the data.
> This is true for the
> `streaming`/`pipes`/`list-transformer`/`logict`/`conduit`/`turtle`
> libraries or any "ListT-done-right" implementation. All of these
> libraries do not rely on any special GHC optimizations or rewrite
> rules. They will still go over the data in one pass even if you
> disable all optimizations.

So, this is so, because of implementation of their Functor and
Applicative instances? Am I right, that there occurs this combination
("single pass")?

But what about Data.List and Prelude's `map` (in GHC.Base)? If I'll
do

map fn1 (map fn2 lst)

?

>
> This is *not* true for `Control.Monad.mapM` or the `ListT` type from
> `transformers` (a.k.a. "ListT done wrong"), which is why I wanted to
> make sure which `mapM` you were talking about.

OK, I got it.
> >> <http://gmail.com/> wrote:
> >>
> >> Hello, everyone. Because I'm newbie, my question may seem naive.
> >> but: I'm doing something like:
> >> do
> >> str' <- S.mapM fun1 str
> >> str'' <- S.mapM fun2 str'
> >> -- so on
> >> and I suppose that real iteration (if I use `mapM` or `iterM`, map
> >> other functions of streaming) happens only one and resulting code
> >> after compilation will looks like for item in str: item' = fun1
> >> item item'' = fun2 item'
> >> -- so on
> >> Am I right? Or are there some pitfalls here which we should
> >> remember to accomplish such result?
> >>
> >>
> >> /Best regards, Paul
> >>
> >>
> >>
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups "Haskell Pipes" group. To unsubscribe from this group and
> >> stop receiving emails from it, send an email to haskell-pipe...@
> >> <>googlegroups.com <http://googlegroups.com/>. To post to this
> >> group, send email to haskel...@ <>googlegroups.com
> >> <http://googlegroups.com/>.
> >
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Haskell Pipes" group. To unsubscribe from this group and
> > stop receiving emails from it, send an email to
> > haskell-pipe...@googlegroups.com
> > <mailto:haskell-pipe...@googlegroups.com>. To post to
> > this group, send email to haskel...@googlegroups.com
> > <mailto:haskel...@googlegroups.com>.
>



--
Best regards,
Paul a.k.a. 6apcyk

Gabriel Gonzalez

unread,
Jun 11, 2017, 11:23:29 PM6/11/17
to haskell-pipes
I think it's important to distinguish between two separate concepts: "fusion" vs "one-pass".  "Fusion" refers to avoiding the allocation of intermediate data structures when you transform a stream multiple times whereas "one-pass" means that you don't traverse the sequence of elements twice when you transform the stream multiple times (i.e. you go over the stream in one pass).  You can have a "one-pass" implementation without "fusion" but you cannot have "fusion" without a "one-pass" implementation.

"Fusion" is purely an optimization, meaning that whether or not an implementation uses "fusion" only affects your program's performance but won't affect its behavior.  However, "one-pass" is not just an optimization: one-pass versus multiple pass changes the behavior of your program, especially once your stream has effects like in these streaming libraries.

Out of the two properties, "one-pass" is *much* more important.  The reason why is that "one-pass" ensures that certain functor laws hold.  To see why, let's consider a case where they *don't* hold, which is `Data.List.mapM`.  Normally, you mtigh expect the following functor laws to hold:

    Data.List.mapM (f <=< g) = Data.List.mapM f . Data.List.mapM g

    Data.List.mapM return = id

However, the above two laws don't actually hold for `Data.List.mapM`.  For example, the first law does not hold because the order of effects are not the same for the left-hand and right-hand sides of the equation.  The left-hand side of the equation interleaves the effects of `f` and `g` whereas the right-hand side runs all of `g`'s effects first followed by all of `f`'s effects.  The second equation is also wrong because `Data.List.mapM` misbehaves on infinite lists:

    Data.List.mapM return (repeat x) = _|_

    id (repeat x) = repeat x

This is the root of why `Data.List.mapM` is "bad"

However, the streaming libraries have their own versions of `mapM` which do obey the above functor laws.  For example, if you take the `list-transformer` library and define:

    mapM :: (a -> m b) -> ListT m a -> ListT m b
    mapM f as = do
        a <- as
        lift (f a)
 
... then this *does* obey the following functor laws:

    mapM (f <=< g) = mapM f . mapM g

    mapM return = id

For the first equation, both sides of the equation interleave the effects of `f` and `g`.  For the second equation, both sides of the equation behave correctly on infinite `ListT` streams.  These functor laws hold because `ListT` is has the "one-pass" property.

So to answer your question: it's not exactly the Haskell `Functor` type class per se that is important here, but the functor laws are important (for a more general notion of functor) in establishing why a single pass implementation matters.



> > this group, send email to haskel...@googlegroups.com

>



--
Best regards,
  Paul a.k.a. 6apcyk
--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipes+unsubscribe@googlegroups.com.
To post to this group, send email to haskel...@googlegroups.com.

aqu...@gmail.com

unread,
Jun 12, 2017, 6:40:33 AM6/12/17
to Haskell Pipes
Hello, Gabriel! Thank a lot for such detailed explanation. It's so good that I want to save it in my blog (with author name sure), but only if you do not mind and allow me to do it.

Best regards,
  Paul

понедельник, 12 июня 2017 г., 6:23:29 UTC+3 пользователь Gabriel Gonzalez написал:
I think it's important to distinguish between two separate concepts: "fusion" vs "one-pass".  "Fusion" refers to avoiding the allocation of intermediate data structures when you transform a stream multiple times whereas "one-pass" means that you don't traverse the sequence of elements twice when you transform the stream multiple times (i.e. you go over the stream in one pass).  You can have a "one-pass" implementation without "fusion" but you cannot have "fusion" without a "one-pass" implementation.

"Fusion" is purely an optimization, meaning that whether or not an implementation uses "fusion" only affects your program's performance but won't affect its behavior.  However, "one-pass" is not just an optimization: one-pass versus multiple pass changes the behavior of your program, especially once your stream has effects like in these streaming libraries.

Out of the two properties, "one-pass" is *much* more important.  The reason why is that "one-pass" ensures that certain functor laws hold.  To see why, let's consider a case where they *don't* hold, which is `Data.List.mapM`.  Normally, you mtigh expect the following functor laws to hold:

    Data.List.mapM (f <=< g) = Data.List.mapM f . Data.List.mapM g

    Data.List.mapM return = id

However, the above two laws don't actually hold for `Data.List.mapM`.  For example, the first law does not hold because the order of effects are not the same for the left-hand and right-hand sides of the equation.  The left-hand side of the equation interleaves the effects of `f` and `g` whereas the right-hand side runs all of `g`'s effects first followed by all of `f`'s effects.  The second equation is also wrong because `Data.List.mapM` misbehaves on infinite lists:

    Data.List.mapM return (repeat x) = _|_

    id (repeat x) = repeat x

This is the root of why `Data.List.mapM` is "bad"

However, the streaming libraries have their own versions of `mapM` which do obey the above functor laws.  For example, if you take the `list-transformer` library and define:

    mapM :: (a -> m b) -> ListT m a -> ListT m b
    mapM f as = do
        a <- as
        lift (f a)
 
... then this *does* obey the following functor laws:

    mapM (f <=< g) = mapM f . mapM g

    mapM return = id

For the first equation, both sides of the equation interleave the effects of `f` and `g`.  For the second equation, both sides of the equation behave correctly on infinite `ListT` streams.  These functor laws hold because `ListT` is has the "one-pass" property.

So to answer your question: it's not exactly the Haskell `Functor` type class per se that is important here, but the functor laws are important (for a more general notion of functor) in establishing why a single pass implementation matters.


> > <mailto:haskell-pipes+unsub...@googlegroups.com>. To post to
> > this group, send email to haskel...@googlegroups.com

>



--
Best regards,
  Paul a.k.a. 6apcyk

--
You received this message because you are subscribed to the Google Groups "Haskell Pipes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to haskell-pipe...@googlegroups.com.

Gabriel Gonzalez

unread,
Jun 12, 2017, 10:11:11 AM6/12/17
to haskel...@googlegroups.com
Yeah, feel free to reproduce it anywhere you like
Reply all
Reply to author
Forward
0 new messages