Monads, when and how to use them?


Grégory Sainson

Nov 10, 2015, 3:15:04 AM
to algebird
Hi,

I discovered Algebird and the Monad/Monoid paradigm and was blown away by what we can do with it.
I pretty much understand how to use Monoid, and creating my own is very easy.
If I had to summarize it very (very) quickly for someone else: you just need to extend Monoid[YourType], import com.twitter.algebird.Operators._
and there you have it. So simple yet powerful.

I tried to look for examples of Monad usage because, as I understand it, a Monad can be used to express pipelining. A Monad is a transformation from one type to another.
flatMap is there to do the transformation. But I just didn't understand how to use it in an efficient way.
More specifically, what bothers me is this:
trait Monad[M[_]] extends Applicative[M]
I don't get why Monad takes a generic parameter of the form M[_]. Is it to force its use on collections?

Let's say I have a BigJsonCaseClass as input, and I want to transform it to extract only the elements needed for aggregation and counting, into a type we will call SmallAggregationCaseClass.
BigJsonCaseClass is not of the form M[_]. I could put each BigJsonCaseClass in a List of one element, but that seems wrong.

Also, how would you use your monad in a pipelining process?
I want to do that in order to aggregate statistics on Spark using the aggregate function, in batch mode rather than streaming.


Thx for your help.

 

Oscar Boykin

Nov 10, 2015, 1:45:55 PM
to Grégory Sainson, algebird
A Monad for a type requires that type to itself have a type parameter. So you can make a Monad for List, because List really has a type parameter T: List[T]. The same goes for Option, because it is really Option[T]. You can't make a Monad[Int], for instance, because Int has no type parameter.

This is because of the flatMap method in Monad:

def flatMap[T, U](m: M[T])(fn: T => M[U]): M[U]

If M is List, this becomes:

def flatMap[T, U](m: List[T])(fn: T => List[U]): List[U]

This syntax, M[_], in a type parameter like `trait Monad[M[_]]` is just a way to tell Scala that M itself has to have EXACTLY ONE type parameter.
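
To make the M[_] point concrete, here is a minimal sketch. SimpleMonad, pure and pairWithDouble are hypothetical names invented for this illustration; the trait is a simplified stand-in and not Algebird's actual definition. It shows that generic code can be written once against any type constructor with one type parameter, and that it has nothing to do with collections as such.

```scala
// Simplified stand-in for a Monad type class (hypothetical, not Algebird's trait).
trait SimpleMonad[M[_]] {
  def pure[T](v: T): M[T]
  def flatMap[T, U](m: M[T])(fn: T => M[U]): M[U]
  // map can be derived from flatMap and pure
  def map[T, U](m: M[T])(fn: T => U): M[U] = flatMap(m)(t => pure(fn(t)))
}

object SimpleMonad {
  // List has exactly one type parameter, so it fits M[_]
  implicit val listMonad: SimpleMonad[List] = new SimpleMonad[List] {
    def pure[T](v: T): List[T] = List(v)
    def flatMap[T, U](m: List[T])(fn: T => List[U]): List[U] = m.flatMap(fn)
  }

  // Option also has exactly one type parameter
  implicit val optionMonad: SimpleMonad[Option] = new SimpleMonad[Option] {
    def pure[T](v: T): Option[T] = Some(v)
    def flatMap[T, U](m: Option[T])(fn: T => Option[U]): Option[U] = m.flatMap(fn)
  }
  // A SimpleMonad[Int] would not compile: Int takes no type parameter.
}

object SimpleMonadDemo {
  // Written once, usable with any M that has a SimpleMonad instance
  def pairWithDouble[M[_]](m: M[Int])(implicit M: SimpleMonad[M]): M[(Int, Int)] =
    M.flatMap(m)(x => M.pure((x, x * 2)))

  val fromList: List[(Int, Int)] = pairWithDouble(List(1, 2))
  val fromOption: Option[(Int, Int)] = pairWithDouble(Option(3))
}
```

The same pairWithDouble works for both List and Option because the only thing it relies on is the shape M[_] plus a flatMap for it.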

All that said, Monad has played a role in Scalding, where it is used to sequence several jobs that depend on the output of previous jobs in a functional way:

the companion object:

and the class:


--
Oscar Boykin :: @posco :: http://twitter.com/posco

Grégory Sainson

Nov 11, 2015, 2:56:40 AM
to Oscar Boykin, algebird
OK, thanks Oscar. So Monad should be used to compose your pipeline at a higher level than Monoid.
I could use Monads to pipeline my Spark jobs, for instance. That makes a lot more sense now.

Oscar Boykin

Nov 11, 2015, 1:19:24 PM
to Grégory Sainson, algebird
Yes, imagine making something like this:

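import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// A deferred computation that only needs a SparkContext when it is run
trait SparkExecution[+T] { self =>
  def run(sc: SparkContext): T

  def map[U](fn: T => U): SparkExecution[U] = new SparkExecution[U] {
    def run(sc: SparkContext): U = fn(self.run(sc))
  }

  // Run this step, then run the step that fn builds from its result
  def flatMap[U](fn: T => SparkExecution[U]): SparkExecution[U] = new SparkExecution[U] {
    def run(sc: SparkContext): U = fn(self.run(sc)).run(sc)
  }
}

object SparkExecution {
  def fromFn[T](f: SparkContext => T): SparkExecution[T] = new SparkExecution[T] {
    def run(sc: SparkContext): T = f(sc)
  }
  // SparkContext.emptyRDD needs a ClassTag for T
  def emptyRDD[T: ClassTag]: SparkExecution[RDD[T]] = fromFn[RDD[T]](_.emptyRDD[T])
  // other stuff here....
}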

Now you can write code that does not need the Spark context until the end, which is nice. Your app obtains the Spark context only at the last minute and calls .run on a SparkExecution.

This would be more clearly useful if Spark RDDs didn't need a context to be created, and needed one only for actions (such as writing out, collecting a value, etc.).

But this is how monads could come into play with spark.
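
As a rough illustration of how such a type composes, here is a sketch of the same pattern that runs without Spark on the classpath. FakeContext, Execution, read and the "clicks"/"views" data are all hypothetical stand-ins invented for this example; FakeContext plays the role of SparkContext. Because the type has map and flatMap, a for-comprehension can be used to describe the pipeline before any context exists.

```scala
// Hypothetical stand-in for SparkContext, so the sketch runs without Spark.
final case class FakeContext(appName: String, data: Map[String, Seq[Int]])

trait Execution[+T] { self =>
  def run(ctx: FakeContext): T

  def map[U](fn: T => U): Execution[U] = new Execution[U] {
    def run(ctx: FakeContext): U = fn(self.run(ctx))
  }

  def flatMap[U](fn: T => Execution[U]): Execution[U] = new Execution[U] {
    def run(ctx: FakeContext): U = fn(self.run(ctx)).run(ctx)
  }
}

object Execution {
  def fromFn[T](f: FakeContext => T): Execution[T] = new Execution[T] {
    def run(ctx: FakeContext): T = f(ctx)
  }

  // Looks up a named dataset in the context when the execution is run
  def read(name: String): Execution[Seq[Int]] =
    fromFn(ctx => ctx.data.getOrElse(name, Seq.empty[Int]))
}

object ExecutionDemo {
  // Only a description of the work; no context is needed to build it.
  val clickRate: Execution[Double] = for {
    clicks <- Execution.read("clicks")
    views  <- Execution.read("views")
  } yield {
    val totalViews = views.sum
    if (totalViews == 0) 0.0 else clicks.sum.toDouble / totalViews
  }

  // The context shows up only at the very end.
  val ctx: FakeContext = FakeContext("demo", Map("clicks" -> Seq(1, 2), "views" -> Seq(4, 8)))
  val result: Double = clickRate.run(ctx)
}
```

The design choice is that clickRate is an ordinary value: it can be built, passed around and combined with other executions, and the context is supplied exactly once by whoever calls run.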

oss.m...@gmail.com

Jan 5, 2016, 1:19:25 PM
to algebird
Hi,
I wouldn't strongly connect functional programming and its abstractions to collections. It is important to understand that Applicatives and Monads model general kinds of computation that are already spread throughout our code, even if we don't notice them. Once I understood which general computations are being modeled, I could see Functors, Applicatives, Monoids and Monads pretty much "everywhere" in my code.

These computations run in some context, which is expressed by M[_]; this context is also called the effect that is being modeled. The Option context/effect models a value that may not exist, List models a computation that can return zero, one or many results, Future models a value that will be available later, Try models a computation that can fail, etc.
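
A small sketch of three of these effects using only the Scala standard library. The function names (findPort, neighbours, ratio) are made up for illustration; Future is left out only because it needs an execution context to run.

```scala
import scala.util.Try

object EffectsDemo {
  // Option: the value may be absent (missing key or unparsable text)
  def findPort(conf: Map[String, String]): Option[Int] =
    conf.get("port").flatMap(s => Try(s.toInt).toOption)

  // List: a step may produce zero, one or many results
  def neighbours(n: Int): List[Int] = List(n - 1, n + 1)
  val twoSteps: List[Int] = List(0).flatMap(neighbours).flatMap(neighbours)

  // Try: any step may fail with an exception
  def ratio(a: String, b: String): Try[Int] =
    for {
      x <- Try(a.toInt)
      y <- Try(b.toInt)
      r <- Try(x / y)
    } yield r
}
```

In each case flatMap has the same shape, and only the meaning of the surrounding context changes.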

One can understand Option as a collection that holds one value or none. This can be fine when starting with FP, but it is good to move away from this understanding soon. Collections are just one specialized context/effect in which pure functions can be executed (Functor), pipelined (Monad) or executed independently of each other (Applicative).

As already said, Monad models pipelining, i.e. sequential transformation from input to output. The sequencing is important, as each step waits until the previous one finishes. Another point to realize is that the computation fails as soon as the first pipelined function fails: the following functions are not executed, and the result of the whole pipeline is a failure.
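
The fail-fast behaviour can be observed directly with Option. In this sketch the step names (parse, positive, half) are hypothetical, and the mutable buffer exists only to record which steps actually ran.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.Try

object ShortCircuitDemo {
  // Runs a three-step pipeline and reports which steps were executed.
  def run(input: String): (Option[Int], List[String]) = {
    val executed = ListBuffer.empty[String]

    def parse(s: String): Option[Int] = { executed += "parse"; Try(s.toInt).toOption }
    def positive(n: Int): Option[Int] = { executed += "positive"; if (n > 0) Some(n) else None }
    def half(n: Int): Option[Int] = { executed += "half"; if (n % 2 == 0) Some(n / 2) else None }

    // Each step only runs if the previous one produced a value
    val result = parse(input).flatMap(positive).flatMap(half)
    (result, executed.toList)
  }
}
```

For the input "-4" the positive step fails, so half is never called; for "abc" only parse runs.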

In contrast, Applicative executes all steps independently: each function is evaluated, and only when they have all finished can the next step operate on the combined result, e.g. a List of Successes and Failures.
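
One way to see the difference is validation. The sketch below is an assumption-laden illustration: Checked, map2, nonEmpty, adult and User are hypothetical names, and map2 is a hand-written applicative-style combinator over Either, not something taken from Algebird. Both checks are evaluated independently and all errors are kept, whereas the flatMap version stops at the first error.

```scala
object ApplicativeDemo {
  type Checked[A] = Either[List[String], A]

  // Applicative-style combination: both sides are evaluated independently
  // and the errors of both are kept.
  def map2[A, B, C](a: Checked[A], b: Checked[B])(f: (A, B) => C): Checked[C] =
    (a, b) match {
      case (Right(x), Right(y)) => Right(f(x, y))
      case (Left(e1), Left(e2)) => Left(e1 ++ e2)
      case (Left(e), _)         => Left(e)
      case (_, Left(e))         => Left(e)
    }

  def nonEmpty(field: String, v: String): Checked[String] =
    if (v.nonEmpty) Right(v) else Left(List(field + " is empty"))

  def adult(age: Int): Checked[Int] =
    if (age >= 18) Right(age) else Left(List("age below 18"))

  final case class User(name: String, age: Int)

  // Independent checks: reports every problem at once
  def validateAll(name: String, age: Int): Checked[User] =
    map2(nonEmpty("name", name), adult(age))((n, a) => User(n, a))

  // Sequential (monadic) checks: stops at the first problem
  def validateSeq(name: String, age: Int): Checked[User] =
    nonEmpty("name", name).flatMap(n => adult(age).map(a => User(n, a)))
}
```

With an empty name and an age of 10, validateAll returns both error messages while validateSeq returns only the first.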

Alternatively, Monad can be understood as a fancy Builder pattern. If you look at the return type of flatMap, it is the same Monad, "updated" by the function that flatMap executes. Compare it to a Builder pattern, where each method on a data structure (e.g. a class) returns "this" so that methods can be chained. The difference is that Monad has a general flatMap, and we can plug in any function we want later, ad hoc (with arity 1, or curried).

Further, you can see the Decorator pattern in Monads. flatMap takes a function and executes it somewhere inside. If you implement flatMap for your own type (e.g. a class), i.e. your own context/effect, you can execute some general behaviour before and after that function is executed inside flatMap. And you have it - decoration.
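
A minimal sketch of the decoration idea, with a hypothetical Traced type invented for this example. Its flatMap records a log entry before and after running the supplied function. Note that this is meant only to show the before/after hook: because flatMap always adds log entries, this type does not satisfy the monad identity laws.

```scala
// Hypothetical type: a value together with a log of what happened to it.
final case class Traced[A](value: A, log: Vector[String]) {
  def flatMap[B](fn: A => Traced[B]): Traced[B] = {
    // behaviour executed before the user-supplied function
    val before = log :+ ("before step, input = " + value)
    val next = fn(value)
    // behaviour executed after the user-supplied function
    Traced(next.value, (before ++ next.log) :+ ("after step, output = " + next.value))
  }

  def map[B](fn: A => B): Traced[B] = flatMap(a => Traced(fn(a), Vector.empty))
}

object Traced {
  def pure[A](a: A): Traced[A] = Traced(a, Vector.empty)
}

object TracedDemo {
  // Chained like a builder; every step is decorated with logging
  val result: Traced[Int] =
    Traced.pure(3).flatMap(x => Traced.pure(x + 1)).flatMap(x => Traced.pure(x * 2))
}
```

The functions passed to flatMap know nothing about logging; the decoration lives entirely in the type's own flatMap.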

Hope it helps,
Petr