No problem. So if we look again at the "dosum" example:
(defaggregateop dosum ([] 0) ([state val] (+ state val)) ([state] [state]))
The code body with >1 parameter is the "accumulation" function. It
takes a state value and a tuple in the aggregation and returns a new
state value. In this case, it receives a 1-tuple containing the next
value in the grouping to add to the sum. If the aggregator took in
2-tuples as input, that piece of code would take 3 parameters (1 for
the state, 2 for the tuple).
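To make the parameter counting concrete, here is a minimal plain-Clojure sketch (no Cascalog needed) of how the init and accumulation bodies might be driven over one group. The names `dosum-step`, `dosum2-step`, and `run-accumulate` are hypothetical and only model the behavior described above; this is not how Cascalog itself is implemented.

```clojure
;; Stand-in for the first two bodies of the dosum aggregator.
(defn dosum-step
  ([] 0)                          ; no parameters: the initial state
  ([state val] (+ state val)))    ; state plus the single field of a 1-tuple

;; Hypothetical variant taking 2-tuples: 1 state parameter + 2 tuple fields.
(defn dosum2-step
  ([] 0)
  ([state x y] (+ state x y)))

(defn run-accumulate
  "Folds the tuples of one group into a final state.
   Each tuple is spread into arguments after the state."
  [step tuples]
  (reduce (fn [state tuple] (apply step state tuple))
          (step)
          tuples))

;; Group "a" from the test dataset holds the values 1 and 3.
(println (run-accumulate dosum-step [[1] [3]]))         ; prints 4
(println (run-accumulate dosum2-step [[1 10] [3 30]]))  ; prints 44
```

The `apply` call is what makes the arity of the accumulation body track the width of the input tuples.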
The code body with 1 parameter is the "return" function. It takes in
the totally accumulated state and returns a seq of tuples as output.
In this case it just returns the state as-is. "[state]" is equivalent
to "[[state]]": 1-tuples can be written as bare values, without the
enclosing []. "[state state]" would be the same as "[[state] [state]]"
(returning 2 tuples as output).
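The shorthand for 1-tuples can be modeled in plain Clojure as follows. This is only a sketch of the equivalence described above: `normalize-output` is a hypothetical helper, not a Cascalog function, and it assumes a bare (non-sequential) value always stands for a 1-tuple.

```clojure
(defn normalize-output
  "Hypothetical helper: expands a return value into an explicit seq of
   tuples by wrapping every bare value as a 1-tuple."
  [result]
  (mapv (fn [t] (if (sequential? t) (vec t) [t])) result))

(println (normalize-output [4]))        ; prints [[4]]
(println (normalize-output [[4]]))      ; prints [[4]]
(println (normalize-output [4 4]))      ; prints [[4] [4]]
```

Under this model, returning [state] and returning [[state]] both yield one output tuple, while [state state] yields two.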
Also, I noticed that my buffer version has a typo. It should be:
> Hi Nathan,
> Thanks for this. Can you please elaborate on defaggregateop? You
> said it was more restricted and then gave an example but I'm not sure
> what the output is and if it has to follow a certain form. Other than
> that, everything else makes sense.
> Best
> On Nov 8, 5:56 pm, nathanmarz <nathan.m...@gmail.com> wrote:
> > Been meaning to write this up... I'll put a more detailed explanation
> > on the wiki at some point, let me know what doesn't make sense below.
> > All those "def" macros define custom operations with differing
> > semantics. Let's use this "test" dataset as an example:
> > ["a" 1]
> > ["b" 2]
> > ["a" 3]
> > defmapop: Define a custom operation which adds fields to a tuple.
> > (defmapop add-2-fields [x] [1 2])
> > (<- [?a ?b ?c] (test _ ?a) (add-2-fields ?a :> ?b ?c))
> > Results:
> > [1 1 2]
> > [2 1 2]
> > [3 1 2]
> > deffilterop: Define a custom operation which only keeps tuples for
> > which this operation returns true.
> > (deffilterop is2 [x] (= x 2))
> > (<- [?a ?b] (test ?a ?b) (is2 ?b))
> > Results:
> > ["b" 2]
> > defmapcatop: Define a custom operation which creates *multiple*
> > tuples.
> > (defmapcatop twomoretuples [x] [[(inc x)] [(+ 2 x)]])
> > (<- [?a ?b ?c] (test ?a ?b) (twomoretuples ?b :> ?c))
> > Results:
> > ["a" 1 2]
> > ["a" 1 3]
> > ["b" 2 3]
> > ["b" 2 4]
> > ["a" 3 4]
> > ["a" 3 5]
> > defbufferop: Defines an aggregator which receives all the tuples for
> > the group in a single seq. Buffers cannot be used with any other
> > buffers/aggregators in a query. Buffers operate reduce-side.
> > (defbufferop dosum [tuples] (reduce + (map first tuples)))
> > (<- [?a ?sum] (test ?a ?b) (dosum ?b :> ?sum))
> > Results:
> > ["a" 4]
> > ["b" 2]
> > defaggregateop: Defines an aggregator which must be written in a more
> > restricted way. Aggregators *can* be used with other aggregators in a
> > query (i.e., you can do a count and sum of a group at same time).
> > Aggregators operate reduce-side. Aggregators consist of code for
> > "initializing", "aggregating", and "extracting a result". Aggregators
> > return a seq of tuples.
> > (defaggregateop dosum ([] 0) ([state val] (+ state val)) ([state] [state]))
> > defparallelagg: Defines an even more restricted aggregator that is
> > defined using two functions. These aggregators are more efficient as
> > they make use of map-side combiner optimizations. parallelaggs can be
> > composed with other parallelaggs/regular aggregators. However, when
> > composed with regular aggregators the entire computation is moved
> > reduce-side.
> > (defparallelagg dosum :init-var #'identity :combine-var #'+)
> > Vanilla Clojure functions can also be used as operations. When given
> > no output vars they work as filterops, when given output vars they
> > work as mapops. The drawback of using a regular Clojure function is
> > that they can't be inserted dynamically into a query. For example:
> > (defn mk-query [op] (<- [?a ?b] (test _ ?a) (op ?a :> ?b)))
> > The "op" passed to that function must be defined using one of
> > Cascalog's "def" macros and can't be a regular Clojure function. This
> > is b/c Cascalog uses the var name of functions to distribute the
> > operation across the cluster.
> > Hope that helps! Let me know if you have more questions.
> > -Nathan
> > On Nov 8, 10:58 am, Robert Malko <robma...@gmail.com> wrote:
> > > Hi Cascaloggers,
> > > I just wanted to say that writing Clojure/Cascalog is very, very fun,
> > > but I'm having a hard time grasping what some of the included macros
> > > do (defmapcatop, deffilterop, defmapop, defaggregateop, defbufferop,
> > > defparallelagg).
> > > I've read all the blog posts, the source and the google group posts
> > > but still can't really deduce what the point of these are and when I
> > > should use them.
> > > For instance, why would I define a filterop if I can just use a plain
> > > clojure function for filtering?
> > > Any help to further explain these macros would really take my cascalog
> > > experience to the next level.
> > > Thanks Nathan!