Description of def macros
From: Robert Malko <robma...@gmail.com>
Date: Mon, 8 Nov 2010 10:58:10 -0800 (PST)
Local: Mon, Nov 8 2010 1:58 pm
Subject: Description of def macros
Hi Cascaloggers,

I just wanted to say that writing Clojure/Cascalog is very, very fun,
but I'm having a hard time grasping what some of the included macros
do (defmapcatop, deffilterop, defmapop, defaggregateop, defbufferop,
defparallelagg).

I've read all the blog posts, the source, and the Google Group posts
but still can't really deduce what the point of these is and when I
should use them.

For instance, why would I define a filterop if I can just use a plain
Clojure function for filtering?

Any help to further explain these macros would really take my cascalog
experience to the next level.

Thanks Nathan!


 
From: nathanmarz <nathan.m...@gmail.com>
Date: Mon, 8 Nov 2010 14:56:23 -0800 (PST)
Local: Mon, Nov 8 2010 5:56 pm
Subject: Re: Description of def macros
Been meaning to write this up... I'll put a more detailed explanation
on the wiki at some point, let me know what doesn't make sense below.

All those "def" macros define custom operations with differing
semantics. Let's use this "test" dataset as an example:

["a" 1]
["b" 2]
["a" 3]

defmapop: Define a custom operation which adds fields to a tuple.

(defmapop add-2-fields [x] [1 2])

(<- [?a ?b ?c] (test _ ?a) (add-2-fields ?a :> ?b ?c))

Results:
[1 1 2]
[2 1 2]
[3 1 2]

deffilterop: Define a custom operation which only keeps tuples for
which this operation returns true.

(deffilterop is2 [x] (= x 2))

(<- [?a ?b] (test ?a ?b) (is2 ?b))

Results:
["b" 2]

defmapcatop: Define a custom operation which creates *multiple*
tuples.

(defmapcatop twomoretuples [x] [[(inc x)] [(+ 2 x)]])

(<- [?a ?b ?c] (test ?a ?b) (twomoretuples ?b :> ?c))

Results:
["a" 1 2]
["a" 1 3]
["b" 2 3]
["b" 2 4]
["a" 3 4]
["a" 3 5]
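For readers who think in plain Clojure, the three operations above behave roughly like `map`, `filter`, and `mapcat` over the input tuples. The sketch below is only an analogy that runs without Cascalog; `test-tuples` and the `*-results` names are made up for illustration, and the plain `defn`s stand in for the Cascalog macros.

```clojure
;; Plain-Clojure analogy only; nothing here calls Cascalog.
(def test-tuples [["a" 1] ["b" 2] ["a" 3]])

;; defmapop analogy: one output tuple per input tuple, with new fields appended.
(defn add-2-fields [x] [1 2])

(def mapop-results
  (vec (for [[_ a] test-tuples]
         (into [a] (add-2-fields a)))))

;; deffilterop analogy: keep only the tuples for which the predicate is true.
(defn is2 [x] (= x 2))

(def filterop-results
  (vec (filter (fn [[_ b]] (is2 b)) test-tuples)))

;; defmapcatop analogy: each input tuple may yield several output tuples.
(defn twomoretuples [x] [[(inc x)] [(+ 2 x)]])

(def mapcatop-results
  (vec (for [[a b] test-tuples
             [c]   (twomoretuples b)]
         [a b c])))
```

Evaluating the three `*-results` vars at a REPL should reproduce the result listings shown for each operation above.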

defbufferop: Defines an aggregator which receives all the tuples for
the group in a single seq. Buffers cannot be used with any other
buffers/aggregators in a query. Buffers operate reduce-side.

(defbufferop dosum [tuples] (reduce + (map first tuples)))

(<- [?a ?sum] (test ?a ?b) (dosum ?b :> ?sum))

Results:
["a" 4]
["b" 2]
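As a rough mental model, a buffer is a function that gets handed every tuple of one group in a single seq and returns a seq of output tuples. The sketch below imitates that in plain Clojure with `group-by`; it does not use Cascalog, and `test-tuples`, `dosum-buffer`, and `buffer-results` are illustrative names only.

```clojure
;; Plain-Clojure analogy only; nothing here calls Cascalog.
(def test-tuples [["a" 1] ["b" 2] ["a" 3]])

;; Receives all tuples of one group at once, each as a 1-tuple like [1].
;; Returns a seq of output tuples (here a single 1-tuple written as a bare value).
(defn dosum-buffer [tuples]
  [(reduce + (map first tuples))])

(def buffer-results
  (vec (sort-by first
                (for [[k group] (group-by first test-tuples)]
                  ;; Pass only the ?b field of each tuple, as the query does.
                  (into [k] (dosum-buffer (map (fn [[_ b]] [b]) group)))))))
```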

defaggregateop: Defines an aggregator which must be written in a more
restricted way. Aggregators *can* be used with other aggregators in a
query (i.e., you can do a count and a sum of a group at the same time).
Aggregators operate reduce-side. An aggregator consists of code for
"initializing", "aggregating", and "extracting a result", and it
returns a seq of tuples.

(defaggregateop dosum
  ([] 0)
  ([state val] (+ state val))
  ([state] [state]))

defparallelagg: Defines an even more restricted aggregator that is
defined using two functions. These aggregators are more efficient as
they make use of map-side combiner optimizations. parallelaggs can be
composed with other parallelaggs/regular aggregators. However, when
composed with regular aggregators the entire computation is moved
reduce-side.

(defparallelagg dosum :init-var #'identity :combine-var #'+)
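To see why two functions are enough for a combiner-friendly aggregator, here is a plain-Clojure sketch, assuming the init function is applied to each tuple and the combine function merges partial results. `parallel-agg` and the partition layout are hypothetical and only imitate the map-side/reduce-side split; this is not Cascalog's implementation.

```clojure
;; Plain-Clojure analogy only; nothing here calls Cascalog.
;; `partitions` is a seq of non-empty seqs, standing in for the values
;; that different map tasks would see for the same group.
(defn parallel-agg [init-fn combine-fn partitions]
  (->> partitions
       ;; "map-side": each partition is reduced to one partial result
       (map (fn [part] (reduce combine-fn (map init-fn part))))
       ;; "reduce-side": the partial results are merged
       (reduce combine-fn)))

;; Sum, like dosum above: init is identity, combine is +.
(def partitioned-sum (parallel-agg identity + [[1 3] [5 7]]))
(def single-sum      (parallel-agg identity + [[1 3 5 7]]))

;; Count: every value starts as 1, and partial counts are added.
(def partitioned-count (parallel-agg (constantly 1) + [[1 3] [5 7]]))
```

Because the combine step is associative, the answer does not depend on how the values were split, which is what lets the work be done before the shuffle.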

Vanilla Clojure functions can also be used as operations. When given
no output vars they work as filterops, when given output vars they
work as mapops. The drawback of using a regular Clojure function is
that it can't be inserted dynamically into a query. For example:

(defn mk-query [op] (<- [?a ?b] (test _ ?a) (op ?a :> ?b)))

The "op" passed to that function must be defined using one of
Cascalog's "def" macros and can't be a regular Clojure function. This
is because Cascalog uses the var name of functions to distribute the
operation across the cluster.
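A rough illustration of the idea, in plain Clojure: a function held in a var can be turned into a name, shipped as a string, and looked up again on another machine that has the same code loaded. The helpers `var->name` and `name->fn` are hypothetical and are not Cascalog's actual serialization code.

```clojure
;; Plain-Clojure illustration only; nothing here calls Cascalog.
(defn plus-one [x] (inc x))

;; Turn a var into a "namespace/name" string that is easy to send around.
(defn var->name [v]
  (let [m (meta v)]
    (str (ns-name (:ns m)) "/" (:name m))))

;; Look the function up again from that string.
(defn name->fn [s]
  (deref (resolve (symbol s))))

(def shipped (var->name #'plus-one))   ; a plain string
(def rebuilt (name->fn shipped))       ; the same function, found by name
```

An anonymous `(fn [x] (inc x))` has no var to name, so there is nothing comparable to look up on the other side.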

Hope that helps! Let me know if you have more questions.

-Nathan



 
From: Robert Malko <robma...@gmail.com>
Date: Tue, 9 Nov 2010 12:52:19 -0800 (PST)
Local: Tues, Nov 9 2010 3:52 pm
Subject: Re: Description of def macros
Hi Nathan,

Thanks for this.  Can you please elaborate on defaggregateop?  You
said it was more restricted and then gave an example but I'm not sure
what the output is and if it has to follow a certain form.  Other than
that, everything else makes sense.

Best



 
From: nathanmarz <nathan.m...@gmail.com>
Date: Tue, 9 Nov 2010 13:40:49 -0800 (PST)
Local: Tues, Nov 9 2010 4:40 pm
Subject: Re: Description of def macros
No problem. So if we look again at the "dosum" example:

(defaggregateop dosum
([] 0)
([state val] (+ state val))
([state] [state]))

An aggregateop accumulates some state over the course of the
aggregation. The code body with no parameters sets the initial value
of the state.

The code body with >1 parameter is the "accumulation" function. It
takes a state value and a tuple in the aggregation and returns a new
state value. In this case, it receives a 1-tuple containing the next
value in the grouping to add to the sum. If the aggregator took in
2-tuples as input, that piece of code would take 3 parameters (1 for the
state, 2 for the tuple).

The code body with 1 parameter is the "return" function. It takes in
the totally accumulated state and returns a seq of tuples as output.
In this case it just returns the state as-is. "[state]" is equivalent
to saying "[[state]]": 1-tuples can be written as just values,
without the []. "[state state]" would be the same as saying
"[[state] [state]]" (returning 2 tuples as output).
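The three code bodies can be tried out in plain Clojure as a multi-arity function driven by `reduce`. This is only a sketch of the calling pattern described above, not Cascalog itself; `run-aggregate` and `sum-products` are hypothetical names, and the plain `defn` stands in for `defaggregateop`.

```clojure
;; Plain-Clojure analogy only; nothing here calls Cascalog.
(defn dosum
  ([] 0)                          ; initial state
  ([state val] (+ state val))     ; accumulate one 1-tuple
  ([state] [state]))              ; turn the final state into output tuples

;; Same shape, but fed 2-tuples, so the accumulation body takes 3 parameters.
(defn sum-products
  ([] 0)
  ([state x y] (+ state (* x y)))
  ([state] [state]))

;; Drive an aggregator over the tuples of one group:
;; start from (agg), spread each tuple into the accumulation body,
;; then call the 1-parameter body on the final state.
(defn run-aggregate [agg tuples]
  (agg (reduce (fn [state tuple] (apply agg state tuple))
               (agg)
               tuples)))

(def group-a-sum      (run-aggregate dosum [[1] [3]]))
(def group-a-products (run-aggregate sum-products [[1 2] [3 4]]))
```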

Also, I noticed that my buffer version has a typo. It should be:

(defbufferop dosum [tuples] [(reduce + (map first tuples))])

Hope that helps,
Nathan



 