a join function that can be used in a reduce


Andrew Xue

Jan 14, 2013, 9:23:28 PM
to clo...@googlegroups.com
Hi --

I'm building some generic join functionality on top of Cascalog, the Clojure library for big data processing.

I have a join function that is defined like this:

(defn join [lhs rhs join-ons] ..implementation ...)

lhs and rhs are subqueries. The join then returns another query that is the join of the two.

join-ons in this case would look something like this: [{:lhs "user_id_lhs" :rhs "user_id_rhs" :as "user_id"}]

join-ons is a vector of join conditions, one map per condition. In SQL, the map above would be like "lhs join rhs on lhs.user_id_lhs = rhs.user_id_rhs". The :as key renames the join variable in the result of the join.
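To make the join-ons semantics concrete, here is a minimal in-memory sketch. It is not Cascalog code: `join-rows` is a hypothetical stand-in that treats lhs and rhs as sequences of maps keyed by field name, and it assumes every condition carries an :as key.

```clojure
;; Hypothetical in-memory stand-in for the Cascalog-based join.
;; lhs and rhs are sequences of maps, not subqueries.
(defn join-rows [lhs rhs join-ons]
  (for [l lhs
        r rhs
        ;; keep a pair of rows only if every join condition matches
        :when (every? (fn [jc] (= (get l (:lhs jc)) (get r (:rhs jc))))
                      join-ons)]
    ;; merge the two rows, then replace each pair of join fields
    ;; with a single field named by :as
    (reduce (fn [row jc]
              (-> row
                  (dissoc (:lhs jc) (:rhs jc))
                  (assoc (:as jc) (get l (:lhs jc)))))
            (merge l r)
            join-ons)))

(def users  [{"user_id_lhs" 1 "name" "ann"}
             {"user_id_lhs" 2 "name" "bob"}])
(def orders [{"user_id_rhs" 1 "item" "book"}
             {"user_id_rhs" 3 "item" "pen"}])

(def joined
  (join-rows users orders
             [{:lhs "user_id_lhs" :rhs "user_id_rhs" :as "user_id"}]))
```

Only user 1 appears on both sides, so `joined` holds a single row with "name", "item", and the renamed "user_id" field.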

This works OK, and it's basically modeled after joins in SQL.

However, it would be nice to be able to do something like this:

(reduce join [query1 query2 query3 ... queryN])

I am having trouble picturing how the join-ons would work in this case, though ...

JM Ibanez

Jan 15, 2013, 8:20:15 AM
to clo...@googlegroups.com
Hi Andrew,
The simplest way to do this is to define a function generator:

  (defn generate-join-fn [join-ons]
    (fn [lhs rhs]
      (let [join-on-cond (join-ons [lhs rhs])]
        (join lhs rhs join-on-cond))))

This will then be used as such:
  
  (reduce
   (generate-join-fn {[my-left my-right]  {:lhs "myleft" :rhs "myright"}
                      [my-right my-final] {:lhs "myright" :rhs "myfinal"}})
   [my-left my-right my-final])

The key is having a function that returns the actual reducing function.
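One caveat with looking conditions up by [lhs rhs]: on the second step of the reduce, lhs is the result of the first join rather than my-right, so as far as I can tell a key like [my-right my-final] would not be found. A possible variation, sketched below, pairs each later query with the join-ons that attach it to the accumulated result. `join-all`, the :query/:join-ons step maps, and `describe-join` are hypothetical names for illustration; the join function is passed in and only assumed to take (lhs rhs join-ons) as in the original post.

```clojure
;; Sketch: fold a sequence of {:query ... :join-ons ...} steps onto a
;; starting query. `join` is any function of (lhs rhs join-ons).
(defn join-all [join first-query steps]
  (reduce (fn [acc step]
            ;; acc is the result of all joins so far
            (join acc (:query step) (:join-ons step)))
          first-query
          steps))

;; Stub join that only records how it was called, so the nesting is visible
;; without Cascalog on the classpath.
(defn describe-join [lhs rhs join-ons]
  [:join lhs rhs join-ons])

(def plan
  (join-all describe-join
            :my-left
            [{:query :my-right
              :join-ons [{:lhs "myleft" :rhs "myright" :as "id"}]}
             {:query :my-final
              :join-ons [{:lhs "id" :rhs "myfinal" :as "id"}]}]))
```

Here the second step joins on "id", the name the first step gave the join variable via :as, since that is the field the accumulated result would expose.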


HTH,