join with maps


Bruno Bonacci

Aug 2, 2016, 7:47:13 AM
to cascalog-user
Hi, 

I'm using Cascalog 3.0.0 and I have an issue while trying to join two datasets whose records are maps.
Here is a sample of the join:


(def src1 [[{:name "john" :age 39}]
           [{:name "fred" :age 28}]])

(def src2 [[{:name "john" :city "london"}]
           [{:name "fred" :city "paris"}]])

(??<- [?out]
      (src1 ?p)
      (src2 ?c)
      (get ?p :name :> ?name)
      (get ?c :name :> ?name)
      (merge ?p ?c :> ?out))



When I run this, I get the following error:


IllegalArgumentException Unable to join predicates together


Does anyone have any idea why I can't join maps in this way?


Bruno
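For reference, the result the query above is meant to produce can be computed in memory with plain Clojure. This is only a sketch of the intended semantics, not a Cascalog workaround, and it says nothing about why the Cascalog planner rejects the query. It assumes the sample records are flattened from 1-tuples into plain maps; `clojure.set/join` performs a natural join on the keys the two collections share (here only `:name`) and merges each matching pair.

```clojure
(require '[clojure.set :as set])

;; Same sample records as in the question, flattened from 1-tuples into maps.
(def people [{:name "john" :age 39}
             {:name "fred" :age 28}])

(def cities [{:name "john" :city "london"}
             {:name "fred" :city "paris"}])

;; Natural join on the shared key (:name); each matching pair is merged,
;; which is what (merge ?p ?c :> ?out) is meant to do per joined row.
(def joined (set/join people cities))

;; joined is a set equal to
;; #{{:name "john" :age 39 :city "london"}
;;   {:name "fred" :age 28 :city "paris"}}
```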

Igor Postelnik

Aug 3, 2016, 9:49:51 AM
to cascalog-user
Joins are generated only when fields are used with generators. In other contexts, bound fields have the effect of filtering.

You need to break up your query as follows. First, make a new function that returns the join key and the map, like this:

(defn with-key
  [gen k]
  (<- [?k ?m]
      (gen ?m)
      (get ?m k :> ?k)))

Then use it in your query like this:

(<- [?m]
      ((with-key src1 :name) ?name ?a)
      ((with-key src2 :name) ?name ?b)
      (merge ?a ?b :> ?m))

-Igor
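As an in-memory illustration of the idea behind `with-key` (plain Clojure, not Cascalog, so it cannot reproduce or explain any runtime behaviour of the Cascalog query): each source is first turned into `[key map]` pairs, and the join then happens on the extracted key. `with-key*` and `join-on-key` are hypothetical helper names for this sketch, and the records are assumed to be flattened into plain maps.

```clojure
;; Hypothetical in-memory analogue of the `with-key` subquery:
;; emit [join-key record] pairs from a collection of maps.
(defn with-key* [coll k]
  (map (fn [m] [(get m k) m]) coll))

;; Hypothetical helper: inner-join two keyed collections on the key
;; and merge the matching maps, like (merge ?a ?b :> ?m).
(defn join-on-key [left right]
  (let [index (group-by first right)]   ; join key -> [[k m] ...]
    (for [[k a] left
          [_ b] (get index k)]
      (merge a b))))

(def people [{:name "john" :age 39} {:name "fred" :age 28}])
(def cities [{:name "john" :city "london"} {:name "fred" :city "paris"}])

(def result (join-on-key (with-key* people :name) (with-key* cities :name)))
;; result contains {:name "john" :age 39 :city "london"}
;; and {:name "fred" :age 28 :city "paris"}
```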

Bruno Bonacci

Aug 4, 2016, 4:15:32 PM
to cascalog-user
Hi Igor,

Thanks for the hint, but it looks like even this simple example causes a

java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)

I had already tried a similar solution myself, extracting the join key in a subquery, with the same result.
There is certainly something wrong if a join between two records causes an OOM.

Any other suggestion?

thanks 
Bruno

Sam Ritchie

Aug 4, 2016, 4:49:32 PM
to cascal...@googlegroups.com
Hmm, I don't think it's actually true that join keys have to come out of generators. Not sure what's going on here.




--
Sam Ritchie, Stripe Inc.

(Too brief? Here's why! http://emailcharter.org)

Bruno Bonacci

Aug 4, 2016, 6:27:18 PM
to cascalog-user
Hi Sam,

I've seen the same suggestion in another post about a similar issue.
I started by using mapfn to extract the fields from both maps and then join on them,
but I stumbled into another issue. In that post, Nathan suggested
that this might be a Cascalog bug, and that one way to work around it
was to create a sub-query.

(this bug was first seen in 2012)

So I tried the sub-query approach, with a query similar to the one Igor suggested, and I do get an OOM, so now I'm stuck.
Any help would be really appreciated.

Bruno



Igor Postelnik

Aug 5, 2016, 10:14:59 AM
to cascalog-user
Can you post the code that generates the OOM?

I haven't tried my sample above, but I've written plenty of queries that join generators that have map values. 

-Igor

Bruno Bonacci

Aug 5, 2016, 11:06:44 AM
to cascalog-user
Hi Igor,

Your sample already generates the OOM with the sample sources from my first message.

Bruno

Bruno Bonacci

Aug 12, 2016, 5:14:23 AM
to cascalog-user
Hi,

So, it looks like there is no solution for this?

Bruno
