How to perform inner join in SparkR 2.0.0


Shilp

Nov 18, 2016, 4:36:17 PM11/18/16
to SparkR Developers
Hi,
I have two SparkR data frames: table1 with two columns, "EmailID" and "Count", and table2 with columns "ContactID", "Name", and "Address".

Sample table1

EmailID, Count


Sample table2 

ContactID     , Name   , Address
c...@box.net  , "John" , "abc drive"
m...@google.com, "Kevin", "Alpha drive"



I want to perform an inner join using pattern matching between EmailID from table1 and ContactID from table2, so that the result is as follows:

EmailID                                   , Count, Name  , Address
a...@usa.com;b...@verizon.com;c...@box.net,     3, "John", "abc drive"
a...@usa.net;ga...@google.com             ,     2, NULL  , NULL


As you can see, since none of the values in the second row's EmailID matched any ContactID, Name and Address are NULL for that row.
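To make the intended semantics concrete, here is a plain base-R sketch (hypothetical, fully spelled-out sample data, since the addresses above are elided): for each table1 row, each ContactID is tested as a substring of the semicolon-separated EmailID string, and the first match supplies Name and Address, otherwise NA (standing in for SQL NULL).

```r
# Hypothetical sample data illustrating the described matching (not SparkR).
table1 <- data.frame(
  EmailID = c("a@usa.com;b@verizon.com;c@box.net",
              "a@usa.net;ga@google.com"),
  Count   = c(3, 2),
  stringsAsFactors = FALSE)
table2 <- data.frame(
  ContactID = c("c@box.net", "m@google.com"),
  Name      = c("John", "Kevin"),
  Address   = c("abc drive", "Alpha drive"),
  stringsAsFactors = FALSE)

res <- table1
res$Name <- NA
res$Address <- NA
for (i in seq_len(nrow(table1))) {
  # Is any ContactID a literal substring of this row's EmailID string?
  hit <- which(sapply(table2$ContactID, grepl,
                      x = table1$EmailID[i], fixed = TRUE))
  if (length(hit) > 0) {
    res$Name[i]    <- table2$Name[hit[1]]
    res$Address[i] <- table2$Address[hit[1]]
  }
}
res
```

The second row keeps NA/NULL, which is left-join rather than inner-join behavior; an inner join would drop that row entirely.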

I am doing the following in SparkR to perform this join:

registerTempTable(table1, "t1")
registerTempTable(table2, "t2")

tmp = sql(sqlContext, "SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.EmailID LIKE Concat('%',t2.ContactID,'%')") 

But I get the following error:
Error: is.character(x) is not TRUE
trace:
stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), 
    ch), call. = FALSE, domain = NA)
stopifnot(is.character(x))
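For context, my understanding (unverified) is that the error comes from the SparkR 2.0 API change: the SparkSession became the implicit entry point, so `sql()` now takes only the query string, and passing `sqlContext` as the first argument trips the `is.character(x)` check. A hedged sketch of the 2.0-style calls (untested here, and using a LEFT JOIN since the desired output keeps unmatched rows with NULLs):

```r
# SparkR 2.0-style sketch (assumes an active SparkSession; untested).
createOrReplaceTempView(table1, "t1")  # replaces the deprecated registerTempTable
createOrReplaceTempView(table2, "t2")

# sql() in SparkR 2.0 takes only the query string, no sqlContext argument.
tmp <- sql("SELECT t1.*, t2.Name, t2.Address
            FROM t1 LEFT JOIN t2
            ON t1.EmailID LIKE concat('%', t2.ContactID, '%')")
```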

Any help appreciated.