Need help with Joining


gowrinadh

Dec 3, 2014, 11:24:09 PM
to cascadi...@googlegroups.com
Hi,
I have two scenarios:
1) I have two files, File_A and File_B.
File_A has:
a
b
c
i
File_B has:
h
i
k
d
Z

I am looking for output like this:
a
a
b
c
i
i
I tried joins, but I get this output:
a a
b
c
i i
Is there a way to get the output in the desired format? I tried doing the inner join first and merging its output with File_A. I could get the expected output, but I'm not sure whether that is the right way.
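To make that workaround concrete, here is a minimal in-memory sketch in plain Java of "inner join File_A with File_B, then merge the matched values back with File_A". It does not use any Cascading classes; the class and method names are hypothetical. With the two files as posted, only i appears in both, so only i comes out twice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory emulation; not Cascading code.
class InnerJoinThenMerge {
    // Inner join on equal values (nested loop): one output value per matching pair.
    static List<String> innerJoin(List<String> lhs, List<String> rhs) {
        List<String> matches = new ArrayList<>();
        for (String l : lhs) {
            for (String r : rhs) {
                if (l.equals(r)) {
                    matches.add(l);
                }
            }
        }
        return matches;
    }

    // Merge: all File_A records plus the inner-join matches.
    static List<String> mergeWithLhs(List<String> lhs, List<String> joined) {
        List<String> merged = new ArrayList<>(lhs);
        merged.addAll(joined);
        return merged;
    }

    public static void main(String[] args) {
        List<String> fileA = List.of("a", "b", "c", "i");
        List<String> fileB = List.of("h", "i", "k", "d", "Z");
        List<String> result = mergeWithLhs(fileA, innerJoin(fileA, fileB));
        Collections.sort(result); // sorted only so the result is easy to compare
        System.out.println(result); // [a, b, c, i, i]
    }
}
```

The sort is only there for readability; a merge of two streams should not be relied on to produce any particular order.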

2) I have two files, File_A and File_B.
File_A has:
a
b
c
i
File_B has:
h
i
k
d
Z
I am looking for output like this:
h
k
d
That is, I need to drop the records from File_B that are present in File_A.


Please help me with these scenarios.

Thanks,
Gowri



Ken Krugler

Dec 4, 2014, 12:18:44 AM
Hi Gowri,


For scenario 1, it feels like there must be a better approach, but the blunt-instrument way is to split the result of a left join into two pipes, keeping only the left-side values in one and only the right-side values in the other, then filter the nulls out of the right side and merge the two. Something like this (assuming lhsPipe has the values from File_A and rhsPipe has the values from File_B):

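        import cascading.operation.Identity;
        import cascading.operation.filter.FilterNull;
        import cascading.pipe.CoGroup;
        import cascading.pipe.Each;
        import cascading.pipe.Merge;
        import cascading.pipe.Pipe;
        import cascading.pipe.joiner.LeftJoin;
        import cascading.tuple.Fields;

        // Assumes lhsPipe declares "lhs-id" (File_A) and rhsPipe declares "rhs-id" (File_B).
        Fields lhsFields = new Fields("lhs-id");
        Fields rhsFields = new Fields("rhs-id");
        // LeftJoin keeps every File_A record; "rhs-id" is null where File_B has no match.
        Pipe joinPipe = new CoGroup(lhsPipe, lhsFields, rhsPipe, rhsFields, new LeftJoin());

        // Branch 1: every left-side value, renamed to "id".
        Pipe lhsValid = new Pipe("lhs-valid", joinPipe);
        lhsValid = new Each(lhsValid, lhsFields, new Identity(new Fields("id")));

        // Branch 2: right-side values renamed to "id", minus the nulls from unmatched rows.
        Pipe rhsValid = new Pipe("rhs-valid", joinPipe);
        rhsValid = new Each(rhsValid, rhsFields, new Identity(new Fields("id")));
        rhsValid = new Each(rhsValid, new FilterNull());

        // Both branches now declare only "id", so they can be merged.
        Pipe result = new Merge(lhsValid, rhsValid);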
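The tuple flow of that assembly can be traced with a small in-memory emulation in plain Java. This is a sketch, not Cascading code: the Joined class and the helper names are hypothetical stand-ins for the joined ("lhs-id", "rhs-id") tuples, the LeftJoin, the two branches (Identity, plus FilterNull on the right side), and the Merge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory emulation of the left-join / split / merge flow.
class LeftJoinSplitMerge {
    // Stand-in for a joined ("lhs-id", "rhs-id") tuple; rhs is null when File_B has no match.
    static final class Joined {
        final String lhs;
        final String rhs;

        Joined(String lhs, String rhs) {
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    // Nested-loop left outer join: every lhs record appears at least once.
    static List<Joined> leftJoin(List<String> lhs, List<String> rhs) {
        List<Joined> out = new ArrayList<>();
        for (String l : lhs) {
            boolean matched = false;
            for (String r : rhs) {
                if (l.equals(r)) {
                    out.add(new Joined(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Joined(l, null)); // unmatched lhs record gets a null rhs value
            }
        }
        return out;
    }

    // All lhs values, then the non-null rhs values.
    static List<String> splitAndMerge(List<Joined> joined) {
        List<String> merged = new ArrayList<>();
        for (Joined j : joined) {
            merged.add(j.lhs); // "lhs-valid" branch
        }
        for (Joined j : joined) {
            if (j.rhs != null) { // "rhs-valid" branch after FilterNull
                merged.add(j.rhs);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> fileA = List.of("a", "b", "c", "i");
        List<String> fileB = List.of("h", "i", "k", "d", "Z");
        System.out.println(splitAndMerge(leftJoin(fileA, fileB))); // [a, b, c, i, i]
    }
}
```

With the files as posted, only i is in both, so the left join yields (a, null), (b, null), (c, null), (i, i), and the merged result contains i twice.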

Regarding scenario 2 (File_B has h, i, k, d, Z, and the desired output is h, k, d):

What happened to the last record from File_B ('Z')?

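Using the same lhsPipe/rhsPipe approach as above, you'd do a RightJoin, which keeps every File_B record and leaves "lhs-id" null where File_A has no match, then remove any record where the "lhs-id" field is not null, and finally keep only the "rhs-id" field.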

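        // Not and FilterNull: cascading.operation.filter; RightJoin: cascading.pipe.joiner; Retain: cascading.pipe.assembly.
        Fields lhsFields = new Fields("lhs-id");
        Fields rhsFields = new Fields("rhs-id");
        // RightJoin keeps every File_B record; "lhs-id" is null where File_A has no match.
        Pipe joinPipe = new CoGroup(lhsPipe, lhsFields, rhsPipe, rhsFields, new RightJoin());
        // Not(FilterNull) removes tuples whose "lhs-id" is not null, i.e. records also in File_A.
        joinPipe = new Each(joinPipe, lhsFields, new Not(new FilterNull()));
        // Keep only the File_B value.
        joinPipe = new Retain(joinPipe, rhsFields);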
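A similar plain-Java emulation shows why the right join plus the null filter leaves exactly the records that occur only in File_B, including Z. As before, this is not the Cascading API; the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory emulation of the right-join / filter / retain flow.
class RightJoinAntiJoin {
    // Stand-in for a joined ("lhs-id", "rhs-id") tuple; lhs is null when File_A has no match.
    static final class Joined {
        final String lhs;
        final String rhs;

        Joined(String lhs, String rhs) {
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    // Nested-loop right outer join: every rhs record appears at least once.
    static List<Joined> rightJoin(List<String> lhs, List<String> rhs) {
        List<Joined> out = new ArrayList<>();
        for (String r : rhs) {
            boolean matched = false;
            for (String l : lhs) {
                if (l.equals(r)) {
                    out.add(new Joined(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Joined(null, r)); // unmatched rhs record gets a null lhs value
            }
        }
        return out;
    }

    // Drop rows whose lhs is not null, then keep only the rhs value (the Retain step).
    static List<String> onlyInRhs(List<Joined> joined) {
        List<String> out = new ArrayList<>();
        for (Joined j : joined) {
            if (j.lhs == null) {
                out.add(j.rhs);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fileA = List.of("a", "b", "c", "i");
        List<String> fileB = List.of("h", "i", "k", "d", "Z");
        System.out.println(onlyInRhs(rightJoin(fileA, fileB))); // [h, k, d, Z]
    }
}
```

Only i is matched, so it is the one record the filter removes; h, k, d and Z all carry a null "lhs-id" and survive.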

-- Ken

--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr





gowrinadh

Dec 4, 2014, 12:54:42 AM
Oops, I missed Z. It should be there.
Thank you very much Ken.