how to avoid cross-product of two pipes

Sunny Malik

Apr 24, 2014, 3:39:46 AM

to cascadi...@googlegroups.com

Problem statement:

pipeOne has tuples of (integer, List[integer]) with fields kId, kList

e.g.:
1, [3, 4, 9]
2, [6, 7, 8]

pipeTwo has tuples of (integer, List[integer]) with fields cId, cList

e.g.:
101, [10, 6, 3]
102, [6, 8, 10]

I need to intersect every kList with every cList and count each non-empty intersection as one,

i.e. check the list of kId=1 against all available cLists in pipeTwo,

check the list of kId=2 against all available cLists in pipeTwo,

and so on,

like two nested "for" loops in Java.

FYI: I cannot join pipeOne to pipeTwo on kId = cId; they are two different columns.
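
To pin down the expected result, the two nested loops can be sketched in plain Scala on the sample data. This is only an in-memory illustration, not Cascading or Scalding code. The names `NestedLoopBaseline` and `overlappingPairs` are hypothetical, and the sketch assumes that "count each intersection as one" means one count per (kId, cId) pair whose lists share at least one element.

```scala
object NestedLoopBaseline {
  // Sample data from the post: (id, list) tuples.
  val pipeOne: List[(Int, List[Int])] = List(1 -> List(3, 4, 9), 2 -> List(6, 7, 8))
  val pipeTwo: List[(Int, List[Int])] = List(101 -> List(10, 6, 3), 102 -> List(6, 8, 10))

  // Hypothetical helper: every (kId, cId) pair whose lists overlap.
  // This compares all pairs, so it is exactly the cross-product to be avoided.
  def overlappingPairs(
      ks: List[(Int, List[Int])],
      cs: List[(Int, List[Int])]): List[(Int, Int)] =
    for {
      (kId, kList) <- ks
      (cId, cList) <- cs
      if kList.intersect(cList).nonEmpty
    } yield (kId, cId)

  def main(args: Array[String]): Unit = {
    val pairs = overlappingPairs(pipeOne, pipeTwo)
    println(pairs)      // List((1,101), (2,101), (2,102))
    println(pairs.size) // 3
  }
}
```

On the sample data, kId=1 overlaps only with cId=101 (on 3), while kId=2 overlaps with both cId=101 (on 6) and cId=102 (on 6 and 8).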

One approach would be:

1) Do a cross-product of the two pipes:

kId, kList                         cId, cList
1, [3, 4, 9]                       101, [10, 6, 3]
2, [6, 7, 8]   crossproduct with   102, [6, 8, 10]

The cross-product will be:

kId   kList       cId   cList
1     [3, 4, 9]   101   [10, 6, 3]
1     [3, 4, 9]   102   [6, 8, 10]
2     [6, 7, 8]   101   [10, 6, 3]
2     [6, 7, 8]   102   [6, 8, 10]

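.map((kList, cList) -> counts) {
  // intersect is the Scala collections method; List has no intersection method
  val localList = kList.intersect(cList)
  if (localList.nonEmpty) 1   // assumed: a non-empty intersection counts as one
  else 0                      // assumed: no overlap contributes nothing
}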

The above relies on a cross-product, which is a heavy operation, especially for large pipes.

Is there a better way of doing this, maybe using a matrix or something?

I would like to avoid the cross-product.

Any help is appreciated.

Thanks in advance.

-Sunny

Ken Krugler

Apr 24, 2014, 9:05:45 AM

to cascadi...@googlegroups.com

In regular Cascading code, you'd generate an inverse mapping first, then use that to group, and count uniques.

E.g.

kListItem kId

3 1

4 1

9 1

6 2

7 2

8 2

cListItem cId

10 101

6 101

3 101

6 102

8 102

10 102

Then join on kListItem & cListItem. You'll get duplicate matches (e.g. kId=2 and cId=102 match on both 6 and 8), so do a unique on (kId, cId) before counting.
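
The inverse-mapping steps above can be sketched in plain Scala collections on the sample data from the thread. This is an in-memory illustration of the idea only, not Cascading or Scalding API code. The names `InvertedJoin`, `invert` and `matchingPairs` are hypothetical, and the per-kId count at the end is one assumed reading of what should be counted.

```scala
object InvertedJoin {
  type Rows = List[(Int, List[Int])]

  // Flatten (id, list) rows into (item, id) rows: the inverse mapping.
  def invert(rows: Rows): List[(Int, Int)] =
    for { (id, items) <- rows; item <- items } yield (item, id)

  // Join the two inverse mappings on the item, then keep distinct (kId, cId) pairs.
  def matchingPairs(pipeOne: Rows, pipeTwo: Rows): Set[(Int, Int)] = {
    // Index pipeTwo by item so each kListItem only meets the cIds that contain it.
    val cIdsByItem: Map[Int, List[Int]] =
      invert(pipeTwo).groupBy(_._1).map { case (item, rows) => item -> rows.map(_._2) }
    val joined = for {
      (item, kId) <- invert(pipeOne)
      cId <- cIdsByItem.getOrElse(item, Nil)
    } yield (kId, cId)
    joined.toSet // the "unique" step: (2, 102) is produced twice, via 6 and via 8
  }

  def main(args: Array[String]): Unit = {
    val pipeOne = List(1 -> List(3, 4, 9), 2 -> List(6, 7, 8))
    val pipeTwo = List(101 -> List(10, 6, 3), 102 -> List(6, 8, 10))
    val pairs = matchingPairs(pipeOne, pipeTwo)
    println(pairs.toList.sorted) // List((1,101), (2,101), (2,102))
    // Assumed reading of the count: number of matching cIds per kId.
    println(pairs.groupBy(_._1).map { case (k, v) => k -> v.size }.toList.sorted) // List((1,1), (2,2))
  }
}
```

The join only produces rows for items that actually occur in both pipes, so pairs with no overlap such as (1, 102) are never generated. An item that appears in very many lists on both sides can still produce a large number of joined rows.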

-- Ken

--------------------------

Ken Krugler

+1 530-210-6378

http://www.scaleunlimited.com

custom big data solutions & training

Hadoop, Cascading, Cassandra & Solr
