Problem statement:
pipeOne has tuples of (integer, List[integer])
fields: kId, kList
e.g.: 1, [3, 4, 9]
      2, [6, 7, 8]
pipeTwo has tuples of (integer, List[integer])
fields: cId, cList
e.g.: 101, [10, 6, 3]
      102, [6, 8, 10]
I need to intersect every kList with every cList in pipeTwo, and count each pair whose lists share at least one element as one.
i.e. check the kList of kId=1 against every cList in pipeTwo,
then check the kList of kId=2 against every cList in pipeTwo,
and so on,
like two nested "for" loops in Java.
FYI: I cannot join pipeOne to pipeTwo on kId = cId; they are two different columns.
One approach would be:
1) take the cross product of the two pipes

kId  kList                         cId  cList
1    [3, 4, 9]                     101  [10, 6, 3]
2    [6, 7, 8]  cross product with 102  [6, 8, 10]

The cross product will be

kId  kList      cId  cList
1    [3, 4, 9]  101  [10, 6, 3]
1    [3, 4, 9]  102  [6, 8, 10]
2    [6, 7, 8]  101  [10, 6, 3]
2    [6, 7, 8]  102  [6, 8, 10]
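2) map each row of the cross product to 1 or 0:

.map(('kList, 'cList) -> 'count) { lists: (List[Int], List[Int]) =>
  val (kList, cList) = lists
  // elements that appear in both lists
  val localList = kList.intersect(cList)
  // count the pair once if the lists overlap at all
  if (localList.nonEmpty) 1 else 0
}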
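
To make the expected result concrete, here is a minimal sketch of the same logic on plain Scala collections instead of pipes, using the sample data above. The names CrossProductSketch, pairFlags and matchesPerKId are only for illustration, and summing the flags per kId is an assumption about how the counts would be aggregated afterwards.

```scala
object CrossProductSketch {
  // Sample data from the problem statement
  val pipeOne: List[(Int, List[Int])] =
    List(1 -> List(3, 4, 9), 2 -> List(6, 7, 8))
  val pipeTwo: List[(Int, List[Int])] =
    List(101 -> List(10, 6, 3), 102 -> List(6, 8, 10))

  // Brute force: pair every kList with every cList (the cross product)
  // and emit 1 when the two lists share at least one element, else 0.
  def pairFlags(
      left: List[(Int, List[Int])],
      right: List[(Int, List[Int])]
  ): List[(Int, Int, Int)] =
    for {
      (kId, kList) <- left
      (cId, cList) <- right
    } yield {
      val flag = if (kList.intersect(cList).nonEmpty) 1 else 0
      (kId, cId, flag)
    }

  // Assumed aggregation: number of overlapping cLists per kId.
  def matchesPerKId(flags: List[(Int, Int, Int)]): Map[Int, Int] =
    flags.groupBy(_._1).map { case (kId, rows) => kId -> rows.map(_._3).sum }

  def main(args: Array[String]): Unit = {
    val flags = pairFlags(pipeOne, pipeTwo)
    flags.foreach(println)
    println(matchesPerKId(flags))
  }
}
```

For the sample data, the pair (1, 102) is the only one with no common element, so kId=1 overlaps one cList and kId=2 overlaps two.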
The above relies on a cross product, which is a heavy operation, especially for large pipes.
Is there a better way of doing this, maybe using a matrix or something similar?
I would like to avoid the cross product.
Any help is appreciated.
Thanks in advance.
-Sunny