Equijoin issues RMR 3.2.0

Abhishek Gayakwad

Nov 10, 2014, 4:52:24 AM
to rha...@googlegroups.com

Hey Antonio,

We are using RMR version 3.2.0 and are running into problems with equijoin. Our keys are character vectors and our values are data frames. We have seen two issues:

1. Duplicate rows for some specific keys (this is not random behaviour; the duplicates are generated for the same keys every time)
2. The order of keys and values is also not the same in the output of equijoin

However, I have not been able to reproduce it with small data or in local mode. I am trying hard to come up with a test case.
Meanwhile, if you are already aware of such an issue, could you please share the details with us?

Thanks 
Abhishek

Abhishek Gayakwad

Nov 10, 2014, 8:00:35 AM
to rha...@googlegroups.com
Here is the test case. With strings as keys, equijoin returns incorrect results, but with a data frame as the key it works fine. To test it with data frames as keys, uncomment the two commented lines in the code.
____________________________________________________________________________


library(rmr2)

Key = c(381897554,345814151,380662040)
DCId = c(10,4905,7051)
Style = c(139139,983274,139146)
SeasonType=c("B","B","B")
AverageSeasonality=c(1.280740,1.280740,1.272458)
AverageDemand=c(5.1442942,0.2230782,0.2230782)
V1 =c(1.1689322,0.8658043,1.6103202)

histval = data.frame(Key=Key,DCId=DCId,Style=Style,SeasonType=SeasonType,AverageSeasonality=AverageSeasonality,AverageDemand=AverageDemand)
futval = data.frame(Key=Key,DCId=DCId,Style=Style,SeasonType=SeasonType,V1=V1)

futPath = to.dfs(futval)
histPath = to.dfs(histval)

mapperWithoutKeys <- function(k,v){
    keyval(paste(v$Key,v$DCId,v$Style,v$SeasonType, sep="-"), v[-c(1:4)])
    #keyval(data.frame(Key=v$Key,DCId=v$DCId,Style=v$Style,SeasonType = v$SeasonType),v[-c(1:4)])
}
mapper <- function(k,v){
    keyval(paste(v$Key,v$DCId,v$Style,v$SeasonType, sep="-"),v)
    #keyval(data.frame(Key=v$Key,DCId=v$DCId,Style=v$Style,SeasonType = v$SeasonType),v)
}


ejHist = mapreduce(input=histPath, map =mapper)
from.dfs(ejHist)
ejFut = mapreduce(input=futPath, map =mapperWithoutKeys)
from.dfs(ejFut)

reducer <- function(k, l, r){
    # by=c() merges on no columns, i.e. a cross join of the left and right values for this key
    val <- merge(l, r, by=c())
    keyval(k, val)
}

ejOut = equijoin(left.input = ejHist ,right.input =ejFut , reduce =reducer)
from.dfs(ejOut)

____________________________________________________________________________
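
For comparison, here is a minimal base-R sketch of what the join in the test case above would be expected to produce, assuming each composite string key should match exactly one row on each side. It does not use rmr2 or Hadoop at all; the names `key_of`, `join_key`, `left`, `right`, and `expected` are illustrative only and are not part of the rmr2 API.

```r
# Same sample data as in the test case
Key <- c(381897554, 345814151, 380662040)
DCId <- c(10, 4905, 7051)
Style <- c(139139, 983274, 139146)
SeasonType <- c("B", "B", "B")
AverageSeasonality <- c(1.280740, 1.280740, 1.272458)
AverageDemand <- c(5.1442942, 0.2230782, 0.2230782)
V1 <- c(1.1689322, 0.8658043, 1.6103202)

histval <- data.frame(Key = Key, DCId = DCId, Style = Style, SeasonType = SeasonType,
                      AverageSeasonality = AverageSeasonality, AverageDemand = AverageDemand)
futval <- data.frame(Key = Key, DCId = DCId, Style = Style, SeasonType = SeasonType, V1 = V1)

# Build the composite string key the same way the mappers do (hypothetical helper)
key_of <- function(v) paste(v$Key, v$DCId, v$Style, v$SeasonType, sep = "-")

# Left side keeps all columns, right side drops the four key columns,
# mirroring mapper and mapperWithoutKeys
left <- cbind(join_key = key_of(histval), histval)
right <- cbind(join_key = key_of(futval), futval[-c(1:4)])

# Plain in-memory join on the composite key
expected <- merge(left, right, by = "join_key")

# One row per key and no duplicated keys
stopifnot(nrow(expected) == nrow(histval), !anyDuplicated(expected$join_key))
print(expected)
```

Comparing the number of rows and the set of keys in `from.dfs(ejOut)` against `expected` is one way to spot duplicated keys; the row order of a distributed join may legitimately differ, so the comparison should be made on sorted keys.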

Antonio Piccolboni

Nov 10, 2014, 12:12:18 PM
to rha...@googlegroups.com
Thanks for the excellent report. This is now bug https://github.com/RevolutionAnalytics/rmr2/issues/147. Let's continue the discussion there and close it here.


Antonio