Hi everybody,
I received an R script from my BI team that does not support Hadoop, so I have to add RHadoop handling myself.
I found a good GitHub project containing R scripts that implement linear regression as MapReduce jobs.
So far everything works fine locally.
After I switched to Hadoop mode, I got this error:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
I already know that this error is related to the step() function, which should fit my linear regression model stepwise.
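As far as I understand, "subprocess failed with code 1" only says that the R process exited with a non-zero status; the actual R error message usually ends up in the stderr log of the failed task attempt. A minimal sketch of how the reduce function could be wrapped so that the real error is written to stderr before the job fails. The name with_error_log is hypothetical and not part of rmr2:

```r
# Hypothetical wrapper (not part of rmr2): runs a reduce function and
# reports any R error on stderr before re-raising it.
with_error_log <- function(f) {
  function(key, val.list) {
    tryCatch(
      f(key, val.list),
      error = function(e) {
        # message() writes to stderr, so it should show up in the task log
        message("reduce failed for key ", format(key), ": ", conditionMessage(e))
        stop(e)  # re-raise so the task still fails visibly
      }
    )
  }
}

# Local check with a reduce body that fails on purpose
failing <- with_error_log(function(key, val.list) stop("boom"))
res <- tryCatch(failing(1, list()), error = function(e) conditionMessage(e))
```

The wrapped function would then be passed as reduce = with_error_log(my_reduce), where my_reduce stands for the reduce function shown further down.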
Could somebody help me get that stepwise fitting to work inside the reduce job?
I think this code snippet should be enough to start with:
reduce = function(key, val.list) {
  ...
  null = lm(target ~ 1, data = df)
  full = lm(target ~ ., data = df)
  fit = step(null, scope = list(upper = full), data = df, direction = "both")
  out <- pmml(fit, data = df)
  keyval(key, out)
}
df is the data.frame that contains all model factors.
Both models, null and full, are created successfully, but the step() call that produces fit crashes the reduce job.
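One possible cause worth checking, stated as an assumption and not a confirmed diagnosis: step() prints its progress to stdout by default, and Hadoop streaming uses stdout to pass data back from the R process, so that output may corrupt the stream. A minimal local sketch with trace = 0, using the built-in mtcars data as a hypothetical stand-in for df:

```r
# Hypothetical stand-in for df: built-in mtcars with mpg renamed to target
df <- mtcars
names(df)[names(df) == "mpg"] <- "target"

null <- lm(target ~ 1, data = df)
full <- lm(target ~ ., data = df)

# trace = 0 suppresses the progress output that step() writes to stdout
fit <- step(null, scope = list(upper = formula(full)),
            direction = "both", trace = 0)

# Diagnostics go to stderr via message(), not to stdout
message("selected formula: ", paste(deparse(formula(fit)), collapse = " "))
```

If the job still fails with trace = 0, the pmml() call or a package missing on the worker nodes would be the next things to check.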
Is it possible to implement a for loop over the candidate models and do the stepwise fitting manually, so that Hadoop is able to parallelize it?
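A rough sketch of what such a manual loop might look like, assuming plain forward selection by AIC (not the "both" direction used above) and column names that are valid R names. The helper forward_select is hypothetical, and mtcars again stands in for df. Note that a loop inside a single reduce call still runs sequentially; as far as I understand, Hadoop only parallelizes across keys:

```r
# Hypothetical helper: forward selection by AIC with an explicit loop.
# It produces no console output, so nothing is written to stdout.
forward_select <- function(df, target = "target") {
  candidates <- setdiff(names(df), target)
  selected <- character(0)
  best_fit <- lm(reformulate("1", response = target), data = df)
  best_aic <- AIC(best_fit)
  while (length(candidates) > 0) {
    # Fit one model per remaining candidate added to the current selection
    fits <- lapply(candidates, function(v) {
      lm(reformulate(c(selected, v), response = target), data = df)
    })
    aics <- vapply(fits, AIC, numeric(1))
    i <- which.min(aics)
    if (aics[i] >= best_aic) break  # no candidate improves the AIC
    best_aic <- aics[i]
    best_fit <- fits[[i]]
    selected <- c(selected, candidates[i])
    candidates <- candidates[-i]
  }
  best_fit
}

# Local usage: add target as a copy of mpg, then drop the mpg column
demo_df <- transform(mtcars, target = mpg)[, -1]
fit <- forward_select(demo_df)
```

To let Hadoop spread this work, each candidate model (or each data subset) would need its own key so that it is handled by a separate reduce call.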
Regards,
suerte