step {stats} in Reduce job

Suerte

Nov 7, 2015, 9:25:45 AM
to RHadoop
Hi everybody,

I got an R script from my BI team that does not support Hadoop, so I have to add the RHadoop handling myself.

I found a good GitHub project containing R scripts that implement linear regression as MapReduce jobs.
So far everything works fine locally.
After I switched to Hadoop mode, I got an error:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1

I already know that this error is related to the step function, which should fit my linear regression model stepwise.

Could somebody help me handle that stepwise fitting in the reduce job?
I think this code snippet should be enough to start with:

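reduce = function(key, val.list) {
  ...

  # intercept-only model and full model with all predictors
  null = lm(target ~ 1, data = df)
  full = lm(target ~ ., data = df)
  # stepwise selection between null and full; this is the call that crashes
  fit = step(null, scope = list(upper = full), data = df, direction = "both")

  # export the selected model as PMML and emit it under the incoming key
  out <- pmml(fit, data = df)
  keyval(key, out)
}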

df is the data.frame that contains all model factors.
Both models, null and full, are created successfully, but computing fit crashes the reduce job.
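
One way to narrow down a crash like this is to catch the R error and print it, since "subprocess failed with code 1" does not say what went wrong inside R. The following is only a minimal sketch: safe_step is a hypothetical helper name, the column name target is taken from the snippet above, and the small data frame built from mtcars merely stands in for df. It assumes that text written with message() goes to stderr and is therefore captured in the task's stderr log under Hadoop streaming.

```r
# Hypothetical helper (not part of rmr2): fit both models and run step()
# inside tryCatch so that an R error is reported instead of being lost.
safe_step <- function(df) {
  tryCatch({
    null <- lm(target ~ 1, data = df)   # intercept-only model
    full <- lm(target ~ ., data = df)   # model with all predictors
    # trace = 0 keeps step() from printing its progress table
    step(null, scope = list(upper = full), direction = "both", trace = 0)
  }, error = function(e) {
    # message() writes to stderr
    message("stepwise fit failed: ", conditionMessage(e))
    NULL
  })
}

# Local check; this data frame is a stand-in for the real df
df <- data.frame(target = mtcars$mpg, mtcars[, c("wt", "hp", "qsec")])
fit <- safe_step(df)

# A data frame without a target column triggers the error branch
bad <- safe_step(data.frame(x = 1:3))
```

The helper returns NULL on failure, so the calling code would have to decide what to emit for that key.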

Is it possible to implement a for loop over the models and fit them manually, so that Hadoop is able to parallelize it?
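
To make the idea of a manual loop concrete, here is a rough sketch of greedy forward selection by AIC written as an explicit loop. It is not what step() does with direction="both" (there is no backward step), forward_select is a hypothetical name, and the data frame built from mtcars is only a stand-in for df. Note also that the loop still runs inside a single reduce call, so it would not by itself be spread across the cluster.

```r
# Hypothetical helper: add one predictor per round, always the one that
# lowers AIC the most, and stop when no candidate improves AIC.
forward_select <- function(df, response = "target") {
  candidates <- setdiff(names(df), response)
  selected <- character(0)
  best_fit <- lm(reformulate("1", response = response), data = df)
  best_aic <- AIC(best_fit)
  repeat {
    best_var <- NULL
    for (v in candidates) {
      fit <- lm(reformulate(c(selected, v), response = response), data = df)
      if (AIC(fit) < best_aic) {
        # remember the best candidate seen so far in this round
        best_aic <- AIC(fit)
        best_fit <- fit
        best_var <- v
      }
    }
    if (is.null(best_var)) break          # no candidate improved AIC
    selected <- c(selected, best_var)
    candidates <- setdiff(candidates, best_var)
  }
  best_fit
}

# Local check; this data frame is a stand-in for the real df
df <- data.frame(target = mtcars$mpg, mtcars[, c("wt", "hp", "qsec")])
fit <- forward_select(df)
print(formula(fit))
```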

Regards,
suerte

Suerte

Nov 11, 2015, 6:32:53 AM
to RHadoop
Hi Antonio Piccolboni,

I have often read that you ask for the rmr2 stderr files. Where can I find them on a Hortonworks HDP sandbox?

Regards,
suerte