An error in fitRandomForest.R (Kaggle Data Set on Bulldozer Sale Prices)


刘文杰

Dec 9, 2014, 10:52:13 PM
to rha...@googlegroups.com
I found the fitRandomForest.R script on GitHub (https://github.com/cloudera/poisson_sampling/blob/master/src/fitRandomForest.R), where you are listed as a contributor. I ran both scripts (joindata.R and fitRandomForest.R)
on the Kaggle Bulldozer Sale Prices data set. I changed the input and output paths to my own directories and got
the following error:
   
14/12/10 10:37:47 INFO mapreduce.Job: The url to track the job: http://bigdata01:8088/proxy/application_1415692952566_0207/
14/12/10 10:37:47 INFO mapreduce.Job: Running job: job_1415692952566_0207
14/12/10 10:37:58 INFO mapreduce.Job: Job job_1415692952566_0207 running in uber mode : false
14/12/10 10:37:58 INFO mapreduce.Job:  map 0% reduce 0%
14/12/10 10:38:04 INFO mapreduce.Job:  map 50% reduce 0%
14/12/10 10:38:04 INFO mapreduce.Job: Task Id : attempt_1415692952566_0207_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
     ......
14/12/10 10:38:25 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  :
  hadoop streaming failed with error code 1


Package versions: rhdfs_1.0.8.tar.gz, rmr2_3.2.0.tar.gz
R version: R-2.15.3.tar.gz
Should I set do.trace to FALSE with my version of the RHadoop packages? Where can I find more details? Thank you for your attention; I look forward to your answer.

Antonio Piccolboni

Jan 5, 2015, 12:00:18 PM
to rha...@googlegroups.com
I can't support all rmr2-based code written by others, even my famous colleagues at Cloudera. But I'd recommend following the rmr2 debugging guidelines if you want to get to the bottom of this. The link is at the top of the message list.
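
The "subprocess failed with code 1" line only says that the R process behind a map task exited with an error; the R-level message itself ends up in that task's stderr log. As a rough sketch of one way to make that message easier to spot (log_errors and toy_map are hypothetical names used for illustration, not part of rmr2 or of fitRandomForest.R), a map function could be wrapped so that it reports the failure on stderr before re-raising it:

```r
# Hypothetical helper (not part of rmr2): wraps a map function so that any
# R error is written to stderr with some context before being re-raised.
log_errors <- function(map_fn, label = "map") {
  function(k, v) {
    tryCatch(
      map_fn(k, v),
      error = function(e) {
        # message() writes to stderr, which Hadoop streaming keeps in the task logs
        message(label, " failed: ", conditionMessage(e))
        message("class of value: ", paste(class(v), collapse = ", "))
        stop(e)  # re-raise so the task still fails visibly
      }
    )
  }
}

# Toy map function standing in for the real one; it fails on non-numeric input.
toy_map <- function(k, v) sum(v)

wrapped <- log_errors(toy_map)
ok <- wrapped(NULL, c(1, 2, 3))   # runs normally
failed <- inherits(try(wrapped(NULL, "a"), silent = TRUE), "try-error")
```

In an rmr2 job the wrapped function would presumably be passed as the map argument of mapreduce(). It may also help to run the same job first with rmr.options(backend = "local"), which runs it without Hadoop so that R errors show up directly in the console.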