hadoop streaming failed with error code 1


fatma elsafoury

Aug 26, 2015, 9:55:46 AM
to RHadoop


Hi,

It's my first time using Hadoop and R. I'm trying to run a word count in R, but I always get this error. I'm using Hadoop 2.7.1 on Mac OS X 10.9.5 with R 3.2.1.

Error: Could not find or load main class org.apache.hadoop.util.RunJar
 Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1 
---------------------------------------------------------------------------------------------------------------

The R code is 
--------------------------------

Sys.setenv("HADOOP_PREFIX"="/usr/local/Cellar/hadoop/2.7.1")
Sys.setenv("HADOOP_CMD"="/usr/local/CELLAR/hadoop/2.7.1/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")

library(rmr2) 


## map function
map <- function(k,lines) {
  words.list <- strsplit(lines, '\\s') 
  words <- unlist(words.list)
  return( keyval(words, 1) )
}

## reduce function
reduce <- function(word, counts) { 
  keyval(word, sum(counts))
}

wordcount <- function (input, output=NULL) { 
  mapreduce(input=input, output=output, input.format="text", 
            map=map, reduce=reduce)
}


## delete previous result if any
#system("/Users/hadoop/hadoop-1.1.2/bin/hadoop fs -rmr hdfs://localhost:9000/data/wordcount/outcome")

## Submit job
hdfs.root <- '/wordscount/data'
hdfs.data <- file.path(hdfs.root, 'README.text') 
hdfs.out <- file.path(hdfs.root, 'res') 
out <- wordcount(hdfs.data, hdfs.out)

## Fetch results from HDFS
results <- from.dfs(out)

## check top 30 frequent words
results.df <- as.data.frame(results, stringsAsFactors=FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df[order(results.df$count, decreasing=TRUE), ], 30)

Antonio Piccolboni

Aug 26, 2015, 11:16:17 AM
to RHadoop
It looks like a Hadoop problem on which rmr has no bearing. Did you try one of the streaming examples packaged with Hadoop to see if it works? HADOOP_PREFIX is not a variable that rmr2 uses, but setting it shouldn't hurt.


--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Fatma Amin

Aug 26, 2015, 11:55:27 AM
to rha...@googlegroups.com

I tried the Hadoop streaming jar from the Mac terminal with a mapper and a reducer written in R, and the job also failed. So I don't know whether the problem comes from Hadoop streaming or from R.

Here is the command I used:

hadoop jar /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /wordscount/data/README.text  -output /wordscount/data/res2 -mapper /Users/fatma/Work/RHadoop/mapper.R -reducer /Users/fatma/Work/RHadoop/reducer.R

15/08/26 15:49:11 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

Can you tell me how to try one of the streaming examples packaged with Hadoop?
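
One way to narrow this down without Hadoop is to run the mapper and reducer through a local pipeline that mimics what streaming does (input, mapper, sort, reducer). The sketch below assumes a POSIX shell; `simulate_streaming` is a hypothetical helper name, not part of Hadoop. Streaming launches the mapper and reducer as ordinary commands, so when the R scripts are passed as bare paths, as in the command above, they would also need to be executable and start with an interpreter line such as `#!/usr/bin/env Rscript`.

```shell
# Hypothetical helper: chain input -> mapper -> sort -> reducer locally,
# which approximates how streaming feeds a job. No Hadoop is involved.
simulate_streaming() {
  input_file=$1   # local copy of the input, e.g. README.text
  mapper_cmd=$2   # command line for the mapper
  reducer_cmd=$3  # command line for the reducer
  sh -c "$mapper_cmd" < "$input_file" | LC_ALL=C sort | sh -c "$reducer_cmd"
}

# Example call (script paths taken from the command above):
# simulate_streaming README.text /Users/fatma/Work/RHadoop/mapper.R /Users/fatma/Work/RHadoop/reducer.R
```

If the scripts fail here too, the problem is probably in the scripts or the R installation rather than in Hadoop.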




Fatma Amin

Aug 26, 2015, 12:36:45 PM
to rha...@googlegroups.com
I tried another example 

Sys.setenv("HADOOP_CMD"="/usr/local/CELLAR/hadoop/2.7.1/bin/hadoop")
Sys.setenv("HADOOP_HOME"="/usr/local/CELLAR/hadoop/2.7.1")
Sys.setenv("HADOOP_STREAMING"="/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")

library(rmr2)
library(rhdfs)

ints = to.dfs(1:100)
calc = mapreduce(input = ints,
                 map = function(k, v) cbind(v, 2*v))

from.dfs(calc)


and it gives me this error

 Error: Could not find or load main class org.apache.hadoop.util.RunJar


Antonio Piccolboni

Aug 26, 2015, 12:50:03 PM
to rha...@googlegroups.com
Unfortunately, I need to refer you to your Hadoop docs or provider. To cut out R completely, you can specify "cat" as the mapper and no reducer. You don't need any pre-packaged examples, since you can write the command line yourself, as you demonstrated in your last message. Not finding a class suggests a problem with your Java installation or configuration. Unfortunately, that is beyond what I can assist you with.

Fatma Amin

Aug 26, 2015, 1:01:08 PM
to rha...@googlegroups.com

Thank you

I tried the command as you suggested:

hadoop jar /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /wordscount/data/README.text  -output /wordscount/data/res3 -mapper "cat"

and the job worked successfully. So do you have any idea where the problem is?

Thank you


Antonio Piccolboni

Aug 26, 2015, 1:03:44 PM
to rha...@googlegroups.com
rmr does not look for or access any classes, so it must be something in your script that involves Java.
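
One thing that can be checked from the shell is whether the directory named by HADOOP_PREFIX or HADOOP_HOME actually contains the Hadoop common jars, since org.apache.hadoop.util.RunJar ships in the hadoop-common jar. The sketch below rests on two assumptions that are not confirmed in this thread: that the Hadoop 2.x launcher scripts look for those jars under share/hadoop/common relative to the prefix, and that the Homebrew layout keeps them under libexec, as the streaming jar path above suggests. `has_common_jars` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if PREFIX/share/hadoop/common holds a
# hadoop-common jar (the jar that contains org.apache.hadoop.util.RunJar).
has_common_jars() {
  prefix=$1
  for jar in "$prefix"/share/hadoop/common/hadoop-common-*.jar; do
    # An unmatched glob stays literal, so -f fails and we fall through.
    [ -f "$jar" ] && return 0
  done
  return 1
}

# Example calls (paths taken from the posts above):
# has_common_jars /usr/local/Cellar/hadoop/2.7.1 && echo "jars found"
# has_common_jars /usr/local/Cellar/hadoop/2.7.1/libexec && echo "jars found"
```

If the first call fails and the second succeeds, pointing HADOOP_PREFIX or HADOOP_HOME at the libexec directory, or leaving them unset, may be worth trying.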