hadoop streaming failed with error code 1

fatma elsafoury

unread,

Aug 26, 2015, 9:55:46 AM8/26/15

to RHadoop

Hi,

it's my first time using Hadoop and R. Here i'm trying to run word count in R but i always get this error. I'm using Hadoop 2.7.1 on mac osx 10.9.5 and R 3.2.1

Error: Could not find or load main class org.apache.hadoop.util.RunJar

Show Traceback

Rerun with Debug

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :

hadoop streaming failed with error code 1

---------------------------------------------------------------------------------------------------------------

The R code is

--------------------------------

Sys.setenv("HADOOP_PREFIX"="/usr/local/Cellar/hadoop/2.7.1")

Sys.setenv("HADOOP_CMD"="/usr/local/CELLAR/hadoop/2.7.1/bin/hadoop")

Sys.setenv("HADOOP_STREAMING"="/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")

library(rmr2)

## map function

map <- function(k,lines) {

words.list <- strsplit(lines, '\\s')

words <- unlist(words.list)

return( keyval(words, 1) )

}

## reduce function

reduce <- function(word, counts) {

keyval(word, sum(counts))

}

wordcount <- function (input, output=NULL) {

mapreduce(input=input, output=output, input.format="text",

map=map, reduce=reduce)

}

## delete previous result if any

#system("/Users/hadoop/hadoop-1.1.2/bin/hadoop fs -rm-r hdfs://localhost:9000/data/wordcount/outcome")

## Submit job

hdfs.root <- '/wordscount/data'

hdfs.data <- file.path(hdfs.root, 'README.text')

hdfs.out <- file.path(hdfs.root, 'res')

out <- wordcount(hdfs.data, hdfs.out)

## Fetch results from HDFS

results <- from.dfs(out)

## check top 30 frequent words

results.df <- as.data.frame(results, stringsAsFactors=F)

colnames(results.df) <- c('word', 'count')

head(results.df[order(results.df$count, decreasing=T), ], 30)

Antonio Piccolboni

unread,

Aug 26, 2015, 11:16:17 AM8/26/15

to RHadoop

It looks like a hadoop problem on which rmr has no bearing. Did you try one of the streaming examples packaged with Hadoop to see if it works? HADOOP_PREFIX is not a variable that rmr2 has anything to deal with, but it shouldn't hurt.

--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Fatma Amin

unread,

Aug 26, 2015, 11:55:27 AM8/26/15

to rha...@googlegroups.com

I tried Hadoop_Stream.jar from the mac terminal with mappers and reducers written in R and the job also failed. So i don’t know if the problem is coming from the hadoop_streaming or from the R

here is the command i used

hadoop jar /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /wordscount/data/README.text  -output /wordscount/data/res2 -mapper /Users/fatma/Work/RHadoop/mapper.R -reducer /Users/fatma/Work/RHadoop/reducer.R

15/08/26 15:49:11 ERROR streaming.StreamJob: Job not successful!

Streaming Command Failed!

can you tell me how to try one of the streaming examples packaged with hadoop ???

You received this message because you are subscribed to a topic in the Google Groups "RHadoop" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rhadoop/cywJUHelSfI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rhadoop+u...@googlegroups.com.

Fatma Amin

unread,

Aug 26, 2015, 12:36:45 PM8/26/15

to rha...@googlegroups.com

I tried another example

Sys.setenv("HADOOP_CMD"="/usr/local/CELLAR/hadoop/2.7.1/bin/hadoop")

Sys.setenv("HADOOP_HOME"="/usr/local/CELLAR/hadoop/2.7.1")

Sys.setenv("HADOOP_STREAMING"="/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")

library(rmr2)

library(rhdfs)

ints2 = to.dfs(1:100)

calc = mapreduce(input = ints,

map = function(k, v) cbind(v, 2*v))

from.dfs(calc)

and it gives me this error

Error: Could not find or load main class org.apache.hadoop.util.RunJar

Antonio Piccolboni

unread,

Aug 26, 2015, 12:50:03 PM8/26/15

to rha...@googlegroups.com

Unfortunately I need to refer you to your Hadoop docs or provider. To cut out R completely you can just specify "cat" as the mapper value and no reducer, you don't need any pre-packaged examples since you can write the cmd line yourself as you have demonstrated in the last message. Not finding a class, that suggests a problem with your java installation or configuration. Unfortunately it's beyond what I can assist you with.

Fatma Amin

unread,

Aug 26, 2015, 1:01:08 PM8/26/15

to rha...@googlegroups.com

Thank you

i tried the command as you suggested

hadoop jar /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /wordscount/data/README.text  -output /wordscount/data/res3 -mapper “cat"

and the job worked successfully. So do you have any idea where is the problem??

Thank you

On 26 Aug 2015, at 16:55, Fatma Amin <e.fa...@gmail.com> wrote:

Antonio Piccolboni

unread,

Aug 26, 2015, 1:03:44 PM8/26/15

to rha...@googlegroups.com

rmr does not look for or accesses any classes, so it must be something in your script that has to deal with java.

Reply all

Reply to author

Forward