Error using an rmr task

283 views

tom tom

Mar 2, 2014, 5:36:54 AM
to rha...@googlegroups.com

Hello all.
I'm using the newest rmr version (3.0.0) with a 'simple' MR job to inspect a new install:
input.size = 1000
input.ga = to.dfs(cbind(1:input.size, rnorm(input.size)))
group = function(x) x %% 10
aggregate = function(x) sum(x)
result = mapreduce(
  input = input.ga,  # input argument restored; clearly intended, since the job below actually ran
  output = '/tmp/tomery_4/RHadoop/test_12',
  output.format = 'csv',
  map = function(k, v) keyval(group(v[, 1]), v[, 2]),
  reduce = function(k, vv) keyval(k, aggregate(vv)),
  combine = TRUE
)

but I get the following error:

packageJobJar: [/tmp/RtmptFGQL8/rmr-local-env7da05f86a892, /tmp/RtmptFGQL8/rmr-global-env7da018158b5f, /tmp/RtmptFGQL8/rmr-streaming-map7da0792d615d, /tmp/RtmptFGQL8/rmr-streaming-reduce7da02874a966, /tmp/RtmptFGQL8/rmr-streaming-combine7da0f6cf50, /tmp/hadoop-tomery/hadoop-unjar6397472233474176727/] [] /tmp/streamjob8447845707272548686.jar tmpDir=null
14/03/02 05:12:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/02 05:12:46 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/02 05:12:46 INFO streaming.StreamJob: getLocalDirs(): [/data/cluster/mapred/local]
14/03/02 05:12:46 INFO streaming.StreamJob: Running job: job_201403020418_0013
14/03/02 05:12:46 INFO streaming.StreamJob: To kill this job, run:
14/03/02 05:12:46 INFO streaming.StreamJob: /usr/lib/hadoop//bin/hadoop job  -Dmapred.job.tracker=DA-NN01:8021 -kill job_201403020418_0013
14/03/02 05:12:46 INFO streaming.StreamJob: Tracking URL: http://DA-NN01:50030/jobdetails.jsp?jobid=job_201403020418_0013
14/03/02 05:12:47 INFO streaming.StreamJob:  map 0%  reduce 0%
14/03/02 05:13:07 INFO streaming.StreamJob:  map 50%  reduce 0%
14/03/02 05:13:11 INFO streaming.StreamJob:  map 0%  reduce 0%
14/03/02 05:13:20 INFO streaming.StreamJob:  map 50%  reduce 0%
14/03/02 05:13:36 INFO streaming.StreamJob:  map 0%  reduce 0%
14/03/02 05:13:57 INFO streaming.StreamJob:  map 100%  reduce 100%
14/03/02 05:13:57 INFO streaming.StreamJob: To kill this job, run:
14/03/02 05:13:57 INFO streaming.StreamJob: /usr/lib/hadoop//bin/hadoop job  -Dmapred.job.tracker=DA-NN01:8021 -kill job_201403020418_0013
14/03/02 05:13:57 INFO streaming.StreamJob: Tracking URL: http://DA-NN01:50030/jobdetails.jsp?jobid=job_201403020418_0013
14/03/02 05:13:57 ERROR streaming.StreamJob: Job not successful. Error: NA
14/03/02 05:13:57 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1 

Could enabling Hadoop's Trash option be the cause? As I recall, the same code ran successfully before that...

Thanks in advance ! 
 Tomer



Antonio Piccolboni

Mar 2, 2014, 12:20:17 PM
to RHadoop Google Group
Please provide a bug report according to the guidelines. There is absolutely nothing I can do from console output with Hadoop in distributed mode. Follow that tracking URL, get to the stderr log of one of the failing map attempts, and post it back here. Thanks


Antonio


--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Tomer Yaniv

Mar 2, 2014, 4:29:01 PM
to rha...@googlegroups.com, ant...@piccolboni.info
Thanks Antonio, you're right of course. I hope the attached is enough:

The thing is, the map-only task from the tutorial:

ints = to.dfs(1:1000)
temp = 13
squares = mapreduce(
  input = ints,
  output = '/tmp/tomery3',
  output.format = 'csv',
  map = function(k, v) cbind(v, v^2 + temp)
)
executes successfully, but this other one, again from the tutorial:
groups_ = rbinom(32, n = 50, prob = 0.4)
groups = to.dfs(groups_)
from.dfs(
  mapreduce(
    input = groups,
    map = function(., v) keyval(v, 1),
    reduce = function(k, vv) keyval(k, length(vv))))

prints out an error:
packageJobJar: [/tmp/RtmpOrNL9A/rmr-local-env1a07551f3781, /tmp/RtmpOrNL9A/rmr-global-env1a07516f0741, /tmp/RtmpOrNL9A/rmr-streaming-map1a076cb9223b, /tmp/RtmpOrNL9A/rmr-streaming-reduce1a075b443b67, /tmp/hadoop-tomery/hadoop-unjar1411120999921516054/] [] /tmp/streamjob4529369072112549439.jar tmpDir=null
14/03/02 16:15:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/02 16:15:55 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/02 16:15:56 INFO streaming.StreamJob: getLocalDirs(): [/data/cluster/mapred/local]
14/03/02 16:15:56 INFO streaming.StreamJob: Running job: job_201403021441_0014
14/03/02 16:15:56 INFO streaming.StreamJob: To kill this job, run:
14/03/02 16:15:56 INFO streaming.StreamJob: /usr/lib/hadoop//bin/hadoop job  -Dmapred.job.tracker=DA-NN01:8021 -kill job_201403021441_0014
14/03/02 16:15:56 INFO streaming.StreamJob: Tracking URL: http://DA-NN01:50030/jobdetails.jsp?jobid=job_201403021441_0014
14/03/02 16:15:57 INFO streaming.StreamJob:  map 0%  reduce 0%
14/03/02 16:17:14 INFO streaming.StreamJob:  map 50%  reduce 0%
14/03/02 16:17:26 INFO streaming.StreamJob:  map 0%  reduce 0%
14/03/02 16:17:46 INFO streaming.StreamJob:  map 100%  reduce 100%
14/03/02 16:17:46 INFO streaming.StreamJob: To kill this job, run:
14/03/02 16:17:46 INFO streaming.StreamJob: /usr/lib/hadoop//bin/hadoop job  -Dmapred.job.tracker=DA-NN01:8021 -kill job_201403021441_0014
14/03/02 16:17:46 INFO streaming.StreamJob: Tracking URL: http://DA-NN01:50030/jobdetails.jsp?jobid=job_201403021441_0014
14/03/02 16:17:46 ERROR streaming.StreamJob: Job not successful. Error: NA
14/03/02 16:17:46 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1
Deleted /tmp/RtmpOrNL9A/file1a0712b4eb48

The logs look like:
java.lang.RuntimeException: java.io.EOFException
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:376)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(Type

Thanks ! 




Antonio Piccolboni

Mar 2, 2014, 4:45:49 PM
to Tomer Yaniv, RHadoop Google Group
Hi,
I need the stderr of the R process; if you found a Java stack trace, that cannot possibly be it.


Antonio

Tomer Yaniv

Mar 2, 2014, 7:20:01 PM
to rha...@googlegroups.com, Tomer Yaniv, ant...@piccolboni.info
Hello again.
I'm probably missing something: I'm not aware of how to get the stderr of the R process, although I tried to find material about it in connection with mapreduce tasks, including the tutorial and the debugging section. Could you give me a hint about that? By the way, the mappers in the mapreduce functions I'm running work fine; it's the reduce phase that fails...
Thanks again ! 
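As a side note on surfacing the R-level error: rmr2 (including 3.0.0, if I am not mistaken) also ships a local backend that runs the same job in-process, so any R error prints straight to the console instead of being buried in Hadoop task logs. A minimal sketch of the failing example under that backend:

```r
library(rmr2)

# Switch rmr2 to the local backend: map and reduce run inside this R
# session, so errors surface directly instead of in Hadoop task logs.
rmr.options(backend = "local")

groups = to.dfs(rbinom(32, n = 50, prob = 0.4))
result = from.dfs(
  mapreduce(
    input = groups,
    map = function(., v) keyval(v, 1),
    reduce = function(k, vv) keyval(k, length(vv))))

# Restore the Hadoop backend once the logic checks out.
rmr.options(backend = "hadoop")
```

If the job fails under the local backend too, the R traceback points at the offending map or reduce function; if it only fails on the cluster, the problem is more likely environmental (library versions, PATH, configuration).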

SH.Chou

Mar 2, 2014, 8:07:05 PM
to rha...@googlegroups.com
Hi Tomer,
   This is how I found stderr:
1. Go to the JobTracker page (<ip>:50030/jobtracker.jsp).
2. Select your failed job (e.g. job_201402251621_0122).
3. Click "map" and select one of the failed tasks (e.g. task_201402251621_0122_m_000000).
4. Click "All" in the Task Logs column. There you will find stderr.
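For reference, that click-path ends at the TaskTracker's tasklog servlet, so the same stderr can also be fetched directly by URL. A sketch in R; the host name and attempt id below are only illustrative examples based on the logs in this thread, and 50060 is the default TaskTracker HTTP port:

```r
# Hypothetical example values; substitute the TaskTracker host that ran
# the attempt and the failing attempt id shown in the JobTracker UI.
tt.host = "DA-NN01"
attempt = "attempt_201403021441_0014_m_000000_0"

# The tasklog servlet serves per-attempt logs; filter=stderr selects stderr.
url = sprintf("http://%s:50060/tasklog?attemptid=%s&filter=stderr",
              tt.host, attempt)
print(url)
# readLines(url)  # fetch the log on a live cluster
```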
--
=====================================
Shih-Hsiung, Chou
PH.D Candidate at
Department of Industrial Manufacturing
and Systems Engineering
Kansas State University

tom tom

Mar 16, 2014, 6:03:43 PM
to rha...@googlegroups.com
Thanks ! 
We've managed to solve the problem; it was configuration-based, not an rmr usage issue.
Thanks again, 
 Tomer