Hi Antonio,
Thanks for your answer. Sorry, I missed the message that the first post is moderated.
I found the link to the debugging guideline: https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Ermr%3EDebugging-rmr-programs
It is a different link from the one you describe in the intro. Maybe this is interesting for you, so that you get more precise error reports.
So now I will write the real question, with the exact error log:
Now to the case: I have a cluster with 8 nodes.
On each node, Hadoop 2.2.0 is installed, together with R and all the packages (rhdfs, rJava, rmr2, ...).
The first MapReduce job (small.ints, 1:1000) runs without problems with backend=local and with backend=hadoop.
But I have problems with the wordcount example. It runs perfectly with backend=local, but with backend=hadoop I get many errors.
For your note: I switched the reduce function off, as mentioned in the debugging guideline.
When I use a small .txt file for the wordcount example, it sometimes works without problems. Sometimes I get errors, but the program still runs to the end and I get a correct output.
The error in this case is the following one:
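For reference, this is roughly the call I am using, with the reduce step disabled. It is only a sketch following the standard rmr2 wordcount tutorial; the input path is a placeholder:

```r
library(rmr2)

# wordcount map function, as in the rmr2 tutorial:
# split each line on spaces and emit (word, 1) pairs
wc.map <- function(., lines) {
  keyval(unlist(strsplit(x = lines, split = " ")), 1)
}

wordcount <- function(input, output = NULL) {
  mapreduce(input = input,
            output = output,
            input.format = "text",
            map = wc.map,
            reduce = NULL)  # reduce switched off while debugging
}

rmr.options(backend = "hadoop")
out <- wordcount("/tmp/wordcount-input.txt")  # placeholder path
```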
14/12/19 08:10:47 INFO mapreduce.Job: map 0% reduce 0%
14/12/19 08:10:57 INFO mapreduce.Job: Task Id : attempt_1418972737310_0002_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/12/19 08:10:57 INFO mapreduce.Job: Task Id : attempt_1418972737310_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/12/19 08:11:03 INFO mapreduce.Job: Task Id : attempt_1418972737310_0002_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/12/19 08:11:05 INFO mapreduce.Job: map 50% reduce 0%
14/12/19 08:11:10 INFO mapreduce.Job: map 100% reduce 0%
14/12/19 08:11:10 INFO mapreduce.Job: Job job_1418972737310_0002 completed successfully
14/12/19 08:11:10 INFO mapreduce.Job: Counters: 29
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=168154
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=309
HDFS: Number of bytes written=2583
HDFS: Number of read operations=10
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Failed map tasks=3
Launched map tasks=5
Other local map tasks=3
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=29903
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=3
Map output records=24
Input split bytes=184
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=67
CPU time spent (ms)=1060
Physical memory (bytes) snapshot=300879872
Virtual memory (bytes) snapshot=2485153792
Total committed heap usage (bytes)=214433792
File Input Format Counters
Bytes Read=125
File Output Format Counters
Bytes Written=2583
14/12/19 08:11:10 INFO streaming.StreamJob: Output directory: /tmp/file12c878fae41b
14/12/19 08:11:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
As described in the guideline, I went to check the userlogs.
(The userlogs are sometimes empty, although I get an error in the R console. That I don't understand.)
But here is the error log (note that the small.ints example works):
administrator@l101-pc01:/usr/local/hadoop/logs/userlogs/application_1418972737310_0002$ ls
container_1418972737310_0002_01_000002 container_1418972737310_0002_01_000003 container_1418972737310_0002_01_000004
administrator@l101-pc01:/usr/local/hadoop/logs/userlogs/application_1418972737310_0002/container_1418972737310_0002_01_000002$ cat stderr
Loading objects:
wordcount
Loading objects:
backend.parameters
combine
combine.file
combine.line
debug
default.input.format
Warning: namespace ‘rmr2’ is not available and has been replaced
by .GlobalEnv when processing object ‘default.input.format’
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
tempfile
vectorized.reduce
verbose
work.dir
Loading required package: methods
Loading required package: rmr2
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘stringr’
Warning in FUN(c("base", "methods", "datasets", "utils", "grDevices", "graphics", :
can't load rmr2
Loading required package: rJava
Loading required package: rhdfs
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(c("base", "methods", "datasets", "utils", "grDevices", "graphics", :
can't load rhdfs
Loading objects:
backend.parameters
combine
combine.file
combine.line
debug
default.input.format
Warning: namespace ‘rmr2’ is not available and has been replaced
by .GlobalEnv when processing object ‘default.input.format’
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
tempfile
vectorized.reduce
verbose
work.dir
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘stringr’
Calls: <Anonymous> ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
No traceback available
Error during wrapup:
Execution halted
The rmr2 package is installed; otherwise the small.ints example would not work, would it?
That I don't understand.
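To double-check, this is the kind of per-node diagnostic I can run in a fresh R session on each node. It is a plain sketch, nothing rmr2-specific; an empty HADOOP_CMD would explain the rhdfs error in the stderr above:

```r
# Quick check in a fresh R session on each node:
# are the namespaces actually loadable where the map tasks run?
for (pkg in c("stringr", "rmr2", "rhdfs", "rJava")) {
  ok <- requireNamespace(pkg, quietly = TRUE)
  cat(sprintf("%-8s %s\n", pkg, if (ok) "OK" else "MISSING"))
}

# rhdfs refuses to load unless this is set in the same environment
Sys.getenv("HADOOP_CMD")
```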
My last question: I have 8 nodes and I can see them on localhost:50070, but when I go to localhost:8088 and look at the application, I see that only one node is used:
ApplicationMaster
Attempt Number Start Time Node Logs
1 19-Dec-2014 08:10:40 l101-pc06:8042 logs
As you can see in the userlogs, there are three different containers; that I don't understand either.
And here is the R library from node l101-pc06:
Packages in library ‘/usr/local/lib/R/site-library’:
bitops Bitwise Operations
caTools Tools: moving window statistics, GIF, Base64,
ROC AUC, etc.
devtools Tools to make developing R code easier
digest Create Cryptographic Hash Digests of R Objects
evaluate Parsing and evaluation tools that provide more
details than the default.
functional Curry, Compose, and other higher-order
functions
httr Tools for Working with URLs and HTTP
iterators Iterator construct for R
itertools Iterator Tools
jsonlite A Robust, High Performance JSON Parser and
Generator for R
memoise Memoise functions
mime Map filenames to MIME types
plyr Tools for splitting, applying and combining
data
R6 Classes with reference semantics
Rcpp Seamless R and C++ Integration
RCurl General network (HTTP/FTP/...) client interface
for R
reshape2 Flexibly Reshape Data: A Reboot of the Reshape
Package.
rhdfs R and Hadoop Distributed Filesystem
rJava Low-level R to Java interface
RJSONIO Serialize R objects to JSON, JavaScript Object
Notation
rmr2 R and Hadoop Streaming Connector
rstudioapi Safely access the RStudio API.
stringr Make it easier to work with strings.
whisker {{mustache}} for R, logicless templating
Packages in library ‘/usr/lib/R/library’:
base The R Base Package
boot Bootstrap Functions (originally by Angelo Canty
for S)
class Functions for Classification
cluster Cluster Analysis Extended Rousseeuw et al.
codetools Code Analysis Tools for R
compiler The R Compiler Package
datasets The R Datasets Package
foreign Read Data Stored by Minitab, S, SAS, SPSS,
Stata, Systat, Weka, dBase, ...
graphics The R Graphics Package
grDevices The R Graphics Devices and Support for Colours
and Fonts
grid The Grid Graphics Package
KernSmooth Functions for kernel smoothing for Wand & Jones
(1995)
lattice Lattice Graphics
MASS Support Functions and Datasets for Venables and
Ripley's MASS
Matrix Sparse and Dense Matrix Classes and Methods
methods Formal Methods and Classes
mgcv Mixed GAM Computation Vehicle with GCV/AIC/REML
smoothness estimation
nlme Linear and Nonlinear Mixed Effects Models
nnet Feed-forward Neural Networks and Multinomial
Log-Linear Models
parallel Support for Parallel computation in R
rpart Recursive Partitioning and Regression Trees
spatial Functions for Kriging and Point Pattern
Analysis
splines Regression Spline Functions and Classes
stats The R Stats Package
stats4 Statistical Functions using S4 Classes
survival Survival Analysis
Thank you for your work, and I hope you can help me.
Best regards
Marc
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘stringr’
That makes the execution stop. I am not sure why your program needs stringr, but if it uses functions from stringr, you need to install stringr on each node. If it doesn't, I would detach stringr before executing the wordcount program, so that it is not loaded on the nodes either.
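Concretely, the two options look roughly like this (a sketch; run the install in R on each of the 8 nodes, or push it out over ssh):

```r
# Option 1: make stringr available on every node
# (execute in an R session on each node of the cluster)
install.packages("stringr")

# Option 2: if the wordcount program does not actually use stringr,
# detach it in the submitting session before calling mapreduce(),
# so it is not part of the environment shipped to the nodes
if ("package:stringr" %in% search())
  detach("package:stringr", unload = TRUE)
```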
...