Hi Antonio,
Thanks for your answer. Sorry, I missed the message that the first post is moderated.
I found the link to the debugging guideline: https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Ermr%3EDebugging-rmr-programs
It is a different link from the one you describe in the intro. Maybe this is interesting for you, so that you get more precise error reports.
So now I will write the real question, with the exact error log:
Now to the case: I have a cluster with 8 nodes.
On each node, Hadoop 2.2.0 is installed, together with R and all the packages (rhdfs, rJava, rmr2, ...).
The first MapReduce job (small.ints, 1:1000) runs without problems with backend=local and with backend=hadoop.
But I have problems with the wordcount example. It runs perfectly with backend=local, but with backend=hadoop I get many errors.
For your note: I switched the reduce function off, as mentioned in the debugging guideline.
When I use a small .txt file for the wordcount example, it sometimes works without problems. Sometimes I get errors, but the program still runs to the end and I get a correct output.
The error in this case is the following one:
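For reference, this is roughly the call I am using, with the reduce step disabled. It is only a sketch following the standard rmr2 wordcount tutorial; the input path is a placeholder:

```r
library(rmr2)

# wordcount map function, as in the rmr2 tutorial:
# split each line on spaces and emit (word, 1) pairs
wc.map <- function(., lines) {
  keyval(unlist(strsplit(x = lines, split = " ")), 1)
}

wordcount <- function(input, output = NULL) {
  mapreduce(input = input,
            output = output,
            input.format = "text",
            map = wc.map,
            reduce = NULL)  # reduce switched off while debugging
}

rmr.options(backend = "hadoop")
out <- wordcount("/tmp/wordcount-input.txt")  # placeholder path
```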
14/12/19 08:10:47 INFO mapreduce.Job: map 0% reduce 0%
14/12/19 08:10:57 INFO mapreduce.Job: Task Id : attempt_1418972737310_0002_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/12/19 08:10:57 INFO mapreduce.Job: Task Id : attempt_1418972737310_0002_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/12/19 08:11:03 INFO mapreduce.Job: Task Id : attempt_1418972737310_0002_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
14/12/19 08:11:05 INFO mapreduce.Job: map 50% reduce 0%
14/12/19 08:11:10 INFO mapreduce.Job: map 100% reduce 0%
14/12/19 08:11:10 INFO mapreduce.Job: Job job_1418972737310_0002 completed successfully
14/12/19 08:11:10 INFO mapreduce.Job: Counters: 29
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=168154
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=309
HDFS: Number of bytes written=2583
HDFS: Number of read operations=10
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Failed map tasks=3
Launched map tasks=5
Other local map tasks=3
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=29903
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=3
Map output records=24
Input split bytes=184
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=67
CPU time spent (ms)=1060
Physical memory (bytes) snapshot=300879872
Virtual memory (bytes) snapshot=2485153792
Total committed heap usage (bytes)=214433792
File Input Format Counters
Bytes Read=125
File Output Format Counters
Bytes Written=2583
14/12/19 08:11:10 INFO streaming.StreamJob: Output directory: /tmp/file12c878fae41b
14/12/19 08:11:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
As described in the guideline, I went to check the userlogs.
(The userlogs are sometimes empty, although I get an error in the R console. That I don't understand.)
But here is the error log (note that the small.ints example works):
administrator@l101-pc01:/usr/local/hadoop/logs/userlogs/application_1418972737310_0002$ ls
container_1418972737310_0002_01_000002 container_1418972737310_0002_01_000003 container_1418972737310_0002_01_000004
administrator@l101-pc01:/usr/local/hadoop/logs/userlogs/application_1418972737310_0002/container_1418972737310_0002_01_000002$ cat stderr
Loading objects:
wordcount
Loading objects:
backend.parameters
combine
combine.file
combine.line
debug
default.input.format
Warning: namespace ‘rmr2’ is not available and has been replaced
by .GlobalEnv when processing object ‘default.input.format’
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
tempfile
vectorized.reduce
verbose
work.dir
Loading required package: methods
Loading required package: rmr2
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘stringr’
Warning in FUN(c("base", "methods", "datasets", "utils", "grDevices", "graphics", :
can't load rmr2
Loading required package: rJava
Loading required package: rhdfs
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(c("base", "methods", "datasets", "utils", "grDevices", "graphics", :
can't load rhdfs
Loading objects:
backend.parameters
combine
combine.file
combine.line
debug
default.input.format
Warning: namespace ‘rmr2’ is not available and has been replaced
by .GlobalEnv when processing object ‘default.input.format’
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
tempfile
vectorized.reduce
verbose
work.dir
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘stringr’
Calls: <Anonymous> ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
No traceback available
Error during wrapup:
Execution halted
The rmr2 package is installed; otherwise the small.ints example would not work, would it?
That I don't understand.
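To double-check, this is the kind of per-node diagnostic I can run in a fresh R session on each node. It is a plain sketch, nothing rmr2-specific; an empty HADOOP_CMD would explain the rhdfs error in the stderr above:

```r
# Quick check in a fresh R session on each node:
# are the namespaces actually loadable where the map tasks run?
for (pkg in c("stringr", "rmr2", "rhdfs", "rJava")) {
  ok <- requireNamespace(pkg, quietly = TRUE)
  cat(sprintf("%-8s %s\n", pkg, if (ok) "OK" else "MISSING"))
}

# rhdfs refuses to load unless this is set in the same environment
Sys.getenv("HADOOP_CMD")
```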
My last question: I have 8 nodes and I can see them on localhost:50070, but when I go to localhost:8088 and look at the application, I see that only one node is used:
ApplicationMaster
Attempt Number Start Time Node Logs
1 19-Dec-2014 08:10:40 l101-pc06:8042 logs
As you can see in the userlogs, there are three different containers; that I don't understand either.
And here is the R library from node l101-pc06:
Packages in library ‘/usr/local/lib/R/site-library’:
bitops Bitwise Operations
caTools Tools: moving window statistics, GIF, Base64,
ROC AUC, etc.
devtools Tools to make developing R code easier
digest Create Cryptographic Hash Digests of R Objects
evaluate Parsing and evaluation tools that provide more
details than the default.
functional Curry, Compose, and other higher-order
functions
httr Tools for Working with URLs and HTTP
iterators Iterator construct for R
itertools Iterator Tools
jsonlite A Robust, High Performance JSON Parser and
Generator for R
memoise Memoise functions
mime Map filenames to MIME types
plyr Tools for splitting, applying and combining
data
R6 Classes with reference semantics
Rcpp Seamless R and C++ Integration
RCurl General network (HTTP/FTP/...) client interface
for R
reshape2 Flexibly Reshape Data: A Reboot of the Reshape
Package.
rhdfs R and Hadoop Distributed Filesystem
rJava Low-level R to Java interface
RJSONIO Serialize R objects to JSON, JavaScript Object
Notation
rmr2 R and Hadoop Streaming Connector
rstudioapi Safely access the RStudio API.
stringr Make it easier to work with strings.
whisker {{mustache}} for R, logicless templating
Packages in library ‘/usr/lib/R/library’:
base The R Base Package
boot Bootstrap Functions (originally by Angelo Canty
for S)
class Functions for Classification
cluster Cluster Analysis Extended Rousseeuw et al.
codetools Code Analysis Tools for R
compiler The R Compiler Package
datasets The R Datasets Package
foreign Read Data Stored by Minitab, S, SAS, SPSS,
Stata, Systat, Weka, dBase, ...
graphics The R Graphics Package
grDevices The R Graphics Devices and Support for Colours
and Fonts
grid The Grid Graphics Package
KernSmooth Functions for kernel smoothing for Wand & Jones
(1995)
lattice Lattice Graphics
MASS Support Functions and Datasets for Venables and
Ripley's MASS
Matrix Sparse and Dense Matrix Classes and Methods
methods Formal Methods and Classes
mgcv Mixed GAM Computation Vehicle with GCV/AIC/REML
smoothness estimation
nlme Linear and Nonlinear Mixed Effects Models
nnet Feed-forward Neural Networks and Multinomial
Log-Linear Models
parallel Support for Parallel computation in R
rpart Recursive Partitioning and Regression Trees
spatial Functions for Kriging and Point Pattern
Analysis
splines Regression Spline Functions and Classes
stats The R Stats Package
stats4 Statistical Functions using S4 Classes
survival Survival Analysis
Thank you for your work, and I hope you can help me.
Best regards
Marc
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘stringr’
That makes the execution stop. I am not sure why your program needs stringr, but if it uses functions from stringr, you need to install stringr on each node. If it doesn't, I would detach stringr before executing the wordcount program, so that it is not loaded on the nodes either.
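Concretely, the two options look roughly like this (a sketch; run the install in R on each of the 8 nodes, or push it out over ssh):

```r
# Option 1: make stringr available on every node
# (execute in an R session on each node of the cluster)
install.packages("stringr")

# Option 2: if the wordcount program does not actually use stringr,
# detach it in the submitting session before calling mapreduce(),
# so it is not part of the environment shipped to the nodes
if ("package:stringr" %in% search())
  detach("package:stringr", unload = TRUE)
```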
...