Run MapReduce with rHBase


Hoang Le

unread,
Mar 11, 2014, 5:27:28 AM3/11/14
to rha...@googlegroups.com
Hi, I installed both rmr2 and rhbase. A plain map-reduce program in R works, and the rhbase commands work very well too.

The problem is that when I run a sample MapReduce job against HBase, I get an error. Can anyone help me?

My sample code:
library(rmr2)
library(rhbase)

# connect to the thrift server with character serialization
hb.init(serialize = "char")

# create a table with one column family and insert a test row
hb.new.table("mytable", "cf")
hb.insert("mytable", list(list(1, c("cf:x", "cf:y", "cf:z"), list("apple", "berry", "cherry"))))

# read the table back through a mapreduce job
r = from.dfs(
  mapreduce(
    input = "mytable",
    input.format = make.input.format("hbase",
      family.columns = list(cf = list("x", "y")),
      key.deserialize = "raw",
      cell.deserialize = "raw"),
    map = function(k, v) v)
)


My error:

Loading required package: stringr
Loading required package: plyr

Attaching package: ‘plyr’

The following object is masked from ‘package:lubridate’:

here

Loading required package: caTools
Error in readBin(con, raw(), read.size) : 
  argument "read.size" is missing, with no default
Calls: <Anonymous> ... <Anonymous> -> keyval.reader -> <Anonymous> -> tif -> readBin
No traceback available 
Error during wrapup: 
Execution halted
14/03/11 16:26:23 INFO streaming.PipeMapRed: MRErrorThread done
14/03/11 16:26:23 INFO streaming.PipeMapRed: PipeMapRed failed!

Thank you.



Antonio Piccolboni

unread,
Mar 11, 2014, 12:03:58 PM3/11/14
to RHadoop Google Group
It looks like a format didn't get updated for a change in an internal API (translation: that looks like a bug in the package). Could you please tell me which version of rmr2 you are running? There's also a possibility that you have not upgraded one of the nodes. Thanks
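A quick way to report that is packageVersion; run it on every node too, since version skew across nodes would show up the same way:

packageVersion("rmr2")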


Antonio




Hoang Le

unread,
Mar 11, 2014, 10:20:44 PM3/11/14
to rha...@googlegroups.com, ant...@piccolboni.info
I installed hadoop 2.2.0, hbase 0.96.1, rmr2 3.0.0, rhbase 1.2.0, and rhdfs 1.0.8.

This is a test before I run on AWS, so I just installed in standalone mode.

I ran a wordcount test on a file in HDFS with RHadoop and used rhbase to create/delete an HBase table; it worked very well.

Antonio Piccolboni

unread,
Mar 11, 2014, 11:54:54 PM3/11/14
to RHadoop Google Group
I entered bug https://github.com/RevolutionAnalytics/rmr2/issues/97 to track progress on this; I am convinced it is a bug. Can you build directly from github? Would you be willing to test the fix?


Antonio

Antonio Piccolboni

unread,
Mar 12, 2014, 12:51:54 AM3/12/14
to rha...@googlegroups.com, ant...@piccolboni.info
A tentative fix is in dev.
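If you use devtools, something like this should pull it (a sketch; it assumes the package sources live in the pkg/ subdirectory of the github repo):

library(devtools)
install_github("RevolutionAnalytics/rmr2", ref = "dev", subdir = "pkg")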

Hoang Le

unread,
Mar 12, 2014, 7:19:13 AM3/12/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

I built the dev branch and here is the new error:

14/03/12 18:12:47 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/03/12 18:12:47 INFO mapred.TypedBytesTableInputFormat: Value Format[familiescolumns]
14/03/12 18:12:47 INFO mapred.MapTask: numReduceTasks: 0
14/03/12 18:12:47 INFO streaming.PipeMapRed: PipeMapRed exec [/usr/bin/Rscript, --vanilla, ./rmr-streaming-mapaf5040d070aa]
14/03/12 18:12:47 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
14/03/12 18:12:47 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
Error in library(functional) : there is no package called ‘functional’
No traceback available 
Error during wrapup: 
Execution halted
14/03/12 18:12:47 INFO streaming.PipeMapRed: MRErrorThread done
14/03/12 18:12:47 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/03/12 18:12:47 INFO mapred.LocalJobRunner: Map task executor complete.
14/03/12 18:12:47 WARN mapred.LocalJobRunner: job_local925134517_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/03/12 18:12:48 INFO mapreduce.Job: Job job_local925134517_0001 running in uber mode : false
14/03/12 18:12:48 INFO mapreduce.Job:  map 0% reduce 0%

Antonio Piccolboni

unread,
Mar 12, 2014, 12:53:41 PM3/12/14
to rha...@googlegroups.com, ant...@piccolboni.info
Thanks for helping out with testing. functional has been a dependency of rmr2 for a long time; why isn't it installed on your system? Did you spin up a VM and forget to install it? There is no change in dev concerning that. Please double check that functional is installed on all nodes for all users. Thanks
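A one-liner like this, run on each node as each user, would confirm it (a minimal sketch):

if (!require(functional)) install.packages("functional")  # installs only if it fails to load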


Antonio

Mohana Sundaram

unread,
Aug 13, 2014, 3:30:09 AM8/13/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

I am trying to run a mapreduce job with the below code in R.

R Source Code :: (rHbaseTest.R)
=============
require(rhbase)
require(rmr2)
require(rhdfs)
Sys.setenv(HADOOP_STREAMING="/home/user1/softwares/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar")
hostLoc = 'myhost'  # your server IP
port = 9090         # default port for the thrift service

hb.init(hostLoc, port)
hb.init(serialize="char")

hb.list.tables()
hb.describe.table("input")

#r = from.dfs(mapreduce(input = "input", input.format = make.input.format("hbase", family.columns = list(cf=list("x","y")), map = function(k,v) v)))

r = from.dfs(mapreduce(
  input = "input",
  input.format = make.input.format("hbase",
    family.columns = list(cf = list("x", "y")),
    key.deserialize = "raw",
    cell.deserialize = "raw"),
  map = function(k, v) v))


Output with Error:
==============
user1@PC:~/R$ Rscript rHbaseTest.R
Loading required package: rhbase
Loading required package: methods
Loading required package: rmr2
Loading required package: rhdfs
Loading required package: rJava

HADOOP_CMD=/home/user1/softwares/hadoop-2.2.0/bin/hadoop

Be sure to run hdfs.init()
<pointer: 0x9cc28c8>
attr(,"class")
[1] "hb.client.connection"
<pointer: 0x9cc5be8>
attr(,"class")
[1] "hb.client.connection"
$input
    maxversions compression inmemory bloomfiltertype bloomfiltervecsize
df:           5        NONE    FALSE            NONE                  0
    bloomfilternbhashes blockcache timetolive
df:                   0      FALSE         -1

    maxversions compression inmemory bloomfiltertype bloomfiltervecsize
df:           5        NONE    FALSE            NONE                  0
    bloomfilternbhashes blockcache timetolive
df:                   0      FALSE         -1
14/08/13 12:30:14 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
-inputformat : class not found : com.dappervision.hbase.mapred.TypedBytesTableInputFormat
Try -help for more information
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  :
  hadoop streaming failed with error code 1
Calls: from.dfs -> to.dfs.path -> mapreduce -> mr
In addition: Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
4: In rmr.options("backend.parameters") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
Execution halted
Warning message:
In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)


Could you please help me resolve this problem?

Thanks in advance.

Regards,
Mohan.

Antonio Piccolboni

unread,
Aug 13, 2014, 12:15:57 PM8/13/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hi, is this inquiry related to this thread, or is it independent? I think if it's independent, it helps the readability of the group to start a new thread. There's no charge for that, just a different button.


Antonio

Mohana Sundaram

unread,
Aug 14, 2014, 5:58:52 AM8/14/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

It is related to a mapreduce implementation with rhbase, right? That's the reason I continued in the same thread.

- Mohan.

Antonio Piccolboni

unread,
Aug 14, 2014, 2:09:53 PM8/14/14
to RHadoop Google Group
When you installed rmr2, did it successfully build the hbase classes? Since not many people use them and they have a number of requirements, if anything goes wrong the package build still succeeds and you only get a message. If you overlooked that, then that class wouldn't exist, hence the error. I would go back to the install log, or reinstall rmr2 on one node and look carefully at the console output for the section related to the hbase build. It starts by getting ant installed and so forth. This is completely unrelated to rhbase.
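For example, reinstalling from the source tarball on one node and reading the console output is enough (a sketch; the path is wherever your tarball is):

install.packages("/path/to/rmr2.tar.gz", repos = NULL, type = "source")
# a failed hbase build is non-fatal and shows up as:
#   can't build hbase IO classes, skipping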


Antonio

Mohana Sundaram

unread,
Aug 16, 2014, 5:39:00 AM8/16/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

Yeah, you are right. I remember now that I got one error when installing rmr2, but the install still succeeded, so I didn't mind it.
Here is the error I got when installing:

> install.packages("/home/user1/RLibrary/rmr2_3.1.2.tar.gz", repos=NULL, type="source")
Installing package into ‘/home/user1/R/i686-pc-linux-gnu-library/3.1’
(as ‘lib’ is unspecified)
* installing *source* package ‘rmr2’ ...
** libs
g++ -I/home/user1/R-3.1.1/include -DNDEBUG  -I/usr/local/include   `/home/user1/R-3.1.1/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -g -O2  -c extras.cpp -o extras.o
g++ -I/home/user1/R-3.1.1/include -DNDEBUG  -I/usr/local/include   `/home/user1/R-3.1.1/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -g -O2  -c hbase-to-df.cpp -o hbase-to-df.o
g++ -I/home/user1/R-3.1.1/include -DNDEBUG  -I/usr/local/include   `/home/user1/R-3.1.1/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -g -O2  -c keyval.cpp -o keyval.o
g++ -I/home/user1/R-3.1.1/include -DNDEBUG  -I/usr/local/include   `/home/user1/R-3.1.1/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -g -O2  -c t-list.cpp -o t-list.o
g++ -I/home/user1/R-3.1.1/include -DNDEBUG  -I/usr/local/include   `/home/user1/R-3.1.1/bin/Rscript -e "Rcpp:::CxxFlags()"` -fpic  -g -O2  -c typed-bytes.cpp -o typed-bytes.o
g++ -shared -L/usr/local/lib -o rmr2.so extras.o hbase-to-df.o keyval.o t-list.o typed-bytes.o -L/home/user1/R-3.1.1/lib -lR
((which hbase && (mkdir -p ../inst; cd hbase-io; sh build_linux.sh; cp build/dist/* ../../inst)) || echo "can't build hbase IO classes, skipping" >&2)
/mnt/foresight/servers/hbase/bin/hbase
--2014-08-14 17:47:31--  http://mirror.nus.edu.sg/apache/ant/binaries/apache-ant-1.9.2-bin.tar.gz
Resolving mirror.nus.edu.sg (mirror.nus.edu.sg)... 137.132.82.11
Connecting to mirror.nus.edu.sg (mirror.nus.edu.sg)|137.132.82.11|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-08-14 17:47:31 ERROR 404: Not Found.

tar (child): apache-ant-1.9.2-bin.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
Using /home/user1/softwares/hadoop-2.2.0 as hadoop home
Using /mnt/foresight/servers/hbase as hbase home

Copying libs into local build directory
Cannot find hbase jars in hbase home
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /home/user1/R/i686-pc-linux-gnu-library/3.1/rmr2/libs
** R
** preparing package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘quickcheck’
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (rmr2)

Versions used: Hadoop 2.2.0, HBase 0.96.2 (hadoop2), R 3.1.1, rmr2 3.1.2, rhbase 1.2.1, and rhdfs 1.0.8.

I would like to know whether these versions are compatible with each other.

Thanks,
Mohan.

Antonio Piccolboni

unread,
Aug 16, 2014, 6:46:14 PM8/16/14
to RHadoop Google Group
Hi,
I think if that line in build_linux.sh read ant-1.9.4 you wouldn't have gotten that error and you may have completed the build. Strange, since our contributor @kahhrut just fixed exactly that point. Either a typo, me botching a merge, or something changing in the mirror. This script has been pretty brittle; maybe if we just said ant > 1.9.2 is a requirement and removed the ant download it would work better. Short term, you could fix the build script yourself, or let me do it on Monday, in which case you will have to build from the repo, not from a distribution. Whichever works for you.

As far as compatibility, we don't have the resources to test every combination. You need to grab an additional package, quickcheck, from our Downloads page and run R CMD check path-to-package. If it passes all the tests, that's what we mean by compatibility. They take a long time, because there are many of them, there's latency for each test, and there are a few tests at a modest scale. I would love to have more tests at an even larger scale, but it's a trade-off. It looks to me like you have a fairly recent version of everything and there should be no problem (unfortunately, the tests don't include the hbase IO format, to avoid yet another dependency for the tests).
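If you want to patch it yourself in the meantime, something along these lines should work (a sketch; the script path is an assumption based on the build output above, which runs build_linux.sh from the hbase-io directory under src):

p <- "rmr2/pkg/src/hbase-io/build_linux.sh"
writeLines(gsub("ant-1.9.2", "ant-1.9.4", readLines(p), fixed = TRUE), p)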


Antonio

Mohana Sundaram

unread,
Aug 18, 2014, 2:55:59 AM8/18/14
to rha...@googlegroups.com, ant...@piccolboni.info
Okay Antonio, no problem. Thanks a lot for your quick and helpful reply. Now I am going to try the same build with a previous version of HBase (0.94.x) to make use of the rmr package, because the recent rmr2 packages are not compatible with HBase 0.96.x. Hope it will work. I will get back to you, Antonio, if I face any problem.

Mohan.

Antonio Piccolboni

unread,
Aug 19, 2014, 11:50:10 AM8/19/14
to RHadoop Google Group
It looks like this has been fixed; please upgrade to rmr2 3.2.0.



Antonio

Antonio Piccolboni

unread,
Aug 25, 2014, 1:24:40 AM8/25/14
to RHadoop Google Group

On Sun, Aug 24, 2014 at 12:23 AM, Mohana Sundaram <technol...@gmail.com> wrote:
Error in if (file.exists(cmd)) return(cmd) : argument is of length zero

That looks like a bug in rmr2. Can you try, at the R prompt,

rmr2:::hdfs.cmd()

and tell me what happens? (That's not a solution; it's to help me circumscribe the problem.)


Mohana Sundaram

unread,
Aug 25, 2014, 1:58:43 AM8/25/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

Here is the output when I try that command in the R terminal:

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> rmr2:::hdfs.cmd()

Error in if (file.exists(cmd)) return(cmd) : argument is of length zero


Thanks,
Mohan.

Antonio Piccolboni

unread,
Aug 25, 2014, 8:14:06 PM8/25/14
to rha...@googlegroups.com, ant...@piccolboni.info
This is bug https://github.com/RevolutionAnalytics/rmr2/issues/132. Please check there for a workaround. Your program seems to fail earlier, though, so there are two issues at play. To clarify the other one, you need to get to the stderr of a failing task and post it back here. Just console output won't cut it, unfortunately, but that's nothing new on this group. Just reading your program, I can't think of anything. Thanks
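The workaround there boils down to setting HDFS_CMD yourself before loading rmr2 (a sketch; the exact path depends on your install):

Sys.setenv(HDFS_CMD = "/path/to/hadoop/bin/hdfs")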

Antonio Piccolboni

unread,
Aug 27, 2014, 4:27:00 PM8/27/14
to rha...@googlegroups.com, ant...@piccolboni.info
I don't need two copies of your program; I need the stderr of a failing task. Consult your Hadoop documentation to find it. The console output, unless you run Hadoop in standalone mode, doesn't have the information we probably need.

On Monday, August 25, 2014 11:57:25 PM UTC-7, Mohana Sundaram wrote:
Antonio, I have resolved that error with the workaround you mentioned in the above link, by setting HDFS_CMD. Now, after running the mapreduce job, I am getting the below error from the jobtracker in the hadoop logs.

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

But in the R terminal i am getting the error as below:
14/08/26 05:36:41 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201408221219_0019
14/08/26 05:36:41 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201408221219_0019_m_000000
14/08/26 05:36:41 INFO streaming.StreamJob: killJob...

Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
Calls: mapreduce -> mr
Execution halted
Warning: $HADOOP_HOME is deprecated.

Warning: $HADOOP_HOME is deprecated.

Deleted hdfs://localhost:9000/tmp/file25b230f3f310

I have also copied the program for your reference, since you mentioned that just reading it you couldn't think of anything.
R Program :
===========

require(rhbase)
require(rmr2)
#require(rhdfs)
Sys.setenv(HADOOP_CMD = '/mnt/servers/hadoop/bin/hadoop')
Sys.setenv(HADOOP_STREAMING = "/mnt/servers/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar")
hostLoc = 'localhost'  # your server IP
port = 9090            # default port for the thrift service

#hb.init()
hb.init(serialize = "char")

#hb.list.tables()
#hb.insert("test_stream_realtime", list(list("20100101", c("df:name","df:company"), list("Mohan","cognizant"))))
#hb.insert("test_stream_realtime", list(list("20100102", c("df:name","df:company"), list("Sharan","cognizant"))))
#hb.insert("test_stream_realtime", list(list("20100103", c("df:name","df:company"), list("Mathan","cognizant"))))
hb.describe.table("test_stream_realtime")

#r = from.dfs(mapreduce(input = "input", input.format = make.input.format("hbase", family.columns = list(cf=list("x","y")), map = function(k,v) v)))

print('Here')
#hb.scan("test_stream_realtime", start = 1, end = 10, colspec = c("df"))

r = mapreduce(
  input = "test_stream_realtime",
  input.format = make.input.format("hbase",
    family.columns = list(df = list("name", "company")),
    key.deserialize = "raw",
    cell.deserialize = "raw"),
  output.format = "text",
  map = function(k, v) v)


Thanks,
Mohan.

Mohana Sundaram

unread,
Aug 28, 2014, 12:06:15 PM8/28/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

Here is the stderr for the failed mapreduce tasks. For all the tasks that I am running, I am getting the same error.

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

Is it possible to trace the error with the help of this log now?

Thanks,
Mohan.

Antonio Piccolboni

unread,
Aug 28, 2014, 1:00:28 PM8/28/14
to RHadoop Google Group
I need the stderr of R, not a stack trace from Java. You'll see typical R startup messages, packages loading, objects loading and the like. If you just paste console messages, almost all R errors look like

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1

I would need superpowers to fix all bugs from that one single message, pasted into a new bug report each time.

To find the stderr of a failing task you need basic familiarity with the jobtracker web UI, which is anyway a must-have skill for writing any mapreduce program -- unless you only write correct code, of course.

Go to the URL seen in the console,

http://localhost:50030/jobdetails.jsp?jobid=job_201408221219_0019

then select map tasks, then failed tasks, then one in particular, then logs, then stderr, or something like this. Take a look there, and unless it clarifies the issue for you, please report back here. Thanks


Mohana Sundaram

unread,
Aug 31, 2014, 10:19:18 AM8/31/14
to rha...@googlegroups.com, ant...@piccolboni.info

I have resolved this issue by installing all the R libraries in the system path. Now I am not getting any error when running the mapreduce operation.

- Mohan.

Mohana Sundaram

unread,
Aug 31, 2014, 11:10:34 AM8/31/14
to rha...@googlegroups.com, ant...@piccolboni.info
Antonio,

I would like to know how we could retrieve columns from HBase, with the timestamp as a filter, during the mapreduce operation using rmr.
Could you please suggest an approach?

- Mohan.

Antonio Piccolboni

unread,
Sep 1, 2014, 2:25:05 PM9/1/14
to RHadoop Google Group
Hi Mohan,
you don't explain the problem, so I don't understand the problem. Have you read the manual? Have you tried anything? Is the timestamp the key of your hbase table? Do you need somebody to do your work?

Antonio

Mohana Sundaram

unread,
Sep 4, 2014, 7:26:59 AM9/4/14
to rha...@googlegroups.com, ant...@piccolboni.info
Hey Antonio,

Our use case is to retrieve or filter columns based on a timestamp (one of the columns in HBase) inside mapreduce. When I went through the manual, I only came across timestamp operations in the regular functions, not in the R mapreduce operation; that's why I asked you.

I wasn't asking you to do my work, just for any suggestion or idea on how to proceed further.

Thanks,
Mohan.

Antonio Piccolboni

unread,
Sep 4, 2014, 11:32:13 AM9/4/14
to RHadoop Google Group
Can you do what you need to do on a small scale outside mapreduce, in the console? That would be step one. Second, I would think about the input to the mapreduce job. Third, move the rhbase code inside the map function and see what happens.
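The third step might look roughly like this (an untested sketch; the table name and row keys are made up):

r = mapreduce(
  input = to.dfs(keyval(1, c("20100101", "20100102"))),  # row keys to fetch
  map = function(k, v) {
    library(rhbase)
    hb.init(serialize = "char")  # one thrift connection per map task
    keyval(v, hb.get("test_stream_realtime", as.list(v)))
  })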


Antonio

Antonio Piccolboni

unread,
Sep 8, 2014, 3:37:59 PM9/8/14
to rha...@googlegroups.com, ant...@piccolboni.info
I talked with people closely involved in the design of rhbase, and they warned me that rhbase interfaces with hbase through a thrift server, which could quickly become a bottleneck unless you make sure that each map process talks to a different thrift server, or to one of a large enough pool. If you followed the installation instructions node by node, then you'd have a thrift server running on each node, and that should be scalable.
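Inside the map function, that could mean initializing against the local node rather than a single fixed host (a sketch; it assumes a thrift server listening on 9090 on every worker):

hb.init(hostLoc = Sys.info()[["nodename"]], port = 9090, serialize = "char")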

Antonio


