Error while updateAttribute() in Rhipe test example - not a SequenceFile


Sunny Kumar

Mar 17, 2015, 12:56:22 PM3/17/15
to rh...@googlegroups.com
Hi,

I have installed Rhipe-0.74.0, with R-3.1.2, Hadoop-1.0.4, and protobuf-2.4.1, on a multi-node cluster.

I am trying to work out the example mentioned here.

When calling irisDdf <- updateAttributes(irisDdf), I get the following error:

Waiting 5 seconds
There were Hadoop specific errors (autokill will not kill job), showing at most 30:
java.io.IOException: hdfs://.../tmp/irisKV/_meta/ddo.Rdata not a SequenceFile
...
java.io.IOException: hdfs://.../tmp/irisKV/_meta/ddf.Rdata not a SequenceFile
...
java.io.IOException: hdfs://.../tmp/irisKV/_meta/conn.Rdata not a SequenceFile
...

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  :
  java.io.FileNotFoundException: Cannot access /tmp/tmp_output-d31d2f463cfbc5abc18ad4bb5ccdc3ed: No such file or directory.
In addition: Warning message:
In Rhipe:::rhwatch.runner(job = job, mon.sec = mon.sec, readback = readback,  :
  Job failure, deleting output: /tmp/tmp_output-d31d2f463cfbc5abc18ad4bb5ccdc3ed:


Additional Info:

PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

HADOOP_HOME=/usr/lib/hadoop-1.0.4
HADOOP_BIN=/usr/lib/hadoop-1.0.4/bin
HADOOP_CONF_DIR=/usr/lib/hadoop-1.0.4/conf
HADOOP_LIBS=/usr/lib/hadoop-1.0.4/lib

LD_LIBRARY_PATH=/usr/local/lib

JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64

rhoptions()$runner (modified to)
[1] "/usr/lib64/R/bin/R CMD /usr/lib64/R/library/Rhipe/bin/RhipeMapReduce --slave --silent --vanilla"


Thanks for the help.

Sunny Kumar

Jeremiah Rounds

Sep 14, 2015, 4:05:50 PM9/14/15
to rhipe
I have this exact error in the same usage (found this thread via a Google search).

There were Hadoop specific errors (autokill will not kill job), showing at most 30:
Error: java.io.IOException: hdfs://tessera-cdh-master-001.novalocal:8020/tmp/irisKV/_meta/conn.Rdata not a SequenceFile
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1850)
	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Error: java.io.IOException: hdfs://tessera-cdh-master-001.novalocal:8020/tmp/irisKV/_meta/ddf.Rdata not a SequenceFile
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1850)
	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

Jeremiah Rounds

Sep 14, 2015, 4:09:29 PM9/14/15
to rhipe
The code I was trying to run was:

library(Rhipe)
library(datadr)
rhinit()
hdfs.setwd("/user/roun308")
rhoptions(zips = "/user/roun308/RhipeLib.tar.gz")
rhoptions(runner = "./RhipeLib/library/Rhipe/bin/RhipeMapReduce.sh")
<rhipe test code worked and is omitted>

#now testing datadr
irisHDFSconn <- hdfsConn("/tmp/irisKV", autoYes = TRUE)
data(iris)
irisKV <- list(
  list("key1", iris[1:40,]),
  list("key2", iris[41:110,]),
  list("key3", iris[111:150,]))
addData(irisHDFSconn, irisKV)
irisDdf <- ddf(irisHDFSconn)
irisDdf
# update irisDdf attributes
irisDdf <- updateAttributes(irisDdf)
irisDdf


Jeremiah Rounds

Sep 14, 2015, 5:39:27 PM9/14/15
to rhipe
For the record, the solution to this is:
rhoptions(file.types.remove.regex="(/_meta|/_rh_meta|/_outputs|/_SUCCESS|/_LOG|/_log|rhipe_debug|rhipe_merged_index_db)")



The issue is that Rhipe sees the datadr metadata files and interprets them as inputs to the MapReduce job.  Adding them to file.types.remove.regex keeps them out of the job's input.
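
If you would rather preserve whatever exclusions Rhipe already has instead of overwriting the option wholesale, one possible variation (a sketch only; it assumes file.types.remove.regex is stored as a single regex string and is readable via rhoptions(), the same way rhoptions()$runner is shown above) is:

```r
# Sketch: append the datadr metadata directories to the existing exclusion
# regex rather than replacing it. Assumes rhoptions()$file.types.remove.regex
# returns the current value as one regex string (may be NULL on some setups).
current <- rhoptions()$file.types.remove.regex
datadr_dirs <- "/_meta|/_rh_meta|/_outputs"
rhoptions(file.types.remove.regex =
  if (is.null(current)) datadr_dirs else paste(current, datadr_dirs, sep = "|"))
```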


I suggest this either needs more up-front documentation in datadr, or should somehow be automated (or both).

Ryan

Sep 15, 2015, 11:46:36 AM9/15/15
to rhipe
This isn't a problem with versions of RHIPE newer than about a year old, but you're right - we should add something to datadr to help with this.