java.io.FileNotFoundException: Cannot access directory error

hh

May 28, 2015, 4:22:25 AM
to tesser...@googlegroups.com
Hi,

I have a problem reading a CSV file from HDFS with Tessera on a non-AWS cluster. I hope someone can shed some light on this.

The environment:
R 3.1.1
Rhipe v0.75.1.5 with rJava_0.9-6
Tessera running on CDH5 MR2 with YARN; all nodes (master and task) run RHEL.

I've tested the Hadoop layer with a Hadoop Streaming job, and it worked.
The output directory mentioned in the logs below, '/data-out-01', was automatically deleted after the job failed.
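
For reference, this is roughly how I double-check the HDFS paths from R (a minimal sketch; it assumes the RHIPE server has been initialized with rhinit()):

> library(Rhipe)
> rhinit()        # start the RHIPE bridge to Hadoop/HDFS
> rhls("/data")   # list the input directory; the CSV file shows up here

rhls() is the same call that fails in the traceback below, so the input side at least looks reachable.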

The error (summary) is:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.io.FileNotFoundException: Cannot access /data-out-01: No such file or directory.
In addition: Warning message:
In Rhipe:::rhwatch.runner(job = job, mon.sec = mon.sec, readback = readback,  :
  Job failure, deleting output: /data-out-01:

I have a CSV file in HDFS under the '/data' directory, and I want to read it and store the output in '/data-out-01' in HDFS. The commands I issued were:

> library(datadr)
> library(Rhipe)

> options(error = quote(dump.frames("testdump", TRUE)))
> data.in <- datadr::hdfsConn(loc="/data/", type="text", autoYes=T)
* Loading connection attributes

> data.out <- datadr::hdfsConn(loc="/data-out-01", autoYes=T)
* Attempting to create directory... success
* Saving connection attributes
* To initialize the data in this directory as a distributed data object or data frame, call ddo() or ddf()

> data.ddo <- datadr::drRead.csv(data.in, output=data.out)
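
For completeness, on a working setup I'd expect to be able to inspect the result along these lines (just the intended usage, not output from this cluster):

> data.ddo         # print a summary of the distributed data frame
> data.ddo[[1]]    # first key-value pair, i.e. the first block of parsed rows

Instead the job fails before getting that far.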

The Java exception is as follows:

       pct numtasks pending running complete killed failed_attempts killed_attempts
map      0        1       0       1        0      0               3               0
reduce   0        1       1       0        0      0               0               0
Waiting 5 seconds
There were Hadoop specific errors (autokill will not kill job), showing at most 30:
Error: java.io.IOException: java.io.IOException: Stream closed
at org.godhuli.rhipe.RHMRMapper.map(RHMRMapper.java:132)
at org.godhuli.rhipe.RHMRMapper.run(RHMRMapper.java:60)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Stream closed
at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
at java.io.OutputStream.write(OutputStream.java:116)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.godhuli.rhipe.RHBytesWritable.write(RHBytesWritable.java:121)
at org.godhuli.rhipe.RHMRHelper.write(RHMRHelper.java:350)
at org.godhuli.rhipe.RHMRMapper.map(RHMRMapper.java:126)
... 8 more
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.io.FileNotFoundException: Cannot access /data-out-01: No such file or directory.
In addition: Warning message:
In Rhipe:::rhwatch.runner(job = job, mon.sec = mon.sec, readback = readback,  :
  Job failure, deleting output: /data-out-01:

Calling traceback() right after gives me:

> traceback()
17: stop(list(message = "java.io.FileNotFoundException: Cannot access /data-out-01: No such file or directory.", 
        call = .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", 
            cl, .jcast(if (inherits(o, "jobjRef") || inherits(o, 
                "jarrayRef")) o else cl, "java/lang/Object"), .jnew("java/lang/String", 
                method), j_p, j_pc, use.true.class = TRUE, evalString = simplify, 
            evalArray = FALSE), jobj = <S4 object of class "jobjRef">))
16: .Call(RJavaCheckExceptions, silent)
15: .jcheck(silent = FALSE)
14: .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, 
        .jcast(if (inherits(o, "jobjRef") || inherits(o, "jarrayRef")) o else cl, 
            "java/lang/Object"), .jnew("java/lang/String", method), 
        j_p, j_pc, use.true.class = TRUE, evalString = simplify, 
        evalArray = FALSE)
13: .jrcall(x, name, ...)
12: rhoptions()$server$rhls(folder, if (recurse) 1L else 0L)
11: rhls(fp, recurse = TRUE)
10: getBasicDdoAttrs.kvHDFS(res, conn)
9: getBasicDdoAttrs(res, conn)
8: ddo(res$data, update = FALSE, verbose = FALSE)
7: mrExec(ddo(file), map = map, control = control, output = output, 
       overwrite = overwrite, params = c(params, parList), packages = packages)
6: inherits(conn, "ddo")
5: ddf(mrExec(ddo(file), map = map, control = control, output = output, 
       overwrite = overwrite, params = c(params, parList), packages = packages))
4: readTable.hdfsConn(file, rowsPerBlock, skip, header, hd, hdText, 
       readTabParams, postTransFn, output, overwrite, params, packages, 
       control)
3: readTable(file, rowsPerBlock, skip, header, hd, hdText, readTabParams, 
       postTransFn, output, overwrite, params, packages, control)
2: drRead.table(file = file, header = header, sep = sep, quote = quote, 
       dec = dec, fill = fill, comment.char = comment.char, ...)
1: datadr::drRead.csv(data.in, output = data.out)  
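
If I'm reading the traceback correctly, the order of events is: the map task dies with the 'Stream closed' IOException (RHMRHelper.write is writing to the R subprocess on the task node, so 'Stream closed' suggests that R process exited or never started properly), rhwatch() then deletes '/data-out-01' because the job failed, and finally ddo() calls rhls() on the now-deleted output directory, which is what raises the FileNotFoundException. So the FileNotFoundException looks like a secondary symptom, and the real problem is whatever kills R in the mapper.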
  
One of the YARN logs also mentions something about exit code 143.
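
If it helps, my understanding is that 143 is just 128 + SIGTERM, i.e. the container was killed rather than exiting on its own; with YARN that is often the containers of an already-failed job being torn down, or a container exceeding its memory limits. I can't tell whether it's a cause here or just fallout from the failed map attempts.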
