Saving an RDD to s3 => java.io.IOException: Can't make directory for path s3n://mydomain/andy since it is a file.

andrew

Jan 14, 2013, 5:54:44 AM
to spark...@googlegroups.com
Hi,

I am having problems saving an RDD to S3. I execute the command like so:

rdd.saveAsTextFile("s3n://mydomain/andy/test/books.csv")

This error is then thrown: 

java.io.IOException: Can't make directory for path s3n://mydomain/andy since it is a file.

I have tried variations of the path, but I hit the same problem each time.

This is the full exception:

13/01/14 10:47:23 INFO spark.PairRDDFunctions: Saving as hadoop file of type (NullWritable, Text)
org.apache.hadoop.mapred.JobContext@71d7135b
java.io.IOException: Can't make directory for path s3n://mydomain/andy since it is a file.
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdir(NativeS3FileSystem.java:434)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.mkdirs(NativeS3FileSystem.java:425)
        at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
        at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:52)
        at org.apache.hadoop.mapred.HadoopWriter.preSetup(HadoopWriter.scala:49)
        at spark.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:593)
        at spark.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:567)
        at spark.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:477)
        at spark.RDD.saveAsTextFile(RDD.scala:508)
        at <init>(<console>:13)
        at <init>(<console>:18)
        at <init>(<console>:20)
        at <init>(<console>:22)
        at <init>(<console>:24)
        at .<init>(<console>:28)
        at .<clinit>(<console>)
        at .<init>(<console>:11)
        at .<clinit>(<console>)
        at $export(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:629)
        at spark.repl.SparkIMain$Request$$anonfun$10.apply(SparkIMain.scala:890)
        at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
        at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
        at java.lang.Thread.run(Thread.java:722)

Anyone have any ideas?

Thanks

Stephen Haberman

Jan 14, 2013, 12:45:54 PM
to spark...@googlegroups.com

> java.io.IOException: Can't make directory for path
> s3n://mydomain/andy since it is a file.

The Hadoop NativeS3FileSystem does not like the "andy" marker object
you have in S3.

(Remember that S3 has no native knowledge of "files" or "directories",
only objects, so each API/tool brings its own convention for deciding
whether a given object is a file or a directory.)

In this case, NativeS3FileSystem's convention is that "directories are
objects whose key ends in /". (Which is what the AWS console uses too.)
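
If you want to see this from the Spark shell, here's a rough sketch using
the Hadoop FileSystem API (untested; the credentials are placeholders, and
it assumes Hadoop 1.x, where FileStatus has isDir):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")      // placeholder
conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")  // placeholder
val fs = FileSystem.get(new URI("s3n://mydomain"), conf)

// The "andy" key has no trailing slash, so NativeS3FileSystem
// reports it as a file, and mkdirs() on any path under it fails.
println(fs.getFileStatus(new Path("s3n://mydomain/andy")).isDir)  // false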

However, whatever tool you used to create your test output directory is
using another convention, where directories don't have to end in a slash.
What tool did you use? Maybe s3sync? I believe that is the convention it
uses.

You'll have to remove the "andy" marker object (or rename it to "andy/"),
and then it will work. You'll probably need to do that with a different
tool, like the AWS console, than the one you used to create it in the
first place.
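
For instance, with the AWS SDK for Java it would look something like this
(just a sketch; the credentials are placeholders, and "mydomain" is your
bucket name):

import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client

// Placeholder credentials -- substitute your own.
val s3 = new AmazonS3Client(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"))

// Delete the zero-byte "andy" marker so Hadoop can create the
// andy/ "directory" itself when Spark writes the output.
s3.deleteObject("mydomain", "andy")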

- Stephen

andrew

Jan 15, 2013, 4:47:26 AM
to spark...@googlegroups.com
Ah! So silly!

I had an empty file named andy that I must have created by accident.

Thanks Stephen!