Local Error - java.io.FileNotFoundException: (No such file or directory)


gustavs...@gmail.com

unread,
Apr 23, 2013, 3:49:56 PM4/23/13
to spark...@googlegroups.com

I am running Spark locally ( local[8] ) on OS X with 16 GB of RAM and 8 cores.

It works when I pass in a dataset of 5 or 10 GB.

However, when I pass in a dataset of 15 GB, it fails. I am wondering if there is some memory limitation I am running into.

I get hundreds of errors like this one:

13/04/23 10:55:51 INFO local.LocalScheduler: Running ResultTask(0, 377)
13/04/23 10:55:51 INFO storage.BlockManager: Started 0 remote gets in  1 ms
13/04/23 10:55:51 INFO local.LocalScheduler: Size of task 377 is 1667 bytes
13/04/23 10:55:51 INFO storage.BlockManager: Started 0 remote gets in  0 ms
13/04/23 10:55:51 INFO storage.BlockManager: Started 0 remote gets in  0 ms
13/04/23 10:55:52 INFO storage.BlockManager: Started 0 remote gets in  0 ms
13/04/23 10:55:52 ERROR local.LocalScheduler: Exception in task 373
java.io.FileNotFoundException: /var/folders/g_/djj279317y12n7wn7wc6vfj00000gn/T/spark-local-20130423105354-3d97/01/shuffle_0_60_373 (No such file or directory)
     at java.io.RandomAccessFile.open(Native Method)
     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
     at spark.storage.DiskStore.getBytes(DiskStore.scala:85)
     at spark.storage.DiskStore.getValues(DiskStore.scala:92)
     at spark.storage.BlockManager.getLocal(BlockManager.scala:269)
     at spark.storage.BlockManager$$anonfun$getMultiple$5.apply(BlockManager.scala:566)
     at spark.storage.BlockManager$$anonfun$getMultiple$5.apply(BlockManager.scala:565)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at spark.storage.BlockManager.getMultiple(BlockManager.scala:565)
     at spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:48)
     at spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:31)
     at spark.RDD.computeOrReadCheckpoint(RDD.scala:206)
     at spark.RDD.iterator(RDD.scala:195)
     at spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:19)
     at spark.RDD.computeOrReadCheckpoint(RDD.scala:206)
     at spark.RDD.iterator(RDD.scala:195)
     at spark.rdd.MappedRDD.compute(MappedRDD.scala:12)
     at spark.RDD.computeOrReadCheckpoint(RDD.scala:206)
     at spark.RDD.iterator(RDD.scala:195)
     at spark.scheduler.ResultTask.run(ResultTask.scala:76)
     at spark.scheduler.local.LocalScheduler.runTask$1(LocalScheduler.scala:74)
     at spark.scheduler.local.LocalScheduler$$anon$1.run(LocalScheduler.scala:50)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
     at java.lang.Thread.run(Thread.java:722)

I have attached a full log, if it helps.

Thanks,

Eric


spark_error.log

Josh Rosen

unread,
Apr 23, 2013, 4:02:03 PM4/23/13
to spark...@googlegroups.com
I don't know the fix for this, but this issue might be related to the one described at https://groups.google.com/d/msg/spark-users/wwQAWwKC9jE/EJKtt6aVgKkJ

gustavs...@gmail.com

unread,
Apr 23, 2013, 5:28:31 PM4/23/13
to spark...@googlegroups.com
Thanks for the link. I tried setting spark.local.dir to /sparktmp, and I took ownership of /sparktmp and opened up the permissions (777). I'm still getting the same error.
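For what it's worth, in Spark of this era (0.7.x) spark.local.dir is read as a JVM system property, and it only takes effect if it is set before the SparkContext is constructed; otherwise Spark keeps using java.io.tmpdir (the /var/folders/... path in the stack trace above). A minimal sketch of the ordering (the SparkContext line is commented out; everything else is plain Scala):

```scala
object LocalDirConfig {
  def main(args: Array[String]): Unit = {
    // Must be set BEFORE the SparkContext is created, or Spark falls
    // back to java.io.tmpdir (/var/folders/... on OS X).
    System.setProperty("spark.local.dir", "/sparktmp")

    // val sc = new spark.SparkContext("local[8]", "myJob")  // picks up the property

    println(System.getProperty("spark.local.dir"))
  }
}
```

If the property is set after the context exists, the shuffle files still land in the default temp directory and the permissions change on /sparktmp has no effect.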

gustavs...@gmail.com

unread,
Apr 23, 2013, 7:15:26 PM4/23/13
to spark...@googlegroups.com
Update:

Running OS X 10.8.3 (Mountain Lion).

I tried running the 15GB job with 4 cores ( local[4] ) instead of 8 ( local[8] ), and it ran successfully without errors.

My best guess is that I am running into some limit imposed by OS X. I increased the maxfiles limit, but that didn't seem to help when running with 8 cores.

$ launchctl limit
    cpu         unlimited      unlimited     
    filesize    unlimited      unlimited     
    data        unlimited      unlimited     
    stack       8388608        67104768      
    core        0              unlimited     
    rss         unlimited      unlimited     
    memlock     unlimited      unlimited     
    maxproc     1064           1064          
    maxfiles    100000         100000 


I think it might have something to do with the number of concurrent processes, but I am not sure. For now, I will keep running on fewer cores when testing things out locally.
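Some back-of-envelope arithmetic on the open-files theory (the partition counts here are guesses read off the log above, where task ids and the shuffle block id shuffle_0_60_373 suggest roughly 400 partitions on each side):

```scala
object ShuffleFileEstimate {
  // Assumed partition counts, inferred from task ids in the log (~377).
  val mapTasks = 400
  val reduceTasks = 400
  val cores = 8

  // Pre-consolidation Spark writes one file per (map task, reduce task) pair.
  val totalShuffleFiles = mapTasks * reduceTasks   // 160,000 files on disk

  // Only concurrently running tasks hold handles open at the same time.
  val openHandlesAtPeak = cores * reduceTasks      // 3,200 descriptors

  def main(args: Array[String]): Unit = {
    println(s"total shuffle files: $totalShuffleFiles")
    println(s"open handles at peak: $openHandlesAtPeak")
  }
}
```

So the total number of shuffle files created can exceed maxfiles (100,000), even though the descriptors open at any one instant look modest; halving the cores halves the peak concurrency, which fits the observation that local[4] succeeds where local[8] fails.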

gustavs...@gmail.com

unread,
Apr 26, 2013, 8:13:14 PM4/26/13
to spark...@googlegroups.com

My code uses a reduce operation, and since it fails only with larger data sets, I am wondering if it's a memory limitation issue.

Matei says:

[hadoop] deals better with reduce operations where one task's data doesn't fit in memory (by being able to spill sort data to disk).

If I have a dataset that is larger than my system memory, and I am running the spark job locally, should it work?

Or am I expected to distribute the job onto a set of machines with enough total memory?
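A rough per-task estimate may help separate the memory question from the file-descriptor one (the partition count is again a guess from the task ids in the log):

```scala
object PerTaskSize {
  // Dataset size from this thread; partition count is an assumption.
  val datasetGb = 15.0
  val reducePartitions = 400

  // Average data volume each reduce task must hold.
  val mbPerTask = datasetGb * 1024 / reducePartitions

  def main(args: Array[String]): Unit =
    println(f"~$mbPerTask%.1f MB per reduce task")
}
```

With ~400 partitions each reduce task only handles about 38 MB on average, so a single task's data not fitting in memory seems unlikely to be the cause here unless the key distribution is badly skewed or the partitioning is much coarser than assumed.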