We are facing an issue where a "Size exceeds Integer.MAX_VALUE" exception is raised in Spark when the localIterator starts to write data. We are not sure of the probable cause, but our use case is to write the data out on the driver node, and the data size is very large.
We have used a local iterator in other cases too and have not encountered this error.
Many thanks in advance.
Could you please give a code example? This should only come up with an integer literal that is too big or with an integer overflow.
// Application-specific helpers: group (span) the Cassandra rows by key,
// then map each span to the string records that will be dumped
JavaPairRDD<String, Iterable<CassandraRow>> theSpanRDD = RDDSparkHelper.spanBy(theCasssandraJavaRDD);
JavaRDD<String> theJavaRDD = RDDSparkHelper.mapSpanToDumpJavaRDD(theSpanRDD, true, theStringRecords, map, id);
// Repartition, cache, and then pull the data to the driver one partition at a time
JavaRDD<String> theRepartitionedRDD = theJavaRDD.repartition(Integer.parseInt(theNumPartitions) * 3);
JavaRDD<String> theCachedRDD = theRepartitionedRDD.persist(StorageLevel.MEMORY_AND_DISK());
Iterator<String> theLocalRDD = theCachedRDD.toLocalIterator();
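For completeness, a rough sketch of how the driver side might then consume this iterator and write the records to a local file; the method name and output path below are hypothetical and not from the original post:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;

public class DriverDump {
    // Streams the RDD contents through the driver one record at a time
    // and writes each record as a line in a local file.
    static void dumpToLocalFile(Iterator<String> rows, String path) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(path))) {
            while (rows.hasNext()) {
                writer.write(rows.next());
                writer.newLine();
            }
        }
    }
}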
I would just try more partitions; it seems like you are going past some internal API's limit. Also, when you get an executor-side exception like this, it helps to get the log from that executor.
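A minimal sketch of that suggestion, assuming the idea is simply to spread the same data over more, smaller partitions before caching and pulling it to the driver; the method name and partition count are illustrative, not a recommendation:

import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

public class RepartitionBeforeIterate {
    // Repartition into more, smaller partitions before caching, so no
    // single cached block comes near the 2 GB / Integer.MAX_VALUE limit.
    static Iterator<String> toLocalIteratorWithMorePartitions(JavaRDD<String> rdd, int numPartitions) {
        return rdd.repartition(numPartitions)
                  .persist(StorageLevel.MEMORY_AND_DISK())
                  .toLocalIterator();
    }
}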
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:221)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:157)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:219)
... 18 more
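This later trace is a different failure: the executor ran out of local disk space while the Hadoop writer was closing an output file. If it is Spark's scratch directories (shuffle and spill files) that are filling the disk rather than the output volume itself, one relevant setting is spark.local.dir; a minimal sketch, assuming a larger mount such as /mnt/bigdisk exists on every worker (hypothetical path):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalDirExample {
    public static void main(String[] args) {
        // spark.local.dir controls where Spark writes shuffle and spill files;
        // point it at a volume with enough free space on every worker node.
        SparkConf conf = new SparkConf()
                .setAppName("local-dir-example")
                .set("spark.local.dir", "/mnt/bigdisk/spark-tmp");  // hypothetical path
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // ... run the job that writes the output ...
        } finally {
            sc.stop();
        }
    }
}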