OutOfMemoryError when using csvread


nejc....@gmail.com

May 19, 2018, 8:49:30 AM
to Scala Breeze
I'm trying to load the MNIST training data from Kaggle (73.22 MB) into a DenseMatrix using the csvread function, and it fails with an OutOfMemoryError (it runs out of heap space). If I increase the heap size, the same error is thrown, this time with the message "GC overhead limit exceeded". It seems that something in CSVReader.read is generating an excessive number of short-lived objects. Here is the stack trace:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at au.com.bytecode.opencsv.CSVParser.parseLine(CSVParser.java:250)
    at au.com.bytecode.opencsv.CSVParser.parseLineMulti(CSVParser.java:174)
    at au.com.bytecode.opencsv.CSVReader.readNext(CSVReader.java:237)
    at breeze.io.CSVReader$$anon$1.next(CSVReader.scala:41)
    at breeze.io.CSVReader$$anon$1.next(CSVReader.scala:34)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at breeze.io.CSVReader$$anon$1.foreach(CSVReader.scala:34)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:59)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:50)
    at scala.collection.immutable.VectorBuilder.$plus$plus$eq(Vector.scala:658)
    at scala.collection.immutable.VectorBuilder.$plus$plus$eq(Vector.scala:635)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:310)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:308)
    at breeze.io.CSVReader$$anon$1.to(CSVReader.scala:34)
    at scala.collection.TraversableOnce.toIndexedSeq(TraversableOnce.scala:300)
    at scala.collection.TraversableOnce.toIndexedSeq$(TraversableOnce.scala:300)
    at breeze.io.CSVReader$$anon$1.toIndexedSeq(CSVReader.scala:34)
    at breeze.io.CSVReader$.read(CSVReader.scala:17)
    at breeze.linalg.package$.csvread(package.scala:83)
    at com.picnicml.doddlemodel.linear.SoftmaxClassifierTiming$.delayedEndpoint$com$picnicml$doddlemodel$linear$SoftmaxClassifierTiming$1(SoftmaxClassifierTiming.scala:11)
    at com.picnicml.doddlemodel.linear.SoftmaxClassifierTiming$delayedInit$body.apply(SoftmaxClassifierTiming.scala:8)
    at scala.Function0.apply$mcV$sp(Function0.scala:34)
    at scala.Function0.apply$mcV$sp$(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App.$anonfun$main$1$adapted(App.scala:76)
    at scala.App$$Lambda$5/727001376.apply(Unknown Source)
    at scala.collection.immutable.List.foreach(List.scala:389)
    at scala.App.main(App.scala:76)
    at scala.App.main$(App.scala:74)
    at com.picnicml.doddlemodel.linear.SoftmaxClassifierTiming$.main(SoftmaxClassifierTiming.scala:8)
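
The trace shows CSVReader.read calling toIndexedSeq, so the whole file appears to be materialized row by row before the matrix is built. One possible workaround is to stream the file and write each value straight into a preallocated array. The following is only a minimal sketch of that idea, not Breeze's implementation: the names StreamingCsv and readNumeric are hypothetical, and it assumes purely numeric, unquoted fields, a known number of rows and columns, and a comma delimiter.

```scala
import java.io.{BufferedReader, Reader}

// Hypothetical helper, not part of Breeze.
object StreamingCsv {

  /** Parses numeric, unquoted CSV into a column-major Array[Double].
    * Only one line is held as strings at any time.
    * Assumption: rows and cols are known up front (for example from a
    * first pass that counts lines).
    */
  def readNumeric(reader: Reader, rows: Int, cols: Int, skipLines: Int = 0): Array[Double] = {
    val in = new BufferedReader(reader)
    val data = new Array[Double](rows * cols)

    // Skip header lines, if any.
    var skipped = 0
    while (skipped < skipLines) { in.readLine(); skipped += 1 }

    var r = 0
    var line = in.readLine()
    while (line != null && r < rows) {
      val fields = line.split(',')
      require(fields.length == cols, s"row $r has ${fields.length} fields, expected $cols")
      var c = 0
      while (c < cols) {
        // Column-major index: element (r, c) lives at c * rows + r.
        data(c * rows + r) = fields(c).trim.toDouble
        c += 1
      }
      r += 1
      line = in.readLine()
    }
    require(r == rows, s"expected $rows rows, found $r")
    data
  }
}
```

Since the array is filled in column-major order, it should be possible to wrap it without copying via new DenseMatrix(rows, cols, data), assuming that constructor is available in the Breeze version in use. The reader passed in could be, for example, a java.io.FileReader for the Kaggle file with skipLines = 1 for the header row.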
