Apache POI WorkbookFactory.create throws java.lang.OutOfMemoryError: Java heap space

226 views
Skip to first unread message

advocate via StackOverflow

unread,
Sep 11, 2014, 5:01:07 PM9/11/14
to google-appengin...@googlegroups.com

My issue is pretty simple. I want to validate files up to 50MB in size for correct formatting in App Engine.

This presents a number of large challenges now. The first of which is the Apache XLS/XLSX POI API. When I load 20MB of file data into memory locally before validation it throws:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:128)
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:55)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:84)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:272)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:79)

That's because I'm using the input stream method instead of the file method as detailed in the Apache POI quick start guide:

When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file.

To the best of my knowledge, the blob store service in App Engine only returns streams of bytes, not actual file objects. This means I have to use the Apache API for a stream which requires the entire file to be read into memory first. Thus the gist of my issue. Here's my sample code:

validatingSpec.validate(WorkbookFactory.create(blobInputStream)

Here are my questions:

  1. Is there a different blobstore API to receive my file as a file object?
  2. Is there a different spreadsheet-reading API that accepts byte streams and doesn't load the entire file into memory at one time?
  3. Can I raise the heap space on my server easily to account for this? Please note I'm already running on an F4_1G (2400MHz, 1024MB) instance. This heap issue is during local web server emulation validation. Remotely the HTTP session times out as it passes the request / response deadline for 30 seconds that Google has set.


Please DO NOT REPLY directly to this email but go to StackOverflow:
http://stackoverflow.com/questions/25796777/apache-poi-workbookfactory-create-throws-java-lang-outofmemoryerror-java-heap-s
Reply all
Reply to author
Forward
0 new messages