out of memory (java heap space) on zip creation (jruby)


Jedrin

May 9, 2012, 12:52:27 PM
to Ruby on Rails: Talk
I am using rubyzip and am trying to put a huge csv file with 1.4
million rows into the zip file.
Using JRuby I get an out-of-memory (Java heap space) error.

I believe the error happens in the block below:

Zip::ZipOutputStream.open(zip_path) do |zos|
  zos.put_next_entry(File.basename(csv_path))
  zos.print IO.read(csv_path)
end

Luis Lavena

May 9, 2012, 1:27:31 PM
to rubyonra...@googlegroups.com
You're reading the entire file contents into memory and then saving.

See if there is a way for you to stream chunks (16 kilobytes, for example) into the zip stream.

 --
Luis Lavena

Jedrin

May 9, 2012, 2:07:32 PM
to Ruby on Rails: Talk

The error happens on the line:

zos.print IO.read(csv_path)

I see that
p zos.class
shows:
Zip::ZipOutputStream

and that the print method is inherited from:
http://rubyzip.sourceforge.net/classes/IOExtras/AbstractOutputStream.html
where the docs show print defined as:

# File lib/zip/ioextras.rb, line 130
def print(*params)
  self << params.to_s << $\.to_s
end



I am not sure offhand how to stream the data, but I gathered that the
problem was from reading the file into memory.

Greg Akins

May 9, 2012, 2:42:47 PM
to rubyonra...@googlegroups.com
On Wed, May 9, 2012 at 2:07 PM, Jedrin <jrub...@gmail.com> wrote:



> I am not sure offhand how to stream the data, but I gathered that the
> problem was from reading the file into memory.

 
The default heap size for the JVM is pretty small. I believe you can pass args to the JVM when you start jruby.

If you do something like -Xmx1024m (not sure that syntax is exactly correct, but it's close) you might get enough. Of course that depends on the size of the file.

--
Greg Akins
http://twitter.com/akinsgre


Jedrin

May 9, 2012, 3:31:33 PM
to Ruby on Rails: Talk


On May 9, 2:42 pm, Greg Akins <angryg...@gmail.com> wrote:
>
> The default heapsize for the jvm is pretty small.  I believe you can pass
> args to jvm when you start jruby
>
> if you do something like -Xmx1024m (not sure that syntax is exactly
> correct, but it's close) you might get enough.  Of course that depends on
> the size of the file
>
> --
> Greg Akins
> http://twitter.com/akinsgre

Well, the csv file has something like 1.4 million rows and maybe 20
columns or something like that. When I get a chance, maybe I'll look
into that if that seems like the thing to try ..


Robert Walker

May 9, 2012, 4:05:35 PM
to rubyonra...@googlegroups.com
Jedrin wrote in post #1060204:
"When I get a chance, maybe..."???

Greg gave you the answer. A default JVM instance heap space is limited
to 64 Megabytes. If the file you're loading, plus the memory consumed by
your application, goes over that memory limit the JVM will report "out
of memory" and begin exhibiting unpredictable behavior.

It makes no difference how much physical RAM your machine might contain.
The JVM will NOT use more heap space than the maximum defined by the
-Xmx argument (-Xmx64m being the default when not specified).

--
Posted via http://www.ruby-forum.com/.

Jedrin

May 9, 2012, 4:42:43 PM
to Ruby on Rails: Talk
>
> Greg gave you the answer. A default JVM instance heap space is limited
> to 64 Megabytes. If the file you're loading, plus the memory consumed by
> your application, goes over that memory limit the JVM will report "out
> of memory" and begin exhibiting unpredictable behavior.
>
> It makes no difference how much physical RAM your machine might contain.
> The JVM will NOT use more heap space than the maximum defined by the
> -Xmx argument (-Xmx64m being the default when not specified).
>
> --
> Posted via http://www.ruby-forum.com/.

So I launched my Sinatra app like this; from my Google searches, the
-J arg looks like what I want:

jruby -J-Xmx1024m -S recordset.rb

When I tried to download the csv file (which the server puts into the
zip file and then crashes), I got the same heap space error, but it
seemed like it did run longer before it crashed. If I try to increase
that number much higher than 1024m, I get:

Error occurred during initialization of VM
Could not reserve enough space for object heap
JVM creation failed




Greg Akins

May 9, 2012, 4:52:21 PM
to rubyonra...@googlegroups.com
On Wed, May 9, 2012 at 4:42 PM, Jedrin <jrub...@gmail.com> wrote:

> When I tried to download the csv file (which the server puts into the
> zip file and then crashes), I got the same heap space error, but it
> seemed like it did run longer before it crashed. If I try to increase
> that number much higher than 1024m, I get:

The heap contains all the objects created for the application. In this case, it looks like your file is still too big.


> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> JVM creation failed

This means that you tried to allocate more heap than is available on the machine.

Are you doing this for a single load, or will it be an application that will commonly receive large files?

If it's the latter, I'd probably try to redesign the code you're using to load the files.  Sounds like this is part of a third party gem?  If that's the case, maybe they have some mechanism for handling larger files? 

Jedrin

May 9, 2012, 5:21:39 PM
to Ruby on Rails: Talk

> The heap contains all the objects created for the application.. In this
> case, it looks like your file is still too big
>
> > Error occurred during initialization of VM
> > Could not reserve enough space for object heap
> > JVM creation failed
>
> This means that you tried to allocate more than is available on the machine
>
> Are you doing this for a single load, or will it be an application that
> will commonly receive large files?
>
> If it's the latter, I'd probably try to redesign the code you're using to
> load the files.  Sounds like this is part of a third party gem?  If that's
> the case, maybe they have some mechanism for handling larger files?
>
> --
> Greg Akins
> http://twitter.com/akinsgre

What I do is create a csv file from the database. I had some memory
problems there, but using active record find_in_batches() seemed to
solve that.

The CSV file has 1.4 million rows. It gets created successfully. I
then use rubyzip gem to create a zip file that just contains that CSV
file. I just used examples I found from google searches on how to
create the zip file which are shown earlier up in the thread. I looked
at the class info on the web for rubyzip and didn't see an obvious way
to stream data into the zip file. Tomorrow I can look at perhaps some
other way to create a zip file using a different gem or some such ..



Luis Lavena

May 9, 2012, 7:58:52 PM
to rubyonra...@googlegroups.com
As I mentioned in my previous reply and similar to the problem you had when creating the file: you're trying to load the whole thing.

There are two options for this:

A) You stream the contents of your CSV file, reading by chunks into a ZipStream

or

B) You zip the file from outside Ruby (shelling out to gzip for example)

--
Luis Lavena
 

Jedrin

May 11, 2012, 10:11:52 AM
to Ruby on Rails: Talk

> As I mentioned in my previous reply and similar to the problem you had when
> creating the file: you're trying to load the whole thing.
>
> There are two options for this:
>
> A) You stream the contents of your CSV file, reading by chunks into a
> ZipStream
>

That's exactly what I would like to do; I wasn't sure offhand whether
the zip method will read it that way, or how to pass it. I was hoping
for an idea on how to do that.

The code where it all happens is here, and the second line is where it
crashes:

zos.put_next_entry(File.basename(fpath))
zos.print IO.read(fpath)

zos is an instance of Zip::ZipOutputStream.
The print method is inherited from IOExtras::AbstractOutputStream

According to the docs, print() is like this:

def print(*params)
  self << params.to_s << $\.to_s
end

Since it does params.to_s, I'm guessing that is going to put it all
into memory.
The other methods may have similar problems.

However, the putc method looked interesting.

There is a putc() defined like this according to the docs:

def putc(anObject)
  self << case anObject
    when Fixnum then anObject.chr
    when String then anObject
    else raise TypeError, "putc: Only Fixnum and String supported"
  end
  anObject
end


So I tried that; here is my code, and the output follows. The file
I was trying to zip was another zip file. It appeared to be a bit
bigger than it should have been, and when I tried to open it I got an
error saying it was corrupted.

This isn't quite the same CSV problem, but I am doing a zip file into
a zip file here.


def zput(zos, fpath)
  p fpath
  zos.put_next_entry(File.basename(fpath))
  f = File.new(fpath)
  chunk_sz = 10000000
  while !f.eof?
    data = f.read(chunk_sz)
    zos.putc data
    puts 'read ' + data.size.to_s + ' bytes'
  end
end


"web.war"
read 10000000 bytes
read 10000000 bytes
read 8573823 bytes
"data.war"
read 10000000 bytes
read 8655347 bytes
"big.zip"
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 10000000 bytes
read 3431079 bytes

Jedrin

May 11, 2012, 10:26:46 AM
to Ruby on Rails: Talk

I changed the putc to a write in the above post, followed by
zos.print "" at the very end; it appears print() appends $\ to the file.
The byte size of the zip file inside the zip was short by two bytes,
and I still get corrupted zip file errors on that.

Jedrin

May 11, 2012, 4:47:50 PM
to Ruby on Rails: Talk
It's late Friday and I am done for the day, but I just tried something
else. It may be that I need to open the file in binary mode and I
didn't. Initial tests seem to indicate that may be the case. Thanks
for everyone's help.