[groovy-user] xmlslurper parsing large XML documents results in heapsize errors


Alex Wouda

Mar 31, 2010, 7:29:49 AM3/31/10
to us...@groovy.codehaus.org
Hi,

I'm looping through about 100+ files, each containing between 5 and 8 MB of XML.
The XML is very simple, see the example below.
For each file I parse the text with XmlSlurper to get the values, put them into a Grails domain class and persist it.
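
In rough outline it looks like the following (simplified sketch only; the Customer class and property names here are illustrative, the actual code is in the attached usedMethod.groovy):

// simplified sketch - class and property names are illustrative
new File('/data/import').eachFile { file ->
    def rowset = new XmlSlurper().parse(file)      // the whole file is parsed into memory
    rowset.ROW.each { row ->
        new Customer(
            customerIdentifier: row.CUSTOMERIDENTIFIER.text(),
            firstName:          row.FIRSTNAME.text(),
            lastName:           row.LASTNAME.text(),
            email:              row.EMAILADDRESS.text()
        ).save()
    }
}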

I've increased the heap size to 768 MB but I still get OutOfMemory exceptions. I installed MAT (the Eclipse Memory Analyzer), and my suspicion seems right: somewhere in the XmlSlurper processing the heap is not cleaned up correctly by the GC, or at least that is how I understand it. I've attached a screenshot of the MAT output.

First question: according to its creator, XmlSlurper was designed to deal with large amounts of XML in a file/string, so I must be doing something wrong. Should I explicitly empty / nullify objects to achieve my goal? I've attached part of the code.

How can I prevent these memory errors? Is there an error in my code that I'm missing, or is XmlSlurper not capable of dealing with 5-8 MB files? (If I test a single file from the Groovy Console it's very quick.)

Thanks for your help.

Alex



Example XML:

<ROWSET>
 <ROW>
  <CUSTOMERIDENTIFIER>xxxxxx</CUSTOMERIDENTIFIER>
  <GENERATEDIDENTIFIER>xxxx</GENERATEDIDENTIFIER>
  <DATEOFBIRTH>25-xxx-89</DATEOFBIRTH>
  <FIRSTNAME>xxxxx</FIRSTNAME>
  <LASTNAME>xxxx</LASTNAME>
  <INITIALS>s</INITIALS>
  <GENDER>M</GENDER>
  <EMAILADDRESS>xxxxx</EMAILADDRESS>
  <TELEPHONE>xxxxx</TELEPHONE>
  <PAYMENTTYPEID>1</PAYMENTTYPEID>
  <PAYMENTSTATUSID>0</PAYMENTSTATUSID>
  <HASPAIDWARRANTY>0</HASPAIDWARRANTY>
  <HASNOTENTEREDREFERENCENUMBER>0</HASNOTENTEREDREFERENCENUMBER>
  <LABELID>0</LABELID>
  <ISINVOICEADDRESS>0</ISINVOICEADDRESS>
  <ISCOMPANY>0</ISCOMPANY>
  <CANBEUSEDFOROTHERPURPOSES>0</CANBEUSEDFOROTHERPURPOSES>
  <STREETNAME>xxxxx</STREETNAME>
  <HOUSENUMBER>171</HOUSENUMBER>
  <POSTALCODE>xxxx</POSTALCODE>
  <CITY>ARNHEM</CITY>
  <COUNTRYID>1</COUNTRYID>
 </ROW>
</ROWSET>

screenshot.png
usedMethod.groovy

Alex Wouda

Mar 31, 2010, 8:16:06 AM3/31/10
to us...@groovy.codehaus.org
Hi,

I increased the -Xmx setting to 1250 MB and now it seems to work. In JConsole I see that memory usage stays around 1 GB, with spikes to 1.3 GB, and after GC it drops to about 700 MB. So my guess is that parsing 8 MB XML files simply takes quite a bit more memory...

The process has now been running for around 20 minutes; so far about 12 files of roughly 8 MB each have been processed, inserting 100K records.

So we might just have to increase the heap size temporarily for this import process.

If anybody has suggestions for code / performance / memory usage improvements, that would be interesting.

regards,

Alex

Paul King

Mar 31, 2010, 8:34:02 AM3/31/10
to us...@groovy.codehaus.org

XmlSlurper is fairly efficient for the kind of processing it allows
you to do - though whether everything is being GC'd as early as it
could be is something worth looking into. Having said that, XmlSlurper
will most likely be much slower and have higher memory requirements
than the StAX or SAX alternatives, if you are willing to write the less
concise code which those alternatives require.
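
For example, a StAX-based version (untested sketch using the JDK's javax.xml.stream API; 'customers.xml' is just a placeholder) could read one ROW at a time and never hold the whole document in memory:

import javax.xml.stream.XMLInputFactory
import javax.xml.stream.XMLStreamConstants

// untested sketch: stream one ROW at a time instead of building a full tree
def reader = XMLInputFactory.newInstance().createXMLStreamReader(new FileReader('customers.xml'))
def current = [:]
while (reader.hasNext()) {
    switch (reader.next()) {
        case XMLStreamConstants.START_ELEMENT:
            def name = reader.localName
            if (name != 'ROWSET' && name != 'ROW') {
                current[name] = reader.elementText   // leaf element text, e.g. FIRSTNAME
            }
            break
        case XMLStreamConstants.END_ELEMENT:
            if (reader.localName == 'ROW') {
                // persist the collected map here (e.g. bind it to the domain class), then forget it
                current = [:]
            }
            break
    }
}
reader.close()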

Cheers, Paul.



