XML parsing and editing of big (not huge) files


Enrico Rosso

Mar 31, 2015, 10:23:09 AM
to lu...@googlegroups.com
Hello Lucee community.

I'm building a web tool that

- parses an XML document (a different one each time)
- retrieves some nodes
- edits some attributes
- saves it back.

On big documents (>3-4 MB) the procedure just hangs with no apparent error message. Since it stalls the whole Lucee engine, I suspect it is an out-of-memory error caused by XMLParse loading the whole document into memory.

I've seen several approaches using a Java InputStream or CFML readline(), but these are good for reading an XML file, not for editing it.

My documents range from a few KB to 70-80 MB. They're big, but not that huge.

Is there any library or method I could use to do that?

Thank You in advance for your answers.

-- 
Enrico

Walter Seethaler

Mar 31, 2015, 11:45:03 AM
to lu...@googlegroups.com
Hi Enrico,

We use Saxon EE for XSLT and schema validation; XML changes are done via XSLT. There is a free version too, but it doesn't support streaming:


We still use some CF XML functions. After we switched to Lucee (with its default memory settings) we saw the same server behavior you describe (caused by long-running XML imports). The server now has an 8 GB heap and runs stably.
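Editing XML through XSLT, as described above, can be sketched with an identity transform plus one overriding template. This is only an illustration: it uses the XSLT 1.0 processor bundled with the JDK rather than Saxon, it does not stream, and the `item` element and `status` attribute are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltAttributeEdit {
    // Identity transform (copy everything) plus one override: the status
    // attribute of every <item> is rewritten to "done".
    // Both names are hypothetical and stand in for the real document's names.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='item/@status'>"
      + "<xsl:attribute name='status'>done</xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<items><item id=\"1\" status=\"open\"/><item id=\"2\"/></items>"));
    }
}
```

For real files, the string source and result would be replaced with file-based `StreamSource` and `StreamResult` objects.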

Walter

Alex Skinner

Mar 31, 2015, 5:29:08 PM
to lu...@googlegroups.com

I agree. Large XML files are definitely not a problem, but check the memory settings you are giving to Lucee.

A

Sent from my phone

--
You received this message because you are subscribed to the Google Groups "Lucee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+un...@googlegroups.com.
To post to this group, send email to lu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/16aa33c6-cf38-4a57-8f81-384cdac35116%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Enrico

Mar 31, 2015, 6:42:52 PM
to lu...@googlegroups.com
Thank you guys for your answers. I'll check the memory settings. I guess, being on Tomcat, I should use setenv.sh.

Are there any suggested settings for Lucee?

-- 
Enrico

(Sent from iPhone)

Andrew Dixon

Apr 1, 2015, 3:15:33 AM
to lu...@googlegroups.com
Hi Enrico,

Yes, that is where you set the memory settings. It is impossible to give "recommended" settings, as they depend entirely on your application. Set them to some value, then test your application, monitor performance, memory usage, etc., and see how it behaves.
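One rough way to confirm that a heap setting actually took effect is to ask the JVM itself. The sketch below is a standalone illustration with a hypothetical class name; it only reads the standard `Runtime` counters and is not a substitute for a proper monitoring tool.

```java
public class HeapReport {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();           // upper bound, as set by -Xmx or the JVM default
        long reserved = rt.totalMemory();    // heap currently reserved by the JVM
        long used = reserved - rt.freeMemory();
        System.out.printf("max=%d MiB, reserved=%d MiB, used=%d MiB%n",
            toMiB(max), toMiB(reserved), toMiB(used));
    }
}
```

Run with the same JVM options as the servlet container, the reported maximum should match the configured heap size, give or take some JVM overhead.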

To measure performance you can use many different applications, but I would recommend either Fusion Reactor (http://www.fusion-reactor.com/) or New Relic (http://newrelic.com/). Fusion Reactor is more CFML-specific and a bit easier to install than New Relic. New Relic is more of an "industry standard" for Java APM but requires a bit more work. I recently put together a helper CFC for New Relic, which you can find at https://github.com/mso-net/lucee-newrelic, that gives you more CFML information than the out-of-the-box installation. Both have 14-day free trials, and New Relic has a "free" tier that you can continue to use after the 14 days, but with a limited information set and only 24 hours of data retention.

Kind regards,

Andrew
about.me
mso - Lucee - Member

Michael Offner

Apr 2, 2015, 11:58:41 AM
to lucee
I have added a new entry to the wiki that shows how you can use the event-driven XML parser (SAX) with Lucee.

This parser does not store any data in memory on its own; what to store is completely up to you. So you can read XML files of any size.
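To show how an event-driven parser can also be used for editing, here is a minimal sketch in plain Java. It is not the code from the wiki entry: it puts a SAX filter between the parser and an identity transformer, so events are rewritten and written out as they arrive instead of being collected into a tree. The `item` element and `status` attribute are hypothetical names.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxAttributeEdit {
    // Passes every event through unchanged, except that the status attribute
    // of <item> elements (hypothetical names) is set to "done".
    static class StatusFilter extends XMLFilterImpl {
        StatusFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if ("item".equals(qName)) {
                int i = atts.getIndex("status");
                if (i >= 0) {
                    AttributesImpl copy = new AttributesImpl(atts);
                    copy.setValue(i, "done");
                    atts = copy;
                }
            }
            super.startElement(uri, localName, qName, atts);
        }
    }

    static String edit(Reader in) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // A transformer created without a stylesheet copies its input to its output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                new SAXSource(new StatusFilter(reader), new InputSource(in)),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("SAX edit failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(edit(new StringReader(
            "<items><item id=\"1\" status=\"open\"/><item id=\"2\"/></items>")));
    }
}
```

The demo writes into a `StringWriter` for brevity; for large documents the reader and the result would be backed by files so that neither the input nor the output is held in memory.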

Micha




Igal @ Lucee.org

Apr 2, 2015, 12:15:08 PM
to lu...@googlegroups.com
cool stuff! :)

Igal Sapir
Lucee Core Developer
Lucee.org

Risto

Apr 2, 2015, 7:57:12 PM
to lu...@googlegroups.com
Great example. I love the cookbooks. Cookbooks that demonstrate how Lucee handles popular tasks are a sure way to get people to use or try the technology.