Streaming parser for XML and JSON


Wan Li

Mar 2, 2012, 5:46:35 AM
to nod...@googlegroups.com
Hi Gurus,

I'm trying to convert some related large XML files (~30MB) to JSON format. I've tried some out-of-the-box solutions such as xml2js and xml2json. All of them consume a lot of memory (200~300MB), so I'm wondering if there are modules that are less memory hungry.
After the JSON string is constructed, it takes about 700ms to JSON.parse the content, which blocks the event loop. So is there any streaming JSON parser that can help here?
Thanks. 

--
>: ~

fent

Mar 2, 2012, 6:04:04 AM
to nod...@googlegroups.com

Wan Li

Mar 2, 2012, 6:07:09 AM
to nod...@googlegroups.com
On Fri, Mar 2, 2012 at 7:04 PM, fent <rol...@gmail.com> wrote:
xml2js is built on top of sax-js. I even hacked a version to use the streaming API provided by sax-js, but it doesn't help much.

--
>: ~

Roly

Mar 2, 2012, 6:14:11 AM
to nod...@googlegroups.com
That's interesting... xml2js returns a string in JSON format. Using sax-js directly and creating the JavaScript object as the file is being read is more optimal.

There's no reason to parse all that data twice.
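
As a rough sketch of that idea (not xml2js's or anyone's actual code), the handlers below build a plain object straight from SAX-style events, so no JSON string is produced and re-parsed. The name createTreeBuilder and the name/attrs/text/children object shape are made up for illustration; the sax-js wiring in the trailing comment assumes the sax package is installed and uses 'in.xml' as a placeholder path.

```javascript
// Builds a plain JavaScript object from SAX-style events.
// createTreeBuilder and the element shape are hypothetical, for illustration only.
function createTreeBuilder() {
  var root = null;   // top-level element once it has been opened
  var stack = [];    // elements that are currently open, innermost last

  return {
    onopentag: function (node) {
      var el = {
        name: node.name,
        attrs: node.attributes || {},
        text: '',
        children: []
      };
      if (stack.length > 0) {
        stack[stack.length - 1].children.push(el);
      } else {
        root = el;
      }
      stack.push(el);
    },
    ontext: function (text) {
      // Text outside any element is ignored.
      if (stack.length > 0) {
        stack[stack.length - 1].text += text;
      }
    },
    onclosetag: function () {
      stack.pop();
    },
    result: function () {
      return root;
    }
  };
}

// Possible wiring to sax-js (assumes `npm install sax`):
//   var sax = require('sax');
//   var builder = createTreeBuilder();
//   var stream = sax.createStream(true);
//   stream.on('opentag', builder.onopentag);
//   stream.on('text', builder.ontext);
//   stream.on('closetag', builder.onclosetag);
//   stream.on('end', function () { console.log(builder.result()); });
//   require('fs').createReadStream('in.xml').pipe(stream);
```

Note that this still holds the whole document in memory as one object; it only avoids the extra string and the second parse.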

--
>: ~

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en

Nuno Job

Mar 2, 2012, 6:23:34 AM
to nod...@googlegroups.com
No, sax-js can't be using all that memory. xml2js is probably doing some
buffering of its own that accounts for that memory usage.

I've read the sax-js code and the buffers are very well managed, plus you
can set a limit after which buffers will be purged.

Nuno

Thierry Templier

Mar 2, 2012, 8:10:27 AM
to nod...@googlegroups.com
Hello,

I recently updated Node to version 0.6.11 and I now have problems when
trying to use node-waf to compile a native addon.

Here is the error I get:

Traceback (most recent call last):
  File "/usr/local/bin/node-waf", line 16, in <module>
    Scripting.prepare(t, os.getcwd(), VERSION, wafdir)
  File "/usr/local/bin/../lib/node/wafadmin/Scripting.py", line 145, in prepare
    prepare_impl(t, cwd, ver, wafdir)
  File "/usr/local/bin/../lib/node/wafadmin/Scripting.py", line 135, in prepare_impl
    main()
  File "/usr/local/bin/../lib/node/wafadmin/Scripting.py", line 188, in main
    fun(ctx)
  File "/usr/local/bin/../lib/node/wafadmin/Scripting.py", line 386, in build
    return build_impl(bld)
  File "/usr/local/bin/../lib/node/wafadmin/Scripting.py", line 399, in build_impl
    bld.add_subdirs([os.path.split(Utils.g_module.root_path)[0]])
  File "/usr/local/bin/../lib/node/wafadmin/Build.py", line 981, in add_subdirs
    self.recurse(dirs, 'build')
  File "/usr/local/bin/../lib/node/wafadmin/Utils.py", line 634, in recurse
    f(self)
  File "/home/templth/work/projects/node.js/manning/node.js-in-practice/source-code/addons/technique4/wscript", line 11, in build
    t = ctx.new_task_gen('cxx', 'shlib', 'node_addon')
  File "/usr/local/bin/../lib/node/wafadmin/Build.py", line 335, in new_task_gen
    ret = cls(*k, **kw)
  File "/usr/local/bin/../lib/node/wafadmin/Tools/ccroot.py", line 162, in __init__
    TaskGen.task_gen.__init__(self, *k, **kw)
  File "/usr/local/bin/../lib/node/wafadmin/TaskGen.py", line 118, in __init__
    self.env = self.bld.env.copy()
AttributeError: 'NoneType' object has no attribute 'copy'

Thanks very much for your help!
Thierry

Jeremy Darling

Mar 2, 2012, 8:49:40 AM
to nod...@googlegroups.com
I can't help you with your problem, but maybe you should be moving to node-gyp, since it is the replacement for node-waf? At least according to TTN it is :)

"node-gyp is a cross-platform command-line tool written in Node.js for compiling native addon modules for Node.js, which takes away the pain of dealing with the various differences in build platforms. It is the replacement to the node-waf program which is removed for node v0.8. If you have a native addon for node that still has a wscript file, then you should definitely add a bindings.gyp file to support the latest versions of node."

I've actually had better experiences getting things to build with GYP than with WAF.
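
For what it's worth, a minimal gyp file for a single-source addon might look roughly like the sketch below. The target name "addon" and the source file "addon.cc" are placeholders, not taken from the project above. Note that node-gyp looks for a file named binding.gyp (singular) in the package root.

```json
{
  "targets": [
    {
      "target_name": "addon",
      "sources": ["addon.cc"]
    }
  ]
}
```

With that file in place, `node-gyp configure` followed by `node-gyp build` is the usual way to compile the addon.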

 - Jeremy


Isaac Schlueter

Mar 2, 2012, 6:38:09 PM
to nod...@googlegroups.com
xml2js is built on the sax streaming interface, but it doesn't present
a streaming interface. It builds up a data structure representing the
entire xml document.

If you have *lots* of xml to parse, you need to talk to the streaming
interface directly, not through xml2js.
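
A minimal sketch of what talking to the streaming interface directly could look like: instead of building the whole document, hand each repeated record to a callback as soon as its closing tag arrives, then drop it. createRecordSplitter, the 'item' record name, and the flat field-to-text record shape are assumptions for illustration; the sax-js wiring in the trailing comment assumes the sax package is installed and uses 'in.xml' as a placeholder path.

```javascript
// Emits one small object per <recordName> element and keeps nothing else.
// createRecordSplitter is a hypothetical helper, not part of sax-js.
function createRecordSplitter(recordName, onRecord) {
  var record = null;  // record being assembled, or null when outside one
  var field = null;   // name of the direct child element last opened
  var depth = 0;      // nesting depth below the record element

  return {
    onopentag: function (node) {
      if (record === null) {
        if (node.name === recordName) {
          record = {};
          depth = 0;
        }
        return;
      }
      depth += 1;
      if (depth === 1) {
        field = node.name;
        record[field] = '';
      }
    },
    ontext: function (text) {
      // Only text directly inside a first-level child is kept.
      if (record !== null && depth === 1) {
        record[field] += text;
      }
    },
    onclosetag: function () {
      if (record === null) {
        return;
      }
      if (depth === 0) {
        onRecord(record);  // hand the finished record over
        record = null;     // and let it be garbage collected
      } else {
        depth -= 1;
      }
    }
  };
}

// Possible wiring to sax-js (assumes `npm install sax`):
//   var sax = require('sax');
//   var splitter = createRecordSplitter('item', function (rec) {
//     process.stdout.write(JSON.stringify(rec) + '\n');
//   });
//   var stream = sax.createStream(true);
//   stream.on('opentag', splitter.onopentag);
//   stream.on('text', splitter.ontext);
//   stream.on('closetag', splitter.onclosetag);
//   require('fs').createReadStream('in.xml').pipe(stream);
```

Because only the current record is held at any time, memory use depends on the size of one record rather than the size of the file, and writing one JSON line per record avoids a single large JSON.parse later.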
