parsing real-time xml data coming from udp socket

eschmitt

Dec 5, 2010, 12:07:45 AM
to nodejs
Hi all,

I'm writing an application that needs to parse real-time xml data
coming from a udp socket, at 1 second intervals. Once I get the data,
I need to insert a time stamp, save the data to a MySQL database, and
send to a web app/client via socket.io. I have most of this working,
with test data being transmitted via udp and then sent to the web app/
client, at 1 second intervals. It works really well; however, it's all
using JSON objects, which are then stringified for udp and socket.io
transmission and then finally parsed on the client side. Also, I'm
concerned that my meager xml parsing skills may yield code that is not
fast enough to keep up with the 1 second intervals. Any advice on
which node.js xml parsing library to go with, with an emphasis on
speed? Also, any tips on how best to implement this type of algorithm?
Best practices, things to watch out for, etc... I really appreciate
any advice you have.

Cheers,
Eric

Mikeal Rogers

Dec 5, 2010, 5:53:55 PM
to nod...@googlegroups.com
why udp?

if a packet gets dropped that xml isn't gonna parse.

Preston Guillory

Dec 5, 2010, 6:29:20 PM
to nod...@googlegroups.com
Any XML document that fits in a single UDP packet (64 KB, had to look that up) should be easy to parse in much less than a second.  Do you know how big they'll be, ballpark?

How many connections are you expecting?  Just the one UDP feed and one client?

Is it feasible to just stick with JSON?

How are you even receiving UDP packets in Node?  I only see TCP implemented in the docs.

Marco Rogers

Dec 5, 2010, 7:36:21 PM
to nodejs
You can try my libxmljs library. It's a node binding to libxml2.

https://github.com/polotek/libxmljs

It has a streaming xml parser and it's fast. I don't think anyone's
used it for the use case you're describing so I'd be very interested
to see how it holds up. I'm also very responsive to issues.

However, the points that Mikeal and Preston made are important. If
you miss packets, your parsing will fail. And if your xml fits in 1
packet, you probably don't need a streaming parser. There's also a
non-streaming parser so you can parse the full payload directly into
an xml dom.

@Preston http://nodejs.org/docs/v0.2.5/api.html#dgram-267
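
A minimal sketch of receiving the feed with the dgram module and attaching an arrival time stamp to each datagram. It uses the current Node.js dgram API, which differs from the v0.2.5 docs linked above. The names `stampDatagram` and `listen`, and the port number in the usage note, are illustrative assumptions and not anything from the project.

```javascript
const dgram = require('dgram');

// Wrap a received datagram with an arrival time stamp.
// `now` is injectable so the function can be tested without a socket.
function stampDatagram(buf, now = new Date()) {
  return { receivedAt: now.toISOString(), xml: buf.toString('utf8') };
}

// Hypothetical helper: listen on `port` and hand each stamped
// datagram to `onRecord`. Returns the socket so the caller can close it.
function listen(port, onRecord) {
  const socket = dgram.createSocket('udp4');
  socket.on('message', (buf, rinfo) => onRecord(stampDatagram(buf), rinfo));
  socket.on('error', (err) => {
    console.error('udp error:', err.message);
    socket.close();
  });
  socket.bind(port);
  return socket;
}
```

Something like `listen(41234, (record) => console.log(record.receivedAt, record.xml.length))` would then log one line per datagram. Each 'message' event carries exactly one datagram, so no reassembly is attempted here.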

:Marco

Preston Guillory

Dec 5, 2010, 7:52:33 PM
to nod...@googlegroups.com
Thanks.  Learn something new every day.

eschmitt

Dec 6, 2010, 4:37:29 PM
to nodejs
@Mikeal,

You're absolutely correct, if the packet is dropped, I'll have nothing
to parse. But these are unfortunately the constraints I have to work
with. As far as I know, we are not worried about dropping packets.
Suffice it to say, the design required UDP, and I didn't have input
into that portion of the project. I hope this answers your question.

-Eric

eschmitt

Dec 6, 2010, 5:15:15 PM
to nodejs
@Preston,

The test file I'm working with is only 503 bytes, and is roughly
1/14th the size of what the actual data sent will be. It looks like
this:

<resultset>
<result key="some_key" value="some_value" />
<!-- repeat <result> tag 3 more times -->
<!-- this is the summary/overview for all following resultsets -->
</resultset>
<resultset>
<result key="some_key" value="some_value" />
<!-- repeat 6 more times -->
</resultset>

...repeat the second <resultset> approximately 14 times.

This will be my first time actually having to parse xml; previously
I've used JAXB for Java which makes short work of things if you have
an XSD. JAXB simply creates Java objects based on the XSD, which you
can then use to easily retrieve your xml data. I'm guessing that
node.js does not have anything that sophisticated yet? Ideally, I'm
looking to get this into a JSON object as soon as I can, but
unfortunately it has to be after I receive it as xml via udp...
BELIEVE me when I say I really tried to push for JSON to be created
and sent over udp, but we just don't have the resources and time at
this point to go back and re-work that section of the project. :( At
any rate, should I be looking at a SAX parser to parse the xml snippet
above? Does this make any sense?

Cheers,
Eric

Dean Landolt

Dec 6, 2010, 5:30:27 PM
to nod...@googlegroups.com
If it all fits into a single UDP packet (which it sounds like it must), there's no need for a sax parser -- you can easily fit the whole thing in memory. You could probably even get away with one of the various xml-to-json libs to get it into a (dom-like) json object right away.
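
To illustrate the in-memory approach for the sample posted above, here is a rough sketch that turns each <resultset> into a plain object keyed by the `key` attribute. It is not a general XML parser: it assumes the flat, attribute-only shape from the sample, with `key` written before `value`, and it does no entity decoding. The function name `parseResultsets` is hypothetical, and a real XML library is the safer choice if the format is any richer than this.

```javascript
// Parse the flat <resultset>/<result key=".." value=".." /> shape
// into an array of plain objects, one per <resultset>.
function parseResultsets(xml) {
  // Drop comments first so a commented-out <result> is not picked up.
  const withoutComments = xml.replace(/<!--[\s\S]*?-->/g, '');
  const sets = [];
  const setRe = /<resultset>([\s\S]*?)<\/resultset>/g;
  let setMatch;
  while ((setMatch = setRe.exec(withoutComments)) !== null) {
    const results = {};
    // Assumes attribute order key, then value, as in the sample.
    const resultRe = /<result\s+key="([^"]*)"\s+value="([^"]*)"\s*\/>/g;
    let m;
    while ((m = resultRe.exec(setMatch[1])) !== null) {
      results[m[1]] = m[2];
    }
    sets.push(results);
  }
  return sets;
}
```

The returned array can be passed straight to `JSON.stringify` for the socket.io side. Repeated keys inside one <resultset> would overwrite each other in this sketch.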

Marco Rogers

Dec 6, 2010, 5:51:51 PM
to nodejs
Yeah there are a few here that are much lighter than libxmljs.

https://github.com/ry/node/wiki/modules#parsers-xml


eschmitt

Dec 6, 2010, 5:53:51 PM
to nodejs
@Marco,

Will this work on a 64-bit Linux (RedHat Enterprise Linux) platform?
If so, could you detail the steps to get it up and running? I ran into
some problems trying to install node-o3-fastxml and node-o3-xml on a
64-bit platform, as they didn't have 64-bit binaries. Also, I'm not
really sure yet what the total size of the payload will be and if it
will fit in one packet... so I think trying to parse it as a stream
would be the safer approach for now. Your thoughts?

Cheers,
Eric

Marco Rogers

Dec 7, 2010, 8:32:30 AM
to nodejs
libxmljs works on my 64-bit Macbook Pro. I don't see why it wouldn't
work on Redhat. I don't have the environment to test though.

Whether you should stream or not really depends on how confident you
are in the UDP connection. I've never worked with UDP so I don't know what
a typical packet loss rate is. If the average size of your xml docs
is less than the average packet size (1K? I have no idea), then you
could try parsing a packet at a time and see how often it fails.

You could go with streaming option by default and it wouldn't matter
if it came in chunks. In that case, libxmljs will take the chunks and
surface errors if there is a parse problem (it doesn't throw them,
they're available as document.errors). You could run some tests with
this setup and see how often it happens.
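
One way to run that kind of test is to wrap whatever parse function is chosen and count how many datagrams fail. This is a minimal sketch, assuming the parser throws on malformed input; the libxmljs streaming parser described above reports errors differently, so the check would need adapting there. `makeFailureCounter` is a made-up name, and `JSON.parse` in the usage note is only a stand-in for a real XML parser.

```javascript
// Wrap a throwing parse function and keep running failure statistics.
function makeFailureCounter(parse) {
  const stats = { total: 0, failed: 0 };

  // Returns the parsed value, or null if the parser threw.
  function tryParse(text) {
    stats.total += 1;
    try {
      return parse(text);
    } catch (err) {
      stats.failed += 1;
      return null;
    }
  }

  // Fraction of inputs that failed to parse so far (0 if none seen).
  function failureRate() {
    return stats.total === 0 ? 0 : stats.failed / stats.total;
  }

  return { tryParse, failureRate, stats };
}
```

For example, `const counter = makeFailureCounter(JSON.parse)` followed by a `counter.tryParse(text)` call per datagram lets `counter.failureRate()` be logged every so often to see how the feed behaves.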

Streaming xml is bad enough (twitter just removed the xml option from
their streams). But doing it over UDP adds a whole new layer of headache. Good
luck :)

:Marco