Re: [nodejs] Using http client on a high volume streaming socket and I have to put split chunks of XML back together - why?

Ben Noordhuis

Dec 28, 2012, 6:42:33 AM12/28/12
to nod...@googlegroups.com
On Thu, Dec 27, 2012 at 8:51 PM, am_p1 <andrewmc...@gmail.com> wrote:
> Just fyi, I never get an "end" since this is streaming... just "data" always
> and always... and 99.99% of them are not split chunks, ie; they are complete
> and valid XML.
>
> So I've got the merging back together mostly working but it's a pain and
> it's still not 100%, I assume due to me trying to merge them back together
> in UTF8 strings. Maybe I should be doing it with a buffer instead? cause I
> read the chunks could be split in the middle of the UTF8 encoding... yikes!!
>
> This cleared up most of it but again, still not 100%.
> str = (save + data).toString('utf8')
>
> And shouldn't node.js be handling this somehow? ( he asked not knowing
> hardly anything about node.js - this is my first post!!! )
>
> Thanks for any assistance!!

stream.setEncoding('utf8') will take care of partial multi-byte
sequences. Your data event listener will receive strings instead of
buffers.

Alternatively, you can concat the buffers together. The buffertools
module has a helper for that but it's quite trivial to implement one
yourself:

// a and b are the input buffers
var c = new Buffer(a.length + b.length);
a.copy(c, 0);
b.copy(c, a.length);
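Wrapped up as a function, with the Buffer.concat one-liner that node v0.8 and later ship with (so you may not need buffertools or the hand-rolled copy at all):

```javascript
// Same copy-into-a-new-buffer helper as above, as a reusable function.
function concatBuffers(a, b) {
  var c = new Buffer(a.length + b.length);
  a.copy(c, 0);
  b.copy(c, a.length);
  return c;
}

var a = new Buffer('<tra');
var b = new Buffer('de/>');
console.log(concatBuffers(a, b).toString('utf8'));   // '<trade/>'
console.log(Buffer.concat([a, b]).toString('utf8')); // same result, built in
```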

am_p1

Dec 28, 2012, 9:08:36 AM12/28/12
to nod...@googlegroups.com
I'm using this for sure:
response.setEncoding('utf8')

but the problem is that the chunks can be split more than once, and with UTF-8 strings there doesn't seem to be any character that indicates the buffer was split. I read that JSON responses have a \n you can use, but I don't see that anywhere in the XML responses I'm receiving.

If node.js doesn't put these back together for me, then I need to figure out which characters at the end of the buffer indicate a split chunk, so I can put them back together myself. Currently I'm looking for certain strings in the XML packet to tell a complete chunk from an incomplete one, but again, it's not working 100%.

The current code below works 99.99% of the time, receiving millions of these during the day. But occasionally DB2 won't accept the XML, and it's always with this error message:
SQL16110N  XML syntax error. Expected to find "Comment or PI".  SQLSTATE=2200M

Thanks for the help!!

====================================
  request.on('response',function(response) {

    response.setEncoding('utf8')

    response.on('data',function(data) {

      resp_ts = microtime.now().toString()

      datal = data.length
      datae = data.slice(-8)

      if (datal < 8 || (datae != '</trade>' && datae != '</quote>' && datae != '/status>')) {

        mics = resp_ts % mill

        console.log(Date(resp_ts).split(" ")[4]+"."+("000000"+mics).slice(-6),'partial')

        save = (save + data).toString('utf8')

      } else {

        str = (save + data).toString('utf8')

        save = ''

        db.query("insert into p1s values (?,current_timestamp,?)",[resp_ts,str],function(err, rows) {
.............
====================================

Ben Noordhuis

Dec 28, 2012, 9:46:20 AM12/28/12
to nod...@googlegroups.com
On Fri, Dec 28, 2012 at 3:08 PM, am_p1 <andrewmc...@gmail.com> wrote:
> I'm using this for sure:
> response.setEncoding('utf8')
>
> but the problem is the chunks can be split more than once and with UTF8
> strings there doesn't seem to be any character that indicates the buffer was
> split. I read that JSON responses have the \n you can use but I don't see
> that anywhere in the XML response I'm receiving.
>
> If node.js doesn't put these back together for me, then I need to figure out
> what characters are at the end of the buffer to indicate a split chunk so I
> can put them back together myself. Currently I'm looking for some strings in
> the XML packet to indicate a complete or incomplete chunk but again, it's
> not working 100%.

You may be looking at two separate issues here.

1. Partial character sequences. When used as documented,
stream.setEncoding() takes care of that: if the data chunk ends in a
partial sequence, it's not emitted until the next chunk arrives.

For the curious, the relevant code is in lib/string_decoder.js.

2. Partial XML documents. node.js can't help you here; you need to
track that yourself somehow.

If the server sets a Content-Length header, it's easy: just xml +=
data until Buffer.byteLength(xml) equals the content length. One
caveat: repeatedly calling Buffer.byteLength() is not very efficient,
but don't worry about that until later. Make it work first, then make
it work fast.
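That strategy boils down to a small accumulator (a sketch; assumes the server actually sends Content-Length, and that you've called setEncoding('utf8') so chunks arrive as strings):

```javascript
// feed() returns the complete document once enough bytes have arrived,
// and null while the body is still partial.
function makeAccumulator(contentLength) {
  var xml = '';
  return function feed(chunk) {
    xml += chunk;
    // Compare byte length, not string length: multi-byte UTF-8
    // characters make xml.length smaller than the size on the wire.
    if (Buffer.byteLength(xml, 'utf8') >= contentLength) return xml;
    return null;
  };
}

// In a response handler you'd call feed(data) on every 'data' event
// and act when it returns non-null. Simulated here with two chunks:
var doc = '<trade>\u00e9</trade>';
var feed = makeAccumulator(Buffer.byteLength(doc, 'utf8'));
console.log(feed('<trade>'));           // null - still partial
console.log(feed('\u00e9</trade>'));    // the complete document
```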

If the response is sent using chunked encoding, you probably need to
parse it with a SAX parser first.
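Short of a full SAX parser, a middle ground that fits the message types in this thread (a sketch; assumes every document ends in one of a few known root closing tags, and that those tag strings never appear inside a document) is to buffer text and split on those tags:

```javascript
// Buffers incoming text and calls onDocument() once for each complete
// document, i.e. everything up to and including a known root closing tag.
function makeSplitter(rootTags, onDocument) {
  var pending = '';
  var re = new RegExp('</(?:' + rootTags.join('|') + ')>');
  return function feed(chunk) {
    pending += chunk;
    var m;
    while ((m = re.exec(pending)) !== null) {
      var end = m.index + m[0].length;
      onDocument(pending.slice(0, end)); // one complete document
      pending = pending.slice(end);      // keep the leftover tail
    }
  };
}

var docs = [];
var feed = makeSplitter(['trade', 'quote'], function(d) { docs.push(d); });
feed('<trade>1</tra');                 // split mid-tag - nothing emitted yet
feed('de><quote>2</quote>');           // completes doc 1, then doc 2
console.log(docs); // [ '<trade>1</trade>', '<quote>2</quote>' ]
```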

am_p1

Dec 28, 2012, 11:42:53 AM12/28/12
to nod...@googlegroups.com
I doubt there's anything wrong with the XML, given the company I'm getting the data from. And I've already checked that DB2 (the parser) has no issues in this area. Well over 3 million a day are being parsed and inserted just fine.

I just don't think I'm putting the UTF-8 back together correctly in some edge case, or there's some other bug somewhere above my pay grade.

Is there any npm package I could use other than the standard node.js http client that might already have this resolved?

am_p1

Dec 28, 2012, 11:45:42 AM12/28/12
to nod...@googlegroups.com
And I left out that the console.log (in UTF-8) of the small XML packet looks perfectly "well formed", with absolutely no issues. That's why I think it's at a lower level...

Ben Noordhuis

Dec 28, 2012, 12:09:10 PM12/28/12
to nod...@googlegroups.com
On Fri, Dec 28, 2012 at 5:42 PM, am_p1 <andrewmc...@gmail.com> wrote:
> I doubt there's anything wrong with the XML, given the company I'm getting
> the data from. And I checked already that DB2 (the parser) has no issue in
> this area. And well over 3 million a day are being parsed/inserted just
> fine.
>
> I just don't think I'm putting the UTF8 back together correctly on some edge
> case, or there's some other bug somewhere above my pay grade.
>
> Is there any npm package I could use other than the standard node.js http
> client that might already have this resolved?

Maybe request, though I can't vouch for it.

On a tangential note, we (that is, a majority of the node.js core
maintainers) are putting together a paid support subscription system.
I can get you in touch with the right people if you want, issues like
yours are precisely why we started providing support plans.

am_p1

Jan 3, 2013, 10:21:09 AM1/3/13
to nod...@googlegroups.com
Thanks for the assist and the suggestion of paid support but I finally figured it out.

Had two issues, both my bad:
1 - JavaScript scoping of some of my variables was wrong, especially in node.js's async world ("use strict" helped me, since I'm such a JavaScript newbie)
2 - two well-formed XML packets concatenated together don't equal one well-formed XML packet
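A quick sketch of what point 1 looks like in practice: assigning to a variable never declared with `var` (like the `save` in the earlier snippet) silently creates a shared global, so two in-flight responses can clobber each other's state. "use strict" turns that silent global into an immediate error. (Point 2 is why the splitting has to happen per closing tag, not per chunk.)

```javascript
function assignUndeclared() {
  'use strict';
  try {
    save = '';   // no `var` - implicit global; a ReferenceError under strict mode
    return 'ok';
  } catch (e) {
    return e.name;
  }
}
console.log(assignUndeclared()); // 'ReferenceError'
```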

Currently I'm receiving and inserting into DB2 over 3 million XML packets in a 6.5-hour day, and only using about 1% of my server!!!

So Node.js + node-odbc + db2 is working great and the speed and low resource consumption is amazing!!!

And I'm only running one node process against a streaming API. Now I'm going to try a few more with child_process.fork....

Thanks again!!

Ben Noordhuis

Jan 3, 2013, 11:54:21 AM1/3/13
to nod...@googlegroups.com
On Thu, Jan 3, 2013 at 4:21 PM, am_p1 <andrewmc...@gmail.com> wrote:
> Thanks for the assist and the suggestion of paid support but I finally
> figured it out.
>
> Had two issues, both my bad:
> 1 - javascript scope of some of my variables was wrong, especially in async
> node.js world ("use strict" helped me since I'm such a javascript noobie)
> 2 - two well formed xml packets concat'ed together don't equal a well formed
> xml packet
>
> Currently receiving and inserting into db2 over 3 million XML packets in 6.5
> hour day and only using about 1% of my server!!!
>
> So Node.js + node-odbc + db2 is working great and the speed and low resource
> consumption is amazing!!!
>
> And I'm only running one node on a streaming API. Now I'm going to try a few
> more with child_process.fork....
>
> Thanks again!!

Glad you got it solved. :-)

The reason I brought up support is that I thought you were someone I
spoke with earlier last week; he was also doing something with DB2 and
XML. What are the odds, right?