Hi,
I'm processing a 350 MB XML file on my LoopBack server, and of course ran into trouble immediately with my previous method of simply reading the whole file and using xpath to extract what I need. The XML is basically a very long list of entries:
<xml>
<entry>
...
</entry>
<entry>
...
</entry>
...
</xml>
So, I tried using readline to extract each entry on its own, process it, insert it into the database (a local MongoDB), and then continue reading the XML. This works fine for ten minutes or so, and then the script fails with 'Allocation failed - process out of memory'.
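For reference, this is roughly what that looks like (simplified sketch; Entry stands in for my LoopBack model and parseEntry for my xpath extraction, both defined elsewhere):

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('entries.xml'),
  crlfDelay: Infinity
});

let buffer = [];
rl.on('line', (line) => {
  buffer.push(line);
  if (line.trim() === '</entry>') {
    const entryXml = buffer.join('\n');
    buffer = [];
    // fire-and-forget save: nothing slows down the reader,
    // so pending writes pile up in memory
    Entry.create(parseEntry(entryXml), (err) => {
      if (err) console.error(err);
    });
  }
});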
I've googled around a bit, and here are two posts by someone with a very similar problem:
It seems that what I need to do is implement this as a stream all the way through, so that the built-in backpressure mechanism throttles the read to accommodate a slow output process. But don't I then need to implement a streaming write to the database as well? Is that possible using the built-in model API, or do I need to use a third-party streaming module like 'stream-to-mongo'?
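To make the question concrete, something like this is what I imagine on the write side (untested sketch; entryStream is a placeholder for whatever object-mode stream emits the parsed entries, and Entry is again my LoopBack model):

const { Writable } = require('stream');

// Object-mode sink: the callback only fires after the save completes,
// so whatever is piped into it gets throttled automatically.
const mongoSink = new Writable({
  objectMode: true,
  write(entry, _enc, callback) {
    Entry.create(entry, callback); // Entry = my LoopBack model (placeholder)
  }
});

entryStream // placeholder: object-mode stream emitting one parsed entry at a time
  .pipe(mongoSink)
  .on('finish', () => console.log('all entries saved'))
  .on('error', (err) => console.error(err));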
My current workaround is to wait until each entry has been saved before I start processing the next.
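In code it's just the earlier sketch with the reader paused around each save:

    rl.pause(); // stop reading while we save
    Entry.create(parseEntry(entryXml), (err) => {
      if (err) console.error(err);
      rl.resume(); // only then move on to the next entry
    });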
*** A corollary question is about what happens while the script is creating all these entries (in the future it will mostly be updating, e.g. removing expired entries): it seems to make the API completely unresponsive. Since Node should be able to handle all this in a non-blocking way, my guess is that the MongoDB server is the bottleneck? Any thoughts on how to improve the situation?
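One thing I've been considering (but haven't tried) is batching the inserts so mongo gets fewer round trips, roughly along these lines:

// Untried idea: collect parsed entries and insert them in chunks.
// If I read the docs right, LoopBack's create() also accepts an array.
const BATCH_SIZE = 500; // arbitrary
let batch = [];

function addToBatch(entry, done) {
  batch.push(entry);
  if (batch.length >= BATCH_SIZE) {
    const docs = batch;
    batch = [];
    Entry.create(docs, done); // one round trip per 500 entries
  } else {
    done();
  }
}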
Cheers,
Einar