Non-web always-blocking cpu-intensive high-memory node worker


Baz

May 24, 2013, 1:36:12 AM
to nod...@googlegroups.com
I'm new to node and I am investigating using node for a non-web always-blocking cpu-intensive high-memory process. The process basically runs in an infinite loop, loading a lot of data from a datastore into memory, applying complex (blocking) business logic, then saving the result back to the store, and starting over again. It doesn't respond to web-requests or process html or anything of that nature. It just needs to run this loop.
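A minimal sketch of the loop described above, assuming hypothetical `loadBatch`, `applyBusinessLogic` and `saveResult` functions backed by an in-memory object in place of a real datastore client. The compute step runs synchronously; scheduling the next cycle with `setImmediate` (rather than a bare `while (true)`) gives any pending I/O callbacks a turn between cycles.

```javascript
// Hypothetical in-memory stand-in for the datastore.
const store = { input: [1, 2, 3, 4], output: null };

// Hypothetical: load a batch of records from the datastore.
function loadBatch(s) {
  return s.input.slice();
}

// Hypothetical CPU-bound business logic; runs synchronously (blocking).
function applyBusinessLogic(records) {
  let total = 0;
  for (const r of records) total += r * r;
  return total;
}

// Hypothetical: write the result back to the datastore.
function saveResult(s, result) {
  s.output = result;
}

// Run the load -> compute -> save cycle `iterations` times (Infinity = forever).
// setImmediate() schedules the next cycle on a later turn of the event loop,
// so pending I/O callbacks and timers get a chance to run in between.
function runWorker(s, iterations, done) {
  if (iterations <= 0) return done();
  const records = loadBatch(s);
  saveResult(s, applyBusinessLogic(records));
  setImmediate(() => runWorker(s, iterations - 1, done));
}

runWorker(store, 3, () => console.log("output:", store.output)); // output: 30
```

Passing `Infinity` as `iterations` gives the "run forever" behaviour; because each cycle is a fresh callback, the stack does not grow.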

Is this an acceptable use-case for node, given that it breaks every rule-of-thumb ever written? Is node reasonably efficient at doing this (it doesn't have to be the best solution in the world, just good enough)? Are there pitfalls to consider with having an always-blocked event loop? Are there any issues with using lots of RAM on a server (30 GB+) in a single node process?

Additionally, anyone know of any good write-ups of people who have tried this?

Thanks for listening!

Baz

Prajwal Manjunath

May 24, 2013, 2:24:20 AM
to nod...@googlegroups.com
There's nothing wrong with this per se, but you're simply not using node to its strengths here. Mozilla's Persona actually does something similar using node-compute-cluster (https://github.com/lloyd/node-compute-cluster).

But why use javascript for this? Computationally heavy programs are traditionally better done in purely functional languages, so I would rather use something lower level (thus faster) like the JVM (Scala/Java) for this. Also, there are far fewer libraries to support computation-heavy tasks in the Node ecosystem than in Java or Haskell.

If you feel you MUST use a high level scripting language, I guess JS is the best choice, followed by Python.

Pedro Teixeira

May 24, 2013, 4:54:17 AM
to nod...@googlegroups.com
Node is great for doing I/O and not much else, so I'd settle on using node as the master and having child processes do the work: define a clear interface between the two (perhaps JSONStream?), then use whatever technology is most appropriate to implement the child processes, which is where the CPU-heavy tasks get done.
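One way to sketch that master/worker split with only Node built-ins: the master spawns a worker and exchanges newline-delimited JSON over stdin/stdout. JSONStream itself is not used here. The worker is an inline Node script only to keep the example self-contained; under this protocol (an assumption for illustration) it could be any executable that reads one JSON job per line and writes one JSON result per line.

```javascript
const { spawn } = require("child_process");
const readline = require("readline");

// Worker: reads one JSON job per line on stdin, writes one JSON result per line on stdout.
// Could be replaced by a Java/C/Python program speaking the same line protocol.
const workerSource = `
  const rl = require("readline").createInterface({ input: process.stdin });
  rl.on("line", (line) => {
    const job = JSON.parse(line);
    const sum = job.values.reduce((a, b) => a + b, 0); // CPU-heavy work would go here
    process.stdout.write(JSON.stringify({ id: job.id, sum }) + "\\n");
  });
`;

const child = spawn(process.execPath, ["-e", workerSource], {
  stdio: ["pipe", "pipe", "inherit"],
});

// Master: collect results line by line; close the worker's stdin once all jobs are answered.
const results = [];
readline.createInterface({ input: child.stdout }).on("line", (line) => {
  results.push(JSON.parse(line));
  if (results.length === 2) child.stdin.end(); // worker exits when its stdin closes
});

child.on("close", () => console.log(results));

// Master: send jobs as newline-delimited JSON.
child.stdin.write(JSON.stringify({ id: 1, values: [1, 2, 3] }) + "\n");
child.stdin.write(JSON.stringify({ id: 2, values: [10, 20] }) + "\n");
```

Keeping the protocol to plain lines of JSON is what lets the worker side be written in whatever language suits the CPU-heavy part.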

--
Pedro

--
--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en
 
 
 

Ben Noordhuis

May 24, 2013, 7:24:50 AM
to nod...@googlegroups.com
On Fri, May 24, 2013 at 7:36 AM, Baz <b...@thinkloop.com> wrote:
> I'm new to node and I am investigating using node for a non-web
> always-blocking cpu-intensive high-memory process. The process basically
> runs in an infinite loop, loading a lot of data from a datastore into
> memory, applying complex (blocking) business logic, then saving the result
> back to the store, and starting over again. It doesn't respond to
> web-requests or process html or anything of that nature. It just needs to
> run this loop.
>
> Is this an acceptable use-case for node, given that it breaks every
> rule-of-thumb ever written?

It's certainly possible but it's not node's primary strength.

> Is node reasonably efficient at doing this (it doesn't have to be the best
> solution in the world, just good enough)?

It depends on what your business logic consists of. For example,
numbers in JS are 64-bit doubles, so you're dealing with inexact
floating-point arithmetic all the time. Which is okay for most use
cases but not, say, when you are processing monetary transactions.

> Are there pitfalls to consider with having an always blocked event loop?

Things like process.nextTick() and setTimeout() won't work. Neither
will any kind of asynchronous I/O but I guess you already realized
that.
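A minimal sketch of what this means in practice: callbacks queued with `process.nextTick()` or `setTimeout()` are not lost, but they cannot run until the synchronous code currently executing returns, so in a loop that never yields they never run at all. The busy-wait below is a stand-in for blocking business logic.

```javascript
const events = [];

// Both callbacks are queued before the blocking work starts.
setTimeout(() => events.push("timer"), 0);
process.nextTick(() => events.push("nextTick"));

// Busy-wait for ~100 ms to simulate blocking, CPU-bound business logic.
const start = Date.now();
while (Date.now() - start < 100) {
  // spin
}
events.push("blocking work done");

// Only after the synchronous code above finishes do the queued callbacks get their turn.
setTimeout(() => console.log(events), 10);
// [ 'blocking work done', 'nextTick', 'timer' ]
```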

> Are there any issues with using lots of ram on a server (30gb+) in a single
> node?

Not really. However, the JS heap is limited to slightly less than 2
GB. Buffers and typed arrays live mostly outside the JS heap though,
so it's still possible to process large quantities of data.
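A minimal sketch of the heap/off-heap distinction, assuming a Node version whose `process.memoryUsage()` reports `arrayBuffers` (added in v13.9 and v12.17, well after this thread). The 2 GB figure above is the 2013 limit; on later versions the default differs and the old-generation heap can be resized with `--max-old-space-size`.

```javascript
// Compare JS-heap usage with off-heap ArrayBuffer usage around one large allocation.
const before = process.memoryUsage();

const count = 5e6;                    // 5 million doubles
const data = new Float64Array(count); // ~40 MB backing store, allocated off the JS heap
for (let i = 0; i < count; i++) data[i] = i * 0.5;

const after = process.memoryUsage();
const mb = (n) => (n / 1024 / 1024).toFixed(1) + " MB";

console.log("heapUsed delta:    ", mb(after.heapUsed - before.heapUsed));         // small
console.log("arrayBuffers delta:", mb(after.arrayBuffers - before.arrayBuffers)); // roughly the backing store size
console.log("last element:", data[count - 1]); // 2499999.5
```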

Mark Hahn

May 24, 2013, 2:41:49 PM
to nod...@googlegroups.com
> Which is okay for most use cases but not, say, when you are processing monetary transactions.

This is somewhat of a myth.  People are afraid of floating point numbers because they are somehow "imprecise".  In fact, up to around 50 bits they are as precise as integers.  I've used them for money many times.  It is as simple as keeping the units in pennies.
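For reference, the exact limit behind "around 50 bits" is the 53-bit significand of an IEEE 754 double. A minimal sketch using `Number.MAX_SAFE_INTEGER` and `Number.isSafeInteger`, which were added in ES2015 and so postdate this thread:

```javascript
// Every integer with magnitude up to 2**53 - 1 is represented exactly.
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
console.log(Number.MAX_SAFE_INTEGER);                 // 9007199254740991

// Integer arithmetic on amounts stored as cents is exact inside that range...
const cents = 123456789012; // about $1.23 billion, expressed in cents
console.log(cents + 1 - 1 === cents); // true

// ...but past 2**53, adjacent integers start to collide.
console.log(2 ** 53 === 2 ** 53 + 1);       // true
console.log(Number.isSafeInteger(2 ** 53)); // false
```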

Matt

May 24, 2013, 3:50:16 PM
to nod...@googlegroups.com

On Fri, May 24, 2013 at 2:41 PM, Mark Hahn <ma...@hahnca.com> wrote:
>> Which is okay for most use cases but not, say, when you are processing monetary transactions.

> This is somewhat of a myth.  People are afraid of floating point numbers because they are somehow "imprecise".  In fact, up to around 50 bits they are as precise as integers.  I've used them for money many times.  It is as simple as keeping the units in pennies.

It's not a myth at all if you're actually using floating point numbers though. Try this in node: 1.03 - 0.42.

If you're keeping the units in pennies then you're not using floating point anyway.
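The subtraction above, done both ways in node: 1.03 and 0.42 have no exact binary representation, so the floating-point result is off in the last digit, while the same amounts held as integer pennies subtract exactly.

```javascript
const asDollars = 1.03 - 0.42; // binary floating point: neither operand is exact
const asCents = 103 - 42;      // integer pennies: exact

console.log(asDollars);          // 0.6100000000000001
console.log(asDollars === 0.61); // false
console.log(asCents);            // 61
console.log((asCents / 100).toFixed(2)); // 0.61 (convert to dollars only for display)
```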

Matt.

Mark Hahn

May 24, 2013, 4:14:13 PM
to nod...@googlegroups.com
> Try this in node: 1.03 - 0.42.

It gives the correct answer after rounding.  Also, integers certainly can't do that.  What is your point?

> If you're keeping the units in pennies then you're not using floating point anyway.

Right. The floating point in js provides integer support.  For money use floating point as integers.

You can do anything in js you can do with integers up to 50 bits of precision.  JS is in no way inferior for money apps which is what you claimed.
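In practice, "floating point as integers" usually means converting to integer cents once, at the input boundary, and checking that values stay in the safe-integer range. A minimal sketch; `toCents`, `addCents` and `formatCents` are hypothetical helper names, and rounding with `Math.round` at the boundary is one choice among several (it rounds halves toward +Infinity, which may not match a given accounting rule).

```javascript
// Convert a decimal dollar amount to integer cents once, at the input boundary.
// Math.round absorbs the tiny representation error in inputs like 1.03.
function toCents(dollars) {
  const cents = Math.round(dollars * 100);
  if (!Number.isSafeInteger(cents)) throw new RangeError("amount out of safe range");
  return cents;
}

// Internal arithmetic stays in integers; guard against leaving the exact range.
function addCents(a, b) {
  const sum = a + b;
  if (!Number.isSafeInteger(sum)) throw new RangeError("sum out of safe range");
  return sum;
}

// Format only for output.
function formatCents(cents) {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  return `${sign}$${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, "0")}`;
}

const total = addCents(toCents(1.03), -toCents(0.42));
console.log(total);              // 61
console.log(formatCents(total)); // $0.61
```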



Matt

May 24, 2013, 4:30:04 PM
to nod...@googlegroups.com
On Fri, May 24, 2013 at 4:14 PM, Mark Hahn <ma...@hahnca.com> wrote:
>> Try this in node: 1.03 - 0.42.

> It gives the correct answer after rounding.  Also, integers certainly can't do that.  What is your point?

My point was simply that saying that floating point numbers are "as precise as integers up to 50 bits" is incorrect and misleading. Pocket change money values can result in incorrect calculations if using floating point.

>> If you're keeping the units in pennies then you're not using floating point anyway.

> Right. The floating point in js provides integer support.  For money use floating point as integers.

> You can do anything in js you can do with integers up to 50 bits of precision.  JS is in no way inferior for money apps which is what you claimed.

No, I'm not the OP; I never claimed that. Doing it as integers/pennies is the right thing to do, though.

Baz

May 24, 2013, 5:23:05 PM
to nod...@googlegroups.com
Thanks for the responses gents. As a little background, the reason I'm considering node for this is because the datastore in question is Firebase - a real-time document store with a node sdk. It has a REST api too, so I don't have to use node, but it'd be nice to plug in to the real-time features.

I'm still getting the feeling that there is very little negative about doing this, and that mostly the negative feelings come from the fact that it's not the awesomest way of using node - rather than it actually being "bad". I'm not trying to compare awesome node to less awesome node, I'm trying to compare less awesome node to Java/Python/.net for a specific use-case.

Prajwal, you said "use something lower level (thus faster) like JVM (Scala/Java) for this". Is that proven? Is Java faster at looping, and concatenating and adding numbers and such? The only benchmarks I see are web-related, so I just have no idea. Good point on the limited computation libraries tho - I don't think I need anything fancy but I should definitely make sure the libs I need are available in node before getting started.

I've read some confusing stuff on memory, probably because things have changed quickly recently in V8 and node in that regard. It seems there shouldn't be a problem though because, as Ben mentioned, while the "JS heap is limited to slightly less than 2 GB, buffers and typed arrays live mostly outside the JS heap.". To be sure, what's an example of how I could run out of heap if I had, say, a 10GB array (outside of heap) that I'm processing?

In terms of concrete "problems", the main (only) one I'm seeing is the added complexity of using the full resources of a machine without threads: implementing something like Persona, as Prajwal mentioned (which looks very interesting, thanks!), seems harder than threading.
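On the "full machine without threads" point: Node has since gained a built-in `worker_threads` module (added in v10.5, stable from v12, well after this thread). A minimal sketch that splits a CPU-bound job across roughly one thread per core; the sum-of-squares workload is a hypothetical stand-in for real business logic, and each slice is copied to its worker (structured clone), which matters for very large datasets.

```javascript
const { Worker } = require("worker_threads");
const os = require("os");

// Code each worker thread runs: a stand-in CPU-heavy computation over its slice.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let total = 0;
  for (const x of workerData) total += x * x;
  parentPort.postMessage(total);
`;

// Start one worker on one slice and resolve with the number it posts back.
function runInWorker(slice) {
  return new Promise((resolve, reject) => {
    const w = new Worker(workerSource, { eval: true, workerData: slice });
    w.once("message", resolve);
    w.once("error", reject);
  });
}

// Split the data into at most one chunk per CPU and combine the partial results.
async function parallelSumOfSquares(data) {
  const n = Math.min(os.cpus().length, data.length) || 1;
  const chunk = Math.ceil(data.length / n);
  const jobs = [];
  for (let i = 0; i < data.length; i += chunk) {
    jobs.push(runInWorker(data.slice(i, i + chunk)));
  }
  const partials = await Promise.all(jobs);
  return partials.reduce((a, b) => a + b, 0);
}

parallelSumOfSquares([1, 2, 3, 4, 5, 6, 7, 8]).then((s) => console.log(s)); // 204
```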

Pedro you say "Node is great for doing I/O and not much else", but this is exactly what I'm trying to get to the bottom of. I know this is the general drumbeat and I'm trying to find tangible reasons why.

Best,
Baz




Matt

May 24, 2013, 6:01:05 PM
to nod...@googlegroups.com
On Fri, May 24, 2013 at 5:23 PM, Baz <b...@thinkloop.com> wrote:
> Prajwal, you said "use something lower level (thus faster) like JVM (Scala/Java) for this". Is that proven? Is Java faster at looping, and concatenating and adding numbers and such? The only benchmarks I see are web-related, so I just have no idea. Good point on the limited computation libraries tho - I don't think I need anything fancy but I should definitely make sure the libs I need are available in node before getting started.

I think Java is well proven to be faster at this point. But it's a lot more complicated writing Java code.
 
> I've read some confusing stuff on memory, probably because things have changed quickly recently in V8 and node in that regard. It seems there shouldn't be a problem though because, as Ben mentioned, while the "JS heap is limited to slightly less than 2 GB, buffers and typed arrays live mostly outside the JS heap.". To be sure, what's an example of how I could run out of heap if I had, say, a 10GB array (outside of heap) that I'm processing?

A typical example is if you started filling an object up with millions of keys. The garbage collector is going to have a bad hair day if you do that.
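One common way to sidestep that (a general technique, not something proposed in this thread) is to keep bulk numeric records in typed arrays, whose storage lives outside the JS heap, instead of millions of small objects or object keys that the garbage collector has to trace. A minimal sketch with a hypothetical `RecordTable` holding an `id` and an `amount` per record:

```javascript
// Struct-of-arrays layout: one typed array per field instead of one JS object per record.
class RecordTable {
  constructor(capacity) {
    this.ids = new Uint32Array(capacity);      // record ids
    this.amounts = new Float64Array(capacity); // record amounts
    this.length = 0;
  }

  push(id, amount) {
    if (this.length === this.ids.length) throw new RangeError("table is full");
    this.ids[this.length] = id;
    this.amounts[this.length] = amount;
    this.length += 1;
  }

  // Sum amounts for records whose id satisfies a predicate, with no per-record objects.
  sumWhere(predicate) {
    let total = 0;
    for (let i = 0; i < this.length; i++) {
      if (predicate(this.ids[i])) total += this.amounts[i];
    }
    return total;
  }
}

const table = new RecordTable(1e6);
for (let i = 0; i < 1e6; i++) table.push(i, 2);
console.log(table.sumWhere((id) => id % 2 === 0)); // 1000000
```

The trade-off is a fixed capacity and numeric-only fields; the GC sees two array objects instead of a million records.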
 
> Pedro you say "Node is great for doing I/O and not much else", but this is exactly what I'm trying to get to the bottom of. I know this is the general drumbeat and I'm trying to find tangible reasons why.

I don't find that to be entirely true. Though it really comes down to a question of JS/V8 vs other options, not really Node. Sounds like you won't be doing much I/O so Node won't really come into it - it's more V8.


// ravi

May 24, 2013, 6:45:08 PM
to nod...@googlegroups.com
On May 24, 2013, at 5:23 PM, Baz <b...@thinkloop.com> wrote:
> I'm still getting the feeling that there is very little negative about doing this, and that mostly the negative feelings come from the fact that it's not the awesomest way of using node - rather than it actually being "bad". I'm not trying to compare awesome node to less awesome node, I'm trying to compare less awesome node to Java/Python/.net for a specific use-case.


There are a lot of performance measurements and benchmarks out there, these days, that pit these languages (and the underlying frameworks) against each other. Given that you wrote (IIRC) that you want to churn through a lot of data, I assume performance is important to you. I haven’t seen any benchmark where Python or other interpreted (or byte-interpreted) languages come in better than JavaScript running on V8. Recently someone posted his measurement of JavaScript parseInt() which fared poorly against a hand-coded version (again in JS), so you may have to profile your code and watch for such quirks if they exist.
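A rough way to check a quirk like that on your own Node version is to time the built-in against a hand-rolled version. `parseDigits` and `time` are hypothetical helpers; micro-benchmark numbers vary by V8 version and are easy to skew, so treat the output as a hint about where to profile, not a verdict.

```javascript
// Hypothetical hand-rolled parser for non-negative decimal integer strings.
function parseDigits(s) {
  if (s.length === 0) return NaN;
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const d = s.charCodeAt(i) - 48; // '0' has char code 48
    if (d < 0 || d > 9) return NaN;
    n = n * 10 + d;
  }
  return n;
}

// Time `iterations` calls of fn; `sink` keeps the results live so the work isn't optimized away.
function time(label, fn, iterations) {
  const start = process.hrtime.bigint();
  let sink = 0;
  for (let i = 0; i < iterations; i++) sink += fn("123456");
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms (sink=${sink})`);
  return ms;
}

time("parseInt", (s) => parseInt(s, 10), 1e6);
time("parseDigits", parseDigits, 1e6);
```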

Apart from that, and considerations of personal preferences w.r.t coding style, I do not see anything that puts a general purpose high-level language (JS) with a robust engine (v8) and a progressively strong framework (node) — not to forget a lively community — at a disadvantage relative to others.

2 cents,

—ravi

Baz

May 25, 2013, 1:44:23 AM
to nod...@googlegroups.com
Thanks a lot Matt and Ravi, very helpful.


Matt

May 25, 2013, 10:39:53 AM
to nod...@googlegroups.com
On Fri, May 24, 2013 at 6:45 PM, // ravi <ravi-...@g8o.net> wrote:
> There are a lot of performance measurements and benchmarks out there, these days, that pit these languages (and the underlying frameworks) against each other. Given that you wrote (IIRC) that you want to churn through a lot of data, I assume performance is important to you. I haven’t seen any benchmark where Python or other interpreted (or byte-interpreted) languages come in better than JavaScript running on V8.

Lua. But then it's a bit of a weird language. 

Alex Kocharin

May 25, 2013, 11:11:33 AM
to nod...@googlegroups.com

What about rounding? Do you round all values after every operation? How do you do that?

I mean, if there are 10 dollars to be split evenly between three people, one of them must receive $3.3334 and the other two must receive $3.3333 (all banks I know of work with 1/100 of a cent). Otherwise the money will not add up, and the manager will be mad. If you're working with integers, it comes almost naturally because you know a priori you can't really divide evenly. But with floats it could be a different story.
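Working in integer units makes that allocation explicit. A minimal sketch, assuming amounts are held as integers in units of 1/100 of a cent as in the example above (so $10 is 100000 units); `splitEvenly` is a hypothetical helper that gives the leftover units to the first shares so the parts always add back up to the total.

```javascript
// Split a non-negative integer amount into `parts` shares that differ by at most
// one unit and always sum exactly to `total`.
function splitEvenly(total, parts) {
  if (!Number.isSafeInteger(total) || total < 0) {
    throw new RangeError("total must be a non-negative safe integer");
  }
  if (!Number.isInteger(parts) || parts <= 0) {
    throw new RangeError("parts must be a positive integer");
  }
  const base = Math.floor(total / parts);
  const remainder = total - base * parts; // units left over after the even split
  const shares = [];
  for (let i = 0; i < parts; i++) {
    shares.push(base + (i < remainder ? 1 : 0)); // first `remainder` shares get one extra unit
  }
  return shares;
}

const UNITS_PER_DOLLAR = 10000; // 1/100 of a cent
const shares = splitEvenly(10 * UNITS_PER_DOLLAR, 3);
console.log(shares); // [ 33334, 33333, 33333 ]
console.log(shares.map((s) => "$" + (s / UNITS_PER_DOLLAR).toFixed(4))); // [ '$3.3334', '$3.3333', '$3.3333' ]
```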