A few newbie questions on Node.js


Harry Simons

Mar 23, 2016, 1:25:16 PM
to nodejs
Hello,


I have not seen the following points addressed in any of the online material I have read to date on Node, so I hope to be enlightened by the very smart and knowledgeable folks I presume will be reading this.


1. Since I/O happens asynchronously in worker threads, it is possible for a single Node process to quickly/efficiently accept 1000s of incoming requests compared to something like Apache. But, surely, the outgoing responses for each of those requests will take their own time, won't they? For example, if an isolated and primarily I/O-bound request takes, say, 3 seconds to get serviced (with no other load on the system), then if concurrently hit with 5000 such requests, won't Node take a long time to service them all, fully? If this 3-second task happens to involve exclusive access to the disk, then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of waiting to see the response for the last request come out of the Node app. In such scenarios, would it be correct to claim that a single-process Node configuration can 'handle' 1000s of requests per second (granted, a thread-server like Apache would do a lot worse with 5000 threads) when all that Node may be doing is simply putting the requests 'on hold' till they get fully serviced, instead of rejecting them outright on arrival? I'm asking this because as I read up on Node, I often hear how Node can address the C10K problem, without any mention of the specific application setups or the specific application types that Node can or cannot handle... other than the broad CPU- vs I/O-bound classification of applications.


2. What about the context-switching overhead of the workers in the worker-thread pool? If C10K requests hit a Node-based application, won't the workers in the worker-thread pool end up context-switching just as much as the user threads in the thread pool of a regular, threaded server (like Apache)? After all, all that would have happened on Node's event thread is quick request parsing and routing, with the remainder (i.e., the bulk) of the processing still happening on a worker thread. That is, does it really matter (as far as minimizing thread context-switching is concerned) whether a request/response is handled from start to finish in a single thread (in the manner of a threaded server like Apache), or whether it happens transparently in a Node-managed worker thread with only the minimal work of request parsing and routing subtracted from it? Ignore here the simpler, single-threaded user model of coding that comes with an evented server like Node.


3. If the RDBMS instance (say, MySQL) is co-located on the Node server box, then would it be correct to classify a database CRUD operation as a pure I/O task? My understanding is that a CRUD operation on a large, relational database will typically involve heavy-duty CPU processing as well as I/O, and not just I/O. However, the online material I've been reading seems to label a 'database call' as merely an 'I/O call', which supposedly makes your application an I/O-bound application if that is (mostly) the only thing it is doing.


4. A final question (related to the above themes) that may require knowledge of modern hardware and OSes, which I am not fully up to date on. Can I/O (on a given I/O device) be done in parallel, or at least concurrently if not in parallel, and THUS scale proportionally with user count? Example: Suppose I have written a file-serving Node app that serves files from the local hard disk, making it strongly I/O-bound. If hit with N (~ C10K) concurrent file-serving requests, what I think would happen is this:
  • The event loop would spawn N async file-read requests, and go idle. (Alternatively, Node would have pre-spawned all its workers in the pool on process startup.)
  • The N async file-read requests would each get assigned to the N worker threads in the worker-thread pool. If N > pool size, then the balance would be made to wait in some sort of internal queue until a worker frees up.
  • Each file-read request would run concurrently at best, or sequentially at worst - but definitely NOT N-way in parallel. That is, even a RAID configuration would be able to service merely a handful of file reads in parallel, and certainly not all N at once.
  • This would result in a large total wait time before the last file-serving request is served fully.
So, if all these 4 points are true, how could we really say that a single Node-process-based application is good (because it scales well) for I/O-bound applications? Can the mere ability to receive a large number of incoming requests and keep them all on hold indefinitely while their I/O fully completes (versus rejecting them outright on arrival) be called 'servicing the requests'? Can such an application be seen as scaling well with respect to user count?


Many thanks in advance. 

Regards,
/HS

Matt

Mar 23, 2016, 10:21:35 PM
to nod...@googlegroups.com
On Wed, Mar 23, 2016 at 12:25 PM, Harry Simons <simon...@gmail.com> wrote:
Hello,

I have not seen the following points addressed in any of the online material I have read to date on Node, so I hope to be enlightened by the very smart and knowledgeable folks I presume will be reading this.

They probably have been, but I'll try and address them here for the benefit of the community.
 
1. Since I/O happens asynchronously in worker threads, it is possible for a single Node process to quickly/efficiently accept 1000s of incoming requests compared to something like Apache. But, surely, the outgoing responses for each of those requests will take their own time, won't they?

Of course. There's no free lunch.
 
For example, if an isolated and primarily I/O-bound request takes, say, 3 seconds to get serviced (with no other load on the system), then if concurrently hit with 5000 such requests, won't Node take a long time to service them all, fully?

What is taking 3 seconds? The answer, as with all technology, is "it depends". If you block the CPU for 3 seconds then yes, of course, your app will suck. If you're just sitting waiting on other I/O (e.g. a network request) for 3 seconds, then lots can happen in the gaps.
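To make the difference concrete, here's a minimal sketch (the port and the 3-second figure are arbitrary). Hit /block and every other client stalls for the duration; hit any other path and the event loop keeps serving everyone in the gaps:

const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/block') {
    // CPU-bound: busy-waits ~3s on the event loop, so ALL other
    // requests stall until it finishes.
    const end = Date.now() + 3000;
    while (Date.now() < end) {}
    res.end('blocked everyone\n');
  } else {
    // I/O-style wait: the 3s delay is just a pending timer, so the
    // event loop stays free to service other requests meanwhile.
    setTimeout(() => res.end('waited without blocking\n'), 3000);
  }
}).listen(8000);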
 
If this 3-second task happens to involve exclusive access to the disk, then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of waiting to see the response for the last request come out of the Node app. In such scenarios, would it be correct to claim that a single-process Node configuration can 'handle' 1000s of requests per second (granted, a thread-server like Apache would do a lot worse with 5000 threads) when all that Node may be doing is simply putting the requests 'on hold' till they get fully serviced, instead of rejecting them outright on arrival? I'm asking this because as I read up on Node, I often hear how Node can address the C10K problem, without any mention of the specific application setups or the specific application types that Node can or cannot handle... other than the broad CPU- vs I/O-bound classification of applications.

I think you've just generally misread a lot of stuff about this, honestly. Disk I/O is "complicated" in node (because async I/O to disk is complicated in operating systems, it's not Node's fault). But not many web apps use the "fs" module on their requests directly. Node uses a thread pool for the filesystem requests on Unix-like OSs, so there are limits there, but it's very rare to see that as an issue for developing node apps at scale. When you talk to any of the DB modules you're using network I/O in Node, not filesystem I/O.
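To see where that thread-pool limit lives, here's a small sketch (the default pool size of 4 and the UV_THREADPOOL_SIZE environment override are libuv details):

const fs = require('fs');

// Eight reads dispatched at once, but libuv's pool has only 4 threads
// by default, so at most 4 run concurrently; the rest queue up inside
// libuv. Starting node with UV_THREADPOOL_SIZE=16 would widen the pool.
for (let i = 0; i < 8; i++) {
  const start = Date.now();
  fs.readFile(__filename, () => {
    // On slow disks or with big files, completions arrive in batches
    // of roughly (pool size) rather than all at once.
    console.log(`read ${i} finished after ${Date.now() - start}ms`);
  });
}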
 
2. What about the context-switching overhead of the workers in the worker-thread pool? If C10K requests hit a Node-based application, won't the workers in the worker-thread pool end up context-switching just as much as the user threads in the thread pool of a regular, threaded server (like Apache)?

No, because most of Node isn't threaded. Only a few parts of Node use a thread pool. Any network I/O uses native OS async methods (epoll, kqueue, and whatever Windows uses these days). So there's no thread context-switching overhead for network I/O - a single event-loop thread waits on all the connections at once.
 
After all, all that would have happened on Node's event thread is quick request parsing and routing, with the remainder (i.e., the bulk) of the processing still happening on a worker thread. That is, does it really matter (as far as minimizing thread context-switching is concerned) whether a request/response is handled from start to finish in a single thread (in the manner of a threaded server like Apache), or whether it happens transparently in a Node-managed worker thread with only the minimal work of request parsing and routing subtracted from it? Ignore here the simpler, single-threaded user model of coding that comes with an evented server like Node.

This is why you need to read up on the C10K material more - the overhead of constantly context-switching threads and moving between kernel and user space is too big, and it's a problem threaded servers suffer from. That's why even Apache offers the "event" MPM, why nginx is so much faster than Apache, and why WhatsApp wrote their system in Erlang (which uses an event loop like Node, but offers some very nice scaling tools on top of that which Node doesn't). There are reasons these things scale better, and it's because the OS sucks at managing data between large numbers of threads or processes (call them what you want).
 
3. If the RDBMS instance (say, MySQL) is co-located on the Node server box, then would it be correct to classify a database CRUD operation as a pure I/O task? My understanding is that a CRUD operation on a large, relational database will typically involve heavy-duty CPU processing as well as I/O, and not just I/O. However, the online material I've been reading seems to label a 'database call' as merely an 'I/O call', which supposedly makes your application an I/O-bound application if that is (mostly) the only thing it is doing.

Here you need to understand the performance difference between disk (even SSD) and CPU. It's several orders of magnitude. CPU processing can take too long, but don't code your software that way unless you can't help it. When you can't help it, make sure you use something (like a queueing system) that can deal with that while letting other stuff run.
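As a sketch of that last point - the file name cpu-worker.js and the message shape here are made up for illustration - one common pattern is to hand CPU-heavy jobs to a separate process so the event loop keeps running:

// parent.js
const fork = require('child_process').fork;
const worker = fork('./cpu-worker.js'); // hypothetical worker script

function crunch(payload) {
  return new Promise((resolve) => {
    worker.once('message', resolve); // result arrives asynchronously
    worker.send(payload);            // the event loop never blocks here
  });
}
// Simplistic: handles one job at a time; a real queueing system
// (or a pool of workers) would juggle many.
crunch({ job: 42 }).then((res) => console.log(res.total));

// cpu-worker.js - runs in its own process, so it can spin the CPU freely
process.on('message', (payload) => {
  let total = 0;
  for (let i = 0; i < 1e8; i++) total += i; // stand-in for heavy work
  process.send({ payload, total });
});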
 
4. A final question (related to the above themes) that may require knowledge of modern hardware and OSes, which I am not fully up to date on. Can I/O (on a given I/O device) be done in parallel, or at least concurrently if not in parallel, and THUS scale proportionally with user count?

That depends. A single disk? No, of course not. It has a fixed rotation speed. An array of disks? Maybe. A disk with a cache? Possibly. See how complex this question gets?
 
Example: Suppose I have written a file-serving Node app that serves files from the local hard disk, making it strongly I/O-bound.

Assuming no caching. But why would you do that if you want to serve thousands of clients?

I think that pretty much answers the rest of your question, so I didn't add further answers.

Matt.

Ben Noordhuis

Mar 23, 2016, 10:22:05 PM
to nod...@googlegroups.com
Hello Harry, replies inline.
Node.js uses non-blocking I/O when it can and only falls back to a
thread pool when it must. Sockets, pipes, etc. are handled in
asynchronous, non-blocking fashion using native system APIs but e.g.
file I/O is offloaded to a thread pool.
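A concrete example of that split is DNS: dns.lookup() wraps the system's blocking getaddrinfo() and therefore runs on the thread pool, while dns.resolve4() speaks the DNS protocol over the network (via c-ares) and never touches the pool:

const dns = require('dns');

// Blocking system call under the hood -> serviced by the thread pool.
dns.lookup('nodejs.org', (err, address) => {
  if (!err) console.log('lookup (thread pool):', address);
});

// Plain network I/O via c-ares -> non-blocking, no thread pool.
dns.resolve4('nodejs.org', (err, addresses) => {
  if (!err) console.log('resolve4 (non-blocking):', addresses);
});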

> 2. What about the context-switching overhead of the workers in the
> worker-thread pool? If C10K requests hit a Node-based application, won't the
> workers in the worker-thread pool end up context-switching just as much as
> the user threads in the thread pool of a regular, threaded server (like
> Apache)? After all, all that would have happened on Node's event thread is
> quick request parsing and routing, with the remainder (i.e., the bulk) of
> the processing still happening on a worker thread. That is, does it really
> matter (as far as minimizing thread context-switching is concerned) whether
> a request/response is handled from start to finish in a single thread (in
> the manner of a threaded server like Apache), or whether it happens
> transparently in a Node-managed worker thread with only the minimal work of
> request parsing and routing subtracted from it? Ignore here the simpler,
> single-threaded user model of coding that comes with an evented server like
> Node.

See above. Depending on the application, you may not hit the thread
pool much or at all.

> 3. If the RDBMS instance (say, MySQL) is co-located on the Node server box,
> then would it be correct to classify a database CRUD operation as a pure I/O
> task? My understanding is that a CRUD operation on a large, relational
> database will typically involve heavy-duty CPU processing as well as I/O,
> and not just I/O. However, the online material I've been reading seems to
> label a 'database call' as merely an 'I/O call', which supposedly makes your
> application an I/O-bound application if that is (mostly) the only thing it
> is doing.

Communication with the database normally takes place over a TCP or
UNIX socket, so as far as node.js is concerned, it's not much
different from any other network connection. The heavy-duty number
crunching takes place in a different process, the RDBMS.
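To make that concrete, here's a sketch using the popular mysql driver from npm (the connection details are placeholders). From Node's side the query below is nothing but bytes over a socket; mysqld does the crunching:

const mysql = require('mysql');

const conn = mysql.createConnection({
  host: 'localhost',   // same box or remote - still a socket to Node
  user: 'app',
  password: 'secret',  // placeholder credentials
  database: 'shop',
});

conn.query('SELECT COUNT(*) AS n FROM orders', (err, rows) => {
  if (err) throw err;
  console.log(rows[0].n); // the heavy lifting happened inside mysqld
  conn.end();
});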

> 4. A final question (related to the above themes) that may require knowledge
> of modern hardware and OSes, which I am not fully up to date on. Can I/O (on
> a given I/O device) be done in parallel, or at least concurrently if not in
> parallel, and THUS scale proportionally with user count? Example: Suppose I
> have written a file-serving Node app that serves files from the local hard
> disk, making it strongly I/O-bound. If hit with N (~ C10K) concurrent
> file-serving requests, what I think would happen is this:
>
> The event loop would spawn N async file-read requests, and go idle.
> (Alternatively, Node would have pre-spawned all its workers in the pool on
> process startup.)
> The N async file-read requests would each get assigned to the N worker
> threads in the worker-thread pool. If N > pool size, then the balance would
> be made to wait in some sort of internal queue until a worker frees up.
> Each file-read request would run concurrently at best, or sequentially at
> worst - but definitely NOT N-way in parallel. That is, even a RAID
> configuration would be able to service merely a handful of file reads in
> parallel, and certainly not all N at once.
> This would result in a large total wait time before the last file-serving
> request is served fully.

That's by and large correct.

> So, if all these 4 points are true, how could we really say that a single
> Node-process-based application is good (because it scales well) for
> I/O-bound applications? Can the mere ability to receive a large number of
> incoming requests and keep them all on hold indefinitely while their I/O
> fully completes (versus rejecting them outright on arrival) be called
> 'servicing the requests'? Can such an application be seen as scaling well
> with respect to user count?

For many applications the answer is 'yes', because they can break up the request into smaller parts that they can service independently.

Say your application has to a) read a file from disk, b) query a database, and c) consult a web service before it can send a reply. In the traditional web server model, that takes a+b+c time, whereas with the asynchronous model it's max(a,b,c).

max(a,b,c) <= a+b+c, so in the worst case it performs the same, but in the common case it's much faster.
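Sketched in code (the file read is real - the script reads itself - while the database and web-service calls are faked with timers):

const fs = require('fs').promises;

const readFromDisk = () => fs.readFile(__filename, 'utf8');      // a
const queryDb      = () => new Promise(r => setTimeout(r, 50));  // b (fake)
const callService  = () => new Promise(r => setTimeout(r, 80));  // c (fake)

async function handleRequest() {
  const t0 = Date.now();
  // Run all three concurrently: total time ~ max(a, b, c).
  // Awaiting them one after another would cost ~ a + b + c instead.
  const [file, rows, json] = await Promise.all([
    readFromDisk(), queryDb(), callService(),
  ]);
  console.log(`replied in ${Date.now() - t0}ms`);
}

handleRequest();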

Does that answer your questions?

Harry Simons

Mar 24, 2016, 1:10:32 AM
to nodejs
> Does that answer your questions?

Thanks Ben, for taking the time to write. I did find your response (and Matt's too) both interesting and enlightening.



> > Node.js uses non-blocking I/O when it can and only falls back to a
> > thread pool when it must.

So, besides file-system I/O, what would be other examples of Node employing the worker thread pool? It seems knowing this would be one of the essentials of writing a scalable Node app.



> > 3. If the RDBMS instance (say, MySQL) is co-located on the Node server box,
> > then would it be correct

> Communication with the database normally takes place over a TCP or
> UNIX socket,

I see that you assume (or recommend) a specific setup for web apps... e.g. one in which the DBMS server is on a dedicated box of its own. This, incidentally, is typically the case with << C10K apps also. So, Node's evented nature won't magically obviate the need for a dedicated DBMS server. (Note that I've never really developed or deployed even a modest-scale web app, let alone a C10K-scalable app - hence all this newbie ignorance.)

Harry Simons

Mar 24, 2016, 1:43:50 AM
to nodejs
On Thursday, March 24, 2016 at 7:51:35 AM UTC+5:30, Matt Sergeant wrote:
On Wed, Mar 23, 2016 at 12:25 PM, Harry Simons <simon...@gmail.com> wrote:
 
For example, if an isolated and primarily I/O-bound request takes, say, 3 seconds to get serviced (with no other load on the system), then if concurrently hit with 5000 such requests, won't Node take a long time to service them all, fully?

What is taking 3 seconds? The answer, as with all technology, is "it depends". If you block the CPU for 3 seconds then yes, of course, your app will suck. If you're just sitting waiting on other I/O (e.g. a network request) for 3 seconds, then lots can happen in the gaps.

A CRUD operation against a large database could take well over 3 seconds. In my original question I was assuming the DB server to be co-located with the Node server (on the same physical box), and was thus taking the operation to involve CPU+I/O processing rather than just network I/O; the latter would be the case if the DB were on another physical server on the network (as in your response). Apparently, co-location is a bad idea even with an evented platform such as Node. Ben's response also assumes a remote DB server, resulting in a pure I/O wait on the Node server. I get it now.

 
If this 3-second task happens to involve exclusive access to the disk, then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of waiting to see the response for the last request come out of the Node app. In such scenarios, would it be correct to claim that a single-process Node configuration can 'handle' 1000s of requests per second (granted, a thread-server like Apache would do a lot worse with 5000 threads) when all that Node may be doing is simply putting the requests 'on hold' till they get fully serviced, instead of rejecting them outright on arrival? I'm asking this because as I read up on Node, I often hear how Node can address the C10K problem, without any mention of the specific application setups or the specific application types that Node can or cannot handle... other than the broad CPU- vs I/O-bound classification of applications.

I think you've just generally misread a lot of stuff about this, honestly. Disk I/O is "complicated" in node (because async I/O to disk is complicated in operating systems, it's not Node's fault). But not many web apps use the "fs" module on their requests directly. Node uses a thread pool for the filesystem requests on Unix-like OSs, so there are limits there, but it's very rare to see that as an issue for developing node apps at scale. When you talk to any of the DB modules you're using network I/O in Node, not filesystem I/O.

I took up the specific case of the DB server being co-located on the Node server. Apparently, even in a Node-based application this would be a bad idea - that's what I'm hearing, which is fine. I get it now.
 

2. What about the context-switching overhead of the workers in the worker-thread pool? If C10K requests hit a Node-based application, won't the workers in the worker-thread pool end up context-switching just as much as the user threads in the thread pool of a regular, threaded server (like Apache)?

No, because most of Node isn't threaded. Only a few parts of Node use a thread pool. Any network I/O uses native OS async methods (epoll, kqueue, and whatever Windows uses these days). So there's no thread context-switching overhead for network I/O - a single event-loop thread waits on all the connections at once.

For the sake of completeness, I'm curious to know which parts of Node ARE threaded. Filesystem I/O apparently is one; network I/O apparently is not.

Zlatko

Mar 24, 2016, 9:53:56 AM
to nodejs


On Thursday, March 24, 2016 at 6:43:50 AM UTC+1, Harry Simons wrote:
On Thursday, March 24, 2016 at 7:51:35 AM UTC+5:30, Matt Sergeant wrote:
On Wed, Mar 23, 2016 at 12:25 PM, Harry Simons <simon...@gmail.com> wrote:
 
For example, if an isolated and primarily I/O-bound request takes, say, 3 seconds to get serviced (with no other load on the system), then if concurrently hit with 5000 such requests, won't Node take a long time to service them all, fully?

What is taking 3 seconds? The answer, as with all technology, is "it depends". If you block the CPU for 3 seconds then yes, of course, your app will suck. If you're just sitting waiting on other I/O (e.g. a network request) for 3 seconds, then lots can happen in the gaps.

A CRUD operation against a large database could take well over 3 seconds. In my original question I was assuming the DB server to be co-located with the Node server (on the same physical box), and was thus taking the operation to involve CPU+I/O processing rather than just network I/O; the latter would be the case if the DB were on another physical server on the network (as in your response). Apparently, co-location is a bad idea even with an evented platform such as Node. Ben's response also assumes a remote DB server, resulting in a pure I/O wait on the Node server. I get it now.

If you have a CRUD operation that takes well over 3 seconds, then this RDBMS would definitely benefit from its own dedicated box. And if you're serving that same query to those 10,000 concurrent users, then your _app_ processing time is probably irrelevant, be it Node, Python, Java or good old PHP. But you're talking scalability and comparing to Apache, so let's compare: Node on its own - handles many requests. Apache - not so many. Database query duration - the same either way. That's where Node defends its claim to "insta-scalability".
 

 
If this 3-second task happens to involve exclusive access to the disk, then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of waiting to see the response for the last request come out of the Node app. In such scenarios, would it be correct to claim that a single-process Node configuration can 'handle' 1000s of requests per second (granted, a thread-server like Apache would do a lot worse with 5000 threads) when all that Node may be doing is simply putting the requests 'on hold' till they get fully serviced, instead of rejecting them outright on arrival? I'm asking this because as I read up on Node, I often hear how Node can address the C10K problem, without any mention of the specific application setups or the specific application types that Node can or cannot handle... other than the broad CPU- vs I/O-bound classification of applications.

I think you've just generally misread a lot of stuff about this, honestly. Disk I/O is "complicated" in node (because async I/O to disk is complicated in operating systems, it's not Node's fault). But not many web apps use the "fs" module on their requests directly. Node uses a thread pool for the filesystem requests on Unix-like OSs, so there are limits there, but it's very rare to see that as an issue for developing node apps at scale. When you talk to any of the DB modules you're using network I/O in Node, not filesystem I/O.

I took up the specific case of the DB server being co-located on the Node server. Apparently, even in a Node-based application this would be a bad idea - that's what I'm hearing, which is fine. I get it now.

It's not necessarily a bad idea, but it most likely is if you're serving 10,000 or more concurrent users. Say you have some simple, mostly-read database stored in memory - I'm thinking of Redis, which is commonly used. Node and this Redis instance can sit on the same server for those 10k requests - but we're not depending so much on Node here (provided the app was written reasonably well); we're depending on the server itself.
 
And that is, I believe, one of the huge benefits of Node. You basically lift any such concurrency limits from your runtime environment and your app and move them outside, to the underlying host OS. You can be fairly certain that if you have problems, it's not your app or Node that's the bottleneck - it's the database system, the disks, or something similar.
(Although it often _is_ your app. Well, for me at least, most of the scaling issues I've had were down to my own bad design.)

Will Hoover

Mar 24, 2016, 1:06:21 PM
to nodejs

And to add to that: most Node apps are either clustered or deployed as multiple instances (vertically and horizontally) behind a load balancer. So there are a lot more "gaps" that Node can take advantage of in real-world scenarios.
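For reference, a minimal sketch of that clustering with the built-in cluster module (the port is arbitrary):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // One worker per core, all sharing the same listening socket.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  cluster.on('exit', () => cluster.fork()); // replace crashed workers
} else {
  http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(8000);
}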

