Hello, I have not seen the following points addressed in any of the online material I have read to date on Node, and so I hope to be enlightened by the very smart and knowledgeable folks I presume are reading this.
1. Since I/O happens asynchronously in worker threads, it is possible for a single Node process to quickly/efficiently accept 1000s of incoming requests compared to something like Apache. But surely, the outgoing responses for each of those requests will take their own time, won't they?
For example, if an isolated, primarily I/O-bound request takes, say, 3 seconds to get serviced (with no other load on the system), then if concurrently hit with 5000 such requests, won't Node take a lot of time to service them all, fully?
If this 3-second task happens to involve exclusive access to the disk, then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of waiting to see the response to the last request come out of the Node app. In such scenarios, would it be correct to claim that a single-process Node configuration can 'handle' 1000s of requests per second (granted, a threaded server like Apache would do a lot worse with 5000 threads) when all that Node may be doing is simply putting the requests 'on hold' until they get fully serviced, instead of rejecting them outright on arrival? I'm asking this because as I read up on Node, I often hear how Node can address the C10K problem without any mention of the specific application setups or application types that Node can or cannot handle... other than the broad CPU-bound vs. I/O-bound classification.
2. What about the context-switching overhead of the workers in the worker-thread pool? If C10K requests hit a Node-based application, won't the workers in the worker-thread pool end up context-switching just as much as the user threads in the thread pool of a regular threaded server (like Apache)? After all, all that would have happened in Node's event thread is quick request parsing and routing, with the remainder (or the bulk) of the processing still happening in a worker thread. That is, does it really matter (as far as minimizing thread context-switching is concerned) whether a request/response is handled from start to finish in a single thread (in the manner of a threaded server like Apache), or whether it happens transparently in a Node-managed worker thread with only the minimal work of request parsing and routing subtracted from it? Ignore here the simpler, single-threaded user model of coding that comes with an evented server like Node.
3. If the RDBMS instance (say, MySQL) is co-located on the Node server box, then would it be correct to classify a database CRUD operation as a pure I/O task? My understanding is that a CRUD operation on a large relational database will typically involve heavy-duty CPU processing as well as I/O, not just I/O. However, the online material I've been reading seems to label a 'database call' as merely an 'I/O call', which supposedly makes your application I/O-bound if that is the only thing your application is (mostly) doing.
4. A final question (related to the above themes) that may require knowledge of modern hardware and OSs that I am not fully up to date on. Can I/O (on a given I/O device) be done in parallel, or at least concurrently if not in parallel, and THUS scale proportionally with user count?
Example: Suppose I have written a file-serving Node app that serves files from the local hard disk, making it a strongly I/O-bound app.
On Wed, Mar 23, 2016 at 12:25 PM, Harry Simons <simon...@gmail.com> wrote:
> For example, if an isolated and primarily an I/O bound request takes, say, 3 seconds to get serviced (with no other load on the system), then if concurrently hit with 5000 such requests, won't Node take a lot of time to service them all, fully?

What is taking 3 seconds? The answer, as with all technology is "it depends". If you block the CPU for 3 seconds then yes of course, your app will suck. If you're just sitting waiting on other I/O (e.g. a network request) for 3 seconds, then lots can happen in the gaps.
> If this 3-second task happens to involve exclusive access to the disk, then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of wait to see the response for the last request coming out of the Node app. In such scenarios, would it be correct to claim that a single-process Node configuration can 'handle' 1000s of requests per second (granted, a thread-server like Apache would do a lot worse with 5000 threads) when all that Node may be doing is simply putting the requests 'on hold' till they get fully serviced instead of rejecting them outrightly on initial their arrival itself? I'm asking this because as I'm reading up on Node, I'm often hearing how Node can address the C10K problem without any co-mention of any specific application setups or any specific application types that Node can or cannot handle... other than the broad, CPU- vs I/O-bound type of application classification.

I think you've just generally misread a lot of stuff about this, honestly. Disk I/O is "complicated" in node (because async I/O to disk is complicated in operating systems, it's not Node's fault). But not many web apps use the "fs" module on their requests directly. Node uses a thread pool for the filesystem requests on Unix-like OSs, so there are limits there, but it's very rare to see that as an issue for developing node apps at scale. When you talk to any of the DB modules you're using network I/O in Node, not filesystem I/O.
> 2. What about the context switching overhead of the workers in the worker-thread pool? If C10K requests hit a Node-based application, won't the workers in the worker-thread pool end up context-switching just as much as the user threads in the thread pool of a regular, threaded-server (like Apache)...?

No, because most of Node isn't threaded. Only a few parts of Node use a thread pool. Any network I/O uses native OS async methods (epoll, kqueue, and whatever Windows uses these days). So there's zero context switching overhead - you're entirely in userspace.
On Thursday, March 24, 2016 at 7:51:35 AM UTC+5:30, Matt Sergeant wrote:

> On Wed, Mar 23, 2016 at 12:25 PM, Harry Simons <simon...@gmail.com> wrote:
> > For example, if an isolated and primarily an I/O bound request takes, say, 3 seconds to get serviced (with no other load on the system), then if concurrently hit with 5000 such requests, won't Node take a lot of time to service them all, fully?
>
> What is taking 3 seconds? The answer, as with all technology is "it depends". If you block the CPU for 3 seconds then yes of course, your app will suck. If you're just sitting waiting on other I/O (e.g. a network request) for 3 seconds, then lots can happen in the gaps.
A CRUD operation against a large database could take well over 3 seconds. I was assuming the DB server was co-located with the Node server (on the same physical box) in my original question, and was thus taking it to involve CPU + I/O processing rather than just network I/O; the latter would be the case if the DB were on another physical server on the network (as in your response). Apparently, that's a bad idea even with an evented platform such as Node. Ben's response too assumes a remote DB server resulting in a pure I/O wait on the Node server. I get it now.
> > If this 3-second task happens to involve exclusive access to the disk, then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of wait to see the response for the last request coming out of the Node app. In such scenarios, would it be correct to claim that a single-process Node configuration can 'handle' 1000s of requests per second (granted, a thread-server like Apache would do a lot worse with 5000 threads) when all that Node may be doing is simply putting the requests 'on hold' till they get fully serviced instead of rejecting them outrightly on initial their arrival itself? I'm asking this because as I'm reading up on Node, I'm often hearing how Node can address the C10K problem without any co-mention of any specific application setups or any specific application types that Node can or cannot handle... other than the broad, CPU- vs I/O-bound type of application classification.
>
> I think you've just generally misread a lot of stuff about this, honestly. Disk I/O is "complicated" in node (because async I/O to disk is complicated in operating systems, it's not Node's fault). But not many web apps use the "fs" module on their requests directly. Node uses a thread pool for the filesystem requests on Unix-like OSs, so there are limits there, but it's very rare to see that as an issue for developing node apps at scale. When you talk to any of the DB modules you're using network I/O in Node, not filesystem I/O.
I took up the specific case of the DB server co-located on the Node server. Apparently, even in a Node-based application this would be a bad idea - is what I'm hearing. Which is fine. I get it now.
And adding to that: most Node apps are either clustered or deployed as multiple instances (scaled vertically and horizontally) behind a load balancer. So there are a lot more "gaps" that Node can take advantage of in real-world scenarios.
--
You received this message because you are subscribed to the Google Groups "nodejs" group.