horizontal scaling with NodeJS on same physical machine


Bijuv V

Dec 19, 2013, 11:50:03 AM
to nod...@googlegroups.com
Hi,

I have a web server with an application developed using Express. In the application we specify the port on which it should listen for requests, e.g. 3000.

If I spawn multiple node processes on the same machine, there will be a conflict on the ports. How can horizontal scaling be achieved (apart from the options below)?

The options that I'm aware of are:
a. Use the cluster feature of node. Everything is handled by node. Still experimental, AFAIK.
b. Use multiple VMs to deploy each instance.
c. Start node on different ports (3001-3008) and put a hardware LB in front to send requests to one of the node instances. I don't want to invest in a hardware LB.

I was watching the video from Ryan. http://www.youtube.com/watch?v=F6k8lTrAE2g

He mentions the limitation of node being single-threaded and also talks about using server file descriptors to build web servers. However, I could not find details of how the file descriptors should be configured so that requests go to one of the 8 instances on the machine.

Fabrizio Soppelsa

Dec 19, 2013, 12:19:23 PM
to nod...@googlegroups.com
On 12/19/2013 05:50 PM, Bijuv V wrote:
> If I spawn multiple node processes on the same machine, there will be
> a conflict on the ports. How can this be achieved (apart from the
> below options)?

How about going with Passenger Standalone with a number of instances?
However, I've never tried it in production for Node.

FS.

Matt

Dec 19, 2013, 12:26:12 PM
to nod...@googlegroups.com

On Thu, Dec 19, 2013 at 11:50 AM, Bijuv V <vvbij...@gmail.com> wrote:
> a. Use Cluster feature of node. Everything is handled by node. Still experimental AFAIK.

I think that while cluster is still marked experimental, it's built on top of child_process, which is marked Stable.

Most people go with cluster. It works well and is extremely reliable in my experience, as long as you're aware of the gotchas with graceful restarts.

Matt.

Luke Arduini

Dec 19, 2013, 12:28:53 PM
to nod...@googlegroups.com
The "experimental" stability index means almost nothing. It's fine. Plenty of people use it in production. 
--
--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en
 
---
You received this message because you are subscribed to the Google Groups "nodejs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nodejs+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Shubhra Kar

Dec 19, 2013, 1:06:26 PM
to nod...@googlegroups.com
Hi,

You can use StrongLoop's strong-cluster module to autoscale up or down and solve all these issues. If the app is running as a single process, more processes can be added to the cluster with a single command or through the StrongOps GUI. All processes will be aware of each other and part of the same application. strong-cluster also takes care of workload management and load balancing, using a master/worker configuration.

The application will listen on a single port, say hostname:3000. When requests come in, the master distributes them across the processes automatically; no process needs to be hit separately. For storing session information across the cluster, there is an additional module, Strong-Cluster-Store, which uses a Redis-type store to maintain cluster state.

Alex Kocharin

Dec 19, 2013, 1:40:14 PM
to nod...@googlegroups.com
 
Agreed, using cluster (either via the 'pm2' module or by writing your own small cluster manager) is the way to do it.
 
 

Bijuv V

Dec 19, 2013, 1:41:22 PM
to nod...@googlegroups.com
Thanks for sharing your views. Do you know what Ryan means by server file descriptors?

Also, about StrongLoop: you mentioned communication between the node instances. Why would this be necessary? I can't figure out a use case.



Alex Kocharin

Dec 19, 2013, 1:53:27 PM
to nod...@googlegroups.com
 
I can't say for sure what he meant there, since 2010 was like ages ago and a lot of stuff has happened since.

But it's probably about passing file descriptors from the master server to a child process. Only one process can listen on a port at any given time, so the master does it. But once it accepts a connection, it sends the file descriptor (everything is a file, remember?) to a child process, which reads the data and answers on it directly.

Communication between node instances is probably just a sign of over-engineering.
 
 

Bijuv V

Dec 19, 2013, 2:50:21 PM
to nod...@googlegroups.com
Thanks Alex.

So the options finally are:

a. Use cluster - used in many prod applications, as mentioned by Luke

b. Use the pm2 module - similar to cluster, I believe

c. Use Phusion Passenger - not quite sure why I would go with something so heavyweight (IMO, just from reading through their website)

d. Use VMs

e. Use physical LBs

What would be the drawback of using cluster compared to a Dyno or VM? I understand about the restarts of the cluster.




Shubhra Kar

Dec 19, 2013, 2:57:06 PM
to nod...@googlegroups.com
The awareness between processes is useful for workload management between the master and its children. Alex's description is mostly accurate.

We are also working on a more distributed workload-management solution in strong-cluster, to allocate specific types of workload selectively to child processes. So you can manage the work like a job: individual tasks can be split up and run dedicated on a specific set of processes in the cluster.

Cluster awareness is useful for sharing state and session information across processes. Not every node application will need it (definitely not stateless ones), so it could look like over-engineering. But in cases where transaction integrity and rollback are needed, say banking transactions (if one process drops a transaction in flight due to load, another can pick it up), it becomes critical.

Kind Regards,
Shubhra




Shubhra Kar

Dec 19, 2013, 3:01:12 PM
to nod...@googlegroups.com
Plus, using strong-cluster-control you can dynamically add processes to a running application based on workload, and drop them when they are not needed.

You can avoid application restarts.

So if you started with 2 processes in the cluster and found them overwhelmed with load, you can resize the cluster to, say, 6 processes, and the master will distribute load to the new processes.

Hope this helps.

Shubhra
--
Shubhra Kar | Director - Products & Systems Engineering  |  StrongLoop

Sam Roberts

Dec 19, 2013, 3:06:58 PM
to nod...@googlegroups.com
On Thu, Dec 19, 2013 at 11:50 AM, Bijuv V <vvbij...@gmail.com> wrote:
> Thanks Alex.
>
> So the options finally are -
>
> a. Use Cluster - used in many prod applications as mentioned by Luke

Yeah, ignore the experimental label; it just means node is allowed to change its API if they need to.

Its API is a bit low-level; you need to do some stuff to make it useful, see (b)!

> b. Use PM2 module - similar to cluster I believe

pm2 is a wrapper around cluster, as are:

ql-io/cluster2 (ebay)
godaddy/cluster-service
strongloop/strong-cluster-control

They all just add functionality that many people have found useful on
top of node's built-in cluster. YMMV, check them out.

You asked why share state in a cluster... lots of reasons! MySQL...?

In large production, you probably will use something like Redis to
distribute state among workers. In a small single-machine cluster, or
while doing development or in CI, maybe you don't want to depend on
external servers. Or maybe you do, your choice.

If you don't, the set of "store" modules Shubhra mentioned can be used
to distribute connect session state, socket.io sessions, etc. through
a node cluster without using external servers.

> c. Use Phusion Passenger - Not quite sure why would I go with such a
> heavyweight ? (IMO - just by reading thru their website)

nginx is pretty high performance... I don't know if heavyweight is fair.

> d. Use VM's
> e. Use physical LB's
>
> What would be the drawback of using a Cluster compared to a Dyno or VM? I
> understand about the restarts of the cluster

And speaking of heavy-weight solutions... here come the VMs! ;-)

Alex Kocharin

Dec 19, 2013, 3:08:53 PM
to nod...@googlegroups.com
 
pm2 is a cluster master process. You can write a master yourself, or use pm2 instead; it is essentially just one option among several.

If you'd like another option: run the app on several different ports with nginx as a load balancer in front. That's what we were doing before cluster was released, and I believe it's still viable. Besides, nginx is good at serving static files, so people use it even alongside cluster anyway.

Passenger, strong-cluster-control, etc. are too heavy and too vendor-locking for my taste. VMs like LXC are awesome, but I think they add a big overhead as well. I've never used physical LBs, so I can't say anything about them.
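For concreteness, the nginx-in-front-of-several-ports setup might look like the fragment below (the file path, port numbers, and directory are illustrative, not from this thread):

```nginx
# /etc/nginx/conf.d/node_app.conf (illustrative path)
upstream node_app {
    # One entry per node process, each started on its own port
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
    server 127.0.0.1:3004;
}

server {
    listen 80;

    # Let nginx serve static files directly
    location /static/ {
        root /var/www/myapp;
    }

    # Everything else goes to one of the node processes
    location / {
        proxy_pass http://node_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

By default nginx round-robins across the `upstream` servers and skips ones that are down.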
 
 

Tim Smart

Dec 19, 2013, 3:12:32 PM
to nod...@googlegroups.com
I scaled on a single machine using haproxy and thalassa.

The gist of it:

var http = require('http')
var app = require('./app')
var pkg = require('./package.json')

var server = http.createServer()
server.on('request', app)

// Listen on port 0. This finds a random, free port.
server.listen(0, function () {
  var port = server.address().port

  // Here you would register the port / host combination with some sort of
  // registry, which would dynamically update haproxy. In my case I used
  // thalassa.
  //
  // registry.register('myservice', pkg.version, port)
  // registry.start()
})

What I love about this approach is that you can easily go multi-machine later
down the track.

Alejandro de Brito Fontes

Dec 19, 2013, 3:28:10 PM
to nod...@googlegroups.com
Hi, just use HAProxy http://haproxy.1wt.eu/ (a software LB).
Here https://github.com/observing/balancerbattle you can find multiple options for scaling.

Tomasz Janczuk

Dec 20, 2013, 12:48:51 AM
to nod...@googlegroups.com

On Windows, you have the option of using the httpsys module (https://github.com/tjanczuk/httpsys) to replace Node's HTTP stack with the kernel-mode HTTP.SYS implementation that Windows ships with. This in turn allows you to run several node processes listening on the same TCP port number and have the OS kernel load-balance incoming requests between them. Unlike with cluster, this approach does not require a master process.

Also unlike cluster, the httpsys module coupled with the TCP port-sharing functionality of HTTP.SYS lets you achieve not only horizontal scale-out but also horizontal partitioning with process-level affinity. This is very useful for applications that must be scaled out to handle the traffic, yet would benefit from keeping session state in memory for performance reasons (e.g. socket.io chat applications). Check out http://tomasz.janczuk.org/2013/05/how-to-save-5-million-running-nodejs.html for details.

Nu Era

Dec 20, 2013, 1:05:36 AM
to nod...@googlegroups.com
The cluster module seems pretty good from my experiments thus far.

:-)

mgutz

Dec 23, 2013, 12:17:42 AM
to nod...@googlegroups.com
If you need process affinity, e.g. when using Socket.IO or SockJS, your best choice is a load balancer. We use HAProxy 1.5-dev for load balancing and SSL termination. Our 4 node.js servers, with 8 app instances each, regularly approach 70-80% CPU, while the HAProxy server rarely exceeds 30% CPU. You probably don't need a hardware load balancer right now.

mgutz

Dec 23, 2013, 12:31:20 AM
to nod...@googlegroups.com
I didn't make that clear: we started out with http-proxy and 5 node.js processes on a single server. The next step to improve performance was to get rid of http-proxy and to stop using node for serving static assets and SSL termination. We ended up with HAProxy + Nginx and 5 node.js processes on a single VPS. With this configuration it's simple to add more servers.

David Beck

Dec 23, 2013, 10:47:41 PM
to nod...@googlegroups.com
May I please ask what to some will seem like a silly question, but I don't get it: why is it necessary to scale horizontally on a single machine? In theory, if you write asynchronous code, shouldn't one fast thread be roughly equivalent to many slower threads, since the real work is being done behind asynchronous callbacks anyway? Is it that the synchronous part of the node application stalls performance significantly under heavy load?

Thank you for helping me understand!

David

Alex Kocharin

Dec 23, 2013, 10:54:07 PM
to nod...@googlegroups.com

1) If somebody has a multicore machine and wants to utilize all the cores, they'll need to launch multiple instances.
2) Node.js is almost a crash-only app, so when it finally crashes, it's necessary to have other copies in place.
3) Node.js is not always asynchronous; when someone decides to compute a Fibonacci sequence synchronously, it will block.

It's not necessary for everybody, but scaling on a single machine is something a lot of people do.
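Point 3 is easy to demonstrate. In the sketch below (numbers are illustrative), a timer due in 10 ms cannot fire until a synchronous, CPU-bound call has returned, because both share the one event loop:

```javascript
// Naive, CPU-bound, fully synchronous Fibonacci.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

var start = Date.now();
var timerDelay = null;

// Due in 10 ms, but it can only run once the call stack is empty.
setTimeout(function () {
  timerDelay = Date.now() - start;
  console.log('timer fired after ' + timerDelay + ' ms');
}, 10);

// Blocks the event loop for well over 10 ms on typical hardware.
fib(34);
var syncMs = Date.now() - start;
```

A second process (via cluster or any of the options above) keeps serving requests while one process is stuck in a call like this.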



Mark Hahn

Dec 24, 2013, 12:13:49 AM
to nodejs
> If somebody has multicore computer and wants to utilize all cores, it'll be nice to launch multiple instances

That's pretty much the whole reason.

David Beck

Dec 25, 2013, 3:49:46 PM
to nod...@googlegroups.com
Thank you! Merry Christmas!

Hongli Lai

Dec 27, 2013, 10:23:46 AM
to nod...@googlegroups.com
On Thursday, December 19, 2013 8:50:21 PM UTC+1, Bijuv V wrote:
> c. Use Phusion Passenger - Not quite sure why would I go with such a heavyweight ? (IMO - just by reading thru their website)

Phusion Passenger is actually quite lightweight. Its core is written in C++, is high-performance, and uses less than 5 MB of memory. Many Passenger users run it on VPSes with less than 1 GB of RAM (some with less than 512 MB) and it works great.

It sounds like Passenger solves your problem quite nicely. Passenger spawns multiple instances of your application (i.e. multiple processes), manages them automatically, and exposes them over a single port. As a user you only have to deal with that single Passenger instance, not with all the individual processes. The multi-process management allows your app to automatically utilize all your CPU cores, and all requests are automatically and fairly distributed over the processes. Passenger also provides:
- Crash protection, by restarting your processes automatically if they crash.
- Powerful administration tools for inspecting the status of your app as well as various statistics, such as the number of active requests, concurrency, memory usage, CPU usage, etc. These are provided through simple command-line tools; no need to sign up for any web services.
- Security features such as Automatic User Switching.
- The ability to change the cluster size dynamically.

Another major reason for using Passenger is its ability to reduce your configuration boilerplate by 90%, drastically simplifying deployment. With cluster, forever, pm2, etc. you have to write Nginx reverse-proxy configuration directives, set up init scripts, set up process monitoring, and so on. Almost all of this is taken care of for you by Passenger, through its Nginx integration mode.

And unlike with the cluster module, you usually do not have to modify your application to make it multi-process capable. Except in some rare (and documented) cases, it should work out of the box.

As for vendor lock-in: Passenger is open source. Its source code is available on GitHub under the MIT license; anybody can read or modify it: https://github.com/phusion/passenger

Although Passenger's Node.js support is new, users are slowly picking it up. For some reviews, see:

mgutz

Jan 8, 2014, 2:36:58 PM
to nod...@googlegroups.com
Hi Bijuv,

We use HAProxy because it excels at load balancing and can balance based on cookie values. We use Nginx as a static-asset server and to gzip RESTful API responses. You could use Nginx for all of this if you do not have complicated load-balancing requirements. I think Nginx can handle WebSockets now.
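The cookie-based stickiness mgutz describes can be sketched in HAProxy configuration like this (names and ports are made up for the example; HAProxy inserts the cookie so each client keeps returning to the same node process):

```haproxy
frontend www
    mode http
    bind *:80
    default_backend node_app

backend node_app
    mode http
    balance roundrobin
    # Insert a cookie so each client sticks to one node process
    cookie SRV insert indirect nocache
    server app1 127.0.0.1:3001 cookie app1 check
    server app2 127.0.0.1:3002 cookie app2 check
```

This is what gives Socket.IO-style long-lived sessions process affinity without any shared session store.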

