Node performance issues

domini...@digital.cabinet-office.gov.uk

Sep 13, 2018, 10:10:22 AM

to nodejs

We are trying to debug a poorly performing node application and would appreciate any help or advice from this community. We have a node application that serves as the user facing frontend for a payment platform - code here https://github.com/alphagov/pay-frontend. We are in the process of assessing and expanding our capacity to meet increasing need.

We have a target of being able to serve X payment journeys per second.

A payment journey comprises 4 pages, two of which require a form submission.

Each page in the journey entails some communication between the node application in question (that we helpfully call frontend) and other microservices to establish the current status of the payment etc, on average around 2 http calls per page.

By carrying out performance tests (using Gatling) we have found that in order to meet our target of X tx/s, we have to provision around X/2 frontend nodes, i.e. each frontend node appears capable of processing around 2 payment journeys per second on average.

This seems wrong - by my reckoning it is wrong by orders of magnitude.
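
As a rough back-of-the-envelope sketch, the figures above translate into the following per-node load. It uses only the numbers from this post (4 pages per journey, around 2 downstream calls per page, around 2 journeys per second per node) and counts each page once, so form submissions would add a little on top. The helper name `perNodeLoad` is made up for illustration.

```javascript
// Rough per-node load implied by the figures in this post.
// perNodeLoad is a hypothetical helper, not part of pay-frontend.
function perNodeLoad(journeysPerSec, pagesPerJourney, callsPerPage) {
  const pageRequestsPerSec = journeysPerSec * pagesPerJourney;
  const downstreamCallsPerSec = pageRequestsPerSec * callsPerPage;
  return { pageRequestsPerSec, downstreamCallsPerSec };
}

console.log(perNodeLoad(2, 4, 2));
// { pageRequestsPerSec: 8, downstreamCallsPerSec: 16 }
```

In other words, each node is handling on the order of 8 page requests and 16 downstream HTTP calls per second at its limit.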

Details about our tech stack

We are on AWS, and the frontends run in Docker containers on c5.large EC2 instances.

We use https internally

We are running node 8 in production

The application is an express app

We use http.request to make downstream requests, but have also experimented with using request, with no appreciable difference.

There are no major CPU-heavy processes in our frontend app, and event loop latency under normal load is fine.
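
For anyone wanting to check event loop latency in a similar setup, one common approach is a timer-based sampler: if the loop is busy, a repeating timer fires late, and the lateness approximates the lag. The post does not say how latency was measured, so this is only a minimal sketch of one option. It uses nothing but timers and `process.hrtime`, so it should run on Node 8. The name `startLagMonitor` is hypothetical.

```javascript
// Samples event loop lag with a repeating timer. When the loop is blocked,
// the callback runs later than intervalMs, and the excess is reported.
// startLagMonitor is a hypothetical helper name.
function startLagMonitor(intervalMs, onSample) {
  let last = process.hrtime();
  const timer = setInterval(() => {
    const [s, ns] = process.hrtime(last);
    const elapsedMs = s * 1e3 + ns / 1e6;
    last = process.hrtime();
    onSample(Math.max(0, elapsedMs - intervalMs));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => clearInterval(timer);
}

// Inside a running server this logs the lag about once per second.
const stop = startLagMonitor(1000, (lagMs) => {
  console.log(`event loop lag: ${lagMs.toFixed(1)} ms`);
});
```

Calling `stop()` ends the sampling. Comparing the samples under low load and near breaking point shows whether the loop itself is the bottleneck.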

What we have found so far

The frontend nodes are CPU bound

Under strain/near breaking point, profiling reveals the frontends seem to be spending a large amount of time doing things related to making downstream http requests, but nothing obviously ludicrous.

Whilst there is no obvious memory leak, the heap dump deltas show a proportionately large number of Sockets hanging around - I think this is just due to keepalives though

Even not under heavy load, the network latency for a request seems high for an internal request - we are seeing average latency of ~20-40ms, vs around 2-5ms for a Java app that is more or less identical in the calls it's making.

A breakdown of the phases of a request (obtained from the request library's timing facility) reveals that under low load, socket wait, DNS lookup and TCP connection take practically no time on average - the bulk of the time is spent waiting for the server response

Under load it appears to be the time to establish a tcp connection and the time to 'firstByte' that contribute to overall increase in http request time
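
The same kind of phase breakdown can be approximated with the core http module by timestamping socket events. The sketch below is an illustration only, not the code used for the measurements above: `timedGet` and `toPhases` are made-up names, the phase names mirror the ones mentioned in this post, and it uses plain http. With HTTPS the TLS handshake happens after the TCP 'connect' event, so in this sketch it would be counted under firstByte.

```javascript
const http = require('http');

// Turns timestamps (ms since request start) into phase durations.
// A reused keep-alive socket emits no 'lookup' or 'connect' events, so
// missing marks fall back to the previous mark and the phase reads as 0.
function toPhases(marks) {
  const lookup = marks.lookup !== undefined ? marks.lookup : marks.socket;
  const connect = marks.connect !== undefined ? marks.connect : lookup;
  return {
    wait: marks.socket,
    dns: lookup - marks.socket,
    tcp: connect - lookup,
    firstByte: marks.response - connect,
    download: marks.end - marks.response,
    total: marks.end,
  };
}

// Hypothetical helper: GETs a plain-http URL and resolves with the
// phase durations in milliseconds.
function timedGet(url) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime();
    const now = () => {
      const [s, ns] = process.hrtime(start);
      return s * 1e3 + ns / 1e6;
    };
    const marks = {};
    const req = http.get(url, (res) => {
      marks.response = now(); // response headers received
      res.on('data', () => {}); // drain the body
      res.on('end', () => {
        marks.end = now();
        resolve(toPhases(marks));
      });
    });
    req.on('socket', (socket) => {
      marks.socket = now();
      if (socket.connecting) { // only new sockets emit these events
        socket.once('lookup', () => { marks.lookup = now(); });
        socket.once('connect', () => { marks.connect = now(); });
      }
    });
    req.on('error', reject);
  });
}
```

Usage would be along the lines of `timedGet('http://localhost:3000/').then(console.log)`, where the URL is a placeholder. A tcp phase that is consistently non-zero under load suggests new connections are being opened rather than pooled ones reused.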

Things we have tried

We have tried configuring the standard agent with different values of maxSockets, maxFreeSockets...

We have tried using different agents

We have tried disabling socket pooling entirely

We have tried two different client libs - the core http module, and request.

We have matched the number of workers in our cluster to the number of CPUs
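
For reference, configuring the standard agent as mentioned above might look roughly like the sketch below. The numbers are placeholders to tune rather than recommendations, and the helper `downstreamOptions` is hypothetical. The one design point worth checking is that a single agent is shared across requests: creating a new agent per request means no sockets are ever reused.

```javascript
const https = require('https');

// One shared agent so that sockets are pooled and reused across downstream
// calls. The values below are placeholders, not recommendations.
const keepAliveAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 50,
  maxFreeSockets: 10,
});

// Hypothetical helper: builds request options that use the shared agent.
function downstreamOptions(hostname, path) {
  return { hostname, path, method: 'GET', agent: keepAliveAgent };
}

// Usage (service.internal.example is a made-up host):
// https.request(downstreamOptions('service.internal.example', '/status'), onResponse).end();
```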

Some of these things have yielded gains of ~10%, but I am still convinced there is something fundamentally wrong with the architecture and configuration of the application - the throughput just seems too low.

I realise I haven't given enough detail to solve anything here, but if anyone has any guidance on approaches that have worked for them, other knobs to twiddle, guidance on better interpretation of profiling and heap dumps, or any other useful pointers I would be very grateful.

Dom

Mikkel Wilson

Sep 15, 2018, 11:52:15 AM

to nodejs

Dom,

You've mentioned the number of requests made from the frontend to the backend, but how many requests are you making from the backend express app to other microservices internally? You mention a 20-40ms latency which, I agree, seems abnormally high. If you're making 10 such sequential requests, that would explain the low 'journey' performance.

Things to look for:

- How is your docker networking set up? Swarm? k8s? If each microservice is running on each node of your production cluster, it may be choosing to connect to a remote node rather than one on localhost. Try adding an isotope (unique request ID generated at the outermost layer and forwarded/logged in every microservice) to see where the request is actually traveling. (Tip: CloudFlare sends a CF-RAY header. It's unique per request and you can use it this way.)

- Network routing. Ideally your edge nodes/LB would be externally accessible and internal microservice nodes *not* externally accessible. If the upstream nodes have external IPs, your DNS may be resolving to the external IP, which would be a longer network path and change the latency for AWS networking (ALB?). 'traceroute' is your friend here.

- Are the requests to internal microservices very small? If the size of the request/response to/from the internal microservices is smaller than the HTTP headers sent across, you should consider a different RPC mechanism.

- Do you need HTTPS on internal requests? Again, size of total request vs. size of payload should be balanced. Terminating SSL on the edge (perhaps in an ALB) would reduce the size of the internal requests.

- Not your fault? Is one of your microservices making a request to a slow or rate limited external service? Sending emails, generating PDFs, running CC transactions, etc. can be slow so you should run them asynchronously.

- Slow EC2 instance? Sometimes they are just bunk and only perform at 50% of what others do. It's an AWS mystery. Just kill the slow node and create a new one.
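
The request ID idea from the first bullet could be sketched as Express-style middleware like this. It is a minimal illustration under assumptions: the header name `x-request-id` and the function names are made up, and any consistent header would do.

```javascript
const crypto = require('crypto');

const HEADER = 'x-request-id'; // assumed header name

// Express-style middleware: reuse an incoming ID or mint a new one at the
// outermost layer, then expose it on the request and echo it on the response.
function requestId(req, res, next) {
  const id = req.headers[HEADER] || crypto.randomBytes(16).toString('hex');
  req.requestId = id;
  res.setHeader(HEADER, id);
  next();
}

// Headers to attach to every downstream call made while handling req.
function forwardHeaders(req) {
  return { [HEADER]: req.requestId };
}
```

In an Express app this would be registered with `app.use(requestId)`, and each service would log `req.requestId` alongside its hostname so the path of a single request can be followed across nodes.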

Alternative RPC mechanisms:

- Gearman (http://gearman.org/) is particularly useful if you have a mixed-language environment. It's fast, stable, supports retry for failed nodes, and sends ~10,000 emails a minute for Craigslist.

- gRPC (https://grpc.io/docs/tutorials/basic/node.html) uses protobuf for high-throughput, low latency RPC. Fast, stable, supported by Google.

- ZeroMQ (https://www.npmjs.com/package/zmq) is more of a socket transport than an RPC mechanism, but depending on what your upstream microservices are doing this can be useful. It can also maintain a socket between services so setup/teardown time of the socket is minimized. Downside: bare-bones - you'll need to build many features yourself. Upside: Crazy fast. Used by high-frequency traders for stock market bots.

Debugging/rearchitecting stuff like this is my jam. Email me if you want to talk.

HTH,

Mikkel

Oblivious.io

Atul Agrawal

Sep 15, 2018, 2:11:46 PM

to nodejs

Are you running your Node.js apps in cluster mode on one instance?
Check whether you need to serve the frontend content from the server itself, because it could be managed from S3 and/or a CDN.

Check whether you can use caching, which can improve performance drastically.
