Re: [nodejs] nodejs zlib performance


Ben Noordhuis

Oct 24, 2012, 8:41:23 PM
to nod...@googlegroups.com
On Thu, Oct 25, 2012 at 12:26 AM, Vadim Antonov <ava...@gmail.com> wrote:
> Hi everybody,
>
> I've tried googling nodejs zlib performance and didn't find any
> useful information.
> I work on a high-load API which communicates with multiple backend
> servers. Throughput on 1 VM with 8 cores is around 200 QPS, and for every
> query the application makes up to 50 queries to caches/backends. Every response
> from a cache/backend is compressed and requires decompression.
> It ends up that the application needs to make up to 10,000 decompressions
> per second.
>
> Based on the zlib code, a new thread from the thread pool is used for every
> decompression (one binding to the C++ code per decompression; we use the
> naive method http://nodejs.org/api/zlib.html#zlib_zlib_gunzip_buf_callback -
> there is no way to use the stream methods):
> https://github.com/joyent/node/blob/master/lib/zlib.js#L272
> https://github.com/joyent/node/blob/master/src/node_zlib.cc#L430
>
> The service has started to see huge performance spikes a couple of times a
> day, and they are coming from the decompression code: from time to time a
> decompression takes up to 5 seconds and all decompression calls are blocked
> during this time.
> I think the issue is coming from the thread pool (uv_work_t) which zlib
> is using. Does anybody else see the same behavior? Are there any workarounds
> for it? Where can I find documentation about it? V8 code?
> For now we've started to use the snappy library
> (https://github.com/kesla/node-snappy) with sync compression/decompression
> calls. But the service still needs to decompress backend responses with gzip...
>
> To illustrate a little what I'm talking about, here is a small example
> (it generates 'count' buffers, decompresses them 'count2' times, and prints
> all timings plus min/max/avg).
>
> var _ = require('underscore');
> var rbytes = require('rbytes');
> var step = require('step');
> var zlib = require('zlib');
>
> var count = 10;
> var count2 = 1000;
> var count3 = 0;
> var len = 1024;
> var buffers = [];
> var timings = {};
> var totalTime = 0;
> var concurrent = 0;
> var maxConcurrent = 128;
>
> function addCompressed(done) {
>   zlib.gzip(rbytes.randomBytes(len), function (error, compressed) {
>     buffers.push(compressed);
>     done();
>   });
> }
>
> function decompress(done) {
>   var time = Date.now();
>   zlib.gunzip(buffers[Math.floor(Math.random() * count)], function (error, decompressed) {
>     if (error) {
>       console.log(error);
>     }
>     var total = Date.now() - time;
>     totalTime += total;
>     if (!timings[total]) {
>       timings[total] = 0;
>     }
>     timings[total]++;
>     ++count3;
>     if (done && count3 == count2) {
>       done();
>     }
>   });
> }
>
> step(
>   function genBuffers() {
>     for (var i = 0; i < count; ++i) {
>       var next = this.parallel();
>       addCompressed(next);
>     }
>   },
>   function runDecompression() {
>     var next = this;
>     for (var i = 0; i < count2; ++i) {
>       decompress(next);
>     }
>   },
>   function writeTotal() {
>     var min = null;
>     var max = -1;
>     _.each(timings, function (occurrences, ms) { // underscore passes (value, key)
>       ms = Number(ms);
>       max = Math.max(ms, max);
>       min = (min === null) ? ms : Math.min(min, ms);
>       console.log(ms + ' ' + occurrences);
>     });
>     console.log('min ' + min);
>     console.log('max ' + max);
>     console.log('avg ' + totalTime / count2);
>   }
> );
>
> Here are the results for different numbers of decompressions
> (count2, then min/max/avg timings in ms):
>
> count2   min    max    avg
> 10       0      1      0.1
> 100      1      6      3.8
> 1000     19     47     30.7
> 10000    149    382    255.0
> 100000   4120   18607  16094.3
>
> Decompression time grows with the number of concurrent decompressions.
> Is there a way to make it faster / to limit the number of threads zlib is
> using?

How many active threads do you see in e.g. htop? There should
preferably be as many threads as there are cores in your machine (give
or take one).

As an aside, the current thread pool implementation is a known bottleneck in
node right now. We're working on addressing that in master, but it's not
done yet.
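
In the meantime, one userland workaround is to cap how many gunzip calls are
in flight at once so the pool isn't flooded (your unused maxConcurrent
variable hints at the same idea). A minimal sketch, untested, with the helper
name and the limit picked arbitrarily:

var zlib = require('zlib');

var queue = [];
var inFlight = 0;
var MAX_IN_FLIGHT = 8; // assumption: roughly one per core

// Drop-in replacement for zlib.gunzip that queues excess calls.
function gunzipLimited(buf, cb) {
  if (inFlight >= MAX_IN_FLIGHT) {
    queue.push([buf, cb]);
    return;
  }
  inFlight++;
  zlib.gunzip(buf, function (err, result) {
    inFlight--;
    // Dispatch the next queued call, if any, before reporting back.
    if (queue.length > 0) {
      var next = queue.shift();
      gunzipLimited(next[0], next[1]);
    }
    cb(err, result);
  });
}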

Vadim Antonov

Oct 25, 2012, 1:24:12 PM
to nod...@googlegroups.com
There are 8 processes (1 per core, created with the cluster lib) and every process has 6 threads.
Do you have an ETA for the thread pool updates? Is there a way we can help you?

Thank you.
--
Vadim

Jorge

Oct 25, 2012, 5:15:46 PM
to nod...@googlegroups.com
Threads are evil™, don't use threads.

The Node Way® (just don't ask) is to pipeline processes as in the good ol' 70s. Flower Power, peace and love bro, etc.

Cheers,
--
Jorge.

Vadim Antonov

Oct 25, 2012, 5:22:29 PM
to nod...@googlegroups.com
We don't use threads ourselves. It's the default nodejs zlib library implementation.
--
Vadim

Mikeal Rogers

Oct 25, 2012, 5:22:51 PM
to nod...@googlegroups.com
Seriously? How do you read this stuff before you send it and not think you're a troll?

Isaac Schlueter

Oct 25, 2012, 5:24:46 PM
to nod...@googlegroups.com
Jorge,

Please do not make snarky remarks about Node on this mailing list. If
you have a problem with something, bring it up in a new thread. If
you have something to add to this thread, then please do so, but this
is not helpful.

Jorge

Oct 25, 2012, 5:26:12 PM
to nod...@googlegroups.com
Are you saying that node, internally, delegates CPU intensive work to background threads?

Heresy!
--
Jorge.

Jorge

Oct 25, 2012, 5:38:04 PM
to nod...@googlegroups.com
Hi Isaac,

He could pipe uncompressed output from node process A into compressor node process B. That's the Node Way®, isn't it?

Or, he could do it all in a single node process, but that would mean delegating CPU intensive jobs to background threads, the mere idea of which is something that unnerves more than one here.

Or not?
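
For what it's worth, the pipeline version could look something like this
(assumes a gzip binary on the PATH; the helper name is made up and the
sketch is untested):

var spawn = require('child_process').spawn;

// Decompress a buffer by piping it through an external gzip process.
function gunzipViaChild(buf, cb) {
  var child = spawn('gzip', ['-dc']);
  var chunks = [];
  child.stdout.on('data', function (chunk) { chunks.push(chunk); });
  child.on('exit', function (code) {
    if (code !== 0) return cb(new Error('gzip exited with code ' + code));
    cb(null, Buffer.concat(chunks));
  });
  child.stdin.end(buf);
}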

Cheers,
--
Jorge.

Mark Hahn

Oct 25, 2012, 5:38:29 PM
to nod...@googlegroups.com
> Are you saying that node, internally, delegates CPU intensive work to background threads?

What the heck are you talking about?  There is no such feature and there shouldn't be.  Are you just trolling?  If so, quit it.

Mark Hahn

Oct 25, 2012, 5:40:57 PM
to nod...@googlegroups.com
> He could pipe uncompressed output from node process A into compressor node process B.

Could he use buffers?  Would that be faster?  What is the overhead of the piping compared to the compressing?  I assume it would be minor.
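
Roughly speaking, the pipe itself should cost an extra memory copy and a few
syscalls per chunk, which ought to be small next to gzip's per-byte CPU cost;
the extra process mostly adds a bit of latency. (Ballpark reasoning, not
measured.)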

Nathan Rajlich

Oct 25, 2012, 5:45:14 PM
to nod...@googlegroups.com
Jorge, knock it off, seriously. Like Isaac said, actually bring
something useful to the discussion otherwise you're blatantly
trolling.

On Thu, Oct 25, 2012 at 2:38 PM, Jorge <jo...@jorgechamorro.com> wrote:

Jimb Esser

Oct 25, 2012, 7:49:24 PM
to nod...@googlegroups.com
That is exactly what node seems to do with the zlib API.  Though there are times when this is great, this API definitely bothers me and causes some problems.  In theory, if I have a 4 core machine and 4 busy node processes, and each of them tries to use the zlib API, suddenly I've got 8 cores' worth of threads all fighting for the CPU (or possibly many more, if multiple zlib requests are made and each process has a thread pool equal to the number of cores?).  Since zlib work is CPU-intensive, not I/O-intensive, it would be great if there were a synchronous API, so that we could ensure the handling of a single task (whether expensive JS code or zlib operations) consumes only one core.  The async zlib API seems odd compared to the Crypto APIs, which are all synchronous despite being, I think, considerably more CPU-intensive (per byte) than zlib.

Admittedly, there are OS-level CPU affinity masks which could be used here, but in general that's not particularly good for overall performance.
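
To put rough numbers on that: libuv's pool was 4 threads per process at the
time (if memory serves), so 4 node processes under load means 4 main threads
plus 16 pool threads contending for 4 cores.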

On Thursday, October 25, 2012 2:39:10 PM UTC-7, Mark Hahn wrote:
> Are you saying that node, internally, delegates CPU intensive work to background threads?

Dan Milon

Oct 25, 2012, 7:54:11 PM
to nod...@googlegroups.com
That's out of node's scope.
If you've got too much work to do and too few cores, well, shit gets
scheduled. Plus, there are a lot of other processes already fighting for some
CPU juice.

Essentially, what you're talking about here is CPU scheduling.

danmilon.

Isaac Schlueter

Oct 25, 2012, 8:16:39 PM
to nod...@googlegroups.com
Yes, it is a bit weird that crypto is sync and zlib is async. It'd be
nice if we had a sync zlib API, and could move crypto stuff onto the
thread pool, especially for TLS.
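
(As it happens, a synchronous zlib API along these lines did land in later
node releases, in the v0.11 line; e.g.:

var zlib = require('zlib');
// Synchronous variant: blocks the calling thread, no thread pool involved.
var decompressed = zlib.gunzipSync(compressedBuffer);

where compressedBuffer is any gzipped Buffer.)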

Ben Noordhuis

Oct 25, 2012, 9:09:37 PM
to nod...@googlegroups.com
On Thu, Oct 25, 2012 at 7:24 PM, Vadim Antonov <ava...@gmail.com> wrote:
> There are 8 processes (1 per core, created with the cluster lib) and every
> process has 6 threads.
> Do you have an ETA for the thread pool updates? Is there a way we can
> help you?

It's expected to land in v0.10. If you want, you can help out by
testing upcoming v0.9 dev releases; any thread pool improvements will
be mentioned in the Changelog.