[erlang-questions] couchdb performance 10x: using NIF for file io

Joel Reymont

Oct 24, 2010, 9:56:34 AM

to erlang-q...@erlang.org

Simply switching to NIFs for file IO seems to have improved CouchDB write performance more than ten-fold.

Compare the old graph

http://graphs.mikeal.couchone.com/#/graph/62b286fbb7aa55a4b0c4cc913c00f5a4

to the new graph

http://graphs.mikeal.couchone.com/#/graph/62b286fbb7aa55a4b0c4cc913c00f4d7

I was under the impression that the Erlang IO subsystem was highly optimized but there seems to be no limit to perfection.

NIFs are a giant black hole that will subsume Erlang code as performance has to be improved. Start at the lowest level and keep moving up. All that will be left of Erlang in the end is 99.99999% uptime, fault tolerance and supervision... of optimized C code. It's swell and I'm all for it!

Patch is here:

http://github.com/wagerlabs/couchdb/commit/23527eb8165f81e63d47b230f3297d3072c88d83

---
http://twitter.com/wagerlabs

________________________________________________________________
erlang-questions (at) erlang.org mailing list.
See http://www.erlang.org/faq.html
To unsubscribe; mailto:erlang-questio...@erlang.org

Robert Virding

Oct 24, 2010, 10:17:28 AM

to Joel Reymont, erlang-q...@erlang.org

One interesting thing in the graphs is the fluctuations occurring in the NIF'ed version. The old version is much more consistent, but slower. I wonder why it is so.

One problem with writing too much in C is that there is a big risk that the code will contain more bugs. So while a system technically is "up", it will be less accessible if it has to restart sub-sections more often.

Robert

Martin Scholl

Oct 24, 2010, 10:30:21 AM

to erlang-q...@erlang.org

I'd guess this is because the NIF blocks the current VM thread instead
of a dedicated I/O thread.

Just guessing,
Martin

Joel Reymont

Oct 24, 2010, 10:43:16 AM

to Martin Scholl, erlang-q...@erlang.org

On Oct 24, 2010, at 3:30 PM, Martin Scholl wrote:

> I'd guess this is because the NIF blocks the current VM thread instead
> of a dedicated I/O thread.

+A 4 is used by default and using +A 16 reduces the spikes

http://graphs.mikeal.couchone.com/#/graph/62b286fbb7aa55a4b0c4cc913c011c14

This is the latest 2.66 GHz Core i7 MacBook Pro, 8 GB memory and a 2-year-old 256 GB Apple SSD.

The number of cores is reported as 2 but, obviously, a lot of threads end up waiting.

Rapsey

Oct 24, 2010, 10:53:58 AM

to erlang-q...@erlang.org

Yep NIFs are awesome as hell. I've written a socket library and it works
just beautifully. Once my streaming server is doing a few hundred mbit/s, it
actually consumes less CPU than haproxy that sits in front of it.

Sergej

Dave Smith

Oct 24, 2010, 1:12:29 PM

to Joel Reymont, Martin Scholl, erlang-q...@erlang.org

On Sun, Oct 24, 2010 at 8:43 AM, Joel Reymont <joe...@gmail.com> wrote:
>
> On Oct 24, 2010, at 3:30 PM, Martin Scholl wrote:
>
>> I'd guess this is because the NIF blocks the current VM thread instead
>> of a dedicated I/O thread.
>
> +A 4 is used by default and using +A 16 reduces the spikes

+A anything won't help you anymore. You are now doing the I/O on the
VM scheduler thread and blocking ANY erlang code (on that scheduler)
while the file op runs. This will certainly make the benchmarks look
faster under typical circumstances; I'd be very curious to see what
happens when you get into an overload situation and spend all your
time blocked on file I/O while incoming client requests pile up. Since
your erlang code won't be running all that often, things could get
dicey.

Nonetheless, an interesting experiment.

D.

Joel Reymont

Oct 24, 2010, 1:35:57 PM

to Dave Smith, Martin Scholl, erlang-q...@erlang.org

Dave,

On Oct 24, 2010, at 6:12 PM, Dave Smith wrote:

> +A anything won't help you anymore. You are now doing the I/O on the
> VM scheduler thread and blocking ANY erlang code (on that scheduler)
> while the file op runs.

Are you sure of this? Can you refer me to the spot in the VM code where I can learn more?

The reason I'm surprised is that increasing the number of threads in the async thread pool had a clear effect on my benchmark. Spikes were, basically, eliminated.

Thanks in advance, Joel

Kenneth Lundin

Oct 24, 2010, 2:16:20 PM

to Joel Reymont, Dave Smith, Martin Scholl, erlang-q...@erlang.org

The so called async threads selected with the +A flag are only used by
the file driver that is in the original Erlang distribution from
Ericsson (erlang.org). They can also be used by other user written
drivers (I don't know if CouchDB has any driver like that). An
implementation of file operations as NIF's will not make use of the
asynch threads at all and starting Erlang with the +A <NN> flag should
not make any difference at all for the
execution of the NIFs.

NIFs are executing in the scheduler thread and all other Erlang
processes handled by the same scheduler
are blocked while the NIF is executed. Erlang processes handled by
other schedulers in an SMP setup can however execute as normal.

Calling NIFs with potentially long execution times can easily destroy
all multi processing capabilities and
soft real time characteristics for an Erlang node.
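
A minimal sketch of how one might check the two thread counts distinguished above from inside a running node. The module name `vm_threads` is hypothetical; `erlang:system_info/1` with `schedulers_online` and `thread_pool_size` is the standard way to read the number of online schedulers and the size of the async thread pool set by `+A`.

```erlang
-module(vm_threads).
-export([report/0]).

%% Return the number of scheduler threads currently online (where NIFs
%% run) and the size of the async thread pool (used by drivers, set
%% with +A).
report() ->
    [{schedulers_online, erlang:system_info(schedulers_online)},
     {async_threads, erlang:system_info(thread_pool_size)}].
```

On a node started with `erl +A 16`, the `async_threads` entry should read 16, while `schedulers_online` is unaffected by that flag.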

I wonder how responsive the system is to other events when running the
benchmark.

Of course it is possible to optimize the file operations if you know
exactly what file system you are working
towards, what file operations the application makes etc. but in the
general case it is not that easy.

/Kenneth Erlang/OTP , Ericsson

Joel Reymont

Oct 24, 2010, 4:59:11 PM

to Kenneth Lundin, Dave Smith, Martin Scholl, erlang-q...@erlang.org

On Oct 24, 2010, at 7:16 PM, Kenneth Lundin wrote:

> I wonder how responsive the system is to other events when running the
> benchmark.

The benchmark simulates several hundred clients hitting a (mochiweb) web server to read and write couchdb (json) documents. The system seems to stay -highly- responsive, 10x so compared to the same system not using NIFs at all.

If low write response time is taken as a measure of system responsiveness then the first graph shows that the responsiveness of the system has increased dramatically. The write response here is the time taken to process a web request to write a couch document.

Max Lapshin

Oct 25, 2010, 12:21:48 AM

to Joel Reymont, Kenneth Lundin, Dave Smith, Martin Scholl, erlang-q...@erlang.org

There is another interesting question: are there plans to support
asynchronous disk IO? It is already somehow implemented in FreeBSD and
in progress in Linux.

Martin Scholl

Oct 25, 2010, 8:56:01 AM

to Max Lapshin, Joel Reymont, Kenneth Lundin, Dave Smith, erlang-q...@erlang.org

On 10/25/2010 06:21 AM, Max Lapshin wrote:
> There is another interesting question: are there plans to support
> asynchronous disk IO? It is already somehow implemented in FreeBSD and
> in progress in Linux.

For linux-land I have written a NIF for it (using a thread for
collecting the results via io_getevents/2).
The performance difference isn't huge compared with blocking I/O -- but
unlike blocking I/O you have to worry about alignment (I/O as well as
memory) and other issues much more.

For Linux-land I'd recommend to stick with Linus' recommendation:
http://kerneltrap.org/node/7563

Martin

Kenneth Lundin

Oct 25, 2010, 9:27:06 AM

to Joel Reymont, erlang-q...@erlang.org

Hi,

Have you tried the very same benchmark with just using the 'raw'
option when opening the file?
You should then get about the same efficiency as you get with NIFs
with the difference that you
can use asynch threads as well (i.e. the +A option will be
effective). Without the +A you will get the same
setup as you get with your NIF example.
Another benefit with that is that you don't need to write any C-code.

If the NIF example still is much faster (which I doubt) please let me know.

Without use of the raw option all file operations are done via an extra
Erlang process which explains the difference in speed.
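
A minimal sketch of the difference described above, for anyone who wants to reproduce the comparison. The module and function names (`raw_vs_server`, `handle_kinds/1`, `time_writes/3`) are hypothetical and the 4 KiB chunk size is an arbitrary assumption. Only standard `file`, `timer` and `binary` calls are used. Without `raw`, `file:open/2` returns the pid of an I/O server process; with `raw`, it returns a file descriptor term used directly by the calling process.

```erlang
-module(raw_vs_server).
-export([handle_kinds/1, time_writes/3]).

%% Open Path with and without 'raw' and report what kind of handle
%% comes back: a pid (I/O server process) or a descriptor term.
handle_kinds(Path) ->
    {ok, ViaServer} = file:open(Path, [write, binary]),
    Kind1 = kind(ViaServer),
    ok = file:close(ViaServer),
    {ok, Raw} = file:open(Path, [write, binary, raw]),
    Kind2 = kind(Raw),
    ok = file:close(Raw),
    [{default, Kind1}, {raw, Kind2}].

kind(Handle) when is_pid(Handle) -> pid;
kind(_Handle) -> descriptor.

%% Write Count chunks of 4 KiB to Path, opened with the extra Modes
%% (e.g. [] or [raw]); return the elapsed time in microseconds.
time_writes(Path, Modes, Count) ->
    Chunk = binary:copy(<<"x">>, 4096),
    {ok, Fd} = file:open(Path, [write, binary | Modes]),
    {Micros, ok} = timer:tc(fun() -> write_n(Fd, Chunk, Count) end),
    ok = file:close(Fd),
    Micros.

write_n(_Fd, _Chunk, 0) -> ok;
write_n(Fd, Chunk, N) ->
    ok = file:write(Fd, Chunk),
    write_n(Fd, Chunk, N - 1).
```

Comparing `time_writes(Path, [], N)` with `time_writes(Path, [raw], N)` for a large `N` gives a rough idea of the per-call cost of the extra process. It is a micro-benchmark and says nothing about the CouchDB workload itself.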

/Kenneth Erlang/OTP, Ericsson

Joel Reymont

Oct 25, 2010, 9:41:22 AM

to Kenneth Lundin, erlang-q...@erlang.org

Kenneth,

On Oct 25, 2010, at 2:27 PM, Kenneth Lundin wrote:

> Have you tried the very same benchmark with just using the 'raw'
> option when opening the file.

Couch uses the raw option by default. Search for raw in the patch, I left the options untouched and ignore the ones I don't care about in the NIF.

http://github.com/wagerlabs/couchdb/commit/23527eb8165f81e63d47b230f3297d3072c88d83

> You should then get about the same efficiency as you get with NIFs

Not the case.

> If the NIF example still is much faster (which I doubt) please let me know.

NIF is still much faster! Note, however, that the difference on Linux is far less pronounced.

It's a 2x speedup on Linux with a hard disk vs 10x on Mac OSX with SSD.

Paul Davis

Oct 25, 2010, 10:25:11 AM

to Joel Reymont, Kenneth Lundin, Dave Smith, Martin Scholl, erlang-q...@erlang.org

Joel,

I think the issue here is the way that we sequester file io to a
single Erlang process. In your benchmark you'd basically be
sacrificing an Erlang scheduler thread to speed up the synchronous
writes while the other schedulers are freely available to handle
read/write requests.

I'm not sure if Mikeal has written a test for Relaximation that runs a
similar test that hits multiple databases. I think the way I'd try and
show the issue would be to have a large number of clients attempting
to write to their own database.

Also, a trivial way to prove to yourself that the NIFs are indeed
called in the same scheduler threads is to create a NIF that has a
function that just does "while(1){}". If you call that function in
more processes than you have schedulers the VM will halt.

HTH,
Paul Davis

Joel Reymont

Oct 25, 2010, 10:55:08 AM

to Paul Davis, Kenneth Lundin, Dave Smith, Martin Scholl, erlang-q...@erlang.org, Mikeal Rogers

On Oct 25, 2010, at 3:25 PM, Paul Davis wrote:

> In your benchmark you'd basically be
> sacrificing an Erlang scheduler thread to speed up the synchronous
> writes while the other schedulers are freely available to handle
> read/write requests.

I sacrifice the scheduler to make -both- read and write requests since I replaced all calls to the Erlang file module with calls to my NIFs. Well, except for trivial bits like rename and delete.

I think it should be fine to sacrifice a scheduler for small periods of time on a multicore machine.

I'll ask Mikeal for multi-database tests.

Paul Davis

Oct 25, 2010, 11:21:37 AM

to Joel Reymont, Kenneth Lundin, Dave Smith, Martin Scholl, erlang-q...@erlang.org, Mikeal Rogers

On Mon, Oct 25, 2010 at 10:55 AM, Joel Reymont <joe...@gmail.com> wrote:
>
> On Oct 25, 2010, at 3:25 PM, Paul Davis wrote:
>
>> In your benchmark you'd basically be
>> sacrificing an Erlang scheduler thread to speed up the synchronous
>> writes while the other schedulers are freely available to handle
>> read/write requests.
>
> I sacrifice the scheduler to make -both- read and write requests since I replaced all calls to the Erlang file module with calls to my NIFs. Well, except for trivial bits like rename and delete.
>
> I think it should be fine to sacrifice a scheduler for small periods of time on a multicore machine.
>
> I'll ask Mikeal for multi-database tests.
>
> ---
> http://twitter.com/wagerlabs
>
>

Joel,

What's the final size of your database though? IIRC, the point of
those tests wasn't really to test how fast the disk can read/write
data, but to look at how readers and writers interact, ie, do lots of
readers make writes disproportionately slower? The point being that if
you have enough RAM you could be caching extensive parts of the
database in memory which would have the general effect of making most
reads be roughly non-blocking.

The proper way to test this would be to try and figure out a way to
saturate disk io so that a large number of read/write calls are
blocking and then try and do something that doesn't touch disk. I
can't think of a very good way to set that up other than maybe to
create a large number of large databases, compact each of them
simultaneously, and then try and run the reader/writer tests or some
such.
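
One way to approximate the "something that doesn't touch disk" part is a probe process that only sleeps and measures how late it wakes up. This is a minimal sketch under that assumption; the module name `latency_probe` and its `run/2` function are hypothetical, and the overshoot it reports also includes ordinary timer and OS jitter, so it is only meaningful as a before/after comparison on the same machine.

```erlang
-module(latency_probe).
-export([run/2]).

%% Sleep IntervalMs milliseconds Count times and return the worst
%% overshoot in microseconds. A large value suggests this process
%% could not be scheduled on time.
run(IntervalMs, Count) when IntervalMs >= 0, Count > 0 ->
    loop(IntervalMs, Count, 0).

loop(_IntervalMs, 0, Worst) ->
    Worst;
loop(IntervalMs, N, Worst) ->
    T0 = erlang:monotonic_time(microsecond),
    timer:sleep(IntervalMs),
    Elapsed = erlang:monotonic_time(microsecond) - T0,
    Overshoot = max(0, Elapsed - IntervalMs * 1000),
    loop(IntervalMs, N - 1, max(Worst, Overshoot)).
```

Running something like `latency_probe:run(10, 1000)` in the same VM while the disk-heavy load is active, once with the stock file module and once with the NIF version, would show whether unrelated processes are being starved.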

HTH,
Paul Davis

Edmond Begumisa

Oct 27, 2010, 3:46:01 AM

to Kenneth Lundin, Joel Reymont, Dave Smith, Martin Scholl, erlang-q...@erlang.org

Hi,

I hope the Couch team isn't planning on doing this by default -- something
about it makes me nervous...

When CouchDB is on its own, it might not be alarming/noticeable, but I'm
using CouchDB "embedded" in a wider Erlang/OTP application stack (i.e.
where Couch is just one of many OTP apps running in the *SAME* VM -- I
have a few hacks for avoiding socket communication.) I too worry about the
potential for NIF-endowed couch io disturbing the balance of Erlang's
scheduling.

It would be good to see similar benchmarking with the VM concurrently
doing things other than handling couch-related requests (which are
implicitly synchronised in your case.)

- Edmond -

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

Paul Davis

Oct 27, 2010, 10:19:18 AM

to Edmond Begumisa, Kenneth Lundin, Joel Reymont, Dave Smith, Martin Scholl, erlang-q...@erlang.org

On Wed, Oct 27, 2010 at 3:46 AM, Edmond Begumisa
<ebeg...@hysteria-tech.com> wrote:
> Hi,
>
> I hope the Couch team isn't planning on doing this by default -- something
> about it makes me nervous...
>
> When CouchDB is on its own, it might not be alarming/noticeable, but I'm
> using CouchDB "embedded" in a wider Erlang/OTP application stack (i.e. where
> Couch is just one of many OTP apps running in the *SAME* VM -- I have a few
> hacks for avoiding socket communication.) I too worry about the potential
> for NIF-endowed couch io disturbing the balance of Erlang's scheduling.
>
> It would be good to see similar benchmarking with the VM concurrently doing
> things other than handling couch-related requests (which are implicitly
> synchronised in your case.)
>
> - Edmond -
>

Edmond,

You don't have to worry about us changing away from the default
configuration without making sure that different types of loads are
either unaffected or similarly improved. I've already had to address
this specific issue when integrating Emonk in an experimental branch
so it was the first thing I noticed about the file descriptor patch.

Out of curiosity, do you have another Erlang app that would be a good
candidate for using to test that other parts of the VM remain
responsive? I was going to suggest various parts of CouchDB that don't
touch IO as a smoke screen, but an app doing something real that we
can measure in and out of couch and with and without the new file io
would be a good help.

Paul Davis

Elliot Murphy

Oct 27, 2010, 12:54:44 PM

to Paul Davis, Edmond Begumisa, Kenneth Lundin, Joel Reymont, Dave Smith, Martin Scholl, erlang-q...@erlang.org

On Wed, Oct 27, 2010 at 10:19 AM, Paul Davis
<paul.jos...@gmail.com> wrote:
> Out of curiosity, do you have another Erlang app that would be a good
> candidate for using to test that other parts of the VM remain
> responsive. I was going to suggest various parts of CouchDB that don't
> touch IO as a smoke screen, but an app doing something real that we
> can measure in and out of couch and with and without the new file io
> would be a good help.

I don't know how realistic it is to run both on the same server, but
RabbitMQ is another Erlang app that I deploy in the same sites as
CouchDB.

-elliot

Paul Davis

Oct 27, 2010, 1:17:32 PM

to Elliot Murphy, Edmond Begumisa, Kenneth Lundin, Joel Reymont, Dave Smith, Martin Scholl, erlang-q...@erlang.org

On Wed, Oct 27, 2010 at 12:54 PM, Elliot Murphy <elliot...@gmail.com> wrote:
> On Wed, Oct 27, 2010 at 10:19 AM, Paul Davis
> <paul.jos...@gmail.com> wrote:
>> Out of curiosity, do you have another Erlang app that would be a good
>> candidate for using to test that other parts of the VM remain
>> responsive. I was going to suggest various parts of CouchDB that don't
>> touch IO as a smoke screen, but an app doing something real that we
>> can measure in and out of couch and with and without the new file io
>> would be a good help.
>
> I don't know how realistic it is to run both on the same server, but
> RabbitMQ is another Erlang app that I deploy in the same sites as
> CouchDB.
>
> -elliot
>

That's a bit more heavy weight than I was thinking. After a bit of
Googling I might take a look at writing something with egd to do CPU
saturation without being too synthetic.

Paul

Kenneth Lundin

Oct 27, 2010, 3:13:26 PM

to Edmond Begumisa, Joel Reymont, Dave Smith, Martin Scholl, erlang-q...@erlang.org

My and the OTP team's view in this matter is:

- Reimplementing standard functionality already written in C (but as a
driver and with asynch thread support) as NIFs is generally a bad
idea.
- Implementing potentially blocking function calls with NIFs is a bad idea.
- You should have VERY strong reasons for writing NIFs at all. There is a
big point in not writing anything in C if it can be avoided.
- The implementation of NIFs is more modern than the driver concept
and among other things the passing of data between Erlang and C-code
is more efficient for NIFs than for drivers. The driver concept does
still have its place and advantages especially for
handling external asynchronous input to Erlang processes. We plan to
improve the driver mechanisms and make solutions
from the NIFs to be available when writing drivers as well.

If it is correct that NIFs for file operations are 2 times faster than
the current file operations in raw mode we will do something about it
because that difference is not justified. But first we must
investigate if that really is the case and where the performance is
lost.

/Kenneth Erlang/OTP Ericsson

Joel Reymont

Oct 27, 2010, 3:28:03 PM

to Kenneth Lundin, Edmond Begumisa, Dave Smith, Martin Scholl, erlang-q...@erlang.org

Kenneth,

The difference on the Mac is 10x for writes. It's at least 2x on Linux.

How do you propose to investigate?

Sent from my iPhone

Dave Smith

Oct 27, 2010, 4:18:19 PM

to Joel Reymont, Kenneth Lundin, Edmond Begumisa, Martin Scholl, erlang-q...@erlang.org

On Wed, Oct 27, 2010 at 1:28 PM, Joel Reymont <joe...@gmail.com> wrote:
> Kenneth,
>
> The difference on the Mac is 10x for writes. It's at least 2x on Linux.

Just to be pedantic, that 10x # is for a SSD, no? Otherwise, I've
never seen OS X/Darwin win any awards for I/O speed... :)

D.

Joel Reymont

Oct 27, 2010, 4:28:48 PM

to Dave Smith, Kenneth Lundin, Edmond Begumisa, Martin Scholl, erlang-q...@erlang.org

Writing to SSD, yes.

There's no award for speed, it's a relative 10x improvement, Darwin vs Darwin.

Sent from my iPhone

Edmond Begumisa

Oct 28, 2010, 3:37:25 AM

to Paul Davis, Kenneth Lundin, Joel Reymont, Dave Smith, Martin Scholl, erlang-q...@erlang.org

You could try embedding yaws serving both static and dynamic content
(completely unrelated to couch's requests.) You could then also use
hovercraft for couch io -- I imagine this would eliminate yaws + couch's
mochiweb sharing the socket driver thereby giving a better picture of how
NIF io affects the yaws side.

- Edmond -

Edmond Begumisa

Oct 28, 2010, 4:06:25 AM

to Joel Reymont, Kenneth Lundin, Paul Davis, Dave Smith, Martin Scholl, erlang-q...@erlang.org

As a devout Couch user, what I'd like to see is a pragmatic approach as
the couch team investigates this -- possibly targeting an optional
configuration with a clear description of the trade-offs to Couch users.
For example, the Couch team could advise using this new feature strictly
in a private emulator devoted to couch since the benchmarks show that NIF
io doesn't seem to have a detrimental effect on the rest of couch.

Personally, for my current uses, Couch's existing io performance is
sufficient. The use of hovercraft, the Erlang view server and some
workarounds to avoid mochiweb have been effective in giving me performance
boosts where I need them.

That said, it cannot be denied that for a database server, a 2- to 10-fold
increase in io performance is a good enough reason to do things that you
should not really be doing or that are potentially disruptive. For some
couch users, the pros might outweigh the cons. What I'd like to see is a
clear description of what those cons are so I can decide for myself.

- Edmond -
