Linux epoll is now supported

Salvatore Sanfilippo

Nov 24, 2009, 4:29:13 AM
to Redis DB
Hello all,

Before 1.1 it was very important to add support for epoll in order to
make Redis more scalable when there are many clients connected. In
theory the way to go is to use libevent, but I was not happy with
this because Redis uses a simple event loop and I have full control
over it. Libevent *alone* is three times the codebase of Redis!

Also, big libs are not always sane at every level. Just as an example,
stable releases of libevent do nonsensical reallocations at runtime
in the arrays of events (they fixed it in 2.0, which is still not
considered stable). So I modified ae.c to make it modular, and to
avoid every kind of O(N) business in the core at the cost of more
allocated memory (not a problem in Redis, where it will be 0.001%
;) but probably a problem on embedded systems, and libevent runs on
Android too, for instance).
Now adding and removing an event is O(1), which is going to be very
important with 10k clients. For now we use just a single timer in
Redis, but if more are needed in the future I can hack the timers in
ae.c to create an O(log(N)) implementation using skip lists.
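
To illustrate the O(1) claim, here is a minimal sketch that assumes a
fixed-size table indexed directly by file descriptor, which is one way
to trade memory for constant-time add and remove. The names
(event_table, table_add, table_del, MAX_FDS, EV_*) are hypothetical and
are not taken from ae.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FDS 1024        /* hypothetical limit: memory traded for speed */
#define EV_NONE 0
#define EV_READABLE 1
#define EV_WRITABLE 2

typedef void event_proc(int fd, void *data);

typedef struct {
    int mask;               /* EV_NONE means the slot is unused */
    event_proc *proc;       /* callback to run when the fd is ready */
    void *data;
} file_event;

typedef struct {
    file_event events[MAX_FDS];   /* slot i describes file descriptor i */
} event_table;

void table_init(event_table *t) {
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* O(1): the fd is the array index, so there is no search and no realloc. */
int table_add(event_table *t, int fd, int mask, event_proc *proc, void *data) {
    assert(t != NULL);
    if (fd < 0 || fd >= MAX_FDS) return -1;
    file_event *fe = &t->events[fd];
    fe->mask |= mask;
    fe->proc = proc;
    fe->data = data;
    return 0;
}

/* O(1): clearing bits in the slot is all that is needed. */
void table_del(event_table *t, int fd, int mask) {
    assert(t != NULL);
    if (fd < 0 || fd >= MAX_FDS) return;
    t->events[fd].mask &= ~mask;
}
```

The cost is that the table is sized for the largest possible descriptor
even when only a few clients are connected.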

Currently we have just two modules, ae_select.c and ae_epoll.c

Given that writing a new module is very little work I'll probably
support ae_kevent.c, not sure if before or after 1.1.

If somebody with a good Linux box can try this and check whether the
performance is what we expect, that would be cool.

Note that redis-benchmark now has a new feature that helps with this:
the -I switch.

./redis-benchmark -I -c 1000 ; # will open 1000 idle connections.

This way, running another benchmark at the same time shows how well
this scales.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay

Sergey Shepelev

Nov 24, 2009, 4:56:58 AM
to redi...@googlegroups.com
On Tue, Nov 24, 2009 at 12:29 PM, Salvatore Sanfilippo
<ant...@gmail.com> wrote:
> Hello all,
>
> before 1.1 it was too important to add support for epoll in order to
> make Redis more scalable when there are many clients connected. In
> theory the way to go is to use libevent() but I was not happy with
> this because Redis is using a simple event loop, and I've full control
> over it. Libevent *alone* is three times the codebase of Redis!
>
> Also big libs are not always sane at every level. Just an example,
> stable releases of libevent do a nonsensical reallocation business at
> runtime in the arrays of events (they fixed it in 2.0 that is still
> not considered stable).

Yeah, libevent sucks.
libev, on the other hand, is a small, well-thought-out and clean
library. It doesn't provide any high-level features like HTTP, but it
does provide very good low-level features.

Give it a try. http://software.schmorp.de/pkg/libev.html

Pedro Melo

Nov 24, 2009, 5:32:42 AM
to redi...@googlegroups.com
Hi,

On 2009/11/24, at 09:56, Sergey Shepelev wrote:
> On Tue, Nov 24, 2009 at 12:29 PM, Salvatore Sanfilippo
> <ant...@gmail.com> wrote:
>> Hello all,
>>
>> before 1.1 it was too important to add support for epoll in order to
>> make Redis more scalable when there are many clients connected. In
>> theory the way to go is to use libevent() but I was not happy with
>> this because Redis is using a simple event loop, and I've full
>> control
>> over it. Libevent *alone* is three times the codebase of Redis!
>>
>> Also big libs are not always sane at every level. Just an example,
>> stable releases of libevent do a nonsensical reallocation business at
>> runtime in the arrays of events (they fixed it in 2.0 that is still
>> not considered stable).
>
> Yeah, libevent sucks.
> libev, on the opposite is small, well thought and clean library. It
> doesn't provide any high-level features like HTTP, but it does provide
> very good low level features.
>
> Give it a try. http://software.schmorp.de/pkg/libev.html

+1 on libev, you will be much happier with it.

Bye,

Summer

Nov 28, 2009, 11:19:56 AM
to Redis DB
Great job, that means redis will be the next big thing

Salvatore Sanfilippo

Nov 28, 2009, 11:31:08 AM
to redi...@googlegroups.com
On Tue, Nov 24, 2009 at 11:32 AM, Pedro Melo <me...@simplicidade.org> wrote:

> +1 on libev, you will be much happier with it.

Hello Sergey, Pedro,

Redis now supports kevent as well (thanks to Harish Malipeddi).
Please take a look at my new implementation of ae.c: see how simple it
is to add support for a new module, and how adding and removing events
is all O(1).

The low-level modules like ae_epoll.c, ae_select.c and ae_kevent.c
expose just a minimal "ideal" API, while the higher-level layer does the
work of tracking the max FD currently alive and keeping the
higher-level state.
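
The split described above could look roughly like the following sketch,
which assumes a select(2) backend. All names here (backend_state,
backend_add, loop_add, demo_pipe_ready and so on) are made up for
illustration and are not the ae.c API: the backend_* functions only
talk to the kernel interface, while the loop_* layer owns the per-fd
masks and the highest fd in use.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

#define EV_READABLE 1
#define EV_WRITABLE 2

/* Low-level module: the only code that knows about select(2). */
typedef struct { fd_set rfds, wfds; } backend_state;

static void backend_add(backend_state *b, int fd, int mask) {
    if (mask & EV_READABLE) FD_SET(fd, &b->rfds);
    if (mask & EV_WRITABLE) FD_SET(fd, &b->wfds);
}

static void backend_del(backend_state *b, int fd, int mask) {
    if (mask & EV_READABLE) FD_CLR(fd, &b->rfds);
    if (mask & EV_WRITABLE) FD_CLR(fd, &b->wfds);
}

/* Waits up to tv and stores the ready descriptors in fired[]. */
static int backend_poll(backend_state *b, int maxfd, struct timeval *tv,
                        int *fired, int cap) {
    fd_set r = b->rfds, w = b->wfds;   /* select() overwrites its arguments */
    int n = 0;
    if (select(maxfd + 1, &r, &w, NULL, tv) <= 0) return 0;
    for (int fd = 0; fd <= maxfd && n < cap; fd++)
        if (FD_ISSET(fd, &r) || FD_ISSET(fd, &w)) fired[n++] = fd;
    return n;
}

/* Higher layer: owns the per-fd masks and the highest fd in use. */
typedef struct {
    backend_state backend;
    int masks[FD_SETSIZE];
    int maxfd;
} event_loop;

void loop_init(event_loop *l) {
    assert(l != NULL);
    memset(l, 0, sizeof(*l));
    FD_ZERO(&l->backend.rfds);
    FD_ZERO(&l->backend.wfds);
    l->maxfd = -1;
}

int loop_add(event_loop *l, int fd, int mask) {
    if (fd < 0 || fd >= FD_SETSIZE) return -1;
    backend_add(&l->backend, fd, mask);
    l->masks[fd] |= mask;
    if (fd > l->maxfd) l->maxfd = fd;
    return 0;
}

void loop_del(event_loop *l, int fd, int mask) {
    if (fd < 0 || fd >= FD_SETSIZE) return;
    backend_del(&l->backend, fd, mask);
    l->masks[fd] &= ~mask;
    /* Only when the highest fd goes away do we scan down for the new one. */
    while (l->maxfd >= 0 && l->masks[l->maxfd] == 0) l->maxfd--;
}

int loop_poll(event_loop *l, int *fired, int cap) {
    struct timeval tv = {0, 0};        /* non-blocking poll for the demo */
    return backend_poll(&l->backend, l->maxfd, &tv, fired, cap);
}

/* Usage: a pipe becomes readable once a byte is written to it.
 * Returns 1 if the loop behaved as expected, 0 if not, -1 on error. */
int demo_pipe_ready(void) {
    static event_loop loop;
    int fds[2], fired[8];
    if (pipe(fds) == -1) return -1;
    loop_init(&loop);
    loop_add(&loop, fds[0], EV_READABLE);
    int before = loop_poll(&loop, fired, 8);      /* nothing written yet */
    if (write(fds[1], "x", 1) != 1) return -1;
    int after = loop_poll(&loop, fired, 8);
    int ok = (before == 0 && after == 1 && fired[0] == fds[0]);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Swapping in another backend would, under this assumption, only mean
replacing the three backend_* functions and the backend_state struct.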

I think we can be happy with this without adding other dependencies
after all. It's also possible that at some point we will want more
interesting semantics for our event loop, for instance for LOCK, if it
ever gets implemented, and for Virtual Memory (ability to "pause"
events, ability to read chunks of on-disk files in the background, and
so forth). Maybe we'll have to hack ae.c to the point that it's OK for
"our" code but not for other stuff linked to Redis.

Salvatore Sanfilippo

Nov 28, 2009, 11:32:24 AM
to redi...@googlegroups.com
On Sat, Nov 28, 2009 at 5:19 PM, Summer <youfo...@gmail.com> wrote:
> Great job, that means redis will be the next big thing

Well a lot more work is needed, but thanks!

Joan Miller

Nov 28, 2009, 12:39:44 PM
to Redis DB
+1 on libev, in addition there is a python interface that would
improve the python client.

http://code.google.com/p/pyev/
> > Give it a try. http://software.schmorp.de/pkg/libev.html

Grégoire Welraeds

Nov 28, 2009, 12:41:28 PM
to redi...@googlegroups.com
If I'm not mistaken: The nice thing is that you are adding support for kqueue/kevent for all *BSD platforms, including Mac OS X.
+1
Grégoire.

Pedro Melo

Nov 28, 2009, 1:25:57 PM
to redi...@googlegroups.com
Hi,

On 2009/11/28, at 16:31, Salvatore Sanfilippo wrote:

> On Tue, Nov 24, 2009 at 11:32 AM, Pedro Melo <me...@simplicidade.org>
> wrote:
>
>> +1 on libev, you will be much happier with it.
>
> now Redis supports kevent as well (Thanks to Harish Malipeddi) ,
> please take a look at my new implementation of ae.c and how simple is
> to add support for a new module and how it's all an O(1) business to
> add and remove events.

It's not about being easy or difficult to add those modules. It's about
correctness.

Please look at the libev changelog and search for "broken": all those
high-speed network APIs are riddled with small incompatibilities
between versions of the same OS, and even worse between OSs. I think
that Redis's primary effort is not to create another event-driven I/O
lib, so it makes more sense to me to reuse a library that already takes
care of those problems and falls back to safe, working backends.

http://cvs.schmorp.de/libev/Changes?view=markup

You mention kqueue: libev, for example, refuses to use kqueue on Mac OS
X 10.5.x but uses it on 10.6, due to bugs that 10.5 has.

Bye,

Salvatore Sanfilippo

Nov 28, 2009, 2:08:44 PM
to redi...@googlegroups.com
On Sat, Nov 28, 2009 at 7:25 PM, Pedro Melo <me...@simplicidade.org> wrote:

> It's not about being easy or difficult to add those modules. It's about
> correctness.
>
> Please look at the libev changelog and search for broken: all those
> high-speed network APIs are riddled with small incompatibilities
> between versions of the same OS, even worse between OSs. I think that
> redis primary effort is not to create another event-driven-io-lib, so
> it makes more sense to me to reuse a library that already takes care
> of those problems, and falls back to safe, working, backends.

Pedro, I think that from a software engineering point of view you are
right: the probability of a Redis event loop bug is smaller when
reusing a library that's already well tested. This is a list of things
that will prevent me from doing the right thing. I don't claim they are
objectively acceptable btw, so I understand very well the disagreement
about these issues:

- Many libs that in theory are well tested actually have bugs if used
in a way different from the top-N projects using the lib. For
instance the only piece of "external" code used in Redis, the LZF
compression, has been around for *years*. After a few days of using it
I found a memory corruption bug (off by one). But everybody is using it
and it is well tested! And still it was buggy.

- I plan to use a lot of timers in the future. All these libs are using
an O(N) timer algorithm, at least this is what I see from the sources.
A balanced tree or a skiplist is needed to improve over this. When
I need it I'll develop it without waiting for external developers
to merge my changes.

- I hate ./configure. Actually all this configure magic in a thing
like an event loop library is really targeting X well-known systems.
I'm happier with a zero-configuration experience for Redis. I also
don't want to depend on external code, so the alternative would be to
include external code in the Redis tarball, and make sure to
upgrade it if there are problems.

- I need to use zfree/zmalloc everywhere.

- There is some value in writing yet another event loop library if
it's simpler than others to read. For instance some days ago I saw
somebody on Twitter suggesting Redis ae.c as good reading about a
simple event loop that actually works in the real world.
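
On the timer point in the list above, here is a minimal sketch of an
O(log N) timer queue. Note that it uses a binary min-heap rather than
the balanced tree or skiplist mentioned in the list, simply because a
heap is shorter to show. The names (timer_heap, timers_add, timers_pop,
MAX_TIMERS) are hypothetical and do not come from ae.c or from any of
the libraries discussed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TIMERS 1024     /* hypothetical fixed capacity */

typedef struct {
    long long when_ms;      /* absolute expiry time in milliseconds */
    int id;
} timer_entry;

typedef struct {
    timer_entry heap[MAX_TIMERS];   /* heap[0] is always the earliest timer */
    size_t len;
} timer_heap;

static void swap_entries(timer_entry *a, timer_entry *b) {
    timer_entry tmp = *a; *a = *b; *b = tmp;
}

void timers_init(timer_heap *h) {
    assert(h != NULL);
    h->len = 0;
}

/* O(log N): append at the end, then sift up towards the root. */
int timers_add(timer_heap *h, long long when_ms, int id) {
    if (h->len == MAX_TIMERS) return -1;
    size_t i = h->len++;
    h->heap[i].when_ms = when_ms;
    h->heap[i].id = id;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->heap[parent].when_ms <= h->heap[i].when_ms) break;
        swap_entries(&h->heap[parent], &h->heap[i]);
        i = parent;
    }
    return 0;
}

/* O(1): the next timer to fire, or NULL if there is none. */
const timer_entry *timers_peek(const timer_heap *h) {
    return h->len ? &h->heap[0] : NULL;
}

/* O(log N): remove the earliest timer, then sift the new root down. */
int timers_pop(timer_heap *h, timer_entry *out) {
    if (h->len == 0) return -1;
    *out = h->heap[0];
    h->heap[0] = h->heap[--h->len];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = 2 * i + 2, min = i;
        if (l < h->len && h->heap[l].when_ms < h->heap[min].when_ms) min = l;
        if (r < h->len && h->heap[r].when_ms < h->heap[min].when_ms) min = r;
        if (min == i) break;
        swap_entries(&h->heap[i], &h->heap[min]);
        i = min;
    }
    return 0;
}
```

An event loop would call timers_peek to compute how long it may block,
then timers_pop for every entry whose when_ms has passed.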

> http://cvs.schmorp.de/libev/Changes?view=markup
>
> You mention kqueue: for example, libev refuses to use kqueue on Mac OS
> X 10.5.x but uses it on 10.6 for example, due to bugs that 10.5 has.

This is a good point. What I can do at least is to read the changelogs
of libev and libevent to figure out what I should avoid, and add a few
more #ifdefs to exclude OSes with broken implementations, using
ae_select there instead.
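
As a rough sketch of what such #ifdefs might look like: the macro names
USE_EPOLL, USE_KQUEUE and BACKEND_NAME are invented for this example,
and the platform list is an assumption rather than a statement of what
Redis ships. The Mac OS X check relies on MAC_OS_X_VERSION_10_6, which
AvailabilityMacros.h only defines in 10.6 and later SDKs, so it tests
the build environment and not the running kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature macros; the names are illustrative only. */
#if defined(__linux__)
#define USE_EPOLL 1
#endif

#if defined(__APPLE__)
#include <AvailabilityMacros.h>
/* Only SDKs for 10.6 or later define this constant. */
#if defined(MAC_OS_X_VERSION_10_6)
#define USE_KQUEUE 1
#endif
#elif defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__)
#define USE_KQUEUE 1
#endif

/* Pick the best available backend, falling back to select(2). */
#if defined(USE_EPOLL)
#define BACKEND_NAME "epoll"
#elif defined(USE_KQUEUE)
#define BACKEND_NAME "kqueue"
#else
#define BACKEND_NAME "select"
#endif

const char *backend_name(void) {
    return BACKEND_NAME;
}

/* Returns 1 when the portable select(2) fallback was chosen. */
int backend_is_fallback(void) {
    const char *name = backend_name();
    assert(name != NULL);
    return strcmp(name, "select") == 0;
}
```

Any platform not explicitly listed ends up on the select fallback, which
is the safe default the discussion is asking for.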

So you are right from an absolute point of view, but I have subjective
reasons for keeping ae.c in Redis.

Salvatore Sanfilippo

Nov 28, 2009, 2:47:01 PM
to redi...@googlegroups.com
Hello again,

I discovered the author of libev is expecting a bit too much from
kqueue/kevent btw:

> Kqueue deserves special mention, as at the time of this writing, it
> was broken on all BSDs except NetBSD (usually it doesn't work reliably
> with anything but sockets and pipes, except on Darwin, where of course
> it's completely useless). Unlike epoll, however, whose brokenness

Apart from being broken on Darwin, which is a bad thing, it's absolutely
the norm that things like select(2), poll(2) and so forth will not work
against disk I/O. At best they'll report that the file descriptor is
ready for both reading and writing immediately.
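
This behaviour is easy to observe. The sketch below polls an empty
temporary file with a zero timeout; POSIX specifies that regular files
always poll as ready for reading and writing, so the call returns at
once even though there is nothing to read. The function name
poll_regular_file is made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <poll.h>
#include <stdio.h>

/* Polls a freshly created, empty temporary file with a zero timeout.
 * Returns the reported revents bits, or -1 on error. */
int poll_regular_file(void) {
    FILE *fp = tmpfile();           /* regular file, removed on close */
    if (fp == NULL) return -1;

    struct pollfd pfd;
    pfd.fd = fileno(fp);
    pfd.events = POLLIN | POLLOUT;
    pfd.revents = 0;

    int n = poll(&pfd, 1, 0);       /* timeout 0: never blocks */
    int revents = (n == 1) ? pfd.revents : -1;
    fclose(fp);
    return revents;
}
```

Readiness here says nothing about whether a later read(2) will have to
wait for the disk, which is why readiness-based APIs do not help with
disk I/O.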

AFAIK on Linux the way to go is AIO. I don't know about BSD kernels, but
async disk I/O is probably one of the rare places where I would much
prefer a thread-based solution. I have to pick one anyway: with virtual
memory Redis will need a way to do non-blocking disk I/O, otherwise a
client reading a key that's on disk will block even the other clients
that could continue without problems.

That said, in order to be safe, for now I'm disabling the support for
kqueue/kevent in Redis for Mac OS X < 10.6.

Thanks,
Salvatore

Pedro Melo

Nov 28, 2009, 4:24:48 PM
to redi...@googlegroups.com
Hi,

On 2009/11/28, at 19:08, Salvatore Sanfilippo wrote:
> On Sat, Nov 28, 2009 at 7:25 PM, Pedro Melo <me...@simplicidade.org>
> wrote:
>> It's not about being easy or difficult to add those modules. It's about
>> correctness.
>>
>> Please look at the libev changelog and search for broken: all
>> those
>> high-speed network APIs are riddled with small incompatibilities
>> between versions of the same OS, even worse between OSs. I think that
>> redis primary effort is not to create another event-driven-io-lib, so
>> it makes more sense to me to reuse a library that already takes care
>> of those problems, and falls back to safe, working, backends.
>
> Pedro I think that from a software engineering point of view you are
> right, the probability of a Redis event loop bug are smaller reusing a
> library that's already well tested. This is a list of things that will
> prevent me of doing the right thing, I don't claim they are
> objectively acceptable btw, so I understand very well the disagreement
> about this issues:

I really hope that I don't seem pushy about this, really :), just
pointing out the advantages. You know much better than I do what the
best path for Redis is.


> - Many libs that in theory are well tested actually have bugs if used
> in a way different that the top-N projects using this lib. For
> instance the only piece of "external" code used in Redis, the LZF
> compression, is around since *years*. After a few days of using it I
> found a memory corruption bug (off by one). But everybody is using it
> and is well tested! And still it was bugged.

Of course, no lib is without bugs.

I can only say that libev is very active and the author responds
quickly to bug reports on the mailing list.


> - I plan to use a lot of timers in the future. All this libs are using
> an O(N) timer algorithm, at least this is what I see from the sources.
> A balanced tree or a skiplist is needed to improve over this. When
> I'll need it I'll develop it without to wait for external developers
> to merge my changes.

I believe timers are O(log N), but you can see the algorithm
complexity section of the documentation:

http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#ALGORITHMIC_COMPLEXITIES

Actually, libev is one of the few libraries that *documents* the
complexity of the algorithms it uses. :)

Anyway, I'm sure it will all work out.

Bye,

Pedro Melo

Nov 28, 2009, 4:29:06 PM
to redi...@googlegroups.com
Hi,

On 2009/11/28, at 19:47, Salvatore Sanfilippo wrote:

> I discovered the author of libev is expecting a bit too much from
> kqueue/kevent btw:

Yes, the author is very strict :)


>> Kqueue deserves special mention, as at the time of this writing, it
>> was broken on all BSDs except NetBSD (usually it doesn't work
>> reliably
>> with anything but sockets and pipes, except on Darwin, where of
>> course
>> it's completely useless). Unlike epoll, however, whose brokenness
>
> Apart from being broken on Darwin, that's a bad thing, it's absolutely
> the norm that things like select(2) poll(2) and so forth will not work
> against disk I/O. At best they'll report that the file descriptor is
> ready for both reading and writing immediately.
>
> AFAIK in Linux the way to go is AIO. Don't know on BSD kernels, but
> probably async disk I/O is a rare place where I could like a lot more
> a thread based solution (and I've to pick one as with virtual memory
> Redis will need a way to do non blocking disk I/O otherwise if a
> client is going to read a key that's on disk it will block even the
> other clients that may continue without problems).

I believe this library from the same author is trying to bring a
common API to all AIO:

http://software.schmorp.de/pkg/libeio.html

But I never used it in a real work-related project.

It might be interesting for you to take ideas from it if you wish.

Bye,

Salvatore Sanfilippo

Nov 28, 2009, 5:43:35 PM
to redi...@googlegroups.com
On Sat, Nov 28, 2009 at 10:24 PM, Pedro Melo <me...@simplicidade.org> wrote:

> I really hope that I don't seem pushy about this, really :), just
> pointing out the advantages. you know much better than I what is the
> best path for Redis.

No problem Pedro, I think your arguments are *valid*, and after reading
the source for a while today I agree that libev is a good piece of
code. If we treat this as a purely software engineering issue, that
is, supposing we are going to write a component for the Space
Shuttle, no doubt the right thing to do is to pick the safest thing
that works and is well tested.

But there are other possible points of view, and my feeling is that
they are no less important, even if they are not as objective. I think
this discussion has some deep link with the most important motivations
for writing software. I think there is some value in simpler code that
does only what is needed and is clean to read. Libraries are a great
idea for accomplishing great things in a short time, but things like
libev end up resembling more every day what they wanted to avoid. After
all, there was libevent already. Was it buggy? Why not fix it, or at
least fork it? Because libevent is too complex, is a mess, and so
forth. But in the end these libraries, including libev, try to work for
everybody, ruining the initial simple design. Need several identical
events registered for the same FD? For me it's a design error; for
general libraries it's a feature, because there are people using such a
feature. And so forth.

There is also some value in not having dependencies. I have no proof,
but I *bet* part of the reason Redis is starting to get some users
is not *only* its merits as a database, but also the fact that it's so
simple to get started. It's simple to understand how it works, simple
to compile, simple to run even without a configuration. The semantics
themselves are so simple that I know at least a few people
implementing Redis clones just for fun in different languages:
Erlang, Java, Javascript, ...

If you take the road of simplicity, it should be adopted in
everything, from the protocol to the fact that there are no
dependencies, and that everybody with some C skill can open ae.c and
understand how an event loop works.

> Of course, no lib is without bugs.
>
> I can only say that libev is very active and the author responds
> quickly to bug reports on the mailing list.

Yes it's a good project indeed.

>> - I plan to use a lot of timers in the future. All this libs are using
>> an O(N) timer algorithm, at least this is what I see from the sources.
>> A balanced tree or a skiplist is needed to improve over this. When
>> I'll need it I'll develop it without to wait for external developers
>> to merge my changes.
>
> I believe timers are O(log N), but you can see the algorithm
> complexity section of the documentation:
>
> http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#ALGORITHMIC_COMPLEXITIES
>
> Actually, libev is one of the few libraries that *documents* the
> complexity of the algorithms it uses. :)

A very good point about it.

> Anyway, I'm sure it will all work out.

Well, I'm not so sure, as it's new code: I rewrote ae.c almost from
scratch. But even if there are bugs that I'll have to fix eventually, I
think it's worth the effort. Btw, I entered feature freeze with the
last commit, and I'll use the next month before releasing -rc1 to
read the source code from scratch and do a lot of testing. This will
surely help.

Thanks for sharing your ideas,

chameleon95

Nov 29, 2009, 4:38:07 AM
to Redis DB
+1 on libev..

Steven Ducat
http://vin-asia.com/

Sergey Shepelev

Nov 29, 2009, 10:17:44 AM
to redis-db
On Sat, Nov 28, 2009 at 7:31 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> On Tue, Nov 24, 2009 at 11:32 AM, Pedro Melo <me...@simplicidade.org> wrote:
>
>> +1 on libev, you will be much happier with it.
>
> Hello Sergey, Pedro,
>
> now Redis supports kevent as well (Thanks to Harish Malipeddi) ,
> please take a look at my new implementation of ae.c and how simple is
> to add support for a new module and how it's all an O(1) business to
> add and remove events.
>
> The low level modules like ae_epoll.c ae_select.c ae_kevent.c expose
> just a minimal "ideal" API, while the higher level layer does to work
> of taking care of the max FD currently alive and to take the higher
> level state.
>
> I think that we can be happy with this without to add other
> dependencies after all, and it's not excluded that at some point will
> want to have a more interesting semantic for our event loop, for
> instance for LOCK if it will ever get implemented and for Virtual
> Memory (ability to "pause" events, ability to read chunks of on-disk
> files in background and so forth). Maybe we'll have to hack ae.c
> enough that's ok with "our" code but not with other stuff linked to
> Redis.
>
> Cheers,
> Salvatore

Salvatore, I really respect the effort you put into making Redis good
at handling more concurrent connections via epoll and kqueue. I like
software without dependencies too.

The neatest thing about libev is that it is a tool, not a complete
solution for some particular task. That's why I believe you will like
it. It is extremely flexible: it can run inside another main loop, and
it can embed other main loops inside itself. It has nothing to do
with networking at all. I really believe you should take a look at libev.

Aman Gupta

Nov 29, 2009, 12:23:50 PM
to redi...@googlegroups.com
FWIW, I fully support the decision not to use libev. libev has many
features that Redis simply does not need. The epoll/kqueue APIs are
simple and stable for the Redis use case; an additional abstraction
layer is not required.

If you haven't looked at ae.c already, I strongly suggest you do so
before recommending a switch to libev.

Aman

Alec Taylor

Feb 1, 2014, 1:43:11 AM
to redi...@googlegroups.com
There is also libuv now, from the Node.js project. Might be worth investigating when you find yourself with a spare afternoon :)

Pedro Melo

Feb 1, 2014, 7:43:45 AM
to redi...@googlegroups.com
Hi,

uv is still a layer over libev on POSIX, right?

The principle still applies: those libraries are big, and Redis would only use a tiny fraction of them.

Besides, if it's working, why fix it?

