ANN: Aleph, an asynchronous web server


Zach Tellman

Jul 7, 2010, 5:15:11 AM
to Clojure
At the Bay Area user group meeting in June, there was a very
interesting discussion about how to best use Clojure's concurrency
primitives to field large numbers of concurrent requests, especially
in a long-poll/push type application. We didn't arrive at any solid
conclusion, but it was clear to everyone that a thread-per-request
model is especially gratuitous for a language like Clojure.

With this in mind, I decided to make the thinnest possible wrapper
around Netty such that a person could play around with alternate ways
to use Clojure effectively. The result can be found at
http://github.com/ztellman/aleph.

I've just discovered another Netty wrapper was released this weekend
(http://github.com/datskos/ring-netty-adapter), but it's somewhat
different in its design and intent; it couples the request and
response to allow for seamless interop with Ring.

Anyways, I hope some people find this interesting. Clojure doesn't
seem to have found its own voice w.r.t. web development; hopefully we
can work together to fix that.

David Nolen

Jul 7, 2010, 8:11:07 AM
to clo...@googlegroups.com
It's great to see Clojure weighing in on the evented web server scene. As far as feedback goes, it would be nice to see at least one example where aleph allows you to use the Clojure concurrency primitives in ways that are not possible with Ring and the Jetty adapter. Otherwise I'm a bit lost as to how to start playing around :)

David 

ngocdaothanh

Jul 7, 2010, 5:57:04 AM
to Clojure
> [org.jboss.netty/netty "3.2.0.BETA1"]

Netty 3.2.1.Final has been released.

I think the ! mark in "respond!" is kind of misleading. Why not change
it to "arespond"?

Zach Tellman

Jul 7, 2010, 10:11:06 AM
to Clojure
For some reason I couldn't get 3.2.1.Final to come in via maven. I
didn't want to spend too much time futzing with it, but I'll take
another look at it.

I don't understand your point about the bang. The respond! function
certainly has side effects, and shouldn't be used in a transaction.
What qualities is it missing that make the bang misleading?

David Nolen

Jul 7, 2010, 12:12:28 PM
to clo...@googlegroups.com
On Wed, Jul 7, 2010 at 5:15 AM, Zach Tellman <ztel...@gmail.com> wrote:
With this in mind, I decided to make the thinnest possible wrapper
around Netty such that a person could play around with alternate ways
to use Clojure effectively.  The result can be found at
http://github.com/ztellman/aleph.

I played around with this some. Throughput is of course ridiculous (8+ K req/s on my machine). One thing is that this approach encourages using Clojure concurrency primitives over participating in the Netty NIO design. Is that the intent?

David

ngocdaothanh

Jul 7, 2010, 11:28:10 AM
to Clojure
> For some reason I couldn't get 3.2.1.Final to come in via maven.

I think you need to add this to project.clj:
:repositories [["jboss" "http://repository.jboss.org/nexus/content/groups/public/"]]

> What qualities is it missing that make the bang misleading?

I think ! means something dangerous is about to happen.

Zach Tellman

Jul 7, 2010, 1:43:46 PM
to Clojure
On Jul 7, 9:12 am, David Nolen <dnolen.li...@gmail.com> wrote:
Developers are still required to "participate" in the NIO design, in
that blocking calls in the request handler need to be avoided to reap
the full benefits. Netty provides a lot of nice abstractions over
NIO, but kind of punts on how to effectively manage the concurrency it
requires. Clojure's concurrency primitives don't really have a
counterpart in Netty, so I don't see why they shouldn't be used.

If you really want access to Netty, though, (:channel request) will
return an org.jboss.netty.channel.Channel object, which will allow you
to do pretty much anything you want.
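For illustration, a minimal sketch of what that lookup looks like. The request map here is a stub; in a running aleph server the value under :channel would be a real org.jboss.netty.channel.Channel:

```clojure
;; Sketch: aleph exposes the raw Netty channel as an ordinary entry in
;; the request map, so plain keyword lookup retrieves it. The map
;; passed below is a hypothetical stub standing in for a real request.
(defn raw-channel-handler [request]
  (let [ch (:channel request)]
    ;; with a real Channel you could e.g. (.write ch buf) or (.close ch)
    ch))

(raw-channel-handler {:channel :stub-channel})
;; => :stub-channel
```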

Zach Tellman

Jul 7, 2010, 1:46:15 PM
to Clojure
On Jul 7, 8:28 am, ngocdaothanh <ngocdaoth...@gmail.com> wrote:
> > For some reason I couldn't get 3.2.1.Final to come in via maven.
>
> I think you need to add this to project.clj:
> :repositories [["jboss" "http://repository.jboss.org/nexus/content/groups/public/"]]
>

Thanks, I'll give that a try.

> > What qualities is it missing that make the bang misleading?
>
> I think ! means something dangerous is about to happen.

If by "dangerous" you mean side-effects, then respond! qualifies.
It's neither pure nor idempotent, and it's important that people
realize that fact.

David Nolen

Jul 7, 2010, 2:04:21 PM
to clo...@googlegroups.com
On Wed, Jul 7, 2010 at 1:43 PM, Zach Tellman <ztel...@gmail.com> wrote:
Developers are still required to "participate" in the NIO design, in
that blocking calls in the request handler need to be avoided to reap
the full benefits.  Netty provides a lot of nice abstractions over
NIO, but kind of punts on how to effectively manage the concurrency it
requires.  Clojure's concurrency primitives don't really have a
counterpart in Netty, so I don't see why they shouldn't be used.

So something like this:

(defn hello-world [request]
  (future
   (Thread/sleep 1)
   (respond! request
             {:status 200
              :headers {"Content-Type" "text/html"}
              :body "Hello world!"})))

Is non-blocking and perfectly fine?
 
If you really want access to Netty, though, (:channel request) will
return an org.jboss.netty.channel.Channel object, which will allow you
to do pretty much anything you want.

Great!

James Reeves

Jul 7, 2010, 2:10:16 PM
to clo...@googlegroups.com
On 7 July 2010 19:04, David Nolen <dnolen...@gmail.com> wrote:
> So something like this:
> (defn hello-world [request]
>   (future
>    (Thread/sleep 1)
>    (respond! request
>              {:status 200
>               :headers {"Content-Type" "text/html"}
>               :body "Hello world!"})))
> Is non-blocking and perfectly fine?

Actually that rather defeats the point of a non-blocking server.
You're still using up a thread, and hence haven't really gained
anything over:

(defn hello-world [request]
  (Thread/sleep 1)
  {:status 200
   :headers {"Content-Type" "text/html"}
   :body "Hello world!"})

The main advantage of a non-blocking server is that you don't use up
a thread waiting for an event (such as the user sending data, or
some other external trigger).
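To make that concrete, here is one sketch of the event-driven shape: the handler parks the *request* instead of a thread, and something else responds later. The helper names here are illustrative, not aleph's actual API:

```clojure
;; Sketch (hypothetical names, not aleph's API): the handler stores the
;; request and returns immediately, so no thread sleeps on behalf of
;; the client while the connection stays open.
(def waiting-requests (atom []))

(defn long-poll-handler [request]
  ;; park the request; returns at once, no thread is tied up waiting
  (swap! waiting-requests conj request))

(defn on-event [body respond-fn]
  ;; called later by whatever produces the event (a message, a timer...)
  ;; respond-fn stands in for something like aleph's respond!
  (doseq [req @waiting-requests]
    (respond-fn req {:status 200
                     :headers {"Content-Type" "text/html"}
                     :body body}))
  (reset! waiting-requests []))
```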

- James

David Nolen

Jul 7, 2010, 2:17:26 PM
to clo...@googlegroups.com
But I guess I'm trying to figure out what the most idiomatic way to pipeline in this situation would be (one thing I don't like about Node.js is that it encourages working with a mess of callbacks).

For example, what would be the most idiomatic way to hit a database (a blocking operation), process that data, and return the result with respond! ?

David

Zach Tellman

Jul 7, 2010, 2:53:07 PM
to Clojure


On Jul 7, 11:17 am, David Nolen <dnolen.li...@gmail.com> wrote:
> On Wed, Jul 7, 2010 at 2:10 PM, James Reeves <jree...@weavejester.com>wrote:
I'd say the most idiomatic way to hit a database is to use one that
has a non-blocking interface (Postgres is one example). Barring that,
I'd say that the future approach is slightly better than the
thread-per-request model because it uses a thread pool, but otherwise
the chain is only going to be as strong as its weakest link.
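A sketch of that future-based shape. fetch-user and this respond! are hypothetical stand-ins for a real DB client and aleph's respond!; only the pipelining structure is the point:

```clojure
;; Sketch: offload a blocking query onto Clojure's future thread pool
;; so the IO event loop is never blocked. Both fetch-user and respond!
;; below are stubs for illustration, not real library functions.
(defn fetch-user [id]
  (Thread/sleep 10)                ; simulate a blocking DB round trip
  {:id id :name "demo"})

(def sent (promise))
(defn respond! [request response] (deliver sent response))

(defn user-handler [request]
  ;; returns immediately; the blocking work runs on a pooled thread
  (future
    (respond! request
              {:status 200
               :headers {"Content-Type" "text/html"}
               :body (pr-str (fetch-user (:id request)))})))

(user-handler {:id 42})
@sent   ; the caller blocks only here, until the response is delivered
```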

Pedro Henriques dos Santos Teixeira

Jul 7, 2010, 2:14:58 PM
to clo...@googlegroups.com


Actually, a huge benefit of a non-blocking HTTP server is that it
won't create a thread per request. But I don't see any problem with
user code spawning threads to handle work for one particular request.

In Clojure, I think it'll be hard to go NIO all the way (like in node.js).

David Nolen

Jul 7, 2010, 10:04:32 PM
to clo...@googlegroups.com
On Wed, Jul 7, 2010 at 2:10 PM, James Reeves <jre...@weavejester.com> wrote:
The main advantage of a non-blocking server is that you don't use
up a thread waiting for an event (such as the user sending data, or
some other external trigger).

- James

I think the main advantage of a non-blocking server is throughput, or at least that's what I'm seeing. I haven't been able to get Jetty to serve faster than 100 rq/s, while aleph (via Netty) is easily getting 600-700 rq/s even if I'm writing to a database (CouchDB).

I don't really care if threads do or don't get eaten up. In fact, in the "Hello world" microbenchmark Node.js gets trounced by aleph because aleph can take advantage of all cores.

I also note that using future gives me more throughput if I'm using (Thread/sleep 1).

In summary the raw request throughput of Netty NIO + the sanity of Clojure's concurrency primitives (atom, agent, ref, future, promise) might just be a real sweet spot.

David

Greg

Jul 7, 2010, 10:09:41 PM
to clo...@googlegroups.com
"well I think the main advantage" is memory. :-)

Theoretically (I think), thread-per-connection servers can be very close to matching asynchronous servers in throughput, but they definitely require much more RAM to do so. RAM is one of the more expensive commodities to come by on VPS and cloud servers.

Of course, I'm being facetious, I think we can agree that there are lots of advantages to evented servers, and it's awesome they're getting better support in Clojure!

- Greg

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

gary b

Jul 7, 2010, 11:47:29 PM
to Clojure
re: memory use

If the number of concurrent requests is small, then the memory used by
thread per request is usually not an issue.

When implementing long polling, the number of concurrent requests can
be very large. Sharing threads between requests in a long polling
server can result in significant memory savings.

re: advantages of evented servers

There are also disadvantages to an evented server. Writing logic in a
series of callbacks is more difficult to understand and debug than the
sequential code used in a thread per request model.

re: throughput of Netty/Aleph vs. Jetty

The difference in throughput between Aleph/Netty and Jetty might not
be a result of the different threading models. It might be that Jetty
has more goop in it than Netty (I don't know if this is true or not).
This blog post presents data showing that threading is faster than
NIO: http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-is-not-faster-than.html

Miki

Jul 8, 2010, 12:55:34 PM
to Clojure
Hello David,

> while aleph (via Netty) is easily getting 600-700 rq/s
> even if I'm writing to a database (CouchDB).
Can you share the code for this?

All the best,
--
Miki

Wilson MacGyver

Jul 8, 2010, 3:02:53 PM
to clo...@googlegroups.com
for the hello world test, you are using the helloworld from
front page of node.js at http://nodejs.org/
right?

how did you setup the clojure one?

was it what you posted before?

(defn hello-world [request]
  (future
   (Thread/sleep 1)
   (respond! request
             {:status 200
              :headers {"Content-Type" "text/html"}
              :body "Hello world!"})))

On Wed, Jul 7, 2010 at 10:04 PM, David Nolen <dnolen...@gmail.com> wrote:
> I don't really care if threads do or don't get eaten up. In fact, in the
> "Hello world" microbenchmark Node.js gets trounced by aleph because aleph
> can take advantage of all cores.

--
Omnem crede diem tibi diluxisse supremum.

Dan Kersten

Jul 8, 2010, 3:02:00 PM
to Clojure
There are more reasons to want to avoid using threads than memory.
Besides the obvious cost of creating and destroying threads (which is
reduced or removed by using thread pools), you also have the cost of
time slicing once you have more software threads than hardware
threads: there is the obvious cost of context switching, but also less
obvious costs such as bad processor cache usage (cache cooling, false
sharing, etc.) and lock preemption (if the threads make use of shared
resources).
Under heavy load, this can be quite costly, especially if each request
requires non-trivial processing (i.e., enough to make time slicing kick
in).

So, between memory overheads, the cost of creating and destroying
threads, and context switching, using a synchronous model can be
extremely heavyweight compared to an asynchronous model. It's no
surprise that people are seeing much better throughput with
asynchronous servers.
> NIO:http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-is-not-fa...

David Nolen

Jul 8, 2010, 3:09:29 PM
to clo...@googlegroups.com
On Thu, Jul 8, 2010 at 3:02 PM, Wilson MacGyver <wmac...@gmail.com> wrote:
for the hello world test, you are using the helloworld from
front page of node.js at http://nodejs.org/
right?

how did you setup the clojure one?

was it what you posted before?

I wrote a blog post about it here with the code I used:

David Nolen

Jul 8, 2010, 3:11:24 PM
to clo...@googlegroups.com
I don't have a standalone example at the moment. I might try to put one together. In the meantime it's pretty trivial to take my hello world aleph code and call into couchdb using one of the popular clojure http clients: clojure-http-client, clj-apache-http. Both can be found on GitHub.

David

Wilson MacGyver

Jul 8, 2010, 3:19:09 PM
to clo...@googlegroups.com
thank you!


Antoni Batchelli

Jul 8, 2010, 5:26:21 PM
to clo...@googlegroups.com
On Jul 7, 2010, at 8:47 PM, gary b <gary...@gmail.com> wrote:

> This blog post presents data showing that threading is faster than
> NIO: http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-is-not-faster-than.html
>

I would not consider this article to be the definitive answer to the
question of NIO vs. threads. My experience with high-throughput Java
servers is NOT what this guy describes. You can push NIO very far if
you want to, although it is hard. The advantage with NIO is that your
code doesn't have to go through the many abstraction layers that make
things very easy for the developer but quickly get in your way if you
want raw performance.

Also, in some instances with NIO you can even work directly with
kernel buffers, and so the network data doesn't even need to be copied
from the kernel space into the user space. That takes time if you are
managing a lot of network traffic.
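For reference, the JDK-level mechanism behind that is FileChannel.transferTo, which on supporting platforms lets the kernel move file bytes to a channel (sendfile) without a round trip through user-space buffers. A minimal sketch:

```clojure
(import '(java.io RandomAccessFile))

;; Sketch: transferTo hands bytes from a file directly to a
;; WritableByteChannel; where the OS supports it, the copy stays in
;; kernel space instead of bouncing through user-space buffers.
(defn send-file [^java.io.File file ^java.nio.channels.WritableByteChannel out]
  (with-open [raf (RandomAccessFile. file "r")]
    (let [fc (.getChannel raf)]
      (.transferTo fc 0 (.size fc) out))))
```

A Netty socket channel is a WritableByteChannel too, which is why evented servers can exploit this for static content.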

Finally, as it has already been discussed, threads use memory, lots of
it. If the number of threads is not bound, a traffic spike will make
your memory requirements skyrocket, either exhausting the memory in
your JVM or prompting the OS to start paging on its VM. In the second
case, once your server is hitting Virtual Memory all those threads
will cause page misses left and right, and you'll watch your server
grind to a halt, since it will not be returning responses but still
receiving requests and thus creating even more new threads, happily
marching into a death spiral.

Even if that article is right, fast != scalable, or high throughput, or bounded.

Yes, I have issues with that article as I have seen it quoted one too
many times ;)

Peter Schuller

Jul 8, 2010, 6:41:29 PM
to clo...@googlegroups.com
> Under heavy load, this can be quite costly, especially if each request
> requires non-trivial processing (ie, enough to make time-slicing kick
> in).

This doesn't really jibe with reality as far as I can tell; if
anything it is the exact opposite of reality. If you're doing
significant work in between I/O calls (which tend to be context
switching points), even to the point of usually yielding only to
pre-emptive switching resulting from exceeding your time slice, the
relative overhead of threading should be much less (usually) than if
you're just doing a huge number of very small requests.

Whatever the extra cost is of a thread context switch compared to an
application context switch (and make no mistake, it's effectively
still a context switch; just because you're not switching threads
doesn't mean that different requests will not need to e.g. touch
different cache lines, etc.), that becomes more relevant as the amount
of work done after each switch decreases.

The cost of time slicing while holding a lock is real, but if you have
a code path with a high rate of lock acquisition in some kind of
performance-critical situation, presumably you're holding locks for
very short periods of time and the likelihood of switching away at
exactly the wrong moment is not very high.

Also: Remember that syscalls are most definitely not cheap, and an
asynchronous model doesn't save you from doing syscalls for the I/O.

> So, between memory overheads, cost of creating and destroying threads
> and context switching, using a synchronous model can be extremely
> heavyweight compared to an asynchronous model. Its no surprise that
> people are seeing much better throughput with asynchronous servers.

In my experience threading works quite well for many production tasks,
though not all (until we get better "vertical" (all the way from the
language to the bare metal) support for cheaper threads). The
maintenance and development costs associated with writing complex
software in callback form, with all state explicitly managed,
disabling any use of sensible control flow, exceptions, etc., are very
easy to under-estimate in my opinion. It also makes whether or not a
call *might* do I/O part of the public interface of every single call
you ever make, which is one particular aspect I really dislike beyond
the callback orientation itself.

You also need to consider latency. While some flawed benchmarks where
people throw some fixed concurrency at a problem will show that
latency is poor with a threaded model in comparison to an async
model, under an actual reasonable load where the rate of incoming
requests is not infinitely high, the fact that you're doing
pre-emption and scheduling across multiple CPUs will mean that
individual expensive requests don't cause multiple other smaller
requests to have to wait for them to complete their bit of work. So
again, for CPU-heavy tasks, this is another way in which a threaded
model can be better unless you very carefully control the amount of
work done in each reactor loop (presuming the reactor pattern) in the
asynchronous case.

As far as I can tell, the advantages of an asynchronous model mostly
come in cases where you either (1) have very high concurrency or (2)
are doing very little work for each unit of I/O done, such that the
cost of context switching is at its most significant.

My dream is to be able to utilize something like Clojure (or
anything other than callback/state-machine-based models) on top of an
implementation where the underlying concurrency abstraction is in fact
really efficient (in terms of stack sizes and in terms of switching
overhead). In other words, the day when having a few hundred thousand
concurrent connections does *not* imply that you must write your
entire application to be event based is when I am extremely happy ;)

--
/ Peter Schuller

Raoul Duke

Jul 8, 2010, 6:48:16 PM
to clo...@googlegroups.com
can't we all just get along?

http://lambda-the-ultimate.org/node/1435

Greg

Jul 8, 2010, 5:36:42 PM
to clo...@googlegroups.com
Great response Antoni.

A fundamental understanding of the difference between threads and kqueue/epoll (which power NIO) should clear up anyone's misgivings about evented servers. They are clearly more scalable, it is no contest.

- Greg

Raoul Duke

Jul 8, 2010, 6:54:59 PM
to clo...@googlegroups.com
On Thu, Jul 8, 2010 at 2:36 PM, Greg <gr...@kinostudios.com> wrote:
> A fundamental understanding of the difference between threads and kqueue/epoll (which power NIO) should clear up anyone's misgivings about evented servers. They are clearly more scalable, it is no contest.

oh Erlang, where art thou?

Greg

Jul 8, 2010, 6:53:54 PM
to clo...@googlegroups.com
Interesting link!

Unfortunately the link to the PDF was broken, here's one that works:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.7987&rep=rep1&type=pdf

On Jul 8, 2010, at 6:48 PM, Raoul Duke wrote:

> can't we all just get along?
>
> http://lambda-the-ultimate.org/node/1435
>

Greg

Jul 8, 2010, 7:03:39 PM
to clo...@googlegroups.com
I hope it didn't sound like I was saying threads are *always* bad, as I definitely don't think that. :-p

Your link to the epoll + threads document is probably the best way to go (that I'm aware of), to address any of the issues that Raoul brought up w.r.t. long operations in between the events themselves, but doesn't Netty do that already with a thread-pool?

- Greg

Greg

Jul 8, 2010, 7:04:34 PM
to clo...@googlegroups.com
Whoops. s/Raoul/Peter Schuller/.

Anders Rune Jensen

Jul 9, 2010, 8:44:45 AM
to clo...@googlegroups.com
On Wed, Jul 7, 2010 at 11:15 AM, Zach Tellman <ztel...@gmail.com> wrote:
> At the Bay Area user group meeting in June, there was a very
> interesting discussion about how to best use Clojure's concurrency
> primitives to field large numbers of concurrent requests, especially
> in a long-poll/push type application.  We didn't arrive at any solid
> conclusion, but it was clear to everyone that a thread-per-request
> model is especially gratuitous for a language like Clojure.

>
> With this in mind, I decided to make the thinnest possible wrapper
> around Netty such that a person could play around with alternate ways
> to use Clojure effectively.  The result can be found at
> http://github.com/ztellman/aleph.

Very interesting!

I've been following the thread with great interest and did a quick
performance test today comparing standard compojure with jetty against
aleph and netty. I get around 4500 req/s with compojure and 3500 req/s
with aleph. The test was as simple as possible, just return hello
world.

> I've just discovered another Netty wrapper was released this weekend
> (http://github.com/datskos/ring-netty-adapter), but it's somewhat
> different in its design and intent; it couples the request and
> response to allow for seamless interop with Ring.
>
> Anyways, I hope some people find this interesting.  Clojure doesn't
> seem to have found its own voice w.r.t. web development; hopefully we
> can work together to fix that.

--
Anders Rune Jensen

David Nolen

Jul 9, 2010, 9:09:32 AM
to clo...@googlegroups.com
On Fri, Jul 9, 2010 at 8:44 AM, Anders Rune Jensen <anders.ru...@gmail.com> wrote:
Very interesting!

I've been following the thread with great interest and did a quick
performance test today comparing standard compojure with jetty against
aleph and netty. I get around 4500 req/s with compojure and 3500 req/s
with aleph. The test was as simple as possible, just return hello
world.

I'm curious how you ran that test. With ab running 10 clients for 1 second I see ~4000-5000 req/s using Compojure 0.4.0. With aleph I see ~8000-9000 req/s. I also had a quick chat with Zach Tellman and it sounds like he hasn't done much in the way of optimizing (few Java type hints), so we'll likely see the aleph numbers go up.

David  

gary b

Jul 8, 2010, 8:32:00 PM
to Clojure
On Jul 8, 2:26 pm, Antoni Batchelli <tbatche...@gmail.com> wrote:
> Also, in some instances with NIO you can even work directly
> with kernel buffers, and so the network data doesn't even need
> to be copied from the kernel space into the user space.

I assume that you are referring to NIO direct byte buffers. A
threaded application can use direct byte buffers.

I apologize for my sloppy terminology. When I wrote NIO, I was
referring to the evented or async programming model. I didn't mean to
imply that the threaded model cannot use NIO.

> If the number of threads is not bound, a traffic spike will make
> your memory requirements skyrocket, either exhausting the memory in
> your JVM or prompting the OS to start paging on its VM.

Yeah, a poorly constructed server can fall over with high load. The
servers that I have worked with use bounded thread pools for this and
other reasons.
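One way to build such a bounded pool from Clojure is straight java.util.concurrent; this is only a sketch, and the sizes are arbitrary examples:

```clojure
(import '(java.util.concurrent ThreadPoolExecutor TimeUnit
                               ArrayBlockingQueue
                               ThreadPoolExecutor$CallerRunsPolicy))

;; Sketch: a pool with bounded threads AND a bounded work queue, so a
;; traffic spike applies back-pressure (CallerRunsPolicy makes the
;; submitting thread run the task itself) instead of letting memory
;; grow without limit.
(def bounded-pool
  (ThreadPoolExecutor. 4 16                        ; core / max threads
                       60 TimeUnit/SECONDS         ; idle keep-alive
                       (ArrayBlockingQueue. 1000)  ; bounded work queue
                       (ThreadPoolExecutor$CallerRunsPolicy.)))

(defn handle-async [task]
  ;; returns a java.util.concurrent.Future for the submitted work
  (.submit bounded-pool ^Runnable task))
```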

Daniel Kersten

Jul 8, 2010, 11:28:11 PM
to clo...@googlegroups.com
Maybe what I said makes less sense in the case of NIO vs blocking with threads - I've mainly been working with Intel Threading Building Blocks lately, where the cost of cache cooling is very real. For that reason (and the others mentioned - context switches and lock preemption), Intel Threading Building Blocks will try and run a single thread per core and load balance work amongst these.

I wouldn't say it's just one thing (e.g. context switching), but a combination of thread memory overheads, thread creation and destruction, context switching, cache cooling, false sharing and lock preemption.
I also don't know how valid this is for Java/Clojure or web servers, though I don't see why it wouldn't be just as valid as for any other multicore code.
Finally, I don't think you can really achieve true scalability by not using a multicore processor's cores (by using a purely asynchronous server); you would want to use the available cores. I'm just saying that having more threads than cores (or rather, more software threads than hardware threads) may hurt performance or scalability due to time-slicing overheads. Obviously it's more complicated than simply creating N worker threads for an N-core system, though, since if any blocking IO is performed the cores are under-utilized.

"just because you're not switching threads doesn't mean that different requests will not need to e.g. touch different cache lines"
Yes, of course! I didn't mean to imply that an asynchronous server would save you from this.

However, in an asynchronous server (or, more importantly, in one where the number of threads does not exceed the number of hardware threads) it becomes much more likely that a request is processed to completion before it gets evicted from the cache (as long as care is taken to prevent false sharing with other, independent data which share the cache lines).

As for locks, making sure to hold locks for as short a time as possible is a well-known pattern, so I agree, the likelihood of switching away at the wrong time (and having another thread then try to acquire that same lock) is very low, but it can and does still happen on occasion, and when it does, it can really hit performance. (Of course, the performance hit for a web application might not even be noticeable; otherwise there wouldn't be so many web apps written in PHP, Ruby and Python!)

Anyway, web servers aren't my area of expertise, so please ignore me if this isn't at all relevant to the discussion. Still, I am very interested to hear your and everyone else's real-world experiences.


As an aside, callbacks aren't really all that cache friendly unless great care is taken. OO isn't the greatest model for cache-friendly multicore code either. Maybe that's one reason I like Clojure's sequence abstraction as much as I do.





--
Daniel Kersten.

Anders Rune Jensen

Jul 9, 2010, 9:44:06 AM
to clo...@googlegroups.com

Yeah, I was positive that the numbers were quite good for aleph
considering it's such a young project. But I was expecting Netty to
beat Jetty, so I was a little disappointed :)

I just ran the test as simply as possible: java -server (no other
parameters set), default kernel settings (Ubuntu), and then ab -n
5000 -c 50 (as in your blog post). As always with Java, one needs to
run ab a few times before the numbers stabilize :)

The test machine was an old Intel Core 2 Duo 2 GHz.

> David

--
Anders Rune Jensen

James Reeves

Jul 9, 2010, 10:22:40 AM
to clo...@googlegroups.com
On 9 July 2010 14:09, David Nolen <dnolen...@gmail.com> wrote:
> I'm curious how you ran that test. With ab running 10 clients for 1 second I
> see ~4000-5000 req/s using Compojure 0.4.0. With aleph I see ~8000-9000
> req/s. I also had a quick chat with Zach Tellman and it sounds like he
> hasn't done much in the way of optimizing (few Java type hints), so we'll
> likely see the aleph numbers go up.

Benchmarking Aleph against Ring Jetty directly is likely to produce
more accurate results. Compojure adds middleware and routing logic, so
it's not really a fair test.

That said, I expect Aleph to outperform the Jetty adapter :)

- James

David Nolen

Jul 9, 2010, 10:34:02 AM
to clo...@googlegroups.com
ab is a little weird. Try running your tests again and you'll probably see the results I'm seeing.
 


Peter Schuller

Jul 9, 2010, 10:59:09 AM
to clo...@googlegroups.com
> Maybe what I said makes less sense in the case of NIO vs blocking with
> threads - I've mainly been working with Intel Threading Building Blocks
> lately, where the cost of cache cooling is very real. For that reason (and
> the others mentioned - context switches and lock preemption), Intel
> Threading Building Blocks will try and run a single thread per core and load
> balance work amongst these.

I haven't used Building Blocks, but I certainly agree that running
exactly as many threads as cores is probably optimal under most
conditions (assuming cache contention doesn't interact in such a way
as to make it worse; e.g. you might see two threads going faster than
four and such under extreme conditions).

> would want to use the available cores. I'm just saying that having more
> threads than cores (or rather, more software threads than hardware threads)
> may hurt performance or scalability due to time slicing overheads. Obviously
> its more complicated than simply creating N worker threads for an N-core
> system though, since if any blocking IO is performed the cores are
> under-utilized.

Agreed.

> However, in an asynchronous server, (or, more importantly, in one where the
> number of threads do not exceed the number of hardware threads) it becomes
> much more likely that a request is processed to completion before it gets
> evicted from the cache (as long as care is taken to prevent false sharing
> with other, independent data which share the cache lines).

Agreed, but with the specific caveat that this is specifically under
circumstances where you are in fact trading latency for throughput. In
other words, this is true, but in any specific case where the async
design allowed you to complete where you would otherwise have context
switched, you are intrinsically violating your would-be timeslice,
with latency effects on the other requests waiting for your one
long/expensive request.

> isn't at all relevant to the discussion. Still, I am very interested to hear
> yours and everyone elses real world experiences.

I come from the perspective of first having written quite a lot of
multi-threaded C++ code (over a few years) that did fairly complex
combinations of "CPU work" and I/O with other services. I am really
confident that the code I/we wrote would never have been completed in
even close to the same amount of time/resources if we had written
everything event-based. I cannot overstate this point enough...

During the last year I've been exposed to quite a lot of reactive code
(C++, Python Twisted, some others), with the expected (IMO pretty
extreme) consequences for code maintainability and productivity (even
for people who have been writing such code for a long time and are
clearly used to it).

So, as a default position, I have a strong desire to avoid going
event based if possible.

In terms of scalability, that definitely mattered when I worked on the
mentioned multi-threaded code. It directly translated to hardware
costs in terms of what you had to buy because we had effectively an
infinite amount of work to be done in some areas (such as crawling the
web; you don't really run out of things to do because you can always
do things more often, better or faster). However, that experience is
at best anecdotal since no formal studies were done on multi-core
scalability; rather doubling cores meant it went "almost twice as
fast" - purely anecdotal, based on empirical observations during
development cycles.

On this topic I found it interesting reading about Google's concerns
with and improvements to the Linux kernel to support their use. I
couldn't find the article right now (I'm pretty sure it was on lwn),
but it strongly implied that Google definitely used production systems
with very many threads. I found that interesting since given Google's
scale, presumably runtime efficiency may be very highly valued
compared to extra development cost to get there. My hypothesis,
probably colored by confirmation bias, is that the difference in
effort in writing large complex systems in an event-based fashion is
simply too expensive to be worth it even at Google's scale - at least
in the general case. Their release of Go was unsurprising to me for
this reason :)

Has anyone here got experience with writing really complex systems
(big code bases, services talking to lots of other services, doing
non-trivial control flow etc) in event-based form? Any comments on how
it scales, in terms of development costs, as the size and complexity
of the system grows?

--
/ Peter Schuller

Anders Rune Jensen

Jul 9, 2010, 7:04:48 PM
to clo...@googlegroups.com
On Wed, Jul 7, 2010 at 11:15 AM, Zach Tellman <ztel...@gmail.com> wrote:
> At the Bay Area user group meeting in June, there was a very
> interesting discussion about how to best use Clojure's concurrency
> primitives to field large numbers of concurrent requests, especially
> in a long-poll/push type application.  We didn't arrive at any solid
> conclusion, but it was clear to everyone that a thread-per-request
> model is especially gratuitous for a language like Clojure.
>
> With this in mind, I decided to make the thinnest possible wrapper
> around Netty such that a person could play around with alternate ways
> to use Clojure effectively.  The result can be found at
> http://github.com/ztellman/aleph.
>
> I've just discovered another Netty wrapper was released this weekend
> (http://github.com/datskos/ring-netty-adapter), but it's somewhat
> different in its design and intent; it couples the request and
> response to allow for seamless interop with Ring.
>
> Anyways, I hope some people find this interesting.  Clojure doesn't
> seem to have found its own voice w.r.t. web development; hopefully we
> can work together to fix that.

Is it possible to get an exception or something when a client
disconnects, to avoid using up resources needlessly?

--
Anders Rune Jensen
