With this in mind, I decided to make the thinnest possible wrapper
around Netty such that a person could play around with alternate ways
to use Clojure effectively. The result can be found at
http://github.com/ztellman/aleph.
Developers are still required to "participate" in the NIO design, in
that blocking calls in the request handler need to be avoided to reap
the full benefits. Netty provides a lot of nice abstractions over
NIO, but kind of punts on how to effectively manage the concurrency it
requires. Clojure's concurrency primitives don't really have a
counterpart in Netty, so I don't see why they shouldn't be used.
If you really want access to Netty, though, (:channel request) will
return an org.jboss.netty.channel.Channel object, which will allow you
to do pretty much anything you want.
Actually that rather defeats the point of a non-blocking server.
You're still using up a thread, and hence haven't really gained
anything over:
(defn hello-world [request]
  (Thread/sleep 1)
  {:status 200
   :headers {"Content-Type" "text/html"}
   :body "Hello world!"})
The main advantage of a non-blocking server is that you don't use
up a thread waiting for an event (such as the user sending data, or
some other external trigger).
- James
Actually, a huge benefit of a non-blocking HTTP server is that it
won't create a thread per request. But I don't see any problem with
user code spawning threads to handle work for one particular request.
In Clojure, I think it'll be hard to go NIO all the way (as in node.js).
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
how did you setup the clojure one?
was it what you posted before?
(defn hello-world [request]
  (future
    (Thread/sleep 1)
    (respond! request
              {:status 200
               :headers {"Content-Type" "text/html"}
               :body "Hello world!"})))
On Wed, Jul 7, 2010 at 10:04 PM, David Nolen <dnolen...@gmail.com> wrote:
> I don't really care if threads do or don't get eaten up. In fact, in the
> "Hello world" microbenchmark Node.js gets trounced by aleph because aleph
> can take advantage of all cores.
--
Omnem crede diem tibi diluxisse supremum.
For the hello world test, you are using the hello world from the
front page of node.js at http://nodejs.org/, right?
--
> This blog post presents data showing that threading is faster than
> NIO: http://mailinator.blogspot.com/2008/02/kill-myth-please-nio-is-not-faster-than.html
>
I would not consider this article to be the definitive answer to the
question of NIO vs. threads. My experience with high-throughput Java
servers is NOT what this guy describes. You can push NIO very far if
you want to, although it is hard. The advantage with NIO is that your
code doesn't have to go through the many abstraction layers that make
things very easy for the developer but quickly get in your way if you
want raw performance.
Also, in some instances with NIO you can even work directly with
kernel buffers, and so the network data doesn't even need to be copied
from the kernel space into the user space. That takes time if you are
managing a lot of network traffic.
Finally, as it has already been discussed, threads use memory, lots of
it. If the number of threads is not bounded, a traffic spike will make
your memory requirements skyrocket, either exhausting the memory in
your JVM or prompting the OS to start paging. In the latter case, once
your server is hitting virtual memory, all those threads will cause
page faults left and right, and you'll watch your server grind to a
halt: it won't be returning responses but will still be receiving
requests, and thus creating even more new threads, happily marching
into a death spiral.
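One common mitigation, sketched here with plain java.util.concurrent (nothing Aleph-specific; `handle-request` and the `respond!` callback are hypothetical names), is to bound the worker count so a spike queues work instead of allocating more thread stacks:

```clojure
(import '(java.util.concurrent Executors))

;; A fixed pool caps thread memory at (pool size) x (stack size);
;; excess requests wait in the executor's queue rather than spawning
;; new threads.
(def workers (Executors/newFixedThreadPool 16))

;; Hypothetical handler: hand each request to the bounded pool and
;; deliver the response through a callback.
(defn handle-request [request respond!]
  (.submit workers
           ^Runnable (fn []
                       (respond! {:status 200
                                  :headers {"Content-Type" "text/html"}
                                  :body "Hello world!"}))))
```

Under load the queue grows instead of the thread count, turning the death spiral into ordinary back-pressure (at the cost of queueing latency).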
Even if that article is right, fast != scalable, high-throughput, or bounded.
Yes, I have issues with that article as I have seen it quoted one too
many times ;)
This doesn't really jibe with reality as far as I can tell; if
anything, it is the exact opposite of reality. If you're doing
significant work between I/O calls (which tend to be context-switching
points), even to the point of usually yielding only to pre-emptive
switching after exceeding your time slice, the relative overhead of
threading should usually be much less than if you're handling a huge
number of very small requests.
Whatever the extra cost of a thread context switch compared to an
application-level context switch (and make no mistake, it's
effectively still a context switch; just because you're not switching
threads doesn't mean that different requests won't, e.g., touch
different cache lines), that cost becomes more relevant as the amount
of work done after each switch decreases.
The cost of time slicing while holding a lock is real, but if you have
a code path with a high rate of lock acquisition in some kind of
performance-critical situation, presumably you're holding locks for
very short periods of time, and the likelihood of switching away at
exactly the wrong moment is not very high.
Also: Remember that syscalls are most definitely not cheap, and an
asynchronous model doesn't save you from doing syscalls for the I/O.
> So, between memory overheads, cost of creating and destroying threads
> and context switching, using a synchronous model can be extremely
> heavyweight compared to an asynchronous model. It's no surprise that
> people are seeing much better throughput with asynchronous servers.
In my experience threading works quite well for many production tasks,
though not all (until we get better "vertical" (all the way from the
language to the bare metal) support for cheaper threads). The
maintenance and development costs associated with writing complex
software in callback form, with all state explicitly managed and any
sensible use of control flow, exceptions, etc. ruled out, are very
easy to underestimate in my opinion. It also makes whether a call
*might* do I/O part of its public interface, which is one particular
aspect I really dislike, beyond the callback orientation itself.
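To make that cost concrete, here is a toy sketch (the `read-a`/`read-b` sources and `sum-both-*` names are made up, not a real I/O API): the straight-line version keeps its state on the stack and can use ordinary exceptions, while the callback version must thread state and errors through closures by hand:

```clojure
;; Straight-line style: state lives on the stack, errors use ordinary
;; try/catch.
(defn sum-both-sync [read-a read-b]
  (try
    (+ (read-a) (read-b))
    (catch Exception _ :failed)))

;; Callback style: each step names its continuation explicitly, and any
;; error handling must be routed by hand through every callback.
(defn sum-both-async [read-a read-b k]
  (read-a (fn [a]
            (read-b (fn [b]
                      (k (+ a b)))))))
```

Even this two-step chain is harder to follow than the synchronous version, and the gap widens as the control flow gets more involved.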
You also need to consider latency. Some flawed benchmarks, where
people throw a fixed amount of concurrency at a problem, will show
that latency is poor with a threaded model in comparison to an async
model. But under an actual reasonable load, where the rate of incoming
requests is not infinitely high, the fact that you're doing
pre-emption and scheduling across multiple CPUs means that individual
expensive requests don't force multiple other, smaller requests to
wait for them to complete their bit of work. So again, for CPU-heavy
tasks, this is another way in which a threaded model can be better,
unless you very carefully control the amount of work done in each
reactor loop (presuming the reactor pattern) in the asynchronous case.
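One way to exercise that control is to cap how many queued tasks a single reactor iteration drains; a minimal single-threaded sketch (`run-tick!` is a made-up name, not from any real reactor library):

```clojure
;; Run at most `budget` queued tasks per tick, so one burst of
;; expensive work can't monopolize the loop.
(defn run-tick! [task-queue budget]
  (loop [n 0]
    (when (< n budget)
      (when-let [task (peek @task-queue)]
        (swap! task-queue pop)
        (task)
        (recur (inc n))))))

;; usage sketch: five queued tasks, a budget of three per tick
(def tasks (atom clojure.lang.PersistentQueue/EMPTY))
(def done (atom 0))
(dotimes [_ 5] (swap! tasks conj (fn [] (swap! done inc))))
(run-tick! tasks 3)
@done  ; → 3, with two tasks left for the next tick
```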
As far as I can tell, the advantages of an asynchronous model mostly
come in cases where you either (1) have very high concurrency or (2)
are doing very, very little work for each unit of I/O done, such that
the cost of context switching is at its most significant.
My wet dream is to be able to utilize something like Clojure (or
anything other than callback/state-machine based models) on top of an
implementation where the underlying concurrency abstraction is in fact
really efficient (in terms of stack sizes and in terms of switching
overhead). In other words, the day when having a few hundred thousand
concurrent connections does *not* imply that you must write your
entire application to be event-based is the day I am extremely happy ;)
--
/ Peter Schuller
A fundamental understanding of the difference between threads and kqueue/epoll (which power NIO) should clear up anyone's misgivings about evented servers. They are clearly more scalable, it is no contest.
- Greg
oh Erlang, where art thou?
Unfortunately the link to the PDF was broken, here's one that works:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.7987&rep=rep1&type=pdf
On Jul 8, 2010, at 6:48 PM, Raoul Duke wrote:
> can't we all just get along?
>
> http://lambda-the-ultimate.org/node/1435
>
Your link to the epoll + threads document is probably the best way (that I'm aware of) to address the issues Raoul brought up w.r.t. long operations in between the events themselves, but doesn't Netty do that already with a thread pool?
- Greg
Very interesting!
I've been following the thread with great interest and did a quick
performance test today comparing standard Compojure with Jetty against
Aleph and Netty. I get around 4500 req/s with Compojure and 3500 req/s
with Aleph. The test was as simple as possible: just return hello
world.
> I've just discovered another Netty wrapper was released this weekend
> (http://github.com/datskos/ring-netty-adapter), but it's somewhat
> different in its design and intent; it couples the request and
> response to allow for seamless interop with Ring.
>
> Anyways, I hope some people find this interesting. Clojure doesn't
> seem to have found its own voice w.r.t. web development; hopefully we
> can work together to fix that.
--
Anders Rune Jensen
Yeah, I was positive that the numbers were quite good for Aleph
considering it's such a young project. But I was expecting Netty to
beat Jetty, so I was a little disappointed :)
I just ran the test as simply as possible: java -server (no other
parameters set), default kernel settings (Ubuntu), and then ab -n
5000 -c 50 (as in your blog post). As always with Java, one needs to
run ab a few times before the numbers stabilize :)
The test machine was an old Intel Core 2 Duo 2 GHz.
> David
--
Anders Rune Jensen
Benchmarking Aleph against Ring Jetty directly is likely to produce
more accurate results. Compojure adds middleware and routing logic, so
it's not really a fair test.
That said, I expect Aleph to outperform the Jetty adapter :)
- James
I haven't used Building Blocks, but I certainly agree that running
exactly as many threads as cores is probably optimal under most
conditions (assuming cache contention doesn't interact in such a way
as to make it worse; e.g. you might see two threads going faster than
four and such under extreme conditions).
> would want to use the available cores. I'm just saying that having more
> threads than cores (or rather, more software threads than hardware threads)
> may hurt performance or scalability due to time-slicing overheads. Obviously
> it's more complicated than simply creating N worker threads for an N-core
> system though, since if any blocking IO is performed the cores are
> under-utilized.
Agreed.
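The sizing itself is a one-liner on the JVM (a sketch; deciding how many extra threads to add to cover blocking I/O is the hard part):

```clojure
(import '(java.util.concurrent Executors))

;; One worker per hardware thread keeps every core busy without
;; time-slicing overhead -- assuming the workers never block on I/O.
(def n-cores (.availableProcessors (Runtime/getRuntime)))
(def cpu-pool (Executors/newFixedThreadPool n-cores))
```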
> However, in an asynchronous server, (or, more importantly, in one where the
> number of threads do not exceed the number of hardware threads) it becomes
> much more likely that a request is processed to completion before it gets
> evicted from the cache (as long as care is taken to prevent false sharing
> with other, independent data which share the cache lines).
Agreed, but with the specific caveat that this holds specifically
under circumstances where you are trading latency for throughput. In
other words, this is true, but in any specific case where the async
design allowed you to run to completion where you would otherwise have
context switched, you are intrinsically exceeding your would-be time
slice, which affects latency for the other requests waiting on your
one long/expensive request.
> isn't at all relevant to the discussion. Still, I am very interested to hear
> yours and everyone elses real world experiences.
I come from the perspective of first having written quite a lot of
multi-threaded C++ code (over a few years) that did fairly complex
combinations of "CPU work" and I/O with other services. I am really
confident that the code I/we wrote would never have been completed in
even close to the same amount of time/resources if we had written
everything event-based. I cannot overstate this point enough...
During the last year I've been exposed to quite a lot of reactive code
(C++, Python Twisted, some others), with what are IMO the expected and
pretty extreme consequences for code maintainability and productivity
(even for people who've been writing such code for a long time and are
clearly used to it).
So my default position is a strong desire to avoid going event-based
if possible.
In terms of scalability, that definitely mattered when I worked on the
mentioned multi-threaded code. It directly translated to hardware
costs in terms of what you had to buy, because we had effectively an
infinite amount of work to be done in some areas (such as crawling the
web; you don't really run out of things to do, because you can always
do things more often, better, or faster). However, that experience is
at best anecdotal, since no formal studies were done on multi-core
scalability; rather, doubling cores meant it went "almost twice as
fast" - purely empirical observations during development cycles.
On this topic I found it interesting reading about Google's concerns
with and improvements to the Linux kernel to support their use. I
couldn't find the article right now (I'm pretty sure it was on lwn),
but it strongly implied that Google definitely used production systems
with very many threads. I found that interesting since given Google's
scale, presumably runtime efficiency may be very highly valued
compared to extra development cost to get there. My hypothesis,
probably colored by confirmation bias, is that the difference in
effort in writing large complex systems in an event-based fashion is
simply too expensive to be worth it even at Google's scale - at least
in the general case. Their release of Go was unsurprising to me for
this reason :)
Has anyone here got experience with writing really complex systems
(big code bases, services talking to lots of other services, doing
non-trivial control flow etc) in event-based form? Any comments on how
it scales, in terms of development costs, as the size and complexity
of the system grows?
--
/ Peter Schuller
Is it possible to get an exception or something when a client
disconnects, to avoid using up resources needlessly?
--
Anders Rune Jensen