[erlang-questions] Cowboy (Erlang) VS Haskell (Warp)


BM Kim

Jun 25, 2013, 7:51:03 AM
to erlang-q...@erlang.org
Hi folks,

First of all, I want to apologise for my poor English skills;
English is not my first language, but I'll try my best
to formulate my questions as clearly as possible.

Second, I've just begun to learn Erlang, so if I'm asking
obvious "noob" questions, I apologise for that too in advance...

Anywho, now to my actual question:

I am planning to write a high-performance server application in Erlang,
which will primarily handle HTTP requests. After some research with Google,
I narrowed down my choices to Cowboy, Misultin and Mochiweb, and decided
to go with the Cowboy library first...

Looking at some tutorials, I quickly built a small server capable of
serving static files and was eager to see the first benchmark results...
I also built a small Haskell server using the Warp library to compare it
with Erlang's Cowboy...

But my first impression was that my Cowboy server is much, much slower than
expected when serving static files, and after some research I found a presentation
by Cowboy's author saying that Cowboy shouldn't be used for serving
static files. So I modified the server code so that it replies to every
request with an in-memory 4 KB binary blob, and compared it with my Haskell Warp
server serving a 4 KB static file...

This is my simple Cowboy HTTP handler:

----------------------------------------------------------------------

blob() ->
    [<<0:8>> || _ <- lists:seq(1, 4096)].

init({tcp, http}, Req, _Opts) ->
    {ok, Req, []}.

handle(Req, State) ->
    {ok, Req2} = cowboy_req:reply(200, [], blob(), Req),
    {ok, Req2, State}.

terminate(_Reason, _Req, _State) ->
    ok.

-----------------------------------------------------------------------

I've tested both the Cowboy server and the Warp server with

weighttp -k -c200 -n10000 http://localhost:8888/4kblob.txt

on my i5-2540M ThinkPad laptop running Ubuntu 12.10.
The Haskell code was compiled with GHC 7.6, and my Erlang VM is R16B.


----------------------------------------------------------------------
Warp:

finished in 0 sec, 279 millisec and 52 microsec, 35835 req/s, 147331 kbyte/s

-----------------------------------------------------------------------

-----------------------------------------------------------------------
Cowboy (actually serving the in-memory blob):

finished in 1 sec, 683 millisec and 264 microsec, 5940 req/s, 24447 kbyte/s
-----------------------------------------------------------------------

I've noticed that when testing the Cowboy server the CPU usage is much
higher than when testing the Warp server...
What do you guys think could be the problem here? I can't imagine that an Erlang
server is that much slower than a Haskell server...
I apologise again if the question is formulated chaotically; if you need more
details, just let me know...

I am trying to look into this on my own, but since I am still an Erlang noob, some
helpful tips/advice from the experts would be much, much appreciated!!!

Thanks in advance,

--cheers,
bmk

Damian Dobroczyński

Jun 25, 2013, 8:05:08 AM
to erlang-q...@erlang.org
On 25.06.2013 13:51, BM Kim wrote:
First, try replacing your blob/0 function with this:

blob() -> <<0:(4096*8)>>.

Then, restart the test and report ;)
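
(For illustration, not part of the original message: the list comprehension builds an iolist of 4096 separate one-byte binaries that the VM has to walk and copy for every reply, while the single bitstring expression builds one flat 4096-byte binary. A quick check you can run in the shell:)

Blob1 = [<<0:8>> || _ <- lists:seq(1, 4096)],
4096 = length(Blob1),      %% 4096 tiny binaries in a list
4096 = iolist_size(Blob1), %% same total payload size
Blob2 = <<0:(4096*8)>>,
4096 = byte_size(Blob2).   %% one contiguous 4 KB binary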

-- D.

Max Lapshin

Jun 25, 2013, 9:25:16 AM
to Damian Dobroczyński, Erlang-Questions Questions
You are running the wrong kind of tests.

Erlyvideo can stream many gigabits of video through a Cowboy server. Is that
performance? Is it bigger than 100 Mbit of text messages?

The problem with high performance is not only about primitive benchmarks; it
is also about smoothness under load. What will happen to your server,
sending millions of messages, if one gen_server in it gets blocked
on a long disk read? Are you ready for a crash due to OOM?

If you want to spend the time, then start writing in Erlang, because
Erlang will allow you to write multicore software that can monitor
its own internal load by regulating message flows.

It is a very important thing to understand: in Erlang, cross-process
function calls are serialized into messages. Other languages offer you
only direct function calls.
And only after you have some load should you start measuring and collecting
statistics: what is slow and what is not.

BM Kim

Jun 25, 2013, 1:38:25 PM
to erlang-q...@erlang.org


Hi,

Thank you very much for pointing out the obvious mistake...
After correcting it, I got an improvement from 5940 req/s to 8650 req/s...

But that is still much slower than the Haskell Warp server, which has a throughput
of 38000 req/s...

But I have another question regarding blob/0. Is it going to be evaluated
only once (like GHC would do) since it is a pure expression? I'm not
so sure, since Erlang is not pure and any function can have side effects,
which you can't mark the way you can with the IO monad in Haskell...

Sergej Jurecko

Jun 25, 2013, 2:30:22 PM
to BM Kim, erlang-q...@erlang.org


If you only care about performance, use Haskell or C.


Sergej

Loïc Hoguin

Jun 25, 2013, 2:38:55 PM
to BM Kim, erlang-q...@erlang.org

That's not surprising at all: you are performing exactly the same thing
all the time, so of course Haskell is going to be fast at this. The same
goes for JIT-enabled environments like Java; the JIT can easily compile
it to machine code once and be done with it.

You're not actually testing the HTTP server, or even the language's
performance; you are testing the platform's ability to optimize one
operation to death.

> But I have another question regarding blob/0. Is it going to be evaluated
> only once (like GHC would do) since it is a pure expression? I'm not
> so sure, since erlang is not pure and any function can have side-effects
> which you can't mark as with the IO monad in Haskell...

Erlang doesn't do that. The closest is constant folding at compile time,
where A = 1 + 1 becomes A = 2 in the compiled file, but it's only done for
a very small subset of all expressions.
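
If that small subset doesn't cover your blob/0, a simple way to make sure the binary is built only once per connection (a sketch against the handler you posted, not something Cowboy does for you) is to construct it in init/3 and carry it in the handler state:

init({tcp, http}, Req, _Opts) ->
    %% Build the 4 KB binary once, when the connection is set up,
    %% and keep it as the handler state.
    {ok, Req, <<0:(4096*8)>>}.

handle(Req, Blob) ->
    %% Reuse the same binary for every request on this keep-alive connection.
    {ok, Req2} = cowboy_req:reply(200, [], Blob, Req),
    {ok, Req2, Blob}.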

Erlang shines not in synthetic benchmarks but in production, when
thousands of clients connect to your server and expect their requests to
be served as quickly as if they were alone on the server. Erlang is optimized
for latency, and that latency will be the same whether there is
one user or ten thousand.

Your benchmark, on the other hand, is evaluating throughput. Throughput is
boring, and not really useful for Web applications. (See Max's email for
more details on that.)

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

BM Kim

Jun 25, 2013, 2:54:43 PM
to Loïc Hoguin, erlang-q...@erlang.org
On 06/25/2013 08:38 PM, Loïc Hoguin wrote:

[...]

Thank you so much for your helpful and insightful explanation, which gives me a new
perspective on my current benchmark strategy and expectations...
If it is not too much trouble, can you give me some further advice regarding Erlang
performance in general and using the Cowboy library "efficiently"? Are there some
issues/tricks which are not documented in the manual but which I should be aware of?

By the way: can you recommend some open-source Erlang projects incorporating the
Cowboy library that I can learn from?

Again, thank you for your time and advice!

-- bmk

Loïc Hoguin

Jun 25, 2013, 3:21:11 PM
to BM Kim, erlang-q...@erlang.org

The tricks are documented in either the Cowboy manual or the Ranch
manual (for things relating to sockets). But there aren't many of them
and you shouldn't start with these questions. Ask yourself whether the
project enables you to do what you need instead.

There's no doubt that Erlang fits your needs if it's a Web project. The
only gotcha is that you'll want to use a NIF for image manipulation
(there isn't any pure Erlang lib for that at this time), and you'll want to
use the OpenCL NIFs for any heavy computation. You can also use Erlang
to handle clusters of Python instances or any other platform, as has
been done by many people (see CloudI for example, if you have that kind
of need).

> by the way: can you recommend some opensource erlang projects incorporating the
> cowboy library that I can learn from?

I'd look at n2o, axiom, ChicagoBoss, or other frameworks, in that order.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

okeuday

Jun 25, 2013, 3:52:20 PM
to erlang-pr...@googlegroups.com, erlang-q...@erlang.org
When I use their benchmark here:
https://github.com/yesodweb/benchmarks
(as mentioned here: http://www.yesodweb.com/blog/2011/03/preliminary-warp-cross-language-benchmarks)
to test elli, here:
https://github.com/knutin/elli

I get 99000 req/s. The (httperf) test of CloudI's http_req Erlang service using Cowboy gives 13358 req/s. Cowboy has more features, so that can explain the extra average latency, which limits throughput.

If you want to understand why their benchmark isn't decent, read this:
http://www.mnot.net/blog/2011/05/18/http_benchmark_rules

So, if you want something faster in Erlang, you could use elli; however, keep in mind that their testing isn't long enough to be meaningful (due to garbage collection and other impacts on performance).

Loïc Hoguin

Jun 25, 2013, 4:06:57 PM
to okeuday, erlang-q...@erlang.org
You'd also end up with tons of unsupervised processes, which isn't a
good thing when you want to see what's wrong in your production system
later on (something I last did today and do quite regularly when
consulting). That's the biggest difference between the two: Cowboy
follows OTP principles, elli doesn't. Unfortunately supervision does
have a cost when you need to accept many short-lived processes quickly,
but we have done a lot of work in Ranch to reduce that cost as much as
possible. You could always replace Ranch with something that doesn't
supervise if you really need to optimize to death, but so far even the
handful of companies I know of that use it for ad bidding and other
high-frequency purposes haven't had the need to do that.

In short: removing supervision only looks good on benchmarks.


--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Knut Nesheim

Jun 26, 2013, 5:23:26 AM
to Loïc Hoguin, erlang-q...@erlang.org
I'm a bit curious about your comments on unsupervised processes in Elli and Elli not following the OTP principles. Why do you think unsupervised request processes are a bad thing? They are short-lived, and you cannot restart them in any meaningful way.

It is true that Elli does not hook any of the processes it creates into a supervisor. As a user of Elli, you include the Elli "server" inside your supervision tree. This server starts linked acceptor processes, each of which accepts a new connection and then handles the request (and keep-alive) before it dies and the Elli server starts a new acceptor. If any of these processes exits abnormally, Elli knows about it. If you wish to find these processes to debug something, you can use elli:get_acceptors(ElliPid) or process_info(ElliPid, links). Mochiweb and Yaws do more or less the same.

Knut

Loïc Hoguin

Jun 26, 2013, 10:48:36 AM
to Knut Nesheim, erlang-q...@erlang.org
They are not short-lived. Spawning unsupervised processes is only fine
if you spawn the process to do exactly one thing and that thing cannot
get the process stuck (otherwise you've got an invisible leak). And you say
it yourself: your processes do keep-alive, so they can stay open for quite
a while.

Now if a request handler gets stuck for one reason or another, and the
process is supervised, it takes about 5 minutes to find what's wrong
because the OTP tools and libraries do all the work for you in figuring
it out. You don't have to think, you already know how to deal with issues.
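
For example (an illustration only, not a recipe from this thread; SupPid is a placeholder for whichever supervisor owns the request processes), the standard tools are enough to see what a stuck, supervised process is doing:

observer:start(),                            %% browse the supervision tree visually
Children = supervisor:which_children(SupPid),
[erlang:process_info(Pid, [current_function, current_stacktrace, message_queue_len])
 || {_Id, Pid, worker, _Mods} <- Children, is_pid(Pid)].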

If the process isn't supervised, you cut yourself off from most of the
documented tools and techniques; for example, none of your processes will
show up in observer. You also force your users to figure out
alternative ways to debug things instead of what they use everywhere
else (see Fred's posts [1] and [2]). You have to write custom code to
extract metrics, logging will work in unexpected ways, etc. And the
worst part is that you have essentially reimplemented a supervisor in your
Elli server process.

Yesterday I was debugging an issue and found out that file:consult/1 calls
were getting stuck (bug report incoming, still testing things). If I had been
using Elli, I would have cursed your name many times because of all the
time I'd have lost trying to figure out how to debug this thing.

Following OTP principles is one of Cowboy's most important features, and
this will soon be improved again by making even request processes
special processes that you can debug like you would a gen_server. At
that point everything will be a special process. This is especially
important since the Web continues moving toward long-running connections
with WebSocket, SPDY and HTTP/2.0.

[1] http://ferd.ca/poll-results-erlang-maintenance.html
[2] http://ferd.ca/code-janitor-nobody-s-dream-everyone-s-job-and-how-erlang-can-help.html

Knut Nesheim

Jun 27, 2013, 5:17:00 AM
to Loïc Hoguin, erlang-q...@erlang.org
I agree that having a web request process be "debuggable" using the sys module can be quite useful. So far no user of Elli has told me of such a wish. For myself and the users of Elli I know of, process_info/2 has been very helpful in debugging the web request processes.

If you want a webserver that strictly follows the OTP principles, maybe you should use Cowboy. There are both pros and cons to doing so, and, as is pretty obvious, people have different views of the trade-offs. Some might think it is an absolute requirement to strictly follow the OTP principles; others might think there are cases where it doesn't make much sense.

Elli is not meant to be everything to everybody. If you find it hard to debug Elli, cursing my name is misplacing your anger and won't help you achieve anything. I happily help anybody who comes to me with questions.

Knut



Loïc Hoguin

Jun 27, 2013, 7:32:53 AM
to Knut Nesheim, erlang-q...@erlang.org
On 06/27/2013 11:17 AM, Knut Nesheim wrote:
> I happily help anybody who comes to me with questions.

Me too, but I'm not sure how that would help someone trying to debug an
issue in production quickly to avoid losing money.

You shouldn't make the product's usability dependent on the author's
availability. (And yep, I am also in the process of learning that,
except in my case it's with regards to documentation.)

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

okeuday

Jun 27, 2013, 3:59:02 PM
to erlang-pr...@googlegroups.com, erlang-q...@erlang.org
To be a bit more accurate, when I hook up elli to CloudI, it shows that Cowboy has better performance when doing the same http_req test with httperf. So it is likely that the benchmark (https://github.com/yesodweb/benchmarks) could be faster with Cowboy (i.e., CloudI is limiting throughput, probably mainly due to erlang:now/0 usage). One of the main reasons Cowboy is faster is that its connection rate is an order of magnitude larger: 124.1 conn/s instead of 11.6 conn/s. The throughput, for the curious, was 12539.2 req/s for Cowboy and 11560.6 req/s for elli.

I feel compelled to say again that httperf is not an accurate way to judge performance (see http://www.mnot.net/blog/2011/05/18/http_benchmark_rules), but it does provide a quick number to guess with. There is more serious load testing of CloudI here: https://github.com/okeuday/CloudI/tree/master/src/tests/http_req