Threaded vs event driven and a new socket layer


Lammert Bies

unread,
Dec 4, 2016, 8:05:15 PM12/4/16
to civetweb
One of the main reasons to write the Civetweb API reference was to get a detailed view myself of how Civetweb works and how to get the most out of it when integrating it in my application. One thing makes me scratch my head though.

Civetweb relies heavily on callback functions, which suggests an event driven model like, for example, nginx. But on the other hand the communication layer is built with multiple threads, where each thread is only active for one communication activity, which is much more like the Apache fork and thread model. I can't understand the reason for that. Maybe it was a historical decision by Sergey to use multi-threading and the code slowly drifted towards event driven, but now you have the bad parts of both worlds: a large number of threads, which may cause resource and performance problems, on one side, and a relatively difficult flow of control through all the callback calls on the other.
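The two models can be sketched in miniature (Python is used here purely to illustrate the control flow; civetweb itself is C, and this is not civetweb's actual code):

```python
import selectors

# Thread-per-connection (Apache/civetweb style): each connection gets its
# own thread, its state lives on that thread's stack, and the handler may
# block freely on recv/send.
def serve_threaded(conn):
    data = conn.recv(4096)
    conn.sendall(data.upper())   # stand-in for "handle the request"
    conn.close()

# Event driven (nginx/libuv style): one thread multiplexes all connections;
# per-connection state lives in small heap objects and handlers must never
# block. (A real server would also have to handle partial writes.)
def serve_event_loop(conns):
    sel = selectors.DefaultSelector()
    for c in conns:
        c.setblocking(False)
        sel.register(c, selectors.EVENT_READ)
    remaining = len(conns)
    while remaining:
        for key, _ in sel.select():
            data = key.fileobj.recv(4096)
            key.fileobj.sendall(data.upper())
            sel.unregister(key.fileobj)
            key.fileobj.close()
            remaining -= 1
```

In the threaded sketch the "flow of control" is linear per connection; in the event loop it is inverted into callbacks, which is exactly the tension described above.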

For another socket level communication activity in my application I use libuv (also MIT licensed) with my own code on top of it, and it gives a much cleaner flow of control. Libuv, the bottom layer of Node.js, is known for its speed and efficient processing and is multi-platform.

Would it be an idea to remove the low level socket routines and threading from Civetweb altogether and put libuv there instead?

Lammert Bies

unread,
Dec 7, 2016, 4:56:14 AM12/7/16
to civetweb
Never mind.

Consider this request obsolete, as I will now continue development of the event driven socket layer myself in my Mongoose/Civetweb fork at https://github.com/lammertb/libhttp

bel

unread,
Dec 9, 2016, 1:52:13 PM12/9/16
to civetweb
It seems you have already decided to create your own fork - I hope you don't do this only because I was not reading emails for three days.

In case someone still wants to know:

The threading model is indeed a heritage of Sergey - however, in its current state it is established technology and has proved to work reliably in several different applications. The callbacks for form handling were added by me later (replacing the mg_upload callback). Still, I don't think this is a bad design decision. Threads are not really that expensive - they were expensive in the old days, when "thread" meant "process" (own address space) and virtual address space (for the stack) meant committed physical RAM. The complexity of multi-threading is encapsulated in civetweb.c, so the callbacks written by users can be simple - certainly simpler than if the callback programmers had to use asynchronous I/O operations. So if civetweb takes complexity "inside", it is done to reduce complexity "outside" (in the callbacks).

It is not our intention to re-create Apache, nginx or node.js - if one of them fits perfectly for your application, you should use it, of course!
To my knowledge, nginx cannot be used to create dynamic content by executing scripts (only indirectly, by working as a proxy for other web servers or special services like php-fastcgi) - civetweb can execute scripts, and other clients will not suffer if a script blocks.
To my knowledge, Node.js can only execute JavaScript code - no other scripts (http://stackoverflow.com/questions/5346055/can-i-replace-apache-with-node-js) or C/C++ callbacks.
To my knowledge, you cannot use Apache to extend an existing C/C++ application to give it an HTTP/HTTPS interface; you have to add your application as an Apache module instead - and Apache is not really small, so if you target an embedded platform, or just don't want the HTTP server to be many times as large as your actual application, it might not be a good choice.
So if you want to extend your C/C++ application and/or need a small webserver with performant scripting capability (Lua is far smaller than PHP & co.), civetweb would be a proper choice - civetweb is just one file without external dependencies, making it easy to add to existing projects.
Nginx was created in 2002-2004 to overcome the "C10K" problem (10000 connections to one server; see https://www.digitalocean.com/community/tutorials/apache-vs-nginx-practical-considerations). CivetWeb can handle more than 32000 connections on a year-2010 standard office PC (see https://github.com/civetweb/civetweb/issues/50).




Lammert Bies

unread,
Dec 16, 2016, 8:32:36 AM12/16/16
to civetweb
I was talking about the architecture of Apache, Nginx and libuv (which is the I/O layer of Node.js), not about the functionality at the user level. The architecture says something about scalability in different environments.

I see the power of a project like Civetweb mainly as an efficient layer between the network sockets and a higher level application. I am not so concerned if that higher level application is running in PHP, Lua, JavaScript or plain C, but the project underneath should effectively handle things like connections, sending and receiving data, authentication, errors, etc. 

On a laptop, desktop or server sized computer every HTTP server will run, and the choice of a server depends largely on the desired high level functionality. But in more constrained environments like embedded systems, the way the I/O is handled can decide whether an architecture is suitable or not. If memory is tight, it is impossible to run a large number of threads, even though at some moment those threads may be needed to handle peaks in connections. Systems with thread pools are also vulnerable to DDoS attacks, because once all threads are busy, no new connections can be served.

An asynchronous handling of events on the other hand will only consume resources when actual data processing is necessary and other connections are left in a dormant state with only enough memory resources allocated to them to remember their connection state.

Smaller embedded systems also tend to have inefficient threading implementations, which lowers thread dependent performance compared to event driven implementations. And a fully event driven implementation has no need for locking, because no two threads will ever need access to the same resource at the same time. My smallest webserver runs in 256 kB ROM and 128 kB RAM - not on Civetweb, obviously.

My first phase in the fork is to further document the code and do some cleanup. After that I will exchange the current socket and file I/O layer for asynchronous I/O.

Hansi PIERRE

unread,
Dec 16, 2016, 3:11:09 PM12/16/16
to Lammert Bies, civetweb
Hi
I have to admit I'm curious about your fork; it should be interesting :)



bel

unread,
Dec 17, 2016, 6:29:45 PM12/17/16
to civetweb
I know the architecture of Apache, Nginx and Node.js (at least to this extent).
I tend to see this more from a requirements point of view: how many clients/requests do you expect at the same time? If you need to work with hundreds, thousands or more clients, if you expose the server directly to the internet, then you should use appropriate hardware - a 250 € or $ PC works for 10000 clients. If you use it only locally, in a small to medium intranet, with some tens of requests, an embedded system like a Raspberry Pi for ~25 €/$ is sufficient, and CivetWeb will run there without any problems. If you want to use a 2.5 €/$ embedded system, you probably don't have a standard operating system anyway. You won't have features like TLS (HTTPS) or CGI (PHP, Python, Ruby, ...), and you probably will not use it as a file server (static content like HTML, CSS, JS), but only for some dynamic content (e.g. a REST interface for sensor values).
But here we are talking about the system architecture of the entire service, which has at least two devices (the embedded server and the client), most likely more (hardware for reading and storing sensor data, protocol gateways, human machine interface hardware). In fact, within the architecture of an entire system, I have always found a suitable piece of hardware.
I think you should not try to expose the smallest of the embedded systems directly to the internet anyway.

Every open connection consumes memory. You need to store the incoming data (at least until the request is complete), the connection state and the state of request processing. In an asynchronous processing model, you will most likely allocate this on the heap. In a threaded model like CivetWeb's, the processing state is stored in local variables on a thread stack. The cost of one thread is, depending on the operating system, a few pages - so one thread can cost as little as 8 kB or 12 kB (a page is 4 kB on many architectures; some have different page sizes). So there is some memory overhead, but only a small one.
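The point about per-thread cost can be illustrated, with the caveat that Python threads carry far more overhead than raw C threads, so the sizes below are illustrative only: the reserved stack is address space, and it can be shrunk explicitly for many small handler threads.

```python
import threading

def demo_many_small_threads(n=100, stack_bytes=256 * 1024):
    # Reserve 256 kB of address space per thread instead of the platform
    # default (often 8 MB on Linux); only the touched pages consume RAM.
    threading.stack_size(stack_bytes)
    done = []
    lock = threading.Lock()

    def handler():
        # Trivial stand-in for a request handler.
        with lock:
            done.append(1)

    threads = [threading.Thread(target=handler) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threading.stack_size(0)        # restore the platform default
    return len(done)
```

Running a hundred such threads is unremarkable on any desktop-class machine, which is the practical claim being made here.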

Thread based models are much simpler and more robust when it comes to writing request processing callbacks. In an asynchronous I/O model, all the callbacks have to work with asynchronous, non-blocking I/O as well - if the server has only one thread and a processing callback blocks, the entire server is blocked. Or you need to create a new thread whenever you want to call a processing callback - but combining blocking and non-blocking I/O isn't easy at all, and is prone to implementation faults. In particular, if you have in-server support for script interpreters such as Lua or server side JavaScript, you need a separate thread - or you have to rewrite the I/O handling in the script interpreter. I also think it does not make any sense to try to re-create nginx. There are some advantages in thread based connection handling, the model Apache uses.

Furthermore, synchronous I/O is somewhat standardized and available in all operating systems. Asynchronous I/O is completely different between Windows, Linux and various embedded operating systems - I don't know about OSX, BSD, Solaris, ...

I think asynchronous IO is good for static files, but bad for dynamic content (generated by a script).


If you really need to change the threading model of civetweb so that it uses exactly one thread on an embedded platform, you could also think about a coroutine-like model, based on setjmp/longjmp, that checks whether data is pending before doing an I/O operation. But still, in the real world applications of the last years, there was no problem that gave a reason to change anything in this direction.
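A coroutine scheme of the kind mentioned above can be mimicked with generators (a sketch of the idea, not of civetweb code): each handler yields at the points where it would otherwise block, and a single-thread scheduler resumes whichever handler has pending data - no locks needed.

```python
from collections import deque

def handler(conn_id, n_reads, log):
    # Simulated request handler: each `yield` marks a point where a real
    # server would block on I/O; the scheduler resumes the handler only
    # when its next piece of "pending data" is available.
    for _ in range(n_reads):
        data = yield
        log.append((conn_id, data))

def run_scheduler(handlers, incoming):
    # One thread, no locks: round-robin over the coroutines, feeding each
    # one item of pending data and dropping it once it finishes.
    ready = deque(handlers)
    for h in ready:
        next(h)                    # advance every handler to its first yield
    while ready and incoming:
        h = ready.popleft()
        try:
            h.send(incoming.pop(0))
        except StopIteration:
            continue               # handler finished; do not requeue
        ready.append(h)
```

In C the same structure would be built with setjmp/longjmp (or ucontext/fibers) instead of generators; the scheduling idea is identical.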

Lammert Bies

unread,
Dec 20, 2016, 7:00:07 AM12/20/16
to civetweb
Thank you for estimating that the hardware I am working on costs 2.5 €/$ per system. It proves that I made the right decision to fork this project and continue it in a more professional way.

Wishing you all the best,

Lammert

bel

unread,
Dec 20, 2016, 3:42:02 PM12/20/16
to civetweb
I meant "if you want to use a very cheap embedded system" (the price of the CPU module), "you won't have a supported operating system like Linux or Windows anyway". And 250 €/25 €/2.5 € are orders of magnitude, meant to give an idea of what amount of CPU power, RAM, disk, ... I'm talking about - an office PC, a Raspberry Pi, or something much smaller (an 8-bit microcontroller). These numbers were meant as an illustration, by no means as an insult. Even if a system costs a lot, if its CPU and RAM are equivalent to a microcontroller, you will encounter the same topics.

Finally, I'm still somewhat irritated that being abroad for a few days results in someone declaring the project dead, but I will certainly not start to guarantee anyone any response time - if I'm abroad for some days, I'm abroad.

Joe Mucchiello

unread,
Dec 20, 2016, 10:39:28 PM12/20/16
to civetweb
There is no reason to insult someone who is doing something FOR FREE. Bel has been extremely professional in his treatment of civetweb. You decide he's unprofessional because he has the nerve to have a life outside of working on an OSS project? Go read everything Bel has written on GitHub and come back here and repeat that he's in any way unprofessional.

He didn't say YOU use a $2.50 system. Learn to read. He said civetweb supports those 8-bit systems in the exact same way it supports rack mount hardware.

You seem to have a chip on your shoulder because you posted a document of the API and people aren't gushing with gratitude. You're going to fork civetweb. Fine. You don't want to play with others. That's on you. Take your ball and go home. But don't blame the person who made civetweb possible just because it isn't EXACTLY what you want.

Hansi PIERRE

unread,
Dec 21, 2016, 9:11:36 AM12/21/16
to civetweb
OK .. this was not constructive ....

Mynock17

unread,
Dec 28, 2016, 11:51:35 PM12/28/16
to civetweb
Have you ever made a performance comparison between Node.js and civetweb? You will be very surprised (and take a look at the memory consumption) ...

Lammert Bies

unread,
Dec 29, 2016, 6:28:38 AM12/29/16
to civetweb
I was talking mainly about libuv, not Node.js as a complete package. Libuv is the abstraction layer below Node.js which allows non-blocking access to I/O resources like sockets and files. The Node.js layer on top adds a significant amount of CPU and memory overhead. My goal is to combine the good things of both systems, i.e. the non-blocking handling of I/O resources combined with the efficient higher level HTTP processing as done in Civetweb.

bel

unread,
Jan 3, 2017, 12:51:08 AM1/3/17
to civetweb

I've been busy creating a new release, and with New Year's vacation, so I could not run tests earlier.

 

Why test first?

A lot of people do not know or understand the difference between a process and a thread (I'm using the same terms as the Windows API here); lots of internet articles and even some "standard literature" sometimes get these things mixed up. I'm not claiming that anyone who posted here does not know the difference; my claim is that a lot of the literature is bad in this respect. The costs of a process and of a thread are vastly different. And anyway, a real comparison is more useful than a literature study.

 

Test candidates?

A full test can only be done with fully operational servers.

CivetWeb uses Lua as its built-in script interpreter. A project that uses Lua and libuv in combination to build a web server is luvit (https://luvit.io/). So I think the two test candidates would be CivetWeb 1.9 with Lua support (https://github.com/civetweb/civetweb/blob/master/test/page5.lua) in comparison to luvit (https://github.com/luvit/luvit/blob/master/examples/http-server.lua). The test is to make 100 concurrent requests to /page5.lua. Actually the luvit example does not care about the URI and answers everything with “Hello world\n“ in plain text. CivetWeb will load, interpret and execute the page5.lua script file for every request and answer with a Hello world HTML page - CivetWeb could do better by using C or pre-compiled Lua scripts, but I did not do any optimization. It runs with --num_threads 100. Both servers run on an older Linux PC; the client runs 100 HTTP/1.0 requests on Windows in one process (in principle this code: https://github.com/civetweb/civetweb/blob/master/test/public_server.c#L378; I also tested with curl from the command line, but the measurement is more precise if the client is C code).

 

Test 1:

Unmodified code, 10 repetitions of 100 requests. Response time in ms:

Luvit: 5875   Civet: 656
Luvit: 6046   Civet: 688
Luvit: 5953   Civet: 484
Luvit: 5828   Civet: 672
Luvit: 5844   Civet: 703
Luvit: 6296   Civet: 688
Luvit: 5812   Civet: 906
Luvit: 5859   Civet: 860
Luvit: 4860   Civet: 734
Luvit: 5797   Civet: 781

 

Linux memory consumption (https://github.com/pixelb/ps_mem/blob/master/ps_mem.py):

Private + Shared = RAM used

5.5 MiB + 156 kiB = 6.0 MiB civetweb

8.8 MiB + 69.5 kiB = 8.8 MiB luvit

 

CivetWeb uses more address space (as far as I remember, the Linux default is 8 MB per thread stack, the Win32 default is 1 MB), but actually less RAM. Address space does not need RAM, so in contrast to a process, a thread may cost as little as 10 kB of RAM.

CivetWeb is multithreaded in a single process with a pre-created thread pool.

 

Same with 10 clients (instead of 100):

Luvit: 562   Civet: 141
Luvit: 515   Civet: 141
Luvit: 782   Civet: 125
Luvit: 625   Civet: 110
Luvit: 485   Civet: 156
Luvit: 578   Civet: 141
Luvit: 547   Civet: 140
Luvit: 515   Civet: 125
Luvit: 500   Civet: 125
Luvit: 500   Civet: 156

 

Same with 1000 clients (more than CivetWeb's thread count):

This worked with neither luvit nor CivetWeb.

 

 

Test 2:

Scripts creating “hello world” are rather boring. This time I replaced the “world” in the Lua script with a function call, “GetSomethingFromSomewhere()”, where this call takes some time to collect the data from somewhere. In a first test, I just used a one second sleep to simulate this.

 

Time for 100 clients, in ms:

Luvit: 105250   Civet: 2781
Luvit: 105843   Civet: 2782

 

Several requests to luvit failed, while CivetWeb worked for all 100 clients.

Memory consumption for both was about 8.8 MB.

 

Time for 10 clients:

Luvit: 10656   Civet: 1188
Luvit: 10563   Civet: 1187
Luvit: 10500   Civet: 1203

 

Again, one request to luvit failed.

So, one thing you cannot do in the single-threaded event model is use blocking functions in callbacks, since this scales badly.
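The scaling effect seen in Test 2 is easy to reproduce in the abstract: with one event-loop thread, N blocking callbacks of t seconds each cost roughly N*t in total, while a thread pool overlaps them (a sketch, with sleep standing in for the blocking data source):

```python
import threading
import time

def run_blocking_loop(callbacks):
    # One event-loop thread: each blocking callback stalls all other clients,
    # so total time is the sum of all the individual blocking times.
    start = time.monotonic()
    for cb in callbacks:
        cb()
    return time.monotonic() - start

def run_thread_pool(callbacks):
    # One thread per request: the blocking calls overlap in time,
    # so total time is roughly the longest single blocking time.
    start = time.monotonic()
    threads = [threading.Thread(target=cb) for cb in callbacks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start
```

This is the same shape as the measured numbers: luvit's time grows with the client count times the sleep, CivetWeb's stays near one sleep.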

 

The advantage of luvit, on the other hand: if you only have static files (or no blocking callbacks), you do not need to know your maximum number of clients in advance, while CivetWeb needs a reasonable configuration. So I tested what happens if you configure CivetWeb with 100 threads, but use 200 clients.

 

The clients will be accepted, queued, and processed once a thread is free.
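That accept-and-queue behavior can be sketched as a bounded worker pool fed from a queue (illustrative only; civetweb's internal socket queue is more involved than this):

```python
import queue
import threading

def serve_with_pool(jobs, num_threads):
    # Jobs beyond num_threads wait in the queue until a worker is free,
    # mirroring connections being accepted and queued rather than rejected.
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:        # poison pill: shut this worker down
                return
            with lock:
                results.append(job())

    workers = [threading.Thread(target=worker) for _ in range(num_threads)]
    for w in workers:
        w.start()
    for j in jobs:
        q.put(j)
    for _ in workers:
        q.put(None)
    for w in workers:
        w.join()
    return results
```

With 4 workers and 20 jobs, every job still completes; the excess simply waits, which matches the 200-clients-on-100-threads measurement below.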

 

Time for 200 clients:

Luvit: 225250   Civet: 5500
Luvit: 216406   Civet: 5500
Luvit: 213219   Civet: 5406

 

8 of 600 requests to CivetWeb failed, and 67 of 600 to luvit.

 

 

Other test scenarios:

A scenario where luvit might work better than CivetWeb is one where you have to keep a lot of connections open - maybe with websockets. But you will probably run into a port limit first.

I will probably run this test some time later.

 

Lammert Bies

unread,
Jan 6, 2017, 9:07:37 PM1/6/17
to civetweb
I'll post the performance data of the rework of Civetweb with an event driven I/O layer and lock-free operation in this thread when I'm done rewriting the source. That would be better than comparing apples with oranges, like civetweb with luvit. Tests will probably be on FreeBSD though, because on FreeBSD much of the socket event handling is done in the kernel, whereas on Linux part of the logic is handled in user space.

There are also issues with libuv not aggregating small chunks of data written by an application to the network layer, which--if not handled correctly by the higher level layer--causes streams of data to be sent as many small packets rather than a few larger ones. This may be the case in luvit, I don't know, but it can have a significant effect on throughput.
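The aggregation described here amounts to write coalescing in the layer above the socket. A minimal sketch of the idea (the class, names and 1400-byte default, roughly one Ethernet payload, are made up for illustration and are not libuv or civetweb API):

```python
class CoalescingWriter:
    """Buffers small writes and hands larger blocks to the real send
    function, so many tiny application writes become few packets."""

    def __init__(self, send, max_buffer=1400):
        self._send = send          # e.g. a socket's sendall
        self._max = max_buffer
        self._buf = bytearray()

    def write(self, data):
        self._buf += data
        if len(self._buf) >= self._max:
            self.flush()

    def flush(self):
        # Called when the buffer fills up, or explicitly at end of response.
        if self._buf:
            self._send(bytes(self._buf))
            self._buf.clear()
```

Kernels mitigate this too (e.g. Nagle's algorithm on TCP), but doing it in the application layer avoids the extra round-trip latency that kernel-side batching can introduce.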

bel

unread,
Jan 7, 2017, 8:26:18 AM1/7/17
to civetweb
I took luvit because I wanted to compare two full web servers, and because it is mentioned as a libuv based application here: http://docs.libuv.org/en/v1.x/
Luvit and civetweb can both use Lua, and (to my knowledge) there is only one Lua interpreter implementation, while there are several JavaScript implementations, so CivetWeb+Lua compared to luvit is closer than CivetWeb+Duktape compared to Node.js.
CivetWeb+Lua compared to luvit is as close as I could get without comparing apples to apple seeds (incomplete servers).

Yes, please post the results here once you have reworked the source.
You should then use C callbacks in CivetWeb instead of Lua scripts (in my test, I used Lua only to be closer to luvit).

I did not check the large packet / many small packets effect - it would have been easy to do with Wireshark. I did not do any optimization, but just used the example code out of the box. I also did not test outside a LAN (lossy 3G connections, ...).

I don't have a FreeBSD test system - I think performance tests must be done on real machines, so a VM does not help here.
I also don't know about the portability of libuv to various real time operating systems.

Finally, one of my most important use cases is getting data from functions that block. I would be curious how you plan to handle this.

