data structure in antiweb.lisp

Watt

Nov 19, 2008, 1:31:15 PM
to antiweb
Hi Doug,

Your last post has shed some light for me in understanding your code
better. Before I get to that, let me ask you a trivial question.
What would happen if an HTTP request comes in with a method other
than POST or GET, like PUT or DELETE? Would that cause an error? I
guess not, but I just cannot tell from the code what would happen.

I have picked up your code now and then, when I have time, to
understand more about how things are done. Please correct me if I am
off track. Here is how I understand the structure of your code. At
the bottom, there is the C data structure that is read in and written
out by Lisp code, which basically serves as the message protocol
between the two sides. Your point asking whether C code is Lisp
finally hit home for me. It sure looks like that to me now. Anyhow,
once you have the data structure set up, it goes through a series of
steps that are basically wrapped up in the AW compile phases. From
that point on, the preferences that are read in through handlers get
compiled and put into closures for each phase. I saw the
pandoric-eval that you mentioned and I think it looks pretty neat. I
really hope you can make it turtles all the way down, from top to
bottom, that way. That would make things easy to pick up after a short
while. Also, if the comments could include some examples of valid
data for the data structures being passed around, that would
definitely help in understanding what is going on. That would make
the message protocol (if I can call it that) easier to understand,
and hence easier to customize or extend in the end (if you allow, of
course). Thanks for your help.



Watt P.

do...@hcsw.org

Nov 19, 2008, 7:38:03 PM
to ant...@googlegroups.com
Hi Watt,

On Wed, Nov 19, 2008 at 10:31:15AM -0800 or thereabouts, Watt wrote:
> Your last post has shed some light for me in understanding your code
> better. Before I get to that, let me ask you a trivial question.
> What would happen if an HTTP request comes in with a method other
> than POST or GET, like PUT or DELETE? Would that cause an error? I
> guess not, but I just cannot tell from the code what would happen.

Good question. No, it will not cause an error because this is not
an unexpected condition. Antiweb expects that random clients on
the internet will connect and spew garbage into the connection.
Heck, for the last 3 or 4 years I've been one of the principal
maintainers of nmap's service detection--one of the most common
programs for spewing random shit into open ports to see what happens. :)

Here is how Antiweb works:

When a client connects, it must send a valid HTTP request within
keepalive seconds (the keepalive setting, by default 65) or the
connection is dropped. If a valid HTTP request does not appear in
the first AW_MAX_HTTP_HEADER bytes (by default, 4096), the connection
is dropped. If a client's request does not conform to HTTP/1.1
(for example, it is missing a Host: field), it is dropped.
If a client sends a request with a currently unimplemented
method like PUT or DELETE, it is dropped.
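The acceptance rules above can be sketched in C as one check over the bytes a client has sent so far. This is a hypothetical illustration, not AW's code: `check_request`, `check_cstring`, and the `verdict` values are invented names, `MAX_HTTP_HEADER` merely stands in for the 4096-byte AW_MAX_HTTP_HEADER default, and the keepalive timeout is assumed to be enforced elsewhere by the event loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the AW_MAX_HTTP_HEADER default of 4096 (assumption). */
#define MAX_HTTP_HEADER 4096

enum verdict { NEED_MORE, ACCEPT, DROP };

/* Index just past the first "\r\n\r\n" in buf[0..len), or 0 if absent. */
static size_t header_end(const char *buf, size_t len) {
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    return 0;
}

/* ASCII case-insensitive prefix test; header names are case-insensitive. */
static int prefix_ci(const char *s, size_t n, const char *prefix) {
    size_t k = strlen(prefix);
    if (n < k)
        return 0;
    for (size_t i = 0; i < k; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* Hypothetical check over the first len bytes a client has sent. */
enum verdict check_request(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t end = header_end(buf, len);
    if (end == 0) /* header incomplete: keep waiting unless the limit is hit */
        return len >= MAX_HTTP_HEADER ? DROP : NEED_MORE;
    if (end > MAX_HTTP_HEADER)
        return DROP;

    /* Only implemented methods pass; PUT, DELETE, etc. are dropped. */
    if (memcmp(buf, "GET ", 4) != 0 && (end < 5 || memcmp(buf, "POST ", 5) != 0))
        return DROP;

    /* The request line must end in " HTTP/1.1". */
    size_t eol = 0;
    while (buf[eol] != '\r' || buf[eol + 1] != '\n')
        eol++;
    if (eol < 9 || memcmp(buf + eol - 9, " HTTP/1.1", 9) != 0)
        return DROP;

    /* HTTP/1.1 requires a Host: header on one of the following lines. */
    for (size_t i = eol + 2; i < end; i++)
        if (buf[i - 2] == '\r' && buf[i - 1] == '\n' &&
            prefix_ci(buf + i, end - i, "host:"))
            return ACCEPT;
    return DROP;
}

/* Convenience wrapper for NUL-terminated input. */
enum verdict check_cstring(const char *s) { return check_request(s, strlen(s)); }
```

A caller would keep reading while the result is NEED_MORE and close the connection if that state outlasts the keepalive timeout.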

The code checks for this in 2 places: once in the hub, before it
decides whether to transfer the connection, and again in the worker,
when it decides whether to handle the request. In the hub, the check
is in this function: hub-accept-http-connection.

If that function doesn't call aw_send_conn and return 'conn-sent,
then it terminates the connection either like this:

(send-http-err-and-linger c 404 "Virtual host not registered")

or (in your case if a client sends PUT or DELETE):

(send-http-err-and-linger c 400 "Valid HTTP/1.1 required")
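The hub's three possible outcomes can be summarized as a small decision function, assuming the request has already been parsed. `hub_decide`, `host_registered`, and the hard-coded `registered` table are hypothetical (in AW the hosts come from workers at run time), and checking HTTP validity before the virtual host is an assumed ordering, not something the code above confirms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry; in AW, hosts arrive via workers' register-host messages. */
static const char *const registered[] = { "hoytech.com", "www.hoytech.com" };

enum hub_action { SEND_TO_WORKER, REPLY_400, REPLY_404 };

static int host_registered(const char *host) {
    assert(host != NULL);
    for (size_t i = 0; i < sizeof registered / sizeof registered[0]; i++)
        if (strcmp(registered[i], host) == 0)
            return 1;
    return 0;
}

/* valid_http11: the request passed the method and HTTP/1.1 checks.
   host: the Host: header value, or NULL if there was none. */
enum hub_action hub_decide(int valid_http11, const char *host) {
    if (!valid_http11 || host == NULL)
        return REPLY_400;       /* "Valid HTTP/1.1 required" */
    if (!host_registered(host))
        return REPLY_404;       /* "Virtual host not registered" */
    return SEND_TO_WORKER;      /* the aw_send_conn / 'conn-sent path */
}
```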

It is on my TODO list to log such invalid requests to syslog,
but currently they are silently dropped. Let me know if this
is a problem and I will raise its priority.

The worker dispatch function is more complicated because it
is dynamically compiled based on your worker conf file. But
the worker uses essentially the same technique for verifying
the virtual host and the HTTP method. See this helper function:
http-user-dispatch-macro.

It is important to check this in the worker as well as the
hub because if a client persists ("keeps alive") a connection,
the hub will not process any further requests on this connection
and it is up to the worker to verify them.

> I have picked up your code now and then, when I have time, to
> understand more about how things are done. Please correct me if I am
> off track. Here is how I understand the structure of your code. At
> the bottom, there is the C data structure that is read in and written
> out by Lisp code, which basically serves as the message protocol
> between the two sides.

Yes, exactly. The first place to learn about AW's low-level messaging
protocol is the file src/libantiweb.h. This file defines the two
most important data structures in AW, conns and ioblocks.

> Your point asking whether C code is Lisp
> finally hit home for me.

Yes, this is how I look at it too. Most things in libantiweb.c could
also have been written in Lisp. I chose C because I didn't want to
parse the many header files required by AW when there is already a
perfectly good C compiler on every system AW supports to do that
for us. Also, I know C reasonably well and its pointer syntax is
easier for me to reason about than the equivalent CFFI code.

But yes, libantiweb.h is NOT A C HEADER FILE. It is a special format
that is processed by both the C compiler and the Lisp compiler. The
following function in antiweb.lisp is used to copy data from Lisp
strings into ioblock chains in conn data structures:
write-to-conn-from-string. And this function is used to copy from
ioblock chains into a Lisp string: read-from-conn-into-string.
Both of the above functions are optimised so that they will be
compiled into tight machine-code copy routines similar to, for
example, strcpy(3) (at least on CMUCL).
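To make the ioblock idea concrete, here is a C sketch of copying a string into a chain of fixed-size blocks and back out. The `struct ioblock` layout, the 8-byte `IOBLOCK_SIZE`, and all function names are hypothetical stand-ins, not the definitions in src/libantiweb.h; the block size is kept tiny so the chaining is visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block size, kept tiny so that chaining is easy to see. */
#define IOBLOCK_SIZE 8

struct ioblock {
    struct ioblock *next;
    size_t len;               /* bytes of data[] in use */
    char data[IOBLOCK_SIZE];
};

/* Append n bytes of src to the chain at *head, allocating blocks as needed.
   Returns 0 on success, -1 if malloc fails. */
int write_to_chain(struct ioblock **head, const char *src, size_t n) {
    struct ioblock *last = *head;
    while (last != NULL && last->next != NULL)
        last = last->next;
    while (n > 0) {
        if (last == NULL || last->len == IOBLOCK_SIZE) {
            struct ioblock *b = malloc(sizeof *b);
            if (b == NULL)
                return -1;
            b->next = NULL;
            b->len = 0;
            if (last != NULL)
                last->next = b;
            else
                *head = b;
            last = b;
        }
        size_t room = IOBLOCK_SIZE - last->len;
        size_t k = n < room ? n : room;
        memcpy(last->data + last->len, src, k);
        last->len += k;
        src += k;
        n -= k;
    }
    return 0;
}

/* Copy the chain into dst (capacity cap, always NUL-terminated). */
size_t read_from_chain(const struct ioblock *b, char *dst, size_t cap) {
    assert(cap > 0);
    size_t off = 0;
    for (; b != NULL; b = b->next) {
        size_t room = cap - 1 - off;
        size_t k = b->len < room ? b->len : room;
        memcpy(dst + off, b->data, k);
        off += k;
        if (k < b->len)
            break;            /* dst is full */
    }
    dst[off] = '\0';
    return off;
}

void free_chain(struct ioblock *b) {
    while (b != NULL) {
        struct ioblock *next = b->next;
        free(b);
        b = next;
    }
}

/* Demo: round-trip s (under 256 bytes) through a fresh chain.
   Returns the number of blocks used, or -1 on mismatch or allocation failure. */
int roundtrip_blocks(const char *s) {
    struct ioblock *chain = NULL;
    char out[256];
    int blocks = 0;
    if (write_to_chain(&chain, s, strlen(s)) != 0) {
        free_chain(chain);
        return -1;
    }
    read_from_chain(chain, out, sizeof out);
    for (const struct ioblock *b = chain; b != NULL; b = b->next)
        blocks++;
    free_chain(chain);
    return strcmp(out, s) == 0 ? blocks : -1;
}
```

The tight copy loops here are plain memcpy calls; the point of the chain is that a write never needs one large contiguous buffer.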

There is also this function which you should never use:
copy-lisp-string-to-c-string-no-bounds-check-yes-really.
AW only uses this in 1 place: to copy the string
representation of an IP address into a conn structure
which is guaranteed to have enough space available.
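For comparison, a bounds-respecting way to fill such a fixed-size field from C is inet_ntop(3) with a buffer of INET6_ADDRSTRLEN bytes, which fits the text form of any IPv4 or IPv6 address. The `struct conn_ip` field and the helper names are hypothetical; this is not how AW's Lisp side fills the conn structure.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical fixed-size field, like the IP slot in a conn structure.
   INET6_ADDRSTRLEN covers the longest IPv4 or IPv6 text form. */
struct conn_ip {
    char ip[INET6_ADDRSTRLEN];
};

/* Format the address in sa into c->ip; returns 0 on success, -1 otherwise. */
int conn_set_ip(struct conn_ip *c, const struct sockaddr *sa) {
    const void *addr;
    assert(c != NULL && sa != NULL);
    if (sa->sa_family == AF_INET)
        addr = &((const struct sockaddr_in *)(const void *)sa)->sin_addr;
    else if (sa->sa_family == AF_INET6)
        addr = &((const struct sockaddr_in6 *)(const void *)sa)->sin6_addr;
    else
        return -1;
    /* inet_ntop fails rather than writing past the size it is given. */
    return inet_ntop(sa->sa_family, addr, c->ip, sizeof c->ip) != NULL ? 0 : -1;
}

/* Demo: parse text with inet_pton, then format it back through conn_set_ip. */
const char *demo_ip(int family, const char *text) {
    static struct conn_ip c;
    struct sockaddr_storage ss;
    memset(&ss, 0, sizeof ss);
    if (family == AF_INET) {
        struct sockaddr_in *sin = (struct sockaddr_in *)(void *)&ss;
        sin->sin_family = AF_INET;
        if (inet_pton(AF_INET, text, &sin->sin_addr) != 1)
            return NULL;
    } else {
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)(void *)&ss;
        sin6->sin6_family = AF_INET6;
        if (inet_pton(AF_INET6, text, &sin6->sin6_addr) != 1)
            return NULL;
    }
    return conn_set_ip(&c, (const struct sockaddr *)(const void *)&ss) == 0 ? c.ip : NULL;
}
```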

> Anyhow,
> once you have the data structure set up, it goes through a series of
> steps that are basically wrapped up in the AW compile phases. From
> that point on, the preferences that are read in through handlers get
> compiled and put into closures for each phase. I saw the
> pandoric-eval that you mentioned and I think it looks pretty neat.

Yes, again you understand correctly. The pandoric-eval lets us build
the HTTP dispatch function at the last possible moment: once we've
read in and verified the worker conf file. Because of this design,
it doesn't matter how many features or modules we add. If you
don't use a feature in your worker conf file, it will not
even be compiled into the HTTP dispatch function.

This also lets you do things like recompile and install new
modules on a worker without restarting the worker.
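C cannot compile a new function at run time the way pandoric-eval does, but the effect described above, where unused features never enter the dispatch path, can be roughly approximated by assembling a handler chain from the conf file's feature list at load time. Everything here (`build_dispatch`, the feature names, the handlers) is a hypothetical illustration of that idea, not a model of AW's modules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A handler returns 1 if it handled the request path. */
typedef int (*handler_fn)(const char *path);

/* Hypothetical feature handlers, standing in for AW modules. */
static int h_static_files(const char *path) { return path[0] == '/'; }
static int h_cgi(const char *path) { return strncmp(path, "/cgi/", 5) == 0; }

struct feature {
    const char *name;
    handler_fn fn;
};

/* Every feature that could be enabled. */
static const struct feature available[] = {
    { "static-files", h_static_files },
    { "cgi", h_cgi },
};
#define N_AVAILABLE (sizeof available / sizeof available[0])
#define MAX_CHAIN 8

/* Wire in only the features named in conf; returns chain length, -1 on unknown name. */
int build_dispatch(const char *const *conf, size_t n, handler_fn *chain) {
    assert(n <= MAX_CHAIN);
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < N_AVAILABLE && strcmp(available[j].name, conf[i]) != 0)
            j++;
        if (j == N_AVAILABLE)
            return -1;
        chain[i] = available[j].fn;
    }
    return (int)n;
}

/* Run the chain; returns the index of the handler that took path, or -1. */
int dispatch(handler_fn const *chain, int len, const char *path) {
    for (int i = 0; i < len; i++)
        if (chain[i](path))
            return i;
    return -1;
}

/* Demo: a conf that enables only "cgi"; the static-files handler is never wired in. */
int demo_dispatch(const char *path) {
    const char *const conf[] = { "cgi" };
    handler_fn chain[MAX_CHAIN];
    int len = build_dispatch(conf, 1, chain);
    return len < 0 ? -1 : dispatch(chain, len, path);
}
```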

> That would make things easy to pick up after a short while.
> Also, if the comments could include some examples of valid
> data for the data structures being passed around, that would
> definitely help in understanding what is going on. That would make
> the message protocol (if I can call it that) easier to understand,
> and hence easier to customize or extend in the end (if you allow, of
> course).

Yes, you can definitely call it a message protocol. All of AW's
processes communicate with a simple line oriented message protocol.
The best example of the message protocol implementation is in
the function hub-accept-unix-connection in antiweb.lisp. This
function is called once the hub accepts a connection from the
hub.socket unix socket. It returns a closure that will be invoked
once the message separator (a single newline) is encountered.
The important macro fsm (short for "finite state machine") is
used to create the code that returns a closure.

Usually, a message will be a simple command like this:

"register-host hoytech.com\n"

This tells the hub that it should register this connection to be
a worker process and begin forwarding connections requesting
hoytech.com to this process. The worker can then add more hosts
if it wants. Next, the worker process will send a lock command:

"lock\n"

Actually, because AW's data structures are designed for
pipelining, usually these commands will be batched together
and written with a single writev(2) call:

"register-host hoytech.com\nregister-host www.hoytech.com\nlock\n"

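The separator-driven handling described above, including batched writes, can be sketched in C. Lisp's fsm returns closures; C has none, so this hypothetical `line_fsm` keeps its state in a struct and calls a handler for each complete "\n"-terminated message, buffering any partial message across reads. The names and the 256-byte limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_LIMIT 256   /* hypothetical per-message size limit */

typedef void (*msg_handler)(const char *msg, void *ctx);

/* State that a Lisp closure would capture: the partial message so far. */
struct line_fsm {
    char buf[MSG_LIMIT];
    size_t len;
    msg_handler on_msg;
    void *ctx;
};

/* Feed n bytes; on_msg runs once per complete "\n"-terminated message.
   Returns -1 if a message exceeds MSG_LIMIT - 1 bytes. */
int line_fsm_feed(struct line_fsm *f, const char *data, size_t n) {
    assert(f->len < sizeof f->buf);
    for (size_t i = 0; i < n; i++) {
        if (data[i] == '\n') {
            f->buf[f->len] = '\0';
            f->on_msg(f->buf, f->ctx);
            f->len = 0;
        } else if (f->len + 1 < sizeof f->buf) {
            f->buf[f->len++] = data[i];
        } else {
            return -1;
        }
    }
    return 0;
}

/* Demo handler: count messages and remember the last one. */
struct seen {
    int count;
    char last[MSG_LIMIT];
};
static void remember(const char *msg, void *ctx) {
    struct seen *s = ctx;
    s->count++;
    strcpy(s->last, msg);   /* msg is shorter than MSG_LIMIT by construction */
}

/* Demo: feed the batched write from above in two chunks split mid-message.
   Returns the number of messages seen, or -1 if the last one is not "lock". */
int demo_batch(void) {
    const char *batch = "register-host hoytech.com\nregister-host www.hoytech.com\nlock\n";
    struct seen s = { 0, "" };
    struct line_fsm f = { .len = 0, .on_msg = remember, .ctx = &s };
    size_t n = strlen(batch), cut = 20;
    if (line_fsm_feed(&f, batch, cut) != 0 || line_fsm_feed(&f, batch + cut, n - cut) != 0)
        return -1;
    return strcmp(s.last, "lock") == 0 ? s.count : -1;
}
```

Because the partial message lives in the struct, it makes no difference whether the three commands arrive in one writev(2) or split across several reads.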
But sometimes we need to send an arbitrary block of data
over the connection. We can't simply separate this with a
message separator like "\n" because the block of data
might itself contain "\n" characters. In this case, we
send the block of data immediately after the message.
The length of the block of data is embedded into the
message. For example, a worker connection might send
this to the hub:

"axslog 4\n1234axslog 6\n123456"

This will add 2 messages to the axslog file, the
first "1234" and the second "123456". This works as
follows: after we have received the message (i.e.,
we've seen a "\n"), we return a new closure, one that
is ready to read N bytes from the connection, execute
some functionality, and then return the original
closure that processes regular "\n" separator messages.
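Here is a C sketch of that two-state reader: a line state that collects bytes up to "\n", and a body state that collects exactly N bytes before switching back. It is a hypothetical illustration, not AW's fsm macro; for brevity it only accepts `<command> <length>` messages such as `axslog 4`, and the buffer sizes are arbitrary.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum state { READ_LINE, READ_BODY };

typedef void (*body_handler)(const char *cmd, const char *body, size_t len, void *ctx);

struct msg_fsm {
    enum state st;
    char line[128];          /* arbitrary limit for the command line */
    size_t line_len;
    char body[1024];         /* arbitrary limit for one data block */
    size_t body_len, body_want;
    body_handler on_body;
    void *ctx;
};

/* Split "<cmd> <len>" into the command name and the byte count to read next. */
static int start_body(struct msg_fsm *f) {
    assert(f->line_len < sizeof f->line);
    f->line[f->line_len] = '\0';
    char *sp = strrchr(f->line, ' ');
    if (sp == NULL || !isdigit((unsigned char)sp[1]))
        return -1;
    char *end;
    unsigned long n = strtoul(sp + 1, &end, 10);
    if (*end != '\0' || n > sizeof f->body)
        return -1;
    *sp = '\0';              /* f->line now holds only the command name */
    f->body_want = n;
    f->body_len = 0;
    f->line_len = 0;
    f->st = READ_BODY;
    return 0;
}

/* Feed n bytes from the connection; returns -1 on malformed input. */
int msg_fsm_feed(struct msg_fsm *f, const char *data, size_t n) {
    size_t i = 0;
    while (i < n) {
        if (f->st == READ_LINE) {
            char ch = data[i++];
            if (ch != '\n') {
                if (f->line_len + 1 >= sizeof f->line)
                    return -1;
                f->line[f->line_len++] = ch;
                continue;
            }
            if (start_body(f) != 0)
                return -1;
        } else {
            size_t k = f->body_want - f->body_len;
            if (k > n - i)
                k = n - i;
            memcpy(f->body + f->body_len, data + i, k);
            f->body_len += k;
            i += k;
        }
        if (f->st == READ_BODY && f->body_len == f->body_want) {
            f->on_body(f->line, f->body, f->body_len, f->ctx);
            f->st = READ_LINE;   /* back to "\n"-separated messages */
        }
    }
    return 0;
}

/* Demo handler: append "cmd:body;" to a text buffer. */
struct joined { char text[256]; };
static void join_body(const char *cmd, const char *body, size_t len, void *ctx) {
    struct joined *j = ctx;
    size_t used = strlen(j->text), clen = strlen(cmd);
    if (used + clen + len + 3 > sizeof j->text)
        return;
    memcpy(j->text + used, cmd, clen);
    used += clen;
    j->text[used++] = ':';
    memcpy(j->text + used, body, len);
    used += len;
    j->text[used++] = ';';
    j->text[used] = '\0';
}

/* Demo: feed the example one byte at a time to show state surviving across reads. */
const char *demo_axslog(void) {
    static struct joined out;
    const char *wire = "axslog 4\n1234axslog 6\n123456";
    struct msg_fsm f = { .st = READ_LINE, .on_body = join_body, .ctx = &out };
    memset(&out, 0, sizeof out);
    for (size_t i = 0; wire[i] != '\0'; i++)
        if (msg_fsm_feed(&f, wire + i, 1) != 0)
            return NULL;
    return out.text;
}
```

The state field plays the role of "which closure is current": switching `st` back to READ_LINE corresponds to returning the original "\n"-separator closure.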

This scheme is similar to DJB's netstrings protocol:

http://cr.yp.to/proto/netstrings.txt

In fact, DJB's software and writings have influenced
me and my programming in several ways.
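For comparison with DJB's format: a netstring frames a string as its decimal length, a colon, the bytes, and a trailing comma, so "hello world!" becomes "12:hello world!,". A minimal C encoder follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode len bytes of data as "<len>:<data>," into out (NUL-terminated).
   Returns the netstring's length, or -1 if out is too small. */
int netstring_encode(char *out, size_t cap, const char *data, size_t len) {
    assert(out != NULL && cap > 0);
    int h = snprintf(out, cap, "%zu:", len);
    if (h < 0 || (size_t)h + len + 2 > cap)
        return -1;
    memcpy(out + h, data, len);
    out[h + len] = ',';
    out[h + len + 1] = '\0';
    return h + (int)len + 1;
}

/* Demo: encode a NUL-terminated string into a static buffer. */
const char *demo_netstring(const char *s) {
    static char buf[128];
    return netstring_encode(buf, sizeof buf, s, strlen(s)) < 0 ? NULL : buf;
}
```

AW's variant differs in that the length travels inside an ordinary "\n"-terminated command such as "axslog 4", with no trailing comma after the data.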

Another detail about the message passing protocol
is that some messages implicitly transfer a socket
along with the message, for example when the hub
transfers a connection to a worker process, or when
you attach to an AW process, you transfer a unix
connection through the hub and into that process.
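On Unix systems, a socket is normally passed along with a message over a unix domain socket as SCM_RIGHTS ancillary data using sendmsg(2) and recvmsg(2). The thread doesn't show AW's exact code for this, so the following is a generic C sketch under that assumption; the helper names and the "transfer\n" message text are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send len bytes of msg over unix socket sock with fd attached (SCM_RIGHTS). */
int send_with_fd(int sock, const char *msg, size_t len, int fd) {
    struct iovec iov = { .iov_base = (void *)msg, .iov_len = len };
    union fd_ctrl ctrl;
    struct msghdr mh;
    memset(&ctrl, 0, sizeof ctrl);
    memset(&mh, 0, sizeof mh);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof ctrl.buf;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&mh);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof fd);
    return sendmsg(sock, &mh, 0) == (ssize_t)len ? 0 : -1;
}

/* Receive up to cap-1 bytes (NUL-terminated); *fd_out gets the passed fd or -1. */
ssize_t recv_with_fd(int sock, char *buf, size_t cap, int *fd_out) {
    assert(cap > 0);
    struct iovec iov = { .iov_base = buf, .iov_len = cap - 1 };
    union fd_ctrl ctrl;
    struct msghdr mh;
    memset(&mh, 0, sizeof mh);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof ctrl.buf;
    *fd_out = -1;
    ssize_t n = recvmsg(sock, &mh, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&mh); cm != NULL; cm = CMSG_NXTHDR(&mh, cm))
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
            memcpy(fd_out, CMSG_DATA(cm), sizeof *fd_out);
    return n;
}

/* Demo: pass a pipe's write end across a socketpair, write through the
   received copy, and read the bytes back from the pipe. Returns 1 on success. */
int demo_fd_pass(void) {
    int sv[2], p[2], got = -1, ok = 0;
    char msg[64], out[3] = { 0 };
    const char *m = "transfer\n";   /* hypothetical message text */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    if (pipe(p) == 0) {
        ok = send_with_fd(sv[0], m, strlen(m), p[1]) == 0
             && recv_with_fd(sv[1], msg, sizeof msg, &got) == (ssize_t)strlen(m)
             && got >= 0 && strcmp(msg, m) == 0
             && write(got, "ok", 2) == 2
             && read(p[0], out, 2) == 2 && strcmp(out, "ok") == 0;
        close(p[0]);
        close(p[1]);
    }
    if (got >= 0)
        close(got);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

The receiving process gets a new descriptor number for the same open socket, which is why the message text itself never needs to mention the descriptor.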

And one final thing: HTTP uses the exact same message
passing infrastructure except for the following
details:

1) The messages are terminated with "\r\n\r\n", not "\n".
2) The message body length is embedded in the Content-Length:
header (POST requests only), and not as a number preceding
the separator.
3) HTTP messages can never include transferred sockets
because HTTP messages always come from internet
sockets, not unix sockets.
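Differences 1) and 2) can be sketched as a framing function: find the "\r\n\r\n" terminator, then add any Content-Length to find where the next pipelined request begins. This is a hypothetical C illustration; the function names and the sanity cap on the body length are assumptions, and chunked bodies are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the first complete HTTP request in buf[0..len): header through
   "\r\n\r\n" plus Content-Length body bytes. Returns 0 if more bytes are
   needed, -1 if the Content-Length value is malformed. */
long http_message_length(const char *buf, size_t len) {
    static const char name[] = "content-length:";
    const size_t nlen = sizeof name - 1;
    size_t hdr = 0;
    long body = 0;
    assert(buf != NULL || len == 0);

    /* 1) The message separator is "\r\n\r\n". */
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0) {
            hdr = i + 4;
            break;
        }
    if (hdr == 0)
        return 0;

    /* 2) The body length comes from the Content-Length: header. */
    for (size_t i = 2; i + nlen <= hdr; i++) {
        size_t k = 0, j;
        if (buf[i - 2] != '\r' || buf[i - 1] != '\n')
            continue;                    /* not at the start of a header line */
        while (k < nlen && tolower((unsigned char)buf[i + k]) == name[k])
            k++;
        if (k < nlen)
            continue;
        j = i + nlen;
        while (j < hdr && buf[j] == ' ')
            j++;
        if (j >= hdr || !isdigit((unsigned char)buf[j]))
            return -1;
        for (; j < hdr && isdigit((unsigned char)buf[j]); j++) {
            body = body * 10 + (buf[j] - '0');
            if (body > 100000000L)        /* arbitrary sanity cap (assumption) */
                return -1;
        }
        break;
    }
    return hdr + (size_t)body <= len ? (long)(hdr + (size_t)body) : 0;
}

/* Convenience wrapper for NUL-terminated input. */
long http_len(const char *s) { return http_message_length(s, strlen(s)); }
```

Any bytes past the returned length belong to the next pipelined request and would be handed back to the same reader.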

Note that chunked HTTP messages are currently not
supported and I have no plans to support them at
this time. Luckily, AFAIK, no HTTP clients ever
send chunked messages. When we do reverse proxying,
we might need to parse chunked messages from servers,
though.

The most complicated message is the message that the
hub sends to the worker process to transfer in a new
connection. It might look something like this:

"http 54 127.0.0.1\nPOST /test HTTP/1.1\r\nHost: hoytech.com\r\nContent-Length: 5\r\n\r\n12345GET / HTT"

127.0.0.1 is the IP address (IPv4 or IPv6) the client connected
from. 54 is the total number of characters following
the "\n" separator being transferred (I didn't actually
count them in this example, so it's probably not right).
Notice that this can be longer than an entire HTTP request
if the client pipelined more than one HTTP request (though
most HTTP clients don't currently do this; one exception is
http://www.daemonology.net/phttpget/).
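A hypothetical sketch of building that header line with the byte count computed rather than typed in; `make_http_transfer_header` is an invented name, and how AW actually writes the message and attaches the socket is not shown. Counted this way, the bytes after the "\n" in the example come to 75, so the 54 there is indeed only a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "http <pending_len> <ip>\n" into out; returns its length or -1 if it doesn't fit. */
int make_http_transfer_header(char *out, size_t cap, const char *ip, size_t pending_len) {
    assert(out != NULL && ip != NULL);
    int n = snprintf(out, cap, "http %zu %s\n", pending_len, ip);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Demo: the header for the example message, with the byte count computed. */
const char *demo_header(void) {
    static char hdr[64];
    const char *pending = "POST /test HTTP/1.1\r\nHost: hoytech.com\r\n"
                          "Content-Length: 5\r\n\r\n12345GET / HTT";
    return make_http_transfer_header(hdr, sizeof hdr, "127.0.0.1", strlen(pending)) < 0
               ? NULL
               : hdr;
}
```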

Hope this helps,

Doug
