keeping 'processors' alive


Julius Chrobak

unread,
Aug 21, 2011, 5:06:54 PM8/21/11
to bandicoot
hi all,

I've finished another branch called 'processor'. Let me describe in
more detail what the changes are about.

In bandicoot, there are three major components:
a) executor
b) volume
c) transaction manager.

More about the interaction among these components is written on the
following blog:
http://bandilab.org/private/blog/2011-06-22.Ostap.Running_on_Multiple_Nodes.html.

For the purpose of this branch I'll only describe in short how the
executor works. The executor is responsible for accepting HTTP
requests, performing the calculations, and providing the results back
to the caller. The requests are accepted by a main process. The
calculation is then always executed by a new process, called a
"processor", and the results are sent back to the client through the
main process. A processor is also responsible for starting a
transaction and exchanging data with the volume.

This approach has some clear benefits (e.g. complete segregation of
memory, easy error handling). However, the v3 version has the following
inefficiencies, which are the subject of the 'processor' branch:
1) for each request coming from a client, a new processor is
created as a separate process
2) each time, the processor needs to establish TCP/IP connections
with the rest of the system (transaction manager, volume, and back to
the executor)
3) if a client closes the connection before the execution is
finished, the processor does not know about it and finishes the
execution anyway
4) the response to the HTTP request is provided in chunked
encoding, but all the data is always transferred in one chunk.

Here is a short description of how those points are addressed in the
branch:

1) a constant number of processors, 8, is created on start-up. The
main process simply passes the data between the client and the
processors, behaving as a proxy. If a processor dies, the current
transaction is rolled back and the main process creates a new
processor to serve the next request.

2) all the TCP/IP connections are established by each processor
immediately on start-up and are reused across requests.

3) if the main process recognizes that a client closed the connection,
it immediately kills the appropriate processor and creates a new
one. This automatically triggers a rollback of the changes unless the
commit has already happened; in that case, the client is simply not
notified about the successful completion of the request.

4) the response data is now split into HTTP chunks with a maximum size
of 64KB.


I hope I managed to describe the change in a reasonable form. The
branch is ready for review and your comments.

Ostap Cherkashin

unread,
Sep 12, 2011, 4:19:10 PM9/12/11
to band...@googlegroups.com
Great changes! I had a closer look at the implementation details and I think there are a couple more improvements you introduced which are worth mentioning. Both come from #1 and #2 below. Since no new processes are started for each transaction, transaction execution performance improves (by ~4ms on my laptop). The next big step would be to avoid calling env_init for each transaction, but that's something for the future. Also, problem #2 had some really bad side effects (discussed in a different thread): under heavy load there were way too many sockets left in the TIME_WAIT state, which could cause processor hangs (while running out of resources). Now it is all gone.

Please see my comments below.

- Ostap
---

1.) sys_proxy reads data in one format (byte stream) but replies in another (chunked), and the http_* methods now also read in one format and write in another. It would be great to unify this, as the complexity has already resulted in one error in exec_thread(): responding with http_500() directly to the client rather than via sys_proxy (a client expects the byte stream rather than a chunk).

2.) It was there before (as a TODO) but while reading the code I got an idea how to fix it. The problem is that a transaction is committed prior to writing the last data chunk to a client socket. The later this is performed the better, since there is a good chance that a client will be able to close the socket and the transaction would get cancelled. I think it is just a matter of moving tx_commit() after the http_term() in processor(), and then performing the following changes in the exec_thread:

- 128 /* HTTP 500 only if no data was sent to the client */
- 129 if (cnt == 0)
+ 129 if (pio->err && !cio->err)
130 status = http_500(cio);

3.) sys_proxy uses a magic number (18), would be nice to document it.

4.) too many braces in chunk_recv():
---
103 if ((sys_readn(io, &sz, sizeof(sz)) <= 0))
---

Julius Chrobak

unread,
Sep 25, 2011, 12:57:29 PM9/25/11
to band...@googlegroups.com
hoi, thanks for the review. I've changed the code to address your points with the following comments:


1) a new "chunked" network IO was introduced so that the chunked data exchange requirement is not exposed to sys_proxy.

2) I've moved the tx_commit right after the http_200 response in order to delay the commit a little bit. I did not want to move it even further for two reasons:
a) a successful response (http_200, http_chunk or http_term) indicates only a successful transfer to the main bandicoot process, not to the client
b) waiting with the commit until the result is fully sent might introduce a long-running transaction.

In the case of http_500, we need to check the count in order not to send a 500 response when the client has already received some data.

3) fixed.

4) fixed.

Ostap Cherkashin

unread,
Oct 3, 2011, 3:50:53 PM10/3/11
to band...@googlegroups.com
Thanks, the changes look great, and the chunked IO removes quite some complexity from the rest of the code. I was thinking about the exposure of the io->term flag and also the negative chunk size as the terminator indicator. It feels like we can go a bit further with the level of abstraction and make sys_proxy even more transparent to the protocol differences:

1.) essentially we want to reuse the same socket associated with the processor across different client requests. Here is how I imagine it: a socket reuse property is passed as a flag to the sys_socket and sys_connect calls, e.g. something like this:
--- server reuses connections
io = sys_socket(…, REUSE);
while (true) {
cio = sys_accept(io);
sys_read(cio, …);
sys_write(cio, …);
sys_close(cio);
}
--- client reuses connections
IO *io = NULL;
while (true) {
io = sys_connect(…, io, REUSE);
sys_read(io, …);
sys_write(io, …);
sys_close(io);
}
---
then sys_connect and sys_close can send the corresponding terminators if required, but of course this idea is very raw.

2.) indeed, committing a transaction right after http_200 is a great balance between performance and usability (i.e. being able to cancel a transaction while still receiving the function call result).

- Ostap

Julius Chrobak

unread,
Oct 10, 2011, 1:20:56 PM10/10/11
to bandicoot
I've done some additional clean-ups and fixed some bugs. At this
point I consider the branch to be finished. I'm going to do some
long-running tests and, if those are successful, I'll merge the branch
to the master.

The idea about reusing the sockets is definitely something we need to
pursue. However, as per our discussion (unfortunately, outside of this
thread) there are more things to consider when implementing something
like this. Let's put this idea on hold for now. I'm sure we'll come
back to it very soon and implement it in a separate branch.

As always, the branch is ready for anyone to review and comment on it.

Ostap Cherkashin

unread,
Oct 11, 2011, 3:09:28 PM10/11/11
to band...@googlegroups.com
Still, the "implicit setting of the term flag via a write of the negative size" was replaced with a more straightforward termination command, sys_term(). This change reverted pretty much all of the modifications in http.c, which is a good indicator :-) I think the branch is ready to go to the master, and it is finally time for the v4 release candidate!

I did another round of code review and found a couple of small things:

* the comment on sys_proxy regarding the invocation result is confusing ("successful transfer" should be "no errors")
* I think the double buffering introduced in sys_proxy can be reverted
* it would be less error-prone to use something like -1 for the terminator in sys_term
* the MAX_BLOCK size is quite weird (66560)

Julius Chrobak

unread,
Oct 14, 2011, 9:12:40 AM10/14/11
to bandicoot
The 'processor' branch is now merged to the master.

I've addressed the small things:
* the comment on sys_proxy was rewritten
* the last chunk terminator is now indicated by a size equal to -1
* some additional bugs discovered while testing on Linux and
Windows were fixed.

The MAX_BLOCK size stays at 66560, but I added a comment to
config.h to review it later.