StreamServer using process-per-connection


Matt Billenstein

May 19, 2025, 10:24:16 PM
to gev...@googlegroups.com
Hi all,

I thought I'd write here about this for reference. I have an app that does
client/server over very long-lived tcp sockets - connections are intended to
last forever and in practice can last many months, only needing to be
re-established if a process on one side is restarted.

Originally, I had a single gevent server process with many clients (dozens) -
and certain types of i/o operations, like sending large files over the socket,
can be cpu-bound - so the server process became the bottleneck.

So, I tried several ways to spawn a process when accepting a connection, but
discarding the event-loop state and other state in the sub-process was always a
problem - until I recently discovered that subprocess.Popen has this pass_fds=()
argument...

Under the hood on Linux, what this does is a fork, then it immediately closes
all file descriptors not in that list, followed by an exec() call which
replaces the forked child's process image - so you get a clean process without
the existing event loop and the files you don't care about. This gave me the
idea that I could simply subprocess.Popen sys.argv[0] with some alternate
arguments while keeping a couple of sockets open between the server process and
what is now a sub-process handling the new client connection. This is of course
pretty heavy if the new-connection rate is high, but for long-lived connections
like these it's acceptable.
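
Roughly, the spawn looks like the sketch below - a simplified illustration, not
the exact code in server.py, and the --client-proc flag is made up for the
example:

import subprocess
import sys

def spawn_handler(client_fd):
    # pass_fds forces close_fds=True: after the fork, every fd NOT listed
    # here is closed before the exec(), so the child starts clean - no
    # inherited event loop, no stray files - with only the fds we hand over.
    return subprocess.Popen(
        [sys.executable, sys.argv[0], '--client-proc', str(client_fd)],
        pass_fds=(client_fd,),
    )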

To communicate between the server proc and the proc handling the client, I
create a socket pair using socket.socketpair(), keeping one end in the server
process and the other in the client-handler process. The client-handler
process handles the heavy-weight messages and proxies the other messages to the
server process [1]. This is fairly elegant, I think; it starts here:

https://github.com/mattbillenstein/salty/blob/master/server.py#L90

Where we accept new connections - and the Popen()'d process runs here:

https://github.com/mattbillenstein/salty/blob/master/client_proc.py

I pass the fds to the new process via the cli (sys.argv) - I thought being
explicit here was easiest.
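
Conceptually the handoff looks like this - again a simplified sketch rather
than the literal code in server.py / client_proc.py:

import socket
import subprocess
import sys

def handle_accept(client_sock):
    # server side: one end of the pair stays here, the other goes to the child
    server_end, child_end = socket.socketpair()
    subprocess.Popen(
        [sys.executable, sys.argv[0], '--client-proc',
         str(child_end.fileno()), str(client_sock.fileno())],
        pass_fds=(child_end.fileno(), client_sock.fileno()),
    )
    child_end.close()      # the child owns its copy of this end now
    client_sock.close()    # and its copy of the client connection too
    return server_end      # the server keeps talking to the child over this

def client_proc_main(argv):
    # child side: rebuild socket objects from the fd numbers on the cli
    to_server = socket.socket(fileno=int(argv[2]))
    to_client = socket.socket(fileno=int(argv[3]))
    # ... handle heavy-weight messages here, proxy the rest to the server ...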

So the picture before was just:

+--------+ +--------+
| server | <---> | client |
+--------+ +--------+

It became:

+--------+ +-------------+ +--------+
| server | <---> | client-proc | <---> | client |
+--------+ +-------------+ +--------+

Where server/client are typically on different machines, but server/client-proc
are always on the same machine.

This entire project implements a config management and deployment system in the
style of saltstack and ansible - stealing ideas from both. The client machines
run a long-lived agent that waits for commands from the server and can make
async requests to the server for various file resources using a custom async
msgpack rpc protocol. I put this behind a chatops-style deployment frontend,
and for small changes I can deploy to dozens or more systems in <10s.
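
For a flavor of the wire format, a length-prefixed msgpack framing looks
roughly like this - the real protocol in salty is its own design, and the field
names below are made up:

import struct

import msgpack

def send_msg(sock, msg):
    payload = msgpack.packb(msg, use_bin_type=True)
    sock.sendall(struct.pack('>I', len(payload)) + payload)

def recv_msg(sock):
    (length,) = struct.unpack('>I', recv_exact(sock, 4))
    return msgpack.unpackb(recv_exact(sock, length), raw=False)

def recv_exact(sock, n):
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('peer closed the connection')
        buf += chunk
    return buf

# e.g. an agent requesting a file, matched to the response by an id it picks:
# send_msg(conn, {'id': 42, 'type': 'get_file', 'path': '/etc/hosts'})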

thx

m


1. https://github.com/mattbillenstein/salty/blob/master/client_proc.py#L76

--
Matt Billenstein
ma...@vazor.com
https://vazor.com

Aleksandar Kordic

May 20, 2025, 1:22:51 AM
to gev...@googlegroups.com
Hi Matt,

Can you describe in more detail why a threadpool is not suitable for CPU-bound work in your project?


Matt Billenstein

May 20, 2025, 2:10:00 AM
to gev...@googlegroups.com
On Tue, May 20, 2025 at 07:22:26AM +0200, Aleksandar Kordic wrote:
> Hi Matt,

Hi Aleksandar,

> Can you describe in more detail why a threadpool is not suitable for CPU-bound
> work in your project?

I think I ruled this out due to the GIL - or at least, with separate processes,
each with its own event loop and its own GIL, they can't contend with one
another by definition; and there are times when giving the whole thing clear
access to multiple cores is desirable.
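
For reference, offloading to gevent's ThreadPool would look something like the
sketch below (the workload is made up) - it keeps the hub responsive, but two
pure-Python CPU-bound tasks still mostly take turns on the GIL, which is what
pushed me toward separate processes:

from gevent.threadpool import ThreadPool

def digest(blob):
    # pure-Python loop: it holds the GIL, so two of these largely serialize
    h = 0
    for b in blob:
        h = (h * 31 + b) & 0xFFFFFFFFFFFFFFFF
    return h

pool = ThreadPool(4)
jobs = [pool.spawn(digest, blob) for blob in (b'x' * 1_000_000, b'y' * 1_000_000)]
print([j.get() for j in jobs])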

In other languages, threads would be the clear choice, and I actually thought
about re-implementing the whole thing in C++ or something, but for my use case -
with keeping the thing as small as possible being a design goal - Python still
wins.

thx

m

Aleksandar Kordic

May 20, 2025, 3:18:49 AM
to gev...@googlegroups.com
Ah, I think this is the point where backend devs move away from Python server code: gevent has all the right scaling abstractions - slightly better than golang's, even - yet the GIL stops Python from utilizing all cores.

What are you planning to do to scale up to more than one server?



Matt Billenstein

May 20, 2025, 3:51:07 AM
to gev...@googlegroups.com
I don't have such plans - I will vertically scale this for the time being. I
could probably orchestrate hundreds of machines with thousands of cores with
this as it is, which is significantly beyond my needs at the moment.

Aleksandar Kordic

May 20, 2025, 4:32:32 AM
to gev...@googlegroups.com
On Tue, May 20, 2025 at 9:51 AM 'Matt Billenstein' via gevent: coroutine-based Python network library <gev...@googlegroups.com> wrote:
> I don't have such plans - I will vertically scale this for the time being. I
> could probably orchestrate hundreds of machines with thousands of cores with
> this as it is, which is significantly beyond my needs at the moment.

What do you think about approaching vertical scaling with the same solution as
horizontal scaling? This is usual for Python deployments: nginx load-balancing
requests across many Python processes.

For example, 16 gevent servers on one machine.

Then, when an imbalance is detected, move a long-running connection to another
server. This can be a simple redirect command.
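
Something as small as this could carry the redirect - purely illustrative,
nothing like it exists in salty today:

import msgpack

def redirect(sock, host, port):
    # hypothetical control message; the agent reconnects to (host, port)
    sock.sendall(msgpack.packb({'type': 'redirect', 'host': host, 'port': port}))
    sock.close()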

Avoiding popen is a great move toward simplicity. Sticking to gevent concepts preserves momentum in future dev and makes future scaling easier.