
Simple TCP proxy


Morten W. Petersen

Jul 27, 2022, 12:08:06 PM
Hi.

I'd like to share with you a recent project, which is a simple TCP proxy
that can stand in front of a TCP server of some sort, queueing requests and
then allowing n number of connections to pass through at a time:

https://github.com/morphex/stp

I'll be developing it further, but the files committed in this tree
seem to be stable:

https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465

I just bombed that code with 700+ requests almost simultaneously, and STP
handled it well.

Regards,

Morten

--
I am https://leavingnorway.info
Videos at https://www.youtube.com/user/TheBlogologue
Twittering at http://twitter.com/blogologue
Blogging at http://blogologue.com
Playing music at https://soundcloud.com/morten-w-petersen
Also playing music and podcasting here:
http://www.mixcloud.com/morten-w-petersen/
On Google+ here https://plus.google.com/107781930037068750156
On Instagram at https://instagram.com/morphexx/

Chris Angelico

Jul 27, 2022, 1:58:51 PM
On Thu, 28 Jul 2022 at 02:15, Morten W. Petersen <mor...@gmail.com> wrote:
>
> Hi.
>
> I'd like to share with you a recent project, which is a simple TCP proxy
> that can stand in front of a TCP server of some sort, queueing requests and
> then allowing n number of connections to pass through at a time:

How's this different from what the networking subsystem already does?
When you listen, you can set a queue length. Can you elaborate?
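For reference, the backlog in question is the argument to listen(); a
minimal sketch, with a hypothetical address and backlog value:

    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 8080))  # hypothetical address/port
    srv.listen(128)              # kernel queues up to ~128 unaccepted connections
    conn, addr = srv.accept()    # pops one connection off that queue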

ChrisA

Morten W. Petersen

Jul 27, 2022, 2:33:06 PM
Hi Chris.

You're thinking of the backlog argument of listen?

Well, STP will accept all connections, but can limit how many of the
accepted connections are active at any given time.

So when I bombed it with hundreds of almost simultaneous connections, all
of them were accepted, but only 25 were actively sending and receiving data
at any given time. First come, first served.

Regards,

Morten
> --
> https://mail.python.org/mailman/listinfo/python-list

Chris Angelico

Jul 27, 2022, 3:55:23 PM
On Thu, 28 Jul 2022 at 04:32, Morten W. Petersen <mor...@gmail.com> wrote:
>
> Hi Chris.
>
> You're thinking of the backlog argument of listen?

Yes, precisely.

> Well, STP will accept all connections, but can limit how many of the accepted connections are active at any given time.
>
> So when I bombed it with hundreds of almost simultaneous connections, all of them were accepted, but only 25 were actively sending and receiving data at any given time. First come, first served.
>

Hmm. Okay. Not sure what the advantage is, but sure.

If the server's capable of handling the total requests-per-minute,
then a queueing system like this should help with burst load, although
I would have thought that the listen backlog would do the same. What
happens if the server actually gets overloaded though? Do connections
get disconnected after appearing connected? What's the disconnect
mode?

BTW, you probably don't want to be using the _thread module - Python
has a threading module which is better suited to this sort of work.
Although you may want to consider asyncio instead, as that has far
lower overhead when working with large numbers of sockets.

ChrisA

Martin Di Paola

Jul 27, 2022, 4:58:42 PM

On Wed, Jul 27, 2022 at 08:32:31PM +0200, Morten W. Petersen wrote:
>You're thinking of the backlog argument of listen?

From my understanding, yes, when you set up the "accepter" socket (the
one that you use to listen and accept new connections), you can define
the length of the queue for incoming connections that are not accepted
yet.

This will be the equivalent of your SimpleQueue, which basically puts a
limit on how many incoming connections are "accepted" to do real work.

Using skt.listen(N), the incoming connections are put on hold by the OS,
while in your implementation they are formally accepted but not allowed
to do any meaningful work: they are put on the SimpleQueue, and only
when they are popped do they work (send/recv data).

The difference then between the OS and your implementation is minimal.
The only case I can think of is that on the client side there may be a
timeout for the acceptance of the connection; your proxy server will
eagerly accept these connections, so no timeout is possible.(*)

On a side note, your implementation is too thread-naive: it uses plain
Python lists, integers and boolean variables, which are not thread safe.
It is a matter of time until your server starts to behave weirdly.

One option is to use thread-safe objects. I'd encourage you to read
about thread-safety in general and then about which sync mechanisms
Python offers.

Another option is to remove the SimpleQueue and the background function
that allows a connection to be "active".

If you think about it, the handlers are 99% independent, except that you
want to allow only N of them to progress (establish and forward the
connection), and when a handler finishes, another "waiting" handler is
activated, "in a queue fashion" as you said.

If you allow me to not have a strict queue discipline here, you can achieve
the same results by coordinating the handlers using semaphores. Once again,
take this email as a starting point for your own research.
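A minimal sketch of the semaphore idea (names are hypothetical, and
forward() stands in for the actual send/recv proxying):

    import threading

    MAX_ACTIVE = 25
    active = threading.BoundedSemaphore(MAX_ACTIVE)

    def handler(client_sock):
        with active:               # blocks until one of the MAX_ACTIVE slots frees up
            forward(client_sock)   # hypothetical: the actual send/recv work
        client_sock.close()

Each handler thread simply blocks on the acquire instead of polling, and
releasing the semaphore wakes the next waiter.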

On a second side note, the use of handlers and threads is inefficient:
while you have N active handlers sending/receiving data, you are eagerly
accepting new connections, so many more handlers will be created, and
(if I'm not wrong) each will be a thread.

A more efficient solution could be

1) accept as many connections as you can, saving the socket (not the
handler) in the thread-safe queue.
2) have N threads in the background popping a socket from the queue and
then doing the send/recv work. When a thread is done, it closes the
socket and pops another from the queue.

So the queue length will be the count of accepted connections, but at any
moment your proxy will not activate (forward) more than N connections.

This idea is thread-safe, simpler, efficient and has the queue
discipline (I leave aside the usefulness).
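A minimal sketch of this design, assuming a hypothetical forward() that
does the send/recv work:

    import queue
    import socket
    import threading

    N = 25
    pending = queue.Queue()          # thread-safe queue of accepted sockets

    def worker():
        while True:
            sock = pending.get()     # blocks until a socket is available
            try:
                forward(sock)        # hypothetical send/recv loop
            finally:
                sock.close()

    for _ in range(N):
        threading.Thread(target=worker, daemon=True).start()

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 8080))      # hypothetical address/port
    srv.listen(128)
    while True:
        conn, _addr = srv.accept()   # accept eagerly...
        pending.put(conn)            # ...but at most N forward at a time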

I encourage you to take time to read about the different things
mentioned, as concurrency and thread-related stuff is not easy to
master.

Thanks,
Martin.

(*) make your proxy server slow enough and yes, you will get timeouts
anyways.

>
>Well, STP will accept all connections, but can limit how many of the
>accepted connections that are active at any given time.
>
>So when I bombed it with hundreds of almost simultaneous connections, all
>of them were accepted, but only 25 were actively sending and receiving data
>at any given time. First come, first served.
>
>Regards,
>
>Morten
>
>On Wed, Jul 27, 2022 at 8:00 PM Chris Angelico <ros...@gmail.com> wrote:
>
>--
>https://mail.python.org/mailman/listinfo/python-list

Barry

Jul 28, 2022, 2:32:22 AM


> On 27 Jul 2022, at 17:16, Morten W. Petersen <mor...@gmail.com> wrote:
>
> Hi.
>
> I'd like to share with you a recent project, which is a simple TCP proxy
> that can stand in front of a TCP server of some sort, queueing requests and
> then allowing n number of connections to pass through at a time:
>
> https://github.com/morphex/stp
>
> I'll be developing it further, but the files committed in this tree
> seem to be stable:
>
> https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465
>
> I just bombed that code with 700+ requests almost simultaneously, and STP
> handled it well.

What is the problem that this solves?

Why not just increase the allowed size of the socket listen backlog if you just want to handle bursts of traffic.

I do not think of this as a proxy, rather a tunnel.
And the tunnel is a lot more expensive than having the kernel keep the
connection in the listen socket backlog.

I work on a web proxy written in Python that handles huge load,
using the backlog for the bursts.

It’s async, using twisted, as threads are not practical at scale.

Barry

>
> Regards,
>
> Morten
> https://mail.python.org/mailman/listinfo/python-list
>

Morten W. Petersen

Jul 28, 2022, 5:15:18 AM
OK, I'll have a look at using something other than _thread.

I quickly saw a couple of points where the code could be optimized for
speed; the loop that transfers data back and forth also has low throughput.
But the first priority was getting it working and seeing that it is fairly
stable.



On Wed, Jul 27, 2022 at 9:57 PM Chris Angelico <ros...@gmail.com> wrote:

> On Thu, 28 Jul 2022 at 04:32, Morten W. Petersen <mor...@gmail.com>
> wrote:
> >
> > Hi Chris.
> >
> > You're thinking of the backlog argument of listen?
>
> Yes, precisely.
>
> > Well, STP will accept all connections, but can limit how many of the
> accepted connections are active at any given time.
> >
> > So when I bombed it with hundreds of almost simultaneous connections,
> all of them were accepted, but only 25 were actively sending and receiving
> data at any given time. First come, first served.
> >
>
> Hmm. Okay. Not sure what the advantage is, but sure.
>
> If the server's capable of handling the total requests-per-minute,
> then a queueing system like this should help with burst load, although
> I would have thought that the listen backlog would do the same. What
> happens if the server actually gets overloaded though? Do connections
> get disconnected after appearing connected? What's the disconnect
> mode?
>
> BTW, you probably don't want to be using the _thread module - Python
> has a threading module which is better suited to this sort of work.
> Although you may want to consider asyncio instead, as that has far
> lower overhead when working with large numbers of sockets.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list

Morten W. Petersen

Jul 28, 2022, 5:32:03 AM
Hi Barry.

Well, I can agree that using backlog is an option for handling bursts. But
what if that backlog number is exceeded? How easy is it to deal with such
a situation?

I just cloned twisted, and compared the size:

morphex@morphex-Latitude-E4310:~$ du -s stp; du -s tmp/twisted/
464 stp
98520 tmp/twisted/
morphex@morphex-Latitude-E4310:~$ du -sh stp/LICENSE
36K stp/LICENSE

>>> 464/98520.0
0.004709703613479496
>>>

It's quite easy to get an idea of what's going on in STP; if something
goes wrong in Twisted, the size of the codebase makes that much harder.
I used to use emacs a lot, but then I came into a period where it was more
practical to use nano, and I mostly use nano now, unless I need to, for
example, search and replace or something like that.

-Morten

On Thu, Jul 28, 2022 at 8:31 AM Barry <ba...@barrys-emacs.org> wrote:

>
>
> > On 27 Jul 2022, at 17:16, Morten W. Petersen <mor...@gmail.com> wrote:
> >
> > Hi.
> >
> > I'd like to share with you a recent project, which is a simple TCP proxy
> > that can stand in front of a TCP server of some sort, queueing requests
> and
> > then allowing n number of connections to pass through at a time:
> >
> > https://github.com/morphex/stp
> >
> > I'll be developing it further, but the files committed in this tree
> > seem to be stable:
> >
> >
> https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465
> >
> > I just bombed that code with 700+ requests almost simultaneously, and STP
> > handled it well.
>
> What is the problem that this solves?
>
> Why not just increase the allowed size of the socket listen backlog if you
> just want to handle bursts of traffic.
>
> I do not think of this as a proxy, rather a tunnel.
> And the tunnel is a lot more expensive than having the kernel keep the
> connection in
> the listen socket backlog.
>
> I work on a web proxy written in Python that handles huge load,
> using the backlog for the bursts.
>
> It’s async, using twisted, as threads are not practical at scale.
>
> Barry
>
> >
> > Regards,
> >
> > Morten
> >
> > --
> > I am https://leavingnorway.info
> > Videos at https://www.youtube.com/user/TheBlogologue
> > Twittering at http://twitter.com/blogologue
> > Blogging at http://blogologue.com
> > Playing music at https://soundcloud.com/morten-w-petersen
> > Also playing music and podcasting here:
> > http://www.mixcloud.com/morten-w-petersen/

Morten W. Petersen

Jul 28, 2022, 5:40:16 AM
Hi Martin.

I was thinking of doing something with the handle function, but just this
little tweak:

https://github.com/morphex/stp/commit/9910ca8c80e9d150222b680a4967e53f0457b465

made a huge difference in CPU usage. Hundreds of waiting sockets are now
using 20-30% of CPU instead of 10x that. So for example making the handle
function exit / stop and wait isn't necessary at this point. It also opens
up the possibility of sending a noop that is appropriate for the given
protocol.

I've not done a lot of thread programming before, but yes, locks can be
used and will be used if necessary. I wasn't sure which data types were
thread safe in Python, and it might be acceptable for some variables to be
off by 1 or more, if using <= or >= checks is an option and there is no
risk of the variable containing "garbage".
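For what it's worth, the standard way to keep such a counter safe is a
lock; a tiny sketch (names hypothetical):

    import threading

    lock = threading.Lock()
    active_count = 0

    def connection_started():
        global active_count
        with lock:               # += alone is a read-modify-write race
            active_count += 1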

I think a simple focus, with the project aimed at one task, will
make it easier to manage even complex matters such as concurrency and
threads.

-Morten

On Wed, Jul 27, 2022 at 11:00 PM Martin Di Paola <martinp...@gmail.com>
wrote:

>
> On Wed, Jul 27, 2022 at 08:32:31PM +0200, Morten W. Petersen wrote:
> >You're thinking of the backlog argument of listen?
>
> >Well, STP will accept all connections, but can limit how many of the
> >accepted connections are active at any given time.
> >
> >So when I bombed it with hundreds of almost simultaneous connections, all
> >of them were accepted, but only 25 were actively sending and receiving
> data
> >at any given time. First come, first served.
> >
> >Regards,
> >
> >Morten
> >
> >On Wed, Jul 27, 2022 at 8:00 PM Chris Angelico <ros...@gmail.com> wrote:
> >
> >> On Thu, 28 Jul 2022 at 02:15, Morten W. Petersen <mor...@gmail.com>
> >> wrote:
> >> >
> >> > Hi.
> >> >
> >> > I'd like to share with you a recent project, which is a simple TCP
> proxy
> >> > that can stand in front of a TCP server of some sort, queueing
> requests
> >> and
> >> > then allowing n number of connections to pass through at a time:
> >>
> >> How's this different from what the networking subsystem already does?
> >> When you listen, you can set a queue length. Can you elaborate?
> >>
> >> ChrisA
> >--
> >
> >I am https://leavingnorway.info
> >
> >Videos at https://www.youtube.com/user/TheBlogologue
> >Twittering at http://twitter.com/blogologue
> >
> >Blogging at http://blogologue.com
> >Playing music at https://soundcloud.com/morten-w-petersen
> >
> >Also playing music and podcasting here:
> >http://www.mixcloud.com/morten-w-petersen/
> >

Chris Angelico

Jul 28, 2022, 5:43:35 AM
On Thu, 28 Jul 2022 at 19:41, Morten W. Petersen <mor...@gmail.com> wrote:
>
> Hi Martin.
>
> I was thinking of doing something with the handle function, but just this
> little tweak:
>
> https://github.com/morphex/stp/commit/9910ca8c80e9d150222b680a4967e53f0457b465
>
> made a huge difference in CPU usage. Hundreds of waiting sockets are now
> using 20-30% of CPU instead of 10x that.

.... wait, what?

Why do waiting sockets consume *any* measurable amount of CPU? Why
don't the threads simply block until it's time to do something?

ChrisA

Paul Rubin

Jul 28, 2022, 5:46:31 AM
"Morten W. Petersen" <mor...@gmail.com> writes:
> I quickly saw a couple of points where code could be optimized for speed,
> the loop that transfers data back and forth also has low throughput, but
> first priority was getting it working and seeing that it is fairly stable.

Well, I think the idea was to avoid using _thread, which is a low-level
interface to POSIX threads. "threading" is a higher-level wrapper that
you should use instead. The alternative is asyncio which is likely to
be faster, and some people prefer it, but I've personally always found
threading to be more conceptually straightforward. I haven't had
trouble handling a few thousand connections with threads (2 threads per
connection) on a reasonable sized machine. If speed is that big an
issue you probably should consider alternatives to Python.

Morten W. Petersen

Jul 28, 2022, 7:01:46 AM
Well, I was thinking of following the socketserver / handle layout of code
and execution, for now anyway.

It wouldn't be a big deal to make them block, but another option is to
increase the sleep period 100% for every 200 waiting connections while
waiting in handle.

Another thing is that it's nice to see Python handling 500+ threads without
problems. :)

-Morten

Chris Angelico

Jul 28, 2022, 8:29:53 AM
On Thu, 28 Jul 2022 at 21:01, Morten W. Petersen <mor...@gmail.com> wrote:
>
> Well, I was thinking of following the socketserver / handle layout of code and execution, for now anyway.
>
> It wouldn't be a big deal to make them block, but another option is to increase the sleep period 100% for every 200 waiting connections while waiting in handle.

Easy denial-of-service attack then. Spam connections and the queue
starts blocking hard. The sleep loop seems like a rather inefficient
way to do things.

> Another thing is that it's nice to see Python handling 500+ threads without problems. :)

Yeah, well, that's not all THAT many threads, ultimately :)

ChrisA

Barry

Jul 28, 2022, 8:30:03 AM


> On 28 Jul 2022, at 10:31, Morten W. Petersen <mor...@gmail.com> wrote:
>
> 
> Hi Barry.
>
> Well, I can agree that using backlog is an option for handling bursts. But what if that backlog number is exceeded? How easy is it to deal with such a situation?

You can make backlog very large, if that makes sense.
But at some point you will be forced to reject connections,
once you cannot keep up with the average rate of connections.


>
> I just cloned twisted, and compared the size:
>
> morphex@morphex-Latitude-E4310:~$ du -s stp; du -s tmp/twisted/
> 464 stp
> 98520 tmp/twisted/
> morphex@morphex-Latitude-E4310:~$ du -sh stp/LICENSE
> 36K stp/LICENSE
>
> >>> 464/98520.0
> 0.004709703613479496
> >>>
>
> It's quite easy to get an idea of what's going on in STP; if something goes wrong in Twisted, the size of the codebase makes that much harder. I used to use emacs a lot, but then I came into a period where it was more practical to use nano, and I mostly use nano now, unless I need to, for example, search and replace or something like that.

I mentioned twisted for context. Depending on your needs, the built-in Python 3 async support may well be sufficient. Using threads is not scalable.

In the places I code, disk space of a few MiB is not an issue.

Barry

>
> -Morten
>
>> On Thu, Jul 28, 2022 at 8:31 AM Barry <ba...@barrys-emacs.org> wrote:
>>
>>
>> > On 27 Jul 2022, at 17:16, Morten W. Petersen <mor...@gmail.com> wrote:
>> >
>> > Hi.
>> >
>> > I'd like to share with you a recent project, which is a simple TCP proxy
>> > that can stand in front of a TCP server of some sort, queueing requests and
>> > then allowing n number of connections to pass through at a time:
>> >
>> > https://github.com/morphex/stp
>> >
>> > I'll be developing it further, but the files committed in this tree
>> > seem to be stable:
>> >
>> > https://github.com/morphex/stp/tree/9910ca8c80e9d150222b680a4967e53f0457b465
>> >
>> > I just bombed that code with 700+ requests almost simultaneously, and STP
>> > handled it well.
>>
>> What is the problem that this solves?
>>
>> Why not just increase the allowed size of the socket listen backlog if you just want to handle bursts of traffic.
>>
>> I do not think of this as a proxy, rather a tunnel.
>> And the tunnel is a lot more expensive than having the kernel keep the connection in
>> the listen socket backlog.
>>
>> I work on a web proxy written in Python that handles huge load,
>> using the backlog for the bursts.
>>
>> It’s async, using twisted, as threads are not practical at scale.
>>
>> Barry
>>
>> >
>> > Regards,
>> >
>> > Morten
>> >

Paul Rubin

Jul 28, 2022, 12:43:59 PM
"Morten W. Petersen" <mor...@gmail.com> writes:
> Well, I was thinking of following the socketserver / handle layout of
> code and execution, for now anyway. It wouldn't be a big deal to make
> them block,

Yes, use socketserver.ThreadingTCPServer and it handles the threads and
blocking automatically. I don't even see how to make them not block.
The whole idea of using threads is to have the individual tasks block as
if they are synchronous, and let the OS handle the messy stuff of
blocking and unblocking. I am somewhat curious as to what you are doing
so that the threads burn cpu instead of blocking.
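A minimal sketch of that approach (hypothetical address/port; the handler
here just echoes, where a proxy would forward):

    import socketserver

    class EchoHandler(socketserver.BaseRequestHandler):
        def handle(self):
            # self.request is the client socket; recv() blocks in the OS,
            # so the thread consumes no CPU while waiting.
            data = self.request.recv(4096)
            self.request.sendall(data)

    with socketserver.ThreadingTCPServer(("0.0.0.0", 8080), EchoHandler) as srv:
        srv.serve_forever()   # one thread per connection, no polling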

Morten W. Petersen

Jul 28, 2022, 5:23:15 PM
Forwarding to the list as well.

---------- Forwarded message ---------
From: Morten W. Petersen <mor...@gmail.com>
Date: Thu, Jul 28, 2022 at 11:22 PM
Subject: Re: Simple TCP proxy
To: Chris Angelico <ros...@gmail.com>


Well, an increase from 0.1 seconds to 0.2 seconds in the interval at which
each thread "polls" whether its connection should become active doesn't
seem like a big deal.

And there's also some point where it is pointless to accept more
connections, and where maybe remedies like accepting known good IPs,
blocking IPs / IP blocks with more than 3 connections etc. should be
considered.

I think I'll be getting closer than most applications to an eventual
ceiling for how many threads Python can handle, and that's interesting and
could be beneficial for Python as well.

-Morten

On Thu, Jul 28, 2022 at 2:31 PM Chris Angelico <ros...@gmail.com> wrote:

> On Thu, 28 Jul 2022 at 21:01, Morten W. Petersen <mor...@gmail.com>
> wrote:
> >
> > Well, I was thinking of following the socketserver / handle layout of
> code and execution, for now anyway.
> >
> > It wouldn't be a big deal to make them block, but another option is to
> increase the sleep period 100% for every 200 waiting connections while
> waiting in handle.
>
> Easy denial-of-service attack then. Spam connections and the queue
> starts blocking hard. The sleep loop seems like a rather inefficient
> way to do things.
>
> > Another thing is that it's nice to see Python handling 500+ threads
> without problems. :)
>
> Yeah, well, that's not all THAT many threads, ultimately :)
>
> ChrisA

Morten W. Petersen

Jul 28, 2022, 5:25:34 PM
Well, it's not just code size in terms of disk space, it is also code
complexity, and the level of knowledge, skill and time it takes to make use
of something.

And if something fails in an unobvious way in Twisted, I imagine that
requires somebody highly skilled, and that costs quite a bit of money. And
people like that might also not always be available.

-Morten
> -Morten
>

Chris Angelico

Jul 28, 2022, 6:09:07 PM
On Fri, 29 Jul 2022 at 07:24, Morten W. Petersen <mor...@gmail.com> wrote:
>
> Forwarding to the list as well.
>
> ---------- Forwarded message ---------
> From: Morten W. Petersen <mor...@gmail.com>
> Date: Thu, Jul 28, 2022 at 11:22 PM
> Subject: Re: Simple TCP proxy
> To: Chris Angelico <ros...@gmail.com>
>
>
> Well, an increase from 0.1 seconds to 0.2 seconds on "polling" in each
> thread whether or not the connection should become active doesn't seem like
> a big deal.

Maybe, but polling *at all* is the problem here. It shouldn't be
hammering the other server. You'll quickly find that there are limits
that simply shouldn't exist, because every connection is trying to
check to see if it's active now. This is *completely unnecessary*.
I'll reiterate the advice given earlier in this thread (of
conversation): Look into the tools available for thread (of execution)
synchronization, such as mutexes (in Python, threading.Lock) and
condition variables (in Python, threading.Condition). A poll interval enforces a
delay before the thread notices that it's active, AND causes inactive
threads to consume CPU, neither of which is a good thing.
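A sketch of the no-polling alternative: each waiting connection parks on a
threading.Event and consumes no CPU until a manager activates it (names
hypothetical):

    import threading

    class PendingConnection:
        def __init__(self, sock):
            self.sock = sock
            self.activated = threading.Event()

        def wait_until_active(self):
            self.activated.wait()   # blocks with zero CPU until set() is called

    # The queue manager would call pending.activated.set() when a slot
    # frees up, waking exactly that thread immediately.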

> And there's also some point where it is pointless to accept more
> connections, and where maybe remedies like accepting known good IPs,
> blocking IPs / IP blocks with more than 3 connections etc. should be
> considered.

Firewalling is its own science. Blocking IPs with too many
simultaneous connections should be decided administratively, not
because your proxy can't handle enough connections.

> I think I'll be getting closer than most applications to an eventual
> ceiling for what Python can handle of threads, and that's interesting and
> could be beneficial for Python as well.

Here's a quick demo of the cost of threads when they're all blocked on
something.

>>> import threading
>>> finish = threading.Condition()
>>> def thrd(cond):
...     with cond: cond.wait()
...
>>> threading.active_count() # Main thread only
1
>>> import time
>>> def spawn(n):
...     start = time.monotonic()
...     for _ in range(n):
...         t = threading.Thread(target=thrd, args=(finish,))
...         t.start()
...     print("Spawned", n, "threads in", time.monotonic() - start, "seconds")
...
>>> spawn(10000)
Spawned 10000 threads in 7.548425202025101 seconds
>>> threading.active_count()
10001
>>> with finish: finish.notify_all()
...
>>> threading.active_count()
1

It takes a bit of time to start ten thousand threads, but after that,
the system is completely idle again until I notify them all and they
shut down.

(Interestingly, it takes four times as long to start 20,000 threads,
suggesting that something in thread spawning has O(n²) cost. Still,
even that leaves the system completely idle once it's done spawning
them.)

If your proxy can handle 20,000 threads, I would be astonished. And
this isn't even close to a thread limit.

Obviously the cost is different if the threads are all doing things,
but if you have thousands of active socket connections, you'll start
finding that there are limitations in quite a few places, depending on
how much traffic is going through them. Ultimately, yes, you will find
that threads restrict you and asynchronous I/O is the only option; but
you can take threads a fairly long way before they are the limiting
factor.

ChrisA

Andrew MacIntyre

Jul 28, 2022, 9:41:49 PM
On 29/07/2022 8:08 am, Chris Angelico wrote:
> It takes a bit of time to start ten thousand threads, but after that,
> the system is completely idle again until I notify them all and they
> shut down.
>
> (Interestingly, it takes four times as long to start 20,000 threads,
> suggesting that something in thread spawning has O(n²) cost. Still,
> even that leaves the system completely idle once it's done spawning
> them.)

Another cost of threads can be memory allocated as thread stack space,
the default size of which varies by OS (see e.g.
https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/).

threading.stack_size() can be used to check and perhaps adjust the
allocation size.
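For example (the 256 KiB figure is arbitrary, and some_function is a
hypothetical target; the call must come before the threads are started):

    import threading

    threading.stack_size(256 * 1024)             # applies to threads started after this call
    t = threading.Thread(target=some_function)   # hypothetical worker
    t.start()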

--
-------------------------------------------------------------------------
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: and...@pcug.org.au (pref) | Snail: PO Box 370
and...@bullseye.apana.org.au (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia

Chris Angelico

Jul 28, 2022, 9:48:48 PM
On Fri, 29 Jul 2022 at 11:42, Andrew MacIntyre <and...@pcug.org.au> wrote:
>
> On 29/07/2022 8:08 am, Chris Angelico wrote:
> > It takes a bit of time to start ten thousand threads, but after that,
> > the system is completely idle again until I notify them all and they
> > shut down.
> >
> > (Interestingly, it takes four times as long to start 20,000 threads,
> > suggesting that something in thread spawning has O(n²) cost. Still,
> > even that leaves the system completely idle once it's done spawning
> > them.)
>
> Another cost of threads can be memory allocated as thread stack space,
> the default size of which varies by OS (see e.g.
> https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/).
>
> threading.stack_size() can be used to check and perhaps adjust the
> allocation size.
>

Yeah, they do have quite a few costs, and a naive approach of "give a
thread to every client", while very convenient, will end up limiting
throughput. (But I'll be honest: I still have a server that's built on
exactly that model, because it's much much safer than risking one
client stalling out the whole server due to a small bug. But that's a
MUD server.) Thing is, though, it'll most likely limit throughput to
something in the order of thousands of concurrent connections (or
thousands per second if it's something like HTTP where they tend to
get closed again), maybe tens of thousands. So if you have something
where every thread needs its own database connection, well, you're
gonna have database throughput problems WAY before you actually run
into thread count limitations!

ChrisA

Paul Rubin

Jul 29, 2022, 7:32:34 AM
Chris Angelico <ros...@gmail.com> writes:
> If your proxy can handle 20,000 threads, I would be astonished. And
> this isn't even close to a thread limit.

I believe I've done that or something pretty close to it. One thing I
discovered was that select() stops working once you have 1024 sockets,
so I had to switch to epoll. Another was that there seemed to be a hard
limit of around 20k or 30k ports per IP address, rather than the 64k you
would expect. It took some GB of memory to have that many threads, but
that isn't a big deal on reasonably modern hardware.
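For what it's worth, Python's selectors module sidesteps the select()
limit by picking the best mechanism the platform offers (output shown
for Linux):

    >>> import selectors
    >>> sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS
    >>> type(sel)
    <class 'selectors.EpollSelector'>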

OS threads aren't that great for this amount of concurrency, but async
is just a horrible code smell imho. If I needed that many connections
I'd consider writing in Erlang/Elixir or maybe Go, instead of CPython.

It would be great if we could someday have a Python implementation that
worked the way Erlang does, with lightweight processes that look like
real processes to the Python app.

Morten W. Petersen

unread,
Jul 29, 2022, 2:46:47 PM7/29/22
to
OK, that's useful to know. Thanks. :)

-Morten

On Fri, Jul 29, 2022 at 3:43 AM Andrew MacIntyre <and...@pcug.org.au>
wrote:

> On 29/07/2022 8:08 am, Chris Angelico wrote:
> > It takes a bit of time to start ten thousand threads, but after that,
> > the system is completely idle again until I notify them all and they
> > shut down.
> >
> > (Interestingly, it takes four times as long to start 20,000 threads,
> > suggesting that something in thread spawning has O(n²) cost. Still,
> > even that leaves the system completely idle once it's done spawning
> > them.)
>
> Another cost of threads can be memory allocated as thread stack space,
> the default size of which varies by OS (see e.g.
>
> https://ariadne.space/2021/06/25/understanding-thread-stack-sizes-and-how-alpine-is-different/
> ).
>
> threading.stack_size() can be used to check and perhaps adjust the
> allocation size.
>
> --
> -------------------------------------------------------------------------
> Andrew I MacIntyre "These thoughts are mine alone..."
> E-mail: and...@pcug.org.au (pref) | Snail: PO Box 370
> and...@bullseye.apana.org.au (alt) | Belconnen ACT 2616
> Web: http://www.andymac.org/ | Australia

Morten W. Petersen

Jul 29, 2022, 2:54:37 PM
OK.

Well, I've worked with web hosting in the past, and proxies like squid were
used to lessen the load on dynamic backends. There was also a website we
had, opensourcearticles.com, with Firefox and Thunderbird articles etc.,
that got quite a bit of traffic.

IIRC, that website was mostly static with some dynamic bits and heavily
cached by squid.

Most websites don't get a lot of traffic though, and don't have a big
budget for "website system administration". So maybe that's where I'm
partly going with this, just making a proxy that can be put in front and
deal with a lot of common situations, in a reasonably good way.

If I run into problems with threads that can't be managed, then a switch to
something like the queue_manager function, which holds the data, with
functions that manage the data and connections, is an option.

-Morten
> It takes a bit of time to start ten thousand threads, but after that,
> the system is completely idle again until I notify them all and they
> shut down.
>
> (Interestingly, it takes four times as long to start 20,000 threads,
> suggesting that something in thread spawning has O(n²) cost. Still,
> even that leaves the system completely idle once it's done spawning
> them.)
>
> If your proxy can handle 20,000 threads, I would be astonished. And
> this isn't even close to a thread limit.
>
> Obviously the cost is different if the threads are all doing things,
> but if you have thousands of active socket connections, you'll start
> finding that there are limitations in quite a few places, depending on
> how much traffic is going through them. Ultimately, yes, you will find
> that threads restrict you and asynchronous I/O is the only option; but
> you can take threads a fairly long way before they are the limiting
> factor.
>
> ChrisA

Chris Angelico

Jul 29, 2022, 4:44:09 PM
On Sat, 30 Jul 2022 at 04:54, Morten W. Petersen <mor...@gmail.com> wrote:
>
> OK.
>
> Well, I've worked with web hosting in the past, and proxies like squid were used to lessen the load on dynamic backends. There was also a website opensourcearticles.com that we had with Firefox, Thunderbird articles etc. that got quite a bit of traffic.
>
> IIRC, that website was mostly static with some dynamic bits and heavily cached by squid.

Yep, and squid almost certainly won't have a thread for every incoming
connection, spinning and waiting for the back end server. But squid
does a LOT more than simply queue connections - it'll be inspecting
headers and retaining a cache of static content, so it's not really
comparable.

> Most websites don't get a lot of traffic though, and don't have a big budget for "website system administration". So maybe that's where I'm partly going with this, just making a proxy that can be put in front and deal with a lot of common situations, in a reasonably good way.
>
> If I run into problems with threads that can't be managed, then a switch to something like the queue_manager function which has data and then functions that manage the data and connections is an option.
>

I'll be quite frank with you: this is not production-quality code. It
should not be deployed by anyone who doesn't have a big budget for
"website system administration *training*". This code is good as a
tool for YOU to learn how these things work; it shouldn't be a tool
for anyone who actually has server load issues.

I'm sorry if that sounds harsh, but the fact is, you can do a lot
better by using this to learn more about networking than you'll ever
do by trying to pitch it to any specific company.

That said though: it's still good to know what your (theoretical)
use-case is. That'll tell you what kinds of connection spam to throw
at your proxy (lots of idle sockets? lots of HTTP requests? billions
of half open TCP connections?) to see what it can cope with.

Keep on playing with this code. There's a lot you can gain from it, still.

ChrisA

Morten W. Petersen

Jul 29, 2022, 5:00:24 PM
OK, sounds like sunshine is getting the best of you.

It's working with a pretty heavy load, I see ways of solving potential
problems that haven't become a problem yet, and I'm enjoying it.

Maybe you should tone down the coaching until someone asks for it.

Regards,

Morten

Roel Schroeven

Jul 30, 2022, 6:57:21 AM
Morten W. Petersen schreef op 29/07/2022 om 22:59:
> OK, sounds like sunshine is getting the best of you.
It has to be said: that is uncalled for.

Chris gave you good advice, with the best of intentions. Sometimes we
don't like good advice if it says something we don't like, but that's no
reason to take it out on the messenger.

--
"Iceland is the place you go to remind yourself that planet Earth is a
machine... and that all organic life that has ever existed amounts to a greasy
film that has survived on the exterior of that machine thanks to furious
improvisation."
-- Sam Hughes, Ra

Barry Scott

Jul 30, 2022, 8:29:51 AM
Morten,

As Chris remarked, you need to learn a number of networking, Python, system
performance and other skills to turn your project into production code.

Using threads does not scale very well. It uses a lot of memory and raises
CPU use just to do the context switches. Also, the GIL means that even if
you are doing blocking I/O, the use of threads does not scale well.

It's rare to see multi-threaded code; rather, what you see is code that uses async I/O.

At its heart, async code at the low level is using a kernel interface like
epoll (or on old systems select). What epoll allows you to do is wait on a
set of FDs for a range of I/O operations: ready to read, ready to write,
and other activity (like the socket closing).

You could write code to use epoll yourself, but while fun to write, you
need to know a lot about networking and Linux to cover all the corner cases.

Libraries like twisted, trio, uvloop and Python's selectors implement
production-quality versions of the required code with good APIs.

Do not judge these libraries by their size. They are not bloated, and are
only as complex as the problem they are solving requires.

There is a simple example of async code using the Python selectors module
here that shows the style of programming:
https://docs.python.org/3/library/selectors.html#examples
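In the spirit of that docs example, a minimal accept-and-echo loop (a
proxy would forward to the other socket instead of echoing; the port is
hypothetical and error handling is omitted):

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(srv):
        conn, _addr = srv.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, read)

    def read(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)       # echo; a proxy would write to the other socket
        else:
            sel.unregister(conn)     # peer closed
            conn.close()

    srv = socket.socket()
    srv.bind(("0.0.0.0", 8080))
    srv.listen(100)
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, accept)

    while True:
        for key, _mask in sel.select():
            key.data(key.fileobj)    # dispatch to accept() or read()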

The issues that you likely need to solve and test for include:
* handling unexpected socket close events.
* buffering and flow control from one socket's read to another socket's write.
What if one side is reading slower than the other is writing?
* timing out sockets that stop sending data, and closing them.

At some point you will exceed the capacity for one process to handle the load.
The solution we used is to listen on the socket in a parent process and fork
enough child processes to handle the I/O load. This avoids issues with the GIL
and allows you to scale.
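A rough sketch of that pre-fork pattern (POSIX only; the worker count and
handle() are hypothetical):

    import os
    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 8080))
    srv.listen(1000)

    WORKERS = 4                          # hypothetical: one per core
    for _ in range(WORKERS):
        if os.fork() == 0:               # child inherits the listening socket
            while True:
                conn, _addr = srv.accept()   # kernel spreads accepts across children
                handle(conn)             # hypothetical per-connection work

    for _ in range(WORKERS):             # parent just supervises
        os.wait()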

But I am still not sure why you need to do anything more than increase the
backlog on your listen socket in the main app. Set the backlog to
1,000,000; does that fix your issue?

On Linux you will need to change kernel limits to allow that size. See man
listen for info on what you need to change.

Barry

Morten W. Petersen

Jul 30, 2022, 3:31:06 PM
I thought it was a bit much.

I just did a bit more testing, and saw that the throughput of wget through
regular lighttpd was 1.3 GB/s, while through STP it was 122 MB/s, using
quite a bit of CPU.

Then I increased the buffer size 8-fold for reading and writing in run.py,
and the CPU usage went way down, and the transfer speed went up to 449 MB/s.

So it would require well more than a gigabit network interface to max out
STP throughput; CPU usage was around 30-40% max, on one processor core.
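A sketch of the kind of forwarding loop being tuned here (not STP's exact
code; the buffer size is the knob under discussion):

    BUF_SIZE = 8 * 8192   # the 8-fold increase: fewer syscalls per byte moved

    def pump(src, dst):
        while True:
            data = src.recv(BUF_SIZE)   # bigger reads mean fewer wakeups
            if not data:
                break
            dst.sendall(data)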

There is good enough, and then there's general practice and/or what is
regarded as an elegant solution. I'm looking for good enough, and in the
process I don't mind pushing the envelope on Python threading.

-Morten

On Sat, Jul 30, 2022 at 12:59 PM Roel Schroeven <ro...@roelschroeven.net>
wrote:

Barry

Jul 30, 2022, 4:32:05 PM



> On 30 Jul 2022, at 20:33, Morten W. Petersen <mor...@gmail.com> wrote:
> I thought it was a bit much.
>
> I just did a bit more testing, and saw that the throughput of wget through
> regular lighttpd was 1.3 GB/s, while through STP it was 122 MB/s, using
> quite a bit of CPU.
>
> Then I increased the buffer size 8-fold for reading and writing in run.py,
> and the CPU usage went way down, and the transfer speed went up to 449 MB/s.

You are trading latency for throughput.

>
> So it would require well more than a gigabit network interface to max out
> STP throughput; CPU usage was around 30-40% max, on one processor core.

With how many connections?

>
> There is good enough, and then there's general practice and/or what is
> regarded as an elegant solution. I'm looking for good enough, and in the
> process I don't mind pushing the envelope on Python threading.

You never did answer my query on why a large backlog is not good enough.
Why do you need this program at all?

Barry
> --
> https://mail.python.org/mailman/listinfo/python-list

Morten W. Petersen

Jul 31, 2022, 2:04:04 PM
Well, initially I was just curious.

As the name implies, it's a TCP proxy, and different features could go into
that.

I looked at, for example, port knocking for hindering unauthorized access to
the (protected) TCP service SMPS, but there you also have the possibility
of someone eavesdropping and learning the right handshake, if you will.
So it's something that will work until someone gets determined enough to
make a mess.

In short, it will give better control than backlog does, enabling
Python-style code and logic to deal with different situations.

I was about to say "deal with things intelligently", but I think
"intelligent" is a word that doesn't fit here or in many other applications.

Say, for example, this service comes under attack for unknown reasons; it
could be possible to teach the proxy to only accept connections to the
backend server from IP addresses / subnets that have previously completed
n transmissions back and forth, if you know that the service will have at
most 50 different clients.

Anyway, what Chris said earlier, I think we can file that under "eagerness
to teach others and show what you know". Right Chris? :)

Regards,

Morten