maxwel...@gmail.com wrote:
> That's a command I didn't know about - thanks!
>
> My traffic is bursty, rather than steadily high.
> Whenever I manage to execute "ss -tl", on the load balancer or
> the backend, I see all zeros in the Recv-Q column, even if I've
> occupied the Starman workers.
Yeah, if the worker has actually made an accept/accept4 syscall,
that'll move the connection out of the backlog so it won't show
up in Recv-Q.
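
A rough Python sketch of that behaviour, in case it helps. This is not
starman code: the function name, the loopback address and the default
numbers are all made up for illustration. It shows the kernel completing
connections on a listening socket before the server calls accept(), and
each accept() call then pulling one connection out of the queue.

```python
import socket


def queued_connections_demo(backlog=5, clients=3):
    """Connect several clients before the server ever calls accept().

    Returns (number_accepted, queue_empty_afterwards).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(backlog)         # backlog is the argument to listen(2)
    addr = server.getsockname()

    # No accept() has happened yet, but these connects still succeed:
    # the kernel finishes the handshake and parks each connection in
    # the listening socket's accept queue.
    pending = [socket.create_connection(addr, timeout=2) for _ in range(clients)]

    # Each accept() removes exactly one connection from that queue.
    server.settimeout(2)
    accepted = []
    for _ in range(clients):
        conn, _peer = server.accept()
        accepted.append(conn)

    # With the queue drained, another accept() has nothing to return.
    server.settimeout(0.2)
    try:
        extra, _peer = server.accept()
        extra.close()
        queue_empty = False
    except socket.timeout:
        queue_empty = True

    for sock in pending + accepted:
        sock.close()
    server.close()
    return len(accepted), queue_empty
```

On Linux, running "ss -tl" between the connect step and the accept loop
is where I'd expect a non-zero Recv-Q for that listener.
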
> To be sure, I'm not certain that I've occupied exactly five workers
> at the instants I get the 502 errors. Serving the application requests
> involves some rather heavy processing, and some other system limit, such as memory,
> could be coming into play. However, I've never had problems hitting the
> system process limit or having the workers that are running fail due to
> memory problems. So it seems likely to me that Starman closes the
> connection "voluntarily," based on its own decision that it can't serve the
> incoming request.
I doubt it; starman is one-connection-per-process, so it
won't make any accept/accept4 calls while it's doing application
processing. You can `strace -p $PID_OF_WORKER -e accept,accept4`
to be sure.
You should also try using curl to hit the starman instances
directly and bypass nginx.
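
If you'd rather script that check than run curl by hand, here is a
minimal Python sketch using only the standard library. The function name
and the result strings are my own invention, and the host and port are
whatever your starman instance listens on. It makes one request straight
to the backend and reports whether a response arrived or the connection
was closed first.

```python
import http.client
import socket


def probe(host, port, path="/", timeout=30.0):
    """Send one GET directly to a backend and describe what came back."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return str(conn.getresponse().status)
    except http.client.RemoteDisconnected:
        # The peer closed the connection without sending any response.
        # Must be caught before ConnectionError, which it subclasses.
        return "closed before response"
    except socket.timeout:
        return "timed out"
    except ConnectionError as exc:
        # Refused, reset, and similar connection-level failures.
        return f"connection error: {type(exc).__name__}"
    finally:
        conn.close()
```

Calling something like probe("backend.example", 5000) from several
threads at once (hostname and port are placeholders) while the workers
are busy should tell you whether the early close happens without nginx
in the path.
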
> In my nginx error.log on the load balancer, I see the message:
>
> "*6528 upstream prematurely closed connection while reading response header
> from upstream"
If all else fails, perhaps try a simple PSGI app that just calls
"sleep($A_FEW_SECONDS)" before returning to see if you can
simulate it. It may also be a network or firewall problem
between nginx and starman since they're on different boxes.
> Are there other conditions that could cause a connection to be closed
> prematurely?
Maybe workers are crashing/exiting and getting restarted?
> Also, where in the source code should I look for this?
> Starman itself appears to be pure Perl, with no reference to the "backlog"
> parameter that it parses. Which underlying module is doing the heavy
> lifting, and where does it make a decision about returning a 502
> or closing a connection?
Probably somewhere in the Net::Server::* suite which starman
inherits from. I'm not familiar with the code myself, but
the "backlog" should be the parameter passed to the listen(2)
system call, which you can verify via strace.