I've been having a rather obnoxious problem with "cannot spawn
child process" errors in my apache logs (1.2.4 and 1.2.0) (around 20,000
- 50,000 a day). Since I have MaxClients, MinSpareServers and
MaxSpareServers set at ridiculously high values, I'm pretty sure the
kernel is to blame.
I've tried adjusting the file-max and inode-max at runtime (which at
least two documents I found recommended) like so:
echo 4096 > /proc/sys/kernel/file-max
echo 12288 > /proc/sys/kernel/inode-max
without luck. I also tried
echo 300 400 500 > /proc/sys/vm/freepages
which the kernel HOWTO suggested, although it was unclear what exactly
this is supposed to do.
I also played around with the values in include/linux/fs.h and limits.h
in the 2.0.33 and 2.0.30 kernels with some random results. I finally
got a kernel working with the following values
limits.h:
#define NR_OPEN  1024  /* was 256 */
#define OPEN_MAX 1024  /* was 256 */  /* # open files a process may have */
fs.h:
#define NR_OPEN  1024   /* was 256 */
#define NR_INODE 65536  /* was 3072 */ /* this should be bigger than NR_FILE */
#define NR_FILE  16384  /* was 1024 */ /* this can well be larger on a larger system */
I'm still experiencing the same problems with apache.
Increasing NR_OPEN and OPEN_MAX to 4096 or higher seemed to prevent the
kernel from booting on both machines I tried it on (it would freeze
right after adding swap). The comments in the headers weren't very
clear about what NR_OPEN and OPEN_MAX are exactly (or how they
differ).
If anyone has any ideas why apache might still be misbehaving, or if I'm
overlooking something else I need to do to increase the effective
maximum number of file descriptors, I'd be much obliged. The Apache
documentation says that 'cannot spawn child process' is generally
related to a shortage of file descriptors, so I assumed that was my
problem. If anyone has any other ideas, please let me know. The server
is doing primarily CGI (perl), if that's any help.
Also, if someone could shed some light on what OPEN_MAX and NR_OPEN
control and how they differ, that would be helpful as well.
thanks--
sage
> I've been having a rather obnoxious problem with "cannot spawn
> child process" errors in my apache logs (1.2.4 and 1.2.0) (around 20,000
> - 50,000 a day). Since I have MaxClients, MinSpareServers and
> MaxSpareServers set at ridiculously high values, I'm pretty sure the
> kernel is to blame.
But are you sure that it's because of file descriptors?
What kernel are you running? If it's 2.0.30 you may be running into the
problems with the swap system.
> echo 4096 > /proc/sys/kernel/file-max
> echo 12288 > /proc/sys/kernel/inode-max
Scary thing is that I went through ALL of the code edits you did :)
Eventually Michael did a much nicer patch. It (along with header hacks
I did) is available at 'http://www.linux.org.za/filehandle.patch.linux'.
Before you go wild and apply this patch please read the below though :)
> echo 300 400 500 > /proc/sys/vm/freepages
> which the kernel HOWTO suggested, although it was unclear what exactly
> this is supposed to do.
Alleviate the swap system problems, I guess.
> If anyone has any ideas why apache might still be misbehaving, or if I'm
> overlooking something else I need to do to increase the effective
> maximum number of file descriptors, I'd be much obliged. The Apache
> documentation says that 'cannot spawn child process' is generally
> related to a shortage of file descriptors, so I assumed that was my
> problem. If anyone has any other ideas, please let me know. The server
> is doing primarily CGI (perl), if that's any help.
>
> Also, if someone could shed some light on what OPEN_MAX and NR_OPEN
> control and how they differ, that would be helpful as well.
Sure. It's been a while, but I hope that this helps:
There are basically two limits on file descriptors:
1) Per-process file descriptors
2) Overall system file descriptors
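A quick way to see both at once (just a sketch: `ulimit -n` is the shell's
view of the per-process limit, and the /proc path below is the 2.0-kernel
location, so it may simply not exist elsewhere):

```shell
#!/bin/sh
# 1) Per-process limit: the shell reports what OPEN_MAX was compiled to.
ulimit -n
# 2) System-wide limit (2.0-kernel /proc path; absent on other kernels).
cat /proc/sys/kernel/file-max 2>/dev/null || true
```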
With a threaded program (or a 'huge select loop' like squid) you quite often
hit the first limit. If you have 50 established connections to your program
you use 50 FD's... plus more for files you read, etc. I made my initial patch
for squid... and Michael made a better one for an even more heavily loaded
squid. This limit is normally set at 256.
With apache, though, you don't have one process... you have lots, and they
don't all have all the FD's open at the same time.... If you have 256
processes running at once (each using a few FD's) then you could hit the
overall system limit.
To change the per-process limit you normally have to recompile with the
new settings (otherwise you are stuck with the old limits). But the
overall system limit isn't compiled into any process, so you can change it
without a problem.
To see how many FD's your system has used overall type this:
[root@kasmilos kernel]# cat /proc/sys/kernel/file-nr
1680
(this is on our FTP server - lots of processes with a few FD's each...)
Note that the number never goes down, so after a burst of usage you can't
tell when it has subsided.
To see what your overall limit is, type this:
[root@kasmilos kernel]# cat /proc/sys/kernel/file-max
4096
This means that between all of my processes I can open 4096 FD's.
If my 'file-nr' value was close to 'file-max' I could increase it by putting
something like this in my 'rc.local':
#Increase systemwide filehandle max
/bin/echo 4096 >/proc/sys/kernel/file-max
/bin/echo 12288 > /proc/sys/kernel/inode-max
Note that if you increase the 'file-max' value you MUST increase the
'inode-max' value as well, normally in the same ratio (double one, double
the other).
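For example, a hypothetical rc.local fragment that doubles the limits above
while keeping the same 1:3 file-max/inode-max ratio might look like this
(values are illustrative, not a recommendation):

```shell
#!/bin/sh
# Double both limits together, preserving the 1:3 ratio used above
# (2.0-kernel /proc paths; needs root).
/bin/echo 8192  > /proc/sys/kernel/file-max
/bin/echo 24576 > /proc/sys/kernel/inode-max
```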
To find out how many FD's a running process is using, do this:
[root@kasmilos kernel]# cat /usr/local/lib/httpd/logs/httpd.pid
5616
[root@kasmilos kernel]# ls /proc/5616/fd/ | wc -l
15
means it's using 15 FD's... if this were close to 256 (in the default config,
or 1024 in your new kernel) then you would have a problem. (You may also have
to check the child processes to see if they are the possible cause.)
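If you're checking several processes, a tiny helper saves retyping. This is
just a sketch that counts the entries in /proc/PID/fd, so it assumes a
mounted /proc filesystem; `count_fds` is my own name, not a standard tool:

```shell
#!/bin/sh
# count_fds PID - print how many file descriptors PID has open,
# by counting the entries under /proc/PID/fd.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: count this shell's own descriptors.
count_fds $$
```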
Hope that this helps...
Oskar
--
"Haven't slept at all. I don't see why people insist on sleeping. You feel
so much better if you don't. And how can anyone want to lose a minute -
a single minute of being alive?" -- Think Twice
Looks like that was the problem. I increased NR_TASKS to 4096 and that
solved it.
thanks!
sage
The max value for NR_TASKS is about 4092, as a task uses two
descriptors (TSS and LDT), and the kernel reserves 8, and APM
uses 3 if enabled. This also uses 64k for the GDT.
This only applies to x86 though.
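As a back-of-envelope check of that ceiling (assuming, as above, a 64k GDT
of 8-byte descriptors, 8 entries reserved by the kernel, and two per task):

```shell
#!/bin/sh
# 64k GDT / 8 bytes per descriptor = 8192 descriptors; subtract the
# kernel's 8 reserved entries, then divide by 2 (TSS + LDT per task).
GDT_ENTRIES=$((65536 / 8))
MAX_TASKS=$(( (GDT_ENTRIES - 8) / 2 ))
echo "$MAX_TASKS"   # 4092, before APM's 3 descriptors are subtracted
```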
--
Andrew E. Mileski mailto:a...@netcom.ca