unexpected process scheduler behaviour with nice 19 and fork/vfork

Juergen Keil

Mar 17, 2002, 2:02:02 PM
Background: The infamous seti@home client on a Solaris 8 single
processor machine, running in the background 'nice'd down to the max
(using "nice 19" or "priocntl -e -c TS -p -60 ...") still has a
dramatic negative effect, at least on shell script performance. As an
example, I've measured the time for a typical "configure" script run
for some open-source application. "configure" takes 41 seconds on an
otherwise idle Solaris 8 box, but that rises to 3 minutes 40
seconds (a factor of *five* slower!) when a "nice 19" compute
intensive background process is active.

I've tried to reproduce the problem with some simple C programs,
trying to understand the reason for this behaviour and perhaps get
some ideas on how to tune the system to avoid the performance drop, and
came up with the following:

First I start a simple compute intensive program. I run it at
time-sharing user level priority of -60, which should guarantee that
the process' scheduling priority is always 0, the lowest possible
priority available in the SVR4 kernel:

% cat loop.c

int main(void) {
    /* Spin forever, consuming all the CPU time we are given. */
    for (;;)
        ;
}

% gcc -o loop loop.c

% priocntl -e -c TS -p -60 ./loop &

Now, to simulate a shell script (i.e. a shell process forking lots of
small unix utility programs), I run a loop that forks a /bin/date
command 1000 times:

% cat fork.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>

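int main(int argc, char **argv)
{
    int i;
    int loops = 1000;
    int use_fork = 1;
    int dev_null = -1;
    pid_t pid;
    int status;

    /* -f: use fork(), -v: use vfork(), -n: send child output to /dev/null */
    while ((i = getopt(argc, argv, "fvn")) != -1) {
        switch (i) {
        case 'f': use_fork = 1; break;
        case 'v': use_fork = 0; break;
        case 'n': dev_null = open("/dev/null", O_WRONLY); break;
        }
    }

    /* Optional argument: number of date commands to run. */
    if (argv[optind])
        loops = atoi(argv[optind]);

    printf("run %d date commands, using %s, output to %s\n",
           loops,
           use_fork ? "fork" : "vfork",
           dev_null >= 0 ? "/dev/null" : "STDOUT");

    for (i = 0; i < loops; i++) {
        pid = use_fork ? fork() : vfork();
        if (pid < 0) {
            perror(use_fork ? "fork" : "vfork");
            exit(1);
        }
        if (pid == 0) {
            /* Child: optionally redirect stdout/stderr, then exec date. */
            if (dev_null >= 0) {
                dup2(dev_null, 1);
                dup2(dev_null, 2);
            }
            execl("/bin/date", "date", (char *)NULL);
            perror("exec /bin/date");
            _exit(1);
        }
        /* Parent: wait for this date command before starting the next. */
        waitpid(pid, &status, 0);
    }
    return 0;
}
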

% gcc -o fork fork.c

% /bin/time priocntl -e -c TS fork -fn
run 1000 date commands, using fork, output to /dev/null

real 1:25.2
user 0.9
sys 3.3

This is extremely slow. 'fork' only got ~ 5% CPU time, and according
to 'top' the niced down 'loop' process still grabs >95% of the
available CPU time?! :-/

Wouldn't it be nice if the Solaris process scheduler were a bit
smarter than that?

Another test, without background activity:

% pkill -STOP loop
% /bin/time priocntl -e -c TS fork -fn
run 1000 date commands, using fork, output to /dev/null

real 8.9
user 2.5
sys 5.5

Just to verify the speed when there's no background activity: the 1000
forks of /bin/date run ~10 times faster (85 seconds vs 8.9 seconds)!

And now, using vfork() instead of fork(), with background activity:

% pkill -CONT loop
% /bin/time priocntl -e -c TS fork -vn
run 1000 date commands, using vfork, output to /dev/null

real 7.4
user 2.1
sys 4.3

This looks *much* better. 'fork' got >80% of the CPU time when the
date processes are started using vfork().
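
The CPU percentages quoted for these runs can be cross-checked against the /bin/time output: on a single-processor machine, (user + sys) / real approximates the fraction of wall-clock time during which the measured command and its children actually held the CPU. A small sketch of that arithmetic follows; the function names cpu_share and print_cpu_shares are illustrative and not part of the original programs:

```c
#include <stdio.h>

/* Fraction of elapsed time spent on the CPU, from /bin/time figures
 * (all in seconds). Returns 0 when real is not positive. */
double cpu_share(double real, double user, double sys)
{
    if (real <= 0.0)
        return 0.0;
    return (user + sys) / real;
}

/* Print the three runs reported above as percentages. */
void print_cpu_shares(void)
{
    printf("fork,  loop running: %.0f%%\n", 100.0 * cpu_share(85.2, 0.9, 3.3));
    printf("fork,  loop stopped: %.0f%%\n", 100.0 * cpu_share(8.9, 2.5, 5.5));
    printf("vfork, loop running: %.0f%%\n", 100.0 * cpu_share(7.4, 2.1, 4.3));
}
```

With the timings from the three runs this works out to roughly 5%, 90% and 86%, which is consistent with the "~5%" and ">80%" figures.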

More observations:

- In the vfork() case, the 'fork' process often runs at a priority > 0.
Using fork(), the 'fork' process's priority drops to 0, so it
probably competes with the niced 'loop' process in the same
time-sharing dispatch queue for time slices.

- reducing the length of the time slice used at priority level 0
with the dispadmin(1M) command speeds up the fork() test a bit.

# dispadmin -c TS -g > /tmp/ts.200
# sed s/RES=1000/RES=4000/ /tmp/ts.200 > /tmp/ts.50
# dispadmin -c TS -s /tmp/ts.50

% /bin/time priocntl -e -c TS fork -fn
run 1000 date commands, using fork, output to /dev/null

real 39.0
user 0.3
sys 1.4

Some improvement, but still slow.

- increasing the time-sharing user priority for 'fork' by one unit

(you have to raise the user priority limit TSUPRILIM first, this
step requires 'super-user' permission)

# /bin/time priocntl -s -c TS -m 1 {pid-of-shell-running-the-test}

% priocntl -d $$
28482 1 0

% /bin/time priocntl -e -c TS -p 1 fork -fn
run 1000 date commands, using fork, output to /dev/null

real 9.0
user 2.6
sys 5.4

Now this is the performance I was expecting!!

- The Solaris "IA" interactive scheduling class apparently offers a
hack similar to raising the user level priority as described above,
but at least under Sun's Xserver with the SolarisIA server extension
it seems a bit non-deterministic whether the current X11 window with
the input focus gets the priority boost or not [*]. And once you
move the focus away from the window, the priority boost is gone.

([*] See the thread "CDE dtterm and SolarisIA server extension")

That is, when the command

% priocntl -d $$
12141 0 0 0

lists "0" in the IAMODE column, "fork -fn" will be slow, and when it
lists IAMODE == 1, "fork -fn" will run at a decent speed.
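
The dispadmin experiment above relies on how the TS dispatcher table expresses time quanta: each quantum is a count of 1/RES-second units, so raising RES from 1000 to 4000 while leaving the counts untouched shortens every time slice in the table by a factor of four. A rough sketch of that conversion, assuming a priority-0 quantum of 200 units (as the file names ts.200 and ts.50 suggest); quantum_ms is a made-up helper name:

```c
/* Convert a dispatcher-table quantum (in 1/res-second units) to
 * milliseconds. Returns -1.0 for a non-positive resolution. */
double quantum_ms(long units, long res)
{
    if (res <= 0)
        return -1.0;
    return 1000.0 * (double)units / (double)res;
}
```

Under that assumption, quantum_ms(200, 1000) is 200 ms for the default table and quantum_ms(200, 4000) is 50 ms after the sed edit.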

Any other clever ideas (or workarounds / bug fixes / ...) on how to tune
a Solaris system for this type of workload (that is, for a process
that consumes idle CPU cycles without having this kind of
negative performance impact on the system)?
