Schedule idle

Harald Koenig

unread,

Nov 9, 1998, 3:00:00 AM11/9/98

to

On Nov 08, yoda...@chelm.cs.nmt.edu wrote:

> Spell out what "important RT" means precisely and you will be in a position
> to make your code work. If you want a completely preemptable kernel, you
> have your work cut out for you. God alone knows what it means for a
> high priority RT task to "use the FS".

my original feature request (which was solved for us by Ingo's SCHED_IDLE)
was to run background jobs (esp. `rc5' that time) which shall _not_
(or only very minimally) compete with other running tasks, even with
nive-19 tasks (because we're using NQS/LSF style batch jobs which are
far more important than rc5, but far less important than interactive work).
so it's not an "important RT" thing in my eyes, `only' an important nuance
of scheduling policy.

if there is another solution which will claim only some very few % of CPU-time
for those background/idle jobs (vs. nice-19 tasks) I'd be happy to use it...

Harald
--
All SCSI disks will from now on ___ _____
be required to send an email notice 0--,| /OOOOOOO\
24 hours prior to complete hardware failure! <_/ / /OOOOOOOOOOO\
\ \/OOOOOOOOOOOOOOO\
\ OOOOOOOOOOOOOOOOO|//
Harald Koenig, \/\/\/\/\/\/\/\/\/
Inst.f.Theoret.Astrophysik // / \\ \
koe...@tat.physik.uni-tuebingen.de ^^^^^ ^^^^^

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

yoda...@chelm.cs.nmt.edu

unread,

Nov 9, 1998, 3:00:00 AM11/9/98

to

>
> On Nov 08, yoda...@chelm.cs.nmt.edu wrote:
>
> > Spell out what "important RT" means precisely and you will be in a position
> > to make your code work. If you want a completely preemptable kernel, you
> > have your work cut out for you. God alone knows what it means for a
> > high priority RT task to "use the FS".
>
> my original feature request (which was solved for us by Ingo's SCHED_IDLE)
> was to run background jobs (esp. `rc5' that time) which shall _not_
> (or only very minimally) compete with other running tasks, even with
> nive-19 tasks (because we're using NQS/LSF style batch jobs which are

Why can't you run those tasks at nice 18?

> --
> All SCSI disks will from now on ___ _____
> be required to send an email notice 0--,| /OOOOOOO\
> 24 hours prior to complete hardware failure! <_/ / /OOOOOOOOOOO\

Inspiring policy.

Rik van Riel

unread,

Nov 9, 1998, 3:00:00 AM11/9/98

to

On Sun, 8 Nov 1998 yoda...@chelm.cs.nmt.edu wrote:
> Richard Gooch writes:
>
> >lock, call schedule() and then re-aquire the lock. That's the friendly
> >way. The alternative is that we implement priority inversion. Either
>
> You must mean: "implement priority inheritance" -- "priority inversion" is the
> error, "priority inheritance" is so-called "fix".

> Both are symptoms of bad design. The Linux kernel does not implement
> a priority scheme for access to shared resources other than the processor.
> There are many good reasons for this: among them, priority schemes don't
> work in complex systems. By introducing fake real-time into the kernel,
> you introduce an error based on the following contradiction:
> Linux kernel assumes that a kernel mode thread should progess
> until _it_ (the thread) decides to reschedule
>
> Priority based RT assumes that the highest priority runnable
> process will advance at any time.
>
> If the OS tries to believe both propositions at the same time it will die.

Not neccesarily. There are certain things you can do
do avoid this situation. Firstly, RT processes usually
don't take up loads of CPU time. If they do, you're
better off with a separate system anyway; you're
right on that.

OTOH, I have been running RT processes for a long
time now (can you say 1.3) without actually
experiencing any of the problems you mentioned.

> >way, the problem is not with the SCHED_IDLE patch.
>
> Of course it is.

The SCHED_IDLE idea is just an extension of the
realtime paradigm. I take it you're not really
serious about wanting to back _that_ out of the
kernel.

Hint: you don't need the SCHED_IDLE patch to bump
into the priority inversion scheme, hence the
problem is somewhere else.

> >Consider the following scenario:
> >- SCHED_OTHER process bangs on the FS

Which is fully threaded, except for some properly
protected areas.

> >- low-priority RT process computes primes or whatever
> >- high-priority RT process occasionally wakes up, reads a device and
> > uses the FS.

Priority inheritance would solve the problem.

> >Just as in your example, the important RT process gets locked out of
> >the FS if schedule() is called with locks held. So, the problem isn't
> >with SCHED_IDLE, it's elsewhere.
>
> The problem is that you are introducing a complex mechanism to do
> something unspecified.

The 'something unspecified' means solving possible
problems with priority inversion that might be
hidden somewhere in the code.

> Spell out what "important RT" means precisely and you will be in a
> position to make your code work. If you want a completely
> preemptable kernel, you have your work cut out for you. God alone
> knows what it means for a high priority RT task to "use the FS".

Uhmm, this is rather vague -- what is the hidden message
in this paragraph?

Rik -- slowly getting used to dvorak kbd layout...
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.v...@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

c...@ix.net.nz

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

On Mon, Nov 09, 1998 at 06:51:24AM -0700, yoda...@chelm.cs.nmt.edu wrote:

> Why can't you run those tasks at nice 18?

It still only widens the gap between running and sometimes and
running when idle.

It would be nice to be able to specify process to run only when the
system is idle (e.g. main{for(;;);}), if the system is never idle,
then these processes will _never_ run. Ideally, you could have more
than one of these processes doing RR.

For example:

[me:2] caffeine:~$ nice --adjustment=19 ./a.out &
[1] 929
[me:2] caffeine:~$ nice --adjustment=18 ./a.out &
[2] 930

--> wait some, from top (updates 60s) <--

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
930 me 20 18 192 192 140 R N 0 60.5 0.2 1:17 a.out
929 me 19 19 192 192 140 R N 0 39.3 0.2 0:53 a.out

When what is wanted is that process 929 _never_ run whilst 930 is
running.

-cw

yoda...@chelm.cs.nmt.edu

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

On Mon, Nov 09, 1998 at 08:33:28PM +0100, Rik van Riel wrote:
> > Linux kernel assumes that a kernel mode thread should progess
> > until _it_ (the thread) decides to reschedule
> >
> > Priority based RT assumes that the highest priority runnable
> > process will advance at any time.
> >
> > If the OS tries to believe both propositions at the same time it will die.
>
> Not neccesarily. There are certain things you can do
> do avoid this situation. Firstly, RT processes usually
> don't take up loads of CPU time. If they do, you're
> better off with a separate system anyway; you're
> right on that.

Orthogonal. It's not how much time they use, it's what gets to run when.

> OTOH, I have been running RT processes for a long
> time now (can you say 1.3) without actually
> experiencing any of the problems you mentioned.

That's because of the modest implementation of Posix soft RT.

>
> > >way, the problem is not with the SCHED_IDLE patch.
> >
> > Of course it is.
>
> The SCHED_IDLE idea is just an extension of the
> realtime paradigm. I take it you're not really
> serious about wanting to back _that_ out of the
> kernel.

"just an extension" is not a meaningful term. Posix soft rt is a
peculiar spec -- all api and almost no semantics. So supporting it is
not a problem -- pretending that it can be meaningfully extended is a problem.

> Hint: you don't need the SCHED_IDLE patch to bump
> into the priority inversion scheme, hence the
> problem is somewhere else.

It's good to try to define terms precisely. If we define "priority inversion"
as the condition of a higher priority task waiting for a lower priority task,
then we see that this is an inescapable property of the Linux design.
For example, (nice -10 cat junk | nice 10 cat ) makes the higher priority
"cat" wait on the lower priority "cat" once the pipe fills. Or consider
the do_fork code and ask what happens if high priority process A wants
to fork after low priority process B has acquired the semaphore and kernel
lock. So "priority inversion" is not a "problem" in Linux, it is an
intrinsic part of the operation of the kernel. It only becomes a problem
when people want to pretend that this design can also support RT semantics.
One might just as well claim that there is a lack of mobility problem in
a furnace -- furnaces are supposed to produce heat, not to move, and
UNIXs are designed to optimize throughput, not support a rather simple
minded RT semantics.

It's worth noting that even in pure priority scheduled systems, priority
inversion is a fact of life as soon as there is any kind of communication or
resource sharing. You can only fix this problem in papers where you
declare all tasks to be assumed to be noninterfering.

>
> > >Consider the following scenario:
> > >- SCHED_OTHER process bangs on the FS
>
> Which is fully threaded, except for some properly
> protected areas.
>
> > >- low-priority RT process computes primes or whatever
> > >- high-priority RT process occasionally wakes up, reads a device and
> > > uses the FS.
>
> Priority inheritance would solve the problem.

Explain how.
And consider: Low priority A allocates a buffer and sleeps waiting for I/O
"RT" process B asks for a buffer and finds none.
or
Low priority process A enters a system call and does
global_cli

High priority task B calls global_cli

>
> > >Just as in your example, the important RT process gets locked out of
> > >the FS if schedule() is called with locks held. So, the problem isn't
> > >with SCHED_IDLE, it's elsewhere.
> >
> > The problem is that you are introducing a complex mechanism to do
> > something unspecified.
>
> The 'something unspecified' means solving possible
> problems with priority inversion that might be
> hidden somewhere in the code.

Still no specification. What is the desired semantics of process operation?
If a RT process X is runnable and the highest priority, when should it
run? Soon? At once? As soon as a pre-emption point is reached?

>
> > Spell out what "important RT" means precisely and you will be in a
> > position to make your code work. If you want a completely
> > preemptable kernel, you have your work cut out for you. God alone
> > knows what it means for a high priority RT task to "use the FS".
>
> Uhmm, this is rather vague -- what is the hidden message
> in this paragraph?
>

There's no hidden message. I'm simply pointing out that if you have
specifications that depend on undefined words e.g. "important" and "RT"
whcih have absolutely no meaning in this context, then your solutions will
be, in the words of some guy, like a hippo blundering around.

--

---------------------------------
Victor Yodaiken
Department of Computer Science
New Mexico Institute of Mining and Technology
Socorro NM 87801
Homepage http://www.cs.nmt.edu/~yodaiken
PowerPC Linux page http://linuxppc.cs.nmt.edu
Real-Time Page http://rtlinux.org

Dale E. Martin

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

Harald Koenig <koe...@tat.physik.uni-tuebingen.de> writes:

> if there is another solution which will claim only some very few % of CPU-time
> for those background/idle jobs (vs. nice-19 tasks) I'd be happy to use it...

A buddy of mine wrote "loadwatch" specifically for rc5. Invoke it like
this, (on a uni-CPU machine):
loadwatch -d 15 -h 1.25 -l 0.25 -- ./rc5des

[ begin included program ]

/* as is, etc. */

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <string.h>
#include <time.h>
#include <sys/wait.h>
#include <signal.h>

main(int argc, char **argv)
{
int delay = 60;
float highmark = 3.0;
float lowmark = 2.0;
int c;
pid_t child;
pid_t childgroup;

while ((c = getopt(argc, argv, "d:l:h:")) != -1)
{
switch (c)
{
case 'd':
delay = atoi(optarg);
break;

case 'h':
highmark = atof(optarg);
break;

case 'l':
lowmark = atof(optarg);
break;

default:
fprintf(stderr, "unknown option '%c'\n", c);
exit(1);
}
}

if ((child = fork()) == 0)
{
setpgrp();
execv(argv[optind], argv + optind);
}
else
{
FILE *file;
double one, ten, fifteen;
int state = 0;
char line[80];
char *outline;
pid_t status;

sleep(1);

childgroup = getpgid(child);

while (1)
{
time_t now = time(0);
outline = ctime(&now);
*(strchr(outline, '\n')) = 0;

switch (status = waitpid(child, 0, WNOHANG))
{
case -1:
perror("problems waiting for child");
kill(-childgroup, SIGTERM);
exit(1);
break;

case 0:
break;

default:
fprintf(stderr, "%s: no child process, exiting.\n", outline);
exit(0);
break;
}

one = ten = fifteen = 0.0;
file = fopen("/proc/loadavg", "r");
if (fgets(line, 80, file))
{
one = atof(line);
if (state == 0 && one >= highmark)
{
fprintf(stderr, "%s: load to high, stopping.\n", outline);
state = 1;
kill(-childgroup, SIGSTOP);
}
else if (state == 1 && one <= lowmark)
{
fprintf(stderr, "%s: load low, continuing.\n", outline);
state = 0;
kill(-childgroup, SIGCONT);
}
}
fclose(file);
sleep(delay);
}
}
}

yoda...@chelm.cs.nmt.edu

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

> > For example, (nice -10 cat junk | nice 10 cat ) makes the higher
> > priority "cat" wait on the lower priority "cat" once the pipe fills.
> > Or consider the do_fork code and ask what happens if high priority
> > process A wants to fork after low priority process B has acquired
> > the semaphore and kernel lock.
>

> This is not a problem since Linux doesn't let the high
> priority process do busy waiting on the lock. Since no
> busy waiting is done the low priority task gets a fair
> chance to release the lock again.
>
> Only a third, medium priority, CPU eating task can
> ruin this situation.

i.e. (nice -10 cat junk | nice 10 cat & nice 5 cat bigjunk)

The nice -10 then may find itself waiting for the nice 5 to let the nice 10
run and empty the pipe.

>To properly fix it, we will
> want priority inheritance. There might be other
> ways, but none of them are as elegant or efficient.

>
> > > Priority inheritance would solve the problem.
> >
> > Explain how.
> > And consider: Low priority A allocates a buffer and sleeps waiting for I/O
> > "RT" process B asks for a buffer and finds none.
> > or
> > Low priority process A enters a system call and does
> > global_cli
> > High priority task B calls global_cli
>

> These two scenarios are non-problems in Linux, since
> the high priority task will block and let the other
> one finish the critical section and release the lock.

Consider the first scenario. Low allocates buffer and sleeps on I/O. "RT"
blocks on Low. You don't care about this, although it gives an unbounded delay.
Then Medium runs so when Low wakes, Low will not run and release buffer right
away because Medium is running. This delay does bother you more than the
other, for reasons I don't understand -- that's ok, this delay is a big deal
in the academic RT literature and I don't understand that either. For
me, one unbounded timing delay is the same as another. But, now you propose
to fix the second kind of delay with "elegant" priority inheritance.
This would simply involve promoting Low to the priority of RT while RT
is waiting on Low. When Low releases the buffer, all it needs to do is
scan the list of all processes waiting for any other resources it may hold
and then resume the highest priority of any of these or its own priority
if nobody is waiting (this is what Solaris does). Now we have gone
from a simple scheduler change to satisfy Posix 1.b to a scheme by which
the OS searches the list of all sleep queues each time a process
releases a resource (note that without this search we have the possibility of
Low gets resource A, Low gets B and sleeps, RT_A blocks on Low raising Low
to RT priority, RT_B blocks on A raising Low to RT_B priority, Low releases
B and then becomes Low priority again) Of course, we need to apply this
method, not only to buffers, but to any IPC between a RT task and any other
task, such as the pipe example above. But then we need to do a search on
delay chains each time a process blocks too because:
LowA | LowB | LowC | RT_Critical
means that LowA,LowB, and LowC must all be promoted, thus when RT_Critical
first blocks on its pipe, the OS needs to determine the entire chain,
and dynamically adjust it as the pipes fill and empty.
Now all system tasks pay the price
for this "feature" whether they need it or not.

Is that really what you want?

>
> > > > The problem is that you are introducing a complex mechanism to do
> > > > something unspecified.
> > >
> > > The 'something unspecified' means solving possible
> > > problems with priority inversion that might be
> > > hidden somewhere in the code.
> >
> > Still no specification. What is the desired semantics of process
> > operation? If a RT process X is runnable and the highest priority,
> > when should it run? Soon? At once? As soon as a pre-emption point is
> > reached?
>

> An RT process should run ASAP. I guess we can all agree

Good sentiment!
How is this different from any other process? X needs to run ASAP or
the screen looks bad, Emacs needs to run ASAP or it does not respond
to keyclicks. Are they RT? I'm not being sarcastic here, it's quite
difficult to determine what "soft RT" should look like and I know it's
a real problem for some people. But Linux should provide a real solution,
not repeat the mistakes of others.

Stephen C. Tweedie

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

Hi,

On Mon, 9 Nov 1998 20:33:28 +0100 (CET), Rik van Riel
<H.H.v...@phys.uu.nl> said:

> On Sun, 8 Nov 1998 yoda...@chelm.cs.nmt.edu wrote:

>> You must mean: "implement priority inheritance" -- "priority
>> inversion" is the error, "priority inheritance" is so-called "fix".

>> Priority based RT assumes that the highest priority runnable
>> process will advance at any time.
>> If the OS tries to believe both propositions at the same time it will die.

> Not neccesarily. There are certain things you can do do avoid this
> situation. Firstly, RT processes usually don't take up loads of CPU
> time. If they do, you're better off with a separate system anyway;
> you're right on that.

But SCHED_OTHER processes _do_ take up loads of CPU time.

> OTOH, I have been running RT processes for a long time now (can you
> say 1.3) without actually experiencing any of the problems you
> mentioned.

That's because, as you already pointed out, soft RT processes do not
typically lock themselves in a tight loop. They avoid that because it
is *defined* that such behaviour will lock up your system.

> The SCHED_IDLE idea is just an extension of the realtime paradigm. I
> take it you're not really serious about wanting to back _that_ out of
> the kernel.

SCHED_IDLE extends a realtime paradigm to time-sharing processes, and
that is broken. Strict priority levels let one process lock out lower
priority processes for good, without possibility of recovery. That's
fine if you are talking about carefully written realtime apps where you
need root privilege to set the priority level and where you are quite
explicitly accepting the need for careful coding. It is not OK if you
allow SCHED_OTHER processes to completely and permanently preempt a
SCHED_IDLE process which legally holds a semaphore. SCHED_OTHER should
not be able to lock things up.

By the way, I'm quite surprised that nobody has yet pointed out a
perfectly simple fix for the SCHED_IDLE issue, which is to disable the
SCHED_IDLE effect and revert to timeshared scheduling whenever a process
is blocked in kernel mode.

--Stephen

Rik van Riel

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

On Tue, 10 Nov 1998 yoda...@chelm.cs.nmt.edu wrote:

> task, such as the pipe example above. But then we need to do a search on
> delay chains each time a process blocks too because:
> LowA | LowB | LowC | RT_Critical
> means that LowA,LowB, and LowC must all be promoted, thus when RT_Critical
> first blocks on its pipe, the OS needs to determine the entire chain,

No, this is not a case of a system lock held or the
kernel blocking in a critical region (yes, I understand
this argument is promoting your point of view).

> and dynamically adjust it as the pipes fill and empty. Now all
> system tasks pay the price for this "feature" whether they need it
> or not.
>
> Is that really what you want?

Not really. My plan was to only use the feature for
specifically marked critical regions inside the
kernel. An RT task waiting for data from an idler
process is artificial (and dumb) enough for me not
to care...

> > > Still no specification. What is the desired semantics of process
> > > operation?
> >

> > An RT process should run ASAP. I guess we can all agree
>
> Good sentiment!
> How is this different from any other process?

For other processes we make a tradeoff between throughput
and latency. When you look at it a bit more in detail,
you'll see that latency is compromised to the point of
the system getting unuseable.

For RT tasks latency is all and throughput is sacrificed
entirely. The only reason that RT tasks can do something
useful is because we don't take the CPU away from them...

> X needs to run ASAP or the screen looks bad, Emacs needs to
> run ASAP or it does not respond to keyclicks. Are they RT?

No. It's quite OK to wait 1/10th of a second before
Emacs is scheduled. This really is not something we
care about a lot (OK, we do, but we don't have to).

The Linux scheduler lets a nice +19 task get it's
timeslice when Emacs has run out of it's, regardless
of whether the user pressed a key or not.

An RT task will always get priority, regardless of
CPU time used and/or other crap.

I guess the difference you're looking for is absolute
vs. dynamic priority. The RT tasks have absolute and
static priority while other tasks have a scheme of
dynamic priorities where everyone gets a turn.

> I'm not being sarcastic here, it's quite difficult to determine what
> "soft RT" should look like and I know it's a real problem for some
> people. But Linux should provide a real solution, not repeat the
> mistakes of others.

In my view, RT tasks really should have absolute
priority over non-RT tasks and static priority
within it's own class.

This inherently carries the problem of priority
inversion with it, so we need to find a solution
to it.

cheers,

Rik -- slowly getting used to dvorak kbd layout...
+-------------------------------------------------------------------+
| Linux memory management tour guide. H.H.v...@phys.uu.nl |
| Scouting Vries cubscout leader. http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

Gordon P. Oliver

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

... yoda...@chelm.cs.nmt.edu said ...

>Consider the first scenario. Low allocates buffer and sleeps on I/O. "RT"
>blocks on Low. You don't care about this, although it gives an unbounded delay.
>Then Medium runs so when Low wakes, Low will not run and release buffer right
>away because Medium is running. This delay does bother you more than the
>other, for reasons I don't understand -- that's ok, this delay is a big deal
>in the academic RT literature and I don't understand that either.

That's because you are considering the _generic_ case. And almost all realtime
deals with the specific. An example may be something like:
High: Monitor a live system (with high tolerance to delay),
writing statistics to disk (locks superblock now and again)
Med: CPU intensive program that almost never touches disk.
Low: RC5 (or something similar) writes to disk occasionally (and
locks the superblock)

In this case the lock for the superblock is held for an unbounded, but
generally very short time. If the low priority process happens to be
pre-empted at this moment, the disk is locked until it gets time again.
Not a pretty sight... _this_ is what needs to be prevented if there is a
real possibility of starving a process for time...

In the "academic" RT literature (and in the _real_ use that caused ?Wind
River? to add priority inheritance) there are many more cases like this that
have bounded times for locking the semaphore, but, because they are in
pre-emptive MT (unlike the kernel) that time might become unbounded. In
the kernel, the time is unbounded in the cases we care about (where it goes
to sleep), but unbounded isn't a carte blanche to make it depend on user
priorities as well...

The generic case doesn't give enough information to tell the difference.

>When Low releases the buffer, all it needs to do is
>scan the list of all processes waiting for any other resources it may hold
>and then resume the highest priority of any of these or its own priority
>if nobody is waiting (this is what Solaris does).

_please_ We can do better than this. Only semaphores (not spinlocks) need
to have the priority inheritance. This can be done with lists off the
semaphore and tasks... You don't really need to scan anything. It does
make the down/up calls more heavyweight. The scheduler doesn't even notice.
Note that the up/down calls would only have to be heavyweight if they had
priority inheritance enabled (i.e. at up, the process has been promoted, and
at down, semaphore is locked by a lower priority process).

>Of course, we need to apply this
>method, not only to buffers, but to any IPC between a RT task and any other

>task, such as the pipe example above.

ehm. Not. The job of the OS is to assure that it doesn't deadlock processes
that wouldn't otherwise deadlock. If a user has a producer process at high
priority and a consumer at low priority he loses. That's not deadlock, that's
just simple poor assignment of priorities.

Note that I lost track of exactly what the deadlock was, but, for example,
locking a super-block and then sleeping might have exactly this effect...
-gordo
--
-----------------------------------------------------------------------
Gordon Oliver, member of Distributed Idea Group (http://www.digroup.com)

Steve VanDevender

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

Rik van Riel writes:
> In my view, RT tasks really should have absolute
> priority over non-RT tasks and static priority
> within it's own class.

You know, what's sad about this continued debate about "RT" is
that there is a persistent failure to understand that
"real-time", as it is conventionally used, means "guaranteed
response time". It does not mean "ASAP scheduling" or "minimum
scheduling latency". There are inherent conflicts between
real-time and timesharing that are not easy, or maybe not even
possible, to resolve.

If Richard Gooch had taken the tack that the Linux scheduler had
undesirable latency when proposing his scheduler patches, I don't
think I or several other people would have objected. But when he
claimed that his patches would make Linux a better "real-time"
operating system, I did object, because you can't make a
real-time operating system out of a timesharing system just by
fiddling with the scheduler, or implementing "priority
inheritance", or doing other things that have been proposed.

Linux is a timesharing system, by design, and that assumption is
pervasive though things like interrupt handling and the driver
model. Anyone who thinks that Linux can be made into a real-time
system by fiddling with the scheduler and maybe a couple of other
things is deluding himself.

yoda...@chelm.cs.nmt.edu

unread,

Nov 10, 1998, 3:00:00 AM11/10/98

to

On Tue, Nov 10, 1998 at 05:12:25PM -0300, Gordon P. Oliver wrote:
> That's because you are considering the _generic_ case. And almost all realtime

I've never seen a good specific case.

> In the "academic" RT literature (and in the _real_ use that caused ?Wind
> River? to add priority inheritance) there are many more cases like this that
> have bounded times for locking the semaphore, but, because they are in

I disagree. There are many cases where poor programming and mixing priority
scheduling with blocking causes failure conditions, but these all have better
solutions that do not require priority inheritance. Essentially priority inheritance
is a hack to allow people who don't understand synchronization to use threads and
mostly get away with it.

> >When Low releases the buffer, all it needs to do is
> >scan the list of all processes waiting for any other resources it may hold
> >and then resume the highest priority of any of these or its own priority
> >if nobody is waiting (this is what Solaris does).
>
> _please_ We can do better than this. Only semaphores (not spinlocks) need
> to have the priority inheritance. This can be done with lists off the
> semaphore and tasks... You don't really need to scan anything. It does
> make the down/up calls more heavyweight. The scheduler doesn't even notice.

The down/up calls are "more heavyweight" because they must scan the lists. Think
about how much work is involved and how many synchronization points are introduced.
And think about why it's important to use this mechanism for semaphores but
not important when exactly the same type of "inversion" can happen on, for example,
memory allocation with sleeps -- just because you don't call it a semaphore does
not mean it does not have the same semantics.

> Note that the up/down calls would only have to be heavyweight if they had
> priority inheritance enabled (i.e. at up, the process has been promoted, and
> at down, semaphore is locked by a lower priority process).

Right: so you introduce another unpredictable form of latency.

> >Of course, we need to apply this
> >method, not only to buffers, but to any IPC between a RT task and any other
> >task, such as the pipe example above.
>
> ehm. Not. The job of the OS is to assure that it doesn't deadlock processes
> that wouldn't otherwise deadlock. If a user has a producer process at high
> priority and a consumer at low priority he loses. That's not deadlock, that's
> just simple poor assignment of priorities.

But that is exactly the error that paralyzed Mars Lander. Consider a high priority
consumer that reads a pipe from a low priority producer. The consumer finds an
empty pipe and sleeps. The producer can't run because a medium priority process
is hogging the cpu. This won't be a problem in regular Linux because even low
priority tasks progress eventually and because there is no timing assurance for
the high priority task. But if you are going to claim that the high priority task
is RT and the medium is also RT, then the high priority task may hang indefinitely.

--

---------------------------------
Victor Yodaiken
Department of Computer Science
New Mexico Institute of Mining and Technology
Socorro NM 87801
Homepage http://www.cs.nmt.edu/~yodaiken
PowerPC Linux page http://linuxppc.cs.nmt.edu
Real-Time Page http://rtlinux.org