Operating systems that made use of the 386 task switching hardware?

Johann 'Myrkraverk' Oskarsson

May 18, 2020, 8:07:46 AM

Dear a.f.computers,

I'm reading Advanced 80386 Programming Techniques [1], and it covers the
386's own task switching mechanism. Now as I understand it, "modern"
operating systems such as Linux and 386BSD did not make use of these
hardware facilities for various reasons; and at least in the case of
Linux I know it did not make use of call gates for system calls [2].

Did any operating system at the time [3] actually make use of these
hardware task switching facilities? Do we know why, or why not? Did
NT, or OS/2?

[1] by James Turley; the edition I have is copyright 1999.

[2] Solaris 10, on the other hand had support for both call gates and
interrupt based system calls for 32bit code; maybe Illumos still has
this support but I haven't checked.

[3] I'm not certain what era is appropriate for 386 based operating
systems.

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

Carlos E.R.

May 18, 2020, 8:32:07 AM

On 18/05/2020 14.07, Johann 'Myrkraverk' Oskarsson wrote:
> Dear a.f.computers,
>
> I'm reading Advanced 80386 Programming Techniques [1], and it covers the
> 386's own task switching mechanism. Now as I understand it, "modern"
> operating systems such as Linux and 386BSD did not make use of these
> hardware facilities for various reasons; and at least in the case of
> Linux I know it did not make use of call gates for system calls [2].
>
> Did any operating system at the time [3] actually make use of these
> hardware task switching facilities? Do we know why, or why not? Did
> NT, or OS/2?
>
> [1] by James Turley; the edition I have is copyright 1999.
>
> [2] Solaris 10, on the other hand had support for both call gates and
> interrupt based system calls for 32bit code; maybe Illumos still has
> this support but I haven't checked.
>
> [3] I'm not certain what era is appropriate for 386 based operating
> systems.

I remember reading of this around 1990, and at the time I didn't know of
anything using those features.

--
Cheers, Carlos.

Bob Eager

May 18, 2020, 8:48:52 AM

On Mon, 18 May 2020 20:07:28 +0800, Johann 'Myrkraverk' Oskarsson wrote:

> I'm reading Advanced 80386 Programming Techniques [1], and it covers the
> 386's own task switching mechanism. Now as I understand it, "modern"
> operating systems such as Linux and 386BSD did not make use of these
> hardware facilities for various reasons; and at least in the case of
> Linux I know it did not make use of call gates for system calls [2].
>
> Did any operating system at the time [3] actually make use of these
> hardware task switching facilities? Do we know why, or why not? Did
> NT, or OS/2?

OS/2 didn't use the TSS for task switching; that would have flushed all
the segment register caches and the TLBs, and caused protection checks
as the segment registers were reloaded. It made minimal use of a single
TSS for privilege level transitions.

It did, however, use call gates.

--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org

Scott Lurndal

May 18, 2020, 3:12:15 PM

Johann 'Myrkraverk' Oskarsson <joh...@myrkraverk.invalid> writes:
>Dear a.f.computers,
>
>I'm reading Advanced 80386 Programming Techniques [1], and it covers the
>386's own task switching mechanism. Now as I understand it, "modern"
>operating systems such as Linux and 386BSD did not make use of these
>hardware facilities for various reasons; and at least in the case of
>Linux I know it did not make use of call gates for system calls [2].
>
>Did any operating system at the time [3] actually make use of these
>hardware task switching facilities? Do we know why, or why not? Did
>NT, or OS/2?

For the most part, it made task switches far too costly; software could
do the switch far more efficiently.

Johann 'Myrkraverk' Oskarsson

May 18, 2020, 10:07:13 PM

Yes, this is what the 386BSD people had to say [1] about the JUMP TSS
instruction:

> As a matter of fact, while comprehensive, we find it too slow to use
> efficiently for our purposes. However, to be fair to the designers of
> the 386, if you need to do all of the things that JUMP TSS offers,
> using this instruction is probably your best bet.

In the next paragraph they continue with:

> We know that by saving fewer registers, we end up doing fewer loads
> and stores, and hence make our end-to-end cost lower. In our 386BSD
> swtch() function (see Listing Four), we get away with saving only
> six registers. We don't need to save %eax, %edx, and %ecx because
> these are compiler temporary registers which are discarded on return.
> We also don't save the segment registers because they don't change in
> this version of the system. In contrast, pushal saves eight registers
> and JMP TSS saves 20. Adding up the instruction costs, our approach is
> the best of the three.

I cannot say whether they ran benchmarks or just hand-counted the cost of
the different strategies, but it's clear they decided JUMP TSS was too
expensive.
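
To put the six saved registers in perspective, here is a rough sketch of the 32-bit TSS layout as published in Intel's architecture manuals. The names Tss32 and kSixRegisterBytes and all field names are made up for illustration; nothing here comes from 386BSD or any other kernel, and it does not try to reproduce the article's own count of 20.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the 32-bit TSS (illustrative names, not from any OS source).
// Each 16-bit selector is followed by 16 reserved bits.
struct Tss32 {
    std::uint16_t prev_task_link, reserved0;
    std::uint32_t esp0;
    std::uint16_t ss0, reserved1;
    std::uint32_t esp1;
    std::uint16_t ss1, reserved2;
    std::uint32_t esp2;
    std::uint16_t ss2, reserved3;
    std::uint32_t cr3;
    std::uint32_t eip;
    std::uint32_t eflags;
    std::uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    std::uint16_t es, reserved4;
    std::uint16_t cs, reserved5;
    std::uint16_t ss, reserved6;
    std::uint16_t ds, reserved7;
    std::uint16_t fs, reserved8;
    std::uint16_t gs, reserved9;
    std::uint16_t ldt_selector, reserved10;
    std::uint16_t debug_trap;   // bit 0 is the T (debug trap) flag
    std::uint16_t io_map_base;  // offset of the I/O permission bitmap
};

// The minimum 32-bit TSS is 104 (0x68) bytes.
static_assert(sizeof(Tss32) == 104, "unexpected padding in Tss32");
static_assert(offsetof(Tss32, cr3) == 0x1C, "cr3 offset");
static_assert(offsetof(Tss32, eip) == 0x20, "eip offset");
static_assert(offsetof(Tss32, io_map_base) == 0x66, "io_map_base offset");

// Six 32-bit registers, as in the swtch() description quoted above.
constexpr std::size_t kSixRegisterBytes = 6 * sizeof(std::uint32_t);
```

Compiling this is the whole check: the static_asserts fail if the layout does not come out at 104 bytes. Six registers come to 24 bytes of state, against 104 bytes for the full structure.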

So, would we consider the JUMP TSS instruction a design blunder at
Intel? According to WP [2], amd64 does not support this instruction at
all.

[1]
https://386bsd.org/releases/porting-unix-to-the-386-the-basic-kernel-multiprogramming-and-multiprogramming-and-multitasking-part-ii-article

[2] https://en.wikipedia.org/wiki/Task_state_segment

John Levine

May 18, 2020, 10:38:15 PM

In article <kbHwG.426283$XQ5.3...@fx28.am4>,

Johann 'Myrkraverk' Oskarsson <joh...@myrkraverk.invalid> wrote:
>So, would we consider the JUMP TSS instruction a design blunder at
>Intel? According to WP [2], amd64 does not support this instruction at all.

JMP TSS took about 300 cycles compared to 2 for an ordinary register
push so it wasn't a hard calculation to make.
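
As a back-of-envelope sketch of that calculation, using only the figures quoted in this thread (about 300 cycles for JMP TSS, 2 cycles per push, six registers saved by the 386BSD swtch()). The constant and function names are invented for illustration, and only the save side is counted; restoring the registers and the rest of the switch code are ignored.

```cpp
// Figures as quoted in the thread; rough numbers, not measured here.
constexpr int kJmpTssCycles = 300;  // "about 300 cycles"
constexpr int kPushCycles   = 2;    // "2 for an ordinary register push"
constexpr int kSavedRegs    = 6;    // registers saved by 386BSD swtch()

// Cost of saving `regs` registers with one push each.
constexpr int software_save_cycles(int regs, int cycles_per_push) {
    return regs * cycles_per_push;
}

constexpr int kSoftwareSaveCycles =
    software_save_cycles(kSavedRegs, kPushCycles);

// How many times more expensive the hardware switch is than the pushes.
constexpr int kRatio = kJmpTssCycles / kSoftwareSaveCycles;

static_assert(kSoftwareSaveCycles == 12, "6 pushes at 2 cycles each");
static_assert(kRatio == 25, "300 / 12");
```

Even if the software path costs several times this estimate once restores are included, it stays well under the hardware figure.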

I think it's just part of a myopic attitude that the segmenty
goodness of the 286 and 386 was so great that the horrible
performance didn't matter.

Segment loads on 286 and 386 were so slow that if you wanted decent
performance you had to cram your code and data into as few segments as
possible to avoid segment switches. They didn't even make the obvious
speedup of checking if you were reloading the same value into a
segment register, so you had to write all your code with short jumps
and calls, with long mode glue when you absolutely had to switch
segments.

--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Charlie Gibbs

May 19, 2020, 12:44:16 AM

On 2020-05-19, Johann 'Myrkraverk' Oskarsson <joh...@myrkraverk.invalid> wrote:

> So, would we consider the JUMP TSS instruction a design blunder at
> Intel? According to WP [2], amd64 does not support this instruction at
> all.

"It seemed like a good idea at the time..."

--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <cgi...@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.

timcaf...@gmail.com

May 19, 2020, 11:22:19 AM

Convergent Technologies' OS (CTOS) used TSSs extensively in the protected
mode version. I'm not sure why; they didn't really isolate tasks from
each other. I figured this out when I was trying to write an interrupt
driver for the serial port. When the CPU got an interrupt, it went
through a task gate for the interrupt service routine, figured out
it needed to alert my device driver and called through another task
gate. On an 8 MHz 286 I couldn't keep up with 9600 bps, because it had to
bounce back out through the two task gates on the way to user space before
it got the next interrupt. 4 task gates between characters was just too
much overhead. Most frustrating project I ever worked on.

- Tim
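
A rough sketch of the cycle budget described above. The clock rate, line speed and count of four task-gate transitions are from the post; the 10 bits per character (8N1 framing) and one interrupt per character are assumptions, and all names are invented for illustration.

```cpp
// Inputs from the post, except where marked as an assumption.
constexpr long kClockHz     = 8000000;  // 8 MHz 286
constexpr long kBitsPerSec  = 9600;     // serial line speed
constexpr long kBitsPerChar = 10;       // assumption: start + 8 data + stop
constexpr long kTaskGates   = 4;        // transitions per character

constexpr long kCharsPerSec   = kBitsPerSec / kBitsPerChar;
constexpr long kCyclesPerChar = kClockHz / kCharsPerSec;

// Upper bound on cycles available per task-gate transition if the
// handler and driver themselves did no work at all.
constexpr long kCyclesPerGate = kCyclesPerChar / kTaskGates;

static_assert(kCharsPerSec == 960, "9600 / 10");
static_assert(kCyclesPerChar == 8333, "8000000 / 960, truncated");
static_assert(kCyclesPerGate == 2083, "8333 / 4, truncated");
```

That leaves roughly 8,300 cycles per character for everything: four task switches, the interrupt handler, the driver, and whatever the interrupted program was doing.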

Scott Lurndal

May 19, 2020, 11:45:24 AM

Charlie Gibbs <cgi...@kltpzyxm.invalid> writes:
>On 2020-05-19, Johann 'Myrkraverk' Oskarsson <joh...@myrkraverk.invalid> wrote:
>
>> So, would we consider the JUMP TSS instruction a design blunder at
>> Intel? According to WP [2], amd64 does not support this instruction at
>> all.
>
>"It seemed like a good idea at the time..."

And it was state of the art in the late '70s and early '80s. Burroughs
medium systems used a similar instruction (BRV, Branch Reinstate Virtual)
that would save the state for the current task and restore the state for
the next task (including the base pointer for the segment tables for
the task).

Bob Eager

May 19, 2020, 11:48:43 AM

The problem was making the TSS switch such a complete switch, with so
much context. Other architectures cached page and segment table entries
separately, and re-acquired them as and when required. Saving just the
working registers could be pretty fast. And having public segments for
the OS (where the segment entries were rarely changed) helped a lot.

Peter Flass

May 19, 2020, 1:24:25 PM

I would imagine Burroughs did a bit of a better job with it, though.

--
Pete

Scott Lurndal

May 19, 2020, 1:33:01 PM

I would say so, but then I was on the new architecture team when the
instruction was created :-)

Here's the simulator version of the instruction:

/**
 * Branch Reinstate Virtual (Rev B).
 *
 * @param opp The operation descriptor
 * @return false
 */
bool
c_processor::op_brv(struct _op *opp)
{
    mem_addr_t sb = c_system::self()->get_subbase();
    mem_addr_t rleaddr = sb + 6002;
    mem_addr_t rle;
    struct timeval end_time;

    flush_mcp_data();

    if (!is_control()
        || !is_kernel()) {
        instruction_error(IE_PRIV_VIOLATION);
    }

    if (p_bfl == 1) {
        rle = get_address(7, getdigits(p_operands[0]->getaddress()+2, 6));

        if (gethex(rle+RLIST_WAIT_FIELD, RLIST_WAIT_FIELD_LEN) == 0x800000) {
            ulong tp = gethex(sb+6000, 8);

            puthex(rle+RLIST_NEXT, RLIST_NEXT_LEN, tp);
            puthex(sb+6000, 2, 0xc7);
            putdigits(rleaddr, 6, rle-sb);
            puthex(rle+RLIST_WAIT_FIELD, 1,
                   (gethex(rle+RLIST_WAIT_FIELD, 1) & ~0x8));
        }
    }
    //
    // Select next task from the ready-list
    //
    while (!is_nullptr(rleaddr)) {
        rle = getdigits(rleaddr, 6)+sb;
        //
        // If the task is not runnable, delink it from the ready-list. If
        // the task is runnable, set the current processor number into the
        // owning process field and drop through to the reinstate code.
        //
        if (gethex(rle+RLIST_WAIT_FIELD, RLIST_WAIT_FIELD_LEN) == 0) {
            putdigits(rle+RLIST_PROC_NUM, RLIST_PROC_NUM_LEN, p_procnum);
        } else {
            if (getdigits(rle+RLIST_PROC_NUM, RLIST_PROC_NUM_LEN) != 0) {
                rleaddr = rle+RLIST_NEXT+2;
                continue;
            }
            puthex(rle+RLIST_WAIT_FIELD, 1, gethex(rle+RLIST_WAIT_FIELD, 1)|8);
            puthex(rleaddr, 6, gethex(rle+RLIST_NEXT+2, 6));
            //
            // On to next ready-list element
            //
            continue;
        }
        //
        // Reinstate the top runnable ready-list task
        //
        p_curtask = rle;
        p_curtasknum = getdigits(p_curtask+RLIST_TASKNUM, RLIST_TASKNUM_LEN);

        set_mode(MODE_EXECUTING);
        p_kerneltime += difftime_usec(&end_time, &p_kernstart);
        c_system::self()->unlock_kernel(); // Release kernel lock

        if (!p_accumulator.load(p_curtask+RLIST_ACCUM, &p_rd)) {
            handle_fault();
        }
        p_mop.load(p_curtask+RLIST_MOP);
        p_imask.load(p_curtask+RLIST_IMASK);
#if defined(VSIM_DEBUG)
        p_logger->trace("[%1.1lu/%4.4lu] set Imask=%3.3lx\n",
                        p_procnum, p_curtasknum, p_imask.get_val());
#endif
        for(ulong i=4, source=p_curtask+RLIST_MOBIX; i < 8; i++, source += 8) {
            p_ix[i]->load(source);
        }
        p_toggles.load(p_curtask+RLIST_MODE);
        p_com_ovf = gethex(p_curtask+RLIST_COMS, RLIST_COMS_LEN);
        if (!p_active_env.load(p_curtask+RLIST_ACTIVE_ENV)) {
            address_error(AE_EO_INV_MSD);
        }
        load_mat(p_curtask, false);
        p_ip = getdigits(p_curtask+RLIST_IP, RLIST_IP_LEN);

        gettimeofday(&p_taskstart, NULL);

        //
        // The real hardware has a running timer that will trigger when the
        // top digit hits zero. We'll do it here, instead. The timer has plenty
        // of room to continue to run, and the task will be charged accordingly.
        //
        if (p_mp->getdigit(p_curtask+RLIST_TSR) == 0) {
            p_interrupts.raise_timer_interrupt();
        }

        //
        // Update the task number at base+82 (the rev B spec doesn't say this!)
        // --- How will this work with multiple processors
        // XXX Maybe this only needs to be done on entry to
        // interrupt procedure?
        //
        putdigits(sb+82, 4,
                  getdigits(p_curtask+RLIST_TASKNUM, RLIST_TASKNUM_LEN));

        if (is_soft_fault_enabled()) {
            if (getdigits(p_curtask+8, 1) != 0) {
                p_logger->trace("Soft Fault Hardware Call invoked\n");
                soft_fault();
                hardware_call(opp, false, 0);
            }
        }

        p_taken = true;
        alarm_check();
        return false;
    }

    idle(opp);

    return false;
}

Daiyu Hurst

May 19, 2020, 10:01:55 PM

On Monday, May 18, 2020 at 8:07:46 AM UTC-4, Johann 'Myrkraverk' Oskarsson wrote:

> Dear a.f.computers,
>
> I'm reading Advanced 80386 Programming Techniques [1], and it covers the
> 386's own task switching mechanism. Now as I understand it, "modern"
> operating systems such as Linux and 386BSD did not make use of these
> hardware facilities for various reasons; and at least in the case of
> Linux I know it did not make use of call gates for system calls [2].
>
> Did any operating system at the time [3] actually make use of these
> hardware task switching facilities? Do we know why, or why not? Did
> NT, or OS/2?

Intel's RTOS 386 product.

-Dai

anti...@math.uni.wroc.pl

May 22, 2020, 11:24:31 AM

Johann 'Myrkraverk' Oskarsson <joh...@myrkraverk.invalid> wrote:

> Dear a.f.computers,
>
> I'm reading Advanced 80386 Programming Techniques [1], and it covers the
> 386's own task switching mechanism. Now as I understand it, "modern"
> operating systems such as Linux and 386BSD did not make use of these
> hardware facilities for various reasons; and at least in the case of
> Linux I know it did not make use of call gates for system calls [2].

Linux 0.01 used the 386's own task switching to implement process
switching. That changed in later versions.

--
Waldek Hebisch