Would you explain software task switching (that is, switching without
using the TSS) on x86 in detail?
Thanks.
It's simple:
-push all regs to the stack
-load a new value into ss:esp (switch kernel stacks)
-pop all regs from the new stack
-load a new value into cr3 (pd base reg)
-load a new ss:esp value into the system tss (patch it)
-iret (when the task switcher is placed into an interrupt handler)
Or even simpler (faster; only for microkernels):
-save all regs to the thread data struct
-load all regs from the new thread data struct
-reload cr3 (on process switches only)
-iret
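A minimal sketch of the second variant, written as a user-space model so it can be compiled and run. The struct layouts, the names (regs, thread, cpu, switch_to) and the cr3_reloads counter are assumptions made for illustration; a real switcher would move the values with mov instructions and end with iret.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread save area; the layout is not from the post. */
struct regs {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;
    uint32_t eip, eflags;
    uint16_t cs, ds, es, fs, gs, ss;
};

struct thread {
    struct regs regs;   /* saved register image */
    uint32_t cr3;       /* page directory base of the owning process */
};

/* Stand-in for the real CPU state so the sketch runs in user space. */
struct cpu {
    struct regs regs;
    uint32_t cr3;
    unsigned cr3_reloads;   /* counts address-space switches */
};

static void switch_to(struct cpu *cpu, struct thread *prev, struct thread *next)
{
    assert(prev != next);
    prev->regs = cpu->regs;          /* save all regs to the thread data struct */
    cpu->regs = next->regs;          /* load all regs from the new thread */
    if (next->cr3 != prev->cr3) {    /* reload cr3 on process switches only */
        cpu->cr3 = next->cr3;
        cpu->cr3_reloads++;
    }
    /* a real interrupt handler would execute iret here */
}
```

Two threads of the same process share a cr3 value, so switching between them skips the reload and leaves the TLB alone.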
Viktor
ps1:
It can be as fast as 64 mov-s in case of a thread switch.
(about 32 cycles on a P6 core /Ppro-PIII/)
The first method modifies the TSS. The x86 reads the ss0:esp0 values from
the TSS on a ring3->ring0 switch. These are the only required fields.
(Using one kernel stack per CPU saves the patch and the reload, but makes
the kernel ring0 code uninterruptible. Actually... who wants to
interrupt the task switcher in a critical section?)
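For reference, here is a sketch of the 32-bit TSS layout with the two fields the CPU reads on a ring3->ring0 switch. The offsets follow the architectural layout; the field names and the helper tss_set_kernel_stack are my own and only illustrate the "patch" step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS; names are illustrative, offsets are architectural. */
struct tss32 {
    uint16_t prev_task, reserved0;
    uint32_t esp0;                 /* loaded into esp on ring3->ring0 */
    uint16_t ss0, reserved1;       /* loaded into ss on ring3->ring0 */
    uint32_t esp1;
    uint16_t ss1, reserved2;
    uint32_t esp2;
    uint16_t ss2, reserved3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, r4, cs, r5, ss, r6, ds, r7, fs, r8, gs, r9;
    uint16_t ldt, r10;
    uint16_t trap, iomap_base;
};

static_assert(offsetof(struct tss32, esp0) == 4, "esp0 is at offset 4");
static_assert(offsetof(struct tss32, ss0) == 8, "ss0 is at offset 8");
static_assert(sizeof(struct tss32) == 104, "a 32-bit TSS is 104 bytes");

/* The "patch": point the TSS at the kernel stack of the next thread. */
static void tss_set_kernel_stack(struct tss32 *tss, uint16_t ss0, uint32_t esp0)
{
    tss->ss0 = ss0;
    tss->esp0 = esp0;
}
```

With software switching, the remaining fields are never loaded by the CPU, which is why only ss0:esp0 need to be kept up to date.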
ps2: the code in unoptimized inline gcc assembly:
asm("_kernel_lbl_int_00: ");
asm(" pushl $0 "); /* filler for the err code */
asm(" pushl $0 "); /* irq number */
asm(" jmp _kernel_lbl_to_kernel ");
[...]
asm("_kernel_lbl_int_40: ");
asm(" pushl $0 "); /* filler for the err code */
asm(" pushl $64 "); /* irq number */
asm(" jmp _kernel_lbl_to_kernel ");
[...]
asm("_kernel_lbl_to_kernel: ");
asm(" pushl %gs "); /* could be optimized */
asm(" pushl %fs "); /* to one instr. */
asm(" pushl %es ");
asm(" pushl %ds ");
asm(" pushl %ebp ");
asm(" pushl %edi ");
asm(" pushl %esi ");
asm(" pushl %edx ");
asm(" pushl %ecx ");
asm(" pushl %ebx ");
asm(" pushl %eax ");
asm(" ");
asm(" movl %esp, %ebx ");
asm(" movl $_kernel_stack_top, %esp "); /* single kstack mode */
asm(" pushl %ebx ");
asm(" movw $16, %ax "); /* KERNEL_DATA_SEG!!!! */
asm(" movw %ax, %ds ");
asm(" movw %ax, %es ");
asm(" movw %ax, %fs ");
asm(" movw %ax, %gs ");
asm(" call _kernel_entry "); /* => eax: retval */
asm(" popl %ebx ");
asm(" movl %ebx, %esp ");
asm(" ");
asm(" popl %eax ");
asm(" popl %ebx ");
asm(" popl %ecx ");
asm(" popl %edx ");
asm(" popl %esi ");
asm(" popl %edi ");
asm(" popl %ebp ");
asm(" popl %ds ");
asm(" popl %es ");
asm(" popl %fs ");
asm(" popl %gs ");
asm(" addl $8, %esp "); /* drop err and irq num */
asm(" iretl ");
[...]
uint kernel_entry(uint* regs)
{
    /* put your microkernel here */
    return(0); /* ignored now, could be used to flag new cr3 (pid) value */
}
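The regs pointer handed to kernel_entry points at the saved eax, so the push order in _kernel_lbl_to_kernel fixes the index of every saved value. A sketch of that mapping follows; the enum and helper names are hypothetical, and the indices assume every stub pushes both an error code (or filler) and an irq number, as the int 40 stub does.

```c
#include <assert.h>
#include <stdint.h>

/* Index of each saved value in the array that kernel_entry() receives. */
enum frame_index {
    F_EAX, F_EBX, F_ECX, F_EDX, F_ESI, F_EDI, F_EBP,
    F_DS, F_ES, F_FS, F_GS,
    F_IRQ,                    /* number pushed by the per-vector stub */
    F_ERR,                    /* error code, or the 0 filler */
    F_EIP, F_CS, F_EFLAGS,    /* pushed by the CPU */
    F_USER_ESP, F_USER_SS     /* present only when entering from ring 3 */
};

static_assert(F_IRQ == 11, "eleven registers are pushed below the irq number");

/* Which vector fired. */
static uint32_t frame_irq(const uint32_t *regs)
{
    return regs[F_IRQ];
}

/* True if the interrupted code ran in ring 3 (RPL of the saved cs). */
static int frame_from_user(const uint32_t *regs)
{
    return (regs[F_CS] & 3) == 3;
}
```

F_USER_ESP and F_USER_SS must only be read when frame_from_user() is true, because the CPU does not push them for an interrupt taken in ring 0.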
"Kovacs Viktor Peter" <k...@error404.obuda.kando.hu> wrote in message
news:20020215024513...@error404.obuda.kando.hu...
Do you allow your kernel to access user-mode buffers?
What happens if they are paged out?
Mike
It won't push the segment registers... but it's still interesting. I know
it's better to keep things clear at first, but using a simple pushad is not
such a complex optimization, is it?
PS: excuse my poor English.
juanjo
Yes... I use it in my code.. the only "drawback" I remember is that
esp is pushed twice ... not a big issue anyway.
Regarding clocks, IIRC it was nearly the same, at least on the 386...
Greetings,
David.
Reply: replace 'nospam' with my lastname
--
OS/2: Obsolete Soon, Too.
On Fri, 15 Feb 2002 16:53:43 +0100, Juan J. Martinez spake thus:
pusha(d) takes 5 cycles on the Pentium, but can be broken down into 8
separate, pairable push instructions, taking a total of 4 cycles.
On the 486, pusha(d) takes 11 cycles, against 8 instructions which
take 1 cycle each, giving a total of 8 cycles.
On the 386, pusha(d) takes 24 cycles, against 8 instructions which
take 2 cycles (a total of 16 cycles).
On the 286, pusha is faster than the 8 instructions (19 cycles against
8 * 3).
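The comparison above is simple arithmetic, sketched here as a small table. The cycle counts are copied from this post and not checked against the manuals; the struct and function names are made up for the example.

```c
#include <assert.h>

struct push_timing {
    const char *cpu;
    int pusha_cycles;       /* one pusha(d) */
    int push_cycles;        /* one push reg */
    int pushes_per_cycle;   /* 2 where pushes pair (Pentium), else 1 */
};

/* Figures as quoted in the post. */
static const struct push_timing timings[] = {
    { "286",     19, 3, 1 },
    { "386",     24, 2, 1 },
    { "486",     11, 1, 1 },
    { "Pentium",  5, 1, 2 },
};

/* Cycles for eight separate push instructions. */
static int eight_pushes(const struct push_timing *t)
{
    assert(t->pushes_per_cycle > 0);
    return 8 * t->push_cycles / t->pushes_per_cycle;
}

/* Nonzero when the separate pushes beat a single pusha(d). */
static int separate_pushes_win(const struct push_timing *t)
{
    return eight_pushes(t) < t->pusha_cycles;
}
```

On these numbers the separate pushes win everywhere except the 286.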
--
Debs
de...@dwiles.nospam.demon.co.uk
----
Misspelled? Impossible! I have an error correcting modem.
No, buffers passed to the kernel must be allocated as movable, non-pageable
memory (a.k.a. message memory). Validity is tested before any access.
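One possible shape for such a validity test, as a sketch only. The msg_region descriptor and buffer_is_valid are hypothetical names, and the sketch assumes each task has a single contiguous message-memory region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one task's message-memory region. */
struct msg_region {
    uintptr_t base;
    size_t    size;
};

/* Returns 1 if [addr, addr + len) lies entirely inside the region. */
static int buffer_is_valid(const struct msg_region *r, uintptr_t addr, size_t len)
{
    assert(r != NULL);
    if (addr < r->base)
        return 0;
    size_t off = addr - r->base;
    if (off > r->size)
        return 0;
    /* Compare against the remaining space so addr + len cannot overflow. */
    return len <= r->size - off;
}
```

The check never computes addr + len, so a huge length supplied by user code cannot wrap around and pass the test.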
Viktor
This is why I said 'unoptimized' in the title. Actually, it turns out
that complex microcode is slower than simple instructions.
Also, the popad can leave the task in a non-restartable condition after
a fault.
Viktor
Usually even simpler:
- All regs are already pushed to the stack in the user -> kernel transition
  prolog. That is separate code, not part of the context swap.
- Save the current values of EBP and ESP to the current thread structure.
- Load the new value of "ESP base" into the TSS (one per CPU) from the new
  thread structure. This is the kernel ESP which will be set on a
  user/kernel transition.
- Load the new EBP and ESP from the new thread structure.
- Update the per-CPU "current thread" pointer (unless you use the Linux
  design of keeping the thread structure on the stack).
- If you have switched to another process, load the CR3 register from the
  new process' descriptor. This switches the address space.
- Return.
There is also some housekeeping for the "task switched" flag (CR0.TS), the
I/O permission bitmap, the GDT entry which describes the user-mode TLS, and
the LDTR if you support 16-bit user apps.
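The steps in the list can be modelled in plain C roughly as follows. This is a user-space sketch under my own naming (process, thread, cpu, switch_thread) and is not the W2K code; the real thing swaps ESP/EBP in assembly and writes the actual TSS and CR3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct process { uint32_t cr3; };

struct thread {
    uint32_t esp, ebp;         /* saved kernel stack and frame pointers */
    uint32_t kernel_esp_base;  /* top of this thread's kernel stack */
    struct process *proc;
};

/* Stand-in for one CPU's state so the sketch runs in user space. */
struct cpu {
    uint32_t esp, ebp, cr3;
    uint32_t tss_esp0;         /* esp0 field of the per-CPU TSS */
    struct thread *current;    /* per-CPU "current thread" pointer */
};

static void switch_thread(struct cpu *cpu, struct thread *next)
{
    struct thread *prev = cpu->current;
    assert(next != NULL && next != prev);

    prev->esp = cpu->esp;                    /* save ESP and EBP */
    prev->ebp = cpu->ebp;
    cpu->tss_esp0 = next->kernel_esp_base;   /* patch the TSS */
    cpu->esp = next->esp;                    /* load the new ESP and EBP */
    cpu->ebp = next->ebp;
    cpu->current = next;                     /* update "current thread" */
    if (next->proc != prev->proc)            /* other process: new CR3 */
        cpu->cr3 = next->proc->cr3;
}
```

The general registers do not appear here because the entry prolog has already pushed them on the outgoing thread's kernel stack.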
I can email the assembly for KiSwapThread and SwapContext from w2k SMP kernel to anybody.
Email me directly.
Max