CURLWP global register in NetBSD/mips

Toru Nishimura

Dec 21, 2010, 4:56:05 AM

Guys,

The NetBSD/mips kernel uses a dedicated register to hold the curlwp
global variable. Register S7 is used in -current, while T8 is used in
the matt-nb5-mips64 branch (why do they differ?)

The places where the curlwp register is reassigned are pretty
limited. As far as I understand, only cpu_switchto() does it.

Looking at the bottom half of cpu_lwp_fork(), we find a sequence
that assigns values to newlwp's switchframe. The following line

pcb->pcb_context.val[MIPS_CURLWP_LABEL] = (intptr_t)l2;

is supposed to make the dedicated curlwp register hold
newlwp at the very bottom of cpu_switchto().

During the context switch steps, however, cpu_switchto() assigns
the newlwp value to the reserved register anyway. So this particular
switchframe arrangement for the bottom of cpu_switchto() makes
little sense.
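
The redundancy described above can be sketched with a small user-space model. This is only an illustration, not the kernel code: struct lwp, struct switchframe, CURLWP_SLOT, curlwp_reg, fork_prime() and switchto() are hypothetical stand-ins for the real NetBSD definitions, and the restore_from_frame flag exists only to compare the two behaviours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real NetBSD definitions. */
struct lwp { int id; };

enum { CURLWP_SLOT = 0, NSLOTS = 1 };   /* plays the role of MIPS_CURLWP_LABEL */

struct switchframe { intptr_t val[NSLOTS]; };

/* Models the dedicated register (S7 or T8) that holds curlwp. */
static struct lwp *curlwp_reg;

/* Models cpu_lwp_fork() priming the new lwp's switchframe. */
static void fork_prime(struct switchframe *sf, struct lwp *l2)
{
    sf->val[CURLWP_SLOT] = (intptr_t)l2;
}

/*
 * Models cpu_switchto(): the register is set to newlwp during the
 * switch, then optionally reloaded from the frame at the bottom.
 */
static struct lwp *switchto(struct lwp *newlwp, const struct switchframe *sf,
                            int restore_from_frame)
{
    assert(newlwp != NULL);
    curlwp_reg = newlwp;                 /* assigned unconditionally */
    if (restore_from_frame)              /* reloads the same pointer */
        curlwp_reg = (struct lwp *)sf->val[CURLWP_SLOT];
    return curlwp_reg;
}
```

In this model, calling switchto() with restore_from_frame set to 0 or 1 returns the same lwp pointer once fork_prime() has run, which is why the frame slot and the reload look removable.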

So I propose the following:

1. remove the [... _CURLWP_LABEL] arrangement in vm_machdep.c
2. remove the S7 (T8) value restoration at the bottom of cpu_switchto().

Comments are welcome.

Toru Nishimura / ALKYL Technology

Matt Thomas

Dec 21, 2010, 12:45:21 PM

On Dec 21, 2010, at 1:56 AM, Toru Nishimura wrote:

> Guys,
>
> NetBSD/mips kernel uses a dedicated register to hold curlwp
> global variable. Register S7 is used in -current while T8 in
> matt-nb5-mips64 branch (why differ?)

Better code. T8 is rarely used but S7 is one of the saved
registers and used much more often.

> The place where curlwp register is reassigned is pretty
> limited. As far as I understand, only cpu_switchto() does it.
>
> Looking at cpu_lwp_fork() bottom half, we find a sequence
> to assign values for newlwp's switchframe. The following line
>
> pcb->pcb_context.val[MIPS_CURLWP_LABEL] = (intptr_t)l2;
>
> is supposed to make the dedicated curlwp register to hold
> newlwp at the very bottom of cpu_switchto().
>
> During the context switch steps, cpu_switchto() assigns newlwp
> value to the reserved register anyway. So, this particular
> switchframe arrangement for cpu_switchto() bottom makes little
> sense.
>
> Now I propose here;
>
> 1. remove [... _CURLWP_LABEL] arrangement in vm_machdep.c
> 2. remove S7 (T8) value restoration at the bottom of cpu_switchto().
>
> Comments are welcome.

3. Remove S7 (T8) value saving as well.

Simon Burge

Dec 21, 2010, 2:09:17 PM

Matt Thomas wrote:

> On Dec 21, 2010, at 1:56 AM, Toru Nishimura wrote:
>
> > Guys,
> >
> > NetBSD/mips kernel uses a dedicated register to hold curlwp
> > global variable. Register S7 is used in -current while T8 in
> > matt-nb5-mips64 branch (why differ?)
>
> Better code. T8 is rarely used but S7 is one of the saved
> registers and used much more often.

I'm curious - was the change to put curlwp in a register ever actually
benchmarked? I couldn't find anything. I can find a reference to a
kernel being 2487 bytes smaller in April 2007, but no benchmarks.

Cheers,
Simon.

Matt Thomas

Dec 21, 2010, 2:20:32 PM

On Dec 21, 2010, at 11:09 AM, Simon Burge wrote:

> Matt Thomas wrote:
>
>> On Dec 21, 2010, at 1:56 AM, Toru Nishimura wrote:
>>
>>> Guys,
>>>
>>> NetBSD/mips kernel uses a dedicated register to hold curlwp
>>> global variable. Register S7 is used in -current while T8 in
>>> matt-nb5-mips64 branch (why differ?)
>>
>> Better code. T8 is rarely used but S7 is one of the saved
>> registers and used much more often.
>
> I'm curious - was the change to put curlwp ever actually benchmarked? I
> couldn't find anything. I can find a reference to a kernel being 2487
> bytes smaller in April 2007 but no benchmarks.

I was asked about the use of S7 early this year and came up with:

The change came from the yamt-idlelwp branch (2007-05-17), as indicated in 1.45 of sys/arch/mips/conf/Makefile.mips. There is no explanation of how s7 was chosen.

Here's a breakdown of how often each register is used in a MALTA64 kernel. I moved MIPS_CURLWP from s7 to t8 (register 23 to 24). It's curious that one of t1/t2 is never used.
s7 is now used 7524 times, and the 288 uses of t8 have been moved to another register.

For a MALTA32 kernel:

   text    data     bss     dec     hex filename
2561855  434128  217652 3213635  310943 s7 curlwp
2537035  434672  217652 3189359  30aa6f t8 curlwp

Text difference: 24820 bytes, or 6205 instructions.

Saves about 1% in text size. Not a lot, but for a simple change, pretty impressive.
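
The arithmetic behind those figures can be checked with a short sketch. It assumes fixed 4-byte MIPS instructions (no MIPS16 code in the kernel); the helper names text_saved(), insns_saved() and percent_saved() are made up for this illustration.

```c
#include <assert.h>

/* Assumption: every instruction in the kernel text is 4 bytes wide. */
enum { MIPS_INSN_BYTES = 4 };

/* Bytes of text saved between two builds. */
static long text_saved(long before, long after)
{
    assert(before >= after);
    return before - after;
}

/* Number of instructions those bytes correspond to. */
static long insns_saved(long bytes)
{
    return bytes / MIPS_INSN_BYTES;
}

/* Saving as a percentage of the original text size. */
static double percent_saved(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

With the MALTA32 text sizes, text_saved(2561855, 2537035) gives 24820 bytes, insns_saved(24820) gives 6205, and percent_saved() comes out just under 1%.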


s7(23) t8(24)
----------------
25 25 gp
58 58 at
251 251 k1
264 264 k0
277 860 t8
456 463 t3
531 t2
807 t1
847 7524 s7
1539 1533 t0
2341 2301 a7
4702 4676 a6
5566 5528 a5
7586 5757 s8
9155 9149 s6
11391 11354 a4
12806 12804 s5
16909 16894 s4
20934 20930 ra
23441 23467 s3
26381 26332 a3
32974 32970 s2
34143 33753 a2
44910 44888 s1
59371 59015 a1
60367 60371 s0
81935 80425 v1
91835 91109 a0
121093 120234 sp
187080 185657 v0
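
The thread does not say how this tally was produced. One plausible way, sketched here purely as an assumption, is to count whole-token mentions of each register name in a kernel disassembly. count_reg() is a hypothetical helper, and the sample text in the note after the code is invented for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Count occurrences of a register name in disassembly text, accepting
 * a match only when it is not part of a longer alphanumeric token
 * (so "s7" does not match inside "s70").
 */
static unsigned count_reg(const char *text, const char *reg)
{
    size_t n = strlen(reg);
    unsigned hits = 0;

    assert(n > 0);
    for (const char *p = strstr(text, reg); p != NULL;
         p = strstr(p + 1, reg)) {
        int left_ok = (p == text) || !isalnum((unsigned char)p[-1]);
        int right_ok = !isalnum((unsigned char)p[n]);
        if (left_ok && right_ok)
            hits++;
    }
    return hits;
}
```

For example, on the text "lw v0,0(s7)" followed by "addu s7,s7,t8", count_reg() finds three mentions of s7 and one of t8. A real tally would also have to skip addresses and symbol names that happen to look like register names.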

Toru Nishimura

Dec 21, 2010, 6:18:28 PM

Matt Thomas said;

>> 1. remove [... _CURLWP_LABEL] arrangement in vm_machdep.c
>> 2. remove S7 (T8) value restoration at the bottom of cpu_switchto().
>>
>> Comments are welcome.
>
> 3. Remove S7 (T8) value saving as well.

I considered the 3rd change but hesitate to make it, since it would
probably be helpful to keep the 'curlwp' value of an arbitrary lwp in
its switchframe.

Toru Nishimura / ALKYL Technology

Toru Nishimura

Dec 21, 2010, 8:42:04 PM

Guys,

>>> 1. remove [... _CURLWP_LABEL] arrangement in vm_machdep.c
>>> 2. remove S7 (T8) value restoration at the bottom of cpu_switchto().

I made the changes and committed the code. It compiled OK for pmax, and
I inspected the disassembled binaries to verify the change. It should be
no worse than the previous state of -current. Not tested on real hardware.

Toru Nishimura

Dec 24, 2010, 1:43:23 AM

Matt Thomas said;

> Here's a breakdown of a MALTA64 kernel of how often each register is used. I
> moved MIPS_CURLWP from s7 to t8 (23 to 24). It's curious as to why one of
> t1/t2 is never used.

That comes from the n32/n64 ABI, which extends the number of arguments
passed in registers from 4 to 8. The result shows that a4-a7 are indeed
used frequently. It might imply that functions prefer a0-a7 as temporary
scratch registers over t0-t3.

It'd be interesting to profile an o32 kernel, since the registers named
a4-a7 here are t0-t3 in the o32 ABI.
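
The renaming behind this observation can be written down as a small lookup table. Under o32, registers $8-$15 are t0-t7; under n32/n64, $8-$11 become a4-a7 and $12-$15 are called t0-t3. The gpr_name() helper and the enum names are hypothetical, and only registers $8-$15 are covered.

```c
#include <assert.h>
#include <string.h>

enum mips_abi { ABI_O32, ABI_N32_N64 };

/* Conventional name of general-purpose register $8..$15 under each ABI. */
static const char *gpr_name(enum mips_abi abi, int regno)
{
    static const char *const o32_names[8] = {
        "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"
    };
    static const char *const n32_n64_names[8] = {
        "a4", "a5", "a6", "a7", "t0", "t1", "t2", "t3"
    };

    assert(regno >= 8 && regno <= 15);
    return abi == ABI_O32 ? o32_names[regno - 8]
                          : n32_n64_names[regno - 8];
}
```

So gpr_name(ABI_O32, 8) is "t0" while gpr_name(ABI_N32_N64, 8) is "a4": the same hardware register is a scratch register in one ABI and an argument register in the other, so per-name counts from the two ABIs are not directly comparable.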

Paul Koning

Dec 24, 2010, 12:36:24 PM

I think that gcc (all else being equal) will allocate registers from lowest numbered to highest. Saved registers are treated differently from non-saved registers, I think, but I would expect all non-saved registers to be treated the same. That would account for the a registers being used as scratch a lot.

I'm not sure if this matters other than as a curiosity. If yes, it can be adjusted easily enough by changes to the back-end code in gcc.

paul
