How to use the x4 register (thread pointer)?


Ma vincent

Dec 5, 2016, 10:16:11 AM
to RISC-V ISA Dev, RISC-V SW Dev

Hi all,


I am trying to run an RTOS on my simple RISC-V core. Is the x4 register, named 'thread pointer', used for storing the current PC of the current thread?


best,

vincent


Samuel Falvo II

Dec 5, 2016, 10:23:20 AM
to Ma vincent, RISC-V SW Dev, RISC-V ISA Dev
I believe it's used to point to thread-local storage.  The PC is in the PC register (it cannot fetch instructions otherwise).
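
For an RTOS, one way to picture the distinction is a per-task context that stores the resume PC and the thread pointer as separate fields. The sketch below is only an illustration under my own assumptions: the names task_context, cpu_state, and context_switch are hypothetical, and the CPU is simulated as a struct so the code runs on any host. On real hardware the resume address would typically come from a trap CSR such as mepc, not from x4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task context; all names here are illustrative. */
typedef struct {
    uintptr_t pc;   /* resume address (on real hardware, e.g. saved from mepc) */
    uintptr_t sp;   /* x2: stack pointer */
    uintptr_t tp;   /* x4: base of this task's thread-local data */
} task_context;

/* Simulated CPU state so the sketch runs on any host. */
typedef struct {
    uintptr_t pc;
    uintptr_t sp;
    uintptr_t tp;
} cpu_state;

/* Save the outgoing task's registers, then load the incoming task's. */
static void context_switch(cpu_state *cpu, task_context *from,
                           const task_context *to)
{
    assert(cpu && from && to);
    from->pc = cpu->pc;
    from->sp = cpu->sp;
    from->tp = cpu->tp;
    cpu->pc = to->pc;
    cpu->sp = to->sp;
    cpu->tp = to->tp;
}
```

The point of the layout is that pc and tp are independent: tp only says where the task's private data lives.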

--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/SG2PR03MB143846F49AC9F8C33FE2161BC8830%40SG2PR03MB1438.apcprd03.prod.outlook.com.

Andrew Waterman

Dec 5, 2016, 2:11:49 PM
to Samuel Falvo II, Ma vincent, RISC-V SW Dev, RISC-V ISA Dev
Yeah, it's used by __thread / thread_local in pthreads / C++ programs.
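
As a rough illustration of what __thread gives you, here is a minimal pthreads sketch. The names per_thread_counter, worker, and tls_is_per_thread are made up for this example. Each thread increments its own copy of the variable, which on RISC-V a compiler would typically address relative to tp (x4); the exact code generated depends on the toolchain and TLS model.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One copy of this variable exists per thread. */
static __thread int per_thread_counter = 0;

static void *worker(void *arg)
{
    int increments = *(int *)arg;
    for (int i = 0; i < increments; i++)
        per_thread_counter++;
    /* Report this thread's private total back through the argument. */
    *(int *)arg = per_thread_counter;
    return NULL;
}

/* Runs two threads with different increment counts and returns 1 if
   each thread saw only its own increments. */
static int tls_is_per_thread(void)
{
    int a = 3, b = 5;
    pthread_t ta, tb;
    int rc = pthread_create(&ta, NULL, worker, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, worker, &b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    /* The main thread's copy was never touched. */
    return a == 3 && b == 5 && per_thread_counter == 0;
}
```

If the counter were an ordinary global, the two threads would share it and the totals would not come out as 3 and 5 independently.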



Richard W.M. Jones

Dec 7, 2016, 9:38:27 AM
to Andrew Waterman, Samuel Falvo II, RISC-V SW Dev, RISC-V ISA Dev
On Mon, Dec 05, 2016 at 11:11:46AM -0800, Andrew Waterman wrote:
> Yeah, it's used by __thread / thread_local in pthreads / C++ programs.

I had a possibly naive question about this: Do implementations treat
these registers like x4, x2 (sp), x1 (ra) differently, perhaps
optimizing them in some way? I mean by this three points:

(1) Can a non-threaded program go ahead and use x4 for its own
purposes without any penalty or other issues?

(2) Apart from the obvious tools and ABI concerns, could an
implementation use different registers for stack pointer, TLS, etc?

(3) Apart from x0, are all registers identical at the chip level?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org

David Chisnall

Dec 7, 2016, 9:46:27 AM
to Richard W.M. Jones, Andrew Waterman, Samuel Falvo II, RISC-V SW Dev, RISC-V ISA Dev
On 7 Dec 2016, at 14:38, Richard W.M. Jones <rjo...@redhat.com> wrote:
>
> On Mon, Dec 05, 2016 at 11:11:46AM -0800, Andrew Waterman wrote:
>> Yeah, it's used by __thread / thread_local in pthreads / C++ programs.
>
> I had a possibly naive question about this: Do implementations treat
> these registers like x4, x2 (sp), x1 (ra) differently, perhaps
> optimizing them in some way? I mean by this three points:
>
> (1) Can a non-threaded program go ahead and use x4 for its own
> purposes without any penalty or other issues?

Aside from ABI issues, this should be safe. Of course, if you can’t guarantee that your program is completely free from uses of TLS (which it will pick up from linking most libc implementations), then you can use the thread pointer register only between function calls (i.e. you must restore it before any calls and on function return).
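
A minimal sketch of the "borrow it between calls and restore it" pattern, assuming GCC or Clang inline assembly; the function names are hypothetical. On RISC-V the whole save/use/restore sequence sits inside one asm statement, so no call or return ever observes the borrowed value. On other targets the code falls back to plain arithmetic. Note that this does nothing about asynchronous events: a signal handler arriving mid-sequence would still see the modified tp.

```c
#include <assert.h>

/* Adds a and b, borrowing tp (x4) as a scratch register on RISC-V. */
static long add_using_tp_as_scratch(long a, long b)
{
#if defined(__riscv)
    long saved_tp, result;
    __asm__ volatile(
        "mv %0, tp\n\t"        /* save the real thread pointer */
        "mv tp, %2\n\t"        /* tp now holds a */
        "add %1, tp, %3\n\t"   /* result = tp + b */
        "mv tp, %0\n\t"        /* restore before leaving the asm block */
        : "=&r"(saved_tp), "=&r"(result)
        : "r"(a), "r"(b));
    return result;
#else
    /* Not RISC-V: there is no tp register, so just do the arithmetic. */
    return a + b;
#endif
}

/* Small usage check. */
static void check_scratch_tp(void)
{
    assert(add_using_tp_as_scratch(2, 3) == 5);
}
```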

> (2) Apart from the obvious tools and ABI concerns, could an
> implementation use different registers for stack pointer, TLS, etc?
>
> (3) Apart from x0, are all registers identical at the chip level?

These two are closely related. I’m not sure about existing RISC-V implementations, but it’s fairly common to do various microarchitectural optimisations based on some of these things. For example:

- A jump to the link register is typically treated as a return and will be predicted using a call stack predictor (less relevant in superscalar designs).

- x86 chips alias the top few stack slots with rename registers (which is why push and pop are so expensive), though this kind of optimisation is unlikely to be used with RISC-V.

- Some RISC chips use the stack pointer register to hint cache prefetching.

If any implementation does any of these things, then you will get worse performance from using these registers for other purposes.

David

Richard Herveille

Dec 7, 2016, 9:49:42 AM
to Richard W.M. Jones, Richard Herveille, Andrew Waterman, Samuel Falvo II, RISC-V SW Dev, RISC-V ISA Dev
On 07 Dec 2016, at 15:38, Richard W.M. Jones <rjo...@redhat.com> wrote:
>
> On Mon, Dec 05, 2016 at 11:11:46AM -0800, Andrew Waterman wrote:
>> Yeah, it's used by __thread / thread_local in pthreads / C++ programs.
>
> I had a possibly naive question about this: Do implementations treat
> these registers like x4, x2 (sp), x1 (ra) differently, perhaps
> optimizing them in some way? I mean by this three points:
>
> (1) Can a non-threaded program go ahead and use x4 for its own
> purposes without any penalty or other issues?
>
> (2) Apart from the obvious tools and ABI concerns, could an
> implementation use different registers for stack pointer, TLS, etc?
>
> (3) Apart from x0, are all registers identical at the chip level?



I guess that’s implementation dependent. In my implementation: yes, yes, and yes.
Even x0 is implemented as a regular register, because I implement the register file in SRAM. Making a plain 32x32 (or 32x64) SRAM is simpler than adding an exception for address 0 in the storage itself, so I use a regular SRAM register file and always output ‘0’ when I detect address 0.
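
A small software model of the scheme described above, as a sketch only: the type and function names are invented for illustration and this is not the actual RTL. Storage is a plain 32-entry array, writes go to any index including 0, and the read path forces the output to zero for address 0.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32

/* Plain 32-entry storage: entry 0 is an ordinary word, like the SRAM. */
typedef struct {
    uint32_t mem[NUM_REGS];
} regfile;

/* Writes go to the array unconditionally, including index 0. */
static void regfile_write(regfile *rf, unsigned idx, uint32_t value)
{
    assert(idx < NUM_REGS);
    rf->mem[idx] = value;
}

/* Reads force the output to zero when the address is 0. */
static uint32_t regfile_read(const regfile *rf, unsigned idx)
{
    assert(idx < NUM_REGS);
    return idx == 0 ? 0u : rf->mem[idx];
}
```

From the program's point of view x0 still always reads as zero, even though the underlying word may hold whatever was last written to it.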

Richard


ROA LOGIC
Design Services and Silicon Proven IP

Richard Herveille
Managing Director
Cell +31 (6) 5207 2230





Stefan O'Rear

Dec 7, 2016, 2:25:12 PM
to David Chisnall, Richard W.M. Jones, Andrew Waterman, Samuel Falvo II, RISC-V SW Dev, RISC-V ISA Dev
On Wed, Dec 7, 2016 at 6:46 AM, David Chisnall
<David.C...@cl.cam.ac.uk> wrote:
> On 7 Dec 2016, at 14:38, Richard W.M. Jones <rjo...@redhat.com> wrote:
>>
>> On Mon, Dec 05, 2016 at 11:11:46AM -0800, Andrew Waterman wrote:
>>> Yeah, it's used by __thread / thread_local in pthreads / C++ programs.
>>
>> I had a possibly naive question about this: Do implementations treat
>> these registers like x4, x2 (sp), x1 (ra) differently, perhaps
>> optimizing them in some way? I mean by this three points:
>>
>> (1) Can a non-threaded program go ahead and use x4 for its own
>> purposes without any penalty or other issues?
>
> Aside from ABI issues, this should be safe. Of course, if you can’t guarantee that your program is completely free from uses of TLS (which it will pick up from linking most libc implementations), then you can use the thread pointer register only between function calls (i.e. you must restore it before any calls and on function return).

Does your program have any signal handlers? If so you may have
problems because the signal handlers will expect the "real" x3 and x4.
(Theoretically the kernel could store a copy of x3 and x4, accessible
via prctl, and swap them in signal delivery / rt_sigreturn; this might
be worth proposing if enough people want to spill x3 and x4).

>> (2) Apart from the obvious tools and ABI concerns, could an
>> implementation use different registers for stack pointer, TLS, etc?

At a system level, yes. In RV64G, all registers (x0 aside) are semantically identical.

The C extension is more opinionated: many compressed instructions can
only use x8-x15, so those should be the hottest, and several
compressed instructions can only use x1 or x2.
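
To make the x8-x15 restriction concrete, here is a tiny sketch with hypothetical helper names. The compressed formats that use 3-bit register fields can only name x8 through x15, with field value N - 8 for register xN, so x4 (tp) cannot appear in those fields.

```c
#include <assert.h>
#include <stdbool.h>

/* True if register xN can be named by a 3-bit compressed register
   field, i.e. N is in the range 8..15. */
static bool rvc_reg_encodable(unsigned xreg)
{
    return xreg >= 8 && xreg <= 15;
}

/* 3-bit field value for x8..x15: field = N - 8. */
static unsigned rvc_reg_field(unsigned xreg)
{
    assert(rvc_reg_encodable(xreg));
    return xreg - 8;
}
```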

>> (3) Apart from x0, are all registers identical at the chip level?
>
> These two are closely related. I’m not sure about existing RISC-V implementations, but it’s fairly common to do various microarchitectural optimisations based on some of these things. For example:

They're identical at the ISA level. Chips can distinguish them, but
that has to be handled as optimizations; you can use any stack pointer
and return pointer you want and it might be slower but it will work.

In rocket-chip, the only irregularity is that jumps to x1 and x5 (t0,
the secondary return pointer used for millicode) are predicted using
the return address stack instead of the BTB. Rocket-chip has no
special handling of x2/x3/x4.
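
For illustration, the decision a predictor might make for a JALR can be sketched as below. This follows the return-address-stack hint convention from the JALR description in the RISC-V unprivileged spec, where x1 and x5 count as link registers; it is not Rocket's actual logic, and the names are made up.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RAS_NONE, RAS_PUSH, RAS_POP, RAS_POP_THEN_PUSH } ras_action;

/* x1 (ra) and x5 (t0) are treated as link registers. */
static bool is_link_reg(unsigned r)
{
    return r == 1 || r == 5;
}

/* Return-address-stack action hinted by the registers of a JALR. */
static ras_action jalr_ras_action(unsigned rd, unsigned rs1)
{
    assert(rd < 32 && rs1 < 32);
    bool rd_link = is_link_reg(rd);
    bool rs1_link = is_link_reg(rs1);
    if (!rd_link)
        return rs1_link ? RAS_POP : RAS_NONE;   /* e.g. ret = jalr x0, x1 */
    if (!rs1_link)
        return RAS_PUSH;                        /* ordinary indirect call */
    return rd == rs1 ? RAS_PUSH : RAS_POP_THEN_PUSH;
}
```

A jump through x4 falls in the RAS_NONE case, so it still works as a return but would be predicted like any other indirect jump.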

> - A jump to the link register is typically treated as a return and will be predicted using a call stack predictor (less relevant in superscalar designs).
>
> - x86 chips alias the top few stack slots with rename registers (which is why push and pop are so expensive), though this kind of optimisation is unlikely to be used with RISC-V.

This claim is repeated often but I can't find a source for it. In
particular I checked Agner Fog's manuals pretty closely a week ago.

Most modern x86 tracks the _value_ of RSP in the frontend, so that
RSP-relative addressing / PUSH / POP are absolute loads and stores in
the OOO engine, which reduces op counts for push/pop and makes
store-to-load forwarding more effective.

> - Some RISC chips use the stack pointer register to hint cache prefetching.
>
> If any implementation does any of these things, then you will get worse performance from using these registers for other purposes.

-s