MUST software mark unused PTEs in a page directory as not valid?

Adnan Hamid

Mar 3, 2025, 12:48:14 PM
to RISC-V ISA Dev
Here is another tape-out gating customer question that is in need of a definitive ruling:

May a compliant RISCV implementation assume that software MUST mark unused PTEs in a page directory as not valid?

A situation arose where the Breker RISCV test generator created a scenario in which it jumped (`jalr`) to a virtual address that was the second to last byte of a 4K page. The bytes at that address decoded to a return instruction that would allow software to continue normal execution.

Meanwhile, an aggressive code prefetch engine speculatively read the following PTE, which held garbage bytes that happened to decode to a 64K page overlapping the 4K page being accessed. The 64K mapping overwrote the original 4K mapping in the TLB, and as a side effect the `jalr` took an instruction access fault.
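
To make the overlap concrete, here is a minimal Python sketch of the situation described above. It is an illustration only, not part of the original report. It assumes the Sv39 PTE layout (V in bit 0, R in bit 1, X in bit 3, PPN in bits 53:10), the Svnapot N bit in bit 63, and the 64 KiB NAPOT encoding in which the low four PPN bits are 1000. The addresses and the garbage PTE value are hypothetical.

```python
# Hypothetical values throughout; only the bit positions follow Sv39/Svnapot.
PTE_V = 1 << 0           # valid
PTE_R = 1 << 1           # readable
PTE_X = 1 << 3           # executable
PTE_N = 1 << 63          # Svnapot NAPOT bit
PPN_SHIFT = 10
PPN_MASK = (1 << 44) - 1

def is_napot_64k_leaf(pte: int) -> bool:
    """True if the raw PTE decodes as a valid 64 KiB NAPOT leaf."""
    ppn = (pte >> PPN_SHIFT) & PPN_MASK
    is_leaf = bool(pte & (PTE_R | PTE_X))
    return bool(pte & PTE_V) and is_leaf and bool(pte & PTE_N) and (ppn & 0xF) == 0x8

def napot_64k_range(va: int) -> tuple[int, int]:
    """Return the [start, end) virtual range of the 64 KiB region containing va."""
    start = va & ~0xFFFF
    return start, start + 0x10000

jalr_target = 0x4000_0FFE   # second-to-last byte of a 4 KiB page (made up)
next_page = 0x4000_1000     # the page whose PTE the prefetcher read
garbage_pte = PTE_N | (0x12348 << PPN_SHIFT) | PTE_X | PTE_V

start, end = napot_64k_range(next_page)
overlaps = start <= jalr_target < end
print(is_napot_64k_leaf(garbage_pte), overlaps)
```

Because the two pages share one naturally aligned 64 KiB region, a garbage PTE for the second page that decodes as a NAPOT leaf claims to translate the `jalr` target as well.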

From 5.2.1 Supervisor Memory-Management Fence Instruction:
"An implicit read of the memory-management data structures may return any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address."

A `sfence.vma` was issued for the `jalr` target, but not for any address in the following page, so it is reasonable for the prefetch to have read the unused PTE.

Do we have any rule along the lines of "speculative implicit reads cannot have side effects such as causing page faults or overwriting TLB entries"?

-adnan

Greg Favor

Mar 3, 2025, 1:13:51 PM
to Adnan Hamid, RISC-V ISA Dev
I'm not sure if this fully addresses what you are asking, but ...

Speculative/prefetch reads of PTEs can be cached, but cannot cause arch exceptions.  An arch exception only happens on an actual use of a translation by an instruction.

If multiple translations of different page sizes are cached (because an sfence.vma was not executed as part of changing the page tables from one page size to the other), an implementation is free to use either translation.

An address-specific sfence must invalidate all translations, of whatever page sizes, that contain the address.
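
The rule about address-specific fences can be sketched with a toy TLB model in Python. This is a simplified illustration under stated assumptions: the class and method names (`ToyTlb`, `sfence_vma_addr`) are hypothetical, ASIDs and replacement policy are ignored, and the addresses are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TlbEntry:
    va_base: int   # start of the virtual range this entry maps
    size: int      # 4 KiB, 64 KiB, ...
    pa_base: int   # start of the physical range

    def covers(self, va: int) -> bool:
        return self.va_base <= va < self.va_base + self.size

class ToyTlb:
    """Toy model: no ASIDs, no replacement policy, entries may overlap."""

    def __init__(self) -> None:
        self.entries: list[TlbEntry] = []

    def fill(self, entry: TlbEntry) -> None:
        self.entries.append(entry)

    def matches(self, va: int) -> list[TlbEntry]:
        # Several entries of different sizes may match the same address.
        return [e for e in self.entries if e.covers(va)]

    def sfence_vma_addr(self, va: int) -> None:
        # Drop every entry, of whatever page size, that contains va.
        self.entries = [e for e in self.entries if not e.covers(va)]

tlb = ToyTlb()
tlb.fill(TlbEntry(0x4000_0000, 0x1000, 0x8000_0000))    # 4 KiB mapping
tlb.fill(TlbEntry(0x4000_0000, 0x10000, 0x1234_0000))   # overlapping 64 KiB mapping
tlb.fill(TlbEntry(0x4002_0000, 0x1000, 0x9000_0000))    # unrelated 4 KiB mapping
```

In this model, a lookup of `0x4000_0FFE` matches both the 4 KiB and the 64 KiB entry, and a fence on that address removes both while leaving the unrelated entry alone.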

Is your scenario one where there was first a 4KB page mapping that was then replaced by a 64KB mapping?  And when, relative to the demand access, was an sfence.vma performed?

Greg

Adnan Hamid

Mar 3, 2025, 1:35:13 PM
to RISC-V ISA Dev, Greg Favor, RISC-V ISA Dev, Adnan Hamid
Hi Greg,
The sfence.vma was performed prior to the jalr that triggered the speculative prefetch, and only on the target address of the jalr. There was no sfence.vma for the addresses in the following (unmapped) page.

From “Svnapot” Standard Extension for NAPOT Translation Contiguity, Version 1.0:
"It is the responsibility of the OS and/or hypervisor to configure the page tables in such a way that there are no inconsistencies between NAPOT PTEs and other NAPOT or non-NAPOT PTEs that overlap the same address range."

In this case, by pure bad (good?) luck, the PTE following the properly mapped PTE happened to decode with V=1, X=1 and N=1.
There was a definite inconsistency in the leaf page address between the two PTEs, thus violating the NAPOT requirement.

Does it follow that the RISCV ISA requires software to mark all unused PTEs in a page directory as NOT valid so that this case does not trigger?

-adnan

Greg Favor

Mar 3, 2025, 1:42:32 PM
to Adnan Hamid, RISC-V ISA Dev
On Mon, Mar 3, 2025 at 10:35 AM Adnan Hamid <adnan....@gmail.com> wrote:
> Hi Greg,
> The sfence.vma was performed prior to the jalr that triggered the speculative prefetch and only on the target address of the jalr, there was no sfence.vma for the addresses in the following (unmapped) page.
>
> From “Svnapot” Standard Extension for NAPOT Translation Contiguity, Version 1.0:
> "It is the responsibility of the OS and/or hypervisor to configure the page tables in such
> a way that there are no inconsistencies between NAPOT PTEs and other NAPOT or non-
> NAPOT PTEs that overlap the same address range."
>
> In this case by pure bad ( good ? ) luck the PTE following the properly mapped PTE happened to decode with V=1, X=1 and N=1.
> There was definite inconsistency in the leaf page address between the two PTE's, thus violating the NAPOT requirement.
>
> Does it follow that the RISCV ISA requires that software mark all unused PTE's in a page directory as NOT valid so this case does not trigger ?

An implementation is free to cache a 64KB mapping of an address based on reading just one of the 16 NAPOT PTEs within a group.  So if the 16 PTEs are inconsistent, one can end up caching, for each PTE, a translation that differs from the others.  The result can be some 4KB mappings, a 64KB mapping, and some invalid mappings, all cached in a TLB.  A rather extreme scenario, but it illustrates what can happen when the PTEs within a NAPOT group are inconsistent.
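
A small Python sketch can show why it matters which PTE of the group the walker happens to read. It assumes the Sv39 bit positions and the Svnapot rule that, for a 64 KiB NAPOT PTE, the low four PPN bits are replaced by the low four VPN bits during translation. The function name `translate_via` and all PTE values and addresses are hypothetical.

```python
PTE_V = 1 << 0
PTE_X = 1 << 3
PTE_N = 1 << 63
PPN_SHIFT = 10
PPN_MASK = (1 << 44) - 1

def translate_via(pte: int, va: int) -> int:
    """Physical address obtained if this leaf PTE is used to translate va."""
    ppn = (pte >> PPN_SHIFT) & PPN_MASK
    vpn = va >> 12
    if pte & PTE_N:
        # 64 KiB NAPOT: low 4 PPN bits come from the virtual page number.
        ppn = (ppn & ~0xF) | (vpn & 0xF)
    return (ppn << 12) | (va & 0xFFF)

va = 0x4000_0FFE                                             # made-up jump target
pte_4k = (0x80000 << PPN_SHIFT) | PTE_X | PTE_V              # intended 4 KiB mapping
pte_garbage = PTE_N | (0x12348 << PPN_SHIFT) | PTE_X | PTE_V # neighbor decoding as NAPOT

pa_intended = translate_via(pte_4k, va)
pa_from_garbage = translate_via(pte_garbage, va)
```

With these values the two PTEs yield different physical addresses for the same virtual address, so the outcome depends on which cached translation the implementation uses.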

Greg

Adnan Hamid

Mar 3, 2025, 1:56:55 PM
to RISC-V ISA Dev, Greg Favor, RISC-V ISA Dev, Adnan Hamid
Hi Greg,
Your conclusion above is clear, and the impetus for starting this thread in the first place.

The only way I see of avoiding an erroneous caching of an implicit read from a speculative prefetch is to have a RISCV ISA requirement that software MUST mark all unused PTEs in a page directory as NOT valid.

Is this a RISCV ISA requirement?
If not, how can the implementation avoid the bad instruction access fault for the original jalr?
-adnan



Guy Lemieux

Mar 3, 2025, 1:58:54 PM
to Adnan Hamid, RISC-V ISA Dev
On Mon, Mar 3, 2025 at 9:48 AM Adnan Hamid <adnan....@gmail.com> wrote:
>
> Here is another tape-out gating customer question that is in need of a definitive ruling:
>
> May a compliant RISCV implementation assume that software MUST mark unused PTEs in a page directory not valid ?
>
> A situation arose where the Breker RISCV test generator created a scenario where it jumped (`jalr`) to a virtual address that was the second to last byte of a 4K page. The bytes at that address decoded to a return instruction that would allow software to continue normal execution.

is the return instruction 2B or 4B long?

if it is 4B long, then a second page translation is required.

is the problem arising because the second translation is being launched
speculatively for a subsequent 4K page, but the instruction turns out
to be 2B so that speculative translation was never required? (and that
speculative translation happens to find an invalid 64K PTE, which I
would argue is invalid and should not be consulted)
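
The 2B versus 4B question is simple arithmetic, sketched here in Python. The helper name `crosses_page` and the example address are hypothetical; the only assumption is a 4 KiB base page.

```python
PAGE_SIZE = 4096

def crosses_page(addr: int, length: int) -> bool:
    """True if the bytes [addr, addr + length) span two 4 KiB pages."""
    first_page = addr // PAGE_SIZE
    last_page = (addr + length - 1) // PAGE_SIZE
    return first_page != last_page

target = 0x4000_0FFE   # second-to-last byte of a 4 KiB page (made up)
needs_second_translation_2b = crosses_page(target, 2)
needs_second_translation_4b = crosses_page(target, 4)
```

A 2-byte instruction at that offset ends on the last byte of the page, so only a 4-byte instruction would architecturally require the translation of the following page.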

> an aggressive code prefetch engine speculatively read the following PTE with garbage bytes which happened to decode to a 64K page

Just wanted to make sure I understand:
-- the following PTE is for a subsequent page, and it happens to be
64K which also overlaps with the initial PTE?
-- the 64K PTE is not initialized by the OS properly, so it contains
garbage info? I believe the OS *must* ensure these are at least marked
invalid (V=0), and that the behaviour of an implementation would be
undefined if V=1 but the rest of the entry was "garbage".

Guy

Guy Lemieux

Mar 3, 2025, 2:01:38 PM
to Adnan Hamid, RISC-V ISA Dev, Greg Favor
On Mon, Mar 3, 2025 at 10:56 AM Adnan Hamid <adnan....@gmail.com> wrote:
> The only way I see of avoiding an erroneous caching of an implicit read from a speculative prefetch is to have a RISCV ISA requirement that software MUST mark all unused PTEs in a page directory as NOT valid.

I'm confused ... why would an OS ever leave PTEs uninitialized (ie,
leave any unused PTEs possibly marked as VALID)? this would cause a
hardware page table walker to do bad things.

guy

Greg Favor

Mar 3, 2025, 2:03:01 PM
to Adnan Hamid, RISC-V ISA Dev
On Mon, Mar 3, 2025 at 10:56 AM Adnan Hamid <adnan....@gmail.com> wrote:
> Hi Greg,
> Your conclusion above is clear, and the impetus for starting this thread in the first place.
>
> The only way I see of avoiding an erroneous caching of an implicit read from a speculative prefetch is to have a RISCV ISA requirement that software MUST mark all unused PTEs in a page directory as NOT valid.
>
> Is this a RISCV ISA requirement ?

No.  Also note that even if there was a requirement, software can still not conform to that requirement and one is again left with the same possibilities.

In practice, if the spec required consistency, then it would probably also specify that the behavior is UNSPECIFIED if the requirement is not satisfied.

> If not, how can the implementation avoid the bad instruction access fault for the original jalr ?

Don't have inconsistent PTEs within a NAPOT group - which is a software matter.  Otherwise hardware would probably have to jump through a variety of hoops to ensure some specific chosen implementation behavior in the face of inconsistent PTEs.

Greg

Adnan Hamid

Mar 3, 2025, 2:05:32 PM
to RISC-V ISA Dev, Guy Lemieux, RISC-V ISA Dev, Adnan Hamid
inline

On Monday, March 3, 2025 at 10:58:54 AM UTC-8 Guy Lemieux wrote:
> On Mon, Mar 3, 2025 at 9:48 AM Adnan Hamid <adnan....@gmail.com> wrote:
> >
> > Here is another tape-out gating customer question that is in need of a definitive ruling:
> >
> > May a compliant RISCV implementation assume that software MUST mark unused PTEs in a page directory not valid ?
> >
> > A situation arose where the Breker RISCV test generator created a scenario where it jumped (`jalr`) to a virtual address that was the second to last byte of a 4K page. The bytes at that address decoded to a return instruction that would allow software to continue normal execution.
>
> is the return instruction 2B or 4B long?

2B long.

> if it is 4B long, then a second page translation is required.

Thus a second page translation is NOT required.

> is the problem arising because the second translation being launched
> speculatively for a subsequent 4K page, but the instruction turns out
> to be 2B so that speculative translation was never required?

Correct.

> (and that
> speculative translation happens to find an invalid 64K PTE, which I
> would argue is invalid and should not be consulted)

The 64K PTE was indeed invalid: it was left as random bytes and happened to have V=1.

> > an aggressive code prefetch engine speculatively read the following PTE with garbage bytes which happened to decode to a 64K page
>
> Just wanted to make sure I understand:
> -- the following PTE is for a subsequent page, and it happens to be
> 64K which also overlaps with the initial PTE?

Yes.

> -- the 64K PTE is not initialized by the OS properly, so it contains
> garbage info? I believe the OS *must* ensure these are at least marked
> invalid (V=0), and that the behaviour of an implementation would be
> undefined if V=1 but the rest of the entry was "garbage".

I agree, and am looking for definitive guidance from RVI that unused PTEs in a page directory must be marked invalid (V=0).

The only other interpretation I can think of is that speculative implicit memory reads may not have side effects such as exceptions (which is true) or TLB replacements (which is false).

> Guy

Ved Shanbhogue

Mar 3, 2025, 2:08:56 PM
to Adnan Hamid, RISC-V ISA Dev, Greg Favor
Adnan Hamid wrote:
>The only way I see of avoiding an erroneous caching of an implicit read
>from a speculative prefetch is to have a RISCV ISA requirement that
>software MUST mark all unused PTEs in a page directory as NOT valid.

If the V - valid - bit is set then for the ISA/hardware, it's a valid PTE.
Whether it's unused or was marked valid due to a software bug is not
something the hardware/ISA can infer.

For the case of Svnapot, if the NAPOT PTEs are not configured identically
then the behavior due to the inconsistency is similar to what one can
expect due to incorrect use of sfence.vma and the implementation may
unpredictably use any of the NAPOT PTEs in that range. The guidance
in the Svnapot chapter clearly lays out the software requirements.

regards
ved

Adnan Hamid

Mar 3, 2025, 2:09:15 PM
to RISC-V ISA Dev, Guy Lemieux, RISC-V ISA Dev, Greg Favor, Adnan Hamid
Right. Hence the proposed clarification that all unused PTEs must have V=0.

Adnan Hamid

Mar 3, 2025, 2:24:52 PM
to RISC-V ISA Dev, Greg Favor, RISC-V ISA Dev, Adnan Hamid
On Monday, March 3, 2025 at 11:03:01 AM UTC-8 Greg Favor wrote:
> On Mon, Mar 3, 2025 at 10:56 AM Adnan Hamid <adnan....@gmail.com> wrote:
> > Hi Greg,
> > Your conclusion above is clear, and the impetus for starting this thread in the first place.
> >
> > The only way I see of avoiding an erroneous caching of an implicit read from a speculative prefetch is to have a RISCV ISA requirement that software MUST mark all unused PTEs in a page directory as NOT valid.
> >
> > Is this a RISCV ISA requirement ?
>
> No.

No? I was all ready to put this thread to bed before you said No.

> Also note that even if there was a requirement, software can still not conform to that requirement and one is again left with the same possibilities.

Hmm? Software did not foresee the speculative fetch to the next page, which was NOT required because the ret instruction occupied only 2B at the end of the target page.

I am arguing that software is responsible for setting PTE.V=0 if the PTE is not mapped, otherwise behavior is UNSPECIFIED.

> In practice, if the spec required consistency, then it would probably also specify that the behavior is UNSPECIFIED if the requirement is not satisfied.
>
> > If not, how can the implementation avoid the bad instruction access fault for the original jalr ?
>
> Don't have inconsistent PTEs within a NAPOT group - which is a software matter.  Otherwise hardware would probably have to jump through a variety of hoops to ensure some specific chosen implementation behavior in the face of inconsistent PTEs.

The inconsistent PTEs within a NAPOT group occurred because random bytes were left in the unused PTE.
Repeating my proposal: to avoid this, software MUST guarantee PTE.V=0 if the PTE is not mapped/used, otherwise behavior is UNSPECIFIED.

This is not a problem; it just wasn't obvious, especially given the directive that speculative implicit reads may not cause exceptions, which leads one to imagine that speculative implicit reads do not have side effects.

> Greg

Dan Petrisko

Mar 3, 2025, 2:29:21 PM
to Adnan Hamid, RISC-V ISA Dev, Guy Lemieux, Greg Favor
Hi Adnan,

What is an "unused" PTE? There's no such definition in the spec. Can you clarify what you're referring to? From an ISA perspective there's no difference between a PTE with V=1 by accident or on purpose, so from my view any PTE with V=1 is "used".

From the spec, this is the line I think is most relevant:

"In a conventional TLB design, it is possible for multiple entries to match a single address if, for example, a page is upgraded to a superpage without first clearing the original non-leaf PTE’s valid bit and executing an SFENCE.VMA with rs1=x0. In this case, a similar remark applies: it is unpredictable whether the old non-leaf PTE or the new leaf PTE is used, but the behavior is otherwise well defined."

Rather than think about speculation, let's work backwards from the other direction. It is legal to have a 0-entry TLB in RISC-V. This means that hardware would do a page-table walk for every memory access. As per the spec quote above, the page-table walk would be allowed to return either mapping arbitrarily. So this software problem exists without any form of speculation.


Best,

-Dan




Ved Shanbhogue

Mar 3, 2025, 2:35:37 PM
to Adnan Hamid, RISC-V ISA Dev, Greg Favor
Adnan Hamid wrote:
> The inconsistent PTEs within a NAPOT group occurred because random bytes
> were left in the unused PTE.

It's not clear what is meant by "is not mapped/used" - A PTE is either
valid or is invalid. There is no concept of "not in use".

> Repeating my proposal, to avoid this software MUST guarantee PTE.V=0 if the
> PTE is not mapped/used, otherwise behavior is UNSPECIFIED.

There must not be invalid PTEs in a NAPOT group for it to be a
valid NAPOT group. And all PTEs in a NAPOT group must be
identical (barring the RSW bits; and the A/D bits if Svadu is
active).
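
The "identical barring RSW and A/D" rule could be checked by software along the lines of the Python sketch below. This is a hedged illustration, not a reference implementation: the function name `napot_group_consistent` and the `ignore_ad` flag are hypothetical, and the bit positions assume the Sv39 layout with the 64 KiB NAPOT encoding (low four PPN bits equal to 1000).

```python
PTE_V = 1 << 0
PTE_X = 1 << 3
PTE_A = 1 << 6
PTE_D = 1 << 7
PTE_RSW = 0b11 << 8      # bits 9:8, reserved for software
PTE_N = 1 << 63
PPN_SHIFT = 10

def napot_group_consistent(ptes: list[int], ignore_ad: bool = False) -> bool:
    """Check that 16 PTEs form one valid, identical 64 KiB NAPOT group."""
    if len(ptes) != 16:
        raise ValueError("a 64 KiB NAPOT group has exactly 16 PTEs")
    # Bits allowed to differ: RSW always, A/D only when the caller says so.
    ignore = PTE_RSW | ((PTE_A | PTE_D) if ignore_ad else 0)
    first = ptes[0] & ~ignore
    valid_napot = bool(first & PTE_V) and bool(first & PTE_N)
    encoding_ok = ((first >> PPN_SHIFT) & 0xF) == 0x8
    return valid_napot and encoding_ok and all((p & ~ignore) == first for p in ptes)

base = PTE_N | (0x12348 << PPN_SHIFT) | PTE_X | PTE_V   # made-up NAPOT leaf
good_group = [base] * 16
mixed_group = [base] * 15 + [0]                          # one entry left with V=0
```

Under this check, `good_group` passes, while `mixed_group` fails because one entry is invalid.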

regards
ved


BGB

Mar 3, 2025, 2:37:28 PM
to isa...@groups.riscv.org
Yeah, hardware misbehaving in the case of questionable/broken behavior
by software can probably be left as a software issue in this case. It is
more the responsibility of software to fix.

So, the role of the hardware is mostly to behave as described within
some range of reasonable scenarios (though, with extra care mainly in
cases that could potentially result in privilege escalation). Behavior
in unreasonable scenarios (that do not result in potential for privilege
escalation or similar) can be left as implementation defined.


However, at the same time, it isn't really the role of the hardware
description to dictate or prescribe behavior on the software side of
things. So, technically, software is free to shoot itself in the foot
all it wants.

And, if the OS crashes due to bad entries in the page table, this is the
OS's fault.


Though, realistically, it would be made as a working assumption that any
memory access (instruction or data) may potentially access both the
current and following cache line; so any access to the last cache line
of a page may also potentially access the first cache line of the next
page regardless of the actual access size (and whether or not the memory
access itself straddles a cache line boundary).


Cache line size depends on implementation:
In my case, I am using 16 bytes (in the L1 caches);
The L2 cache using 64 byte cache lines.
But, 32 and 64 byte cache lines are also popular.

Decided not to go into specifics of tradeoffs for cache line size and
why one might consider a bigger or smaller line size. Ideally, software
shouldn't need to care though.


Does likely mean though that for memory mappings that are too close to a
power of 2 size, it may make sense to pad them with an extra "dummy
page" that is likely read-only as zeroes, and 0 or more following pages
that are no access (though, this may be asking more on a 32b machine
than a 64b machine).

But, still not the role of HW to prescribe what SW does...



Allen Baum

Mar 3, 2025, 2:49:48 PM
to BGB, isa...@groups.riscv.org
So I think the scenario is that the processor implements NAPOT, but doesn't use it for a bunch of 4K PTE entries 
until a speculative fetch encounters one that does, and that causes it to replace the 4K PTE.
The SW fix is that if the NAPOT extension is implemented, then all PTE entries in a NAPOT group must be properly configured,
regardless of whether those pages are actually used by SW or not. 
It gets more interesting if this had happened at the end of a 64K page, in which case it might have still filled into the TLB, 
but not have overwritten an existing valid TLB entry.



Guy Lemieux

Mar 3, 2025, 3:31:32 PM
to Adnan Hamid, RISC-V ISA Dev, Greg Favor
On Mon, Mar 3, 2025 at 11:24 AM Adnan Hamid <adnan....@gmail.com> wrote:
>
>
>
> On Monday, March 3, 2025 at 11:03:01 AM UTC-8 Greg Favor wrote:
>
> On Mon, Mar 3, 2025 at 10:56 AM Adnan Hamid <adnan....@gmail.com> wrote:
>
> Hi Greg,
> Your conclusion above is clear, and the impetus for starting this thread in the first place.
>
> The only way I see of avoiding an erroneous caching of an implicit read from a speculative prefetch is to have a RISCV ISA requirement that software MUST mark all unused PTEs in a page directory as NOT valid.
>
> Is this a RISCV ISA requirement ?
>
> No.
>
> No ? I was all ready to put this thread to bed before you said No.

Greg is correct -- this is not an ISA requirement.

For example, the OS can mark unused PTEs as valid. In this case, the
PTE may be valid, but its properties may be set in a way that still
causes an exception upon access (eg, by setting RWX=000, or by setting
A=0 or D=0 and expecting that the hardware PTW cannot change those
bits so an exception is required).

> Also note that even if there was a requirement, software can still not conform to that requirement and one is again left with the same possibilities.

Software that does not conform to ISA requirements has undefined behaviour.

> Hmm? Software did not foresee the speculative fetch to the next page which was NOT required because the ret instruction occupied only 2B at the end of the target page.
>
> I am arguing that software is responsible for setting PTE.V=0 if the PTE is not mapped, otherwise behavior is UNSPECIFIED.

Correct.


> In practice, if the spec required consistency, then it would probably also specify that the behavior is UNSPECIFIED if the requirement is not satisfied.

I believe you quoted the NAPOT spec as saying this already.

> If not, how can the implementation avoid the bad instruction access fault for the original jalr ?

speculative accesses cannot fault.

if the jalr has a valid 4K PTE and an invalid 64K PTE is read
speculatively, the 64K PTE (which is invalid) must not take priority
over the 4K PTE (which is valid).

if this was my implementation, if a hardware PTW speculatively
reads a PTE that is marked invalid, I would not store this in the TLB
(so there is no chance the 4K PTE gets bumped).

I also noticed a related statement in the priv spec:

"Speculative executions of the address-translation algorithm behave as
non-speculative executions of the algorithm do, except that they must
not set the dirty bit for a PTE, they must not trigger an exception,
and they must not create address-translation cache entries if those
entries would have been invalidated by any SFENCE.VMA instruction
executed by the hart since the speculative execution of the algorithm
began."
https://github.com/riscv/riscv-isa-manual/blob/main/src/supervisor.adoc?plain=1

In other words, if the 64K entry really is invalid (and would still be
invalid after SFENCE.VMA), then it is a HW IMPLEMENTATION ERROR
according to this spec for hardware to create a 64K entry in the TLB.

> Don't have inconsistent PTEs within a NAPOT group - which is a software matter. Otherwise hardware would probably have to jump through a variety of hoops to ensure some specific chosen implementation behavior in the face of inconsistent PTEs.

yes, this is a software matter. these types of bugs are very serious,
and can be difficult to track down, but they are not hardware bugs.

> The inconsistent PTEs within a NAPOT group occurred because random bytes were left in the unused PTE.
> Repeating my proposal, to avoid this software MUST guarantee PTE.V=0 if the PTE is not mapped/used, otherwise behavior is UNSPECIFIED.


Yes, V=0 generally means it is not mapped or used. However, as I wrote
above, it is possible to have a valid mapping that is still in some
"trap if you use me" state. You may call such an entry an "unused
PTE", but there is no standard usage for this term.

> This is not a problem, it just wasn't obvious, specially given the directive that speculative implicit reads may not cause exceptions, which leads one to imagine that speculative implicit reads do not have side effects.

As noted above, the speculative implicit read above MAY insert the
entry into the TLB *only* if it is marked Valid *and* there is no pending
store to mark it Invalid, but it is not required to do so. However, if
the entry is marked Invalid, then it *MUST NOT* store the entry in the
TLB.

Guy

Adnan Hamid

Mar 3, 2025, 3:33:34 PM
to RISC-V ISA Dev, Ved Shanbhogue, RISC-V ISA Dev, Greg Favor, Adnan Hamid
On Monday, March 3, 2025 at 11:35:37 AM UTC-8 Ved Shanbhogue wrote:
> Adnan Hamid wrote:
> > The inconsistent PTEs within a NAPOT group occurred because random bytes
> > were left in the unused PTE.
>
> It's not clear what is meant by "is not mapped/used" - A PTE is either
> valid or is invalid. There is no concept of "not in use".

This is the crux of the discussion.

A "not mapped/used" PTE means that software did not put any code or data at an address whose translation would require this PTE to be read.
As a result, it did not initialize that PTE. The old data in that PTE happened to have PTE.V=1 and PTE.N=1. There was no intention of creating a NAPOT group.

A speculative prefetch, reading code bytes that were not needed, read the "not mapped/used" PTE and caused a good TLB entry to be overwritten as a side effect.

Adnan Hamid

Mar 3, 2025, 3:57:12 PM
to RISC-V ISA Dev, Guy Lemieux, RISC-V ISA Dev, Greg Favor, Adnan Hamid
On Monday, March 3, 2025 at 12:31:32 PM UTC-8 Guy Lemieux wrote:
> On Mon, Mar 3, 2025 at 11:24 AM Adnan Hamid <adnan....@gmail.com> wrote:
> >
> > On Monday, March 3, 2025 at 11:03:01 AM UTC-8 Greg Favor wrote:
> >
> > On Mon, Mar 3, 2025 at 10:56 AM Adnan Hamid <adnan....@gmail.com> wrote:
> >
> > Hi Greg,
> > Your conclusion above is clear, and the impetus for starting this thread in the first place.
> >
> > The only way I see of avoiding an erroneous caching of an implicit read from a speculative prefetch is to have a RISCV ISA requirement that software MUST mark all unused PTEs in a page directory as NOT valid.
> >
> > Is this a RISCV ISA requirement ?
> >
> > No.
> >
> > No ? I was all ready to put this thread to bed before you said No.
>
> Greg is correct -- this is not an ISA requirement.
>
> For example, the OS can mark unused PTEs as valid. In this case, the
> PTE may be valid, but its properties may be set in a way that still
> causes an exception upon access (eg, by setting RWX=000, or by setting
> A=0 or D=0 and expecting that the hardware PTW cannot change those
> bits so an exception is required).
 
Shucks, I was so close to closing this thread until you brought up the PTE.A=0 case.

It turns out in the failing case PTE.A=0 WAS true, and an exception would have been required to read the NAPOT page, which of course is not allowed by a speculative fetch.

I don't think this changes any of the reasoning about it being okay for the speculative implicit read to overwrite the valid TLB entry.

My conclusion is that the phrase "An implicit read of the memory-management data structures may return any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address" implies that software MUST NOT leave around PTEs in a page directory that are not initialized.

This implication was not obvious.
-adnan

Greg Favor

Mar 3, 2025, 3:57:38 PM
to Adnan Hamid, RISC-V ISA Dev, Greg Favor
On Mon, Mar 3, 2025 at 11:24 AM Adnan Hamid <adnan....@gmail.com> wrote:
> > > The only way I see of avoiding an erroneous caching of an implicit read from a speculative prefetch is to have a RISCV ISA requirement that software MUST mark all unused PTEs in a page directory as NOT valid.
> > >
> > > Is this a RISCV ISA requirement ?
> >
> > No.
>
> No ? I was all ready to put this thread to bed before you said No.

If you view the admonition in the spec as a "requirement", then yes this is a requirement.  But the spec really is just saying what software SHOULD do if it wants the desired behavior that only a 64 KB mapping gets cached.

> > Also note that even if there was a requirement, software can still not conform to that requirement and one is again left with the same possibilities.
>
> Hmm? Software did not foresee the speculative fetch to the next page which was NOT required because the ret instruction occupied only 2B at the end of the target page.
>
> I am arguing that software is responsible for setting PTE.V=0 if the PTE is not mapped, otherwise behavior is UNSPECIFIED.

All 16 PTEs of a NAPOT group need to be consistent with each other.  Some being valid and some being invalid is not consistent.

> The inconsistent PTEs within a NAPOT group occurred because random bytes were left in the unused PTE.
> Repeating my proposal, to avoid this software MUST guarantee PTE.V=0 if the PTE is not mapped/used, otherwise behavior is UNSPECIFIED.

If some PTEs within a 16 PTE NAPOT group are invalid, then one doesn't reliably get a 64KB translation.  Some addresses within the 64KB "page" may be successfully translated and some may result in a page fault.

Now if software is guaranteed to never access the "invalid" 4 KB portions of the 64KB translation, then things should be fine.  Any invalid translations that are seen and/or cached by prefetching hardware (or as the result of speculative load/store execution) may be speculatively hit upon and speculatively cause arch exceptions, but that should all be tossed aside when the hardware misspeculation is resolved.

Greg

BGB

Mar 3, 2025, 4:14:12 PM
to Allen Baum, isa...@groups.riscv.org
On 3/3/2025 1:49 PM, Allen Baum wrote:
> So I think the scenario is that the processor implements NAPOT, but
> doesn't use it for a bunch of 4K PTE entries
> until a speculative fetch encounters one that does, and that causes it
> to replace the 4K PTE.
> The SW fix is that if the NAPOT extension is implemented, then all PTE
> entries in a NAPOT group must be properly configured,
> regardless of whether those pages are actually used by SW or not.
> It gets more interesting if this had happened at the end of a 64K page,
> in which case it might have still filled into the TLB,
> but not have overwritten an existing valid TLB entry.
>

OK, I missed that part, but yeah, this makes sense...

So, yeah, probably could make sense to require initializing all entries
in a group in this case, since potentially an access to any of them
could be understood as representing the full page.


Though... Admittedly in my case, I have a different MMU design and had
mostly just ended up using 16K as the default page size. In my testing,
16K came out as the most optimal.

Page size would be up to the OS though (one of 4K/16K/64K); with the MMU
configured relative to the smallest allowed page size (bigger pages
could be allowed, smaller ones not allowed; so configuring the MMU for
16K pages disallows 4K but still allows 64K; but setting the minimum
page size larger reduces the number of TLB misses, ...).

Decided to leave out going too much into my MMU design here.

But, will note, I don't currently have any direct equivalent of NAPOT.

Allen Baum

Mar 3, 2025, 4:29:44 PM
to BGB, isa...@groups.riscv.org
In this specific case, a NAPOT block of entries (16 of them) must all be configured regardless of whether they define a NAPOT region or not.
IF they define a NAPOT region, then they must be identical; if they don't define one, then  NONE of them can be configured as a NAPOT region.

The salient point here is, if NAPOT extension is implemented, PTE entries must always be configured in groups of 16, 
regardless of whether NAPOT is ever used. No entry in a block of 16 can remain uninitialized if any are initialized.

And, thanks/congratulations to Adnan for finding such a gnarly corner case !

Adnan Hamid

Apr 3, 2025, 4:55:13 PM
to RISC-V ISA Dev, Allen Baum, isa...@groups.riscv.org, BGB
Sorry to resurrect a dead horse. I realize now that I myself did not understand the question I was trying to ask.

From Volume II: RISC-V Privileged Architectures V20211203, Chapter 6 “Svnapot” Standard Extension for NAPOT Translation Contiguity:

    In Sv39, Sv48, and Sv57, when a PTE has N=1, the PTE represents a translation that is part of a range of contiguous virtual-to-physical translations with the same values for PTE bits 5–0. Such ranges must be of a naturally aligned power-of-2 (NAPOT) granularity larger than the base page size.

(1) The phrase "same values for PTE bits 5–0" indicates that the N bit trumps the V bit.
16 PTEs need to be configured with N=1, and all 16 must have either PTE.V=0 or PTE.V=1.
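
For reference, a minimal sketch of how a 64 KiB NAPOT leaf is applied to a virtual address, assuming the Svnapot encoding in which `ppn[3:0] = 0b1000` marks a 64 KiB region and the low four PPN bits are taken from the VPN. The function name `napot64k_pa` is hypothetical, and the sketch only handles a PTE with V=1, so it does not settle the V-versus-N question raised here.

```python
PAGE_SHIFT = 12
PTE_V = 1 << 0
PTE_N = 1 << 63
PPN_SHIFT = 10
PPN_MASK = (1 << 44) - 1   # PPN occupies PTE bits 53:10

def napot64k_pa(pte, va):
    """Physical address for va through a valid 64 KiB NAPOT leaf PTE."""
    if not (pte & PTE_V) or not (pte & PTE_N):
        raise ValueError("not a valid NAPOT PTE")
    ppn = (pte >> PPN_SHIFT) & PPN_MASK
    if ppn & 0xF != 0b1000:
        raise ValueError("reserved NAPOT encoding")
    vpn0 = (va >> PAGE_SHIFT) & 0x1FF
    # The low 4 PPN bits come from the VPN, so one PTE value serves
    # all 16 pages of the 64 KiB region.
    ppn = (ppn & ~0xF) | (vpn0 & 0xF)
    return (ppn << PAGE_SHIFT) | (va & 0xFFF)

# Illustrative values only: region at physical 0x80000000, fourth page in it.
pte = PTE_N | ((0x80000 | 0b1000) << PPN_SHIFT) | 0xCF
pa = napot64k_pa(pte, 0x40003ABC)
print(hex(pa))
```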

From Volume II: RISC-V Privileged Architectures V20211203, 5.3.1 Addressing and Memory Protection

The PTE format for Sv32 is shown in Figures 5.18. The V bit indicates whether the PTE is valid; if
it is 0, all other bits in the PTE are don’t-cares and may be used freely by software.

And to be pedantic, from Volume II: RISC-V Privileged Architectures V20211203,
5.4.1 Addressing and Memory Protection

    The PTE format for Sv39 is shown in Figure 5.21. Bits 9–0 have the same meaning as for Sv32.

(2) This says if V=0, a hart should not look at any other bits.

It seems to me that conclusions (1) and (2) conflict.
Is there any other way to read this?

The situation comes up where there are two adjacent L0 PTEs:

| PTE | Flags        | Comment                                            |
|-----|--------------|----------------------------------------------------|
| n   | N = 0, V = 1 | normal 4K PTE                                      |
| n+1 | N = 1, V = 0 | software is not NAPOT aware, and used N bit freely |


A hart could read PTE n+1, see the N bit when V == 0, and pollute its TLB with invalid information.

-adnan

Ved Shanbhogue

Apr 3, 2025, 5:06:13 PM
to Adnan Hamid, RISC-V ISA Dev, Allen Baum, BGB
Adnan Hamid wrote:
>(1) The phrase "same values for PTE bits 5–0" indicates that the N bit
>trumps the V bit.
>16 PTE's need to be configured with N=1, and all 16 must have either
>PTE.V=0 or PTE.V=1.

No, the N bit exists in a PTE only when V is 1. When V is 0, there is
no N bit in the PTE; it's an invalid PTE.


>(2) This says if V=0, a hart should not look at any other bits.
>
>It seems to me that conclusions (1) and (2) conflicting.
>Is there any other way to read this ?
>
>The situation comes up where there are two adjacent L0 PTEs:
>
>| PTE | Flags        | Comment                                            |
>|-----|--------------|----------------------------------------------------|
>| n   | N = 0, V = 1 | normal 4K PTE                                      |
>| n+1 | N = 1, V = 0 | software is not NAPOT aware, and used N bit freely |
>
>A hart could read PTE n+1, see the N bit when V == 0, and pollute its TLB
>with invalid information.

In this example, I assume (n)/16 == (n+1)/16. If so then this is a poorly
configured page table.

If the hart reads PTE n+1, then it will lead to a page fault if there is a
memory access to the corresponding VA. If Svvptc is not implemented, then the
hart may cache the invalid PTE, but it is not allowed to interpret any bits
- including the N bit - of such invalid PTEs.
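
A minimal sketch of that ordering, assuming V in bit 0 and N in bit 63: the decoder looks at V first and never inspects N (or anything else) when V is 0. The function name `classify_pte` and its return strings are hypothetical.

```python
PTE_V = 1 << 0    # valid bit
PTE_N = 1 << 63   # Svnapot N bit

def classify_pte(pte):
    """Classify a raw 64-bit PTE, checking V before any other bit."""
    if not (pte & PTE_V):
        return "invalid"   # all other bits are software's to use
    if pte & PTE_N:
        return "napot"
    return "normal"

pte_n = 0xCF               # N = 0, V = 1: normal 4K PTE
pte_n_plus_1 = PTE_N       # N = 1, V = 0: software used bit 63 freely
print(classify_pte(pte_n), classify_pte(pte_n_plus_1))
```

With this ordering, PTE n+1 from the table above is simply invalid and contributes no NAPOT information.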

regards
ved

Adnan Hamid

Apr 3, 2025, 5:33:04 PM
to RISC-V ISA Dev, Ved Shanbhogue, RISC-V ISA Dev, Allen Baum, BGB, Adnan Hamid
> In this example, I assume (n)/16 == (n+1)/16.
Yes, I should have thought to make that point.

> If so then this is a poorly configured page table.
I'm trying to interpret this statement.
Are we saying this is an illegal configuration for software to create? Since V==0, software "freely used the other bits".
Software is not doing any explicit memory access to the VA corresponding to pte n+1.

Ved Shanbhogue

Apr 3, 2025, 8:14:36 PM
to Adnan Hamid, RISC-V ISA Dev, Allen Baum, BGB
Adnan Hamid wrote:
>> If so then this is a poorly configured page table.
>I'm trying to interpret this statement.
>Are we saying this is an illegal configuration for software to create ?
>Since V==0, software "freely used the other bits" .
>Software is not doing any explicit memory access to the VA corresponding to
>pte n+1.

I see. If that is the intent then there is no concern. All bits may be used
by software freely when V is 0.

regards
ved

Allen Baum

Apr 5, 2025, 1:40:56 AM
to Ved Shanbhogue, Adnan Hamid, RISC-V ISA Dev, BGB
To recap: my understanding of this case is that NAPOT is implemented and a PTE lookup in a 64K block found
 - a valid 4K entry with PTE.V=1, PTE.N=0,
 - and a (following) entry in the same (aligned) block of 16 entries that had PTE.V=1 and PTE.N=1 and some illegal address or privilege.
When the following entry is prefetched, it effectively replaces the 4K entry with the 64K entry.
What happens next is undefined behavior: it is a SW bug because the 16 entries are not identical, and for defined behavior, they must be.

Note that if PTE.V=0 (in which case PTE.N is a don't-care), you can get the same undefined behavior.
So, you can't leave garbage in any of the 16 aligned 4K PTEs (where garbage means it is different from all other entries).
If NAPOT were not implemented, then there would be no overwrite, and V=0 wouldn't be a problem.
It hurts when you do that; don't do that.
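
The overlap at the heart of this recap can be sketched as follows, assuming 4 KiB base pages and 64 KiB NAPOT regions (16 aligned VPNs). The helper names `napot64k_vpn_range` and `shadows` are hypothetical, and the VPN values are illustrative.

```python
def napot64k_vpn_range(vpn):
    """First and last VPN of the aligned 64 KiB region containing vpn."""
    base = vpn & ~0xF
    return base, base + 15

def shadows(cached_4k_vpn, prefetched_napot_vpn):
    """True if a NAPOT entry read for one VPN also covers the cached 4K VPN."""
    lo, hi = napot64k_vpn_range(prefetched_napot_vpn)
    return lo <= cached_4k_vpn <= hi

# Page n holds the jump target; page n+1 is the prefetched neighbour.
same_block = shadows(0x1234, 0x1235)
next_block = shadows(0x123F, 0x1240)
print(same_block, next_block)
```

Only when both pages fall in the same aligned block of 16 can the neighbour's entry stand in for the 4K page; at a block boundary the two translations do not overlap.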

Earl Killian pointed out (to me) that there are 32 x 16 PTE entries (512 in total) pointed to by the next higher page table level,
and you can't leave any of the other 496 entries uninitialized either.

Greg Favor

Apr 5, 2025, 1:52:44 AM
to Allen Baum, Ved Shanbhogue, Adnan Hamid, RISC-V ISA Dev, BGB
On Fri, Apr 4, 2025 at 10:40 PM 'Allen Baum' via RISC-V ISA Dev <isa...@groups.riscv.org> wrote:
> Earl Killian pointed out (to me) that there are 32 x 16 PTE entries (512 in total) pointed to by the next higher page table level,
> and you can't leave any of the other 496 entries uninitialized either.

In general, one can't leave any path down the tree of page tables uninitialized, since the architecture allows for arbitrary prefetching and caching of translations. Hence any seemingly valid translations and any seemingly invalid translations can be cached and later used if hit upon (by speculative as well as non-speculative execution of loads, stores, and instruction fetches). And yes, any purely speculative use of "uninitialized translations" should not be a problem until they are initialized, at which point sfence operations will need to be performed to invalidate any/all old "uninitialized translations" (and partial translations) that may have been cached.
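
A toy model of that last point, not a description of any real implementation: the class `TranslationCache` and its methods are hypothetical, the page table is a plain dictionary standing in for memory, and the PTE values are illustrative. It shows a cached "uninitialized translation" surviving a later page-table write until a fence drops it.

```python
class TranslationCache:
    """Toy model: caches whatever PTE value it reads, valid or not."""

    def __init__(self, page_table):
        self.page_table = page_table   # vpn -> raw PTE, stands in for memory
        self.cached = {}

    def walk(self, vpn):
        # Demand access or prefetch: reuse a cached value if one exists.
        if vpn not in self.cached:
            self.cached[vpn] = self.page_table.get(vpn, 0)
        return self.cached[vpn]

    def sfence_vma(self, vpn=None):
        # Drop one cached translation, or all of them when vpn is None.
        if vpn is None:
            self.cached.clear()
        else:
            self.cached.pop(vpn, None)

page_table = {0x10: 0xDEADBEE0}            # garbage in an unused slot (V=0)
tlb = TranslationCache(page_table)
tlb.walk(0x10)                             # a prefetch caches the garbage
new_pte = (0x80010 << 10) | 0xCF
page_table[0x10] = new_pte                 # software now installs a real leaf
before_fence = tlb.walk(0x10)              # the old value may still be used
tlb.sfence_vma(0x10)
after_fence = tlb.walk(0x10)               # re-read from the page table
print(hex(before_fence), hex(after_fence))
```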

Greg

MitchAlsup

Apr 10, 2025, 10:31:28 PM
to RISC-V ISA Dev, Adnan Hamid
On Monday, March 3, 2025 at 11:48:14 AM UTC-6 Adnan Hamid wrote:
> Here is another tape-out gating customer question that is in need of a definitive ruling:
>
> May a compliant RISC-V implementation assume that software MUST mark unused PTEs in a page directory not valid?
>
> A situation arose where the Breker RISC-V test generator created a scenario where it jumped (`jalr`) to a virtual address that was the second to last byte of a 4K page. The bytes at that address decoded to a return instruction that would allow software to continue normal execution.
>
> Meanwhile an aggressive code prefetch engine speculatively read the following PTE with garbage bytes which happened to decode to a 64K page that overlapped with the 4K page being accessed. The 64K page overwrote the TLB mapping, causing the side effect of having the `jalr` take an instruction access fault because the original 4K mapping was overwritten.

I would note that if the TLB is not updated until the causing instruction retires, the ambiguity vanishes.

Thus, since the instruction after the RET is never performed (all the way to retire), its side effects of
cache and TLB pollution are not observed either.
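
A toy sketch of that idea, under the assumption that each in-flight instruction has an identifier and that fills are parked in a side buffer until retirement. The class `DeferredFillTLB` and its methods are hypothetical, and a real design would also have to let an instruction use its own pending translation before it retires.

```python
class DeferredFillTLB:
    """Toy TLB whose entries only become visible when the filler retires."""

    def __init__(self):
        self.entries = {}   # committed translations: vpn -> ppn
        self.pending = {}   # instr_id -> list of (vpn, ppn) not yet committed

    def speculative_fill(self, instr_id, vpn, ppn):
        self.pending.setdefault(instr_id, []).append((vpn, ppn))

    def retire(self, instr_id):
        # The instruction really executed: commit its fills.
        for vpn, ppn in self.pending.pop(instr_id, []):
            self.entries[vpn] = ppn

    def squash(self, instr_id):
        # Wrong-path or prefetch-only work: discard its fills.
        self.pending.pop(instr_id, None)

    def lookup(self, vpn):
        return self.entries.get(vpn)

tlb = DeferredFillTLB()
tlb.speculative_fill("ret", 0x1234, 0x80001)
tlb.retire("ret")                                # good 4K mapping committed
tlb.speculative_fill("after-ret", 0x1234, 0xBAD)
tlb.squash("after-ret")                          # never retires
result = tlb.lookup(0x1234)
print(hex(result))
```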
 

> -adnan

Allen Baum

Apr 11, 2025, 2:31:17 AM
to MitchAlsup, RISC-V ISA Dev, Adnan Hamid
That is a microarchitectural fix for this issue, but it is implicitly allowed by the architecture, as Greg points out.
