Effect of HFENCE.GVMA on cached VS-stage TLB entries

ken...@imperas.com

Sep 21, 2021, 2:46:05 PM
to RISC-V ISA Dev
I'm trying to understand the precise requirements for invalidation of VS-stage TLB entries when an HFENCE.GVMA instruction is executed, assuming that VS-stage and G-stage TLB entries are not merged by the hardware. Section 5.5.3 of the Privileged Architecture Specification (Memory-Management Fences) concludes by saying:

When PMP settings are modified in a manner that affects either the physical memory that holds guest-physical page tables or the physical memory to which guest-physical page tables point, an HFENCE.GVMA instruction with rs1=x0 and rs2=x0 must be executed in M-mode after the PMP CSRs are written. An HFENCE.VVMA instruction is not required. 

Does this mean that HFENCE.GVMA x0, x0 therefore also invalidates all VS-stage TLB entries, or is cached VS-stage translation data undisturbed?

Assuming that HFENCE.GVMA does cause invalidation of VS-stage cached entries, what is the required behavior when an HFENCE.GVMA is executed that invalidates only a single guest physical entry? Which VS-stage cached entries, if any, should be invalidated? The page table walk to create a VS-stage TLB entry could require access to 2, 3, 4 or even 5 different guest physical pages (with Sv57). Is invalidation of any one of those pages with HFENCE.GVMA required to invalidate the VS-stage entry?

Thanks.

Anup Patel

Sep 22, 2021, 12:06:02 AM
to ken...@imperas.com, RISC-V ISA Dev
Hi,

On Wed, Sep 22, 2021 at 12:16 AM ken...@imperas.com <ken...@imperas.com> wrote:
>
> I'm trying to understand the precise requirements for invalidation of VS-stage TLB entries when an HFENCE.GVMA instruction is executed, assuming that VS-stage and G-stage TLB entries are not merged by the hardware. Section 5.5.3 of the Privileged Architecture Specification (Memory-Management Fences) concludes by saying:
>
> When PMP settings are modified in a manner that affects either the physical memory that holds guest-physical page tables or the physical memory to which guest-physical page tables point, an HFENCE.GVMA instruction with rs1=x0 and rs2=x0 must be executed in M-mode after the PMP CSRs are written. An HFENCE.VVMA instruction is not required.

This statement also covers systems with separate VS-stage and G-stage
TLB entries.

>
> Does this mean that HFENCE.GVMA x0, x0 therefore also invalidates all VS-stage TLB entries, or is cached VS-stage translation data undisturbed?

For systems with separate VS-stage and G-stage TLB entries,
HFENCE.GVMA x0, x0 will invalidate only the G-stage TLB entries (all
of them).

Executing HFENCE.GVMA x0, x0 after PMP changes implies the following
for VS-stage TLB lookups immediately after the HFENCE.GVMA:

1) A VS-stage TLB hit will cause a G-stage page table walk for the
cached GVA-to-GPA translation in the VS-stage TLB entry, because all
G-stage TLB entries have been invalidated. This also results in the
updated PMP settings being checked on the result of the G-stage page
table walk. If a PMP check fails during the G-stage page table walk,
then M-mode firmware will typically redirect the access fault to the
HS-mode Hypervisor.

2) A VS-stage TLB miss will most likely cause G-stage page table
walks for all VS-stage levels and for the final GPA. This also
results in the updated PMP settings being checked on all of those
G-stage page table walks (same as above). Again, if any PMP check
fails during a G-stage page table walk, then M-mode firmware will
redirect the access-fault trap to the HS-mode Hypervisor.

>
> Assuming that HFENCE.GVMA does cause invalidation of VS-stage cached entries, what is the required behavior when an HFENCE.GVMA is executed that invalidates only a single guest physical entry? Which VS-stage cached entries, if any, should be invalidated? The page table walk to create a VS-stage TLB entry could require access to 2, 3, 4 or even 5 different guest physical pages (with Sv57). Is invalidation of any one of those pages with HFENCE.GVMA required to invalidate the VS-stage entry?

Regards,
Anup

>
> Thanks.
>

Kelvin Goveas

Nov 10, 2021, 11:56:30 AM
to RISC-V ISA Dev, Anup Patel, RISC-V ISA Dev, ken...@imperas.com
Hi Anup,

An implementation may also use translation path caches (TPCs) to accelerate table walks, with separate TPCs for the VS-stage and the G-stage. A VS-stage TPC could cache up to 3 levels of the VS-stage table walk (L3, L2, L1). In the VS-stage TLB hit scenario you mentioned, a table walk engine could skip one or more VS-stage walks. Each of these skipped walks would have one or more G-stage mini walks associated with it, whose translations are indirectly cached as part of the VS-stage TPC entry. When the PMP settings are changed, a VS-stage TPC hit would not cause a re-walk of the levels that were skipped. If any of the skipped G-stage mini walks used PAs that now cause a fault with the new PMP settings, such an implementation would not detect this, even though the G-stage TLBs/TPCs were invalidated. If software also executed an hfence.vvma x0,x0 when the PMP settings were changed, that would invalidate the entries in the VS-stage TPCs as well.
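To make this concrete, here is a rough sketch of what one VS-stage TPC entry might conceptually hold (field names and widths are illustrative assumptions only, not from the spec or any particular design):

    #include <stdbool.h>
    #include <stdint.h>

    /* Sketch of one VS-stage translation path cache (TPC) entry.  The
     * key point is next_table_pa: it is a machine-physical pointer to
     * the next VS-stage table, i.e. the baked-in result of earlier
     * G-stage "mini walks" and PMP checks.  A later hit reuses it
     * without re-walking the G-stage or re-checking PMP, so
     * invalidating only G-stage TLB/TPC entries does not catch a PMP
     * change on this path. */
    struct vs_tpc_entry {
        bool     valid;
        uint16_t vmid;           /* guest (hgatp.VMID) tag            */
        uint16_t asid;           /* guest address-space (vsatp.ASID)  */
        uint64_t gva_tag;        /* GVA bits covered by skipped levels */
        uint8_t  levels_skipped; /* VS-stage levels a hit lets us skip */
        uint64_t next_table_pa;  /* machine-physical address of the
                                    next VS-stage page table to read  */
    };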

Kelvin

John Hauser

Nov 12, 2021, 11:02:57 PM
to RISC-V ISA Dev
Kelvin Goveas wrote:
> An implementation may also use translation path caches (TPCs) to
> accelerate table walks with separate TPCs for VS-stage and G-stage.
> [...]

When a machine has caches like this, every modification to the G-stage
page tables requires not only an HFENCE.GVMA but also a generic
HFENCE.VVMA.  Rather than make software execute the HFENCE.VVMA, which
may not be ideal for every machine, we have chosen instead to require
that machines of this type infer automatically a generic HFENCE.VVMA
(hfence.vvma x0,x0) on every HFENCE.GVMA.

    - John Hauser

James Kenney

Nov 13, 2021, 10:56:58 AM
to John Hauser, RISC-V ISA Dev
Hello John,

You say that:

Rather than make software execute the HFENCE.VVMA, which may not be ideal for every machine, we have chosen instead to require that machines of this type infer automatically a generic HFENCE.VVMA (hfence.vvma x0,x0) on every HFENCE.GVMA.

Can you point me at the part of the specification that says this? I was unable to find anything - perhaps there is some clarification in the works that I haven't seen, or I am otherwise missing or misreading something. This also directly contradicts Anup's response to my original question, I think.

Perhaps it would help to add a section that explicitly states under what architectural circumstances it is required that executing an HFENCE.GVMA also implies a generic HFENCE.VVMA. What types of implemented translation cache require this behavior?

Thanks,

James.





John Hauser

Nov 13, 2021, 9:57:17 PM
to RISC-V ISA Dev
I wrote:
> Rather than make software execute the HFENCE.VVMA, which
> may not be ideal for every machine, we have chosen instead to require
> that machines of this type infer automatically a generic HFENCE.VVMA
> (hfence.vvma x0,x0) on every HFENCE.GVMA.

Kenney@Imperas:

> Can you point me at the part of the specification that says this?

It's right here, in the section "Hypervisor Memory-Management Fence
Instructions":

    Executing an HFENCE.GVMA instruction guarantees that any previous
    stores already visible to the current hart are ordered before
    all subsequent implicit reads by that hart of guest-physical
    memory-management data structures done for instructions that follow
    the HFENCE.GVMA.

The contents of Kelvin's "TPCs" depend on implicit reads of guest-
physical memory-management data structures.  Because his TPCs don't
keep a record of those exact dependencies, he must flush the TPCs
completely to satisfy the requirement.

What I was saying is that this consequence was fully anticipated and is
not considered an error in the specification.

    - John Hauser

James Kenney

Nov 15, 2021, 3:58:03 AM
to John Hauser, RISC-V ISA Dev
Hello John,

I think it would be really helpful to enhance the wording of the specification here to be a bit more explicit about this. I think that these fences have two distinct effects that are easily conflated:

1. Ordering of loads/stores with respect to the fence, including implicit loads/stores required for page table walks;
2. Invalidation of hardware caches of various kinds.

I think the paragraph you pointed me at partly clarifies the required behavior, but not completely: it clearly says that stores are ordered by the fence, and that any subsequent reads required for page table lookups must use the new guest-physical mappings rather than the old ones, but it doesn't say anything about the effect on cached VS-stage structures (at least the way I read it). It would be helpful if there were an explicit statement saying something like "in implementations with separate G-stage and VS-stage TLB caches, there is no requirement that execution of HFENCE.GVMA invalidate any VS-stage TLB entries" (as Anup stated, I think) - or the opposite, if that is what is required.

I can see that Kelvin's TPCs would definitely have to be flushed, because otherwise there is a risk that a subsequent page table lookup would be short-circuited and use stale data. But the situation for cached VS-stage entries is not clear. For context, we have customers who have conflicting opinions about this: one is sure that all VS-stage entries must be flushed by HFENCE.GVMA, and one is equally sure that they must not!

Thanks,

James.





Kelvin Goveas

Nov 16, 2021, 10:10:35 AM
to RISC-V ISA Dev, ken...@imperas.com, RISC-V ISA Dev, John Hauser
Thank you for the clarification John.

Kelvin

John Hauser

Dec 3, 2021, 6:21:08 PM
to RISC-V ISA Dev
Everybody who reads the RISC-V ISA manuals brings his/her own
understanding of technical terms, and preconceptions about various
concepts that RISC-V shares with other ISAs.  This is a double-edged
sword, both good and bad.  For the most part, it's necessary that
readers already share a great deal of vocabulary with the authors, or
the documents would be ten times longer just trying to explain many
crucial concepts.

However, on the topic of implicit memory accesses and caches, I've
observed that many readers' preconceptions are interfering with a
"correct" reading of the manuals as intended.

In the RISC-V unprivileged ISA manual (Volume I), Section 1.4,
"Memory", makes the distinction between implicit and explicit memory
accesses.  It says:

    Executing each RISC-V machine instruction entails one or more
    memory accesses, subdivided into _implicit_ and _explicit_
    accesses.  For each instruction executed, an _implicit_ memory
    read (instruction fetch) is done to obtain the encoded instruction
    to execute.  Many RISC-V instructions perform no further memory
    accesses beyond instruction fetch.  Specific load and store
    instructions perform an _explicit_ read or write of memory at an
    address determined by the instruction.  The execution environment
    may dictate that instruction execution performs other _implicit_
    memory accesses (such as to implement address translation) beyond
    those documented for the unprivileged ISA.

If, for example, you execute a load instruction in a guest virtual
machine with two-stage address translation, that load instruction
might easily involve 31 implicit memory accesses in addition to the one
explicit memory access.  That number 31 comes from:

    (3 + 1)*3 + 3 = 15 to obtain the machine physical address of the
    instruction;

    1 for the instruction fetch itself; and

    (3 + 1)*3 + 3 = 15 to obtain the machine physical address of the
    data to load.

I think the misunderstanding some people have is to believe that caches
cause some or all of those implicit memory accesses to disappear, i.e.,
not to occur.  But that's not the way you're supposed to think about
implicit memory accesses.  Section 1.4 says:

    Except when specified otherwise, implicit reads that do not raise
    an exception may occur arbitrarily early and speculatively, even
    before the machine could possibly prove that the read will be
    needed.  For instance, a valid implementation could attempt to
    read all of main memory at the earliest opportunity, cache as many
    fetchable (executable) bytes as possible for later instruction
    fetches, and avoid reading main memory for instruction fetches
    ever again.  To ensure that certain implicit reads are ordered only
    after writes to the same memory locations, software must execute
    specific fence or cache-control instructions defined for this
    purpose (such as the FENCE.I instruction defined in Chapter 3).

Instead of eliminating implicit memory accesses, what caches do is
allow implicit memory accesses to occur earlier in time.

Note in my first quotation above, the document says:  "For each
instruction executed, an implicit memory read (instruction fetch) is
done to obtain the encoded instruction to execute."  It doesn't say
"might be done, unless the instruction is obtained from a cache."  An
implicit access is _always_ done for an instruction fetch, although
this implicit access might have occurred a long time ago and been
cached.  The same for implicit accesses for address translation.

Kenney@Imperas wrote:
> I think it would be really helpful to enhance the wording of the
> specification here to be a bit more explicit about this. I think that
> these fences have two distinct effects that are easily conflated:
>
>  1. Ordering of loads/stores with respect to the fence, including
>     implicit loads/stores required for page table walks;
>  2. Invalidation of hardware caches of various kinds.

When implicit memory accesses are properly understood, I think this
condition is pretty clear:

    Executing an HFENCE.GVMA instruction guarantees that any previous
    stores already visible to the current hart are ordered before
    all subsequent implicit reads by that hart of guest-physical
    memory-management data structures done for instructions that follow
    the HFENCE.GVMA.

Cached address translations won't be acceptable for implicit reads
of address translation data structures if those data structures might
have changed between the time the value was cached and the HFENCE.GVMA.
That's because the memory reads connected to those cached translations
would be unacceptably early (preceding the store that changed the data
structures).

If you still don't think so, I'd be interested to know why.

    - John Hauser

James Robinson

Dec 4, 2021, 12:54:17 PM
to RISC-V ISA Dev, John Hauser
Hi John,

What you are describing seems very (and unnecessarily) punitive for implementations which cache VS-stage translations separately from G-stage translations. As someone working on an implementation which does this, I would like to check whether this is the right thing to be specifying.

This is not what Anup, who I believe is at the forefront of implementing relevant software, had understood (from his first response to the thread on Sep 22).

The net effect of what you describe is that whenever the hypervisor makes any modification to the guest mappings, all VS-stage translations must be invalidated, whereas it seems to me that there are a number of situations where the hypervisor might change G-stage mappings but the VS-stage mappings can (even must, see the example below) be considered to remain valid.

Whilst the guest OS and the hypervisor must both participate in generating a VS-stage mapping, the VS-stage part of the mapping information is determined solely by the behavior of the guest OS. The guest OS will execute SFENCE.VMA to inform the hardware when this information is no longer valid.

Suppose that the guest OS walks its page tables and stores the PTE value in a GPR. It then switches satp.MODE (vsatp, from the hypervisor's perspective) to Bare, and uses the information read from the PTE to emulate the mapped access. I believe this is valid S-mode behavior.

Now suppose the hypervisor does a context switch while the PTE is cached in the GPR, then executes HFENCE.GVMA, then context switches back to this guest. The hypervisor's HFENCE.GVMA will not have invalidated the PTE value stored in the GPR, even though you are saying it must invalidate the equivalent information in the TLB. I believe the guest should still be guaranteed to see the correct value on the Bare read of memory that follows the translation rules from the PTE cached in the GPR.
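A minimal sketch of the guest-side part of this scenario in C (the function name, the Sv39-style 64-bit PTE field handling, and the conservative sfence.vma are illustrative assumptions):

    #include <stdint.h>

    /* Guest S-mode: a software walk has already produced the
     * guest-virtual address of the leaf PTE.  Read it into a GPR,
     * switch translation off, and emulate the mapped load using the
     * PPN now cached in that GPR. */
    uint64_t emulate_mapped_load(volatile uint64_t *leaf_pte_va,
                                 uint64_t page_offset)
    {
        uint64_t pte = *leaf_pte_va;          /* PTE value now in a GPR */

        __asm__ volatile ("csrw satp, zero\n\t" /* satp.MODE = Bare
                                                   (vsatp from HS view) */
                          "sfence.vma"          /* conservative fence to
                                                   order the mode switch */
                          : : : "memory");

        uint64_t pa = ((pte >> 10) << 12) | (page_offset & 0xfff);
        return *(volatile uint64_t *)pa;      /* access via cached PTE */
    }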

The guest OS owns the information encapsulated in its PTEs, and is responsible for informing the hardware when that changes (via SFENCE.VMA). Likewise, the hypervisor has a responsibility not to change the apparent contents of memory from under the guest, even if it changes the actual physical address that guest PAs map to. This includes not changing the apparent contents of the guest's PTEs.

I think this hypothetical case shows that forcing HFENCE.GVMA to invalidate VS-stage TLB translations does not guarantee any specific behavior in the guest and is not required.

James

John Hauser

Dec 4, 2021, 4:30:08 PM
to RISC-V ISA Dev
James Kenney wrote:
> It would be helpful if there was an explicit statement saying
> something like in implementations with separate G and VS stage
> TLB caches, there is no requirement that execution of HFENCE.GVMA
> invalidate any VS stage TLB entries, or something like that (as Anup
> stated, I think) - or the opposite, if that is what is required.
>
> I can see that Kelvin's TPCs would definitely have to be flushed,
> because otherwise there is a risk that a subsequent page table lookup
> would be short-circuited and use stale data. But the situation for
> cached VS-stage entries is not clear. For context, we have customers
> who have conflicting opinions about this: one is sure that all
> VS-stage entries must be flushed by HFENCE.GVMA, and one is equally
> sure that they must not!

I'm starting to realize I didn't read James's message well enough, and
I then wrote a long treatise yesterday addressing the wrong question.
Sorry!  (Although maybe it will prove helpful to some people anyway.)
I'll try to do better this time.

James Robinson:
> What you are describing seems very (and unnecessarily) punitive for
> implementations which are caching VS stage translations separately
> from G stage translations. As someone working on an implementation
> which does, this, I would like to check whether this is the right
> thing to be specifying.
>
> This is not what Anup, who I believe is at the forefront of
> implementing relevant software, had understood (from his first
> response to the thread on Sep 22).
>
> The net effect of what you describe is that whenever the hypervisor
> makes any modification to the guest mappings, all VS-stage
> translations must be invalidated, whereas it seems to me that there
> are a number of situations where the hypervisor might change G-stage
> mappings but the VS-stage mappings can (even must, see the example
> below) be considered to remain valid.

The Privileged ISA manual tries to say that HFENCE.GVMA applies only
to the address translation data structures (page tables) pointed to by
hgatp, not those pointed to by satp or vsatp.  Although I didn't say
so explicitly, I meant my statements yesterday in that context.  An
HFENCE.GVMA need not affect an address translation cache (TLB) that
caches only VS-stage translations.

I assume that answers the concerns of both of you, correct?

Even so, if you still believe the document is unclear as currently
written, please review the existing GitHub issues here:
https://github.com/riscv/riscv-isa-manual/issues
and if there isn't one already for this topic, add it.

Best,

    - John Hauser

James Robinson

Dec 4, 2021, 8:04:48 PM
to RISC-V ISA Dev, John Hauser
Hi John,

It does look like we are in agreement. Thank you.

James

ken...@imperas.com

Dec 5, 2021, 5:21:37 AM
to RISC-V ISA Dev, robin...@gmail.com, John Hauser
Hi John,

Yes, this is now what I expect to see as well. I think a simple and explicit statement to this effect in the Privileged Specification would be extremely helpful, so I will add a new issue for that.

Thanks,

James (2)