MMU & Cache Coherence in EDK2


Jingyu Li

Sep 8, 2023, 6:54:02 AM
to RISC-V Firmware Exchange
Dear all,

We are porting the PCIe driver to the SG2042 platform. To solve the cache-coherence problem between PCIe and the CPU, we have enabled the MMU so that we can allocate non-cacheable memory for the PCIe device. We have some questions:

Q0:
UEFI mandates a 1:1 mapping between physical and virtual addresses. How should we deal with a situation where the mapping between the two cannot meet this prerequisite?

Take Sv39 for example: if bit 38 of a device's physical address is 1, e.g. 0x40 0000 0000, the mapped virtual address is the sign-extended 0xFFFF FFC0 0000 0000 according to the RISC-V Privileged Spec.

Q1:
Case 1. We set the PageAttributes to (readable, writable, bufferable, shareable, weak-order, non-cacheable) or (readable, writable, non-bufferable, strong-order, non-cacheable) according to the RISC-V Privileged Spec and the C920 Core IP spec, but then the NVMe SSD over PCIe doesn't work: the data in the SQ and CQ is wrong and is still being cached. It seems the non-cacheable PTEs have no effect.

Case 2. We then tried overriding the PciRootBridgeIoAllocateBuffer, PciRootBridgeIoFreeBuffer, PciRootBridgeIoMap, and PciRootBridgeIoUnMap functions, with NonCoherentDmaLib as a reference. For each buffer allocation, we change the GCD attributes to non-cacheable and then update the page tables. Still, the NVMe DXE has to flush or invalidate the cache to get through its initialization phase. The updated page tables also seem to have no effect.

So how do we keep the cache coherence between PCIe and CPU in edk2 by using MMU reasonably and appropriately?

Q2:
When will the cache-maintenance code (e.g. CpuFlushCpuDataCache) be implemented?

Q3:
When will the Svpbmt extension be implemented? Will only a reference implementation be given?

Thanks!

Best Regards,
Jingyu

Tuan Phan

Sep 8, 2023, 6:23:44 PM
to Jingyu Li, RISC-V Firmware Exchange
Hi,
First of all, let me clarify the PCIe memory map between the CPU and PCIe.
There are two cases where PCIe memory access is involved:
1. PCIe inbound:
   - This is the case where a device uses DMA to access memory contents without CPU involvement. If there is no IOMMU in the data path, PCIe devices access memory directly with physical addresses. PCIe drivers in EDK2 must use the PCI root bridge I/O protocol's AllocateBuffer function to allocate memory that will be used for DMA. In that function, memory either needs to be allocated with a non-cacheable attribute or must be flushed manually before DMA starts. Also, if the memory is not seen 1:1 by a PCIe device versus the CPU, the protocol's Map/Unmap functions need to be used. The PCI root bridge I/O protocol must be implemented by platform vendors if the generic one isn't suitable for this case.
2. PCIe outbound:
   - This is the case where the CPU reads/writes PCIe BAR or ECAM memory. All accesses should go through the Read/Write APIs of the root bridge I/O protocol so that the CPU address can be translated to the PCIe bridge memory range. Again, the implementation of these functions is up to platform vendors.

My comments inline for your questions:

On Fri, Sep 8, 2023 at 3:54 AM Jingyu Li <jingy...@gmail.com> wrote:
Dear all,

We are porting the PCIe driver to the SG2042 platform. To solve the cache-coherence problem between PCIe and the CPU, we have enabled the MMU so that we can allocate non-cacheable memory for the PCIe device. We have some questions:

Q0:
UEFI mandates a 1:1 mapping between physical and virtual addresses. How should we deal with a situation where the mapping between the two cannot meet this prerequisite?

Take Sv39 for example: if bit 38 of a device's physical address is 1, e.g. 0x40 0000 0000, the mapped virtual address is the sign-extended 0xFFFF FFC0 0000 0000 according to the RISC-V Privileged Spec.
[Tuan] I answered in the above statement.

Q1:
Case 1. We set the PageAttributes to (readable, writable, bufferable, shareable, weak-order, non-cacheable) or (readable, writable, non-bufferable, strong-order, non-cacheable) according to the RISC-V Privileged Spec and the C920 Core IP spec, but then the NVMe SSD over PCIe doesn't work: the data in the SQ and CQ is wrong and is still being cached. It seems the non-cacheable PTEs have no effect.

Case 2. We then tried overriding the PciRootBridgeIoAllocateBuffer, PciRootBridgeIoFreeBuffer, PciRootBridgeIoMap, and PciRootBridgeIoUnMap functions, with NonCoherentDmaLib as a reference. For each buffer allocation, we change the GCD attributes to non-cacheable and then update the page tables. Still, the NVMe DXE has to flush or invalidate the cache to get through its initialization phase. The updated page tables also seem to have no effect.

So how do we keep the cache coherence between PCIe and CPU in edk2 by using MMU reasonably and appropriately?
[Tuan] Currently, the MMU library doesn't support cache attributes; all the attributes you mentioned look like they come from the C920 Core IP. To answer your question, more info needs to be provided. To debug this issue, focus on the address that fails in the NVMe DXE:
     - What is the data path of this address, inbound or outbound?
     - Have you confirmed that the MMU library set the correct attributes for the memory range containing the address, by dumping the MMU PTEs?
     - What kind of flush are you using to make it work?
     - Where is the implementation of the NonCoherentDmaLib you mentioned?

Q2:
When will the cache-maintenance code (e.g. CpuFlushCpuDataCache) be implemented?
[Tuan] Sunil may have an idea 

Q3:
When will the Svpbmt extension be implemented? Will only a reference implementation be given?
[Tuan] Svpbmt is ready to be submitted. Just waiting for the extension discovery mechanism to be posted. 

Thanks!

Best Regards,
Jingyu

--
You received this message because you are subscribed to the Google Groups "RISC-V Firmware Exchange" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fw-exchange...@riscv.org.
To view this discussion on the web visit https://groups.google.com/a/riscv.org/d/msgid/fw-exchange/ce3ad5d6-4cf3-4b2f-849f-982a1e991176n%40riscv.org.

Dhaval Sharma

Sep 9, 2023, 2:24:45 AM
to Tuan Phan, Jingyu Li, RISC-V Firmware Exchange
CMO patches are mostly ready but require the extension-discovery mechanism to be in place; I'm waiting for feedback on that. Does your platform support CMO operations? I was looking for someone to provide actual HW to test it on. I can share the current patchset offline if you want it for POC-style testing.



--
Thanks!
=D

Warkentin, Andrei

Sep 9, 2023, 4:29:09 AM
to Jingyu Li, RISC-V Firmware Exchange

Wrt Q0:

 

SG2042 is an Sv39 chip. Can you share what kind of devices are decoding bit 38? Are these platform (MMIO) devices or is that where you have set up the PCIe apertures?

 

The current RV calling convention in UEFI sadly says nothing about 1:1 mapping, but the current draft being reviewed (and soon approved) by the UEFI Forum has the following text, which is similar to other architectures. In fact, it is very similar to the ia32 – it /allows/ the MMU to be configured, but if it is, then the 1:1 mapping must hold true for /memory spaces/.

 

E.g.:

--------->

Address translation may be enabled. If enabled, any memory space defined by the UEFI memory map is identity mapped (virtual address equals physical address), although the attributes of certain regions may not have all read, write and execute attributes or be unmarked for purposes of platform protection. The mappings to other regions are undefined and may vary from implementation to implementation.

--------->

 

(UEFI memory maps don’t contain I/O, unless it’s a region used by runtime services code… that’s the only exception, since the OS has to map all regions used by RT).

 

So in your situation, where a device is physically at 0x40 0000 0000 but in the Sv39 world is at 0xFFFF FFC0 0000 0000, you could:

  1. Not enable the MMU. Not ideal (e.g., you can forget about MultiArchUefiPkg as it relies on page protection to intercept execution of non-native code)
  2. Make drivers aware that the physical address of a device’s registers != the actual address to use. For platform devices, e.g., passing the virtual address to RegisterNonDiscoverableMmioDevice instead of the physical one. For PCIe devices, you might need to hack up the PCI stack so that the reported BAR values are sign-extended ones, not physical ones.
  3. For PCIe, move the aperture lower.

 

If you happen to have RAM at bit 38, then such RAM literally can’t be used by UEFI if you use the MMU, because memory must be 1:1. You’d have to reserve such ranges away from Tiano.

 

Q1) Is there a non-architected cache somewhere in the system? “For each buffer allocation, modify the GcdAttributes to non-cacheable, and then update the page tables.” – you’d also want to flush the buffers… Is your PCIe DMA non-coherent?

 

Q2) Does SG2042 implement CMO or is there a non-architected cache somewhere in the system, that requires custom MMIO-based operations?

 

Q3) Does SG2042 implement Svpbmt? I want to add a stronger requirement around MMU use to the UEFI calling conv. for RISC-V – mandate it for systems with Svpbmt (because presumably, you’d want to use MMU on such systems anyway to deal with UC overrides to regular memory PMA for supporting non-coherent I/O implementations.)

 

A

 

--

Jingyu Li

Sep 9, 2023, 1:13:56 PM
to RISC-V Firmware Exchange, andrei.w...@intel.com, Jingyu Li
Thanks for your reply!

Please see the below in response to your questions:

On Saturday, September 9, 2023 at 4:29:09 PM UTC+8 andrei.w...@intel.com wrote:

Wrt Q0:

 

SG2042 is an Sv39 chip. Can you share what kind of devices are decoding bit 38? Are these platform (MMIO) devices or is that where you have set up the PCIe apertures?

 

[A] In the system map of SG2042,  all the devices (including the PCIe apertures) need to decode bit 38.

The current RV calling convention in UEFI sadly says nothing about 1:1 mapping, but the current draft being reviewed (and soon approved) by the UEFI Forum has the following text, which is similar to other architectures. In fact, it is very similar to the ia32 – it /allows/ the MMU to be configured, but if it is, then the 1:1 mapping must hold true for /memory spaces/.

 

E.g.:

--------->

Address translation may be enabled. If enabled, any memory space defined by the UEFI memory map is identity mapped (virtual address equals physical address), although the attributes of certain regions may not have all read, write and execute attributes or be unmarked for purposes of platform protection. The mappings to other regions are undefined and may vary from implementation to implementation.

--------->

 

(UEFI memory maps don’t contain I/O, unless it’s a region used by runtime services code… that’s the only exception, since the OS has to map all regions used by RT).

 

So in your situation, where a device is physically at 0x40 0000 0000 but in the Sv39 world is at 0xFFFF FFC0 0000 0000, you could:

  1. Not enable the MMU. Not ideal (e.g., you can forget about MultiArchUefiPkg as it relies on page protection to intercept execution of non-native code)
  2. Make drivers aware that the physical address of a device’s registers != the actual address to use. For platform devices, e.g., passing the virtual address to RegisterNonDiscoverableMmioDevice instead of the physical one. For PCIe devices, you might need to hack up the PCI stack so that the reported BAR values are sign-extended ones, not physical ones.
  3. For PCIe, move the aperture lower.

 

If you happen to have RAM at bit 38, then such RAM literally can’t be used by UEFI if you use the MMU, because memory must be 1:1. You’d have to reserve such ranges away from Tiano.

 

Q1) Is there a non-architected cache somewhere in the system? “For each buffer allocation, modify the GcdAttributes to non-cacheable, and then update the page tables.” – you’d also want to flush the buffers… Is your PCIe DMA non-coherent?

A1) The PCIe DMA on SG2042 is non-coherent, so we want to use the MMU to map a non-cacheable memory region for PCIe DMA and avoid the cache-coherence problem.

Q2) Does SG2042 implement CMO or is there a non-architected cache somewhere in the system, that requires custom MMIO-based operations?

 

A2) The T-HEAD C920 Core IP implements its own CMO extension (the relevant instructions are shown in the figure below), which differs from the standard CMO extension.
[Figure: T-HEAD CMO instructions (THEAD_CMO.png)]

Q3) Does SG2042 implement Svpbmt? I want to add a stronger requirement around MMU use to the UEFI calling conv. for RISC-V – mandate it for systems with Svpbmt (because presumably, you’d want to use MMU on such systems anyway to deal with UC overrides to regular memory PMA for supporting non-coherent I/O implementations.)

 

A3) The T-HEAD C920 Core IP implements its own Svpbmt extension (the relevant register and T-HEAD memory-type definitions are shown in the figures below), which differs from the standard Svpbmt extension.

[Figures: T-HEAD Svpbmt register and memory-type definitions (THEAD_SVPBMT0.png, THEAD_SVPBMT1.png, THEAD_SVPBMT2.png)]

Warkentin, Andrei

Sep 9, 2023, 5:25:15 PM
to Jingyu Li, RISC-V Firmware Exchange, Jingyu Li
Jingyu,

A few follow up Qs:

1) Did my comments on the handling of your device I/O regions (with bit 38) make sense? Do you acknowledge that a non-1:1 mapping of MMIO regions is not a violation of the identity mapping requirements seen in UEFI spec?

2) Do you acknowledge that you must perform cache maintenance as part of allocating a non-coherent buffer, to avoid stale state from the cache accidentally getting flushed back to RAM (and, depending on how the cache is implemented, to avoid cache hits on stale data too...)?

Do you have the code up somewhere? Perhaps we can eyeball the issue...

A




From: Jingyu Li <jingy...@gmail.com>
Sent: Saturday, September 9, 2023 12:14 PM
To: RISC-V Firmware Exchange <fw-ex...@riscv.org>
Cc: Warkentin, Andrei <andrei.w...@intel.com>; Jingyu Li <jingy...@gmail.com>
Subject: Re: MMU & Cache Coherence in EDK2
 

Jingyu Li

Sep 10, 2023, 9:51:10 AM
to RISC-V Firmware Exchange, andrei.w...@intel.com, Jingyu Li
Hi Andrei,

Answers to your questions:

On Sunday, September 10, 2023 at 5:25:15 AM UTC+8 andrei.w...@intel.com wrote:
Jingyu,

A few follow up Qs:

1) Did my comments on the handling of your device I/O regions (with bit 38) make sense? Do you acknowledge that a non-1:1 mapping of MMIO regions is not a violation of the identity mapping requirements seen in UEFI spec?

A1) Your comments gave us some ideas. We have not adopted MultiArchUefiPkg on SG2042 yet.
"Make drivers aware that the physical address of a device’s registers != the actual address to use. For platform devices, e.g., passing the virtual address to RegisterNonDiscoverableMmioDevice instead of the physical one. For PCIe devices, you might need to hack up the PCI stack so that the reported BAR values are sign-extended ones, not physical ones."
—— The proposal above is exactly the solution we have taken. Specifically, UEFI currently boots from a micro SD card on SG2042, so the SDHI/EMMC driver is a runtime DXE and its initialization happens after the MMU is enabled; we pass the actual virtual base address of the SD controller so that reads and writes are correct. For the UART, though, we only use the SBI legacy extension (sbi_console_putchar and sbi_console_getchar) to keep getting output from the serial port, because an ordinary serial driver would need to change its base address from the physical to the virtual address once the MMU is enabled.

For the MMIO regions used by runtime services, non-1:1 mapping has the solution above, so perhaps it is not a violation of the identity-mapping requirement in the UEFI spec?

2) Do you acknowledge that you must perform cache maintenance as part of allocating a non-coherent buffer, to avoid stale state from the cache accidentally getting flushed back to RAM (and, depending on how the cache is implemented, to avoid cache hits on stale data too...)?

A2) Based on the two cases described above, cache maintenance really must be performed.
However, it is important to note that we have not yet verified that the page tables for uncached memory are created as expected. We are trying to figure that out.
(In my view, device DMA should not need cache operations as long as the buffer is allocated from non-cacheable memory?)

Do you have the code up somewhere? Perhaps we can eyeball the issue...

A3) Sorry, the code is not ready to push to our repository yet. We plan to clean it up, push it, and provide a public link soon.