PMAs - Physical Memory Attributes

Michael Clark

Jun 17, 2017, 3:38:40 AM
to RISC-V ISA Dev
The specification talks at a high level about PMAs (Physical Memory Attributes) but in reality only has fine details on PMP (Physical Memory Protection).

So while I was reading the spec I decided to make a table of PMAs.

I generally tried to avoid attributes with negative meanings so there are some properties that are expressed via the absence of flags. e.g. the absence of pma_policy_coherent means a region without this flag is incoherent.

I have split cache write policy from cache allocation policy so there are some combinations that may not make sense and there may be some duplication, however it is a starting point.

For cache allocation, the lack of read allocate means only writes cause allocation, and vice versa. Normally both or neither would be set, but there are cases where one may wish reads to be cached and writes to go direct to memory, or writes to be cached and reads to go direct to memory. This is about as fine-grained as the control could get. Allocate-on-read and allocate-on-write are both properties of caches, and my intention was to be exhaustive (even if both bits are usually set). In some sense pma_cache_alloc_read may make pma_cache_write_around redundant; however, it is included for completeness.

I am not happy with the IO sizes as they are all powers of 2 and the limit is currently 1024-bit. I ran out of bits in a 32-bit word. We have 384-bit GDDR memory systems and 4096-bit HBM memory systems so the IO sizes need some work.

Ordering and AMOs come straight from the spec. There are also PMP properties for implementations that may implement PMP as attributes of a wider-scoped PMA table. There also needs to be a distinction between read-only attributes that expose the type of memory ranges and bits that can be changed.

Given we were discussing caching I thought I would share the list. It might be useful for future reference.


/* supported memory range types */
pma_type_illegal = 1U<<0, /* illegal region */
pma_type_main = 1U<<1, /* main memory region */
pma_type_io = 1U<<2, /* IO memory region */

/* supported memory range cache write modes */
pma_cache_write_back = 1U<<3, /* write back caching (normal cache policy) */
pma_cache_write_through = 1U<<4, /* write through cache to backing store */
pma_cache_write_combine = 1U<<5, /* write accumulate until fence or cache line is full */
pma_cache_write_around = 1U<<6, /* uncacheable, write directly to backing store */

/* supported memory range cache allocation modes (4 combinations) */
pma_cache_alloc_read = 1U<<7, /* allocate cache on reads */
pma_cache_alloc_write = 1U<<8, /* allocate cache on writes */

/* supported memory range backing store write sizes */
pma_io_size_1 = 1U<<9, /* b - 8-bit */
pma_io_size_2 = 1U<<10, /* h - 16-bit */
pma_io_size_4 = 1U<<11, /* w - 32-bit */
pma_io_size_8 = 1U<<12, /* d - 64-bit */
pma_io_size_16 = 1U<<13, /* q - 128-bit */
pma_io_size_32 = 1U<<14, /* o - 256-bit */
pma_io_size_64 = 1U<<15, /* qq - 512-bit */
pma_io_size_128 = 1U<<16, /* qo - 1024-bit */

/* supported memory ordering */
pma_order_channel_0 = 1U<<17, /* hart point to point strong ordering */
pma_order_channel_1 = 1U<<18, /* hart global strong ordering */

/* supported memory range coherence policies (not coherent, not private is a type) */
pma_policy_coherent = 1U<<19, /* hardware managed coherence */
pma_policy_private = 1U<<20, /* non shared private */

/* supported memory range atomic operations (not present indicates no amo support) */
pma_io_amo_swap = 1U<<21, /* amoswap */
pma_io_amo_logical = 1U<<22, /* above + amoand, amoor, amoxor */
pma_io_amo_arithmetic = 1U<<23, /* above + amoadd, amomin, amomax, amominu, amomaxu */

/* supported memory range atomic operation sizes (amo for aligned main memory is implied) */
pma_io_amo_size_4 = 1U<<24, /* amo<>.w */
pma_io_amo_size_8 = 1U<<25, /* amo<>.d */
pma_io_amo_size_16 = 1U<<26, /* amo<>.q */

/* supported memory range io idempotency (N/A main memory) */
pma_io_idempotent_read = 1U<<27, /* reads are idempotent (allow speculative or redundant reads) */
pma_io_idempotent_write = 1U<<28, /* writes are idempotent (allow speculative or redundant writes) */

/* supported memory range protection */
pma_prot_read = 1U<<29, /* region is readable */
pma_prot_write = 1U<<30, /* region is writable */
pma_prot_execute = 1U<<31, /* region is executable */
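
The convention that some properties are expressed by the absence of flags can be decoded roughly as follows. This is a minimal sketch, assuming the bit positions from the draft table above; the `#define` spelling (used here because `1U<<31` does not fit in a plain C `int` enumerator) and the helper names `pma_coherence` and `pma_alloc_mode` are illustrative and not part of the proposal.

```c
#include <stdint.h>

/* Bit positions copied from the draft table above. */
#define PMA_CACHE_ALLOC_READ  (UINT32_C(1) << 7)
#define PMA_CACHE_ALLOC_WRITE (UINT32_C(1) << 8)
#define PMA_POLICY_COHERENT   (UINT32_C(1) << 19)
#define PMA_POLICY_PRIVATE    (UINT32_C(1) << 20)

/* Hypothetical helper: name the coherence class of a region.
 * "incoherent" has no bit of its own; it is the absence of both flags.
 * If both flags are set, this sketch arbitrarily reports "coherent". */
static const char *pma_coherence(uint32_t attrs)
{
    if (attrs & PMA_POLICY_COHERENT) return "coherent";
    if (attrs & PMA_POLICY_PRIVATE)  return "private";
    return "incoherent";
}

/* Hypothetical helper: the two allocation bits give four combinations. */
static const char *pma_alloc_mode(uint32_t attrs)
{
    int on_read  = (attrs & PMA_CACHE_ALLOC_READ)  != 0;
    int on_write = (attrs & PMA_CACHE_ALLOC_WRITE) != 0;

    if (on_read && on_write) return "read+write allocate";
    if (on_read)             return "read allocate only";
    if (on_write)            return "write allocate only";
    return "no allocate";
}
```

For example, `pma_coherence(0)` reports an incoherent region, which is the case the table leaves implicit.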

Tommy Thorn

Jun 17, 2017, 1:49:06 PM
to Michael Clark, RISC-V ISA Dev
I assume this is just a proposal?

> I generally tried to avoid attributes with negative meanings...
> ...
>                pma_type_illegal         = 1U<<0, /* illegal region */

Surely you meant

>                pma_type_legal         = 1U<<0, /* legal region */

I think this is better anyway (no accidentally legal entries).

Tommy




--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/28A60C47-3BFC-48BD-97D1-F4C86AD0D1C8%40mac.com.

Michael Clark

Jun 17, 2017, 6:04:29 PM
to Tommy Thorn, RISC-V ISA Dev
On 18 Jun 2017, at 5:49 AM, Tommy Thorn <tommy...@esperantotech.com> wrote:

> I assume this is just a proposal?

Yes. I thought I would share the list as I have not seen a complete table of PMAs. PMAs are a way to either declare or set cache policies for regions of memory, so they are related to, but orthogonal to, explicit cache control.

There are nuances that are not in my list, e.g. whether a property is declared (read-only) or writable. This might depend on the implementation, as implementations may or may not support dynamic cache policies for various regions. In fact, changing the cache policy for a range of memory may or may not have side effects such as invalidating or flushing caches.

> I generally tried to avoid attributes with negative meanings…

I’ll highlight “generally”.

>                pma_type_illegal         = 1U<<0, /* illegal region */

> Surely you meant

>                pma_type_legal         = 1U<<0, /* legal region */

> I think this is better anyway (no accidentally legal entries).

main, IO and empty are listed in the specification section on PMAs.

empty is implied by the absence of an entry.

Yes. illegal is a negative entry to mark a region as prohibited, while still being able to mark it as IO or main memory.

- regions that are not main or IO are implicitly illegal
- regions that are main or IO are implicitly legal.
- illegal allows one to mark a reserved main or IO region.

An example of an illegal entry passed in a memory map would be memory reserved by a monitor.

I guess it could be simplified by making this implicit, but think of it as a physical area that might be wired out (like A20), and even if RAM is plugged in, the address space cannot be used.
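
The three rules above amount to a small predicate. The sketch below assumes the bit positions from the first draft table; the helper name `pma_region_usable` is hypothetical and only illustrates the stated semantics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the first draft table in this thread. */
#define PMA_TYPE_ILLEGAL (UINT32_C(1) << 0)
#define PMA_TYPE_MAIN    (UINT32_C(1) << 1)
#define PMA_TYPE_IO      (UINT32_C(1) << 2)

/* Hypothetical helper: a region is usable only if it is typed as main
 * or IO memory and has not been explicitly marked illegal (reserved). */
static bool pma_region_usable(uint32_t attrs)
{
    bool typed   = (attrs & (PMA_TYPE_MAIN | PMA_TYPE_IO)) != 0;
    bool illegal = (attrs & PMA_TYPE_ILLEGAL) != 0;

    return typed && !illegal;
}
```

An entry with no type bits (the "empty" case) and an entry such as `PMA_TYPE_MAIN | PMA_TYPE_ILLEGAL` (main memory reserved by a monitor) are both rejected, while still being distinguishable from each other.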

main or IO could be encoded to only require 1 bit. The encoding I have there is draft as I have simply assigned one bit to each attribute. It was a first pass to try to list a relatively complete list of physical memory attributes.


Michael Clark

Jun 18, 2017, 9:24:07 PM
to RISC-V ISA Dev
I’ve changed illegal to reserved, removed PMA IO sizes, and added cache read enable and cache eviction enable:

https://gist.github.com/michaeljclark/fa4df328f35c93405541c3407289cd81


We need some use cases to test which flags are necessary. For example, in early boot on a system without soldered RAM, where RAM timings are read via SPD rather than from ROM, early board bring-up may need to cache the ROM and configure some cache as pinned scratch memory, which essentially means disabling eviction.


1. Early ROM configuring the system to run the ROM out of cache before RAM has been configured. Running from ROM is very slow if the accesses are uncached. Read allocate is toggled. The cache is populated and eviction is left disabled to pin the addresses in the cache.

Set pma_type_main
Set pma_cache_read
Set pma_cache_read_alloc
Set pma_prot_read
Execute loads from ROM to populate cache
Unset pma_cache_read_alloc

1. Early ROM configuring part of cache as scratch RAM. Write allocate is toggled. Eviction is not enabled, to pin the addresses in the cache.

Set pma_type_main
Set pma_cache_read
Set pma_cache_write_alloc
Set pma_cache_write_back
Set pma_prot_read
Set pma_prot_write
Execute stores of zeros into cache at desired address range
Unset pma_cache_write_alloc

2. Cacheable IO region, e.g. a framebuffer

Set pma_type_io
Set pma_cache_read
Set pma_cache_read_alloc
Set pma_cache_write_alloc
Set pma_cache_write_combine
Set pma_cache_evict
Set pma_prot_read
Set pma_prot_write

3. Uncacheable IO region

Set pma_type_io
Set pma_prot_read
Set pma_prot_write

4. Main memory

Set pma_type_main
Set pma_cache_read
Set pma_cache_read_alloc
Set pma_cache_write_alloc
Set pma_cache_write_back
Set pma_cache_evict
Set pma_prot_read
Set pma_prot_write
Set pma_prot_exec

5. Non volatile memory (not executable)

Set pma_type_main
Set pma_cache_read
Set pma_cache_read_alloc
Set pma_cache_write_alloc
Set pma_cache_write_through
Set pma_cache_evict
Set pma_prot_read
Set pma_prot_write

6. AMO support, coherency and idempotency are likely read-only properties of the memory region that presumably come from the ROM describing the memory layout and are informational (however, they could be dynamic). The attributes may be used by the cache and memory system when executing certain operations, e.g. to know whether AMOs will function on an IO region, so that a trap is raised instead of blindly executing an AMO on a region that does not support them.

Set pma_policy_coherent

Set pma_io_amo_swap
Set pma_io_amo_logical
Set pma_io_amo_arithmetic

Set pma_io_amo_size_4
Set pma_io_amo_size_8
Set pma_io_amo_size_16

7. Hart local MMIO region (e.g. local or shared memory in a SIMD cluster type arrangement)

Set pma_policy_private

8. Signal information about address space holes that may be occupied by a monitor or hypervisor, or address ranges that are wired out in hardware (e.g. a debug address space aperture that is only accessible via JTAG).

Set pma_policy_reserved
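
The scratch-RAM use case in the list above (set flags, store zeros, then clear write allocate) could be sketched as below. The bit assignments are hypothetical, since the revised encoding lives in the linked gist and is not reproduced in this thread, and the PMA entry is modelled as a plain 32-bit word rather than any real register; `pma_setup_scratch` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments for the revised flag names. */
#define PMA_TYPE_MAIN         (UINT32_C(1) << 0)
#define PMA_CACHE_READ        (UINT32_C(1) << 1)
#define PMA_CACHE_WRITE_ALLOC (UINT32_C(1) << 2)
#define PMA_CACHE_WRITE_BACK  (UINT32_C(1) << 3)
#define PMA_CACHE_EVICT       (UINT32_C(1) << 4)
#define PMA_PROT_READ         (UINT32_C(1) << 5)
#define PMA_PROT_WRITE        (UINT32_C(1) << 6)

/* Configure one PMA entry so that `scratch` is held in cache.
 * `pma_entry` stands in for wherever the attributes are programmed. */
static void pma_setup_scratch(volatile uint32_t *pma_entry,
                              volatile uint8_t *scratch, size_t len)
{
    /* PMA_CACHE_EVICT is deliberately left clear so lines stay pinned. */
    *pma_entry = PMA_TYPE_MAIN | PMA_CACHE_READ | PMA_CACHE_WRITE_ALLOC |
                 PMA_CACHE_WRITE_BACK | PMA_PROT_READ | PMA_PROT_WRITE;

    /* Stores of zeros allocate the cache lines for the range. */
    for (size_t i = 0; i < len; i++)
        scratch[i] = 0;

    /* Stop further allocation once the range is resident. */
    *pma_entry &= ~PMA_CACHE_WRITE_ALLOC;
}
```

On real hardware the ordering between the attribute writes and the stores would presumably need fences or other synchronisation, which this sketch leaves out.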


Allen J. Baum

Jun 19, 2017, 12:47:05 PM
to Michael Clark, RISC-V ISA Dev
I came up with a similar table, but based only on the information in the latest version of the spec. This proposal is much more specific, casting it in concrete, much like the PMP. Unclear if that is a good thing or not, as this might be something that may vary greatly depending on implementation.

I have a more basic question, however: what is the purpose of the IO and Memory bits?
Specifically: do they confer any properties that are not covered by the other bits?

I also echo some other concerns as to whether cache control properties belong in this.
Many of these properties are properly (or at least usually) controlled by page table entries.
I think it would be useful to list which of the properties are read-only, read-write, or could be either - or whether this is a consequence of the IO vs. memory property.

-- 
**************************************************
* Allen Baum              tel. (908)BIT-BAUM     *
*                                   248-2286     *     
**************************************************

Michael Clark

Jun 19, 2017, 6:38:51 PM
to Allen J. Baum, RISC-V ISA Dev
Hi Allen,

On 20 Jun 2017, at 4:46 AM, Allen J. Baum <allen...@esperantotech.com> wrote:

> I came up with a similar table, but based only on the information in the latest version of the spec. This proposal is much more specific, casting it in concrete, much like the PMP. Unclear if that is a good thing or not, as this might be something that may vary greatly depending on implementation.

I have gone through the list a few times and each of the properties is quite distinct. The only one I am thinking of removing is write_around, which is effectively uncached writes (or the default). I’ve read lots of documentation on platform memory controllers.

In fact, IO size really needs to be put back in there, as I can see it in the register properties of snoop controllers, i.e. the memory system needs to know whether the backing store accepts 1-byte writes or whether it needs to read/modify/write a 32-bit word to handle byte transactions. Perhaps limit it to the range from 8-bit to 128-bit and leave out the more unusual memory systems.
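
The read/modify/write case mentioned above can be illustrated with a short sketch. It assumes a backing store that only accepts aligned 32-bit writes and little-endian byte lanes; `store_byte_rmw` is a hypothetical name, and memory is modelled as an array of words.

```c
#include <stdint.h>

/* Emulate a 1-byte store on a backing store that only accepts aligned
 * 32-bit writes. Byte lanes are assumed to be little-endian. */
static void store_byte_rmw(uint32_t *mem_words, uint32_t byte_addr,
                           uint8_t value)
{
    uint32_t word_index = byte_addr >> 2;          /* which 32-bit word */
    unsigned shift      = (byte_addr & 3u) * 8u;   /* which byte lane   */

    uint32_t word = mem_words[word_index];         /* read   */
    word &= ~(UINT32_C(0xff) << shift);            /* clear the lane */
    word |= (uint32_t)value << shift;              /* modify */
    mem_words[word_index] = word;                  /* write  */
}
```

A memory system that knows the region accepts 1-byte writes could skip the read and issue the byte store directly, which is the distinction the IO size attribute is meant to expose.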

> I have a more basic question, however: what is the purpose of the IO and Memory bits?

IO and memory are primarily informational. A kernel would likely use the IO property in its io_remap implementation.

> Specifically: do they confer any properties that are not covered by the other bits?

Most of the IO-related properties concern caching, e.g. write combine and uncacheable.

> I also echo some other concerns as to whether cache control properties belong in this.
> Many of these properties are properly (or at least usually) controlled by page table entries.

IOTLB entries look very similar to TLB entries too. Primarily the common properties are prot_read | prot_write | prot_exec. Interestingly, IOTLB entries also have these properties, as they can inform the memory system to deny exec on an IO region, or in fact allow it.

The physical memory layer also needs these properties, and they are held in small tables much like the PMP tables. MTRR is the legacy table on x86. There are also tables in the platform memory controller or MCH (memory controller hub), which has been moved onto the die in recent SoCs.

> I think it would be useful to list which of the properties are read-only, read-write, or could be either - or whether this is a consequence of the IO vs. memory property.

I think this would be defined in a similar way to other properties, in that one would try to write to the bit and see if it changes. Some implementations may hard-wire physical memory properties, and these properties might be in ROM. Some sophisticated implementations, i.e. the present state of the art, may be able to set cache policy on regions. I do note that no-evict in current platform memory controllers is usually a global setting, held in model-specific control and status registers that are not publicly documented. It would be innovative to have the evict logic pull the bit from the current PMA entry rather than from a global flag, assuming many of the cache and memory controller's decisions will be based on the fetched PMA entry.
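
The "write to the bit and see if it changes" approach could look roughly like the sketch below. No real PMA register is defined in this thread, so the register is simulated by a hypothetical `struct pma_reg` whose `writable` mask stands in for the hard-wired bits; all names are illustrative.

```c
#include <stdint.h>

/* Simulated PMA entry: only bits set in `writable` can change,
 * the remaining bits behave as if hard-wired. */
struct pma_reg {
    uint32_t value;
    uint32_t writable;
};

static void pma_reg_write(struct pma_reg *r, uint32_t v)
{
    r->value = (r->value & ~r->writable) | (v & r->writable);
}

static uint32_t pma_reg_read(const struct pma_reg *r)
{
    return r->value;
}

/* Probe: try to flip every bit, record which ones changed, then restore
 * the original value. Returns a mask of the writable bits. */
static uint32_t pma_probe_writable(struct pma_reg *r)
{
    uint32_t saved = pma_reg_read(r);

    pma_reg_write(r, ~saved);
    uint32_t changed = pma_reg_read(r) ^ saved;
    pma_reg_write(r, saved);

    return changed;
}
```

Since changing a cache policy may have side effects such as flushing or invalidating caches, a real probe would probably have to run early in boot or be restricted to bits known to be safe to toggle.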

My major criticism of the current PMP specification is that it in fact casts in concrete much more restrictive implementation choices than a variable-length table would (perhaps with its length and base address exposed in CSRs), located in an MMIO region for the memory controller PMA configuration:

pmabr - possibly read-only - physical address of the PMA table
pmacnt - possibly read-only - number of entries in the PMA table

Using the range scheme from the current PMP specification and an MMIO region (much like the PLIC) for the memory controller would allow for much more flexibility in the future. The unimplemented bits can just be ignored bits. In this implementation it would require 32 bits for the PMA attributes, and they would also support PMP.
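
A variable-length table located via a base and a count might be walked as in the sketch below. The entry layout (`struct pma_entry` with explicit `base` and `size` fields) is an assumption made for brevity; it is not the PMP address-matching encoding, and the thread itself only proposes the two CSRs. `pma_lookup` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout for a table found via pmabr / pmacnt. */
struct pma_entry {
    uint64_t base;   /* first physical address covered */
    uint64_t size;   /* length of the range in bytes   */
    uint32_t attrs;  /* 32 bits of PMA (and PMP) flags */
};

/* Return the attributes of the first entry covering `paddr`, or 0 when
 * no entry matches (the "empty" case: absence of an entry). */
static uint32_t pma_lookup(const struct pma_entry *table, size_t count,
                           uint64_t paddr)
{
    for (size_t i = 0; i < count; i++) {
        if (paddr >= table[i].base &&
            paddr - table[i].base < table[i].size)
            return table[i].attrs;
    }
    return 0;
}
```

Here `table` and `count` play the roles of the values read from pmabr and pmacnt; the first matching entry takes priority, which is one possible choice among several.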

I’d much prefer a more flexible PMA/PMP design that was scalable. A 16-entry MMIO region would be relatively similar in implementation complexity to the current approach, but would be much more scalable and future-proof.

Michael.

Michael Clark

Jun 19, 2017, 6:58:16 PM
to RISC-V ISA Dev
Based on thinking about this further, I’ve made these changes:

- Removed write around and labeled it default write policy, as write around is effectively uncacheable writes
- Added back IO sizes up to 512-bit (64-bytes) which is a present day typical upper limit on cache line sizes for CPUs


It’s worth noting that an emulator like Gem5 probably implements a memory controller with configuration of these properties and these relatively general properties are necessary, either statically or dynamically, implicitly or explicitly, for implementing a caching memory controller. The most specific properties are the ones from the spec relating to AMO grouping and memory ordering channels. Most of the properties are quite generic and are encountered in present day memory systems, and are reflected on by operating system kernels and such.

There are much more detailed properties when it comes to modelling a multi-tiered cache hierarchy: “inclusivity” and “exclusivity” at each tier, the cache coherency models (MESI, MOSI, MOESI, MOESIF and MSI), and the cache and memory system geometry/topology. I have not gone into that level of detail; these are not physical memory attributes and would need a separate specification if they are to be reflected on in a heterogeneous system. OpenCL gives good examples of properties describing memory system topology that can be reflected on at runtime and used for optimising compute kernels. I have, for example, written code that adapts to the available or local memory using introspection.

Future RISC-V systems with sophisticated memory systems are going to face the problem of surfacing these attributes in a standardised way, so that software can be optimised for what will be a heterogeneous ecosystem with multiple implementation choices. Surfacing these choices via introspection will be important.
