[PATCH v9 0/9] x86: Show in sysfs if a memory node is able to do encryption

20 views
Skip to first unread message

Martin Fernandez

unread,
Jul 4, 2022, 9:58:47 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Show for each node if every memory descriptor in that node has the
EFI_MEMORY_CPU_CRYPTO attribute.

fwupd project plans to use it as part of a check to see if the users
have properly configured memory hardware encryption
capabilities. fwupd's people have seen cases where it seems like there
is memory encryption because all the hardware is capable of doing it,
but on a closer look there is not, either because of system firmware
or because some component requires updating to enable the feature.

The MKTME/TME spec says that it will only encrypt those memory regions
which are flagged with the EFI_MEMORY_CPU_CRYPTO attribute.

If all nodes are capable of encryption and if the system have tme/sme
on we can pretty confidently say that the device is actively
encrypting all its memory.

It's planned to make this check part of an specification that can be
passed to people purchasing hardware

These checks will run at every boot. The specification is called Host
Security ID: https://fwupd.github.io/libfwupdplugin/hsi.html.

We choosed to do it a per-node basis because although an ABI that
shows that the whole system memory is capable of encryption would be
useful for the fwupd use case, doing it in a per-node basis would make
the path easier to give the capability to the user to target
allocations from applications to NUMA nodes which have encryption
capabilities in the future.


Changes since v8:

Add unit tests to e820_range_* functions


Changes since v7:

Less kerneldocs

Less verbosity in the e820 code


Changes since v6:

Fixes in __e820__handle_range_update

Const correctness in e820.c

Correct alignment in memblock.h

Rework memblock_overlaps_region


Changes since v5:

Refactor e820__range_{update, remove, set_crypto_capable} in order to
avoid code duplication.

Warn the user when a node has both encryptable and non-encryptable
regions.

Check that e820_table has enough size to store both current e820_table
and EFI memmap.


Changes since v4:

Add enum to represent the cryptographic capabilities in e820:
e820_crypto_capabilities.

Revert __e820__range_update, only adding the new argument for
__e820__range_add about crypto capabilities.

Add a function __e820__range_update_crypto similar to
__e820__range_update but to only update this new field.


Changes since v3:

Update date in Doc/ABI file.

More information about the fwupd usecase and the rationale behind
doing it in a per-NUMA-node.


Changes since v2:

e820__range_mark_crypto -> e820__range_mark_crypto_capable.

In e820__range_remove: Create a region with crypto capabilities
instead of creating one without it and then mark it.


Changes since v1:

Modify __e820__range_update to update the crypto capabilities of a
range; now this function will change the crypto capability of a range
if it's called with the same old_type and new_type. Rework
efi_mark_e820_regions_as_crypto_capable based on this.

Update do_add_efi_memmap to mark the regions as it creates them.

Change the type of crypto_capable in e820_entry from bool to u8.

Fix e820__update_table changes.

Remove memblock_add_crypto_capable. Now you have to add the region and
mark it then.

Better place for crypto_capable in pglist_data.

Martin Fernandez (9):
mm/memblock: Tag memblocks with crypto capabilities
mm/mmzone: Tag pg_data_t with crypto capabilities
x86/e820: Add infrastructure to refactor e820__range_{update,remove}
x86/e820: Refactor __e820__range_update
x86/e820: Refactor e820__range_remove
x86/e820: Tag e820_entry with crypto capabilities
x86/e820: Add unit tests for e820_range_* functions
x86/efi: Mark e820_entries as crypto capable from EFI memmap
drivers/node: Show in sysfs node's crypto capabilities

Documentation/ABI/testing/sysfs-devices-node | 10 +
arch/x86/Kconfig.debug | 10 +
arch/x86/include/asm/e820/api.h | 1 +
arch/x86/include/asm/e820/types.h | 12 +-
arch/x86/kernel/e820.c | 393 ++++++++++++++-----
arch/x86/kernel/e820_test.c | 249 ++++++++++++
arch/x86/platform/efi/efi.c | 37 ++
drivers/base/node.c | 10 +
include/linux/memblock.h | 5 +
include/linux/mmzone.h | 3 +
mm/memblock.c | 62 +++
mm/page_alloc.c | 1 +
12 files changed, 695 insertions(+), 98 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-devices-node
create mode 100644 arch/x86/kernel/e820_test.c

--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:58:53 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Add the capability to mark regions of the memory memory_type able of
hardware memory encryption.

Also add the capability to query if all regions of a memory node are
able to do hardware memory encryption to call it when initializing the
nodes. Warn the user if a node has both encryptable and
non-encryptable regions.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
include/linux/memblock.h | 5 ++++
mm/memblock.c | 62 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 67 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 50ad19662a32..00c4f1a20335 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -40,6 +40,7 @@ extern unsigned long long max_possible_pfn;
* via a driver, and never indicated in the firmware-provided memory map as
* system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
* kernel resource tree.
+ * @MEMBLOCK_CRYPTO_CAPABLE: capable of hardware encryption
*/
enum memblock_flags {
MEMBLOCK_NONE = 0x0, /* No special request */
@@ -47,6 +48,7 @@ enum memblock_flags {
MEMBLOCK_MIRROR = 0x2, /* mirrored region */
MEMBLOCK_NOMAP = 0x4, /* don't add to kernel direct mapping */
MEMBLOCK_DRIVER_MANAGED = 0x8, /* always detected via a driver */
+ MEMBLOCK_CRYPTO_CAPABLE = 0x10, /* capable of hardware encryption */
};

/**
@@ -120,6 +122,9 @@ int memblock_physmem_add(phys_addr_t base, phys_addr_t size);
void memblock_trim_memory(phys_addr_t align);
bool memblock_overlaps_region(struct memblock_type *type,
phys_addr_t base, phys_addr_t size);
+bool memblock_node_is_crypto_capable(int nid);
+int memblock_mark_crypto_capable(phys_addr_t base, phys_addr_t size);
+int memblock_clear_crypto_capable(phys_addr_t base, phys_addr_t size);
int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index e4f03a6e8e56..d6399835b155 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -191,6 +191,40 @@ bool __init_memblock memblock_overlaps_region(struct memblock_type *type,
return i < type->cnt;
}

+/**
+ * memblock_node_is_crypto_capable - get if whole node is capable
+ * of encryption
+ * @nid: number of node
+ *
+ * Iterate over all memory memblock_type and find if all regions under
+ * node @nid are capable of hardware encryption.
+ *
+ * Return:
+ * true if every region in @nid is capable of encryption, false
+ * otherwise.
+ */
+bool __init_memblock memblock_node_is_crypto_capable(int nid)
+{
+ struct memblock_region *region;
+ int crypto_capables = 0;
+ int not_crypto_capables = 0;
+
+ for_each_mem_region(region) {
+ if (memblock_get_region_node(region) == nid) {
+ if (region->flags & MEMBLOCK_CRYPTO_CAPABLE)
+ crypto_capables++;
+ else
+ not_crypto_capables++;
+ }
+ }
+
+ if (crypto_capables > 0 && not_crypto_capables > 0)
+ pr_warn("Node %d has %d regions that are encryptable and %d regions that aren't",
+ nid, not_crypto_capables, crypto_capables);
+
+ return crypto_capables > 0 && not_crypto_capables == 0;
+}
+
/**
* __memblock_find_range_bottom_up - find free area utility in bottom-up
* @start: start of candidate range
@@ -891,6 +925,34 @@ static int __init_memblock memblock_setclr_flag(phys_addr_t base,
return 0;
}

+/**
+ * memblock_mark_crypto_capable - Mark memory regions capable of hardware
+ * encryption with flag MEMBLOCK_CRYPTO_CAPABLE.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_mark_crypto_capable(phys_addr_t base,
+ phys_addr_t size)
+{
+ return memblock_setclr_flag(base, size, 1, MEMBLOCK_CRYPTO_CAPABLE);
+}
+
+/**
+ * memblock_clear_crypto_capable - Clear flag MEMBLOCK_CRYPTO for a
+ * specified region.
+ * @base: the base phys addr of the region
+ * @size: the size of the region
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int __init_memblock memblock_clear_crypto_capable(phys_addr_t base,
+ phys_addr_t size)
+{
+ return memblock_setclr_flag(base, size, 0, MEMBLOCK_CRYPTO_CAPABLE);
+}
+
/**
* memblock_mark_hotplug - Mark hotpluggable memory with flag MEMBLOCK_HOTPLUG.
* @base: the base phys addr of the region
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:01 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Add a new member in the pg_data_t struct to tell whether the node
corresponding to that pg_data_t is able to do hardware memory
encryption.

This will be read from sysfs.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
include/linux/mmzone.h | 3 +++
mm/page_alloc.c | 1 +
2 files changed, 4 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..6fd4785f1d05 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -883,6 +883,9 @@ typedef struct pglist_data {
struct task_struct *kcompactd;
bool proactive_compact_trigger;
#endif
+
+ bool crypto_capable;
+
/*
* This is a per-node reserve of pages that are not available
* to userspace allocations.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..147437329ac7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7729,6 +7729,7 @@ static void __init free_area_init_node(int nid)
pgdat->node_id = nid;
pgdat->node_start_pfn = start_pfn;
pgdat->per_cpu_nodestats = NULL;
+ pgdat->crypto_capable = memblock_node_is_crypto_capable(nid);

if (start_pfn != end_pfn) {
pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:07 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
__e820__range_update and e820__range_remove had a very similar flow in
its implementation with a few lines different from each other, the
lines that actually perform the modification over the e820_table. The
similiraties were found in the checks for the different cases on how
each entry intersects with the given range (if it does at all). These
checks were very presice and error prone so it was not a good idea to
have them in both places.

Since I need to add a third one, similar to this two, in this and the
following patches I'll propose a refactor of these functions.

In this patch I introduce:

- A new type e820_entry_updater that will carry three callbacks, those
callbacks will decide WHEN to perform actions over the e820_table and
WHAY actions are going to be performed.

Note that there is a void pointer "data". This pointer will carry
useful information for the callbacks, like the type that we want to
update in e820__range_update or if we want to check the type in
e820__range_remove. Check it out in the next patches where I do the
rework of __e820__range_update and e820__range_remove.

- A new function __e820__handle_range_update that has the flow of the
original two functions to refactor. Together with e820_entry_updater
will perform the desired update on the input table.

On version 6 of this patch some people pointed out that this solution
was over-complicated. Mike Rapoport suggested a another solution [1].

I took a look at that, and although it is indeed simpler it's more
confusing at the same time. I think is manageable to have a single
function to update or remove sections of the table (what Mike did),
but when I added the functionality to also update the crypto_capable
it became really hard to manage.

I think that the approach presented in this patch it's complex but is
easier to read, to extend and to test.

[1] https://git.kernel.org/rppt/h/x86/e820-update-range

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>

--------------------------------------------------

Changes from v7 to v8:

(Some) Callbacks of e820_entry_updater can now be NULL to avoid
defining empty functions

Remove kerneldocs in favor of plain comments just to explain what the
functions do.
---
arch/x86/kernel/e820.c | 127 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 127 insertions(+)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index f267205f2d5a..e0fa3ab755a5 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -459,6 +459,133 @@ static int __init append_e820_table(struct boot_e820_entry *entries, u32 nr_entr
return __append_e820_table(entries, nr_entries);
}

+/**
+ * struct e820_entry_updater - Helper type for
+ * __e820__handle_range_update().
+ * @should_update: Return true if @entry needs to be updated, false
+ * otherwise.
+ * @update: Apply desired actions to an @entry that is inside the
+ * range and satisfies @should_update. Can be set to NULL to avoid empty functions.
+ * @new: Create new entry in the table with information gathered from
+ * @original and @data. Can be set to NULL to avoid empty functions.
+ *
+ * Each function corresponds to an action that
+ * __e820__handle_range_update() does. Callbacks need to cast @data back
+ * to the corresponding type.
+ */
+struct e820_entry_updater {
+ bool (*should_update)(const struct e820_entry *entry, const void *data);
+ void (*update)(struct e820_entry *entry, const void *data);
+ void (*new)(struct e820_table *table, u64 new_start, u64 new_size,
+ const struct e820_entry *original, const void *data);
+};
+
+/*
+ * Helper for __e820__handle_range_update to handle the case where
+ * neither the entry completely covers the range nor the range
+ * completely covers the entry.
+ */
+static u64 __init
+__e820__handle_intersected_range_update(struct e820_table *table,
+ u64 start,
+ u64 size,
+ struct e820_entry *entry,
+ const struct e820_entry_updater *updater,
+ const void *data)
+{
+ u64 end;
+ u64 entry_end = entry->addr + entry->size;
+ u64 inner_start;
+ u64 inner_end;
+ u64 updated_size = 0;
+
+ if (size > (ULLONG_MAX - start))
+ size = ULLONG_MAX - start;
+
+ end = start + size;
+ inner_start = max(start, entry->addr);
+ inner_end = min(end, entry_end);
+
+ /* Range and entry do intersect and... */
+ if (inner_start < inner_end) {
+ /* Entry is on the left */
+ if (entry->addr < inner_start) {
+ /* Resize current entry */
+ entry->size = inner_start - entry->addr;
+ /* Entry is on the right */
+ } else {
+ /* Resize and move current section */
+ entry->addr = inner_end;
+ entry->size = entry_end - inner_end;
+ }
+ if (updater->new != NULL)
+ /* Create new entry with intersected region */
+ updater->new(table, inner_start, inner_end - inner_start, entry, data);
+
+ updated_size += inner_end - inner_start;
+ } /* Else: [start, end) doesn't cover entry */
+
+ return updated_size;
+}
+
+/*
+ * Update the table @table in [@start, @start + @size) doing the
+ * actions given in @updater.
+ */
+static u64 __init
+__e820__handle_range_update(struct e820_table *table,
+ u64 start,
+ u64 size,
+ const struct e820_entry_updater *updater,
+ const void *data)
+{
+ u64 updated_size = 0;
+ u64 end;
+ unsigned int i;
+
+ if (size > (ULLONG_MAX - start))
+ size = ULLONG_MAX - start;
+
+ end = start + size;
+
+ for (i = 0; i < table->nr_entries; i++) {
+ struct e820_entry *entry = &table->entries[i];
+ u64 entry_end = entry->addr + entry->size;
+
+ if (updater->should_update(entry, data)) {
+ /* Range completely covers entry */
+ if (entry->addr >= start && entry_end <= end) {
+ updated_size += entry->size;
+ if (updater->update != NULL)
+ updater->update(entry, data);
+ /* Entry completely covers range */
+ } else if (start > entry->addr && end < entry_end) {
+ /* Resize current entry */
+ entry->size = start - entry->addr;
+
+ if (updater->new != NULL)
+ /* Create new entry with intersection region */
+ updater->new(table, start, size, entry, data);
+
+ /*
+ * Create a new entry for the leftover
+ * of the current entry
+ */
+ __e820__range_add(table, end, entry_end - end,
+ entry->type);
+
+ updated_size += size;
+ } else {
+ updated_size +=
+ __e820__handle_intersected_range_update(table, start, size,
+ entry, updater, data);
+ }
+ }
+ }
+
+ return updated_size;
+}
+
static u64 __init
__e820__range_update(struct e820_table *table, u64 start, u64 size, enum e820_type old_type, enum e820_type new_type)
{
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:14 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Refactor __e820__range_update with the introduction of
e820_type_updater_data, indented to be used as the void pointer in the
e820_entry_updater callbacks, and the implementation of the callbacks
to perform the update of the type in a range of a e820_table.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
arch/x86/kernel/e820.c | 119 +++++++++++++++++++++--------------------
1 file changed, 62 insertions(+), 57 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index e0fa3ab755a5..36a22c0a2199 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -586,80 +586,85 @@ __e820__handle_range_update(struct e820_table *table,
return updated_size;
}

-static u64 __init
-__e820__range_update(struct e820_table *table, u64 start, u64 size, enum e820_type old_type, enum e820_type new_type)
-{
- u64 end;
- unsigned int i;
- u64 real_updated_size = 0;
-
- BUG_ON(old_type == new_type);
-
- if (size > (ULLONG_MAX - start))
- size = ULLONG_MAX - start;
+/*
+ * Type helper for the e820_entry_updater callbacks.
+ */
+struct e820_type_updater_data {
+ enum e820_type old_type;
+ enum e820_type new_type;
+};

- end = start + size;
- printk(KERN_DEBUG "e820: update [mem %#010Lx-%#010Lx] ", start, end - 1);
- e820_print_type(old_type);
- pr_cont(" ==> ");
- e820_print_type(new_type);
- pr_cont("\n");
+static bool __init type_updater__should_update(const struct e820_entry *entry,
+ const void *data)
+{
+ const struct e820_type_updater_data *type_updater_data = data;

- for (i = 0; i < table->nr_entries; i++) {
- struct e820_entry *entry = &table->entries[i];
- u64 final_start, final_end;
- u64 entry_end;
+ return entry->type == type_updater_data->old_type;
+}

- if (entry->type != old_type)
- continue;
+static void __init type_updater__update(struct e820_entry *entry,
+ const void *data)
+{
+ const struct e820_type_updater_data *type_updater_data = data;

- entry_end = entry->addr + entry->size;
+ entry->type = type_updater_data->new_type;
+}

- /* Completely covered by new range? */
- if (entry->addr >= start && entry_end <= end) {
- entry->type = new_type;
- real_updated_size += entry->size;
- continue;
- }
+static void __init type_updater__new(struct e820_table *table, u64 new_start,
+ u64 new_size,
+ const struct e820_entry *original,
+ const void *data)
+{
+ const struct e820_type_updater_data *type_updater_data = data;

- /* New range is completely covered? */
- if (entry->addr < start && entry_end > end) {
- __e820__range_add(table, start, size, new_type);
- __e820__range_add(table, end, entry_end - end, entry->type);
- entry->size = start - entry->addr;
- real_updated_size += size;
- continue;
- }
+ __e820__range_add(table, new_start, new_size,
+ type_updater_data->new_type);
+}

- /* Partially covered: */
- final_start = max(start, entry->addr);
- final_end = min(end, entry_end);
- if (final_start >= final_end)
- continue;
+static u64 __init __e820__range_update(struct e820_table *table, u64 start,
+ u64 size, enum e820_type old_type,
+ enum e820_type new_type)
+{
+ struct e820_entry_updater updater = {
+ .should_update = type_updater__should_update,
+ .update = type_updater__update,
+ .new = type_updater__new
+ };

- __e820__range_add(table, final_start, final_end - final_start, new_type);
+ struct e820_type_updater_data data = {
+ .old_type = old_type,
+ .new_type = new_type
+ };

- real_updated_size += final_end - final_start;
+ BUG_ON(old_type == new_type);

- /*
- * Left range could be head or tail, so need to update
- * its size first:
- */
- entry->size -= final_end - final_start;
- if (entry->addr < final_start)
- continue;
+ printk(KERN_DEBUG "e820: update [mem %#018Lx-%#018Lx] ", start,
+ start + size - 1);
+ e820_print_type(old_type);
+ pr_cont(" ==> ");
+ e820_print_type(new_type);
+ pr_cont("\n");

- entry->addr = final_end;
- }
- return real_updated_size;
+ return __e820__handle_range_update(table, start, size, &updater, &data);
}

-u64 __init e820__range_update(u64 start, u64 size, enum e820_type old_type, enum e820_type new_type)
+/*
+ * Update type of addresses in [@start, @start + @size) from @old_type
+ * to @new_type in e820_table.
+ */
+u64 __init e820__range_update(u64 start, u64 size, enum e820_type old_type,
+ enum e820_type new_type)
{
return __e820__range_update(e820_table, start, size, old_type, new_type);
}

-static u64 __init e820__range_update_kexec(u64 start, u64 size, enum e820_type old_type, enum e820_type new_type)
+/*
+ * Update type of addresses in [@start, @start + @size) from @old_type
+ * to @new_type in e820_table_kexec.
+ */
+static u64 __init e820__range_update_kexec(u64 start, u64 size,
+ enum e820_type old_type,
+ enum e820_type new_type)
{
return __e820__range_update(e820_table_kexec, start, size, old_type, new_type);
}
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:21 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Refactor e820__range_remove with the introduction of
e820_remover_data, indented to be used as the void pointer in the
e820_entry_updater callbacks, and the implementation of the callbacks
remove a range in the e820_table.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
arch/x86/kernel/e820.c | 94 ++++++++++++++++++------------------------
1 file changed, 41 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 36a22c0a2199..0e5aa13ebdb8 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -669,66 +669,54 @@ static u64 __init e820__range_update_kexec(u64 start, u64 size,
return __e820__range_update(e820_table_kexec, start, size, old_type, new_type);
}

-/* Remove a range of memory from the E820 table: */
-u64 __init e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type)
-{
- int i;
- u64 end;
- u64 real_removed_size = 0;
-
- if (size > (ULLONG_MAX - start))
- size = ULLONG_MAX - start;
-
- end = start + size;
- printk(KERN_DEBUG "e820: remove [mem %#010Lx-%#010Lx] ", start, end - 1);
- if (check_type)
- e820_print_type(old_type);
- pr_cont("\n");
-
- for (i = 0; i < e820_table->nr_entries; i++) {
- struct e820_entry *entry = &e820_table->entries[i];
- u64 final_start, final_end;
- u64 entry_end;
-
- if (check_type && entry->type != old_type)
- continue;
+/*
+ * Type helper for the e820_entry_updater callbacks.
+ */
+struct e820_remover_data {
+ enum e820_type old_type;
+ bool check_type;
+};

- entry_end = entry->addr + entry->size;
+static bool __init remover__should_update(const struct e820_entry *entry,
+ const void *data)
+{
+ const struct e820_remover_data *remover_data = data;

- /* Completely covered? */
- if (entry->addr >= start && entry_end <= end) {
- real_removed_size += entry->size;
- memset(entry, 0, sizeof(*entry));
- continue;
- }
+ return !remover_data->check_type ||
+ entry->type == remover_data->old_type;
+}

- /* Is the new range completely covered? */
- if (entry->addr < start && entry_end > end) {
- e820__range_add(end, entry_end - end, entry->type);
- entry->size = start - entry->addr;
- real_removed_size += size;
- continue;
- }
+static void __init remover__update(struct e820_entry *entry, const void *data)
+{
+ memset(entry, 0, sizeof(*entry));
+}

- /* Partially covered: */
- final_start = max(start, entry->addr);
- final_end = min(end, entry_end);
- if (final_start >= final_end)
- continue;
+/*
+ * Remove [@start, @start + @size) from e820_table. If @check_type is
+ * true remove only entries with type @old_type.
+ */
+u64 __init e820__range_remove(u64 start, u64 size, enum e820_type old_type,
+ bool check_type)
+{
+ struct e820_entry_updater updater = {
+ .should_update = remover__should_update,
+ .update = remover__update,
+ .new = NULL
+ };

- real_removed_size += final_end - final_start;
+ struct e820_remover_data data = {
+ .check_type = check_type,
+ .old_type = old_type
+ };

- /*
- * Left range could be head or tail, so need to update
- * the size first:
- */
- entry->size -= final_end - final_start;
- if (entry->addr < final_start)
- continue;
+ printk(KERN_DEBUG "e820: remove [mem %#018Lx-%#018Lx] ", start,
+ start + size - 1);
+ if (check_type)
+ e820_print_type(old_type);
+ pr_cont("\n");

- entry->addr = final_end;
- }
- return real_removed_size;
+ return __e820__handle_range_update(e820_table, start, size, &updater,
+ &data);
}

void __init e820__update_table_print(void)
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:29 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Add a new enum for crypto capabilities. I choosed an enum instead of a
boolean for more visibility in the code and because maybe in the
future we would like to track from where the cryptographic
capabilities comes (in this case, the EFI memmap).

Add a new member in e820_entry to hold this new enum.

Add a new function e820__range_set_crypto_capable to mark all the
entries in a range of addresses as encryptable. This will be called
when initializing EFI.

Change e820__update_table to handle merging and overlap problems
taking into account crypto_capable.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
arch/x86/include/asm/e820/api.h | 1 +
arch/x86/include/asm/e820/types.h | 12 +++--
arch/x86/kernel/e820.c | 88 +++++++++++++++++++++++++++++--
3 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index e8f58ddd06d9..4b3b01fafdd1 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -17,6 +17,7 @@ extern bool e820__mapped_all(u64 start, u64 end, enum e820_type type);
extern void e820__range_add (u64 start, u64 size, enum e820_type type);
extern u64 e820__range_update(u64 start, u64 size, enum e820_type old_type, enum e820_type new_type);
extern u64 e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type);
+extern u64 e820__range_set_crypto_capable(u64 start, u64 size);

extern void e820__print_table(char *who);
extern int e820__update_table(struct e820_table *table);
diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
index 314f75d886d0..aef03c665f5e 100644
--- a/arch/x86/include/asm/e820/types.h
+++ b/arch/x86/include/asm/e820/types.h
@@ -46,6 +46,11 @@ enum e820_type {
E820_TYPE_RESERVED_KERN = 128,
};

+enum e820_crypto_capabilities {
+ E820_NOT_CRYPTO_CAPABLE = 0,
+ E820_CRYPTO_CAPABLE = 1,
+};
+
/*
* A single E820 map entry, describing a memory range of [addr...addr+size-1],
* of 'type' memory type:
@@ -53,9 +58,10 @@ enum e820_type {
* (We pack it because there can be thousands of them on large systems.)
*/
struct e820_entry {
- u64 addr;
- u64 size;
- enum e820_type type;
+ u64 addr;
+ u64 size;
+ enum e820_type type;
+ enum e820_crypto_capabilities crypto_capable;
} __attribute__((packed));

/*
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 0e5aa13ebdb8..dade59758b9f 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -163,7 +163,9 @@ int e820__get_entry_type(u64 start, u64 end)
/*
* Add a memory region to the kernel E820 map.
*/
-static void __init __e820__range_add(struct e820_table *table, u64 start, u64 size, enum e820_type type)
+static void __init __e820__range_add(struct e820_table *table, u64 start,
+ u64 size, enum e820_type type,
+ enum e820_crypto_capabilities crypto_capable)
{
int x = table->nr_entries;

@@ -176,12 +178,13 @@ static void __init __e820__range_add(struct e820_table *table, u64 start, u64 si
table->entries[x].addr = start;
table->entries[x].size = size;
table->entries[x].type = type;
+ table->entries[x].crypto_capable = crypto_capable;
table->nr_entries++;
}

void __init e820__range_add(u64 start, u64 size, enum e820_type type)
{
- __e820__range_add(e820_table, start, size, type);
+ __e820__range_add(e820_table, start, size, type, E820_NOT_CRYPTO_CAPABLE);
}

static void __init e820_print_type(enum e820_type type)
@@ -211,6 +214,8 @@ void __init e820__print_table(char *who)
e820_table->entries[i].addr + e820_table->entries[i].size - 1);

e820_print_type(e820_table->entries[i].type);
+ if (e820_table->entries[i].crypto_capable == E820_CRYPTO_CAPABLE)
+ pr_cont("; crypto-capable");
pr_cont("\n");
}
}
@@ -327,6 +332,7 @@ int __init e820__update_table(struct e820_table *table)
unsigned long long last_addr;
u32 new_nr_entries, overlap_entries;
u32 i, chg_idx, chg_nr;
+ enum e820_crypto_capabilities current_crypto, last_crypto;

/* If there's only one memory region, don't bother: */
if (table->nr_entries < 2)
@@ -367,6 +373,7 @@ int __init e820__update_table(struct e820_table *table)
new_nr_entries = 0; /* Index for creating new map entries */
last_type = 0; /* Start with undefined memory type */
last_addr = 0; /* Start with 0 as last starting address */
+ last_crypto = E820_NOT_CRYPTO_CAPABLE;

/* Loop through change-points, determining effect on the new map: */
for (chg_idx = 0; chg_idx < chg_nr; chg_idx++) {
@@ -388,13 +395,19 @@ int __init e820__update_table(struct e820_table *table)
* 1=usable, 2,3,4,4+=unusable)
*/
current_type = 0;
+ current_crypto = E820_CRYPTO_CAPABLE;
for (i = 0; i < overlap_entries; i++) {
+ if (overlap_list[i]->crypto_capable < current_crypto)
+ current_crypto = overlap_list[i]->crypto_capable;
+
if (overlap_list[i]->type > current_type)
current_type = overlap_list[i]->type;
}

/* Continue building up new map based on this information: */
- if (current_type != last_type || e820_nomerge(current_type)) {
+ if (current_type != last_type ||
+ current_crypto != last_crypto ||
+ e820_nomerge(current_type)) {
if (last_type != 0) {
new_entries[new_nr_entries].size = change_point[chg_idx]->addr - last_addr;
/* Move forward only if the new size was non-zero: */
@@ -406,9 +419,12 @@ int __init e820__update_table(struct e820_table *table)
if (current_type != 0) {
new_entries[new_nr_entries].addr = change_point[chg_idx]->addr;
new_entries[new_nr_entries].type = current_type;
+ new_entries[new_nr_entries].crypto_capable = current_crypto;
+
last_addr = change_point[chg_idx]->addr;
}
last_type = current_type;
+ last_crypto = current_crypto;
}
}

@@ -572,7 +588,8 @@ __e820__handle_range_update(struct e820_table *table,
* of the current entry
*/
__e820__range_add(table, end, entry_end - end,
- entry->type);
+ entry->type,
+ entry->crypto_capable);

updated_size += size;
} else {
@@ -618,7 +635,8 @@ static void __init type_updater__new(struct e820_table *table, u64 new_start,
const struct e820_type_updater_data *type_updater_data = data;

__e820__range_add(table, new_start, new_size,
- type_updater_data->new_type);
+ type_updater_data->new_type,
+ original->crypto_capable);
}

static u64 __init __e820__range_update(struct e820_table *table, u64 start,
@@ -719,6 +737,64 @@ u64 __init e820__range_remove(u64 start, u64 size, enum e820_type old_type,
&data);
}

+static bool __init crypto_updater__should_update(const struct e820_entry *entry,
+ const void *data)
+{
+ const enum e820_crypto_capabilities *crypto_capable = data;
+
+ return *crypto_capable != entry->crypto_capable;
+}
+
+static void __init crypto_updater__update(struct e820_entry *entry,
+ const void *data)
+{
+ const enum e820_crypto_capabilities *crypto_capable = data;
+
+ entry->crypto_capable = *crypto_capable;
+}
+
+static void __init crypto_updater__new(struct e820_table *table, u64 new_start,
+ u64 new_size,
+ const struct e820_entry *original,
+ const void *data)
+{
+ const enum e820_crypto_capabilities *crypto_capable = data;
+
+ __e820__range_add(table, new_start, new_size, original->type, *crypto_capable);
+}
+
+static u64 __init
+__e820__range_update_crypto(struct e820_table *table, u64 start, u64 size,
+ enum e820_crypto_capabilities crypto_capable)
+{
+ struct e820_entry_updater updater = {
+ .should_update = crypto_updater__should_update,
+ .update = crypto_updater__update,
+ .new = crypto_updater__new
+ };
+
+ printk(KERN_DEBUG "e820: crypto update [mem %#018Lx-%#018Lx]", start,
+ start + size - 1);
+ pr_cont(" ==> ");
+ if (crypto_capable == E820_CRYPTO_CAPABLE)
+ pr_cont("crypto capable");
+ else
+ pr_cont("not crypto capable");
+ pr_cont("\n");
+
+ return __e820__handle_range_update(table, start, size, &updater,
+ &crypto_capable);
+}
+
+/*
+ * Set %E820_CRYPTO_CAPABLE to [@start, @start + @size) in e820_table.
+ */
+u64 __init e820__range_set_crypto_capable(u64 start, u64 size)
+{
+ return __e820__range_update_crypto(e820_table, start, size,
+ E820_CRYPTO_CAPABLE);
+}
+
void __init e820__update_table_print(void)
{
if (e820__update_table(e820_table))
@@ -1461,6 +1537,8 @@ void __init e820__memblock_setup(void)
continue;

memblock_add(entry->addr, entry->size);
+ if (entry->crypto_capable == E820_CRYPTO_CAPABLE)
+ memblock_mark_crypto_capable(entry->addr, entry->size);
}

/* Throw away partial pages: */
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:36 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Add KUnit tests for the e820_range_* functions.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
arch/x86/Kconfig.debug | 10 ++
arch/x86/kernel/e820.c | 5 +
arch/x86/kernel/e820_test.c | 249 ++++++++++++++++++++++++++++++++++++
3 files changed, 264 insertions(+)
create mode 100644 arch/x86/kernel/e820_test.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d872a7522e55..b5040d345fb4 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -225,6 +225,16 @@ config PUNIT_ATOM_DEBUG
The current power state can be read from
/sys/kernel/debug/punit_atom/dev_power_state

+config E820_KUNIT_TEST
+ tristate "Tests for E820" if !KUNIT_ALL_TESTS
+ depends on KUNIT=y
+ default KUNIT_ALL_TESTS
+ help
+ This option enables unit tests for the e820.c code. It
+ should be enabled only in development environments.
+
+ If unsure, say N.
+
choice
prompt "Choose kernel unwinder"
default UNWINDER_ORC if X86_64
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index dade59758b9f..a6ced3e306dd 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1546,3 +1546,8 @@ void __init e820__memblock_setup(void)

memblock_dump_all();
}
+
+#ifdef CONFIG_E820_KUNIT_TEST
+/* Let e820_test have access the static functions in this file */
+#include "e820_test.c"
+#endif
diff --git a/arch/x86/kernel/e820_test.c b/arch/x86/kernel/e820_test.c
new file mode 100644
index 000000000000..6b28ea131380
--- /dev/null
+++ b/arch/x86/kernel/e820_test.c
@@ -0,0 +1,249 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <kunit/test.h>
+
+#include <asm/e820/api.h>
+#include <asm/setup.h>
+
+#define KUNIT_EXPECT_E820_ENTRY_EQ(_test, _entry, _addr, _size, _type, \
+ _crypto_capable) \
+ do { \
+ KUNIT_EXPECT_EQ((_test), (_entry).addr, (_addr)); \
+ KUNIT_EXPECT_EQ((_test), (_entry).size, (_size)); \
+ KUNIT_EXPECT_EQ((_test), (_entry).type, (_type)); \
+ KUNIT_EXPECT_EQ((_test), (_entry).crypto_capable, \
+ (_crypto_capable)); \
+ } while (0)
+
+struct e820_table test_table __initdata;
+
+static void __init test_e820_range_add(struct kunit *test)
+{
+ u32 full = ARRAY_SIZE(test_table.entries);
+ /* Add last entry. */
+ test_table.nr_entries = full - 1;
+ __e820__range_add(&test_table, 0, 15, 0, 0);
+ KUNIT_EXPECT_EQ(test, test_table.nr_entries, full);
+ /* Skip new entry when full. */
+ __e820__range_add(&test_table, 0, 15, 0, 0);
+ KUNIT_EXPECT_EQ(test, test_table.nr_entries, full);
+}
+
+static void __init test_e820_range_update(struct kunit *test)
+{
+ u64 entry_size = 15;
+ u64 updated_size = 0;
+ /* Initialize table */
+ test_table.nr_entries = 0;
+ __e820__range_add(&test_table, 0, entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+ __e820__range_add(&test_table, entry_size, entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+ __e820__range_add(&test_table, entry_size * 2, entry_size,
+ E820_TYPE_ACPI, E820_NOT_CRYPTO_CAPABLE);
+
+ updated_size = __e820__range_update(&test_table, 0, entry_size * 2,
+ E820_TYPE_RAM, E820_TYPE_RESERVED);
+
+ /* The first 2 regions were updated */
+ KUNIT_EXPECT_EQ(test, updated_size, entry_size * 2);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[0], 0, entry_size,
+ E820_TYPE_RESERVED, E820_NOT_CRYPTO_CAPABLE);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[1], entry_size,
+ entry_size, E820_TYPE_RESERVED,
+ E820_NOT_CRYPTO_CAPABLE);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[2], entry_size * 2,
+ entry_size, E820_TYPE_ACPI,
+ E820_NOT_CRYPTO_CAPABLE);
+
+ updated_size = __e820__range_update(&test_table, 0, entry_size * 3,
+ E820_TYPE_RESERVED, E820_TYPE_RAM);
+
+ /*
+ * Only the first 2 regions were updated,
+ * since E820_TYPE_ACPI > E820_TYPE_RESERVED
+ */
+ KUNIT_EXPECT_EQ(test, updated_size, entry_size * 2);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[0], 0, entry_size,
+ E820_TYPE_RAM, E820_NOT_CRYPTO_CAPABLE);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[1], entry_size,
+ entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[2], entry_size * 2,
+ entry_size, E820_TYPE_ACPI,
+ E820_NOT_CRYPTO_CAPABLE);
+}
+
+static void __init test_e820_range_remove(struct kunit *test)
+{
+ u64 entry_size = 15;
+ u64 removed_size = 0;
+
+ struct e820_entry_updater updater = { .should_update =
+ remover__should_update,
+ .update = remover__update,
+ .new = NULL };
+
+ struct e820_remover_data data = { .check_type = true,
+ .old_type = E820_TYPE_RAM };
+
+ /* Initialize table */
+ test_table.nr_entries = 0;
+ __e820__range_add(&test_table, 0, entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+ __e820__range_add(&test_table, entry_size, entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+ __e820__range_add(&test_table, entry_size * 2, entry_size,
+ E820_TYPE_ACPI, E820_NOT_CRYPTO_CAPABLE);
+
+ /*
+ * Need to use __e820__handle_range_update because
+ * e820__range_remove doesn't ask for the table
+ */
+ removed_size = __e820__handle_range_update(&test_table,
+ 0, entry_size * 2,
+ &updater, &data);
+
+ /* The first two regions were removed */
+ KUNIT_EXPECT_EQ(test, removed_size, entry_size * 2);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[0], 0, 0, 0, 0);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[1], 0, 0, 0, 0);
+
+ removed_size = __e820__handle_range_update(&test_table,
+ 0, entry_size * 3,
+ &updater, &data);
+
+ /* Nothing was removed, since nothing matched the target type */
+ KUNIT_EXPECT_EQ(test, removed_size, 0);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[0], 0, 0, 0, 0);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[1], 0, 0, 0, 0);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[2], entry_size * 2,
+ entry_size, E820_TYPE_ACPI,
+ E820_NOT_CRYPTO_CAPABLE);
+}
+
+static void __init test_e820_range_crypto_update(struct kunit *test)
+{
+ u64 entry_size = 15;
+ u64 updated_size = 0;
+ /* Initialize table */
+ test_table.nr_entries = 0;
+ __e820__range_add(&test_table, 0, entry_size, E820_TYPE_RAM,
+ E820_CRYPTO_CAPABLE);
+ __e820__range_add(&test_table, entry_size, entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+ __e820__range_add(&test_table, entry_size * 2, entry_size,
+ E820_TYPE_RAM, E820_CRYPTO_CAPABLE);
+
+ updated_size = __e820__range_update_crypto(&test_table,
+ 0, entry_size * 3,
+ E820_CRYPTO_CAPABLE);
+
+ /* Only the region in the middle was updated */
+ KUNIT_EXPECT_EQ(test, updated_size, entry_size);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[0], 0, entry_size,
+ E820_TYPE_RAM, E820_CRYPTO_CAPABLE);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[1], entry_size,
+ entry_size, E820_TYPE_RAM,
+ E820_CRYPTO_CAPABLE);
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[2], entry_size * 2,
+ entry_size, E820_TYPE_RAM,
+ E820_CRYPTO_CAPABLE);
+}
+
+static void __init test_e820_handle_range_update_intersection(struct kunit *test)
+{
+ struct e820_entry_updater updater = {
+ .should_update = type_updater__should_update,
+ .update = type_updater__update,
+ .new = type_updater__new
+ };
+
+ struct e820_type_updater_data data = { .old_type = E820_TYPE_RAM,
+ .new_type = E820_TYPE_RESERVED };
+
+ u64 entry_size = 15;
+ u64 intersection_size = 2;
+ u64 updated_size = 0;
+ /* Initialize table */
+ test_table.nr_entries = 0;
+ __e820__range_add(&test_table, 0, entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+
+ updated_size =
+ __e820__handle_range_update(&test_table, 0,
+ entry_size - intersection_size,
+ &updater, &data);
+
+ KUNIT_EXPECT_EQ(test, updated_size, entry_size - intersection_size);
+
+ /* There is a new entry */
+ KUNIT_EXPECT_EQ(test, test_table.nr_entries, intersection_size);
+
+ /* The original entry now is moved */
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[0],
+ entry_size - intersection_size,
+ intersection_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+
+ /* The new entry has the correct values */
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[1], 0,
+ entry_size - intersection_size,
+ E820_TYPE_RESERVED, E820_NOT_CRYPTO_CAPABLE);
+}
+
+static void __init test_e820_handle_range_update_inside(struct kunit *test)
+{
+ struct e820_entry_updater updater = {
+ .should_update = type_updater__should_update,
+ .update = type_updater__update,
+ .new = type_updater__new
+ };
+
+ struct e820_type_updater_data data = { .old_type = E820_TYPE_RAM,
+ .new_type = E820_TYPE_RESERVED };
+
+ u64 entry_size = 15;
+ u64 updated_size = 0;
+ /* Initialize table */
+ test_table.nr_entries = 0;
+ __e820__range_add(&test_table, 0, entry_size, E820_TYPE_RAM,
+ E820_NOT_CRYPTO_CAPABLE);
+
+ updated_size = __e820__handle_range_update(&test_table, 5,
+ entry_size - 10,
+ &updater, &data);
+
+ KUNIT_EXPECT_EQ(test, updated_size, entry_size - 10);
+
+ /* There are two new entrie */
+ KUNIT_EXPECT_EQ(test, test_table.nr_entries, 3);
+
+ /* The original entry now shrunk */
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[0], 0, 5,
+ E820_TYPE_RAM, E820_NOT_CRYPTO_CAPABLE);
+
+ /* The new entries have the correct values */
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[1], 5,
+ entry_size - 10, E820_TYPE_RESERVED,
+ E820_NOT_CRYPTO_CAPABLE);
+ /* Left over of the original region */
+ KUNIT_EXPECT_E820_ENTRY_EQ(test, test_table.entries[2], entry_size - 5,
+ 5, E820_TYPE_RAM, E820_NOT_CRYPTO_CAPABLE);
+}
+
+static struct kunit_case e820_test_cases[] __initdata = {
+ KUNIT_CASE(test_e820_range_add),
+ KUNIT_CASE(test_e820_range_update),
+ KUNIT_CASE(test_e820_range_remove),
+ KUNIT_CASE(test_e820_range_crypto_update),
+ KUNIT_CASE(test_e820_handle_range_update_intersection),
+ KUNIT_CASE(test_e820_handle_range_update_inside),
+ {}
+};
+
+static struct kunit_suite e820_test_suite __initdata = {
+ .name = "e820",
+ .test_cases = e820_test_cases,
+};
+
+kunit_test_init_section_suite(e820_test_suite);
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:42 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Add a function to iterate over the EFI Memory Map and mark the regions
tagged with EFI_MEMORY_CPU_CRYPTO in the e820_table; and call it from
efi_init if add_efi_memmap is disabled.

Also modify do_add_efi_memmap to mark the regions there.

If add_efi_memmap is false, also check that the e820_table has enough
size to (possibly) store also the EFI memmap.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
arch/x86/platform/efi/efi.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 1591d67e0bcd..397d5e54d65e 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -187,6 +187,8 @@ static void __init do_add_efi_memmap(void)
}

e820__range_add(start, size, e820_type);
+ if (md->attribute & EFI_MEMORY_CPU_CRYPTO)
+ e820__range_set_crypto_capable(start, size);
}
e820__update_table(e820_table);
}
@@ -444,6 +446,34 @@ static int __init efi_config_init(const efi_config_table_type_t *arch_tables)
return ret;
}

+static void __init efi_mark_e820_regions_as_crypto_capable(void)
+{
+ efi_memory_desc_t *md;
+
+ /*
+ * Calling e820__range_set_crypto_capable several times
+ * creates a bunch of entries in the E820 table. They probably
+ * will get merged when calling update_table but we need the
+ * space there anyway
+ */
+ if (efi.memmap.nr_map + e820_table->nr_entries >= E820_MAX_ENTRIES) {
+ pr_err_once("E820 table is not large enough to fit EFI memmap; not marking entries as crypto capable\n");
+ return;
+ }
+
+ for_each_efi_memory_desc(md) {
+ if (md->attribute & EFI_MEMORY_CPU_CRYPTO)
+ e820__range_set_crypto_capable(md->phys_addr,
+ md->num_pages << EFI_PAGE_SHIFT);
+ }
+
+ /*
+ * We added and modified regions so it's good to update the
+ * table to merge/sort
+ */
+ e820__update_table(e820_table);
+}
+
void __init efi_init(void)
{
if (IS_ENABLED(CONFIG_X86_32) &&
@@ -497,6 +527,13 @@ void __init efi_init(void)
set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
efi_clean_memmap();

+ /*
+ * If add_efi_memmap then there is no need to mark the regions
+ * again
+ */
+ if (!add_efi_memmap)
+ efi_mark_e820_regions_as_crypto_capable();
+
if (efi_enabled(EFI_DBG))
efi_print_memmap();
}
--
2.30.2

Martin Fernandez

unread,
Jul 4, 2022, 9:59:48 AM7/4/22
to linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org, Martin Fernandez
Show in each node in sysfs if its memory is able to do be encrypted by
the CPU; on EFI systems: if all its memory is marked with
EFI_MEMORY_CPU_CRYPTO in the EFI memory map.

Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
---
Documentation/ABI/testing/sysfs-devices-node | 10 ++++++++++
drivers/base/node.c | 10 ++++++++++
2 files changed, 20 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-devices-node

diff --git a/Documentation/ABI/testing/sysfs-devices-node b/Documentation/ABI/testing/sysfs-devices-node
new file mode 100644
index 000000000000..0e95420bd7c5
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-node
@@ -0,0 +1,10 @@
+What: /sys/devices/system/node/nodeX/crypto_capable
+Date: April 2022
+Contact: Martin Fernandez <martin.f...@eclypsium.com>
+Users: fwupd (https://fwupd.org)
+Description:
+ This value is 1 if all system memory in this node is
+ capable of being protected with the CPU's memory
+ cryptographic capabilities. It is 0 otherwise.
+ On EFI systems the node will be marked with
+ EFI_MEMORY_CPU_CRYPTO.
\ No newline at end of file
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ac6376ef7a1..f081fa48c8e6 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -560,11 +560,21 @@ static ssize_t node_read_distance(struct device *dev,
}
static DEVICE_ATTR(distance, 0444, node_read_distance, NULL);

+static ssize_t crypto_capable_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct pglist_data *pgdat = NODE_DATA(dev->id);
+
+ return sysfs_emit(buf, "%d\n", pgdat->crypto_capable);
+}
+static DEVICE_ATTR_RO(crypto_capable);
+
static struct attribute *node_dev_attrs[] = {
&dev_attr_meminfo.attr,
&dev_attr_numastat.attr,
&dev_attr_distance.attr,
&dev_attr_vmstat.attr,
+ &dev_attr_crypto_capable.attr,
NULL
};

--
2.30.2

Greg KH

unread,
Jul 4, 2022, 10:34:09 AM7/4/22
to Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
Where will such a node be "marked"? I do not understand this last
sentence, sorry, can you please reword this?

And why is EFI an issue here at all?

thanks,

greg k-h

David Gow

unread,
Jul 4, 2022, 10:04:36 PM7/4/22
to Martin Fernandez, Linux Kernel Mailing List, linu...@vger.kernel.org, platform-...@vger.kernel.org, Linux Memory Management List, KUnit Development, open list:KERNEL SELFTEST FRAMEWORK, tg...@linutronix.de, Ingo Molnar, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, Greg Kroah-Hartman, raf...@kernel.org, rp...@kernel.org, Andrew Morton, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, Kees Cook
On Mon, Jul 4, 2022 at 9:59 PM 'Martin Fernandez' via KUnit
Development <kuni...@googlegroups.com> wrote:
>
> Add KUnit tests for the e820_range_* functions.
>
> Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
> ---

This looks good to me from a KUnit point of view. I've tested it on
both 32- and 64- bit x86 under qemu with the following:
./tools/testing/kunit/kunit.py run --arch=i386 'e820'
./tools/testing/kunit/kunit.py run --arch=x86_64 'e820'

Two notes inline below:
- An indentation error in the Kconfig entry, which stops it from compiling.
- Some minor pontificating about how KUnit wants to name macros in
general. (No action required: just making a note that this is probably
okay.)

With the indentation issue fixed, this is:

Reviewed-by: David Gow <davi...@google.com>

Cheers,
-- David

> arch/x86/Kconfig.debug | 10 ++
> arch/x86/kernel/e820.c | 5 +
> arch/x86/kernel/e820_test.c | 249 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 264 insertions(+)
> create mode 100644 arch/x86/kernel/e820_test.c
>
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index d872a7522e55..b5040d345fb4 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -225,6 +225,16 @@ config PUNIT_ATOM_DEBUG
> The current power state can be read from
> /sys/kernel/debug/punit_atom/dev_power_state
>
> +config E820_KUNIT_TEST
> + tristate "Tests for E820" if !KUNIT_ALL_TESTS
> + depends on KUNIT=y
> + default KUNIT_ALL_TESTS
> + help
> + This option enables unit tests for the e820.c code. It
> + should be enabled only in development environments.
> +
> + If unsure, say N.

The indentation here seems to be one space off, leading to errors building it:

arch/x86/Kconfig.debug:236: syntax error
arch/x86/Kconfig.debug:235:warning: ignoring unsupported character ','
arch/x86/Kconfig.debug:235:warning: ignoring unsupported character '.'
arch/x86/Kconfig.debug:235: unknown statement "If"
make[2]: *** [../scripts/kconfig/Makefile:77: olddefconfig] Error 1
I'm not 100% sure we ever came to a decision about tests naming their
own expect macros KUNIT_EXPECT_*. I know KASAN is doing it, though the
thought there was that other tests might have sensible reasons to
expect given memory accesses, so it might not be limited to the one
test.

Personally, I don't mind it, particularly since it's obvious that this
is specific to the e820 test.
> --
> You received this message because you are subscribed to the Google Groups "KUnit Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kunit-dev+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kunit-dev/20220704135833.1496303-8-martin.fernandez%40eclypsium.com.

Martin Fernandez

unread,
Jul 5, 2022, 1:24:19 PM7/5/22
to David Gow, Linux Kernel Mailing List, linu...@vger.kernel.org, platform-...@vger.kernel.org, Linux Memory Management List, KUnit Development, open list:KERNEL SELFTEST FRAMEWORK, tg...@linutronix.de, Ingo Molnar, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, Greg Kroah-Hartman, raf...@kernel.org, rp...@kernel.org, Andrew Morton, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, Kees Cook
On 7/4/22, David Gow <davi...@google.com> wrote:
> On Mon, Jul 4, 2022 at 9:59 PM 'Martin Fernandez' via KUnit
> Development <kuni...@googlegroups.com> wrote:
>>
>> Add KUnit tests for the e820_range_* functions.
>>
>> Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
>> ---
>
> This looks good to me from a KUnit point of view. I've tested it on
> both 32- and 64- bit x86 under qemu with the following:
> ./tools/testing/kunit/kunit.py run --arch=i386 'e820'
> ./tools/testing/kunit/kunit.py run --arch=x86_64 'e820'

Yes, that's how I ran it. The new qemu executions are great by the way :)
I don't know what happened, I saw checkpatch warning me about the a
help description but since it looked good to me I didn't mind. Now I
see the actual error.
That's true, I didn't think about, because as you said the naming is
quite obviuos, but I get that it could be an issue.

Martin Fernandez

unread,
Jul 5, 2022, 1:35:20 PM7/5/22
to Greg KH, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
What I meant is that if all the memory regions in a given node are
flagged with EFI_MEMORY_CPU_CRYPTO then that file will hold a 1.

Maybe it's a little confusing if you don't know what
EFI_MEMORY_CPU_CRYPTO is.

> And why is EFI an issue here at all?

Checking for EFI_MEMORY_CPU_CRYPTO is the way to know if a memory
region is able to be encrypted by the CPU on EFI platforms. It's not
really an issue and it's currently the only implementation for this
file.

Is it clearer here?

This value is 1 if the memory in this node is capable of being
protected with the CPU's memory cryptographic capabilities. It is 0
otherwise.
On EFI systems this means that all the memory regions of the node
have the EFI_MEMORY_CPU_CRYPTO attribute set.

> thanks,
>
> greg k-h
>

Greg KH

unread,
Jul 6, 2022, 2:38:42 AM7/6/22
to Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
Much better, thanks.

greg k-h

Kirill A. Shutemov

unread,
Oct 7, 2022, 11:53:28 AM10/7/22
to Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
On Mon, Jul 04, 2022 at 10:58:26AM -0300, Martin Fernandez wrote:
> Add a new member in the pg_data_t struct to tell whether the node
> corresponding to that pg_data_t is able to do hardware memory
> encryption.
>
> This will be read from sysfs.
>
> Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
> ---
> include/linux/mmzone.h | 3 +++
> mm/page_alloc.c | 1 +
> 2 files changed, 4 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index aab70355d64f..6fd4785f1d05 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -883,6 +883,9 @@ typedef struct pglist_data {
> struct task_struct *kcompactd;
> bool proactive_compact_trigger;
> #endif
> +
> + bool crypto_capable;
> +

There's already pgdat->flags. Any reason we cannot encode it there?

> /*
> * This is a per-node reserve of pages that are not available
> * to userspace allocations.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3df0485..147437329ac7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7729,6 +7729,7 @@ static void __init free_area_init_node(int nid)
> pgdat->node_id = nid;
> pgdat->node_start_pfn = start_pfn;
> pgdat->per_cpu_nodestats = NULL;
> + pgdat->crypto_capable = memblock_node_is_crypto_capable(nid);
>
> if (start_pfn != end_pfn) {
> pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
> --
> 2.30.2
>

--
Kiryl Shutsemau / Kirill A. Shutemov

Martin Fernandez

unread,
Oct 11, 2022, 9:28:45 AM10/11/22
to Kirill A. Shutemov, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
On 10/7/22, Kirill A. Shutemov <kir...@shutemov.name> wrote:
> On Mon, Jul 04, 2022 at 10:58:26AM -0300, Martin Fernandez wrote:
>> Add a new member in the pg_data_t struct to tell whether the node
>> corresponding to that pg_data_t is able to do hardware memory
>> encryption.
>>
>> This will be read from sysfs.
>>
>> Signed-off-by: Martin Fernandez <martin.f...@eclypsium.com>
>> ---
>> include/linux/mmzone.h | 3 +++
>> mm/page_alloc.c | 1 +
>> 2 files changed, 4 insertions(+)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index aab70355d64f..6fd4785f1d05 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -883,6 +883,9 @@ typedef struct pglist_data {
>> struct task_struct *kcompactd;
>> bool proactive_compact_trigger;
>> #endif
>> +
>> + bool crypto_capable;
>> +
>
> There's already pgdat->flags. Any reason we cannot encode it there?

Not really a reason, I'll considerate when I send then next version. I
tried to quickly find for references of what kind of flags does it
have, I didn't find any. Do you suggest it should work?

Kirill A. Shutemov

unread,
Oct 11, 2022, 11:28:07 AM10/11/22
to Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, b...@alien8.de, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
Maybe. Or maybe introduce capabilities bitfield and make crypto as one of
them.

We consider to re-use the approach for other cases. Like if the memory in
the node is TDX-compatible (there's more requirements for it than just
encryption).

Borislav Petkov

unread,
Oct 13, 2022, 3:48:33 PM10/13/22
to Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
On Mon, Jul 04, 2022 at 10:58:24AM -0300, Martin Fernandez wrote:
> If all nodes are capable of encryption and if the system have tme/sme
> on we can pretty confidently say that the device is actively
> encrypting all its memory.

Wait, what?

If all memory is crypto capable and I boot with mem_encrypt=off, then
the device is certainly not encrypting any memory.

dhansen says TME cannot be controlled this way and if you turn it off in
the BIOS, EFI_MEMORY_CPU_CRYPTO attr should not be set either. But that
marking won't work on AMD.

You really need to be able to check whether memory encryption is also
enabled.

And I believe I've said this before but even if encryption is on, it is
never "all its memory": the machine can decide to decrypt a page or a
bunch of them for whatever reason. And then they're plaintext.

> It's planned to make this check part of an specification that can be
> passed to people purchasing hardware

How is that supposed to work?

People would boot a Linux on that hardware and fwupd would tell them
whether it can encrypt memory or not?

But if that were the only use case, why can't EFI simply say that in its
fancy GUI?

Because all the kernel seems to be doing here is parrot further
EFI_MEMORY_CPU_CRYPTO.

And that attribute gets set by EFI so it goes and picks apart whether
the underlying hw can encrypt memory. So EFI could report it too.

Hmmm?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Martin Fernandez

unread,
Oct 13, 2022, 5:01:01 PM10/13/22
to Borislav Petkov, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
On 10/13/22, Borislav Petkov <b...@alien8.de> wrote:
> On Mon, Jul 04, 2022 at 10:58:24AM -0300, Martin Fernandez wrote:
>> If all nodes are capable of encryption and if the system have tme/sme
>> on we can pretty confidently say that the device is actively
>> encrypting all its memory.
>
> Wait, what?
>
> If all memory is crypto capable and I boot with mem_encrypt=off, then
> the device is certainly not encrypting any memory.
>
> dhansen says TME cannot be controlled this way and if you turn it off in
> the BIOS, EFI_MEMORY_CPU_CRYPTO attr should not be set either.

That's bad, because it would be nice if that attribute only depended
on the hardware and not on some setting.

The plan of this patch was, as you mentioned just to report
EFI_MEMORY_CPU_CRYPTO in a per node level.

Now, I think I will need to check for tme/sme and only if those are
active then show the file in sysfs, otherwise not show it at all,
because it would be misleading. Any other idea?

> But that
> marking won't work on AMD.

You mean that EFI_MEMORY_CPU_CRYPTO means nothing on an AMD system?

Dave Hansen

unread,
Oct 13, 2022, 8:24:56 PM10/13/22
to Borislav Petkov, Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
On 10/13/22 12:48, Borislav Petkov wrote:
>> It's planned to make this check part of an specification that can be
>> passed to people purchasing hardware
> How is that supposed to work?
>
> People would boot a Linux on that hardware and fwupd would tell them
> whether it can encrypt memory or not?
>
> But if that were the only use case, why can't EFI simply say that in its
> fancy GUI?
>
> Because all the kernel seems to be doing here is parrot further
> EFI_MEMORY_CPU_CRYPTO.
>
> And that attribute gets set by EFI so it goes and picks apart whether
> the underlying hw can encrypt memory. So EFI could report it too.

I think the kernel _would_ just be parroting the firmware's info *if*
this stuff was all static at boot. It pretty much _is_ static on
today's systems. You generally can't hotplug memory (encrypted or not)
on any of these fancy memory encryption systems. On the Intel side, I'm
thinking mostly of Ice Lake systems.

But, that is very much changing once CXL comes on the scene. A system
might boot with only DRAM attached right to the CPU and all of it is
encryption-capable. Then, some wise guys plugs in a CXL card that
doesn't support encryption.

That makes the "is everything encrypted" answer dynamic and is
essentially unanswerable at boot, other than to give a one-off answer.

Borislav Petkov

unread,
Oct 27, 2022, 4:57:39 AM10/27/22
to Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
On Thu, Oct 13, 2022 at 06:00:58PM -0300, Martin Fernandez wrote:
> That's bad, because it would be nice if that attribute only depended
> on the hardware and not on some setting.

Why would that be bad?

You want to be able to disable encryption for whatever reason sometimes.

> The plan of this patch was, as you mentioned just to report
> EFI_MEMORY_CPU_CRYPTO in a per node level.
>
> Now, I think I will need to check for tme/sme and only if those are
> active then show the file in sysfs, otherwise not show it at all,
> because it would be misleading. Any other idea?

Well, I still think this is not going to work in all cases. SME/TME can
be enabled but the kernel can go - and for whatever reason - map a bunch
of memory unencrypted.

So I don't know what the goal of this fwupd checking whether users have
configured memory encryption properly is. It might end up giving that
false sense of security...

> You mean that EFI_MEMORY_CPU_CRYPTO means nothing on an AMD system?

I mean, you still can disable memory encryption.

Dave Hansen

unread,
Oct 27, 2022, 11:21:09 AM10/27/22
to Borislav Petkov, Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
On 10/27/22 01:57, Borislav Petkov wrote:
> Well, I still think this is not going to work in all cases. SME/TME can
> be enabled but the kernel can go - and for whatever reason - map a bunch
> of memory unencrypted.

For TME on Intel systems, there's no way to make it unencrypted. The
memory controller is doing all the encryption behind the back of the OS
and even devices that are doing DMA. Nothing outside of the memory
controller really knows or cares that encryption is happening.

Borislav Petkov

unread,
Oct 27, 2022, 11:34:05 AM10/27/22
to Dave Hansen, Martin Fernandez, linux-...@vger.kernel.org, linu...@vger.kernel.org, platform-...@vger.kernel.org, linu...@kvack.org, kuni...@googlegroups.com, linux-k...@vger.kernel.org, tg...@linutronix.de, mi...@redhat.com, dave....@linux.intel.com, x...@kernel.org, h...@zytor.com, ar...@kernel.org, dvh...@infradead.org, an...@infradead.org, gre...@linuxfoundation.org, raf...@kernel.org, rp...@kernel.org, ak...@linux-foundation.org, daniel...@eclypsium.com, hugh...@gmail.com, alex.ba...@eclypsium.com, alison.s...@intel.com, kees...@chromium.org
Ok, Tom just confirmed that AMD's TSME thing also encrypts all memory.

So I guess the code should check for TME or TSME. If those are set, then
you can assume that all memory is encrypted.
Reply all
Reply to author
Forward
0 new messages