[PATCH 00/14] AMD-Vi support


Valentine Sinitsyn

Sep 12, 2015, 8:06:00 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Hi all,

This patchset implements AMD-Vi (AMD IOMMU) support. Only memory translation
(the DMAR equivalent) is implemented so far; interrupt remapping support is
still missing.

There could be some issues with event logging, non-root cells, and features I
don't have on my hardware, such as Device Table Segmentation. Please report bugs.
Otherwise, I feel the patchset is suitable for initial review.

One more thing worth noting: transactions coming from PCI devices that are not
listed in cell configs currently pass through untranslated. Should we
target-abort them instead?

Thanks,
Valentine

Valentine Sinitsyn (14):
core: Introduce struct jailhouse_iommu
pci: Add pci_find_capability_by_id()
x86: Prepare to introduce amd_iommu code
x86: Store amd_iommu feature bits in config files
x86: Add amd_iommu structs and definitions
x86: Add amd_iommu page tables handling code
x86: Add amd_iommu initialization code
x86: Implement amd_iommu cell management functions
x86: Add amd_iommu's memory mapping functions
x86: Implement amd_iommu command posting
x86: Add device management functions for amd_iommu
x86: Add iommu_commit_config() for amd_iommu
x86: Implement amd_iommu event log
x86: Add amd_iommu pending faults check

configs/f2a88xm-hd3.c | 10 +-
configs/h87i.c | 12 +-
configs/qemu-vm.c | 7 +-
hypervisor/arch/x86/Makefile | 2 +-
hypervisor/arch/x86/amd_iommu.c | 922 ++++++++++++++++++++-
hypervisor/arch/x86/amd_iommu_paging.c | 255 ++++++
hypervisor/arch/x86/include/asm/amd_iommu.h | 118 ++-
hypervisor/arch/x86/include/asm/amd_iommu_paging.h | 28 +
hypervisor/arch/x86/include/asm/cell.h | 5 +-
hypervisor/arch/x86/iommu.c | 2 +-
hypervisor/arch/x86/vtd.c | 6 +-
hypervisor/include/jailhouse/cell-config.h | 11 +-
hypervisor/include/jailhouse/paging.h | 2 +-
hypervisor/include/jailhouse/pci.h | 4 +
hypervisor/pci.c | 33 +
tools/jailhouse-config-create | 72 +-
tools/root-cell-config.c.tmpl | 14 +-
17 files changed, 1450 insertions(+), 53 deletions(-)
create mode 100644 hypervisor/arch/x86/amd_iommu_paging.c
create mode 100644 hypervisor/arch/x86/include/asm/amd_iommu_paging.h

--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:08 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
For AMD, we need to store not only the base address but also the MMIO size
(and possibly other attributes in the future). Introduce struct jailhouse_iommu
to encapsulate all the required data. For now, all fields make sense for both
VT-d and the AMD IOMMU.

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
configs/f2a88xm-hd3.c | 9 +++-
configs/h87i.c | 12 ++++--
configs/qemu-vm.c | 7 +++-
hypervisor/arch/x86/iommu.c | 2 +-
hypervisor/arch/x86/vtd.c | 6 ++-
hypervisor/include/jailhouse/cell-config.h | 10 ++++-
tools/jailhouse-config-create | 67 ++++++++++++++++++++++--------
tools/root-cell-config.c.tmpl | 13 ++++--
8 files changed, 95 insertions(+), 31 deletions(-)

diff --git a/configs/f2a88xm-hd3.c b/configs/f2a88xm-hd3.c
index 2736496..a48a4ed 100644
--- a/configs/f2a88xm-hd3.c
+++ b/configs/f2a88xm-hd3.c
@@ -40,8 +40,13 @@ struct {
.mmconfig_base = 0xe0000000,
.mmconfig_end_bus = 0xff,
.pm_timer_address = 0x808,
- .iommu_base = {
- 0xfeb80000,
+ .iommu_units = {
+ {
+ .base = 0xfeb80000,
+ .size = 0x80000,
+ .amd_bdf = 0x02,
+ .amd_cap = 0x40,
+ },
},
},
.device_limit = 128,
diff --git a/configs/h87i.c b/configs/h87i.c
index c5473ac..5c75d8e 100644
--- a/configs/h87i.c
+++ b/configs/h87i.c
@@ -35,9 +35,15 @@ struct {
.mmconfig_base = 0xf8000000,
.mmconfig_end_bus = 0x3f,
.pm_timer_address = 0x1808,
- .iommu_base = {
- 0xfed90000,
- 0xfed91000,
+ .iommu_units = {
+ {
+ .base = 0xfed90000,
+ .size = 0x1000,
+ },
+ {
+ .base = 0xfed91000,
+ .size = 0x1000,
+ },
},
},
.interrupt_limit = 256,
diff --git a/configs/qemu-vm.c b/configs/qemu-vm.c
index 637b9c6..823e0c2 100644
--- a/configs/qemu-vm.c
+++ b/configs/qemu-vm.c
@@ -49,8 +49,11 @@ struct {
.mmconfig_base = 0xb0000000,
.mmconfig_end_bus = 0xff,
.pm_timer_address = 0x608,
- .iommu_base = {
- 0xfed90000,
+ .iommu_units = {
+ {
+ .base = 0xfed90000,
+ .size = 0x1000,
+ },
},
},
.interrupt_limit = 256,
diff --git a/hypervisor/arch/x86/iommu.c b/hypervisor/arch/x86/iommu.c
index 3da61d6..11f35bc 100644
--- a/hypervisor/arch/x86/iommu.c
+++ b/hypervisor/arch/x86/iommu.c
@@ -18,7 +18,7 @@ unsigned int iommu_count_units(void)
unsigned int units = 0;

while (units < JAILHOUSE_MAX_IOMMU_UNITS &&
- system_config->platform_info.x86.iommu_base[units])
+ system_config->platform_info.x86.iommu_units[units].base)
units++;
return units;
}
diff --git a/hypervisor/arch/x86/vtd.c b/hypervisor/arch/x86/vtd.c
index db43d46..e20f4b5 100644
--- a/hypervisor/arch/x86/vtd.c
+++ b/hypervisor/arch/x86/vtd.c
@@ -421,7 +421,7 @@ static int vtd_init_ir_emulation(unsigned int unit_no, void *reg_base)

root_cell.arch.vtd.ir_emulation = true;

- base = system_config->platform_info.x86.iommu_base[unit_no];
+ base = system_config->platform_info.x86.iommu_units[unit_no].base;
mmio_region_register(&root_cell, base, PAGE_SIZE,
vtd_unit_access_handler, unit);

@@ -454,6 +454,7 @@ int iommu_init(void)
{
unsigned long version, caps, ecaps, ctrls, sllps_caps = ~0UL;
unsigned int units, pt_levels, num_did, n;
+ struct jailhouse_iommu *iommu;
void *reg_base;
u64 base_addr;
int err;
@@ -484,7 +485,8 @@ int iommu_init(void)
return -ENOMEM;

for (n = 0; n < units; n++) {
- base_addr = system_config->platform_info.x86.iommu_base[n];
+ iommu = &system_config->platform_info.x86.iommu_units[n];
+ base_addr = iommu->base;

reg_base = dmar_reg_base + n * PAGE_SIZE;

diff --git a/hypervisor/include/jailhouse/cell-config.h b/hypervisor/include/jailhouse/cell-config.h
index 261d9c7..aebe007 100644
--- a/hypervisor/include/jailhouse/cell-config.h
+++ b/hypervisor/include/jailhouse/cell-config.h
@@ -118,6 +118,13 @@ struct jailhouse_pci_capability {

#define JAILHOUSE_MAX_IOMMU_UNITS 8

+struct jailhouse_iommu {
+ __u64 base;
+ __u64 size;
+ __u16 amd_bdf;
+ __u16 amd_cap;
+} __attribute__((packed));
+
struct jailhouse_system {
struct jailhouse_memory hypervisor_memory;
struct jailhouse_memory debug_uart;
@@ -127,7 +134,8 @@ struct jailhouse_system {
__u8 mmconfig_end_bus;
__u8 padding[5];
__u16 pm_timer_address;
- __u64 iommu_base[JAILHOUSE_MAX_IOMMU_UNITS];
+ struct jailhouse_iommu
+ iommu_units[JAILHOUSE_MAX_IOMMU_UNITS];
} __attribute__((packed)) x86;
} __attribute__((packed)) platform_info;
__u32 device_limit;
diff --git a/tools/jailhouse-config-create b/tools/jailhouse-config-create
index 3e65315..e0582de 100755
--- a/tools/jailhouse-config-create
+++ b/tools/jailhouse-config-create
@@ -514,6 +514,20 @@ class IOMemRegionTree:
return regions


+class IOMMUConfig(object):
+ def __init__(self, props):
+ self.base_addr = props['base_addr']
+ self.mmio_size = props['mmio_size']
+ if 'amd_bdf' in props:
+ self.amd_bdf = props['amd_bdf']
+ if 'amd_cap' in props:
+ self.amd_cap = props['amd_cap']
+
+ @property
+ def is_amd_iommu(self):
+ return hasattr(self, 'amd_bdf')
+
+
def parse_iomem(pcidevices):
regions = IOMemRegionTree.parse_iomem_tree(
IOMemRegionTree.parse_iomem_file())
@@ -690,7 +704,11 @@ def parse_dmar(pcidevices, ioapics):
if len(units) >= 8:
raise RuntimeError('Too many DMAR units. '
'Raise JAILHOUSE_MAX_IOMMU_UNITS.')
- units.append(base)
+ # Always map one page per DMAR unit
+ units.append(IOMMUConfig({
+ 'base_addr': base,
+ 'mmio_size': 4096
+ }))
if flags & 1:
for d in pcidevices:
if d.iommu is None:
@@ -765,11 +783,12 @@ def parse_dmar(pcidevices, ioapics):
return units, regions


-def parse_ivrs(pcidevices, ioapics):
- def format_bdf(bdf):
- bus, dev, fun = (bdf >> 8) & 0xff, (bdf >> 3) & 0x1f, bdf & 0x7
- return '%02x:%02x.%x' % (bus, dev, fun)
+def format_pci_bdf(bdf):
+ bus, dev, fun = (bdf >> 8) & 0xff, (bdf >> 3) & 0x1f, bdf & 0x7
+ return '%02x:%02x.%x' % (bus, dev, fun)
+

+def parse_ivrs(pcidevices, ioapics):
f = input_open('/sys/firmware/acpi/tables/IVRS', 'rb')
signature = f.read(4)
if signature != b'IVRS':
@@ -786,19 +805,20 @@ def parse_ivrs(pcidevices, ioapics):
regions = []
# BDF of devices that are permitted outside IOMMU: root complex
iommu_skiplist = set([0x0])
+ ivhd_blocks = 0
while length > 0:
(block_type, block_length) = struct.unpack('<BxH', f.read(4))
if block_type in [0x10, 0x11]:
+ ivhd_blocks += 1
+ if ivhd_blocks > 1:
+ raise RuntimeError('Jailhouse doesn\'t support more than one '
+ 'AMD IOMMU per PCI function.')
# IVHD block
- (iommu_id, base_addr, pci_seg) = \
- struct.unpack('<HxxQH', f.read(14))
- length -= block_length
- block_length -= 18
+ ivhd_fields = struct.unpack('<HHQHxxL', f.read(20))
+ iommu_bdf, caps_ofs, base_addr, pci_seg, iommu_feat = ivhd_fields

- # IOMMU EFR image and reserved area
- skip_bytes = 6 if block_type == 0x10 else 22
- f.seek(skip_bytes, os.SEEK_CUR)
- block_length -= skip_bytes
+ length -= block_length
+ block_length -= 24

if pci_seg != 0:
raise RuntimeError('We do not support multiple PCI segments')
@@ -809,10 +829,22 @@ def parse_ivrs(pcidevices, ioapics):

# We shouldn't map IOMMU to the cells
for i, d in enumerate(pcidevices):
- if d.bdf() == iommu_id:
+ if d.bdf() == iommu_bdf:
del pcidevices[i]

- units.append(base_addr)
+ if (iommu_feat & (0xF << 13)) and (iommu_feat & (0x3F << 17)):
+ # Performance Counters are supported, allocate 512K
+ mmio_size = 524288
+ else:
+ # Allocate 16K
+ mmio_size = 16384
+
+ units.append(IOMMUConfig({
+ 'base_addr': base_addr,
+ 'mmio_size': mmio_size,
+ 'amd_bdf': iommu_bdf,
+ 'amd_cap': caps_ofs,
+ }))

bdf_start_range = None
while block_length > 0:
@@ -869,9 +901,10 @@ def parse_ivrs(pcidevices, ioapics):

elif type in [0x20, 0x21, 0x22]:
# IVMD block
+ # FIXME: Also UNTESTED due to lack of the appropriate hardware.
+ ivmd_fields = struct.unpack('<BBHHHxxxxxxxxQQ', f.read(32))
(block_type, block_flags, block_length,
- device_id, aux_data, mem_addr, mem_len) = struct.unpack(
- '<BBHHHxxxxxxxxQQ')
+ device_id, aux_data, mem_addr, mem_len) = ivmd_fields
length -= block_length

if int(block_flags):
diff --git a/tools/root-cell-config.c.tmpl b/tools/root-cell-config.c.tmpl
index fdf8f8b..d725059 100644
--- a/tools/root-cell-config.c.tmpl
+++ b/tools/root-cell-config.c.tmpl
@@ -63,9 +63,16 @@ struct {
.mmconfig_end_bus = ${hex(mmconfig.end_bus)},
.pm_timer_address = ${hex(pm_timer_base)},
% if iommu_units:
- .iommu_base = {
- % for d in iommu_units:
- ${hex(d)},
+ .iommu_units = {
+ % for unit in iommu_units:
+ {
+ .base = ${hex(unit.base_addr)},
+ .size = ${hex(unit.mmio_size)},
+ % if unit.is_amd_iommu:
+ .amd_bdf = ${hex(unit.amd_bdf)},
+ .amd_cap = ${hex(unit.amd_cap)},
+ % endif
+ },
% endfor
},
% endif
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:09 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Implement a way to look up a PCI capability block by its identifier.
This is essential for configuring PCI devices from the hypervisor,
e.g. in the amd_iommu code.
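
For illustration only: the amd_iommu initialization patch later in this
series uses it roughly like this to locate the IOMMU's MSI capability:

	/* Store MSI capability pointer */
	entry->msi_cap = pci_find_capability_by_id(entry->bdf, PCI_CAP_MSI);
	if (!entry->msi_cap)
		return trace_error(-EINVAL);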

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/include/jailhouse/pci.h | 4 ++++
hypervisor/pci.c | 33 +++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)

diff --git a/hypervisor/include/jailhouse/pci.h b/hypervisor/include/jailhouse/pci.h
index 1d0ec24..42b2a3c 100644
--- a/hypervisor/include/jailhouse/pci.h
+++ b/hypervisor/include/jailhouse/pci.h
@@ -40,6 +40,8 @@

#define PCI_IVSHMEM_NUM_MMIO_REGIONS 2

+#define PCI_CAP_PTR_MASK ~0x03
+
struct cell;

/**
@@ -151,6 +153,8 @@ int pci_init(void);
u32 pci_read_config(u16 bdf, u16 address, unsigned int size);
void pci_write_config(u16 bdf, u16 address, u32 value, unsigned int size);

+u8 pci_find_capability_by_id(u16 bdf, u8 cap_id);
+
struct pci_device *pci_get_assigned_device(const struct cell *cell, u16 bdf);

enum pci_access pci_cfg_read_moderate(struct pci_device *device, u16 address,
diff --git a/hypervisor/pci.c b/hypervisor/pci.c
index 2dd4cca..70a02c8 100644
--- a/hypervisor/pci.c
+++ b/hypervisor/pci.c
@@ -137,6 +137,39 @@ void pci_write_config(u16 bdf, u16 address, u32 value, unsigned int size)
mmio_write32(mmcfg_addr, value);
}

+
+/**
+ * Find PCI capability block by id value.
+ * @param bdf 16-bit bus/device/function ID of target.
+ * @param cap_id Capability ID to look for.
+ *
+ * @return Offset in configuration space or 0 if not found
+ */
+u8 pci_find_capability_by_id(u16 bdf, u8 cap_id)
+{
+ /* Protect against malformed Capabilities List */
+ unsigned int ttl = 48;
+ u32 device_status;
+ u16 offset, buf;
+
+ device_status = pci_read_config(bdf, PCI_CFG_STATUS, 2);
+ if (!(device_status & PCI_STS_CAPS))
+ return 0;
+
+ /* See PCI Local Bus Specification, Ver. 3.0, Sect. 6.7 */
+ offset = pci_read_config(bdf, PCI_CFG_CAPS, 1) & PCI_CAP_PTR_MASK;
+ while (offset != 0 && ttl-- != 0) {
+ buf = pci_read_config(bdf, offset, 2);
+ if ((buf & 0x00ff) == cap_id)
+ goto out;
+ offset = ((buf & 0xff00) >> 8) & PCI_CAP_PTR_MASK;
+ }
+
+ offset = 0;
+out:
+ return offset;
+}
+
/**
* Look up device owned by a cell.
* @param[in] cell Owning cell.
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:10 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Add a few modifications to the Jailhouse codebase that the amd_iommu code
needs to work properly. First, we allocate IOMMU page tables statically in
struct cell. Second, we widen struct paging's page_size field so it can
describe pages larger than an unsigned int can hold (the 6-level IOMMU tables
use 512 GB and 256 TB terminal pages).

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/include/asm/cell.h | 5 ++++-
hypervisor/include/jailhouse/paging.h | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hypervisor/arch/x86/include/asm/cell.h b/hypervisor/arch/x86/include/asm/cell.h
index 02a4f70..d1b567d 100644
--- a/hypervisor/arch/x86/include/asm/cell.h
+++ b/hypervisor/arch/x86/include/asm/cell.h
@@ -49,7 +49,10 @@ struct arch_cell {
* cell. */
bool ir_emulation;
} vtd; /**< Intel VT-d specific fields. */
- /* TODO: No struct vtd equivalent for SVM code yet. */
+ struct {
+ /** Paging structures used for DMA requests. */
+ struct paging_structures pg_structs;
+ } amd_iommu; /**< AMD IOMMU specific fields. */
};

/** Shadow value of PCI config space address port register. */
diff --git a/hypervisor/include/jailhouse/paging.h b/hypervisor/include/jailhouse/paging.h
index 27286f0..319d9ce 100644
--- a/hypervisor/include/jailhouse/paging.h
+++ b/hypervisor/include/jailhouse/paging.h
@@ -71,7 +71,7 @@ typedef pt_entry_t page_table_t;
struct paging {
/** Page size of terminal entries in this level or 0 if none are
* supported. */
- unsigned int page_size;
+ unsigned long page_size;

/**
* Get entry in given table corresponding to virt address.
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:11 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Adjust the jailhouse config tool and related data structures to store the
AMD IOMMU feature bits (taken from ACPI and useful for determining properties
such as HATS) in the cell configuration file.
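
As a rough sketch of how the amd_iommu code later in this series consumes
these bits: the value stored from ACPI, if non-zero, takes precedence over
the EFR register, and the HATS field selects the number of page table levels:

	/* ACPI override, if present, wins over the EFR register */
	hats = iommu->features.raw ? iommu->features.hats :
	       (efr & FEATURE_HATS_MASK) >> FEATURE_HATS_SHIFT;
	/* HATS encodes the number of levels above the minimum of four */
	amd_iommu_pt_levels = hats + 4;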

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
configs/f2a88xm-hd3.c | 1 +
hypervisor/include/jailhouse/cell-config.h | 1 +
tools/jailhouse-config-create | 5 +++++
tools/root-cell-config.c.tmpl | 1 +
4 files changed, 8 insertions(+)

diff --git a/configs/f2a88xm-hd3.c b/configs/f2a88xm-hd3.c
index a48a4ed..0c3bac6 100644
--- a/configs/f2a88xm-hd3.c
+++ b/configs/f2a88xm-hd3.c
@@ -46,6 +46,7 @@ struct {
.size = 0x80000,
.amd_bdf = 0x02,
.amd_cap = 0x40,
+ .amd_features = 0x80048824,
},
},
},
diff --git a/hypervisor/include/jailhouse/cell-config.h b/hypervisor/include/jailhouse/cell-config.h
index aebe007..842294b 100644
--- a/hypervisor/include/jailhouse/cell-config.h
+++ b/hypervisor/include/jailhouse/cell-config.h
@@ -123,6 +123,7 @@ struct jailhouse_iommu {
__u64 size;
__u16 amd_bdf;
__u16 amd_cap;
+ __u32 amd_features;
} __attribute__((packed));

struct jailhouse_system {
diff --git a/tools/jailhouse-config-create b/tools/jailhouse-config-create
index e0582de..02d1b46 100755
--- a/tools/jailhouse-config-create
+++ b/tools/jailhouse-config-create
@@ -522,6 +522,8 @@ class IOMMUConfig(object):
self.amd_bdf = props['amd_bdf']
if 'amd_cap' in props:
self.amd_cap = props['amd_cap']
+ if 'amd_features' in props:
+ self.amd_features = props['amd_features']

@property
def is_amd_iommu(self):
@@ -844,6 +846,9 @@ def parse_ivrs(pcidevices, ioapics):
'mmio_size': mmio_size,
'amd_bdf': iommu_bdf,
'amd_cap': caps_ofs,
+ # All IVHD block types except 0x10 have an exact EFR image,
+ # so none of the bits we are interested in can be overridden.
+ 'amd_features': iommu_feat if block_type == 0x10 else 0
}))

bdf_start_range = None
diff --git a/tools/root-cell-config.c.tmpl b/tools/root-cell-config.c.tmpl
index d725059..ee258e9 100644
--- a/tools/root-cell-config.c.tmpl
+++ b/tools/root-cell-config.c.tmpl
@@ -71,6 +71,7 @@ struct {
% if unit.is_amd_iommu:
.amd_bdf = ${hex(unit.amd_bdf)},
.amd_cap = ${hex(unit.amd_cap)},
+ .amd_features = ${hex(unit.amd_features)},
% endif
},
% endfor
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:12 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Introduce amd_iommu.h with all the structures and definitions needed by the
subsequent changes in this patch series.
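
To give an idea of how these definitions fit together: a command buffer
entry is built by filling its data words and stamping the command type into
the upper bits of data[1] via CMD_SET_TYPE(). A later patch in this series
invalidates a device table entry roughly like this:

	/* Double braces to please GCC */
	struct cmd_buf_entry invalidate_dte = {{ 0 }};

	invalidate_dte.data[0] = device_id;
	CMD_SET_TYPE(&invalidate_dte, CMD_INV_DEVTAB_ENTRY);
	amd_iommu_submit_command(iommu, &invalidate_dte);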

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/include/asm/amd_iommu.h | 118 +++++++++++++++++++++++++++-
1 file changed, 115 insertions(+), 3 deletions(-)

diff --git a/hypervisor/arch/x86/include/asm/amd_iommu.h b/hypervisor/arch/x86/include/asm/amd_iommu.h
index c02f66c..c15bd69 100644
--- a/hypervisor/arch/x86/include/asm/amd_iommu.h
+++ b/hypervisor/arch/x86/include/asm/amd_iommu.h
@@ -13,10 +13,122 @@
#ifndef _JAILHOUSE_ASM_AMD_IOMMU_H
#define _JAILHOUSE_ASM_AMD_IOMMU_H

+#include <jailhouse/types.h>
+
+#define AMD_IOMMU_PTE_P (1ULL << 0)
+#define AMD_IOMMU_PTE_IR (1ULL << 61)
+#define AMD_IOMMU_PTE_IW (1ULL << 62)
+
+#define AMD_IOMMU_PAGE_DEFAULT_FLAGS (AMD_IOMMU_PTE_IW | AMD_IOMMU_PTE_IR | \
+ AMD_IOMMU_PTE_P)
+#define AMD_IOMMU_MAX_PAGE_TABLE_LEVELS 6
+
+#define CAPS_IOMMU_BASE_HI 0x08
+#define CAPS_IOMMU_BASE_LOW 0x04
+#define IOMMU_CAP_ENABLE (1 << 0)
+#define IOMMU_CAP_EFR (1 << 27)
+
/*
- * There is nothing in this file yet, since AMD IOMMU support is missing.
- * It is assumed that AMD IOMMU-specific constants and definitions will
- * appear here someday.
+ * TODO: Check this. Also, make naming more consistent,
+ * i.e. MMIO_CONTROL_* instead of CONTROL etc.
*/
+#define CONTROL_SMIF_EN (1 << 22)
+#define CONTROL_SMIFLOG_EN (1 << 24)
+#define CONTROL_SEG_SUP_SHIFT 34
+#define CONTROL_SEG_SUP_MASK 0x7
+
+#define CONTROL_IOMMU_EN (1 << 0)
+#define CONTROL_EVT_LOG_EN (1 << 2)
+#define CONTROL_EVT_INT_EN (1 << 3)
+#define CONTROL_COMM_WAIT_INT_EN (1 << 4)
+#define CONTROL_CMD_BUF_EN (1 << 12)
+
+#define FEATURE_SMI_FSUP_MASK 0x30000
+#define FEATURE_SMI_FSUP_SHIFT 16
+
+#define FEATURE_SMI_FRC_MASK 0x1c0000
+#define FEATURE_SMI_FRC_SHIFT 18
+
+#define FEATURE_SEG_SUP_MASK 0xc000000000
+#define FEATURE_SEG_SUP_SHIFT 38
+
+#define FEATURE_HATS_MASK 0xc00
+#define FEATURE_HATS_SHIFT 10
+
+#define FEATURE_HE_SUP_MASK 0x80
+#define FEATURE_HE_SUP_SHIFT 7
+
+#define MMIO_DEV_TABLE_BASE 0x0000
+
+#define MMIO_CMD_BUF_OFFSET 0x0008
+#define MMIO_CONTROL_OFFSET 0x0018
+#define MMIO_EXT_FEATURES 0x0030
+#define MMIO_SMI_FREG0_OFFSET 0x0060
+#define MMIO_DEV_TABLE_SEG_BASE 0x0100
+#define MMIO_CMD_HEAD_OFFSET 0x2000
+#define MMIO_CMD_TAIL_OFFSET 0x2008
+
+#define MMIO_EVT_LOG_OFFSET 0x0010
+#define MMIO_EVT_HEAD_OFFSET 0x2010
+#define MMIO_EVT_TAIL_OFFSET 0x2018
+
+#define MMIO_STATUS_OFFSET 0x2020
+# define MMIO_STATUS_EVT_OVERFLOW_MASK (1 << 0)
+# define MMIO_STATUS_EVT_INT_MASK (1 << 1)
+# define MMIO_STATUS_EVT_RUN_MASK (1 << 3)
+
+#define MMIO_HEV_UPPER_OFFSET 0x0050
+#define MMIO_HEV_LOWER_OFFSET 0x0050
+#define MMIO_HEV_OFFSET 0x0050
+
+#define MMIO_HEV_VALID (1 << 1)
+#define MMIO_HEV_OVERFLOW (1 << 2)
+
+#define SMI_FREG_LOCKED (1 << 17)
+#define SMI_FREG_VALID (1 << 16)
+
+#define DTE_VALID (1 << 0)
+#define DTE_TRANSLATION_VALID (1 << 1)
+
+#define DTE_MODE_SHIFT 0x09
+#define DTE_MODE_MASK 0x07ULL
+
+#define DTE_SUPPRESS_EVENTS (1ULL << 33)
+#define DTE_CACHE_DISABLE (1ULL << 37)
+
+#define DEV_TABLE_SEG_MAX 8
+#define DEV_TABLE_SIZE 0x200000
+
+#define BUF_LEN_EXPONENT_SHIFT 56
+
+#define CMD_COMPL_WAIT 0x01
+# define CMD_COMPL_WAIT_STORE_MASK (1 << 0)
+# define CMD_COMPL_WAIT_INT_MASK (1 << 1)
+
+#define CMD_INV_DEVTAB_ENTRY 0x02
+
+#define CMD_INV_IOMMU_PAGES 0x03
+# define CMD_INV_IOMMU_PAGES_SIZE_MASK (1 << 0)
+# define CMD_INV_IOMMU_PAGES_PDE_MASK (1 << 1)
+
+#define CMD_SET_TYPE(cmd, type) ((cmd)->data[1] |= ((type & 0x7) << 28))
+
+#define EVENT_TYPE_ILL_CMD_ERR 0x05
+#define EVENT_TYPE_CMD_HW_ERR 0x06
+#define EVENT_TYPE_EVT_CNT_ZERO 0x0a
+
+#define MSI_ADDRESS_SHIFT 20
+
+struct dev_table_entry {
+ u64 data[4];
+} __attribute__((packed));
+
+struct cmd_buf_entry {
+ u32 data[4];
+} __attribute__((packed));
+
+struct evt_log_entry {
+ u32 data[4];
+} __attribute__((packed));

#endif
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:13 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
amd_iommu page tables are extensions of x86_64 page tables. However, they
have up to 6 levels, so one can't easily derive them from the x86_64_paging
structures the way, e.g., the SVM code does.

This commit adds self-contained amd_iommu_paging structures.
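
For reference, every level except the topmost one resolves 9 bits of the DMA
address and the sixth level resolves the remaining 7 bits, so the full 64-bit
IOVA space is covered. Condensed from the get_entry callbacks below:

	idx6 = (virt >> 57) & 0x7f;	/* level 6: only 7 index bits */
	idx5 = (virt >> 48) & 0x1ff;	/* level 5 */
	idx4 = (virt >> 39) & 0x1ff;	/* level 4 */
	idx3 = (virt >> 30) & 0x1ff;	/* level 3 */
	idx2 = (virt >> 21) & 0x1ff;	/* level 2 */
	idx1 = (virt >> 12) & 0x1ff;	/* level 1, then a 12-bit page offset */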

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/Makefile | 2 +-
hypervisor/arch/x86/amd_iommu_paging.c | 255 +++++++++++++++++++++
hypervisor/arch/x86/include/asm/amd_iommu_paging.h | 28 +++
3 files changed, 284 insertions(+), 1 deletion(-)
create mode 100644 hypervisor/arch/x86/amd_iommu_paging.c
create mode 100644 hypervisor/arch/x86/include/asm/amd_iommu_paging.h

diff --git a/hypervisor/arch/x86/Makefile b/hypervisor/arch/x86/Makefile
index 0d3670e..1c06c3b 100644
--- a/hypervisor/arch/x86/Makefile
+++ b/hypervisor/arch/x86/Makefile
@@ -21,5 +21,5 @@ always := $(BUILT_IN_OBJECTS)

obj-y := $(BUILT_IN_OBJECTS)

-built-in-amd-y := $(COMMON_OBJECTS) svm.o amd_iommu.o svm-vmexit.o
+built-in-amd-y := $(COMMON_OBJECTS) svm.o amd_iommu.o amd_iommu_paging.o svm-vmexit.o
built-in-intel-y := $(COMMON_OBJECTS) vmx.o vtd.o vmx-vmexit.o
diff --git a/hypervisor/arch/x86/amd_iommu_paging.c b/hypervisor/arch/x86/amd_iommu_paging.c
new file mode 100644
index 0000000..08d13b7
--- /dev/null
+++ b/hypervisor/arch/x86/amd_iommu_paging.c
@@ -0,0 +1,255 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) Siemens AG, 2014
+ * Copyright (c) Valentine Sinitsyn, 2015
+ *
+ * Authors:
+ * Jan Kiszka <jan.k...@siemens.com>
+ * Valentine Sinitsyn <valentine...@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <jailhouse/paging.h>
+#include <jailhouse/string.h>
+
+#include <asm/amd_iommu.h>
+#include <asm/amd_iommu_paging.h>
+
+/* TODO: Using DTE_* here is not clean, refactor it */
+#define AMD_IOMMU_PTE_MODE_MASK (DTE_MODE_MASK << DTE_MODE_SHIFT)
+#define PM_LEVEL_ENC(x) (((x) << DTE_MODE_SHIFT) & AMD_IOMMU_PTE_MODE_MASK)
+
+static bool amd_iommu_entry_valid(pt_entry_t pte, unsigned long flags)
+{
+ return (*pte & flags) == flags;
+}
+
+static unsigned long amd_iommu_get_flags(pt_entry_t pte)
+{
+ return *pte & 0x7800000000000001;
+}
+
+/* TODO: Return my macros after debugging */
+static void amd_iommu_set_next_pt_l6(pt_entry_t pte, unsigned long next_pt)
+{
+ *pte = (next_pt & 0x000ffffffffff000UL) | PM_LEVEL_ENC(5) |
+ AMD_IOMMU_PAGE_DEFAULT_FLAGS;
+}
+
+static void amd_iommu_set_next_pt_l5(pt_entry_t pte, unsigned long next_pt)
+{
+ *pte = (next_pt & 0x000ffffffffff000UL) | PM_LEVEL_ENC(4) |
+ AMD_IOMMU_PAGE_DEFAULT_FLAGS;
+}
+
+
+static void amd_iommu_set_next_pt_l4(pt_entry_t pte, unsigned long next_pt)
+{
+ *pte = (next_pt & 0x000ffffffffff000UL) | PM_LEVEL_ENC(3) |
+ AMD_IOMMU_PAGE_DEFAULT_FLAGS;
+}
+
+static void amd_iommu_set_next_pt_l3(pt_entry_t pte, unsigned long next_pt)
+{
+ *pte = (next_pt & 0x000ffffffffff000UL) | PM_LEVEL_ENC(2) |
+ AMD_IOMMU_PAGE_DEFAULT_FLAGS;
+}
+
+static void amd_iommu_set_next_pt_l2(pt_entry_t pte, unsigned long next_pt)
+{
+ *pte = (next_pt & 0x000ffffffffff000UL) | PM_LEVEL_ENC(1) |
+ AMD_IOMMU_PAGE_DEFAULT_FLAGS;
+}
+
+static void amd_iommu_clear_entry(pt_entry_t pte)
+{
+ *pte = 0;
+}
+
+static bool amd_iommu_page_table_empty(page_table_t page_table)
+{
+ pt_entry_t pte;
+ int n;
+
+ for (n = 0, pte = page_table; n < PAGE_SIZE / sizeof(u64); n++, pte++)
+ if (amd_iommu_entry_valid(pte, AMD_IOMMU_PTE_P))
+ return false;
+ return true;
+}
+
+static pt_entry_t amd_iommu_get_entry_l6(page_table_t page_table,
+ unsigned long virt)
+{
+ return &page_table[(virt >> 57) & 0x7f];
+}
+
+static pt_entry_t amd_iommu_get_entry_l5(page_table_t page_table,
+ unsigned long virt)
+{
+ return &page_table[(virt >> 48) & 0x1ff];
+}
+
+static pt_entry_t amd_iommu_get_entry_l4(page_table_t page_table,
+ unsigned long virt)
+{
+ return &page_table[(virt >> 39) & 0x1ff];
+}
+
+static pt_entry_t amd_iommu_get_entry_l3(page_table_t page_table,
+ unsigned long virt)
+{
+ return &page_table[(virt >> 30) & 0x1ff];
+}
+
+static pt_entry_t amd_iommu_get_entry_l2(page_table_t page_table,
+ unsigned long virt)
+{
+ return &page_table[(virt >> 21) & 0x1ff];
+}
+
+static pt_entry_t amd_iommu_get_entry_l1(page_table_t page_table,
+ unsigned long virt)
+{
+ return &page_table[(virt >> 12) & 0x1ff];
+}
+
+/* amd_iommu_set_terminal_lX(): set Next Level to 0 */
+static void amd_iommu_set_terminal_l5(pt_entry_t pte, unsigned long phys,
+ unsigned long flags)
+{
+ *pte = (phys & 0x000f000000000000UL) | flags;
+}
+
+static void amd_iommu_set_terminal_l4(pt_entry_t pte, unsigned long phys,
+ unsigned long flags)
+{
+ *pte = (phys & 0x000fff8000000000UL) | flags;
+}
+
+
+static void amd_iommu_set_terminal_l3(pt_entry_t pte, unsigned long phys,
+ unsigned long flags)
+{
+ *pte = (phys & 0x000fffffc0000000UL) | flags;
+}
+
+static void amd_iommu_set_terminal_l2(pt_entry_t pte, unsigned long phys,
+ unsigned long flags)
+{
+ *pte = (phys & 0x000fffffffe00000UL) | flags;
+}
+
+static void amd_iommu_set_terminal_l1(pt_entry_t pte, unsigned long phys,
+ unsigned long flags)
+{
+ *pte = (phys & 0x000ffffffffff000UL) | flags;
+}
+
+/* TODO: amd_iommu_get_phys(): support Next Level = 7 as well */
+static unsigned long amd_iommu_get_phys_l5(pt_entry_t pte, unsigned long virt)
+{
+ if ((*pte & AMD_IOMMU_PTE_MODE_MASK) != 0)
+ return INVALID_PHYS_ADDR;
+ return (*pte & 0x000f000000000000UL) |
+ (virt & 0x0000ffffffffffffUL);
+}
+
+static unsigned long amd_iommu_get_phys_l4(pt_entry_t pte, unsigned long virt)
+{
+ if (!(*pte & AMD_IOMMU_PTE_MODE_MASK))
+ return INVALID_PHYS_ADDR;
+ return (*pte & 0x000fff8000000000UL) |
+ (virt & 0x00000007ffffffffUL);
+}
+
+
+static unsigned long amd_iommu_get_phys_l3(pt_entry_t pte, unsigned long virt)
+{
+ if ((*pte & AMD_IOMMU_PTE_MODE_MASK) != 0)
+ return INVALID_PHYS_ADDR;
+ return (*pte & 0x000fffffc0000000UL) |
+ (virt & 0x000000003fffffffUL);
+}
+
+static unsigned long amd_iommu_get_phys_l2(pt_entry_t pte, unsigned long virt)
+{
+ if ((*pte & AMD_IOMMU_PTE_MODE_MASK) != 0)
+ return INVALID_PHYS_ADDR;
+ return (*pte & 0x000fffffffe00000UL) |
+ (virt & 0x00000000001fffffUL);
+}
+
+static unsigned long amd_iommu_get_phys_l1(pt_entry_t pte, unsigned long virt)
+{
+ return (*pte & 0x000ffffffffff000UL) |
+ (virt & 0x0000000000000fffUL);
+}
+
+static unsigned long amd_iommu_get_next_pt(pt_entry_t pte)
+{
+ return *pte & 0x000ffffffffff000UL;
+}
+
+#define AMD_IOMMU_PAGING_COMMON \
+ .entry_valid = amd_iommu_entry_valid, \
+ .get_flags = amd_iommu_get_flags, \
+ .clear_entry = amd_iommu_clear_entry, \
+ .page_table_empty = amd_iommu_page_table_empty
+
+const struct paging amd_iommu_paging[AMD_IOMMU_MAX_PAGE_TABLE_LEVELS] = {
+ {
+ AMD_IOMMU_PAGING_COMMON,
+ .get_entry = amd_iommu_get_entry_l6,
+ /* set_terminal not valid */
+ .get_phys = paging_get_phys_invalid,
+ .set_next_pt = amd_iommu_set_next_pt_l6,
+ .get_next_pt = amd_iommu_get_next_pt,
+ },
+ {
+ .page_size = 256ULL * 1024 * 1024 * 1024 * 1024,
+ AMD_IOMMU_PAGING_COMMON,
+ .get_entry = amd_iommu_get_entry_l5,
+ .set_terminal = amd_iommu_set_terminal_l5,
+ .get_phys = amd_iommu_get_phys_l5,
+ .set_next_pt = amd_iommu_set_next_pt_l5,
+ .get_next_pt = amd_iommu_get_next_pt,
+ },
+ {
+ .page_size = 512ULL * 1024 * 1024 * 1024,
+ AMD_IOMMU_PAGING_COMMON,
+ .get_entry = amd_iommu_get_entry_l4,
+ .set_terminal = amd_iommu_set_terminal_l4,
+ .get_phys = amd_iommu_get_phys_l4,
+ .set_next_pt = amd_iommu_set_next_pt_l4,
+ .get_next_pt = amd_iommu_get_next_pt,
+ },
+ {
+ .page_size = 1024 * 1024 * 1024,
+ AMD_IOMMU_PAGING_COMMON,
+ .get_entry = amd_iommu_get_entry_l3,
+ .set_terminal = amd_iommu_set_terminal_l3,
+ .get_phys = amd_iommu_get_phys_l3,
+ .set_next_pt = amd_iommu_set_next_pt_l3,
+ .get_next_pt = amd_iommu_get_next_pt,
+ },
+ {
+ .page_size = 2 * 1024 * 1024,
+ AMD_IOMMU_PAGING_COMMON,
+ .get_entry = amd_iommu_get_entry_l2,
+ .set_terminal = amd_iommu_set_terminal_l2,
+ .get_phys = amd_iommu_get_phys_l2,
+ .set_next_pt = amd_iommu_set_next_pt_l2,
+ .get_next_pt = amd_iommu_get_next_pt,
+ },
+ {
+ .page_size = PAGE_SIZE,
+ AMD_IOMMU_PAGING_COMMON,
+ .get_entry = amd_iommu_get_entry_l1,
+ .set_terminal = amd_iommu_set_terminal_l1,
+ .get_phys = amd_iommu_get_phys_l1,
+ /* set_next_pt, get_next_pt not valid */
+ },
+};
diff --git a/hypervisor/arch/x86/include/asm/amd_iommu_paging.h b/hypervisor/arch/x86/include/asm/amd_iommu_paging.h
new file mode 100644
index 0000000..271a84d
--- /dev/null
+++ b/hypervisor/arch/x86/include/asm/amd_iommu_paging.h
@@ -0,0 +1,28 @@
+/*
+ * Jailhouse, a Linux-based partitioning hypervisor
+ *
+ * Copyright (c) Valentine Sinitsyn, 2015
+ *
+ * Authors:
+ * Valentine Sinitsyn <valentine...@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef _JAILHOUSE_ASM_AMD_IOMMU_PAGING_H
+#define _JAILHOUSE_ASM_AMD_IOMMU_PAGING_H
+
+#include <jailhouse/paging.h>
+
+#define AMD_IOMMU_PTE_P (1ULL << 0)
+#define AMD_IOMMU_PTE_IR (1ULL << 61)
+#define AMD_IOMMU_PTE_IW (1ULL << 62)
+
+#define AMD_IOMMU_PAGE_DEFAULT_FLAGS (AMD_IOMMU_PTE_IW | AMD_IOMMU_PTE_IR | \
+ AMD_IOMMU_PTE_P)
+#define AMD_IOMMU_MAX_PAGE_TABLE_LEVELS 6
+
+extern const struct paging amd_iommu_paging[];
+
+#endif
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:15 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Implement iommu_init() and iommu_shutdown() for AMD-based systems.

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 393 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 388 insertions(+), 5 deletions(-)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index 1a076fb..f43bed7 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -13,25 +13,365 @@
#include <jailhouse/entry.h>
#include <jailhouse/cell.h>
#include <jailhouse/cell-config.h>
+#include <jailhouse/control.h>
+#include <jailhouse/mmio.h>
#include <jailhouse/pci.h>
#include <jailhouse/printk.h>
+#include <jailhouse/string.h>
#include <jailhouse/types.h>
+#include <asm/amd_iommu.h>
+#include <asm/amd_iommu_paging.h>
#include <asm/apic.h>
#include <asm/iommu.h>
#include <asm/percpu.h>

-unsigned int iommu_mmio_count_regions(struct cell *cell)
+/* Allocate minimum space possible (4K or 256 entries) */
+
+#define buffer_size(name, entry) ((1 << name##_LEN_EXPONENT) * \
+ sizeof(entry))
+
+#define CMD_BUF_LEN_EXPONENT 8
+#define EVT_LOG_LEN_EXPONENT 8
+
+#define CMD_BUF_SIZE buffer_size(CMD_BUF, struct cmd_buf_entry)
+#define EVT_LOG_SIZE buffer_size(EVT_LOG, struct evt_log_entry)
+
+#define BITS_PER_SHORT 16
+
+static struct amd_iommu {
+ int idx;
+ void *mmio_base;
+ int mmio_size;
+ /* Command Buffer, Event Log */
+ unsigned char *cmd_buf_base;
+ unsigned char *evt_log_base;
+ /* Device table */
+ void *devtable_segments[DEV_TABLE_SEG_MAX];
+ u8 dev_tbl_seg_sup;
+ u32 cmd_tail_ptr;
+
+ u16 bdf;
+ u16 msi_cap;
+
+ /* ACPI overrides for feature bits */
+ union {
+ u32 raw;
+ struct {
+ u8 __pad0:7;
+ u8 he_sup:1;
+ u32 __pad1:22;
+ u8 hats:2;
+ };
+ } features;
+} iommu_units[JAILHOUSE_MAX_IOMMU_UNITS];
+
+#define for_each_iommu(iommu) for(iommu = iommu_units; \
+ iommu < iommu_units + iommu_units_count; \
+ iommu++)
+
+#define to_u64(hi, lo) ((u64)(hi) << 32 | (lo))
+
+static unsigned int iommu_units_count;
+
+static unsigned int fault_reporting_cpu_id;
+
+/*
+ * XXX: The initial idea was to have this equals to minimum value for
+ * all IOMMUs in the system. It's not clear how to reuse paging.c functions
+ * for 6 page table levels, so we'll just set it to constant value of four.
+ */
+static int amd_iommu_pt_levels = 4;
+
+/*
+ * Real AMD IOMMU paging structures: a least common denominator for all IOMMUs
+ * in the current system.
+ */
+static struct paging real_amd_iommu_paging[AMD_IOMMU_MAX_PAGE_TABLE_LEVELS];
+
+static unsigned long amd_iommu_get_efr(struct amd_iommu *iommu)
+{
+ return mmio_read64(iommu->mmio_base + MMIO_EXT_FEATURES);
+}
+
+/*
+ * Following functions return various feature bits from EFR or
+ * IVRS ACPI table. The latter takes precedence as per Sect. 5.
+ * Note there is no indicator flag for ACPI overrides; we rely on
+ * the assumption that at least one feature bit will be set in
+ * the IVRS.
+ */
+static unsigned int amd_iommu_get_hats(struct amd_iommu *iommu)
+{
+ unsigned long efr = amd_iommu_get_efr(iommu);
+
+ if (iommu->features.raw)
+ return iommu->features.hats;
+
+ return (efr & FEATURE_HATS_MASK) >> FEATURE_HATS_SHIFT;
+}
+
+static unsigned int amd_iommu_get_he_sup(struct amd_iommu *iommu)
{
+ unsigned long efr = amd_iommu_get_efr(iommu);
+
+ if (iommu->features.raw)
+ return iommu->features.he_sup;
+
+ return (efr & FEATURE_HE_SUP_MASK) >> FEATURE_HE_SUP_SHIFT;
+}
+
+static int amd_iommu_init_mmio(struct amd_iommu *entry,
+ struct jailhouse_iommu *iommu)
+{
+ int err = -ENOMEM;
+
+ /* Allocate MMIO space (configured in amd_iommu_init_pci()) */
+ entry->mmio_base = page_alloc(&remap_pool, PAGES(iommu->size));
+ if (!entry->mmio_base)
+ goto out;
+ entry->mmio_size = iommu->size;
+
+ err = paging_create(&hv_paging_structs, iommu->base, iommu->size,
+ (unsigned long)entry->mmio_base,
+ PAGE_DEFAULT_FLAGS | PAGE_FLAG_DEVICE,
+ PAGING_NON_COHERENT);
+ if (err)
+ goto out_free_mmio;
+
return 0;
+
+out_free_mmio:
+ page_free(&remap_pool, entry->mmio_base, PAGES(iommu->size));
+out:
+ return err;
}

-int iommu_init(void)
+static int amd_iommu_init_buffers(struct amd_iommu *entry,
+ struct jailhouse_iommu *iommu)
{
- printk("WARNING: AMD IOMMU support is not implemented yet\n");
- /* TODO: Implement */
+ int err = -ENOMEM;
+
+ /* Allocate and configure command buffer */
+ entry->cmd_buf_base = page_alloc(&mem_pool, PAGES(CMD_BUF_SIZE));
+ if (!entry->cmd_buf_base)
+ goto out;
+
+ mmio_write64(entry->mmio_base + MMIO_CMD_BUF_OFFSET,
+ paging_hvirt2phys(entry->cmd_buf_base) |
+ ((u64)CMD_BUF_LEN_EXPONENT << BUF_LEN_EXPONENT_SHIFT));
+
+ entry->cmd_tail_ptr = 0;
+
+ /* Allocate and configure event log */
+ entry->evt_log_base = page_alloc(&mem_pool, PAGES(EVT_LOG_SIZE));
+ if (!entry->evt_log_base)
+ goto out_free_cmd_buf;
+
+ mmio_write64(entry->mmio_base + MMIO_EVT_LOG_OFFSET,
+ paging_hvirt2phys(entry->evt_log_base) |
+ ((u64)EVT_LOG_LEN_EXPONENT << BUF_LEN_EXPONENT_SHIFT));
+
+ return 0;
+
+out_free_cmd_buf:
+ page_free(&mem_pool, entry->cmd_buf_base, PAGES(CMD_BUF_SIZE));
+out:
+ return trace_error(err);
+}
+
+static int amd_iommu_init_pci(struct amd_iommu *entry,
+ struct jailhouse_iommu *iommu)
+{
+ u64 caps_header;
+ u32 lo, hi;
+
+ entry->bdf = iommu->amd_bdf;
+
+ /* Check alignment */
+ if (iommu->size & (iommu->size - 1))
+ return trace_error(-EINVAL);
+
+ /* Check that EFR is supported */
+ caps_header = pci_read_config(entry->bdf, iommu->amd_cap, 4);
+ if (!(caps_header & IOMMU_CAP_EFR))
+ return trace_error(-EINVAL);
+
+ lo = pci_read_config(
+ entry->bdf, iommu->amd_cap + CAPS_IOMMU_BASE_LOW, 4);
+ hi = pci_read_config(
+ entry->bdf, iommu->amd_cap + CAPS_IOMMU_BASE_HI, 4);
+
+ if ((lo & IOMMU_CAP_ENABLE) &&
+ (to_u64(hi, lo & ~IOMMU_CAP_ENABLE) != iommu->base)) {
+ printk("FATAL: IOMMU %d config is locked in invalid state. "
+ "Please reboot your system and try again.\n", entry->idx);
+ return trace_error(-EPERM);
+ }
+
+ /* Should be configured by BIOS, but we want to be sure */
+ pci_write_config(entry->bdf,
+ iommu->amd_cap + CAPS_IOMMU_BASE_HI,
+ (u32)(iommu->base >> 32), 4);
+ pci_write_config(entry->bdf,
+ iommu->amd_cap + CAPS_IOMMU_BASE_LOW,
+ (u32)(iommu->base & 0xffffffff) | IOMMU_CAP_ENABLE,
+ 4);
+
+ /* Store MSI capability pointer */
+ entry->msi_cap = pci_find_capability_by_id(entry->bdf, PCI_CAP_MSI);
+ if (!entry->msi_cap)
+ return trace_error(-EINVAL);
+
+ return 0;
+}
+
+static int amd_iommu_init_features(struct amd_iommu *entry,
+ struct jailhouse_iommu *iommu)
+{
+ unsigned char smi_filter_regcnt;
+ u64 efr, ctrl_reg = 0, smi_freg = 0, val;
+ unsigned int n, hats;
+ void *reg_base;
+
+ entry->features.raw = iommu->amd_features;
+
+ efr = amd_iommu_get_efr(entry);
+
+ /* Minimum HATS wins */
+ hats = amd_iommu_get_hats(entry);
+ if (amd_iommu_pt_levels < 0 || amd_iommu_pt_levels > hats + 4)
+ amd_iommu_pt_levels = hats + 4;
+
+ /*
+ * Require SMI Filter support. Enable and lock filter but
+ * mark all entries as invalid to disable SMI delivery.
+ */
+ if (!(efr & FEATURE_SMI_FSUP_MASK))
+ return trace_error(-EINVAL);
+
+ smi_filter_regcnt = (1 << (efr & FEATURE_SMI_FRC_MASK) >>
+ FEATURE_SMI_FRC_SHIFT);
+ for (n = 0; n < smi_filter_regcnt; n++) {
+ reg_base = entry->mmio_base + MMIO_SMI_FREG0_OFFSET + (n << 3);
+ smi_freg = mmio_read64(reg_base);
+
+ if (!(smi_freg & SMI_FREG_LOCKED)) {
+ /*
+ * Program unlocked register the way we need:
+ * invalid and locked.
+ */
+ mmio_write64(reg_base, SMI_FREG_LOCKED);
+ }
+ else if (smi_freg & SMI_FREG_VALID) {
+ /*
+ * The register is locked and programmed
+ * the way we don't want - error.
+ */
+ printk("ERROR: SMI Filter register %d is locked "
+ "and can't be reprogrammed. Please reboot "
+ "and check no other component uses the "
+ "IOMMU %d.\n", n, entry->idx);
+ return trace_error(-EPERM);
+ }
+ /*
+ * The register is locked, but programmed
+ * the way we need - OK to go.
+ */
+ }
+
+ ctrl_reg |= (CONTROL_SMIF_EN | CONTROL_SMIFLOG_EN);
+
+ /* Enable maximum Device Table segmentation possible */
+ entry->dev_tbl_seg_sup = (efr & FEATURE_SEG_SUP_MASK) >>
+ FEATURE_SEG_SUP_SHIFT;
+ if (entry->dev_tbl_seg_sup) {
+ val = entry->dev_tbl_seg_sup & CONTROL_SEG_SUP_MASK;
+ ctrl_reg |= val << CONTROL_SEG_SUP_SHIFT;
+ }
+
+ mmio_write64(entry->mmio_base + MMIO_CONTROL_OFFSET, ctrl_reg);
+
return 0;
}

+static void amd_iommu_enable_command_processing(struct amd_iommu *iommu)
+{
+ u64 ctrl_reg;
+
+ ctrl_reg = mmio_read64(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+ ctrl_reg |= CONTROL_IOMMU_EN | CONTROL_CMD_BUF_EN |
+ CONTROL_EVT_LOG_EN | CONTROL_EVT_INT_EN;
+ mmio_write64(iommu->mmio_base + MMIO_CONTROL_OFFSET, ctrl_reg);
+}
+
+unsigned int iommu_mmio_count_regions(struct cell *cell)
+{
+ return cell == &root_cell ? iommu_count_units() : 0;
+}
+
+int iommu_init(void)
+{
+ const struct paging *toplevel_paging = &amd_iommu_paging[0];
+ struct jailhouse_iommu *iommu;
+ struct amd_iommu *entry;
+ unsigned int n;
+ int err;
+
+ iommu = &system_config->platform_info.x86.iommu_units[0];
+ for (n = 0; iommu->base && n < iommu_count_units(); iommu++, n++) {
+ entry = &iommu_units[iommu_units_count];
+
+ entry->idx = n;
+
+ /* Protect against accidental VT-d configs. */
+ if (!iommu->amd_bdf)
+ return trace_error(-EINVAL);
+
+ /* Initialize PCI registers */
+ err = amd_iommu_init_pci(entry, iommu);
+ if (err)
+ return err;
+
+ /* Initialize MMIO space */
+ err = amd_iommu_init_mmio(entry, iommu);
+ if (err)
+ return err;
+
+ /* Setup IOMMU features */
+ err = amd_iommu_init_features(entry, iommu);
+ if (err)
+ return err;
+
+ /* Initialize command buffer and event log */
+ err = amd_iommu_init_buffers(entry, iommu);
+ if (err)
+ return err;
+
+ /* Enable the IOMMU */
+ amd_iommu_enable_command_processing(entry);
+
+ iommu_units_count++;
+ }
+
+ if (amd_iommu_pt_levels > AMD_IOMMU_MAX_PAGE_TABLE_LEVELS)
+ return trace_error(-ERANGE);
+
+ toplevel_paging += AMD_IOMMU_MAX_PAGE_TABLE_LEVELS - amd_iommu_pt_levels;
+ memcpy(real_amd_iommu_paging, toplevel_paging,
+ sizeof(struct paging) * amd_iommu_pt_levels);
+
+ /*
+ * Real page table has less levels than amd_iommu_paging - setup new
+ * top level paging structure.
+ */
+ if (amd_iommu_pt_levels != AMD_IOMMU_MAX_PAGE_TABLE_LEVELS) {
+ real_amd_iommu_paging[0].page_size = 0;
+ real_amd_iommu_paging[0].get_phys = paging_get_phys_invalid;
+ }
+
+ return iommu_cell_init(&root_cell);
+}
+
int iommu_cell_init(struct cell *cell)
{
/* TODO: Implement */
@@ -92,7 +432,50 @@ int iommu_map_interrupt(struct cell *cell, u16 device_id, unsigned int vector,

void iommu_shutdown(void)
{
- /* TODO: Implement */
+ struct amd_iommu *iommu;
+ u64 ctrl_reg;
+ u32 seg_size;
+ int n, err;
+ void *ptr;
+
+ for_each_iommu(iommu) {
+ /* Disable the IOMMU */
+ ctrl_reg = mmio_read64(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+ ctrl_reg &= ~(CONTROL_IOMMU_EN | CONTROL_CMD_BUF_EN |
+ CONTROL_EVT_LOG_EN | CONTROL_EVT_INT_EN);
+ mmio_write64(iommu->mmio_base + MMIO_CONTROL_OFFSET, ctrl_reg);
+
+ /* Free Device Table (and segments) */
+ seg_size = DEV_TABLE_SIZE / (1 << iommu->dev_tbl_seg_sup);
+ for (n = 0; n < DEV_TABLE_SEG_MAX; n++) {
+ ptr = iommu->devtable_segments[n];
+ if (ptr)
+ page_free(&mem_pool, ptr, PAGES(seg_size));
+ }
+
+ /* Free Command Buffer and Event Log */
+ page_free(&mem_pool, iommu->cmd_buf_base,
+ PAGES(CMD_BUF_SIZE));
+
+ /* Free Event Log */
+ page_free(&mem_pool, iommu->evt_log_base,
+ PAGES(EVT_LOG_SIZE));
+
+ /* Free MMIO */
+ err = paging_destroy(&hv_paging_structs,
+ (unsigned long)iommu->mmio_base,
+ iommu->mmio_size, PAGING_NON_COHERENT);
+ if (err < 0) {
+ printk("ERROR: IOMMU %d: Unable to destroy "
+ "MMIO page mappings\n", iommu->idx);
+ /*
+ * There is not much more else we can do,
+ * as we are shutting down already.
+ */
+ }
+ page_free(&remap_pool, iommu->mmio_base,
+ PAGES(iommu->mmio_size));
+ }
}

void iommu_check_pending_faults(void)
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:15 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Add iommu_cell_init() and iommu_cell_destroy() for amd_iommu.

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index f43bed7..0b5ff9b 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -374,7 +374,18 @@ int iommu_init(void)

int iommu_cell_init(struct cell *cell)
{
- /* TODO: Implement */
+ // HACK for QEMU
+ if (iommu_units_count == 0)
+ return 0;
+
+ if (cell->id > 0xffff)
+ return trace_error(-ERANGE);
+
+ cell->arch.amd_iommu.pg_structs.root_paging = real_amd_iommu_paging;
+ cell->arch.amd_iommu.pg_structs.root_table = page_alloc(&mem_pool, 1);
+ if (!cell->arch.amd_iommu.pg_structs.root_table)
+ return trace_error(-ENOMEM);
+
return 0;
}

@@ -404,7 +415,12 @@ void iommu_remove_pci_device(struct pci_device *device)

void iommu_cell_exit(struct cell *cell)
{
- /* TODO: Implement */
+ /* TODO: Again, this a copy of vtd.c:iommu_cell_exit */
+ // HACK for QEMU
+ if (iommu_units_count == 0)
+ return;
+
+ page_free(&mem_pool, cell->arch.amd_iommu.pg_structs.root_table, 1);
}

void iommu_config_commit(struct cell *cell_added_removed)
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:17 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Add basic infrastructure (heavily influenced by the Linux amd_iommu driver)
for submitting commands to the AMD IOMMU command buffer. For now, having only
INVALIDATE_IOMMU_PAGES and COMPLETION_WAIT seems to be sufficient.
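
The typical usage pattern, as seen in the later config-commit patch, is to
post one or more invalidation commands and then a COMPLETION_WAIT that flushes
the command buffer and spins until the IOMMU has finished (sketch only):

	/* Flush cached translations for the affected domain... */
	amd_iommu_invalidate_pages(iommu, cell->id & 0xffff);
	/* ...then execute all queued commands and wait for completion */
	amd_iommu_completion_wait(iommu);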

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 107 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 107 insertions(+)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index 7a73752..1f0d059 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -6,6 +6,10 @@
* Authors:
* Valentine Sinitsyn <valentine...@gmail.com>
*
+ * Commands posting and event log parsing code, as well as many defines
+ * were adapted from Linux's amd_iommu driver written by Joerg Roedel
+ * and Leo Duran.
+ *
* This work is licensed under the terms of the GNU GPL, version 2. See
* the COPYING file in the top-level directory.
*/
@@ -389,6 +393,55 @@ int iommu_cell_init(struct cell *cell)
return 0;
}

+static void amd_iommu_completion_wait(struct amd_iommu *iommu);
+
+#define CMD_BUF_DRAIN_MAX_ATTEMPTS 8
+
+static void amd_iommu_submit_command(struct amd_iommu *iommu,
+ struct cmd_buf_entry *cmd)
+{
+ u32 head, next_tail, bytes_free;
+ unsigned char *cur_ptr;
+ static bool drain = false;
+ static int drain_attempts = 0;
+
+again:
+ head = mmio_read64(iommu->mmio_base + MMIO_CMD_HEAD_OFFSET);
+ next_tail = (iommu->cmd_tail_ptr + sizeof(*cmd)) % CMD_BUF_SIZE;
+ /* XXX: Learn why this works :) */
+ bytes_free = (head - next_tail) % CMD_BUF_SIZE;
+
+ /* Leave some space for COMPLETION_WAIT that drains the buffer. */
+ if (bytes_free < 2 * sizeof(*cmd) && !drain) {
+ /* Drain the buffer */
+ drain = true;
+ amd_iommu_completion_wait(iommu);
+ drain = false;
+ goto again;
+ }
+
+ if (drain) {
+ /* Ensure we won't drain the buffer indefinitely */
+ if (++drain_attempts > CMD_BUF_DRAIN_MAX_ATTEMPTS) {
+ panic_printk("FATAL: IOMMU %d: "
+ "Failed to drain the command buffer\n",
+ iommu->idx);
+ panic_park();
+ }
+ } else {
+ /* Buffer drained - reset the counter */
+ drain_attempts = 0;
+ }
+
+ cur_ptr = &iommu->cmd_buf_base[iommu->cmd_tail_ptr];
+ memcpy(cur_ptr, cmd, sizeof(*cmd));
+
+ /* Just to be sure. */
+ arch_paging_flush_cpu_caches(cur_ptr, sizeof(*cmd));
+
+ iommu->cmd_tail_ptr = next_tail;
+}
+
int iommu_map_memory_region(struct cell *cell,
const struct jailhouse_memory *mem)
{
@@ -461,6 +514,60 @@ void iommu_cell_exit(struct cell *cell)
page_free(&mem_pool, cell->arch.amd_iommu.pg_structs.root_table, 1);
}

+static void wait_for_zero(volatile u64 *sem, unsigned long mask)
+{
+ /*
+ * TODO: We should really have some sort of timeout here,
+ * otherwise there is a risk of looping indefinitely blocking
+ * the hypervisor. However, this requires some sort of time
+ * keeping, so let's postpone this till the time it will be
+ * available in Jailhouse.
+ */
+ while (*sem & mask)
+ cpu_relax();
+}
+
+static void amd_iommu_invalidate_pages(struct amd_iommu *iommu,
+ u16 domain_id)
+{
+ /* Double braces to please GCC */
+ struct cmd_buf_entry invalidate_pages = {{ 0 }};
+
+ /*
+ * Flush everything, including PDEs, in whole address range, i.e.
+ * 0x7ffffffffffff000 with S bit (see Sect. 2.2.3).
+ */
+ invalidate_pages.data[1] = domain_id;
+ invalidate_pages.data[2] = 0xfffff000 |
+ CMD_INV_IOMMU_PAGES_SIZE_MASK |
+ CMD_INV_IOMMU_PAGES_PDE_MASK;
+ invalidate_pages.data[3] = 0x7fffffff;
+ CMD_SET_TYPE(&invalidate_pages, CMD_INV_IOMMU_PAGES);
+
+ amd_iommu_submit_command(iommu, &invalidate_pages);
+}
+
+static void amd_iommu_completion_wait(struct amd_iommu *iommu)
+{
+ /* Double braces to please GCC */
+ struct cmd_buf_entry completion_wait = {{ 0 }};
+ volatile u64 sem = 1;
+ long addr;
+
+ addr = paging_hvirt2phys(&sem);
+
+ completion_wait.data[0] = (addr & 0xfffffff8UL) |
+ CMD_COMPL_WAIT_STORE_MASK;
+ completion_wait.data[1] = (addr & 0x000fffff00000000UL) >> 32;
+ CMD_SET_TYPE(&completion_wait, CMD_COMPL_WAIT);
+
+ amd_iommu_submit_command(iommu, &completion_wait);
+ mmio_write64(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET,
+ iommu->cmd_tail_ptr);
+
+ wait_for_zero(&sem, -1);
+}
+
void iommu_config_commit(struct cell *cell_added_removed)
{
/* TODO: Implement */
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:17 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Implement iommu_map_memory_region() and iommu_unmap_memory_region()
for amd_iommu.

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 46 +++++++++++++++++++++++++++++++++++++----
1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index 0b5ff9b..7a73752 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -392,14 +392,52 @@ int iommu_cell_init(struct cell *cell)
int iommu_map_memory_region(struct cell *cell,
const struct jailhouse_memory *mem)
{
- /* TODO: Implement */
- return 0;
+ unsigned long flags = AMD_IOMMU_PTE_P, zero_bits;
+
+ // HACK for QEMU
+ if (iommu_units_count == 0)
+ return 0;
+
+ /*
+ * Check that the address is not outside scope of current page
+ * tables (given their level). Each level adds 9 bits and the offset
+ * is 12 bits long, so no bits higher than this should be set.
+ */
+ if (amd_iommu_pt_levels != AMD_IOMMU_MAX_PAGE_TABLE_LEVELS) {
+ zero_bits = ~((1ULL << (amd_iommu_pt_levels * 9 + 12)) - 1);
+ if (mem->virt_start & zero_bits)
+ return trace_error(-ERANGE);
+ }
+
+ if (!(mem->flags & JAILHOUSE_MEM_DMA))
+ return 0;
+
+ if (mem->flags & JAILHOUSE_MEM_READ)
+ flags |= AMD_IOMMU_PTE_IR;
+ if (mem->flags & JAILHOUSE_MEM_WRITE)
+ flags |= AMD_IOMMU_PTE_IW;
+
+ return paging_create(&cell->arch.amd_iommu.pg_structs, mem->phys_start,
+ mem->size, mem->virt_start, flags, PAGING_COHERENT);
}
+
int iommu_unmap_memory_region(struct cell *cell,
const struct jailhouse_memory *mem)
{
- /* TODO: Implement */
- return 0;
+ /*
+ * TODO: This is almost a complete copy of vtd.c counterpart
+ * (sans QEMU hack). Think of unification.
+ */
+
+ // HACK for QEMU
+ if (iommu_units_count == 0)
+ return 0;
+
+ if (!(mem->flags & JAILHOUSE_MEM_DMA))
+ return 0;
+
+ return paging_destroy(&cell->arch.amd_iommu.pg_structs, mem->virt_start,
+ mem->size, PAGING_COHERENT);
}

int iommu_add_pci_device(struct cell *cell, struct pci_device *device)
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:18 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Implement iommu_add_pci_device() for amd_iommu.

Basically, this is all about filling in the device table entry (DTE). However,
there is no way to allocate device tables sparsely with the AMD IOMMU. To save
some memory, Device Table Segmentation (Revision 2.6 and up of the
specification) is used whenever possible, which adds some infrastructure.
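
A worked example with hypothetical numbers: with dev_tbl_seg_sup == 3, the
2 MB device table is split into 8 segments of 256 KB each, and a device's BDF
selects the segment by its top bits (matching the code below):

	seg_shift = BITS_PER_SHORT - iommu->dev_tbl_seg_sup;	/* 16 - 3 = 13 */
	seg_mask = ~((1 << seg_shift) - 1);			/* 0xe000 as u16 */
	/* e.g. bdf 0x4321 -> segment 2, entry 0x0321 within that segment */
	seg_idx = (seg_mask & bdf) >> seg_shift;
	dte = &devtable_seg[bdf & ~seg_mask];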

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 151 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 148 insertions(+), 3 deletions(-)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index 1f0d059..d90fb35 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -493,15 +493,160 @@ int iommu_unmap_memory_region(struct cell *cell,
mem->size, PAGING_COHERENT);
}

+static void amd_iommu_inv_dte(struct amd_iommu *iommu, u16 device_id)
+{
+ /* Double braces to please GCC */
+ struct cmd_buf_entry invalidate_dte = {{ 0 }};
+
+ invalidate_dte.data[0] = device_id;
+ CMD_SET_TYPE(&invalidate_dte, CMD_INV_DEVTAB_ENTRY);
+
+ amd_iommu_submit_command(iommu, &invalidate_dte);
+}
+
+static struct dev_table_entry * get_dev_table_entry(struct amd_iommu *iommu,
+ u16 bdf, bool allocate)
+{
+ struct dev_table_entry *devtable_seg;
+ u8 seg_idx, seg_shift;
+ u64 reg_base, reg_val;
+ u16 seg_mask;
+ u32 seg_size;
+
+ /*
+ * FIXME: Device Table Segmentation is UNTESTED, as I don't have the hardware
+ * which supports this feature.
+ */
+ if (!iommu->dev_tbl_seg_sup) {
+ seg_mask = 0;
+ seg_idx = 0;
+ seg_size = DEV_TABLE_SIZE;
+ } else {
+ seg_shift = BITS_PER_SHORT - iommu->dev_tbl_seg_sup;
+ seg_mask = ~((1 << seg_shift) - 1);
+ seg_idx = (seg_mask & bdf) >> seg_shift;
+ seg_size = DEV_TABLE_SIZE / (1 << iommu->dev_tbl_seg_sup);
+ }
+
+ /*
+ * Device table segmentation is tricky in Jailhouse. As cells can
+ * "share" the IOMMU, we don't know maximum bdf in each segment
+ * because cells are initialized independently. Thus, we can't simply
+ * adjust segment sizes for our maximum bdfs.
+ *
+ * The next best thing is to allocate segments lazily as we add
+ * devices, using the maximum possible size for segments. In the worst
+ * case scenario, we waste a chunk of around 2M per IOMMU.
+ */
+ devtable_seg = iommu->devtable_segments[seg_idx];
+ if (!devtable_seg) {
+ /* If we are not permitted to allocate, just fail */
+ if (!allocate)
+ return NULL;
+
+ devtable_seg = page_alloc(&mem_pool, PAGES(seg_size));
+ if (!devtable_seg)
+ return NULL;
+ iommu->devtable_segments[seg_idx] = devtable_seg;
+
+ if (!seg_idx)
+ reg_base = MMIO_DEV_TABLE_BASE;
+ else
+ reg_base = MMIO_DEV_TABLE_SEG_BASE + (seg_idx - 1) * 8;
+
+ /* Size in Kbytes = (m + 1) * 4, see Sect 3.3.6 */
+ reg_val = paging_hvirt2phys(devtable_seg) |
+ (seg_size / PAGE_SIZE - 1);
+ mmio_write64(iommu->mmio_base + reg_base, reg_val);
+ }
+
+ return &devtable_seg[bdf & ~seg_mask];
+}
+
int iommu_add_pci_device(struct cell *cell, struct pci_device *device)
{
- /* TODO: Implement */
- return 0;
+ struct dev_table_entry *dte = NULL;
+ struct amd_iommu *iommu;
+ u8 iommu_idx;
+ int err = 0;
+ u16 bdf;
+
+ // HACK for QEMU
+ if (iommu_units_count == 0)
+ return 0;
+
+ if (device->info->type == JAILHOUSE_PCI_TYPE_IVSHMEM)
+ return 0;
+
+ iommu_idx = device->info->iommu;
+ if (iommu_idx > JAILHOUSE_MAX_IOMMU_UNITS) {
+ err = -ERANGE;
+ goto out;
+ }
+
+ iommu = &iommu_units[iommu_idx];
+ bdf = device->info->bdf;
+
+ dte = get_dev_table_entry(iommu, bdf, true);
+ if (!dte) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ memset(dte, 0, sizeof(*dte));
+
+ /* DomainID */
+ dte->data[1] = cell->id & 0xffff;
+
+ /* Translation information */
+ dte->data[0] = AMD_IOMMU_PTE_IR | AMD_IOMMU_PTE_IW |
+ paging_hvirt2phys(cell->arch.amd_iommu.pg_structs.root_table) |
+ ((amd_iommu_pt_levels & DTE_MODE_MASK) << DTE_MODE_SHIFT) |
+ DTE_TRANSLATION_VALID | DTE_VALID;
+
+ /* TODO: Interrupt remapping. For now, just forward them unmapped. */
+
+ /* Flush caches, just to be sure. */
+ arch_paging_flush_cpu_caches(dte, sizeof(*dte));
+
+ amd_iommu_inv_dte(iommu, bdf);
+
+out:
+ return trace_error(err);
}

void iommu_remove_pci_device(struct pci_device *device)
{
- /* TODO: Implement */
+ struct dev_table_entry *dte = NULL;
+ struct amd_iommu *iommu;
+ u8 iommu_idx;
+ u16 bdf;
+
+ // HACK for QEMU
+ if (iommu_units_count == 0)
+ return;
+
+ if (device->info->type == JAILHOUSE_PCI_TYPE_IVSHMEM)
+ return;
+
+ iommu_idx = device->info->iommu;
+ if (iommu_idx > JAILHOUSE_MAX_IOMMU_UNITS)
+ return;
+
+ iommu = &iommu_units[iommu_idx];
+ bdf = device->info->bdf;
+
+ dte = get_dev_table_entry(iommu, bdf, false);
+ if (!dte)
+ return;
+
+ /* Clear *_VALID flags */
+ dte->data[0] = 0;
+
+ /* Flush caches, just to be sure. */
+ arch_paging_flush_cpu_caches(dte, sizeof(*dte));
+
+ amd_iommu_inv_dte(iommu, bdf);
}

void iommu_cell_exit(struct cell *cell)
--
2.5.1

Valentine Sinitsyn

Sep 12, 2015, 8:06:20 AM
to jailho...@googlegroups.com, Valentine Sinitsyn
Implement the functions that apply a cell's configuration to the IOMMUs.
In case something goes wrong, we need to trigger an NMI, which
amd_iommu_init_fault_nmi() sets up.

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 74 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index d90fb35..c9de3cb 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -713,9 +713,81 @@ static void amd_iommu_completion_wait(struct amd_iommu *iommu)
wait_for_zero(&sem, -1);
}

+/* Parts of this code derives from vtd_init_fault_nmi(). */
+static void amd_iommu_init_fault_nmi(void)
+{
+ union pci_msi_registers msi = {{ 0 }};
+ struct per_cpu *cpu_data;
+ struct amd_iommu *iommu;
+ int n;
+
+ /*
+ * This assumes that at least one bit is set somewhere because we
+ * don't support configurations where Linux is left with no CPUs.
+ */
+ for (n = 0; root_cell.cpu_set->bitmap[n] == 0; n++)
+ /* Empty loop */;
+ cpu_data = per_cpu(ffsl(root_cell.cpu_set->bitmap[n]));
+
+ /*
+ * Save this value globally to avoid multiple reports of the same
+ * case from different CPUs.
+ */
+ fault_reporting_cpu_id = cpu_data->cpu_id;
+
+ for_each_iommu(iommu) {
+ msi.raw[0] = pci_read_config(iommu->bdf, iommu->msi_cap, 4);
+
+ /* Disable MSI during interrupt reprogramming */
+ msi.msg32.enable = 0;
+ pci_write_config(iommu->bdf, iommu->msi_cap, msi.raw[0], 4);
+
+ /* Send NMI to fault_reporting_cpu */
+ msi.msg64.address = (MSI_ADDRESS_VALUE << MSI_ADDRESS_SHIFT) |
+ ((cpu_data->apic_id << 12) & 0xff000);
+ msi.msg64.data = MSI_DM_NMI;
+
+ /* Enable MSI back */
+ msi.msg32.enable = 1;
+
+ /* Write new MSI capabilty block */
+ for (n = 3; n >= 0; n--)
+ pci_write_config(iommu->bdf, iommu->msi_cap + 4 * n,
+ msi.raw[n], 4);
+ }
+
+ /*
+ * There is a race window in between we change fault_reporting_cpu_id
+ * and actually reprogram the MSI. To prevent event loss, signal an
+ * interrupt when done, so iommu_check_pending_faults() is called
+ * upon completion even if no further NMIs due to events would occurr.
+ *
+ * Note we can't simply use CMD_COMPL_WAIT_INT_MASK in
+ * amd_iommu_completion_wait(), as it seems that IOMMU either signal
+ * an interrupt or do memory write, but not both.
+ */
+ apic_send_nmi_ipi(cpu_data);
+}
+
void iommu_config_commit(struct cell *cell_added_removed)
{
- /* TODO: Implement */
+ struct amd_iommu *iommu;
+
+ /* Ensure we'll get NMI on comletion, or if anything goes wrong. */
+ if (cell_added_removed)
+ amd_iommu_init_fault_nmi();
+
+ for_each_iommu(iommu) {
+ /* Flush caches */
+ if (cell_added_removed) {
+ amd_iommu_invalidate_pages(iommu,
+ cell_added_removed->id & 0xffff);
+ amd_iommu_invalidate_pages(iommu,
+ root_cell.id & 0xffff);
+ }
+ /* Execute all commands in the buffer */
+ amd_iommu_completion_wait(iommu);
+ }
}

struct apic_irq_message iommu_get_remapped_root_int(unsigned int iommu,
--
2.5.1

Valentine Sinitsyn

unread,
Sep 12, 2015, 8:06:21 AM9/12/15
to jailho...@googlegroups.com, Valentine Sinitsyn
Add functions to read event logs AMD IOMMU provides and print their
contents. The latter is rather basic, but decoding all possible log
entries is hairy, so we'd better wait and collect stats on which
problems occur most often.

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 92 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 92 insertions(+)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index c9de3cb..b1c287d 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -856,6 +856,98 @@ void iommu_shutdown(void)
}
}

+static void amd_iommu_print_event(struct amd_iommu *iommu,
+ struct evt_log_entry *entry)
+{
+ u8 evt_code = (entry->data[1] >> 28) & 0xf;
+ u64 op1, op2;
+
+ op1 = to_u64(entry->data[1] & 0x0fffffff, entry->data[0]);
+ op2 = to_u64(entry->data[3], entry->data[2]);
+
+ /* TODO: Can we handle these errors more gracefully? */
+ if (evt_code == EVENT_TYPE_ILL_CMD_ERR) {
+ printk("FATAL: IOMMU %d reported ILLEGAL_COMMAND_ERROR\n",
+ iommu->idx);
+ panic_stop();
+ }
+
+ if (evt_code == EVENT_TYPE_CMD_HW_ERR) {
+ printk("FATAL: IOMMU %d reported COMMAND_HARDWARE_ERROR\n",
+ iommu->idx);
+ panic_stop();
+ }
+
+ /*
+ * TODO: For now, very basic printer. Consider adapting
+ * iommu_print_event() from the Linux kerel (amd_iommu.c).
+ */
+ printk("AMD IOMMU %d reported event\n", iommu->idx);
+ /* Exclude EVENT_COUNTER_ZERO, as it doesn't report domain ID. */
+ if (evt_code != EVENT_TYPE_EVT_CNT_ZERO)
+ printk(" DeviceId (bus:dev.func): %02x:%02x.%x\n",
+ PCI_BDF_PARAMS(entry->data[0] & 0xffff));
+
+ printk(" EventCode: %lx Operand 1: %lx, Operand 2: %lx\n",
+ evt_code, op1, op2);
+}
+
+static void amd_iommu_restart_event_log(struct amd_iommu *iommu)
+{
+ void *base = iommu->mmio_base;
+
+ wait_for_zero(base + MMIO_STATUS_OFFSET, MMIO_STATUS_EVT_RUN_MASK);
+
+ mmio_write64_field(base + MMIO_CONTROL_OFFSET, CONTROL_EVT_LOG_EN, 0);
+
+ /* Simply start from the scratch */
+ mmio_write64(base + MMIO_EVT_HEAD_OFFSET, 0);
+ mmio_write64(base + MMIO_EVT_TAIL_OFFSET, 0);
+
+ /* Clear EventOverflow (RW1C) */
+ mmio_write64_field(base + MMIO_STATUS_OFFSET,
+ MMIO_STATUS_EVT_OVERFLOW_MASK, 1);
+
+ /* Bring logging back */
+ mmio_write64_field(base + MMIO_CONTROL_OFFSET, CONTROL_EVT_LOG_EN, 1);
+}
+
+static void amd_iommu_poll_events(struct amd_iommu *iommu)
+{
+ struct evt_log_entry *evt;
+ u32 head, tail;
+ u64 status;
+
+ status = mmio_read64(iommu->mmio_base + MMIO_STATUS_OFFSET);
+
+ if (status & MMIO_STATUS_EVT_OVERFLOW_MASK) {
+ printk("IOMMU %d: Event Log overflow occurred, "
+ "some events were lost!\n", iommu->idx);
+ amd_iommu_restart_event_log(iommu);
+ }
+
+ while (status & MMIO_STATUS_EVT_INT_MASK) {
+ /* Clear EventLogInt (RW1C) */
+ mmio_write64_field(iommu->mmio_base + MMIO_STATUS_OFFSET,
+ MMIO_STATUS_EVT_INT_MASK, 1);
+
+ head = mmio_read32(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
+ tail = mmio_read32(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET);
+
+ while (head != tail) {
+ evt = (struct evt_log_entry *)(
+ iommu->evt_log_base + head);
+ amd_iommu_print_event(iommu, evt);
+ head = (head + sizeof(*evt)) % EVT_LOG_SIZE;
+ }
+
+ mmio_write32(iommu->evt_log_base + MMIO_EVT_HEAD_OFFSET, head);
+
+ /* Re-read status to catch new events, as Linux does */
+ status = mmio_read64(iommu->mmio_base + MMIO_STATUS_OFFSET);
+ }
+}
+
void iommu_check_pending_faults(void)

Valentine Sinitsyn

unread,
Sep 12, 2015, 8:06:22 AM9/12/15
to jailho...@googlegroups.com, Valentine Sinitsyn
Add iommu_check_pending_faults() for amd_iommu. This looks into the
Hardware Event Register first, and then loops over the event log,
printing what's in it. This way, we don't miss errors that happen
when event logging is unavailable.

Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
---
hypervisor/arch/x86/amd_iommu.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
index b1c287d..dc724b2 100644
--- a/hypervisor/arch/x86/amd_iommu.c
+++ b/hypervisor/arch/x86/amd_iommu.c
@@ -948,9 +948,46 @@ static void amd_iommu_poll_events(struct amd_iommu *iommu)
}
}

+static void amd_iommu_handle_hardware_event(struct amd_iommu *iommu)
+{
+ struct evt_log_entry hev_entry;
+ u64 hev, buf;
+
+ hev = mmio_read64(iommu->mmio_base + MMIO_HEV_OFFSET);
+
+ /* Check if hardware event is present and print it */
+ if (hev & MMIO_HEV_VALID) {
+ if (hev & MMIO_HEV_OVERFLOW) {
+ printk("IOMMU %d: Hardware Event Overflow occurred, "
+ "some events were lost!\n", iommu->idx);
+ }
+ buf = mmio_read64(
+ iommu->mmio_base + MMIO_HEV_UPPER_OFFSET);
+ memcpy(hev_entry.data, &buf, sizeof(buf));
+ buf = mmio_read64(
+ iommu->mmio_base + MMIO_HEV_LOWER_OFFSET);
+ memcpy(&hev_entry.data[2], &buf, sizeof(buf));
+
+ amd_iommu_print_event(iommu, &hev_entry);
+
+ /* Clear Hardware Event */
+ mmio_write64(iommu->mmio_base + MMIO_HEV_OFFSET, 0);
+ }
+}
+
void iommu_check_pending_faults(void)
{
- /* TODO: Implement */
+ struct amd_iommu *iommu;
+
+ if (this_cpu_data()->cpu_id != fault_reporting_cpu_id)
+ return;
+
+ for_each_iommu(iommu) {
+ /* Handle Hardware Event, if supported */
+ if (amd_iommu_get_he_sup(iommu))
+ amd_iommu_handle_hardware_event(iommu);
+ amd_iommu_poll_events(iommu);
+ }
}

bool iommu_cell_emulates_ir(struct cell *cell)
--
2.5.1

Jan Kiszka

unread,
Sep 14, 2015, 2:46:01 AM9/14/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Hi all,
>
> This patchset implements AMD-Vi (AMD IOMMU) support. Only memory translation
> (DMAR equivalent) is implemented so far, interrupt remapping support is still
> missing.
>
> There could be some issues with event logging, non-root cells and features I
> don't have on my hardware, like Device Table Segmentation. Please report bugs.
> Otherwise, I feel the patchset is suitable for initial review.

Queued.

>
> One more thing worth noting is that transactions coming from PCI devices
> unlisted in cell configs will currently pass untranslated. Should we
> target-abort them instead?

Yes, that would be consistent with VT-d then: unknown devices are
blocked for any transaction.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

Jan Kiszka

unread,
Sep 14, 2015, 2:49:16 AM9/14/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> +static int amd_iommu_init_pci(struct amd_iommu *entry,
> + struct jailhouse_iommu *iommu)
> +{

...

> + /* Store MSI capability pointer */
> + entry->msi_cap = pci_find_capability_by_id(entry->bdf, PCI_CAP_MSI);
> + if (!entry->msi_cap)
> + return trace_error(-EINVAL);

You could add this number to jailhouse_iommu. Then you wouldn't need to
search for it and could drop patch 2.
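E.g. something along these lines (the field name is just a placeholder
here, not a final proposal):

	/* in amd_iommu_init_pci(), with the offset taken from the config */
	entry->msi_cap = iommu->amd_msi_cap;
	if (!entry->msi_cap)
		return trace_error(-EINVAL);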

Valentine Sinitsyn

unread,
Sep 14, 2015, 2:51:21 AM9/14/15
to Jan Kiszka, jailho...@googlegroups.com
Hi Jan,

On 14.09.2015 11:45, Jan Kiszka wrote:
> On 2015-09-12 14:05, Valentine Sinitsyn wrote:
>> Hi all,
>>
>> This patchset implements AMD-Vi (AMD IOMMU) support. Only memory translation
>> (DMAR equivalent) is implemented so far, interrupt remapping support is still
>> missing.
>>
>> There could be some issues with event logging, non-root cells and features I
I played with event logging this weekend, and it was spotty. I can't see
anything I'm doing wrong in my code, so maybe my hardware is the reason. It
would be nice if you could try it on some other board. Just unmap some
JAILHOUSE_MEM_DMA regions from the config, and you should see lots of
IO_PAGE_FAULT events reported.

>> don't have on my hardware, like Device Table Segmentation. Please report bugs.
>> Otherwise, I feel the patchset is suitable for initial review.
>
> Queued.
Thanks.

>> One more thing worth noting is that transactions coming from PCI devices
>> unlisted in cell configs will currently pass untranslated. Should we
>> target-abort them instead?
>
> Yes, that would be consistent with VT-d then: unknown devices are
> blocked for any transaction.
Okay, will look into it.

Valentine

Jan Kiszka

unread,
Sep 14, 2015, 2:54:42 AM9/14/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> diff --git a/hypervisor/include/jailhouse/cell-config.h b/hypervisor/include/jailhouse/cell-config.h
> index 261d9c7..aebe007 100644
> --- a/hypervisor/include/jailhouse/cell-config.h
> +++ b/hypervisor/include/jailhouse/cell-config.h
> @@ -118,6 +118,13 @@ struct jailhouse_pci_capability {
>
> #define JAILHOUSE_MAX_IOMMU_UNITS 8
>
> +struct jailhouse_iommu {
> + __u64 base;
> + __u64 size;

__u32 size should be enough for this device.

> + __u16 amd_bdf;
> + __u16 amd_cap;

As suggested before:

+ __u8 amd_msi_cap_ofs;

> +} __attribute__((packed));
> +
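Putting both suggestions together, the struct could look roughly like
this (sketch only, field names still to be settled):

	struct jailhouse_iommu {
		__u64 base;
		__u32 size;
		__u16 amd_bdf;
		__u16 amd_cap;
		__u8 amd_msi_cap_ofs;
	} __attribute__((packed));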

Jan Kiszka

unread,
Sep 25, 2015, 8:40:35 AM9/25/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Hi all,
>
> This patchset implements AMD-Vi (AMD IOMMU) support. Only memory translation
> (DMAR equivalent) is implemented so far, interrupt remapping support is still
> missing.
>
> There could be some issues with event logging, non-root cells and features I
> don't have on my hardware, like Device Table Segmentation. Please report bugs.
> Otherwise, I feel the patchset is suitable for initial review.

Just a brief note that the code runs on the Carrizo board, but only the
root cell was tested so far. Looking for a non-root cell test case now.

How do I find out if I'm stressing any of the mentioned extended features?

Valentine Sinitsyn

unread,
Sep 25, 2015, 9:39:24 AM9/25/15
to Jan Kiszka, jailho...@googlegroups.com
Hi Jan,

On 25.09.2015 17:40, Jan Kiszka wrote:
> On 2015-09-12 14:05, Valentine Sinitsyn wrote:
>> Hi all,
>>
>> This patchset implements AMD-Vi (AMD IOMMU) support. Only memory translation
>> (DMAR equivalent) is implemented so far, interrupt remapping support is still
>> missing.
>>
>> There could be some issues with event logging, non-root cells and features I
>> don't have on my hardware, like Device Table Segmentation. Please report bugs.
>> Otherwise, I feel the patchset is suitable for initial review.
>
> Just a brief note that the code runs on the Carrizo board, but only the
> root cell was tested so far. Looking for non-root cell test case now.
Good to know, thanks.

> How do I find out if I'm stressing any of the mentioned extended features?
First, jailhouse-config-create should generate a correct config for your
board, or at least you shouldn't have to fix IOMMU-related things
manually. That is an indirect test for the IVRS parsing code.

For Device Table Segmentation, the easiest thing I guess is to printk()
entry->dev_tbl_seg_sup in amd_iommu_init_features(). Any non-zero value
means you have this feature.
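Something like this should do (throwaway debug code, of course):

	/* e.g. at the end of amd_iommu_init_features() */
	printk("IOMMU %d: DevTblSegSup = %d\n",
	       entry->idx, entry->dev_tbl_seg_sup);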

Event logging is trickier, as you don't get an event unless something
goes wrong with an IOMMU. Perhaps try to strip JAILHOUSE_MEM_DMA off the
root cell config. Of course, you'll get a lock-up, but you should see
lots of events printed on the serial console.
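For example, take the DMA flag off one of the RAM regions in the root
cell config (the region values below are made up, just to illustrate):

	/* RAM */
	{
		.phys_start = 0x0,
		.virt_start = 0x0,
		.size = 0x27000000,
		.flags = JAILHOUSE_MEM_READ | JAILHOUSE_MEM_WRITE |
			JAILHOUSE_MEM_EXECUTE, /* JAILHOUSE_MEM_DMA dropped */
	},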

HTH,
Valentine

Jan Kiszka

unread,
Sep 25, 2015, 12:29:32 PM9/25/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-25 15:39, Valentine Sinitsyn wrote:
> Hi Jan,
>
> On 25.09.2015 17:40, Jan Kiszka wrote:
>> On 2015-09-12 14:05, Valentine Sinitsyn wrote:
>>> Hi all,
>>>
>>> This patchset implements AMD-Vi (AMD IOMMU) support. Only memory
>>> translation
>>> (DMAR equivalent) is implemented so far, interrupt remapping support
>>> is still
>>> missing.
>>>
>>> There could be some issues with event logging, non-root cells and
>>> features I
>>> don't have on my hardware, like Device Table Segmentation. Please
>>> report bugs.
>>> Otherwise, I feel the patchset is suitable for initial review.
>>
>> Just a brief note that the code runs on the Carrizo board, but only the
>> root cell was tested so far. Looking for non-root cell test case now.
> Good to know, thanks.
>
>> How do I find out if I'm stressing any of the mentioned extended
>> features?
> First, jailhouse-config-tool should generate correct config for your
> board, or at least you shouldn't have to fix IOMMU-related things
> manually. That is indirect test for IVRS parsing code.

I used those bits unmodified - passed.

>
> For Device Table Segmentation, the easiest thing I guess is to printk()
> entry->dev_tbl_seg_sup in amd_iommu_init_features(). Any non-zero value
> means you have this feature.

Done, and it has that feature - passed.

>
> Event logging is trickier, as you don't get an event unless something
> goes wrong with an IOMMU. Perhaps try to strip JAILHOUSE_MEM_DMA off the
> root cell config. Of course, you'll get a lock-up, but you should see
> lots of events printed on serial console.

AMD IOMMU 0 reported event
DeviceId (bus:dev.func): 01:00.0
EventCode: 2 Operand 1: 100, Operand 2: 36d97500
AMD IOMMU 0 reported event
DeviceId (bus:dev.func): 00:11.0
EventCode: 2 Operand 1: 88, Operand 2: 37fd0000

Passed.

What does not pass, but that's unrelated to your patches, is "jailhouse
disable" - we corrupt some userspace state on AMD during handover,
apparently.

Valentine Sinitsyn

unread,
Sep 25, 2015, 12:36:39 PM9/25/15
to Jan Kiszka, jailho...@googlegroups.com
Heh, this means my own board is buggy, as I only get these messages
sporadically. Too bad; but nice to see my small tests passed. Thanks.

> What does not pass, but that's unrelated to your patches, is "jailhouse
> disable" - we corrupt some userspace state on AMD during handover,
> apparently.
I tried re-enabling Jailhouse, and it worked for me. Looks like a bug
has sneaked in there. Could you share the relevant parts of the serial
output with me?

Valentine

Jan Kiszka

unread,
Sep 25, 2015, 1:02:15 PM9/25/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-25 18:36, Valentine Sinitsyn wrote:
>> What does not pass, but that's unrelated to your patches, is "jailhouse
>> disable" - we corrupt some userspace state on AMD during handover,
>> apparently.
> I tried re-enabling Jailhouse, and it worked for me. Looks like a bug
> has sneaked there. Could you share relevant parts of serial output for me?

"jailhouse disable" generates a segmentation fault, and the kernel
reports a bug in its userspace. I suspect some MSR stuff is not properly
restored here. Trying to bisect right now, if this isn't a regression of
my refactorings back then.... yeah, it probably is. Bisecting further.

Valentine Sinitsyn

unread,
Oct 17, 2015, 11:51:56 AM10/17/15
to Jailhouse
Hi,

On 12.09.2015 17:05, Valentine Sinitsyn wrote:
> Implement iommu_init() and iommu_shutdown() for AMD-based systems.
...
Note I block SMIs completely here (I believe the VT-d code does the same).
Although it should do the trick for now, I think we'll need more
intelligent SMI handling in the future, possibly including SMM mode
virtualization. Otherwise, enabling Jailhouse can potentially block a
BMC controller from communicating with the CPU (at least on AMD), which
is probably not what we want.

Valentine

Valentine Sinitsyn

unread,
Nov 23, 2015, 1:42:29 AM11/23/15
to Jan Kiszka, jailho...@googlegroups.com
Hi Jan,

On 14.09.2015 11:45, Jan Kiszka wrote:
> On 2015-09-12 14:05, Valentine Sinitsyn wrote:
>> Hi all,
>>
>> This patchset implements AMD-Vi (AMD IOMMU) support. Only memory translation
>> (DMAR equivalent) is implemented so far, interrupt remapping support is still
>> missing.
>>
>> There could be some issues with event logging, non-root cells and features I
>> don't have on my hardware, like Device Table Segmentation. Please report bugs.
>> Otherwise, I feel the patchset is suitable for initial review.
>
> Queued.
Just a quick ping here. There are a few changes suggested in this thread
- are they the only ones? Or has the review just stopped for some reason?

Thanks,
Valentine

Jan Kiszka

unread,
Nov 25, 2015, 4:47:04 AM11/25/15
to Valentine Sinitsyn, jailho...@googlegroups.com
It was suspended. I only tested the patches (think I reported that), and
then moved to something else. Let me try to resume that. But any updates
addressing the points already brought up are welcome as well.

Jan Kiszka

unread,
Dec 27, 2015, 6:09:08 AM12/27/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
As mentioned in the first ad-hoc review, I would prefer to avoid this
extension and instead go for pre-discovered cap offsets as part of the
config.

Jan

Jan Kiszka

unread,
Dec 27, 2015, 6:09:17 AM12/27/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
Reviewed-by: Jan Kiszka <jan.k...@siemens.com>

Jan Kiszka

unread,
Dec 27, 2015, 6:09:22 AM12/27/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
Reviewed-by: Jan Kiszka <jan.k...@siemens.com>


Jan Kiszka

unread,
Dec 27, 2015, 6:09:48 AM12/27/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
Check regarding what?

The renaming is planned for the next round?
Trailing whitespace.
You could align ^^^ this without creating an
over-long line - unless you also put 'type' into braces, like 'cmd'.

> +
> +#define EVENT_TYPE_ILL_CMD_ERR 0x05
> +#define EVENT_TYPE_CMD_HW_ERR 0x06
> +#define EVENT_TYPE_EVT_CNT_ZERO 0x0a
> +
> +#define MSI_ADDRESS_SHIFT 20
> +
> +struct dev_table_entry {
> + u64 data[4];
> +} __attribute__((packed));
> +
> +struct cmd_buf_entry {
> + u32 data[4];
> +} __attribute__((packed));
> +
> +struct evt_log_entry {
> + u32 data[4];
> +} __attribute__((packed));

Nit: you don't need to pack those structs, they are well aligned on the
target architecture by nature.
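I.e. simply:

	struct dev_table_entry {
		u64 data[4];
	};

and likewise for the other two.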

>
> #endif
>

Jan

Jan Kiszka

unread,
Dec 27, 2015, 6:10:30 AM12/27/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
If it's simple to clean up, resolving the TODO(s) would be welcome for
the next round.

> +#define AMD_IOMMU_PTE_MODE_MASK (DTE_MODE_MASK << DTE_MODE_SHIFT)
> +#define PM_LEVEL_ENC(x) (((x) << DTE_MODE_SHIFT) & AMD_IOMMU_PTE_MODE_MASK)
> +
> +static bool amd_iommu_entry_valid(pt_entry_t pte, unsigned long flags)
> +{
> + return (*pte & flags) == flags;
> +}
> +
> +static unsigned long amd_iommu_get_flags(pt_entry_t pte)
> +{
> + return *pte & 0x7800000000000001;
> +}
> +
> +/* TODO: Return my macros after debugging */

???
That means l6 and l7 or only l6?

When would we need that many levels? Would be good to document the
implications of leaving out some level(s).
Hmm, that's already a seriously large page - 256 TB. I doubt we need
that soon, not to speak of the next larger level.
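(Spelling out the arithmetic, assuming the usual 9 bits per level on top
of the 12-bit page offset:

	/* bytes covered by a single entry at the given level */
	static inline u64 pte_coverage(unsigned int level)
	{
		return 1ULL << (12 + 9 * (level - 1));
	}
	/* levels 1..6: 4 KB, 2 MB, 1 GB, 512 GB, 256 TB, 128 PB */

so 256 TB is one entry at level 5, and the next larger level is 128 PB.)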

However, amd_iommu_paging has to contain all levels, we can't configure
the hardware to expect less, can we? If we could, I bet it makes sense
to do so, in order to save TLB entries and speed up the cache misses.
Jan

Valentine Sinitsyn

unread,
Dec 28, 2015, 1:21:40 AM12/28/15
to Jan Kiszka, jailho...@googlegroups.com
Hi Jan,
Looks like I left some internal TODO garbage in the code (you've already
spotted two occurrences). Nevermind, I will remove it in the next round.
Yes, but I preferred to be explicit.

Valentine

Valentine Sinitsyn

unread,
Dec 28, 2015, 1:25:23 AM12/28/15
to Jan Kiszka, jailho...@googlegroups.com
Level 7 is special, as the AMD IOMMU supports 6-level page tables.
Basically, it means to skip some levels. No implementation I know of
supports this now.
OK. I feel that trying to support the whole 6 levels wasn't a good idea
in the first place, and I'll probably revert to the initial 6 levels, as
the Linux kernel does IIRC. Upper levels are always unused in the current
Jailhouse implementation, as I patch the paging structures in amd_iommu.c
Valentine

Jan Kiszka

unread,
Dec 29, 2015, 2:37:09 AM12/29/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Implement iommu_init() and iommu_shutdown() for AMD-based systems.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 393 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 388 insertions(+), 5 deletions(-)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index 1a076fb..f43bed7 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -13,25 +13,365 @@
> #include <jailhouse/entry.h>
> #include <jailhouse/cell.h>
> #include <jailhouse/cell-config.h>
> +#include <jailhouse/control.h>
> +#include <jailhouse/mmio.h>
> #include <jailhouse/pci.h>
> #include <jailhouse/printk.h>
> +#include <jailhouse/string.h>
> #include <jailhouse/types.h>
> +#include <asm/amd_iommu.h>
> +#include <asm/amd_iommu_paging.h>
> #include <asm/apic.h>
> #include <asm/iommu.h>
> #include <asm/percpu.h>
>
> -unsigned int iommu_mmio_count_regions(struct cell *cell)
> +/* Allocate minimum space possible (4K or 256 entries) */
> +
> +#define buffer_size(name, entry) ((1 << name##_LEN_EXPONENT) * \
> + sizeof(entry))
> +
> +#define CMD_BUF_LEN_EXPONENT 8
> +#define EVT_LOG_LEN_EXPONENT 8
> +
> +#define CMD_BUF_SIZE buffer_size(CMD_BUF, struct cmd_buf_entry)
> +#define EVT_LOG_SIZE buffer_size(EVT_LOG, struct evt_log_entry)
> +
> +#define BITS_PER_SHORT 16
> +
> +static struct amd_iommu {
> + int idx;
> + void *mmio_base;
> + int mmio_size;
> + /* Command Buffer, Event Log */

Kind of redundant comment, given the readable variable names...

> + unsigned char *cmd_buf_base;
> + unsigned char *evt_log_base;
> + /* Device table */
> + void *devtable_segments[DEV_TABLE_SEG_MAX];
> + u8 dev_tbl_seg_sup;
> + u32 cmd_tail_ptr;
> +
> + u16 bdf;
> + u16 msi_cap;
> +
> + /* ACPI overrides for feature bits */
> + union {
> + u32 raw;
> + struct {
> + u8 __pad0:7;
> + u8 he_sup:1;
> + u32 __pad1:22;
> + u8 hats:2;
> + };
> + } features;
> +} iommu_units[JAILHOUSE_MAX_IOMMU_UNITS];
> +
> +#define for_each_iommu(iommu) for(iommu = iommu_units; \

"... for (..."

> + iommu < iommu_units + iommu_units_count; \
> + iommu++)

Indention should use more tabs, less spaces here.

> +
> +#define to_u64(hi, lo) ((u64)(hi) << 32 | (lo))
> +
> +static unsigned int iommu_units_count;
> +
> +static unsigned int fault_reporting_cpu_id;
> +
> +/*
> + * XXX: The initial idea was to have this equals to minimum value for

"equal"? A non-native speaking attempt to check.

> + * all IOMMUs in the system. It's not clear how to reuse paging.c functions
> + * for 6 page table levels, so we'll just set it to constant value of four.
> + */
> +static int amd_iommu_pt_levels = 4;

This is still conceptually unclear to me: amd_iommu_pt_levels will never
be larger than 4, given the current code, no? But the previous patch
accounts for more levels. Is that dead code then?

> +
> +/*
> + * Real AMD IOMMU paging structures: a least common denominator for all IOMMUs
> + * in the current system.
> + */
> +static struct paging real_amd_iommu_paging[AMD_IOMMU_MAX_PAGE_TABLE_LEVELS];
> +
> +static unsigned long amd_iommu_get_efr(struct amd_iommu *iommu)
> +{
> + return mmio_read64(iommu->mmio_base + MMIO_EXT_FEATURES);
> +}
> +
> +/*
> + * Following functions return various feature bits from EFR or
> + * IVRS ACPI table. The latter takes precedence as per Sect. 5.
> + * Note there is no indicator flag for ACPI overrides; we rely on
> + * the assumption that at least one feature bit will be set in
> + * the IVRS.
> + */
> +static unsigned int amd_iommu_get_hats(struct amd_iommu *iommu)
> +{
> + unsigned long efr = amd_iommu_get_efr(iommu);
> +
> + if (iommu->features.raw)
> + return iommu->features.hats;
> +
> + return (efr & FEATURE_HATS_MASK) >> FEATURE_HATS_SHIFT;
> +}
> +
> +static unsigned int amd_iommu_get_he_sup(struct amd_iommu *iommu)
> {
> + unsigned long efr = amd_iommu_get_efr(iommu);
> +
> + if (iommu->features.raw)
> + return iommu->features.he_sup;
> +
> + return (efr & FEATURE_HE_SUP_MASK) >> FEATURE_HE_SUP_SHIFT;
> +}
> +
> +static int amd_iommu_init_mmio(struct amd_iommu *entry,
> + struct jailhouse_iommu *iommu)
> +{
> + int err = -ENOMEM;
> +
> + /* Allocate MMIO space (configured in amd_iommu_init_pci()) */
> + entry->mmio_base = page_alloc(&remap_pool, PAGES(iommu->size));
> + if (!entry->mmio_base)
> + goto out;
> + entry->mmio_size = iommu->size;
> +
> + err = paging_create(&hv_paging_structs, iommu->base, iommu->size,
> + (unsigned long)entry->mmio_base,
> + PAGE_DEFAULT_FLAGS | PAGE_FLAG_DEVICE,
> + PAGING_NON_COHERENT);
> + if (err)
> + goto out_free_mmio;
> +
> return 0;
> +
> +out_free_mmio:
> + page_free(&remap_pool, entry->mmio_base, PAGES(iommu->size));

I think we can get away without rollback here as this only fails during
hypervisor init, right? Then we do not leak any memory as we will simply
destroy the hypervisor session.

> +out:
> + return err;
> }
>
> -int iommu_init(void)
> +static int amd_iommu_init_buffers(struct amd_iommu *entry,
> + struct jailhouse_iommu *iommu)
> {
> - printk("WARNING: AMD IOMMU support is not implemented yet\n");
> - /* TODO: Implement */
> + int err = -ENOMEM;
> +
> + /* Allocate and configure command buffer */
> + entry->cmd_buf_base = page_alloc(&mem_pool, PAGES(CMD_BUF_SIZE));
> + if (!entry->cmd_buf_base)
> + goto out;
> +
> + mmio_write64(entry->mmio_base + MMIO_CMD_BUF_OFFSET,
> + paging_hvirt2phys(entry->cmd_buf_base) |
> + ((u64)CMD_BUF_LEN_EXPONENT << BUF_LEN_EXPONENT_SHIFT));
> +
> + entry->cmd_tail_ptr = 0;
> +
> + /* Allocate and configure event log */
> + entry->evt_log_base = page_alloc(&mem_pool, PAGES(EVT_LOG_SIZE));
> + if (!entry->evt_log_base)
> + goto out_free_cmd_buf;
> +
> + mmio_write64(entry->mmio_base + MMIO_EVT_LOG_OFFSET,
> + paging_hvirt2phys(entry->evt_log_base) |
> + ((u64)EVT_LOG_LEN_EXPONENT << BUF_LEN_EXPONENT_SHIFT));
> +
> + return 0;
> +
> +out_free_cmd_buf:
> + page_free(&mem_pool, entry->cmd_buf_base, PAGES(CMD_BUF_SIZE));

Same here: no rollback if we only fail during hypervisor init.

> +out:
> + return trace_error(err);
> +}
> +
> +static int amd_iommu_init_pci(struct amd_iommu *entry,
> + struct jailhouse_iommu *iommu)
> +{
> + u64 caps_header;
> + u32 lo, hi;
> +
> + entry->bdf = iommu->amd_bdf;
> +
> + /* Check alignment */
> + if (iommu->size & (iommu->size - 1))
> + return trace_error(-EINVAL);
> +
> + /* Check that EFR is supported */
> + caps_header = pci_read_config(entry->bdf, iommu->amd_cap, 4);
> + if (!(caps_header & IOMMU_CAP_EFR))
> + return trace_error(-EINVAL);
> +
> + lo = pci_read_config(
> + entry->bdf, iommu->amd_cap + CAPS_IOMMU_BASE_LOW, 4);
> + hi = pci_read_config(
> + entry->bdf, iommu->amd_cap + CAPS_IOMMU_BASE_HI, 4);
> +
> + if ((lo & IOMMU_CAP_ENABLE) &&
> + (to_u64(hi, lo & ~IOMMU_CAP_ENABLE) != iommu->base)) {
> + printk("FATAL: IOMMU %d config is locked in invalid state. "
> + "Please reboot your system and try again.\n", entry->idx);
> + return trace_error(-EPERM);
> + }
> +
> + /* Should be configured by BIOS, but we want to be sure */
> + pci_write_config(entry->bdf,
> + iommu->amd_cap + CAPS_IOMMU_BASE_HI,
> + (u32)(iommu->base >> 32), 4);
> + pci_write_config(entry->bdf,
> + iommu->amd_cap + CAPS_IOMMU_BASE_LOW,
> + (u32)(iommu->base & 0xffffffff) | IOMMU_CAP_ENABLE,
> + 4);
> +
> + /* Store MSI capability pointer */
> + entry->msi_cap = pci_find_capability_by_id(entry->bdf, PCI_CAP_MSI);
> + if (!entry->msi_cap)
> + return trace_error(-EINVAL);
> +
> + return 0;
> +}
> +
> +static int amd_iommu_init_features(struct amd_iommu *entry,
> + struct jailhouse_iommu *iommu)
> +{
> + unsigned char smi_filter_regcnt;
> + u64 efr, ctrl_reg = 0, smi_freg = 0, val;
> + unsigned int n, hats;
> + void *reg_base;
> +
> + entry->features.raw = iommu->amd_features;
> +
> + efr = amd_iommu_get_efr(entry);
> +
> + /* Minimum HATS wins */
> + hats = amd_iommu_get_hats(entry);
> + if (amd_iommu_pt_levels < 0 || amd_iommu_pt_levels > hats + 4)

How could amd_iommu_pt_levels become < 0?

> + amd_iommu_pt_levels = hats + 4;

So, for my understanding, amd_iommu_pt_levels will always be <= 4, right?

> +
> + /*
> + * Require SMI Filter support. Enable and lock filter but
> + * mark all entries as invalid to disable SMI delivery.
> + */
> + if (!(efr & FEATURE_SMI_FSUP_MASK))
> + return trace_error(-EINVAL);
> +
> + smi_filter_regcnt = (1 << (efr & FEATURE_SMI_FRC_MASK) >>
> + FEATURE_SMI_FRC_SHIFT);
> + for (n = 0; n < smi_filter_regcnt; n++) {
> + reg_base = entry->mmio_base + MMIO_SMI_FREG0_OFFSET + (n << 3);
> + smi_freg = mmio_read64(reg_base);
> +
> + if (!(smi_freg & SMI_FREG_LOCKED)) {
> + /*
> + * Program unlocked register the way we need:
> + * invalid and locked.
> + */
> + mmio_write64(reg_base, SMI_FREG_LOCKED);
> + }
> + else if (smi_freg & SMI_FREG_VALID) {

} else ...

> + /*
> + * The register is locked and programed
> + * the way we don't want - error.
> + */
> + printk("ERROR: SMI Filter register %d is locked "
> + "and can't be reprogrammed. Please reboot "
> + "and check no other component uses the "
> + "IOMMU %d.\n", n, entry->idx);
> + return trace_error(-EPERM);
> + }
> + /*
> + * The register is locked, but programmed
> + * the way we need - OK to go.
> + */
> + }
> +
> + ctrl_reg |= (CONTROL_SMIF_EN | CONTROL_SMIFLOG_EN);
> +
> + /* Enable maximum Device Table segmentation possible */
> + entry->dev_tbl_seg_sup = (efr & FEATURE_SEG_SUP_MASK) >>
> + FEATURE_SEG_SUP_SHIFT;
> + if (entry->dev_tbl_seg_sup) {
> + val = entry->dev_tbl_seg_sup & CONTROL_SEG_SUP_MASK;
> + ctrl_reg |= val << CONTROL_SEG_SUP_SHIFT;
> + }
> +
> + mmio_write64(entry->mmio_base + MMIO_CONTROL_OFFSET, ctrl_reg);
> +
> return 0;
> }
>
> +static void amd_iommu_enable_command_processing(struct amd_iommu *iommu)
> +{
> + u64 ctrl_reg;
> +
> + ctrl_reg = mmio_read64(iommu->mmio_base + MMIO_CONTROL_OFFSET);
> + ctrl_reg |= CONTROL_IOMMU_EN | CONTROL_CMD_BUF_EN |
> + CONTROL_EVT_LOG_EN | CONTROL_EVT_INT_EN;
> + mmio_write64(iommu->mmio_base + MMIO_CONTROL_OFFSET, ctrl_reg);
> +}
> +
> +unsigned int iommu_mmio_count_regions(struct cell *cell)
> +{
> + return cell == &root_cell ? iommu_count_units() : 0;

You are not emulating the IOMMU towards the root cell, are you? We have
to do this for VT-d in order to hand over interrupt remapping. Therefore
you find a mmio_region_register there, and a non-zero region counter.

> +}
> +
> +int iommu_init(void)
> +{
> + const struct paging *toplevel_paging = &amd_iommu_paging[0];
> + struct jailhouse_iommu *iommu;
> + struct amd_iommu *entry;
> + unsigned int n;
> + int err;
> +
> + iommu = &system_config->platform_info.x86.iommu_units[0];
> + for (n = 0; iommu->base && n < iommu_count_units(); iommu++, n++) {
> + entry = &iommu_units[iommu_units_count];
> +
> + entry->idx = n;
> +
> + /* Protect against accidental VT-d configs. */
> + if (!iommu->amd_bdf)
> + return trace_error(-EINVAL);
> +
> + /* Initialize PCI registers */
> + err = amd_iommu_init_pci(entry, iommu);
> + if (err)
> + return err;
> +
> + /* Initialize MMIO space */
> + err = amd_iommu_init_mmio(entry, iommu);
> + if (err)
> + return err;
> +
> + /* Setup IOMMU features */
> + err = amd_iommu_init_features(entry, iommu);
> + if (err)
> + return err;
> +
> + /* Initialize command buffer and event log */
> + err = amd_iommu_init_buffers(entry, iommu);
> + if (err)
> + return err;
> +
> + /* Enable the IOMMU */
> + amd_iommu_enable_command_processing(entry);
> +
> + iommu_units_count++;
> + }
> +
> + if (amd_iommu_pt_levels > AMD_IOMMU_MAX_PAGE_TABLE_LEVELS)

Possible?

> + return trace_error(-ERANGE);
> +
> + toplevel_paging += AMD_IOMMU_MAX_PAGE_TABLE_LEVELS - amd_iommu_pt_levels;
> + memcpy(real_amd_iommu_paging, toplevel_paging,
> + sizeof(struct paging) * amd_iommu_pt_levels);

Wondering: if we only had to support 4 levels in practice, couldn't we
just patch the differences to x86_64_paging after copying from it? Or
are there too many?

> +
> + /*
> + * Real page table has less levels than amd_iommu_paging - setup new
> + * top level paging structure.
> + */
> + if (amd_iommu_pt_levels != AMD_IOMMU_MAX_PAGE_TABLE_LEVELS) {
> + real_amd_iommu_paging[0].page_size = 0;
> + real_amd_iommu_paging[0].get_phys = paging_get_phys_invalid;
> + }
> +
> + return iommu_cell_init(&root_cell);
> +}
> +
> int iommu_cell_init(struct cell *cell)
> {
> /* TODO: Implement */
> @@ -92,7 +432,50 @@ int iommu_map_interrupt(struct cell *cell, u16 device_id, unsigned int vector,
>
> void iommu_shutdown(void)
> {
> - /* TODO: Implement */
> + struct amd_iommu *iommu;
> + u64 ctrl_reg;
> + u32 seg_size;
> + int n, err;
> + void *ptr;
> +
> + for_each_iommu(iommu) {
> + /* Disable the IOMMU */
> + ctrl_reg = mmio_read64(iommu->mmio_base + MMIO_CONTROL_OFFSET);
> + ctrl_reg &= ~(CONTROL_IOMMU_EN | CONTROL_CMD_BUF_EN |
> + CONTROL_EVT_LOG_EN | CONTROL_EVT_INT_EN);
> + mmio_write64(iommu->mmio_base + MMIO_CONTROL_OFFSET, ctrl_reg);
> +
> + /* Free Device Table (and segments) */
> + seg_size = DEV_TABLE_SIZE / (1 << iommu->dev_tbl_seg_sup);
> + for (n = 0; n < DEV_TABLE_SEG_MAX; n++) {
> + ptr = iommu->devtable_segments[n];
> + if (ptr)
> + page_free(&mem_pool, ptr, PAGES(seg_size));
> + }
> +
> + /* Free Command Buffer and Event Log */
> + page_free(&mem_pool, iommu->cmd_buf_base,
> + PAGES(CMD_BUF_SIZE));
> +
> + /* Free Event Log */
> + page_free(&mem_pool, iommu->evt_log_base,
> + PAGES(EVT_LOG_SIZE));
> +
> + /* Free MMIO */
> + err = paging_destroy(&hv_paging_structs,
> + (unsigned long)iommu->mmio_base,
> + iommu->mmio_size, PAGING_NON_COHERENT);
> + if (err < 0) {

if (err) {

> + printk("ERROR: IOMMU %d: Unable to destroy "
> + "MMIO page mappings\n", iommu->idx);
> + /*
> + * There is not much more else we can do,
> + * as we are shutting down already.
> + */
> + }
> + page_free(&remap_pool, iommu->mmio_base,
> + PAGES(iommu->mmio_size));
> + }
> }
>
> void iommu_check_pending_faults(void)
>

Jan

Jan Kiszka

unread,
Dec 29, 2015, 2:37:21 AM12/29/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Add iommu_cell_init() and iommu_cell_destroy() for amd_iommu.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index f43bed7..0b5ff9b 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -374,7 +374,18 @@ int iommu_init(void)
>
> int iommu_cell_init(struct cell *cell)
> {
> - /* TODO: Implement */
> + // HACK for QEMU
> + if (iommu_units_count == 0)
> + return 0;
> +
> + if (cell->id > 0xffff)
> + return trace_error(-ERANGE);
> +
> + cell->arch.amd_iommu.pg_structs.root_paging = real_amd_iommu_paging;
> + cell->arch.amd_iommu.pg_structs.root_table = page_alloc(&mem_pool, 1);
> + if (!cell->arch.amd_iommu.pg_structs.root_table)
> + return trace_error(-ENOMEM);
> +
> return 0;
> }
>
> @@ -404,7 +415,12 @@ void iommu_remove_pci_device(struct pci_device *device)
>
> void iommu_cell_exit(struct cell *cell)
> {
> - /* TODO: Implement */
> + /* TODO: Again, this a copy of vtd.c:iommu_cell_exit */

The pg_structs is a common field for both amd and vtd. Why not defining
it generically so that we can share code, this function, but also parts
of iomme_cell_init and maybe more? Instead of vtd_paging and
real_amd_iommu_paging, there could simply be x86_iommu_paging, defined
by each vendor implementation.
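E.g. (just sketching the idea; 'iommu_pg_structs' is a made-up name):

	/* shared between vtd.c and amd_iommu.c */
	void iommu_cell_exit(struct cell *cell)
	{
		page_free(&mem_pool, cell->arch.iommu_pg_structs.root_table, 1);
	}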

> + // HACK for QEMU
> + if (iommu_units_count == 0)
> + return;
> +
> + page_free(&mem_pool, cell->arch.amd_iommu.pg_structs.root_table, 1);
> }
>
> void iommu_config_commit(struct cell *cell_added_removed)
>

Jan

Jan Kiszka

unread,
Dec 29, 2015, 2:37:30 AM12/29/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Implement iommu_map_memory_region() and iommu_unmap_memory_region()
> for amd_iommu.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 46 +++++++++++++++++++++++++++++++++++++----
> 1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index 0b5ff9b..7a73752 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -392,14 +392,52 @@ int iommu_cell_init(struct cell *cell)
> int iommu_map_memory_region(struct cell *cell,
> const struct jailhouse_memory *mem)
> {
> - /* TODO: Implement */
> - return 0;
> + unsigned long flags = AMD_IOMMU_PTE_P, zero_bits;

Let's grant a separate line for zero_bits.

> +
> + // HACK for QEMU
> + if (iommu_units_count == 0)
> + return 0;
> +
> + /*
> + * Check that the address is not outside scope of current page
> + * tables (given their level). Each level adds 9 bits and the offset
> + * is 12 bits long, so no bits higher than this should be set.
> + */
> + if (amd_iommu_pt_levels != AMD_IOMMU_MAX_PAGE_TABLE_LEVELS) {
> + zero_bits = ~((1ULL << (amd_iommu_pt_levels * 9 + 12)) - 1);
> + if (mem->virt_start & zero_bits)
> + return trace_error(-ERANGE);
> + }
> +
> + if (!(mem->flags & JAILHOUSE_MEM_DMA))
> + return 0;
> +
> + if (mem->flags & JAILHOUSE_MEM_READ)
> + flags |= AMD_IOMMU_PTE_IR;
> + if (mem->flags & JAILHOUSE_MEM_WRITE)
> + flags |= AMD_IOMMU_PTE_IW;
> +
> + return paging_create(&cell->arch.amd_iommu.pg_structs, mem->phys_start,
> + mem->size, mem->virt_start, flags, PAGING_COHERENT);
> }
> +
> int iommu_unmap_memory_region(struct cell *cell,
> const struct jailhouse_memory *mem)
> {
> - /* TODO: Implement */
> - return 0;
> + /*
> + * TODO: This is almost a complete copy of vtd.c counterpart
> + * (sans QEMU hack). Think of unification.

Indeed. With the proposed common iommu_pg_structs, it should be possible
to avoid this duplication.

> + */
> +
> + // HACK for QEMU
> + if (iommu_units_count == 0)
> + return 0;
> +
> + if (!(mem->flags & JAILHOUSE_MEM_DMA))
> + return 0;
> +
> + return paging_destroy(&cell->arch.amd_iommu.pg_structs, mem->virt_start,
> + mem->size, PAGING_COHERENT);
> }
>
> int iommu_add_pci_device(struct cell *cell, struct pci_device *device)
>

Jan

Valentine Sinitsyn

unread,
Dec 29, 2015, 3:06:37 AM12/29/15
to Jan Kiszka, jailho...@googlegroups.com
Hi Jan,
Page handling code is indeed a bit more complicated than it should be.
Initially, I was going to support all six levels. But it never worked on
my dev board (likely due to a hardware bug), so I forced the number of
levels to four. This essentially made some code "dead", but it will
become alive as soon as someone recompiles the images with
amd_iommu_pt_levels set to a different value. Maybe we should make it a
configurable #define.

Regarding one of your next comments: I tried to patch the x86_64 paging
structs (that was what I did initially), but the amount of patching
appeared to be too large for my taste.
Valentine

Valentine Sinitsyn

unread,
Dec 29, 2015, 3:08:33 AM12/29/15
to Jan Kiszka, jailho...@googlegroups.com
On 29.12.2015 13:06, Valentine Sinitsyn wrote:
...snip...
>>> +
>>> +#define to_u64(hi, lo) ((u64)(hi) << 32 | (lo))
>>> +
>>> +static unsigned int iommu_units_count;
>>> +
>>> +static unsigned int fault_reporting_cpu_id;
>>> +
>>> +/*
>>> + * XXX: The initial idea was to have this equals to minimum value for
>>
>> "equal"? A non-native speaking attempt to check.
Not getting you here. It's a typo from one of the previous edits.
Should be "equal". Will fix.

Valentine

Jan Kiszka

unread,
Dec 30, 2015, 7:50:07 PM12/30/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Add basic infrastructure (heavily influenced by Linux amd_iommu driver)
> to submit commands to AMD IOMMU command buffer. For now, having only
> INVALIDATE_IOMMU_PAGES and COMPLETION_WAIT seems to be sufficient.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 107 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 107 insertions(+)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index 7a73752..1f0d059 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -6,6 +6,10 @@
> * Authors:
> * Valentine Sinitsyn <valentine...@gmail.com>
> *
> + * Commands posting and event log parsing code, as well as many defines
> + * were adapted from Linux's amd_iommu driver written by Joerg Roedel
> + * and Leo Duran.
> + *
> * This work is licensed under the terms of the GNU GPL, version 2. See
> * the COPYING file in the top-level directory.
> */
> @@ -389,6 +393,55 @@ int iommu_cell_init(struct cell *cell)
> return 0;
> }
>
> +static void amd_iommu_completion_wait(struct amd_iommu *iommu);
> +
> +#define CMD_BUF_DRAIN_MAX_ATTEMPTS 8
> +
> +static void amd_iommu_submit_command(struct amd_iommu *iommu,
> + struct cmd_buf_entry *cmd)
> +{
> + u32 head, next_tail, bytes_free;
> + unsigned char *cur_ptr;
> + static bool drain = false;
> + static int drain_attempts = 0;
> +
> +again:
> + head = mmio_read64(iommu->mmio_base + MMIO_CMD_HEAD_OFFSET);
> + next_tail = (iommu->cmd_tail_ptr + sizeof(*cmd)) % CMD_BUF_SIZE;
> + /* XXX: Learn why this works :) */

Hmm...

> + bytes_free = (head - next_tail) % CMD_BUF_SIZE;
> +
> + /* Leave some space for COMPLETION_WAIT that drains the buffer. */
> + if (bytes_free < 2 * sizeof(*cmd) && !drain) {
> + /* Drain the buffer */
> + drain = true;

Can't completion_wait provide this as a parameter to submit_command?
Seems cleaner to me. Or you could even check here whether the current
command is of type CMD_COMPL_WAIT.
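I.e. something in this direction (untested sketch):

	static void amd_iommu_submit_command(struct amd_iommu *iommu,
					     struct cmd_buf_entry *cmd,
					     bool is_drain_cmd);
	...
	/* in amd_iommu_completion_wait() */
	amd_iommu_submit_command(iommu, &completion_wait, true);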

> + amd_iommu_completion_wait(iommu);
> + drain = false;
> + goto again;
> + }
> +
> + if (drain) {
> + /* Ensure we won't drain the buffer indefinitely */
> + if (++drain_attempts > CMD_BUF_DRAIN_MAX_ATTEMPTS) {

How could this happen?

> + panic_printk("FATAL: IOMMU %d: "
> + "Failed to drain the command buffer\n",
> + iommu->idx);
> + panic_park();
> + }
> + } else {
> + /* Buffer drained - reset the counter */
> + drain_attempts = 0;
> + }
> +
> + cur_ptr = &iommu->cmd_buf_base[iommu->cmd_tail_ptr];
> + memcpy(cur_ptr, cmd, sizeof(*cmd));
> +
> + /* Just to be sure. */
> + arch_paging_flush_cpu_caches(cur_ptr, sizeof(*cmd));
> +
> + iommu->cmd_tail_ptr = next_tail;
> +}
> +
> int iommu_map_memory_region(struct cell *cell,
> const struct jailhouse_memory *mem)
> {
> @@ -461,6 +514,60 @@ void iommu_cell_exit(struct cell *cell)
> page_free(&mem_pool, cell->arch.amd_iommu.pg_structs.root_table, 1);
> }
>
> +static void wait_for_zero(volatile u64 *sem, unsigned long mask)
> +{
> + /*
> + * TODO: We should really have some sort of timeout here,
> + * otherwise there is a risk of looping indefinitely blocking
> + * the hypervisor. However, this requires some sort of time
> + * keeping, so let's postpone this till the time it will be
> + * available in Jailhouse.

Yes, we will eventually have that, but likely via some generic watchdog
mechanism that detects other kinds of loops as well.

> + */
> + while (*sem & mask)
> + cpu_relax();
> +}
> +
> +static void amd_iommu_invalidate_pages(struct amd_iommu *iommu,
> + u16 domain_id)
> +{
> + /* Double braces to please GCC */
> + struct cmd_buf_entry invalidate_pages = {{ 0 }};

I'm not getting these comments: cmd_buf_entry is a struct containing an
array as its only field. Why should a compiler let you get away with a
single brace here?

> +
> + /*
> + * Flush everything, including PDEs, in whole address range, i.e.
> + * 0x7ffffffffffff000 with S bit (see Sect. 2.2.3).
> + */
> + invalidate_pages.data[1] = domain_id;
> + invalidate_pages.data[2] = 0xfffff000 |
> + CMD_INV_IOMMU_PAGES_SIZE_MASK |
> + CMD_INV_IOMMU_PAGES_PDE_MASK;
> + invalidate_pages.data[3] = 0x7fffffff;
> + CMD_SET_TYPE(&invalidate_pages, CMD_INV_IOMMU_PAGES);
> +
> + amd_iommu_submit_command(iommu, &invalidate_pages);
> +}
> +
> +static void amd_iommu_completion_wait(struct amd_iommu *iommu)
> +{
> + /* Double braces to please GCC */
> + struct cmd_buf_entry completion_wait = {{ 0 }};
> + volatile u64 sem = 1;
> + long addr;
> +
> + addr = paging_hvirt2phys(&sem);
> +
> + completion_wait.data[0] = (addr & 0xfffffff8UL) |
> + CMD_COMPL_WAIT_STORE_MASK;
> + completion_wait.data[1] = (addr & 0x000fffff00000000UL) >> 32;
> + CMD_SET_TYPE(&completion_wait, CMD_COMPL_WAIT);
> +
> + amd_iommu_submit_command(iommu, &completion_wait);
> + mmio_write64(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET,
> + iommu->cmd_tail_ptr);

Indention ^

> +
> + wait_for_zero(&sem, -1);
> +}
> +
> void iommu_config_commit(struct cell *cell_added_removed)
> {
> /* TODO: Implement */
>


Jan Kiszka

unread,
Dec 30, 2015, 7:50:24 PM12/30/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Implement iommu_add_pci_device() for amd_iommu.
>
> Basically, this is all about filling DTE entry. However, there is no way
> to allocate device tables sparsely with ADM IOMMU. To save some memory,
> Device Table Segmentation (Revision 2.6 and up) is used whenever possible,
> and this adds some infrastructure.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 151 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 148 insertions(+), 3 deletions(-)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index 1f0d059..d90fb35 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -493,15 +493,160 @@ int iommu_unmap_memory_region(struct cell *cell,
> mem->size, PAGING_COHERENT);
> }
>
> +static void amd_iommu_inv_dte(struct amd_iommu *iommu, u16 device_id)
> +{
> + /* Double braces to please GCC */
> + struct cmd_buf_entry invalidate_dte = {{ 0 }};
> +
> + invalidate_dte.data[0] = device_id;
> + CMD_SET_TYPE(&invalidate_dte, CMD_INV_DEVTAB_ENTRY);
> +
> + amd_iommu_submit_command(iommu, &invalidate_dte);
> +}
> +
> +static struct dev_table_entry * get_dev_table_entry(struct amd_iommu *iommu,

^ no blank before the name

> + u16 bdf, bool allocate)
> +{
> + struct dev_table_entry *devtable_seg;
> + u8 seg_idx, seg_shift;
> + u64 reg_base, reg_val;
> + u16 seg_mask;
> + u32 seg_size;
> +
> + /*
> + * FIXME: Device Table Segmentation is UNTESTED, as I don't have the hardware

indention

> + * which supports this feature.
> + */
> + if (!iommu->dev_tbl_seg_sup) {
> + seg_mask = 0;
> + seg_idx = 0;
> + seg_size = DEV_TABLE_SIZE;
> + } else {
> + seg_shift = BITS_PER_SHORT - iommu->dev_tbl_seg_sup;
> + seg_mask = ~((1 << seg_shift) - 1);
> + seg_idx = (seg_mask & bdf) >> seg_shift;
> + seg_size = DEV_TABLE_SIZE / (1 << iommu->dev_tbl_seg_sup);
> + }
> +
> + /*
> + * Device table segmentation is tricky in Jailhouse. As cells can
> + * "share" the IOMMU, we don't know maximum bdf in each segment
> + * because cells are initialized independently. Thus, we can't simply
> + * adjust segment sizes for our maximum bdfs.
> + *
> + * The next best things is to lazily allocate segments as we add
> + * device using maximum possible size for segments. In the worst case
> + * scenario, we waste around 2M chunk per IOMMU.
> + */
> + devtable_seg = iommu->devtable_segments[seg_idx];
> + if (!devtable_seg) {
> + /* If we are not permitted to allocate, just fail */
> + if (!allocate)
> + return NULL;
> +
> + devtable_seg = page_alloc(&mem_pool, PAGES(seg_size));
> + if (!devtable_seg)
> + return NULL;
> + iommu->devtable_segments[seg_idx] = devtable_seg;
> +
> + if (!seg_idx)
> + reg_base = MMIO_DEV_TABLE_BASE;
> + else
> + reg_base = MMIO_DEV_TABLE_SEG_BASE + (seg_idx - 1) * 8;
> +
> + /* Size in Kbytes = (m + 1) * 4, see Sect 3.3.6 */
> + reg_val = paging_hvirt2phys(devtable_seg) |
> + (seg_size / PAGE_SIZE - 1);
> + mmio_write64(iommu->mmio_base + reg_base, reg_val);
> + }
> +
> + return &devtable_seg[bdf & ~seg_mask];
> +}
> +
> int iommu_add_pci_device(struct cell *cell, struct pci_device *device)
> {
> - /* TODO: Implement */
> - return 0;
> + struct dev_table_entry *dte = NULL;
> + struct amd_iommu *iommu;
> + u8 iommu_idx;
> + int err = 0;
> + u16 bdf;
> +
> + // HACK for QEMU
> + if (iommu_units_count == 0)
> + return 0;
> +
> + if (device->info->type == JAILHOUSE_PCI_TYPE_IVSHMEM)
> + return 0;
> +
> + iommu_idx = device->info->iommu;
> + if (iommu_idx > JAILHOUSE_MAX_IOMMU_UNITS) {
> + err = -ERANGE;
> + goto out;
> + }
> +
> + iommu = &iommu_units[iommu_idx];
> + bdf = device->info->bdf;
> +
> + dte = get_dev_table_entry(iommu, bdf, true);
> + if (!dte) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + memset(dte, 0, sizeof(*dte));
> +
> + /* DomainID */
> + dte->data[1] = cell->id & 0xffff;
> +
> + /* Translation information */
> + dte->data[0] = AMD_IOMMU_PTE_IR | AMD_IOMMU_PTE_IW |
> + paging_hvirt2phys(cell->arch.amd_iommu.pg_structs.root_table) |
> + ((amd_iommu_pt_levels & DTE_MODE_MASK) << DTE_MODE_SHIFT) |
> + DTE_TRANSLATION_VALID | DTE_VALID;
> +
> + /* TODO: Interrupt remapping. For now, just forward them unmapped. */
> +
> + /* Flush caches, just to be sure. */
> + arch_paging_flush_cpu_caches(dte, sizeof(*dte));
> +
> + amd_iommu_inv_dte(iommu, bdf);
> +
> +out:
> + return trace_error(err);
> }
>
> void iommu_remove_pci_device(struct pci_device *device)
> {
> - /* TODO: Implement */
> + struct dev_table_entry *dte = NULL;
> + struct amd_iommu *iommu;
> + u8 iommu_idx;
> + u16 bdf;
> +
> + // HACK for QEMU
> + if (iommu_units_count == 0)
> + return;
> +
> + if (device->info->type == JAILHOUSE_PCI_TYPE_IVSHMEM)
> + return;
> +
> + iommu_idx = device->info->iommu;
> + if (iommu_idx > JAILHOUSE_MAX_IOMMU_UNITS)
> + return;
> +
> + iommu = &iommu_units[iommu_idx];
> + bdf = device->info->bdf;
> +
> + dte = get_dev_table_entry(iommu, bdf, false);
> + if (!dte)
> + return;
> +
> + /* Clear *_VALID flags */
> + dte->data[0] = 0;
> +
> + /* Flush caches, just to be sure. */
> + arch_paging_flush_cpu_caches(dte, sizeof(*dte));
> +
> + amd_iommu_inv_dte(iommu, bdf);
> }
>
> void iommu_cell_exit(struct cell *cell)
>

Just minor style things.

Reviewed-by: Jan Kiszka <jan.k...@siemens.com>

Jan Kiszka

unread,
Dec 30, 2015, 7:51:01 PM12/30/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Implement functions to apply configuration for an IOMMU.
> In case something goes wrong, we need to trigger an NMI, which
> amd_iommu_init_fault_nmi() configures.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 74 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 73 insertions(+), 1 deletion(-)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index d90fb35..c9de3cb 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -713,9 +713,81 @@ static void amd_iommu_completion_wait(struct amd_iommu *iommu)
> wait_for_zero(&sem, -1);
> }
>
> +/* Parts of this code derives from vtd_init_fault_nmi(). */
> +static void amd_iommu_init_fault_nmi(void)
> +{
> + union pci_msi_registers msi = {{ 0 }};
> + struct per_cpu *cpu_data;
> + struct amd_iommu *iommu;
> + int n;
> +
> + /*
> + * This assumes that at least one bit is set somewhere because we
> + * don't support configurations where Linux is left with no CPUs.
> + */
> + for (n = 0; root_cell.cpu_set->bitmap[n] == 0; n++)
> + /* Empty loop */;
> + cpu_data = per_cpu(ffsl(root_cell.cpu_set->bitmap[n]));
> +
> + /*
> + * Save this value globally to avoid multiple reports of the same
> + * case from different CPUs.
> + */
> + fault_reporting_cpu_id = cpu_data->cpu_id;

This prologue looks indeed familiar. Can we consolidate this, maybe to
x86_set_fault_reporting_cpu()?
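Roughly like this (the name as suggested above, the body just lifted
from the existing prologue):

	struct per_cpu *x86_set_fault_reporting_cpu(void)
	{
		struct per_cpu *cpu_data;
		unsigned int n;

		for (n = 0; root_cell.cpu_set->bitmap[n] == 0; n++)
			/* Empty loop */;
		cpu_data = per_cpu(ffsl(root_cell.cpu_set->bitmap[n]));

		fault_reporting_cpu_id = cpu_data->cpu_id;

		return cpu_data;
	}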

> +
> + for_each_iommu(iommu) {
> + msi.raw[0] = pci_read_config(iommu->bdf, iommu->msi_cap, 4);

I would rather suggest bringing msi.msg32 into a well-known state, i.e.
clearing the whole first word (read-only fields can safely be set to 0,
that's going to be ignored). As you only flip 'enable' and start with
msi cleared, you can simply leave out this line.

> +
> + /* Disable MSI during interrupt reprogramming */
> + msi.msg32.enable = 0;
> + pci_write_config(iommu->bdf, iommu->msi_cap, msi.raw[0], 4);
> +
> + /* Send NMI to fault_reporting_cpu */
> + msi.msg64.address = (MSI_ADDRESS_VALUE << MSI_ADDRESS_SHIFT) |
> + ((cpu_data->apic_id << 12) & 0xff000);
> + msi.msg64.data = MSI_DM_NMI;
> +
> + /* Enable MSI back */
> + msi.msg32.enable = 1;
> +
> + /* Write new MSI capabilty block */
> + for (n = 3; n >= 0; n--)
> + pci_write_config(iommu->bdf, iommu->msi_cap + 4 * n,
> + msi.raw[n], 4);
> + }
> +
> + /*
> + * There is a race window in between we change fault_reporting_cpu_id
> + * and actually reprogram the MSI. To prevent event loss, signal an
> + * interrupt when done, so iommu_check_pending_faults() is called
> + * upon completion even if no further NMIs due to events would occurr.
> + *
> + * Note we can't simply use CMD_COMPL_WAIT_INT_MASK in
> + * amd_iommu_completion_wait(), as it seems that IOMMU either signal
> + * an interrupt or do memory write, but not both.
> + */
> + apic_send_nmi_ipi(cpu_data);
> +}
> +
> void iommu_config_commit(struct cell *cell_added_removed)
> {
> - /* TODO: Implement */
> + struct amd_iommu *iommu;
> +

Don't you need to bail out here if there are no IOMMUs? That would be in
line with vtd.
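I.e. the same early-out you already use elsewhere:

	if (iommu_units_count == 0)
		return;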

> + /* Ensure we'll get NMI on comletion, or if anything goes wrong. */

"completion"

> + if (cell_added_removed)
> + amd_iommu_init_fault_nmi();
> +
> + for_each_iommu(iommu) {
> + /* Flush caches */
> + if (cell_added_removed) {
> + amd_iommu_invalidate_pages(iommu,
> + cell_added_removed->id & 0xffff);
> + amd_iommu_invalidate_pages(iommu,
> + root_cell.id & 0xffff);
> + }
> + /* Execute all commands in the buffer */
> + amd_iommu_completion_wait(iommu);
> + }
> }
>
> struct apic_irq_message iommu_get_remapped_root_int(unsigned int iommu,
>

Jan Kiszka

unread,
Dec 31, 2015, 3:53:34 AM12/31/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Add functions to read event logs AMD IOMMU provides and print their
> contents. The latter is rather basic, but decoding all possible log
> entries is hairy, so we'd better wait and collect stats which
> problems occur most often.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 92 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 92 insertions(+)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index c9de3cb..b1c287d 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -856,6 +856,98 @@ void iommu_shutdown(void)
> }
> }
>
> +static void amd_iommu_print_event(struct amd_iommu *iommu,
> + struct evt_log_entry *entry)
> +{
> + u8 evt_code = (entry->data[1] >> 28) & 0xf;
> + u64 op1, op2;
> +
> + op1 = to_u64(entry->data[1] & 0x0fffffff, entry->data[0]);
> + op2 = to_u64(entry->data[3], entry->data[2]);
> +
> + /* TODO: Can we handle these errors more gracefully? */
> + if (evt_code == EVENT_TYPE_ILL_CMD_ERR) {
> + printk("FATAL: IOMMU %d reported ILLEGAL_COMMAND_ERROR\n",
> + iommu->idx);

What could trigger this? Hardware errors or bugs in Jailhouse?

BTW, you probably want panic_printk here, and below, to ensure that the
message is printed cleanly before the CPU dies.
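
I.e., assuming panic_printk takes the same format arguments as printk:

        if (evt_code == EVENT_TYPE_ILL_CMD_ERR) {
                panic_printk("FATAL: IOMMU %d reported ILLEGAL_COMMAND_ERROR\n",
                             iommu->idx);
                panic_stop();
        }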

> + panic_stop();
> + }
> +
> + if (evt_code == EVENT_TYPE_CMD_HW_ERR) {
> + printk("FATAL: IOMMU %d reported COMMAND_HARDWARE_ERROR\n",
> + iommu->idx);
> + panic_stop();

To handle such an error, we will require something similar to the (also
missing) MCE processing. For now it's fine this way.

> + }
> +
> + /*
> + * TODO: For now, very basic printer. Consider adapting
> + * iommu_print_event() from the Linux kerel (amd_iommu.c).

"kernel"

> + */
> + printk("AMD IOMMU %d reported event\n", iommu->idx);
> + /* Exclude EVENT_COUNTER_ZERO, as it doesn't report domain ID. */
> + if (evt_code != EVENT_TYPE_EVT_CNT_ZERO)
> + printk(" DeviceId (bus:dev.func): %02x:%02x.%x\n",
> + PCI_BDF_PARAMS(entry->data[0] & 0xffff));
> +
> + printk(" EventCode: %lx Operand 1: %lx, Operand 2: %lx\n",
> + evt_code, op1, op2);
> +}
> +
> +static void amd_iommu_restart_event_log(struct amd_iommu *iommu)
> +{
> + void *base = iommu->mmio_base;
> +
> + wait_for_zero(base + MMIO_STATUS_OFFSET, MMIO_STATUS_EVT_RUN_MASK);
> +
> + mmio_write64_field(base + MMIO_CONTROL_OFFSET, CONTROL_EVT_LOG_EN, 0);
> +
> + /* Simply start from scratch */
> + mmio_write64(base + MMIO_EVT_HEAD_OFFSET, 0);
> + mmio_write64(base + MMIO_EVT_TAIL_OFFSET, 0);
> +
> + /* Clear EventOverflow (RW1C) */
> + mmio_write64_field(base + MMIO_STATUS_OFFSET,
> + MMIO_STATUS_EVT_OVERFLOW_MASK, 1);
> +
> + /* Bring logging back */
> + mmio_write64_field(base + MMIO_CONTROL_OFFSET, CONTROL_EVT_LOG_EN, 1);
> +}
> +
> +static void amd_iommu_poll_events(struct amd_iommu *iommu)
> +{
> + struct evt_log_entry *evt;
> + u32 head, tail;
> + u64 status;
> +
> + status = mmio_read64(iommu->mmio_base + MMIO_STATUS_OFFSET);
> +
> + if (status & MMIO_STATUS_EVT_OVERFLOW_MASK) {
> + printk("IOMMU %d: Event Log overflow occurred, "
> + "some events were lost!\n", iommu->idx);
> + amd_iommu_restart_event_log(iommu);
> + }
> +
> + while (status & MMIO_STATUS_EVT_INT_MASK) {
> + /* Clear EventLogInt (RW1C) */
> + mmio_write64_field(iommu->mmio_base + MMIO_STATUS_OFFSET,
> + MMIO_STATUS_EVT_INT_MASK, 1);
> +
> + head = mmio_read32(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET);
> + tail = mmio_read32(iommu->mmio_base + MMIO_EVT_TAIL_OFFSET);
> +
> + while (head != tail) {
> + evt = (struct evt_log_entry *)(
> + iommu->evt_log_base + head);
> + amd_iommu_print_event(iommu, evt);
> + head = (head + sizeof(*evt)) % EVT_LOG_SIZE;
> + }
> +
> + mmio_write32(iommu->mmio_base + MMIO_EVT_HEAD_OFFSET, head);
> +
> + /* Re-read status to catch new events, as Linux does */
> + status = mmio_read64(iommu->mmio_base + MMIO_STATUS_OFFSET);
> + }
> +}
> +
> void iommu_check_pending_faults(void)
> {
> /* TODO: Implement */
>

Jan

Jan Kiszka

unread,
Dec 31, 2015, 3:53:38 AM12/31/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> Add iommu_check_pending_faults() for amd_iommu. It looks into the
> Hardware Event Register first and then loops over the event log,
> printing its contents. This way, we don't miss errors that happen
> when event logging is unavailable.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> hypervisor/arch/x86/amd_iommu.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/hypervisor/arch/x86/amd_iommu.c b/hypervisor/arch/x86/amd_iommu.c
> index b1c287d..dc724b2 100644
> --- a/hypervisor/arch/x86/amd_iommu.c
> +++ b/hypervisor/arch/x86/amd_iommu.c
> @@ -948,9 +948,46 @@ static void amd_iommu_poll_events(struct amd_iommu *iommu)
> }
> }
>
> +static void amd_iommu_handle_hardware_event(struct amd_iommu *iommu)
> +{
> + struct evt_log_entry hev_entry;
> + u64 hev, buf;
> +
> + hev = mmio_read64(iommu->mmio_base + MMIO_HEV_OFFSET);
> +
> + /* Check if hardware event is present and print it */
> + if (hev & MMIO_HEV_VALID) {
> + if (hev & MMIO_HEV_OVERFLOW) {
> + printk("IOMMU %d: Hardware Event Overflow occurred, "
> + "some events were lost!\n", iommu->idx);
> + }

No need for curly braces here.

> + buf = mmio_read64(
> + iommu->mmio_base + MMIO_HEV_UPPER_OFFSET);

This wrap-around seems unneeded: the statement fits on a single line.

> + memcpy(hev_entry.data, &buf, sizeof(buf));
> + buf = mmio_read64(
> + iommu->mmio_base + MMIO_HEV_LOWER_OFFSET);

Same here.

> + memcpy(&hev_entry.data[2], &buf, sizeof(buf));
> +
> + amd_iommu_print_event(iommu, &hev_entry);
> +
> + /* Clear Hardware Event */
> + mmio_write64(iommu->mmio_base + MMIO_HEV_OFFSET, 0);
> + }
> +}
> +
> void iommu_check_pending_faults(void)
> {
> - /* TODO: Implement */
> + struct amd_iommu *iommu;
> +
> + if (this_cpu_data()->cpu_id != fault_reporting_cpu_id)
> + return;
> +
> + for_each_iommu(iommu) {
> + /* Handle Hardware Event, if supported */
> + if (amd_iommu_get_he_sup(iommu))
> + amd_iommu_handle_hardware_event(iommu);
> + amd_iommu_poll_events(iommu);
> + }
> }
>
> bool iommu_cell_emulates_ir(struct cell *cell)
>

Minor style issues, otherwise

Reviewed-by: Jan Kiszka <jan.k...@siemens.com>

Valentine Sinitsyn

unread,
Dec 31, 2015, 5:21:35 AM12/31/15
to Jan Kiszka, jailho...@googlegroups.com
Hi,
It is yet another leftover. I borrowed this line from the kernel, and it
was a bit unclear at first sight.

>
>> + bytes_free = (head - next_tail) % CMD_BUF_SIZE;
>> +
>> + /* Leave some space for COMPLETION_WAIT that drains the buffer. */
>> + if (bytes_free < 2 * sizeof(*cmd) && !drain) {
>> + /* Drain the buffer */
>> + drain = true;
>
> Can't completion_wait provide this as a parameter to submit_command?
> Seems cleaner to me. Or you could even check here whether the current
> command is of type CMD_COMPL_WAIT.
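> 
> E.g. roughly, assuming the opcode sits in bits 28..31 of cmd->data[1]
> as for event log entries, and that an opcode constant CMD_COMPL_WAIT
> exists (both names are only illustrative):
> 
>         bool is_compl_wait = ((cmd->data[1] >> 28) & 0xf) == CMD_COMPL_WAIT;
> 
>         if (bytes_free < 2 * sizeof(*cmd) && !is_compl_wait) {
>                 amd_iommu_completion_wait(iommu);
>                 goto again;
>         }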
>
>> + amd_iommu_completion_wait(iommu);
>> + drain = false;
>> + goto again;
>> + }
>> +
>> + if (drain) {
>> + /* Ensure we won't drain the buffer indefinitely */
>> + if (++drain_attempts > CMD_BUF_DRAIN_MAX_ATTEMPTS) {
>
> How could this happen?
If something posts commands at a higher rate than we consume them.
I missed the fact that it's an array inside. IIRC I was changing this
line at the airport just before boarding :)
Valentine

Valentine Sinitsyn

unread,
Dec 31, 2015, 5:22:44 AM12/31/15
to Jan Kiszka, jailho...@googlegroups.com
The latter.
Valentine

Jan Kiszka

unread,
Dec 31, 2015, 5:28:13 AM12/31/15
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-12-31 11:21, Valentine Sinitsyn wrote:
>>> + amd_iommu_completion_wait(iommu);
>>> + drain = false;
>>> + goto again;
>>> + }
>>> +
>>> + if (drain) {
>>> + /* Ensure we won't drain the buffer indefinitely */
>>> + if (++drain_attempts > CMD_BUF_DRAIN_MAX_ATTEMPTS) {
>>
>> How could this happen?
> If something posts commands at a higher rate than we consume them.

This is still a bit too vague. First of all, only one context - the CPU
that is currently executing the maintenance command - can issue
commands. That's already required because of lacking synchronization in
the IOMMU layer.

Then draining should be blocking, and it should not cause a nested drain
request.

So, either this test is redundant or some design issue is still lurking
here. Please clarify this.

Jan

Valentine Sinitsyn

unread,
Dec 31, 2015, 5:30:21 AM12/31/15
to Jan Kiszka, jailho...@googlegroups.com
I prefer not to leave even a theoretical possibility of an infinite
loop. But you are right that the test seems to be redundant in the
current design.

Valentine

Jan Kiszka

unread,
Dec 31, 2015, 5:35:47 AM12/31/15
to Valentine Sinitsyn, jailho...@googlegroups.com
OK, that's good news, and it should be documented as well. Note that we
have other potential infinite loops in the code, but with documented
termination conditions.

We can still think about how to handle that kind of assertion. So far
Jailhouse has none of them in the code because I'd like to apply such a
mechanism consistently at a point when we know how many of them are
actually needed and how best to react to them.
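
For what it's worth, one conceivable shape for such a helper, should we
ever introduce it (purely illustrative, not something I'm proposing for
this series):

#define JAILHOUSE_ASSERT(cond)                                          \
        do {                                                            \
                if (!(cond)) {                                          \
                        panic_printk("Assertion failed: %s (%s:%d)\n",  \
                                     #cond, __FILE__, __LINE__);        \
                        panic_stop();                                   \
                }                                                       \
        } while (0)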

Jan

Jan Kiszka

unread,
Feb 18, 2016, 1:12:51 AM2/18/16
to Valentine Sinitsyn, jailho...@googlegroups.com
On 2015-09-12 14:05, Valentine Sinitsyn wrote:
> For AMD, we need to store not only base address but also MMIO size
> (and maybe other attributes in future). Introduce struct jailhoue_iommu
> to encapsulate all data required. For now, all fields make sense both for
> VT-d and AMD IOMMU.
>
> Signed-off-by: Valentine Sinitsyn <valentine...@gmail.com>
> ---
> configs/f2a88xm-hd3.c | 9 +++-
> configs/h87i.c | 12 ++++--
> configs/qemu-vm.c | 7 +++-
> hypervisor/arch/x86/iommu.c | 2 +-
> hypervisor/arch/x86/vtd.c | 6 ++-
> hypervisor/include/jailhouse/cell-config.h | 10 ++++-
> tools/jailhouse-config-create | 67 ++++++++++++++++++++++--------
> tools/root-cell-config.c.tmpl | 13 ++++--
> 8 files changed, 95 insertions(+), 31 deletions(-)
>
> diff --git a/configs/f2a88xm-hd3.c b/configs/f2a88xm-hd3.c
> index 2736496..a48a4ed 100644
> --- a/configs/f2a88xm-hd3.c
> +++ b/configs/f2a88xm-hd3.c
> @@ -40,8 +40,13 @@ struct {
> .mmconfig_base = 0xe0000000,
> .mmconfig_end_bus = 0xff,
> .pm_timer_address = 0x808,
> - .iommu_base = {
> - 0xfeb80000,
> + .iommu_units = {
> + {
> + .base = 0xfeb80000,
> + .size = 0x80000,
> + .amd_bdf = 0x02,
> + .amd_cap = 0x40,

I'm working on a version now that also has amd_msi_cap. Valentine, could
you look up the MSI capability offset for this device, or send me the
collected config files?

Thanks,
Jan

Valentine Sinitsyn

unread,
Feb 18, 2016, 11:08:29 AM2/18/16
to Jan Kiszka, jailho...@googlegroups.com
Hi Jan,
I guess it's 0x54:

00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 15h (Models
30h-3fh) I/O Memory Management Unit
Subsystem: Advanced Micro Devices, Inc. [AMD] Family 15h
(Models 30h-3fh) I/O Memory Management Unit
Flags: fast devsel, IRQ 255
Capabilities: [40] Secure device <?>
Capabilities: [54] MSI: Enable- Count=1/4 Maskable- 64bit+
Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
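
If it helps, with the amd_msi_cap field you mention the f2a88xm-hd3
entry would then presumably end up as something like (the field name is
assumed from your description):

        .iommu_units = {
                {
                        .base = 0xfeb80000,
                        .size = 0x80000,
                        .amd_bdf = 0x02,
                        .amd_cap = 0x40,
                        .amd_msi_cap = 0x54,
                },
        },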

HTH,
Valentine
