[PATCH 0/9] ivshmem-related fixes and cleanups

9 views
Skip to first unread message

Jan Kiszka

unread,
Dec 7, 2016, 1:23:32 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
This primarily addresses ivhsmem setup and tear-down issues, most were
not visible so far as we had not non-root to non-root links. The fix for
mmio, however, may have been visible before when removing PCI devices
from the root cell. It was found, though, while testing multiple ivshmem
devices.

Jan


CC: Henning Schild <henning...@siemens.com>

Henning Schild (2):
inmates: x86: match pci device class in ivshmem-demo
core: ivshmem: match device class to connect

Jan Kiszka (7):
core: ivshmem: Fix matching of peers
core: ivshmem: Keep BDF also in ivshmem_data structure
core: ivshmem: Rework device creation and destruction
core: ivshmem: Factor out ivshmem_remote_interrupt
core: ivshmem: Synchronize remote device access with device
destruction
core: ivshmem: Use local variable for PCI device info
core: Fix matching on mmio_region_unregister

Documentation/inter-cell-communication.txt | 3 +-
hypervisor/arch/arm-common/ivshmem.c | 9 +-
hypervisor/arch/x86/ivshmem.c | 13 +-
hypervisor/include/jailhouse/ivshmem.h | 8 +-
hypervisor/ivshmem.c | 250 +++++++++++++----------------
hypervisor/mmio.c | 2 +-
inmates/demos/x86/ivshmem-demo.c | 21 ++-
inmates/lib/x86/inmate.h | 2 +
8 files changed, 143 insertions(+), 165 deletions(-)

--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:32 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
So far, we created another ivshmem_data structure in case two devices
matched by BDF but not by their shared memory regions. This did not make
sense (ivshmem_find only returns the first BDF match) and it could leak
the data structure when the device was destroyed again.

Change the policy so that we match on BDF but fail creating the second
device in case of region mismatch.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/ivshmem.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/hypervisor/ivshmem.c b/hypervisor/ivshmem.c
index f8d4feb..f06622b 100644
--- a/hypervisor/ivshmem.c
+++ b/hypervisor/ivshmem.c
@@ -393,21 +393,24 @@ int ivshmem_init(struct cell *cell, struct pci_device *device)
+ device->info->shmem_region;
ivp = ivshmem_find(device, NULL);
if (ivp) {
+ if ((*ivp)->eps[1].device)
+ return trace_error(-EBUSY);
+
dev0 = (*ivp)->eps[0].device;
mem0 = jailhouse_cell_mem_regions(dev0->cell->config) +
dev0->info->shmem_region;

+ /* check that the regions of both peers match */
+ if (mem0->phys_start != mem->phys_start ||
+ mem0->size != mem->size)
+ return trace_error(-EINVAL);
+
/* we already have a datastructure, connect second endpoint */
- if ((mem0->phys_start == mem->phys_start) &&
- (mem0->size == mem->size)) {
- if ((*ivp)->eps[1].device)
- return trace_error(-EBUSY);
- ivshmem_connect_cell(*ivp, device, mem, 1);
- printk("Virtual PCI connection established "
- "\"%s\" <--> \"%s\"\n",
- cell->config->name, dev0->cell->config->name);
- goto connected;
- }
+ ivshmem_connect_cell(*ivp, device, mem, 1);
+ printk("Virtual PCI connection established "
+ "\"%s\" <--> \"%s\"\n",
+ cell->config->name, dev0->cell->config->name);
+ goto connected;
}

/* this is the first endpoint, allocate a new datastructure */
--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:32 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
Cache the matching BDF of the peer in the ivshmem_data structure. This
will help us when we no longer require slot 0 to be in use.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/ivshmem.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hypervisor/ivshmem.c b/hypervisor/ivshmem.c
index f06622b..cd6632e 100644
--- a/hypervisor/ivshmem.c
+++ b/hypervisor/ivshmem.c
@@ -50,6 +50,7 @@

struct ivshmem_data {
struct ivshmem_endpoint eps[2];
+ u16 bdf;
struct ivshmem_data *next;
};

@@ -239,12 +240,10 @@ static int ivshmem_write_msix_control(struct ivshmem_endpoint *ive, u32 val)
static struct ivshmem_data **ivshmem_find(struct pci_device *d, int *cellnum)
{
struct ivshmem_data **ivp, *iv;
- u16 bdf2;

for (ivp = &ivshmem_list; *ivp; ivp = &((*ivp)->next)) {
iv = *ivp;
- bdf2 = iv->eps[0].device->info->bdf;
- if (d->info->bdf == bdf2) {
+ if (d->info->bdf == iv->bdf) {
if (iv->eps[0].device == d) {
if (cellnum)
*cellnum = 0;
@@ -419,6 +418,7 @@ int ivshmem_init(struct cell *cell, struct pci_device *device)
*ivp = page_alloc(&mem_pool, 1);
if (!(*ivp))
return -ENOMEM;
+ (*ivp)->bdf = device->info->bdf;
ivshmem_connect_cell(*ivp, device, mem, 0);

connected:
--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:32 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
Moving device from slot 1 to slot 0 in case the first slot becomes free
had a bug: it didn't update the pointer from pci_device to the
ivshmem_endpoint.

Instead of fixing this, let's take the chance and redesign the
management: Thanks to earlier preparation steps, we can now keep the
device in its original slot. We just need to check during creation which
slot is free (device pointer is NULL) and use that. Now ivpos of an
endpoint always equals the array index in ivshmem_data.eps that the
endpoint uses.

Finally, we can significantly simplify the code by folding ivshmem_find
and ivshmem_connect/disconnect_cell into their callers.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/ivshmem.c | 198 ++++++++++++++++++++-------------------------------
1 file changed, 76 insertions(+), 122 deletions(-)

diff --git a/hypervisor/ivshmem.c b/hypervisor/ivshmem.c
index cd6632e..5cedd35 100644
--- a/hypervisor/ivshmem.c
+++ b/hypervisor/ivshmem.c
@@ -1,10 +1,11 @@
/*
* Jailhouse, a Linux-based partitioning hypervisor
*
- * Copyright (c) Siemens AG, 2014, 2015
+ * Copyright (c) Siemens AG, 2014-2016
*
- * Author:
+ * Authors:
* Henning Schild <henning...@siemens.com>
+ * Jan Kiszka <jan.k...@siemens.com>
*
* This work is licensed under the terms of the GNU GPL, version 2. See
* the COPYING file in the top-level directory.
@@ -237,86 +238,6 @@ static int ivshmem_write_msix_control(struct ivshmem_endpoint *ive, u32 val)
return 0;
}

-static struct ivshmem_data **ivshmem_find(struct pci_device *d, int *cellnum)
-{
- struct ivshmem_data **ivp, *iv;
-
- for (ivp = &ivshmem_list; *ivp; ivp = &((*ivp)->next)) {
- iv = *ivp;
- if (d->info->bdf == iv->bdf) {
- if (iv->eps[0].device == d) {
- if (cellnum)
- *cellnum = 0;
- return ivp;
- }
- if (iv->eps[1].device == d) {
- if (cellnum)
- *cellnum = 1;
- return ivp;
- }
- if (!cellnum)
- return ivp;
- }
- }
-
- return NULL;
-}
-
-static void ivshmem_connect_cell(struct ivshmem_data *iv,
- struct pci_device *d,
- const struct jailhouse_memory *mem,
- int cellnum)
-{
- struct ivshmem_endpoint *remote = &iv->eps[(cellnum + 1) % 2];
- struct ivshmem_endpoint *ive = &iv->eps[cellnum];
-
- d->bar[0] = PCI_BAR_64BIT;
- d->bar[4] = PCI_BAR_64BIT;
-
- memcpy(ive->cspace, &default_cspace, sizeof(default_cspace));
-
- ive->cspace[0x08/4] |= d->info->shmem_protocol << 8;
-
- if (d->info->num_msix_vectors == 0) {
- /* let the PIN rotate based on the device number */
- ive->cspace[PCI_CFG_INT/4] =
- (((d->info->bdf >> 3) & 0x3) + 1) << 8;
- /* disable MSI-X capability */
- ive->cspace[PCI_CFG_CAPS/4] = 0;
- }
-
- ive->cspace[IVSHMEM_CFG_SHMEM_PTR/4] = (u32)mem->virt_start;
- ive->cspace[IVSHMEM_CFG_SHMEM_PTR/4 + 1] = (u32)(mem->virt_start >> 32);
- ive->cspace[IVSHMEM_CFG_SHMEM_SZ/4] = (u32)mem->size;
- ive->cspace[IVSHMEM_CFG_SHMEM_SZ/4 + 1] = (u32)(mem->size >> 32);
-
- ive->device = d;
- if (remote->device) {
- ive->remote = remote;
- remote->remote = ive;
- ive->ivpos = (remote->ivpos + 1) % 2;
- } else {
- ive->ivpos = cellnum;
- ive->remote = NULL;
- remote->remote = NULL;
- }
- d->ivshmem_endpoint = ive;
-}
-
-static void ivshmem_disconnect_cell(struct ivshmem_data *iv, int cellnum)
-{
- struct ivshmem_endpoint *remote = &iv->eps[(cellnum + 1) % 2];
- struct ivshmem_endpoint *ive = &iv->eps[cellnum];
-
- remote->remote = NULL;
- memory_barrier();
- arch_ivshmem_write_doorbell(ive);
-
- ive->device->ivshmem_endpoint = NULL;
- ive->device = NULL;
- ive->remote = NULL;
-}
-
/**
* Handler for MMIO-write-accesses to PCI config space of this virtual device.
* @param device The device that access should be performed on.
@@ -381,47 +302,80 @@ enum pci_access ivshmem_pci_cfg_read(struct pci_device *device, u16 address,
*/
int ivshmem_init(struct cell *cell, struct pci_device *device)
{
- const struct jailhouse_memory *mem, *mem0;
- struct ivshmem_data **ivp;
- struct pci_device *dev0;
+ const struct jailhouse_memory *mem, *peer_mem;
+ struct ivshmem_endpoint *ive, *remote;
+ struct pci_device *peer_dev;
+ struct ivshmem_data *iv;
+ unsigned int id = 0;

if (device->info->shmem_region >= cell->config->num_memory_regions)
return trace_error(-EINVAL);

mem = jailhouse_cell_mem_regions(cell->config)
+ device->info->shmem_region;
- ivp = ivshmem_find(device, NULL);
- if (ivp) {
- if ((*ivp)->eps[1].device)
+
+ for (iv = ivshmem_list; iv; iv = iv->next)
+ if (iv->bdf == device->info->bdf)
+ break;
+
+ if (iv) {
+ id = iv->eps[0].device ? 1 : 0;
+ if (iv->eps[id].device)
return trace_error(-EBUSY);

- dev0 = (*ivp)->eps[0].device;
- mem0 = jailhouse_cell_mem_regions(dev0->cell->config) +
- dev0->info->shmem_region;
+ peer_dev = iv->eps[id ^ 1].device;
+ peer_mem = jailhouse_cell_mem_regions(peer_dev->cell->config) +
+ peer_dev->info->shmem_region;

/* check that the regions of both peers match */
- if (mem0->phys_start != mem->phys_start ||
- mem0->size != mem->size)
+ if (peer_mem->phys_start != mem->phys_start ||
+ peer_mem->size != mem->size)
return trace_error(-EINVAL);

- /* we already have a datastructure, connect second endpoint */
- ivshmem_connect_cell(*ivp, device, mem, 1);
- printk("Virtual PCI connection established "
- "\"%s\" <--> \"%s\"\n",
- cell->config->name, dev0->cell->config->name);
- goto connected;
+ printk("Shared memory connection established: "
+ "\"%s\" <--> \"%s\"\n",
+ cell->config->name, peer_dev->cell->config->name);
+ } else {
+ iv = page_alloc(&mem_pool, 1);
+ if (!iv)
+ return -ENOMEM;
+
+ iv->bdf = device->info->bdf;
+ iv->next = ivshmem_list;
+ ivshmem_list = iv;
+ }
+
+ ive = &iv->eps[id];
+ remote = &iv->eps[id ^ 1];
+
+ device->bar[0] = PCI_BAR_64BIT;
+ device->bar[4] = PCI_BAR_64BIT;
+
+ memcpy(ive->cspace, &default_cspace, sizeof(default_cspace));
+
+ ive->cspace[0x08/4] |= device->info->shmem_protocol << 8;
+
+ if (device->info->num_msix_vectors == 0) {
+ /* let the PIN rotate based on the device number */
+ ive->cspace[PCI_CFG_INT/4] =
+ (((device->info->bdf >> 3) & 0x3) + 1) << 8;
+ /* disable MSI-X capability */
+ ive->cspace[PCI_CFG_CAPS/4] = 0;
}

- /* this is the first endpoint, allocate a new datastructure */
- for (ivp = &ivshmem_list; *ivp; ivp = &((*ivp)->next))
- ; /* empty loop */
- *ivp = page_alloc(&mem_pool, 1);
- if (!(*ivp))
- return -ENOMEM;
- (*ivp)->bdf = device->info->bdf;
- ivshmem_connect_cell(*ivp, device, mem, 0);
+ ive->cspace[IVSHMEM_CFG_SHMEM_PTR/4] = (u32)mem->virt_start;
+ ive->cspace[IVSHMEM_CFG_SHMEM_PTR/4 + 1] = (u32)(mem->virt_start >> 32);
+ ive->cspace[IVSHMEM_CFG_SHMEM_SZ/4] = (u32)mem->size;
+ ive->cspace[IVSHMEM_CFG_SHMEM_SZ/4 + 1] = (u32)(mem->size >> 32);
+
+ ive->device = device;
+ ive->ivpos = id;
+ device->ivshmem_endpoint = ive;
+ if (remote->device) {
+ ive->remote = remote;
+ remote->remote = ive;
+ }

-connected:
printk("Adding virtual PCI device %02x:%02x.%x to cell \"%s\"\n",
PCI_BDF_PARAMS(device->info->bdf), cell->config->name);

@@ -435,23 +389,23 @@ connected:
*/
void ivshmem_exit(struct pci_device *device)
{
+ struct ivshmem_endpoint *ive = device->ivshmem_endpoint;
struct ivshmem_data **ivp, *iv;
- int cellnum;

- ivp = ivshmem_find(device, &cellnum);
- if (!ivp || !(*ivp))
- return;
+ if (ive->remote) {
+ ive->remote->remote = NULL;
+ memory_barrier();
+ arch_ivshmem_write_doorbell(ive);

- iv = *ivp;
-
- ivshmem_disconnect_cell(iv, cellnum);
-
- if (cellnum == 0) {
- if (!iv->eps[1].device) {
- *ivp = iv->next;
- page_free(&mem_pool, iv, 1);
- return;
+ ive->device = NULL;
+ } else {
+ for (ivp = &ivshmem_list; *ivp; ivp = &(*ivp)->next) {
+ iv = *ivp;
+ if (&iv->eps[ive->ivpos] == ive) {
+ *ivp = iv->next;
+ page_free(&mem_pool, iv, 1);
+ break;
+ }
}
- iv->eps[0] = iv->eps[1];
}
}
--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:33 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
From: Henning Schild <henning...@siemens.com>

Match the device class and "protocol" while probing and skip unknow
devices.

Signed-off-by: Henning Schild <henning...@siemens.com>
Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
inmates/demos/x86/ivshmem-demo.c | 21 ++++++++++++++-------
inmates/lib/x86/inmate.h | 2 ++
2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/inmates/demos/x86/ivshmem-demo.c b/inmates/demos/x86/ivshmem-demo.c
index 8246117..7361032 100644
--- a/inmates/demos/x86/ivshmem-demo.c
+++ b/inmates/demos/x86/ivshmem-demo.c
@@ -1,7 +1,7 @@
/*
* Jailhouse, a Linux-based partitioning hypervisor
*
- * Copyright (c) Siemens AG, 2014
+ * Copyright (c) Siemens AG, 2014-2016
*
* Authors:
* Henning Schild <henning...@siemens.com>
@@ -17,6 +17,8 @@
#define IVSHMEM_CFG_SHMEM_PTR 0x40
#define IVSHMEM_CFG_SHMEM_SZ 0x48

+#define JAILHOUSE_SHMEM_PROTO_UNDEFINED 0x0000
+
#define IRQ_VECTOR 32

#define MAX_NDEV 4
@@ -114,19 +116,26 @@ void inmate_main(void)
{
int i;
int bdf = 0;
+ unsigned int class_rev;
struct ivshmem_dev_data *d;
volatile char *shmem;

int_init();

-again:
- bdf = pci_find_device(VENDORID, DEVICEID, bdf);
- if (bdf != -1) {
+ while ((ndevices < MAX_NDEV) &&
+ (-1 != (bdf = pci_find_device(VENDORID, DEVICEID, bdf)))) {
printk("IVSHMEM: Found %04x:%04x at %02x:%02x.%x\n",
pci_read_config(bdf, PCI_CFG_VENDOR_ID, 2),
pci_read_config(bdf, PCI_CFG_DEVICE_ID, 2),
bdf >> 8, (bdf >> 3) & 0x1f, bdf & 0x3);
-
+ class_rev = pci_read_config(bdf, 0x8, 4);
+ if (class_rev != (PCI_DEV_CLASS_OTHER << 24 |
+ JAILHOUSE_SHMEM_PROTO_UNDEFINED << 8)) {
+ printk("IVSHMEM: class/revision %08x, not supported "
+ "skipping device\n", class_rev);
+ bdf++;
+ continue;
+ }
ndevices++;
d = devs + ndevices - 1;
d->bdf = bdf;
@@ -139,8 +148,6 @@ again:
int_set_handler(IRQ_VECTOR + ndevices - 1, irq_handler);
pci_msix_set_vector(bdf, IRQ_VECTOR + ndevices - 1, 0);
bdf++;
- if (ndevices < MAX_NDEV)
- goto again;
}

if (!ndevices) {
diff --git a/inmates/lib/x86/inmate.h b/inmates/lib/x86/inmate.h
index 2c22de0..e255538 100644
--- a/inmates/lib/x86/inmate.h
+++ b/inmates/lib/x86/inmate.h
@@ -53,6 +53,8 @@

#define PCI_ID_ANY 0xffff

+#define PCI_DEV_CLASS_OTHER 0xff
+
#define PCI_CAP_MSI 0x05
#define PCI_CAP_MSIX 0x11

--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:33 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
From: Henning Schild <henning...@siemens.com>

Since 3e970ab533c4 we can specify a protocol to run on the ivshmem channel.
This patch considers a protocol mismatch between the two endpoints a
configuration error and does not connect them to each other.

Signed-off-by: Henning Schild <henning...@siemens.com>
[Jan: rebased]
Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
Documentation/inter-cell-communication.txt | 3 ++-
hypervisor/ivshmem.c | 5 +++--
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/Documentation/inter-cell-communication.txt b/Documentation/inter-cell-communication.txt
index 2d0eded..dbaea06 100644
--- a/Documentation/inter-cell-communication.txt
+++ b/Documentation/inter-cell-communication.txt
@@ -61,7 +61,8 @@ cell config you should make sure that "iommu" is set to the correct value,
try using the same value that works for the other pci devices.
The link between two such virtual PCI devices is established by using the same
"bdf". The size and location of the shared memory can be configured freely but
-you have to make sure that the values match on both sides.
+you have to make sure that the values match on both sides. The "shmem_protocol"
+has to match as well.
For an example have a look at the cell configuration files of qemu and the
ivshmem-demo.

diff --git a/hypervisor/ivshmem.c b/hypervisor/ivshmem.c
index eace458..a9e8440 100644
--- a/hypervisor/ivshmem.c
+++ b/hypervisor/ivshmem.c
@@ -337,9 +337,10 @@ int ivshmem_init(struct cell *cell, struct pci_device *device)
peer_mem = jailhouse_cell_mem_regions(peer_dev->cell->config) +
peer_dev->info->shmem_region;

- /* check that the regions of both peers match */
+ /* check that the regions and protocols of both peers match */
if (peer_mem->phys_start != mem->phys_start ||
- peer_mem->size != mem->size)
+ peer_mem->size != mem->size ||
+ peer_dev->info->shmem_protocol != dev_info->shmem_protocol)
return trace_error(-EINVAL);

printk("Shared memory connection established: "
--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:33 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
The check for the remote being present before sending it an interrupt is
actually generic. Factor this out and change the arch-specific callback
to only trigger the interrupt on the destination device.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/arch/arm-common/ivshmem.c | 9 ++-------
hypervisor/arch/x86/ivshmem.c | 13 ++++---------
hypervisor/include/jailhouse/ivshmem.h | 6 +++---
hypervisor/ivshmem.c | 14 +++++++++++---
4 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/hypervisor/arch/arm-common/ivshmem.c b/hypervisor/arch/arm-common/ivshmem.c
index 14da345..19d300d 100644
--- a/hypervisor/arch/arm-common/ivshmem.c
+++ b/hypervisor/arch/arm-common/ivshmem.c
@@ -13,15 +13,10 @@
#include <jailhouse/ivshmem.h>
#include <asm/irqchip.h>

-void arch_ivshmem_write_doorbell(struct ivshmem_endpoint *ive)
+void arch_ivshmem_trigger_interrupt(struct ivshmem_endpoint *ive)
{
- struct ivshmem_endpoint *remote = ive->remote;
- unsigned int irq_id;
+ unsigned int irq_id = ive->arch.irq_id;

- if (!remote)
- return;
-
- irq_id = remote->arch.irq_id;
if (irq_id)
irqchip_set_pending(NULL, irq_id);
}
diff --git a/hypervisor/arch/x86/ivshmem.c b/hypervisor/arch/x86/ivshmem.c
index 0df018c..7ec3ea7 100644
--- a/hypervisor/arch/x86/ivshmem.c
+++ b/hypervisor/arch/x86/ivshmem.c
@@ -16,17 +16,12 @@
#include <jailhouse/printk.h>
#include <asm/pci.h>

-void arch_ivshmem_write_doorbell(struct ivshmem_endpoint *ive)
+void arch_ivshmem_trigger_interrupt(struct ivshmem_endpoint *ive)
{
- struct ivshmem_endpoint *remote = ive->remote;
- struct apic_irq_message irq_msg;
-
- if (!remote)
- return;
+ /* Get a copy of the struct before using it. */
+ struct apic_irq_message irq_msg = ive->arch.irq_msg;

- /* get a copy of the struct before using it, the read barrier makes
- * sure the copy is consistent */
- irq_msg = remote->arch.irq_msg;
+ /* The read barrier makes sure the copy is consistent. */
memory_load_barrier();
if (irq_msg.valid)
apic_send_irq(irq_msg);
diff --git a/hypervisor/include/jailhouse/ivshmem.h b/hypervisor/include/jailhouse/ivshmem.h
index 742678d..b6a1044 100644
--- a/hypervisor/include/jailhouse/ivshmem.h
+++ b/hypervisor/include/jailhouse/ivshmem.h
@@ -50,10 +50,10 @@ enum pci_access ivshmem_pci_cfg_read(struct pci_device *device, u16 address,
bool ivshmem_is_msix_masked(struct ivshmem_endpoint *ive);

/**
- * Handle write to doorbell register.
- * @param ive Ivshmem endpoint the write was performed on.
+ * Trigger interrupt on ivshmem endpoint.
+ * @param ive Ivshmem endpoint the interrupt should be raised at.
*/
-void arch_ivshmem_write_doorbell(struct ivshmem_endpoint *ive);
+void arch_ivshmem_trigger_interrupt(struct ivshmem_endpoint *ive);

/**
* Update cached MSI-X state (if any) of the given ivshmem device.
diff --git a/hypervisor/ivshmem.c b/hypervisor/ivshmem.c
index 5cedd35..1e3d27d 100644
--- a/hypervisor/ivshmem.c
+++ b/hypervisor/ivshmem.c
@@ -70,6 +70,14 @@ static const u32 default_cspace[IVSHMEM_CFG_SIZE / sizeof(u32)] = {
[(IVSHMEM_CFG_MSIX_CAP + 0x8)/4] = 0x10 * IVSHMEM_MSIX_VECTORS | 4,
};

+static void ivshmem_remote_interrupt(struct ivshmem_endpoint *ive)
+{
+ struct ivshmem_endpoint *remote = ive->remote;
+
+ if (remote)
+ arch_ivshmem_trigger_interrupt(ive->remote);
+}
+
static enum mmio_result ivshmem_register_mmio(void *arg,
struct mmio_access *mmio)
{
@@ -93,7 +101,7 @@ static enum mmio_result ivshmem_register_mmio(void *arg,

if (mmio->address == IVSHMEM_REG_DBELL) {
if (mmio->is_write)
- arch_ivshmem_write_doorbell(ive);
+ ivshmem_remote_interrupt(ive);
else
mmio->value = 0;
return MMIO_HANDLED;
@@ -103,7 +111,7 @@ static enum mmio_result ivshmem_register_mmio(void *arg,
if (mmio->is_write) {
ive->state = mmio->value;
memory_barrier();
- arch_ivshmem_write_doorbell(ive);
+ ivshmem_remote_interrupt(ive);
} else {
mmio->value = ive->state;
}
@@ -395,7 +403,7 @@ void ivshmem_exit(struct pci_device *device)
if (ive->remote) {
ive->remote->remote = NULL;
memory_barrier();
- arch_ivshmem_write_doorbell(ive);
+ ivshmem_remote_interrupt(ive);

ive->device = NULL;
} else {
--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:33 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
In case of an ivshmem link between two non-root cells, one can go down
while the other is running. That is different from root to non-root
links, because the non-root cell always leaves first, and the root cell
is stopped meanwhile. We have to avoid the, after destroying one peer,
the other will still send interrupts to the former peer.

The simplest way to achieve this is using a spinlock around
dereferencing the remote pointer of an ivshmem endpoint and sending out
the interrupt to the peer on one side and around the clearing of the
remote pointer during destruction. The latter will ensure that the
sender is either already done with the transmission or will see NULL as
remote, thus will not send any message anymore. This resolves the race
between flushing pending interrupts from the destructed cell and sending
new ones from the former peer cell.

Spinlocks come with memory barrier semantics, so we can remove the
explicit barriers.

Now that we have the lock, we can also synchronize the rstate readout of
the remaining peer with device destruction to ensure that no other CPU
will consider the device alive after ivshmem_exit returned.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/include/jailhouse/ivshmem.h | 2 ++
hypervisor/ivshmem.c | 32 +++++++++++++++++++++-----------
2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/hypervisor/include/jailhouse/ivshmem.h b/hypervisor/include/jailhouse/ivshmem.h
index b6a1044..82a4837 100644
--- a/hypervisor/include/jailhouse/ivshmem.h
+++ b/hypervisor/include/jailhouse/ivshmem.h
@@ -16,6 +16,7 @@

#include <jailhouse/pci.h>
#include <asm/ivshmem.h>
+#include <asm/spinlock.h>

#define IVSHMEM_CFG_MSIX_CAP 0x50
#define IVSHMEM_CFG_SIZE (IVSHMEM_CFG_MSIX_CAP + 12)
@@ -35,6 +36,7 @@ struct ivshmem_endpoint {
u64 bar4_address;
struct pci_device *device;
struct ivshmem_endpoint *remote;
+ spinlock_t remote_lock;
struct arch_pci_ivshmem arch;
u32 intx_ctrl_reg;
};
diff --git a/hypervisor/ivshmem.c b/hypervisor/ivshmem.c
index 1e3d27d..bbf830d 100644
--- a/hypervisor/ivshmem.c
+++ b/hypervisor/ivshmem.c
@@ -72,10 +72,14 @@ static const u32 default_cspace[IVSHMEM_CFG_SIZE / sizeof(u32)] = {

static void ivshmem_remote_interrupt(struct ivshmem_endpoint *ive)
{
- struct ivshmem_endpoint *remote = ive->remote;
-
- if (remote)
+ /*
+ * Hold the remote lock while sending the interrupt so that
+ * ivshmem_exit can synchronize on the completion of the delivery.
+ */
+ spin_lock(&ive->remote_lock);
+ if (ive->remote)
arch_ivshmem_trigger_interrupt(ive->remote);
+ spin_unlock(&ive->remote_lock);
}

static enum mmio_result ivshmem_register_mmio(void *arg,
@@ -110,7 +114,6 @@ static enum mmio_result ivshmem_register_mmio(void *arg,
if (mmio->address == IVSHMEM_REG_LSTATE) {
if (mmio->is_write) {
ive->state = mmio->value;
- memory_barrier();
ivshmem_remote_interrupt(ive);
} else {
mmio->value = ive->state;
@@ -119,10 +122,9 @@ static enum mmio_result ivshmem_register_mmio(void *arg,
}

if (mmio->address == IVSHMEM_REG_RSTATE && !mmio->is_write) {
- if (ive->remote)
- mmio->value = ive->remote->state;
- else
- mmio->value = 0;
+ spin_lock(&ive->remote_lock);
+ mmio->value = ive->remote ? ive->remote->state : 0;
+ spin_unlock(&ive->remote_lock);
return MMIO_HANDLED;
}

@@ -398,11 +400,19 @@ int ivshmem_init(struct cell *cell, struct pci_device *device)
void ivshmem_exit(struct pci_device *device)
{
struct ivshmem_endpoint *ive = device->ivshmem_endpoint;
+ struct ivshmem_endpoint *remote = ive->remote;
struct ivshmem_data **ivp, *iv;

- if (ive->remote) {
- ive->remote->remote = NULL;
- memory_barrier();
+ if (remote) {
+ /*
+ * The spinlock synchronizes the disconnection of the remote
+ * device with any in-flight interrupts targeting the device
+ * to be destroyed.
+ */
+ spin_lock(&remote->remote_lock);
+ remote->remote = NULL;
+ spin_unlock(&remote->remote_lock);
+
ivshmem_remote_interrupt(ive);

ive->device = NULL;
--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:33 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
Shortens some existing and upcoming long lines.

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/ivshmem.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hypervisor/ivshmem.c b/hypervisor/ivshmem.c
index bbf830d..eace458 100644
--- a/hypervisor/ivshmem.c
+++ b/hypervisor/ivshmem.c
@@ -312,20 +312,20 @@ enum pci_access ivshmem_pci_cfg_read(struct pci_device *device, u16 address,
*/
int ivshmem_init(struct cell *cell, struct pci_device *device)
{
+ const struct jailhouse_pci_device *dev_info = device->info;
const struct jailhouse_memory *mem, *peer_mem;
struct ivshmem_endpoint *ive, *remote;
struct pci_device *peer_dev;
struct ivshmem_data *iv;
unsigned int id = 0;

- if (device->info->shmem_region >= cell->config->num_memory_regions)
+ if (dev_info->shmem_region >= cell->config->num_memory_regions)
return trace_error(-EINVAL);

- mem = jailhouse_cell_mem_regions(cell->config)
- + device->info->shmem_region;
+ mem = jailhouse_cell_mem_regions(cell->config) + dev_info->shmem_region;

for (iv = ivshmem_list; iv; iv = iv->next)
- if (iv->bdf == device->info->bdf)
+ if (iv->bdf == dev_info->bdf)
break;

if (iv) {
@@ -350,7 +350,7 @@ int ivshmem_init(struct cell *cell, struct pci_device *device)
if (!iv)
return -ENOMEM;

- iv->bdf = device->info->bdf;
+ iv->bdf = dev_info->bdf;
iv->next = ivshmem_list;
ivshmem_list = iv;
}
@@ -363,12 +363,12 @@ int ivshmem_init(struct cell *cell, struct pci_device *device)

memcpy(ive->cspace, &default_cspace, sizeof(default_cspace));

- ive->cspace[0x08/4] |= device->info->shmem_protocol << 8;
+ ive->cspace[0x08/4] |= dev_info->shmem_protocol << 8;

- if (device->info->num_msix_vectors == 0) {
+ if (dev_info->num_msix_vectors == 0) {
/* let the PIN rotate based on the device number */
ive->cspace[PCI_CFG_INT/4] =
- (((device->info->bdf >> 3) & 0x3) + 1) << 8;
+ (((dev_info->bdf >> 3) & 0x3) + 1) << 8;
/* disable MSI-X capability */
ive->cspace[PCI_CFG_CAPS/4] = 0;
}
@@ -387,7 +387,7 @@ int ivshmem_init(struct cell *cell, struct pci_device *device)
}

printk("Adding virtual PCI device %02x:%02x.%x to cell \"%s\"\n",
- PCI_BDF_PARAMS(device->info->bdf), cell->config->name);
+ PCI_BDF_PARAMS(dev_info->bdf), cell->config->name);

return 0;
}
--
2.1.4

Jan Kiszka

unread,
Dec 7, 2016, 1:23:33 AM12/7/16
to jailho...@googlegroups.com, Henning Schild
Looking up the to-be-deleted region with zero legnth can cause false
matches:

Registered regions:
Start Size
0xb0000000 0x10000000
0x100000000 0x1000
0x100002000 0x20
0x100002020 0x20

Now trying to remove 0x100002020 will match on 0x100002000. This can be
trivially avoided by looking for a 1-byte region.

Fixes: 0993685e882f ("core: Add generic MMIO access dispatching")

Signed-off-by: Jan Kiszka <jan.k...@siemens.com>
---
hypervisor/mmio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hypervisor/mmio.c b/hypervisor/mmio.c
index 5fdcc31..7f3f8fa 100644
--- a/hypervisor/mmio.c
+++ b/hypervisor/mmio.c
@@ -171,7 +171,7 @@ void mmio_region_unregister(struct cell *cell, unsigned long start)

spin_lock(&cell->mmio_region_lock);

- index = find_region(cell, start, 0);
+ index = find_region(cell, start, 1);
if (index >= 0) {
for (/* empty */; index < cell->num_mmio_regions; index++)
copy_region(cell, index + 1, index);
--
2.1.4

Reply all
Reply to author
Forward
0 new messages