virtio as a model for IO devices and who is doing PCI?

ron minnich

Jan 2, 2017, 11:58:14 AM
to RISC-V HW Dev
I can't remember if I asked this before; if so, sorry.

I've found virtio to be a pretty reasonable model for IO devices, especially compared to things like AHCI. One particularly nice thing is that the addresses used for DMA live in a well-defined structure, so a kernel can exercise reasonable control over well-behaved hardware on systems without an IOMMU, where at least part of the problem is figuring out what addresses are used for DMA -- you just about have to write a second driver to work this out.

I'm wondering if we could at least encourage people when they create new hardware for RISCV to make their devices implement virtio as the interface?

Secondly, what rv64 chip company is doing PCI? Anyone?

ron

Samuel Falvo II

Jan 2, 2017, 12:32:28 PM
to ron minnich, RISC-V HW Dev
On Mon, Jan 2, 2017 at 8:58 AM, ron minnich <rmin...@gmail.com> wrote:
> I'm wondering if we could at least encourage people when they create new
> hardware for RISCV to make their devices implement virtio as the interface?

I believe this is also valuable for emulation purposes. My
Kestrel-3 emulator currently ties video output into its normal
functioning (think of how the VICE emulator works when emulating a
Commodore 64, for example). I'm realizing now this is a mistake; it's
a great model for an appliance-oriented architecture, but not so much
when things are in flux and under active development.

virtio (if I understand it correctly) allows I/O devices to be
emulated in any reasonable manner that makes sense for a given level
of abstraction. I wish I'd known about it before I started this
project years ago.

--
Samuel A. Falvo II

ron minnich

Jan 2, 2017, 12:39:27 PM
to Samuel Falvo II, RISC-V HW Dev
On Mon, Jan 2, 2017 at 9:32 AM Samuel Falvo II <sam....@gmail.com> wrote:


virtio (if I understand it correctly) allows I/O devices to be
emulated in any reasonable manner that makes sense for a given level
of abstraction.  I wish I'd known about it before I started this
project years ago.



I'm not sure I'd put it that way. Virtio is just an (IMHO) well-thought-out definition of Ye Basic Shared Memory Queueueues, where said queues contain pointers [this is a simplified explanation ;-)]. It's easy to implement, but the people who put it together spent a fair amount of time thinking about efficiency and I think they got the hard parts right. I've worked on virtio in Linux, added virtio-mm to Akaros, and been happy with the results. It's also been shown to work well in many other kernels, which is a good sign.
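
If you haven't looked at the spec, the whole mechanism is basically a descriptor table plus two index rings. Roughly, the virtio 1.0 split-virtqueue layout looks like this (field names follow the spec; alignment, padding and the event-index feature are omitted):

#include <stdint.h>

struct virtq_desc {              /* one entry in the descriptor table */
    uint64_t addr;               /* physical address of the buffer -- this is
                                    where the kernel controls exactly what
                                    memory the device may touch */
    uint32_t len;                /* length of that buffer in bytes */
    uint16_t flags;              /* NEXT (chained), WRITE (device writes), ... */
    uint16_t next;               /* index of the next descriptor in the chain */
};

struct virtq_avail {             /* driver -> device: "these chains are ready" */
    uint16_t flags;
    uint16_t idx;                /* where the driver will write next */
    uint16_t ring[];             /* head indices of descriptor chains */
};

struct virtq_used_elem {
    uint32_t id;                 /* head index of a completed chain */
    uint32_t len;                /* bytes the device actually wrote */
};

struct virtq_used {              /* device -> driver: "these chains are done" */
    uint16_t flags;
    uint16_t idx;
    struct virtq_used_elem ring[];
};

That's essentially the whole data path; everything else is feature negotiation and the notification ("kick") mechanism.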

Michael Clark

Jan 2, 2017, 8:35:25 PM
to ron minnich, RISC-V HW Dev
On 3 Jan 2017, at 5:58 AM, ron minnich <rmin...@gmail.com> wrote:

I can't remember if I asked this before, of so, sorry.

I've found virtio to be a pretty reasonable model for IO devices, especially as compared to things like AHCI. One particularly nice thing is that the addresses used for DMA are in a well defined structure, so a kernel can exercise reasonable control for well-behaving bits of hardware on those systems not having an IOMMU, where at least part of the problem is figuring out what addresses are used for DMA — you just about have to write a second driver to work this out.

I’ve been thinking about this.

Linux defines VIRTIO_PCI and VIRTIO_MMIO which use DMA but it seems there is a VirtIO model using Channel IO on S390.

It may be that we just need “ioremap”, which on RISC-V would map host <-> device MMIO regions into the kernel address space via page table updates, allowing VirtIO to follow a Channel IO model.

It seems it may be possible to work without DMA if a device’s virtqueue memory is in statically allocated, per-device MMIO regions shared between the host and the device. The S390 Channel IO model seems ideal for a higher-security environment where it’s impossible for untrusted drivers to scribble on random memory.

It has always concerned me that PCI devices can bus-master and effectively scribble on any RAM accessible by the kernel. It reminds me of the Thunderbolt/FireWire DMA exploits on the MacBook Pro (*1).

The cost of setting up DMA transfers to arbitrary buffers (essentially an sfence.vm for Virtual IO) has to be weighed against the cost of one memcpy. There needs to be at least one copy, assuming the device has its own buffer (in an MMIO region), e.g. netmap-style access to a NIC buffer. It’s a question of whether it is a DMA copy or a memcpy.

I’m wondering if we could at least encourage people when they create new hardware for RISCV to make their devices implement virtio as the interface?

Good idea. The VirtIO structures could be used without DMA if there is a constraint that the physical addresses for buffers are in a statically allocated host <-> device MMIO region. Setting up page table mappings for arbitrary transfers may be more expensive than memcpy.
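
To picture the constraint (a sketch only, with made-up window addresses): the device, or a hypervisor fronting it, would simply refuse any descriptor whose buffer falls outside the shared window, so page table games for arbitrary transfers never arise.

#include <stdbool.h>
#include <stdint.h>

#define SHARED_WIN_BASE 0x80000000ULL   /* hypothetical host <-> device region */
#define SHARED_WIN_SIZE 0x00100000ULL   /* 1 MiB, statically allocated */

/* Accept a virtqueue buffer only if it lies entirely inside the window. */
static bool buffer_in_shared_window(uint64_t addr, uint32_t len)
{
    return addr >= SHARED_WIN_BASE &&
           len  <= SHARED_WIN_SIZE &&
           addr - SHARED_WIN_BASE <= SHARED_WIN_SIZE - len;
}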

Secondly, what rv64 chip company is doing PCI? Anyone?

ron

ron minnich

Jan 2, 2017, 10:12:43 PM
to Michael Clark, RISC-V HW Dev
On Mon, Jan 2, 2017 at 5:35 PM Michael Clark <michae...@mac.com> wrote:


Linux defines VIRTIO_PCI and VIRTIO_MMIO which use DMA but it seems there is a VirtIO model using Channel IO on S390.

They don't require DMA for many of the devices, so I'm missing something here. DMA is made possible by the fact that pointers are passed in the virtio rings, but DMA is not required to implement a virtio device.
 

It may be that we just need “ioremap” which on RISC-V would map host <-> device MMIO regions into the kernel address space using page table updates for VirtIO using a Channel IO model.

this part I don't understand.

Part of my argument for using virtio as a standard is that the kernel can easily control what memory is handed to the device, and it can have a pretty good idea what a "well behaved" device will do with that memory, and even what memory has been used and what has not. With some really complex devices (e.g. Mellanox) it can be extremely difficult to have an idea of what the device is doing. Virtio would make our life easy.

But using virtio does not provide guarantees for security. It's just a convenient and well defined spec.

Also, note, I'm not saying that use of virtio requires changes to RISC ISA or Priv spec. I'm just arguing that for new hardware, we might want to use virtio as a model for the new hardware's interface, rather than roll a new and different interface for every device.

Further, the virtio standard is dead simple and can be used for simple things like programmed-IO serial console, and complex things like DMA devices.

That said, I'm glad you like the idea :-)
ron

Michael Clark

Jan 2, 2017, 10:48:44 PM
to ron minnich, RISC-V HW Dev
On 3 Jan 2017, at 4:12 PM, ron minnich <rmin...@gmail.com> wrote:



On Mon, Jan 2, 2017 at 5:35 PM Michael Clark <michae...@mac.com> wrote:


Linux defines VIRTIO_PCI and VIRTIO_MMIO which use DMA but it seems there is a VirtIO model using Channel IO on S390.

they don't require dma for many of the devices, so I'm missing something here. DMA is made possible by the fact that pointers are passed in the virtio rings but DMA is not required to implement a virtio device.
 

It may be that we just need “ioremap” which on RISC-V would map host <-> device MMIO regions into the kernel address space using page table updates for VirtIO using a Channel IO model.

this part I don’t understand.

ioremap is the Linux kernel interface for mapping physical MMIO space to a virtual address. S-Mode requires page table entries in order to access memory-mapped IO regions (physical addresses) on platforms that expose IO on the memory bus, i.e. MMIO. Code uses the readb/writeb and readl/writel wrappers to read and write ioremapped memory. On RISC-V, readl/writel likely just access IO memory directly.
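
For illustration, the usual pattern looks roughly like this (DEV_BASE and REG_STATUS are made-up values, not any real device):

#include <linux/io.h>
#include <linux/errno.h>

#define DEV_BASE    0x40001000UL    /* hypothetical physical MMIO base */
#define DEV_SIZE    0x1000
#define REG_STATUS  0x04            /* hypothetical register offset */

static void __iomem *regs;

static int mydev_init(void)
{
    regs = ioremap(DEV_BASE, DEV_SIZE);     /* sets up the page table entries */
    if (!regs)
        return -ENOMEM;
    writel(1, regs + REG_STATUS);           /* MMIO store through the mapping */
    return (readl(regs + REG_STATUS) & 1) ? 0 : -EIO;   /* MMIO load */
}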

Not all devices use DMA. Many just have buffers that they expose via MMIO regions. I am thinking of using circular buffers in an MMIO aperture (instead of DMA), which seems similar to the VirtIO Channel IO implementation for S390. I guess this is like programmed IO and has lower performance.
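
Something along these lines, just to show the shape of it (entirely hypothetical; a real kernel would want memcpy_toio() and proper barriers):

#include <stdint.h>
#include <string.h>

#define RING_SLOTS 16
#define SLOT_SIZE  256

/* Hypothetical ring living entirely inside the device's MMIO aperture;
 * the host never hands the device arbitrary RAM addresses. */
struct mmio_ring {
    volatile uint32_t head;                 /* written by the producer (host)   */
    volatile uint32_t tail;                 /* written by the consumer (device) */
    uint8_t slots[RING_SLOTS][SLOT_SIZE];   /* fixed buffers inside the aperture */
};

/* Programmed-IO send: copy into the next free slot, then publish it. */
static int ring_send(struct mmio_ring *r, const void *buf, size_t len)
{
    uint32_t head = r->head;
    if (((head + 1) % RING_SLOTS) == r->tail || len > SLOT_SIZE)
        return -1;                          /* ring full or message too big */
    memcpy((void *)r->slots[head % RING_SLOTS], buf, len);
    /* a write barrier belongs here before publishing */
    r->head = (head + 1) % RING_SLOTS;      /* device polls head or raises an IRQ */
    return 0;
}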

Part of my argument for using virtio as a standard is that the kernel can easily control what memory is handed to the device, and it can have a pretty good idea what a "well behaved" device will do with that memory, and even what memory has been used and what has not. With some really complex devices (e.g. Mellanox) it can be extremely difficult to have an idea of what the device is doing. Virtio would make our life easy.

I think it's a good idea. If the device is not PCI and can do DMA, then it's connected directly to whatever memory bus is in use and can master, i.e. initiate reads and writes. The model I am talking about is one where the device is a slave and exposes its aperture (read and write buffers) to the system. In the “Virtual” VirtIO case, this would be memory mapped in both the Hypervisor and the Guest, i.e. the “master” is a CPU, so the DMA is a memcpy.

I guess one of the problems is that people will be using various third-party IPs for SoCs, which means the register interfaces for those IPs might be exposed unless there is HDL glue to make them appear as VirtIO. I guess the advantage of DMA is that a CPU core is not blocked on slow memory reads and writes, so it will be used; however, in virtual environments DMA requires an IOMMU, so memcpy is likely going to be used until there is an IOMMU.

But using virtio does not provide guarantees for security. It's just a convenient and well defined spec.

Also, note, I'm not saying that use of virtio requires changes to RISC ISA or Priv spec. I’m just arguing that for new hardware, we might want to use virtio as a model for the new hardware's interface, rather than roll a new and different interface for every device.

It’s a good idea. I was looking at I2C to create an emulated interface for loading boot1 code from an EEPROM, and while the electrical side is standardised, it seems every vendor has their own register-level implementation and almost no device driver code is shared. It’s a total shambles as far as I can tell: standard electrically, but non-standard on the software side. Everyone has different register layouts.

Further, the virtio standard is dead simple and can be used for simple things like programmed-IO serial console, and complex things like DMA devices.

Hmmm. Platform standard for IOMMU and DMA :-)

That said, I'm glad you like the idea :-)
ron

Samuel Falvo II

Jan 3, 2017, 11:14:16 AM
to Michael Clark, ron minnich, RISC-V HW Dev
On Mon, Jan 2, 2017 at 7:48 PM, Michael Clark <michae...@mac.com> wrote:
> Not all devices use DMA. Many just have buffers that they expose via MMIO
> regions. I am thinking of using circular buffers in an MMIO aperture
> (instead of DMA) and this seems similar to the VirtIO Channel IO
> implementation for S390. I guess this is like programmed IO and has lower
> performance.


From the context of the conversation so far, it's not clear to me that
everyone knows what Channel I/O is, so I'm going to summarize it
below. Ignore this if I am mistaken.

Channel I/O is most definitely a DMA-based communications method.
Devices operate as if they were independent computers (in fact, they
always are; all "controllers" on a channel are at least minimally
intelligent). A channel "program" is an effectively Turing-complete
(device support pending) program where each instruction is capable of
DMAing some number of bytes into or out of RAM. Each of these
instructions is called a Channel Control Word, or CCW.

Without going into too much detail, imagine:

struct CCW {
    uint8_t command, flags;
    void *buffer;
    uint16_t length;
};

A channel program, then, can be written:

struct CCW channelProgram[PRG_LENGTH] = { ... };

The s390 kicks off a program by executing a SIO instruction (or, I
think on more modern machines, SSCH instruction):

int16_t deviceID;
sio(deviceID, &channelProgram); /* or */ ssch(deviceID, &channelProgram);


CCW.command includes the following operations (there are more, but
they are basically variations on the theme):

- read -- transfers data from the device into a buffer in memory.
Upon completion, CCW.length == number of bytes remaining to be
transferred, while CCW.buffer -> next byte to place said data into.
In effect, the hardware PCLSRs.

- write -- The same, except it sends data to the device.

- sense -- transfers data from the addressed device's *controller*
into a buffer in memory. This is used to retrieve the results of
sending a controller a "command", or sensing some other current status
indication.

- command -- transfers a command to the addressed device's controller.

- Transfer In Channel -- a normally unconditional jump. Destination
address is CCW.buffer; all other fields ignored.

You can configure a "sense" CCW (at least; not sure of others) so that
a subsequent CCW is skipped if some arbitrary (device-dependent)
boolean condition is found to be true. Thus, TIC instructions, while
technically unconditional, are used as conditional jumps in practice.
Another way of achieving Turing completeness is that CCWs often use
other CCWs as buffers to effect self-modifying channel programs. In
fact, this is how the IBM mainframe family *boots*!
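
For concreteness, filling in the "{ ... }" from earlier, a trivial two-CCW program might look like this (the opcode and flag values below are illustrative, not guaranteed to match the real encodings):

enum {
    CCW_CMD_READ  = 0x02,   /* illustrative command codes */
    CCW_CMD_SENSE = 0x04,
    CCW_FLAG_CC   = 0x40,   /* "command chain": continue with the next CCW */
};

uint8_t sectorBuf[512];
uint8_t senseBuf[32];

struct CCW channelProgram[] = {
    { CCW_CMD_READ,  CCW_FLAG_CC, sectorBuf, sizeof(sectorBuf) }, /* DMA 512 bytes in */
    { CCW_CMD_SENSE, 0,           senseBuf,  sizeof(senseBuf)  }, /* then fetch status */
};

sio(deviceID, &channelProgram);  /* kick it off; completion arrives as an interrupt */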

Properties of Channel I/O include:

- It's message oriented. Commands are often sent to the device via
the upper bits of CCW.command if they're simple enough; if not,
command/sense CCWs are used to pass messages between the mainframe and
the peripheral a la how Plan 9 uses control files as out-of-band means
of talking to filesystem drivers, or how Linux might use ioctl().
Read/write commands *NEVER* frame their data (unless such framing is
deliberately part of the payload to be transferred), because doing so
invalidates their use in scatter/gather DMA scenarios.

- They're completely asynchronous. Of the thousands of instructions
the S390 and z/Architecture have today, only four are dedicated to I/O
(or, at least, are used with such regularity that this statement is
true in principle, if not in fact). All I/O happens in the
background, and is fully asynchronous from even the supervisor of the
OS. The four instructions amount to "Start I/O", "Start I/O on a
subchannel", "Test if I/O is possible", and "Cancel pending I/O".
Modern mainframes can actually support between 320 and 640 concurrently
operating channel programs.

- It's *strictly* master-slave in relationship. While individual
channel programs might remind you of QDs and TDs in an Intel UHCI
driver stack, there is no equivalent concept to a frame counter or
other means of supporting isochronous data transfer.

- When a channel program terminates (normally or abnormally), an IRQ
is generated which can be trapped by the user's program somehow
(typically in the form of an asynchronous callback).

Relationship to virtio:

Virtio is a message-oriented device abstraction mechanism, where
*logically*, you arrange your command like so:

[command identifying what to do][any output buffers go here][any input
buffers go here]

This entire "string" of bytes comprising the above can come from a
single buffer, or it can come from a plurality of buffers queued in
the appropriate order. Once a virtio request is configured, the
operation is "kicked" into action. When the operation completes, the
client is notified through an asynchronous callback.

The act of configuring a virtio channel is equivalent to synthesizing
CCW programs. The act of "kicking" a virtio channel into action is
equivalent to dispatching that program with an SIO instruction. The
*precise* semantics don't line up perfectly of course, but the I/O
models are close enough that s390 support was one of the first
supported targets for virtio. It should be pointed out that no MMIO
buffers exist in channel I/O; the concept doesn't apply.
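
To make the "[command][output buffers][input buffers]" shape concrete, a single virtio-blk read is typically handed to the device as a chain of three descriptors. A sketch follows -- the request header matches the virtio spec's block device section, while struct virtq, add_desc() and kick() are hypothetical stand-ins for a real driver's queue handling:

#include <stdint.h>
#include <stdbool.h>

struct virtio_blk_req {          /* request header, per the virtio spec */
    uint32_t type;               /* 0 = read (VIRTIO_BLK_T_IN), 1 = write */
    uint32_t reserved;
    uint64_t sector;
};

struct virtq;                    /* opaque queue handle (hypothetical) */
void add_desc(struct virtq *q, const void *addr, uint32_t len, bool device_writes);
void kick(struct virtq *q);      /* both are hypothetical helpers */

static void submit_read(struct virtq *q)
{
    static struct virtio_blk_req req = { .type = 0, .sector = 1234 };
    static uint8_t data[512];    /* the device fills this in */
    static uint8_t status;       /* the device writes one status byte here */

    add_desc(q, &req,    sizeof(req),    false);  /* "command identifying what to do" */
    add_desc(q, data,    sizeof(data),   true);   /* buffer for the device to fill */
    add_desc(q, &status, sizeof(status), true);   /* completion status comes back */
    kick(q);                                      /* equivalent of dispatching via SIO */
}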

Michael Clark

Jan 3, 2017, 6:08:29 PM
to Samuel Falvo II, ron minnich, RISC-V HW Dev
Thanks for the explanation.

I am thinking about the virtualised case where a circular buffer is shared between two VMs and there is no DMA to arbitrary addresses (which would constitute a security risk), just a shared mapping between VMs -- or the Intel DPDK case, where the network card “ring buffer” is mapped into userspace. If you look at the fast paths, they go via a shared mapping which needs to be set up in advance. This is somewhat like pages that are mapped into two protection domains. It would be more expensive to change page tables and call sfence.vm, so that one VM can “directly” access memory in another VM, than to program a DMA transfer. So in the virtualised case, high-speed transports are actually achieved via a static shared mapping (AFAICT).


Michael Clark

Jan 3, 2017, 6:15:46 PM
to Samuel Falvo II, ron minnich, RISC-V HW Dev
In essence, avoiding a TLB flush every time you want to send a packet between colocated VMs, or a VM and domain 0.

Michael Clark

Jan 3, 2017, 6:25:42 PM
to Samuel Falvo II, ron minnich, RISC-V HW Dev
It is my belief that the highest traffic rates on Linux have been achieved with a static circular buffer mapped into userspace and a userspace TCP stack (or a unikernel approach like OSv, where the app is linked to the kernel). I guess in this case the NIC DMA is programmed statically for one section of RAM per core (multiqueue), with one thread per queue, and hopefully TCP queue affinity in hardware (to avoid ping-pong of the TLB across cores). There is no servo-loop as there is with the Berkeley sockets API, e.g. user <-> kernel <-> device. So the arbitrary DMA is set up to point to a single static memory window and no TLB flushes are required. Correct me if I am wrong.

In a virtual environment DMA is memcpy as the hardware is another CPU, so one has to reduce TLB flushes.

Stefan O'Rear

Jan 4, 2017, 2:05:13 AM
to ron minnich, RISC-V HW Dev
On Mon, Jan 2, 2017 at 8:58 AM, ron minnich <rmin...@gmail.com> wrote:
> I can't remember if I asked this before, of so, sorry.
>
> I've found virtio to be a pretty reasonable model for IO devices, especially
> as compared to things like AHCI. One particularly nice thing is that the
> addresses used for DMA are in a well defined structure, so a kernel can
> exercise reasonable control for well-behaving bits of hardware on those
> systems not having an IOMMU, where at least part of the problem is figuring
> out what addresses are used for DMA -- you just about have to write a second
> driver to work this out.

Channel designs in general address the "one time transfer of data"
case, but not the "long-lived capabilities" case which primarily
matters for accelerators. If you want an attached program-executing
accelerator to be able to access memory with the precise authority of
a single user process, you need some kind of IOMMU which can walk
CPU-format page tables. From the documentation I've seen on the
Intel-ish IOMMUs, I don't think they can be significantly improved for
the "accelerator trusted by some processes but not system-wide" use
case, although the number of hoops is ... high ... for many simpler
use cases.

Virtio also does not support grants of the form "read this block of
physical memory repeatedly until reconfigured", which are relevant for
simple video interfaces (without a device-side framebuffer).

(Is there a single place for information on post-1.0 virtio things
like "timer" and "gpu"?)



This is pretty similar to the z/Architecture channel design as
discussed by Samuel. Since the DMA list for a request is not allowed
to be self-modifying you can do validation and translation ahead of
time.


I've mentioned previously, I think, that using MMIO in a hypervisored
system adds a significant amount of entirely avoidable complexity,
insofar as you need a privileged interpreter that can decode and
execute all valid memory instructions (which is a particular problem
if your system has nonstandard instructions that the hypervisor
doesn't know about). I'd rather have a 100% ECALL+DMA design
available at some point, which seems to require a new virtio
transport. Maybe we could define a non-MMIO interface to PCI busses
for use with virtio? Rather not reinvent the whole world.

> I'm wondering if we could at least encourage people when they create new
> hardware for RISCV to make their devices implement virtio as the interface?

It's not entirely clear to me that RISC-V is the best forum for this
fight. Any plan would have to start from a clear assessment of our
leverage in different markets. Intelligent peripherals with
standardized interfaces are an evolving area right now.

> Secondly, what rv64 chip company is doing PCI? Anyone?

You probably saw the SiFive FPGA board running DOOM at the demo
session in November. The graphics, USB HID, and block device for that
were attached using the Xilinx PCIe AXI bridge. I'm not sure who if
anyone is working on "risc-v native" PCIe interfaces.

-s

Reinoud Zandijk

Jan 4, 2017, 9:14:16 AM
to Stefan O'Rear, ron minnich, RISC-V HW Dev, lowRISC
Hi,

I followed this discussion with interest since I'm in the process of designing
an extension to RISC-V to allow for easier implementation of hypervisors.

On Tue, Jan 03, 2017 at 11:05:10PM -0800, Stefan O'Rear wrote:
> On Mon, Jan 2, 2017 at 8:58 AM, ron minnich <rmin...@gmail.com> wrote:
> Channel designs in general address the "one time transfer of data" case, but
> not the "long-lived capabilities" case which primarily matters for
> accelerators. If you want an attached program-executing accelerator to be
> able to access memory with the precise authority of a single user process,
> you need some kind of IOMMU which can walk CPU-format page tables. From the
> documentation I've seen on the Intel-ish IOMMUs, I don't think they can be
> significantly improved for the "accelerator trusted by some processes but
> not system-wide" use case, although the number of hoops is ... high ... for
> many simpler use cases.
>
> Virtio also does not support grants of the form "read this block of physical
> memory repeatedly until reconfigured", which are relevant for simple video
> interfaces (without a device-side framebuffer).

In short, my current solution is a two-step system with a memory segment
bitmap for each domain and a secondary page table that is queried on the
resulting physical address when the memory segmentator signals that the
current domain is not allowed to access that physical memory. This secondary
page table, together with the memory segmentation bitmap, is maintained
exclusively by the hypervisor. It allows each domain to have fully independent
page mappings that are prevented from accessing anything outside their
allocated memory, except for the exceptions in the secondary page table, and
then only for reads/writes of data.
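
A minimal sketch of the two-step check, just to make the idea concrete (the names, segment size, and secondary_pt_allows() below are invented for illustration, not the actual design):

#include <stdint.h>
#include <stdbool.h>

#define SEG_SHIFT 22                     /* e.g. 4 MiB physical segments */

extern const uint8_t seg_bitmap[];       /* per-domain, hypervisor-maintained */
bool secondary_pt_allows(uint64_t paddr, bool is_write);   /* hypervisor-owned */

/* Applied to the physical address produced by the domain's own page tables. */
static bool phys_access_allowed(uint64_t paddr, bool is_write)
{
    uint64_t seg = paddr >> SEG_SHIFT;
    if (seg_bitmap[seg >> 3] & (1u << (seg & 7)))
        return true;                     /* segment belongs to this domain */
    /* Not ours: the secondary page table may still grant a read/write
     * exception, e.g. for queue pages shared with another domain. */
    return secondary_pt_allows(paddr, is_write);
}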

The protocol used on these shared memory spaces is free to choose. A polling
one for, say, a framebuffer is fine, but a queue of linked lists or virtio
structures is also OK. Just take note of the memory ordering semantics. As a
test I've implemented a message-based two-way circular queue for
memory/IO/config, a two-way console IO with circular queues, and a (dumb)
single-issue block interface. As extra instrumentation I've added a signaling
function to notify for processing. It now runs PV6 fine in a multi-CPU setup;
writing `drivers' for these was trivial.

Reinoud

Theo Markettos

Jan 6, 2017, 4:24:15 PM
to hw-...@groups.riscv.org
In article <CAP6exYKp0U_0pmDpBS5S5SseJsRubrnH1Pyds6xFTXY=_i=0...@mail.gmail.com> you wrote:
> I've found virtio to be a pretty reasonable model for IO devices,
> especially as compared to things like AHCI. One particularly nice thing is
> that the addresses used for DMA are in a well defined structure, so a
> kernel can exercise reasonable control for well-behaving bits of hardware
> on those systems not having an IOMMU, where at least part of the problem is
> figuring out what addresses are used for DMA -- you just about have to
> write a second driver to work this out.
>
> I'm wondering if we could at least encourage people when they create new
> hardware for RISCV to make their devices implement virtio as the interface?

VirtIO is outwardly seductive as a model for devices, but it fails for a
number of reasons. One of our summer students (Jamie Wood) tried to
implement it as a way of bridging between FPGA and ARM on SoC FPGAs (in our
case, Altera Cyclone V SoC). It's necessary to understand it in its
original context of a bridge between a VM hypervisor 'host' and a VM 'guest'
OS.

The first is that memory is owned by the 'guest', and it can be wherever the
guest OS wants to put it. That means the device has to go and DMA it from
wherever that might be. If you have any non-uniformity of the memory (e.g.
some memory isn't accessible from the device - e.g. the FPGA can't read ARM
memory) then this is problematic. You can't tell the guest to use memory on
the device, for instance, or below 4GB, or whatever it might be. That can
only be specified by hacking the driver.

Next is that some device models aren't reversible. If you have a network
connection between a host and a guest, it doesn't matter which way round it
is: a network link from A to B looks just the same as one from B to A. But
if it's a disc model, the 'guest' is the master and the 'host' is the
storage. If you want the storage to live on the guest you're out of luck.
(Particularly an issue in our case where it would be nice to have the FPGA
be the storage and 'dd' data in and out)

The trickiest bit is that virtio writes its data into the ring
buffer structure and then hits a register to indicate it's ready. As this
is a write into the hypervisor, the hypervisor is expected to deal with
trapping the IO write, pausing the guest, dealing with the operation and
reorganising the buffers. While he built hardware to implement the MMIO
notification structure, the idea that things can happen underneath the guest
doesn't work out when it's real hardware and not a VM.
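
(For reference, the "hits a register" step in the virtio-mmio transport is a single 32-bit store of the queue index to the QueueNotify register; a sketch, using the register offset from the virtio-mmio spec:)

#include <stdint.h>

#define VIRTIO_MMIO_QUEUE_NOTIFY 0x050   /* offset per the virtio-mmio spec */

/* base points at the device's (ioremapped or trapped) MMIO window. */
static inline void virtio_mmio_kick(volatile uint8_t *base, uint16_t queue)
{
    /* Under a hypervisor this store traps, and the guest may be paused while
     * the request is serviced; real hardware would instead have to turn it
     * into a doorbell inside the device. */
    *(volatile uint32_t *)(base + VIRTIO_MMIO_QUEUE_NOTIFY) = queue;
}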

Basically, the distribution of powers between the guest and the hypervisor
is all wrong for a hardware implementation. That's not to say this couldn't
be fixed in a new virtio spec, but you can't use what exists
off-the-shelf.

Theo

ron minnich

Jan 6, 2017, 4:39:22 PM
to Theo Markettos, hw-...@groups.riscv.org
On Fri, Jan 6, 2017 at 1:24 PM Theo Markettos <theom...@chiark.greenend.org.uk> wrote:


The trickiest bit is that virtio writes its data into the ring
buffer structure and then hits a register to indicate it's ready.  As this
is a write into the hypervisor, the hypervisor is expected to deal with
trapping the IO write, pausing the guest, dealing with the operation and
reorganising the buffers.  

Yes, this is the thing about virtio I dislike the most. Note it doesn't have to be that way, and in Akaros we're implementing a new virtio that doesn't require this guest pausing, as it's a major performance issue.

But thanks for the note, I guess it won't work and that's unfortunate.

ron 

Reinoud Zandijk

Jan 7, 2017, 8:03:16 AM
to Theo Markettos, hw-...@groups.riscv.org
On Fri, Jan 06, 2017 at 09:24:13PM +0000, Theo Markettos wrote:
> In article
> <CAP6exYKp0U_0pmDpBS5S5SseJsRubrnH1Pyds6xFTXY=_i=0...@mail.gmail.com> you
> wrote: The first is that memory is owned by the 'guest', and it can be
> wherever the guest OS wants to put it. That means the device has to go and
> DMA it from wherever that might be. If you have any non-uniformity of the
> memory (eg some memory isn't accessible from the device - eg FPGA can't read
> ARM memory) then this is problematic. You can't tell the guest to use
> memory on the device, for instance, or below 4GB, or whatever it might be.
> That can only be specified by hacking the driver.

That is indeed one of my main issues with virtio in its current state. Now if
the device would provide the buffer config to the driver...

> Next is that some device models aren't reversible. If you have a network
> connection between a host and a guest, it doesn't matter which way round it
> is: a network link from A to B looks just the same as one from B to A. But
> if it's a disc model, the 'guest' is the master and the 'host' is the
> storage. If you want the storage to live on the guest you're out of luck.
> (Particularly an issue in our case where it would be nice to have the FPGA
> be the storage and 'dd' data in and out)

Yep, that's a shame indeed. It shows that virtio was defined as an
acceleration for emulation :-/

> The trickiest bit is that virtio writes its data into the ring buffer
> structure and then hits a register to indicate it's ready. As this is a
> write into the hypervisor, the hypervisor is expected to deal with trapping
> the IO write, pausing the guest, dealing with the operation and reorganising
> the buffers. While he built hardware to implement the MMIO notification
> structure, the idea that things can happen underneath the guest doesn't work
> out when it's real hardware and not a VM.

Not quite sure what you mean by `underneath the guest' in this context, but
if you mean that it might take time and the guest has to explicitly wait for
it to be finished, then yes. I `fixed' this by not implementing the interrupts
at first and letting all responses be triggered on a timer. That automagically
shows all the assumptions about emulation :) Also, don't forget the `volatiles'! :)

With regards,
Reinoud

Reinoud Zandijk

Jan 7, 2017, 8:04:27 AM
to ron minnich, Theo Markettos, hw-...@groups.riscv.org
Hi Ron,

On Fri, Jan 06, 2017 at 09:39:09PM +0000, ron minnich wrote:
> > The trickiest bit is that virtio writes its data into the ring buffer
> > structure and then hits a register to indicate it's ready. As this is a
> > write into the hypervisor, the hypervisor is expected to deal with
> > trapping the IO write, pausing the guest, dealing with the operation and
> > reorganising the buffers.
> >
> yes, this is the thing about virtio I dislike the most. Note it doesn't have
> to be that way, and in akaros, we're implementing a new virtio that doesn't
> require this guest pausing, as it's a major performance issue.

Do you have a pointer to your work? It would be nice to compare it with what
I've come up with so far.

With regards,
Reinoud

ron minnich

Jan 7, 2017, 12:26:20 PM
to Reinoud Zandijk, Theo Markettos, hw-...@groups.riscv.org
I'll try to get that to you when we have it. We had a short distraction while the person doing the virtio work got SMP going for Akaros VM guests.