x86 IOMMU support (DMAR)


Konstantin Belousov

May 27, 2013, 6:58:44 AM
to ar...@freebsd.org, am...@freebsd.org
For the past several months I have worked (and continue to work) on a
driver for Intel VT-d for FreeBSD. VT-d is sold as an I/O
virtualization technology, but in essence it is a DMA address
remapping engine, i.e. an advanced and improved I/O MMU, as also
found on other big-iron machines, e.g. PowerPC or SPARC. See the
Intel document titled 'Intel Virtualization Technology for Directed
I/O Architecture Specification' and the chipset datasheets for a
description of the facility.

The development was greatly facilitated by Jim Harris from Intel, who
provided me access to the Sandy Bridge and Ivy Bridge north bridge
documentation. John Baldwin patiently educated me about newbus and
helped develop the hooks required for integration with the existing
code.

The core hardware element of VT-d is the DMA remap unit, referred to
as DMAR both in the documentation and in the source code. Besides DMA
remapping, VT-d also allows remapping of MSI/MSI-X interrupt
messages. FreeBSD could utilize this functionality for interrupt
rebalancing, instead of reprogramming the MSI registers of the PCI
devices, but this part is not (yet) implemented.

For the FreeBSD architecture, DMAR naturally fits in as a busdma
engine, making it possible to eliminate bounce page copying. Another
great benefit of using DMAR is the reliability and security
improvement, since DMA transfers are only allowed to memory areas
explicitly designated by the device driver as buffers. As noted by
Jim Harris, this security angle could find a use in the NTB driver.

The existing busdma code for x86 was split into a generic interface,
kept in busdma_machdep.c, and the bouncing implementation in
busdma_bounce.c. The DMAR-based implementation, which calls into the
DMAR core, is located in busdma_dmar.c. There is no KPI provided to
manage DMARs yet, but I plan to implement a proper interface after
discussing the needs of bhyve.

I tried to support both i386 and amd64, but on i386 the limited KVA,
together with the busdma interface requirement of never sleeping in
driver calls, makes some promises of the IOMMU less strict. For
instance, to unload a map, the code needs to transiently map the DMAR
page table pages, which requires sleepable allocation of sf buffers.
As a result, map unload on i386 is done asynchronously in taskqueue
context, which makes it possible for a buggy device driver or
hardware to perform transfers to freed pages for some time after
unload. This problem is not present in the amd64 port. For the same
busdma KPI reason, I cannot use queued invalidation on either i386
or amd64.

At the moment the code makes the relation between device contexts and
domains 1:1, which is fine for busdma. To support PCI pass-through
into virtual machines, the relation should be changed to N:1 contexts
to domains; this is planned but not yet done.

The overall state of the code is that I can boot multiuser over the
network from if_igb(4) or if_bce(4), and can use ahci(4)- and
ata(4)-attached disks without corrupting UFS volumes. uhci(4) has
known issues due to too-late establishment of the RMRR mappings.
Extensive testing of the already written code has not been done yet.
Plans include:
- providing an external KPI for VMM consumers
- supporting ATS
- making it possible to select busdma_dmar or busdma_bounce for
individual PCI functions
- stabilization work.
Also, by converting the ISA DMA implementation to use the busdma KPI,
it should be possible to make floppies work reliably again!

It is known that an IOMMU adds overhead due to the mapping and
unmapping done for each I/O. DMAR implementations usually have some
errata, and PCIe devices sometimes do not completely follow the
specification, causing misbehaviour with remapping enabled. For this
reason I do not plan to enable the IOMMU by default, and intend to
provide the possibility of routing individual PCI devices to the
bounce busdma implementation.

http://people.freebsd.org/~kib/misc/DMAR.1.patch

Jeremie Le Hen

May 27, 2013, 8:27:00 AM
to Konstantin Belousov, am...@freebsd.org, ar...@freebsd.org
Hi kib,

On Mon, May 27, 2013 at 01:58:44PM +0300, Konstantin Belousov wrote:
> For the past several months I have worked (and continue to work) on a
> driver for Intel VT-d for FreeBSD. VT-d is sold as an I/O
> virtualization technology, but in essence it is a DMA address
> remapping engine, i.e. an advanced and improved I/O MMU, as also
> found on other big-iron machines, e.g. PowerPC or SPARC. See the
> Intel document titled 'Intel Virtualization Technology for Directed
> I/O Architecture Specification' and the chipset datasheets for a
> description of the facility.
>
> [...]
>
> http://people.freebsd.org/~kib/misc/DMAR.1.patch

Which CPU flag is needed to be able to test this?

My -CURRENT machine has:

CPU: Intel(R) Core(TM)2 CPU 6320 @ 1.86GHz (1869.90-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x6f6 Family = 0x6 Model = 0xf Stepping = 6
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
AMD Features=0x20100800<SYSCALL,NX,LM>
AMD Features2=0x1<LAHF>

--
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.
_______________________________________________
freebs...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch...@freebsd.org"

Konstantin Belousov

May 27, 2013, 12:15:05 PM
to ar...@freebsd.org, am...@freebsd.org
On Mon, May 27, 2013 at 02:27:00PM +0200, Jeremie Le Hen wrote:
> Hi kib,
>
> On Mon, May 27, 2013 at 01:58:44PM +0300, Konstantin Belousov wrote:
> > For the past several months I have worked (and continue to work) on a
> > driver for Intel VT-d for FreeBSD. VT-d is sold as an I/O
> > virtualization technology, but in essence it is a DMA address
> > remapping engine, i.e. an advanced and improved I/O MMU, as also
> > found on other big-iron machines, e.g. PowerPC or SPARC. See the
> > Intel document titled 'Intel Virtualization Technology for Directed
> > I/O Architecture Specification' and the chipset datasheets for a
> > description of the facility.
> >
> > [...]
> >
> > http://people.freebsd.org/~kib/misc/DMAR.1.patch
>
> Which CPU flag is needed to be able to test this?
>
> My -CURRENT machine has:
>
> CPU: Intel(R) Core(TM)2 CPU 6320 @ 1.86GHz (1869.90-MHz K8-class CPU)
> Origin = "GenuineIntel" Id = 0x6f6 Family = 0x6 Model = 0xf Stepping = 6
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
> AMD Features=0x20100800<SYSCALL,NX,LM>
> AMD Features2=0x1<LAHF>

The feature is announced by an ACPI table, not a CPU flag. Use a
recent HEAD and run acpidump -t | grep DMAR. If the DMAR table is
present, you have VT-d enabled.
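As a small sketch, that check can be scripted; the acpidump invocation is the one from above, the wrapper around it is just illustrative:

```shell
# Look for the ACPI DMAR table; its presence means the firmware has
# enabled and described VT-d (needs a recent HEAD for acpidump -t).
if acpidump -t | grep -q DMAR; then
	echo "DMAR table found: VT-d is available"
else
	echo "no DMAR table: VT-d absent or disabled in the BIOS"
fi
```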

Presence of VT-d is determined by:
- the north bridge. In other words, for older machines with Core2 and
earlier CPUs, the north bridge chip of the chipset must support VT-d;
for newer Core iX CPUs, the north bridge is inside the CPU.
- the motherboard manufacturer, by way of the BIOS properly
configuring the DMARs and filling in the right table. Some
motherboard vendors offer a knob in the BIOS setup which enables
VT-d.

One note: the KMS-enabled Intel GPU driver is completely incompatible
with VT-d right now. Besides the fact that the GPU MMU does not make
the calls to configure the remapper, there are many errata regarding
the interaction between the IOMMU and the GPU, for all generations of
chipsets except possibly Ivy Bridge.