DMAR and DRHD errors[DMAR:[fault reason 06] PTE Read access is not set] Vt-d & intel_iommu


Jason Gao

Dec 13, 2012, 5:00:02 AM
Dear List:

Description of problem:
After installing CentOS 6.3 (RHEL 6.3) on my Dell R710 server (latest
BIOS: version 6.3.0, release date 07/24/2012) and updating to the
latest kernel, "2.6.32-279.14.1.el6.x86_64", I want to use the SR-IOV
feature of the Intel 82576 ET dual-port NIC to assign VFs to KVM guests.

I appended the kernel boot parameter intel_iommu=on, and after booting
I see the following messages:

Dec 13 16:58:15 2 kernel: DRHD: handling fault status reg 2
Dec 13 16:58:15 2 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe65000
Dec 13 16:58:15 2 kernel: DMAR:[fault reason 06] PTE Read access is not set
Dec 13 16:58:15 2 kernel: DRHD: handling fault status reg 102
Dec 13 16:58:15 2 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe8a000
Dec 13 16:58:15 2 kernel: DMAR:[fault reason 06] PTE Read access is not set
Dec 13 16:58:15 2 kernel: scsi 0:0:32:0: Enclosure DP BACKPLANE 1.07 PQ: 0 ANSI: 5
Dec 13 16:58:15 2 kernel: DRHD: handling fault status reg 202
Dec 13 16:58:15 2 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe89000
Dec 13 16:58:15 2 kernel: DMAR:[fault reason 06] PTE Read access is not set

full dmesg detail:
http://pastebin.com/BzFQV0jU
lspci -vvv full detail:
http://pastebin.com/9rP2d1br


This is a production server, and I'm not sure whether this is a critical
problem or how to fix it. Any help would be greatly appreciated.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Alex Williamson

Dec 13, 2012, 11:30:01 AM
Device 03:00.0 is your raid controller:

03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

For some reason it's trying to read from ffe65000, ffe8a000, ffe89000,
ffe86000, ffe87000, ffe84000. Those are in reserved memory regions, so
it's not reading an OS allocated buffer, which probably means it's some
kind of side-band communication with a management controller. I'd guess
it's a BIOS bug and there should be an RMRR covering those accesses.
Thanks,

Alex
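
The diagnosis above starts from the requester ID and the fault addresses in the DMAR log lines. As a rough illustration only (not a tool used in this thread), the same extraction can be sketched in Python. The function name `parse_dmar_faults` and the regular expression are assumptions based solely on the log format quoted in the original report; the parser collapses whitespace so that a fault address wrapped onto the following line still matches.

```python
import re
from collections import defaultdict

# Assumed pattern, derived only from the log lines quoted in this thread, e.g.
# "DMAR:[DMA Read] Request device [03:00.0] fault addr ffe65000"
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\]\s+"
    r"Request device \[(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\]\s+"
    r"fault addr (?P<addr>[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_dmar_faults(log_text):
    """Return {device: [(operation, fault_address), ...]} from kernel log text."""
    # Join everything with single spaces so line-wrapped entries still match.
    flat = " ".join(log_text.split())
    faults = defaultdict(list)
    for match in FAULT_RE.finditer(flat):
        address = int(match.group("addr"), 16)
        faults[match.group("dev")].append((match.group("op"), address))
    return dict(faults)


if __name__ == "__main__":
    sample = (
        "Dec 13 16:58:15 2 kernel: DMAR:[DMA Read] Request device [03:00.0]\n"
        "fault addr ffe65000\n"
        "Dec 13 16:58:15 2 kernel: DMAR:[fault reason 06] PTE Read access is not set\n"
    )
    for device, entries in parse_dmar_faults(sample).items():
        for operation, address in entries:
            print(f"{device}: DMA {operation} fault at {address:#x}")
```

The device string it reports (here 03:00.0) is what you would then look up with `lspci -s` to identify the faulting device.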

Jason Gao

Dec 13, 2012, 9:10:01 PM
On Fri, Dec 14, 2012 at 12:23 AM, Alex Williamson
<alex.wi...@redhat.com> wrote:
>
> Device 03:00.0 is your raid controller:
>
> 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
>
> For some reason it's trying to read from ffe65000, ffe8a000, ffe89000,
> ffe86000, ffe87000, ffe84000. Those are in reserved memory regions, so
> it's not reading an OS allocated buffer, which probably means it's some
> kind of side-band communication with a management controller. I'd guess
> it's a BIOS bug and there should be an RMRR covering those accesses.
> Thanks,

First of all, I want to know whether I can ignore these errors on the
production server, and whether they may affect the system.

By the way, when I removed "intel_iommu=on" from /etc/grub.conf, no
DMAR-related errors occurred.

Strangely, three other Dell R710 servers with the same BIOS version
(6.3.0) and the same kernel (2.6.32-279.14.1) on RHEL 6.3 (CentOS 6.3)
do not show these errors.

Does anyone have any idea about this?

thanks

Alex Williamson

Dec 13, 2012, 11:50:01 PM
On Fri, 2012-12-14 at 10:01 +0800, Jason Gao wrote:
> On Fri, Dec 14, 2012 at 12:23 AM, Alex Williamson
> <alex.wi...@redhat.com> wrote:
> >
> > Device 03:00.0 is your raid controller:
> >
> > 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
> >
> > For some reason it's trying to read from ffe65000, ffe8a000, ffe89000,
> > ffe86000, ffe87000, ffe84000. Those are in reserved memory regions, so
> > it's not reading an OS allocated buffer, which probably means it's some
> > kind of side-band communication with a management controller. I'd guess
> > it's a BIOS bug and there should be an RMRR covering those accesses.
> > Thanks,
>
> First of all, I want to know whether I can ignore these errors on the
> production server, and whether they may affect the system.

You'll have to make that call. The device is being blocked from reading
a memory address, and we don't know what it's reading or why.

> By the way, when I removed "intel_iommu=on" from /etc/grub.conf, no
> DMAR-related errors occurred.

Of course. One option is to use the IOMMU in passthrough mode, which
gives devices used by the host unrestricted, identity-mapped access to
system memory while still allowing PCI device assignment. I wouldn't try
assigning device 03:00.0, though. Add iommu=pt to the kernel command
line to enable this.

> Strangely, three other Dell R710 servers with the same BIOS version
> (6.3.0) and the same kernel (2.6.32-279.14.1) on RHEL 6.3 (CentOS 6.3)
> do not show these errors.

Is the MegaRAID firmware and system management firmware the same as
well? Thanks,

Alex
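
To make the iommu=pt suggestion in this reply concrete, here is a small Python sketch that adds boot parameters to a kernel command line without duplicating them. It is only an illustration: the helper name `add_kernel_params` and the sample command line (including the root device) are hypothetical, and on the system discussed here the change would be made by hand on the kernel line in /etc/grub.conf.

```python
def add_kernel_params(cmdline, params):
    """Return cmdline with each parameter in params set exactly once.

    An existing setting with the same key (the text before '=') is replaced.
    Note: this is too simple for parameters that may legitimately repeat,
    such as multiple console= entries.
    """
    tokens = cmdline.split()
    for param in params:
        key = param.split("=", 1)[0]
        # Drop any earlier value for this key, then append the new one.
        tokens = [t for t in tokens if t.split("=", 1)[0] != key]
        tokens.append(param)
    return " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical kernel line; only intel_iommu=on comes from the thread.
    current = "ro root=/dev/mapper/vg-root rhgb quiet intel_iommu=on"
    print(add_kernel_params(current, ["intel_iommu=on", "iommu=pt"]))
```

Matching on the key rather than the whole token keeps `intel_iommu=on` and `iommu=pt` distinct, since they are separate parameters.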

Jason Gao

Dec 14, 2012, 2:00:01 AM
On Fri, Dec 14, 2012 at 12:45 PM, Alex Williamson
<alex.wi...@redhat.com> wrote:
> Is the MegaRAID firmware and system management firmware the same as
> well? Thanks.

I've updated all the firmware using Dell's firmware-tools:

# inventory_firmware
Wait while we inventory system:
System inventory:
BIOS = 6.3.0
SAS/SATA Backplane 0:0 Backplane Firmware = 1.07
PERC 6/i Integrated Controller 0 Firmware = 6.3.1-0003
Dell OS Drivers Pack, v.6.5.3, A00 = 6.5.3
Dell Lifecycle Controller = 1.5.5.27
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth1) = 7.2.20
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth0) = 7.2.20
ST3600057SS Firmware = es66
iDRAC6 = 1.92
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth2) = 7.2.20
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth3) = 7.2.20
Dell 32 Bit Diagnostics, v.5154A0, 5154.1 = 5154a0
System BIOS for PowerEdge R710 = 6.3.0

Thanks

Jason Gao

Dec 14, 2012, 2:10:01 AM
On Fri, Dec 14, 2012 at 2:56 PM, Jason Gao <pkill...@gmail.com> wrote:
> On Fri, Dec 14, 2012 at 12:45 PM, Alex Williamson
> <alex.wi...@redhat.com> wrote:
>> Is the MegaRAID firmware and system management firmware the same as
>> well? Thanks.
>
> I've updated all the firmware using Dell's firmware-tools:
>
> # inventory_firmware
> Wait while we inventory system:
> System inventory:
> BIOS = 6.3.0
> SAS/SATA Backplane 0:0 Backplane Firmware = 1.07
> PERC 6/i Integrated Controller 0 Firmware = 6.3.1-0003
> Dell OS Drivers Pack, v.6.5.3, A00 = 6.5.3
> Dell Lifecycle Controller = 1.5.5.27
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth1) = 7.2.20
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth0) = 7.2.20
> ST3600057SS Firmware = es66
> iDRAC6 = 1.92
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth2) = 7.2.20
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth3) = 7.2.20
> Dell 32 Bit Diagnostics, v.5154A0, 5154.1 = 5154a0
> System BIOS for PowerEdge R710 = 6.3.0
>
> Thanks

#lspci -vvvv -s 03:00.0|grep fail:
pcilib: sysfs_read_vpd: read failed: Connection timed out

#strace lspci -vvvv -s 03:00.0|grep fail:
....
open("/sys/bus/pci/devices/0000:03:00.0/vpd", O_RDONLY) = 4
pread(4, 0x7fff30670b3f, 1, 0) = -1 ETIMEDOUT (Connection timed out)
write(2, "pcilib: ", 8pcilib: ) = 8
write(2, "sysfs_read_vpd: read failed: Con"..., 49sysfs_read_vpd: read failed: Connection timed out) = 49
write(2, "\n", 1
....

Don Dutile

Dec 14, 2012, 4:40:03 PM
The DMAR table does not have an entry mapping this device to this region.
When the driver reconfigures/resets the device to stop polling the
BIOS-boot command rings and use the new OS (DMA-mapped) rings, there is a
period during the transition in which the hardware is still babbling away
to an area that is no longer mapped.

Don Dutile

Dec 14, 2012, 5:00:02 PM
On 12/13/2012 09:01 PM, Jason Gao wrote:
> On Fri, Dec 14, 2012 at 12:23 AM, Alex Williamson
> <alex.wi...@redhat.com> wrote:
>>
>> Device 03:00.0 is your raid controller:
>>
>> 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
>>
>> For some reason it's trying to read from ffe65000, ffe8a000, ffe89000,
>> ffe86000, ffe87000, ffe84000. Those are in reserved memory regions, so
>> it's not reading an OS allocated buffer, which probably means it's some
>> kind of side-band communication with a management controller. I'd guess
>> it's a BIOS bug and there should be an RMRR covering those accesses.
>> Thanks,
>
> First of all, I want to know whether I can ignore these errors on the
> production server, and whether they may affect the system.
>
> By the way, when I removed "intel_iommu=on" from /etc/grub.conf, no
> DMAR-related errors occurred.
>
well, if you don't enable the IOMMU, then it won't have IOMMU faults! ;-)

> Strangely, three other Dell R710 servers with the same BIOS version
> (6.3.0) and the same kernel (2.6.32-279.14.1) on RHEL 6.3 (CentOS 6.3)
> do not show these errors.
>
mptsas or smi fw has to be different....

> Does anyone have any idea about this?
>
> thanks

Don Dutile

Dec 14, 2012, 5:00:02 PM
On 12/13/2012 09:01 PM, Jason Gao wrote:
> On Fri, Dec 14, 2012 at 12:23 AM, Alex Williamson
> <alex.wi...@redhat.com> wrote:
>>
>> Device 03:00.0 is your raid controller:
>>
>> 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
>>
>> For some reason it's trying to read from ffe65000, ffe8a000, ffe89000,
>> ffe86000, ffe87000, ffe84000. Those are in reserved memory regions, so
>> it's not reading an OS allocated buffer, which probably means it's some
>> kind of side-band communication with a management controller. I'd guess
>> it's a BIOS bug and there should be an RMRR covering those accesses.
>> Thanks,
>
> First of all, I want to know whether I can ignore these errors on the
> production server, and whether they may affect the system.
>
> By the way, when I removed "intel_iommu=on" from /etc/grub.conf, no
> DMAR-related errors occurred.
>
> Strangely, three other Dell R710 servers with the same BIOS version
> (6.3.0) and the same kernel (2.6.32-279.14.1) on RHEL 6.3 (CentOS 6.3)
> do not show these errors.
>
forgot: did you check that all the bios settings are the same btwn
the 710 systems?

> Does anyone have any idea about this?
>
> thanks

Jason Gao

Dec 15, 2012, 3:20:01 AM
On Sat, Dec 15, 2012 at 5:54 AM, Don Dutile <ddu...@redhat.com> wrote:
> mptsas or smi fw has to be different....
this server:
# inventory_firmware
Wait while we inventory system:
System inventory:
BIOS = 6.3.0
SAS/SATA Backplane 0:0 Backplane Firmware = 1.07
PERC 6/i Integrated Controller 0 Firmware = 6.3.1-0003
Dell OS Drivers Pack, v.6.5.3, A00 = 6.5.3
Dell Lifecycle Controller = 1.5.5.27
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth1) = 7.2.20
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth0) = 7.2.20
ST3600057SS Firmware = es66
iDRAC6 = 1.92
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth2) = 7.2.20
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth3) = 7.2.20
Dell 32 Bit Diagnostics, v.5154A0, 5154.1 = 5154a0
System BIOS for PowerEdge R710 = 6.3.0

other servers:
# inventory_firmware
Wait while we inventory system:
System inventory:
BIOS = 6.3.0
SAS/SATA Backplane 0:0 Backplane Firmware = 1.07
PERC H700 Integrated Controller 0 Firmware = 12.10.4-0001
Dell OS Drivers Pack, 7.1.0.9, A00 = 7.1.0.9
Dell Lifecycle Controller, 1.5.5.27, A00 = 1.5.5.27
ST3600057SS Firmware = es65
iDRAC6 = 1.90
Dell 32 Bit Diagnostics, v.5154A0, 5154.
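
Comparing the two inventories by eye is error-prone, so here is a short Python sketch that diffs them. It is a hedged illustration, not part of Dell's firmware-tools: the function names `parse_inventory` and `diff_inventories` are made up for this example, and the parser simply assumes the "component = version" layout shown in the output above.

```python
def parse_inventory(text):
    """Parse 'component = version' lines into a dict; other lines are skipped."""
    inventory = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        # Split on the last ' = ' so component names stay intact.
        name, version = line.rsplit(" = ", 1)
        inventory[name.strip()] = version.strip()
    return inventory


def diff_inventories(first, second):
    """Return (only_in_first, only_in_second, changed) for two inventory dicts."""
    only_first = {k: v for k, v in first.items() if k not in second}
    only_second = {k: v for k, v in second.items() if k not in first}
    changed = {
        k: (first[k], second[k])
        for k in first.keys() & second.keys()
        if first[k] != second[k]
    }
    return only_first, only_second, changed


if __name__ == "__main__":
    # A few lines copied from the two inventories quoted in this message.
    this_server = parse_inventory(
        "BIOS = 6.3.0\n"
        "PERC 6/i Integrated Controller 0 Firmware = 6.3.1-0003\n"
        "ST3600057SS Firmware = es66\n"
        "iDRAC6 = 1.92\n"
    )
    other_servers = parse_inventory(
        "BIOS = 6.3.0\n"
        "PERC H700 Integrated Controller 0 Firmware = 12.10.4-0001\n"
        "ST3600057SS Firmware = es65\n"
        "iDRAC6 = 1.90\n"
    )
    only_here, only_there, changed = diff_inventories(this_server, other_servers)
    print("only on this server:", only_here)
    print("only on other servers:", only_there)
    print("different versions:", changed)
```

Run against the listings above, the comparison shows that this server reports a PERC 6/i controller while the others report a PERC H700, and that the iDRAC6 and disk firmware versions also differ.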

Jason Gao

Dec 15, 2012, 3:30:02 AM
On Sat, Dec 15, 2012 at 5:55 AM, Don Dutile <ddu...@redhat.com> wrote:
> forgot: did you check that all the bios settings are the same btwn
> the 710 systems?

The BIOS settings should be the same across the servers. I've ignored
these errors and am running KVM on this server, with non-critical Java
production applications deployed in the KVM guests.

thanks

Robert Hancock

Dec 15, 2012, 5:20:02 PM
Maybe some kind of boot PCI quirk is needed to stop the device DMA
activity before enabling the IOMMU?

Don Dutile

Dec 17, 2012, 11:30:02 AM
No, the lack of a *proper* RMRR for this device is the source of the
problem. That's why RMRRs exist: so that this transition state does not
cause these types of problems.
