Linux PCIe Driver Architecture

David

Aug 3, 2024, 11:16:11 AM
to highstazival

pci_register_driver() leaves most of the probing for devices to the PCI layer and supports online insertion/removal of devices [thus supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver]. The pci_register_driver() call requires passing in a table of function pointers and thus dictates the high-level structure of a driver.

If the PCI subsystem is not configured (CONFIG_PCI is not set), most of the PCI functions described below are defined as inline functions that are either completely empty or just return an appropriate error code, to avoid lots of ifdefs in the drivers.

Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any idling DMA operations. Useful for enabling wake-on-lan (NIC) or changing the power state of a device before reboot, e.g. drivers/net/e100.c.

Note that driver_data must match the value used by any of the pci_device_id entries defined in the driver. This makes the driver_data field mandatory if all the pci_device_id entries have a non-zero driver_data value.

PCI drivers should have a really good reason for not using the pci_register_driver() interface to search for PCI devices. The main reason PCI devices are controlled by multiple drivers is that one PCI device implements several different HW services, e.g. a combined serial/parallel port/floppy controller.

These functions are hotplug-safe. They increment the reference count on the pci_dev that they return. You must eventually (possibly at module unload) decrement the reference count on these devices by calling pci_dev_put().

The driver can access PCI config space registers at any time. (Well, almost. When running BIST, config space can go away... but that will just result in a PCI Bus Master Abort and config reads will return garbage.)

If the PCI device can use the PCI Memory-Write-Invalidate transaction, call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval and also ensures that the cache line size register is set correctly. Check the return value of pci_set_mwi(), as not all architectures or chip-sets may support Memory-Write-Invalidate. Alternatively, if Mem-Wr-Inval would be nice to have but is not required, call pci_try_set_mwi() to have the system do its best effort at enabling Mem-Wr-Inval.

The device driver needs to call pci_request_region() to verify no other device is already using the same address resource. Conversely, drivers should call pci_release_region() AFTER calling pci_disable_device(). The idea is to prevent two devices colliding on the same address range.

MSI capability can be enabled by calling pci_alloc_irq_vectors() with the PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This causes the PCI support to program CPU vector data into the PCI device capability registers. Many architectures, chip-sets, or BIOSes do NOT support MSI or MSI-X, and a call to pci_alloc_irq_vectors() with just the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always specify PCI_IRQ_INTX as well.

Drivers that have different interrupt handlers for MSI/MSI-X and legacy INTx should choose the right one based on the msi_enabled and msix_enabled flags in the pci_dev structure after calling pci_alloc_irq_vectors().

MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed to be visible to the host CPU(s) when the MSI is delivered. This is important for both data coherency and avoiding stale control data. This guarantee allows the driver to omit MMIO reads to flush the DMA stream.

While Chapter 9 introduced the lowest levels of hardware control, this chapter provides an overview of the higher-level bus architectures. A bus is made up of both an electrical interface and a programming interface. In this chapter, we deal with the programming interface.

The PCI specification covers most issues related to computer interfaces. We are not going to cover it all here; in this section, we are mainly concerned with how a PCI driver can find its hardware and gain access to it. The probing techniques discussed in Chapter 12 and Chapter 10 can be used with PCI devices, but the specification offers an alternative that is preferable to probing.

The PCI architecture was designed as a replacement for the ISA standard, with three main goals: to get better performance when transferring data between the computer and its peripherals, to be as platform independent as possible, and to simplify adding and removing peripherals to the system.

Most recent workstations feature at least two PCI buses. Plugging more than one bus into a single system is accomplished by means of bridges, special-purpose PCI peripherals whose task is joining two buses. The overall layout of a PCI system is a tree where each bus is connected to an upper-layer bus, up to bus 0 at the root of the tree. The CardBus PC-card system is also connected to the PCI system via bridges. A typical PCI system is represented in Figure 12-1, where the various bridges are highlighted.

The 16-bit hardware addresses associated with PCI peripherals, although mostly hidden in the struct pci_dev object, are still visible occasionally, especially when lists of devices are being used. One such situation is the output of lspci (part of the pciutils package, available with most distributions) and the layout of information in /proc/pci and /proc/bus/pci. The sysfs representation of PCI devices also shows this addressing scheme, with the addition of the PCI domain information.[1] When the hardware address is displayed, it can be shown as two values (an 8-bit bus number and an 8-bit device and function number), as three values (bus, device, and function), or as four values (domain, bus, device, and function); all the values are usually displayed in hexadecimal.
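The relationship between the two-value and three-value forms can be sketched in a few lines of userspace C. This is only an illustration: the helper names are hypothetical, and it assumes the conventional split of the 8-bit device/function byte into a 5-bit device number and a 3-bit function number.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helpers, not kernel API.
 * Assumed layout of the 16-bit address:
 *   bits 15..8  bus number
 *   bits  7..3  device number
 *   bits  2..0  function number
 */
static unsigned int pci_addr_bus(uint16_t addr)
{
    return (addr >> 8) & 0xffu;
}

static unsigned int pci_addr_dev(uint16_t addr)
{
    return (addr >> 3) & 0x1fu;
}

static unsigned int pci_addr_func(uint16_t addr)
{
    return addr & 0x07u;
}

/* Write the three-value form "bus:device.function" in hexadecimal. */
static int pci_addr_format(uint16_t addr, char *buf, size_t len)
{
    return snprintf(buf, len, "%02x:%02x.%x",
                    pci_addr_bus(addr),
                    pci_addr_dev(addr),
                    pci_addr_func(addr));
}
```

With these helpers, the 16-bit value 0x00f9 decodes to bus 0, device 0x1f, function 1.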

For example, /proc/bus/pci/devices uses a single 16-bit field (to ease parsing and sorting), while /proc/bus/busnumber splits the address into three fields. The following shows how those addresses appear, showing only the beginning of the output lines:

The hardware circuitry of each peripheral board answers queries pertaining to three address spaces: memory locations, I/O ports, and configuration registers. The first two address spaces are shared by all the devices on the same PCI bus (i.e., when you access a memory location, all the devices on that PCI bus see the bus cycle at the same time). The configuration space, on the other hand, exploits geographical addressing. Configuration queries address only one slot at a time, so they never collide.

As far as the driver is concerned, memory and I/O regions are accessed in the usual ways via inb, readb, and so forth. Configuration transactions, on the other hand, are performed by calling specific kernel functions to access configuration registers. With regard to interrupts, every PCI slot has four interrupt pins, and each device function can use one of them without being concerned about how those pins are routed to the CPU. Such routing is the responsibility of the computer platform and is implemented outside of the PCI bus. Since the PCI specification requires interrupt lines to be shareable, even a processor with a limited number of IRQ lines, such as the x86, can host many PCI interface boards (each with four interrupt pins).

The PCI configuration space consists of 256 bytes for each device function (except for PCI Express devices, which have 4 KB of configuration space for each function), and the layout of the configuration registers is standardized. Four bytes of the configuration space hold a unique function ID, so the driver can identify its device by looking for the specific ID for that peripheral.[3] In summary, each device board is geographically addressed to retrieve its configuration registers; the information in those registers can then be used to perform normal I/O access, without the need for further geographic addressing.
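As a rough illustration of how those four ID bytes are laid out, the sketch below pulls the vendor and device IDs out of a raw copy of the configuration header. It assumes the standard header layout (vendor ID at offset 0x00, device ID at offset 0x02, both little-endian); the function and macro names are made up for this example, and packing the pair with the device ID in the upper 16 bits is just one convenient choice.

```c
#include <stdint.h>

/* Offsets in the standard configuration header. */
#define CFG_VENDOR_ID_OFFSET 0x00
#define CFG_DEVICE_ID_OFFSET 0x02

/* Read a little-endian 16-bit register from a raw config-space copy. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/*
 * Combine both IDs into one 32-bit value (hypothetical helper).
 * Device ID in the high half, vendor ID in the low half, which is
 * how the first four bytes read as a single little-endian dword.
 */
static uint32_t cfg_signature(const uint8_t *cfg)
{
    uint32_t vendor = cfg_read16(cfg, CFG_VENDOR_ID_OFFSET);
    uint32_t device = cfg_read16(cfg, CFG_DEVICE_ID_OFFSET);

    return (device << 16) | vendor;
}
```

A buffer starting with the bytes 86 80 29 12 yields vendor 0x8086 and device 0x1229.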

It should be clear from this description that the main innovation of the PCI interface standard over ISA is the configuration address space. Therefore, in addition to the usual driver code, a PCI driver needs the ability to access the configuration space, in order to save itself from risky probing tasks.

Fortunately, every PCI motherboard is equipped with PCI-aware firmware, called the BIOS, NVRAM, or PROM, depending on the platform. The firmware offers access to the device configuration address space by reading and writing registers in the PCI controller.
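To make "registers in the PCI controller" a little more concrete: on x86 PCs, one long-standing way to reach configuration space is the so-called configuration mechanism #1, where software writes an address word to I/O port 0xCF8 and then transfers data through port 0xCFC. That mechanism is an x86 detail not described in the text above, and the helper name below is hypothetical. The sketch only computes the address word and does not touch any hardware.

```c
#include <stdint.h>

/*
 * Build the 32-bit address word for x86 configuration mechanism #1.
 *   bit  31      enable bit
 *   bits 23..16  bus number
 *   bits 15..11  device number
 *   bits 10..8   function number
 *   bits  7..2   register number (dword aligned, low two bits zero)
 */
static uint32_t pci_conf1_address(unsigned int bus, unsigned int dev,
                                  unsigned int func, unsigned int reg)
{
    return 0x80000000u
         | ((uint32_t)(bus  & 0xffu) << 16)
         | ((uint32_t)(dev  & 0x1fu) << 11)
         | ((uint32_t)(func & 0x07u) << 8)
         | (uint32_t)(reg & 0xfcu);
}
```

Because the slot (bus, device, function) is encoded directly in the address word, each query reaches exactly one function, which is the geographical addressing described earlier. A driver should not do this by hand; the kernel's configuration access functions hide the platform mechanism.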

The file config is a binary file that allows the raw PCI config information to be read from the device (just like the /proc/bus/pci/*/* provides.) The files vendor, device, subsystem_device, subsystem_vendor, and class all refer to the specific values of this PCI device (all PCI devices provide this information.) The file irq shows the current IRQ assigned to this PCI device, and the file resource shows the current memory resources allocated by this device.
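The text-valued files such as vendor and device can be read from userspace with ordinary file I/O. The sketch below assumes each file holds a single hexadecimal number written like 0x8086; the helper name is hypothetical, and any device path passed to it (for instance one under /sys/bus/pci/devices/) depends on the machine.

```c
#include <stdio.h>

/*
 * Read one hexadecimal value from a sysfs-style attribute file.
 * Returns 0 on success and stores the value in *value,
 * or -1 if the file cannot be opened or parsed.
 */
static int read_hex_attr(const char *path, unsigned int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;

    /* %x accepts an optional "0x" prefix. */
    ok = (fscanf(f, "%x", value) == 1);
    fclose(f);

    return ok ? 0 : -1;
}
```

A caller would pass the full path of an attribute, check the return value, and compare the result against the ID it is looking for.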

In this section, we look at the configuration registers that PCI devices contain. All PCI devices feature at least a 256-byte address space. The first 64 bytes are standardized, while the rest are device dependent. Figure 12-2 shows the layout of the device-independent configuration space.

Three or five PCI registers identify a device: vendorID, deviceID, and class are the three that are always used. Every PCI manufacturer assigns proper values to these read-only registers, and the driver can use them to look for the device. Additionally, the fields subsystem vendorID and subsystem deviceID are sometimes set by the vendor to further differentiate similar devices.

vendorID: This 16-bit register identifies a hardware manufacturer. For instance, every Intel device is marked with the same vendor number, 0x8086. There is a global registry of such numbers, maintained by the PCI Special Interest Group, and manufacturers must apply to have a unique number assigned to them.

deviceID: This is another 16-bit register, selected by the manufacturer; no official registration is required for the device ID. This ID is usually paired with the vendor ID to make a unique 32-bit identifier for a hardware device. We use the word signature to refer to the vendor and device ID pair. A device driver usually relies on the signature to identify its device; you can find what value to look for in the hardware manual for the target device.
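The way a driver's ID table is compared against these registers can be modeled in plain C. This is a simplified userspace model, not the kernel's struct pci_device_id or its matching code: the struct and function names are invented, the class field is left out, and the wildcard value is modeled on the kernel's PCI_ANY_ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Wildcard meaning "match any value" (modeled on PCI_ANY_ID). */
#define ANY_ID 0xffffffffu

/* One entry of a driver's ID table (simplified model). */
struct id_entry {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    unsigned long driver_data;   /* private value for the driver */
};

/* The identification registers read from one device. */
struct dev_ids {
    uint16_t vendor, device;
    uint16_t subvendor, subdevice;
};

static int field_matches(uint32_t want, uint16_t have)
{
    return want == ANY_ID || want == have;
}

/*
 * Return the first table entry that matches the device, or NULL.
 * The table must end with an all-zero entry.
 */
static const struct id_entry *match_id(const struct id_entry *table,
                                       const struct dev_ids *dev)
{
    for (; table->vendor || table->device ||
           table->subvendor || table->subdevice; table++) {
        if (field_matches(table->vendor, dev->vendor) &&
            field_matches(table->device, dev->device) &&
            field_matches(table->subvendor, dev->subvendor) &&
            field_matches(table->subdevice, dev->subdevice))
            return table;
    }
    return NULL;
}
```

An entry that names only the vendor and device and leaves the subsystem fields as wildcards matches on the signature alone; filling in the subsystem fields narrows the match to one particular board.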
