This post walks through the Linux code flow for PCI bus enumeration and shows which functions fill in the config data in the pci_dev structure for each device.
Later I also show the call chain behind PCI config space reads.
I have used Linux kernel 3.15 for the code in this illustration; the call traces further down were captured on 3.19.8 and 4.0.0 kernels.
On platforms with ACPI support, PCI enumeration starts from acpi_init. The BIOS has already initialised the config space of the devices, and this config data is read back during PCI bus enumeration.
acpi_init calls acpi_scan_init.
acpi_scan_init calls acpi_pci_root_init(), which registers pci_root_handler as an ACPI scan handler.
static struct acpi_scan_handler pci_root_handler = {
    .ids = root_device_ids,
    .attach = acpi_pci_root_add,
    .detach = acpi_pci_root_remove,
    .hotplug = {
        .enabled = true,
        .scan_dependent = acpi_pci_root_scan_dependent,
    },
};
After this, acpi_scan_init calls acpi_bus_scan.
acpi_bus_scan walks the ACPI devices and, in acpi_scan_attach_handler, looks for a scan handler that matches each one. When a handler matches, its attach function is called; for a PCI root bridge this is acpi_pci_root_add.
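The matching step can be pictured with a small user-space sketch. Everything here is a simplified stand-in that I made up for illustration (struct scan_handler, scan_attach_handler, sketch_pci_root_add); only the hardware ID "PNP0A03", the standard ACPI ID of a PCI host bridge, is taken from the real world. The kernel's actual matching code does more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct acpi_scan_handler. */
struct scan_handler {
    const char *const *ids;            /* NULL-terminated list of hardware IDs */
    int (*attach)(const char *hid);    /* called when an ID matches */
};

int pci_root_attached;                 /* counts calls to the sketch attach hook */

int sketch_pci_root_add(const char *hid)
{
    (void)hid;
    pci_root_attached++;
    return 1;                          /* > 0: the handler claimed the device */
}

static const char *const root_ids[] = { "PNP0A03", NULL };

static const struct scan_handler handlers[] = {
    { root_ids, sketch_pci_root_add },
};

/* Walk the registered handlers and call attach() on the first ID match. */
int scan_attach_handler(const char *hid)
{
    size_t i;

    assert(hid != NULL);
    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        const char *const *id;

        for (id = handlers[i].ids; *id; id++)
            if (strcmp(*id, hid) == 0)
                return handlers[i].attach(hid);
    }
    return 0;                          /* no handler for this device */
}
```

Calling scan_attach_handler("PNP0A03") runs the attach hook, while an unrelated ID such as "PNP0C09" falls through and returns 0.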
acpi_pci_root_add fills in the bus number range in the root->secondary resource.
The bus numbers come from evaluating the ACPI method _CRS (METHOD_NAME__CRS).
Similarly, root->segment is filled from the ACPI method _SEG (METHOD_NAME__SEG),
and root->mcfg_addr is filled from _CBA (METHOD_NAME__CBA).
After filling in this information it calls pci_acpi_scan_root. Since the bus has not been added yet, pci_acpi_scan_root calls pci_create_root_bus to allocate a root bus and the corresponding host bridge.
It then calls pci_scan_child_bus. On this first call the function scans the root bus itself, calling pci_scan_slot for every slot on it.
A devfn combines the device (slot) number (5 bits) and the function number (3 bits). The loop steps devfn by 8, so it visits function 0 of each of the 32 possible device numbers:
/* Go find them, Rover! */
for (devfn = 0; devfn < 0x100; devfn += 8)
    pci_scan_slot(bus, devfn);
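To make the devfn layout concrete, here is a minimal user-space sketch. The bit layout is the same one the kernel's PCI_DEVFN, PCI_SLOT and PCI_FUNC macros use; the function names (make_devfn, devfn_slot, devfn_func, count_scanned_slots) are my own and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 5-bit slot number and a 3-bit function number into one byte. */
uint8_t make_devfn(unsigned int slot, unsigned int func)
{
    assert(slot < 32 && func < 8);
    return (uint8_t)((slot << 3) | func);
}

/* Upper five bits: device (slot) number. */
unsigned int devfn_slot(uint8_t devfn)
{
    return devfn >> 3;
}

/* Lower three bits: function number. */
unsigned int devfn_func(uint8_t devfn)
{
    return devfn & 0x07;
}

/* Count how many slots a "devfn += 8" loop visits. */
unsigned int count_scanned_slots(void)
{
    unsigned int devfn, n = 0;

    for (devfn = 0; devfn < 0x100; devfn += 8) {
        assert(devfn_func((uint8_t)devfn) == 0);   /* always function 0 */
        n++;
    }
    return n;
}
```

For example, make_devfn(3, 1) gives 0x19, and the loop visits 32 slots, one per device number.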
If a device found on the bus turns out to be a bridge, pci_scan_child_bus also calls pci_scan_bridge for it.
Let's look at pci_scan_slot. It calls pci_scan_single_device, which calls pci_scan_device.
pci_scan_device reads the vendor ID of the device through pci_bus_read_dev_vendor_id.
If the read fails or the data returned is 0xffffffff, no device is present at that devfn.
Otherwise pci_scan_device allocates a pci_dev structure and calls pci_setup_device.
pci_setup_device reads the various fields from PCI config space and stores them in the pci_dev structure.
pci_scan_single_device then calls pci_device_add, which initialises the struct device embedded in pci_dev and scans the PCI capabilities in pci_init_capabilities.
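The presence test and the slot loop can be tried out against a fake bus. This is only a sketch: fake_read_id and its two populated slots are invented, and the device IDs in it are made up. The extra rejected patterns (0, 0x0000ffff, 0xffff0000) are how I remember pci_bus_read_dev_vendor_id treating broken boards in kernels of this era, so take that list as an assumption; the retry handling the kernel also does is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* id is the first config dword: (device ID << 16) | vendor ID. */
bool id_means_present(uint32_t id)
{
    /* Empty slots normally read back as all ones; some boards return
     * other patterns, which are rejected here as well. */
    return !(id == 0xffffffffu || id == 0x00000000u ||
             id == 0x0000ffffu || id == 0xffff0000u);
}

/* Hypothetical fake bus: only slots 0 and 31 are populated. */
uint32_t fake_read_id(unsigned int devfn)
{
    assert(devfn < 0x100);
    if (devfn == 0x00)
        return 0x12378086u;            /* made-up device ID, vendor 0x8086 */
    if (devfn == 0xf8)
        return 0x00011af4u;            /* made-up device ID, vendor 0x1af4 */
    return 0xffffffffu;                /* nothing answers at this devfn */
}

/* Same loop shape as the slot scan: probe function 0 of every slot. */
unsigned int scan_fake_bus(void)
{
    unsigned int devfn, found = 0;

    for (devfn = 0; devfn < 0x100; devfn += 8)
        if (id_means_present(fake_read_id(devfn)))
            found++;
    return found;
}
```

On this fake bus scan_fake_bus() finds two devices; every other slot reads back as 0xffffffff and is skipped.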
Sample call stack for pci_dev initialisation:
[ 0.154205] [<ffffffff816872a5>] pci_mmcfg_read+0x125/0x130
[ 0.154208] [<ffffffff8168b5f3>] raw_pci_read+0x23/0x40
[ 0.154211] [<ffffffff8168b63c>] pci_read+0x2c/0x30
[ 0.154214] [<ffffffff813e78c6>] pci_bus_read_config_dword+0x66/0x90
[ 0.154217] [<ffffffff813e9b5d>] pci_cfg_space_size_ext+0x6d/0xb0
[ 0.154220] [<ffffffff813ea968>] pci_cfg_space_size+0x68/0x70
[ 0.154223] [<ffffffff813eab31>] pci_setup_device+0x1c1/0x530
[ 0.154226] [<ffffffff814f10f7>] ? get_device+0x17/0x30
[ 0.154229] [<ffffffff813eb082>] pci_scan_single_device+0x82/0xc0
[ 0.154232] [<ffffffff813eb10e>] pci_scan_slot+0x4e/0x140
[ 0.154235] [<ffffffff813ec38d>] pci_scan_child_bus+0x3d/0x160
[ 0.154238] [<ffffffff81689f30>] pci_acpi_scan_root+0x360/0x550
[ 0.154242] [<ffffffff8143cf2c>] acpi_pci_root_add+0x3b7/0x49b
[ 0.154245] [<ffffffff8143ed7d>] ? acpi_pnp_match+0x31/0xa8
[ 0.154248] [<ffffffff81438fef>] acpi_bus_attach+0x109/0x1fc
[ 0.154251] [<ffffffff814f5a9e>] ? device_attach+0x6e/0xd0
[ 0.154254] [<ffffffff8143906a>] acpi_bus_attach+0x184/0x1fc
[ 0.154256] [<ffffffff814f5a9e>] ? device_attach+0x6e/0xd0
[ 0.154259] [<ffffffff8143906a>] acpi_bus_attach+0x184/0x1fc
[ 0.154262] [<ffffffff81d85be8>] ? acpi_sleep_proc_init+0x2a/0x2a
[ 0.154265] [<ffffffff814391d5>] acpi_bus_scan+0x5b/0x6d
[ 0.154268] [<ffffffff81d86039>] acpi_scan_init+0x6d/0x1b3
[ 0.154271] [<ffffffff81d9bfa9>] ? __pci_mmcfg_init+0x60/0xa7
[ 0.154274] [<ffffffff81d85e37>] acpi_init+0x24f/0x267
[ 0.154277] [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[ 0.154280] [<ffffffff81091a00>] ? parse_args+0x70/0x480
[ 0.154283] [<ffffffff810b32f8>] ? __wake_up+0x48/0x60
[ 0.154286] [<ffffffff81d3e27a>] kernel_init_freeable+0x16c/0x1f9
[ 0.154289] [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[ 0.154292] [<ffffffff817a00c0>] ? rest_init+0x80/0x80
[ 0.154295] [<ffffffff817a00ce>] kernel_init+0xe/0xf0
[ 0.154298] [<ffffffff817b5d98>] ret_from_fork+0x58/0x90
[ 0.154301] [<ffffffff817a00c0>] ? rest_init+0x80/0x80
How do PCI config reads/writes occur?
Helpers such as pci_read_config_byte and pci_read_config_word are called to read PCI config space. They are defined in include/linux/pci.h as
static inline int pci_read_config_word(const struct pci_dev *dev, int where, u16 *val)
{
    return pci_bus_read_config_word(dev->bus, dev->devfn, where, val);
}
The pci_bus_read_config_* functions are generated by a macro in drivers/pci/access.c:
#define PCI_OP_READ(size,type,len) \
int pci_bus_read_config_##size \
    (struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
{ \
    int res; \
    unsigned long flags; \
    u32 data = 0; \
    if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
    raw_spin_lock_irqsave(&pci_lock, flags); \
    res = bus->ops->read(bus, devfn, pos, len, &data); \
    *value = (type)data; \
    raw_spin_unlock_irqrestore(&pci_lock, flags); \
    return res; \
}
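If the token pasting in PCI_OP_READ is hard to read, the following user-space sketch may help. It uses the same trick to generate byte, word and dword readers on top of a mock ops table. All names starting with mock_ or MOCK_ are hypothetical, the config space is just a byte array, and the locking from the real macro is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of the per-bus accessor table. */
struct mock_bus;
struct mock_ops {
    int (*read)(struct mock_bus *bus, unsigned int devfn,
                int pos, int len, uint32_t *val);
};
struct mock_bus {
    const struct mock_ops *ops;
    uint8_t cfg[256];                  /* config space of a single function */
};

/* Little-endian read of len bytes from the mock config space. */
int mock_read(struct mock_bus *bus, unsigned int devfn,
              int pos, int len, uint32_t *val)
{
    int i;

    (void)devfn;
    assert(pos >= 0 && pos + len <= 256);
    *val = 0;
    for (i = 0; i < len; i++)
        *val |= (uint32_t)bus->cfg[pos + i] << (8 * i);
    return 0;
}

/* Alignment checks, the same idea as PCI_byte_BAD and friends. */
#define MOCK_byte_BAD  0
#define MOCK_word_BAD  (pos & 1)
#define MOCK_dword_BAD (pos & 3)

/* One macro, three generated functions. */
#define MOCK_OP_READ(size, type, len)                                   \
int mock_read_config_##size(struct mock_bus *bus, unsigned int devfn,   \
                            int pos, type *value)                       \
{                                                                       \
    uint32_t data = 0;                                                  \
    int res;                                                            \
    if (MOCK_##size##_BAD)                                              \
        return -1;                     /* misaligned offset */          \
    res = bus->ops->read(bus, devfn, pos, len, &data);                  \
    *value = (type)data;                                                \
    return res;                                                         \
}

MOCK_OP_READ(byte, uint8_t, 1)
MOCK_OP_READ(word, uint16_t, 2)
MOCK_OP_READ(dword, uint32_t, 4)
```

With cfg[0..3] set to 0x86, 0x80, 0x34, 0x12, mock_read_config_word at offset 0 yields 0x8086, and the same call at offset 1 is rejected as misaligned.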
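The generated functions call bus->ops->read to do the actual read. As the call stack above shows, on x86 this ends up in pci_read, which calls raw_pci_read. raw_pci_read goes through one of two pointers that are set up at init time, raw_pci_ops and raw_pci_ext_ops.
raw_pci_ext_ops is set to pci_mmcfg in pci_mmcfg_arch_init:
raw_pci_ext_ops = &pci_mmcfg;
The trace below comes from a dump_stack() call I added to pci_mmcfg_arch_init and shows when this happens: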
[ 0.140712] kundan pci_mmcfg_arch_init 132
[ 0.140715] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.19.8-ckt9 #4
[ 0.140717] Hardware name: LENOVO 28427ZQ/INVALID, BIOS 6JET58WW (1.16 ) 09/17/2009
[ 0.140719] 0000000000000000 ffff88013afd3dd8 ffffffff817ae263 0000000000001970
[ 0.140723] ffffffff81cd7740 ffff88013afd3df8 ffffffff81d9bb9b 0000000000000aae
[ 0.140726] ffffffff81cd7740 ffff88013afd3e18 ffffffff81d9bfa9 ffffffff81c1d060
[ 0.140729] Call Trace:
[ 0.140733] [<ffffffff817ae263>] dump_stack+0x45/0x57
[ 0.140736] [<ffffffff81d9bb9b>] pci_mmcfg_arch_init+0x5a/0x63
[ 0.140740] [<ffffffff81d9bfa9>] __pci_mmcfg_init+0x60/0xa7
[ 0.140743] [<ffffffff81d9c637>] pci_mmcfg_late_init+0x27/0x29
[ 0.140746] [<ffffffff81d85e32>] acpi_init+0x24a/0x267
[ 0.140749] [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[ 0.140752] [<ffffffff81091a00>] ? parse_args+0x70/0x480
[ 0.140755] [<ffffffff810b32f8>] ? __wake_up+0x48/0x60
[ 0.140758] [<ffffffff81d3e27a>] kernel_init_freeable+0x16c/0x1f9
[ 0.140761] [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[ 0.140764] [<ffffffff817a00c0>] ? rest_init+0x80/0x80
[ 0.140767] [<ffffffff817a00ce>] kernel_init+0xe/0xf0
[ 0.140770] [<ffffffff817b5d98>] ret_from_fork+0x58/0x90
[ 0.140773] [<ffffffff817a00c0>] ? rest_init+0x80/0x80
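How the two pointers share the work can be sketched in user space. The selection rule below (legacy ops for segment 0 and registers below 256, extended ops for everything else) is how I understand raw_pci_read on x86, so treat it as an assumption and check it against your kernel. struct raw_ops, the fake_* back ends and the sketch_* names are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct pci_raw_ops (read side only). */
struct raw_ops {
    int (*read)(unsigned int seg, unsigned int bus, unsigned int devfn,
                int reg, int len, uint32_t *val);
};

/* Two fake back ends that only report which path was taken. */
int fake_conf1_read(unsigned int seg, unsigned int bus, unsigned int devfn,
                    int reg, int len, uint32_t *val)
{
    (void)seg; (void)bus; (void)devfn; (void)reg; (void)len;
    *val = 1;                          /* 1 = port I/O path */
    return 0;
}

int fake_mmcfg_read(unsigned int seg, unsigned int bus, unsigned int devfn,
                    int reg, int len, uint32_t *val)
{
    (void)seg; (void)bus; (void)devfn; (void)reg; (void)len;
    *val = 2;                          /* 2 = memory-mapped path */
    return 0;
}

const struct raw_ops fake_conf1 = { fake_conf1_read };
const struct raw_ops fake_mmcfg = { fake_mmcfg_read };

/* Stand-ins for raw_pci_ops and raw_pci_ext_ops. */
const struct raw_ops *sketch_raw_ops = &fake_conf1;
const struct raw_ops *sketch_raw_ext_ops = &fake_mmcfg;

/* Pick a back end based on segment and register offset. */
int sketch_raw_read(unsigned int seg, unsigned int bus, unsigned int devfn,
                    int reg, int len, uint32_t *val)
{
    assert(val != NULL);
    if (seg == 0 && reg < 256 && sketch_raw_ops)
        return sketch_raw_ops->read(seg, bus, devfn, reg, len, val);
    if (sketch_raw_ext_ops)
        return sketch_raw_ext_ops->read(seg, bus, devfn, reg, len, val);
    return -1;                         /* no accessor available */
}
```

Under this rule a read at offset 0x100, like the one pci_cfg_space_size_ext issues in the earlier call stack, takes the memory-mapped path, which matches pci_mmcfg_read showing up at the top of that trace.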
pci_mmcfg is defined as:
const struct pci_raw_ops pci_mmcfg = {
    .read = pci_mmcfg_read,
    .write = pci_mmcfg_write,
};
static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
                          unsigned int devfn, int reg, int len, u32 *value)
{
    char __iomem *addr;
    /* Why do we have this when nobody checks it. How about a BUG()!? -AK */
    if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) {
err:    *value = -1;
        return -EINVAL;
    }
    rcu_read_lock();
    addr = pci_dev_base(seg, bus, devfn);
    if (!addr) {
        rcu_read_unlock();
        goto err;
    }
    switch (len) {
    case 1:
        *value = mmio_config_readb(addr + reg);
        break;
    case 2:
        *value = mmio_config_readw(addr + reg);
        break;
    case 4:
        *value = mmio_config_readl(addr + reg);
        break;
    }
    rcu_read_unlock();
    return 0;
}
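pci_dev_base returns the memory-mapped address used for PCI config reads and writes. Inside the mapped window each bus gets 1 MB and each devfn gets 4 KB (devfn << 12), and the register offset is added on top by the caller.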
static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn)
{
    struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);
    if (cfg && cfg->virt)
        return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
    return NULL;
}
struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
{
    struct pci_mmcfg_region *cfg;
    list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
        if (cfg->segment == segment &&
            cfg->start_bus <= bus && bus <= cfg->end_bus)
            return cfg;
    return NULL;
}
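The address arithmetic in pci_dev_base is easy to check by hand. Below is a small sketch of it, assuming PCI_MMCFG_BUS_OFFSET(bus) is bus << 20, which is how it is defined as far as I know. ecam_offset and ecam_phys are names I made up, and the base address used in the example afterwards is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a register inside the memory-mapped config window:
 * bus in bits 27..20, devfn in bits 19..12, register in bits 11..0. */
uint32_t ecam_offset(unsigned int bus, unsigned int devfn, unsigned int reg)
{
    assert(bus < 256 && devfn < 256 && reg < 4096);
    return ((uint32_t)bus << 20) | ((uint32_t)devfn << 12) | reg;
}

/* Address of the register, given the base address of the window. */
uint64_t ecam_phys(uint64_t base, unsigned int bus,
                   unsigned int devfn, unsigned int reg)
{
    return base + ecam_offset(bus, devfn, reg);
}
```

With a hypothetical window at 0xE0000000, register 0 of device 31 function 0 on bus 0 (devfn 0xf8) lands at 0xE00F8000, and offset 0x100 of devfn 0x08 sits 0x8100 bytes into the window.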
pci_mmcfg_list must already contain an entry covering this bus. Entries are added by pci_mmconfig_add. The regions come from the ACPI MCFG table, which pci_parse_mcfg parses (it is invoked through the acpi_sfi_table_parse helper). Here is a call stack from when the list is populated:
[ 0.112385] [<ffffffff81d9c0ab>] pci_mmconfig_add+0x3a/0xa0
[ 0.112388] [<ffffffff81d9c39d>] pci_parse_mcfg+0x8e/0x13b
[ 0.112391] [<ffffffff81d9c30f>] ? pci_mmcfg_e7520+0x61/0x61
[ 0.112394] [<ffffffff81d9bab0>] ? pcibios_resource_survey+0x72/0x72
[ 0.112397] [<ffffffff81d85030>] acpi_table_parse+0x6c/0x89
[ 0.112400] [<ffffffff81d9c054>] acpi_sfi_table_parse.constprop.8+0x17/0x34
[ 0.112403] [<ffffffff81d9c5ff>] pci_mmcfg_early_init+0xea/0xfb
[ 0.112406] [<ffffffff81d9bacb>] pci_arch_init+0x1b/0x6a
[ 0.112409] [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[ 0.112411] [<ffffffff81091a00>] ? parse_args+0x70/0x480
[ 0.112414] [<ffffffff810b32f8>] ? __wake_up+0x48/0x60
[ 0.112417] [<ffffffff81d3e27a>] kernel_init_freeable+0x16c/0x1f9
[ 0.112420] [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[ 0.112423] [<ffffffff817a00c0>] ? rest_init+0x80/0x80
[ 0.112426] [<ffffffff817a00ce>] kernel_init+0xe/0xf0
[ 0.112429] [<ffffffff817b5d98>] ret_from_fork+0x58/0x90
[ 0.112432] [<ffffffff817a00c0>] ? rest_init+0x80/0x80
There is also an older way to access PCI config space, through I/O ports 0xCF8 (address) and 0xCFC (data). It is still supported for backward compatibility. See http://wiki.osdev.org/PCI for the meaning of the 0xCF8 and 0xCFC ports.
static int pci_conf1_read(unsigned int seg, unsigned int bus,
                          unsigned int devfn, int reg, int len, u32 *value)
{
    unsigned long flags;
    if (seg || (bus > 255) || (devfn > 255) || (reg > 4095)) {
        *value = -1;
        return -EINVAL;
    }
    raw_spin_lock_irqsave(&pci_config_lock, flags);
    outl(PCI_CONF1_ADDRESS(bus, devfn, reg), 0xCF8);
    switch (len) {
    case 1:
        *value = inb(0xCFC + (reg & 3));
        break;
    case 2:
        *value = inw(0xCFC + (reg & 2));
        break;
    case 4:
        *value = inl(0xCFC);
        break;
    }
    raw_spin_unlock_irqrestore(&pci_config_lock, flags);
    return 0;
}
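The value written to 0xCF8 and the data port that is then read can be computed without touching any hardware. The sketch below assumes PCI_CONF1_ADDRESS has the layout I remember from arch/x86/pci/direct.c (enable bit 31, extended register bits in 27..24, bus, devfn, dword-aligned register); conf1_address and conf1_data_port are my own names.

```c
#include <assert.h>
#include <stdint.h>

/* Value written to port 0xCF8 to select a config register. */
uint32_t conf1_address(unsigned int bus, unsigned int devfn, unsigned int reg)
{
    assert(bus < 256 && devfn < 256 && reg < 4096);
    return 0x80000000u | ((reg & 0xF00u) << 16) | (bus << 16) |
           (devfn << 8) | (reg & 0xFCu);
}

/* Data port used for a read of the given width, as in pci_conf1_read:
 * the low bits of reg pick the byte or word inside the selected dword. */
unsigned int conf1_data_port(unsigned int reg, int len)
{
    switch (len) {
    case 1:
        return 0xCFC + (reg & 3);
    case 2:
        return 0xCFC + (reg & 2);
    default:
        return 0xCFC;
    }
}
```

For a one-byte read of register 0x3D on bus 1, devfn 0x08, the address register gets 0x8001083C and the byte is read from port 0xCFD.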
Info about driver loading:
https://www.linux.com/news/hardware/peripherals/180950-udev
https://bobcares.com/blog/udev-introduction-to-device-management-in-modern-linux-system/
How is the probe method of a PCI device driver called?
As we saw in the enumeration above, the attached devices are discovered by walking the bus tree and reading the vendor ID at each devfn. For every device with a valid vendor ID, pci_device_add is called.
Adding the device sends a uevent to user space carrying the vendor ID, device ID and so on. udevd sees that the device has been added and looks in /lib/modules/<kernel>/modules.alias for a driver that matches this PCI device, then loads that driver module. The module's init function calls pci_register_driver, which goes through the following calls and ends up in the driver's probe (lpc_ich_probe in this example):
Jan 10 19:57:42 localhost kernel: [ 11.547671] CPU: 1 PID: 345 Comm: udevd Not tainted 4.0.0 #1
Jan 10 19:57:42 localhost kernel: [ 11.547672] Hardware name: Hewlett-Packard HP ZBook 840 G2/2216, BIOS M71 Ver. 00.55 05/14/2014
Jan 10 19:57:42 localhost kernel: [ 11.547673] ffff88015707f400 ffff8801568af948 ffffffff816cd469 0000000000000080
Jan 10 19:57:42 localhost kernel: [ 11.547676] 0000000000000000 ffff8801568af978 ffffffff8144a2b2 0000000000000000
Jan 10 19:57:42 localhost kernel: [ 11.547678] ffff88015707f400 ffffffffa01f3d20 0000000000000003 ffff8801568afa18
Jan 10 19:57:42 localhost kernel: [ 11.547681] Call Trace:
Jan 10 19:57:42 localhost kernel: [ 11.547684] [<ffffffff816cd469>] dump_stack+0x45/0x57
Jan 10 19:57:42 localhost kernel: [ 11.547687] [<ffffffff8144a2b2>] platform_device_add+0x32/0x300
Jan 10 19:57:42 localhost kernel: [ 11.547690] [<ffffffffa01df44f>] mfd_add_device+0x30f/0x3a0 [mfd_core]
Jan 10 19:57:42 localhost kernel: [ 11.547693] [<ffffffff811b9466>] ? __kmalloc+0x166/0x1a0
Jan 10 19:57:42 localhost kernel: [ 11.547696] [<ffffffffa01df6d3>] ? mfd_add_devices+0x53/0x980 [mfd_core]
Jan 10 19:57:42 localhost kernel: [ 11.547699] [<ffffffffa01df6d3>] ? mfd_add_devices+0x53/0x980 [mfd_core]
Jan 10 19:57:42 localhost kernel: [ 11.547702] [<ffffffffa01df735>] mfd_add_devices+0xb5/0x980 [mfd_core]
Jan 10 19:57:42 localhost kernel: [ 11.547705] [<ffffffff815a54d3>] ? raw_pci_read+0x23/0x40
Jan 10 19:57:42 localhost kernel: [ 11.547709] [<ffffffffa01f0515>] lpc_ich_probe+0x395/0x5c4 [lpc_ich]
Jan 10 19:57:42 localhost kernel: [ 11.547712] [<ffffffff8136f70e>] local_pci_probe+0x4e/0xa0
Jan 10 19:57:42 localhost kernel: [ 11.547716] [<ffffffff814432a7>] ? get_device+0x17/0x30
Jan 10 19:57:42 localhost kernel: [ 11.547720] [<ffffffff8136f989>] pci_device_probe+0xd9/0x120
Jan 10 19:57:42 localhost kernel: [ 11.547724] [<ffffffff814480bd>] driver_probe_device+0x9d/0x3c0
Jan 10 19:57:42 localhost kernel: [ 11.547728] [<ffffffff8144848b>] __driver_attach+0xab/0xb0
Jan 10 19:57:42 localhost kernel: [ 11.547732] [<ffffffff814483e0>] ? driver_probe_device+0x3c0/0x3c0
Jan 10 19:57:42 localhost kernel: [ 11.547737] [<ffffffff8144617d>] bus_for_each_dev+0x5d/0xa0
Jan 10 19:57:42 localhost kernel: [ 11.547740] [<ffffffff814479fe>] driver_attach+0x1e/0x20
Jan 10 19:57:42 localhost kernel: [ 11.547744] [<ffffffff814476a4>] bus_add_driver+0x124/0x250
Jan 10 19:57:42 localhost kernel: [ 11.547748] [<ffffffffa01e4000>] ? 0xffffffffa01e4000
Jan 10 19:57:42 localhost kernel: [ 11.547752] [<ffffffff81448ca4>] driver_register+0x64/0xf0
Jan 10 19:57:42 localhost kernel: [ 11.547756] [<ffffffff8136ebbb>] __pci_register_driver+0x4b/0x50
Jan 10 19:57:42 localhost kernel: [ 11.547760] [<ffffffffa01e401e>] lpc_ich_driver_init+0x1e/0x1000 [lpc_ich]
Jan 10 19:57:42 localhost kernel: [ 11.547763] [<ffffffff81000310>] do_one_initcall+0xc0/0x1e0
Jan 10 19:57:42 localhost kernel: [ 11.547767] [<ffffffff811b97a5>] ? kmem_cache_alloc_trace+0x35/0x140
Jan 10 19:57:42 localhost kernel: [ 11.547769] [<ffffffff816c9e86>] do_init_module+0x61/0x1ce
Jan 10 19:57:42 localhost kernel: [ 11.547772] [<ffffffff810f2ad6>] load_module+0x1d16/0x2580
Jan 10 19:57:42 localhost kernel: [ 11.547775] [<ffffffff810ee7e0>] ? unset_module_core_ro_nx+0x80/0x80
Jan 10 19:57:42 localhost kernel: [ 11.547779] [<ffffffff816d6222>] ? page_fault+0x22/0x30
Jan 10 19:57:42 localhost kernel: [ 11.547782] [<ffffffff810f3443>] SyS_init_module+0x103/0x160
Jan 10 19:57:42 localhost kernel: [ 11.547785] [<ffffffff816d47b2>] system_call_fastpath+0x12/0x17
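Before pci_device_probe hands the device to local_pci_probe, the PCI core checks the device against the driver's id_table. A rough user-space sketch of that comparison follows. It is simplified on purpose: the real struct pci_device_id also carries subsystem IDs and a class/class_mask pair, which are ignored here. struct sketch_id, struct sketch_dev and sketch_match are hypothetical names; the wildcard value is the same as PCI_ANY_ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu             /* wildcard, same value as PCI_ANY_ID */

/* Hypothetical, trimmed-down versions of struct pci_device_id / pci_dev. */
struct sketch_id {
    uint32_t vendor, device;
};
struct sketch_dev {
    uint16_t vendor, device;
};

/* Return the first table entry matching the device, or NULL.
 * The table ends with an all-zero entry, like a real id_table. */
const struct sketch_id *sketch_match(const struct sketch_id *table,
                                     const struct sketch_dev *dev)
{
    assert(table != NULL && dev != NULL);
    for (; table->vendor || table->device; table++) {
        if ((table->vendor == ANY_ID || table->vendor == dev->vendor) &&
            (table->device == ANY_ID || table->device == dev->device))
            return table;
    }
    return NULL;
}
```

A driver whose table contains { 0x1af4, ANY_ID } would match every device from vendor 0x1af4, while a device whose vendor is not listed gets NULL and the driver's probe is never called for it.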