
Thursday, 23 June 2016

Segmentation in Linux

Linux segmentation (protected mode): Segmentation takes part in converting a virtual address to a physical address; the conversion goes through segmentation first and then paging.
In the segmentation step the segment base address is fetched from the segment descriptor and added to the virtual (logical) offset, producing the linear address. This completes segmentation.

The segment base address for each segment is 0 in Linux. This model is called the “flat model”. With every base at 0, segmentation is effectively a no-op as far as address translation is concerned.
An executing Linux process still uses several segments, for example a code segment and a data segment. Each segment has a segment selector associated with it, and the selector loaded in a segment register points to an entry in a segment descriptor table.
There are two segment descriptor tables involved in a process's code execution:

Local descriptor table (LDT): the LDT is used to implement separate address spaces where needed. There is generally one LDT per user process that needs one, describing privately held memory.

Global descriptor table (GDT): shared memory and kernel memory are described by the GDT.
The starting addresses of these tables are held in the global descriptor table register (GDTR) and the local descriptor table register (LDTR).
The LDT is reloaded on a context switch when the next task uses a different one, in switch_mm():
..
..
        /* Load the LDT, if the LDT is different: */
        if (unlikely(prev->context.ldt != next->context.ldt))
                load_LDT_nolock(&next->context);



Interrupt descriptor table (IDT): holds the gate descriptors for interrupt and exception handlers; its base address is kept in the IDTR.

Privilege levels (CPL DPL RPL):

x86 processors provide privilege levels (rings), which restrict memory access, I/O port access and the ability to execute certain machine instructions. The kernel runs at privilege level 0 and user-space programs at level 3. Running code cannot change its own privilege level directly; privilege transitions happen through the lcall, int, lret and iret instructions. Raising the privilege level is done via lcall and int (through a gate), and lowering it via lret and iret. This explains the int 0x80 used while executing a system call: it is this int instruction that moves execution from user space (ring 3) to kernel space (ring 0).

CPL is the current privilege level (found in the lower 2 bits of the CS register), RPL is the requested privilege level from the segment selector, and DPL is the descriptor privilege level of the segment (found in the descriptor). All privilege levels are integers in the range 0–3, where the lowest number corresponds to the highest privilege.
The only way to change the processor privilege level (and reload CS) is through lcall, int, lret and iret instructions.
When a data segment is accessed, the processor also checks that
max(CPL, RPL) ≤ DPL
otherwise a General Protection Fault is raised.
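
To make the check concrete, here is a small illustrative model (plain user-space C, not kernel code; the function names are made up for this sketch):

/* Models the x86 data-segment protection check described above. */
#include <stdio.h>

static int access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = (cpl > rpl) ? cpl : rpl;   /* max(CPL, RPL) */
    return effective <= dpl;                        /* numerically lower = more privileged */
}

int main(void)
{
    /* user code (CPL=3) touching a kernel data segment (DPL=0) -> #GP */
    printf("user -> kernel data: %s\n", access_allowed(3, 3, 0) ? "ok" : "GP fault");
    /* kernel code (CPL=0) touching a user data segment (DPL=3) -> allowed */
    printf("kernel -> user data: %s\n", access_allowed(0, 0, 3) ? "ok" : "GP fault");
    return 0;
}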

Let's look at the segmentation setup and the per-CPU GDT under Linux.
arch/x86/include/asm/segment.h (linux-4.0.5)
The 64-bit arch uses 16 GDT entries:
#define GDT_ENTRY_KERNEL32_CS        1
#define GDT_ENTRY_KERNEL_CS          2
#define GDT_ENTRY_KERNEL_DS          3

#define __KERNEL32_CS                (GDT_ENTRY_KERNEL32_CS * 8)

/*
 * we cannot use the same code segment descriptor for user and kernel
 * -- not even in the long flat mode, because of different DPL /kkeil
 * The segment offset needs to contain a RPL. Grr. -AK
 * GDT layout to get 64bit syscall right (sysret hardcodes gdt offsets)
 */
#define GDT_ENTRY_DEFAULT_USER32_CS  4
#define GDT_ENTRY_DEFAULT_USER_DS    5
#define GDT_ENTRY_DEFAULT_USER_CS    6

#define __USER32_CS                  (GDT_ENTRY_DEFAULT_USER32_CS*8+3)
#define __USER32_DS                  __USER_DS

#define GDT_ENTRY_TSS                8   /* needs two entries */
#define GDT_ENTRY_LDT                10  /* needs two entries */
#define GDT_ENTRY_TLS_MIN            12
#define GDT_ENTRY_TLS_MAX            14

#define GDT_ENTRY_PER_CPU            15  /* Abused to load per CPU data from limit */
#define __PER_CPU_SEG                (GDT_ENTRY_PER_CPU * 8 + 3)

/* TLS indexes for 64bit - hardcoded in arch_prctl */
#define FS_TLS                       0
#define GS_TLS                       1

#define GS_TLS_SEL                   ((GDT_ENTRY_TLS_MIN+GS_TLS)*8 + 3)
#define FS_TLS_SEL                   ((GDT_ENTRY_TLS_MIN+FS_TLS)*8 + 3)

#define GDT_ENTRIES                  16


What is TLS?
TLS is thread-local storage. Linux dedicates three global descriptor table (GDT) entries (GDT_ENTRY_TLS_MIN .. GDT_ENTRY_TLS_MAX above) to thread-local storage.
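
As a user-space illustration (a sketch assuming an x86-64 machine; glibc normally manages this for you), the 64-bit FS/GS bases referred to by the FS_TLS/GS_TLS indexes above can be set and read with arch_prctl():

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/prctl.h>          /* ARCH_SET_GS, ARCH_GET_GS */

static unsigned long tls_block[64];     /* pretend per-thread data */

int main(void)
{
    unsigned long base = 0;

    /* point the GS base at our block (sketch only; glibc uses FS for its TLS) */
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)tls_block) != 0)
        perror("arch_prctl(ARCH_SET_GS)");

    syscall(SYS_arch_prctl, ARCH_GET_GS, &base);
    printf("GS base now: %#lx (tls_block at %p)\n", base, (void *)tls_block);
    return 0;
}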


Linux GDT instantiation (arch/x86/kernel/cpu/common.c):
               /*
                * We need valid kernel segments for data and code in long mode too
                * IRET will check the segment types  kkeil 2000/10/28
                * Also sysret mandates a special GDT layout
                *
                * TLS descriptors are currently at a different place compared to i386.
                * Hopefully nobody expects them at a fixed place (Wine?)
                */
               [GDT_ENTRY_KERNEL32_CS]       = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
               [GDT_ENTRY_KERNEL_CS]         = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
               [GDT_ENTRY_KERNEL_DS]         = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
               [GDT_ENTRY_DEFAULT_USER32_CS] = GDT_ENTRY_INIT(0xc0fb, 0, 0xfffff),
               [GDT_ENTRY_DEFAULT_USER_DS]   = GDT_ENTRY_INIT(0xc0f3, 0, 0xfffff),
               [GDT_ENTRY_DEFAULT_USER_CS]   = GDT_ENTRY_INIT(0xa0fb, 0, 0xfffff),


From the Intel architecture manual : 
2.1.1 Global and Local Descriptor Tables
When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment descriptors. Segment descriptors provide the base address of segments as well as access rights, type, and usage information.
Each segment descriptor has an associated segment selector. A segment selector provides the software that uses it with an index into the GDT or LDT (the offset of its associated segment descriptor), a global/local flag (determines whether the selector points to the GDT or the LDT), and access rights information.

To access a byte in a segment, a segment selector and an offset must be supplied. The segment selector provides access to the segment descriptor for the segment (in the GDT or LDT). From the segment descriptor, the processor obtains the base address of the segment in the linear address space. The offset then provides the location of the byte relative to the base address. This mechanism can be used to access any valid code, data, or stack segment, provided the segment is accessible from the current privilege level (CPL) at which the processor is operating. The CPL is defined as the protection level of the currently executing code segment. 

However, the actual path from a segment selector to its associated segment is always
through a GDT or LDT. The linear address of the base of the GDT is contained in the GDT register (GDTR); the linear address of the LDT is contained in the LDT register (LDTR).

2.1.2 System Segments, Segment Descriptors, and Gates
Besides code, data, and stack segments that make up the execution environment of a program or procedure, the architecture defines two system segments: the task-state segment (TSS) and the LDT. The GDT is not considered a segment because it is not accessed by means of a segment selector and segment descriptor. TSSs and LDTs have segment descriptors defined for them.
The architecture also defines a set of special descriptors called gates (call gates, interrupt gates, trap gates, and task gates). These provide protected gateways to system procedures and handlers that may operate at a different privilege level than application programs and most procedures. For example, a CALL to a call gate can provide access to a procedure in a code segment that is at the same or a numerically lower privilege level (more privileged) than the current code segment. To access a procedure through a call gate, the calling procedure supplies the selector for the call gate. The processor then performs an access rights check on the call gate, comparing the CPL with the privilege level of the call gate and the destination code segment pointed to by the call gate.
If access to the destination code segment is allowed, the processor gets the segment selector for the destination code segment and an offset into that code segment from the call gate. If the call requires a change in privilege level, the processor also switches to the stack for the targeted privilege level. The segment selector for the new stack is obtained from the TSS for the currently running task. Gates also facilitate transitions between 16-bit and 32-bit code segments, and vice versa.

Friday, 17 June 2016

Linux jprobe example

In this post I will show how to use jprobes in Linux. A jprobe is a handler that we register against a kernel function; once registered, the handler is called just before the probed kernel function runs, with the same arguments. For this to work the kernel must be compiled with CONFIG_KPROBES=y, and the module must declare MODULE_LICENSE("GPL").

Here is a sample module which puts a probe on the blk_queue_bio function of the Linux kernel:

#include<linux/module.h>
#include<linux/version.h>
#include<linux/kernel.h>
#include<linux/init.h>
#include<linux/kprobes.h>

// for request_queue and bio
#include<linux/blkdev.h>
#include<linux/blk_types.h>

/*
 * The handler signature must match the probed function:
 * void blk_queue_bio(struct request_queue *q, struct bio *bio)
 */
void my_handler(struct request_queue *q, struct bio *bio)
{
    static int i = 0;

    /* only log the first 50 hits to avoid flooding the kernel log */
    if (i <= 50)
        i++;

    if (i <= 50)
    {
        printk("Your probe got hit\n");
        dump_stack();
        if ((bio->bi_bdev != NULL) && (bio->bi_bdev->bd_disk != NULL))
        {
            printk("disk_name = %s\n", bio->bi_bdev->bd_disk->disk_name);
        }
    }

    /* a jprobe handler must always end with jprobe_return() */
    jprobe_return();
}

static struct jprobe my_probe;

int myinit(void)
{
    printk("module inserted\n");
    my_probe.kp.addr = (kprobe_opcode_t *)0xffffffff81294310; //function address for blk_queue_bio
    my_probe.entry = (kprobe_opcode_t *)my_handler;
    register_jprobe(&my_probe);
    return 0;
}

void myexit(void)
{
    unregister_jprobe(&my_probe);
    printk("module removed\n");
}

module_init(myinit);
module_exit(myexit);


MODULE_AUTHOR("K_K");
MODULE_DESCRIPTION("SIMPLE MODULE");
MODULE_LICENSE("GPL");


You can take the address of any kernel function from /proc/kallsyms:
[root@localhost jprobe]# cat /proc/kallsyms | grep blk_queue_bio
ffffffff81294310 T blk_queue_bio
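
As a small aside (not in the original module), the kprobe core can also resolve the symbol by name, so the address does not have to be hardcoded; a sketch:

static struct jprobe my_probe = {
    .entry = (kprobe_opcode_t *)my_handler,
    .kp    = { .symbol_name = "blk_queue_bio" },   /* resolved by the kprobe core */
};

int myinit(void)
{
    int ret = register_jprobe(&my_probe);
    if (ret < 0)
        printk("register_jprobe failed: %d\n", ret);
    return ret;
}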

Makefile for this module :
obj-m +=jprobe_example.o
KDIR= /lib/modules/$(shell uname -r)/build
all:
    $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
clean:
       rm -rf *.o *.ko *.mod.* .c* .t* .*.cmd .tmp_versions


You can see the dump_stack() output in dmesg whenever a call to blk_queue_bio is made.


This module can be inserted and run like:
insmod jprobe_example.ko
dd if=/dev/mapper/mpatha of=/root/test_file bs=1M count=100
rmmod jprobe_example

Tuesday, 5 April 2016

Linux internals for waitqueue/wakeup functions

Wait queues and wakeup in the Linux kernel:

I am using kernel version 3.19 to walk through the waitqueue functions.

There are two basic objects: the wait queue head and the wait queue entries that are linked onto it.

The wait queue entry structure (include/linux/wait.h) is:
 20 struct __wait_queue {
 21         unsigned int            flags;
 22         void                    *private;
 23         wait_queue_func_t       func;
 24         struct list_head        task_list;
 25 };
 12 typedef struct __wait_queue wait_queue_t;

So it basically carries a wake function, a pointer to the task (private) and a task_list node used to link it onto the head.

The wait queue head structure is as follows:

 39 struct __wait_queue_head {
 40         spinlock_t              lock;
 41         struct list_head        task_list;
 42 };
 43 typedef struct __wait_queue_head wait_queue_head_t;

The wait queue head has just a spinlock and the list head embedded in it.

To build the linked list of waiting tasks we first declare and define a wait queue head:

 63 #define DECLARE_WAIT_QUEUE_HEAD(name) \
 64         wait_queue_head_t name = __WAIT_QUEUE_HEAD_INITIALIZER(name)

 59 #define __WAIT_QUEUE_HEAD_INITIALIZER(name) {                           \
 60         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),              \
 61         .task_list      = { &(name).task_list, &(name).task_list } }

This initializes the spinlock and the task list.

A wait queue head can also be initialized at run time:

 72 extern void __init_waitqueue_head(wait_queue_head_t *q, const char *name, struct lock_class_key *);
 73
 74 #define init_waitqueue_head(q)                          \
 75         do {                                            \
 76                 static struct lock_class_key __key;     \
 77                                                         \
 78                 __init_waitqueue_head((q), #q, &__key); \
 79         } while (0)

 kernel/sched/wait.c
 14 void __init_waitqueue_head(wait_queue_head_t *q, const char *name, struct lock_class_key *key)
 15 {
 16         spin_lock_init(&q->lock);
 17         lockdep_set_class_and_name(&q->lock, key, name);
 18         INIT_LIST_HEAD(&q->task_list);
 19 }
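
For reference, a minimal usage sketch (driver-style, names assumed) of both initialisation forms:

/* compile-time initialisation */
static DECLARE_WAIT_QUEUE_HEAD(my_wq);

/* ... or run-time initialisation of a head embedded in a driver structure */
struct my_dev {
        wait_queue_head_t read_wq;
};

static void my_dev_setup(struct my_dev *dev)
{
        init_waitqueue_head(&dev->read_wq);
}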


Now we need to declare and initialize a wait queue entry.

 51 #define __WAITQUEUE_INITIALIZER(name, tsk) {                            \
 52         .private        = tsk,                                          \
 53         .func           = default_wake_function,                        \
 54         .task_list      = { NULL, NULL } }
 55
 56 #define DECLARE_WAITQUEUE(name, tsk)                                    \
 57         wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)

This is the function to initialise a wait_queue entry :

 90 static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
 91 {
 92         q->flags        = 0;
 93         q->private      = p;
 94         q->func         = default_wake_function;
 95 }
 96

To add the entry to the wait queue head's list we use add_wait_queue():
 23 void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 24 {
 25         unsigned long flags;
 26
 27         wait->flags &= ~WQ_FLAG_EXCLUSIVE;
 28         spin_lock_irqsave(&q->lock, flags);
 29         __add_wait_queue(q, wait);
 30         spin_unlock_irqrestore(&q->lock, flags);
 31 }
 32 EXPORT_SYMBOL(add_wait_queue);


 114 static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
115 {
116         list_add(&new->task_list, &head->task_list);
117 }

Let's take a look at default_wake_function:
 kernel/sched/core.c
2991 int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
2992                           void *key)
2993 {
2994         return try_to_wake_up(curr->private, mode, wake_flags);
2995 }
2996 EXPORT_SYMBOL(default_wake_function);


1673 /**
1674  * try_to_wake_up - wake up a thread
1675  * @p: the thread to be awakened
1676  * @state: the mask of task states that can be woken
1677  * @wake_flags: wake modifier flags (WF_*)
1678  *
1679  * Put it on the run-queue if it's not already there. The "current"
1680  * thread is always on the run-queue (except when the actual
1681  * re-schedule is in progress), and as such you're allowed to do
1682  * the simpler "current->state = TASK_RUNNING" to mark yourself
1683  * runnable without the overhead of this.
1684  *
1685  * Return: %true if @p was woken up, %false if it was already running.
1686  * or @state didn't match @p's state.
1687  */
1688 static int
1689 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
1690 {
1691         unsigned long flags;
1692         int cpu, success = 0;
1693
1694         /*
1695          * If we are going to wake up a thread waiting for CONDITION we
1696          * need to ensure that CONDITION=1 done by the caller can not be
1697          * reordered with p->state check below. This pairs with mb() in
1698          * set_current_state() the waiting thread does.
1699          */
1700         smp_mb__before_spinlock();
1701         raw_spin_lock_irqsave(&p->pi_lock, flags);
1702         if (!(p->state & state))
1703                 goto out;
1704
1705         success = 1; /* we're going to change ->state */
1706         cpu = task_cpu(p);
1707
1708         if (p->on_rq && ttwu_remote(p, wake_flags))
1709                 goto stat;
1710
1711 #ifdef CONFIG_SMP
1712         /*
1713          * If the owning (remote) cpu is still in the middle of schedule() with
1714          * this task as prev, wait until its done referencing the task.
1715          */
1716         while (p->on_cpu)
1717                 cpu_relax();
1718         /*
1719          * Pairs with the smp_wmb() in finish_lock_switch().
1720          */
1721         smp_rmb();
1722
1723         p->sched_contributes_to_load = !!task_contributes_to_load(p);
1724         p->state = TASK_WAKING;
1725
1726         if (p->sched_class->task_waking)
1727                 p->sched_class->task_waking(p);
1728
1729         cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
1730         if (task_cpu(p) != cpu) {
1731                 wake_flags |= WF_MIGRATED;
1732                 set_task_cpu(p, cpu);
1733         }
1734 #endif /* CONFIG_SMP */
1735
1736         ttwu_queue(p, cpu);
1737 stat:
1738         ttwu_stat(p, cpu, wake_flags);
1739 out:
1740         raw_spin_unlock_irqrestore(&p->pi_lock, flags);
1741
1742         return success;
1743 }


How do try_to_wake_up() and default_wake_function() get called?
We need to wake up the tasks sitting in the wait queue, and the wake_up family of functions does exactly that.

include/linux/wait.h
#define wake_up(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, NULL)
#define wake_up_nr(x, nr) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, nr, NULL)
#define wake_up_all(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 0, NULL)
#define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
#define wake_up_interruptible_nr(x, nr) __wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
#define wake_up_interruptible_all(x) __wake_up(x, TASK_INTERRUPTIBLE, 0, NULL)

 89 void __wake_up(wait_queue_head_t *q, unsigned int mode,
 90                         int nr_exclusive, void *key)
 91 {
 92         unsigned long flags;
 93
 94         spin_lock_irqsave(&q->lock, flags);
 95         __wake_up_common(q, mode, nr_exclusive, 0, key);
 96         spin_unlock_irqrestore(&q->lock, flags);
 97 }
 98 EXPORT_SYMBOL(__wake_up);


 65 static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
 66                         int nr_exclusive, int wake_flags, void *key)
 67 {
 68         wait_queue_t *curr, *next;
 69
 70         list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
 71                 unsigned flags = curr->flags;
 72
 73                 if (curr->func(curr, mode, wake_flags, key) &&         <------default_wake_function called
 74                                 (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
 75                         break;
 76         }
 77 }


Other helpers for waiting/wakeup:
#define wait_event_interruptible(wq, condition)
#define wait_event_timeout(wq, condition, timeout)
#define wait_event_interruptible_timeout(wq, condition, timeout)


390 #define wait_event_interruptible(wq, condition)                         \
391 ({                                                                      \
392         int __ret = 0;                                                  \
393         might_sleep();                                                  \
394         if (!(condition))                                               \
395                 __ret = __wait_event_interruptible(wq, condition);      \
396         __ret;                                                          \
397 })


371 #define __wait_event_interruptible(wq, condition)                       \
372         ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0,          \
373                       schedule())
374

 212 #define ___wait_event(wq, condition, state, exclusive, ret, cmd)        \
213 ({                                                                      \
214         __label__ __out;                                                \
215         wait_queue_t __wait;                                            \
216         long __ret = ret;       /* explicit shadow */                   \
217                                                                         \
218         INIT_LIST_HEAD(&__wait.task_list);                              \
219         if (exclusive)                                                  \
220                 __wait.flags = WQ_FLAG_EXCLUSIVE;                       \
221         else                                                            \
222                 __wait.flags = 0;                                       \
223                                                                         \
224         for (;;) {                                                      \
225                 long __int = prepare_to_wait_event(&wq, &__wait, state);\
226                                                                         \
227                 if (condition)                                          \
228                         break;                                          \
229                                                                         \
230                 if (___wait_is_interruptible(state) && __int) {         \
231                         __ret = __int;                                  \
232                         if (exclusive) {                                \
233                                 abort_exclusive_wait(&wq, &__wait,      \
234                                                      state, NULL);      \
235                                 goto __out;                             \
236                         }                                               \
237                         break;                                          \
238                 }                                                       \
239                                                                         \
240                 cmd;                                                    \
241         }                                                               \
242         finish_wait(&wq, &__wait);                                      \
243 __out:  __ret;                                                          \
244 })
245


199 long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
200 {
201         unsigned long flags;
202
203         if (signal_pending_state(state, current))
204                 return -ERESTARTSYS;
205
206         wait->private = current;
207         wait->func = autoremove_wake_function;
208
209         spin_lock_irqsave(&q->lock, flags);
210         if (list_empty(&wait->task_list)) {
211                 if (wait->flags & WQ_FLAG_EXCLUSIVE)
212                         __add_wait_queue_tail(q, wait);
213                 else
214                         __add_wait_queue(q, wait);
215         }
216         set_current_state(state);
217         spin_unlock_irqrestore(&q->lock, flags);
218
219         return 0;
220 }
221 EXPORT_SYMBOL(prepare_to_wait_event);

232 void finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
233 {
234         unsigned long flags;
235
236         __set_current_state(TASK_RUNNING);
237         /*
238          * We can check for list emptiness outside the lock
239          * IFF:
240          *  - we use the "careful" check that verifies both
241          *    the next and prev pointers, so that there cannot
242          *    be any half-pending updates in progress on other
243          *    CPU's that we haven't seen yet (and that might
244          *    still change the stack area.
245          * and
246          *  - all other users take the lock (ie we can only
247          *    have _one_ other CPU that looks at or modifies
248          *    the list).
249          */
250         if (!list_empty_careful(&wait->task_list)) {
251                 spin_lock_irqsave(&q->lock, flags);
252                 list_del_init(&wait->task_list);
253                 spin_unlock_irqrestore(&q->lock, flags);
254         }
255 }
256 EXPORT_SYMBOL(finish_wait);
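
Putting it together, an end-to-end sketch (illustrative, names assumed): a sleeper waits on a condition with wait_event_interruptible(), and a waker sets the condition and calls wake_up_interruptible(), which walks the list as shown above:

static DECLARE_WAIT_QUEUE_HEAD(data_wq);
static int data_ready;

static int sleeper(void *unused)
{
        /* sleeps until data_ready != 0 or a signal arrives */
        if (wait_event_interruptible(data_wq, data_ready))
                return -ERESTARTSYS;
        pr_info("sleeper: woken up, data_ready=%d\n", data_ready);
        return 0;
}

static void waker(void)
{
        data_ready = 1;                  /* make CONDITION true first ...            */
        wake_up_interruptible(&data_wq); /* ... then wake; ordering matters, see the */
                                         /* smp_mb() comment in try_to_wake_up()     */
}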

Wednesday, 16 March 2016

Linux DMA APIs and details

The following documents give many details of the DMA APIs used in Linux:

Linux Documentation :
Documentation/DMA-API-HOWTO.txt
Documentation/DMA-API.txt
Documentation/Intel-IOMMU.txt

Online tutorials :
http://linuxkernelhacker.blogspot.in/2014/07/arm-dma-mapping-explained.html
http://www.linuxjournal.com/article/7104?page=0,0

Intel VT-D  spec explaining IOMMU, IOTLB, DMA maps etc.
http://www.intel.in/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf

dma_map_ops for the Intel (IOMMU) platform:

struct dma_map_ops intel_dma_ops = {
.alloc = intel_alloc_coherent,
.free = intel_free_coherent,
.map_sg = intel_map_sg,
.unmap_sg = intel_unmap_sg,
.map_page = intel_map_page,
.unmap_page = intel_unmap_page,
.mapping_error = intel_mapping_error,

};

The IOMMU hardware translates bus addresses (as seen by the device) to physical addresses (RAM addresses).

dma_map_single() takes the CPU virtual address, sets up any required IOMMU mapping and returns the bus address.

dma_set_mask_and_coherent(struct device *dev, u64 mask); -- tells the DMA layer how many address bits the device can drive (24 bits, 32 bits, 64 bits etc.) and checks whether that mask is usable.

Consistent DMA mappings:
dma_alloc_coherent()
1. Returns two values: the virtual address, which you can use to access the memory from the CPU, and the dma_handle, which you pass to the card.
2. The CPU virtual address returned is allocated in multiples of pages:
intel_alloc_coherent()
..
vaddr = (void *)__get_free_pages(flags, order);
..
3. Internally it can use map_single to generate the dma_handle (the bus address as seen by the device):
*dma_handle = __intel_map_single(dev, virt_to_bus(vaddr), size,
                                 DMA_BIDIRECTIONAL,
                                 dev->coherent_dma_mask);
dma_free_coherent()
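
A minimal driver-style sketch (pdev is assumed to be the driver's struct pci_dev; not from the original post): allocate a coherent descriptor ring, give the bus address to the device, use the virtual address from the CPU.

static void *ring;
static dma_addr_t ring_dma;

static int alloc_ring(struct pci_dev *pdev)
{
        /* "ring" is the CPU virtual address, "ring_dma" goes to the device */
        ring = dma_alloc_coherent(&pdev->dev, PAGE_SIZE, &ring_dma, GFP_KERNEL);
        return ring ? 0 : -ENOMEM;
}

static void free_ring(struct pci_dev *pdev)
{
        dma_free_coherent(&pdev->dev, PAGE_SIZE, ring, ring_dma);
}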


dma_pool_create/dma_pool_alloc/dma_pool_free
These work much like a struct kmem_cache, allocating small DMA-coherent memory regions.
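
A quick sketch of the dma_pool API (the device pointer and sizes are assumptions made for illustration):

static int setup_desc_pool(struct device *dev)
{
        struct dma_pool *pool;
        dma_addr_t handle;
        void *vaddr;

        /* 64-byte blocks, 8-byte aligned, no boundary restriction */
        pool = dma_pool_create("my_desc_pool", dev, 64, 8, 0);
        if (!pool)
                return -ENOMEM;

        vaddr = dma_pool_alloc(pool, GFP_KERNEL, &handle);  /* handle = bus address */
        if (vaddr)
                dma_pool_free(pool, vaddr, handle);

        dma_pool_destroy(pool);
        return 0;
}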


Streaming DMA mapping : 
dma_addr_t
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
     enum dma_data_direction direction)
Maps a piece of processor virtual memory so it can be accessed by the device and returns the bus address of the memory.
dma_addr_t
dma_map_page(struct device *dev, struct page *page,
   unsigned long offset, size_t size,
   enum dma_data_direction direction)
void
dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
               enum dma_data_direction direction)

int
dma_map_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction direction)

Returns: the number of bus address segments mapped (this may be shorter than <nents> passed in if some elements of the scatter/gather list are physically or virtually adjacent and an IOMMU maps them with a single entry).
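
A streaming-mapping sketch (device, buffer and length are assumed; the device-programming step is elided): map the buffer for a single transfer, hand the bus address to the device, then unmap.

static int send_buffer(struct device *dev, void *buf, size_t len)
{
        dma_addr_t bus = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

        if (dma_mapping_error(dev, bus))
                return -EIO;

        /* ... program "bus" into the device and wait for completion ... */

        dma_unmap_single(dev, bus, len, DMA_TO_DEVICE);
        return 0;
}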



virt_to_bus/bus_to_virt
The virt_to_bus() and bus_to_virt() functions have been superseded by the functionality provided by the PCI DMA interface.



Tuesday, 8 March 2016

Linux Scheduling Internals

Let's start with Linux scheduling:

Linux Scheduling Classes :
In Linux, scheduling is determined by the scheduling class to which the process belongs.

The sched_class data structure can be found in include/linux/sched.h.
All existing scheduling classes in the kernel are linked in a priority-ordered list:
stop_sched_class → rt_sched_class → fair_sched_class → idle_sched_class → NULL

Stop and Idle are special scheduling classes. Stop is used to schedule the per-cpu stop task. It pre-empts everything and can be pre-empted by nothing, and Idle is used to schedule the per-cpu idle task (also called swapper task) which is run if no other task is runnable. The other two are for real time and normal tasks.

fair_sched_class
kernel/sched/fair.c implements the CFS scheduler (described below).
rt_sched_class
kernel/sched/rt.c implements the SCHED_FIFO and SCHED_RR semantics.

Initialisation of task_struct:
We can see the scheduling-related fields in task_struct:
struct task_struct {
..
int prio, static_prio, normal_prio;
unsigned int rt_priority;
const struct sched_class *sched_class;
struct sched_entity se;
struct sched_rt_entity rt;
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
#endif
struct sched_dl_entity dl;
...

The task_struct is assigned its sched_class when a process is forked, in sched_fork():
} else if (rt_prio(p->prio)) {
p->sched_class = &rt_sched_class;
} else {
p->sched_class = &fair_sched_class;
}


Scheduler Policies : 
The POSIX standard specifies three scheduling policies, one of which is the “usual” or normal policy and is always the default. The other two are (soft) realtime scheduling policies. They are:
○ SCHED_NORMAL (or SCHED_OTHER) ← Default scheduling policy
○ SCHED_RR
○ SCHED_FIFO
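
From user space the policy can be selected with sched_setscheduler(); a small sketch (the priority value is chosen arbitrarily, and switching to SCHED_FIFO needs root/CAP_SYS_NICE):

#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 10 };

    /* pid 0 means "the calling process" */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    printf("now running under SCHED_FIFO, policy=%d\n", sched_getscheduler(0));
    return 0;
}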


SCHED_NORMAL (Completely Fair Scheduling - CFS) and process vruntime:
In CFS the virtual runtime is expressed and tracked via the per-task p->se.vruntime (nanosecond-unit) value. This way, it is possible to accurately timestamp and measure the "expected CPU time" a task should have gotten.
CFS's task-picking logic is based on this p->se.vruntime value and is thus very simple: it always tries to run the task with the smallest p->se.vruntime value.

CFS maintains a time-ordered rbtree, where all runnable tasks are sorted by the p->se.vruntime key. This key is updated in entity_tick() -> update_curr().


Runqueues
The basic data structure in the scheduler is the runqueue, struct rq, defined in kernel/sched/sched.h (kernel/sched.c in older kernels). The runqueue holds the runnable processes of a given processor; there is one runqueue per processor.

The runqueue embeds per-class sub-runqueues for the fair and real-time scheduling classes:
struct cfs_rq cfs;
struct rt_rq rt;


TIF_NEED_RESCHED:
The timer interrupt sets the TIF_NEED_RESCHED flag of the current task, indicating that the schedule function should be called.

How exactly (and where in the kernel codebase) is the TIF_NEED_RESCHED flag set?

tick_setup_device
tick_setup_periodic
tick_set_periodic_handler
  dev->event_handler = tick_handle_periodic;
tick_handle_periodic(struct clock_event_device *dev)
tick_periodic
update_process_times(int user_tick)
scheduler_tick();

scheduler_tick calls task_tick
..
curr->sched_class->task_tick(rq, curr, 0);
..

For CFS task_tick function is task_tick_fair which calls entity_tick

entity_tick(cfs_rq, se, queued);
update_curr(cfs_rq); -- Update the current task's runtime statistics and calls resched_task
if (queued) {
resched_task(rq_of(cfs_rq)->curr);
return;
}
TIF_NEED_RESCHED getting set :
void resched_task(struct task_struct *p)
..
set_tsk_need_resched(p);
..

hrtimer way of updating process time and setting TIF_NEED_RESCHED
run_timer_softirq
hrtimer_run_pending
hrtimer_switch_to_hres
tick_setup_sched_timer
ts->sched_timer.function = tick_sched_timer;
-- tick_sched_timer
--tick_sched_handle
update_process_times(user_mode(regs));

TIF_NEED_RESCHED flag is checked on interrupt and userspace return. If this flag is set then the current process is scheduled out and a call to __schedule is made.

Scheduler Entry points :

1. Based on the TIF_NEED_RESCHED flag, the scheduling function schedule() is called from these places:
A) upon returning to user-space (system call return path). If it is set, the kernel invokes the scheduler before continuing.
Snippet from entry_64.S
ret_from_sys_call
..
sysret_careful:
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
pushq_cfi %rdi
SCHEDULE_USER

#ifdef CONFIG_CONTEXT_TRACKING
# define SCHEDULE_USER call schedule_user
#else
# define SCHEDULE_USER call schedule
#endif

B) upon returning from a hardware interrupt, the need_resched flag is checked. If it is set and preempt_count is zero (meaning we're in a preemptible region of the kernel and no locks are held), the kernel invokes the scheduler before continuing.
schedule() getting called while returning from an interrupt, in entry_64.S:
ENTRY(retint_kernel)
cmpl $0,PER_CPU_VAR(__preempt_count)
jnz  retint_restore_args
bt   $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */
jnc  retint_restore_args
call preempt_schedule_irq
jmp exit_intr

preempt_schedule_irq

void __sched preempt_schedule_irq(void)
...
local_irq_enable();
__schedule();
local_irq_disable();
...
2. schedule() is called when the currently running task goes to sleep:
/* 'q' is the wait queue we wish to sleep on */
DEFINE_WAIT(wait);
add_wait_queue(q, &wait);
while (!condition) { /* condition is the event that we are waiting for */
        prepare_to_wait(q, &wait, TASK_INTERRUPTIBLE);
        if (signal_pending(current))
                /* handle signal */
        schedule();
}
finish_wait(q, &wait);


3. Sleeping task wakes up
The code that causes the event the sleeping task is waiting for typically calls wake_up() on the corresponding wait queue which eventually ends up in the scheduler function try_to_wake_up()

try_to_wake_up ->
ttwu_queue ->
ttwu_do_activate ->
ttwu_activate ->
activate_task ->
enqueue_task ->
p->sched_class->enqueue_task(rq, p, flags);



Monday, 18 January 2016

Linux PCI bus enumeration PCI config reads and writes

In this blog we will see the Linux code flow for PCI bus enumeration, and which functions fill in the config data of the pci_dev structure for each device.
Later I also show the call chain used when PCI config space reads are done.
I have used Linux kernel 3.15 for this illustration.
PCI enumeration is started from acpi_init on ACPI-supported platforms. The BIOS has already initialised the config space of the devices; this config data is fetched during PCI bus enumeration.

The function acpi_init calls acpi_scan_init.
acpi_scan_init calls acpi_pci_root_init(), which registers pci_root_handler for the ACPI scan:

static struct acpi_scan_handler pci_root_handler = {
.ids = root_device_ids,
.attach = acpi_pci_root_add,
.detach = acpi_pci_root_remove,
.hotplug = {
.enabled = true,
.scan_dependent = acpi_pci_root_scan_dependent,
},
};

After this, acpi_scan_init calls acpi_bus_scan.
acpi_bus_scan scans for handlers and invokes them via acpi_scan_attach_handler, which calls the attach function of the matching ACPI handler; for the PCI root this is acpi_pci_root_add.

acpi_pci_root_add function fills the bus number in root->secondary resource.
The ACPI method of METHOD_NAME__CRS is called to fill the bus number.
Similarly root->segment is filled from ACPI method METHOD_NAME__SEG
root->mcfg_addr is filled from METHOD_NAME__CBA

After filling in this information it calls pci_acpi_scan_root. Since the bus has not been added yet, pci_acpi_scan_root calls pci_create_root_bus to allocate a root bus and the corresponding host bridge.

It then calls pci_scan_child_bus to proceed further; the first time around this function scans the root bus itself. It calls pci_scan_slot for all possible devices.
devfn is a combination of the device number (5 bits) and the function number (3 bits), so this loop iterates through all the device numbers:
/* Go find them, Rover! */
for (devfn = 0; devfn < 0x100; devfn += 8)
pci_scan_slot(bus, devfn)
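
As an aside, this is how devfn packs the two fields; PCI_DEVFN/PCI_SLOT/PCI_FUNC are the helpers from include/linux/pci.h (the values below are just an example):

#include <linux/pci.h>

static void show_devfn(void)
{
        unsigned int devfn = PCI_DEVFN(3, 1);   /* device 3, function 1 -> 0x19 */

        pr_info("slot=%u func=%u\n", PCI_SLOT(devfn), PCI_FUNC(devfn));
}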

If a device turns out to be a bridge, this function also calls pci_scan_bridge.

Let's look at pci_scan_slot. It calls pci_scan_single_device, which in turn calls pci_scan_device.

pci_scan_device reads the vendor ID of the device via pci_bus_read_dev_vendor_id; if the read fails or the data returned is 0xffffffff, the device is not present.

pci_scan_device then allocates the pci_dev structure and calls pci_setup_device.

pci_setup_device reads the various PCI data from the PCI config space and fills it into the pci_dev structure.

pci_scan_single_device also calls pci_device_add. pci_device_add initializes the struct device embedded in pci_dev, and then scans the various PCI capabilities in pci_init_capabilities.

Sample call stack for pci_dev initialisation :
[    0.154205]  [<ffffffff816872a5>] pci_mmcfg_read+0x125/0x130
[    0.154208]  [<ffffffff8168b5f3>] raw_pci_read+0x23/0x40
[    0.154211]  [<ffffffff8168b63c>] pci_read+0x2c/0x30
[    0.154214]  [<ffffffff813e78c6>] pci_bus_read_config_dword+0x66/0x90
[    0.154217]  [<ffffffff813e9b5d>] pci_cfg_space_size_ext+0x6d/0xb0
[    0.154220]  [<ffffffff813ea968>] pci_cfg_space_size+0x68/0x70
[    0.154223]  [<ffffffff813eab31>] pci_setup_device+0x1c1/0x530
[    0.154226]  [<ffffffff814f10f7>] ? get_device+0x17/0x30
[    0.154229]  [<ffffffff813eb082>] pci_scan_single_device+0x82/0xc0
[    0.154232]  [<ffffffff813eb10e>] pci_scan_slot+0x4e/0x140
[    0.154235]  [<ffffffff813ec38d>] pci_scan_child_bus+0x3d/0x160
[    0.154238]  [<ffffffff81689f30>] pci_acpi_scan_root+0x360/0x550
[    0.154242]  [<ffffffff8143cf2c>] acpi_pci_root_add+0x3b7/0x49b
[    0.154245]  [<ffffffff8143ed7d>] ? acpi_pnp_match+0x31/0xa8
[    0.154248]  [<ffffffff81438fef>] acpi_bus_attach+0x109/0x1fc
[    0.154251]  [<ffffffff814f5a9e>] ? device_attach+0x6e/0xd0
[    0.154254]  [<ffffffff8143906a>] acpi_bus_attach+0x184/0x1fc
[    0.154256]  [<ffffffff814f5a9e>] ? device_attach+0x6e/0xd0
[    0.154259]  [<ffffffff8143906a>] acpi_bus_attach+0x184/0x1fc
[    0.154262]  [<ffffffff81d85be8>] ? acpi_sleep_proc_init+0x2a/0x2a
[    0.154265]  [<ffffffff814391d5>] acpi_bus_scan+0x5b/0x6d
[    0.154268]  [<ffffffff81d86039>] acpi_scan_init+0x6d/0x1b3
[    0.154271]  [<ffffffff81d9bfa9>] ? __pci_mmcfg_init+0x60/0xa7
[    0.154274]  [<ffffffff81d85e37>] acpi_init+0x24f/0x267
[    0.154277]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[    0.154280]  [<ffffffff81091a00>] ? parse_args+0x70/0x480
[    0.154283]  [<ffffffff810b32f8>] ? __wake_up+0x48/0x60
[    0.154286]  [<ffffffff81d3e27a>] kernel_init_freeable+0x16c/0x1f9
[    0.154289]  [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[    0.154292]  [<ffffffff817a00c0>] ? rest_init+0x80/0x80
[    0.154295]  [<ffffffff817a00ce>] kernel_init+0xe/0xf0
[    0.154298]  [<ffffffff817b5d98>] ret_from_fork+0x58/0x90
[    0.154301]  [<ffffffff817a00c0>] ? rest_init+0x80/0x80


How do the PCI config reads/writes occur?
We see that several functions such as pci_read_config_byte, pci_read_config_word etc. are called to read the PCI config space. These functions are defined in include/linux/pci.h as:
static inline int pci_read_config_word(const struct pci_dev *dev, int where, u16 *val)
{
return pci_bus_read_config_word(dev->bus, dev->devfn, where, val);
}
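
As a quick usage sketch (illustrative only, names assumed), a driver would call these wrappers like this from its probe routine:

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        u16 vendor, cmd;

        pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
        pci_read_config_word(pdev, PCI_COMMAND, &cmd);
        pr_info("vendor=%#06x command=%#06x\n", vendor, cmd);
        return 0;
}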

pci_bus_read_config_word and its siblings are generated by a macro in drivers/pci/access.c:
#define PCI_OP_READ(size,type,len) \
int pci_bus_read_config_##size \
(struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
{ \
int res; \
unsigned long flags; \
u32 data = 0; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
raw_spin_lock_irqsave(&pci_lock, flags); \
res = bus->ops->read(bus, devfn, pos, len, &data); \
*value = (type)data; \
raw_spin_unlock_irqrestore(&pci_lock, flags); \
return res; \
}

Here it uses the bus->ops->read function to do the read. These ops are set up at init time, through the raw_pci_ops and raw_pci_ext_ops variables.
raw_pci_ext_ops is initialised to pci_mmcfg in pci_mmcfg_arch_init:
raw_pci_ext_ops = &pci_mmcfg;

[    0.140712] kundan pci_mmcfg_arch_init 132
[    0.140715] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.19.8-ckt9 #4
[    0.140717] Hardware name: LENOVO 28427ZQ/INVALID, BIOS 6JET58WW (1.16 ) 09/17/2009
[    0.140719]  0000000000000000 ffff88013afd3dd8 ffffffff817ae263 0000000000001970
[    0.140723]  ffffffff81cd7740 ffff88013afd3df8 ffffffff81d9bb9b 0000000000000aae
[    0.140726]  ffffffff81cd7740 ffff88013afd3e18 ffffffff81d9bfa9 ffffffff81c1d060
[    0.140729] Call Trace:
[    0.140733]  [<ffffffff817ae263>] dump_stack+0x45/0x57
[    0.140736]  [<ffffffff81d9bb9b>] pci_mmcfg_arch_init+0x5a/0x63
[    0.140740]  [<ffffffff81d9bfa9>] __pci_mmcfg_init+0x60/0xa7
[    0.140743]  [<ffffffff81d9c637>] pci_mmcfg_late_init+0x27/0x29
[    0.140746]  [<ffffffff81d85e32>] acpi_init+0x24a/0x267
[    0.140749]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[    0.140752]  [<ffffffff81091a00>] ? parse_args+0x70/0x480
[    0.140755]  [<ffffffff810b32f8>] ? __wake_up+0x48/0x60
[    0.140758]  [<ffffffff81d3e27a>] kernel_init_freeable+0x16c/0x1f9
[    0.140761]  [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[    0.140764]  [<ffffffff817a00c0>] ? rest_init+0x80/0x80
[    0.140767]  [<ffffffff817a00ce>] kernel_init+0xe/0xf0
[    0.140770]  [<ffffffff817b5d98>] ret_from_fork+0x58/0x90
[    0.140773]  [<ffffffff817a00c0>] ? rest_init+0x80/0x80

Now pci_mmcfg is defined as :
const struct pci_raw_ops pci_mmcfg = {
.read = pci_mmcfg_read,
.write = pci_mmcfg_write,
};

static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
                          unsigned int devfn, int reg, int len, u32 *value)
{
        char __iomem *addr;

        /* Why do we have this when nobody checks it. How about a BUG()!? -AK */
        if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) {
err:            *value = -1;
                return -EINVAL;
        }

        rcu_read_lock();
        addr = pci_dev_base(seg, bus, devfn);
        if (!addr) {
                rcu_read_unlock();
                goto err;
        }

        switch (len) {
        case 1:
                *value = mmio_config_readb(addr + reg);
                break;
        case 2:
                *value = mmio_config_readw(addr + reg);
                break;
        case 4:
                *value = mmio_config_readl(addr + reg);
                break;
        }
        rcu_read_unlock();

        return 0;
}

So pci_dev_base actually fetches the memory-mapped address used for PCI config reads and writes:

static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn)
{
        struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus);

        if (cfg && cfg->virt)
                return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12));
        return NULL;
}

struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
{
        struct pci_mmcfg_region *cfg;

        list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
                if (cfg->segment == segment &&
                    cfg->start_bus <= bus && bus <= cfg->end_bus)
                        return cfg;

        return NULL;
}

pci_mmcfg_list must already have been populated for this bus. This is done in pci_mmconfig_add; the values come from firmware, by parsing the MCFG table (see pci_parse_mcfg in the stack below). Here is a call stack for it getting initialized:

[    0.112385]  [<ffffffff81d9c0ab>] pci_mmconfig_add+0x3a/0xa0
[    0.112388]  [<ffffffff81d9c39d>] pci_parse_mcfg+0x8e/0x13b
[    0.112391]  [<ffffffff81d9c30f>] ? pci_mmcfg_e7520+0x61/0x61
[    0.112394]  [<ffffffff81d9bab0>] ? pcibios_resource_survey+0x72/0x72
[    0.112397]  [<ffffffff81d85030>] acpi_table_parse+0x6c/0x89
[    0.112400]  [<ffffffff81d9c054>] acpi_sfi_table_parse.constprop.8+0x17/0x34
[    0.112403]  [<ffffffff81d9c5ff>] pci_mmcfg_early_init+0xea/0xfb
[    0.112406]  [<ffffffff81d9bacb>] pci_arch_init+0x1b/0x6a
[    0.112409]  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
[    0.112411]  [<ffffffff81091a00>] ? parse_args+0x70/0x480
[    0.112414]  [<ffffffff810b32f8>] ? __wake_up+0x48/0x60
[    0.112417]  [<ffffffff81d3e27a>] kernel_init_freeable+0x16c/0x1f9
[    0.112420]  [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[    0.112423]  [<ffffffff817a00c0>] ? rest_init+0x80/0x80
[    0.112426]  [<ffffffff817a00ce>] kernel_init+0xe/0xf0
[    0.112429]  [<ffffffff817b5d98>] ret_from_fork+0x58/0x90
[    0.112432]  [<ffffffff817a00c0>] ? rest_init+0x80/0x80


There is another, older way to access the PCI config space which is still supported for backward compatibility: port I/O through the 0xCF8 (address) and 0xCFC (data) ports. Check http://wiki.osdev.org/PCI for the relevance of the 0xCFC and 0xCF8 ports.

static int pci_conf1_read(unsigned int seg, unsigned int bus,
                          unsigned int devfn, int reg, int len, u32 *value)
{
        unsigned long flags;

        if (seg || (bus > 255) || (devfn > 255) || (reg > 4095)) {
                *value = -1;
                return -EINVAL;
        }

        raw_spin_lock_irqsave(&pci_config_lock, flags);

        outl(PCI_CONF1_ADDRESS(bus, devfn, reg), 0xCF8);

        switch (len) {
        case 1:
                *value = inb(0xCFC + (reg & 3));
                break;
        case 2:
                *value = inw(0xCFC + (reg & 2));
                break;
        case 4:
                *value = inl(0xCFC);
                break;
        }

        raw_spin_unlock_irqrestore(&pci_config_lock, flags);

        return 0;
}
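
As an aside (not covered in the original flow), the same config space is also exported to user space through sysfs, so a quick read does not need any port I/O. A sketch, with the device path assumed; adjust it to a device present on your system:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    FILE *f = fopen("/sys/bus/pci/devices/0000:00:00.0/config", "rb");
    uint16_t ids[2];    /* vendor ID, device ID: first 4 bytes of config space (little-endian) */

    if (!f) {
        perror("open");
        return 1;
    }
    if (fread(ids, sizeof(ids), 1, f) != 1) {
        perror("read");
        fclose(f);
        return 1;
    }
    printf("vendor=%#06x device=%#06x\n", ids[0], ids[1]);
    fclose(f);
    return 0;
}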


Info about driver loading :
https://www.linux.com/news/hardware/peripherals/180950-udev
https://bobcares.com/blog/udev-introduction-to-device-management-in-modern-linux-system/


How is the probe method of a PCI device driver called?
As we have seen in the PCI enumeration, all the attached devices are walked in a tree-like fashion and their vendor IDs are read. For every device whose vendor ID is valid, pci_device_add is called.

pci_device_add sends a uevent to user space carrying the vendor ID, device ID etc. udevd sees that this device has been added, looks in /lib/modules/<kernel>/modules.alias for a driver matching this PCI device, and loads that driver. Loading the driver calls pci_register_driver, which goes through the following calls and finally invokes the driver's probe:

Jan 10 19:57:42 localhost kernel: [   11.547671] CPU: 1 PID: 345 Comm: udevd Not tainted 4.0.0 #1
Jan 10 19:57:42 localhost kernel: [   11.547672] Hardware name: Hewlett-Packard HP ZBook 840 G2/2216, BIOS M71 Ver. 00.55 05/14/2014
Jan 10 19:57:42 localhost kernel: [   11.547673]  ffff88015707f400 ffff8801568af948 ffffffff816cd469 0000000000000080
Jan 10 19:57:42 localhost kernel: [   11.547676]  0000000000000000 ffff8801568af978 ffffffff8144a2b2 0000000000000000
Jan 10 19:57:42 localhost kernel: [   11.547678]  ffff88015707f400 ffffffffa01f3d20 0000000000000003 ffff8801568afa18
Jan 10 19:57:42 localhost kernel: [   11.547681] Call Trace:
Jan 10 19:57:42 localhost kernel: [   11.547684]  [<ffffffff816cd469>] dump_stack+0x45/0x57
Jan 10 19:57:42 localhost kernel: [   11.547687]  [<ffffffff8144a2b2>] platform_device_add+0x32/0x300
Jan 10 19:57:42 localhost kernel: [   11.547690]  [<ffffffffa01df44f>] mfd_add_device+0x30f/0x3a0 [mfd_core]
Jan 10 19:57:42 localhost kernel: [   11.547693]  [<ffffffff811b9466>] ? __kmalloc+0x166/0x1a0
Jan 10 19:57:42 localhost kernel: [   11.547696]  [<ffffffffa01df6d3>] ? mfd_add_devices+0x53/0x980 [mfd_core]
Jan 10 19:57:42 localhost kernel: [   11.547699]  [<ffffffffa01df6d3>] ? mfd_add_devices+0x53/0x980 [mfd_core]
Jan 10 19:57:42 localhost kernel: [   11.547702]  [<ffffffffa01df735>] mfd_add_devices+0xb5/0x980 [mfd_core]
Jan 10 19:57:42 localhost kernel: [   11.547705]  [<ffffffff815a54d3>] ? raw_pci_read+0x23/0x40
Jan 10 19:57:42 localhost kernel: [   11.547709]  [<ffffffffa01f0515>] lpc_ich_probe+0x395/0x5c4 [lpc_ich]
Jan 10 19:57:42 localhost kernel: [   11.547712]  [<ffffffff8136f70e>] local_pci_probe+0x4e/0xa0
Jan 10 19:57:42 localhost kernel: [   11.547716]  [<ffffffff814432a7>] ? get_device+0x17/0x30
Jan 10 19:57:42 localhost kernel: [   11.547720]  [<ffffffff8136f989>] pci_device_probe+0xd9/0x120
Jan 10 19:57:42 localhost kernel: [   11.547724]  [<ffffffff814480bd>] driver_probe_device+0x9d/0x3c0
Jan 10 19:57:42 localhost kernel: [   11.547728]  [<ffffffff8144848b>] __driver_attach+0xab/0xb0
Jan 10 19:57:42 localhost kernel: [   11.547732]  [<ffffffff814483e0>] ? driver_probe_device+0x3c0/0x3c0
Jan 10 19:57:42 localhost kernel: [   11.547737]  [<ffffffff8144617d>] bus_for_each_dev+0x5d/0xa0
Jan 10 19:57:42 localhost kernel: [   11.547740]  [<ffffffff814479fe>] driver_attach+0x1e/0x20
Jan 10 19:57:42 localhost kernel: [   11.547744]  [<ffffffff814476a4>] bus_add_driver+0x124/0x250
Jan 10 19:57:42 localhost kernel: [   11.547748]  [<ffffffffa01e4000>] ? 0xffffffffa01e4000
Jan 10 19:57:42 localhost kernel: [   11.547752]  [<ffffffff81448ca4>] driver_register+0x64/0xf0
Jan 10 19:57:42 localhost kernel: [   11.547756]  [<ffffffff8136ebbb>] __pci_register_driver+0x4b/0x50
Jan 10 19:57:42 localhost kernel: [   11.547760]  [<ffffffffa01e401e>] lpc_ich_driver_init+0x1e/0x1000 [lpc_ich]
Jan 10 19:57:42 localhost kernel: [   11.547763]  [<ffffffff81000310>] do_one_initcall+0xc0/0x1e0
Jan 10 19:57:42 localhost kernel: [   11.547767]  [<ffffffff811b97a5>] ? kmem_cache_alloc_trace+0x35/0x140
Jan 10 19:57:42 localhost kernel: [   11.547769]  [<ffffffff816c9e86>] do_init_module+0x61/0x1ce
Jan 10 19:57:42 localhost kernel: [   11.547772]  [<ffffffff810f2ad6>] load_module+0x1d16/0x2580
Jan 10 19:57:42 localhost kernel: [   11.547775]  [<ffffffff810ee7e0>] ? unset_module_core_ro_nx+0x80/0x80
Jan 10 19:57:42 localhost kernel: [   11.547779]  [<ffffffff816d6222>] ? page_fault+0x22/0x30
Jan 10 19:57:42 localhost kernel: [   11.547782]  [<ffffffff810f3443>] SyS_init_module+0x103/0x160
Jan 10 19:57:42 localhost kernel: [   11.547785]  [<ffffffff816d47b2>] system_call_fastpath+0x12/0x17
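

For completeness, here is a minimal sketch of what such a PCI driver module looks like (the IDs and names are made up for illustration); pci_register_driver()/module_pci_driver() is what eventually leads to the probe call traced above:

#include <linux/module.h>
#include <linux/pci.h>

static const struct pci_device_id my_ids[] = {
        { PCI_DEVICE(0x8086, 0x1234) },   /* hypothetical vendor/device pair */
        { 0, }
};
MODULE_DEVICE_TABLE(pci, my_ids);         /* ends up in modules.alias for udev */

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        return pci_enable_device(pdev);
}

static void my_remove(struct pci_dev *pdev)
{
        pci_disable_device(pdev);
}

static struct pci_driver my_driver = {
        .name     = "my_pci_driver",
        .id_table = my_ids,
        .probe    = my_probe,
        .remove   = my_remove,
};
module_pci_driver(my_driver);

MODULE_LICENSE("GPL");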