Sunday, 7 May 2017

Linux Memory initialization

Linux memory init:


start_kernel calls setup_arch. setup_arch is a big function where many of the early boot memory allocations and initializations happen.

Since most of the kernel's memory allocation APIs are not yet available at boot, the kernel uses the memblock APIs for the allocations it needs.

Another facility available this early is the early ioremap functionality:

EARLY IOREMAP initialization : 

early_ioremap_init :
for (i = 0; i < FIX_BTMAPS_SLOTS; i++)
        slot_virt[i] = __fix_to_virt(FIX_BTMAP_BEGIN - NR_FIX_BTMAPS*i);
This loop fills the slot_virt array with the virtual addresses of the early fixmaps.

static unsigned long slot_virt[FIX_BTMAPS_SLOTS] __initdata;
static void __iomem *prev_map[FIX_BTMAPS_SLOTS] __initdata;
static unsigned long prev_size[FIX_BTMAPS_SLOTS] __initdata;

The early_ioremap_init function fills the slot_virt array with the virtual addresses of the early fixmaps. There are 8 slots (FIX_BTMAPS_SLOTS), each covering 64 boot-time mappings (NR_FIX_BTMAPS), for 512 mappings altogether.
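The slot arithmetic can be tried out in user space. The following is only a sketch: FIXADDR_TOP and FIX_BTMAP_END are made-up stand-in values (the real ones depend on the kernel version and configuration), and fix_to_virt here is a simplified stand-in for the kernel's __fix_to_virt macro.

```c
#include <stdint.h>

/* x86 values from the text: 8 slots of 64 boot-time mappings each. */
#define FIX_BTMAPS_SLOTS  8
#define NR_FIX_BTMAPS     64
#define TOTAL_FIX_BTMAPS  (NR_FIX_BTMAPS * FIX_BTMAPS_SLOTS)

/* Assumptions for illustration only: 4 KB pages, a stand-in top of the
 * fixmap area and a made-up index for the last boot-time mapping. */
#define PAGE_SHIFT        12
#define FIXADDR_TOP       UINT64_C(0xffffffffff7ff000)
#define FIX_BTMAP_END     UINT64_C(1024)
#define FIX_BTMAP_BEGIN   (FIX_BTMAP_END + TOTAL_FIX_BTMAPS - 1)

static uint64_t slot_virt[FIX_BTMAPS_SLOTS];

/* Fixmap indices count downwards from FIXADDR_TOP, one page per index. */
static uint64_t fix_to_virt(uint64_t idx)
{
    return FIXADDR_TOP - (idx << PAGE_SHIFT);
}

/* Same loop as quoted above, run on the stand-in values. */
static void fill_slot_virt(void)
{
    for (int i = 0; i < FIX_BTMAPS_SLOTS; i++)
        slot_virt[i] = fix_to_virt(FIX_BTMAP_BEGIN - (uint64_t)NR_FIX_BTMAPS * i);
}
```

After calling fill_slot_virt(), consecutive entries of slot_virt are 64 pages (256 KB with 4 KB pages) apart, so each slot has room for its 64 boot-time mappings.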
early_ioremap_pmd fetches the page middle directory (pmd) entry for FIX_BTMAP_BEGIN.

pmd_populate_kernel then populates that pmd, passed as an argument, with the given page table (bm_pte).

As soon as early ioremap has been set up successfully, we can use it. It provides two functions:

early_ioremap
early_iounmap
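To illustrate how the two functions relate to the slot_virt, prev_map and prev_size arrays, here is a user-space toy model. It is a sketch, not the kernel code: the toy_ names and TOY_BTMAP_BASE are hypothetical, addresses are plain integers, and nothing is actually mapped. It only mimics the bookkeeping: pick the first free slot, remember the size, and release the slot again on unmap.

```c
#include <stdint.h>

#define PAGE_SHIFT        12
#define PAGE_SIZE         (UINT64_C(1) << PAGE_SHIFT)
#define FIX_BTMAPS_SLOTS  8
#define NR_FIX_BTMAPS     64
#define SLOT_BYTES        ((uint64_t)NR_FIX_BTMAPS << PAGE_SHIFT)
#define TOY_BTMAP_BASE    UINT64_C(0xffffffffff200000) /* made-up address */

static uint64_t slot_virt[FIX_BTMAPS_SLOTS];
static uint64_t prev_map[FIX_BTMAPS_SLOTS];   /* 0 means "slot is free" */
static uint64_t prev_size[FIX_BTMAPS_SLOTS];

static void toy_early_ioremap_init(void)
{
    for (int i = 0; i < FIX_BTMAPS_SLOTS; i++) {
        slot_virt[i] = TOY_BTMAP_BASE + i * SLOT_BYTES;
        prev_map[i] = 0;
        prev_size[i] = 0;
    }
}

/* Returns a pretend virtual address, or 0 when the request does not fit
 * into one slot or every slot is busy. */
static uint64_t toy_early_ioremap(uint64_t phys_addr, uint64_t size)
{
    uint64_t offset = phys_addr & (PAGE_SIZE - 1);
    uint64_t pages = (offset + size + PAGE_SIZE - 1) >> PAGE_SHIFT;

    if (size == 0 || pages > NR_FIX_BTMAPS)
        return 0;

    for (int i = 0; i < FIX_BTMAPS_SLOTS; i++) {
        if (prev_map[i] == 0) {
            prev_size[i] = size;
            prev_map[i] = slot_virt[i] + offset;
            return prev_map[i];
        }
    }
    return 0;
}

/* Frees the slot that handed out addr; -1 if addr or size do not match. */
static int toy_early_iounmap(uint64_t addr, uint64_t size)
{
    for (int i = 0; i < FIX_BTMAPS_SLOTS; i++) {
        if (prev_map[i] != 0 && prev_map[i] == addr) {
            if (prev_size[i] != size)
                return -1;
            prev_map[i] = 0;
            prev_size[i] = 0;
            return 0;
        }
    }
    return -1;
}
```

A caller would pair the two, for example `a = toy_early_ioremap(0x1234, 16)` followed later by `toy_early_iounmap(a, 16)`, which makes slot 0 available again.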

More details here :
https://github.com/0xAX/linux-insides/blob/master/mm/linux-mm-2.md#use-of-early-ioremap

MEMBLOCKS :
Linux memblock initialization : 
Initially there are two static arrays of 128 regions each (INIT_MEMBLOCK_REGIONS): one for memory regions and one for reserved regions.
memblock_reserve adds a region to the reserved array (memblock_reserved_init_regions).

static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;

struct memblock memblock __initdata_memblock = {
        .memory.regions = memblock_memory_init_regions,
        .memory.cnt = 1, /* empty dummy entry */
        .memory.max = INIT_MEMBLOCK_REGIONS,

        .reserved.regions = memblock_reserved_init_regions,
        .reserved.cnt = 1, /* empty dummy entry */
        .reserved.max = INIT_MEMBLOCK_REGIONS,

        .bottom_up = false,

        .current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};

#define INIT_MEMBLOCK_REGIONS 128

Memblock APIs :

memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
This function takes a physical base address and the size of a memory region as arguments and adds the region to memblock.memory.
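The idea of adding a range to a fixed-size region array can be sketched in user space as below. This is a simplified toy, not the kernel implementation: the toy_ names are hypothetical, node ids and flags are ignored, the array starts empty instead of holding the dummy entry, and base + size is assumed not to overflow. It keeps the regions sorted by base and merges regions that overlap or touch.

```c
#include <stdint.h>
#include <string.h>

#define INIT_MEMBLOCK_REGIONS 128

struct toy_region {
    uint64_t base;
    uint64_t size;
};

struct toy_type {
    unsigned long cnt;   /* regions currently in use */
    unsigned long max;   /* capacity of the array */
    struct toy_region regions[INIT_MEMBLOCK_REGIONS];
};

/* Add [base, base + size). Returns 0 on success, -1 if the array is full. */
static int toy_memblock_add(struct toy_type *type, uint64_t base, uint64_t size)
{
    uint64_t end = base + size;
    unsigned long i = 0, j;

    if (size == 0)
        return 0;

    /* Skip regions that end strictly before the new range starts. */
    while (i < type->cnt &&
           type->regions[i].base + type->regions[i].size < base)
        i++;

    /* Absorb every region that overlaps or touches the new range. */
    for (j = i; j < type->cnt && type->regions[j].base <= end; j++) {
        uint64_t rend = type->regions[j].base + type->regions[j].size;

        if (type->regions[j].base < base)
            base = type->regions[j].base;
        if (rend > end)
            end = rend;
    }

    /* Nothing absorbed means a pure insert, which needs a free entry. */
    if (i == j && type->cnt == type->max)
        return -1;

    /* Replace regions[i..j) with the single merged region. */
    memmove(&type->regions[i + 1], &type->regions[j],
            (type->cnt - j) * sizeof(type->regions[0]));
    type->regions[i].base = base;
    type->regions[i].size = end - base;
    type->cnt = type->cnt - (j - i) + 1;
    return 0;
}
```

Adding 0x100000 + 0x1000 and then 0x101000 + 0x1000 to an empty toy_type, for instance, leaves a single region of size 0x2000, because the second range touches the first.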



SETUP ARCH : 

setup_arch reserves memory, in the form of a memblock region, for the kernel's _text, _data and _bss using:
memblock_reserve(__pa_symbol(_text),
                 (unsigned long)__bss_stop - (unsigned long)_text);


Then it does the memblock reserve for initrd :
early_reserve_initrd

SETUP MEMORY MAP : 
Next memory related function is setup_memory_map. To understand this function we need to first go through the basics of e820 map.

e820 is the facility the BIOS uses to report the memory map to Linux.
The BIOS map is obtained via the int 15h call, by setting the AX register to the value E820 in hexadecimal. The kernel's own memory map is then built in the function setup_arch, which is called by start_kernel(). Source (https://en.wikipedia.org/wiki/E820)

arch/x86/kernel/setup.c

start_kernel()
  setup_arch()
    setup_memory_map()
      default_machine_specific_memory_setup()
Here the entries are taken from boot_params e820_map. Regions are sanitized and saved to struct e820map e820;

struct e820map {
        __u32 nr_map;
        struct e820entry map[E820_X_MAX];
};

append_e820_map copies all the BIOS entries into a safe place, i.e. struct e820map e820;
The same map is then printed on the Linux kernel messages:

Jul 13 21:42:42 localhost kernel: e820: BIOS-provided physical RAM map:
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x0000000000100000-0x000000007feeffff] usable
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x000000007fef0000-0x000000007fefefff] ACPI data
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x000000007feff000-0x000000007fefffff] ACPI NVS
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x000000007ff00000-0x000000007fffffff] usable
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
Jul 13 21:42:42 localhost kernel: BIOS-e820: [mem 0x00000000fffe0000-0x00000000ffffffff] reserved

Let's see how much memory each of these regions covers (size = end - start + 1, with 1 KB = 1024 bytes):
region 1    0x9f800 bytes       638 KB       usable
region 2    0x800 bytes         2 KB         reserved
region 3    0x24000 bytes       144 KB       reserved
region 4    0x7fdf0000 bytes    2045.9 MB    usable
region 5    0xf000 bytes        60 KB        ACPI data
region 6    0x1000 bytes        4 KB         ACPI NVS
region 7    0x100000 bytes      1 MB         usable
region 8    0x8000000 bytes     128 MB       reserved
region 9    0x10000 bytes       64 KB        reserved
region 10   0x1000 bytes        4 KB         reserved
region 11   0x20000 bytes       128 KB       reserved


This seems OK, as I have 2048 MB assigned to the OS in my VM: the usable regions (1, 4 and 7) add up to about 2047.6 MB.
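The region sizes can be double-checked with a small user-space program. This is just a sketch: the table is copied by hand from the log above, and the struct, enum and function names are made up for the example rather than taken from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Type codes as in the e820 map: 1 usable, 2 reserved, 3 ACPI data, 4 ACPI NVS. */
enum { T_RAM = 1, T_RESERVED = 2, T_ACPI = 3, T_NVS = 4 };

struct range {
    uint64_t start;
    uint64_t end;    /* inclusive, as printed in the log */
    int type;
};

static const struct range bios_map[] = {
    { 0x0000000000000000, 0x000000000009f7ff, T_RAM },
    { 0x000000000009f800, 0x000000000009ffff, T_RESERVED },
    { 0x00000000000dc000, 0x00000000000fffff, T_RESERVED },
    { 0x0000000000100000, 0x000000007feeffff, T_RAM },
    { 0x000000007fef0000, 0x000000007fefefff, T_ACPI },
    { 0x000000007feff000, 0x000000007fefffff, T_NVS },
    { 0x000000007ff00000, 0x000000007fffffff, T_RAM },
    { 0x00000000f0000000, 0x00000000f7ffffff, T_RESERVED },
    { 0x00000000fec00000, 0x00000000fec0ffff, T_RESERVED },
    { 0x00000000fee00000, 0x00000000fee00fff, T_RESERVED },
    { 0x00000000fffe0000, 0x00000000ffffffff, T_RESERVED },
};

#define BIOS_MAP_LEN (sizeof(bios_map) / sizeof(bios_map[0]))

/* Both ends are inclusive, hence the + 1. */
static uint64_t range_size(const struct range *r)
{
    return r->end - r->start + 1;
}

/* Sum the sizes of all ranges of one type. */
static uint64_t total_of_type(const struct range *map, size_t n, int type)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        if (map[i].type == type)
            total += range_size(&map[i]);
    return total;
}
```

Calling total_of_type(bios_map, BIOS_MAP_LEN, T_RAM) gives 2147022848 bytes, a little under 2048 MB; the remainder is taken by the reserved and ACPI ranges below the 2 GB mark.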

Following are the types of memory in e820 map :
01h memory, available to OS
02h reserved, not available (e.g. system ROM, memory-mapped device)
03h ACPI Reclaim Memory (usable by OS after reading ACPI tables)
04h ACPI NVS Memory (OS is required to save this memory between NVS sessions)
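These type codes correspond to the labels seen in the boot log above ("usable", "reserved", "ACPI data", "ACPI NVS"). A minimal sketch of such a mapping follows; the function name is hypothetical, and the "unknown" fallback is an assumption of this example, not necessarily what the kernel prints.

```c
#include <stdint.h>

/* Map an e820 type code to the label used in the boot log. */
static const char *toy_e820_type_name(uint32_t type)
{
    switch (type) {
    case 1:  return "usable";      /* 01h memory, available to OS */
    case 2:  return "reserved";    /* 02h reserved, not available */
    case 3:  return "ACPI data";   /* 03h ACPI Reclaim Memory */
    case 4:  return "ACPI NVS";    /* 04h ACPI NVS Memory */
    default: return "unknown";     /* assumption for this sketch */
    }
}
```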

The extended map is parsed and printed using parse_e820_ext.

The setup_arch function then assigns the following variables
init_mm.start_code = (unsigned long) _text;
init_mm.end_code = (unsigned long) _etext;
init_mm.end_data = (unsigned long) _edata;
init_mm.brk = _brk_end;

May  7 20:55:51 localhost kernel: _text = 0xffffffff81000000
May  7 20:55:51 localhost kernel: _etext = 0xffffffff81600f65
May  7 20:55:51 localhost kernel: _edata = 0xffffffff81a00800
code_resource.start = 1000000
code_resource.end = 1600f64
data_resource.start = 1600f65
data_resource.end = 1a007ff
bss_resource.start = 1b95000
bss_resource.end = 1e2cfff
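The resource values in this log match what one gets by translating the symbol addresses to physical addresses. The sketch below assumes the x86_64 kernel text mapping base 0xffffffff80000000 and a phys_base of 0 (a kernel that was not relocated); neither value appears in the log, and toy_pa_symbol is a hypothetical user-space stand-in for the kernel's __pa_symbol.

```c
#include <stdint.h>

/* Assumed base of the x86_64 kernel text mapping. */
#define START_KERNEL_MAP UINT64_C(0xffffffff80000000)

/* Translate a kernel symbol's virtual address to a physical address. */
static uint64_t toy_pa_symbol(uint64_t vaddr, uint64_t phys_base)
{
    return vaddr - START_KERNEL_MAP + phys_base;
}
```

With phys_base = 0, _text (0xffffffff81000000) translates to 0x1000000, which is code_resource.start. _etext translates to 0x1600f65, which is data_resource.start and one byte past code_resource.end. _edata translates to 0x1a00800, one byte past data_resource.end.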

e820_add_kernel_range() adds the kernel range.
Apr  1 20:10:35 localhost kernel: e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
With 4 KB pages, last_pfn = 0x80000 corresponds to 2 GB, and max_arch_pfn = 0x400000000 corresponds to 64 TB.
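The conversion from a page frame number to a byte address is a single shift. A minimal sketch, assuming 4 KB pages (PAGE_SHIFT of 12) and using a hypothetical helper name:

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumes 4 KB pages */

/* Byte address at which the given page frame starts. */
static uint64_t toy_pfn_to_bytes(uint64_t pfn)
{
    return pfn << PAGE_SHIFT;
}
```

toy_pfn_to_bytes(0x80000) is 0x80000000 (2 GB), and toy_pfn_to_bytes(0x400000000) is 1 << 46 (64 TB).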
