[x86] 64 bit cpu_alloc configuration

Allow the use of a 1TB virtually mapped area for the cpu allocations.
For UP and SMP we simply use page size mappings; the usual cpu storage
usage is less than 32k. For the NUMA case we use PMD mappings in order
to avoid the per cpu areas generating TLB pressure.

Large systems in particular may generate a large need for per cpu
storage, since each additional node managed by the page allocator adds
to the cpu storage needs of a single processor. Typically a single 2M
segment is sufficient for small NUMA systems of up to 16 nodes.

Usually the 1TB of virtually mapped memory provides enough space for
arbitrary amounts of cpu data. However, the following extreme cases
need to be kept in mind (mostly relevant to SGI machines):

4k cpu configurations:
	The maximum mappable cpu data per cpu is 256MB (2^(40-12)).
	The maximum per cpu data usage will be reached with 256 node
	machines.

Theoretically possible 16k cpu machines in the far future:
	The maximum per cpu data can only reach 64MB (2^(40-14)).
	These machines may have up to 1k nodes, which will likely
	require about 4x the cpu storage of a 4k cpu configuration for
	both the slab and page allocators.

Cpu memory use can grow rapidly. For example, if we assume that a
pageset occupies 64 bytes of memory and we have 3 zones per node, then
such a machine needs 3 zones * 1k nodes * 16k cpus = 50 million
pagesets, or 3072 pagesets per processor. This results in a total of
3.2 GB of pagesets, so each cpu needs around 200k of cpu storage for
the page allocator alone.

Signed-off-by: Christoph Lameter

---
 arch/x86/mm/init_64.c        |   38 ++++++++++++++++++++++++++++++++++++++
 include/asm-x86/pgtable_64.h |    5 +++++
 2 files changed, 43 insertions(+)

Index: linux-2.6/include/asm-x86/pgtable_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/pgtable_64.h	2007-11-03 13:32:08.717052956 -0700
+++ linux-2.6/include/asm-x86/pgtable_64.h	2007-11-03 13:58:37.789991830 -0700
@@ -138,6 +138,11 @@ static inline pte_t ptep_get_and_clear_f
 #define VMALLOC_START	 _AC(0xffffc20000000000, UL)
 #define VMALLOC_END	 _AC(0xffffe1ffffffffff, UL)
 #define VMEMMAP_START	 _AC(0xffffe20000000000, UL)
+#define CPU_AREA_BASE	 _AC(0xfffff20000000000, UL)
+#define CPU_AREA_BITS	43
+#ifdef CONFIG_NUMA
+#define CPU_AREA_BLOCK_SHIFT	PMD_SHIFT
+#endif
 #define MODULES_VADDR	 _AC(0xffffffff88000000, UL)
 #define MODULES_END	 _AC(0xfffffffffff00000, UL)
 #define MODULES_LEN	 (MODULES_END - MODULES_VADDR)
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c	2007-11-03 13:30:49.054553388 -0700
+++ linux-2.6/arch/x86/mm/init_64.c	2007-11-03 13:57:44.162053088 -0700
@@ -781,3 +781,41 @@ int __meminit vmemmap_populate(struct pa
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_NUMA
+int __meminit cpu_area_populate(void *start, unsigned long size,
+				gfp_t flags, int node)
+{
+	unsigned long addr = (unsigned long)start;
+	unsigned long end = addr + size;
+	unsigned long next;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	for (; addr < end; addr = next) {
+		next = pmd_addr_end(addr, end);
+
+		pgd = cpu_area_pgd_populate(addr, flags, node);
+		if (!pgd)
+			return -ENOMEM;
+		pud = cpu_area_pud_populate(pgd, addr, flags, node);
+		if (!pud)
+			return -ENOMEM;
+
+		pmd = pmd_offset(pud, addr);
+		if (pmd_none(*pmd)) {
+			pte_t entry;
+			void *p = cpu_area_alloc_block(PMD_SIZE, flags, node);
+			if (!p)
+				return -ENOMEM;
+
+			entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+			mk_pte_huge(entry);
+			set_pmd(pmd, __pmd(pte_val(entry)));
+		}
+	}
+
+	return 0;
+}
+#endif
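
A note on the helpers: cpu_area_pgd_populate(), cpu_area_pud_populate()
and cpu_area_alloc_block() come from the generic cpu_alloc portion of
this series and are not part of this patch. As a rough guide to what
they are expected to do, here is a hedged sketch modeled on the
vmemmap_pgd_populate()/vmemmap_pud_populate() helpers whose context is
visible in the hunk above; the actual definitions in the series may
differ in detail:

/*
 * Sketch only, not from this patch: plausible upper-level populate
 * helpers, patterned after the vmemmap_*_populate() functions in
 * mm/sparse-vmemmap.c.
 */
static pgd_t * __meminit cpu_area_pgd_populate(unsigned long addr,
						gfp_t flags, int node)
{
	pgd_t *pgd = pgd_offset_k(addr);

	if (pgd_none(*pgd)) {
		/* Allocate a page-sized table for the next level */
		void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);

		if (!p)
			return NULL;
		pgd_populate(&init_mm, pgd, p);
	}
	return pgd;
}

static pud_t * __meminit cpu_area_pud_populate(pgd_t *pgd, unsigned long addr,
						gfp_t flags, int node)
{
	pud_t *pud = pud_offset(pgd, addr);

	if (pud_none(*pud)) {
		void *p = cpu_area_alloc_block(PAGE_SIZE, flags, node);

		if (!p)
			return NULL;
		pud_populate(&init_mm, pud, p);
	}
	return pud;
}

Note that only the PMD level is mapped with a 2M huge page entry; the
PGD and PUD levels hold ordinary page-sized tables. That is what keeps
the per cpu areas from generating TLB pressure on NUMA configurations.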
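
To make the sizing arithmetic in the description concrete, the
following small, self-contained userspace sketch (not part of the
patch) reproduces the numbers above; the 1TB area, the 64-byte pageset
size, and the node/cpu counts are taken directly from the text:

#include <stdio.h>

int main(void)
{
	unsigned long area = 1UL << 40;			/* 1TB cpu area */
	unsigned long pagesets_per_cpu = 3UL * 1024;	/* 3 zones * 1k nodes */
	unsigned long total = pagesets_per_cpu * 16 * 1024;	/* 16k cpus */

	/* Maximum mappable cpu data per cpu */
	printf("4k cpus:  %lu MB\n", (area >> 12) >> 20);	/* 256 MB */
	printf("16k cpus: %lu MB\n", (area >> 14) >> 20);	/* 64 MB */

	/* Page allocator cost at 64 bytes per pageset */
	printf("pagesets/cpu: %lu (~%lu KB)\n",
		pagesets_per_cpu, pagesets_per_cpu * 64 / 1024);
	printf("total: %lu pagesets (~%lu GB)\n",
		total, (total * 64) >> 30);
	return 0;
}

This prints 3072 pagesets per processor at roughly 192 KB each, which
is where the "around 200k of cpu storage" figure comes from, and about
50 million pagesets (roughly 3.2 GB in decimal units) in total.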