From: Andy Whitcroft

To summarise the problem, the buddy allocator currently requires that the boundaries between zones occur at MAX_ORDER boundaries.  The specific case where we were tripping up on this was on x86 with NUMA enabled.  There we try to ensure that each node's struct pages are in node-local memory; in order to allow them to be virtually mapped we have to reduce the size of ZONE_NORMAL.  Here we are rounding the remap space up to a large page size so that large page TLB entries can be used.  However, large pages are smaller than MAX_ORDER blocks, so the resulting zone boundary need not be MAX_ORDER aligned.  This can lead to bad buddy merges; with VM_DEBUG enabled we detect the attempts to merge across this boundary and panic.

We have two basic options: we can either apply the appropriate alignment when we make the NUMA remap space, or we can 'fix' the assumption in the buddy allocator.  The fix for the buddy allocator involves adding conditionals to the free fast path, so it seems reasonable to at least favour realigning the remap space.

Following this email are 3 patches:

zone-init-check-and-report-unaligned-zone-boundaries
   -- introduces a zone alignment helper, and uses it to add a check to
      zone initialisation for unaligned zone boundaries,

x86-align-highmem-zone-boundaries-with-NUMA
   -- uses the zone alignment helper to align the end of ZONE_NORMAL
      after the remap space has been reserved, and

zone-allow-unaligned-zone-boundaries
   -- modifies the buddy allocator so that we can allow unaligned zone
      boundaries.  A new configuration option is added to enable this
      functionality.

The first two are the fixes for alignment on x86; they fix the panics thrown when VM_DEBUG is enabled.  The last is a patch to support unaligned zone boundaries.  As this (re)introduces a zone check into the free hot path, it seems reasonable to only enable it should it be needed; for example, we never need it if we have a single zone.  I have tested the failing system with this patch enabled and it also fixes the panic.  I am inclined to suggest that it be included, as it very clearly documents the alignment requirements for the buddy allocator.

This patch:

We have a number of strict constraints on the layout of struct pages for use with the buddy allocator.  One of these is that zone boundaries must occur at MAX_ORDER page boundaries.  Add a check for this during init.
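To see why an unaligned boundary breaks merging: the buddy of a block is found by flipping a single bit of its page frame number, so for the last block below a misaligned zone boundary the computed buddy can land in the next zone.  A minimal userspace sketch of the arithmetic (an illustration, not the kernel code itself; MAX_ORDER is assumed to be 11, its common default, and the pfn values are made up):

#include <stdio.h>

/* The buddy of the block starting at pfn, at the given order, is
 * found by flipping the single bit selecting that order. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

int main(void)
{
	/* A zone boundary at pfn 0x1200 is not a multiple of
	 * 1 << MAX_ORDER (0x800 with the common MAX_ORDER of 11). */
	unsigned long zone_end = 0x1200;

	/* The last order-9 block in the zone, [0x1000, 0x1200). */
	unsigned long pfn = 0x1000;
	unsigned int order = 9;

	unsigned long buddy = buddy_pfn(pfn, order);

	/* buddy == 0x1200, the first page of the *next* zone.  If that
	 * page happens to be free, the two blocks merge into an
	 * order-10 block spanning the zone boundary -- the bad merge
	 * that VM_DEBUG catches. */
	printf("block 0x%lx order %u: buddy 0x%lx is %s zone end 0x%lx\n",
	       pfn, order, buddy,
	       buddy >= zone_end ? "at/beyond" : "within", zone_end);
	return 0;
}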
Signed-off-by: Andy Whitcroft
Signed-off-by: Andrew Morton
---

 include/linux/mmzone.h |    5 +++++
 mm/page_alloc.c        |    4 ++++
 2 files changed, 9 insertions(+)

diff -puN include/linux/mmzone.h~zone-init-check-and-report-unaligned-zone-boundaries include/linux/mmzone.h
--- devel/include/linux/mmzone.h~zone-init-check-and-report-unaligned-zone-boundaries	2006-05-19 16:00:24.000000000 -0700
+++ devel-akpm/include/linux/mmzone.h	2006-05-19 16:00:24.000000000 -0700
@@ -387,6 +387,11 @@ static inline int is_dma(struct zone *zo
 	return zone == zone->zone_pgdat->node_zones + ZONE_DMA;
 }
 
+static inline unsigned long zone_boundary_align_pfn(unsigned long pfn)
+{
+	return pfn & ~((1 << MAX_ORDER) - 1);
+}
+
 /* These two functions are used to setup the per zone pages min values */
 struct ctl_table;
 struct file;
diff -puN mm/page_alloc.c~zone-init-check-and-report-unaligned-zone-boundaries mm/page_alloc.c
--- devel/mm/page_alloc.c~zone-init-check-and-report-unaligned-zone-boundaries	2006-05-19 16:00:24.000000000 -0700
+++ devel-akpm/mm/page_alloc.c	2006-05-19 16:00:24.000000000 -0700
@@ -2079,6 +2079,10 @@ static void __init free_area_init_core(s
 		struct zone *zone = pgdat->node_zones + j;
 		unsigned long size, realsize;
 
+		if (zone_boundary_align_pfn(zone_start_pfn) != zone_start_pfn)
+			printk(KERN_CRIT "node %d zone %s misaligned "
+					"start pfn\n", nid, zone_names[j]);
+
 		realsize = size = zones_size[j];
 		if (zholes_size)
 			realsize -= zholes_size[j];
_
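For reference, the helper simply clears the low MAX_ORDER bits of the pfn, so the new init-time check fires whenever a zone starts off a 1 << MAX_ORDER boundary.  A standalone sketch of its effect (userspace illustration only, again assuming the common MAX_ORDER of 11; the pfn values are made up):

#include <stdio.h>

#define MAX_ORDER 11	/* assumed default; the kernel takes it from mmzone.h */

/* Same arithmetic as the helper added above: round pfn down to a
 * 1 << MAX_ORDER boundary. */
static unsigned long zone_boundary_align_pfn(unsigned long pfn)
{
	return pfn & ~((1UL << MAX_ORDER) - 1);
}

int main(void)
{
	unsigned long aligned = 0x100000;	/* multiple of 1 << MAX_ORDER */
	unsigned long misaligned = 0x100400;	/* not a multiple */

	/* The check compares the rounded pfn with the original; any
	 * difference means the zone boundary is misaligned. */
	printf("0x%lx: %s\n", aligned,
	       zone_boundary_align_pfn(aligned) == aligned ?
	       "ok" : "misaligned");
	printf("0x%lx: %s\n", misaligned,
	       zone_boundary_align_pfn(misaligned) == misaligned ?
	       "ok" : "misaligned");
	return 0;
}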