From: Andy Whitcroft

The buddy allocator requires that boundaries between contiguous zones be aligned to MAX_ORDER ranges. Where they are not, we will incorrectly merge pages across zone boundaries. This can lead to pages from the wrong zone being handed out.

Originally the buddy allocator checked that buddies were in the same zone by referencing the zone start and end page frame numbers. This check was removed because it became very expensive, and the buddy allocator already assumed that zone boundaries were aligned.

It is clear that not all configurations and architectures honour this alignment requirement. Therefore it seems safest to reintroduce support for non-aligned zone boundaries. This can then default to on, protecting those architectures which have not been audited.

This patch introduces a new check when deciding whether a page is a buddy: it compares the zone_table index of the two pages and refuses to merge them when the indices do not match. The zone_table index is unique for each node/zone combination when FLATMEM/DISCONTIGMEM is enabled, and for each section/zone combination when SPARSEMEM is enabled (a SPARSEMEM section is at least MAX_ORDER in size).

It also adds the UNALIGNED_ZONE_BOUNDARIES configuration option. This enables the additional checks, allowing non-aligned zone boundaries to be handled safely. It defaults to on, allowing architectures which have audited their zone creation code to disable these checks for efficiency.
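The coalescing problem, and the effect of the new zone check, can be illustrated with a small userspace model. This is a sketch, not kernel code: TOY_MAX_ORDER, the zone_id[] and free_order[] arrays and the toy_* functions are hypothetical stand-ins for MAX_ORDER, page_zone_id(), PageBuddy()/page_order() and __free_one_page(). The buddy and combined-index arithmetic follows __page_find_buddy() and __find_combined_index(). The layout assumes zone 0 covers pfns 0-5 and zone 1 covers pfns 6-7, so the zone boundary falls inside an 8-page block.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Toy model, not kernel code.  TOY_MAX_ORDER plays the role of MAX_ORDER:
 * orders 0..TOY_MAX_ORDER-1 exist, so the largest block is 8 pages.
 */
#define TOY_MAX_ORDER 4
#define NPAGES (1UL << (TOY_MAX_ORDER - 1))

/* Hypothetical stand-in for page_zone_id(): zone index of each pfn. */
static int zone_id[NPAGES];

/*
 * Hypothetical stand-in for PageBuddy()/page_order(): -1 if the pfn is not
 * the head of a free block, otherwise the order of that free block.
 */
static int free_order[NPAGES];

/* Same shape as the patched page_is_buddy(page, buddy, order). */
static int toy_is_buddy(unsigned long idx, unsigned long buddy, int order,
			int check_zone)
{
	/* (d): with the zone check enabled, never merge across zones */
	if (check_zone && zone_id[idx] != zone_id[buddy])
		return 0;
	/* (b) and (c): the buddy is a free block of the same order */
	return free_order[buddy] == order;
}

/* Free a block and coalesce upwards, in the style of __free_one_page(). */
static int toy_free(unsigned long idx, int order, int check_zone)
{
	while (order < TOY_MAX_ORDER - 1) {
		/* buddy index, as in __page_find_buddy() */
		unsigned long buddy = idx ^ (1UL << order);

		assert(buddy < NPAGES);
		if (!toy_is_buddy(idx, buddy, order, check_zone))
			break;
		free_order[buddy] = -1;		/* take the buddy off its list */
		idx &= ~(1UL << order);		/* as in __find_combined_index() */
		order++;
	}
	free_order[idx] = order;
	return order;
}

/*
 * Zone 0 spans pfns 0-5 and zone 1 spans pfns 6-7, so the boundary at
 * pfn 6 is not aligned to the 8-page block.  Frees every page in turn and
 * returns the order of the block produced by the last free.
 */
static int toy_run(int check_zone)
{
	unsigned long pfn;
	int last = 0;

	for (pfn = 0; pfn < NPAGES; pfn++) {
		zone_id[pfn] = pfn < 6 ? 0 : 1;
		free_order[pfn] = -1;
	}
	for (pfn = 0; pfn < NPAGES; pfn++)
		last = toy_free(pfn, 0, check_zone);
	return last;
}

/* Print the remaining free blocks as pfn:order pairs. */
static void toy_dump(void)
{
	unsigned long pfn;

	for (pfn = 0; pfn < NPAGES; pfn++)
		if (free_order[pfn] >= 0)
			printf("%lu:%d ", pfn, free_order[pfn]);
	printf("\n");
}
```

Freeing every page with toy_run(0) leaves a single order-3 block at pfn 0 that straddles both zones. With toy_run(1) coalescing stops at the boundary, leaving an order-2 block at pfn 0 and order-1 blocks at pfns 4 and 6, which toy_dump() prints as the pairs 0:2 4:1 6:1.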
Signed-off-by: Andy Whitcroft
Signed-off-by: Andrew Morton
---

 include/linux/mm.h     |    7 +++++--
 include/linux/mmzone.h |    4 ++++
 mm/Kconfig             |   13 +++++++++++++
 mm/page_alloc.c        |   21 ++++++++++++++-------
 4 files changed, 36 insertions(+), 9 deletions(-)

diff -puN include/linux/mm.h~zone-allow-unaligned-zone-boundaries include/linux/mm.h
--- devel/include/linux/mm.h~zone-allow-unaligned-zone-boundaries	2006-06-09 15:21:46.000000000 -0700
+++ devel-akpm/include/linux/mm.h	2006-06-09 15:21:46.000000000 -0700
@@ -473,10 +473,13 @@ static inline unsigned long page_zonenum
 struct zone;
 extern struct zone *zone_table[];
 
+static inline int page_zone_id(struct page *page)
+{
+	return (page->flags >> ZONETABLE_PGSHIFT) & ZONETABLE_MASK;
+}
 static inline struct zone *page_zone(struct page *page)
 {
-	return zone_table[(page->flags >> ZONETABLE_PGSHIFT) &
-			ZONETABLE_MASK];
+	return zone_table[page_zone_id(page)];
 }
 
 static inline unsigned long page_to_nid(struct page *page)
diff -puN include/linux/mmzone.h~zone-allow-unaligned-zone-boundaries include/linux/mmzone.h
--- devel/include/linux/mmzone.h~zone-allow-unaligned-zone-boundaries	2006-06-09 15:21:46.000000000 -0700
+++ devel-akpm/include/linux/mmzone.h	2006-06-09 15:21:46.000000000 -0700
@@ -391,7 +391,11 @@ static inline int is_dma(struct zone *zo
 
 static inline unsigned long zone_boundary_align_pfn(unsigned long pfn)
 {
+#ifdef CONFIG_UNALIGNED_ZONE_BOUNDARIES
+	return pfn;
+#else
 	return pfn & ~((1 << MAX_ORDER) - 1);
+#endif
 }
 
 /* These two functions are used to setup the per zone pages min values */
diff -puN mm/Kconfig~zone-allow-unaligned-zone-boundaries mm/Kconfig
--- devel/mm/Kconfig~zone-allow-unaligned-zone-boundaries	2006-06-09 15:21:46.000000000 -0700
+++ devel-akpm/mm/Kconfig	2006-06-09 15:21:46.000000000 -0700
@@ -145,3 +145,16 @@ config MIGRATION
 	  while the virtual addresses are not changed. This is useful for
 	  example on NUMA systems to put pages nearer to the processors accessing
 	  the page.
+
+#
+# Support for buddy zone boundaries within a MAX_ORDER sized area.
+#
+config UNALIGNED_ZONE_BOUNDARIES
+	bool "Unaligned zone boundaries"
+	default n if ARCH_ALIGNED_ZONE_BOUNDARIES
+	default y
+	help
+	  Adds checks to the buddy allocator to ensure we do not
+	  coalesce buddies across zone boundaries. The default
+	  should be correct for your architecture. Enable this if
+	  you are having trouble and you are requested to in dmesg.
diff -puN mm/page_alloc.c~zone-allow-unaligned-zone-boundaries mm/page_alloc.c
--- devel/mm/page_alloc.c~zone-allow-unaligned-zone-boundaries	2006-06-09 15:21:46.000000000 -0700
+++ devel-akpm/mm/page_alloc.c	2006-06-09 15:21:46.000000000 -0700
@@ -285,22 +285,28 @@ __find_combined_index(unsigned long page
  * we can do coalesce a page and its buddy if
  * (a) the buddy is not in a hole &&
  * (b) the buddy is in the buddy system &&
- * (c) a page and its buddy have the same order.
+ * (c) a page and its buddy have the same order &&
+ * (d) a page and its buddy are in the same zone.
  *
  * For recording whether a page is in the buddy system, we use PG_buddy.
  * Setting, clearing, and testing PG_buddy is serialized by zone->lock.
  *
  * For recording page's order, we use page_private(page).
  */
-static inline int page_is_buddy(struct page *page, int order)
+static inline int page_is_buddy(struct page *page, struct page *buddy,
+								int order)
 {
 #ifdef CONFIG_HOLES_IN_ZONE
-	if (!pfn_valid(page_to_pfn(page)))
+	if (!pfn_valid(page_to_pfn(buddy)))
+		return 0;
+#endif
+#ifdef CONFIG_UNALIGNED_ZONE_BOUNDARIES
+	if (page_zone_id(page) != page_zone_id(buddy))
 		return 0;
 #endif
 
-	if (PageBuddy(page) && page_order(page) == order) {
-		BUG_ON(page_count(page) != 0);
+	if (PageBuddy(buddy) && page_order(buddy) == order) {
+		BUG_ON(page_count(buddy) != 0);
 		return 1;
 	}
 	return 0;
@@ -351,7 +357,7 @@ static inline void __free_one_page(struc
 		struct page *buddy;
 
 		buddy = __page_find_buddy(page, page_idx, order);
-		if (!page_is_buddy(buddy, order))
+		if (!page_is_buddy(page, buddy, order))
 			break;		/* Move the buddy up one level. */
 
 		list_del(&buddy->lru);
@@ -2090,7 +2096,8 @@ static void __init free_area_init_core(s
 		if (zone_boundary_align_pfn(zone_start_pfn) != zone_start_pfn &&
 							j != 0 && size != 0)
 			printk(KERN_CRIT "node %d zone %s misaligned "
-					"start pfn\n", nid, zone_names[j]);
+				"start pfn, enable UNALIGNED_ZONE_BOUNDARIES\n",
+							nid, zone_names[j]);
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
_
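The boot-time warning added to free_area_init_core() can likewise be modelled in userspace. This is a sketch under stated assumptions: TOY_MAX_ORDER is fixed at 11 (the usual default), struct toy_zone, the toy_* helpers and the zone layouts are hypothetical, and only the build without CONFIG_UNALIGNED_ZONE_BOUNDARIES is modelled. With the option enabled, zone_boundary_align_pfn() returns the pfn unchanged, so the warning can never fire.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: MAX_ORDER of 11, i.e. 2048-page (8 MiB with 4 KiB pages) blocks. */
#define TOY_MAX_ORDER 11

/* zone_boundary_align_pfn() as built without UNALIGNED_ZONE_BOUNDARIES. */
static unsigned long toy_align_pfn(unsigned long pfn)
{
	return pfn & ~((1UL << TOY_MAX_ORDER) - 1);
}

/* The condition guarding the KERN_CRIT printk in free_area_init_core(). */
static int zone_start_misaligned(unsigned long zone_start_pfn, int j,
				 unsigned long size)
{
	return toy_align_pfn(zone_start_pfn) != zone_start_pfn &&
	       j != 0 && size != 0;
}

/* Hypothetical per-node zone layout: name, start pfn, size in pages. */
struct toy_zone {
	const char *name;
	unsigned long start_pfn;
	unsigned long size;
};

/* Print the warning text the patch emits; return how many zones tripped it. */
static int report_misaligned(int nid, const struct toy_zone *zones, int nr)
{
	int j, bad = 0;

	for (j = 0; j < nr; j++) {
		assert(zones[j].name != NULL);
		if (zone_start_misaligned(zones[j].start_pfn, j, zones[j].size)) {
			printf("node %d zone %s misaligned start pfn, "
			       "enable UNALIGNED_ZONE_BOUNDARIES\n",
			       nid, zones[j].name);
			bad++;
		}
	}
	return bad;
}
```

For example, a node whose second zone starts at pfn 4096 (16 MiB) passes, while one whose second zone starts at pfn 4100 trips the warning once. As in the kernel condition, the first zone (j == 0) and empty zones are never reported.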