From mel@skynet.ie Wed Mar 7 06:56:45 2007
Date: Wed, 7 Mar 2007 14:56:34 +0000
From: Mel Gorman
To: Christoph Lameter
Cc: akpm@linux-foundation.org
Subject: Re: 2.6.21-rc2-mm2 failure

On Tue, 6 Mar 2007, Christoph Lameter wrote:

> Looks like this is due to the zone movable patches? 2.6.21-rc2-mm2 is
> available but there was no announcement yet. So is the version officially
> out or not?
>
> Sadly no stack trace. IA64 NUMA single node (occurs in simulator)
>

Hi again.

I am having trouble reproducing this. I've successfully booted on the IA64
HP Ski simulator with and without virtual mem_map. Well, sort of
successfully: I see these warnings periodically, but they are not due to me:

BUG: sleeping function called from invalid context at mm/slab.c:3051
in_atomic():0, irqs_disabled():1

Call Trace:
 [] show_stack+0x80/0xa0
        sp=e00000000398faf0 bsp=e0000000039811c8
 [] dump_stack+0x30/0x60
        sp=e00000000398fcc0 bsp=e0000000039811b0
 [] __might_sleep+0x220/0x320
        sp=e00000000398fcc0 bsp=e000000003981180
 [] __kmalloc+0xb0/0x1c0
        sp=e00000000398fcd0 bsp=e000000003981150
 [] proc_create+0x150/0x200
        sp=e00000000398fcd0 bsp=e000000003981110
 [] proc_mkdir_mode+0x40/0xe0
        sp=e00000000398fce0 bsp=e0000000039810e0
 [] proc_mkdir+0x30/0x60
        sp=e00000000398fcf0 bsp=e0000000039810b8
 [] register_handler_proc+0x1a0/0x1e0
        sp=e00000000398fcf0 bsp=e000000003981070
 [] setup_irq+0x510/0x5e0
        sp=e00000000398fd70 bsp=e000000003981018
 [] request_irq+0x120/0x1a0
        sp=e00000000398fd70 bsp=e000000003980fc8
 [] rs_open+0xa80/0xb60
        sp=e00000000398fd70 bsp=e000000003980f80
 [] tty_open+0x5b0/0x840
        sp=e00000000398fd80 bsp=e000000003980f30
 [] chrdev_open+0x2d0/0x4e0
        sp=e00000000398fd90 bsp=e000000003980ed8
 [] __dentry_open+0x230/0x560
        sp=e00000000398fda0 bsp=e000000003980e58
 [] nameidata_to_filp+0x70/0xc0
        sp=e00000000398fda0 bsp=e000000003980e30
 [] do_filp_open+0xa0/0xe0
        sp=e00000000398fda0 bsp=e000000003980df0
 [] do_sys_open+0xb0/0x220
        sp=e00000000398fe30 bsp=e000000003980da0
 [] sys_open+0x50/0x80
        sp=e00000000398fe30 bsp=e000000003980d48
 [] ia64_ret_from_syscall+0x0/0x20
        sp=e00000000398fe30 bsp=e000000003980d48

sid:/usr/src# dmesg | grep BUG | wc -l
5

I've also successfully booted on a "real" ia64. I noticed that your log had
set slub_debug, so I applied those patches, selected SLUB and successfully
booted that as well.

Despite the lack of bug reproduction, I think I know what's going wrong in
your case. move_freepages_block() takes a page, looks up its PFN and checks
that the pages at the beginning and end of that MAX_ORDER_NR_PAGES block of
pages are in the same zone before calling move_freepages(). The page itself
must be valid because it came from the free lists, and normally every PFN
within a MAX_ORDER_NR_PAGES block is valid too, but not when
CONFIG_HOLES_IN_ZONE is set.

I note that your PFN ranges are not aligned on MAX_ORDER_NR_PAGES. I am
guessing that, because the mem_map is not MAX_ORDER_NR_PAGES aligned,
page_zone() is being passed a page outside of mem_map. The "struct page"
passed to page_zone() is then garbage.

I've included a patch below that I'd like you to test please. It's
functionally identical to current behaviour but avoids the use of
page_zone() and instead relies on PFNs. This is preferable to sticking in
more pfn_valid() checks. Please let me know if it cleans up your issue.

I also have a few other questions.

1. What simulator are you using? I see your PFN ranges look like this:

> Zone PFN ranges:
>   Normal   12585984 -> 12598272
> early_node_map[1] active PFN ranges
>     0: 12585984 -> 12598272

   On Ski, it looks like this:

Zone PFN ranges:
  DMA             0 ->    65536
  Normal      65536 ->    66048

   So, ZONE_DMA is missing for you and that option is not available for
   Ski. Can I get access to whatever simulator you are using? I can add it
   to my normal regression tests.

2. Does this problem occur on 2.6.21-rc2-mm2 without additional patches?
   I would imagine "yes", but it doesn't hurt to check.

3. Can I see your .config please?

Thanks a lot for testing. Candidate fix patch follows.

====

Usually, a mem_map is aligned on MAX_ORDER_NR_PAGES boundaries and the
struct pages are always valid. However, this is not always the case when
CONFIG_HOLES_IN_ZONE is set.

move_freepages_block() checks that pages within a MAX_ORDER_NR_PAGES block
are in the same zone using page_zone(). However, if an invalid page is
passed to page_zone(), it can result in breakage on machines requiring
CONFIG_HOLES_IN_ZONE.

This patch avoids the use of page_zone() and instead checks the PFNs
against the PFN ranges of the zone.

Signed-off-by: Mel Gorman

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-rc2-mm2-clean/mm/page_alloc.c linux-2.6.21-rc2-mm2-ia64_movefreepages_fix/mm/page_alloc.c
--- linux-2.6.21-rc2-mm2-clean/mm/page_alloc.c	2007-03-07 09:36:31.000000000 +0000
+++ linux-2.6.21-rc2-mm2-ia64_movefreepages_fix/mm/page_alloc.c	2007-03-07 14:17:43.000000000 +0000
@@ -734,18 +734,19 @@ int move_freepages(struct zone *zone,
 
 int move_freepages_block(struct zone *zone, struct page *page, int migratetype)
 {
-	unsigned long start_pfn;
+	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
 
 	start_pfn = page_to_pfn(page);
 	start_pfn = start_pfn & ~(MAX_ORDER_NR_PAGES-1);
 	start_page = pfn_to_page(start_pfn);
 	end_page = start_page + MAX_ORDER_NR_PAGES;
+	end_pfn = start_pfn + MAX_ORDER_NR_PAGES;
 
 	/* Do not cross zone boundaries */
-	if (page_zone(page) != page_zone(start_page))
+	if (start_pfn < zone->zone_start_pfn)
 		start_page = page;
-	if (page_zone(page) != page_zone(end_page))
+	if (end_pfn >= zone->zone_start_pfn + zone->spanned_pages)
 		return 0;
 
 	return move_freepages(zone, start_page, end_page, migratetype);
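
For illustration only, the two PFN comparisons in the patch can be modelled
in a small userspace C sketch. This is not kernel code: struct zone_span is a
hypothetical stand-in for the two struct zone fields the patch reads, the
helper names are invented here, and the MAX_ORDER_NR_PAGES value of 65536 is
an assumption (the real value depends on the kernel configuration).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed block size for illustration; the real value is config-dependent. */
#define MAX_ORDER_NR_PAGES 65536UL

/* Hypothetical stand-in for the two struct zone fields used by the patch. */
struct zone_span {
	unsigned long zone_start_pfn;	/* first PFN of the zone */
	unsigned long spanned_pages;	/* PFNs spanned by the zone, holes included */
};

/* Round a PFN down to the start of its MAX_ORDER_NR_PAGES block. */
static unsigned long block_start_pfn(unsigned long pfn)
{
	return pfn & ~(MAX_ORDER_NR_PAGES - 1);
}

/* Mirrors the first check: the block begins before the zone does. */
static bool block_starts_before_zone(const struct zone_span *z,
				     unsigned long pfn)
{
	return block_start_pfn(pfn) < z->zone_start_pfn;
}

/* Mirrors the second check: the block end is at or past the zone end. */
static bool block_reaches_zone_end(const struct zone_span *z,
				   unsigned long pfn)
{
	unsigned long end_pfn = block_start_pfn(pfn) + MAX_ORDER_NR_PAGES;

	return end_pfn >= z->zone_start_pfn + z->spanned_pages;
}
```

Under that assumed block size, the quoted simulator range (12585984 ->
12598272) starts 3072 pages into its block: block_start_pfn(12585984) rounds
down to 12582912, which lies below the zone start, so the first check fires.
The block end (12648448) is also past the zone end, so the second check fires
as well. Neither comparison touches a struct page, which is the point of the
patch.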