From clameter@sgi.com Wed Oct 3 20:57:14 2007
Message-Id: <20071004035656.203248544@sgi.com>
User-Agent: quilt/0.46-1
Date: Wed, 03 Oct 2007 20:56:56 -0700
From: Christoph Lameter
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [00/18] Virtual Compound Page Support V2

Allocations of larger pages are not reliable in Linux. If larger pages
have to be allocated, one faces the choice of either allowing graceful
fallback or using vmalloc, with a performance penalty due to the use of
page tables.

Virtual compound pages are a simple way out of this dilemma. If an
allocation specifies GFP_VFALLBACK then the page allocator will first
attempt to satisfy the request with physically contiguous memory. If
that is not possible then the page allocator will create a virtually
contiguous memory area for the caller. That way large allocations may
be considered "reliable" independent of the memory fragmentation
situation, while memory with optimal performance is still used whenever
it is available.

We are gradually introducing methods to reduce memory fragmentation.
The better these methods become, the smaller the chance that the
fallback will occur. Fallback is rare, in particular on machines with
contemporary memory sizes of 1GB or more. It seems to take special
loads that pin a lot of memory, or systems with little memory, to
fragment system memory badly enough that the fallback scheme must kick
in. There is therefore a compile time option to switch on fallback for
testing purposes: virtually mapped memory may behave differently, and
the CONFIG_VFALLBACK_ALWAYS option ensures that the code is exercised
against virtual mappings.

The patchset then addresses a series of issues in the current code
through the use of fallbacks:

- Fallback for x86_64 stack allocations. The default stack size is 8k,
  which requires an order 1 allocation.

- Removes the manual fallback to vmalloc for sparsemem through the use
  of GFP_VFALLBACK.

- Uses a compound page for the wait table in the zone, thereby avoiding
  a page table walk to reach the data structures used for waiting on
  events in pages.

- Allows fallback for the order 2 allocation in the crypto subsystem.

- Allows fallback for the caller table used by SLUB when determining
  the call sites of slab caches for sysfs output.

- Allows a configurable stack size on x86_64 (up to 32k).

More uses are possible by simply adding GFP_VFALLBACK to the gfp flags
of an allocation or by converting vmalloc calls to regular page
allocator calls. It is likely that we have had to avoid the use of
larger memory areas because of these reliability issues. The patchset
may simplify future handling of large memory areas because the issues
are taken care of by the page allocator. For HPC uses we constantly
have to deal with demands for larger and larger memory areas to speed
up various loads.

Additional patches exist to enable SLUB and the Large Blocksize
Patchset to use these fallbacks.

The patchset is also available via git from the largeblock git tree via

git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/largeblocksize.git vcompound

V1->V2
- Remove some cleanup patches and the SLUB patches from this set.
- Transparent vcompound support through page_address() and
  virt_to_head_page().
- Additional use cases.
- Factor the code better for an easier read.
- Add configurable stack size.
- Follow up on various suggestions made for V1 RFC->V1 - Complete support for all compound functions for virtual compound pages (including the compound_nth_page() necessary for LBS mmap support) - Fix various bugs - Fix i386 build -- From clameter@sgi.com Wed Oct 3 20:57:14 2007 Message-Id: <20071004035714.698110319@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:56:57 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [01/18] vmalloc: clean up page array indexing Content-Disposition: inline; filename=vcompound_array_indexes The page array is repeatedly indexed both in vunmap and vmalloc_area_node(). Add a temporary variable to make it easier to read (and easier to patch later). Signed-off-by: Christoph Lameter --- mm/vmalloc.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) Index: linux-2.6/mm/vmalloc.c =================================================================== --- linux-2.6.orig/mm/vmalloc.c 2007-10-02 09:26:16.000000000 -0700 +++ linux-2.6/mm/vmalloc.c 2007-10-02 21:35:34.000000000 -0700 @@ -345,8 +345,10 @@ static void __vunmap(void *addr, int dea int i; for (i = 0; i < area->nr_pages; i++) { - BUG_ON(!area->pages[i]); - __free_page(area->pages[i]); + struct page *page = area->pages[i]; + + BUG_ON(!page); + __free_page(page); } if (area->flags & VM_VPAGES) @@ -450,15 +452,19 @@ void *__vmalloc_area_node(struct vm_stru } for (i = 0; i < area->nr_pages; i++) { + struct page *page; + if (node < 0) - area->pages[i] = alloc_page(gfp_mask); + page = alloc_page(gfp_mask); else - area->pages[i] = alloc_pages_node(node, gfp_mask, 0); - if (unlikely(!area->pages[i])) { + page = alloc_pages_node(node, gfp_mask, 0); + + if (unlikely(!page)) { /* Successfully allocated i pages, free them in __vunmap() */ area->nr_pages = i; goto fail; } + area->pages[i] = page; } if (map_vm_area(area, prot, &pages)) -- From clameter@sgi.com Wed Oct 3 20:57:15 2007 Message-Id: <20071004035714.853651769@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:56:58 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [02/18] vunmap: return page array passed on vmap() Content-Disposition: inline; filename=vcompound_vunmap_returns_pages Make vunmap return the page array that was used at vmap. This is useful if one has no structures to track the page array but simply stores the virtual address somewhere. The disposition of the page array can be decided upon after vunmap. vfree() may now also be used instead of vunmap which will release the page array after vunmap'ping it. As noted by Kamezawa: The same subsystem that provides the page array to vmap must must use its own method to dispose of the page array. If vfree() is called to free the page array then the page array must either be 1. Allocated via the slab allocator 2. Allocated via vmalloc but then VM_VPAGES must have been passed at vunmap to specify that a vfree is needed. 
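For illustration only (this caller is not part of the patch): a subsystem
that stores just the virtual address can recover the kmalloc'ed page array
at unmap time and dispose of it itself, roughly like this:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Sketch only: map a page array that was allocated with kmalloc() and
 * later tear the mapping down without having kept the array pointer.
 */
static void *example_map(struct page **pages, unsigned int count)
{
        return vmap(pages, count, VM_MAP, PAGE_KERNEL);
}

static void example_unmap(void *addr, unsigned int count)
{
        struct page **pages = vunmap(addr);     /* now returns the array */
        unsigned int i;

        for (i = 0; i < count; i++)
                __free_page(pages[i]);
        kfree(pages);                           /* array came from kmalloc() */
}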
Signed-off-by: Christoph Lameter --- include/linux/vmalloc.h | 2 +- mm/vmalloc.c | 32 ++++++++++++++++++++++---------- 2 files changed, 23 insertions(+), 11 deletions(-) Index: linux-2.6/include/linux/vmalloc.h =================================================================== --- linux-2.6.orig/include/linux/vmalloc.h 2007-10-03 16:19:29.000000000 -0700 +++ linux-2.6/include/linux/vmalloc.h 2007-10-03 16:19:41.000000000 -0700 @@ -49,7 +49,7 @@ extern void vfree(void *addr); extern void *vmap(struct page **pages, unsigned int count, unsigned long flags, pgprot_t prot); -extern void vunmap(void *addr); +extern struct page **vunmap(void *addr); extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr, unsigned long pgoff); Index: linux-2.6/mm/vmalloc.c =================================================================== --- linux-2.6.orig/mm/vmalloc.c 2007-10-03 16:19:35.000000000 -0700 +++ linux-2.6/mm/vmalloc.c 2007-10-03 16:20:15.000000000 -0700 @@ -152,6 +152,7 @@ int map_vm_area(struct vm_struct *area, unsigned long addr = (unsigned long) area->addr; unsigned long end = addr + area->size - PAGE_SIZE; int err; + area->pages = *pages; BUG_ON(addr >= end); pgd = pgd_offset_k(addr); @@ -162,6 +163,8 @@ int map_vm_area(struct vm_struct *area, break; } while (pgd++, addr = next, addr != end); flush_cache_vmap((unsigned long) area->addr, end); + + area->nr_pages = *pages - area->pages; return err; } EXPORT_SYMBOL_GPL(map_vm_area); @@ -318,17 +321,18 @@ struct vm_struct *remove_vm_area(void *a return v; } -static void __vunmap(void *addr, int deallocate_pages) +static struct page **__vunmap(void *addr, int deallocate_pages) { struct vm_struct *area; + struct page **pages; if (!addr) - return; + return NULL; if ((PAGE_SIZE-1) & (unsigned long)addr) { printk(KERN_ERR "Trying to vfree() bad address (%p)\n", addr); WARN_ON(1); - return; + return NULL; } area = remove_vm_area(addr); @@ -336,29 +340,30 @@ static void __vunmap(void *addr, int dea printk(KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n", addr); WARN_ON(1); - return; + return NULL; } + pages = area->pages; debug_check_no_locks_freed(addr, area->size); if (deallocate_pages) { int i; for (i = 0; i < area->nr_pages; i++) { - struct page *page = area->pages[i]; + struct page *page = pages[i]; BUG_ON(!page); __free_page(page); } if (area->flags & VM_VPAGES) - vfree(area->pages); + vfree(pages); else - kfree(area->pages); + kfree(pages); } kfree(area); - return; + return pages; } /** @@ -387,10 +392,10 @@ EXPORT_SYMBOL(vfree); * * Must not be called in interrupt context. */ -void vunmap(void *addr) +struct page **vunmap(void *addr) { BUG_ON(in_interrupt()); - __vunmap(addr, 0); + return __vunmap(addr, 0); } EXPORT_SYMBOL(vunmap); @@ -403,6 +408,13 @@ EXPORT_SYMBOL(vunmap); * * Maps @count pages from @pages into contiguous kernel virtual * space. + * + * The page array may be freed via vfree() on the virtual address + * returned. In that case the page array must be allocated via + * the slab allocator. If the page array was allocated via + * vmalloc then VM_VPAGES must be specified in the flags. There is + * no support for vfree() to free a page array allocated via the + * page allocator. 
*/ void *vmap(struct page **pages, unsigned int count, unsigned long flags, pgprot_t prot) -- From clameter@sgi.com Wed Oct 3 20:57:15 2007 Message-Id: <20071004035715.014810641@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:56:59 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [03/18] vmalloc_address(): Determine vmalloc address from page struct Content-Disposition: inline; filename=vcompound_vmalloc_address Sometimes we need to figure out which vmalloc address is in use for a certain page struct. There is no easy way to figure out the vmalloc address from the page struct. Simply search through the kernel page tables to find the address. Use sparingly. Signed-off-by: Christoph Lameter --- include/linux/mm.h | 2 + mm/vmalloc.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 81 insertions(+) Index: linux-2.6/mm/vmalloc.c =================================================================== --- linux-2.6.orig/mm/vmalloc.c 2007-10-03 16:20:15.000000000 -0700 +++ linux-2.6/mm/vmalloc.c 2007-10-03 16:20:48.000000000 -0700 @@ -840,3 +840,82 @@ void free_vm_area(struct vm_struct *area kfree(area); } EXPORT_SYMBOL_GPL(free_vm_area); + + +/* + * Determine vmalloc address from a page struct. + * + * Linear search through all ptes of the vmalloc area. + */ +static unsigned long vaddr_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, unsigned long pfn) +{ + pte_t *pte; + + pte = pte_offset_kernel(pmd, addr); + do { + pte_t ptent = *pte; + if (pte_present(ptent) && pte_pfn(ptent) == pfn) + return addr; + } while (pte++, addr += PAGE_SIZE, addr != end); + return 0; +} + +static inline unsigned long vaddr_pmd_range(pud_t *pud, unsigned long addr, + unsigned long end, unsigned long pfn) +{ + pmd_t *pmd; + unsigned long next; + unsigned long n; + + pmd = pmd_offset(pud, addr); + do { + next = pmd_addr_end(addr, end); + if (pmd_none_or_clear_bad(pmd)) + continue; + n = vaddr_pte_range(pmd, addr, next, pfn); + if (n) + return n; + } while (pmd++, addr = next, addr != end); + return 0; +} + +static inline unsigned long vaddr_pud_range(pgd_t *pgd, unsigned long addr, + unsigned long end, unsigned long pfn) +{ + pud_t *pud; + unsigned long next; + unsigned long n; + + pud = pud_offset(pgd, addr); + do { + next = pud_addr_end(addr, end); + if (pud_none_or_clear_bad(pud)) + continue; + n = vaddr_pmd_range(pud, addr, next, pfn); + if (n) + return n; + } while (pud++, addr = next, addr != end); + return 0; +} + +void *vmalloc_address(struct page *page) +{ + pgd_t *pgd; + unsigned long next, n; + unsigned long addr = VMALLOC_START; + unsigned long pfn = page_to_pfn(page); + + pgd = pgd_offset_k(VMALLOC_START); + do { + next = pgd_addr_end(addr, VMALLOC_END); + if (pgd_none_or_clear_bad(pgd)) + continue; + n = vaddr_pud_range(pgd, addr, next, pfn); + if (n) + return (void *)n; + } while (pgd++, addr = next, addr < VMALLOC_END); + return NULL; +} +EXPORT_SYMBOL(vmalloc_address); + Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2007-10-03 16:19:27.000000000 -0700 +++ linux-2.6/include/linux/mm.h 2007-10-03 16:20:48.000000000 -0700 @@ -294,6 +294,8 @@ static inline int get_page_unless_zero(s return atomic_inc_not_zero(&page->_count); } +void *vmalloc_address(struct page *); + static inline struct page *compound_head(struct page *page) { if 
(unlikely(PageTail(page))) -- From clameter@sgi.com Wed Oct 3 20:57:15 2007 Message-Id: <20071004035715.174596832@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:00 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [04/18] Vcompound: Smart up virt_to_head_page() Content-Disposition: inline; filename=vcompound_virt_to_head_page The determination of a page struct for an address in a compound page will need some more smarts in order to deal with virtual addresses. We need to use the evil constants VMALLOC_START and VMALLOC_END for this and they are notoriously for referencing various arch header files or may even be variables. Uninline the function to avoid trouble. Signed-off-by: Christoph Lameter --- include/linux/mm.h | 6 +----- mm/page_alloc.c | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+), 5 deletions(-) Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2007-10-03 19:21:50.000000000 -0700 +++ linux-2.6/include/linux/mm.h 2007-10-03 19:23:08.000000000 -0700 @@ -315,11 +315,7 @@ static inline void get_page(struct page atomic_inc(&page->_count); } -static inline struct page *virt_to_head_page(const void *x) -{ - struct page *page = virt_to_page(x); - return compound_head(page); -} +struct page *virt_to_head_page(const void *x); /* * Setup the page count before being freed into the page allocator for Index: linux-2.6/mm/page_alloc.c =================================================================== --- linux-2.6.orig/mm/page_alloc.c 2007-10-03 19:21:50.000000000 -0700 +++ linux-2.6/mm/page_alloc.c 2007-10-03 19:23:08.000000000 -0700 @@ -150,6 +150,29 @@ int nr_node_ids __read_mostly = MAX_NUMN EXPORT_SYMBOL(nr_node_ids); #endif +/* + * Determine the appropriate page struct given a virtual address + * (including vmalloced areas). + * + * Return the head page if this is a compound page. + * + * Cannot be inlined since VMALLOC_START and VMALLOC_END may contain + * complex calculations that depend on multiple arch includes or + * even variables. + */ +struct page *virt_to_head_page(const void *x) +{ + unsigned long addr = (unsigned long)x; + struct page *page; + + if (unlikely(addr >= VMALLOC_START && addr < VMALLOC_END)) + page = vmalloc_to_page((void *)addr); + else + page = virt_to_page(addr); + + return compound_head(page); +} + #ifdef CONFIG_DEBUG_VM static int page_outside_zone_boundaries(struct zone *zone, struct page *page) { -- From clameter@sgi.com Wed Oct 3 20:57:15 2007 Message-Id: <20071004035715.333644510@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:01 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [05/18] Page flags: Add PageVcompound() Content-Disposition: inline; filename=vcompound_pagevcompound Add a another page flag that can be used to figure out if a compound page is virtually mapped. The mark is necessary since we have to know when freeing pages if we have to destroy a virtual mapping. No additional flag is consumed through the use of PG_swapcache together with PG_compound (similar to PageHead() and PageTail()). 
Signed-off-by: Christoph Lameter --- include/linux/page-flags.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) Index: linux-2.6/include/linux/page-flags.h =================================================================== --- linux-2.6.orig/include/linux/page-flags.h 2007-10-03 19:31:51.000000000 -0700 +++ linux-2.6/include/linux/page-flags.h 2007-10-03 19:34:37.000000000 -0700 @@ -248,6 +248,24 @@ static inline void __ClearPageTail(struc #define __SetPageHead(page) __SetPageCompound(page) #define __ClearPageHead(page) __ClearPageCompound(page) +/* + * PG_swapcache is used in combination with PG_compound to indicate + * that a compound page was allocated via vmalloc. + */ +#define PG_vcompound_mask ((1L << PG_compound) | (1L << PG_swapcache)) +#define PageVcompound(page) ((page->flags & PG_vcompound_mask) \ + == PG_vcompound_mask) + +static inline void __SetPageVcompound(struct page *page) +{ + page->flags |= PG_vcompound_mask; +} + +static inline void __ClearPageVcompound(struct page *page) +{ + page->flags &= ~PG_vcompound_mask; +} + #ifdef CONFIG_SWAP #define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags) #define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags) -- From clameter@sgi.com Wed Oct 3 20:57:15 2007 Message-Id: <20071004035715.497063139@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:02 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, apw@shadowen.org Subject: [06/18] Vcompound: Update page address determination Content-Disposition: inline; filename=vcompound_page_address Make page_address() correctly determine the address of a potentially virtually mapped compound page. There are 3 cases to consider: 1. !HASHED_PAGE_VIRTUAL && !WANT_PAGE_VIRTUAL Call vmalloc_address() directly from the page_address function defined in mm.h. 2. HASHED_PAGE_VIRTUAL Modify page_address() in highmem.c to call vmalloc_address(). 3. WANT_PAGE_VIRTUAL set_page_address() is used to set up the virtual addresses of all pages that are part of the virtual compound. 
Cc: apw@shadowen.org Signed-off-by: Christoph Lameter --- include/linux/mm.h | 9 ++++++++- mm/highmem.c | 10 ++++++++-- 2 files changed, 16 insertions(+), 3 deletions(-) Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2007-10-03 19:39:52.000000000 -0700 +++ linux-2.6/include/linux/mm.h 2007-10-03 19:40:29.000000000 -0700 @@ -605,7 +605,14 @@ void page_address_init(void); #endif #if !defined(HASHED_PAGE_VIRTUAL) && !defined(WANT_PAGE_VIRTUAL) -#define page_address(page) lowmem_page_address(page) + +static inline void *page_address(struct page *page) +{ + if (unlikely(PageVcompound(page))) + return vmalloc_address(page); + return lowmem_page_address(page); +} + #define set_page_address(page, address) do { } while(0) #define page_address_init() do { } while(0) #endif Index: linux-2.6/mm/highmem.c =================================================================== --- linux-2.6.orig/mm/highmem.c 2007-10-03 19:39:25.000000000 -0700 +++ linux-2.6/mm/highmem.c 2007-10-03 19:40:29.000000000 -0700 @@ -265,8 +265,11 @@ void *page_address(struct page *page) void *ret; struct page_address_slot *pas; - if (!PageHighMem(page)) + if (!PageHighMem(page)) { + if (PageVcompound(page)) + return vmalloc_address(page); return lowmem_page_address(page); + } pas = page_slot(page); ret = NULL; @@ -294,7 +297,10 @@ void set_page_address(struct page *page, struct page_address_slot *pas; struct page_address_map *pam; - BUG_ON(!PageHighMem(page)); + if (!PageHighMem(page)) { + BUG_ON(!PageVcompound(page)); + return; + } pas = page_slot(page); if (virtual) { /* Add */ -- From clameter@sgi.com Wed Oct 3 20:57:15 2007 Message-Id: <20071004035715.656319312@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:03 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [07/18] Vcompound: Add compound_nth_page() to determine nth base page Content-Disposition: inline; filename=vcompound_compound_nth_page Add a new function compound_nth_page(page, n) and vmalloc_nth_page(page, n) to find the nth page of a compound page. For real compound pages his simply reduces to page + n. For virtual compound pages we need to consult the page tables to figure out the nth page from the one specified. Update all the references to page[1] to use compound_nth instead. 
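For illustration only (not part of the patch): code that walks the pages of
a compound page can no longer assume that they are physically adjacent and
should go through compound_nth_page(), e.g.:

#include <linux/mm.h>

/*
 * Sketch only: check every tail page of a (possibly virtually mapped)
 * compound page. compound_nth_page() is page + n for a physical
 * compound page and a page table lookup for a virtual one.
 */
static void walk_compound(struct page *head)
{
        int nr_pages = 1 << compound_order(head);
        int i;

        for (i = 1; i < nr_pages; i++) {
                struct page *p = compound_nth_page(head, i);

                BUG_ON(!PageTail(p) || p->first_page != head);
        }
}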
--- include/linux/mm.h | 17 +++++++++++++---- mm/page_alloc.c | 16 +++++++++++----- mm/vmalloc.c | 10 ++++++++++ 3 files changed, 34 insertions(+), 9 deletions(-) Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2007-10-03 19:31:45.000000000 -0700 +++ linux-2.6/include/linux/mm.h 2007-10-03 19:31:51.000000000 -0700 @@ -295,6 +295,8 @@ static inline int get_page_unless_zero(s } void *vmalloc_address(struct page *); +struct page *vmalloc_to_page(void *addr); +struct page *vmalloc_nth_page(struct page *page, int n); static inline struct page *compound_head(struct page *page) { @@ -338,27 +340,34 @@ void split_page(struct page *page, unsig */ typedef void compound_page_dtor(struct page *); +static inline struct page *compound_nth_page(struct page *page, int n) +{ + if (likely(!PageVcompound(page))) + return page + n; + return vmalloc_nth_page(page, n); +} + static inline void set_compound_page_dtor(struct page *page, compound_page_dtor *dtor) { - page[1].lru.next = (void *)dtor; + compound_nth_page(page, 1)->lru.next = (void *)dtor; } static inline compound_page_dtor *get_compound_page_dtor(struct page *page) { - return (compound_page_dtor *)page[1].lru.next; + return (compound_page_dtor *)compound_nth_page(page, 1)->lru.next; } static inline int compound_order(struct page *page) { if (!PageHead(page)) return 0; - return (unsigned long)page[1].lru.prev; + return (unsigned long)compound_nth_page(page, 1)->lru.prev; } static inline void set_compound_order(struct page *page, unsigned long order) { - page[1].lru.prev = (void *)order; + compound_nth_page(page, 1)->lru.prev = (void *)order; } /* Index: linux-2.6/mm/vmalloc.c =================================================================== --- linux-2.6.orig/mm/vmalloc.c 2007-10-03 19:31:45.000000000 -0700 +++ linux-2.6/mm/vmalloc.c 2007-10-03 19:31:51.000000000 -0700 @@ -541,6 +541,16 @@ void *vmalloc(unsigned long size) } EXPORT_SYMBOL(vmalloc); +/* + * Given a pointer to the first page struct: + * Determine a pointer to the nth page. + */ +struct page *vmalloc_nth_page(struct page *page, int n) +{ + return vmalloc_to_page(page_address(page) + n * PAGE_SIZE); +} +EXPORT_SYMBOL(vmalloc_nth_page); + /** * vmalloc_user - allocate zeroed virtually contiguous memory for userspace * @size: allocation size Index: linux-2.6/mm/page_alloc.c =================================================================== --- linux-2.6.orig/mm/page_alloc.c 2007-10-03 19:31:51.000000000 -0700 +++ linux-2.6/mm/page_alloc.c 2007-10-03 19:32:45.000000000 -0700 @@ -274,7 +274,7 @@ static void prep_compound_page(struct pa set_compound_order(page, order); __SetPageHead(page); for (i = 1; i < nr_pages; i++) { - struct page *p = page + i; + struct page *p = compound_nth_page(page, i); __SetPageTail(p); p->first_page = page; @@ -289,17 +289,23 @@ static void destroy_compound_page(struct if (unlikely(compound_order(page) != order)) bad_page(page); - if (unlikely(!PageHead(page))) - bad_page(page); - __ClearPageHead(page); for (i = 1; i < nr_pages; i++) { - struct page *p = page + i; + struct page *p = compound_nth_page(page, i); if (unlikely(!PageTail(p) | (p->first_page != page))) bad_page(page); __ClearPageTail(p); } + + /* + * The PageHead is important since it determines how operations on + * a compound page have to be performed. We can only tear the head + * down after all the tail pages are done. 
+ */ + if (unlikely(!PageHead(page))) + bad_page(page); + __ClearPageHead(page); } static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags) -- From clameter@sgi.com Wed Oct 3 20:57:15 2007 Message-Id: <20071004035715.816015652@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:04 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [08/18] GFP_VFALLBACK: Allow fallback of compound pages to virtual mappings Content-Disposition: inline; filename=vcompound_core Add a new gfp flag __GFP_VFALLBACK If specified during a higher order allocation then the system will fall back to vmap if no physically contiguous pages can be found. This will create a virtually contiguous area instead of a physically contiguous area. In many cases the virtually contiguous area can stand in for the physically contiguous area (with some loss of performance). Signed-off-by: Christoph Lameter --- include/linux/gfp.h | 5 + mm/page_alloc.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 139 insertions(+), 5 deletions(-) Index: linux-2.6/mm/page_alloc.c =================================================================== --- linux-2.6.orig/mm/page_alloc.c 2007-10-03 19:44:07.000000000 -0700 +++ linux-2.6/mm/page_alloc.c 2007-10-03 19:44:08.000000000 -0700 @@ -60,6 +60,9 @@ long nr_swap_pages; int percpu_pagelist_fraction; static void __free_pages_ok(struct page *page, unsigned int order); +static struct page *alloc_vcompound(gfp_t, int, + struct zonelist *, unsigned long); +static void destroy_compound_page(struct page *page, unsigned long order); /* * results with 256, 32 in the lowmem_reserve sysctl: @@ -260,9 +263,51 @@ static void bad_page(struct page *page) * This usage means that zero-order pages may not be compound. */ +static void __free_vcompound(void *addr) +{ + struct page **pages; + int i; + struct page *page = vmalloc_to_page(addr); + int order = compound_order(page); + int nr_pages = 1 << order; + + if (!PageVcompound(page) || !PageHead(page)) { + bad_page(page); + return; + } + destroy_compound_page(page, order); + pages = vunmap(addr); + /* + * First page will have zero refcount since it maintains state + * for the compound and was decremented before we got here. + */ + set_page_address(page, NULL); + __ClearPageVcompound(page); + free_hot_page(page); + + for (i = 1; i < nr_pages; i++) { + page = pages[i]; + set_page_address(page, NULL); + __ClearPageVcompound(page); + __free_page(page); + } + kfree(pages); +} + + +static void free_vcompound(void *addr) +{ + __free_vcompound(addr); +} + static void free_compound_page(struct page *page) { - __free_pages_ok(page, compound_order(page)); + if (PageVcompound(page)) + free_vcompound(page_address(page)); + else { + destroy_compound_page(page, compound_order(page)); + __free_pages_ok(page, compound_order(page)); + } } static void prep_compound_page(struct page *page, unsigned long order) @@ -1259,6 +1304,67 @@ try_next_zone: } /* + * Virtual Compound Page support. + * + * Virtual Compound Pages are used to fall back to order 0 allocations if large + * linear mappings are not available and __GFP_VFALLBACK is set. They are + * formatted according to compound page conventions. I.e. following + * page->first_page if PageTail(page) is set can be used to determine the + * head page. 
+ */ +static noinline struct page *alloc_vcompound(gfp_t gfp_mask, int order, + struct zonelist *zonelist, unsigned long alloc_flags) +{ + struct page *page; + int i; + struct vm_struct *vm; + int nr_pages = 1 << order; + struct page **pages = kmalloc(nr_pages * sizeof(struct page *), + gfp_mask & GFP_LEVEL_MASK); + struct page **pages2; + + if (!pages) + return NULL; + + gfp_mask &= ~(__GFP_COMP | __GFP_VFALLBACK); + for (i = 0; i < nr_pages; i++) { + page = get_page_from_freelist(gfp_mask, 0, zonelist, + alloc_flags); + if (!page) + goto abort; + + /* Sets PageCompound which makes PageHead(page) true */ + __SetPageVcompound(page); + pages[i] = page; + } + + vm = get_vm_area_node(nr_pages << PAGE_SHIFT, VM_MAP, + zone_to_nid(zonelist->zones[0]), gfp_mask); + pages2 = pages; + if (map_vm_area(vm, PAGE_KERNEL, &pages2)) + goto abort; + + prep_compound_page(pages[0], order); + + for (i = 0; i < nr_pages; i++) + set_page_address(pages[0], vm->addr + (i << PAGE_SHIFT)); + + return pages[0]; + +abort: + while (i-- > 0) { + page = pages[i]; + if (!page) + continue; + set_page_address(page, NULL); + __ClearPageVcompound(page); + __free_page(page); + } + kfree(pages); + return NULL; +} + +/* * This is the 'heart' of the zoned buddy allocator. */ struct page * fastcall @@ -1353,12 +1459,12 @@ nofail_alloc: goto nofail_alloc; } } - goto nopage; + goto try_vcompound; } /* Atomic allocations - we can't balance anything */ if (!wait) - goto nopage; + goto try_vcompound; cond_resched(); @@ -1389,6 +1495,11 @@ nofail_alloc: */ page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET); + + if (!page && order && (gfp_mask & __GFP_VFALLBACK)) + page = alloc_vcompound(gfp_mask, order, + zonelist, alloc_flags); + if (page) goto got_pg; @@ -1420,6 +1531,14 @@ nofail_alloc: goto rebalance; } +try_vcompound: + /* Last chance before failing the allocation */ + if (order && (gfp_mask & __GFP_VFALLBACK)) { + page = alloc_vcompound(gfp_mask, order, + zonelist, alloc_flags); + if (page) + goto got_pg; + } nopage: if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) { printk(KERN_WARNING "%s: page allocation failure." @@ -1480,6 +1599,9 @@ fastcall void __free_pages(struct page * if (order == 0) free_hot_page(page); else + if (unlikely(PageHead(page))) + free_compound_page(page); + else __free_pages_ok(page, order); } } @@ -1489,8 +1611,15 @@ EXPORT_SYMBOL(__free_pages); fastcall void free_pages(unsigned long addr, unsigned int order) { if (addr != 0) { - VM_BUG_ON(!virt_addr_valid((void *)addr)); - __free_pages(virt_to_page((void *)addr), order); + struct page *page; + + if (unlikely(addr >= VMALLOC_START && addr < VMALLOC_END)) + page = vmalloc_to_page((void *)addr); + else { + VM_BUG_ON(!virt_addr_valid(addr)); + page = virt_to_page(addr); + }; + __free_pages(page, order); } } Index: linux-2.6/include/linux/gfp.h =================================================================== --- linux-2.6.orig/include/linux/gfp.h 2007-10-03 19:44:07.000000000 -0700 +++ linux-2.6/include/linux/gfp.h 2007-10-03 19:44:08.000000000 -0700 @@ -43,6 +43,7 @@ struct vm_area_struct; #define __GFP_REPEAT ((__force gfp_t)0x400u) /* Retry the allocation. Might fail */ #define __GFP_NOFAIL ((__force gfp_t)0x800u) /* Retry for ever. Cannot fail */ #define __GFP_NORETRY ((__force gfp_t)0x1000u)/* Do not retry. 
Might fail */ +#define __GFP_VFALLBACK ((__force gfp_t)0x2000u)/* Permit fallback to vmalloc */ #define __GFP_COMP ((__force gfp_t)0x4000u)/* Add compound page metadata */ #define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */ #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */ @@ -86,6 +87,10 @@ struct vm_area_struct; #define GFP_THISNODE ((__force gfp_t)0) #endif +/* + * Allocate large page but allow fallback to a virtually mapped page + */ +#define GFP_VFALLBACK (GFP_KERNEL | __GFP_VFALLBACK) /* Flag - indicates that the buffer will be suitable for DMA. Ignored on some platforms, used as appropriate on others */ -- From clameter@sgi.com Wed Oct 3 20:57:16 2007 Message-Id: <20071004035715.975460550@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:05 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [09/18] Vcompuond: GFP_VFALLBACK debugging aid Content-Disposition: inline; filename=vcompound_debugging_aid Virtual fallbacks are rare and thus subtle bugs may creep in if we do not test the fallbacks. CONFIG_VFALLBACK_ALWAYS makes all GFP_VFALLBACK allocations fall back to virtual mapping. Signed-off-by: Christoph Lameter --- lib/Kconfig.debug | 11 +++++++++++ mm/page_alloc.c | 6 ++++++ 2 files changed, 17 insertions(+) Index: linux-2.6/mm/page_alloc.c =================================================================== --- linux-2.6.orig/mm/page_alloc.c 2007-10-03 18:04:33.000000000 -0700 +++ linux-2.6/mm/page_alloc.c 2007-10-03 18:07:16.000000000 -0700 @@ -1257,6 +1257,12 @@ zonelist_scan: } } +#ifdef CONFIG_VFALLBACK_ALWAYS + if ((gfp_mask & __GFP_VFALLBACK) && + system_state == SYSTEM_RUNNING) + return alloc_vcompound(gfp_mask, order, + zonelist, alloc_flags); +#endif page = buffered_rmqueue(zonelist, zone, order, gfp_mask); if (page) break; Index: linux-2.6/lib/Kconfig.debug =================================================================== --- linux-2.6.orig/lib/Kconfig.debug 2007-10-03 18:04:29.000000000 -0700 +++ linux-2.6/lib/Kconfig.debug 2007-10-03 18:07:16.000000000 -0700 @@ -105,6 +105,17 @@ config DETECT_SOFTLOCKUP can be detected via the NMI-watchdog, on platforms that support it.) +config VFALLBACK_ALWAYS + bool "Always fall back to Virtual Compound pages" + default y + help + Virtual compound pages are only allocated if there is no linear + memory available. They are a fallback and errors created by the + use of virtual mappings instead of linear ones may not surface + because of their infrequent use. This option makes every + allocation that allows a fallback to a virtual mapping use + the virtual mapping. May have a significant performance impact. + config SCHED_DEBUG bool "Collect scheduler debugging info" depends on DEBUG_KERNEL && PROC_FS -- From clameter@sgi.com Wed Oct 3 20:57:16 2007 Message-Id: <20071004035716.135938019@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:06 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, apw@shadowen.org Subject: [10/18] Sparsemem: Use fallback for the memmap. Content-Disposition: inline; filename=vcompound_sparse_gfp_vfallback Sparsemem currently attempts first to do a physically contiguous mapping and then falls back to vmalloc. The same thing can now be accomplished using GFP_VFALLBACK. 
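For illustration only (not part of the patch), the general caller pattern
used by this and the following conversions is:

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Sketch only: allocate a large, zeroed table. The allocation is
 * physically contiguous if possible and virtually contiguous otherwise.
 */
static void *alloc_large_table(unsigned long size)
{
        return (void *)__get_free_pages(GFP_VFALLBACK | __GFP_ZERO,
                                        get_order(size));
}

static void free_large_table(void *table, unsigned long size)
{
        /* free_pages() now copes with vmalloc addresses as well. */
        free_pages((unsigned long)table, get_order(size));
}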
Cc: apw@shadowen.org Signed-off-by: Christoph Lameter --- mm/sparse.c | 33 +++------------------------------ 1 file changed, 3 insertions(+), 30 deletions(-) Index: linux-2.6/mm/sparse.c =================================================================== --- linux-2.6.orig/mm/sparse.c 2007-10-02 22:02:58.000000000 -0700 +++ linux-2.6/mm/sparse.c 2007-10-02 22:19:58.000000000 -0700 @@ -269,40 +269,13 @@ void __init sparse_init(void) #ifdef CONFIG_MEMORY_HOTPLUG static struct page *__kmalloc_section_memmap(unsigned long nr_pages) { - struct page *page, *ret; - unsigned long memmap_size = sizeof(struct page) * nr_pages; - - page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size)); - if (page) - goto got_map_page; - - ret = vmalloc(memmap_size); - if (ret) - goto got_map_ptr; - - return NULL; -got_map_page: - ret = (struct page *)pfn_to_kaddr(page_to_pfn(page)); -got_map_ptr: - memset(ret, 0, memmap_size); - - return ret; -} - -static int vaddr_in_vmalloc_area(void *addr) -{ - if (addr >= (void *)VMALLOC_START && - addr < (void *)VMALLOC_END) - return 1; - return 0; + return (struct page *)__get_free_pages(GFP_VFALLBACK, + get_order(memmap_size)); } static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) { - if (vaddr_in_vmalloc_area(memmap)) - vfree(memmap); - else - free_pages((unsigned long)memmap, + free_pages((unsigned long)memmap, get_order(sizeof(struct page) * nr_pages)); } -- From clameter@sgi.com Wed Oct 3 20:57:16 2007 Message-Id: <20071004035716.295929772@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:07 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [11/18] Page allocator: Use a higher order allocation for the zone wait table. Content-Disposition: inline; filename=vcompound_wait_table_no_vmalloc Currently vmalloc is used for the zone wait table. Therefore the vmalloc page tables have to be consulted by the MMU to access the wait table. We can now use GFP_VFALLBACK to attempt the use of a physically contiguous page that can then use the large kernel TLBs. Drawback: The zone wait table is rounded up to the next power of two which may cost some memory. Signed-off-by: Christoph Lameter --- mm/page_alloc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6/mm/page_alloc.c =================================================================== --- linux-2.6.orig/mm/page_alloc.c 2007-10-03 18:07:16.000000000 -0700 +++ linux-2.6/mm/page_alloc.c 2007-10-03 18:07:20.000000000 -0700 @@ -2585,7 +2585,9 @@ int zone_wait_table_init(struct zone *zo * To use this new node's memory, further consideration will be * necessary. */ - zone->wait_table = (wait_queue_head_t *)vmalloc(alloc_size); + zone->wait_table = (wait_queue_head_t *) + __get_free_pages(GFP_VFALLBACK, + get_order(alloc_size)); } if (!zone->wait_table) return -ENOMEM; -- From clameter@sgi.com Wed Oct 3 20:57:16 2007 Message-Id: <20071004035716.454806427@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:08 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [12/18] Wait: Allow bit_waitqueue to wait on a bit in a virtual compound page Content-Disposition: inline; filename=vcompound_wait_on_virtually_mapped_object If bit waitqueue is passed a virtual address then it must use virt_to_head_page instead of virt_to_page. 
Signed-off-by: Christoph Lameter --- kernel/wait.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/kernel/wait.c =================================================================== --- linux-2.6.orig/kernel/wait.c 2007-10-03 17:44:21.000000000 -0700 +++ linux-2.6/kernel/wait.c 2007-10-03 17:53:07.000000000 -0700 @@ -245,7 +245,7 @@ EXPORT_SYMBOL(wake_up_bit); fastcall wait_queue_head_t *bit_waitqueue(void *word, int bit) { const int shift = BITS_PER_LONG == 32 ? 5 : 6; - const struct zone *zone = page_zone(virt_to_page(word)); + const struct zone *zone = page_zone(virt_to_head_page(word)); unsigned long val = (unsigned long)word << shift | bit; return &zone->wait_table[hash_long(val, zone->wait_table_bits)]; -- From clameter@sgi.com Wed Oct 3 20:57:16 2007 Message-Id: <20071004035716.619040358@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:09 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ak@suse.de, travis@sgi.com Subject: [13/18] x86_64: Allow fallback for the stack Content-Disposition: inline; filename=vcompound_x86_64_stack_fallback Peter Zijlstra has recently demonstrated that we can have order 1 allocation failures under memory pressure with small memory configurations. The x86_64 stack has a size of 8k and thus requires a order 1 allocation. This patch adds a virtual fallback capability for the stack. The system may continue even in extreme situations and we may be able to increase the stack size if necessary (see next patch). Cc: ak@suse.de Cc: travis@sgi.com Signed-off-by: Christoph Lameter --- include/asm-x86_64/thread_info.h | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) Index: linux-2.6/include/asm-x86_64/thread_info.h =================================================================== --- linux-2.6.orig/include/asm-x86_64/thread_info.h 2007-10-03 14:49:48.000000000 -0700 +++ linux-2.6/include/asm-x86_64/thread_info.h 2007-10-03 14:51:00.000000000 -0700 @@ -74,20 +74,14 @@ static inline struct thread_info *stack_ /* thread information allocation */ #ifdef CONFIG_DEBUG_STACK_USAGE -#define alloc_thread_info(tsk) \ - ({ \ - struct thread_info *ret; \ - \ - ret = ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER)); \ - if (ret) \ - memset(ret, 0, THREAD_SIZE); \ - ret; \ - }) +#define THREAD_FLAGS (GFP_VFALLBACK | __GFP_ZERO) #else -#define alloc_thread_info(tsk) \ - ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER)) +#define THREAD_FLAGS GFP_VFALLBACK #endif +#define alloc_thread_info(tsk) \ + ((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER)) + #define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER) #else /* !__ASSEMBLY__ */ -- From clameter@sgi.com Wed Oct 3 20:57:16 2007 Message-Id: <20071004035716.775518672@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:10 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ak@suse.de, travis@sgi.com Subject: [14/18] Configure stack size Content-Disposition: inline; filename=vcompound_x86_64_config_stack_size Make the stack size configurable now that we can fallback to vmalloc if necessary. SGI NUMA configurations may need more stack because cpumasks and nodemasks are at times kept on the stack. With the coming 16k cpu support this is going to be 2k just for the mask. 
This patch allows to run with 16k or 32k kernel stacks on x86_74. Cc: ak@suse.de Cc: travis@sgi.com Signed-off-by: Christoph Lameter --- arch/x86_64/Kconfig | 6 ++++++ include/asm-x86_64/page.h | 3 +-- include/asm-x86_64/thread_info.h | 4 ++-- 3 files changed, 9 insertions(+), 4 deletions(-) Index: linux-2.6/arch/x86_64/Kconfig =================================================================== --- linux-2.6.orig/arch/x86_64/Kconfig 2007-10-03 18:11:20.000000000 -0700 +++ linux-2.6/arch/x86_64/Kconfig 2007-10-03 18:12:13.000000000 -0700 @@ -363,6 +363,12 @@ config NODES_SHIFT default "6" depends on NEED_MULTIPLE_NODES +config THREAD_ORDER + int "Kernel stack size (in page order)" + default "1" + help + Page order for the thread stack. + # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig. config X86_64_ACPI_NUMA Index: linux-2.6/include/asm-x86_64/page.h =================================================================== --- linux-2.6.orig/include/asm-x86_64/page.h 2007-10-03 18:11:20.000000000 -0700 +++ linux-2.6/include/asm-x86_64/page.h 2007-10-03 18:12:13.000000000 -0700 @@ -9,8 +9,7 @@ #define PAGE_MASK (~(PAGE_SIZE-1)) #define PHYSICAL_PAGE_MASK (~(PAGE_SIZE-1) & __PHYSICAL_MASK) -#define THREAD_ORDER 1 -#define THREAD_SIZE (PAGE_SIZE << THREAD_ORDER) +#define THREAD_SIZE (PAGE_SIZE << CONFIG_THREAD_ORDER) #define CURRENT_MASK (~(THREAD_SIZE-1)) #define EXCEPTION_STACK_ORDER 0 Index: linux-2.6/include/asm-x86_64/thread_info.h =================================================================== --- linux-2.6.orig/include/asm-x86_64/thread_info.h 2007-10-03 18:12:13.000000000 -0700 +++ linux-2.6/include/asm-x86_64/thread_info.h 2007-10-03 18:12:13.000000000 -0700 @@ -80,9 +80,9 @@ static inline struct thread_info *stack_ #endif #define alloc_thread_info(tsk) \ - ((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER)) + ((struct thread_info *) __get_free_pages(THREAD_FLAGS, CONFIG_THREAD_ORDER)) -#define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER) +#define free_thread_info(ti) free_pages((unsigned long) (ti), CONFIG_THREAD_ORDER) #else /* !__ASSEMBLY__ */ -- From clameter@sgi.com Wed Oct 3 20:57:17 2007 Message-Id: <20071004035716.938359116@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:11 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dan Williams Subject: [15/18] Fallback for temporary order 2 allocation Content-Disposition: inline; filename=vcompound_crypto The cryto subsystem needs an order 2 allocation. This is a temporary buffer for xoring data so we can safely allow fallback. Cc: Dan Williams Signed-off-by: Christoph Lameter --- crypto/xor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/crypto/xor.c =================================================================== --- linux-2.6.orig/crypto/xor.c 2007-10-03 18:11:20.000000000 -0700 +++ linux-2.6/crypto/xor.c 2007-10-03 18:12:14.000000000 -0700 @@ -101,7 +101,7 @@ calibrate_xor_blocks(void) void *b1, *b2; struct xor_block_template *f, *fastest; - b1 = (void *) __get_free_pages(GFP_KERNEL, 2); + b1 = (void *) __get_free_pages(GFP_VFALLBACK, 2); if (!b1) { printk(KERN_WARNING "xor: Yikes! 
No memory available.\n"); return -ENOMEM; -- From clameter@sgi.com Wed Oct 3 20:57:17 2007 Message-Id: <20071004035717.093643381@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:12 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [16/18] Virtual Compound page allocation from interrupt context. Content-Disposition: inline; filename=vcompound_interrupt_alloc In an interrupt context we cannot wait for the vmlist_lock in __get_vm_area_node(). So use a trylock instead. If the trylock fails then the atomic allocation will fail and subsequently be retried. This only works because the flush_cache_vunmap in use for allocation is never performing any IPIs in contrast to flush_tlb_... in use for freeing. flush_cache_vunmap is only used on architectures with a virtually mapped cache (xtensa, pa-risc). [Note: Nick Piggin is working on a scheme to make this simpler by no longer requiring flushes] Signed-off-by: Christoph Lameter --- mm/vmalloc.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) Index: linux-2.6/mm/vmalloc.c =================================================================== --- linux-2.6.orig/mm/vmalloc.c 2007-10-03 16:21:10.000000000 -0700 +++ linux-2.6/mm/vmalloc.c 2007-10-03 16:25:17.000000000 -0700 @@ -177,7 +177,6 @@ static struct vm_struct *__get_vm_area_n unsigned long align = 1; unsigned long addr; - BUG_ON(in_interrupt()); if (flags & VM_IOREMAP) { int bit = fls(size); @@ -202,7 +201,14 @@ static struct vm_struct *__get_vm_area_n */ size += PAGE_SIZE; - write_lock(&vmlist_lock); + if (gfp_mask & __GFP_WAIT) + write_lock(&vmlist_lock); + else { + if (!write_trylock(&vmlist_lock)) { + kfree(area); + return NULL; + } + } for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) { if ((unsigned long)tmp->addr < addr) { if((unsigned long)tmp->addr + tmp->size >= addr) -- From clameter@sgi.com Wed Oct 3 20:57:17 2007 Message-Id: <20071004035717.256770492@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:13 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [17/18] Virtual compound page freeing in interrupt context Content-Disposition: inline; filename=vcompound_interrupt_free If we are in an interrupt context then simply defer the free via a workqueue. Removing a virtual mappping *must* be done with interrupts enabled since tlb_xx functions are called that rely on interrupts for processor to processor communications. 
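For illustration only (not part of the patch): the deferral reuses the start
of the area that is being freed as the work item, so no extra allocation is
needed in interrupt context. The pattern is roughly:

#include <linux/hardirq.h>
#include <linux/interrupt.h>
#include <linux/irqflags.h>
#include <linux/workqueue.h>

/*
 * Sketch only: defer the actual free to process context when called
 * with interrupts disabled; the memory being freed doubles as the
 * work_struct.
 */
static void deferred_free(struct work_struct *w)
{
        /* w is the virtual address of the area; do the real free here. */
}

static void free_area(void *addr)
{
        struct work_struct *w = addr;

        if (irqs_disabled() || in_interrupt()) {
                INIT_WORK(w, deferred_free);
                schedule_work(w);
        } else
                deferred_free(addr);
}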
Signed-off-by: Christoph Lameter --- mm/page_alloc.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) Index: linux-2.6/mm/page_alloc.c =================================================================== --- linux-2.6.orig/mm/page_alloc.c 2007-10-03 20:00:37.000000000 -0700 +++ linux-2.6/mm/page_alloc.c 2007-10-03 20:01:09.000000000 -0700 @@ -294,10 +294,20 @@ static void __free_vcompound(void *addr) kfree(pages); } +static void vcompound_free_work(struct work_struct *w) +{ + __free_vcompound((void *)w); +} static void free_vcompound(void *addr) { - __free_vcompound(addr); + struct work_struct *w = addr; + + if (irqs_disabled() || in_interrupt()) { + INIT_WORK(w, vcompound_free_work); + schedule_work(w); + } else + __free_vcompound(w); } static void free_compound_page(struct page *page) -- From clameter@sgi.com Wed Oct 3 20:57:17 2007 Message-Id: <20071004035717.415767649@sgi.com> References: <20071004035656.203248544@sgi.com> User-Agent: quilt/0.46-1 Date: Wed, 03 Oct 2007 20:57:14 -0700 From: Christoph Lameter To: akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [18/18] SLUB: Use fallback for table of callers/freers of a slab cache Content-Disposition: inline; filename=vcompound_slub_safe The caller table can get quite large if there are many call sites for a particular slab. Add GFP_FALLBACK allows falling back to vmalloc in case the caller table gets too big and memory is fragmented. Currently we would fail the operation. Signed-off-by: Christoph Lameter --- mm/slub.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2007-10-03 20:00:23.000000000 -0700 +++ linux-2.6/mm/slub.c 2007-10-03 20:01:12.000000000 -0700 @@ -3003,7 +3003,8 @@ static int alloc_loc_track(struct loc_tr order = get_order(sizeof(struct location) * max); - l = (void *)__get_free_pages(flags, order); + l = (void *)__get_free_pages(flags | __GFP_COMP | __GFP_VFALLBACK, + order); if (!l) return 0; --