From clameter@sgi.com Sat Nov 3 13:53:24 2007
Date: Sat, 3 Nov 2007 12:53:22 -0700 (PDT)
From: Christoph Lameter
To: Jack Steiner
Cc: scaling@sgi.com, holt@sgi.com
Subject: Re: Upstream quicklist fix

Here is another release. When trimming, page release now goes through the
check for off-node pages first. That check only runs when we actually have
more than the minimum number of pages, and it frees at most max_free pages.
Once the off-node freeing is done we do the same with the on-node pages,
provided that there are still more than the minimum number of pages on the
list. So in the worst case we may actually free 2 * max pages when trimming.


[NUMA] Quicklists: Do not free off node pages before the TLB flush has been done

Delay the quicklist freeing until the trimming function is called. At that
point we know that it is okay to free pages. There will be some loss of
efficiency due to the need to rescan the list.

Signed-off-by: Christoph Lameter

Index: linux-2.6/include/linux/quicklist.h
===================================================================
--- linux-2.6.orig/include/linux/quicklist.h	2007-11-01 15:48:32.000000000 -0700
+++ linux-2.6/include/linux/quicklist.h	2007-11-02 09:37:05.000000000 -0700
@@ -15,6 +15,9 @@
 
 struct quicklist {
 	void *page;
+#ifdef CONFIG_NUMA
+	void *offnode;
+#endif
 	int nr_pages;
 };
 
@@ -56,18 +59,18 @@ static inline void __quicklist_free(int
 	struct page *page)
 {
 	struct quicklist *q;
-	int nid = page_to_nid(page);
-
-	if (unlikely(nid != numa_node_id())) {
-		if (dtor)
-			dtor(p);
-		__free_page(page);
-		return;
-	}
 
 	q = &get_cpu_var(quicklist)[nr];
-	*(void **)p = q->page;
-	q->page = p;
+#ifdef CONFIG_NUMA
+	if (page_to_nid(page) != numa_node_id()) {
+		*(void **)p = q->offnode;
+		q->offnode = p;
+	} else
+#endif
+	{
+		*(void **)p = q->page;
+		q->page = p;
+	}
 	q->nr_pages++;
 	put_cpu_var(quicklist);
 }
Index: linux-2.6/mm/quicklist.c
===================================================================
--- linux-2.6.orig/mm/quicklist.c	2007-11-01 15:48:32.000000000 -0700
+++ linux-2.6/mm/quicklist.c	2007-11-03 01:07:41.000000000 -0700
@@ -44,6 +44,34 @@ static long min_pages_to_free(struct qui
 }
 
 /*
+ * Drop off node pages from the quicklists
+ */
+static inline void free_offnode_pages(struct quicklist *q,
+	int min_pages, int max_free, void (*dtor)(void *))
+{
+#ifdef CONFIG_NUMA
+	void **p;
+	void **prev = NULL;
+
+	if (q->nr_pages <= min_pages)
+		return;
+
+	for (p = q->offnode; p && max_free; p = *p) {
+		if (prev)
+			*prev = *p;
+		else
+			q->page = *p;
+
+		if (dtor)
+			dtor(p);
+		free_page((unsigned long)p);
+		q->nr_pages--;
+		max_free--;
+	}
+#endif
+}
+
+/*
  * Trim down the number of pages in the quicklist
  */
 void quicklist_trim(int nr, void (*dtor)(void *),
@@ -53,6 +81,7 @@ void quicklist_trim(int nr, void (*dtor)
 	struct quicklist *q;
 
 	q = &get_cpu_var(quicklist)[nr];
+	free_offnode_pages(q, min_pages, max_free, dtor);
 	if (q->nr_pages > min_pages) {
 		pages_to_free = min_pages_to_free(q, min_pages, max_free);
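
The two-phase trim described in the note at the top can be modelled in user space to check the "2 * max" worst case. What follows is only a sketch under assumptions, not the kernel code: struct quicklist_model, model_free(), pop_pages(), model_trim() and model_trim_once() are hypothetical names, malloc()/free() stand in for page allocation, there is no destructor or per-cpu handling, and the on-node count is assumed to be simply the excess over min_pages clamped to max_free (the body of min_pages_to_free() is not shown in the patch). Unlike the loop in the patch, the helper unlinks from the head of the list it is draining and reads the link word before releasing the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct quicklist. Each "page" is a
 * malloc'd block whose first word links to the next free page. */
struct quicklist_model {
	void *page;	/* on-node free pages */
	void *offnode;	/* off-node pages, parked until trim time */
	int nr_pages;	/* total across both lists */
};

/* Push a page onto the off-node or on-node list. */
static void model_free(struct quicklist_model *q, void *p, int off_node)
{
	void **head = off_node ? &q->offnode : &q->page;

	*(void **)p = *head;
	*head = p;
	q->nr_pages++;
}

/* Pop and release up to limit pages from *head; returns the number released. */
static int pop_pages(struct quicklist_model *q, void **head, int limit)
{
	int freed = 0;

	while (*head && freed < limit) {
		void *p = *head;

		*head = *(void **)p;	/* read the link before releasing p */
		free(p);
		q->nr_pages--;
		freed++;
	}
	return freed;
}

/* Two-phase trim: off-node pages first, then on-node pages. */
static int model_trim(struct quicklist_model *q, int min_pages, int max_free)
{
	int freed = 0;
	int want;

	/* Phase 1: only entered above the minimum; frees at most max_free. */
	if (q->nr_pages > min_pages)
		freed += pop_pages(q, &q->offnode, max_free);

	/* Phase 2: same check again, on whatever is still on the list. */
	if (q->nr_pages > min_pages) {
		want = q->nr_pages - min_pages;	/* assumed clamp, see above */
		if (want > max_free)
			want = max_free;
		freed += pop_pages(q, &q->page, want);
	}
	return freed;
}

/* Fill a list with n_off off-node and n_on on-node pages, trim once,
 * release the remainder, and return how many pages the trim freed. */
static int model_trim_once(int n_off, int n_on, int min_pages, int max_free)
{
	struct quicklist_model q = { NULL, NULL, 0 };
	int i, freed;

	for (i = 0; i < n_off + n_on; i++) {
		void *p = malloc(64);

		assert(p != NULL);
		model_free(&q, p, i < n_off);
	}
	freed = model_trim(&q, min_pages, max_free);
	assert(q.nr_pages == n_off + n_on - freed);

	/* Tear down what is left so the model does not leak. */
	pop_pages(&q, &q.offnode, q.nr_pages);
	pop_pages(&q, &q.page, q.nr_pages);
	return freed;
}
```

With 8 off-node and 8 on-node pages, min_pages = 2 and max_free = 4, model_trim_once(8, 8, 2, 4) frees 4 pages in each phase, i.e. 2 * max in one trim. Because the off-node phase only checks the minimum on entry, it can also take the list below min_pages, as in model_trim_once(3, 0, 2, 4).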