GIT f18bd57c8db2a9932abfc8e1eedaebcb798cbe14 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/christoph/vm.git#slab-mm f18bd57c8db2a9932abfc8e1eedaebcb798cbe14 commit f18bd57c8db2a9932abfc8e1eedaebcb798cbe14 Author: Christoph Lameter Date: Mon Mar 31 16:45:45 2008 -0700 slub: pack objects denser Since we now have more orders available use a denser packing. Increase slab order if more than 1/16th of a slab would be wasted. Signed-off-by: Christoph Lameter commit fb34190de1f79c33451b54124db048b92b7dba16 Author: Christoph Lameter Date: Fri Feb 15 15:22:22 2008 -0800 slub: Adjust order boundaries and minimum objects per slab. Since there is now no worry anymore about higher order allocs (hopefully). Set the max order to default to PAGE_ALLOC_ORDER_COSTLY (32k). The mininum objects per slab is calculated based on the number of processors that may come online. Processors min_objects --------------------------- 1 4 2 8 4 12 8 16 16 20 32 24 64 28 1024 44 4096 52 V1->V2 - Add logic to compute min_objects based on processor count. - Dump the DEFAULT_xx constants. Signed-off-by: Christoph Lameter commit e0e43835a8e51f851391c2618dc40fa809fe1937 Author: Christoph Lameter Date: Wed Mar 26 12:46:37 2008 -0700 Slub: No need for slab counters if !SLUB_DEBUG Counters are used mainly for showing data through the sysfs API. If that API is not compiled in then there is no point in keeping track of this data. Disable counters for the number of slabs and the number of total slabs if !SLUB_DEBUG. This also affects SLABINFO support. It now must depends on SLUB_DEBUG (which is on by default). Signed-off-by: Christoph Lameter commit 1cb07332122fde39676a504f9d471675392c5176 Author: Christoph Lameter Date: Fri Feb 15 23:45:26 2008 -0800 slub: Simplify any_slab_object checks Since we now have total_objects counter per node use that to check for the presence of any objects. Reviewed-by: Pekka Enberg Signed-off-by: Christoph Lameter commit 934f84381200237f9d12a6787aa76cc9caffb318 Author: Christoph Lameter Date: Fri Feb 15 15:22:23 2008 -0800 slub: Make the order configurable for each slab cache Makes /sys/kernel/slab//order writable. The allocation order of a slab cache can then be changed dynamically during runtime. This can be used to override the objects per slabs value establisheed with the slub_min_objects setting that was manually specified or calculated on bootup. The changes of the slab order can occur while allocate_slab() runs. Allocate slab needs the order and the number of slab objects that are both changed by the change of order. Both are put into a machine word (struct kmem_cache_order_objects). They can then be atomically updates and retrieved. Signed-off-by: Christoph Lameter commit 8a6d20a7b6b5017ed42250015f3dbea4667e25a7 Author: Christoph Lameter Date: Fri Feb 15 15:22:22 2008 -0800 slub: Drop fallback to page allocator method There is now slub method of falling back to a slab page of minimal order. No need anymore for the fallback to kmalloc_large(). Reviewed-by: Pekka Enberg Signed-off-by: Christoph Lameter commit c547692764697bb5b4aa0eb2d1287a1ed75f1422 Author: Christoph Lameter Date: Fri Feb 15 15:22:22 2008 -0800 slub: Fallback to minimal order during slab page allocation If any higher order allocation fails then fall back the smallest order necessary to contain at least one object. Add a new field min_objects that will contain the objects for the smallest possible order of an allocation. Reviewed-by: Pekka Enberg Signed-off-by: Christoph Lameter commit 492368a8f77d83a932d1db8ebfa8d26e7fad7bb8 Author: Christoph Lameter Date: Fri Feb 15 15:22:22 2008 -0800 slub: Update statistics handling for variable order slabs Change the statistics to consider that slabs of the same slabcache can have different number of objects in them since they may be of different order. Provide a new sysfs field total_objects which shows the total objects that the allocated slabs of a slabcache could hold. Add a max field that holds the largest slab order that was ever used for a slab cache. V1->V2: - Fix slub statistics - update slabinfo Signed-off-by: Christoph Lameter commit b232556a796f7f7d80d9fd9afd394d0414a5a9e7 Author: Christoph Lameter Date: Wed Mar 26 12:46:35 2008 -0700 Add kmem_cache_order_objects struct Pack the order and the number of objects into a single word. This saves some memory in the kmem_cache_structure and more importantly allows us to fetch both values atomically. Later the slab orders become runtime configurable and we need to fetch these two items together in order to properly allocate a slab and initialize its objects. Fix the race by fetching the order and the number of objects in one word. Signed-off-by: Christoph Lameter commit b0ab550df49e6f327d32f9740a0504757f181d0f Author: Christoph Lameter Date: Fri Feb 15 15:22:21 2008 -0800 slub: for_each_object must be passed the number of objects in a slab Pass the number of objects to the for_each_object macro. Reviewed-by: Pekka Enberg Signed-off-by: Christoph Lameter commit ece694738bf79eff06b2abb58bf267bfd3d5db54 Author: Christoph Lameter Date: Mon Mar 17 00:46:18 2008 -0700 Store max number of objects in the page struct. Split the inuse field up to be able to store the number of objects in this page in the page struct as well. Necessary if we want to have pages of various orders for a slab. Also avoids touching struct kmem_cache cachelines in __slab_alloc(). Update diagnostic code to check the number of objects and make sure that the number of objects always stays within the bounds of a 16 bit unsigned integer. Signed-off-by: Christoph Lameter commit 738e299e35ab9e59313e37da8a8db2e388674859 Author: Christoph Lameter Date: Wed Mar 26 12:46:34 2008 -0700 Move map/flag clearing to __free_slab __free_slab does some diagnostics. The resetting of mapcount etc in discard_slab() can interfere with debug processing. So move the reset immediately before the page is freed. Signed-off-by: Christoph Lameter Documentation/vm/slabinfo.c | 27 +-- include/linux/mm_types.h | 5 +- include/linux/slub_def.h | 18 ++- init/Kconfig | 2 +- mm/slub.c | 425 ++++++++++++++++++++++++++----------------- 5 files changed, 285 insertions(+), 192 deletions(-) diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c index 22d7e3e..d3ce295 100644 --- a/Documentation/vm/slabinfo.c +++ b/Documentation/vm/slabinfo.c @@ -31,7 +31,7 @@ struct slabinfo { int hwcache_align, object_size, objs_per_slab; int sanity_checks, slab_size, store_user, trace; int order, poison, reclaim_account, red_zone; - unsigned long partial, objects, slabs; + unsigned long partial, objects, slabs, objects_partial, objects_total; unsigned long alloc_fastpath, alloc_slowpath; unsigned long free_fastpath, free_slowpath; unsigned long free_frozen, free_add_partial, free_remove_partial; @@ -540,7 +540,8 @@ void slabcache(struct slabinfo *s) return; store_size(size_str, slab_size(s)); - snprintf(dist_str, 40, "%lu/%lu/%d", s->slabs, s->partial, s->cpu_slabs); + snprintf(dist_str, 40, "%lu/%lu/%d", s->slabs - s->cpu_slabs, + s->partial, s->cpu_slabs); if (!line++) first_line(); @@ -776,7 +777,6 @@ void totals(void) unsigned long used; unsigned long long wasted; unsigned long long objwaste; - long long objects_in_partial_slabs; unsigned long percentage_partial_slabs; unsigned long percentage_partial_objs; @@ -790,18 +790,11 @@ void totals(void) wasted = size - used; objwaste = s->slab_size - s->object_size; - objects_in_partial_slabs = s->objects - - (s->slabs - s->partial - s ->cpu_slabs) * - s->objs_per_slab; - - if (objects_in_partial_slabs < 0) - objects_in_partial_slabs = 0; - percentage_partial_slabs = s->partial * 100 / s->slabs; if (percentage_partial_slabs > 100) percentage_partial_slabs = 100; - percentage_partial_objs = objects_in_partial_slabs * 100 + percentage_partial_objs = s->objects_partial * 100 / s->objects; if (percentage_partial_objs > 100) @@ -823,8 +816,8 @@ void totals(void) min_objects = s->objects; if (used < min_used) min_used = used; - if (objects_in_partial_slabs < min_partobj) - min_partobj = objects_in_partial_slabs; + if (s->objects_partial < min_partobj) + min_partobj = s->objects_partial; if (percentage_partial_slabs < min_ppart) min_ppart = percentage_partial_slabs; if (percentage_partial_objs < min_ppartobj) @@ -848,8 +841,8 @@ void totals(void) max_objects = s->objects; if (used > max_used) max_used = used; - if (objects_in_partial_slabs > max_partobj) - max_partobj = objects_in_partial_slabs; + if (s->objects_partial > max_partobj) + max_partobj = s->objects_partial; if (percentage_partial_slabs > max_ppart) max_ppart = percentage_partial_slabs; if (percentage_partial_objs > max_ppartobj) @@ -864,7 +857,7 @@ void totals(void) total_objects += s->objects; total_used += used; - total_partobj += objects_in_partial_slabs; + total_partobj += s->objects_partial; total_ppart += percentage_partial_slabs; total_ppartobj += percentage_partial_objs; @@ -1160,6 +1153,8 @@ void read_slab_dir(void) slab->hwcache_align = get_obj("hwcache_align"); slab->object_size = get_obj("object_size"); slab->objects = get_obj("objects"); + slab->objects_partial = get_obj("objects_partial"); + slab->objects_total = get_obj("objects_total"); slab->objs_per_slab = get_obj("objs_per_slab"); slab->order = get_obj("order"); slab->partial = get_obj("partial"); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index af190ce..e0bd223 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -42,7 +42,10 @@ struct page { * to show when page is mapped * & limit reverse map searches. */ - unsigned int inuse; /* SLUB: Nr of objects */ + struct { /* SLUB */ + u16 inuse; + u16 objects; + }; }; union { struct { diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index b00c1c7..71e43a1 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -29,6 +29,7 @@ enum stat_item { DEACTIVATE_TO_HEAD, /* Cpu slab was moved to the head of partials */ DEACTIVATE_TO_TAIL, /* Cpu slab was moved to the tail of partials */ DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */ + ORDER_FALLBACK, /* Number of times fallback was necessary */ NR_SLUB_STAT_ITEMS }; struct kmem_cache_cpu { @@ -45,14 +46,24 @@ struct kmem_cache_cpu { struct kmem_cache_node { spinlock_t list_lock; /* Protect partial list and nr_partial */ unsigned long nr_partial; - atomic_long_t nr_slabs; struct list_head partial; #ifdef CONFIG_SLUB_DEBUG + atomic_long_t nr_slabs; + atomic_long_t total_objects; struct list_head full; #endif }; /* + * Word size structure that can be atomically updated or read and that + * contains both the order and the number of objects that a slab of the + * given order would contain. + */ +struct kmem_cache_order_objects { + unsigned long x; +}; + +/* * Slab cache management. */ struct kmem_cache { @@ -61,7 +72,7 @@ struct kmem_cache { int size; /* The size of an object including meta data */ int objsize; /* The size of an object without meta data */ int offset; /* Free pointer offset. */ - int order; /* Current preferred allocation order */ + struct kmem_cache_order_objects oo; /* * Avoid an extra cache line for UP, SMP and for the node local to @@ -70,7 +81,8 @@ struct kmem_cache { struct kmem_cache_node local_node; /* Allocation and freeing of slabs */ - int objects; /* Number of objects in slab */ + struct kmem_cache_order_objects max; + struct kmem_cache_order_objects min; gfp_t allocflags; /* gfp flags to use on each alloc */ int refcount; /* Refcount for slab cache destroy */ void (*ctor)(struct kmem_cache *, void *); diff --git a/init/Kconfig b/init/Kconfig index a97924b..3bded1e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -763,7 +763,7 @@ endmenu # General setup config SLABINFO bool depends on PROC_FS - depends on SLAB || SLUB + depends on SLAB || (SLUB && SLUB_DEBUG) default y config RT_MUTEXES diff --git a/mm/slub.c b/mm/slub.c index acc975f..8042733 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5,7 +5,7 @@ * The allocator synchronizes using per slab locks and only * uses a centralized lock to manage a pool of partial slabs. * - * (C) 2007 SGI, Christoph Lameter + * (C) 2007, 2008 SGI, Christoph Lameter */ #include @@ -149,25 +149,6 @@ static inline void ClearSlabDebug(struct page *page) /* Enable to test recovery from slab corruption on boot */ #undef SLUB_RESILIENCY_TEST -#if PAGE_SHIFT <= 12 - -/* - * Small page size. Make sure that we do not fragment memory - */ -#define DEFAULT_MAX_ORDER 1 -#define DEFAULT_MIN_OBJECTS 4 - -#else - -/* - * Large page machines are customarily able to handle larger - * page orders. - */ -#define DEFAULT_MAX_ORDER 2 -#define DEFAULT_MIN_OBJECTS 8 - -#endif - /* * Mininum number of partial slabs. These will be left on the partial * lists even if they are empty. kmem_cache_shrink may reclaim them. @@ -204,8 +185,6 @@ static inline void ClearSlabDebug(struct page *page) /* Internal SLUB flags */ #define __OBJECT_POISON 0x80000000 /* Poison object */ #define __SYSFS_ADD_DEFERRED 0x40000000 /* Not yet visible via sysfs */ -#define __KMALLOC_CACHE 0x20000000 /* objects freed using kfree */ -#define __PAGE_ALLOC_FALLBACK 0x10000000 /* Allow fallback to page alloc */ /* Not all arches define cache_line_size */ #ifndef cache_line_size @@ -301,7 +280,7 @@ static inline int check_valid_pointer(struct kmem_cache *s, return 1; base = page_address(page); - if (object < base || object >= base + s->objects * s->size || + if (object < base || object >= base + page->objects * s->size || (object - base) % s->size) { return 0; } @@ -327,8 +306,8 @@ static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp) } /* Loop over all objects in a slab */ -#define for_each_object(__p, __s, __addr) \ - for (__p = (__addr); __p < (__addr) + (__s)->objects * (__s)->size;\ +#define for_each_object(__p, __s, __addr, __objects) \ + for (__p = (__addr); __p < (__addr) + (__objects) * (__s)->size;\ __p += (__s)->size) /* Scan freelist */ @@ -341,6 +320,25 @@ static inline int slab_index(void *p, struct kmem_cache *s, void *addr) return (p - addr) / s->size; } +static inline struct kmem_cache_order_objects oo_make(int order, + unsigned long size) +{ + struct kmem_cache_order_objects x = + { (order << 16) + (PAGE_SIZE << order) / size }; + + return x; +} + +static inline int oo_order(struct kmem_cache_order_objects x) +{ + return x.x >> 16; +} + +static inline int oo_objects(struct kmem_cache_order_objects x) +{ + return x.x & ((1 << 16) - 1); +} + #ifdef CONFIG_SLUB_DEBUG /* * Debug settings: @@ -451,8 +449,8 @@ static void print_tracking(struct kmem_cache *s, void *object) static void print_page_info(struct page *page) { - printk(KERN_ERR "INFO: Slab 0x%p used=%u fp=0x%p flags=0x%04lx\n", - page, page->inuse, page->freelist, page->flags); + printk(KERN_ERR "INFO: Slab 0x%p objects=%u used=%u fp=0x%p flags=0x%04lx\n", + page, page->objects, page->inuse, page->freelist, page->flags); } @@ -652,6 +650,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct page *page, u8 *p) p + off, POISON_INUSE, s->size - off); } +/* Check the pad bytes at the end of a slab page */ static int slab_pad_check(struct kmem_cache *s, struct page *page) { u8 *start; @@ -664,20 +663,20 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page) return 1; start = page_address(page); - end = start + (PAGE_SIZE << s->order); - length = s->objects * s->size; - remainder = end - (start + length); + length = (PAGE_SIZE << oo_order(s->oo)); + end = start + length; + remainder = length % s->size; if (!remainder) return 1; - fault = check_bytes(start + length, POISON_INUSE, remainder); + fault = check_bytes(end - remainder, POISON_INUSE, remainder); if (!fault) return 1; while (end > fault && end[-1] == POISON_INUSE) end--; slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1); - print_section("Padding", start, length); + print_section("Padding", end - remainder, remainder); restore_bytes(s, "slab padding", POISON_INUSE, start, end); return 0; @@ -739,15 +738,24 @@ static int check_object(struct kmem_cache *s, struct page *page, static int check_slab(struct kmem_cache *s, struct page *page) { + int maxobj; + VM_BUG_ON(!irqs_disabled()); if (!PageSlab(page)) { slab_err(s, page, "Not a valid slab page"); return 0; } - if (page->inuse > s->objects) { + + maxobj = (PAGE_SIZE << compound_order(page)) / s->size; + if (page->objects > maxobj) { + slab_err(s, page, "objects %u > max %u", + s->name, page->objects, maxobj); + return 0; + } + if (page->inuse > page->objects) { slab_err(s, page, "inuse %u > max %u", - s->name, page->inuse, s->objects); + s->name, page->inuse, page->objects); return 0; } /* Slab_pad_check fixes things up after itself */ @@ -764,8 +772,9 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search) int nr = 0; void *fp = page->freelist; void *object = NULL; + unsigned long max_objects; - while (fp && nr <= s->objects) { + while (fp && nr <= page->objects) { if (fp == search) return 1; if (!check_valid_pointer(s, page, fp)) { @@ -777,7 +786,7 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search) } else { slab_err(s, page, "Freepointer corrupt"); page->freelist = NULL; - page->inuse = s->objects; + page->inuse = page->objects; slab_fix(s, "Freelist cleared"); return 0; } @@ -788,10 +797,20 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search) nr++; } - if (page->inuse != s->objects - nr) { + max_objects = (PAGE_SIZE << compound_order(page)) / s->size; + if (max_objects > 65535) + max_objects = 65535; + + if (page->objects != max_objects) { + slab_err(s, page, "Wrong number of objects. Found %d but " + "should be %d", page->objects, max_objects); + page->objects = max_objects; + slab_fix(s, "Number of objects adjusted."); + } + if (page->inuse != page->objects - nr) { slab_err(s, page, "Wrong object count. Counter is %d but " - "counted were %d", page->inuse, s->objects - nr); - page->inuse = s->objects - nr; + "counted were %d", page->inuse, page->objects - nr); + page->inuse = page->objects - nr; slab_fix(s, "Object count adjusted."); } return search == NULL; @@ -881,7 +900,7 @@ bad: * as used avoids touching the remaining objects. */ slab_fix(s, "Marking all objects used"); - page->inuse = s->objects; + page->inuse = page->objects; page->freelist = NULL; } return 0; @@ -1029,28 +1048,47 @@ static inline unsigned long kmem_cache_flags(unsigned long objsize, } #define slub_debug 0 #endif + +static inline struct page *alloc_slab_page(gfp_t flags, int node, + struct kmem_cache_order_objects oo) +{ + int order = oo_order(oo); + + if (node == -1) + return alloc_pages(flags, order); + else + return alloc_pages_node(node, flags, order); +} + /* * Slab allocation and freeing */ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) { struct page *page; - int pages = 1 << s->order; + struct kmem_cache_order_objects oo = s->oo; flags |= s->allocflags; - if (node == -1) - page = alloc_pages(flags, s->order); - else - page = alloc_pages_node(node, flags, s->order); - - if (!page) - return NULL; + page = alloc_slab_page(flags | __GFP_NOWARN | __GFP_NORETRY, node, + oo); + if (unlikely(!page)) { + oo = s->min; + /* + * Allocation may have failed due to fragmentation. + * Try a lower order alloc if possible + */ + page = alloc_slab_page(flags, node, oo); + if (!page) + return NULL; + stat(get_cpu_slab(s, raw_smp_processor_id()), ORDER_FALLBACK); + } + page->objects = oo_objects(oo); mod_zone_page_state(page_zone(page), (s->flags & SLAB_RECLAIM_ACCOUNT) ? NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE, - pages); + 1 << oo_order(oo)); return page; } @@ -1070,6 +1108,7 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) void *start; void *last; void *p; + struct kmem_cache_order_objects oo = s->oo; BUG_ON(flags & GFP_SLAB_BUG_MASK); @@ -1079,8 +1118,12 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) goto out; n = get_node(s, page_to_nid(page)); - if (n) +#ifdef CONFIG_SLUB_DEBUG + if (n) { atomic_long_inc(&n->nr_slabs); + atomic_long_add(page->objects, &n->total_objects); + } +#endif page->slab = s; page->flags |= 1 << PG_slab; if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON | @@ -1090,10 +1133,10 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) start = page_address(page); if (unlikely(s->flags & SLAB_POISON)) - memset(start, POISON_INUSE, PAGE_SIZE << s->order); + memset(start, POISON_INUSE, PAGE_SIZE << oo_order(oo)); last = start; - for_each_object(p, s, start) { + for_each_object(p, s, start, page->objects) { setup_object(s, page, last); set_freepointer(s, last, p); last = p; @@ -1109,13 +1152,15 @@ out: static void __free_slab(struct kmem_cache *s, struct page *page) { - int pages = 1 << s->order; + int order = compound_order(page); + int pages = 1 << order; if (unlikely(SlabDebug(page))) { void *p; slab_pad_check(s, page); - for_each_object(p, s, page_address(page)) + for_each_object(p, s, page_address(page), + page->objects) check_object(s, page, p, 0); ClearSlabDebug(page); } @@ -1125,7 +1170,9 @@ static void __free_slab(struct kmem_cache *s, struct page *page) NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE, -pages); - __free_pages(page, s->order); + __ClearPageSlab(page); + reset_page_mapcount(page); + __free_pages(page, order); } static void rcu_free_slab(struct rcu_head *h) @@ -1151,11 +1198,12 @@ static void free_slab(struct kmem_cache *s, struct page *page) static void discard_slab(struct kmem_cache *s, struct page *page) { +#ifdef CONFIG_SLUB_DEBUG struct kmem_cache_node *n = get_node(s, page_to_nid(page)); atomic_long_dec(&n->nr_slabs); - reset_page_mapcount(page); - __ClearPageSlab(page); + atomic_long_sub(page->objects, &n->total_objects); +#endif free_slab(s, page); } @@ -1470,9 +1518,6 @@ static void *__slab_alloc(struct kmem_cache *s, void **object; struct page *new; - /* We handle __GFP_ZERO in the caller */ - gfpflags &= ~__GFP_ZERO; - if (!c->page) goto new_slab; @@ -1490,7 +1535,7 @@ load_freelist: goto debug; c->freelist = object[c->offset]; - c->page->inuse = s->objects; + c->page->inuse = c->page->objects; c->page->freelist = NULL; c->node = page_to_nid(c->page); unlock_out: @@ -1527,27 +1572,6 @@ new_slab: c->page = new; goto load_freelist; } - - /* - * No memory available. - * - * If the slab uses higher order allocs but the object is - * smaller than a page size then we can fallback in emergencies - * to the page allocator via kmalloc_large. The page allocator may - * have failed to obtain a higher order page and we can try to - * allocate a single page if the object fits into a single page. - * That is only possible if certain conditions are met that are being - * checked when a slab is created. - */ - if (!(gfpflags & __GFP_NORETRY) && - (s->flags & __PAGE_ALLOC_FALLBACK)) { - if (gfpflags & __GFP_WAIT) - local_irq_enable(); - object = kmalloc_large(s->objsize, gfpflags); - if (gfpflags & __GFP_WAIT) - local_irq_disable(); - return object; - } return NULL; debug: if (!alloc_debug_processing(s, c->page, object, addr)) @@ -1748,8 +1772,8 @@ static struct page *get_object_page(const void *x) * take the list_lock. */ static int slub_min_order; -static int slub_max_order = DEFAULT_MAX_ORDER; -static int slub_min_objects = DEFAULT_MIN_OBJECTS; +static int slub_max_order = PAGE_ALLOC_COSTLY_ORDER; +static int slub_min_objects = 0; /* * Merge control. If this is set then no merging of slab caches will occur. @@ -1789,6 +1813,9 @@ static inline int slab_order(int size, int min_objects, int rem; int min_order = slub_min_order; + if ((PAGE_SIZE << min_order) / size > 65535) + return get_order(size * 65535) - 1; + for (order = max(min_order, fls(min_objects * size - 1) - PAGE_SHIFT); order <= max_order; order++) { @@ -1823,8 +1850,10 @@ static inline int calculate_order(int size) * we reduce the minimum objects required in a slab. */ min_objects = slub_min_objects; + if (!min_objects) + min_objects = 4 * fls(nr_cpu_ids); while (min_objects > 1) { - fraction = 8; + fraction = 16; while (fraction >= 4) { order = slab_order(size, min_objects, slub_max_order, fraction); @@ -1891,10 +1920,11 @@ static void init_kmem_cache_cpu(struct kmem_cache *s, static void init_kmem_cache_node(struct kmem_cache_node *n) { n->nr_partial = 0; - atomic_long_set(&n->nr_slabs, 0); spin_lock_init(&n->list_lock); INIT_LIST_HEAD(&n->partial); #ifdef CONFIG_SLUB_DEBUG + atomic_long_set(&n->nr_slabs, 0); + atomic_long_set(&n->total_objects, 0); INIT_LIST_HEAD(&n->full); #endif } @@ -2063,7 +2093,9 @@ static struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags, init_tracking(kmalloc_caches, n); #endif init_kmem_cache_node(n); +#ifdef CONFIG_SLUB_DEBUG atomic_long_inc(&n->nr_slabs); +#endif /* * lockdep requires consistent irq usage for each lock @@ -2139,11 +2171,12 @@ static int init_kmem_cache_nodes(struct kmem_cache *s, gfp_t gfpflags) * calculate_sizes() determines the order and the distribution of data within * a slab object. */ -static int calculate_sizes(struct kmem_cache *s) +static int calculate_sizes(struct kmem_cache *s, int forced_order) { unsigned long flags = s->flags; unsigned long size = s->objsize; unsigned long align = s->align; + int order; /* * Round up object size to the next word boundary. We can only @@ -2227,26 +2260,16 @@ static int calculate_sizes(struct kmem_cache *s) */ size = ALIGN(size, align); s->size = size; + if (forced_order >= 0) + order = forced_order; + else + order = calculate_order(size); - if ((flags & __KMALLOC_CACHE) && - PAGE_SIZE / size < slub_min_objects) { - /* - * Kmalloc cache that would not have enough objects in - * an order 0 page. Kmalloc slabs can fallback to - * page allocator order 0 allocs so take a reasonably large - * order that will allows us a good number of objects. - */ - s->order = max(slub_max_order, PAGE_ALLOC_COSTLY_ORDER); - s->flags |= __PAGE_ALLOC_FALLBACK; - s->allocflags |= __GFP_NOWARN; - } else - s->order = calculate_order(size); - - if (s->order < 0) + if (order < 0) return 0; s->allocflags = 0; - if (s->order) + if (order) s->allocflags |= __GFP_COMP; if (s->flags & SLAB_CACHE_DMA) @@ -2258,9 +2281,12 @@ static int calculate_sizes(struct kmem_cache *s) /* * Determine the number of objects per slab */ - s->objects = (PAGE_SIZE << s->order) / size; + s->oo = oo_make(order, size); + s->min = oo_make(get_order(size), size); + if (oo_objects(s->oo) > oo_objects(s->max)) + s->max = s->oo; - return !!s->objects; + return !!oo_objects(s->oo); } @@ -2276,7 +2302,7 @@ static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags, s->align = align; s->flags = kmem_cache_flags(size, flags, name, ctor); - if (!calculate_sizes(s)) + if (!calculate_sizes(s, -1)) goto error; s->refcount = 1; @@ -2293,7 +2319,7 @@ error: if (flags & SLAB_PANIC) panic("Cannot create slab %s size=%lu realsize=%u " "order=%u offset=%u flags=%lx\n", - s->name, (unsigned long)size, s->size, s->order, + s->name, (unsigned long)size, s->size, oo_order(s->oo), s->offset, flags); return 0; } @@ -2376,8 +2402,10 @@ static inline int kmem_cache_close(struct kmem_cache *s) struct kmem_cache_node *n = get_node(s, node); n->nr_partial -= free_list(s, n, &n->partial); +#ifdef CONFIG_SLUB_DEBUG if (atomic_long_read(&n->nr_slabs)) return 1; +#endif } free_kmem_cache_nodes(s); return 0; @@ -2458,7 +2486,7 @@ static struct kmem_cache *create_kmalloc_cache(struct kmem_cache *s, down_write(&slub_lock); if (!kmem_cache_open(s, gfp_flags, name, size, ARCH_KMALLOC_MINALIGN, - flags | __KMALLOC_CACHE, NULL)) + flags, NULL)) goto panic; list_add(&s->list, &slab_caches); @@ -2688,8 +2716,14 @@ void kfree(const void *x) } EXPORT_SYMBOL(kfree); +<<<<<<< HEAD:mm/slub.c #if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SLABINFO) static unsigned long count_partial(struct kmem_cache_node *n) +======= +#if defined(SLUB_DEBUG) || defined(CONFIG_SLABINFO) +static unsigned long count_partial(struct kmem_cache_node *n, + int (*get_count)(struct page *)) +>>>>>>> FETCH_HEAD:mm/slub.c { unsigned long flags; unsigned long x = 0; @@ -2697,10 +2731,25 @@ static unsigned long count_partial(struct kmem_cache_node *n) spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, lru) - x += page->inuse; + x += get_count(page); spin_unlock_irqrestore(&n->list_lock, flags); return x; } + +static int count_inuse(struct page *page) +{ + return page->inuse; +} + +static int count_total(struct page *page) +{ + return page->objects; +} + +static int count_free(struct page *page) +{ + return page->objects - page->inuse; +} #endif /* @@ -2720,8 +2769,9 @@ int kmem_cache_shrink(struct kmem_cache *s) struct kmem_cache_node *n; struct page *page; struct page *t; + int objects = oo_objects(s->max); struct list_head *slabs_by_inuse = - kmalloc(sizeof(struct list_head) * s->objects, GFP_KERNEL); + kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL); unsigned long flags; if (!slabs_by_inuse) @@ -2734,7 +2784,7 @@ int kmem_cache_shrink(struct kmem_cache *s) if (!n->nr_partial) continue; - for (i = 0; i < s->objects; i++) + for (i = 0; i < objects; i++) INIT_LIST_HEAD(slabs_by_inuse + i); spin_lock_irqsave(&n->list_lock, flags); @@ -2766,7 +2816,7 @@ int kmem_cache_shrink(struct kmem_cache *s) * Rebuild the partial list with the slabs filled up most * first and the least used slabs at the end. */ - for (i = s->objects - 1; i >= 0; i--) + for (i = objects - 1; i >= 0; i--) list_splice(slabs_by_inuse + i, n->partial.prev); spin_unlock_irqrestore(&n->list_lock, flags); @@ -2810,6 +2860,7 @@ static void slab_mem_offline_callback(void *arg) list_for_each_entry(s, &slab_caches, list) { n = get_node(s, offline_node); if (n) { +#ifdef CONFIG_SLUB_DEBUG /* * if n->nr_slabs > 0, slabs still exist on the node * that is going down. We were unable to free them, @@ -2817,6 +2868,7 @@ static void slab_mem_offline_callback(void *arg) * callback. So, we must fail. */ BUG_ON(atomic_long_read(&n->nr_slabs)); +#endif s->node[offline_node] = NULL; kmem_cache_free(kmalloc_caches, n); @@ -2987,9 +3039,6 @@ static int slab_unmergeable(struct kmem_cache *s) if (slub_nomerge || (s->flags & SLUB_NEVER_MERGE)) return 1; - if ((s->flags & __PAGE_ALLOC_FALLBACK)) - return 1; - if (s->ctor) return 1; @@ -3193,7 +3242,7 @@ static int validate_slab(struct kmem_cache *s, struct page *page, return 0; /* Now we know that a valid freelist exists */ - bitmap_zero(map, s->objects); + bitmap_zero(map, page->objects); for_each_free_object(p, s, page->freelist) { set_bit(slab_index(p, s, addr), map); @@ -3201,7 +3250,7 @@ static int validate_slab(struct kmem_cache *s, struct page *page, return 0; } - for_each_object(p, s, addr) + for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) if (!check_object(s, page, p, 1)) return 0; @@ -3267,7 +3316,7 @@ static long validate_slab_cache(struct kmem_cache *s) { int node; unsigned long count = 0; - unsigned long *map = kmalloc(BITS_TO_LONGS(s->objects) * + unsigned long *map = kmalloc(BITS_TO_LONGS(oo_objects(s->max)) * sizeof(unsigned long), GFP_KERNEL); if (!map) @@ -3470,14 +3519,14 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s, struct page *page, enum track_item alloc) { void *addr = page_address(page); - DECLARE_BITMAP(map, s->objects); + DECLARE_BITMAP(map, page->objects); void *p; - bitmap_zero(map, s->objects); + bitmap_zero(map, page->objects); for_each_free_object(p, s, page->freelist) set_bit(slab_index(p, s, addr), map); - for_each_object(p, s, addr) + for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); } @@ -3567,22 +3616,23 @@ static int list_locations(struct kmem_cache *s, char *buf, } enum slab_stat_type { - SL_FULL, - SL_PARTIAL, - SL_CPU, - SL_OBJECTS + SL_ALL, /* All slabs */ + SL_PARTIAL, /* Only partially allocated slabs */ + SL_CPU, /* Only slabs used for cpu caches */ + SL_OBJECTS, /* Determine allocated objects not slabs */ + SL_TOTAL /* Determine object capacity not slabs */ }; -#define SO_FULL (1 << SL_FULL) +#define SO_ALL (1 << SL_ALL) #define SO_PARTIAL (1 << SL_PARTIAL) #define SO_CPU (1 << SL_CPU) #define SO_OBJECTS (1 << SL_OBJECTS) +#define SO_TOTAL (1 << SL_TOTAL) static ssize_t show_slab_objects(struct kmem_cache *s, char *buf, unsigned long flags) { unsigned long total = 0; - int cpu; int node; int x; unsigned long *nodes; @@ -3593,56 +3643,60 @@ static ssize_t show_slab_objects(struct kmem_cache *s, return -ENOMEM; per_cpu = nodes + nr_node_ids; - for_each_possible_cpu(cpu) { - struct page *page; - struct kmem_cache_cpu *c = get_cpu_slab(s, cpu); + if (flags & SO_CPU) { + int cpu; - if (!c) - continue; + for_each_possible_cpu(cpu) { + struct kmem_cache_cpu *c = get_cpu_slab(s, cpu); - page = c->page; - node = c->node; - if (node < 0) - continue; - if (page) { - if (flags & SO_CPU) { - if (flags & SO_OBJECTS) - x = page->inuse; + if (!c || c->node < 0) + continue; + + if (c->page) { + if (flags & SO_TOTAL) + x = c->page->objects; + else if (flags & SO_OBJECTS) + x = c->page->inuse; else x = 1; + total += x; - nodes[node] += x; + nodes[c->node] += x; } - per_cpu[node]++; + per_cpu[c->node]++; } } - for_each_node_state(node, N_NORMAL_MEMORY) { - struct kmem_cache_node *n = get_node(s, node); + if (flags & SO_ALL) { + for_each_node_state(node, N_NORMAL_MEMORY) { + struct kmem_cache_node *n = get_node(s, node); + + if (flags & SO_TOTAL) + x = atomic_long_read(&n->total_objects); + else if (flags & SO_OBJECTS) + x = atomic_long_read(&n->total_objects) - + count_partial(n, count_free); - if (flags & SO_PARTIAL) { - if (flags & SO_OBJECTS) - x = count_partial(n); else - x = n->nr_partial; + x = atomic_long_read(&n->nr_slabs); total += x; nodes[node] += x; } - if (flags & SO_FULL) { - int full_slabs = atomic_long_read(&n->nr_slabs) - - per_cpu[node] - - n->nr_partial; + } else if (flags & SO_PARTIAL) { + for_each_node_state(node, N_NORMAL_MEMORY) { + struct kmem_cache_node *n = get_node(s, node); - if (flags & SO_OBJECTS) - x = full_slabs * s->objects; + if (flags & SO_TOTAL) + x = count_partial(n, count_total); + else if (flags & SO_OBJECTS) + x = count_partial(n, count_inuse); else - x = full_slabs; + x = n->nr_partial; total += x; nodes[node] += x; } } - x = sprintf(buf, "%lu", total); #ifdef CONFIG_NUMA for_each_node_state(node, N_NORMAL_MEMORY) @@ -3672,7 +3726,7 @@ static int any_slab_objects(struct kmem_cache *s) if (!n) continue; - if (n->nr_partial || atomic_long_read(&n->nr_slabs)) + if (atomic_long_read(&n->total_objects)) return 1; } return 0; @@ -3714,15 +3768,27 @@ SLAB_ATTR_RO(object_size); static ssize_t objs_per_slab_show(struct kmem_cache *s, char *buf) { - return sprintf(buf, "%d\n", s->objects); + return sprintf(buf, "%d\n", oo_objects(s->oo)); } SLAB_ATTR_RO(objs_per_slab); +static ssize_t order_store(struct kmem_cache *s, + const char *buf, size_t length) +{ + int order = simple_strtoul(buf, NULL, 10); + + if (order > slub_max_order || order < slub_min_order) + return -EINVAL; + + calculate_sizes(s, order); + return length; +} + static ssize_t order_show(struct kmem_cache *s, char *buf) { - return sprintf(buf, "%d\n", s->order); + return sprintf(buf, "%d\n", oo_order(s->oo)); } -SLAB_ATTR_RO(order); +SLAB_ATTR(order); static ssize_t ctor_show(struct kmem_cache *s, char *buf) { @@ -3743,7 +3809,7 @@ SLAB_ATTR_RO(aliases); static ssize_t slabs_show(struct kmem_cache *s, char *buf) { - return show_slab_objects(s, buf, SO_FULL|SO_PARTIAL|SO_CPU); + return show_slab_objects(s, buf, SO_ALL); } SLAB_ATTR_RO(slabs); @@ -3761,10 +3827,22 @@ SLAB_ATTR_RO(cpu_slabs); static ssize_t objects_show(struct kmem_cache *s, char *buf) { - return show_slab_objects(s, buf, SO_FULL|SO_PARTIAL|SO_CPU|SO_OBJECTS); + return show_slab_objects(s, buf, SO_ALL|SO_OBJECTS); } SLAB_ATTR_RO(objects); +static ssize_t objects_partial_show(struct kmem_cache *s, char *buf) +{ + return show_slab_objects(s, buf, SO_PARTIAL|SO_OBJECTS); +} +SLAB_ATTR_RO(objects_partial); + +static ssize_t total_objects_show(struct kmem_cache *s, char *buf) +{ + return show_slab_objects(s, buf, SO_ALL|SO_TOTAL); +} +SLAB_ATTR_RO(total_objects); + static ssize_t sanity_checks_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_DEBUG_FREE)); @@ -3844,7 +3922,7 @@ static ssize_t red_zone_store(struct kmem_cache *s, s->flags &= ~SLAB_RED_ZONE; if (buf[0] == '1') s->flags |= SLAB_RED_ZONE; - calculate_sizes(s); + calculate_sizes(s, -1); return length; } SLAB_ATTR(red_zone); @@ -3863,7 +3941,7 @@ static ssize_t poison_store(struct kmem_cache *s, s->flags &= ~SLAB_POISON; if (buf[0] == '1') s->flags |= SLAB_POISON; - calculate_sizes(s); + calculate_sizes(s, -1); return length; } SLAB_ATTR(poison); @@ -3882,7 +3960,7 @@ static ssize_t store_user_store(struct kmem_cache *s, s->flags &= ~SLAB_STORE_USER; if (buf[0] == '1') s->flags |= SLAB_STORE_USER; - calculate_sizes(s); + calculate_sizes(s, -1); return length; } SLAB_ATTR(store_user); @@ -4011,7 +4089,7 @@ STAT_ATTR(DEACTIVATE_EMPTY, deactivate_empty); STAT_ATTR(DEACTIVATE_TO_HEAD, deactivate_to_head); STAT_ATTR(DEACTIVATE_TO_TAIL, deactivate_to_tail); STAT_ATTR(DEACTIVATE_REMOTE_FREES, deactivate_remote_frees); - +STAT_ATTR(ORDER_FALLBACK, order_fallback); #endif static struct attribute *slab_attrs[] = { @@ -4020,6 +4098,8 @@ static struct attribute *slab_attrs[] = { &objs_per_slab_attr.attr, &order_attr.attr, &objects_attr.attr, + &objects_partial_attr.attr, + &total_objects_attr.attr, &slabs_attr.attr, &partial_attr.attr, &cpu_slabs_attr.attr, @@ -4062,6 +4142,7 @@ static struct attribute *slab_attrs[] = { &deactivate_to_head_attr.attr, &deactivate_to_tail_attr.attr, &deactivate_remote_frees_attr.attr, + &order_fallback_attr.attr, #endif NULL }; @@ -4348,7 +4429,8 @@ static int s_show(struct seq_file *m, void *p) unsigned long nr_partials = 0; unsigned long nr_slabs = 0; unsigned long nr_inuse = 0; - unsigned long nr_objs; + unsigned long nr_objs = 0; + unsigned long nr_free = 0; struct kmem_cache *s; int node; @@ -4362,14 +4444,15 @@ static int s_show(struct seq_file *m, void *p) nr_partials += n->nr_partial; nr_slabs += atomic_long_read(&n->nr_slabs); - nr_inuse += count_partial(n); + nr_objs += atomic_long_read(&n->total_objects); + nr_free += count_partial(n, count_free); } - nr_objs = nr_slabs * s->objects; - nr_inuse += (nr_slabs - nr_partials) * s->objects; + nr_inuse = nr_objs - nr_free; seq_printf(m, "%-17s %6lu %6lu %6u %4u %4d", s->name, nr_inuse, - nr_objs, s->size, s->objects, (1 << s->order)); + nr_objs, s->size, oo_objects(s->oo), + (1 << oo_order(s->oo))); seq_printf(m, " : tunables %4u %4u %4u", 0, 0, 0); seq_printf(m, " : slabdata %6lu %6lu %6lu", nr_slabs, nr_slabs, 0UL);