From clameter@sgi.com Fri May 4 15:05:19 2007
Message-Id: <20070504215839.290346570@sgi.com>
User-Agent: quilt/0.46-1
Date: Fri, 04 May 2007 14:58:39 -0700
From: clameter@sgi.com
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, dgc@sgi.com, Dipankar Sarma, Eric Dumazet, Mel Gorman
Subject: [patch 0/3] [RFC] Slab Defrag / Slab Targeted Reclaim and general Slab API changes

I originally intended this for the 2.6.23 development cycle, but since there
is an aggressive push for SLUB I thought that we may want to introduce this
earlier.

This is an RFC for patches that make major changes to the way slab
allocations are handled, in order to introduce some more advanced features
and to get rid of some things that are no longer used or are awkward.
Specifically:

A. Add slab defragmentation

On kmem_cache_shrink SLUB will not only sort the partial slabs by object
number but also attempt to free objects out of partial slabs that have a low
number of objects. Doing so increases the object density in the remaining
partial slabs.

Ideally kmem_cache_shrink would be able to completely defragment the partial
list so that only one partial slab is left over. But it is advantageous to
keep slabs with a few free objects around since that speeds up kfree. Going
to the extreme here would also require the reclaimable slabs to be able to
move objects. So we just free objects in slabs with a low population ratio.

B. Targeted reclaim

This mainly supports antifragmentation / defragmentation methods. The slab
allocator gains a new function

	kmem_cache_vacate(struct page *)

which can be used to request that a page be cleared of all objects. This
makes it possible to reduce the size of the RECLAIMABLE fragmentation area
and to move slabs into the MOVABLE area, enhancing the capabilities of
antifragmentation significantly.

C. Introduce a slab_ops structure that allows a slab user to provide
operations on slabs. This replaces the current constructor / destructor
scheme and is necessary in order to support the additional methods needed
for targeted reclaim and slab defragmentation.

A slab supporting targeted reclaim and slab defragmentation must provide the
following additional methods:

1. get_reference(void *)

Get a reference on a particular slab object.

2. kick_object(void *)

Kick an object off a slab. The object is either reclaimed (easiest) or a new
object is allocated using kmem_cache_alloc() and the object is moved to the
new location.

D. Slab creation is no longer done using kmem_cache_create()

kmem_cache_create() is not a clean API: it has only two callbacks, for
constructor and destructor, and does not allow the specification of a
slab_ops structure. Its parameters are confusing. For example, it is
possible to specify alignment information in the alignment field and in
addition in the flags field (SLAB_HWCACHE_ALIGN). The semantics of
SLAB_HWCACHE_ALIGN are fuzzy because it only aligns objects larger than half
a cacheline, so there is no guarantee of an alignment at all.

All of this is really not necessary since the compiler knows how to align
structures, and we should use this information instead of having the user
specify an alignment. I would like to get rid of SLAB_HWCACHE_ALIGN and
kmem_cache_create(). Instead one would use the following macros (which then
result in a call to __kmem_cache_create()):

	KMEM_CACHE(<struct>, <flags>)

The macro will determine the slab name from the struct name and use it for
/sys/slab, will use the size of the struct for the slab size and the
alignment of the structure for the alignment.
This means one will be able to set slab object alignment by specifying the
usual alignment options for static allocations when defining the structure.
Since the name is derived from the struct name it will be much easier to
find the source code for slabs listed in /sys/slab.

An additional macro is provided if the slab also supports slab operations:

	KMEM_CACHE_OPS(<struct>, <flags>, <slab_ops>)

It is likely that this macro will be rarely used.

E. kmem_cache_create() / SLAB_HWCACHE_ALIGN legacy interface

In order to avoid having to modify all slab creation calls throughout the
kernel we provide a kmem_cache_create() emulation. That function is the only
call that will still understand SLAB_HWCACHE_ALIGN. If that flag is
specified then the emulation sets up the proper alignment (the slab
allocators never see that flag). If a constructor or destructor is specified
then we allocate a slab_ops structure and populate it with the values given.
Note that this will cause a memory leak if the slab is disposed of later. If
you need disposable slabs then the new API must be used.

F. Remove destructor support from all slab allocators?

I am only aware of two call sites left after all the changes that are
scheduled to go into 2.6.22-rc1 have been merged. These are in the FRV and
sh arch code. The one in FRV will go away if they switch to quicklists like
i386. Sh contains another use, but a single user is no justification for
keeping destructors around.

--

From clameter@sgi.com Fri May 4 15:05:19 2007
Message-Id: <20070504220519.815033039@sgi.com>
References: <20070504215839.290346570@sgi.com>
User-Agent: quilt/0.46-1
Date: Fri, 04 May 2007 14:58:40 -0700
From: clameter@sgi.com
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, dgc@sgi.com, Dipankar Sarma, Eric Dumazet, Mel Gorman
Subject: [patch 1/3] SLUB: slab_ops instead of constructors / destructors
Content-Disposition: inline; filename=slabapic23

This patch gets rid of constructors and destructors and replaces them with a
slab operations structure that is passed to SLUB. For backward compatibility
we provide a kmem_cache_create() emulation that can construct a slab
operations structure on the fly.

The new APIs to create slabs are:

Without any callbacks:

	slabhandle = KMEM_CACHE(<struct>, <flags>)

Creates a slab based on the structure definition, using the structure's
alignment, size and name. This is cleaner because the name showing up in
/sys/slab/xxx will be the structure name, so one can search the source for
it. The usual alignment attributes on the struct control the slab alignment.

Note: SLAB_HWCACHE_ALIGN is *not* supported as a flag. The flags do *not*
specify alignments. Alignment is taken from the structure and nowhere else.

Create a slab cache with slab_ops (please use only for special slabs):

	KMEM_CACHE_OPS(<struct>, <flags>, <slab_ops>)

Old kmem_cache_create() support:

kmem_cache_create() still accepts the specification of SLAB_HWCACHE_ALIGN
*if* no other alignment is specified. In that case kmem_cache_create() will
generate a proper alignment depending on the size of the structure.
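
To make the new calling convention concrete, here is a rough sketch of how a
slab cache might be set up with these macros. The struct name (my_item), the
constructor and the init function are made up for illustration and are not
part of this patch; the ctor keeps the three-argument signature carried over
into struct slab_ops (see the slab.h changes below).

	/* Hypothetical example only -- not part of this patch. */
	struct my_item {
		struct list_head list;
		unsigned long state;
	} ____cacheline_aligned_in_smp;	/* alignment comes from the struct */

	static void my_item_ctor(void *object, struct kmem_cache *s,
						unsigned long flags)
	{
		struct my_item *item = object;

		INIT_LIST_HEAD(&item->list);
		item->state = 0;
	}

	static struct slab_ops my_item_slab_ops = {
		.ctor = my_item_ctor,
	};

	static struct kmem_cache *my_item_cache;

	static int __init my_item_init(void)
	{
		/*
		 * Common case: no callbacks. Name, size and alignment are
		 * all derived from struct my_item itself.
		 */
		my_item_cache = KMEM_CACHE(my_item, SLAB_PANIC);

		/*
		 * Special slabs that need callbacks would instead use:
		 *	my_item_cache = KMEM_CACHE_OPS(my_item, SLAB_PANIC,
		 *					&my_item_slab_ops);
		 */
		return 0;
	}

The point is that alignment is expressed once, on the structure definition,
and the cache name that appears in /sys/slab matches the struct name.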
Signed-off-by: Christoph Lameter --- include/linux/slab.h | 60 ++++++++++++++++++++++++++++++------ include/linux/slub_def.h | 3 - mm/slub.c | 77 +++++++++++++++++------------------------------ 3 files changed, 80 insertions(+), 60 deletions(-) Index: slub/include/linux/slab.h =================================================================== --- slub.orig/include/linux/slab.h 2007-05-03 20:53:00.000000000 -0700 +++ slub/include/linux/slab.h 2007-05-04 02:38:38.000000000 -0700 @@ -23,7 +23,6 @@ typedef struct kmem_cache kmem_cache_t _ #define SLAB_DEBUG_FREE 0x00000100UL /* DEBUG: Perform (expensive) checks on free */ #define SLAB_RED_ZONE 0x00000400UL /* DEBUG: Red zone objs in a cache */ #define SLAB_POISON 0x00000800UL /* DEBUG: Poison objects */ -#define SLAB_HWCACHE_ALIGN 0x00002000UL /* Align objs on cache lines */ #define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */ #define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */ #define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */ @@ -32,19 +31,21 @@ typedef struct kmem_cache kmem_cache_t _ #define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */ #define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */ -/* Flags passed to a constructor functions */ -#define SLAB_CTOR_CONSTRUCTOR 0x001UL /* If not set, then deconstructor */ - /* * struct kmem_cache related prototypes */ void __init kmem_cache_init(void); int slab_is_available(void); -struct kmem_cache *kmem_cache_create(const char *, size_t, size_t, - unsigned long, - void (*)(void *, struct kmem_cache *, unsigned long), - void (*)(void *, struct kmem_cache *, unsigned long)); +struct slab_ops { + /* FIXME: ctor should only take the object as an argument. */ + void (*ctor)(void *, struct kmem_cache *, unsigned long); + /* FIXME: Remove all destructors ? */ + void (*dtor)(void *, struct kmem_cache *, unsigned long); +}; + +struct kmem_cache *__kmem_cache_create(const char *, size_t, size_t, + unsigned long, struct slab_ops *s); void kmem_cache_destroy(struct kmem_cache *); int kmem_cache_shrink(struct kmem_cache *); void *kmem_cache_alloc(struct kmem_cache *, gfp_t); @@ -62,9 +63,14 @@ int kmem_ptr_validate(struct kmem_cache * f.e. add ____cacheline_aligned_in_smp to the struct declaration * then the objects will be properly aligned in SMP configurations. */ -#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\ +#define KMEM_CACHE(__struct, __flags) __kmem_cache_create(#__struct,\ sizeof(struct __struct), __alignof__(struct __struct),\ - (__flags), NULL, NULL) + (__flags), NULL) + +#define KMEM_CACHE_OPS(__struct, __flags, __ops) \ + __kmem_cache_create(#__struct, sizeof(struct __struct), \ + __alignof__(struct __struct), (__flags), (__ops)) + #ifdef CONFIG_NUMA extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node); @@ -236,6 +242,40 @@ extern void *__kmalloc_node_track_caller extern const struct seq_operations slabinfo_op; ssize_t slabinfo_write(struct file *, const char __user *, size_t, loff_t *); +/* + * Legacy functions + * + * The sole reason that these definitions are here is because of their + * frequent use. Remove when all call sites have been updated. 
+ */ +#define SLAB_HWCACHE_ALIGN 0x8000000000UL +#define SLAB_CTOR_CONSTRUCTOR 0x001UL + +static inline struct kmem_cache *kmem_cache_create(const char *s, + size_t size, size_t align, unsigned long flags, + void (*ctor)(void *, struct kmem_cache *, unsigned long), + void (*dtor)(void *, struct kmem_cache *, unsigned long)) +{ + struct slab_ops *so = NULL; + + if ((flags & SLAB_HWCACHE_ALIGN) && size > L1_CACHE_BYTES / 2) { + /* Clear the align flag. It is no longer supported */ + flags &= ~SLAB_HWCACHE_ALIGN; + + /* Do not allow conflicting alignment specificiations */ + BUG_ON(align); + + /* And set the cacheline alignment */ + align = L1_CACHE_BYTES; + } + if (ctor || dtor) { + so = kzalloc(sizeof(struct slab_ops), GFP_KERNEL); + so->ctor = ctor; + so->dtor = dtor; + } + return __kmem_cache_create(s, size, align, flags, so); +} + #endif /* __KERNEL__ */ #endif /* _LINUX_SLAB_H */ Index: slub/mm/slub.c =================================================================== --- slub.orig/mm/slub.c 2007-05-04 02:23:34.000000000 -0700 +++ slub/mm/slub.c 2007-05-04 02:40:13.000000000 -0700 @@ -209,6 +209,11 @@ static inline struct kmem_cache_node *ge #endif } +struct slab_ops default_slab_ops = { + NULL, + NULL +}; + /* * Object debugging */ @@ -809,8 +814,8 @@ static void setup_object(struct kmem_cac init_tracking(s, object); } - if (unlikely(s->ctor)) - s->ctor(object, s, SLAB_CTOR_CONSTRUCTOR); + if (unlikely(s->slab_ops->ctor)) + s->slab_ops->ctor(object, s, SLAB_CTOR_CONSTRUCTOR); } static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) @@ -867,16 +872,18 @@ out: static void __free_slab(struct kmem_cache *s, struct page *page) { int pages = 1 << s->order; + void (*dtor)(void *, struct kmem_cache *, unsigned long) = + s->slab_ops->dtor; - if (unlikely(PageError(page) || s->dtor)) { + if (unlikely(PageError(page) || dtor)) { void *start = page_address(page); void *end = start + (pages << PAGE_SHIFT); void *p; slab_pad_check(s, page); for (p = start; p <= end - s->size; p += s->size) { - if (s->dtor) - s->dtor(p, s, 0); + if (dtor) + dtor(p, s, 0); check_object(s, page, p, 0); } } @@ -1618,7 +1625,7 @@ static int calculate_sizes(struct kmem_c * then we should never poison the object itself. 
*/ if ((flags & SLAB_POISON) && !(flags & SLAB_DESTROY_BY_RCU) && - !s->ctor && !s->dtor) + s->slab_ops->ctor && !s->slab_ops->dtor) s->flags |= __OBJECT_POISON; else s->flags &= ~__OBJECT_POISON; @@ -1647,7 +1654,7 @@ static int calculate_sizes(struct kmem_c s->inuse = size; if (((flags & (SLAB_DESTROY_BY_RCU | SLAB_POISON)) || - s->ctor || s->dtor)) { + s->slab_ops->ctor || s->slab_ops->dtor)) { /* * Relocate free pointer after the object if it is not * permitted to overwrite the first word of the object on @@ -1731,13 +1738,11 @@ static int __init finish_bootstrap(void) static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags, const char *name, size_t size, size_t align, unsigned long flags, - void (*ctor)(void *, struct kmem_cache *, unsigned long), - void (*dtor)(void *, struct kmem_cache *, unsigned long)) + struct slab_ops *slab_ops) { memset(s, 0, kmem_size); s->name = name; - s->ctor = ctor; - s->dtor = dtor; + s->slab_ops = slab_ops; s->objsize = size; s->flags = flags; s->align = align; @@ -1757,7 +1762,7 @@ static int kmem_cache_open(struct kmem_c if (s->size >= 65535 * sizeof(void *)) { BUG_ON(flags & (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | SLAB_DESTROY_BY_RCU)); - BUG_ON(ctor || dtor); + BUG_ON(slab_ops->ctor || slab_ops->dtor); } else /* @@ -1992,7 +1997,7 @@ static struct kmem_cache *create_kmalloc down_write(&slub_lock); if (!kmem_cache_open(s, gfp_flags, name, size, ARCH_KMALLOC_MINALIGN, - flags, NULL, NULL)) + flags, &default_slab_ops)) goto panic; list_add(&s->list, &slab_caches); @@ -2313,23 +2318,21 @@ static int slab_unmergeable(struct kmem_ if (slub_nomerge || (s->flags & SLUB_NEVER_MERGE)) return 1; - if (s->ctor || s->dtor) + if (s->slab_ops != &default_slab_ops) return 1; return 0; } static struct kmem_cache *find_mergeable(size_t size, - size_t align, unsigned long flags, - void (*ctor)(void *, struct kmem_cache *, unsigned long), - void (*dtor)(void *, struct kmem_cache *, unsigned long)) + size_t align, unsigned long flags, struct slab_ops *slab_ops) { struct list_head *h; if (slub_nomerge || (flags & SLUB_NEVER_MERGE)) return NULL; - if (ctor || dtor) + if (slab_ops != &default_slab_ops) return NULL; size = ALIGN(size, sizeof(void *)); @@ -2364,15 +2367,17 @@ static struct kmem_cache *find_mergeable return NULL; } -struct kmem_cache *kmem_cache_create(const char *name, size_t size, +struct kmem_cache *__kmem_cache_create(const char *name, size_t size, size_t align, unsigned long flags, - void (*ctor)(void *, struct kmem_cache *, unsigned long), - void (*dtor)(void *, struct kmem_cache *, unsigned long)) + struct slab_ops *slab_ops) { struct kmem_cache *s; + if (!slab_ops) + slab_ops = &default_slab_ops; + down_write(&slub_lock); - s = find_mergeable(size, align, flags, dtor, ctor); + s = find_mergeable(size, align, flags, slab_ops); if (s) { s->refcount++; /* @@ -2386,7 +2391,7 @@ struct kmem_cache *kmem_cache_create(con } else { s = kmalloc(kmem_size, GFP_KERNEL); if (s && kmem_cache_open(s, GFP_KERNEL, name, - size, align, flags, ctor, dtor)) { + size, align, flags, slab_ops)) { if (sysfs_slab_add(s)) { kfree(s); goto err; @@ -2406,7 +2411,7 @@ err: s = NULL; return s; } -EXPORT_SYMBOL(kmem_cache_create); +EXPORT_SYMBOL(__kmem_cache_create); void *kmem_cache_zalloc(struct kmem_cache *s, gfp_t flags) { @@ -2961,28 +2966,6 @@ static ssize_t order_show(struct kmem_ca } SLAB_ATTR_RO(order); -static ssize_t ctor_show(struct kmem_cache *s, char *buf) -{ - if (s->ctor) { - int n = sprint_symbol(buf, (unsigned long)s->ctor); - - return n + 
sprintf(buf + n, "\n"); - } - return 0; -} -SLAB_ATTR_RO(ctor); - -static ssize_t dtor_show(struct kmem_cache *s, char *buf) -{ - if (s->dtor) { - int n = sprint_symbol(buf, (unsigned long)s->dtor); - - return n + sprintf(buf + n, "\n"); - } - return 0; -} -SLAB_ATTR_RO(dtor); - static ssize_t aliases_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", s->refcount - 1); @@ -3213,8 +3196,6 @@ static struct attribute * slab_attrs[] = &slabs_attr.attr, &partial_attr.attr, &cpu_slabs_attr.attr, - &ctor_attr.attr, - &dtor_attr.attr, &aliases_attr.attr, &align_attr.attr, &sanity_checks_attr.attr, Index: slub/include/linux/slub_def.h =================================================================== --- slub.orig/include/linux/slub_def.h 2007-05-04 02:23:51.000000000 -0700 +++ slub/include/linux/slub_def.h 2007-05-04 02:24:27.000000000 -0700 @@ -39,8 +39,7 @@ struct kmem_cache { /* Allocation and freeing of slabs */ int objects; /* Number of objects in slab */ int refcount; /* Refcount for slab cache destroy */ - void (*ctor)(void *, struct kmem_cache *, unsigned long); - void (*dtor)(void *, struct kmem_cache *, unsigned long); + struct slab_ops *slab_ops; int inuse; /* Offset to metadata */ int align; /* Alignment */ const char *name; /* Name (only for display!) */ -- From clameter@sgi.com Fri May 4 15:05:20 2007 Message-Id: <20070504220519.978854977@sgi.com> References: <20070504215839.290346570@sgi.com> User-Agent: quilt/0.46-1 Date: Fri, 04 May 2007 14:58:41 -0700 From: clameter@sgi.com To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, dgc@sgi.com, Dipankar Sarma , Eric Dumazet , Mel Gorman Subject: [patch 2/3] SLUB: Implement targeted reclaim and partial list defragmentation Content-Disposition: inline; filename=reclaim_callback Targeted reclaim allows to target a single slab for reclaim. This is done by calling kmem_cache_vacate(slab, page); It will return 1 on success, 0 if the operation failed. The vacate functionality is also used for slab shrinking. During the shrink operation SLUB will generate a list sorted by the number of objects in use. We extract pages off that list that are only filled less than a quarter. These objects are then processed using kmem_cache_vacate. In order for a slabcache to support this functionality two functions must be defined via slab_operations. get_reference(void *) Must obtain a reference to the object if it has not been freed yet. It is up to the slabcache to resolve the race. SLUB guarantees that the objects is still allocated. However, another thread may be blocked in slab_free attempting to free the same object. It may succeed as soon as get_reference() returns to the slab allocator. get_reference() processing must recognize this situation (i.e. check refcount for zero) and fail in such a sitation (no problem since the object will be freed as soon we drop the slab lock before doing kick calls). No slab operations may be performed in get_reference(). The slab lock for the page with the object is taken. Any slab operations may lead to a deadlock. 2. kick_object(void *) After SLUB has established references to the remaining objects in a slab it will drop all locks and then use kick_object on each of the objects for which we obtained a reference. The existence of the objects is guaranteed by virtue of the earlier obtained reference. The callback may perform any slab operation since no locks are held at the time of call. The callback should remove the object from the slab in some way. 
This may be accomplished by reclaiming the object and then running kmem_cache_free() or reallocating it and then running kmem_cache_free(). Reallocation is advantageous at this point because it will then allocate from the partial slabs with the most objects because we have just finished slab shrinking. NOTE: This patch is for conceptual review. I'd appreciate any feedback especially on the locking approach taken here. It will be critical to resolve the locking issue for this approach to become feasable. Signed-off-by: Christoph Lameter --- include/linux/slab.h | 3 mm/slub.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 154 insertions(+), 8 deletions(-) Index: slub/include/linux/slab.h =================================================================== --- slub.orig/include/linux/slab.h 2007-05-04 13:32:34.000000000 -0700 +++ slub/include/linux/slab.h 2007-05-04 13:32:50.000000000 -0700 @@ -42,6 +42,8 @@ struct slab_ops { void (*ctor)(void *, struct kmem_cache *, unsigned long); /* FIXME: Remove all destructors ? */ void (*dtor)(void *, struct kmem_cache *, unsigned long); + int (*get_reference)(void *); + void (*kick_object)(void *); }; struct kmem_cache *__kmem_cache_create(const char *, size_t, size_t, @@ -54,6 +56,7 @@ void kmem_cache_free(struct kmem_cache * unsigned int kmem_cache_size(struct kmem_cache *); const char *kmem_cache_name(struct kmem_cache *); int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr); +int kmem_cache_vacate(struct page *); /* * Please use this macro to create slab caches. Simply specify the Index: slub/mm/slub.c =================================================================== --- slub.orig/mm/slub.c 2007-05-04 13:32:34.000000000 -0700 +++ slub/mm/slub.c 2007-05-04 13:56:25.000000000 -0700 @@ -173,7 +173,7 @@ static struct notifier_block slab_notifi static enum { DOWN, /* No slab functionality available */ PARTIAL, /* kmem_cache_open() works but kmalloc does not */ - UP, /* Everything works */ + UP, /* Everything works but does not show up in sysfs */ SYSFS /* Sysfs up */ } slab_state = DOWN; @@ -211,6 +211,8 @@ static inline struct kmem_cache_node *ge struct slab_ops default_slab_ops = { NULL, + NULL, + NULL, NULL }; @@ -839,13 +841,10 @@ static struct page *new_slab(struct kmem n = get_node(s, page_to_nid(page)); if (n) atomic_long_inc(&n->nr_slabs); + page->offset = s->offset / sizeof(void *); page->slab = s; - page->flags |= 1 << PG_slab; - if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON | - SLAB_STORE_USER | SLAB_TRACE)) - page->flags |= 1 << PG_error; - + page->inuse = 0; start = page_address(page); end = start + s->objects * s->size; @@ -862,7 +861,17 @@ static struct page *new_slab(struct kmem set_freepointer(s, last, NULL); page->freelist = start; - page->inuse = 0; + + /* + * pages->inuse must be visible when PageSlab(page) becomes + * true for targeted reclaim + */ + smp_wmb(); + page->flags |= 1 << PG_slab; + if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON | + SLAB_STORE_USER | SLAB_TRACE)) + page->flags |= 1 << PG_error; + out: if (flags & __GFP_WAIT) local_irq_disable(); @@ -2124,6 +2133,111 @@ void kfree(const void *x) EXPORT_SYMBOL(kfree); /* + * Vacate all objects in the given slab. Slab must be locked. + * + * Will drop and regain and drop the slab lock. + * Slab must be marked PageActive() to avoid concurrent slab_free from + * remove the slab from the lists. At the end the slab will either + * be freed or have been returned to the partial lists. 
+ * + * Return error code or number of remaining objects + */ +static int __kmem_cache_vacate(struct kmem_cache *s, struct page *page) +{ + void *p; + void *addr = page_address(page); + unsigned long map[BITS_TO_LONGS(s->objects)]; + int leftover; + + if (!page->inuse) + return 0; + + /* Determine free objects */ + bitmap_zero(map, s->objects); + for(p = page->freelist; p; p = get_freepointer(s, p)) + set_bit((p - addr) / s->size, map); + + /* + * Get a refcount for all used objects. If that fails then + * no KICK callback can be performed. + */ + for(p = addr; p < addr + s->objects * s->size; p += s->size) + if (!test_bit((p - addr) / s->size, map)) + if (!s->slab_ops->get_reference(p)) + set_bit((p - addr) / s->size, map); + + /* Got all the references we need. Now we can drop the slab lock */ + slab_unlock(page); + + /* Perform the KICK callbacks to remove the objects */ + for(p = addr; p < addr + s->objects * s->size; p += s->size) + if (!test_bit((p - addr) / s->size, map)) + s->slab_ops->kick_object(p); + + slab_lock(page); + leftover = page->inuse; + ClearPageActive(page); + putback_slab(s, page); + return leftover; +} + +/* + * Remove a page from the lists. Must be holding slab lock. + */ +static void remove_from_lists(struct kmem_cache *s, struct page *page) +{ + if (page->inuse < s->objects) + remove_partial(s, page); + else if (s->flags & SLAB_STORE_USER) + remove_full(s, page); +} + +/* + * Attempt to free objects in a page. Return 1 when succesful. + */ +int kmem_cache_vacate(struct page *page) +{ + struct kmem_cache *s; + int rc = 0; + + /* Get a reference to the page. Return if its freed or being freed */ + if (!get_page_unless_zero(page)) + return 0; + + /* Check that this is truly a slab page */ + if (!PageSlab(page)) + goto out; + + slab_lock(page); + + /* + * We may now have locked a page that is in various stages of being + * freed. If the PageSlab bit is off then we have already reached + * the page allocator. If page->inuse is zero then we are + * in SLUB but freeing or allocating the page. + * page->inuse is never modified without the slab lock held. + * + * Also abort if the page happens to be a per cpu slab + */ + if (!PageSlab(page) || PageActive(page) || !page->inuse) { + slab_unlock(page); + goto out; + } + + /* + * We are holding a lock on a slab page that is not in the + * process of being allocated or freed. + */ + s = page->slab; + remove_from_lists(s, page); + SetPageActive(page); + rc = __kmem_cache_vacate(s, page) == 0; +out: + put_page(page); + return rc; +} + +/* * kmem_cache_shrink removes empty slabs from the partial lists * and then sorts the partially allocated slabs by the number * of items in use. 
The slabs with the most items in use @@ -2137,11 +2251,12 @@ int kmem_cache_shrink(struct kmem_cache int node; int i; struct kmem_cache_node *n; - struct page *page; + struct page *page, *page2; struct page *t; struct list_head *slabs_by_inuse = kmalloc(sizeof(struct list_head) * s->objects, GFP_KERNEL); unsigned long flags; + LIST_HEAD(zaplist); if (!slabs_by_inuse) return -ENOMEM; @@ -2194,8 +2309,36 @@ int kmem_cache_shrink(struct kmem_cache for (i = s->objects - 1; i >= 0; i--) list_splice(slabs_by_inuse + i, n->partial.prev); + if (!s->slab_ops->get_reference || !s->slab_ops->kick_object) + goto out; + + /* Take objects with just a few objects off the tail */ + while (n->nr_partial > MAX_PARTIAL) { + page = container_of(n->partial.prev, struct page, lru); + + /* + * We are holding the list_lock so we can only + * trylock the slab + */ + if (!slab_trylock(page)) + break; + + if (page->inuse > s->objects / 4) + break; + + list_move(&page->lru, &zaplist); + n->nr_partial--; + SetPageActive(page); + slab_unlock(page); + } out: spin_unlock_irqrestore(&n->list_lock, flags); + + /* Now we can free objects in the slabs on the zaplist */ + list_for_each_entry_safe(page, page2, &zaplist, lru) { + slab_lock(page); + __kmem_cache_vacate(s, page); + } } kfree(slabs_by_inuse); -- From clameter@sgi.com Fri May 4 15:05:20 2007 Message-Id: <20070504220520.163037547@sgi.com> References: <20070504215839.290346570@sgi.com> User-Agent: quilt/0.46-1 Date: Fri, 04 May 2007 14:58:42 -0700 From: clameter@sgi.com To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, dgc@sgi.com, Dipankar Sarma , Eric Dumazet , Mel Gorman Subject: [patch 3/3] Support targeted reclaim and slab defrag for dentry cache Content-Disposition: inline; filename=dcache_targetd_reclaim This is an experimental patch for locking review only. I am not that familiar with dentry cache locking. We setup the dcache cache a bit differently using the new APIs and define a get_reference and kick_object() function for the dentry cache. get_dentry_reference simply works by incrementing the dentry refcount if its not already zero. If it is zero then the slab called us while another processor is in the process of freeing the object. The other process will finish this free as soon as we return from this call. So we have to fail. kick_dentry_object() is called after get_dentry_reference() has been used and after the slab has dropped all of its own locks. Trying to use the dentry pruning here. Hope that is correct. Signed-off-by: Christoph Lameter --- fs/dcache.c | 48 +++++++++++++++++++++++++++++++++++++++--------- include/linux/fs.h | 2 +- 2 files changed, 40 insertions(+), 10 deletions(-) Index: slub/fs/dcache.c =================================================================== --- slub.orig/fs/dcache.c 2007-05-04 13:32:15.000000000 -0700 +++ slub/fs/dcache.c 2007-05-04 13:55:39.000000000 -0700 @@ -2133,17 +2133,48 @@ static void __init dcache_init_early(voi INIT_HLIST_HEAD(&dentry_hashtable[loop]); } +/* + * The slab is holding locks on the current slab. We can just + * get a reference + */ +int get_dentry_reference(void *private) +{ + struct dentry *dentry = private; + + return atomic_inc_not_zero(&dentry->d_count); +} + +/* + * Slab has dropped all the locks. Get rid of the + * refcount we obtained earlier and also rid of the + * object. 
+ */ +void kick_dentry_object(void *private) +{ + struct dentry *dentry = private; + + spin_lock(&dentry->d_lock); + if (atomic_read(&dentry->d_count) > 1) { + spin_unlock(&dentry->d_lock); + dput(dentry); + } + spin_lock(&dcache_lock); + prune_one_dentry(dentry, 1); + spin_unlock(&dcache_lock); +} + +struct slab_ops dentry_slab_ops = { + .get_reference = get_dentry_reference, + .kick_object = kick_dentry_object +}; + static void __init dcache_init(unsigned long mempages) { int loop; - /* - * A constructor could be added for stable state like the lists, - * but it is probably not worth it because of the cache nature - * of the dcache. - */ - dentry_cache = KMEM_CACHE(dentry, - SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD); + dentry_cache = KMEM_CACHE_OPS(dentry, + SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD, + &dentry_slab_ops); register_shrinker(&dcache_shrinker); @@ -2192,8 +2223,7 @@ void __init vfs_caches_init(unsigned lon names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL); - filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0, - SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL); + filp_cachep = KMEM_CACHE(file, SLAB_PANIC); dcache_init(mempages); inode_init(mempages); Index: slub/include/linux/fs.h =================================================================== --- slub.orig/include/linux/fs.h 2007-05-04 13:32:15.000000000 -0700 +++ slub/include/linux/fs.h 2007-05-04 13:55:39.000000000 -0700 @@ -785,7 +785,7 @@ struct file { spinlock_t f_ep_lock; #endif /* #ifdef CONFIG_EPOLL */ struct address_space *f_mapping; -}; +} ____cacheline_aligned; extern spinlock_t files_lock; #define file_list_lock() spin_lock(&files_lock); #define file_list_unlock() spin_unlock(&files_lock); --
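
Finally, a caller-side sketch of the kmem_cache_vacate() interface added in
patch 2/3, as it might be used from antifragmentation / defragmentation
code. The function name try_to_empty_slab_page is made up; only the
int kmem_cache_vacate(struct page *) prototype comes from the series.

	/*
	 * Illustrative only: ask SLUB to clear a slab page so that the
	 * underlying page can be freed or migrated to the MOVABLE area.
	 */
	static int try_to_empty_slab_page(struct page *page)
	{
		/*
		 * kmem_cache_vacate() verifies that the page is still a
		 * slab page, takes the slab lock, obtains references to
		 * the remaining objects via get_reference() and then
		 * removes them via kick_object().
		 *
		 * Returns 1 if the page was vacated, 0 if the operation
		 * failed (e.g. the page is a per cpu slab, is being freed,
		 * or objects could not be removed).
		 */
		return kmem_cache_vacate(page);
	}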