SLUB: slab defragmentation and kmem_cache_vacate

Slab defragmentation occurs when the slabs are shrunk (after the inode and
dentry shrinkers have been run from the reclaim code) or when a manual
shrink is requested via slabinfo. During the shrink operation SLUB
generates a list of partially populated slabs sorted by the number of
objects in use.

We extract pages off that list that are less than a quarter full and
attempt to motivate the users of those slabs to either remove the objects
or move the objects.

Targeted reclaim makes it possible to target a single slab for reclaim.
This is done by calling

kmem_cache_vacate(page);

It will return 1 on success, 0 if the operation failed.

In order for a slabcache to support defragmentation a couple of functions
must be defined via kmem_cache_ops. These are

void *get(struct kmem_cache *s, int nr, void **objects)

	Must obtain a reference to the listed objects. SLUB guarantees that
	the objects are still allocated. However, other threads may be
	blocked in slab_free attempting to free objects in the slab. These
	may succeed as soon as get() returns to the slab allocator. The
	function must be able to detect this situation and must not attempt
	to handle such objects (for example by voiding the corresponding
	entry in the objects array).

	No slab operations may be performed in get(). Interrupts are
	disabled, so what can be done is very limited. The slab lock for the
	page with the object is taken. Any attempt to perform a slab
	operation may lead to a deadlock.

	get() returns a private pointer that is passed to kick(). Should we
	be unable to obtain all references then that pointer may indicate
	to the kick() function that it should not attempt any object
	removal or move but simply drop the reference counts.

void kick(struct kmem_cache *, int nr, void **objects, void *get_result)

	After SLUB has established references to the objects in a slab it
	will drop all locks and then use kick() to move objects out of the
	slab.
	The existence of the objects is guaranteed by virtue of the
	references obtained earlier via get(). The callback may perform any
	slab operation since no locks are held at the time of the call.

	The callback should remove the object from the slab in some way.
	This may be accomplished by reclaiming the object and then running
	kmem_cache_free() or reallocating it and then running
	kmem_cache_free(). Reallocation is advantageous because the partial
	slabs were just sorted to have the partial slabs with the most
	objects first. Allocation is likely to result in filling up a slab
	so that it can be removed from the partial list.

	kick() does not return a result. SLUB will check the number of
	remaining objects in the slab. If all objects were removed then we
	know that the operation was successful.

If a kmem_cache_vacate on a page fails then the slab usually has a fairly
low usage ratio. Go through the slab and resequence the freelist so that
object addresses increase as we allocate objects. This will trigger the
cacheline prefetcher when we start allocating from the slab again and
thereby increase allocation speed.

Signed-off-by: Christoph Lameter

---
 include/linux/slab.h |   31 +++++
 mm/slab.c            |    9 +
 mm/slob.c            |    9 +
 mm/slub.c            |  271 ++++++++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 307 insertions(+), 13 deletions(-)

Index: slub/mm/slub.c
===================================================================
--- slub.orig/mm/slub.c	2007-05-31 14:38:57.000000000 -0700
+++ slub/mm/slub.c	2007-05-31 14:42:51.000000000 -0700
@@ -2363,6 +2363,39 @@ void kfree(const void *x)
 EXPORT_SYMBOL(kfree);
 
 /*
+ * Order the freelist so that addresses increase as objects are allocated.
+ * This is useful to trigger the cpu cacheline prefetching logic.
+ */
+void resequence_freelist(struct kmem_cache *s, struct page *page)
+{
+	void *p;
+	void *last;
+	void *addr = page_address(page);
+	DECLARE_BITMAP(map, s->objects);
+
+	bitmap_zero(map, s->objects);
+
+	/* Figure out which objects are on the freelist */
+	for_each_free_object(p, s, page->freelist)
+		set_bit(slab_index(p, s, addr), map);
+
+	last = NULL;
+	for_each_object(p, s, addr)
+		if (test_bit(slab_index(p, s, addr), map)) {
+			if (last)
+				set_freepointer(s, last, p);
+			else
+				page->freelist = p;
+			last = p;
+		}
+
+	if (last)
+		set_freepointer(s, last, NULL);
+	else
+		page->freelist = NULL;
+}
+
+/*
  * Vacate all objects in the given slab.
  *
  * Slab must be locked and frozen. Interrupts are disabled (flags must
@@ -2421,6 +2454,13 @@ out:
 	 * Check the result and unfreeze the slab
 	 */
 	leftover = page->inuse;
+	if (leftover > 0)
+		/*
+		 * Cannot free. Let's at least optimize the freelist. We have
+		 * likely touched all the cachelines with the free pointers
+		 * already so it is cheap to do here.
+		 */
+		resequence_freelist(s, page);
 	unfreeze_slab(s, page);
 	local_irq_restore(flags);
 	return leftover;
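
Note: the freelist resequencing done by resequence_freelist() can be tried
out in user space. The following is only a minimal sketch under stated
assumptions, not the kernel code: struct toy_slab, NR_OBJECTS, OBJ_SIZE and
toy_resequence_freelist() are hypothetical names invented for illustration,
the free pointer is assumed to live in the first word of each free object,
and a plain bool array stands in for the kernel bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_OBJECTS 8	/* hypothetical slab capacity */
#define OBJ_SIZE   32	/* hypothetical object size in bytes */

/* Toy stand-in for a slab page: a flat buffer plus a freelist head. */
struct toy_slab {
	_Alignas(void *) unsigned char mem[NR_OBJECTS * OBJ_SIZE];
	void *freelist;
};

/* Assumption: the free pointer is stored in the first word of the object. */
static void *get_freepointer(void *object)
{
	void *next;

	memcpy(&next, object, sizeof(next));
	return next;
}

static void set_freepointer(void *object, void *next)
{
	memcpy(object, &next, sizeof(next));
}

/* Index of an object within the slab buffer. */
static size_t slab_index(struct toy_slab *slab, void *object)
{
	size_t idx = (size_t)((unsigned char *)object - slab->mem) / OBJ_SIZE;

	assert(idx < NR_OBJECTS);
	return idx;
}

/* Relink the free objects so that the freelist walks upward in memory. */
static void toy_resequence_freelist(struct toy_slab *slab)
{
	bool is_free[NR_OBJECTS] = { false };
	void *last = NULL;
	void *p;
	size_t i;

	/* Pass 1: mark every object that is currently on the freelist. */
	for (p = slab->freelist; p; p = get_freepointer(p))
		is_free[slab_index(slab, p)] = true;

	/* Pass 2: visit objects in address order and chain the free ones. */
	slab->freelist = NULL;
	for (i = 0; i < NR_OBJECTS; i++) {
		if (!is_free[i])
			continue;
		p = slab->mem + i * OBJ_SIZE;
		if (last)
			set_freepointer(last, p);
		else
			slab->freelist = p;
		last = p;
	}

	/* Terminate the list at the highest free object, if there is one. */
	if (last)
		set_freepointer(last, NULL);
}
```

Starting from a freelist that links objects 6, 1 and 3 in that order, the
sketch leaves a list that links 1, 3 and 6, so successive allocations hand
out ascending addresses. Allocated objects are never touched because only
entries found on the original freelist are relinked.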