SLUB: Avoid interrupt disable / enable in hot paths through cmpxchg

A cmpxchg allows us to avoid disabling and enabling interrupts. The cmpxchg
lets us operate on the per cpu freelist even if we may be moved to another
processor on the way to the cmpxchg, so we do not need to be pinned to a
cpu. This may be particularly useful for the RT kernel, where we currently
seem to have major SLAB issues with the per cpu structures. But removing
the constant interrupt disable / enable from slab operations also increases
performance in general.

The hard binding to per cpu structures only comes into play when we enter
the slow path (__slab_alloc and __slab_free). At that point interrupts are
disabled as before.

Determining the page struct in slab_free is a problem because the freelist
pointer is the only data value that we can reliably operate on, so we need
to do a virt_to_page() on the freelist. This makes it impossible to use the
fastpath for a full slab and adds overhead through a second virt_to_page()
for each slab_free(). We really need the virtual memmap patchset to get
slab_free to good performance with this one.

Pro:
- Dirty a single cacheline with a single instruction in slab_alloc to
  accomplish the allocation.
- The critical section in slab_free is also a single instruction (but we
  need to write to the cacheline of the object too).

Con:
- Complex racy freelist management.
- Recalculation of the per cpu structure address is necessary in
  __slab_alloc since the process may be rescheduled while executing in
  slab_alloc.
- Need to determine the page address from the freelist in slab_free,
  resulting in a second virt_to_page() in slab_free.
- slab_free() depends more on the performance of virt_to_page().

Signed-off-by: Christoph Lameter

---
 mm/slub.c |   41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc6-mm1/mm/slub.c
===================================================================
--- linux-2.6.22-rc6-mm1.orig/mm/slub.c	2007-07-07 19:04:17.000000000 -0700
+++ linux-2.6.22-rc6-mm1/mm/slub.c	2007-07-07 19:17:38.000000000 -0700
@@ -1631,7 +1631,9 @@ static void __slab_free(struct kmem_cach
 {
 	void *prior;
 	void **object = (void *)x;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	slab_lock(page);
 
 	if (unlikely(SlabDebug(page)))
@@ -1657,6 +1659,7 @@ checks_ok:
 
 out_unlock:
 	slab_unlock(page);
+	local_irq_restore(flags);
 	return;
 
 slab_empty:
@@ -1667,6 +1670,7 @@ slab_empty:
 		remove_partial(s, page);
 
 	slab_unlock(page);
+	local_irq_restore(flags);
 	discard_slab(s, page);
 	return;
 
@@ -1691,18 +1695,37 @@ static void __always_inline slab_free(st
 	struct page *page, void *x, void *addr)
 {
 	void **object = (void *)x;
-	unsigned long flags;
+	void **freelist;
 	struct kmem_cache_cpu *c;
 
-	local_irq_save(flags);
-	c = get_cpu_slab(s, smp_processor_id());
-	if (likely(page == c->page && c->freelist)) {
-		object[c->offset] = c->freelist;
-		c->freelist = object;
-	} else
-		__slab_free(s, page, x, addr, c->offset);
+redo:
+	c = get_cpu_slab(s, raw_smp_processor_id());
+	freelist = c->freelist;
 
-	local_irq_restore(flags);
+	/*
+	 * Must read c->freelist before c->page. If the page is
+	 * later changed then the freelist also changes which
+	 * will make the cmpxchg() fail.
+	 *
+	 * deactivate_slab() sets c->page to NULL while taking
+	 * the slab lock which provides the corresponding
+	 * smp_wmb() barrier.
+	 */
+	smp_rmb();
+	if (unlikely(c->page != page))
+		goto slow;
+
+	if (unlikely(!freelist))
+		goto slow;
+
+	object[c->offset] = freelist;
+	if (unlikely(cmpxchg(&c->freelist, freelist, object) != freelist))
+		goto redo;
+
+	return;
+
+slow:
+	__slab_free(s, page, x, addr, c->offset);
 }
 
 void kmem_cache_free(struct kmem_cache *s, void *x)
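
Note (not part of the patch): the allocation-side counterpart referred to in
the Pro list above is not shown in this diff. The following is only a rough
sketch of what a cmpxchg-based slab_alloc() fastpath along the same lines
could look like. The helper names get_cpu_slab(), raw_smp_processor_id() and
cmpxchg() are the ones visible in the diff; the __slab_alloc() arguments and
the omission of the NUMA node check are assumptions made for illustration.

	/*
	 * Sketch only: lockless allocation fastpath mirroring the
	 * slab_free() fastpath in the patch above.
	 */
	static void __always_inline *slab_alloc(struct kmem_cache *s,
			gfp_t gfpflags, int node, void *addr)
	{
		void **object;
		struct kmem_cache_cpu *c;

	redo:
		/*
		 * We may be rescheduled to another cpu at any point, so
		 * re-read the per cpu structure on every retry.
		 */
		c = get_cpu_slab(s, raw_smp_processor_id());
		object = c->freelist;

		/* Empty per cpu freelist: slow path with irqs disabled. */
		if (unlikely(!object))
			return __slab_alloc(s, gfpflags, node, addr);

		/*
		 * Advance the freelist to the next object. If an interrupt
		 * or a migration to another cpu changed c->freelist in the
		 * meantime, the cmpxchg fails and we start over.
		 */
		if (unlikely(cmpxchg(&c->freelist, object,
					object[c->offset]) != object))
			goto redo;

		return object;
	}

As in the free path, the only instruction that dirties the per cpu cacheline
is the cmpxchg itself; everything before it is tolerant of being migrated to
another processor.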