From clameter@sgi.com Mon Oct 15 21:21:12 2007
Date: Mon, 15 Oct 2007 21:21:12 -0700 (PDT)
From: Christoph Lameter
To: Adrian Drzewiecki
Subject: Re: slub_test (was Re: SLUB: cpu lists)

On Mon, 15 Oct 2007, Adrian Drzewiecki wrote:

> > > Any clue as to where those cycles go?
> >
> > I suspect kmem_cache_cpu pointer calculation.
>
> In the name of micro-optimization, and over-clockers everywhere,
> dropping kmem_cache_cpu would be the proper thing to do.
>
> lockless_freelist isn't much uglier, and if it works okay, might as
> well keep it, and let the masses QA it.

On the other hand there are also some wins in allocation loops. Tough
call. I guess I will just let it slide in. It cleans up some things
after all.

But we have not found a way to address the regression yet. The numbers
for the remote test (cycles) are:

Size    SLAB    SLUB (2.6.23)   2.6.23-mm1   2.6.23-mm1+cpu_lists
   8    1033    1371            1392          760
  16    1036    1511            1536          804
  32    1056    1785            1809          879
  64    1104    2352            2344         1379
 128    1213    2859            2826         1611
 256    1399    3210            3213         2012
 512    1839    3573            3649         2483
1024    2461    4144            4352         3592
2048    4229    4717            4853         4211
4096    7858    6484            3915         3905

The cpu lists are superior at sizes <64 but suck between 128 and 1024
bytes. Why is this? I guess the costs of stringing objects together by
scanning linked lists. The 4096 value is better due to page allocator
pass through in 2.6.23-mm1.