From clameter@sgi.com Mon Oct 15 21:21:12 2007
Date: Mon, 15 Oct 2007 21:21:12 -0700 (PDT)
From: Christoph Lameter
To: Adrian Drzewiecki
Subject: Re: slub_test (was Re: SLUB: cpu lists)

On Mon, 15 Oct 2007, Adrian Drzewiecki wrote:

> > > Any clue as to where those cycles go?
> >
> > I suspect kmem_cache_cpu pointer calculation.
>
> In the name of micro-optimization, and over-clockers everywhere,
> dropping kmem_cache_cpu would be the proper thing to do.
>
> lockless_freelist isn't much uglier, and if it works okay, might as
> well keep it, and let the masses QA it.

On the other hand there are also some wins in allocation loops. Tough
call. I guess I will just let it slide in. It cleans up some things
after all.

But we have not found a way to address the regression yet. The numbers
for the remote test (cycles) are:

Size    SLAB    SLUB (2.6.23)   2.6.23-mm1   2.6.23-mm1+cpu_lists
   8    1033    1371            1392          760
  16    1036    1511            1536          804
  32    1056    1785            1809          879
  64    1104    2352            2344         1379
 128    1213    2859            2826         1611
 256    1399    3210            3213         2012
 512    1839    3573            3649         2483
1024    2461    4144            4352         3592
2048    4229    4717            4853         4211
4096    7858    6484            3915         3905

The cpu lists are superior at sizes <64 but suck between 128 and 1024
bytes. Why is this? I guess the costs of stringing objects together by
scanning linked lists. The 4096 value is better due to page allocator
pass through in 2.6.23-mm1.