GIT 28e4b71a66881df1ac343f13d06395fa01021e8e git+ssh://master.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6.git#for-mm

28e4b71a66881df1ac343f13d06395fa01021e8e

commit 28e4b71a66881df1ac343f13d06395fa01021e8e
Author: Pekka Enberg
Date: Tue Apr 8 22:26:36 2008 +0300

slub: use typedefs for ->get and ->kick functions

As suggested by Andrew Morton, use typedefs for the SLUB defragmentation ->get and ->kick callback functions.

Signed-off-by: Pekka Enberg

commit 9ea7cc66193775609c0e736de6ef38c5487a5ad9
Author: Christoph Lameter
Date: Tue Apr 8 22:26:31 2008 +0300

SLUB: Trigger defragmentation from memory reclaim

This patch triggers slab defragmentation from memory reclaim. The logical point for this is after slab shrinking has been performed in vmscan.c. At that point the fragmentation of slab caches has increased because objects were freed via the LRUs, so we call kmem_cache_defrag() from there.

shrink_slab() in vmscan.c is called in some contexts to do global shrinking of slabs and in others to do shrinking for a particular zone. Pass the zone to shrink_slab() so that it can call kmem_cache_defrag() and restrict the defragmentation to the node that is under memory pressure.

Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 94c7b233b80d6ecff8304ed20acf071c37de342c
Author: Christoph Lameter
Date: Tue Apr 8 22:26:31 2008 +0300

slub: add defrag statistics

Add statistics counters for slab defragmentation.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 171250363fe803b4dc61301276c2693cce3e5684
Author: Christoph Lameter
Date: Tue Apr 8 22:26:30 2008 +0300

SLUB: Extend slabinfo to support -D and -F options

-F lists caches that support defragmentation.
-C lists caches that use a ctor.

Change field names for defrag_ratio and remote_node_defrag_ratio.

Add determination of the allocation ratio for a slab. The allocation ratio is the percentage of available slots for objects in use.

Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 831d78b552aade2c383cf8d75b180dd35f81a4e3
Author: Christoph Lameter
Date: Tue Apr 8 22:26:30 2008 +0300

SLUB: Add KICKABLE to avoid repeated kick() attempts

Add a flag KICKABLE to be set on slabs with a defragmentation method. Clear the flag if a kick action is not successful in reducing the number of objects in a slab. This avoids future attempts to kick objects out. The KICKABLE flag is set again when all objects of the slab have been allocated (this occurs during removal of a slab from the partial lists).

Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit c963d891d875a9bd39ae44da623c421bc0140937
Author: Christoph Lameter
Date: Tue Apr 8 22:26:30 2008 +0300

SLUB: Slab defrag core

Slab defragmentation may occur:

1. Unconditionally when kmem_cache_shrink() is called on a slab cache by the kernel.
2. Through use of the slabinfo command line tool to trigger slab shrinking.
3. Per node, conditionally, when kmem_cache_defrag() is called.

Defragmentation is only performed if the allocation ratio of a slab is lower than the specified percentage. The ratio is measured by calculating the percentage of objects in use compared to the total number of objects that the slab page can accommodate.

Defragmentation of a slab cache is skipped if less than a tenth of a second has passed since it was last checked. An unsuccessful defrag attempt pauses further attempts for at least one second. This is necessary to limit useless partial list scanning.

The scanning of slab caches is optimized because the defragmentable slabs come first on the list. Thus we can terminate scans on the first slab encountered that does not support defragmentation.

kmem_cache_defrag() takes a node parameter. This can either be -1 if defragmentation should be performed on all nodes, or a node number. If a node number is specified then defragmentation is only performed on that node.

A couple of functions must be set up via a call to kmem_cache_setup_defrag() in order for a slab cache to support defragmentation. These are:

void *get(struct kmem_cache *s, int nr, void **objects)

Must obtain a reference to the listed objects. SLUB guarantees that the objects are still allocated. However, other threads may be blocked in slab_free() attempting to free objects in the slab. These may succeed as soon as get() returns to the slab allocator. The function must be able to detect such situations and void the attempts to free such objects (for example by voiding the corresponding entry in the objects array).

No slab operations may be performed in get(). Interrupts are disabled, so what can be done is very limited. The slab lock for the page that contains the object is taken, and any attempt to perform a slab operation may lead to a deadlock.

get() returns a private pointer that is passed to kick(). Should we be unable to obtain all references then that pointer may indicate to the kick() function that it should not attempt any object removal or move but simply drop the reference counts that were already obtained.

void kick(struct kmem_cache *s, int nr, void **objects, void *get_result)

After SLUB has established references to the objects in a slab it drops all locks and uses kick() to move objects out of the slab. The existence of the objects is guaranteed by virtue of the references obtained earlier via get(). The callback may perform any slab operation since no locks are held at the time of the call.

The callback should remove the object from the slab in some way. This may be accomplished by reclaiming the object and then running kmem_cache_free(), or by reallocating it and then running kmem_cache_free(). Reallocation is advantageous because the partial slabs were just sorted to have the partial slabs with the most objects first. Reallocation is likely to result in filling up a slab in addition to freeing up one slab; a filled up slab can also be removed from the partial list, so there can be a double effect.

kick() does not return a result. SLUB will check the number of remaining objects in the slab. If all objects were removed then we know that the operation was successful.

[penberg@cs.helsinki.fi: fix up locking in __kmem_cache_shrink()]
Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg
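The interface described above can be summarized with a short sketch. This is not part of the patch set: the callback signatures and kmem_cache_setup_defrag() are the ones introduced by this series, while the example cache and the foo_tryget()/foo_evict()/foo_put() helpers are hypothetical and stand in for whatever reference counting and eviction scheme a real user would provide.

/*
 * Hypothetical user of the defragmentation hooks. foo_cache is assumed
 * to have been created with kmem_cache_create() and a constructor,
 * which the patch requires (kmem_cache_setup_defrag() does
 * BUG_ON(!s->ctor)).
 */
static void *foo_get(struct kmem_cache *s, int nr, void **objects)
{
	int i;

	for (i = 0; i < nr; i++) {
		struct foo *f = objects[i];

		/*
		 * Only take references here: interrupts are off and the
		 * slab lock is held, so no slab operations are allowed.
		 * Objects that are concurrently being freed are voided
		 * by clearing their entry in the array.
		 */
		if (!foo_tryget(f))
			objects[i] = NULL;
	}
	return NULL;		/* private pointer handed to foo_kick() */
}

static void foo_kick(struct kmem_cache *s, int nr, void **objects,
			void *private)
{
	int i;

	for (i = 0; i < nr; i++) {
		struct foo *f = objects[i];

		if (!f)
			continue;
		/* No locks held: evict the object and drop the reference */
		foo_evict(f);
		foo_put(f);	/* final put ends in kmem_cache_free() */
	}
}

static int __init foo_init(void)
{
	/* Register the callbacks right after the cache is created */
	kmem_cache_setup_defrag(foo_cache, foo_get, foo_kick);
	return 0;
}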
commit bf197812d941b484ac64da8d28ee885f8938dcff
Author: Christoph Lameter
Date: Tue Apr 8 22:26:30 2008 +0300

SLUB: Sort slab cache list and establish maximum objects for defrag slabs

When defragmenting slabs it is advantageous to have all defragmentable slabs together at the beginning of the list so that there is no need to scan the complete list. Put defragmentable caches first when adding a slab cache and others last.

Determine the maximum number of objects in defragmentable slabs. This allows sizing the allocation of the arrays holding references to these objects later (see the sketch below).

Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg
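The scratch buffer used when vacating a slab page is sized from this maximum. The snippet below restates alloc_scratch() and the buffer layout from the patch further down, only to make the sizing explicit.

/*
 * Scratch space for vacating one slab page, sized for the largest
 * defragmentable cache: max_defrag_slab_objects object pointers
 * (the array handed to get() and kick()) followed by a bitmap with
 * one bit per object marking which slots are in use.
 */
static inline void *alloc_scratch(void)
{
	return kmalloc(max_defrag_slab_objects * sizeof(void *) +
		BITS_TO_LONGS(max_defrag_slab_objects) * sizeof(unsigned long),
		GFP_KERNEL);
}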
commit 66892337435a0d88996057af221e8c18ff91bc14
Author: Christoph Lameter
Date: Tue Apr 8 22:26:29 2008 +0300

SLUB: Add get() and kick() methods

Add the two methods needed for defragmentation and add the display of the methods via the proc interface. Add documentation explaining the use of these methods.

Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 24337bca6e77ab48f459e35690b32ef20a34bda5
Author: Christoph Lameter
Date: Tue Apr 8 22:26:29 2008 +0300

SLUB: Replace ctor field with ops field in /sys/slab/*

Create an ops field in /sys/slab/*/ops to contain all the operations defined on a slab. This will be used to display the additional operations that will be defined soon.

Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 676e1fd04174c8192c0fbf920a798c8f033e1960
Author: Christoph Lameter
Date: Tue Apr 8 22:26:29 2008 +0300

SLUB: Add defrag_ratio field and sysfs support.

The defrag_ratio is used to set the threshold at which defragmentation should be run on a slab cache. The allocation ratio is measured as the percentage of the available slots that are allocated; the percentage will be lower for slabs that are more fragmented.

Add a defrag_ratio field and set it to 30% by default. A limit of 30% specifies that fewer than 3 out of 10 available slots for objects must be in use before slab defragmentation runs.

Reviewed-by: Rik van Riel
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg
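A minimal sketch of the threshold test this implies; the comparison mirrors the check used by __kmem_cache_shrink() later in the patch, but the helper itself and its name are only illustrative.

/*
 * Defragmentation of a slab page is attempted only when its
 * allocation ratio (objects in use / total objects) is below the
 * cache's defrag_ratio percentage. With the default of 30, a page
 * holding 10 objects is a candidate when fewer than 3 are in use.
 */
static inline int defrag_candidate(unsigned int inuse,
				unsigned int objects, int defrag_ratio)
{
	/* equivalent to (100 * inuse / objects) < defrag_ratio */
	return inuse * 100 < (unsigned int)defrag_ratio * objects;
}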
commit d3cd9f89992a413a94e1e48714426b7ee22b06dc
Author: Zhang, Yanmin
Date: Tue Apr 8 17:18:06 2008 +0800

slub: change the formula which calculates min_objects based on number of processors

The current formula to calculate min_objects based on the number of processors is '4 * fls(nr_cpu_ids)', which is not the best optimization on a 16-core Tigerton. If I add 4 to its result, the hackbench result is better. On the 16-core Tigerton, running ./hackbench 100 process 2000 gives:

1) 2.6.25-rc6 slab: 23.5 seconds
2) 2.6.25-rc7 SLUB + slub_min_objects=20: 31 seconds
3) 2.6.25-rc7 SLUB + slub_min_objects=24: 23.5 seconds

So adding 4 to the output of '4 * fls(nr_cpu_ids)' gets a result similar to CONFIG_SLAB=y. This patch adds 4 to the formula. With the patch, the minimum objects per slab is calculated as below:

Processors   min_objects
---------------------------
      1           8
      2          12
      4          16
      8          20
     16          24
     32          28
     64          32
   1024          48
   4096          56

Signed-off-by: Zhang Yanmin
CC: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 8d817d05eeb38a52ebbe2982e65691e507bff1e2
Author: Christoph Lameter
Date: Fri Apr 4 15:50:31 2008 -0700

slub: pack objects denser

Since we now have more orders available, use a denser packing. Increase the slab order if more than 1/16th of a slab would be wasted.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 88a50aab9a11f0bd75c34703a42445a56489446a
Author: Christoph Lameter
Date: Fri Apr 4 15:50:30 2008 -0700

slub: Calculate min_objects based on number of processors.

The minimum objects per slab is calculated based on the number of processors that may come online:

Processors   min_objects
---------------------------
      1           4
      2           8
      4          12
      8          16
     16          20
     32          24
     64          28
   1024          44
   4096          52

The higher the number of processors, the larger the order sizes used for various slab caches will become. This has been shown to address the performance issues in hackbench on 16p etc.

The calculation is only performed if slub_min_objects is zero (the default). If one specifies slub_min_objects on boot then that setting is taken.

Cc: yanmin_zhang@linux.intel.com
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg
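The tables above can be reproduced with a small userspace sketch (not part of the patch) that evaluates the new formula, 4 * (fls(nr_cpu_ids) + 1), using a local equivalent of the kernel's fls().

/*
 * Standalone check of the min_objects formula used above:
 * min_objects = 4 * (fls(nr_cpu_ids) + 1), where fls() returns the
 * position of the highest set bit (fls(1) == 1, fls(16) == 5).
 */
#include <stdio.h>

static int fls(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

int main(void)
{
	unsigned int cpus[] = { 1, 2, 4, 8, 16, 32, 64, 1024, 4096 };
	unsigned int i;

	for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
		printf("%4u processors -> min_objects = %d\n",
		       cpus[i], 4 * (fls(cpus[i]) + 1));
	return 0;
}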
commit b81da56f11a47ec547f4a745e329c6945d85637b
Author: Christoph Lameter
Date: Fri Apr 4 15:50:29 2008 -0700

slub: Drop DEFAULT_MAX_ORDER / DEFAULT_MIN_OBJECTS

We can now fall back to order 0 slabs. So set slub_max_order to PAGE_ALLOC_COSTLY_ORDER but keep slub_min_objects at 4. This will mostly preserve the orders used in 2.6.25. E.g. the 2k kmalloc slab will use order 1 allocations and the 4k kmalloc slab order 2.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 23ac45a7efac366d5eabfbe3a94f16f992311f08
Author: Christoph Lameter
Date: Fri Apr 4 15:50:28 2008 -0700

slub: Simplify any_slab_object checks

Since we now have a total_objects counter per node, use that to check for the presence of any objects. The loop over all cpu slabs is not that useful, since any cpu slab would require an object allocation first. So drop it.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit acd49c885e03f087c31f49e7c42ccb8befbf4009
Author: Christoph Lameter
Date: Fri Apr 4 15:50:27 2008 -0700

slub: Make the order configurable for each slab cache

Makes /sys/kernel/slab//order writable. The allocation order of a slab cache can then be changed dynamically during runtime. This can be used to override the objects-per-slab value established with the slub_min_objects setting that was manually specified or calculated on bootup.

A change of the slab order can occur while allocate_slab() runs. allocate_slab() needs the order and the number of slab objects, both of which are changed by a change of order. Both are put into a single word (struct kmem_cache_order_objects) so they can be atomically updated and retrieved.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit d82693414e6841a6ce9ae30a7d767a1a98b30161
Author: Christoph Lameter
Date: Fri Apr 4 15:50:26 2008 -0700

slub: Drop fallback to page allocator method

There is now a generic method of falling back to a slab page of minimal order, so the fallback to kmalloc_large() is no longer needed.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit c5dafdd6cf205e2ecb5ef513d0f3deab82ed45dd
Author: Christoph Lameter
Date: Fri Apr 4 15:50:25 2008 -0700

slub: Fallback to minimal order during slab page allocation

If any higher order allocation fails then fall back to the smallest order necessary to contain at least one object. This enables fallback to order 0 pages for all allocations. The fallback will waste more memory (objects will not fit neatly) and the fallback slabs will not be as efficient as larger slabs since they contain fewer objects.

Note that SLAB also depends on order 1 allocations for some slabs that waste too much memory if forced into a PAGE_SIZE'd page. SLUB can now deal with failing order 1 allocations, which SLAB cannot do.

Add a new field min that contains the number of objects for the smallest possible order for a slab cache.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 5bfe92162b96c73a6244400e54676fa73617c6a4
Author: Christoph Lameter
Date: Fri Apr 4 15:50:24 2008 -0700

slub: Update statistics handling for variable order slabs

Change the statistics to consider that slabs of the same slab cache can have a different number of objects in them since they may be of different order.

Provide a new sysfs field total_objects which shows the total number of objects that the allocated slabs of a slab cache could hold.

Add a max field that holds the largest slab order that was ever used for a slab cache.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 5293be7138acda6968f082c57462837eb4d5040c
Author: Christoph Lameter
Date: Fri Apr 4 15:50:23 2008 -0700

slub: Add kmem_cache_order_objects struct

Pack the order and the number of objects into a single word. This saves some memory in the kmem_cache structure and more importantly allows us to fetch both values atomically.

Later the slab orders become runtime configurable and we need to fetch these two items together in order to properly allocate a slab and initialize its objects. Fix the race by fetching the order and the number of objects in one word.

[penberg@cs.helsinki.fi: fix memset() page order in new_slab()]
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 303383ffc854607f1ea0e6f0b9a7467549b2393a
Author: Christoph Lameter
Date: Fri Apr 4 15:50:22 2008 -0700

slub: for_each_object must be passed the number of objects in a slab

Pass the number of objects to the for_each_object macro. Most of these changes are debug related.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 2c494faabfb92d03b1a6cb8d0295dd2db129a9ab
Author: Christoph Lameter
Date: Fri Apr 4 15:50:21 2008 -0700

slub: Store max number of objects in the page struct.

Split the inuse field up to be able to store the number of objects in this page in the page struct as well. This is necessary if we want to have pages of various orders for a slab. It also avoids touching struct kmem_cache cachelines in __slab_alloc().

Update the diagnostic code to check the number of objects and make sure that the number of objects always stays within the bounds of a 16-bit unsigned integer.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 910e419b394acb4f280aea33ab47eb52d654b4aa
Author: Christoph Lameter
Date: Sat Apr 5 20:02:13 2008 +0300

slub: No need for per node slab counters if !SLUB_DEBUG

The per node counters are used mainly for showing data through the sysfs API. If that API is not compiled in then there is no point in keeping track of this data. Disable the counters for the number of slabs and the total number of objects if !SLUB_DEBUG. Incrementing the per node counters also accesses a potentially contended cacheline, so this could actually be a performance benefit to embedded systems.

SLABINFO support is also affected. It now must depend on SLUB_DEBUG (which is on by default).

The patch also avoids a check for a NULL kmem_cache_node pointer in new_slab() if the system is not compiled with NUMA support.

[penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG]
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 6aac8c3bc0db55415b461e9cb80086ac025e1bab
Author: Christoph Lameter
Date: Sat Apr 5 18:44:58 2008 +0300

slub: Move map/flag clearing to __free_slab

__free_slab does some diagnostics. The resetting of mapcount etc. in discard_slab() can interfere with debug processing, so move the reset to immediately before the page is freed.

Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

commit 20666019174a16a129dd4b20130fd4adb09c3702
Author: Christoph Lameter
Date: Fri Apr 4 13:00:55 2008 -0700

slub: Fixes to per cpu stat output in sysfs

Only output per cpu stats if the kernel is built for SMP.

Use a capital "C" as a leading character for the processor number (same as the numa statistics that also use a capital letter "N").
Signed-off-by: Christoph Lameter Signed-off-by: Pekka Enberg commit d1f8e10ba374edee5aa456594e0df0373ca1a0ad Author: Christoph Lameter Date: Fri Apr 4 13:00:54 2008 -0700 slub: Deal with config variable dependencies count_partial() is used by both slabinfo and the sysfs proc support. Move the function directly before the beginning of the sysfs code so that it can be easily found. Rework the preprocessor conditional to take into account that slub sysfs support depends on CONFIG_SYSFS *and* CONFIG_SLUB_DEBUG. Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There is no point of keeping statistics if no one can restrive them. Signed-off-by: Christoph Lameter Signed-off-by: Pekka Enberg commit efbda3893a44778241f02a77bf241f4cbd1cdccf Author: Christoph Lameter Date: Fri Apr 4 13:00:53 2008 -0700 slub: Reduce #ifdef ZONE_DMA by moving kmalloc_caches_dma near dma logic Move the definition of kmalloc_caches_dma() into a later #ifdef CONFIG_ZONE_DMA. This saves one #ifdef and leaves us with a total of two #ifdefs for dma slab support. Signed-off-by: Christoph Lameter Signed-off-by: Pekka Enberg commit 80ef72015572c71f81cfb128e208b2a0f51c7a66 Author: Pekka Enberg Date: Fri Apr 4 13:00:52 2008 -0700 slub: Initialize per-cpu stats As spotted by kmemcheck, we need to initialize the per-CPU ->stat array before using it. [kmem_cache_cpu structures are usually allocated from arrays defined via DEFINE_PER_CPU that are zeroed so we have not noticed this so far --cl]. Reported-by: Vegard Nossum Signed-off-by: Christoph Lameter Signed-off-by: Pekka Enberg Documentation/vm/slabinfo.c | 103 ++++-- fs/drop_caches.c | 2 +- include/linux/mm.h | 2 +- include/linux/mm_types.h | 5 +- include/linux/slab_def.h | 5 + include/linux/slob_def.h | 5 + include/linux/slub_def.h | 70 ++++- init/Kconfig | 2 +- lib/Kconfig.debug | 2 +- mm/slub.c | 914 ++++++++++++++++++++++++++++++------------- mm/vmscan.c | 26 +- 11 files changed, 823 insertions(+), 313 deletions(-) diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c index 22d7e3e..e14f026 100644 --- a/Documentation/vm/slabinfo.c +++ b/Documentation/vm/slabinfo.c @@ -31,7 +31,9 @@ struct slabinfo { int hwcache_align, object_size, objs_per_slab; int sanity_checks, slab_size, store_user, trace; int order, poison, reclaim_account, red_zone; - unsigned long partial, objects, slabs; + int defrag, ctor; + int defrag_ratio, remote_node_defrag_ratio; + unsigned long partial, objects, slabs, objects_partial, objects_total; unsigned long alloc_fastpath, alloc_slowpath; unsigned long free_fastpath, free_slowpath; unsigned long free_frozen, free_add_partial, free_remove_partial; @@ -39,6 +41,9 @@ struct slabinfo { unsigned long cpuslab_flush, deactivate_full, deactivate_empty; unsigned long deactivate_to_head, deactivate_to_tail; unsigned long deactivate_remote_frees; + unsigned long shrink_calls, shrink_attempt_defrag, shrink_empty_slab; + unsigned long shrink_slab_skipped, shrink_slab_reclaimed; + unsigned long shrink_object_reclaim_failure; int numa[MAX_NODES]; int numa_partial[MAX_NODES]; } slabinfo[MAX_SLABS]; @@ -64,6 +69,8 @@ int show_slab = 0; int skip_zero = 1; int show_numa = 0; int show_track = 0; +int show_defrag = 0; +int show_ctor = 0; int show_first_alias = 0; int validate = 0; int shrink = 0; @@ -100,13 +107,15 @@ void fatal(const char *x, ...) void usage(void) { printf("slabinfo 5/7/2007. (c) 2007 sgi. 
clameter@sgi.com\n\n" - "slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]\n" + "slabinfo [-aCdDefFhnpvtsz] [-d debugopts] [slab-regexp]\n" "-a|--aliases Show aliases\n" "-A|--activity Most active slabs first\n" + "-C|--ctor Show slabs with ctors\n" "-d|--debug= Set/Clear Debug options\n" "-D|--display-active Switch line format to activity\n" "-e|--empty Show empty slabs\n" "-f|--first-alias Show first alias\n" + "-F|--defrag Show defragmentable caches\n" "-h|--help Show usage information\n" "-i|--inverted Inverted list\n" "-l|--slabs Show slabs\n" @@ -296,7 +305,7 @@ void first_line(void) printf("Name Objects Alloc Free %%Fast\n"); else printf("Name Objects Objsize Space " - "Slabs/Part/Cpu O/S O %%Fr %%Ef Flg\n"); + "Slabs/Part/Cpu O/S O %%Ra %%Ef Flg\n"); } /* @@ -345,7 +354,7 @@ void slab_numa(struct slabinfo *s, int mode) return; if (!line) { - printf("\n%-21s:", mode ? "NUMA nodes" : "Slab"); + printf("\n%-21s: Rto ", mode ? "NUMA nodes" : "Slab"); for(node = 0; node <= highest_node; node++) printf(" %4d", node); printf("\n----------------------"); @@ -354,6 +363,7 @@ void slab_numa(struct slabinfo *s, int mode) printf("\n"); } printf("%-21s ", mode ? "All slabs" : s->name); + printf("%3d ", s->remote_node_defrag_ratio); for(node = 0; node <= highest_node; node++) { char b[20]; @@ -459,22 +469,28 @@ void slab_stats(struct slabinfo *s) printf("Total %8lu %8lu\n\n", total_alloc, total_free); - if (s->cpuslab_flush) - printf("Flushes %8lu\n", s->cpuslab_flush); - - if (s->alloc_refill) - printf("Refill %8lu\n", s->alloc_refill); + if (s->cpuslab_flush || s->alloc_refill) + printf("CPU Slab : Flushes=%lu Refills=%lu\n", + s->cpuslab_flush, s->alloc_refill); total = s->deactivate_full + s->deactivate_empty + s->deactivate_to_head + s->deactivate_to_tail; if (total) - printf("Deactivate Full=%lu(%lu%%) Empty=%lu(%lu%%) " + printf("Deactivate: Full=%lu(%lu%%) Empty=%lu(%lu%%) " "ToHead=%lu(%lu%%) ToTail=%lu(%lu%%)\n", s->deactivate_full, (s->deactivate_full * 100) / total, s->deactivate_empty, (s->deactivate_empty * 100) / total, s->deactivate_to_head, (s->deactivate_to_head * 100) / total, s->deactivate_to_tail, (s->deactivate_to_tail * 100) / total); + + if (s->shrink_calls) + printf("Shrink : Calls=%lu Attempts=%lu Empty=%lu Successful=%lu\n", + s->shrink_calls, s->shrink_attempt_defrag, + s->shrink_empty_slab, s->shrink_slab_reclaimed); + if (s->shrink_slab_skipped || s->shrink_object_reclaim_failure) + printf("Defrag : Slabs skipped=%lu Object reclaim failure=%lu\n", + s->shrink_slab_skipped, s->shrink_object_reclaim_failure); } void report(struct slabinfo *s) @@ -492,6 +508,8 @@ void report(struct slabinfo *s) printf("** Slabs are destroyed via RCU\n"); if (s->reclaim_account) printf("** Reclaim accounting active\n"); + if (s->defrag) + printf("** Defragmentation at %d%%\n", s->defrag_ratio); printf("\nSizes (bytes) Slabs Debug Memory\n"); printf("------------------------------------------------------------------------\n"); @@ -539,8 +557,15 @@ void slabcache(struct slabinfo *s) if (show_empty && s->slabs) return; + if (show_defrag && !s->defrag) + return; + + if (show_ctor && !s->ctor) + return; + store_size(size_str, slab_size(s)); - snprintf(dist_str, 40, "%lu/%lu/%d", s->slabs, s->partial, s->cpu_slabs); + snprintf(dist_str, 40, "%lu/%lu/%d", s->slabs - s->cpu_slabs, + s->partial, s->cpu_slabs); if (!line++) first_line(); @@ -549,6 +574,10 @@ void slabcache(struct slabinfo *s) *p++ = '*'; if (s->cache_dma) *p++ = 'd'; + if (s->defrag) + *p++ = 'F'; + if (s->ctor) + *p++ = 'C'; if 
(s->hwcache_align) *p++ = 'A'; if (s->poison) @@ -582,7 +611,8 @@ void slabcache(struct slabinfo *s) printf("%-21s %8ld %7d %8s %14s %4d %1d %3ld %3ld %s\n", s->name, s->objects, s->object_size, size_str, dist_str, s->objs_per_slab, s->order, - s->slabs ? (s->partial * 100) / s->slabs : 100, + s->slabs ? (s->partial * 100) / + (s->slabs * s->objs_per_slab) : 100, s->slabs ? (s->objects * s->object_size * 100) / (s->slabs * (page_size << s->order)) : 100, flags); @@ -776,7 +806,6 @@ void totals(void) unsigned long used; unsigned long long wasted; unsigned long long objwaste; - long long objects_in_partial_slabs; unsigned long percentage_partial_slabs; unsigned long percentage_partial_objs; @@ -790,18 +819,11 @@ void totals(void) wasted = size - used; objwaste = s->slab_size - s->object_size; - objects_in_partial_slabs = s->objects - - (s->slabs - s->partial - s ->cpu_slabs) * - s->objs_per_slab; - - if (objects_in_partial_slabs < 0) - objects_in_partial_slabs = 0; - percentage_partial_slabs = s->partial * 100 / s->slabs; if (percentage_partial_slabs > 100) percentage_partial_slabs = 100; - percentage_partial_objs = objects_in_partial_slabs * 100 + percentage_partial_objs = s->objects_partial * 100 / s->objects; if (percentage_partial_objs > 100) @@ -823,8 +845,8 @@ void totals(void) min_objects = s->objects; if (used < min_used) min_used = used; - if (objects_in_partial_slabs < min_partobj) - min_partobj = objects_in_partial_slabs; + if (s->objects_partial < min_partobj) + min_partobj = s->objects_partial; if (percentage_partial_slabs < min_ppart) min_ppart = percentage_partial_slabs; if (percentage_partial_objs < min_ppartobj) @@ -848,8 +870,8 @@ void totals(void) max_objects = s->objects; if (used > max_used) max_used = used; - if (objects_in_partial_slabs > max_partobj) - max_partobj = objects_in_partial_slabs; + if (s->objects_partial > max_partobj) + max_partobj = s->objects_partial; if (percentage_partial_slabs > max_ppart) max_ppart = percentage_partial_slabs; if (percentage_partial_objs > max_ppartobj) @@ -864,7 +886,7 @@ void totals(void) total_objects += s->objects; total_used += used; - total_partobj += objects_in_partial_slabs; + total_partobj += s->objects_partial; total_ppart += percentage_partial_slabs; total_ppartobj += percentage_partial_objs; @@ -1160,6 +1182,8 @@ void read_slab_dir(void) slab->hwcache_align = get_obj("hwcache_align"); slab->object_size = get_obj("object_size"); slab->objects = get_obj("objects"); + slab->objects_partial = get_obj("objects_partial"); + slab->objects_total = get_obj("objects_total"); slab->objs_per_slab = get_obj("objs_per_slab"); slab->order = get_obj("order"); slab->partial = get_obj("partial"); @@ -1193,7 +1217,24 @@ void read_slab_dir(void) slab->deactivate_to_head = get_obj("deactivate_to_head"); slab->deactivate_to_tail = get_obj("deactivate_to_tail"); slab->deactivate_remote_frees = get_obj("deactivate_remote_frees"); + slab->shrink_calls = get_obj("shrink_calls"); + slab->shrink_attempt_defrag = get_obj("shrink_attempt_defrag"); + slab->shrink_empty_slab = get_obj("shrink_empty_slab"); + slab->shrink_slab_skipped = get_obj("shrink_slab_skipped"); + slab->shrink_slab_reclaimed = get_obj("shrink_slab_reclaimed"); + slab->shrink_object_reclaim_failure = + get_obj("shrink_object_reclaim_failure"); + slab->defrag_ratio = get_obj("defrag_ratio"); + slab->remote_node_defrag_ratio = + get_obj("remote_node_defrag_ratio"); chdir(".."); + if (read_slab_obj(slab, "ops")) { + if (strstr(buffer, "ctor :")) + slab->ctor = 1; + if (strstr(buffer, 
"kick :")) + slab->defrag = 1; + } + if (slab->name[0] == ':') alias_targets++; slab++; @@ -1244,10 +1285,12 @@ void output_slabs(void) struct option opts[] = { { "aliases", 0, NULL, 'a' }, { "activity", 0, NULL, 'A' }, + { "ctor", 0, NULL, 'C' }, { "debug", 2, NULL, 'd' }, { "display-activity", 0, NULL, 'D' }, { "empty", 0, NULL, 'e' }, { "first-alias", 0, NULL, 'f' }, + { "defrag", 0, NULL, 'F' }, { "help", 0, NULL, 'h' }, { "inverted", 0, NULL, 'i'}, { "numa", 0, NULL, 'n' }, @@ -1270,7 +1313,7 @@ int main(int argc, char *argv[]) page_size = getpagesize(); - while ((c = getopt_long(argc, argv, "aAd::Defhil1noprstvzTS", + while ((c = getopt_long(argc, argv, "aACd::DefFhil1noprstvzTS", opts, NULL)) != -1) switch (c) { case '1': @@ -1326,6 +1369,12 @@ int main(int argc, char *argv[]) case 'z': skip_zero = 0; break; + case 'C': + show_ctor = 1; + break; + case 'F': + show_defrag = 1; + break; case 'T': show_totals = 1; break; diff --git a/fs/drop_caches.c b/fs/drop_caches.c index 59375ef..fb58e63 100644 --- a/fs/drop_caches.c +++ b/fs/drop_caches.c @@ -50,7 +50,7 @@ void drop_slab(void) int nr_objects; do { - nr_objects = shrink_slab(1000, GFP_KERNEL, 1000); + nr_objects = shrink_slab(1000, GFP_KERNEL, 1000, NULL); } while (nr_objects > 10); } diff --git a/include/linux/mm.h b/include/linux/mm.h index b695875..c322ae3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1206,7 +1206,7 @@ int in_gate_area_no_task(unsigned long addr); int drop_caches_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask, - unsigned long lru_pages); + unsigned long lru_pages, struct zone *z); void drop_pagecache(void); void drop_slab(void); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index af190ce..e0bd223 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -42,7 +42,10 @@ struct page { * to show when page is mapped * & limit reverse map searches. 
*/ - unsigned int inuse; /* SLUB: Nr of objects */ + struct { /* SLUB */ + u16 inuse; + u16 objects; + }; }; union { struct { diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index 39c3a5e..3a3811a 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -95,4 +95,9 @@ found: #endif /* CONFIG_NUMA */ +static inline void kmem_cache_setup_defrag(struct kmem_cache *s, + void *(*get)(struct kmem_cache *, int nr, void **), + void (*kick)(struct kmem_cache *, int nr, void **, void *private)) {} +static inline int kmem_cache_defrag(int node) { return 0; } + #endif /* _LINUX_SLAB_DEF_H */ diff --git a/include/linux/slob_def.h b/include/linux/slob_def.h index 59a3fa4..1e94782 100644 --- a/include/linux/slob_def.h +++ b/include/linux/slob_def.h @@ -33,4 +33,9 @@ static inline void *__kmalloc(size_t size, gfp_t flags) return kmalloc(size, flags); } +static inline void kmem_cache_setup_defrag(struct kmem_cache *s, + void *(*get)(struct kmem_cache *, int nr, void **), + void (*kick)(struct kmem_cache *, int nr, void **, void *private)) {} +static inline int kmem_cache_defrag(int node) { return 0; } + #endif /* __LINUX_SLOB_DEF_H */ diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index b00c1c7..b2933d1 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -29,8 +29,18 @@ enum stat_item { DEACTIVATE_TO_HEAD, /* Cpu slab was moved to the head of partials */ DEACTIVATE_TO_TAIL, /* Cpu slab was moved to the tail of partials */ DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */ + ORDER_FALLBACK, /* Number of times fallback was necessary */ + SHRINK_CALLS, /* Number of invocations of kmem_cache_shrink */ + SHRINK_ATTEMPT_DEFRAG, /* Slabs that were attempted to be reclaimed */ + SHRINK_EMPTY_SLAB, /* Shrink encountered and freed empty slab */ + SHRINK_SLAB_SKIPPED, /* Slab reclaim skipped an slab (busy etc) */ + SHRINK_SLAB_RECLAIMED, /* Successfully reclaimed slabs */ + SHRINK_OBJECT_RECLAIM_FAILED, /* Callbacks signaled busy objects */ NR_SLUB_STAT_ITEMS }; +typedef void *(*kmem_get_fn_t)(struct kmem_cache *, int, void **); +typedef void (*kmem_kick_fn_t)(struct kmem_cache *, int, void **, void *); + struct kmem_cache_cpu { void **freelist; /* Pointer to first free per cpu object */ struct page *page; /* The slab from which we are allocating */ @@ -45,14 +55,24 @@ struct kmem_cache_cpu { struct kmem_cache_node { spinlock_t list_lock; /* Protect partial list and nr_partial */ unsigned long nr_partial; - atomic_long_t nr_slabs; struct list_head partial; #ifdef CONFIG_SLUB_DEBUG + atomic_long_t nr_slabs; + atomic_long_t total_objects; struct list_head full; #endif }; /* + * Word size structure that can be atomically updated or read and that + * contains both the order and the number of objects that a slab of the + * given order would contain. + */ +struct kmem_cache_order_objects { + unsigned long x; +}; + +/* * Slab cache management. */ struct kmem_cache { @@ -61,7 +81,7 @@ struct kmem_cache { int size; /* The size of an object including meta data */ int objsize; /* The size of an object without meta data */ int offset; /* Free pointer offset. 
*/ - int order; /* Current preferred allocation order */ + struct kmem_cache_order_objects oo; /* * Avoid an extra cache line for UP, SMP and for the node local to @@ -70,12 +90,52 @@ struct kmem_cache { struct kmem_cache_node local_node; /* Allocation and freeing of slabs */ - int objects; /* Number of objects in slab */ + struct kmem_cache_order_objects max; + struct kmem_cache_order_objects min; gfp_t allocflags; /* gfp flags to use on each alloc */ int refcount; /* Refcount for slab cache destroy */ + unsigned long next_defrag; void (*ctor)(struct kmem_cache *, void *); + /* + * Called with slab lock held and interrupts disabled. + * No slab operation may be performed in get(). + * + * Parameters passed are the number of objects to process + * and an array of pointers to objects for which we + * need references. + * + * Returns a pointer that is passed to the kick function. + * If all objects cannot be moved then the pointer may + * indicate that this wont work and then kick can simply + * remove the references that were already obtained. + * + * The array passed to get() is also passed to kick(). The + * function may remove objects by setting array elements to NULL. + */ + kmem_get_fn_t get; + + /* + * Called with no locks held and interrupts enabled. + * Any operation may be performed in kick(). + * + * Parameters passed are the number of objects in the array, + * the array of pointers to the objects and the pointer + * returned by get(). + * + * Success is checked by examining the number of remaining + * objects in the slab. + */ + kmem_kick_fn_t kick; + int inuse; /* Offset to metadata */ int align; /* Alignment */ + int defrag_ratio; /* + * Ratio used to check the percentage of + * objects allocate in a slab page. + * If less than this ratio is allocated + * then reclaim attempts are made. + */ + const char *name; /* Name (only for display!) */ struct list_head list; /* List of slab caches */ #ifdef CONFIG_SLUB_DEBUG @@ -231,4 +291,8 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node) } #endif +void kmem_cache_setup_defrag(struct kmem_cache *s, kmem_get_fn_t get, + kmem_kick_fn_t kick); +int kmem_cache_defrag(int node); + #endif /* _LINUX_SLUB_DEF_H */ diff --git a/init/Kconfig b/init/Kconfig index a97924b..7fccf09 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -763,7 +763,7 @@ endmenu # General setup config SLABINFO bool depends on PROC_FS - depends on SLAB || SLUB + depends on SLAB || SLUB_DEBUG default y config RT_MUTEXES diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 0796c1a..eef557d 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -211,7 +211,7 @@ config SLUB_DEBUG_ON config SLUB_STATS default n bool "Enable SLUB performance statistics" - depends on SLUB + depends on SLUB && SLUB_DEBUG && SYSFS help SLUB statistics are useful to debug SLUBs allocation behavior in order find ways to optimize the allocator. 
This should never be diff --git a/mm/slub.c b/mm/slub.c index acc975f..4b694a7 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -101,6 +101,7 @@ */ #define FROZEN (1 << PG_active) +#define KICKABLE (1 << PG_dirty) #ifdef CONFIG_SLUB_DEBUG #define SLABDEBUG (1 << PG_error) @@ -138,6 +139,21 @@ static inline void ClearSlabDebug(struct page *page) page->flags &= ~SLABDEBUG; } +static inline int SlabKickable(struct page *page) +{ + return page->flags & KICKABLE; +} + +static inline void SetSlabKickable(struct page *page) +{ + page->flags |= KICKABLE; +} + +static inline void ClearSlabKickable(struct page *page) +{ + page->flags &= ~KICKABLE; +} + /* * Issues still to be resolved: * @@ -149,25 +165,6 @@ static inline void ClearSlabDebug(struct page *page) /* Enable to test recovery from slab corruption on boot */ #undef SLUB_RESILIENCY_TEST -#if PAGE_SHIFT <= 12 - -/* - * Small page size. Make sure that we do not fragment memory - */ -#define DEFAULT_MAX_ORDER 1 -#define DEFAULT_MIN_OBJECTS 4 - -#else - -/* - * Large page machines are customarily able to handle larger - * page orders. - */ -#define DEFAULT_MAX_ORDER 2 -#define DEFAULT_MIN_OBJECTS 8 - -#endif - /* * Mininum number of partial slabs. These will be left on the partial * lists even if they are empty. kmem_cache_shrink may reclaim them. @@ -176,10 +173,10 @@ static inline void ClearSlabDebug(struct page *page) /* * Maximum number of desirable partial slabs. - * The existence of more partial slabs makes kmem_cache_shrink - * sort the partial list by the number of objects in the. + * More slabs cause kmem_cache_shrink to sort the slabs by objects + * and triggers slab defragmentation. */ -#define MAX_PARTIAL 10 +#define MAX_PARTIAL 20 #define DEBUG_DEFAULT_FLAGS (SLAB_DEBUG_FREE | SLAB_RED_ZONE | \ SLAB_POISON | SLAB_STORE_USER) @@ -204,8 +201,6 @@ static inline void ClearSlabDebug(struct page *page) /* Internal SLUB flags */ #define __OBJECT_POISON 0x80000000 /* Poison object */ #define __SYSFS_ADD_DEFERRED 0x40000000 /* Not yet visible via sysfs */ -#define __KMALLOC_CACHE 0x20000000 /* objects freed using kfree */ -#define __PAGE_ALLOC_FALLBACK 0x10000000 /* Allow fallback to page alloc */ /* Not all arches define cache_line_size */ #ifndef cache_line_size @@ -229,6 +224,9 @@ static enum { static DECLARE_RWSEM(slub_lock); static LIST_HEAD(slab_caches); +/* Maximum objects in defragmentable slabs */ +static unsigned int max_defrag_slab_objects __read_mostly; + /* * Tracking user of a slab. 
*/ @@ -301,7 +299,7 @@ static inline int check_valid_pointer(struct kmem_cache *s, return 1; base = page_address(page); - if (object < base || object >= base + s->objects * s->size || + if (object < base || object >= base + page->objects * s->size || (object - base) % s->size) { return 0; } @@ -327,8 +325,8 @@ static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp) } /* Loop over all objects in a slab */ -#define for_each_object(__p, __s, __addr) \ - for (__p = (__addr); __p < (__addr) + (__s)->objects * (__s)->size;\ +#define for_each_object(__p, __s, __addr, __objects) \ + for (__p = (__addr); __p < (__addr) + (__objects) * (__s)->size;\ __p += (__s)->size) /* Scan freelist */ @@ -341,6 +339,26 @@ static inline int slab_index(void *p, struct kmem_cache *s, void *addr) return (p - addr) / s->size; } +static inline struct kmem_cache_order_objects oo_make(int order, + unsigned long size) +{ + struct kmem_cache_order_objects x = { + (order << 16) + (PAGE_SIZE << order) / size + }; + + return x; +} + +static inline int oo_order(struct kmem_cache_order_objects x) +{ + return x.x >> 16; +} + +static inline int oo_objects(struct kmem_cache_order_objects x) +{ + return x.x & ((1 << 16) - 1); +} + #ifdef CONFIG_SLUB_DEBUG /* * Debug settings: @@ -451,8 +469,8 @@ static void print_tracking(struct kmem_cache *s, void *object) static void print_page_info(struct page *page) { - printk(KERN_ERR "INFO: Slab 0x%p used=%u fp=0x%p flags=0x%04lx\n", - page, page->inuse, page->freelist, page->flags); + printk(KERN_ERR "INFO: Slab 0x%p objects=%u used=%u fp=0x%p flags=0x%04lx\n", + page, page->objects, page->inuse, page->freelist, page->flags); } @@ -652,6 +670,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct page *page, u8 *p) p + off, POISON_INUSE, s->size - off); } +/* Check the pad bytes at the end of a slab page */ static int slab_pad_check(struct kmem_cache *s, struct page *page) { u8 *start; @@ -664,20 +683,20 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page) return 1; start = page_address(page); - end = start + (PAGE_SIZE << s->order); - length = s->objects * s->size; - remainder = end - (start + length); + length = (PAGE_SIZE << compound_order(page)); + end = start + length; + remainder = length % s->size; if (!remainder) return 1; - fault = check_bytes(start + length, POISON_INUSE, remainder); + fault = check_bytes(end - remainder, POISON_INUSE, remainder); if (!fault) return 1; while (end > fault && end[-1] == POISON_INUSE) end--; slab_err(s, page, "Padding overwritten. 
0x%p-0x%p", fault, end - 1); - print_section("Padding", start, length); + print_section("Padding", end - remainder, remainder); restore_bytes(s, "slab padding", POISON_INUSE, start, end); return 0; @@ -739,15 +758,24 @@ static int check_object(struct kmem_cache *s, struct page *page, static int check_slab(struct kmem_cache *s, struct page *page) { + int maxobj; + VM_BUG_ON(!irqs_disabled()); if (!PageSlab(page)) { slab_err(s, page, "Not a valid slab page"); return 0; } - if (page->inuse > s->objects) { + + maxobj = (PAGE_SIZE << compound_order(page)) / s->size; + if (page->objects > maxobj) { + slab_err(s, page, "objects %u > max %u", + s->name, page->objects, maxobj); + return 0; + } + if (page->inuse > page->objects) { slab_err(s, page, "inuse %u > max %u", - s->name, page->inuse, s->objects); + s->name, page->inuse, page->objects); return 0; } /* Slab_pad_check fixes things up after itself */ @@ -764,8 +792,9 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search) int nr = 0; void *fp = page->freelist; void *object = NULL; + unsigned long max_objects; - while (fp && nr <= s->objects) { + while (fp && nr <= page->objects) { if (fp == search) return 1; if (!check_valid_pointer(s, page, fp)) { @@ -777,7 +806,7 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search) } else { slab_err(s, page, "Freepointer corrupt"); page->freelist = NULL; - page->inuse = s->objects; + page->inuse = page->objects; slab_fix(s, "Freelist cleared"); return 0; } @@ -788,16 +817,27 @@ static int on_freelist(struct kmem_cache *s, struct page *page, void *search) nr++; } - if (page->inuse != s->objects - nr) { + max_objects = (PAGE_SIZE << compound_order(page)) / s->size; + if (max_objects > 65535) + max_objects = 65535; + + if (page->objects != max_objects) { + slab_err(s, page, "Wrong number of objects. Found %d but " + "should be %d", page->objects, max_objects); + page->objects = max_objects; + slab_fix(s, "Number of objects adjusted."); + } + if (page->inuse != page->objects - nr) { slab_err(s, page, "Wrong object count. Counter is %d but " - "counted were %d", page->inuse, s->objects - nr); - page->inuse = s->objects - nr; + "counted were %d", page->inuse, page->objects - nr); + page->inuse = page->objects - nr; slab_fix(s, "Object count adjusted."); } return search == NULL; } -static void trace(struct kmem_cache *s, struct page *page, void *object, int alloc) +static void +trace(struct kmem_cache *s, struct page *page, void *object, int alloc) { if (s->flags & SLAB_TRACE) { printk(KERN_INFO "TRACE %s %s 0x%p inuse=%d fp=0x%p\n", @@ -837,6 +877,38 @@ static void remove_full(struct kmem_cache *s, struct page *page) spin_unlock(&n->list_lock); } +/* Tracking of the number of slabs for debugging purposes */ +static inline unsigned long slabs_node(struct kmem_cache *s, int node) +{ + struct kmem_cache_node *n = get_node(s, node); + + return atomic_long_read(&n->nr_slabs); +} + +static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects) +{ + struct kmem_cache_node *n = get_node(s, node); + + /* + * May be called early in order to allocate a slab for the + * kmem_cache_node structure. Solve the chicken-egg + * dilemma by deferring the increment of the count during + * bootstrap (see early_kmem_cache_node_alloc). 
+ */ + if (!NUMA_BUILD || n) { + atomic_long_inc(&n->nr_slabs); + atomic_long_add(objects, &n->total_objects); + } +} +static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects) +{ + struct kmem_cache_node *n = get_node(s, node); + + atomic_long_dec(&n->nr_slabs); + atomic_long_sub(objects, &n->total_objects); +} + +/* Object debug checks for alloc/free paths */ static void setup_object_debug(struct kmem_cache *s, struct page *page, void *object) { @@ -881,7 +953,7 @@ bad: * as used avoids touching the remaining objects. */ slab_fix(s, "Marking all objects used"); - page->inuse = s->objects; + page->inuse = page->objects; page->freelist = NULL; } return 0; @@ -1028,29 +1100,55 @@ static inline unsigned long kmem_cache_flags(unsigned long objsize, return flags; } #define slub_debug 0 + +static inline unsigned long slabs_node(struct kmem_cache *s, int node) + { return 0; } +static inline void inc_slabs_node(struct kmem_cache *s, int node, + int objects) {} +static inline void dec_slabs_node(struct kmem_cache *s, int node, + int objects) {} #endif + /* * Slab allocation and freeing */ +static inline struct page *alloc_slab_page(gfp_t flags, int node, + struct kmem_cache_order_objects oo) +{ + int order = oo_order(oo); + + if (node == -1) + return alloc_pages(flags, order); + else + return alloc_pages_node(node, flags, order); +} + static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) { struct page *page; - int pages = 1 << s->order; + struct kmem_cache_order_objects oo = s->oo; flags |= s->allocflags; - if (node == -1) - page = alloc_pages(flags, s->order); - else - page = alloc_pages_node(node, flags, s->order); - - if (!page) - return NULL; + page = alloc_slab_page(flags | __GFP_NOWARN | __GFP_NORETRY, node, + oo); + if (unlikely(!page)) { + oo = s->min; + /* + * Allocation may have failed due to fragmentation. + * Try a lower order alloc if possible + */ + page = alloc_slab_page(flags, node, oo); + if (!page) + return NULL; + stat(get_cpu_slab(s, raw_smp_processor_id()), ORDER_FALLBACK); + } + page->objects = oo_objects(oo); mod_zone_page_state(page_zone(page), (s->flags & SLAB_RECLAIM_ACCOUNT) ? 
NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE, - pages); + 1 << oo_order(oo)); return page; } @@ -1066,7 +1164,6 @@ static void setup_object(struct kmem_cache *s, struct page *page, static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) { struct page *page; - struct kmem_cache_node *n; void *start; void *last; void *p; @@ -1078,22 +1175,23 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) if (!page) goto out; - n = get_node(s, page_to_nid(page)); - if (n) - atomic_long_inc(&n->nr_slabs); + inc_slabs_node(s, page_to_nid(page), page->objects); page->slab = s; page->flags |= 1 << PG_slab; if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | SLAB_TRACE)) SetSlabDebug(page); + if (s->kick) + SetSlabKickable(page); + start = page_address(page); if (unlikely(s->flags & SLAB_POISON)) - memset(start, POISON_INUSE, PAGE_SIZE << s->order); + memset(start, POISON_INUSE, PAGE_SIZE << compound_order(page)); last = start; - for_each_object(p, s, start) { + for_each_object(p, s, start, page->objects) { setup_object(s, page, last); set_freepointer(s, last, p); last = p; @@ -1109,13 +1207,15 @@ out: static void __free_slab(struct kmem_cache *s, struct page *page) { - int pages = 1 << s->order; + int order = compound_order(page); + int pages = 1 << order; if (unlikely(SlabDebug(page))) { void *p; slab_pad_check(s, page); - for_each_object(p, s, page_address(page)) + for_each_object(p, s, page_address(page), + page->objects) check_object(s, page, p, 0); ClearSlabDebug(page); } @@ -1125,7 +1225,10 @@ static void __free_slab(struct kmem_cache *s, struct page *page) NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE, -pages); - __free_pages(page, s->order); + ClearSlabKickable(page); + __ClearPageSlab(page); + reset_page_mapcount(page); + __free_pages(page, order); } static void rcu_free_slab(struct rcu_head *h) @@ -1151,11 +1254,7 @@ static void free_slab(struct kmem_cache *s, struct page *page) static void discard_slab(struct kmem_cache *s, struct page *page) { - struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - - atomic_long_dec(&n->nr_slabs); - reset_page_mapcount(page); - __ClearPageSlab(page); + dec_slabs_node(s, page_to_nid(page), page->objects); free_slab(s, page); } @@ -1211,7 +1310,8 @@ static void remove_partial(struct kmem_cache *s, * * Must hold list_lock. */ -static inline int lock_and_freeze_slab(struct kmem_cache_node *n, struct page *page) +static inline int +lock_and_freeze_slab(struct kmem_cache_node *n, struct page *page) { if (slab_trylock(page)) { list_del(&page->lru); @@ -1335,6 +1435,8 @@ static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail) stat(c, DEACTIVATE_FULL); if (SlabDebug(page) && (s->flags & SLAB_STORE_USER)) add_full(n, page); + if (s->kick) + SetSlabKickable(page); } slab_unlock(page); } else { @@ -1347,8 +1449,8 @@ static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail) * so that the others get filled first. That way the * size of the partial list stays small. * - * kmem_cache_shrink can reclaim any empty slabs from the - * partial list. + * kmem_cache_shrink can reclaim any empty slabs from + * the partial list. 
*/ add_partial(n, page, 1); slab_unlock(page); @@ -1470,9 +1572,6 @@ static void *__slab_alloc(struct kmem_cache *s, void **object; struct page *new; - /* We handle __GFP_ZERO in the caller */ - gfpflags &= ~__GFP_ZERO; - if (!c->page) goto new_slab; @@ -1490,7 +1589,7 @@ load_freelist: goto debug; c->freelist = object[c->offset]; - c->page->inuse = s->objects; + c->page->inuse = c->page->objects; c->page->freelist = NULL; c->node = page_to_nid(c->page); unlock_out: @@ -1527,27 +1626,6 @@ new_slab: c->page = new; goto load_freelist; } - - /* - * No memory available. - * - * If the slab uses higher order allocs but the object is - * smaller than a page size then we can fallback in emergencies - * to the page allocator via kmalloc_large. The page allocator may - * have failed to obtain a higher order page and we can try to - * allocate a single page if the object fits into a single page. - * That is only possible if certain conditions are met that are being - * checked when a slab is created. - */ - if (!(gfpflags & __GFP_NORETRY) && - (s->flags & __PAGE_ALLOC_FALLBACK)) { - if (gfpflags & __GFP_WAIT) - local_irq_enable(); - object = kmalloc_large(s->objsize, gfpflags); - if (gfpflags & __GFP_WAIT) - local_irq_disable(); - return object; - } return NULL; debug: if (!alloc_debug_processing(s, c->page, object, addr)) @@ -1748,8 +1826,8 @@ static struct page *get_object_page(const void *x) * take the list_lock. */ static int slub_min_order; -static int slub_max_order = DEFAULT_MAX_ORDER; -static int slub_min_objects = DEFAULT_MIN_OBJECTS; +static int slub_max_order = PAGE_ALLOC_COSTLY_ORDER; +static int slub_min_objects __read_mostly; /* * Merge control. If this is set then no merging of slab caches will occur. @@ -1764,7 +1842,7 @@ static int slub_nomerge; * system components. Generally order 0 allocations should be preferred since * order 0 does not cause fragmentation in the page allocator. Larger objects * be problematic to put into order 0 slabs because there may be too much - * unused space left. We go to a higher order if more than 1/8th of the slab + * unused space left. We go to a higher order if more than 1/16th of the slab * would be wasted. * * In order to reach satisfactory performance we must ensure that a minimum @@ -1789,6 +1867,9 @@ static inline int slab_order(int size, int min_objects, int rem; int min_order = slub_min_order; + if ((PAGE_SIZE << min_order) / size > 65535) + return get_order(size * 65535) - 1; + for (order = max(min_order, fls(min_objects * size - 1) - PAGE_SHIFT); order <= max_order; order++) { @@ -1823,8 +1904,10 @@ static inline int calculate_order(int size) * we reduce the minimum objects required in a slab. 
*/ min_objects = slub_min_objects; + if (!min_objects) + min_objects = 4 * (fls(nr_cpu_ids) + 1); while (min_objects > 1) { - fraction = 8; + fraction = 16; while (fraction >= 4) { order = slab_order(size, min_objects, slub_max_order, fraction); @@ -1886,15 +1969,18 @@ static void init_kmem_cache_cpu(struct kmem_cache *s, c->node = 0; c->offset = s->offset / sizeof(void *); c->objsize = s->objsize; +#ifdef CONFIG_SLUB_STATS + memset(c->stat, 0, NR_SLUB_STAT_ITEMS * sizeof(unsigned)); +#endif } static void init_kmem_cache_node(struct kmem_cache_node *n) { n->nr_partial = 0; - atomic_long_set(&n->nr_slabs, 0); spin_lock_init(&n->list_lock); INIT_LIST_HEAD(&n->partial); #ifdef CONFIG_SLUB_DEBUG + atomic_long_set(&n->nr_slabs, 0); INIT_LIST_HEAD(&n->full); #endif } @@ -2063,7 +2149,7 @@ static struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags, init_tracking(kmalloc_caches, n); #endif init_kmem_cache_node(n); - atomic_long_inc(&n->nr_slabs); + inc_slabs_node(kmalloc_caches, node, page->objects); /* * lockdep requires consistent irq usage for each lock @@ -2139,11 +2225,12 @@ static int init_kmem_cache_nodes(struct kmem_cache *s, gfp_t gfpflags) * calculate_sizes() determines the order and the distribution of data within * a slab object. */ -static int calculate_sizes(struct kmem_cache *s) +static int calculate_sizes(struct kmem_cache *s, int forced_order) { unsigned long flags = s->flags; unsigned long size = s->objsize; unsigned long align = s->align; + int order; /* * Round up object size to the next word boundary. We can only @@ -2227,26 +2314,16 @@ static int calculate_sizes(struct kmem_cache *s) */ size = ALIGN(size, align); s->size = size; + if (forced_order >= 0) + order = forced_order; + else + order = calculate_order(size); - if ((flags & __KMALLOC_CACHE) && - PAGE_SIZE / size < slub_min_objects) { - /* - * Kmalloc cache that would not have enough objects in - * an order 0 page. Kmalloc slabs can fallback to - * page allocator order 0 allocs so take a reasonably large - * order that will allows us a good number of objects. 
- */ - s->order = max(slub_max_order, PAGE_ALLOC_COSTLY_ORDER); - s->flags |= __PAGE_ALLOC_FALLBACK; - s->allocflags |= __GFP_NOWARN; - } else - s->order = calculate_order(size); - - if (s->order < 0) + if (order < 0) return 0; s->allocflags = 0; - if (s->order) + if (order) s->allocflags |= __GFP_COMP; if (s->flags & SLAB_CACHE_DMA) @@ -2258,9 +2335,12 @@ static int calculate_sizes(struct kmem_cache *s) /* * Determine the number of objects per slab */ - s->objects = (PAGE_SIZE << s->order) / size; + s->oo = oo_make(order, size); + s->min = oo_make(get_order(size), size); + if (oo_objects(s->oo) > oo_objects(s->max)) + s->max = s->oo; - return !!s->objects; + return !!oo_objects(s->oo); } @@ -2276,10 +2356,11 @@ static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags, s->align = align; s->flags = kmem_cache_flags(size, flags, name, ctor); - if (!calculate_sizes(s)) + if (!calculate_sizes(s, -1)) goto error; s->refcount = 1; + s->defrag_ratio = 30; #ifdef CONFIG_NUMA s->remote_node_defrag_ratio = 100; #endif @@ -2293,7 +2374,7 @@ error: if (flags & SLAB_PANIC) panic("Cannot create slab %s size=%lu realsize=%u " "order=%u offset=%u flags=%lx\n", - s->name, (unsigned long)size, s->size, s->order, + s->name, (unsigned long)size, s->size, oo_order(s->oo), s->offset, flags); return 0; } @@ -2376,7 +2457,7 @@ static inline int kmem_cache_close(struct kmem_cache *s) struct kmem_cache_node *n = get_node(s, node); n->nr_partial -= free_list(s, n, &n->partial); - if (atomic_long_read(&n->nr_slabs)) + if (slabs_node(s, node)) return 1; } free_kmem_cache_nodes(s); @@ -2409,10 +2490,6 @@ EXPORT_SYMBOL(kmem_cache_destroy); struct kmem_cache kmalloc_caches[PAGE_SHIFT + 1] __cacheline_aligned; EXPORT_SYMBOL(kmalloc_caches); -#ifdef CONFIG_ZONE_DMA -static struct kmem_cache *kmalloc_caches_dma[PAGE_SHIFT + 1]; -#endif - static int __init setup_slub_min_order(char *str) { get_option(&str, &slub_min_order); @@ -2458,10 +2535,10 @@ static struct kmem_cache *create_kmalloc_cache(struct kmem_cache *s, down_write(&slub_lock); if (!kmem_cache_open(s, gfp_flags, name, size, ARCH_KMALLOC_MINALIGN, - flags | __KMALLOC_CACHE, NULL)) + flags, NULL)) goto panic; - list_add(&s->list, &slab_caches); + list_add_tail(&s->list, &slab_caches); up_write(&slub_lock); if (sysfs_slab_add(s)) goto panic; @@ -2472,6 +2549,7 @@ panic: } #ifdef CONFIG_ZONE_DMA +static struct kmem_cache *kmalloc_caches_dma[PAGE_SHIFT + 1]; static void sysfs_add_func(struct work_struct *w) { @@ -2688,91 +2766,268 @@ void kfree(const void *x) } EXPORT_SYMBOL(kfree); -#if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SLABINFO) -static unsigned long count_partial(struct kmem_cache_node *n) +static inline void *alloc_scratch(void) { - unsigned long flags; - unsigned long x = 0; - struct page *page; + return kmalloc(max_defrag_slab_objects * sizeof(void *) + + BITS_TO_LONGS(max_defrag_slab_objects) * sizeof(unsigned long), + GFP_KERNEL); +} - spin_lock_irqsave(&n->list_lock, flags); - list_for_each_entry(page, &n->partial, lru) - x += page->inuse; - spin_unlock_irqrestore(&n->list_lock, flags); - return x; +void kmem_cache_setup_defrag(struct kmem_cache *s, kmem_get_fn_t get, + kmem_kick_fn_t kick) +{ + int max_objects = oo_objects(s->max); + + /* + * Defragmentable slabs must have a ctor otherwise objects may be + * in an undetermined state after they are allocated. 
+ */ + BUG_ON(!s->ctor); + s->get = get; + s->kick = kick; + down_write(&slub_lock); + list_move(&s->list, &slab_caches); + if (max_objects > max_defrag_slab_objects) + max_defrag_slab_objects = max_objects; + up_write(&slub_lock); } -#endif +EXPORT_SYMBOL(kmem_cache_setup_defrag); /* - * kmem_cache_shrink removes empty slabs from the partial lists and sorts - * the remaining slabs by the number of items in use. The slabs with the - * most items in use come first. New allocations will then fill those up - * and thus they can be removed from the partial lists. + * Vacate all objects in the given slab. * - * The slabs with the least items are placed last. This results in them - * being allocated from last increasing the chance that the last objects - * are freed in them. + * The scratch aread passed to list function is sufficient to hold + * struct listhead times objects per slab. We use it to hold void ** times + * objects per slab plus a bitmap for each object. */ -int kmem_cache_shrink(struct kmem_cache *s) +static int kmem_cache_vacate(struct page *page, void *scratch) { - int node; - int i; - struct kmem_cache_node *n; - struct page *page; - struct page *t; - struct list_head *slabs_by_inuse = - kmalloc(sizeof(struct list_head) * s->objects, GFP_KERNEL); + void **vector = scratch; + void *p; + void *addr = page_address(page); + struct kmem_cache *s; + unsigned long *map; + int leftover; + int count; + void *private; unsigned long flags; + unsigned long objects; + struct kmem_cache_cpu *c; - if (!slabs_by_inuse) - return -ENOMEM; + BUG_ON(!PageSlab(page)); + local_irq_save(flags); + slab_lock(page); + BUG_ON(!SlabFrozen(page)); - flush_all(s); - for_each_node_state(node, N_NORMAL_MEMORY) { - n = get_node(s, node); + s = page->slab; + objects = page->objects; + c = get_cpu_slab(s, smp_processor_id()); + map = scratch + max_defrag_slab_objects * sizeof(void **); + if (!page->inuse || !s->kick || !SlabKickable(page)) { + stat(c, SHRINK_SLAB_SKIPPED); + goto out; + } + + /* Determine used objects */ + bitmap_fill(map, objects); + for_each_free_object(p, s, page->freelist) + __clear_bit(slab_index(p, s, addr), map); + + count = 0; + memset(vector, 0, objects * sizeof(void **)); + for_each_object(p, s, addr, objects) + if (test_bit(slab_index(p, s, addr), map)) + vector[count++] = p; + + private = s->get(s, count, vector); + + /* + * Got references. Now we can drop the slab lock. The slab + * is frozen so it cannot vanish from under us nor will + * allocations be performed on the slab. However, unlocking the + * slab will allow concurrent slab_frees to proceed. + */ + slab_unlock(page); + local_irq_restore(flags); + + /* + * Perform the KICK callbacks to remove the objects. + */ + s->kick(s, count, vector, private); + + local_irq_save(flags); + slab_lock(page); +out: + /* + * Check the result and unfreeze the slab + */ + leftover = page->inuse; + if (leftover) { + stat(c, SHRINK_OBJECT_RECLAIM_FAILED); + ClearSlabKickable(page); + } else + stat(c, SHRINK_SLAB_RECLAIMED); + + unfreeze_slab(s, page, leftover > 0); + local_irq_restore(flags); + return leftover; +} + +/* + * Remove objects from a list of slab pages that have been gathered. + * Must be called with slabs that have been isolated before. 
+ */ +int kmem_cache_reclaim(struct list_head *zaplist) +{ + int freed = 0; + void **scratch; + struct page *page; + struct page *page2; + + if (list_empty(zaplist)) + return 0; + + scratch = alloc_scratch(); + if (!scratch) + return 0; - if (!n->nr_partial) + list_for_each_entry_safe(page, page2, zaplist, lru) { + list_del(&page->lru); + if (kmem_cache_vacate(page, scratch) == 0) + freed++; + } + kfree(scratch); + return freed; +} + +/* + * Shrink the slab cache on a particular node of the cache + * by releasing slabs with zero objects and trying to reclaim + * slabs with less than a quarter of objects allocated. + */ +static unsigned long __kmem_cache_shrink(struct kmem_cache *s, int node, + unsigned long limit) +{ + unsigned long flags; + struct page *page, *page2; + LIST_HEAD(zaplist); + int freed = 0; + struct kmem_cache_node *n = get_node(s, node); + struct kmem_cache_cpu *c; + + if (n->nr_partial <= limit) + return 0; + + spin_lock_irqsave(&n->list_lock, flags); + c = get_cpu_slab(s, smp_processor_id()); + stat(c, SHRINK_CALLS); + list_for_each_entry_safe(page, page2, &n->partial, lru) { + if (!slab_trylock(page)) continue; - for (i = 0; i < s->objects; i++) - INIT_LIST_HEAD(slabs_by_inuse + i); + if (page->inuse) { + if (!SlabKickable(page)) + continue; - spin_lock_irqsave(&n->list_lock, flags); + if (page->inuse * 100 >= + s->defrag_ratio * page->objects) { + slab_unlock(page); + continue; + } - /* - * Build lists indexed by the items in use in each slab. - * - * Note that concurrent frees may occur while we hold the - * list_lock. page->inuse here is the upper limit. - */ - list_for_each_entry_safe(page, t, &n->partial, lru) { - if (!page->inuse && slab_trylock(page)) { - /* - * Must hold slab lock here because slab_free - * may have freed the last object and be - * waiting to release the slab. - */ - list_del(&page->lru); + list_move(&page->lru, &zaplist); + if (s->kick) { + stat(c, SHRINK_ATTEMPT_DEFRAG); n->nr_partial--; - slab_unlock(page); - discard_slab(s, page); - } else { - list_move(&page->lru, - slabs_by_inuse + page->inuse); + SetSlabFrozen(page); } + slab_unlock(page); + } else { + stat(c, SHRINK_EMPTY_SLAB); + list_del(&page->lru); + n->nr_partial--; + slab_unlock(page); + discard_slab(s, page); + freed++; } + } + + if (!s->kick) + /* Simply put the zaplist at the end */ + list_splice(&zaplist, n->partial.prev); + + spin_unlock_irqrestore(&n->list_lock, flags); + + if (s->kick) + freed += kmem_cache_reclaim(&zaplist); + + return freed; +} + +/* + * Defrag slabs conditional on the amount of fragmentation in a page. + */ +int kmem_cache_defrag(int node) +{ + struct kmem_cache *s; + unsigned long slabs = 0; + + /* + * kmem_cache_defrag may be called from the reclaim path which may be + * called for any page allocator alloc. So there is the danger that we + * get called in a situation where slub already acquired the slub_lock + * for other purposes. + */ + if (!down_read_trylock(&slub_lock)) + return 0; + + list_for_each_entry(s, &slab_caches, list) { + unsigned long reclaimed; + + if (time_before(jiffies, s->next_defrag)) + continue; /* - * Rebuild the partial list with the slabs filled up most - * first and the least used slabs at the end. + * Defragmentable caches come first. If the slab cache is not + * defragmentable then we can stop traversing the list. 
*/ - for (i = s->objects - 1; i >= 0; i--) - list_splice(slabs_by_inuse + i, n->partial.prev); + if (!s->kick) + break; - spin_unlock_irqrestore(&n->list_lock, flags); + if (node == -1) { + int nid; + + for_each_node_state(nid, N_NORMAL_MEMORY) + reclaimed = __kmem_cache_shrink(s, nid, + MAX_PARTIAL); + } else + reclaimed = __kmem_cache_shrink(s, node, MAX_PARTIAL); + + if (reclaimed) + s->next_defrag = jiffies + HZ / 10; + else + s->next_defrag = jiffies + HZ; + + slabs += reclaimed; } + up_read(&slub_lock); + return slabs; +} +EXPORT_SYMBOL(kmem_cache_defrag); + +/* + * kmem_cache_shrink removes empty slabs from the partial lists. + * If the slab cache supports defragmentation then objects are + * reclaimed. + */ +int kmem_cache_shrink(struct kmem_cache *s) +{ + int node; + + flush_all(s); + for_each_node_state(node, N_NORMAL_MEMORY) + __kmem_cache_shrink(s, node, 0); - kfree(slabs_by_inuse); return 0; } EXPORT_SYMBOL(kmem_cache_shrink); @@ -2816,7 +3071,7 @@ static void slab_mem_offline_callback(void *arg) * and offline_pages() function shoudn't call this * callback. So, we must fail. */ - BUG_ON(atomic_long_read(&n->nr_slabs)); + BUG_ON(slabs_node(s, offline_node)); s->node[offline_node] = NULL; kmem_cache_free(kmalloc_caches, n); @@ -2841,7 +3096,7 @@ static int slab_mem_going_online_callback(void *arg) return 0; /* - * We are bringing a node online. No memory is availabe yet. We must + * We are bringing a node online. No memory is available yet. We must * allocate a kmem_cache_node structure in order to bring the node * online. */ @@ -2987,10 +3242,7 @@ static int slab_unmergeable(struct kmem_cache *s) if (slub_nomerge || (s->flags & SLUB_NEVER_MERGE)) return 1; - if ((s->flags & __PAGE_ALLOC_FALLBACK)) - return 1; - - if (s->ctor) + if (s->ctor || s->kick || s->get) return 1; /* @@ -3080,7 +3332,7 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size, if (s) { if (kmem_cache_open(s, GFP_KERNEL, name, size, align, flags, ctor)) { - list_add(&s->list, &slab_caches); + list_add_tail(&s->list, &slab_caches); up_write(&slub_lock); if (sysfs_slab_add(s)) goto err; @@ -3181,6 +3433,37 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags, return slab_alloc(s, gfpflags, node, caller); } +#if (defined(CONFIG_SYSFS) && defined(CONFIG_SLUB_DEBUG)) || defined(CONFIG_SLABINFO) +static unsigned long count_partial(struct kmem_cache_node *n, + int (*get_count)(struct page *)) +{ + unsigned long flags; + unsigned long x = 0; + struct page *page; + + spin_lock_irqsave(&n->list_lock, flags); + list_for_each_entry(page, &n->partial, lru) + x += get_count(page); + spin_unlock_irqrestore(&n->list_lock, flags); + return x; +} + +static int count_inuse(struct page *page) +{ + return page->inuse; +} + +static int count_total(struct page *page) +{ + return page->objects; +} + +static int count_free(struct page *page) +{ + return page->objects - page->inuse; +} +#endif + #if defined(CONFIG_SYSFS) && defined(CONFIG_SLUB_DEBUG) static int validate_slab(struct kmem_cache *s, struct page *page, unsigned long *map) @@ -3193,7 +3476,7 @@ static int validate_slab(struct kmem_cache *s, struct page *page, return 0; /* Now we know that a valid freelist exists */ - bitmap_zero(map, s->objects); + bitmap_zero(map, page->objects); for_each_free_object(p, s, page->freelist) { set_bit(slab_index(p, s, addr), map); @@ -3201,7 +3484,7 @@ static int validate_slab(struct kmem_cache *s, struct page *page, return 0; } - for_each_object(p, s, addr) + for_each_object(p, s, addr, page->objects) if 
(!test_bit(slab_index(p, s, addr), map)) if (!check_object(s, page, p, 1)) return 0; @@ -3267,7 +3550,7 @@ static long validate_slab_cache(struct kmem_cache *s) { int node; unsigned long count = 0; - unsigned long *map = kmalloc(BITS_TO_LONGS(s->objects) * + unsigned long *map = kmalloc(BITS_TO_LONGS(oo_objects(s->max)) * sizeof(unsigned long), GFP_KERNEL); if (!map) @@ -3470,14 +3753,14 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s, struct page *page, enum track_item alloc) { void *addr = page_address(page); - DECLARE_BITMAP(map, s->objects); + DECLARE_BITMAP(map, page->objects); void *p; - bitmap_zero(map, s->objects); + bitmap_zero(map, page->objects); for_each_free_object(p, s, page->freelist) set_bit(slab_index(p, s, addr), map); - for_each_object(p, s, addr) + for_each_object(p, s, addr, page->objects) if (!test_bit(slab_index(p, s, addr), map)) add_location(t, s, get_track(s, p, alloc)); } @@ -3567,22 +3850,23 @@ static int list_locations(struct kmem_cache *s, char *buf, } enum slab_stat_type { - SL_FULL, - SL_PARTIAL, - SL_CPU, - SL_OBJECTS + SL_ALL, /* All slabs */ + SL_PARTIAL, /* Only partially allocated slabs */ + SL_CPU, /* Only slabs used for cpu caches */ + SL_OBJECTS, /* Determine allocated objects not slabs */ + SL_TOTAL /* Determine object capacity not slabs */ }; -#define SO_FULL (1 << SL_FULL) +#define SO_ALL (1 << SL_ALL) #define SO_PARTIAL (1 << SL_PARTIAL) #define SO_CPU (1 << SL_CPU) #define SO_OBJECTS (1 << SL_OBJECTS) +#define SO_TOTAL (1 << SL_TOTAL) static ssize_t show_slab_objects(struct kmem_cache *s, char *buf, unsigned long flags) { unsigned long total = 0; - int cpu; int node; int x; unsigned long *nodes; @@ -3593,56 +3877,60 @@ static ssize_t show_slab_objects(struct kmem_cache *s, return -ENOMEM; per_cpu = nodes + nr_node_ids; - for_each_possible_cpu(cpu) { - struct page *page; - struct kmem_cache_cpu *c = get_cpu_slab(s, cpu); + if (flags & SO_CPU) { + int cpu; - if (!c) - continue; + for_each_possible_cpu(cpu) { + struct kmem_cache_cpu *c = get_cpu_slab(s, cpu); - page = c->page; - node = c->node; - if (node < 0) - continue; - if (page) { - if (flags & SO_CPU) { - if (flags & SO_OBJECTS) - x = page->inuse; + if (!c || c->node < 0) + continue; + + if (c->page) { + if (flags & SO_TOTAL) + x = c->page->objects; + else if (flags & SO_OBJECTS) + x = c->page->inuse; else x = 1; + total += x; - nodes[node] += x; + nodes[c->node] += x; } - per_cpu[node]++; + per_cpu[c->node]++; } } - for_each_node_state(node, N_NORMAL_MEMORY) { - struct kmem_cache_node *n = get_node(s, node); + if (flags & SO_ALL) { + for_each_node_state(node, N_NORMAL_MEMORY) { + struct kmem_cache_node *n = get_node(s, node); + + if (flags & SO_TOTAL) + x = atomic_long_read(&n->total_objects); + else if (flags & SO_OBJECTS) + x = atomic_long_read(&n->total_objects) - + count_partial(n, count_free); - if (flags & SO_PARTIAL) { - if (flags & SO_OBJECTS) - x = count_partial(n); else - x = n->nr_partial; + x = atomic_long_read(&n->nr_slabs); total += x; nodes[node] += x; } - if (flags & SO_FULL) { - int full_slabs = atomic_long_read(&n->nr_slabs) - - per_cpu[node] - - n->nr_partial; + } else if (flags & SO_PARTIAL) { + for_each_node_state(node, N_NORMAL_MEMORY) { + struct kmem_cache_node *n = get_node(s, node); - if (flags & SO_OBJECTS) - x = full_slabs * s->objects; + if (flags & SO_TOTAL) + x = count_partial(n, count_total); + else if (flags & SO_OBJECTS) + x = count_partial(n, count_inuse); else - x = full_slabs; + x = n->nr_partial; total += x; nodes[node] += x; } } - x 
= sprintf(buf, "%lu", total); #ifdef CONFIG_NUMA for_each_node_state(node, N_NORMAL_MEMORY) @@ -3657,14 +3945,6 @@ static ssize_t show_slab_objects(struct kmem_cache *s, static int any_slab_objects(struct kmem_cache *s) { int node; - int cpu; - - for_each_possible_cpu(cpu) { - struct kmem_cache_cpu *c = get_cpu_slab(s, cpu); - - if (c && c->page) - return 1; - } for_each_online_node(node) { struct kmem_cache_node *n = get_node(s, node); @@ -3672,7 +3952,7 @@ static int any_slab_objects(struct kmem_cache *s) if (!n) continue; - if (n->nr_partial || atomic_long_read(&n->nr_slabs)) + if (atomic_read(&n->total_objects)) return 1; } return 0; @@ -3714,26 +3994,59 @@ SLAB_ATTR_RO(object_size); static ssize_t objs_per_slab_show(struct kmem_cache *s, char *buf) { - return sprintf(buf, "%d\n", s->objects); + return sprintf(buf, "%d\n", oo_objects(s->oo)); } SLAB_ATTR_RO(objs_per_slab); +static ssize_t order_store(struct kmem_cache *s, + const char *buf, size_t length) +{ + unsigned long order; + int err; + + err = strict_strtoul(buf, 10, &order); + if (err) + return err; + + if (order > slub_max_order || order < slub_min_order) + return -EINVAL; + + calculate_sizes(s, order); + return length; +} + static ssize_t order_show(struct kmem_cache *s, char *buf) { - return sprintf(buf, "%d\n", s->order); + return sprintf(buf, "%d\n", oo_order(s->oo)); } -SLAB_ATTR_RO(order); +SLAB_ATTR(order); -static ssize_t ctor_show(struct kmem_cache *s, char *buf) +static ssize_t ops_show(struct kmem_cache *s, char *buf) { + int x = 0; + if (s->ctor) { - int n = sprint_symbol(buf, (unsigned long)s->ctor); + x += sprintf(buf + x, "ctor : "); + x += sprint_symbol(buf + x, (unsigned long)s->ctor); + x += sprintf(buf + x, "\n"); + } - return n + sprintf(buf + n, "\n"); + if (s->get) { + x += sprintf(buf + x, "get : "); + x += sprint_symbol(buf + x, + (unsigned long)s->get); + x += sprintf(buf + x, "\n"); } - return 0; + + if (s->kick) { + x += sprintf(buf + x, "kick : "); + x += sprint_symbol(buf + x, + (unsigned long)s->kick); + x += sprintf(buf + x, "\n"); + } + return x; } -SLAB_ATTR_RO(ctor); +SLAB_ATTR_RO(ops); static ssize_t aliases_show(struct kmem_cache *s, char *buf) { @@ -3743,7 +4056,7 @@ SLAB_ATTR_RO(aliases); static ssize_t slabs_show(struct kmem_cache *s, char *buf) { - return show_slab_objects(s, buf, SO_FULL|SO_PARTIAL|SO_CPU); + return show_slab_objects(s, buf, SO_ALL); } SLAB_ATTR_RO(slabs); @@ -3761,10 +4074,22 @@ SLAB_ATTR_RO(cpu_slabs); static ssize_t objects_show(struct kmem_cache *s, char *buf) { - return show_slab_objects(s, buf, SO_FULL|SO_PARTIAL|SO_CPU|SO_OBJECTS); + return show_slab_objects(s, buf, SO_ALL|SO_OBJECTS); } SLAB_ATTR_RO(objects); +static ssize_t objects_partial_show(struct kmem_cache *s, char *buf) +{ + return show_slab_objects(s, buf, SO_PARTIAL|SO_OBJECTS); +} +SLAB_ATTR_RO(objects_partial); + +static ssize_t total_objects_show(struct kmem_cache *s, char *buf) +{ + return show_slab_objects(s, buf, SO_ALL|SO_TOTAL); +} +SLAB_ATTR_RO(total_objects); + static ssize_t sanity_checks_show(struct kmem_cache *s, char *buf) { return sprintf(buf, "%d\n", !!(s->flags & SLAB_DEBUG_FREE)); @@ -3844,7 +4169,7 @@ static ssize_t red_zone_store(struct kmem_cache *s, s->flags &= ~SLAB_RED_ZONE; if (buf[0] == '1') s->flags |= SLAB_RED_ZONE; - calculate_sizes(s); + calculate_sizes(s, -1); return length; } SLAB_ATTR(red_zone); @@ -3863,7 +4188,7 @@ static ssize_t poison_store(struct kmem_cache *s, s->flags &= ~SLAB_POISON; if (buf[0] == '1') s->flags |= SLAB_POISON; - calculate_sizes(s); + 
calculate_sizes(s, -1); return length; } SLAB_ATTR(poison); @@ -3882,7 +4207,7 @@ static ssize_t store_user_store(struct kmem_cache *s, s->flags &= ~SLAB_STORE_USER; if (buf[0] == '1') s->flags |= SLAB_STORE_USER; - calculate_sizes(s); + calculate_sizes(s, -1); return length; } SLAB_ATTR(store_user); @@ -3941,6 +4266,27 @@ static ssize_t free_calls_show(struct kmem_cache *s, char *buf) } SLAB_ATTR_RO(free_calls); +static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf) +{ + return sprintf(buf, "%d\n", s->defrag_ratio); +} + +static ssize_t defrag_ratio_store(struct kmem_cache *s, + const char *buf, size_t length) +{ + unsigned long ratio; + int err; + + err = strict_strtoul(buf, 10, &ratio); + if (err) + return err; + + if (ratio < 100) + s->defrag_ratio = ratio; + return length; +} +SLAB_ATTR(defrag_ratio); + #ifdef CONFIG_NUMA static ssize_t remote_node_defrag_ratio_show(struct kmem_cache *s, char *buf) { @@ -3950,10 +4296,16 @@ static ssize_t remote_node_defrag_ratio_show(struct kmem_cache *s, char *buf) static ssize_t remote_node_defrag_ratio_store(struct kmem_cache *s, const char *buf, size_t length) { - int n = simple_strtoul(buf, NULL, 10); + unsigned long ratio; + int err; + + err = strict_strtoul(buf, 10, &ratio); + if (err) + return err; + + if (ratio < 100) + s->remote_node_defrag_ratio = ratio * 10; - if (n < 100) - s->remote_node_defrag_ratio = n * 10; return length; } SLAB_ATTR(remote_node_defrag_ratio); @@ -3979,10 +4331,12 @@ static int show_stat(struct kmem_cache *s, char *buf, enum stat_item si) len = sprintf(buf, "%lu", sum); +#ifdef CONFIG_SMP for_each_online_cpu(cpu) { if (data[cpu] && len < PAGE_SIZE - 20) - len += sprintf(buf + len, " c%d=%u", cpu, data[cpu]); + len += sprintf(buf + len, " C%d=%u", cpu, data[cpu]); } +#endif kfree(data); return len + sprintf(buf + len, "\n"); } @@ -4011,7 +4365,13 @@ STAT_ATTR(DEACTIVATE_EMPTY, deactivate_empty); STAT_ATTR(DEACTIVATE_TO_HEAD, deactivate_to_head); STAT_ATTR(DEACTIVATE_TO_TAIL, deactivate_to_tail); STAT_ATTR(DEACTIVATE_REMOTE_FREES, deactivate_remote_frees); - +STAT_ATTR(ORDER_FALLBACK, order_fallback); +STAT_ATTR(SHRINK_CALLS, shrink_calls); +STAT_ATTR(SHRINK_ATTEMPT_DEFRAG, shrink_attempt_defrag); +STAT_ATTR(SHRINK_EMPTY_SLAB, shrink_empty_slab); +STAT_ATTR(SHRINK_SLAB_SKIPPED, shrink_slab_skipped); +STAT_ATTR(SHRINK_SLAB_RECLAIMED, shrink_slab_reclaimed); +STAT_ATTR(SHRINK_OBJECT_RECLAIM_FAILED, shrink_object_reclaim_failed); #endif static struct attribute *slab_attrs[] = { @@ -4020,10 +4380,12 @@ static struct attribute *slab_attrs[] = { &objs_per_slab_attr.attr, &order_attr.attr, &objects_attr.attr, + &objects_partial_attr.attr, + &total_objects_attr.attr, &slabs_attr.attr, &partial_attr.attr, &cpu_slabs_attr.attr, - &ctor_attr.attr, + &ops_attr.attr, &aliases_attr.attr, &align_attr.attr, &sanity_checks_attr.attr, @@ -4038,6 +4400,7 @@ static struct attribute *slab_attrs[] = { &shrink_attr.attr, &alloc_calls_attr.attr, &free_calls_attr.attr, + &defrag_ratio_attr.attr, #ifdef CONFIG_ZONE_DMA &cache_dma_attr.attr, #endif @@ -4062,6 +4425,13 @@ static struct attribute *slab_attrs[] = { &deactivate_to_head_attr.attr, &deactivate_to_tail_attr.attr, &deactivate_remote_frees_attr.attr, + &order_fallback_attr.attr, + &shrink_calls_attr.attr, + &shrink_attempt_defrag_attr.attr, + &shrink_empty_slab_attr.attr, + &shrink_slab_skipped_attr.attr, + &shrink_slab_reclaimed_attr.attr, + &shrink_object_reclaim_failed_attr.attr, #endif NULL }; @@ -4305,8 +4675,8 @@ __initcall(slab_sysfs_init); */ #ifdef CONFIG_SLABINFO 
-ssize_t slabinfo_write(struct file *file, const char __user * buffer, - size_t count, loff_t *ppos) +ssize_t slabinfo_write(struct file *file, const char __user *buffer, + size_t count, loff_t *ppos) { return -EINVAL; } @@ -4348,7 +4718,8 @@ static int s_show(struct seq_file *m, void *p) unsigned long nr_partials = 0; unsigned long nr_slabs = 0; unsigned long nr_inuse = 0; - unsigned long nr_objs; + unsigned long nr_objs = 0; + unsigned long nr_free = 0; struct kmem_cache *s; int node; @@ -4362,14 +4733,15 @@ static int s_show(struct seq_file *m, void *p) nr_partials += n->nr_partial; nr_slabs += atomic_long_read(&n->nr_slabs); - nr_inuse += count_partial(n); + nr_objs += atomic_long_read(&n->total_objects); + nr_free += count_partial(n, count_free); } - nr_objs = nr_slabs * s->objects; - nr_inuse += (nr_slabs - nr_partials) * s->objects; + nr_inuse = nr_objs - nr_free; seq_printf(m, "%-17s %6lu %6lu %6u %4u %4d", s->name, nr_inuse, - nr_objs, s->size, s->objects, (1 << s->order)); + nr_objs, s->size, oo_objects(s->oo), + (1 << oo_order(s->oo))); seq_printf(m, " : tunables %4u %4u %4u", 0, 0, 0); seq_printf(m, " : slabdata %6lu %6lu %6lu", nr_slabs, nr_slabs, 0UL); diff --git a/mm/vmscan.c b/mm/vmscan.c index 4046434..032ff11 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -166,10 +166,18 @@ EXPORT_SYMBOL(unregister_shrinker); * are eligible for the caller's allocation attempt. It is used for balancing * slab reclaim versus page reclaim. * + * zone is the zone for which we are shrinking the slabs. If the intent + * is to do a global shrink then zone may be NULL. Specification of a + * zone is currently only used to limit slab defragmentation to a NUMA node. + * The performace of shrink_slab would be better (in particular under NUMA) + * if it could be targeted as a whole to the zone that is under memory + * pressure but the VFS infrastructure does not allow that at the present + * time. + * * Returns the number of slab objects which we shrunk. */ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask, - unsigned long lru_pages) + unsigned long lru_pages, struct zone *zone) { struct shrinker *shrinker; unsigned long ret = 0; @@ -226,6 +234,8 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask, shrinker->nr += total_scan; } up_read(&shrinker_rwsem); + if (ret && (gfp_mask & __GFP_FS)) + kmem_cache_defrag(zone ? 
zone_to_nid(zone) : -1); return ret; } @@ -1339,7 +1349,7 @@ static unsigned long do_try_to_free_pages(struct zone **zones, gfp_t gfp_mask, * over limit cgroups */ if (scan_global_lru(sc)) { - shrink_slab(sc->nr_scanned, gfp_mask, lru_pages); + shrink_slab(sc->nr_scanned, gfp_mask, lru_pages, NULL); if (reclaim_state) { nr_reclaimed += reclaim_state->reclaimed_slab; reclaim_state->reclaimed_slab = 0; @@ -1566,7 +1576,7 @@ loop_again: nr_reclaimed += shrink_zone(priority, zone, &sc); reclaim_state->reclaimed_slab = 0; nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL, - lru_pages); + lru_pages, zone); nr_reclaimed += reclaim_state->reclaimed_slab; total_scanned += sc.nr_scanned; if (zone_is_all_unreclaimable(zone)) @@ -1806,7 +1816,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages) /* If slab caches are huge, it's better to hit them first */ while (nr_slab >= lru_pages) { reclaim_state.reclaimed_slab = 0; - shrink_slab(nr_pages, sc.gfp_mask, lru_pages); + shrink_slab(nr_pages, sc.gfp_mask, lru_pages, NULL); if (!reclaim_state.reclaimed_slab) break; @@ -1844,7 +1854,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages) reclaim_state.reclaimed_slab = 0; shrink_slab(sc.nr_scanned, sc.gfp_mask, - count_lru_pages()); + count_lru_pages(), NULL); ret += reclaim_state.reclaimed_slab; if (ret >= nr_pages) goto out; @@ -1861,7 +1871,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages) if (!ret) { do { reclaim_state.reclaimed_slab = 0; - shrink_slab(nr_pages, sc.gfp_mask, count_lru_pages()); + shrink_slab(nr_pages, sc.gfp_mask, + count_lru_pages(), NULL); ret += reclaim_state.reclaimed_slab; } while (ret < nr_pages && reclaim_state.reclaimed_slab > 0); } @@ -2024,7 +2035,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order) * Note that shrink_slab will free memory on all zones and may * take a long time. */ - while (shrink_slab(sc.nr_scanned, gfp_mask, order) && + while (shrink_slab(sc.nr_scanned, gfp_mask, order, + zone) && zone_page_state(zone, NR_SLAB_RECLAIMABLE) > slab_reclaimable - nr_pages) ;
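
To illustrate the new interface from a cache user's point of view, here is a minimal sketch of how a slab cache could register the defragmentation callbacks introduced above. It is not part of the patch: the "foo_cache" cache, struct foo, its refcount convention and the eviction placeholder are hypothetical. The callback prototypes are inferred from the call sites in kmem_cache_vacate() (private = s->get(s, count, vector); s->kick(s, count, vector, private)) and from the kmem_get_fn_t/kmem_kick_fn_t parameters of kmem_cache_setup_defrag(); a constructor is mandatory because kmem_cache_setup_defrag() does BUG_ON(!s->ctor). The ctor prototype follows the kmem_cache_create() signature of the kernel this series is based against.

#include <linux/init.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <asm/atomic.h>

struct foo {
	atomic_t refcount;	/* raised to 1 by the (not shown) allocation path */
	struct list_head lru;
};

static struct kmem_cache *foo_cache;

/* A ctor is required so that objects are in a defined state once allocated. */
static void foo_ctor(struct kmem_cache *s, void *object)
{
	struct foo *f = object;

	atomic_set(&f->refcount, 0);
	INIT_LIST_HEAD(&f->lru);
}

static void foo_put(struct foo *f)
{
	if (atomic_dec_and_test(&f->refcount))
		kmem_cache_free(foo_cache, f);
}

/*
 * get(): runs with interrupts off and the slab lock held, so only take
 * references here.  Entries whose free is already in flight are voided.
 * The return value is handed unchanged to kick() as @private.
 */
static void *foo_get(struct kmem_cache *s, int nr, void **objects)
{
	int i;

	for (i = 0; i < nr; i++) {
		struct foo *f = objects[i];

		if (!atomic_inc_not_zero(&f->refcount))
			objects[i] = NULL;	/* concurrent free, skip */
	}
	return NULL;	/* this cache needs no extra state */
}

/*
 * kick(): all locks have been dropped again.  Remove each referenced
 * object from the slab, here by evicting it and dropping the reference
 * taken in foo_get(), which frees it via kmem_cache_free().
 */
static void foo_kick(struct kmem_cache *s, int nr, void **objects,
							void *private)
{
	int i;

	for (i = 0; i < nr; i++) {
		struct foo *f = objects[i];

		if (!f)
			continue;
		/*
		 * A real user would unhash f from its lookup structures
		 * here and drop the allocation-time reference (not shown).
		 */
		foo_put(f);	/* drop the reference taken in foo_get() */
	}
}

static int __init foo_cache_init(void)
{
	foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo), 0,
					SLAB_RECLAIM_ACCOUNT, foo_ctor);
	if (!foo_cache)
		return -ENOMEM;

	kmem_cache_setup_defrag(foo_cache, foo_get, foo_kick);
	return 0;
}

Once registered, the cache is moved into the defragmentable part of slab_caches, so kmem_cache_defrag() will consider it; how aggressively its partial slabs are vacated can then be tuned per cache through the new defrag_ratio sysfs attribute (30 by default, as set in kmem_cache_open()).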