V2->V3
- Fix Kconfig issues by setting CONFIG_QUICKLIST explicitly and
  default to one quicklist if NR_QUICK is not set.
- Fix i386 support. (Cannot mix PMD and PTE allocs.)
- Discussion of V2.
  http://marc.info/?l=linux-kernel&m=117391339914767&w=2

V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at
  http://marc.info/?l=linux-kernel&m=117357922219342&w=2

This patchset introduces an arch independent framework to handle lists
of recently used page table pages.

1. Proven code from the IA64 arch.

   The method used here has been fine tuned for years and is NUMA
   aware. It is based on the knowledge that accesses to page table
   pages are sparse in nature. Taking a page off the freelists reduces
   the number of cachelines touched during a page table page
   allocation and also gets rid of the slab overhead. So performance
   improves. This is particularly useful if pgds contain standard
   mappings. We can save on the teardown and setup of such a page if
   we have some on the quicklists.

2. Lightweight alternative to using slab to manage page-sized pages

   Slab overhead is significant and even page allocator use is pretty
   heavyweight. The use of a per cpu quicklist means that we touch
   only two cachelines for an allocation. There is no need to access
   the page struct (unless arch code needs to fiddle around with it).
   So the fast path just means bringing in one cacheline at the
   beginning of the page. That same cacheline may then be used to
   store the page table entry. Or a second cacheline may be used if
   the page table entry is not in the first cacheline of the page. The
   current code will zero the page which means touching 32 cachelines
   (assuming 128 byte cachelines). So we get down from 32 to 2
   cachelines in the fast path.

3. Fix conflicting use of page structs by slab and arch code.

   F.e. both arches use the ->private and ->index fields to create
   lists of pgds. i386 uses other page flags.
   The slab can also use the ->private field for allocations that are
   larger than page size, which would occur if one enables debugging.
   In that case the arch code would overwrite the pointer to the first
   page of the compound page allocated by the slab. SLAB has been
   modified to not enable debugging for such slabs (!).

   There is the potential for additional conflicts here, especially
   since some arches also use page flags to mark page table pages.

   This patch removes these conflicts by no longer using the slab for
   these purposes. The page allocator's natural allocation size is
   PAGE_SIZE after all. Then we can start using standard list
   operations via page->lru instead of improvising linked lists.

   SLUB makes more extensive use of the page struct and so far had to
   create workarounds for these slabs. The ->index field is used for
   the SLUB freelist. So SLUB cannot allow the use of a freelist for
   these slabs and, like SLAB, currently does not allow debugging and
   forces slabs to only contain a single object (avoids freelist).

   If we do not get rid of these issues then both SLAB and SLUB have
   to continue to provide special code to support these slabs.

4. i386 gets lightweight NUMA aware management of page table pages.

   Note that the use of SLAB on NUMA systems will require the use of
   alien caches to efficiently remove remote page table pages, which
   (for a PAGE_SIZEd allocation) is a lengthy and expensive process.

5. x86_64 gets lightweight page table page management.

   This will allow x86_64 arch code to repopulate pgds and other page
   table entries faster. The list operations for pgds are reduced in
   the same way as for i386, to the points where a pgd is allocated
   from the page allocator and where it is freed back to the page
   allocator. A pgd can pass through the quicklists without having to
   be reinitialized.

6. Consolidation of code from multiple arches

   So far arches have their own implementation of quicklist
   management.
   This patch moves that feature into the core, allowing easier
   maintenance and consistent management of quicklists.

Page table pages have the characteristic that they are typically zero
or in a known state when they are freed. This is usually exactly the
same state as needed after allocation. So it makes sense to build a
list of freed page table pages and then reuse those recently used
pages first. Those pages have already been initialized correctly (thus
no need to zero them) and are likely already cached in such a way that
the MMU can use them most effectively. Page table pages are used in a
sparse way so zeroing them on allocation is not too useful.

Such an implementation already exists for ia64. However, that
implementation did not support constructors and destructors as needed
by i386 / x86_64. It also only supported a single quicklist. The
implementation here has constructor and destructor support as well as
the ability for an arch to specify how many quicklists are needed.

Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more
than one quicklist is necessary then we can define NR_QUICK for
additional lists. F.e. i386 needs two and thus has

	config NR_QUICK
		int
		default 2

If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:

	quicklist_alloc(, , )

Page table pages can be freed using:

	quicklist_free(, , )

Pages must have a definite state after allocation and before they are
freed. If no constructor is specified then pages will be zeroed on
allocation and must be zeroed before they are freed.

If a constructor is used then the constructor will establish a
definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.

Constructors and destructors can also be used to track the pages. i386
and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.

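
The allocation and free paths described above can be sketched in
userspace C. This is only an illustration under stated assumptions, not
the code from this patchset: all sketch_* names, the 4096 byte page
size and the use of calloc()/free() in place of the page allocator are
hypothetical, the lists are plain globals rather than per cpu data, and
the destructor is run in a separate drain step instead of being passed
to the free call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size */
#define SKETCH_NR_QUICK  2	/* two lists, as in the i386 example */

/* One list head per quicklist. A kernel version would be per cpu. */
struct sketch_quicklist {
	void *page;	/* most recently freed page, or NULL */
	int nr_pages;	/* number of pages cached on this list */
};

static struct sketch_quicklist sketch_lists[SKETCH_NR_QUICK];

/* Take a page from list nr, or fall back to a fresh zeroed page. */
static void *sketch_quicklist_alloc(int nr, void (*ctor)(void *))
{
	struct sketch_quicklist *q;
	void **p;

	assert(nr >= 0 && nr < SKETCH_NR_QUICK);
	q = &sketch_lists[nr];
	p = q->page;
	if (p) {
		/* Fast path: list head plus first word of the page. */
		q->page = p[0];
		p[0] = NULL;	/* undo the link, page is zero again */
		q->nr_pages--;
		return p;
	}

	/* Slow path: calloc() stands in for the page allocator. */
	p = calloc(1, SKETCH_PAGE_SIZE);
	if (p && ctor)
		ctor(p);	/* establish the definite page state */
	return p;
}

/* Cache a page on list nr. The caller has restored its known state. */
static void sketch_quicklist_free(int nr, void *page)
{
	struct sketch_quicklist *q;
	void **p = page;

	assert(nr >= 0 && nr < SKETCH_NR_QUICK);
	q = &sketch_lists[nr];
	p[0] = q->page;	/* the link lives in the page itself */
	q->page = p;
	q->nr_pages++;
}

/* Hand all cached pages back, running the destructor on each. */
static void sketch_quicklist_drain(int nr, void (*dtor)(void *))
{
	struct sketch_quicklist *q;

	assert(nr >= 0 && nr < SKETCH_NR_QUICK);
	q = &sketch_lists[nr];
	while (q->page) {
		void **p = q->page;

		q->page = p[0];
		p[0] = NULL;
		q->nr_pages--;
		if (dtor)
			dtor(p);
		free(p);
	}
}
```

A page that is freed and then allocated again from the same list comes
back without being zeroed or constructed a second time, which is the
saving the text describes. Because the link is kept in the first word
of the page, this sketch assumes that word is zero in the page's known
state.
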
Tested on: i386 UP / SMP, x86_64 UP, NUMA emulation, IA64 NUMA.