Subject: PAGEFLAGS_EXTENDED and separate page flags for Head and Tail

Having separate page flags for the head and the tail of a compound page
allows the compiler to use bitops instead of operations on a word to check
for a tail page. That matters, for example, for virt_to_head_page(), which
is used in various critical code paths (kfree, for example):

Code for PageTail(page)

Before:

	mov    (%rdi),%rdx		page->flags
	mov    %rdx,%rax		3 bytes
	and    $0x12000,%eax		5 bytes
	cmp    $0x12000,%rax		6 bytes
	je     897

After:

	mov    (%rdi),%rax
	test   $0x40,%ah		3 bytes
	jne    887

Not counting the load of page->flags and the branch, we go from 14 bytes
to 3 bytes and from 3 instructions to one. The check also needs only one
register instead of two.

We can only use separate page flags for this if enough page flags are
available. This patch introduces CONFIG_PAGEFLAGS_EXTENDED, which is set
unless page flags are scarce because SPARSEMEM uses page flag bits for its
sectionid on 32 bit NUMA platforms.

Additional page flag definitions can be added to the
CONFIG_PAGEFLAGS_EXTENDED section in page-flags.h if the functionality
depends on PAGEFLAGS_EXTENDED or if more page flag overlapping tricks are
used for the !PAGEFLAGS_EXTENDED fallback.

Avoiding the overlaying of PG_reclaim also clears the way for possible use
of compound pages for the pagecache or on the LRU.
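As a rough illustration of the two PageTail() variants compared in the
description, the following standalone C sketch contrasts a mask-and-compare
test with a single-bit test. It is not kernel code: struct demo_page and all
DEMO_*/demo_* names are hypothetical, and the bit positions are simply read
off the constants in the listing (0x12000 for the overlaid mask, 0x40 in %ah,
i.e. 0x4000, for the separate tail bit).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only the flags word matters here. */
struct demo_page {
	unsigned long flags;
};

/* Fallback layout: a tail page is marked by two bits set together. */
#define DEMO_COMPOUND_BIT	(1UL << 13)	/* assumed position, 0x02000 */
#define DEMO_RECLAIM_BIT	(1UL << 16)	/* assumed position, 0x10000 */
#define DEMO_HEAD_TAIL_MASK	(DEMO_COMPOUND_BIT | DEMO_RECLAIM_BIT)

/* Extended layout: head and tail each get their own bit. */
#define DEMO_HEAD_BIT		(1UL << 13)	/* assumed position */
#define DEMO_TAIL_BIT		(1UL << 14)	/* assumed position, 0x04000 */

/* Overlaid flags: both bits must be set, so mask and compare. */
static inline bool demo_tail_overlaid(const struct demo_page *p)
{
	return (p->flags & DEMO_HEAD_TAIL_MASK) == DEMO_HEAD_TAIL_MASK;
}

/* Separate flag: a single bit test is enough. */
static inline bool demo_tail_extended(const struct demo_page *p)
{
	return (p->flags & DEMO_TAIL_BIT) != 0;
}

/* Extended layout: a page is compound if either bit is set. */
static inline bool demo_compound_extended(const struct demo_page *p)
{
	return (p->flags & (DEMO_HEAD_BIT | DEMO_TAIL_BIT)) != 0;
}

/* Mark a page as a tail page in the extended layout. */
static inline void demo_set_tail_extended(struct demo_page *p)
{
	assert(!(p->flags & DEMO_HEAD_BIT));	/* head and tail are exclusive */
	p->flags |= DEMO_TAIL_BIT;
}
```

In the overlaid variant a page with only DEMO_COMPOUND_BIT set is a head page,
which is why the test has to compare against the full mask instead of testing
one bit.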
Signed-off-by: Christoph Lameter

---
 include/linux/page-flags.h |   28 ++++++++++++++++++++++++++++
 mm/Kconfig                 |   12 ++++++++++++
 2 files changed, 40 insertions(+)

Index: linux-2.6.25-rc8-mm1/include/linux/page-flags.h
===================================================================
--- linux-2.6.25-rc8-mm1.orig/include/linux/page-flags.h	2008-04-03 15:42:44.513884467 -0700
+++ linux-2.6.25-rc8-mm1/include/linux/page-flags.h	2008-04-03 16:20:22.725663800 -0700
@@ -83,7 +83,12 @@ enum pageflags {
 	PG_reserved,
 	PG_private,		/* If pagecache, has fs-private data */
 	PG_writeback,		/* Page is under writeback */
+#ifdef CONFIG_PAGEFLAGS_EXTENDED
+	PG_head,		/* A head page */
+	PG_tail,		/* A tail page */
+#else
 	PG_compound,		/* A compound page */
+#endif
 	PG_swapcache,		/* Swap page: swp_entry_t in private */
 	PG_mappedtodisk,	/* Has blocks allocated on-disk */
 	PG_reclaim,		/* To be reclaimed asap */
@@ -248,6 +253,28 @@ static inline void set_page_writeback(st
 	test_set_page_writeback(page);
 }
 
+#ifdef CONFIG_PAGEFLAGS_EXTENDED
+/*
+ * System with lots of page flags available. This allows separate
+ * flags for PageHead() and PageTail() checks of compound pages so that bit
+ * tests can be used in performance sensitive paths. PageCompound is
+ * generally not used in hot code paths.
+ */
+__PAGEFLAG(Head, head)
+__PAGEFLAG(Tail, tail)
+
+static inline int PageCompound(struct page *page)
+{
+	return page->flags & ((1L << PG_head) | (1L << PG_tail));
+
+}
+#else
+/*
+ * Reduce page flag use as much as possible by overlapping
+ * compound page flags with the flags used for page cache pages. Possible
+ * because PageCompound is always set for compound pages and not for
+ * pages on the LRU and/or pagecache.
+ */
 TESTPAGEFLAG(Compound, compound)
 __PAGEFLAG(Head, compound)
 
@@ -278,5 +305,6 @@ static inline void __ClearPageTail(struc
 	page->flags &= ~PG_head_tail_mask;
 }
 
+#endif /* !PAGEFLAGS_EXTENDED */
 #endif /* !__GENERATING_BOUNDS_H */
 #endif /* PAGE_FLAGS_H */
Index: linux-2.6.25-rc8-mm1/mm/Kconfig
===================================================================
--- linux-2.6.25-rc8-mm1.orig/mm/Kconfig	2008-04-03 16:20:25.989689940 -0700
+++ linux-2.6.25-rc8-mm1/mm/Kconfig	2008-04-03 16:30:02.673595262 -0700
@@ -143,6 +143,18 @@ config MEMORY_HOTREMOVE
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
 
+#
+# If we have space for more page flags then we can enable additional
+# optimizations and functionality.
+#
+# Regular Sparsemem takes page flag bits for the sectionid if it does not
+# use a virtual memmap. Disable extended page flags for 32 bit platforms
+# that require the use of a sectionid in the page flags.
+#
+config PAGEFLAGS_EXTENDED
+	def_bool y
+	depends on 64BIT || SPARSEMEM_VMEMMAP || !NUMA || !SPARSEMEM
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.