zone_reclaim: partial scans instead of full scan.

Instead of scanning all the pages in a zone, imitate real swap and scan only
a portion of the pages, gradually scanning more if we do not free up enough
pages. This avoids a zone suddenly losing all unused pagecache pages but
still frees up large chunks if a zone only contains unused pagecache pages.

Signed-off-by: Christoph Lameter

Index: linux-2.6.16-rc1-mm3/mm/vmscan.c
===================================================================
--- linux-2.6.16-rc1-mm3.orig/mm/vmscan.c	2006-01-25 10:02:09.000000000 -0800
+++ linux-2.6.16-rc1-mm3/mm/vmscan.c	2006-01-25 10:05:05.000000000 -0800
@@ -1835,6 +1835,14 @@ int zone_reclaim_mode __read_mostly;
  * Mininum time between zone reclaim scans
  */
 #define ZONE_RECLAIM_INTERVAL 30*HZ
+
+/*
+ * Priority for ZONE_RECLAIM. This determines the fraction of pages
+ * of a node considered for each zone_reclaim. 4 scans 1/16th of
+ * a zone.
+ */
+#define ZONE_RECLAIM_PRIORITY 4
+
 /*
  * Try to free up some pages from this zone through reclaim.
  */
@@ -1865,7 +1873,7 @@ int zone_reclaim(struct zone *zone, gfp_
 	sc.may_swap = 0;
 	sc.nr_scanned = 0;
 	sc.nr_reclaimed = 0;
-	sc.priority = 0;
+	sc.priority = ZONE_RECLAIM_PRIORITY + 1;
 	sc.nr_mapped = read_page_state(nr_mapped);
 	sc.gfp_mask = gfp_mask;
 
@@ -1882,7 +1890,15 @@ int zone_reclaim(struct zone *zone, gfp_
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	shrink_zone(zone, &sc);
+	/*
+	 * Free memory by calling shrink zone with increasing priorities
+	 * until we have enough memory freed.
+	 */
+	do {
+		sc.priority--;
+		shrink_zone(zone, &sc);
+
+	} while (sc.nr_reclaimed < nr_pages && sc.priority > 0);
 
 	p->reclaim_state = NULL;
 	current->flags &= ~PF_MEMALLOC;
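
To illustrate the escalation this patch introduces, here is a minimal userspace sketch of the new loop. It is a toy model, not kernel code: struct reclaim_result, model_zone_reclaim() and the freeable_pct parameter are hypothetical names made up for this illustration, and the assumption that a fixed percentage of scanned pages gets freed is a simplification. Only ZONE_RECLAIM_PRIORITY, the starting priority, and the loop condition are taken from the patch.

```c
#define ZONE_RECLAIM_PRIORITY 4	/* same value as in the patch */

/* Hypothetical result record for this sketch; not a kernel structure. */
struct reclaim_result {
	unsigned long scanned;		/* pages looked at over all passes */
	unsigned long reclaimed;	/* pages freed over all passes */
	int final_priority;		/* priority of the last pass */
};

/*
 * Toy model of the loop in zone_reclaim(): each pass scans
 * zone_pages >> priority pages, and we assume (for illustration only)
 * that freeable_pct percent of the scanned pages are freed.
 */
static struct reclaim_result
model_zone_reclaim(unsigned long zone_pages, unsigned long nr_pages,
		   unsigned int freeable_pct)
{
	struct reclaim_result r = { 0, 0, ZONE_RECLAIM_PRIORITY + 1 };

	do {
		unsigned long scan;

		r.final_priority--;
		/* priority 4 -> 1/16th of the zone, 3 -> 1/8th, ... 0 -> all */
		scan = zone_pages >> r.final_priority;
		r.scanned += scan;
		r.reclaimed += scan * freeable_pct / 100;
	} while (r.reclaimed < nr_pages && r.final_priority > 0);

	return r;
}
```

Under these assumptions, a request for 32 pages from a 16384-page zone in which every scanned page is freeable stops after a single pass at priority 4, having looked at only 1024 pages. If nothing is freeable, the model walks all the way down to priority 0, which corresponds to the old full-scan behaviour.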