From: "Chen, Kenneth W" Imprecise RSS accounting is an irritating ill effect with pt sharing. After consulted with several VM experts, I have tried various methods to solve that problem: (1) iterate through all mm_structs that share the PT and increment count; (2) keep RSS count in page table structure and then sum them up at reporting time. None of the above methods yield any satisfactory implementation. Since process RSS accounting is pure information only, I propose we don't count them at all for hugetlb page. rlimit has such field, though there is absolutely no enforcement on limiting that resource. One other method is to account all RSS at hugetlb mmap time regardless they are faulted or not. I opt for the simplicity of no accounting at all. Hugetlb page are special, they are reserved up front in global reservation pool and is not reclaimable. From physical memory resource point of view, it is already consumed regardless whether there are users using them. If the concern is that RSS can be used to control resource allocation, we already can specify hugetlb fs size limit and sysadmin can enforce that at mount time. Combined with the two points mentioned above, I fail to see if there is anything got affected because of this patch. Signed-off-by: Ken Chen Acked-by: Hugh Dickins Cc: Dave McCracken Cc: William Lee Irwin III Cc: "Luck, Tony" Cc: Paul Mackerras Cc: Benjamin Herrenschmidt Cc: David Gibson Cc: Adam Litke Cc: Paul Mundt Cc: "David S. Miller" Signed-off-by: Andrew Morton --- mm/hugetlb.c | 8 -------- 1 file changed, 8 deletions(-) diff -puN mm/hugetlb.c~htlb-forget-rss-with-pt-sharing mm/hugetlb.c --- a/mm/hugetlb.c~htlb-forget-rss-with-pt-sharing +++ a/mm/hugetlb.c @@ -344,7 +344,6 @@ int copy_hugetlb_page_range(struct mm_st entry = *src_pte; ptepage = pte_page(entry); get_page(ptepage); - add_mm_counter(dst, file_rss, HPAGE_SIZE / PAGE_SIZE); set_huge_pte_at(dst, addr, dst_pte, entry); } spin_unlock(&src->page_table_lock); @@ -377,10 +376,6 @@ void __unmap_hugepage_range(struct vm_ar BUG_ON(end & ~HPAGE_MASK); spin_lock(&mm->page_table_lock); - - /* Update high watermark before we lower rss */ - update_hiwater_rss(mm); - for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); if (!ptep) @@ -395,9 +390,7 @@ void __unmap_hugepage_range(struct vm_ar page = pte_page(pte); list_add(&page->lru, &page_list); - add_mm_counter(mm, file_rss, (int) -(HPAGE_SIZE / PAGE_SIZE)); } - spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { @@ -523,7 +516,6 @@ retry: if (!pte_none(*ptep)) goto backout; - add_mm_counter(mm, file_rss, HPAGE_SIZE / PAGE_SIZE); new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED))); set_huge_pte_at(mm, address, ptep, new_pte); _