From andrea@qumranet.com Tue Jan 29 10:28:41 2008 Date: Tue, 29 Jan 2008 19:28:32 +0100 From: Andrea Arcangeli To: Christoph Lameter Cc: Robin Holt , Avi Kivity , Izik Eidus , Nick Piggin , kvm-devel@lists.sourceforge.net, Benjamin Herrenschmidt , Peter Zijlstra , steiner@sgi.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, daniel.blueman@quadrics.com, Hugh Dickins Subject: Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Christoph, the below patch should fix the current leak of the pinned pages. I hope the page-pin that should be dropped by the invalidate_range op, is enough to prevent the "physical page" mapped on that "mm+address" to change before invalidate_range returns. If that would ever happen, there would be a coherency loss between the guest VM writes and the writes coming from userland on the same mm+address from a different thread (qemu, whatever). invalidate_page before PT lock was obviously safe. Now we entirely relay on the pin to prevent the page to change before invalidate_range returns. If the pte is unmapped and the page is mapped back in with a minor fault that's ok, as long as the physical page remains the same for that mm+address, until all sptes are gone. Signed-off-by: Andrea Arcangeli --- mm/fremap.c | 2 +- mm/memory.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) Index: linux-2.6/mm/fremap.c =================================================================== --- linux-2.6.orig/mm/fremap.c 2008-01-29 12:42:16.000000000 -0800 +++ linux-2.6/mm/fremap.c 2008-01-29 16:39:15.000000000 -0800 @@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns spin_unlock(&mapping->i_mmap_lock); } - mmu_notifier(invalidate_range, mm, start, start + size, 0); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier(invalidate_range, mm, start, start + size, 0); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); Index: linux-2.6/mm/memory.c =================================================================== --- linux-2.6.orig/mm/memory.c 2008-01-28 18:41:17.000000000 -0800 +++ linux-2.6/mm/memory.c 2008-01-29 16:39:15.000000000 -0800 @@ -1638,8 +1638,6 @@ gotten: /* * Re-check the pte - we dropped the lock */ - mmu_notifier(invalidate_range, mm, address, - address + PAGE_SIZE - 1, 0); page_table = pte_offset_map_lock(mm, pmd, address, &ptl); if (likely(pte_same(*page_table, orig_pte))) { if (old_page) { @@ -1675,6 +1673,8 @@ gotten: page_cache_release(old_page); unlock: pte_unmap_unlock(page_table, ptl); + mmu_notifier(invalidate_range, mm, address, + address + PAGE_SIZE - 1, 0); if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file);