From: Christoph Lameter
To: Hugh Dickins
Cc: Andrea Arcangeli
Cc: Robin Holt, Avi Kivity, Izik Eidus
Cc: kvm-devel@lists.sourceforge.net
Cc: Peter Zijlstra, general@lists.openfabrics.org
Cc: Steve Wise
Cc: Roland Dreier
Cc: Kanoj Sarcar
Cc: steiner@sgi.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: daniel.blueman@quadrics.com
Subject-Prefix: [patch @num@/@total@]
Subject: [RFC] EMM Notifier V2

[Note that I will be giving talks next week at the OpenFabrics Forum and at
the Linux Collab Summit in Austin on memory pinning etc. It would be great if
I could get some feedback on the approach then.]

V1->V2:
- Additional optimizations in the VM
- Convert vm spinlocks to rw sems.
- Add XPMEM driver (requires sleeping in callbacks)
- Add XPMEM example

This patch implements a simple callback for device drivers that establish
their own references to pages (KVM, GRU, XPMEM, RDMA/Infiniband, DMA engines
etc.). These references are unknown to the VM (therefore external).

With these callbacks it is possible for the device driver to release external
references when the VM requests it. This enables swapping and page migration,
and allows support of remapping, permission changes etc. for the externally
mapped memory.

With this functionality it also becomes possible to avoid pinning or mlocking
pages (commonly done to stop the VM from unmapping device mapped pages).

A device driver must subscribe to a process using

	emm_register_notifier(struct emm_notifier *, struct mm_struct *)

The VM will then perform callbacks for operations that unmap or change
permissions of pages in that address space. When the process terminates, the
callback function is called with emm_release.

Callbacks are performed before and after the unmapping action of the VM:

	emm_invalidate_start	before
	emm_invalidate_end	after

The device driver must hold off establishing new references to pages in the
specified range between a callback with emm_invalidate_start and the
subsequent callback with emm_invalidate_end. This allows the VM to ensure
that no concurrent driver actions are performed on an address range while
performing remapping or unmapping operations.

This patchset contains additional modifications needed to ensure that the
callbacks can sleep. For that purpose two key locks in the vm need to be
converted to rw_sems. These patches are brand new, invasive and need
extensive discussion and evaluation.

The first patch alone may be applied if callbacks in atomic context are
sufficient for a device driver (likely the case for KVM and GRU and simple
DMA drivers).

Following the VM modifications is the XPMEM device driver that allows sharing
of memory between processes running on different instances of Linux. This is
also a prototype. It is known to run trivial sample programs included as the
last patch.
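
To make the start/end protocol described above a bit more concrete, here is
a rough userspace model of how a subscriber might react to the three
callbacks. It is only a sketch and not code from the patchset: the layout of
struct emm_notifier, the enum emm_operation type, the callback signature, and
every model_* and demo_* name are assumptions made up for illustration. Only
the operation names emm_invalidate_start, emm_invalidate_end and emm_release
come from the description. The sketch is also deliberately conservative: it
refuses all new references while any invalidation is in flight, instead of
tracking the individual ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; none of these definitions are taken from the patch. */
struct emm_notifier;
struct mm_struct {
	struct emm_notifier *emm_notifier;	/* head of the subscriber list */
};

enum emm_operation { emm_release, emm_invalidate_start, emm_invalidate_end };

struct emm_notifier {
	int (*callback)(struct emm_notifier *e, struct mm_struct *mm,
			enum emm_operation op,
			unsigned long start, unsigned long end);
	struct emm_notifier *next;
};

/* Model of subscribing to an address space (hypothetical helper). */
static void model_register_notifier(struct emm_notifier *e,
				    struct mm_struct *mm)
{
	e->next = mm->emm_notifier;
	mm->emm_notifier = e;
}

/* Model of the VM side: run every subscriber's callback for one operation. */
static void model_notify(struct mm_struct *mm, enum emm_operation op,
			 unsigned long start, unsigned long end)
{
	for (struct emm_notifier *e = mm->emm_notifier; e; e = e->next)
		e->callback(e, mm, op, start, end);
}

#define DEMO_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define DEMO_NPAGES	8

struct demo_driver {
	struct emm_notifier notifier;	/* must stay the first member */
	bool mapped[DEMO_NPAGES];	/* external reference held per page */
	int invalidations;		/* start callbacks without an end yet */
	bool released;			/* address space is gone */
};

static int demo_callback(struct emm_notifier *e, struct mm_struct *mm,
			 enum emm_operation op,
			 unsigned long start, unsigned long end)
{
	/* Valid because notifier is the first member of demo_driver. */
	struct demo_driver *d = (struct demo_driver *)e;
	unsigned long pfn;

	(void)mm;
	switch (op) {
	case emm_invalidate_start:
		/* Drop external references in [start, end), block new ones. */
		d->invalidations++;
		for (pfn = start >> DEMO_PAGE_SHIFT;
		     pfn < DEMO_NPAGES && (pfn << DEMO_PAGE_SHIFT) < end; pfn++)
			d->mapped[pfn] = false;
		break;
	case emm_invalidate_end:
		assert(d->invalidations > 0);
		d->invalidations--;
		break;
	case emm_release:
		/* Process is terminating: drop everything, refuse new refs. */
		for (pfn = 0; pfn < DEMO_NPAGES; pfn++)
			d->mapped[pfn] = false;
		d->released = true;
		break;
	}
	return 0;
}

/* Driver-side attempt to establish a new external reference to a page. */
static bool demo_map_page(struct demo_driver *d, unsigned long pfn)
{
	if (d->released || d->invalidations > 0 || pfn >= DEMO_NPAGES)
		return false;	/* caller has to retry later or give up */
	d->mapped[pfn] = true;
	return true;
}
```

In this model a driver would call model_register_notifier() once per address
space, and every path that establishes an external reference would go through
something like demo_map_page(), so that a request arriving between the start
and end callbacks is refused and has to be retried after the end callback.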