commit 422e6c4bc4b48c15b3cb57a1ca71431abfc57e54 Merge: c83ce98 574197e Author: Linus Torvalds Date: Tue Mar 15 15:48:13 2011 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits) tidy the trailing symlinks traversal up Turn resolution of trailing symlinks iterative everywhere simplify link_path_walk() tail Make trailing symlink resolution in path_lookupat() iterative update nd->inode in __do_follow_link() instead of after do_follow_link() pull handling of one pathname component into a helper fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH Allow passing O_PATH descriptors via SCM_RIGHTS datagrams readlinkat(), fchownat() and fstatat() with empty relative pathnames Allow O_PATH for symlinks New kind of open files - "location only". ext4: Copy fs UUID to superblock ext3: Copy fs UUID to superblock. vfs: Export file system uuid via /proc//mountinfo unistd.h: Add new syscalls numbers to asm-generic x86: Add new syscalls for x86_64 x86: Add new syscalls for x86_32 fs: Remove i_nlink check from file system link callback fs: Don't allow to create hardlink for deleted file vfs: Add open by file handle support ... commit c83ce989cb5ff86575821992ea82c4df5c388ebc Author: Trond Myklebust Date: Tue Mar 15 13:36:43 2011 -0400 VFS: Fix the nfs sillyrename regression in kernel 2.6.38 The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename because the latter relies on being able to determine the parent directory of the dentry in the ->iput() callback in order to send the appropriate unlink rpc call. Looking at the code that cares about races with dput(), there doesn't seem to be anything that specifically uses d_parent as a test for whether or not there is a race: - __d_lookup_rcu(), __d_lookup() all test for d_hashed() after d_parent - shrink_dcache_for_umount() is safe since nothing else can rearrange the dentries in that super block. - have_submount(), select_parent() and d_genocide() can test for a deletion if we set the DCACHE_DISCONNECTED flag when the dentry is removed from the parent's d_subdirs list. Signed-off-by: Trond Myklebust Cc: stable@kernel.org (2.6.38, needs commit c826cb7dfce8 "dcache.c: create helper function for duplicated functionality" ) Signed-off-by: Linus Torvalds commit c826cb7dfce80512c26c984350077a25046bd215 Author: Linus Torvalds Date: Tue Mar 15 15:29:21 2011 -0700 dcache.c: create helper function for duplicated functionality This creates a helper function for he "try to ascend into the parent directory" case, which was written out in triplicate before. With all the locking and subtle sequence number stuff, we really don't want to duplicate that kind of code. Signed-off-by: Linus Torvalds commit 574197e0de46a8a4db5c54ef7b65e43ffa8873a7 Author: Al Viro Date: Mon Mar 14 22:20:34 2011 -0400 tidy the trailing symlinks traversal up * pull the handling of current->total_link_count into __do_follow_link() * put the common "do ->put_link() if needed and path_put() the link" stuff into a helper (put_link(nd, link, cookie)) * rename __do_follow_link() to follow_link(), while we are at it Signed-off-by: Al Viro commit b356379a020bb7197603118bb1cbc903963aa198 Author: Al Viro Date: Mon Mar 14 21:54:55 2011 -0400 Turn resolution of trailing symlinks iterative everywhere The last remaining place (resolution of nested symlink) converted to the loop of the same kind we have in path_lookupat() and path_openat(). Note that we still *do* have a recursion in pathname resolution; can't avoid it, really. However, it's strictly for nested symlinks now - i.e. ones in the middle of a pathname. link_path_walk() has lost the tail now - it always walks everything except the last component. do_follow_link() renamed to nested_symlink() and moved down. Signed-off-by: Al Viro commit ce0525449da56444948c368f52e10f3db0465338 Author: Al Viro Date: Mon Mar 14 21:28:04 2011 -0400 simplify link_path_walk() tail Now that link_path_walk() is called without LOOKUP_PARENT only from do_follow_link(), we can simplify the checks in last component handling. First of all, checking if we'd arrived to a directory is not needed - the caller will check it anyway. And LOOKUP_FOLLOW is guaranteed to be there, since we only get to that place with nd->depth > 0. Signed-off-by: Al Viro commit bd92d7fed877ed1e6997e4f3f13dbcd872947653 Author: Al Viro Date: Mon Mar 14 19:54:59 2011 -0400 Make trailing symlink resolution in path_lookupat() iterative Now the only caller of link_path_walk() that does *not* pass LOOKUP_PARENT is do_follow_link() Signed-off-by: Al Viro commit b21041d0f72899ed815bd2cbf7275339c74737b6 Author: Al Viro Date: Mon Mar 14 20:01:51 2011 -0400 update nd->inode in __do_follow_link() instead of after do_follow_link() ... and note that we only need to do it for LAST_BIND symlinks Signed-off-by: Al Viro commit ce57dfc1791221ef58b6d6b8f5437fccefc4e187 Author: Al Viro Date: Sun Mar 13 19:58:58 2011 -0400 pull handling of one pathname component into a helper new helper: walk_component(). Handles everything except symlinks; returns negative on error, 0 on success and 1 on symlinks we decided to follow. Drops out of RCU mode on such symlinks. link_path_walk() and do_last() switched to using that. Signed-off-by: Al Viro commit 11a7b371b64ef39fc5fb1b6f2218eef7c4d035e3 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:42 2011 +0530 fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH We don't want to allow creation of private hardlinks by different application using the fd passed to them via SCM_RIGHTS. So limit the null relative name usage in linkat syscall to CAP_DAC_READ_SEARCH Signed-off-by: Aneesh Kumar K.V commit 76ca07832842100b14a31ad8996dab7b0c28aa42 Merge: 27d2a8b b056b6a Author: Linus Torvalds Date: Tue Mar 15 10:59:09 2011 -0700 Merge branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: suspend: remove xen_hvm_suspend xen: suspend: pull pre/post suspend hooks out into suspend_info xen: suspend: move arch specific pre/post suspend hooks into generic hooks xen: suspend: refactor non-arch specific pre/post suspend hooks xen: suspend: add "arch" to pre/post suspend hooks xen: suspend: pass extra hypercall argument via suspend_info struct xen: suspend: refactor cancellation flag into a structure xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding xen: switch to new schedop hypercall by default. xen: use new schedop interface for suspend xen: do not respond to unknown xenstore control requests xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled xen: PV on HVM: support PV spinlocks and IPIs xen: make the ballon driver work for hvm domains xen-blkfront: handle Xen major numbers other than XENVBD xen: do not use xen_info on HVM, set pv_info name to "Xen HVM" xen: no need to delay xen_setup_shutdown_event for hvm guests anymore commit 27d2a8b97ebc4467e47722415b81ebe72d5f654f Merge: 010b8f4 44e6976 51de695 44b46c3 Author: Linus Torvalds Date: Tue Mar 15 10:49:16 2011 -0700 Merge branches 'stable/ia64', 'stable/blkfront-cleanup' and 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: ia64 build broken due to "xen: switch to new schedop hypercall by default." * 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Union the blkif_request request specific fields * 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: annotate functions which only call into __init at start of day xen p2m: annotate variable which appears unused xen: events: mark cpu_evtchn_mask_p as __refdata commit 010b8f4e264b0b6f596186574956dde2fa02df1c Merge: 397fae0 71eef7d Author: Linus Torvalds Date: Tue Mar 15 10:47:56 2011 -0700 Merge branch 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: events: remove dom0 specific xen_create_msi_irq xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq xen: events: push set_irq_msi down into xen_create_msi_irq xen: events: update pirq_to_irq in xen_create_msi_irq xen: events: refactor xen_create_msi_irq slightly xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ xen: events: assume PHYSDEVOP_get_free_pirq exists xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq xen: events: return irq from xen_allocate_pirq_msi xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available. xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0 commit 397fae081869784d07cd4edde0ddf436ca2011e0 Merge: c7146dd 1aa0b51 3d74a53 Author: Linus Torvalds Date: Tue Mar 15 10:47:16 2011 -0700 Merge branches 'stable/irq.rework' and 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME * 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work() commit c7146dd0090b9c98ae8525900abf1c38fc7e4e0d Merge: 521cb40 706cc9d 86b3212 Author: Linus Torvalds Date: Tue Mar 15 10:32:15 2011 -0700 Merge branches 'stable/p2m-identity.v4.9.1' and 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/p2m-identity.v4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. xen/m2p: No need to catch exceptions when we know that there is no RAM xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. xen/debugfs: Add 'p2m' file for printing out the P2M layout. xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. xen/mmu: WARN_ON when racing to swap middle leaf. xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. xen/mmu: Add the notion of identity (1-1) mapping. xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. * 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps. commit 326be7b484843988afe57566b627fb7a70beac56 Author: Al Viro Date: Sun Mar 13 17:08:22 2011 -0400 Allow passing O_PATH descriptors via SCM_RIGHTS datagrams Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: Al Viro commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames For readlinkat() we simply allow empty pathname; it will fail unless we have dfd equal to O_PATH-opened symlink, so we are outside of POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH; let the caller explicitly ask for such behaviour. Signed-off-by: Al Viro commit bcda76524cd1fa32af748536f27f674a13e56700 Author: Al Viro Date: Sun Mar 13 16:42:14 2011 -0400 Allow O_PATH for symlinks At that point we can't do almost nothing with them. They can be opened with O_PATH, we can manipulate such descriptors with dup(), etc. and we can see them in /proc/*/{fd,fdinfo}/*. We can't (and won't be able to) follow /proc/*/fd/* symlinks for those; there's simply not enough information for pathname resolution to go on from such point - to resolve a symlink we need to know which directory does it live in. We will be able to do useful things with them after the next commit, though - readlinkat() and fchownat() will be possible to use with dfd being an O_PATH-opened symlink and empty relative pathname. Combined with open_by_handle() it'll give us a way to do realink-by-handle and lchown-by-handle without messing with more redundant syscalls. Signed-off-by: Al Viro commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd Author: Al Viro Date: Sun Mar 13 03:51:11 2011 -0400 New kind of open files - "location only". New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does *NOT* affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro commit f2fa2ffc2046fdc35f96366d1ec8675f4d578522 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:40 2011 +0530 ext4: Copy fs UUID to superblock File system UUID is made available to application via /proc//mountinfo Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 03cb5f03dcb26846fcad345d8c15aae91579a53d Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:39 2011 +0530 ext3: Copy fs UUID to superblock. File system UUID is made available to application via /proc//mountinfo Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 93f1c20bc8cdb757be50566eff88d65c3b26881f Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:38 2011 +0530 vfs: Export file system uuid via /proc//mountinfo We add a per superblock uuid field. File systems should update the uuid in the fill_super callback Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit a51571ccb8be1b88aea502ebba8350519682c16d Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:38 2011 +0530 unistd.h: Add new syscalls numbers to asm-generic Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 6aae5f2b2085c5c90964bb78676ea8a6a336e037 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:37 2011 +0530 x86: Add new syscalls for x86_64 This patch add new syscalls to x86_64 Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 7dadb755b082c259f7dd4a95a3a6eb21646a28d5 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:35 2011 +0530 x86: Add new syscalls for x86_32 This patch adds new syscalls to x86_32 Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit f17b6042073e7000a90063f7edbca59a5bd1caa2 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:30 2011 +0530 fs: Remove i_nlink check from file system link callback Now that VFS check for inode->i_nlink == 0 and returns proper error, remove similar check from file system Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit aae8a97d3ec30788790d1720b71d76fd8eb44b73 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:27 2011 +0530 fs: Don't allow to create hardlink for deleted file Add inode->i_nlink == 0 check in VFS. Some of the file systems do this internally. A followup patch will remove those instance. This is needed to ensure that with link by handle we don't allow to create hardlink of an unlinked file. The check also prevent a race between unlink and link Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit becfd1f37544798cbdfd788f32c827160fab98c1 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:26 2011 +0530 vfs: Add open by file handle support [AV: duplicate of open() guts removed; file_open_root() used instead] Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 990d6c2d7aee921e3bce22b2d6a750fd552262be Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:26 2011 +0530 vfs: Add name to file handle conversion support The syscall also return mount id which can be used to lookup file system specific information such as uuid in /proc//mountinfo Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit f52e0c11305aa09ed56cad97ffc8f0cdc3d78b5d Author: Al Viro Date: Mon Mar 14 18:56:51 2011 -0400 New AT_... flag: AT_EMPTY_PATH For name_to_handle_at(2) we'll want both ...at()-style syscall that would be usable for non-directory descriptors (with empty relative pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to deal with the latter. Signed-off-by: Al Viro commit 706cc9d2a4cb9b03217e15b0bb3d117f4d5109ee Author: Stefano Stabellini Date: Wed Feb 2 18:32:59 2011 +0000 xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. If there is no proper PFN value in the M2P for the MFN (so we get 0xFFFFF.. or 0x55555, or 0x0), we should consult the M2P override to see if there is an entry for this. [Note: we also consult the M2P override if the MFN is past our machine_to_phys size]. We consult the P2M with the PFN. In case the returned MFN is one of the special values: 0xFFF.., 0x5555 (which signify that the MFN can be either "missing" or it belongs to DOMID_IO) or the p2m(m2p(mfn)) != mfn, we check the M2P override. If we fail the M2P override check, we reset the PFN value to INVALID_P2M_ENTRY. Next we try to find the MFN in the P2M using the MFN value (not the PFN value) and if found, we know that this MFN is an identity value and return it as so. Otherwise we have exhausted all the posibilities and we return the PFN, which at this stage can either be a real PFN value found in the machine_to_phys.. array, or INVALID_P2M_ENTRY value. [v1: Added Review-by tag] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 146c4e511717e581065800938537b276173d8548 Author: Konrad Rzeszutek Wilk Date: Fri Jan 14 17:55:44 2011 -0500 xen/m2p: No need to catch exceptions when we know that there is no RAM .. beyound what we think is the end of memory. However there might be more System RAM - but assigned to a guest. Hence jump to the M2P override check and consult. [v1: Added Review-by tag] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit fc25151d9ac7d809239fe68de0a1490b504bb94a Author: Konrad Rzeszutek Wilk Date: Thu Dec 23 16:25:29 2010 -0500 xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. Only enabled if XEN_DEBUG is enabled. We print a warning when: pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag set (and pfn is an identity mapped pfn) pfn_to_mfn(pfn) != pfn, and VM_IO flag is set. (ditto, pfn is an identity mapped pfn) [v2: Make it dependent on CONFIG_XEN_DEBUG instead of ..DEBUG_FS] [v3: Fix compiler warning] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 2222e71bd6eff7b2ad026d4ee663b6327c5a49f5 Author: Konrad Rzeszutek Wilk Date: Wed Dec 22 08:57:30 2010 -0500 xen/debugfs: Add 'p2m' file for printing out the P2M layout. We walk over the whole P2M tree and construct a simplified view of which PFN regions belong to what level and what type they are. Only enabled if CONFIG_XEN_DEBUG_FS is set. [v2: UNKN->UNKNOWN, use uninitialized_var] [v3: Rebased on top of mmu->p2m code split] [v4: Fixed the else if] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 68df0da7f42be6ae017fe9f48ac414c43a7b9d32 Author: Konrad Rzeszutek Wilk Date: Tue Feb 1 17:15:30 2011 -0500 xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. We walk the E820 region and start at 0 (for PV guests we start at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other regions and as well the gaps we set them to be identity mappings. The reasons we do not want to set the identity mapping from 0-> ISA_END_ADDRESS when running as PV is b/c that the kernel would try to read DMI information and fail (no permissions to read that). There is a lot of gnarly code to deal with that weird region so we won't try to do a cleanup in this patch. This code ends up calling 'set_phys_to_identity' with the start and end PFN of the the E820 that are non-RAM or have gaps. On 99% of machines that means one big region right underneath the 4GB mark. Usually starts at 0xc0000 (or 0x80000) and goes to 0x100000. [v2: Fix for E820 crossing 1MB region and clamp the start] [v3: Squshed in code that does this over ranges] [v4: Moved the comment to the correct spot] [v5: Use the "raw" E820 from the hypervisor] [v6: Added Review-by tag] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit c7617798771ad588d585986d896197c04b737621 Author: Konrad Rzeszutek Wilk Date: Tue Jan 18 20:17:10 2011 -0500 xen/mmu: WARN_ON when racing to swap middle leaf. The initial bootup code uses set_phys_to_machine quite a lot, and after bootup it would be used by the balloon driver. The balloon driver does have mutex lock so this should not be necessary - but just in case, add a WARN_ON if we do hit this scenario. If we do fail this, it is OK to continue as there is a backup mechanism (VM_IO) that can bypass the P2M and still set the _PAGE_IOMAP flags. [v2: Change from WARN to BUG_ON] [v3: Rebased on top of xen->p2m code split] [v4: Change from BUG_ON to WARN] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit fb38923ead10aa8a28db191548e176e8856614d7 Author: Konrad Rzeszutek Wilk Date: Wed Jan 5 15:46:31 2011 -0500 xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. If we find that the PFN is within the P2M as an identity PFN make sure to tack on the _PAGE_IOMAP flag. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit f4cec35b0d4b90d96e3770a3d1e68ea882e7a7c8 Author: Konrad Rzeszutek Wilk Date: Tue Jan 18 20:15:21 2011 -0500 xen/mmu: Add the notion of identity (1-1) mapping. Our P2M tree structure is a three-level. On the leaf nodes we set the Machine Frame Number (MFN) of the PFN. What this means is that when one does: pfn_to_mfn(pfn), which is used when creating PTE entries, you get the real MFN of the hardware. When Xen sets up a guest it initially populates a array which has descending (or ascending) MFN values, as so: idx: 0, 1, 2 [0x290F, 0x290E, 0x290D, ..] so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list starts looking quite random. We graft this structure on our P2M tree structure and stick in those MFN in the leafs. But for all other leaf entries, or for the top root, or middle one, for which there is a void entry, we assume it is "missing". So pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. We add the possibility of setting 1-1 mappings on certain regions, so that: pfn_to_mfn(0xc0000)=0xc0000 The benefit of this is, that we can assume for non-RAM regions (think PCI BARs, or ACPI spaces), we can create mappings easily b/c we get the PFN value to match the MFN. For this to work efficiently we introduce one new page p2m_identity and allocate (via reserved_brk) any other pages we need to cover the sides (1GB or 4MB boundary violations). All entries in p2m_identity are set to INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs, no other fancy value). On lookup we spot that the entry points to p2m_identity and return the identity value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry points to an allocated page, we just proceed as before and return the PFN. If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions (pfn_to_mfn). The reason for having the IDENTITY_FRAME_BIT instead of just returning the PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a non-identity pfn. To protect ourselves against we elect to set (and get) the IDENTITY_FRAME_BIT on all identity mapped PFNs. This simplistic diagram is used to explain the more subtle piece of code. There is also a digram of the P2M at the end that can help. Imagine your E820 looking as so: 1GB 2GB /-------------------+---------\/----\ /----------\ /---+-----\ | System RAM | Sys RAM ||ACPI| | reserved | | Sys RAM | \-------------------+---------/\----/ \----------/ \---+-----/ ^- 1029MB ^- 2001MB [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)] And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB is actually not present (would have to kick the balloon driver to put it in). When we are told to set the PFNs for identity mapping (see patch: "xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start of the PFN and the end PFN (263424 and 512256 respectively). The first step is to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page covers 512^2 of page estate (1GB) and in case the start or end PFN is not aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to end pfn. We reserve_brk top leaf pages if they are missing (means they point to p2m_mid_missing). With the E820 example above, 263424 is not 1GB aligned so we allocate a reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000. Each entry in the allocate page is "missing" (points to p2m_missing). Next stage is to determine if we need to do a more granular boundary check on the 4MB (or 2MB depending on architecture) off the start and end pfn's. We check if the start pfn and end pfn violate that boundary check, and if so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer granularity of setting which PFNs are missing and which ones are identity. In our example 263424 and 512256 both fail the check so we reserve_brk two pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values) and assign them to p2m[1][2] and p2m[1][488] respectively. At this point we would at minimum reserve_brk one page, but could be up to three. Each call to set_phys_range_identity has at maximum a three page cost. If we were to query the P2M at this stage, all those entries from start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY ("missing"). The next step is to walk from the start pfn to the end pfn setting the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'. If we find that the middle leaf is pointing to p2m_missing we can swap it over to p2m_identity - this way covering 4MB (or 2MB) PFN space. At this point we do not need to worry about boundary aligment (so no need to reserve_brk a middle page, figure out which PFNs are "missing" and which ones are identity), as that has been done earlier. If we find that the middle leaf is not occupied by p2m_identity or p2m_missing, we dereference that page (which covers 512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example 263424 and 512256 end up there, and we set from p2m[1][2][256->511] and p2m[1][488][0->256] with IDENTITY_FRAME_BIT set. All other regions that are void (or not filled) either point to p2m_missing (considered missing) or have the default value of INVALID_P2M_ENTRY (also considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511] contain the INVALID_P2M_ENTRY value and are considered "missing." This is what the p2m ends up looking (for the E820 above) with this fabulous drawing: p2m /--------------\ /-----\ | &mfn_list[0],| /-----------------\ | 0 |------>| &mfn_list[1],| /---------------\ | ~0, ~0, .. | |-----| | ..., ~0, ~0 | | ~0, ~0, [x]---+----->| IDENTITY [@256] | | 1 |---\ \--------------/ | [p2m_identity]+\ | IDENTITY [@257] | |-----| \ | [p2m_identity]+\\ | .... | | 2 |--\ \-------------------->| ... | \\ \----------------/ |-----| \ \---------------/ \\ | 3 |\ \ \\ p2m_identity |-----| \ \-------------------->/---------------\ /-----------------\ | .. +->+ | [p2m_identity]+-->| ~0, ~0, ~0, ... | \-----/ / | [p2m_identity]+-->| ..., ~0 | / /---------------\ | .... | \-----------------/ / | IDENTITY[@0] | /-+-[x], ~0, ~0.. | / | IDENTITY[@256]|<----/ \---------------/ / | ~0, ~0, .... | | \---------------/ | p2m_missing p2m_missing /------------------\ /------------\ | [p2m_mid_missing]+---->| ~0, ~0, ~0 | | [p2m_mid_missing]+---->| ..., ~0 | \------------------/ \------------/ where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT) Reviewed-by: Ian Campbell [v5: Changed code to use ranges, added ASCII art] [v6: Rebased on top of xen->p2m code split] [v4: Squished patches in just this one] [v7: Added RESERVE_BRK for potentially allocated pages] [v8: Fixed alignment problem] [v9: Changed 1<<3X to 1< commit 5fe0c2378884e68beb532f5890cc0e3539ac747b Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:25 2011 +0530 exportfs: Return the minimum required handle size The exportfs encode handle function should return the minimum required handle size. This helps user to find out the handle size by passing 0 handle size in the first step and then redoing to the call again with the returned handle size value. Acked-by: Serge Hallyn Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit c8b91accfa1059d5565443193d89572eca2f5dd6 Author: Al Viro Date: Sat Mar 12 10:41:39 2011 -0500 clean statfs-like syscalls up New helpers: user_statfs() and fd_statfs(), taking userland pathname and descriptor resp. and filling struct kstatfs. Syscalls of statfs family (native, compat and foreign - osf and hpux on alpha and parisc resp.) switched to those. Removes some boilerplate code, simplifies cleanup on errors... Signed-off-by: Al Viro commit 73d049a40fc6269189c4e2ba6792cb5dd054883c Author: Al Viro Date: Fri Mar 11 12:08:24 2011 -0500 open-style analog of vfs_path_lookup() new function: file_open_root(dentry, mnt, name, flags) opens the file vfs_path_lookup would arrive to. Note that name can be empty; in that case the usual requirement that dentry should be a directory is lifted. open-coded equivalents switched to it, may_open() got down exactly one caller and became static. Signed-off-by: Al Viro commit 5b6ca027d85b7438c84b78a54ccdc2e53f2909cd Author: Al Viro Date: Wed Mar 9 23:04:47 2011 -0500 reduce vfs_path_lookup() to do_path_lookup() New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller, path_init() starts walking from that place and all pathname resolution machinery never drops nd->root if that flag is set. That turns vfs_path_lookup() into a special case of do_path_lookup() *and* gets us down to 3 callers of link_path_walk(), making it finally feasible to rip the handling of trailing symlink out of link_path_walk(). That will not only simply the living hell out of it, but make life much simpler for unionfs merge. Trailing symlink handling will become iterative, which is a good thing for stack footprint in a lot of situations as well. Signed-off-by: Al Viro commit 5a18fff2090c3af830d699c8ccb230498a1e37e5 Author: Al Viro Date: Fri Mar 11 04:44:53 2011 -0500 untangle do_lookup() That thing has devolved into rats nest of gotos; sane use of unlikely() gets rid of that horror and gives much more readable structure: * make a fast attempt to find a dentry; false negatives are OK. In RCU mode if everything went fine, we are done, otherwise just drop out of RCU. If we'd done (RCU) ->d_revalidate() and it had not refused outright (i.e. didn't give us -ECHILD), remember its result. * now we are not in RCU mode and hopefully have a dentry. If we do not, lock parent, do full d_lookup() and if that has not found anything, allocate and call ->lookup(). If we'd done that ->lookup(), remember that dentry is good and we don't need to revalidate it. * now we have a dentry. If it has ->d_revalidate() and we can't skip it, call it. * hopefully dentry is good; if not, either fail (in case of error) or try to invalidate it. If d_invalidate() has succeeded, drop it and retry everything as if original attempt had not found a dentry. * now we can finish it up - deal with mountpoint crossing and automount. Signed-off-by: Al Viro commit 40b39136f07279fdc868a36cba050f4e84ce0ace Author: Al Viro Date: Wed Mar 9 16:22:18 2011 -0500 path_openat: clean ELOOP handling a bit Signed-off-by: Al Viro commit f374ed5fa8afed8590deaae5dc147422e0e1a6d9 Author: Al Viro Date: Wed Mar 9 01:34:45 2011 -0500 do_last: kill a rudiment of old ->d_revalidate() workaround There used to be time when ->d_revalidate() couldn't return an error. So intents code had lookup_instantiate_filp() stash ERR_PTR(error) in nd->intent.open.filp and had it checked after lookup_hash(), to catch the otherwise silent failures. That had been introduced by commit 4af4c52f34606bdaab6930a845550c6fb02078a4. These days ->d_revalidate() can and does propagate errors back to callers explicitly, so this check isn't needed anymore. Signed-off-by: Al Viro commit 6c0d46c493217cf48999b3f8808910ae534aa085 Author: Al Viro Date: Wed Mar 9 00:59:59 2011 -0500 fold __open_namei_create() and open_will_truncate() into do_last() ... and clean up a bit more Signed-off-by: Al Viro commit ca344a894b41a133dab07dfbbdf652c053f6658c Author: Al Viro Date: Wed Mar 9 00:36:45 2011 -0500 do_last: unify may_open() call and everyting after it We have a bunch of diverging codepaths in do_last(); some of them converge, but the case of having to create a new file duplicates large part of common tail of the rest and exits separately. Massage them so that they could be merged. Signed-off-by: Al Viro commit 9b44f1b3928b6f41532c9a1dc9a6fc665989ad5b Author: Al Viro Date: Wed Mar 9 00:17:27 2011 -0500 move may_open() from __open_name_create() to do_last() Signed-off-by: Al Viro commit 0f9d1a10c341020617e5b1c7f9c16f6a070438ec Author: Al Viro Date: Wed Mar 9 00:13:14 2011 -0500 expand finish_open() in its only caller Signed-off-by: Al Viro commit 5a202bcd75bbd2397136397961babbd8463416af Author: Al Viro Date: Tue Mar 8 14:17:44 2011 -0500 sanitize pathname component hash calculation Lift it to lookup_one_len() and link_path_walk() resp. into the same place where we calculated default hash function of the same name. Signed-off-by: Al Viro commit 6a96ba54418be740303765c0f52be028573cb99a Author: Al Viro Date: Mon Mar 7 23:49:20 2011 -0500 kill __lookup_one_len() only one caller left Signed-off-by: Al Viro commit fe2d35ff0d18a2c93993b0d7d46f846ff4331b72 Author: Al Viro Date: Sat Mar 5 22:58:25 2011 -0500 switch non-create side of open() to use of do_last() Instead of path_lookupat() doing trailing symlink resolution, use the same scheme as on the O_CREAT side. Walk with LOOKUP_PARENT, then (in do_last()) look the final component up, then either open it or return error or, if it's a symlink, give the symlink back to path_openat() to be resolved there. The really messy complication here is RCU. We don't want to drop out of RCU mode before the final lookup, since we don't want to bounce parent directory ->d_count without a good reason. Result is _not_ pretty; later in the series we'll clean it up. For now we are roughly back where we'd been before the revert done by Nick's series - top-level logics of path_openat() is cleaned up, do_last() does actual opening, symlink resolution is done uniformly. Signed-off-by: Al Viro commit 70e9b3571107b88674cd55ae4bed33f76261e7d3 Author: Al Viro Date: Sat Mar 5 21:12:22 2011 -0500 get rid of nd->file Don't stash the struct file * used as starting point of walk in nameidata; pass file ** to path_init() instead. Signed-off-by: Al Viro commit 951361f954596bd134d4270df834f47d151f98a6 Author: Al Viro Date: Fri Mar 4 14:44:37 2011 -0500 get rid of the last LOOKUP_RCU dependencies in link_path_walk() New helper: terminate_walk(). An error has happened during pathname resolution and we either drop nd->path or terminate RCU, depending the mode we had been in. After that, nd is essentially empty. Switch link_path_walk() to using that for cleanup. Now the top-level logics in link_path_walk() is back to sanity. RCU dependencies are in the lower-level functions. Signed-off-by: Al Viro commit a7472baba22dd5d68580f528374f93421b33667e Author: Al Viro Date: Fri Mar 4 14:39:30 2011 -0500 make nameidata_dentry_drop_rcu_maybe() always leave RCU mode Now we have do_follow_link() guaranteed to leave without dangling RCU and the next step will get LOOKUP_RCU logics completely out of link_path_walk(). Signed-off-by: Al Viro commit ef7562d5283a91da3ba5c14de3221f47b7f08823 Author: Al Viro Date: Fri Mar 4 14:35:59 2011 -0500 make handle_dots() leave RCU mode on error Signed-off-by: Al Viro commit 4455ca6223cc59cbc0a75f4be8bce9e84cc0d6b8 Author: Al Viro Date: Fri Mar 4 14:28:10 2011 -0500 clear RCU on all failure exits from link_path_walk() Signed-off-by: Al Viro commit 9856fa1b281eccdc9f8d94d716e96818c675e78e Author: Al Viro Date: Fri Mar 4 14:22:06 2011 -0500 pull handling of . and .. into inlined helper getting LOOKUP_RCU checks out of link_path_walk()... Signed-off-by: Al Viro commit 7bc055d1d524f209bf49d8b9cb220712dd7df4ed Author: Al Viro Date: Wed Feb 23 19:41:31 2011 -0500 kill out_dput: in link_path_walk() Signed-off-by: Al Viro commit 13aab428a73d3200b9283b61b7fdf5713181ac66 Author: Al Viro Date: Wed Feb 23 17:54:08 2011 -0500 separate -ESTALE/-ECHILD retries in do_filp_open() from real work new helper: path_openat(). Does what do_filp_open() does, except that it tries only the walk mode (RCU/normal/force revalidation) it had been told to. Both create and non-create branches are using path_lookupat() now. Fixed the double audit_inode() in non-create branch. Signed-off-by: Al Viro commit 47c805dc2d2dff686962f5f0baa6bac2d703ba19 Author: Al Viro Date: Wed Feb 23 17:44:09 2011 -0500 switch do_filp_open() to struct open_flags take calculation of open_flags by open(2) arguments into new helper in fs/open.c, move filp_open() over there, have it and do_sys_open() use that helper, switch exec.c callers of do_filp_open() to explicit (and constant) struct open_flags. Signed-off-by: Al Viro commit c3e380b0b3cfa613189fb91513efd88a65e1d9d8 Author: Al Viro Date: Wed Feb 23 13:39:45 2011 -0500 Collect "operation mode" arguments of do_last() into a structure No point messing with passing shitloads of "operation mode" arguments to do_open() one by one, especially since they are not going to change during do_filp_open(). Collect them into a struct, fill it and pass to do_last() by reference. Make sure that lookup intent flags are correctly set and removed - we want them for do_last(), but they make no sense for __do_follow_link(). Signed-off-by: Al Viro commit f1afe9efc84476ca42fbb7301a441021063eead7 Author: Al Viro Date: Tue Feb 22 22:27:28 2011 -0500 clean up the failure exits after __do_follow_link() in do_filp_open() Signed-off-by: Al Viro commit 36f3b4f69070fee7c647bab5dc4408990bb3606c Author: Al Viro Date: Tue Feb 22 21:24:38 2011 -0500 pull security_inode_follow_link() into __do_follow_link() Signed-off-by: Al Viro commit 086e183a641109033420e0b26ddecb6f4abb4c89 Author: Al Viro Date: Tue Feb 22 20:56:27 2011 -0500 pull dropping RCU on success of link_path_walk() into path_lookupat() Signed-off-by: Al Viro commit 16c2cd7179881d5dd87779512ca5a0d657c64f62 Author: Al Viro Date: Tue Feb 22 15:50:10 2011 -0500 untangle the "need_reval_dot" mess instead of ad-hackery around need_reval_dot(), do the following: set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute symlink traversal, on ".." and on procfs-style symlinks. Clear on normal components, leave unchanged on ".". Non-nested callers of link_path_walk() call handle_reval_path(), which checks that flag is set and that fs does want the final revalidate thing, then does ->d_revalidate(). In link_path_walk() all the return_reval stuff is gone. Signed-off-by: Al Viro commit fe479a580dc9c737c4eb49ff7fdb31d41d2c7003 Author: Al Viro Date: Tue Feb 22 15:10:03 2011 -0500 merge component type recognition no need to do it in three places... Signed-off-by: Al Viro commit e41f7d4ee5bdb00da7d327a00b0ab9c4a2e9eaa3 Author: Al Viro Date: Tue Feb 22 14:02:58 2011 -0500 merge path_init and path_init_rcu Actual dependency on whether we want RCU or not is in 3 small areas (as it ought to be) and everything around those is the same in both versions. Since each function has only one caller and those callers are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner to merge them and pull the checks inside. Signed-off-by: Al Viro commit ee0827cd6b42b0385dc1a116cd853ac1b739f711 Author: Al Viro Date: Mon Feb 21 23:38:09 2011 -0500 sanitize path_walk() mess New helper: path_lookupat(). Basically, what do_path_lookup() boils to modulo -ECHILD/-ESTALE handler. path_walk* family is gone; vfs_path_lookup() is using link_path_walk() directly, do_path_lookup() and do_filp_open() are using path_lookupat(). Signed-off-by: Al Viro commit 52094c8a0610cf57920ad4c6c57470ae2ccbbd25 Author: Al Viro Date: Mon Feb 21 21:34:47 2011 -0500 take RCU-dependent stuff around exec_permission() into a new helper Signed-off-by: Al Viro commit c9c6cac0c2bdbda42e7b804838648d0bc60ddb13 Author: Al Viro Date: Wed Feb 16 15:15:47 2011 -0500 kill path_lookup() all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro commit 15a9155fe3e8215c02b80df51ec2cac7c0d726ad Author: Al Viro Date: Wed Feb 16 15:08:54 2011 -0500 fix race in audit_get_nd() don't rely on pathname resolution ending up twice at the same point... Signed-off-by: Al Viro commit 586ce098a23b6ab7383df853a84ae3d48dc889aa Author: Al Viro Date: Sun Mar 13 01:50:58 2011 -0500 compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix. Signed-off-by: Al Viro commit 86b32122fd54addc9af01f8b919c65d3f49090a3 Author: Konrad Rzeszutek Wilk Date: Wed Mar 9 22:03:16 2011 -0500 xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. If we have a guest that asked for: memory=1024 maxmem=2048 Which means we want 1GB now, and create pagetables so that we can expand up to 2GB, we would have this E820 layout: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable) Due to patch: "xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps." we would mark the memory past the 1GB mark as unusuable resulting in: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 0000000040000000 - 0000000080800000 (unusable) which meant that we could not balloon up anymore. We could balloon the guest down. The fix is to run the code introduced by the above mentioned patch only for the initial domain. We will have to revisit this once we start introducing a modified E820 for PCI passthrough so that we can utilize the P2M identity code. We also fix an overflow by having UL instead of ULL on 32-bit machines. [v2: Ian pointed to the overflow issue] Signed-off-by: Konrad Rzeszutek Wilk commit 71eef7d1e3d9df760897fdd2cad6949a8bcf1620 Author: Ian Campbell Date: Fri Feb 18 17:06:55 2011 +0000 xen: events: remove dom0 specific xen_create_msi_irq The function name does not distinguish it from xen_allocate_pirq_msi (which operates on domU and pvhvm domains rather than dom0). Hoist domain 0 specific functionality up into the only caller leaving functionality common to all guest types in xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit ca1d8fe9521fb67c95cfa736c08f4bbbc282b5bd Author: Ian Campbell Date: Fri Feb 18 16:43:36 2011 +0000 xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit f420e010edd84eb2c237fc87b7451e69740fed46 Author: Ian Campbell Date: Fri Feb 18 16:43:35 2011 +0000 xen: events: push set_irq_msi down into xen_create_msi_irq Makes the tail end of this function look even more like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 2e55288f63343f0810f4f0a3004f78037cfb93d3 Author: Ian Campbell Date: Fri Feb 18 16:43:34 2011 +0000 xen: events: update pirq_to_irq in xen_create_msi_irq I don't think this was a deliberate ommision. Makes the tail end of this function look even more like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 8135591e90c81462a6902f6ffa1f1ca021db077a Author: Ian Campbell Date: Fri Feb 18 16:43:33 2011 +0000 xen: events: refactor xen_create_msi_irq slightly Calling PHYSDEVOP_map_pirq earlier simplifies error handling and starts to make the tail end of this function look like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit bf480d952bcf25e8ff7e95d2a23964107513ac51 Author: Ian Campbell Date: Fri Feb 18 16:43:32 2011 +0000 xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ Split the binding aspect of xen_allocate_pirq_msi out into a new xen_bind_pirq_to_irq function. In xen_hvm_setup_msi_irq when allocating a pirq write the MSI message to signal the PIRQ as soon as the pirq is obtained. There is no way to free the pirq back so if the subsequent binding to an IRQ fails we want to ensure that we will reuse the PIRQ next time rather than leak it. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 5cad61a6ba6f4956a218ffbb64cafcc1daefaca0 Author: Ian Campbell Date: Fri Feb 18 16:43:31 2011 +0000 xen: events: assume PHYSDEVOP_get_free_pirq exists The find_unbound_pirq is called only from xen_allocate_pirq_msi and only if alloc_pirq is true. The only caller which does this is xen_hvm_setup_msi_irqs. The use of this function is gated, in pci_xen_hvm_init, on XENFEAT_hvm_pirqs. The PHYSDEVOP_get_free_pirq interfaces was added to the hypervisor in 22410:be96f6058c05 while XENFEAT_hvm_pirqs was added a couple of minutes prior in 22409:6663214f06ac. Therefore we do not need to concern ourselves with hypervisors which support XENFEAT_hvm_pirqs but not PHYSDEVOP_get_free_pirq. This eliminates the fallback path in find_unbound_pirq which walks to pirq_to_irq array looking for a free pirq. Unlike the PHYSDEVOP_get_free_pirq interface this fallback only looks up a free pirq but does not reserve it. Removing this fallback will simplify locking in the future. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 9a626612c2010699d9909a4c3141d3a38660f3b3 Author: Ian Campbell Date: Fri Feb 18 16:43:30 2011 +0000 xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq apic_register_gsi_xen_hvm is a tiny wrapper around xen_hvm_register_pirq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 4b41df7f6e0b5684378d9155773c42a4577e8582 Author: Ian Campbell Date: Fri Feb 18 16:43:29 2011 +0000 xen: events: return irq from xen_allocate_pirq_msi consistent with other similar functions. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit bb5d079aefa828c292c267ed34ed2282947fa233 Author: Ian Campbell Date: Fri Feb 18 16:43:28 2011 +0000 xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi All callers pass this flag so it is pointless. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit ae1635b05fae30804061406010914d85d12431ac Author: Ian Campbell Date: Fri Feb 18 16:43:27 2011 +0000 xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available. Cc: Jeremy Fitzhardinge Cc: xen-devel@lists.xensource.com Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 260a7d4cfd26d8bad8ac3a7fce11de47491d7e00 Author: Ian Campbell Date: Fri Feb 18 16:43:26 2011 +0000 xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0 Fixes: CC arch/x86/pci/xen.o arch/x86/pci/xen.c:183: warning: 'xen_initdom_setup_msi_irqs' defined but not used Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 8448f0119a4309ef9626cf8e2dc5abb881e6dc2a Merge: 8054c36 3d74a53 Author: Konrad Rzeszutek Wilk Date: Thu Mar 10 14:42:11 2011 -0500 Merge branch 'stable/pcifront-fixes' into stable/irq.cleanup * stable/pcifront-fixes: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work() commit 8054c3634cb3cb9d426c8ade934389213b857858 Merge: f5412be 1aa0b51 Author: Konrad Rzeszutek Wilk Date: Thu Mar 10 14:41:43 2011 -0500 Merge branch 'stable/irq.rework' into stable/irq.cleanup * stable/irq.rework: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME commit 51de69523ffe1c17994dc2f260369f29dfdce71c Author: Owen Smith Date: Wed Dec 22 15:05:00 2010 +0000 xen: Union the blkif_request request specific fields Prepare for extending the block device ring to allow request specific fields, by moving the request specific fields for reads, writes and barrier requests to a union member. Acked-by: Jens Axboe Signed-off-by: Owen Smith Signed-off-by: Konrad Rzeszutek Wilk commit 1aa0b51a033d4a1ec6d29d06487e053398afa21b Author: Konrad Rzeszutek Wilk Date: Thu Feb 17 11:23:58 2011 -0500 xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. We only did this for PV guests that are xen_initial_domain() but there is not reason not to do this for other cases. The other case is only exercised when you pass in a PCI device to a PV guest _and_ the device in question. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 676dc3cf5bc36a9e129a3ad8fe3bd7b2ebf20f5d Author: Thomas Gleixner Date: Sat Feb 5 20:08:59 2011 +0000 xen: Use IRQF_FORCE_RESUME Mark the IRQF_NO_SUSPEND interrupts IRQF_FORCE_RESUME and remove the extra walk through the interrupt descriptors. Signed-off-by: Thomas Gleixner Signed-off-by: Konrad Rzeszutek Wilk commit 8aef4857d26c46ca3d4f1a7f3a7aa4b51a72385e Merge: f611f2d dc5f219 Author: Konrad Rzeszutek Wilk Date: Thu Mar 3 12:02:02 2011 -0500 Merge branch 'irq/for-xen' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into HEAD * 'irq/for-xen' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Add IRQF_FORCE_RESUME commit f611f2da99420abc973c32cdbddbf5c365d0a20c Author: Ian Campbell Date: Tue Feb 8 14:03:31 2011 +0000 xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. The patches missed an indirect use of IRQF_NO_SUSPEND pulled in via IRQF_TIMER. The following patch fixes the issue. With this fixlet PV guest migration works just fine. I also booted the entire series as a dom0 kernel and it appeared fine. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit aa673c1cb3a66d0b37595251c4e8bb688efc8726 Author: Ian Campbell Date: Mon Feb 7 11:08:39 2011 +0000 xen: Fix compile error introduced by "switch to new irq_chip functions" drivers/xen/events.c: In function 'ack_pirq': drivers/xen/events.c:568: error: implicit declaration of function 'irq_move_irq' Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit c9e265e030537167c94cbed190826f02e3887f4d Author: Thomas Gleixner Date: Sat Feb 5 20:08:54 2011 +0000 xen: Switch to new irq_chip functions Convert Xen to the new irq_chip functions. Brings us closer to enable CONFIG_GENERIC_HARDIRQS_NO_DEPRECATED Signed-off-by: Thomas Gleixner Acked-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 149f256f8ca690c28dd8aa9fb8bcdaf2e93b1e1c Author: Thomas Gleixner Date: Sat Feb 5 20:08:52 2011 +0000 xen: Remove stale irq_chip.end irq_chip.end got obsolete with the removal of __do_IRQ() Signed-off-by: Thomas Gleixner Acked-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 7214610475b2847a81478d96e4d3ba0bbe49598c Author: Ian Campbell Date: Thu Feb 3 09:49:35 2011 +0000 xen: events: do not free legacy IRQs c514d00c8057 "xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq" correctly avoids reallocating legacy IRQs (which are managed by the arch core) but erroneously did not prevent them being freed. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 89911501f3aae44a43984793341a3bf1f4c583c2 Author: Ian Campbell Date: Thu Mar 3 11:57:44 2011 -0500 xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. There are three cases which we need to care about, PV guest, PV domain 0 and HVM guest. The PV guest case is simple since it has no access to ACPI or real APICs and therefore has no GSIs therefore we simply dynamically allocate all IRQs. The potentially interesting case here is PIRQ type event channels associated with passed through PCI devices. However even in this case the guest has no direct interaction with the physical GSI since that happens in the PCI backend. The PV domain 0 and HVM guest cases are actually the same. In domain 0 case the kernel sees the host ACPI and GSIs (although it only sees the APIC indirectly via the hypervisor) and in the HVM guest case it sees the virtualised ACPI and emulated APICs. In these cases we start allocating dynamic IRQs at nr_irqs_gsi so that they cannot clash with any GSI. Currently xen_allocate_irq_dynamic starts at nr_irqs and works backwards looking for a free IRQ in order to (try and) avoid clashing with GSIs used in domain 0 and in HVM guests. This change avoids that although we retain the behaviour of allowing dynamic IRQs to encroach on the GSI range if no suitable IRQs are available since a future IRQ clash is deemed preferable to failure right now. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit c9df1ce585e3bb5a2f101c1d87381b285a9f962f Author: Ian Campbell Date: Tue Jan 11 17:20:15 2011 +0000 xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq This is neater than open-coded calls to irq_alloc_desc_at and irq_free_desc. No intended behavioural change. Note that we previously were not checking the return value of irq_alloc_desc_at which would be failing for GSI Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit cbf6aa89fc52c5253ee141d53eeb73147eb37ac0 Author: Ian Campbell Date: Tue Jan 11 17:20:14 2011 +0000 xen:events: move find_unbound_irq inside CONFIG_PCI_MSI The only caller is xen_allocate_pirq_msi which is also under this ifdef so this fixes: drivers/xen/events.c:377: warning: 'find_unbound_pirq' defined but not used when CONFIG_PCI_MSI=n Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit 3f2a230caf21a1f7ac75f9e4892d0e5af9ccee88 Author: Ian Campbell Date: Tue Jan 11 17:20:13 2011 +0000 xen: handled remapped IRQs when enabling a pcifront PCI device. This happens to not be an issue currently because we take pains to try to ensure that the GSI-IRQ mapping is 1-1 in a PV guest and that regular event channels do not clash. However a subsequent patch is going to break this 1-1 mapping. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit 6eaa412f2753d98566b777836a98c6e7f672a3bb Author: Konrad Rzeszutek Wilk Date: Tue Jan 18 20:09:41 2011 -0500 xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. With this patch, we diligently set regions that will be used by the balloon driver to be INVALID_P2M_ENTRY and under the ownership of the balloon driver. We are OK using the __set_phys_to_machine as we do not expect to be allocating any P2M middle or entries pages. The set_phys_to_machine has the side-effect of potentially allocating new pages and we do not want that at this stage. We can do this because xen_build_mfn_list_list will have already allocated all such pages up to xen_max_p2m_pfn. We also move the check for auto translated physmap down the stack so it is present in __set_phys_to_machine. [v2: Rebased with mmu->p2m code split] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 44e69767cb7c3bc46e5370c39532c205d4347d80 Author: Ian Campbell Date: Tue Mar 1 20:05:49 2011 +0000 xen: ia64 build broken due to "xen: switch to new schedop hypercall by default." The git commit: > commit a8b7458363b9174f3c2196ca6085630b4b30b7a1 > Author: Ian Campbell > Date: Thu Feb 17 11:04:20 2011 +0000 > > xen: switch to new schedop hypercall by default. > > Rename old interface to sched_op_compat and rename sched_op_new to > simply sched_op. > breaks the IA64 build. This patch fixes it. Signed-off-by: Tony Luck Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit b056b6a0144de90707cd22cf7b4f60bf69c86d59 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: remove xen_hvm_suspend It is now identical to xen_suspend, the differences are encapsulated in the suspend_info struct. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 55fb4acef7089a6d4d93ed8caae6c258d06cfaf7 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: pull pre/post suspend hooks out into suspend_info Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 07af38102fc4f260cc5a2418ec833707f53cdf70 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: move arch specific pre/post suspend hooks into generic hooks Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 82043bb60d24d2897074905c94be5a53071e8913 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: refactor non-arch specific pre/post suspend hooks Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 03c8142bd2fb3b87effa6ecb2f8957be588bc85f Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: add "arch" to pre/post suspend hooks xen_pre_device_suspend is unused on ia64. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 36b401e2c2788c7b4881115ddbbff603fe4cf78d Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: pass extra hypercall argument via suspend_info struct Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit ceb180294790c8a6a437533488616f6b591b49d0 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: refactor cancellation flag into a structure Will add extra fields in subsequent patches. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit bd1c0ad28451df4610d352c7e438213c84de0c28 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit a8b7458363b9174f3c2196ca6085630b4b30b7a1 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: switch to new schedop hypercall by default. Rename old interface to sched_op_compat and rename sched_op_new to simply sched_op. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 8e15597fa430c03415e2268dfbae0f262b948788 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: use new schedop interface for suspend Take the opportunity to comment on the semantics of the PV guest suspend hypercall arguments. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 552717231e50b478dfd19d63fd97879476ae051d Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: do not respond to unknown xenstore control requests The PV xenbus control/shutdown node is written by the toolstack as a request to the guest to perform a particular action (shutdown, reboot, suspend etc). The guest is expected to acknowledge that it will complete a request by clearing the control node. Previously it would acknowledge any request, even if it did not know what to do with it. Specifically in the case where CONFIG_PM_SLEEP is not enabled the kernel would acknowledge a suspend request even though it was not actually going to do anything. Instead make the kernel only acknowledge requests if it is actually going to do something with it. This will improve the toolstack's ability to diagnose and deal with failures. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit e057a4b6e0eb6701f6ec923be2075d4984cef51a Author: Stefano Stabellini Date: Fri Feb 11 17:55:13 2011 +0000 xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled Signed-off-by: Stefano Stabellini commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f Author: Stefano Stabellini Date: Thu Dec 2 17:55:10 2010 +0000 xen: PV on HVM: support PV spinlocks and IPIs Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus (that switch to APIC mode and initialize APIC routing); on secondary CPUs on CPU_UP_PREPARE. Enable the usage of event channels to send and receive IPIs when running as a PV on HVM guest. Signed-off-by: Stefano Stabellini commit 53d5522cad291a0e93a385e0594b6aea6b54a071 Author: Stefano Stabellini Date: Thu Dec 2 17:55:05 2010 +0000 xen: make the ballon driver work for hvm domains Signed-off-by: Stefano Stabellini commit c80a420995e721099906607b07c09a24543b31d9 Author: Stefano Stabellini Date: Thu Dec 2 17:55:00 2010 +0000 xen-blkfront: handle Xen major numbers other than XENVBD This patch makes sure blkfront handles correctly virtual device numbers corresponding to Xen emulated IDE and SCSI disks: in those cases blkfront translates the major number to XENVBD and the minor number to a low xvd minor. Note: this behaviour is different from what old xenlinux PV guests used to do: they used to steal an IDE or SCSI major number and use it instead. Signed-off-by: Stefano Stabellini Acked-by: Jeremy Fitzhardinge commit cff520b9c2ee1486ea9ff1dbc774510c62e5ecb9 Author: Stefano Stabellini Date: Thu Dec 2 17:54:54 2010 +0000 xen: do not use xen_info on HVM, set pv_info name to "Xen HVM" Signed-off-by: Stefano Stabellini Acked-by: Jeremy Fitzhardinge commit 702d4eb9b3de4398ab99cf0a4e799e552c7ab756 Author: Stefano Stabellini Date: Thu Dec 2 17:54:50 2010 +0000 xen: no need to delay xen_setup_shutdown_event for hvm guests anymore Now that xenstore_ready is used correctly for PV on HVM guests too, we don't need to delay the initialization of xen_setup_shutdown_event anymore. Signed-off-by: Stefano Stabellini Acked-by: Jeremy Fitzhardinge commit 2f14ddc3a7146ea4cd5a3d1ecd993f85f2e4f948 Author: Zhang, Fengzhe Date: Wed Feb 16 22:26:20 2011 +0800 xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps. With the hypervisor argument of dom0_mem=X we iterate over the physical (only for the initial domain) E820 and subtract the the size from each E820_RAM region the delta so that the cumulative size of all E820_RAM regions is equal to 'X'. This sometimes ends up with E820_RAM regions with zero size (which are removed by e820_sanitize) and E820_RAM that are smaller than physically. Later on the PCI API looks at the E820 and attempts to set up an resource region for the "PCI mem". The E820 (assume dom0_mem=1GB is set) compared to the physical looks as so: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 0000000000097c00 (usable) [ 0.000000] Xen: 0000000000097c00 - 0000000000100000 (reserved) -[ 0.000000] Xen: 0000000000100000 - 00000000defafe00 (usable) +[ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS) [ 0.000000] Xen: 00000000defb1ea0 - 00000000e0000000 (reserved) [ 0.000000] Xen: 00000000f4000000 - 00000000f8000000 (reserved) .. And we get [ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:9efafe00) while it should have started at e0000000 (a nice big gap up to f4000000 exists). The "Allocating PCI" is part of the resource API. The users that end up using those PCI I/O regions usually supply their own BARs when calling the resource API (request_resource, or allocate_resource), but there are exceptions which provide an empty 'struct resource' and expect the API to provide the 'struct resource' to be populated with valid values. The one that triggered this bug was the intel AGP driver that requested a region for the flush page (intel_i9xx_setup_flush). Before this patch, when running under Xen hypervisor, the 'struct resource' returned could have (depending on the dom0_mem size) physical ranges of a 'System RAM' instead of 'I/O' regions. This ended up with the Hypervisor failing a request to populate PTE's with those PFNs as the domain did not have access to those 'System RAM' regions (rightly so). After this patch, the left-over E820_RAM region from the truncation, will be labeled as E820_UNUSABLE. The E820 will look as so: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 0000000000097c00 (usable) [ 0.000000] Xen: 0000000000097c00 - 0000000000100000 (reserved) -[ 0.000000] Xen: 0000000000100000 - 00000000defafe00 (usable) +[ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) +[ 0.000000] Xen: 0000000040000000 - 00000000defafe00 (unusable) [ 0.000000] Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS) [ 0.000000] Xen: 00000000defb1ea0 - 00000000e0000000 (reserved) [ 0.000000] Xen: 00000000f4000000 - 00000000f8000000 (reserved) For more information: http://mid.gmane.org/1A42CE6F5F474C41B63392A5F80372B2335E978C@shsmsx501.ccr.corp.intel.com BugLink: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726 Signed-off-by: Fengzhe Zhang Signed-off-by: Konrad Rzeszutek Wilk commit 3d74a539ae07a8f3c061332e426fc07b2310cf05 Author: Konrad Rzeszutek Wilk Date: Thu Feb 17 16:12:51 2011 -0500 pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. This code path is only run when an MSI/MSI-X PCI device is passed in to PV DomU. In 2.6.37 time-frame we over-wrote the default cleanup handler for MSI/MSI-X irq->desc to be "xen_teardown_msi_irqs". That function calls the the xen-pcifront driver which can tell the backend to cleanup/take back the MSI/MSI-X device. However, we forgot to continue the process of free-ing the MSI/MSI-X device resources (irq->desc) in the PV domU side. Which is what the default cleanup handler: default_teardown_msi_irqs did. Hence we would leak IRQ descriptors. Without this patch, doing "rmmod igbvf;modprobe igbvf" multiple times ends with abandoned IRQ descriptors: 28: 5 xen-pirq-pcifront-msi-x 29: 8 xen-pirq-pcifront-msi-x ... 130: 10 xen-pirq-pcifront-msi-x with the end result of running out of IRQ descriptors. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit cc0f89c4a426fcd6400a89e9e34e4a8851abef76 Author: Konrad Rzeszutek Wilk Date: Thu Feb 17 12:02:23 2011 -0500 pci/xen: Cleanup: convert int** to int[] Cleanup code. Cosmetic change to make the code look easier to read. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 55cb8cd45e0600df1473489518d7f12ce1bbe973 Author: Konrad Rzeszutek Wilk Date: Wed Feb 16 13:43:04 2011 -0500 pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen_allocate_pirq -> xen_map_pirq_gsi -> PHYSDEVOP_alloc_irq_vector IFF xen_initial_domain() in addition to the kernel side book-keeping side of things (set chip and handler, update irq_info etc) whereas xen_allocate_pirq_msi just does the kernel book keeping. Also xen_allocate_pirq allocates an IRQ in the 1-1 GSI space whereas xen_allocate_pirq_msi allocates a dynamic one in the >GSI IRQ space. All of this is uneccessary as this code path is only executed when we run as a domU PV guest with an MSI/MSI-X PCI card passed in. Hence we can jump straight to allocating an dynamic IRQ (and binding it to the proper PIRQ) and skip the rest. In short: this change is a cosmetic one. Reviewed-by: Ian Campbell Reviewed-by: Stefano Stabellini Signed-off-by: Konrad Rzeszutek Wilk commit 1d4610527bc71d3f9eea520fc51a02d54f79dcd0 Author: Konrad Rzeszutek Wilk Date: Wed Feb 16 13:43:22 2011 -0500 xen-pcifront: Sanity check the MSI/MSI-X values Check the returned vector values for any values that are odd or plain incorrect (say vector value zero), and if so print a warning. Also fixup the return values. This patch was precipiated by the Xen PCIBack returning the incorrect values due to how it was retrieving PIRQ values. This has been fixed in the xen-pciback by "xen/pciback: Utilize 'xen_pirq_from_irq' to get PIRQ value" patch. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit db2e2e6ee9ee9ce93b04c6975fdfef304771d6ad Author: Tejun Heo Date: Mon Jan 24 15:43:03 2011 +0100 xen-pcifront: don't use flush_scheduled_work() flush_scheduled_work() is scheduled for deprecation. Cancel ->op_work directly instead. Signed-off-by: Tejun Heo Cc: Ryan Wilson Cc: Jan Beulich Cc: Jesse Barnes Signed-off-by: Konrad Rzeszutek Wilk commit 44b46c3ef805793ab3a7730dc71c72d0f258ea8e Author: Ian Campbell Date: Fri Feb 11 16:37:41 2011 +0000 xen: annotate functions which only call into __init at start of day Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be called at resume time as well as at start of day but only reference __init functions (extend_brk) at start of day. Hence annotate with __ref. WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference from the function xen_hvm_init_shared_info() to the function .init.text:extend_brk() The function xen_hvm_init_shared_info() references the function __init extend_brk(). This is often because xen_hvm_init_shared_info lacks a __init annotation or the annotation of extend_brk is wrong. xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and initialises shared_info_page with the result. This happens at start of day only. WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference from the function xen_build_mfn_list_list() to the function .init.text:extend_brk() The function xen_build_mfn_list_list() references the function __init extend_brk(). This is often because xen_build_mfn_list_list lacks a __init annotation or the annotation of extend_brk is wrong. (this warning occurs multiple times) xen_build_mfn_list_list only calls extend_brk() at boot time, while building the initial mfn list list Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 6b08cfebd3bd346d8a2fd68a2265fc7736849802 Author: Ian Campbell Date: Fri Feb 11 15:23:58 2011 +0000 xen p2m: annotate variable which appears unused CC arch/x86/xen/p2m.o arch/x86/xen/p2m.c: In function 'm2p_remove_override': arch/x86/xen/p2m.c:460: warning: 'address' may be used uninitialized in this function arch/x86/xen/p2m.c: In function 'm2p_add_override': arch/x86/xen/p2m.c:426: warning: 'address' may be used uninitialized in this function In actual fact address is inialised in one "if (!PageHighMem(page))" statement and used in a second and so is always initialised before use. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit b052181a985592f81767f631f9f42accb4b436cd Author: Ian Campbell Date: Fri Feb 11 15:23:56 2011 +0000 xen: events: mark cpu_evtchn_mask_p as __refdata This variable starts out pointing at init_evtchn_mask which is marked __initdata but is set to point to a non-init data region in xen_init_IRQ which is itself an __init function so this is safe. Signed-off-by: Ian Campbell Tested-and-acked-by: Andrew Jones Signed-off-by: Konrad Rzeszutek Wilk commit dc5f219e88294b93009eef946251251ffffb6d60 Author: Thomas Gleixner Date: Fri Feb 4 13:19:20 2011 +0100 genirq: Add IRQF_FORCE_RESUME Xen needs to reenable interrupts which are marked IRQF_NO_SUSPEND in the resume path. Add a flag to force the reenabling in the resume code. Tested-and-acked-by: Ian Campbell Signed-off-by: Thomas Gleixner