commit 845bf8657c85043e2ded6622ee29c60be348c91d
Author: Greg Kroah-Hartman
Date:   Mon Oct 5 08:19:01 2009 -0700

    Linux 2.6.27.36

commit 3341371fbbfe53013961cfda09fd4282bdf5462b
Author: Lee Schermerhorn
Date:   Mon Sep 21 17:03:40 2009 -0700

    mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()

    commit 252c5f94d944487e9f50ece7942b0fbf659c5c31 upstream.

    We noticed very erratic behavior [throughput] with the AIM7 shared
    workload running on recent distro [SLES11] and mainline kernels on an
    8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
    [2.6.27.19+] with Barcelona processors, as we increased the load [10s
    of thousands of tasks], the throughput would vary between two
    "plateaus"--one at ~65K jobs per minute and one at ~130K jpm.  The
    simple patch below causes the results to smooth out at the ~130k
    plateau.

    But wait, there's more:

    We do not see this behavior on smaller platforms--e.g., 4 socket/8
    core.  This could be the result of the larger number of cpus on the
    larger platform--a scalability issue--or it could be the result of the
    larger number of interconnect "hops" between some nodes in this
    platform and how the tasks for a given load end up distributed over
    the nodes' cpus and memories--a stochastic NUMA effect.

    The variability in the results is less pronounced [on the same
    platform] with Shanghai processors and with mainline kernels.  With
    31-rc6 on Shanghai processors and 288 file systems on 288 fibre
    attached storage volumes, the curves [jpm vs load] are both quite flat
    with the patched kernel consistently producing ~3.9% better throughput
    [~80K jpm vs ~77K jpm] than the unpatched kernel.

    Profiling indicated that the "slow" runs were incurring high[er]
    contention on an anon_vma lock in vma_adjust(), apparently called from
    the sbrk() system call.

    The patch:

    A comment in mm/mmap.c:vma_adjust() suggests that we don't really need
    the anon_vma lock when we're only adjusting the end of a vma, as is
    the case for brk().
    The comment questions whether it's worth while to optimize for this
    case.  Apparently, on the newer, larger x86_64 platforms, with
    interesting NUMA topologies, it is worth while--especially considering
    that the patch [if correct!] is quite simple.

    We can detect this condition--no overlap with next vma--by noting a
    NULL "importer".  The anon_vma pointer will also be NULL in this case,
    so simply avoid loading vma->anon_vma to avoid the lock.

    However, we DO need to take the anon_vma lock when we're inserting a
    vma ['insert' non-NULL] even when we have no overlap [NULL
    "importer"], so we need to check for 'insert', as well.  And Hugh
    points out that we should also take it when adjusting vm_start (so
    that rmap.c can rely upon vma_address() while it holds the anon_vma
    lock).

    akpm: Zhang Yanmin reports a 150% throughput improvement with aim7, so
    it might be -stable material even though this isn't a regression:
    "this issue is not clear on dual socket Nehalem machine (2*4*2 cpu),
    but is severe on large machine (4*8*2 cpu)"

    [hugh.dickins@tiscali.co.uk: test vma start too]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Eric Whitney
    Tested-by: "Zhang, Yanmin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 007a89a7e629137845da320c48af2a06c2bc624e
Author: Hugh Dickins
Date:   Mon Sep 21 17:03:29 2009 -0700

    mm: fix anonymous dirtying

    commit 1ac0cb5d0e22d5e483f56b2bc12172dec1cf7536 upstream.

    do_anonymous_page() has been wrong to dirty the pte regardless.  If
    it's not going to mark the pte writable, then it won't help to mark it
    dirty here, and clogs up memory with pages which will need swap
    instead of being thrown away.  Especially wrong if no overcommit is
    chosen, and this vma is not yet VM_ACCOUNTed - we could exceed the
    limit and OOM despite no overcommit.

    Signed-off-by: Hugh Dickins
    Acked-by: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 66ff90a1b5fce45397f7643a11c33e1e92bd891e
Author: Lee Schermerhorn
Date:   Mon Sep 21 17:01:04 2009 -0700

    hugetlb: restore interleaving of bootmem huge pages (2.6.31)

    Not upstream as it is fixed differently in .32

    I noticed that alloc_bootmem_huge_page() will only advance to the next
    node on failure to allocate a huge page.  I asked about this on
    linux-mm and linux-numa, cc'ing the usual huge page suspects.  Mel
    Gorman responded:

        I strongly suspect that the same node being used until allocation
        failure instead of round-robin is an oversight and not deliberate
        at all.  It appears to be a side-effect of a fix made way back in
        commit 63b4613c3f0d4b724ba259dc6c201bb68b884e1a ["hugetlb: fix
        hugepage allocation with memoryless nodes"].  Prior to that patch
        it looked like allocations would always round-robin even when
        allocation was successful.

    Andy Whitcroft countered that the existing behavior looked like Andi
    Kleen's original implementation and suggested that we ask him.  We did
    and Andi replied that his intention was to interleave the allocations.
    So, ...

    This patch moves the advance of the hstate next node from which to
    allocate up before the test for success of the attempted allocation.
    This will unconditionally advance the next node from which to alloc,
    interleaving successful allocations over the nodes with sufficient
    contiguous memory, and skipping over nodes that fail the huge page
    allocation attempt.

    Note that alloc_bootmem_huge_page() will only be called for huge pages
    of order > MAX_ORDER.
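
The reordering described above can be modelled in a few lines of user-space C. This is only a sketch of the idea and not the kernel's alloc_bootmem_huge_page(): the struct, the NR_NODES constant, and the function name are hypothetical, and a per-node counter stands in for a real allocation attempt.

```c
#define NR_NODES 4  /* hypothetical node count, for illustration only */

/* Hypothetical model: node_free[n] is how many huge pages node n can
 * still supply; next_node is the round-robin cursor. */
struct alloc_state {
    int next_node;
    int node_free[NR_NODES];
};

/* Try each node at most once, starting at the cursor.  The cursor is
 * advanced BEFORE the success test, so the next call starts on the
 * following node whether or not this attempt succeeded.
 * Returns the node that supplied the page, or -1 if every node failed. */
int alloc_interleaved(struct alloc_state *s)
{
    for (int tries = 0; tries < NR_NODES; tries++) {
        int node = s->next_node;

        s->next_node = (node + 1) % NR_NODES;  /* advance unconditionally */

        if (s->node_free[node] > 0) {
            s->node_free[node]--;
            return node;
        }
        /* This node failed; fall through and try the next one. */
    }
    return -1;
}
```

With node 1 unable to supply any pages, successive calls spread over nodes 0, 2 and 3 in turn rather than draining node 0 first.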

    Signed-off-by: Lee Schermerhorn
    Reviewed-by: Andi Kleen
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

commit d5270785ea9ad9689d8ba473f4b3d82d34ac85b1
Author: Patrick McHardy
Date:   Thu Sep 17 13:58:29 2009 +0200

    netfilter: bridge: refcount fix

    Upstream commit f3abc9b9:

    commit f216f082b2b37c4943f1e7c393e2786648d48f6f ([NETFILTER]: bridge
    netfilter: deal with martians correctly) added a refcount leak on
    in_dev.

    Instead of using in_dev_get(), we can use __in_dev_get_rcu(), as
    netfilter hooks are running under rcu_read_lock(), as pointed out by
    Patrick.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Patrick McHardy
    Signed-off-by: Greg Kroah-Hartman

commit daa72302d0376fe98d71285ea962d5197198536f
Author: Arjan van de Ven
Date:   Wed Sep 30 13:54:47 2009 +0200

    net: Make the copy length in af_packet sockopt handler unsigned

    fixed upstream in commit b7058842c940ad2c08dd829b21e5c92ebe3b8758 in a
    different way

    The length of the to-copy data structure is currently stored in a
    signed integer.  However many comparisons are done with sizeof(..)
    which is unsigned.  It's more suitable for this variable to be
    unsigned to make these comparisons more naturally right.

    Signed-off-by: Arjan van de Ven
    Cc: David S. Miller
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit dc7fec29e1f24be79190e25aef1583c92111b052
Author: Arjan van de Ven
Date:   Wed Sep 30 13:51:11 2009 +0200

    net ax25: Fix signed comparison in the sockopt handler

    fixed upstream in commit b7058842c940ad2c08dd829b21e5c92ebe3b8758 in a
    different way

    The ax25 code tried to use

        if (optlen < sizeof(int))
                return -EINVAL;

    as a security check against optlen being negative (or zero) in the set
    socket option.

    Unfortunately, "sizeof(int)" is an unsigned property, with the result
    that the whole comparison is done in unsigned, letting negative values
    slip through.
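
The pitfall can be reproduced outside the kernel. The following is a minimal user-space sketch, not the ax25 code itself: both function names are hypothetical, and the first one writes out as an explicit cast the conversion that C applies implicitly when an int is compared with a size_t (assuming, as on common platforms, that size_t is at least as wide as int).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the unsigned check.  The explicit (size_t)
 * cast is what the usual arithmetic conversions do implicitly in
 * "optlen < sizeof(int)": a negative optlen becomes a huge unsigned
 * value and passes the test. */
int check_len_unsigned(int optlen)
{
    if ((size_t)optlen < sizeof(int))
        return -EINVAL;
    return 0;
}

/* Hypothetical stand-in for the signed check: sizeof(int) is converted
 * to int first, so the comparison is signed and negative lengths are
 * rejected. */
int check_len_signed(int optlen)
{
    if (optlen < (int)sizeof(int))
        return -EINVAL;
    return 0;
}
```

Calling both with an optlen of -1 shows the difference: the unsigned variant accepts it, while the signed variant returns -EINVAL.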

    This patch changes this to

        if (optlen < (int)sizeof(int))
                return -EINVAL;

    so that the comparison is done as signed, and negative values get
    properly caught.

    Signed-off-by: Arjan van de Ven
    Cc: David S. Miller
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 18494baca2fd31e6203be2ae9c572c86667da1b1
Author: Tilman Schmidt
Date:   Tue Aug 25 17:35:56 2009 +0200

    Fix incorrect stable backport to bas_gigaset

    bas_gigaset: correctly allocate USB interrupt transfer buffer

    [ Upstream commit 170ebf85160dd128e1c4206cc197cce7d1424705 ]

    This incorrect backport to 2.6.28.10 placed some code into the probe
    function which used a pointer before it was initialized.  Move this
    to the correct place (as it is in upstream).

    Signed-off-by: Stefan Bader
    Acked-by: Tim Gardner
    Acked-by: Steve Conklin
    Signed-off-by: Greg Kroah-Hartman

commit 54cd9c84498433f1800ad47445428d3e33d75188
Author: Cord Walter
Date:   Tue Feb 3 15:14:05 2009 -0800

    pcnet_cs: Fix misuse of the equality operator.

    commit a9d3a146923d374b945aa388dc884df69564a818 upstream.

    Signed-off-by: Cord Walter
    Signed-off-by: Komuro
    Signed-off-by: David S. Miller
    Cc: Christoph Biedl
    Signed-off-by: Greg Kroah-Hartman

commit 52ccbcd1bcbed712bb586c3c17164fe326e89f1a
Author: Baruch Siach
Date:   Sun Jan 4 16:23:01 2009 -0800

    enc28j60: fix RX buffer overflow

    commit 22692018b93f0782cda5a843cecfffda1854eb8d upstream.

    The enc28j60 driver doesn't check whether the length of the packet as
    reported by the hardware fits into the preallocated buffer.  When
    stressed, the hardware may report insanely large packets even though
    the "Receive OK" bit is set.  Fix this.

    Signed-off-by: Baruch Siach
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit e6edca0908edee83e00c90a904c5ecaf9d0e5404
Author: Christian Lamparter
Date:   Mon Sep 14 23:08:43 2009 +0200

    p54usb: add Zcomax XG-705A usbid

    commit f7f71173ea69d4dabf166533beffa9294090b7ef upstream.

    This patch adds a new usbid for Zcomax XG-705A to the device table.

    Reported-by: Jari Jaakola
    Signed-off-by: Christian Lamparter
    Signed-off-by: John W. Linville
    Signed-off-by: Greg Kroah-Hartman

commit 0c1f4cf28922f48c216a11eac10c52b55da8eeb3
Author: Jan Kara
Date:   Mon Sep 21 17:01:06 2009 -0700

    fs: make sure data stored into inode is properly seen before unlocking new inode

    commit 580be0837a7a59b207c3d5c661d044d8dd0a6a30 upstream.

    In theory it could happen that on one CPU we initialize a new inode
    but clearing of I_NEW | I_LOCK gets reordered before some of the
    initialization.  Thus on another CPU we return a not fully up-to-date
    inode from iget_locked().

    This seems to fix a corruption issue on ext3 mounted over NFS.

    [akpm@linux-foundation.org: add some commentary]
    Signed-off-by: Jan Kara
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman
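
The ordering problem in the last entry (a "ready" flag becoming visible before the data it guards) can be sketched in user-space C. This is an illustration only and not the kernel change: the kernel uses its own barrier primitives, whereas this sketch swaps in C11 release/acquire atomics, and the struct, FLAG_NEW, publish() and try_read() are all hypothetical names.

```c
#include <stdatomic.h>

#define FLAG_NEW 0x1u  /* hypothetical analogue of I_NEW | I_LOCK */

struct object {
    int payload;        /* plain data, initialized before publication */
    atomic_uint state;  /* flag word that other threads poll */
};

/* Writer: finish initialization, then clear the flag with release
 * ordering so the payload store cannot be reordered after the clear. */
void publish(struct object *o, int value)
{
    o->payload = value;
    atomic_fetch_and_explicit(&o->state, ~FLAG_NEW, memory_order_release);
}

/* Reader: the acquire load pairs with the release above, so once the
 * flag is seen clear the payload is guaranteed to be visible.
 * Returns 1 and stores the payload in *out if ready, otherwise 0. */
int try_read(struct object *o, int *out)
{
    if (atomic_load_explicit(&o->state, memory_order_acquire) & FLAG_NEW)
        return 0;
    *out = o->payload;
    return 1;
}
```

Without the release/acquire pairing, a reader on another CPU could observe the cleared flag and still read stale payload contents, which is the same class of problem as the one the commit message describes.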