commit 7273c1c64b2d4cb0ddfa7682ec7ab71dfe906398 Author: Greg Kroah-Hartman Date: Sun Sep 26 17:19:16 2010 -0700 Linux 2.6.35.6 commit 97fe34b1a65bcdf9eb994423e95d7edb1ec491c5 Author: Michael Cree Date: Wed Sep 1 11:25:17 2010 -0400 alpha: Fix printk format errors commit 3e073367a57d41e506f20aebb98e308387ce3090 upstream. When compiling alpha generic build get errors such as: arch/alpha/kernel/err_marvel.c: In function ‘marvel_print_err_cyc’: arch/alpha/kernel/err_marvel.c:119: error: format ‘%ld’ expects type ‘long int’, but argument 6 has type ‘u64’ Replaced a number of %ld format specifiers with %lld since u64 is unsigned long long. Signed-off-by: Michael Cree Signed-off-by: Matt Turner Signed-off-by: Greg Kroah-Hartman commit ae043da9f41d0e020eec1c8888337e7e7012eae2 Author: Chris Wilson Date: Sun Sep 12 18:25:19 2010 +0100 drm/i915: Ensure that the crtcinfo is populated during mode_fixup() commit 897493504addc5609f04a2c4f73c37ab972c29b2 upstream. This should fix the mysterious mode setting failures reported during boot up and after resume, generally for i8xx class machines. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16478 Reported-and-tested-by: Xavier Chantry Buzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29413 Tested-by: Daniel Vetter Signed-off-by: Chris Wilson Signed-off-by: Greg Kroah-Hartman commit 21a32e679ffa7491e89bfbefa0d36413ca9cb64c Author: Vlad Yasevich Date: Wed Sep 15 10:00:26 2010 -0400 sctp: Do not reset the packet during sctp_packet_config(). commit 4bdab43323b459900578b200a4b8cf9713ac8fab upstream. sctp_packet_config() is called when getting the packet ready for appending of chunks. The function should not touch the current state, since it's possible to ping-pong between two transports when sending, and that can result packet corruption followed by skb overlfow crash. Reported-by: Thomas Dreibholz Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2178c26dbdca2ac63c0e4183aa9e570e2ffe593e Author: Daniel J Blueman Date: Tue Aug 17 23:56:55 2010 +0100 Fix unprotected access to task credentials in waitid() commit f362b73244fb16ea4ae127ced1467dd8adaa7733 upstream. Using a program like the following: #include #include #include #include int main() { id_t id; siginfo_t infop; pid_t res; id = fork(); if (id == 0) { sleep(1); exit(0); } kill(id, SIGSTOP); alarm(1); waitid(P_PID, id, &infop, WCONTINUED); return 0; } to call waitid() on a stopped process results in access to the child task's credentials without the RCU read lock being held - which may be replaced in the meantime - eliciting the following warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- kernel/exit.c:1460 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by waitid02/22252: #0: (tasklist_lock){.?.?..}, at: [] do_wait+0xc5/0x310 #1: (&(&sighand->siglock)->rlock){-.-...}, at: [] wait_consider_task+0x19a/0xbe0 stack backtrace: Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3 Call Trace: [] lockdep_rcu_dereference+0xa4/0xc0 [] wait_consider_task+0xaf1/0xbe0 [] do_wait+0xf5/0x310 [] sys_waitid+0x86/0x1f0 [] ? child_wait_callback+0x0/0x70 [] system_call_fastpath+0x16/0x1b This is fixed by holding the RCU read lock in wait_task_continued() to ensure that the task's current credentials aren't destroyed between us reading the cred pointer and us reading the UID from those credentials. Furthermore, protect wait_task_stopped() in the same way. We don't need to keep holding the RCU read lock once we've read the UID from the credentials as holding the RCU read lock doesn't stop the target task from changing its creds under us - so the credentials may be outdated immediately after we've read the pointer, lock or no lock. Signed-off-by: Daniel J Blueman Signed-off-by: David Howells Acked-by: Paul E. McKenney Acked-by: Oleg Nesterov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0ac99e6ff81f5c4d57c2cb1b716a3cb78227740c Author: Luck, Tony Date: Tue Aug 24 11:44:18 2010 -0700 guard page for stacks that grow upwards commit 8ca3eb08097f6839b2206e2242db4179aee3cfb3 upstream. pa-risc and ia64 have stacks that grow upwards. Check that they do not run into other mappings. By making VM_GROWSUP 0x0 on architectures that do not ever use it, we can avoid some unpleasant #ifdefs in check_stack_guard_page(). Signed-off-by: Tony Luck Signed-off-by: Linus Torvalds Cc: dann frazier Signed-off-by: Greg Kroah-Hartman commit 3c8b7588c8bb9609c73c49087449be991d3d45ba Author: Mel Gorman Date: Thu Sep 9 16:38:16 2010 -0700 mm: page allocator: update free page counters after pages are placed on the free list commit 72853e2991a2702ae93aaf889ac7db743a415dd3 upstream. When allocating a page, the system uses NR_FREE_PAGES counters to determine if watermarks would remain intact after the allocation was made. This check is made without interrupts disabled or the zone lock held and so is race-prone by nature. Unfortunately, when pages are being freed in batch, the counters are updated before the pages are added on the list. During this window, the counters are misleading as the pages do not exist yet. When under significant pressure on systems with large numbers of CPUs, it's possible for processes to make progress even though they should have been stalled. This is particularly problematic if a number of the processes are using GFP_ATOMIC as the min watermark can be accidentally breached and in extreme cases, the system can livelock. This patch updates the counters after the pages have been added to the list. This makes the allocator more cautious with respect to preserving the watermarks and mitigates livelock possibilities. [akpm@linux-foundation.org: avoid modifying incoming args] Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Reviewed-by: Minchan Kim Reviewed-by: KAMEZAWA Hiroyuki Reviewed-by: Christoph Lameter Reviewed-by: KOSAKI Motohiro Acked-by: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 87f1cbdee91c60af6dd255226e792a6410d77fbb Author: Christoph Lameter Date: Thu Sep 9 16:38:17 2010 -0700 mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake commit aa45484031ddee09b06350ab8528bfe5b2c76d1c upstream. Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is cheaper than scanning a number of lists. To avoid synchronization overhead, counter deltas are maintained on a per-cpu basis and drained both periodically and when the delta is above a threshold. On large CPU systems, the difference between the estimated and real value of NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than number of real free page in buddy, the VM can allocate pages below min watermark, at worst reducing the real number of pages to zero. Even if the OOM killer kills some victim for freeing memory, it may not free memory if the exit path requires a new page resulting in livelock. This patch introduces a zone_page_state_snapshot() function (courtesy of Christoph) that takes a slightly more accurate view of an arbitrary vmstat counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid the watermark being accidentally broken. The estimate is not perfect and may result in cache line bounces but is expected to be lighter than the IPI calls necessary to continually drain the per-cpu counters while kswapd is awake. Signed-off-by: Christoph Lameter Signed-off-by: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 39f7a62c0d15eb8b489a707778e9c8c30d7f4d5d Author: Mel Gorman Date: Thu Sep 9 16:38:18 2010 -0700 mm: page allocator: drain per-cpu lists after direct reclaim allocation fails commit 9ee493ce0a60bf42c0f8fd0b0fe91df5704a1cbf upstream. When under significant memory pressure, a process enters direct reclaim and immediately afterwards tries to allocate a page. If it fails and no further progress is made, it's possible the system will go OOM. However, on systems with large amounts of memory, it's possible that a significant number of pages are on per-cpu lists and inaccessible to the calling process. This leads to a process entering direct reclaim more often than it should increasing the pressure on the system and compounding the problem. This patch notes that if direct reclaim is making progress but allocations are still failing that the system is already under heavy pressure. In this case, it drains the per-cpu lists and tries the allocation a second time before continuing. Signed-off-by: Mel Gorman Reviewed-by: Minchan Kim Reviewed-by: KAMEZAWA Hiroyuki Reviewed-by: KOSAKI Motohiro Reviewed-by: Christoph Lameter Cc: Dave Chinner Cc: Wu Fengguang Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 221d4da7cbe3a3a66d2adc0d7bb91a1d92d8c0bb Author: Islam Amer Date: Thu Jun 24 13:39:47 2010 -0400 dell-wmi: Add support for eject key on Dell Studio 1555 commit d5164dbf1f651d1e955b158fb70a9c844cc91cd1 upstream. Fixes pressing the eject key on Dell Studio 1555 does not work and produces message : dell-wmi: Unknown key 0 pressed Signed-off-by: Islam Amer Cc: Kyle McMartin Signed-off-by: Greg Kroah-Hartman commit 92e8e73dcc82076d63296c03700a3d6324672028 Author: Morten H. Larsen Date: Tue Aug 31 22:29:13 2010 -0400 Fix call to replaced SuperIO functions commit 59b25ed91400ace98d6cf0d59b1cb6928ad5cd37 upstream. This patch fixes the failure to compile Alpha Generic because of previously overlooked calls to ns87312_enable_ide(). The function has been replaced by newer SuperIO code. Tested-by: Michael Cree Signed-off-by: Morten H. Larsen Signed-off-by: Matt Turner Signed-off-by: Greg Kroah-Hartman commit 013dbea410981f6a064d1e7b2abfcbbbe6d24c9c Author: Daniel J Blueman Date: Tue Aug 3 11:09:13 2010 +0100 ALSA: hda - Fix beep frequency on IDT 92HD73xx and 92HD71Bxx codecs commit 1b0e372d7b52c9fc96348779015a6db7df7f286e upstream. Fix HDA beep frequency on IDT 92HD73xx and 92HD71Bxx codecs. These codecs use the standard beep frequency calculation although the datasheet says it's linear frequency. Other IDT/STAC codecs might have the same problem. They should be fixed individually later. Signed-off-by: Daniel J Blueman Signed-off-by: Takashi Iwai Cc: أحمد المحمودي Signed-off-by: Greg Kroah-Hartman commit 24dadf6e9439a4599fb8afeb243930dcc5883ffa Author: Luca Barbieri Date: Thu Aug 12 07:00:35 2010 -0700 x86, asm: Use a lower case name for the end macro in atomic64_386_32.S commit 417484d47e115774745ef025bce712a102b6f86f upstream. Use a lowercase name for the end macro, which somehow fixes a binutils 2.16 problem. Signed-off-by: Luca Barbieri LKML-Reference: Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman commit ff278af4dcda812b4a3f8ef0d0260232ed1352ac Author: Rafael J. Wysocki Date: Sat Sep 11 20:58:27 2010 +0200 PM / Hibernate: Avoid hitting OOM during preallocation of memory commit 6715045ddc7472a22be5e49d4047d2d89b391f45 upstream. There is a problem in hibernate_preallocate_memory() that it calls preallocate_image_memory() with an argument that may be greater than the total number of available non-highmem memory pages. If that's the case, the OOM condition is guaranteed to trigger, which in turn can cause significant slowdown to occur during hibernation. To avoid that, make preallocate_image_memory() adjust its argument before calling preallocate_image_pages(), so that the total number of saveable non-highem pages left is not less than the minimum size of a hibernation image. Change hibernate_preallocate_memory() to try to allocate from highmem if the number of pages allocated by preallocate_image_memory() is too low. Modify free_unnecessary_pages() to take all possible memory allocation patterns into account. Reported-by: KOSAKI Motohiro Signed-off-by: Rafael J. Wysocki Tested-by: M. Vefa Bicakci Signed-off-by: Greg Kroah-Hartman commit 2af3495b6359c30740b31175aa818f600f5b8bb9 Author: Colin Cross Date: Fri Sep 3 01:24:07 2010 +0200 PM: Prevent waiting forever on asynchronous resume after failing suspend commit 152e1d592071c8b312bb898bc1118b64e4aea535 upstream. During suspend, the power.completion is expected to be set when a device has not yet started suspending. Set it on init to fix a corner case where a device is resumed when its parent has never suspended. Consider three drivers, A, B, and C. The parent of A is C, and C has async_suspend set. On boot, C->power.completion is initialized to 0. During the first suspend: suspend_devices_and_enter(...) dpm_resume(...) device_suspend(A) device_suspend(B) returns error, aborts suspend dpm_resume_end(...) dpm_resume(...) device_resume(A) dpm_wait(A->parent == C) wait_for_completion(C->power.completion) The wait_for_completion will never complete, because complete_all(C->power.completion) will only be called from device_suspend(C) or device_resume(C), neither of which is called if suspend is aborted before C. After a successful suspend->resume cycle, where B doesn't abort suspend, C->power.completion is left in the completed state by the call to device_resume(C), and the same call path will work if B aborts suspend. Signed-off-by: Colin Cross Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman commit bf818d9482431477a787aba2704c8cc48e226703 Author: Nicolas Ferre Date: Fri Aug 20 16:44:33 2010 +0200 AT91: change dma resource index commit 8d2602e0778299e2d6084f03086b716d6e7a1e1e upstream. Reported-by: Dan Liang Signed-off-by: Nicolas Ferre Signed-off-by: Greg Kroah-Hartman commit 63e70525feb93ddfdadd2eb81d0a71a055343d99 Author: Dan Rosenberg Date: Wed Sep 15 19:08:24 2010 -0400 drivers/video/via/ioctl.c: prevent reading uninitialized stack memory commit b4aaa78f4c2f9cde2f335b14f4ca30b01f9651ca upstream. The VIAFB_GET_INFO device ioctl allows unprivileged users to read 246 bytes of uninitialized stack memory, because the "reserved" member of the viafb_ioctl_info struct declared on the stack is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg Signed-off-by: Florian Tobias Schandinat Signed-off-by: Greg Kroah-Hartman commit 0d1f22d21f4ca1fe9943667b712beb4d80611236 Author: Dan Rosenberg Date: Mon Sep 6 18:24:57 2010 -0400 xfs: prevent reading uninitialized stack memory commit a122eb2fdfd78b58c6dd992d6f4b1aaef667eef9 upstream. The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg Reviewed-by: Eric Sandeen Signed-off-by: Alex Elder Cc: dann frazier Signed-off-by: Greg Kroah-Hartman commit 0f403655ecddfdb7630adbb6ba59c30cb0c541a4 Author: David Howells Date: Fri Sep 10 09:59:51 2010 +0100 KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring commit 3d96406c7da1ed5811ea52a3b0905f4f0e295376 upstream. Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership of the parent process's session keyring whether or not the parent has a session keyring [CVE-2010-2960]. This results in the following oops: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 IP: [] keyctl_session_to_parent+0x251/0x443 ... Call Trace: [] ? keyctl_session_to_parent+0x67/0x443 [] ? __do_fault+0x24b/0x3d0 [] sys_keyctl+0xb4/0xb8 [] system_call_fastpath+0x16/0x1b if the parent process has no session keyring. If the system is using pam_keyinit then it mostly protected against this as all processes derived from a login will have inherited the session keyring created by pam_keyinit during the log in procedure. To test this, pam_keyinit calls need to be commented out in /etc/pam.d/. Reported-by: Tavis Ormandy Signed-off-by: David Howells Acked-by: Tavis Ormandy Cc: dann frazier Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 21313e0760fd268ca778f54c10b4e8f8e889c97c Author: David Howells Date: Fri Sep 10 09:59:46 2010 +0100 KEYS: Fix RCU no-lock warning in keyctl_session_to_parent() commit 9d1ac65a9698513d00e5608d93fca0c53f536c14 upstream. There's an protected access to the parent process's credentials in the middle of keyctl_session_to_parent(). This results in the following RCU warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by keyctl-session-/2137: #0: (tasklist_lock){.+.+..}, at: [] keyctl_session_to_parent+0x60/0x236 stack backtrace: Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1 Call Trace: [] lockdep_rcu_dereference+0xaa/0xb3 [] keyctl_session_to_parent+0xed/0x236 [] sys_keyctl+0xb4/0xb6 [] system_call_fastpath+0x16/0x1b The code should take the RCU read lock to make sure the parents credentials don't go away, even though it's holding a spinlock and has IRQ disabled. Signed-off-by: David Howells Signed-off-by: Linus Torvalds Cc: dann frazier Signed-off-by: Greg Kroah-Hartman commit f2c26f52fe46da66b74d6bf818d7941e01eefa04 Author: Petr Tesarik Date: Wed Sep 15 15:35:48 2010 -0700 IA64: Optimize ticket spinlocks in fsys_rt_sigprocmask commit 2d2b6901649a62977452be85df53eda2412def24 upstream. Tony's fix (f574c843191728d9407b766a027f779dcd27b272) has a small bug, it incorrectly uses "r3" as a scratch register in the first of the two unlock paths ... it is also inefficient. Optimize the fast path again. Signed-off-by: Petr Tesarik Signed-off-by: Tony Luck Signed-off-by: Greg Kroah-Hartman commit 749d278c05e0d81327647ba7ea71bf01422844c2 Author: Tony Luck Date: Thu Sep 9 15:16:56 2010 -0700 IA64: fix siglock commit f574c843191728d9407b766a027f779dcd27b272 upstream. When ia64 converted to using ticket locks, an inline implementation of trylock/unlock in fsys.S was missed. This was not noticed because in most circumstances it simply resulted in using the slow path because the siglock was apparently not available (under old spinlock rules). Problems occur when the ticket spinlock has value 0x0 (when first initialised, or when it wraps around). At this point the fsys.S code acquires the lock (changing the 0x0 to 0x1. If another process attempts to get the lock at this point, it will change the value from 0x1 to 0x2 (using new ticket lock rules). Then the fsys.S code will free the lock using old spinlock rules by writing 0x0 to it. From here a variety of bad things can happen. Signed-off-by: Tony Luck Signed-off-by: Greg Kroah-Hartman commit d963ca2b66d33935a858dc01a899dec0b3c77fc9 Author: Avi Kivity Date: Fri Sep 17 13:13:18 2010 -0300 KVM: VMX: Fix host GDT.LIMIT corruption commit 3444d7da1839b851eefedd372978d8a982316c36 upstream. vmx does not restore GDT.LIMIT to the host value, instead it sets it to 64KB. This means host userspace can learn a few bits of host memory. Fix by reloading GDTR when we load other host state. Signed-off-by: Avi Kivity Signed-off-by: Marcelo Tosatti Signed-off-by: Greg Kroah-Hartman commit 291e06417553bbca9d6cf59d5ac1040a9d71da99 Author: Andrea Arcangeli Date: Fri Sep 17 13:13:17 2010 -0300 KVM: MMU: fix mmu notifier invalidate handler for huge spte commit 6e3e243c3b6e0bbd18c6ce0fbc12bc3fe2d77b34 upstream. The index wasn't calculated correctly (off by one) for huge spte so KVM guest was unstable with transparent hugepages. Signed-off-by: Andrea Arcangeli Reviewed-by: Reviewed-by: Rik van Riel Signed-off-by: Avi Kivity Cc: Marcelo Tosatti Signed-off-by: Greg Kroah-Hartman commit 41d2a3bcdb188e4d6d8cc799f93aed93c0639e36 Author: Gleb Natapov Date: Fri Sep 17 13:13:16 2010 -0300 KVM: x86: emulator: inc/dec can have lock prefix commit c0e0608cb902af1a1fd8d413ec0a07ee1e62c652 upstream. Mark inc (0xfe/0 0xff/0) and dec (0xfe/1 0xff/1) as lock prefix capable. Signed-off-by: Gleb Natapov Signed-off-by: Marcelo Tosatti Signed-off-by: Greg Kroah-Hartman commit 724b86b0bcd26cda145915e2991952f23e0775d2 Author: Xiao Guangrong Date: Fri Sep 17 13:13:15 2010 -0300 KVM: MMU: fix direct sp's access corrupted commit 9e7b0e7fba45ca3c6357aeb7091ebc281f1de365 upstream. If the mapping is writable but the dirty flag is not set, we will find the read-only direct sp and setup the mapping, then if the write #PF occur, we will mark this mapping writable in the read-only direct sp, now, other real read-only mapping will happily write it without #PF. It may hurt guest's COW Fixed by re-install the mapping when write #PF occur. Signed-off-by: Xiao Guangrong Signed-off-by: Marcelo Tosatti Signed-off-by: Greg Kroah-Hartman commit aaf270ed0cb87db6a037a682a3d2f860efb32f59 Author: Avi Kivity Date: Fri Sep 17 13:13:14 2010 -0300 KVM: Prevent internal slots from being COWed commit 7ac77099ce88a0c31b75acd0ec5ef3da4415a6d8 upstream. If a process with a memory slot is COWed, the page will change its address (despite having an elevated reference count). This breaks internal memory slots which have their physical addresses loaded into vmcs registers (see the APIC access memory slot). Signed-off-by: Avi Kivity Cc: Marcelo Tosatti Signed-off-by: Greg Kroah-Hartman commit 1be501980f140e5a6698999df9c7c6124c14810f Author: Avi Kivity Date: Fri Sep 17 13:13:13 2010 -0300 KVM: Keep slot ID in memory slot structure commit e36d96f7cfaa71870c407131eb4fbd38ea285c01 upstream. May be used for distinguishing between internal and user slots, or for sorting slots in size order. Signed-off-by: Avi Kivity Cc: Marcelo Tosatti Signed-off-by: Greg Kroah-Hartman commit 80e1809cb470846bd477c8ee041262d1483c0dc8 Author: Ryan Kuester Date: Mon Apr 26 18:11:54 2010 -0500 SCSI: mptsas: fix hangs caused by ATA pass-through commit 2a1b7e575b80ceb19ea50bfa86ce0053ea57181d upstream. I may have an explanation for the LSI 1068 HBA hangs provoked by ATA pass-through commands, in particular by smartctl. First, my version of the symptoms. On an LSI SAS1068E B3 HBA running 01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing occasional task, bus, and host resets, some of which lead to hard faults of the HBA requiring a reboot. Abusively looping the smartctl command, # while true; do smartctl -a /dev/sdb > /dev/null; done dramatically increases the frequency of these failures to nearly one per minute. A high IO load through the HBA while looping smartctl seems to improve the chance of a full scsi host reset or a non-recoverable hang. I reduced what smartctl was doing down to a simple test case which causes the hang with a single IO when pointed at the sd interface. See the code at the bottom of this e-mail. It uses an SG_IO ioctl to issue a single pass-through ATA identify device command. If the buffer userspace gives for the read data has certain alignments, the task is issued to the HBA but the HBA fails to respond. If run against the sg interface, neither the test code nor smartctl causes a hang. sd and sg handle the SG_IO ioctl slightly differently. Unless you specifically set a flag to do direct IO, sg passes a buffer of its own, which is page-aligned, to the block layer and later copies the result into the userspace buffer regardless of its alignment. sd, on the other hand, always does direct IO unless the userspace buffer fails an alignment test at block/blk-map.c line 57, in which case a page-aligned buffer is created and used for the transfer. The alignment test currently checks for word-alignment, the default setup by scsi_lib.c; therefore, userspace buffers of almost any alignment are given directly to the HBA as DMA targets. The LSI 1068 hardware doesn't seem to like at least a couple of the alignments which cross a page boundary (see the test code below). Curiously, many page-boundary-crossing alignments do work just fine. So, either the hardware has an bug handling certain alignments or the hardware has a stricter alignment requirement than the driver is advertising. If stricter alignment is required, then in no case should misaligned buffers from userspace be allowed through without being bounced or at least causing an error to be returned. It seems the mptsas driver could use blk_queue_dma_alignment() to advertise a stricter alignment requirement. If it does, sd does the right thing and bounces misaligned buffers (see block/blk-map.c line 57). The following patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the wrong place for this code, but it gets my idea across. Acked-by: Kashyap Desai Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman commit f4254199838237a0b7df93474cf6b4f46c0b0a4a Author: Eric Paris Date: Wed Jul 28 10:18:37 2010 -0400 inotify: send IN_UNMOUNT events commit 611da04f7a31b2208e838be55a42c7a1310ae321 upstream. Since the .31 or so notify rewrite inotify has not sent events about inodes which are unmounted. This patch restores those events. Signed-off-by: Eric Paris Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman commit b85ec83e1e6dc064f6623378a8f3b402a001ade7 Author: Marcin Slusarz Date: Sun Aug 22 20:54:08 2010 +0200 drm/nv50: initialize ramht_refs list for faked 0 channel commit 615661f3948a066fd22a36fe8ea0c528b75ee373 upstream. We need it for PFIFO_INTR_CACHE_ERROR interrupt handling, because nouveau_fifo_swmthd looks for matching gpuobj in ramht_refs list. It fixes kernel panic in nouveau_gpuobj_ref_find. Signed-off-by: Marcin Slusarz Signed-off-by: Ben Skeggs Signed-off-by: Greg Kroah-Hartman commit 2e4e3bb743cc3dfe09fd39aa6bd3aea6a19e5286 Author: Steven Whitehouse Date: Thu Sep 9 14:45:00 2010 +0100 GFS2: gfs2_logd should be using interruptible waits commit 5f4874903df3562b9d5649fc1cf7b8c6bb238e42 upstream. Looks like this crept in, in a recent update. Reported-by: Krzysztof Urbaniak Signed-off-by: Steven Whitehouse Signed-off-by: Greg Kroah-Hartman commit 06a351856476ca85b2aba3161522ade9b2ab6cdb Author: Thomas Renninger Date: Fri May 21 16:18:09 2010 +0200 x86 platform drivers: hp-wmi Reorder event id processing commit 751ae808f6b29803228609f51aa1ae057f5c576e upstream. Event id 0x4 defines the hotkey event. No need (or even wrong) to query HPWMI_HOTKEY_QUERY if event id is != 0x4. Reorder the eventcode conditionals and use switch case instead of if/else. Use an enum for the event ids cases. Signed-off-by: Thomas Renninger Signed-off-by: Matthew Garrett CC: linux-acpi@vger.kernel.org CC: platform-driver-x86@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 1d506ee42d387f84cd063fff8014d8decc1875c4 Author: Jeff Moyer Date: Fri Sep 10 14:16:00 2010 -0700 aio: check for multiplication overflow in do_io_submit commit 75e1c70fc31490ef8a373ea2a4bea2524099b478 upstream. Tavis Ormandy pointed out that do_io_submit does not do proper bounds checking on the passed-in iocb array:        if (unlikely(nr < 0))                return -EINVAL;        if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))                return -EFAULT;                      ^^^^^^^^^^^^^^^^^^ The attached patch checks for overflow, and if it is detected, the number of iocbs submitted is scaled down to a number that will fit in the long.  This is an ok thing to do, as sys_io_submit is documented as returning the number of iocbs submitted, so callers should handle a return value of less than the 'nr' argument passed in. Reported-by: Tavis Ormandy Signed-off-by: Jeff Moyer Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 94c195797a9c05cb34426abb0fd38d254d293591 Author: Jan Kara Date: Wed Sep 22 13:05:03 2010 -0700 aio: do not return ERESTARTSYS as a result of AIO commit a0c42bac79731276c9b2f28d54f9e658fcf843a2 upstream. OCFS2 can return ERESTARTSYS from its write function when the process is signalled while waiting for a cluster lock (and the filesystem is mounted with intr mount option). Generally, it seems reasonable to allow filesystems to return this error code from its IO functions. As we must not leak ERESTARTSYS (and similar error codes) to userspace as a result of an AIO operation, we have to properly convert it to EINTR inside AIO code (restarting the syscall isn't really an option because other AIO could have been already submitted by the same io_submit syscall). Signed-off-by: Jan Kara Reviewed-by: Jeff Moyer Cc: Christoph Hellwig Cc: Zach Brown Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 3cc932ba7fa6f24a84c18a581eae1269dc31efef Author: Tejun Heo Date: Tue Sep 21 07:57:19 2010 +0200 percpu: fix pcpu_last_unit_cpu commit 46b30ea9bc3698bc1d1e6fd726c9601d46fa0a91 upstream. pcpu_first/last_unit_cpu are used to track which cpu has the first and last units assigned. This in turn is used to determine the span of a chunk for man/unmap cache flushes and whether an address belongs to the first chunk or not in per_cpu_ptr_to_phys(). When the number of possible CPUs isn't power of two, a chunk may contain unassigned units towards the end of a chunk. The logic to determine pcpu_last_unit_cpu was incorrect when there was an unused unit at the end of a chunk. It failed to ignore the unused unit and assigned the unused marker NR_CPUS to pcpu_last_unit_cpu. This was discovered through kdump failure which was caused by malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible CPUs by CAI Qian. Signed-off-by: Tejun Heo Reported-by: CAI Qian Signed-off-by: Greg Kroah-Hartman commit ee71a393b8612fc2b89fdac0f2a3056972cdd3f9 Author: Minchan Kim Date: Wed Sep 22 13:05:01 2010 -0700 vmscan: check all_unreclaimable in direct reclaim path commit d1908362ae0b97374eb8328fbb471576332f9fb1 upstream. M. Vefa Bicakci reported 2.6.35 kernel hang up when hibernation on his 32bit 3GB mem machine. (https://bugzilla.kernel.org/show_bug.cgi?id=16771). Also he bisected the regression to commit bb21c7ce18eff8e6e7877ca1d06c6db719376e3c Author: KOSAKI Motohiro Date: Fri Jun 4 14:15:05 2010 -0700 vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure At first impression, this seemed very strange because the above commit only chenged function return value and hibernate_preallocate_memory() ignore return value of shrink_all_memory(). But it's related. Now, page allocation from hibernation code may enter infinite loop if the system has highmem. The reasons are that vmscan don't care enough OOM case when oom_killer_disabled. The problem sequence is following as. 1. hibernation 2. oom_disable 3. alloc_pages 4. do_try_to_free_pages if (scanning_global_lru(sc) && !all_unreclaimable) return 1; If kswapd is not freozen, it would set zone->all_unreclaimable to 1 and then shrink_zones maybe return true(ie, all_unreclaimable is true). So at last, alloc_pages could go to _nopage_. If it is, it should have no problem. This patch adds all_unreclaimable check to protect in direct reclaim path, too. It can care of hibernation OOM case and help bailout all_unreclaimable case slightly. Signed-off-by: KOSAKI Motohiro Signed-off-by: Minchan Kim Reported-by: M. Vefa Bicakci Reported-by: Reviewed-by: Johannes Weiner Tested-by: Acked-by: Rafael J. Wysocki Acked-by: Rik van Riel Acked-by: KAMEZAWA Hiroyuki Cc: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit db42f7e4f8cdd2e25ab838365f4bfb6b328289da Author: Arnd Bergmann Date: Wed Sep 22 13:04:54 2010 -0700 /proc/vmcore: fix seeking commit c227e69028473c7c7994a9b0a2cc0034f3f7e0fe upstream. Commit 73296bc611 ("procfs: Use generic_file_llseek in /proc/vmcore") broke seeking on /proc/vmcore. This changes it back to use default_llseek in order to restore the original behaviour. The problem with generic_file_llseek is that it only allows seeks up to inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual file systems. We should merge generic_file_llseek and default_llseek some day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore is the safer solution. Signed-off-by: Arnd Bergmann Cc: Frederic Weisbecker Reported-by: CAI Qian Tested-by: CAI Qian Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 32e987bf2db7941e5a27bfa4dbed50663c5b1ba2 Author: Dan Rosenberg Date: Wed Sep 22 14:32:56 2010 -0400 Prevent freeing uninitialized pointer in compat_do_readv_writev commit 767b68e96993e29e3480d7ecdd9c4b84667c5762 upstream. In 32-bit compatibility mode, the error handling for compat_do_readv_writev() may free an uninitialized pointer, potentially leading to all sorts of ugly memory corruption. This is reliably triggerable by unprivileged users by invoking the readv()/writev() syscalls with an invalid iovec pointer. The below patch fixes this to emulate the non-compat version. Introduced by commit b83733639a49 ("compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev") Signed-off-by: Dan Rosenberg Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit e658aee24443e17c08424ae4ce193735c7fde737 Author: Vladimir Zapolskiy Date: Wed Sep 22 13:05:13 2010 -0700 rtc: s3c: balance state changes of wakeup flag commit f501ed524b26ba1b739b7f7feb0a0e1496878769 upstream. This change resolves a problem about unbalanced calls of enable_irq_wakeup() and disable_irq_wakeup() for alarm interrupt. Bug reproduction: root@eb600:~# echo 0 > /sys/class/rtc/rtc0/wakealarm WARNING: at kernel/irq/manage.c:361 set_irq_wake+0x7c/0xe4() Unbalanced IRQ 46 wake disable Modules linked in: [] (unwind_backtrace+0x0/0xd8) from [] (warn_slowpath_common+0x44/0x5c) [] (warn_slowpath_common+0x44/0x5c) from [] (warn_slowpath_fmt+0x24/0x30) [] (warn_slowpath_fmt+0x24/0x30) from [] (set_irq_wake+0x7c/0xe4) [] (set_irq_wake+0x7c/0xe4) from [] (s3c_rtc_setalarm+0xa8/0xb8) [] (s3c_rtc_setalarm+0xa8/0xb8) from [] (rtc_set_alarm+0x60/0x74) [] (rtc_set_alarm+0x60/0x74) from [] (rtc_sysfs_set_wakealarm+0xc8/0xd8) [] (rtc_sysfs_set_wakealarm+0xc8/0xd8) from [] (dev_attr_store+0x20/0x24) [] (dev_attr_store+0x20/0x24) from [] (sysfs_write_file+0x104/0x13c) [] (sysfs_write_file+0x104/0x13c) from [] (vfs_write+0xb0/0x158) [] (vfs_write+0xb0/0x158) from [] (sys_write+0x3c/0x68) [] (sys_write+0x3c/0x68) from [] (ret_fast_syscall+0x0/0x28) Signed-off-by: Vladimir Zapolskiy Cc: Alessandro Zummo Cc: Ben Dooks Cc: Atul Dahiya Cc: Taekgyun Ko Cc: Kukjin Kim Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit a3bec3ef92a0395e64b5ac67640f420bfabe1bab Author: Dan Rosenberg Date: Wed Sep 22 13:05:09 2010 -0700 drivers/video/sis/sis_main.c: prevent reading uninitialized stack memory commit fd02db9de73faebc51240619c7c7f99bee9f65c7 upstream. The FBIOGET_VBLANK device ioctl allows unprivileged users to read 16 bytes of uninitialized stack memory, because the "reserved" member of the fb_vblank struct declared on the stack is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg Cc: Thomas Winischhofer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit e675865c92e66331dba733c544eb53b3d6c45eaa Author: Andrea Arcangeli Date: Wed Sep 22 13:05:12 2010 -0700 mmap: call unlink_anon_vmas() in __split_vma() in case of error commit 2aeadc30de45a72648f271603203ab392b80f607 upstream. If __split_vma fails because of an out of memory condition the anon_vma_chain isn't teardown and freed potentially leading to rmap walks accessing freed vma information plus there's a memleak. Signed-off-by: Andrea Arcangeli Acked-by: Johannes Weiner Acked-by: Rik van Riel Acked-by: Hugh Dickins Cc: Marcelo Tosatti Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit b932db652630421d4a75e3b661995962a856bf64 Author: Andrew Morton Date: Wed Sep 22 13:05:11 2010 -0700 drivers/pci/intel-iommu.c: fix build with older gcc's commit df08cdc7ef606509debe7677c439be0ca48790e4 upstream. drivers/pci/intel-iommu.c: In function `__iommu_calculate_agaw': drivers/pci/intel-iommu.c:437: sorry, unimplemented: inlining failed in call to 'width_to_agaw': function body not available drivers/pci/intel-iommu.c:445: sorry, unimplemented: called from here Move the offending function (and its siblings) to top-of-file, remove the forward declaration. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=17441 Reported-by: Martin Mokrejs Cc: David Woodhouse Cc: Jesse Barnes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 0f8f659720f03fcd2fd68384b9fd2ac43535b4be Author: Jan Kara Date: Tue Sep 21 11:49:01 2010 +0200 char: Mark /dev/zero and /dev/kmem as not capable of writeback commit 371d217ee1ff8b418b8f73fb2a34990f951ec2d4 upstream. These devices don't do any writeback but their device inodes still can get dirty so mark bdi appropriately so that bdi code does the right thing and files inodes to lists of bdi carrying the device inodes. Signed-off-by: Jan Kara Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 2263d443c405052b413b65506feb3534d375d55c Author: Jan Kara Date: Tue Sep 21 11:48:55 2010 +0200 bdi: Initialize noop_backing_dev_info properly commit 976e48f8a5b02fc33f3e5cad87fb3fcea041a49c upstream. Properly initialize this backing dev info so that writeback code does not barf when getting to it e.g. via sb->s_bdi. Signed-off-by: Jan Kara Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit e1d7519864aede29b957d1dc5760d83d9ec2ff3a Author: Chris Wilson Date: Fri Sep 17 08:22:30 2010 +0100 drm/i915,agp/intel: Add second set of PCI-IDs for B43 commit 41a51428916ab04587bacee2dda61c4a0c4fc02f upstream. There is a second revision of B43 (a desktop gen4 part) floating around, functionally equivalent to the original B43, so simply add the new PCI-IDs. Bugzilla: https://bugs.freedesktop.org/show_bugs.cgi?id=30221 Signed-off-by: Chris Wilson Signed-off-by: Greg Kroah-Hartman commit 6d4179dfe5289f3e7d9c0e4ecb7bf2600404595f Author: Patrick Simmons Date: Wed Sep 8 10:34:28 2010 -0400 oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540) commit c33f543d320843e1732534c3931da4bbd18e6c14 upstream. This patch adds CPU type detection for the Intel Celeron 540, which is part of the Core 2 family according to Wikipedia; the family and ID pair is absent from the Volume 3B table referenced in the source code comments. I have tested this patch on an Intel Celeron 540 machine reporting itself as Family 6 Model 22, and OProfile runs on the machine without issue. Spec: http://download.intel.com/design/mobile/SPECUPDT/317667.pdf Signed-off-by: Patrick Simmons Acked-by: Andi Kleen Acked-by: Arnd Bergmann Signed-off-by: Robert Richter Signed-off-by: Greg Kroah-Hartman commit 22bad8bbb944a2973d3fa3581198fd67c2f40eba Author: Stanislaw Gruszka Date: Tue Sep 14 16:35:14 2010 +0200 sched: Fix user time incorrectly accounted as system time on 32-bit commit e75e863dd5c7d96b91ebbd241da5328fc38a78cc upstream. We have 32-bit variable overflow possibility when multiply in task_times() and thread_group_times() functions. When the overflow happens then the scaled utime value becomes erroneously small and the scaled stime becomes i erroneously big. Reported here: https://bugzilla.redhat.com/show_bug.cgi?id=633037 https://bugzilla.kernel.org/show_bug.cgi?id=16559 Reported-by: Michael Chapman Reported-by: Ciriaco Garcia de Celis Signed-off-by: Stanislaw Gruszka Signed-off-by: Peter Zijlstra Cc: Hidetoshi Seto LKML-Reference: <20100914143513.GB8415@redhat.com> Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 7495973bc945e7f4962f676ad631492b2ba4c061 Author: Paul E. McKenney Date: Tue Aug 31 17:00:18 2010 -0700 pid: make setpgid() system call use RCU read-side critical section commit 950eaaca681c44aab87a46225c9e44f902c080aa upstream. [ 23.584719] [ 23.584720] =================================================== [ 23.585059] [ INFO: suspicious rcu_dereference_check() usage. ] [ 23.585176] --------------------------------------------------- [ 23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection! [ 23.585176] [ 23.585176] other info that might help us debug this: [ 23.585176] [ 23.585176] [ 23.585176] rcu_scheduler_active = 1, debug_locks = 1 [ 23.585176] 1 lock held by rc.sysinit/728: [ 23.585176] #0: (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x5f/0x193 [ 23.585176] [ 23.585176] stack backtrace: [ 23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2 [ 23.585176] Call Trace: [ 23.585176] [] lockdep_rcu_dereference+0x99/0xa2 [ 23.585176] [] find_task_by_pid_ns+0x50/0x6a [ 23.585176] [] find_task_by_vpid+0x1d/0x1f [ 23.585176] [] sys_setpgid+0x67/0x193 [ 23.585176] [] system_call_fastpath+0x16/0x1b [ 24.959669] type=1400 audit(1282938522.956:4): avc: denied { module_request } for pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas It turns out that the setpgid() system call fails to enter an RCU read-side critical section before doing a PID-to-task_struct translation. This commit therefore does rcu_read_lock() before the translation, and also does rcu_read_unlock() after the last use of the returned pointer. Reported-by: Andrew Morton Signed-off-by: Paul E. McKenney Acked-by: David Howells Cc: Jiri Slaby Cc: Oleg Nesterov Signed-off-by: Greg Kroah-Hartman commit e5aec52ddb8d72e0fe2ab17f637e2f4428b19da0 Author: Matt Helsley Date: Mon Sep 13 13:01:18 2010 -0700 hw breakpoints: Fix pid namespace bug commit 068e35eee9ef98eb4cab55181977e24995d273be upstream. Hardware breakpoints can't be registered within pid namespaces because tsk->pid is passed rather than the pid in the current namespace. (See https://bugzilla.kernel.org/show_bug.cgi?id=17281 ) This is a quick fix demonstrating the problem but is not the best method of solving the problem since passing pids internally is not the best way to avoid pid namespace bugs. Subsequent patches will show a better solution. Much thanks to Frederic Weisbecker for doing the bulk of the work finding this bug. Reported-by: Robin Green Signed-off-by: Matt Helsley Signed-off-by: Peter Zijlstra Cc: Prasad Cc: Arnaldo Carvalho de Melo Cc: Steven Rostedt Cc: Will Deacon Cc: Mahesh Salgaonkar LKML-Reference: Signed-off-by: Ingo Molnar Signed-off-by: Frederic Weisbecker Signed-off-by: Greg Kroah-Hartman commit 96b6a8c56dbee75f61271f6484f56730beb17a20 Author: Zhenyu Wang Date: Sun Sep 19 10:28:54 2010 +0800 agp/intel: fix dma mask bits on sandybridge [This is backport patch from upstream 877fdacf.] Signed-off-by: Zhenyu Wang Signed-off-by: Greg Kroah-Hartman commit 1252894fa9ea0a4e73cb68f49f5913dda9834d6c Author: Zhenyu Wang Date: Sun Sep 19 10:28:53 2010 +0800 agp/intel: fix physical address mask bits for sandybridge [This is backport patch from upstream 8dfc2b14.] Signed-off-by: Zhenyu Wang Signed-off-by: Greg Kroah-Hartman commit 6d51cdffcae15394f615489d57d8ca0e9a91e494 Author: Zhenyu Wang Date: Sun Sep 19 10:28:52 2010 +0800 intel_agp, drm/i915: Add all sandybridge graphics devices support New pci ids for all sandybridge graphics versions on desktop/mobile/server. [This is backport patch from upstream commit 4fefe435 and 85540480.] Signed-off-by: Zhenyu Wang Signed-off-by: Greg Kroah-Hartman commit ba4de7fd43bb21be634a39b0de274cfb9b19fd83 Author: Hans de Goede Date: Thu Sep 16 14:43:08 2010 +0530 virtio: console: Fix poll blocking even though there is data to read commit 6df7aadcd9290807c464675098b5dd2dc9da5075 upstream. I found this while working on a Linux agent for spice, the symptom I was seeing was select blocking on the spice vdagent virtio serial port even though there were messages queued up there. virtio_console's port_fops_poll checks port->inbuf != NULL to determine if read won't block. However if an application reads enough bytes from inbuf through port_fops_read, to empty the current port->inbuf, port->inbuf will be NULL even though there may be buffers left in the virtqueue. This causes poll() to block even though there is data to be read, this patch fixes this by using will_read_block(port) instead of the port->inbuf != NULL check. Signed-off-By: Hans de Goede Signed-off-by: Amit Shah Signed-off-by: Rusty Russell Signed-off-by: Greg Kroah-Hartman commit d4f5d4c1a4f45e69deb3bcf727e511b0637a72d1 Author: Amit Shah Date: Tue Sep 14 13:26:16 2010 +0530 virtio: console: Prevent userspace from submitting NULL buffers commit 65745422a898741ee0e7068ef06624ab06e8aefa upstream. A userspace could submit a buffer with 0 length to be written to the host. Prevent such a situation. This was not needed previously, but recent changes in the way write() works exposed this condition to trigger a virtqueue event to the host, causing a NULL buffer to be sent across. Signed-off-by: Amit Shah Signed-off-by: Rusty Russell Signed-off-by: Greg Kroah-Hartman commit d987d040e9623502a97a9dd742377d4ec00e2ba2 Author: Hugh Dickins Date: Sun Sep 19 19:40:22 2010 -0700 mm: further fix swapin race condition commit 31c4a3d3a0f84a5847665f8aa0552d188389f791 upstream. Commit 4969c1192d15 ("mm: fix swapin race condition") is now agreed to be incomplete. There's a race, not very much less likely than the original race envisaged, in which it is further necessary to check that the swapcache page's swap has not changed. Here's the reasoning: cast in terms of reuse_swap_page(), but probably could be reformulated to rely on try_to_free_swap() instead, or on swapoff+swapon. A, faults into do_swap_page(): does page1 = lookup_swap_cache(swap1) and comes through the lock_page(page1). B, a racing thread of the same process, faults on the same address: does page1 = lookup_swap_cache(swap1) and now waits in lock_page(page1), but for whatever reason is unlucky not to get the lock any time soon. A carries on through do_swap_page(), a write fault, but cannot reuse the swap page1 (another reference to swap1). Unlocks the page1 (but B doesn't get it yet), does COW in do_wp_page(), page2 now in that pte. C, perhaps the parent of A+B, comes in and write faults the same swap page1 into its mm, reuse_swap_page() succeeds this time, swap1 is freed. kswapd comes in after some time (B still unlucky) and swaps out some pages from A+B and C: it allocates the original swap1 to page2 in A+B, and some other swap2 to the original page1 now in C. But does not immediately free page1 (actually it couldn't: B holds a reference), leaving it in swap cache for now. B at last gets the lock on page1, hooray! Is PageSwapCache(page1)? Yes. Is pte_same(*page_table, orig_pte)? Yes, because page2 has now been given the swap1 which page1 used to have. So B proceeds to insert page1 into A+B's page_table, though its content now belongs to C, quite different from what A wrote there. B ought to have checked that page1's swap was still swap1. Signed-off-by: Hugh Dickins Reviewed-by: Rik van Riel Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 69bce8b48bc106ee22e68013dcfdbda8ed45f1ac Author: Andrea Arcangeli Date: Thu Sep 9 16:37:52 2010 -0700 mm: fix swapin race condition commit 4969c1192d15afa3389e7ae3302096ff684ba655 upstream. The pte_same check is reliable only if the swap entry remains pinned (by the page lock on swapcache). We've also to ensure the swapcache isn't removed before we take the lock as try_to_free_swap won't care about the page pin. One of the possible impacts of this patch is that a KSM-shared page can point to the anon_vma of another process, which could exit before the page is freed. This can leave a page with a pointer to a recycled anon_vma object, or worse, a pointer to something that is no longer an anon_vma. [Backport to 2.6.35.5 (anon_vma instead of anon_vma->root in ksm.h) by Hugh] [riel@redhat.com: changelog help] Signed-off-by: Andrea Arcangeli Acked-by: Hugh Dickins Reviewed-by: Rik van Riel Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Hugh Dickins Signed-off-by: Greg Kroah-Hartman commit 5b11eef8e3d3e3b2ce9aae50f6ab4fd8451cfc5b Author: Dan Carpenter Date: Fri Sep 10 01:56:16 2010 +0000 net/llc: make opt unsigned in llc_ui_setsockopt() commit 339db11b219f36cf7da61b390992d95bb6b7ba2e upstream. The members of struct llc_sock are unsigned so if we pass a negative value for "opt" it can cause a sign bug. Also it can cause an integer overflow when we multiply "opt * HZ". Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9db783be92c6c07d4ff6238d01022368a29e0d82 Author: Dan Carpenter Date: Mon Sep 6 14:32:30 2010 +0200 Staging: vt6655: fix buffer overflow commit dd173abfead903c7df54e977535973f3312cd307 upstream. "param->u.wpa_associate.wpa_ie_len" comes from the user. We should check it so that the copy_from_user() doesn't overflow the buffer. Also further down in the function, we assume that if "param->u.wpa_associate.wpa_ie_len" is set then "abyWPAIE[0]" is initialized. To make that work, I changed the test here to say that if "wpa_ie_len" is set then "wpa_ie" has to be a valid pointer or we return -EINVAL. Oddly, we only use the first element of the abyWPAIE[] array. So I suspect there may be some other issues in this function. Signed-off-by: Dan Carpenter Signed-off-by: Greg Kroah-Hartman commit b38ac5ef17532d9d73dc08859fd256a647e969db Author: Andy Gospodarek Date: Fri Sep 10 11:43:20 2010 +0000 bonding: correctly process non-linear skbs commit ab12811c89e88f2e66746790b1fe4469ccb7bdd9 upstream. It was recently brought to my attention that 802.3ad mode bonds would no longer form when using some network hardware after a driver update. After snooping around I realized that the particular hardware was using page-based skbs and found that skb->data did not contain a valid LACPDU as it was not stored there. That explained the inability to form an 802.3ad-based bond. For balance-alb mode bonds this was also an issue as ARPs would not be properly processed. This patch fixes the issue in my tests and should be applied to 2.6.36 and as far back as anyone cares to add it to stable. Thanks to Alexander Duyck and Jesse Brandeburg for the suggestions on this one. Signed-off-by: Andy Gospodarek CC: Alexander Duyck CC: Jesse Brandeburg Signed-off-by: Jay Vosburgh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 073b0b42c9a06e8c3d2e81fac440e3f387c0498a Author: Dan Rosenberg Date: Wed Sep 15 11:43:04 2010 +0000 drivers/net/eql.c: prevent reading uninitialized stack memory commit 44467187dc22fdd33a1a06ea0ba86ce20be3fe3c upstream. Fixed formatting (tabs and line breaks). The EQL_GETMASTRCFG device ioctl allows unprivileged users to read 16 bytes of uninitialized stack memory, because the "master_name" member of the master_config_t struct declared on the stack in eql_g_master_cfg() is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 79872c1916caaebb002448f63455050cfde581cf Author: Dan Rosenberg Date: Wed Sep 15 11:43:12 2010 +0000 drivers/net/cxgb3/cxgb3_main.c: prevent reading uninitialized stack memory commit 49c37c0334a9b85d30ab3d6b5d1acb05ef2ef6de upstream. Fixed formatting (tabs and line breaks). The CHELSIO_GET_QSET_NUM device ioctl allows unprivileged users to read 4 bytes of uninitialized stack memory, because the "addr" member of the ch_reg struct declared on the stack in cxgb_extension_ioctl() is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9df2a1a9348fe1098e55ca3899963a60419c12d8 Author: Dan Rosenberg Date: Wed Sep 15 11:43:28 2010 +0000 drivers/net/usb/hso.c: prevent reading uninitialized memory commit 7011e660938fc44ed86319c18a5954e95a82ab3e upstream. Fixed formatting (tabs and line breaks). The TIOCGICOUNT device ioctl allows unprivileged users to read uninitialized stack memory, because the "reserved" member of the serial_icounter_struct struct declared on the stack in hso_get_count() is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit df7b44d7704fd1a1f3f6af399587bc15732490b0 Author: David S. Miller Date: Mon Aug 23 23:10:57 2010 -0700 sparc64: Get rid of indirect p1275 PROM call buffer. [ Upstream commit 25edd6946a1d74e5e77813c2324a0908c68bcf9e ] This is based upon a report by Meelis Roos showing that it's possible that we'll try to fetch a property that is 32K in size with some devices. With the current fixed 3K buffer we use for moving data in and out of the firmware during PROM calls, that simply won't work. In fact, it will scramble random kernel data during bootup. The reasoning behind the temporary buffer is entirely historical. It used to be the case that we had problems referencing dynamic kernel memory (including the stack) early in the boot process before we explicitly told the firwmare to switch us over to the kernel trap table. So what we did was always give the firmware buffers that were locked into the main kernel image. But we no longer have problems like that, so get rid of all of this indirect bounce buffering. Besides fixing Meelis's bug, this also makes the kernel data about 3K smaller. It was also discovered during these conversions that the implementation of prom_retain() was completely wrong, so that was fixed here as well. Currently that interface is not in use. Reported-by: Meelis Roos Tested-by: Meelis Roos Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1919a55fb8c709f11224a964bb6a93a267daad14 Author: Jianzhao Wang Date: Wed Sep 8 14:35:43 2010 -0700 net: blackhole route should always be recalculated [ Upstream commit ae2688d59b5f861dc70a091d003773975d2ae7fb ] Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error triggered by IKE for example), hence this kind of route is always temporary and so we should check if a better route exists for next packets. Bug has been introduced by commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92. Signed-off-by: Jianzhao Wang Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2974d51279e5437f5b34a5163d1d835586488946 Author: Eric Dumazet Date: Wed Aug 25 23:44:35 2010 +0000 l2tp: test for ethernet header in l2tp_eth_dev_recv() [ Upstream commit bfc960a8eec023a170a80697fe65157cd4f44f81 ] close https://bugzilla.kernel.org/show_bug.cgi?id=16529 Before calling dev_forward_skb(), we should make sure skb head contains at least an ethernet header, even if length included in upper layer said so. Use pskb_may_pull() to make sure this ethernet header is present in skb head. Reported-by: Thomas Heil Reported-by: Ian Campbell Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4aeda42a1b443d705980c85184cca7f8f28625bf Author: Tetsuo Handa Date: Sat Sep 4 01:34:28 2010 +0000 UNIX: Do not loop forever at unix_autobind(). [ Upstream commit a9117426d0fcc05a194f728159a2d43df43c7add ] We assumed that unix_autobind() never fails if kzalloc() succeeded. But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can consume all names using fork()/socket()/bind(). If all names are in use, those who call bind() with addr_len == sizeof(short) or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue while (1) yield(); loop at unix_autobind() till a name becomes available. This patch adds a loop counter in order to give up after 1048576 attempts. Calling yield() for once per 256 attempts may not be sufficient when many names are already in use, for __unix_find_socket_byname() can take long time under such circumstance. Therefore, this patch also adds cond_resched() call. Note that currently a local user can consume 2GB of kernel memory if the user is allowed to create and autobind 1048576 UNIX domain sockets. We should consider adding some restriction for autobind operation. Signed-off-by: Tetsuo Handa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 749f63ed9bf702e588000f1d1ce6290e6c156289 Author: Eric Dumazet Date: Wed Sep 8 05:08:44 2010 +0000 udp: add rehash on connect() commit 719f835853a92f6090258114a72ffe41f09155cd upstream commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki Signed-off-by: Eric Dumazet Tested-by: Krzysztof Piotr Oledzki Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4275a88aa5891aadb1227631d9e602a1f66f347f Author: Alexey Kuznetsov Date: Wed Sep 15 10:27:52 2010 -0700 tcp: Prevent overzealous packetization by SWS logic. [ Upstream commit 01f83d69844d307be2aa6fea88b0e8fe5cbdb2f4 ] If peer uses tiny MSS (say, 75 bytes) and similarly tiny advertised window, the SWS logic will packetize to half the MSS unnecessarily. This causes problems with some embedded devices. However for large MSS devices we do want to half-MSS packetize otherwise we never get enough packets into the pipe for things like fast retransmit and recovery to work. Be careful also to handle the case where MSS > window, otherwise we'll never send until the probe timer. Reported-by: ツ Leandro Melo de Sales Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5bc7298fc100540046e9e252d2fc0e4606d39b44 Author: KOSAKI Motohiro Date: Tue Aug 24 16:05:48 2010 +0000 tcp: select(writefds) don't hang up when a peer close connection [ Upstream commit d84ba638e4ba3c40023ff997aa5e8d3ed002af36 ] This issue come from ruby language community. Below test program hang up when only run on Linux. % uname -mrsv Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686 % ruby -rsocket -ve ' BasicSocket.do_not_reverse_lookup = true serv = TCPServer.open("127.0.0.1", 0) s1 = TCPSocket.open("127.0.0.1", serv.addr[1]) s2 = serv.accept s2.close s1.write("a") rescue p $! s1.write("a") rescue p $! Thread.new { s1.write("a") }.join' ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux] # [Hang Here] FreeBSD, Solaris, Mac doesn't. because Ruby's write() method call select() internally. and tcp_poll has a bug. SUS defined 'ready for writing' of select() as following. | A descriptor shall be considered ready for writing when a call to an output | function with O_NONBLOCK clear would not block, whether or not the function | would transfer data successfully. That said, EPIPE situation is clearly one of 'ready for writing'. We don't have read-side issue because tcp_poll() already has read side shutdown care. | if (sk->sk_shutdown & RCV_SHUTDOWN) | mask |= POLLIN | POLLRDNORM | POLLRDHUP; So, Let's insert same logic in write side. - reference url http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068 Signed-off-by: KOSAKI Motohiro Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8411b2debbd772b2dc156f06d7da73cdcbd08d35 Author: Eric Dumazet Date: Wed Aug 25 23:02:17 2010 -0700 tcp: fix three tcp sysctls tuning [ Upstream commit c5ed63d66f24fd4f7089b5a6e087b0ce7202aa8e ] As discovered by Anton Blanchard, current code to autotune tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and sysctl_max_syn_backlog makes little sense. The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB machine in Anton's case. (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket)) is much bigger if spinlock debugging is on. Its wrong to select bigger limits in this case (where kernel structures are also bigger) bhash_size max is 65536, and we get this value even for small machines. A better ground is to use size of ehash table, this also makes code shorter and more obvious. Based on a patch from Anton, and another from David. Reported-and-tested-by: Anton Blanchard Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fd31b5dcb8c69c007705caa9dccdaa371f21b4f0 Author: David S. Miller Date: Wed Aug 25 02:27:49 2010 -0700 tcp: Combat per-cpu skew in orphan tests. [ Upstream commit ad1af0fedba14f82b240a03fe20eb9b2fdbd0357 ] As reported by Anton Blanchard when we use percpu_counter_read_positive() to make our orphan socket limit checks, the check can be off by up to num_cpus_online() * batch (which is 32 by default) which on a 128 cpu machine can be as large as the default orphan limit itself. Fix this by doing the full expensive sum check if the optimized check triggers. Reported-by: Anton Blanchard Signed-off-by: David S. Miller Acked-by: Eric Dumazet Signed-off-by: Greg Kroah-Hartman commit 515b723e04cb4df7f96c59dee019d126df707fc0 Author: David S. Miller Date: Tue Sep 14 21:41:20 2010 -0700 net: RPS needs to depend upon USE_GENERIC_SMP_HELPERS [ Upstream commit 6dcbc12290abb452a5e42713faa6461b248e2f55 ] You cannot invoke __smp_call_function_single() unless the architecture sets this symbol. Reported-by: Daniel Hellstrom Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3289b85e643814c06b3196bafeda3a44d4d75fe8 Author: Eric Dumazet Date: Mon Aug 16 03:25:00 2010 +0000 rds: fix a leak of kernel memory [ Upstream commit f037590fff3005ce8a1513858d7d44f50053cc8f ] struct rds_rdma_notify contains a 32 bits hole on 64bit arches, make sure it is zeroed before copying it to user. Signed-off-by: Eric Dumazet CC: Andy Grover Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 2f2c0c6440110d7436f202ecf918734243e7b870 Author: David S. Miller Date: Mon Aug 30 18:35:24 2010 -0700 irda: Correctly clean up self->ias_obj on irda_bind() failure. [ Upstream commit 628e300cccaa628d8fb92aa28cb7530a3d5f2257 ] If irda_open_tsap() fails, the irda_bind() code tries to destroy the ->ias_obj object by hand, but does so wrongly. In particular, it fails to a) release the hashbin attached to the object and b) reset the self->ias_obj pointer to NULL. Fix both problems by using irias_delete_object() and explicitly setting self->ias_obj to NULL, just as irda_release() does. Reported-by: Tavis Ormandy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 88c2651de40107913811814405591d9077d71b02 Author: Jarek Poplawski Date: Sat Sep 4 10:34:29 2010 +0000 gro: Re-fix different skb headrooms [ Upstream commit 64289c8e6851bca0e589e064c9a5c9fbd6ae5dd4 ] The patch: "gro: fix different skb headrooms" in its part: "2) allocate a minimal skb for head of frag_list" is buggy. The copied skb has p->data set at the ip header at the moment, and skb_gro_offset is the length of ip + tcp headers. So, after the change the length of mac header is skipped. Later skb_set_mac_header() sets it into the NET_SKB_PAD area (if it's long enough) and ip header is misaligned at NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the original skb was wrongly allocated, so let's copy it as it was. bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57 Reported-by: Plamen Petrov Signed-off-by: Jarek Poplawski CC: Eric Dumazet Acked-by: Eric Dumazet Tested-by: Plamen Petrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 76dcdc683e9ab26c0227d9f40b390edba9fca621 Author: Eric Dumazet Date: Wed Sep 1 00:50:51 2010 +0000 gro: fix different skb headrooms [ Upstream commit 3d3be4333fdf6faa080947b331a6a19bce1a4f57 ] Packets entering GRO might have different headrooms, even for a given flow (because of implementation details in drivers, like copybreak). We cant force drivers to deliver packets with a fixed headroom. 1) fix skb_segment() skb_segment() makes the false assumption headrooms of fragments are same than the head. When CHECKSUM_PARTIAL is used, this can give csum_start errors, and crash later in skb_copy_and_csum_dev() 2) allocate a minimal skb for head of frag_list skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to allocate a fresh skb. This adds NET_SKB_PAD to a padding already provided by netdevice, depending on various things, like copybreak. Use alloc_skb() to allocate an exact padding, to reduce cache line needs: NET_SKB_PAD + NET_IP_ALIGN bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 Many thanks to Plamen Petrov, testing many debugging patches ! With help of Jarek Poplawski. Reported-by: Plamen Petrov Signed-off-by: Eric Dumazet CC: Jarek Poplawski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5d92da4418de42f1f85920f84d81a66bb559727e Author: David S. Miller Date: Wed Sep 1 18:06:39 2010 -0700 bridge: Clear INET control block of SKBs passed into ip_fragment(). [ Upstream commit 4ce6b9e1621c187a32a47a17bf6be93b1dc4a3df ] In a similar vain to commit 17762060c25590bfddd68cc1131f28ec720f405f ("bridge: Clear IPCB before possible entry into IP stack") Any time we call into the IP stack we have to make sure the state there is as expected by the ipv4 code. With help from Eric Dumazet and Herbert Xu. Reported-by: Brandan Das Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 150e749ab763aef866fcc1d28e25371a39e7fcc1 Author: Dan Rosenberg Date: Wed Sep 15 17:44:16 2010 -0400 USB: serial/mos*: prevent reading uninitialized stack memory commit a0846f1868b11cd827bdfeaf4527d8b1b1c0b098 upstream. The TIOCGICOUNT device ioctl in both mos7720.c and mos7840.c allows unprivileged users to read uninitialized stack memory, because the "reserved" member of the serial_icounter_struct struct declared on the stack is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg Signed-off-by: Greg Kroah-Hartman commit 6c09969e0edfbc991387436eb011d5c7642b2b6a Author: Mathias Nyman Date: Mon Sep 6 13:52:01 2010 +0300 usb: musb_debugfs: don't use the struct file private_data field with seq_files commit 024cfa5943a7e89565c60b612d698c2bfb3da66a upstream. seq_files use the private_data field of a file struct for storing a seq_file structure, data should be stored in seq_file's own private field (e.g. file->private_data->private) Otherwise seq_release() will free the private data when the file is closed. Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman