commit dd59f6c76b265ed2ff18b497d6105a9511b1feb1 Merge: 70e66a5 acadbfb Author: Linus Torvalds Date: Sat Dec 19 12:38:55 2009 -0800 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6: alpha: Convert BUG() to use unreachable() alpha: Add minimal support for software performance events alpha: Wire up missing/new syscalls commit 70e66a5079b2b33f142303d31581cf03f7af98fe Merge: eca9dfc 0535f2b Author: Linus Torvalds Date: Sat Dec 19 11:04:29 2009 -0800 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: sata_mv: remove pointless NULL test pata_hpt3x2n: fix clock turnaround libata: fix reporting of drained bytes when clearing DRQ sata_mv: add power management support for the PCI controllers. sata_mv: store the board_idx into the host private data pata_octeon_cf: use resource_size(), to fix resource sizing bug libata: use the WRITE_SAME_16 define sata_mv: move the PCI bar description initialization code sata_mv: add power management support for the platform driver sata_mv: support clkdev framework sata_mv: increase PIO IORDY timeout Fixed crazy mode-change in merge. commit eca9dfcd0029c8a84b1094bb84a2fb53e4addf6c Merge: 3981e15 b5b60fd Author: Linus Torvalds Date: Sat Dec 19 09:48:42 2009 -0800 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf session: Make events_stats u64 to avoid overflow on 32-bit arches hw-breakpoints: Fix hardware breakpoints -> perf events dependency perf events: Dont report side-band events on each cpu for per-task-per-cpu events perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker perf events, x86/stacktrace: Make stack walking optional perf events: Remove unused perf_counter.h header file perf probe: Check new event name kprobe-tracer: Check new event/group name perf probe: Check whether debugfs path is correct perf probe: Fix libdwarf include path for Debian commit 3981e152864fcc1dbbb564e1f4c0ae11a09639d2 Merge: aac3d39 18374d8 Author: Linus Torvalds Date: Sat Dec 19 09:48:14 2009 -0800 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system Makefile: Unexport LC_ALL instead of clearing it x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk x86: Reenable TSC sync check at boot, even with NONSTOP_TSC x86: Don't use POSIX character classes in gen-insn-attr-x86.awk Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA x86: Fix checking of SRAT when node 0 ram is not from 0 x86, cpuid: Add "volatile" to asm in native_cpuid() x86, msr: msrs_alloc/free for CONFIG_SMP=n x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space x86: Add IA32_TSC_AUX MSR and use it x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers initramfs: add missing decompressor error check bzip2: Add missing checks for malloc returning NULL bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure commit aac3d39693529ca538e37ebdb6ed5d6432a697c7 Merge: 10e5453 077614e Author: Linus Torvalds Date: Sat Dec 19 09:47:49 2009 -0800 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits) sched: Fix broken assertion sched: Assert task state bits at build time sched: Update task_state_arraypwith new states sched: Add missing state chars to TASK_STATE_TO_CHAR_STR sched: Move TASK_STATE_TO_CHAR_STR near the TASK_state bits sched: Teach might_sleep() about preemptible RCU sched: Make warning less noisy sched: Simplify set_task_cpu() sched: Remove the cfs_rq dependency from set_task_cpu() sched: Add pre and post wakeup hooks sched: Move kthread_bind() back to kthread.c sched: Fix select_task_rq() vs hotplug issues sched: Fix sched_exec() balancing sched: Ensure set_task_cpu() is never called on blocked tasks sched: Use TASK_WAKING for fork wakups sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE sched: Fix task_hot() test order sched: Fix set_cpu_active() in cpu_down() sched: Mark boot-cpu active before smp_init() sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK ... commit 10e5453ffa0d04a2eda3cda3f55b88cb9c04595f Merge: 3cd312c d4581a2 Author: Linus Torvalds Date: Sat Dec 19 09:47:34 2009 -0800 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sys: Fix missing rcu protection for __task_cred() access signals: Fix more rcu assumptions signal: Fix racy access to __task_cred in kill_pid_info_as_uid() commit 3cd312c3e887b4bee2d94668a481b3d19c07732c Merge: ecd5907 cf1e367 Author: Linus Torvalds Date: Sat Dec 19 09:47:18 2009 -0800 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: Remove duplicate setting of new_base in __mod_timer() clockevents: Prevent clockevent_devices list corruption on cpu hotplug commit ecd5907a200b18aeddac68b8c734b8ad4c931205 Merge: b4c30aa 1d802e2 Author: Linus Torvalds Date: Sat Dec 19 09:46:46 2009 -0800 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] Use strim instead of strstrip to avoid false warnings. [S390] qdio: add counter for input queue full condition [S390] qdio: remove superfluous log entries and WARN_ONs. [S390] ptrace: dont abuse PT_PTRACED [S390] cio: fix channel path vary [S390] drivers: Correct size given to memset [S390] tape: Add pr_fmt() macro to all tape source files [S390] rename NT_PRXSTATUS to NT_S390_HIGHREGS [S390] tty: PTR_ERR return of wrong pointer in fs3270_open() [S390] s390: PTR_ERR return of wrong pointer in fallback_init_cip() [S390] dasd: PTR_ERR return of wrong pointer in [S390] dasd: move dasd-diag kmsg to dasd [S390] cio: fix drvdata usage for the console subchannel [S390] wire up sys_recvmmsg commit b4c30aad39805902cf5b855aa8a8b22d728ad057 Author: Al Viro Date: Sat Dec 19 16:03:30 2009 +0000 fix more leaks in audit_tree.c tag_chunk() Several leaks in audit_tree didn't get caught by commit 318b6d3d7ddbcad3d6867e630711b8a705d873d7, including the leak on normal exit in case of multiple rules refering to the same chunk. Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 6f5d51148921c242680a7a1d9913384a30ab3cbe Author: Al Viro Date: Sat Dec 19 15:59:45 2009 +0000 fix braindamage in audit_tree.c untag_chunk() ... aka "Al had badly fscked up when writing that thing and nobody noticed until Eric had fixed leaks that used to mask the breakage". The function essentially creates a copy of old array sans one element and replaces the references to elements of original (they are on cyclic lists) with those to corresponding elements of new one. After that the old one is fair game for freeing. First of all, there's a dumb braino: when we get to list_replace_init we use indices for wrong arrays - position in new one with the old array and vice versa. Another bug is more subtle - termination condition is wrong if the element to be excluded happens to be the last one. We shouldn't go until we fill the new array, we should go until we'd finished the old one. Otherwise the element we are trying to kill will remain on the cyclic lists... That crap used to be masked by several leaks, so it was not quite trivial to hit. Eric had fixed some of those leaks a while ago and the shit had hit the fan... Signed-off-by: Al Viro Signed-off-by: Linus Torvalds commit 9b0fd1149747b117e7c3e9917fdea03b774ae3d0 Author: Andres Salomon Date: Fri Dec 18 13:02:38 2009 -0500 watchdog: update geodewdt for new MFGPT API Update to the new cs5535_mfgpt* API. The geode-specific wording should eventually be dropped from this driver... Signed-off-by: Andres Salomon Signed-off-by: Linus Torvalds commit 1d802e24774c94ec7bdb12b6515226f3341533c1 Author: Heiko Carstens Date: Fri Dec 18 17:43:27 2009 +0100 [S390] Use strim instead of strstrip to avoid false warnings. Cc: Michael Holzheu Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky commit 8bcd9b04fdbab9cee4948501f8862af2a288f1b5 Author: Jan Glauber Date: Fri Dec 18 17:43:26 2009 +0100 [S390] qdio: add counter for input queue full condition Add a counter to the qdio performance statistics that indicates that no free buffers were left in the input queue. If the counter gets increased it means that the qdio adapter filled all available buffers and possibly had more buffers ready but could not transmit them. Signed-off-by: Jan Glauber Signed-off-by: Martin Schwidefsky commit 7883097f1602c8cbb1da764a6ac43e0b8a7f56d9 Author: Jan Glauber Date: Fri Dec 18 17:43:25 2009 +0100 [S390] qdio: remove superfluous log entries and WARN_ONs. * Don't write debug feature log entries for sl, slsb and sbal since these elements can be located from the qdio_q pointer which is also logged. * Convert WARN_ON for wrong alignment of sbal to BUG_ON. * Remove WARN_ON's for wrong alignment of q / qib / slib since these alignments should be guaranteed by kmem_cache_alloc alignment / struct aligned attribute / __get_free_page. Signed-off-by: Jan Glauber Signed-off-by: Martin Schwidefsky commit ca633fd006486ed2c2d3b542283067aab61e6dc8 Author: Oleg Nesterov Date: Fri Dec 18 17:43:24 2009 +0100 [S390] ptrace: dont abuse PT_PTRACED Nobody except ptrace itself should use task->ptrace or PT_PTRACED directly, change arch/s390/kernel/traps.c to use the helper. Signed-off-by: Oleg Nesterov Acked-by: Roland McGrath Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky commit d302e1a5dbe1677a495033a2d310656a55139cdf Author: Peter Oberparleiter Date: Fri Dec 18 17:43:23 2009 +0100 [S390] cio: fix channel path vary Channel path vary is currently broken: channel paths which are varied offline are still used by Linux. The reason for this is that: * the path mask indicating which paths of an I/O device can be used is reset by each internal I/O request * the logic that checks if a path group is already in its designated target state incorrectly interprets the result "is correctly set" as "is correctly set and available" Fix this by resetting the path mask only for internal I/O requests which affect the path mask and by correcting the pgid check logic. Signed-off-by: Peter Oberparleiter Signed-off-by: Martin Schwidefsky commit 83e56d0b23f91b70a7e708ce0979a57b6c6a1507 Author: Julia Lawall Date: Fri Dec 18 17:43:22 2009 +0100 [S390] drivers: Correct size given to memset Memset should be given the size of the structure, not the size of the pointer. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @@ type T; T *x; expression E; @@ memset(x, E, sizeof( + * x)) // Signed-off-by: Julia Lawall Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky commit bb509912481214cf6ad1181c968295c62ff1ad9e Author: Michael Holzheu Date: Fri Dec 18 17:43:21 2009 +0100 [S390] tape: Add pr_fmt() macro to all tape source files Without defining the pr_fmt() macro, the "tape: " prefix will not be printed when using the pr_xxx printk macros. This patch adds the missing definitions. Signed-off-by: Michael Holzheu Signed-off-by: Martin Schwidefsky commit 622e99bf0d54c4517cb0524540cd77257db8621a Author: Martin Schwidefsky Date: Fri Dec 18 17:43:20 2009 +0100 [S390] rename NT_PRXSTATUS to NT_S390_HIGHREGS The elf notes number for the upper register halves is s390 specific. Change the name of the elf notes to include S390. Signed-off-by: Martin Schwidefsky Signed-off-by: Martin Schwidefsky commit 2b31001d306a2b5fd690eee878d2ee61a0a0674c Author: Roel Kluin Date: Fri Dec 18 17:43:19 2009 +0100 [S390] tty: PTR_ERR return of wrong pointer in fs3270_open() Return the PTR_ERR of the correct pointer. Signed-off-by: Roel Kluin Signed-off-by: Martin Schwidefsky commit b59cdcb339fc7286161b80403f6af63acf26876f Author: Roel Kluin Date: Fri Dec 18 17:43:18 2009 +0100 [S390] s390: PTR_ERR return of wrong pointer in fallback_init_cip() Return the PTR_ERR of the correct pointer. Signed-off-by: Roel Kluin Signed-off-by: Martin Schwidefsky commit 6d53cfe590c17c28ebae2c869bb7a5ab9554b4da Author: Roel Kluin Date: Fri Dec 18 17:43:17 2009 +0100 [S390] dasd: PTR_ERR return of wrong pointer in Return the PTR_ERR of the correct pointer. Signed-off-by: Roel Kluin Signed-off-by: Martin Schwidefsky commit ea058544542a60e92fd023d93aa901709be18daa Author: Stefan Haberland Date: Fri Dec 18 17:43:16 2009 +0100 [S390] dasd: move dasd-diag kmsg to dasd The DIAG discipline does not have a own driver name. It shows up as dasd-eckd or dasd-fba. So messages for dasd-diag are moved to the generic dasd part. Signed-off-by: Stefan Haberland Signed-off-by: Martin Schwidefsky commit ffa8d2a3e80a3f0dee9886947dbd506d2bb226d2 Author: Sebastian Ott Date: Fri Dec 18 17:43:15 2009 +0100 [S390] cio: fix drvdata usage for the console subchannel Using dev_set_drvdata prior to device_register will force the driver core to kmalloc its private data. Since we use this for the console subchannel lets set the drvdata before taking the subchannels spinlock. Signed-off-by: Sebastian Ott Signed-off-by: Martin Schwidefsky commit 70ee9518cfc8baec618e69e4ef22566dcb2f29d3 Author: Heiko Carstens Date: Fri Dec 18 17:43:14 2009 +0100 [S390] wire up sys_recvmmsg Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky commit b5b60fda1e462a849bc37dfbace2888191be82cc Author: Arnaldo Carvalho de Melo Date: Fri Dec 18 13:03:03 2009 -0200 perf session: Make events_stats u64 to avoid overflow on 32-bit arches Pekka Enberg reported weird percentages in perf report. It turns out we are overflowing a 32-bit variables in struct events_stats on 32-bit architectures. Before: [acme@ana linux-2.6-tip]$ perf report -i pekka.perf.data 2> /dev/null | head -10 281.96% Xorg b710a561 [.] 0x000000b710a561 140.15% Xorg [kernel] [k] __initramfs_end 51.56% metacity libgobject-2.0.so.0.2000.1 [.] 0x00000000026e46 35.12% evolution libcairo.so.2.10800.6 [.] 0x000000000203bd 33.84% metacity libpthread-2.9.so [.] 0x00000000007a3d After: [acme@ana linux-2.6-tip]$ perf report -i pekka.perf.data 2> /dev/null | head -10 30.04% Xorg b710a561 [.] 0x000000b710a561 14.93% Xorg [kernel] [k] __initramfs_end 5.49% metacity libgobject-2.0.so.0.2000.1 [.] 0x00000000026e46 3.74% evolution libcairo.so.2.10800.6 [.] 0x000000000203bd 3.61% metacity libpthread-2.9.so [.] 0x00000000007a3d Reported-by: Pekka Enberg Tested-by: Pekka Enberg Signed-off-by: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1261148583-20395-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 99e8c5a3b875a34d894a711c9a3669858d6adf45 Author: Frederic Weisbecker Date: Thu Dec 17 01:33:54 2009 +0100 hw-breakpoints: Fix hardware breakpoints -> perf events dependency The kbuild's select command doesn't propagate through the config dependencies. Hence the current rules of hardware breakpoint's config can't ensure perf can never be disabled under us. We have: config X86 selects HAVE_HW_BREAKPOINTS config HAVE_HW_BREAKPOINTS select PERF_EVENTS config PERF_EVENTS [...] x86 will select the breakpoints but that won't propagate to perf events. The user can still disable the latter, but it is necessary for the breakpoints. What we need is: - x86 selects HAVE_HW_BREAKPOINTS and PERF_EVENTS - HAVE_HW_BREAKPOINTS depends on PERF_EVENTS so that we ensure PERF_EVENTS is enabled and frozen for x86. This fixes the following kind of build errors: In file included from arch/x86/kernel/hw_breakpoint.c:31: include/linux/hw_breakpoint.h: In function 'hw_breakpoint_addr': include/linux/hw_breakpoint.h:39: error: 'struct perf_event' has no member named 'attr' v2: Select also ANON_INODES from x86, required for perf Reported-by: Cyrill Gorcunov Reported-by: Michal Marek Reported-by: Andrew Randrianasulu Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Randy Dunlap Cc: K.Prasad LKML-Reference: <1261010034-7786-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit acadbfb90a54673d6c8b05aa4e93218433890411 Author: David Daney Date: Thu Dec 10 18:07:24 2009 -0500 alpha: Convert BUG() to use unreachable() Use the new unreachable() macro instead of for(;;); Signed-off-by: David Daney CC: Richard Henderson CC: Ivan Kokshaysky CC: linux-alpha@vger.kernel.org Signed-off-by: Matt Turner commit a582e6f01b90211933e70edcec9bc0bbb1157402 Author: Michael Cree Date: Tue Dec 8 14:27:01 2009 -0500 alpha: Add minimal support for software performance events In the kernel the patch enables configuration of the perf event option, adds the perf_event_open syscall, and includes a minimal architecture specific asm/perf_event.h header file. Signed-off-by: Michael Cree Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Ingo Molnar Signed-off-by: Matt Turner commit 21797c599c710d3851d241c4b50690f2482bf618 Author: Daniele Calore Date: Tue Dec 8 13:59:47 2009 -0500 alpha: Wire up missing/new syscalls This wire up the: fallocate, timerfd_create, timerfd_settime, timerfd_gettime, signalfd4, eventfd2, epoll_create1, dup3, pipe2, inotify_init1, preadv, pwritev and rt_tgsigqueueinfo syscalls for the alpha port. For umount2, alpha have an "old" and "new" version called: oldumount and umount; so ignore umount2. Rebased on top of 6e17e8b9fb74b9fb9f6ea331f7f4a049c5b4c4b8 by Matt Turner. Signed-off-by: Daniele Calore Cc: Richard Henderson Cc: Ivan Kokshaysky Signed-off-by: Matt Turner commit 18374d89e5fe96772102f44f535efb1198d9be08 Author: Suresh Siddha Date: Thu Dec 17 18:29:46 2009 -0800 x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system John Blackwood reported: > on an older Dell PowerEdge 6650 system with 8 cpus (4 are hyper-threaded), > and 32 bit (x86) kernel, once you change the irq smp_affinity of an irq > to be less than all cpus in the system, you can never change really the > irq smp_affinity back to be all cpus in the system (0xff) again, > even though no error status is returned on the "/bin/echo ff > > /proc/irq/[n]/smp_affinity" operation. > > This is due to that fact that BAD_APICID has the same value as > all cpus (0xff) on 32bit kernels, and thus the value returned from > set_desc_affinity() via the cpu_mask_to_apicid_and() function is treated > as a failure in set_ioapic_affinity_irq_desc(), and no affinity changes > are made. set_desc_affinity() is already checking if the incoming cpu mask intersects with the cpu online mask or not. So there is no need for the apic op cpu_mask_to_apicid_and() to check again and return BAD_APICID. Remove the BAD_APICID return value from cpu_mask_to_apicid_and() and also fix set_desc_affinity() to return -1 instead of using BAD_APICID to represent error conditions (as cpu_mask_to_apicid_and() can return logical or physical apicid values and BAD_APICID is really to represent bad physical apic id). Reported-by: John Blackwood Root-caused-by: John Blackwood Signed-off-by: Suresh Siddha LKML-Reference: <1261103386.2535.409.camel@sbs-t61> Signed-off-by: H. Peter Anvin commit 06b5dc646b9479b786d77749936f25910cd82a37 Author: H. Peter Anvin Date: Thu Dec 17 15:51:37 2009 -0800 Makefile: Unexport LC_ALL instead of clearing it Apparently not all versions of glibc and utilities treat an empty LC_ALL as nonexistent, causing error messages to be garbled. Instead, explicitly unexport it from the environment. Reported-and-tested-by: Masami Hiramatsu Signed-off-by: H. Peter Anvin LKML-Reference: <4B2AC394.4030108@redhat.com> Cc: Michal Marek Cc: Roland Dreier Cc: Sam Ravnborg commit 8c63450718ea62ee3a70bffde170b4d15fc72d3c Author: akpm@linux-foundation.org Date: Thu Dec 17 15:26:36 2009 -0800 x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk It says Warning: objdump version is older than 2.19 Warning: Skipping posttest. because it used the wrong field from `objdump -v': akpm:/usr/src/25> /opt/crosstool/gcc-4.0.2-glibc-2.3.6/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-objdump -v GNU objdump 2.16.1 Copyright 2005 Free Software Foundation, Inc. This program is free software; you may redistribute it under the terms of the GNU General Public License. This program has absolutely no warranty. Cc: Thomas Gleixner Cc: Ingo Molnar Signed-off-by: Andrew Morton LKML-Reference: <200912172326.nBHNQaQl024796@imap1.linux-foundation.org> Signed-off-by: H. Peter Anvin Cc: Masami Hiramatsu commit 6c56ccecf05fafe100ab4ea94f6fccbf5ff00db7 Author: Pallipadi, Venkatesh Date: Thu Dec 17 12:27:02 2009 -0800 x86: Reenable TSC sync check at boot, even with NONSTOP_TSC Commit 83ce4009 did the following change If the TSC is constant and non-stop, also set it reliable. But, there seems to be few systems that will end up with TSC warp across sockets, depending on how the cpus come out of reset. Skipping TSC sync test on such systems may result in time inconsistency later. So, reenable TSC sync test even on constant and non-stop TSC systems. Set, sched_clock_stable to 1 by default and reset it in mark_tsc_unstable, if TSC sync fails. This change still gives perf benefit mentioned in 83ce4009 for systems where TSC is reliable. Signed-off-by: Venkatesh Pallipadi Acked-by: Suresh Siddha LKML-Reference: <20091217202702.GA18015@linux-os.sc.intel.com> Signed-off-by: H. Peter Anvin commit 4beb3d6d144c41525541cce2b611858b2645c725 Author: Roland Dreier Date: Wed Dec 16 17:39:48 2009 -0800 x86: Don't use POSIX character classes in gen-insn-attr-x86.awk Not all awk implementations (including the default awk in Ubuntu 9.10) support POSIX character classes. Since x86-opcode-map.txt is plain ASCII, we can just use explicit ranges for lower case, alphabetic, and alphanumeric characters instead. Signed-off-by: Roland Dreier Acked-by: Masami Hiramatsu LKML-Reference: Signed-off-by: H. Peter Anvin commit c051346b7db27aaf674b8f3b4955240580b2a58a Author: H. Peter Anvin Date: Thu Dec 17 06:56:11 2009 -0800 Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C There are a number of common Unix constructs like character ranges in grep/sed/awk which don't work as expected with LC_COLLATE set to other than C. Similarly, set LC_CTYPE and LC_NUMERIC to C to avoid other nasty surprises. In order to make sure these actually take effect we also have to clear LC_ALL. Signed-off-by: H. Peter Anvin Acked-by: Michal Marek Acked-by: Masami Hiramatsu Acked-by: Roland Dreier Cc: Sam Ravnborg LKML-Reference: <4B2A1761.4070904@suse.cz> commit 077614ee1e93245a3b9a4e1213659405dbeb0ba6 Author: Peter Zijlstra Date: Thu Dec 17 13:16:31 2009 +0100 sched: Fix broken assertion There's a preemption race in the set_task_cpu() debug check in that when we get preempted after setting task->state we'd still be on the rq proper, but fail the test. Check for preempted tasks, since those are always on the RQ. Signed-off-by: Peter Zijlstra LKML-Reference: <20091217121830.137155561@chello.nl> Signed-off-by: Ingo Molnar commit e1781538cf5c870ab696e9b8f0a5c498d3900f2f Author: Peter Zijlstra Date: Thu Dec 17 13:16:30 2009 +0100 sched: Assert task state bits at build time Since everybody is lazy and prone to forgetting things, make the compiler help us a bit. Signed-off-by: Peter Zijlstra LKML-Reference: <20091217121830.060186433@chello.nl> Signed-off-by: Ingo Molnar commit 464763cf1c6df632dccc8f2f4c7e50163154a2c0 Author: Peter Zijlstra Date: Thu Dec 17 13:16:29 2009 +0100 sched: Update task_state_arraypwith new states Neglected because its hidden... (who reads comments anyway) Signed-off-by: Peter Zijlstra LKML-Reference: <20091217121829.970166036@chello.nl> Signed-off-by: Ingo Molnar commit 44d90df6b757c59651ddd55f1a84f28132b50d29 Author: Peter Zijlstra Date: Thu Dec 17 13:16:28 2009 +0100 sched: Add missing state chars to TASK_STATE_TO_CHAR_STR We grew 3 new task states since the last time someone touched it. Signed-off-by: Peter Zijlstra LKML-Reference: <20091217121829.892737686@chello.nl> Signed-off-by: Ingo Molnar commit 733421516b42c44b9e21f1793c430cc801ef8324 Author: Peter Zijlstra Date: Thu Dec 17 13:16:27 2009 +0100 sched: Move TASK_STATE_TO_CHAR_STR near the TASK_state bits So that we don't keep forgetting about it. Signed-off-by: Peter Zijlstra LKML-Reference: <20091217121829.815779372@chello.nl> Signed-off-by: Ingo Molnar commit 5d27c23df09b702868d9a3bff86ec6abd22963ac Author: Peter Zijlstra Date: Thu Dec 17 13:16:32 2009 +0100 perf events: Dont report side-band events on each cpu for per-task-per-cpu events Acme noticed that his FORK/MMAP numbers were inflated by about the same factor as his cpu-count. This led to the discovery of a few more sites that need to respect the event->cpu filter. Reported-by: Arnaldo Carvalho de Melo Signed-off-by: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <20091217121830.215333434@chello.nl> Signed-off-by: Ingo Molnar commit 06d65bda75341485d32f33da474b0664819ad497 Author: Frederic Weisbecker Date: Thu Dec 17 05:40:34 2009 +0100 perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker It's just wasteful for stacktrace users like perf to walk through every entries on the stack whereas these only accept reliable ones, ie: that the frame pointer validates. Since perf requires pure reliable stacktraces, it needs a stack walker based on frame pointers-only to optimize the stacktrace processing. This might solve some near-lockup scenarios that can be triggered by call-graph tracing timer events. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras LKML-Reference: <1261024834-5336-2-git-send-regression-fweisbec@gmail.com> [ v2: fix for modular builds and small detail tidyup ] Signed-off-by: Ingo Molnar commit 61c1917f47f73c968e92d04d15370b1dc3ec4592 Author: Frederic Weisbecker Date: Thu Dec 17 05:40:33 2009 +0100 perf events, x86/stacktrace: Make stack walking optional The current print_context_stack helper that does the stack walking job is good for usual stacktraces as it walks through all the stack and reports even addresses that look unreliable, which is nice when we don't have frame pointers for example. But we have users like perf that only require reliable stacktraces, and those may want a more adapted stack walker, so lets make this function a callback in stacktrace_ops that users can tune for their needs. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras LKML-Reference: <1261024834-5336-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit 5b74ed4729ad2b2017453add68104a83206caefb Author: Robert P. J. Day Date: Wed Dec 16 10:09:45 2009 -0500 perf events: Remove unused perf_counter.h header file Since nothing includes the file and it's also not exported to user space, remove it. Signed-off-by: Robert P. J. Day Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Ingo Molnar commit 234da7bcdc7aaa935846534c3b726dbc79a9cdd5 Author: Frederic Weisbecker Date: Wed Dec 16 20:21:05 2009 +0100 sched: Teach might_sleep() about preemptible RCU In practice, it is harmless to voluntarily sleep in a rcu_read_lock() section if we are running under preempt rcu, but it is illegal if we build a kernel running non-preemptable rcu. Currently, might_sleep() doesn't notice sleepable operations under rcu_read_lock() sections if we are running under preemptable rcu because preempt_count() is left untouched after rcu_read_lock() in this case. But we want developers who test their changes under such config to notice the "sleeping while atomic" issues. So we add rcu_read_lock_nesting to prempt_count() in might_sleep() checks. [ v2: Handle rcu-tiny ] Signed-off-by: Frederic Weisbecker Reviewed-by: Paul E. McKenney Cc: Peter Zijlstra LKML-Reference: <1260991265-8451-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit b7702a2136b5f8e0e186e22cae91aaecf98b418c Author: Masami Hiramatsu Date: Wed Dec 16 17:24:15 2009 -0500 perf probe: Check new event name Check new event name is same syntax as a C symbol in perf command. In other words, checking the name is as like as other tracepoint events. This can prevent user to create an event with useless name (e.g. foo|bar, foo*bar). Signed-off-by: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Steven Rostedt Cc: Jim Keniston Cc: Ananth N Mavinakayanahalli Cc: Christoph Hellwig Cc: Jason Baron Cc: K.Prasad Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Frederic Weisbecker Cc: systemtap Cc: DLE LKML-Reference: <20091216222415.14459.71383.stgit@dhcp-100-2-132.bos.redhat.com> [ v2: minor cleanups ] Signed-off-by: Ingo Molnar commit 6f3cf440470650b3841d325acacd0c5ea9504c68 Author: Masami Hiramatsu Date: Wed Dec 16 17:24:08 2009 -0500 kprobe-tracer: Check new event/group name Check new event/group name is same syntax as a C symbol. In other words, checking the name is as like as other tracepoint events. This can prevent user to create an event with useless name (e.g. foo|bar, foo*bar). Signed-off-by: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Steven Rostedt Cc: Jim Keniston Cc: Ananth N Mavinakayanahalli Cc: Christoph Hellwig Cc: Jason Baron Cc: K.Prasad Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: systemtap Cc: DLE LKML-Reference: <20091216222408.14459.68790.stgit@dhcp-100-2-132.bos.redhat.com> [ v2: minor cleanups ] Signed-off-by: Ingo Molnar commit 96c96612e952f63cc0055db9df7d8b5b1ada02be Author: Masami Hiramatsu Date: Wed Dec 16 17:24:00 2009 -0500 perf probe: Check whether debugfs path is correct Check whether the debugfs path is correct before executing a command, because perf-probe depends on debugfs. Signed-off-by: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Steven Rostedt Cc: Jim Keniston Cc: Ananth N Mavinakayanahalli Cc: Christoph Hellwig Cc: Jason Baron Cc: K.Prasad Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: systemtap Cc: DLE LKML-Reference: <20091216222400.14459.48162.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: Ingo Molnar commit 27f3b24de03fc7cec6f2406f8525ad18086c2121 Author: Masami Hiramatsu Date: Wed Dec 16 17:16:19 2009 -0500 perf probe: Fix libdwarf include path for Debian Fix libdwarf include path to fit debian-like systems too. Borislav Petkov reported: > even after installing libdwarf-dev on my debian box here, > make in tools/perf/ still complains that it cannot find libdwarf: > > Makefile:491: No libdwarf.h found or old libdwarf.h found, disables dwarf > support. Please install libdwarf-dev/libdwarf-devel >= 20081231 > > The problem is that the include path on debian is not > /usr/include/libdwarf/ but simply /usr/include because the debian > package libdwarf-dev puts the headers straight into > /usr/include. This patch adds -I/usr/include/libdwarf to BASIC_CFLAGS and fix probe-finder.h to include just libdwarf.h/dwarf.h. This patch also adds a workaround for the undefined _MIPS_SZLONG bug in libdwarf.h. Reported-by: Borislav Petkov Signed-off-by: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Gabor Gombas Cc: systemtap Cc: DLE LKML-Reference: <20091216221618.13816.83296.stgit@dhcp-100-2-132.bos.redhat.com> [ v2: small stylistic fixlets to probe-finder.h ] Signed-off-by: Ingo Molnar commit 0535f2bc170bc0779ac471faff39f633ca19ab59 Author: Jeff Garzik Date: Thu Dec 17 01:23:16 2009 -0500 sata_mv: remove pointless NULL test Remove !ap test, where ap is guaranteed not-NULL. Found by way of automated bug report from Alexander Strakh via "Linux Device Drivers Verification Project (Svace Detector)" Signed-off-by: Jeff Garzik commit 256ace9bbd4cdb6d48d5f55d55d42fa20527fad1 Author: Sergei Shtylyov Date: Thu Dec 17 01:11:27 2009 -0500 pata_hpt3x2n: fix clock turnaround The clock turnaround code still doesn't work for several reasons: - 'USE_DPLL' flag in 'ap->host->private_data' is never initialized or updated, so the driver can only set the chip to the DPLL clock mode, not the PCI mode; - the driver doesn't serialize access to the channels depending on the current clock mode like the vendor drivers, so the clock turnaround is only executed "optionally", not always as it should be; - the wrong ports are written to when hpt3x2n_set_clock() is called for the secondary channel; - hpt3x2n_set_clock() can inadvertently enable the disabled channels when resetting the channel state machines. Signed-off-by: Sergei Shtylyov Cc: stable@kernel.org Signed-off-by: Jeff Garzik commit 9a8fd68b15e7b047678a651b7f7e2f3dcd19d20d Author: Robert Hancock Date: Tue Dec 8 20:48:10 2009 -0600 libata: fix reporting of drained bytes when clearing DRQ When we drain data from a device to clear DRQ during error recovery, the number of bytes reported as drained is too low by a factor of 2 because the count is actually reporting the number of words drained, not bytes. Fix this. Signed-off-by: Robert Hancock Signed-off-by: Jeff Garzik commit b2dec48ccaad004fc706352f82725d43369d9bd7 Author: Saeed Bishara Date: Sun Dec 6 18:26:22 2009 +0200 sata_mv: add power management support for the PCI controllers. Signed-off-by: Saeed Bishara Signed-off-by: Jeff Garzik commit 1bfeff03f8a52eb896e5aad33d52e2451437bb0b Author: Saeed Bishara Date: Thu Dec 17 01:05:00 2009 -0500 sata_mv: store the board_idx into the host private data This information will be used in the resume function. Signed-off-by: Saeed Bishara Signed-off-by: Jeff Garzik commit 4716eaf20f37d10fd01b0fcacb3e41c1abd362c3 Author: H Hartley Sweeten Date: Thu Dec 10 20:03:10 2009 -0500 pata_octeon_cf: use resource_size(), to fix resource sizing bug It appears the size for cs1 is calculated using the wrong resource. Use the function resource_size to get the correct value. Signed-off-by: H Hartley Sweeten Signed-off-by: Jeff Garzik commit 0cdd6eb7e08fc39e9c906cc46b6ee9095c3077a9 Author: Christoph Hellwig Date: Thu Dec 10 10:36:01 2009 +0100 libata: use the WRITE_SAME_16 define Now that the scsi tree has hit mainline we can use the newly added WRITE_SAME_16 define. Signed-off-by: Christoph Hellwig Signed-off-by: Jeff Garzik commit c4bc7d7310a40c8c0b917e88983dc4a8e6b59e38 Author: Saeed Bishara Date: Sun Dec 6 18:26:20 2009 +0200 sata_mv: move the PCI bar description initialization code The mv_init_host will be used to initialize the host hw on resume. The PCI bar description need to be initialized only once when the device probed. Signed-off-by: Saeed Bishara Signed-off-by: Jeff Garzik commit 6481f2b52cd5411ea6342b749daf0e4f3b390d7b Author: Saeed Bishara Date: Sun Dec 6 18:26:19 2009 +0200 sata_mv: add power management support for the platform driver Signed-off-by: Saeed Bishara Signed-off-by: Jeff Garzik commit c77a2f4e6b76c9094182dfa653ece4243f6df80c Author: Saeed Bishara Date: Sun Dec 6 18:26:18 2009 +0200 sata_mv: support clkdev framework Signed-off-by: Saeed Bishara Signed-off-by: Jeff Garzik commit d7b0c143693bcbf391d2be235e150b97bfd8f9ba Author: Saeed Bishara Date: Sun Dec 6 18:26:17 2009 +0200 sata_mv: increase PIO IORDY timeout The old value (0xbc) in cycles of the IORDY timeout is suitable for devices with core clock of 166 MHz, but some SoC controllers have faster core clocks. The new value will make the IORDY timeout large enough also for all SoC devices. Signed-off-by: Saeed Bishara Signed-off-by: Jeff Garzik commit 416eb39556a03d1c7e52b0791e9052ccd71db241 Author: Ingo Molnar Date: Thu Dec 17 06:05:49 2009 +0100 sched: Make warning less noisy Cc: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.807938893@chello.nl> Signed-off-by: Ingo Molnar commit 6a1e008a0915f502eb026fb995ea3e49d5b017f7 Author: Yinghai Lu Date: Tue Dec 15 17:59:03 2009 -0800 x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA Due to recent changes wakeup and mptable, we run out of early reservations on 32-bit NUMA. Thus, adjust the available number. Signed-off-by: Yinghai Lu LKML-Reference: <4B22D754.2020706@kernel.org> Signed-off-by: H. Peter Anvin commit 329962503692b42d8088f31584e42d52db179d52 Author: Yinghai Lu Date: Tue Dec 15 17:59:02 2009 -0800 x86: Fix checking of SRAT when node 0 ram is not from 0 Found one system that boot from socket1 instead of socket0, SRAT get rejected... [ 0.000000] SRAT: Node 1 PXM 0 0-a0000 [ 0.000000] SRAT: Node 1 PXM 0 100000-80000000 [ 0.000000] SRAT: Node 1 PXM 0 100000000-2080000000 [ 0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000 [ 0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000 [ 0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000 [ 0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000 [ 0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000 [ 0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000 [ 0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000 ... [ 0.000000] NUMA: Allocated memnodemap from 500000 - 701040 [ 0.000000] NUMA: Using 20 for the hash shift. [ 0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used [ 0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used [ 0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used [ 0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used [ 0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used [ 0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used [ 0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used [ 0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used [ 0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used [ 0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used [ 0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used. [ 0.000000] SRAT: SRAT not used. the early_node_map is not sorted because node0 with non zero start come first. so try to sort it right away after all regions are registered. also fixs refression by 8716273c (x86: Export srat physical topology) -v2: make it more solid to handle cross node case like node0 [0,4g), [8,12g) and node1 [4g, 8g), [12g, 16g) -v3: update comments. Reported-and-tested-by: Jens Axboe Signed-off-by: Yinghai Lu LKML-Reference: <4B2579D2.3010201@kernel.org> Signed-off-by: H. Peter Anvin commit 45a94d7cd45ed991914011919e7d40eb6d2546d1 Author: Suresh Siddha Date: Wed Dec 16 16:25:42 2009 -0800 x86, cpuid: Add "volatile" to asm in native_cpuid() xsave_cntxt_init() does something like: cpuid(0xd, ..); // find out what features FP/SSE/.. etc are supported xsetbv(); // enable the features known to OS cpuid(0xd, ..); // find out the size of the context for features enabled Depending on what features get enabled in xsetbv(), value of the cpuid.eax=0xd.ecx=0.ebx changes correspondingly (representing the size of the context that is enabled). As we don't have volatile keyword for native_cpuid(), gcc 4.1.2 optimizes away the second cpuid and the kernel continues to use the cpuid information obtained before xsetbv(), ultimately leading to kernel crash on processors supporting more state than the legacy FP/SSE. Add "volatile" for native_cpuid(). Signed-off-by: Suresh Siddha LKML-Reference: <1261009542.2745.55.camel@sbs-t61.sc.intel.com> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin commit cf1e367ee84e02ac349ad0858eb65e8a6a511c8b Author: Simon Horman Date: Thu Dec 17 11:15:42 2009 +1100 timers: Remove duplicate setting of new_base in __mod_timer() new_base is set using per_cpu(tvec_bases, cpu) after selecting the desired value of cpu immediately below so this line is a unnecessary. Signed-off-by: Simon Horman LKML-Reference: <20091217001542.GD25317@verge.net.au> Signed-off-by: Thomas Gleixner commit 6ede31e03084ee084bcee073ef3d1136f68d0906 Author: Borislav Petkov Date: Thu Dec 17 00:16:25 2009 +0100 x86, msr: msrs_alloc/free for CONFIG_SMP=n Randy Dunlap reported the following build error: "When CONFIG_SMP=n, CONFIG_X86_MSR=m: ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined! ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!" This is due to the fact that is conditioned on CONFIG_SMP and in the UP case we have only the stubs in the header. Fork off SMP functionality into a new file (msr-smp.c) and build msrs_{alloc,free} unconditionally. Reported-by: Randy Dunlap Cc: H. Peter Anvin Signed-off-by: Borislav Petkov LKML-Reference: <20091216231625.GD27228@liondog.tnic> Signed-off-by: H. Peter Anvin commit 9d260ebc09a0ad6b5c73e17676df42c7bc75ff64 Author: Andreas Herrmann Date: Wed Dec 16 15:43:55 2009 +0100 x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space Use NodeId MSR to get NodeId and number of nodes per processor. Signed-off-by: Andreas Herrmann LKML-Reference: <20091216144355.GB28798@alberich.amd.com> Signed-off-by: H. Peter Anvin commit 738d2be4301007f054541c5c4bf7fb6a361c9b3a Author: Peter Zijlstra Date: Wed Dec 16 18:04:42 2009 +0100 sched: Simplify set_task_cpu() Rearrange code a bit now that its a simpler function. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170518.269101883@chello.nl> Signed-off-by: Ingo Molnar commit 88ec22d3edb72b261f8628226cd543589a6d5e1b Author: Peter Zijlstra Date: Wed Dec 16 18:04:41 2009 +0100 sched: Remove the cfs_rq dependency from set_task_cpu() In order to remove the cfs_rq dependency from set_task_cpu() we need to ensure the task is cfs_rq invariant for all callsites. The simple approach is to substract cfs_rq->min_vruntime from se->vruntime on dequeue, and add cfs_rq->min_vruntime on enqueue. However, this has the downside of breaking FAIR_SLEEPERS since we loose the old vruntime as we only maintain the relative position. To solve this, we observe that we only migrate runnable tasks, we do this using deactivate_task(.sleep=0) and activate_task(.wakeup=0), therefore we can restrain the min_vruntime invariance to that state. The only other case is wakeup balancing, since we want to maintain the old vruntime we cannot make it relative on dequeue, but since we don't migrate inactive tasks, we can do so right before we activate it again. This is where we need the new pre-wakeup hook, we need to call this while still holding the old rq->lock. We could fold it into ->select_task_rq(), but since that has multiple callsites and would obfuscate the locking requirements, that seems like a fudge. This leaves the fork() case, simply make sure that ->task_fork() leaves the ->vruntime in a relative state. This covers all cases where set_task_cpu() gets called, and ensures it sees a relative vruntime. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170518.191697025@chello.nl> Signed-off-by: Ingo Molnar commit efbbd05a595343a413964ad85a2ad359b7b7efbd Author: Peter Zijlstra Date: Wed Dec 16 18:04:40 2009 +0100 sched: Add pre and post wakeup hooks As will be apparent in the next patch, we need a pre wakeup hook for sched_fair task migration, hence rename the post wakeup hook and one pre wakeup. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170518.114746117@chello.nl> Signed-off-by: Ingo Molnar commit 881232b70b195768a71cd74ff4b4e8ab9502997b Author: Peter Zijlstra Date: Wed Dec 16 18:04:39 2009 +0100 sched: Move kthread_bind() back to kthread.c Since kthread_bind() lost its dependencies on sched.c, move it back where it came from. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170518.039524041@chello.nl> Signed-off-by: Ingo Molnar commit 5da9a0fb673a0ea0a093862f95f6b89b3390c31e Author: Peter Zijlstra Date: Wed Dec 16 18:04:38 2009 +0100 sched: Fix select_task_rq() vs hotplug issues Since select_task_rq() is now responsible for guaranteeing ->cpus_allowed and cpu_active_mask, we need to verify this. select_task_rq_rt() can blindly return smp_processor_id()/task_cpu() without checking the valid masks, select_task_rq_fair() can do the same in the rare case that all SD_flags are disabled. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.961475466@chello.nl> Signed-off-by: Ingo Molnar commit 3802290628348674985d14914f9bfee7b9084548 Author: Peter Zijlstra Date: Wed Dec 16 18:04:37 2009 +0100 sched: Fix sched_exec() balancing Since we access ->cpus_allowed without holding rq->lock we need a retry loop to validate the result, this comes for near free when we merge sched_migrate_task() into sched_exec() since that already does the needed check. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.884743662@chello.nl> Signed-off-by: Ingo Molnar commit e2912009fb7b715728311b0d8fe327a1432b3f79 Author: Peter Zijlstra Date: Wed Dec 16 18:04:36 2009 +0100 sched: Ensure set_task_cpu() is never called on blocked tasks In order to clean up the set_task_cpu() rq dependencies we need to ensure it is never called on blocked tasks because such usage does not pair with consistent rq->lock usage. This puts the migration burden on ttwu(). Furthermore we need to close a race against changing ->cpus_allowed, since select_task_rq() runs with only preemption disabled. For sched_fork() this is safe because the child isn't in the tasklist yet, for wakeup we fix this by synchronizing set_cpus_allowed_ptr() against TASK_WAKING, which leaves sched_exec to be a problem This also closes a hole in (6ad4c1888 sched: Fix balance vs hotplug race) where ->select_task_rq() doesn't validate the result against the sched_domain/root_domain. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.807938893@chello.nl> Signed-off-by: Ingo Molnar commit 06b83b5fbea273672822b6ee93e16781046553ec Author: Peter Zijlstra Date: Wed Dec 16 18:04:35 2009 +0100 sched: Use TASK_WAKING for fork wakups For later convenience use TASK_WAKING for fresh tasks. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.732561278@chello.nl> Signed-off-by: Ingo Molnar commit e4f4288842ee12747e10c354d72be7d424c0b627 Author: Peter Zijlstra Date: Wed Dec 16 18:04:34 2009 +0100 sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE We should skip !SD_LOAD_BALANCE domains. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.653578430@chello.nl> CC: stable@kernel.org Signed-off-by: Ingo Molnar commit e6c8fba7771563b2f3dfb96a78f36ec17e15bdf0 Author: Peter Zijlstra Date: Wed Dec 16 18:04:33 2009 +0100 sched: Fix task_hot() test order Make sure not to access sched_fair fields before verifying it is indeed a sched_fair task. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith CC: stable@kernel.org LKML-Reference: <20091216170517.577998058@chello.nl> Signed-off-by: Ingo Molnar commit 9ee349ad6d326df3633d43f54202427295999c47 Author: Xiaotian Feng Date: Wed Dec 16 18:04:32 2009 +0100 sched: Fix set_cpu_active() in cpu_down() Sachin found cpu hotplug test failures on powerpc, which made the kernel hang on his POWER box. The problem is that we fail to re-activate a cpu when a hot-unplug fails. Fix this by moving the de-activation into _cpu_down after doing the initial checks. Remove the synchronize_sched() calls and rely on those implied by rebuilding the sched domains using the new mask. Reported-by: Sachin Sant Signed-off-by: Xiaotian Feng Tested-by: Sachin Sant Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.500272612@chello.nl> Signed-off-by: Ingo Molnar commit 933b0618d8b2a59c7a0742e43836544e02f1e9bd Author: Peter Zijlstra Date: Wed Dec 16 18:04:31 2009 +0100 sched: Mark boot-cpu active before smp_init() A UP machine has 1 active cpu, not having the boot-cpu in the active map when starting the scheduler confuses things. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.423469527@chello.nl> Signed-off-by: Ingo Molnar commit ee1156c11a1121e118b0a7f2dec240f0d421b1fd Merge: b9f8fcd 8bea867 Author: Ingo Molnar Date: Wed Dec 16 18:33:49 2009 +0100 Merge branch 'linus' into sched/urgent Conflicts: kernel/sched_idletask.c Merge reason: resolve the conflicts, pick up latest changes. Signed-off-by: Ingo Molnar commit 5df974009fe513c664303de24725ea0f8b47f12e Author: Sheng Yang Date: Wed Dec 16 13:48:04 2009 +0800 x86: Add IA32_TSC_AUX MSR and use it Clean up write_tsc() and write_tscp_aux() by replacing hardcoded values. No change in functionality. Signed-off-by: Sheng Yang Cc: Avi Kivity Cc: Marcelo Tosatti LKML-Reference: <1260942485-19156-4-git-send-email-sheng@linux.intel.com> Signed-off-by: Ingo Molnar commit 0b962d473af32ec334df271b54ff4973cb2b4c73 Author: H. Peter Anvin Date: Tue Dec 15 15:13:07 2009 -0800 x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers register_chrdev() hardcodes registering 256 minors, presumably to avoid breaking old drivers. However, we need to register enough minors so that we have all possible CPUs. checkpatch warns on this patch, but the patch is correct: NR_CPUS here is a static *upper bound* on the *maximum CPU index* (not *number of CPUs!*) and that is what we want. Reported-and-tested-by: Russ Anderson Cc: Tejun Heo Cc: Alan Cox Cc: Takashi Iwai Cc: Alexander Viro Signed-off-by: H. Peter Anvin LKML-Reference: commit 54291362d2a5738e1b0495df2abcb9e6b0563a3f Author: Phillip Lougher Date: Mon Dec 14 21:45:19 2009 +0000 initramfs: add missing decompressor error check The decompressors return error by calling a supplied error function, and/or by returning an error return value. The initramfs code, however, fails to check the exit code returned by the decompressor, and only checks the error status set by calling the error function. This patch adds a return code check and calls the error function. Signed-off-by: Phillip Lougher LKML-Reference: <4b26b1ef.0+ZWxT6886olqcSc%phillip@lougher.demon.co.uk> Signed-off-by: H. Peter Anvin commit d4529862cae4de19fda8d4bbcbddc60f3e48a4cf Author: Phillip Lougher Date: Mon Dec 14 21:45:19 2009 +0000 bzip2: Add missing checks for malloc returning NULL Signed-off-by: Phillip Lougher LKML-Reference: <4b26b1ef.ln20bM9Mn4gzB21L%phillip@lougher.demon.co.uk> Signed-off-by: H. Peter Anvin commit c1e7c3ae59b065bf7ff24a05cb609b2f9e314db6 Author: Phillip Lougher Date: Mon Dec 14 21:45:19 2009 +0000 bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure The trivial malloc implementation used in the pre-boot environment by the decompressors returns a bad pointer on failure (falling through after calling error). This is doubly wrong - the callers expect malloc to return NULL on failure, second the error function is intended to be used by the decompressors to propagate errors to *their* callers. The decompressors have no access to any state set by the error function. Signed-off-by: Phillip Lougher LKML-Reference: <4b26b1ef.hIInb2AYPMtImAJO%phillip@lougher.demon.co.uk> Signed-off-by: H. Peter Anvin commit b9f8fcd55bbdb037e5332dbdb7b494f0b70861ac Author: David Miller Date: Sun Dec 13 18:25:02 2009 -0800 sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK Relax stable-sched-clock architectures to not save/disable/restore hardirqs in cpu_clock(). The background is that I was trying to resolve a sparc64 perf issue when I discovered this problem. On sparc64 I implement pseudo NMIs by simply running the kernel at IRQ level 14 when local_irq_disable() is called, this allows performance counter events to still come in at IRQ level 15. This doesn't work if any code in an NMI handler does local_irq_save() or local_irq_disable() since the "disable" will kick us back to cpu IRQ level 14 thus letting NMIs back in and we recurse. The only path which that does that in the perf event IRQ handling path is the code supporting frequency based events. It uses cpu_clock(). cpu_clock() simply invokes sched_clock() with IRQs disabled. And that's a fundamental bug all on it's own, particularly for the HAVE_UNSTABLE_SCHED_CLOCK case. NMIs can thus get into the sched_clock() code interrupting the local IRQ disable code sections of it. Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ disabling done by cpu_clock() is just pure overhead and completely unnecessary. So the core problem is that sched_clock() is not NMI safe, but we are invoking it from NMI contexts in the perf events code (via cpu_clock()). A less important issue is the overhead of IRQ disabling when it isn't necessary in cpu_clock(). CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not affected by this patch. Signed-off-by: David S. Miller Acked-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091213.182502.215092085.davem@davemloft.net> Signed-off-by: Ingo Molnar commit 1a551ae715825bb2a2107a2dd68de024a1fa4e32 Author: Thomas Gleixner Date: Wed Dec 9 10:15:11 2009 +0000 sched: Use rcu in sched_get_rr_param() read_lock(&tasklist_lock) does not protect sys_sched_get_rr_param() against a concurrent update of the policy or scheduler parameters as do_sched_scheduler() does not take the tasklist_lock. The access to task->sched_class->get_rr_interval is protected by task_rq_lock(task). Use rcu_read_lock() to protect find_task_by_vpid() and prevent the task struct from going away. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20091209100706.862897167@linutronix.de> Signed-off-by: Ingo Molnar commit 23f5d142519621b16cf2b378cf8adf4dcf01a616 Author: Thomas Gleixner Date: Wed Dec 9 10:15:01 2009 +0000 sched: Use rcu in sched_get/set_affinity() tasklist_lock is held read locked to protect the find_task_by_vpid() call and to prevent the task going away. sched_setaffinity acquires a task struct ref and drops tasklist lock right away. The access to the cpus_allowed mask is protected by rq->lock. rcu_read_lock() provides the same protection here. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20091209100706.789059966@linutronix.de> Signed-off-by: Ingo Molnar commit 5fe85be081edf0ac92d83f9c39e0ab5c1371eb82 Author: Thomas Gleixner Date: Wed Dec 9 10:14:58 2009 +0000 sched: Use rcu in sys_sched_getscheduler/sys_sched_getparam() read_lock(&tasklist_lock) does not protect sys_sched_getscheduler and sys_sched_getparam() against a concurrent update of the policy or scheduler parameters as do_sched_setscheduler() does not take the tasklist_lock. The accessed integers can be retrieved w/o locking and are snapshots anyway. Using rcu_read_lock() to protect find_task_by_vpid() and prevent the task struct from going away is not changing the above situation. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20091209100706.753790977@linutronix.de> Signed-off-by: Ingo Molnar commit 663997d417330a59a566452f52cfa04c8ffd190b Author: Joe Perches Date: Sat Dec 12 13:57:27 2009 -0800 sched: Use pr_fmt() and pr_() - Convert printk(KERN_ to pr_ (not KERN_DEBUG) - Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Coalesce long format strings - Add missing \n to "ERROR: !SD_LOAD_BALANCE domain has parent" Signed-off-by: Joe Perches Cc: Peter Zijlstra LKML-Reference: <1260655047.2637.7.camel@Joe-Laptop.home> Signed-off-by: Ingo Molnar commit 7539a3b3d1f892dd97eaf094134d7de55c13befe Author: Rafael J. Wysocki Date: Sun Dec 13 00:07:30 2009 +0100 sched: Make wakeup side and atomic variants of completion API irq safe Alan Stern noticed that all the wakeup side (and atomic) variants of the completion APIs should be irq safe, but the newly introduced completion_done() and try_wait_for_completion() aren't. The use of the irq unsafe variants in IRQ contexts can cause crashes/hangs. Fix the problem by making them use spin_lock_irqsave() and spin_lock_irqrestore(). Reported-by: Alan Stern Signed-off-by: Rafael J. Wysocki Cc: Linus Torvalds Cc: Zhang Rui Cc: pm list Cc: Peter Zijlstra Cc: David Chinner Cc: Lachlan McIlroy LKML-Reference: <200912130007.30541.rjw@sisk.pl> Signed-off-by: Ingo Molnar commit bb6eddf7676e1c1f3e637aa93c5224488d99036f Author: Thomas Gleixner Date: Thu Dec 10 15:35:10 2009 +0100 clockevents: Prevent clockevent_devices list corruption on cpu hotplug Xiaotian Feng triggered a list corruption in the clock events list on CPU hotplug and debugged the root cause. If a CPU registers more than one per cpu clock event device, then only the active clock event device is removed on CPU_DEAD. The unused devices are kept in the clock events device list. On CPU up the clock event devices are registered again, which means that we list_add an already enqueued list_head. That results in list corruption. Resolve this by removing all devices which are associated to the dead CPU on CPU_DEAD. Reported-by: Xiaotian Feng Signed-off-by: Thomas Gleixner Tested-by: Xiaotian Feng Cc: stable@kernel.org commit d4581a239a40319205762b76c01eb6363f277efa Author: Thomas Gleixner Date: Thu Dec 10 00:52:51 2009 +0000 sys: Fix missing rcu protection for __task_cred() access commit c69e8d9 (CRED: Use RCU to access another task's creds and to release a task's own creds) added non rcu_read_lock() protected access to task creds of the target task in set_prio_one(). The comment above the function says: * - the caller must hold the RCU read lock The calling code in sys_setpriority does read_lock(&tasklist_lock) but not rcu_read_lock(). This works only when CONFIG_TREE_PREEMPT_RCU=n. With CONFIG_TREE_PREEMPT_RCU=y the rcu_callbacks can run in the tick interrupt when they see no read side critical section. There is another instance of __task_cred() in sys_setpriority() itself which is equally unprotected. Wrap the whole code section into a rcu read side critical section to fix this quick and dirty. Will be revisited in course of the read_lock(&tasklist_lock) -> rcu crusade. Oleg noted further: This also fixes another bug here. find_task_by_vpid() is not safe without rcu_read_lock(). I do not mean it is not safe to use the result, just find_pid_ns() by itself is not safe. Usually tasklist gives enough protection, but if copy_process() fails it calls free_pid() lockless and does call_rcu(delayed_put_pid(). This means, without rcu lock find_pid_ns() can't scan the hash table safely. Signed-off-by: Thomas Gleixner LKML-Reference: <20091210004703.029784964@linutronix.de> Acked-by: Paul E. McKenney commit 7cf7db8df0b78076eafa4ead47559344ca7b7a43 Author: Thomas Gleixner Date: Thu Dec 10 00:53:21 2009 +0000 signals: Fix more rcu assumptions 1) Remove the misleading comment in __sigqueue_alloc() which claims that holding a spinlock is equivalent to rcu_read_lock(). 2) Add a rcu_read_lock/unlock around the __task_cred() access in __sigqueue_alloc() This needs to be revisited to remove the remaining users of read_lock(&tasklist_lock) but that's outside the scope of this patch. Signed-off-by: Thomas Gleixner LKML-Reference: <20091210004703.269843657@linutronix.de> commit 14d8c9f3c09e7fd7b9af80904289fe204f5b93c6 Author: Thomas Gleixner Date: Thu Dec 10 00:53:17 2009 +0000 signal: Fix racy access to __task_cred in kill_pid_info_as_uid() kill_pid_info_as_uid() accesses __task_cred() without being in a RCU read side critical section. tasklist_lock is not protecting that when CONFIG_TREE_PREEMPT_RCU=y. Convert the whole tasklist_lock section to rcu and use lock_task_sighand to prevent the exit race. Signed-off-by: Thomas Gleixner LKML-Reference: <20091210004703.232302055@linutronix.de> Acked-by: Oleg Nesterov