commit cb600d2f83c854ec3d6660063e4466431999489b Merge: 47935a7 d50d8fe Author: Linus Torvalds Date: Thu Jan 6 11:12:17 2011 -0800 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mm: Initialize initial_page_table before paravirt jumps commit 47935a731b7b850a4c6c0e55ed0741e3dd25d889 Merge: 77a0dd5 3fb82d5 fd35fbc 9e76a97 c8217b8 3cf9b85 f6cd247 Author: Linus Torvalds Date: Thu Jan 6 11:11:50 2011 -0800 Merge branches 'x86-alternatives-for-linus', 'x86-fpu-for-linus', 'x86-hwmon-for-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, asm: Use fxsaveq/fxrestorq in more places * 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hwmon: Add core threshold notification to therm_throt.c * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Use native_halt on a halt, not native_safe_halt * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: locking, lockdep: Convert sprintf_symbol to %pS * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: Better struct irqaction layout commit 77a0dd54ba3c86b00ab7079bc3be5d82395ecab2 Merge: d7a5a18 cfa6091 Author: Linus Torvalds Date: Thu Jan 6 11:09:57 2011 -0800 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: 
x86, UV, BAU: Extend for more than 16 cpus per socket x86, UV: Fix the effect of extra bits in the hub nodeid register x86, UV: Add common uv_early_read_mmr() function for reading MMRs commit d7a5a18190d6f523b5d795bfd73f83cf13a3a383 Merge: 4f00b90 a8760ec Author: Linus Torvalds Date: Thu Jan 6 11:08:14 2011 -0800 Merge branch 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Check tsc available/disabled in the delayed init function x86: Improve TSC calibration using a delayed workqueue x86: Make tsc=reliable override boot time stability checks commit 4f00b901d4233a78e6ca4d44c8c6fc5d38a3ee9e Merge: b4c6e2e 94462ad Author: Linus Torvalds Date: Thu Jan 6 11:07:33 2011 -0800 Merge branch 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: module: Move RO/NX module protection to after ftrace module update x86: Resume trampoline must be executable x86: Add RO/NX protection for loadable kernel modules x86: Add NX protection for kernel data x86: Fix improper large page preservation commit b4c6e2ea5e46b03c764a918f4999a77a3149979f Merge: 6f46b12 991cfff Author: Linus Torvalds Date: Thu Jan 6 11:06:31 2011 -0800 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, earlyprintk: Move mrst early console to platform/ and fix a typo x86, apbt: Setup affinity for apb timers acting as per-cpu timer ce4100: Add errata fixes for UART on CE4100 x86: platform: Move iris to x86/platform where it belongs x86, mrst: Check platform_device_register() return code x86/platform: Add Eurobraille/Iris power off support x86, mrst: Add explanation for using 1960 as the year offset for 
vrtc x86, mrst: Fix dependencies of "select INTEL_SCU_IPC" x86, mrst: The shutdown for MRST requires the SCU IPC mechanism x86: Ce4100: Add reboot_fixup() for CE4100 ce4100: Add PCI register emulation for CE4100 x86: Add CE4100 platform support x86: mrst: Set vRTC's IRQ to level trigger type x86: mrst: Add audio driver bindings rtc: Add drivers/rtc/rtc-mrst.c x86: mrst: Add vrtc driver which serves as a wall clock device x86: mrst: Add Moorestown specific reboot/shutdown support x86: mrst: Parse SFI timer table for all timer configs x86/mrst: Add SFI platform device parsing code commit 6f46b120a96212b85cbdcb84a64c854dfd791ede Merge: 4e1db5e c7657ac Author: Linus Torvalds Date: Thu Jan 6 11:06:09 2011 -0800 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, microcode, AMD: Cleanup code a bit x86, microcode, AMD: Replace vmalloc+memset with vzalloc commit 4e1db5e58af8bc6ab4a651df279add41c48d3fc2 Merge: 37d9a8c eb48c9c Author: Linus Torvalds Date: Thu Jan 6 11:05:21 2011 -0800 Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: apic, amd: Make firmware bug messages more meaningful mce, amd: Remove goto in threshold_create_device() mce, amd: Add helper functions to setup APIC mce, amd: Shorten local variables mci_misc_{hi,lo} mce, amd: Implement mce_threshold_block_init() helper function commit 37d9a8c5ea8fc063841c133fc53cc168ee620762 Merge: 017892c 79250af Author: Linus Torvalds Date: Thu Jan 6 10:56:02 2011 -0800 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix included-by file reference comments x86, cpu: Only CPU features 
determine NX capabilities x86, cpu: Call verify_cpu during 32bit CPU startup x86, cpu: Clear XD_DISABLED flag on Intel to regain NX x86, cpu: Rename verify_cpu_64.S to verify_cpu.S commit 017892c341033b3e961e695bc0bf1a815efcf92e Merge: 42cbd8e cb2ded3 Author: Linus Torvalds Date: Thu Jan 6 10:51:36 2011 -0800 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation x86, acpi: Add MAX_LOCAL_APIC for 32bit x86: io_apic: Split setup_ioapic_ids_from_mpc() x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings() x86: Allow platforms to force enable apic commit 42cbd8efb0746b55112de45173219f76c54390da Merge: dda5f0a f658bcf Author: Linus Torvalds Date: Thu Jan 6 10:50:28 2011 -0800 Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cacheinfo: Cleanup L3 cache index disable support x86, amd-nb: Cleanup AMD northbridge caching code x86, amd-nb: Complete the rename of AMD NB and related code commit dda5f0a372873bca5f0b1d1866d7784dffd8b675 Merge: 65b2074 88606e8 Author: Linus Torvalds Date: Thu Jan 6 10:42:43 2011 -0800 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: MAINTAINERS: Update timer related entries timers: Use this_cpu_read timerqueue: Make timerqueue_getnext() static inline hrtimer: fix timerqueue conversion flub hrtimers: Convert hrtimers to use timerlist infrastructure timers: Fixup allmodconfig build issue timers: Rename timerlist 
infrastructure to timerqueue timers: Introduce timerlist infrastructure. hrtimer: Remove stale comment on curr_timer timer: Warn when del_timer_sync() is called in hardirq context timer: Del_timer_sync() can be used in softirq context timer: Make try_to_del_timer_sync() the same on SMP and UP posix-timers: Annotate lock_timer() timer: Permit statically-declared work with deferrable timers time: Use ARRAY_SIZE macro in timecompare.c timer: Initialize the field slack of timer_list timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS time: Compensate for rounding on odd-frequency clocksources Fix up trivial conflict in MAINTAINERS commit 65b2074f84be2287e020839e93b4cdaaf60eb37c Merge: 28d9bfc 6bf4123 Author: Linus Torvalds Date: Thu Jan 6 10:23:33 2011 -0800 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) sched: Change wait_for_completion_*_timeout() to return a signed long sched, autogroup: Fix reference leak sched, autogroup: Fix potential access to freed memory sched: Remove redundant CONFIG_CGROUP_SCHED ifdef sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight sched: Move periodic share updates to entity_tick() printk: Use this_cpu_{read|write} api on printk_pending sched: Make pushable_tasks CONFIG_SMP dependant sched: Add 'autogroup' scheduling feature: automated per session task groups sched: Fix unregister_fair_sched_group() sched: Remove unused argument dest_cpu to migrate_task() mutexes, sched: Introduce arch_mutex_cpu_relax() sched: Add some clock info to sched_debug cpu: Remove incorrect BUG_ON cpu: Remove unused variable sched: Fix UP build breakage sched: Make task dump print all 15 chars of proc comm sched: Update tg->shares after cpu.shares write sched: Allow update_cfs_load() to update global load sched: Implement demand based update_cfs_load() 
... commit 28d9bfc37c861aa9c8386dff1ac7e9a10e5c5162 Merge: f3b0cfa 4b95f13 Author: Linus Torvalds Date: Thu Jan 6 10:17:26 2011 -0800 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (146 commits) tools, perf: Documentation for the power events API perf: Add calls to suspend trace point perf script: Make some lists static perf script: Use the default lost event handler perf session: Warn about errors when processing pipe events too perf tools: Fix perf_event.h header usage perf test: Clarify some error reports in the open syscall test x86, NMI: Add touch_nmi_watchdog to io_check_error delay x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time x86: Only call smp_processor_id in non-preempt cases perf timechart: Adjust perf timechart to the new power events perf: Clean up power events by introducing new, more generic ones perf: Do not export power_frequency, but power_start event perf test: Add test for counting open syscalls perf evsel: Auto allocate resources needed for some methods perf evsel: Use {cpu,thread}_map to shorten list of parameters perf tools: Refactor all_tids to hold nr and the map perf tools: Refactor cpumap to hold nr and the map perf evsel: Introduce per cpu and per thread open helpers perf evsel: Steal the counter reading routines from stat ... 
commit f3b0cfa9b017a9d4686c9b14b908a1685f97a077 Merge: 2af49b6 5bdb05f Author: Linus Torvalds Date: Thu Jan 6 10:07:05 2011 -0800 Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Add futex_q static initializer futex: Replace fshared and clockrt with combined flags futex: Cleanup stale fshared flag interfaces commit 2af49b6058d857fa5b476db642d4452bf5833ecd Merge: b08b272 394f452 Author: Linus Torvalds Date: Thu Jan 6 10:06:26 2011 -0800 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: remove unused __list_for_each_rcu() macro rculist: fix borked __list_for_each_rcu() macro rcu: reduce __call_rcu()-induced contention on rcu_node structures rcu: limit rcu_node leaf-level fanout rcu: fine-tune grace-period begin/end checks rcu: Keep gpnum and completed fields synchronized rcu: Stop chasing QS if another CPU did it for us rcu: increase synchronize_sched_expedited() batching rcu: Make synchronize_srcu_expedited() fast if running readers rcu: fix race condition in synchronize_sched_expedited() rcu: update documentation/comments for Lai's adoption patch rcu,cleanup: simplify the code when cpu is dying rcu,cleanup: move synchronize_sched_expedited() out of sched.c rcu: get rid of obsolete "classic" names in TREE_RCU tracing rcu: Distinguish between boosting and boosted rcu: document TINY_RCU and TINY_PREEMPT_RCU tracing. 
rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU rcu: priority boosting for TINY_PREEMPT_RCU rcu: move TINY_RCU from softirq to kthread rcu: add priority-inversion testing to rcutorture commit b08b27213384d1bd6eda04a2b6f788b4cdee0f34 Merge: 8484baa 846f404 Author: Linus Torvalds Date: Thu Jan 6 10:01:23 2011 -0800 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Don't flush delete workqueue when releasing the transaction lock GFS2: fsck.gfs2 reported statfs error after gfs2_grow GFS2: Merge glock state fields into a bitfield GFS2: Fix uninitialised error value in previous patch GFS2: fix recursive locking during rindex truncates GFS2: reread rindex when necessary to grow rindex GFS2: Remove duplicate #defines from glock.h GFS2: Clean up of gdlm_lock function GFS2: Allow gfs2 to update quota usage values through the quotactl interface GFS2: fs/gfs2/glock.h: Add __attribute__((format(printf,2,3)) to gfs2_print_dbg GFS2: fs/gfs2/glock.c: Use printf extension %pV GFS2: Clean up duplicated setattr code GFS2: Remove unreachable calls to vmtruncate GFS2: fs/gfs2/glock.c: Convert sprintf_symbol to %pS GFS2: Change two WQ_RESCUERs into WQ_MEM_RECLAIM commit 8484baaa5065b460e5eb18ee721d8417251f7897 Author: Randy Dunlap Date: Wed Jan 5 16:28:43 2011 -0800 kernel-doc: code reorganization Move 'main' code vs. subroutines around so that they are not so intermixed, for better readability/understanding (relative to Perl). It was messy to follow the primary flow of code execution with the code being mixed. Now the code begins with data initialization, followed by all subroutines, then ends with the main code execution. This is almost totally source code movement, with a few changes as needed for forward declarations. 
Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds commit d5ba92b7958e3ff2f0878e45b9b42cb6976853dd Author: Nicolas Kaiser Date: Wed Jan 5 16:27:53 2011 -0800 Documentation: update kernel-docs.txt Fixed typos, and removed duplicated entries. Signed-off-by: Nicolas Kaiser Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds commit a40649781b9c10e192fad9f00a60c6d946da553f Author: Michael Prokop Date: Wed Jan 5 16:27:15 2011 -0800 Documentation/dontdiff: add further autogenerated files to ignore list Mainly resulting from (but not limited to) autogenerated files of lib/raid6 and drivers/gpu/drm/radeon. List generated as result of a diff of a clean 2.6.36 tree against a built one. Signed-off-by: Michael Prokop Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds commit 4b95f135f606c87e4056b6d7fd3c5781c818858b Author: Jean Pihet Date: Wed Jan 5 19:49:02 2011 +0100 tools, perf: Documentation for the power events API Provides documentation for the following: - the new power trace API, - the old (legacy) power trace API, - the DEPRECATED Kconfig option usage. Signed-off-by: Jean Pihet Cc: Arjan van de Ven Cc: trenn@suse.de Cc: Len Brown Cc: Pavel Machek Cc: Rafael J. Wysocki Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1294253342-29056-3-git-send-email-j-pihet@ti.com> Signed-off-by: Ingo Molnar commit 938cfed18bec2c7361f37efc954712a7cc42c353 Author: Jean Pihet Date: Wed Jan 5 19:49:01 2011 +0100 perf: Add calls to suspend trace point Uses the machine_suspend trace point, called from the generic kernel suspend_devices_and_enter function. Signed-off-by: Jean Pihet Acked-by: Rafael J. 
Wysocki Cc: Arjan van de Ven CC: Thomas Renninger Cc: Len Brown Cc: Pavel Machek Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1294253342-29056-2-git-send-email-j-pihet@ti.com> Signed-off-by: Ingo Molnar commit eccdfe2d245a882feacc4630c9bc29805e9929c8 Author: Arnaldo Carvalho de Melo Date: Tue Jan 4 16:32:52 2011 -0200 perf script: Make some lists static Not accessed outside builtin-script, so make them static. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 6d8afb56300c53a250c6de0f973ef502e54aabf3 Author: Arnaldo Carvalho de Melo Date: Tue Jan 4 16:27:30 2011 -0200 perf script: Use the default lost event handler That already does what was being done here. The warning is now unconditionally given by __perf_session__process_pipe_events, just like for non pipe processing. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 1109599458c06256064213dc44ca5f5fa8ee3833 Author: Arnaldo Carvalho de Melo Date: Tue Jan 4 16:25:15 2011 -0200 perf session: Warn about errors when processing pipe events too Just like we do at __perf_session__process_events Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d030260ad33b482a371f999c7e9db79ef7a2111f Author: Stephane Eranian Date: Tue Jan 4 16:30:01 2011 +0200 perf tools: Fix perf_event.h header usage This patch fixes the usage of the perf_event.h header file between command modules and the supporting code in util. It is necessary to ensure that ALL files use the SAME perf_event.h header from the kernel source tree. 
There were a couple of #include mixed with #include "../../perf_event.h". This caused issues on some distros because of a mismatch in the layout of struct perf_event_attr. That eventually led perf stat to segfault. Cc: David S. Miller Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Robert Richter Cc: Stephane Eranian LKML-Reference: <4d233cf0.2308e30a.7b00.ffffc187@mx.google.com> Signed-off-by: Stephane Eranian Signed-off-by: Arnaldo Carvalho de Melo commit 454a3bbe9b75eb8cbddffcf383fbb8e97ea78f52 Author: Arnaldo Carvalho de Melo Date: Tue Jan 4 10:40:08 2011 -0200 perf test: Clarify some error reports in the open syscall test Rebooted my devel machine, first thing I ran was perf test, that expects debugfs to be mounted, test fails. Be more clear about it. Also add missing newlines and add a more informative message when sys_perf_event_open fails. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 74d91e3c6a66359bb754fb5d8a5b54fb6ba2f9a6 Author: Huang Ying Date: Tue Jan 4 22:38:09 2011 -0500 x86, NMI: Add touch_nmi_watchdog to io_check_error delay Prevent the long delay in io_check_error from making the NMI watchdog time out. Signed-off-by: Huang Ying Signed-off-by: Don Zickus LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar commit 554ec063982752e9a569ab9189eeffa3d96731b2 Author: Dongdong Deng Date: Tue Jan 4 22:38:08 2011 -0500 x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time The spin_lock_debug/rcu_cpu_stall detector uses trigger_all_cpu_backtrace() to dump cpu backtrace. Therefore it is possible that trigger_all_cpu_backtrace() could be called at the same time on different CPUs, which triggers an 'unknown reason NMI' warning. The following case illustrates the problem:

    CPU1                      CPU2                      ...   CPU N
                       trigger_all_cpu_backtrace()
                       set "backtrace_mask" to cpu mask
                               |
    generate NMI interrupts   generate NMI interrupts   ...
        \                      |                             /
         \                     |                            /

The "backtrace_mask" will be cleared by the first NMI interrupt at nmi_watchdog_tick(), then the following NMI interrupts generated by the other CPUs' arch_trigger_all_cpu_backtrace() will be taken as unknown reason NMI interrupts. This patch uses a test_and_set to avoid the problem, and makes arch_trigger_all_cpu_backtrace() return early so that a duplicate cpu backtrace is not dumped when there is already a trigger_all_cpu_backtrace() in progress. Signed-off-by: Dongdong Deng Reviewed-by: Bruce Ashfield Cc: fweisbec@gmail.com LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar Signed-off-by: Don Zickus commit 9ab181fa9ff73a38fccd0a4f1c40a38dfe62b535 Author: Don Zickus Date: Tue Jan 4 22:38:07 2011 -0500 x86: Only call smp_processor_id in non-preempt cases There are some paths that walk the die_chain with preemption on. Make sure we are in an NMI call before we start doing anything. This was triggered by do_general_protection calling notify_die with DIE_GPF. Reported-by: Jan Kiszka Signed-off-by: Don Zickus LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar commit aef1b9cef78ae65c6501850851cc3f61f9be477b Merge: 20c457b 3c0eee3 Author: Ingo Molnar Date: Wed Jan 5 14:22:08 2011 +0100 Merge commit 'v2.6.37' into perf/core Merge reason: Add the final .37 tree. Signed-off-by: Ingo Molnar commit 6bf4123760a5aece6e4829ce90b70b6ffd751d65 Author: NeilBrown Date: Wed Jan 5 12:50:16 2011 +1100 sched: Change wait_for_completion_*_timeout() to return a signed long wait_for_completion_*_timeout() can return:
    0: if the wait timed out
    -ve: if the wait was interrupted
    +ve: if the completion was completed.
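The three outcomes listed above only stay distinguishable if the caller sees a signed value. The following is a minimal sketch of that point, not kernel code: it is written in Python, uses an explicit 64-bit mask as a stand-in for a C 'unsigned long' (an assumption, since the real width depends on the architecture), and every helper name is hypothetical. The value -512 is just an illustrative negative error code.

```python
ULONG_BITS = 64                          # assumption: 64-bit 'unsigned long'
ULONG_MASK = (1 << ULONG_BITS) - 1
LONG_MAX = (1 << (ULONG_BITS - 1)) - 1


def as_unsigned_long(value):
    """Reinterpret a result the way a C 'unsigned long' return type would."""
    return value & ULONG_MASK


def as_signed_long(value):
    """Reinterpret the same bit pattern as a C 'long'."""
    value &= ULONG_MASK
    return value - (1 << ULONG_BITS) if value > LONG_MAX else value


def classify(result):
    """Map a return value onto the three cases from the commit message."""
    if result == 0:
        return "timed out"
    if result < 0:
        return "interrupted"
    return "completed"


interrupted = -512  # illustrative negative error code

# Seen through an unsigned type, the negative case looks like a huge
# positive value, so it is misread as a successful completion.
print(classify(as_unsigned_long(interrupted)))                   # completed
# Seen through a signed type, the sign survives.
print(classify(as_signed_long(as_unsigned_long(interrupted))))   # interrupted
```

A caller that writes `if (ret < 0)` against an unsigned return type never takes that branch, which is the kind of bug a signed return type avoids.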
As they currently return an 'unsigned long', the last two cases are not easily distinguished, which can easily result in buggy code, as is the case for the recently added wait_for_completion_interruptible_timeout() call in net/sunrpc/cache.c. So change them both to return 'long'. As MAX_SCHEDULE_TIMEOUT is LONG_MAX, a large +ve return value should never overflow. Signed-off-by: NeilBrown Cc: Peter Zijlstra Cc: J. Bruce Fields Cc: Andrew Morton Cc: Linus Torvalds LKML-Reference: <20110105125016.64ccab0e@notabene.brown> Signed-off-by: Ingo Molnar commit 27066fd484a32c80630136aa2b91c980f3198f9d Merge: 101e5f7 3c0eee3 Author: Ingo Molnar Date: Wed Jan 5 14:14:42 2011 +0100 Merge commit 'v2.6.37' into sched/core Merge reason: Merge the final .37 tree. Signed-off-by: Ingo Molnar commit cb2ded37fd2e1039f96c8c892da024a8f033add5 Author: Yinghai Lu Date: Tue Jan 4 16:38:52 2011 -0800 x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion On one system with x2apic pre-enabled, x2apic_mode suddenly got corrupted after registering some cpus, when the kernel was compiled with CONFIG_NR_CPUS=255 instead of 512. It turns out that generic_processor_info() ==> physid_set(apicid, phys_cpu_present_map) causes the problem. phys_cpu_present_map is sized by MAX_APICS bits, and on the pre-enabled system some cpus have an apic id > 255. The variables after phys_cpu_present_map may get corrupted silently:

    ffffffff828e8420 B phys_cpu_present_map
    ffffffff828e8440 B apic_verbosity
    ffffffff828e8444 B local_apic_timer_c2_ok
    ffffffff828e8448 B disable_apic
    ffffffff828e844c B x2apic_mode
    ffffffff828e8450 B x2apic_disabled
    ffffffff828e8454 B num_processors
    ...

Actually phys_cpu_present_map is indexed by apic id, not by cpu index. We should use MAX_LOCAL_APIC instead of MAX_APICS. For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes on 64-bit:

        text     data      bss      dec filename
    21696943  4193748 12787712 38678403 vmlinux.before
    21696943  4193748 12791808 38682499 vmlinux.after

No change on 32bit.
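The numbers quoted in this commit message can be checked with a little arithmetic. The sketch below is only an illustration in Python, not kernel code: it copies the symbol addresses from the listing, assumes those symbols sit back to back exactly as shown, assumes little-endian bit numbering (bit n lives in byte n // 8), and uses a hypothetical helper name.

```python
# Symbol addresses copied from the listing in the commit message.
SYMBOLS = {
    "phys_cpu_present_map": 0xffffffff828e8420,
    "apic_verbosity": 0xffffffff828e8440,
    "local_apic_timer_c2_ok": 0xffffffff828e8444,
    "disable_apic": 0xffffffff828e8448,
    "x2apic_mode": 0xffffffff828e844c,
    "x2apic_disabled": 0xffffffff828e8450,
    "num_processors": 0xffffffff828e8454,
}


def symbol_hit_by_bit(bit):
    """Return the listed symbol whose storage contains bit number `bit`,
    counted from the start of phys_cpu_present_map. Bits beyond the last
    listed symbol are attributed to that last symbol."""
    addr = SYMBOLS["phys_cpu_present_map"] + bit // 8
    owner = None
    for name, start in sorted(SYMBOLS.items(), key=lambda item: item[1]):
        if start <= addr:
            owner = name
    return owner


map_bytes = SYMBOLS["apic_verbosity"] - SYMBOLS["phys_cpu_present_map"]
print(map_bytes * 8)             # 256 bits fit before the next symbol
print(symbol_hit_by_bit(255))    # phys_cpu_present_map (last in-bounds id)
print(symbol_hit_by_bit(256))    # apic_verbosity (first out-of-bounds id)
print(symbol_hit_by_bit(352))    # x2apic_mode
print(32768 // 8)                # 4096 bytes for a 32768-bit map
print(12791808 - 12787712)       # 4096, the bss delta in the size table
```

Under these assumptions an APIC ID between 352 and 383 sets a bit inside x2apic_mode, which is consistent with the corruption described above, and a 32768-bit map accounts for the 4k growth in bss.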
Finally we can remove MAX_APICS, which was rather confusing. Signed-off-by: Yinghai Lu Cc: H. Peter Anvin Cc: "Eric W. Biederman" LKML-Reference: <4D23BD9C.3070102@kernel.org> Signed-off-by: Ingo Molnar commit 101e5f77bf35679809586e250b6c62193d2ed179 Author: Mike Galbraith Date: Fri Dec 31 09:32:30 2010 +0100 sched, autogroup: Fix reference leak The cgroup exit mess also uncovered a struct autogroup reference leak. copy_process() was simply freeing rather than putting the signal_struct, stranding a reference. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra Cc: Oleg Nesterov LKML-Reference: <1293784350.6839.2.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 4f8219875a0dad2cfad9e93a3fafcd9626db98d2 Author: Mike Galbraith Date: Thu Dec 16 15:09:52 2010 +0100 sched, autogroup: Fix potential access to freed memory Oleg pointed out that the /proc interface kref_get() usage may race with the final put during autogroup_move_group(). A signal->autogroup assignment may be in flight when the /proc interfaces dereference it, leaving them taking a reference to an already dead group. Reported-by: Oleg Nesterov Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1292508592.5940.28.camel@maggy.simson.net> Signed-off-by: Ingo Molnar commit d50d8fe192428090790e7178e9507e981e0b005b Author: Rusty Russell Date: Tue Jan 4 17:20:54 2011 +1030 x86, mm: Initialize initial_page_table before paravirt jumps v2.6.36-rc8-54-gb40827f (x86-32, mm: Add an initial page table for core bootstrapping) made x86 boot using initial_page_table and broke lguest. For 2.6.37 we simply cut & pasted the initialization code into lguest (da32dac10126 "lguest: populate initial_page_table"); now we fix it properly by doing that initialization before the paravirt jump.
Signed-off-by: Rusty Russell Acked-by: Jeremy Fitzhardinge Cc: lguest Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra LKML-Reference: <201101041720.54535.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar commit bc030d6cb9532877c1c5a3f5e7123344fa24a285 Merge: d3bd058 387c31c Author: Ingo Molnar Date: Tue Jan 4 09:43:42 2011 +0100 Merge commit 'v2.6.37-rc8' into x86/apic Conflicts: arch/x86/include/asm/io_apic.h Merge reason: move to a fresh -rc, resolve the conflict. Signed-off-by: Ingo Molnar commit 6706125e291bd3dddd269e043323a6ab93ccd5fb Author: Yong Zhang Date: Fri Dec 31 21:58:58 2010 +0800 sched: Remove redundant CONFIG_CGROUP_SCHED ifdef CONFIG_[FAIR|RT]_GROUP_SCHED always means CONFIG_CGROUP_SCHED Signed-off-by: Yong Zhang Cc: Peter Zijlstra LKML-Reference: <1293803938-8157-1-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Ingo Molnar commit 20c457b8587bee4644d998331d9e13be82e05b4c Author: Thomas Renninger Date: Mon Jan 3 17:50:45 2011 +0100 perf timechart: Adjust perf timechart to the new power events builtin-timechart must only pass -e power:xy events if they are supported by the running kernel, otherwise try to fetch the old power:power{start,end} events. For this I added the tiny helper function: int is_valid_tracepoint(const char *event_string) to parse-events.[hc], which could be more generic as an interface and support hardware/software/... events, not only tracepoints, but someone else could extend that if needed... 
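The fallback described in this message can be sketched as follows. This is an illustration in Python, not the builtin-timechart code: the validity check is passed in as a callable standing in for is_valid_tracepoint(), the helper names are hypothetical, and the exact set of events timechart requests is an assumption (the event names themselves are taken from the power-event commits in this log).

```python
# Event names are taken from the power-event commits in this log. Which of
# them timechart really requests is an assumption made for illustration.
NEW_POWER_EVENTS = ["power:cpu_frequency", "power:cpu_idle"]
OLD_POWER_EVENTS = ["power:power_start", "power:power_end",
                    "power:power_frequency"]


def select_power_events(is_valid_tracepoint):
    """Return the new events if the running kernel knows all of them,
    otherwise fall back to the legacy ones."""
    if all(is_valid_tracepoint(name) for name in NEW_POWER_EVENTS):
        return list(NEW_POWER_EVENTS)
    return list(OLD_POWER_EVENTS)


def record_arguments(events):
    """Build the '-e <event>' pairs that would be handed to 'perf record'."""
    args = []
    for name in events:
        args += ["-e", name]
    return args


# Stand-in for a kernel that only exposes the legacy tracepoints.
legacy_kernel = set(OLD_POWER_EVENTS)
print(record_arguments(select_power_events(lambda name: name in legacy_kernel)))
```

Passing the check in as a callable keeps the selection logic testable without a mounted debugfs; how the real helper answers the question is outside the scope of this sketch.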
Signed-off-by: Thomas Renninger Acked-by: Arjan van de Ven Acked-by: Jean Pihet LKML-Reference: <1294073445-14812-4-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar commit 25e41933b58777f2d020c3b0186b430ea004ec28 Author: Thomas Renninger Date: Mon Jan 3 17:50:44 2011 +0100 perf: Clean up power events by introducing new, more generic ones Add these new power trace events:
    power:cpu_idle
    power:cpu_frequency
    power:machine_suspend
The old C-state/idle accounting events:
    power:power_start
    power:power_end
now have a replacement (but we are still keeping the old tracepoints for compatibility):
    power:cpu_idle
and
    power:power_frequency
is replaced with:
    power:cpu_frequency
power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. The type= field got removed from both; it was never used and the type is distinguished by the event type itself. The perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger Signed-off-by: Ingo Molnar Acked-by: Arjan van de Ven Acked-by: Jean Pihet Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Linus Torvalds Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de> commit 61a0d49c33c7fd57c14895e5b0760bd02b65ac1f Author: Thomas Renninger Date: Mon Jan 3 17:50:43 2011 +0100 perf: Do not export power_frequency, but power_start event power_frequency moved to drivers/cpufreq/cpufreq.c which has to be compiled in, no need to export it. intel_idle can be a module though...
Signed-off-by: Thomas Renninger Signed-off-by: Ingo Molnar Acked-by: Jean Pihet Cc: Jean Pihet Cc: Arjan van de Ven Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-2-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de> commit 928585536ff5a8f320e60efc60e2b7ef2a5f548d Merge: cc22219 d854861 Author: Ingo Molnar Date: Tue Jan 4 08:10:28 2011 +0100 Merge branch 'perf/test' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit cc2221969906a166a638aecdbae84a3d0462719e Merge: 56f4c40 387c31c Author: Ingo Molnar Date: Tue Jan 4 08:08:51 2011 +0100 Merge commit 'v2.6.37-rc8' into perf/core Merge reason: pick up latest -rc. Signed-off-by: Ingo Molnar commit d854861c4292a4e675a5d3bfd862c5f7421c81e8 Author: Arnaldo Carvalho de Melo Date: Tue Jan 4 00:16:20 2011 -0200 perf test: Add test for counting open syscalls To test the use of the perf_evsel class on something other than the tools from where we refactored code to create it. It calls open() N times and then checks if the event created to monitor it returns N events. [acme@felicio linux]$ perf test 1: vmlinux symtab matches kallsyms: Ok 2: detect open syscall event: Ok [acme@felicio linux]$ It does. Cc: Frederic Weisbecker Cc: Han Pingtian Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 4eed11d5e24540dc133003b6e8f904cb747ac4bb Author: Arnaldo Carvalho de Melo Date: Tue Jan 4 00:13:17 2011 -0200 perf evsel: Auto allocate resources needed for some methods While writing the first user of the routines created from the ad-hoc routines in the existing builtins I noticed that the resulting set of calls was too long, reduce it by doing some best effort allocations. 
Tools that need to operate on multiple threads and cpus should pre-allocate enough resources by explicitly calling the perf_evsel__alloc_{fd,counters} methods. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 86bd5e8603b00b06189328c6d7034d2dc434d6bb Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 23:09:46 2011 -0200 perf evsel: Use {cpu,thread}_map to shorten list of parameters Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 5c98d466e49267a9221f30958d45cd06f794269a Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 17:53:33 2011 -0200 perf tools: Refactor all_tids to hold nr and the map So that later, we can pass the thread_map instance instead of (thread_num, thread_map) for things like perf_evsel__open and friends, just like was done with cpu_map. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 60d567e2d9187379d642f6aba7c8a52b3fd5d261 Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 17:49:48 2011 -0200 perf tools: Refactor cpumap to hold nr and the map So that later, we can pass the cpu_map instance instead of (nr_cpus, cpu_map) for things like perf_evsel__open and friends. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 48290609c0d265f5dac0fca6fd4e3c5732542f67 Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 17:48:12 2011 -0200 perf evsel: Introduce per cpu and per thread open helpers Abstracting away the loops needed to create the various event fd handlers.
The users have to pass a configured perf->evsel.attr field, which is already usable after perf_evsel__new (constructor) time, using defaults. Comes out of the ad-hoc routines in builtin-stat, that now uses it. Fixed a small silly bug where we were die()ing before killing our children, dysfunctional family this one 8-) Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c52b12ed2511e6c031a0295fd903ea72b93701fb Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 17:45:52 2011 -0200 perf evsel: Steal the counter reading routines from stat Making them hopefully generic enough to be used in 'perf test', we'll see. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit cfa60917f0ba6eca83f41aef3cb4a7dd7736ac9f Author: Cliff Wickman Date: Mon Jan 3 12:03:53 2011 -0600 x86, UV, BAU: Extend for more than 16 cpus per socket Fix a hard-coded limit of a maximum of 16 cpus per socket. The UV Broadcast Assist Unit code initializes by scanning the cpu topology of the system and assigning a master cpu for each socket and UV hub. That scan had an assumption of a limit of 16 cpus per socket. With Westmere we are going over that limit. The UV hub hardware will allow up to 32. If the scan finds the system has gone over that limit it returns an error and we print a warning and fall back to doing TLB shootdowns without the BAU. Signed-off-by: Cliff Wickman Cc: # .37.x LKML-Reference: Signed-off-by: Ingo Molnar commit 70d544d0576775a2b3923a7e68cb49b0313d80c9 Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 16:51:39 2011 -0200 perf evsel: Delete the event selectors at exit Freeing all the possibly allocated resources, reducing complexity on each tool exit path.
Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 1e7972cc5c16e06f258b0278d8c9adfb5aa75c68 Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 16:50:55 2011 -0200 perf util: Move do_read from session to util Not really something to be exported from session.c. Rename it to 'readn' as others did in the past. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit daec78a09de3df5fbfbbd167da0304d49d7fcfe5 Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 16:49:44 2011 -0200 perf evsel: Adopt MATCH_EVENT macro from 'stat' Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 69aad6f1ee69546dea8535ab8f3da9f445d57328 Author: Arnaldo Carvalho de Melo Date: Mon Jan 3 16:39:04 2011 -0200 perf tools: Introduce event selectors Out of ad-hoc code and global arrays with hard coded sizes. This is the first step on having a library that will be first used on regression tests in the 'perf test' tool. 
[acme@felicio linux]$ size /tmp/perf.before text data bss dec hex filename 1273776 97384 5104416 6475576 62cf38 /tmp/perf.before [acme@felicio linux]$ size /tmp/perf.new text data bss dec hex filename 1275422 97416 1392416 2765254 2a31c6 /tmp/perf.new Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 9e76a97efd31a08cb19d0ba12013b8fb4ad3e474 Author: R, Durgadoss Date: Mon Jan 3 17:22:04 2011 +0530 x86, hwmon: Add core threshold notification to therm_throt.c This patch adds code to therm_throt.c to notify core thermal threshold events. These thresholds are supported by the IA32_THERM_INTERRUPT register. The status/log for the same is monitored using the IA32_THERM_STATUS register. The necessary #defines are in msr-index.h. A call back is added to mce.h, to further notify the thermal stack, about the threshold events. Signed-off-by: Durgadoss R LKML-Reference: Signed-off-by: H. Peter Anvin commit 56f4c400349157289b474a3fd49ee96acab0a4d7 Merge: 32ae2ad da169f5 Author: Ingo Molnar Date: Thu Dec 30 11:26:45 2010 +0100 Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core commit c8217b8305e5e75c23617f2f4cd262527d952c0a Author: Cliff Wickman Date: Mon Dec 13 10:51:57 2010 -0600 x86, paravirt: Use native_halt on a halt, not native_safe_halt halt() should use native_halt() safe_halt() uses native_safe_halt() If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as static inline void halt(void) { PVOP_VCALL0(pv_irq_ops.safe_halt); } Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is static inline void halt(void) { native_halt(); } So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt() for a halt() is an oversight. Am I missing something? 
It probably hasn't shown up as a problem because the local apic is disabled on a shutdown or restart. But if we disable interrupts and call halt() we shouldn't expect that the halt() will re-enable interrupts. Signed-off-by: Cliff Wickman LKML-Reference: Signed-off-by: H. Peter Anvin commit 32ae2ade462146729580117d9886cc9efd83dfbe Author: Franck Bui-Huu Date: Thu Dec 23 16:04:23 2010 +0100 perf probe: Fix short file name probe location reporting After adding probes, perf-probe(1) reports the probes locations which include filenames for certain cases. But for short file names (whose length < 32), perf-probe didn't display the name correctly. It actually skipped the first character. Here's an example where 'icmp.c' was screwed: $ perf probe -n -a "icmp.c;sk=*" Add new events: probe:icmp_push_reply (on @cmp.c) probe:icmp_reply (on @cmp.c) probe:icmp_reply_1 (on @cmp.c) probe:icmp_send (on @cmp.c) probe:icmp_send_1 (on @cmp.c) probe:icmp_error (on @cmp.c) probe:icmp_error_1 (on @cmp.c) probe:icmp_error_2 (on @cmp.c) probe:icmp_error_3 (on @cmp.c) This patch fixes this bug in synthesize_perf_probe_point(). Acked-by: Masami Hiramatsu Cc: Masami Hiramatsu LKML-Reference: Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit ce0ac9e1851364fa67c991659ce1db05ab82c6ae Author: Arnaldo Carvalho de Melo Date: Sat Dec 25 18:33:12 2010 -0200 perf script: Fix event ordering settings to work with older kernels If we don't use .ordering_requires_timestamps we'll end up trying to order events with no timestamps when running on older kernels. Problem introduced in eac23d1c. After the last three fixes, perf scripting is back working, tested with new perf userspace on old and new (with sample_id_all) kernels. 
Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: Torok Edwin LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit a43d3f08c64125edbdfdc3d3aa512d3e37321f37 Author: Arnaldo Carvalho de Melo Date: Sat Dec 25 12:12:25 2010 -0200 perf record: Fix use of sample_id_all userspace with !sample_id_all kernels Check if parse_single_tracepoint_event has already asked for PERF_SAMPLE_TIME. This is kludgy but short term fix for problems introduced by eac23d1c that broke 'perf script' by having different sample_types when using multiple tracepoint events when we use a perf binary that tries to use sample_id_all on an older kernel. We need to move counter creation to perf_session, support different sample_types, etc. Ongoing work on the perf test infrastructure needs this so that we can create counters to monitor threads generating specific events, etc. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: Torok Edwin Cc: Ian Munsie LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 765532c8aaac624b5f8687af6d319c6a1138a257 Author: Arnaldo Carvalho de Melo Date: Thu Dec 23 13:10:22 2010 -0200 perf script: Finish the rename from trace to script The scripts have calls to 'perf trace' that need to be converted to 'perf script', do it. This problem was introduced in 133dc4c. 
Reported-by: Torok Edwin Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: Torok Edwin LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d3bd058826aa8b79590cca6c8e6d1557bf576ada Author: Yinghai Lu Date: Thu Dec 16 19:09:58 2010 -0800 x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation Recent Intel new system have different order in MADT, aka will list all thread0 at first, then all thread1. But SRAT table still old order, it will list cpus in one socket all together. If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed to put some cpus apic id to node mapping into apicid_to_node[]. for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash... [ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS). [ 9.235021] divide error: 0000 [#1] SMP [ 9.235315] last sysfs file: [ 9.235481] CPU 1 [ 9.235592] Modules linked in: [ 9.245398] [ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800 [ 9.265415] RIP: 0010:[] [] select_task_rq_fair+0x4f0/0x623 ... [ 9.645938] RIP [] select_task_rq_fair+0x4f0/0x623 [ 9.665356] RSP [ 9.665568] ---[ end trace 2296156d35fdfc87 ]--- So let just parse all cpu entries in SRAT. Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of apicid_to_node[]. it fixes following bug too. https://bugzilla.kernel.org/show_bug.cgi?id=22662 -v2: expand to 32bit according to hpa need to add MAX_LOCAL_APIC for 32bit Reported-and-Tested-by: Wu Fengguang Reported-by: Bjorn Helgaas Tested-by: Myron Stowe Signed-off-by: Yinghai Lu LKML-Reference: <4D0AD486.9020704@kernel.org> Signed-off-by: H. 
Peter Anvin commit 56d91f132c9be66e98cce1b1e77a28027048bb26 Author: Yinghai Lu Date: Thu Dec 16 19:09:24 2010 -0800 x86, acpi: Add MAX_LOCAL_APIC for 32bit We should use MAX_LOCAL_APIC for max apic ids and MAX_APICS as number of local apics. Also apic_version[] array should use MAX_LOCAL_APICs. Signed-off-by: Yinghai Lu LKML-Reference: <4D0AD464.2020408@kernel.org> Signed-off-by: H. Peter Anvin commit 94462ad3b14739d158a1ab87bb30008c1e5a6bc1 Author: Steven Rostedt Date: Mon Nov 29 13:15:42 2010 -0500 module: Move RO/NX module protection to after ftrace module update The commit: 84e1c6bb38eb318e456558b610396d9f1afaabf0 x86: Add RO/NX protection for loadable kernel modules Broke the function tracer with this output: ------------[ cut here ]------------ WARNING: at kernel/trace/ftrace.c:1014 ftrace_bug+0x114/0x171() Hardware name: Precision WorkStation 470 Modules linked in: i2c_core(+) Pid: 86, comm: modprobe Not tainted 2.6.37-rc2+ #68 Call Trace: [] warn_slowpath_common+0x85/0x9d [] ? __process_new_adapter+0x7/0x34 [i2c_core] [] ? __process_new_adapter+0x7/0x34 [i2c_core] [] warn_slowpath_null+0x1a/0x1c [] ftrace_bug+0x114/0x171 [] ? __process_new_adapter+0x7/0x34 [i2c_core] [] ftrace_process_locs+0x1ae/0x274 [] ? __process_new_adapter+0x7/0x34 [i2c_core] [] ftrace_module_notify+0x39/0x44 [] notifier_call_chain+0x37/0x63 [] __blocking_notifier_call_chain+0x46/0x5b [] blocking_notifier_call_chain+0x14/0x16 [] sys_init_module+0x73/0x1f3 [] system_call_fastpath+0x16/0x1b ---[ end trace 2aff4f4ca53ec746 ]--- ftrace faulted on writing [] __process_new_adapter+0x7/0x34 [i2c_core] The cause was that the module text was set to read only before ftrace could convert the calls to mcount to nops. Thus, the conversions failed due to not being able to write to the text locations. The simple fix is to move setting the module to read only after the module notifiers are called (where ftrace sets the module mcounts to nops). 
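
The ordering described above (patch the text while it is still writable, write-protect it afterwards) can be illustrated with a userspace analogy. This is only a sketch under stated assumptions: it uses an anonymous mmap() page in place of module text, and patch_text() and load_then_protect() are hypothetical names standing in for the notifier and the module loader. It is not the kernel code touched by this commit.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Stand-in for a notifier that rewrites call sites in place (hypothetical). */
static void patch_text(unsigned char *text, size_t len)
{
	/* 0x90 is only a placeholder byte, not the kernel's nop sequence. */
	memset(text, 0x90, len);
}

/*
 * Map one page, patch it while it is writable, then make it read-only.
 * Returns the page (caller munmap()s it) or NULL on failure.
 */
unsigned char *load_then_protect(size_t *len_out)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *text;

	assert(len_out != NULL);
	if (page <= 0)
		return NULL;

	text = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (text == MAP_FAILED)
		return NULL;

	memset(text, 0xe8, (size_t)page);	/* pretend these are call sites */

	patch_text(text, (size_t)page);		/* 1. patch while still writable */

	if (mprotect(text, (size_t)page, PROT_READ) != 0) {	/* 2. then protect */
		munmap(text, (size_t)page);
		return NULL;
	}

	*len_out = (size_t)page;
	return text;
}
```

Swapping steps 1 and 2 would make the write in patch_text() hit a read-only mapping, which is the userspace counterpart of ftrace failing to write to the module text.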
Reported-by: Peter Zijlstra Acked-by: Rusty Russell Signed-off-by: Steven Rostedt commit 104db7ff1d9d01a03a2568a156b19e1fd972e8bf Merge: 4a7863c 32b2b6e Author: Ingo Molnar Date: Thu Dec 23 14:19:45 2010 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 394f4528c523d88daabd50f883a8d6b164075555 Merge: 90a8a73 3c2dcf2 Author: Ingo Molnar Date: Thu Dec 23 12:57:04 2010 +0100 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu commit 26e20a108caca6231c6a5ec659f815a866904751 Merge: 691513f 90a8a73 Author: Ingo Molnar Date: Thu Dec 23 09:48:41 2010 +0100 Merge commit 'v2.6.37-rc7' into x86/security commit 32b2b6ec57a3adb3ab7215fbf36ec61c15de06ee Author: Franck Bui-Huu Date: Wed Dec 22 17:37:13 2010 +0100 perf probe: Fix wrong warning in __show_one_line() if read(1) errors happen This was introduced by commit fde52dbd7f71934aba4e150f3d1d51e826a08850. Cc: H. Peter Anvin Cc: Masami Hiramatsu Cc: Ingo Molnar Cc: Thomas Gleixner LKML-Reference: Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit d3678758048308049cdad31ec3eae063be17c0db Author: Arnaldo Carvalho de Melo Date: Tue Dec 21 23:38:37 2010 -0200 perf test: Look forward for symbol aliases Not just before, fixing these false positives: [acme@mica linux]$ perf test -v 1 1: vmlinux symtab matches kallsyms: --- start --- Looking at the vmlinux_path (6 entries long) Using //lib/modules/2.6.37-rc5-00180-ge06b6bf/build/vmlinux for symbols 0xffffffff81058dc0: diff name v: sys_vm86old k: sys_ni_syscall 0xffffffff81058dc0: diff name v: sys_vm86 k: sys_ni_syscall 0xffffffff81058dc0: diff name v: sys_subpage_prot k: sys_ni_syscall 0xffffffff810b5f7c: diff name v: probe_kernel_write k: __probe_kernel_write 0xffffffff810b5fe5: diff name v: probe_kernel_read k: __probe_kernel_read 0xffffffff811bc380: diff name v: __memset k: memset 0xffffffff81384a98: diff name v: __sched_text_start k: 
sleep_on_common 0xffffffff81386750: diff name v: __sched_text_end k: _raw_spin_trylock 0xffffffff8138cee8: diff name v: __irqentry_text_start k: do_IRQ 0xffffffff8138f079: diff name v: __start_notes k: _etext 0xffffffff8138f079: diff name v: __stop_notes k: _etext ---- end ---- vmlinux symtab matches kallsyms: FAILED! [acme@mica linux]$ Some are weak functions, others are just markers, etc. They get in the rb tree with the same addr, so we need to look around to find the symbol with the same name. We were looking just at the previous entries with the same addr, look forward too. Cc: Frederic Weisbecker Cc: Han Pingtian Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 3b01a413c196c91040d41c86e5b56f76bb369f74 Author: Arnaldo Carvalho de Melo Date: Wed Dec 22 01:08:36 2010 -0200 perf symbols: Improve kallsyms symbol end addr calculation For kallsyms we don't have the symbol address end, so we do an extra pass and set the symbol end addr as being the start of the next minus one. But this was being done just after we filtered the symbols of a particular type (functions, variables), so the symbol end was sometimes after what it really is. Fixing up symbol end also was falling apart when we have symbol aliases, then the end address of all but the last alias was being set to be before its start. Fix it up by checking for symbol aliases and making the kallsyms__parse routine use the next symbol, whatever its type, as the limit for the previous symbol, passing that end address to the callback. This was detected by the 'perf test' synthetic paranoid regression tests, fix it up so that even that case doesn't mislead us. 
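
The end-address rule described above can be sketched as follows. This is a minimal illustration, assuming a flat array sorted by start address that contains every symbol regardless of type; struct sym and fixup_sym_ends() are hypothetical names, not perf's data structures or its kallsyms__parse routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record; not perf's struct symbol. */
struct sym {
	const char *name;
	uint64_t start;
	uint64_t end;	/* inclusive, filled in by fixup_sym_ends() */
};

/*
 * syms[] must be sorted by ascending start address and hold symbols of
 * every type, so that the next symbol really is the limit of this one.
 */
void fixup_sym_ends(struct sym *syms, size_t nr)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		assert(i == 0 || syms[i - 1].start <= syms[i].start);

		/* Skip aliases: entries that share this start address. */
		for (j = i + 1; j < nr && syms[j].start == syms[i].start; j++)
			;

		if (j < nr)
			syms[i].end = syms[j].start - 1;
		else
			syms[i].end = syms[i].start;	/* assumption: size unknown */
	}
}
```

With two aliases at 0x1000 followed by a symbol at 0x1040, both aliases end at 0x103f. Taking "start of the next entry minus one" without the alias check would give the first alias an end of 0xfff, which is before its own start.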
Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 4a7863cc2eb5f9804f1c4e9156619a801cd7f14f Author: Don Zickus Date: Wed Dec 22 14:00:03 2010 -0500 x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR The x86 arch has shifted its use of the nmi_watchdog from a local implementation to the global one provided by kernel/watchdog.c. This shift has caused a whole bunch of compile problems under different config options. I attempt to simplify things with the patch below. In order to simplify things, I had to come to terms with the meaning of two terms ARCH_HAS_NMI_WATCHDOG and CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing, the former on a local level and the latter on a global level. With the old x86 nmi watchdog gone, there is no need to rely on defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't make sense any more. x86 will now use the global implementation. The changes below do a few things. First it changes the few places that relied on ARCH_HAS_NMI_WATCHDOG to use CONFIG_X86_LOCAL_APIC (the former was an alias for the latter anyway, so nothing unusual here). Those pieces of code were relying more on local apic functionality than the nmi watchdog functionality, so the change should make sense. Second, I removed the x86 implementation of touch_nmi_watchdog(). It isn't needed now, instead x86 will rely on kernel/watchdog.c's implementation. Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from x86. And tweaked the include/linux/nmi.h file to tell users to look for an externally defined touch_nmi_watchdog in the case of ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This change removes some of the ugliness in that file. Finally, I added a Kconfig dependency for CONFIG_HARDLOCKUP_DETECTOR that said you can't have ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR.
You can only have one nmi_watchdog. Tested with ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig, (various broken configs) Hopefully, after this patch I won't get any more compile broken emails. :-) v3: changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set. Signed-off-by: Don Zickus Cc: Peter Zijlstra Cc: fweisbec@gmail.com LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar commit 9fb67204d7a00a6444bc121f221527034613d338 Merge: 8c1df40 287050d Author: Ingo Molnar Date: Wed Dec 22 12:46:12 2010 +0100 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core commit d8850ba425d9823d3184bd52f065899dac4689f9 Author: Jack Steiner Date: Tue Nov 30 13:55:40 2010 -0600 x86, UV: Fix the effect of extra bits in the hub nodeid register UV systems can be partitioned into multiple independent SSIs. Large partitioned systems may have extra bits in the node_id register. These bits are used when the total memory on all SSIs exceeds 16TB. These extra bits need to be ignored when calculating x2apic_extra_bits. Signed-off-by: Jack Steiner LKML-Reference: <20101130195926.972776133@sgi.com> Signed-off-by: Ingo Molnar commit e681041388e61ecd7f99dba66b3c1db11a564d92 Author: Jack Steiner Date: Tue Nov 30 13:55:39 2010 -0600 x86, UV: Add common uv_early_read_mmr() function for reading MMRs Early in boot, reading MMRs from the UV hub controller require calls to early_ioremap()/early_iounmap(). Rather than duplicating code, add a common function to do the map/read/unmap. 
Signed-off-by: Jack Steiner LKML-Reference: <20101130195926.834804371@sgi.com> Signed-off-by: Ingo Molnar commit 8c1df4002aa425973d7d25ffa56c042acd953bed Merge: 6c529a2 21dd9ae Author: Ingo Molnar Date: Wed Dec 22 11:54:50 2010 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 6c529a266bdc590a870ee2d2092ff6527eff427b Merge: 7639dae 90a8a73 Author: Ingo Molnar Date: Wed Dec 22 11:53:20 2010 +0100 Merge commit 'v2.6.37-rc7' into perf/core Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar commit 21dd9ae5a4e9f717f3957ec934dd3158129436b8 Author: Franck Bui-Huu Date: Mon Dec 20 15:18:05 2010 +0100 perf probe: Handle gracefully some stupid and buggy line syntaxes Currently perf probe doesn't handle those incorrect syntaxes: $ perf probe -L sched.c:++13 $ perf probe -L sched.c:-+13 $ perf probe -L sched.c:10000000000000000000000000000+13 This patches rewrites parse_line_range_desc() to handle them. As a bonus, it reports more useful error messages instead of: "Tailing with invalid character...". Acked-by: Masami Hiramatsu Cc: Masami Hiramatsu LKML-Reference: <1292854685-8230-7-git-send-email-fbuihuu@gmail.com> Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit fde52dbd7f71934aba4e150f3d1d51e826a08850 Author: Franck Bui-Huu Date: Mon Dec 20 15:18:04 2010 +0100 perf probe: Don't always consider EOF as an error when listing source code When listing a whole file or a function which is located at the end, perf-probe -L output wrongly: "Source file is shorter than expected.". This is because show_one_line() always consider EOF as an error. This patch fixes this by not considering EOF as an error when dumping the trailing lines. Otherwise it's still an error and perf-probe still outputs its warning. 
Acked-by: Masami Hiramatsu Cc: Masami Hiramatsu LKML-Reference: <1292854685-8230-6-git-send-email-fbuihuu@gmail.com> Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit 9d95b580a8d64ef4d1660a21a9de0658fe29f041 Author: Franck Bui-Huu Date: Mon Dec 20 15:18:03 2010 +0100 perf probe: Fix line range description since a single file is allowed $ perf-probe -L sched.c is currently allowed but not documented. Cc: Masami Hiramatsu LKML-Reference: <1292854685-8230-5-git-send-email-fbuihuu@gmail.com> Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit 44b81e929b0c00e703a31a3d634b668bb27eb1c8 Author: Franck Bui-Huu Date: Mon Dec 20 15:18:02 2010 +0100 perf probe: Clean up redundant tests in show_line_range() It also removes some superfluous parentheses. Cc: Masami Hiramatsu LKML-Reference: <1292854685-8230-4-git-send-email-fbuihuu@gmail.com> Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit befe341468f4e61ecaf337a0237f2aab76817437 Author: Franck Bui-Huu Date: Mon Dec 20 15:18:01 2010 +0100 perf probe: Rewrite show_one_line() to make it simpler Cc: Masami Hiramatsu LKML-Reference: <1292854685-8230-3-git-send-email-fbuihuu@gmail.com> Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit 62c15fc49bd1b35d79b34ea96f132ab435e2215a Author: Franck Bui-Huu Date: Mon Dec 20 15:18:00 2010 +0100 perf probe: Make -L display the absolute path of the dumped file The actual file used by 'perf probe -L sched.c' is reported in the output of the command. But it's simply displayed as it has been given to the command (simply sched.c) which is too ambiguous to be really useful since several sched.c files can be found in the same project and we also don't know which search path has been used.
Acked-by: Masami Hiramatsu Cc: Masami Hiramatsu LKML-Reference: <1292854685-8230-2-git-send-email-fbuihuu@gmail.com> Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit 0e43e5d222095ca2d1d825dd2e4fa158bdc4cc9b Author: Masami Hiramatsu Date: Fri Dec 17 22:12:11 2010 +0900 perf probe: Cleanup messages Add new lines for error or debug messages, change dwarf related words to more generic words (or just removed). Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20101217131211.24123.40437.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit ec5761eab318e50e69fcf8e63e9edaef5949c067 Author: David Ahern Date: Thu Dec 9 13:27:07 2010 -0700 perf symbols: Add symfs option for off-box analysis using specified tree The symfs argument allows analysis of perf.data file using a locally accessible filesystem tree with debug symbols - e.g., tree created during image builds, sshfs mount, loop mounted KVM disk images, USB keys, initrds, etc. Anything with an OS tree can be analyzed from anywhere without the need to populate a local data store with build-ids. Committer notes: o Fixed up symfs="/" variants handling. o prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files outside the symfs directory. LKML-Reference: <1291926427-28846-1-git-send-email-daahern@cisco.com> Signed-off-by: David Ahern Signed-off-by: Arnaldo Carvalho de Melo commit eac23d1c384b55e4bbb89ea9e5a6bb77fb4d1140 Author: Ian Munsie Date: Thu Dec 9 16:33:53 2010 +1100 perf record,report,annotate,diff: Process events in order This patch changes perf report to ask for the ID info on all events by default if recording from multiple CPUs. Perf report, annotate and diff will now process the events in order if the kernel is able to provide timestamps on all events.
This ensures that events such as COMM and MMAP which are necessary to correctly interpret samples are processed prior to those samples so that they are attributed correctly. Before: # perf record ./cachetest # perf report # Events: 6K cycles # # Overhead Command Shared Object Symbol # ........ ....... ................. ............................... # 74.11% :3259 [unknown] [k] 0x4a6c 1.50% cachetest ld-2.11.2.so [.] 0x1777c 1.46% :3259 [kernel.kallsyms] [k] .perf_event_mmap_ctx 1.25% :3259 [kernel.kallsyms] [k] restore 0.74% :3259 [kernel.kallsyms] [k] ._raw_spin_lock 0.71% :3259 [kernel.kallsyms] [k] .filemap_fault 0.66% :3259 [kernel.kallsyms] [k] .memset 0.54% cachetest [kernel.kallsyms] [k] .sha_transform 0.54% :3259 [kernel.kallsyms] [k] .copy_4K_page 0.54% :3259 [kernel.kallsyms] [k] .find_get_page 0.52% :3259 [kernel.kallsyms] [k] .trace_hardirqs_off 0.50% :3259 [kernel.kallsyms] [k] .__do_fault After: # perf report # Events: 6K cycles # # Overhead Command Shared Object Symbol # ........ ....... ................. ............................... # 44.28% cachetest cachetest [.] sumArrayNaive 22.53% cachetest cachetest [.] sumArrayOptimal 6.59% cachetest ld-2.11.2.so [.] 
0x1777c 2.13% cachetest [unknown] [k] 0x340 1.46% cachetest [kernel.kallsyms] [k] .perf_event_mmap_ctx 1.25% cachetest [kernel.kallsyms] [k] restore 0.74% cachetest [kernel.kallsyms] [k] ._raw_spin_lock 0.71% cachetest [kernel.kallsyms] [k] .filemap_fault 0.66% cachetest [kernel.kallsyms] [k] .memset 0.54% cachetest [kernel.kallsyms] [k] .copy_4K_page 0.54% cachetest [kernel.kallsyms] [k] .find_get_page 0.54% cachetest [kernel.kallsyms] [k] .sha_transform 0.52% cachetest [kernel.kallsyms] [k] .trace_hardirqs_off 0.50% cachetest [kernel.kallsyms] [k] .__do_fault Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Thomas Gleixner LKML-Reference: <1291872833-839-1-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Ian Munsie Signed-off-by: Arnaldo Carvalho de Melo commit 21ef97f05a7da5bc23b26cb34d6746f83ca9bf20 Author: Ian Munsie Date: Fri Dec 10 14:09:16 2010 +1100 perf session: Fallback to unordered processing if no sample_id_all If we are running the new perf on an old kernel without support for sample_id_all, we should fall back to the old unordered processing of events. If we didn't than we would *always* process events without timestamps out of order, whether or not we hit a reordering race. In other words, instead of there being a chance of not attributing samples correctly, we would guarantee that samples would not be attributed. While processing all events without timestamps before events with timestamps may seem like an intuitive solution, it falls down as PERF_RECORD_EXIT events would also be processed before any samples. Even with a workaround for that case, samples before/after an exec would not be attributed correctly. This patch allows commands to indicate whether they need to fall back to unordered processing, so that commands that do not care about timestamps on every event will not be affected. If we do fallback, this will print out a warning if report -D was invoked. 
This patch adds the test in perf_session__new so that we only need to test once per session. Commands that do not use an event_ops (such as record and top) can simply pass NULL in its place. Acked-by: Thomas Gleixner Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Thomas Gleixner LKML-Reference: <1291951882-sup-6069@au1.ibm.com> Signed-off-by: Ian Munsie Signed-off-by: Arnaldo Carvalho de Melo commit 19e5eebb8eaa5ca3ff8aa18cb57ccb7a9f67277d Author: Paul Turner Date: Wed Dec 15 19:10:18 2010 -0800 sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight Mike Galbraith reported poor interactivity[*] when the new shares distribution code was combined with autogroups. The root cause turns out to be a mis-ordering of accounting accrued execution time and shares updates. Since update_curr() is issued hierarchically, updating the parent entity weights to reflect child enqueue/dequeue results in the parent's unaccounted execution time then being accrued (vs vruntime) at the new weight as opposed to the weight present at accumulation. While this doesn't have much effect on processes with timeslices that cross a tick, it is particularly problematic for an interactive process (e.g. Xorg) which incurs many (tiny) timeslices. In this scenario almost all updates are at dequeue which can result in significant fairness perturbation (especially if it is the only thread, resulting in potential {tg->shares, MIN_SHARES} transitions). Correct this by ensuring unaccounted time is accumulated prior to manipulating an entity's weight. [*] http://xkcd.com/619/ is perversely Nostradamian here.
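
The accounting order described above can be shown with a toy model. This is a rough sketch, not the scheduler's code: struct toy_entity, toy_update_curr() and toy_reweight() are hypothetical names, and the scale factor of 1024 for TOY_NICE_0_LOAD is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NICE_0_LOAD 1024ULL	/* assumed reference weight for this sketch */

/* Hypothetical, heavily simplified scheduling entity. */
struct toy_entity {
	uint64_t weight;	/* current load weight, must be non-zero */
	uint64_t vruntime;	/* weighted virtual runtime */
	uint64_t exec_start;	/* time at which accounting last ran */
};

/* Charge the run-time since exec_start at the weight in force right now. */
void toy_update_curr(struct toy_entity *se, uint64_t now)
{
	uint64_t delta = now - se->exec_start;

	assert(se->weight > 0);
	se->vruntime += delta * TOY_NICE_0_LOAD / se->weight;
	se->exec_start = now;
}

/* Account pending run-time first, only then switch to the new weight. */
void toy_reweight(struct toy_entity *se, uint64_t now, uint64_t new_weight)
{
	toy_update_curr(se, now);
	se->weight = new_weight;
}
```

In this model an entity of weight 1024 that ran for 1000 time units and is then re-weighted to 2 is charged 1000 units of vruntime. Changing the weight before accounting would charge the same run-time as 1000 * 1024 / 2 = 512000 units, which is the kind of perturbation the commit message describes.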
Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra Cc: Mike Galbraith Cc: Linus Torvalds LKML-Reference: <20101216031038.159704378@google.com> Signed-off-by: Ingo Molnar commit 43365bd7ff37979d2afdccbe953299ed64a4649b Author: Paul Turner Date: Wed Dec 15 19:10:17 2010 -0800 sched: Move periodic share updates to entity_tick() Long running entities that do not block (dequeue) require periodic updates to maintain accurate share values. (Note: group entities with several threads are quite likely to be non-blocking in many circumstances). By virtue of being long-running however, we will see entity ticks (otherwise the required update occurs in dequeue/put and we are done). Thus we can move the detection (and associated work) for these updates into the periodic path. This restores the 'atomicity' of update_curr() with respect to accounting. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101216031038.067028969@google.com> Signed-off-by: Ingo Molnar commit ca680888d5d0d03862ec311a83c6a1c7a1e00a01 Merge: 40dc11f b0c3844 Author: Ingo Molnar Date: Sun Dec 19 16:35:08 2010 +0100 Merge commit 'v2.6.37-rc6' into sched/core Merge reason: Update to the latest -rc. Signed-off-by: Ingo Molnar commit da169f5df2764a6a937cb3b07562e269edfb1c0e Author: Robert Richter Date: Fri Sep 24 15:54:43 2010 +0200 oprofile, x86: Add support for 6 counters (AMD family 15h) This patch adds support for up to 6 hardware counters for AMD family 15h cpus. There is a new MSR range for hardware counters beginning at MSRC001_0200 Performance Event Select (PERF_CTL0). Signed-off-by: Robert Richter commit 30570bced107243d5227527dd5317b22883dcf0c Author: Robert Richter Date: Tue Aug 31 10:44:38 2010 +0200 oprofile, x86: Add support for AMD family 15h This patch adds support for AMD family 15h (Interlagos/Valencia/ Zambezi) cpus. Signed-off-by: Robert Richter commit 3c2dcf2aed5ea22ecf65a9a871c4963faec421b3 Author: Paul E. 
McKenney Date: Wed Dec 15 21:12:15 2010 -0800 rcu: remove unused __list_for_each_rcu() macro Signed-off-by: Paul E. McKenney commit 8a9c1cee26c0ece23b38c45b92b724438878f842 Author: Mariusz Kozlowski Date: Wed Dec 15 23:11:12 2010 +0100 rculist: fix borked __list_for_each_rcu() macro This restores parentheses balance. Signed-off-by: Mariusz Kozlowski Signed-off-by: Paul E. McKenney commit b52573d2796274f7f31cfeff7185c320adcd4f12 Author: Paul E. McKenney Date: Tue Dec 14 17:36:02 2010 -0800 rcu: reduce __call_rcu()-induced contention on rcu_node structures When the current __call_rcu() function was written, the expedited APIs did not exist. The __call_rcu() implementation therefore went to great lengths to detect the end of old grace periods and to start new ones, all in the name of reducing grace-period latency. Now the expedited APIs do exist, and the usage of __call_rcu() has increased considerably. This commit therefore causes __call_rcu() to avoid worrying about grace periods unless there are a large number of RCU callbacks stacked up on the current CPU. Signed-off-by: Paul E. McKenney commit 0209f6490b030f35349a2bb71294f3fd75b0f36d Author: Paul E. McKenney Date: Tue Dec 14 16:07:52 2010 -0800 rcu: limit rcu_node leaf-level fanout Some recent benchmarks have indicated possible lock contention on the leaf-level rcu_node locks. This commit therefore limits the number of CPUs per leaf-level rcu_node structure to 16, in other words, there can be at most 16 rcu_data structures fanning into a given rcu_node structure. Prior to this, the limit was 32 on 32-bit systems and 64 on 64-bit systems. Note that the fanout of non-leaf rcu_node structures is unchanged. The organization of accesses to the rcu_node tree is such that references to non-leaf rcu_node structures are much less frequent than to the leaf structures. Signed-off-by: Paul E. McKenney commit 121dfc4b3eba9e2f3c42d35205a3510cc65b9931 Author: Paul E.
McKenney Date: Fri Dec 10 15:02:47 2010 -0800 rcu: fine-tune grace-period begin/end checks Use the CPU's bit in rnp->qsmask to determine whether or not the CPU should try to report a quiescent state. Handle overflow in the check for rdp->gpnum having fallen behind. Signed-off-by: Paul E. McKenney commit 5ff8e6f0535fe730e921ca347bc38dcb9e01791a Author: Frederic Weisbecker Date: Fri Dec 10 22:11:11 2010 +0100 rcu: Keep gpnum and completed fields synchronized When a CPU that was in an extended quiescent state wakes up and catches up with grace periods that remote CPUs completed on its behalf, we update the completed field but not the gpnum that keeps a stale value of a backward grace period ID. Later, note_new_gpnum() will interpret the shift between the local CPU and the node grace period ID as some new grace period to handle and will then start to hunt quiescent states. But if all grace periods have already been completed, this interpretation becomes broken. And we'll be stuck in clusters of spurious softirqs because rcu_report_qs_rdp() will make this broken state run into infinite loop. The solution, as suggested by Lai Jiangshan, is to ensure that the gpnum and completed fields are well synchronized when we catch up with completed grace periods on their behalf by other cpus. This way we won't start noting spurious new grace periods. Suggested-by: Lai Jiangshan Signed-off-by: Frederic Weisbecker Cc: Paul E. McKenney Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Steven Rostedt commit 20377f32dcb77941d450728da18cce5b1a7faec5 Author: Frederic Weisbecker Date: Fri Dec 10 22:11:10 2010 +0100 rcu: Stop chasing QS if another CPU did it for us When a CPU is idle and other CPUs handled its extended quiescent state to complete grace periods on its behalf, it will catch up with completed grace periods numbers when it wakes up.
But at this point there might be no more grace period to complete, but still the woken CPU always keeps its stale qs_pending value and will then continue to chase quiescent states even if it's not needed anymore. This results in clusters of spurious softirqs until a new real grace period is started. Because if we continue to chase quiescent states but we have completed all grace periods, rcu_report_qs_rdp() is puzzled and makes that state run into infinite loops. As suggested by Lai Jiangshan, just reset qs_pending if someone completed all grace periods on our behalf. Suggested-by: Lai Jiangshan Signed-off-by: Frederic Weisbecker Cc: Paul E. McKenney Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Steven Rostedt Signed-off-by: Paul E. McKenney commit e27fc9641e8ddc8146f8e01f06e5eba2469698de Author: Tejun Heo Date: Mon Nov 22 21:36:11 2010 -0800 rcu: increase synchronize_sched_expedited() batching The fix in commit #6a0cc49 requires more than three concurrent instances of synchronize_sched_expedited() before batching is possible. This patch uses a ticket-counter-like approach that is also not unrelated to Lai Jiangshan's Ring RCU to allow sharing of expedited grace periods even when there are only two concurrent instances of synchronize_sched_expedited(). This commit builds on Tejun's original posting, which may be found at http://lkml.org/lkml/2010/11/9/204, adding memory barriers, avoiding overflow of signed integers (other than via atomic_t), and fixing the detection of batching. Signed-off-by: Tejun Heo Signed-off-by: Paul E. McKenney commit 846f40455276617275284a4b76b89311b4aed0b9 Author: Steven Whitehouse Date: Thu Dec 16 15:18:48 2010 +0000 GFS2: Don't flush delete workqueue when releasing the transaction lock There is no requirement to flush the delete workqueue before a gfs2 filesystem is suspended. The workqueue's work will just be suspended along with the rest of the tasks on the filesystem.
This resolves a deadlock situation where the transaction lock's demotion code was trying to flush the delete workqueue while at the same time, the workqueue was waiting for the transaction lock. The delete workqueue is flushed by gfs2_make_fs_ro() already, so that umount/remount are correctly protected anyway. Signed-off-by: Steven Whitehouse commit 7639dae0ca11038286bbbcda05f2bef601c1eb8d Author: Peter Zijlstra Date: Tue Dec 14 21:26:40 2010 +0100 perf, x86: Provide a PEBS capable cycle event Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit abe43400579d5de0078c2d3a760e6598e183f871 Author: Peter Zijlstra Date: Wed Nov 17 23:17:37 2010 +0100 perf: Sysfs enumeration Simple sysfs enumeration of the PMUs. Use an "event_source" bus, and add PMU devices using their name. Each PMU device has a type attribute which contains the value needed for perf_event_attr::type to identify this PMU. This is the minimal stub needed to start using this interface, we'll consider extending the sysfs usage later. Cc: Kay Sievers Cc: Greg KH Signed-off-by: Peter Zijlstra LKML-Reference: <20101117222056.316982569@chello.nl> Signed-off-by: Ingo Molnar commit 2e80a82a49c4c7eca4e35734380f28298ba5db19 Author: Peter Zijlstra Date: Wed Nov 17 23:17:36 2010 +0100 perf: Dynamic pmu types Extend the perf_pmu_register() interface to allow for named and dynamic pmu types. Because we need to support the existing static types we cannot use dynamic types for everything, hence provide a type argument. If we want to enumerate the PMUs they need a name, provide one. Signed-off-by: Peter Zijlstra LKML-Reference: <20101117222056.259707703@chello.nl> Signed-off-by: Ingo Molnar commit 9f58a205c62d0dad1df38d076324a89b1a0f1d65 Author: Peter Zijlstra Date: Wed Nov 17 23:17:35 2010 +0100 init: Initialized IDR earlier perf_event_init() wants to start using IDR trees, its needs in turn are satisfied by mm_init().
Signed-off-by: Peter Zijlstra LKML-Reference: <20101117222056.206992649@chello.nl> Signed-off-by: Ingo Molnar commit 24a24bb6ff3dc3a09bb131241be920ecc3f0e519 Author: Peter Zijlstra Date: Wed Nov 17 23:17:33 2010 +0100 perf: Move perf_event_init() into main.c Currently we call perf_event_init() from sched_init(). In order to make it more obvious move it to the canonical location. Signed-off-by: Peter Zijlstra LKML-Reference: <20101117222056.093629821@chello.nl> Signed-off-by: Ingo Molnar commit 4407204c5c9037763aadce39b025529dfbfcac9e Author: Peter Zijlstra Date: Wed Dec 8 15:56:23 2010 +0100 perf, x86: Detect broken BIOSes that corrupt the PMU Some BIOSes use PMU resources, which can cause various bugs: - Non-working or erratic PMU based statistics - the PMU can end up counting the wrong thing, resulting in misleading statistics - Profiling can stop working or it can profile the wrong thing - A non-working or erratic NMI watchdog that cannot be relied on - The kernel may disturb whatever thing the BIOS tries to use the PMU for - possibly causing hardware malfunction in extreme cases. - ... and other forms of potential misbehavior Various forms of such misbehavior have been observed in practice - there are BIOSes that just corrupt the PMU state, consequences be damned. The PMU is a CPU resource that is handled by the kernel and the BIOS stealing+corrupting it is not acceptable nor robust, so we detect it, warn about it and further refuse to touch the PMU ourselves. Signed-off-by: Peter Zijlstra Cc: Jason Wessel Cc: Don Zickus Cc: Linus Torvalds Cc: Thomas Gleixner Cc: "H. Peter Anvin" LKML-Reference: Signed-off-by: Ingo Molnar commit 006b20fe4c69189b0d854e5eabf269e50ca86cdd Merge: 5f29805 d949750 Author: Ingo Molnar Date: Thu Dec 16 11:22:25 2010 +0100 Merge branch 'perf/urgent' into perf/core Merge reason: We want to apply a dependent patch.
Signed-off-by: Ingo Molnar commit 88606e80da0e8d862a42ee19e5bb60b01b940ea7 Author: Thomas Gleixner Date: Tue Dec 14 21:37:13 2010 +0100 MAINTAINERS: Update timer related entries Bring the existing file list up to date and add a new entry for timekeeping and ntp. Assign John Stultz to this new entry so he gets all the blame :) Signed-off-by: Thomas Gleixner Cc: John Stultz commit 3fb82d56ad003e804923185316236f26b30dfdd5 Author: Suresh Siddha Date: Tue Nov 23 16:11:40 2010 -0800 x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume During suspend, we disable all the non boot cpus. And during resume we bring them all back again. So no need to do alternatives_smp_switch() in between. On my core 2 based laptop, this speeds up the suspend path by 15msec and the resume path by 5 msec (suspend/resume speed up differences can be attributed to the different P-states that the cpu is in during suspend/resume). Signed-off-by: Suresh Siddha LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com> Cc: Rafael J. Wysocki Signed-off-by: H. Peter Anvin commit 5f29805a4f4627e766f862ff9f10c14f5f314359 Author: Don Zickus Date: Mon Dec 13 10:31:58 2010 -0500 x86, watchdog: Compile fix when CONFIG_LOCAL_APIC not enabled When adjusting the code to handle removing the old nmi watchdog, I forgot to consider the compile case when the local apic is not enabled. 
This change fixes the following build error: arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’ Signed-off-by: Don Zickus Acked-by: Randy Dunlap Cc: Randy Dunlap Cc: Stephen Rothwell Cc: Rakib Mullick LKML-Reference: <20101213153719.GD18577@redhat.com> Signed-off-by: Ingo Molnar commit a8760eca6cf60ed303ad494ef45901f63165d2c8 Author: Thomas Gleixner Date: Mon Dec 13 11:28:02 2010 +0100 x86: Check tsc available/disabled in the delayed init function The delayed TSC init function does not check whether the system has no TSC or TSC is disabled at the kernel command line, which results in a crash in the work queue based extended calibration due to division by zero because the basic calibration never happened. Add the missing checks and do not touch TSC when not available or disabled. Signed-off-by: Thomas Gleixner Cc: John Stultz commit 7496351ad87e61e96b49dd7b43c6534e3401f566 Author: Christoph Lameter Date: Tue Nov 30 14:05:53 2010 -0600 timers: Use this_cpu_read Eric asked for this. [tglx: Because it generates faster code according to Erics ] Signed-off-by: Christoph Lameter Cc: Pekka Enberg Cc: Eric Dumazet Cc: Mathieu Desnoyers Cc: Tejun Heo Cc: linux-mm@kvack.org LKML-Reference: Signed-off-by: Thomas Gleixner commit 45f74264e18449cf3c93cccaf098ee6e9524ab78 Author: Thomas Gleixner Date: Sat Dec 11 12:34:34 2010 +0100 timerqueue: Make timerqueue_getnext() static inline No point in calling a function just to dereference a pointer. Signed-off-by: Thomas Gleixner Cc: John Stultz commit b007c389d3e09b823eccda1503390fa2a9adca0d Author: John Stultz Date: Fri Dec 10 22:19:53 2010 -0800 hrtimer: fix timerqueue conversion flub In converting the hrtimers to timerqueue, I missed a spot in hrtimer_run_queues where we loop running timers. We end up not pulling the new next value out and instead just use the last next value, causing boot time hangs in some cases. 
The proper fix is to pull timerqueue_getnext each iteration instead of using a local next value. Reported-by: Ingo Molnar Signed-off-by: John Stultz commit 998adc3dda59f811966b3ccb21eb223680b25ec4 Author: John Stultz Date: Mon Sep 20 19:19:17 2010 -0700 hrtimers: Convert hrtimers to use timerlist infrastructure Converts the hrtimer code to use the new timerlist infrastructure Signed-off-by: John Stultz LKML Reference: <1290136329-18291-3-git-send-email-john.stultz@linaro.org> Reviewed-by: Thomas Gleixner CC: Alessandro Zummo CC: Thomas Gleixner CC: Richard Cochran commit 9bb99b147018945366c763b3d4d7008927dc8557 Author: John Stultz Date: Mon Dec 6 13:32:12 2010 -0800 timers: Fixup allmodconfig build issue Adds missed EXPORT_SYMBOL lines that cause the following build failures with allmodconfig: ERROR: "timerqueue_add" [drivers/rtc/rtc-core.ko] undefined! ERROR: "timerqueue_getnext" [drivers/rtc/rtc-core.ko] undefined! ERROR: "timerqueue_del" [drivers/rtc/rtc-core.ko] undefined! Reported-by: Ingo Molnar Reported-by: Thomas Gleixner Signed-off-by: John Stultz commit 1f5a24794a54588ea3a9efd521be31d826e0b9d7 Author: John Stultz Date: Thu Dec 9 12:02:18 2010 -0800 timers: Rename timerlist infrastructure to timerqueue Thomas pointed out a namespace collision between the new timerlist infrastructure I introduced and the existing timer_list.c So to avoid confusion, I've renamed the timerlist infrastructure to timerqueue. 
Reported-by: Thomas Gleixner Signed-off-by: John Stultz commit 67b96c182c36c83cd6881122b4a7922b2634047b Merge: efc70d2 ddbc24b Author: Ingo Molnar Date: Fri Dec 10 00:31:30 2010 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit efc70d241f5c7fc0a9f1c2a01781ed946d9dbe21 Author: Ingo Molnar Date: Fri Dec 10 00:27:23 2010 +0100 perf, sparc: Fix CONFIG_PERF_EVENTS=y build error Fix a typo in: 004417a6d468: perf, arch: Cleanup perf-pmu init vs lockup-detector Which caused a build failure on Sparc, reported by Stephen Rothwell. Reported-by: Stephen Rothwell Cc: David S. Miller Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 5dc3055879b8f659f62abb7c3d1eaa4d02e36d65 Author: Don Zickus Date: Mon Nov 29 17:07:17 2010 -0500 x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctls Originally adapted from Huang Ying's patch which moved the unknown_nmi_panic to the traps.c file. Because the old nmi watchdog was deleted before this change happened, the unknown_nmi_panic sysctl was lost. This re-adds it. Also, the nmi_watchdog sysctl was re-implemented and its documentation updated accordingly. Patch-inspired-by: Huang Ying Signed-off-by: Don Zickus Reviewed-by: Cyrill Gorcunov Acked-by: Yinghai Lu Cc: fweisbec@gmail.com LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar commit 96a84c20d635fb1e98ab92f9fc517c4441f5c424 Author: Don Zickus Date: Mon Nov 29 17:07:16 2010 -0500 lockup detector: Compile fixes from removing the old x86 nmi watchdog My patch that removed the old x86 nmi watchdog broke other arches. This change reverts a piece of that patch and puts the change in the correct spot. 
Signed-off-by: Don Zickus Reviewed-by: Cyrill Gorcunov Cc: fweisbec@gmail.com Cc: yinghai@kernel.org LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar commit 2c6cb1053ad8b61ab9fb50b578d0ffea959f7583 Author: Rakib Mullick Date: Thu Dec 9 14:47:34 2010 +0600 x86: Address 'unused' warning in hw_nmi.c again arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog). Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace section again. Signed-off-by: Rakib Mullick Cc: Don Zickus Cc: Frederic Weisbecker LKML-Reference: Signed-off-by: Thomas Gleixner commit ddbc24b72c2c3f3f0182bbc2cb70b31c52a6f45b Author: Arnaldo Carvalho de Melo Date: Thu Dec 9 12:20:20 2010 -0200 perf session: Remove unneeded dump_printf calls Since we check at the beginning of the callers, no need to ask if dump_trace is set multiple times. Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Thomas Gleixner LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit ba74f0640d963ccc914ac533cb0ba133ee07bcf2 Author: Thomas Gleixner Date: Tue Dec 7 12:49:01 2010 +0000 perf session: Split out user event processing Simplify further. Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124551.110956235@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 3dfc2c0aee789843d18f6e4675658e6879465a56 Author: Thomas Gleixner Date: Tue Dec 7 12:48:58 2010 +0000 perf session: Split out sample preprocessing Simplify the code a bit. 
Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124551.014649793@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 532e7269c01098f0be6e08113c6947ec6ed11bfa Author: Thomas Gleixner Date: Tue Dec 7 12:48:55 2010 +0000 perf session: Move dump code to event delivery path Preparatory patch for ordered perf report -D Acked-by: Ian Munsie Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124550.918655066@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit f74725dcf2f6931c26bc65e77e34e693eeb8441c Author: Thomas Gleixner Date: Tue Dec 7 12:48:53 2010 +0000 perf session: Add file_offset to event delivery function Preparatory patch for ordered output of perf report -D Acked-by: Ian Munsie Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124550.818568607@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit e4c2df132fef60a28b851abc1859a531e64f350c Author: Thomas Gleixner Date: Tue Dec 7 12:48:50 2010 +0000 perf session: Store file offset in sample_queue Preparatory patch for ordered output of perf report -D. Acked-by: Ian Munsie Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124550.725128545@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 9aefcab0de472ee2b3ab195a6827ddd4b170e3a7 Author: Thomas Gleixner Date: Tue Dec 7 12:48:47 2010 +0000 perf session: Consolidate the dump code The dump code used by perf report -D is scattered all over the place. Move it to separate functions. 
Acked-by: Ian Munsie Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124550.625434869@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 79a14c1f458d598642bf11f09512c83d33a114e6 Author: Thomas Gleixner Date: Tue Dec 7 12:48:44 2010 +0000 perf session: Dont queue events w/o timestamps If the event has no timestamp assigned then the parse code sets it to ~0ULL which causes the ordering code to enqueue it at the end. Process it right away. Reported-by: Ian Munsie Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124550.528788441@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 3835bc00c5b2d8e337a6e9d7b44f47e02760dba3 Author: Thomas Gleixner Date: Tue Dec 7 12:48:42 2010 +0000 perf event: Prevent unbound event__name array access event__name[] is missing an entry for PERF_RECORD_FINISHED_ROUND, but we happily access the array from the dump code. Make event__name[] static and provide an accessor function, fix up all callers and add the missing string. Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Peter Zijlstra LKML-Reference: <20101207124550.432593943@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit b226a5a72901bc9c73d639ea2e53e6c304bf3b74 Author: David Ahern Date: Tue Dec 7 19:39:46 2010 -0700 perf report: Allow user to specify path to kallsyms file This is useful for analyzing a perf data file on a different system than the one data was collected on and still include symbols from loaded kernel modules in the output. Committer note: Updated the man page accordingly.
LKML-Reference: <1291775986-16475-1-git-send-email-daahern@cisco.com> Signed-off-by: David Ahern Signed-off-by: Arnaldo Carvalho de Melo commit c277443cfc29b1623b4923219ff0bdb48b91b589 Author: Peter Zijlstra Date: Wed Dec 8 15:29:02 2010 +0100 perf: Stop all counters on reboot Use the reboot notifier to detach all running counters on reboot, this solves a problem with kexec where the new kernel doesn't expect running counters (rightly so). It will however decrease the coverage of the NMI watchdog. Making a kexec specific reboot notifier callback would be best, however that would require touching all notifier callback handlers as they are not properly structured to deal with new state. As a compromise, place the perf reboot notifier at the very last position in the list. Reported-by: Yinghai Lu Signed-off-by: Peter Zijlstra Cc: Vivek Goyal Cc: Eric W. Biederman Cc: Jason Wessel Cc: Don Zickus LKML-Reference: Signed-off-by: Ingo Molnar commit c079c791c5a0627fc7b752d31d72e274e0596ba8 Author: Peter Zijlstra Date: Thu Nov 25 08:56:17 2010 +0100 perf, amd: Remove the nb lock Since all the hotplug stuff is serialized by the hotplug mutex, do away with the amd_nb_lock. Cc: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 40dc11ffb35e8c4e8fa71092048e0f8de9db758c Author: Eric Dumazet Date: Fri Nov 26 17:22:16 2010 +0100 printk: Use this_cpu_{read|write} api on printk_pending __get_cpu_var() is a bit inefficient, let's use __this_cpu_read() and __this_cpu_write() to manipulate printk_pending. printk_needs_cpu(cpu) is called only for the current cpu : Use faster __this_cpu_read(). Remove the redundant unlikely on (cpu_is_offline(cpu)) test: # size kernel/printk.o* text data bss dec hex filename 9942 756 263488 274186 42f0a kernel/printk.o.new 9990 756 263488 274234 42f3a kernel/printk.o.old Signed-off-by: Eric Dumazet Cc: Heiko Carstens Cc: H.
Peter Anvin Cc: Christoph Lameter Signed-off-by: Peter Zijlstra LKML-Reference: <1290788536.2855.237.camel@edumazet-laptop> Signed-off-by: Ingo Molnar commit 806c09a7db457be3758e14b1f152761135d89af5 Author: Dario Faggioli Date: Tue Nov 30 19:51:33 2010 +0100 sched: Make pushable_tasks CONFIG_SMP dependant As noted by Peter Zijlstra at https://lkml.org/lkml/2010/11/10/391 (while reviewing other stuff, though), tracking pushable tasks only makes sense on SMP systems. Signed-off-by: Dario Faggioli Acked-by: Steven Rostedt Acked-by: Gregory Haskins Signed-off-by: Peter Zijlstra LKML-Reference: <1291143093.2697.298.camel@Palantir> Signed-off-by: Ingo Molnar commit 8e9255e6a2141e050d51bc4d96dbef494a87d653 Merge: 5091faa 6313e3c Author: Ingo Molnar Date: Wed Dec 8 20:15:26 2010 +0100 Merge branch 'linus' into sched/core Merge reason: we want to queue up dependent cleanup Signed-off-by: Ingo Molnar commit bcd7278d8a423a255e45f4d10afe564328f1885f Author: Bob Peterson Date: Tue Dec 7 13:58:56 2010 -0500 GFS2: fsck.gfs2 reported statfs error after gfs2_grow When you do gfs2_grow it failed to take the very last rgrp into account when adding up the new free space due to an off-by-one error. It was not reading the last rgrp from the rindex because of a check for "<=" that should have been "<". Therefore, fsck.gfs2 was finding (and fixing) an error with the system statfs file. Signed-off-by: Bob Peterson commit b38aa89600be39b3e10c5b6529aed2e66518598e Author: Ian Munsie Date: Mon Nov 29 11:53:07 2010 +1100 perf makefile: Allow strong and weak functions in LIB_OBJS When we build perf we place all of the .o files from the library files (util, arch/x/util, etc) into libperf.a which is then linked into perf. The problem is that the linker will by default only consider .o files within the .a archive if they are necessary to satisfy an unresolved symbol. 
As weak functions are not unresolved, it will not consider a .o file from the archive containing the strong versions of weak functions unless it requires it for another reason. This patch adds the --whole-archive flags to the linker when passing in the libperf.a file to ensure that it will consider every .o file in the archive, not just what it believes that it needs. The end result is that weak functions can now be overridden by strong variants of them in the libperf.a file. Cc: "tom.leiming" Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1290991642-sup-5890@au1.ibm.com> Signed-off-by: Ian Munsie Signed-off-by: Arnaldo Carvalho de Melo commit 75b5293a5d176cd9caf6dc590da4f3458c048c3c Merge: 10a18d7 ce47dc5 Author: Ingo Molnar Date: Tue Dec 7 07:51:14 2010 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 10a18d7dc0d9f12483c95ffc234118e9b80edfeb Merge: f984ba4 cf7d7e5 Author: Ingo Molnar Date: Tue Dec 7 07:49:48 2010 +0100 Merge commit 'v2.6.37-rc5' into perf/core Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar commit 991cfffa7c19aa648546aff666595af896e568ba Author: Feng Tang Date: Fri Dec 3 11:51:37 2010 +0800 x86, earlyprintk: Move mrst early console to platform/ and fix a typo Move the code to arch/x86/platform/mrst/. Also fix a typo to use the correct config option: ONFIG_EARLY_PRINTK_MRST Signed-off-by: Feng Tang Cc: alan@linux.intel.com LKML-Reference: <1291348298-21263-1-git-send-email-feng.tang@intel.com> Signed-off-by: Thomas Gleixner commit f984ba4eb575e4a27ed28a76d4126d2aa9233c32 Author: Masami Hiramatsu Date: Fri Dec 3 18:54:34 2010 +0900 kprobes: Use text_poke_smp_batch for unoptimizing Use text_poke_smp_batch() on unoptimization path for reducing the number of stop_machine() issues. 
If the number of unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks optimizer for remaining probes. Signed-off-by: Masami Hiramatsu Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra Cc: Steven Rostedt LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit cd7ebe2298ff1c3112232878678ce5fe6be8a15b Author: Masami Hiramatsu Date: Fri Dec 3 18:54:28 2010 +0900 kprobes: Use text_poke_smp_batch for optimizing Use text_poke_smp_batch() in optimization path for reducing the number of stop_machine() issues. If the number of optimizing probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes first MAX_OPTIMIZE_PROBES probes and kicks optimizer for remaining probes. Changes in v5: - Use kick_kprobe_optimizer() instead of directly calling schedule_delayed_work(). - Rescheduling optimizer outside of kprobe mutex lock. Changes in v2: - Allocate code buffer and parameters in arch_init_kprobes() instead of using static arrays. - Merge previous max optimization limit patch into this patch. So, this patch introduces upper limit of optimization at once. Signed-off-by: Masami Hiramatsu Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra Cc: Steven Rostedt LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit 7deb18dcf0478940ac979de002db1ed8ba6531dc Author: Masami Hiramatsu Date: Fri Dec 3 18:54:22 2010 +0900 x86: Introduce text_poke_smp_batch() for batch-code modifying Introduce text_poke_smp_batch(). This function modifies several text areas with one stop_machine() on SMP. Because calling stop_machine() is heavy task, it is better to aggregate text_poke requests.
( Note: I've talked with Rusty about this interface, and he would not like to expand stop_machine() interface, since it is not for generic use. ) Signed-off-by: Masami Hiramatsu Acked-by: Steven Rostedt Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: Jan Beulich Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit 0490cd1f9d99569d3bd64e17adc88db06a5007be Author: Masami Hiramatsu Date: Fri Dec 3 18:54:16 2010 +0900 kprobes: Reuse unused kprobe Reuse unused (waiting for unoptimizing and no user handler) kprobe on given address instead of returning -EBUSY for registering a new kprobe. Signed-off-by: Masami Hiramatsu Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095416.2961.39080.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit 6274de4984a630b45c6934b3ee62e5692c745328 Author: Masami Hiramatsu Date: Fri Dec 3 18:54:09 2010 +0900 kprobes: Support delayed unoptimizing Unoptimization occurs when a probe is unregistered or disabled, and is heavy because it recovers instructions by using stop_machine(). This patch delays unoptimization operations and unoptimize several probes at once by using text_poke_smp_batch(). This can avoid unexpected system slowdown coming from stop_machine(). Changes in v5: - Split this patch into several cleanup patches and this patch. - Fix some text_mutex lock miss. - Use bool instead of int for behavior flags. - Add additional comment for (un)optimizing path. Changes in v2: - Use dynamic allocated buffers and params. 
Signed-off-by: Masami Hiramatsu Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit 61f4e13ffd85c037a942c5b7fd13f2b0c3162862 Author: Masami Hiramatsu Date: Fri Dec 3 18:54:03 2010 +0900 kprobes: Separate kprobe optimizing code from optimizer Separate kprobe optimizing code from optimizer, this will make it easy to introduce unoptimizing code in optimizer. Signed-off-by: Masami Hiramatsu Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095403.2961.91201.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit 6f0f1dd71953d4243c11e490dd49ef24ebaf6c0b Author: Masami Hiramatsu Date: Fri Dec 3 18:53:57 2010 +0900 kprobes: Cleanup disabling and unregistering path Merge disabling kprobe to unregistering kprobe function and add comments for disabling/unregistering process. Current unregistering code disables(disarms) kprobes after checking target kprobe status. This patch changes it to disable the kprobe first and then change the kprobe's state. This allows sharing probe disabling code between disable_kprobe() and unregister_kprobe(). Signed-off-by: Masami Hiramatsu Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095356.2961.30152.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit 6d8e40a85ef72a0514ebd00748eb18cab432b200 Author: Masami Hiramatsu Date: Fri Dec 3 18:53:50 2010 +0900 kprobes: Rename old_p to more appropriate name Rename irrelevant uses of "old_p" to more appropriate names.
Originally, "old_p" just meant "the old kprobe on given address" but current code uses that name as "just another kprobe" or something like that. This patch renames those pointer names to more appropriate one for maintainability. Signed-off-by: Masami Hiramatsu Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Ananth N Mavinakayanahalli Cc: Jason Baron Cc: Mathieu Desnoyers Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095350.2961.48110.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar commit e4d2ebcab11b308b5b59073578efd33eccd55d46 Author: Feng Tang Date: Fri Dec 3 11:51:38 2010 +0800 x86, apbt: Setup affinity for apb timers acting as per-cpu timer Commit a5ef2e70 "x86: Sanitize apb timer interrupt handling" forgot the affinity setup when cleaning up the code, this patch just adds the forgotten part Signed-off-by: Feng Tang Cc: Jacob Pan Cc: Alan Cox LKML-Reference: <1291348298-21263-2-git-send-email-feng.tang@intel.com> Signed-off-by: Thomas Gleixner commit 5ec6960f6f0c7be9cc6e5506fdf0070add3b6e08 Author: Dirk Brandewie Date: Mon Nov 22 06:28:48 2010 -0800 ce4100: Add errata fixes for UART on CE4100 This patch enables the UART on the CE4100. The UART has a couple of issues that need to be worked around. First the UART is mostly PC compatible except that it is clocked eight times faster than a standard PC so the default configuration provided in arch/x86/include/asm/serial.h needs to be overridden. Second the TX interrupt may not be set correctly all the time. Lastly accessing the UART via I/O space for early_printk() hangs the chip when the IOAPIC is enabled. A custom mem_serial_in() is provided to work around the TX interrupt issue.
The configuration issues are dealt with in the call back registered with the 8250 driver via serial8250_set_isa_configurator() Signed-off-by: Dirk Brandewie LKML-Reference: <1290436128-17958-1-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner commit ce47dc56a2241dc035160a85bc5e34283cdd622c Author: Chris Samuel Date: Sat Nov 13 13:35:06 2010 +1100 perf tools: Catch a few uncheck calloc/malloc's There were a few stray calloc()'s and malloc()'s which were not having their return values checked for success. As the calling code either already coped with failure or didn't actually care we just return -ENOMEM at that point. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Signed-off-by: Chris Samuel LKML-Reference: <4CDDF95A.1050400@csamuel.org> Signed-off-by: Arnaldo Carvalho de Melo commit 965bb6beaf70862d3846e330ea7a14996d82c499 Author: Stephane Eranian Date: Fri Dec 3 17:52:01 2010 +0200 perf script: Fix compiler warning in builtin_script.c:is_top_script() Fix annoying compiler warning in the is_top_script() function. The issue was that a const char * was cast into a char * to call ends_with(). We fix the users of ends_with() instead. Some are passing a char *, but it is okay to cast the return value of ends_with() to char * (because we understand what ends_with() does). Cc: David S. Miller Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Robert Richter Cc: Stephane Eranian LKML-Reference: <4cf92096.17edd80a.1540.5d60@mx.google.com> Signed-off-by: Stephane Eranian Signed-off-by: Arnaldo Carvalho de Melo commit cbf41645f35224798cb61641766e6a16e141ffe4 Author: Thomas Gleixner Date: Sun Dec 5 14:32:55 2010 +0100 perf session: Sort all events if ordered_samples=true Now that we have timestamps on FORK, EXIT, COMM, MMAP events we can sort everything in time order. 
This fixes the following observed problem: mmap(file1) -> pagefault() -> munmap(file1) mmap(file2) -> pagefault() -> munmap(file2) Resulted in decoding both pagefaults in file2 because the file1 map was already replaced by the file2 map when the map address was identical. With all events sorted we decode both pagefaults correctly. Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian LKML-Reference: Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit e4e18d568b0e833c75c1f7833e1690cdde8f4d76 Author: Akihiro Nagai Date: Fri Dec 3 12:58:53 2010 +0900 perf options: add OPT_CALLBACK_DEFAULT_NOOPT Add new macro OPT_CALLBACK_DEFAULT_NOOPT for parse_options. It enables to pass the default value (opt->defval) to the callback function processing options require no argument. Reviewed-by: Masami Hiramatsu Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <20101203035853.7827.17502.stgit@localhost6.localdomain6> Signed-off-by: Akihiro Nagai Signed-off-by: Arnaldo Carvalho de Melo commit 1437a30aae865d83c7d96e3401f503a73fffe14d Author: Ian Munsie Date: Mon Dec 6 13:37:04 2010 +1100 perf hist: Better displaying of unresolved DSOs and symbols In the event that a DSO has not been identified, just print out [unknown] instead of the instruction pointer as we previously were doing, which is pretty meaningless for a shared object (at least to the users perspective). The IP we print out is fairly meaningless in general anyway - it's just one (the first) of the many addresses that were lumped together as unidentified, and could span many shared objects and symbols. In reality if we see this [unknown] output then the report -D output is going to be more useful anyway as we can see all the different address that it represents. 
If we are printing the symbols we are still going to see this IP in that column anyway since they shouldn't resolve either. This patch also changes the symbol address printouts so that they print out 0x before the address, are left aligned, and changes the %L format string (which relies on a glibc bug) to %ll. Before: 74.11% :3259 4a6c [k] 4a6c After: 74.11% :3259 [unknown] [k] 0x4a6c Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian LKML-Reference: <1291603026-11785-2-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Ian Munsie Signed-off-by: Arnaldo Carvalho de Melo commit a38c5380ef9f088be9f49b6e4c5d80af8b1b5cd4 Author: Sebastian Andrzej Siewior Date: Fri Nov 26 17:50:20 2010 +0100 x86: io_apic: Split setup_ioapic_ids_from_mpc() Sodaville needs to setup the IO_APIC ids as the boot loader leaves them uninitialized. Split out the setter function so it can be called unconditionally from the sodaville board code. Signed-off-by: Sebastian Andrzej Siewior Cc: Yinghai Lu LKML-Reference: <20101126165020.GA26361@www.tglx.de> Signed-off-by: Thomas Gleixner commit 9c90a61c7e4286aa5a38b314a2d8f5a1e70b5135 Author: Arnaldo Carvalho de Melo Date: Thu Dec 2 10:25:28 2010 -0200 perf tools: Ask for ID PERF_SAMPLE_ info on all PERF_RECORD_ events So that we can use -T == --timestamp, asking for PERF_SAMPLE_TIME: $ perf record -aT $ perf report -D | grep PERF_RECORD_ 3 5951915425 0x47530 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff8138c1a2 period: 215979 cpu:3 3 5952026879 0x47588 [0x90]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff810cb480 period: 215979 cpu:3 3 5952059959 0x47618 [0x38]: PERF_RECORD_FORK(6853:6853):(16811:16811) 3 5952138878 0x47650 [0x78]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff811bac35 period: 431478 cpu:3 3 5952375068 0x476c8 [0x30]: PERF_RECORD_COMM: find:6853 3 5952395923 0x476f8 [0x50]: PERF_RECORD_MMAP 6853/6853: [0x400000(0x25000) @ 0]: /usr/bin/find 3 5952413756 0x47748 
[0xa0]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff810d080f period: 859332 cpu:3 3 5952419837 0x477e8 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44600000(0x21d000) @ 0]: /lib64/ld-2.5.so 3 5952437929 0x47840 [0x48]: PERF_RECORD_MMAP 6853/6853: [0x7fff7e1c9000(0x1000) @ 0x7fff7e1c9000]: [vdso] 3 5952570127 0x47888 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f46200000(0x218000) @ 0]: /lib64/libselinux.so.1 3 5952623637 0x478e0 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44a00000(0x356000) @ 0]: /lib64/libc-2.5.so 3 5952675720 0x47938 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44e00000(0x204000) @ 0]: /lib64/libdl-2.5.so 3 5952710080 0x47990 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f45a00000(0x246000) @ 0]: /lib64/libsepol.so.1 3 5952847802 0x479e8 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff813897f0 period: 1142536 cpu:3 First column is the cpu and the second the timestamp. That way we can investigate problems in the event stream. If the new perf binary is run on an older kernel, it will disable this feature automatically. Tested-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Acked-by: Ian Munsie Acked-by: Thomas Gleixner Cc: Frédéric Weisbecker Cc: Ian Munsie Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: <1291318772-30880-5-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo commit 640c03ce837fe8d4b56342aba376ea0da3960459 Author: Arnaldo Carvalho de Melo Date: Thu Dec 2 14:10:21 2010 -0200 perf session: Parse sample earlier At perf_session__process_event, so that we reduce the number of lines in eache tool sample processing routine that now receives a sample_data pointer already parsed. This will also be useful in the next patch, where we'll allow sample the identity fields in MMAP, FORK, EXIT, etc, when it will be possible to see (cpu, timestamp) just after before every event. Also validate callchains in perf_session__process_event, i.e. 
as early as possible, and keep a counter of the number of events discarded due to invalid callchains, warning the user about it if it happens. The assumption that all events have the same sample_type was kept; that will be dealt with in the future, when this preexisting limitation is removed. Tested-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Acked-by: Ian Munsie Acked-by: Thomas Gleixner Cc: Frédéric Weisbecker Cc: Ian Munsie Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: <1291318772-30880-4-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo commit c980d1091810df13f21aabbce545fd98f545bbf7 Author: Arnaldo Carvalho de Melo Date: Sat Dec 4 23:02:20 2010 -0200 perf events: Make sample_type identity fields available in all PERF_RECORD_ events If perf_event_attr.sample_id_all is set it will add the PERF_SAMPLE_ identity info: TID, TIME, ID, CPU, STREAM_ID as a trailer, so that older perf tools can process new files, just ignoring the extra payload. With this it's possible to do further analysis on problems in the event stream, like detecting reordering of MMAP and FORK events, etc. V2: Fixup header size in comm, mmap and task processing, as we have to take into account different sample_types for each matching event, noticed by Thomas Gleixner. Thomas also noticed a problem in v2 where if we didn't have space in the buffer we wouldn't restore the header size.
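The trailer idea described above can be illustrated with a minimal sketch. It assumes the perf_event_header layout of a u32 type, a u16 misc and a u16 size field; the payload (pid, tid), the trailer fields (time, cpu) and all function names are hypothetical and only show why a reader that advances by the header size can skip a trailer it does not understand. This is not the perf tool's actual parser.

```python
import struct

HEADER = struct.Struct("<IHH")   # type, misc, size (size covers the whole record)
OLD_BODY = struct.Struct("<II")  # hypothetical payload an older tool knows: pid, tid
TRAILER = struct.Struct("<QI")   # hypothetical identity trailer: time, cpu


def build_record(rec_type, pid, tid, trailer=None):
    """Build one record; the trailer is appended only when requested."""
    body = OLD_BODY.pack(pid, tid)
    if trailer is not None:
        body += TRAILER.pack(*trailer)
    return HEADER.pack(rec_type, 0, HEADER.size + len(body)) + body


def old_parser(stream):
    """Parse records knowing only the old payload; ignore anything extra."""
    offset, out = 0, []
    while offset < len(stream):
        rec_type, _misc, size = HEADER.unpack_from(stream, offset)
        pid, tid = OLD_BODY.unpack_from(stream, offset + HEADER.size)
        out.append((rec_type, pid, tid))
        offset += size  # advancing by header.size steps over the trailer
    return out
```

Because the old reader never looks past the fields it knows and always advances by the recorded size, a stream mixing records with and without the trailer parses the same way.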
Tested-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Acked-by: Ian Munsie Acked-by: Peter Zijlstra Acked-by: Thomas Gleixner Cc: Frédéric Weisbecker Cc: Ian Munsie Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian Cc: Thomas Gleixner LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 6844c09d849aeb00e8ddfe9525e8567a531c22d0 Author: Arnaldo Carvalho de Melo Date: Fri Dec 3 16:36:35 2010 -0200 perf events: Separate the routines handling the PERF_SAMPLE_ identity fields Those will be made available in sample like events like MMAP, EXEC, etc in a followup patch. So precalculate the extra id header space and have a separate routine to fill them up. V2: Thomas noticed that the id header needs to be precalculated at inherit_events too: LKML-Reference: Tested-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Acked-by: Ian Munsie Acked-by: Peter Zijlstra Acked-by: Thomas Gleixner Cc: Frédéric Weisbecker Cc: Ian Munsie Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian Cc: Thomas Gleixner LKML-Reference: <1291318772-30880-2-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo commit 614b6780eb0c393d2fb49ff62d61f29b877bd07e Author: Thomas Gleixner Date: Fri Dec 3 16:24:32 2010 -0200 perf events: Fix event inherit fallout of precalculated headers The precalculated header size is not updated when an event is inherited. That results in bogus sample entries for all child events. Bug introduced in c320c7b. Cc: Frederic Weisbecker Cc: Ian Munsie Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian LKML-Reference: Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 287050d390264402e11bea8b811859e42e8faa29 Author: Steven Rostedt Date: Thu Dec 2 16:46:18 2010 -0500 tracing: Add TRACE_EVENT_CONDITIONAL() There are instances in the kernel that we only want to trace a tracepoint when a certain condition is set. 
But we do not want to test for that condition in the core kernel. If we test for that condition before calling the tracepoint, then we will be performing that test even when tracing is not enabled. This is 99.99% of the time. We currently can just filter out on that condition, but that happens after we write to the trace buffer. We just wasted time writing to the ring buffer for an event we never cared about. This patch adds: TRACE_EVENT_CONDITION() and DEFINE_EVENT_CONDITION() These have a new TP_CONDITION() argument that comes right after the TP_ARGS(). This condition can use the parameters of TP_ARGS() in the TRACE_EVENT() to determine if the tracepoint should be traced or not. The TP_CONDITION() will be placed in an if (cond) trace; For example, for the tracepoint sched_wakeup, it is useless to trace a wakeup event where the caller never actually wakes anything up (where success == 0). So adding: TP_CONDITION(success), which uses the "success" parameter of the wakeup tracepoint will have it only trace when we have successfully woken up a task. Acked-by: Mathieu Desnoyers Acked-by: Frederic Weisbecker Cc: Arjan van de Ven Cc: Thomas Gleixner Signed-off-by: Steven Rostedt commit 08ec0c58fb8a05d3191d5cb6f5d6f81adb419798 Author: John Stultz Date: Tue Jul 27 17:00:00 2010 -0700 x86: Improve TSC calibration using a delayed workqueue Boot to boot the TSC calibration may vary by quite a large amount. While normal variance of 50-100ppm can easily be seen, the quick calibration code only requires 500ppm accuracy, which is the limit of what NTP can correct for. This can cause problems for systems being used as NTP servers, as every time they reboot it can take hours for them to calculate the new drift error caused by the calibration. The classic trade-off here is calibration accuracy vs slow boot times, as during the calibration nothing else can run. This patch uses a delayed workqueue to calibrate the TSC over the period of a second.
This allows very accurate calibration (in my tests only varying by 1 kHz or 0.4ppm boot to boot). Additionally this refined calibration step does not block the boot process, and only delays the TSC clocksource registration by a few seconds in early boot. If the refined calibration strays 1% from the early boot calibration value, the system will fall back to the already calculated early boot calibration. Credit to Andi Kleen who suggested using a timer quite a while back, but I dismissed it thinking the timer calibration would be done after the clocksource was registered (which would break things). Forgive me for my short-sightedness. This patch has worked very well in my testing, but TSC hardware is quite varied so it would probably be good to get some extended testing, possibly pushing inclusion out to 2.6.39. Signed-off-by: John Stultz LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com> Reviewed-by: Thomas Gleixner CC: Thomas Gleixner CC: Ingo Molnar CC: Martin Schwidefsky CC: Clark Williams CC: Andi Kleen commit b0f969009f647cd473c5e559aeec9c4229d12f87 Merge: 3561d43 d3b8f88 Author: John Stultz Date: Thu Dec 2 16:47:52 2010 -0800 Merge remote branch 'tip/x86/tsc' into fortglx/2.6.38/tip/x86/tsc Conflicts: Documentation/kernel-parameters.txt commit 87de5ac782761a3ebf806e434e8c9cc205a87274 Author: John Stultz Date: Mon Sep 20 17:42:46 2010 -0700 timers: Introduce timerlist infrastructure. The timerlist infrastructure is a thin layer over the rbtree code that implements a simple list of timers sorted by an expires value, and a getnext function that provides a pointer to the earliest timer. This infrastructure allows drivers and other kernel infrastructure to easily implement timers without duplicating code.
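The idea of a timer list kept sorted by expiry, with a getnext operation returning the earliest timer, can be sketched as follows. This is only an illustration under stated assumptions: it substitutes a sorted Python list (via bisect) for the kernel's rbtree, and the class and method names (TimerList, add, remove, get_next) are hypothetical, not the kernel API.

```python
import bisect
import itertools


class TimerList:
    """Timers kept sorted by expiry; a sorted list stands in for the rbtree."""

    def __init__(self):
        self._entries = []             # sorted list of (expires, seq, name)
        self._seq = itertools.count()  # tie-breaker: equal expiries keep insertion order

    def add(self, expires, name):
        """Insert a timer and return its entry, which serves as a removal handle."""
        entry = (expires, next(self._seq), name)
        bisect.insort(self._entries, entry)
        return entry

    def remove(self, entry):
        """Remove a previously added timer; unknown entries are ignored."""
        i = bisect.bisect_left(self._entries, entry)
        if i < len(self._entries) and self._entries[i] == entry:
            del self._entries[i]

    def get_next(self):
        """Return the entry that expires first, or None if the list is empty."""
        return self._entries[0] if self._entries else None
```

A caller would typically arm one hardware timer for whatever get_next() returns and re-arm it whenever the head of the list changes.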
Signed-off-by: John Stultz LKML Reference: <1290136329-18291-2-git-send-email-john.stultz@linaro.org> Reviewed-by: Thomas Gleixner CC: Alessandro Zummo CC: Thomas Gleixner CC: Richard Cochran commit e4b546a3643fbfc510d5ef7db538e4d3ab00effb Merge: b3d006c d7470b6 Author: Ingo Molnar Date: Thu Dec 2 11:20:11 2010 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit d7470b6afca85ed4388fff57fc9d89f5a3be02ff Author: Stephane Eranian Date: Wed Dec 1 18:49:05 2010 +0200 perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references 
CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller Cc: Frederik Weisbecker Cc: Ingo Molnar Cc: paulus@samba.org Cc: Peter Zijlstra Cc: Robert Richter LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian Signed-off-by: Arnaldo Carvalho de Melo commit 201e0b06efee80ce090579aa165c65c3d0836d95 Author: Arnaldo Carvalho de Melo Date: Wed Dec 1 17:53:27 2010 -0200 perf stat: Use --big-num format by default [acme@mica linux]$ perf stat ls > /dev/null Performance counter stats for 'ls': 1.512532 task-clock-msecs # 0.801 CPUs 2 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 241 page-faults # 0.159 M/sec 2,973,331 cycles # 1965.797 M/sec 1,460,802 instructions # 0.491 IPC 314,642 branches # 208.023 M/sec 18,475 branch-misses # 5.872 % cache-references cache-misses 0.001887676 seconds time elapsed To get the previous behaviour just use --no-big-num: [acme@mica linux]$ perf stat --no-big-num ls > /dev/null Performance counter stats for 'ls': 1.468014 task-clock-msecs # 0.795 CPUs 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 241 page-faults # 0.164 M/sec 2900254 cycles # 1975.631 M/sec 1437991 instructions # 0.496 IPC 310905 branches # 211.786 M/sec 17912 branch-misses # 5.761 % cache-references cache-misses 0.001845435 seconds time elapsed [acme@mica linux]$ Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: 
Signed-off-by: Arnaldo Carvalho de Melo commit 8c207692fc8fa3ea1a5ff97ad698efb09a81975a Author: Shawn Bohrer Date: Tue Nov 30 19:57:19 2010 -0600 perf stat: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-12-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit f68d6bd451782b58c2b563dc86f0a81ba106c30c Author: Shawn Bohrer Date: Tue Nov 30 19:57:20 2010 -0600 perf test: Fix spelling mistake in documentation Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-13-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 646420f1bcf6ddf3e150f92a5f2a8bd7d6057ff8 Author: Shawn Bohrer Date: Tue Nov 30 19:57:22 2010 -0600 perf trace: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-15-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 2e7a988198b11877d844d147ec81c7caea82c314 Author: Shawn Bohrer Date: Tue Nov 30 19:57:21 2010 -0600 perf top: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-14-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 1eacc94a66ce347abbf75f223361b21461331383 Author: Shawn Bohrer Date: Tue Nov 30 19:57:18 2010 -0600 perf sched: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-11-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit e04fffc321aeebab4962cfc120952272f2d1df98 Author: Shawn Bohrer Date: Tue Nov 30 19:57:17 2010 -0600 perf report: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: 
<1291168642-11402-10-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 08dbd7e3fa2398910713b21399cca7c6e4b43011 Author: Shawn Bohrer Date: Tue Nov 30 19:57:16 2010 -0600 perf record: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-9-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 9d5b7f5b2b2c1ceade1fbbaefb2bd9167436329d Author: Shawn Bohrer Date: Tue Nov 30 19:57:15 2010 -0600 perf probe: Fix spelling mistake in documentation Acked-by: Masami Hiramatsu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-8-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 4aace251519745977f4b5ddf625b630b4807be3a Author: Shawn Bohrer Date: Tue Nov 30 19:57:14 2010 -0600 perf lock: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-7-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 5c0ef0ab077c8bdb224540fea60473439be1bdb4 Author: Shawn Bohrer Date: Tue Nov 30 19:57:13 2010 -0600 perf kvm: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-6-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 5ea4f857850450dc1442144a00547d6623d79d78 Author: Shawn Bohrer Date: Tue Nov 30 19:57:12 2010 -0600 perf diff: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-5-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 342955593af08f185279e074b3d77719d2f23b82 Author: Shawn Bohrer Date: Tue Nov 30 19:57:11 2010 -0600 perf diff: Fix displacement 
and modules options short flag The --displacement and --modules options to perf diff both use -m as a short flag. Change --displacement to use -M since other perf commands use -m, --modules. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-4-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit b6a535dbf615e168e796eec7318c77d1a3b7047f Author: Shawn Bohrer Date: Tue Nov 30 19:57:10 2010 -0600 perf buildid-list: Document missing options Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-3-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 1968ad911d0769e58a484bb721d275814baf9f2f Author: Shawn Bohrer Date: Tue Nov 30 19:57:09 2010 -0600 perf annotate: Document missing options. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1291168642-11402-2-git-send-email-shawn.bohrer@gmail.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit b3d006c0e745bfd2dab4984ffe3279d5cf4e926a Merge: 4c635a4 133dc4c Author: Ingo Molnar Date: Wed Dec 1 09:18:12 2010 +0100 Merge branch 'perf/rename' into perf/core Merge reason: This is an older commit under testing that was not pushed yet - merge it. Also fix up the merge in command-list.txt. Signed-off-by: Ingo Molnar Acked-by: Tom Zanussi commit 4c635a4e04700a371ef7e4d4bb33ed88747e801e Author: Corey Ashford Date: Tue Nov 30 14:27:01 2010 -0800 perf tools: fix event parsing of comma-separated tracepoint events There are number of issues that prevent the use of multiple tracepoint events being specified in a -e/--event switch, separated by commas. For example, perf stat -e irq:irq_handler_entry,irq:irq_handler_exit ... fails because the tracepoint event parsing code doesn't recognize the comma separator properly. This patch corrects those issues. 
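As a rough illustration of what parsing a comma-separated list of tracepoint events involves, here is a minimal sketch. It is not the perf tool's parser; the function name is hypothetical, and it assumes every item has the subsystem:event form shown in the example above.

```python
def split_tracepoint_events(spec):
    """Split 'sys:event,sys:event' into a list of (subsystem, event) pairs."""
    events = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray or trailing comma
        subsystem, sep, name = item.partition(":")
        if not sep or not subsystem or not name:
            raise ValueError(f"not a subsystem:event pair: {item!r}")
        events.append((subsystem, name))
    return events
```

The point of the sketch is that the comma has to be handled before the colon: each item is isolated first and only then split into subsystem and event name.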
Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Julia Lawall Cc: Paul Mackerras Cc: Peter Zijlstra Reported-by: Michael Ellerman LKML-Reference: <1291156021-17711-1-git-send-email-cjashfor@linux.vnet.ibm.com> Signed-off-by: Corey Ashford Signed-off-by: Arnaldo Carvalho de Melo commit 3e8e24f2fc66d32eb0e570e4117dfd05227047e6 Author: Don Zickus Date: Tue Nov 30 17:12:13 2010 -0500 perf packaging: add memcpy to perf MANIFEST There seems to be a new dependency on arch/*/lib/memcpy*.S when compiling the perf tool. Make sure that file is included in the MANIFEST when creating the tarball. Cc: Ingo Molnar LKML-Reference: <1291155133-3499-2-git-send-email-dzickus@redhat.com> Signed-off-by: Don Zickus Signed-off-by: Arnaldo Carvalho de Melo commit 5b1c144475a7f2d0ab34d0b9b8414ab18b02a283 Author: Arnaldo Carvalho de Melo Date: Tue Nov 30 17:48:53 2010 -0200 perf debug: Simplify trace_event No need to check that many times if debug_trace is on. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 5c891f3840a7a330c96d7203d4bb5be6fa033724 Author: Thomas Gleixner Date: Tue Nov 30 17:49:55 2010 +0000 perf session: Allocate chunks of sample objects The ordered sample code allocates singular reference objects struct sample_queue which have 48byte size on 64bit and 20 bytes on 32bit. That's silly. Allocate ~64k sized chunks and hand them out. Performance gain: ~ 15% Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163820.398713983@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 020bb75a6deeca5ebeae531dc7378c157affc8fd Author: Thomas Gleixner Date: Tue Nov 30 17:49:53 2010 +0000 perf session: Cache sample objects When the sample queue is flushed we free the sample reference objects. Though we need to malloc new objects when we process further. 
Stop the malloc/free orgy and cache the already allocated object for reuse. Only allocate when the cache is empty. Performance gain: ~ 10% Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163820.338488630@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit fe17420784a6d3602e98f798731369fa05936cbe Author: Thomas Gleixner Date: Tue Nov 30 17:49:49 2010 +0000 perf session: Keep file mmaped instead of malloc/memcpy Profiling perf with perf revealed that a large part of the processing time is spent in malloc/memcpy/free in the sample ordering code. That code copies the data from the mmap into malloc'ed memory. That's silly. We can keep the mmap and just store the pointer in the queuing data structure. For 64 bit this is not a problem as we map the whole file anyway. On 32bit we keep 8 maps around and unmap the oldest before mmaping the next chunk of the file. Performance gain: 2.95s -> 1.23s (factor of 2.4) Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163820.278787719@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 55b44629f599a2305265ae9c77f9d9bcfd6ddc17 Author: Thomas Gleixner Date: Tue Nov 30 17:49:46 2010 +0000 perf session: Use sensible mmap size On 64bit we can map the whole file in one go, on 32bit we can at least map 32MB and not map/unmap tiny chunks of the file. Base the progress bar on 1/16 of the data size. Preparatory patch to get rid of the malloc/memcpy/free of trace data. Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163820.213687773@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit d6513281c5f728d138ba895d600b9788e51508b1 Author: Thomas Gleixner Date: Tue Nov 30 17:49:44 2010 +0000 perf session: Simplify termination checks No need to check twice.
Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163820.152886642@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 85b99952ccd3d84707661d8ae103c710daca1c8a Author: Thomas Gleixner Date: Tue Nov 30 17:49:41 2010 +0000 perf session: Move ui_progress_update in __perf_session__process_events() The progress bar is changed when the file offset changes. This happens only when the next mmap is done. No need to call ui_progress_update() for every event. Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163820.094836523@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 0331ee0cf4187dcdc2b184cf701d8b58bf9ff637 Author: Thomas Gleixner Date: Tue Nov 30 17:49:38 2010 +0000 perf session: Cleanup __perf_session__process_events() Replace the pseudo C++ self argument with session and give the mmap related variables a sensible name. shift is a complete misnomer - it took me several rounds of cursing to figure out that it's not a shift value. Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163820.029687218@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit 28990f75e66b36faf6ce56747890009d4e250243 Author: Thomas Gleixner Date: Tue Nov 30 17:49:35 2010 +0000 perf session: Use appropriate pointer type instead of silly typecasting There is no reason to use a struct sample_event pointer in struct sample_queue and type cast it when flushing the queue. Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163819.969462809@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit a1225decc43849a73f7e4c333c3fdbbb8a9c1e65 Author: Thomas Gleixner Date: Tue Nov 30 17:49:33 2010 +0000 perf session: Fix list sort algorithm The homebrewn sort algorithm fails to sort in time order. 
One of the problem spots is that it fails to deal with equal timestamps correctly. My first gut reaction was to replace the fancy list with an rbtree, but the performance is 3 times worse. Rewrite it so it works. Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20101130163819.908482530@linutronix.de> Signed-off-by: Thomas Gleixner Signed-off-by: Arnaldo Carvalho de Melo commit c320c7b7d380e630f595de1236d9d085b035d5b4 Author: Arnaldo Carvalho de Melo Date: Wed Oct 20 12:50:11 2010 -0200 perf events: Precalculate the header space for PERF_SAMPLE_ fields PERF_SAMPLE_{CALLCHAIN,RAW} have variable lengths per sample, but the others can be precalculated, slightly reducing the per-sample cost. Acked-by: Peter Zijlstra Cc: Frédéric Weisbecker Cc: Ian Munsie Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 47a25380e37f44db7202093ca92e4af569c34f55 Author: Steven Whitehouse Date: Tue Nov 30 15:49:31 2010 +0000 GFS2: Merge glock state fields into a bitfield We can only merge the fields into a bitfield if the locking rules for them are the same. In this case gl_spin covers all of the fields (write side) but a couple of them are used with GLF_LOCK as the read side lock, which should be ok since we know that the field in question won't be changing at the time. The gl_req setting has to be done earlier (in glock.c) in order to place it under gl_spin. The gl_reply setting also has to be brought under gl_spin in order to comply with the new rules. This saves 4*sizeof(unsigned int) per glock.
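The size arithmetic can be checked with a small sketch using Python's ctypes. Everything specific here is an assumption made for illustration: that five unsigned int members were merged into one bitfield word (inferred from the "4*sizeof(unsigned int)" figure), the 4-bit widths, and the three placeholder field names. Only gl_req and gl_reply are named in the message above.

```python
import ctypes

# gl_req and gl_reply come from the commit message; the other names are placeholders.
FIELDS = ("field_a", "field_b", "field_c", "gl_req", "gl_reply")


class SeparateFields(ctypes.Structure):
    """One full unsigned int per field."""
    _fields_ = [(name, ctypes.c_uint) for name in FIELDS]


class PackedFields(ctypes.Structure):
    """The same fields as bitfields sharing a single unsigned int."""
    _fields_ = [(name, ctypes.c_uint, 4) for name in FIELDS]  # 4-bit width is hypothetical


# Bytes saved per structure by merging the fields into one word.
saved = ctypes.sizeof(SeparateFields) - ctypes.sizeof(PackedFields)
```

Under these assumptions the five separate members shrink to a single word, which matches the saving stated in the message.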
Signed-off-by: Steven Whitehouse Cc: Bob Peterson commit e06dfc492870e1d380f02722cde084b724dc197b Author: Steven Whitehouse Date: Tue Nov 30 15:46:02 2010 +0000 GFS2: Fix uninitialised error value in previous patch Signed-off-by: Steven Whitehouse commit 086d8334cf73b3bb695b82dd864a7a8b00d96b7e Author: Benjamin Marzinski Date: Tue Nov 23 23:52:55 2010 -0600 GFS2: fix recursive locking during rindex truncates When you truncate the rindex file, you need to avoid calling gfs2_rindex_hold, since you already hold it. However, if you haven't already read in the resource groups, you need to do that. Signed-off-by: Benjamin Marzinski Signed-off-by: Steven Whitehouse commit 0489b3f5eba735413ccedd425651cf41d6b1f7c5 Author: Benjamin Marzinski Date: Tue Nov 30 09:38:35 2010 -0600 GFS2: reread rindex when necessary to grow rindex When GFS2 grew the filesystem, it was never rereading the rindex file during the grow. This is necessary for large grows when the filesystem is almost full, and GFS2 needs to use some of the space allocated earlier in the grow to complete it. Now, if GFS2 fails to reserve the necessary space and the rindex file is not uptodate, it rereads it. Also, the only difference between gfs2_ri_update() and gfs2_ri_update_special() was that gfs2_ri_update_special() didn't clear out the existing resource groups, since you knew that it was only called when there were no resource groups. Attempting to clear out the resource groups when there are none takes almost no time, and rarely happens, so I simply removed gfs2_ri_update_special(). Signed-off-by: Benjamin Marzinski Signed-off-by: Steven Whitehouse commit 0b1246e6776c79719ff4a3afd9c38fba99b99d5a Author: Steven Whitehouse Date: Tue Nov 30 15:33:04 2010 +0000 GFS2: Remove duplicate #defines from glock.h There are a number of duplicated #defines in glock.h plus one which is unused. This removes the extra definitions. 
Signed-off-by: Steven Whitehouse commit 5091faa449ee0b7d73bc296a93bca9540fc51d0a Author: Mike Galbraith Date: Tue Nov 30 14:18:03 2010 +0100 sched: Add 'autogroup' scheduling feature: automated per session task groups A recurring complaint from CFS users is that parallel kbuild has a negative impact on desktop interactivity. This patch implements an idea from Linus, to automatically create task groups. Currently, only per session autogroups are implemented, but the patch leaves the way open for enhancement. Implementation: each task's signal struct contains an inherited pointer to a refcounted autogroup struct containing a task group pointer, the default for all tasks pointing to the init_task_group. When a task calls setsid(), a new task group is created, the process is moved into the new task group, and a reference to the previous task group is dropped. Child processes inherit this task group thereafter, and increase its refcount. When the last thread of a process exits, the process's reference is dropped, such that when the last process referencing an autogroup exits, the autogroup is destroyed. At runqueue selection time, IFF a task has no cgroup assignment, its current autogroup is used. Autogroup bandwidth is controllable via setting its nice level through the proc filesystem: cat /proc//autogroup Displays the task's group and the group's nice level. echo > /proc//autogroup Sets the task group's shares to the weight of nice task. Setting nice level is rate limited for !admin users due to the abuse risk of task group locking. The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via the boot option noautogroup, and can also be turned on/off on the fly via: echo [01] > /proc/sys/kernel/sched_autogroup_enabled ... which will automatically move tasks to/from the root task group.
Signed-off-by: Mike Galbraith Acked-by: Linus Torvalds Acked-by: Peter Zijlstra Cc: Markus Trippelsdorf Cc: Mathieu Desnoyers Cc: Paul Turner Cc: Oleg Nesterov [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ] Signed-off-by: Ingo Molnar LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net> Signed-off-by: Ingo Molnar commit 921169ca2f7c8a0a2ccda2ce33c465dfe3ae30ef Author: Steven Whitehouse Date: Mon Nov 29 12:50:38 2010 +0000 GFS2: Clean up of gdlm_lock function The DLM never returns -EAGAIN in response to dlm_lock(), and even if it did, the test in gdlm_lock() was wrong anyway. Once that test is removed, it is possible to greatly simplify this code by simply using a "normal" error return code (0 for success). We then no longer need the LM_OUT_ASYNC return code which can be removed. Signed-off-by: Steven Whitehouse commit 802ec9b6682349d9d9c92a9e55f44324d2954f41 Author: Abhijith Das Date: Thu Nov 18 11:26:46 2010 -0500 GFS2: Allow gfs2 to update quota usage values through the quotactl interface With this patch the gfs2_set_dqblk() function will be able to update the quota usage block count (FS_DQ_BCOUNT) in addition to the already supported FS_DQ_BHARD (limit) and FS_DQ_BSOFT (warn) fields of the dquot structure. Signed-off-by: Abhi Das Signed-off-by: Steven Whitehouse commit edc221d00bd5c6da0e5c67701f3782b72796619f Author: Joe Perches Date: Wed Nov 10 13:19:06 2010 -0800 GFS2: fs/gfs2/glock.h: Add __attribute__((format(printf,2,3)) to gfs2_print_dbg Functions that use printf formatting, especially those that use %pV, should have their uses of printf format and arguments checked by the compiler. Signed-off-by: Joe Perches Signed-off-by: Steven Whitehouse commit 5e69069c1afb655b5f1a154856ccdb4bb7327b81 Author: Joe Perches Date: Tue Nov 9 16:35:20 2010 -0800 GFS2: fs/gfs2/glock.c: Use printf extension %pV Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. 
Signed-off-by: Joe Perches Signed-off-by: Steven Whitehouse commit 2ae51ed7b548c1d943d080da617515e801ea5c3e Author: Steven Whitehouse Date: Wed Nov 10 15:14:57 2010 +0000 GFS2: Clean up duplicated setattr code While preparing the last patch I noticed that the gfs2_setattr_simple code had been duplicated into two other places. This patch updates those to call gfs2_setattr_simple rather than open coding it. Signed-off-by: Steven Whitehouse commit 9e55cd53728719ac3a3234a6618259ab8e203a10 Author: Steven Whitehouse Date: Tue Nov 9 14:09:53 2010 +0000 GFS2: Remove unreachable calls to vmtruncate Suggested-by: Christoph Hellwig Signed-off-by: Steven Whitehouse commit cc18152eb7c27653199546bd14e991a451ab8d1b Author: Joe Perches Date: Fri Nov 5 16:12:36 2010 -0700 GFS2: fs/gfs2/glock.c: Convert sprintf_symbol to %pS Signed-off-by: Joe Perches Signed-off-by: Steven Whitehouse commit d2115778c7ea0df2201f1ad9aab948c49ffa1078 Author: Steven Whitehouse Date: Wed Nov 3 19:58:53 2010 +0000 GFS2: Change two WQ_RESCUERs into WQ_MEM_RECLAIM The WQ_RESCUER flag should only be used internally to the workqueue implementation. Signed-off-by: Steven Whitehouse Acked-by: Tejun Heo commit 822bc180a7f7a7bc5fcaaea195f41b487cc8cae8 Author: Paul Turner Date: Mon Nov 29 16:55:40 2010 -0800 sched: Fix unregister_fair_sched_group() In the flipping and flopping between calling unregister_fair_sched_group() on a per-cpu versus per-group basis we ended up in a bad state. Remove from the list for the passed cpu as opposed to some arbitrary index. ( This fixes explosions w/ autogroup as well as a group creation/destruction stress test. ) Reported-by: Stephen Rothwell Signed-off-by: Paul Turner Cc: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20101130005740.080828123@google.com> Signed-off-by: Ingo Molnar commit 46fdb0937f26124700fc9fc80da4776330cc00d3 Author: Paul E. 
McKenney Date: Tue Oct 26 02:11:40 2010 -0700 rcu: Make synchronize_srcu_expedited() fast if running readers The synchronize_srcu_expedited() function is currently quick if there are no active readers, but will delay a full jiffy if there are any. If these readers leave their SRCU read-side critical sections quickly, this is way too long to wait. So this commit first waits ten microseconds, and only then falls back to jiffy-at-a-time waiting. Reported-by: Avi Kivity Reported-by: Marcelo Tosatti Tested-by: Takuya Yoshikawa Signed-off-by: Paul E. McKenney commit db3a8920995484e5e9a0abaf3bad2c7311b163db Author: Paul E. McKenney Date: Mon Oct 25 07:39:22 2010 -0700 rcu: fix race condition in synchronize_sched_expedited() The new (early 2010) implementation of synchronize_sched_expedited() uses try_stop_cpu() to force a context switch on every CPU. It also permits concurrent calls to synchronize_sched_expedited() to share a single call to try_stop_cpu() through use of an atomically incremented synchronize_sched_expedited_count variable. Unfortunately, this is subject to failure as follows: o Task A invokes synchronize_sched_expedited(), try_stop_cpus() succeeds, but Task A is preempted before getting to the atomic increment of synchronize_sched_expedited_count. o Task B also invokes synchronize_sched_expedited(), with exactly the same outcome as Task A. o Task C also invokes synchronize_sched_expedited(), again with exactly the same outcome as Tasks A and B. o Task D also invokes synchronize_sched_expedited(), but only gets as far as acquiring the mutex within try_stop_cpus() before being preempted, interrupted, or otherwise delayed. o Task E also invokes synchronize_sched_expedited(), but only gets to the snapshotting of synchronize_sched_expedited_count. o Tasks A, B, and C all increment synchronize_sched_expedited_count. o Task E fails to get the mutex, so checks the new value of synchronize_sched_expedited_count. 
It finds that the value has increased, so (wrongly) assumes that its work has been done, returning despite there having been no expedited grace period since it began. The solution is to have the lowest-numbered CPU atomically increment the synchronize_sched_expedited_count variable within the synchronize_sched_expedited_cpu_stop() function, which is under the protection of the mutex acquired by try_stop_cpus(). However, this also requires that piggybacking tasks wait for three rather than two instances of try_stop_cpu(), because we cannot control the order in which the per-CPU callback functions occur. Cc: Tejun Heo Cc: Lai Jiangshan Signed-off-by: Paul E. McKenney commit 2d999e03b7c8305b4385dd20992e4ed3e827177b Author: Paul E. McKenney Date: Wed Oct 20 12:06:18 2010 -0700 rcu: update documentation/comments for Lai's adoption patch Lai's RCU-callback immediate-adoption patch changes the RCU tracing output, so update tracing.txt. Also update a few comments to clarify the synchronization design. Signed-off-by: Paul E. McKenney commit 29494be71afe2a16ad04e344306a620d7cc22d06 Author: Lai Jiangshan Date: Wed Oct 20 14:13:06 2010 +0800 rcu,cleanup: simplify the code when cpu is dying When we handle the CPU_DYING notifier, the whole system is stopped except for the current CPU. We therefore need no synchronization with the other CPUs. This allows us to move any orphaned RCU callbacks directly to the list of any online CPU without needing to run them through the global orphan lists. These global orphan lists can therefore be dispensed with. This commit makes these changes, though currently victimizes CPU 0 @@@. Signed-off-by: Lai Jiangshan Signed-off-by: Paul E.
McKenney commit 7b27d5475f86186914e54e4a6bb994e9a985337b Author: Lai Jiangshan Date: Thu Oct 21 11:29:05 2010 +0800 rcu,cleanup: move synchronize_sched_expedited() out of sched.c The first version of synchronize_sched_expedited() used the migration code in the scheduler, and was therefore implemented in kernel/sched.c. However, the more recent version of this code no longer uses the migration code, so this commit moves it to the main RCU source files. Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney commit deb7a41815a8a32d4f9ea2af7a48ed1175222cec Author: Paul E. McKenney Date: Thu Sep 30 21:33:32 2010 -0700 rcu: get rid of obsolete "classic" names in TREE_RCU tracing The TREE_RCU tracing had obsolete rcuclassic_trace_init() and rcuclassic_trace_cleanup() function names. This commit brings them up to date: rcutree_trace_init() and rcutree_trace_cleanup(), respectively. Signed-off-by: Paul E. McKenney commit e940cc804ec212e483f91167b93d1740c2fd3415 Author: Paul E. McKenney Date: Thu Nov 4 14:55:26 2010 -0700 rcu: Distinguish between boosting and boosted RCU priority boosting's tracing did not distinguish between ongoing boosting and completion of boosting. This commit therefore adds this capability. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney commit 8e79e1f9615b83d1e1d26b328d1b776111ca0cf7 Author: Paul E. McKenney Date: Thu Nov 4 14:31:19 2010 -0700 rcu: document TINY_RCU and TINY_PREEMPT_RCU tracing. Add the required verbiage to Documentation/RCU/trace.txt. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney commit 9e571a82f0cb205a65a0ea41657f19f22b7fabb8 Author: Paul E. McKenney Date: Thu Sep 30 21:26:52 2010 -0700 rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU Add tracing for the tiny RCU implementations, including statistics on boosting in the case of TINY_PREEMPT_RCU and RCU_BOOST. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney commit 24278d148316d2180be6df40e06db013d8b232b8 Author: Paul E. 
McKenney Date: Mon Sep 27 17:25:23 2010 -0700 rcu: priority boosting for TINY_PREEMPT_RCU Add priority boosting, but only for TINY_PREEMPT_RCU. This is enabled by the default-off RCU_BOOST kernel parameter. The priority to which to boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel parameter (defaulting to real-time priority 1) and the time to wait before boosting the readers blocking a given grace period is controlled by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds). Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney commit 068ffaa8bfb67c2ddb3ecaf38cc90f94a1a92fe3 Author: Arnaldo Carvalho de Melo Date: Sat Nov 27 02:41:01 2010 -0200 perf tools: Fix lost and unknown events handling Fix it by explaining what can be happening and giving the number of processed and lost events. Also holler if unknown events were found, that can be due to processing a perf.data file collected using a newer tool where newer events got added on reporting using an older perf tool, that or a bug, so ask for a report to be made. Works on both --tui and --stdio. Suggested-by: Thomas Gleixner Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian Cc: Thomas Gleixner LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 008f29d3865828bb27e35d6d3fa889d0853b469f Author: Shawn Bohrer Date: Sun Nov 21 10:09:39 2010 -0600 perf trace: Handle DT_UNKNOWN on filesystems that don't support d_type Some filesystems like xfs and reiserfs will return DT_UNKNOWN for the d_type. Handle this case by calling stat() to determine the type. 
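The DT_UNKNOWN fallback described above can be sketched in userspace as follows. This is a minimal sketch and not the code from the perf patch: the helper name entry_is_dir() and its dir_path parameter are made up for illustration.

```c
#define _DEFAULT_SOURCE		/* for DT_* constants on glibc */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if the entry is a directory, 0 if it is not, -1 on error.
 * The helper name and signature are illustrative only.
 */
static int entry_is_dir(const char *dir_path, const struct dirent *ent)
{
	char path[PATH_MAX];
	struct stat st;
	int n;

	if (ent->d_type == DT_DIR)
		return 1;
	if (ent->d_type != DT_UNKNOWN)
		return 0;

	/* The filesystem did not fill in d_type: ask stat() instead. */
	n = snprintf(path, sizeof(path), "%s/%s", dir_path, ent->d_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (stat(path, &st) != 0)
		return -1;
	return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

The stat() call is only reached when d_type is DT_UNKNOWN, so filesystems that do report d_type keep the cheaper path.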
Cc: Andreas Schwab Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1290355779-3276-1-git-send-email-sbohrer@rgmadvisors.com> Signed-off-by: Shawn Bohrer Signed-off-by: Arnaldo Carvalho de Melo commit 9d1faba5fe410558099f13cfada2eab03186769d Author: Ian Munsie Date: Thu Nov 25 15:12:53 2010 +1100 perf symbols: Correct final kernel map guesses If a 32bit userspace perf is running on a 64bit kernel, the end of the final map in the kernel would incorrectly be set to 2^32-1 rather than 2^64-1. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1290658375-10342-1-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Ian Munsie Signed-off-by: Arnaldo Carvalho de Melo commit 37982ba0a0630066a6a0844a66aedaf91c66db84 Author: Arnaldo Carvalho de Melo Date: Fri Nov 26 18:31:54 2010 -0200 perf events: Default to using event__process_lost Tool developers have to fill in a 'perf_event_ops' method table to specify how to handle each event, so far the ones that were not explicitly specified would get a stub that would just discard the event. Change that so that tool developers can get the lost event details and the total number of such events at the end of 'perf report -D' output. Suggested-by: Thomas Gleixner Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian CC: Thomas Gleixner Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit baa2f6cedbfae962f04281a31f08ec29667d31a0 Author: Arnaldo Carvalho de Melo Date: Fri Nov 26 19:39:15 2010 -0200 perf record: Add option to disable collecting build-ids Collecting build-ids for long running sessions may take a long time because it needs to traverse the whole just collected perf.data stream of events, marking the DSOs that had hits and then looking for the .note.gnu.build-id ELF section.
For things like the 'trace' tool that records and right away consumes the data on systems where it's unlikely that the DSOs being monitored will change while 'trace' runs, it is desirable to remove build id collection, so add a -B/--no-buildid option to perf record to allow such use case. Longer term we'll avoid all this if we, at DSO load time, in the kernel, take advantage of this slow code path to collect the build-id and stash it somewhere, so that we can insert it in the PERF_RECORD_MMAP event. Reported-by: Thomas Gleixner Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit af86da5318136eb49c0453c2e2be3280ee5d18d9 Author: Cyrill Gorcunov Date: Fri Nov 26 14:32:09 2010 +0300 perf, x86: P4 PMU - describe config format Add description of .config for the sake of RAW events. At least this should bring some light to those who will be reading this code. Signed-off-by: Cyrill Gorcunov Reviewed-by: Stephane Eranian Cc: Lin Ming Signed-off-by: Ingo Molnar commit 004417a6d468e24399e383645c068b498eed84ad Author: Peter Zijlstra Date: Thu Nov 25 18:38:29 2010 +0100 perf, arch: Cleanup perf-pmu init vs lockup-detector The perf hardware pmu got initialized at various points in the boot, some before early_initcall() some after (notably arch_initcall). The problem is that the NMI lockup detector is run from early_initcall() and expects the hardware pmu to be present. Sanitize this by moving all architecture hardware pmu implementations to initialize at early_initcall() and move the lockup detector to an explicit initcall right after that.
Cc: paulus Cc: davem Cc: Michael Cree Cc: Deng-Cheng Zhu Acked-by: Paul Mundt Acked-by: Will Deacon Signed-off-by: Peter Zijlstra LKML-Reference: <1290707759.2145.119.camel@laptop> Signed-off-by: Ingo Molnar commit 5ef428c4b5950dddce7311e84321abb3aff7ebb0 Author: Andi Kleen Date: Thu Nov 18 11:47:31 2010 +0100 x86: Set cpu masks before calling CPU_STARTING notifiers When booting up a CPU set the various topology masks before calling the CPU_STARTING notifier. This way the notifier can actually use the masks. This is needed for a perf change. Signed-off-by: Andi Kleen Signed-off-by: Peter Zijlstra LKML-Reference: <1290077254-12165-2-git-send-email-andi@firstfloor.org> Signed-off-by: Ingo Molnar commit 963988262c3c8f4234f64a0dde59446a295e07bb Author: Peter Zijlstra Date: Wed Nov 24 18:55:29 2010 +0100 perf: Ignore non-sampling overflows Some arch implementations call perf_event_overflow() by 'accident', ignore this. Reported-by: Francis Moreau Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 5d508e820a23d9b6e8a149dfaa8ba5cbedf3d95c Author: Franck Bui-Huu Date: Tue Nov 23 16:21:45 2010 +0100 perf: Don't bother to init the hrtimer for no SW sampling counters Signed-off-by: Franck Bui-Huu Signed-off-by: Peter Zijlstra LKML-Reference: <1290525705-6265-3-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar commit 2e939d1da9b5628642314c1e68b4319e61263c94 Author: Franck Bui-Huu Date: Tue Nov 23 16:21:44 2010 +0100 perf: Limit event refresh to sampling event Signed-off-by: Franck Bui-Huu Signed-off-by: Peter Zijlstra LKML-Reference: <1290525705-6265-2-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar commit 6c7e550f13f8ad82efb6a5653ae628c2543c1768 Author: Franck Bui-Huu Date: Tue Nov 23 16:21:43 2010 +0100 perf: Introduce is_sampling_event() and use it when appropriate. 
Signed-off-by: Franck Bui-Huu Signed-off-by: Peter Zijlstra LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar commit 35d3778a8fe3c8b4a7513565e34d3bde00ce43ec Author: Peter Zijlstra Date: Wed Nov 24 10:43:55 2010 +0100 scripts/tags.sh: Add magic for trace-events Make tags find the trace-event definitions Signed-off-by: Peter Zijlstra Acked-by: WANG Cong LKML-Reference: <1290591835.2072.438.camel@laptop> Signed-off-by: Ingo Molnar commit 6c869e772c72d509d0db243a56c205ef48a29baf Merge: e4e91ac ee6dcfa Author: Ingo Molnar Date: Fri Nov 26 15:07:02 2010 +0100 Merge branch 'perf/urgent' into perf/core Conflicts: arch/x86/kernel/apic/hw_nmi.c Merge reason: Resolve conflict, queue up dependent patch. Signed-off-by: Ingo Molnar commit b7a2b39d9b7703ccf068f549c8dc3465fc41d015 Author: Nikanth Karthikesan Date: Fri Nov 26 12:37:09 2010 +0530 sched: Remove unused argument dest_cpu to migrate_task() Remove unused argument, 'dest_cpu' of migrate_task(), and pass runqueue, as it is always known at the call site. Signed-off-by: Nikanth Karthikesan Signed-off-by: Peter Zijlstra LKML-Reference: <201011261237.09187.knikanth@suse.de> Signed-off-by: Ingo Molnar commit 335d7afbfb71faac833734a94240c1e07cf0ead8 Author: Gerald Schaefer Date: Mon Nov 22 15:47:36 2010 +0100 mutexes, sched: Introduce arch_mutex_cpu_relax() The spinning mutex implementation uses cpu_relax() in busy loops as a compiler barrier. Depending on the architecture, cpu_relax() may do more than needed in this specific mutex spin loops. On System z we also give up the time slice of the virtual cpu in cpu_relax(), which prevents effective spinning on the mutex. This patch replaces cpu_relax() in the spinning mutex code with arch_mutex_cpu_relax(), which can be defined by each architecture that selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so this patch should not affect other architectures than System z for now. 
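The override-with-default pattern described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: the cpu_relax() stand-in is just a compiler barrier, the fallback is keyed on whether the macro is already defined (the patch keys it on HAVE_ARCH_MUTEX_CPU_RELAX), and spin_until_unowned() is a made-up helper.

```c
#include <stdatomic.h>

/* Userspace stand-in for the kernel's cpu_relax(): a compiler barrier only. */
#define cpu_relax() atomic_signal_fence(memory_order_seq_cst)

/*
 * An architecture could define arch_mutex_cpu_relax() before this point;
 * if it has not, fall back to the generic cpu_relax().
 */
#ifndef arch_mutex_cpu_relax
#define arch_mutex_cpu_relax() cpu_relax()
#endif

/*
 * Spin up to max_spins times waiting for *owner to become 0.
 * Returns 1 if the lock was seen free, 0 if the spin budget ran out.
 */
static int spin_until_unowned(atomic_int *owner, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(owner, memory_order_acquire) == 0)
			return 1;
		arch_mutex_cpu_relax();
	}
	return 0;
}
```

Keeping the default equal to cpu_relax() is what lets the change stay a no-op for architectures that do not opt in.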
Signed-off-by: Gerald Schaefer Signed-off-by: Peter Zijlstra LKML-Reference: <1290437256.7455.4.camel@thinkpad> Signed-off-by: Ingo Molnar commit 22a867d81707b0a2720bb5f65255265b95d30526 Merge: 5bb6b1e 3561d43 Author: Ingo Molnar Date: Fri Nov 26 15:03:27 2010 +0100 Merge commit 'v2.6.37-rc3' into sched/core Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar commit e4e91ac410356da3a518188f371e9d3b52ee38ee Merge: ea7872b 3561d43 Author: Ingo Molnar Date: Fri Nov 26 15:04:42 2010 +0100 Merge commit 'v2.6.37-rc3' into perf/core Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar commit ea7872b9d6a81101f6ba0ec141544a62fea35876 Author: Hitoshi Mitake Date: Thu Nov 25 16:04:53 2010 +0900 perf bench: Add feature that measures the performance of the arch/x86/lib/memcpy_64.S memcpy routines via 'perf bench mem' This patch ports arch/x86/lib/memcpy_64.S to perf bench mem memcpy for benchmarking memcpy() in userland with tricky and dirty way. util/include/asm/cpufeature.h, util/include/asm/dwarf2.h, and util/include/linux/linkage.h are mostly dummy files with small wrappers, so that we are able to include memcpy_64.S unmodified. Signed-off-by: Hitoshi Mitake Cc: h.mitake@gmail.com Cc: Miao Xie Cc: Ma Ling Cc: Zhao Yakui Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Frederic Weisbecker Cc: Steven Rostedt Cc: Andi Kleen LKML-Reference: <1290668693-27068-2-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar commit 49ce8fc651794878189fd5f273228832cdfb5be9 Author: Hitoshi Mitake Date: Thu Nov 25 16:04:52 2010 +0900 perf bench: Print both of prefaulted and no prefaulted results by default After applying this patch, perf bench mem memcpy prints both of prefaulted and without prefaulted score of memcpy(). New options --no-prefault and --only-prefault are added to print single result, mainly for scripting usage.
Usage example: | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB | # Running mem/memcpy benchmark... | # Copying 500MB Bytes ... | | 634.969014 MB/Sec | 4.828062 GB/Sec (with prefault) | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --only-prefault | # Running mem/memcpy benchmark... | # Copying 500MB Bytes ... | | 4.705192 GB/Sec (with prefault) | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --no-prefault | # Running mem/memcpy benchmark... | # Copying 500MB Bytes ... | | 642.725568 MB/Sec Signed-off-by: Hitoshi Mitake Cc: h.mitake@gmail.com Cc: Miao Xie Cc: Ma Ling Cc: Zhao Yakui Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Frederic Weisbecker Cc: Steven Rostedt Cc: Andi Kleen LKML-Reference: <1290668693-27068-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar commit 5bb6b1ea67a73f0665a41726dd7138977b992c6c Author: Peter Zijlstra Date: Fri Nov 19 21:11:09 2010 +0100 sched: Add some clock info to sched_debug Add more clock information to /proc/sched_debug, Thomas wanted to see the sched_clock_stable state. Requested-by: Thomas Gleixner Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 51a96c77815e7f139892a6e9c8275a50e9baebdf Author: Peter Zijlstra Date: Fri Nov 19 20:37:53 2010 +0100 cpu: Remove incorrect BUG_ON Oleg mentioned that there is no actual guarantee the dying cpu's migration thread is actually finished running when we get there, so replace the BUG_ON() with a spinloop waiting for it. 
Reported-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 2e01f4740a874b6085da6ebf541e7ffde9a72bf2 Author: Dhaval Giani Date: Thu Nov 18 15:44:54 2010 +0100 cpu: Remove unused variable GCC warns us about: kernel/cpu.c: In function ‘take_cpu_down’: kernel/cpu.c:200:15: warning: unused variable ‘cpu’ This variable is unused since param->hcpu is directly used later on in cpu_notify. Signed-off-by: Dhaval Giani Signed-off-by: Peter Zijlstra LKML-Reference: <1290091494.1145.5.camel@gondor.retis> Signed-off-by: Ingo Molnar commit 70caf8a6c13c2279b35f2ad6b644815533d6c476 Author: Peter Zijlstra Date: Sat Nov 20 00:53:51 2010 +0100 sched: Fix UP build breakage The recent cgroup-scheduling rework caused a UP build problem. Cc: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 28d0686cf7b14e30243096bd874d3f80591ed392 Author: Erik Gilling Date: Fri Nov 19 18:08:51 2010 -0800 sched: Make task dump print all 15 chars of proc comm Signed-off-by: Erik Gilling Signed-off-by: John Stultz Signed-off-by: Peter Zijlstra LKML-Reference: <1290218934-8544-3-git-send-email-john.stultz@linaro.org> Signed-off-by: Ingo Molnar commit 691513f70d3957939a318da970987b876c720861 Author: Lin Ming Date: Mon Nov 22 14:03:28 2010 +0100 x86: Resume trampoline must be executable commit 5bd5a452(x86: Add NX protection for kernel data) marked the trampoline area NX - which unsurprisingly breaks resume and cpu hotplug. Revert the portion of that commit, which touches the trampoline. 
Originally-from: Lin Ming LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com> Cc: Matthieu Castet Cc: Siarhei Liakh Cc: Xuxian Jiang Cc: Ingo Molnar Cc: Arjan van de Ven Cc: Andi Kleen Tested-by: Peter Zijlstra Signed-off-by: Thomas Gleixner commit d9cf837ef9629ab34167bd6fc0141383ddb8813a Author: Corey Ashford Date: Fri Nov 19 17:37:24 2010 -0800 perf stat: Change and clean up sys_perf_event_open error handling This patch makes several changes to "perf stat": - "perf stat" will no longer go ahead and run the application when one or more of the specified events could not be opened. - Use error() and die() instead of pr_err() so that the output is more consistent with "perf top" and "perf record". - Handle permission errors in a more robust way, and in a similar way to "perf record" and "perf top". In addition, the sys_perf_event_open() error handling of "perf top" and "perf record" is made more consistent and adds the following phrase when an event doesn't open (with something other than an access or permission error): "/bin/dmesg may provide additional information." This is added because kernel code doesn't have a good way of expressing detailed errors to user space, so its only avenue is to use printk's. However, many users may not think of looking at dmesg to find out why an event is being rejected. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Ian Munsie Cc: Michael Ellerman LKML-Reference: <1290217044-26293-1-git-send-email-cjashfor@linux.vnet.ibm.com> Signed-off-by: Corey Ashford Signed-off-by: Arnaldo Carvalho de Melo commit 9cdca869724e766eb48c061967cb777ddb436c76 Author: Thomas Gleixner Date: Sat Nov 20 10:37:05 2010 +0100 x86: platform: Move iris to x86/platform where it belongs Signed-off-by: Thomas Gleixner commit 4aafd3f71a257a3522932944b5204262dfdff147 Author: Arnaldo Carvalho de Melo Date: Fri Nov 19 16:46:26 2010 -0200 perf tools: Change my maintainer address Also remove old snail mail address from CREDITS, moved years ago.
LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit a71123977eb3c72dd5a8bac723b13faf9cdd2828 Author: Robert Morell Date: Tue Nov 16 14:16:33 2010 -0800 perf tools: Remove hardcoded include paths for elfutils This change removes the use of hardcoded absolute "/usr/include/elfutils" paths from the perf build. The problem with hardcoded paths is that it prevents them from being overridden by $prefix or by -I in CFLAGS (e.g., for cross-compiling purposes). Instead, just include the "elfutils/" subdirectory as a relative path when files are needed from that directory. Tested by building perf: - Cross-compiled for ARM on x86_64 - Built natively on x86_64 - Built on x86_64 with /usr/include/elfutils moved to another location and manually included in CFLAGS Acked-by: Masami Hiramatsu Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Ingo Molnar Cc: Masami Hiramatsu LKML-Reference: <1289945793-31441-1-git-send-email-rmorell@nvidia.com> Signed-off-by: Robert Morell Signed-off-by: Arnaldo Carvalho de Melo commit f5b4a9c3ab53d544a540a6f3a5d17184e374d91a Author: Stephane Eranian Date: Tue Nov 16 11:05:01 2010 +0200 perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. 
Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Robert Richter LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian Signed-off-by: Arnaldo Carvalho de Melo commit 042957801626465492b9428860de39a3cb2a8219 Author: Steven Rostedt Date: Fri Nov 12 22:32:11 2010 -0500 tracing/events: Show real number in array fields Currently we have in something like the sched_switch event: field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1; When a userspace tool such as perf tries to parse this, the TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro simply uses a #len to show the string of the length. When the length is an enum, we get a string that means nothing for tools. By adding a static buffer and a mutex to protect it, we can store the string into that buffer with snprintf and show the actual number. Now we get: field:char prev_comm[16]; offset:12; size:16; signed:1; Something much more useful. 
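The difference between stringifying the length and printing its value can be sketched as below. The macro and function names are hypothetical, and the sketch formats into a caller-supplied buffer rather than the static, mutex-protected buffer the message describes.

```c
#include <stddef.h>
#include <stdio.h>

/* An enum constant, as in the sched_switch example above. */
enum { TASK_COMM_LEN = 16 };

/*
 * Stringifying the length argument yields its spelling, not its value,
 * so an enum name ends up verbatim in the description.
 */
#define FIELD_DESC_STRINGIFIED(type, name, len) \
	#type " " #name "[" #len "]"

/* Formatting the evaluated length gives parsers a real number instead. */
static int field_desc_numeric(char *buf, size_t size, const char *type,
			      const char *name, int len)
{
	return snprintf(buf, size, "%s %s[%d]", type, name, len);
}
```

With these definitions, FIELD_DESC_STRINGIFIED(char, prev_comm, TASK_COMM_LEN) expands to "char prev_comm[TASK_COMM_LEN]", while field_desc_numeric(buf, sizeof(buf), "char", "prev_comm", TASK_COMM_LEN) writes "char prev_comm[16]".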
Signed-off-by: Steven Rostedt commit 45677454dd6d128608117abe7dcd2bdfdd7cdf72 Author: Wu Zhangjin Date: Thu Oct 28 00:24:34 2010 +0800 ftrace: Speed up recordmcount cmd_record_mcount is used to locate the _mcount symbols in the object files. Only the files compiled with -pg have the _mcount symbol, so it is only needed for such files, but the current cmd_record_mcount is used for all of the object files, so we need to fix it and speed it up. Since -pg may be removed by the method used in kernel/trace/Makefile: ORIG_CFLAGS := $(KBUILD_CFLAGS) KBUILD_CFLAGS = $(subst -pg,,$(ORIG_CFLAGS)) Or may be removed by the method used in arch/x86/kernel/Makefile: CFLAGS_REMOVE_file.o = -pg So, we must check the last variable that stores the compiling flags, that is c_flags (please refer to cmd_cc_o_c and rule_cc_o_c defined in scripts/Makefile.build), and since CFLAGS_REMOVE_file.o is already filtered in _c_flags (please refer to scripts/Makefile.lib) and _c_flags has fewer symbols, we only need to check _c_flags. --------------- Changes from v1: o Don't touch Makefile for CONFIG_FTRACE_MCOUNT_RECORD is enough o Use _c_flags instead of KBUILD_CFLAGS to cover CFLAGS_REMOVE_file.o = -pg (feedback from Steven Rostedt) Acked-by: Michal Marek Signed-off-by: Wu Zhangjin LKML-Reference: <3dc8cddf022eb7024f9f2cf857529a15bee8999a.1288196498.git.wuzhangjin@gmail.com> [ changed if [ .. == .. ] to if [ .. = .. ] to handle dash environments ] Signed-off-by: Steven Rostedt commit 5ca9afdb9f6a5267927b54de3f42c756e8af7fcd Author: Vasiliy Kulikov Date: Thu Nov 18 21:16:45 2010 +0300 x86, mrst: Check platform_device_register() return code platform_device_register() may fail, if so propagate the return code from mrst_device_create(). Signed-off-by: Vasiliy Kulikov LKML-Reference: <1290104207-31279-1-git-send-email-segoon@openwall.com> Acked-by: Alan Cox Signed-off-by: H.
Peter Anvin commit ae51ce9061b1ddc0fde363913c932bee5b9bc5fd Merge: 072b198 423478c Author: Ingo Molnar Date: Thu Nov 18 20:07:12 2010 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core commit f658bcfb2607bf0808966a69cf74135ce98e5c2d Author: Hans Rosenfeld Date: Fri Oct 29 17:14:32 2010 +0200 x86, cacheinfo: Cleanup L3 cache index disable support Adaptions to the changes of the AMD northbridge caching code: instead of a bool in each l3 struct, use a flag in amd_northbridges.flags to indicate L3 cache index disable support; use a pointer to the whole northbridge instead of the misc device in the l3 struct; simplify the initialisation; dynamically generate sysfs attribute array. Signed-off-by: Hans Rosenfeld Signed-off-by: Borislav Petkov commit 9653a5c76c8677b05b45b3b999d3b39988d2a064 Author: Hans Rosenfeld Date: Fri Oct 29 17:14:31 2010 +0200 x86, amd-nb: Cleanup AMD northbridge caching code Support more than just the "Misc Control" part of the northbridges. Support more flags by turning "gart_supported" into a single bit flag that is stored in a flags member. Clean up related code by using a set of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb()) instead of accessing the NB data structures directly. Reorder the initialization code and put the GART flush words caching in a separate function. Signed-off-by: Hans Rosenfeld Signed-off-by: Borislav Petkov commit eec1d4fa00c6552ae2fdf71d59f1eded7c88dd89 Author: Hans Rosenfeld Date: Fri Oct 29 17:14:30 2010 +0200 x86, amd-nb: Complete the rename of AMD NB and related code Not only the naming of the files was confusing, it was even more so for the function and variable names. Renamed the K8 NB and NUMA stuff that is also used on other AMD platforms. This also renames the CONFIG_K8_NUMA option to CONFIG_AMD_NUMA and the related file k8topology_64.c to amdtopology_64.c. No functional changes intended. 
Signed-off-by: Hans Rosenfeld Signed-off-by: Borislav Petkov commit 423478cde453eebdfcfebf4b8d378d8f5d49b853 Author: Frederic Weisbecker Date: Thu Nov 18 02:21:26 2010 +0100 tracing: Remove useless syscall ftrace_event_call declaration It is defined right after, which makes the declaration completely useless. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Thomas Gleixner Cc: Steven Rostedt Cc: Li Zefan Cc: Jason Baron commit 53cf810b1934f08a68e131aeeb16267a778f43df Author: Frederic Weisbecker Date: Thu Nov 18 02:11:42 2010 +0100 tracing: Allow syscall trace events for non privileged users As for the raw syscalls events, individual syscall events won't leak system wide information on task bound tracing. Allow non privileged users to use them in such workflow. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Thomas Gleixner Cc: Steven Rostedt Cc: Li Zefan Cc: Jason Baron commit fe5542030dce3b951f9eaf3ecb9a7bc5fa7bfed1 Author: Frederic Weisbecker Date: Thu Nov 18 01:52:06 2010 +0100 tracing: Allow raw syscall trace events for non privileged users This allows non privileged users to use the raw syscall trace events for task bound tracing in perf. It is safe because raw syscall trace events don't leak system wide informations. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Thomas Gleixner Cc: Steven Rostedt Cc: Li Zefan Cc: Jason Baron commit 1ed0c5971159974185653170543a764cc061c857 Author: Frederic Weisbecker Date: Thu Nov 18 01:46:57 2010 +0100 tracing: New macro to set up initial event flags value This introduces the new TRACE_EVENT_FLAGS() macro in order to set up initial event flags value. This macro must simply follow the definition of a trace event and take the event name and the flag value as parameters: TRACE_EVENT(my_event, ..... .... 
); TRACE_EVENT_FLAGS(my_event, 1) This will set up 1 as the initial my_event->flags value. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Thomas Gleixner Cc: Steven Rostedt Cc: Li Zefan Cc: Jason Baron commit 61c32659b12c44e62de32fbf99f7e4ca783dc38b Author: Frederic Weisbecker Date: Thu Nov 18 01:39:17 2010 +0100 tracing: New flag to allow non privileged users to use a trace event This adds a new trace event internal flag that allows them to be used in perf by non privileged users in case of task bound tracing. This is desired for syscall tracepoints because they don't leak global system information, like some other tracepoints. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Thomas Gleixner Cc: Steven Rostedt Cc: Li Zefan Cc: Jason Baron commit 9c0729dc8062bed96189bd14ac6d4920f3958743 Author: Soeren Sandmann Pedersen Date: Fri Nov 5 05:59:39 2010 -0400 x86: Eliminate bp argument from the stack tracing routines The various stack tracing routines take a 'bp' argument in which the caller is supposed to provide the base pointer to use, or 0 if it doesn't have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not defined, this means all callers in principle should either always pass 0, or be conditional on CONFIG_FRAME_POINTER. However, there are only really three use cases for stack tracing: (a) Trace the current task, including IRQ stack if any (b) Trace the current task, but skip IRQ stack (c) Trace some other task In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just be 0. If it _is_ defined, then - in case (a) bp should be gotten directly from the CPU's register, so the caller should pass NULL for regs, - in case (b) the caller should pass the IRQ registers to dump_trace(), - in case (c) bp should be gotten from the top of the task's stack, so the caller should pass NULL for regs.
Hence, the bp argument is not necessary because the combination of task and regs is sufficient to determine an appropriate value for bp. This patch introduces a new inline function stack_frame(task, regs) that computes the desired bp. This function is then called from the two versions of dump_stack(). Signed-off-by: Soren Sandmann Acked-by: Steven Rostedt Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. Peter Anvin Cc: Peter Zijlstra Cc: Arjan van de Ven Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo LKML-Reference: Signed-off-by: Frederic Weisbecker commit 84e1c6bb38eb318e456558b610396d9f1afaabf0 Author: matthieu castet Date: Tue Nov 16 22:35:16 2010 +0100 x86: Add RO/NX protection for loadable kernel modules This patch is a logical extension of the protection provided by CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting module_core and module_init into three logical parts each and setting appropriate page access permissions for each individual section: 1. Code: RO+X 2. RO data: RO+NX 3. RW data: RW+NX In order to achieve proper protection, layout_sections() has been modified to align each of the three parts mentioned above onto page boundary. Next, the corresponding page access permissions are set right before successful exit from load_module(). Further, free_module() and sys_init_module have been modified to set module_core and module_init as RW+NX right before calling module_free(). By default, the original section layout and access flags are preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y, the patch will page-align each group of sections to ensure that each page contains only one type of content and will enforce RO/NX for each group of pages. -v1: Initial proof-of-concept patch. -v2: The patch has been re-written to reduce the number of #ifdefs and to make it architecture-agnostic. Code formatting has also been corrected. -v3: Opportunistic RO/NX protection is now unconditional.
Section page-alignment is enabled when CONFIG_DEBUG_RODATA=y. -v4: Removed most macros and improved coding style. -v5: Changed page-alignment and RO/NX section size calculation -v6: Fixed comments. Restricted RO/NX enforcement to x86 only -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added calls to set_all_modules_text_rw() and set_all_modules_text_ro() in ftrace -v8: updated for compatibility with linux 2.6.33-rc5 -v9: coding style fixes -v10: more coding style fixes -v11: minor adjustments for -tip -v12: minor adjustments for v2.6.35-rc2-tip -v13: minor adjustments for v2.6.37-rc1-tip Signed-off-by: Siarhei Liakh Signed-off-by: Xuxian Jiang Acked-by: Arjan van de Ven Reviewed-by: James Morris Signed-off-by: H. Peter Anvin Cc: Andi Kleen Cc: Rusty Russell Cc: Stephen Rothwell Cc: Dave Jones Cc: Kees Cook Cc: Linus Torvalds LKML-Reference: <4CE2F914.9070106@free.fr> [ minor cleanliness edits, -v14: build failure fix ] Signed-off-by: Ingo Molnar commit 9437178f623a19af5951808d880a8599f66ac150 Author: Paul Turner Date: Mon Nov 15 15:47:10 2010 -0800 sched: Update tg->shares after cpu.shares write Formerly sched_group_set_shares would force a rebalance by overflowing domain share sums. Now that per-cpu averages are maintained we can set the true value by issuing an update_cfs_shares() following a tg->shares update. Also initialize tg se->load to 0 for consistency since we'll now set correct weights on enqueue. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234938.465521344@google.com> Signed-off-by: Ingo Molnar commit d6b5591829bd348a5fbe1c428d28dea00621cdba Author: Paul Turner Date: Mon Nov 15 15:47:09 2010 -0800 sched: Allow update_cfs_load() to update global load Refactor the global load updates from update_shares_cpu() so that update_cfs_load() can update global load when it is more than ~10% out of sync. 
The new global_load parameter allows us to force an update, regardless of the error factor so that we can synchronize w/ update_shares(). Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234938.377473595@google.com> Signed-off-by: Ingo Molnar commit 3b3d190ec3683d568fd2ebaead5e1ec7f97b6e37 Author: Paul Turner Date: Mon Nov 15 15:47:08 2010 -0800 sched: Implement demand based update_cfs_load() When the system is busy, dilation of rq->next_balance makes lb->update_shares() insufficiently frequent for threads which don't sleep (no dequeue/enqueue updates). Adjust for this by making demand based updates based on the accumulation of execution time sufficient to wrap our averaging window. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234938.291159744@google.com> Signed-off-by: Ingo Molnar commit c66eaf619c0c7937e9ded160ae83b5a7a6b19b56 Author: Paul Turner Date: Mon Nov 15 15:47:07 2010 -0800 sched: Update shares on idle_balance Since shares updates are no longer expensive and effectively local, update them at idle_balance(). This allows us to more quickly redistribute shares to another cpu when our load becomes idle. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234938.204191702@google.com> Signed-off-by: Ingo Molnar commit a7a4f8a752ec734b2eab904fc863d5dc873de338 Author: Paul Turner Date: Mon Nov 15 15:47:06 2010 -0800 sched: Add sysctl_sched_shares_window Introduce a new sysctl for the shares window and disambiguate it from sched_time_avg. A 10ms window appears to be a good compromise between accuracy and performance. 
Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234938.112173964@google.com> Signed-off-by: Ingo Molnar commit 67e86250f8ea7b8f7da53ac25ea73c6bd71f5cd9 Author: Paul Turner Date: Mon Nov 15 15:47:05 2010 -0800 sched: Introduce hierarchal order on shares update list Avoid duplicate shares update calls by ensuring children always appear before parents in rq->leaf_cfs_rq_list. This allows us to do a single in-order traversal for update_shares(). Since we always enqueue in bottom-up order this reduces to 2 cases: 1) Our parent is already in the list, e.g. root \ b /\ c d* (root->b->c already enqueued) Since d's parent is enqueued we push it to the head of the list, implicitly ahead of b. 2) Our parent does not appear in the list (or we have no parent) In this case we enqueue to the tail of the list, if our parent is subsequently enqueued (bottom-up) it will appear to our right by the same rule. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234938.022488865@google.com> Signed-off-by: Ingo Molnar commit e33078baa4d30ad1d0e46d1f62b9e5a63a3e6ee3 Author: Paul Turner Date: Mon Nov 15 15:47:04 2010 -0800 sched: Fix update_cfs_load() synchronization Using cfs_rq->nr_running is not sufficient to synchronize update_cfs_load with the put path since nr_running accounting occurs at deactivation. It's also not safe to make the removal decision based on load_avg as this fails with both high periods and low shares. Resolve this by clipping history after 4 periods without activity. Note: the above will always occur from update_shares() since in the last-task-sleep-case that task will still be cfs_rq->curr when update_cfs_load is called. 
Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234937.933428187@google.com> Signed-off-by: Ingo Molnar commit f0d7442a5924a802b66eef79b3708f77297bfb35 Author: Paul Turner Date: Mon Nov 15 15:47:03 2010 -0800 sched: Fix load corruption from update_cfs_shares() As part of enqueue_entity both a new entity weight and its contribution to the queuing cfs_rq / rq are updated. Since update_cfs_shares will only update the queueing weights when the entity is on_rq (which in this case it is not yet), there's a dependency loop here: update_cfs_shares needs account_entity_enqueue to update cfs_rq->load.weight account_entity_enqueue needs the updated weight for the queuing cfs_rq load[*] Fix this and avoid spurious dequeue/enqueues by issuing update_cfs_shares as if we had accounted the enqueue already. This was also resulting in rq->load corruption previously. [*]: this dependency also exists when using the group cfs_rq w/ update_cfs_shares as the weight of the enqueued entity changes without the load being updated. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234937.844900206@google.com> Signed-off-by: Ingo Molnar commit 9e3081ca61147b29f52fddb4f7c6b6b82ea5eb7a Author: Peter Zijlstra Date: Mon Nov 15 15:47:02 2010 -0800 sched: Make tg_shares_up() walk on-demand Make tg_shares_up() use the active cgroup list. This means we cannot do a strict bottom-up walk of the hierarchy, but assuming it's a very wide tree with a small number of active groups it should be a win. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234937.754159484@google.com> Signed-off-by: Ingo Molnar commit 3d4b47b4b040c9d77dd68104cfc1055d89a55afd Author: Peter Zijlstra Date: Mon Nov 15 15:47:01 2010 -0800 sched: Implement on-demand (active) cfs_rq list Make certain load-balance actions scale per number of active cgroups instead of the number of existing cgroups.
This makes wakeup/sleep paths more expensive, but is a win for systems where the vast majority of existing cgroups are idle. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234937.666535048@google.com> Signed-off-by: Ingo Molnar commit 2069dd75c7d0f49355939e5586daf5a9ab216db7 Author: Peter Zijlstra Date: Mon Nov 15 15:47:00 2010 -0800 sched: Rewrite tg_shares_up() By tracking a per-cpu load-avg for each cfs_rq and folding it into a global task_group load on each tick we can rework tg_shares_up to be strictly per-cpu. This should improve cpu-cgroup performance for smp systems significantly. [ Paul: changed to use queueing cfs_rq + bug fixes ] Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20101115234937.580480400@google.com> Signed-off-by: Ingo Molnar commit 48c5ccae88dcd989d9de507e8510313c6cbd352b Author: Peter Zijlstra Date: Sat Nov 13 19:32:29 2010 +0100 sched: Simplify cpu-hot-unplug task migration While discussing the need for sched_idle_next(), Oleg remarked that since try_to_wake_up() ensures sleeping tasks will end up running on a sane cpu, we can do away with migrate_live_tasks(). If we then extend the existing hack of migrating current from CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the need for the sched_idle_next() abomination disappears as well, since idle will be the only possible thread left after the migration thread stops. This greatly simplifies the hot-unplug task migration path, as can be seen from the resulting code reduction (and about half the new lines are comments). Suggested-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: <1289851597.2109.547.camel@laptop> Signed-off-by: Ingo Molnar commit 92fd4d4d67b945c0766416284d4ab236b31542c4 Merge: fe7de49 e53beac Author: Ingo Molnar Date: Thu Nov 18 13:22:14 2010 +0100 Merge commit 'v2.6.37-rc2' into sched/core Merge reason: Move to a .37-rc base.
Signed-off-by: Ingo Molnar commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c Author: Matthieu Castet Date: Tue Nov 16 22:31:26 2010 +0100 x86: Add NX protection for kernel data This patch expands functionality of CONFIG_DEBUG_RODATA to set main (static) kernel data area as NX. The following steps are taken to achieve this: 1. Linker script is adjusted so .text always starts and ends on a page bound 2. Linker script is adjusted so .rodata always start and end on a page boundary 3. NX is set for all pages from _etext through _end in mark_rodata_ro. 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c 5. bios rom is set to x when pcibios is used. The results of patch application may be observed in the diff of kernel page table dumps: pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc00a0000 640K RW GLB NX pte +0xc00a0000-0xc0100000 384K RW GLB x pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte No pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc0100000 1M RW GLB NX pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte The patch has been 
originally developed for Linux 2.6.34-rc2 x86 by Siarhei Liakh and Xuxian Jiang . -v1: initial patch for 2.6.30 -v2: patch for 2.6.31-rc7 -v3: moved all code into arch/x86, adjusted credits -v4: fixed ifdef, removed credits from CREDITS -v5: fixed an address calculation bug in mark_nxdata_nx() -v6: added acked-by and PT dump diff to commit log -v7: minor adjustments for -tip -v8: rework with the merge of "Set first MB as RW+NX" Signed-off-by: Siarhei Liakh Signed-off-by: Xuxian Jiang Signed-off-by: Matthieu CASTET Cc: Arjan van de Ven Cc: James Morris Cc: Andi Kleen Cc: Rusty Russell Cc: Stephen Rothwell Cc: Dave Jones Cc: Kees Cook Cc: Linus Torvalds LKML-Reference: <4CE2F82E.60601@free.fr> [ minor cleanliness edits ] Signed-off-by: Ingo Molnar commit 64edc8ed5ffae999d8d413ba006850e9e34166cb Author: matthieu castet Date: Tue Nov 16 22:30:27 2010 +0100 x86: Fix improper large page preservation This patch fixes a bug in try_preserve_large_page() which may result in improper large page preservation and improper application of page attributes to the memory area outside of the original change request. More specifically, the problem manifests itself when set_memory_*() is called for several pages at the beginning of the large page and try_preserve_large_page() erroneously concludes that the change can be applied to whole large page. The fix consists of 3 parts: 1. Addition of "required" protection attributes in static_protections(), so .data and .bss can be guaranteed to stay "RW" 2. static_protections() is now called for every small page within large page to determine compatibility of new protection attributes (instead of just small pages within the requested range). 3. Large page can be preserved only if attribute change is large-page-aligned and covers whole large page. 
-v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2 -v2: Replaced pfn check with address check for kernel rw-data Signed-off-by: Siarhei Liakh Signed-off-by: Xuxian Jiang Reviewed-by: Suresh Siddha Cc: Arjan van de Ven Cc: James Morris Cc: Andi Kleen Cc: Rusty Russell Cc: Stephen Rothwell Cc: Dave Jones Cc: Kees Cook Cc: Linus Torvalds LKML-Reference: <4CE2F7F3.8030809@free.fr> Signed-off-by: Ingo Molnar commit 82148d1d0b2f369851f2dff5088f7840f9f16abf Author: Shérab Date: Sat Sep 25 06:06:57 2010 +0200 x86/platform: Add Eurobraille/Iris power off support The Iris machines from Eurobraille do not have APM or ACPI support to shut themselves down properly. A special I/O sequence is needed to do so. This module runs this I/O sequence at kernel shutdown when its force parameter is set to 1. Signed-off-by: Shérab Acked-by: "H. Peter Anvin" [ did minor coding style edits ] Signed-off-by: Ingo Molnar commit 79250af2d5953b69380a6319b493862bf4ece972 Author: Kees Cook Date: Tue Nov 16 10:10:04 2010 -0800 x86: Fix included-by file reference comments Adjust the paths for files that are including verify_cpu.S. Reported-by: Yinghai Lu Signed-off-by: Kees Cook Acked-by: Pekka Enberg Cc: Alan Cox LKML-Reference: <1289931004-16066-1-git-send-email-kees.cook@canonical.com> Signed-off-by: Ingo Molnar commit 072b198a4ad48bd722ec6d203d65422a4698eae7 Author: Don Zickus Date: Fri Nov 12 11:22:24 2010 -0500 x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog Now that the bulk of the old nmi_watchdog is gone, remove all the stub variables and hooks associated with it. This touches lots of files mainly because of how the io_apic nmi_watchdog was implemented. Now that the io_apic nmi_watchdog is forever gone, remove all its fingers. Most of this code was not being exercised by virtue of nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything too risky here.
Signed-off-by: Don Zickus Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar commit 5f2b0ba4d94b3ac23cbc4b7f675d98eb677a760a Author: Don Zickus Date: Fri Nov 12 11:22:23 2010 -0500 x86, nmi_watchdog: Remove the old nmi_watchdog Now that we have a new nmi_watchdog that is more generic and sits on top of the perf subsystem, we really do not need the old nmi_watchdog any more. In addition, the old nmi_watchdog doesn't really work if you are using the default clocksource, hpet. The old nmi_watchdog code relied on local apic interrupts to determine if the cpu is still alive. With hpet as the clocksource, these interrupts don't increment any more and the old nmi_watchdog triggers false positives. This piece removes the old nmi_watchdog code and stubs out any variables and function calls. The stubs are the same ones used by the new nmi_watchdog code, so it should be well tested. Signed-off-by: Don Zickus Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar commit b2c0710c464ede15e1fc52fb1e7ee9ba54cea186 Author: Paul E. McKenney Date: Thu Sep 9 13:40:39 2010 -0700 rcu: move TINY_RCU from softirq to kthread If RCU priority boosting is to be meaningful, callback invocation must be boosted in addition to preempted RCU readers. Otherwise, in presence of CPU real-time threads, the grace period ends, but the callbacks don't get invoked. If the callbacks don't get invoked, the associated memory doesn't get freed, so the system is still subject to OOM. But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit moves the callback invocations to a kthread, which can be boosted easily. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E.
McKenney commit d3e1884bc585a43674d2cb0d3f0aeeb0ae43bc04 Author: Feng Tang Date: Wed Nov 17 12:11:24 2010 +0000 x86, mrst: Add explanation for using 1960 as the year offset for vrtc Explain the reason for the apparently odd choice of year offset so we don't get more questions about it. Signed-off-by: Feng Tang Signed-off-by: Alan Cox LKML-Reference: <20101117121050.9998.89348.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner commit ad02519a0d27da4a0a50cbc696e810c94e27c28e Author: Randy Dunlap Date: Mon Nov 15 10:14:06 2010 -0800 x86, mrst: Fix dependencies of "select INTEL_SCU_IPC" commit b9fc71f47 (x86, mrst: The shutdown for MRST requires the SCU IPC mechanism) introduced the following warning: warning: (X86_MRST && PCI && PCI_GOANY && X86_32 && X86_EXTENDED_PLATFORM && X86_IO_APIC) selects INTEL_SCU_IPC which has unmet direct dependencies (X86 && X86_PLATFORM_DEVICES && X86_MRST) which is due to the hierarchical menu structure. Select X86_PLATFORM_DEVICES as well. Originally-from: Randy Dunlap Signed-off-by: Thomas Gleixner LKML-Reference: <20101115101406.77e072ef.randy.dunlap@oracle.com> Cc: Alan Cox commit b9fc71f47dc060c588e5099638242fad44eeecbc Author: Alan Cox Date: Mon Nov 15 17:31:19 2010 +0000 x86, mrst: The shutdown for MRST requires the SCU IPC mechanism Fix the build failure reported by Randy. Reported-by: Randy Dunlap Signed-off-by: Alan Cox LKML-Reference: <20101115173110.6877.83958.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner commit 133dc4c39c57eeef2577ca5b4ed24765b7a78ce2 Author: Ingo Molnar Date: Tue Nov 16 18:45:39 2010 +0100 perf: Rename 'perf trace' to 'perf script' Free the perf trace name space and rename the trace to 'script' which is a better match for the scripting engine. 
Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit 37bc9f5078c62bfa73edeb0053edceb3ed5e46a4 Author: Dirk Brandewie Date: Tue Nov 9 12:08:08 2010 -0800 x86: Ce4100: Add reboot_fixup() for CE4100 This patch adds the CE4100 reboot fixup to reboot_fixups_32.c [ tglx: Moved PCI id to reboot_fixups_32.c ] Signed-off-by: Dirk Brandewie LKML-Reference: <5bdcfb4f0206fa721570504e95659a03b815bc5e.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner commit 91d8037f563e4a86ff8b02c994530989c7936427 Author: Dirk Brandewie Date: Tue Nov 9 12:08:05 2010 -0800 ce4100: Add PCI register emulation for CE4100 This patch provides access methods for PCI registers that misbehave on the CE4100. Each register can be assigned a private init, read and write routine. The exception to this is the bridge device. The bridge device is the only device on bus zero (0) that requires any fixup so it is a special case. [ tglx: minor coding style cleanups, __init annotation and simplification of ce4100_conf_read/write ] Signed-off-by: Dirk Brandewie LKML-Reference: <40b6751381c2275dc359db5a17989cce22ad8db7.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner commit c751e17b5371ad86cdde6cf5c0175e06f3ff0347 Author: Thomas Gleixner Date: Tue Nov 9 12:08:04 2010 -0800 x86: Add CE4100 platform support Add CE4100 platform support. CE4100 needs early setup like moorestown. Signed-off-by: Thomas Gleixner Signed-off-by: Dirk Brandewie LKML-Reference: <94720fd7f5564a12ebf202cf2c4f4c0d619aab35.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner commit 6f207e9bb4219d261d9326597ca533f954f31755 Author: Feng Tang Date: Thu Nov 11 15:50:50 2010 +0000 x86: mrst: Set vRTC's IRQ to level trigger type When setting up the mpc_intsrc structure for vRTC's IRQ, we need to set its irqflag to level trigger, otherwise it will be taken as edge triggered and the vRTC IRQ will fire only once, as there is never an EOI issued from the IA core for it.
The original code worked in previous kernel. This is because it was configured to level trigger type by luck. It fell into the default PCI trigger category which is level triggered. Signed-off-by: Feng Tang Signed-off-by: Alan Cox LKML-Reference: <20101111155019.12924.569.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner commit 86071535f845fd054753122e564cee9406c84e70 Author: Vinod Koul Date: Wed Nov 10 17:40:48 2010 +0000 x86: mrst: Add audio driver bindings This patch adds the sound card bindings for Moorestown (pmic_audio) and the Medfield platform (msic_audio) as IPC devices. This ensures they will be created at the right time. Signed-off-by: Vinod Koul Signed-off-by: Alan Cox LKML-Reference: <20101110174044.11340.78008.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner commit 0146f26145af75d53e12dbf23a36996aff373680 Author: Feng Tang Date: Wed Nov 10 17:29:17 2010 +0000 rtc: Add drivers/rtc/rtc-mrst.c Provide the standard kernel rtc driver interface on top of the vrtc layer added in the previous patch. Signed-off-by: Feng Tang LKML-Reference: <20101110172911.3311.20593.stgit@localhost.localdomain> [Fixed swapped arguments on IPC] Signed-off-by: Arjan van de Ven [Cleaned up and the device creation moved to arch/x86/platform] Signed-off-by: Alan Cox Signed-off-by: Thomas Gleixner commit 7309282c90d251cde77fe3b520a8276e25315c49 Author: Feng Tang Date: Wed Nov 10 17:29:00 2010 +0000 x86: mrst: Add vrtc driver which serves as a wall clock device Moorestown platform doesn't have a m146818 RTC device like traditional x86 PC, but a firmware emulated virtual RTC device(vrtc), which provides some basic RTC functions like get/set time. vrtc serves as the only wall clock device on Moorestown platform. 
[ tglx: Changed the exports to _GPL ] Signed-off-by: Feng Tang Signed-off-by: Jacob Pan Signed-off-by: Alan Cox LKML-Reference: <20101110172837.3311.40483.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner commit cfb505a7ebd4c84206b4cc7d9f966d864a2ac05a Author: Alek Du Date: Wed Nov 10 16:50:08 2010 +0000 x86: mrst: Add Moorestown specific reboot/shutdown support Moorestown needs to use a special IPC command to reboot or shutdown the platform. Signed-off-by: Alek Du Signed-off-by: Alan Cox LKML-Reference: <20101110164928.6365.94243.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner commit 6036f373ea03687d355634fa70fb04baa95ab75e Author: Kees Cook Date: Wed Nov 10 10:35:54 2010 -0800 x86, cpu: Only CPU features determine NX capabilities Fix the NX feature boot warning when NX is missing to correctly reflect that BIOSes cannot disable NX now. Signed-off-by: Kees Cook LKML-Reference: <1289414154-7829-5-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg Acked-by: Alan Cox Signed-off-by: H. Peter Anvin commit ebba638ae723d8a8fc2f7abce5ec18b688b791d7 Author: Kees Cook Date: Wed Nov 10 10:35:53 2010 -0800 x86, cpu: Call verify_cpu during 32bit CPU startup The XD_DISABLE-clearing side-effect needs to happen for both 32bit and 64bit, but the 32bit init routines were not calling verify_cpu() yet. This adds that call to gain the side-effect. The longmode/SSE tests being performed in verify_cpu() need to happen very early for 64bit but not for 32bit. Instead of including it in two places for 32bit, we can just include it once in arch/x86/kernel/head_32.S. Signed-off-by: Kees Cook LKML-Reference: <1289414154-7829-4-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg Acked-by: Alan Cox Signed-off-by: H.
Peter Anvin commit ae84739c27b6b3725993202fe02ff35ab86468e1 Author: Kees Cook Date: Wed Nov 10 10:35:52 2010 -0800 x86, cpu: Clear XD_DISABLED flag on Intel to regain NX Intel CPUs have an additional MSR bit to indicate if the BIOS was configured to disable the NX cpu feature. This bit was traditionally used for operating systems that did not understand how to handle the NX bit. Since Linux understands this, this BIOS flag should be ignored by default. In a review[1] of reported hardware being used by Ubuntu bug reporters, almost 10% of systems had an incorrectly configured BIOS, leaving their systems unable to use the NX features of their CPU. This change will clear the MSR_IA32_MISC_ENABLE_XD_DISABLE bit so that NX cannot be inappropriately controlled by the BIOS on Intel CPUs. If, under very strange hardware configurations, NX actually needs to be disabled, "noexec=off" can be used to restore the prior behavior. [1] http://www.outflux.net/blog/archives/2010/02/18/data-mining-for-nx-bit/ Signed-off-by: Kees Cook LKML-Reference: <1289414154-7829-3-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg Acked-by: Alan Cox Signed-off-by: H. Peter Anvin commit c5cbac69422a9bffe7c7fd9a115130e272b547f5 Author: Kees Cook Date: Wed Nov 10 10:35:51 2010 -0800 x86, cpu: Rename verify_cpu_64.S to verify_cpu.S The code is 32bit already, and can be used in 32bit routines. Signed-off-by: Kees Cook LKML-Reference: <1289414154-7829-2-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg Acked-by: Alan Cox Signed-off-by: H. Peter Anvin commit 5bdb05f91b27b9361c4f348a4e05999f597df72e Author: Darren Hart Date: Mon Nov 8 13:40:28 2010 -0800 futex: Add futex_q static initializer The futex_q struct has grown considerably over the last couple years. I believe it now merits a static initializer to avoid uninitialized data errors (having spent more time than I care to admit debugging an uninitialized q.bitset in an experimental new op code). 
With the key initializer built in, several of the FUTEX_KEY_INIT calls can be removed. V2: use a static variable instead of an init macro. use a C99 initializer and don't rely on variable ordering in the struct. V3: make futex_q_init const Signed-off-by: Darren Hart Cc: Peter Zijlstra Cc: Eric Dumazet Cc: John Kacur Cc: Ingo Molnar LKML-Reference: <1289252428-18383-1-git-send-email-dvhart@linux.intel.com> Signed-off-by: Thomas Gleixner commit b41277dc7a18ee332d9e8078e978bacdf6e76157 Author: Darren Hart Date: Mon Nov 8 13:10:09 2010 -0800 futex: Replace fshared and clockrt with combined flags In the early days we passed the mmap sem around. That became the "int fshared" with the fast gup improvements. Then we added "int clockrt" in places. This patch unifies these options as "flags". [ tglx: Split out the stale fshared cleanup ] Signed-off-by: Darren Hart Cc: Peter Zijlstra Cc: Eric Dumazet Cc: John Kacur Cc: Ingo Molnar LKML-Reference: <1289250609-16304-1-git-send-email-dvhart@linux.intel.com> Signed-off-by: Thomas Gleixner commit ae791a2d2e382adc69990a144a7f1a6c4bc24f1e Author: Thomas Gleixner Date: Wed Nov 10 13:30:36 2010 +0100 futex: Cleanup stale fshared flag interfaces The fast GUP changes stopped using the fshared flag in put_futex_keys(), but we kept the interface the same. Clean up all stale users. This patch is split out from Darren Hart's combo patch, which also combines various flags. This way the changes are clearly separated. Signed-off-by: Thomas Gleixner Cc: Darren Hart LKML-Reference: <1289250609-16304-1-git-send-email-dvhart@linux.intel.com> commit c7657ac0c3e4d4ab569296911164b7a2b0ff871a Author: Borislav Petkov Date: Mon Nov 1 23:36:53 2010 +0100 x86, microcode, AMD: Cleanup code a bit get_ucode_data is a memcpy() wrapper which always returns 0. Move it into the header and make it an inline. Remove all code checking its return value and turn it into a void. There should be no functionality change resulting from this patch. 
Signed-off-by: Borislav Petkov commit 1ea6be212eea5ce1e8fabadacb0c639ad87b2f00 Author: Jesper Juhl Date: Mon Nov 1 22:44:34 2010 +0100 x86, microcode, AMD: Replace vmalloc+memset with vzalloc We don't have to do memset() ourselves after vmalloc() when we have vzalloc(), so change that in arch/x86/kernel/microcode_amd.c::get_next_ucode(). Signed-off-by: Jesper Juhl Signed-off-by: Borislav Petkov commit 5e4f083f78d03e9f8d2e327daccde16976f9bb00 Author: Yong Zhang Date: Sun Oct 24 11:50:53 2010 +0800 hrtimer: Remove stale comment on curr_timer curr_timer no longer resides in struct hrtimer_cpu_base. Signed-off-by: Yong Zhang LKML-Reference: <1287892253-2587-1-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Thomas Gleixner commit 3cf9b85b474e656a0856b88290c7a289ac5ea247 Author: Joe Perches Date: Fri Nov 5 16:12:38 2010 -0700 locking, lockdep: Convert sprintf_symbol to %pS Signed-off-by: Joe Perches Cc: Peter Zijlstra Cc: Jiri Kosina LKML-Reference: <1288998760-11775-6-git-send-email-joe@perches.com> Signed-off-by: Ingo Molnar commit f6cd24777513fcc673d432cc29ef59881d3e4df1 Author: Eric Dumazet Date: Thu Nov 4 11:13:48 2010 +0100 irq: Better struct irqaction layout We currently use the kmalloc-96 slab for struct irqaction allocations on 64bit arches. This is unfortunate because of possible false sharing and two cache line accesses. Move the 'name' and 'dir' fields to the end of the structure, and force a suitable alignment. Hot path fields now use one cache line on x86_64. Signed-off-by: Eric Dumazet Reviewed-by: Andi Kleen Cc: Peter Zijlstra LKML-Reference: <1288865628.2659.69.camel@edumazet-laptop> Signed-off-by: Thomas Gleixner commit 7f05dec3dd70f086870fdc1d40dbe30db1fe0994 Author: Jacob Pan Date: Tue Nov 9 11:28:43 2010 +0000 x86: mrst: Parse SFI timer table for all timer configs Penwell has APB timer based watchdog timers; it requires platform code to parse the SFI MTMR tables in order to claim its timer. 
This patch will always parse the SFI MTMR table regardless of system timer configuration choices. Otherwise, the SFI MTMR table may not get parsed if running on Medfield with always-on local APIC timers and constant TSC. The watchdog timer driver will then not get a timer to use. Signed-off-by: Jacob Pan Signed-off-by: Alan Cox LKML-Reference: <20101109112800.20591.10802.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner commit 1da4b1c6a4dfb5a13d7147a27c1ac53fed09befd Author: Feng Tang Date: Tue Nov 9 11:22:58 2010 +0000 x86/mrst: Add SFI platform device parsing code SFI provides a series of tables. These describe the platform devices present, including SPI and I²C devices, various sensors, keypads and other glue, as well as interfaces provided via the SCU IPC mechanism (intel_scu_ipc.c). This patch is a merge of the core elements and relevant fixes from the Intel development code by Feng, Alek and myself into a single coherent patch for upstream submission. It provides the needed infrastructure to register I2C, SPI and platform devices described by the tables, as well as handlers for some of the hardware already supported in kernel. The 0.8 firmware also provides GPIO tables. Devices are created at boot time or, if they are SCU dependent, at the point an SCU is discovered. The existing Linux device mechanisms will then handle the device binding. At an abstract level this is an SFI to Linux device translator. Device/platform specific setup/glue is in this file. This is done so that the drivers for the generic I²C and SPI bus devices remain cross platform as they should. 
(Updated from RFC version to correct the emc1403 name used by the firmware and a wrongly used #define) Signed-off-by: Alek Du LKML-Reference: <20101109112158.20013.6158.stgit@localhost.localdomain> [Clean ups, removal of 0.7 support] Signed-off-by: Feng Tang [Clean ups] Signed-off-by: Alan Cox Signed-off-by: Thomas Gleixner commit eb48c9cb2053e7bb5f7f8f0371cb578a0d439450 Author: Robert Richter Date: Mon Oct 25 16:03:39 2010 +0200 apic, amd: Make firmware bug messages more meaningful This improves error messages in case the BIOS was setting up wrong LVT offsets. Signed-off-by: Robert Richter Acked-by: Borislav Petkov LKML-Reference: <1288015419-29543-6-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 0a17941e71f089b128514f7b5b486e20072ca7dc Author: Robert Richter Date: Mon Oct 25 16:03:38 2010 +0200 mce, amd: Remove goto in threshold_create_device() Remove the goto in threshold_create_device(). Signed-off-by: Robert Richter Acked-by: Borislav Petkov LKML-Reference: <1288015419-29543-5-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit bbaff08dca3c34d0fb6b4c4051354184e33e3df8 Author: Robert Richter Date: Mon Oct 25 16:03:37 2010 +0200 mce, amd: Add helper functions to setup APIC This patch reworks and cleans up mce_amd_feature_init() by introducing helper functions to set up and check the LVT offset. It also fixes line endings in pr_err() calls. Signed-off-by: Robert Richter Acked-by: Borislav Petkov LKML-Reference: <1288015419-29543-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 7203a0494084541575bac6dfc4e153f9e28869b8 Author: Robert Richter Date: Mon Oct 25 16:03:36 2010 +0200 mce, amd: Shorten local variables mci_misc_{hi,lo} Shorten these variables to make later changes more readable. 
Signed-off-by: Robert Richter Acked-by: Borislav Petkov LKML-Reference: <1288015419-29543-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 9c37c9d89773ee9da9f6af28ee37d931bd045711 Author: Robert Richter Date: Mon Oct 25 16:03:35 2010 +0200 mce, amd: Implement mce_threshold_block_init() helper function This patch adds a helper function for the initial setup of an mce threshold block. The LVT offset is passed as an argument. Also make the variable threshold_defaults local, as it is only used in function mce_amd_feature_init(). Function threshold_restart_bank() is extended to set up the LVT offset; the change is backward compatible. Thus, there is now only a single wrmsrl() to set up the block. Signed-off-by: Robert Richter Acked-by: Borislav Petkov LKML-Reference: <1288015419-29543-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 7fb2b870d6a3b92f6750ac2b72858fd098dc9e3f Author: Thomas Gleixner Date: Sun Oct 24 11:11:22 2010 +0200 x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage Stupid me forgot to change the function name for the CONFIG_X86_IO_APIC=n case in commit 23f9b2671 (x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings()) Signed-off-by: Thomas Gleixner commit fe7de49f9d4e53f24ec9ef762a503f70b562341c Author: KOSAKI Motohiro Date: Wed Oct 20 16:01:12 2010 -0700 sched: Make sched_param argument static in sched_setscheduler() callers Andrew Morton pointed out that almost all sched_setscheduler() callers are using fixed parameters and can be converted to static. It reduces runtime memory use a little. 
Signed-off-by: KOSAKI Motohiro Reported-by: Andrew Morton Acked-by: James Morris Cc: Ingo Molnar Cc: Steven Rostedt Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar commit 23f9b267159b4c7ff59d2e6c8ed31693eff841e3 Author: Thomas Gleixner Date: Fri Oct 15 15:38:50 2010 -0700 x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings() probe_nr_irqs_gsi() is called right after ioapic_init_mappings() and there are no other users. Move it into ioapic_init_mappings() so the declaration can disappear and the function can become static. Rename ioapic_init_mappings() to ioapic_and_gsi_init() to reflect that change. Signed-off-by: Thomas Gleixner LKML-Reference: <1287510389-8388-2-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Dirk Brandewie commit 5a7ae78fd478624df3059cb6f55056b85d074acc Author: Thomas Gleixner Date: Tue Oct 19 10:46:28 2010 -0700 x86: Allow platforms to force enable apic Some embedded x86 platforms don't set up the APIC in the BIOS/bootloader and would be forced to add "lapic" on the kernel command line. That's a bit awkward. Split out the force enable code from detect_init_APIC() and allow platform code to call it from the platform setup. That avoids the command line parameter and possible replication of the MSR dance in the force enable code. Signed-off-by: Thomas Gleixner LKML-Reference: <1287510389-8388-1-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Dirk Brandewie commit fd35fbcdd1b2579a6e00a1545f7124e4005d0474 Author: H. Peter Anvin Date: Fri Oct 22 15:33:38 2010 -0700 x86-64, asm: Use fxsaveq/fxrestorq in more places Checkin d7acb92fea932ad2e7846480aeacddc2c03c8485 made use of fxsaveq in fpu_fxsave() if the assembler supports it; this adds fxsaveq/fxrstorq to fxrstor_checking() and fxsave_user() as well. Reported-by: Linus Torvalds LKML-Reference: Signed-off-by: H. 
Peter Anvin commit 466bd3030973910118ca601da8072be97a1e2209 Author: Yong Zhang Date: Wed Oct 20 15:57:33 2010 -0700 timer: Warn when del_timer_sync() is called in hardirq context Add an explicit warning when del_timer_sync() is called in hardirq context. Signed-off-by: Yong Zhang Cc: Ingo Molnar Cc: Peter Zijlstra Acked-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit 1118e2cd33d47254854e1ba3ba8e32802ff14fdf Author: Yong Zhang Date: Wed Oct 20 15:57:32 2010 -0700 timer: Del_timer_sync() can be used in softirq context Actually we have used del_timer_sync() in softirq context for a long time, e.g. in __dst_free()::cancel_delayed_work(). So change its comments to warn on hardirq context only, and make lockdep know about this change. Signed-off-by: Yong Zhang Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit 6f1bc451e6a79470b122a37ee1fc6bbca450f444 Author: Yong Zhang Date: Wed Oct 20 15:57:31 2010 -0700 timer: Make try_to_del_timer_sync() the same on SMP and UP On UP try_to_del_timer_sync() is mapped to del_timer() which does not take the running timer callback into account, so it has different semantics. Remove the SMP dependency of try_to_del_timer_sync() by using base->running_timer in the UP case as well. [ tglx: Removed set_running_timer() inline and tweaked the changelog ] Signed-off-by: Yong Zhang Cc: Ingo Molnar Cc: Peter Zijlstra Acked-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit 20f33a03f0cf87e51165f7084f697acfb68e865b Author: Namhyung Kim Date: Wed Oct 20 15:57:34 2010 -0700 posix-timers: Annotate lock_timer() lock_timer() conditionally grabs it_lock in case of returning non-NULL but unlock_timer() releases it unconditionally. This leads sparse to complain about the lock context imbalance. Rename and wrap lock_timer using the __cond_lock() macro to make sparse happy. 
Signed-off-by: Namhyung Kim Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit dd6414b50fa2b1cd247a8aa8f8bd42414b7453e1 Author: Phil Carmody Date: Wed Oct 20 15:57:33 2010 -0700 timer: Permit statically-declared work with deferrable timers Currently, you have to just define a delayed_work uninitialised, and then initialise it before first use. That's a tad clumsy. At risk of playing mind-games with the compiler, fooling it into doing pointer arithmetic with compile-time-constants, this lets clients properly initialise delayed work with deferrable timers statically. This patch was inspired by the issues which led Artem Bityutskiy to commit 8eab945c5616fc984 ("sunrpc: make the cache cleaner workqueue deferrable"). Signed-off-by: Phil Carmody Acked-by: Artem Bityutskiy Cc: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit 2bf1c05e3c406925e498d06da66b4828f0209ea6 Author: Nikitas Angelinas Date: Wed Oct 20 15:57:31 2010 -0700 time: Use ARRAY_SIZE macro in timecompare.c Replace sizeof(buffer)/sizeof(buffer[0]) with ARRAY_SIZE(buffer) in kernel/time/timecompare.c Signed-off-by: Nikitas Angelinas Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit aaabe31c25a439b92cc281b14ca18b85bae7e7a6 Author: Changli Gao Date: Wed Oct 20 15:57:30 2010 -0700 timer: Initialize the field slack of timer_list TIMER_INITIALIZER() should initialize the field slack of timer_list as __init_timer() does. Signed-off-by: Changli Gao Cc: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit d0959024d8fb6555ba8bfdc6624cc7b7c2e675fd Author: Richard Kennedy Date: Wed Oct 20 15:57:30 2010 -0700 timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS Reorder struct timer_list to remove 8 bytes of alignment padding on 64 bit builds when CONFIG_TIMER_STATS is selected. timer_list is widely used across the kernel so many structures will benefit and shrink in size. 
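As an illustration of how field reordering removes alignment padding, here is a minimal sketch using Python's ctypes, which lays structures out according to the platform C ABI. The two structures and their field names (ptr_a, int_a, ptr_b, int_b) are hypothetical and are not the kernel's struct timer_list; only the pointer/int interleaving pattern is meant to mirror the kind of layout the commit describes.

```python
import ctypes


class Before(ctypes.Structure):
    # Hypothetical layout: each 4-byte int is followed by a pointer, so on
    # platforms with 8-byte pointers the compiler pads after every int.
    _fields_ = [
        ("ptr_a", ctypes.c_void_p),
        ("int_a", ctypes.c_int),
        ("ptr_b", ctypes.c_void_p),
        ("int_b", ctypes.c_int),
    ]


class After(ctypes.Structure):
    # Same fields, reordered so the two ints sit next to each other and
    # share a single 8-byte slot.
    _fields_ = [
        ("ptr_a", ctypes.c_void_p),
        ("ptr_b", ctypes.c_void_p),
        ("int_a", ctypes.c_int),
        ("int_b", ctypes.c_int),
    ]


def padding_saved() -> int:
    """Bytes saved by the reordering on the platform running this code."""
    return ctypes.sizeof(Before) - ctypes.sizeof(After)


if __name__ == "__main__":
    print(ctypes.sizeof(Before), ctypes.sizeof(After), padding_saved())
```

On a platform with 8-byte, 8-byte-aligned pointers the reordered structure is 8 bytes smaller; on 32-bit platforms the two layouts are typically the same size. The numbers that follow are the commit author's own measurements, not output of this sketch.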
For example, with my config on x86_64 per_cpu_dm_data shrinks from 136 to 128 bytes and ahci_port_priv shrinks from 1032 to 968 bytes. Signed-off-by: Richard Kennedy Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit a386b5af8edda1c742ce9f77891e112eefffc005 Author: Kasper Pedersen Date: Wed Oct 20 15:55:15 2010 -0700 time: Compensate for rounding on odd-frequency clocksources When the clocksource is not a multiple of HZ, the clock will be off. For acpi_pm, HZ=1000 the error is 127.111 ppm: The rounding of cycle_interval ends up generating a false error term in ntp_error accumulation since xtime_interval is not exactly 1/HZ. So, we subtract out the error caused by the rounding. This has been visible since 2.6.32-rc2 commit a092ff0f90cae22b2ac8028ecd2c6f6c1a9e4601 time: Implement logarithmic time accumulation That commit raised NTP_INTERVAL_FREQ and exposed the rounding error. testing tool: http://n1.taur.dk/permanent/testpmt.c Also tested with ntpd and a frequency counter. Signed-off-by: Kasper Pedersen Acked-by: john stultz Cc: John Kacur Cc: Clark Williams Cc: Martin Schwidefsky Signed-off-by: Andrew Morton Signed-off-by: Thomas Gleixner commit 8e8be45e8e55daa381028aec339829929ddb53a5 Author: Paul E. McKenney Date: Thu Sep 2 16:16:14 2010 -0700 rcu: add priority-inversion testing to rcutorture Add an optional test to force long-term preemption of RCU read-side critical sections, controlled by new test_boost, test_boost_interval, and test_boost_duration module parameters. This is to be used to test RCU priority boosting. Signed-off-by: Paul E. McKenney commit d3b8f889a220aed825accc28eb64ce283a0d51ac Author: john stultz Date: Mon Aug 17 16:40:47 2009 -0700 x86: Make tsc=reliable override boot time stability checks This patch makes the tsc=reliable option disable the boot time stability checks. Currently the option only disables the runtime watchdog checks. 
This change lets folks override the boot time TSC stability checks and use the TSC when the system would otherwise disqualify it. There are still some situations in which the TSC will be disqualified, such as cpufreq scaling, but these are situations where the box will hang if the TSC is allowed. The patch also includes a fix for an issue found by Thomas Gleixner, where the TSC disqualification message wouldn't be printed after a call to unsynchronized_tsc(). Signed-off-by: John Stultz Cc: Andrew Morton Cc: akataria@vmware.com Cc: Stephen Hemminger LKML-Reference: <1250552447.7212.92.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner
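
The 127.111 ppm figure quoted in commit a386b5af8edda1c742ce9f77891e112eefffc005 above ("time: Compensate for rounding on odd-frequency clocksources") can be checked with a few lines of arithmetic. The sketch below is not the kernel's fixed-point code; it assumes the ACPI PM timer runs at 3,579,545 Hz (a value not stated in the log) and that the per-tick cycle count is rounded to the nearest integer. The function name rounding_error_ppm is made up for this illustration.

```python
from fractions import Fraction


def rounding_error_ppm(clock_hz: int, hz: int) -> float:
    """Relative error, in parts per million, from rounding the number of
    clocksource cycles per tick to a whole number."""
    exact_cycles = Fraction(clock_hz, hz)   # ideal cycles per tick
    rounded_cycles = round(exact_cycles)    # nearest integer cycle count
    return float((rounded_cycles - exact_cycles) / exact_cycles * 1_000_000)


if __name__ == "__main__":
    # Assumed acpi_pm frequency of 3,579,545 Hz with HZ=1000:
    # 3579.545 cycles per tick round up to 3580.
    print(f"{rounding_error_ppm(3_579_545, 1000):.3f} ppm")
```

Under those assumptions the result is about 127.111 ppm, matching the commit message. A clocksource whose frequency is an exact multiple of HZ gives zero error, which is why only "odd-frequency" clocksources are affected.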