commit 537b60d17894b7c19a6060feae40299d7109d6e7 Merge: 3ae684e a289cc7 Author: Linus Torvalds Date: Tue May 18 09:46:35 2010 -0700 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: uv_irq.c: Fix all sparse warnings x86, UV: Improve BAU performance and error recovery x86, UV: Delete unneeded boot messages x86, UV: Clean up UV headers for MMR definitions commit 3ae684e1c48e6deedc9b9faff8fa1c391ca8a652 Merge: c4fd308 4bd96a7 Author: Linus Torvalds Date: Tue May 18 09:28:24 2010 -0700 Merge branch 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tboot: Add support for S3 memory integrity protection commit c4fd308ed62f292518363ea9c6c2adb3c2d95f9d Merge: 96fbeb9 1f9cc3c Author: Linus Torvalds Date: Tue May 18 09:28:04 2010 -0700 Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Update the page flags for memtype atomically instead of using memtype_lock x86, pat: In rbt_memtype_check_insert(), update new->type only if valid x86, pat: Migrate to rbtree only backend for pat memtype management x86, pat: Preparatory changes in pat.c for bigger rbtree change rbtree: Add support for augmented rbtrees commit 96fbeb973a7e17594a429537201611ca0b395622 Merge: 1f8caa9 fea24e2 Author: Linus Torvalds Date: Tue May 18 09:27:49 2010 -0700 Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mrst: add nop functions to x86_init mpparse functions x86, mrst, pci: return 0 for non-present pci bars x86: Avoid check hlt for newer cpus commit 1f8caa986a5f4de2e40f3defe66a07b4c5a019c2 Merge: d6f3875 2e61878 Author: Linus Torvalds Date: Tue May 18 09:17:50 2010 -0700 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64: Combine SRAT regions when possible commit d6f3875252bb703a9a3de0b92f7ae154f12c986c Merge: cb41838 3f10940 Author: Linus Torvalds Date: Tue May 18 09:17:17 2010 -0700 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/microcode: Use nonseekable_open() x86: Improve Intel microcode loader performance commit cb41838bbc4403f7270a94b93a9a0d9fc9c2e7ea Merge: 98f0172 c59bd56 Author: Linus Torvalds Date: Tue May 18 09:17:01 2010 -0700 Merge branch 'core-hweight-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-hweight-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hweight: Use a 32-bit popcnt for __arch_hweight32() arch, hweight: Fix compilation errors x86: Add optimized popcnt variants bitops: Optimize hweight() by making use of compile-time evaluation commit 98f01720cbe3e2eb719682777049b6514e9db556 Merge: 41d5910 4f47b4c Author: Linus Torvalds Date: Tue May 18 09:15:57 2010 -0700 Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, acpi/irq: Define gsi_end when X86_IO_APIC is undefined x86, irq: Kill io_apic_renumber_irq x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's. x86, ioapic: Simplify probe_nr_irqs_gsi. x86, ioapic: Optimize pin_2_irq x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic. x86, ioapic: In mpparse use mp_register_ioapic x86, ioapic: Teach mp_register_ioapic to compute a global gsi_end x86, ioapic: Fix the types of gsi values x86, ioapic: Fix io_apic_redir_entries to return the number of entries. x86, ioapic: Only export mp_find_ioapic and mp_find_ioapic_pin in io_apic.h x86, acpi/irq: Generalize mp_config_acpi_legacy_irqs x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsi x86, acpi/irq: pci device dev->irq is an isa irq not a gsi x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq x86, acpi/irq: Introduce apci_isa_irq_to_gsi commit 41d59102e146a4423a490b8eca68a5860af4fe1c Merge: 3e1dd19 c9775b4 Author: Linus Torvalds Date: Tue May 18 08:58:16 2010 -0700 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, fpu: Use static_cpu_has() to implement use_xsave() x86: Add new static_cpu_has() function using alternatives x86, fpu: Use the proper asm constraint in use_xsave() x86, fpu: Unbreak FPU emulation x86: Introduce 'struct fpu' and related API x86: Eliminate TS_XSAVE x86-32: Don't set ignore_fpu_irq in simd exception x86: Merge kernel_math_error() into math_error() x86: Merge simd_math_error() into math_error() x86-32: Rework cache flush denied handler Fix trivial conflict in arch/x86/kernel/process.c commit 3e1dd193edefd2a806a0ba6cf0879cf1a95217da Merge: 07d7775 372e22e Author: Linus Torvalds Date: Tue May 18 08:53:57 2010 -0700 Merge branch 'x86-doc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-doc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, docbook: Fix errors from x86 headers merger commit 07d77759c95d899b84f8e473a01cff001019dd5f Merge: b7723f9 3998d09 Author: Linus Torvalds Date: Tue May 18 08:49:13 2010 -0700 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hypervisor: add missing Modify the VMware balloon driver for the new x86_hyper API x86, hypervisor: Export the x86_hyper* symbols x86: Clean up the hypervisor layer x86, HyperV: fix up the license to mshyperv.c x86: Detect running on a Microsoft HyperV system x86, cpu: Make APERF/MPERF a normal table-driven flag x86, k8: Fix build error when K8_NB is disabled x86, cacheinfo: Disable index in all four subcaches x86, cacheinfo: Make L3 cache info per node x86, cacheinfo: Reorganize AMD L3 cache structure x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments x86, cacheinfo: Unify AMD L3 cache index disable checking cpufreq: Unify sysfs attribute definition macros powernow-k8: Fix frequency reporting x86, cpufreq: Add APERF/MPERF support for AMD processors x86: Unify APERF/MPERF support powernow-k8: Add core performance boost support x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo Fix up trivial conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c and drivers/cpufreq/cpufreq_ondemand.c commit b7723f9d21d8d6043e63f5e3e412f321f5f1900c Merge: 93c9d7f 6fc108a Author: Linus Torvalds Date: Tue May 18 08:40:21 2010 -0700 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Clean up arch/x86/Kconfig* x86-64: Don't export init_level4_pgt commit 93c9d7f60c0cb7715890b1f9e159da6f4d1f5a65 Merge: 7421a10 d9c5841 Author: Linus Torvalds Date: Tue May 18 08:40:05 2010 -0700 Merge branch 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix LOCK_PREFIX_HERE for uniprocessor build x86, atomic64: In selftest, distinguish x86-64 from 586+ x86-32: Fix atomic64_inc_not_zero return value convention lib: Fix atomic64_inc_not_zero test lib: Fix atomic64_add_unless return value convention x86-32: Fix atomic64_add_unless return value convention lib: Fix atomic64_add_unless test x86: Implement atomic[64]_dec_if_positive() lib: Only test atomic64_dec_if_positive on archs having it x86-32: Rewrite 32-bit atomic64 functions in assembly lib: Add self-test for atomic64_t x86-32: Allow UP/SMP lock replacement in cmpxchg64 x86: Add support for lock prefix in alternatives commit 7421a10de7a525f67cc082fca7a91011d00eada4 Merge: 752f114 9e56529 Author: Linus Torvalds Date: Tue May 18 08:35:37 2010 -0700 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Use .cfi_sections for assembly code x86-64: Reduce SMP locks table size x86, asm: Introduce and use percpu_inc() commit 752f114fb83c5839de37a250b4f8257ed5438341 Merge: b8ae30e ad56b07 Author: Linus Torvalds Date: Tue May 18 08:35:04 2010 -0700 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix "integer as NULL pointer" warning. tracing: Fix tracepoint.h DECLARE_TRACE() to allow more than one header tracing: Make the documentation clear on trace_event boot option ring-buffer: Wrap open-coded WARN_ONCE tracing: Convert nop macros to static inlines tracing: Fix sleep time function profiling tracing: Show sample std dev in function profiling tracing: Add documentation for trace commands mod, traceon/traceoff ring-buffer: Make benchmark handle missed events ring-buffer: Make non-consuming read less expensive with lots of cpus. tracing: Add graph output support for irqsoff tracer tracing: Have graph flags passed in to ouput functions tracing: Add ftrace events for graph tracer tracing: Dump either the oops's cpu source or all cpus buffers tracing: Fix uninitialized variable of tracing/trace output commit b8ae30ee26d379db436b0b8c8c3ff1b52f69e5d1 Merge: 4d7b4ac 9c6f7e4 Author: Linus Torvalds Date: Tue May 18 08:27:54 2010 -0700 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback() sched, wait: Use wrapper functions sched: Remove a stale comment ondemand: Make the iowait-is-busy time a sysfs tunable ondemand: Solve a big performance issue by counting IOWAIT time as busy sched: Intoduce get_cpu_iowait_time_us() sched: Eliminate the ts->idle_lastupdate field sched: Fold updating of the last_update_time_info into update_ts_time_stats() sched: Update the idle statistics in get_cpu_idle_time_us() sched: Introduce a function to update the idle statistics sched: Add a comment to get_cpu_idle_time_us() cpu_stop: add dummy implementation for UP sched: Remove rq argument to the tracepoints rcu: need barrier() in UP synchronize_sched_expedited() sched: correctly place paranioa memory barriers in synchronize_sched_expedited() sched: kill paranoia check in synchronize_sched_expedited() sched: replace migration_thread with cpu_stop stop_machine: reimplement using cpu_stop cpu_stop: implement stop_cpu[s]() sched: Fix select_idle_sibling() logic in select_task_rq_fair() ... commit 4d7b4ac22fbec1a03206c6cde353f2fd6942f828 Merge: 3aaf51a 94f3ca9 Author: Linus Torvalds Date: Tue May 18 08:19:03 2010 -0700 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits) perf tools: Add mode to build without newt support perf symbols: symbol inconsistency message should be done only at verbose=1 perf tui: Add explicit -lslang option perf options: Type check all the remaining OPT_ variants perf options: Type check OPT_BOOLEAN and fix the offenders perf options: Check v type in OPT_U?INTEGER perf options: Introduce OPT_UINTEGER perf tui: Add workaround for slang < 2.1.4 perf record: Fix bug mismatch with -c option definition perf options: Introduce OPT_U64 perf tui: Add help window to show key associations perf tui: Make <- exit menus too perf newt: Add single key shortcuts for zoom into DSO and threads perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed perf newt: Fix the 'A'/'a' shortcut for annotate perf newt: Make <- exit the ui_browser x86, perf: P4 PMU - fix counters management logic perf newt: Make <- zoom out filters perf report: Report number of events, not samples perf hist: Clarify events_stats fields usage ... Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c commit 3aaf51ace5975050ab43c7d4d7e439e0ae7d13d7 Merge: f262af3 cc49b09 Author: Linus Torvalds Date: Tue May 18 08:18:07 2010 -0700 Merge branch 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) oprofile/x86: make AMD IBS hotplug capable oprofile/x86: notify cpus only when daemon is running oprofile/x86: reordering some functions oprofile/x86: stop disabled counters in nmi handler oprofile/x86: protect cpu hotplug sections oprofile/x86: remove CONFIG_SMP macros oprofile/x86: fix uninitialized counter usage during cpu hotplug oprofile/x86: remove duplicate IBS capability check oprofile/x86: move IBS code oprofile/x86: return -EBUSY if counters are already reserved oprofile/x86: moving shutdown functions oprofile/x86: reserve counter msrs pairwise oprofile/x86: rework error handler in nmi_setup() oprofile: update file list in MAINTAINERS file oprofile: protect from not being in an IRQ context oprofile: remove double ring buffering ring-buffer: Add lost event count to end of sub buffer tracing: Show the lost events in the trace_pipe output ring-buffer: Add place holder recording of dropped events tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set ... commit f262af3d08d3fffc4e11277d3a177b2d67ea2aba Merge: 1014cfe 72d5a9f Author: Linus Torvalds Date: Tue May 18 08:17:58 2010 -0700 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) rcu: remove all rcu head initializations, except on_stack initializations rcu head introduce rcu head init on stack Debugobjects transition check rcu: fix build bug in RCU_FAST_NO_HZ builds rcu: RCU_FAST_NO_HZ must check RCU dyntick state rcu: make SRCU usable in modules rcu: improve the RCU CPU-stall warning documentation rcu: reduce the number of spurious RCU_SOFTIRQ invocations rcu: permit discontiguous cpu_possible_mask CPU numbering rcu: improve RCU CPU stall-warning messages rcu: print boot-time console messages if RCU configs out of ordinary rcu: disable CPU stall warnings upon panic rcu: enable CPU_STALL_VERBOSE by default rcu: slim down rcutiny by removing rcu_scheduler_active and friends rcu: refactor RCU's context-switch handling rcu: rename rcutiny rcu_ctrlblk to rcu_sched_ctrlblk rcu: shrink rcutiny by making synchronize_rcu_bh() be inline rcu: fix now-bogus rcu_scheduler_active comments. rcu: Fix bogus CONFIG_PROVE_LOCKING in comments to reflect reality. rcu: ignore offline CPUs in last non-dyntick-idle CPU check ... commit 1014cfe2fb4cdd663137aafb21448cb613dd6a7d Merge: 8123d8f 4726f2a Author: Linus Torvalds Date: Tue May 18 08:17:35 2010 -0700 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: Reduce stack_trace usage lockdep: No need to disable preemption in debug atomic ops lockdep: Actually _dec_ in debug_atomic_dec lockdep: Provide off case for redundant_hardirqs_on increment lockdep: Simplify debug atomic ops lockdep: Fix redundant_hardirqs_on incremented with irqs enabled lockstat: Make lockstat counting per cpu i8253: Convert i8253_lock to raw_spinlock commit 8123d8f17d8ba9d67e556688e4f025456ca97842 Merge: 06ee772 795e74f Author: Linus Torvalds Date: Tue May 18 07:22:37 2010 -0700 Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Add amd_iommu=off command line option iommu-api: Remove iommu_{un}map_range functions x86/amd-iommu: Implement ->{un}map callbacks for iommu-api x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes kvm: Change kvm_iommu_map_pages to map large pages VT-d: Change {un}map_range functions to implement {un}map interface iommu-api: Add ->{un}map callbacks to iommu_ops iommu-api: Add iommu_map and iommu_unmap functions iommu-api: Rename ->{un}map function pointers to ->{un}map_range commit 06ee772043c7ad125f2c2e6a08dc563706f39e8d Merge: fd25a1f 1fb2f77 Author: Linus Torvalds Date: Tue May 18 07:20:19 2010 -0700 Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debugobjects: Section mismatch cleanup commit fd25a1f556760dbd6e29dec66c70223a8912cdb2 Merge: ba2e1c5f 4065c80 Author: Linus Torvalds Date: Tue May 18 07:18:38 2010 -0700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (23 commits) cifs: fix noserverino handling when unix extensions are enabled cifs: don't update uniqueid in cifs_fattr_to_inode cifs: always revalidate hardlinked inodes when using noserverino [CIFS] drop quota operation stubs cifs: propagate cifs_new_fileinfo() error back to the caller cifs: add comments explaining cifs_new_fileinfo behavior cifs: remove unused parameter from cifs_posix_open_inode_helper() [CIFS] Remove unused cifs_oplock_cachep cifs: have decode_negTokenInit set flags in server struct cifs: break negotiate protocol calls out of cifs_setup_session cifs: eliminate "first_time" parm to CIFS_SessSetup [CIFS] Fix lease break for writes cifs: save the dialect chosen by server cifs: change && to || cifs: rename "extended_security" to "global_secflags" cifs: move tcon find/create into separate function cifs: move SMB session creation code into separate function cifs: track local_nls in volume info [CIFS] Allow null nd (as nfs server uses) on create [CIFS] Fix losing locks during fork() ... commit 9c6f7e43b4e02c161b53e97ba913855246876c61 Author: Ingo Molnar Date: Tue May 18 00:17:44 2010 +0200 stop_machine: Move local variable closer to the usage site in cpu_stop_cpu_callback() This addresses the following compiler warning: kernel/stop_machine.c: In function 'cpu_stop_cpu_callback': kernel/stop_machine.c:297: warning: unused variable 'work' Cc: Tejun Heo Cc: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit c59bd5688299cddb71183e156e7a3c1409b90df2 Author: H. Peter Anvin Date: Mon May 17 15:13:23 2010 -0700 x86, hweight: Use a 32-bit popcnt for __arch_hweight32() Use a 32-bit popcnt instruction for __arch_hweight32(), even on x86-64. Even though the input register will *usually* be zero-extended due to the standard operation of the hardware, it isn't necessarily so if the input value was the result of truncating a 64-bit operation. Note: the POPCNT32 variant used on x86-64 has a technically unnecessary REX prefix to make it five bytes long, the same as a CALL instruction, therefore avoiding an unnecessary NOP. Reported-by: Linus Torvalds Signed-off-by: H. Peter Anvin Cc: Borislav Petkov LKML-Reference: commit 94f3ca95787ada3d64339a4ecb2754236ab563f6 Author: Arnaldo Carvalho de Melo Date: Mon May 17 18:18:11 2010 -0300 perf tools: Add mode to build without newt support make NO_NEWT=1 Will avoid building the newt (tui) support. Suggested-by: Ingo Molnar LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 4065c802da7484fa36f8cdf10f18d087233ecb88 Author: Jeff Layton Date: Mon May 17 07:18:58 2010 -0400 cifs: fix noserverino handling when unix extensions are enabled The uniqueid field sent by the server when unix extensions are enabled is currently used sometimes when it shouldn't be. The readdir codepath is correct, but most others are not. Fix it. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit 2f51903bc3139e25ec908f8944a0001c7b868e90 Author: Arnaldo Carvalho de Melo Date: Mon May 17 17:57:59 2010 -0300 perf symbols: symbol inconsistency message should be done only at verbose=1 That happened for an old perf.data file that had no fake MMAP events for the kernel modules, but even then it should warn once for each module, not one time for every symbol in every module not found. Reported-by: Ingo Molnar LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 84f30c66c3689745abbd3b9ce39816caeb9bec3b Author: Jeff Layton Date: Mon May 17 07:18:57 2010 -0400 cifs: don't update uniqueid in cifs_fattr_to_inode We use this value to find an inode within the hash bucket, so we can't change this without re-hashing the inode. For now, treat this value as immutable. Eventually, we should probably use an inode number change on a path based operation to indicate that the lookup cache is invalid, but that's a bit more code to deal with. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit db19272edc93661835bf6ec9736cfd0754aa3c62 Author: Jeff Layton Date: Mon May 17 14:51:49 2010 -0400 cifs: always revalidate hardlinked inodes when using noserverino The old cifs_revalidate logic always revalidated hardlinked inodes. This hack allowed CIFS to pass some connectathon tests when server inode numbers aren't used (basic test7, in particular). Signed-off-by: Jeff Layton Signed-off-by: Steve French commit ba2e1c5f25a99dec3873745031ad23ce3fd79bff Merge: 7d32c0a 92183b3 Author: Linus Torvalds Date: Mon May 17 13:54:29 2010 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: amiga - Floppy platform device conversion m68k: amiga - Sound platform device conversion m68k: amiga - Frame buffer platform device conversion m68k: amiga - Zorro host bridge platform device conversion m68k: amiga - Zorro bus modalias support platform: Make platform resource input parameters const m68k: invoke oom-killer from page fault serial167: Kill unused variables m68k: Implement generic_find_next_{zero_,}le_bit() m68k: hp300 - Checkpatch cleanup m68k: Remove trailing spaces in messages m68k: Simplify param.h by using m68k: Remove BKL from rtc implementations commit 7d32c0aca4fbd0319c860d12af5fae3e88c760e6 Merge: 3d2c978 6f485b4 Author: Linus Torvalds Date: Mon May 17 13:53:35 2010 -0700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs * git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs: logfs: handle powerfail on NAND flash logfs: handle errors from get_mtd_device() logfs: remove unused variable logfs: fix sync logfs: fix compile failure logfs: initialize li->li_refcount logfs: commit reservations under space pressure logfs: survive logfs_buf_recover read errors logfs: Close i_ino reuse race logfs: fix logfs_seek_hole() logfs: Return -EINVAL if filesystem image doesn't match LogFS: Fix typo in b6349ac8 logfs: testing the wrong variable commit 3d2c978e0cd8b1157f9eebd13062d61fb7a75ad5 Merge: 81880d6 b8bc138 Author: Linus Torvalds Date: Mon May 17 13:48:10 2010 -0700 Merge branch 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: ptrace: Cleanup useless header ptrace: kill BKL in ptrace syscall commit 63aa9e7e3ab28ad5362502b1a69fae945367ad65 Author: Arnaldo Carvalho de Melo Date: Mon May 17 16:42:37 2010 -0300 perf tui: Add explicit -lslang option At least on rawhide using -lnewt is not enough if we use SLang routines directly, so add an explicit -lslang since we use SLang routines. Reported-by: Ingo Molnar Tested-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo commit 92183b346f02773dae09182c65f16b013f295d80 Author: Geert Uytterhoeven Date: Sun Apr 5 13:02:13 2009 +0200 m68k: amiga - Floppy platform device conversion Signed-off-by: Geert Uytterhoeven commit ff2db7c5ab78817eb3c5d15dd87f18e9be726f1a Author: Geert Uytterhoeven Date: Sun Apr 5 12:59:54 2009 +0200 m68k: amiga - Sound platform device conversion Signed-off-by: Geert Uytterhoeven commit fa6688e1c7e7341fb7d1ca5878a3641762e60dec Author: Geert Uytterhoeven Date: Sun Apr 5 12:45:56 2009 +0200 m68k: amiga - Frame buffer platform device conversion Signed-off-by: Geert Uytterhoeven commit 0d305464aefff342c85b4db8b3d7a4345246e5a1 Author: Geert Uytterhoeven Date: Sun Apr 5 12:40:41 2009 +0200 m68k: amiga - Zorro host bridge platform device conversion Signed-off-by: Geert Uytterhoeven commit bf54a2b3c0dbf76136f00ff785bf6d8f6291311d Author: Geert Uytterhoeven Date: Tue Nov 18 21:13:53 2008 +0100 m68k: amiga - Zorro bus modalias support Add Amiga Zorro bus modalias and uevent support Signed-off-by: Geert Uytterhoeven commit 0b7f1a7efb38b551f5948a13d0b36e876ba536db Author: Geert Uytterhoeven Date: Wed Jan 28 21:01:02 2009 +0100 platform: Make platform resource input parameters const Make the platform resource input parameters of platform_device_add_resources() and platform_device_register_simple() const, as the resources are copied and never modified. Signed-off-by: Geert Uytterhoeven Acked-by: Greg Kroah-Hartman commit adbf6e6952e80ae42a403442dcae21438cae94b3 Author: Nick Piggin Date: Fri Apr 23 02:06:20 2010 +1000 m68k: invoke oom-killer from page fault As explained in commit 1c0fe6e3bd, we want to call the architecture independent oom killer when getting an unexplained OOM from handle_mm_fault, rather than simply killing current. Cc: linux-m68k@lists.linux-m68k.org Cc: Geert Uytterhoeven Cc: linux-arch@vger.kernel.org Signed-off-by: Nick Piggin Acked-by: David Rientjes [Geert] Kill 2 introduced compiler warnings Signed-off-by: Geert Uytterhoeven commit f3c7f317c91e45aac0ef9d0b6474cd4637de69f0 Author: Geert Uytterhoeven Date: Wed Apr 14 18:48:50 2010 +0200 serial167: Kill unused variables commits 638157bc1461f6718eeca06bedd9a09cf1f35c36 ("serial167: prepare to push BKL down into drivers") and 4165fe4ef7305609a96c7f248cefb9c414d0ede5 ("tty: Fix up char drivers request_room usage") removed code without removing the corresponding variables: | drivers/char/serial167.c: In function 'cd2401_rx_interrupt': | drivers/char/serial167.c:630: warning: unused variable 'len' | drivers/char/serial167.c: In function 'cy_ioctl': | drivers/char/serial167.c:1531: warning: unused variable 'val' Remove the variables to kill the warnings. Signed-off-by: Geert Uytterhoeven commit edb7c60e27c1baff38d82440dc52eaffac9a45f4 Author: Arnaldo Carvalho de Melo Date: Mon May 17 16:22:41 2010 -0300 perf options: Type check all the remaining OPT_ variants OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these tools is to set something that has an enum type, that is builtin compatible with unsigned int. Several string constifications were done to make OPT_STRING require a const char * type. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 8035458fbb567ae138c77a5f710050107c6a7066 Author: Arnaldo Carvalho de Melo Date: Mon May 17 15:51:10 2010 -0300 perf options: Type check OPT_BOOLEAN and fix the offenders Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit b52dd0077cde89111c00efc73a8db07f50ebb3e8 Author: Geert Uytterhoeven Date: Sun Feb 28 13:06:27 2010 +0100 m68k: Implement generic_find_next_{zero_,}le_bit() linux-next: fs/udf/balloc.c: In function 'udf_bitmap_new_block': fs/udf/balloc.c:274: error: implicit declaration of function 'generic_find_next_le_bit' Convert ext2_find_next_{zero_,}bit() into generic_find_next_{zero_,}le_bit(), and wrap the ext2_find_next_{zero_,}bit() around the latter. Signed-off-by: Geert Uytterhoeven commit 9881bbb269e8f287d0e55ae3384e48b26f1872f7 Author: Andrea Gelmini Date: Sat Feb 27 17:51:33 2010 +0100 m68k: hp300 - Checkpatch cleanup arch/m68k/hp300/time.h:2: WARNING: space prohibited between function name and open parenthesis '(' Signed-off-by: Andrea Gelmini Signed-off-by: Geert Uytterhoeven commit b9b0d8b430ebd61d32e2a9544e75a3c4b10cddcd Author: Frans Pop Date: Sat Feb 6 18:47:11 2010 +0100 m68k: Remove trailing spaces in messages Signed-off-by: Frans Pop Signed-off-by: Geert Uytterhoeven commit 291d7e9553fef2e18825b1ef1345fbd316dff98f Author: Robert P. J. Day Date: Thu Dec 31 15:35:44 2009 -0500 m68k: Simplify param.h by using Signed-off-by: Robert P. J. Day Signed-off-by: Geert Uytterhoeven commit b1f3bb494e8acddeb972331c2ac642b3c7bceeb9 Author: Thomas Gleixner Date: Thu Oct 15 08:42:21 2009 +0000 m68k: Remove BKL from rtc implementations m68k does not support SMP. The access to the rtc is already serialized with local_irq_save/restore which is sufficient on UP. The open() protection in arch/m68k/mvme16x/rtc.c is not pretty but sufficient on UP and safe w/o the BKL. open() in arch/m68k/bvme6000/rtc.c can do with the same atomic logic as arch/m68k/mvme16x/rtc.c Signed-off-by: Thomas Gleixner Signed-off-by: Geert Uytterhoeven commit 1967936d688c475b85d34d84e09858cf514c893c Author: Arnaldo Carvalho de Melo Date: Mon May 17 15:39:16 2010 -0300 perf options: Check v type in OPT_U?INTEGER To avoid problems like the one fixed by Stephane Eranian in 3de29ca, now we'll got this instead: bench/sched-messaging.c:259: error: negative width in bit-field ‘’ bench/sched-messaging.c:261: error: negative width in bit-field ‘’ Which is rather cryptic, but is how BUILD_BUG_ON_ZERO works, so kernel hackers should be already used to this. With it in place found some problems, fixed by changing the affected variables to sensible types or changed some OPT_INTEGER to OPT_UINTEGER. Next csets will go thru converting each of the remaining OPT_ so that review can be made easier by grouping changes per type per patch. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c100edbee8dbf033ec4095a976a74c1c75c9fc1d Author: Arnaldo Carvalho de Melo Date: Mon May 17 15:30:00 2010 -0300 perf options: Introduce OPT_UINTEGER For unsigned int options to be parsed, next patches will make use of it. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit dc4ff19341126155c5714119396efbae62ab40bf Author: Arnaldo Carvalho de Melo Date: Mon May 17 12:25:09 2010 -0300 perf tui: Add workaround for slang < 2.1.4 Older versions of the slang library didn't used the 'const' specifier, causing problems with modern compilers of this kind: util/newt.c:252: error: passing argument 1 of ‘SLsmg_printf’ discards qualifiers from pointer target type Fix it by using some wrappers that when needed const the affected parameters back to plain (char *). Reported-by: Lin Ming Cc: Frédéric Weisbecker Cc: Lin Ming Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: <20100517145421.GD29052@ghostprotocols.net> Signed-off-by: Arnaldo Carvalho de Melo commit 3de29cab1f8d62db557a4afed0fb17eebfe64438 Author: Stephane Eranian Date: Mon May 17 12:20:43 2010 -0300 perf record: Fix bug mismatch with -c option definition The -c option defines the user requested sampling period. It was implemented using an unsigned int variable but the type of the option was OPT_LONG. Thus, the option parser was overwriting memory belonging to other variables, namely the mmap_pages leading to a zero page sampling buffer. The bug was exposed only when compiling at -O0, probably because the compiler was padding variables at higher optimization levels. This patch fixes this problem by declaring user_interval as u64. This also avoids wrap-around issues for large period on 32-bit systems. Commiter note: Made it use OPT_U64(user_interval) after implementing OPT_U64 in the previous patch. Cc: David S. Miller Cc: Frédéric Weisbecker Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <4bf11ae9.e88cd80a.06b0.ffffa8e3@mx.google.com> Signed-off-by: Stephane Eranian Signed-off-by: Arnaldo Carvalho de Melo commit 6ba85cea872954a36d79e46bf6a9c6ea92794f01 Author: Arnaldo Carvalho de Melo Date: Mon May 17 12:16:48 2010 -0300 perf options: Introduce OPT_U64 We have things like user_interval (-c/--count) in 'perf record' that needs this. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 81880d603d00c645e0890d0a44d50711c503b72b Author: Anton Blanchard Date: Mon May 17 14:34:57 2010 +1000 atomic_t: Remove volatile from atomic_t definition When looking at a performance problem on PowerPC, I noticed some awful code generation: c00000000051fc98: 3b 60 00 01 li r27,1 ... c00000000051fca0: 3b 80 00 00 li r28,0 ... c00000000051fcdc: 93 61 00 70 stw r27,112(r1) c00000000051fce0: 93 81 00 74 stw r28,116(r1) c00000000051fce4: 81 21 00 70 lwz r9,112(r1) c00000000051fce8: 80 01 00 74 lwz r0,116(r1) c00000000051fcec: 7d 29 07 b4 extsw r9,r9 c00000000051fcf0: 7c 00 07 b4 extsw r0,r0 c00000000051fcf4: 7c 20 04 ac lwsync c00000000051fcf8: 7d 60 f8 28 lwarx r11,0,r31 c00000000051fcfc: 7c 0b 48 00 cmpw r11,r9 c00000000051fd00: 40 c2 00 10 bne- c00000000051fd10 c00000000051fd04: 7c 00 f9 2d stwcx. r0,0,r31 c00000000051fd08: 40 c2 ff f0 bne+ c00000000051fcf8 c00000000051fd0c: 4c 00 01 2c isync We create two constants, write them out to the stack, read them straight back in and sign extend them. What a mess. It turns out this bad code is a result of us defining atomic_t as a volatile int. We removed the volatile attribute from the powerpc atomic_t definition years ago, but commit ea435467500612636f8f4fb639ff6e76b2496e4b (atomic_t: unify all arch definitions) added it back in. To dig up an old quote from Linus: > The fact is, volatile on data structures is a bug. It's a wart in the C > language. It shouldn't be used. > > Volatile accesses in *code* can be ok, and if we have "atomic_read()" > expand to a "*(volatile int *)&(x)->value", then I'd be ok with that. > > But marking data structures volatile just makes the compiler screw up > totally, and makes code for initialization sequences etc much worse. And screw up it does :) With the volatile removed, we see much more reasonable code generation: c00000000051f5b8: 3b 60 00 01 li r27,1 ... c00000000051f5c0: 3b 80 00 00 li r28,0 ... c00000000051fc7c: 7c 20 04 ac lwsync c00000000051fc80: 7c 00 f8 28 lwarx r0,0,r31 c00000000051fc84: 7c 00 d8 00 cmpw r0,r27 c00000000051fc88: 40 c2 00 10 bne- c00000000051fc98 c00000000051fc8c: 7f 80 f9 2d stwcx. r28,0,r31 c00000000051fc90: 40 c2 ff f0 bne+ c00000000051fc80 c00000000051fc94: 4c 00 01 2c isync Six instructions less. Signed-off-by: Anton Blanchard Signed-off-by: Linus Torvalds commit f3d46f9d3194e0329216002a8724d4c0957abc79 Author: Anton Blanchard Date: Mon May 17 14:33:53 2010 +1000 atomic_t: Cast to volatile when accessing atomic variables In preparation for removing volatile from the atomic_t definition, this patch adds a volatile cast to all the atomic read functions. Signed-off-by: Anton Blanchard Signed-off-by: Linus Torvalds commit fea24e28c663e62663097f0ed3b8ff1f9a87f15e Author: Jacob Pan Date: Fri May 14 14:41:20 2010 -0700 x86, mrst: add nop functions to x86_init mpparse functions Moorestown does not have BIOS provided MP tables, we can save some time by avoiding scaning of these tables. e.g. [ 0.000000] Scan SMP from c0000000 for 1024 bytes. [ 0.000000] Scan SMP from c009fc00 for 1024 bytes. [ 0.000000] Scan SMP from c00f0000 for 65536 bytes. [ 0.000000] Scan SMP from c00bfff0 for 1024 bytes. Searching EBDA with the base at 0x40E will also result in random pointer deferencing within 1MB. This can be a problem in Lincroft if the pointer hits VGA area and VGA mode is not enabled. Signed-off-by: Jacob Pan LKML-Reference: <1273873281-17489-8-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Thomas Gleixner Signed-off-by: H. Peter Anvin commit e4af4268a34d8cd28c46a03161fc017cbd2db887 Author: Jacob Pan Date: Fri May 14 14:41:14 2010 -0700 x86, mrst, pci: return 0 for non-present pci bars Moorestown PCI code has special handling of devices with fixed BARs. In case of BAR sizing writes, we need to update the fake PCI MMCFG space with real size decode value. When a BAR is not present, we need to return 0 instead of ~0. ~0 will be treated as device error per bugzilla 12006. Signed-off-by: Jacob Pan LKML-Reference: <1273873281-17489-2-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by: Jesse Barnes Acked-by: Thomas Gleixner Signed-off-by: H. Peter Anvin commit a9a4ab747e2d45bf08fddbc1568f080091486af9 Author: Arnaldo Carvalho de Melo Date: Sun May 16 21:04:27 2010 -0300 perf tui: Add help window to show key associations Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit a308f3a868185d4f804fe71d0400e2b058c6d9af Author: Arnaldo Carvalho de Melo Date: Sun May 16 20:29:38 2010 -0300 perf tui: Make <- exit menus too In fact it is now added to the hot key list when newt_form__new is used, allowing us to remove the explicit assignment in all its users. The visible change is that <- will exit the menu that pops up when -> is pressed (and Enter when callchains are not being used). Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 9d192e118a094087494997ea1c8a2faf39af38c5 Author: Arnaldo Carvalho de Melo Date: Sat May 15 21:15:01 2010 -0300 perf newt: Add single key shortcuts for zoom into DSO and threads 'D'/'d' for zooming into the DSO in the current highlighted hist entry, 'T'/'t' for zooming into the current thread. Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 29351db6a05e7e42be457569428425520a18beec Author: Arnaldo Carvalho de Melo Date: Sat May 15 21:06:58 2010 -0300 perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed ESC still asks for confirmation. Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c1ec5fefd9cd9ccb020966a49a3c7f44b25d9e84 Author: Arnaldo Carvalho de Melo Date: Sat May 15 20:45:31 2010 -0300 perf newt: Fix the 'A'/'a' shortcut for annotate Reported-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 605539034f208d365f76af8e2152cb25f702367d Author: Arnaldo Carvalho de Melo Date: Sat May 15 20:40:34 2010 -0300 perf newt: Make <- exit the ui_browser Right now that means that pressing the left arrow willl make the symbol annotation window to exit back to the main symbol histogram browser. This is another improvement on the UI fastpath, i.e. just the arrows and enter are enough for most browsing. Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7ebaa2838a751125c113072486334d7b4e63f9ad Merge: 1ff3d7d 3e1bbdc3 Author: Ingo Molnar Date: Sat May 15 08:39:09 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 1ff3d7d79204612ebe2e611d2592f8898908ca00 Author: Cyrill Gorcunov Date: Fri May 14 23:08:15 2010 +0400 x86, perf: P4 PMU - fix counters management logic Jaswinder reported this #GP: | | Message from syslogd@ht at May 14 09:39:32 ... | kernel:[ 314.908612] EIP: [] | x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70 | Ming has narrowed it down to a comparision issue between arguments with different sizes and signs. As result event index reached a wrong value which in turn led to a GP fault. At the same time it was found that p4_next_cntr has broken logic and should return the counter index only if it was not yet borrowed for another event. Reported-by: Jaswinder Singh Rajput Reported-by: Lin Ming Bisected-by: Lin Ming Tested-by: Jaswinder Singh Rajput Signed-off-by: Cyrill Gorcunov CC: Peter Zijlstra CC: Frederic Weisbecker LKML-Reference: <20100514190815.GG13509@lenovo> Signed-off-by: Ingo Molnar commit 3e1bbdc3a721f4b1ed44f4554402a8dbc60fa97f Author: Arnaldo Carvalho de Melo Date: Fri May 14 20:05:21 2010 -0300 perf newt: Make <- zoom out filters After we use the filters to zoom into DSOs or threads, we can use <- (left arrow) to zoom out from the last filter applied. It is still possible to zoom out of order by using the popup menu. With this we now have the zoom out operation on the browsing fast path, by allowing fast navigation using just the four arrors and the enter key to expand collapse callchains. Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c82ee828aa20487d254a5225d256cd422acee459 Author: Arnaldo Carvalho de Melo Date: Fri May 14 14:19:35 2010 -0300 perf report: Report number of events, not samples Number of samples is meaningless after we switched to auto-freq, so report the number of events, i.e. not the sum of the different periods, but the number PERF_RECORD_SAMPLE emitted by the kernel. While doing this I noticed that naming "count" to the sum of all the event periods can be confusing, so rename it to .period, just like in struct sample.data, so that we become more consistent. This helps with the next step, that was to record in struct hist_entry the number of sample events for each instance, we need that because we use it to generate the number of events when applying filters to the tree of hist entries like it is being done in the TUI report browser. Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit ade275c4b6db98ae7b197cc7c6bdd73567a975c2 Merge: 5f65f15 508ff9d Author: Steve French Date: Fri May 14 17:11:48 2010 +0000 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 commit cee75ac7ecc27084accdb9d9d6fde65a09f047ae Author: Arnaldo Carvalho de Melo Date: Fri May 14 13:16:55 2010 -0300 perf hist: Clarify events_stats fields usage The events_stats.total field is too generic, rename it to .total_period, and also add a comment explaining that it is the sum of all the .period fields in samples, that is needed because we use auto-freq to avoid sampling artifacts. Ditto for events_stats.lost, that is the sum of all lost_event.lost fields, i.e. the number of events the kernel dropped. Looking at the users, builtin-sched.c can make use of these fields and stop doing it again. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 5f65f150fbc580ccfd57e5528e1fc905aaaef65c Merge: baa4563 6a251b0 Author: Steve French Date: Fri May 14 15:15:28 2010 +0000 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 commit c8446b9bdabcb0caa61bb341bd73c58f7104b503 Author: Arnaldo Carvalho de Melo Date: Fri May 14 10:36:42 2010 -0300 perf hist: Make event__totals per hists This is one more thing that started global but are more useful per hist or per session. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 9e565292270a2d55524be38835104c564ac8f795 Author: Roland McGrath Date: Thu May 13 21:43:03 2010 -0700 x86: Use .cfi_sections for assembly code The newer assemblers support the .cfi_sections directive so we can put the CFI from .S files into the .debug_frame section that is preserved in unstripped vmlinux and in separate debuginfo, rather than the .eh_frame section that is now discarded by vmlinux.lds.S. Signed-off-by: Roland McGrath LKML-Reference: <20100514044303.A6FE7400BE@magilla.sf.frob.com> Signed-off-by: H. Peter Anvin commit baa456331738b4e76a92318b62b354377a30ad80 Merge: aa3e557 4462dc0 Author: Steve French Date: Thu May 13 22:19:32 2010 +0000 Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/cifs/inode.c commit 5d2be7cb198a0a6bc6088d3806fb7261b184ad89 Author: Kirill Smelkov Date: Thu May 13 14:39:25 2010 +0400 perf trace scripts: Fix typos in perf-trace-python.txt option option -> option special special -> special Signed-off-by: Kirill Smelkov Cc: Tom Zanussi LKML-Reference: <1273747165-17242-1-git-send-email-kirr@mns.spb.ru> Signed-off-by: Arnaldo Carvalho de Melo commit 2e6cdf996ba43ce0b090ffbf754f83e17362cd69 Author: Stephane Eranian Date: Wed May 12 10:40:01 2010 +0200 perf tools: change event inheritance logic in stat and record By default, event inheritance across fork and pthread_create was on but the -i option of stat and record, which enabled inheritance, led to believe it was off by default. This patch fixes this logic by inverting the meaning of the -i option. By default inheritance is on whether you attach to a process (-p), a thread (-t) or start a process. If you pass -i, then you turn off inheritance. Turning off inheritance if you don't need it, helps limit perf resource usage as well. The patch also fixes perf stat -t xxxx and perf record -t xxxx which did not start the counters. Acked-by: Frederic Weisbecker Cc: David S. Miller Cc: Frédéric Weisbecker Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <4bea9d2f.d60ce30a.0b5b.08e1@mx.google.com> Signed-off-by: Stephane Eranian Signed-off-by: Arnaldo Carvalho de Melo commit 8a0ecfb8b47dc765fdf460913231876bbc95385e Author: Frederic Weisbecker Date: Thu May 13 19:47:16 2010 +0200 perf hist: Fix missing getline declaration hist.c needs to include util.h so that it gets stdio.h inclusion with __GNU_SOURCE defined. Fixes: util/hist.c: In function ‘hist_entry__parse_objdump_line’: util/hist.c:931: erreur: implicit declaration of function ‘getline’ util/hist.c:931: erreur: nested extern declaration of ‘getline’ Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras LKML-Reference: <1273772836-11533-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo commit 8769e1c7177fd9f6981042bcc6c2851c99a4a7e7 Author: Frederic Weisbecker Date: Thu May 13 19:22:58 2010 +0200 perf hist: Fix hists__browse no-newt case Fix mistake in a parameter type of the no-newt hists__browse() version. Fixes: builtin-report.c: In function ‘__cmd_report’: builtin-report.c:314: erreur: incompatible type for argument 1 of ‘hists__browse’ Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras LKML-Reference: <1273771378-8577-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo commit 720019908fd5a1bb442bb0a35a6027ba21864d25 Author: Cyrill Gorcunov Date: Wed May 12 21:42:42 2010 +0400 x86, perf: P4 PMU -- use hash for p4_get_escr_idx() Linear search over all p4 MSRs should be fine if only we would not use it in events scheduling routine which is pretty time critical. Lets use hashes. It should speed scheduling up significantly. v2: Steven proposed to use more gentle approach than issue BUG on error, so we use WARN_ONCE now Signed-off-by: Cyrill Gorcunov Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Frederic Weisbecker Cc: Lin Ming LKML-Reference: <20100512174242.GA5190@lenovo> Signed-off-by: Ingo Molnar commit ad56b0797e67df5e04b2f1a1e02900145c5c16f2 Merge: 668eb65 b57f95a Author: Ingo Molnar Date: Thu May 13 08:11:26 2010 +0200 Merge commit 'v2.6.34-rc7' into tracing/core Merge reason: Update from -rc5 to -rc7. Signed-off-by: Ingo Molnar commit 975fc2d5f20b071576e7c9920c4f1a1eae80f88d Merge: 8e6d557 ef7b93a Author: Ingo Molnar Date: Wed May 12 08:14:29 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit aa3e5572c538d753dce11bf93532a75f95d22b40 Author: Steve French Date: Wed May 12 02:24:12 2010 +0000 [CIFS] drop quota operation stubs CIFS has stubs for XFS-style quotas without an actual implementation backing them, hidden behind a config option not visible in Kconfig. Remove these stubs for now as the quota operations will see some major changes and this code simply gets in the way. Signed-off-by: Christoph Hellwig Reviewed-by: Jeff Layton Signed-off-by: Jan Kara Signed-off-by: Steve French commit ef7b93a11904c6ba10604233d318d9e8ec88cddc Author: Arnaldo Carvalho de Melo Date: Tue May 11 23:18:06 2010 -0300 perf report: Librarize the annotation code and use it in the newt browser Now we don't anymore use popen to run 'perf annotate' for the selected symbol, instead we collect per address samplings when processing samples in 'perf report' if we're using the newt browser, then we use this data directly to do annotation. Done this way we can actually traverse the objdump_line objects directly, matching the addresses to the collected samples and colouring them appropriately using lower level slang routines. The new ui_browser class will be reused for the main, callchain aware, histogram browser, when it will be made generic and don't assume that the objects are always instances of the objdump_line class maintained using list_heads. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c9775b4cc522e5f1b40b1366a993f0f05f600f39 Author: H. Peter Anvin Date: Tue May 11 17:49:54 2010 -0700 x86, fpu: Use static_cpu_has() to implement use_xsave() use_xsave() is now just a special case of static_cpu_has(), so use static_cpu_has(). Signed-off-by: H. Peter Anvin Cc: Avi Kivity Cc: Suresh Siddha LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com> commit a3c8acd04376d604370dcb6cd2143c9c14078a50 Author: H. Peter Anvin Date: Tue May 11 17:47:07 2010 -0700 x86: Add new static_cpu_has() function using alternatives For CPU-feature-specific code that touches performance-critical paths, introduce a static patching version of [boot_]cpu_has(). This is run at alternatives time and is therefore not appropriate for most initialization code, but on the other hand initialization code is generally not performance critical. On gcc 4.5+ this uses the new "asm goto" feature. Signed-off-by: H. Peter Anvin Cc: Avi Kivity Cc: Suresh Siddha LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com> commit 72d5a9f7a9542f88397558c65bcfc3b115a65e34 Author: Paul E. McKenney Date: Mon May 10 17:12:17 2010 -0700 rcu: remove all rcu head initializations, except on_stack initializations Remove all rcu head inits. We don't care about the RCU head state before passing it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can keep track of objects on stack. Signed-off-by: Mathieu Desnoyers Signed-off-by: Paul E. McKenney commit 3798ed7bc7ade26d3f59506cd06288615dfc7585 Author: Arnaldo Carvalho de Melo Date: Tue May 11 18:01:23 2010 -0300 perf ui: Add ui_helpline methods Initially this was just to be able to have a printf like method to prepare the formatted string and then pass to newtPushHelpLine, but as we already have for ui_progress, etc, its a step in identifying a restricted, highlevel set of widgets we can then have implementations for multiple widget sets (GTK, etc). Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit a93d2f1744206827ccf416e2cdc5018aa503314e Author: Changli Gao Date: Fri May 7 14:33:26 2010 +0800 sched, wait: Use wrapper functions epoll should not touch flags in wait_queue_t. This patch introduces a new function __add_wait_queue_exclusive(), for the users, who use wait queue as a LIFO queue. __add_wait_queue_tail_exclusive() is introduced too instead of add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed, as it is a duplicate of __remove_wait_queue(), disliked by users, and with less users. Signed-off-by: Changli Gao Signed-off-by: Peter Zijlstra Cc: Alexander Viro Cc: Paul Menage Cc: Li Zefan Cc: Davide Libenzi Cc: LKML-Reference: <1273214006-2979-1-git-send-email-xiaosuo@gmail.com> Signed-off-by: Ingo Molnar commit d11c7addfe0fa501cb54c824c0fac3481d527433 Author: Kyle McMartin Date: Mon May 10 16:43:35 2010 -0400 perf symbols: allow forcing use of cplus_demangle For Fedora, I want to force perf to link against libiberty.a for cplus_demangle, rather than libbfd.a for bfd_demangle due to licensing insanity on binutils. (libiberty is LGPL2, libbfd is GPL3.) If we just rely on autodetection, we'll end up with libbfd linked against us, since they're both in binutils-static in the buildroot. Cc: Ingo Molnar LKML-Reference: <20100510204335.GA7565@bombadil.infradead.org> Signed-off-by: Kyle McMartin Signed-off-by: Arnaldo Carvalho de Melo commit 6b3c4ef50441e85dc9b2c9b67e95e8ad1185c15e Author: Masami Hiramatsu Date: Tue May 11 00:59:53 2010 -0400 perf probe: Check older elfutils and set NO_DWARF Check whether elfutils is older than 0.138 (from which version checking routine has been introduced). And if so, set NO_DWARF because it is hard to check the API dependency without version checking. Signed-off-by: Masami Hiramatsu Reported-by: Robert Richter Cc: Robert Richter Cc: Ingo Molnar LKML-Reference: <20100511045953.9913.19485.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo commit b09e0190acf88c7fe3b05e3c331e1b2ef5310896 Author: Arnaldo Carvalho de Melo Date: Tue May 11 11:10:15 2010 -0300 perf hist: Adopt filter by dso and by thread methods from the newt browser Those are really not specific to the newt code, can be used by other UI frontends. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit fdb3603800e7a65bc3cafdfd5a1797d08f09e582 Author: Suresh Jayaraman Date: Tue May 11 09:46:46 2010 +0530 cifs: propagate cifs_new_fileinfo() error back to the caller ..otherwise memory allocation errors go undetected. Signed-off-by: Suresh Jayaraman Signed-off-by: Steve French commit 795e74f7a69f9c08afa4fa7c86cc4f18a62bd630 Merge: a523572 12c7389 Author: Joerg Roedel Date: Tue May 11 17:40:57 2010 +0200 Merge branch 'iommu/largepages' into amd-iommu/2.6.35 Conflicts: arch/x86/kernel/amd_iommu.c commit a52357259680fe5368c2fabf5949209e231f2aa2 Author: Joerg Roedel Date: Tue May 11 17:12:33 2010 +0200 x86/amd-iommu: Add amd_iommu=off command line option This patch adds a command line option to tell the AMD IOMMU driver to not initialize any IOMMU it finds. Signed-off-by: Joerg Roedel commit 8e6d5573af55435160d329f6ae3fe16a0abbdaec Author: Lin Ming Date: Sat May 8 20:28:41 2010 +1000 perf, powerpc: Implement group scheduling transactional APIs [paulus@samba.org: Set cpuhw->event[i]->hw.config in power_pmu_commit_txn.] Signed-off-by: Lin Ming Signed-off-by: Paul Mackerras Signed-off-by: Peter Zijlstra LKML-Reference: <20100508102841.GA10650@brick.ozlabs.ibm.com> Signed-off-by: Ingo Molnar commit 96c21a460a37880abfbc8445d5b098dbab958a29 Author: Peter Zijlstra Date: Tue May 11 16:19:10 2010 +0200 perf: Fix exit() vs event-groups Corey reported that the value scale times of group siblings are not updated when the monitored task dies. The problem appears to be that we only update the group leader's time values, fix it by updating the whole group. Reported-by: Corey Ashford Signed-off-by: Peter Zijlstra Cc: Paul Mackerras Cc: Mike Galbraith Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: # .34.x LKML-Reference: <1273588935.1810.6.camel@laptop> Signed-off-by: Ingo Molnar commit 050735b08ca8a016bbace4445fa025b88fee770b Author: Peter Zijlstra Date: Tue May 11 11:51:53 2010 +0200 perf: Fix exit() vs PERF_FORMAT_GROUP Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work as expected if the task the counters were attached to quit before the read() call. The cause is that we unconditionally destroy the grouping when we remove counters from their context. Fix this by splitting off the group destroy from the list removal such that perf_event_remove_from_context() does not do this and change perf_event_release() to do so. Reported-by: Corey Ashford Reported-by: Stephane Eranian Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: # .34.x LKML-Reference: <1273571513.5605.3527.camel@twins> Signed-off-by: Ingo Molnar commit e3174cfd2a1e28fff774681f00a0eef3d31da970 Author: Ingo Molnar Date: Tue May 11 08:31:49 2010 +0200 Revert "perf: Fix exit() vs PERF_FORMAT_GROUP" This reverts commit 4fd38e4595e2f6c9d27732c042a0e16b2753049c. It causes various crashes and hangs when events are activated. The cause is not fully understood yet but we need to revert it because the effects are severe. Reported-by: Stephane Eranian Reported-by: Lin Ming Cc: Peter Zijlstra Cc: Corey Ashford Cc: Mike Galbraith Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker LKML-Reference: Signed-off-by: Ingo Molnar commit 4376030a54860dedab9d848dfa7cc700a6025c0b Author: Mathieu Desnoyers Date: Sat Apr 17 08:48:39 2010 -0400 rcu head introduce rcu head init on stack PEM: o Would it be possible to make this bisectable as follows? a. Insert a new patch after current patch 4/6 that defines destroy_rcu_head_on_stack(), init_rcu_head_on_stack(), and init_rcu_head() with their !CONFIG_DEBUG_OBJECTS_RCU_HEAD definitions. This patch performs this transition. Signed-off-by: Mathieu Desnoyers CC: "Paul E. McKenney" CC: David S. Miller CC: akpm@linux-foundation.org CC: mingo@elte.hu CC: laijs@cn.fujitsu.com CC: dipankar@in.ibm.com CC: josh@joshtriplett.org CC: dvhltc@us.ibm.com CC: niv@us.ibm.com CC: tglx@linutronix.de CC: peterz@infradead.org CC: rostedt@goodmis.org CC: Valdis.Kletnieks@vt.edu CC: dhowells@redhat.com CC: eric.dumazet@gmail.com CC: Alexey Dobriyan Signed-off-by: Paul E. McKenney commit a5d8e467f83f6672104f276223a88e3b50cbd375 Author: Mathieu Desnoyers Date: Sat Apr 17 08:48:38 2010 -0400 Debugobjects transition check Implement a basic state machine checker in the debugobjects. This state machine checker detects races and inconsistencies within the "active" life of a debugobject. The checker only keeps track of the current state; all the state machine logic is kept at the object instance level. The checker works by adding a supplementary "unsigned int astate" field to the debug_obj structure. It keeps track of the current "active state" of the object. The only constraints that are imposed on the states by the debugobjects system is that: - activation of an object sets the current active state to 0, - deactivation of an object expects the current active state to be 0. For the rest of the states, the state mapping is determined by the specific object instance. Therefore, the logic keeping track of the state machine is within the specialized instance, without any need to know about it at the debugobject level. The current object active state is changed by calling: debug_object_active_state(addr, descr, expect, next) where "expect" is the expected state and "next" is the next state to move to if the expected state is found. A warning is generated if the expected is not found. Signed-off-by: Mathieu Desnoyers Reviewed-by: Thomas Gleixner Acked-by: David S. Miller CC: "Paul E. McKenney" CC: akpm@linux-foundation.org CC: mingo@elte.hu CC: laijs@cn.fujitsu.com CC: dipankar@in.ibm.com CC: josh@joshtriplett.org CC: dvhltc@us.ibm.com CC: niv@us.ibm.com CC: peterz@infradead.org CC: rostedt@goodmis.org CC: Valdis.Kletnieks@vt.edu CC: dhowells@redhat.com CC: eric.dumazet@gmail.com CC: Alexey Dobriyan Signed-off-by: Paul E. McKenney commit e61a639a794063d78fd248a37ce2c21d5c81fc19 Author: Tom Zanussi Date: Sun May 9 23:47:00 2010 -0500 perf/trace/scripting: syscall-counts script cleanup A small fix for the syscall counts script: - silence the match output in the shell script Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-10-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit 79e653f1bf2e52d12a952366e782dadf590b9d1d Author: Tom Zanussi Date: Sun May 9 23:46:59 2010 -0500 perf/trace/scripting: syscall-counts-by-pid script cleanup A small fix for the syscall counts by pid script: - silence the match output in the shell script Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-9-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit a4ab0c12975d1286b2696370f5e0576450609bf0 Author: Tom Zanussi Date: Sun May 9 23:46:58 2010 -0500 perf/trace/scripting: failed-syscalls-by-pid script cleanup A small fixe for the failed syscalls by pid script: - silence the match output in the shell script Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-8-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit 3824a4e8da9791f4eed99d69bfcdb3b42f440426 Author: Tom Zanussi Date: Sun May 9 23:46:57 2010 -0500 perf/trace/scripting: don't show script start/stop messages by default Only print the script start/stop messages in verbose mode - users normally don't care and it just clutters up the output. Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-7-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit a3412d9b358d37fce4527fd67ea601635f2b9496 Author: Tom Zanussi Date: Sun May 9 23:46:56 2010 -0500 perf/trace/scripting: workqueue-stats script cleanup Some minor fixes for the workqueue-stats script: - Fix nuisance 'use of uninitialized value' warnings Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-6-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit e366728d57cb8c708f76b282ae194c6044355b5f Author: Tom Zanussi Date: Sun May 9 23:46:55 2010 -0500 perf/trace/scripting: wakeup-latency script cleanup Some minor fixes for the wakeup-latency script: - Fix nuisance 'use of uninitialized value' warnings - Avoid divide-by-zero error Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-5-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit e88a4bfbcda440b1c6b9d5a31a554a6ad9686182 Author: Tom Zanussi Date: Sun May 9 23:46:54 2010 -0500 perf/trace/scripting: rwtop script cleanup A couple of fixes for the rwtop script: - printing the totals and clearing the hashes in the signal handler eventually leads to various random and serious problems when running the rwtop script continuously. Moving the print_totals() calls to the event handlers solves that problem, and the event handlers are invoked frequently enough that it doesn't affect the timeliness of the output. - Fix nuisance 'use of uninitialized value' warnings Cc: Frédéric Weisbecker Cc: Ingo Molnar Message-Id: <1273466820-9330-4-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit 6922c3d772711239e75ddaea760e6b0535e7e7a6 Author: Tom Zanussi Date: Sun May 9 23:46:53 2010 -0500 perf/trace/scripting: rw-by-pid script cleanup Some minor fixes for the rw-by-pid script: - Fix nuisance 'use of uninitialized value' warnings - Change the failed read/write sections to sort by error counts Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-3-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit c3f5fd287aa897f710f3305367a1d256c9cf3e83 Author: Tom Zanussi Date: Sun May 9 23:46:52 2010 -0500 perf/trace/scripting: failed-syscalls script cleanup A couple small fixes for the failed syscalls script: - The script description says it can be restricted to a specific comm, make it so. - silence the match output in the shell script Cc: Frédéric Weisbecker Cc: Ingo Molnar LKML-Reference: <1273466820-9330-2-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit fefb0b94bbab858be0909a7eb5ef357e0f996a79 Author: Arnaldo Carvalho de Melo Date: Mon May 10 13:57:51 2010 -0300 perf hist: Calculate max_sym name len and nr_entries Better done when we are adding entries, be it initially of when we're re-sorting the histograms. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit dce8bf4e115aa44d590802ce3554e926840c9042 Author: H. Peter Anvin Date: Mon May 10 13:41:41 2010 -0700 x86, fpu: Use the proper asm constraint in use_xsave() The proper constraint for a receiving 8-bit variable is "=qm", not "=g" which equals "=rim"; even though the "i" will never match, bugs can and do happen due to the difference between "q" and "r". Signed-off-by: H. Peter Anvin Cc: Avi Kivity Cc: Suresh Siddha LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com> commit c3f8978ea332cd4be88e12574452a025892ac9af Author: H. Peter Anvin Date: Mon May 10 13:37:16 2010 -0700 x86, fpu: Unbreak FPU emulation Unbreak FPU emulation, broken by checkin 86603283326c9e95e5ad4e9fdddeec93cac5d9ad: x86: Introduce 'struct fpu' and related API Signed-off-by: H. Peter Anvin Cc: Avi Kivity Cc: Suresh Siddha LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com> commit d822ed1094032ab524344a9a474c93128d9c2159 Author: Paul E. McKenney Date: Sat May 8 19:58:22 2010 -0700 rcu: fix build bug in RCU_FAST_NO_HZ builds Signed-off-by: Paul E. McKenney commit 77e38ed347162423c6b72e06c865a121081c2bb6 Author: Paul E. McKenney Date: Sun Apr 25 21:04:29 2010 -0700 rcu: RCU_FAST_NO_HZ must check RCU dyntick state The current version of RCU_FAST_NO_HZ reproduces the old CLASSIC_RCU dyntick-idle bug, as it fails to detect CPUs that have interrupted or NMIed out of dyntick-idle mode. Fix this by making rcu_needs_cpu() check the state in the per-CPU rcu_dynticks variables, thus correctly detecting the dyntick-idle state from an RCU perspective. Signed-off-by: Paul E. McKenney commit d14aada8e20bdf81ffd43f433b123972cf575b32 Author: Paul E. McKenney Date: Mon Apr 19 22:24:22 2010 -0700 rcu: make SRCU usable in modules Add a #include for mutex.h to allow SRCU to be more easily used in kernel modules. Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney commit f1d507beeab1d1d60a1c58eac7dc81522c6f4629 Author: Paul E. McKenney Date: Thu Apr 15 15:49:46 2010 -0700 rcu: improve the RCU CPU-stall warning documentation The existing Documentation/RCU/stallwarn.txt has proven unhelpful, so rework it a bit. In particular, show how to interpret the stall-warning messages. Signed-off-by: Paul E. McKenney commit d21670acab9fcb4bc74a40b68a6941059234c55c Author: Paul E. McKenney Date: Wed Apr 14 17:39:26 2010 -0700 rcu: reduce the number of spurious RCU_SOFTIRQ invocations Lai Jiangshan noted that up to 10% of the RCU_SOFTIRQ are spurious, and traced this down to the fact that the current grace-period machinery will uselessly raise RCU_SOFTIRQ when a given CPU needs to go through a quiescent state, but has not yet done so. In this situation, there might well be nothing that RCU_SOFTIRQ can do, and the overhead can be worth worrying about in the ksoftirqd case. This patch therefore avoids raising RCU_SOFTIRQ in this situation. Changes since v1 (http://lkml.org/lkml/2010/3/30/122 from Lai Jiangshan): o Omit the rcu_qs_pending() prechecks, as they aren't that much less expensive than the quiescent-state checks. o Merge with the set_need_resched() patch that reduces IPIs. o Add the new n_rp_report_qs field to the rcu_pending tracing output. o Update the tracing documentation accordingly. Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney commit 4a90a0681cf6cd21cd444184302aa045156486b3 Author: Paul E. McKenney Date: Wed Apr 14 16:48:11 2010 -0700 rcu: permit discontiguous cpu_possible_mask CPU numbering TREE_RCU assumes that CPU numbering is contiguous, but some users need large holes in the numbering to better map to hardware layout. This patch makes TREE_RCU (and TREE_PREEMPT_RCU) tolerate large holes in the CPU numbering. However, NR_CPUS must still be greater than the largest CPU number. Signed-off-by: Paul E. McKenney commit 4300aa642cc9ecb35f2e0683dd294fb790ef028c Author: Paul E. McKenney Date: Tue Apr 13 16:18:22 2010 -0700 rcu: improve RCU CPU stall-warning messages The existing RCU CPU stall-warning messages can be confusing, especially in the case where one CPU detects a single other stalled CPU. In addition, the console messages did not say which flavor of RCU detected the stall, which can make it difficult to work out exactly what is causing the stall. This commit improves these messages. Requested-by: Dhaval Giani Signed-off-by: Paul E. McKenney commit 26845c2860cebebe6ce2d9d01ae3cb3db84b7e29 Author: Paul E. McKenney Date: Tue Apr 13 14:19:23 2010 -0700 rcu: print boot-time console messages if RCU configs out of ordinary Print boot-time messages if tracing is enabled, if fanout is set to non-default values, if exact fanout is specified, if accelerated dyntick-idle grace periods have been enabled, if RCU-lockdep is enabled, if rcutorture has been boot-time enabled, if the CPU stall detector has been disabled, or if four-level hierarchy has been enabled. This is all for TREE_RCU and TREE_PREEMPT_RCU. TINY_RCU will be handled separately, if at all. Suggested-by: Josh Triplett Signed-off-by: Paul E. McKenney commit c68de2097a8799549a3c3bf27cbfeea24a604284 Author: Paul E. McKenney Date: Thu Apr 15 10:12:40 2010 -0700 rcu: disable CPU stall warnings upon panic The current RCU CPU stall warnings remain enabled even after a panic occurs, which some people have found to be a bit counterproductive. This patch therefore uses a notifier to disable stall warnings once a panic occurs. Signed-off-by: Paul E. McKenney commit 55ec936ff4e57cc626db336a7bf33b267390e9b4 Author: Paul E. McKenney Date: Tue Apr 13 12:22:33 2010 -0700 rcu: enable CPU_STALL_VERBOSE by default The CPU_STALL_VERBOSE kernel configuration parameter was added to 2.6.34 to identify any preempted/blocked tasks that were preventing the current grace period from completing when running preemptible RCU. As is conventional for new configurations parameters, this defaulted disabled. It is now time to enable it by default. Signed-off-by: Paul E. McKenney commit bbad937983147c017c25406860287cb94da9af7c Author: Paul E. McKenney Date: Fri Apr 2 16:17:17 2010 -0700 rcu: slim down rcutiny by removing rcu_scheduler_active and friends TINY_RCU does not need rcu_scheduler_active unless CONFIG_DEBUG_LOCK_ALLOC. So conditionally compile rcu_scheduler_active in order to slim down rcutiny a bit more. Also gets rid of an EXPORT_SYMBOL_GPL, which is responsible for most of the slimming. Signed-off-by: Paul E. McKenney commit 25502a6c13745f4650cc59322bd198194f55e796 Author: Paul E. McKenney Date: Thu Apr 1 17:37:01 2010 -0700 rcu: refactor RCU's context-switch handling The addition of preemptible RCU to treercu resulted in a bit of confusion and inefficiency surrounding the handling of context switches for RCU-sched and for RCU-preempt. For RCU-sched, a context switch is a quiescent state, pure and simple, just like it always has been. For RCU-preempt, a context switch is in no way a quiescent state, but special handling is required when a task blocks in an RCU read-side critical section. However, the callout from the scheduler and the outer loop in ksoftirqd still calls something named rcu_sched_qs(), whose name is no longer accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched quiescent state, it ends up unnecessarily (though harmlessly, aside from the performance hit) enqueuing the current task if it happens to be running in an RCU-preempt read-side critical section. This not only increases the maximum latency of scheduler_tick(), it also needlessly increases the overhead of the next outermost rcu_read_unlock() invocation. This patch addresses this situation by separating the notion of RCU's context-switch handling from that of RCU-sched's quiescent states. The context-switch handling is covered by rcu_note_context_switch() in general and by rcu_preempt_note_context_switch() for preemptible RCU. This permits rcu_sched_qs() to handle quiescent states and only quiescent states. It also reduces the maximum latency of scheduler_tick(), though probably by much less than a microsecond. Finally, it means that tasks within preemptible-RCU read-side critical sections avoid incurring the overhead of queuing unless there really is a context switch. Suggested-by: Lai Jiangshan Acked-by: Lai Jiangshan Signed-off-by: Paul E. McKenney Cc: Ingo Molnar Cc: Peter Zijlstra commit 99652b54de1ee094236f7171485214071af4ef31 Author: Paul E. McKenney Date: Tue Mar 30 15:50:01 2010 -0700 rcu: rename rcutiny rcu_ctrlblk to rcu_sched_ctrlblk Make naming line up in preparation for CONFIG_TINY_PREEMPT_RCU. Signed-off-by: Paul E. McKenney commit da848c47bc6e873a54a445ea1960423a495b6b32 Author: Paul E. McKenney Date: Tue Mar 30 15:46:01 2010 -0700 rcu: shrink rcutiny by making synchronize_rcu_bh() be inline Because synchronize_rcu_bh() is identical to synchronize_sched(), make the former a static inline invoking the latter, saving the overhead of an EXPORT_SYMBOL_GPL() and the duplicate code. Signed-off-by: Paul E. McKenney commit 32c141a0a1dfa29e0a07d78bec0c0919fc4b9f88 Author: Paul E. McKenney Date: Tue Mar 30 10:59:28 2010 -0700 rcu: fix now-bogus rcu_scheduler_active comments. The rcu_scheduler_active check has been wrapped into the new debug_lockdep_rcu_enabled() function, so update the comments to reflect this new reality. Signed-off-by: Paul E. McKenney commit d20200b591f59847ab6a5c23507084a7d29e23c5 Author: Paul E. McKenney Date: Tue Mar 30 10:52:21 2010 -0700 rcu: Fix bogus CONFIG_PROVE_LOCKING in comments to reflect reality. It is CONFIG_DEBUG_LOCK_ALLOC rather than CONFIG_PROVE_LOCKING, so fix it. Signed-off-by: Paul E. McKenney commit 5db356736acb9ba717df1aa9444e4e44cbb30a71 Author: Lai Jiangshan Date: Tue Mar 30 18:40:36 2010 +0800 rcu: ignore offline CPUs in last non-dyntick-idle CPU check Offline CPUs are not in nohz_cpu_mask, but can be ignored when checking for the last non-dyntick-idle CPU. This patch therefore only checks online CPUs for not being dyntick idle, allowing fast entry into full-system dyntick-idle state even when there are some offline CPUs. Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney commit 0c34029abdfdea64420cb4264c4e91a776b22157 Author: Lai Jiangshan Date: Sun Mar 28 11:12:30 2010 +0800 rcu: move some code from macro to function Shrink the RCU_INIT_FLAVOR() macro by moving all but the initialization of the ->rda[] array to rcu_init_one(). The call to rcu_init_one() can then be moved to the end of the RCU_INIT_FLAVOR() macro, which is required because rcu_boot_init_percpu_data(), which is now called from rcu_init_one(), depends on the initialization of the ->rda[] array. Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney commit f261414f0d56dd1a0e34888e27d1d4902ad052f3 Author: Lai Jiangshan Date: Sun Mar 28 11:15:20 2010 +0800 rcu: make dead code really dead cleanup: make dead code really dead Signed-off-by: Lai Jiangshan Signed-off-by: Paul E. McKenney commit d25eb9442bb2c38c1e742f0fa764d7132d72593f Author: Paul E. McKenney Date: Thu Mar 18 21:36:51 2010 -0700 rcu: substitute set_need_resched for sending resched IPIs This patch adds a check to __rcu_pending() that does a local set_need_resched() if the current CPU is holding up the current grace period and if force_quiescent_state() will be called soon. The goal is to reduce the probability that force_quiescent_state() will need to do smp_send_reschedule(), which sends an IPI and is therefore more expensive on most architectures. Signed-off-by: "Paul E. McKenney" commit 2b3fc35f6919344e3cf722dde8308f47235c0b70 Author: Lai Jiangshan Date: Tue Apr 20 16:23:07 2010 +0800 rcu: optionally leave lockdep enabled after RCU lockdep splat There is no need to disable lockdep after an RCU lockdep splat, so remove the debug_lockdeps_off() from lockdep_rcu_dereference(). To avoid repeated lockdep splats, use a static variable in the inlined rcu_dereference_check() and rcu_dereference_protected() macros so that a given instance splats only once, but so that multiple instances can be detected per boot. This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY, which is disabled by default. This provides the normal lockdep behavior by default, but permits people who want to find multiple RCU-lockdep splats per boot to easily do so. Requested-by: Eric Paris Signed-off-by: Lai Jiangshan Tested-by: Eric Paris Signed-off-by: Paul E. McKenney commit fae683f764f91f31ab45512e70cc8cc81d4d157b Author: Suresh Jayaraman Date: Mon May 10 20:00:05 2010 +0530 cifs: add comments explaining cifs_new_fileinfo behavior The comments make it clear the otherwise subtle behavior of cifs_new_fileinfo(). Signed-off-by: Suresh Jayaraman Reviewed-by: Shirish Pargaonkar -- fs/cifs/dir.c | 18 ++++++++++++++++-- 1 files changed, 16 insertions(+), 2 deletions(-) Signed-off-by: Steve French commit 86603283326c9e95e5ad4e9fdddeec93cac5d9ad Author: Avi Kivity Date: Thu May 6 11:45:46 2010 +0300 x86: Introduce 'struct fpu' and related API Currently all fpu state access is through tsk->thread.xstate. Since we wish to generalize fpu access to non-task contexts, wrap the state in a new 'struct fpu' and convert existing access to use an fpu API. Signal frame handlers are not converted to the API since they will remain task context only things. Signed-off-by: Avi Kivity Acked-by: Suresh Siddha LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin commit c9ad488289144ae5ef53b012e15895ef1f5e4bb6 Author: Avi Kivity Date: Thu May 6 11:45:45 2010 +0300 x86: Eliminate TS_XSAVE The fpu code currently uses current->thread_info->status & TS_XSAVE as a way to distinguish between XSAVE capable processors and older processors. The decision is not really task specific; instead we use the task status to avoid a global memory reference - the value should be the same across all threads. Eliminate this tie-in into the task structure by using an alternative instruction keyed off the XSAVE cpu feature; this results in shorter and faster code, without introducing a global memory reference. [ hpa: in the future, this probably should use an asm jmp ] Signed-off-by: Avi Kivity Acked-by: Suresh Siddha LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin commit 1c02c4d2e92f2097f1bba63ec71560b0e05a7f36 Author: Arnaldo Carvalho de Melo Date: Mon May 10 13:04:11 2010 -0300 perf hist: Introduce hists class and move lots of methods to it In cbbc79a we introduced support for multiple events by introducing a new "event_stat_id" struct and then made several perf_session methods receive a point to it instead of a pointer to perf_session, and kept the event_stats and hists rb_tree in perf_session. While working on the new newt based browser, I realised that it would be better to introduce a new class, "hists" (short for "histograms"), renaming the "event_stat_id" struct and the perf_session methods that were really "hists" methods, as they manipulate only struct hists members, not touching anything in the other perf_session members. Other optimizations, such as calculating the maximum lenght of a symbol name present in an hists instance will be possible as we add them, avoiding a re-traversal just for finding that information. The rationale for the name "hists" to replace "event_stat_id" is that we may have multiple sets of hists for the same event_stat id, as, for instance, the 'perf diff' tool has, so event stat id is not what characterizes what this struct and the functions that manipulate it do. Cc: Eric B Munson Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d118f8ba6ac2af2bf11d40cba657c813f0f39ca2 Author: Arnaldo Carvalho de Melo Date: Mon May 10 12:51:05 2010 -0300 perf session: create_kernel_maps should use ->host_machine Using machines__create_kernel_maps(..., HOST_KERNEL_ID) it would create another machine instance for the host machine, and since 1f626bc we have it out of the machines rb_tree. Fix it by using machine__create_kernel_maps(&self->host_machine) directly. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit cdd5b75b0cd24c4d6a98b12a219217b1ccfe2586 Author: Arnaldo Carvalho de Melo Date: Mon May 10 10:56:50 2010 -0300 perf callchains: Use zalloc to allocate objects Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 51c8176472de1551a301b676e36a61884e0e8494 Author: Suresh Jayaraman Date: Mon May 10 15:15:24 2010 +0530 cifs: remove unused parameter from cifs_posix_open_inode_helper() ..a left over from the commit 3321b791b2e8897323f8c044a0c77ff25781381c. Cc: Jeff Layton Signed-off-by: Suresh Jayaraman Signed-off-by: Steve French commit 7f8264539c62378cccbdf9b598927b034bef4a92 Author: Arnaldo Carvalho de Melo Date: Mon May 10 10:51:25 2010 -0300 perf newt: Use newtAddComponent() Instead of newtAddComponents(just-one-entry, NULL), that is not needed if, like in this browser, we're adding just one component at a time. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit cc49b092d308f8ea8634134b0d95d831a88a674b Merge: 7c224a0 bae663b Author: Ingo Molnar Date: Mon May 10 13:13:40 2010 +0200 Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into oprofile commit 7c224a03a79021ab12ce057964df9e679af5386d Merge: 5bdb793 b57f95a Author: Ingo Molnar Date: Mon May 10 13:12:26 2010 +0200 Merge commit 'v2.6.34-rc7' into oprofile Merge reason: Update to Linus's latest -rc. Signed-off-by: Ingo Molnar commit af507ae8a0512a83728b17d8f8c5fa1561669f50 Author: Li Zefan Date: Mon May 10 11:24:27 2010 +0800 sched: Remove a stale comment This comment should have been removed together with uids_mutex when removing user sched. Signed-off-by: Li Zefan Cc: Peter Zijlstra Cc: Dhaval Giani LKML-Reference: <4BE77C6B.5010402@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 1f0ac7183f4d270bd9ce511254ba5d931d4f29c9 Merge: 232a5c9 76ba7e8 Author: Ingo Molnar Date: Mon May 10 08:20:19 2010 +0200 Merge branch 'perf/test' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core commit 3998d095354d2a3062bdaa821ef07a1e1c82873c Author: H. Peter Anvin Date: Sun May 9 22:46:54 2010 -0700 x86, hypervisor: add missing EXPORT_SYMBOL() needs to be included; fixes modular builds of the VMware balloon driver, and any future modular drivers which depends on the hypervisor. Reported-by: Ingo Molnar Signed-off-by: H. Peter Anvin Cc: Greg KH Cc: Hank Janssen Cc: Alok Kataria Cc: Ky Srinivasan Cc: Dmitry Torokhov LKML-Reference: <4BE49778.6060800@zytor.com> commit 232a5c948da5e23dff27e48180abf4a4238f7602 Author: Arnaldo Carvalho de Melo Date: Sun May 9 20:28:10 2010 -0300 perf report: Allow limiting the number of entries to print in callchains Works by adding a third parameter to the '-g' argument, after the graph type and minimum percentage, for example: [root@doppio linux-2.6-tip]# perf report -g fractal,0.5,2 Will show only the first two symbols where at least 0.5% of the samples took place. All the other symbols that don't fall outside these constraints will be put together in the last entry, prefixed with "[...]" and the total percentage for them. Suggested-by: Arjan van de Ven Acked-by: Arjan van de Ven Cc: Arjan van de Ven Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 1f626bc36847ac8dd192f055aed0f9678a781313 Author: Arnaldo Carvalho de Melo Date: Sun May 9 19:57:08 2010 -0300 perf session: Embed the host machine data on perf_session We have just one host on a given session, and that is the most common setup right now, so embed a ->host_machine struct machine instance directly in the perf_session class, check if we're looking for it before going to the rb_tree. This also fixes a problem found when we try to process old perf.data files where we didn't have MMAP events for the kernel and modules and thus don't create the kernel maps, do it in event__preprocess_sample if it wasn't already. Reported-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi Cc: Zhang, Yanmin LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 4cc4945844fe2cf493f1783b6ce938ba1617d5c2 Author: Arnaldo Carvalho de Melo Date: Sun May 9 21:14:07 2010 -0300 perf symbols: Check if a struct machine instance was found Which can happen when processing old files that had no fake kernel MMAP, events. That shouldn't result in perf_session__create_kernel_maps not being called, this will be fixed in a followup patch, for now do these checks to avoid segfaulting. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 3ceb0d4438876a65606c258e5d69e03e57460dd6 Author: Arnaldo Carvalho de Melo Date: Sun May 9 16:07:01 2010 -0300 perf symbols: Consider unresolved DSOs in the dso__col_widt calculation By using BITS_PER_LONG / 4, that is the number of chars that will be used in such cases as the DSO "name". Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 76ba7e846fcc89d9d4b25b89e303c9058de96d60 Author: Hitoshi Mitake Date: Sat May 8 17:10:29 2010 +0900 perf lock: Drop "-a" option from cmd_record() default arguments set This patch drops "-a" from the default arguments passed to perf record by perf lock. If a user wants to do a system wide record of lock events, perf lock record -a ... is enough for this purpose. This can reduce the size of the perf.data file. % sudo ./perf lock record whoami root [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.439 MB perf.data (~19170 samples) ] % sudo ./perf lock record -a whoami # with -a option root [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 48.962 MB perf.data (~2139197 samples) ] Signed-off-by: Hitoshi Mitake Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Jens Axboe Cc: Jason Baron Cc: Xiao Guangrong LKML-Reference: Message-Id: <1273306229-5216-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Frederic Weisbecker commit 19379b11819efc1fc3b602e64f7e7531050aaddb Author: Arjan van de Ven Date: Sun May 9 08:26:51 2010 -0700 ondemand: Make the iowait-is-busy time a sysfs tunable Pavel Machek pointed out that not all CPUs have an efficient idle at high frequency. Specifically, older Intel and various AMD cpus would get a higher powerusage when copying files from USB. Mike Chan pointed out that the same is true for various ARM chips as well. Thomas Renninger suggested to make this a sysfs tunable with a reasonable default. This patch adds a sysfs tunable for the new behavior, and uses a very simple function to determine a reasonable default, depending on the CPU vendor/type. Signed-off-by: Arjan van de Ven Acked-by: Rik van Riel Acked-by: Pavel Machek Acked-by: Peter Zijlstra Cc: davej@redhat.com LKML-Reference: <20100509082651.46914d04@infradead.org> [ minor tidyup ] Signed-off-by: Ingo Molnar commit 6b8fcd9029f217a9ecce822db645e19111c11080 Author: Arjan van de Ven Date: Sun May 9 08:26:06 2010 -0700 ondemand: Solve a big performance issue by counting IOWAIT time as busy The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as a measure for scaling the CPU frequency up or down. If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU frequency scales down. Effectively, it uses the CPU busy time as proxy variable for the more nebulous "how critical is performance right now" question. This algorithm falls flat on its face in the light of workloads where you're alternatingly disk and CPU bound, such as the ever popular "git grep", but also things like startup of programs and maildir using email clients... much to the chagarin of Andrew Morton. This patch changes the ondemand algorithm to count iowait time as busy, not idle, time. As shown in the breakdown cases above, iowait is performance critical often, and by counting iowait, the proxy variable becomes a more accurate representation of the "how critical is performance" question. The problem and fix are both verified with the "perf timechar" tool. Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Dave Jones Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra LKML-Reference: <20100509082606.3d9f00d0@infradead.org> Signed-off-by: Ingo Molnar commit 0224cf4c5ee0d7faec83956b8e21f7d89e3df3bd Author: Arjan van de Ven Date: Sun May 9 08:25:23 2010 -0700 sched: Intoduce get_cpu_iowait_time_us() For the ondemand cpufreq governor, it is desired that the iowait time is microaccounted in a similar way as idle time is. This patch introduces the infrastructure to account and expose this information via the get_cpu_iowait_time_us() function. [akpm@linux-foundation.org: fix CONFIG_NO_HZ=n build] Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra Cc: davej@redhat.com LKML-Reference: <20100509082523.284feab6@infradead.org> Signed-off-by: Ingo Molnar commit e0e37c200f1357db0dd986edb359c41c57d24f6e Author: Arjan van de Ven Date: Sun May 9 08:24:39 2010 -0700 sched: Eliminate the ts->idle_lastupdate field Now that the only user of ts->idle_lastupdate is update_ts_time_stats(), the entire field can be eliminated. In update_ts_time_stats(), idle_lastupdate is first set to "now", and a few lines later, the only user is an if() statement that assigns a variable either to "now" or to ts->idle_lastupdate, which has the value of "now" at that point. Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra Cc: davej@redhat.com LKML-Reference: <20100509082439.2fab0b4f@infradead.org> Signed-off-by: Ingo Molnar commit 8d63bf949e330588b80d30ca8f0a27a45297a9e9 Author: Arjan van de Ven Date: Sun May 9 08:24:03 2010 -0700 sched: Fold updating of the last_update_time_info into update_ts_time_stats() This patch folds the updating of the last_update_time into the update_ts_time_stats() function, and updates the callers. This allows for further cleanups that are done in the next patch. Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra Cc: davej@redhat.com LKML-Reference: <20100509082403.60072967@infradead.org> Signed-off-by: Ingo Molnar commit 8c7b09f43f4bf570654bcc458ce96819a932303c Author: Arjan van de Ven Date: Sun May 9 08:23:23 2010 -0700 sched: Update the idle statistics in get_cpu_idle_time_us() Right now, get_cpu_idle_time_us() only reports the idle statistics upto the point the CPU entered last idle; not what is valid right now. This patch adds an update of the idle statistics to get_cpu_idle_time_us(), so that calling this function always returns statistics that are accurate at the point of the call. This includes resetting the start of the idle time for accounting purposes to avoid double accounting. Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra Cc: davej@redhat.com LKML-Reference: <20100509082323.2d2f1945@infradead.org> Signed-off-by: Ingo Molnar commit 595aac488b546c7185be7e29c8ae165a588b2a9f Author: Arjan van de Ven Date: Sun May 9 08:22:45 2010 -0700 sched: Introduce a function to update the idle statistics Currently, two places update the idle statistics (and more to come later in this series). This patch creates a helper function for updating these statistics. Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra Cc: davej@redhat.com LKML-Reference: <20100509082245.163e67ed@infradead.org> Signed-off-by: Ingo Molnar commit b1f724c3055fa75a31d272222213647547a2d3d4 Author: Arjan van de Ven Date: Sun May 9 08:22:08 2010 -0700 sched: Add a comment to get_cpu_idle_time_us() The exported function get_cpu_idle_time_us() has no comment describing it; add a kerneldoc comment Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra Cc: davej@redhat.com LKML-Reference: <20100509082208.7cb721f0@infradead.org> Signed-off-by: Ingo Molnar commit 28e2a106d16046ca792722795f809e3f80a5af80 Author: Arnaldo Carvalho de Melo Date: Sun May 9 13:02:23 2010 -0300 perf hist: Simplify the insertion of new hist_entry instances And with that fix at least one bug: The first hit for an entry, the one that calls malloc to create a new instance in __perf_session__add_hist_entry, wasn't adding the count to the per cpumode (PERF_RECORD_MISC_USER, etc) total variable. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 39d1e1b1e26dc84d40bf2792287d0d61e44b57df Author: Arnaldo Carvalho de Melo Date: Sun May 9 12:01:05 2010 -0300 perf report: Fix leak of resolved callchains array on error path Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 139633c6a43781cd44798165b0472a34bf53a1e8 Author: Arnaldo Carvalho de Melo Date: Sun May 9 11:47:13 2010 -0300 perf callchain: Move validate_callchain to callchain lib Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 794e43b56c18b95fc9776c914a2659e7d558a352 Author: Tom Zanussi Date: Wed May 5 00:27:40 2010 -0500 perf/live-mode: Handle payload-less events Some events, such as the PERF_RECORD_FINISHED_ROUND event consist of only an event header and no data. In this case, a 0-length payload will be read, and the 0 return value will be wrongly interpreted as an 'unexpected end of event stream'. This patch allows for proper handling of data-less events by skipping 0-length reads. Signed-off-by: Tom Zanussi Cc: Arnaldo Carvalho de Melo Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Masami Hiramatsu LKML-Reference: <1273038527.6383.51.camel@tropicana> Signed-off-by: Frederic Weisbecker commit 2c193c736803ceb547daec725e5c5d992d039f20 Author: Frederic Weisbecker Date: Sat May 8 06:36:02 2010 +0200 tracing: Factorize lock events in a lock class lock_acquired, lock_contended and lock_release now share the same prototype and format. Let's factorize them into a lock event class. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Hitoshi Mitake Cc: Steven Rostedt commit 93135439459920c4d856f4ab8f068c030085c8df Author: Frederic Weisbecker Date: Sat May 8 06:24:25 2010 +0200 tracing: Drop the nested field from lock_release event Drop the nested field as we don't use it. Every nested state can be computed from a state machine on post processing already. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Hitoshi Mitake Cc: Steven Rostedt commit 883a2a3189dae9d2912c417e47152f51cb922a3f Author: Frederic Weisbecker Date: Sat May 8 06:16:11 2010 +0200 tracing: Drop lock_acquired waittime field Drop the waittime field from the lock_acquired event, we can calculate it by substracting the lock_acquired event timestamp with the matching lock_acquire one. It is not needed and takes useless space in the traces. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Steven Rostedt commit 90c0e5fc7b73d2575c7367e1da70ff9521718e5e Author: Frederic Weisbecker Date: Fri May 7 02:33:42 2010 +0200 perf lock: Always check min AND max wait time When a lock is acquired after beeing contended, we update the wait time statistics for the given lock. But if the min wait time is updated, we don't check the max wait time. This is wrong because the first time we update the wait time, we want to update both min and max wait time. Before: Name acquired contended total wait (ns) max wait (ns) min wait (ns) key 8 1 21656 0 21656 After: Name acquired contended total wait (ns) max wait (ns) min wait (ns) key 8 1 21656 21656 21656 Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake commit 5efe08cf685f33f562566dc68b6077b6f6a4f706 Author: Frederic Weisbecker Date: Thu May 6 04:55:22 2010 +0200 perf: Fix perf lock bad rate Fix the cast made to get the bad rate. It is made in the result instead of the operands. We need the operands to be cast in double, otherwise the result will always be zero. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake commit 84c7a21791eb2e962a27e19bab5b77d5d9e13a34 Author: Frederic Weisbecker Date: Wed May 5 23:57:25 2010 +0200 perf: Humanize lock flags in perf lock Use an enum instead of plain constants for lock flags. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake commit 10350ec362b48f79f3df8447c25813790075e27c Author: Frederic Weisbecker Date: Wed May 5 23:47:28 2010 +0200 perf: Cleanup perf lock broken states Use enum to get a human view of bad_hist indexes and put bad histogram output in its own function. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake commit 26242d859c9be9eea61f7f19514e9d272ae8ce26 Author: Hitoshi Mitake Date: Mon May 3 14:12:00 2010 +0900 perf lock: Add "info" subcommand for dumping misc information This adds the "info" subcommand to perf lock which can be used to dump metadata like threads or addresses of lock instances. "map" was removed because info should do the work for it. This will be useful not only for debugging but also for ordinary analyzing. v2: adding example of usage % sudo ./perf lock info -t | Thread ID: comm | 0: swapper | 1: init | 18: migration/5 | 29: events/2 | 32: events/5 | 33: events/6 ... % sudo ./perf lock info -m | Address of instance: name of class | 0xffff8800b95adae0: &(&sighand->siglock)->rlock | 0xffff8800bbb41ae0: &(&sighand->siglock)->rlock | 0xffff8800bf165ae0: &(&sighand->siglock)->rlock | 0xffff8800b9576a98: &p->cred_guard_mutex | 0xffff8800bb890a08: &(&p->alloc_lock)->rlock | 0xffff8800b9522a08: &(&p->alloc_lock)->rlock | 0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock | 0xffff8800bba72a08: &(&p->alloc_lock)->rlock | 0xffff8800bf18ea08: &(&p->alloc_lock)->rlock | 0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock | 0xffff88009bf818a0: &(&ip->i_lock)->mr_lock | 0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock | 0xffff8800bb6478a0: &(shost->host_lock)->rlock v3: fixed some problems Frederic pointed out * better rbtree tracking in dump_threads() * removed printf() and used pr_info() and pr_debug() Signed-off-by: Hitoshi Mitake Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Jens Axboe Cc: Jason Baron Cc: Xiao Guangrong LKML-Reference: <1272863520-16179-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Frederic Weisbecker commit d6b17bebd79dae2e3577f2ea27a832af4991a5e6 Author: Frederic Weisbecker Date: Mon May 3 15:14:33 2010 +0200 perf: Provide a new deterministic events reordering algorithm The current events reordering algorithm is based on a heuristic that gets broken once we deal with a very fast flow of events. Indeed the time period based flushing is not suitable anymore in the following case, assuming we have a flush period of two seconds. CPU 0 | CPU 1 | cnt1 timestamps | cnt1 timestamps | 0 | 0 1 | 1 2 | 2 3 | 3 [...] | [...] 4 seconds later If we spend too much time to read the buffers (case of a lot of events to record in each buffers or when we have a lot of CPU buffers to read), in the next pass the CPU 0 buffer could contain a slice of several seconds of events. We'll read them all and notice we've reached the period to flush. In the above example we flush the first half of the CPU 0 buffer, then we read the CPU 1 buffer where we have events that were on the flush slice and then the reordering fails. It's simple to reproduce with: perf lock record perf bench sched messaging To solve this, we use a new solution that doesn't rely on an heuristical time slice period anymore but on a deterministic basis based on how perf record does its job. perf record saves the buffers through passes. A pass is a tour on every buffers from every CPUs. This is made in order: for each CPU we read the buffers of every counters. So the more buffers we visit, the later will be the timstamps of their events. When perf record finishes a pass it records a PERF_RECORD_FINISHED_ROUND pseudo event. We record the max timestamp t found in the pass n. Assuming these timestamps are monotonic across cpus, we know that if a buffer still has events with timestamps below t, they will be all available and then read in the pass n + 1. Hence when we start to read the pass n + 2, we can safely flush every events with timestamps below t. ============ PASS n ================= CPU 0 | CPU 1 | cnt1 timestamps | cnt2 timestamps 1 | 2 2 | 3 - | 4 <--- max recorded ============ PASS n + 1 ============== CPU 0 | CPU 1 | cnt1 timestamps | cnt2 timestamps 3 | 5 4 | 6 5 | 7 <---- max recorded Flush every events below timestamp 4 ============ PASS n + 2 ============== CPU 0 | CPU 1 | cnt1 timestamps | cnt2 timestamps 6 | 8 7 | 9 - | 10 Flush every events below timestamp 7 etc... It also works on perf.data versions that don't have PERF_RECORD_FINISHED_ROUND pseudo events. The difference is that the events will be only flushed in the end of the perf.data processing. It will then consume more memory and scale less with large perf.data files. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Tom Zanussi Cc: Masami Hiramatsu commit 984028075794c00cbf4fb1e94bb6233e8be08875 Author: Frederic Weisbecker Date: Sun May 2 22:05:29 2010 +0200 perf: Introduce a new "round of buffers read" pseudo event In order to provide a more rubust and deterministic reordering algorithm, we need to know when we reach a point where we just did a pass through over every counter buffers to read every thing they had. This patch introduces a new PERF_RECORD_FINISHED_ROUND pseudo event that only consist in an event header and doesn't need to contain anything. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Tom Zanussi Cc: Masami Hiramatsu commit a10a569806e43b9be5fce60b21f836b50b1010e4 Author: H. Peter Anvin Date: Sun May 9 01:13:42 2010 -0700 Modify the VMware balloon driver for the new x86_hyper API Modify the VMware balloon driver to match the new x86_hyper API. Signed-off-by: H. Peter Anvin Cc: Greg KH Cc: Hank Janssen Cc: Alok Kataria Cc: Ky Srinivasan Cc: Dmitry Torokhov LKML-Reference: <4BE49778.6060800@zytor.com> commit 96f6e775b58687d85ee33004d414419b5ec34106 Author: H. Peter Anvin Date: Sun May 9 01:10:34 2010 -0700 x86, hypervisor: Export the x86_hyper* symbols Export x86_hyper and the related specific structures, allowing for hypervisor identification by modules. Signed-off-by: H. Peter Anvin Cc: Greg KH Cc: Hank Janssen Cc: Alok Kataria Cc: Ky Srinivasan Cc: Dmitry Torokhov LKML-Reference: <4BE49778.6060800@zytor.com> commit d7be0ce6afb1df60bc786f57410407ceae92b994 Merge: e08cae4 66f41d4 Author: H. Peter Anvin Date: Sat May 8 14:59:58 2010 -0700 Merge commit 'v2.6.34-rc6' into x86/cpu commit e157eb8341e7885ff2d9f1620155e3da6e0c8f56 Author: Pekka Enberg Date: Sat May 8 18:33:03 2010 +0300 perf report: Document '--call-graph' better for usage This patch improves 'perf report -h' output for the '--call-graph' command line option by enumerating the different output types. Signed-off-by: Pekka Enberg Cc: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1273332783-4268-1-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar commit e7858f52a5cb868289a72264534a1f05f3340c6c Merge: 27a9da6 bbf1bb3 Author: Ingo Molnar Date: Sat May 8 18:11:19 2010 +0200 Merge branch 'cpu_stop' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into sched/core commit c0614829c16ab9d31f1b7d40516decfbf3d32102 Author: Masami Hiramatsu Date: Tue Apr 27 18:33:12 2010 -0400 kprobes: Move enable/disable_kprobe() out from debugfs code Move enable/disable_kprobe() API out from debugfs related code, because these interfaces are not related to debugfs interface. This fixes a compiler warning. Signed-off-by: Masami Hiramatsu Acked-by: Ananth N Mavinakayanahalli Acked-by: Tony Luck Cc: systemtap Cc: DLE LKML-Reference: <20100427223312.2322.60512.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit bbf1bb3eee86f2eef2baa14e600be454d09109ee Author: Tejun Heo Date: Sat May 8 16:20:53 2010 +0200 cpu_stop: add dummy implementation for UP When !CONFIG_SMP, cpu_stop functions weren't defined at all which could lead to build failures if UP code uses cpu_stop facility. Add dummy cpu_stop implementation for UP. The waiting variants execute the work function directly with preempt disabled and stop_one_cpu_nowait() schedules a workqueue work. Makefile and ifdefs around stop_machine implementation are updated to accomodate CONFIG_SMP && !CONFIG_STOP_MACHINE case. Signed-off-by: Tejun Heo Reported-by: Ingo Molnar commit c7993165ef0c1d636ca05f4787739f8414584e6d Author: Cyrill Gorcunov Date: Sat May 8 15:25:54 2010 +0400 x86, perf: P4 PMU -- check for proper event index in RAW events RAW events are special and we should be ready for user passing in insane event index values. Signed-off-by: Cyrill Gorcunov Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Lin Ming LKML-Reference: <20100508112717.315897547@openvz.org> Signed-off-by: Ingo Molnar commit 3f51b7119d052827dbb0e40c966acdf2bdc6f47f Author: Cyrill Gorcunov Date: Sat May 8 15:25:53 2010 +0400 x86, perf: P4 PMU -- Get rid of redundant check for array index The caller already has done such a check. And it was wrong anyway, it had to be '>=' rather than '>' Signed-off-by: Cyrill Gorcunov Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Lin Ming LKML-Reference: <20100508112717.130386882@openvz.org> Signed-off-by: Ingo Molnar commit 137351e0feeb9f25d99488ee1afc1c79f5499a9a Author: Cyrill Gorcunov Date: Sat May 8 15:25:52 2010 +0400 x86, perf: P4 PMU -- protect sensible procedures from preemption Steven reported: | | I'm getting: | | Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727 | Call Trace: | [] debug_smp_processor_id+0xd5/0xf0 | [] p4_hw_config+0x2b/0x15c | [] ? trace_hardirqs_on_caller+0x12b/0x14f | [] hw_perf_event_init+0x468/0x7be | [] ? debug_mutex_init+0x31/0x3c | [] T.850+0x273/0x42e | [] sys_perf_event_open+0x23e/0x3f1 | [] ? sysret_check+0x2e/0x69 | [] system_call_fastpath+0x16/0x1b | | When running perf record in latest tip/perf/core | Due to the fact that p4 counters are shared between HT threads we synthetically divide the whole set of counters into two non-intersected subsets. And while we're "borrowing" counters from these subsets we should not be preempted (well, strictly speaking in p4_hw_config we just pre-set reference to the subset which allow to save some cycles in schedule routine if it happens on the same cpu). So use get_cpu/put_cpu pair. Also p4_pmu_schedule_events should use smp_processor_id rather than raw_ version. This allow us to catch up preemption issue (if there will ever be). Reported-by: Steven Rostedt Tested-by: Steven Rostedt Signed-off-by: Cyrill Gorcunov Cc: Steven Rostedt Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Lin Ming LKML-Reference: <20100508112716.963478928@openvz.org> Signed-off-by: Ingo Molnar commit de902d967feb96f2dfddfbe9dbd69dc22fd5ebcb Author: Cyrill Gorcunov Date: Sat May 8 15:39:52 2010 +0400 x86, perf: P4 PMU -- configure predefined events If an event is not RAW we should not exit p4_hw_config early but call x86_setup_perfctr as well. Signed-off-by: Cyrill Gorcunov Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Lin Ming Cc: Robert Richter Signed-off-by: Ingo Molnar commit 6e85158cf5a2385264316870256fb6ad681156a0 Author: Paul Mackerras Date: Sat May 8 20:58:00 2010 +1000 perf_event: Make software events work again Commit 6bde9b6ce0127e2a56228a2071536d422be31336 ("perf: Add group scheduling transactional APIs") added code to allow a group to be scheduled in a single transaction. However, it introduced a bug in handling events whose pmu does not implement transactions -- at the end of scheduling in the events in the group, in the non-transactional case the code now falls through to the group_error label, and proceeds to unschedule all the events in the group and return failure. This fixes it by returning 0 (success) in the non-transactional case. Signed-off-by: Paul Mackerras Cc: Peter Zijlstra Cc: Lin Ming Cc: Frederic Weisbecker Cc: eranian@gmail.com LKML-Reference: <20100508105800.GB10650@brick.ozlabs.ibm.com> Signed-off-by: Ingo Molnar commit ed82702155b6343727ee732f7eae6d72e8b453fe Merge: 4d1c52b 1cf4a06 Author: Ingo Molnar Date: Sat May 8 10:02:57 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit e08cae4181af9483b04ecfac48f01c8e5a5f27bf Author: H. Peter Anvin Date: Fri May 7 16:57:28 2010 -0700 x86: Clean up the hypervisor layer Clean up the hypervisor layer and the hypervisor drivers, using an ops structure instead of an enumeration with if statements. The identity of the hypervisor, if needed, can be tested by testing the pointer value in x86_hyper. The MS-HyperV private state is moved into a normal global variable (it's per-system state, not per-CPU state). Being a normal bss variable, it will be left at all zero on non-HyperV platforms, and so can generally be tested for HyperV-specific features without additional qualification. Signed-off-by: H. Peter Anvin Acked-by: Greg KH Cc: Hank Janssen Cc: Alok Kataria Cc: Ky Srinivasan LKML-Reference: <4BE49778.6060800@zytor.com> commit 9fa02317429449e8176c9bb6da3ac00eb14d52d3 Author: Greg Kroah-Hartman Date: Fri May 7 16:55:41 2010 -0700 x86, HyperV: fix up the license to mshyperv.c This should have been GPLv2 only, we cut and pasted from the wrong file originally, sorry. Also removed some unneeded boilerplate license code, we all know where to find the GPLv2, and that there's no warranty as that is implicit from the license. Cc: Ky Srinivasan Cc: Hank Janssen Signed-off-by: Greg Kroah-Hartman LKML-Reference: <20100507235541.GA15448@kroah.com> Signed-off-by: H. Peter Anvin commit 2b107d93635616db0c3f893c8cc2e6d5cd8d77b2 Author: Jacob Pan Date: Fri May 7 14:59:45 2010 -0700 x86: Avoid check hlt for newer cpus Check hlt instruction was targeted for some older CPUs. It is an expensive operation in that it takes 4 ticks to break out the check. We can avoid such check completely for newer x86 cpus (family >= 5). [ hpa: corrected family > 5 to family >= 5 ] Signed-off-by: Jacob Pan LKML-Reference: <1273269585-14346-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin commit 6f485b41875dbf5160c1990322469c1f65f77b28 Author: Joern Engel Date: Fri May 7 19:38:40 2010 +0200 logfs: handle powerfail on NAND flash The write buffer may not have been written and may no longer be written due to an interrupted write in the affected page. Signed-off-by: Joern Engel commit 1cf4a0632c24ea61162ed819bde358bc94c55510 Author: Arnaldo Carvalho de Melo Date: Fri May 7 14:07:05 2010 -0300 perf list: Improve the raw hw event descriptor documentation It was x86 specific and imcomplete at that, improve the situation by making it clear where the example provided applies and by adding the URLs for the Intel and AMD manuals where this is discussed in depth. Acked-by: Robert Richter Cc: Cyrill Gorcunov Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi Cc: Robert Richter Reported-by: Robert Richter Signed-off-by: Arnaldo Carvalho de Melo commit ccf31c10f125ab5233c8517f91d4b3bd0bd60936 Author: Dan Carpenter Date: Fri May 7 10:59:06 2010 +0200 logfs: handle errors from get_mtd_device() The get_mtd_device() function returns error pointers on failure and if we don't handle it, it leads to a crash. Signed-off-by: Dan Carpenter Signed-off-by: Joern Engel commit 58e323cf5e4ed621a6e88263aca40c3d9c3d9bfd Author: Joern Engel Date: Fri May 7 14:15:04 2010 +0200 logfs: remove unused variable Signed-off-by: Joern Engel commit 4d1c52b02d977d884abb21d0bbaba6b5d6bc8374 Author: Lin Ming Date: Fri Apr 23 13:56:12 2010 +0800 perf, x86: implement group scheduling transactional APIs Convert to the transactional PMU API and remove the duplication of group_sched_in(). Reviewed-by: Stephane Eranian Signed-off-by: Lin Ming Signed-off-by: Peter Zijlstra Cc: David Miller Cc: Paul Mackerras LKML-Reference: <1272002172.5707.61.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar commit 6bde9b6ce0127e2a56228a2071536d422be31336 Author: Lin Ming Date: Fri Apr 23 13:56:00 2010 +0800 perf: Add group scheduling transactional APIs Add group scheduling transactional APIs to struct pmu. These APIs will be implemented in arch code, based on Peter's idea as below. > the idea behind hw_perf_group_sched_in() is to not perform > schedulability tests on each event in the group, but to add the group > as a whole and then perform one test. > > Of course, when that test fails, you'll have to roll-back the whole > group again. > > So start_txn (or a better name) would simply toggle a flag in the pmu > implementation that will make pmu::enable() not perform the > schedulablilty test. > > Then commit_txn() will perform the schedulability test (so note the > method has to have a !void return value. > > This will allow us to use the regular > kernel/perf_event.c::group_sched_in() and all the rollback code. > Currently each hw_perf_group_sched_in() implementation duplicates all > the rolllback code (with various bugs). ->start_txn: Start group events scheduling transaction, set a flag to make pmu::enable() not perform the schedulability test, it will be performed at commit time. ->commit_txn: Commit group events scheduling transaction, perform the group schedulability as a whole ->cancel_txn: Stop group events scheduling transaction, clear the flag so pmu::enable() will perform the schedulability test. Reviewed-by: Stephane Eranian Reviewed-by: Frederic Weisbecker Signed-off-by: Lin Ming Cc: David Miller Cc: Paul Mackerras Signed-off-by: Peter Zijlstra LKML-Reference: <1272002160.5707.60.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar commit ab608344bcbde4f55ec4cd911b686b0ce3eae076 Author: Peter Zijlstra Date: Thu Apr 8 23:03:20 2010 +0200 perf, x86: Improve the PEBS ABI Rename perf_event_attr::precise to perf_event_attr::precise_ip and widen it to 2 bits. This new field describes the required precision of the PERF_SAMPLE_IP field: 0 - SAMPLE_IP can have arbitrary skid 1 - SAMPLE_IP must have constant skid 2 - SAMPLE_IP requested to have 0 skid 3 - SAMPLE_IP must have 0 skid And modify the Intel PEBS code accordingly. The PEBS implementation now supports up to precise_ip == 2, where we perform the IP fixup. Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit should be set for each PERF_SAMPLE_IP field known to match the actual instruction triggering the event. This new scheme allows for a PEBS mode that uses the buffer for more than a single event. Signed-off-by: Peter Zijlstra Cc: Paul Mackerras Cc: Stephane Eranian LKML-Reference: Signed-off-by: Ingo Molnar commit 2b0b5c6fe9b383f3cf35a0a6371c9d577bd523ff Author: Peter Zijlstra Date: Thu Apr 8 23:03:20 2010 +0200 perf, x86: Consolidate some code repetition Remove some duplicated logic. Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 1e9a6d8d44cb6dcd2799b36ceb23007e6a423bfe Author: Peter Zijlstra Date: Tue May 4 16:30:21 2010 +0200 perf, x86: Remove PEBS SAMPLE_RAW support Its broken, we really should get PERF_SAMPLE_REGS sorted. Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit a1f2b70a942b8d858a0ab02567da3999b60a99b2 Author: Robert Richter Date: Tue Apr 13 22:23:15 2010 +0200 perf, x86: Use weight instead of cmask in for_each_event_constraint() There may exist constraints with a cmask set to zero. In this case for_each_event_constraint() will not work properly. Now weight is used instead of the cmask for loop exit detection. Weight is always a value other than zero since the default contains the HWEIGHT from the counter mask and in other cases a value of zero does not fit too. This is in preparation of ibs event constraints that wont have a cmask. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1271190201-25705-7-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 31fa58af57c41d2912debf62d47d5811062411f1 Author: Robert Richter Date: Tue Apr 13 22:23:14 2010 +0200 perf, x86: Pass enable bit mask to __x86_pmu_enable_event() To reuse this function for events with different enable bit masks, this mask is part of the function's argument list now. The function will be used later to control ibs events too. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1271190201-25705-6-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 9d0fcba67e47ff398a6fa86476d4884d472dc98a Author: Robert Richter Date: Tue Apr 13 22:23:12 2010 +0200 perf, x86: Call x86_setup_perfctr() from .hw_config() The perfctr setup calls are in the corresponding .hw_config() functions now. This makes it possible to introduce config functions for other pmu events that are not perfctr specific. Also, all of a sudden the code looks much nicer. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit c1726f343b3bfc2ee037e191907c632a31903021 Author: Robert Richter Date: Tue Apr 13 22:23:11 2010 +0200 perf, x86: Move x86_setup_perfctr() Move x86_setup_perfctr(), no other changes made. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1271190201-25705-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 4261e0e0efd9e04b6c69e0773c3cf4d6f337c416 Author: Robert Richter Date: Tue Apr 13 22:23:10 2010 +0200 perf, x86: Move perfctr init code to x86_setup_perfctr() Split __hw_perf_event_init() to configure pmu events other than perfctrs. Perfctr code is moved to a separate function x86_setup_perfctr(). This and the following patches refactor the code. Split in multiple patches for better review. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1271190201-25705-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit a0507c84bf47dfd204299774f45fd16da33f0619 Author: Peter Zijlstra Date: Thu May 6 15:42:53 2010 +0200 perf: Annotate perf_event_read_group() vs perf_event_release_kernel() Stephane reported a lockdep warning while using PERF_FORMAT_GROUP. The issue is that perf_event_read_group() takes faults while holding the ctx->mutex, while perf_event_release_kernel() can be called from munmap(). Which makes for an AB-BA deadlock. Except we can never establish the deadlock because we'll only ever call perf_event_release_kernel() after all file descriptors are dead so there is no concurrency possible. Reported-by: Stephane Eranian Cc: Paul Mackerras Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit cce913178118b0b36742eb7544c2b38a0c957ee7 Merge: d9f599e 4fd38e4 Author: Ingo Molnar Date: Fri May 7 11:30:29 2010 +0200 Merge branch 'perf/urgent' into perf/core Merge reason: Resolve patch dependency Signed-off-by: Ingo Molnar commit 4fd38e4595e2f6c9d27732c042a0e16b2753049c Author: Peter Zijlstra Date: Thu May 6 17:31:38 2010 +0200 perf: Fix exit() vs PERF_FORMAT_GROUP Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work as expected if the task the counters were attached to quit before the read() call. The cause is that we unconditionally destroy the grouping when we remove counters from their context. Fix this by only doing this when we free the counter itself. Reported-by: Corey Ashford Reported-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <1273160566.5605.404.camel@twins> Signed-off-by: Ingo Molnar commit 27a9da6538ee18046d7bff8e36a9f783542c54c3 Author: Peter Zijlstra Date: Tue May 4 20:36:56 2010 +0200 sched: Remove rq argument to the tracepoints struct rq isn't visible outside of sched.o so its near useless to expose the pointer, also there are no users of it, so remove it. Acked-by: Steven Rostedt Signed-off-by: Peter Zijlstra LKML-Reference: <1272997616.1642.207.camel@laptop> Signed-off-by: Ingo Molnar commit 48652ced1533c3372f996a0d83b6e73b1f1c9381 Merge: 99bd5e2 66f41d4 Author: Ingo Molnar Date: Fri May 7 11:27:54 2010 +0200 Merge commit 'v2.6.34-rc6' into sched/core commit 4726f2a617ebd868a4fdeb5679613b897e5f1676 Author: Yong Zhang Date: Tue May 4 14:16:48 2010 +0800 lockdep: Reduce stack_trace usage When calling check_prevs_add(), if all validations passed add_lock_to_list() will add new lock to dependency tree and alloc stack_trace for each list_entry. But at this time, we are always on the same stack, so stack_trace for each list_entry has the same value. This is redundant and eats up lots of memory which could lead to warning on low MAX_STACK_TRACE_ENTRIES. Use one copy of stack_trace instead. V2: As suggested by Peter Zijlstra, move save_trace() from check_prevs_add() to check_prev_add(). Add tracking for trylock dependence which is also redundant. Signed-off-by: Yong Zhang Cc: David S. Miller Signed-off-by: Peter Zijlstra LKML-Reference: <20100504065711.GC10784@windriver.com> Signed-off-by: Ingo Molnar commit fc390cde362309f6892bb719194f242c466a978b Author: Paul E. McKenney Date: Thu May 6 11:42:52 2010 -0700 rcu: need barrier() in UP synchronize_sched_expedited() If synchronize_sched_expedited() is ever to be called from within kernel/sched.c in a !SMP PREEMPT kernel, the !SMP implementation needs a barrier(). Signed-off-by: Paul E. McKenney Signed-off-by: Tejun Heo commit a2a47c6c3d1a7c01da4464b3b7be93b924f874c1 Author: Ky Srinivasan Date: Thu May 6 12:08:41 2010 -0700 x86: Detect running on a Microsoft HyperV system This patch integrates HyperV detection within the framework currently used by VmWare. With this patch, we can avoid having to replicate the HyperV detection code in each of the Microsoft HyperV drivers. Reworked and tweaked by Greg K-H to build properly. Signed-off-by: K. Y. Srinivasan LKML-Reference: <20100506190841.GA1605@kroah.com> Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Vadim Rozenfeld Cc: Avi Kivity Cc: Gleb Natapov Cc: Frederic Weisbecker Cc: Alexey Dobriyan Cc: "K.Prasad" Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Paul Mackerras Cc: Alan Cox Cc: Haiyang Zhang Cc: Hank Janssen Signed-off-by: Greg Kroah-Hartman Signed-off-by: H. Peter Anvin commit d9f599e1e6d019968b35d2dc63074b9e8964fa69 Author: Dan Carpenter Date: Sat Mar 20 17:39:11 2010 +0300 perf: Fix check at end of event search The original code doesn't work because "call" is never NULL there. Signed-off-by: Dan Carpenter LKML-Reference: <20100320143911.GF5331@bicker> Signed-off-by: Steven Rostedt commit cc631fb732b8ccd6a0cc45557475ea09b0c21a68 Author: Paul E. McKenney Date: Thu May 6 18:49:21 2010 +0200 sched: correctly place paranioa memory barriers in synchronize_sched_expedited() The memory barriers must be in the SMP case, not in the !SMP case. Also add a barrier after the atomic_inc() in order to ensure that other CPUs see post-synchronize_sched_expedited() actions as following the expedited grace period. Signed-off-by: Paul E. McKenney Signed-off-by: Tejun Heo commit 94458d5ecb3da844823cc191e73e5c5ead98a464 Author: Tejun Heo Date: Thu May 6 18:49:21 2010 +0200 sched: kill paranoia check in synchronize_sched_expedited() The paranoid check which verifies that the cpu_stop callback is actually called on all online cpus is completely superflous. It's guaranteed by cpu_stop facility and if it didn't work as advertised other things would go horribly wrong and trying to recover using synchronize_sched() wouldn't be very meaningful. Kill the paranoid check. Removal of this feature is done as a separate step so that it can serve as a bisection point if something actually goes wrong. Signed-off-by: Tejun Heo Acked-by: Peter Zijlstra Cc: Ingo Molnar Cc: Dipankar Sarma Cc: Josh Triplett Cc: Paul E. McKenney Cc: Oleg Nesterov Cc: Dimitri Sivanich commit 969c79215a35b06e5e3efe69b9412f858df7856c Author: Tejun Heo Date: Thu May 6 18:49:21 2010 +0200 sched: replace migration_thread with cpu_stop Currently migration_thread is serving three purposes - migration pusher, context to execute active_load_balance() and forced context switcher for expedited RCU synchronize_sched. All three roles are hardcoded into migration_thread() and determining which job is scheduled is slightly messy. This patch kills migration_thread and replaces all three uses with cpu_stop. The three different roles of migration_thread() are splitted into three separate cpu_stop callbacks - migration_cpu_stop(), active_load_balance_cpu_stop() and synchronize_sched_expedited_cpu_stop() - and each use case now simply asks cpu_stop to execute the callback as necessary. synchronize_sched_expedited() was implemented with private preallocated resources and custom multi-cpu queueing and waiting logic, both of which are provided by cpu_stop. synchronize_sched_expedited_count is made atomic and all other shared resources along with the mutex are dropped. synchronize_sched_expedited() also implemented a check to detect cases where not all the callback got executed on their assigned cpus and fall back to synchronize_sched(). If called with cpu hotplug blocked, cpu_stop already guarantees that and the condition cannot happen; otherwise, stop_machine() would break. However, this patch preserves the paranoid check using a cpumask to record on which cpus the stopper ran so that it can serve as a bisection point if something actually goes wrong theree. Because the internal execution state is no longer visible, rcu_expedited_torture_stats() is removed. This patch also renames cpu_stop threads to from "stopper/%d" to "migration/%d". The names of these threads ultimately don't matter and there's no reason to make unnecessary userland visible changes. With this patch applied, stop_machine() and sched now share the same resources. stop_machine() is faster without wasting any resources and sched migration users are much cleaner. Signed-off-by: Tejun Heo Acked-by: Peter Zijlstra Cc: Ingo Molnar Cc: Dipankar Sarma Cc: Josh Triplett Cc: Paul E. McKenney Cc: Oleg Nesterov Cc: Dimitri Sivanich commit 3fc1f1e27a5b807791d72e5d992aa33b668a6626 Author: Tejun Heo Date: Thu May 6 18:49:20 2010 +0200 stop_machine: reimplement using cpu_stop Reimplement stop_machine using cpu_stop. As cpu stoppers are guaranteed to be available for all online cpus, stop_machine_create/destroy() are no longer necessary and removed. With resource management and synchronization handled by cpu_stop, the new implementation is much simpler. Asking the cpu_stop to execute the stop_cpu() state machine on all online cpus with cpu hotplug disabled is enough. stop_machine itself doesn't need to manage any global resources anymore, so all per-instance information is rolled into struct stop_machine_data and the mutex and all static data variables are removed. The previous implementation created and destroyed RT workqueues as necessary which made stop_machine() calls highly expensive on very large machines. According to Dimitri Sivanich, preventing the dynamic creation/destruction makes booting faster more than twice on very large machines. cpu_stop resources are preallocated for all online cpus and should have the same effect. Signed-off-by: Tejun Heo Acked-by: Rusty Russell Acked-by: Peter Zijlstra Cc: Oleg Nesterov Cc: Dimitri Sivanich commit 1142d810298e694754498dbb4983fcb6cb7fd884 Author: Tejun Heo Date: Thu May 6 18:49:20 2010 +0200 cpu_stop: implement stop_cpu[s]() Implement a simplistic per-cpu maximum priority cpu monopolization mechanism. A non-sleeping callback can be scheduled to run on one or multiple cpus with maximum priority monopolozing those cpus. This is primarily to replace and unify RT workqueue usage in stop_machine and scheduler migration_thread which currently is serving multiple purposes. Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(), stop_cpus() and try_stop_cpus(). This is to allow clean sharing of resources among stop_cpu and all the migration thread users. One stopper thread per cpu is created which is currently named "stopper/CPU". This will eventually replace the migration thread and take on its name. * This facility was originally named cpuhog and lived in separate files but Peter Zijlstra nacked the name and thus got renamed to cpu_stop and moved into stop_machine.c. * Better reporting of preemption leak as per Peter's suggestion. Signed-off-by: Tejun Heo Acked-by: Peter Zijlstra Cc: Oleg Nesterov Cc: Dimitri Sivanich commit bae663bc635e2726c7c5228dbf0f2051e16d1c81 Author: Robert Richter Date: Wed May 5 17:47:17 2010 +0200 oprofile/x86: make AMD IBS hotplug capable Current IBS code is not hotplug capable. An offline cpu might not be initialized or deinitialized properly. This patch fixes this by removing on_each_cpu() functions. The IBS init/deinit code is executed in the per-cpu functions model->setup_ctrs() and model->cpu_down() which are also called by hotplug notifiers. model->cpu_down() replaces model->exit() that became obsolete. Cc: Andi Kleen Signed-off-by: Robert Richter commit 3de668ee8d5b1e08da3200f926ff5a28aeb99bc2 Author: Robert Richter Date: Mon May 3 15:00:25 2010 +0200 oprofile/x86: notify cpus only when daemon is running This patch moves the cpu notifier registration from nmi_init() to nmi_setup(). The corresponding unregistration function is now in nmi_shutdown(). Thus, the hotplug code is only active, if the oprofile daemon is running. Cc: Andi Kleen Signed-off-by: Robert Richter commit 4f47b4c9f0b711bf84adb8c27774ae80d346b628 Author: Eric W. Biederman Date: Wed May 5 13:22:25 2010 -0700 x86, acpi/irq: Define gsi_end when X86_IO_APIC is undefined My recent changes introducing a global gsi_end variable failed to take into account the case of using acpi on a system not built to support IO_APICs, causing the build to fail. Define gsi_end to 15 when CONFIG_X86_IO_APIC is not set to avoid compile errors. Signed-off-by: Eric W. Biederman Cc: Yinghai Lu LKML-Reference: Signed-off-by: Ingo Molnar commit bdfae149c5b7430b9a26371f14b2d385fd3a4389 Author: Steve French Date: Thu May 6 00:38:16 2010 +0000 [CIFS] Remove unused cifs_oplock_cachep CC: Jeff Layton Signed-off-by: Steve French commit 26efa0bac9dc3587ee8892c06642735bcded59e5 Author: Jeff Layton Date: Sat Apr 24 07:57:49 2010 -0400 cifs: have decode_negTokenInit set flags in server struct ...rather than the secType. This allows us to get rid of the MSKerberos securityEnum. The client just makes a decision at upcall time. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit 198b5682781b97251afd9025dbf559a77969abdd Author: Jeff Layton Date: Sat Apr 24 07:57:48 2010 -0400 cifs: break negotiate protocol calls out of cifs_setup_session So that we can reasonably set up the secType based on both the NegotiateProtocol response and the parsed mount options. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit c0c79c31c9d5fcc19812c6c35f842baf50ee788a Author: Joern Engel Date: Wed May 5 22:33:36 2010 +0200 logfs: fix sync Rather self-explanatory. Signed-off-by: Joern Engel commit bba0b5c2c27e6dadc93c476f8a4b49d108b66292 Author: Joern Engel Date: Wed May 5 22:32:52 2010 +0200 logfs: fix compile failure When CONFIG_BLOCK is not enabled: fs/logfs/super.c:142: error: implicit declaration of function 'bdev_get_queue' fs/logfs/super.c:142: error: invalid type argument of '->' (have 'int') Found by Randy Dunlap Signed-off-by: Joern Engel commit 668eb65f092902eb7dd526af73d4a7f025a94612 Author: Thiago Farina Date: Sun Jan 24 11:03:50 2010 -0500 tracing: Fix "integer as NULL pointer" warning. kernel/trace/trace_output.c:256:24: warning: Using plain integer as NULL pointer Signed-off-by: Thiago Farina LKML-Reference: <1264349038-1766-3-git-send-email-tfransosi@gmail.com> Signed-off-by: Steven Rostedt commit 2e26ca7150a4f2ab3e69471dfc65f131e7dd7a05 Author: Steven Rostedt Date: Wed May 5 10:52:31 2010 -0400 tracing: Fix tracepoint.h DECLARE_TRACE() to allow more than one header When more than one header is included under CREATE_TRACE_POINTS the DECLARE_TRACE() macro is not defined back to its original meaning and the second include will fail to initialize the TRACE_EVENT() and DECLARE_TRACE() correctly. To fix this the tracepoint.h file moves the define of DECLARE_TRACE() out of the #ifdef _LINUX_TRACEPOINT_H protection (just like the define of the TRACE_EVENT()). This way the define_trace.h will undef the DECLARE_TRACE() at the end and allow new headers to start from scratch. This patch also requires fixing the include/events/napi.h It currently uses DECLARE_TRACE() and should be converted to a TRACE_EVENT() format. But I'll leave that change to the authors of that file. But since the napi.h file depends on using the CREATE_TRACE_POINTS and does not define its own DEFINE_TRACE() it must use the define_trace.h method instead. Cc: Neil Horman Cc: David S. Miller Cc: Mathieu Desnoyers Signed-off-by: Steven Rostedt commit 4778e0e8c64f683a71632dba1cff1f85f76f83c4 Author: Arnaldo Carvalho de Melo Date: Wed May 5 11:23:27 2010 -0300 perf tools: Fixup minor doc formatting issues Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 9e32a3cb0661a6a30e0fd2b77ce85293805e6337 Author: Arnaldo Carvalho de Melo Date: Wed May 5 11:20:05 2010 -0300 perf list: Add explanation about raw hardware event descriptors Using explanation given by Ingo Molnar in the oprofile mailing list. Suggested-by: Nick Black Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Nick Black Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit db620b1c2fb172346dc54eb62bba9b4a117d173b Author: Tom Zanussi Date: Tue May 4 22:20:16 2010 -0500 perf/record: simplify TRACE_INFO tracepoint check Fix a couple of inefficiencies and redundancies related to have_tracepoints() and its use when checking whether to write TRACE_INFO. First, there's no need to use get_tracepoints_path() in have_tracepoints() - we really just want the part that checks whether any attributes correspondo to tracepoints. Second, we really don't care about raw_samples per se - tracepoints are always raw_samples. In any case, the have_tracepoints() check should be sufficient to decide whether or not to write TRACE_INFO. Cc: Frederic Weisbecker Cc: Ingo Molnar , Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Acked-by: Frederic Weisbecker LKML-Reference: <1273030770.6383.6.camel@tropicana> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit 9890948d857c2120c234b0ca91a80416e8f747fb Author: Arnaldo Carvalho de Melo Date: Tue May 4 20:58:51 2010 -0300 perf report: Make dso__calc_col_width agree with hist_entry__dso_snprintf The first was always using the ->long_name, while the later used ->short_name if verbose was not set, resulting in the dso column to be much wider than needed most of the time. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7b20bd5fb902088579af4e70f7f802b93181a628 Author: Eric W. Biederman Date: Tue Mar 30 01:07:16 2010 -0700 x86, irq: Kill io_apic_renumber_irq Now that the generic irq layer is performing the exact same remapping as io_apic_renumber_irq we can kill this weird es7000 specific function. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-15-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 988856ee1623bd37e384105f7bb2b7fe44c009f6 Author: Eric W. Biederman Date: Tue Mar 30 01:07:15 2010 -0700 x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's. ACPI irq source overrides are allowed for the 16 isa irqs and are allowed to map any gsi to any isa irq. A few motherboards have been seen to take advantage of this and put the isa irqs on the 2nd or 3rd ioapic. This causes some problems, most notably the fact that we can not use any gsi < 16. To correct this move the gsis that are not isa irqs and have a gsi number < 16 into the linux irq space just past gsi_end. This is what the es7000 platform is doing today. Moving only the low 16 gsis above the rest of the gsi's only penalizes weird platforms, leaving sane acpi implementations with a 1-1 mapping of gsis and irqs. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-14-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 4afc51a835d3aeba11c35090f524e05c84586d27 Author: Eric W. Biederman Date: Tue Mar 30 01:07:14 2010 -0700 x86, ioapic: Simplify probe_nr_irqs_gsi. Use the global gsi_end value now that all ioapics have valid gsi numbers instead of a combination of acpi_probe_gsi and walking all of the ioapics and couting their number of entries by hand if acpi_probe_gsi gave us an answer we did not like. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-13-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit d464207c4fdd70c2a0febd4f9c58206fa915bb36 Author: Eric W. Biederman Date: Tue Mar 30 01:07:13 2010 -0700 x86, ioapic: Optimize pin_2_irq Now that all ioapics have valid gsi_base values use this to accellerate pin_2_irq. In the case of acpi this also ensures that pin_2_irq will compute the same irq value for an ioapic pin as acpi will. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-12-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 7716a5c4ff5f1f3dc5e9edcab125cbf7fceef0af Author: Eric W. Biederman Date: Tue Mar 30 01:07:12 2010 -0700 x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic. Now that all ioapic registration happens in mp_register_ioapic we can move the calculation of nr_ioapic_registers there from enable_IO_APIC. The number of ioapic registers is already calucated in mp_register_ioapic so all that really needs to be done is to save the caluclated value in nr_ioapic_registers. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-11-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit cf7500c0ea133d66f8449d86392d83f840102632 Author: Eric W. Biederman Date: Tue Mar 30 01:07:11 2010 -0700 x86, ioapic: In mpparse use mp_register_ioapic Long ago MP_ioapic_info was the primary way of setting up our ioapic data structures and mp_register_ioapic was a compatibility shim for acpi code. Now the situation is reversed and and mp_register_ioapic is the primary way of setting up our ioapic data structures. Keep the setting up of ioapic data structures uniform by having mp_register_ioapic call mp_register_ioapic. This changes a few fields: - type: is now hardset to MP_IOAPIC but type had to bey MP_IOAPIC or MP_ioapic_info would not have been called. - flags: is now hard coded to MPC_APIC_USABLE. We require flags to contain at least MPC_APIC_USEBLE in MP_ioapic_info and we don't ever examine flags so dropping a few flags that might possibly exist that we have never used is harmless. - apicaddr: Unchanged - apicver: Read from the ioapic instead of using the cached hardware value in the MP table. The real hardware value will be more accurate. - apicid: Now verified to be unique and changed if it is not. If the BIOS got this right this is a noop. If the BIOS did not fixing things appears to be the better solution. This adds gsi_base and gsi_end values to our ioapics defined with the mpatable, which will make our lives simpler later since we can always assume gsi_base and gsi_end are valid. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-10-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 5777372af5c929b8f3c706ed7b295b7279537c88 Author: Eric W. Biederman Date: Tue Mar 30 01:07:10 2010 -0700 x86, ioapic: Teach mp_register_ioapic to compute a global gsi_end Add the global variable gsi_end and teach mp_register_ioapic to keep it uptodate as we add more ioapics into the system. ioapics can only be added early in boot so the code that runs later can treat gsi_end as a constant. Remove the have hacks in sfi.c to second guess mp_register_ioapic by keeping t's own running total of how many gsi's have been seen, and instead use the gsi_end. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-9-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit eddb0c55a14074d6bac8c2ef169aefd7e2c6f139 Author: Eric W. Biederman Date: Tue Mar 30 01:07:09 2010 -0700 x86, ioapic: Fix the types of gsi values This patches fixes the types of gsi_base and gsi_end values in struct mp_ioapic_gsi, and the gsi parameter of mp_find_ioapic and mp_find_ioapic_pin A gsi is cannonically a u32, not an int. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-8-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 4b6b19a1c7302477653d799a53d48063dd53d555 Author: Eric W. Biederman Date: Tue Mar 30 01:07:08 2010 -0700 x86, ioapic: Fix io_apic_redir_entries to return the number of entries. io_apic_redir_entries has a huge conceptual bug. It returns the maximum redirection entry not the number of redirection entries. Which simply does not match what the name of the function. This just caught me and it caught Feng Tang, and Len Brown when they wrote sfi_parse_ioapic. Modify io_apic_redir_entries to actually return the number of redirection entries, and fix the callers so that they properly handle receiving the number of the number of redirection table entries, instead of the number of redirection table entries less one. While the usage in sfi.c does not show up in this patch it is fixed by virtue of the fact that io_apic_redir_entries now has the semantics sfi_parse_ioapic most reasonably expects. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-7-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 9638fa521e42c9281c874c6b5a382b1ced4ee496 Author: Eric W. Biederman Date: Tue Mar 30 01:07:07 2010 -0700 x86, ioapic: Only export mp_find_ioapic and mp_find_ioapic_pin in io_apic.h Multiple declarations of the same function in different headers is a pain to maintain. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-6-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 0fd52670fb6400be0996ac492b5ed77f3d83d69a Author: Eric W. Biederman Date: Tue Mar 30 01:07:06 2010 -0700 x86, acpi/irq: Generalize mp_config_acpi_legacy_irqs Remove the assumption that there is not an override for isa irq 0. Instead lookup the gsi and from that lookup the ioapic and pin of each isa irq indivdually. In general this should not have any behavioural affect but in perverse cases this gets all of the details correct, instead of doing something weird. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-5-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 9d2062b879495649bb525cf7979126da2e45d288 Author: Eric W. Biederman Date: Tue Mar 30 01:07:05 2010 -0700 x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsi Currently acpi_sci_ioapic_setup calls mp_override_legacy_irq with bus_irq == gsi, which is wrong if we are comming from an override Instead pass the bus_irq into acpi_sci_ioapic_setup. This fix was inspired by a similar fix from: Yinghai Lu Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-4-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 414d3448dbcb40807a1265ace64b2576ef919fbe Author: Eric W. Biederman Date: Tue Mar 30 01:07:04 2010 -0700 x86, acpi/irq: pci device dev->irq is an isa irq not a gsi Strictly speaking on x86 (where acpi is used) dev->irq must be a dual i8259 irq input aka an isa irq. Therefore we should translate that isa irq into a gsi before passing it to a function that takes a gsi. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-3-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 9a0a91bb56d2915cdb8585717de38376ad20fef9 Author: Eric W. Biederman Date: Tue Mar 30 01:07:03 2010 -0700 x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq In perverse acpi implementations the isa irqs are not identity mapped to the first 16 gsi. Furthermore at least the extended interrupt resource capability may return gsi's and not isa irqs. So since what we get from acpi is a gsi teach acpi_get_overrride_irq to operate on a gsi instead of an isa_irq. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-2-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 2c2df8418ac7908eec4558407b83f16739006c54 Author: Eric W. Biederman Date: Tue Mar 30 01:07:02 2010 -0700 x86, acpi/irq: Introduce apci_isa_irq_to_gsi There are a number of cases where the current code makes the assumption that isa irqs identity map to the first 16 acpi global system intereupts. In most instances that assumption is correct as that is the required behaviour in dual i8259 mode and the default behavior in ioapic mode. However there are some systems out there that take advantage of acpis interrupt remapping for the isa irqs to have a completely different mapping of isa_irq to gsi. Introduce acpi_isa_irq_to_gsi to perform this mapping explicitly in the code that needs it. Initially this will be just the current assumed identity mapping to ensure it's introduction does not cause regressions. Signed-off-by: Eric W. Biederman LKML-Reference: <1269936436-7039-1-git-send-email-ebiederm@xmission.com> Signed-off-by: H. Peter Anvin commit 03d646e62b06e9364e2dbb939d67934c6c9826cd Author: Li Zefan Date: Mon Dec 21 14:27:24 2009 +0800 tracing: Make the documentation clear on trace_event boot option Make it clear that event-list is a comma separated list of events. Reported-by: KOSAKI Motohiro Signed-off-by: Li Zefan LKML-Reference: <4B2F154C.2060503@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 24797535e18ae219be1fc2632959327075bef5da Author: Prasad Joshi Date: Tue May 4 22:13:59 2010 +0200 logfs: initialize li->li_refcount li_refcount was not re-initialized in function logfs_init_inode(), small patch that will fix the problem Signed-off-by: Prasad Joshi Signed-off-by: Joern Engel commit 05ebad852901cf9127a743df6ea10c0e8b1590c3 Author: Joern Engel Date: Tue May 4 19:41:09 2010 +0200 logfs: commit reservations under space pressure Ensures we only return -ENOSPC when there really is no space. Signed-off-by: Joern Engel commit 20503664b008e17976bff1fdbc693c77ebd6f6c9 Author: Joern Engel Date: Mon May 3 20:54:34 2010 +0200 logfs: survive logfs_buf_recover read errors Refusing to mount beats a kernel crash. Signed-off-by: Joern Engel commit 4677d4a53e0d565742277e8913e91c821453e63e Author: Borislav Petkov Date: Mon May 3 14:57:11 2010 +0200 arch, hweight: Fix compilation errors Fix function prototype visibility issues when compiling for non-x86 architectures. Tested with crosstool (ftp://ftp.kernel.org/pub/tools/crosstool/) with alpha, ia64 and sparc targets. Signed-off-by: Borislav Petkov LKML-Reference: <20100503130736.GD26107@aftab> Signed-off-by: H. Peter Anvin commit c4f3b5a2d70eae4abb8bcaaf8dc3f067ff1714e8 Merge: 777d041 02bf60a Author: Ingo Molnar Date: Tue May 4 18:31:47 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 956097912c40a03bf22603a3be73503fd9ea9e44 Author: Borislav Petkov Date: Sun May 2 08:03:54 2010 +0200 ring-buffer: Wrap open-coded WARN_ONCE Wrap open-coded WARN_ONCE functionality into the equivalent macro. Signed-off-by: Borislav Petkov LKML-Reference: <20100502060354.GA5281@liondog.tnic> Signed-off-by: Steven Rostedt commit 4dbf6bc239c169b032777616806ecc648058f6b2 Author: Steven Rostedt Date: Tue May 4 11:24:01 2010 -0400 tracing: Convert nop macros to static inlines The ftrace.h file contains several functions as macros when the functions are disabled due to config options. This patch converts most of them to static inlines. There are two exceptions: register_ftrace_function() and unregister_ftrace_function() This is because their parameter "ops" must not be evaluated since code using the function is allowed to #ifdef out the creation of the parameter. This also fixes an error caused by recent changes: kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer': kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do' Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit 02bf60aad7d5912dfcdbe0154f1bd67ea7a8301e Author: Anton Blanchard Date: Tue May 4 21:19:15 2010 +1000 perf: Fix performance issue with perf report On a large machine we spend a lot of time in perf_header__find_attr when running perf report. If we are parsing a file without PERF_SAMPLE_ID then for each sample we call perf_header__find_attr and loop through all counter IDs, never finding a match. As the machine gets larger there are more per cpu counters and we spend an awful lot of time in there. The patch below initialises each sample id to -1ULL and checks for this in perf_header__find_attr. We may need to do something more intelligent eventually (eg a hash lookup from counter id to attr) but this at least fixes the most common usage of perf report. Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Ingo Molnar Cc: Frederic Weisbecker Cc: Eric B Munson Acked-by: Eric B Munson LKML-Reference: <20100504111915.GB14636@kryten> Signed-off-by: Anton Blanchard -- Signed-off-by: Arnaldo Carvalho de Melo commit 11d232ec285b07860670277c8ab3f6076f7bce1e Author: Arnaldo Carvalho de Melo Date: Tue May 4 10:48:22 2010 -0300 perf inject: Add missing bits New commands need to have Documentation and be added to command-list.txt so that they can appear when 'perf' is called withouth any subcommand: [root@doppio linux-2.6-tip]# perf usage: perf [--version] [--help] COMMAND [ARGS] The most commonly used perf commands are: annotate Read perf.data (created by perf record) and display annotated code archive Create archive with object files with build-ids found in perf.data file bench General framework for benchmark suites buildid-cache Manage build-id cache. buildid-list List the buildids in a perf.data file diff Read two perf.data files and display the differential profile inject Filter to augment the events stream with additional information kmem Tool to trace/measure kernel memory(slab) properties kvm Tool to trace/measure kvm guest os list List all symbolic event types lock Analyze lock events probe Define new dynamic tracepoints record Run a command and record its profile into perf.data report Read perf.data (created by perf record) and display the profile sched Tool to trace/measure scheduler properties (latencies) stat Run a command and gather performance counter statistics test Runs sanity tests. timechart Tool to visualize total system behavior during a workload top System profiling tool. trace Read perf.data (created by perf record) and display trace output See 'perf help COMMAND' for more information on a specific command. [root@doppio linux-2.6-tip]# The new 'perf inject' command hadn't so it wasn't appearing on that list. Also fix the long option, that should have no spaces in it, rename the faulty one to be '--build-ids', instead of '--inject build-ids'. Reported-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d30d64c6da3ec7a0708bfffa7e05752d5b9a1093 Author: Robert Richter Date: Mon May 3 15:52:26 2010 +0200 oprofile/x86: reordering some functions Reordering some functions. Necessary for the next patch. No functional changes. Cc: Andi Kleen Signed-off-by: Robert Richter commit de654649737696ecf32873c341b305e30f3dc777 Author: Robert Richter Date: Mon May 3 14:41:22 2010 +0200 oprofile/x86: stop disabled counters in nmi handler This patch adds checks to the nmi handler. Now samples are only generated and counters reenabled, if the counters are running. Otherwise the counters are stopped, if oprofile is using the nmi. In other cases it will ignore the nmi notification. Cc: Andi Kleen Signed-off-by: Robert Richter commit 6ae56b55bc364bc2f2342f599b46581627ba22da Author: Robert Richter Date: Thu Apr 29 14:55:55 2010 +0200 oprofile/x86: protect cpu hotplug sections This patch reworks oprofile cpu hotplug code as follows: Introduce ctr_running variable to check, if counters are running or not. The state must be known for taking a cpu on or offline and when switching counters during counter multiplexing. Protect on_each_cpu() sections with get_online_cpus()/put_online_cpu() functions. This is necessary if notifiers or states are modified. Within these sections the cpu mask may not change. Switch only between counters in nmi_cpu_switch(), if counters are running. Otherwise the switch may restart a counter though they are disabled. Add nmi_cpu_setup() and nmi_cpu_shutdown() to cpu hotplug code. The function must also be called to avoid uninitialzed counter usage. Cc: Andi Kleen Signed-off-by: Robert Richter commit 216f3d9b4e5121feea4b13fae9d4c83e8d7e1c8a Author: Robert Richter Date: Mon May 3 11:58:46 2010 +0200 oprofile/x86: remove CONFIG_SMP macros CPU notifier register functions also exist if CONFIG_SMP is disabled. This change is part of hotplug code rework and also necessary for later patches. Cc: Andi Kleen Signed-off-by: Robert Richter commit 2623a1d55a6260c855e1f6d1895900b50b40a896 Author: Robert Richter Date: Mon May 3 19:44:32 2010 +0200 oprofile/x86: fix uninitialized counter usage during cpu hotplug This fixes a NULL pointer dereference that is triggered when taking a cpu offline after oprofile was initialized, e.g.: $ opcontrol --init $ opcontrol --start-daemon $ opcontrol --shutdown $ opcontrol --deinit $ echo 0 > /sys/devices/system/cpu/cpu1/online See the crash dump below. Though the counter has been disabled the cpu notifier is still active and trying to use already freed counter data. This fix is for linux-stable. To proper fix this, the hotplug code must be rewritten. Thus I will leave a WARN_ON_ONCE() message with this patch. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] op_amd_stop+0x2d/0x8e PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu1/online CPU 1 Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16 Anaheim/Anaheim RIP: 0010:[] [] op_amd_stop+0x2d/0x8e RSP: 0018:ffff880001843f28 EFLAGS: 00010006 RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000200200 RDX: ffff880001843f68 RSI: dead000000100100 RDI: 0000000000000000 RBP: ffff880001843f48 R08: 0000000000000000 R09: ffff880001843f08 R10: ffffffff8102c9a5 R11: ffff88000184ea80 R12: 0000000000000000 R13: ffff88000184f6c0 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fec6a92e6f0(0000) GS:ffff880001840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000000163b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff88042fcd8000, task ffff88042fcd51d0) Stack: ffff880001843f48 0000000000000001 ffff88042e9f7d38 ffff880001843f68 <0> ffff880001843f58 ffffffff8132a602 ffff880001843f98 ffffffff810521b3 <0> ffff880001843f68 ffff880001843f68 ffff880001843f88 ffff88042fcd9fd8 Call Trace: [] nmi_cpu_stop+0x21/0x23 [] generic_smp_call_function_single_interrupt+0xdf/0x11b [] smp_call_function_single_interrupt+0x22/0x31 [] call_function_single_interrupt+0x13/0x20 [] ? wake_up_process+0x10/0x12 [] ? default_idle+0x22/0x37 [] c1e_idle+0xdf/0xe6 [] ? atomic_notifier_call_chain+0x13/0x15 [] cpu_idle+0x4b/0x7e [] start_secondary+0x1ae/0x1b2 Code: 89 e5 41 55 49 89 fd 41 54 45 31 e4 53 31 db 48 83 ec 08 89 df e8 be f8 ff ff 48 98 48 83 3c c5 10 67 7a 81 00 74 1f 49 8b 45 08 <42> 8b 0c 20 0f 32 48 c1 e2 20 25 ff ff bf ff 48 09 d0 48 89 c2 RIP [] op_amd_stop+0x2d/0x8e RSP CR2: 0000000000000000 ---[ end trace 679ac372d674b757 ]--- Kernel panic - not syncing: Fatal exception in interrupt Pid: 0, comm: swapper Tainted: G D 2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16 Call Trace: [] panic+0x9e/0x10c [] ? up+0x34/0x39 [] ? kmsg_dump+0x112/0x12c [] oops_end+0x81/0x8e [] no_context+0x1f3/0x202 [] __bad_area_nosemaphore+0x1ba/0x1e0 [] ? enqueue_task_fair+0x16d/0x17a [] ? activate_task+0x42/0x53 [] ? try_to_wake_up+0x272/0x284 [] bad_area_nosemaphore+0xe/0x10 [] do_page_fault+0x1c8/0x37c [] ? enqueue_task_fair+0x16d/0x17a [] page_fault+0x1f/0x30 [] ? wake_up_process+0x10/0x12 [] ? op_amd_stop+0x2d/0x8e [] ? op_amd_stop+0x1c/0x8e [] nmi_cpu_stop+0x21/0x23 [] generic_smp_call_function_single_interrupt+0xdf/0x11b [] smp_call_function_single_interrupt+0x22/0x31 [] call_function_single_interrupt+0x13/0x20 [] ? wake_up_process+0x10/0x12 [] ? default_idle+0x22/0x37 [] c1e_idle+0xdf/0xe6 [] ? atomic_notifier_call_chain+0x13/0x15 [] cpu_idle+0x4b/0x7e [] start_secondary+0x1ae/0x1b2 ------------[ cut here ]------------ WARNING: at /local/rrichter/.source/linux/arch/x86/kernel/smp.c:118 native_smp_send_reschedule+0x27/0x53() Hardware name: Anaheim Modules linked in: Pid: 0, comm: swapper Tainted: G D 2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16 Call Trace: [] ? native_smp_send_reschedule+0x27/0x53 [] warn_slowpath_common+0x77/0xa4 [] warn_slowpath_null+0xf/0x11 [] native_smp_send_reschedule+0x27/0x53 [] resched_task+0x60/0x62 [] check_preempt_curr_idle+0x10/0x12 [] try_to_wake_up+0x1f5/0x284 [] default_wake_function+0xd/0xf [] pollwake+0x57/0x5a [] ? default_wake_function+0x0/0xf [] __wake_up_common+0x46/0x75 [] __wake_up+0x38/0x50 [] printk_tick+0x39/0x3b [] update_process_times+0x3f/0x5c [] tick_periodic+0x5d/0x69 [] tick_handle_periodic+0x21/0x71 [] smp_apic_timer_interrupt+0x82/0x95 [] apic_timer_interrupt+0x13/0x20 [] ? panic_blink_one_second+0x0/0x7b [] ? panic+0x10a/0x10c [] ? up+0x34/0x39 [] ? kmsg_dump+0x112/0x12c [] ? oops_end+0x81/0x8e [] ? no_context+0x1f3/0x202 [] ? __bad_area_nosemaphore+0x1ba/0x1e0 [] ? enqueue_task_fair+0x16d/0x17a [] ? activate_task+0x42/0x53 [] ? try_to_wake_up+0x272/0x284 [] ? bad_area_nosemaphore+0xe/0x10 [] ? do_page_fault+0x1c8/0x37c [] ? enqueue_task_fair+0x16d/0x17a [] ? page_fault+0x1f/0x30 [] ? wake_up_process+0x10/0x12 [] ? op_amd_stop+0x2d/0x8e [] ? op_amd_stop+0x1c/0x8e [] ? nmi_cpu_stop+0x21/0x23 [] ? generic_smp_call_function_single_interrupt+0xdf/0x11b [] ? smp_call_function_single_interrupt+0x22/0x31 [] ? call_function_single_interrupt+0x13/0x20 [] ? wake_up_process+0x10/0x12 [] ? default_idle+0x22/0x37 [] ? c1e_idle+0xdf/0xe6 [] ? atomic_notifier_call_chain+0x13/0x15 [] ? cpu_idle+0x4b/0x7e [] ? start_secondary+0x1ae/0x1b2 ---[ end trace 679ac372d674b758 ]--- Cc: Andi Kleen Cc: stable Signed-off-by: Robert Richter commit 5bdb7934ca4115a12c7d585c5a45312b1c36909b Author: Robert Richter Date: Wed Mar 31 11:58:36 2010 +0200 oprofile/x86: remove duplicate IBS capability check The check is already done in ibs_exit(). Signed-off-by: Robert Richter commit da759fe5be24ec3b236a76c007b460cf6caf2009 Author: Robert Richter Date: Fri Feb 26 10:54:56 2010 +0100 oprofile/x86: move IBS code Moving code to make future changes easier. This groups all IBS code together. Signed-off-by: Robert Richter commit 8617f98c001d00b176422d707e6a67b88bcd7e0d Author: Robert Richter Date: Fri Feb 26 17:20:55 2010 +0100 oprofile/x86: return -EBUSY if counters are already reserved In case a counter is already reserved by the watchdog or perf_event subsystem, oprofile ignored this counters silently. This case is handled now and oprofile_setup() now reports an error. Signed-off-by: Robert Richter commit 83300ce0df6b72e156b386457aa0f0902b8c0a98 Author: Robert Richter Date: Tue Mar 23 20:01:54 2010 +0100 oprofile/x86: moving shutdown functions Moving some code in preparation of the next patch. Signed-off-by: Robert Richter commit d0e4120fda6f87eead438eed4d49032e12060e58 Author: Robert Richter Date: Tue Mar 23 19:33:21 2010 +0100 oprofile/x86: reserve counter msrs pairwise For AMD's and Intel's P6 generic performance counters have pairwise counter and control msrs. This patch changes the counter reservation in a way that both msrs must be registered. It joins some counter loops and also removes the unnecessary NUM_CONTROLS macro in the AMD implementation. Signed-off-by: Robert Richter commit 8f5a2dd83a1f8e89fdc17eb0f2f07c2e713e635a Author: Robert Richter Date: Tue Mar 23 19:09:51 2010 +0100 oprofile/x86: rework error handler in nmi_setup() This patch improves the error handler in nmi_setup(). Most parts of the code are moved to allocate_msrs(). In case of an error allocate_msrs() also frees already allocated memory. nmi_setup() becomes easier and better extendable. Signed-off-by: Robert Richter commit 777d0411cd1e384115985dac5ccd42031e3eee2b Author: Frederic Weisbecker Date: Mon May 3 15:39:45 2010 +0200 hw_breakpoints: Fix percpu build failure Fix this build error: kernel/hw_breakpoint.c:58:1: error: pasting "__pcpu_scope_" and "*" does not give a valid preprocessing token It happens if CONFIG_DEBUG_FORCE_WEAK_PER_CPU, because we concatenate someting with the name and we have the "*" in the name. Cc: Frederic Weisbecker Cc: Will Deacon Cc: Paul Mundt Cc: Mahesh Salgaonkar Cc: K. Prasad Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Jason Wessel LKML-Reference: <20100503133942.GA5497@nowhere> Signed-off-by: Ingo Molnar commit 54d47a2be5e7f928fb77b2f5a0761f6bd3c9dbff Author: Frederic Weisbecker Date: Tue May 4 04:54:47 2010 +0200 lockdep: No need to disable preemption in debug atomic ops No need to disable preemption in the debug_atomic_* ops, as we ensure interrupts are disabled already. So let's use the __this_cpu_ops() rather than this_cpu_ops() that enclose the ops in a preempt disabled section. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra commit fa9a97dec611c5356301645d576b523ce3919eba Author: Frederic Weisbecker Date: Tue May 4 04:52:48 2010 +0200 lockdep: Actually _dec_ in debug_atomic_dec Fix a silly copy-paste mistake that was making debug_atomic_dec use this_cpu_inc instead of this_cpu_dec. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra commit ba697f40dbb704956a4cf67a7845b538015a01ea Author: Frederic Weisbecker Date: Tue May 4 04:47:25 2010 +0200 lockdep: Provide off case for redundant_hardirqs_on increment We forgot to provide a !CONFIG_DEBUG_LOCKDEP case for the redundant_hardirqs_on stat handling. Manage that in the headers with a new __debug_atomic_inc() helper. Fixes: kernel/lockdep.c:2306: error: 'lockdep_stats' undeclared (first use in this function) kernel/lockdep.c:2306: error: (Each undeclared identifier is reported only once kernel/lockdep.c:2306: error: for each function it appears in.) Reported-by: Ingo Molnar Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra commit 097c1bd5673edaf2a162724636858b71f658fdd2 Author: H. Peter Anvin Date: Mon May 3 15:49:31 2010 -0700 x86, cpu: Make APERF/MPERF a normal table-driven flag APERF/MPERF can be handled via the table like all the other scattered CPU flags. Signed-off-by: H. Peter Anvin Cc: Thomas Renninger Cc: Borislav Petkov LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org> commit d88d95eb1c2a72b6126a550debe0883ff723a948 Author: Borislav Petkov Date: Sat Apr 24 09:56:53 2010 +0200 x86, k8: Fix build error when K8_NB is disabled K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail at the final linking stage due to missing exported num_k8_northbridges. Add a header stub for that. Signed-off-by: Borislav Petkov LKML-Reference: <20100503183036.GJ26107@aftab> Signed-off-by: H. Peter Anvin commit 81c4a8a6733ad2ff49c0e077b51403367601b3e7 Author: Robert Richter Date: Thu Apr 22 19:14:49 2010 +0200 oprofile: update file list in MAINTAINERS file File list now catches: $ xargs | eval ls -d $(cat) | sort -u arch/*/include/asm/oprofile*.h arch/*/oprofile/ drivers/oprofile/ include/linux/oprofile.h arch/alpha/oprofile/ arch/arm/oprofile/ arch/avr32/oprofile/ arch/blackfin/oprofile/ arch/ia64/oprofile/ arch/m32r/oprofile/ arch/microblaze/oprofile/ arch/mips/oprofile/ arch/mn10300/oprofile/ arch/parisc/oprofile/ arch/powerpc/include/asm/oprofile_impl.h arch/powerpc/oprofile/ arch/s390/oprofile/ arch/sh/oprofile/ arch/sparc/oprofile/ arch/x86/oprofile/ drivers/oprofile/ include/linux/oprofile.h Signed-off-by: Robert Richter commit 9414e99672271adcc661f3c160a30b374179b92f Author: Phil Carmody Date: Wed Apr 28 12:09:16 2010 -0500 oprofile: protect from not being in an IRQ context http://lkml.org/lkml/2010/4/27/285 Protect against dereferencing regs when it's NULL, and force a magic number into pc to prevent too deep processing. This approach permits the dropped samples to be tallied as invalid Instruction Pointer events. e.g. output from about 15mins at 10kHz sample rate: Nr. samples received: 2565380 Nr. samples lost invalid pc: 4 Signed-off-by: Phil Carmody Signed-off-by: Robert Richter commit 250825008f1f94887bc039e9227a8adfb5ba366e Author: Brian Gerst Date: Sun Mar 21 09:00:46 2010 -0400 x86-32: Don't set ignore_fpu_irq in simd exception Any processor that supports simd will have an internal fpu, and the irq13 handler will not be enabled. Signed-off-by: Brian Gerst LKML-Reference: <1269176446-2489-5-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin commit e2e75c915de045f0785387dc32f55e92fab0614c Author: Brian Gerst Date: Sun Mar 21 09:00:45 2010 -0400 x86: Merge kernel_math_error() into math_error() Clean up the kernel exception handling and make it more similar to the other traps. Signed-off-by: Brian Gerst LKML-Reference: <1269176446-2489-4-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin commit 9b6dba9e0798325dab427b9d60c61630ccc39b28 Author: Brian Gerst Date: Sun Mar 21 09:00:44 2010 -0400 x86: Merge simd_math_error() into math_error() The only difference between FPU and SIMD exceptions is where the status bits are read from (cwd/swd vs. mxcsr). This also fixes the discrepency introduced by commit adf77bac, which fixed FPU but not SIMD. Signed-off-by: Brian Gerst LKML-Reference: <1269176446-2489-3-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin commit 40d2e76315da38993129090dc5d56377e573c312 Author: Brian Gerst Date: Sun Mar 21 09:00:43 2010 -0400 x86-32: Rework cache flush denied handler The cache flush denied error is an erratum on some AMD 486 clones. If an invd instruction is executed in userspace, the processor calls exception 19 (13 hex) instead of #GP (13 decimal). On cpus where XMM is not supported, redirect exception 19 to do_general_protection(). Also, remove die_if_kernel(), since this was the last user. Signed-off-by: Brian Gerst LKML-Reference: <1269176446-2489-2-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin commit 63e0c7715aab6085faa487d498889f4361dc6542 Author: Tom Zanussi Date: Mon May 3 00:14:48 2010 -0500 perf: record TRACE_INFO only if using tracepoints and SAMPLE_RAW The current perf code implicitly assumes SAMPLE_RAW means tracepoints are being used, but doesn't check for that. It happily records the TRACE_INFO even if SAMPLE_RAW is used without tracepoints, but when the perf data is read it won't go any further when it finds TRACE_INFO but no tracepoints, and displays misleading errors. This adds a check for both in perf-record, and won't record TRACE_INFO unless both are true. This at least allows perf report -D to dump raw events, and avoids triggering a misleading error condition in perf trace. It doesn't actually enable the non-tracepoint raw events to be displayed in perf trace, since perf trace currently only deals with tracepoint events. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1272865861.7932.16.camel@tropicana> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit 0e417fe1f247bb3ac056ee04604332005c334fac Merge: 53ba4f2 913769f Author: Ingo Molnar Date: Mon May 3 09:17:46 2010 +0200 Merge branch 'core/locking' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into core/locking commit 53ba4f2fa73225113a488584df0d85d3cba52943 Merge: bd6d29c 66f41d4 Author: Ingo Molnar Date: Mon May 3 09:17:01 2010 +0200 Merge commit 'v2.6.34-rc6' into core/locking commit 0806ebd974590ab24ab357d5d87db744e56bfe13 Merge: 090f720 feef47d Author: Ingo Molnar Date: Mon May 3 08:29:35 2010 +0200 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core commit 090f7204dfdb5d7f18208ea81dfdba845897cedd Author: Arnaldo Carvalho de Melo Date: Sun May 2 19:46:36 2010 -0300 perf inject: Refactor read_buildid function Into two functions, one that actually reads the build_id for the dso if it wasn't already read, and another taht will inject the event if the build_id is available. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 2c9faa060064343a4a0b16f5b77f3c61d1d17e23 Author: Arnaldo Carvalho de Melo Date: Sun May 2 13:37:24 2010 -0300 perf record: Don't exit in live mode when no tracepoints are enabled With this I was able to actually test Tom Zanussi's two previous patches in my usual perf testing ways, i.e. without any tracepoints activated. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 454c407ec17a0c63e4023ac0877d687945a7df4a Author: Tom Zanussi Date: Sat May 1 01:41:20 2010 -0500 perf: add perf-inject builtin Currently, perf 'live mode' writes build-ids at the end of the session, which isn't actually useful for processing live mode events. What would be better would be to have the build-ids sent before any of the samples that reference them, which can be done by processing the event stream and retrieving the build-ids on the first hit. Doing that in perf-record itself, however, is off-limits. This patch introduces perf-inject, which does the same job while leaving perf-record untouched. Normal mode perf still records the build-ids at the end of the session as it should, but for live mode, perf-inject can be injected in between the record and report steps e.g.: perf record -o - ./hackbench 10 | perf inject -v -b | perf report -v -i - perf-inject reads a perf-record event stream and repipes it to stdout. At any point the processing code can inject other events into the event stream - in this case build-ids (-b option) are read and injected as needed into the event stream. Build-ids are just the first user of perf-inject - potentially anything that needs userspace processing to augment the trace stream with additional information could make use of this facility. Cc: Ingo Molnar Cc: Steven Rostedt Cc: Frédéric Weisbecker LKML-Reference: <1272696080-16435-3-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit 789688faef5b3ba78065beaf2f3d6f1c839f74a3 Author: Tom Zanussi Date: Sat May 1 01:41:19 2010 -0500 perf/live: don't synthesize build ids at the end of a live mode trace It doesn't really make sense to record the build ids at the end of a live mode session - live mode samples need that information during the trace rather than at the end. Leave event__synthesize_build_id() in place, however; we'll still be using that to synthesize build ids in a more timely fashion in a future patch. Cc: Frédéric Weisbecker Cc: Ingo Molnar Cc: Steven Rostedt LKML-Reference: <1272696080-16435-2-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit fb72014d98afd51e85aab9c061344ef32d615606 Author: Arnaldo Carvalho de Melo Date: Fri Apr 30 19:31:12 2010 -0300 perf tools: Don't use code surrounded by __KERNEL__ We need to refactor code to be explicitely shared by the kernel and at least the tools/ userspace programs, so, till we do that, copy the bare minimum bitmap/bitops code needed by tools/perf. Reported-by: "H. Peter Anvin" Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit ccc0197b02178f7e1707e659cbc5242fc94b499a Author: Joern Engel Date: Sat May 1 17:00:34 2010 +0200 logfs: Close i_ino reuse race logfs_seek_hole() may return the same offset it is passed as argument. Found by Prasad Joshi Signed-off-by: Joern Engel commit bd2b3f29594c50d7c5bd864d9af05d440394ee82 Author: Joern Engel Date: Sat May 1 17:33:06 2010 +0200 logfs: fix logfs_seek_hole() logfs_seek_hole(inode, 0x200) would crap itself if the inode contained just 0x1ff (or fewer) blocks. Signed-off-by: Joern Engel commit ad342631f13d40aa787b9e5aaf4800f10d6c3647 Author: Joern Engel Date: Tue Apr 27 13:45:31 2010 +0200 logfs: Return -EINVAL if filesystem image doesn't match Signed-off-by: Joern Engel commit feef47d0cb530e8419dfa0b48141b538b89b1b1a Author: Frederic Weisbecker Date: Fri Apr 23 05:59:55 2010 +0200 hw-breakpoints: Get the number of available registers on boot dynamically The breakpoint generic layer assumes that archs always know in advance the static number of address registers available to host breakpoints through the HBP_NUM macro. However this is not true for every archs. For example Arm needs to get this information dynamically to handle the compatiblity between different versions. To solve this, this patch proposes to drop the static HBP_NUM macro and let the arch provide the number of available slots through a new hw_breakpoint_slots() function. For archs that have CONFIG_HAVE_MIXED_BREAKPOINTS_REGS selected, it will be called once as the number of registers fits for instruction and data breakpoints together. For the others it will be called first to get the number of instruction breakpoint registers and another time to get the data breakpoint registers, the targeted type is given as a parameter of hw_breakpoint_slots(). Reported-by: Will Deacon Signed-off-by: Frederic Weisbecker Acked-by: Paul Mundt Cc: Mahesh Salgaonkar Cc: K. Prasad Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Jason Wessel Cc: Ingo Molnar commit f93a20541134fa767e8dc4eb32e956d30b9f6b92 Author: Frederic Weisbecker Date: Tue Apr 13 00:32:30 2010 +0200 hw-breakpoints: Handle breakpoint weight in allocation constraints Depending on their nature and on what an arch supports, breakpoints may consume more than one address register. For example a simple absolute address match usually only requires one address register. But an address range match may consume two registers. Currently our slot allocation constraints, that tend to reflect the limited arch's resources, always consider that a breakpoint consumes one slot. Then provide a way for archs to tell us the weight of a breakpoint through a new hw_breakpoint_weight() helper. This weight will be computed against the generic allocation constraints instead of a constant value. Signed-off-by: Frederic Weisbecker Acked-by: Paul Mundt Cc: Will Deacon Cc: Mahesh Salgaonkar Cc: K. Prasad Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Ingo Molnar commit 0102752e4c9e0655b39734550d4c35327954f7f9 Author: Frederic Weisbecker Date: Sun Apr 11 18:55:56 2010 +0200 hw-breakpoints: Separate constraint space for data and instruction breakpoints There are two outstanding fashions for archs to implement hardware breakpoints. The first is to separate breakpoint address pattern definition space between data and instruction breakpoints. We then have typically distinct instruction address breakpoint registers and data address breakpoint registers, delivered with separate control registers for data and instruction breakpoints as well. This is the case of PowerPc and ARM for example. The second consists in having merged breakpoint address space definition between data and instruction breakpoint. Address registers can host either instruction or data address and the access mode for the breakpoint is defined in a control register. This is the case of x86 and Super H. This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config that archs can select if they belong to the second case. Those will have their slot allocation merged for instructions and data breakpoints. The others will have a separate slot tracking between data and instruction breakpoints. Signed-off-by: Frederic Weisbecker Acked-by: Paul Mundt Cc: Will Deacon Cc: Mahesh Salgaonkar Cc: K. Prasad Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Ingo Molnar commit b2812d031dea86926e9c10f7714af33ac2f6b43d Author: Frederic Weisbecker Date: Sun Apr 18 18:11:53 2010 +0200 hw-breakpoints: Change/Enforce some breakpoints policies The current policies of breakpoints in x86 and SH are the following: - task bound breakpoints can only break on userspace addresses - cpu wide breakpoints can only break on kernel addresses The former rule prevents ptrace breakpoints to be set to trigger on kernel addresses, which is good. But as a side effect, we can't breakpoint on kernel addresses for task bound breakpoints. The latter rule simply makes no sense, there is no reason why we can't set breakpoints on userspace while performing cpu bound profiles. We want the following new policies: - task bound breakpoint can set userspace address breakpoints, with no particular privilege required. - task bound breakpoints can set kernelspace address breakpoints but must be privileged to do that. - cpu bound breakpoints can do what they want as they are privileged already. To implement these new policies, this patch checks if we are dealing with a kernel address breakpoint, if so and if the exclude_kernel parameter is set, we tell the user that the breakpoint is invalid, which makes a good generic ptrace protection. If we don't have exclude_kernel, ensure the user has the right privileges as kernel breakpoints are quite sensitive (risk of trap recursion attacks and global performance impacts). [ Paul Mundt: keep addr space check for sh signal delivery and fix double function declaration] Signed-off-by: Frederic Weisbecker Cc: Will Deacon Cc: Mahesh Salgaonkar Cc: K. Prasad Cc: Paul Mundt Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Jason Wessel Cc: Ingo Molnar Signed-off-by: Paul Mundt commit 87e9b2024659c614a876ce359a57e98a47b5ef37 Author: Frederic Weisbecker Date: Sat Apr 17 18:11:59 2010 +0200 hw-breakpoints: Check disabled breakpoints again We stopped checking disabled breakpoints because we weren't allowing breakpoints on NULL addresses. And gdb tends to set NULL addresses on inactive breakpoints. But refusing NULL addresses was actually a regression that has been fixed now. There is no reason anymore to not validate inactive breakpoint settings. Signed-off-by: Frederic Weisbecker Acked-by: Paul Mundt Cc: Will Deacon Cc: Mahesh Salgaonkar Cc: K. Prasad Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Jason Wessel Cc: Ingo Molnar commit 73266fc1df2f94cf72b3beba3eee3b88ed0b0664 Author: Frederic Weisbecker Date: Thu Apr 22 05:05:45 2010 +0200 hw-breakpoints: Tag ptrace breakpoint as exclude_kernel Tag ptrace breakpoints with the exclude_kernel attribute set. This will make it easier to set generic policies on breakpoints, when it comes to ensure nobody unpriviliged try to breakpoint on the kernel. Signed-off-by: Frederic Weisbecker Acked-by: Paul Mundt Cc: Will Deacon Cc: Mahesh Salgaonkar Cc: K. Prasad Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Ingo Molnar commit d00a47cce569a3e660a8c9de5d57af28d6a9f0f7 Author: Frederic Weisbecker Date: Sat May 1 03:08:46 2010 +0200 perf: Fix warning while reading ring buffer headers commit e9e94e3bd862d31777335722e747e97d9821bc1d "perf trace: Ignore "overwrite" field if present in /events/header_page" makes perf trace launching spurious warnings about unexpected tokens read: Warning: Error: expected type 6 but read 4 This change tries to handle the overcommit field in the header_page file whenever this field is present or not. The problem is that if this field is not present, we try to find it and give up in the middle of the line when we realize we are actually dealing with another field, which is the "data" one. And this failure abandons the file pointer in the middle of the "data" description line: field: u64 timestamp; offset:0; size:8; signed:0; field: local_t commit; offset:8; size:8; signed:1; field: char data; offset:16; size:4080; signed:1; ^^^ Here What happens next is that we want to read this line to parse the data field, but we fail because the pointer is not in the beginning of the line. We could probably fix that by rewinding the pointer. But in fact we don't care much about these headers that only concern the ftrace ring-buffer. We don't use them from perf. Just skip this part of perf.data, but don't remove it from recording to stay compatible with olders perf.data Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Stephane Eranian Cc: Steven Rostedt commit e5a5f1f015cf435eb3d2f5712ba51ffdbb92cbef Author: Frederic Weisbecker Date: Fri Apr 30 19:55:00 2010 +0200 perf: Remove leftover useless options to record trace events from scripts -f, -c 1, -R are now useless for trace events recording, moreover -M is useless and event hurts. Remove them from the documentation examples and from record scripts. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Tom Zanussi commit 913769f24eadcd38a936ffae41d9b4895ec02e43 Author: Frederic Weisbecker Date: Thu Apr 15 23:10:43 2010 +0200 lockdep: Simplify debug atomic ops Simplify debug_atomic_inc/dec by using this_cpu_inc/dec() instead of doing it through an indirect get_cpu_var() and a manual incrementation. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra commit 8795d7717c467bea7b0a0649d44a258e09f34db2 Author: Frederic Weisbecker Date: Thu Apr 15 23:10:42 2010 +0200 lockdep: Fix redundant_hardirqs_on incremented with irqs enabled When a path restore the flags while irqs are already enabled, we update the per cpu var redundant_hardirqs_on in a racy fashion and debug_atomic_inc() warns about this situation. In this particular case, loosing a few hits in a stat is not a big deal, so increment it without protection. v2: Don't bother with disabling irq, we can miss one count in rare situations Reported-by: Stephen Rothwell Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: David Miller Cc: Benjamin Herrenschmidt commit 868c522b1b75fd3fd3e6a636b4c344ac08edf13a Merge: bd6d29c 66f41d4 Author: Frederic Weisbecker Date: Fri Apr 30 19:11:13 2010 +0200 Merge commit 'v2.6.34-rc6' into core/locking Merge reason: Further lockdep patches depend on per cpu updates made in -rc1. commit bc4b473f1aa2ef785ccfd890a24a1de5a6660f98 Merge: 3ca5049 1c6a800 Author: Ingo Molnar Date: Fri Apr 30 09:58:05 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 3ca50496c2677a2b3fdd3ede86660fd1433beac6 Merge: 462b04e 66f41d4 Author: Ingo Molnar Date: Fri Apr 30 09:56:41 2010 +0200 Merge commit 'v2.6.34-rc6' into perf/core Merge reason: update to the latest -rc. Signed-off-by: Ingo Molnar commit d9c5841e22231e4e49fd0a1004164e6fce59b7a6 Merge: b701a47 5967ed8 Author: H. Peter Anvin Date: Thu Apr 29 16:53:17 2010 -0700 Merge branch 'x86/asm' into x86/atomic Merge reason: Conflict between LOCK_PREFIX_HERE and relative alternatives pointers Resolved Conflicts: arch/x86/include/asm/alternative.h arch/x86/kernel/alternative.c Signed-off-by: H. Peter Anvin commit b701a47ba48b698976fb2fe05fb285b0edc1d26a Author: H. Peter Anvin Date: Thu Apr 29 16:03:57 2010 -0700 x86: Fix LOCK_PREFIX_HERE for uniprocessor build Checkin b3ac891b67bd4b1fc728d1c784cad1212dea433d: x86: Add support for lock prefix in alternatives ... did not define LOCK_PREFIX_HERE in the case of a uniprocessor build. As a result, it would cause any of the usages of this macro to fail on a uniprocessor build. Fix this by defining LOCK_PREFIX_HERE as a null string. Signed-off-by: H. Peter Anvin Cc: Luca Barbieri LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com> commit 1c6a800cde3b818fd8320b5d402f2d77d2948c00 Author: Arnaldo Carvalho de Melo Date: Thu Apr 29 18:58:32 2010 -0300 perf test: Initial regression testing command First an example with the first internal test: [acme@doppio linux-2.6-tip]$ perf test 1: vmlinux symtab matches kallsyms: Ok So it run just one test, that is "vmlinux symtab matches kallsyms", and it was successful. If we run it in verbose mode, we'll see details about errors and extra warnings for non-fatal problems: [acme@doppio linux-2.6-tip]$ perf test -v 1: vmlinux symtab matches kallsyms: --- start --- Looking at the vmlinux_path (5 entries long) No build_id in vmlinux, ignoring it No build_id in /boot/vmlinux, ignoring it No build_id in /boot/vmlinux-2.6.34-rc4-tip+, ignoring it Using /lib/modules/2.6.34-rc4-tip+/build/vmlinux for symbols Maps only in vmlinux: ffffffff81cb81b1-ffffffff81e1149b 0 [kernel].init.text ffffffff81e1149c-ffffffff9fffffff 0 [kernel].exit.text ffffffffff600000-ffffffffff6000ff 0 [kernel].vsyscall_0 ffffffffff600100-ffffffffff6003ff 0 [kernel].vsyscall_fn ffffffffff600400-ffffffffff6007ff 0 [kernel].vsyscall_1 ffffffffff600800-ffffffffffffffff 0 [kernel].vsyscall_2 Maps in vmlinux with a different name in kallsyms: ffffffffff600000-ffffffffff6000ff 0 [kernel].vsyscall_0 in kallsyms as [kernel].0 ffffffffff600100-ffffffffff6003ff 0 [kernel].vsyscall_fn in kallsyms as: *ffffffffff600100-ffffffffff60012f 0 [kernel].2 ffffffffff600400-ffffffffff6007ff 0 [kernel].vsyscall_1 in kallsyms as [kernel].6 ffffffffff600800-ffffffffffffffff 0 [kernel].vsyscall_2 in kallsyms as [kernel].8 Maps only in kallsyms: ffffffffff600130-ffffffffff6003ff 0 [kernel].4 ---- end ---- vmlinux symtab matches kallsyms: Ok [acme@doppio linux-2.6-tip]$ In the above case we only know the name of the non contiguous kernel ranges in the address space when reading the symbol information from the ELF symtab in vmlinux. The /proc/kallsyms file lack this, we only notice they are separate because there are modules after the kernel and after that more kernel functions, so we need to have a module rbtree backed by the module .ko path to get symtabs in the vmlinux case. The tool uses it to match by address to emit appropriate warning, but don't considers this fatal. The .init.text and .exit.text ines, of course, aren't in kallsyms, so I left these cases just as extra info in verbose mode. The end of the sections also aren't in kallsyms, so we the symbols layer does another pass and sets the end addresses as the next map start minus one, which sometimes pads, causing harmless mismatches. But at least the symbols match, tested it by copying /proc/kallsyms to /tmp/kallsyms and doing changes to see if they were detected. This first test also should serve as a first stab at documenting the symbol library by providing a self contained example that exercises it together with comments about what is being done. More tests to check if actions done on a monitored app, like doing mmaps, etc, makes the kernel generate the expected events should be added next. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 5c0541d53ef3897494768decb09eb8f1087953a5 Author: Arnaldo Carvalho de Melo Date: Thu Apr 29 15:25:23 2010 -0300 perf symbols: Add machine helper routines Created when writing the first 'perf test' regression testing routine. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 2e531fa0d0868f5114c2b3a782ab02eb9d6f914d Author: Joern Engel Date: Thu Apr 29 14:56:37 2010 +0200 LogFS: Fix typo in b6349ac8 Signed-off-by: Joern Engel commit 3272c8a57b77a7277a740e211fe12171e4b37e99 Author: Dan Carpenter Date: Wed Apr 21 12:33:54 2010 +0200 logfs: testing the wrong variable There is a typo here. We should test "last" instead of "first". Signed-off-by: Dan Carpenter Signed-off-by: Joern Engel commit 6fc108a08dcddf8f9113cc7102ddaacf7ed37a6b Author: Jan Beulich Date: Wed Apr 21 15:23:44 2010 +0100 x86: Clean up arch/x86/Kconfig* No functional change intended. Signed-off-by: Jan Beulich LKML-Reference: <4BCF2690020000780003B340@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin commit 47f9fe26299ae022ac1e3fa12e7e73def62b7898 Author: Jan Beulich Date: Wed Apr 21 16:19:57 2010 +0100 x86-64: Don't export init_level4_pgt It's not used by any module, and i386 (as well as some other arches) also doesn't export its equivalent (swapper_pg_dir). Signed-off-by: Jan Beulich LKML-Reference: <4BCF33BD020000780003B3E4@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin commit 5967ed87ade85a421ef814296c3c7f182b08c225 Author: Jan Beulich Date: Wed Apr 21 16:08:14 2010 +0100 x86-64: Reduce SMP locks table size Reduce the SMP locks table size by using relative pointers instead of absolute ones, thus cutting the table size by half. Signed-off-by: Jan Beulich LKML-Reference: <4BCF30FE020000780003B3B6@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin commit 2e61878698781d6a9a8bfbaa4ea9c5ddb5a178c3 Author: Jan Beulich Date: Wed Apr 21 16:13:20 2010 +0100 x86-64: Combine SRAT regions when possible ... i.e. when the hole between two regions isn't occupied by memory on another node. This reduces the memory->node table size, thus reducing cache footprint of lookups, which got increased significantly some time ago, and things go back to how they were before that change on the systems I looked at. Signed-off-by: Jan Beulich LKML-Reference: <4BCF3230020000780003B3CA@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin commit 402af0d7c692ddcfa2333e93d3f275ebd0487926 Author: Jan Beulich Date: Wed Apr 21 15:21:51 2010 +0100 x86, asm: Introduce and use percpu_inc() ... generating slightly smaller code. Signed-off-by: Jan Beulich LKML-Reference: <4BCF261F020000780003B33C@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin commit 18acde52b83bd1c8e1d007db519f46d344aa13ed Author: Arnaldo Carvalho de Melo Date: Tue Apr 27 22:26:51 2010 -0300 perf tools: Create $(OUTPUT)arch/$(ARCH)/util/ directory So that "make -C tools/perf O=/tmp/some/path" works again. Problem introduced in: cd932c5 "perf: Move arch specific code into separate arch director" Cc: Ian Munsie Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 37e44bc50d91df1fe7edcf6f02fe168c6d802e64 Author: Steven Rostedt Date: Tue Apr 27 21:04:24 2010 -0400 tracing: Fix sleep time function profiling When sleep_time is off the function profiler ignores the time that a task is scheduled out. When the task is scheduled out a timestamp is taken. When the task is scheduled back in, the timestamp is compared to the current time and the saved calltimes are adjusted accordingly. But when stopping the function profiler, the sched switch hook that does this adjustment was stopped before shutting down the tracer. This allowed some tasks to not get their timestamps set when they scheduled out. When the function profiler started again, this would skew the times of the scheduler functions. This patch moves the stopping of the sched switch to after the function profiler is stopped. It also ignores zero set calltimes, which may happen on start up. Signed-off-by: Steven Rostedt commit ebe6aa5ac456a13213ed563863e70dd441618a97 Author: Jeff Layton Date: Sat Apr 24 07:57:47 2010 -0400 cifs: eliminate "first_time" parm to CIFS_SessSetup We can use the is_first_ses_reconnect() function to determine this. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit cbf6968098f89d3216d074f06544b5032b344da4 Author: Arnaldo Carvalho de Melo Date: Tue Apr 27 21:22:44 2010 -0300 perf machines: Make the machines class adopt the dsos__fprintf methods Now those methods don't operate on a global list of dsos, but on lists of machines, so make this clear by renaming the functions. Cc: Avi Kivity Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Zhang, Yanmin LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d28c62232e50eab202bcd3f19b5c7a25b8b900b6 Author: Arnaldo Carvalho de Melo Date: Tue Apr 27 21:20:43 2010 -0300 perf machine: Adopt some map_groups functions Those functions operated on members now grouped in 'struct machine', so move those methods to this new class. The changes made to 'perf probe' shows that using this abstraction inserting probes on guests almost got supported for free. Cc: Avi Kivity Cc: Frédéric Weisbecker Cc: Masami Hiramatsu Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Zhang, Yanmin LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 48ea8f5470aa6f35244d1b218316705ea88c0259 Author: Arnaldo Carvalho de Melo Date: Tue Apr 27 21:19:05 2010 -0300 perf machine: Pass buffer size to machine__mmap_name Don't blindly assume that the size of the buffer is enough, use snprintf. Cc: Avi Kivity Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Zhang, Yanmin LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 23346f21b277e3aae5e9989e711a11cbe8133a45 Author: Arnaldo Carvalho de Melo Date: Tue Apr 27 21:17:50 2010 -0300 perf tools: Rename "kernel_info" to "machine" struct kernel_info and kerninfo__ are too vague, what they really describe are machines, virtual ones or hosts. There are more changes to introduce helpers to shorten function calls and to make more clear what is really being done, but I left that for subsequent patches. Cc: Avi Kivity Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Zhang, Yanmin LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e330b3bcd83199dd63a819d8d12e40f9edae6c77 Author: Chase Douglas Date: Mon Apr 26 14:02:05 2010 -0400 tracing: Show sample std dev in function profiling When combined with function graph tracing the ftrace function profiler also prints the average run time of functions. While this gives us some good information, it doesn't tell us anything about the variance of the run times of the function. This change prints out the s^2 sample standard deviation alongside the average. This change adds one entry to the profile record structure. This increases the memory footprint of the function profiler by 1/3 on a 32-bit system, and by 1/5 on a 64-bit system when function graphing is enabled, though the memory is only allocated when the profiler is turned on. During the profiling, one extra line of code adds the squared calltime to the new record entry, so this should not adversly affect performance. Note that the square of the sample standard deviation is printed because there is no sqrt implementation for unsigned long long in the kernel. Signed-off-by: Chase Douglas LKML-Reference: <1272304925-2436-1-git-send-email-chase.douglas@canonical.com> [ fixed comment about ns^2 -> us^2 conversion ] Signed-off-by: Steven Rostedt commit 07271aa42d13378e67ebd79ea9ca1c4a5e2ad46f Author: Chase Douglas Date: Fri Apr 23 14:02:39 2010 -0400 tracing: Add documentation for trace commands mod, traceon/traceoff The mod command went in as commit 64e7c440618998fd69eee6ab490b042d12248021 The traceon/traceoff commands went in as commit 23b4ff3aa479c9e3bb23cb6b2d0a97878399784a Signed-off-by: Chase Douglas LKML-Reference: <1272045759-32018-1-git-send-email-chase.douglas@canonical.com> Signed-off-by: Steven Rostedt commit a838b2e634405fb89ddbf4fa9412acb33911911f Author: Steven Rostedt Date: Tue Apr 27 13:26:58 2010 -0400 ring-buffer: Make benchmark handle missed events With the addition of the "missed events" flags that is stored in the commit field of the ring buffer page, the ring_buffer_benchmark was not updated to handle this. If events are missed, then the missed events flag is set in the ring buffer page, the benchmark will count that flag as part of the size of the page and will hit the BUG() when it tries to read beyond the page. The solution is simply to have the ring buffer benchmark mask off the extra bits. Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit 72c9ddfd4c5bf54ef03cfdf57026416cb678eeba Author: David Miller Date: Tue Apr 20 15:47:11 2010 -0700 ring-buffer: Make non-consuming read less expensive with lots of cpus. When performing a non-consuming read, a synchronize_sched() is performed once for every cpu which is actively tracing. This is very expensive, and can make it take several seconds to open up the 'trace' file with lots of cpus. Only one synchronize_sched() call is actually necessary. What is desired is for all cpus to see the disabling state change. So we transform the existing sequence: for_each_cpu() { ring_buffer_read_start(); } where each ring_buffer_start() call performs a synchronize_sched(), into the following: for_each_cpu() { ring_buffer_read_prepare(); } ring_buffer_read_prepare_sync(); for_each_cpu() { ring_buffer_read_start(); } wherein only the single ring_buffer_read_prepare_sync() call needs to do the synchronize_sched(). The first phase, via ring_buffer_read_prepare(), allocates the 'iter' memory and increments ->record_disabled. In the second phase, ring_buffer_read_prepare_sync() makes sure this ->record_disabled state is visible fully to all cpus. And in the final third phase, the ring_buffer_read_start() calls reset the 'iter' objects allocated in the first phase since we now know that none of the cpus are adding trace entries any more. This makes openning the 'trace' file nearly instantaneous on a sparc64 Niagara2 box with 128 cpus tracing. Signed-off-by: David S. Miller LKML-Reference: <20100420.154711.11246950.davem@davemloft.net> Signed-off-by: Steven Rostedt commit 62b915f1060996a8e1f69be50e3b8e9e43b710cb Author: Jiri Olsa Date: Fri Apr 2 19:01:22 2010 +0200 tracing: Add graph output support for irqsoff tracer Add function graph output to irqsoff tracer. The graph output is enabled by setting new 'display-graph' trace option. Signed-off-by: Jiri Olsa LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt commit 462b04e28a7ec1339c892117c3f20a40e55d0e83 Merge: cfadf9d f93830f Author: Ingo Molnar Date: Tue Apr 27 11:16:54 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit d54ff73259a852d4b3886dc586587fdef5e9c8de Author: Steve French Date: Tue Apr 27 04:38:15 2010 +0000 [CIFS] Fix lease break for writes On lease break we were breaking to readonly leases always even if write requested. Also removed experimental ifdef around setlease code Signed-off-by: Steve French commit 9bf67e516f16d31f86aa6f063576a959bbf19990 Author: Jeff Layton Date: Sat Apr 24 07:57:46 2010 -0400 cifs: save the dialect chosen by server Signed-off-by: Jeff Layton Signed-off-by: Steve French commit b8bc1389b74c2b66255651a6fcfae56c78b6e63f Author: Alessio Igor Bogani Date: Sun Apr 25 13:18:48 2010 +0200 ptrace: Cleanup useless header BKL isn't present anymore into this file thus we can safely remove smp_lock.h inclusion. Signed-off-by: Alessio Igor Bogani Cc: Roland McGrath Cc: Oleg Nesterov Cc: Andrew Morton Cc: James Morris Cc: Ingo Molnar Signed-off-by: Frederic Weisbecker commit d7a8d9e907cc294ec7a4a7046d1886375fbcc82e Author: Jiri Olsa Date: Fri Apr 2 19:01:21 2010 +0200 tracing: Have graph flags passed in to ouput functions Let the function graph tracer have custom flags passed to its output functions. Signed-off-by: Jiri Olsa LKML-Reference: <1270227683-14631-3-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt commit 9106b69382912ddc403a307b69bf894a6f3004e4 Author: Jiri Olsa Date: Fri Apr 2 19:01:20 2010 +0200 tracing: Add ftrace events for graph tracer Add ftrace events for graph tracer, so the graph output could be shared with other tracers. Signed-off-by: Jiri Olsa LKML-Reference: <1270227683-14631-2-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt commit ad6cca6d5d0f713e1987e20ed982cfa9eb16b27e Author: Dan Carpenter Date: Mon Apr 26 12:10:06 2010 +0200 cifs: change && to || This is a typo, if pvolume_info were NULL it would oops. This function is used in clean up and error handling. The current code never passes a NULL pvolume_info, but it could pass a NULL *pvolume_info if the kmalloc() failed. Signed-off-by: Dan Carpenter Acked-by: Jeff Layton Signed-off-by: Steve French commit 04912d6a20185473db025061b9b2c81fbdffc48b Author: Jeff Layton Date: Sat Apr 24 07:57:45 2010 -0400 cifs: rename "extended_security" to "global_secflags" ...since that more accurately describes what that variable holds. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit d00c28de55a69d13cf2c6a99efc3e81a5d95af74 Author: Jeff Layton Date: Sat Apr 24 07:57:44 2010 -0400 cifs: move tcon find/create into separate function ...and out of cifs_mount. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit 36988c76f007738cad5fe1c873a5fb0cda7eb2f6 Author: Jeff Layton Date: Sat Apr 24 07:57:43 2010 -0400 cifs: move SMB session creation code into separate function ...it's mostly part of cifs_mount. Break it out into a separate function. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit a5fc4ce01867842f6a9cc317035df3081302bffc Author: Jeff Layton Date: Sat Apr 24 07:57:42 2010 -0400 cifs: track local_nls in volume info Add a local_nls field to the smb_vol struct and keep a pointer to the local_nls in it. Signed-off-by: Jeff Layton Signed-off-by: Steve French commit f93830fbb06b67848c762f2177c06cc3cbb97deb Author: Stefan Hajnoczi Date: Mon Apr 26 15:39:54 2010 -0300 perf tools: Fix libdw-dev package name in error message The headers required for DWARF support are provided by the libdw-dev package in Debian-based distros. This patch corrects the elfutils-dev package name to libdw-dev in the Makefile error message when libdw.h is not found. Signed-off-by: Stefan Hajnoczi Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Tom Zanussi LKML-Reference: <1272292023-9869-1-git-send-email-stefanha@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo commit ef4a356574426877d569f8b6579325537eb7909b Author: Masami Hiramatsu Date: Wed Apr 21 15:56:40 2010 -0400 perf probe: Add --max-probes option Add --max-probes option to change the maximum limit of findable probe points per event, since inlined function can be expanded into thousands of probe points. Default value is 128. Signed-off-by: Masami Hiramatsu Suggested-by: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar LKML-Reference: <20100421195640.24664.62984.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo commit 5d1ee0413c8e2e0aa48510b1edfb3c4d2d43455b Author: Masami Hiramatsu Date: Wed Apr 21 15:56:32 2010 -0400 perf probe: Fix to exit callback soon after finding too many probe points Fix to exit callback soon after finding too many probe points. Don't try to continue searching because it already failed. Signed-off-by: Masami Hiramatsu Reported-by: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar LKML-Reference: <20100421195632.24664.42598.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo commit 15eca306ec95e164d05457f9f27c722f69af6d18 Author: Masami Hiramatsu Date: Wed Apr 21 15:56:24 2010 -0400 perf probe: Fix to use symtab only if no debuginfo Fix perf probe to use symtab only if there is no debuginfo, because debuginfo has more information than symtab. If we can't find a function in debuginfo, we never find it in symtab. Signed-off-by: Masami Hiramatsu Reported-by: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar LKML-Reference: <20100421195624.24664.46214.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo commit 0ab061cd523a7f2dbf1b59aab0542cb0ab2e4633 Author: Masami Hiramatsu Date: Wed Apr 21 15:56:16 2010 -0400 perf tools: Initialize dso->node member in dso__new If dso->node member is not initialized, it causes a segmentation fault when adding to other lists. It should be initilized in dso__new(). Signed-off-by: Masami Hiramatsu Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar LKML-Reference: : <20100421195616.24664.89980.stgit@localhost6.localdomain6> Signed-off-by: Arnaldo Carvalho de Melo commit cfadf9d4ac4be940595ab73d3def24c23c8b875f Author: William Cohen Date: Fri Apr 23 16:36:21 2010 -0400 perf: Some perf-kvm documentation edits asciidoc does not allow the "===" to be longer than the line above it. Also fix a couple types and formatting errors. Signed-off-by: William Cohen Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Ingo Molnar LKML-Reference: <4BD204C5.9000504@redhat.com> Signed-off-by: Frederic Weisbecker commit e1889d75aff0c3786bc53aeb7d9eaca0691c19c5 Author: Frederic Weisbecker Date: Sat Apr 24 01:55:09 2010 +0200 perf: Add a perf trace option to check samples ordering reliability To ensure sample events time reordering is reliable, add a -d option to perf trace to check that automatically. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Tom Zanussi commit 9df9bbba9f7e2e4ffdc51bbbfa524b67691321d2 Author: Frederic Weisbecker Date: Sat Apr 24 01:18:48 2010 +0200 perf: Use generic sample reordering in perf timechart Use the new generic sample events reordering from perf timechart, this drops the ad hoc sample reordering it was using before. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Tom Zanussi Cc: Arjan van de Ven commit e0a808c65c23f88e48a5fff48775b90e7919c64f Author: Frederic Weisbecker Date: Sat Apr 24 00:38:33 2010 +0200 perf: Use generic sample reordering in perf trace Use the new generic sample events reordering from perf trace. Before that, the displayed traces were ordered as they were in the input as recorded by perf record (not time ordered). This makes eventually perf trace displaying the events as beeing time ordered. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Tom Zanussi commit 587570d4cc3cac80da7d569bee9cea3ca104d60e Author: Frederic Weisbecker Date: Sat Apr 24 00:34:53 2010 +0200 perf: Use generic sample reordering in perf kmem Use the new generic sample events reordering from perf kmem, this drops the need of multiplexing the buffers on record time, improving the scalability of perf kmem. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Tom Zanussi Cc: Pekka Enberg Cc: Li Zefan commit a64eae703b390185abe1d15fa932b48f04fa7fbb Author: Frederic Weisbecker Date: Sat Apr 24 00:29:40 2010 +0200 perf: Use generic sample reordering in perf sched Use the new generic sample events reordering from perf sched, this drops the need of multiplexing the buffers on record time, improving the scalability of perf sched. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Tom Zanussi commit c61e52ee705f938596d307625dce00cc4345aaf0 Author: Frederic Weisbecker Date: Sat Apr 24 00:04:12 2010 +0200 perf: Generalize perf lock's sample event reordering to the session layer The sample events recorded by perf record are not time ordered because we have one buffer per cpu for each event (even demultiplexed per task/per cpu for task bound events). But when we read trace events we want them to be ordered by time because many state machines are involved. There are currently two ways perf tools deal with that: - use -M to multiplex every buffers (perf sched, perf kmem) But this creates a lot of contention in SMP machines on record time. - use a post-processing time reordering (perf timechart, perf lock) The reordering used by timechart is simple but doesn't scale well with huge flow of events, in terms of performance and memory use (unusable with perf lock for example). Perf lock has its own samples reordering that flushes its memory use in a regular basis and that uses a sorting based on the previous event queued (a new event to be queued is close to the previous one most of the time). This patch proposes to export perf lock's samples reordering facility to the session layer that reads the events. So if a tool wants to get ordered sample events, it needs to set its struct perf_event_ops::ordered_samples to true and that's it. This prepares tracing based perf tools to get rid of the need to use buffers multiplexing (-M) or to implement their own reordering. Also lower the flush period to 2 as it's sufficient already. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Hitoshi Mitake Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Tom Zanussi commit 5710fcad7c367adefe5634dc998f1f88780a8457 Author: Stephane Eranian Date: Wed Apr 21 18:06:01 2010 +0200 perf: Fix initialization bug in parse_single_tracepoint_event() The parse_single_tracepoint_event() was setting some attributes before it validated the event was indeed a tracepoint event. This caused problems with other initialization routines like in the builtin-top.c module whereby sample_period is not set if not 0. Signed-off-by: Stephane Eranian Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Ingo Molnar LKML-Reference: <4bcf232b.698fd80a.6fbe.ffffb737@mx.google.com> Signed-off-by: Frederic Weisbecker commit e4cef1f65061429c3e8b356233e87dc6653a9da5 Author: Hitoshi Mitake Date: Wed Apr 21 21:23:54 2010 +0900 perf lock: Fix state machine to recognize lock sequence Previous state machine of perf lock was really broken. This patch improves it a little. This patch prepares the list of state machine that represents lock sequences for each threads. These state machines can be one of these sequences: 1) acquire -> acquired -> release 2) acquire -> contended -> acquired -> release 3) acquire (w/ try) -> release 4) acquire (w/ read) -> release The case of 4) is a little special. Double acquire of read lock is allowed, so the state machine counts read lock number, and permits double acquire and release. But, things are not so simple. Something in my model is still wrong. I counted the number of lock instances with bad sequence, and ratio is like this (case of tracing whoami): bad:233, total:2279 version 2: * threads are now identified with tid, not pid * prepared SEQ_STATE_READ_ACQUIRED for read lock. * bunch of struct lock_seq_stat is now linked list * debug information enhanced (this have to be removed someday) e.g. | === output for debug=== | | bad:233, total:2279 | bad rate:0.000000 | histogram of events caused bad sequence | acquire: 165 | acquired: 0 | contended: 0 | release: 68 Signed-off-by: Hitoshi Mitake Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Jens Axboe Cc: Jason Baron Cc: Xiao Guangrong Cc: Ingo Molnar LKML-Reference: <1271852634-9351-1-git-send-email-mitake@dcl.info.waseda.ac.jp> [rename SEQ_STATE_UNINITED to SEQ_STATE_UNINITIALIZED] Signed-off-by: Frederic Weisbecker commit 1f9cc3cb6a27521edfe0a21abf97d2bb11c4d237 Author: Robin Holt Date: Fri Apr 23 10:36:22 2010 -0500 x86, pat: Update the page flags for memtype atomically instead of using memtype_lock While testing an application using the xpmem (out of kernel) driver, we noticed a significant page fault rate reduction of x86_64 with respect to ia64. For one test running with 32 cpus, one thread per cpu, it took 01:08 for each of the threads to vm_insert_pfn 2GB worth of pages. For the same test running on 256 cpus, one thread per cpu, it took 14:48 to vm_insert_pfn 2 GB worth of pages. The slowdown was tracked to lookup_memtype which acquires the spinlock memtype_lock. This heavily contended lock was slowing down vm_insert_pfn(). With the cmpxchg on page->flags method, both the 32 cpu and 256 cpu cases take approx 00:01.3 seconds to complete. Signed-off-by: Robin Holt LKML-Reference: <20100423153627.751194346@gulag1.americas.sgi.com> Cc: Venkatesh Pallipadi Cc: Rafael Wysocki Reviewed-by: Suresh Siddha Signed-off-by: H. Peter Anvin commit b971f06187d83b5c03d2b597cccdfef421c0ca91 Merge: cb6e943 c1ab9ca Author: Robert Richter Date: Fri Apr 23 16:47:51 2010 +0200 Merge commit 'tip/tracing/core' into oprofile/core Conflicts: drivers/oprofile/cpu_buffer.c Signed-off-by: Robert Richter commit cb6e943ccf19ab6d3189147e9d625a992e016084 Author: Andi Kleen Date: Thu Apr 1 03:17:25 2010 +0200 oprofile: remove double ring buffering oprofile used a double buffer scheme for its cpu event buffer to avoid races on reading with the old locked ring buffer. But that is obsolete now with the new ring buffer, so simply use a single buffer. This greatly simplifies the code and avoids a lot of sample drops on large runs, especially with call graph. Based on suggestions from Steven Rostedt For stable kernels from v2.6.32, but not earlier. Signed-off-by: Andi Kleen Cc: Steven Rostedt Cc: stable Signed-off-by: Robert Richter commit a36bf32e9e8a86f291f746b7f8292e042ee04a46 Merge: bc078e4 01bf0b6 Author: Robert Richter Date: Fri Apr 23 14:30:22 2010 +0200 Merge commit 'v2.6.34-rc5' into oprofile/core commit 77a7f2e94e6998e307917fe63fa4b6d5162d44e9 Merge: ac0053f cecbca9 Author: Ingo Molnar Date: Fri Apr 23 11:25:31 2010 +0200 Merge branch 'tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/core commit 4093b150e52e6da20e9496df8aa007423952ae42 Merge: 70bce3b fead796 Author: Ingo Molnar Date: Fri Apr 23 11:11:38 2010 +0200 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perf into perf/core commit 70bce3ba77540ebe77b8c0e1ac38d281a23fbb5e Merge: 6eca8cc d5a3045 Author: Ingo Molnar Date: Fri Apr 23 11:10:28 2010 +0200 Merge branch 'linus' into perf/core Merge reason: merge the latest fixes, update to latest -rc. Signed-off-by: Ingo Molnar commit 99bd5e2f245d8cd17d040c82d40becdb3efd9b69 Author: Suresh Siddha Date: Wed Mar 31 16:47:45 2010 -0700 sched: Fix select_idle_sibling() logic in select_task_rq_fair() Issues in the current select_idle_sibling() logic in select_task_rq_fair() in the context of a task wake-up: a) Once we select the idle sibling, we use that domain (spanning the cpu that the task is currently woken-up and the idle sibling that we found) in our wake_affine() decisions. This domain is completely different from the domain(we are supposed to use) that spans the cpu that the task currently woken-up and the cpu where the task previously ran. b) We do select_idle_sibling() check only for the cpu that the task is currently woken-up on. If select_task_rq_fair() selects the previously run cpu for waking the task, doing a select_idle_sibling() check for that cpu also helps and we don't do this currently. c) In the scenarios where the cpu that the task is woken-up is busy but with its HT siblings are idle, we are selecting the task be woken-up on the idle HT sibling instead of a core that it previously ran and currently completely idle. i.e., we are not taking decisions based on wake_affine() but directly selecting an idle sibling that can cause an imbalance at the SMT/MC level which will be later corrected by the periodic load balancer. Fix this by first going through the load imbalance calculations using wake_affine() and once we make a decision of woken-up cpu vs previously-ran cpu, then choose a possible idle sibling for waking up the task on. Signed-off-by: Suresh Siddha Signed-off-by: Peter Zijlstra LKML-Reference: <1270079265.7835.8.camel@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar commit 669c55e9f99b90e46eaa0f98a67ec53d46dc969a Author: Peter Zijlstra Date: Fri Apr 16 14:59:29 2010 +0200 sched: Pre-compute cpumask_weight(sched_domain_span(sd)) Dave reported that his large SPARC machines spend lots of time in hweight64(), try and optimize some of those needless cpumask_weight() invocations (esp. with the large offstack cpumasks these are very expensive indeed). Reported-by: David Miller Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 74f5187ac873042f502227701ed1727e7c5fbfa9 Author: Peter Zijlstra Date: Thu Apr 22 21:50:19 2010 +0200 sched: Cure load average vs NO_HZ woes Chase reported that due to us decrementing calc_load_task prematurely (before the next LOAD_FREQ sample), the load average could be scewed by as much as the number of CPUs in the machine. This patch, based on Chase's patch, cures the problem by keeping the delta of the CPU going into NO_HZ idle separately and folding that in on the next LOAD_FREQ update. This restores the balance and we get strict LOAD_FREQ period samples. Signed-off-by: Peter Zijlstra Acked-by: Chase Douglas LKML-Reference: <1271934490.1776.343.camel@laptop> Signed-off-by: Ingo Molnar commit 59d3b388741cf1a5eb7ad27fd4e9ed72643164ae Author: Borislav Petkov Date: Thu Apr 22 16:07:02 2010 +0200 x86, cacheinfo: Disable index in all four subcaches When disabling an L3 cache index, make sure we disable that index in all four subcaches of the L3. Clarify nomenclature while at it, wrt to disable slots versus disable index and rename accordingly. Signed-off-by: Borislav Petkov LKML-Reference: <1271945222-5283-6-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin commit ba06edb63f5ef2913aad37070eaec3c9d8ac73b8 Author: Borislav Petkov Date: Thu Apr 22 16:07:01 2010 +0200 x86, cacheinfo: Make L3 cache info per node Currently, we're allocating L3 cache info and calculating indices for each online cpu which is clearly superfluous. Instead, we need to do this per-node as is each L3 cache. No functional change, only per-cpu memory savings. -v2: Allocate L3 cache descriptors array dynamically. Signed-off-by: Borislav Petkov LKML-Reference: <1271945222-5283-5-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin commit 9350f982e4fe539e83a2d4a13e9b53ad8253c4a8 Author: Borislav Petkov Date: Thu Apr 22 16:07:00 2010 +0200 x86, cacheinfo: Reorganize AMD L3 cache structure Add a struct representing L3 cache attributes (subcache sizes and indices count) and move the respective members out of _cpuid4_info. Also, stash the struct pci_dev ptr into the struct simplifying the code even more. There should be no functionality change resulting from this patch except slightly slimming the _cpuid4_info per-cpu vars. Signed-off-by: Borislav Petkov LKML-Reference: <1271945222-5283-4-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin commit f2b20e41407fccfcfacf927ff91ec888832a37af Author: Frank Arnold Date: Thu Apr 22 16:06:59 2010 +0200 x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments When running a quest kernel on xen we get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [] cpuid4_cache_lookup_regs+0x2ca/0x3df PGD 0 Oops: 0000 [#1] SMP last sysfs file: CPU 0 Modules linked in: Pid: 0, comm: swapper Tainted: G W 2.6.34-rc3 #1 /HVM domU RIP: 0010:[] [] cpuid4_cache_lookup_regs+0x 2ca/0x3df RSP: 0018:ffff880002203e08 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060 RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000 RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38 R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0 R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68 FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020) Stack: ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30 <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70 <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b Call Trace: [] ? sync_supers_timer_fn+0x0/0x1c [] ? mod_timer+0x23/0x25 [] ? arm_supers_timer+0x34/0x36 [] ? hrtimer_get_next_event+0xa7/0xc3 [] ? get_next_timer_interrupt+0x19a/0x20d [] get_cpu_leaves+0x5c/0x232 [] ? sched_clock_local+0x1c/0x82 [] ? sched_clock_tick+0x75/0x7a [] generic_smp_call_function_single_interrupt+0xae/0xd0 [] smp_call_function_single_interrupt+0x18/0x27 [] call_function_single_interrupt+0x13/0x20 [] ? notifier_call_chain+0x14/0x63 [] ? native_safe_halt+0xc/0xd [] ? default_idle+0x36/0x53 [] cpu_idle+0xaa/0xe4 [] rest_init+0x7e/0x80 [] start_kernel+0x40e/0x419 [] x86_64_start_reservations+0xb3/0xb7 [] x86_64_start_kernel+0xf8/0x107 Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb RIP [] cpuid4_cache_lookup_regs+0x2ca/0x3df RSP CR2: 0000000000000038 ---[ end trace a7919e7f17c0a726 ]--- The L3 cache index disable feature of AMD CPUs has to be disabled if the kernel is running as guest on top of a hypervisor because northbridge devices are not available to the guest. Currently, this fixes a boot crash on top of Xen. In the future this will become an issue on KVM as well. Check if northbridge devices are present and do not enable the feature if there are none. Signed-off-by: Frank Arnold LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org> Acked-by: Borislav Petkov Signed-off-by: H. Peter Anvin commit b1ab1b4d9ab9812c77843abec79030292ef0a544 Author: Borislav Petkov Date: Thu Apr 22 16:06:58 2010 +0200 x86, cacheinfo: Unify AMD L3 cache index disable checking All F10h CPUs starting with model 8 resp. 9, stepping 1, support L3 cache index disable. Concentrate the family, model, stepping checking at one place and enable the feature implicitly on upcoming Fam10h models. Signed-off-by: Borislav Petkov LKML-Reference: <1271945222-5283-2-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin commit fa588e0c57048b3d4bfcd772d80dc0615f83fd35 Author: Steve French Date: Thu Apr 22 19:21:55 2010 +0000 [CIFS] Allow null nd (as nfs server uses) on create While creating a file on a server which supports unix extensions such as Samba, if a file is being created which does not supply nameidata (i.e. nd is null), cifs client can oops when calling cifs_posix_open. Signed-off-by: Shirish Pargaonkar Signed-off-by: Steve French commit fead7960f0b645348fa4757100f4d57654bf9ffa Author: Ian Munsie Date: Tue Apr 20 16:58:33 2010 +1000 perf probe: Add PowerPC DWARF register number mappings This adds mappings from the register numbers from DWARF to the register names used in the PowerPC Regs and Stack Access API. This allows perf probe to be used to record variable contents on PowerPC. This requires the functionality represented by the config symbol HAVE_REGS_AND_STACK_ACCESS_API in order to function, although it will compile without it. That functionality is added for PowerPC in commit 359e4284 ("powerpc: Add kprobe-based event tracer"). Signed-off-by: Ian Munsie Acked-by: Masami Hiramatsu Signed-off-by: Paul Mackerras commit cd932c593995abee1d1a8a0bfe608f7d103d87ad Author: Ian Munsie Date: Tue Apr 20 16:58:32 2010 +1000 perf: Move arch specific code into separate arch directory The perf userspace tool included some architecture specific code to map registers from the DWARF register number into the names used by the regs and stack access API. This moves the architecture specific code out into a separate arch/x86 directory along with the infrastructure required to use it. Signed-off-by: Ian Munsie Acked-by: Masami Hiramatsu Signed-off-by: Paul Mackerras commit cecbca96da387428e220e307a9c945e37e2f4d9e Author: Frederic Weisbecker Date: Sun Apr 18 19:08:41 2010 +0200 tracing: Dump either the oops's cpu source or all cpus buffers The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one dump every cpu buffers when an oops or panic happens. It's nice when you have few cpus but it may take ages if have many, plus you miss the real origin of the problem in all the cpu traces. Sometimes, all you need is to dump the cpu buffer that triggered the opps, most of the time it is our main interest. This patch modifies ftrace_dump_on_oops to handle this choice. The ftrace_dump_on_oops kernel parameter, when it comes alone, has the same behaviour than before. But ftrace_dump_on_oops=orig_cpu will only dump the buffer of the cpu that oops'ed. Similarly, sysctl kernel.ftrace_dump_on_oops=1 and echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous behaviour. But setting 2 jumps into cpu origin dump mode. v2: Fix double setup v3: Fix spelling issues reported by Randy Dunlap v4: Also update __ftrace_dump in the selftests Signed-off-by: Frederic Weisbecker Acked-by: David S. Miller Acked-by: Steven Rostedt Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Li Zefan Cc: Lai Jiangshan commit 2c964d1f7c87eb71f7902111cd7c8fbba225e4b6 Author: Pavel Shilovsky Date: Wed Apr 21 19:44:24 2010 +0000 [CIFS] Fix losing locks during fork() When process does fork() private_data of files with lock list stays the same for file descriptors of the parent and of the child. While finishing the child closes files and deletes locks from the list even if unlocking fails. When the child process finishes the parent doesn't have lock in lock list and can't unlock previously before fork() locked region after the child process finished. This patch provides behaviour to save locks in lock list if unlocking fails. Signed-off-by: Pavel Shilovsky Reviewed-by: Jeff Layton Signed-off-by: Steve French commit ac0053fd51d2bac09a7d4b4a59f6dac863bd4373 Merge: b15c7b1 01bf0b6 Author: Ingo Molnar Date: Wed Apr 21 09:47:00 2010 +0200 Merge commit 'v2.6.34-rc5' into tracing/core Merge reason: pick up latest -rc's. Signed-off-by: Ingo Molnar commit 6eca8cc35b50af1037bc919106dd6dd332c959c2 Author: Frederic Weisbecker Date: Wed Apr 21 02:01:05 2010 +0200 perf: Fix perf probe build error When we run into dry run mode, we want to make write_kprobe_trace_event to succeed on writing the event. Let's initialize it to 0. Fixes the following build error: util/probe-event.c:1266: attention : «ret» may be used uninitialized in this function util/probe-event.c:1266: note: «ret» was declared here Signed-off-by: Frederic Weisbecker Acked-by: Masami Hiramatsu Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo LKML-Reference: <1271808065-25290-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit f19159dc5ab9ec28c3b8230689101335d98e2d68 Author: Steve French Date: Wed Apr 21 04:12:10 2010 +0000 [CIFS] Cleanup various minor breakage in previous cFYI cleanup Signed-off-by: Steve French commit b6b38f704a8193daba520493ebdaf7e819962fc8 Author: Joe Perches Date: Wed Apr 21 03:50:45 2010 +0000 [CIFS] Neaten cERROR and cFYI macros, reduce text space Neaten cERROR and cFYI macros, reduce text space ~2.5K Convert '__FILE__ ": " fmt' to '"%s: " fmt', __FILE__' to save text space Surround macros with do {} while Add parentheses to macros Make statement expression macro from macro with assign Remove now unnecessary parentheses from cFYI and cERROR uses defconfig with CIFS support old $ size fs/cifs/built-in.o text data bss dec hex filename 156012 1760 148 157920 268e0 fs/cifs/built-in.o defconfig with CIFS support old $ size fs/cifs/built-in.o text data bss dec hex filename 153508 1760 148 155416 25f18 fs/cifs/built-in.o allyesconfig old: $ size fs/cifs/built-in.o text data bss dec hex filename 309138 3864 74824 387826 5eaf2 fs/cifs/built-in.o allyesconfig new $ size fs/cifs/built-in.o text data bss dec hex filename 305655 3864 74824 384343 5dd57 fs/cifs/built-in.o Signed-off-by: Joe Perches Signed-off-by: Steve French commit 315e995c63a15cb4d4efdbfd70fe2db191917f7a Author: Nick Piggin Date: Wed Apr 21 03:18:28 2010 +0000 [CIFS] use add_to_page_cache_lru add_to_page_cache_lru is exported, so it should be used. Benefits over using a private pagevec: neater code, 128 bytes fewer stack used, percpu lru ordering is preserved, and finally don't need to flush pagevec before returning so batching may be shared with other LRU insertions. Signed-off-by: Nick Piggin Reviewed-by: Dave Kleikamp Signed-off-by: Steve French commit dcf46b9443ad48a227a61713adea001228925adf Author: Zhang, Yanmin Date: Tue Apr 20 10:13:58 2010 +0800 perf & kvm: Clean up some of the guest profiling callback API details Fix some build bug and programming style issues: - use valid C - fix up various style details Signed-off-by: Zhang Yanmin Cc: Avi Kivity Cc: Peter Zijlstra Cc: Sheng Yang Cc: Marcelo Tosatti Cc: oerg Roedel Cc: Jes Sorensen Cc: Gleb Natapov Cc: Zachary Amsden Cc: zhiteng.huang@intel.com Cc: tim.c.chen@intel.com Cc: Arnaldo Carvalho de Melo LKML-Reference: <1271729638.2078.624.camel@ymzhang.sh.intel.com> Signed-off-by: Ingo Molnar commit a1645ce12adb6c9cc9e19d7695466204e3f017fe Author: Zhang, Yanmin Date: Mon Apr 19 13:32:50 2010 +0800 perf: 'perf kvm' tool for monitoring guest performance from host Here is the patch of userspace perf tool. Signed-off-by: Zhang Yanmin Signed-off-by: Avi Kivity commit ff9d07a0e7ce756a183e7c2e483aec452ee6b574 Author: Zhang, Yanmin Date: Mon Apr 19 13:32:45 2010 +0800 KVM: Implement perf callbacks for guest sampling Below patch implements the perf_guest_info_callbacks on kvm. Signed-off-by: Zhang Yanmin Signed-off-by: Avi Kivity commit 39447b386c846bbf1c56f6403c5282837486200f Author: Zhang, Yanmin Date: Mon Apr 19 13:32:41 2010 +0800 perf: Enhance perf to allow for guest statistic collection from host Below patch introduces perf_guest_info_callbacks and related register/unregister functions. Add more PERF_RECORD_MISC_XXX bits meaning guest kernel and guest user space. Signed-off-by: Zhang Yanmin Signed-off-by: Avi Kivity commit a289cc7c70da784a2d370b91885cab4f966dcb0f Author: Randy Dunlap Date: Fri Apr 16 17:51:42 2010 -0700 x86, UV: uv_irq.c: Fix all sparse warnings Fix all sparse warnings in building uv_irq.c. arch/x86/kernel/uv_irq.c:46:17: warning: symbol 'uv_irq_chip' was not declared. Should it be static? arch/x86/kernel/uv_irq.c:143:50: error: no identifier for function argument arch/x86/kernel/uv_irq.c:162:13: error: typename in expression arch/x86/kernel/uv_irq.c:162:13: error: undefined identifier 'restrict' arch/x86/kernel/uv_irq.c:250:44: error: no identifier for function argument arch/x86/kernel/uv_irq.c:260:17: error: typename in expression arch/x86/kernel/uv_irq.c:260:17: error: undefined identifier 'restrict' arch/x86/kernel/uv_irq.c:233:50: warning: incorrect type in argument 3 (different signedness) arch/x86/kernel/uv_irq.c:233:50: expected int *pnode arch/x86/kernel/uv_irq.c:233:50: got unsigned int * arch/x86/include/asm/uv/uv_hub.h:318:44: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uv/uv_hub.h:318:44: expected void volatile [noderef] *addr arch/x86/include/asm/uv/uv_hub.h:318:44: got unsigned long * Signed-off-by: Randy Dunlap Cc: Dimitri Sivanich Cc: Russ Anderson Cc: Robin Holt Cc: Mike Travis Cc: Cliff Wickman Cc: Jack Steiner LKML-Reference: <20100416175142.f4b59683.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar commit 09a40af5240de02d848247ab82440ad75b31ab11 Author: Mike Galbraith Date: Thu Apr 15 07:29:59 2010 +0200 sched: Fix UP update_avg() build warning update_avg() is only used for SMP builds, move it to the nearest SMP block. Reported-by: Stephen Rothwell Signed-off-by: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <1271309399.14779.17.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit b257c14ceb1194a6181144210056d38f22127189 Merge: 371fd7e 2ba3abd Author: Ingo Molnar Date: Thu Apr 15 09:35:24 2010 +0200 Merge branch 'linus' into sched/core Merge reason: merge the latest fixes, update to -rc4. Signed-off-by: Ingo Molnar commit b5a80b7e91d6c067339e4d81a0176a835e9bf910 Merge: 84b13fd f6c903f Author: Ingo Molnar Date: Thu Apr 15 09:16:51 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 84b13fd596522db47f9545d5124c30cc00dfdf5a Merge: f921281 a0cccc2 Author: Ingo Molnar Date: Thu Apr 15 09:13:26 2010 +0200 Merge branch 'perf/live' into perf/core Conflicts: tools/perf/builtin-record.c Merge reason: add the live tracing feature, resolve conflict. Signed-off-by: Ingo Molnar commit f92128193094c288bc315db1694fafeaeb7ee1d0 Author: Frederic Weisbecker Date: Wed Apr 14 22:09:02 2010 +0200 perf: Make the trace events sample period default to 1 Trace events are mostly used for tracing and then require not to be lost when possible. As opposite to hardware events that really require to trigger after a given sample period, trace events mostly need to trigger everytime. It is a frustrating experience to trace with perf and realize we lost a lot of events because we forgot the "-c 1" option. Then default sample_period to 1 for trace events but let the user override it. Signed-off-by: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Ingo Molnar Cc: Thomas Gleixner commit bdef3b02ceeb97f5f67fcfa6dff13c4e70b34fb7 Author: Frederic Weisbecker Date: Wed Apr 14 20:05:17 2010 +0200 perf: Always record tracepoints raw samples from perf record Trace events are mostly used for tracing rather than simple counting. Don't bother anymore with adding -R when using them, just record raw samples of trace events every time. Signed-off-by: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Ingo Molnar Cc: Thomas Gleixner commit 7865e817e9b4b378ac57ab7f16183100b95466ce Author: Frederic Weisbecker Date: Wed Apr 14 19:42:07 2010 +0200 perf: Make -f the default for perf record Force the overwriting mode by default if append mode is not explicit. Adding -f every time one uses perf on a daily basis quickly becomes a burden. Keep the -f among the options though to avoid breaking some random users scripts. Signed-off-by: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Ingo Molnar Cc: Thomas Gleixner commit a1e2f60e3efc812bf66a2be0d8530ee175003f6d Author: Thomas Gleixner Date: Wed Apr 14 23:58:03 2010 +0200 perf: Fix dynamic field detection Checking if a tracing field is an array with a dynamic length requires to check the field type and seek the "__data_loc" string that prepends the actual type, as can be found in a trace event format file: field:__data_loc char[] name; offset:16; size:4; signed:1; But we actually use strcmp() to check if the field type fully matches "__data_loc", which may fail as we trip over the rest of the type. To fix this, use strncmp to only check if it starts with "__data_loc". Signed-off-by: Thomas Gleixner Signed-off-by: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Tom Zanussi Cc: Steven Rostedt LKML-Reference: <1271282283-23721-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit 95476b64ab11d528de2557366ec584977c215b9e Author: Frederic Weisbecker Date: Wed Apr 14 23:42:18 2010 +0200 perf: Fix hlist related build error hlist helpers need to be available for all software events, not only trace events. Pull them out outside the ifdef CONFIG_EVENT_TRACING section. Fixes: kernel/perf_event.c:4573: error: implicit declaration of function 'swevent_hlist_put' kernel/perf_event.c:4614: error: implicit declaration of function 'swevent_hlist_get' kernel/perf_event.c:5534: error: implicit declaration of function 'swevent_hlist_release Reported-by: Ingo Molnar Signed-off-by: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1271281338-23491-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit f6c903f5856ffa75ae19dcee4dbb5093e320d45c Author: Masami Hiramatsu Date: Wed Apr 14 18:40:07 2010 -0400 perf probe: Show function entry line as probe-able Function entry line should be shown as probe-able line, because each function has declared line attribute. LKML-Reference: <20100414224007.14630.96915.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo commit de1439d8a521d22c3219fc007a570fcf944ac789 Author: Masami Hiramatsu Date: Wed Apr 14 17:44:00 2010 -0300 perf probe: Support DW_OP_plus_uconst in DW_AT_data_member_location DW_OP_plus_uconst can be used for DW_AT_data_member_location. This patch adds DW_OP_plus_uconst support when getting structure member offset. Commiter note: Fixed up the size_t format specifier in one case: cc1: warnings being treated as errors util/probe-finder.c: In function ‘die_get_data_member_location’: util/probe-finder.c:270: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘size_t’ make: *** [/home/acme/git/build/perf/util/probe-finder.o] Error 1 LKML-Reference: <20100414223958.14630.5230.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo commit dda4ab34fe1905d3d590572b776dd92aa0866558 Author: Masami Hiramatsu Date: Wed Apr 14 18:39:50 2010 -0400 perf probe: Fix line range to show end line Line range should reject the range if the number of lines is 0 (e.g. "sched.c:1024+0"), and it should show the lines include the end of line number (e.g. "sched.c:1024-2048" should show 2048th line). LKML-Reference: <20100414223950.14630.42263.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo commit d3b63d7ae04879a817bac5c0bf09749f73629d32 Author: Masami Hiramatsu Date: Wed Apr 14 18:39:42 2010 -0400 perf probe: Fix a bug that --line range can be overflow Since line_finder.lno_s/e are signed int but line_range.start/end are unsigned int, it is possible to be overflow when converting line_range->start/end to line_finder->lno_s/e. This changes line_range.start/end and line_list.line to signed int and adds overflow checks when setting line_finder.lno_s/e. LKML-Reference: <20100414223942.14630.72730.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo commit dd259c5db26ccda46409dbf6efc79d5a2b259e38 Author: Masami Hiramatsu Date: Wed Apr 14 18:39:35 2010 -0400 perf probe: Fix mis-estimation for shortening filename Fix mis-estimation size for making a short filename. Since the buffer size is 32 bytes and there are '@' prefix and '\0' termination, maximum shorten filename length should be 30. This means, before searching '/', it should be 31 bytes. LKML-Reference: <20100414223935.14630.11954.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo commit 7ca5989dd065cbc48a958666c273794686ea7525 Author: Masami Hiramatsu Date: Wed Apr 14 18:39:28 2010 -0400 perf probe: Fix to use correct debugfs path finder Instead of using debugfs_path, use debugfs_find_mountpoint() to find actual debugfs path. LKML-Reference: <20100414223928.14630.38326.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Reported-by: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Signed-off-by: Arnaldo Carvalho de Melo commit 02b95dadc8a1d2c302513e5fa24c492380d26e93 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:56 2010 -0400 perf probe: Remove xstrdup()/xstrndup() from util/probe-{event, finder}.c Remove all xstr*dup() calls from util/probe-{event,finder}.c since it may cause 'sudden death' in utility functions and it makes reusing it from other code difficult. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171756.3790.89607.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit e334016f1d7250a6523b3a44ccecfe23af6e2f57 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:49 2010 -0400 perf probe: Remove xzalloc() from util/probe-{event, finder}.c Remove all xzalloc() calls from util/probe-{event,finder}.c since it may cause 'sudden death' in utility functions and it makes reusing it from other code difficult. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171749.3790.33303.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 146a143948ed9e8b248c0ec59937f3e9e1bbc7e5 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:42 2010 -0400 perf probe: Remove die() from probe-event code Remove die() and DIE_IF() code from util/probe-event.c since these 'sudden death' in utility functions make reusing it from other code (especially tui/gui) difficult. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171742.3790.33650.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit b55a87ade3839c33ab94768a0b5955734073f987 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:35 2010 -0400 perf probe: Remove die() from probe-finder code Remove die() and DIE_IF() code from util/probe-finder.c since these 'sudden death' in utility functions make reusing it from other code (especially tui/gui) difficult. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171735.3790.88853.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit a34a985499895a46a4dacff727d0fbc63cdc75e8 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:29 2010 -0400 perf probe: Support DW_OP_call_frame_cfa in debuginfo When building kernel without CONFIG_FRAME_POINTER, gcc uses CFA (canonical frame address) for frame base. With this patch, perf probe just gets CFI (call frame information) from debuginfo and search corresponding CFA from the CFI. IOW, this allows perf probe works correctly on the kernel without CONFIG_FRAME_POINTER. ./perf probe -fn sched_slice:12 lw.weight Fatal: DW_OP 156 is not supported. (^^^ DW_OP_call_frame_cfa) ./perf probe -fn sched_slice:12 lw.weight Add new event: probe:sched_slice (on sched_slice:12 with weight=lw.weight) Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171728.3790.98217.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 11a1ca3554b377d2a8a318a3cbf8ce10a7a2a8e4 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:22 2010 -0400 perf probe: Support basic type casting Add basic type casting for arguments to perf probe. This allows users to specify the actual type of arguments. Of course, if user sets invalid types, kprobe-tracer rejects that. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171722.3790.50372.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 4984912eb23113a4007940cd09c8351c0623ea5f Author: Masami Hiramatsu Date: Mon Apr 12 13:17:15 2010 -0400 perf probe: Query basic types from debuginfo Query the basic type information (byte-size and signed-flag) from debuginfo and pass that to kprobe-tracer. This is especially useful for tracing the members of data structure, because each member has different byte-size on the memory. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171715.3790.23730.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 93ccae7a2227466a0d071fe52c51319f2f34c365 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:08 2010 -0400 tracing/kprobes: Support basic types on dynamic events Support basic types of integer (u8, u16, u32, u64, s8, s16, s32, s64) in kprobe tracer. With this patch, users can specify above basic types on each arguments after ':'. If omitted, the argument type is set as unsigned long (u32 or u64, arch-dependent). e.g. echo 'p account_system_time+0 hardirq_offset=%si:s32' > kprobe_events adds a probe recording hardirq_offset in signed-32bits value on the entry of account_system_time. Cc: Ingo Molnar Cc: Steven Rostedt Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171708.3790.18599.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit df0faf4be02996135bc3a06b4f34360449c78084 Author: Masami Hiramatsu Date: Mon Apr 12 13:17:00 2010 -0400 perf probe: Use the last field name as the argument name Set the last field name to the argument name when the argument is refering a data-structure member. e.g. ./perf probe --add 'vfs_read file->f_mode' Add new event: probe:vfs_read (on vfs_read with f_mode=file->f_mode) This probe records file->f_mode, but the argument name becomes "f_mode". This enables perf-trace command to parse trace event format correctly. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171700.3790.72961.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 48481938b02471d505296d7557ed296eb093d496 Author: Masami Hiramatsu Date: Mon Apr 12 13:16:53 2010 -0400 perf probe: Support argument name Set given names to event arguments. The syntax is same as kprobe-tracer, you can add 'NAME=' right before each argument. e.g. ./perf probe vfs_read foo=file then, 'foo' is set to the argument name as below. ./perf probe -l probe:vfs_read (on vfs_read@linux-2.6-tip/fs/read_write.c with foo) Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker LKML-Reference: <20100412171653.3790.74624.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit fcd1498405c2c88ac632e7c3c3fce3213d9196db Author: Frederic Weisbecker Date: Wed Apr 14 19:11:29 2010 +0200 perf tools: Fix accidentally preprocessed snprintf callback struct sort_entry has a callback named snprintf that turns an entry into a string result. But there are glibc versions that implement snprintf through a macro. The following expression is then going to get the snprintf call preprocessed: ent->snprintf(...) to finally end up in a build error: util/hist.c: Dans la fonction «hist_entry__snprintf» : util/hist.c:539: erreur: «struct sort_entry» has no member named «__builtin___snprintf_chk» To fix this, prepend struct sort_entry callbacks with an "se_" prefix. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo commit b8f7fb13d2d7ff14818fd1d3edd8b834d38b0217 Author: Cliff Wickman Date: Wed Apr 14 11:35:46 2010 -0500 x86, UV: Improve BAU performance and error recovery - increase performance of the interrupt handler - release timed-out software acknowledge resources - recover from continuous-busy status due to a hardware issue - add a 'throttle' to keep a uvhub from sending more than a specified number of broadcasts concurrently (work around the hardware issue) - provide a 'nobau' boot command line option - rename 'pnode' and 'node' to 'uvhub' (the 'node' terminology is ambiguous) - add some new statistics about the scope of broadcasts, retries, the hardware issue and the 'throttle' - split off new function uv_bau_retry_msg() from uv_bau_process_message() per community coding style feedback. - simplify the argument list to uv_bau_process_message(), per community coding style feedback. Signed-off-by: Cliff Wickman Cc: linux-mm@kvack.org Cc: Jack Steiner Cc: Russ Anderson Cc: Mike Travis Cc: "H. Peter Anvin" Cc: Thomas Gleixner LKML-Reference: Signed-off-by: Ingo Molnar commit df8290bf7ea6b3051e2f315579a6e829309ec1ed Author: Frederic Weisbecker Date: Fri Apr 9 00:28:14 2010 +0200 perf: Make clock software events consistent with general exclusion rules The cpu/task clock events implement their own version of exclusion on top of exclude_user and exclude_kernel. The result is that when the event triggered in the kernel but we have exclude_kernel set, we try to rewind using task_pt_regs. There are two side effects of this: - we call task_pt_regs even on kernel threads, which doesn't give us the desired result. - if the event occured in the kernel, we shouldn't rewind to the user context. We want to actually ignore the event. get_irq_regs() will always give us the right interrupted context, so use its result and submit it to perf_exclude_context() that knows when an event must be ignored. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Ingo Molnar commit 76e1d9047e4edefb8ada20aa90d5762306082bd6 Author: Frederic Weisbecker Date: Mon Apr 5 15:35:57 2010 +0200 perf: Store active software events in a hashlist Each time a software event triggers, we need to walk through the entire list of events from the current cpu and task contexts to retrieve a running perf event that matches. We also need to check a matching perf event is actually counting. This walk is wasteful and makes the event fast path scaling down with a growing number of events running on the same contexts. To solve this, we store the running perf events in a hashlist to get an immediate access to them against their type:event_id when they trigger. v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic maths along the way) - Only allocate hlist for online cpus, but keep track of the refcount on offline possible cpus too, so that we allocate it if needed when it becomes online. - Drop the kref use as it's not adapted to our tricks anymore. v3: - Fix bad refcount check (address instead of value). Thanks to Eric Dumazet who spotted this. - While exiting cpu, move the hlist release out of the IPI path to lock the hlist mutex sanely. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Ingo Molnar commit 3f10940e4fb69d312602078f2c5234206797ca31 Author: Arnd Bergmann Date: Sat Apr 10 16:46:21 2010 +0200 x86/microcode: Use nonseekable_open() No need to seek on this file, so prevent it outright so we can avoid using default_llseek - removes one more BKL usage. Signed-off-by: Arnd Bergmann [drop useless llseek = no_llseek and smp_lock.h inclusion] Signed-off-by: Frederic Weisbecker Cc: Arnd Bergmann Cc: H. Peter Anvin Cc: Dmitry Adamushko LKML-Reference: <1270910781-8786-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit b15c7b1cee119999e9eafcd602d24a595e77adac Merge: c1ab9ca aa27497 Author: Ingo Molnar Date: Wed Apr 14 12:15:23 2010 +0200 Merge branch 'tip/tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/core commit a0cccc2e8e9fb16cbed3a117b30e3fbac3092ee3 Author: Tom Zanussi Date: Thu Apr 1 23:59:25 2010 -0500 perf trace: Invoke live mode automatically if record/report not specified Currently, live mode is invoked by explicitly invoking the record and report sides and connecting them with a pipe e.g. $ perf trace record rwtop -o - | perf trace report rwtop 5 -i - In terms of usability, it's not that bad, but it does require the user to type and remember more than necessary. This patch allows the user to accomplish the same thing without specifying the separate record/report steps or the pipe. So the same command as above can be accomplished more simply as: $ perf trace rwtop 5 Notice that the '-i -' and '-o -' aren't required in this case - they're added internally, and that any extra arguments are passed along to the report script (but not to the record script). The overall effect is that any of the scripts listed in 'perf trace -l' can now be used directly in live mode, with the expected arguments, by simply specifying the script and args to 'perf trace'. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-12-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit 00b21a01935892a2b97613f10300434998f45093 Author: Tom Zanussi Date: Thu Apr 1 23:59:24 2010 -0500 perf trace/scripting: Enable scripting shell scripts for live mode It should be possible to run any perf trace script in 'live mode'. This requires being able to pass in e.g. '-i -' or other args, which the current shell scripts aren't equipped to handle. In a few cases, there are required or optional args that also need special handling. This patch makes changes the current set of shell scripts as necessary. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-11-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit 47902f3611b392209e2a412bf7ec02dca95e666d Author: Tom Zanussi Date: Thu Apr 1 23:59:23 2010 -0500 perf trace/scripting: Add rwtop and sctop scripts A couple of scripts, one in Python and the other in Perl, that demonstrate 'live mode' tracing. For each, the output of the perf event stream is fed continuously to the script, which continuously aggregates the data and reports the current results every 3 seconds, or at the optionally specified interval. After the current results are displayed, the aggregations are cleared and the cycle begins anew. To run the scripts, simply pipe the output of the 'perf trace record' step as input to the corresponding 'perf trace report' step, using '-' as the filename to -o and -i: $ perf trace record sctop -o - | perf trace report sctop -i - Also adds clear_term() utility functions to the Util.pm and Util.py utility modules, for use by any script to clear the screen. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-10-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit c7929e4727e8ff2d6fc8327188820e3b1c2f1dc3 Author: Tom Zanussi Date: Thu Apr 1 23:59:22 2010 -0500 perf: Convert perf header build_ids into build_id events Bypasses the build_id perf header code and replaces it with a synthesized event and processing function that accomplishes the same thing, used when reading/writing perf data to/from a pipe. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-9-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit 9215545e99d8c0b27323df2de504f4294bf5e407 Author: Tom Zanussi Date: Thu Apr 1 23:59:21 2010 -0500 perf: Convert perf tracing data into a tracing_data event Bypasses the tracing_data perf header code and replaces it with a synthesized event and processing function that accomplishes the same thing, used when reading/writing perf data to/from a pipe. The tracing data is pretty large, and this patch doesn't attempt to break it down into component events. The tracing_data event itself doesn't actually contain the tracing data, rather it arranges for the event processing code to skip over it after it's read, using the skip return value added to the event processing loop in a previous patch. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-8-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit cd19a035f3b63fee6dcbdb5371c4b22276f7dc8c Author: Tom Zanussi Date: Thu Apr 1 23:59:20 2010 -0500 perf: Convert perf event types into event type events Bypasses the event type perf header code and replaces it with a synthesized event and processing function that accomplishes the same thing, used when reading/writing perf data to/from a pipe. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-7-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit 2c46dbb517a10b18d459e6ceffefde5bfb290cf6 Author: Tom Zanussi Date: Thu Apr 1 23:59:19 2010 -0500 perf: Convert perf header attrs into attr events Bypasses the attr perf header code and replaces it with a synthesized event and processing function that accomplishes the same thing, used when reading/writing perf data to/from a pipe. Making the attrs into events allows them to be streamed over a pipe along with the rest of the header data (in later patches). It also paves the way to allowing events to be added and removed from perf sessions dynamically. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-6-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit c239da3b4b55dbb8f30bcb8d1a0d63fc44a567c3 Author: Tom Zanussi Date: Thu Apr 1 23:59:18 2010 -0500 perf trace: Introduce special handling for pipe input Adds special treatment for stdin - if the user specifies '-i -' to perf trace, the intent is that the event stream be read from stdin rather than from a disk file. The actual handling of the '-' filename is done by the session; this just adds a signal handler to stop reporting, and turns off interference by the pager. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-5-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit 46656ac7fb3252f8a3db29b18638e0e8067849ba Author: Tom Zanussi Date: Thu Apr 1 23:59:17 2010 -0500 perf report: Introduce special handling for pipe input Adds special treatment for stdin - if the user specifies '-i -' to perf report, the intent is that the event stream be written to stdin rather than from a disk file. The actual handling of the '-' filename is done by the session; this just adds a signal handler to stop reporting, and turns off interference by the pager. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-4-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit 529870e37473a9fc609078f03cc5b4148cf06a87 Author: Tom Zanussi Date: Thu Apr 1 23:59:16 2010 -0500 perf record: Introduce special handling for pipe output Adds special treatment for stdout - if the user specifies '-o -' to perf record, the intent is that the event stream be written to stdout rather than to a disk file. Also, redirect stdout of forked child to stderr - in pipe mode, stdout of the forked child interferes with the stdout perf stream, so redirect it to stderr where it can still be seen but won't be mixed in with the perf output. Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-3-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit 8dc58101f2c838355d44402aa77646649d10dbec Author: Tom Zanussi Date: Thu Apr 1 23:59:15 2010 -0500 perf: Add pipe-specific header read/write and event processing code This patch makes several changes to allow the perf event stream to be sent and received over a pipe: - adds pipe-specific versions of the header read/write code - adds pipe-specific version of the event processing code - adds a range of event types to be used for header or other pseudo events, above the range used by the kernel - checks the return value of event handlers, which they can use to skip over large events during event processing rather than actually reading them into event objects. - unifies the multiple do_read() functions and updates its users. Note that none of these changes affect the existing perf data file format or processing - this code only comes into play if perf output is sent to stdout (or is read from stdin). Signed-off-by: Tom Zanussi Acked-by: Thomas Gleixner Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: k-keiichi@bx.jp.nec.com Cc: acme@ghostprotocols.net LKML-Reference: <1270184365-8281-2-git-send-email-tzanussi@gmail.com> Signed-off-by: Ingo Molnar commit c05556421742eb47f80301767653a4bcb19de9de Author: Ian Munsie Date: Tue Apr 13 18:37:33 2010 +1000 perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() Parsing an option from the command line with OPT_BOOLEAN on a bool data type would not work on a big-endian machine due to the manner in which the boolean was being cast into an int and incremented. For example, running 'perf probe --list' on a PowerPC machine would fail to properly set the list_events bool and would therefore print out the usage information and terminate. This patch makes OPT_BOOLEAN work as expected with a bool datatype. For cases where the original OPT_BOOLEAN was intentionally being used to increment an int each time it was passed in on the command line, this patch introduces OPT_INCR with the old behaviour of OPT_BOOLEAN (the verbose variable is currently the only such example of this). I have reviewed every use of OPT_BOOLEAN to verify that a true C99 bool was passed. Where integers were used, I verified that they were only being used for boolean logic and changed them to bools to ensure that they would not be mistakenly used as ints. The major exception was the verbose variable which now uses OPT_INCR instead of OPT_BOOLEAN. Signed-off-by: Ian Munsie Acked-by: David S. Miller Cc: # NOTE: wont apply to .3[34].x cleanly, please backport Cc: Git development list Cc: Ian Munsie Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: KOSAKI Motohiro Cc: Hitoshi Mitake Cc: Rusty Russell Cc: Frederic Weisbecker Cc: Eric B Munson Cc: Valdis.Kletnieks@vt.edu Cc: WANG Cong Cc: Thiago Farina Cc: Masami Hiramatsu Cc: Xiao Guangrong Cc: Jaswinder Singh Rajput Cc: Arjan van de Ven Cc: OGAWA Hirofumi Cc: Mike Galbraith Cc: Tom Zanussi Cc: Anton Blanchard Cc: John Kacur Cc: Li Zefan Cc: Steven Rostedt LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com> Signed-off-by: Ingo Molnar commit 5534ecb2dda04345e8243901e0e49599228b4273 Author: Arnd Bergmann Date: Sat Feb 27 19:49:37 2010 +0100 ptrace: kill BKL in ptrace syscall The comment suggests that this usage is stale. There is no bkl in the exec path so if there is a race lurking there, the bkl in ptrace is not going to help in this regard. Overview of the possibility of "accidental" races this bkl might protect: - ptrace_traceme() is protected against task removal and concurrent read/write on current->ptrace as it locks write tasklist_lock. - arch_ptrace_attach() is serialized by ptrace_traceme() against concurrent PTRACE_TRACEME or PTRACE_ATTACH - ptrace_attach() is protected the same way ptrace_traceme() and in turn serializes arch_ptrace_attach() - ptrace_check_attach() does its own well described serializing too. There is no obvious race here. Signed-off-by: Arnd Bergmann Signed-off-by: Frederic Weisbecker Acked-by: Oleg Nesterov Acked-by: Roland McGrath Cc: Andrew Morton Cc: Roland McGrath commit 6dad2a29646ce3792c40cfc52d77e9b65a7bb143 Author: Borislav Petkov Date: Wed Mar 31 21:56:46 2010 +0200 cpufreq: Unify sysfs attribute definition macros Multiple modules used to define those which are with identical functionality and were needlessly replicated among the different cpufreq drivers. Push them into the header and remove duplication. Signed-off-by: Borislav Petkov LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org> Reviewed-by: Thomas Renninger Signed-off-by: H. Peter Anvin commit 679370641e3675633cad222449262abbe93a4a2a Author: Mark Langsdorf Date: Wed Mar 31 21:56:45 2010 +0200 powernow-k8: Fix frequency reporting With F10, model 10, all valid frequencies are in the ACPI _PST table. Cc: # 33.x 32.x Signed-off-by: Mark Langsdorf LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org> Signed-off-by: Borislav Petkov Reviewed-by: Thomas Renninger Signed-off-by: H. Peter Anvin commit a2fed573f065e526bfd5cbf26e5491973d9e9aaa Author: Mark Langsdorf Date: Thu Mar 18 18:41:46 2010 +0100 x86, cpufreq: Add APERF/MPERF support for AMD processors Starting with model 10 of Family 0x10, AMD processors may have support for APERF/MPERF. Add support for identifying it and using it within cpufreq. Move the APERF/MPERF functions out of the acpi-cpufreq code and into their own file so they can easily be shared. Signed-off-by: Mark Langsdorf LKML-Reference: <20100401141956.GA1930@aftab> Signed-off-by: Borislav Petkov Reviewed-by: Thomas Renninger Signed-off-by: H. Peter Anvin commit d65ad45cd82a0db9544469b8c54f5dc5cafbb2d8 Author: Borislav Petkov Date: Wed Mar 31 21:56:43 2010 +0200 x86: Unify APERF/MPERF support Initialize this CPUID flag feature in common code. It could be made a standalone function later, maybe, if more functionality is duplicated. Signed-off-by: Borislav Petkov LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org> Reviewed-by: Thomas Renninger Signed-off-by: H. Peter Anvin commit 73860c6b2fd159a35637e233d735e36887c266ad Author: Borislav Petkov Date: Wed Mar 31 21:56:42 2010 +0200 powernow-k8: Add core performance boost support Starting with F10h, revE, AMD processors add support for a dynamic core boosting feature called Core Performance Boost. When a specific condition is present, a subset of the cores on a system are boosted beyond their P0 operating frequency to speed up the performance of single-threaded applications. In the normal case, the system comes out of reset with core boosting enabled. This patch adds a sysfs knob with which core boosting can be switched on or off for benchmarking purposes. While at it, make the CPB code hotplug-aware so that taking cores offline wouldn't interfere with boosting the remaining online cores. Furthermore, add cpu_online_mask hotplug protection as suggested by Andrew. Finally, cleanup the driver init codepath and update copyrights. Signed-off-by: Borislav Petkov LKML-Reference: <1270065406-1814-3-git-send-email-bp@amd64.org> Reviewed-by: Thomas Renninger Signed-off-by: H. Peter Anvin commit 5958f1d5d722df7a9e5d129676614a8e5219bacd Author: Borislav Petkov Date: Wed Mar 31 21:56:41 2010 +0200 x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo By semi-popular demand, this adds the Core Performance Boost feature flag to /proc/cpuinfo. Possible use case for this is userspace tools like cpufreq-aperf, for example, so that they don't have to jump through hoops of accessing "/dev/cpu/%d/cpuid" in order to check for CPB hw support, or call cpuid from userspace. Signed-off-by: Borislav Petkov LKML-Reference: <1270065406-1814-2-git-send-email-bp@amd64.org> Reviewed-by: Thomas Renninger Signed-off-by: H. Peter Anvin commit 53e5b5c215ce8372250e227f2c9acf9892de8434 Author: Arnaldo Carvalho de Melo Date: Fri Apr 9 13:33:54 2010 -0300 perf tools: Fix perl support installation when O= is used We need to create the $O/scripts/perl/Perf-Trace-Util/ directory too. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e9e94e3bd862d31777335722e747e97d9821bc1d Author: Arnaldo Carvalho de Melo Date: Mon Apr 5 18:01:10 2010 -0300 perf trace: Ignore "overwrite" field if present in /events/header_page That is not used in perf where we have the LOST events. Without this patch we get: [root@doppio ~]# perf lock report | head -3 Warning: Error: expected 'data' but read 'overwrite' So, to make the same perf command work with kernels with and without this field, introduce variants for the parsing routines to not warn the user in such case. Discussed-with: Steven Rostedt Cc: Frédéric Weisbecker Cc: Hitoshi Mitake Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 854c5548dfad017920a36201d40449fdbad90bfb Author: Randy Dunlap Date: Wed Mar 31 11:31:00 2010 -0700 perf: cleanup some Documentation Correct typos in perf bench & perf sched help text. Cc: Peter Zijlstra , Cc: Paul Mackerras LKML-Reference: <20100331113100.cc898487.randy.dunlap@oracle.com> Signed-off-by: Randy Dunlap commit f0e9c4fcefa42b5d28b915768dab81a7fdbbc00c Author: Randy Dunlap Date: Wed Mar 31 11:30:56 2010 -0700 perf bench: fix spello Fix spello in user message. Cc: Peter Zijlstra , Cc: Paul Mackerra s LKML-Reference: <20100331113056.2c7df509.randy.dunlap@oracle.com> Signed-off-by: Randy Dunlap commit eed05fe70f96b04ebeb218b07ae8898e605f9b23 Author: Arnaldo Carvalho de Melo Date: Mon Apr 5 12:53:45 2010 -0300 perf tools: Reorganize some structs to save space Using 'pahole --packable' I found some structs that could be reorganized to eliminate alignment holes, in some cases getting them to be cacheline multiples. [acme@doppio linux-2.6-tip]$ codiff perf.old ~/bin/perf builtin-annotate.c: struct perf_session | -8 struct perf_header | -8 2 structs changed builtin-diff.c: struct sample_data | -8 1 struct changed diff__process_sample_event | -8 1 function changed, 8 bytes removed, diff: -8 builtin-sched.c: struct sched_atom | -8 1 struct changed builtin-timechart.c: struct per_pid | -8 1 struct changed cmd_timechart | -16 1 function changed, 16 bytes removed, diff: -16 builtin-probe.c: struct perf_probe_point | -8 struct perf_probe_event | -8 2 structs changed opt_add_probe_event | -3 1 function changed, 3 bytes removed, diff: -3 util/probe-finder.c: struct probe_finder | -8 1 struct changed find_kprobe_trace_events | -16 1 function changed, 16 bytes removed, diff: -16 /home/acme/bin/perf: 4 functions changed, 43 bytes removed, diff: -43 [acme@doppio linux-2.6-tip]$ Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c0ed55d2e4f600335193612725c0d7c8211a7a4a Author: Arnaldo Carvalho de Melo Date: Mon Apr 5 12:04:23 2010 -0300 perf TUI: Move "Yes" button to before "No" Esc + Enter should be enough warning to avoid accidentaly exiting from the browser. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 6e7ab4c649eb7ed403d970b8eda32ca3745e8024 Author: Arnaldo Carvalho de Melo Date: Mon Apr 5 12:02:18 2010 -0300 perf TUI: Show filters on the title and add help line about how to zoom out Suggested-by: Ingo Molnar Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit ca7e0c612005937a4a5a75d3fed90459993de65c Merge: 8141d00 f5284e7 Author: Ingo Molnar Date: Thu Apr 8 13:36:36 2010 +0200 Merge branch 'linus' into perf/core Semantic conflict: arch/x86/kernel/cpu/perf_event_intel_ds.c Merge reason: pick up latest fixes, fix the conflict Signed-off-by: Ingo Molnar commit c1ab9cab75098924fa8226a8a371de66977439df Merge: ff0ff84 f5284e7 Author: Ingo Molnar Date: Thu Apr 8 09:06:12 2010 +0200 Merge branch 'linus' into tracing/core Conflicts: include/linux/module.h kernel/module.c Semantic conflict: include/trace/events/module.h Merge reason: Resolve the conflict with upstream commit 5fbfb18 ("Fix up possibly racy module refcounting") Signed-off-by: Ingo Molnar commit d61931d89be506372d01a90d1755f6d0a9fafe2d Author: Borislav Petkov Date: Fri Mar 5 17:34:46 2010 +0100 x86: Add optimized popcnt variants Add support for the hardware version of the Hamming weight function, popcnt, present in CPUs which advertize it under CPUID, Function 0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the default lib/hweight.c sw versions. A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost a 3x speedup on a F10h machine. Signed-off-by: Borislav Petkov LKML-Reference: <20100318112015.GC11152@aftab> Signed-off-by: H. Peter Anvin commit 1527bc8b928dd1399c3d3467dd47d9ede210978a Author: Peter Zijlstra Date: Mon Feb 1 15:03:07 2010 +0100 bitops: Optimize hweight() by making use of compile-time evaluation Rename the extisting runtime hweight() implementations to __arch_hweight(), rename the compile-time versions to __const_hweight() and then have hweight() pick between them. Suggested-by: H. Peter Anvin Signed-off-by: Peter Zijlstra LKML-Reference: <20100318111929.GB11152@aftab> Acked-by: H. Peter Anvin LKML-Reference: <1265028224.24455.154.camel@laptop> Signed-off-by: H. Peter Anvin commit bd6d29c25bb1a24a4c160ec5de43e0004e01f72b Author: Frederic Weisbecker Date: Tue Apr 6 00:10:17 2010 +0200 lockstat: Make lockstat counting per cpu Locking statistics are implemented using global atomic variables. This is usually fine unless some path write them very often. This is the case for the function and function graph tracers that disable irqs for each entry saved (except if the function tracer is in preempt disabled only mode). And calls to local_irq_save/restore() increment hardirqs_on_events and hardirqs_off_events stats (or similar stats for redundant versions). Incrementing these global vars for each function ends up in too much cache bouncing if lockstats are enabled. To solve this, implement the debug_atomic_*() operations using per cpu vars. -v2: Use per_cpu() instead of get_cpu_var() to fetch the desired cpu vars on debug_atomic_read() -v3: Store the stats in a structure. No need for local_t as we are NMI/irq safe. -v4: Fix tons of build errors. I thought I had tested it but I probably forgot to select the relevant config. Suggested-by: Steven Rostedt Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Steven Rostedt LKML-Reference: <1270505417-8144-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar Cc: Peter Zijlstra Cc: Steven Rostedt commit aa27497c2fb4c7f57706099bd489e683e5cc3e3b Author: Lai Jiangshan Date: Mon Apr 5 17:11:05 2010 +0800 tracing: Fix uninitialized variable of tracing/trace output Because a local variable is not initialized, I got these when I did 'cat tracing/trace'. (not trace_pipe): CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770221: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446612133255294080 EVENTS] ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770222: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770222: lock_release: ffffffff816cfb98 dcache_lock See peek_next_entry(), it does not set *lost_events when we 'cat tracing/trace' Signed-off-by: Lai Jiangshan LKML-Reference: <4BB9A929.2000303@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 8141d0050d76e5695011b5ab577ec66fb51a998c Author: Hitoshi Mitake Date: Sun Apr 4 17:13:18 2010 +0900 perf: Swap inclusion order of util.h and string.h in util/string.c Currently util/string.c includes headers in this order: string.h, util.h But this causes a build error because __USE_GNU definition is needed for strndup() definition: % make -j touch .perf.dev.null CC util/string.o cc1: warnings being treated as errors util/string.c: In function ‘argv_split’: util/string.c:171: error: implicit declaration of function ‘strndup’ util/string.c:171: error: incompatible implicit declaration of built-in function ‘strndup’ So this patch swaps the headers inclusion order. util.h defines _GNU_SOURCE, and /usr/include/features.h defines __USE_GNU as 1 if _GNU_SOURCE is defined. Signed-off-by: Hitoshi Mitake Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo LKML-Reference: <1270368798-27232-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Frederic Weisbecker commit 3326c1ceee234e63160852720d48be8a8f7a6d08 Author: Arnd Bergmann Date: Tue Mar 23 19:09:33 2010 +0100 perf_event: Make perf fd non seekable Perf_event does not need seeking, so prevent it in order to get rid of default_llseek, which uses the BKL. Signed-off-by: Arnd Bergmann Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Ingo Molnar [drop the nonseekable_open, not needed for anon inodes] Signed-off-by: Frederic Weisbecker commit 6cc8a7c1d8560c042f486b23318a6291569ab96b Author: Frederic Weisbecker Date: Fri Mar 19 01:23:53 2010 +0100 perf: Fetch hot regs from the template caller Trace events can be defined from a template using DECLARE_EVENT_CLASS/DEFINE_EVENT or directly with TRACE_EVENT. In both cases we have a template tracepoint handler, used to record the trace, to which we pass our ftrace event instance. In the function level, if the class is named "foo" and the event is named "blah", we have the following chain of calls: perf_trace_blah() -> perf_trace_templ_foo() In the case we have several events sharing the class "blah", we'll have multiple users of perf_trace_templ_foo(), and it won't be inlined by the compiler. This is usually what happens with the DECLARE_EVENT_CLASS/DEFINE_EVENT based definition. But if perf_trace_blah() is the only caller of perf_trace_templ_foo() there are fair chances that it will be inlined. The problem is that we fetch the regs from perf_trace_templ_foo() after we rewinded the frame pointer to the second caller, we want to reach the caller of perf_trace_blah() to get the right source of the event. And we do this by always assuming that perf_trace_templ_foo() is not inlined. But as shown above this is not always true. And if it is inlined we miss the first caller, losing the most important level of precision. We get: 61.31% ls [kernel.kallsyms] [k] do_softirq | --- do_softirq irq_exit do_IRQ common_interrupt | |--25.00%-- tty_buffer_request_room Instead of: 61.31% ls [kernel.kallsyms] [k] __do_softirq | --- __do_softirq do_softirq irq_exit do_IRQ common_interrupt | |--25.00%-- tty_buffer_request_room To fix this, we fetch the regs from perf_trace_blah() rather than perf_trace_templ_foo() so that we don't have to deal with inlining surprises. That also bring us the advantage of having the true source of the event even if we don't have frame pointers. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Ingo Molnar commit 6f4dee06fbf0133917f3d76fa3fb50e18b10c1f5 Author: Frederic Weisbecker Date: Thu Mar 18 23:47:01 2010 +0100 perf: Drop the frame reliablity check It is useless now that we have a pure stack frame walker, as given addr are always reliable. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Ingo Molnar commit a5e29aca02fcecd086ac160ea29244cae6b4305e Author: Arnaldo Carvalho de Melo Date: Sat Apr 3 22:44:37 2010 -0300 perf TUI: Add a "Zoom into COMM(PID) thread" and reverse operations Now one can press the right arrow key and in addition to being able to filter by DSO, filter out by thread too, or a combination of both filters. With this one can start collecting events for the whole system, then focus on a subset of the collected data quickly. Cc: Avi Kivity Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 83753190c136901c916df267703937e60f24b8b8 Author: Arnaldo Carvalho de Melo Date: Sat Apr 3 16:30:44 2010 -0300 perf newt: Add a "Zoom into foo.so DSO" and reverse operations Clicking on -> will bring as one of the popup menu options a "Zoom into CURRENT DSO", i.e. CURRENT will be replaced by the name of the DSO in the current line. Choosing this option will filter out all samples that didn't took place in a symbol in this DSO. After that the option reverts to "Zoom out of CURRENT DSO", to allow going back to the more compreensive view, not filtered by DSO. Future similar operations will include zooming into a particular thread, COMM, CPU, "last minute", "last N usecs", etc. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 22a4e4c435bbc0edccc2e7e5143ce4fbe9679e2d Merge: 70a7c12 6e03bb5 Author: Ingo Molnar Date: Sat Apr 3 18:17:55 2010 +0200 Merge branch 'perf/urgent' into perf/core Conflicts: tools/perf/Makefile Merge reason: resolve the conflict. Signed-off-by: Ingo Molnar commit 70a7c1271e2bfca8ad2bf71f44c516ea2763b9ed Merge: 40b91cd 533c46c Author: Ingo Molnar Date: Sat Apr 3 18:16:42 2010 +0200 Merge branch 'perf' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 533c46c31c0e82f19dbb087c77d85eaccd6fefdb Author: Arnaldo Carvalho de Melo Date: Sat Apr 3 11:54:35 2010 -0300 perf newt: Pass the input_name to perf_session__browse_hists So that it can use it in the 'perf annotate' command line, otherwise it'll use the default and not the specified -i filename passed to 'perf report'. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e65713ea1e61e92d28284a55df2aa039ebe10003 Author: Arnaldo Carvalho de Melo Date: Sat Apr 3 11:25:56 2010 -0300 perf newt: Move the hist browser population bits to separare function Next patches will use that when applying filtes to then repopulate the browser with the narrowed vision. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit fb6b893180faec03e1d32149ef5cc412df9714df Author: Arnaldo Carvalho de Melo Date: Sat Apr 3 11:04:55 2010 -0300 perf newt: Remove useless column width calculation Not used in the TUI interface. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 4af8b35db6634dd1e0d616de689582b6c93550af Author: Anton Blanchard Date: Sat Apr 3 22:53:31 2010 +1100 perf symbols: Fill in pgoff in mmap synthesized events When we synthesize mmap events we need to fill in the pgoff field. I wasn't able to test this completely since I couldn't find an executable region with a non 0 offset. We will see it when we start doing data profiling. Signed-off-by: Anton Blanchard Cc: David Miller Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <20100403115331.GK5594@kryten> Signed-off-by: Arnaldo Carvalho de Melo commit e206d556c5793ac5e28c0aaba2e07432e5f9a098 Author: Arnaldo Carvalho de Melo Date: Sat Apr 3 10:19:26 2010 -0300 perf tools: Move the prototypes in util/string.h to util.h So that we avoid conflict with libc's string.h header. Reviewed-by: KOSAKI Motohiro Suggested-by: KOSAKI Motohiro Cc: KOSAKI Motohiro Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit 2aefa4f733f2c5ce51dd2316ffecb258463fde71 Author: Arnaldo Carvalho de Melo Date: Fri Apr 2 12:30:57 2010 -0300 perf tools: sort_dimension__add shouldn't die Propagate error instead. LKML-Reference: Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit ad5b217b152d99ca3922153500c619d9758dd87a Author: Arnaldo Carvalho de Melo Date: Fri Apr 2 10:04:18 2010 -0300 perf session: Remove one more exit() call from library code Return NULL instead and make the caller propagate the error. LKML-Reference: Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit b9fb93047756c5e4129dfda7591612de61b0e877 Author: Arnaldo Carvalho de Melo Date: Fri Apr 2 09:50:42 2010 -0300 perf hist: Only allocate callchain_node if processing callchains The struct callchain_node size is 120 bytes, that are never used when there are no callchains or '-g none' is specified, so conditionally allocate it, reducing sizeof(struct hist_entry) from 210 bytes to only 96, greatly speeding the non-callchain processing. LKML-Reference: Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit 71cf8b8ff7d6a79af086be9e4c72628da9d62d58 Author: Arnaldo Carvalho de Melo Date: Thu Apr 1 21:24:38 2010 -0300 perf kmem: Fixup the symbol address before using it We get absolute addresses in the events, but relative ones from the symbol subsystem, so calculate the absolute address by asking for the map where the symbol was found, that has the place where the DSO was actually loaded. For the core kernel this poses no problems if the kernel is not relocated by things like kexec, or if we use /proc/kallsyms, but for modules we were getting really large, negative offsets. LKML-Reference: Cc: Frédéric Weisbecker Cc: Li Zefan Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit e727ca73f85d4c5be3547eda674168219d1c22d8 Author: Arnaldo Carvalho de Melo Date: Thu Apr 1 19:12:13 2010 -0300 perf kmem: Resolve kernel symbols again Due to the assumption in perf_session__new that the kernel maps would be created using the fake PERF_RECORD_MMAP event in a perf.data file 'perf kmem --stat caller', that doesn't have such event, ends up not being able to resolve the kernel addresses. Fix it by calling perf_session__create_kernel_maps() in __cmd_kmem(). LKML-Reference: Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit a4e3b956a820162b7c1d616117b4f23b6017f504 Author: Arnaldo Carvalho de Melo Date: Wed Mar 31 11:33:40 2010 -0300 perf hist: Replace ->print() routines by ->snprintf() equivalents Then hist_entry__fprintf will just us the newly introduced hist_entry__snprintf, add the newline and fprintf it to the supplied FILE descriptor. This allows us to remove the use_browser checking in the color_printf routines, that now got color_snprintf variants too. The newt TUI browser (and other GUIs that may come in the future) don't have to worry about stdio specific stuff in the strings they get from the se->snprintf routines and instead use whatever means to do the equivalent. Also the newt TUI browser don't have to use the fmemopen() hack, instead it can use the se->snprintf routines directly. For now tho use the hist_entry__snprintf routine to reduce the patch size. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit 70162138c91b040da3162fe1f34fe8aaf6506f10 Author: Arnaldo Carvalho de Melo Date: Tue Mar 30 18:27:39 2010 -0300 perf record: Add a fallback to the reference relocation symbol Usually "_text" is enough, but I received reports that its not always available, so fallback to "_stext" for the symbol we use to check if we need to apply any relocation to all the symbols in the kernel symtab, for when, for instance, kexec is being used. Reported-by: Darren Hart Cc: Darren Hart Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit c29ede615fd35a640e771fbbb1778e915fac43a7 Author: Arnaldo Carvalho de Melo Date: Sat Mar 27 14:30:45 2010 -0300 perf tools: Allow specifying O= to build files in a separate directory Avoiding polluting the source tree with build files. Reported-by: Steven Rostedt Cc: Steven Rostedt Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit 8b2c551f9635bf1c5c2d38de300137998915478f Author: Arnaldo Carvalho de Melo Date: Sat Mar 27 11:43:36 2010 -0300 perf tools: Use -o $(BITBUCKET) in one more case As described in 1703f2c some gcc versions has issues using /dev/null, so use the mechanism used elsewhere. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit 5f4d3f8816461300ce54505c9117bf85b3044aa0 Author: Arnaldo Carvalho de Melo Date: Fri Mar 26 21:16:22 2010 -0300 perf report: Add progress bars For when we are processing the events and inserting the entries in the browser. Experimentation here: naming "ui_something" we may be treading into creating a TUI/GUI set of routines that can then be implemented in terms of multiple backends. Also the time it takes for adding things to the "browser" takes, visually (I guess I should do some profiling here ;-) ), more time than for processing the events... That means we probably need to create a custom hist_entry browser, so that we reuse the structures we have in place instead of duplicating them in newt. But progress was made and at least we can see something while long files are being loaded, that must be one of UI 101 bullet points :-) Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit 7e5e1b1404c30db5f6bc3f5203b6c21c1d244f99 Author: Arnaldo Carvalho de Melo Date: Fri Mar 26 12:30:40 2010 -0300 perf symbols: map_groups__find_symbol must return the map too Tools need to know from which map in the map_group a symbol was resolved to, so that, for isntance, we can annotate kernel modules symbols by getting its precise name, etc. Also add the _by_name variants for completeness. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit c6e718ff8cdcf5e7855077687720b37c4a07650a Author: Arnaldo Carvalho de Melo Date: Fri Mar 26 12:11:06 2010 -0300 perf symbols: Move more map_groups methods to map.c While writing a standalone test app that uses the symbol system to find kernel space symbols I noticed these also need to be moved. Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Signed-off-by: Arnaldo Carvalho de Melo commit 371fd7e7a56a5c136d31aa980011bd2f131c3ef5 Author: Peter Zijlstra Date: Wed Mar 24 16:38:48 2010 +0100 sched: Add enqueue/dequeue flags In order to reduce the dependency on TASK_WAKING rework the enqueue interface to support a proper flags field. Replace the int wakeup, bool head arguments with an int flags argument and create the following flags: ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task, ENQUEUE_WAKING - the enqueue has relative vruntime due to having sched_class::task_waking() called, ENQUEUE_HEAD - the waking task should be places on the head of the priority queue (where appropriate). For symmetry also convert sched_class::dequeue() to a flags scheme. Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit cc87f76a601d2d256118f7bab15e35254356ae21 Author: Peter Zijlstra Date: Fri Mar 26 12:22:14 2010 +0100 sched: Fix nr_uninterruptible count The cpuload calculation in calc_load_account_active() assumes rq->nr_uninterruptible will not change on an offline cpu after migrate_nr_uninterruptible(). However the recent migrate on wakeup changes broke that and would result in decrementing the offline cpu's rq->nr_uninterruptible. Fix this by accounting the nr_uninterruptible on the waking cpu. Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 65cc8e4859ff29a9ddc989c88557d6059834c2a2 Author: Peter Zijlstra Date: Thu Mar 25 21:05:16 2010 +0100 sched: Optimize task_rq_lock() Now that we hold the rq->lock over set_task_cpu() again, we can do away with most of the TASK_WAKING checks and reduce them again to set_cpus_allowed_ptr(). Removes some conditionals from scheduling hot-paths. Signed-off-by: Peter Zijlstra Cc: Oleg Nesterov LKML-Reference: Signed-off-by: Ingo Molnar commit 0017d735092844118bef006696a750a0e4ef6ebd Author: Peter Zijlstra Date: Wed Mar 24 18:34:10 2010 +0100 sched: Fix TASK_WAKING vs fork deadlock Oleg noticed a few races with the TASK_WAKING usage on fork. - since TASK_WAKING is basically a spinlock, it should be IRQ safe - since we set TASK_WAKING (*) without holding rq->lock it could be there still is a rq->lock holder, thereby not actually providing full serialization. (*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING. Cure the second issue by not setting TASK_WAKING in sched_fork(), but only temporarily in wake_up_new_task() while calling select_task_rq(). Cure the first by holding rq->lock around the select_task_rq() call, this will disable IRQs, this however requires that we push down the rq->lock release into select_task_rq_fair()'s cgroup stuff. Because select_task_rq_fair() still needs to drop the rq->lock we cannot fully get rid of TASK_WAKING. Reported-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 9084bb8246ea935b98320554229e2f371f7f52fa Author: Oleg Nesterov Date: Mon Mar 15 10:10:27 2010 +0100 sched: Make select_fallback_rq() cpuset friendly Introduce cpuset_cpus_allowed_fallback() helper to fix the cpuset problems with select_fallback_rq(). It can be called from any context and can't use any cpuset locks including task_lock(). It is called when the task doesn't have online cpus in ->cpus_allowed but ttwu/etc must be able to find a suitable cpu. I am not proud of this patch. Everything which needs such a fat comment can't be good even if correct. But I'd prefer to not change the locking rules in the code I hardly understand, and in any case I believe this simple change make the code much more correct compared to deadlocks we currently have. Signed-off-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: <20100315091027.GA9155@redhat.com> Signed-off-by: Ingo Molnar commit 6a1bdc1b577ebcb65f6603c57f8347309bc4ab13 Author: Oleg Nesterov Date: Mon Mar 15 10:10:23 2010 +0100 sched: _cpu_down(): Don't play with current->cpus_allowed _cpu_down() changes the current task's affinity and then recovers it at the end. The problems are well known: we can't restore old_allowed if it was bound to the now-dead-cpu, and we can race with the userspace which can change cpu-affinity during unplug. _cpu_down() should not play with current->cpus_allowed at all. Instead, take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable() removes the dying cpu from cpu_online_mask. Signed-off-by: Oleg Nesterov Acked-by: Rafael J. Wysocki Signed-off-by: Peter Zijlstra LKML-Reference: <20100315091023.GA9148@redhat.com> Signed-off-by: Ingo Molnar commit 30da688ef6b76e01969b00608202fff1eed2accc Author: Oleg Nesterov Date: Mon Mar 15 10:10:19 2010 +0100 sched: sched_exec(): Remove the select_fallback_rq() logic sched_exec()->select_task_rq() reads/updates ->cpus_allowed lockless. This can race with other CPUs updating our ->cpus_allowed, and this looks meaningless to me. The task is current and running, it must have online cpus in ->cpus_allowed, the fallback mode is bogus. And, if ->sched_class returns the "wrong" cpu, this likely means we raced with set_cpus_allowed() which was called for reason, why should sched_exec() retry and call ->select_task_rq() again? Change the code to call sched_class->select_task_rq() directly and do nothing if the returned cpu is wrong after re-checking under rq->lock. From now task_struct->cpus_allowed is always stable under TASK_WAKING, select_fallback_rq() is always called under rq-lock or the caller or the caller owns TASK_WAKING (select_task_rq). Signed-off-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: <20100315091019.GA9141@redhat.com> Signed-off-by: Ingo Molnar commit c1804d547dc098363443667609c272d1e4d15ee8 Author: Oleg Nesterov Date: Mon Mar 15 10:10:14 2010 +0100 sched: move_task_off_dead_cpu(): Remove retry logic The previous patch preserved the retry logic, but it looks unneeded. __migrate_task() can only fail if we raced with migration after we dropped the lock, but in this case the caller of set_cpus_allowed/etc must initiate migration itself if ->on_rq == T. We already fixed p->cpus_allowed, the changes in active/online masks must be visible to racer, it should migrate the task to online cpu correctly. Signed-off-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: <20100315091014.GA9138@redhat.com> Signed-off-by: Ingo Molnar commit 1445c08d06c5594895b4fae952ef8a457e89c390 Author: Oleg Nesterov Date: Mon Mar 15 10:10:10 2010 +0100 sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq() move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed lockless. We can race with set_cpus_allowed() running in parallel. Change it to take rq->lock around select_fallback_rq(). Note that it is not trivial to move this spin_lock() into select_fallback_rq(), we must recheck the task was not migrated after we take the lock and other callers do not need this lock. To avoid the races with other callers of select_fallback_rq() which rely on TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise. The owner of TASK_WAKING must update ->cpus_allowed and choose the correct CPU anyway, and the subsequent __migrate_task() is just meaningless because p->se.on_rq must be false. Alternatively, we could change select_task_rq() to take rq->lock right after it calls sched_class->select_task_rq(), but this looks a bit ugly. Also, change it to not assume irqs are disabled and absorb __migrate_task_irq(). Signed-off-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: <20100315091010.GA9131@redhat.com> Signed-off-by: Ingo Molnar commit 897f0b3c3ff40b443c84e271bef19bd6ae885195 Author: Oleg Nesterov Date: Mon Mar 15 10:10:03 2010 +0100 sched: Kill the broken and deadlockable cpuset_lock/cpuset_cpus_allowed_locked code This patch just states the fact the cpusets/cpuhotplug interaction is broken and removes the deadlockable code which only pretends to work. - cpuset_lock() doesn't really work. It is needed for cpuset_cpus_allowed_locked() but we can't take this lock in try_to_wake_up()->select_fallback_rq() path. - cpuset_lock() is deadlockable. Suppose that a task T bound to CPU takes callback_mutex. If cpu_down(CPU) happens before T drops callback_mutex stop_machine() preempts T, then migration_call(CPU_DEAD) tries to take cpuset_lock() and hangs forever because CPU is already dead and thus T can't be scheduled. - cpuset_cpus_allowed_locked() is deadlockable too. It takes task_lock() which is not irq-safe, but try_to_wake_up() can be called from irq. Kill them, and change select_fallback_rq() to use cpu_possible_mask, like we currently do without CONFIG_CPUSETS. Also, with or without this patch, with or without CONFIG_CPUSETS, the callers of select_fallback_rq() can race with each other or with set_cpus_allowed() pathes. The subsequent patches try to to fix these problems. Signed-off-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: <20100315091003.GA9123@redhat.com> Signed-off-by: Ingo Molnar commit 25c2d55c00c6097e6792ebf21e31342f23b9b768 Author: Li Zefan Date: Wed Mar 24 13:17:50 2010 +0800 sched: Remove USER_SCHED from documentation USER_SCHED has been removed, so update the documentation accordingly. Signed-off-by: Li Zefan Signed-off-by: Peter Zijlstra Acked-by: Serge E. Hallyn LKML-Reference: <4BA9A07E.8070508@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 32bd7eb5a7f4596c8440dd9440322fe9e686634d Author: Li Zefan Date: Wed Mar 24 13:17:19 2010 +0800 sched: Remove remaining USER_SCHED code This is left over from commit 7c9414385e ("sched: Remove USER_SCHED"") Signed-off-by: Li Zefan Acked-by: Dhaval Giani Signed-off-by: Peter Zijlstra Cc: David Howells LKML-Reference: <4BA9A05F.7010407@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit c9494727cf293ae2ec66af57547a3e79c724fec2 Merge: 6427462 42be79e Author: Ingo Molnar Date: Fri Apr 2 20:02:55 2010 +0200 Merge branch 'linus' into sched/core Merge reason: update to latest upstream Signed-off-by: Ingo Molnar commit 40b91cd10f000b4c4934e48e2e5c0bec66def144 Author: Peter Zijlstra Date: Mon Mar 29 16:37:17 2010 +0200 perf, x86: Add Nehalem programming quirk to Westmere According to the Xeon-5600 errata the Westmere suffers the same PMU programming bug as the original Nehalem did. Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit caaa8be3b6707cb9664e573a28b00f845ce9f32e Author: Peter Zijlstra Date: Mon Mar 29 13:09:53 2010 +0200 perf, x86: Fix __initconst vs const All variables that have __initconst should also be const. Suggested-by: Stephen Rothwell Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit b4cdc5c264b35c67007800dec3928e9547a9d70b Author: Peter Zijlstra Date: Tue Mar 30 17:00:06 2010 +0200 perf, x86: Fix up the ANY flag stuff Stephane noticed that the ANY flag was in generic arch code, and Cyrill reported that it broke the P4 code. Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and provide intel_pmu and amd_pmu specific versions of this callback. The intel_pmu one deals with the ANY flag, the amd_pmu adds the few extra event bits AMD64 has. Reported-by: Stephane Eranian Reported-by: Cyrill Gorcunov Acked-by: Robert Richter Acked-by: Cyrill Gorcunov Acked-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <1269968113.5258.442.camel@laptop> Signed-off-by: Ingo Molnar commit a098f4484bc7dae23f5b62360954007b99b64600 Author: Robert Richter Date: Tue Mar 30 11:28:21 2010 +0200 perf, x86: implement ARCH_PERFMON_EVENTSEL bit masks ARCH_PERFMON_EVENTSEL bit masks are often used in the kernel. This patch adds macros for the bit masks and removes local defines. The function intel_pmu_raw_event() becomes x86_pmu_raw_event() which is generic for x86 models and same also for p6. Duplicate code is removed. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <20100330092821.GH11907@erda.amd.com> Signed-off-by: Ingo Molnar commit 948b1bb89a44561560531394c18da4a99215f772 Author: Robert Richter Date: Mon Mar 29 18:36:50 2010 +0200 perf, x86: Undo some some *_counter* -> *_event* renames The big rename: cdd6c48 perf: Do the big rename: Performance Counters -> Performance Events accidentally renamed some members of stucts that were named after registers in the spec. To avoid confusion this patch reverts some changes. The related specs are MSR descriptions in AMD's BKDGs and the ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32 Architectures Software Developer's Manuals. This patch does: $ sed -i -e 's:num_events:num_counters:g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/oprofile/op_model_ppro.c $ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1269880612-25800-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit ec5e61aabeac58670691bd0613388d16697d0d81 Merge: 75ec5a2 8bb39f9 Author: Ingo Molnar Date: Fri Apr 2 19:37:50 2010 +0200 Merge branch 'perf/urgent' into perf/core Conflicts: arch/x86/kernel/cpu/perf_event.c Merge reason: Resolve the conflict, pick up fixes Signed-off-by: Ingo Molnar commit 75ec5a245c7763c397f31ec8964d0a46c54a7386 Author: Masami Hiramatsu Date: Fri Apr 2 12:50:59 2010 -0400 perf probe: Fix to close dwarf when failing to analyze it Fix to close libdw routine when failing to analyze it in find_perf_probe_point(). Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: systemtap Cc: DLE LKML-Reference: <20100402165059.23551.95587.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 12e5a7ae475ccb2733d740ffb95d9ca0a18392da Author: Masami Hiramatsu Date: Fri Apr 2 12:50:53 2010 -0400 perf probe: Correct error message for non-structure type perf probe outputs incorrect error message when it is called with non-existent field on a non-data structure local variable. # perf probe vfs_read 'count.hoge' Fatal: Structure on a register is not supported yet. # perf probe vfs_read 'count->hoge' Fatal: Semantic error: hoge must be referred by '.' This corrects the messsage. # perf probe vfs_read 'count.hoge' Fatal: count is not a data structure. # perf probe vfs_read 'count->hoge' Fatal: count is not a data structure. Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: systemtap Cc: DLE LKML-Reference: <20100402165052.23551.75866.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit c9e385826d4f1ca5a72005ab8503598f791a8dc0 Author: Masami Hiramatsu Date: Fri Apr 2 12:50:45 2010 -0400 perf probe: Fix not to return non-matched file Fix cu_find_realpath() not to return the last file path if that is not matched to input pattern. Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: systemtap Cc: DLE LKML-Reference: <20100402165045.23551.47780.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 085ea739adf107b5a5d131f3625e517ff4a5181e Author: Masami Hiramatsu Date: Fri Apr 2 12:50:39 2010 -0400 perf probe: Fix --line syntax help and document Just fix typos. --line option accepts ':START-END' syntax, not ':START:END'. Signed-off-by: Masami Hiramatsu Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Frederic Weisbecker Cc: systemtap Cc: DLE LKML-Reference: <20100402165038.23551.62590.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit ff0ff84a0767df48d728c36510365344a7e7d582 Author: Steven Rostedt Date: Wed Mar 31 22:11:42 2010 -0400 ring-buffer: Add lost event count to end of sub buffer Currently, binary readers of the ring buffer only know where events were lost, but not how many events were lost at that location. This information is available, but it would require adding another field to the sub buffer header to include it. But when a event can not fit at the end of a sub buffer, it is written to the next sub buffer. This means there is a good chance that the buffer may have room to hold this counter. If it does, write the counter at the end of the sub buffer and set another flag in the data size field that states that this information exists. Signed-off-by: Steven Rostedt commit bc21b478425ac73f66a5ec0b375a5e0d12d609ce Author: Steven Rostedt Date: Wed Mar 31 19:49:26 2010 -0400 tracing: Show the lost events in the trace_pipe output Now that the ring buffer can keep track of where events are lost. Use this information to the output of trace_pipe: hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock CPU:1 [LOST 673 EVENTS] hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738 hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem Even works with the function graph tracer: 2) ! 170.098 us | } 2) 4.036 us | rcu_irq_exit(); 2) 3.657 us | idle_cpu(); 2) ! 190.301 us | } CPU:2 [LOST 2196 EVENTS] 2) 0.853 us | } /* cancel_dirty_page */ 2) | remove_from_page_cache() { 2) 1.578 us | _raw_spin_lock_irq(); 2) | __remove_from_page_cache() { Note, it does not work with the iterator "trace" file, since it requires the use of consuming the page from the ring buffer to determine how many events were lost, which the iterator does not do. Signed-off-by: Steven Rostedt commit 66a8cb95ed04025664d1db4e952155ee1dccd048 Author: Steven Rostedt Date: Wed Mar 31 13:21:56 2010 -0400 ring-buffer: Add place holder recording of dropped events Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter Cc: Andi Kleen Cc: Li Zefan Cc: Arnaldo Carvalho de Melo Cc: "Luis Claudio R. Goncalves" Cc: Frederic Weisbecker Signed-off-by: Steven Rostedt commit eb0c53771fb2f5f66b0edb3ebce33be4bbf1c285 Author: Steven Rostedt Date: Mon Mar 29 14:25:18 2010 -0400 tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set If modules are configured in the build but unloading of modules is not, then the refcnt is not defined. Place the get/put module tracepoints under CONFIG_MODULE_UNLOAD since it references this field in the module structure. As a side-effect, this patch also reduces the code when MODULE_UNLOAD is not set, because these unused tracepoints are not created. Signed-off-by: Steven Rostedt commit ae832d1e03ac9bf09fb8a07fb37908ab40c7cd0e Author: Li Zefan Date: Wed Mar 24 10:57:43 2010 +0800 tracing: Remove side effect from module tracepoints that caused a GPF Remove the @refcnt argument, because it has side-effects, and arguments with side-effects are not skipped by the jump over disabled instrumentation and are executed even when the tracepoint is disabled. This was also causing a GPF as found by Randy Dunlap: Subject: 2.6.33 GP fault only when built with tracing LKML-Reference: <4BA2B69D.3000309@oracle.com> Note, the current 2.6.34-rc has a fix for the actual cause of the GPF, but this fixes one of its triggers. Tested-by: Randy Dunlap Acked-by: Mathieu Desnoyers Signed-off-by: Li Zefan LKML-Reference: <4BA97FA7.6040406@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 50354a8a28d0c91695a2d6d25b5a821bfe557a07 Author: Li Zefan Date: Wed Mar 24 10:58:24 2010 +0800 tracing: Update comments Make some comments consistent with the code. Signed-off-by: Li Zefan LKML-Reference: <4BA97FD0.7090202@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 4bdde044dc36ac7b01f7502394d52619af9d1927 Author: Li Zefan Date: Wed Mar 24 10:58:05 2010 +0800 tracing: Convert some signal events to DEFINE_TRACE Use DECLARE_EVENT_CLASS to remove duplicate code: text data bss dec hex filename 23639 6084 8 29731 7423 kernel/signal.o.orig 22727 6084 8 28819 7093 kernel/signal.o 2 events are converted: signal_queue_overflow: signal_overflow_fail, signal_lose_info No functional change. Acked-by: Masami Hiramatsu Signed-off-by: Li Zefan LKML-Reference: <4BA97FBD.8070703@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 1fb2f77c037624601fd214fb7c29faa84cd7bdd7 Author: Henrik Kretzschmar Date: Fri Mar 26 20:38:35 2010 +0100 debugobjects: Section mismatch cleanup This patch marks two functions, which only get called at initialization, as __init. Here is also interesting, that modpost doesn't catch here the right function name. WARNING: lib/built-in.o(.text+0x585f): Section mismatch in reference from the function T.506() to the variable .init.data:obj The function T.506() references the variable __initdata obj. This is often because T.506 lacks a __initdata annotation or the annotation of obj is wrong. Signed-off-by: Henrik Kretzschmar LKML-Reference: <1269632315-19403-1-git-send-email-henne@nachtwindheim.de> Signed-off-by: Thomas Gleixner commit 11164cd4f6dab326a88bdf27f2f8f7c11977e91a Author: Peter Zijlstra Date: Fri Mar 26 14:08:44 2010 +0100 perf, x86: Add Nehelem PMU programming errata workaround Implement the workaround for Intel Errata AAK100 and AAP53. Also, remove the Core-i7 name for Nehalem events since there are also Westmere based i7 chips. Signed-off-by: Peter Zijlstra Cc: Stephane Eranian LKML-Reference: <1269608924.12097.147.camel@laptop> Signed-off-by: Ingo Molnar commit ea8e61b7bbc4a2faef77db34eb2db2a2c2372ff6 Author: Peter Zijlstra Date: Thu Mar 25 14:51:51 2010 +0100 x86, ptrace: Fix block-step Implement ptrace-block-step using TIF_BLOCKSTEP which will set DEBUGCTLMSR_BTF when set for a task while preserving any other DEBUGCTLMSR bits. Signed-off-by: Peter Zijlstra LKML-Reference: <20100325135414.017536066@chello.nl> Signed-off-by: Ingo Molnar commit faa4602e47690fb11221e00f9b9697c8dc0d4b19 Author: Peter Zijlstra Date: Thu Mar 25 14:51:50 2010 +0100 x86, perf, bts, mm: Delete the never used BTS-ptrace code Support for the PMU's BTS features has been upstreamed in v2.6.32, but we still have the old and disabled ptrace-BTS, as Linus noticed it not so long ago. It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without regard for other uses (perf) and doesn't provide the flexibility needed for perf either. Its users are ptrace-block-step and ptrace-bts, since ptrace-bts was never used and ptrace-block-step can be implemented using a much simpler approach. So axe all 3000 lines of it. That includes the *locked_memory*() APIs in mm/mlock.c as well. Reported-by: Linus Torvalds Signed-off-by: Peter Zijlstra Cc: Roland McGrath Cc: Oleg Nesterov Cc: Markus Metzger Cc: Steven Rostedt Cc: Andrew Morton LKML-Reference: <20100325135413.938004390@chello.nl> Signed-off-by: Ingo Molnar commit 7c5ecaf7666617889f337296c610815b519abfa9 Author: Peter Zijlstra Date: Thu Mar 25 14:51:49 2010 +0100 perf, x86: Clean up debugctlmsr bit definitions Move all debugctlmsr thingies into msr-index.h Signed-off-by: Peter Zijlstra LKML-Reference: <20100325135413.861425293@chello.nl> Signed-off-by: Ingo Molnar commit 5a10317483f606106395814ee2fdaa2f1256a3b3 Author: Zhang, Yanmin Date: Thu Mar 25 19:59:01 2010 -0300 perf record: Zero out mmap_array to fix segfault Reported-by: Li Zefan Tested-by: Li Zefan Signed-off-by: Zhang Yanmin Signed-off-by: Arnaldo Carvalho de Melo LKML-Reference: <1269557941-15617-6-git-send-email-acme@infradead.org> Cc: Signed-off-by: Ingo Molnar commit 5aab621b7bf024608f0c089e21656e7fe875a150 Author: Arnaldo Carvalho de Melo Date: Thu Mar 25 19:59:00 2010 -0300 perf symbols: Move hex2u64 and strxfrchar to symbol.c Mostly used in symbol.c so move them there to reduce the number of files needed to use the symbol system. Also do some header adjustments with the same intent. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269557941-15617-5-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 618038df3588fdfcaccfd40057f36ce792bee252 Author: Arnaldo Carvalho de Melo Date: Thu Mar 25 19:58:59 2010 -0300 perf tools: Move __used from perf.h to linux/compiler.h Just like in the kernel and also to remove the need to include perf.h in the symbol subsystem. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269557941-15617-4-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 4b8cf84624e9a58a21aaac3d064222092ae234e0 Author: Arnaldo Carvalho de Melo Date: Thu Mar 25 19:58:58 2010 -0300 perf symbols: Move map related routines to map.c Thru series of refactorings functions were being renamed but not moved to map.c to reduce patch noise, now lets have them in the same place so that use of the symbol system by tools can be constrained to building and linking fewer source files: symbol.c, map.c and rbtree.c. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269557941-15617-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit b177f63f5226e75280855bbcd106e677250778bd Author: Arnaldo Carvalho de Melo Date: Thu Mar 25 19:58:57 2010 -0300 perf symbols: Pass the mmap parameters instead of using mmap_event To reduce the coupling of the symbol system with the rest of perf. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269557941-15617-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit d5679ae4d2d4369477dc3b60027cca222aef33e9 Author: Arnaldo Carvalho de Melo Date: Wed Mar 24 16:40:19 2010 -0300 perf report: Pass the DSO to 'perf annotate' So that we ensure that the symbol asked for annotation really is in the DSO we are interested in. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269459619-982-6-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit b3c9ac0846c654dea4df095999ee202e8b4cb253 Author: Arnaldo Carvalho de Melo Date: Wed Mar 24 16:40:18 2010 -0300 perf callchains: Store the map together with the symbol We need this to know where a symbol in a callchain came from, for various reasons, among them precise annotation from a TUI/GUI tool. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269459619-982-5-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 59fd53062f71011a68d03f4cd0ba93d822ac3249 Author: Arnaldo Carvalho de Melo Date: Wed Mar 24 16:40:17 2010 -0300 perf tools: Introduce struct map_symbol That will be in both struct hist_entry and struct callchain_list, so that the TUI can store a pointer to the pair (map, symbol) in the trees where hist_entries and callchain_lists are present, to allow precise annotation instead of looking for the first symbol with the selected name. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269459619-982-4-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit ac73c5a9c1767b2771e6d2b5accafdef89db04c2 Author: Arnaldo Carvalho de Melo Date: Wed Mar 24 16:40:16 2010 -0300 perf annotate: Allow specifying DSOs where to look for symbol Using the same parameter as in 'perf report', allowing to specify just one and disambiguate between DSOs that have the symbol of interest. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269459619-982-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 96415e4d3f5fdf9cdb12eedfcbc58152b1e1458c Author: Arnaldo Carvalho de Melo Date: Wed Mar 24 16:40:15 2010 -0300 perf symbols: Avoid unnecessary symbol loading when dso list is specified We were performing the full thread__find_addr_location operation, i.e. resolving to a map/dso _and_ loading its symbols when we can optimize it by first calling thread__find_addr_map to find just the map/dso, check if it is one that we are interested in (passed via --dsos/-d in 'perf annotate', 'perf report', etc) and if not avoid loading the symtab. Nice speedup when we know which DSO we're interested in. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269459619-982-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 53c540195724b52422da067a31ef6916d2c70202 Author: Arnaldo Carvalho de Melo Date: Wed Mar 24 16:40:14 2010 -0300 perf report: Add a popup menu to ask what operation is to be performed Right now it presents a menu with these options: +------------------------------+ | Annotate CURRENT_SYMBOL_NAME | | Exit | +------------------------------+ If the highlighted (current) symbol is not annotatable only the "Exit" option will appear. Also add a confirmation dialog when ESC is pressed on the top level to avoid exiting the application by pressing one too many ESC key. To get to the menu just press the -> (Right navigation key), to exit just press ESC. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269459619-982-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit d814f30105798b6677ecb73ed61d691ff96dada9 Author: Cyrill Gorcunov Date: Wed Mar 24 12:09:26 2010 +0800 x86, perf: Add raw events support for the P4 PMU The adding of raw event support lead to complete code refactoring. I hope is became more readable then it was. The list of changes: 1) The 64bit config field is enough to hold all information we need to track event details. To achieve it we used *own* enum for events selection in ESCR register and map this key into proper value at moment of event enabling. For the same reason we use 12LSB bits in CCCR register -- to track which exactly cache trace event was requested. And we cear this bits at real 'write' moment. 2) There is no per-cpu area reserved for P4 PMU anymore. We don't need it. All is held by config. 3) Now we may use any available counter, ie we try to grab any possible counter. v2: - Lin Ming reported the lack of ESCR selector in CCCR for cache events v3: - Don't loose cache event codes at config unpacking procedure, we may need it one day so no obscure hack behind our back, better to clear reserved bits explicitly when needed (thanks Ming for pointing out) - Lin Ming fixed misplaced opcodes in cache events Signed-off-by: Cyrill Gorcunov Tested-by: Lin Ming Signed-off-by: Lin Ming Cc: Arnaldo Carvalho de Melo Cc: Stephane Eranian Cc: Robert Richter Cc: Frederic Weisbecker Cc: Cyrill Gorcunov Cc: Peter Zijlstra LKML-Reference: <1269403766.3409.6.camel@minggr.sh.intel.com> [ v4: did a few whitespace fixlets ] Signed-off-by: Ingo Molnar commit 88978e562302c836c1c4597700c79d971e93abc0 Author: Arnaldo Carvalho de Melo Date: Tue Mar 23 14:33:58 2010 -0300 perf archive: Explain how to use the generated tarball [root@doppio ~]# perf archive Now please run: $ tar xvf perf.data.tar.bz2 -C ~/.debug wherever you need to run 'perf report' on. [root@doppio ~]# Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269365638-10223-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 4ded2b250f1fbba4e414d17dc55ee513485c0aa1 Author: Arnaldo Carvalho de Melo Date: Mon Mar 22 17:52:49 2010 -0300 perf report: Implement Newt callgraphs Starts collapsed, allows annotating by pressing 'A' or 'a' on the symbol, be it the top level one or any of the symbols in the chains. It (ab)uses the only tree widget in newt, that is actually a checkbox tree that we use with just one option ('.'), end result is usable but we really need to create a custom widget tree so that we can use the data structures we have (hist_entry rb_tree + callchain rb_tree + lists), so that we reduce the memory footprint by not creating a mirror set of data structures in the newtCheckboxTree widget. Thanks to Frédéric Weisbacker for fixing the orphanage problem in 301fde2, without that we were tripping a newt bug (fix already sent to newt's maintainer). Signed-off-by: Arnaldo Carvalho de Melo Cc: Avi Kivity Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269291169-29820-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 478b0973bf8c90db3677fbb8d812e2bdefc43d9b Author: Arnaldo Carvalho de Melo Date: Mon Mar 22 13:10:29 2010 -0300 perf tools: Exit browser before printing usage when unkown option passed If not the screen will get garbled when using newt. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269274229-20442-5-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 084ab9f862416b2ddb4bb9804884de19bf09774d Author: Arnaldo Carvalho de Melo Date: Mon Mar 22 13:10:28 2010 -0300 perf stat: Better report failure to collect system wide stats Before: [acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s Performance counter stats for 'sleep 1s': task-clock-msecs context-switches CPU-migrations page-faults cycles instructions branches branch-misses cache-references cache-misses 1.016998463 seconds time elapsed [acme@doppio linux-2.6-tip]$ Now: [acme@doppio linux-2.6-tip]$ perf stat -a sleep 1s No permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid. [acme@doppio linux-2.6-tip]$ Reported-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269274229-20442-4-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit ca721e45b39209415d2288dbac3667b26d9d1def Author: Masami Hiramatsu Date: Mon Mar 22 13:10:27 2010 -0300 perf probe: Add NO_DWARF make option Add NO_DWARF make option for testing build without libdw. Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <1269274229-20442-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 4b4da7f76660ea8b5aa45615165c48f62167ffa8 Author: Masami Hiramatsu Date: Mon Mar 22 13:10:26 2010 -0300 perf probe: Cleanup debuginfo related code Cleanup debuginfo related code to eliminate fragile code which pointed by Ingo (Thanks!). 1) Invert logic of NO_DWARF_SUPPORT to DWARF_SUPPORT. 2) For removing assymetric/local variable ifdefs, introduce more helper functions. 3) Change options order to reduce the number of ifdefs. Reported-by: Ingo Molnar Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <1269274229-20442-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit f3a1f0ea9432ec395cd112f42201e8e523c07bc5 Author: Arnaldo Carvalho de Melo Date: Mon Mar 22 13:10:25 2010 -0300 perf newt: Properly restore the screen when error exiting Show an OK message box with the last message sent via pr_err, etc. Reported-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1269274229-20442-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 301fde27c7fcd0380b02b175d547e894ff65d78a Author: Frederic Weisbecker Date: Mon Mar 22 13:09:33 2010 -0300 perf: Fix orphan callchain branches Callchains have markers inside their capture to tell we enter a context (kernel, user, ...). Those are not displayed in the callchains but they are incidentally an active part of the radix tree where callchains are stored, just like any other address. If we have the two following callchains: addr1 -> addr2 -> user context -> addr3 addr1 -> addr2 -> user context -> addr4 addr1 -> addr2 -> addr 5 This is pretty common if addr1 and addr2 are part of an interrupt path, addr3 and addr4 are user addresses and addr5 is a kernel non interrupt path. This will be stored as follows in the tree: addr1 addr2 / \ / addr5 user context / \ addr3 addr4 But we ignore the context markers in the report, hence the addr3 and addr4 will appear as orphan branches: |--28.30%-- hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | | <------------- here, no parent! | | | | | |--11.11%-- 0x7fae7bccb875 | | | | | |--11.11%-- 0xffffffffff60013b | | | | | |--11.11%-- __pthread_mutex_lock_internal | | | | | |--11.11%-- __errno_location Fix this by removing the context markers when we process the callchains to the tree. Reported-by: Arnaldo Carvalho de Melo Signed-off-by: Frederic Weisbecker Cc: Paul Mackerras Cc: Peter Zijlstra Signed-off-by: Arnaldo Carvalho de Melo LKML-Reference: <1269274173-20328-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit d2f1e15b661e71fd52111f51c99a6ce41384e9ef Merge: 40b7e05 220bf99 Author: Ingo Molnar Date: Mon Mar 22 18:46:57 2010 +0100 Merge commit 'v2.6.34-rc2' into perf/core Merge reason: Pick up latest perf fixes from upstream. Signed-off-by: Ingo Molnar commit 4bd96a7a8185755b091233b16034c7436cbf57af Author: Shane Wang Date: Wed Mar 10 14:36:10 2010 +0800 x86, tboot: Add support for S3 memory integrity protection This patch adds support for S3 memory integrity protection within an Intel(R) TXT launched kernel, for all kernel and userspace memory. All RAM used by the kernel and userspace, as indicated by memory ranges of type E820_RAM and E820_RESERVED_KERN in the e820 table, will be integrity protected. The MAINTAINERS file is also updated to reflect the maintainers of the TXT-related code. All MACing is done in tboot, based on a complexity analysis and tradeoff. v3: Compared with v2, this patch adds a check of array size in tboot.c, and a note to specify which c/s of tboot supports this kind of MACing in intel_txt.txt. Signed-off-by: Shane Wang LKML-Reference: <4B973DDA.6050902@intel.com> Signed-off-by: Joseph Cihula Acked-by: Pavel Machek Acked-by: Rafael J. Wysocki Signed-off-by: H. Peter Anvin commit 40b7e05e17eef31ff30fe08dfc2424ef653a792c Author: Lin Ming Date: Fri Mar 19 15:28:58 2010 +0800 perf, x86: Fix key indexing in Pentium-4 PMU Index 0-6 in p4_templates are reserved for common hardware events. So p4_templates is arranged as below: 0 - 6: common hardware events 7 - N: cache events N+1 - ...: other raw events Reported-by: Cyrill Gorcunov Signed-off-by: Lin Ming Acked-by: Cyrill Gorcunov Cc: Peter Zijlstra LKML-Reference: <1268983738.13901.142.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar commit 9c8c6bad3137112d2c7bf3d215b736ee4215fa74 Author: Cyrill Gorcunov Date: Fri Mar 19 00:12:56 2010 +0300 x86, perf: Fix few cosmetic dabs for P4 pmu (comments and constantify) - A few ESCR have escaped fixing at previous attempt. - p4_escr_map is read only, make it const. Nothing serious. Signed-off-by: Cyrill Gorcunov Cc: Lin Ming LKML-Reference: <20100318211256.GH5062@lenovo> Signed-off-by: Ingo Molnar commit 4b24a88b35e15e04bd8f2c5dda65b5dc8ebca05f Author: Stephane Eranian Date: Wed Mar 17 23:21:01 2010 +0200 perf_events: Fix resource leak in x86 __hw_perf_event_init() If reserve_pmc_hardware() succeeds but reserve_ds_buffers() fails, then we need to release_pmc_hardware. It won't be done by the destroy() callback because we return before setting it in case of error. Signed-off-by: Stephane Eranian Cc: Cc: peterz@infradead.org Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: robert.richter@amd.com Cc: perfmon2-devel@lists.sf.net LKML-Reference: <4ba1568b.15185e0a.182a.7802@mx.google.com> Signed-off-by: Ingo Molnar -- arch/x86/kernel/cpu/perf_event.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) commit cb7d6b5053e86598735d9af19930f5929f007b7f Author: Lin Ming Date: Thu Mar 18 18:33:12 2010 +0800 perf, x86: Add cache events for the Pentium-4 PMU Move the HT bit setting code from p4_pmu_event_map to p4_hw_config. So the cache events can get HT bit set correctly. Tested on my P4 desktop, below 6 cache events work: L1-dcache-load-misses LLC-load-misses dTLB-load-misses dTLB-store-misses iTLB-loads iTLB-load-misses Signed-off-by: Lin Ming Reviewed-by: Cyrill Gorcunov Cc: Peter Zijlstra LKML-Reference: <1268908392.13901.128.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar commit f34edbc1cdb0f8f83d94e1d668dd6e41abf0defb Author: Lin Ming Date: Thu Mar 18 18:33:07 2010 +0800 perf, x86: Add a key to simplify template lookup in Pentium-4 PMU Currently, we use opcode(Event and Event-Selector) + emask to look up template in p4_templates. But cache events (L1-dcache-load-misses, LLC-load-misses, etc) use the same event(P4_REPLAY_EVENT) to do the counting, ie, they have the same opcode and emask. So we can not use current lookup mechanism to find the template for cache events. This patch introduces a "key", which is the index into p4_templates. The low 12 bits of CCCR are reserved, so we can hide the "key" in the low 12 bits of hwc->config. We extract the key from hwc->config and then quickly find the template. Signed-off-by: Lin Ming Reviewed-by: Cyrill Gorcunov Cc: Peter Zijlstra LKML-Reference: <1268908387.13901.127.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar commit 55632770d7298835645489828af87f854c47749c Author: Ingo Molnar Date: Thu Mar 18 16:51:16 2010 +0100 perf events: Fix false positive build warning with older GCC's gcc 4.2.1 produces: util/probe-event.c: In function 'add_perf_probe_events': util/probe-event.c:883: warning: 'tev' may be used uninitialized in this function make: *** [util/probe-event.o] Error 1 Newer GCCs get this right. To work it around, initialize the variable to NULL so that older GCCs see it as initialized too. Cc: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220612.32050.33806.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 7335f75e9ca166044e38a96abad422d8e6e364b5 Author: Cyrill Gorcunov Date: Wed Mar 17 13:37:01 2010 +0300 x86, perf: Use apic_write unconditionally Since apic_write() maps to a plain noop in the !CONFIG_X86_LOCAL_APIC case we're safe to remove this conditional compilation and clean up the code a bit. Signed-off-by: Cyrill Gorcunov Cc: fweisbec@gmail.com Cc: acme@redhat.com Cc: eranian@google.com Cc: peterz@infradead.org LKML-Reference: <20100317104356.232371479@openvz.org> Signed-off-by: Ingo Molnar commit d674cd1963129b70bc5f631c51fb30fb73213fb2 Author: Cyrill Gorcunov Date: Wed Mar 17 13:37:00 2010 +0300 x86, apic: Allow to use certain functions without APIC built-in support In case even if the kernel is configured so that no APIC support is built-in we still may allow to use certain apic functions as dummy calls. In particular we start using it in perf-events code. Note that this is not that same as NOOP apic driver (which is used if APIC support is present but no physical APIC is available), this is for the case when we don't have apic code compiled in at all. Signed-off-by: Cyrill Gorcunov Cc: H. Peter Anvin Cc: Yinghai Lu Cc: Yinghai Lu LKML-Reference: <20100317104356.011052632@openvz.org> Signed-off-by: Ingo Molnar commit d6d901c23a9c4c7361aa901b5b2dda69703dd5e0 Author: Zhang, Yanmin Date: Thu Mar 18 11:36:05 2010 -0300 perf events: Change perf parameter --pid to process-wide collection instead of thread-wide Parameter --pid (or -p) of perf currently means a thread-wide collection. For exmaple, if a process whose id is 8888 has 10 threads, 'perf top -p 8888' just collects the main thread statistics. That's misleading. Users are used to attach a whole process when debugging a process by gdb. To follow normal usage style, the patch change --pid to process-wide collection and add --tid (-t) to mean a thread-wide collection. Usage example is: # perf top -p 8888 # perf record -p 8888 -f sleep 10 # perf stat -p 8888 -f sleep 10 Above commands collect the statistics of all threads of process 8888. Signed-off-by: Zhang Yanmin Signed-off-by: Arnaldo Carvalho de Melo Cc: Avi Kivity Cc: Peter Zijlstra Cc: Sheng Yang Cc: Joerg Roedel Cc: Jes Sorensen Cc: Marcelo Tosatti Cc: Gleb Natapov Cc: zhiteng.huang@intel.com Cc: Zachary Amsden LKML-Reference: <1268922965-14774-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 46be604b5ba738d53e5f5314813a4e7092864baf Author: Zhang, Yanmin Date: Thu Mar 18 11:36:04 2010 -0300 perf record: Enable counters only when kernel is execing subcommand 'perf record' starts counters before subcommand is execed, so the statistics is not precise because it includes data of some preparation steps. I fix it with the patch. In addition, change the condition to fork/exec subcommand. If there is a subcommand parameter, perf always fork/exec it. The usage example is: # perf record -f -a sleep 10 So this command could collect statistics for 10 seconds precisely. User still could stop it by CTRL+C. Without the new capability, user could only input CTRL+C to stop it without precise time clock. Signed-off-by: Zhang Yanmin Signed-off-by: Arnaldo Carvalho de Melo Cc: Avi Kivity Cc: Peter Zijlstra Cc: Sheng Yang Cc: oerg Roedel Cc: Jes Sorensen Cc: Marcelo Tosatti Cc: Gleb Natapov Cc: Cc: Zachary Amsden LKML-Reference: <1268922965-14774-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 6be2850effd6a8bae11d623c8c52e88d2fbc0e96 Author: Zhang, Yanmin Date: Thu Mar 18 11:36:03 2010 -0300 perf stat: Enable counters when collecting process-wide or system-wide data Command 'perf stat' doesn't enable counters when collecting an existing (by -p) process or system-wide statistics. Fix the issue. Change the condition of fork/exec subcommand. If there is a subcommand parameter, perf always forks/execs it. The usage example is: # perf stat -a sleep 10 So this command could collect statistics for 10 seconds precisely. User still could stop it by CTRL+C. Without the new capability, user could only use CTRL+C to stop it without precise time clock. Another issue is 'perf stat -a' consumes 100% time of a full single logical cpu. It has a bad impact on running workload. Fix it by adding a sleep(1) in the while(!done) loop in function run_perf_stat. Signed-off-by: Zhang Yanmin Signed-off-by: Arnaldo Carvalho de Melo Cc: Avi Kivity Cc: Peter Zijlstra Cc: Sheng Yang Cc: Marcelo Tosatti Cc: Joerg Roedel Cc: Jes Sorensen Cc: Gleb Natapov Cc: Zachary Amsden Cc: LKML-Reference: <1268922965-14774-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 2acebe9ecb2b77876e87a1480729cfb2db4570dd Author: Jack Steiner Date: Wed Mar 17 10:40:38 2010 -0500 x86, UV: Delete unneeded boot messages SGI:UV: Delete extra boot messages that describe the system topology. These messages are no longer useful. Signed-off-by: Jack Steiner LKML-Reference: <20100317154038.GA29346@sgi.com> Signed-off-by: Ingo Molnar commit d6dc0b4ead6e8720096ecfa3d9e899b47ddbc8ed Author: Robert Richter Date: Wed Mar 17 12:49:13 2010 +0100 perf/core, x86: Remove duplicate perf_event_mask variable The same information is stored also in x86_pmu.intel_ctrl. This patch removes perf_event_mask and instead uses x86_pmu.intel_ctrl directly. Signed-off-by: Robert Richter Cc: Stephane Eranian Cc: Peter Zijlstra LKML-Reference: <1268826553-19518-5-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 10f1014d86fd4fe5087080d609b51183396c5e4c Author: Robert Richter Date: Wed Mar 17 12:49:12 2010 +0100 perf/core, x86: Remove cpu_hw_events.interrupts This member in the struct is not used anymore and can be removed. Signed-off-by: Robert Richter Cc: Stephane Eranian Cc: Peter Zijlstra LKML-Reference: <1268826553-19518-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 141c4296cb630a7ed4c3730913bc3c0617ef9753 Author: Robert Richter Date: Wed Mar 17 12:49:11 2010 +0100 perf/core: Correct files in MAINTAINERS entry This corrects the file entries for perf_events. The following files are caught now: $ xargs | eval ls $(cat) | sort -u kernel/perf_event*.c include/linux/perf_event.h arch/*/kernel/perf_event*.c arch/*/kernel/*/perf_event*.c arch/*/kernel/*/*/perf_event*.c arch/*/include/asm/perf_event.h arch/*/lib/perf_event*.c arch/*/kernel/perf_callchain.c arch/alpha/include/asm/perf_event.h arch/arm/include/asm/perf_event.h arch/arm/kernel/perf_event.c arch/frv/include/asm/perf_event.h arch/frv/lib/perf_event.c arch/parisc/include/asm/perf_event.h arch/powerpc/include/asm/perf_event.h arch/powerpc/kernel/perf_callchain.c arch/powerpc/kernel/perf_event.c arch/s390/include/asm/perf_event.h arch/sh/include/asm/perf_event.h arch/sh/kernel/cpu/sh4a/perf_event.c arch/sh/kernel/cpu/sh4/perf_event.c arch/sh/kernel/perf_callchain.c arch/sh/kernel/perf_event.c arch/sparc/include/asm/perf_event.h arch/sparc/kernel/perf_event.c arch/x86/include/asm/perf_event.h arch/x86/kernel/cpu/perf_event_amd.c arch/x86/kernel/cpu/perf_event.c arch/x86/kernel/cpu/perf_event_intel.c arch/x86/kernel/cpu/perf_event_p6.c include/linux/perf_event.h kernel/perf_event.c Signed-off-by: Robert Richter Cc: Stephane Eranian Cc: Peter Zijlstra LKML-Reference: <1268826553-19518-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit b27ea29c6267889be255f2217fa7a6106e6a8b04 Author: Robert Richter Date: Wed Mar 17 12:49:10 2010 +0100 perf/core, x86: Reduce number of CONFIG_X86_LOCAL_APIC macros The function reserve_pmc_hardware() and release_pmc_hardware() were hard to read. This patch improves readability of the code by removing most of the CONFIG_X86_LOCAL_APIC macros. Signed-off-by: Robert Richter Cc: Stephane Eranian Cc: Peter Zijlstra LKML-Reference: <1268826553-19518-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 3b0d516463f8deb897a55cb81e9dbbe58a2490ed Author: Ingo Molnar Date: Wed Mar 17 12:13:28 2010 +0100 perf probe: Fix !dwarf build Fix the !drawf build. This uses the existing NO_DWARF_SUPPORT mechanism we use for that, but it's really fragile and needs a cleanup. (in a separate patch) 1) Such uses: #ifndef NO_DWARF_SUPPORT are double inverted logic a'la 'not not'. Instead the flag should be called DWARF_SUPPORT. 2) Furthermore, assymetric #ifdef polluted code flow like: if (need_dwarf) #ifdef NO_DWARF_SUPPORT die("Debuginfo-analysis is not supported"); #else /* !NO_DWARF_SUPPORT */ pr_debug("Some probes require debuginfo.\n"); fd = open_vmlinux(); is very fragile and not acceptable. Instead of that helper functions should be created and the dwarf/no-dwarf logic should be separated more cleanly. 3) Local variable #ifdefs like this: #ifndef NO_DWARF_SUPPORT int fd; #endif Are fragile as well and should be eliminated. Helper functions achieve that too. Cc: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220612.32050.33806.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 7df2f32956cf0f1a45df38cd0e0fe0c3467580e8 Author: Masami Hiramatsu Date: Tue Mar 16 18:06:26 2010 -0400 perf probe: Add data structure member access support Support accessing members in the data structures. With this, perf-probe accepts data-structure members(IOW, it now accepts dot '.' and arrow '->' operators) as probe arguemnts. e.g. ./perf probe --add 'schedule:44 rq->curr' ./perf probe --add 'vfs_read file->f_op->read file->f_path.dentry' Note that '>' can be interpreted as redirection in command-line. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220626.32050.57552.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit fb1587d869a399554220e166d4b90b581a8ade01 Author: Masami Hiramatsu Date: Tue Mar 16 18:06:19 2010 -0400 perf probe: List probes with line number and file name Improve --list to show current exist probes with line number and file name. This enables user easily to check which line is already probed. for example: ./perf probe --list probe:vfs_read (on vfs_read:8@linux-2.6-tip/fs/read_write.c) Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220619.32050.48702.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 4235b0454ebeefc2295ad8417e18a8761425b19e Author: Masami Hiramatsu Date: Tue Mar 16 18:06:12 2010 -0400 perf probe: Introduce kprobe_trace_event and perf_probe_event Introduce kprobe_trace_event and perf_probe_event and replace old probe_point structure with it. probe_point structure is not enough flexible nor extensible. New data structures will help implementing further features. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220612.32050.33806.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit f4d7da499e4fc1fdff8f26fdeb1a058d475a7a6c Author: Masami Hiramatsu Date: Tue Mar 16 18:06:05 2010 -0400 perf probe: Add --dry-run option Add --dry-run option for debugging and testing. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220605.32050.6571.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 016f262e4fb10c6ecff709317098912f94a21efa Author: Masami Hiramatsu Date: Tue Mar 16 18:05:58 2010 -0400 perf probe: Introduce die_find_child() function Introduce die_find_child() function to integrate DIE-tree searching functions. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220558.32050.7905.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 95a3e4c4e21de1920a2ddb54bfc57c0af7e2561e Author: Masami Hiramatsu Date: Tue Mar 16 18:05:51 2010 -0400 perf probe: Rename some die_get_* functions Rename die_get_real_subprogram and die_get_inlinefunc to die_find_real_subprogram and die_find_inlinefunc respectively, because these functions search its children. After that, 'die_get_' means getting a property of that die, and 'die_find_' means searching DIE-tree to get an appropriate child die. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220551.32050.36181.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 12a1fadb41b5a6733c36b488b881fb19a28c92d3 Author: Masami Hiramatsu Date: Tue Mar 16 18:05:44 2010 -0400 perf probe: Rename session to param Since this name 'session' conflicts with 'perf_session', and this structure just holds parameters anymore. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220544.32050.8788.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit e0faa8d35845bb1893cf9e608a5a5d92e9390bf0 Author: Masami Hiramatsu Date: Tue Mar 16 18:05:37 2010 -0400 perf probe: Move add-probe routine to util/ Move add-probe routine to util/probe_event.c. This simplifies main routine for reducing maintenance cost. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220537.32050.72214.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 31facc5f1ac674fbcc29f212377e589396bb934c Author: Masami Hiramatsu Date: Tue Mar 16 18:05:30 2010 -0400 perf probe: Use wrapper functions Use wrapped functions as much as possible, to check out of memory conditions in perf probe. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220530.32050.53951.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit a1d37d5285bcda07f9c0b80a2634ca20ab545297 Author: Masami Hiramatsu Date: Tue Mar 16 18:05:21 2010 -0400 perf tools: Introduce xzalloc() for detecting out of memory conditions Introducing xzalloc() which wrapping zalloc() for detecting out of memory conditions. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Frederic Weisbecker Cc: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Mike Galbraith Cc: Peter Zijlstra LKML-Reference: <20100316220521.32050.85155.stgit@localhost6.localdomain6> [ -v2: small cleanups in surrounding code ] Signed-off-by: Ingo Molnar commit e4713e93b125497e9ba44d93de1bd9d8e5ad8946 Merge: 984763c a6b8457 Author: Ingo Molnar Date: Wed Mar 17 11:31:45 2010 +0100 Merge branch 'perf/urgent' into perf/core Merge reason: We'll be queueing dependent changes. Signed-off-by: Ingo Molnar commit a6b84574eed7e4fd8cb8dac2d0926fe2cf34b941 Author: Frederic Weisbecker Date: Tue Mar 16 01:05:02 2010 +0100 perf: Fix unexported generic perf_arch_fetch_caller_regs perf_arch_fetch_caller_regs() is exported for the overriden x86 version, but not for the generic weak version. As a general rule, weak functions should not have their symbol exported in the same file they are defined. So let's export it on trace_event_perf.c as it is used by trace events only. This fixes: ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined! ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined! -v2: And also only build it if trace events are enabled. -v3: Fix changelog mistake Reported-by: Stephen Rothwell Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Xiao Guangrong Cc: Paul Mackerras LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit 984763cb90d4b5444baa0c3e43feff7926bf1834 Author: Robert Richter Date: Tue Mar 16 17:07:33 2010 +0100 perf, x86: Report error code that returned from x86_pmu.hw_config() If x86_pmu.hw_config() fails a fixed error code (-EOPNOTSUPP) is returned even if a different error was reported. This patch fixes this. Signed-off-by: Robert Richter Acked-by: Cyrill Gorcunov Acked-by: Lin Ming Cc: acme@redhat.com Cc: eranian@google.com Cc: gorcunov@openvz.org Cc: peterz@infradead.org Cc: fweisbec@gmail.com LKML-Reference: <20100316160733.GR1585@erda.amd.com> Signed-off-by: Ingo Molnar commit 5cc718b9dad682329a60e73547c6e708faa5bbe4 Author: Masami Hiramatsu Date: Mon Mar 15 13:00:54 2010 -0400 kprobes: Hide CONFIG_OPTPROBES and set if arch supports optimized kprobes Hide CONFIG_OPTPROBES and set if the arch supports optimized kprobes (IOW, HAVE_OPTPROBES=y), since this option doesn't change the major behavior of kprobes, and workarounds for minor changes are documented. Signed-off-by: Masami Hiramatsu Cc: systemtap Cc: DLE Cc: Dieter Ries Cc: Ananth N Mavinakayanahalli Cc: OGAWA Hirofumi Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker LKML-Reference: <20100315170054.31593.3153.stgit@localhost6.localdomain6> Signed-off-by: Ingo Molnar commit 6427462bfa50f50dc6c088c07037264fcc73eca1 Author: Dan Carpenter Date: Mon Mar 15 11:21:48 2010 +0300 sched: Remove some dead code This was left over from "7c9414385e sched: Remove USER_SCHED" Signed-off-by: Dan Carpenter Acked-by: Dhaval Giani Cc: Kay Sievers Cc: Greg Kroah-Hartman LKML-Reference: <20100315082148.GD18181@bicker> Signed-off-by: Ingo Molnar commit 8ea7f544100844307072cae2f5fc108afdef999a Author: Lin Ming Date: Tue Mar 16 10:12:36 2010 +0800 x86, perf: Fix comments in Pentium-4 PMU definitions Reported-by: Cyrill Gorcunov Signed-off-by: Lin Ming Cc: Peter Zijlstra Cc: Mike Galbraith Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker LKML-Reference: <1268705556.3379.8.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar commit 1d199b1ad606ae8b88acebd295b101c4e1cf2a57 Author: Frederic Weisbecker Date: Tue Mar 16 01:05:02 2010 +0100 perf: Fix unexported generic perf_arch_fetch_caller_regs perf_arch_fetch_caller_regs() is exported for the overriden x86 version, but not for the generic weak version. As a general rule, weak functions should not have their symbol exported in the same file they are defined. So let's export it on trace_event_perf.c as it is used by trace events only. This fixes: ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined! ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined! -v2: And also only build it if trace events are enabled. -v3: Fix changelog mistake Reported-by: Stephen Rothwell Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Xiao Guangrong Cc: Paul Mackerras LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit d06d92b7c9b99ea52bdaeb13f544675529891b8a Author: Arnaldo Carvalho de Melo Date: Mon Mar 15 13:04:33 2010 -0300 perf annotate: Properly notify the user that vmlinux is missing Before this patch we would not find a vmlinux, then try to pass objdump "[kernel.kallsyms]" as the filename, it would get confused and produce no output: [root@doppio ~]# perf annotate n_tty_write ------------------------------------------------ Percent | Source code & Disassembly of [kernel.kallsyms] ------------------------------------------------ Now we check that and emit meaningful warning: [root@doppio ~]# perf annotate n_tty_write Can't annotate n_tty_write: No vmlinux file was found in the path: [0] vmlinux [1] /boot/vmlinux [2] /boot/vmlinux-2.6.34-rc1-tip+ [3] /lib/modules/2.6.34-rc1-tip+/build/vmlinux [4] /usr/lib/debug/lib/modules/2.6.34-rc1-tip+/vmlinux [root@doppio ~]# This bug was introduced when we added automatic search for vmlinux, before that time the user had to specify a vmlinux file. v2: Print the warning just for the first symbol found when no symbol name is specified, otherwise it will spam the screen repeating the warning for each symbol. Reported-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: LKML-Reference: <1268669073-6856-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit b0a9ab62ab96e258a0ddd81d7fe2719c3db36006 Author: Arnaldo Carvalho de Melo Date: Mon Mar 15 11:46:58 2010 -0300 perf top: Properly notify the user that vmlinux is missing Before this patch this message would very briefly appear on the screen and then the screen would get updates only on the top, for number of interrupts received, etc, but no annotation would be performed: [root@doppio linux-2.6-tip]# perf top -s n_tty_write > /tmp/bla objdump: '[kernel.kallsyms]': No such file Now this is what the user gets: [root@doppio linux-2.6-tip]# perf top -s n_tty_write Can't annotate n_tty_write: No vmlinux file was found in the path: [0] vmlinux [1] /boot/vmlinux [2] /boot/vmlinux-2.6.33-rc5 [3] /lib/modules/2.6.33-rc5/build/vmlinux [4] /usr/lib/debug/lib/modules/2.6.33-rc5/vmlinux [root@doppio linux-2.6-tip]# This bug was introduced when we added automatic search for vmlinux, before that time the user had to specify a vmlinux file. Reported-by: David S. Miller Reported-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras Cc: LKML-Reference: <1268664418-28328-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit bedbfdea31daf3880745001d56450c683959ee7e Author: Eric B Munson Date: Mon Mar 15 11:46:57 2010 -0300 perf record: Enable the enable_on_exec flag if record forks the target When forking its target, perf record can capture data from before the target application is started. Perf stat uses the enable_on_exec flag in the event attributes to keep from displaying events from before the target program starts, this patch adds the same functionality to perf record when it is will fork the target process. Signed-off-by: Eric B Munson Signed-off-by: Arnaldo Carvalho de Melo Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <1268664418-28328-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit e4495262826d1eabca3529fa6ac22394eb348132 Author: Cyrill Gorcunov Date: Mon Mar 15 12:58:22 2010 +0800 perf, x86: Enable not tagged retired instruction counting on P4s This should turn on instruction counting on P4s, which was missing in the first version of the new PMU driver. It's inaccurate for now, we still need dependant event to tag mops before we can count them precisely. The result is that the number of instruction may be lifted up. Signed-off-by: Cyrill Gorcunov Signed-off-by: Lin Ming Cc: Peter Zijlstra LKML-Reference: <1268629102.3355.11.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar commit 8576e1971663ffdb6139041de97cdd2e1d4791cc Author: Cyrill Gorcunov Date: Sat Mar 13 11:11:16 2010 +0300 x86, perf: Unmask LVTPC only if we have APIC supported Ingo reported: | | There's a build failure on -tip with the P4 driver, on UP 32-bit, if | PERF_EVENTS is enabled but UP_APIC is disabled: | | arch/x86/built-in.o: In function `p4_pmu_handle_irq': | perf_event.c:(.text+0xa756): undefined reference to `apic' | perf_event.c:(.text+0xa76e): undefined reference to `apic' | So we have to unmask LVTPC only if we're configured to have one. Reported-by: Ingo Molnar Signed-off-by: Cyrill Gorcunov CC: Lin Ming CC: Peter Zijlstra LKML-Reference: <20100313081116.GA5179@lenovo> Signed-off-by: Ingo Molnar commit 567e54790e5c07152a93b6de4d0210af8b77da87 Author: Arnaldo Carvalho de Melo Date: Fri Mar 12 21:05:10 2010 -0300 perf tools: Fix non-newt build The use_browser needs to be in a file that is always built and also we need a browser__show_help stub in that case. Reported-by: Anton Blanchard Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268438710-32697-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 0308635917273030db6121d67c41ef2279b30340 Merge: 3997d37 0b86122 Author: Ingo Molnar Date: Fri Mar 12 21:06:35 2010 +0100 Merge branch 'perf/x86' into perf/core Merge reason: The new P4 driver is stable and ready now for more testing. Signed-off-by: Ingo Molnar commit 3997d3776a6e89586e76a0ef355bfbbd8a76966c Author: Arnaldo Carvalho de Melo Date: Fri Mar 12 12:46:48 2010 -0300 perf hist: Don't fprintf the callgraph unconditionally [root@doppio ~]# perf report -i newt.data | head -10 # Samples: 11999679868 # # Overhead Command Shared Object Symbol # ........ ....... ............................. ...... # 63.61% perf libslang.so.2.1.4 [.] SLsmg_write_chars 6.30% perf perf [.] symbols__find 2.19% perf libnewt.so.0.52.10 [.] newtListboxAppendEntry 2.08% perf libslang.so.2.1.4 [.] SLsmg_write_chars@plt 1.99% perf libc-2.10.2.so [.] _IO_vfprintf_internal [root@doppio ~]# Not good, the newt form for report works, but slang has to eat the cost of the additional callgraph lines everytime it prints a line, and the callgraph doesn't appear on the screen, so move the callgraph printing to a separate function and don't use it in newt.c. Newt tree widgets are being investigated to properly support callgraphs, but till that gets merged, lets remove this huge overhead and show at least the symbol overheads for a callgraph rich perf.data with good performance. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268408808-13595-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit cb7afb7092bc502b890f0a897ffd67c2b078d347 Author: Arnaldo Carvalho de Melo Date: Fri Mar 12 12:46:47 2010 -0300 perf newt: Use newtGetScreenSize For consistency, use the newt API more fully. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268408808-13595-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 7081e087b90d4eb4348f7970bd6b266d837321ef Author: Arnaldo Carvalho de Melo Date: Fri Mar 12 10:48:12 2010 -0300 perf newt: Add 'Q', 'q' and Ctrl+C as ways to exit from forms These are keys people expect when pressed to exit the current widget, so have associate all of them to this semantic. Suggested-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268401692-9361-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit f9224c5c944b60cf709db4adf1f5195264b8d194 Author: Arnaldo Carvalho de Melo Date: Thu Mar 11 20:12:44 2010 -0300 perf report: Implement initial UI using newt Newt has widespread availability and provides a rather simple API as can be seen by the size of this patch. The work needed to support it will benefit other frontends too. In this initial patch it just checks if the output is a tty, if not it falls back to the previous behaviour, also if newt-devel/libnewt-dev is not installed the previous behaviour is maintaned. Pressing enter on a symbol will annotate it, ESC in the annotation window will return to the report symbol list. More work will be done to remove the special casing in color_fprintf, stop using fmemopen/FILE in the printing of hist_entries, etc. Also the annotation doesn't need to be done via spawning "perf annotate" and then browsing its output, we can do better by calling directly the builtin-annotate.c functions, that would then be moved to tools/perf/util/annotate.c and shared with perf top, etc But lets go by baby steps, this patch already improves perf usability by allowing to quickly do annotations on symbols from the report screen and provides a first experimentation with libnewt/TUI integration of tools. Tested on RHEL5 and Fedora12 X86_64 and on Debian PARISC64 to browse a perf.data file collected on a Fedora12 x86_64 box. Signed-off-by: Arnaldo Carvalho de Melo Cc: Avi Kivity Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268349164-5822-5-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit dd2ee78dd8e4c6d6f1a333fd60c3dd27d1b07042 Author: Arnaldo Carvalho de Melo Date: Thu Mar 11 20:12:43 2010 -0300 perf tools: Add missing bytes printed in hist_entry__fprintf We need those to properly size the browser widht in the newt TUI. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268349164-5822-4-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit b4f5296f0eec2aa7061dfd8bb8c0744f095f9bd1 Author: Arnaldo Carvalho de Melo Date: Thu Mar 11 20:12:42 2010 -0300 perf tools: Use eprintf for pr_{err,warning,info} too Just like we do for pr_debug, so that we can have a single point where to redirect to the currently used output system, be it stdio or newt. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268349164-5822-3-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 895f0edc3cd04a2a3de2c608ba20821b832a8abb Author: Arnaldo Carvalho de Melo Date: Thu Mar 11 20:12:41 2010 -0300 perf top: Export get_window_dimensions Will be used by the newt code too. Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268349164-5822-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit fe2197b8bb2f318c1e0b0f589f24f781dd27d1f2 Author: Arnaldo Carvalho de Melo Date: Thu Mar 11 20:12:40 2010 -0300 perf symbols: Bump plt synthesizing warning debug level Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268349164-5822-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit 937779db13fb6cb621e28d9ae0a6cf1d05b57d05 Merge: 6230f2c 9f591fd Author: Ingo Molnar Date: Fri Mar 12 10:20:57 2010 +0100 Merge branch 'perf/urgent' into perf/core Merge reason: We want to queue up a dependent patch. Signed-off-by: Ingo Molnar commit 0b861225a5890f22445f08ca9cc7a87cff276ff7 Author: Cyrill Gorcunov Date: Fri Mar 12 00:50:16 2010 +0300 x86, perf: Fix NULL deref on not assigned x86_pmu In case of not assigned x86_pmu and software events NULL dereference may being hit via x86_pmu::schedule_events method. Fix it by checking if x86_pmu is initialized at all. Signed-off-by: Cyrill Gorcunov Cc: Lin Ming Cc: Arnaldo Carvalho de Melo Cc: Stephane Eranian Cc: Robert Richter Cc: Frederic Weisbecker Cc: Peter Zijlstra LKML-Reference: <20100311215016.GG25162@lenovo> Signed-off-by: Ingo Molnar commit 6230f2c7ef01a69e2ba9370326572c287209d32a Author: Arnaldo Carvalho de Melo Date: Thu Mar 11 15:53:12 2010 -0300 perf record: Mention paranoid sysctl when failing to create counter [acme@mica linux-2.6-tip]$ perf record -a -f Fatal: Permission error - are you root? Consider tweaking /proc/sys/kernel/perf_event_paranoid. [acme@mica linux-2.6-tip]$ Suggested-by: Ingo Molnar Signed-off-by: Arnaldo Carvalho de Melo Cc: Frédéric Weisbecker Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <1268333592-30872-2-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar commit a072738e04f0eb26370e39ec679e9a0d65e49aea Author: Cyrill Gorcunov Date: Thu Mar 11 19:54:39 2010 +0300 perf, x86: Implement initial P4 PMU driver The netburst PMU is way different from the "architectural perfomance monitoring" specification that current CPUs use. P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle perfomance monitoring events. A few implementational details: 1) We need a separate x86_pmu::hw_config helper in struct x86_pmu since register bit-fields are quite different from P6, Core and later cpu series. 2) For the same reason is a x86_pmu::schedule_events helper introduced. 3) hw_perf_event::config consists of packed ESCR+CCCR values. It's allowed since in reality both registers only use a half of their size. Of course before making a real write into a particular MSR we need to unpack the value and extend it to a proper size. 4) The tuple of packed ESCR+CCCR in hw_perf_event::config doesn't describe the memory address of ESCR MSR register so that we need to keep a mapping between these tuples used and available ESCR (various P4 events may use same ESCRs but not simultaneously), for this sake every active event has a per-cpu map of hw_perf_event::idx <--> ESCR addresses. 5) Since hw_perf_event::idx is an offset to counter/control register we need to lift X86_PMC_MAX_GENERIC up, otherwise kernel strips it down to 8 registers and event armed may never be turned off (ie the bit in active_mask is set but the loop never reaches this index to check), thanks to Peter Zijlstra Restrictions: - No cascaded counters support (do we ever need them?) - No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS doesn't work for now) - There are events with same counters which can't work simultaneously (need to use intersected ones due to broken counter 1) - No PERF_COUNT_HW_CACHE_ events yet Todo: - Implement dependent events - Need proper hashing for event opcodes (no linear search, good for debugging stage but not in real loads) - Some events counted during a clock cycle -- need to set threshold for them and count every clock cycle just to get summary statistics (ie to behave the same way as other PMUs do) - Need to swicth to use event_constraints - To support RAW events we need to encode a global list of P4 events into p4_templates - Cache events need to be added Event support status matrix: Event status ----------------------------- cycles works cache-references works cache-misses works branch-misses works bus-cycles partially (does not work on 64bit cpu with HT enabled) instruction doesnt work (needs dependent event [mop tagging]) branches doesnt work Signed-off-by: Cyrill Gorcunov Signed-off-by: Lin Ming Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Stephane Eranian Cc: Robert Richter Cc: Frederic Weisbecker LKML-Reference: <20100311165439.GB5129@lenovo> Signed-off-by: Ingo Molnar commit beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8 Author: Mike Galbraith Date: Thu Mar 11 17:17:20 2010 +0100 sched: Remove AFFINE_WAKEUPS feature Disabling affine wakeups is too horrible to contemplate. Remove the feature flag. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301890.6785.50.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 13814d42e45dfbe845a0bbe5184565d9236896ae Author: Mike Galbraith Date: Thu Mar 11 17:17:04 2010 +0100 sched: Remove ASYM_GRAN feature This features has been enabled for quite a while, after testing showed that easing preemption for light tasks was harmful to high priority threads. Remove the feature flag. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301675.6785.44.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit c6ee36c423c3ed1fb86bb3eabba9fc256a300d16 Author: Mike Galbraith Date: Thu Mar 11 17:16:43 2010 +0100 sched: Remove SYNC_WAKEUPS feature Sync wakeups are critical functionality with a long history. Remove it, we don't need the branch or icache footprint. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301817.6785.47.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit f2e74eeac03ffb779d64b66a643c5e598145a28b Author: Mike Galbraith Date: Thu Mar 11 17:17:18 2010 +0100 sched: Remove WAKEUP_SYNC feature This feature never earned its keep, remove it. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301591.6785.42.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 5ca9880c6f4ba4c84b517bc2fed5366adf63d191 Author: Mike Galbraith Date: Thu Mar 11 17:17:17 2010 +0100 sched: Remove FAIR_SLEEPERS feature Our preemption model relies too heavily on sleeper fairness to disable it without dire consequences. Remove the feature, and save a branch or two. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301520.6785.40.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08 Author: Mike Galbraith Date: Thu Mar 11 17:17:17 2010 +0100 sched: Remove NORMALIZED_SLEEPER This feature hasn't been enabled in a long time, remove effectively dead code. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301447.6785.38.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 8b911acdf08477c059d1c36c21113ab1696c612b Author: Mike Galbraith Date: Thu Mar 11 17:17:16 2010 +0100 sched: Fix select_idle_sibling() Don't bother with selection when the current cpu is idle. Recent load balancing changes also make it no longer necessary to check wake_affine() success before returning the selected sibling, so we now always use it. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301369.6785.36.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 21406928afe43f1db6acab4931bb8c886f4d04ce Author: Mike Galbraith Date: Thu Mar 11 17:17:15 2010 +0100 sched: Tweak sched_latency and min_granularity Allow LAST_BUDDY to kick in sooner, improving cache utilization as soon as a second buddy pair arrives on scene. The cost is latency starting to climb sooner, the tbenefit for tbench 8 on my Q6600 box is ~2%. No detrimental effects noted in normal idesktop usage. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301285.6785.34.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit a64692a3afd85fe048551ab89142fd5ca99a0dbd Author: Mike Galbraith Date: Thu Mar 11 17:16:20 2010 +0100 sched: Cleanup/optimize clock updates Now that we no longer depend on the clock being updated prior to enqueueing on migratory wakeup, we can clean up a bit, placing calls to update_rq_clock() exactly where they are needed, ie on enqueue, dequeue and schedule events. In the case of a freshly enqueued task immediately preempting, we can skip the update during preemption, as the clock was just updated by the enqueue event. We also save an unneeded call during a migratory wakeup by not updating the previous runqueue, where update_curr() won't be invoked. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301199.6785.32.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit e12f31d3e5d36328c7fbd0fce40a95e70b59152c Author: Mike Galbraith Date: Thu Mar 11 17:15:51 2010 +0100 sched: Remove avg_overlap Both avg_overlap and avg_wakeup had an inherent problem in that their accuracy was detrimentally affected by cross-cpu wakeups, this because we are missing the necessary call to update_curr(). This can't be fixed without increasing overhead in our already too fat fastpath. Additionally, with recent load balancing changes making us prefer to place tasks in an idle cache domain (which is good for compute bound loads), communicating tasks suffer when a sync wakeup, which would enable affine placement, is turned into a non-sync wakeup by SYNC_LESS. With one task on the runqueue, wake_affine() rejects the affine wakeup request, leaving the unfortunate where placed, taking frequent cache misses. Remove it, and recover some fastpath cycles. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301121.6785.30.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit b42e0c41a422a212ddea0666d5a3a0e3c35206db Author: Mike Galbraith Date: Thu Mar 11 17:15:38 2010 +0100 sched: Remove avg_wakeup Testing the load which led to this heuristic (nfs4 kbuild) shows that it has outlived it's usefullness. With intervening load balancing changes, I cannot see any difference with/without, so recover there fastpath cycles. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301062.6785.29.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 39c0cbe2150cbd848a25ba6cdb271d1ad46818ad Author: Mike Galbraith Date: Thu Mar 11 17:17:13 2010 +0100 sched: Rate-limit nohz Entering nohz code on every micro-idle is costing ~10% throughput for netperf TCP_RR when scheduling cross-cpu. Rate limiting entry fixes this, but raises ticks a bit. On my Q6600, an idle box goes from ~85 interrupts/sec to 128. The higher the context switch rate, the more nohz entry costs. With this patch and some cycle recovery patches in my tree, max cross cpu context switch rate is improved by ~16%, a large portion of which of which is this ratelimiting. Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1268301003.6785.28.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 9b33fa6ba0e2f90fdf407501db801c2511121564 Author: eranian@google.com Date: Wed Mar 10 22:26:05 2010 -0800 perf_events: Improve task_sched_in() This patch is an optimization in perf_event_task_sched_in() to avoid scheduling the events twice in a row. Without it, the perf_disable()/perf_enable() pair is invoked twice, thereby pinned events counts while scheduling flexible events and we go throuh hw_perf_enable() twice. By encapsulating, the whole sequence into perf_disable()/perf_enable() we ensure, hw_perf_enable() is going to be invoked only once because of the refcount protection. Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <1268288765-5326-1-git-send-email-eranian@google.com> Signed-off-by: Ingo Molnar commit 41acab8851a0408c1d5ad6c21a07456f88b54d40 Author: Lucas De Marchi Date: Wed Mar 10 23:37:45 2010 -0300 sched: Implement group scheduler statistics in one struct Put all statistic fields of sched_entity in one struct, sched_statistics, and embed it into sched_entity. This change allows to memset the sched_statistics to 0 when needed (for instance when forking), avoiding bugs of non initialized fields. Signed-off-by: Lucas De Marchi Signed-off-by: Peter Zijlstra LKML-Reference: <1268275065-18542-1-git-send-email-lucas.de.marchi@gmail.com> Signed-off-by: Ingo Molnar commit 6f4edd69e40aba4f45bf9558c1e9a950d79ab4e4 Author: Jack Steiner Date: Wed Mar 10 14:44:58 2010 -0600 x86, UV: Clean up UV headers for MMR definitions Update UV mmr definitions header file. Eliminate definitions no longer needed. Move 2 definitions from tlb_uv.c into the header file where they belong. Signed-off-by: Jack Steiner LKML-Reference: <20100310204458.GA28835@sgi.com> Signed-off-by: Ingo Molnar commit 938179b4f8cf8a4f11234ebf2dff2eb48400acfe Author: Dimitri Sivanich Date: Fri Mar 5 11:42:03 2010 -0600 x86: Improve Intel microcode loader performance We've noticed that on large SGI UV system configurations, running microcode.ctl can take very long periods of time. This is due to the large number of vmalloc/vfree calls made by the Intel generic_load_microcode() logic. By reusing allocated space, the following patch reduces the time to run microcode.ctl on a 1024 cpu system from approximately 80 seconds down to 1 or 2 seconds. Signed-off-by: Dimitri Sivanich Acked-by: Dmitry Adamushko Cc: Avi Kivity Cc: Bill Davidsen LKML-Reference: <20100305174203.GA19638@sgi.com> Signed-off-by: Ingo Molnar commit caa0142d84ceb0fc83e28f0475d0a7316cb6df77 Author: Ingo Molnar Date: Sat Jun 6 13:58:12 2009 +0200 perf, x86: Fix the !CONFIG_CPU_SUP_INTEL build Fix typo. But the modularization here is ugly and should be improved. Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo LKML-Reference: Signed-off-by: Ingo Molnar commit ba7e4d13fc7e25af1d167d40e6f028298dfc55ad Author: Ingo Molnar Date: Sat Jun 6 13:58:12 2009 +0200 perf, x86: Add INSTRUCTION_DECODER config flag The PEBS+LBR decoding magic needs the insn_get_length() infrastructure to be able to decode x86 instruction length. So split it out of KPROBES dependency and make it enabled when either KPROBES or PERF_EVENTS is enabled. Cc: Peter Zijlstra Cc: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo LKML-Reference: Signed-off-by: Ingo Molnar commit 63fb3f9b2312e131be5a0a2dddb63f2fb123db9b Author: Peter Zijlstra Date: Tue Mar 9 11:51:02 2010 +0100 perf, x86: Fix LBR read-out Don't decrement the TOS twice... Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit d80c7502ff63aa0d99d8c0c5803d28bbef67a74e Author: Peter Zijlstra Date: Tue Mar 9 11:41:02 2010 +0100 perf, x86: Fixup the PEBS handler for Core2 cpus Pull the core handler in line with the nhm one, also make sure we always drain the buffer. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit 7645a24cbd01cbf4865d1273d5ddaa8d8c2ccb3a Author: Peter Zijlstra Date: Mon Mar 8 13:51:31 2010 +0100 perf, x86: Remove checking_{wr,rd}msr() usage We don't need checking_{wr,rd}msr() calls, since we should know what cpu we're running on and not use blindly poke at msrs. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit b83a46e7da4a948cc852ba7805dfb1a392dec861 Author: Peter Zijlstra Date: Mon Mar 8 13:51:12 2010 +0100 perf, x86: Don't reset the LBR as frequently If we reset the LBR on each first counter, simple counter rotation which first deschedules all counters and then reschedules the new ones will lead to LBR reset, even though we're still in the same task context. Reduce this by not flushing on the first counter but only flushing on different task contexts. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit ad0e6cfe2a2a61d7b5530188e571d508146cb43b Author: Peter Zijlstra Date: Sat Mar 6 19:49:06 2010 +0100 perf, x86: Fix silly bug in intel_pmu_pebs_{enable,disable} We need to use the actual cpuc->pebs_enabled value, not a local copy for the changes to take effect. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit 12ab854d744f04bfc5c6c4db723b7e31fc03eb29 Author: Peter Zijlstra Date: Sat Mar 6 18:57:38 2010 +0100 perf, x86: Deal with multiple state bits for pebs-fmt1 Its unclear if the PEBS state record will have only a single bit set, in case it does not and accumulates bits, deal with that by only processing each event once. Also, robustify some of the code. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit d329527e47851f84b1e7944ed9601205f35f1b93 Author: Peter Zijlstra Date: Mon Mar 8 13:57:14 2010 +0100 perf, x86: Reorder intel_pmu_enable_all() The documentation says we have to enable PEBS before we enable the PMU proper. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit 2df202bf7520eaffcbfb07e45dfa3cfb0aeee2c0 Author: Peter Zijlstra Date: Sat Mar 6 13:48:54 2010 +0100 perf, x86: Fix LBR enable/disable vs cpuc->enabled We should never call ->enable with the pmu enabled, and we _can_ have ->disable called with the pmu enabled. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit 4807e3d5dc7bb7057dd6ca3abb09f3da2eb8c323 Author: Peter Zijlstra Date: Sat Mar 6 13:47:07 2010 +0100 perf, x86: Fix PEBS enable/disable vs cpuc->enabled We should never call ->enable with the pmu enabled, and we _can_ have ->disable called with the pmu enabled. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit 8f4aebd2be9892bf8fb79a2d8576d3f3ee7f00f6 Author: Peter Zijlstra Date: Sat Mar 6 13:26:11 2010 +0100 perf, x86: Fix pebs drains I overlooked the perf_disable()/perf_enable() calls in intel_pmu_handle_irq(), (pointed out by Markus) so we should not explicitly disable_all/enable_all pebs counters in the drain functions, these are already disabled and enabling them early is confusing. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit cc7f00820b2f3be656569c41158d9323e425bcfe Author: Peter Zijlstra Date: Mon Mar 8 17:51:33 2010 +0100 perf, x86: Avoid double disable on throttle vs ioctl(PERF_IOC_DISABLE) Calling ioctl(PERF_EVENT_IOC_DISABLE) on a thottled counter would result in a double disable, cure this by using x86_pmu_{start,stop} for throttle/unthrottle and teach x86_pmu_stop() to check ->active_mask. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: Signed-off-by: Ingo Molnar commit a562b1871f7f7d2f3a835c3c1e07fa58d473cfb7 Author: Peter Zijlstra Date: Fri Mar 5 16:29:14 2010 +0100 perf, x86: Robustify PEBS fixup It turns out the LBR is massively unreliable on certain CPUs, so code the fixup a little more defensive to avoid crashing the kernel. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100305154129.042271287@chello.nl> Signed-off-by: Ingo Molnar commit 74846d35b24b6efd61bb88a0a750b6bb257e6e78 Author: Peter Zijlstra Date: Fri Mar 5 13:49:35 2010 +0100 perf, x86: Clear the LBRs on init Some CPUs have errata where the LBR is not cleared on Power-On. So always clear the LBRs before use. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100305154128.966563424@chello.nl> Signed-off-by: Ingo Molnar commit 3c44780b220e876b01e39d4028cd6f4205fbf5d6 Author: Peter Zijlstra Date: Thu Mar 4 21:49:01 2010 +0100 perf, x86: Disable PEBS on clovertown chips This CPU has just too many handycaps to be really useful. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100305154128.890278662@chello.nl> Signed-off-by: Ingo Molnar commit 3adaebd69557615c1bf0365ce5e32d93ac7d82af Author: Peter Zijlstra Date: Fri Mar 5 12:09:29 2010 +0100 perf, x86: Fix silly bug in data store buffer allocation Fix up the ds allocation error path, where we could free @buffer before we used it. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100305154128.813452402@chello.nl> Signed-off-by: Ingo Molnar commit 30a813ae035d3e220a89609adce878e045c49547 Author: Peter Zijlstra Date: Thu Mar 4 13:49:21 2010 +0100 x86: Move MAX_INSN_SIZE into asm/insn.h Since there's now two users for this, place it in a common header. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.923774125@chello.nl> Signed-off-by: Ingo Molnar commit 7e1a40dda619b0483fbe0740494ed2c2a1f05289 Author: Peter Zijlstra Date: Thu Mar 4 12:38:03 2010 +0100 perf, x86: Expose the full PEBS record using PERF_SAMPLE_RAW Expose the full PEBS record using PERF_SAMPLE_RAW Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.847218224@chello.nl> Signed-off-by: Ingo Molnar commit 8db909a7e3c888b5d45aef7650d74ccebe3ce725 Author: Peter Zijlstra Date: Wed Mar 3 17:07:40 2010 +0100 perf, x86: Clean up IA32_PERF_CAPABILITIES usage Saner PERF_CAPABILITIES support, which also exposes pebs_trap. Use that latter to make PEBS's use of LBR conditional since a fault-like pebs should already report the correct IP. ( As of this writing there is no known hardware that implements !pebs_trap ) Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.770650663@chello.nl> Signed-off-by: Ingo Molnar commit 1676b8a077c352085d52578fb4f29350b58b6e74 Author: Peter Zijlstra Date: Thu Mar 4 14:19:36 2010 +0100 perf-top: Show the percentage of successfull PEBS-fixups Use the PERF_RECORD_MISC_EXACT information to measure the success rate of the PEBS fix-up. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.694233760@chello.nl> Signed-off-by: Ingo Molnar commit ef21f683a045a79b6aa86ad81e5fdfc0d5ddd250 Author: Peter Zijlstra Date: Wed Mar 3 13:12:23 2010 +0100 perf, x86: use LBR for PEBS IP+1 fixup Use the LBR to fix up the PEBS IP+1 issue. As said, PEBS reports the next instruction, here we use the LBR to find the last branch and from that construct the actual IP. If the IP matches the LBR-TO, we use LBR-FROM, otherwise we use the LBR-TO address as the beginning of the last basic block and decode forward. Once we find a match to the current IP, we use the previous location. This patch introduces a new ABI element: PERF_RECORD_MISC_EXACT, which conveys that the reported IP (PERF_SAMPLE_IP) is the exact instruction that caused the event (barring CPU errata). The fixup can fail due to various reasons: 1) LBR contains invalid data (quite possible) 2) part of the basic block got paged out 3) the reported IP isn't part of the basic block (see 1) Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: "Zhang, Yanmin" Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.619375431@chello.nl> Signed-off-by: Ingo Molnar commit caff2befffe899e63df5cc760b7ed01cfd902685 Author: Peter Zijlstra Date: Wed Mar 3 12:02:30 2010 +0100 perf, x86: Implement simple LBR support Implement simple suport Intel Last-Branch-Record, it supports all hardware that implements FREEZE_LBRS_ON_PMI, but does not (yet) implement the LBR config register. The Intel LBR is a FIFO of From,To addresses describing the last few branches the hardware took. This patch does not add perf interface to the LBR, but merely provides an interface for internal use. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.544191154@chello.nl> Signed-off-by: Ingo Molnar commit 69fef0d2e2c2c049ef4207a52e78b50d527bd85a Author: Peter Zijlstra Date: Thu Mar 4 13:57:24 2010 +0100 perf: Add attr->precise support to raw event parsing Minimal userspace interface to the new 'precise' events flag. Can be used like "perf top -e r00c0p" which will use PEBS to sample retired instructions. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.468665803@chello.nl> Signed-off-by: Ingo Molnar commit ca037701a025334e724e5c61b3b1082940c8b981 Author: Peter Zijlstra Date: Tue Mar 2 19:52:12 2010 +0100 perf, x86: Add PEBS infrastructure This patch implements support for Intel Precise Event Based Sampling, which is an alternative counter mode in which the counter triggers a hardware assist to collect information on events. The hardware assist takes a trap like snapshot of a subset of the machine registers. This data is written to the Intel Debug-Store, which can be programmed with a data threshold at which to raise a PMI. With the PEBS hardware assist being trap like, the reported IP is always one instruction after the actual instruction that triggered the event. This implements a simple PEBS model that always takes a single PEBS event at a time. This is done so that the interaction with the rest of the system is as expected (freq adjust, period randomization, lbr, callchains, etc.). It adds an ABI element: perf_event_attr::precise, which indicates that we wish to use this (constrained, but precise) mode. Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: paulus@samba.org Cc: eranian@google.com Cc: robert.richter@amd.com Cc: fweisbec@gmail.com LKML-Reference: <20100304140100.392111285@chello.nl> Signed-off-by: Ingo Molnar commit 12c7389abe5786349d3ea6da1961cf78d0c1c7cd Author: Joerg Roedel Date: Thu Jan 21 11:50:28 2010 +0100 iommu-api: Remove iommu_{un}map_range functions These functions are not longer used and can be removed savely. There functionality is now provided by the iommu_{un}map functions which are also capable of multiple page sizes. Signed-off-by: Joerg Roedel commit 468e2366cdb80cf8a691b8bc212260cfbdbd518e Author: Joerg Roedel Date: Thu Jan 21 16:37:36 2010 +0100 x86/amd-iommu: Implement ->{un}map callbacks for iommu-api This patch implements the new callbacks for the IOMMU-API with functions that can handle different page sizes in the IOMMU page table. Signed-off-by: Joerg Roedel commit f03152bb7d0a74f409ad63ed36916444a7493d72 Author: Joerg Roedel Date: Thu Jan 21 16:15:24 2010 +0100 x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes This patch extends the amd_iommu_iova_to_phys() function to handle different page sizes correctly. It doesn't use fetch_pte() anymore because we don't know (or care about) the page_size used for mapping the given iova. Signed-off-by: Joerg Roedel commit 24cd772315c19e4d9409d0d21367ec1ebab3149f Author: Joerg Roedel Date: Tue Jan 19 17:27:39 2010 +0100 x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes This patch extends the functionality of iommu_unmap_page and fetch_pte to support arbitrary page sizes. Signed-off-by: Joerg Roedel commit cbb9d729f3433c9c2660b01dc52e6deb89488886 Author: Joerg Roedel Date: Fri Jan 15 14:41:15 2010 +0100 x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes This patch changes the old map_size parameter of alloc_pte to a page_size parameter which can be used more easily to alloc a pte for intermediate page sizes. Signed-off-by: Joerg Roedel commit fcd95807fb61e67d602610e7ff7129ed769e9fee Author: Joerg Roedel Date: Mon Jan 11 16:38:18 2010 +0100 kvm: Change kvm_iommu_map_pages to map large pages This patch changes the implementation of of kvm_iommu_map_pages to map the pages with the host page size into the io virtual address space. Signed-off-by: Joerg Roedel Acked-By: Avi Kivity commit b146a1c9f7f1feeacf840fa1ba197a99593cea15 Author: Joerg Roedel Date: Wed Jan 20 17:17:37 2010 +0100 VT-d: Change {un}map_range functions to implement {un}map interface This patch changes the iommu-api functions for mapping and unmapping page ranges to use the new page-size based interface. This allows to remove the range based functions later. Signed-off-by: Joerg Roedel commit 67651786948c360c3122b8a17cb1e59209d50880 Author: Joerg Roedel Date: Thu Jan 21 16:32:27 2010 +0100 iommu-api: Add ->{un}map callbacks to iommu_ops This patch adds new callbacks for mapping and unmapping pages to the iommu_ops structure. These callbacks are aware of page sizes which makes them different to the ->{un}map_range callbacks. Signed-off-by: Joerg Roedel commit cefc53c7f494240d4813c80154c7617452d1904d Author: Joerg Roedel Date: Fri Jan 8 13:35:09 2010 +0100 iommu-api: Add iommu_map and iommu_unmap functions These two functions provide support for mapping and unmapping physical addresses to io virtual addresses. The difference to the iommu_(un)map_range() is that the new functions take a gfp_order parameter instead of a size. This allows the IOMMU backend implementations to detect easier if a given range can be mapped by larger page sizes. These new functions should replace the old ones in the long term. Signed-off-by: Joerg Roedel commit 4abc14a733f9002c05623db755aaafdd27fa7a91 Author: Joerg Roedel Date: Wed Jan 20 14:52:23 2010 +0100 iommu-api: Rename ->{un}map function pointers to ->{un}map_range The new function pointer names match better with the top-level functions of the iommu-api which are using them. Main intention of this change is to make the ->{un}map pointer names free for two new mapping functions. Signed-off-by: Joerg Roedel commit bc078e4eab65f11bbaeed380593ab8151b30d703 Author: Martin Schwidefsky Date: Tue Mar 2 16:01:10 2010 +0100 oprofile: convert oprofile from timer_hook to hrtimer Oprofile is currently broken on systems running with NOHZ enabled. A maximum of 1 tick is accounted via the timer_hook if a cpu sleeps for a longer period of time. This does bad things to the percentages in the profiler output. To solve this problem convert oprofile to use a restarting hrtimer instead of the timer_hook. Signed-off-by: Martin Schwidefsky Signed-off-by: Robert Richter commit ced918eb748ce30b3aace549fd17540e40ffdca0 Author: Thomas Gleixner Date: Wed Feb 17 16:47:10 2010 +0000 i8253: Convert i8253_lock to raw_spinlock i8253_lock needs to be a real spinlock in preempt-rt, i.e. it can not be converted to a sleeping lock. Convert it to raw_spinlock and fix up all users. Signed-off-by: Thomas Gleixner Acked-by: Ralf Baechle Acked-by: Dmitry Torokhov Acked-by: Takashi Iwai Cc: Jens Axboe LKML-Reference: <20100217163751.030764372@linutronix.de> commit 372e22ef0a87d5fc10d387791f9f19721115820c Author: Randy Dunlap Date: Mon Mar 1 22:06:25 2010 -0800 x86, docbook: Fix errors from x86 headers merger Fix docbook errors for x86 headers that were recently merged: docproc: arch/x86/include/asm/atomic_32.h: No such file or directory docproc: arch/x86/include/asm/io_32.h: No such file or directory Signed-off-by: Randy Dunlap LKML-Reference: <20100301220625.bad46c19.randy.dunlap@oracle.com> Cc: Brian Gerst Signed-off-by: H. Peter Anvin commit 4daa2a8093ecd1148270a1fc64e99f072b8c2901 Author: Pallipadi, Venkatesh Date: Wed Feb 24 13:43:55 2010 -0800 x86, pat: In rbt_memtype_check_insert(), update new->type only if valid new->type should only change when there is a valid ret_type. Otherwise the requested type and return type should be same. Signed-off-by: Venkatesh Pallipadi LKML-Reference: <20100224214355.GA16431@linux-os.sc.intel.com> Tested-by: Jack Steiner Signed-off-by: H. Peter Anvin commit a5c9161f27c3e1ae6c0094d262f03a7e98262181 Author: H. Peter Anvin Date: Mon Mar 1 11:49:23 2010 -0800 x86, atomic64: In selftest, distinguish x86-64 from 586+ The x86-64 implementation of the atomics is totally different from the i586+ implementation, which makes it quite confusing to call it "586+". Also fix indentation, and add "i" for "i386" and "i586" as used elsewhere in the kernel. Signed-off-by: H. Peter Anvin Cc: Luca Barbieri LKML-Reference: <1267005265-27958-4-git-send-email-luca@luca-barbieri.com> commit f3e83131469e29032a700217aa394996107b8fc5 Author: Luca Barbieri Date: Mon Mar 1 19:55:49 2010 +0100 x86-32: Fix atomic64_inc_not_zero return value convention atomic64_inc_not_zero must return 1 if it perfomed the add and 0 otherwise. It was doing the opposite thing. Signed-off-by: Luca Barbieri LKML-Reference: <1267469749-11878-6-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 25a304f277ad70166eeae25a4958d2049005c33a Author: Luca Barbieri Date: Mon Mar 1 19:55:48 2010 +0100 lib: Fix atomic64_inc_not_zero test atomic64_inc_not_zero must return 1 if it perfomed the add and 0 otherwise. The test assumed the opposite convention. Signed-off-by: Luca Barbieri LKML-Reference: <1267469749-11878-5-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 97577896f6b9c056fa0a5e9f6a608110cb3dcd33 Author: Luca Barbieri Date: Mon Mar 1 19:55:47 2010 +0100 lib: Fix atomic64_add_unless return value convention atomic64_add_unless must return 1 if it perfomed the add and 0 otherwise. The generic implementation did the opposite thing. Reported-by: H. Peter Anvin Confirmed-by: Paul Mackerras Signed-off-by: Luca Barbieri LKML-Reference: <1267469749-11878-4-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 6e6104fe085026e6ef82cc5cc303d6c8ceb7e411 Author: Luca Barbieri Date: Mon Mar 1 19:55:46 2010 +0100 x86-32: Fix atomic64_add_unless return value convention atomic64_add_unless must return 1 if it perfomed the add and 0 otherwise. The implementation did the opposite thing. Reported-by: H. Peter Anvin Signed-off-by: Luca Barbieri LKML-Reference: <1267469749-11878-3-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 9efbcd590243045111670c171a951923b877b57d Author: Luca Barbieri Date: Mon Mar 1 19:55:45 2010 +0100 lib: Fix atomic64_add_unless test atomic64_add_unless must return 1 if it perfomed the add and 0 otherwise. The test assumed the opposite convention. Reported-by: H. Peter Anvin Signed-off-by: Luca Barbieri LKML-Reference: <1267469749-11878-2-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit d7f6de1e9c4a12e11ba7186c70f0f40caa76f590 Author: Luca Barbieri Date: Fri Feb 26 12:22:41 2010 +0100 x86: Implement atomic[64]_dec_if_positive() Add support for atomic_dec_if_positive(), and atomic64_dec_if_positive() for x86-64. atomic64_dec_if_positive() for x86-32 was already implemented in a previous patch. Signed-off-by: Luca Barbieri LKML-Reference: <1267183361-20775-2-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 8f4f202b335144bf5be5c9e5b1bc9477ecdae958 Author: Luca Barbieri Date: Fri Feb 26 12:22:40 2010 +0100 lib: Only test atomic64_dec_if_positive on archs having it Currently atomic64_dec_if_positive() is only supported by PowerPC, MIPS and x86-32. Signed-off-by: Luca Barbieri LKML-Reference: <1267183361-20775-1-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit a7e926abc3adfbd2e5e20d2b46177adb4e313915 Author: Luca Barbieri Date: Wed Feb 24 10:54:25 2010 +0100 x86-32: Rewrite 32-bit atomic64 functions in assembly This patch replaces atomic64_32.c with two assembly implementations, one for 386/486 machines using pushf/cli/popf and one for 586+ machines using cmpxchg8b. The cmpxchg8b implementation provides the following advantages over the current one: 1. Implements atomic64_add_unless, atomic64_dec_if_positive and atomic64_inc_not_zero 2. Uses the ZF flag changed by cmpxchg8b instead of doing a comparison 3. Uses custom register calling conventions that reduce or eliminate register moves to suit cmpxchg8b 4. Reads the initial value instead of using cmpxchg8b to do that. Currently we use lock xaddl and movl, which seems the fastest. 5. Does not use the lock prefix for atomic64_set 64-bit writes are already atomic, so we don't need that. We still need it for atomic64_read to avoid restoring a value changed in the meantime. 6. Allocates registers as well or better than gcc The 386 implementation provides support for 386 and 486 machines. 386/486 SMP is not supported (we dropped it), but such support can be added easily if desired. A pure assembly implementation is required due to the custom calling conventions, and desire to use %ebp in atomic64_add_return (we need 7 registers...), as well as the ability to use pushf/popf in the 386 code without an intermediate pop/push. The parameter names are changed to match the convention in atomic_64.h Changes in v3 (due to rebasing to tip/x86/asm): - Patches atomic64_32.h instead of atomic_32.h - Uses the CALL alternative mechanism from commit 1b1d9258181bae199dc940f4bd0298126b9a73d9 Changes in v2: - Merged 386 and cx8 support in the same patch - 386 support now done in assembly, C code no longer used at all - cmpxchg64 is used for atomic64_cmpxchg - stop using macros, use one-line inline functions instead - miscellanous changes and improvements Signed-off-by: Luca Barbieri LKML-Reference: <1267005265-27958-5-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 86a8938078a8bb518c5376de493e348c7490d506 Author: Luca Barbieri Date: Wed Feb 24 10:54:24 2010 +0100 lib: Add self-test for atomic64_t This patch adds self-test on boot code for atomic64_t. This has been used to test the later changes in this patchset. Signed-off-by: Luca Barbieri LKML-Reference: <1267005265-27958-4-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 9c76b38476b18c45f97098a10b0176b321eba3ea Author: Luca Barbieri Date: Wed Feb 24 10:54:23 2010 +0100 x86-32: Allow UP/SMP lock replacement in cmpxchg64 Use the functionality just introduced in the previous patch: mark the lock prefixes in cmpxchg64 alternatives for UP removal. Changes in v2: - Naming change Signed-off-by: Luca Barbieri LKML-Reference: <1267005265-27958-3-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit b3ac891b67bd4b1fc728d1c784cad1212dea433d Author: Luca Barbieri Date: Wed Feb 24 10:54:22 2010 +0100 x86: Add support for lock prefix in alternatives The current lock prefix UP/SMP alternative code doesn't allow LOCK_PREFIX to be used in alternatives code. This patch solves the problem by adding a new LOCK_PREFIX_ALTERNATIVE_PATCH macro that only records the lock prefix location but does not emit the prefix. The user of this macro can then start any alternative sequence with "lock" and have it UP/SMP patched. To make this work, the UP/SMP alternative code is changed to do the lock/DS prefix switching only if the byte actually contains a lock or DS prefix. Thus, if an alternative without the "lock" is selected, it will now do nothing instead of clobbering the code. Changes in v2: - Naming change - Change label to not conflict with alternatives Signed-off-by: Luca Barbieri LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com> Signed-off-by: H. Peter Anvin commit 9e41a49aab88a5a6c8f4875bf10a5543bc321f2d Author: Pallipadi, Venkatesh Date: Wed Feb 10 15:26:07 2010 -0800 x86, pat: Migrate to rbtree only backend for pat memtype management Move pat backend to fully rbtree based implementation from the existing rbtree and linked list hybrid. New rbtree based solution uses interval trees (augmented rbtrees) in order to store the PAT ranges. The new code seprates out the pat backend to pat_rbtree.c file, making is cleaner. The change also makes the PAT lookup, reserve and free operations more optimal, as we don't have to traverse linear linked list of few tens of entries in normal case. Signed-off-by: Venkatesh Pallipadi LKML-Reference: <20100210232607.GB11465@linux-os.sc.intel.com> Signed-off-by: Suresh Siddha Signed-off-by: H. Peter Anvin commit be5a0c126ad1dea2128dc5aef12c87083518d1ab Author: venkatesh.pallipadi@intel.com Date: Wed Feb 10 11:57:06 2010 -0800 x86, pat: Preparatory changes in pat.c for bigger rbtree change Minor changes in pat.c to cleanup code and make it smoother to introduce bigger rbtree only change in the following patch. The changes are cleaup only and should not have any functional impact. Signed-off-by: Venkatesh Pallipadi LKML-Reference: <20100210195909.792781000@intel.com> Signed-off-by: Suresh Siddha Signed-off-by: H. Peter Anvin commit 17d9ddc72fb8bba0d4f67868c9c612e472a594a9 Author: Pallipadi, Venkatesh Date: Wed Feb 10 15:23:44 2010 -0800 rbtree: Add support for augmented rbtrees Add support for augmented rbtrees in core rbtree code. This will be used in subsequent patches, in x86 PAT code, which needs interval trees to efficiently keep track of PAT ranges. Signed-off-by: Venkatesh Pallipadi LKML-Reference: <20100210232343.GA11465@linux-os.sc.intel.com> Signed-off-by: Suresh Siddha Signed-off-by: H. Peter Anvin