commit 991ec02cdca33b03a132a0cacfe6f0aa0be9aa8d Merge: 8623661 84047e3 Author: Linus Torvalds Date: Wed Jun 10 19:58:10 2009 -0700 Merge branch 'tracing-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: function-graph: always initialize task ret_stack function-graph: move initialization of new tasks up in fork function-graph: add memory barriers for accessing task's ret_stack function-graph: enable the stack after initialization of other variables function-graph: only allocate init tasks if it was not already done Manually fix trivial conflict in kernel/trace/ftrace.c commit 862366118026a358882eefc70238dbcc3db37aac Merge: 57eee9a 511b01b Author: Linus Torvalds Date: Wed Jun 10 19:53:40 2009 -0700 Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ... commit 57eee9ae7bbcfb692dc96c739a5184adb6349733 Merge: 8f40642 7e4e0bd Author: Linus Torvalds Date: Wed Jun 10 19:51:10 2009 -0700 Merge branch 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: oprofile: introduce module_param oprofile.cpu_type oprofile: add support for Core i7 and Atom oprofile: remove undocumented oprofile.p4force option oprofile: re-add force_arch_perfmon option commit 8f40642ad315c553bab4ae800766ade07e574a77 Merge: 20f3f3c 12d1611 Author: Linus Torvalds Date: Wed Jun 10 19:50:52 2009 -0700 Merge branch 'signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'signal-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hookup sys_rt_tgsigqueueinfo signals: implement sys_rt_tgsigqueueinfo signals: split do_tkill commit 20f3f3ca499d2c211771ba552685398b65d83859 Merge: 769f3e8 41c51c9 Author: Linus Torvalds Date: Wed Jun 10 19:50:03 2009 -0700 Merge branch 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: rcu_sched_grace_period(): kill the bogus flush_signals() rculist: use list_entry_rcu in places where it's appropriate rculist.h: introduce list_entry_rcu() and list_first_entry_rcu() rcu: Update RCU tracing documentation for __rcu_pending rcu: Add __rcu_pending tracing to hierarchical RCU RCU: make treercu be default commit 769f3e8c384795cc350e2aae27de2a12374d19d4 Merge: c0d2545 0c8b946 Author: Linus Torvalds Date: Wed Jun 10 16:21:16 2009 -0700 Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: vsprintf: introduce %pf format specifier printk: add support of hh length modifier for printk commit c0d254504fdaeb0878b8415295e365ebb78684f8 Merge: e7241d7 e1b9aa3 Author: Linus Torvalds Date: Wed Jun 10 16:20:30 2009 -0700 Merge branch 'percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: percpu: remove rbtree and use page->index instead percpu: don't put the first chunk in reverse-map rbtree commit e7241d771419b8a8671ebc46a043c324ccb0dcf7 Merge: 3f6280d 04dce7d Author: Linus Torvalds Date: Wed Jun 10 16:19:40 2009 -0700 Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: spinlock: Add missing __raw_spin_lock_flags() stub for UP mutex: add atomic_dec_and_mutex_lock(), fix locking, rtmutex.c: Documentation cleanup mutex: add atomic_dec_and_mutex_lock() commit 3f6280ddf25fa656d0e17960588e52bee48a7547 Merge: 7506360 92db1e6 Author: Linus Torvalds Date: Wed Jun 10 16:19:14 2009 -0700 Merge branch 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) amd-iommu: remove unnecessary "AMD IOMMU: " prefix amd-iommu: detach device explicitly before attaching it to a new domain amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling dma-debug: simplify logic in driver_filter() dma-debug: disable/enable irqs only once in device_dma_allocations dma-debug: use pr_* instead of printk(KERN_* ...) dma-debug: code style fixes dma-debug: comment style fixes dma-debug: change hash_bucket_find from first-fit to best-fit x86: enable GART-IOMMU only after setting up protection methods amd_iommu: fix lock imbalance dma-debug: add documentation for the driver filter dma-debug: add dma_debug_driver kernel command line dma-debug: add debugfs file for driver filter dma-debug: add variables and checks for driver filter dma-debug: fix debug_dma_sync_sg_for_cpu and debug_dma_sync_sg_for_device dma-debug: use sg_dma_len accessor dma-debug: use sg_dma_address accessor instead of using dma_address directly amd-iommu: don't free dma adresses below 512MB with CONFIG_IOMMU_STRESS amd-iommu: don't preallocate page tables with CONFIG_IOMMU_STRESS ... commit 75063600fd7b27fe447112c27997f100b9e2f99b Merge: be15f9d 2070887 Author: Linus Torvalds Date: Wed Jun 10 16:16:48 2009 -0700 Merge branch 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: fix restart in wait_requeue_pi futex: fix restart for early wakeup in futex_wait_requeue_pi() futex: cleanup error exit futex: remove the wait queue futex: add requeue-pi documentation futex: remove FUTEX_REQUEUE_PI (non CMP) futex: fix futex_wait_setup key handling sparc64: extend TI_RESTART_BLOCK space by 8 bytes futex: fixup unlocked requeue pi case futex: add requeue_pi functionality futex: split out futex value validation code futex: distangle futex_requeue() futex: add FUTEX_HAS_TIMEOUT flag to restart.futex.flags rt_mutex: add proxy lock routines futex: split out fixup owner logic from futex_lock_pi() futex: split out atomic logic from futex_lock_pi() futex: add helper to find the top prio waiter of a futex futex: separate futex_wait_queue_me() logic from futex_wait() commit be15f9d63b97da0065187696962331de6cd9de9e Merge: 595dc54 a789ed5 Author: Linus Torvalds Date: Wed Jun 10 16:16:27 2009 -0700 Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ... commit 595dc54a1da91408a52c4b962f3deeb1109aaca0 Merge: 9b29e82 7d96fd4 Author: Linus Torvalds Date: Wed Jun 10 16:15:59 2009 -0700 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: move rdtsc_barrier() into the TSC vread method commit 9b29e8228a5c2a169436a1a90a60b1f88cb35cd1 Merge: bec7068 0b8c3d5 Author: Linus Torvalds Date: Wed Jun 10 16:15:14 2009 -0700 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Clear TS in irq_ts_save() when in an atomic section x86: Detect use of extended APIC ID for AMD CPUs x86: memtest: remove 64-bit division x86, UV: Fix macros for multiple coherency domains x86: Fix non-lazy GS handling in sys_vm86() x86: Add quirk for reboot stalls on a Dell Optiplex 360 x86: Fix UV BAU activation descriptor init commit bec706838ec2f9c8c2b99e88a1270d7cba159b06 Merge: bb77629 ee07366 Author: Linus Torvalds Date: Wed Jun 10 16:14:41 2009 -0700 Merge branch 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, setup: fix comment in the "glove box" code x86, setup: "glove box" BIOS interrupts in the video code x86, setup: "glove box" BIOS interrupts in the MCA code x86, setup: "glove box" BIOS interrupts in the EDD code x86, setup: "glove box" BIOS interrupts in the APM code x86, setup: "glove box" BIOS interrupts in the core boot code x86, setup: "glove box" BIOS calls -- infrastructure commit bb7762961d3ce745688e9050e914c1d3f980268d Merge: 48c72d1 35d5a9a Author: Linus Torvalds Date: Wed Jun 10 16:13:20 2009 -0700 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) x86: fix system without memory on node0 x86, mm: Fix node_possible_map logic mm, x86: remove MEMORY_HOTPLUG_RESERVE related code x86: make sparse mem work in non-NUMA mode x86: process.c, remove useless headers x86: merge process.c a bit x86: use sparse_memory_present_with_active_regions() on UMA x86: unify 64-bit UMA and NUMA paging_init() x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB x86: Sanity check the e820 against the SRAT table using e820 map only x86: clean up and and print out initial max_pfn_mapped x86/pci: remove rounding quirk from e820_setup_gap() x86, e820, pci: reserve extra free space near end of RAM x86: fix typo in address space documentation x86: 46 bit physical address support on 64 bits x86, mm: fault.c, use printk_once() in is_errata93() x86: move per-cpu mmu_gathers to mm/init.c x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c x86: unify noexec handling x86: remove (null) in /sys kernel_page_tables ... commit 48c72d1ab4ec86789a23aed0b0b5f31ac083c0c6 Merge: 5301e0d aeef50b Author: Linus Torvalds Date: Wed Jun 10 16:13:06 2009 -0700 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, microcode: Simplify vfree() use x86: microcode: use smp_call_function_single instead of set_cpus_allowed, cleanup of synchronization logic commit 5301e0de34a29625c06b8eef115c1a9b4c1f72b1 Merge: c44e3ed 4ecf458 Author: Linus Torvalds Date: Wed Jun 10 15:55:04 2009 -0700 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86_64: fix incorrect comments x86: unify restore_fpu_checking x86_32: introduce restore_fpu_checking() commit c44e3ed539e4fc17d6bcb5eaecb894a94de4cc5f Merge: 7dc3ca3 5095f59 Author: Linus Torvalds Date: Wed Jun 10 15:51:15 2009 -0700 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: cpu_debug: Remove model information to reduce encoding-decoding x86: fixup numa_node information for AMD CPU northbridge functions x86: k8 convert node_to_k8_nb_misc() from a macro to an inline function x86: cacheinfo: complete L2/L3 Cache and TLB associativity field definitions x86/docs: add description for cache_disable sysfs interface x86: cacheinfo: disable L3 ECC scrubbing when L3 cache index is disabled x86: cacheinfo: replace sysfs interface for cache_disable feature x86: cacheinfo: use cached K8 NB_MISC devices instead of scanning for it x86: cacheinfo: correct return value when cache_disable feature is not active x86: cacheinfo: use L3 cache index disable feature only for CPUs that support it commit 7dc3ca39cb1e22eedbf1207ff9ac7bf682fc0f6d Merge: aa98936 a4046f8 Author: Linus Torvalds Date: Wed Jun 10 15:49:36 2009 -0700 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, nmi: Use predefined numbers instead of hardcoded one x86: asm/processor.h: remove double declaration x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000 x86, mtrr: remove mtrr MSRs double declaration x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000 x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000 x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap x86: mce: remove duplicated #include x86: msr-index.h remove duplicate MSR C001_0015 declaration x86: clean up arch/x86/kernel/tsc_sync.c a bit x86: use symbolic name for VM86_SIGNAL when used as vm86 default return x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h x86: avoid multiple declaration of kstack_depth_to_print x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used x86: clean up declarations and variables x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static x86 early quirks: eliminate unused function commit aa98936e4f423dc2706771368598b04870059d14 Merge: 082b63a 0c23590 Author: Linus Torvalds Date: Wed Jun 10 15:49:10 2009 -0700 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, 64-bit: ifdef out struct thread_struct::ip x86, 32-bit: ifdef out struct thread_struct::fs x86: clean up alternative.h commit 082b63ae45e7d14e15995dedd782ec7344596fb2 Merge: 99e97b8 50fa610 Author: Linus Torvalds Date: Wed Jun 10 15:48:53 2009 -0700 Merge branch 'sched-docs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-docs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Document memory barriers implied by sleep/wake-up primitives commit 99e97b860e14c64760855198e91d1166697131a7 Merge: 82782ca f04d82b Author: Linus Torvalds Date: Wed Jun 10 15:32:59 2009 -0700 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: fix typo in sched-rt-group.txt file ftrace: fix typo about map of kernel priority in ftrace.txt file. sched: properly define the sched_group::cpumask and sched_domain::span fields sched, timers: cleanup avenrun users sched, timers: move calc_load() to scheduler sched: Don't export sched_mc_power_savings on multi-socket single core system sched: emit thread info flags with stack trace sched: rt: document the risk of small values in the bandwidth settings sched: Replace first_cpu() with cpumask_first() in ILB nomination code sched: remove extra call overhead for schedule() sched: use group_first_cpu() instead of cpumask_first(sched_group_cpus()) wait: don't use __wake_up_common() sched: Nominate a power-efficient ilb in select_nohz_balancer() sched: Nominate idle load balancer from a semi-idle package. sched: remove redundant hierarchy walk in check_preempt_wakeup commit 511b01bdf64ad8a38414096eab283c7784aebfc4 Author: Ingo Molnar Date: Thu Jun 11 00:32:00 2009 +0200 Revert "x86, bts: reenable ptrace branch trace support" This reverts commit 7e0bfad24d85de7cf2202a7b0ce51de11a077b21. A late objection to the ABI has arrived: http://lkml.org/lkml/2009/6/10/253 Keep the ABI disabled out of caution, to not create premature user-space expectations. While the hw-branch-tracing variant uses and tests the BTS code. Cc: Peter Zijlstra Cc: Markus Metzger Cc: Oleg Nesterov Signed-off-by: Ingo Molnar commit 82782ca77d1bfb32b0334cce40a25b91bd8ec016 Merge: f0d5e12 6799687 Author: Linus Torvalds Date: Wed Jun 10 15:30:41 2009 -0700 Merge branch 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits) x86, boot: add new generated files to the appropriate .gitignore files x86, boot: correct the calculation of ZO_INIT_SIZE x86-64: align __PHYSICAL_START, remove __KERNEL_ALIGN x86, boot: correct sanity checks in boot/compressed/misc.c x86: add extension fields for bootloader type and version x86, defconfig: update kernel position parameters x86, defconfig: update to current, no material changes x86: make CONFIG_RELOCATABLE the default x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB x86: document new bzImage fields x86, boot: make kernel_alignment adjustable; new bzImage fields x86, boot: remove dead code from boot/compressed/head_*.S x86, boot: use LOAD_PHYSICAL_ADDR on 64 bits x86, boot: make symbols from the main vmlinux available x86, boot: determine compressed code offset at compile time x86, boot: use appropriate rep string for move and clear x86, boot: zero EFLAGS on 32 bits x86, boot: set up the decompression stack as early as possible x86, boot: straighten out ranges to copy/zero in compressed/head*.S x86, boot: stylistic cleanups for boot/compressed/head_64.S ... Fixed trivial conflict in arch/x86/configs/x86_64_defconfig manually commit f0d5e12bd42b7173ebbbf59279c867605f859814 Merge: 0fea615 103428e Author: Linus Torvalds Date: Wed Jun 10 15:25:41 2009 -0700 Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits) x86, apic: Fix dummy apic read operation together with broken MP handling x86, apic: Restore irqs on fail paths x86: Print real IOAPIC version for x86-64 x86: enable_update_mptable should be a macro sparseirq: Allow early irq_desc allocation x86, io-apic: Don't mark pin_programmed early x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled x86, irq: update_mptable needs pci_routeirq x86: don't call read_apic_id if !cpu_has_apic x86, apic: introduce io_apic_irq_attr x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix x86: read apic ID in the !acpi_lapic case x86: apic: Fixmap apic address even if apic disabled x86: display extended apic registers with print_local_APIC and cpu_debug code x86: read apic ID in the !acpi_lapic case x86: clean up and fix setup_clear/force_cpu_cap handling x86: apic: Check rev 3 fadt correctly for physical_apic bit x86/pci: update pirq_enable_irq() to setup io apic routing x86/acpi: move setup io apic routing out of CONFIG_ACPI scope x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector() ... commit 0fea615e526b4b7eff0363ee02d5753e5f924089 Author: Harald Welte Date: Mon Jun 8 18:29:36 2009 +0800 CPUFREQ: Mark e_powersaver driver as EXPERIMENTAL and DANGEROUS The e_powersaver driver for VIA's C7 CPU's needs to be marked as DANGEROUS as it configures the CPU to power states that are out of specification. According to Centaur, all systems with C7 and Nano CPU's support the ACPI p-state method. Thus, the acpi-cpufreq driver should be used instead. Signed-off-by: Harald Welte Signed-off-by: Linus Torvalds commit 0de51088e6a82bc8413d3ca9e28bbca2788b5b53 Author: Harald Welte Date: Mon Jun 8 18:27:54 2009 +0800 CPUFREQ: Enable acpi-cpufreq driver for VIA/Centaur CPUs The VIA/Centaur C7, C7-M and Nano CPU's all support ACPI based cpu p-states using a MSR interface. The Linux driver just never made use of it, since in addition to the check for the EST flag it also checked if the vendor is Intel. Signed-off-by: Harald Welte [ Removed the vendor checks entirely - Linus ] Signed-off-by: Linus Torvalds commit 6ff9a64d2aaa6eae396adc95e9c91c0cbfa6dbe4 Author: Steven Rostedt Date: Wed Jun 10 14:28:34 2009 -0400 tracing: do not translate event helper macros in print format By moving the macro that creates the print format code above the defining of the event macro helpers (__get_str, __print_symbolic, and __get_dynamic_array), we get a little cleaner print format. Instead of: (char *)((void *)REC + REC->__data_loc_name) we get: __get_str(name) Instead of: ({ static const struct trace_print_flags symbols[] = { { HI_SOFTIRQ, "HI" }, { we get: __print_symbolic(REC->vec, { HI_SOFTIRQ, "HI" }, { Signed-off-by: Steven Rostedt commit bc5c6c043d8381676339fb3da59cc4cc5921d368 Author: Mike Frysinger Date: Wed Jun 10 04:48:41 2009 -0400 ftrace/documentation: fix typo in function grapher name The function graph tracer is called just "function_graph" (no trailing "_tracer" needed). Signed-off-by: Mike Frysinger LKML-Reference: <1244623722-6325-1-git-send-email-vapier@gentoo.org> Signed-off-by: Steven Rostedt commit f1db457ce6e2f63cb01022f58c0c023838958bd1 Author: Li Zefan Date: Wed Jun 10 10:06:24 2009 +0800 tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK Fix building failures when CONFIG_BLOCK == n. Signed-off-by: Li Zefan LKML-Reference: <4A2F1520.8020003@cn.fujitsu.com> Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar commit 04dce7d9d429ea5ea04e9432d1726c930f4d67da Author: Benjamin Herrenschmidt Date: Wed Jun 10 16:59:46 2009 +1000 spinlock: Add missing __raw_spin_lock_flags() stub for UP This was only defined with CONFIG_DEBUG_SPINLOCK set, but some obscure arch/powerpc code wants it always. Signed-off-by: Benjamin Herrenschmidt Acked-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 2b83868723d090078ac0e2120e06a1cc94dbaef0 Author: Linus Torvalds Date: Tue Jun 9 20:40:25 2009 -0700 Make /dev/zero reads interruptible by signals This helps with bad latencies for large reads from /dev/zero, but might conceivably break some application that "knows" that a read of /dev/zero cannot return early. So do this early in the merge window to give us maximal test coverage, even if the patch is totally trivial. Obviously, no well-behaved application should ever depend on the read being uninterruptible, but hey, bugs happen. Signed-off-by: Linus Torvalds commit 110bf2b764eb6026b868d84499263cb24b1bcc8d Author: Steven Rostedt Date: Tue Jun 9 17:29:07 2009 -0400 tracing: add protection around module events unload When reading the trace buffer, there is a race that when a module is unloaded it removes events that is stilled referenced in the buffers. This patch adds the protection around the unloading of the events from modules and the reading of the trace buffers. Signed-off-by: Steven Rostedt commit 725c624a58a10ef90a2ff889e122158fabf36147 Author: Steven Rostedt Date: Mon Jun 8 19:09:45 2009 -0400 tracing: add trace_seq_vprint interface The code to update the print formats for events requires a vprintf format in the trace_seq. This patch adds that interface. Signed-off-by: Steven Rostedt commit 6556d1df88fe68f9836beeb43342a336691cb67c Author: Steven Rostedt Date: Tue Jun 9 14:04:26 2009 -0400 tracing: fix the block trace points print size The sector field is either u64 or unsigned long depending on the arch. This patch casts the sector to unsigned long long to prevent the printf warnings. [ Impact: remove compile warnings ] Signed-off-by: Steven Rostedt commit 55782138e47d9baf2f7d3a7af9e7cf42adf72c56 Author: Li Zefan Date: Tue Jun 9 13:43:05 2009 +0800 tracing/events: convert block trace points to TRACE_EVENT() TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions ... Cons: - no dev_t info for the output of plug, unplug_timer and unplug_io events. no dev_t info for getrq and sleeprq events if bio == NULL. no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL. This is mainly because we can't get the deivce from a request queue. But this may change in the future. - A packet command is converted to a string in TP_assign, not TP_print. While blktrace do the convertion just before output. Since pc requests should be rather rare, this is not a big issue. - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT has a unique format, which means we have some unused data in a trace entry. The overhead is minimized by using __dynamic_array() instead of __array(). I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing: dd dd + ioctl blktrace dd + TRACE_EVENT (splice) 1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s 2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s 3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s So the overhead of tracing is very small, and no regression when using those trace events vs blktrace. And the binary output of TRACE_EVENT is much smaller than blktrace: # ls -l -h -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out Following are some comparisons between TRACE_EVENT and blktrace: plug: kjournald-480 [000] 303.084981: block_plug: [kjournald] kjournald-480 [000] 303.084981: 8,0 P N [kjournald] unplug_io: kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1 kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1 remap: kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384 kjournald-480 [000] 303.085043: 8,0 A W 102736992 + 8 <- (8,8) 33384 bio_backmerge: kjournald-480 [000] 303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald] kjournald-480 [000] 303.085086: 8,0 M W 102737032 + 8 [kjournald] getrq: kjournald-480 [000] 303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald] kjournald-480 [000] 303.084975: 8,0 G W 102736984 + 8 [kjournald] bash-2066 [001] 1072.953770: 8,0 G N [bash] bash-2066 [001] 1072.953773: block_getrq: 0,0 N 0 + 0 [bash] rq_complete: konsole-2065 [001] 300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0] konsole-2065 [001] 300.053191: 8,0 C W 103669040 + 16 [0] ksoftirqd/1-7 [001] 1072.953811: 8,0 C N (5a 00 08 00 00 00 00 00 24 00) [0] ksoftirqd/1-7 [001] 1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0] rq_insert: kjournald-480 [000] 303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald] kjournald-480 [000] 303.084986: 8,0 I W 102736984 + 8 [kjournald] Changelog from v2 -> v3: - use the newly introduced __dynamic_array(). Changelog from v1 -> v2: - use __string() instead of __array() to minimize the memory required to store hex dump of rq->cmd(). - support large pc requests. - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT. - some cleanups. Signed-off-by: Li Zefan LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit f57a8a1911342265e7acdc190333c4e9235a6632 Author: Steven Rostedt Date: Fri Jun 5 14:11:30 2009 -0400 ring-buffer: fix ret in rb_add_time_stamp The update of ret got mistakenly added to the if statement of rb_try_to_discard. The variable ret should be 1 on commit and zero otherwise. [ Impact: fix compiler warning and real bug ] Signed-off-by: Steven Rostedt commit 0b8c3d5ab000c22889af7f9409799a6cdc31a2b2 Author: Chuck Ebbert Date: Tue Jun 9 10:40:50 2009 -0400 x86: Clear TS in irq_ts_save() when in an atomic section The dynamic FPU context allocation changes caused the padlock driver to generate the below warning. Fix it by masking TS when doing padlock encryption operations in an atomic section. This solves: BUG: sleeping function called from invalid context at mm/slub.c:1602 in_atomic(): 1, irqs_disabled(): 0, pid: 82, name: cryptomgr_test Pid: 82, comm: cryptomgr_test Not tainted 2.6.29.4-168.test7.fc11.x86_64 #1 Call Trace: [] __might_sleep+0x10b/0x110 [] kmem_cache_alloc+0x37/0xf1 [] init_fpu+0x49/0x8a [] math_state_restore+0x3e/0xbc [] do_device_not_available+0x9/0xb [] device_not_available+0x1b/0x20 [] ? aes_crypt+0x66/0x74 [padlock_aes] [] ? blkcipher_walk_next+0x257/0x2e0 [] ? blkcipher_walk_first+0x18e/0x19d [] aes_encrypt+0x9d/0xe5 [padlock_aes] [] crypt+0x6b/0x114 [xts] [] ? aes_encrypt+0x0/0xe5 [padlock_aes] [] ? aes_encrypt+0x0/0xe5 [padlock_aes] [] encrypt+0x49/0x4b [xts] [] async_encrypt+0x3c/0x3e [] test_skcipher+0x1da/0x658 [] ? crypto_spawn_tfm+0x8e/0xb1 [] ? __crypto_alloc_tfm+0x11b/0x15f [] ? crypto_spawn_tfm+0x8e/0xb1 [] ? skcipher_geniv_init+0x2b/0x47 [] ? async_chainiv_init+0x5c/0x61 [] alg_test_skcipher+0x63/0x9b [] alg_test+0x12d/0x175 [] cryptomgr_test+0x38/0x54 [] ? cryptomgr_test+0x0/0x54 [] kthread+0x4d/0x78 [] child_rip+0xa/0x20 [] ? restore_args+0x0/0x30 [] ? kthread+0x0/0x78 [] ? child_rip+0x0/0x20 Signed-off-by: Chuck Ebbert Cc: Suresh Siddha LKML-Reference: <20090609104050.50158cfe@dhcp-100-2-144.bos.redhat.com> Signed-off-by: Ingo Molnar commit 92db1e6af747faa129e236d68386af26a0efc12b Merge: 0bf8412 e9a22a1 Author: Ingo Molnar Date: Tue Jun 9 16:18:11 2009 +0200 Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu commit 42937e81a82b6bbc51a309c83da140b3a7ca5945 Author: Andreas Herrmann Date: Mon Jun 8 15:55:09 2009 +0200 x86: Detect use of extended APIC ID for AMD CPUs Booting a 32-bit kernel on Magny-Cours results in the following panic: ... Using APIC driver default ... Overriding APIC driver with bigsmp ... Getting VERSION: 80050010 Getting VERSION: 80050010 Getting ID: 10000000 Getting ID: ef000000 Getting LVT0: 700 Getting LVT1: 10000 Kernel panic - not syncing: Boot APIC ID in local APIC unexpected (16 vs 0) Pid: 1, comm: swapper Not tainted 2.6.30-rcX #2 Call Trace: [] ? panic+0x38/0xd3 [] ? native_smp_prepare_cpus+0x259/0x31f [] ? kernel_init+0x3e/0x141 [] ? kernel_init+0x0/0x141 [] ? kernel_thread_helper+0x7/0x10 The reason is that default_get_apic_id handled extension of local APIC ID field just in case of XAPIC. Thus for this AMD CPU, default_get_apic_id() returns 0 and bigsmp_get_apic_id() returns 16 which leads to the respective kernel panic. This patch introduces a Linux specific feature flag to indicate support for extended APIC id (8 bits instead of 4 bits width) and sets the flag on AMD CPUs if applicable. Signed-off-by: Andreas Herrmann Cc: LKML-Reference: <20090608135509.GA12431@alberich.amd.com> Signed-off-by: Ingo Molnar commit e9a22a13c71986851e931bdfa054f68839ff8576 Author: Joerg Roedel Date: Tue Jun 9 12:00:37 2009 +0200 amd-iommu: remove unnecessary "AMD IOMMU: " prefix That prefix is already included in the DUMP_printk macro. So there is no need to repeat it in the format string. Signed-off-by: Joerg Roedel commit 71ff3bca2f70264effe8cbbdd5bc10cf6be5f2f0 Author: Joerg Roedel Date: Mon Jun 8 13:47:33 2009 -0700 amd-iommu: detach device explicitly before attaching it to a new domain This fixes a bug with a device that could not be assigned to a KVM guest because it is still assigned to a dma_ops protection domain. [chrisw: simply remove WARN_ON(), will always fire since dev->driver will be pci-sub] Signed-off-by: Chris Wright Signed-off-by: Joerg Roedel commit 29150078d7a1758df8c7a6cd2ec066ac65e1fab9 Author: Joerg Roedel Date: Tue Jun 9 10:54:18 2009 +0200 amd-iommu: remove BUS_NOTIFY_BOUND_DRIVER handling Handling this event causes device assignment in KVM to fail because the device gets re-attached as soon as the pci-stub registers as the driver for the device. Signed-off-by: Joerg Roedel commit d2dd01de9924ae24afeba5aa5bc2e08287701df6 Merge: 367d04c 62a6f46 Author: Joerg Roedel Date: Tue Jun 9 10:50:57 2009 +0200 Merge commit 'tip/core/iommu' into amd-iommu/fixes commit 1f8a6a10fb9437eac3f516ea4324a19087872f30 Author: Peter Zijlstra Date: Mon Jun 8 18:18:39 2009 +0200 ring-buffer: pass in lockdep class key for reader_lock On Sun, 7 Jun 2009, Ingo Molnar wrote: > Testing tracer sched_switch: <6>Starting ring buffer hammer > PASSED > Testing tracer sysprof: PASSED > Testing tracer function: PASSED > Testing tracer irqsoff: > ============================================= > PASSED > Testing tracer preemptoff: PASSED > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ] > PASSED > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760 > --------------------------------------------- > rb_consumer/431 is trying to acquire lock: > (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_reset_cpu+0x37/0x70 > > but task is already holding lock: > (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_consume+0x7e/0xc0 > > other info that might help us debug this: > 1 lock held by rb_consumer/431: > #0: (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_consume+0x7e/0xc0 The ring buffer is a generic structure, and can be used outside of ftrace. If ftrace traces within the use of the ring buffer, it can produce false positives with lockdep. This patch passes in a static lock key into the allocation of the ring buffer, so that different ring buffers will have their own lock class. Reported-by: Ingo Molnar Signed-off-by: Peter Zijlstra LKML-Reference: <1244477919.13761.9042.camel@twins> [ store key in ring buffer descriptor ] Signed-off-by: Steven Rostedt commit c9690998ef48ffefeccb91c70a7739eebdea57f9 Author: Andreas Herrmann Date: Mon Jun 8 19:09:39 2009 +0200 x86: memtest: remove 64-bit division Using gcc 3.3.5 a "make allmodconfig" + "CONFIG_KVM=n" triggers a build error: arch/x86/mm/built-in.o(.init.text+0x43f7): In function `__change_page_attr': arch/x86/mm/pageattr.c:114: undefined reference to `__udivdi3' make: *** [.tmp_vmlinux1] Error 1 The culprit turned out to be a division in arch/x86/mm/memtest.c For more info see this thread: http://marc.info/?l=linux-kernel&m=124416232620683 The patch entirely removes the division that caused the build error. [ Impact: build fix with certain GCC versions ] Reported-by: Tetsuo Handa Signed-off-by: Andreas Herrmann Cc: Yinghai Lu Cc: xiyou.wangcong@gmail.com Cc: Andrew Morton Cc: LKML-Reference: <20090608170939.GB12431@alberich.amd.com> Signed-off-by: Ingo Molnar commit c4ed3f04ba9defe22aa729d1646f970f791c03d7 Author: Jack Steiner Date: Mon Jun 8 10:44:05 2009 -0500 x86, UV: Fix macros for multiple coherency domains Fix bug in the SGI UV macros that support systems with multiple coherency domains. The macros used for referencing global MMR (chipset registers) are failing to correctly "or" the NASID (node identifier) bits that reside above M+N. These high bits are supplied automatically by the chipset for memory accesses coming from the processor socket. However, the bits must be present for references to the special global MMR space used to map chipset registers. (See uv_hub.h for more details ...) The bug results in references to invalid/incorrect nodes. Signed-off-by: Jack Steiner Cc: LKML-Reference: <20090608154405.GA16395@sgi.com> Signed-off-by: Ingo Molnar commit 0bf841281e58d0b3cc9fe9dc4383df7694bde6bd Author: Joerg Roedel Date: Mon Jun 8 15:53:46 2009 +0200 dma-debug: simplify logic in driver_filter() This patch makes the driver_filter function more readable by reorganizing the code. The removal of a code code block to an upper indentation level makes hard-to-read line-wraps unnecessary. Signed-off-by: Joerg Roedel commit be81c6ea23b8b471141734ef4bc005f5127aaf43 Author: Joerg Roedel Date: Mon Jun 8 15:46:19 2009 +0200 dma-debug: disable/enable irqs only once in device_dma_allocations There is no need to disable/enable irqs on each loop iteration. Just disable irqs for the whole time the loop runs. Signed-off-by: Joerg Roedel commit e7ed70eedccc78e79ce6da2155e9caf90aff4003 Author: Joerg Roedel Date: Mon Jun 8 15:39:24 2009 +0200 dma-debug: use pr_* instead of printk(KERN_* ...) The pr_* macros are shorter than the old printk(KERN_ ...) variant. Change the dma-debug code to use the new macros and save a few unnecessary line breaks. If lines don't break the source code can also be grepped more easily. Signed-off-by: Joerg Roedel commit c17e2cf7376a2010b8b114fdeebd4e340a5e9cb2 Author: Joerg Roedel Date: Mon Jun 8 15:19:29 2009 +0200 dma-debug: code style fixes This patch changes the recent updates to dma-debug to conform with coding style guidelines of Linux and the -tip tree. Signed-off-by: Joerg Roedel commit 312325094785566a0e42a88c1bf6e7eb54c5d70e Author: Joerg Roedel Date: Mon Jun 8 15:07:08 2009 +0200 dma-debug: comment style fixes Last patch series introduced some new comment which does not fit the Kernel comment style guidelines. Fix it with this patch. Signed-off-by: Joerg Roedel commit aeef50bc0483fa70ce0bddb686ec84a274b7f3d4 Author: Figo.zhang Date: Sun Jun 7 22:30:36 2009 +0800 x86, microcode: Simplify vfree() use vfree() does its own 'NULL' check, so no need for check before calling it. In v2, remove the stray newline. [ Impact: cleanup ] Signed-off-by: Figo.zhang Cc: Dmitry Adamushko LKML-Reference: <1244385036.3402.11.camel@myhost> Signed-off-by: Ingo Molnar commit 3aa6b186f86c5d06d6d92d14311ffed51f091f40 Author: Lubomir Rintel Date: Sun Jun 7 16:23:48 2009 +0200 x86: Fix non-lazy GS handling in sys_vm86() This fixes a stack corruption panic or null dereference oops due to a bad GS in resume_userspace() when returning from sys_vm86() and calling lockdep_sys_exit(). Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR enabled. Signed-off-by: Lubomir Rintel Cc: H. Peter Anvin LKML-Reference: <1244384628.2323.4.camel@bimbo> Signed-off-by: Ingo Molnar commit a4046f8d299e00e9855ae292527c2d66a42670eb Author: Cyrill Gorcunov Date: Sun Jun 7 12:19:37 2009 +0400 x86, nmi: Use predefined numbers instead of hardcoded one [ Impact: cleanup ] Signed-off-by: Cyrill Gorcunov LKML-Reference: <20090607081937.GC4547@lenovo> Signed-off-by: Ingo Molnar commit 103428e57be323c3c5545db8ad12667099bc6005 Author: Cyrill Gorcunov Date: Sun Jun 7 16:48:40 2009 +0400 x86, apic: Fix dummy apic read operation together with broken MP handling Ingo Molnar reported that read_apic is buggy novadays: [ 0.000000] Using APIC driver default [ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs [ 0.000000] Local APIC disabled by BIOS -- you can enable it with "lapic" [ 0.000000] APIC: disable apic facility [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at arch/x86/kernel/apic/apic.c:254 native_apic_read_dummy+0x2d/0x3b() [ 0.000000] Hardware name: HP OmniBook PC Indeed we still rely on apic->read operation for SMP compiled kernel. And instead of disfigure the SMP code with #ifdef we allow to call apic->read. To capture any unexpected results we check for apic->read being called for sane reason via WARN_ON_ONCE but(!) instead of OR we should use AND logical operation (thanks Yinghai for spotting the root of the problem). Along with that we could be have bad MP table and we are to fix it that way no SMP started and no complains about BIOS bug if apic was just disabled via command line. Signed-off-by: Cyrill Gorcunov Cc: Yinghai Lu LKML-Reference: <20090607124840.GD4547@lenovo> Signed-off-by: Ingo Molnar commit 4a4aca641bc4598e77b866804f47c651ec4a764d Author: Jean Delvare Date: Fri Jun 5 12:02:38 2009 +0200 x86: Add quirk for reboot stalls on a Dell Optiplex 360 The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so the same quirk is needed. Signed-off-by: Jean Delvare Cc: Steve Conklin Cc: Leann Ogasawara Cc: LKML-Reference: <200906051202.38311.jdelvare@suse.de> Signed-off-by: Ingo Molnar commit 5095f59bda6793a7b8f0856096d6893fe98e0e51 Author: Jaswinder Singh Rajput Date: Fri Jun 5 23:27:17 2009 +0530 x86: cpu_debug: Remove model information to reduce encoding-decoding Remove model information, encoding/decoding and reduce bookkeeping. This, besides removing a lot of code and cleaning up the code, also enables these features on many more CPUs that were enumerated before. Reported-by: Ingo Molnar Signed-off-by: Jaswinder Singh Rajput Cc: Alan Cox LKML-Reference: <1244224637.8212.6.camel@ht.satnam> Signed-off-by: Ingo Molnar commit 5f4457a4f62cc9d78e04c0eb12ff0540899aad89 Merge: 9b94b3a b87297f Author: Ingo Molnar Date: Sun Jun 7 12:22:15 2009 +0200 Merge branch 'linus' into x86/cpu commit 62a6f465f6572e1f28765c583c12753bb3e23715 Merge: 56fdd18 bdc2911 Author: Ingo Molnar Date: Sun Jun 7 11:36:02 2009 +0200 Merge branch 'dma-debug/2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu commit 56fdd18c7b89a2fac1dfe5d54750c9143867fdc4 Merge: 7caf6a4 b87297f Author: Ingo Molnar Date: Sun Jun 7 11:34:59 2009 +0200 Merge branch 'linus' into core/iommu Merge reason: This branch was on an -rc5 base so pull almost-2.6.30 to resync with the latest upstream fixes and make sure the combination works fine. Signed-off-by: Ingo Molnar commit 7caf6a49bb17d0377210693af5737563b31aa5ee Author: Joerg Roedel Date: Fri Jun 5 12:01:35 2009 +0200 dma-debug: change hash_bucket_find from first-fit to best-fit Some device drivers map the same physical address multiple times to a dma address. Without an IOMMU this results in the same dma address being put into the dma-debug hash multiple times. With a first-fit match in hash_bucket_find() this function may return the wrong dma_debug_entry. This can result in false positive warnings. This patch fixes it by changing the first-fit behavior of hash_bucket_find() into a best-fit algorithm. Reported-by: Torsten Kaiser Reported-by: FUJITA Tomonori Signed-off-by: Joerg Roedel Cc: lethal@linux-sh.org Cc: just.for.lkml@googlemail.com Cc: hancockrwd@gmail.com Cc: jens.axboe@oracle.com Cc: bharrosh@panasas.com Cc: FUJITA Tomonori Cc: Linus Torvalds Cc: LKML-Reference: <20090605104132.GE24836@amd.com> Signed-off-by: Ingo Molnar commit fe2245c905631a3a353504fc04388ce3dfaf9d9e Author: Mark Langsdorf Date: Sun Jul 5 15:50:52 2009 -0500 x86: enable GART-IOMMU only after setting up protection methods The current code to set up the GART as an IOMMU enables GART translations before it removes the aperture from the kernel memory map, sets the GART PTEs to UC, sets up the guard and scratch pages, or does a wbinvd(). This leaves the possibility of cache aliasing open and can cause system crashes. Re-order the code so as to enable the GART translations only after all safeguards are in place and the tlb has been flushed. AMD has tested this patch on both Istanbul systems and 1st generation Opteron systems with APG enabled and seen no adverse effects. Istanbul systems with HT Assist enabled sometimes see MCE errors due to cache artifacts with the unmodified code. Signed-off-by: Mark Langsdorf Cc: Cc: Joerg Roedel Cc: akpm@linux-foundation.org Cc: jbarnes@virtuousgeek.org Signed-off-by: Ingo Molnar commit 918143e8b7d6153d7a83a3f854323407939f4a7e Merge: 64edbc5 563af16 Author: Ingo Molnar Date: Fri Jun 5 16:50:29 2009 +0200 Merge branch 'tip/tracing/ftrace-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace commit 64edbc562034f2ec3fce382cb208fab40586d005 Merge: 43bd123 0f6ce3d Author: Ingo Molnar Date: Thu Jun 4 13:59:26 2009 +0200 Merge branch 'tracing/ftrace' into tracing/core Merge reason: this mini-topic had outstanding problems that delayed its merge, so it does not fast-forward. Signed-off-by: Ingo Molnar commit 563af16c30ede41eda2d614195d88e07f7c7103d Author: Steven Rostedt Date: Wed Jun 3 11:10:44 2009 -0400 tracing: add annotation to what type of stack trace is recorded The current method of printing out a stack trace is to add a new line and print out the trace: yum-updatesd-3120 [002] 573.691303: => do_softirq => irq_exit => smp_apic_timer_interrupt => apic_timer_interrupt This looks a bit awkward, and if we have both stack and user stack traces running, it would be nice to have a title to tell them apart, although it is easy to tell by the output. This patch adds an annotation to the start of the stack traces: init-1 [003] 929.304979: => user_path_at => vfs_fstatat => vfs_stat => sys_newstat => system_call_fastpath cat-3459 [002] 1016.824040: => <0000003aae6c0250> => <00007ffff4b06ae4> => <69636172742f6775> Signed-off-by: Steven Rostedt commit 56d8bd3f0b98972312cad683947ec90b21011199 Author: Steven Whitehouse Date: Wed Jun 3 14:52:03 2009 +0100 tracing: fix multiple use of __print_flags and __print_symbolic Here is an updated patch to include the extra call to trace_seq_init() as requested. This is vs. the latest -tip tree and fixes the use of multiple __print_flags and __print_symbolic in a single tracer. Also tested to ensure its working now: mount.gfs2-2534 [000] 235.850587: gfs2_glock_queue: 8.7 glock 1:2 dequeue PR mount.gfs2-2534 [000] 235.850591: gfs2_demote_rq: 8.7 glock 1:0 demote EX to NL flags:DI mount.gfs2-2534 [000] 235.850591: gfs2_glock_queue: 8.7 glock 1:0 dequeue EX glock_workqueue-2529 [000] 235.850666: gfs2_glock_state_change: 8.7 glock 1:0 state EX => NL tgt:NL dmt:NL flags:lDpI glock_workqueue-2529 [000] 235.850672: gfs2_glock_put: 8.7 glock 1:0 state NL => IV flags:I Signed-off-by: Steven Whitehouse LKML-Reference: <1244037123.29604.603.camel@localhost.localdomain> Signed-off-by: Steven Rostedt commit 048dc50c5e7eada19ebabbad70b7966d14283d41 Author: walimis Date: Wed Jun 3 16:01:30 2009 +0800 tracing/events: fix output format of user stack According to "events/ftrace/user_stack/format", fix the output of user stack. before fix: sh-1073 [000] 31.137561: <- <0804e33c> <- <080835c1> after fix: sh-1072 [000] 37.039329: => => <0804e33c> => <080835c1> Signed-off-by: walimis LKML-Reference: <1244016090-7814-3-git-send-email-walimisdev@gmail.com> Signed-off-by: Steven Rostedt commit f11b3f4e2932bfdcfc458ab8d1ece62724ceabfc Author: walimis Date: Wed Jun 3 16:01:29 2009 +0800 tracing/events: fix output format of kernel stack According to "events/ftrace/kernel_stack/format", output format of kernel stack should use "=>" instead of "<=". The second problem is that we shouldn't skip the first entry in the stack, although it seems to be duplicated when used in the "function" tracer, but events also use it. If we skip the first one, we will drop the topmost entry of the stack. The last problem is that if the last entry is ULONG_MAX(0xffffffff), we should drop it, otherwise it will print a NULL name line. before fix: sh-1072 [000] 26.957239: sched_process_fork: parent sh:1072 child sh:1073 sh-1072 [000] 26.957262: <= syscall_call <= sh-1072 [000] 26.957744: sched_switch: task sh:1072 [120] (R) ==> sh:1073 [120] sh-1072 [000] 26.957752: <= preempt_schedule <= wake_up_new_task <= do_fork <= sys_clone <= syscall_call <= After fix: sh-1075 [000] 39.791848: sched_process_fork: parent sh:1075 child sh:1076 sh-1075 [000] 39.791871: => sys_clone => syscall_call sh-1075 [000] 39.792713: sched_switch: task sh:1075 [120] (R) ==> sh:1076 [120] sh-1075 [000] 39.792722: => schedule => preempt_schedule => wake_up_new_task => do_fork => sys_clone => syscall_call Signed-off-by: walimis LKML-Reference: <1244016090-7814-2-git-send-email-walimisdev@gmail.com> Signed-off-by: Steven Rostedt commit 083a63b48e4dd0a6a2d44216720076dc81ebb255 Author: walimis Date: Wed Jun 3 16:01:28 2009 +0800 tracing/trace_stack: fix the number of entries in the header The last entry in the stack_dump_trace is ULONG_MAX, which is not a valid entry, but max_stack_trace.nr_entries has accounted for it. So when printing the header, we should decrease it by one. Before fix, print as following, for example: Depth Size Location (53 entries) <--- should be 52 ----- ---- -------- 0) 3264 108 update_wall_time+0x4d5/0x9a0 ... 51) 80 80 syscall_call+0x7/0xb ^^^ it's correct. Signed-off-by: walimis LKML-Reference: <1244016090-7814-1-git-send-email-walimisdev@gmail.com> Signed-off-by: Steven Rostedt commit ea05b57cc19234d8de9887c8a32c2e58e84b56ba Author: Steven Rostedt Date: Wed Jun 3 09:30:10 2009 -0400 ring-buffer: discard timestamps that are at the start of the buffer Every buffer page in the ring buffer includes its own time stamp. When an event is recorded to the ring buffer with a delta time greater than what can be held in the event header, a time stamp event is created. If the the create timestamp falls over to the next buffer page, it is redundant because the buffer page holds a full time stamp. This patch will try to discard the time stamp when it falls to the start of the next page. This change also fixes a issues with disarding events. If most events are discarded, timestamps will start to creep into the ring buffer. If we do not discard the timestamps then they can fill up the ring buffer over time and waste space. This change will keep time stamps from filling up over another page. If something is recorded in the buffer page, and the rest is filtered, then the time stamps can only fill up to the end of the page. [ Impact: prevent time stamps from filling ring buffer ] Reported-by: Tim Bird Signed-off-by: Steven Rostedt commit edd813bffc62a980bb4fb9b1243f31c1cce78da3 Author: Steven Rostedt Date: Tue Jun 2 23:00:53 2009 -0400 ring-buffer: try to discard unneeded timestamps There are times that a race may happen that we add a timestamp in a nested write. This timestamp would just contain a zero delta and serves no purpose. Now that we have a way to discard events, this patch will try to discard the timestamp instead of just wasting the space in the ring buffer. Signed-off-by: Steven Rostedt commit a2023556409cf7fec5d67a26f7fcfa57c5a4086d Author: Tim Bird Date: Tue Jun 2 17:06:54 2009 -0700 ring-buffer: fix bug in ring_buffer_discard_commit There's a bug in ring_buffer_discard_commit. The wrong pointer is being compared in order to check if the event can be freed from the buffer rather than discarded (i.e. marked as PAD). I noticed this when I was working on duration filtering. The bug is not deadly - it just results in lots of wasted space in the buffer. All filtered events are left in the buffer and marked as discarded, rather than being removed from the buffer to make space for other events. Unfortunately, when I fixed this bug, I got errors doing a filtered function trace. Multiple TIME_EXTEND events pile up in the buffer, and trigger the following loop overage warning in rb_iter_peek(): again: ... if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10)) return NULL; I'm not sure what the best way is to fix this. I don't know if I should extend the loop threshhold, or if I should make the test more complex (ignore TIME_EXTEND events), or just get rid of this loop check completely. Note that if I implement a workaround for this, then I see another problem from rb_advance_iter(). I haven't tracked that one down yet. In general, it seems like the case of removing filtered events has not been working properly, and so some assumptions about buffer invariant conditions need to be revisited. Here's the patch for the simple fix: Compare correct pointer for checking if an event can be freed rather than left as discarded in the buffer. Signed-off-by: Tim Bird LKML-Reference: <4A25BE9E.5090909@am.sony.com> Signed-off-by: Steven Rostedt commit 0e2595cdfd7df9f1128f7185152601ae5417483b Author: Cliff Wickman Date: Wed May 20 08:10:57 2009 -0500 x86: Fix UV BAU activation descriptor init The UV tlb shootdown code has a serious initialization error. An array of structures [32*8] is initialized as if it were [32]. The array is indexed by (cpu number on the blade)*8, so the short initialization works for up to 4 cpus on a blade. But above that, we provide an invalid opcode to the hub's broadcast assist unit. This patch changes the allocation of the array to use its symbolic dimensions for better clarity. And initializes all 32*8 entries. Shortened 'UV_ACTIVATION_DESCRIPTOR_SIZE' to 'UV_ADP_SIZE' per Ingo's recommendation. Tested on the UV simulator. Signed-off-by: Cliff Wickman Cc: LKML-Reference: Signed-off-by: Ingo Molnar commit 367d04c4ec02dad34d80452e32e3370db7fb6fee Author: Jiri Slaby Date: Thu May 28 09:54:48 2009 +0200 amd_iommu: fix lock imbalance In alloc_coherent there is an omitted unlock on the path where mapping fails. Add the unlock. [ Impact: fix lock imbalance in alloc_coherent ] Signed-off-by: Jiri Slaby Cc: Joerg Roedel Signed-off-by: Joerg Roedel commit 6799687a53a28536fd027ccb644833f66a778925 Author: Mike Galbraith Date: Tue Jun 2 08:23:58 2009 +0200 x86, boot: add new generated files to the appropriate .gitignore files git status complains of untracked (generated) files in arch/x86/boot.. # Untracked files: # (use "git add ..." to include in what will be committed) # # ../../arch/x86/boot/compressed/mkpiggy # ../../arch/x86/boot/compressed/piggy.S # ../../arch/x86/boot/compressed/vmlinux.lds # ../../arch/x86/boot/voffset.h # ../../arch/x86/boot/zoffset.h ..so adjust .gitignore files accordingly. Signed-off-by: Mike Galbraith Signed-off-by: H. Peter Anvin commit 84047e360af0394ac5861d433f26bbcf30f77dd1 Author: Steven Rostedt Date: Tue Jun 2 16:51:55 2009 -0400 function-graph: always initialize task ret_stack On creating a new task while running the function graph tracer, if we fail to allocate the ret_stack, and then fail the fork, the code will free the parent ret_stack. This is because the child duplicated the parent and currently points to the parent's ret_stack. This patch always initializes the task's ret_stack to NULL. [ Impact: prevent crash of parent on low memory during fork ] Signed-off-by: Steven Rostedt commit f7e8b616ed1cc6f790b82324bce8a2a60295e5c2 Author: Steven Rostedt Date: Tue Jun 2 16:39:48 2009 -0400 function-graph: move initialization of new tasks up in fork When the function graph tracer is enabled, all new tasks must allocate a ret_stack to place the return address of functions. This is because the function graph tracer will replace the real return address with a call to the tracing of the exit function. This initialization happens in fork, but it happens too late. If fork fails, then it will call free_task and that calls the freeing of this ret_stack. But before initialization happens, the new (failed) task points to its parents ret_stack. If a fork failure happens during the function trace, it would be catastrophic for the parent. Also, there's no need to call ftrace_graph_exit_task from fork, since it is called by free_task which fork calls on failure. [ Impact: prevent crash during failed fork running function graph tracer ] Signed-off-by: Steven Rostedt commit 26c01624a2a40f8a4ddf6449b65c9b1c418d0e72 Author: Steven Rostedt Date: Tue Jun 2 14:01:19 2009 -0400 function-graph: add memory barriers for accessing task's ret_stack The code that handles the tasks ret_stack allocation for every task assumes that only an interrupt can cause issues (even though interrupts are disabled). In reality, the code is allocating the ret_stack for tasks that may be running on other CPUs and there are not efficient memory barriers to handle this case. [ Impact: prevent crash due to using of uninitialized ret_stack variables ] Signed-off-by: Steven Rostedt commit 82310a3272d5a2a7652f5649ad8a55f58c8f74d9 Author: Steven Rostedt Date: Tue Jun 2 12:26:07 2009 -0400 function-graph: enable the stack after initialization of other variables The function graph tracer checks if the task_struct has ret_stack defined to know if it is OK or not to use it. The initialization is done for all tasks by one process, but the idle tasks use the same initialization used by new tasks. If an interrupt happens on an idle task that just had the ret_stack created, but before the rest of the initialization took place, then we can corrupt the return address of the functions. This patch moves the setting of the task_struct's ret_stack to after the other variables have been initialized. [ Impact: prevent kernel panic on idle task when starting function graph ] Signed-off-by: Steven Rostedt commit 179c498ae2998461fe436437a74dc29036fc7dcc Author: Steven Rostedt Date: Tue Jun 2 12:03:19 2009 -0400 function-graph: only allocate init tasks if it was not already done When the function graph tracer is enabled, it calls the initialization needed for the init tasks that would be called on all created tasks. The problem is that this is called every time the function graph tracer is enabled, and the ret_stack is allocated for the idle tasks each time. Thus, the old ret_stack is lost and a memory leak is created. This is also dangerous because if an interrupt happened on another CPU with the init task and the ret_stack is replaced, we then lose all the return pointers for the interrupt, and a crash would take place. [ Impact: fix memory leak and possible crash due to race ] Signed-off-by: Steven Rostedt commit bdc2911cde7d18580a545483844d75fdb3551729 Merge: 88f3907 016ea68 Author: Joerg Roedel Date: Tue Jun 2 16:45:02 2009 +0200 Merge branches 'dma-debug/fixes' and 'dma-debug/driver-filter' into dma-debug/2.6.31 commit 016ea6874a6d58df85b54f56997d26df13c307b2 Author: Joerg Roedel Date: Fri May 22 21:57:23 2009 +0200 dma-debug: add documentation for the driver filter This patch adds the driver filter feature to the dma-debug documentation. Signed-off-by: Joerg Roedel commit 1745de5e5639457513fe43440f2800e23c3cbc7d Author: Joerg Roedel Date: Fri May 22 21:49:51 2009 +0200 dma-debug: add dma_debug_driver kernel command line This patch add the dma_debug_driver= boot parameter to enable the driver filter for early boot. Signed-off-by: Joerg Roedel commit 8a6fc708b9bb48a79a385bdc2be0959ee2ab788d Author: Joerg Roedel Date: Fri May 22 21:23:13 2009 +0200 dma-debug: add debugfs file for driver filter This patch adds the dma-api/driver_filter file to debugfs. The root user can write a driver name into this file to see only dma-api errors for that particular driver in the kernel log. Writing an empty string to that file disables the driver filter. Signed-off-by: Joerg Roedel commit 2e507d849f1834d3fe9aae71b7aabf4c2a3827b8 Author: Joerg Roedel Date: Fri May 22 18:24:20 2009 +0200 dma-debug: add variables and checks for driver filter This patch adds the state variables for the driver filter and a function to check if the filter is enabled and matches to the current device. The check is built into the err_printk function. Signed-off-by: Joerg Roedel commit 0f6ce3de4ef6ff940308087c49760d068851c1a7 Author: Steven Rostedt Date: Mon Jun 1 21:51:28 2009 -0400 ftrace: do not profile functions when disabled A race was found that if one were to enable and disable the function profiler repeatedly, then the system can panic. This was because a profiled function may be preempted just before disabling interrupts. While the profiler is disabled and then reenabled, the preempted function could start again, and access the hash as it is being initialized. This just adds a check in the irq disabled part to check if the profiler is enabled, and if it is not then it will just exit. When the system is disabled, the profile_enabled variable is cleared before calling the unregistering of the function profiler. This unregistering calls stop machine which also acts as a synchronize schedule. [ Impact: fix panic in enabling/disabling function profiler ] Signed-off-by: Steven Rostedt commit 112f38a7e36e9d688b389507136bf3af3e6d159b Author: Steven Rostedt Date: Mon Jun 1 15:16:05 2009 -0400 tracing: make trace pipe recognize latency format flag The trace_pipe did not recognize the latency format flag and would produce different output than the trace file. The problem was partly due that the trace flags in the iterator was not set as well as the trace_pipe zeros out part of the iterator (including the flags) to be able to use the same routines as the trace file. trace_flags of the iterator should not cause any problems when not zeroed out by for trace_pipe. Reported-by: Johannes Berg Signed-off-by: Steven Rostedt commit 1d080d6c3141623c92caaebe20e847cb99ccbb60 Author: Steven Rostedt Date: Mon Jun 1 12:20:40 2009 -0400 tracing: remove redundant SOFTIRQ from softirq event traces After converting the softirq tracer to use te flags options, this caused a regression with the name. Since the flag was used directly it was printed out (i.e. HRTIMER_SOFTIRQ). This patch only shows the softirq name without the SOFTIRQ part. [ Impact: fix regression of output from softirq events ] Signed-off-by: Steven Rostedt commit ec081ddc3d90aab35bc0de19a358b964978837cf Author: Steven Whitehouse Date: Mon Jun 1 15:53:35 2009 +0100 tracing: add exports to use __print_symbolic and __print_flags from a module A patch to allow the use of __print_symbolic and __print_flags from a module. This allows the current GFS2 tracing patch to build. Signed-off-by: Steven Whitehouse LKML-Reference: <1243868015.29604.542.camel@localhost.localdomain> Signed-off-by: Steven Rostedt commit 7fcb7c472f455d1711eb5a7633204dba8800a6d6 Author: Li Zefan Date: Mon Jun 1 15:35:46 2009 +0800 tracing/events: introduce __dynamic_array() __string() is limited: - it's a char array, but we may want to define array with other types - a source string should be available, but we may just know the string size We introduce __dynamic_array() to break those limitations, and __string() becomes a wrapper of it. As a side effect, now __get_str() can be used in TP_fast_assign but not only TP_print. Take XFS for example, we have the string length in the dirent, but the string itself is not NULL-terminated, so __dynamic_array() can be used: TRACE_EVENT(xfs_dir2, TP_PROTO(struct xfs_da_args *args), TP_ARGS(args), TP_STRUCT__entry( __field(int, namelen) __dynamic_array(char, name, args->namelen + 1) ... ), TP_fast_assign( char *name = __get_str(name); if (args->namelen) memcpy(name, args->name, args->namelen); name[args->namelen] = '\0'; __entry->namelen = args->namelen; ), TP_printk("name %.*s namelen %d", __entry->namelen ? __get_str(name) : NULL __entry->namelen) ); [ Impact: allow defining dynamic size arrays ] Signed-off-by: Li Zefan LKML-Reference: <4A2384D2.3080403@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit a9c1c3abe1160a5632e48c929b02b740556bf423 Author: Li Zefan Date: Mon Jun 1 15:35:13 2009 +0800 tracing/events: put TP_fast_assign into braces Currently TP_fast_assign has a limitation that we can't define local variables in it. Here's one use case when we introduce __dynamic_array(): TP_fast_assign( type *p = __get_dynamic_array(item); foo(p); bar(p); ), [ Impact: allow defining local variables in TP_fast_assign ] Signed-off-by: Li Zefan LKML-Reference: <4A2384B1.90100@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 6e25db44a7ad7eb380f4ec774ec00a8fcddea112 Author: Li Zefan Date: Fri May 29 11:24:59 2009 +0800 tracing/events: fix a typo in __string() format output "tsize" should be "\tsize". Also remove the space before "__str_loc". Before: # cat tracing/events/irq/irq_handler_entry/format ... field:int irq; offset:12; size:4; field: __str_loc name; offset:16;tsize:2; ... After: # cat tracing/events/irq/irq_handler_entry/format ... field:int irq; offset:12; size:4; field:__str_loc name; offset:16; size:2; ... [ Impact: standardize __string field description in events format file ] Signed-off-by: Li Zefan Signed-off-by: Frederic Weisbecker Signed-off-by: Steven Rostedt commit 897f17a65389a26509bd0c79a9812d1c9ea8ea6f Author: Steven Rostedt Date: Thu May 28 16:31:21 2009 -0400 tracing: combine the default tracers into one config Both event tracer and sched switch plugin are selected by default by all generic tracers. But if no generic tracer is enabled, their options appear. But ether one of them will select the other, thus it only makes sense to have the default tracers be selected by one option. [ Impact: clean up kconfig menu ] Signed-off-by: Steven Rostedt commit 5e0a093910876882f91f1d4b8a1635a099e6c7ba Author: Steven Rostedt Date: Thu May 28 15:50:13 2009 -0400 tracing: fix config options to not show when automatically selected There are two options that are selected by all tracers, but we want to have those options available when no tracer is selected. These are The event tracer and sched switch tracer. The are enabled by all tracers, but if a tracer is not selected we want the options to appear. All tracers including them select TRACING. Thus what we would like to do is: config EVENT_TRACER bool "prompt" depends on TRACING select TRACING But that gives us a bug in the kbuild system since we just created a circular dependency. We only want the prompt to show when TRACING is off. This patch adds GENERIC_TRACER that all tracers will select instead of TRACING. The two options (sched switch and event tracer) will select TRACING directly and depend on !GENERIC_TRACER. This solves the cicular dependency. [ Impact: hide options that are selected by default ] Signed-off-by: Steven Rostedt commit 2af15d6a44b871ad4c2a651302374cde8f335480 Author: Steven Rostedt Date: Thu May 28 13:37:24 2009 -0400 ftrace: add kernel command line function filtering When using ftrace=function on the command line to trace functions on boot up, one can not filter out functions that are commonly called. This patch adds two new ftrace command line commands. ftrace_notrace=function-list ftrace_filter=function-list Where function-list is a comma separated list of functions to filter. The ftrace_notrace will make the functions listed not be included in the function tracing, and ftrace_filter will only trace the functions listed. These two act the same as the debugfs/tracing/set_ftrace_notrace and debugfs/tracing/set_ftrace_filter respectively. The simple glob expressions that are allowed by the filter files can also be used by the command line interface. ftrace_notrace=rcu*,*lock,*spin* Will not trace any function that starts with rcu, ends with lock, or has the word spin in it. Note, if the self tests are enabled, they may interfere with the filtering set by the command lines. Signed-off-by: Steven Rostedt commit 3d58829b0510244596079c1d2f1762c53aef2e97 Author: Jiri Slaby Date: Thu May 28 09:54:47 2009 +0200 x86, apic: Restore irqs on fail paths lapic_resume forgets to restore interrupts on fail paths. Fix that. Signed-off-by: Jiri Slaby Acked-by: Cyrill Gorcunov LKML-Reference: <1243497289-18591-1-git-send-email-jirislaby@gmail.com> Signed-off-by: Ingo Molnar Cc: H. Peter Anvin commit 58f892e022e88438183c48661dcdc6a2997dab99 Author: Naga Chumbalkar Date: Tue May 26 21:48:07 2009 +0000 x86: Print real IOAPIC version for x86-64 Fix the fact that the IOAPIC version number in the x86_64 code path always gets assigned to 0, instead of the correct value. Before the patch: (from "dmesg" output): ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 0, address 0xfec00000, GSI 0-23 <--- After the patch: ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 <--- History: io_apic_get_version() was compiled out of the x86_64 code path in the commit f2c2cca3acef8b253a36381d9b469ad4fb08563a: Author: Andi Kleen Date: Tue Sep 26 10:52:37 2006 +0200 [PATCH] Remove APIC version/cpu capability mpparse checking/printing ACPI went to great trouble to get the APIC version and CPU capabilities of different CPUs before passing them to the mpparser. But all that data was used was to print it out. Actually it even faked some data based on the boot cpu, not on the actual CPU being booted. Remove all this code because it's not needed. Cc: len.brown@intel.com At the time, the IOAPIC version number was deliberately not printed in the x86_64 code path. However, after the x86 and x86_64 files were merged, the net result is that the IOAPIC version is printed incorrectly in the x86_64 code path. The patch below provides a fix. I have tested it with acpi, and with acpi=off, and did not see any problems. Signed-off-by: Naga Chumbalkar Acked-by: Yinghai Lu LKML-Reference: <20090416014230.4885.94926.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar ************************* commit 43bd1236234cacbc18d1476a9b57e7a306efddf5 Author: Frederic Weisbecker Date: Sat May 30 04:25:30 2009 +0200 tracing/stat: remove unappropriate safe walk on list register_stat_tracer() uses list_for_each_entry_safe to check whether a tracer is already present in the list. But we don't delete anything from the list here, so we don't need the safe version [ Impact: cleanup list use is stat tracing ] Signed-off-by: Frederic Weisbecker commit dbd3fbdfeecfad4e71139db05d72560c3583e2a9 Author: Li Zefan Date: Wed May 27 11:42:46 2009 +0800 tracing/stat: do some cleanups - remove duplicate code in stat_seq_init() - update comments to reflect the change from stat list to stat rbtree [ Impact: clean up ] Signed-off-by: Li Zefan Signed-off-by: Frederic Weisbecker commit e16228069083a2f6b94383ac5739aea7a0f38ce4 Author: Li Zefan Date: Wed May 27 11:04:48 2009 +0800 tracing/stat: remember to free root node When closing a trace_stat file, we destroy the rbtree constructed during file open, but there is memory leak that the root node is not freed. [ Impact: fix memory leak when closing a trace_stat file ] Signed-off-by: Li Zefan Signed-off-by: Frederic Weisbecker commit b3dd7ba7d862707800c7ac45068f14ade2b65155 Author: Li Zefan Date: Wed May 27 11:04:26 2009 +0800 tracing/stat: change dummpy_cmp() to return -1 Currently the output of trace_stat/workqueues is totally reversed: # cat /debug/tracing/trace_stat/workqueues ... 1 17 17 210 37 `-blk_unplug_work+0x0/0x57 1 3779 3779 181 11 |-cfq_kick_queue+0x0/0x2f 1 3796 3796 kblockd/1:120 ... The correct output should be: 1 3796 3796 kblockd/1:120 1 3779 3779 181 11 |-cfq_kick_queue+0x0/0x2f 1 17 17 210 37 `-blk_unplug_work+0x0/0x57 It's caused by "tracing/stat: replace linked list by an rbtree for sorting" (53059c9b67a62a3dc8c80204d3da42b9267ea5a0). dummpy_cmp() should return -1, so rb_node will always be inserted as right-most node in the rbtree, thus we sort the output in ascending order. [ Impact: fix the output of trace_stat/workqueues ] Signed-off-by: Li Zefan Signed-off-by: Frederic Weisbecker commit 8f184f27300f66f6dcc8296c2dae7a1fbe8429c9 Author: Frederic Weisbecker Date: Sat May 16 06:24:36 2009 +0200 tracing/stat: replace linked list by an rbtree for sorting When the stat tracing framework prepares the entries from a tracer to output them to the user, it starts by computing a linear sort through a linked list to give the entries ordered by relevance to the user. This is quite ugly and causes a small latency when we begin to read the file. This patch changes that by turning the linked list into a red-black tree. Athough the whole iteration using the start and next tracer callbacks while opening the file remain the same, it is now much more fast and scalable. The rbtree guarantees O(log(n)) insertions whereas a linked list with linear sorting brought us a O(n) despair. Now the (visible) latency has disapeared. [ Impact: kill the latency while starting to read a stat tracer file ] Signed-off-by: Frederic Weisbecker commit 0d64f8342de26d02451900b1aad94716fe92c4ab Author: Frederic Weisbecker Date: Sat May 16 05:58:49 2009 +0200 tracing/stat: replace trace_stat_session by stat_session The "trace" prefix in struct trace_stat_session type is annoying while reading the trace_stat.c file. It makes the lines longer, and is not that much useful to explain the sense of this type. Just keep "struct stat_session" for this type. [ Impact: make the code a bit more readable ] Signed-off-by: Frederic Weisbecker commit f3c4ae26e93d354152196b62797ba86ad86dd0cc Author: Zhaolei Date: Mon Apr 20 15:02:17 2009 +0800 trace_workqueue: remove blank line between each cpu The blankline between each cpu's workqueue stat is not necessary, because the cpu number is enough to part them by eye. Old style also caused a blankline below headline, and made code complex by using lock, disableirq and get cpu var. Old style: # CPU INSERTED EXECUTED NAME # | | | | 0 8644 8644 events/0 0 0 0 cpuset ... 0 1 1 kdmflush 1 35365 35365 events/1 ... New style: # CPU INSERTED EXECUTED NAME # | | | | 0 8644 8644 events/0 0 0 0 cpuset ... 0 1 1 kdmflush 1 35365 35365 events/1 ... [ Impact: provide more readable code ] Signed-off-by: Zhao Lei Cc: KOSAKI Motohiro Cc: Steven Rostedt Cc: Tom Zanussi Cc: Oleg Nesterov Cc: Andrew Morton Signed-off-by: Frederic Weisbecker commit b8867164f05791a6b5363bd51c1274e03600886e Author: Zhaolei Date: Mon Apr 20 14:59:36 2009 +0800 trace_workqueue: remove cpu_workqueue_stats->first_entry cpu_workqueue_stats->first_entry is useless because we can retrieve the header of a cpu workqueue using: if (&cpu_workqueue_stats->list == workqueue_cpu_stat(cpu)->list.next) [ Impact: cleanup ] Signed-off-by: Zhao Lei Cc: Steven Rostedt Cc: Tom Zanussi Cc: Oleg Nesterov Cc: Andrew Morton Signed-off-by: Frederic Weisbecker commit 1fdfca9c577aac96a559c1ea68f5c9156f17d636 Author: Zhaolei Date: Mon Apr 20 14:58:26 2009 +0800 trace_workqueue: use list_for_each_entry() instead of list_for_each_entry_safe() No need to use list_for_each_entry_safe() in iteration without deleting any node, we can use list_for_each_entry() instead. [ Impact: cleanup ] Signed-off-by: Zhao Lei Cc: Steven Rostedt Cc: Tom Zanussi Cc: Oleg Nesterov Cc: Andrew Morton Signed-off-by: Frederic Weisbecker commit fb39125fd79a25c5002f3b45cf4c80e3fa6b961b Author: Zhaolei Date: Fri Apr 17 15:15:51 2009 +0800 ftrace, workqueuetrace: make workqueue tracepoints use TRACE_EVENT macro v3: zhaolei@cn.fujitsu.com: Change TRACE_EVENT definition to new format introduced by Steven Rostedt: consolidate trace and trace_event headers v2: kosaki@jp.fujitsu.com: print the function names instead of addr, and zap the work addr v1: zhaolei@cn.fujitsu.com: Make workqueue tracepoints use TRACE_EVENT macro TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to the tracepoints: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions Then, this patch converts DEFINE_TRACE to TRACE_EVENT in workqueue related tracepoints. [ Impact: expand workqueue tracer to events tracing ] Signed-off-by: Zhao Lei Cc: Steven Rostedt Cc: Tom Zanussi Cc: Oleg Nesterov Cc: Andrew Morton Signed-off-by: KOSAKI Motohiro Signed-off-by: Frederic Weisbecker commit ee4c24a5c9b530481394132c8dbc10572d57c075 Merge: 3d58f48 3e0c373 Author: Ingo Molnar Date: Mon Jun 1 22:29:35 2009 +0200 Merge branch 'x86/cpufeature' into irq/numa Merge reason: irq/numa didnt build because this commit: 2759c32: x86: don't call read_apic_id if !cpu_has_apic Had a dependency on x86/cpufeature changes. Pull in that (small) branch to fix the dependency. Signed-off-by: Ingo Molnar commit 3d58f48ba05caed9118bce62b3047f8683438835 Merge: abfe0af d9244b5 Author: Ingo Molnar Date: Mon Jun 1 21:06:21 2009 +0200 Merge branch 'linus' into irq/numa Conflicts: arch/mips/sibyte/bcm1480/irq.c arch/mips/sibyte/sb1250/irq.c Merge reason: we gathered a few conflicts plus update to latest upstream fixes. Signed-off-by: Ingo Molnar commit f04d82b7e0c63d0251f9952a537a4bc4d73aa1a9 Author: GeunSik Lim Date: Thu May 28 10:36:14 2009 +0900 sched: fix typo in sched-rt-group.txt file Fix typo about static priority's range. Kernel Space User Space =============================================================== 0(high) to 98(low) user RT priority 99(high) to 1(low) with SCHED_RR or SCHED_FIFO --------------------------------------------------------------- 99 sched_priority is not used in scheduling decisions(it must be specified as 0) --------------------------------------------------------------- 100(high) to 139(low) user nice -20(high) to 19(low) --------------------------------------------------------------- 140 idle task priority --------------------------------------------------------------- * ref) http://www.kernel.org/doc/man-pages/online/pages/man2/sched_setscheduler.2.html Signed-off-by: GeunSik Lim CC: Steven Rostedt Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 294ae4011530d008c59c4fb9847738e39228821e Author: GeunSik Lim Date: Thu May 28 10:36:11 2009 +0900 ftrace: fix typo about map of kernel priority in ftrace.txt file. Fix typo about chart to map the kernel priority to user land priorities. * About sched_setscheduler(2) Processes scheduled under SCHED_FIFO or SCHED_RR can have a (user-space) static priority in the range 1 to 99. (reference: http://www.kernel.org/doc/man-pages/online/pages/ man2/sched_setscheduler.2.html) * From: Steven Rostedt 0 to 98 - maps to RT tasks 99 to 1 (SCHED_RR or SCHED_FIFO) 99 - maps to internal kernel threads that want to be lower than RT tasks but higher than SCHED_OTHER tasks. Although I'm not sure if any kernel thread actually uses this. I'm not even sure how this can be set, because the internal sched_setscheduler function does not allow for it. 100 to 139 - maps nice levels -20 to 19. These are not set via sched_setscheduler, but are set via the nice system call. 140 - reserved for idle tasks. Signed-off-by: GeunSik Lim Acked-by: Steven Rostedt Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 88f3907f6f447899544beadf491dccb32015dacb Author: FUJITA Tomonori Date: Wed May 27 09:43:03 2009 +0900 dma-debug: fix debug_dma_sync_sg_for_cpu and debug_dma_sync_sg_for_device DMA-mapping.txt says that debug_dma_sync_sg family must be called with the _same_ one you passed into the dma_map_sg call, it should _NOT_ be the 'count' value _returned_ from the dma_map_sg call. debug_dma_sync_sg_for_cpu and debug_dma_sync_sg_for_device can't handle this properly; they need to use the sg_mapped_ents in struct dma_debug_entry as debug_dma_unmap_sg() does. Signed-off-by: FUJITA Tomonori Signed-off-by: Joerg Roedel commit 884d05970bfbc3db368f23460dc4ce63257f240d Author: FUJITA Tomonori Date: Wed May 27 09:43:02 2009 +0900 dma-debug: use sg_dma_len accessor debug_dma_map_sg() and debug_dma_unmap_sg() use length in struct scatterlist while debug_dma_sync_sg_for_cpu() and debug_dma_sync_sg_for_device() use dma_length. This causes bugs warnings on some IOMMU implementations since these values are not same; the length doesn't represent the dma length. We always need to use sg_dma_len() accessor to get the dma length of a scatterlist entry. Signed-off-by: FUJITA Tomonori Signed-off-by: Joerg Roedel commit 15aedea439c4d7dbec17c99b5e1594c01b979833 Author: FUJITA Tomonori Date: Wed May 27 09:43:01 2009 +0900 dma-debug: use sg_dma_address accessor instead of using dma_address directly Architectures might not have dma_address in struct scatterlist (PARISC doesn't). Directly accessing to dma_address in struct scatterlist is wrong; we need to use sg_dma_address() accesssor instead. Signed-off-by: FUJITA Tomonori Signed-off-by: Joerg Roedel commit 83cce2b69eaa4bc7535f98f75b79397baf277470 Merge: c1eee67 2e8b569 736501e 47bccd6 Author: Joerg Roedel Date: Thu May 28 18:23:56 2009 +0200 Merge branches 'amd-iommu/fixes', 'amd-iommu/debug', 'amd-iommu/suspend-resume' and 'amd-iommu/extended-allocator' into amd-iommu/2.6.31 Conflicts: arch/x86/kernel/amd_iommu.c arch/x86/kernel/amd_iommu_init.c commit 47bccd6bb2b866449d3ecf2ba350ac1c7473b2b8 Author: Joerg Roedel Date: Fri May 22 12:40:54 2009 +0200 amd-iommu: don't free dma adresses below 512MB with CONFIG_IOMMU_STRESS This will test the automatic aperture enlargement code. This is important because only very few devices will ever trigger this code path. So force it under CONFIG_IOMMU_STRESS. Signed-off-by: Joerg Roedel commit f5e9705c6429d24dee832b2edd7f4848d432ea03 Author: Joerg Roedel Date: Fri May 22 12:31:53 2009 +0200 amd-iommu: don't preallocate page tables with CONFIG_IOMMU_STRESS This forces testing of on-demand page table allocation code. Signed-off-by: Joerg Roedel commit fe16f088a88fb73161bba8784375c829f7e87b54 Author: Joerg Roedel Date: Fri May 22 12:27:53 2009 +0200 amd-iommu: disable round-robin allocator for CONFIG_IOMMU_STRESS Disabling the round-robin allocator results in reusing the same dma-addresses again very fast. This is a good test if the iotlb flushing is working correctly. Signed-off-by: Joerg Roedel commit d9cfed925448f097ec7faab80d903eb7e5f99712 Author: Joerg Roedel Date: Tue May 19 12:16:29 2009 +0200 amd-iommu: remove amd_iommu_size kernel parameter This parameter is not longer necessary when aperture increases dynamically. Signed-off-by: Joerg Roedel commit 11b83888ae729457b5cfb936dbd498481f6408df Author: Joerg Roedel Date: Tue May 19 10:23:15 2009 +0200 amd-iommu: enlarge the aperture dynamically By dynamically increasing the aperture the extended allocator is now ready for use. Signed-off-by: Joerg Roedel commit 00cd122ae5e5e7c60cce2af3c35b190d4c3f2d0d Author: Joerg Roedel Date: Tue May 19 09:52:40 2009 +0200 amd-iommu: handle exlusion ranges and unity mappings in alloc_new_range This patch makes sure no reserved addresses are allocated in an dma_ops domain when the aperture is increased dynamically. Signed-off-by: Joerg Roedel commit 9cabe89b99773e682538a8809abc7d4000c77083 Author: Joerg Roedel Date: Mon May 18 16:38:55 2009 +0200 amd-iommu: move aperture_range allocation code to seperate function This patch prepares the dynamic increasement of dma_ops domain apertures. Signed-off-by: Joerg Roedel commit 803b8cb4d9a93b90c67aba2aab7f2c54d595b5b9 Author: Joerg Roedel Date: Mon May 18 15:32:48 2009 +0200 amd-iommu: change dma_dom->next_bit to dma_dom->next_address Simplify the code a little bit by using the same unit for all address space related state in the dma_ops domain structure. Signed-off-by: Joerg Roedel commit 384de72910a7bf96a02a6d8023fe9e16d872beb2 Author: Joerg Roedel Date: Fri May 15 12:30:05 2009 +0200 amd-iommu: make address allocator aware of multiple aperture ranges This patch changes the AMD IOMMU address allocator to allow up to 32 aperture ranges per dma_ops domain. Signed-off-by: Joerg Roedel commit 53812c115cda1f660b286c939669154a56976f6b Author: Joerg Roedel Date: Tue May 12 12:17:38 2009 +0200 amd-iommu: handle page table allocation failures in dma_ops code The code will be required when the aperture size increases dynamically in the extended address allocator. Signed-off-by: Joerg Roedel commit 8bda3092bcfa68f786d94549ae026e8db1eff041 Author: Joerg Roedel Date: Tue May 12 12:02:46 2009 +0200 amd-iommu: move page table allocation code to seperate function This patch makes page table allocation usable for dma_ops code. Signed-off-by: Joerg Roedel commit c3239567a20e90e3026ac5453d5267506ef7b030 Author: Joerg Roedel Date: Tue May 12 10:56:44 2009 +0200 amd-iommu: introduce aperture_range structure This is a preperation for extended address allocator. Signed-off-by: Joerg Roedel commit 736501ee000757082a4f0832826ae1eda7ea106e Author: Joerg Roedel Date: Tue May 12 09:56:12 2009 +0200 amd-iommu: implement suspend/resume This patch puts everything together and enables suspend/resume support in the AMD IOMMU driver. Signed-off-by: Joerg Roedel commit 05f92db9f47f852ff48bbed1b063b8ab8ad00285 Author: Joerg Roedel Date: Tue May 12 09:52:46 2009 +0200 amd_iommu: un __init functions required for suspend/resume This patch makes sure that no function required for suspend/resume of AMD IOMMU driver is thrown away after boot. Signed-off-by: Joerg Roedel commit 7d7a110c6127b7fc683dc6d764555f2dbd22b054 Author: Joerg Roedel Date: Tue May 5 15:48:10 2009 +0200 amd-iommu: add function to flush tlb for all devices This function is required for suspend/resume support with AMD IOMMU enabled. Signed-off-by: Joerg Roedel commit bfd1be1857e5a3385bf146e02e6dc3dd4241bec1 Author: Joerg Roedel Date: Tue May 5 15:33:57 2009 +0200 amd-iommu: add function to flush tlb for all domains This function is required for suspend/resume support with AMD IOMMU enabled. Signed-off-by: Joerg Roedel commit 92ac4320af6ed4294c2c221dd4ccbfd9026a3aa7 Author: Joerg Roedel Date: Tue May 19 19:06:27 2009 +0200 amd-iommu: add function to disable all iommus This function is required for suspend/resume support with AMD IOMMU enabled. Signed-off-by: Joerg Roedel commit d91cecdd796c27df46339e80ed436a980c56fcad Author: Joerg Roedel Date: Mon May 4 18:51:00 2009 +0200 amd-iommu: remove support for msi-x Current hardware uses msi instead of msi-x so this code it not necessary and can not be tested. The best thing is to drop this code. Signed-off-by: Joerg Roedel commit fab6afa30954a0684ef8ac1d9a606e74a6215ab6 Author: Joerg Roedel Date: Mon May 4 18:46:34 2009 +0200 amd-iommu: drop pointless iommu-loop in msi setup code It is not necessary to loop again over all IOMMUs in this code. So drop the loop. Signed-off-by: Joerg Roedel commit 58492e128892e3b55f1a6ef0cf3c3ab4ce7cc214 Author: Joerg Roedel Date: Mon May 4 18:41:16 2009 +0200 amd-iommu: consolidate hardware initialization to one function This patch restructures the AMD IOMMU initialization code to initialize all hardware registers with one single function call. This is helpful for suspend/resume support. Signed-off-by: Joerg Roedel commit 3bd221724adb9d642270df0e78b0105fb61e4a1c Author: Joerg Roedel Date: Mon May 4 15:06:20 2009 +0200 amd-iommu: introduce for_each_iommu* macros This patch introduces the for_each_iommu and for_each_iommu_safe macros to simplify the developers life when having to iterate over all AMD IOMMUs in the system. Signed-off-by: Joerg Roedel commit c1eee67b2d8464781f5868a34168df61e40e85a6 Author: Chris Wright Date: Thu May 21 00:56:58 2009 -0700 amd iommu: properly detach from protection domain on ->remove Some drivers may use the dma api during ->remove which will cause a protection domain to get reattached to a device. Delay the detach until after the driver is completely unbound. [ joro: added a little merge helper ] [ Impact: fix too early device<->domain removal ] Signed-off-by: Chris Wright Signed-off-by: Joerg Roedel commit 0bc252f430d6a3ac7836d40f00d0ae020593b11b Author: Joerg Roedel Date: Fri May 22 12:48:05 2009 +0200 amd-iommu: make sure only ivmd entries are parsed The bug never triggered. But it should be fixed to protect against broken ACPI tables in the future. [ Impact: protect against broken ivrs acpi table ] Signed-off-by: Joerg Roedel commit 7455aab1f95f6464c5af3fbdee28744e73f38564 Author: Neil Turton Date: Thu May 14 14:08:11 2009 +0100 amd-iommu: fix the handling of device aliases in the AMD IOMMU driver. The devid parameter to set_dev_entry_from_acpi is the requester ID rather than the device ID since it is used to index the IOMMU device table. The handling of IVHD_DEV_ALIAS used to pass the device ID. This patch fixes it to pass the requester ID. [ Impact: fix setting the wrong req-id in acpi-table parsing ] Signed-off-by: Neil Turton Signed-off-by: Joerg Roedel commit 421f909c803d1c397f6c66b75653f238696c39ee Author: Neil Turton Date: Thu May 14 14:00:35 2009 +0100 amd-iommu: fix an off-by-one error in the AMD IOMMU driver. The variable amd_iommu_last_bdf holds the maximum bdf of any device controlled by an IOMMU, so the number of device entries needed is amd_iommu_last_bdf+1. The function tbl_size used amd_iommu_last_bdf instead. This would be a problem if the last device were a large enough power of 2. [ Impact: fix amd_iommu_last_bdf off-by-one error ] Signed-off-by: Neil Turton Signed-off-by: Joerg Roedel commit 2e8b569614b89c9b1b85cba37db36daeeeff744e Author: Joerg Roedel Date: Fri May 22 12:44:03 2009 +0200 amd-iommu: disable device isolation with CONFIG_IOMMU_STRESS With device isolation disabled we can test better for race conditions in dma_ops related code. Signed-off-by: Joerg Roedel commit 2be69c79e9a46a554fc3ff57d886e65e7a73eb72 Author: Joerg Roedel Date: Fri May 22 12:15:49 2009 +0200 x86/iommu: add IOMMU_STRESS Kconfig entry This Kconfig option is intended to enable various code paths or parameters in IOMMU implementations to stress test the code and/or the hardware. This can also be done by disabling optimizations in the code when this option is switched on. Signed-off-by: Joerg Roedel Cc: David Woodhouse Cc: FUJITA Tomonori commit b3b99ef8b4f80f3f093a72110e7697c2281ae45d Author: Joerg Roedel Date: Fri May 22 12:02:48 2009 +0200 amd-iommu: move protection domain printk to dump code This information is only helpful for debugging. Don't print it anymore unless explicitly requested. Signed-off-by: Joerg Roedel commit 02acc43a294098c2a4cd22cf24e9c988644f9f7f Author: Joerg Roedel Date: Wed May 20 16:24:21 2009 +0200 amd-iommu: print ivmd information to dmesg when requested Add information about device memory mapping requirements for the IOMMU as described in the IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the kernel command line. Signed-off-by: Joerg Roedel commit 42a698f40a0946f5517308411b9e003ae031414d Author: Joerg Roedel Date: Wed May 20 15:41:28 2009 +0200 amd-iommu: print ivhd information to dmesg when requested Add information about devices belonging to an IOMMU as described in the IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the kernel command line. Signed-off-by: Joerg Roedel commit 9c72041f719e2864d4208a89341c36b316dbf893 Author: Joerg Roedel Date: Wed May 20 13:53:57 2009 +0200 amd-iommu: add dump for iommus described in ivrs table Add information about IOMMU devices described in the IVRS ACPI table to the kernel log if amd_iommu_dump was specified on the kernel command line. Signed-off-by: Joerg Roedel commit fefda117ddb324b872312f1f061230e627c9f5ee Author: Joerg Roedel Date: Wed May 20 12:21:42 2009 +0200 amd-iommu: add amd_iommu_dump parameter This kernel parameter will be useful to get some AMD IOMMU related information in dmesg that is not necessary for the default user but may be helpful in debug situations. Signed-off-by: Joerg Roedel commit ed888aef427365d19f887c271a3a906d16422d24 Author: Joerg Roedel Date: Fri May 22 17:16:04 2009 +0200 dma-debug: re-add dma memory leak detection This is basically a revert of commit 314eeac9 but now in a fixed version. Signed-off-by: Joerg Roedel commit 7d96fd41cadc55f4e00231c8c71b8e25c779f122 Author: Petr Tesarik Date: Mon May 25 11:02:02 2009 +0200 x86: move rdtsc_barrier() into the TSC vread method The *fence instructions were moved to vsyscall_64.c by commit cb9e35dce94a1b9c59d46224e8a94377d673e204. But this breaks the vDSO, because vread methods are also called from there. Besides, the synchronization might be unnecessary for other time sources than TSC. [ Impact: fix potential time warp in VDSO ] Signed-off-by: Petr Tesarik LKML-Reference: <9d0ea9ea0f866bdc1f4d76831221ae117f11ea67.1243241859.git.ptesarik@suse.cz> Signed-off-by: Thomas Gleixner Cc: commit abfe0af9813153bae8c85d9bac966bafcb8ddab1 Author: Yinghai Lu Date: Wed May 20 00:37:40 2009 -0700 x86: enable_update_mptable should be a macro instead of declaring one variant as an inline function... because other case is a variable Signed-off-by: Yinghai Lu LKML-Reference: <4A13B344.7030307@kernel.org> Signed-off-by: Ingo Molnar commit f2aebaee653a35b01c3665de2cbb1e31456b8ea8 Author: Zhaolei Date: Wed May 27 21:36:02 2009 +0800 ftrace: don't convert function's local variable name in macro "call" is an argument of macro, but it is also used as a local variable name of function in macro. We should keep this local variable name distinct from any CPP macro parameter name if both are in the same macro scope, although it hasn't caused any problem yet. [ Impact: robustify macro ] Signed-off-by: Zhao Lei Acked-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit 5b6045a906f48d37591365c5dcdd6d1d146bfd4a Author: Heiko Carstens Date: Tue May 26 17:28:02 2009 +0200 trace: disable preemption before taking raw spinlocks s390 code uses smp_processor_id() in __raw_spin_lock() code which reveals that a (raw) spinlock is taken without preemption disabled. This can potentially deadlock. To fix this explicitly disable and enable preemption. BUG: using smp_processor_id() in preemptible [00000000] code: cat/2278 caller is trace_find_cmdline+0x40/0xfc CPU: 0 Not tainted 2.6.30-rc7-dirty #39 Process cat (pid: 2278, task: 000000003faedb68, ksp: 000000003b33b988) 000000003b33b988 000000003b33bae0 0000000000000002 0000000000000000 000000003b33bb80 000000003b33baf8 000000003b33baf8 00000000000175d6 0000000000000001 000000003b33b988 000000003f9b0000 000000000000000b 000000000000000c 000000003b33bb40 000000003b33bae0 0000000000000000 0000000000000000 00000000000175d6 000000003b33bae0 000000003b33bb28 Call Trace: ([<00000000000174b2>] show_trace+0x112/0x170) [<0000000000017582>] show_stack+0x72/0x100 [<0000000000441538>] dump_stack+0xc8/0xd8 [<000000000025c350>] debug_smp_processor_id+0x114/0x130 [<00000000000bf0e4>] trace_find_cmdline+0x40/0xfc [<00000000000c35d4>] trace_print_context+0x58/0xac [<00000000000bb676>] print_trace_line+0x416/0x470 [<00000000000bc8fe>] s_show+0x4e/0x428 [<000000000013834e>] seq_read+0x36a/0x5d4 [<0000000000112a78>] vfs_read+0xc8/0x174 [<0000000000112c58>] SyS_read+0x74/0xc4 [<000000000002c7ae>] sysc_noemu+0x10/0x16 [<000002000012436c>] 0x2000012436c 1 lock held by cat/2278: #0: (&p->lock){+.+.+.}, at: [<0000000000138056>] seq_read+0x72/0x5d4 [ Impact: fix preempt-unsafe raw spinlock ] Signed-off-by: Heiko Carstens Acked-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit c2adae0970ca1db8adb92fb56ae3bcabd916e8bd Author: Steven Rostedt Date: Wed May 20 19:56:19 2009 -0400 tracing: convert irq events to use __print_symbolic The recording of the names at trace time is inefficient. This patch implements the softirq event recording to only record the vector and then use the __print_symbolic interface to print out the names. [ Impact: faster recording of softirq events ] Signed-off-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit 0f4fc29dd68dfab9c6ddd5d087d34a5b6818cb00 Author: Steven Rostedt Date: Wed May 20 19:21:47 2009 -0400 tracing: add __print_symbolic to trace events This patch adds __print_symbolic which is similar to __print_flags but works for an enumeration type instead. That is, there is only a one to one mapping between the values and the symbols. When a match is made, then it is printed, otherwise the hex value is outputed. [ Impact: add interface for showing symbol names in events ] Signed-off-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit 62ba180e80f4194a498585ac0e4c07daa8ca08d1 Author: Steven Rostedt Date: Fri May 15 16:16:30 2009 -0400 tracing: add flag output for kmem events This patch changes the output for gfp_flags from being a simple hex value to the actual names. gfp_flags=GFP_ATOMIC instead of gfp_flags=00000020 And even gfp_flags=GFP_KERNEL instead of gfp_flags=000000d0 (Thanks to Frederic Weisbecker for pointing out that the first version had a bad order of GFP masks) [ Impact: more human readable output from tracer ] Acked-by: Eduard - Gabriel Munteanu Signed-off-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit 937cdb9db7f59278d0cb1582e6e64e3dfd73b4fc Author: Steven Rostedt Date: Fri May 15 10:51:13 2009 -0400 tracing: add previous task state info to sched switch event It is useful to see the state of a task that is being switched out. This patch adds the output of the state of the previous task in the context switch event. [ Impact: see state of switched out task in context switch ] Signed-off-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit be74b73a57645cc253d881ab0c1014eb64b9cf22 Author: Steven Rostedt Date: Tue May 26 20:25:22 2009 +0200 tracing: add __print_flags for events Developers have been asking for the ability in the ftrace event tracer to display names of bits in a flags variable. Instead of printing out c2, it would be easier to read FOO|BAR|GOO, assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7. Some examples where this would be useful are the state flags in a context switch, kmalloc flags, and even permision flags in accessing files. [ v2 changes include: Frederic Weisbecker's idea of using a mask instead of bits, thus we can output GFP_KERNEL instead of GPF_WAIT|GFP_IO|GFP_FS. Li Zefan's idea of allowing the caller of __print_flags to add their own delimiter (or no delimiter) where we can get for file permissions rwx instead of r|w|x. ] [ v3 changes: Christoph Hellwig's idea of using an array instead of va_args. ] [ Impact: better displaying of flags in trace output ] Signed-off-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit 0e907c99391362385c8e3af2c43b904dd1fd5d73 Author: Zhaolei Date: Mon May 25 18:13:59 2009 +0800 ftrace: clean up of using ftrace_event_enable_disable() Always use ftrace_event_enable_disable() to enable/disable an event so that we can factorize out the event toggling code. [ Impact: factorize and cleanup event tracing code ] Signed-off-by: Zhao Lei Cc: Steven Rostedt Cc: Tom Zanussi LKML-Reference: <4A14FDFE.2080402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker commit b11c53e12f94a46b50bccc7a1a953d7ca1d54a31 Author: Zhaolei Date: Mon May 25 18:11:59 2009 +0800 ftrace: Add task_comm support for trace_event If we enable a trace event alone without any tracer running (such as function tracer, sched switch tracer, etc...) it can't output enough task command information. We need to use the tracing_{start/stop}_cmdline_record() helpers which are designed to keep track of cmdlines for any tasks that were scheduled during the tracing. Before this patch: # echo 1 > debugfs/tracing/events/sched/sched_switch/enable # cat debugfs/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <...>-2289 [000] 526276.724790: sched_switch: task bash:2289 [120] ==> sshd:2287 [120] <...>-2287 [000] 526276.725231: sched_switch: task sshd:2287 [120] ==> bash:2289 [120] <...>-2289 [000] 526276.725452: sched_switch: task bash:2289 [120] ==> sshd:2287 [120] <...>-2287 [000] 526276.727181: sched_switch: task sshd:2287 [120] ==> swapper:0 [140] -0 [000] 526277.032734: sched_switch: task swapper:0 [140] ==> events/0:5 [115] <...>-5 [000] 526277.032782: sched_switch: task events/0:5 [115] ==> swapper:0 [140] ... After this patch: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | bash-2269 [000] 527347.989229: sched_switch: task bash:2269 [120] ==> sshd:2267 [120] sshd-2267 [000] 527347.990960: sched_switch: task sshd:2267 [120] ==> bash:2269 [120] bash-2269 [000] 527347.991143: sched_switch: task bash:2269 [120] ==> sshd:2267 [120] sshd-2267 [000] 527347.992959: sched_switch: task sshd:2267 [120] ==> swapper:0 [140] -0 [000] 527348.531989: sched_switch: task swapper:0 [140] ==> events/0:5 [115] events/0-5 [000] 527348.532115: sched_switch: task events/0:5 [115] ==> swapper:0 [140] ... Changelog: v1->v2: Update Kconfig to select CONTEXT_SWITCH_TRACER in ENABLE_EVENT_TRACING v2->v3: v2 can solve problem that was caused by config EVENT_TRACING alone, but when CONFIG_FTRACE is off and CONFIG_TRACING is selected by other config, compile fail happened again. This version solves it. [ Impact: fix incomplete output of event tracing ] Signed-off-by: Zhao Lei Cc: Tom Zanussi Cc: Steven Rostedt LKML-Reference: <4A14FDFE.2080402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker commit 29fcefba8a2f0fea11e2b721fe174a1832801284 Author: Pekka Enberg Date: Sun May 24 11:13:17 2009 +0300 kmemtrace: fix kernel parameter documentation The kmemtrace.enable kernel parameter no longer works. To enable kmemtrace at boot-time, you must pass "ftrace=kmemtrace" instead. [ Impact: remove obsolete kernel parameter documentation ] Cc: Eduard - Gabriel Munteanu Signed-off-by: Pekka Enberg LKML-Reference: Signed-off-by: Frederic Weisbecker commit b0aae68cc5508f3c2fbf728988c954db4c8b8a53 Author: Li Zefan Date: Thu May 21 13:59:18 2009 +0800 tracing/events: change the type of __str_loc_item to unsigned short When defining a dynamic size string, we add __str_loc_##item to the trace entry, and it stores the location of the actual string in entry->_str_data[] 'unsigned short' should be sufficient to store this information, thus we save 2 bytes per dyn-size string in the ring buffer. [ Impact: reduce memory occupied by dyn-size strings in ring buffer ] Signed-off-by: Li Zefan Cc: Steven Rostedt LKML-Reference: <4A14EDB6.2050507@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker commit 4f5359685af6de7dca101393dc606620adbe963f Author: Lai Jiangshan Date: Mon May 18 19:35:34 2009 +0800 tracing: add trace_event_read_lock() I found that there is nothing to protect event_hash in ftrace_find_event(). Rcu protects the event hashlist but not the event itself while we use it after its extraction through ftrace_find_event(). This lack of a proper locking in this spot opens a race window between any event dereferencing and module removal. Eg: --Task A-- print_trace_line(trace) { event = find_ftrace_event(trace) --Task B-- trace_module_remove_events(mod) { list_trace_events_module(ev, mod) { unregister_ftrace_event(ev->event) { hlist_del(ev->event->node) list_del(....) } } } |--> module removed, the event has been dropped --Task A-- event->print(trace); // Dereferencing freed memory If the event retrieved belongs to a module and this module is concurrently removed, we may end up dereferencing a data from a freed module. RCU could solve this, but it would add latency to the kernel and forbid tracers output callbacks to call any sleepable code. So this fix converts 'trace_event_mutex' to a read/write semaphore, and adds trace_event_read_lock() to protect ftrace_find_event(). [ Impact: fix possible freed memory dereference in ftrace ] Signed-off-by: Lai Jiangshan Acked-by: Steven Rostedt LKML-Reference: <4A114806.7090302@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker commit ee0736627d3347be0be2769fa7b26431f9726c9d Merge: cf9972a 0af48f4 Author: H. Peter Anvin Date: Sat May 23 16:42:19 2009 -0700 Merge branch 'x86/urgent' into x86/setup Resolved conflicts: arch/x86/boot/memory.c Signed-off-by: H. Peter Anvin commit 948cd52906baf1f92aeea2f9b5c515db1b2e592a Author: Paul Mundt Date: Fri May 22 10:40:09 2009 +0900 sparseirq: Allow early irq_desc allocation Presently non-legacy IRQs have their irq_desc allocated with kzalloc_node(). This assumes that all callers of irq_to_desc_node_alloc() will be sufficiently late in the boot process that kmalloc is available. While porting sparseirq support to sh this blew up immediately, as at the time that we register the CPU's interrupt vector map only bootmem is available. Check slab_is_available() to work out which path to use. [ Impact: fix SH early boot crash with sparseirq enabled ] Signed-off-by: Paul Mundt Acked-by: Yinghai Lu Cc: Andrew Morton Cc: Mel Gorman LKML-Reference: <20090522014008.GA2806@linux-sh.org> Signed-off-by: Ingo Molnar commit c6ac4c18fbc92a26df71ece609b082bc3099676b Author: H. Peter Anvin Date: Wed May 20 11:26:09 2009 -0700 x86, boot: correct the calculation of ZO_INIT_SIZE Correct the calculation of ZO_INIT_SIZE (the amount of memory we need during decompression). One symbol (ZO_startup_32) was missing from zoffset.h, and another (ZO_z_extract_offset) was misspelled. [ Impact: build fix ] Signed-off-by: H. Peter Anvin commit 5537937696c55530447c20aa27daccb8d0d29b33 Author: Ming Lei Date: Mon May 18 23:04:46 2009 +0800 ftrace: fix check for return value of register_module_notifier in event_trace_init register_module_notifier() returns zero in the success case. So fix the inverted fail case check in trace events modules handler. [ Impact: fix spurious warning on ftrace initialization] Reported-by: Li Zefan Signed-off-by: Ming Lei Signed-off-by: Frederic Weisbecker commit 2070887fdeacd9c13f3e805e3f0086c9f22a4d93 Author: Thomas Gleixner Date: Tue May 19 23:04:59 2009 +0200 futex: fix restart in wait_requeue_pi If the waiter has been requeued to the outer PI futex and is interrupted by a signal and the thread handles the signal then ERESTART_RESTARTBLOCK is changed to EINTR and the restart block is discarded. That way we return an unexcpected EINTR to user space instead of ending up in futex_lock_pi_restart. But we do not need to restart the syscall because we know that the condition has changed since we have been requeued. If we would simply restart the syscall then we would drop out via the comparison of the user space value with EWOULDBLOCK. The user space side needs to handle EWOULDBLOCK anyway as the enqueueing on the inner futex can race with a requeue/wake. So we can simply return EWOULDBLOCK to user space which also signals that we did not take the outer futex and let user space handle it in the same way it has to handle the requeue/wake race. Signed-off-by: Thomas Gleixner commit 1c840c14906d4ddf66c1f4f5daea059aad951c82 Author: Thomas Gleixner Date: Wed May 20 09:22:40 2009 +0200 futex: fix restart for early wakeup in futex_wait_requeue_pi() The futex_wait_requeue_pi op should restart unconditionally like futex_lock_pi. The user of that function e.g. pthread_cond_wait can not be interrupted so we do not care about the SA_RESTART flag of the signal. Clean up the FIXMEs. Signed-off-by: Thomas Gleixner commit c8b15a706d921baed3195407e4f55270112bb3c6 Author: Thomas Gleixner Date: Wed May 20 09:18:50 2009 +0200 futex: cleanup error exit Reuse the put_key_ref(key2) call in the exit path. Signed-off-by: Thomas Gleixner commit 521c180874dae86f675d23c4eade4dba8b1f2cc8 Merge: f1a11e0 64d1304 Author: Thomas Gleixner Date: Wed May 20 09:02:28 2009 +0200 Merge branch 'core/urgent' into core/futexes Merge reason: this branch was on an pre -rc1 base, merge it up to -rc6+ to get the latest upstream fixes. Conflicts: kernel/futex.c Signed-off-by: Thomas Gleixner commit 4c6f18fc81565967da20f2d4a3922cdba33f8e2b Author: Yinghai Lu Date: Mon May 18 10:23:28 2009 -0700 x86, io-apic: Don't mark pin_programmed early Peter bisected that: | commit b9c61b70075c87a8612624736faf4a2de5b1ed30 | Date: Wed May 6 10:10:06 2009 -0700 | | x86/pci: update pirq_enable_irq() to setup io apic routing | | So we can set io apic routing only when enabling the device irq. wrecked his opteron box, ata1 interrupts fail to get through. ata1 is using irq 11: [ 1.451839] sata_svw 0000:01:0e.0: version 2.3 [ 1.456333] sata_svw 0000:01:0e.0: PCI INT A -> GSI 11 (level, low) -> IRQ 11 [ 1.463639] scsi0 : sata_svw [ 1.466949] scsi1 : sata_svw [ 1.470022] scsi2 : sata_svw [ 1.473090] scsi3 : sata_svw [ 1.476112] ata1: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe000 irq 11 [ 1.483490] ata2: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe100 irq 11 [ 1.490870] ata3: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe200 irq 11 [ 1.498247] ata4: SATA max UDMA/133 mmio m8192@0xff3fe000 port 0xff3fe300 irq 11 that pin is overlapped with pin with legacy ones. We should not set bits in pin_programmed here, so that those bit could be set later via io_apic_set_pci_routing(). [ Impact: fix boot hang on certain systems ] Reported-by: Peter Zijlstra Signed-off-by: Yinghai Lu Tested-by: Peter Zijlstra Cc: Jack Steiner LKML-Reference: <4A119990.9020606@kernel.org> Signed-off-by: Ingo Molnar commit 4aee2ad461889132bfb5a1518a9580d00e17008c Author: Jaswinder Singh Rajput Date: Tue May 19 17:07:01 2009 +0530 x86: asm/processor.h: remove double declaration Remove double declaration of: extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c); extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c); extern unsigned short num_cache_leaves; they are already defined in the same file. [ Impact: cleanup ] Signed-off-by: Jaswinder Singh Rajput LKML-Reference: <1242733021.3377.1.camel@localhost.localdomain> Signed-off-by: Ingo Molnar commit fd51d251e4cdb21f68e9dbc4336514d64a105a79 Author: Stefan Raspl Date: Tue May 19 09:59:08 2009 +0200 blktrace: remove debugfs entries on bad path debugfs directory entries for devices are not removed on some of the failure pathes in do_blk_trace_setup(). One way to reproduce is to start blktrace on multiple devices with insufficient Vmalloc space: Devices will fail with a message like this: BLKTRACESETUP(2) /dev/sdu failed: 5/Input/output error If so, the respective entries in debugfs (e.g. /sys/kernel/debug/block/sdu) will remain and subsequent attempts to start blktrace on the respective devices will not succeed due to existing directories. [ Impact: fix /debug/tracing file cleanup corner case ] Signed-off-by: Stefan Raspl Acked-by: Li Zefan Cc: Li Zefan Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com LKML-Reference: <4A1266CC.5040801@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar commit 143c145e3a475065a4be661468d0df1bd0b25f74 Author: Li Zefan Date: Tue May 19 14:43:15 2009 +0800 tracing/events: Documentation updates - fix some typos - document the difference between '>' and '>>' - document the 'enable' toggle - remove section "Defining an event-enabled tracepoint", since it's out-dated and sample/trace_events/ already serves this purpose. v2: add "Updated by Li Zefan" [ Impact: make documentation up-to-date ] Signed-off-by: Li Zefan Cc: Steven Rostedt Cc: Frederic Weisbecker Cc: "Theodore Ts'o" LKML-Reference: <4A125503.5060406@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 4200efd9acda4accf24640f1e77d24fdcdb524df Author: Ingo Molnar Date: Tue May 19 09:22:19 2009 +0200 sched: properly define the sched_group::cpumask and sched_domain::span fields Properly document the variable-size structure tricks we are doing wrt. struct sched_group and sched_domain, and use the field[0] GCC extension instead of defining a vla array. Dont use unions for this, as pointed out by Linus. [ Impact: cleanup, un-confuse Sparse and LLVM ] Reported-by: Jeff Garzik Acked-by: Linus Torvalds LKML-Reference: Signed-off-by: Ingo Molnar commit 24ed0c4bfc7d2d7507bb9d50f7f3bbdcd85d76dd Author: Ming Lei Date: Sun May 17 15:31:38 2009 +0800 tracing: fix check for return value of register_module_notifier return zero should be correct, so fix it. [ Impact: eliminate incorrect syslog message ] Signed-off-by: Ming Lei Acked-by: Frederic Weisbecker Acked-by: Li Zefan Cc: rostedt@goodmis.org LKML-Reference: <1242545498-7285-1-git-send-email-tom.leiming@gmail.com> Signed-off-by: Ingo Molnar commit 1079cac0f4eb7d968395378b1625979d4c818dd6 Merge: 5872144 1406de8 Author: Ingo Molnar Date: Mon May 18 10:15:09 2009 +0200 Merge commit 'v2.6.30-rc6' into tracing/core Merge reason: we were on an -rc4 base, sync up to -rc6 Signed-off-by: Ingo Molnar commit f1bdb523880c7f6990e9e8e50b0fc972ca475e84 Author: Yinghai Lu Date: Fri May 15 13:05:16 2009 -0700 x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled Len expressed concern that the update_mptable feature has side-effects on the ACPI code. Make it sure explicitly that the code only ever gets called if the (default disabled) update_mptable boot quirk option is disabled. [ Impact: isolate the update_mptable feature from ACPI code more ] Signed-off-by: Yinghai Lu Cc: Len Brown LKML-Reference: <4A0DC832.5090200@kernel.org> Signed-off-by: Ingo Molnar commit 629e15d245f46bef9d26199b450f882f9437a8fe Author: Yinghai Lu Date: Fri May 15 13:05:16 2009 -0700 x86, irq: update_mptable needs pci_routeirq To get all device irq routing and to save them. This is basically an implicit pci=routeirq enablement if (and on if) the update_mptable boot option (which is off by default) has been specified. [ Impact: extend the update_mptable boot opion's scope ] Signed-off-by: Yinghai Lu Cc: Jesse Barnes LKML-Reference: <4A0DB7B4.4060702@kernel.org> Signed-off-by: Ingo Molnar commit 35d5a9a61490bf39d2e48d7f499c8c801a39ebe9 Author: Yinghai Lu Date: Fri May 15 13:59:37 2009 -0700 x86: fix system without memory on node0 Jack found a boot crash on a system which doesn't have memory on node0. It turns out with recent per_cpu changes, node_number for BSP will always be 0, and it is not consistent to cpu_to_node() that might set it to a different (nearer) node already. aka when numa_set_node() for node0 is called early before per_cpu area is setup: two places touched that per_cpu(node_number,): 1. in cpu/common.c::cpu_init() and it is not for BP | #ifdef CONFIG_NUMA | if (cpu != 0 && percpu_read(node_number) == 0 && | cpu_to_node(cpu) != NUMA_NO_NODE) | percpu_write(node_number, cpu_to_node(cpu)); | #endif for BP: traps_init ==> cpu_init for AP: start_secondary ==> cpu_init 2. cpu/intel.c or amd.c::srat_detect_node via numa_set_node() for BP: check_bugs ==> identify_boot_cpu ==> identify_cpu() that is rather later before numa_node_id() is used for BP... for AP: start_secondary => smp_callin => smp_store_cpu_info() => => identify_secondary_cpu => identify_cpu() so try to set that for BP earlier in setup_per_cpu_areas(), and don't bother to set that for APs there (it will be updated later and will be used later) (and don't mess the 0 before the copying BP per_cpu data to APs) [ Impact: fix boot crash on memoryless node-0 ] Reported-and-tested-by: Jack Steiner Cc: Tejun Heo Signed-off-by: Yinghai Lu LKML-Reference: <4A0C4A02.7050401@kernel.org> Signed-off-by: Ingo Molnar commit 7c43769a9776141ec23ca81a1bdd5a9c0512f165 Author: Yinghai Lu Date: Fri May 15 13:59:37 2009 -0700 x86, mm: Fix node_possible_map logic Recently there were some changes to the meaning of node_possible_map, and it is quite strange: - the node without memory would be set in node_possible_map - but some node with less NODE_MIN_SIZE will be kicked out of node_possible_map. fix it by adding strict_setup_node_bootmem(). Also, remove unparse_node(). so result will be: 1. cpu_to_node() will return online node only (nearest one) 2. apicid_to_node() still returns the node that could be not online but is set in node_possible_map. 3. node_possible_map will include nodes that mem on it are less NODE_MIN_SIZE v2: after move_cpus_to_node change. [ Impact: get node_possible_map right ] Signed-off-by: Yinghai Lu Tested-by: Jack Steiner LKML-Reference: <4A0C49BE.6080800@kernel.org> [ v3: various small cleanups and comment clarifications ] Signed-off-by: Ingo Molnar commit 888a589f6be07d624e21e2174d98375e9f95911b Author: Yinghai Lu Date: Fri May 15 13:59:37 2009 -0700 mm, x86: remove MEMORY_HOTPLUG_RESERVE related code after: | commit b263295dbffd33b0fbff670720fa178c30e3392a | Author: Christoph Lameter | Date: Wed Jan 30 13:30:47 2008 +0100 | | x86: 64-bit, make sparsemem vmemmap the only memory model we don't have MEMORY_HOTPLUG_RESERVE anymore. Historically, x86-64 had an architecture-specific method for memory hotplug whereby it scanned the SRAT for physical memory ranges that could be potentially used for memory hot-add later. By reserving those ranges without physical memory, the memmap would be allocated and left dormant until needed. This depended on the DISCONTIG memory model which has been removed so the code implementing HOTPLUG_RESERVE is now dead. This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE. (Changelog authored by Mel.) v2: updated changelog, and remove hotadd= in doc [ Impact: remove dead code ] Signed-off-by: Yinghai Lu Reviewed-by: Christoph Lameter Reviewed-by: Mel Gorman Workflow-found-OK-by: Andrew Morton LKML-Reference: <4A0C4910.7090508@kernel.org> Signed-off-by: Ingo Molnar commit b286e21868ea1af724a7a4802da2c8e144fa70de Merge: ed077b5 1406de8 Author: Ingo Molnar Date: Mon May 18 09:12:45 2009 +0200 Merge commit 'v2.6.30-rc6' into x86/mm Merge reason: sync up to -rc6 which has changes to mm/ which we are going to touch in the commits to follow as well. Signed-off-by: Ingo Molnar commit 2759c3287de27266e06f1f4e82cbd2d65f6a044c Author: Yinghai Lu Date: Fri May 15 13:05:16 2009 -0700 x86: don't call read_apic_id if !cpu_has_apic should not call that if apic is disabled. [ Impact: fix crash on certain UP configs ] Signed-off-by: Yinghai Lu Cc: Cyrill Gorcunov LKML-Reference: <4A09CCBB.2000306@kernel.org> Signed-off-by: Ingo Molnar commit e5198075c67a22ec9a09565b1ce88d3d3f5ba855 Author: Yinghai Lu Date: Fri May 15 13:05:16 2009 -0700 x86, apic: introduce io_apic_irq_attr according to Ingo, io_apic irq-setup related functions have too many parameters with a repetitive signature. So reduce related funcs to get less params by passing a pointer to a newly defined io_apic_irq_attr structure. v2: io_apic_irq ==> irq_attr triggering ==> trigger v3: add set_io_apic_irq_attr [ Impact: cleanup ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Jesse Barnes Cc: Len Brown LKML-Reference: <4A08ACD3.2070401@kernel.org> Signed-off-by: Ingo Molnar commit 52650257ea06bb15c2e2bbe854cbdf463726141a Author: Jaswinder Singh Rajput Date: Thu May 14 12:35:46 2009 +0530 x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput Signed-off-by: H. Peter Anvin commit ba5673ff1ff5f428256db4cedd4b05b7be008bb6 Author: Jaswinder Singh Rajput Date: Thu May 14 12:29:25 2009 +0530 x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000 Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput Signed-off-by: H. Peter Anvin commit 654ac05801ae806661c8fdeb3b5c420a31cbc5b1 Author: Jaswinder Singh Rajput Date: Thu May 14 12:21:54 2009 +0530 x86, mtrr: remove mtrr MSRs double declaration Removed MTRR MSR from mtrr/mtrr.h as these are already declared in msr-index.h and nobody is using them: MTRRfix16K_A0000_MSR MTRRfix4K_C8000_MSR MTRRfix4K_D0000_MSR MTRRfix4K_D8000_MSR MTRRfix4K_E0000_MSR MTRRfix4K_E8000_MSR MTRRfix4K_F0000_MSR MTRRfix4K_F8000_MSR Use standard msr-index.h's MSR declaration and no need to declare again [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput Signed-off-by: H. Peter Anvin commit 7d9d55e449089df8463bca2045d702ae6cda64a2 Author: Jaswinder Singh Rajput Date: Thu May 14 12:15:32 2009 +0530 x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000 Use standard msr-index.h's MSR declaration and no need to declare again [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput Signed-off-by: H. Peter Anvin commit a036c7a358cc9d7ed28a188480b9a4d709e09b24 Author: Jaswinder Singh Rajput Date: Thu May 14 12:10:43 2009 +0530 x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000 Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput Signed-off-by: H. Peter Anvin commit d9bcc01d58d18ed287091707b0b45c6ac888a11a Author: Jaswinder Singh Rajput Date: Thu May 14 12:06:12 2009 +0530 x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap Use standard msr-index.h's MSR declaration and no need to declare again. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput Signed-off-by: H. Peter Anvin commit 2d02494f5a90f2e4b3c4c6acc85ec94674cdc431 Author: Thomas Gleixner Date: Sat May 2 20:08:52 2009 +0200 sched, timers: cleanup avenrun users avenrun is an rough estimate so we don't have to worry about consistency of the three avenrun values. Remove the xtime lock dependency and provide a function to scale the values. Cleanup the users. [ Impact: cleanup ] Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra commit dce48a84adf1806676319f6f480e30a6daa012f9 Author: Thomas Gleixner Date: Sat Apr 11 10:43:41 2009 +0200 sched, timers: move calc_load() to scheduler Dimitri Sivanich noticed that xtime_lock is held write locked across calc_load() which iterates over all online CPUs. That can cause long latencies for xtime_lock readers on large SMP systems. The load average calculation is an rough estimate anyway so there is no real need to protect the readers vs. the update. It's not a problem when the avenrun array is updated while a reader copies the values. Instead of iterating over all online CPUs let the scheduler_tick code update the number of active tasks shortly before the avenrun update happens. The avenrun update itself is handled by the CPU which calls do_timer(). [ Impact: reduce xtime_lock write locked section ] Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra commit f1a11e0576c7a73d759d05d776692b2b2d37172b Author: Thomas Gleixner Date: Tue May 5 19:21:40 2009 +0200 futex: remove the wait queue The waitqueue which is used in struct futex_q is a leftover from the futexfd implementation. There is no need to use a waitqueue at all, as the waiting task is the only user of it. The waitqueue just adds additional locking and a loop in the wake up path which both can be avoided. We have already a task reference in struct futex_q which is used for PI futexes. Use it for normal futexes as well and just wake up the task directly. The logic of signalling the futex wakeup via setting q->lock_ptr to NULL is kept with the difference that we set it NULL before doing the wakeup. This opens an exit race window vs. a non futex wake up of the to be woken up task, which we prevent with get_task_struct / put_task_struct on the waiter. [ Impact: simplification ] Signed-off-by: Thomas Gleixner commit 5872144f64b34a5942f6b4acedc90b02de72c58b Author: Li Zefan Date: Fri May 15 11:07:56 2009 +0800 tracing/filters: fix off-by-one bug We should leave the last slot for the ending '\0'. [ Impact: fix possible crash when the length of an operand is 128 ] Signed-off-by: Li Zefan LKML-Reference: <4A0CDC8C.30602@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 8cd995b6deedf98b7694ed32a786ee7f793d1eec Author: Li Zefan Date: Fri May 15 11:07:27 2009 +0800 tracing/filters: add missing unlock in a failure path [ Impact: fix deadlock in a rare case we fail to allocate memory ] Signed-off-by: Li Zefan LKML-Reference: <4A0CDC6F.7070200@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 1ec7c4849c214fc78b023230264399836ea3b245 Author: Steven Rostedt Date: Thu May 14 23:40:06 2009 -0400 tracing: stop stack trace on first empty entry The stack tracer stores eight entries in the ring buffer when an event traces the stack. The output outputs all eight entries regardless of how many entries were recorded. This patch breaks out of the loop when a null entry is discovered. [ Impact: only print the stack that is recorded ] Signed-off-by: Steven Rostedt commit 29a679754b1a2581ee456eada6c2de7ce95068bb Author: Steven Rostedt Date: Thu May 14 23:19:09 2009 -0400 x86/stacktrace: return 0 instead of -1 for stack ops If we return -1 in the ops->stack for the stacktrace saving, we end up breaking out of the loop if the stack we are tracing is in the exception stack. This causes traces like: -0 [002] 34263.745825: raise_softirq_irqoff <-__blk_complete_request -0 [002] 34263.745826: <= 0 <= 0 <= 0 <= 0 <= 0 <= 0 <= 0 By returning "0" instead, the irq stack is saved as well, and we see: -0 [003] 883.280992: raise_softirq_irqoff <-__hrtimer_star t_range_ns -0 [003] 883.280992: <= hrtimer_start_range_ns <= tick_nohz_restart_sched_tick <= cpu_idle <= start_secondary <= <= 0 <= 0 [ Impact: record stacks from interrupts ] Signed-off-by: Steven Rostedt commit c4f68236e41641494f9c8a418ccc0678c335bbb5 Author: H. Peter Anvin Date: Tue May 12 11:37:34 2009 -0700 x86-64: align __PHYSICAL_START, remove __KERNEL_ALIGN Handle the misconfiguration where CONFIG_PHYSICAL_START is incompatible with CONFIG_PHYSICAL_ALIGN. This is a configuration error, but one which arises easily since Kconfig doesn't have the smarts to express the true relationship between these two variables. Hence, align __PHYSICAL_START the same way we align LOAD_PHYSICAL_ADDR in . For non-relocatable kernels, this would cause the boot to fail. [ Impact: fix boot failures for non-relocatable kernels ] Reported-by: Ingo Molnar Signed-off-by: H. Peter Anvin commit 7ed42a28b269f8682eefae27f5c11187eb56e63b Author: H. Peter Anvin Date: Tue May 12 11:33:08 2009 -0700 x86, boot: correct sanity checks in boot/compressed/misc.c arch/x86/boot/compressed/misc.c contains several sanity checks on the output address. Correct constraints that are no longer correct: - the alignment test should be MIN_KERNEL_ALIGN on both 32 and 64 bits. - the 64 bit maximum address was set to 2^40, which was the limit of one specific x86-64 implementation. Change the test to 2^46, the current Linux limit, and at least try to test the end rather than the beginning. - for non-relocatable kernels, test against LOAD_PHYSICAL_ADDR on both 32 and 64 bits. [ Impact: fix potential boot failure due to invalid tests ] Signed-off-by: H. Peter Anvin commit b5710ce92a8cf8e3fc0ffc230cfdbfa23463f1c8 Author: Cyrill Gorcunov Date: Tue May 12 18:51:28 2009 +0400 x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix Fix trivial typo in the drivers/pci/hotplug/ibmphp_core.c changes. [ Impact: build fix ] Signed-off-by: Cyrill Gorcunov CC: Yinghai Lu Cc: eswierk@aristanetworks.com LKML-Reference: <20090512145128.GA10220@lenovo> Signed-off-by: Ingo Molnar commit 4797f6b021a3fa399942245d07a1feb30df81bb8 Author: Yinghai Lu Date: Sat May 2 10:40:57 2009 -0700 x86: read apic ID in the !acpi_lapic case Ed found that on 32-bit, boot_cpu_physical_apicid is not read right, when the mptable is broken. Interestingly, actually three paths use/set it: 1. acpi: at that time that is already read from reg 2. mptable: only read from mptable 3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit so we could read the apic id for the 2/3 path. We trust the hardware register more than we trust a BIOS data structure (the mptable). We can also avoid the double set_fixmap() when acpi_lapic is used, and also need to move cpu_has_apic earlier and call apic_disable(). Also when need to update the apic id, we'd better read and set the apic version as well - so that quirks are applied precisely. v2: make path 3 with 64bit, use -1 as apic id, so could read it later. v3: fix whitespace problem pointed out by Ed Swierk v5: fix boot crash [ Impact: get correct apic id for bsp other than acpi path ] Reported-by: Ed Swierk Signed-off-by: Yinghai Lu Acked-by: Cyrill Gorcunov LKML-Reference: <49FC85A9.2070702@kernel.org> [ v4: sanity-check in the ACPI case too ] Signed-off-by: Ingo Molnar commit 6cda3eb62ef42aa5acd649bf99c8db544e0f4051 Merge: b9c61b70 cec6be6 Author: Ingo Molnar Date: Tue May 12 12:17:30 2009 +0200 Merge branch 'x86/apic' into irq/numa Merge reason: both topics modify the APIC code but were able to do it in parallel so far. An upcoming patch generates a conflict so merge them to avoid the conflict. Signed-off-by: Ingo Molnar commit ed077b58f6146684069975122b1728a9d248a501 Author: Shaohua Li Date: Tue May 12 16:40:00 2009 +0800 x86: make sparse mem work in non-NUMA mode With sparse memory, holes should not be marked present for memmap. This patch makes sure sparsemem really works on SMP mode (!NUMA). [ Impact: use less memory to map fragmented RAM, avoid boot-OOM/crash ] Signed-off-by: Shaohua Li Signed-off-by: Sheng Yang LKML-Reference: <1242117600.22431.0.camel@sli10-desk.sh.intel.com> Signed-off-by: Ingo Molnar commit bf78ad69cd351798b9447a269c6bd41ce1f111f4 Author: Amerigo Wang Date: Mon May 11 23:29:09 2009 -0400 x86: process.c, remove useless headers is not needed by these files, remove them. [ Impact: cleanup ] Signed-off-by: WANG Cong Cc: akpm@linux-foundation.org LKML-Reference: <20090512032956.5040.77055.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar commit 9d62dcdfa6f6fc843f7d9b494bcd48f00b94f883 Author: Amerigo Wang Date: Mon May 11 22:05:28 2009 -0400 x86: merge process.c a bit Merge arch_align_stack() and arch_randomize_brk(), since they are the same. Tested on x86_64. [ Impact: cleanup ] Signed-off-by: Amerigo Wang Signed-off-by: Ingo Molnar commit 871b72dd1e12afc3f024479531d25a9339d2e3f9 Author: Dmitry Adamushko Date: Mon May 11 23:48:27 2009 +0200 x86: microcode: use smp_call_function_single instead of set_cpus_allowed, cleanup of synchronization logic * Solve issues described in 6f66cbc63081fd70e3191b4dbb796746780e5ae1 in a way that doesn't resort to set_cpus_allowed(); * in fact, only collect_cpu_info and apply_microcode callbacks must run on a target cpu, others will do just fine on any other. smp_call_function_single() (as suggested by Ingo) is used to run these callbacks on a target cpu. * cleanup of synchronization logic of the 'microcode_core' part The generic 'microcode_core' part guarantees that only a single cpu (be it a full-fledged cpu, one of the cores or HT) is being updated at any particular moment of time. In general, there is no need for any additional sync. mechanism in arch-specific parts (the patch removes existing spinlocks). See also the "Synchronization" section in microcode_core.c. * return -EINVAL instead of -1 (which is translated into -EPERM) in microcode_write(), reload_cpu() and mc_sysdev_add(). Other suggestions for an error code? * use 'enum ucode_state' as return value of request_microcode_{fw, user} to gain more flexibility by distinguishing between real error cases and situations when an appropriate ucode was not found (which is not an error per-se). * some minor cleanups Thanks a lot to Hugh Dickins for review/suggestions/testing! Reference: http://marc.info/?l=linux-kernel&m=124025889012541&w=2 [ Impact: refactor and clean up microcode driver locking code ] Signed-off-by: Dmitry Adamushko Acked-by: Hugh Dickins Cc: Andrew Morton Cc: Rusty Russell Cc: Andreas Herrmann Cc: Peter Oruba Cc: Arjan van de Ven LKML-Reference: <1242078507.5560.9.camel@earth> [ did some more cleanups ] Signed-off-by: Ingo Molnar arch/x86/include/asm/microcode.h | 25 ++ arch/x86/kernel/microcode_amd.c | 58 ++---- arch/x86/kernel/microcode_core.c | 326 +++++++++++++++++++++----------------- arch/x86/kernel/microcode_intel.c | 92 +++------- 4 files changed, 261 insertions(+), 240 deletions(-) (~20 new comment lines) commit 168b6b1d0594c7866caa73b12f3b8d91075695f2 Author: Steven Rostedt Date: Mon May 11 22:11:05 2009 -0400 ring-buffer: move code around to remove some branches This is a bit of micro-optimizations. But since the ring buffer is used in tracing every function call, it is an extreme hot path. Every nanosecond counts. This change shows over 5% improvement in the ring-buffer-benchmark. [ Impact: more efficient code ] Signed-off-by: Steven Rostedt commit 88eb0125362f2ab272cbaf84252cf101ddc2dec9 Author: Steven Rostedt Date: Mon May 11 16:28:23 2009 -0400 ring-buffer: use internal time stamp function The ring_buffer_time_stamp that is exported adds a little more overhead than is needed for using it internally. This patch adds an internal timestamp function that can be inlined (a single line function) and used internally for the ring buffer. [ Impact: a little less overhead to the ring buffer ] Signed-off-by: Steven Rostedt commit 0f0c85fc80adbbd2265d89867d743f929d516805 Author: Steven Rostedt Date: Mon May 11 16:08:00 2009 -0400 ring-buffer: small optimizations Doing some small changes in the fast path of the ring buffer recording saves over 3% in the ring-buffer-benchmark test. [ Impact: a little faster ring buffer recording ] Signed-off-by: Steven Rostedt commit 5031296c57024a78ddad4edfc993367dbf4abb98 Author: H. Peter Anvin Date: Thu May 7 16:54:11 2009 -0700 x86: add extension fields for bootloader type and version A long ago, in days of yore, it all began with a god named Thor. There were vikings and boats and some plans for a Linux kernel header. Unfortunately, a single 8-bit field was used for bootloader type and version. This has generally worked without *too* much pain, but we're getting close to flat running out of ID fields. Add extension fields for both type and version. The type will be extended if it the old field is 0xE; the version is a simple MSB extension. Keep /proc/sys/kernel/bootloader_type containing (type << 4) + (ver & 0xf) for backwards compatiblity, but also add /proc/sys/kernel/bootloader_version which contains the full version number. [ Impact: new feature to support more bootloaders ] Signed-off-by: H. Peter Anvin commit fe83fcc0a14dcf71996de5eb84771b2369ba7abc Author: H. Peter Anvin Date: Mon May 11 16:23:16 2009 -0700 x86, defconfig: update kernel position parameters Update CONFIG_RELOCATABLE, CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to reflect the current defaults. [ Impact: make defconfig match Kconfig defaults ] Signed-off-by: H. Peter Anvin commit c4a994645d04d5fa6bfa52a3204af87dd92168d5 Author: H. Peter Anvin Date: Mon May 11 16:20:12 2009 -0700 x86, defconfig: update to current, no material changes Update defconfigs to reflect current configuration files. No other changes. [ Impact: updates defconfigs to match what "make defconfig" generates ] Signed-off-by: H. Peter Anvin commit 26717808f93a27c22d4853c4fb17fa225f4ccc68 Author: H. Peter Anvin Date: Thu May 7 14:19:34 2009 -0700 x86: make CONFIG_RELOCATABLE the default Remove the EXPERIMENTAL tag from CONFIG_RELOCATABLE and make it the default. Relocatable kernels have been used for a while now, and should now have identical semantics to non-relocatable kernels when loaded by a non-relocating bootloader. Signed-off-by: H. Peter Anvin commit ceefccc93932b920a8ec6f35f596db05202a12fe Author: H. Peter Anvin Date: Mon May 11 16:12:16 2009 -0700 x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB Default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN each to 16 MB, so that both non-relocatable and relocatable kernels are loaded at 16 MB by a non-relocating bootloader. This is somewhat hacky, but it appears to be the only way to do this that does not break some some set of existing bootloaders. We want to avoid the bottom 16 MB because of large page breakup, memory holes, and ZONE_DMA. Embedded systems may need to reduce this, or update their bootloaders to be aware of the new min_alignment field. [ Impact: performance improvement, avoids problems on some systems ] Signed-off-by: H. Peter Anvin commit d297366ba692faf1f0384811a6ff0b20c3470b1b Author: H. Peter Anvin Date: Mon May 11 16:06:23 2009 -0700 x86: document new bzImage fields Document the new bzImage fields for kernel memory placement. [ Impact: adds documentation ] Signed-off-by: H. Peter Anvin commit 37ba7ab5e33cebc25c68fffe33e9f21e7c2014e8 Author: H. Peter Anvin Date: Mon May 11 15:56:08 2009 -0700 x86, boot: make kernel_alignment adjustable; new bzImage fields Make the kernel_alignment field adjustable; this allows us to set it to a large value (intended to be 16 MB to avoid ZONE_DMA contention, memory holes and other weirdness) while a smart bootloader can still force a loading at a lesser alignment if absolutely necessary. Also export pref_address (preferred loading address, corresponding to the link-time address) and init_size, the total amount of linear memory the kernel will require during initialization. [ Impact: allows better kernel placement, gives bootloader more info ] Signed-off-by: H. Peter Anvin commit 99aa45595f45603526513d5e29fc00f8afbf3913 Author: H. Peter Anvin Date: Mon May 11 16:02:10 2009 -0700 x86, boot: remove dead code from boot/compressed/head_*.S Remove a couple of lines of dead code from arch/x86/boot/compressed/head_*.S; all of these update registers that are dead in the current code. [ Impact: cleanup ] Signed-off-by: H. Peter Anvin commit 2ff799d3cff1ecb274049378b28120ee5c1c5e5f Author: Vaidyanathan Srinivasan Date: Mon May 11 20:09:14 2009 +0530 sched: Don't export sched_mc_power_savings on multi-socket single core system Fix to prevent sched_mc_power_saving from being exported through sysfs for multi-scoket single core system. Max cores should be always greater than one (1). My earlier patch that introduced fix for not exporting 'sched_mc_power_saving' on laptops broke it on multi-socket single core system. This fix addresses issue on both laptop and multi-socket single core system. Below are the Test results: 1. Single socket - multi-core Before Patch: Does not export 'sched_mc_power_saving' After Patch: Does not export 'sched_mc_power_saving' Result: Pass 2. Multi Socket - single core Before Patch: exports 'sched_mc_power_saving' After Patch: Does not export 'sched_mc_power_saving' Result: Pass 3. Multi Socket - Multi core Before Patch: exports 'sched_mc_power_saving' After Patch: exports 'sched_mc_power_saving' [ Impact: make the sched_mc_power_saving control available more consistently ] Signed-off-by: Mahesh Salgaonkar Cc: Suresh B Siddha Cc: Venkatesh Pallipadi Cc: Peter Zijlstra LKML-Reference: <20090511143914.GB4853@dirshya.in.ibm.com> Signed-off-by: Ingo Molnar commit 40b387a8a9a821878ecdf9fb117958c426fc1385 Author: H. Peter Anvin Date: Mon May 11 14:41:55 2009 -0700 x86, boot: use LOAD_PHYSICAL_ADDR on 64 bits Use LOAD_PHYSICAL_ADDR instead of CONFIG_PHYSICAL_START in the 64-bit decompression code, for equivalence with the 32-bit code. [ Impact: cleanup, increases code similarity ] Signed-off-by: H. Peter Anvin commit 77d1a4999502c260df0eb2de437d320bf8c64b36 Author: H. Peter Anvin Date: Mon May 11 14:21:12 2009 -0700 x86, boot: make symbols from the main vmlinux available Make symbols from the main vmlinux, as opposed to just compressed/vmlinux, available to header.S. Also, export a few additional symbols. This will be used in a subsequent patch to export the total memory footprint of the kernel. [ Impact: enable future enhancement ] Signed-off-by: H. Peter Anvin commit be957c447f7233a67904a1b11eb3ab61e702bf4d Author: Steven Rostedt Date: Mon May 11 14:42:53 2009 -0400 ring-buffer: move calculation of event length The event length is calculated and passed in to rb_reserve_next_event in two different locations. Having rb_reserve_next_event do the calculations directly makes only one location to do the change and causes the calculation to be inlined by gcc. Before: text data bss dec hex filename 16538 24 12 16574 40be kernel/trace/ring_buffer.o After: text data bss dec hex filename 16490 24 12 16526 408e kernel/trace/ring_buffer.o [ Impact: smaller more efficient code ] Signed-off-by: Steven Rostedt commit 1cd8d7358948909ab80b254eb14bcebc555ad417 Author: Steven Rostedt Date: Mon May 11 14:08:09 2009 -0400 ring-buffer: remove type parameter from rb_reserve_next_event The rb_reserve_next_event is only called for the data type (type = 0). There is no reason to pass in the type to the function. Before: text data bss dec hex filename 16554 24 12 16590 40ce kernel/trace/ring_buffer.o After: text data bss dec hex filename 16538 24 12 16574 40be kernel/trace/ring_buffer.o [ Impact: cleaner, smaller and slightly more efficient code ] Signed-off-by: Steven Rostedt commit d988ff94c1074c4c914235c8591bcceafb585ecf Author: Steven Rostedt Date: Fri May 8 11:03:57 2009 -0400 ring-buffer: check for divide by zero in ring-buffer-benchmark Although we check if "missed" is not zero, we divide by hit + missed, and the addition can possible overflow and become a divide by zero. This patch checks for this case, and will report it when it happens then modify "hit" to make the calculation be non zero. [ Impact: prevent possible divide by zero in ring-buffer-benchmark ] Reported-by: Andrew Morton Signed-off-by: Steven Rostedt commit 5a772b2b3c68e7e0b503c5a48469113bb0634314 Author: Steven Rostedt Date: Fri May 8 10:56:33 2009 -0400 ring-buffer: replace constants with time macros in ring-buffer-benchmark The use of numeric constants is discouraged. It is cleaner and more descriptive to use macros for constant time conversions. This patch also removes an extra new line. [ Impact: more descriptive time conversions ] Reported-by: Andrew Morton Signed-off-by: Steven Rostedt commit 0c23590f00f85467b318ad0c20c36796a5bd4c60 Author: Alexey Dobriyan Date: Mon May 4 03:30:15 2009 +0400 x86, 64-bit: ifdef out struct thread_struct::ip struct thread_struct::ip isn't used on x86_64, struct pt_regs::ip is used instead. kgdb should be reading 0 always, but I can't check it. [ Impact: (potentially) reduce thread_struct size on 64-bit ] Signed-off-by: Alexey Dobriyan Cc: containers@lists.linux-foundation.org LKML-Reference: <20090503233015.GJ16631@x200.localdomain> Signed-off-by: Ingo Molnar commit d756f4adb9d8a86e347a2d5435bb5cc95744733e Author: Alexey Dobriyan Date: Mon May 4 03:29:52 2009 +0400 x86, 32-bit: ifdef out struct thread_struct::fs After commit 464d1a78fbf8cf6c7fd970e7b3e2db50a320ce28 aka "[PATCH] i386: Convert i386 PDA code to use %fs" %fs saved during context switch moved from thread_struct to pt_regs and value on thread_struct became unused. [ Impact: reduce thread_struct size on 32-bit ] Signed-off-by: Alexey Dobriyan Cc: containers@lists.linux-foundation.org LKML-Reference: <20090503232952.GI16631@x200.localdomain> Signed-off-by: Ingo Molnar commit cec6be6d1069d697beb490bbb40a290d5ff554a2 Author: Cyrill Gorcunov Date: Mon May 11 17:41:40 2009 +0400 x86: apic: Fixmap apic address even if apic disabled In case if apic were disabled by boot option we still need read_apic operation. So fixmap a fake apic area if needed. [ Impact: fix boot crash ] Signed-off-by: Cyrill Gorcunov Cc: yinghai@kernel.org Cc: eswierk@aristanetworks.com LKML-Reference: <20090511134140.GH4624@lenovo> Signed-off-by: Ingo Molnar commit 41fb454ebe6024f5c1e3b3cbc0abc0da762e7b51 Merge: 19c1a6f 091bf76 Author: Ingo Molnar Date: Mon May 11 14:44:27 2009 +0200 Merge commit 'v2.6.30-rc5' into core/iommu Merge reason: core/iommu was on an .30-rc1 base, update it to .30-rc5 to refresh. Signed-off-by: Ingo Molnar commit 97a52714658cd959a3cfa35c5b6f489859f0204b Author: Andreas Herrmann Date: Fri May 8 18:23:50 2009 +0200 x86: display extended apic registers with print_local_APIC and cpu_debug code Both print_local_APIC (used when apic=debug kernel param is set) and cpu_debug code missed support for some extended APIC registers that I'd like to see. This adds support to show: - extended APIC feature register - extended APIC control register - extended LVT registers [ Impact: print more debug info ] Signed-off-by: Andreas Herrmann Cc: Jaswinder Singh Rajput Cc: Cyrill Gorcunov LKML-Reference: <20090508162350.GO29045@alberich.amd.com> Signed-off-by: Ingo Molnar commit 79c5d3ce614d8fe706545c7bca2158b63db6bb5e Author: Li Zefan Date: Mon May 11 15:06:46 2009 +0800 blktrace: from-sector redundant in trace_block_remap, cleanup The last argument of block_remap prober is the original sector before remap, so it should be 'from', not 'to'. [ Impact: clean up ] Signed-off-by: Li Zefan Cc: "Alan D. Brunelle" Cc: Jens Axboe Cc: Arnaldo Carvalho de Melo Cc: KOSAKI Motohiro LKML-Reference: <4A07CE86.5090301@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 7961386fe9596e6bf03d09948a73c5df9653325b Merge: aa47b7e 091bf76 Author: Ingo Molnar Date: Mon May 11 12:59:32 2009 +0200 Merge commit 'v2.6.30-rc5' into sched/core Merge reason: sched/core was on .30-rc1 before, update to latest fixes Signed-off-by: Ingo Molnar commit 049862579333cc6cd9e6edfd6987cd0addfd8c59 Author: Li Zefan Date: Mon May 11 14:33:23 2009 +0800 blktrace: pdu_buf of pc events should be unsigned I got this: 8,0 1 305.417782332 2037 I R 32 (ffffff9e 10 00 ...) [bash] It should be: 8,0 1 305.417782332 2037 I R 32 (9e 10 00 ...) [bash] [ Impact: fix output of pc events ] Signed-off-by: Li Zefan Cc: Jens Axboe Cc: Arnaldo Carvalho de Melo Cc: Steven Rostedt Cc: Frederic Weisbecker LKML-Reference: <4A07C6B3.9080802@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 087fa4e964e04371a67f340e1f6ac92c5db5e0a6 Author: Pekka Enberg Date: Thu May 7 15:35:42 2009 +0300 x86: use sparse_memory_present_with_active_regions() on UMA There's no need to use call memory_present() manually on UMA because initmem_init() sets up early_node_map by calling e820_register_active_regions(). [ Impact: cleanup ] Signed-off-by: Pekka Enberg LKML-Reference: <1241699742.17846.31.camel@penberg-laptop> Signed-off-by: Ingo Molnar commit 3551f88f6439cf4da3f5a3747b320280e30500de Author: Pekka Enberg Date: Thu May 7 15:35:41 2009 +0300 x86: unify 64-bit UMA and NUMA paging_init() 64-bit UMA and NUMA versions of paging_init() are almost identical. Therefore, merge the copy in mm/numa_64.c to mm/init_64.c to remove duplicate code. [ Impact: cleanup ] Signed-off-by: Pekka Enberg LKML-Reference: <1241699741.17846.30.camel@penberg-laptop> Signed-off-by: Ingo Molnar commit 0964b0562bb9c93194e852b47bab2397b9e11c18 Author: Yinghai Lu Date: Fri May 8 00:37:34 2009 -0700 x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB It is expected that there might be slight differences between the e820 map and the SRAT table and the intention was that 1MB of slack be allowed. The calculation comparing e820ram and pxmram assumes the units are bytes, when they are in fact pages. This means 4GB of slack is being allowed, not 1MB. This patch makes the correct comparison. comment is from Mel. [ Impact: don't accept buggy SRATs that could dump up to 4G of RAM ] Signed-off-by: Yinghai Lu Acked-by: Mel Gorman Cc: Andrew Morton LKML-Reference: <4A03E13E.6050107@kernel.org> Signed-off-by: Ingo Molnar commit b37ab91907e9002925f4217e3bbd496aa12c2fa3 Author: Yinghai Lu Date: Fri May 8 00:36:44 2009 -0700 x86: Sanity check the e820 against the SRAT table using e820 map only node_cover_memory() sanity checks the SRAT table by ensuring that all PXMs cover the memory reported in the e820. However, when calculating the size of the holes in the e820, it uses the early_node_map[] which contains information taken from both SRAT and e820. If the SRAT is missing an entry, then it is not detected that the SRAT table is incorrect and missing entries. This patch uses the e820 map to calculate the holes instead of early_node_map[]. comment is from Mel. [ Impact: reject incorrect SRAT tables ] Signed-off-by: Yinghai Lu Acked-by: Mel Gorman Cc: Andrew Morton LKML-Reference: <4A03E10C.60906@kernel.org> Signed-off-by: Ingo Molnar commit 4401da6111ac58f94234417427d06a72c4048c74 Author: Yinghai Lu Date: Sat May 2 10:40:57 2009 -0700 x86: read apic ID in the !acpi_lapic case Ed found that on 32-bit, boot_cpu_physical_apicid is not read right, when the mptable is broken. Interestingly, actually three paths use/set it: 1. acpi: at that time that is already read from reg 2. mptable: only read from mptable 3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit so we could read the apic id for the 2/3 path. We trust the hardware register more than we trust a BIOS data structure (the mptable). We can also avoid the double set_fixmap() when acpi_lapic is used, and also need to move cpu_has_apic earlier and call apic_disable(). Also when need to update the apic id, we'd better read and set the apic version as well - so that quirks are applied precisely. v2: make path 3 with 64bit, use -1 as apic id, so could read it later. v3: fix whitespace problem pointed out by Ed Swierk [ Impact: get correct apic id for bsp other than acpi path ] Reported-by: Ed Swierk Signed-off-by: Yinghai Lu Acked-by: Cyrill Gorcunov LKML-Reference: <49FC85A9.2070702@kernel.org> [ v4: sanity-check in the ACPI case too ] Signed-off-by: Ingo Molnar commit 80989ce0643c1034822f3e339ed8d790b649abe1 Author: Yinghai Lu Date: Sat May 9 23:47:42 2009 -0700 x86: clean up and and print out initial max_pfn_mapped Do this so we can check the range that is mapped before init_memory_mapping(). To be able to print out meaningful info, we first have to fix 64-bit to have max_pfn_mapped assigned before that call. This also unifies the code-path a bit. [ Impact: print more debug info, cleanup ] Signed-off-by: Yinghai Lu LKML-Reference: <49BF0978.40605@kernel.org> Signed-off-by: Ingo Molnar commit 3e0c373749d7eb5b354ac0b043f2b2cdf84eefef Author: Yinghai Lu Date: Sat May 9 23:47:42 2009 -0700 x86: clean up and fix setup_clear/force_cpu_cap handling setup_force_cpu_cap() only have one user (Xen guest code), but it should not reuse cleared_cpu_cpus, otherwise it will have problems on SMP. Need to have a separate cpu_cpus_set array too, for forced-on flags, beyond the forced-off flags. Also need to setup handling before all cpus caps are combined. [ Impact: fix the forced-set CPU feature flag logic ] Cc: H. Peter Anvin Cc: Jeremy Fitzhardinge Cc: Rusty Russell Signed-off-by: Yinghai Lu LKML-Reference: Signed-off-by: Ingo Molnar commit 61fe91e1319556f32bebfd7ed2c68ef02e2c17f7 Author: Yinghai Lu Date: Sat May 9 23:47:42 2009 -0700 x86: apic: Check rev 3 fadt correctly for physical_apic bit Impact: fix fadt version checking FADT2_REVISION_ID has value 3 aka rev 3 FADT. So need to use >= instead of >, as other places in the code do. [ Impact: extend scope of APIC boot quirk ] Signed-off-by: Yinghai Lu LKML-Reference: Signed-off-by: Ingo Molnar commit b9c61b70075c87a8612624736faf4a2de5b1ed30 Author: Yinghai Lu Date: Wed May 6 10:10:06 2009 -0700 x86/pci: update pirq_enable_irq() to setup io apic routing So we can set io apic routing only when enabling the device irq. This is advantageous for IRQ descriptor allocation affinity: if we set up the IO-APIC entry later, we have a chance to allocate the IRQ descriptor later and know which device it is on and can set affinity accordingly. [ Impact: standardize/enhance irq-enabling sequence for mptable irqs ] Signed-off-by: Yinghai Lu Acked-by: Jesse Barnes Cc: Len Brown Cc: Andrew Morton LKML-Reference: <4A01C46E.8000501@kernel.org> Signed-off-by: Ingo Molnar commit 5ef2183768bb7d64b85eccbfa1537a61cbefa97c Author: Yinghai Lu Date: Wed May 6 10:08:50 2009 -0700 x86/acpi: move setup io apic routing out of CONFIG_ACPI scope So we could set io apic routing when ACPI is not enabled. [ Impact: prepare for new functionality ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Jesse Barnes Cc: Len Brown LKML-Reference: <4A01C422.5070400@kernel.org> Signed-off-by: Ingo Molnar commit e20c06fd6950265a899edd96a02dc2e6ae2d1ce5 Author: Yinghai Lu Date: Wed May 6 10:08:22 2009 -0700 x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector() To prepare those params for pcibios_irq_enable() to call setup_io_apic_routing(). [ Impact: extend function call API to prepare for new functionality ] Signed-off-by: Yinghai Lu Acked-by: Jesse Barnes Cc: Len Brown Cc: Andrew Morton LKML-Reference: <4A01C406.2040303@kernel.org> Signed-off-by: Ingo Molnar commit bdfe8ac153546537ed24de69610ea781a734f785 Author: Yinghai Lu Date: Wed May 6 10:07:41 2009 -0700 x86/acpi: move pin_programmed bit map to io_apic.c Prepare to call setup_io_apic_routing() in pcibios_irq_enable() also remove not needed member apic_id. [ Impact: clean up, prepare for future change ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Jesse Barnes Cc: Len Brown LKML-Reference: <4A01C3DD.3050104@kernel.org> Signed-off-by: Ingo Molnar commit a31f82057ce6f7ced578d64c07a72ccbdc7336e4 Author: Yinghai Lu Date: Wed May 6 10:06:15 2009 -0700 x86/acpi: call mp_config_acpi_gsi() in mp_register_gsi() The patch to call mp_config_acpi_gsi() from the ACPI IRQ registration code never got mainline because there were open discussions about it. This call is needed to properly update the kernel's copy of the mptable, when the update_mptable boot parameter is needed. Now that the dust has settled with the APIC unification, and since there were no objections when the patch was re-submitted, try this again. [ Impact: fix the update_mptable boot parameter ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Jesse Barnes Cc: Len Brown LKML-Reference: <4A01C387.7090103@kernel.org> Signed-off-by: Ingo Molnar commit ee214558c2e959781a406e76c5b34364da638e1d Author: Yinghai Lu Date: Wed May 6 10:07:07 2009 -0700 x86: fix alloc_mptable() Fix the conditions when we stop updating the mptable due to running out of slots. [ Impact: fix memory corruption / non-working update_mptable boot parameter ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Jesse Barnes Cc: Len Brown LKML-Reference: <4A01C3BB.1000609@kernel.org> Signed-off-by: Ingo Molnar commit b9e0353fc85dab4ef5ebcef2bd09ebc4ce6d5a7b Author: Yinghai Lu Date: Wed May 6 10:05:32 2009 -0700 x86/acpi: remove irq-compression trick on 32-bit We already have a per cpu vector on 32-bit via recent changes, and don't need this trick any more (which trick obfuscates the real GSI mappings and which only triggers on larger systems to begin with): On 3 ioapic system (24 per ioapic) before patch I got: ACPI: PCI Interrupt Link [ILSB] enabled at IRQ 71 IOAPIC[2]: Set routing entry (10-23 -> 0xa9 -> IRQ 64 Mode:1 Active:1) pci 0000:80:01.1: PCI INT A -> Link[ILSB] -> GSI 71 (level, low) -> IRQ 64 ACPI: PCI Interrupt Link [LE5B] enabled at IRQ 67 IOAPIC[2]: Set routing entry (10-19 -> 0xb1 -> IRQ 65 Mode:1 Active:1) pci 0000:83:00.0: PCI INT B -> Link[LE5B] -> GSI 67 (level, low) -> IRQ 65 ACPI: PCI Interrupt Link [LE5A] enabled at IRQ 66 IOAPIC[2]: Set routing entry (10-18 -> 0xb9 -> IRQ 66 Mode:1 Active:1) pci 0000:83:00.1: PCI INT A -> Link[LE5A] -> GSI 66 (level, low) -> IRQ 66 ACPI: PCI Interrupt Link [LE5D] enabled at IRQ 65 IOAPIC[2]: Set routing entry (10-17 -> 0xc1 -> IRQ 67 Mode:1 Active:1) pci 0000:84:00.0: PCI INT B -> Link[LE5D] -> GSI 65 (level, low) -> IRQ 67 ACPI: PCI Interrupt Link [LE5C] enabled at IRQ 64 IOAPIC[2]: Set routing entry (10-16 -> 0xc9 -> IRQ 68 Mode:1 Active:1) pci 0000:84:00.1: PCI INT A -> Link[LE5C] -> GSI 64 (level, low) -> IRQ 68 pci 0000:87:00.0: PCI INT B -> Link[LE5A] -> GSI 66 (level, low) -> IRQ 66 pci 0000:87:00.1: PCI INT A -> Link[LE5D] -> GSI 65 (level, low) -> IRQ 67 pci 0000:88:00.0: PCI INT B -> Link[LE5C] -> GSI 64 (level, low) -> IRQ 68 pci 0000:88:00.1: PCI INT A -> Link[LE5B] -> GSI 67 (level, low) -> IRQ 65 pci 0000:8b:00.0: PCI INT B -> Link[LE5A] -> GSI 66 (level, low) -> IRQ 66 pci 0000:8b:00.1: PCI INT A -> Link[LE5D] -> GSI 65 (level, low) -> IRQ 67 pci 0000:8c:00.0: PCI INT B -> Link[LE5C] -> GSI 64 (level, low) -> IRQ 68 pci 0000:8c:00.1: PCI INT A -> Link[LE5B] -> GSI 67 (level, low) -> IRQ 65 after the patch we get: ACPI: PCI Interrupt Link [ILSB] enabled at IRQ 71 IOAPIC[2]: Set routing entry (10-23 -> 0xa9 -> IRQ 71 Mode:1 Active:1) pci 0000:80:01.1: PCI INT A -> Link[ILSB] -> GSI 71 (level, low) -> IRQ 71 ACPI: PCI Interrupt Link [LE5B] enabled at IRQ 67 IOAPIC[2]: Set routing entry (10-19 -> 0xb1 -> IRQ 67 Mode:1 Active:1) pci 0000:83:00.0: PCI INT B -> Link[LE5B] -> GSI 67 (level, low) -> IRQ 67 ACPI: PCI Interrupt Link [LE5A] enabled at IRQ 66 IOAPIC[2]: Set routing entry (10-18 -> 0xb9 -> IRQ 66 Mode:1 Active:1) pci 0000:83:00.1: PCI INT A -> Link[LE5A] -> GSI 66 (level, low) -> IRQ 66 ACPI: PCI Interrupt Link [LE5D] enabled at IRQ 65 IOAPIC[2]: Set routing entry (10-17 -> 0xc1 -> IRQ 65 Mode:1 Active:1) pci 0000:84:00.0: PCI INT B -> Link[LE5D] -> GSI 65 (level, low) -> IRQ 65 ACPI: PCI Interrupt Link [LE5C] enabled at IRQ 64 IOAPIC[2]: Set routing entry (10-16 -> 0xc9 -> IRQ 64 Mode:1 Active:1) pci 0000:84:00.1: PCI INT A -> Link[LE5C] -> GSI 64 (level, low) -> IRQ 64 pci 0000:87:00.0: PCI INT B -> Link[LE5A] -> GSI 66 (level, low) -> IRQ 66 pci 0000:87:00.1: PCI INT A -> Link[LE5D] -> GSI 65 (level, low) -> IRQ 65 pci 0000:88:00.0: PCI INT B -> Link[LE5C] -> GSI 64 (level, low) -> IRQ 64 pci 0000:88:00.1: PCI INT A -> Link[LE5B] -> GSI 67 (level, low) -> IRQ 67 pci 0000:8b:00.0: PCI INT B -> Link[LE5A] -> GSI 66 (level, low) -> IRQ 66 pci 0000:8b:00.1: PCI INT A -> Link[LE5D] -> GSI 65 (level, low) -> IRQ 65 pci 0000:8c:00.0: PCI INT B -> Link[LE5C] -> GSI 64 (level, low) -> IRQ 64 pci 0000:8c:00.1: PCI INT A -> Link[LE5B] -> GSI 67 (level, low) -> IRQ 67 As it can be seen that GSIs now get mapped lineary. [ Impact: simplify irq number mapping on bigger 32-bit systems ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Jesse Barnes Cc: Len Brown LKML-Reference: <4A01C35C.7060207@kernel.org> Signed-off-by: Ingo Molnar commit 7a309490da98981558a07183786201f02a6341e2 Merge: 9a8709d 091bf76 Author: Ingo Molnar Date: Mon May 11 09:33:06 2009 +0200 Merge commit 'v2.6.30-rc5' into x86/apic Merge reason: this branch was on a .30-rc2 base - sync it up with all the latest fixes. Signed-off-by: Ingo Molnar commit 5d423ccd7ba4285f1084e91b26805e1d0ae978ed Author: Yinghai Lu Date: Wed May 6 08:07:52 2009 -0700 x86/pci: remove rounding quirk from e820_setup_gap() Now that the e820 code explicitly reserves 'potentially dangerous' free physical memory address space to protect ACPI stolen RAM, there's no need for the rounding quirk in the PCI allocator anymore. Also, this quirk was open-ended iteration that could end up reserving a lot of free space and potentially breaking drivers - such as the one reported by Yannick Roehlly where there's a PCI device with a large memory resource. So remove it. [ Impact: make more of the PCI hole available for assigning pci devices ] Reported-by: Yannick Roehlly Signed-off-by: Yinghai Lu Acked-by: Jesse Barnes Cc: Ivan Kokshaysky Cc: Linus Torvalds Cc: Andrew Morton LKML-Reference: <4A01A7C8.5090701@kernel.org> Signed-off-by: Ingo Molnar commit 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 Author: Linus Torvalds Date: Wed May 6 08:06:44 2009 -0700 x86, e820, pci: reserve extra free space near end of RAM The point is to take all RAM resources we have, and _after_ we've added all the resources we've seen in the E820 tree, we then _also_ try to add fake reserved entries for any "round up to X" at the end of the RAM resources. [ Impact: improve PCI mem-resource allocation robustness, protect "stolen RAM" ] Reported-by: Yannick Roehlly Acked-by: Jesse Barnes Signed-off-by: Yinghai Lu Cc: Ivan Kokshaysky Cc: Andrew Morton Cc: yannick.roehlly@free.fr LKML-Reference: <4A01A784.2050407@kernel.org> Signed-off-by: Ingo Molnar commit 134cbf35c739bf89c51fd975a33a6b87507482c4 Merge: 2feceef 091bf76 Author: Ingo Molnar Date: Mon May 11 09:33:06 2009 +0200 Merge commit 'v2.6.30-rc5' into x86/mm Merge reason: this branch was on a .30-rc2 base - sync it up with all the latest fixes. Signed-off-by: Ingo Molnar commit b30505c81a9d4adea8b70ecff512b0216929b797 Author: Darren Hart Date: Thu May 7 15:40:14 2009 -0700 futex: add requeue-pi documentation Add Documentation/futex-requeue-pi.txt describing the motivation for the newly added FUTEX_*REQUEUE_PI op codes and their implementation. [ Impact: add documentation ] Signed-off-by: Darren Hart Cc: Sripathi Kodi Cc: Peter Zijlstra Cc: John Stultz Cc: Steven Rostedt Cc: Dinakar Guniguntala Cc: Ulrich Drepper Cc: Eric Dumazet Cc: Jakub Jelinek LKML-Reference: <4A03634E.3080609@us.ibm.com> [ reformatted the file ] Signed-off-by: Ingo Molnar commit 778dedae0cb76a441145f3a0c5d59fcb3ba296d5 Author: Huang Weiyi Date: Sat May 9 12:54:34 2009 +0800 x86: mce: remove duplicated #include Remove duplicated #include in arch/x86/kernel/cpu/mcheck/mce_intel_64.c. [ Impact: cleanup ] Signed-off-by: Huang Weiyi Signed-off-by: Ingo Molnar commit 02a884c0fe7ec8459d00d34b7d4101af21fc4a86 Author: H. Peter Anvin Date: Fri May 8 17:42:16 2009 -0700 x86, boot: determine compressed code offset at compile time Determine the compressed code offset (from the kernel runtime address) at compile time. This allows some minor optimizations in arch/x86/boot/compressed/head_*.S, but more importantly it makes this value available to the build process, which will enable a future patch to export the necessary linear memory footprint into the bzImage header. [ Impact: cleanup, future patch enabling ] Signed-off-by: H. Peter Anvin commit 36d3793c947f1ef7ba3d24eeeddc1be41adc5ab4 Author: H. Peter Anvin Date: Fri May 8 16:45:15 2009 -0700 x86, boot: use appropriate rep string for move and clear In the pre-decompression code, use the appropriate largest possible rep movs and rep stos to move code and clear bss, respectively. For reverse copy, do note that the initial values are supposed to be the address of the first (highest) copy datum, not one byte beyond the end of the buffer. rep strings are not necessarily the fastest way to perform these operations on all current processors, but are likely to be in the future, and perhaps more importantly, we want to encourage the architecturally right thing to do here. This also fixes a couple of trivial inefficiencies on 64 bits. [ Impact: trivial performance enhancement, increase code similarity ] Signed-off-by: H. Peter Anvin commit 97541912785369925723b6255438ad9fce2ddf04 Author: H. Peter Anvin Date: Wed May 6 17:56:51 2009 -0700 x86, boot: zero EFLAGS on 32 bits The 64-bit code already clears EFLAGS as soon as it has a stack. This seems like a reasonable precaution, so do it on 32 bits as well. [ Impact: extra paranoia ] Signed-off-by: H. Peter Anvin commit 0a137736704ef9af719409933b3c33e138461786 Author: H. Peter Anvin Date: Fri May 8 16:27:41 2009 -0700 x86, boot: set up the decompression stack as early as possible Set up the decompression stack as soon as we know where it needs to go. That way we have a full-service stack as soon as possible, rather than relying on the BP_scratch field. Note that the stack does need to be empty during bss zeroing (or else the stack needs to be moved out of the bss segment, which is also an option.) [ Impact: cleanup, minor paranoia ] Signed-off-by: H. Peter Anvin commit 5b11f1cee5797b38d16b94d8745b12b6727a8373 Author: H. Peter Anvin Date: Fri May 8 16:20:34 2009 -0700 x86, boot: straighten out ranges to copy/zero in compressed/head*.S Both on 32 and 64 bits, we copy all the way up to the end of bss, except that on 64 bits there is a hack to avoid copying on top of the page tables. There is no point in copying bss at all, especially since we are just about to zero it all anyway. To clean up and unify the handling, we now do: - copy from startup_32 to _bss. - zero from _bss to _ebss. - the _ebss symbol is aligned to an 8-byte boundary. - the page tables are moved to a separate section. Use _bss as the copy endpoint since _edata may be misaligned. [ Impact: cleanup, trivial performance improvement ] Signed-off-by: H. Peter Anvin commit b40d68d5b5b799caaf99d2e073e62962e6d917ce Author: H. Peter Anvin Date: Fri May 8 15:59:13 2009 -0700 x86, boot: stylistic cleanups for boot/compressed/head_64.S Clean up style issues in arch/x86/boot/compressed/head_64.S. This file had a lot fewer style issues than its 32-bit cousin, but the ones it has are worth fixing, especially since it makes the two files more similar. [ Impact: cleanup, no object code change ] Signed-off-by: H. Peter Anvin commit 5f64ec64e7f9b246c0a94f34cdf7782f98a6e55d Author: H. Peter Anvin Date: Fri May 8 15:45:17 2009 -0700 x86, boot: stylistic cleanups for boot/compressed/head_32.S Reformat arch/x86/boot/compressed/head_32.S to be closer to currently preferred kernel assembly style, that is: - opcode and operand separated by tab - operands separated by ", " - C-style comments This also makes it more similar to head_64.S. [ Impact: cleanup, no object code change ] Signed-off-by: H. Peter Anvin commit bd2a36984c50bb546a7d04cb395fddcf98a1092c Author: H. Peter Anvin Date: Tue May 5 23:24:50 2009 -0700 x86, boot: use BP_scratch in arch/x86/boot/compressed/head_*.S Use the BP_scratch symbol from asm-offsets.h instead of hard-coding the location. [ Impact: cleanup ] Signed-off-by: H. Peter Anvin commit 283ab1c0bd462dd0b179393fb081a626f6687413 Author: H. Peter Anvin Date: Fri May 8 15:32:47 2009 -0700 x86, boot: follow standard Kbuild style for compression suffix When generating the compression suffix in arch/x86/boot/compressed/Makefile, follow standard Kbuild conventions, that is: - Use a dash not underscore before y/m/n endings - Use := whenever possible. Requested-by: Sam Ravnborg Signed-off-by: H. Peter Anvin commit 5f11e02019ef44f041e6e38a1363fa2fd4b8785d Author: H. Peter Anvin Date: Tue May 5 22:53:11 2009 -0700 x86, boot: simplify arch/x86/boot/compressed/Makefile Simplify the arch/x86/boot/compressed/Makefile, by using the new capability of specifying multiple inputs to a compressor, and the CONFIG_X86_NEED_RELOCS Kconfig symbol. Signed-off-by: H. Peter Anvin Acked-by: Sam Ravnborg commit 845adf7266a7ba6970bf982ffd96abc60d2018ab Author: H. Peter Anvin Date: Tue May 5 21:20:51 2009 -0700 x86: add a Kconfig symbol for when relocations are needed We only need to build relocations when we are building a 32-bit relocatable kernel. Rather than unnecessarily complicating the Makefiles, make an explicit Kbuild symbol for this. [ Impact: permits future cleanup ] Signed-off-by: H. Peter Anvin Cc: Sam Ravnborg commit d3dd3b5a29bb9582957451531fed461628dfc834 Author: H. Peter Anvin Date: Tue May 5 21:17:15 2009 -0700 kbuild: allow compressors (gzip, bzip2, lzma) to take multiple inputs Allow the compression commands in Kbuild (i.e. gzip, bzip2, lzma) to take multiple input files and emit the concatenated compressed output. This avoids an intermediate step when a kernel image is built from multiple components, such as the relocatable x86-32 kernel. Sam Ravnborg integrated the bin_size script into the Makefile. [ Impact: new build feature, not yet used ] Signed-off-by: H. Peter Anvin Acked-by: Sam Ravnborg commit 0b4eb462da10f832b28d518abffa4d77805928a0 Author: H. Peter Anvin Date: Thu Apr 30 17:59:36 2009 -0700 x86, boot: align the .bss section in the decompressor Aligning the .bss section makes it trivial to use large operation sizes for moving the initialized sections and clearing the .bss. The alignment chosen (L1 cache) is somewhat arbitrary, but should be large enough to avoid all known performance traps and small enough to not cause troubles. [ Impact: trivial performance enhancement, future patch prep ] Signed-off-by: H. Peter Anvin commit a789ed5fb6d0256c4177c2cc27e06520ddbe4d4c Author: Jeremy Fitzhardinge Date: Fri Apr 24 00:26:50 2009 -0700 xen: cache cr0 value to avoid trap'n'emulate for read_cr0 stts() is implemented in terms of read_cr0/write_cr0 to update the state of the TS bit. This happens during context switch, and so is fairly performance critical. Rather than falling back to a trap-and-emulate native read_cr0, implement our own by caching the last-written value from write_cr0 (the TS bit is the only one we really care about). Impact: optimise Xen context switches Signed-off-by: Jeremy Fitzhardinge commit b80119bb35a49a4e8dbfb9708872adfd5cf38dee Author: Jeremy Fitzhardinge Date: Fri Apr 24 00:22:08 2009 -0700 xen/x86-64: clean up warnings about IST-using traps Ignore known IST-using traps. Aside from the debugger traps, they're low-level faults which Xen will handle for us, so the kernel needn't worry about them. Keep warning in case unknown trap starts using IST. Impact: suppress spurious warnings Signed-off-by: Jeremy Fitzhardinge commit 6cac5a924668a56c7ccefc345805f1fe0536a90e Author: Jeremy Fitzhardinge Date: Sun Mar 29 19:56:29 2009 -0700 xen/x86-64: fix breakpoints and hardware watchpoints Native x86-64 uses the IST mechanism to run int3 and debug traps on an alternative stack. Xen does not do this, and so the frames were being misinterpreted by the ptrace code. This change special-cases these two exceptions by using Xen variants which run on the normal kernel stack properly. Impact: avoid crash or bad data when IST trap is invoked under Xen Signed-off-by: Jeremy Fitzhardinge commit 4671c79408a3f8a5a6a45e39c4c164dada3a5678 Author: Steven Rostedt Date: Fri May 8 16:27:41 2009 -0400 tracing: add trace_set_clr_event to export event enabling function Other parts of the kernel may need to be able to enable or disable specific events. Especially parts that create trace events. [ Impact: allow enabling of trace events by those that create the event ] Signed-off-by: Steven Rostedt commit 29f93943d1916d1a3faa3f10f4a06994347ac990 Author: Steven Rostedt Date: Fri May 8 16:06:47 2009 -0400 tracing: initialize return value for __ftrace_set_clr_event Commit 8f31bfe538ebafac187d2d4465a92e1d9ee6d8c2 tracing/events: clean up for ftrace_set_clr_event() Moved out the code for ftrace_set_clr_event into a helper funciton but did not initialize the return value. As a result, we do not warn about a typo in the echoing of events in set_event. This patch restores the old warning: # echo foobar > set_event -bash: echo: write error: Invalid argument [ Impact: restore warning of invalid entries to set_event ] Signed-off-by: Steven Rostedt commit bf8b9a63c18a1a7777571650de0c9f4fd4368ca0 Author: Jaswinder Singh Rajput Date: Fri May 8 20:53:58 2009 +0530 x86: msr-index.h remove duplicate MSR C001_0015 declaration MSRC001_0015 Hardware Configuration Register (HWCR) is already defined as MSR_K7_HWCR. And HWCR is available for >= K7. So MSR_K8_HWCR is not required and no-one is using it. [ Impact: cleanup, no object code change ] Signed-off-by: Jaswinder Singh Rajput commit c142b15dc56ee6d55cb97a062e3c8e9c61e384c0 Author: Li Zefan Date: Fri May 8 10:32:05 2009 +0800 tracing/events: simplify system_enable_read() A smarter way to figure out the output of an enable file. [ Impact: clean up ] Signed-off-by: Li Zefan Acked-by: Steven Rostedt Acked-by: Frederic Weisbecker LKML-Reference: <4A0399A5.2080603@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 8f31bfe538ebafac187d2d4465a92e1d9ee6d8c2 Author: Li Zefan Date: Fri May 8 10:31:42 2009 +0800 tracing/events: clean up for ftrace_set_clr_event() Add a helper function __ftrace_set_clr_event(), and replace some ftrace_set_clr_event() calls with this helper, thus we don't need any kstrdup() or kmalloc(). As a side effect, this patch fixes an issue in self tests code, which is similar to the one fixed in commit d6bf81ef0f7474434c2a049e8bf3c9146a14dd96 ("tracing: append ":*" to internal setting of system events") It's a small issue and won't cause any bug in fact, but we should do things right anyway. [ Impact: prevent spurious event-enabling in tracing self-tests ] Signed-off-by: Li Zefan Acked-by: Steven Rostedt Acked-by: Frederic Weisbecker LKML-Reference: <4A03998E.3020503@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 7e4e0bd50e80df2fe5501f48f872448376cdd997 Author: Robert Richter Date: Wed May 6 12:10:23 2009 +0200 oprofile: introduce module_param oprofile.cpu_type This patch removes module_param oprofile.force_arch_perfmon and introduces oprofile.cpu_type=archperfmon instead. This new parameter can be reused for other models and architectures. Currently only archperfmon is supported. Cc: Andi Kleen Signed-off-by: Robert Richter commit 6adf406f0a0eaf37251018d15f51e93f5b538ee6 Author: Andi Kleen Date: Mon Apr 27 17:44:13 2009 +0200 oprofile: add support for Core i7 and Atom The registers are about the same as other Family 6 CPUs so we only need to add detection. I'm not completely happy with calling Nehalem Core i7 because there will be undoubtedly other Nehalem based CPUs in the future with different marketing names, but it's the best we got for now. Requires updated oprofile userland for the new event files. If you don't want to update right now you can also use oprofile.force_arch_perfmon=1 (added in the next patch) with 0.9.4 Signed-off-by: Andi Kleen Signed-off-by: Robert Richter commit 1f3d7b60691993d8d368d8dd7d5d85871d41e8f5 Author: Andi Kleen Date: Mon Apr 27 17:44:12 2009 +0200 oprofile: remove undocumented oprofile.p4force option There are no new P4s and the oprofile code knows about all existing ones, so we don't really need the p4force option anymore. Remove it. Signed-off-by: Andi Kleen Signed-off-by: Robert Richter commit 1dcdb5a9e7c235e6e80f1f4d5b8247b3e5347e48 Author: Andi Kleen Date: Mon Apr 27 17:44:11 2009 +0200 oprofile: re-add force_arch_perfmon option This re-adds the force_arch_perfmon option that was in the original arch perfmon patchkit. Originally this was rejected in favour of a generalized perfmon=name option, but it turned out implementing the later in a reliable way is hard (and it would have been easy to crash the kernel if a user gets it wrong) But now Atom and Core i7 support being readded a user would need to update their oprofile userland to beyond 0.9.4 to use oprofile again on Atom or Core i7. To avoid this problem readd the force_arch_perfmon option. Signed-off-by: Andi Kleen Signed-off-by: Robert Richter commit 6b2e8523df148c15ea5abf13075026fb8bdb3f86 Author: Jeremy Fitzhardinge Date: Thu May 7 11:56:49 2009 -0700 xen: reserve Xen start_info rather than e820 reserving Use reserve_early rather than e820 reservations for Xen start info and mfn->pfn table, so that the memory use is a bit more self-documenting. [ Impact: cleanup ] Signed-off-by: Jeremy Fitzhardinge Cc: Xen-devel Cc: Linus Torvalds LKML-Reference: <4A032EF1.6070708@goop.org> Signed-off-by: Ingo Molnar commit f066a155334642b8a206eec625b1925d88c48aeb Merge: e7c0648 33df4db Author: Ingo Molnar Date: Fri May 8 10:50:00 2009 +0200 Merge branch 'x86/urgent' into x86/xen Conflicts: arch/frv/include/asm/pgtable.h arch/x86/include/asm/required-features.h arch/x86/xen/mmu.c Merge reason: x86/xen was on a .29 base still, move it to a fresher branch and pick up Xen fixes as well, plus resolve conflicts Signed-off-by: Ingo Molnar commit 74f4fd21664148b8c454cc07bfe74e4dd51cf07b Author: Steven Rostedt Date: Thu May 7 19:58:55 2009 -0400 ring-buffer: change WARN_ON from checking preempt_count to preemptible There's a WARN_ON in the ring buffer code that makes sure preemption is disabled. It checks "!preempt_count()". But when CONFIG_PREEMPT is not enabled, preempt_count() is always zero, and this will trigger the warning. [ Impact: prevent false warning on non preemptible kernels ] Signed-off-by: Steven Rostedt commit 7da3046d6ce6ea97494020081c509b642b7016af Author: Steven Rostedt Date: Thu May 7 19:52:20 2009 -0400 ring-buffer: add total count in ring-buffer-benchmark It is nice to see the overhead of the benchmark test when tracing is disabled. That is, we turn off the ring buffer just to see what the cost of running the loop that calls into the ring buffer is. Currently, if no entries wer made, we get 0. This is not informative. This patch changes it to check if we had any "missed" (non recorded) events. If so, a total count is also reported. [ Impact: evaluate the over head of the ring buffer benchmark test ] Signed-off-by: Steven Rostedt commit 0574ea421b90e0e45a72c447dd3c2c79ffd8c153 Author: Steven Rostedt Date: Thu May 7 14:20:28 2009 -0400 ring-buffer: only periodically call cond_resched to ring-buffer-benchmark Calling cond_resched at every iteration of the loop adds a bit of overhead to the benchmark. This patch does two things. 1) only calls cond-resched when CONFIG_PREEMPT is not enabled 2) only calls cond-resched after so many traces has been performed. [ Impact: less overhead to the ring-buffer-benchmark ] Signed-off-by: Steven Rostedt commit 65b77242043f74bca6a0d733c0e48ef03a8c9893 Author: Steven Rostedt Date: Thu May 7 12:49:27 2009 -0400 tracing: have menu default enabled when kernel debug is configured Tracing can be very helpful to debug the kernel. When DEBUG_KERNEL is enabled it is nice to enable the trace menu as well. This patch only make the tracing menu enabled by default, it does not make any of the tracers enabled. And the menu is only enabled by default if DEBUG_KERNEL is enabled. [ Impact: show tracing options to those debugging the kernel ] Signed-off-by: Steven Rostedt commit d6bf81ef0f7474434c2a049e8bf3c9146a14dd96 Author: Steven Rostedt Date: Thu May 7 11:49:35 2009 -0400 tracing: append ":*" to internal setting of system events The system enabling of events uses the same code as the set_event file. It passes in the name of the system to the parser and that will enable all the events that has that system as a name. The problem is that it will also enable events with the same name as the system. If you have system name foo, and system name bar, but within the system bar, there exists an event called foo. By setting the system name foo, you will also be enabling the event foo in the system bar. This is not an expected result. The solution is to pass in "foo:*", which will only enable the system foo and not events called foo. [ Impact: prevent accidental enabling of events with same name as a system ] Reported-by: Li Zefan Signed-off-by: Steven Rostedt commit 29c8000ee7da3a6756d26143991e573eaaf2a9f6 Author: Steven Rostedt Date: Thu May 7 11:13:42 2009 -0400 ring-buffer: remove complex calculations in ring-buffer-test Ingo Molnar thought that the code to calculate the time in cond_resched is a bit too ugly and is not needed. This patch removes it and replaces it with a simple call to cond_resched. I kept the comment that explains the reason for the cond_resched. [ Impact: remove ugly code ] Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit 0ad5d703c6c0fcd385d956555460df95dff7eb7e Merge: 44347d9 1cb81b1 Author: Ingo Molnar Date: Thu May 7 11:18:34 2009 +0200 Merge branch 'tracing/hw-branch-tracing' into tracing/core Merge reason: this topic is ready for upstream now. It passed Oleg's review and Andrew had no further mm/* objections/observations either. Signed-off-by: Ingo Molnar commit 44347d947f628060b92449702071bfe1d31dfb75 Merge: d94fc52 413f81e Author: Ingo Molnar Date: Thu May 7 11:17:13 2009 +0200 Merge branch 'linus' into tracing/core Merge reason: tracing/core was on a .30-rc1 base and was missing out on on a handful of tracing fixes present in .30-rc5-almost. Signed-off-by: Ingo Molnar commit d94fc523f3c35bd8013f04827e94756cbc0212f4 Author: Li Zefan Date: Thu May 7 15:11:15 2009 +0800 tracing/events: fix concurrent access to ftrace_events list, fix In filter_add_subsystem_pred() we should release event_mutex before calling filter_free_subsystem_preds(), since both functions hold event_mutex. [ Impact: fix deadlock when writing invalid pred into subsystem filter ] Signed-off-by: Li Zefan Cc: tzanussi@gmail.com Cc: a.p.zijlstra@chello.nl Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org LKML-Reference: <4A028993.7020509@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 5928c3cc0ffcb6894bbab6be591b7ae1786b2d87 Author: Frederic Weisbecker Date: Sun May 3 03:03:57 2009 +0200 tracing/filters: support for operator reserved characters in strings When we set a filter for an event, such as: echo "name == my_lock_name" > \ /debug/tracing/events/lockdep/lock_acquired/filter then the following order of token type is parsed: - space - operator - parentheses - operand Because the operators and parentheses have a higher precedence than the operand characters, which is normal, then we can't use any string containing such special characters: ()=<>!&| To get this support and also avoid ambiguous intepretation from the parser or the human, we can do it using double quotes so that we keep the usual languages habits. Then after this patch you can still declare string condition like before: echo name == myname But if you want to compare against a string containing an operator character, you can use double quotes: echo 'name == "&myname"' Don't forget to include the whole expression into single quotes or the double ones will be eaten by echo. [ Impact: support strings with special characters for tracing filters ] Cc: Tom Zanussi Cc: Steven Rostedt Cc: Li Zefan Cc: Zhaolei Signed-off-by: Frederic Weisbecker commit e8808c1019b048a43686dbd25c188a035842c2e2 Author: Frederic Weisbecker Date: Sun May 3 02:48:52 2009 +0200 tracing/filters: support for filters of dynamic sized arrays Currently the filtering infrastructure supports well the numeric types and fixed sized array types. But the recently added __string() field uses a specific indirect offset mechanism which requires a specific predicate. Until now it wasn't supported. This patch adds this support and implies very few changes, only a new predicate is needed, the management of this specific field can be done through the usual string helpers in the filtering infrastructure. [ Impact: support all kinds of strings in the tracing filters ] Cc: Tom Zanussi Cc: Steven Rostedt Cc: Li Zefan Cc: Zhaolei Signed-off-by: Frederic Weisbecker commit aa47b7e0f89b9998dad4d1667447e8cb7703ff4e Author: David Rientjes Date: Mon May 4 01:38:05 2009 -0700 sched: emit thread info flags with stack trace When a thread is oom killed and fails to exit, it's helpful to know which threads have access to memory reserves if the machine livelocks. This is done by testing for the TIF_MEMDIE thread info flag and should be displayed alongside stack traces to identify tasks that have access to such reserves but are still stuck allocating pages, for instance. It would probably be helpful in other cases as well, so all thread info flags are emitted when showing a task. ( v2: fix warning reported by Stephen Rothwell ) [ Impact: extend debug printout info ] Signed-off-by: David Rientjes Cc: Peter Zijlstra Cc: Stephen Rothwell LKML-Reference: Signed-off-by: Ingo Molnar commit 643bec956544d376b7c2a80a3d5c3d0bf94da8d3 Author: Ingo Molnar Date: Thu May 7 09:12:50 2009 +0200 x86: clean up arch/x86/kernel/tsc_sync.c a bit - remove unused define - make the lock variable definition stand out some more - convert KERN_* to pr_info() / pr_warning() [ Impact: cleanup ] LKML-Reference: Signed-off-by: Ingo Molnar commit 975e5f45500dff6d15c0001bb662e9aac0ce0076 Author: Samuel Bronson Date: Wed May 6 22:27:55 2009 -0400 x86: use symbolic name for VM86_SIGNAL when used as vm86 default return This code has apparently used "0" and not VM86_SIGNAL since Linux 1.1.9, when Linus added VM86_SIGNAL to vm86.h. This patch changes the code to use the symbolic name. The magic 0 tripped me up in trying to extend the vm86(2) manpage to actually explain vm86()'s interface -- my greps for VM86_SIGNAL came up fruitless. [ Impact: cleanup; no object code change ] Signed-off-by: Samuel Bronson Signed-off-by: H. Peter Anvin commit 8ae79a138e88aceeeb07077bff2883245fb7c218 Author: Steven Rostedt Date: Wed May 6 22:52:15 2009 -0400 tracing: add hierarchical enabling of events With the current event directory, you can only enable individual events. The file debugfs/tracing/set_event is used to be able to enable or disable several events at once. But that can still be awkward. This patch adds hierarchical enabling of events. That is, each directory in debugfs/tracing/events has an "enable" file. This file can enable or disable all events within the directory and below. # echo 1 > /debugfs/tracing/events/enable will enable all events. # echo 1 > /debugfs/tracing/events/sched/enable will enable all events in the sched subsystem. # echo 1 > /debugfs/tracing/events/enable # echo 0 > /debugfs/tracing/events/irq/enable will enable all events, but then disable just the irq subsystem events. When reading one of these enable files, there are four results: 0 - all events this file affects are disabled 1 - all events this file affects are enabled X - there is a mixture of events enabled and disabled ? - this file does not affect any event Signed-off-by: Steven Rostedt commit 9456f0fa6d3cb944d3b9fc31c9a244e0362c26ea Author: Steven Rostedt Date: Wed May 6 21:54:09 2009 -0400 tracing: reset ring buffer when removing modules with events Li Zefan found that there's a race using the event ids of events and modules. When a module is loaded, an event id is incremented. We only have 16 bits for event ids (65536) and there is a possible (but highly unlikely) race that we could load and unload a module that registers events so many times that the event id counter overflows. When it overflows, it then restarts and goes looking for available ids. An id is available if it was added by a module and released. The race is if you have one module add an id, and then is removed. Another module loaded can use that same event id. But if the old module still had events in the ring buffer, the new module's call back would get bogus data. At best (and most likely) the output would just be garbage. But if the module for some reason used pointers (not recommended) then this could potentially crash. The safest thing to do is just reset the ring buffer if a module that registered events is removed. [ Impact: prevent unpredictable results of event id overflows ] Reported-by: Li Zefan LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 71e1c8ac42ae4038ddb1367cce7097ab868dc532 Author: Steven Rostedt Date: Wed May 6 21:20:39 2009 -0400 tracing: update sample with TRACE_INCLUDE_FILE When creating trace events for ftrace, the header file with the TRACE_EVENT macros must also have a macro called TRACE_SYSTEM. This macro describes the name of the system the TRACE_EVENTS are defined for. It also doubles as a way for the define_trace.h file to include the file that included it. For example: in irq.h #define TRACE_SYSTEM irq [...] #include The define_trace will use TRACE_SYSTEM to include irq.h. But if the name of the trace system does not match the name of the trace header file, one can override it with: Which will change define_trace.h to inclued foo_trace.h instead of foo.h The sample comments this, but people that use the sample code will more likely use the code and not read the comments. This patch changes the sample code to use the TRACE_INCLUDE_FILE to better show developers how to use it. [ Impact: make sample less confusing to developers ] Reported-by: Christoph Hellwig Signed-off-by: Steven Rostedt commit 3e07a4f680adc66dfa175aa5021aedf340251b12 Author: Steven Rostedt Date: Wed May 6 18:36:59 2009 -0400 ring-buffer: change test to be more latency friendly The ring buffer benchmark/test runs a producer for 10 seconds. This is done with preemption and interrupts enabled. But if the kernel is not compiled with CONFIG_PREEMPT, it basically stops everything but interrupts for 10 seconds. Although this is just a test and is not for production, this attribute can be quite annoying. It can also spawn badness elsewhere. This patch solves the issues by calling "cond_resched" when the system is not compiled with CONFIG_PREEMPT. It also keeps track of the time spent to call cond_resched such that it does not go against the time calculations. That is, if the task schedules away, the time scheduled out is removed from the test data. Note, this only works for non PREEMPT because we do not know when the task is scheduled out if we have PREEMPT enabled. [ Impact: prevent test from stopping the world for 10 seconds ] Signed-off-by: Steven Rostedt commit 6634ff26cce2da04e5c2a5481bcb8888e7d01786 Author: Steven Rostedt Date: Wed May 6 15:30:07 2009 -0400 ring-buffer: make moving the tail page a separate function Ingo Molnar thought the code would be cleaner if we used a function call instead of a goto for moving the tail page. After implementing this, it seems that gcc still inlines the result and the output is pretty much the same. Since this is considered a cleaner approach, might as well implement it. [ Impact: code clean up ] Signed-off-by: Steven Rostedt commit 00c81a58c5b4e0de14ee33bfbc3d71c90f69f9ea Author: Steven Rostedt Date: Wed May 6 12:40:51 2009 -0400 ring-buffer: check for failed allocation in ring buffer benchmark The result of the allocation of the ring buffer read page in the ring buffer bench mark does not check the return to see if a page was actually allocated. This patch fixes that. [ Impact: avoid NULL dereference ] Signed-off-by: Steven Rostedt commit 8e7abf1c62941ebb7a1416cbc62392c8a0902625 Author: Steven Rostedt Date: Wed May 6 10:26:45 2009 -0400 ring-buffer: remove unneeded conditional in rb_reserve_next The code in __rb_reserve_next checks on page overflow if it is the original commiter and then resets the page back to the original setting. Although this is fine, and the code is correct, it is a bit fragil. Some experimental work I did breaks it easily. The better and more robust solution is to have all commiters that overflow the page, simply subtract what they added. [ Impact: more robust ring buffer account management ] Signed-off-by: Steven Rostedt commit 35cf723e99c0e26ddf51f037dffaa4ff2c2c9106 Author: Christoph Hellwig Date: Wed May 6 12:33:38 2009 +0200 tracing: small trave_events sample Makefile cleanup Use -I$(src) to add the current directory the include path. [ Impact: cleanup ] Signed-off-by: Christoph Hellwig Acked-by: Steven Rostedt LKML-Reference: Signed-off-by: Ingo Molnar commit 48dd0fed90e2b1f1ba87401439b85942181c6df3 Author: Jaswinder Singh Rajput Date: Wed May 6 15:45:45 2009 +0530 tracing: trace_output.c, fix false positive compiler warning This compiler warning: CC kernel/trace/trace_output.o kernel/trace/trace_output.c: In function ‘register_ftrace_event’: kernel/trace/trace_output.c:544: warning: ‘list’ may be used uninitialized in this function Is wrong as 'list' is always initialized - but GCC (4.3.2) does not recognize this relationship properly. Work around the warning by initializing the variable to NULL. [ Impact: fix false positive compiler warning ] Signed-off-by: Jaswinder Singh Rajput Acked-by: Steven Rostedt LKML-Reference: Signed-off-by: Ingo Molnar commit 22a7c31a9659deaddafbbcec6562d44141e84474 Author: Alan D. Brunelle Date: Mon May 4 16:35:08 2009 -0400 blktrace: from-sector redundant in trace_block_remap Remove redundant from-sector parameter: it's /always/ the bio's sector passed in. [ Impact: cleanup ] Signed-off-by: Alan D. Brunelle Reviewed-by: Li Zefan Reviewed-by: KOSAKI Motohiro Cc: Jens Axboe Cc: Arnaldo Carvalho de Melo LKML-Reference: <49FF517C.7000503@hp.com> Signed-off-by: Ingo Molnar commit a42aaa3bbce85ac487ad4fad5db99e8e91b7aac1 Author: Alan D. Brunelle Date: Mon May 4 16:27:26 2009 -0400 blktrace: correct remap names This attempts to clarify names utilized during block I/O remap operations (partition, volume manager). It correctly matches up the /from/ information for both device & sector. This takes in the concept from Kosaki Motohiro and extends it to include better naming for the "device_from" field. [ Impact: cleanup ] Signed-off-by: Alan D. Brunelle Reviewed-by: Li Zefan Reviewed-by: KOSAKI Motohiro Cc: Jens Axboe Cc: Arnaldo Carvalho de Melo LKML-Reference: <49FF4FAE.3000301@hp.com> Signed-off-by: Ingo Molnar commit de1d7286060430e79a1d50ad6e5fee8fe863c5f6 Author: Mathieu Desnoyers Date: Tue May 5 16:49:59 2009 +0800 tracepoint: trace_sched_migrate_task(): remove parameter The orig_cpu parameter in trace_sched_migrate_task() is not necessary, it can be got by using task_cpu(p) in the probe. [ Impact: micro-optimization ] Signed-off-by: Mathieu Desnoyers [ modified from Mathieu's patch. The original patch is at: http://marc.info/?l=linux-kernel&m=123791201716239&w=2 ] Signed-off-by: Xiao Guangrong Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org Cc: Li Zefan Cc: zhaolei@cn.fujitsu.com Cc: laijs@cn.fujitsu.com LKML-Reference: <49FFFDB7.1050402@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 20c8928abe70e204bd077ab6cfe23002d7788983 Author: Li Zefan Date: Wed May 6 10:33:45 2009 +0800 tracing/events: fix concurrent access to ftrace_events list A module will add/remove its trace events when it gets loaded/unloaded, so the ftrace_events list is not "const", and concurrent access needs to be protected. This patch thus fixes races between loading/unloding modules and read 'available_events' or read/write 'set_event', etc. Below shows how to reproduce the race: # for ((; ;)) { cat /mnt/tracing/available_events; } > /dev/null & # for ((; ;)) { insmod trace-events-sample.ko; rmmod sample; } & After a while: BUG: unable to handle kernel paging request at 0010011c IP: [] t_next+0x1b/0x2d ... Call Trace: [] ? seq_read+0x217/0x30d [] ? seq_read+0x0/0x30d [] ? vfs_read+0x8f/0x136 [] ? sys_read+0x40/0x65 [] ? sysenter_do_call+0x12/0x36 [ Impact: fix races when concurrent accessing ftrace_events list ] Signed-off-by: Li Zefan Acked-by: Steven Rostedt Acked-by: Frederic Weisbecker Cc: Tom Zanussi Cc: Peter Zijlstra LKML-Reference: <4A00F709.3080800@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 2df75e415709ad12862028916c772c1f377f6a7c Author: Li Zefan Date: Wed May 6 10:33:04 2009 +0800 tracing/events: fix memory leak when unloading module When unloading a module, memory allocated by init_preds() and trace_define_field() is not freed. [ Impact: fix memory leak ] Signed-off-by: Li Zefan Acked-by: Frederic Weisbecker Acked-by: Steven Rostedt Cc: Tom Zanussi LKML-Reference: <4A00F6E0.3040503@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 96d17980fabeb757706d2d6db5a28580a6156bfc Author: Li Zefan Date: Wed May 6 10:32:32 2009 +0800 tracing/events: make SAMPLE_TRACE_EVENTS default to n Normally a config should be default to n. This patch also makes the sample module-only, like SAMPLE_MARKERS and SAMPLE_TRACEPOINTS. [ Impact: don't build trace event sample by default ] Signed-off-by: Li Zefan Acked-by: Steven Rostedt Acked-by: Frederic Weisbecker LKML-Reference: <4A00F6C0.8090803@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit fd6da10a617f483348ee32bcfe53fd20c302eca1 Author: Li Zefan Date: Wed May 6 10:32:13 2009 +0800 tracing/events: don't say hi when loading the trace event sample The sample is useful for testing, and I'm using it. But after loading the module, it keeps saying hi every 10 seconds, this may be disturbing. Also Steven said commenting out the "hi" helped in causing races. :) [ Impact: make testing a bit easier ] Signed-off-by: Li Zefan Acked-by: Steven Rostedt Acked-by: Frederic Weisbecker LKML-Reference: <4A00F6AD.2070008@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit b2e5d8588de0b5341eddad87dbe48d2185eaa3dd Author: Ingo Molnar Date: Wed May 6 07:55:33 2009 +0200 irq: change ->set_affinity() to return status, fix This build failure: arch/powerpc/sysdev/mpic.c:810: error: conflicting types for 'mpic_set_affinity' arch/powerpc/sysdev/mpic.h:39: error: previous declaration of 'mpic_set_affinity' was here make[2]: *** [arch/powerpc/sysdev/mpic.o] Error 1 make[2]: *** Waiting for unfinished jobs.... Triggers because the function prototype was not updated when the function call signature got changed by: d5dedd4: irq: change ->set_affinity() to return status [ Impact: build fix on powerpc ] Cc: Benjamin Herrenschmidt Cc: Yinghai Lu Cc: Andrew Morton Cc: Rusty Russell Cc: linux-arch@vger.kernel.org LKML-Reference: <49F654E9.4070809@kernel.org> Signed-off-by: Ingo Molnar commit 5092dbc96f3acdac5433b27c06860352dc6d23b9 Author: Steven Rostedt Date: Tue May 5 22:47:18 2009 -0400 ring-buffer: add benchmark and tester This patch adds code that can benchmark the ring buffer as well as test it. This code can be compiled into the kernel (not recommended) or as a module. A separate ring buffer is used to not interfer with other users, like ftrace. It creates a producer and a consumer (option to disable creation of the consumer) and will run for 10 seconds, then sleep for 10 seconds and then repeat. While running, the producer will write 10 byte loads into the ring buffer with just putting in the current CPU number. The reader will continually try to read the buffer. The reader will alternate from reading the buffer via event by event, or by full pages. The output is a pr_info, thus it will fill up the syslogs. Starting ring buffer hammer End ring buffer hammer Time: 9000349 (usecs) Overruns: 12578640 Read: 5358440 (by events) Entries: 0 Total: 17937080 Missed: 0 Hit: 17937080 Entries per millisec: 1993 501 ns per entry Sleeping for 10 secs Starting ring buffer hammer End ring buffer hammer Time: 9936350 (usecs) Overruns: 0 Read: 28146644 (by pages) Entries: 74 Total: 28146718 Missed: 0 Hit: 28146718 Entries per millisec: 2832 353 ns per entry Sleeping for 10 secs Time: is the time the test ran Overruns: the number of events that were overwritten and not read Read: the number of events read (either by pages or events) Entries: the number of entries left in the buffer (the by pages will only read full pages) Total: Entries + Read + Overruns Missed: the number of entries that failed to write Hit: the number of entries that were written The above example shows that it takes ~353 nanosecs per entry when there is a reader, reading by pages (and no overruns) The event by event reader slowed the producer down to 501 nanosecs. [ Impact: see how changes to the ring buffer affect stability and performance ] Signed-off-by: Steven Rostedt commit 2feceeff1e771850e49f9074307f071964fd9e3e Author: H. Peter Anvin Date: Tue May 5 19:07:07 2009 -0700 x86: fix typo in address space documentation Fix a trivial typo in Documentation/x86/x86_64/mm.txt. [ Impact: documentation only ] Signed-off-by: H. Peter Anvin Cc: Rik van Riel commit c898faf91b3ec6b0f6efa35831b3984fa3331db0 Author: Rik van Riel Date: Tue May 5 17:28:56 2009 -0400 x86: 46 bit physical address support on 64 bits Extend the maximum addressable memory on x86-64 from 2^44 to 2^46 bytes. This requires some shuffling around of the vmalloc and virtual memmap memory areas, to keep them away from the direct mapping of up to 64TB of physical memory. This patch also introduces a guard hole between the vmalloc area and the virtual memory map space. There's really no good reason why we wouldn't have a guard hole there. [ Impact: future hardware enablement ] Signed-off-by: Rik van Riel LKML-Reference: <20090505172856.6820db22@cuia.bos.redhat.com> Signed-off-by: H. Peter Anvin commit aa20ae8444fc6c318272c643f856d8d8ad3e198d Author: Steven Rostedt Date: Tue May 5 21:16:11 2009 -0400 ring-buffer: move big if statement down In the hot path of the ring buffer "__rb_reserve_next" there's a big if statement that does not even return back to the work flow. code; if (cross to next page) { [ lots of code ] return; } more code; The condition is even the unlikely path, although we do not denote it with an unlikely because gcc is fine with it. The condition is true when the write crosses a page boundary, and we need to start at a new page. Having this if statement makes it hard to read, but calling another function to do the work is also not appropriate, because we are using a lot of variables that were set before the if statement, and we do not want to send them as parameters. This patch changes it to a goto: code; if (cross to next page) goto next_page; more code; return; next_page: [ lots of code] This makes the code easier to understand, and a bit more obvious. The output from gcc is practically identical. For some reason, gcc decided to use different registers when I switched it to a goto. But other than that, the logic is the same. [ Impact: easier to read code ] Signed-off-by: Steven Rostedt commit 94487d6d53af5acae10cf9fd52f74498994d46b1 Author: Steven Rostedt Date: Tue May 5 19:22:53 2009 -0400 tracing: use proper export symbol for tracing api When adding the EXPORT_SYMBOL to some of the tracing API, I accidently used EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL. This patch fixes that mistake. [ Impact: export the tracing code only for GPL modules ] Reported-by: Christoph Hellwig Signed-off-by: Steven Rostedt commit 31b6e76e21b2ffd3cb2f6fe4149790a9fdadce2d Author: Tim Abbott Date: Thu Apr 30 20:06:11 2009 -0400 ftrace: use .sched.text, not .text.sched in recordmcount.pl The only references in the kernel to the .text.sched section are in recordmcount.pl. Since the code it has is intended to be example code it should refer to real kernel sections. So change it to .sched.text instead. [ Impact: consistency in comments ] Signed-off-by: Tim Abbott LKML-Reference: <1241136371-10768-1-git-send-email-tabbott@mit.edu> Acked-by: Sam Ravnborg Signed-off-by: Steven Rostedt commit 41ede23eded40832c955d98d4b71bc244809abb3 Author: Steven Rostedt Date: Fri May 1 20:26:54 2009 -0400 ring-buffer: disable writers when resetting buffers As a precaution, it is best to disable writing to the ring buffers when reseting them. [ Impact: prevent weird things if write happens during reset ] Signed-off-by: Steven Rostedt commit afbab76a62b69ea6197e19727d4b8a8aef8deb25 Author: Steven Rostedt Date: Fri May 1 19:40:05 2009 -0400 ring-buffer: have read page swap increment counter with page entries In the swap page ring buffer code that is used by the ftrace splice code, we scan the page to increment the counter of entries read. With the number of entries already in the page we simply need to add it. [ Impact: speed up reading page from ring buffer ] Signed-off-by: Steven Rostedt commit 778c55d44eb4f5f658915ed631d68ed9d1ac3ad1 Author: Steven Rostedt Date: Fri May 1 18:44:45 2009 -0400 ring-buffer: record page entries in buffer page descriptor Currently, when the ring buffer writer overflows the buffer and must write over non consumed data, we increment the overrun counter by reading the entries on the page we are about to overwrite. This reads the entries one by one. This is not very effecient. This patch adds another entry counter into each buffer page descriptor that keeps track of the number of entries on the page. Now on overwrite, the overrun counter simply needs to add the number of entries that is on the page it is about to overwrite. [ Impact: speed up of ring buffer in overwrite mode ] Signed-off-by: Steven Rostedt commit 41c51c98f588edcdf6141cff1895df738e03ddd4 Author: Oleg Nesterov Date: Sun May 3 23:11:18 2009 +0200 rcu: rcu_sched_grace_period(): kill the bogus flush_signals() As a kernel thread, rcu_sched_grace_period() runs with all signals ignored. It can never receive a signal even if it sleeps in TASK_INTERRUPTIBLE, it needs the explicit allow_signal() to be visible for signals. [ Impact: reduce kernel size, remove dead code ] Signed-off-by: Oleg Nesterov Reviewed-by: Paul E. McKenney Cc: Andrew Morton LKML-Reference: <20090503211118.GA22973@redhat.com> Signed-off-by: Ingo Molnar commit e4906eff9e6fbd2d311abcbcc53d5a531773c982 Author: Steven Rostedt Date: Thu Apr 30 20:49:44 2009 -0400 ring-buffer: convert cpu buffer entries to local_t The entries counter in cpu buffer is not atomic. It can be updated by other interrupts or from another CPU (readers). But making entries into "atomic_t" causes an atomic operation that can hurt performance. Instead we convert it to a local_t that will increment a counter with a local CPU atomic operation (if the arch supports it). Instead of fighting with readers and overwrites that decrement the counter, I added a "read" counter. Every time a reader reads an entry it is incremented. We already have a overrun counter and with that, the entries counter and the read counter, we can calculate the total number of entries in the buffer with: (entries - overrun) - read As long as the total number of entries in the ring buffer is less than the word size, this will work. But since the entries counter was previously a long, this is no different than what we had before. Thanks to Andrew Morton for pointing out in the first version that atomic_t does not replace unsigned long. I switched to atomic_long_t even though it is signed. A negative count is most likely a bug. [ Impact: keep accurate count of cpu buffer entries ] Signed-off-by: Steven Rostedt commit 60aa605dfce2976e54fa76e805ab0f221372d4d9 Author: Peter Zijlstra Date: Tue May 5 17:50:21 2009 +0200 sched: rt: document the risk of small values in the bandwidth settings Thomas noted that we should disallow sysctl_sched_rt_runtime == 0 for (!RT_GROUP) since the root group always has some RT tasks in it. Further, update the documentation to inspire clue. [ Impact: exclude corner-case sysctl_sched_rt_runtime value ] Reported-by: Thomas Gleixner Signed-off-by: Peter Zijlstra LKML-Reference: <20090505155436.863098054@chello.nl> Signed-off-by: Ingo Molnar commit c8d771835e18c938dae8690611d65fe98ad30f58 Author: Steven Rostedt Date: Wed Apr 29 18:03:45 2009 -0400 tracing: export stats of ring buffers to userspace This patch adds stats to the ftrace ring buffers: # cat /debugfs/tracing/per_cpu/cpu0/stats entries: 42360 overrun: 30509326 commit overrun: 0 nmi dropped: 0 Where entries are the total number of data entries in the buffer. overrun is the number of entries not consumed and were overwritten by the writer. commit overrun is the number of entries dropped due to nested writers wrapping the buffer before the initial writer finished the commit. nmi dropped is the number of entries dropped due to the ring buffer lock being held when an nmi was going to write to the ring buffer. Note, this field will be meaningless and will go away when the ring buffer becomes lockless. [ Impact: let userspace know what is happening in the ring buffers ] Signed-off-by: Steven Rostedt commit f0d2c681ac0a85142fc8abe65fc33fcad35cb9b7 Author: Steven Rostedt Date: Wed Apr 29 13:43:37 2009 -0400 ring-buffer: add counters for commit overrun and nmi dropped entries The WARN_ON in the ring buffer when a commit is preempted and the buffer is filled by preceding writes can happen in normal operations. The WARN_ON makes it look like a bug, not to mention, because it does not stop tracing and calls printk which can also recurse, this is prone to deadlock (the WARN_ON is not in a position to recurse). This patch removes the WARN_ON and replaces it with a counter that can be retrieved by a tracer. This counter is called commit_overrun. While at it, I added a nmi_dropped counter to count any time an NMI entry is dropped because the NMI could not take the spinlock. [ Impact: prevent deadlock by printing normal case warning ] Signed-off-by: Steven Rostedt commit d6ce96dabe2c4409fd009ec14250a1fdbab4b133 Author: Steven Rostedt Date: Tue May 5 01:15:24 2009 -0400 ring-buffer: export symbols I'm adding a module to do a series of tests on the ring buffer as well as benchmarks. This module needs to have more of the ring buffer API exported. There's nothing wrong with reading the ring buffer from a module. [ Impact: allow modules to read pages from the ring buffer ] Signed-off-by: Steven Rostedt commit 9a8709d44139748fe2e0ab56d20d8c384c8b65ad Author: Cyrill Gorcunov Date: Sat May 2 00:25:11 2009 +0400 x86: uv - prevent NULL dereference in uv_system_init() We may reach NULL dereference oops if kmalloc failed. Prevent it with explicit BUG_ON. [ Impact: more controlled assert in 'impossible' scenario ] Signed-off-by: Cyrill Gorcunov Acked-by: Jack Steiner LKML-Reference: <20090501202511.GE4633@lenovo> Signed-off-by: Ingo Molnar commit 3969c52d4d2fef5a4b9e3ab0e51b3901e1cc8b83 Author: Jaswinder Singh Rajput Date: Sun May 3 11:11:35 2009 +0530 x86: cpufeature.h fix name for X86_FEATURE_MCE X86_FEATURE_MCE = Machine Check Exception X86_FEATURE_MCA = Machine Check Architecture [ Impact: cleanup ] Signed-off-by: Jaswinder Singh Rajput LKML-Reference: <1241329295.6321.1.camel@localhost.localdomain> Signed-off-by: Ingo Molnar commit 1cbac972ba28e706fa9ce4d4c81830040bc811ee Author: Cyrill Gorcunov Date: Sat May 2 13:39:56 2009 +0400 x86: uv io-apic - use BUILD_BUG_ON instead of BUG_ON The expression is known to be true/false at compilation time so we're allowed to use build-time instead of run-time check. Also align 'entry' items assignment. [ Impact: shrink kernel a bit, cleanup ] Signed-off-by: Cyrill Gorcunov Cc: Jack Steiner LKML-Reference: <20090502093956.GB4791@lenovo> Signed-off-by: Ingo Molnar commit a454ab3110175d710f4f9a96226a26ce4d5d5de2 Author: Ingo Molnar Date: Sun May 3 10:09:03 2009 +0200 x86, mm: fault.c, use printk_once() in is_errata93() Andrew pointed out that the 'once' variable has a needlessly function-global scope. We can in fact eliminate it completely, via the use of printk_once(). [ Impact: cleanup ] Reported-by: Andrew Morton Cc: Linus Torvalds Signed-off-by: Ingo Molnar commit 6f0aced639d346e5f54eea9fcb2784b633493d09 Author: Cyrill Gorcunov Date: Fri May 1 23:54:25 2009 +0400 x86, apic: use pr_ macro Replace recenly appeared printk with pr_ macro (the file already use a lot of them). [ Impact: cleanup ] Signed-off-by: Cyrill Gorcunov LKML-Reference: <20090501195425.GB4633@lenovo> Signed-off-by: Ingo Molnar commit 4420471f14b79f2a42e4603be7794ea49b68bca4 Merge: 15e957d e0e4214 Author: Ingo Molnar Date: Fri May 1 19:02:50 2009 +0200 Merge branch 'x86/apic' into irq/numa Conflicts: arch/x86/kernel/apic/io_apic.c Merge reason: non-trivial interaction between ongoing work in io_apic.c and the NUMA migration feature in the irq tree. Signed-off-by: Ingo Molnar commit 15e957d08dd4a841359cfec59ecb74041e0097aa Author: Yinghai Lu Date: Thu Apr 30 01:17:50 2009 -0700 x86/irq: use move_irq_desc() in create_irq_nr() move_irq_desc() will try to move irq_desc to the home node if the allocated one is not correct, in create_irq_nr(). ( This can happen on devices that are on different nodes that are using MSI, when drivers are loaded and unloaded randomly. ) v2: fix non-smp build v3: add NUMA_IRQ_DESC to eliminate #ifdefs [ Impact: improve irq descriptor locality on NUMA systems ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell LKML-Reference: <49F95EAE.2050903@kernel.org> Signed-off-by: Ingo Molnar commit 9ee1983c9aa18f12388ef660d0c76a23dc112959 Author: Jason Baron Date: Thu Apr 30 13:29:47 2009 -0400 tracing: add irq tracepoint documentation Document irqs for the newly created docbook. [ Impact: add documentation ] Signed-off-by: Jason Baron Acked-by: Randy Dunlap Cc: akpm@linux-foundation.org Cc: rostedt@goodmis.org Cc: fweisbec@gmail.com Cc: mathieu.desnoyers@polymtl.ca Cc: wcohen@redhat.com LKML-Reference: <73ff42be3420157667ec548e9b0e409c3cfad05f.1241107197.git.jbaron@redhat.com> Signed-off-by: Ingo Molnar commit a76f8c6da1e48fd4ef025f42c736389532ff30ba Author: Jason Baron Date: Thu Apr 30 13:29:42 2009 -0400 tracing: add new tracepoints docbook Add tracepoint docbook. This will help us document and understand what tracepoints are in the kernel. Since there are multiple macros, and files that contain tracepoints. [ Impact: add documentation ] Signed-off-by: Jason Baron Acked-by: Randy Dunlap Cc: akpm@linux-foundation.org Cc: rostedt@goodmis.org Cc: fweisbec@gmail.com Cc: mathieu.desnoyers@polymtl.ca Cc: wcohen@redhat.com LKML-Reference: <84160b6bd94aff02455da7e12bad054d34c579a0.1241107197.git.jbaron@redhat.com> Signed-off-by: Ingo Molnar commit 56afb0f8823650f53a5f0e96d69a282e8892c61b Author: Jason Baron Date: Thu Apr 30 13:29:36 2009 -0400 kerneldoc, tracing: make kernel-doc understand TRACE_EVENT() macro (take #2) Add support to kernel-doc for tracepoint comments above TRACE_EVENT() macro definitions. Paves the way for tracepoint docbook. [ Impact: extend DocBook infrastructure ] Signed-off-by: Jason Baron Acked-by: Randy Dunlap Cc: akpm@linux-foundation.org Cc: rostedt@goodmis.org Cc: fweisbec@gmail.com Cc: mathieu.desnoyers@polymtl.ca Cc: wcohen@redhat.com LKML-Reference: Signed-off-by: Ingo Molnar commit bf293c17b26b8854241df08b9b63f7270cbde012 Author: Remis Lima Baima Date: Thu Apr 30 18:36:23 2009 +0200 x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h iomap.h misses the include guards. [ Impact: cleanup ] Signed-off-by: Remis Lima Baima Signed-off-by: Arnd Bergmann LKML-Reference: <200904301836.23885.arnd@arndb.de> Signed-off-by: Ingo Molnar commit 12d161147f828192b5bcc33166f468a827832767 Author: Thomas Gleixner Date: Sat Apr 4 21:01:10 2009 +0000 x86: hookup sys_rt_tgsigqueueinfo Make the new sys_rt_tgsigqueueinfo available for x86. Signed-off-by: Thomas Gleixner commit 62ab4505e3efaf67784f84059e0fb9cedb1728ea Author: Thomas Gleixner Date: Sat Apr 4 21:01:06 2009 +0000 signals: implement sys_rt_tgsigqueueinfo sys_kill has the per thread counterpart sys_tgkill. sigqueueinfo is missing a thread directed counterpart. Such an interface is important for migrating applications from other OSes which have the per thread delivery implemented. Signed-off-by: Thomas Gleixner Reviewed-by: Oleg Nesterov Acked-by: Roland McGrath Acked-by: Ulrich Drepper commit 30b4ae8a4498543863501f707879b7220b649602 Author: Thomas Gleixner Date: Sat Apr 4 21:01:01 2009 +0000 signals: split do_tkill Split out the code from do_tkill to make it reusable by the follow up patch which implements sys_rt_tgsigqueueinfo Signed-off-by: Thomas Gleixner Reviewed-by: Oleg Nesterov commit 83c4832683bc8ebcd1687b3c0bf3ba1ab253dd4f Author: Sam Ravnborg Date: Thu Apr 30 12:03:16 2009 +0200 x86: boot/compressed/vmlinux.lds.S: fix build of bzImage with 64 bit compiler Jesper reported that he saw following build issue: > ld:arch/x86/boot/compressed/vmlinux.lds:9: syntax error > make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1 > make[1]: *** [arch/x86/boot/compressed/vmlinux] Error 2 > make: *** [bzImage] Error 2 CPP defines the symbol "i386" to "1". Undefine this to fix it. [ Impact: build fix with certain tool chains ] Reported-by: Jesper Dangaard Brouer Signed-off-by: Sam Ravnborg Cc: Linus Torvalds LKML-Reference: Signed-off-by: Ingo Molnar commit ba9c22f2c01cf5c88beed5a6b9e07d42e10bd358 Author: Darren Hart Date: Mon Apr 20 22:22:22 2009 -0700 futex: remove FUTEX_REQUEUE_PI (non CMP) The new requeue PI futex op codes were modeled after the existing FUTEX_REQUEUE and FUTEX_CMP_REQUEUE calls. I was unaware at the time that FUTEX_REQUEUE was only around for compatibility reasons and shouldn't be used in new code. Ulrich Drepper elaborates on this in his Futexes are Tricky paper: http://people.redhat.com/drepper/futex.pdf. The deprecated call doesn't catch changes to the futex corresponding to the destination futex which can lead to deadlock. Therefor, I feel it best to remove FUTEX_REQUEUE_PI and leave only FUTEX_CMP_REQUEUE_PI as there are not yet any existing users of the API. This patch does change the OP code value of FUTEX_CMP_REQUEUE_PI to 12 from 13. Since my test case is the only known user of this API, I felt this was the right thing to do, rather than leave a hole in the enumeration. I chose to continue using the _CMP_ modifier in the OP code to make it explicit to the user that the test is being done. Builds, boots, and ran several hundred iterations requeue_pi.c. Signed-off-by: Darren Hart LKML-Reference: <49ED580E.1050502@us.ibm.com> Signed-off-by: Thomas Gleixner commit 9518e0e4350a5ea8ca200ce320b28d6284a7b0ce Author: Pekka Enberg Date: Tue Apr 28 16:00:50 2009 +0300 x86: move per-cpu mmu_gathers to mm/init.c [ Impact: cleanup ] Signed-off-by: Pekka Enberg LKML-Reference: <1240923650.1982.22.camel@penberg-laptop> Signed-off-by: Ingo Molnar commit 2b72394e4089643f11669d9610907a1442fe044a Author: Pekka Enberg Date: Tue Apr 28 16:00:49 2009 +0300 x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c This patch moves the max_pfn_mapped and max_low_pfn_mapped global variables to kernel/setup.c where they're initialized. [ Impact: cleanup ] Signed-off-by: Pekka Enberg LKML-Reference: <1240923649.1982.21.camel@penberg-laptop> Signed-off-by: Ingo Molnar commit a511e3f968c462a55ef58697257f5347c73d306e Author: Andrew Morton Date: Wed Apr 29 15:59:58 2009 -0700 mutex: add atomic_dec_and_mutex_lock(), fix include/linux/mutex.h:136: warning: 'mutex_lock' declared inline after being called include/linux/mutex.h:136: warning: previous declaration of 'mutex_lock' was here uninline it. [ Impact: clean up and uninline, address compiler warning ] Signed-off-by: Andrew Morton Cc: Al Viro Cc: Christoph Hellwig Cc: Eric Paris Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <200904292318.n3TNIsi6028340@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar commit 23b94b967f118bef941369238f33c8140be46539 Author: Luis Henriques Date: Wed Apr 29 21:54:51 2009 +0100 locking, rtmutex.c: Documentation cleanup Two minor updates on functions documentation: - Updated documentation for function rt_mutex_unlock(), which contained an incorrect name - Removed extra '*' from comment in function rt_mutex_destroy() [ Impact: cleanup ] Signed-off-by: Luis Henriques Cc: Steven Rostedt LKML-Reference: <20090429205451.GA23154@hades.domain.com> Signed-off-by: Ingo Molnar commit 0c8b946e3ebb3846103486420ea7430a4b5e5b1b Author: Frederic Weisbecker Date: Wed Apr 15 17:48:18 2009 +0200 vsprintf: introduce %pf format specifier A printf format specifier which would allow us to print a pure function name has been suggested by Andrew Morton a couple of months ago. The current %pF is very convenient to print a function symbol, but often we only want to print the name of the function, without its asm offset. That's what %pf does in this patch. The lowecase f has been chosen for its intuitive meaning of a 'weak kind of %pF'. The support for this new format would be welcome by the tracing code where the need to print pure function names is often needed. This is also true for other parts of the kernel: $ git-grep -E "kallsyms_lookup\(.+?\)" arch/blackfin/kernel/traps.c: symname = kallsyms_lookup(address, &symsize, &offset, &modname, namebuf); arch/powerpc/xmon/xmon.c: name = kallsyms_lookup(pc, &size, &offset, NULL, tmpstr); arch/sh/kernel/cpu/sh5/unwind.c: sym = kallsyms_lookup(pc, NULL, &offset, NULL, namebuf); arch/x86/kernel/ftrace.c: kallsyms_lookup((unsigned long) syscall, NULL, NULL, NULL, str); kernel/kprobes.c: sym = kallsyms_lookup((unsigned long)p->addr, NULL, kernel/lockdep.c: return kallsyms_lookup((unsigned long)key, NULL, NULL, NULL, str); kernel/trace/ftrace.c: kallsyms_lookup(rec->ip, NULL, NULL, NULL, str); kernel/trace/ftrace.c: kallsyms_lookup(rec->ip, NULL, NULL, NULL, str); kernel/trace/ftrace.c: kallsyms_lookup((unsigned long)rec->ops->func, NULL, NULL, NULL, str); kernel/trace/ftrace.c: kallsyms_lookup(rec->ip, NULL, NULL, NULL, str); kernel/trace/ftrace.c: kallsyms_lookup(rec->ip, NULL, NULL, NULL, str); kernel/trace/ftrace.c: kallsyms_lookup(rec->ip, NULL, NULL, &modname, str); kernel/trace/ftrace.c: kallsyms_lookup(*ptr, NULL, NULL, NULL, str); kernel/trace/trace_functions.c: kallsyms_lookup(ip, NULL, NULL, NULL, str); kernel/trace/trace_output.c: kallsyms_lookup(address, NULL, NULL, NULL, str); Changes in v2: - Add the explanation of the %pf role for vsnprintf() and bstr_printf() - Change the comments by dropping the "asm offset" notion and only define the %pf against the actual function offset notion. Signed-off-by: Frederic Weisbecker Acked-by: Mike Frysinger Cc: Linus Torvalds Cc: Zhaolei Cc: Tom Zanussi Cc: Li Zefan Cc: Andrew Morton Cc: Steven Rostedt LKML-Reference: <20090415154817.GC5989@nowhere> Signed-off-by: Ingo Molnar commit b1fca26631f76a5e8b18435a43f5d82b8734da4b Author: Eric Paris Date: Mon Mar 23 18:22:09 2009 +0100 mutex: add atomic_dec_and_mutex_lock() Much like the atomic_dec_and_lock() function in which we take an hold a spin_lock if we drop the atomic to 0 this function takes and holds the mutex if we dec the atomic to 0. Signed-off-by: Eric Paris Signed-off-by: Peter Zijlstra Cc: Paul Mackerras Orig-LKML-Reference: <20090323172417.410913479@chello.nl> Signed-off-by: Ingo Molnar commit 50fa610a3b6ba7cf91d7a92229177dfaff2b81a1 Author: David Howells Date: Tue Apr 28 15:01:38 2009 +0100 sched: Document memory barriers implied by sleep/wake-up primitives Add a section to the memory barriers document to note the implied memory barriers of sleep primitives (set_current_state() and wrappers) and wake-up primitives (wake_up() and co.). Also extend the in-code comments on the wake_up() functions to note these implied barriers. [ Impact: add documentation ] Signed-off-by: David Howells Cc: Oleg Nesterov Cc: Linus Torvalds Cc: Andrew Morton LKML-Reference: <20090428140138.1192.94723.stgit@warthog.procyon.org.uk> Signed-off-by: Ingo Molnar commit a0e39ed378fb6ba916522764cd508fa7d42ad495 Author: Heiko Carstens Date: Wed Apr 29 13:51:39 2009 +0200 tracing: fix build failure on s390 "tracing: create automated trace defines" causes this compile error on s390, as reported by Sachin Sant against linux-next: kernel/built-in.o: In function `__do_softirq': (.text+0x1c680): undefined reference to `__tracepoint_softirq_entry' This happens because the definitions of the softirq tracepoints were moved from kernel/softirq.c to kernel/irq/handle.c. Since s390 doesn't support generic hardirqs handle.c doesn't get compiled and the definitions are missing. So move the tracepoints to softirq.c again. [ Impact: fix build failure on s390 ] Reported-by: Sachin Sant Signed-off-by: Heiko Carstens Cc: Steven Rostedt Cc: fweisbec@gmail.com LKML-Reference: <20090429135139.5fac79b8@osiris.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar commit 8b3725621074040d380664964ffbc40610aef8c6 Author: Tom Zanussi Date: Tue Apr 28 03:04:59 2009 -0500 tracing/filters: a better event parser Replace the current event parser hack with a better one. Filters are no longer specified predicate by predicate, but all at once and can use parens and any of the following operators: numeric fields: ==, !=, <, <=, >, >= string fields: ==, != predicates can be combined with the logical operators: &&, || examples: "common_preempt_count > 4" > filter "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter If there was an error, the erroneous string along with an error message can be seen by looking at the filter e.g.: ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash ^ parse_error: Field not found Currently the caret for an error always appears at the beginning of the filter; a real position should be used, but the error message should be useful even without it. To clear a filter, '0' can be written to the filter file. Filters can also be set or cleared for a complete subsystem by writing the same filter as would be written to an individual event to the filter file at the root of the subsytem. Note however, that if any event in the subsystem lacks a field specified in the filter being set, the set will fail and all filters in the subsytem are automatically cleared. This change from the previous version was made because using only the fields that happen to exist for a given event would most likely result in a meaningless filter. Because the logical operators are now implemented as predicates, the maximum number of predicates in a filter was increased from 8 to 16. [ Impact: add new, extended trace-filter implementation ] Signed-off-by: Tom Zanussi Acked-by: Steven Rostedt Cc: fweisbec@gmail.com Cc: Li Zefan LKML-Reference: <1240905899.6416.121.camel@tropicana> Signed-off-by: Ingo Molnar commit a118e4d1402f1349fe3d953493e4168a300a752d Author: Tom Zanussi Date: Tue Apr 28 03:04:53 2009 -0500 tracing/filters: distinguish between signed and unsigned fields The new filter comparison ops need to be able to distinguish between signed and unsigned field types, so add an is_signed flag/param to the event field struct/trace_define_fields(). Also define a simple macro, is_signed_type() to determine the signedness at compile time, used in the trace macros. If the is_signed_type() macro won't work with a specific type, a new slightly modified version of TRACE_FIELD() called TRACE_FIELD_SIGN(), allows the signedness to be set explicitly. [ Impact: extend trace-filter code for new feature ] Signed-off-by: Tom Zanussi Acked-by: Steven Rostedt Cc: fweisbec@gmail.com Cc: Li Zefan LKML-Reference: <1240905893.6416.120.camel@tropicana> Signed-off-by: Ingo Molnar commit 30e673b230f9d556eb81ef68a7b1a08c8b3b142c Author: Tom Zanussi Date: Tue Apr 28 03:04:47 2009 -0500 tracing/filters: move preds into event_filter object Create a new event_filter object, and move the pred-related members out of the call and subsystem objects and into the filter object - the details of the filter implementation don't need to be exposed in the call and subsystem in any case, and it will also help make the new parser implementation a little cleaner. [ Impact: refactor trace-filter code to prepare for new features ] Signed-off-by: Tom Zanussi Acked-by: Steven Rostedt Cc: fweisbec@gmail.com Cc: Li Zefan LKML-Reference: <1240905887.6416.119.camel@tropicana> Signed-off-by: Ingo Molnar commit fd0731944333db6e9e91b6954c6ef95f4b71ab04 Author: Ingo Molnar Date: Wed Apr 29 12:56:58 2009 +0200 x86, vmlinux.lds: fix relocatable symbols __init_begin/_end symbols should be inside sections as well, otherwise the relocatable kernel gets confused when freeing init sections in the wrong place. [ Impact: fix bootup crash ] Cc: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <20090429105056.GA28720@uranus.ravnborg.org> Signed-off-by: Ingo Molnar commit 0f9a623dd6c9b5b4dd00c232f29525bfc7a8ecf2 Author: Stuart Bennett Date: Tue Apr 28 20:17:51 2009 +0100 tracing: x86, mmiotrace: only register for die notifier when tracer active Follow up to afcfe024aebd74b0984a41af9a34e009cf5badaf in Linus' tree ("x86: mmiotrace: quieten spurious warning message") Signed-off-by: Stuart Bennett Acked-by: Pekka Paalanen Cc: Steven Rostedt LKML-Reference: <1240946271-7083-5-git-send-email-stuart@freedesktop.org> Signed-off-by: Ingo Molnar commit 46e91d00b1165b14b484aa33800e1bba0794ae1a Author: Stuart Bennett Date: Tue Apr 28 20:17:50 2009 +0100 tracing: x86, mmiotrace: refactor clearing/restore of page presence * change function names to clear_* from set_*: in reality we only clear and restore page presence, and never unconditionally set present. Using clear_*({true, false}, ...) is therefore more honest than set_*({false, true}, ...) * upgrade presence storage to pteval_t: doing user-space tracing will require saving and manipulation of the _PAGE_PROTNONE bit, in addition to the existing _PAGE_PRESENT changes, and having multiple bools stored and passed around does not seem optimal [ Impact: refactor, clean up mmiotrace code ] Signed-off-by: Stuart Bennett Acked-by: Pekka Paalanen Cc: Steven Rostedt LKML-Reference: <1240946271-7083-4-git-send-email-stuart@freedesktop.org> Signed-off-by: Ingo Molnar commit 0492e1bb8fe7d122901c9f3af75e537d4129712e Author: Stuart Bennett Date: Tue Apr 28 20:17:49 2009 +0100 tracing: x86, mmiotrace: code consistency/legibility improvement kmmio_probe being *p and kmmio_fault_page being sometimes *f and sometimes *p is not helpful. [ Impact: cleanup ] Signed-off-by: Stuart Bennett Acked-by: Pekka Paalanen Cc: Steven Rostedt LKML-Reference: <1240946271-7083-3-git-send-email-stuart@freedesktop.org> Signed-off-by: Ingo Molnar commit 91fd7fe809bdf4d8aa56559d17b9f25a1a6fe732 Author: Ingo Molnar Date: Wed Apr 29 10:58:38 2009 +0200 x86, vmlinux.lds: add copyright Acked-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-2-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit 091e52c3551d3031343df24b573b770b4c6c72b6 Author: Sam Ravnborg Date: Wed Apr 29 09:47:29 2009 +0200 x86, vmlinux.lds: unify remaining parts 32 bit: - explicit page align .bss - move ALING() out of .brk output section - discard *(.eh_frame) 64 bit: - move ALIGN() out of .bss output section - move ALIGN() out of .brk output section - use a dedicated section to define _end [ Impact: unify and fix section alignments in linker script ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-13-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit 9d16e78318f174fd4b07916a93e41749d5199267 Author: Sam Ravnborg Date: Wed Apr 29 09:47:28 2009 +0200 x86, vmlinux.lds: unify percpu 32 bit: - move __init_end outside the .bss output section It really did not belong in there [ Impact: 64-bit: cleanup, 32-bit: refactor linker script ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-12-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit bf6a57418d5445c98047edbec022c9e54d1526e6 Author: Sam Ravnborg Date: Wed Apr 29 09:47:27 2009 +0200 x86, vmlinux.lds: unify .exit.* and .init.ramfs [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-11-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit ae61836289a415351caa524d328110aaeae100d4 Author: Sam Ravnborg Date: Wed Apr 29 09:47:26 2009 +0200 x86, vmlinux.lds: unify parainstructions 32 bit: - increase alignment from 4 to 8 for .parainstructions - increase alignment from 4 to 8 for .altinstructions 64 bit: - move ALIGN() outside output section for .altinstructions None of the above should result in any functional change. [ Impact: refactor and unify linker script ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-10-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit e58bdaa8f810332e5c1760ce496b01e07d51642c Author: Sam Ravnborg Date: Wed Apr 29 09:47:25 2009 +0200 x86, vmlinux.lds: unify first part of initdata 32-bit: - Move definition of __init_begin outside output_section because it covers more than one section - Move ALIGN() for end-of-section inside .smp_locks output section. Same effect but the intent is better documented that we need both start and end aligned. 64-bit: - Move ALIGN() outside output section in .init.setup - Deleted unused __smp_alt_* symbols None of the above should result in any functional change. [ Impact: refactor and unify linker script ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-9-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit ff6f87e1626e10beef675084c9b5384a9477e3d5 Author: Sam Ravnborg Date: Wed Apr 29 09:47:24 2009 +0200 x86, vmlinux.lds: move vsyscall output sections [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-8-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit 1f6397bac55040cd520d9eaf299e155a7aa01d5f Author: Sam Ravnborg Date: Wed Apr 29 09:47:23 2009 +0200 x86, vmlinux.lds: unify data output sections For 64 bit the following functional changes are introduced: - .data.page_aligned has moved - .data.cacheline_aligned has moved - .data.read_mostly has moved - ALIGN() moved out of output section for .data.cacheline_aligned - ALIGN() moved out of output section for .data.page_aligned Notice that 32 bit and 64 bit has different location of _edata. .data_nosave is 32 bit only as 64 bit is special due to PERCPU. [ Impact: 32-bit: cleanup, 64-bit: use 32-bit linker script ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-7-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit 448bc3ab0d03e77fee8e4264de0d001fc87bc164 Author: Sam Ravnborg Date: Wed Apr 29 09:47:22 2009 +0200 x86, vmlinux.lds: unify exception table [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-6-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit dfc20895d944cfa81d8ff00809b68ecb8f72cbb0 Author: Sam Ravnborg Date: Wed Apr 29 09:47:21 2009 +0200 x86, vmlinux.lds: unify .text output sections 32 bit x86 had a dedicated .text.head output section, whereas 64 bit had it all in a single output section. In the unified version the dedicated .text.head output section was kept to have full control over the head code. 32 bit: - Moved definition of _stext to the linker script. The definition is located _after_ .text.page_aligned as this is what 32 bit did before. The ALIGN(8) was introduced so we hit the exact same address (on the tested config) before and after the move. I assume that it is a bug that _stext did not cover the .text.page_aligned section - if this is true it can be fixed in a follow-up patch (and the ugly ALIGN() can be dropped). [ Impact: 64-bit: cleanup, 32-bit: use the 64-bit linker script ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-5-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit 444e0ae4831f99ba25062d9a5ccb7117c62841a0 Author: Sam Ravnborg Date: Wed Apr 29 09:47:20 2009 +0200 x86, vmlinux.lds: unify start/end of SECTIONS [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-4-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit afb8095a7eab32e5760613fa73d2f80a39cc45bf Author: Sam Ravnborg Date: Wed Apr 29 09:47:19 2009 +0200 x86, vmlinux.lds: unify PHDRS PHDRS are not equal for the two - so use ifdefs to cover up for that. On the assumption that they may become equal the ifdef is inside the PHDRS definiton. [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-3-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit 17ce265d6a1789eae5eb739a3bb7fcffdb3e87c5 Author: Sam Ravnborg Date: Wed Apr 29 09:47:18 2009 +0200 x86, vmlinux.lds: unify header/footer Merge everything except PHDRS and SECTIONS into vmlinux.lds.S. [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-2-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit aee6a166a5401dcfcb17fcdc055e5edf2a4f4042 Author: Sam Ravnborg Date: Wed Apr 29 09:47:17 2009 +0200 x86: beautify vmlinux_32.lds.S Beautify vmlinux_32.lds.S: - Use tabs for indent - Located curly braces like in C code - Rearranged a few comments To see actual differences use "git diff -b" which ignore 'whitespace' changes. The beautification is done to prepare a unification of the _32 and _64 variants of the linker scripts. [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <1240991249-27117-1-git-send-email-sam@ravnborg.org> Signed-off-by: Ingo Molnar commit 7d7d2b803159d4edeb051b0e5efbc1a8d9ef1c67 Author: Steven Rostedt Date: Mon Apr 27 12:37:49 2009 -0400 ring-buffer: fix printk output The warning output in trace_recursive_lock uses %d for a long when it should be %ld. [ Impact: fix compile warning ] Signed-off-by: Steven Rostedt commit f2957f1f196b0217644a17c1379855a118a37d72 Author: Steven Rostedt Date: Wed Apr 29 00:26:30 2009 -0400 tracing: have splice only copy full pages Splice works with pages, it is much more effecient to use an entire page than to copy bits over several pages. Using logdev to trace the internals of the splice mechanism, I was able to see that splice can be very aggressive. When tracing is occurring, and the reader caught up to the writer, and the writer is on the reader page, the reader will copy what is there into the splice page. Splice may iterate over several pages and if the writer is still writing to the page, the reader will keep copying bits to new pages to pass to userspace. This patch changes it to only pass data to userspace if the page is full (the writer has left the page). This has a small side effect that splice can not read a partial page, and must wait for the page to fill. This should not be an issue. If tracing has stopped, then a use of "read" will still read all of the page. [ Impact: better performance for ring buffer splice code ] Signed-off-by: Steven Rostedt commit 93459c6cb9816c52200993d29dd18cea1daee335 Author: Steven Rostedt Date: Wed Apr 29 00:23:13 2009 -0400 tracing: only add splice page if entries exist The splice code allocates a page even when the ring buffer is empty. It detects the ring buffer being empty when it it fails to copy anything from the ring buffer into the page. This patch adds a check to see if there is anything in the ring buffer before allocating a page. Thanks to logdev for letting me trace the tracer to find this. [ Impact: speed up due to removing unnecessary allocation ] Signed-off-by: Steven Rostedt commit 5beae6efd1004b44c3e257dc96087978e4c763c1 Author: Steven Rostedt Date: Wed Apr 29 00:16:21 2009 -0400 tracing: fix ref count in splice pages The pages allocated for the splice binary buffer did not initialize the ref count correctly. This caused pages not to be freed and causes a drastic memory leak. Thanks to logdev I was able to trace the tracer to find where the leak was. [ Impact: stop memory leak when using splice ] Signed-off-by: Steven Rostedt commit edc953fa4ebc0265ef3b1754fe116a9fd4264e15 Author: Mathieu Desnoyers Date: Tue Apr 28 11:13:46 2009 -0400 x86: clean up alternative.h Alternative header duplicates assembly that could be merged in one single macro. Merging this into this macro also allows to directly declare ALTERNATIVE() statements within assembly code. Uses a __stringify() of the feature bits rather than passing a "i" operand. Leave the old %0 operand as-is (set to 0), unused to stay compatible with API. (v2: tab alignment fixes) [ Impact: cleanup ] Signed-off-by: Mathieu Desnoyers LKML-Reference: <20090428151346.GA31212@Krystal> Signed-off-by: Ingo Molnar commit cd891ae0305601bdb4d2e7e85282961c4ff256cd Author: Steven Rostedt Date: Tue Apr 28 11:39:34 2009 -0400 tracing: convert ftrace_dump spinlocks to raw ftrace_dump is used for printing out the contents of the ftrace ring buffer to the console on failure. Currently it uses a spinlock to synchronize the output from multiple failures on different CPUs. This spin lock currently is a normal spinlock and can cause issues with lockdep and lock tracing. This patch converts it to raw since it is for error handling only. The lock is local to the ftrace_dump and is not used by any other infrastructure. [ Impact: prevent ftrace_dump from locking up by internal tracing ] Signed-off-by: Steven Rostedt commit 56b581ea9591b5767b1e0204c6a06c7d0c49396e Author: Yinghai Lu Date: Mon Apr 27 18:02:46 2009 -0700 irq: make ht irq_desc more numa aware Try to get irq_desc on the same node as create_irq_nr(). [ Impact: optimization, make HT IRQs more NUMA-aware ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell LKML-Reference: <49F655B6.8020109@kernel.org> Signed-off-by: Ingo Molnar commit d047f53a2ecce37e3bdf79eac5a326fbaadb3628 Author: Yinghai Lu Date: Mon Apr 27 18:02:23 2009 -0700 x86/irq: change MSI irq_desc to be more numa aware Try to get irq_desc on the home node in create_irq_nr(). v2: don't check if we can move it when sparse_irq is not used v3: use move_irq_des, if that node is not what we want [ Impact: optimization, make MSI IRQ descriptors more NUMA aware ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell LKML-Reference: <49F6559F.7070005@kernel.org> Signed-off-by: Ingo Molnar commit 024154cfdd802654cb236a18c78b6e37351e2c49 Author: Yinghai Lu Date: Mon Apr 27 18:01:50 2009 -0700 irq: change io_apic_set_pci_routing() to use device parameter Make actual use of the device parameter passed down to io_apic_set_pci_routing() - to have the IRQ descriptor on the home node of the device. If no device has been passed down, we assume it's a platform device and use the boot node ID for the IRQ descriptor. [ Impact: optimization, make IO-APIC code more NUMA aware ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell LKML-Reference: <49F6557E.3080101@kernel.org> Signed-off-by: Ingo Molnar commit a2f809b08ae4dddc1015c7dcd8659e5729e45b3e Author: Yinghai Lu Date: Mon Apr 27 18:01:20 2009 -0700 irq: change ACPI GSI APIs to also take a device argument We want to use dev_to_node() later on, to be aware of the 'home node' of the GSI in question. [ Impact: cleanup, prepare the IRQ code to be more NUMA aware ] Signed-off-by: Yinghai Lu Acked-by: Len Brown Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell Cc: Len Brown Cc: Bjorn Helgaas Cc: Tony Luck Cc: linux-acpi@vger.kernel.org Cc: linux-ia64@vger.kernel.org LKML-Reference: <49F65560.20904@kernel.org> Signed-off-by: Ingo Molnar commit 85ac16d033370caf6f48d743c8dc8103700f5cc5 Author: Yinghai Lu Date: Mon Apr 27 18:00:38 2009 -0700 x86/irq: change irq_desc_alloc() to take node instead of cpu This simplifies the node awareness of the code. All our allocators only deal with a NUMA node ID locality not with CPU ids anyway - so there's no need to maintain (and transform) a CPU id all across the IRq layer. v2: keep move_irq_desc related [ Impact: cleanup, prepare IRQ code to be NUMA-aware ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell Cc: Jeremy Fitzhardinge LKML-Reference: <49F65536.2020300@kernel.org> Signed-off-by: Ingo Molnar commit 57b150cce8e004ddd36330490a68bfb59b7271e9 Author: Yinghai Lu Date: Mon Apr 27 17:59:53 2009 -0700 irq: only update affinity if ->set_affinity() is sucessfull irq_set_affinity() and move_masked_irq() try to assign affinity before calling chip set_affinity(). Some archs are assigning it in ->set_affinity() again. We do something like: cpumask_cpy(desc->affinity, mask); desc->chip->set_affinity(mask); But in the failure path, affinity should not be touched - otherwise we'll end up with a different affinity mask despite the failure to migrate the IRQ. So try to update the afffinity only if set_affinity returns with 0. Also call irq_set_thread_affinity accordingly. v2: update after "irq, x86: Remove IRQ_DISABLED check in process context IRQ move" v3: according to Ingo, change set_affinity() in irq_chip should return int. v4: update comments by removing moving irq_desc code. [ Impact: fix /proc/irq/*/smp_affinity setting corner case bug ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell LKML-Reference: <49F65509.60307@kernel.org> Signed-off-by: Ingo Molnar commit d5dedd4507d307eb3f35f21b6e16f336fdc0d82a Author: Yinghai Lu Date: Mon Apr 27 17:59:21 2009 -0700 irq: change ->set_affinity() to return status according to Ingo, change set_affinity() in irq_chip should return int, because that way we can handle failure cases in a much cleaner way, in the genirq layer. v2: fix two typos [ Impact: extend API ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell Cc: linux-arch@vger.kernel.org LKML-Reference: <49F654E9.4070809@kernel.org> Signed-off-by: Ingo Molnar commit fcef5911c7ea89b80d5bfc727f402f37c9eefd57 Author: Yinghai Lu Date: Mon Apr 27 17:58:23 2009 -0700 x86/irq: remove leftover code from NUMA_MIGRATE_IRQ_DESC The original feature of migrating irq_desc dynamic was too fragile and was causing problems: it caused crashes on systems with lots of cards with MSI-X when user-space irq-balancer was enabled. We now have new patches that create irq_desc according to device numa node. This patch removes the leftover bits of the dynamic balancer. [ Impact: remove dead code ] Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" Cc: Rusty Russell LKML-Reference: <49F654AF.8000808@kernel.org> Signed-off-by: Ingo Molnar commit 9ec4fa271faf2db3b8e1419c998da1ca6b094eb6 Author: Yinghai Lu Date: Mon Apr 27 17:57:18 2009 -0700 irq, cpumask: correct CPUMASKS_OFFSTACK typo and fix fallout CPUMASKS_OFFSTACK is not defined anywhere (it is CPUMASK_OFFSTACK). It is a typo and init_allocate_desc_masks() is called before it set affinity to all cpus... Split init_alloc_desc_masks() into all_desc_masks() and init_desc_masks(). Also use CPUMASK_OFFSTACK in alloc_desc_masks(). [ Impact: fix smp_affinity copying/setup when moving irq_desc between CPUs ] Signed-off-by: Yinghai Lu Acked-by: Rusty Russell Cc: Andrew Morton Cc: Suresh Siddha Cc: "Eric W. Biederman" LKML-Reference: <49F6546E.3040406@kernel.org> Signed-off-by: Ingo Molnar commit e0e42142bab96404de535cceb85d6533d5ad7942 Author: Yinghai Lu Date: Sun Apr 26 23:39:38 2009 -0700 x86: Use dmi check in apic_is_clustered() on 64-bit to mark the TSC unstable We will have systems with 2 and more sockets 8cores/2thread, but we treat them as multi chassis - while they could have a stable TSC domain. Use DMI check instead. [ Impact: do not turn possibly stable TSCs off incorrectly ] Signed-off-by: Yinghai Lu Cc: Ravikiran Thirumalai LKML-Reference: <49F5532A.5000802@kernel.org> Signed-off-by: Ingo Molnar commit b2ba83ff4f4405cebc10884121ee71338a1a6c94 Author: Yinghai Lu Date: Sun Apr 26 23:38:08 2009 -0700 x86: apic: Remove duplicated macros XAPIC_DEST_* is dupliicated to the one in apicdef.h [ Impact: cleanup ] Signed-off-by: Yinghai Lu LKML-Reference: <49F552D0.5050505@kernel.org> Signed-off-by: Ingo Molnar commit 51b26ada79b605ed709ddcedbb6012e8f8e0ebed Author: Linus Torvalds Date: Sun Apr 26 10:12:47 2009 -0700 x86: unify arch/x86/boot/compressed/vmlinux_*.lds Look at the: diff -u arch/x86/boot/compressed/vmlinux_*.lds output and realize that they're basially exactly the same except for trivial naming differences, and the fact that the 64-bit version has a "pgtable" thing. So unify them. There's some trivial cleanup there (make the output format a Kconfig thing rather than doing #ifdef's for it, and unify both 32-bit and 64-bit BSS end to "_ebss", where 32-bit used to use the traditional "_end"), but other than that it's really very mindless and straigt conversion. For example, I think we should aim to remove "startup_32" vs "startup_64", and just call it "startup", and get rid of one more difference. I didn't do that. Also, notice the comment in the unified vmlinux.lds.S talks about "head_64" and "startup_32" which is an odd and incorrect mix, but that was actually what the old 64-bit only lds file had, so the confusion isn't new, and now that mixing is arguably more accurate thanks to the vmlinux.lds.S file being shared between the two cases ;) [ Impact: cleanup, unification ] Signed-off-by: Linus Torvalds Acked-by: Sam Ravnborg Signed-off-by: Ingo Molnar commit 0a3ec21fcd311b26ab0f249d62960e127bc20ca8 Author: Sam Ravnborg Date: Sun Apr 26 23:07:42 2009 +0200 x86: beautify vmlinux_64.lds.S Beautify vmlinux_64.lds.S: - Use tabs for indent - Located curly braces like in C code - Rearranged a few comments There is no functional changes in this patch The beautification is done to prepare a unification of the _32 and the _64 variants of the linker scripts. [ Impact: cleanup ] Signed-off-by: Sam Ravnborg Cc: Tim Abbott Cc: Linus Torvalds LKML-Reference: <20090426210742.GA3464@uranus.ravnborg.org> Signed-off-by: Ingo Molnar commit 701970b3a83cc639c1ec8fc6f40a7871cb99426f Author: Steven Rostedt Date: Fri Apr 24 23:11:22 2009 -0400 tracing/events: make modules have their own file_operations structure For proper module reference counting, the file_operations that modules use must have the "owner" field set to the module. Unfortunately, the trace events use share file_operations. The same file_operations are used by all both kernel core and all modules. This patch makes the modules allocate their own file_operations and copies the functions from the core kernel. This allows those file operations to be owned by the module. Care is taken to free this code on module unload. Thanks to Greg KH for reminding me that file_operations must be owned by the module to have reference counting take place. [ Impact: fix modular tracepoints / potential crash ] Signed-off-by: Steven Rostedt Acked-by: Greg Kroah-Hartman commit 060fa5c83e67901ba47ab484cfcdb32737d630ba Author: Steven Rostedt Date: Fri Apr 24 12:20:52 2009 -0400 tracing/events: reuse trace event ids after overflow With modules being able to add trace events, and the max trace event counter is 16 bits (65536) we can overflow the counter easily with a simple while loop adding and removing modules that contain trace events. This patch links together the registered trace events and on overflow searches for available trace event ids. It will still fail if over 65536 events are registered, but considering that a typical kernel only has 22000 functions, 65000 events should be sufficient. Reported-by: Li Zefan Signed-off-by: Steven Rostedt commit b8e65554d80b4c560d201362d0e8fa02109d89fd Author: Steven Rostedt Date: Fri Apr 24 11:50:39 2009 -0400 tracing: remove deprecated TRACE_FORMAT The TRACE_FORMAT macro has been deprecated by the TRACE_EVENT macro. There are no more users. All new users must use the TRACE_EVENT macro. [ Impact: remove old functionality ] Cc: Peter Zijlstra Signed-off-by: Steven Rostedt commit 160031b556e93590fa8635210d73d93c3d3853a9 Author: Steven Rostedt Date: Fri Apr 24 11:26:55 2009 -0400 tracing/irq: convert irq traces to use TRACE_EVENT macro The TRACE_FORMAT will soon be deprecated. This patch converts it to the TRACE_EVENT macro. Note, this change should also speed up the tracing. [ Impact: remove a user of deprecated TRACE_FORMAT ] Cc: Jason Baron Signed-off-by: Steven Rostedt commit 39517091f88fae32b52254b561ced78da1eaf0a7 Author: Steven Rostedt Date: Fri Apr 24 11:05:52 2009 -0400 tracing/lockdep: convert lockdep to use TRACE_EVENT macro The TRACE_FORMAT will soon be deprecated. This patch converts it to the TRACE_EVENT macro. Note, this change should also speed up the tracing. [ Impact: remove a user of deprecated TRACE_FORMAT ] Cc: Peter Zijlstra Signed-off-by: Steven Rostedt commit 1cb81b143fa8f0e4629f10690862e2e52ca792ff Author: Markus Metzger Date: Fri Apr 24 09:51:43 2009 +0200 x86, bts, mm: clean up buffer allocation The current mm interface is asymetric. One function allocates a locked buffer, another function only refunds the memory. Change this to have two functions for accounting and refunding locked memory, respectively; and do the actual buffer allocation in ptrace. [ Impact: refactor BTS buffer allocation code ] Signed-off-by: Markus Metzger Acked-by: Andrew Morton Cc: Peter Zijlstra LKML-Reference: <20090424095143.A30265@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit 7e0bfad24d85de7cf2202a7b0ce51de11a077b21 Author: Markus Metzger Date: Fri Apr 24 09:44:48 2009 +0200 x86, bts: reenable ptrace branch trace support The races found by Oleg Nesterov have been fixed. Reenable branch trace support. Signed-off-by: Markus Metzger Acked-by: Oleg Nesterov LKML-Reference: <20090424094448.A30216@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit 782cc5ae6331d63b4febaa312c9d14493aafa9b8 Author: Markus Metzger Date: Fri Apr 24 09:43:09 2009 +0200 x86, ds: fix buffer alignment in debug store selftest The debug store selftest code uses a stack-allocated buffer, which is not necessarily correctly aligned. For tests using a buffer to hold a single entry, the buffer that is passed to ds_request must already be suitably aligned. Pass a suitably aligned portion of the bigger buffer. [ Impact: fix hw-branch-tracer self-test failure ] Signed-off-by: Markus Metzger Cc: markus.t.metzger@gmail.com LKML-Reference: <20090424094309.A30145@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit 416dfdcdb894432547ead4fcb9fa6a36b396059e Merge: 56449f4 0910697 Author: Ingo Molnar Date: Fri Apr 24 10:11:18 2009 +0200 Merge commit 'v2.6.30-rc3' into tracing/hw-branch-tracing Conflicts: arch/x86/kernel/ptrace.c Merge reason: fix the conflict above, and also pick up the CONFIG_BROKEN dependency change from upstream so that we can remove it here. Signed-off-by: Ingo Molnar commit 334d4169a6592d3fcd863bbe822a8f6985ffa9af Author: Lai Jiangshan Date: Fri Apr 24 11:27:05 2009 +0800 ring_buffer: compressed event header RB_MAX_SMALL_DATA = 28bytes is too small for most tracers, it wastes an 'u32' to save the actually length for events which data size > 28. This fix uses compressed event header and enlarges RB_MAX_SMALL_DATA. [ Impact: saves about 0%-12.5%(depends on tracer) memory in ring_buffer ] Signed-off-by: Lai Jiangshan LKML-Reference: <49F13189.3090000@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit c2518c4366f087ebc10b3919cb2461bbe4f42d0c Author: Steven Rostedt Date: Thu Apr 23 23:26:18 2009 -0400 tracing: fix cut and paste macro error In case a module uses the TRACE_EVENT macro for creating automated events in ftrace, it may choose to use a different file name than the defined system name, or choose to use a different path than the default "include/trace/events" include path. If this is done, then before including trace/define_trace.h the header would define either "TRACE_INCLUDE_FILE" for the file name or "TRACE_INCLUDE_PATH" for the include path. If it does not define these, then the define_trace.h defines them instead. If define trace defines them, then define_trace.h should also undefine them before exiting. To do this a macro is used to note this: #ifndef TRACE_INCLUDE_FILE # define TRACE_INCLUDE_FILE TRACE_SYSTEM # define UNDEF_TRACE_INCLUDE_FILE #endif [...] #ifdef UNDEF_TRACE_INCLUDE_FILE # undef TRACE_INCLUDE_FILE # undef UNDEF_TRACE_INCLUDE_FILE #endif The UNDEF_TRACE_INCLUDE_FILE acts as a CPP variable to know to undef the TRACE_INCLUDE_FILE before leaving define_trace.h. Unfortunately, due to cut and paste errors, the macros between FILE and PATH got mixed up. [ Impact: undef TRACE_INCLUDE_FILE and/or TRACE_INCLUDE_PATH when needed ] Signed-off-by: Steven Rostedt commit d7285c6b5c54397fdf112c2fb98ee43193173aa9 Author: Chris Wright Date: Thu Apr 23 10:21:38 2009 -0700 x86: use native register access for native tlb flushing currently these are paravirtulaized, doesn't appear any callers rely on this (no pv_ops backends are using native_tlb and overriding cr3/4 access). [ Impact: fix lockdep warning with paravirt and function tracer ] Signed-off-by: Chris Wright LKML-Reference: <20090423172138.GR3036@sequoia.sous-sol.org> Signed-off-by: Steven Rostedt commit 75db37d2f4c0ad9466ead57d467277d097b4105c Author: Steven Rostedt Date: Thu Mar 26 11:43:36 2009 -0400 tracing: add size checks for exported ftrace internal structures The events exported by TRACE_EVENT are automated and are guaranteed to be correct when used. The internal ftrace structures on the other hand are more manually exported. These require the ftrace maintainer to make sure they are up to date. This patch adds a size check to help flag when a type changes in an internal ftrace data structure, and the update needs to be reflected in the export. If a export is incorrect, then the only harm is that the user space tools will not know how to correctly read the internal structures of ftrace. [ Impact: help prevent inconsistent ftrace format print outs ] Signed-off-by: Steven Rostedt commit 89ec0dee9eba6275d47be0b878cf5f6d5c2fb6eb Author: Steven Rostedt Date: Thu Mar 26 11:03:29 2009 -0400 tracing: increase size of number of possible events With the new event tracing registration, we must increase the number of events that can be registered. Currently the type field is only one byte, which leaves us only 256 possible events. Since we do not save the CPU number in the tracer anymore (it is determined by the per cpu ring buffer that is used) we have an extra byte to use. This patch increases the size of type from 1 byte (256 events) to 2 bytes (65,536 events). It also adds a WARN_ON_ONCE if we exceed that limit. [ Impact: allow more than 255 events ] Signed-off-by: Steven Rostedt commit 9be24414aad047dcf9d8d2a9a929321536c7ebec Author: Steven Rostedt Date: Thu Mar 26 10:25:24 2009 -0400 tracing/wakeup: move access to wakeup_cpu into spinlock The code had the following outside the lock: if (next != wakeup_task) return; pc = preempt_count(); /* The task we are waiting for is waking up */ data = wakeup_trace->data[wakeup_cpu]; On initialization, wakeup_task is NULL and wakeup_cpu -1. This code is not under a lock. If wakeup_task is set on another CPU as that task is waking up, we can see the wakeup_task before wakeup_cpu is set. If we read wakeup_cpu while it is still -1 then we will have a bad data pointer. This patch moves the reading of wakeup_cpu within the protection of the spinlock used to protect the writing of wakeup_cpu and wakeup_task. [ Impact: remove possible race causing invalid pointer dereference ] Reported-by: Maneesh Soni Signed-off-by: Steven Rostedt commit 6a74aa40907757ec98d8710ff66cd4cfe064e7d8 Author: Frederic Weisbecker Date: Wed Apr 22 00:41:09 2009 +0200 tracing/events: protect __get_str() The __get_str() macro is used in a code part then its content should be protected with parenthesis. [ Impact: make macro definition more robust ] Reported-by: Steven Rostedt Signed-off-by: Frederic Weisbecker commit 7e7ca9a22dbbc5c91763cd16923c7509918709b6 Author: Frederic Weisbecker Date: Sun Apr 19 04:54:49 2009 +0200 tracing/lock: provide lock_acquired event support for dynamic size string Now that we can support the dynamic sized string, make the lock tracing able to use it, making it safe against modules removal and consuming the right amount of memory needed for each lock name Changes in v2: adapt to the __ending_string() updates and the opening_string() removal. [ Impact: protect lock tracer against module removal ] Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Steven Rostedt commit 9cbf117662e24c6d33245666804487f92c21b59d Author: Frederic Weisbecker Date: Sun Apr 19 04:51:29 2009 +0200 tracing/events: provide string with undefined size support This patch provides the support for dynamic size strings on event tracing. The key concept is to use a structure with an ending char array field of undefined size and use such ability to allocate the minimal size on the ring buffer to make one or more string entries fit inside, as opposite to a fixed length strings with upper bound. The strings themselves are represented using fields which have an offset value from the beginning of the entry. This patch provides three new macros: __string(item, src) This one declares a string to the structure inside TP_STRUCT__entry. You need to provide the name of the string field and the source that will be copied inside. This will also add the dynamic size of the string needed for the ring buffer entry allocation. A stack allocated structure is used to temporarily store the offset of each strings, avoiding double calls to strlen() on each event insertion. __get_str(field) This one will give you a pointer to the string you have created. This is an abstract helper to resolve the absolute address given the field name which is a relative address from the beginning of the trace_structure. __assign_str(dst, src) Use this macro to automatically perform the string copy from src to dst. src must be a variable to assign and dst is the name of a __string field. Example on how to use it: TRACE_EVENT(my_event, TP_PROTO(char *src1, char *src2), TP_ARGS(src1, src2), TP_STRUCT__entry( __string(str1, src1) __string(str2, src2) ), TP_fast_assign( __assign_str(str1, src1); __assign_str(str2, src2); ), TP_printk("%s %s", __get_str(src1), __get_str(src2)) ) Of course you can mix-up any __field or __array inside this TRACE_EVENT. The position of the __string or __assign_str doesn't matter. Changes in v2: Address the suggestion of Steven Rostedt: drop the opening_string() macro and redefine __ending_string() to get the size of the string to be copied instead of overwritting the whole ring buffer allocation. Changes in v3: Address other suggestions of Steven Rostedt and Peter Zijlstra with some changes: drop the __ending_string and the need to have only one string field. Use offsets instead of absolute addresses. [ Impact: allow more compact memory usage for string tracing ] Signed-off-by: Frederic Weisbecker Cc: Steven Rostedt Cc: Li Zefan Cc: Peter Zijlstra commit ff166cb57a17124af75714a9c11f448f56f1a4a3 Author: Suresh Siddha Date: Mon Apr 20 13:02:30 2009 -0700 x86: x2apic, IR: remove reinit_intr_remapped_IO_APIC() When interrupt-remapping is enabled, we are relying on setup_IO_APIC_irqs() to configure remapped entries in the IO-APIC, which comes little bit later after enabling interrupt-remapping. Meanwhile, restoration of old io-apic entries after enabling interrupt-remapping will not make the interrupts through io-apic functional anyway. So remove the unnecessary reinit_intr_remapped_IO_APIC() step. The longer story: When interrupt-remapping is enabled, IO-APIC entries need to be setup in the re-mappable format (pointing to interrupt-remapping table entries setup by the OS). This remapping configuration is happening in the same place where we traditionally configure IO-APIC (i.e., in setup_IO_APIC_irqs()). So when we enable interrupt-remapping successfully, there is no need to restore old io-apic RTE entries before we actually do a complete configuration shortly in setup_IO_APIC_irqs(). Old IO-APIC RTE's may be in traditional format (non re-mappable) or in re-mappable format pointing to interrupt-remapping table entries setup by BIOS. Restoring both of these will not make IO-APIC functional. We have to rely on setup_IO_APIC_irqs() for proper configuration by OS. So I am removing this unnecessary and broken step. [ Impact: remove unnecessary/broken IO-APIC setup step ] Signed-off-by: Suresh Siddha Acked-by: Weidong Han Cc: dwmw2@infradead.org LKML-Reference: <20090420200450.552359000@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar commit 7a4f453b6d7379a7c380825949977c5a838aa012 Author: Li Zefan Date: Wed Apr 22 16:53:34 2009 +0800 tracing/events: make struct trace_entry->type to be int type struct trace_entry->type is unsigned char, while trace event's id is int type, thus for a event with id >= 256, it's entry->type is cast to (id % 256), and then we can't see the trace output of this event. # insmod trace-events-sample.ko # echo foo_bar > /mnt/tracing/set_event # cat /debug/tracing/events/trace-events-sample/foo_bar/id 256 # cat /mnt/tracing/trace_pipe <...>-3548 [001] 215.091142: Unknown type 0 <...>-3548 [001] 216.089207: Unknown type 0 <...>-3548 [001] 217.087271: Unknown type 0 <...>-3548 [001] 218.085332: Unknown type 0 [ Impact: fix output for trace events with id >= 256 ] Signed-off-by: Li Zefan Acked-by: Frederic Weisbecker Cc: Steven Rostedt Cc: Tom Zanussi LKML-Reference: <49EEDB0E.5070207@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 3554228d4289098a8fe5cfd87512ec32a19bbe5a Author: Steven Rostedt Date: Tue Apr 21 09:41:26 2009 -0400 ring-buffer: only warn on wrap if buffer is bigger than two pages On boot up, to save memory, ftrace allocates the minimum buffer which is two pages. Ftrace also goes through a series of tests (when configured) on boot up. These tests can fill up a page within a single interrupt. The ring buffer also has a WARN_ON when it detects that the buffer was completely filled within a single commit (other commits are allowed to be nested). Combine the small buffer on start up, with the tests that can fill more than a single page within an interrupt, this can trigger the WARN_ON. This patch makes the WARN_ON only happen when the ring buffer consists of more than two pages. [ Impact: prevent false WARN_ON in ftrace startup tests ] Reported-by: Ingo Molnar LKML-Reference: <20090421094616.GA14561@elte.hu> Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar commit f66578a7637b87810cbb9041c4e3a77fd2fa4706 Author: Li Zefan Date: Tue Apr 21 17:12:11 2009 +0800 tracing/filters: allow user-input to be integer-like string Suppose we would like to trace all tasks named '123', but this will fail: # echo 'parent_comm == 123' > events/sched/sched_process_fork/filter bash: echo: write error: Invalid argument Don't guess the type of the filter pred in filter_parse(), but instead we check it in __filter_add_pred(). [ Impact: extend allowed filter field string values ] Signed-off-by: Li Zefan Cc: Tom Zanussi Cc: Steven Rostedt Cc: Frederic Weisbecker LKML-Reference: <49ED8DEB.6000700@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit e8082f3f5a17d7a7bfc7dd1050a3f958dc034e9a Author: Li Zefan Date: Tue Apr 21 17:11:46 2009 +0800 tracing/filters: don't remove old filters when failed to write subsys->filter If writing subsys->filter returns EINVAL or ENOSPC, the original filters in subsys/ and subsys/events/ will be removed. This is definitely wrong. [ Impact: fix filter setting semantics on error condition ] Signed-off-by: Li Zefan Cc: Tom Zanussi Cc: Steven Rostedt Cc: Frederic Weisbecker LKML-Reference: <49ED8DD2.2070700@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 89388913f2c88a2cd15d24abab571b17a2596127 Author: Pekka Enberg Date: Tue Apr 21 11:39:27 2009 +0300 x86: unify noexec handling This patch unifies noexec handling on 32-bit and 64-bit. [ Impact: cleanup ] Signed-off-by: Pekka Enberg [ mingo@elte.hu: build fix ] LKML-Reference: <1240303167.771.69.camel@penberg-laptop> Signed-off-by: Ingo Molnar commit 8ecee4620e76aae418bfa0e8cc830e92cb559bbb Merge: 6424fb3 a939b96 Author: Ingo Molnar Date: Tue Apr 21 10:46:39 2009 +0200 Merge branch 'linus' into x86/mm Merge reason: refresh the topic: there's been 290 non-merges commits upstream to arch/x86 alone. Signed-off-by: Ingo Molnar commit 9d6c26e73bd248c286bb3597aaf788716e8fcceb Author: Suresh Siddha Date: Mon Apr 20 13:02:31 2009 -0700 x86: x2apic, IR: Make config X86_UV dependent on X86_X2APIC Instead of selecting X86_X2APIC, make config X86_UV dependent on X86_X2APIC. This will eliminate enabling CONFIG_X86_X2APIC with out enabling CONFIG_INTR_REMAP. [ Impact: cleanup ] Signed-off-by: Suresh Siddha Acked-by: Jack Steiner Cc: dwmw2@infradead.org Cc: Suresh Siddha Cc: Weidong Han LKML-Reference: <20090420200450.694598000@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar commit 39d83a5d684a457046aa2a6dac60f105966e78e9 Author: Suresh Siddha Date: Mon Apr 20 13:02:29 2009 -0700 x86: x2apic, IR: Clean up panic() with nox2apic boot option Instead of panic() ignore the "nox2apic" boot option when BIOS has already enabled x2apic prior to OS handover. [ Impact: printk warning instead of panic() when BIOS has enabled x2apic already ] Signed-off-by: Suresh Siddha Cc: dwmw2@infradead.org Cc: Weidong Han LKML-Reference: <20090420200450.425091000@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar commit 25629d810a52176758401184d9b437fbb7f79195 Author: Suresh Siddha Date: Mon Apr 20 13:02:28 2009 -0700 x86: x2apic, IR: Move eoi_ioapic_irq() into a CONFIG_INTR_REMAP section Address the following complier warning: arch/x86/kernel/apic/io_apic.c:2543: warning: `eoi_ioapic_irq' defined but not used By moving that function (and eoi_ioapic_irq()) into an existing #ifdef CONFIG_INTR_REMAP section of the code. [ Impact: cleanup ] Signed-off-by: Suresh Siddha Cc: dwmw2@infradead.org Cc: Weidong Han LKML-Reference: <20090420200450.271099000@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar Cc: Weidong Han commit fc1edaf9e7cc4d4696f83dee495b8f158d01c4eb Author: Suresh Siddha Date: Mon Apr 20 13:02:27 2009 -0700 x86: x2apic, IR: Clean up X86_X2APIC and INTR_REMAP config checks Add x2apic_supported() to clean up CONFIG_X86_X2APIC checks. Fix CONFIG_INTR_REMAP checks. [ Impact: cleanup ] Signed-off-by: Suresh Siddha Cc: dwmw2@infradead.org Cc: Suresh Siddha Cc: Weidong Han LKML-Reference: <20090420200450.128993000@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar commit 6e29ec5701e9d44fa02b96c1c5c45f7516182b65 Author: Gautham R Shenoy Date: Tue Apr 21 08:40:49 2009 +0530 sched: Replace first_cpu() with cpumask_first() in ILB nomination code Stephen Rothwell reported this build warning: > kernel/sched.c: In function 'find_new_ilb': > kernel/sched.c:4355: warning: passing argument 1 of '__first_cpu' from incompatible pointer type > > Possibly caused by commit f711f6090a81cbd396b63de90f415d33f563af9b > ("sched: Nominate idle load balancer from a semi-idle package") from > the sched tree. Should this call to first_cpu be cpumask_first? For !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT), find_new_ilb() nominates the Idle load balancer as the first cpu from the nohz.cpu_mask. This code uses the older API first_cpu(). Replace it with cpumask_first(), which is the correct API here. [ Impact: cleanup, address build warning ] Reported-by: Stephen Rothwell Signed-off-by: Gautham R Shenoy Cc: Rusty Russell LKML-Reference: <20090421031049.GA4140@in.ibm.com> Signed-off-by: Ingo Molnar commit cb4764a6dbffd9bb3cf759421ae82384071a933d Author: Steven Rostedt Date: Mon Apr 20 18:16:44 2009 -0400 tracing: use nowakeup version of commit for function event trace tests The startup tests for the event tracer also runs with the function tracer enabled. The "wakeup" version of the trace commit was used which can grab spinlocks. If a task was preempted by an NMI that called a function being traced, it could deadlock due to the function tracer trying to grab the same lock. Thanks to Frederic Weisbecker for pointing out where the bug was. Reported-by: Ingo Molnar Reported-by: Frederic Weisbecker Signed-off-by: Steven Rostedt commit aa18efb2a2f07e1cf062039848e9d369bb358724 Author: Steven Rostedt Date: Mon Apr 20 16:16:11 2009 -0400 tracing: use recursive counter over irq level Althought using the irq level (hardirq_count, softirq_count and in_nmi) was nice to detect bad recursion right away, but since the counters are not atomically updated with respect to the interrupts, the function tracer might trigger the test from an interrupt handler before the hardirq_count is updated. This will trigger a false warning. This patch converts the recursive detection to a simple counter. If the depth is greater than 16 then the recursive detection will trigger. 16 is more than enough for any nested interrupts. [ Impact: fix false positive trace recursion detection ] Signed-off-by: Steven Rostedt commit ff743345bf7685a207868048a70e23164c4785e5 Author: Peter Zijlstra Date: Fri Mar 13 12:21:26 2009 +0100 sched: remove extra call overhead for schedule() Lai Jiangshan's patch reminded me that I promised Nick to remove that extra call overhead in schedule(). Signed-off-by: Peter Zijlstra LKML-Reference: <20090313112300.927414207@chello.nl> Signed-off-by: Ingo Molnar commit e395898e98119085f666febbc7b631dd69bc637f Author: Steven Rostedt Date: Mon Apr 20 13:32:44 2009 -0400 tracing: remove recursive test from ring_buffer_event_discard The ring_buffer_event_discard is not tied to ring_buffer_lock_reserve. It can be called inside or outside the reserve/commit. Even if it is called inside the reserve/commit the commit part must also be called. Only ring_buffer_discard_commit can be used as a replacement for ring_buffer_unlock_commit. This patch removes the trace_recursive_unlock from ring_buffer_event_discard since it would be the wrong place to do so. [Impact: prevent breakage in trace recursive testing ] Cc: Frederic Weisbecker Signed-off-by: Steven Rostedt commit 17487bfeb6cfb05920e6a9d5a54f345f2917b4e7 Author: Steven Rostedt Date: Mon Apr 20 13:24:21 2009 -0400 tracing: fix recursive test level calculation The recursive tests to detect same level recursion in the ring buffers did not account for the hard/softirq_counts to be shifted. Thus the numbers could be larger than then mask to be tested. This patch includes the shift for the calculation of the irq depth. [ Impact: stop false positives in trace recursion detection ] Signed-off-by: Steven Rostedt commit 23de29de2d8b227943be191d59fb6d983996d55e Author: Steven Rostedt Date: Mon Apr 20 12:59:29 2009 -0400 tracing: remove dangling semicolon Due to a cut and paste error, the trace_seq_putc had a semicolon after the prototype but before the stub function when tracing is disabled. [Impact: fix compile error ] Signed-off-by: Steven Rostedt commit 28d20e2d6e94434827e11c310788b87204b84559 Author: Steven Rostedt Date: Mon Apr 20 12:12:44 2009 -0400 tracing/events: call the correct event trace selftest init function The late_initcall calls a helper function instead of the proper init event selftest function. This update may have been lost due to conflicting merges. [ Impact: fix compiler warning and call extended event trace self tests ] Signed-off-by: Steven Rostedt commit a7abe97fd8e7a6ccabba5a04a9f17be9211d418c Author: Steven Rostedt Date: Mon Apr 20 10:59:34 2009 -0400 tracing: rename EVENT_TRACER config to ENABLE_EVENT_TRACING Currently we have two configs: EVENT_TRACING and EVENT_TRACER. All tracers enable EVENT_TRACING. The EVENT_TRACER is only a convenience to enable the EVENT_TRACING when no other tracers are enabled. The names EVENT_TRACER and EVENT_TRACING are too similar and confusing. This patch renames EVENT_TRACER to ENABLE_EVENT_TRACING to be more appropriate to what it actually does, as well as add a comment in the help menu to explain the option's purpose. [ Impact: rename config option to reduce confusion ] Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit 4ed9f0716e46bb9646f26e73f4a1b5b24db7947a Author: Steven Rostedt Date: Mon Apr 20 10:47:36 2009 -0400 tracing: create menuconfig for tracing infrastructure During testing we often use randconfig to test various kernels. The current configuration set up does not give an easy way to disable all tracing with a single config. The case where randconfig would test all tracing disabled is very unlikely. This patch adds a config option to enable or disable all tracing. It is hooked into the tracing menu just like other submenus are done. [ Impact: allow randconfig to easily produce all traces disabled ] Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit 9ae5b8790037d05d32746f521af146c32089bfec Author: Steven Rostedt Date: Mon Apr 20 10:27:58 2009 -0400 tracing: change branch profiling to a choice selection This patch makes the branch profiling into a choice selection: None - no branch profiling likely/unlikely - only profile likely/unlikely branches all - profile all branches The all profiler will also enable the likely/unlikely branches. This does not change the way the profiler works or the dependencies between the profilers. What this patch does, is keep the branch profiling from being selected by an allyesconfig make. The branch profiler is very intrusive and it is known to break various architecture builds when selected as an allyesconfig. [ Impact: prevent branch profiler from being selected in allyesconfig ] Reported-by: Heiko Carstens Reported-by: Al Viro Reported-by: Stephen Rothwell Reported-by: Andrew Morton Signed-off-by: Steven Rostedt commit f3b9aae16219aaeca2dd5a9ca69f7a10faa063df Author: Frederic Weisbecker Date: Sun Apr 19 23:39:33 2009 +0200 tracing/ring-buffer: Add unlock recursion protection on discard The pair of helpers trace_recursive_lock() and trace_recursive_unlock() have been introduced recently to provide generic tracing recursion protection. They are used in a symetric way: - trace_recursive_lock() on buffer reserve - trace_recursive_unlock() on buffer commit However sometimes, we don't commit but discard on entry to the buffer, ie: in case of filter checking. Then we must also unlock the recursion protection on discard time, otherwise the tracing gets definitely deactivated and a warning is raised spuriously, such as: 111.119821] ------------[ cut here ]------------ [ 111.119829] WARNING: at kernel/trace/ring_buffer.c:1498 ring_buffer_lock_reserve+0x1b7/0x1d0() [ 111.119835] Hardware name: AMILO Li 2727 [ 111.119839] Modules linked in: [ 111.119846] Pid: 5731, comm: Xorg Tainted: G W 2.6.30-rc1 #69 [ 111.119851] Call Trace: [ 111.119863] [] warn_slowpath+0xd8/0x130 [ 111.119873] [] ? __lock_acquire+0x19f/0x1ae0 [ 111.119882] [] ? __lock_acquire+0x19f/0x1ae0 [ 111.119891] [] ? native_sched_clock+0x20/0x70 [ 111.119899] [] ? put_lock_stats+0xe/0x30 [ 111.119906] [] ? lock_release_holdtime+0xa8/0x150 [ 111.119913] [] ring_buffer_lock_reserve+0x1b7/0x1d0 [ 111.119921] [] trace_buffer_lock_reserve+0x30/0x70 [ 111.119930] [] trace_current_buffer_lock_reserve+0x20/0x30 [ 111.119939] [] ftrace_raw_event_sched_switch+0x58/0x100 [ 111.119948] [] __schedule+0x3a7/0x4cd [ 111.119957] [] ? ftrace_call+0x5/0x2b [ 111.119964] [] ? ftrace_call+0x5/0x2b [ 111.119971] [] schedule+0x18/0x40 [ 111.119977] [] preempt_schedule+0x39/0x60 [ 111.119985] [] _read_unlock+0x53/0x60 [ 111.119993] [] sock_def_readable+0x72/0x80 [ 111.120002] [] unix_stream_sendmsg+0x24d/0x3d0 [ 111.120011] [] sock_aio_write+0x143/0x160 [ 111.120019] [] ? ftrace_call+0x5/0x2b [ 111.120026] [] ? sock_aio_write+0x0/0x160 [ 111.120033] [] ? sock_aio_write+0x0/0x160 [ 111.120042] [] do_sync_readv_writev+0xf3/0x140 [ 111.120049] [] ? ftrace_call+0x5/0x2b [ 111.120057] [] ? autoremove_wake_function+0x0/0x40 [ 111.120067] [] ? cap_file_permission+0x9/0x10 [ 111.120074] [] ? security_file_permission+0x16/0x20 [ 111.120082] [] do_readv_writev+0xd4/0x1f0 [ 111.120089] [] ? ftrace_call+0x5/0x2b [ 111.120097] [] ? ftrace_call+0x5/0x2b [ 111.120105] [] vfs_writev+0x48/0x70 [ 111.120111] [] sys_writev+0x55/0xc0 [ 111.120119] [] system_call_fastpath+0x16/0x1b [ 111.120125] ---[ end trace 15605f4e98d5ccb5 ]--- [ Impact: fix spurious warning triggering tracing shutdown ] Signed-off-by: Frederic Weisbecker commit e057a5e5647a1c9d0d0054fbd298bfa04b3d1cb4 Author: Frederic Weisbecker Date: Sun Apr 19 23:38:12 2009 +0200 tracing/core: Add current context on tracing recursion warning In case of tracing recursion detection, we only get the stacktrace. But the current context may be very useful to debug the issue. This patch adds the softirq/hardirq/nmi context with the warning using lockdep context display to have a familiar output. v2: Use printk_once() v3: drop {hardirq,softirq}_context which depend on lockdep, only keep what is part of current->trace_recursion, sufficient to debug the warning source. [ Impact: print context necessary to debug recursion ] Signed-off-by: Frederic Weisbecker commit 667c5296cc76fefe0abcb79228952b28d9af45e3 Author: Cyrill Gorcunov Date: Sun Apr 19 11:43:11 2009 +0400 x86: es7000, uv - use __cpuinit for kicking secondary cpus The caller already has __cpuinit attribute. [ Impact: save memory, address section mismatch warning ] Signed-off-by: Cyrill Gorcunov Cc: Yinghai Lu Cc: Pavel Emelyanov LKML-Reference: <20090419074311.GA8670@lenovo> Signed-off-by: Ingo Molnar commit cece3155d869a50ba534ed161b5a05e8a29dcad0 Author: Cyrill Gorcunov Date: Sat Apr 18 23:45:28 2009 +0400 x86: smpboot - wakeup_secondary should be done via __cpuinit section A caller (do_boot_cpu) already has __cpuinit attribute. Since HOTPLUG_CPU depends on SMP && HOTPLUG it doesn't lead to panic at moment. [ Impact: cleanup ] Signed-off-by: Cyrill Gorcunov LKML-Reference: <20090418194528.GD25510@lenovo> Signed-off-by: Ingo Molnar commit 9a2755c3569e4db92bd9b1daadeddb4045b0cccd Author: Weidong Han Date: Fri Apr 17 16:42:16 2009 +0800 x86, intr-remap: fix x2apic/intr-remap resume Interrupt remapping was decoupled from x2apic. Shouldn't check x2apic before resume interrupt remapping. Otherwise, interrupt remapping won't be resumed when x2apic is not enabled. [ Impact: fix potential intr-remap resume hang on !x2apic ] Signed-off-by: Suresh Siddha Signed-off-by: Weidong Han Acked-by: David Woodhouse Cc: iommu@lists.linux-foundation.org Cc: allen.m.kay@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1239957736-6161-6-git-send-email-weidong.han@intel.com> Signed-off-by: Ingo Molnar commit 03ea81550676296d94596e4337c771c6ba29f542 Author: Weidong Han Date: Fri Apr 17 16:42:15 2009 +0800 x86, intr-remap: add option to disable interrupt remapping Add option "nointremap" to disable interrupt remapping. [ Impact: add new boot option ] Signed-off-by: Suresh Siddha Signed-off-by: Weidong Han Acked-by: David Woodhouse Cc: iommu@lists.linux-foundation.org Cc: allen.m.kay@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1239957736-6161-5-git-send-email-weidong.han@intel.com> Signed-off-by: Ingo Molnar commit 937582382c71b75b29fbb92615629494e1a05ac0 Author: Weidong Han Date: Fri Apr 17 16:42:14 2009 +0800 x86, intr-remap: enable interrupt remapping early Currently, when x2apic is not enabled, interrupt remapping will be enabled in init_dmars(), where it is too late to remap ioapic interrupts, that is, ioapic interrupts are really in compatibility mode, not remappable mode. This patch always enables interrupt remapping before ioapic setup, it guarantees all interrupts will be remapped when interrupt remapping is enabled. Thus it doesn't need to set the compatibility interrupt bit. [ Impact: refactor intr-remap init sequence, enable fuller remap mode ] Signed-off-by: Suresh Siddha Signed-off-by: Weidong Han Acked-by: David Woodhouse Cc: iommu@lists.linux-foundation.org Cc: allen.m.kay@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1239957736-6161-4-git-send-email-weidong.han@intel.com> Signed-off-by: Ingo Molnar commit 5d0ae2db6deac4f15dac4f42f23bc56448fc8d4d Author: Weidong Han Date: Fri Apr 17 16:42:13 2009 +0800 x86, intr-remap: fix ack for interrupt remapping Shouldn't call ack_apic_edge() in ir_ack_apic_edge(), because ack_apic_edge() does more than just ack: it also does irq migration in the non-interrupt-remapping case. But there is no such need for interrupt-remapping case, as irq migration is done in the process context. Similarly, ir_ack_apic_level() shouldn't call ack_apic_level, and instead should do the local cpu's EOI + directed EOI to the io-apic. ack_x2APIC_irq() is not neccessary, because ack_APIC_irq() will use MSR write for x2apic, and uncached write for non-x2apic. [ Impact: simplify/standardize intr-remap IRQ acking, fix on !x2apic ] Signed-off-by: Suresh Siddha Signed-off-by: Weidong Han Acked-by: David Woodhouse Cc: iommu@lists.linux-foundation.org Cc: allen.m.kay@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1239957736-6161-3-git-send-email-weidong.han@intel.com> Signed-off-by: Ingo Molnar commit 2b2fd87a6ef56ba7647a578e81bb8c8efda166b8 Author: Weidong Han Date: Fri Apr 17 16:42:12 2009 +0800 docs, x86: add nox2apic back to kernel-parameters.txt "nox2apic" was removed from kernel-parameters.txt by mistake, when entries were sorted in alpha order (commit 0cb55ad2). But this early parameter is still there, add it back to kernel-parameters.txt. [ Impact: add boot parameter description ] Signed-off-by: Suresh Siddha Signed-off-by: Weidong Han Acked-by: David Woodhouse Cc: Randy Dunlap Cc: iommu@lists.linux-foundation.org Cc: allen.m.kay@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1239957736-6161-2-git-send-email-weidong.han@intel.com> Signed-off-by: Ingo Molnar commit 8e668b5b3455207e4540fc7ccab9ecf70142f288 Author: Steven Rostedt Date: Fri Apr 17 17:17:55 2009 -0400 tracing: remove format attribute of inline function Due to a cut and paste error, I added the gcc attribute for printf format to the static inline stub of trace_seq_printf. This will cause a compile failure. [ Impact: fix compiler error when CONFIG_TRACING is off ] Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt Cc: Andrew Morton Cc: =?ISO-8859-15?Q?Fr=E9d=E9ric_Weisbecker?= LKML-Reference: Signed-off-by: Ingo Molnar commit 3189cdb31622f4e40688ce5a6fc5d940b42bc805 Author: Steven Rostedt Date: Fri Apr 17 16:13:55 2009 -0400 tracing: protect trace_printk from recursion trace_printk can be called from any context, including NMIs. If this happens, then we must test for for recursion before grabbing any spinlocks. This patch prevents trace_printk from being called recursively. [ Impact: prevent hard lockup in lockdep event tracer ] Cc: Peter Zijlstra Cc: Frederic Weisbecker Signed-off-by: Steven Rostedt commit 261842b7c9099f56de2eb969c8ad65402d68e00e Author: Steven Rostedt Date: Thu Apr 16 21:41:52 2009 -0400 tracing: add same level recursion detection The tracing infrastructure allows for recursion. That is, an interrupt may interrupt the act of tracing an event, and that interrupt may very well perform its own trace. This is a recursive trace, and is fine to do. The problem arises when there is a bug, and the utility doing the trace calls something that recurses back into the tracer. This recursion is not caused by an external event like an interrupt, but by code that is not expected to recurse. The result could be a lockup. This patch adds a bitmask to the task structure that keeps track of the trace recursion. To find the interrupt depth, the following algorithm is used: level = hardirq_count() + softirq_count() + in_nmi; Here, level will be the depth of interrutps and softirqs, and even handles the nmi. Then the corresponding bit is set in the recursion bitmask. If the bit was already set, we know we had a recursion at the same level and we warn about it and fail the writing to the buffer. After the data has been committed to the buffer, we clear the bit. No atomics are needed. The only races are with interrupts and they reset the bitmask before returning anywy. [ Impact: detect same irq level trace recursion ] Signed-off-by: Steven Rostedt commit 12acd473d45cf2e40de3782cb2de712e5cd4d715 Author: Steven Rostedt Date: Fri Apr 17 16:01:56 2009 -0400 tracing: add EXPORT_SYMBOL_GPL for trace commits Not all the necessary symbols were exported to allow for tracing by modules. This patch adds them in. [ Impact: allow modules to commit data to the ring buffer ] Signed-off-by: Steven Rostedt commit b0afdc126d0515e76890f0a5f26b28501cfa298e Author: Steven Rostedt Date: Fri Apr 17 13:02:22 2009 -0400 tracing/events: enable code with EVENT_TRACING not EVENT_TRACER The CONFIG_EVENT_TRACER is the way to turn on event tracing when no other tracing has been configured. All code to get enabled should depend on CONFIG_EVENT_TRACING. That is what is enabled when TRACING (or CONFIG_EVENT_TRACER) is selected. This patch enables the include/trace/ftrace.h file when CONFIG_EVENT_TRACING is enabled. [ Impact: fix warning in event tracer selftest ] Signed-off-by: Steven Rostedt commit ac1adc55fc71c7515caa2eb0e63e49b3d1c6a47c Author: Tom Zanussi Date: Fri Apr 17 00:27:08 2009 -0500 tracing/filters: add filter_mutex to protect filter predicates This patch adds a filter_mutex to prevent the filter predicates from being accessed concurrently by various external functions. It's based on a previous patch by Li Zefan: "[PATCH 7/7] tracing/filters: make filter preds RCU safe" v2 changes: - fixed wrong value returned in a add_subsystem_pred() failure case noticed by Li Zefan. [ Impact: fix trace filter corruption/crashes on parallel access ] Signed-off-by: Tom Zanussi Reviewed-by: Li Zefan Tested-by: Li Zefan Cc: Frederic Weisbecker Cc: Steven Rostedt Cc: paulmck@linux.vnet.ibm.com LKML-Reference: <1239946028.6639.13.camel@tropicana> Signed-off-by: Ingo Molnar commit 46de405f25f1d9fa73b657ffbb752aa0cc87a91d Author: Zhaolei Date: Fri Apr 17 10:53:43 2009 +0800 tracing: Remove include/trace/kmem_event_types.h kmem_event_types.h is no longer necessary since tracepoint definitions are put into include/trace/events/kmem.h [ Impact: remove now-unused file. ] Signed-off-by: Zhao Lei Acked-by: Steven Rostedt Cc: Frederic Weisbecker Cc: Tom Zanussi LKML-Reference: <49E7EF37.2080205@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd Author: Andreas Herrmann Date: Fri Apr 17 12:07:46 2009 +0200 x86: fixup numa_node information for AMD CPU northbridge functions Currently the numa_node attribute for these PCI devices is 0 (it corresponds to the numa_node for PCI bus 0). This is not a big issue but incorrect. This inconsistency can be fixed by reading the node number from CPU NB function 0. [ Impact: fill in dev->numa_node information, to optimize DMA allocations ] Signed-off-by: Andreas Herrmann Cc: jbarnes@virtuousgeek.org LKML-Reference: <20090417100746.GG16198@alberich.amd.com> Signed-off-by: Ingo Molnar commit 339ae5d3c3fc2025e3657637921495fd600027c7 Author: Li Zefan Date: Fri Apr 17 10:34:30 2009 +0800 tracing: fix file mode of trace and README trace is read-write and README is read-only. [ Impact: fix /debug/tracing/ file permissions. ] Signed-off-by: Li Zefan Acked-by: Frederic Weisbecker Acked-by: Steven Rostedt LKML-Reference: <49E7EAB6.4070605@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 76aa81118ddfbb3dc31533030cf3ec329dd067a6 Author: Jeremy Fitzhardinge Date: Thu Apr 16 23:35:39 2009 -0700 tracing: avoid warnings from zero-arg tracepoints Tracepoints with no arguments can issue two warnings: "field" defined by not used "ret" is uninitialized in this function Mark field as being OK to leave unused, and initialize ret. [ Impact: fix false positive compiler warnings. ] Signed-off-by: Jeremy Fitzhardinge Acked-by: Steven Rostedt Cc: mathieu.desnoyers@polymtl.ca LKML-Reference: <1239950139-1119-5-git-send-email-jeremy@goop.org> Signed-off-by: Ingo Molnar commit 9ea21c1ecdb35ecdcac5fd9d95f62a1f6a7ffec0 Author: Steven Rostedt Date: Thu Apr 16 12:15:44 2009 -0400 tracing/events: perform function tracing in event selftests We can find some bugs in the trace events if we stress the writes as well. The function tracer is a good way to stress the events. [ Impact: extend scope of event tracer self-tests ] Signed-off-by: Steven Rostedt Cc: Andrew Morton Cc: Peter Zijlstra Cc: Frederic Weisbecker LKML-Reference: <20090416161746.604786131@goodmis.org> Signed-off-by: Ingo Molnar commit 69abe6a5d18a9394baa325bab8f57748b037c517 Author: Avadh Patel Date: Fri Apr 10 16:04:48 2009 -0400 tracing: add saved_cmdlines file to show cached task comms Export the cached task comms to userspace. This allows user apps to translate the pids from a trace into their respective task command lines. [ Impact: let userspace apps reading binary buffer know comm's of pids ] Signed-off-by: Avadh Patel [ added error checking and use of buf pointer to index file_buf ] Signed-off-by: Steven Rostedt commit d1b182a8d49ed6416325b4e0a1cb0f17cd4e702a Author: Steven Rostedt Date: Wed Apr 15 16:53:47 2009 -0400 tracing/events/ring-buffer: expose format of ring buffer headers to users Currently, every thing needed to read the binary output from the ring buffers is available, with the exception of the way the ring buffers handles itself internally. This patch creates two special files in the debugfs/tracing/events directory: # cat /debug/tracing/events/header_page field: u64 timestamp; offset:0; size:8; field: local_t commit; offset:8; size:8; field: char data; offset:16; size:4080; # cat /debug/tracing/events/header_event type : 2 bits len : 3 bits time_delta : 27 bits array : 32 bits padding : type == 0 time_extend : type == 1 data : type == 3 This is to allow a userspace app to see if the ring buffer format changes or not. [ Impact: allow userspace apps to know of ringbuffer format changes ] Signed-off-by: Steven Rostedt commit e6187007d6c365b551c69ea3df46f06fd1c8bd19 Author: Steven Rostedt Date: Wed Apr 15 13:36:40 2009 -0400 tracing/events: add startup tests for events As events start to become popular, and the new way to add tracing infrastructure into ftrace, it is important to catch any problems that might happen with a mistake in the TRACE_EVENT macro. This patch introduces a startup self test on the registered trace events. Note, it can only do a generic test, any type of testing that needs more involement is needed to be implemented by the tracepoint creators. The test goes down one by one enabling a trace point and running some random tasks (random in the sense that I just made them up). Those tasks are creating threads, grabbing mutexes and spinlocks and using workqueues. After testing each event individually, it does the same test after enabling each system of trace points. Like sched, irq, lockdep. Then finally it enables all tracepoints and performs the tasks again. The output to the console on bootup will look like this when everything works: Running tests on trace events: Testing event kfree_skb: OK Testing event kmalloc: OK Testing event kmem_cache_alloc: OK Testing event kmalloc_node: OK Testing event kmem_cache_alloc_node: OK Testing event kfree: OK Testing event kmem_cache_free: OK Testing event irq_handler_exit: OK Testing event irq_handler_entry: OK Testing event softirq_entry: OK Testing event softirq_exit: OK Testing event lock_acquire: OK Testing event lock_release: OK Testing event sched_kthread_stop: OK Testing event sched_kthread_stop_ret: OK Testing event sched_wait_task: OK Testing event sched_wakeup: OK Testing event sched_wakeup_new: OK Testing event sched_switch: OK Testing event sched_migrate_task: OK Testing event sched_process_free: OK Testing event sched_process_exit: OK Testing event sched_process_wait: OK Testing event sched_process_fork: OK Testing event sched_signal_send: OK Running tests on trace event systems: Testing event system skb: OK Testing event system kmem: OK Testing event system irq: OK Testing event system lockdep: OK Testing event system sched: OK Running tests on all trace events: Testing all events: OK [ folded in: tracing: add #include to fix build failure in test_work() This build failure occured on a few rare configs: kernel/trace/trace_events.c: In function ‘test_work’: kernel/trace/trace_events.c:975: error: implicit declaration of function ‘udelay’ kernel/trace/trace_events.c:980: error: implicit declaration of function ‘msleep’ delay.h is included in way too many other headers, hiding cases where new usage is added without header inclusion. [ Impact: build fix ] Signed-off-by: Ingo Molnar ] [ Impact: add event tracer self-tests ] Signed-off-by: Steven Rostedt commit 93eb677d74a4f7d3edfb678c94f6c0544d9fbad2 Author: Steven Rostedt Date: Wed Apr 15 13:24:06 2009 -0400 ftrace: use module notifier for function tracer The hooks in the module code for the function tracer must be called before any of that module code runs. The function tracer hooks modify the module (replacing calls to mcount to nops). If the code is executed while the change occurs, then the CPU can take a GPF. To handle the above with a bit of paranoia, I originally implemented the hooks as calls directly from the module code. After examining the notifier calls, it looks as though the start up notify is called before any of the module's code is executed. This makes the use of the notify safe with ftrace. Only the startup notify is required to be "safe". The shutdown simply removes the entries from the ftrace function list, and does not modify any code. This change has another benefit. It removes a issue with a reverse dependency in the mutexes of ftrace_lock and module_mutex. [ Impact: fix lock dependency bug, cleanup ] Cc: Rusty Russell Signed-off-by: Steven Rostedt commit 5043124e660fcc3ddefe4239ddfa017bf13f5081 Merge: 77857dc 9f76208 Author: Ingo Molnar Date: Fri Apr 17 16:18:22 2009 +0200 Merge branch 'linus' into x86/apic Merge reason: new intr-remap patches depend on the s2ram iommu fixes from upstream Signed-off-by: Ingo Molnar commit d1f0ae5e2e45e74cff4c3bdefb0fc77608cdfeec Author: Sam Ravnborg Date: Wed Apr 15 21:34:55 2009 +0200 x86: standardize Kbuild rules Introducing this Kbuild file allow us to: make arch/x86/ And thus building all the core part of x86. Signed-off-by: Sam Ravnborg Cc: Jaswinder Singh Rajput Signed-off-by: Ingo Molnar commit f3948f8857ef5de239f28a61dddb1554a0ae4c2c Author: Li Zefan Date: Wed Apr 15 11:02:56 2009 +0800 blktrace: fix context-info when mixed-using blk tracer and trace events When current tracer is set to blk tracer, TRACE_ITER_CONTEXT_INFO is unset, but actually context-info is printed: pdflush-431 [000] 821.181576: 8,0 P N [pdflush] And then if we enable TRACE_ITER_CONTEXT_INFO: # echo context-info > trace_options We'll see context-info printed twice. What's worse, when we use blk tracer and trace events at the same time, we'll see no context-info for trace events at all: jbd2_commit_logging: dev dm-0:8 transaction 333227 jbd2_end_commit: dev dm-0:8 transaction 333227 head 332814 rm-25433 [001] 9578.307485: 8,18 m N cfq25433 slice expired t=0 rm-25433 [001] 9578.307486: 8,18 m N cfq25433 put_queue This patch adds blk_tracer->set_flags(), and context-info flag is unset only when we set the output to classic mode. Note after this patch, one should unset context-info explicitly if he wants to get binary output that can be parsed by blkparse: # echo nocontext-info > trace_options # echo bin > trace_options # echo blk > current_tracer # cat trace_pipe | blkparse -i - Reported-by: Theodore Ts'o Signed-off-by: Li Zefan Cc: Jens Axboe Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Steven Rostedt LKML-Reference: <49E54E60.50408@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82 Author: Li Zefan Date: Tue Apr 14 14:00:05 2009 +0800 blktrace: add trace/ to /sys/block/sda Impact: allow ftrace-plugin blktrace to trace device-mapper devices To trace a single partition: # echo 1 > /sys/block/sda/sda1/enable To trace the whole sda instead: # echo 1 > /sys/block/sda/enable Thus we also fix an issue reported by Ted, that ftrace-plugin blktrace can't be used to trace device-mapper devices. Now: # echo 1 > /sys/block/dm-0/trace/enable echo: write error: No such device or address # mount -t ext4 /dev/dm-0 /mnt # echo 1 > /sys/block/dm-0/trace/enable # echo blk > /debug/tracing/current_tracer Reported-by: Theodore Tso Signed-off-by: Li Zefan Acked-by: "Theodore Ts'o" Cc: Arnaldo Carvalho de Melo Cc: Shawn Du Cc: Jens Axboe LKML-Reference: <49E42665.6020506@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 9908c30997b8a73c95f836170b9998dae9aa3f4a Author: Li Zefan Date: Tue Apr 14 13:59:34 2009 +0800 blktrace: support per-partition tracing for ftrace plugin The previous patch adds support to trace a single partition for relay+ioctl blktrace, and this patch is for ftrace plugin blktrace: # echo 1 > /sys/block/sda/sda7/enable # cat start_lba 102398373 # cat end_lba 102703545 Signed-off-by: Li Zefan Acked-by: "Theodore Ts'o" Cc: Arnaldo Carvalho de Melo Cc: Shawn Du Cc: Jens Axboe LKML-Reference: <49E42646.4060608@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit d0deef5b14af7d5bbd0003a0a2a1a32326e20a6d Author: Shawn Du Date: Tue Apr 14 13:58:56 2009 +0800 blktrace: support per-partition tracing Though one can specify '-d /dev/sda1' when using blktrace, it still traces the whole sda. To support per-partition tracing, when we start tracing, we initialize bt->start_lba and bt->end_lba to the start and end sector of that partition. Note some actions are per device, thus we don't filter 0-sector events. The original patch and discussion can be found here: http://marc.info/?l=linux-btrace&m=122949374214540&w=2 Signed-off-by: Shawn Du Signed-off-by: Li Zefan Acked-by: "Theodore Ts'o" Cc: Arnaldo Carvalho de Melo Cc: Jens Axboe LKML-Reference: <49E42620.4050701@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 77857dc07247ed5fa700a197c96ef842d8dbebdf Author: Yinghai Lu Date: Wed Apr 15 11:57:01 2009 -0700 x86: use used_vectors in init_IRQ() Impact: fix crash with many devices I found this crash: [ 552.616646] general protection fault: 0403 [#1] SMP [ 552.620013] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/usb1/1-1/1-1:1.0/host13/target13:0:0/13:0:0:0/block/sr0/size [ 552.620013] CPU 0 [ 552.620013] Modules linked in: [ 552.620013] Pid: 0, comm: swapper Not tainted 2.6.30-rc1-tip-01931-g8fcafd8-dirty #28 Sun Fire X4440 [ 552.620013] RIP: 0010:[] [] default_idle+0x7d/0xda [ 552.620013] RSP: 0018:ffffffff81345e68 EFLAGS: 00010246 [ 552.620013] RAX: 0000000000000000 RBX: ffffffff8133d870 RCX: ffffc20000000000 [ 552.620013] RDX: 00000000001d0620 RSI: ffffffff8023bad8 RDI: ffffffff802a3169 [ 552.620013] RBP: ffffffff81345e98 R08: 0000000000000000 R09: ffffffff812244a0 [ 552.620013] R10: ffffffff81345dc8 R11: 7ebe1b6fa0bcac50 R12: 4ec4ec4ec4ec4ec5 [ 552.620013] R13: ffffffff813a54d0 R14: ffffffff813a7a40 R15: 0000000000000000 [ 552.620013] FS: 00000000006d1880(0000) GS:ffffc20000000000(0000) knlGS:0000000000000000 [ 552.620013] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 552.620013] CR2: 00007fec9d936a50 CR3: 000000007d1a9000 CR4: 00000000000006e0 [ 552.620013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 552.620013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 552.620013] Process swapper (pid: 0, threadinfo ffffffff81344000,task ffffffff812244a0) [ 552.620013] Stack: [ 552.620013] 0000000000000000 ffffc20000000000 00000000001d0620 7ebe1b6fa0bcac50 [ 552.620013] ffffffff8133d870 4ec4ec4ec4ec4ec5 ffffffff81345ec8 ffffffff8023bd84 [ 552.620013] 4ec4ec4ec4ec4ec5 ffffffff813a54d0 7ebe1b6fa0bcac50 ffffffff8133d870 [ 552.620013] Call Trace: [ 552.620013] [] c1e_idle+0x109/0x124 [ 552.620013] [] cpu_idle+0xb8/0x101 [ 552.620013] [] rest_init+0x7e/0x94 [ 552.620013] [] start_kernel+0x3dc/0x3fd [ 552.620013] [] x86_64_start_reservations+0xb9/0xd4 [ 552.620013] [] x86_64_start_kernel+0xee/0x109 [ 552.620013] Code: 48 8b 04 25 f8 b4 00 00 83 a0 3c e0 ff ff fb 0f ae f0 65 48 8b 04 25 f8 b4 00 00 f6 80 38 e0 ff ff 08 75 09 e8 71 76 06 00 fb f4 06 e8 68 76 06 00 fb 65 48 8b 04 25 f8 b4 00 00 83 88 3c e0 [ 552.620013] RIP [] default_idle+0x7d/0xda [ 552.620013] RSP [ 552.828646] ---[ end trace 4cbfc5c01382af7f ]--- Joerg Roedel said "The 0403 error code means that there was an external interrupt with vector 0x80. Yinghai, my theory is that the kernel on this machine has no 32bit emulation compiled in, right? In this case the selector points to a zero entry which may cause the #gpf right after the hlt. But I have no idea where the external int 0x80 comes from" it turns out that we could use 0x80 for external device on 64-bit when 32-bit emulation is disabled. But we forgot to set the gate for it. try to set gate for it by checking used_vectors. Also move apic_intr_init() early to avoid setting that gate two times. Signed-off-by: Yinghai Lu Cc: Andrew Morton Cc: Joerg Roedel LKML-Reference: <49E62DFD.6010904@kernel.org> Signed-off-by: Ingo Molnar commit 13318a7186d8e0ae08c996ea4111a945e7789772 Author: Miao Xie Date: Wed Apr 15 09:59:10 2009 +0800 sched: use group_first_cpu() instead of cpumask_first(sched_group_cpus()) Impact: cleanup This patch changes cpumask_first(sched_group_cpus()) to group_first_cpu() for maintainability. Signed-off-by: Miao Xie Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 19c1a6f5764d787113fa323ffb18be7991208f82 Author: FUJITA Tomonori Date: Tue Apr 14 09:43:19 2009 +0900 x86 gart: reimplement IOMMU_LEAK feature by using DMA_API_DEBUG IOMMU_LEAK, GART's own feature, dumps the used IOMMU entries when IOMMU entries is full, which might be useful to find a bad driver that eats IOMMU entries. DMA_API_DEBUG provides the similar feature, debug_dma_dump_mappings, and it's better than GART's IOMMU_LEAK feature. GART's IOMMU_LEAK feature doesn't say who uses IOMMU entries so it's hard to find a bad driver. This patch reimplements the GART's IOMMU_LEAK feature by using DMA_API_DEBUG. Signed-off-by: FUJITA Tomonori Acked-by: Joerg Roedel Cc: Andrew Morton LKML-Reference: <1239669799-23579-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar commit e6a1a89d572c31b62d6dcf11a371c7323852d9b2 Author: FUJITA Tomonori Date: Wed Apr 15 18:22:41 2009 +0900 dma-debug: add dma_debug_resize_entries() to adjust the number of dma_debug_entries We use a static value for the number of dma_debug_entries. It can be overwritten by a kernel command line option. Some IOMMUs (e.g. GART) can't set an appropriate value by a kernel command line option because they can't know such value until they finish initializing up their hardware. This patch adds dma_debug_resize_entries() enables IOMMUs to adjust the number of dma_debug_entries anytime. Signed-off-by: FUJITA Tomonori Acked-by: Joerg Roedel Cc: fujita.tomonori@lab.ntt.co.jp Cc: akpm@linux-foundation.org LKML-Reference: <20090415182234R.fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar commit b206525ad1f653b7da35f5827be93770d28eae11 Author: Jaswinder Singh Rajput Date: Tue Apr 14 23:04:37 2009 +0530 x86: k8 convert node_to_k8_nb_misc() from a macro to an inline function Converting node_to_k8_nb_misc() from a macro to an inline function makes compiler see the 'node' parameter in the !CONFIG_K8_NB too, which eliminates these compiler warnings: arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘show_cache_disable’: arch/x86/kernel/cpu/intel_cacheinfo.c:712: warning: unused variable ‘node’ arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘store_cache_disable’: arch/x86/kernel/cpu/intel_cacheinfo.c:739: warning: unused variable ‘node’ Signed-off-by: Jaswinder Singh Rajput Cc: Andreas Herrmann Cc: Mark Langsdorf LKML-Reference: <1239730477.2966.26.camel@ht.satnam> Signed-off-by: Ingo Molnar commit 05725f7eb4b8acb147c5fc7b91397b1f6bcab00d Author: Jiri Pirko Date: Tue Apr 14 20:17:16 2009 +0200 rculist: use list_entry_rcu in places where it's appropriate Use previously introduced list_entry_rcu instead of an open-coded list_entry + rcu_dereference combination. Signed-off-by: Jiri Pirko Reviewed-by: Paul E. McKenney Cc: dipankar@in.ibm.com LKML-Reference: <20090414181715.GA3634@psychotron.englab.brq.redhat.com> Signed-off-by: Ingo Molnar commit 9cfe06f8cd5c8c3ad6ab323973e87dde670642b8 Author: Steven Rostedt Date: Tue Apr 14 21:37:03 2009 -0400 tracing/events: add trace-events-sample This patch adds a sample to the samples directory on how to create and use TRACE_EVENT trace points. Signed-off-by: Steven Rostedt commit ad8d75fff811a6a230f7f43b05a6483099349533 Author: Steven Rostedt Date: Tue Apr 14 19:39:12 2009 -0400 tracing/events: move trace point headers into include/trace/events Impact: clean up Create a sub directory in include/trace called events to keep the trace point headers in their own separate directory. Only headers that declare trace points should be defined in this directory. Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neil Horman Cc: Zhao Lei Cc: Eduard - Gabriel Munteanu Cc: Pekka Enberg Signed-off-by: Steven Rostedt commit ecda8ae02a08ef065ff387f5cb2a2d4999da2408 Author: Steven Rostedt Date: Tue Apr 14 18:49:38 2009 -0400 tracing/events: fix lockdep system name Impact: fix compile error of lockdep event tracer Ingo Molnar pointed out that the system name for the lockdep tracer was "lock" which is used to include the event trace file name. It should be "lockdep" Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit 61f919a12fbdc3fd20f980a34a118d597198a392 Author: Steven Rostedt Date: Tue Apr 14 18:22:32 2009 -0400 tracing/events: fix compile for modules disabled Impact: compile fix The addition of TRACE_EVENT for modules breaks the build for when modules are disabled. This code fixes that. Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit 6d723736e472f7a0cd5b62c84152fceead241328 Author: Steven Rostedt Date: Fri Apr 10 14:53:50 2009 -0400 tracing/events: add support for modules to TRACE_EVENT Impact: allow modules to add TRACE_EVENTS on load This patch adds the final hooks to allow modules to use the TRACE_EVENT macro. A notifier and a data structure are used to link the TRACE_EVENTs defined in the module to connect them with the ftrace event tracing system. It also adds the necessary automated clean ups to the trace events when a module is removed. Cc: Rusty Russell Signed-off-by: Steven Rostedt commit 17c873ec280a03894bc718af817f7f24fa787ae1 Author: Steven Rostedt Date: Fri Apr 10 18:12:50 2009 -0400 tracing/events: add export symbols for trace events in modules Impact: let modules add trace events The trace event code requires some functions to be exported to allow modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the necessary functions. Signed-off-by: Steven Rostedt commit a59fd6027218bd7c994e39d14afe0242f895144f Author: Steven Rostedt Date: Fri Apr 10 13:52:20 2009 -0400 tracing/events: convert event call sites to use a link list Impact: makes it possible to define events in modules The events are created by reading down the section that they are linked in by the macros. But this is not scalable to modules. This patch converts the manipulations to use a global link list, and on boot up it adds the items in the section to the list. This change will allow modules to add their tracing events to the list as well. Note, this change alone does not permit modules to use the TRACE_EVENT macros, but the change is needed for them to eventually do so. Signed-off-by: Steven Rostedt commit f42c85e74faa422cf0bc747ed808681145448f88 Author: Steven Rostedt Date: Mon Apr 13 12:25:37 2009 -0400 tracing/events: move the ftrace event tracing code to core This patch moves the ftrace creation into include/trace/ftrace.h and simplifies the work of developers in adding new tracepoints. Just the act of creating the trace points in include/trace and including define_trace.h will create the events in the debugfs/tracing/events directory. This patch removes the need of include/trace/trace_events.h Signed-off-by: Steven Rostedt commit 97f2025153499faa17267a0d4e18c7afaf73f39d Author: Steven Rostedt Date: Mon Apr 13 11:20:49 2009 -0400 tracing/events: move declarations from trace directory to core include In preparation to allowing trace events to happen in modules, we need to move some of the local declarations in the kernel/trace directory into include/linux. This patch simply moves the declarations and performs no context changes. Signed-off-by: Steven Rostedt commit 9504504cbab29ecb694186b1c5b15d3579c43c51 Author: Steven Rostedt Date: Sat Apr 11 12:59:57 2009 -0400 tracing: make trace_seq operations available for core kernel In the process to make TRACE_EVENT macro work for modules, the trace_seq operations must be available for core kernel code. These operations are quite useful and can be used for other implementations. The main idea is that we create a trace_seq handle that acts very much like the seq_file handle. struct trace_seq *s = kmalloc(sizeof(*s, GFP_KERNEL); trace_seq_init(s); trace_seq_printf(s, "some data %d\n", variable); printk("%s", s->buffer); The main use is to allow a top level function call several other functions that may store printf like data into the buffer. Then at the end, the top level function can process all the data with any method it would like to. It could be passed to userspace, output via printk or even use seq_file: trace_seq_to_user(s, ubuf, cnt); seq_puts(m, s->buffer); Signed-off-by: Steven Rostedt commit a8d154b009168337494fbf345671bab74d3e4b8b Author: Steven Rostedt Date: Fri Apr 10 09:36:00 2009 -0400 tracing: create automated trace defines This patch lowers the number of places a developer must modify to add new tracepoints. The current method to add a new tracepoint into an existing system is to write the trace point macro in the trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or DECLARE_TRACE, then they must add the same named item into the C file with the macro DEFINE_TRACE(name) and then add the trace point. This change cuts out the needing to add the DEFINE_TRACE(name). Every file that uses the tracepoint must still include the trace/.h file, but the one C file must also add a define before the including of that file. #define CREATE_TRACE_POINTS #include This will cause the trace/mytrace.h file to also produce the C code necessary to implement the trace point. Note, if more than one trace/.h is used to create the C code it is best to list them all together. #define CREATE_TRACE_POINTS #include #include #include Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with the cleaner solution of the define above the includes over my first design to have the C code include a "special" header. This patch converts sched, irq and lockdep and skb to use this new method. Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neil Horman Cc: Zhao Lei Cc: Eduard - Gabriel Munteanu Cc: Pekka Enberg Signed-off-by: Steven Rostedt commit 72c6a9870f901045f2464c3dc6ee8914bfdc07aa Author: Jiri Pirko Date: Tue Apr 14 17:33:57 2009 +0200 rculist.h: introduce list_entry_rcu() and list_first_entry_rcu() I've run into the situation where I need to use list_first_entry with rcu-guarded list. This patch introduces this. Also simplify list_for_each_entry_rcu() to use new list_entry_rcu() instead of list_entry(). Signed-off-by: Jiri Pirko Reviewed-by: Paul E. McKenney Cc: dipankar@in.ibm.com LKML-Reference: <20090414153356.GC3999@psychotron.englab.brq.redhat.com> Signed-off-by: Ingo Molnar commit 56449f437add737a1e5e1cb7e00f63ac8ead1938 Author: Ingo Molnar Date: Tue Apr 14 11:24:36 2009 +0200 tracing: make the trace clocks available generally Jeremy Fitzhardinge reported this build failure: LD .tmp_vmlinux1 arch/x86/kernel/built-in.o: In function `ds_take_timestamp': git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global' git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global' Which is due to !CONFIG_TRACING && CONFIG_X86_DS=y. Expose the trace clock code to CONFIG_X86_DS as well. [ Unfortunately librarizing doesnt work well - ancient architectures with no raw_local_irq_save() primitive break the build. ] Reported-by: Jeremy Fitzhardinge LKML-Reference: <49E4413F.7070700@goop.org> Signed-off-by: Ingo Molnar commit 78ddb08feb7d4fbe3c0a9931804c51ee58be4023 Author: Johannes Weiner Date: Tue Apr 14 16:53:05 2009 +0200 wait: don't use __wake_up_common() '777c6c5 wait: prevent exclusive waiter starvation' made __wake_up_common() global to be used from abort_exclusive_wait(). It was needed to do a wake-up with the waitqueue lock held while passing down a key to the wake-up function. Since '4ede816 epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key()' there is an appropriate wrapper for this case: __wake_up_locked_key(). Use it here and make __wake_up_common() private to the scheduler again. Signed-off-by: Johannes Weiner Cc: Andrew Morton Cc: Peter Zijlstra LKML-Reference: <1239720785-19661-1-git-send-email-hannes@cmpxchg.org> Signed-off-by: Ingo Molnar commit ea20d9293ce423a39717ed4375393129a2e701f9 Author: Steven Rostedt Date: Fri Apr 10 08:54:16 2009 -0400 tracing: consolidate trace and trace_event headers Impact: clean up Neil Horman (et. al.) criticized the way the trace events were broken up into two files. The reason for that was that ftrace needed to separate out the declarations from where the #include was used. It then dawned on me that the tracepoint.h header only needs to define the TRACE_EVENT macro if it is not already defined. The solution is simply to test if TRACE_EVENT is defined, and if it is not then the linux/tracepoint.h header can define it. This change consolidates all the .h and _event_types.h into the .h file. Reported-by: Neil Horman Reported-by: Theodore Tso Reported-by: Jiaying Zhang Cc: Zhaolei Cc: Frederic Weisbecker Cc: Peter Zijlstra Cc: Jason Baron Cc: Mathieu Desnoyers Signed-off-by: Steven Rostedt commit 7e05575c422d45f393c2d9b5900e97a30bf69bea Author: FUJITA Tomonori Date: Tue Apr 14 12:12:29 2009 +0900 x86: calgary: remove IOMMU_DEBUG CONFIG_IOMMU_DEBUG has depends on CONFIG_GART_IOMMU: config IOMMU_DEBUG bool "Enable IOMMU debugging" depends on GART_IOMMU && DEBUG_KERNEL depends on X86_64 So it's not useful to have CONFIG_IOMMU_DEBUG in Calgary IOMMU code, which does the extra checking of the bitmap space management. And Calgary uses the iommu helper for the bitmap space management now so it would be better to have the extra checking feature in the iommu helper rather than Calgary code (if necessary). Signed-off-by: FUJITA Tomonori Acked-by: Muli Ben-Yehuda Cc: Joerg Roedel Cc: alexisb@us.ibm.com LKML-Reference: <20090414120827G.fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar commit 6424fb38667fffbbb1b90be0ffd9a0c540db6a4b Author: Yinghai Lu Date: Mon Apr 13 23:51:46 2009 -0700 x86: remove (null) in /sys kernel_page_tables Impact: cleanup %p prints out 0x000000000000000 as (null) so use %lx instead. Signed-off-by: Yinghai Lu LKML-Reference: <49E43282.1090607@kernel.org> Signed-off-by: Ingo Molnar commit e790fb0ba64bfec158e1219d899cb588275d12ab Author: Gautham R Shenoy Date: Tue Apr 14 10:25:35 2009 +0530 sched: Nominate a power-efficient ilb in select_nohz_balancer() The CPU that first goes idle becomes the idle-load-balancer and remains that until either it picks up a task or till all the CPUs of the system goes idle. Optimize this further to allow it to relinquish it's post once all it's siblings in the power-aware sched_domain go idle, thereby allowing the whole package-core to go idle. While relinquising the post, nominate another an idle-load balancer from a semi-idle core/package. Signed-off-by: Gautham R Shenoy Acked-by: Peter Zijlstra LKML-Reference: <20090414045535.7645.31641.stgit@sofia.in.ibm.com> Signed-off-by: Ingo Molnar commit f711f6090a81cbd396b63de90f415d33f563af9b Author: Gautham R Shenoy Date: Tue Apr 14 10:25:30 2009 +0530 sched: Nominate idle load balancer from a semi-idle package. Currently the nomination of idle-load balancer is done by choosing the first idle cpu in the nohz.cpu_mask. This may not be power-efficient, since such an idle cpu could come from a completely idle core/package thereby preventing the whole core/package from being in a low-power state. For eg, consider a quad-core dual package system. The cpu numbering need not be sequential and can something like [0, 2, 4, 6] and [1, 3, 5, 7]. With sched_mc/smt_power_savings and the power-aware IRQ balance, we try to keep as fewer Packages/Cores active. But the current idle load balancer logic goes against this by choosing the first_cpu in the nohz.cpu_mask and not taking the system topology into consideration. Improve the algorithm to nominate the idle load balancer from a semi idle cores/packages thereby increasing the probability of the cores/packages being in deeper sleep states for longer duration. The algorithm is activated only when sched_mc/smt_power_savings != 0. Signed-off-by: Gautham R Shenoy Acked-by: Peter Zijlstra LKML-Reference: <20090414045530.7645.12175.stgit@sofia.in.ibm.com> Signed-off-by: Ingo Molnar commit e7d43a74cb07cbc4b8e9b5e4a914816b33fb0719 Author: Jaswinder Singh Rajput Date: Tue Apr 14 13:18:28 2009 +0530 x86: avoid multiple declaration of kstack_depth_to_print Impact: cleanup asm/stacktrace.h is more appropriate so removing other 2 declarations. Signed-off-by: Jaswinder Singh Rajput Cc: Neil Horman LKML-Reference: <1239695308.3033.34.camel@ht.satnam> Signed-off-by: Ingo Molnar commit 66aa230e437d89ca56224135f617e2d8e391a3ef Author: Jaswinder Singh Rajput Date: Tue Apr 14 12:54:29 2009 +0530 x86: page_types.h unification of declarations Impact: unification of declarations, cleanup Unification of declarations: moved init_memory_mapping, initmem_init and free_initmem from page_XX_types.h to page_types.h Signed-off-by: Jaswinder Singh Rajput Acked-by: Pekka Enberg Cc: Andrew Morton LKML-Reference: <1239693869.3033.31.camel@ht.satnam> Signed-off-by: Ingo Molnar commit 6fd9b3a40b82081d9e6490b0d7cd656e9a78a134 Author: Paul E. McKenney Date: Mon Apr 13 21:31:18 2009 -0700 rcu: Update RCU tracing documentation for __rcu_pending This patch updates the RCU documentation to reflect the changes in tracing made in the previous patch in the set. Located-by: Anton Blanchard Tested-by: Anton Blanchard Signed-off-by: Paul E. McKenney Cc: anton@samba.org Cc: akpm@linux-foundation.org Cc: dipankar@in.ibm.com Cc: manfred@colorfullife.com Cc: cl@linux-foundation.org Cc: josht@linux.vnet.ibm.com Cc: schamp@sgi.com Cc: niv@us.ibm.com Cc: dvhltc@us.ibm.com Cc: ego@in.ibm.com Cc: laijs@cn.fujitsu.com Cc: rostedt@goodmis.org Cc: peterz@infradead.org Cc: penberg@cs.helsinki.fi Cc: andi@firstfloor.org Cc: "Paul E. McKenney" LKML-Reference: <12396834792865-git-send-email-> Signed-off-by: Ingo Molnar commit 7ba5c840e64d4a967379f1ae3eca73278180b11d Author: Paul E. McKenney Date: Mon Apr 13 21:31:17 2009 -0700 rcu: Add __rcu_pending tracing to hierarchical RCU Add tracing to __rcu_pending() to provide information on why RCU processing was kicked off. This is helpful for debugging hierarchical RCU, and might also be helpful in learning how hierarchical RCU operates. Located-by: Anton Blanchard Tested-by: Anton Blanchard Signed-off-by: Paul E. McKenney Cc: anton@samba.org Cc: akpm@linux-foundation.org Cc: dipankar@in.ibm.com Cc: manfred@colorfullife.com Cc: cl@linux-foundation.org Cc: josht@linux.vnet.ibm.com Cc: schamp@sgi.com Cc: niv@us.ibm.com Cc: dvhltc@us.ibm.com Cc: ego@in.ibm.com Cc: laijs@cn.fujitsu.com Cc: rostedt@goodmis.org Cc: peterz@infradead.org Cc: penberg@cs.helsinki.fi Cc: andi@firstfloor.org Cc: "Paul E. McKenney" LKML-Reference: <1239683479943-git-send-email-> Signed-off-by: Ingo Molnar commit 05cfbd66d07c44865983c8b65ae9d0037d874206 Merge: 31c9a24 ef631b0 Author: Ingo Molnar Date: Tue Apr 14 11:32:23 2009 +0200 Merge branch 'core/urgent' into core/rcu Merge reason: new patches to be queued up depend on: ef631b0: rcu: Make hierarchical RCU less IPI-happy Signed-off-by: Ingo Molnar commit 0a19e53c1514ad8e9c3cbab40c6c3f52c86f403d Author: Tom Zanussi Date: Mon Apr 13 03:17:50 2009 -0500 tracing/filters: allow on-the-fly filter switching This patch allows event filters to be safely removed or switched on-the-fly while avoiding the use of rcu or the suspension of tracing of previous versions. It does it by adding a new filter_pred_none() predicate function which does nothing and by never deallocating either the predicates or any of the filter_pred members used in matching; the predicate lists are allocated and initialized during ftrace_event_calls initialization. Whenever a filter is removed or replaced, the filter_pred_* functions currently in use by the affected ftrace_event_call are immediately switched over to to the filter_pred_none() function, while the rest of the filter_pred members are left intact, allowing any currently executing filter_pred_* functions to finish up, using the values they're currently using. In the case of filter replacement, the new predicate values are copied into the old predicates after the above step, and the filter_pred_none() functions are replaced by the filter_pred_* functions for the new filter. In this case, it is possible though very unlikely that a previous filter_pred_* is still running even after the filter_pred_none() switch and the switch to the new filter_pred_*. In that case, however, because nothing has been deallocated in the filter_pred, the worst that can happen is that the old filter_pred_* function sees the new values and as a result produces either a false positive or a false negative, depending on the values it finds. So one downside to this method is that rarely, it can produce a bad match during the filter switch, but it should be possible to live with that, IMHO. The other downside is that at least in this patch the predicate lists are always pre-allocated, taking up memory from the start. They could probably be allocated on first-use, and de-allocated when tracing is completely stopped - if this patch makes sense, I could create another one to do that later on. Oh, and it also places a restriction on the size of __arrays in events, currently set to 128, since they can't be larger than the now embedded str_val arrays in the filter_pred struct. Signed-off-by: Tom Zanussi Acked-by: Frederic Weisbecker Cc: Steven Rostedt Cc: paulmck@linux.vnet.ibm.com LKML-Reference: <1239610670.6660.49.camel@tropicana> Signed-off-by: Ingo Molnar commit b5c851a88a369854c04e511cefb84ea2d0cfa209 Merge: eb02ce0 80a04d3 Author: Ingo Molnar Date: Tue Apr 14 00:02:16 2009 +0200 Merge branch 'linus' into tracing/core Merge reason: merge latest tracing fixes to avoid conflicts in kernel/trace/trace_events_filter.c with upcoming change Signed-off-by: Ingo Molnar commit eb02ce017dd83985041a7e54c6449f92d53b026f Author: Tom Zanussi Date: Wed Apr 8 03:15:54 2009 -0500 tracing/filters: use ring_buffer_discard_commit() in filter_check_discard() This patch changes filter_check_discard() to make use of the new ring_buffer_discard_commit() function and modifies the current users to call the old commit function in the non-discard case. It also introduces a version of filter_check_discard() that uses the global trace buffer (filter_current_check_discard()) for those cases. v2 changes: - fix compile error noticed by Ingo Molnar Signed-off-by: Tom Zanussi Cc: Steven Rostedt Cc: fweisbec@gmail.com LKML-Reference: <1239178554.10295.36.camel@tropicana> Signed-off-by: Ingo Molnar commit 5f77a88b3f8268b11940b51d2e03d26a663ceb90 Author: Tom Zanussi Date: Wed Apr 8 03:14:01 2009 -0500 tracing/infrastructure: separate event tracer from event support Add a new config option, CONFIG_EVENT_TRACING that gets selected when CONFIG_TRACING is selected and adds everything needed by the stuff in trace_export - basically all the event tracing support needed by e.g. bprint, minus the actual events, which are only included if CONFIG_EVENT_TRACER is selected. So CONFIG_EVENT_TRACER can be used to turn on or off the generated events (what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns on or off the base event tracing support used by both the event tracer and the other things such as bprint that can't be configured out. Signed-off-by: Tom Zanussi Cc: Steven Rostedt Cc: fweisbec@gmail.com LKML-Reference: <1239178441.10295.34.camel@tropicana> Signed-off-by: Ingo Molnar commit 77d9f465d46fd67cdb82ee5e1ab99dd57a17c486 Author: Steven Rostedt Date: Thu Apr 2 01:16:59 2009 -0400 tracing/filters: use ring_buffer_discard_commit for discarded events The ring_buffer_discard_commit makes better usage of the ring_buffer when an event has been discarded. It tries to remove it completely if possible. This patch converts the trace event filtering to use ring_buffer_discard_commit instead of the ring_buffer_event_discard. Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar commit fa1b47dd85453ec7d4bcfe4aa4a2d172ba452fc3 Author: Steven Rostedt Date: Thu Apr 2 00:09:41 2009 -0400 ring-buffer: add ring_buffer_discard_commit The ring_buffer_discard_commit is similar to ring_buffer_event_discard but it can only be done on an event that has yet to be commited. Unpredictable results can happen otherwise. The main difference between ring_buffer_discard_commit and ring_buffer_event_discard is that ring_buffer_discard_commit will try to free the data in the ring buffer if nothing has addded data after the reserved event. If something did, then it acts almost the same as ring_buffer_event_discard followed by a ring_buffer_unlock_commit. Note, either ring_buffer_commit_discard and ring_buffer_unlock_commit can be called on an event, not both. This commit also exports both discard functions to be usable by GPL modules. Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar commit e45f2e2bd298e1ff687448e5fd15a3588b5807ec Author: Tom Zanussi Date: Tue Mar 31 00:49:16 2009 -0500 tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro Frederic Weisbecker suggested that the trace_special event shouldn't be filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro that allows an event format to be exported without having a filter attached, and removes filtering from the trace_special event. Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar commit e1112b4d96859367a93468027c9635e2ac04eb3f Author: Tom Zanussi Date: Tue Mar 31 00:48:49 2009 -0500 tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events This patch adds run-time field descriptions to all the event formats exported using TRACE_EVENT_FORMAT. It also hooks up all the tracers that use them (i.e. the tracers in the 'ftrace subsystem') so they can also have their output filtered by the event-filtering mechanism. When I was testing this, there were a couple of things that fooled me into thinking the filters weren't working, when actually they were - I'll mention them here so others don't make the same mistakes (and file bug reports. ;-) One is that some of the tracers trace multiple events e.g. the sched_switch tracer uses the context_switch and wakeup events, and if you don't set filters on all of the traced events, the unfiltered output from the events without filters on them can make it look like the filtering as a whole isn't working properly, when actually it is doing what it was asked to do - it just wasn't asked to do the right thing. The other is that for the really high-volume tracers e.g. the function tracer, the volume of filtered events can be so high that it pushes the unfiltered events out of the ring buffer before they can be read so e.g. cat'ing the trace file repeatedly shows either no output, or once in awhile some output but that isn't there the next time you read the trace, which isn't what you normally expect when reading the trace file. If you read from the trace_pipe file though, you can catch them before they disappear. Changes from v1: As suggested by Frederic Weisbecker: - get rid of externs in functions - added unlikely() to filter_check_discard() Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar commit 5cda395f4a262788d8ed79ac8a26a2b821e5f751 Author: Alexander van Heukelum Date: Mon Apr 13 17:39:24 2009 +0200 x86: fix function definitions after: x86: apic - introduce imcr_ helpers The patch "introduce imcr_ helpers" introduced good comments, but also a few new compile warnings. This fixes the function definitions to have a 'void' return type. Signed-off-by: Alexander van Heukelum Acked-by: Cyrill Gorcunov LKML-Reference: <20090413153924.GA20287@mailshack.com> Signed-off-by: Ingo Molnar commit b9b34f24b23ba9e79e07c0980e7fff16af2a67d1 Author: Cyrill Gorcunov Date: Sun Apr 12 20:47:42 2009 +0400 x86: smp.c - align smp_ops assignments Impact: cleanup It's a bit hard to parse by eyes without them being aligned. Signed-off-by: Cyrill Gorcunov LKML-Reference: <20090412165058.924175574@openvz.org> Signed-off-by: Ingo Molnar commit 08306ce61d6848e6fbf74fa4cc693c3fb29e943f Author: Cyrill Gorcunov Date: Sun Apr 12 20:47:41 2009 +0400 x86: apic - introduce dummy apic operations Impact: refactor, speed up and robustize code In case if apic was disabled by kernel option or by hardware limits we can use dummy operations in apic->write to simplify the ack_APIC_irq() code. At the lame time the patch fixes the missed EOI in do_IRQ function (which has place if kernel is compiled as X86-32 and interrupt without handler happens where apic was not asked to be disabled via kernel option). Note that native_apic_write_dummy() consists of WARN_ON_ONCE to catch any buggy writes on enabled APICs. Could be removed after some time of testing. Signed-off-by: Cyrill Gorcunov LKML-Reference: <20090412165058.724788431@openvz.org> Signed-off-by: Ingo Molnar commit c0eaa4536f08b98fbcfa7fce5b7b0de1bebcb0e1 Author: Cyrill Gorcunov Date: Sun Apr 12 20:47:40 2009 +0400 x86: apic - introduce imcr_ helpers Impact: cleanup Distinguish port writting magic into helpers with comments. Signed-off-by: Cyrill Gorcunov LKML-Reference: <20090412165058.535921550@openvz.org> Signed-off-by: Ingo Molnar commit edea7148a87c099e5d5d4838285cc27e459588b7 Author: Cyrill Gorcunov Date: Sun Apr 12 20:47:39 2009 +0400 x86: irq.c - tiny cleanup Impact: cleanup, robustization 1) guard ack_bad_irq with printk_ratelimit since there is no guarantee we will not be flooded one day 2) use pr_emerg() helper Signed-off-by: Cyrill Gorcunov LKML-Reference: <20090412165058.277579847@openvz.org> Signed-off-by: Ingo Molnar commit 3fa89ca7ba5ba50b3924a11f6604b4bdce5f7842 Author: Jaswinder Singh Rajput Date: Sun Apr 12 20:37:25 2009 +0530 x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used Impact: cleanup, address sparse warnings Addresses the problem pointed out by these sparse warning: arch/x86/vdso/vma.c:19:28: warning: symbol 'vdso_enabled' was not declared. Should it be static? arch/x86/vdso/vma.c:101:5: warning: symbol 'arch_setup_additional_pages' was not declared. Should it be static? Signed-off-by: Jaswinder Singh Rajput LKML-Reference: <1239548845.4170.2.camel@localhost.localdomain> Signed-off-by: Ingo Molnar commit 66de7792c02693b49671afe58c771fde3b092fc7 Author: Li Zefan Date: Wed Apr 1 16:19:19 2009 +0800 blktrace: fix output of BLK_TC_PC events BLK_TC_PC events should be treated differently with BLK_TC_FS events. Before this patch: # echo 1 > /sys/block/sda/sda1/trace/enable # echo pc > /sys/block/sda/sda1/trace/act_mask # echo blk > /debugfs/tracing/current_tracer # (generate some BLK_TC_PC events) # cat trace bash-2184 [000] 1774.275413: 8,7 I N [bash] bash-2184 [000] 1774.275435: 8,7 D N [bash] bash-2184 [000] 1774.275540: 8,7 I R [bash] bash-2184 [000] 1774.275547: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275580: 8,7 C N 0 [0] bash-2184 [000] 1774.275648: 8,7 I R [bash] bash-2184 [000] 1774.275653: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275682: 8,7 C N 0 [0] bash-2184 [000] 1774.275739: 8,7 I R [bash] bash-2184 [000] 1774.275744: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275771: 8,7 C N 0 [0] bash-2184 [000] 1774.275804: 8,7 I R [bash] bash-2184 [000] 1774.275808: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275836: 8,7 C N 0 [0] After this patch: # cat trace bash-2263 [000] 366.782149: 8,7 I N 0 (00 ..) [bash] bash-2263 [000] 366.782323: 8,7 D N 0 (00 ..) [bash] bash-2263 [000] 366.782557: 8,7 I R 8 (25 00 ..) [bash] bash-2263 [000] 366.782560: 8,7 D R 8 (25 00 ..) [bash] ksoftirqd/0-4 [000] 366.782582: 8,7 C N (25 00 ..) [0] bash-2263 [000] 366.782648: 8,7 I R 8 (5a 00 3f 00) [bash] bash-2263 [000] 366.782650: 8,7 D R 8 (5a 00 3f 00) [bash] ksoftirqd/0-4 [000] 366.782669: 8,7 C N (5a 00 3f 00) [0] bash-2263 [000] 366.782710: 8,7 I R 8 (5a 00 08 00) [bash] bash-2263 [000] 366.782713: 8,7 D R 8 (5a 00 08 00) [bash] ksoftirqd/0-4 [000] 366.782730: 8,7 C N (5a 00 08 00) [0] bash-2263 [000] 366.783375: 8,7 I R 36 (5a 00 08 00) [bash] bash-2263 [000] 366.783379: 8,7 D R 36 (5a 00 08 00) [bash] ksoftirqd/0-4 [000] 366.783404: 8,7 C N (5a 00 08 00) [0] This is what we do with PC events in user-space blktrace. Signed-off-by: Li Zefan Acked-by: Arnaldo Carvalho de Melo Cc: Jens Axboe Cc: Steven Rostedt Cc: Frederic Weisbecker LKML-Reference: <49D32387.9040106@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit b78825d608f30a47e3154ab6872a03f0de0c9d45 Author: Li Zefan Date: Wed Apr 1 16:18:53 2009 +0800 blktrace: fix output of unknown events Not all events are pc (packet command) events. An event is a pc event only if it has BLK_TC_PC bit set. Signed-off-by: Li Zefan Acked-by: Arnaldo Carvalho de Melo Cc: Jens Axboe Cc: Steven Rostedt Cc: Frederic Weisbecker LKML-Reference: <49D3236D.3090705@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit fc182a4330fc22ea1b68fa3d5064dd85a73a4c4a Author: Zhaolei Date: Fri Apr 10 14:27:38 2009 +0800 tracing, kmemtrace: Make kmem tracepoints use TRACE_EVENT macro TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions Signed-off-by: Zhao Lei Acked-by: Eduard - Gabriel Munteanu Acked-by: Pekka Enberg Cc: Steven Rostedt Cc: Frederic Weisbecker Cc: Tom Zanussi LKML-Reference: <49DEE6DA.80600@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 02af61bb50f5d5f0322dbe5ab2a0d75808d25c7b Author: Zhaolei Date: Fri Apr 10 14:26:18 2009 +0800 tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part Impact: refactor code for future changes Current kmemtrace.h is used both as header file of kmemtrace and kmem's tracepoints definition. Tracepoints' definition file may be used by other code, and should only have definition of tracepoint. We can separate include/trace/kmemtrace.h into 2 files: include/linux/kmemtrace.h: header file for kmemtrace include/trace/kmem.h: definition of kmem tracepoints Signed-off-by: Zhao Lei Acked-by: Eduard - Gabriel Munteanu Acked-by: Pekka Enberg Cc: Steven Rostedt Cc: Frederic Weisbecker Cc: Tom Zanussi LKML-Reference: <49DEE68A.5040902@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 2c1b284e4fa260fd922b9a65c99169e2630c6862 Author: Jaswinder Singh Rajput Date: Sat Apr 11 00:03:10 2009 +0530 x86: clean up declarations and variables Impact: cleanup, no code changed - syscalls.h update declarations due to unifications - irq.c declare smp_generic_interrupt() before it gets used - process.c declare sys_fork() and sys_vfork() before they get used - tsc.c rename tsc_khz shadowed variable - apic/probe_32.c declare apic_default before it gets used - apic/nmi.c prev_nmi_count should be unsigned - apic/io_apic.c declare smp_irq_move_cleanup_interrupt() before it gets used - mm/init.c declare direct_gbpages and free_initrd_mem before they get used Signed-off-by: Jaswinder Singh Rajput Signed-off-by: Ingo Molnar commit abd41443ac76d3e9c29a8c1d9e9a3312306cc55e Author: Theodore Ts'o Date: Sat Apr 11 15:51:18 2009 -0400 tracing: Document the event tracing system Signed-off-by: "Theodore Ts'o" Cc: Theodore Ts'o Cc: Steven Rostedt LKML-Reference: <1239479479-2603-3-git-send-email-tytso@mit.edu> Signed-off-by: Ingo Molnar commit 56c49951747f250d8398582509e02ae5ce1d36d1 Author: Theodore Ts'o Date: Sat Apr 11 15:51:19 2009 -0400 tracing: Add documentation for the power tracer Signed-off-by: "Theodore Ts'o" Acked-by: Arjan van de Ven Cc: Frederic Weisbecker Cc: Steven Rostedt LKML-Reference: <1239479479-2603-4-git-send-email-tytso@mit.edu> Signed-off-by: Ingo Molnar commit 2de1f33e99cec5fd79542a1d0e26efb9c36a98bb Author: Jaswinder Singh Rajput Date: Sat Apr 11 12:55:26 2009 +0530 x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static Impact: reduce kernel size a bit, address sparse warning Addresses the problem pointed out by this sparse warning: arch/x86/kernel/apic/x2apic_cluster.c:13:1: warning: symbol 'per_cpu__x86_cpu_to_logical_apicid' was not declared. Should it be static? Signed-off-by: Jaswinder Singh Rajput Cc: Suresh Siddha LKML-Reference: <1239434726.4418.24.camel@localhost.localdomain> Signed-off-by: Ingo Molnar commit cf9972a921470b0a2da7906104bcd540b20e33bf Author: H. Peter Anvin Date: Sat Apr 11 22:24:05 2009 -0700 x86, setup: fix comment in the "glove box" code Impact: Comment change only The glove box is about avoiding problems with *registers* being touched, not *memory*. Signed-off-by: H. Peter Anvin commit a5a2a0c7fa039c59619bc908b3b1ed24734d442a Author: Darren Hart Date: Fri Apr 10 09:50:05 2009 -0700 futex: fix futex_wait_setup key handling If the get_futex_key() call were to fail, the existing code would try and put_futex_key() prior to returning. This patch makes sure we only put_futex_key() if get_futex_key() succeeded. Reported-by: Clark Williams Signed-off-by: Darren Hart LKML-Reference: <20090410165005.14342.16973.stgit@Aeon> Signed-off-by: Thomas Gleixner commit 47f16ca7631f9c6bad8e6d968cfb1433029b09ec Author: Ingo Molnar Date: Fri Apr 10 14:58:05 2009 +0200 x86, irqinit: preempt merge conflicts To make the topic merge life easier for tip:perfcounters/core, include two (inactive in this topic) IRQ vector initializations here. Also fix build bug - missing kprobes.h inclusion. Signed-off-by: Ingo Molnar commit 6265ff19ca08df0d96c859ae5e4dc2d9ad07070e Author: Andreas Herrmann Date: Thu Apr 9 15:47:10 2009 +0200 x86: cacheinfo: complete L2/L3 Cache and TLB associativity field definitions See "CPUID Specification" (AMD Publication #: 25481, Rev. 2.28, April 2008) Signed-off-by: Andreas Herrmann Cc: Mark Langsdorf LKML-Reference: <20090409134710.GA8026@alberich.amd.com> Signed-off-by: Ingo Molnar commit abdb5a5713330e17dfe91ab0d3e29c4744d95162 Author: Pekka Enberg Date: Thu Apr 9 11:52:30 2009 +0300 x86: remove some ifdefs from native_init_IRQ() Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit ac3048dfd4740becf8d768844cf47ebee363c9f8 Author: Pekka Enberg Date: Thu Apr 9 11:52:29 2009 +0300 x86: define IA32_SYSCALL_VECTOR on 32-bit to reduce ifdefs Impact: cleanup We can remove some #ifdefs if we define IA32_SYSCALL_VECTOR on 32-bit. Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 31cb45ef2600d47191d51253ec94b5e3f689260d Author: Pekka Enberg Date: Thu Apr 9 11:52:28 2009 +0300 x86: unify irqinit_{32,64}.c into irqinit.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit ab19c25abd14db28d7454f00805ea59f22ed6057 Author: Pekka Enberg Date: Thu Apr 9 11:52:27 2009 +0300 x86: unify apic_intr_init() in irqinit_{32,64}.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 778838600eb6973bdb6fd11e7f91b43cea4d6f45 Author: Pekka Enberg Date: Thu Apr 9 11:52:26 2009 +0300 x86: unify trivial differences in irqinit_{32,64}.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 320fd99672a44ece6d1cd0d838ba31c8ebbf5979 Author: Pekka Enberg Date: Thu Apr 9 11:52:25 2009 +0300 x86: unify native_init_IRQ() in irqinit_{32,64}.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 598c73d250ffb112715aa48fb325d79e255be23b Author: Pekka Enberg Date: Thu Apr 9 11:52:24 2009 +0300 x86: unify init_ISA_irqs() in irqinit_{32,64}.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit b0096bb0b640d0a7713618b3472fd0f4adf30a96 Author: Pekka Enberg Date: Thu Apr 9 11:52:23 2009 +0300 x86: unify smp_intr_init() in irqinit_{32,64}.h Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit d3496c85cae22fb7713af6ed542a6aeae8ee4210 Author: Pekka Enberg Date: Thu Apr 9 11:52:22 2009 +0300 x86: use identical loop constructs in 32-bit and 64-bit native_init_IRQ() Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 22813c45228160b07244a7c4ed7580388ac0f33d Author: Pekka Enberg Date: Thu Apr 9 11:52:21 2009 +0300 x86: introduce apic_intr_init() in irqinit_32.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 36290d87f5abf260a543e5b711be4ceed03e6b1a Author: Pekka Enberg Date: Thu Apr 9 11:52:20 2009 +0300 x86: introduce smp_intr_init() in irqinit_32.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 7371d9fcb88dc9185be9719f64744a339c537a92 Author: Pekka Enberg Date: Thu Apr 9 11:52:19 2009 +0300 x86: move init_ISA_irqs() in irqinit_32.c to match ordering in irqinit_64.c Impact: cleanup Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit f465145235313c451164bdfa9037ac254bf00c9a Author: Pekka Enberg Date: Thu Apr 9 11:52:18 2009 +0300 x86: move x86_quirk_pre_intr_init() to irqinit_32.c Impact: cleanup In preparation for unifying irqinit_{32,64}.c, make x86_quirk_pre_intr_init() local to irqinit_32.c. Reviewed-by Cyrill Gorcunov Signed-off-by: Pekka Enberg Signed-off-by: Ingo Molnar commit 2fad2d9bb8310889f3261035b594b4e068b6eb8b Author: Mark Langsdorf Date: Thu Apr 9 15:31:53 2009 +0200 x86/docs: add description for cache_disable sysfs interface Signed-off-by: Mark Langsdorf Signed-off-by: Andreas Herrmann Cc: Andrew Morton LKML-Reference: <20090409133153.GL31527@alberich.amd.com> Signed-off-by: Ingo Molnar commit ba518bea2db21c72d44a6cbfd825b026ef9cdcb6 Author: Mark Langsdorf Date: Thu Apr 9 15:24:06 2009 +0200 x86: cacheinfo: disable L3 ECC scrubbing when L3 cache index is disabled (Use correct mask to zero out bits 24-28 by Andreas) Signed-off-by: Mark Langsdorf Signed-off-by: Andreas Herrmann Cc: Andrew Morton LKML-Reference: <20090409132406.GK31527@alberich.amd.com> Signed-off-by: Ingo Molnar commit f8b201fc7110c3673437254e8ba02451461ece0b Author: Mark Langsdorf Date: Thu Apr 9 15:18:49 2009 +0200 x86: cacheinfo: replace sysfs interface for cache_disable feature Impact: replace sysfs attribute Current interface violates against "one-value-per-sysfs-attribute rule". This patch replaces current attribute with two attributes -- one for each L3 Cache Index Disable register. Signed-off-by: Mark Langsdorf Signed-off-by: Andreas Herrmann Cc: Andrew Morton LKML-Reference: <20090409131849.GJ31527@alberich.amd.com> Signed-off-by: Ingo Molnar commit afd9fceec55225d33be878927056a548c2eef26c Author: Andreas Herrmann Date: Thu Apr 9 15:16:17 2009 +0200 x86: cacheinfo: use cached K8 NB_MISC devices instead of scanning for it Impact: avoid code duplication Signed-off-by: Andreas Herrmann Cc: Andrew Morton Cc: Mark Langsdorf LKML-Reference: <20090409131617.GI31527@alberich.amd.com> Signed-off-by: Ingo Molnar commit 845d8c761ec763871936c62b837c4a9ea6d0fbdb Author: Andreas Herrmann Date: Thu Apr 9 15:07:29 2009 +0200 x86: cacheinfo: correct return value when cache_disable feature is not active Impact: bug fix If user writes to "cache_disable" attribute on a CPU that does not support this feature, the process hangs due to an invalid return value in store_cache_disable(). Signed-off-by: Andreas Herrmann Cc: Andrew Morton Cc: Mark Langsdorf LKML-Reference: <20090409130729.GH31527@alberich.amd.com> Signed-off-by: Ingo Molnar commit bda869c614c937c318547c3ee1d65a316b693c21 Author: Andreas Herrmann Date: Thu Apr 9 15:05:10 2009 +0200 x86: cacheinfo: use L3 cache index disable feature only for CPUs that support it AMD family 0x11 CPU doesn't support the feature. Some AMD family 0x10 CPUs do not support it or have an erratum, see erratum #382 in "Revision Guide for AMD Family 10h Processors, 41322 Rev. 3.40 February 2009". Signed-off-by: Andreas Herrmann CC: Mark Langsdorf Cc: Andrew Morton LKML-Reference: <20090409130510.GG31527@alberich.amd.com> Signed-off-by: Ingo Molnar commit 5cb3d1d9d34ac04bcaa2034139345b2a5fea54c1 Author: Zhaolei Date: Thu Apr 9 14:08:18 2009 +0800 tracing, net, skb tracepoint: make skb tracepoint use the TRACE_EVENT() macro TRACE_EVENT is a more generic way to define a tracepoint. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions Signed-off-by: Zhao Lei Acked-by: Neil Horman Cc: "David S. Miller" Cc: Arnaldo Carvalho de Melo Cc: "Steven Rostedt ;" Cc: Frederic Weisbecker Cc: Tom Zanussi LKML-Reference: <49DD90D2.5020604@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit e71e99c294058a61b7a8b9bb6da2f745ac51aa4f Author: Steven Rostedt Date: Wed Mar 25 14:30:04 2009 -0400 x86, function-graph: only save return values on x86_64 Impact: speed up The return to handler portion of the function graph tracer should only need to save the return values. The caller already saved off the registers that the callee can modify. The returning function already saved the registers it modified. When we call our own trace function it too will save the registers that the callee must restore. There's no reason to save off anything more that the registers used to return the values. Note, I did a complete kernel build with this modification and the function graph tracer running on x86_64. Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar commit 2062501ae6505dbc5bff3a792246c2661d114050 Author: Frederic Weisbecker Date: Mon Apr 6 01:49:33 2009 +0200 tracing/lockdep: report the time waited for a lock While trying to optimize the new lock on reiserfs to replace the bkl, I find the lock tracing very useful though it lacks something important for performance (and latency) instrumentation: the time a task waits for a lock. That's what this patch implements: bash-4816 [000] 202.652815: lock_contended: lock_contended: &sb->s_type->i_mutex_key bash-4816 [000] 202.652819: lock_acquired: &rq->lock (0.000 us) <...>-4787 [000] 202.652825: lock_acquired: &rq->lock (0.000 us) <...>-4787 [000] 202.652829: lock_acquired: &rq->lock (0.000 us) bash-4816 [000] 202.652833: lock_acquired: &sb->s_type->i_mutex_key (16.005 us) As shown above, the "lock acquired" field is followed by the time it has been waiting for the lock. Usually, a lock contended entry is followed by a near lock_acquired entry with a non-zero time waited. Signed-off-by: Frederic Weisbecker Acked-by: Peter Zijlstra Cc: Steven Rostedt LKML-Reference: <1238975373-15739-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar commit 1cad1252ed279ea59f3f8d3d3a5817eeb2f7a4d3 Merge: dcef788 93cfb3c Author: Ingo Molnar Date: Fri Apr 10 12:46:28 2009 +0200 Merge branch 'tracing/urgent' into tracing/core Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes] and pick up the current lineup of tracing/urgent fixes as well Signed-off-by: Ingo Molnar commit cf06de7b9cdd3efee7a59dced1977b3c21d43732 Author: H. Peter Anvin Date: Wed Apr 1 18:20:11 2009 -0700 x86, setup: "glove box" BIOS interrupts in the video code Impact: BIOS proofing "Glove box" off BIOS interrupts in the video code. LKML-Reference: <49DE7F79.4030106@zytor.com> Signed-off-by: H. Peter Anvin Cc: Pavel Machek Cc: Rafael J. Wysocki commit 0a706db320768f8f6e43bbf73b58d2aabdc93354 Author: H. Peter Anvin Date: Wed Apr 1 18:19:00 2009 -0700 x86, setup: "glove box" BIOS interrupts in the MCA code Impact: BIOS proofing "Glove box" off BIOS interrupts in the MCA code. LKML-Reference: <49DE7F79.4030106@zytor.com> Signed-off-by: H. Peter Anvin Cc: James Bottomley commit 3435d3476c5ed955d56a6216ed2d156847b3a575 Author: H. Peter Anvin Date: Wed Apr 1 18:17:17 2009 -0700 x86, setup: "glove box" BIOS interrupts in the EDD code Impact: BIOS proofing "Glove box" off BIOS interrupts in the EDD code. LKML-Reference: <49DE7F79.4030106@zytor.com> Signed-off-by: H. Peter Anvin commit d54ea252e4c92357226992cf65d94616a96e6fce Author: H. Peter Anvin Date: Wed Apr 1 18:14:26 2009 -0700 x86, setup: "glove box" BIOS interrupts in the APM code Impact: BIOS proofing "Glove box" off BIOS interrupts in the APM code. LKML-Reference: <49DE7F79.4030106@zytor.com> Signed-off-by: H. Peter Anvin Cc: Stephen Rothwell commit df7699c56421c0476704f24a43409ac8c505f3d2 Author: H. Peter Anvin Date: Wed Apr 1 18:13:46 2009 -0700 x86, setup: "glove box" BIOS interrupts in the core boot code Impact: BIOS proofing "Glove box" off BIOS interrupts in the core boot code. LKML-Reference: <49DE7F79.4030106@zytor.com> Signed-off-by: H. Peter Anvin commit 7a734e7dd93b9aea08ed51036a9a0e2c9dfd8dac Author: H. Peter Anvin Date: Wed Apr 1 18:08:28 2009 -0700 x86, setup: "glove box" BIOS calls -- infrastructure Impact: new interfaces (not yet used) For all the platforms out there, there is an infinite number of buggy BIOSes. This adds infrastructure to treat BIOS interrupts more like toxic waste and "glove box" them -- we switch out the register set, perform the BIOS interrupt, and then restore the previous state. LKML-Reference: <49DE7F79.4030106@zytor.com> Signed-off-by: H. Peter Anvin Cc: Pavel Machek Cc: Rafael J. Wysocki commit e7c064889606aab3569669078c69b87b2c527e72 Author: Jeremy Fitzhardinge Date: Sat Mar 7 23:48:41 2009 -0800 xen: add FIX_TEXT_POKE to fixmap FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping pfns, not mfns. Signed-off-by: Jeremy Fitzhardinge commit 002f128b473fb82f454654be5081b0919ee01ab2 Author: Paul Turner Date: Wed Apr 8 15:29:43 2009 -0700 sched: remove redundant hierarchy walk in check_preempt_wakeup Impact: micro-optimization Under group scheduling we traverse up until we are at common siblings to make the wakeup comparison on. At this point however, they should have the same parent so continuing to check up the tree is redundant. Signed-off-by: Paul Turner Acked-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit d2de688891909b148efe83a6fc9520a9cd6015f0 Author: Stephen Rothwell Date: Thu Apr 9 15:17:22 2009 +1000 sparc64: extend TI_RESTART_BLOCK space by 8 bytes Impact: build fix Today's linux-next build (sparc64 defconfig) failed like this: arch/sparc/kernel/built-in.o: In function `trap_init': (.init.text+0x4): undefined reference to `thread_info_offsets_are_bolixed_dave' Caused by commit 52400ba946759af28442dee6265c5c0180ac7122 ("futex: add requeue_pi functionality") (from the tip-core tree) which changed the size of struct restart_block. Shift TI_KUNA_REGS and TI_KUNA_INSN up by 8 bytes to make space for the larger restart block. Signed-off-by: Stephen Rothwell Acked-by: "David S. Miller" Cc: Darren Hart LKML-Reference: <20090409151722.c8eabb56.sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar commit e85abf8f432bb2a13733ab7609fbb8e1500af51d Author: Gary Hade Date: Wed Apr 8 14:07:25 2009 -0700 x86: consolidate SMP code in io_apic.c Impact: Cleanup Reorganizes the code in arch/x86/kernel/io_apic.c by combining two '#ifdef CONFIG_SMP' regions. In addition to making the code easier to understand the first '#ifdef CONFIG_SMP' region is moved to a location later in the file which will reduce the need for function forward declarations when the code subsequently revised. The only changes other than relocating code to a different position in the file were the removal of the assign_irq_vector() forward declaration which was no longer needed and some line length reduction formatting changes. Signed-off-by: Gary Hade Cc: lcm@us.ibm.com LKML-Reference: <20090408210725.GC11159@us.ibm.com> Signed-off-by: Ingo Molnar commit e1b9aa3f47242e757c776a3771bb6613e675bf9c Author: Christoph Lameter Date: Thu Apr 2 13:21:44 2009 +0900 percpu: remove rbtree and use page->index instead Impact: use page->index for addr to chunk mapping instead of dedicated rbtree The rbtree is used to determine the chunk from the virtual address. However, we can already determine the page struct from a virtual address and there are several unused fields in page struct used by vmalloc. Use the index field to store a pointer to the chunk. Then there is no need anymore for an rbtree. tj: * s/(set|get)_chunk/pcpu_\1_page_chunk/ * Drop inline from the above two functions and moved them upwards so that they are with other simple helpers. * Initial pages might not (actually most of the time don't) live in the vmalloc area. With the previous patch to manually reverse-map both first chunks, this is no longer an issue. Removed pcpu_set_chunk() call on initial pages. Signed-off-by: Christoph Lameter Signed-off-by: Tejun Heo Cc: Martin Schwidefsky Cc: rusty@rustcorp.com.au Cc: Paul Mundt Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: ralf@linux-mips.org Cc: davem@davemloft.net Cc: cooloney@kernel.org Cc: kyle@mcmartin.ca Cc: matthew@wil.cx Cc: grundler@parisc-linux.org Cc: takata@linux-m32r.org Cc: benh@kernel.crashing.org Cc: rth@twiddle.net Cc: ink@jurassic.park.msu.ru Cc: heiko.carstens@de.ibm.com Cc: Linus Torvalds Cc: Nick Piggin LKML-Reference: <49D43D58.4050102@kernel.org> Signed-off-by: Ingo Molnar commit ae9e6bc9f74f8247cbca50a6a93c80e0d686fa19 Author: Tejun Heo Date: Thu Apr 2 13:19:54 2009 +0900 percpu: don't put the first chunk in reverse-map rbtree Impact: both first chunks don't use rbtree, no functional change There can be two first chunks - reserved and dynamic with the former one being optional. Dynamic first chunk was linked on reverse-mapping rbtree while the reserved one was mapped manually using the start address and reserved offset limit. This patch makes both first chunks to be looked up manually without using the rbtree. This is to help getting rid of the rbtree. Signed-off-by: Tejun Heo Cc: Martin Schwidefsky Cc: rusty@rustcorp.com.au Cc: Paul Mundt Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: ralf@linux-mips.org Cc: davem@davemloft.net Cc: cooloney@kernel.org Cc: kyle@mcmartin.ca Cc: matthew@wil.cx Cc: grundler@parisc-linux.org Cc: takata@linux-m32r.org Cc: benh@kernel.crashing.org Cc: rth@twiddle.net Cc: ink@jurassic.park.msu.ru Cc: heiko.carstens@de.ibm.com Cc: Linus Torvalds Cc: Nick Piggin Cc: Christoph Lameter LKML-Reference: <49D43CEA.3040609@kernel.org> Signed-off-by: Ingo Molnar commit a4e94ef0dd391eae05bdeacd12b8da3510957a97 Author: Zhaolei Date: Fri Mar 27 17:07:05 2009 +0800 printk: add support of hh length modifier for printk Impact: new feature, extend vsprintf format strings hh is used as length modifier for signed char or unsigned char. It is supported by glibc, we add kernel support now. Signed-off-by: Zhao Lei Acked-by: Lai Jiangshan Acked-by: Frederic Weisbecker Cc: torvalds@linux-foundation.org Cc: Steven Rostedt LKML-Reference: <49CC9739.30107@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 42d7c5e353cef9062129b0de3ec9ddf10567b9ca Author: Becky Bruce Date: Wed Apr 8 09:09:21 2009 -0500 swiotlb: change swiotlb_bus_to[phys,virt] prototypes Add a hwdev argument that is needed on some architectures in order to access a per-device offset that is taken into account when producing a physical address (also needed to get from bus address to virtual address because the physical address is an intermediate step). Also make swiotlb_bus_to_virt weak so architectures can override it. Signed-off-by: Becky Bruce Acked-by: FUJITA Tomonori Signed-off-by: Kumar Gala Cc: jeremy@goop.org Cc: ian.campbell@citrix.com LKML-Reference: <1239199761-22886-8-git-send-email-galak@kernel.crashing.org> Signed-off-by: Ingo Molnar commit 380d687833aee098c4a2c3b35beaefe1c1f48d01 Author: Becky Bruce Date: Wed Apr 8 09:09:20 2009 -0500 swiotlb: use swiotlb_sync_single instead of duplicating code Right now both swiotlb_sync_single_range and swiotlb_sync_sg were duplicating the code in swiotlb_sync_single. Just call it instead. Also rearrange the sync_single code for readability. Note that the swiotlb_sync_sg code was previously doing a complicated comparison to determine if an addresses needed to be unmapped where a simple is_swiotlb_buffer() call would have sufficed. Signed-off-by: Becky Bruce Acked-by: FUJITA Tomonori Signed-off-by: Kumar Gala Cc: jeremy@goop.org Cc: ian.campbell@citrix.com LKML-Reference: <1239199761-22886-7-git-send-email-galak@kernel.crashing.org> Signed-off-by: Ingo Molnar commit 7fcebbd2d984eac3fdd6da2f4453e7c43d32de89 Author: Becky Bruce Date: Wed Apr 8 09:09:19 2009 -0500 swiotlb: rename unmap_single to do_unmap_single Previously, swiotlb_unmap_page and swiotlb_unmap_sg were duplicating very similar code. Refactor that code into a new unmap_single and unmap_single use do_unmap_single. Note that the swiotlb_unmap_sg code was previously doing a complicated comparison to determine if an addresses needed to be unmapped where a simple is_swiotlb_buffer() call would have sufficed. Signed-off-by: Becky Bruce Signed-off-by: Kumar Gala Cc: jeremy@goop.org Cc: ian.campbell@citrix.com LKML-Reference: <1239199761-22886-6-git-send-email-galak@kernel.crashing.org> Signed-off-by: Ingo Molnar commit ef5722f698bde01cfec2b98fff733a48663ebf55 Author: Becky Bruce Date: Wed Apr 8 09:09:18 2009 -0500 swiotlb: allow arch override of address_needs_mapping Some architectures require additional checking to determine if a device can dma to an address and need to provide their own address_needs_mapping.. Signed-off-by: Becky Bruce Acked-by: FUJITA Tomonori Signed-off-by: Kumar Gala Cc: jeremy@goop.org Cc: ian.campbell@citrix.com LKML-Reference: <1239199761-22886-5-git-send-email-galak@kernel.crashing.org> Signed-off-by: Ingo Molnar commit dd6b02fe427f30520d0adc94aa52352367227873 Author: Becky Bruce Date: Wed Apr 8 09:09:17 2009 -0500 swiotlb: map_page fix for highmem systems The current code calls virt_to_phys() on address that might be in highmem, which is bad. This wasn't needed, anyway, because we already have the physical address we need. Get rid of the now-unused virtual address as well. Signed-off-by: Becky Bruce Acked-by: FUJITA Tomonori Signed-off-by: Kumar Gala Cc: jeremy@goop.org Cc: ian.campbell@citrix.com LKML-Reference: <1239199761-22886-4-git-send-email-galak@kernel.crashing.org> Signed-off-by: Ingo Molnar commit 67131ad0514d7105b55003a0506209cf1bba3f00 Author: Becky Bruce Date: Wed Apr 8 09:09:16 2009 -0500 swiotlb: fix compile warning Squash a build warning seen on 32-bit powerpc caused by calling min() with 2 different types. Use min_t() instead. Signed-off-by: Becky Bruce Acked-by: FUJITA Tomonori Signed-off-by: Kumar Gala Cc: jeremy@goop.org Cc: ian.campbell@citrix.com LKML-Reference: <1239199761-22886-3-git-send-email-galak@kernel.crashing.org> Signed-off-by: Ingo Molnar commit ceb5ac3264686e75e6951de6a18d4baa9bdecb92 Author: Becky Bruce Date: Wed Apr 8 09:09:15 2009 -0500 swiotlb: comment corrections Impact: cleanup swiotlb_map/unmap_single are now swiotlb_map/unmap_page; trivially change all the comments to reference new names. Also, there were some comments that should have been referring to just plain old map_single, not swiotlb_map_single; fix those as well. Also change a use of the word "pointer", when what is referred to is actually a dma/physical address. Signed-off-by: Becky Bruce Acked-by: FUJITA Tomonori Signed-off-by: Kumar Gala Cc: jeremy@goop.org Cc: ian.campbell@citrix.com LKML-Reference: <1239199761-22886-2-git-send-email-galak@kernel.crashing.org> Signed-off-by: Ingo Molnar commit 02421f98ec55c3ff118f358740ff640f096c7ad6 Author: Yinghai Lu Date: Fri Apr 3 17:15:53 2009 -0700 x86: consistent about warm_reset_vector for UN_NON_UNIQUE_APIC Impact: cleanup didn't set it for UV_NON_UNIQUE_APIC, so don't restore it Signed-off-by: Yinghai Lu LKML-Reference: <49D6A6B9.6060501@kernel.org> Signed-off-by: Ingo Molnar commit cdc1cb0d4445f39561a65204d26f89365f917550 Author: Yinghai Lu Date: Fri Apr 3 17:15:14 2009 -0700 x86: make wakeup_secondary_cpu_via_init static Impact: cleanup Signed-off-by: Yinghai Lu LKML-Reference: <49D6A692.6040400@kernel.org> Signed-off-by: Ingo Molnar commit a59dacfdc9ba06903652fa4883bf1106278b18ec Author: Ingo Molnar Date: Fri Oct 17 14:38:08 2008 +0200 x86 early quirks: eliminate unused function Impact: cleanup this warning: arch/x86/kernel/early-quirks.c:99: warning: ‘ati_ixp4x0_rev’ defined but not used triggers because ati_ixp4x0_rev() is only used in the ACPI && X86_IO_APIC case. Signed-off-by: Ingo Molnar commit 4ecf458492c2d97b3f9d850a5f92d79792e0a7e7 Author: Jiri Slaby Date: Wed Apr 8 13:32:00 2009 +0200 x86_64: fix incorrect comments Impact: cleanup The comments which fxrstor_checking and fxsave_uset refer to is now in fxsave. Change the comments appropriately. Signed-off-by: Jiri Slaby Cc: Jiri Slaby LKML-Reference: <1239190320-23952-3-git-send-email-jirislaby@gmail.com> Signed-off-by: Ingo Molnar commit 34ba476a01e128aad51e02f9be854584e9ec73cf Author: Jiri Slaby Date: Wed Apr 8 13:31:59 2009 +0200 x86: unify restore_fpu_checking Impact: cleanup On x86_32, separate f*rstor to an inline function which makes restore_fpu_checking the same on both platforms -> move it outside the ifdefs. Signed-off-by: Jiri Slaby LKML-Reference: <1239190320-23952-2-git-send-email-jirislaby@gmail.com> Signed-off-by: Ingo Molnar commit fcb2ac5bdfa3a7a04fb9749b916f64400f4c35a8 Author: Jiri Slaby Date: Wed Apr 8 13:31:58 2009 +0200 x86_32: introduce restore_fpu_checking() Impact: cleanup, prepare FPU code unificaton Like on x86_64, return an error from restore_fpu and kill the task if it fails. Also rename restore_fpu to restore_fpu_checking which allows ifdefs to be removed in math_state_restore(). Signed-off-by: Jiri Slaby LKML-Reference: <1239190320-23952-1-git-send-email-jirislaby@gmail.com> Signed-off-by: Ingo Molnar commit bab5bc9e857638880facef76e4b4c3fa807f8c73 Author: Darren Hart Date: Tue Apr 7 23:23:50 2009 -0700 futex: fixup unlocked requeue pi case Thomas's testing caught a problem when the requeue target futex is unowned and multiple tasks are requeued to it. This patch ensures the FUTEX_WAITERS bit gets set if futex_requeue() will requeue one or more tasks in addition to the one acquiring the lock. Signed-off-by: Darren Hart Signed-off-by: Thomas Gleixner commit a34b50ddc265bae058c66661b096ef6384c5a8b1 Author: Ingo Molnar Date: Wed Apr 8 10:56:54 2009 +0200 mm, x86, ptrace, bts: defer branch trace stopping, remove dead code Remove the unused free_locked_buffer() API. Signed-off-by: Ingo Molnar commit 44bc9dc729e33a4ec6ebed4d0b6c08e8d20b42cf Author: Ingo Molnar Date: Wed Apr 8 10:47:17 2009 +0200 mm, x86, ptrace, bts: defer branch trace stopping, cleanup Andrew Morton noticed that mm.h needlessly includes sched.h - remove it. Reported-by: Andrew Morton Signed-off-by: Ingo Molnar commit 169aafbc8d3f05431b5cfeb60294a12b8ef2bcee Author: Jeremy Fitzhardinge Date: Tue Apr 7 13:37:26 2009 -0700 lguest: update lazy mmu changes to match lguest's use of kvm hypercalls Duplicate hcall -> kvm_hypercall0 convertion from "lguest: use KVM hypercalls". Signed-off-by: Jeremy Fitzhardinge Cc: Matias Zabaljauregui Cc: Rusty Russell commit 38f4b8c0da01ae7cd9b93386842ce272d6fde9ab Merge: a811454 8e2c4f2 Author: Jeremy Fitzhardinge Date: Tue Apr 7 13:34:16 2009 -0700 Merge commit 'origin/master' into for-linus/xen/master * commit 'origin/master': (4825 commits) Fix build errors due to CONFIG_BRANCH_TRACER=y parport: Use the PCI IRQ if offered tty: jsm cleanups Adjust path to gpio headers KGDB_SERIAL_CONSOLE check for module Change KCONFIG name tty: Blackin CTS/RTS Change hardware flow control from poll to interrupt driven Add support for the MAX3100 SPI UART. lanana: assign a device name and numbering for MAX3100 serqt: initial clean up pass for tty side tty: Use the generic RS485 ioctl on CRIS tty: Correct inline types for tty_driver_kref_get() splice: fix deadlock in splicing to file nilfs2: support nanosecond timestamp nilfs2: introduce secondary super block nilfs2: simplify handling of active state of segments nilfs2: mark minor flag for checkpoint created by internal operation nilfs2: clean up sketch file nilfs2: super block operations fix endian bug ... Conflicts: arch/x86/include/asm/thread_info.h arch/x86/lguest/boot.c drivers/xen/manage.c commit a811454027352c762e0d5bba1b1d8f7d26bf96ae Merge: 1943689 53152f9 Author: Jeremy Fitzhardinge Date: Tue Apr 7 13:16:04 2009 -0700 Merge branch 'for-linus/xen/core' into for-linus/xen/master * for-linus/xen/core: xen: honour VCPU availability on boot commit dcef788eb9659b61a2110284fcce3ca6e63480d2 Author: Zhaolei Date: Tue Mar 31 15:26:14 2009 +0800 ftrace: clean up enable logic for sched_switch Unify sched_switch and sched_wakeup's action to following logic: Do record_cmdline when start_cmdline_record() is called. Start tracing events when the tracer is started. Signed-off-by: Zhao Lei LKML-Reference: <49D1C596.5050203@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 597af81537654097b67fd7a0c92775e66d4a86fe Author: Steven Rostedt Date: Fri Apr 3 15:24:12 2009 -0400 function-graph: use int instead of atomic for ftrace_graph_active Impact: cleanup The variable ftrace_graph_active is only modified under the ftrace_lock mutex, thus an atomic is not necessary for modification. Reported-by: Andrew Morton Signed-off-by: Steven Rostedt commit 5452af664f6fba26b80eb2c8c4ceae2999d5cf56 Author: Frederic Weisbecker Date: Fri Mar 27 00:25:38 2009 +0100 tracing/ftrace: factorize the tracing files creation Impact: cleanup Most of the tracing files creation follow the same pattern: ret = debugfs_create_file(...) if (!ret) pr_warning("Couldn't create ... entry\n") Unify it! Reported-by: Ingo Molnar Signed-off-by: Frederic Weisbecker LKML-Reference: <1238109938-11840-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Steven Rostedt commit a5dec5573f3c7e63f2f9b5852b9759ea342a5ff9 Author: Li Zefan Date: Fri Mar 27 14:55:44 2009 +0800 tracing: use macros to denote usec and nsec per second Impact: cleanup Use USEC_PER_SEC and NSEC_PER_SEC instead of 1000000 and 1000000000. Signed-off-by: Li Zefan LKML-Reference: <49CC7870.9000309@cn.fujitsu.com> Acked-by: Frederic Weisbecker Signed-off-by: Steven Rostedt commit 86665c75da41889f92b774f31ea5a9a436f392a8 Merge: 93776a8 1bbe2a8 Author: Ingo Molnar Date: Tue Apr 7 14:41:14 2009 +0200 Merge branch 'tracing/urgent' into tracing/ftrace commit 93776a8ec746cf9d32c36e5a5b23d28d8be28826 Merge: 34886c8 d508afb Author: Ingo Molnar Date: Tue Apr 7 13:47:33 2009 +0200 Merge branch 'linus' into tracing/core Merge reason: update to upstream tracing facilities Signed-off-by: Ingo Molnar commit 017bc617657c928cb9a0c45a7a7e9f4e66695347 Author: Markus Metzger Date: Fri Apr 3 16:43:52 2009 +0200 x86, ds: support Core i7 Add debug store support for Core i7. Core i7 adds a reset value for each performance counter and a new PEBS record format. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144607.088997000@intel.com> Signed-off-by: Ingo Molnar commit 150f5164c1258e05b7dea16f29e592f354c48f34 Author: Markus Metzger Date: Fri Apr 3 16:43:51 2009 +0200 x86, ds: allow small debug store buffers Check the buffer size more precisely to allow buffers for exactly one element provided the base address is already properly aligned. Add a debug store selftest. Reported-by: Stephane Eranian Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144606.139137000@intel.com> Signed-off-by: Ingo Molnar commit 608780a9048efa3e85fbc4d8649b26805cc588aa Author: Markus Metzger Date: Fri Apr 3 16:43:50 2009 +0200 x86, ds: fix bad ds_reset_pebs() Ds_reset_pebs() passed the wrong qualifier to a shared function resulting in a reset of bts, rather than pebs. Reported-by: Stephane Eranian Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144605.206510000@intel.com> Signed-off-by: Ingo Molnar commit 6047550d3d26fed88b18a208b31f8b90b5ef3e9b Author: Markus Metzger Date: Fri Apr 3 16:43:49 2009 +0200 x86, ds: dont use TIF_DEBUGCTLMSR Debug store already uses TIF_DS_AREA_MSR to trigger debug store context switch handling. No need to use TIF_DEBUGCTLMSR, as well. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144604.256645000@intel.com> Signed-off-by: Ingo Molnar commit 0f4814065ff8c24ca8bfd75c9b73502be152c287 Author: Markus Metzger Date: Fri Apr 3 16:43:48 2009 +0200 x86, ptrace: add bts context unconditionally Add the ptrace bts context field to task_struct unconditionally. Initialize the field directly in copy_process(). Remove all the unneeded functionality used to initialize that field. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144603.292754000@intel.com> Signed-off-by: Ingo Molnar commit ee811517a5604aa63fae803b7c044712699e1303 Author: Markus Metzger Date: Fri Apr 3 16:43:47 2009 +0200 x86, ds: use single debug store cpu configuration Use a single configuration for all cpus. Reported-by: Ingo Molnar Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144602.191165000@intel.com> Signed-off-by: Ingo Molnar commit 2311f0de21c17b2a8b960677a9cccfbfa52beb35 Author: Markus Metzger Date: Fri Apr 3 16:43:46 2009 +0200 x86, ds: add leakage warning Add a warning in case a debug store context is not removed before the task it is attached to is freed. Remove the old warning at thread exit. It is too early. Declare the debug store context field in thread_struct unconditionally. Remove ds_copy_thread() and ds_exit_thread() and do the work directly in process*.c. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144601.254472000@intel.com> Signed-off-by: Ingo Molnar commit 3a68eef945b234f286406d96dc690fe17863c203 Author: Markus Metzger Date: Fri Apr 3 16:43:45 2009 +0200 x86, ds: add task tracing selftest Add selftests to cover per-task branch tracing. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144600.329346000@intel.com> Signed-off-by: Ingo Molnar commit 01f6569ece6915616f6cae1d7d8b46ab8da9c1bd Author: Markus Metzger Date: Fri Apr 3 16:43:44 2009 +0200 x86, ds: selftest each cpu Perform debug store selftests on each cpu. Cover both the normal and the _noirq variant of the debug store interface. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144559.394583000@intel.com> Signed-off-by: Ingo Molnar commit 84f201139245c30777ff858e71b8d7e134b8c3ed Author: Markus Metzger Date: Fri Apr 3 16:43:43 2009 +0200 x86, ds: fix bounds check in ds selftest Fix a bad bounds check in the debug store selftest. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144558.450027000@intel.com> Signed-off-by: Ingo Molnar commit 353afeea24cc51aafc0ff21a72ec740b6f0af50c Author: Markus Metzger Date: Fri Apr 3 16:43:42 2009 +0200 x86, ds: fix compiler warning Size_t is defined differently on i386 and x86_64. Change type to avoid compiler warning. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144557.523964000@intel.com> Signed-off-by: Ingo Molnar commit 4d657e51dfc042216febd4a007c6f36881f9256d Author: Markus Metzger Date: Fri Apr 3 16:43:41 2009 +0200 x86, hw-branch-tracer: allocate selftest iterator on heap Allocate the trace_iterator for the hw-branch-tracer selftest on the heap. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144556.578777000@intel.com> Signed-off-by: Ingo Molnar commit de79f54f5347ad7ec6ff55ccbb6d4ab2a21f6a93 Author: Markus Metzger Date: Fri Apr 3 16:43:40 2009 +0200 x86, bts, hw-branch-tracer: add _noirq variants to the debug store interface The hw-branch-tracer uses debug store functions from an on_each_cpu() context, which is simply wrong since the functions may sleep. Add _noirq variants for most functions, which may be called with interrupts disabled. Separate per-cpu and per-task tracing and allow per-cpu tracing to be controlled from any cpu. Make the hw-branch-tracer use the new debug store interface, synchronize with hotplug cpu event using get/put_online_cpus(), and remove the unnecessary spinlock. Make the ptrace bts and the ds selftest code use the new interface. Defer the ds selftest. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144555.658136000@intel.com> Signed-off-by: Ingo Molnar commit 35bb7600c17762bb129588c1877d2717fe325289 Author: Markus Metzger Date: Fri Apr 3 16:43:39 2009 +0200 x86, debugctlmsr: add _on_cpu variants to debugctlmsr functions Add functions to get and set the debugctlmsr on different cpus. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144554.738772000@intel.com> Signed-off-by: Ingo Molnar commit 15879d042164650b93d83281ad5f87ad323bfbfe Author: Markus Metzger Date: Fri Apr 3 16:43:38 2009 +0200 x86, bts: use trace_clock_global() for timestamps Rename the bts_struct timestamp field to event. Use trace_clock_global() for time measurement. Reported-by: Ingo Molnar Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144553.773216000@intel.com> Signed-off-by: Ingo Molnar commit 38f801129ad07b9afa7f9bd3779f61b805416d8c Author: Markus Metzger Date: Fri Apr 3 16:43:37 2009 +0200 x86, bts: fix race between per-task and per-cpu branch tracing Per-task branch tracing installs a debug store context with the traced task. This immediately results in the branch trace control bits to be cleared for the next context switch of that task, if not set before. Either per-cpu or per-task tracing are allowed at the same time. An active per-cpu tracing would be disabled even if the per-task tracing request is rejected and the task debug store context removed. Check the tracing type (per-cpu or per-task) before installing a task debug store context. Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144552.856000000@intel.com> Signed-off-by: Ingo Molnar commit 8d99b3ac2726e5edd97ad147fa5c1f2acb63a745 Author: Markus Metzger Date: Fri Apr 3 16:43:36 2009 +0200 x86, bts: wait until traced task has been scheduled out In order to stop branch tracing for a running task, we need to first clear the branch tracing control bits before we may free the tracing buffer. If the traced task is running, the cpu might still trace that task after the branch trace control bits have cleared. Wait until the traced task has been scheduled out before proceeding. A similar problem affects the task debug store context. We first remove the context, then we need to wait until the task has been scheduled out before we can free the context memory. Reviewed-by: Oleg Nesterov Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144551.919636000@intel.com> Signed-off-by: Ingo Molnar commit e2b371f00a6f529f6362654239bdec8dcd510760 Author: Markus Metzger Date: Fri Apr 3 16:43:35 2009 +0200 mm, x86, ptrace, bts: defer branch trace stopping When a ptraced task is unlinked, we need to stop branch tracing for that task. Since the unlink is called with interrupts disabled, and we need interrupts enabled to stop branch tracing, we defer the work. Collect all branch tracing related stuff in a branch tracing context. Reviewed-by: Oleg Nesterov Signed-off-by: Markus Metzger Cc: Andrew Morton Cc: Peter Zijlstra Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144550.712401000@intel.com> Signed-off-by: Ingo Molnar commit a26b89f05d194413c7238e0bea071054f6b5d3c8 Author: Markus Metzger Date: Fri Apr 3 16:43:34 2009 +0200 sched, hw-branch-tracer: add wait_task_context_switch() function to sched.h Add a function to wait until some other task has been switched out at least once. This differs from wait_task_inactive() subtly, in that the latter will wait until the task has left the CPU. Signed-off-by: Markus Metzger Cc: markus.t.metzger@gmail.com Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: oleg@redhat.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144549.794157000@intel.com> Signed-off-by: Ingo Molnar commit cac94f979326212831c0ea44ed9ea1622b4f4e93 Author: Markus Metzger Date: Fri Apr 3 16:43:33 2009 +0200 x86, bts: fix race when bts tracer is removed When the bts tracer is removed while the traced task is running, the write to clear the bts tracer pointer races with context switch code. Read the tracer once during a context switch. When a new tracer is installed, the bts tracer is set in the ds context before the tracer is initialized in order to claim the context for that tracer. This may result in write accesses using an uninitialized trace configuration when scheduling timestamps have been requested. Store active tracing flags separately and only set active flags after the tracing configuration has been initialized. Reviewed-by: Oleg Nesterov Signed-off-by: Markus Metzger Cc: roland@redhat.com Cc: eranian@googlemail.com Cc: juan.villacis@intel.com Cc: ak@linux.jf.intel.com LKML-Reference: <20090403144548.881338000@intel.com> Signed-off-by: Ingo Molnar commit 2e8844e13ab73f1107aea4317a53ff5879f2e1d7 Merge: c78a395 d508afb Author: Ingo Molnar Date: Tue Apr 7 13:34:26 2009 +0200 Merge branch 'linus' into tracing/hw-branch-tracing Merge reason: update to latest tracing and ptrace APIs Signed-off-by: Ingo Molnar commit 52400ba946759af28442dee6265c5c0180ac7122 Author: Darren Hart Date: Fri Apr 3 13:40:49 2009 -0700 futex: add requeue_pi functionality PI Futexes and their underlying rt_mutex cannot be left ownerless if there are pending waiters as this will break the PI boosting logic, so the standard requeue commands aren't sufficient. The new commands properly manage pi futex ownership by ensuring a futex with waiters has an owner at all times. This will allow glibc to properly handle pi mutexes with pthread_condvars. The approach taken here is to create two new futex op codes: FUTEX_WAIT_REQUEUE_PI: Tasks will use this op code to wait on a futex (such as a non-pi waitqueue) and wake after they have been requeued to a pi futex. Prior to returning to userspace, they will acquire this pi futex (and the underlying rt_mutex). futex_wait_requeue_pi() is the result of a high speed collision between futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() being done by futex_proxy_trylock_atomic() on behalf of the top_waiter). FUTEX_REQUEUE_PI (and FUTEX_CMP_REQUEUE_PI): This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI, regardless of how many tasks the caller intends to wake or requeue. pthread_cond_broadcast() should call this with nr_wake=1 and nr_requeue=INT_MAX. pthread_cond_signal() should call this with nr_wake=1 and nr_requeue=0. The reason being we need both callers to get the benefit of the futex_proxy_trylock_atomic() routine. futex_requeue() also enqueues the top_waiter on the rt_mutex via rt_mutex_start_proxy_lock(). Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit f801073f87aa22ddf0e9146355fec3993163790f Author: Darren Hart Date: Fri Apr 3 13:40:40 2009 -0700 futex: split out futex value validation code Refactor the code to validate the expected futex value in order to reuse it with the requeue_pi code. Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit 9121e4783cd5c7e2a407763f3b61c2d573891133 Author: Darren Hart Date: Fri Apr 3 13:40:31 2009 -0700 futex: distangle futex_requeue() futex_requeue() is getting a bit long-winded, and will be getting more so after the requeue_pi patch. Factor out the actual requeueing into a nicely contained inline function to reduce function length and improve legibility. Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit a72188d8a64ebe74722f1cf7ffac41b41ffdba21 Author: Darren Hart Date: Fri Apr 3 13:40:22 2009 -0700 futex: add FUTEX_HAS_TIMEOUT flag to restart.futex.flags Currently restart is only used if there is a timeout. The requeue_pi functionality requires restarting to futex_lock_pi() on signal after wakeup in futex_wait_requeue_pi() regardless of if there was a timeout or not. Using 0 for the timeout value is confusing as that could indicate an expired timer. The flag makes this explicit. While the check is not technically needed in futex_wait_restart(), doing so makes the code consistent with and will avoid confusion should the need arise to restart wait without a timeout. Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit 8dac456a681bd94272ff50ecb31be6b669382c2b Author: Darren Hart Date: Fri Apr 3 13:40:12 2009 -0700 rt_mutex: add proxy lock routines This patch is a prerequisite for futex requeue_pi. It basically splits rt_mutex_slowlock() right down the middle, just before the first call to schedule(). It further adds helper functions which make use of the split and provide the rt-mutex preliminaries for futex requeue_pi. Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit dd9739980b50c8cde33e1f8eb08b7e0140bcd61e Author: Darren Hart Date: Fri Apr 3 13:40:02 2009 -0700 futex: split out fixup owner logic from futex_lock_pi() Refactor the post lock acquisition logic from futex_lock_pi(). This code will be reused in futex_wait_requeue_pi(). Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit 1a52084d0919c2799258737c21fb328a9de159b5 Author: Darren Hart Date: Fri Apr 3 13:39:52 2009 -0700 futex: split out atomic logic from futex_lock_pi() Refactor the atomic portion of futex_lock_pi() into futex_lock_pi_atomic(). This logic will be needed by requeue_pi, so modularize it to reduce code duplication. The only significant change is passing of the task to try and take the lock for. This simplifies the -EDEADLK test as if the lock is owned by task t, it's a deadlock, regardless of if we are doing requeue pi or not. This patch updates the corresponding comment accordingly. Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit 4b1c486b3587d2abf50bee4a05eb488cd4045f2c Author: Darren Hart Date: Fri Apr 3 13:39:42 2009 -0700 futex: add helper to find the top prio waiter of a futex Improve legibility by wrapping finding the top waiter in a function. This will be used by the follow-on patches for enabling requeue pi. Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit ca5f9524d61f54b1f618293ab92fc6b49cac864d Author: Darren Hart Date: Fri Apr 3 13:39:33 2009 -0700 futex: separate futex_wait_queue_me() logic from futex_wait() Refactor futex_wait() in preparation for futex_wait_requeue_pi(). In order to reuse a good chunk of the futex_wait() code for the upcoming futex_wait_requeue_pi() function, this patch breaks out the queue-to-wakeup section of futex_wait() into futex_wait_queue_me(). Signed-off-by: Darren Hart Reviewed-by: Thomas Gleixner Signed-off-by: Thomas Gleixner commit 31c9a24ec82926fcae49483e53566d231e705057 Author: Paul E. McKenney Date: Thu Apr 2 21:06:25 2009 -0700 RCU: make treercu be default Impact: switch default config from CLASSIC_RCU to TREE_RCU Given that I have not gotten any complaints or bug reports on treercu recently, this patch makes it be the default. There are a number of other defconfig files that explicitly call out CLASSIC_RCU, but which have comment headers saying not to edit them. Probably holdovers from one of the flavors of "make config", but... Signed-off-by: Paul E. McKenney Cc: akpm@linux-foundation.org Cc: dipankar@in.ibm.com Cc: niv@us.ibm.com Cc: manfred@colorfullife.com Cc: peterz@infradead.org LKML-Reference: <20090403040625.GA9473@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar commit 53152f957d4a5dfd537d17c823afeb1a2c03753e Author: Ian Campbell Date: Thu Apr 2 13:24:28 2009 +0100 xen: honour VCPU availability on boot If a VM is booted with offline VCPUs then unplug them during boot. Determining the availability of a VCPU requires access to XenStore which is not available at the point smp_prepare_cpus() is called, therefore we bring up all VCPUS initially and unplug the offline ones as soon as XenStore becomes available. Signed-off-by: Ian Campbell commit 1943689c47acaad7fc7f3dae8d35ef82de5d48f4 Merge: 6d02c42 0a4666b c6a960c 818fd20 f078370 Author: Jeremy Fitzhardinge Date: Mon Mar 30 10:00:45 2009 -0700 Merge branches 'for-linus/xen/dev-evtchn', 'for-linus/xen/xenbus', 'for-linus/xen/xenfs' and 'for-linus/xen/sys-hypervisor' into for-linus/xen/master * for-linus/xen/dev-evtchn: xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn * for-linus/xen/xenbus: xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook * for-linus/xen/xenfs: xen: add "capabilities" file * for-linus/xen/sys-hypervisor: xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support Conflicts: drivers/xen/Makefile commit 818fd20673df82031e604bb784d836f1fc2e2451 Author: Jeremy Fitzhardinge Date: Fri Feb 6 18:46:47 2009 -0800 xen: add "capabilities" file The xenfs capabilities file allows usermode to determine what capabilities the domain has. The only one at present is "control_d" in a privileged domain. Signed-off-by: Jeremy Fitzhardinge commit f0783708bf63a2827863cf2be57c08a24843e6bd Author: Ian Campbell Date: Wed Mar 11 10:19:54 2009 +0000 xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet I needed this to compile since there is no kexec yet in pvops kernel CC drivers/xen/sys-hypervisor.o drivers/xen/sys-hypervisor.c: In function 'hyper_sysfs_init': drivers/xen/sys-hypervisor.c:405: error: 'vmcoreinfo_size_xen' undeclared (first use in this function) drivers/xen/sys-hypervisor.c:405: error: (Each undeclared identifier is reported only once drivers/xen/sys-hypervisor.c:405: error: for each function it appears in.) drivers/xen/sys-hypervisor.c:406: error: implicit declaration of function 'xen_sysfs_vmcoreinfo_init' drivers/xen/sys-hypervisor.c: In function 'hyper_sysfs_exit': drivers/xen/sys-hypervisor.c:433: error: 'vmcoreinfo_size_xen' undeclared (first use in this function) drivers/xen/sys-hypervisor.c:434: error: implicit declaration of function 'xen_sysfs_vmcoreinfo_destroy' Signed-off-by: Ian Campbell commit a649b720614d5675dc402bef75a92576143fede7 Author: Jeremy Fitzhardinge Date: Tue Mar 10 17:17:41 2009 -0700 xen/sys/hypervisor: change writable_pt to features /sys/hypervisor/properties/writable_pt was misnamed. Rename to features, expressed as a bit array in hex. Signed-off-by: Jeremy Fitzhardinge commit cff7e81b3dd7c25cd2248cd7a04c5764552d5d55 Author: Jeremy Fitzhardinge Date: Tue Mar 10 14:39:59 2009 -0700 xen: add /sys/hypervisor support Adds support for Xen info under /sys/hypervisor. Taken from Novell 2.6.27 backport tree. Signed-off-by: Jeremy Fitzhardinge commit c6a960ce8858f20036cc3afc3b9422670d0d9021 Author: Jeremy Fitzhardinge Date: Mon Feb 9 12:05:53 2009 -0800 xen/xenbus: export xenbus_dev_changed Signed-off-by: Jeremy Fitzhardinge commit de5b31bd47de7e6f41be2e271318dbc8f1af354d Author: Ian Campbell Date: Mon Feb 9 12:05:50 2009 -0800 xen: use device model for suspending xenbus devices Signed-off-by: Ian Campbell Signed-off-by: Jeremy Fitzhardinge commit a1ce1be578365a4da7e7d7db4812539d2d5da763 Author: Ian Campbell Date: Mon Feb 9 12:05:50 2009 -0800 xen: remove suspend_cancel hook Remove suspend_cancel hook from xenbus_driver, in preparation for using the device model for suspending. Signed-off-by: Ian Campbell Signed-off-by: Jeremy Fitzhardinge commit 0a4666b539a0e896ec4e8396a034a479e3573125 Author: Jeremy Fitzhardinge Date: Thu Feb 12 13:03:24 2009 -0800 xen/dev-evtchn: clean up locking in evtchn Define a new per_user_data mutex to serialize bind/unbind operations to prevent them from racing with each other. Fix error returns and don't do a bind while holding a spinlock. Signed-off-by: Jeremy Fitzhardinge commit c5cfef0f79cacc3aa438fc28f4747f0d10c54d0d Author: Ian Campbell Date: Fri Feb 6 19:21:19 2009 -0800 xen: export ioctl headers to userspace Signed-off-by: Ian Campbell Signed-off-by: Jeremy Fitzhardinge commit f7116284c734f3a47180cd9c907944a1837ccb3c Author: Ian Campbell Date: Fri Feb 6 19:21:19 2009 -0800 xen: add /dev/xen/evtchn driver This driver is used by application which wish to receive notifications from the hypervisor or other guests via Xen's event channel mechanism. In particular it is used by the xenstore daemon in domain 0. Signed-off-by: Ian Campbell Signed-off-by: Jeremy Fitzhardinge commit d4c045364d3107603187f21a56ec231e74d26441 Author: Ian Campbell Date: Fri Feb 6 19:20:31 2009 -0800 xen: add irq_from_evtchn Given an evtchn, return the corresponding irq. Signed-off-by: Ian Campbell Signed-off-by: Jeremy Fitzhardinge commit 6d02c42698f99eccb290ac53d4f10ca883b9f90c Author: Jeremy Fitzhardinge Date: Sun Mar 29 22:57:15 2009 -0700 xen: clean up gate trap/interrupt constants Use GATE_INTERRUPT/TRAP rather than 0xe/f. Signed-off-by: Jeremy Fitzhardinge commit 707ebbc81c61eb480d8a51ca61e355e240df1d32 Author: Jeremy Fitzhardinge Date: Fri Mar 27 11:29:02 2009 -0700 xen: set _PAGE_NX in __supported_pte_mask before pagetable construction Some 64-bit machines don't support the NX flag in ptes. Check for NX before constructing the kernel pagetables. Signed-off-by: Jeremy Fitzhardinge commit 1e6fcf840e11ceff8a656a678c6e4b0560a98e08 Author: Ian Campbell Date: Wed Mar 25 17:46:42 2009 +0000 xen: resume interrupts before system devices. Impact: bugfix Xen domain restore Otherwise the first timer interrupt after resume is missed and we never get another. Signed-off-by: Ian Campbell Signed-off-by: Jeremy Fitzhardinge commit 8de07bbdede03598801cf33ab23dcbcd28a918d2 Author: Jeremy Fitzhardinge Date: Wed Mar 4 17:36:57 2009 -0800 xen/mmu: weaken flush_tlb_other test Impact: fixes crashing bug There's no particular problem with getting an empty cpu mask, so just shortcut-return if we get one. Avoids crash reported by Christophe Saout Signed-off-by: Jeremy Fitzhardinge commit 4185f35404dc96f8525298c7c548aee419f3b3f4 Author: Jeremy Fitzhardinge Date: Tue Mar 17 13:30:55 2009 -0700 xen/mmu: some early pagetable cleanups 1. make sure early-allocated ptes are pinned, so they can be later unpinned 2. don't pin pmd+pud, just make them RO 3. scatter some __inits around Signed-off-by: Jeremy Fitzhardinge commit 5f241e65f2be4661a33e1937e1c829252a80b2b8 Author: Jeremy Fitzhardinge Date: Mon Mar 16 17:08:48 2009 -0700 x86-64: non-paravirt systems always has PSE and PGE A paravirtualized system may not have PSE or PGE available to guests, so they are not required features. However, without paravirt we can assume that any x86-64 implementation will have them available. Signed-off-by: Jeremy Fitzhardinge commit 1e7449730853e7c9ae9a2458b2ced7ba12559a0e Author: Alex Nixon Date: Mon Feb 9 12:05:46 2009 -0800 Xen: Add virt_to_pfn helper function Signed-off-by: Alex Nixon commit 68509cdcde6583ee1a9542899d1270449c7d5903 Author: Jeremy Fitzhardinge Date: Sun Mar 8 03:59:04 2009 -0700 x86-64: remove PGE from must-have feature list PGE may not be available when running paravirtualized, so test the cpuid bit before using it. Signed-off-by: Jeremy Fitzhardinge commit e826fe1ba1563a9272345da8e3279a930ac160a7 Author: Jeremy Fitzhardinge Date: Sat Mar 7 17:09:27 2009 -0800 xen: mask XSAVE from cpuid Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE to be set. This confuses the kernel and it ends up crashing on an xsetbv instruction. At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of cpuid it we can't. This will produce a spurious error from Xen, but allows us to support XSAVE if/when Xen does. This also factors out the cpuid mask decisions to boot time. Signed-off-by: Jeremy Fitzhardinge commit e9e2d1ffcfdb38bed11a3064aa74bea9ee38ed80 Author: Hannes Eder Date: Thu Mar 5 20:13:57 2009 +0100 NULL noise: arch/x86/xen/smp.c Fix this sparse warnings: arch/x86/xen/smp.c:316:52: warning: Using plain integer as NULL pointer arch/x86/xen/smp.c:421:60: warning: Using plain integer as NULL pointer Signed-off-by: Hannes Eder Signed-off-by: Jeremy Fitzhardinge commit b4b7e58590d0e94ed78bd6be1aa163caba7b6c74 Author: Jeremy Fitzhardinge Date: Wed Mar 4 16:34:27 2009 -0800 xen: remove xen_load_gdt debug Don't need the noise. Signed-off-by: Jeremy Fitzhardinge commit 3ce5fa7ebff74b6a4dc5fdcdc22e6979f5a4ff85 Author: Jeremy Fitzhardinge Date: Wed Mar 4 15:26:00 2009 -0800 xen: make xen_load_gdt simpler Remove use of multicall machinery which is unused (gdt loading is never performance critical). This removes the implicit use of percpu variables, which simplifies understanding how the percpu code's use of load_gdt interacts with this code. Signed-off-by: Jeremy Fitzhardinge commit 6ed6bf428aff64fe37cdc54b239d598fee6016f1 Author: Jeremy Fitzhardinge Date: Wed Mar 4 13:02:18 2009 -0800 xen: clean up xen_load_gdt Makes the logic a bit clearer. Signed-off-by: Jeremy Fitzhardinge commit 7571a60446030d2576d881438447e86a0755a83b Author: Jeremy Fitzhardinge Date: Fri Feb 27 15:34:59 2009 -0800 xen: split construction of p2m mfn tables from registration Build the p2m_mfn_list_list early with the rest of the p2m table, but register it later when the real shared_info structure is in place. Signed-off-by: Jeremy Fitzhardinge commit a2bcd4731f77cb77ae4b5e4a3d7f5471cf346c33 Author: Ingo Molnar Date: Sun Mar 29 23:47:48 2009 +0200 x86/mm: further cleanups of fault.c's include file section Impact: cleanup Eliminate more than 20 unnecessary #include lines in fault.c Also fix include file dependency bug in asm/traps.h. (this was masked before, by implicit inclusion) Signed-off-by: Ingo Molnar LKML-Reference: Acked-by: H. Peter Anvin commit 59d7187142bbe9b404a403ed0f874d3227305f26 Author: Jeremy Fitzhardinge Date: Thu Feb 26 15:48:33 2009 -0800 xen: separate p2m allocation from setting When doing very early p2m setting, we need to separate setting from allocation, so split things up accordingly. Signed-off-by: Jeremy Fitzhardinge commit 5caecb9432428241d0c641897f07ff4003f1b55f Author: Jeremy Fitzhardinge Date: Fri Feb 20 23:01:26 2009 -0800 xen: disable preempt for leave_lazy_mmu xen_mc_flush() requires preemption to be disabled for its own sanity, so disable it while we're flushing. Signed-off-by: Jeremy Fitzhardinge commit ab2f75f0b760d2b0c9a875b669a1b51dce02c85a Author: Jeremy Fitzhardinge Date: Wed Feb 18 00:18:50 2009 -0800 x86/paravirt: use percpu_ rather than __get_cpu_var Impact: minor optimisation percpu_read/write is a slightly more direct way of getting to percpu data. Signed-off-by: Jeremy Fitzhardinge commit 252a6bf2a3a7e7add56b17d48aecf3f3ef213103 Author: Jeremy Fitzhardinge Date: Wed Feb 18 00:11:28 2009 -0800 mm: allow preemption in apply_to_pte_range Impact: allow preemption in apply_to_pte_range updates to init_mm Preemption is now allowed for lazy mmu mode, so don't disable it for the inner loop of apply_to_pte_range. This only applies when doing updates to init_mm; user pagetables are still modified under the pte lock, so preemption is disabled anyway. Signed-off-by: Jeremy Fitzhardinge Acked-by: Peter Zijlstra commit 2829b449276aed45f3d649efb21e3418e39dd5d1 Author: Jeremy Fitzhardinge Date: Tue Feb 17 23:53:19 2009 -0800 x86/paravirt: allow preemption with lazy mmu mode Impact: remove obsolete checks, simplification Lift restrictions on preemption with lazy mmu mode, as it is now allowed. Signed-off-by: Jeremy Fitzhardinge Acked-by: Peter Zijlstra commit 224101ed69d3fbb486868e0f6e0f9fa37302efb4 Author: Jeremy Fitzhardinge Date: Wed Feb 18 11:18:57 2009 -0800 x86/paravirt: finish change from lazy cpu to context switch start/end Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge Acked-by: Peter Zijlstra commit b407fc57b815b2016186220baabc76cc8264206e Author: Jeremy Fitzhardinge Date: Tue Feb 17 23:46:21 2009 -0800 x86/paravirt: flush pending mmu updates on context switch Impact: allow preemption during lazy mmu updates If we're in lazy mmu mode when context switching, leave lazy mmu mode, but remember the task's state in TIF_LAZY_MMU_UPDATES. When we resume the task, check this flag and re-enter lazy mmu mode if its set. This sets things up for allowing lazy mmu mode while preemptible, though that won't actually be active until the next change. Signed-off-by: Jeremy Fitzhardinge Acked-by: Peter Zijlstra commit 7fd7d83d49914f03aefffba6aee09032fcd54cce Author: Jeremy Fitzhardinge Date: Tue Feb 17 23:24:03 2009 -0800 x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switch Impact: simplification, prepare for later changes Make lazy cpu mode more specific to context switching, so that it makes sense to do more context-switch specific things in the callbacks. Signed-off-by: Jeremy Fitzhardinge Acked-by: Peter Zijlstra commit b8bcfe997e46150fedcc3f5b26b846400122fdd9 Author: Jeremy Fitzhardinge Date: Tue Feb 17 23:05:19 2009 -0800 x86/paravirt: remove lazy mode in interrupts Impact: simplification, robustness Make paravirt_lazy_mode() always return PARAVIRT_LAZY_NONE when in an interrupt. This prevents interrupt code from accidentally inheriting an outer lazy state, and instead does everything synchronously. Outer batched operations are left deferred. Signed-off-by: Jeremy Fitzhardinge Acked-by: Peter Zijlstra Cc: Thomas Gleixner commit a8a93f3f03b7a8008d720e8d91798efe599d416c Author: Jeremy Fitzhardinge Date: Thu Feb 12 13:45:34 2009 -0800 mm: disable preemption in apply_to_pte_range Impact: bugfix Lazy mmu mode needs preemption disabled, so if we're apply to init_mm (which doesn't require any pte locks), then explicitly disable preemption. (Do it unconditionally after checking we've successfully done the allocation to simplify the error handling.) Signed-off-by: Jeremy Fitzhardinge commit 34886c8bc590f078d4c0b88f50d061326639198d Author: Steven Rostedt Date: Wed Mar 25 21:00:47 2009 -0400 tracing: add average time in function to function profiler Show the average time in the function (Time / Hit) Function Hit Time Avg -------- --- ---- --- mwait_idle 51 140326.6 us 2751.503 us smp_apic_timer_interrupt 47 3517.735 us 74.845 us schedule 10 2738.754 us 273.875 us __schedule 10 2732.857 us 273.285 us hrtimer_interrupt 47 1896.104 us 40.342 us irq_exit 56 1711.833 us 30.568 us __run_hrtimer 47 1315.589 us 27.991 us tick_sched_timer 47 1138.690 us 24.227 us do_softirq 56 1116.829 us 19.943 us __do_softirq 56 1066.932 us 19.052 us do_IRQ 9 926.153 us 102.905 us Signed-off-by: Steven Rostedt commit 318e0a73c9e41b9a17241829bcd0605a39b87cb9 Author: Steven Rostedt Date: Wed Mar 25 20:06:34 2009 -0400 tracing: remove on the fly allocator from function profiler Impact: safer code The on the fly allocator for the function profiler was to save memory. But at the expense of stability. Although it survived several tests, allocating from the function tracer is just too risky, just to save space. This patch removes the allocator and simply allocates enough entries at start up. Each function gets a profiling structure of 40 bytes. With an average of 20K functions, and this is for each CPU, we have 800K per online CPU. This is not too bad, at least for non-embedded. Signed-off-by: Steven Rostedt commit fb9fb015e92123fa3a8e0c2e2fff491d4a56b470 Author: Steven Rostedt Date: Wed Mar 25 13:26:41 2009 -0400 tracing: clean up tracing profiler Ingo Molnar suggested clean ups for the profiling code. This patch makes those updates. Reported-by: Ingo Molnar Signed-off-by: Steven Rostedt commit a2a16d6a3156ef7309ca7328a20c35df9418e670 Author: Steven Rostedt Date: Tue Mar 24 23:17:58 2009 -0400 function-graph: add option to calculate graph time or not graph time is the time that a function is executing another function. Thus if function A calls B, if graph-time is set, then the time for A includes B. This is the default behavior. But if graph-time is off, then the time spent executing B is subtracted from A. Signed-off-by: Steven Rostedt commit cafb168a1c92e4c9e1731fe3d666c39611762c49 Author: Steven Rostedt Date: Tue Mar 24 20:50:39 2009 -0400 tracing: make the function profiler per cpu Impact: speed enhancement By making the function profiler record in per cpu data we not only get better readings, avoid races, we also do not have to take any locks. Signed-off-by: Steven Rostedt commit 0706f1c48ca8a7ab478090b4e38f2e578ae2bfe0 Author: Steven Rostedt Date: Mon Mar 23 23:12:58 2009 -0400 tracing: adding function timings to function profiler If the function graph trace is enabled, the function profiler will use it to take the timing of the functions. cat /debug/tracing/trace_stat/functions Function Hit Time -------- --- ---- mwait_idle 127 183028.4 us schedule 26 151997.7 us __schedule 31 151975.1 us sys_wait4 2 74080.53 us do_wait 2 74077.80 us sys_newlstat 138 39929.16 us do_path_lookup 179 39845.79 us vfs_lstat_fd 138 39761.97 us user_path_at 153 39469.58 us path_walk 179 39435.76 us __link_path_walk 189 39143.73 us [...] Note the times are skewed due to the function graph tracer not taking into account schedules. Signed-off-by: Steven Rostedt commit 493762fc534c71d11d489f872c4b4a2c61173668 Author: Steven Rostedt Date: Mon Mar 23 17:12:36 2009 -0400 tracing: move function profiler data out of function struct Impact: reduce size of memory in function profiler The function profiler originally introduces its counters into the function records itself. There is 20 thousand different functions on a normal system, and that is adding 20 thousand counters for profiling event when not needed. A normal run of the profiler yields only a couple of thousand functions executed, depending on what is being profiled. This means we have around 18 thousand useless counters. This patch rectifies this by moving the data out of the function records used by dynamic ftrace. Data is preallocated to hold the functions when the profiling begins. Checks are made during profiling to see if more recorcds should be allocated, and they are allocated if it is safe to do so. This also removes the dependency from using dynamic ftrace, and also removes the overhead by having it enabled. Signed-off-by: Steven Rostedt commit bac429f037f1a51a74d62bad6d1518c3be065df3 Author: Steven Rostedt Date: Fri Mar 20 12:50:56 2009 -0400 tracing: add function profiler Impact: new profiling feature This patch adds a function profiler. In debugfs/tracing/ two new files are created. function_profile_enabled - to enable or disable profiling trace_stat/functions - the profiled functions. For example: echo 1 > /debugfs/tracing/function_profile_enabled ./hackbench 50 echo 0 > /debugfs/tracing/function_profile_enabled yields: cat /debugfs/tracing/trace_stat/functions Function Hit -------- --- _spin_lock 10106442 _spin_unlock 10097492 kfree 6013704 _spin_unlock_irqrestore 4423941 _spin_lock_irqsave 4406825 __phys_addr 4181686 __slab_free 4038222 dput 4030130 path_put 4023387 unroll_tree_refs 4019532 [...] The most hit functions are listed first. Functions that are not hit are not listed. This feature depends on and uses dynamic function tracing. When the function profiling is disabled, no overhead occurs. But it still takes up around 300KB to hold the data, thus it is not recomended to keep it enabled for systems low on memory. When a '1' is echoed into the function_profile_enabled file, the counters for is function is reset back to zero. Thus you can see what functions are hit most by different programs. Signed-off-by: Steven Rostedt commit 425480081e936d8725f0d44b8829d699bf088c6b Author: Steven Rostedt Date: Tue Mar 24 13:38:36 2009 -0400 tracing: add handler to trace_stat Currently, if a trace_stat user wants a handle to some private data, the trace_stat infrastructure does not supply a way to do that. This patch passes the trace_stat structure to the start function of the trace_stat code. Signed-off-by: Steven Rostedt commit c78a3956b982418186e40978a51636a2b43221bc Author: Markus Metzger Date: Wed Mar 18 19:27:00 2009 +0100 x86, bts: use atomic memory allocation Ds_request_bts() needs to allocate memory. It uses GFP_KERNEL. Hw-branch-tracer calls ds_request_bts() within on_each_cpu(). Use atomic memory allocation to allow it to be used in that context. Signed-off-by: Markus Metzger LKML-Reference: <20090318192700.A6038@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit 79258a354e0c69be94ae2871809a195bf4a647b1 Author: Ingo Molnar Date: Fri Mar 13 12:02:08 2009 +0100 x86, bts: detect size of DS fields, fix Impact: build fix One usage site was missed in the sizeof_field -> sizeof_ptr_field rename. Cc: Markus Metzger LKML-Reference: <20090313104218.A30096@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit e9a22d1fb94050b7d600019c32e6b672d539054b Author: Ingo Molnar Date: Fri Mar 13 11:54:40 2009 +0100 x86, bts: cleanups Impact: cleanup, no code changed Cc: Markus Metzger LKML-Reference: <20090313104218.A30096@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit 321bb5e1ac461c04b6a93f795010d6eb01d8c5ca Author: Markus Metzger Date: Fri Mar 13 10:50:27 2009 +0100 x86, hw-branch-tracer: add selftest Add a selftest for the hw-branch-tracer. Signed-off-by: Markus Metzger LKML-Reference: <20090313105027.A30183@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit ba9372a8f306c4e53a5f61dcbcd6c1e4a8c2e9ac Author: Markus Metzger Date: Fri Mar 13 10:48:52 2009 +0100 x86, hw-branch-tracer: keep resources on stop Distinguish init/reset and start/stop: init/reset will allocate and release bts tracing resources stop/start will suspend and resume bts tracing Return an error on init() if no cpu can be traced. Signed-off-by: Markus Metzger LKML-Reference: <20090313104852.A30168@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit b8e47195451c5d3f62620b2b1b5928669afd56eb Author: Markus Metzger Date: Fri Mar 13 10:46:42 2009 +0100 x86, bts: correct comment style in ds.c Correct the comment style in ds.c. Signed-off-by: Markus Metzger LKML-Reference: <20090313104642.A30149@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit 8a327f6d1b05f5ce16572b4413a5df1d0e872283 Author: Markus Metzger Date: Fri Mar 13 10:45:07 2009 +0100 x86, bts: add selftest for BTS Perform a selftest of branch trace store when a cpu is initialized. WARN and disable branch trace store support if the selftest fails. Signed-off-by: Markus Metzger LKML-Reference: <20090313104507.A30125@sedona.ch.intel.com> Signed-off-by: Ingo Molnar commit bc44fb5f7d3e764ed7698c835a1a0f35aba2eb3d Author: Markus Metzger Date: Fri Mar 13 10:42:18 2009 +0100 x86, bts: detect size of DS fields Impact: more robust DS feature enumeration Detect the size of the pointer-type fields in the DS area configuration via the DTES64 features rather than based on the cpuid. Rename a variable to denote that size to reflect that it only covers the pointer-type fields. Add more boot-time diagnostics giving the detected size and the sizes of BTS and PEBS records. Use the size of the BTS/PEBS record to indicate that the respective feature is not available (if the record size is zero). Signed-off-by: Markus Metzger LKML-Reference: <20090313104218.A30096@sedona.ch.intel.com> Signed-off-by: Ingo Molnar