commit d10902812c9cd5583130a4ebb9ad19c60b68149d Merge: 181f977 25874a2 Author: Linus Torvalds Date: Tue Mar 15 20:01:36 2011 -0700 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) x86: Clean up apic.c and apic.h x86: Remove superflous goal definition of tsc_sync x86: dt: Correct local apic documentation in device tree bindings x86: dt: Cleanup local apic setup x86: dt: Fix OLPC=y/INTEL_CE=n build rtc: cmos: Add OF bindings x86: ce4100: Use OF to setup devices x86: ioapic: Add OF bindings for IO_APIC x86: dtb: Add generic bus probe x86: dtb: Add support for PCI devices backed by dtb nodes x86: dtb: Add device tree support for HPET x86: dtb: Add early parsing of IO_APIC x86: dtb: Add irq domain abstraction x86: dtb: Add a device tree for CE4100 x86: Add device tree support x86: e820: Remove conditional early mapping in parse_e820_ext x86: OLPC: Make OLPC=n build again x86: OLPC: Remove extra OLPC_OPENFIRMWARE_DT indirection x86: OLPC: Cleanup config maze completely x86: OLPC: Hide OLPC_OPENFIRMWARE config switch ... Fix up conflicts in arch/x86/platform/ce4100/ce4100.c commit 181f977d134a9f8e3f8839f42af655b045fc059e Merge: d5d4239 25542c6 Author: Linus Torvalds Date: Tue Mar 15 19:49:10 2011 -0700 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (93 commits) x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() x86-64, NUMA: Clean up initmem_init() x86-64, NUMA: Fix numa_emulation code with node0 without RAM x86-64, NUMA: Revert NUMA affine page table allocation x86: Work around old gas bug x86-64, NUMA: Better explain numa_distance handling x86-64, NUMA: Fix distance table handling mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK x86-64, NUMA: Fix size of numa_distance array x86: Rename e820_table_* to pgt_buf_* bootmem: Move __alloc_memory_core_early() to nobootmem.c bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() x86-64, NUMA: Add proper function comments to global functions x86-64, NUMA: Move NUMA emulation into numa_emulation.c x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file x86-64, NUMA: Do not scan two times for setup_node_bootmem() ... Fix up conflicts in arch/x86/kernel/smpboot.c commit d5d42399bd7b66bd6b55363b311810504110c967 Merge: 209b6c8 9599ec0 Author: Linus Torvalds Date: Tue Mar 15 19:41:42 2011 -0700 Merge branch 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, mem: Convert memmove() to assembly file and fix return value bug commit 209b6c8fa72e8b726a0cd273a56aded55be22bfa Merge: 0310e43 1396fa9 Author: Linus Torvalds Date: Tue Mar 15 19:40:53 2011 -0700 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, microcode, AMD: Fix signedness bug in generic_load_microcode() x86, microcode, AMD: Extend ucode size verification x86, microcode, AMD: Cleanup dmesg output x86, microcode, AMD: Remove unneeded memset call x86, microcode, AMD: Simplify get_next_ucode x86, microcode, AMD: Simplify install_equiv_cpu_table x86, microcode, AMD: Release firmware on error x86, microcode: Correct sysdev_add error path commit 0310e437182568a9e0aa862f2a9d13908069df73 Merge: 5f6fb45 53c39ce Author: Linus Torvalds Date: Tue Mar 15 19:40:35 2011 -0700 Merge branch 'um-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'um-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: um: Select GENERIC_HARDIRQS_NO_DEPRECATED um: Use proper accessors in show_interrupts() um: Convert irq_chips to new functions um: Remove stale irq_chip.end commit 5f6fb45466b2273ffb91c9cf209f164f666c33b1 Merge: 3904afb c0185808 Author: Linus Torvalds Date: Tue Mar 15 19:23:40 2011 -0700 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (116 commits) x86: Enable forced interrupt threading support x86: Mark low level interrupts IRQF_NO_THREAD x86: Use generic show_interrupts x86: ioapic: Avoid redundant lookup of irq_cfg x86: ioapic: Use new move_irq functions x86: Use the proper accessors in fixup_irqs() x86: ioapic: Use irq_data->state x86: ioapic: Simplify irq chip and handler setup x86: Cleanup the genirq name space genirq: Add chip flag to force mask on suspend genirq: Add desc->irq_data accessor genirq: Add comments to Kconfig switches genirq: Fixup fasteoi handler for oneshot mode genirq: Provide forced interrupt threading sched: Switch wait_task_inactive to schedule_hrtimeout() genirq: Add IRQF_NO_THREAD genirq: Allow shared oneshot interrupts genirq: Prepare the handling of shared oneshot interrupts genirq: Make warning in handle_percpu_event useful x86: ioapic: Move trigger defines to io_apic.h ... Fix up trivial(?) conflicts in arch/x86/pci/xen.c due to genirq name space changes clashing with the Xen cleanups. The set_irq_msi() had moved to xen_bind_pirq_msi_to_irq(). commit 3904afb41d4316f7a2968c615d689e19149a4f84 Merge: 502f4d4 fd8fa4d3 Author: Linus Torvalds Date: Tue Mar 15 19:16:00 2011 -0700 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Combine printk()s in show_regs_common() x86: Don't call dump_stack() from arch_trigger_all_cpu_backtrace_handler() commit 502f4d4f74219749a9758b9bbc27fb665b2e83ab Merge: da849ab e5fea86 Author: Linus Torvalds Date: Tue Mar 15 19:00:53 2011 -0700 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix and clean up generic_processor_info() x86: Don't copy per_cpu cpuinfo for BSP two times x86: Move llc_shared_map out of cpu_info commit da849abeb86ddaa093b0935fde595e8e4dd21ffc Merge: 21a3281 371c394 Author: Linus Torvalds Date: Tue Mar 15 18:59:56 2011 -0700 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, binutils, xen: Fix another wrong size directive x86: Remove dead config option X86_CPU x86: Really print supported CPUs if PROCESSOR_SELECT=y x86: Fix a bogus unwind annotation in lib/semaphore_32.S um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S x86: Remove unused bits from lib/thunk_*.S x86: Use {push,pop}_cfi in more places x86-64: Add CFI annotations to lib/rwsem_64.S x86, asm: Cleanup unnecssary macros in asm-offsets.c x86, system.h: Drop unused __SAVE/__RESTORE macros x86: Use bitmap library functions x86: Partly unify asm-offsets_{32,64}.c x86: Reduce back the alignment of the per-CPU data section commit 21a32816b2e13eafb6d8a4589a84c6e629adc392 Merge: 420c1c5 ea04683 Author: Linus Torvalds Date: Tue Mar 15 18:59:21 2011 -0700 Merge branch 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: RTC: Fix up rtc.txt documentation to reflect changes to generic rtc layer RTC: sa1100: Update the sa1100 RTC driver. RTC: Fix the cross interrupt issue on rtc-test. RTC: Remove UIE and PIE information from the sa1100 driver proc. RTC: Include information about UIE and PIE in RTC driver proc. RTC: Clean out UIE icotl implementations RTC: Cleanup rtc_class_ops->update_irq_enable() RTC: Cleanup rtc_class_ops->irq_set_freq() RTC: Cleanup rtc_class_ops->irq_set_state RTC: Initialize kernel state from RTC commit 420c1c572d4ceaa2f37b6311b7017ac6cf049fe2 Merge: 9620639 6e6823d Author: Linus Torvalds Date: Tue Mar 15 18:53:35 2011 -0700 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits) posix-clocks: Check write permissions in posix syscalls hrtimer: Remove empty hrtimer_init_hres_timer() hrtimer: Update hrtimer->state documentation hrtimer: Update base[CLOCK_BOOTTIME].offset correctly timers: Export CLOCK_BOOTTIME via the posix timers interface timers: Add CLOCK_BOOTTIME hrtimer base time: Extend get_xtime_and_monotonic_offset() to also return sleep time: Introduce get_monotonic_boottime and ktime_get_boottime hrtimers: extend hrtimer base code to handle more then 2 clockids ntp: Remove redundant and incorrect parameter check mn10300: Switch do_timer() to xtimer_update() posix clocks: Introduce dynamic clocks posix-timers: Cleanup namespace posix-timers: Add support for fd based clocks x86: Add clock_adjtime for x86 posix-timers: Introduce a syscall for clock tuning. time: Splitout compat timex accessors ntp: Add ADJ_SETOFFSET mode bit time: Introduce timekeeping_inject_offset posix-timer: Update comment ... Fix up new system-call-related conflicts in arch/x86/ia32/ia32entry.S arch/x86/include/asm/unistd_32.h arch/x86/include/asm/unistd_64.h arch/x86/kernel/syscall_table_32.S (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some due to movement of get_jiffies_64() in: kernel/time.c commit 9620639b7ea3843983f4ced8b4c81eb4d8974838 Merge: a926021 6d1cafd Author: Linus Torvalds Date: Tue Mar 15 18:37:30 2011 -0700 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits) sched: Resched proper CPU on yield_to() sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks sched: Clean up the IRQ_TIME_ACCOUNTING code sched: Add #ifdef around irq time accounting functions sched, autogroup: Stop claiming ownership of the root task group sched, autogroup: Stop going ahead if autogroup is disabled sched, autogroup, sysctl: Use proc_dointvec_minmax() instead sched: Fix the group_imb logic sched: Clean up some f_b_g() comments sched: Clean up remnants of sd_idle sched: Wholesale removal of sd_idle logic sched: Add yield_to(task, preempt) functionality sched: Use a buddy to implement yield_task_fair() sched: Limit the scope of clear_buddies sched: Check the right ->nr_running in yield_task_fair() sched: Avoid expensive initial update_cfs_load(), on UP too sched: Fix switch_from_fair() sched: Simplify the idle scheduling class softirqs: Account ksoftirqd time as cpustat softirq ... commit a926021cb1f8a99a275eaf6eb546102e9469dc59 Merge: 0586bed 5e814dd Author: Linus Torvalds Date: Tue Mar 15 18:31:30 2011 -0700 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (184 commits) perf probe: Clean up probe_point_lazy_walker() return value tracing: Fix irqoff selftest expanding max buffer tracing: Align 4 byte ints together in struct tracer tracing: Export trace_set_clr_event() tracing: Explain about unstable clock on resume with ring buffer warning ftrace/graph: Trace function entry before updating index ftrace: Add .ref.text as one of the safe areas to trace tracing: Adjust conditional expression latency formatting. tracing: Fix event alignment: skb:kfree_skb tracing: Fix event alignment: mce:mce_record tracing: Fix event alignment: kvm:kvm_hv_hypercall tracing: Fix event alignment: module:module_request tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup tracing: Remove lock_depth from event entry perf header: Stop using 'self' perf session: Use evlist/evsel for managing perf.data attributes perf top: Don't let events to eat up whole header line perf top: Fix events overflow in top command ring-buffer: Remove unused #include tracing: Add an 'overwrite' trace_option. ... commit 0586bed3e8563c2eb89bc7256e30ce633ae06cfb Merge: b80cd62 dbebbfb Author: Linus Torvalds Date: Tue Mar 15 18:28:30 2011 -0700 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rtmutex: tester: Remove the remaining BKL leftovers lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause rtmutex: Simplify PI algorithm and make highest prio task get lock rwsem: Remove redundant asmregparm annotation rwsem: Move duplicate function prototypes to linux/rwsem.h rwsem: Unify the duplicate rwsem_is_locked() inlines rwsem: Move duplicate init macros and functions to linux/rwsem.h rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h x86: Cleanup rwsem_count_t typedef rwsem: Cleanup includes locking: Remove deprecated lock initializers cred: Replace deprecated spinlock initialization kthread: Replace deprecated spinlock initialization xtensa: Replace deprecated spinlock initialization um: Replace deprecated spinlock initialization sparc: Replace deprecated spinlock initialization mips: Replace deprecated spinlock initialization cris: Replace deprecated spinlock initialization alpha: Replace deprecated spinlock initialization rtmutex-tester: Remove BKL tests commit b80cd62b7d4406bbe8c573fe4381dcc71a2850fd Merge: c345f60 07d5eca Author: Linus Torvalds Date: Tue Mar 15 18:23:52 2011 -0700 Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic() futex: Deobfuscate handle_futex_death() plist: Add priority list test plist: Shrink struct plist_head futex,plist: Remove debug lock assignment from plist_node futex,plist: Pass the real head of the priority list to plist_del() futex: Sanitize futex ops argument types futex: Sanitize cmpxchg_futex_value_locked API futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic() futex: Avoid redudant evaluation of task_pid_vnr() futex: Update futex_wait_setup comments about locking commit c345f60a5f58a65004f22fb0d257d65ec1528310 Merge: 422e6c4 9977728 Author: Linus Torvalds Date: Tue Mar 15 18:23:25 2011 -0700 Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debugobjects: Add hint for better object identification commit 422e6c4bc4b48c15b3cb57a1ca71431abfc57e54 Merge: c83ce98 574197e Author: Linus Torvalds Date: Tue Mar 15 15:48:13 2011 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits) tidy the trailing symlinks traversal up Turn resolution of trailing symlinks iterative everywhere simplify link_path_walk() tail Make trailing symlink resolution in path_lookupat() iterative update nd->inode in __do_follow_link() instead of after do_follow_link() pull handling of one pathname component into a helper fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH Allow passing O_PATH descriptors via SCM_RIGHTS datagrams readlinkat(), fchownat() and fstatat() with empty relative pathnames Allow O_PATH for symlinks New kind of open files - "location only". ext4: Copy fs UUID to superblock ext3: Copy fs UUID to superblock. vfs: Export file system uuid via /proc//mountinfo unistd.h: Add new syscalls numbers to asm-generic x86: Add new syscalls for x86_64 x86: Add new syscalls for x86_32 fs: Remove i_nlink check from file system link callback fs: Don't allow to create hardlink for deleted file vfs: Add open by file handle support ... commit c83ce989cb5ff86575821992ea82c4df5c388ebc Author: Trond Myklebust Date: Tue Mar 15 13:36:43 2011 -0400 VFS: Fix the nfs sillyrename regression in kernel 2.6.38 The new vfs locking scheme introduced in 2.6.38 breaks NFS sillyrename because the latter relies on being able to determine the parent directory of the dentry in the ->iput() callback in order to send the appropriate unlink rpc call. Looking at the code that cares about races with dput(), there doesn't seem to be anything that specifically uses d_parent as a test for whether or not there is a race: - __d_lookup_rcu(), __d_lookup() all test for d_hashed() after d_parent - shrink_dcache_for_umount() is safe since nothing else can rearrange the dentries in that super block. - have_submount(), select_parent() and d_genocide() can test for a deletion if we set the DCACHE_DISCONNECTED flag when the dentry is removed from the parent's d_subdirs list. Signed-off-by: Trond Myklebust Cc: stable@kernel.org (2.6.38, needs commit c826cb7dfce8 "dcache.c: create helper function for duplicated functionality" ) Signed-off-by: Linus Torvalds commit c826cb7dfce80512c26c984350077a25046bd215 Author: Linus Torvalds Date: Tue Mar 15 15:29:21 2011 -0700 dcache.c: create helper function for duplicated functionality This creates a helper function for he "try to ascend into the parent directory" case, which was written out in triplicate before. With all the locking and subtle sequence number stuff, we really don't want to duplicate that kind of code. Signed-off-by: Linus Torvalds commit 574197e0de46a8a4db5c54ef7b65e43ffa8873a7 Author: Al Viro Date: Mon Mar 14 22:20:34 2011 -0400 tidy the trailing symlinks traversal up * pull the handling of current->total_link_count into __do_follow_link() * put the common "do ->put_link() if needed and path_put() the link" stuff into a helper (put_link(nd, link, cookie)) * rename __do_follow_link() to follow_link(), while we are at it Signed-off-by: Al Viro commit b356379a020bb7197603118bb1cbc903963aa198 Author: Al Viro Date: Mon Mar 14 21:54:55 2011 -0400 Turn resolution of trailing symlinks iterative everywhere The last remaining place (resolution of nested symlink) converted to the loop of the same kind we have in path_lookupat() and path_openat(). Note that we still *do* have a recursion in pathname resolution; can't avoid it, really. However, it's strictly for nested symlinks now - i.e. ones in the middle of a pathname. link_path_walk() has lost the tail now - it always walks everything except the last component. do_follow_link() renamed to nested_symlink() and moved down. Signed-off-by: Al Viro commit ce0525449da56444948c368f52e10f3db0465338 Author: Al Viro Date: Mon Mar 14 21:28:04 2011 -0400 simplify link_path_walk() tail Now that link_path_walk() is called without LOOKUP_PARENT only from do_follow_link(), we can simplify the checks in last component handling. First of all, checking if we'd arrived to a directory is not needed - the caller will check it anyway. And LOOKUP_FOLLOW is guaranteed to be there, since we only get to that place with nd->depth > 0. Signed-off-by: Al Viro commit bd92d7fed877ed1e6997e4f3f13dbcd872947653 Author: Al Viro Date: Mon Mar 14 19:54:59 2011 -0400 Make trailing symlink resolution in path_lookupat() iterative Now the only caller of link_path_walk() that does *not* pass LOOKUP_PARENT is do_follow_link() Signed-off-by: Al Viro commit b21041d0f72899ed815bd2cbf7275339c74737b6 Author: Al Viro Date: Mon Mar 14 20:01:51 2011 -0400 update nd->inode in __do_follow_link() instead of after do_follow_link() ... and note that we only need to do it for LAST_BIND symlinks Signed-off-by: Al Viro commit ce57dfc1791221ef58b6d6b8f5437fccefc4e187 Author: Al Viro Date: Sun Mar 13 19:58:58 2011 -0400 pull handling of one pathname component into a helper new helper: walk_component(). Handles everything except symlinks; returns negative on error, 0 on success and 1 on symlinks we decided to follow. Drops out of RCU mode on such symlinks. link_path_walk() and do_last() switched to using that. Signed-off-by: Al Viro commit 11a7b371b64ef39fc5fb1b6f2218eef7c4d035e3 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:42 2011 +0530 fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH We don't want to allow creation of private hardlinks by different application using the fd passed to them via SCM_RIGHTS. So limit the null relative name usage in linkat syscall to CAP_DAC_READ_SEARCH Signed-off-by: Aneesh Kumar K.V commit 5e814dd597c42daeb8d2a276e64a6ec986ad0e2a Author: Ingo Molnar Date: Tue Mar 15 20:51:09 2011 +0100 perf probe: Clean up probe_point_lazy_walker() return value Newer compilers (gcc 4.6) complains about: return ret < 0 ?: 0; For the following reason: util/probe-finder.c: In function ‘probe_point_lazy_walker’: util/probe-finder.c:1331:18: error: the omitted middle operand in ?: will always be ‘true’, suggest explicit middle operand [-Werror=parentheses] And indeed the return value is a somewhat obscure (but correct) value of 'true', so return 'ret' instead - this is cleaner and unconfuses GCC as well. Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Masami Hiramatsu Cc: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 76ca07832842100b14a31ad8996dab7b0c28aa42 Merge: 27d2a8b b056b6a Author: Linus Torvalds Date: Tue Mar 15 10:59:09 2011 -0700 Merge branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: suspend: remove xen_hvm_suspend xen: suspend: pull pre/post suspend hooks out into suspend_info xen: suspend: move arch specific pre/post suspend hooks into generic hooks xen: suspend: refactor non-arch specific pre/post suspend hooks xen: suspend: add "arch" to pre/post suspend hooks xen: suspend: pass extra hypercall argument via suspend_info struct xen: suspend: refactor cancellation flag into a structure xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding xen: switch to new schedop hypercall by default. xen: use new schedop interface for suspend xen: do not respond to unknown xenstore control requests xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled xen: PV on HVM: support PV spinlocks and IPIs xen: make the ballon driver work for hvm domains xen-blkfront: handle Xen major numbers other than XENVBD xen: do not use xen_info on HVM, set pv_info name to "Xen HVM" xen: no need to delay xen_setup_shutdown_event for hvm guests anymore commit 27d2a8b97ebc4467e47722415b81ebe72d5f654f Merge: 010b8f4 44e6976 51de695 44b46c3 Author: Linus Torvalds Date: Tue Mar 15 10:49:16 2011 -0700 Merge branches 'stable/ia64', 'stable/blkfront-cleanup' and 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: ia64 build broken due to "xen: switch to new schedop hypercall by default." * 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Union the blkif_request request specific fields * 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: annotate functions which only call into __init at start of day xen p2m: annotate variable which appears unused xen: events: mark cpu_evtchn_mask_p as __refdata commit 010b8f4e264b0b6f596186574956dde2fa02df1c Merge: 397fae0 71eef7d Author: Linus Torvalds Date: Tue Mar 15 10:47:56 2011 -0700 Merge branch 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: events: remove dom0 specific xen_create_msi_irq xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq xen: events: push set_irq_msi down into xen_create_msi_irq xen: events: update pirq_to_irq in xen_create_msi_irq xen: events: refactor xen_create_msi_irq slightly xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ xen: events: assume PHYSDEVOP_get_free_pirq exists xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq xen: events: return irq from xen_allocate_pirq_msi xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available. xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0 commit 397fae081869784d07cd4edde0ddf436ca2011e0 Merge: c7146dd 1aa0b51 3d74a53 Author: Linus Torvalds Date: Tue Mar 15 10:47:16 2011 -0700 Merge branches 'stable/irq.rework' and 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME * 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work() commit c7146dd0090b9c98ae8525900abf1c38fc7e4e0d Merge: 521cb40 706cc9d 86b3212 Author: Linus Torvalds Date: Tue Mar 15 10:32:15 2011 -0700 Merge branches 'stable/p2m-identity.v4.9.1' and 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/p2m-identity.v4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. xen/m2p: No need to catch exceptions when we know that there is no RAM xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. xen/debugfs: Add 'p2m' file for printing out the P2M layout. xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. xen/mmu: WARN_ON when racing to swap middle leaf. xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. xen/mmu: Add the notion of identity (1-1) mapping. xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. * 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps. commit 25542c646afbf14c43fa7d2b443055cadb73b07a Author: Xiao Guangrong Date: Tue Mar 15 09:57:37 2011 +0800 x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() native_flush_tlb_others() is called from: flush_tlb_current_task() flush_tlb_mm() flush_tlb_page() All these functions disable preemption explicitly, so we can use smp_processor_id() instead of get_cpu() and put_cpu(). Signed-off-by: Xiao Guangrong Cc: Cliff Wickman LKML-Reference: <4D7EC791.4040003@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 8460b3e5bc64955aeefdd8357b3bf7b5ff79b3f2 Merge: 56396e6 521cb40 Author: Ingo Molnar Date: Tue Mar 15 08:29:44 2011 +0100 Merge commit 'v2.6.38' into x86/mm Conflicts: arch/x86/mm/numa_64.c Merge reason: Resolve the conflict, update the branch to .38. Signed-off-by: Ingo Molnar commit 326be7b484843988afe57566b627fb7a70beac56 Author: Al Viro Date: Sun Mar 13 17:08:22 2011 -0400 Allow passing O_PATH descriptors via SCM_RIGHTS datagrams Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: Al Viro commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames For readlinkat() we simply allow empty pathname; it will fail unless we have dfd equal to O_PATH-opened symlink, so we are outside of POSIX scope here. For fchownat() and fstatat() we allow AT_EMPTY_PATH; let the caller explicitly ask for such behaviour. Signed-off-by: Al Viro commit bcda76524cd1fa32af748536f27f674a13e56700 Author: Al Viro Date: Sun Mar 13 16:42:14 2011 -0400 Allow O_PATH for symlinks At that point we can't do almost nothing with them. They can be opened with O_PATH, we can manipulate such descriptors with dup(), etc. and we can see them in /proc/*/{fd,fdinfo}/*. We can't (and won't be able to) follow /proc/*/fd/* symlinks for those; there's simply not enough information for pathname resolution to go on from such point - to resolve a symlink we need to know which directory does it live in. We will be able to do useful things with them after the next commit, though - readlinkat() and fchownat() will be possible to use with dfd being an O_PATH-opened symlink and empty relative pathname. Combined with open_by_handle() it'll give us a way to do realink-by-handle and lchown-by-handle without messing with more redundant syscalls. Signed-off-by: Al Viro commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd Author: Al Viro Date: Sun Mar 13 03:51:11 2011 -0400 New kind of open files - "location only". New flag for open(2) - O_PATH. Semantics: * pathname is resolved, but the file itself is _NOT_ opened as far as filesystem is concerned. * almost all operations on the resulting descriptors shall fail with -EBADF. Exceptions are: 1) operations on descriptors themselves (i.e. close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD), fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD), fcntl(fd, F_SETFD, ...)) 2) fcntl(fd, F_GETFL), for a common non-destructive way to check if descriptor is open 3) "dfd" arguments of ...at(2) syscalls, i.e. the starting points of pathname resolution * closing such descriptor does *NOT* affect dnotify or posix locks. * permissions are checked as usual along the way to file; no permission checks are applied to the file itself. Of course, giving such thing to syscall will result in permission checks (at the moment it means checking that starting point of ....at() is a directory and caller has exec permissions on it). fget() and fget_light() return NULL on such descriptors; use of fget_raw() and fget_raw_light() is needed to get them. That protects existing code from dealing with those things. There are two things still missing (they come in the next commits): one is handling of symlinks (right now we refuse to open them that way; see the next commit for semantics related to those) and another is descriptor passing via SCM_RIGHTS datagrams. Signed-off-by: Al Viro commit f2fa2ffc2046fdc35f96366d1ec8675f4d578522 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:40 2011 +0530 ext4: Copy fs UUID to superblock File system UUID is made available to application via /proc//mountinfo Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 03cb5f03dcb26846fcad345d8c15aae91579a53d Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:39 2011 +0530 ext3: Copy fs UUID to superblock. File system UUID is made available to application via /proc//mountinfo Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 93f1c20bc8cdb757be50566eff88d65c3b26881f Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:38 2011 +0530 vfs: Export file system uuid via /proc//mountinfo We add a per superblock uuid field. File systems should update the uuid in the fill_super callback Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit a51571ccb8be1b88aea502ebba8350519682c16d Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:38 2011 +0530 unistd.h: Add new syscalls numbers to asm-generic Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 6aae5f2b2085c5c90964bb78676ea8a6a336e037 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:37 2011 +0530 x86: Add new syscalls for x86_64 This patch add new syscalls to x86_64 Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 7dadb755b082c259f7dd4a95a3a6eb21646a28d5 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:35 2011 +0530 x86: Add new syscalls for x86_32 This patch adds new syscalls to x86_32 Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit f17b6042073e7000a90063f7edbca59a5bd1caa2 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:30 2011 +0530 fs: Remove i_nlink check from file system link callback Now that VFS check for inode->i_nlink == 0 and returns proper error, remove similar check from file system Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit aae8a97d3ec30788790d1720b71d76fd8eb44b73 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:27 2011 +0530 fs: Don't allow to create hardlink for deleted file Add inode->i_nlink == 0 check in VFS. Some of the file systems do this internally. A followup patch will remove those instance. This is needed to ensure that with link by handle we don't allow to create hardlink of an unlinked file. The check also prevent a race between unlink and link Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit becfd1f37544798cbdfd788f32c827160fab98c1 Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:26 2011 +0530 vfs: Add open by file handle support [AV: duplicate of open() guts removed; file_open_root() used instead] Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit 990d6c2d7aee921e3bce22b2d6a750fd552262be Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:26 2011 +0530 vfs: Add name to file handle conversion support The syscall also return mount id which can be used to lookup file system specific information such as uuid in /proc//mountinfo Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit f52e0c11305aa09ed56cad97ffc8f0cdc3d78b5d Author: Al Viro Date: Mon Mar 14 18:56:51 2011 -0400 New AT_... flag: AT_EMPTY_PATH For name_to_handle_at(2) we'll want both ...at()-style syscall that would be usable for non-directory descriptors (with empty relative pathname). Introduce new flag (AT_EMPTY_PATH) to deal with that and corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to deal with the latter. Signed-off-by: Al Viro commit 07d5ecae2940ddd77746e2fb597dcf57d3c2e277 Author: Thomas Gleixner Date: Mon Mar 14 20:00:30 2011 +0100 arm: Remove bogus comment in futex_atomic_cmpxchg_inatomic() commit 522d7dec(futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic()) added a bogus comment. /* Note that preemption is disabled by futex_atomic_cmpxchg_inatomic * call sites. */ Bogus in two aspects: 1) pagefault_disable != preempt_disable even if the mechanism we use is the same 2) we have a call site which deliberately does not disable pagefaults as it wants the possible fault to be handled - though that has been changed for consistency reasons now. Sigh. I really should have seen that when committing the above. :( Catched-by-and-rightfully-ranted-at-by: Linus Torvalds Signed-off-by: Thomas Gleixner LKML-Reference: Cc: Michel Lespinasse Cc: Darren Hart commit 6e0aa9f8a8190e0879a29bd67aa606b51734a122 Author: Thomas Gleixner Date: Mon Mar 14 10:34:35 2011 +0100 futex: Deobfuscate handle_futex_death() handle_futex_death() uses futex_atomic_cmpxchg_inatomic() without disabling page faults. That's ok, but totally non obvious. We don't hold locks so we actually can and want to fault here, because the get_user() before futex_atomic_cmpxchg_inatomic() does not guarantee a R/W mapping. We could just add a big fat comment to explain this, but actually changing the code so that the functionality is entirely clear is better. Use the helper function which disables page faults around the futex_atomic_cmpxchg_inatomic() and handle a fault with a call to fault_in_user_writeable() as all other places in the futex code do as well. Pointed-out-by: Linus Torvalds Signed-off-by: Thomas Gleixner Acked-by: Darren Hart Cc: Michel Lespinasse Cc: Peter Zijlstra Cc: Matt Turner Cc: Russell King Cc: David Howells Cc: Tony Luck Cc: Michal Simek Cc: Ralf Baechle Cc: "James E.J. Bottomley" Cc: Benjamin Herrenschmidt Cc: Martin Schwidefsky Cc: Paul Mundt Cc: "David S. Miller" Cc: Chris Metcalf LKML-Reference: Signed-off-by: Thomas Gleixner commit 706cc9d2a4cb9b03217e15b0bb3d117f4d5109ee Author: Stefano Stabellini Date: Wed Feb 2 18:32:59 2011 +0000 xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. If there is no proper PFN value in the M2P for the MFN (so we get 0xFFFFF.. or 0x55555, or 0x0), we should consult the M2P override to see if there is an entry for this. [Note: we also consult the M2P override if the MFN is past our machine_to_phys size]. We consult the P2M with the PFN. In case the returned MFN is one of the special values: 0xFFF.., 0x5555 (which signify that the MFN can be either "missing" or it belongs to DOMID_IO) or the p2m(m2p(mfn)) != mfn, we check the M2P override. If we fail the M2P override check, we reset the PFN value to INVALID_P2M_ENTRY. Next we try to find the MFN in the P2M using the MFN value (not the PFN value) and if found, we know that this MFN is an identity value and return it as so. Otherwise we have exhausted all the posibilities and we return the PFN, which at this stage can either be a real PFN value found in the machine_to_phys.. array, or INVALID_P2M_ENTRY value. [v1: Added Review-by tag] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 146c4e511717e581065800938537b276173d8548 Author: Konrad Rzeszutek Wilk Date: Fri Jan 14 17:55:44 2011 -0500 xen/m2p: No need to catch exceptions when we know that there is no RAM .. beyound what we think is the end of memory. However there might be more System RAM - but assigned to a guest. Hence jump to the M2P override check and consult. [v1: Added Review-by tag] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit fc25151d9ac7d809239fe68de0a1490b504bb94a Author: Konrad Rzeszutek Wilk Date: Thu Dec 23 16:25:29 2010 -0500 xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. Only enabled if XEN_DEBUG is enabled. We print a warning when: pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag set (and pfn is an identity mapped pfn) pfn_to_mfn(pfn) != pfn, and VM_IO flag is set. (ditto, pfn is an identity mapped pfn) [v2: Make it dependent on CONFIG_XEN_DEBUG instead of ..DEBUG_FS] [v3: Fix compiler warning] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 2222e71bd6eff7b2ad026d4ee663b6327c5a49f5 Author: Konrad Rzeszutek Wilk Date: Wed Dec 22 08:57:30 2010 -0500 xen/debugfs: Add 'p2m' file for printing out the P2M layout. We walk over the whole P2M tree and construct a simplified view of which PFN regions belong to what level and what type they are. Only enabled if CONFIG_XEN_DEBUG_FS is set. [v2: UNKN->UNKNOWN, use uninitialized_var] [v3: Rebased on top of mmu->p2m code split] [v4: Fixed the else if] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 68df0da7f42be6ae017fe9f48ac414c43a7b9d32 Author: Konrad Rzeszutek Wilk Date: Tue Feb 1 17:15:30 2011 -0500 xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. We walk the E820 region and start at 0 (for PV guests we start at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other regions and as well the gaps we set them to be identity mappings. The reasons we do not want to set the identity mapping from 0-> ISA_END_ADDRESS when running as PV is b/c that the kernel would try to read DMI information and fail (no permissions to read that). There is a lot of gnarly code to deal with that weird region so we won't try to do a cleanup in this patch. This code ends up calling 'set_phys_to_identity' with the start and end PFN of the the E820 that are non-RAM or have gaps. On 99% of machines that means one big region right underneath the 4GB mark. Usually starts at 0xc0000 (or 0x80000) and goes to 0x100000. [v2: Fix for E820 crossing 1MB region and clamp the start] [v3: Squshed in code that does this over ranges] [v4: Moved the comment to the correct spot] [v5: Use the "raw" E820 from the hypervisor] [v6: Added Review-by tag] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit c7617798771ad588d585986d896197c04b737621 Author: Konrad Rzeszutek Wilk Date: Tue Jan 18 20:17:10 2011 -0500 xen/mmu: WARN_ON when racing to swap middle leaf. The initial bootup code uses set_phys_to_machine quite a lot, and after bootup it would be used by the balloon driver. The balloon driver does have mutex lock so this should not be necessary - but just in case, add a WARN_ON if we do hit this scenario. If we do fail this, it is OK to continue as there is a backup mechanism (VM_IO) that can bypass the P2M and still set the _PAGE_IOMAP flags. [v2: Change from WARN to BUG_ON] [v3: Rebased on top of xen->p2m code split] [v4: Change from BUG_ON to WARN] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit fb38923ead10aa8a28db191548e176e8856614d7 Author: Konrad Rzeszutek Wilk Date: Wed Jan 5 15:46:31 2011 -0500 xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. If we find that the PFN is within the P2M as an identity PFN make sure to tack on the _PAGE_IOMAP flag. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit f4cec35b0d4b90d96e3770a3d1e68ea882e7a7c8 Author: Konrad Rzeszutek Wilk Date: Tue Jan 18 20:15:21 2011 -0500 xen/mmu: Add the notion of identity (1-1) mapping. Our P2M tree structure is a three-level. On the leaf nodes we set the Machine Frame Number (MFN) of the PFN. What this means is that when one does: pfn_to_mfn(pfn), which is used when creating PTE entries, you get the real MFN of the hardware. When Xen sets up a guest it initially populates a array which has descending (or ascending) MFN values, as so: idx: 0, 1, 2 [0x290F, 0x290E, 0x290D, ..] so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list starts looking quite random. We graft this structure on our P2M tree structure and stick in those MFN in the leafs. But for all other leaf entries, or for the top root, or middle one, for which there is a void entry, we assume it is "missing". So pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. We add the possibility of setting 1-1 mappings on certain regions, so that: pfn_to_mfn(0xc0000)=0xc0000 The benefit of this is, that we can assume for non-RAM regions (think PCI BARs, or ACPI spaces), we can create mappings easily b/c we get the PFN value to match the MFN. For this to work efficiently we introduce one new page p2m_identity and allocate (via reserved_brk) any other pages we need to cover the sides (1GB or 4MB boundary violations). All entries in p2m_identity are set to INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs, no other fancy value). On lookup we spot that the entry points to p2m_identity and return the identity value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry points to an allocated page, we just proceed as before and return the PFN. If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions (pfn_to_mfn). The reason for having the IDENTITY_FRAME_BIT instead of just returning the PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a non-identity pfn. To protect ourselves against we elect to set (and get) the IDENTITY_FRAME_BIT on all identity mapped PFNs. This simplistic diagram is used to explain the more subtle piece of code. There is also a digram of the P2M at the end that can help. Imagine your E820 looking as so: 1GB 2GB /-------------------+---------\/----\ /----------\ /---+-----\ | System RAM | Sys RAM ||ACPI| | reserved | | Sys RAM | \-------------------+---------/\----/ \----------/ \---+-----/ ^- 1029MB ^- 2001MB [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)] And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB is actually not present (would have to kick the balloon driver to put it in). When we are told to set the PFNs for identity mapping (see patch: "xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start of the PFN and the end PFN (263424 and 512256 respectively). The first step is to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page covers 512^2 of page estate (1GB) and in case the start or end PFN is not aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to end pfn. We reserve_brk top leaf pages if they are missing (means they point to p2m_mid_missing). With the E820 example above, 263424 is not 1GB aligned so we allocate a reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000. Each entry in the allocate page is "missing" (points to p2m_missing). Next stage is to determine if we need to do a more granular boundary check on the 4MB (or 2MB depending on architecture) off the start and end pfn's. We check if the start pfn and end pfn violate that boundary check, and if so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer granularity of setting which PFNs are missing and which ones are identity. In our example 263424 and 512256 both fail the check so we reserve_brk two pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values) and assign them to p2m[1][2] and p2m[1][488] respectively. At this point we would at minimum reserve_brk one page, but could be up to three. Each call to set_phys_range_identity has at maximum a three page cost. If we were to query the P2M at this stage, all those entries from start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY ("missing"). The next step is to walk from the start pfn to the end pfn setting the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'. If we find that the middle leaf is pointing to p2m_missing we can swap it over to p2m_identity - this way covering 4MB (or 2MB) PFN space. At this point we do not need to worry about boundary aligment (so no need to reserve_brk a middle page, figure out which PFNs are "missing" and which ones are identity), as that has been done earlier. If we find that the middle leaf is not occupied by p2m_identity or p2m_missing, we dereference that page (which covers 512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example 263424 and 512256 end up there, and we set from p2m[1][2][256->511] and p2m[1][488][0->256] with IDENTITY_FRAME_BIT set. All other regions that are void (or not filled) either point to p2m_missing (considered missing) or have the default value of INVALID_P2M_ENTRY (also considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511] contain the INVALID_P2M_ENTRY value and are considered "missing." This is what the p2m ends up looking (for the E820 above) with this fabulous drawing: p2m /--------------\ /-----\ | &mfn_list[0],| /-----------------\ | 0 |------>| &mfn_list[1],| /---------------\ | ~0, ~0, .. | |-----| | ..., ~0, ~0 | | ~0, ~0, [x]---+----->| IDENTITY [@256] | | 1 |---\ \--------------/ | [p2m_identity]+\ | IDENTITY [@257] | |-----| \ | [p2m_identity]+\\ | .... | | 2 |--\ \-------------------->| ... | \\ \----------------/ |-----| \ \---------------/ \\ | 3 |\ \ \\ p2m_identity |-----| \ \-------------------->/---------------\ /-----------------\ | .. +->+ | [p2m_identity]+-->| ~0, ~0, ~0, ... | \-----/ / | [p2m_identity]+-->| ..., ~0 | / /---------------\ | .... | \-----------------/ / | IDENTITY[@0] | /-+-[x], ~0, ~0.. | / | IDENTITY[@256]|<----/ \---------------/ / | ~0, ~0, .... | | \---------------/ | p2m_missing p2m_missing /------------------\ /------------\ | [p2m_mid_missing]+---->| ~0, ~0, ~0 | | [p2m_mid_missing]+---->| ..., ~0 | \------------------/ \------------/ where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT) Reviewed-by: Ian Campbell [v5: Changed code to use ranges, added ASCII art] [v6: Rebased on top of xen->p2m code split] [v4: Squished patches in just this one] [v7: Added RESERVE_BRK for potentially allocated pages] [v8: Fixed alignment problem] [v9: Changed 1<<3X to 1< commit 5fe0c2378884e68beb532f5890cc0e3539ac747b Author: Aneesh Kumar K.V Date: Sat Jan 29 18:43:25 2011 +0530 exportfs: Return the minimum required handle size The exportfs encode handle function should return the minimum required handle size. This helps user to find out the handle size by passing 0 handle size in the first step and then redoing to the call again with the returned handle size value. Acked-by: Serge Hallyn Signed-off-by: Aneesh Kumar K.V Signed-off-by: Al Viro commit c8b91accfa1059d5565443193d89572eca2f5dd6 Author: Al Viro Date: Sat Mar 12 10:41:39 2011 -0500 clean statfs-like syscalls up New helpers: user_statfs() and fd_statfs(), taking userland pathname and descriptor resp. and filling struct kstatfs. Syscalls of statfs family (native, compat and foreign - osf and hpux on alpha and parisc resp.) switched to those. Removes some boilerplate code, simplifies cleanup on errors... Signed-off-by: Al Viro commit 73d049a40fc6269189c4e2ba6792cb5dd054883c Author: Al Viro Date: Fri Mar 11 12:08:24 2011 -0500 open-style analog of vfs_path_lookup() new function: file_open_root(dentry, mnt, name, flags) opens the file vfs_path_lookup would arrive to. Note that name can be empty; in that case the usual requirement that dentry should be a directory is lifted. open-coded equivalents switched to it, may_open() got down exactly one caller and became static. Signed-off-by: Al Viro commit 5b6ca027d85b7438c84b78a54ccdc2e53f2909cd Author: Al Viro Date: Wed Mar 9 23:04:47 2011 -0500 reduce vfs_path_lookup() to do_path_lookup() New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller, path_init() starts walking from that place and all pathname resolution machinery never drops nd->root if that flag is set. That turns vfs_path_lookup() into a special case of do_path_lookup() *and* gets us down to 3 callers of link_path_walk(), making it finally feasible to rip the handling of trailing symlink out of link_path_walk(). That will not only simply the living hell out of it, but make life much simpler for unionfs merge. Trailing symlink handling will become iterative, which is a good thing for stack footprint in a lot of situations as well. Signed-off-by: Al Viro commit 5a18fff2090c3af830d699c8ccb230498a1e37e5 Author: Al Viro Date: Fri Mar 11 04:44:53 2011 -0500 untangle do_lookup() That thing has devolved into rats nest of gotos; sane use of unlikely() gets rid of that horror and gives much more readable structure: * make a fast attempt to find a dentry; false negatives are OK. In RCU mode if everything went fine, we are done, otherwise just drop out of RCU. If we'd done (RCU) ->d_revalidate() and it had not refused outright (i.e. didn't give us -ECHILD), remember its result. * now we are not in RCU mode and hopefully have a dentry. If we do not, lock parent, do full d_lookup() and if that has not found anything, allocate and call ->lookup(). If we'd done that ->lookup(), remember that dentry is good and we don't need to revalidate it. * now we have a dentry. If it has ->d_revalidate() and we can't skip it, call it. * hopefully dentry is good; if not, either fail (in case of error) or try to invalidate it. If d_invalidate() has succeeded, drop it and retry everything as if original attempt had not found a dentry. * now we can finish it up - deal with mountpoint crossing and automount. Signed-off-by: Al Viro commit 40b39136f07279fdc868a36cba050f4e84ce0ace Author: Al Viro Date: Wed Mar 9 16:22:18 2011 -0500 path_openat: clean ELOOP handling a bit Signed-off-by: Al Viro commit f374ed5fa8afed8590deaae5dc147422e0e1a6d9 Author: Al Viro Date: Wed Mar 9 01:34:45 2011 -0500 do_last: kill a rudiment of old ->d_revalidate() workaround There used to be time when ->d_revalidate() couldn't return an error. So intents code had lookup_instantiate_filp() stash ERR_PTR(error) in nd->intent.open.filp and had it checked after lookup_hash(), to catch the otherwise silent failures. That had been introduced by commit 4af4c52f34606bdaab6930a845550c6fb02078a4. These days ->d_revalidate() can and does propagate errors back to callers explicitly, so this check isn't needed anymore. Signed-off-by: Al Viro commit 6c0d46c493217cf48999b3f8808910ae534aa085 Author: Al Viro Date: Wed Mar 9 00:59:59 2011 -0500 fold __open_namei_create() and open_will_truncate() into do_last() ... and clean up a bit more Signed-off-by: Al Viro commit ca344a894b41a133dab07dfbbdf652c053f6658c Author: Al Viro Date: Wed Mar 9 00:36:45 2011 -0500 do_last: unify may_open() call and everyting after it We have a bunch of diverging codepaths in do_last(); some of them converge, but the case of having to create a new file duplicates large part of common tail of the rest and exits separately. Massage them so that they could be merged. Signed-off-by: Al Viro commit 9b44f1b3928b6f41532c9a1dc9a6fc665989ad5b Author: Al Viro Date: Wed Mar 9 00:17:27 2011 -0500 move may_open() from __open_name_create() to do_last() Signed-off-by: Al Viro commit 0f9d1a10c341020617e5b1c7f9c16f6a070438ec Author: Al Viro Date: Wed Mar 9 00:13:14 2011 -0500 expand finish_open() in its only caller Signed-off-by: Al Viro commit 5a202bcd75bbd2397136397961babbd8463416af Author: Al Viro Date: Tue Mar 8 14:17:44 2011 -0500 sanitize pathname component hash calculation Lift it to lookup_one_len() and link_path_walk() resp. into the same place where we calculated default hash function of the same name. Signed-off-by: Al Viro commit 6a96ba54418be740303765c0f52be028573cb99a Author: Al Viro Date: Mon Mar 7 23:49:20 2011 -0500 kill __lookup_one_len() only one caller left Signed-off-by: Al Viro commit fe2d35ff0d18a2c93993b0d7d46f846ff4331b72 Author: Al Viro Date: Sat Mar 5 22:58:25 2011 -0500 switch non-create side of open() to use of do_last() Instead of path_lookupat() doing trailing symlink resolution, use the same scheme as on the O_CREAT side. Walk with LOOKUP_PARENT, then (in do_last()) look the final component up, then either open it or return error or, if it's a symlink, give the symlink back to path_openat() to be resolved there. The really messy complication here is RCU. We don't want to drop out of RCU mode before the final lookup, since we don't want to bounce parent directory ->d_count without a good reason. Result is _not_ pretty; later in the series we'll clean it up. For now we are roughly back where we'd been before the revert done by Nick's series - top-level logics of path_openat() is cleaned up, do_last() does actual opening, symlink resolution is done uniformly. Signed-off-by: Al Viro commit 70e9b3571107b88674cd55ae4bed33f76261e7d3 Author: Al Viro Date: Sat Mar 5 21:12:22 2011 -0500 get rid of nd->file Don't stash the struct file * used as starting point of walk in nameidata; pass file ** to path_init() instead. Signed-off-by: Al Viro commit 951361f954596bd134d4270df834f47d151f98a6 Author: Al Viro Date: Fri Mar 4 14:44:37 2011 -0500 get rid of the last LOOKUP_RCU dependencies in link_path_walk() New helper: terminate_walk(). An error has happened during pathname resolution and we either drop nd->path or terminate RCU, depending the mode we had been in. After that, nd is essentially empty. Switch link_path_walk() to using that for cleanup. Now the top-level logics in link_path_walk() is back to sanity. RCU dependencies are in the lower-level functions. Signed-off-by: Al Viro commit a7472baba22dd5d68580f528374f93421b33667e Author: Al Viro Date: Fri Mar 4 14:39:30 2011 -0500 make nameidata_dentry_drop_rcu_maybe() always leave RCU mode Now we have do_follow_link() guaranteed to leave without dangling RCU and the next step will get LOOKUP_RCU logics completely out of link_path_walk(). Signed-off-by: Al Viro commit ef7562d5283a91da3ba5c14de3221f47b7f08823 Author: Al Viro Date: Fri Mar 4 14:35:59 2011 -0500 make handle_dots() leave RCU mode on error Signed-off-by: Al Viro commit 4455ca6223cc59cbc0a75f4be8bce9e84cc0d6b8 Author: Al Viro Date: Fri Mar 4 14:28:10 2011 -0500 clear RCU on all failure exits from link_path_walk() Signed-off-by: Al Viro commit 9856fa1b281eccdc9f8d94d716e96818c675e78e Author: Al Viro Date: Fri Mar 4 14:22:06 2011 -0500 pull handling of . and .. into inlined helper getting LOOKUP_RCU checks out of link_path_walk()... Signed-off-by: Al Viro commit 7bc055d1d524f209bf49d8b9cb220712dd7df4ed Author: Al Viro Date: Wed Feb 23 19:41:31 2011 -0500 kill out_dput: in link_path_walk() Signed-off-by: Al Viro commit 13aab428a73d3200b9283b61b7fdf5713181ac66 Author: Al Viro Date: Wed Feb 23 17:54:08 2011 -0500 separate -ESTALE/-ECHILD retries in do_filp_open() from real work new helper: path_openat(). Does what do_filp_open() does, except that it tries only the walk mode (RCU/normal/force revalidation) it had been told to. Both create and non-create branches are using path_lookupat() now. Fixed the double audit_inode() in non-create branch. Signed-off-by: Al Viro commit 47c805dc2d2dff686962f5f0baa6bac2d703ba19 Author: Al Viro Date: Wed Feb 23 17:44:09 2011 -0500 switch do_filp_open() to struct open_flags take calculation of open_flags by open(2) arguments into new helper in fs/open.c, move filp_open() over there, have it and do_sys_open() use that helper, switch exec.c callers of do_filp_open() to explicit (and constant) struct open_flags. Signed-off-by: Al Viro commit c3e380b0b3cfa613189fb91513efd88a65e1d9d8 Author: Al Viro Date: Wed Feb 23 13:39:45 2011 -0500 Collect "operation mode" arguments of do_last() into a structure No point messing with passing shitloads of "operation mode" arguments to do_open() one by one, especially since they are not going to change during do_filp_open(). Collect them into a struct, fill it and pass to do_last() by reference. Make sure that lookup intent flags are correctly set and removed - we want them for do_last(), but they make no sense for __do_follow_link(). Signed-off-by: Al Viro commit f1afe9efc84476ca42fbb7301a441021063eead7 Author: Al Viro Date: Tue Feb 22 22:27:28 2011 -0500 clean up the failure exits after __do_follow_link() in do_filp_open() Signed-off-by: Al Viro commit 36f3b4f69070fee7c647bab5dc4408990bb3606c Author: Al Viro Date: Tue Feb 22 21:24:38 2011 -0500 pull security_inode_follow_link() into __do_follow_link() Signed-off-by: Al Viro commit 086e183a641109033420e0b26ddecb6f4abb4c89 Author: Al Viro Date: Tue Feb 22 20:56:27 2011 -0500 pull dropping RCU on success of link_path_walk() into path_lookupat() Signed-off-by: Al Viro commit 16c2cd7179881d5dd87779512ca5a0d657c64f62 Author: Al Viro Date: Tue Feb 22 15:50:10 2011 -0500 untangle the "need_reval_dot" mess instead of ad-hackery around need_reval_dot(), do the following: set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute symlink traversal, on ".." and on procfs-style symlinks. Clear on normal components, leave unchanged on ".". Non-nested callers of link_path_walk() call handle_reval_path(), which checks that flag is set and that fs does want the final revalidate thing, then does ->d_revalidate(). In link_path_walk() all the return_reval stuff is gone. Signed-off-by: Al Viro commit fe479a580dc9c737c4eb49ff7fdb31d41d2c7003 Author: Al Viro Date: Tue Feb 22 15:10:03 2011 -0500 merge component type recognition no need to do it in three places... Signed-off-by: Al Viro commit e41f7d4ee5bdb00da7d327a00b0ab9c4a2e9eaa3 Author: Al Viro Date: Tue Feb 22 14:02:58 2011 -0500 merge path_init and path_init_rcu Actual dependency on whether we want RCU or not is in 3 small areas (as it ought to be) and everything around those is the same in both versions. Since each function has only one caller and those callers are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner to merge them and pull the checks inside. Signed-off-by: Al Viro commit ee0827cd6b42b0385dc1a116cd853ac1b739f711 Author: Al Viro Date: Mon Feb 21 23:38:09 2011 -0500 sanitize path_walk() mess New helper: path_lookupat(). Basically, what do_path_lookup() boils to modulo -ECHILD/-ESTALE handler. path_walk* family is gone; vfs_path_lookup() is using link_path_walk() directly, do_path_lookup() and do_filp_open() are using path_lookupat(). Signed-off-by: Al Viro commit 52094c8a0610cf57920ad4c6c57470ae2ccbbd25 Author: Al Viro Date: Mon Feb 21 21:34:47 2011 -0500 take RCU-dependent stuff around exec_permission() into a new helper Signed-off-by: Al Viro commit c9c6cac0c2bdbda42e7b804838648d0bc60ddb13 Author: Al Viro Date: Wed Feb 16 15:15:47 2011 -0500 kill path_lookup() all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro commit 15a9155fe3e8215c02b80df51ec2cac7c0d726ad Author: Al Viro Date: Wed Feb 16 15:08:54 2011 -0500 fix race in audit_get_nd() don't rely on pathname resolution ending up twice at the same point... Signed-off-by: Al Viro commit 586ce098a23b6ab7383df853a84ae3d48dc889aa Author: Al Viro Date: Sun Mar 13 01:50:58 2011 -0500 compat breakage in preadv() and pwritev() Fix for a dumb preadv()/pwritev() compat bug - unlike the native variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g. on pipe the native preadv() will fail with -ESPIPE and compat one will act as readv() and succeed. Not critical, but it's a clear bug with trivial fix. Signed-off-by: Al Viro commit 6e6823d17b157f185be09f4c70181299f9273f0b Author: Torben Hohn Date: Thu Mar 3 18:26:14 2011 +0100 posix-clocks: Check write permissions in posix syscalls pc_clock_settime() and pc_clock_adjtime() do not check whether the fd was opened in write mode, so a clock can be set with a read only fd. [ tglx: We deliberately do not return -EPERM as we want this to be distingushable from the capability based permission check ] Signed-off-by: Torben Hohn LKML-Reference: <1299173174-348-4-git-send-email-torbenh@gmx.de> Cc: Richard Cochran Cc: John Stultz Cc: Thomas Gleixner commit c0185808eb85139f45dbfd0de66963c498d0c4db Author: Thomas Gleixner Date: Mon Feb 7 02:24:08 2011 +0100 x86: Enable forced interrupt threading support All non threadeable interrupts are marked. Enable forced irq threading support. Signed-off-by: Thomas Gleixner commit 9bbbff25b31bbfdb512563cc5a14bcde5bf29bdf Author: Thomas Gleixner Date: Thu Jan 27 18:17:01 2011 +0100 x86: Mark low level interrupts IRQF_NO_THREAD These cannot be threaded. Signed-off-by: Thomas Gleixner commit 517e49815677b43b26d3167aadca83919ef36a45 Author: Thomas Gleixner Date: Thu Dec 16 17:59:57 2010 +0100 x86: Use generic show_interrupts Signed-off-by: Thomas Gleixner commit 1a0e62a49ad417712cfa79a395f6c39f67aadb44 Author: Thomas Gleixner Date: Sat Mar 12 13:47:18 2011 +0100 x86: ioapic: Avoid redundant lookup of irq_cfg The caller of ioapic_register_intr() has a pointer to the irq_cfg for the irq already. Hand it in to avoid a full lookup. In msi_compose_msg() the pointer to irq_cfg is already available. No need to look it up again. Signed-off-by: Thomas Gleixner commit 08221110e88ae101acf2464154f98e6d1b1ab21c Author: Thomas Gleixner Date: Fri Feb 4 18:56:11 2011 +0100 x86: ioapic: Use new move_irq functions Use the functions which take irq_data. We already have a pointer to irq_data. That avoids a sparse irq lookup in move_*_irq. Signed-off-by: Thomas Gleixner commit 51c43ac6e4540786a6d79ea318b30f7bfa615ec7 Author: Thomas Gleixner Date: Thu Feb 10 21:40:36 2011 +0100 x86: Use the proper accessors in fixup_irqs() Signed-off-by: Thomas Gleixner commit 5451ddc5621550a2f4f82ddeac938b3ca392525f Author: Thomas Gleixner Date: Sat Feb 5 15:35:51 2011 +0100 x86: ioapic: Use irq_data->state Use the state information in irq_data. That avoids a radix-tree lookup from apic_ack_level() and simplifies setup_ioapic_dest(). Signed-off-by: Thomas Gleixner commit c60eaf25cd211d2282a6edddb3ce26b1e5795097 Author: Thomas Gleixner Date: Fri Mar 11 13:17:16 2011 +0100 x86: ioapic: Simplify irq chip and handler setup Use pointers instead of ugly multiline if/else constructs. Signed-off-by: Thomas Gleixner commit 2c778651f73d92edb847e65d371bb29b17c7ca60 Author: Thomas Gleixner Date: Sat Mar 12 12:20:43 2011 +0100 x86: Cleanup the genirq name space genirq is switching to a consistent name space for the irq related functions. Convert x86. Conversion was done with coccinelle. Signed-off-by: Thomas Gleixner commit c1c5e4d463f844e5d44cafab752049267c102ca3 Merge: cfe08bb d209a69 Author: Thomas Gleixner Date: Sat Mar 12 13:23:37 2011 +0100 Merge branch 'irq/core' into x86/irq Reason: Enabling irq threads and update to latest genirq functionality requires the core code Signed-off-by: Thomas Gleixner commit cfe08bba1e0017d94a8f738a195d3a2b479327e3 Merge: 58bff94 abb0052 Author: Thomas Gleixner Date: Sat Mar 12 13:22:22 2011 +0100 Merge branch 'x86/apic' into x86/irq Reason: Update to latest genirq code conflicts with pending apic changes Signed-off-by: Thomas Gleixner commit 995612178c88407d8330f580ba6572cb8b284dd8 Merge: 8d7718a 6d55da5 Author: Thomas Gleixner Date: Sat Mar 12 11:37:14 2011 +0100 Merge branch 'tip/futex/devel' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-rt into core/futexes futex,plist: Pass the real head of the priority list to plist_del() futex,plist: Remove debug lock assignment from plist_node plist: Shrink struct plist_head plist: Add priority list test commit 56396e6823fe9b42fe9cf9403d6ed67756255f70 Author: Tejun Heo Date: Fri Mar 11 10:33:31 2011 +0100 x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation The distance transforming in numa_emulation() used to call numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node combinations regardless of which are enabled. As numa_set_distance() ignores all out-of-bound distance settings, this doesn't cause any problem other than looping unnecessarily many times during boot. However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the code such that it iterates through only the enabled combinations. Yinghai Lu identified the issue and provided an initial patch to address the issue; however, the patch was incorrect in that it didn't build emulated distance table when there's no physical distance table and unnecessarily complex. http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988 Signed-off-by: Tejun Heo Reported-by: Yinghai Lu Acked-by: Yinghai Lu commit d209a699a0b975ad47f399d70ddc3791f1b84496 Author: Thomas Gleixner Date: Fri Mar 11 21:22:14 2011 +0100 genirq: Add chip flag to force mask on suspend On suspend we disable all interrupts in the core code, but this does not mask the interrupt line in the default implementation as we use a lazy disable approach. That means we mark the interrupt disabled, but leave the hardware unmasked. That's an optimization because we avoid the hardware access for the common case where no interrupt happens after we marked it disabled. If an interrupt happens, then the interrupt flow handler masks the line at the hardware level and marks it pending. Suspend makes use of this delayed disable as it "disables" all interrupts when preparing the suspend transition. Right before the system goes into hardware suspend state it checks whether one of the interrupts which is marked as a wakeup interrupt came in after disabling it. Most interrupt chips have a separate register which selects the interrupts which can wake up the system from suspend, so we don't have to mask any on the non wakeup interrupts. But now we have to deal with brilliant designed hardware which lacks such a wakeup configuration facility. For such hardware it's necessary to mask all non wakeup interrupts before going into suspend in order to avoid the wakeup from random interrupts. Rather than working around this in the affected interrupt chip implementations we can solve this elegant in the core code itself. Add a flag IRQCHIP_MASK_ON_SUSPEND which can be set by the irq chip implementation to indicate, that the interrupts which are not selected as wakeup sources must be masked in the suspend path. Mask them in the loop which checks the wakeup interrupts pending flag. Signed-off-by: Thomas Gleixner Reviewed-by: Abhijeet Dharmapurikar LKML-Reference: commit 371c394af27ab7d1e58a66bc19d9f1f3ac1f67b4 Author: Alexander van Heukelum Date: Fri Mar 11 21:59:38 2011 +0100 x86, binutils, xen: Fix another wrong size directive The latest binutils (2.21.0.20110302/Ubuntu) breaks the build yet another time, under CONFIG_XEN=y due to a .size directive that refers to a slightly differently named (hence, to the now very strict and unforgiving assembler, non-existent) symbol. [ mingo: This unnecessary build breakage caused by new binutils version 2.21 gets escallated back several kernel releases spanning several years of Linux history, affecting over 130,000 upstream kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially affecting all major Linux distro kernel configs). Git annotate tells us that this slight debug symbol code mismatch bug has been introduced in 2008 in commit 3d75e1b8: 3d75e1b8 (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) The 'bug' is just a slight assymetry in ENTRY()/END() debug-symbols sequences, with lots of assembly code between the ENTRY() and the END(): ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) ... END(do_hypervisor_callback) Human reviewers almost never catch such small mismatches, and binutils never even warned about it either. This new binutils version thus breaks the Xen build on all upstream kernels since v2.6.27, out of the blue. This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels impossible on such binutils, for a bisection window of over hundred thousand historic commits. (!) This is a major fail on the side of binutils and binutils needs to turn this show-stopper build failure into a warning ASAP. ] Signed-off-by: Alexander van Heukelum Cc: Jeremy Fitzhardinge Cc: Jan Beulich Cc: H.J. Lu Cc: Linus Torvalds Cc: Andrew Morton Cc: "H. Peter Anvin" Cc: Kees Cook LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm> Signed-off-by: Ingo Molnar commit 6d55da53db3d9b911f69f2ce1e5fb8943eafe057 Author: Lai Jiangshan Date: Tue Dec 21 17:55:18 2010 +0800 plist: Add priority list test Add test code for checking plist when the kernel is booting. Signed-off-by: Lai Jiangshan LKML-Reference: <4D107986.1010302@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit bf6a9b8336ba12672755c2ae898b0abe42c7a5ac Author: Lai Jiangshan Date: Tue Dec 21 17:55:14 2010 +0800 plist: Shrink struct plist_head struct plist_head is used in struct task_struct as well as struct rtmutex. If we can make it smaller, it will also make these structures smaller as well. The field prio_list in struct plist_head is seldom used and we can get its information from the plist_nodes. Removing this field will decrease the size of plist_head by half. Signed-off-by: Lai Jiangshan LKML-Reference: <4D107982.9090700@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 017f2b239dabb2740b91df162e004371b861f371 Author: Lai Jiangshan Date: Tue Dec 21 17:55:10 2010 +0800 futex,plist: Remove debug lock assignment from plist_node The original code uses &plist_node->plist as the fake head of the priority list for plist_del(), these debug locks in the fake head are needed for CONFIG_DEBUG_PI_LIST. But now we always pass the real head to plist_del(), the debug locks in plist_node will not be used, so we remove these assignments. Acked-by: Darren Hart Signed-off-by: Lai Jiangshan LKML-Reference: <4D10797E.7040803@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 2e12978a9f7a7abd54e8eb9ce70a7718767b8b2c Author: Lai Jiangshan Date: Wed Dec 22 14:18:50 2010 +0800 futex,plist: Pass the real head of the priority list to plist_del() Some plist_del()s in kernel/futex.c are passed a faked head of the priority list. It does not fail because the current code does not require the real head in plist_del(). The current code of plist_del() just uses the head for checking, so it will not cause a bad result even when we use a faked head. But it is undocumented usage: /** * plist_del - Remove a @node from plist. * * @node: &struct plist_node pointer - entry to be removed * @head: &struct plist_head pointer - list head */ The document says that the @head is the "list head" head of the priority list. In futex code, several places use "plist_del(&q->list, &q->list.plist);", they pass a fake head. We need to fix them all. Thanks to Darren Hart for many suggestions. Acked-by: Darren Hart Signed-off-by: Lai Jiangshan LKML-Reference: <4D11984A.5030203@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit 86b32122fd54addc9af01f8b919c65d3f49090a3 Author: Konrad Rzeszutek Wilk Date: Wed Mar 9 22:03:16 2011 -0500 xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. If we have a guest that asked for: memory=1024 maxmem=2048 Which means we want 1GB now, and create pagetables so that we can expand up to 2GB, we would have this E820 layout: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable) Due to patch: "xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps." we would mark the memory past the 1GB mark as unusuable resulting in: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 0000000040000000 - 0000000080800000 (unusable) which meant that we could not balloon up anymore. We could balloon the guest down. The fix is to run the code introduced by the above mentioned patch only for the initial domain. We will have to revisit this once we start introducing a modified E820 for PCI passthrough so that we can utilize the P2M identity code. We also fix an overflow by having UL instead of ULL on 32-bit machines. [v2: Ian pointed to the overflow issue] Signed-off-by: Konrad Rzeszutek Wilk commit d9936bb3952a08d701f7b03f8f62d158f94d8085 Author: Thomas Gleixner Date: Fri Mar 11 14:15:35 2011 +0100 genirq: Add desc->irq_data accessor We have accessors for all fields in irq_data based on irq_desc, but not for irq_data itself. Signed-off-by: Thomas Gleixner commit 8d7718aa082aaf30a0b4989e1f04858952f941bc Author: Michel Lespinasse Date: Thu Mar 10 18:50:58 2011 -0800 futex: Sanitize futex ops argument types Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic prototypes to use u32 types for the futex as this is the data type the futex core code uses all over the place. Signed-off-by: Michel Lespinasse Cc: Darren Hart Cc: Peter Zijlstra Cc: Matt Turner Cc: Russell King Cc: David Howells Cc: Tony Luck Cc: Michal Simek Cc: Ralf Baechle Cc: "James E.J. Bottomley" Cc: Benjamin Herrenschmidt Cc: Martin Schwidefsky Cc: Paul Mundt Cc: "David S. Miller" Cc: Chris Metcalf Cc: Linus Torvalds LKML-Reference: <20110311025058.GD26122@google.com> Signed-off-by: Thomas Gleixner commit 37a9d912b24f96a0591773e6e6c3642991ae5a70 Author: Michel Lespinasse Date: Thu Mar 10 18:48:51 2011 -0800 futex: Sanitize cmpxchg_futex_value_locked API The cmpxchg_futex_value_locked API was funny in that it returned either the original, user-exposed futex value OR an error code such as -EFAULT. This was confusing at best, and could be a source of livelocks in places that retry the cmpxchg_futex_value_locked after trying to fix the issue by running fault_in_user_writeable(). This change makes the cmpxchg_futex_value_locked API more similar to the get_futex_value_locked one, returning an error code and updating the original value through a reference argument. Signed-off-by: Michel Lespinasse Acked-by: Chris Metcalf [tile] Acked-by: Tony Luck [ia64] Acked-by: Thomas Gleixner Tested-by: Michal Simek [microblaze] Acked-by: David Howells [frv] Cc: Darren Hart Cc: Peter Zijlstra Cc: Matt Turner Cc: Russell King Cc: Ralf Baechle Cc: "James E.J. Bottomley" Cc: Benjamin Herrenschmidt Cc: Martin Schwidefsky Cc: Paul Mundt Cc: "David S. Miller" Cc: Linus Torvalds LKML-Reference: <20110311024851.GC26122@google.com> Signed-off-by: Thomas Gleixner commit 522d7decc0370070448a8c28982c8dfd8970489e Author: Michel Lespinasse Date: Thu Mar 10 18:47:31 2011 -0800 futex: Remove redundant pagefault_disable in futex_atomic_cmpxchg_inatomic() kernel/futex.c disables page faults before calling futex_atomic_cmpxchg_inatomic(), so there is no need to do it again within that function. Signed-off-by: Michel Lespinasse Cc: Darren Hart Cc: Peter Zijlstra Cc: Matt Turner Cc: Russell King Cc: David Howells Cc: Tony Luck Cc: Michal Simek Cc: Ralf Baechle Cc: "James E.J. Bottomley" Cc: Benjamin Herrenschmidt Cc: Martin Schwidefsky Cc: Paul Mundt Cc: "David S. Miller" Cc: Chris Metcalf Cc: Linus Torvalds LKML-Reference: <20110311024731.GB26122@google.com> Signed-off-by: Thomas Gleixner commit c0c9ed15042ceac7c485813012a0a97316101b57 Author: Thomas Gleixner Date: Fri Mar 11 11:51:22 2011 +0100 futex: Avoid redudant evaluation of task_pid_vnr() The result is not going to change under us, so no need to reevaluate this over and over. Seems to be a leftover from the mechanical mass conversion of task->pid to task_pid_vnr(tsk). Signed-off-by: Thomas Gleixner commit 137ee20ddd10fdc20600c389fe63edab0c39cb1a Merge: 4a0b166 1c0b04d Author: Ingo Molnar Date: Fri Mar 11 09:28:31 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 25874a299ef8037df03ce4ada570bc4e42f9748f Author: Henrik Kretzschmar Date: Fri Mar 11 08:02:36 2011 +0100 x86: Clean up apic.c and apic.h This patch moves some functions and variables into init sections, makes a function static and removes some lines of cruft. Signed-off-by: Henrik Kretzschmar Acked-by: Cyrill Gorcunov LKML-Reference: <1299826956-8607-2-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar commit ec8df88f6bd808db47ac7a06c96dcc90d7ed6ecc Author: Henrik Kretzschmar Date: Fri Mar 11 08:02:35 2011 +0100 x86: Remove superflous goal definition of tsc_sync The extra tsc_sync.o goal definition is superflous. CONFIG_X86_64_SMP depends on CONFIG_SMP and tsc_sync.o is already in the definition of CONFIG_SMP. Signed-off-by: Henrik Kretzschmar Acked-by: Cyrill Gorcunov LKML-Reference: <1299826956-8607-1-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar commit 71eef7d1e3d9df760897fdd2cad6949a8bcf1620 Author: Ian Campbell Date: Fri Feb 18 17:06:55 2011 +0000 xen: events: remove dom0 specific xen_create_msi_irq The function name does not distinguish it from xen_allocate_pirq_msi (which operates on domU and pvhvm domains rather than dom0). Hoist domain 0 specific functionality up into the only caller leaving functionality common to all guest types in xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit ca1d8fe9521fb67c95cfa736c08f4bbbc282b5bd Author: Ian Campbell Date: Fri Feb 18 16:43:36 2011 +0000 xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit f420e010edd84eb2c237fc87b7451e69740fed46 Author: Ian Campbell Date: Fri Feb 18 16:43:35 2011 +0000 xen: events: push set_irq_msi down into xen_create_msi_irq Makes the tail end of this function look even more like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 2e55288f63343f0810f4f0a3004f78037cfb93d3 Author: Ian Campbell Date: Fri Feb 18 16:43:34 2011 +0000 xen: events: update pirq_to_irq in xen_create_msi_irq I don't think this was a deliberate ommision. Makes the tail end of this function look even more like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 8135591e90c81462a6902f6ffa1f1ca021db077a Author: Ian Campbell Date: Fri Feb 18 16:43:33 2011 +0000 xen: events: refactor xen_create_msi_irq slightly Calling PHYSDEVOP_map_pirq earlier simplifies error handling and starts to make the tail end of this function look like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit bf480d952bcf25e8ff7e95d2a23964107513ac51 Author: Ian Campbell Date: Fri Feb 18 16:43:32 2011 +0000 xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ Split the binding aspect of xen_allocate_pirq_msi out into a new xen_bind_pirq_to_irq function. In xen_hvm_setup_msi_irq when allocating a pirq write the MSI message to signal the PIRQ as soon as the pirq is obtained. There is no way to free the pirq back so if the subsequent binding to an IRQ fails we want to ensure that we will reuse the PIRQ next time rather than leak it. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 5cad61a6ba6f4956a218ffbb64cafcc1daefaca0 Author: Ian Campbell Date: Fri Feb 18 16:43:31 2011 +0000 xen: events: assume PHYSDEVOP_get_free_pirq exists The find_unbound_pirq is called only from xen_allocate_pirq_msi and only if alloc_pirq is true. The only caller which does this is xen_hvm_setup_msi_irqs. The use of this function is gated, in pci_xen_hvm_init, on XENFEAT_hvm_pirqs. The PHYSDEVOP_get_free_pirq interfaces was added to the hypervisor in 22410:be96f6058c05 while XENFEAT_hvm_pirqs was added a couple of minutes prior in 22409:6663214f06ac. Therefore we do not need to concern ourselves with hypervisors which support XENFEAT_hvm_pirqs but not PHYSDEVOP_get_free_pirq. This eliminates the fallback path in find_unbound_pirq which walks to pirq_to_irq array looking for a free pirq. Unlike the PHYSDEVOP_get_free_pirq interface this fallback only looks up a free pirq but does not reserve it. Removing this fallback will simplify locking in the future. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 9a626612c2010699d9909a4c3141d3a38660f3b3 Author: Ian Campbell Date: Fri Feb 18 16:43:30 2011 +0000 xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq apic_register_gsi_xen_hvm is a tiny wrapper around xen_hvm_register_pirq. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 4b41df7f6e0b5684378d9155773c42a4577e8582 Author: Ian Campbell Date: Fri Feb 18 16:43:29 2011 +0000 xen: events: return irq from xen_allocate_pirq_msi consistent with other similar functions. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit bb5d079aefa828c292c267ed34ed2282947fa233 Author: Ian Campbell Date: Fri Feb 18 16:43:28 2011 +0000 xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi All callers pass this flag so it is pointless. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit ae1635b05fae30804061406010914d85d12431ac Author: Ian Campbell Date: Fri Feb 18 16:43:27 2011 +0000 xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available. Cc: Jeremy Fitzhardinge Cc: xen-devel@lists.xensource.com Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 260a7d4cfd26d8bad8ac3a7fce11de47491d7e00 Author: Ian Campbell Date: Fri Feb 18 16:43:26 2011 +0000 xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0 Fixes: CC arch/x86/pci/xen.o arch/x86/pci/xen.c:183: warning: 'xen_initdom_setup_msi_irqs' defined but not used Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 8448f0119a4309ef9626cf8e2dc5abb881e6dc2a Merge: 8054c36 3d74a53 Author: Konrad Rzeszutek Wilk Date: Thu Mar 10 14:42:11 2011 -0500 Merge branch 'stable/pcifront-fixes' into stable/irq.cleanup * stable/pcifront-fixes: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work() commit 8054c3634cb3cb9d426c8ade934389213b857858 Merge: f5412be 1aa0b51 Author: Konrad Rzeszutek Wilk Date: Thu Mar 10 14:41:43 2011 -0500 Merge branch 'stable/irq.rework' into stable/irq.cleanup * stable/irq.rework: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME commit 8fe8f545c6d753ead15e1f4919d39e8f9bb49629 Author: Michel Lespinasse Date: Sun Mar 6 18:07:50 2011 -0800 futex: Update futex_wait_setup comments about locking Reviving a cleanup I had done about a year ago as part of a larger futex_set_wait proposal. Over the years, the locking of the hashed futex queue got improved, so that some of the "rare but normal" race conditions described in comments can't actually happen anymore. Signed-off-by: Michel Lespinasse Cc: Linus Torvalds Cc: Darren Hart Cc: Peter Zijlstra LKML-Reference: <20110307020750.GA31188@google.com> Signed-off-by: Thomas Gleixner commit a9e7acfff0a279792918b7b0de74106e576e9988 Author: Thomas Gleixner Date: Thu Mar 10 19:12:24 2011 +0100 hrtimer: Remove empty hrtimer_init_hres_timer() Leftover from earlier implementation. All empty, remove it. Signed-off-by: Thomas Gleixner commit 53370d2e8c0382e3e2aa76def93365ed674e7fc7 Author: Thomas Gleixner Date: Thu Mar 10 18:26:33 2011 +0100 hrtimer: Update hrtimer->state documentation We changed some of the state bits and combinations thereof over time, but never updated the documentation. Signed-off-by: Thomas Gleixner commit 4a0b1665db09cf2da9ad7d0f12da386373c10bfa Author: Steven Rostedt Date: Wed Mar 9 20:09:26 2011 -0500 tracing: Fix irqoff selftest expanding max buffer If the kernel command line declares a tracer "ftrace=sometracer" and that tracer is either not defined or is enabled after irqsoff, then the irqs off selftest will fail with the following error: Testing tracer irqsoff: ------------[ cut here ]------------ WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/tra ce.c:713 update_max_tr_single+0xfa/0x11b() Hardware name: Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1 Call Trace: [] ? warn_slowpath_common+0x65/0x7a [] ? update_max_tr_single+0xfa/0x11b [] ? warn_slowpath_null+0xf/0x13 [] ? update_max_tr_single+0xfa/0x11b [] ? stop_critical_timing+0x154/0x204 [] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [] ? time_hardirqs_on+0x25/0x28 [] ? trace_hardirqs_on_caller+0x18/0x12f [] ? trace_hardirqs_on+0xb/0xd [] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [] ? register_tracer+0xf8/0x1a3 [] ? init_irqsoff_tracer+0xd/0x11 [] ? do_one_initcall+0x71/0x121 [] ? init_irqsoff_tracer+0x0/0x11 [] ? kernel_init+0x13a/0x1b6 [] ? kernel_init+0x0/0x1b6 [] ? kernel_thread_helper+0x6/0x10 ---[ end trace e93713a9d40cd06c ]--- .. no entries found ..FAILED! What happens is the "ftrace=..." will expand the ring buffer to its default size (from its minimum size) but it will not expand the max ring buffer (the ring buffer to store maximum latencies). When the irqsoff test runs, it will call the ring buffer swap routine that checks if the max ring buffer is the same size as the normal ring buffer, and will fail if it is not. This causes the test to fail. The solution is to expand the max ring buffer before running the self test if the max ring buffer is used by that tracer and the normal ring buffer is expanded. The max ring buffer should be shrunk again after the test is done to save space. Signed-off-by: Steven Rostedt commit 9a24470b2826e4665b1484836c7ae6aba1ddea32 Author: Steven Rostedt Date: Wed Mar 9 14:53:38 2011 -0500 tracing: Align 4 byte ints together in struct tracer Move elements in struct tracer for better alignment. Signed-off-by: Steven Rostedt commit 56355b83e2a24ce7e1870c8479205e2cdd332225 Author: Yuanhan Liu Date: Mon Nov 8 14:05:12 2010 +0800 tracing: Export trace_set_clr_event() Trace events belonging to a module only exists when the module is loaded. Well, we can use trace_set_clr_event funtion to enable some trace event at the module init routine, so that we will not miss something while loading then module. So, Export the trace_set_clr_event function so that module can use it. Signed-off-by: Yuanhan Liu LKML-Reference: <1289196312-25323-1-git-send-email-yuanhan.liu@linux.intel.com> Cc: Steven Rostedt Cc: Frederic Weisbecker Cc: Ingo Molnar Signed-off-by: Steven Rostedt commit 31274d72f01604f4b02d933b4f3cac84d2c201fd Author: Jiri Olsa Date: Fri Feb 18 15:52:19 2011 +0100 tracing: Explain about unstable clock on resume with ring buffer warning The "Delta way too big" warning might appear on a system with a unstable shed clock right after the system is resumed and tracing was enabled at time of suspend. Since it's not realy a bug, and the unstable sched clock is working fast and reliable otherwise, Steven suggested to keep using the sched clock in any case and just to make note in the warning itself. v2 changes: - added #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK Signed-off-by: Jiri Olsa LKML-Reference: <20110218145219.GD2604@jolsa.brq.redhat.com> Signed-off-by: Steven Rostedt commit 722b3c74695377d11d18a52f3da08114d37f3f37 Author: Steven Rostedt Date: Fri Feb 11 20:36:02 2011 -0500 ftrace/graph: Trace function entry before updating index Currently the index to the ret_stack is updated and the real return address is saved in the ret_stack. Then we call the trace function. The trace function could decide that it doesn't want to trace this function (ex. set_graph_function does not match) and it will return 0 which means not to trace this call. The normal function graph tracer has this code: if (!(trace->depth || ftrace_graph_addr(trace->func)) || ftrace_graph_ignore_irqs()) return 0; What this states is, if the trace depth (which is curr_ret_stack) is zero (top of nested functions) then test if we want to trace this function. If this function is not to be traced, then return 0 and the rest of the function graph tracer logic will not trace this function. The problem arises when an interrupt comes in after we updated the curr_ret_stack. The next function that gets called will have a trace->depth of 1. Which fools this trace code into thinking that we are in a nested function, and that we should trace. This causes interrupts to be traced when they should not be. The solution is to trace the function first and then update the ret_stack. Reported-by: zhiping zhong Reported-by: wu zhangjin Signed-off-by: Steven Rostedt commit 1274a9c2e91652e28efa45c3e5886ec82f08bfbe Author: Steven Rostedt Date: Fri Feb 11 16:43:33 2011 -0500 ftrace: Add .ref.text as one of the safe areas to trace The section .ref.text will not go away unexpectedly and is safe to trace. Add it to the safe list of sections to allow tracing. Signed-off-by: Steven Rostedt commit 10da37a645b5e915d8572cc2b1f5eb11ada3ea4f Author: David Sharp Date: Fri Dec 3 16:13:26 2010 -0800 tracing: Adjust conditional expression latency formatting. Formatting change only to improve code readability. No code changes except to introduce intermediate variables. Signed-off-by: David Sharp LKML-Reference: <1291421609-14665-13-git-send-email-dhsharp@google.com> [ Keep variable declarations and assignment separate ] Signed-off-by: Steven Rostedt commit ca9da2dd63b0b32de1b693953dff66cadeb6400b Author: David Sharp Date: Fri Dec 3 16:13:23 2010 -0800 tracing: Fix event alignment: skb:kfree_skb Acked-by: Neil Horman Signed-off-by: David Sharp LKML-Reference: <1291421609-14665-10-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt commit ad440ad66f1617194738bf674dfe2d38978ac54d Author: David Sharp Date: Fri Dec 3 16:13:22 2010 -0800 tracing: Fix event alignment: mce:mce_record Signed-off-by: David Sharp LKML-Reference: <1291421609-14665-9-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt commit d5bf2ff07230a4a1b73ecb22363f77c02e1d85ab Author: David Sharp Date: Fri Dec 3 16:13:21 2010 -0800 tracing: Fix event alignment: kvm:kvm_hv_hypercall Acked-by: Avi Kivity Signed-off-by: David Sharp LKML-Reference: <1291421609-14665-8-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt commit b5e3008e489f5a00c6d5db914a4c4338c9ef5e8b Author: David Sharp Date: Fri Dec 3 16:13:20 2010 -0800 tracing: Fix event alignment: module:module_request Acked-by: Li Zefan Signed-off-by: David Sharp LKML-Reference: <1291421609-14665-7-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt commit 140e4f2d1cd816aed196705c036763313c0e4bd3 Author: David Sharp Date: Fri Dec 3 16:13:19 2010 -0800 tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup Signed-off-by: David Sharp LKML-Reference: <1291421609-14665-6-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt commit e6e1e2593592a8f6f6380496655d8c6f67431266 Author: Steven Rostedt Date: Wed Mar 9 10:41:56 2011 -0500 tracing: Remove lock_depth from event entry The lock_depth field in the event headers was added as a temporary data point for help in removing the BKL. Now that the BKL is pretty much been removed, we can remove this field. This in turn changes the header from 12 bytes to 8 bytes, removing the 4 byte buffer that gcc would insert if the first field in the data load was 8 bytes in size. Signed-off-by: Steven Rostedt commit 1c0b04d10bbe35279c50e3b36cf5b8ec2a0050d8 Author: Arnaldo Carvalho de Melo Date: Wed Mar 9 08:13:19 2011 -0300 perf header: Stop using 'self' Stop using this python/OOP convention, doesn't really helps. Will do more from time to time till we get it cleaned up in all of tools/perf. Suggested-by: Thomas Gleixner LKML-Reference: Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit a91e5431d54f5359fccb5ec2512f252eb217707e Author: Arnaldo Carvalho de Melo Date: Thu Mar 10 11:15:54 2011 -0300 perf session: Use evlist/evsel for managing perf.data attributes So that we can reuse things like the id to attr lookup routine (perf_evlist__id2evsel) that uses a hash table instead of the linear lookup done in the older perf_header_attr routines, etc. Also to make evsels/evlist more pervasive an API, simplyfing using the emerging perf lib. cc: Arun Sharma Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 6547250381eb315acff3d52b4872ad775359407c Author: Jiri Olsa Date: Mon Mar 7 21:13:41 2011 +0100 perf top: Don't let events to eat up whole header line Passing multiple events might force out information about pid/tid/cpu. Attached patch leaves 30 characters for this info at the expense of the events' names. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Han Pingtian LKML-Reference: <1299528821-17521-3-git-send-email-jolsa@redhat.com> Signed-off-by: Jiri Olsa Signed-off-by: Arnaldo Carvalho de Melo commit b9a46bba88001504235459c8410f17e6a7e38008 Author: Jiri Olsa Date: Mon Mar 7 21:13:40 2011 +0100 perf top: Fix events overflow in top command The snprintf function returns number of printed characters even if it cross the size parameter. So passing enough events via '-e' parameter will cause segmentation fault. It's reproduced by following command: perf top -e `perf list | grep Tracepoint | awk -F'[' '\ {gsub(/[[:space:]]+/,"",$1);array[FNR]=$1}END{outputs=array[1];\ for (i=2;i<=FNR;i++){ outputs=outputs "," array[i];};print outputs}'` Attached patch is adding SNPRINTF macro that provides the overflow check and returns actuall number of printed characters. Reported-by: Han Pingtian Cc: Han Pingtian Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1299528821-17521-2-git-send-email-jolsa@redhat.com> Signed-off-by: Jiri Olsa Signed-off-by: Arnaldo Carvalho de Melo commit ea04683f592e6200b52e191b7e2842aedcfd88b6 Author: John Stultz Date: Thu Feb 10 15:32:59 2011 -0800 RTC: Fix up rtc.txt documentation to reflect changes to generic rtc layer Now that the genric RTC layer handles much of the RTC functionality, the rtc.txt documentation needs to be updated to remove outdated information. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz commit 416f0e8056f757c119dc3d4fa434a62b65c8272b Author: Marcelo Roberto Jimenez Date: Mon Feb 7 19:16:08 2011 -0200 RTC: sa1100: Update the sa1100 RTC driver. Since PIE interrupts are now emulated, this patch removes the previous code that used the hardware counters. The removal of read_callback() also fixes a wrong user space behaviour of this driver, which was not returning the right value to read(). [john.stultz: Merge fixups] CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: Marcelo Roberto Jimenez Signed-off-by: John Stultz commit a417493ef916b8b6d1782a589766a713c553842e Author: Marcelo Roberto Jimenez Date: Mon Feb 7 19:16:07 2011 -0200 RTC: Fix the cross interrupt issue on rtc-test. The rtc-test driver is meant to provide a test/debug code for the RTC subsystem. The rtc-test driver simulates specific interrupts by echoing to the sys interface. Those were the update, alarm and periodic interrupts. As a side effect of the new implementation, any interrupt generated in the rtc-test driver would trigger the same code path in the generic code, and thus the distinction among interrupts gets lost. This patch preserves the previous behaviour of the rtc-test driver, where e.g. an update interrupt would not trigger an alarm or periodic interrupt, and vice-versa. In real world RTC drivers, this is not an issue, but in the rtc-test driver it may be interesting to distinguish these interrupts for testing purposes. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: Marcelo Roberto Jimenez Signed-off-by: John Stultz commit 4cebe7aadc9ee8e7b44857b7aba3a878870cef65 Author: Marcelo Roberto Jimenez Date: Mon Feb 7 19:16:06 2011 -0200 RTC: Remove UIE and PIE information from the sa1100 driver proc. This patch removes the UIE and PIE information that is now being supplied directly in the generic RTC code. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: Marcelo Roberto Jimenez Signed-off-by: John Stultz commit bca8521c551afcd926bdc8f814ebaefcb8215c57 Author: Marcelo Roberto Jimenez Date: Fri Feb 11 11:50:24 2011 -0200 RTC: Include information about UIE and PIE in RTC driver proc. Generic RTC code is always able to provide the necessary information about update and periodic interrupts. This patch add such information to the proc interface. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: Marcelo Roberto Jimenez Signed-off-by: John Stultz commit e428c6a2772bcf6b022baf7c8267cca3634c0c3e Author: John Stultz Date: Fri Feb 4 16:16:12 2011 -0800 RTC: Clean out UIE icotl implementations With the generic RTC rework, the UIE mode irqs are handled in the generic layer, and only hardware specific ioctls get passed down to the rtc driver layer. So this patch removes the UIE mode ioctl handling in the rtc driver layer, which never get used. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz commit 51ba60c5bb3b0f71bee26404ddc22d8e4109e88a Author: John Stultz Date: Thu Feb 3 12:13:50 2011 -0800 RTC: Cleanup rtc_class_ops->update_irq_enable() Now that the generic code handles UIE mode irqs via periodic alarm interrupts, no one calls the rtc_class_ops->update_irq_enable() method anymore. This patch removes the driver hooks and implementations of update_irq_enable if no one else is calling it. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz commit 696160fec162601d06940862b5b3aa4460344c1b Author: John Stultz Date: Thu Feb 3 12:02:07 2011 -0800 RTC: Cleanup rtc_class_ops->irq_set_freq() With the generic rtc code now emulating PIE mode irqs via an hrtimer, no one calls the rtc_class_ops->irq_set_freq call. This patch removes the hook and deletes the driver functions if no one else calls them. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz commit 80d4bb515b78f38738f3378fd1be6039063ab040 Author: John Stultz Date: Thu Feb 3 11:34:50 2011 -0800 RTC: Cleanup rtc_class_ops->irq_set_state With PIE mode interrupts now emulated in generic code via an hrtimer, no one calls rtc_class_ops->irq_set_state(), so this patch removes it along with driver implementations. CC: Thomas Gleixner CC: Alessandro Zummo CC: Marcelo Roberto Jimenez CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz commit f44f7f96a20af16f6f12e1c995576d6becf5f57b Author: John Stultz Date: Mon Feb 21 22:58:51 2011 -0800 RTC: Initialize kernel state from RTC Mark Brown pointed out a corner case: that RTC alarms should be allowed to be persistent across reboots if the hardware supported it. The rework of the generic layer to virtualize the RTC alarm virtualized much of the alarm handling, and removed the code used to read the alarm time from the hardware. Mark noted if we want the alarm to be persistent across reboots, we need to re-read the alarm value into the virtualized generic layer at boot up, so that the generic layer properly exposes that value. This patch restores much of the earlier removed rtc_read_alarm code and wires it in so that we set the kernel's alarm value to what we find in the hardware at boot time. NOTE: Not all hardware supports persistent RTC alarm state across system reset. rtc-cmos for example will keep the alarm time, but disables the AIE mode irq. Applications should not expect the RTC alarm to be valid after a system reset. We will preserve what we can, to represent the hardware state at boot, but its not guarenteed. Further, in the future, with multiplexed RTC alarms, the soonest alarm to fire may not be the one set via the /dev/rt ioctls. So an application may set the alarm with RTC_ALM_SET, but after a reset find that RTC_ALM_READ returns an earlier time. Again, we preserve what we can, but applications should not expect the RTC alarm state to persist across a system reset. Big thanks to Mark for pointing out the issue! Thanks also to Marcelo for helping think through the solution. CC: Mark Brown CC: Marcelo Roberto Jimenez CC: Thomas Gleixner CC: Alessandro Zummo CC: rtc-linux@googlegroups.com Reported-by: Mark Brown Signed-off-by: John Stultz commit de29be5e712dc8b7eef2bef9417af3bb6a88e47a Author: David Sharp Date: Fri Dec 3 16:13:16 2010 -0800 ring-buffer: Remove unused #include Signed-off-by: David Sharp LKML-Reference: <1291421609-14665-3-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt commit 750912fa366312e9c5bc83eab352898a26750401 Author: David Sharp Date: Wed Dec 8 13:46:47 2010 -0800 tracing: Add an 'overwrite' trace_option. Add an "overwrite" trace_option for ftrace to control whether the buffer should be overwritten on overflow or not. The default remains to overwrite old events when the buffer is full. This patch adds the option to instead discard newest events when the buffer is full. This is useful to get a snapshot of traces just after enabling traces. Dropping the current event is also a simpler code path. Signed-off-by: David Sharp LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt commit c49aa5bd1376939b40759a6da5ba6cf701702721 Author: Jan Beulich Date: Tue Mar 8 09:24:26 2011 +0000 x86: Remove dead config option X86_CPU This isn't being referenced anywhere, and the selects done from it can be easily done together with all the other X86 ones. v2: Also adjust UML's Kconfig.x86. Signed-off-by: Jan Beulich LKML-Reference: <4D7603DA02000078000351C1@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit c8b44163b754612fc4769fe1c5df00e98fc9d3c6 Merge: ac23f25 a5abba9 Author: Ingo Molnar Date: Wed Mar 9 10:38:55 2011 +0100 Merge commit 'v2.6.38-rc8' into x86/asm Merge reason: Update with the latest fixes. Signed-off-by: Ingo Molnar commit 51de69523ffe1c17994dc2f260369f29dfdce71c Author: Owen Smith Date: Wed Dec 22 15:05:00 2010 +0000 xen: Union the blkif_request request specific fields Prepare for extending the block device ring to allow request specific fields, by moving the request specific fields for reads, writes and barrier requests to a union member. Acked-by: Jens Axboe Signed-off-by: Owen Smith Signed-off-by: Konrad Rzeszutek Wilk commit c68fd4f3ca90de7d18c567e70b2c164078aefadf Author: Thomas Gleixner Date: Tue Mar 8 19:52:55 2011 +0100 genirq: Add comments to Kconfig switches Signed-off-by: Thomas Gleixner Cc: Sam Ravnborg commit 2a8247a2600c3e087a568fc68a6ec4eedac27ef1 Author: Jiri Olsa Date: Mon Feb 21 15:25:13 2011 +0100 kprobes: Disabling optimized kprobes for entry text section You can crash the kernel (with root/admin privileges) using kprobe tracer by running: echo "p system_call_after_swapgs" > ./kprobe_events echo 1 > ./events/kprobes/enable The reason is that at the system_call_after_swapgs label, the kernel stack is not set up. If optimized kprobes are enabled, the user space stack is being used in this case (see optimized kprobe template) and this might result in a crash. There are several places like this over the entry code (entry_$BIT). As it seems there's no any reasonable/maintainable way to disable only those places where the stack is not ready, I switched off the whole entry code from kprobe optimizing. Signed-off-by: Jiri Olsa Acked-by: Masami Hiramatsu Cc: acme@redhat.com Cc: fweisbec@gmail.com Cc: ananth@in.ibm.com Cc: davem@davemloft.net Cc: a.p.zijlstra@chello.nl Cc: eric.dumazet@gmail.com Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <1298298313-5980-3-git-send-email-jolsa@redhat.com> Signed-off-by: Ingo Molnar commit ea7145477a461e09d8d194cac4b996dc4f449107 Author: Jiri Olsa Date: Mon Mar 7 19:10:39 2011 +0100 x86: Separate out entry text section Put x86 entry code into a separate link section: .entry.text. Separating the entry text section seems to have performance benefits - caused by more efficient instruction cache usage. Running hackbench with perf stat --repeat showed that the change compresses the icache footprint. The icache load miss rate went down by about 15%: before patch: 19417627 L1-icache-load-misses ( +- 0.147% ) after patch: 16490788 L1-icache-load-misses ( +- 0.180% ) The motivation of the patch was to fix a particular kprobes bug that relates to the entry text section, the performance advantage was discovered accidentally. Whole perf output follows: - results for current tip tree: Performance counter stats for './hackbench/hackbench 10' (500 runs): 19417627 L1-icache-load-misses ( +- 0.147% ) 2676914223 instructions # 0.497 IPC ( +- 0.079% ) 5389516026 cycles ( +- 0.144% ) 0.206267711 seconds time elapsed ( +- 0.138% ) - results for current tip tree with the patch applied: Performance counter stats for './hackbench/hackbench 10' (500 runs): 16490788 L1-icache-load-misses ( +- 0.180% ) 2717734941 instructions # 0.502 IPC ( +- 0.079% ) 5414756975 cycles ( +- 0.148% ) 0.206747566 seconds time elapsed ( +- 0.137% ) Signed-off-by: Jiri Olsa Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Andrew Morton Cc: Nick Piggin Cc: Eric Dumazet Cc: masami.hiramatsu.pt@hitachi.com Cc: ananth@in.ibm.com Cc: davem@davemloft.net Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com> Signed-off-by: Ingo Molnar commit 86cb2ec7b22a0a89b8660110dc03321fadbef45f Merge: 7f0030b a5abba9 Author: Ingo Molnar Date: Tue Mar 8 17:21:49 2011 +0100 Merge commit 'v2.6.38-rc8' into perf/core Merge reason: Merge latest fixes. Signed-off-by: Ingo Molnar commit 997772884036e6e121de39322179989154437d9f Author: Stanislaw Gruszka Date: Mon Mar 7 09:58:33 2011 +0100 debugobjects: Add hint for better object identification In complex subsystems like mac80211 structures can contain several timers and work structs, so identifying a specific instance from the call trace and object type output of debugobjects can be hard. Allow the subsystems which support debugobjects to provide a hint function. This function returns a pointer to a kernel address (preferrably the objects callback function) which is printed along with the debugobjects type. Add hint methods for timer_list, work_struct and hrtimer. [ tglx: Massaged changelog, made it compile ] Signed-off-by: Stanislaw Gruszka LKML-Reference: <20110307085809.GA9334@redhat.com> Signed-off-by: Thomas Gleixner commit 7f0030b211579939461468f25b80c73e293c46e0 Author: Arnaldo Carvalho de Melo Date: Sun Mar 6 13:07:30 2011 -0300 perf report tui: Improve multi event session support When multiple events were used in 'perf record', allow the user to choose which one is wanted before showing the per event histograms. Annotations will be performed on the chosen event. Allow going back and forth from event to event quickly using just the arrow keys and enter. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: William Cohen LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e248de331a452f8771eda6ed4bb30d92c82df28b Author: Arnaldo Carvalho de Melo Date: Sat Mar 5 21:40:06 2011 -0300 perf tools: Improve support for sessions with multiple events By creating an perf_evlist out of the attributes in the perf.data file header, so that we can use evlists and evsels when reading recorded sessions in addition to when we record sessions. More work is needed to allow tools to allow the user to select which events are wanted when browsing sessions, be it just one or a subset of them, aggregated or showed at the same time but with different indications on the UI to allow seeing workloads thru different views at the same time. But the overall goal/trend is to more uniformly use evsels and evlists. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 3d3b5e95997208067c963923db90ed1517565d14 Author: Arnaldo Carvalho de Melo Date: Fri Mar 4 22:29:39 2011 -0300 perf evlist: Split perf_evlist__id_hash The previous situation was to receive an fd from where to read the event ID. Spin off a routine for when we have the ID handy, not having to read it from some fd. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 60098917c06d154d06ce030c125266eab9e60768 Author: Arnaldo Carvalho de Melo Date: Fri Mar 4 21:19:21 2011 -0300 perf hists browser: Handle browsing empty hists tree Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d7603d5122d9700fb8f36fa08b04f4e900fef059 Author: Arnaldo Carvalho de Melo Date: Fri Mar 4 14:51:33 2011 -0300 perf hists: Remove needless global col lenght calcs To support multiple events we need to do these calcs per 'struct hists' instance, and it turns out we already do that at: __hists__add_entry hists__inc_nr_entries hists__calc_col_len for all the unfiltered hist_entry instances we stash in the rb tree, so trow away the dead code. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit a03f35ceeb3d279da35c5a914ac01a4b1effb0a1 Author: Arnaldo Carvalho de Melo Date: Thu Mar 3 16:43:03 2011 -0300 perf report tui: Fix multi event switching TAB/UNTAB were not hotkeys, so didn't exit hists__browse back to hists__tui_browse_tree, allowing just the first event to be browsed. Reported-by: William Cohen Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: William Cohen LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit ac23f25355ef53f3d14352fcff3c6817527a9749 Author: Jan Beulich Date: Fri Mar 4 15:52:35 2011 +0000 x86: Really print supported CPUs if PROCESSOR_SELECT=y I'm sure it was a mere oversight that the CONFIG_ prefixes are missing. Signed-off-by: Jan Beulich Cc: Dave Jones LKML-Reference: <4D7118D30200007800034F79@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit ca764aaf025d2c83054191895b366fa81a9ccf48 Merge: d04c579 078a198 Author: Ingo Molnar Date: Sat Mar 5 07:32:45 2011 +0100 Merge branch 'x86-mm' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into x86/mm commit 6909262429b70a162e9e7053672cfd8024c9275d Author: Lin Ming Date: Thu Mar 3 10:34:50 2011 +0800 perf: Avoid the percore allocations if the CPU is not HT capable Signed-off-by: Lin Ming Signed-off-by: Peter Zijlstra LKML-Reference: <1299119690-13991-5-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar commit 078a198906c796981f93ff100c210506e91aade5 Author: Tejun Heo Date: Fri Mar 4 16:32:02 2011 +0100 x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() Undetermined entries in emu_nid_to_phys[] are filled with zero assuming that physical node 0 is always online; however, this might not be true depending on hardware configuration. Find a physical node which is actually online and use it instead. Signed-off-by: Tejun Heo Reported-by: David Rientjes LKML-Reference: commit c09cedf4f75f1e47ea17f55e18e9cfb81bec8575 Author: David Rientjes Date: Fri Mar 4 15:17:21 2011 +0100 x86-64, NUMA: Clean up initmem_init() This patch cleans initmem_init() so that it is more readable and doesn't use an unnecessary array of function pointers to convolute the flow of the code. It also makes it obvious that dummy_numa_init() will always succeed (and documents that requirement) so that the existing BUG() is never actually reached. No functional change. -tj: Updated comment for dummy_numa_init() slightly. Signed-off-by: David Rientjes Signed-off-by: Tejun Heo commit 51b361b4009f4e19ae68d2bcbb35e254e91b6054 Author: Yinghai Lu Date: Fri Mar 4 14:49:28 2011 +0100 x86-64, NUMA: Fix numa_emulation code with node0 without RAM On one system that does not have RAM on node0. When numa_emulation is compiled in, and 1. boot system without numa=fake... 2. or boot system with numa=fake=128 to make emulation fail will get: [ 0.092026] ------------[ cut here ]------------ [ 0.096005] kernel BUG at arch/x86/mm/numa_emulation.c:439! [ 0.096005] invalid opcode: 0000 [#1] SMP [ 0.096005] last sysfs file: [ 0.096005] CPU 0 [ 0.096005] Modules linked in: [ 0.096005] [ 0.096005] Pid: 0, comm: swapper Not tainted 2.6.38-rc6-tip-yh-03869-gcb0491d-dirty #684 Sun Microsystems Sun Fire X4240/Sun Fire X4240 [ 0.096005] RIP: 0010:[] [] numa_add_cpu+0x56/0xcf [ 0.096005] RSP: 0000:ffffffff82437ed8 EFLAGS: 00010246 ... [ 0.096005] Call Trace: [ 0.096005] [] identify_cpu+0x2d7/0x2df [ 0.096005] [] identify_boot_cpu+0x10/0x30 [ 0.096005] [] check_bugs+0x9/0x2d [ 0.096005] [] start_kernel+0x3d7/0x3f1 [ 0.096005] [] x86_64_start_reservations+0x9c/0xa0 [ 0.096005] [] x86_64_start_kernel+0x1dd/0x1e8 [ 0.096005] Code: 74 06 48 8d 04 90 eb 0f 48 c7 c0 30 d9 00 00 48 03 04 d5 90 0f 60 82 8b 00 83 f8 ff 74 0d 0f a3 05 8b 7e 92 00 19 d2 85 d2 75 02 <0f> 0b 48 98 be 00 01 00 00 48 c7 c7 e0 44 60 82 44 8b 2c 85 e0 [ 0.096005] RIP [] numa_add_cpu+0x56/0xcf [ 0.096005] RSP [ 0.096026] ---[ end trace a7919e7f17c0a725 ]--- We need to use early_cpu_to_node() directly, because numa_cpu_node() will return node0 that is not onlined. Signed-off-by: Yinghai Lu Signed-off-by: Tejun Heo commit e994d7d23a0bae34cd28834e85522ed4e782faf7 Author: Andi Kleen Date: Thu Mar 3 10:34:48 2011 +0800 perf: Fix LLC-* events on Intel Nehalem/Westmere On Intel Nehalem and Westmere CPUs the generic perf LLC-* events count the L2 caches, not the real L3 LLC - this was inconsistent with behavior on other CPUs. Fixing this requires the use of the special OFFCORE_RESPONSE events which need a separate mask register. This has been implemented by the previous patch, now use this infrastructure to set correct events for the LLC-* on Nehalem and Westmere. Signed-off-by: Andi Kleen Signed-off-by: Lin Ming Signed-off-by: Peter Zijlstra LKML-Reference: <1299119690-13991-3-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar commit a7e3ed1e470116c9d12c2f778431a481a6be8ab6 Author: Andi Kleen Date: Thu Mar 3 10:34:47 2011 +0800 perf: Add support for supplementary event registers Change logs against Andi's original version: - Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra) - Fixed a major event scheduling issue. There cannot be a ref++ on an event that has already done ref++ once and without calling put_constraint() in between. (Stephane Eranian) - Use thread_cpumask for percore allocation. (Lin Ming) - Use MSR names in the extra reg lists. (Lin Ming) - Remove redundant "c = NULL" in intel_percore_constraints - Fix comment of perf_event_attr::config1 Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event that can be used to monitor any offcore accesses from a core. This is a very useful event for various tunings, and it's also needed to implement the generic LLC-* events correctly. Unfortunately this event requires programming a mask in a separate register. And worse this separate register is per core, not per CPU thread. This patch: - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters. The extra parameters are passed by user space in the perf_event_attr::config1 field. - Adds support to the Intel perf_event core to schedule per core resources. This adds fairly generic infrastructure that can be also used for other per core resources. The basic code has is patterned after the similar AMD northbridge constraints code. Thanks to Stephane Eranian who pointed out some problems in the original version and suggested improvements. Signed-off-by: Andi Kleen Signed-off-by: Lin Ming Signed-off-by: Peter Zijlstra LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar commit 17e3162972cbb9796035fff1e2fd30669b0eef65 Author: Stephane Eranian Date: Wed Mar 2 17:05:01 2011 +0200 perf_events: Update PEBS event constraints This patch updates PEBS event constraints for Intel Atom, Nehalem, Westmere. This patch also reorganizes the PEBS format/constraint detection code. It is now based on processor model and not PEBS format. Two processors may use the same PEBS format without have the same list of PEBS events. In this second version, we simplified the initialization of the PEBS constraints by leveraging the existing switch() statement in perf_event_intel.c. We also renamed the constraint tables to be more consistent with regular constraints. In this 3rd version, we drop BR_INST_RETIRED.MISPRED from Intel Atom as it does not seem to work. Use MISPREDICTED_BRANCH_RETIRED instead. Also add FP_ASSIST.* o both Intel Nehalem and Westmere. I misssed those in the earlier patches. Events were tested using libpfm4 perf_examples. Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4d6e6b02.815bdf0a.637b.07a7@mx.google.com> Signed-off-by: Ingo Molnar commit 08309379b7083a9ceec0f9bb96a629058fb623c4 Author: Peter Zijlstra Date: Thu Mar 3 11:31:20 2011 +0100 perf: Fix cgroup vs jump_label problem Li Zefan reported that the jump label code sleeps and we're calling it under a spinlock, *fail* ;-) Reported-by: Li Zefan Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 2d0f25201ee210a0666ec9c41538ba05a07f8bc6 Author: Li Zefan Date: Thu Mar 3 14:26:20 2011 +0800 perf cgroup: Fix a typo in kernel config s/specificied/specified Signed-off-by: Li Zefan Acked-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4D6F348C.2050804@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 1b15d0558e82df9b3659804ceb44187b98eda354 Author: Li Zefan Date: Thu Mar 3 14:26:06 2011 +0800 perf cgroup: Clean up perf_cgroup_create() - Use kzalloc() to replace kmalloc() + memset(). - Remove redundant initialization, since alloc_percpu() returns zero-filled percpu memory. Signed-off-by: Li Zefan Acked-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4D6F347E.2010806@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit f75e18cb9627b1d3d752b83a0b5563da0042c50a Author: Li Zefan Date: Thu Mar 3 14:25:50 2011 +0800 perf cgroup: Fix unmatched call to perf_detach_cgroup() In the failure path, we call perf_detach_cgroup(), but we didn't call perf_get_cgroup() prio to it. Signed-off-by: Li Zefan Acked-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4D6F346E.9070606@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 3db272c0494900fcb905a201180a78cae3addd6e Author: Li Zefan Date: Thu Mar 3 14:25:37 2011 +0800 perf cgroup: Fix leak of file reference count In perf_cgroup_connect(), fput_light() is missing in a failure path. Signed-off-by: Li Zefan Acked-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4D6F3461.6060406@cn.fujitsu.com> Signed-off-by: Ingo Molnar commit 940c5b2971de443df22eed0441bc74fb0116e9f5 Author: Lin Ming Date: Sun Feb 27 21:13:31 2011 +0800 perf: Fix the missing event initialization when pmu is found in idr Currently, the event is not initialized if pmu is found in idr. This never causes bug just because now no pmu is associated with the idr id. Signed-off-by: Lin Ming Signed-off-by: Peter Zijlstra LKML-Reference: <1298812411.2699.9.camel@localhost> Signed-off-by: Ingo Molnar commit 6d1cafd8b56ea726c10a5a104de57cc3ed8fa953 Author: Venkatesh Pallipadi Date: Tue Mar 1 16:28:21 2011 -0800 sched: Resched proper CPU on yield_to() yield_to_task_fair() has code to resched the CPU of yielding task when the intention is to resched the CPU of the task that is being yielded to. Change here fixes the problem and also makes the resched conditional on rq != p_rq. Signed-off-by: Venkatesh Pallipadi Reviewed-by: Rik van Riel Signed-off-by: Peter Zijlstra LKML-Reference: <1299025701-22168-1-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8 Author: Darren Hart Date: Thu Feb 17 15:37:07 2011 -0800 sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy The current scheduler implementation returns -EPERM when trying to change from SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE is considered to be a nice 20 on steroids, changing to another policy should be allowed provided the RLIMIT_NICE is accounted for. This patch allows the following test-case to pass with RLIMIT_NICE=40, but still fail with RLIMIT_NICE=10 when the calling process is run from a typical shell (nice 0, or 20 in rlimit terms). int main() { int ret; struct sched_param sp; sp.sched_priority = 0; /* switch to SCHED_IDLE */ ret = sched_setscheduler(0, SCHED_IDLE, &sp); printf("setscheduler IDLE: %d\n", ret); if (ret) return ret; /* switch back to SCHED_OTHER */ ret = sched_setscheduler(0, SCHED_OTHER, &sp); printf("setscheduler OTHER: %d\n", ret); return ret; } $ ulimit -e 40 $ ./test setscheduler IDLE: 0 setscheduler OTHER: 0 $ ulimit -e 10 $ ulimit -e 10 $ ./test setscheduler IDLE: 0 setscheduler OTHER: -1 Signed-off-by: Darren Hart Signed-off-by: Peter Zijlstra Cc: Richard Purdie LKML-Reference: <4D657BEE.4040608@linux.intel.com> Signed-off-by: Ingo Molnar commit a2f5c9ab79f78e8b91ac993e0543d65b661dd19b Author: Darren Hart Date: Tue Feb 22 13:04:33 2011 -0800 sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and ensure idle tasks don't preempt idle tasks) so the non-interactive, but still important, SCHED_BATCH tasks will run in favor of the very low priority SCHED_IDLE tasks. Signed-off-by: Darren Hart Signed-off-by: Peter Zijlstra Acked-by: Mike Galbraith Cc: Richard Purdie LKML-Reference: <1298408674-3130-2-git-send-email-dvhart@linux.intel.com> Signed-off-by: Ingo Molnar commit e0a92c17470775cd85bac52f5372ccc3dc58254a Merge: 544b4a1 0c3b916 Author: Ingo Molnar Date: Fri Mar 4 11:12:24 2011 +0100 Merge branch 'sched/urgent' into sched/core Merge reason: Add fixes before applying dependent patches. Signed-off-by: Ingo Molnar commit 888a8a3e9d79cbb9d83e53955f684998248580ec Merge: cfff2d9 b06b3d4 Author: Ingo Molnar Date: Fri Mar 4 10:40:22 2011 +0100 Merge branch 'perf/urgent' into perf/core Merge reason: Pick up updates before queueing up dependent patches. Signed-off-by: Ingo Molnar commit f89112502805c1f6a6955f90ad158e538edb319d Author: Tejun Heo Date: Fri Mar 4 10:26:36 2011 +0100 x86-64, NUMA: Revert NUMA affine page table allocation This patch reverts NUMA affine page table allocation added by commit 1411e0ec31 (x86-64, numa: Put pgtable to local node memory). The commit made an undocumented change where the kernel linear mapping strictly follows intersection of e820 memory map and NUMA configuration. If the physical memory configuration has holes or NUMA nodes are not properly aligned, this leads to using unnecessarily smaller mapping size which leads to increased TLB pressure. For details, http://thread.gmane.org/gmane.linux.kernel/1104672 Patches to fix the problem have been proposed but the underlying code needs more cleanup and the approach itself seems a bit heavy handed and it has been determined to revert the feature for now and come back to it in the next developement cycle. http://thread.gmane.org/gmane.linux.kernel/1105959 As init_memory_mapping_high() callsites have been consolidated since the commit, reverting is done manually. Also, the RED-PEN comment in arch/x86/mm/init.c is not restored as the problem no longer exists with memblock based top-down early memory allocation. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Thomas Gleixner commit cfff2d909cbdaf8c467bd321aa0502a548ec8f7e Author: Frederic Weisbecker Date: Fri Feb 25 21:30:16 2011 +0100 perf: Fix undefined PyVarObject_HEAD_INIT in python 2.5 PyVarObject_HEAD_INIT is undefined in python 2.5, resulting in a build crash: util/python.c:81: attention : déclaration implicite de la fonction « «PyVarObject_HEAD_INIT» » util/python.c:82: erreur: request for member «tp_name» in something not a structure or union util/python.c:117: erreur: request for member «tp_name» in something not a structure or union util/python.c:146: erreur: request for member «tp_name» in something not a structure or union util/python.c:177: erreur: request for member «tp_name» in something not a structure or union util/python.c:290: erreur: request for member «tp_name» in something not a structure or union util/python.c:359: erreur: request for member «tp_name» in something not a structure or union util/python.c:532: erreur: request for member «tp_name» in something not a structure or union util/python.c:761: erreur: request for member «tp_name» in something not a structure or union error: command 'gcc' failed with exit status 1 make: *** [python/perf.so] Erreur 1 We can fix that by defining PyVarObject_HEAD_INIT as a wrapper on PyObject_HEAD_INIT, thanks to a trick found on biopython: https://github.com/biopython/biopython/commit/d4eaf57946c7b4c32eca8d18821edf32f83e300d Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: Arnaldo Carvalho de Melo commit ff9ae1babd8ce88c3f90db6278ea5f55bdcb4624 Author: Frederic Weisbecker Date: Fri Feb 25 21:57:04 2011 +0100 perf: Fix missing strndup declaration is included first without _GNU_SOURCE, so it ends up including without declaring strndup(). And further declarations, even with _GNU_SOURCE defined, are of course without effect. Therefore: util/strfilter.c: Dans la fonction «strfilter_node__new» : util/strfilter.c:134: attention : déclaration implicite de la fonction « «strndup» » util/strfilter.c:134: attention : incompatible implicit declaration of built-in function «strndup» make: *** [util/strfilter.o] Erreur 1 Just don't include ctype.h as it doesn't appear to be necessary anyway. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: Arnaldo Carvalho de Melo commit 1aa0b51a033d4a1ec6d29d06487e053398afa21b Author: Konrad Rzeszutek Wilk Date: Thu Feb 17 11:23:58 2011 -0500 xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. We only did this for PV guests that are xen_initial_domain() but there is not reason not to do this for other cases. The other case is only exercised when you pass in a PCI device to a PV guest _and_ the device in question. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 676dc3cf5bc36a9e129a3ad8fe3bd7b2ebf20f5d Author: Thomas Gleixner Date: Sat Feb 5 20:08:59 2011 +0000 xen: Use IRQF_FORCE_RESUME Mark the IRQF_NO_SUSPEND interrupts IRQF_FORCE_RESUME and remove the extra walk through the interrupt descriptors. Signed-off-by: Thomas Gleixner Signed-off-by: Konrad Rzeszutek Wilk commit 8aef4857d26c46ca3d4f1a7f3a7aa4b51a72385e Merge: f611f2d dc5f219 Author: Konrad Rzeszutek Wilk Date: Thu Mar 3 12:02:02 2011 -0500 Merge branch 'irq/for-xen' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into HEAD * 'irq/for-xen' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Add IRQF_FORCE_RESUME commit f611f2da99420abc973c32cdbddbf5c365d0a20c Author: Ian Campbell Date: Tue Feb 8 14:03:31 2011 +0000 xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. The patches missed an indirect use of IRQF_NO_SUSPEND pulled in via IRQF_TIMER. The following patch fixes the issue. With this fixlet PV guest migration works just fine. I also booted the entire series as a dom0 kernel and it appeared fine. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit aa673c1cb3a66d0b37595251c4e8bb688efc8726 Author: Ian Campbell Date: Mon Feb 7 11:08:39 2011 +0000 xen: Fix compile error introduced by "switch to new irq_chip functions" drivers/xen/events.c: In function 'ack_pirq': drivers/xen/events.c:568: error: implicit declaration of function 'irq_move_irq' Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit c9e265e030537167c94cbed190826f02e3887f4d Author: Thomas Gleixner Date: Sat Feb 5 20:08:54 2011 +0000 xen: Switch to new irq_chip functions Convert Xen to the new irq_chip functions. Brings us closer to enable CONFIG_GENERIC_HARDIRQS_NO_DEPRECATED Signed-off-by: Thomas Gleixner Acked-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 149f256f8ca690c28dd8aa9fb8bcdaf2e93b1e1c Author: Thomas Gleixner Date: Sat Feb 5 20:08:52 2011 +0000 xen: Remove stale irq_chip.end irq_chip.end got obsolete with the removal of __do_IRQ() Signed-off-by: Thomas Gleixner Acked-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 7214610475b2847a81478d96e4d3ba0bbe49598c Author: Ian Campbell Date: Thu Feb 3 09:49:35 2011 +0000 xen: events: do not free legacy IRQs c514d00c8057 "xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq" correctly avoids reallocating legacy IRQs (which are managed by the arch core) but erroneously did not prevent them being freed. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 89911501f3aae44a43984793341a3bf1f4c583c2 Author: Ian Campbell Date: Thu Mar 3 11:57:44 2011 -0500 xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. There are three cases which we need to care about, PV guest, PV domain 0 and HVM guest. The PV guest case is simple since it has no access to ACPI or real APICs and therefore has no GSIs therefore we simply dynamically allocate all IRQs. The potentially interesting case here is PIRQ type event channels associated with passed through PCI devices. However even in this case the guest has no direct interaction with the physical GSI since that happens in the PCI backend. The PV domain 0 and HVM guest cases are actually the same. In domain 0 case the kernel sees the host ACPI and GSIs (although it only sees the APIC indirectly via the hypervisor) and in the HVM guest case it sees the virtualised ACPI and emulated APICs. In these cases we start allocating dynamic IRQs at nr_irqs_gsi so that they cannot clash with any GSI. Currently xen_allocate_irq_dynamic starts at nr_irqs and works backwards looking for a free IRQ in order to (try and) avoid clashing with GSIs used in domain 0 and in HVM guests. This change avoids that although we retain the behaviour of allowing dynamic IRQs to encroach on the GSI range if no suitable IRQs are available since a future IRQ clash is deemed preferable to failure right now. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit c9df1ce585e3bb5a2f101c1d87381b285a9f962f Author: Ian Campbell Date: Tue Jan 11 17:20:15 2011 +0000 xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq This is neater than open-coded calls to irq_alloc_desc_at and irq_free_desc. No intended behavioural change. Note that we previously were not checking the return value of irq_alloc_desc_at which would be failing for GSI Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit cbf6aa89fc52c5253ee141d53eeb73147eb37ac0 Author: Ian Campbell Date: Tue Jan 11 17:20:14 2011 +0000 xen:events: move find_unbound_irq inside CONFIG_PCI_MSI The only caller is xen_allocate_pirq_msi which is also under this ifdef so this fixes: drivers/xen/events.c:377: warning: 'find_unbound_pirq' defined but not used when CONFIG_PCI_MSI=n Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit 3f2a230caf21a1f7ac75f9e4892d0e5af9ccee88 Author: Ian Campbell Date: Tue Jan 11 17:20:13 2011 +0000 xen: handled remapped IRQs when enabling a pcifront PCI device. This happens to not be an issue currently because we take pains to try to ensure that the GSI-IRQ mapping is 1-1 in a PV guest and that regular event channels do not clash. However a subsequent patch is going to break this 1-1 mapping. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Jeremy Fitzhardinge commit 6eaa412f2753d98566b777836a98c6e7f672a3bb Author: Konrad Rzeszutek Wilk Date: Tue Jan 18 20:09:41 2011 -0500 xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. With this patch, we diligently set regions that will be used by the balloon driver to be INVALID_P2M_ENTRY and under the ownership of the balloon driver. We are OK using the __set_phys_to_machine as we do not expect to be allocating any P2M middle or entries pages. The set_phys_to_machine has the side-effect of potentially allocating new pages and we do not want that at this stage. We can do this because xen_build_mfn_list_list will have already allocated all such pages up to xen_max_p2m_pfn. We also move the check for auto translated physmap down the stack so it is present in __set_phys_to_machine. [v2: Rebased with mmu->p2m code split] Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit d04c579f971bf7d995db1ef7a7161c0143068859 Author: Jan Beulich Date: Thu Mar 3 10:55:29 2011 +0000 x86: Work around old gas bug Add extra parentheses around a couple of definitions introduced by "x86: Cleanup vector usage" and used in assembly macro arguments, and remove spaces. Without that old (2.16.1) gas would see more macro arguments than were actually specified. Reported-and-tested-by: Andrew Morton Signed-off-by: Jan Beulich Cc: Shaohua Li LKML-Reference: <4D6F81B10200007800034B0B@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit c09d7a3d2e365e11c09b9c6414c17fe55bd32a8e Merge: 0a10247 4defe68 Author: Frederic Weisbecker Date: Wed Mar 2 16:09:55 2011 +0100 Merge branch '/tip/perf/filter' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git into perf/core commit 5cd10e7946d28cfc42442fee2e6c757e244d756e Author: Thomas Gleixner Date: Wed Mar 2 16:58:30 2011 +0100 hrtimer: Update base[CLOCK_BOOTTIME].offset correctly We calculate the current time of each clock base by adding an offset to clock_monotonic. The offset for the clock bases is set in retrigger_next_event() which is called when we switch a cpu to highres mode or when the clock was set. Add the missing update for clock boottime. Signed-off-by: Thomas Gleixner Cc: John Stultz commit eb8c1e2c830fc25c93bc94e215ed387fe142a98d Author: Tejun Heo Date: Wed Mar 2 11:32:47 2011 +0100 x86-64, NUMA: Better explain numa_distance handling Handling of out-of-bounds distances and allocation failure can use better documentation. Add it. Signed-off-by: Tejun Heo Cc: Yinghai Lu Acked-by: David Rientjes commit ce0033307f1b45e23e0c149f56ea4855eb4687ce Author: Yinghai Lu Date: Wed Mar 2 11:22:14 2011 +0100 x86-64, NUMA: Fix distance table handling NUMA distance table handling has the following problems. * numa_reset_distance() uses numa_distance * sizeof(numa_distance[0]) as the table size when it should be using the square of numa_distance. * The same size miscalculation when allocation space for phys_dist in numa_emulation(). * In numa_emulation(), phys_dist must be reserved; otherwise, the new emulated distance table may overlap it. Fix them and, while at it, take numa_distance_cnt resetting in numa_reset_distance() out of the if block to simplify the code a bit. David Rientjes reported incorrect handling of distance table during emulation. -tj: Edited out numa_alloc_distance() related changes which weren't necessary and rewrote patch description. -v2: Ingo was unhappy with 80-column limit induced linebreaks. Let lines run over 80-column. Signed-off-by: Yinghai Lu Reported-by: David Rientjes Signed-off-by: Tejun Heo Cc: Ingo Molnar Acked-by: David Rientjes commit 0a10247914a5cad3caf7ef8a255c54c4d3ed2062 Author: Frederic Weisbecker Date: Sat Feb 26 04:51:54 2011 +0100 perf: Set filters before mmaping events We currently set the filters after we mmap the events, this is a race that let undesired events record themselves in the buffer before we had the time to set the filters. So set the filters before they can be recorded. That also librarizes the filters setting so that filtering can be done more easily from other tools than perf record later. Signed-off-by: Frederic Weisbecker Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Cc: Arnaldo Carvalho de Melo Cc: Steven Rostedt commit b06b3d49699a52e8f9ca056c4f96e81b1987d78e Author: Lin Ming Date: Wed Mar 2 21:27:04 2011 +0800 perf, x86: Add Intel SandyBridge CPU support This patch adds basic SandyBridge support, including hardware cache events and PEBS events support. It has been tested on SandyBridge CPUs with perf stat and also with PEBS based profiling - both work fine. The patch does not affect other models. v2 -> v3: - fix PEBS event 0xd0 with right umask combinations - move snb pebs constraint assignment to intel_pmu_init v1 -> v2: - add more raw and PEBS events constraints - use offcore events for LLC-* cache events - remove the call to Nehalem workaround enable_all function Signed-off-by: Lin Ming Acked-by: Peter Zijlstra Cc: Stephane Eranian Cc: Andi Kleen LKML-Reference: <1299072424.2175.24.camel@localhost> Signed-off-by: Ingo Molnar commit c69e3758ff56d03e161187355791ec992c574276 Author: Thomas Gleixner Date: Wed Mar 2 11:49:21 2011 +0100 genirq: Fixup fasteoi handler for oneshot mode The fasteoi handler must mask the interrupt line in oneshot mode otherwise we end up with an irq storm. Signed-off-by: Thomas Gleixner commit e938c287ea8d977e079f07464ac69923412663ce Author: Jan Beulich Date: Tue Mar 1 14:28:02 2011 +0000 x86: Fix a bogus unwind annotation in lib/semaphore_32.S 'simple' would have required specifying current frame address and return address location manually, but that's obviously not the case (and not necessary) here. Signed-off-by: Jan Beulich LKML-Reference: <4D6D1082020000780003454C@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit 44e69767cb7c3bc46e5370c39532c205d4347d80 Author: Ian Campbell Date: Tue Mar 1 20:05:49 2011 +0000 xen: ia64 build broken due to "xen: switch to new schedop hypercall by default." The git commit: > commit a8b7458363b9174f3c2196ca6085630b4b30b7a1 > Author: Ian Campbell > Date: Thu Feb 17 11:04:20 2011 +0000 > > xen: switch to new schedop hypercall by default. > > Rename old interface to sched_op_compat and rename sched_op_new to > simply sched_op. > breaks the IA64 build. This patch fixes it. Signed-off-by: Tony Luck Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 5807806a92450fd57f8063868efae9d4af74db02 Author: Arnaldo Carvalho de Melo Date: Tue Mar 1 10:43:03 2011 -0300 perf top tui: Wait till the first sample to refresh the screen. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 374cfe56892701f062586d6a6de6cb71777a4184 Author: Arnaldo Carvalho de Melo Date: Tue Mar 1 10:27:27 2011 -0300 perf top: Fix reporting of invalid --vmlinux Using ui__warning, that will, in --tui, show a window with the message, waiting for the user to press Ok. Also run exit_browser() to let newt do its final cleaning of the screen. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit a1ceb741cf86ef433006379742db81c00b450bae Author: Arnaldo Carvalho de Melo Date: Tue Mar 1 10:24:43 2011 -0300 perf tui: Make ui__warning modal By taking the ui__lock so that no other screen updates take place while waiting for the user. That was happening when handling an invalid --vmlinux parameter in 'perf top --tui', with the screen refresh routine repainting the screen and removing the warning window. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 3166fc8fb6a2f52273d545e970297524e02c3e39 Author: Arnaldo Carvalho de Melo Date: Tue Mar 1 10:21:44 2011 -0300 perf top browser: Handle empty active symbols list Fixing a SEGV. An empty list could happen when not being able to resolve symbols, for instance when --vmlinux invalid-file is used. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit bfc39061d3dbf812e6a78f9529a548e5f0050c64 Author: Jan Beulich Date: Tue Mar 1 11:14:55 2011 +0000 um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S arch/um/Kconfig.x86 has X86_32 but not X86_64 - that's resulting in asm/dwarf2.h producing the 32-bit (pushl_cfi & Co) macros instead of the 64-bit ones. Signed-off-by: Jan Beulich Cc: Jeff Dike LKML-Reference: <4D6CE3400200007800034498@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit 039e13890b0615cb8c5c04b6afa84d676e24c761 Author: Jan Beulich Date: Mon Feb 28 15:56:00 2011 +0000 x86: Remove unused bits from lib/thunk_*.S Some of the items removed were apparently never used, others simply didn't get removed with their last user. Signed-off-by: Jan Beulich LKML-Reference: <4D6BD3A002000078000341F1@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit 60cf637a13932a4750da6746efd0199e8a4c341b Author: Jan Beulich Date: Mon Feb 28 15:54:40 2011 +0000 x86: Use {push,pop}_cfi in more places Cleaning up and shortening code... Signed-off-by: Jan Beulich Cc: Alexander van Heukelum LKML-Reference: <4D6BD35002000078000341DA@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit 39f2205e1abd1b6fffdaf45e1f1c3049a5f8999c Author: Jan Beulich Date: Mon Feb 28 15:31:59 2011 +0000 x86-64: Add CFI annotations to lib/rwsem_64.S These weren't part of the initial commit of this code. Signed-off-by: Jan Beulich Cc: Alexander van Heukelum LKML-Reference: <4D6BCDFF02000078000341B0@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit a56ec98357ad26b380f1005198de1aa519c9e9cb Author: Sebastian Andrzej Siewior Date: Sun Feb 27 19:13:39 2011 +0100 x86: dt: Correct local apic documentation in device tree bindings Until "x86: dt: Cleanup local apic setup" we read the local apic address from the MSR and ignored the entry in DT. Reflect this change in the documentation. Signed-off-by: Sebastian Andrzej Siewior LKML-Reference: <1298830419-22681-1-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit cc28989437de5617875a2943697fe6ba51a0da8f Author: Yinghai Lu Date: Sat Feb 26 13:05:43 2011 +0100 mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK Heiko found recent memblock change triggers these warnings on s390: mm/page_alloc.c:3623:22: warning: 'last_active_region_index_in_nid' defined but not used mm/page_alloc.c:3638:22: warning: 'previous_active_region_index_in_nid' defined but not used Need to move those two function under HAVE_MEMBLOCK with its only user, find_memory_core_early(). -tj: Minor updates to description. Reported-by: Heiko Carstens Signed-off-by: Yinghai Lu Signed-off-by: Tejun Heo commit 8d32a307e4faa8b123dc8a9cd56d1a7525f69ad3 Author: Thomas Gleixner Date: Wed Feb 23 23:52:23 2011 +0000 genirq: Provide forced interrupt threading Add a commandline parameter "threadirqs" which forces all interrupts except those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to allow retrieving better debug data from crashing interrupt handlers. If "threadirqs" is not enabled on the kernel command line, then there is no impact in the interrupt hotpath. Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after marking the interrupts which cant be threaded IRQF_NO_THREAD. All interrupts which have IRQF_TIMER set are implict marked IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded. Forced threading hard interrupts also forces all soft interrupt handling into thread context. When enabled it might slow down things a bit, but for debugging problems in interrupt code it's a reasonable penalty as it does not immediately crash and burn the machine when an interrupt handler is buggy. Some test results on a Core2Duo machine: Cache cold run of: # time git grep irq_desc non-threaded threaded real 1m18.741s 1m19.061s user 0m1.874s 0m1.757s sys 0m5.843s 0m5.427s # iperf -c server non-threaded [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec threaded [ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110223234956.772668648@linutronix.de> commit 544b4a1f309d18f40969dbab7e08bafd136b2f55 Author: Venkatesh Pallipadi Date: Fri Feb 25 15:13:16 2011 -0800 sched: Clean up the IRQ_TIME_ACCOUNTING code Fix this warning: lkml.org/lkml/2011/1/30/124 kernel/sched.c:3719: warning: 'irqtime_account_idle_ticks' defined but not used kernel/sched.c:3720: warning: 'irqtime_account_process_tick' defined but not used In a cleaner way than: 7e9498705e81: sched: Add #ifdef around irq time accounting functions This patch will not have any functional impact. Signed-off-by: Venkatesh Pallipadi Cc: heiko.carstens@de.ibm.com Cc: a.p.zijlstra@chello.nl LKML-Reference: <1298675596-10992-1-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit 7bf04be8f48ceeeffa5b5a79734d6d6e0d59e5f8 Author: Stratos Psomadakis Date: Fri Feb 25 22:46:13 2011 +0200 x86, asm: Cleanup unnecssary macros in asm-offsets.c PAGE_SIZE_asm, PAGE_SHIFT_asm, THREAD_SIZE_asm can be safely removed from asm-offsets.c, and be replaced by their non-'_asm' counterparts in the code that uses them, since the _AC macro defined in include/linux/const.h makes PAGE_SIZE/PAGE_SHIFT/THREAD_SIZE work with as. Signed-off-by: Stratos Psomadakis LKML-Reference: <1298666774-17646-2-git-send-email-psomas@cslab.ece.ntua.gr> Signed-off-by: H. Peter Anvin commit 8eb90c30e0e815a1308828352eabd03ca04229dd Author: Thomas Gleixner Date: Wed Feb 23 23:52:21 2011 +0000 sched: Switch wait_task_inactive to schedule_hrtimeout() When we force thread hard and soft interrupts the startup of ksoftirqd would hang in kthread_bind() when wait_task_inactive() calls schedule_timeout_uninterruptible() because there is no softirq yet which will wake us up. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110223234956.677109139@linutronix.de> commit 0c4602ff88d6d6ef0ee6d228ee9acaa6448ff6f5 Author: Thomas Gleixner Date: Wed Feb 23 23:52:18 2011 +0000 genirq: Add IRQF_NO_THREAD Some low level interrupts cannot be threaded even when we force thread all interrupt handlers. Add a flag to annotate such interrupts. Add all timer interrupts to this category by default. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110223234956.578893460@linutronix.de> commit 9d591edd02a245305b1b9379e4c5571bad4d2774 Author: Thomas Gleixner Date: Wed Feb 23 23:52:16 2011 +0000 genirq: Allow shared oneshot interrupts Support ONESHOT on shared interrupts, if all drivers agree on it. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110223234956.483640430@linutronix.de> commit b5faba21a6805c33b40e258d36f57997ee1de131 Author: Thomas Gleixner Date: Wed Feb 23 23:52:13 2011 +0000 genirq: Prepare the handling of shared oneshot interrupts For level type interrupts we need to track how many threads are on flight to avoid useless interrupt storms when not all thread handlers have finished yet. Keep track of the woken threads and only unmask when there are no more threads in flight. Yes, I'm lazy and using a bitfield. But not only because I'm lazy, the main reason is that it's way simpler than using a refcount. A refcount based solution would need to keep track of various things like crashing the irq thread, spurious interrupts coming in, disables/enables, free_irq() and some more. The bitfield keeps the tracking simple and makes things just work. It's also nicely confined to the thread code pathes and does not require additional checks all over the place. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110223234956.388095876@linutronix.de> commit b056b6a0144de90707cd22cf7b4f60bf69c86d59 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: remove xen_hvm_suspend It is now identical to xen_suspend, the differences are encapsulated in the suspend_info struct. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 55fb4acef7089a6d4d93ed8caae6c258d06cfaf7 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: pull pre/post suspend hooks out into suspend_info Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 07af38102fc4f260cc5a2418ec833707f53cdf70 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: move arch specific pre/post suspend hooks into generic hooks Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 82043bb60d24d2897074905c94be5a53071e8913 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: refactor non-arch specific pre/post suspend hooks Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 03c8142bd2fb3b87effa6ecb2f8957be588bc85f Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: add "arch" to pre/post suspend hooks xen_pre_device_suspend is unused on ia64. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 36b401e2c2788c7b4881115ddbbff603fe4cf78d Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: pass extra hypercall argument via suspend_info struct Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit ceb180294790c8a6a437533488616f6b591b49d0 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: refactor cancellation flag into a structure Will add extra fields in subsequent patches. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit bd1c0ad28451df4610d352c7e438213c84de0c28 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit a8b7458363b9174f3c2196ca6085630b4b30b7a1 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: switch to new schedop hypercall by default. Rename old interface to sched_op_compat and rename sched_op_new to simply sched_op. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 8e15597fa430c03415e2268dfbae0f262b948788 Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: use new schedop interface for suspend Take the opportunity to comment on the semantics of the PV guest suspend hypercall arguments. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit 552717231e50b478dfd19d63fd97879476ae051d Author: Ian Campbell Date: Thu Feb 17 11:04:20 2011 +0000 xen: do not respond to unknown xenstore control requests The PV xenbus control/shutdown node is written by the toolstack as a request to the guest to perform a particular action (shutdown, reboot, suspend etc). The guest is expected to acknowledge that it will complete a request by clearing the control node. Previously it would acknowledge any request, even if it did not know what to do with it. Specifically in the case where CONFIG_PM_SLEEP is not enabled the kernel would acknowledge a suspend request even though it was not actually going to do anything. Instead make the kernel only acknowledge requests if it is actually going to do something with it. This will improve the toolstack's ability to diagnose and deal with failures. Signed-off-by: Ian Campbell Reviewed-by: Konrad Rzeszutek Wilk commit e057a4b6e0eb6701f6ec923be2075d4984cef51a Author: Stefano Stabellini Date: Fri Feb 11 17:55:13 2011 +0000 xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled Signed-off-by: Stefano Stabellini commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f Author: Stefano Stabellini Date: Thu Dec 2 17:55:10 2010 +0000 xen: PV on HVM: support PV spinlocks and IPIs Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus (that switch to APIC mode and initialize APIC routing); on secondary CPUs on CPU_UP_PREPARE. Enable the usage of event channels to send and receive IPIs when running as a PV on HVM guest. Signed-off-by: Stefano Stabellini commit 53d5522cad291a0e93a385e0594b6aea6b54a071 Author: Stefano Stabellini Date: Thu Dec 2 17:55:05 2010 +0000 xen: make the ballon driver work for hvm domains Signed-off-by: Stefano Stabellini commit c80a420995e721099906607b07c09a24543b31d9 Author: Stefano Stabellini Date: Thu Dec 2 17:55:00 2010 +0000 xen-blkfront: handle Xen major numbers other than XENVBD This patch makes sure blkfront handles correctly virtual device numbers corresponding to Xen emulated IDE and SCSI disks: in those cases blkfront translates the major number to XENVBD and the minor number to a low xvd minor. Note: this behaviour is different from what old xenlinux PV guests used to do: they used to steal an IDE or SCSI major number and use it instead. Signed-off-by: Stefano Stabellini Acked-by: Jeremy Fitzhardinge commit cff520b9c2ee1486ea9ff1dbc774510c62e5ecb9 Author: Stefano Stabellini Date: Thu Dec 2 17:54:54 2010 +0000 xen: do not use xen_info on HVM, set pv_info name to "Xen HVM" Signed-off-by: Stefano Stabellini Acked-by: Jeremy Fitzhardinge commit 702d4eb9b3de4398ab99cf0a4e799e552c7ab756 Author: Stefano Stabellini Date: Thu Dec 2 17:54:50 2010 +0000 xen: no need to delay xen_setup_shutdown_event for hvm guests anymore Now that xenstore_ready is used correctly for PV on HVM guests too, we don't need to delay the initialization of xen_setup_shutdown_event anymore. Signed-off-by: Stefano Stabellini Acked-by: Jeremy Fitzhardinge commit 1204e95689f9fbd245a4ce5c1b0cd0a9b77f8d25 Author: Thomas Gleixner Date: Fri Feb 25 17:17:18 2011 +0100 genirq: Make warning in handle_percpu_event useful The WARN_ON_ONCE in handle_percpu_event() which emits a warning when an action handler returns with interrupts enabled is not really useful. It does not reveal the interrupt number and handler function which caused it. Make it WARN_ONCE() and add the information. Reported-by: Tony Luck Signed-off-by: Thomas Gleixner commit a906fdaacca49917d83e5032dfc31f694249ad10 Author: Thomas Gleixner Date: Fri Feb 25 16:09:31 2011 +0100 x86: dt: Cleanup local apic setup Up to now we force enable the local apic in the devicetree setup uncoditionally and set smp_found_config unconditionally to 1 when a devicetree blob is available. This breaks, when local apic is disabled in the Kconfig. Make it consistent by initializing device tree explicitely before smp_get_config() so a non lapic configuration could be used as well. To be functional that would require to implement PIT as an interrupt host, but the only user of this code until now is ce4100 which requires apics to be available. So we leave this up to those who need it. Tested-by: Sebastian Siewior Signed-off-by: Thomas Gleixner commit b210b3bb1b002f27165325a5edb6ebce3c168e92 Author: Arnaldo Carvalho de Melo Date: Fri Feb 25 11:33:31 2011 -0300 perf ui browser: Introduce ui_browser__show_title Needed because we were only showing the title in ui_browser__show, not in ui_browser__run, and in the run loop we may be calling other browsers that would then change the title, when we go back to the previous browser, we need to redraw the title. We could have done this as the Newt help line, with pop, etc, but I don't think its worth, doing it explicitely, when needed (some browsers may not use the title area at all) seems enough/more flexible. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7e9498705e810404ecf29bb2d6fa632b9484c609 Author: Heiko Carstens Date: Fri Feb 25 14:32:28 2011 +0100 sched: Add #ifdef around irq time accounting functions Get rid of this: kernel/sched.c:3731:13: warning: 'irqtime_account_idle_ticks' defined but not used kernel/sched.c:3732:13: warning: 'irqtime_account_process_tick' defined but not used Signed-off-by: Heiko Carstens Cc: Venkatesh Pallipadi Cc: Peter Zijlstra LKML-Reference: <20110225133228.GD7469@osiris.boeblingen.de.ibm.com> Signed-off-by: Ingo Molnar commit c16bfe9ac389b13a37ff617a09682ecc0685960f Author: Arnaldo Carvalho de Melo Date: Fri Feb 25 09:30:29 2011 -0300 perf top browser: Fix up exit keys The left key was exiting 'perf top --tui' when it really shouldn't, it was too easy to leave the live annotation window and then press one too many <- and get out of the tool altogether. Do just like the report TUI does, ignore the left key for exit and also ask the user when pressing ESC if that is really what is wanted. Reported-by: Mike Galbraith Suggested-by: Ingo Molnar Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 1f565a896ee139a70e1a16f74a4ec29707691b0b Author: David Rientjes Date: Fri Feb 25 10:06:39 2011 +0100 x86-64, NUMA: Fix size of numa_distance array numa_distance should be sized like the SLIT, an NxN matrix where N is the highest node id + 1. This patch fixes the calculation to avoid overflowing the array on the subsequent iteration. -tj: The original patch used last index to calculate size. Yinghai pointed out it should be incremented so it is the number of elements instead of the last index to calculate the size of the table. Updated accordingly. Signed-off-by: David Rientjes Cc: Yinghai Lu Signed-off-by: Tejun Heo commit d1b19426b04787e48f2689923e28d37b488969b0 Author: Yinghai Lu Date: Thu Feb 24 14:46:24 2011 +0100 x86: Rename e820_table_* to pgt_buf_* e820_table_{start|end|top}, which are used to buffer page table allocation during early boot, are now derived from memblock and don't have much to do with e820. Change the names so that they reflect what they're used for. This patch doesn't introduce any behavior change. -v2: Ingo found that earlier patch "x86: Use early pre-allocated page table buffer top-down" caused crash on 32bit and needed to be dropped. This patch was updated to reflect the change. -tj: Updated commit description. Signed-off-by: Yinghai Lu Signed-off-by: Tejun Heo commit 8bc1f91e1f0e977fb95b11d8fa686f5091888110 Author: Yinghai Lu Date: Thu Feb 24 14:43:06 2011 +0100 bootmem: Move __alloc_memory_core_early() to nobootmem.c Now that bootmem.c and nobootmem.c are separate, there's no reason to define __alloc_memory_core_early(), which is used only by nobootmem, inside #ifdef in page_alloc.c. Move it to nobootmem.c and make it static. This patch doesn't introduce any behavior change. -tj: Updated commit description. Signed-off-by: Yinghai Lu Acked-by: Andrew Morton Signed-off-by: Tejun Heo commit e782ab421bbba1912c87934bd0e8998630736418 Author: Yinghai Lu Date: Thu Feb 24 14:43:06 2011 +0100 bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c Now that bootmem.c and nobootmem.c are separate, it's cleaner to define contig_page_data in each file than in page_alloc.c with #ifdef. Move it. This patch doesn't introduce any behavior change. -v2: According to Andrew, fixed the struct layout. -tj: Updated commit description. Signed-off-by: Yinghai Lu Acked-by: Andrew Morton Signed-off-by: Tejun Heo commit 0932587328d9bd5b500a640fbaff3290c8d4cabf Author: Yinghai Lu Date: Thu Feb 24 14:43:05 2011 +0100 bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c mm/bootmem.c contained code paths for both bootmem and no bootmem configurations. They implement about the same set of APIs in different ways and as a result bootmem.c contains massive amount of #ifdef CONFIG_NO_BOOTMEM. Separate out CONFIG_NO_BOOTMEM code into mm/nobootmem.c. As the common part is relatively small, duplicate them in nobootmem.c instead of creating a common file or ifdef'ing in bootmem.c. The followings are duplicated. * {min|max}_low_pfn, max_pfn, saved_max_pfn * free_bootmem_late() * ___alloc_bootmem() * __alloc_bootmem_low() The followings are applicable only to nobootmem and moved verbatim. * __free_pages_memory() * free_all_memory_core_early() The followings are not applicable to nobootmem and omitted in nobootmem.c. * reserve_bootmem_node() * reserve_bootmem() The rest split function bodies according to CONFIG_NO_BOOTMEM. Makefile is updated so that only either bootmem.c or nobootmem.c is built according to CONFIG_NO_BOOTMEM. This patch doesn't introduce any behavior change. -tj: Rewrote commit description. Suggested-by: Ingo Molnar Signed-off-by: Yinghai Lu Acked-by: Andrew Morton Signed-off-by: Tejun Heo commit 4a66b1d95ad8baf6ab884a1c64461449b463eb78 Author: Sebastian Andrzej Siewior Date: Thu Feb 24 09:52:42 2011 +0100 x86: dt: Fix OLPC=y/INTEL_CE=n build Both OLPC and CE4100 activate CONFIG_OF. OLPC uses PROMTREE while CE uses FLATTREE. Compiling for OLPC only breaks due to missing flat tree functions and variables. Use proper wrappers and provide an empty x86_flattree_get_config() inline so OF=y FLATTREE=n builds and works. [ tglx: Make it work with HPET_TIMER=n and make a function static ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 3bcbaf6e08d8d82cde781997bd2c56dda87049b5 Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:46 2011 +0100 rtc: cmos: Add OF bindings This allows to load the OF driver based informations from the device tree. Systems without BIOS may need to perform some initialization. PowerPC creates a PNP device from the OF information and performs this kind of initialization in their private PCI quirk. This looks more generic. This patch also avoids registering the platform RTC driver on X86 if we have a device tree blob. Otherwise we would setup the device based on the hardcoded information in arch/x86 rather than the device tree based one. [ tglx: Changed "int of_have_populated_dt()" to bool as recommended by Grant ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org Cc: rtc-linux@googlegroups.com Cc: Alessandro Zummo LKML-Reference: <1298405266-1624-12-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit 1fa4163bdc199a0b80f9e333d718b3f65e901593 Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:45 2011 +0100 x86: ce4100: Use OF to setup devices Use device tree information to setup IO_APIC configuration, interrupt routing, HPET and everything else which cannot be enumerated by other means. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-11-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit bcc7c1244fcfd852b9f4590935491057e1cab9dd Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:44 2011 +0100 x86: ioapic: Add OF bindings for IO_APIC ioapic_xlate provides a translation from the information in device tree to ioapic related informations. This includes - obtaining hw irq which is the vector number "=> pin number + gsi" - obtaining type (level/edge/..) - programming this information into ioapic ioapic_add_ofnode adds an irq_domain based on informations from the device tree. This information (irq_domain) is required in order to map a device to its proper interrupt controller. [ tglx: Adapted to the io_apic changes, which let us move that whole code to devicetree.c ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-10-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit 9079b35364e75ce6b968a179f861d2f819f33e61 Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:43 2011 +0100 x86: dtb: Add generic bus probe For now we probe these busses and we change this to board dependent probes once we have to. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-9-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit 96e0a0797eba35b5420c710b928f19094b2d5c45 Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:42 2011 +0100 x86: dtb: Add support for PCI devices backed by dtb nodes x86_of_pci_init() does two things: - it provides a generic irq enable and disable function. enable queries the device tree for the interrupt information, calls ->xlate on the irq host and updates the pci->irq information for the device. - it walks through PCI bus(es) in the device tree and adds its children (device) nodes to appropriate pci_dev nodes in kernel. So the dtb node information is available at probe time of the PCI device. Adding a PCI bus based on the information in the device tree is currently not supported. Right now direct access via ioports is used. Signed-off-by: Sebastian Andrzej Siewior Tested-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-8-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit ffb9fc68dff38f811eeb24c15aba0418b6a8ee53 Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:41 2011 +0100 x86: dtb: Add device tree support for HPET Set hpet_address based on information provied form DTB Signed-off-by: Sebastian Andrzej Siewior Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org Cc: Dirk Brandewie LKML-Reference: <1298405266-1624-7-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit 3879a6f32948330782889cebc4d74c4f2316c676 Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:40 2011 +0100 x86: dtb: Add early parsing of IO_APIC APIC and IO_APIC have to be added to the system early because native_init_IRQ() requires it. In order to obtain the address of the ioapic the device tree has to be unflattened so of_address_to_resource() works. The device tree is relocated to ensure it is always covered by the kernel mapping. That way the boot loader does not have to make any assumptions about kernel's memory layout. Signed-off-by: Sebastian Andrzej Siewior Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org Cc: Dirk Brandewie LKML-Reference: <1298405266-1624-6-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit 19c4f5f7f7e9c5db89a91627af2a426cfb5568de Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:39 2011 +0100 x86: dtb: Add irq domain abstraction The here introduced irq_domain abstraction represents a generic irq controller. It is a subset of powerpc's irq_host which is going to be renamed to irq_domain and then become generic. This implementation will be removed once it is generic. The xlate callback is resposible to parse irq informations like irq type and number and returns the hardware irq number which is reported by the hardware as active. Signed-off-by: Sebastian Andrzej Siewior Tested-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-5-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit df2634f43f5106947f3735a0b61a6527a4b278cd Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:38 2011 +0100 x86: dtb: Add a device tree for CE4100 History: v1..v2: - dropped device_type except for cpu & pci. I have the compatible string for pci so I can drop the device_type once it is possible - I lowercased all compatible types. I will need to resend some patches which have upper case intel - The cpu had the same compatible string as the soc node. So I added to the soc node -immr for internel memory mapped registers. - I added generic names for all parts. - I reworked the i2c bars matching the way you suggested. I added a compatible node for the PCI device which only the PCI ids in its compatible string. The bars (each represents a complete i2c controller) have a "intel,ce4100-i2c-controller" compatible node. It is not used by the driver. The driver is probed via PCI ids (by the pci subsystem not OF) and matches the bar address against the ressource in the child node. Once there is a hit the node is attached. - The SPI driver is also probed via pci. However I also attached a compatible property based on PCI ids v2..v3: - intel,ce4100-immr become intel,ce4100-cp. cp stands for core peripherals. The Atom data sheet talks here about ACPI devices. Since we don't have ACPI this does not apply here. - The interrupt map is gone. There are now plenty of device nodes. - The "unit address string" got fixed, it uses not DD,V format. v3..v4: - added descriptions for compatible nodes introduced here: - intel,ce4100-ioapic - intel,ce4100-lapic - intel,ce4100-hpet - intel,ce4100 - intel,ce4100-cp - intel,ce4100-pci - added a description about I2C controller magic. - Added gpio-controller and gpio-cells property to gpio devices. Those properties are not (yet) used. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-4-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit da6b737b9ab768dd06bb4b0395131d10e524cf83 Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:37 2011 +0100 x86: Add device tree support This patch adds minimal support for device tree on x86. The device tree blob is passed to the kernel via setup_data which requires at least boot protocol 2.09. Memory size, restricted memory regions, boot arguments are gathered the traditional way so things like cmd_line are just here to let the code compile. The current plan is use the device tree as an extension and to gather information which can not be enumerated and would have to be hardcoded otherwise. This includes things like - which devices are on this I2C/SPI bus? - how are the interrupts wired to IO APIC? - where could my hpet be? Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Dirk Brandewie Acked-by: Grant Likely Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-3-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit f1c2b357148ec27fcc6ce0992211209a0ea20d8f Author: Sebastian Andrzej Siewior Date: Tue Feb 22 21:07:36 2011 +0100 x86: e820: Remove conditional early mapping in parse_e820_ext This patch ensures that the memory passed from parse_setup_data() is large enough to cover the complete data structure. That means that the conditional mapping in parse_e820_ext() can go. While here, I also attempt not to map two pages if the address is not aligned to a page boundary. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Dirk Brandewie Cc: sodaville@linutronix.de Cc: devicetree-discuss@lists.ozlabs.org LKML-Reference: <1298405266-1624-2-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit cb4cfd568c1181501419b89d916e8523107c0106 Merge: 939d578 abb0052 Author: Thomas Gleixner Date: Wed Feb 23 20:00:55 2011 +0100 Merge branch 'x86/apic' into x86/platform Reason: Devicetree based ioapic setup depends on the apic changes. Signed-off-by: Thomas Gleixner commit abb0052289e58140d933b29491f59e4be0a19727 Author: Thomas Gleixner Date: Wed Feb 23 19:54:53 2011 +0100 x86: ioapic: Move trigger defines to io_apic.h Required for devicetree based io_apic configuration. Signed-off-by: Thomas Gleixner commit 710dcda64369e3f3704a0eee502ce27dbf9fedc1 Author: Thomas Gleixner Date: Wed Feb 23 17:47:41 2011 +0100 x86: ioapic: Implement and use io_apic_setup_irq_pin_once() io_apic_set_pci_routing() and mp_save_irq() check the pin_programmed bit before calling io_apic_setup_irq_pin() and set the bit when the pin was setup. Move that duplicated code into a separate function and use it. Signed-off-by: Thomas Gleixner commit b77cf6a8609a8450786c572bc8af6ad068022dbe Author: Thomas Gleixner Date: Wed Feb 23 17:33:53 2011 +0100 x86: ioapic: Remove useless inlines There is no point to have irq_trigger() and irq_polarity() as wrappers around the MPBIOS_* camel case functions. Get rid of both the inlines and the ugly camel case. Signed-off-by: Thomas Gleixner commit 41098ffe050c4befe5fc21a5cedd42ebbd6f7469 Author: Thomas Gleixner Date: Wed Feb 23 16:08:03 2011 +0100 x86: ioapic: Make a few functions static No users outside of io_apic.c. Mark bad_ioapic() __init while at it. Signed-off-by: Thomas Gleixner commit da1ad9d7b2477594e8ff43706644ba8a375ad62a Author: Thomas Gleixner Date: Wed Feb 23 14:52:16 2011 +0100 x86: ioapic: Use setup function in setup_IO_APIC_irq_extra() Another version of the same thing. Only set the pin programmed, when the setup function succeeds. Signed-off-by: Thomas Gleixner commit 2d57e37dbf648fd6547752b8954f4104a85f4b15 Author: Thomas Gleixner Date: Wed Feb 23 14:40:35 2011 +0100 x86: ioapic: Use setup function in __io_apic_setup_irqs() Replace the duplicated code. Signed-off-by: Thomas Gleixner commit e0799c04b2080e0832538a911361f962c93fb744 Author: Thomas Gleixner Date: Wed Feb 23 14:10:54 2011 +0100 x86: ioapic: Use setup function in __io_apic_set_pci_routing() The only difference here is that we did not call __add_pin_to_irq_node() for the legacy irqs, but that's not worth 30 lines of extra code. Signed-off-by: Thomas Gleixner commit f880ec78fabebc58180778d223600e9be7b48502 Author: Thomas Gleixner Date: Wed Feb 23 13:07:54 2011 +0100 x86: ioapic: Use new setup function in pre_init_apic_IRQ0() Remove the duplicated code and call the function. It does not matter whether we allocated the cfg before calling setup_local_APIC() and we can set the irq chip and handler after that as well. Signed-off-by: Thomas Gleixner commit ff973d041e5ab9ada9e49f4e93ef3a699c511463 Author: Thomas Gleixner Date: Wed Feb 23 13:00:56 2011 +0100 x86: ioapic: Add io_apic_setup_irq_pin() There are about four places in the ioapic code which do exactly the same setup sequence. Also the OF based ioapic setup needs that function to avoid putting the OF specific code into ioapic.c Signed-off-by: Thomas Gleixner commit ed972ccf434a9881a5881915ae04602af2776bad Author: Thomas Gleixner Date: Wed Feb 23 14:31:36 2011 +0100 x86: ioapic: Split out the nested loop in setup_IO_APIC_irqs() Two consecutive for(...) for(...) lines to avoid an extra indentation are just horrible to read. I had to look more than once to figure out what the code is doing. Split out the inner loop into a separate function. Signed-off-by: Thomas Gleixner commit c8d6b8fe72216ca47e399204b58c8be0448d4083 Author: Thomas Gleixner Date: Wed Feb 23 14:29:34 2011 +0100 x86: ioapic: Remove silly debug bloat in setup_IOAPIC_irqs() This is debug code and it does not matter at all whether we print each not connected pin in an extra line or try to be extra clever. Signed-off-by: Thomas Gleixner commit 170ae6bc24e1d7f9bd921a484ec9ea2825497970 Author: Arnaldo Carvalho de Melo Date: Wed Feb 23 11:08:59 2011 -0300 perf annotate: Show better message when no vmlinux is found In both --tui and --stdio, in 'annotate', 'top', 'report' when trying to annotate a kernel symbol having just access to a kallsyms file, that doesn't have the DWARF info needed for annotation. Suggested-by: Ingo Molnar Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 9848caf26dafefcec7881f0e3c35fc81c454ba59 Merge: 768a06e 9826e83 Author: Ingo Molnar Date: Wed Feb 23 15:50:13 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 939d578ecc62b07efeb186576ab190fe0b766501 Author: Thomas Gleixner Date: Wed Feb 23 11:46:01 2011 +0100 x86: OLPC: Make OLPC=n build again Stupid me missed the functions called from setup.c. Add the stubs back for OLPC=n Signed-off-by: Thomas Gleixner commit 1444e0c9daf0d3472677efc15588b192fc2db761 Author: Henrik Kretzschmar Date: Tue Feb 22 15:38:07 2011 +0100 x86: Fix deps of X86_UP_IOAPIC Since commit 7cd92366a593246650cc7d6198e2c7d3af8c1d8a lAPIC enabled accidently the IOAPIC, which now gets fixed. Signed-off-by: Henrik Kretzschmar LKML-Reference: <1298385487-4708-5-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar commit 7d0f1926131cf79aa5998d463bf1582156e7b41e Author: Henrik Kretzschmar Date: Tue Feb 22 15:38:06 2011 +0100 x86: Add dummy functions for compiling without IOAPIC This patch adds IOAPIC dummy functions for compilation with local APIC, but without IOAPIC. The local variable ioapic_entries in enable_IR_x2apic() does not need initialization anymore, since the dummy returns NULL. Signed-off-by: Henrik Kretzschmar LKML-Reference: <1298385487-4708-4-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar commit 7167d08e780a722fa79ea414fc4e72bc00751392 Author: Henrik Kretzschmar Date: Tue Feb 22 15:38:05 2011 +0100 x86: Rework arch_disable_smp_support() for x86 Currently arch_disable_smp_support() on x86 disables only the support for the IOAPIC and is also compiled in if SMP-support is not. Therefore this function is renamed to disable_ioapic_support(), which meets its purpose and is only compiled in the kernel when IOAPIC support is also. A new arch_disable_smp_support() is created in smpboot.c, which calls disable_ioapic_support() and gets only compiled in the kernel when SMP support is also. Signed-off-by: Henrik Kretzschmar LKML-Reference: <1298385487-4708-3-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar commit b6a1432da81fa387d76215108dc9f6ea6d343aed Author: Henrik Kretzschmar Date: Tue Feb 22 15:38:04 2011 +0100 x86: Add dummy mp_save_irq() This is a dummy function, used when no IOAPIC is compiled in. Signed-off-by: Henrik Kretzschmar LKML-Reference: <1298385487-4708-2-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar commit 4e034b245133adfd006ade5d7a809c9cac4beef9 Author: Henrik Kretzschmar Date: Tue Feb 22 15:38:03 2011 +0100 x86: Move ioapic_irq_destination_types to apicdef.h This enum is used by non IOAPIC code, so apicdef.h is the best place for it. Signed-off-by: Henrik Kretzschmar LKML-Reference: <1298385487-4708-1-git-send-email-henne@nachtwindheim.de> Signed-off-by: Ingo Molnar commit 768a06e2ca49cdf72389208cfc056a36cf8bc5e3 Author: Peter Zijlstra Date: Tue Feb 22 16:52:24 2011 +0100 perf: Simplify task_clock_event_read() There is no point in us having different code paths for nmi and !nmi here, so remove the !nmi one. Signed-off-by: Peter Zijlstra Cc: Stephane Eranian LKML-Reference: Signed-off-by: Ingo Molnar commit 3f7cce3c18188a067d463749168bdda5abc5b0f7 Author: Stephane Eranian Date: Fri Feb 18 14:40:01 2011 +0200 perf_events: Fix rcu and locking issues with cgroup support This patches ensures that we do not end up calling perf_cgroup_from_task() when there is no cgroup event. This avoids potential RCU and locking issues. The change in perf_cgroup_set_timestamp() ensures we check against ctx->nr_cgroups. It also avoids calling perf_clock() tiwce in a row. It also ensures we do need to grab ctx->lock before calling the function. We drop update_cgrp_time() from task_clock_event_read() because it is not needed. This also avoids having to deal with perf_cgroup_from_task(). Thanks to Peter Zijlstra for his help on this. Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4d5e76b8.815bdf0a.7ac3.774f@mx.google.com> Signed-off-by: Ingo Molnar commit 511f67a5997c4967c69a3961e2fc9f04d8d244ac Author: Mike Galbraith Date: Tue Feb 22 15:02:00 2011 +0100 sched, autogroup: Stop claiming ownership of the root task group Disown it, and only display autogroup association if one exists. Signed-off-by: Mike Galbraith Reviewed-by: Yong Zhang Signed-off-by: Peter Zijlstra LKML-Reference: <1298383320.8036.5.camel@marge.simson.net> Signed-off-by: Ingo Molnar commit 800d4d30c8f20bd728e5741a3b77c4859a613f7c Author: Yong Zhang Date: Sun Feb 20 15:08:14 2011 +0800 sched, autogroup: Stop going ahead if autogroup is disabled when autogroup is disable from the beginning, sched_autogroup_create_attach() autogroup_move_group() <== 1 sched_move_task() <== 2 task_move_group_fair() set_task_rq() task_group() autogroup_task_group() We go the whole path without doing anything useful. Then stop going further if autogroup is disabled. But there will be a race window between 1 and 2, in which sysctl_sched_autogroup_enabled is enabled. This issue will be toke by following patch. Signed-off-by: Yong Zhang Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <1298185696-4403-4-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Ingo Molnar commit 1747b21fecbfb63fbf6b9624e8b92707960d5a97 Author: Yong Zhang Date: Sun Feb 20 15:08:12 2011 +0800 sched, autogroup, sysctl: Use proc_dointvec_minmax() instead sched_autogroup_enabled has min/max value, proc_dointvec_minmax() is be used for this case. Signed-off-by: Yong Zhang Cc: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <1298185696-4403-2-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Ingo Molnar commit 866ab43efd325fae8889ea77a744d03f2b957e38 Author: Peter Zijlstra Date: Mon Feb 21 18:56:47 2011 +0100 sched: Fix the group_imb logic On a 2*6*2 machine something like: taskset -c 3-11 bash -c 'for ((i=0;i<9;i++)) do while :; do :; done & done' _should_ result in 9 busy CPUs, each running 1 task. However it didn't quite work reliably, most of the time one cpu of the second socket (6-11) would be idle and one cpu of the first socket (0-5) would have two tasks on it. The group_imb logic is supposed to deal with this and detect when a particular group is imbalanced (like in our case, 0-2 are idle but 3-5 will have 4 tasks on it). The detection phase needed a bit of a tweak as it was too weak and required more than 2 avg weight tasks difference between idle and busy cpus in the group which won't trigger for our test-case. So cure that to be one or more avg task weight difference between cpus. Once the detection phase worked, it was then defeated by the f_b_g() tests trying to avoid ping-pongs. In particular, this_load >= max_load triggered because the pulling cpu (the (first) idle cpu in on the second socket, say 6) would find this_load to be 5 and max_load to be 4 (there'd be 5 tasks running on our socket and only 4 on the other socket). Signed-off-by: Peter Zijlstra Cc: Nikhil Rao Cc: Venkatesh Pallipadi Cc: Suresh Siddha Cc: Mike Galbraith LKML-Reference: Signed-off-by: Ingo Molnar commit cc57aa8f4b3bece8c26c7929728edcc5fa6b5aed Author: Peter Zijlstra Date: Mon Feb 21 18:55:32 2011 +0100 sched: Clean up some f_b_g() comments The existing comment tends to grow state (as it already has), split it up and place it near the actual tests. Signed-off-by: Peter Zijlstra Cc: Nikhil Rao Cc: Venkatesh Pallipadi Cc: Suresh Siddha Cc: Mike Galbraith LKML-Reference: Signed-off-by: Ingo Molnar commit c186fafe9aba87c1a93df8c7120a6ae01fe435ad Author: Peter Zijlstra Date: Mon Feb 21 18:52:53 2011 +0100 sched: Clean up remnants of sd_idle With the wholesale removal of the sd_idle SMT logic we can clean up some more. Signed-off-by: Peter Zijlstra Cc: Nikhil Rao Cc: Venkatesh Pallipadi Cc: Suresh Siddha Cc: Mike Galbraith LKML-Reference: Signed-off-by: Ingo Molnar commit d927dc937910ad8c7350266cac70e42a5f0b48cf Merge: 46e49b3 f5412be Author: Ingo Molnar Date: Wed Feb 23 11:31:34 2011 +0100 Merge commit 'v2.6.38-rc6' into sched/core Merge reason: Pick up the latest fixes before queueing up new changes. Signed-off-by: Ingo Molnar commit 9826e8329bc160e4cc58b83019f3f056965e42d0 Author: Marcin Slusarz Date: Tue Feb 22 21:53:12 2011 +0100 perf lock: Document valid sort keys Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <20110222205312.GA18474@joi.lan> Signed-off-by: Marcin Slusarz Signed-off-by: Arnaldo Carvalho de Melo commit 6435a5e39d3e01a1a73a925ed53ee18619b0a368 Author: Arnaldo Carvalho de Melo Date: Wed Feb 23 07:25:02 2011 -0300 perf top browser: Adjust the browser indexes when refreshing This is not a problem when we're not at the bottom of the active symbols list, so was not noticed, but at the end of the screen it falls apart. Fix it by adjusting the ui_browser indexes according to the new number of entries in the rb_tree and by seeking from the start of the rb_tree to find the new symbol at the top of the screen. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c2a941fadb57d157d59eec424674bd0c3a28788c Author: Thomas Gleixner Date: Wed Feb 23 10:32:42 2011 +0100 x86: OLPC: Remove extra OLPC_OPENFIRMWARE_DT indirection OLPC_OPENFIRMWARE_DT is just there to be selected by OLPC and selects OF_PROMTREE. So let OLPC select OF_PROMTREE and remove that extra config indirection. Fixup code and Makefile and use CONFIG_OF_PROMTREE instead. Signed-off-by: Thomas Gleixner Cc: Andres Salomon commit dc3119e700216a70e82fe07a79f1618852058354 Author: Thomas Gleixner Date: Wed Feb 23 10:08:31 2011 +0100 x86: OLPC: Cleanup config maze completely Neither CONFIG_OLPC_OPENFIRMWARE nor CONFIG_OLPC_OPENFIRMWARE_DT are really necessary. OLPC selects OLPC_OPENFIRMWARE unconditionally, so move the "select OF" part under OLPC config option and fixup the dependencies in Makefiles and code. Signed-off-by: Thomas Gleixner Cc: Andres Salomon commit fe239545a1eed57a60c5d4063f0b56f6cd1811ff Author: Thomas Gleixner Date: Wed Feb 23 10:05:53 2011 +0100 x86: OLPC: Hide OLPC_OPENFIRMWARE config switch OLPC selects OLPC_OPENFIRMWARE unconditionally. If OLPC=n then the OLPC_OPENFIRMWARE functionality is pointless. Signed-off-by: Thomas Gleixner Cc: Andres Salomon commit 540089798d2ae115fc9bff7ed3823c8c32249607 Author: Thomas Gleixner Date: Wed Feb 23 09:50:15 2011 +0100 x86: OLPC: Remove redundant !X64_64 config dependency OLPC is under if X86_32 already. Signed-off-by: Thomas Gleixner Cc: Andres Salomon commit 7acdbb3f35f4d08c0c4f7cfa306bc7006b6ba902 Merge: 695884f f5412be Author: Thomas Gleixner Date: Wed Feb 23 09:21:41 2011 +0100 Merge branch 'linus' into x86/platform Reason: Import mainline device tree changes on which further patches depend on or conflict. Trivial conflict in: drivers/spi/pxa2xx_spi_pci.c Signed-off-by: Thomas Gleixner commit fd4afaf33313d94f548cb09129ecba3dbab62931 Author: Jan Beulich Date: Thu Feb 17 13:39:05 2011 +0000 genirq: Streamline kernel/irq/Kconfig "def_bool n" without prompt is pointless, these should be just "bool". [ tglx: Adapted to latest changes ] Signed-off-by: Jan Beulich LKML-Reference: <4D5D3309020000780003264A@vpn.id2.novell.com> Signed-off-by: Thomas Gleixner commit dbebbfbb1605f0179e7c0d900d941cc9c45de569 Author: Thomas Gleixner Date: Tue Feb 22 21:46:25 2011 +0100 rtmutex: tester: Remove the remaining BKL leftovers We just leave the numbers assinged as commemoration and in case that someone was crazy enough to reimplement the test stuff out of tree. Signed-off-by: Thomas Gleixner commit 2f14ddc3a7146ea4cd5a3d1ecd993f85f2e4f948 Author: Zhang, Fengzhe Date: Wed Feb 16 22:26:20 2011 +0800 xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps. With the hypervisor argument of dom0_mem=X we iterate over the physical (only for the initial domain) E820 and subtract the the size from each E820_RAM region the delta so that the cumulative size of all E820_RAM regions is equal to 'X'. This sometimes ends up with E820_RAM regions with zero size (which are removed by e820_sanitize) and E820_RAM that are smaller than physically. Later on the PCI API looks at the E820 and attempts to set up an resource region for the "PCI mem". The E820 (assume dom0_mem=1GB is set) compared to the physical looks as so: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 0000000000097c00 (usable) [ 0.000000] Xen: 0000000000097c00 - 0000000000100000 (reserved) -[ 0.000000] Xen: 0000000000100000 - 00000000defafe00 (usable) +[ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS) [ 0.000000] Xen: 00000000defb1ea0 - 00000000e0000000 (reserved) [ 0.000000] Xen: 00000000f4000000 - 00000000f8000000 (reserved) .. And we get [ 0.000000] Allocating PCI resources starting at 40000000 (gap: 40000000:9efafe00) while it should have started at e0000000 (a nice big gap up to f4000000 exists). The "Allocating PCI" is part of the resource API. The users that end up using those PCI I/O regions usually supply their own BARs when calling the resource API (request_resource, or allocate_resource), but there are exceptions which provide an empty 'struct resource' and expect the API to provide the 'struct resource' to be populated with valid values. The one that triggered this bug was the intel AGP driver that requested a region for the flush page (intel_i9xx_setup_flush). Before this patch, when running under Xen hypervisor, the 'struct resource' returned could have (depending on the dom0_mem size) physical ranges of a 'System RAM' instead of 'I/O' regions. This ended up with the Hypervisor failing a request to populate PTE's with those PFNs as the domain did not have access to those 'System RAM' regions (rightly so). After this patch, the left-over E820_RAM region from the truncation, will be labeled as E820_UNUSABLE. The E820 will look as so: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 0000000000097c00 (usable) [ 0.000000] Xen: 0000000000097c00 - 0000000000100000 (reserved) -[ 0.000000] Xen: 0000000000100000 - 00000000defafe00 (usable) +[ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) +[ 0.000000] Xen: 0000000040000000 - 00000000defafe00 (unusable) [ 0.000000] Xen: 00000000defafe00 - 00000000defb1ea0 (ACPI NVS) [ 0.000000] Xen: 00000000defb1ea0 - 00000000e0000000 (reserved) [ 0.000000] Xen: 00000000f4000000 - 00000000f8000000 (reserved) For more information: http://mid.gmane.org/1A42CE6F5F474C41B63392A5F80372B2335E978C@shsmsx501.ccr.corp.intel.com BugLink: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1726 Signed-off-by: Fengzhe Zhang Signed-off-by: Konrad Rzeszutek Wilk commit 695884fb8acd9857e0e7120ccb2150e30f4b8fef Merge: 5df9150 04bea68 Author: Thomas Gleixner Date: Tue Feb 22 18:24:26 2011 +0100 Merge branch 'devicetree/for-x86' of git://git.secretlab.ca/git/linux-2.6 into x86/platform Reason: x86 devicetree support for ce4100 depends on those device tree changes scheduled for .39. Signed-off-by: Thomas Gleixner commit c97cf42219b7b6037d2f96c27a5f114f2383f828 Author: Arnaldo Carvalho de Melo Date: Tue Feb 22 12:02:07 2011 -0300 perf top: Live TUI Annotation Now one has just to press the right key, 'a' or Enter on the main 'perf top --tui' screen to live annotate the symbol under the cursor. The annotate window starts centered on the hottest line (the one with most samples so far) then TAB and shift+TAB can be used to go to the prev/next hot line. Pressing 'H' at any point will center again the screen on the hottest line. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 70433c01613c2a44756c7b25f7bdd6c1c77b119f Author: Thomas Gleixner Date: Tue Feb 22 12:50:12 2011 +0100 genirq: Use the correct variable for note_interrupt note_interrupt wants to be called with the combined result of all handlers called, not with the last one. If it's a shared interrupt then the last handler might return IRQ_NONE often enough to trigger the spurious dectector which turns off a perfectly fine working interrupt line. Bug was introduced in commit 1277a532(genirq: Simplify handle_irq_event()). Yes, I really messed up there. First the variable ret should not have been named differently to avoid similarity with retval. Second it should have been declared in the do {} loop. Rename it to res and move it into the do {} loop and vanish under a huge brown paperbag. Reported-bisected-tested-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit 2bf50555b0920be7e29d3823f6bbd20ee5920489 Author: Yinghai Lu Date: Tue Feb 22 11:18:49 2011 +0100 x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() Alloc code is much bigger the distance setting. Separate it out into numa_alloc_distance() for readability. -v2: Let alloc_numa_distance to return -ENOMEM on failing path, requested by tj. -tj: Description update. Minor tweaks including function name, location and return value check. Signed-off-by: Yinghai Lu Acked-by: David Rientjes Signed-off-by: Tejun Heo commit 90e6b677b47ff8c5ba1637941af6b9f92723b003 Author: Tejun Heo Date: Tue Feb 22 11:10:08 2011 +0100 x86-64, NUMA: Add proper function comments to global functions Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Ingo Molnar commit b8ef9172b2aad7eeb1fcd37a9e632c7b24da1f64 Author: Tejun Heo Date: Tue Feb 22 11:10:08 2011 +0100 x86-64, NUMA: Move NUMA emulation into numa_emulation.c Create numa_emulation.c and move all NUMA emulation code there. The definitions of struct numa_memblk and numa_meminfo are moved to numa_64.h. Also, numa_remove_memblk_from(), numa_cleanup_meminfo(), numa_reset_distance() along with numa_emulation() are made global. - v2: Internal declarations moved to numa_internal.h as suggested by Yinghai. Signed-off-by: Tejun Heo Acked-by: Yinghai Lu Cc: Ingo Molnar commit fbe99959d1db85222829a64d869dcab704ac7ec8 Author: Tejun Heo Date: Tue Feb 22 11:10:08 2011 +0100 x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file Update numa_emulation() such that, it - takes @numa_meminfo and @numa_dist_cnt instead of directly referencing the global variables. - copies the distance table by iterating each distance with node_distance() instead of memcpy'ing the distance table. - tests emu_cmdline to determine whether emulation is requested and fills emu_nid_to_phys[] with identity mapping if emulation is not used. This allows the caller to call numa_emulation() unconditionally and makes return value unncessary. - defines dummy version if CONFIG_NUMA_EMU is disabled. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Ingo Molnar commit 8635bf6ea3402154eec64763e6ed14972013c1c1 Author: Arnaldo Carvalho de Melo Date: Tue Feb 22 06:56:18 2011 -0300 perf probe: Remove redundant checks While fixing an error propagating problem in f809b25 I added two redundant checks. I did that because I didn't expect the checks to be on the while and for loop condition expression, where they are tested before we run the loop, where the 'ret' variable is set. So remove it from there and leave it just after it is actually set, eliminating unneded tests. Reported-by: Masami Hiramatsu Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e603dc15072c7fec0ae263597e6dabc3bb4c5c5b Author: Arnaldo Carvalho de Melo Date: Mon Feb 21 16:05:50 2011 -0300 perf evsel: Fix inverted test for fixing up attr.inherit flag The kernel refuses mmapping an event with the inherit flag set for something that is systemwide (cpu == -1), and the evsel layer got this reversed at some point, fix it. The symtom was that the --pid and --tid parameters for 'perf record' and 'perf top' returned with -EINVAL, like: # /tmp/build-perf/perf record -v -fo/tmp/perf.data -p 1042 Warning: ... trying to fall back to cpu-clock-ticks Fatal: failed to mmap with 22 (Invalid argument) Reported-by: David Ahern Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit fbee632d0ca9f4073a3fefb9a843eac8af036b0f Author: Arnaldo Carvalho de Melo Date: Mon Feb 21 13:23:57 2011 -0300 perf probe: Fix error propagation leading to segfault There are two hunks in this patch that stops probe processing as soon as one error is found, breaking out of loops, the other fix an error propagation that should return a negative error number but instead was returning the result of "ret < 0", which is 1 and thus made several error checks fail because they test agains < 0. The problem could be triggered by asking for a variable that was optimized out, fact that should stop the whole probe processing but instead was segfaulting while installing broken probes: [root@emilia ~]# probe perf_mmap:55 user_lock_limit Failed to find the location of user_lock_limit at this address. Perhaps, it has been optimized out. Failed to find 'user_lock_limit' in this function. Add new events: probe:perf_mmap (on perf_mmap:55 with user_lock_limit) probe:perf_mmap_1 (on perf_mmap:55 with user_lock_limit) Segmentation fault (core dumped) [root@emilia ~]# perf probe -l probe:perf_mmap (on perf_mmap:55@git/linux/kernel/perf_event.c with user_lock_limit) probe:perf_mmap_1 (on perf_mmap:55@git/linux/kernel/perf_event.c with user_lock_limit) [root@emilia ~]# After the fix: [root@emilia ~]# probe perf_mmap:55 user_lock_limit Failed to find the location of user_lock_limit at this address. Perhaps, it has been optimized out. Failed to find 'user_lock_limit' in this function. Error: Failed to add events. (-2) [root@emilia ~]# Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7fdd7f89006dd5a4c702fa0ce0c272345fa44ae0 Author: John Stultz Date: Tue Feb 15 10:52:57 2011 -0800 timers: Export CLOCK_BOOTTIME via the posix timers interface This patch exports CLOCK_BOOTTIME through the posix timers interface CC: Jamie Lokier CC: Thomas Gleixner CC: Alexander Shishkin CC: Arve Hjønnevåg Signed-off-by: John Stultz commit 70a08cca1227dc31c784ec930099a4417a06e7d0 Author: John Stultz Date: Tue Feb 15 10:45:16 2011 -0800 timers: Add CLOCK_BOOTTIME hrtimer base CLOCK_MONOTONIC stops while the system is in suspend. This is because to applications system suspend is invisible. However, there is a growing set of applications that are wanting to be suspend-aware, but do not want to deal with the complications of CLOCK_REALTIME (which might jump around if settimeofday is called). For these applications, I propose a new clockid: CLOCK_BOOTTIME. CLOCK_BOOTTIME is idential to CLOCK_MONOTONIC, except it also includes any time spent in suspend. This patch add hrtimer base for CLOCK_BOOTTIME, using get_monotonic_boottime/ktime_get_boottime, to allow in kernel users to set timers against. CC: Jamie Lokier CC: Thomas Gleixner CC: Alexander Shishkin CC: Arve Hjønnevåg Signed-off-by: John Stultz commit 314ac37150011ebb398f522db528d2dbcc611189 Author: John Stultz Date: Mon Feb 14 18:43:08 2011 -0800 time: Extend get_xtime_and_monotonic_offset() to also return sleep Extend get_xtime_and_monotonic_offset to get_xtime_and_monotonic_and_sleep_offset(). CC: Jamie Lokier CC: Thomas Gleixner CC: Alexander Shishkin CC: Arve Hjønnevåg Signed-off-by: John Stultz commit abb3a4ea2e0ea7114a4475745da2f32bd9ad5b73 Author: John Stultz Date: Mon Feb 14 17:52:09 2011 -0800 time: Introduce get_monotonic_boottime and ktime_get_boottime This adds new functions that return the monotonic time since boot (in other words, CLOCK_MONOTONIC + suspend time). CC: Jamie Lokier CC: Thomas Gleixner CC: Alexander Shishkin CC: Arve Hjønnevåg Signed-off-by: John Stultz commit e06383db9ec591696a06654257474b85bac1f8cb Author: John Stultz Date: Tue Dec 14 19:37:07 2010 -0800 hrtimers: extend hrtimer base code to handle more then 2 clockids The hrtimer code is written mainly with CLOCK_REALTIME and CLOCK_MONOTONIC in mind. These are clockids 0 and 1 resepctively. However, if we are to introduce any new hrtimer bases, using new clockids, we have to skip the cputimers (clockids 2,3) as well as other clockids that may not impelement timers. This patch adds a little bit of indirection between the clockid and the base, so that we can extend the base by one when we add a new clockid at number 7 or so. CC: Jamie Lokier CC: Thomas Gleixner CC: Alexander Shishkin CC: Arve Hjønnevåg Signed-off-by: John Stultz commit 8fff39e06987492da3d4a0b9ec7cdbd245b6762b Author: Thomas Gleixner Date: Mon Feb 21 14:19:42 2011 +0100 genirq: Add missing break in __irq_set_trigger() The switch case in __irq_set_trigger() lacks a break, which emits a pr_err unconditionally on success. Reported-by: Lars-Peter Clausen Signed-off-by: Thomas Gleixner commit ed4dea6e0e33a3e58d8b77b775a8f0e433e7a005 Author: Yinghai Lu Date: Sat Feb 19 11:07:37 2011 -0800 genirq: Use IRQ_BITMAP_BITS as search size in irq_alloc_descs() The runtime expansion of nr_irqs does not take into account that bitmap_find_next_zero_area() returns "start" + size in case the search for an matching zero area fails. That results in a start value which can be completely off and is not covered by the following expand_nr_irqs() and possibly outside of the absolute limit. But we use it without further checking. Use IRQ_BITMAP_BITS as the limit for the bitmap search and expand nr_irqs when the start bit is beyond nr_irqs. So start is always pointing to the correct area in the bitmap. nr_irqs is just the limit for irq enumerations, not the real limit for the irq space. [ tglx: Let irq_expand_nr_irqs() take the new upper end so we do not expand nr_irqs more than necessary. Made changelog readable ] Signed-off-by: Yinghai Lu LKML-Reference: <4D6014F9.8040605@kernel.org> Signed-off-by: Thomas Gleixner commit a61d825808a0ce9935afebc225dcd602d5339e14 Author: Thomas Gleixner Date: Mon Feb 21 12:54:34 2011 +0100 genirq: Fix misplaced status update in irq_disable() We lazy disable interrupt lines, so only mark the line masked, when the chip provides an irq_disable callback. Signed-off-by: Thomas Gleixner commit 69efcc6d90d234a3a076afb2c635c1609536faa4 Author: Yinghai Lu Date: Mon Feb 21 10:58:13 2011 +0100 x86-64, NUMA: Do not scan two times for setup_node_bootmem() By the time setup_node_bootmem() is called, all the memblocks are already registered. As node_data is allocated from these memblocks, calling it more than once doesn't make any difference. Drop the loop. tj: Dropped comment referencing to the old behavior as suggested by David and rephrased the description. Signed-off-by: Yinghai Lu Acked-by: David Rientjes Signed-off-by: Tejun Heo commit 1396fa9cd2e34669253b7ca8c75f12103481f71c Author: Dan Carpenter Date: Fri Feb 18 12:17:16 2011 +0300 x86, microcode, AMD: Fix signedness bug in generic_load_microcode() install_equiv_cpu_table() returns type int. It uses negative error codes so using an unsigned type breaks the error handling. Signed-off-by: Dan Carpenter Acked-by: Borislav Petkov Cc: open list:AMD MICROCODE UPD... Cc: Andreas Herrmann LKML-Reference: <20110218091716.GA4384@bicker> Signed-off-by: Ingo Molnar commit 2b15cd96e5e93f9aa4f7041c91c1e7c344a62cbb Author: Borislav Petkov Date: Fri Feb 18 16:47:36 2011 +0100 x86, system.h: Drop unused __SAVE/__RESTORE macros Those are unused since at least the beginning of git history. Signed-off-by: Borislav Petkov LKML-Reference: <1298044056-31104-1-git-send-email-bp@amd64.org> Signed-off-by: Ingo Molnar commit a439520f8b18917b322f576be04c54aba84bb044 Author: Thomas Gleixner Date: Fri Feb 4 18:46:16 2011 +0100 genirq: Implement irq_data based move_*_irq() versions No need to lookup the irq descriptor when calling from a chip callback function which has irq_data already handy. Signed-off-by: Thomas Gleixner commit 77694b408abb8f92195ad5ed6ce5492f1d794c77 Author: Thomas Gleixner Date: Tue Feb 15 10:33:57 2011 +0100 genirq; Add fasteoi irq_chip quirk Some chips want irq_eoi() only called when an interrupt is actually handled. So they have checks for INPROGRESS and DISABLED in their irq_eoi callbacks. Add a chip flag, which allows to handle that in the generic code. No impact on the fastpath. Signed-off-by: Thomas Gleixner commit 781295762defc709a609efc01d8bb065276cd9a2 Author: Thomas Gleixner Date: Thu Feb 10 15:14:20 2011 +0100 genirq: Add preflow handler support sparc64 needs to call a preflow handler on certain interrupts befor calling the action chain. Integrate it into handle_fasteoi_irq. Must be enabled via CONFIG_IRQ_FASTEOI_PREFLOW. No impact when disabled. Signed-off-by: Thomas Gleixner Cc: David S. Miller commit 3836ca08aad4575c120ccf328652f3873eea9063 Author: Thomas Gleixner Date: Mon Feb 14 20:09:19 2011 +0100 genirq: Consolidate set_chip_handler functions No need to have separate functions if we have one plus inline wrappers. Signed-off-by: Thomas Gleixner commit 02725e7471b8dd58fa96f6604bdb5dde45405a2e Author: Thomas Gleixner Date: Sat Feb 12 10:37:36 2011 +0100 genirq: Use irq_get/put functions Convert the management functions to use the common irq_get/put function. Signed-off-by: Thomas Gleixner commit d5eb4ad2dfb2dfae43fd51bc8630b4fc3ef00e92 Author: Thomas Gleixner Date: Sat Feb 12 12:16:16 2011 +0100 genirq: Implement irq_get/put_desc_[bus]locked/unlock() Most of the managing functions get the irq descriptor and lock it - either with or without buslock. Instead of open coding this over and over provide a common function to do that. Signed-off-by: Thomas Gleixner commit 091738a266fc74329ae186f22ff2b3f01319112d Author: Thomas Gleixner Date: Mon Feb 14 20:16:43 2011 +0100 genirq: Remove real old transition functions These transition helpers are stale for years now. Remove them. Signed-off-by: Thomas Gleixner commit a6967caf00ebbb2d4acdebcb72a25f2e9ba43fd2 Author: Thomas Gleixner Date: Thu Feb 10 22:01:25 2011 +0100 genirq: Remove desc->status when GENERIC_HARDIRQS_NO_COMPAT=y If everything uses the right accessors, then enabling GENERIC_HARDIRQS_NO_COMPAT should just work. If not it will tell you. Don't be lazy and use the trick which I use in the core code! git grep status_use_accessors will unearth it in a split second. Offenders are tracked down and not slapped with stinking trouts. This time we use frozen shark for a better educational value. Signed-off-by: Thomas Gleixner commit e1ef824146131709d7466e37f889f2dab24ca98e Author: Thomas Gleixner Date: Thu Feb 10 22:25:31 2011 +0100 genirq: Reflect IRQ_MOVE_PCNTXT in irq_data state Required by x86. Signed-off-by: Thomas Gleixner commit 7f94226f03299f1ca32f118f02f2a0295e0e5e93 Author: Thomas Gleixner Date: Thu Feb 10 19:46:26 2011 +0100 genirq: Move wakeup state to irq_data Some irq_chips need to know the state of wakeup mode for setting the trigger type etc. Reflect it in irq_data state. Signed-off-by: Thomas Gleixner commit d4d5e08960844a062da8387ee5f16ca7a33200d0 Author: Thomas Gleixner Date: Thu Feb 10 13:16:14 2011 +0100 genirq: Add IRQCHIP_SET_TYPE_MASKED flag irq_chips, which require to mask the chip before changing the trigger type should set this flag. So the core takes care of it and the requirement for looking into desc->status in the chip goes away. Signed-off-by: Thomas Gleixner Cc: Linus Walleij Cc: Lars-Peter Clausen commit 2bff17ad2107c66fc8ca96501a7128dd7fa7a390 Author: Thomas Gleixner Date: Thu Feb 10 13:08:38 2011 +0100 genirq: Add flags to irq_chip Looking through irq_chip implementations I noticed that some of them have special requirements, like setting the type masked and therefor fiddle in irq_desc->status. Add a flag field, so the core code can handle it. Signed-off-by: Thomas Gleixner commit 5d4d8fc9ac3e9a90bbdf90bae6864cb2c01f2208 Author: Thomas Gleixner Date: Tue Feb 8 17:27:18 2011 +0100 genirq: Cleanup irq.h Put the constants into an enum and document them. Signed-off-by: Thomas Gleixner commit f9e4989eb8183a1f33581fa1b99274287b0639d2 Author: Thomas Gleixner Date: Wed Feb 9 14:54:49 2011 +0100 genirq: Force wrapped access to desc->status in core code Force the usage of wrappers by another nasty CPP substitution. Signed-off-by: Thomas Gleixner commit 1ccb4e612f68ceefb888c2c6c1def6294ea8666d Author: Thomas Gleixner Date: Wed Feb 9 14:44:17 2011 +0100 genirq: Wrap the remaning IRQ_* flags Use wrappers to keep them away from the core code. Signed-off-by: Thomas Gleixner commit 876dbd4cc1b35c1a4cb96a2be1d43ea0eabce3b4 Author: Thomas Gleixner Date: Tue Feb 8 17:28:12 2011 +0100 genirq: Mirror irq trigger type bits in irq_data.state That's the data structure chip functions get provided. Also allow them to signal the core code that they updated the flags in irq_data.state by returning IRQ_SET_MASK_OK_NOCOPY. The default is unchanged. The type bits should be accessed via: val = irqd_get_trigger_type(irqdata); and irqd_set_trigger_type(irqdata, val); Coders who access them directly will be tracked down and slapped with stinking trouts. Signed-off-by: Thomas Gleixner commit 2bdd10558c8d93009cb6c32ce9e30800fbb08add Author: Thomas Gleixner Date: Tue Feb 8 17:22:00 2011 +0100 genirq: Move IRQ_AFFINITY_SET to core Keep status in sync until last abuser is gone. Signed-off-by: Thomas Gleixner commit bce43032ad79fae0ce5b6174ce1321e643ceb54b Author: Thomas Gleixner Date: Thu Feb 10 22:37:41 2011 +0100 genirq: Reuse existing can set affinty check Add a !desc check while at it. Signed-off-by: Thomas Gleixner commit a005677b3dd05decdd8880cf3044ae709856f58f Author: Thomas Gleixner Date: Tue Feb 8 17:11:03 2011 +0100 genirq: Mirror IRQ_PER_CPU and IRQ_NO_BALANCING in irq_data.state That's the right data structure to look at for arch code. Accessor functions are provided. irqd_is_per_cpu(irqdata); irqd_can_balance(irqdata); Coders who access them directly will be tracked down and slapped with stinking trouts. Signed-off-by: Thomas Gleixner commit 1ce6068dac1924f7095be5850481e790cbf1b3c1 Author: Thomas Gleixner Date: Wed Feb 9 20:44:21 2011 +0100 genirq: Move debug code to separate header It'll break when I'm going to undefine the constants. Signed-off-by: Thomas Gleixner commit fae581e588e64a0690f3fc995e404fcacaebe772 Author: Thomas Gleixner Date: Tue Feb 8 16:53:24 2011 +0100 genirq: Remove CHECK_IRQ_PER_CPU from core code Signed-off-by: Thomas Gleixner commit 8f53f92404bead2ab2154d45c8f508880bb5d95d Author: Thomas Gleixner Date: Tue Feb 8 16:50:00 2011 +0100 genirq: Make CHECK_IRQ_PER_CPU an inline and deprecate it Its' too ugly and needs to go. The only users are core code and parisc. Core code does not need it and parisc gets a new check once IRQ_PER_CPU is reflected in irq_data.state. Signed-off-by: Thomas Gleixner commit 6a58fb3bad099076f36f0f30f44507bc3275cdb6 Author: Thomas Gleixner Date: Tue Feb 8 15:40:05 2011 +0100 genirq: Remove CONFIG_IRQ_PER_CPU The saving of this switch is minimal versus the ifdef mess it creates. Simple enable PER_CPU unconditionally and remove the config switch. Signed-off-by: Thomas Gleixner commit f230b6d5c48f8d12f4dfa1f8b5ab0b0320076d21 Author: Thomas Gleixner Date: Sat Feb 5 15:20:04 2011 +0100 genirq: Add IRQ_MOVE_PENDING to irq_data.state chip implementations need to know about it. Keep status in sync until all users are fixed. Accessor function: irqd_is_setaffinity_pending(irqdata) Coders who access them directly will be tracked down and slapped with stinking trouts. Signed-off-by: Thomas Gleixner commit 91c499178139d6597e68db19638e4135510a34b8 Author: Thomas Gleixner Date: Thu Feb 3 20:48:29 2011 +0100 genirq: Add state field to irq_data Some chip implementations need to access certain status flags. With sparse irqs that requires a lookup of the irq descriptor. Add a state field which contains such flags. Name it in a way which will make coders happy to access it with the proper accessor functions. And it's easy to grep for. Signed-off-by: Thomas Gleixner commit 6d2cd17fde1fc3e93302815f049f255bb2b3123e Author: Thomas Gleixner Date: Tue Feb 8 14:34:18 2011 +0100 genirq: Move IRQ_WAKEUP to core No users outside of core. Signed-off-by: Thomas Gleixner commit c531e8361f1968d664e6e97fbd3bfa4cf0e62e42 Author: Thomas Gleixner Date: Tue Feb 8 12:44:58 2011 +0100 genirq: Move IRQ_SUSPENDED to core No users outside of core. Signed-off-by: Thomas Gleixner commit 6e40262ea43c4b0e3f435b3a083e4461ef921c17 Author: Thomas Gleixner Date: Tue Feb 8 12:36:06 2011 +0100 genirq: Move IRQ_MASKED to core Keep status in sync until all users are fixed. Signed-off-by: Thomas Gleixner commit 2a0d6fb335d4428285dab2d254911748e6040807 Author: Thomas Gleixner Date: Tue Feb 8 12:17:57 2011 +0100 genirq: Move IRQ_PENDING flag to core Keep status in sync until all users are fixed. Signed-off-by: Thomas Gleixner commit c1594b77e46124bb462f961e536120e471c67446 Author: Thomas Gleixner Date: Mon Feb 7 22:11:30 2011 +0100 genirq: Move IRQ_DISABLED to core Keep status in sync until all abusers are fixed. Signed-off-by: Thomas Gleixner commit 163ef3091195f514a06f064b12914597d2644c55 Author: Thomas Gleixner Date: Tue Feb 8 11:39:15 2011 +0100 genirq: Move IRQ_REPLAY and IRQ_WAITING to core No users outside of core. Signed-off-by: Thomas Gleixner commit 3d67baec7f1b01fc289ac1a2f1a7e6d5e43391c6 Author: Thomas Gleixner Date: Mon Feb 7 21:02:10 2011 +0100 genirq: Move IRQ_ONESHOT to core No users outside of core. Signed-off-by: Thomas Gleixner commit 009b4c3b8ad584b3462734127a5bec680d5d6af4 Author: Thomas Gleixner Date: Mon Feb 7 21:48:49 2011 +0100 genirq: Add IRQ_INPROGRESS to core We need to maintain the flag for now in both fields status and istate. Add a CONFIG_GENERIC_HARDIRQS_NO_COMPAT switch to allow testing w/o the status one. Wrap the access to status IRQ_INPROGRESS in a inline which can be turned of with CONFIG_GENERIC_HARDIRQS_NO_COMPAT along with the define. There is no reason that anything outside of core looks at this. That needs some modifications, but we'll get there. Signed-off-by: Thomas Gleixner commit 6954b75b488dd740950573f244ddd66fd28620aa Author: Thomas Gleixner Date: Mon Feb 7 20:55:35 2011 +0100 genirq: Move IRQ_POLL_INPROGRESS to core No users outside of core. Signed-off-by: Thomas Gleixner commit 6f91a52d9bb28396177662f1da0f2e2cef9cf5d0 Author: Thomas Gleixner Date: Mon Feb 14 13:33:16 2011 +0100 genirq: Use modify_status for set_irq_nested_thread No need for a separate function in the core code. Signed-off-by: Thomas Gleixner commit 7acdd53e5b2c55b6f7e3427e85e2f91fa814a4f9 Author: Thomas Gleixner Date: Mon Feb 7 20:40:54 2011 +0100 genirq: Move IRQ_SPURIOUS_DISABLED to core state No users outside. Signed-off-by: Thomas Gleixner commit bd062e7667ac173afef57fbfe9327f3b914a9d4c Author: Thomas Gleixner Date: Mon Feb 7 20:25:25 2011 +0100 genirq: Move IRQ_AUTODETECT to internal state No users outside of core Signed-off-by: Thomas Gleixner commit e6bea9c404699223322d7411c6f2ceaec02fa83c Author: Thomas Gleixner Date: Wed Feb 9 13:16:52 2011 +0100 genirq: Protect tglx from tripping over his own feet The irq_desc.status field will either go away or renamed to settings. Anyway we need to maintain compatibility to avoid breaking the world and some more. While moving bits into the core, I need to avoid that I use any of the still existing IRQ_ bits in the core code by typos. So that file will hold the inline wrappers and some nasty CPP tricks to break the build when typoed. Signed-off-by: Thomas Gleixner commit dbec07bac614a61e3392c1e7c08cc6a49ad43f7a Author: Thomas Gleixner Date: Mon Feb 7 20:19:55 2011 +0100 genirq: Add internal state field to irq_desc That field will contain internal state information which is not going to be exposed to anything outside the core code - except via accessor functions. I'm tired of everyone fiddling in irq_desc.status. core_internal_state__do_not_mess_with_it is clear enough, annoying to type and easy to grep for. Offenders will be tracked down and slapped with stinking trouts. Signed-off-by: Thomas Gleixner commit 35e857cbeb24e75c6f9a9312ac30454eee8c5950 Author: Thomas Gleixner Date: Thu Feb 10 12:20:23 2011 +0100 genirq: Fixup core code namespace fallout Signed-off-by: Thomas Gleixner commit c78b9b65faa291def628dbd8539649f58299f0f3 Author: Thomas Gleixner Date: Thu Dec 16 17:21:47 2010 +0100 genirq: Implement generic irq_show_interrupts() All archs implement show_interrupts() in more or less the same way. That's tons of duplicated code with different bugs with no value. Implement a generic version and deprecate show_interrupts() Unfortunately we need some ifdeffery for !GENERIC_HARDIRQ archs. Signed-off-by: Thomas Gleixner commit 1277a5325adfc53caac7dd3dac5d3d2fd2a125b4 Author: Thomas Gleixner Date: Mon Feb 7 01:40:27 2011 +0100 genirq: Simplify handle_irq_event() Now that all core users are converted one layer can go. Signed-off-by: Thomas Gleixner commit 0877d66257082ce86fca8f9826b91870575b272c Author: Thomas Gleixner Date: Mon Feb 7 01:29:15 2011 +0100 genirq: Use handle_irq_event() in the spurious poll code Signed-off-by: Thomas Gleixner commit 849f061c25f8951d11c7dd88f44950ccde296392 Author: Thomas Gleixner Date: Mon Feb 7 01:25:41 2011 +0100 genirq: Use handle_perpcu_event() in handle_percpu_irq() Signed-off-by: Thomas Gleixner commit a60a5dc2db3b08b3c2900614c43b1262410c2d8c Author: Thomas Gleixner Date: Mon Feb 7 01:24:07 2011 +0100 genirq: Use handle_irq_event() in handle_edge_irq() It's safe to drop the IRQ_INPROGRESS flag between action chain walks as we are protected by desc->lock. Signed-off-by: Thomas Gleixner commit a7ae4de5c8ae8110556f0f9c7241093ef984605c Author: Thomas Gleixner Date: Mon Feb 7 01:23:07 2011 +0100 genirq: Use handle_irq_event() in handle_fasteoi_irq() Signed-off-by: Thomas Gleixner commit 1529866c63d789925de9b4250646d82d033e4b95 Author: Thomas Gleixner Date: Mon Feb 7 01:22:17 2011 +0100 genirq: Use handle_irq_event() in handle_level_irq() Signed-off-by: Thomas Gleixner commit 107781e72192067b95a7d373bfa460434a13c6ae Author: Thomas Gleixner Date: Mon Feb 7 01:21:02 2011 +0100 genirq: Use handle_irq_event() in handle_simple_irq() Signed-off-by: Thomas Gleixner commit 4912609f228da4a3d2bfbdf0f31de3d9eab2b7f8 Author: Thomas Gleixner Date: Mon Feb 7 01:08:49 2011 +0100 genirq: Implement handle_irq_event() Core code replacement for the ugly camel case. It contains all the code which is shared in all handlers. clear status flags set INPROGRESS flag unlock call action chain note_interrupt lock clr INPROGRESS flag Signed-off-by: Thomas Gleixner commit d78f8dd36b90626106ce19cb2e6828b0dc39447e Author: Thomas Gleixner Date: Wed Feb 2 21:41:17 2011 +0000 genirq: Do not fiddle with IRQ_MASKED in handle_edge_irq() IRQ_MASKED is set in mask_ack_irq() anyway. Remove it from handle_edge_irq() to allow simpler ab^HHreuse of that function. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110202212551.918484270@linutronix.de> commit 3aae994fb0f43f6d94a31c33536a83869504abdf Author: Thomas Gleixner Date: Fri Feb 4 10:17:52 2011 +0100 genirq: Consolidate IRQ_DISABLED Handle IRQ_DISABLED consistent. Signed-off-by: Thomas Gleixner commit 50f7c0327513d5acefbe26fd33498af18d1ffac5 Author: Thomas Gleixner Date: Thu Feb 3 13:23:54 2011 +0100 genirq: Remove default magic Now that everything uses the wrappers, we can remove the default functions. None of those functions is performance critical. That makes the IRQ_MASKED flag tracking fully consistent. Signed-off-by: Thomas Gleixner commit 87923470c712dff00b101ffb6b6fbc27bd7a6df5 Author: Thomas Gleixner Date: Thu Feb 3 12:27:44 2011 +0100 genirq: Consolidate disable/enable Create irq_disable/enable and use them to keep the flags consistent. Signed-off-by: Thomas Gleixner commit 4699923861513671d3f6ade8efb4e56a9a7ecadf Author: Thomas Gleixner Date: Wed Feb 2 21:41:14 2011 +0000 genirq: Consolidate startup/shutdown of interrupts Aside of duplicated code some of the startup/shutdown sites do not handle the MASKED/DISABLED flags and the depth field at all. Move that to a helper function and take care of it there. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110202212551.787481468@linutronix.de> commit 3b56f0585fd4c02d047dc406668cb40159b2d340 Author: Thomas Gleixner Date: Wed Feb 2 21:41:12 2011 +0000 genirq: Remove bogus conditional The if (chip->irq_shutdown) check will always evaluate to true, as we fill in chip->irq_shutdown with default_shutdown in irq_chip_set_defaults() if the chip does not provide its own function. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra LKML-Reference: <20110202212551.667607458@linutronix.de> commit 1535dfacbf21c4da1b73fcf07c39913da5bd5581 Author: Thomas Gleixner Date: Mon Feb 7 01:55:43 2011 +0100 genirq: Move irq thread flags to core Soleley used in core code. Signed-off-by: Thomas Gleixner commit fe200ae48ef5c79bf7941fe8046ff9505c570ff6 Author: Thomas Gleixner Date: Mon Feb 7 10:34:30 2011 +0100 genirq: Mark polled irqs and defer the real handler With the chip.end() function gone we might run into a situation where a poll call runs and the real interrupt comes in, sees IRQ_INPROGRESS and disables the line. That might be a perfect working one, which will then be masked forever. So mark them polled while the poll runs. When the real handler sees IRQ_INPROGRESS it checks the poll flag and waits for the polling to complete. Add the necessary amount of sanity checks to it to avoid deadlocks. Signed-off-by: Thomas Gleixner commit d05c65fff0ef672be75429266751f0e015b54d94 Author: Thomas Gleixner Date: Mon Feb 7 14:31:37 2011 +0100 genirq: spurious: Run only one poller at a time No point in running concurrent pollers which confuse each other by setting PENDING. Signed-off-by: Thomas Gleixner commit c7259cd7af757ddcd65701c37099dcddae2054f0 Author: Thomas Gleixner Date: Mon Feb 7 09:52:27 2011 +0100 genirq: Do not poll disabled, percpu and timer interrupts There is no point in polling disabled lines. percpu does not make sense at all because we only poll on the cpu we're currently running on. Also polling per_cpu interrupts is racy as hell. The handler runs without locking so we might get a huge surprise. If the timer interrupt needs polling, then we wont get there anyway. Signed-off-by: Thomas Gleixner commit fa27271bc8d230355c1f24ddea103824fdc12de6 Author: Thomas Gleixner Date: Mon Feb 7 09:10:39 2011 +0100 genirq: Fixup poll handling try_one_irq() contains redundant code and lots of useless checks for shared interrupts. Check for shared before setting IRQ_INPROGRESS and then call handle_IRQ_event() while pending. Shorter version with the same functionality. Signed-off-by: Thomas Gleixner commit b738a50a202639614c98b5763b01bf9201779e50 Author: Thomas Gleixner Date: Wed Feb 2 23:58:19 2011 +0100 genirq: Warn when handler enables interrupts We run all handlers with interrupts disabled and expect them not to enable them. Warn when we catch one who does. Signed-off-by: Thomas Gleixner commit 1082687e8d6292a61759eb83358e7db39fed1bf4 Author: Thomas Gleixner Date: Mon Feb 7 09:05:05 2011 +0100 genirq: Plug race in report_bad_irq() We cannot walk the action chain unlocked. Even if IRQ_INPROGRESS is set an action can be removed and we follow a null pointer. It's safe to take the lock there, because the code which removes the action will call synchronize_irq() which waits unlocked for IRQ_INPROGRESS going away. Signed-off-by: Thomas Gleixner commit 2b879eaf095878430c38cbd95e5c0fc4ce65ad8e Author: Thomas Gleixner Date: Mon Feb 14 11:25:02 2011 +0100 genirq: Remove redundant thread affinity setting Thread affinity is already set by setup_affinity(). Signed-off-by: Thomas Gleixner commit 3b8249e759c701c4a82f99d957be651a7657bf6f Author: Thomas Gleixner Date: Mon Feb 7 16:02:20 2011 +0100 genirq: Do not copy affinity before set While rumaging through arch code I found that there are a few workarounds which deal with the fact that the initial affinity setting from request_irq() copies the mask into irq_data->affinity before the chip code is called. In the normal path we unconditionally copy the mask when the chip code returns 0. Copy after the code is called and add a return code IRQ_SET_MASK_OK_NOCOPY for the chip functions, which prevents the copy. That way we see the real mask when the chip function decided to truncate it further as some arches do. IRQ_SET_MASK_OK is 0, which is the current behaviour. Signed-off-by: Thomas Gleixner commit 569bda8df11effa03e618729293c7961696abb10 Author: Thomas Gleixner Date: Mon Feb 7 17:05:08 2011 +0100 genirq: Always apply cpu online mask If the affinity had been set by the user, then a later request_irq() will honour that setting. But online cpus can have changed. So apply the online mask and for this case as well. Signed-off-by: Thomas Gleixner commit b008207cbd0d5ce606a1a2ac52826e0ab37d0b99 Author: Thomas Gleixner Date: Mon Feb 7 17:30:50 2011 +0100 genirq: Rremove redundant check IRQ_NO_BALANCING is already checked in irq_can_set_affinity() above, no need to check it again. Signed-off-by: Thomas Gleixner commit 1fa46f1f070961783661ae640cd2f6b2557f3885 Author: Thomas Gleixner Date: Mon Feb 7 16:46:58 2011 +0100 genirq: Simplify affinity related code There is lot of #ifdef CONFIG_GENERIC_PENDING_IRQ along with duplicated code in the irq core. Move the #ifdeffery into one place and cleanup the code so it's readable. No functional change. Signed-off-by: Thomas Gleixner commit a0cd9ca2b907d7ee26575e7b63ac92dad768a75e Author: Thomas Gleixner Date: Thu Feb 10 11:36:33 2011 +0100 genirq: Namespace cleanup The irq namespace has become quite convoluted. My bad. Clean it up and deprecate the old functions. All new functions follow the scheme: irq number based: irq_set/get/xxx/_xxx(unsigned int irq, ...) irq_data based: irq_data_set/get/xxx/_xxx(struct irq_data *d, ....) irq_desc based: irq_desc_get_xxx(struct irq_desc *desc) Signed-off-by: Thomas Gleixner commit 43abe43ce0619d744c7a5bb15cce075e532b53b7 Author: Thomas Gleixner Date: Sat Feb 12 12:10:49 2011 +0100 genirq: Add missing buslock to set_irq_type(), set_irq_wake() chips behind a slow bus cannot update the chip under desc->lock, but we miss the chip_buslock/chip_bus_sync_unlock() calls around the set type and set wake functions. Signed-off-by: Thomas Gleixner commit e7bcecb7b1d29b9ad5af939149a945658620ca8f Author: Thomas Gleixner Date: Wed Feb 16 17:12:57 2011 +0100 genirq: Make nr_irqs runtime expandable We face more and more the requirement to expand nr_irqs at runtime. The reason are irq expanders which can not be detected in the early boot stage. So we speculate nr_irqs to have enough room. Further Xen needs extra irq numbers and we really want to avoid adding more "detection" code into the early boot. There is no real good reason why we need to limit nr_irqs at early boot. Allow the allocation code to expand nr_irqs. We have already 8k extra number space in the allocation bitmap, so lets use it. Signed-off-by: Thomas Gleixner commit 218502bfe674f570205367b9094048207b04ba15 Merge: 51327ad 6d83f94 Author: Thomas Gleixner Date: Sat Feb 19 12:56:36 2011 +0100 Merge branch 'irq/urgent' into irq/core Reason: Further patches are conflicting with mainline fixes Signed-off-by: Thomas Gleixner commit 5df91509d324d44cfb11e55d9cb02fe18b53b045 Author: jacob.jun.pan@linux.intel.com Date: Fri Feb 18 13:42:54 2011 -0800 x86: mrst: Remove apb timer read workaround APB timer current count was unreliable in the earlier silicon, which could result in time going backwards. This problem has been fixed in the current silicon stepping. This patch removes the workaround which was used to check and prevent timer rolling back when APB timer is used as clocksource device. The workaround code was also flawed by potential race condition around the cached read value last_read. Though a fix can be done by assigning last_read to a local variable at the beginning of apbt_read_clocksource(), but this is not necessary anymore. [ tglx: A sane timer on an Intel chip - I can't believe it ] Signed-off-by: Jacob Pan Cc: Arjan van de Ven Cc: Alan Cox LKML-Reference: <1298065374-25532-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner commit 3d74a539ae07a8f3c061332e426fc07b2310cf05 Author: Konrad Rzeszutek Wilk Date: Thu Feb 17 16:12:51 2011 -0500 pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. This code path is only run when an MSI/MSI-X PCI device is passed in to PV DomU. In 2.6.37 time-frame we over-wrote the default cleanup handler for MSI/MSI-X irq->desc to be "xen_teardown_msi_irqs". That function calls the the xen-pcifront driver which can tell the backend to cleanup/take back the MSI/MSI-X device. However, we forgot to continue the process of free-ing the MSI/MSI-X device resources (irq->desc) in the PV domU side. Which is what the default cleanup handler: default_teardown_msi_irqs did. Hence we would leak IRQ descriptors. Without this patch, doing "rmmod igbvf;modprobe igbvf" multiple times ends with abandoned IRQ descriptors: 28: 5 xen-pirq-pcifront-msi-x 29: 8 xen-pirq-pcifront-msi-x ... 130: 10 xen-pirq-pcifront-msi-x with the end result of running out of IRQ descriptors. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit cc0f89c4a426fcd6400a89e9e34e4a8851abef76 Author: Konrad Rzeszutek Wilk Date: Thu Feb 17 12:02:23 2011 -0500 pci/xen: Cleanup: convert int** to int[] Cleanup code. Cosmetic change to make the code look easier to read. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit db1c1cce4a653dcbe6949c72ae7b9f42cab1b929 Author: Richard Cochran Date: Fri Feb 18 10:07:25 2011 +0100 ntp: Remove redundant and incorrect parameter check The ADJ_SETOFFSET code redundantly checks the range of the nanoseconds field of the time value. This field is checked again in the subsequent call to timekeeping_inject_offset(). Also, as is, the check will not detect whether the number of microseconds is out of range. Let timekeeping_inject_offset() do the error checking. Signed-off-by: Richard Cochran Cc: johnstul@us.ibm.com LKML-Reference: <20110218090724.GA2924@riccoc20.at.omicron.at> Signed-off-by: Thomas Gleixner commit 13884c6680973f0ce3483dc59b636b4962d6dafe Author: Sebastian Andrzej Siewior Date: Fri Dec 17 16:33:53 2010 +0100 x86/pci: Remove unused variable |arch/x86/pci/ce4100.c: In function `ce4100_conf_read': |arch/x86/pci/ce4100.c:257:9: warning: unused variable `retval' Signed-off-by: Sebastian Andrzej Siewior Cc: dirk.brandewie@gmail.com LKML-Reference: <1292600033-12271-16-git-send-email-bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner commit 55cb8cd45e0600df1473489518d7f12ce1bbe973 Author: Konrad Rzeszutek Wilk Date: Wed Feb 16 13:43:04 2011 -0500 pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen_allocate_pirq -> xen_map_pirq_gsi -> PHYSDEVOP_alloc_irq_vector IFF xen_initial_domain() in addition to the kernel side book-keeping side of things (set chip and handler, update irq_info etc) whereas xen_allocate_pirq_msi just does the kernel book keeping. Also xen_allocate_pirq allocates an IRQ in the 1-1 GSI space whereas xen_allocate_pirq_msi allocates a dynamic one in the >GSI IRQ space. All of this is uneccessary as this code path is only executed when we run as a domU PV guest with an MSI/MSI-X PCI card passed in. Hence we can jump straight to allocating an dynamic IRQ (and binding it to the proper PIRQ) and skip the rest. In short: this change is a cosmetic one. Reviewed-by: Ian Campbell Reviewed-by: Stefano Stabellini Signed-off-by: Konrad Rzeszutek Wilk commit 1d4610527bc71d3f9eea520fc51a02d54f79dcd0 Author: Konrad Rzeszutek Wilk Date: Wed Feb 16 13:43:22 2011 -0500 xen-pcifront: Sanity check the MSI/MSI-X values Check the returned vector values for any values that are odd or plain incorrect (say vector value zero), and if so print a warning. Also fixup the return values. This patch was precipiated by the Xen PCIBack returning the incorrect values due to how it was retrieving PIRQ values. This has been fixed in the xen-pciback by "xen/pciback: Utilize 'xen_pirq_from_irq' to get PIRQ value" patch. Reviewed-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit a3d1ee10d1bf4520af3d44c1aa6cd46956ec4fd7 Author: Michael Witten Date: Wed Feb 2 14:22:08 2011 -0600 perf tools: Makefile: Remove various and sundry cruft This commit squashes several commits that remove: unnecessary uname calls `sh -c' BUILT_INS and QUIET_BUILT_IN They have no effect, and the `fixup-builtins' and `check-builtins.sh' scripts don't even exist. RUNTIME_PREFIX It's currently never anything but unset, and it's apparently only meaningful when Microsoft Windows is the operating system (according to the source for git). TEST_PROGRAMS EXTRA_PROGRAMS unused SHELL_PATH_SQ portions unused test for V=2 useless exports Only when `V' is undefined (that is, only when the value of `V' is empty) is `export V' performed, which just has the effect of placing the empty-valued variable `V' in the environment. The only other script to make use of `V' is `Documentation/Makefile', which only checks whether `V' is undefined (that is, whether the value of `V' is empty); hence, the `export V' has no effect whatsoever. Similarly, `export QUIET_GEN' is useless because it will only have a non-empty value when `V' has an empty-value, and when `V' has an empty-value, `QUIET_GEN' is always explicitly set in every script in which it is used. `DESTDIR' is only ever defined by the user via the environment or the command line, both of which are automatically exported to sub-make processes. Furthermore, no non-make sub-scripts make use of `DESTDIR' as an environment variable. No other scripts use `perfexec_instdir'. unused QUIET_SUBDIR{0,1} TAR and RPMBUILD PTHREAD_LIBS Maintainer's dist rules and commands distclean target Test suite coverage testing PRINT_DIR and NO_SUBDIR `configure' target NO_CURL @@PERF_VERSION@@ substitution Without the sed command, all of the rule's commands can be reduced to a single line that copies a file and sets the permissions properly in the process. `make test' echo line template_instdir PERF-BUILD-OPTIONS double-colon rules The use of double-colon rules seems misguided or vestigial git. Essentially hard-coded $(SCRIPTS) expansion Signed-off-by: Michael Witten LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 0a54fb63600b745e060d24879ed5194382a466c5 Author: Michael Witten Date: Wed Feb 2 12:04:27 2011 -0600 perf tools: Makefile: Remove tool-specific cruft This commit squashes several commits that remove: NO_C99_FORMAT CURLDIR and EXPATDIR NO_DEFLATE_BOUND CC_LD_DYNPATH and NO_R_TO_GCC_LINKER NO_PERL_MAKEMAKER INTERNAL_QSORT NO_EXTERNAL_GREP NO_PERL SCRIPT_PERL PERL_PATH_SQ Signed-off-by: Michael Witten LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 8796cb9d7dc028945af4b2ea858ae8f8f2ecbe8c Author: Michael Witten Date: Wed Feb 2 11:57:41 2011 -0600 perf tools: Makefile: Remove platform-specific cruft While it makes sense that this tool could be used on other platforms at least to parse data, there doesn't appear to be any real support for such usage. This commit squashes several commits that remove: SNPRINTF_RETURNS_BOGUS FREAD_READS_DIRECTORIES NO_D_{INO,TYPE}_IN_DIRENT NO_STRCASESTR NO_MEMMEM NO_STRTOUMAX and NO_STRTOULL NO_SETENV NO_UNSETENV NO_MKDTEMP NEEDS_LIBICONV NEEDS_SOCKET NO_MMAP NO_PTHREADS NO_PREAD NO_TRUSTABLE_FILEMODE NO_IPV6 and NO_SOCKADDR_STORAGE NO_ICONV and OLD_ICONV NO_NSEC, USE_NSEC, and USE_ST_TIMESPEC NO_ST_BLOCKS_IN_STRUCT_STAT NO_FINK and NO_DARWIN_PORTS NO_SYS_SELECT_H NO_HSTRERROR DIR_HAS_BSD_GROUP_SEMANTICS and FORCE_DIR_SET_GID NEEDS_NSL, NO_UINTMAX_T, NO_INET_{N,P}TON COMPAT_{CFLAGS,OBJS} Executable extension `X' Signed-off-by: Michael Witten LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 006cdc32618e09ffe228a7a86af044f3cc0dd714 Author: Michael Witten Date: Wed Feb 2 13:01:41 2011 -0600 perf tools: Makefile: Remove vestigial git-specific cruft This commit squashes several commits that remove: NO_SYMLINK_HEAD NO_SVN_TESTS NO_FAST_WORKING_DIRECTORY USE_STDEV SHA1/SSL cruft makefile rules Signed-off-by: Michael Witten LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 58bff947e2d164c7e5cbf7f485e4b3d4884befeb Author: Jan Beulich Date: Thu Feb 17 15:54:26 2011 +0000 x86: Eliminate pointless adjustment attempts in fixup_irqs() Not only when an IRQ's affinity equals cpu_online_mask is there no need to actually try to adjust the affinity, but also when it's a subset thereof. This particularly avoids adjustment attempts during system shutdown to any IRQs bound to CPU#0. Signed-off-by: Jan Beulich Cc: Thomas Gleixner Cc: Eric W. Biederman Cc: Suresh Siddha Cc: Gary Hade LKML-Reference: <4D5D52C2020000780003272C@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit 02ca752e4181e219e243cd61a60dd1da47251f11 Author: Jan Beulich Date: Thu Feb 17 15:51:40 2011 +0000 x86: Remove die_nmi() With no caller left, the function and the DIE_NMIWATCHDOG enumerator can both go away. Signed-off-by: Jan Beulich Cc: Peter Zijlstra Cc: Don Zickus LKML-Reference: <4D5D521C0200007800032702@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit fd8fa4d3ddc4cc04ec8097e632b995d535c52beb Author: Jan Beulich Date: Thu Feb 17 15:56:58 2011 +0000 x86: Combine printk()s in show_regs_common() Printing a single character alone when there's an immediately following printk() is pretty pointless (and wasteful). Signed-off-by: Jan Beulich LKML-Reference: <4D5D535A0200007800032730@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit bb3e6251a69e67d7620373ee18e35b404964273e Author: Jan Beulich Date: Thu Feb 17 15:47:37 2011 +0000 x86: Don't call dump_stack() from arch_trigger_all_cpu_backtrace_handler() show_regs() already prints two(!) stack traces, no need for a third one. Signed-off-by: Jan Beulich LKML-Reference: <4D5D512902000078000326EE@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit e4cc9f4a207aeb819f358114eb23a04547d4807c Merge: e9345aa 668b878 Author: Ingo Molnar Date: Fri Feb 18 08:25:05 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit e9345aab675382176740bc8a2c6d3caf1510e46d Author: Ingo Molnar Date: Fri Feb 18 08:09:49 2011 +0100 Revert "tracing: Add unstable sched clock note to the warning" This reverts commit 5e38ca8f3ea423442eaafe1b7e206084aa38120a. Breaks the build of several !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures. Cc: Jiri Olsa Cc: Steven Rostedt Message-ID: <20110217171823.GB17058@elte.hu> Signed-off-by: Ingo Molnar commit db2e2e6ee9ee9ce93b04c6975fdfef304771d6ad Author: Tejun Heo Date: Mon Jan 24 15:43:03 2011 +0100 xen-pcifront: don't use flush_scheduled_work() flush_scheduled_work() is scheduled for deprecation. Cancel ->op_work directly instead. Signed-off-by: Tejun Heo Cc: Ryan Wilson Cc: Jan Beulich Cc: Jesse Barnes Signed-off-by: Konrad Rzeszutek Wilk commit 668b8788f497b2386402daeca583d6300240d41d Author: Arnaldo Carvalho de Melo Date: Thu Feb 17 15:38:58 2011 -0200 perf list: Allow filtering list of events The man page has the details, here are some examples: [root@emilia ~]# perf list *fault* *:*wait* List of pre-defined events (to be used in -e): page-faults OR faults [Software event] minor-faults [Software event] major-faults [Software event] alignment-faults [Software event] emulation-faults [Software event] radeon:radeon_fence_wait_begin [Tracepoint event] radeon:radeon_fence_wait_end [Tracepoint event] writeback:wbc_writeback_wait [Tracepoint event] writeback:wbc_balance_dirty_wait [Tracepoint event] writeback:writeback_congestion_wait [Tracepoint event] writeback:writeback_wait_iff_congested [Tracepoint event] sched:sched_wait_task [Tracepoint event] sched:sched_process_wait [Tracepoint event] sched:sched_stat_wait [Tracepoint event] sched:sched_stat_iowait [Tracepoint event] syscalls:sys_enter_epoll_wait [Tracepoint event] syscalls:sys_exit_epoll_wait [Tracepoint event] syscalls:sys_enter_epoll_pwait [Tracepoint event] syscalls:sys_exit_epoll_pwait [Tracepoint event] syscalls:sys_enter_rt_sigtimedwait [Tracepoint event] syscalls:sys_exit_rt_sigtimedwait [Tracepoint event] syscalls:sys_enter_waitid [Tracepoint event] syscalls:sys_exit_waitid [Tracepoint event] syscalls:sys_enter_wait4 [Tracepoint event] syscalls:sys_exit_wait4 [Tracepoint event] syscalls:sys_enter_waitpid [Tracepoint event] syscalls:sys_exit_waitpid [Tracepoint event] [root@emilia ~]# Suggested-by: Ingo Molnar Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 74cfc17dc1a69c37ce6c8a76c1ce84bcb796eb0e Author: Arnaldo Carvalho de Melo Date: Thu Feb 17 14:40:46 2011 -0200 perf report: Tell the user when a perf.data file has no samples [root@emilia ~]# perf report --stdio The perf.data file has no samples! [root@emilia ~]# The TUI shows a popup warning message with the same message. Reported-by: Ingo Molnar Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Steven Rostedt Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 712a4b6049724278121d56aba683151d86c8c35a Author: Arnaldo Carvalho de Melo Date: Thu Feb 17 12:18:42 2011 -0200 perf record: Delay setting the header writing atexit call While testing the --filter option I noticed that we were writing lots of unneeded stuff to the perf.data header when the filter ioctl fails, so move the atexit(atexit_header) call to after we create the counters successfully. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit fec9cbd15b9e99bab9bc50f1ed7e20a1087d7c6d Author: Arnaldo Carvalho de Melo Date: Thu Feb 17 10:37:23 2011 -0200 perf hists: Print number of samples, not the period sum So that we match the header where we state the number of events with the "Samples" column when using 'perf report -n/--show-nr-samples': [root@emilia ~]# perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.111 MB perf.data (~4860 samples) ] [root@emilia ~]# perf report --stdio --show-nr-samples # Events: 11 cycles # # Overhead Samples Command Shared Object Symbol # ........ .......... ........... .................. ............................ # 16.65% 1 sleep [kernel.kallsyms] [k] unmap_vmas 16.10% 1 perf libpthread-2.12.so [.] __pthread_cleanup_push_defer 15.79% 2 perf [kernel.kallsyms] [k] format_decode 12.88% 1 kworker/1:2 [kernel.kallsyms] [k] cache_reap 10.69% 1 swapper [kernel.kallsyms] [k] _raw_spin_lock 7.55% 1 sleep [kernel.kallsyms] [k] prepare_exec_creds 6.00% 1 perf [jbd2] [k] start_this_handle 5.29% 1 perf [kernel.kallsyms] [k] seq_read 4.75% 1 perf [kernel.kallsyms] [k] get_pid_task 4.30% 1 perf [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore # # (For a higher level overview, try: perf report --sort comm,dso) # [root@emilia ~]# Reported-by: Stephane Eranian Acked-by: Stephane Eranian Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 6d496f9f232790d44144f3784856290e0b27b8f3 Author: Yinghai Lu Date: Thu Feb 17 14:53:20 2011 +0100 x86-64, NUMA: Put dummy_numa_init() in the init section dummy_numa_init() is used only during system boot. Put it in .init like other NUMA init functions. - tj: Description update. Signed-off-by: Yinghai Lu Signed-off-by: Tejun Heo commit 2ca230baeb7c61864cab9b53e37a3da28a2ca7e5 Author: Yinghai Lu Date: Thu Feb 17 14:46:37 2011 +0100 x86-64, NUMA: Don't call __pa() with invalid address in numa_reset_distance() Do not call __pa(numa_distance) if it was not allocated before. Calling with invalid address triggers VIRTUAL_BUG_ON() in __phys_addr() if CONFIG_DEBUG_VIRTUAL. Also reported by Ingo. http://thread.gmane.org/gmane.linux.kernel/1101306/focus=1101785 - v2: Change to check existing path as tj requested. - tj: Description update. Signed-off-by: Yinghai Lu Signed-off-by: Tejun Heo Reported-by: Ingo Molnar commit da1016df85ed67b6f7dbb765532c54bc35ba08d7 Author: Akinobu Mita Date: Wed Feb 16 23:48:35 2011 +0900 x86: Use bitmap library functions Use bitmap_set()/bitmap_clear() to fill/zero a region of a bitmap instead of doing set_bit()/clear_bit() each bit. This change has been tested with ioperm() and there's no change in behavior. Signed-off-by: Akinobu Mita LKML-Reference: <1297867715-20394-1-git-send-email-akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar commit bee96907383e71d3996ba2bd0682fefaa492d942 Merge: 5beda5f 8737ebd Author: Ingo Molnar Date: Thu Feb 17 14:46:35 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 5beda5f6e4e4523e8dbe596bf163a01b45776808 Merge: ba3dd36 6752ab4 Author: Ingo Molnar Date: Thu Feb 17 14:11:15 2011 +0100 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core commit f0c55bcf4aa41b4b1dbee826513b1acb01bf65e1 Author: Stephane Eranian Date: Wed Feb 16 15:10:01 2011 +0200 perf: make perf stat print user provided full event names This patch changes the way perf stat prints event names at the end of a run. Until now, it was trying to reconstruct the event name from its encoding. The problem is that it would only print generic events without their modifiers (u, k, pp). This patch saves the event name as passed by the user in the evsel struct and uses it to print the final event name. This would also work in case perf is linked with a library (such as libpfm4) which provides full PMU event tables. $ perf stat -e cycles:u,cycles:k date Wed Feb 16 14:58:52 CET 2011 Performance counter stats for 'date': 568600 cycles:u 2779715 cycles:k 0.001908182 seconds time elapsed Cc: Arun Sharma Cc: David S. Miller Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Robert Richter Cc: Stephane Eranian LPU-Reference: <4d5bdc64.98a1df0a.7aa3.06c2@mx.google.com> Signed-off-by: Stephane Eranian [ committer note: Fixed a merge problem with 023695d "Add cgroup support" ] Signed-off-by: Arnaldo Carvalho de Melo commit 4498062e72fd55b2a9a4ac1b44fab8cb44ad5367 Author: Arnaldo Carvalho de Melo Date: Thu Feb 17 10:07:42 2011 -0200 perf python: Add cgroup.c to setup.py to get it building again The 023695d cset added a new file, util/cgroup.c, that is referenced from util/evsel.c, so it needs to be present in util/setup.py so that the python shared object binding works, fixing this: [root@emilia linux]# export PYTHONPATH=~acme/git/build/perf/python/ [root@emilia linux]# ./tools/perf/python/twatch.py Traceback (most recent call last): File "./tools/perf/python/twatch.py", line 16, in import perf ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: close_cgroup Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 8737ebdea02315eaffaebb3b73d55f2f726a4fe0 Author: Masami Hiramatsu Date: Thu Feb 10 18:08:16 2011 +0900 perf probe: Show filename which contains target function Show filename which contains a target function with the function name on "--lines" mode, because perf-probe just shows the first function even if there are many same-name functions. Originally adopted by Franck Bui-Huu's patch which shows file name instead of function name. I've just modified it to show both of function name and file name, because of completeness of output. E.g.) $ perf probe -L t_show 0 static int t_show(struct seq_file *m, void *v) 1 { 2 struct ftrace_iterator *iter = m->private; ... $ perf probe -L t_show@trace/trace.c 0 static int t_show(struct seq_file *m, void *v) 1 { struct tracer *t = v; ... Original-patch-by: Franck Bui-Huu Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <20110210090816.1809.43426.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit e116dfa1c357da49f55e1555767ec991225a8321 Author: Masami Hiramatsu Date: Thu Feb 10 18:08:10 2011 +0900 perf probe: Support function@filename syntax for --line Since "perf probe --add" supports function@filename syntax, --line option should also support it. Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: linux-kernel@vger.kernel.org LKML-Reference: <20110210090810.1809.26913.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 4187e262bc90369ba581ee28ec74ed416618889e Author: Jesse Brandeburg Date: Wed Feb 9 17:11:00 2011 -0800 perf tools: Update Makefile with some help The perf makefile is nicely complete except for a) an uninstall option b) a 'make help' description This patch implements b) it also comments out other non-working makefile targets Signed-off-by: Jesse Brandeburg LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit b99976e2d277c963138e090ae17bf835f8a07680 Author: Arnaldo Carvalho de Melo Date: Wed Feb 9 13:59:14 2011 -0200 perf annotate browser: Use the percent color for the whole line Not just for the percentage number, to see the hot lines more easily. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 289c082044643e55f65c6a16bb3621cf3f35a454 Author: Arnaldo Carvalho de Melo Date: Wed Feb 9 13:56:28 2011 -0200 perf annotate: Check if offset is less than symbol size Just like done on symbol__inc_addr_samples to catch misparsed offsets from objdump. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 5c35d69fb60b1dc49595f5b9a2c7158283e9eaf3 Author: Arnaldo Carvalho de Melo Date: Wed Feb 9 11:38:43 2011 -0200 perf ui: Serialize screen updates The ui operations so far were used by just one thread, but 'perf top --tui' now has two threads updating the screen, so we need to use a mutex to avoid garbling the screen. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e23bba604433a202cd301a976454a90ea6b783ef Author: Tejun Heo Date: Wed Feb 16 17:11:10 2011 +0100 x86-64, NUMA: Unify emulated distance mapping NUMA emulation needs to update node distance information. It did it by remapping apicid to PXM mapping, even when amdtopology is being used. There is no reason to go through such convolution. The generic code has all the information necessary to transform the distance table to the emulated nid space. Implement generic distance table transformation in numa_emulation() and drop private implementations in srat_64 and amdtopology_64. This makes find_node_by_addr() and fake_physnodes() and related functions unnecessary, drop them. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 6b78cb549b4105cbf7c6f7461f27a21f00c44997 Author: Tejun Heo Date: Wed Feb 16 17:11:10 2011 +0100 x86-64, NUMA: Unify emulated apicid -> node mapping transformation NUMA emulation changes node mappings and thus apicid -> node mapping needs to be updated accordingly. srat_64 and amdtopology_64 did this separately; however, all the necessary information is the mapping from emulated nodes to physical nodes which is available in emu_nid_to_phys[]. Implement common __apicid_to_node[] transformation in numa_emulation() and drop duplicate implementations. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 1cca53407336fb6a86092e36dbc5c1e4d45d912b Author: Tejun Heo Date: Wed Feb 16 17:11:10 2011 +0100 x86-64, NUMA: Emulate directly from numa_meminfo NUMA emulation built physnodes[] array which could only represent configurations from the physical meminfo and emulated nodes using the information. There's no reason to take this extra level of indirection. Update emulation functions so that they operate directly on numa_meminfo. This simplifies the code and makes emulation layout behave better with interleaved physical nodes. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 775ee85d7bff8ce7c7eccde90eda400658b650a3 Author: Tejun Heo Date: Wed Feb 16 17:11:10 2011 +0100 x86-64, NUMA: Wrap node ID during emulation Both emulation layout functions - split_nodes[_size]_interleave() - didn't wrap emulated nid while laying out the fake nodes and tried to avoid interating over the specified number of nodes, which is fragile. Now that the emulation code generates numa_meminfo, the node memblks don't need to be consecutive and emulated node IDs can simply wrap. This makes the code more robust and is necessary for updates to better handle the cases where the physical nodes are interleaved. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit c88aea7a70b0f014f98c695069ba91abc9e9b9a4 Author: Tejun Heo Date: Wed Feb 16 17:11:10 2011 +0100 x86-64, NUMA: Make emulation code build numa_meminfo and share the registration path NUMA emulation code built nodes[] array and had its own registration path to set up the emulated nodes. Update it such that it generates emulated numa_meminfo and returns control to initmem_init() and shares the same registration path with non-emulated cases. Because {acpi|amd}_fake_nodes() expect nodes[] parameter, fake_physnodes() now generates nodes[] from numa_meminfo. This will go away with further updates. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 9d073caeb372940af02a768d2b7e845ac732bda0 Author: Tejun Heo Date: Wed Feb 16 17:11:10 2011 +0100 x86-64, NUMA: Build and use direct emulated nid -> phys nid mapping NUMA emulation copied physical NUMA configuration into physnodes[] and used it to reverse-map emulated nodes to physical nodes, which is unnecessarily convoluted. Build emu_nid_to_phys[] array to map emulated nids directly to the matching physical nids and use it in numa_add_cpu(). physnodes[] will be removed with further patches. - v2: Build failure when CONFIG_DEBUG_PER_CPU_MAPS due to missing local variable definition fixed. Reported by Ingo. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit d9c515eacb3bde73f7a5ecb7e35ea6e660ad421d Author: Tejun Heo Date: Wed Feb 16 17:11:10 2011 +0100 x86-64, NUMA: Trivial changes to prepare for emulation updates * Separate out numa_add_memblk_to() from numa_add_memblk() so that different numa_meminfo can be used. * Rename cmdline to emu_cmdline. * Drop @start/last_pfn from numa_emulation() and use max_pfn directly. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit ac7136b611ee8f8bd6231ce2e1dbdd31ae3d39bc Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: Implement generic node distance handling Node distance either used direct node comparison, ACPI PXM comparison or ACPI SLIT table lookup. This patch implements generic node distance handling. NUMA init methods can call numa_set_distance() to set distance between nodes and the common __node_distance() implementation will report the set distance. Due to the way NUMA emulation is implemented, the generic node distance handling is used only when emulation is not used. Later patches will update NUMA emulation to use the generic distance mechanism. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 4697bdcc945c094d2c8a4876a24faeaf31a283e0 Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: Kill mem_nodes_parsed With all memory configuration information now carried in numa_meminfo, there's no need to keep mem_nodes_parsed separate. Drop it and use numa_nodes_parsed for CPU / memory-less nodes. A new helper numa_nodemask_from_meminfo() is added to calculate memnode mask on the fly which is currently used to set node_possible_map. This simplifies NUMA init methods a bit and removes a source of possible inconsistencies. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 92d4a4371eeb89e1e12b9ebbed0956f499b6c2c0 Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed It's no longer necessary to keep both cpu_nodes_parsed and mem_nodes_parsed. In preparation for merge, rename cpu_nodes_parsed to numa_nodes_parsed. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 91556237ec872e1029e3036174bae3b1a8df65eb Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: Kill numa_nodes[] numa_nodes[] doesn't carry any information which isn't present in numa_meminfo. Each entry is simply min/max range of all the memblks for the node. This is not only redundant but also inaccurate when memblks for different nodes interleave - for example, find_node_by_addr() can return the wrong nodeid. Kill numa_nodes[] and always use numa_meminfo instead. * nodes_cover_memory() is renamed to numa_meminfo_cover_memory() and now operations on numa_meminfo and returns bool. * setup_node_bootmem() needs min/max range. Compute the range on the fly. setup_node_bootmem() invocation is restructured to use outer loop instead of hardcoding the double invocations. * find_node_by_addr() now operates on numa_meminfo. * setup_physnodes() builds physnodes[] from memblks. This will go away when emulation code is updated to use struct numa_meminfo. This patch also makes the following misc changes. * Clearing of nodes_add[] clearing is converted to memset(). * numa_add_memblk() in amd_numa_init() is moved down a bit for consistency. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit a844ef46fa3055165c28feede6114a711b8375ad Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: Add common find_node_by_addr() srat_64.c and amdtopology_64.c had their own versions of find_node_by_addr() which were basically the same. Add common one in numa_64.c and remove the duplicates. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 56e827fbde9a3cb886a2fe138db0d99e98efbfb1 Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: consolidate and improve memblk sanity checks memblk sanity check was scattered around and incomplete. Consolidate and improve. * Confliction detection and cutoff_node() logic are moved to numa_cleanup_meminfo(). * numa_cleanup_meminfo() clears the unused memblks before returning. * Check and warn about invalid input parameters in numa_add_memblk(). * Check the maximum number of memblk isn't exceeded in numa_add_memblk(). * numa_cleanup_meminfo() is now called before numa_emulation() so that the emulation code also uses the cleaned up version. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 2e756be44714d0ec2f9827e4f4797c60876167a1 Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: make numa_cleanup_meminfo() prettier * Factor out numa_remove_memblk_from(). * Hole detection doesn't need separate start/end. Calculate start/end once. * Relocate comment. * Define iterators at the top and remove unnecessary prefix increments. This prepares for further improvements to the function. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit f9c60251c3d08777db6758cafd959a55a838abd6 Author: Tejun Heo Date: Wed Feb 16 17:11:09 2011 +0100 x86-64, NUMA: Separate out numa_cleanup_meminfo() Separate out numa_cleanup_meminfo() from numa_register_memblks(). node_possible_map initialization is moved to the top of the split numa_register_memblks(). This patch doesn't cause behavior change. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 97e7b78d0674882a0aae043fda428c583dbb225d Author: Tejun Heo Date: Wed Feb 16 17:11:08 2011 +0100 x86-64, NUMA: Introduce struct numa_meminfo Arrays for memblks and nodeids and their length lived in separate variables making things unnecessarily cumbersome. Introduce struct numa_meminfo which contains all memory configuration info. This patch doesn't cause any behavior change. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 8968dab8ad90ea16ef92f2406868354ea3ab6bb9 Author: Tejun Heo Date: Wed Feb 16 17:11:08 2011 +0100 x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift() numa_emulation() called compute_hash_shift() with %NULL @nodeids which meant identity mapping between index and nodeid. Make numa_emulation() build identity array and drop %NULL @nodeids handling from populate_memnodemap() and thus from compute_hash_shift(). This is to prepare for transition to using memblks instead. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 5d371b08fea80c4fb7450d31e5a4e35b438ef850 Author: Tejun Heo Date: Wed Feb 16 17:11:08 2011 +0100 x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes() They are empty now. Kill them. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit fd0435d8fb1d4e5771f9ae3af71f2a77c1f4bd09 Author: Tejun Heo Date: Wed Feb 16 17:11:08 2011 +0100 x86-64, NUMA: Unify the rest of memblk registration Move the remaining memblk registration logic from acpi_scan_nodes() to numa_register_memblks() and initmem_init(). This applies nodes_cover_memory() sanity check, memory node sorting and node_online() checking, which were only applied to acpi, to all init methods. As all memblk registration is moved to common code, active range clearing is moved to initmem_init() too and removed from bad_srat(). Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 43a662f04f731c331706456c9852ef7146ba5d85 Author: Tejun Heo Date: Wed Feb 16 17:11:08 2011 +0100 x86-64, NUMA: Unify use of memblk in all init methods Make both amd and dummy use numa_add_memblk() to describe the detected memory blocks. This allows initmem_init() to call numa_register_memblk() regardless of init method in use. Drop custom memory registration codes from amd and dummy. After this change, memblk merge/cleanup in numa_register_memblks() is applied to all init methods. As this makes compute_hash_shift() and numa_register_memblks() used only inside numa_64.c, make them static. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit ef396ec96c1a8ffd2b0bc67f1f79c7274de02b95 Author: Tejun Heo Date: Wed Feb 16 17:11:07 2011 +0100 x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk() Factor out memblk handling from srat_64.c into two functions in numa_64.c. This patch doesn't introduce any behavior change. The next patch will make all init methods use these functions. - v2: Fixed build failure on 32bit due to misplaced NR_NODE_MEMBLKS. Reported by Ingo. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 48228f7b470a74b6469a250d2977a13128d8fe96 Author: Steven Rostedt Date: Tue Feb 8 12:39:54 2011 -0500 lockdep/timers: Explain in detail the locking problems del_timer_sync() may cause Twice I had to explain the output about why lockdep gives an error with locks in IRQ context and with del_timer_sync(). Might as well write it up and place it in the comments above the code in del_timer_sync(). Perhaps the next time this lockdep dump triggers people will understand the issues. It is a ticky issue and very subtle, explaining it in detail in the code may help others understand the issue when they stumble upon the bug again. Signed-off-by: Steven Rostedt Signed-off-by: Peter Zijlstra LKML-Reference: <1297186794.23343.19.camel@gandalf.stny.rr.com> Signed-off-by: Ingo Molnar commit a3ec4a603faf4244e275bf11b467aad092dfbd8a Merge: 51563cd 85e2efb Author: Ingo Molnar Date: Wed Feb 16 13:33:35 2011 +0100 Merge commit 'v2.6.38-rc5' into core/locking Merge reason: pick up upstream fixes. Signed-off-by: Ingo Molnar commit 46e49b3836c7cd2ae5b5fe76fa981d0d292a52fe Author: Venkatesh Pallipadi Date: Mon Feb 14 14:38:50 2011 -0800 sched: Wholesale removal of sd_idle logic sd_idle logic was introduced way back in 2005 (commit 5969fe06), as an HT optimization. As per the discussion in the thread here: lkml - sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1 https://patchwork.kernel.org/patch/532501/ The capacity based logic in the load balancer right now handles this in a much cleaner way, handling more than 2 SMT siblings etc, and sd_idle does not seem to bring any additional benefits. sd_idle logic also has some bugs that has performance impact. Here is the patch that removes the sd_idle logic altogether. Also, there was a dependency of sched_mc_power_savings == 2, with sd_idle logic. Signed-off-by: Venkatesh Pallipadi Acked-by: Vaidyanathan Srinivasan Signed-off-by: Peter Zijlstra LKML-Reference: <1297723130-693-1-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit 48fa4b8ecf683f5e411303553da9e186e8b8406e Merge: d95f412 85e2efb Author: Ingo Molnar Date: Wed Feb 16 13:31:51 2011 +0100 Merge commit 'v2.6.38-rc5' into sched/core Merge reason: Pick up upstream fixes. Signed-off-by: Ingo Molnar commit ba3dd36c6775264ee6e7354ba1aabcd6e86d7298 Author: Peter Zijlstra Date: Tue Feb 15 12:41:46 2011 +0100 perf: Optimize hrtimer events There is no need to re-initialize the hrtimer every time we start it, so don't do that (shaves a few cycles). Also, since we know hrtimers run at a fixed rate (nanoseconds) we can pre-compute the desired frequency at which they tick. This avoids us having to go through the whole adaptive frequency feedback logic (shaves another few cycles). Signed-off-by: Peter Zijlstra LKML-Reference: <1297448589.5226.47.camel@laptop> Signed-off-by: Ingo Molnar commit 163ec4354a5135c6c38c3f4a9b46a31900ebdf48 Author: Peter Zijlstra Date: Wed Feb 16 11:22:34 2011 +0100 perf: Optimize throttling code By pre-computing the maximum number of samples per tick we can avoid a multiplication and a conditional since MAX_INTERRUPTS > max_samples_per_tick. Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 4979d2729af22f6ce8faa325fc60a85a2c2daa02 Author: Robert Richter Date: Wed Feb 2 17:36:12 2011 +0100 perf, x86: Add support for AMD family 15h core counters This patch adds support for AMD family 15h core counters. There are major changes compared to family 10h. First, there is a new perfctr msr range for up to 6 counters. Northbridge counters are separate now. This patch only adds support for core counters. Second, certain events may only be scheduled on certain counters. For this we need to extend the event scheduling and constraints. We use cpu feature flags to calculate family 15h msr address offsets. This way we later can implement a faster ALTERNATIVE() version for this. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <20110215135210.GB5874@erda.amd.com> Signed-off-by: Ingo Molnar commit 73d6e52206a20354738418625cedc244cbfd5023 Author: Robert Richter Date: Wed Feb 2 17:40:59 2011 +0100 perf, x86: Store perfctr msr addresses in config_base/event_base Instead of storing the base addresses we can store the counter's msr addresses directly in config_base/event_base of struct hw_perf_event. This avoids recalculating the address with each msr access. The addresses are configured one time. We also need this change to later modify the address calculation. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1296664860-10886-5-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 69d8e1e8ac0a7d829f1c0fd5bd07eb3022d9a1a0 Author: Robert Richter Date: Wed Feb 2 17:40:58 2011 +0100 perf, x86: Add new AMD family 15h msrs to perfctr reservation code This patch allows the reservation of perfctrs with new msr addresses introduced for AMD cpu family 15h (0xc0010200/0xc0010201, etc). Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1296664860-10886-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 41bf498949a263fa0b2d32524b89d696ac330e94 Author: Robert Richter Date: Wed Feb 2 17:40:57 2011 +0100 perf, x86: Calculate perfctr msr addresses in helper functions This patch adds helper functions to calculate perfctr msr addresses. We need this to later add support for AMD family 15h cpus. For this we have to change the algorithms to generate the perfctr's msr addresses. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1296664860-10886-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit d45dd923fcc620c948bd1eda16cc61426ac31646 Author: Robert Richter Date: Wed Feb 2 17:40:56 2011 +0100 perf, x86: Use helper function in x86_pmu_enable_all() Use helper function in x86_pmu_enable_all() to minimize access to x86_pmu.eventsel in the fast path. The counter's msr address is now calculated using struct hw_perf_event. Later we add code that calculates the msr addresses with a table lookup which shouldn't be done in the fast path. Signed-off-by: Robert Richter Signed-off-by: Peter Zijlstra LKML-Reference: <1296664860-10886-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar commit 023695d96ee06f36cf5014e286edcd623e9fb847 Author: Stephane Eranian Date: Mon Feb 14 11:20:01 2011 +0200 perf tool: Add cgroup support This patch adds the ability to filter monitoring based on container groups (cgroups) for both perf stat and perf record. It is possible to monitor multiple cgroup in parallel. There is one cgroup per event. The cgroups to monitor are passed via a new -G option followed by a comma separated list of cgroup names. The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool finds the corresponding directory in the cgroup filesystem and opens it. It then passes that file descriptor to the kernel. Example: $ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1 Performance counter stats for 'sleep 1': 2,368,667,414 cycles test1 2,369,661,459 cycles cycles test2 1.001856890 seconds time elapsed Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com> Signed-off-by: Ingo Molnar commit e5d1367f17ba6a6fed5fd8b74e4d5720923e0c25 Author: Stephane Eranian Date: Mon Feb 14 11:20:01 2011 +0200 perf: Add cgroup support This kernel patch adds the ability to filter monitoring based on container groups (cgroups). This is for use in per-cpu mode only. The cgroup to monitor is passed as a file descriptor in the pid argument to the syscall. The file descriptor must be opened to the cgroup name in the cgroup filesystem. For instance, if the cgroup name is foo and cgroupfs is mounted in /cgroup, then the file descriptor is opened to /cgroup/foo. Cgroup mode is activated by passing PERF_FLAG_PID_CGROUP in the flags argument to the syscall. For instance to measure in cgroup foo on CPU1 assuming cgroupfs is mounted under /cgroup: struct perf_event_attr attr; int cgroup_fd, fd; cgroup_fd = open("/cgroup/foo", O_RDONLY); fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP); close(cgroup_fd); Signed-off-by: Stephane Eranian [ added perf_cgroup_{exit,attach} ] Signed-off-by: Peter Zijlstra LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com> Signed-off-by: Ingo Molnar commit d41d5a01631af821d3a3447e6613a316f5ee6c25 Author: Peter Zijlstra Date: Mon Feb 7 17:02:20 2011 +0100 cgroup: Fix cgroup_subsys::exit callback Make the ::exit method act like ::attach, it is after all very nearly the same thing. The bug had no effect on correctness - fixing it is an optimization for the scheduler. Also, later perf-cgroups patches rely on it. Signed-off-by: Peter Zijlstra Acked-by: Paul Menage LKML-Reference: <1297160655.13327.92.camel@laptop> Signed-off-by: Ingo Molnar commit b00560f2d4de69bb12f66f9605985b516df98d77 Merge: bf1af3a 4fe757d Author: Ingo Molnar Date: Wed Feb 16 13:27:18 2011 +0100 Merge branch 'perf/urgent' into perf/core Merge reason: we need to queue up dependent patch Signed-off-by: Ingo Molnar commit 19095548704ecd0f32fd5deba01d56430ad7a344 Author: Tejun Heo Date: Wed Feb 16 12:13:07 2011 +0100 x86-64, NUMA: Kill {acpi|amd}_get_nodes() With common numa_nodes[], common code in numa_64.c can access it directly. Copy directly and kill {acpi|amd}_get_nodes(). Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 206e42087a037fa3adca8908fd318a0cb64d4dee Author: Tejun Heo Date: Wed Feb 16 12:13:07 2011 +0100 x86-64, NUMA: Use common numa_nodes[] ACPI and amd are using separate nodes[] array. Add numa_nodes[] and use them in all NUMA init methods. cutoff_node() cleanup is moved from srat_64.c to numa_64.c and applied in initmem_init() regardless of init methods. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 45fe6c78c4ccc384044d1b4877eebe7acf359e76 Author: Tejun Heo Date: Wed Feb 16 12:13:07 2011 +0100 x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() This brings amd initialization behavior closer to that of acpi. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 99df738cd28cc39054cd1a77685d4a94ed2193a4 Author: Tejun Heo Date: Wed Feb 16 12:13:07 2011 +0100 x86-64, NUMA: Remove local variable found from amd_numa_init() Use weight count on mem_nodes_parsed instead. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit ec8cf29b1d39aeb6ef98bc589f0c9a33a8f94c49 Author: Tejun Heo Date: Wed Feb 16 12:13:07 2011 +0100 x86-64, NUMA: Use common {cpu|mem}_nodes_parsed ACPI and amd are using separate nodes_parsed masks. Add {cpu|mem}_nodes_parsed and use them in all NUMA init methods. Initialization of the masks and building node_possible_map are now handled commonly by initmem_init(). dummy_numa_init() is updated to set node 0 on both masks. While at it, move the info messages from scan to init. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit ffe77a4605fb2588f8666850ad3e3b196241658f Author: Tejun Heo Date: Wed Feb 16 12:13:06 2011 +0100 x86-64, NUMA: Restructure initmem_init() Reorganize initmem_init() such that, * Different NUMA init methods are iterated in a consistent way. * Each iteration re-initializes all the parameters and different method can be tried after a failure. * Dummy init is handled the same as other methods. Apart from how retry after failure, this patch doesn't change the behavior. The call sequences are kept equivalent across the conversion. After the change, bad_srat() doesn't need to clear apic to node mapping or worry about numa_off. Simplified accordingly. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit d8fc3afc49bb226c20e37f48a4ddd493cd092837 Author: Tejun Heo Date: Wed Feb 16 12:13:06 2011 +0100 x86, NUMA: Move *_numa_init() invocations into initmem_init() There's no reason for these to live in setup_arch(). Move them inside initmem_init(). - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Ankita. Signed-off-by: Tejun Heo Cc: Ankita Garg Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit a9aec56afac238e4ed3980bd10b22121b83866dd Author: Tejun Heo Date: Wed Feb 16 12:13:06 2011 +0100 x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value Because of the way ACPI tables are parsed, the generic acpi_numa_init() couldn't return failure when error was detected by arch hooks. Instead, the failure state was recorded and later arch dependent init hook - acpi_scan_nodes() - would fail. Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be indicated as return value immediately. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 940fed2e79a15cf0d006c860d7811adbe5c19882 Author: Tejun Heo Date: Wed Feb 16 12:13:06 2011 +0100 x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values The functions used during NUMA initialization - *_numa_init() and *_scan_nodes() - have different arguments and return values. Unify them such that they all take no argument and return 0 on success and -errno on failure. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 86ef4dbf1f736bb1a4d567e043e3dd81b8b7860c Author: Tejun Heo Date: Wed Feb 16 12:13:06 2011 +0100 x86, NUMA: Drop @start/last_pfn from initmem_init() initmem_init() extensively accesses and modifies global data structures and the parameters aren't even followed depending on which path is being used. Drop @start/last_pfn and let it deal with @max_pfn directly. This is in preparation for further NUMA init cleanups. - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Yinghai. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 13081df5dd6eae1951a3c398fa17d3ed2037a78f Author: Tejun Heo Date: Wed Feb 16 12:13:06 2011 +0100 x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() Hotplug node handling in acpi_numa_memory_affinity_init() was unnecessarily complicated with storing the original nodes[] entry and restoring it afterwards. Simplify it by not modifying the nodes[] entry for hotplug nodes from the beginning. Signed-off-by: Tejun Heo Cc: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 7d36b7bc9022f35f95cd85cdf441846298e8f9fb Author: Tejun Heo Date: Wed Feb 16 12:13:06 2011 +0100 x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones Dummy node initialization in initmem_init() didn't initialize apicid to node mapping and set cpu to node mapping directly by caling numa_set_node(), which is different from non-dummy init paths. Update it such that they behave similarly. Initialize apicid to node mapping and call numa_init_array(). The actual cpu to node mapping is handled by init_cpu_to_node() later. Signed-off-by: Tejun Heo Acked-by: Yinghai Lu Cc: Brian Gerst Cc: Cyrill Gorcunov Cc: Shaohui Zheng Cc: David Rientjes Cc: Ingo Molnar Cc: H. Peter Anvin commit 275a88d3cf0e2f08a98dc5ce9494af0cb6ed2092 Merge: 52b8b8d 9e81509 Author: Ingo Molnar Date: Wed Feb 16 09:45:33 2011 +0100 Merge branch 'x86/amd-nb' into x86/mm Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts with ongoing NUMA work. Signed-off-by: Ingo Molnar commit 52b8b8d7251f8f7b8ed4a6c623618106d83e18b5 Merge: 02ac81a 14392fd Author: Ingo Molnar Date: Wed Feb 16 09:44:04 2011 +0100 Merge branch 'x86/numa' into x86/mm Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts. Signed-off-by: Ingo Molnar commit 02ac81a812fe0575a8475a93bdc22fb291ebad91 Merge: 9a6d44b d2137d5 Author: Ingo Molnar Date: Wed Feb 16 09:42:50 2011 +0100 Merge branch 'x86/bootmem' into x86/mm Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree and prevent conflicts. Signed-off-by: Ingo Molnar commit 9e81509efc4fefcdd75cc6a4121672fa71ae8745 Author: Borislav Petkov Date: Mon Feb 14 18:14:51 2011 +0100 x86, amd: Initialize variable properly Commit d518573de63f ("x86, amd: Normalize compute unit IDs on multi-node processors") introduced compute unit normalization but causes a compiler warning: arch/x86/kernel/cpu/amd.c: In function 'amd_detect_cmp': arch/x86/kernel/cpu/amd.c:268: warning: 'cores_per_cu' may be used uninitialized in this function arch/x86/kernel/cpu/amd.c:268: note: 'cores_per_cu' was declared here The compiler is right - initialize it with a proper value. Also, fix up a comment while at it. Reported-by: Andrew Morton Signed-off-by: Borislav Petkov Cc: Andreas Herrmann LKML-Reference: <20110214171451.GB10076@kryptos.osrc.amd.com> Signed-off-by: Ingo Molnar commit 53c39ce56d203d80ba8217a16bb024b25185fb7e Author: Thomas Gleixner Date: Sun Feb 6 22:45:38 2011 +0000 um: Select GENERIC_HARDIRQS_NO_DEPRECATED irq chips converted and proper accessor functions used. Signed-off-by: Thomas Gleixner Cc: Jeff Dike Cc: Andrew Morton LKML-Reference: <20110206224515.430825903@linutronix.de> commit d5b4eea1c575b78448fd5dc54d250ff302ca22f9 Author: Thomas Gleixner Date: Sun Feb 6 22:45:36 2011 +0000 um: Use proper accessors in show_interrupts() Signed-off-by: Thomas Gleixner Cc: Jeff Dike Cc: Andrew Morton LKML-Reference: <20110206224515.322707425@linutronix.de> commit 1d119aa06fb2b2608151a162f15c480d46694b65 Author: Thomas Gleixner Date: Sun Feb 6 22:45:34 2011 +0000 um: Convert irq_chips to new functions Signed-off-by: Thomas Gleixner Cc: Jeff Dike Cc: Andrew Morton LKML-Reference: <20110206224515.224027758@linutronix.de> commit 6ea96e7e4946f790330557e4b7c4c8a174c1c6d2 Author: Thomas Gleixner Date: Sun Feb 6 22:45:31 2011 +0000 um: Remove stale irq_chip.end irq_chip.end got obsolete with the remnoval of __do_IRQ(). Signed-off-by: Thomas Gleixner Cc: Jeff Dike Cc: Peter Zijlstra Cc: Andrew Morton LKML-Reference: <20110206224515.135703209@linutronix.de> commit 168202c7bf89d7a2abaf8deaf4bbed18a1f7b3a3 Author: Feng Tang Date: Tue Feb 15 00:13:32 2011 +0800 mrst/vrtc: Avoid using cmos rtc ops If we don't assign Moorestown specific wallclock init and ops function the rtc/persisent clock code will use cmos rtc for access, this will crash Moorestown in that the ioports are not present. Also in vrtc driver, should avoid using cmos access to check UIP status. [feng.tang@intel.com: use set_fixmap_offset_nocache() to simplify code] Signed-off-by: Jacob Pan Signed-off-by: Feng Tang Signed-off-by: Alan Cox Signed-off-by: Thomas Gleixner commit 6b617e224dfac0b64ed70dacdac50be6eb78a6a1 Author: Feng Tang Date: Tue Feb 15 00:13:31 2011 +0800 x86/platform: Add a wallclock_init func to x86_init.timers ops Some wall clock devices use MMIO based HW register, this new function will give them a chance to do some initialization work before their get/set_time service get called, which is usually in early kernel boot phase. Signed-off-by: Feng Tang Signed-off-by: Jacob Pan Signed-off-by: Alan Cox Signed-off-by: Thomas Gleixner commit bf1af3a809506645b9130755b713b008da14737f Merge: 0de4b34 868baf0 Author: Ingo Molnar Date: Mon Feb 14 15:15:16 2011 +0100 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core commit 14392fd329eca9b59d51c0aa5d0acfb4965424d1 Author: David Rientjes Date: Mon Feb 7 14:08:53 2011 -0800 x86, numa: Add error handling for bad cpu-to-node mappings CONFIG_DEBUG_PER_CPU_MAPS may return NUMA_NO_NODE when an early_cpu_to_node() mapping hasn't been initialized. In such a case, it emits a warning and continues without an issue but callers may try to use the return value to index into an array. We can catch those errors and fail silently since a warning has already been emitted. No current user of numa_add_cpu() requires this error checking to avoid a crash, but it's better to be proactive in case a future user happens to have a bug and a user tries to diagnose it with CONFIG_DEBUG_PER_CPU_MAPS. Reported-by: Jesper Juhl Signed-off-by: David Rientjes Cc: Tejun Heo LKML-Reference: Signed-off-by: Ingo Molnar commit b366801c95bdbeda811ac9668a3943051a18c188 Merge: eff9073 100b33c Author: Ingo Molnar Date: Mon Feb 14 13:28:29 2011 +0100 Merge commit 'v2.6.38-rc4' into x86/numa Merge reason: Merge latest fixes before applying new patch. Signed-off-by: Ingo Molnar commit e5fea868e6c04343e501176a373d568c1c0094aa Author: Yinghai Lu Date: Tue Feb 8 23:22:17 2011 -0800 x86: Fix and clean up generic_processor_info() One of the error printouts in generic_processor_info() prints out the APIC version instead of the cpu index the warning text describes. Move version validation down, after we get the right cpu index. -v2: add comments about reason why we can have cpu=0 there. Signed-off-by: Yinghai Lu LKML-Reference: <4D5240A9.4080703@kernel.org> [ Cleaned up and made the BIOS bug printouts more consistent ] Signed-off-by: Ingo Molnar commit 91e04ec05838a5b2c790decf2a91af98cb1666e8 Merge: 792363d 100b33c Author: Ingo Molnar Date: Mon Feb 14 13:18:51 2011 +0100 Merge commit 'v2.6.38-rc4' into x86/cpu Merge reason: pick up the latest fixes. Signed-off-by: Ingo Molnar commit 9a6d44b9adb777ca9549e88cd55bd8f2673c52a2 Author: Kamal Mostafa Date: Thu Feb 3 17:38:05 2011 -0800 x86: Emit "mem=nopentium ignored" warning when not supported Emit warning when "mem=nopentium" is specified on any arch other than x86_32 (the only that arch supports it). Signed-off-by: Kamal Mostafa BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu Cc: Len Brown Cc: Rafael J. Wysocki LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar Cc: commit 77eed821accf5dd962b1f13bed0680e217e49112 Author: Kamal Mostafa Date: Thu Feb 3 17:38:04 2011 -0800 x86: Fix panic when handling "mem={invalid}" param Avoid removing all of memory and panicing when "mem={invalid}" is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on platforms other than x86_32). Signed-off-by: Kamal Mostafa BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu Cc: Len Brown Cc: Rafael J. Wysocki Cc: # .3x: as far back as it applies LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar commit 7064d865af804b9b841e7b9a3e9b653e40c3e5ca Author: Shaohua Li Date: Mon Jan 17 10:52:10 2011 +0800 x86: Avoid tlbstate lock if not enough cpus This one isn't related to previous patch. If online cpus are below NUM_INVALIDATE_TLB_VECTORS, we don't need the lock. The comments in the code declares we don't need the check, but a hot lock still needs an atomic operation and expensive, so add the check here. Uses nr_cpu_ids here as suggested by Eric Dumazet. Signed-off-by: Shaohua Li Acked-by: Eric Dumazet Cc: Andi Kleen LKML-Reference: <1295232730.1949.710.camel@sli10-conroe> Signed-off-by: Ingo Molnar commit 70e4a369733a21e3d16b059a6ccdad22a344bf57 Author: Shaohua Li Date: Mon Jan 17 10:52:07 2011 +0800 x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32 Make the maxium TLB invalidate vectors depend on NR_CPUS linearly, with a maximum of 32 vectors. We currently only have 8 vectors for TLB invalidation and that is clearly inadequate. If we have a lot of CPUs, the CPUs need share the 8 vectors and tlbstate_lock is used to protect them. flush_tlb_page() is heavily used in page reclaim, which will cause a lot of lock contention for tlbstate_lock. Andi Kleen suggested increasing the vectors number to 32, which should be good for current typical systems to reduce the tlbstate_lock contention. My test system has 4 sockets and 64G memory, and 64 CPUs. My workload creates 64 processes. Each process mmap reads a big empty sparse file. The total size of the files are 2*total_mem, so this will cause a lot of page reclaim. Below is the result I get from perf call-graph profiling: without the patch: ------------------ 24.25% usemem [kernel] [k] _raw_spin_lock | --- _raw_spin_lock | |--42.15%-- native_flush_tlb_others with the patch: ------------------ 14.96% usemem [kernel] [k] _raw_spin_lock | --- _raw_spin_lock |--13.89%-- native_flush_tlb_others So this heavily reduces the tlbstate_lock contention. Suggested-by: Andi Kleen Signed-off-by: Shaohua Li Cc: Eric Dumazet Cc: Arnaldo Carvalho de Melo Cc: Frederic Weisbecker Cc: Peter Zijlstra LKML-Reference: <1295232727.1949.709.camel@sli10-conroe> Signed-off-by: Ingo Molnar commit 3a09fb4570a1cce11472b8e5da3f6ee409f529d5 Author: Shaohua Li Date: Mon Jan 17 10:52:05 2011 +0800 x86: Allocate 32 tlb_invalidate_interrupt handler stubs Add up to 32 invalidate_interrupt handlers. How many handlers are added depends on NUM_INVALIDATE_TLB_VECTORS. So if NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code size. Signed-off-by: Shaohua Li Cc: Andi Kleen Cc: Eric Dumazet LKML-Reference: <1295232725.1949.708.camel@sli10-conroe> Signed-off-by: Ingo Molnar commit 60f6e65d7887c257392313755f95540ef5e7ea89 Author: Shaohua Li Date: Mon Jan 17 10:52:02 2011 +0800 x86: Cleanup vector usage Cleanup the vector usage and make them continuous if possible. Signed-off-by: Shaohua Li Cc: Andi Kleen Cc: Eric Dumazet LKML-Reference: <1295232722.1949.707.camel@sli10-conroe> Signed-off-by: Ingo Molnar commit 0de4b34d466bae571b50f41c7296b85248205e35 Author: Masami Hiramatsu Date: Mon Feb 14 14:48:07 2011 +0900 tracing/kprobe: Fix NULL pointer deref check Add NULL check for avoiding NULL pointer deref. This bug has been introduced by: 1ff511e35ed8: tracing/kprobes: Add bitfield type which causes a null pointer dereference bug when kprobe-tracer parses an argument without type. Reported-by: Arnaldo Carvalho de Melo Signed-off-by: Masami Hiramatsu Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra LKML-Reference: <20110214054807.8919.69740.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar Reported-by: Arnaldo Carvalho de Melo commit d2137d5af4259f50c19addb8246a186c9ffac325 Merge: f005fe1 795abaf Author: Ingo Molnar Date: Mon Feb 14 11:55:18 2011 +0100 Merge branch 'linus' into x86/bootmem Conflicts: arch/x86/mm/numa_64.c Merge reason: fix the conflict, update to latest -rc and pick up this dependent fix from Yinghai: e6d2e2b2b1e1: memblock: don't adjust size in memblock_find_base() Signed-off-by: Ingo Molnar commit 40262a71536d0b4a7486b279fa39463cfffabcc2 Merge: 3e86858 0849327 Author: Ingo Molnar Date: Sat Feb 12 02:24:52 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 3e86858133c632060b290985837a11dbe2e0cc0e Merge: d5e3d74 100b33c Author: Ingo Molnar Date: Sat Feb 12 02:24:23 2011 +0100 Merge commit 'v2.6.38-rc4' into perf/core Merge reason: pick up the latest fixes. Signed-off-by: Ingo Molnar commit 868baf07b1a259f5f3803c1dc2777b6c358f83cf Author: Steven Rostedt Date: Thu Feb 10 21:26:13 2011 -0500 ftrace: Fix memory leak with function graph and cpu hotplug When the fuction graph tracer starts, it needs to make a special stack for each task to save the real return values of the tasks. All running tasks have this stack created, as well as any new tasks. On CPU hot plug, the new idle task will allocate a stack as well when init_idle() is called. The problem is that cpu hotplug does not create a new idle_task. Instead it uses the idle task that existed when the cpu went down. ftrace_graph_init_task() will add a new ret_stack to the task that is given to it. Because a clone will make the task have a stack of its parent it does not check if the task's ret_stack is already NULL or not. When the CPU hotplug code starts a CPU up again, it will allocate a new stack even though one already existed for it. The solution is to treat the idle_task specially. In fact, the function_graph code already does, just not at init_idle(). Instead of using the ftrace_graph_init_task() for the idle task, which that function expects the task to be a clone, have a separate ftrace_graph_init_idle_task(). Also, we will create a per_cpu ret_stack that is used by the idle task. When we call ftrace_graph_init_idle_task() it will check if the idle task's ret_stack is NULL, if it is, then it will assign it the per_cpu ret_stack. Reported-by: Benjamin Herrenschmidt Suggested-by: Peter Zijlstra Cc: Stable Tree Signed-off-by: Steven Rostedt commit 44b46c3ef805793ab3a7730dc71c72d0f258ea8e Author: Ian Campbell Date: Fri Feb 11 16:37:41 2011 +0000 xen: annotate functions which only call into __init at start of day Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be called at resume time as well as at start of day but only reference __init functions (extend_brk) at start of day. Hence annotate with __ref. WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference from the function xen_hvm_init_shared_info() to the function .init.text:extend_brk() The function xen_hvm_init_shared_info() references the function __init extend_brk(). This is often because xen_hvm_init_shared_info lacks a __init annotation or the annotation of extend_brk is wrong. xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and initialises shared_info_page with the result. This happens at start of day only. WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference from the function xen_build_mfn_list_list() to the function .init.text:extend_brk() The function xen_build_mfn_list_list() references the function __init extend_brk(). This is often because xen_build_mfn_list_list lacks a __init annotation or the annotation of extend_brk is wrong. (this warning occurs multiple times) xen_build_mfn_list_list only calls extend_brk() at boot time, while building the initial mfn list list Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit 6b08cfebd3bd346d8a2fd68a2265fc7736849802 Author: Ian Campbell Date: Fri Feb 11 15:23:58 2011 +0000 xen p2m: annotate variable which appears unused CC arch/x86/xen/p2m.o arch/x86/xen/p2m.c: In function 'm2p_remove_override': arch/x86/xen/p2m.c:460: warning: 'address' may be used uninitialized in this function arch/x86/xen/p2m.c: In function 'm2p_add_override': arch/x86/xen/p2m.c:426: warning: 'address' may be used uninitialized in this function In actual fact address is inialised in one "if (!PageHighMem(page))" statement and used in a second and so is always initialised before use. Signed-off-by: Ian Campbell Signed-off-by: Konrad Rzeszutek Wilk commit b052181a985592f81767f631f9f42accb4b436cd Author: Ian Campbell Date: Fri Feb 11 15:23:56 2011 +0000 xen: events: mark cpu_evtchn_mask_p as __refdata This variable starts out pointing at init_evtchn_mask which is marked __initdata but is set to point to a non-init data region in xen_init_IRQ which is itself an __init function so this is safe. Signed-off-by: Ian Campbell Tested-and-acked-by: Andrew Jones Signed-off-by: Konrad Rzeszutek Wilk commit 0849327d13a0bd7f6512b7c21f4b3e79efb2076d Author: Arnaldo Carvalho de Melo Date: Fri Feb 11 12:09:54 2011 -0200 perf report: Fix initializion of annotate symbol priv area We only allocate it when in TUI mode. In --stdio mode unconditionally initializing this area leads to memory corruption. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7c940c18c57e45910f7dd9a4011c4658cacba4b6 Merge: d5e3d74 401b8e1 Author: Arnaldo Carvalho de Melo Date: Fri Feb 11 11:45:54 2011 -0200 Merge remote branch 'acme/perf/urgent' into perf/core Fixups due to rename of event_t routines from event__ to perf_event__ done in perf/core. Conflicts: tools/perf/builtin-record.c tools/perf/builtin-top.c tools/perf/util/event.c tools/perf/util/event.h Signed-off-by: Arnaldo Carvalho de Melo commit 51327ada7142ab520ed610a42572d1f4cbfbb2dc Merge: 44951a6 986c011 Author: Thomas Gleixner Date: Fri Feb 11 00:26:54 2011 +0100 Merge branch 'irq/for-mips' into irq/core Reason: irq/for-mips is provided for mips to make core independent progress. Merge it into irq/core to avoid conflicts Signed-off-by: Thomas Gleixner commit 986c011ddbb3ed44b35e1bfd67f6aa60b293b495 Author: David Daney Date: Wed Feb 9 16:04:25 2011 -0800 genirq: Call bus_lock/unlock functions in setup_irq() irq_chips that supply .irq_bus_lock/.irq_bus_sync_unlock functions, expect that the other chip methods will be called inside of calls to the pair. If this expectation is not met, things tend to not work. Make setup_irq() call chip_bus_lock()/chip_bus_sync_unlock() too. For the vast majority of irq_chips, this will be a NOP as most don't have these bus lock functions. [ tglx: No we don't want to call that in __setup_irq(). Way too many error exit pathes. ] Signed-off-by: David Daney LKML-Reference: <1297296265-18680-1-git-send-email-ddaney@caviumnetworks.com> Signed-off-by: Thomas Gleixner commit 691269f0d918cd72454c254f97722f194c07b9a8 Author: Jan Beulich Date: Wed Feb 9 08:26:53 2011 +0000 x86: Adjust section placement in AMD northbridge related code amd_nb_misc_ids[] can live in .rodata, and enable_pci_io_ecs() can be moved into .cpuinit.text. Signed-off-by: Jan Beulich Cc: Hans Rosenfeld Cc: Andreas Herrmann Cc: Borislav Petkov LKML-Reference: <4D525DDD0200007800030F07@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit b82fef82d56789439e6be638a87a1a5bba1e6e75 Author: Jan Beulich Date: Wed Feb 9 08:24:34 2011 +0000 x86: Partly unify asm-offsets_{32,64}.c Just consolidating the common parts. Full unification would seem straight forward, but it's not clear the necessary #ifdef-s would be acceptable. Signed-off-by: Jan Beulich LKML-Reference: <4D525D520200007800030EE9@vpn.id2.novell.com> Signed-off-by: Ingo Molnar commit 94d1ac8b55799be10487fff9766cce6d6628462a Author: Jan Beulich Date: Wed Feb 9 08:22:46 2011 +0000 x86: Reduce back the alignment of the per-CPU data section This complements commit: 47f19a0814e8: percpu: Remove the multi-page alignment facility reverting one leftover of: fe8e0c25cad2: x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE Signed-off-by: Jan Beulich Acked-by: Alexander van Heukelum Cc: Linus Torvalds LKML-Reference: <4D525CE60200007800030EE5@vpn.id2.novell.com> Signed-off-by: Ingo Molnar Cc: Alexander van Heukelum commit 44d60c0f5c58c2168f31df9a481761451840eb54 Author: Borislav Petkov Date: Thu Feb 10 12:19:47 2011 +0100 x86, microcode, AMD: Extend ucode size verification The different families have a different max size for the ucode patch, adjust size checking to the family we're running on. Also, do not vzalloc the max size of the ucode but only the actual size that is passed on from the firmware loader. Signed-off-by: Borislav Petkov commit 22b7fcdae562b6792b3f5517e89fd7e0337180ae Author: Torben Hohn Date: Thu Jan 27 16:00:12 2011 +0100 mn10300: Switch do_timer() to xtimer_update() Only one CPU gets the timer interrupt so mn10300_last_tsc does not need to be protected by xtime lock. Remove xtime lovking and use xtime_update() which does the locking itself. Signed-off-by: Torben Hohn Cc: David Howells Cc: Koichi Yasutake LKML-Reference: <20110127150011.23248.62040.stgit@localhost> Signed-off-by: Thomas Gleixner commit 258721ef34fce97a7a6ca9cebebb303827645868 Author: Borislav Petkov Date: Wed Jan 5 18:13:19 2011 +0100 x86, microcode, AMD: Cleanup dmesg output Unify pr_* to use pr_fmt, shorten messages, correct type formatting. Signed-off-by: Borislav Petkov Acked-by: Andreas Herrmann commit 05ff02e4c0686051fcb074aec92df03f2c184fd1 Author: Borislav Petkov Date: Wed Jan 5 18:04:11 2011 +0100 x86, microcode, AMD: Remove unneeded memset call collect_cpu_info_amd() clears its csig arg but this is done in the microcode_core's collect_cpu_info() by clearing the embedding struct ucode_cpu_info. Drop it. Signed-off-by: Borislav Petkov Acked-by: Andreas Herrmann commit 7cc27349cbfec271eecec9488b4bf3f3fadb2ce4 Author: Borislav Petkov Date: Fri Dec 31 16:57:48 2010 +0100 x86, microcode, AMD: Simplify get_next_ucode Do not copy the section header but look at it directly through the pointer. Also, make it return a ptr to a ucode header directly thus dropping a bunch of unneeded casts. Finally, simplify generic_load_microcode(), while at it. Signed-off-by: Borislav Petkov Acked-by: Andreas Herrmann commit 10de52d6655ef0d4a1b8d2804db30208c26601ed Author: Borislav Petkov Date: Thu Dec 30 22:10:12 2010 +0100 x86, microcode, AMD: Simplify install_equiv_cpu_table There's no need to memcpy the ucode header in order to look at it only in this function - use the original buffer instead. Also, fix return type semantics by returning a negative value on error and a positive otherwise. Signed-off-by: Borislav Petkov Acked-by: Andreas Herrmann commit ffc7e8ac820bf9dd6106b01d3e64fecb5177cf43 Author: Borislav Petkov Date: Thu Dec 30 21:06:01 2010 +0100 x86, microcode, AMD: Release firmware on error When the ucode magic is wrong, for whatever reason, we don't release the loaded firmware binary and its related resources. Make sure we do. Also, fix function naming to fit this driver's convention and shorten variable names. Signed-off-by: Borislav Petkov Acked-by: Andreas Herrmann commit 6c53cbfced048c421e4f72cb2183465f68fbc5e7 Author: Borislav Petkov Date: Thu Jan 6 16:56:51 2011 +0100 x86, microcode: Correct sysdev_add error path When we encounter an error while initting the microcode driver on a CPU, we must undo the previously added sysfs group. Cc: Tigran Aivazian Signed-off-by: Borislav Petkov Acked-by: Andreas Herrmann commit 6752ab4a9c30d5411b2dfdb251a3f1cb18aae487 Author: Steven Rostedt Date: Tue Feb 8 13:54:06 2011 -0500 tracing: Deprecate tracing_enabled for tracing_on tracing_enabled should not be used, it is heavy weight and does not do much in helping lower the overhead. tracing_on should be used instead. Warn users to use tracing_on when tracing_enabled is used as it will soon be removed from the tracing directory. Signed-off-by: Steven Rostedt commit 87d80de2800d087ea833cb79bc13f85ff34ed49f Author: Steven Rostedt Date: Tue Feb 8 13:19:49 2011 -0500 tracing: Remove obsolete sched_switch tracer The trace events sched_switch and sched_wakeup do the same thing as the stand alone sched_switch tracer does. It is no longer needed. Signed-off-by: Steven Rostedt commit f4d5c029bd6731baac0937324cef0f746e7d5ea7 Author: Lai Jiangshan Date: Wed Jan 26 16:49:00 2011 +0800 tracing: Compile time initialization for event flags value Compile time initialization is better than runtime initialization. Remove many early_initcall()s and many trace_init_flags_##name()s. Acked-by: Frederic Weisbecker Signed-off-by: Lai Jiangshan LKML-Reference: <4D3FDFFC.6030304@cn.fujitsu.com> Signed-off-by: Steven Rostedt commit cce1dac871f387d0f3da81440d85bd387d8fd5a6 Author: Uwe Kleine-König Date: Mon Jan 24 21:12:01 2011 +0100 trivial: Fix Steven's Copyright typos OK, the copyright allows you to write a copy, still I think the lawyers prefer the correct spelling. Signed-off-by: Uwe Kleine-König LKML-Reference: <1295899921-11333-1-git-send-email-u.kleine-koenig@pengutronix.de> Signed-off-by: Steven Rostedt commit 44951a60ff888add9e84f509ffce20052e45af94 Author: Thomas Gleixner Date: Fri Feb 4 17:33:49 2011 +0100 genirq: Remove dead code CONFIG_KSTAT_IRQS_ONDEMAND does not exist. It's not worth to implement it. Use sparse irqs if you care about memory consumption of the interrupt layer. Found by undertaker: http://vamos.informatik.uni-erlangen.de/trac/undertaker Signed-off-by: Thomas Gleixner commit c305d524e5dd3c3c7a6035083e30950bea1b52dc Author: Thomas Gleixner Date: Wed Feb 2 17:10:48 2011 +0100 softirq: Avoid stack switch from ksoftirqd ksoftirqd() calls do_softirq() which switches stacks on several architectures. That makes no sense at all. ksoftirqd's stack is sufficient. Call __do_softirq() directly. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra Cc: Benjamin Herrenschmidt Cc: Heiko Carstens Acked-by: David Miller Cc: Paul Mundt Reviewed-by: Frank Rowand LKML-Reference: commit d5e3d747007fdb541e57ed72e020ff0b94db3470 Author: Arnaldo Carvalho de Melo Date: Tue Feb 8 15:29:25 2011 -0200 perf annotate: Fix annotate context lines regression The live annotation done in 'perf top' needs to limit the context before lines that aren't filtered out by the min percent filter, if we don't do that, the screen in a tty often is not enough for showing what is interesting: lines with hits and a few source code lines before it. Reported-by: Mike Galbraith Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit ce6f4fab4059cd72638a0cfa596a8ee2c79c1c8e Author: Arnaldo Carvalho de Melo Date: Tue Feb 8 13:27:39 2011 -0200 perf annotate: Move locking to struct annotation Since we'll need it when implementing the live annotate TUI browser. This also simplifies things a bit by having the list head for the source code to be in the dynamicly allocated part of struct annotation, that way we don't have to pass it around, it can be found from the struct symbol that is passed everywhere. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e3087b80aa0bceda9863f33307460f3ba79f2b15 Author: Arnaldo Carvalho de Melo Date: Tue Feb 8 15:01:39 2011 -0200 perf annotate: Fix --stdio rendering The checks for not using a max_lines parameter were b0rked, problem introduced in 3653246. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 5e38ca8f3ea423442eaafe1b7e206084aa38120a Author: Jiri Olsa Date: Wed Feb 2 13:28:18 2011 +0100 tracing: Add unstable sched clock note to the warning The warning "Delta way too big" warning might appear on a system with unstable shed clock right after the system is resumed and tracing was enabled during the suspend. Since it's not realy bug, and the unstable sched clock is working fast and reliable otherwise, Steven suggested to keep using the sched clock in any case and just to make note in the warning itself. Signed-off-by: Jiri Olsa LKML-Reference: <1296649698-6003-1-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt commit c9a443cdf7726ce8b78c3177c6ae601ce37292fc Merge: 285c1a2 dc5f219 Author: Thomas Gleixner Date: Tue Feb 8 16:38:00 2011 +0100 Merge branch 'irq/for-xen' into irq/core irq/for-xen contains new functionality to avoid Xen private irq hackery. That branch has a single irq commit and is pulled by Xen to base their new features on. Merge it into irq/core as other patches modify the same code. Signed-off-by: Thomas Gleixner commit dc5f219e88294b93009eef946251251ffffb6d60 Author: Thomas Gleixner Date: Fri Feb 4 13:19:20 2011 +0100 genirq: Add IRQF_FORCE_RESUME Xen needs to reenable interrupts which are marked IRQF_NO_SUSPEND in the resume path. Add a flag to force the reenabling in the resume code. Tested-and-acked-by: Ian Campbell Signed-off-by: Thomas Gleixner commit ae07f551c42d6e4162436ca452a199deac9dab4d Author: Ian Munsie Date: Thu Feb 3 14:27:25 2011 +1100 tracing/syscalls: Early terminate search for sys_ni_syscall Many system calls are unimplemented and mapped to sys_ni_syscall, but at boot ftrace would still search through every syscall metadata entry for a match which wouldn't be there. This patch adds causes the search to terminate early if the system call is not mapped. Signed-off-by: Ian Munsie LKML-Reference: <1296703645-18718-7-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Steven Rostedt commit b2d55496818d64310b9f5486d4eea76ea614d7f8 Author: Ian Munsie Date: Thu Feb 3 14:27:23 2011 +1100 tracing/syscalls: Allow arch specific syscall symbol matching Some architectures have unusual symbol names and the generic code to match the symbol name with the function name for the syscall metadata will fail. For example, symbols on PPC64 start with a period and the generic code will fail to match them. This patch moves the match logic out into a separate function which an arch can override by defining ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and implementing arch_syscall_match_sym_name. Signed-off-by: Ian Munsie LKML-Reference: <1296703645-18718-5-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Steven Rostedt commit c763ba06bd9b5db2c46c36276c89103d92d2c604 Author: Ian Munsie Date: Thu Feb 3 14:27:22 2011 +1100 tracing/syscalls: Make arch_syscall_addr weak Some architectures use non-trivial system call tables and will not work with the generic arch_syscall_addr code. For example, PowerPC64 uses a table of twin long longs. This patch makes the generic arch_syscall_addr weak to allow architectures with non-trivial system call tables to override it. Signed-off-by: Ian Munsie LKML-Reference: <1296703645-18718-4-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Steven Rostedt commit 3773b389b6927595512558594d040c1edba46f36 Author: Ian Munsie Date: Thu Feb 3 14:27:21 2011 +1100 tracing/syscalls: Convert redundant syscall_nr checks into WARN_ON With the ftrace events now checking if the syscall_nr is valid upon initialisation it should no longer be possible to register or unregister a syscall event without a valid syscall_nr since they should not be created. This adds a WARN_ON_ONCE in the register and unregister functions to locate potential regressions in the future. Signed-off-by: Ian Munsie LKML-Reference: <1296703645-18718-3-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Steven Rostedt commit ba976970c79fd2fbfe1a4b3b6766a318f4eb9d4c Author: Ian Munsie Date: Thu Feb 3 14:27:20 2011 +1100 tracing/syscalls: Don't add events for unmapped syscalls FTRACE_SYSCALLS would create events for each and every system call, even if it had failed to map the system call's name with it's number. This resulted in a number of events being created that would not behave as expected. This could happen, for example, on architectures who's symbol names are unusual and will not match the system call name. It could also happen with system calls which were mapped to sys_ni_syscall. This patch changes the default system call number in the metadata to -1. If the system call name from the metadata is not successfully mapped to a system call number during boot, than the event initialisation routine will now return an error, preventing the event from being created. Signed-off-by: Ian Munsie LKML-Reference: <1296703645-18718-2-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Steven Rostedt commit 4defe682d81a4960b6840ee4ed1a36f9db77c7bd Author: Steven Rostedt Date: Thu Feb 3 23:29:06 2011 -0500 tracing/filter: Remove synchronize_sched() from __alloc_preds() Because the filters are processed first and then activated (added to the call), we no longer need to worry about the preds of the filter in __alloc_preds() being used. As the filter that is allocating preds is not activated yet. Signed-off-by: Steven Rostedt commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 Author: Steven Rostedt Date: Thu Feb 3 23:25:46 2011 -0500 tracing/filter: Swap entire filter of events When creating a new filter, instead of allocating the filter to the event call first and then processing the filter, it is easier to process a temporary filter and then just swap it with the call filter. By doing this, it simplifies the code. A filter is allocated and processed, when it is done, it is swapped with the call filter, synchronize_sched() is called to make sure all callers are done with the old filter (filters are called with premption disabled), and then the old filter is freed. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit bf93f9ed3a2cb89eb7e58851139d3be375b98027 Author: Steven Rostedt Date: Thu Jan 27 23:21:34 2011 -0500 tracing/filter: Increase the max preds to 2^14 Now that the filter logic does not require to save the pred results on the stack, we can increase the max number of preds we allow. As the preds are index by a short value, and we use the MSBs as flags we can increase the max preds to 2^14 (16384) which should be way more than enough. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 4a3d27e98a7f2682e96d6f863752e0424b00d691 Author: Steven Rostedt Date: Thu Jan 27 23:19:49 2011 -0500 tracing/filter: Move MAX_FILTER_PRED to local tracing directory The MAX_FILTER_PRED is only needed by the kernel/trace/*.c files. Move it to kernel/trace/trace.h. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 43cd414552d8137157e926e46361678ea867e476 Author: Steven Rostedt Date: Thu Jan 27 23:16:51 2011 -0500 tracing/filter: Optimize filter by folding the tree There are many cases that a filter will contain multiple ORs or ANDs together near the leafs. Walking up and down the tree to get to the next compare can be a waste. If there are several ORs or ANDs together, fold them into a single pred and allocate an array of the conditions that they check. This will speed up the filter by linearly walking an array and can still break out if a short circuit condition is met. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit ec126cac23945de12eb2d103374e1f7ee97c5595 Author: Steven Rostedt Date: Thu Jan 27 23:14:25 2011 -0500 tracing/filter: Check the created pred tree Since the filter walks a tree to determine if a match is made or not, if the tree was incorrectly created, it could cause an infinite loop. Add a check to walk the entire tree before assigning it as a filter to make sure the tree is correct. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 55719274188f13cff9e3bd11fdd4c0e7617cd03d Author: Steven Rostedt Date: Thu Jan 27 23:12:05 2011 -0500 tracing/filter: Optimize short ciruit check The test if we should break out early for OR and AND operations can be optimized by comparing the current result with (pred->op == OP_OR) That is if the result is true and the op is an OP_OR, or if the result is false and the op is not an OP_OR (thus an OP_AND) we can break out early in either case. Otherwise we continue processing. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 61e9dea20e1ada886cc49a9ec6fe3c6ac0de7324 Author: Steven Rostedt Date: Thu Jan 27 22:54:33 2011 -0500 tracing/filter: Use a tree instead of stack for filter_match_preds() Currently the filter_match_preds() requires a stack to push and pop the preds to determine if the filter matches the record or not. This has two drawbacks: 1) It requires a stack to store state information. As this is done in fast paths we can't allocate the storage for this stack, and we can't use a global as it must be re-entrant. The stack is stored on the kernel stack and this greatly limits how many preds we may allow. 2) All conditions are calculated even when a short circuit exists. a || b will always calculate a and b even though a was determined to be true. Using a tree we can walk a constant structure that will save the state as we go. The algorithm is simply: pred = root; do { switch (move) { case MOVE_DOWN: if (OR or AND) { pred = left; continue; } if (pred == root) break; match = pred->fn(); pred = pred->parent; move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT; continue; case MOVE_UP_FROM_LEFT: /* Only OR or AND can be a parent */ if (match && OR || !match && AND) { /* short circuit */ if (pred == root) break; pred = pred->parent; move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT; continue; } pred = pred->right; move = MOVE_DOWN; continue; case MOVE_UP_FROM_RIGHT: if (pred == root) break; pred = pred->parent; move = left child ? MOVE_UP_FROM_LEFT : MOVE_UP_FROM_RIGHT; continue; } done = 1; } while (!done); This way there's no strict limit to how many preds we allow and it also will short circuit the logical operations when possible. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit f76690afd05e3e163149310bdcd30234f93b3a7a Author: Steven Rostedt Date: Thu Jan 27 22:53:06 2011 -0500 tracing/filter: Free pred array on disabling of filter When a filter is disabled, free the preds. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 74e9e58c350a24139e268dd6857bbaa55c5aafcf Author: Steven Rostedt Date: Thu Jan 27 22:49:48 2011 -0500 tracing/filter: Allocate the preds in an array Currently we allocate an array of pointers to filter_preds, and then allocate a separate filter_pred for each item in the array. This adds slight overhead in the filters as it needs to derefernce twice to get to the op condition. Allocating the preds themselves in a single array removes a dereference as well as helps on the cache footprint. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 0fc3ca9a10a61a77f18710fb708b41fd99c79a56 Author: Steven Rostedt Date: Thu Jan 27 22:46:46 2011 -0500 tracing/filter: Call synchronize_sched() just once for system filters By separating out the reseting of the filter->n_preds to zero from the reallocation of preds for the filter, we can reset groups of filters first, call synchronize_sched() just once, and then reallocate each of the filters in the system group. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit c9c53ca03d6f97fdd9832d5ed3f15b30ee5cdb86 Author: Steven Rostedt Date: Thu Jan 27 22:42:43 2011 -0500 tracing/filter: Dynamically allocate preds For every filter that is made, we create predicates to hold every operation within the filter. We have a max of 32 predicates that we can hold. Currently, we allocate all 32 even if we only need to use one. Part of the reason we do this is that the filter can be used at any moment by any event. Fortunately, the filter is only used with preemption disabled. By reseting the count of preds used "n_preds" to zero, then performing a synchronize_sched(), we can safely free and reallocate a new array of preds. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 58d9a597c4275d830a819625e7d437cd6fb23fa5 Author: Steven Rostedt Date: Thu Jan 27 22:37:09 2011 -0500 tracing/filter: Move OR and AND logic out of fn() method The ops OR and AND act different from the other ops, as they are the only ones to take other ops as their arguements. These ops als change the logic of the filter_match_preds. By removing the OR and AND fn's we can also remove the val1 and val2 that is passed to all other fn's and are unused. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit 6d54057d76e25c91165cda0e6e007f1811faa2be Author: Steven Rostedt Date: Thu Jan 27 22:33:26 2011 -0500 tracing/filter: Have no filter return a match The n_preds field of a file can change at anytime, and even can become zero, just as the filter is about to be processed by an event. In the case that is zero on entering the filter, return 1, telling the caller the event matchs and should be trace. Also use a variable and assign it with ACCESS_ONCE() such that the count stays consistent within the function. Cc: Tom Zanussi Signed-off-by: Steven Rostedt commit cabb5bd7ff4d6963ec9e67f958fc30e7815425e6 Author: Hans Rosenfeld Date: Mon Feb 7 18:10:39 2011 +0100 x86, amd: Support L3 Cache Partitioning on AMD family 0x15 CPUs L3 Cache Partitioning allows selecting which of the 4 L3 subcaches can be used for evictions by the L2 cache of each compute unit. By writing a 4-bit hexadecimal mask into the the sysfs file /sys/devices/system/cpu/cpuX/cache/index3/subcaches, the user can set the enabled subcaches for a CPU. The settings are directly read from and written to the hardware, so there is no way to have contradicting settings for two CPUs belonging to the same compute unit. Writing will always overwrite any previous setting for a compute unit. Signed-off-by: Hans Rosenfeld Cc: LKML-Reference: <1297098639-431383-1-git-send-email-hans.rosenfeld@amd.com> [ -v3: minor style fixes ] Signed-off-by: Ingo Molnar commit 124bb83cd7de4d851af7595650233fb9e9279d5d Author: Masami Hiramatsu Date: Fri Feb 4 21:52:11 2011 +0900 perf probe: Add bitfield member support Add bitfield member accessing support to probe arguments. Suggested-by: Arnaldo Carvalho de Melo Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110204125211.9507.60265.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu [ committer note: Fixed up '%lu' use for return of BYTES_TO_BITS ('%zd') ] Signed-off-by: Arnaldo Carvalho de Melo commit a2221796256ea7b236cec6bf027c1c1de5b8ccd7 Author: Borislav Petkov Date: Mon Feb 7 15:32:18 2011 +0100 perf annotate: Fix build error A small fix for when NO_NEWT_SUPPORT is defined. Add a missing "struct" to the function prototype. Cc: Frederic Weisbecker Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Tom Zanussi LKML-Reference: <20110207143218.GA31197@kryptos.osrc.amd.com> Signed-off-by: Borislav Petkov Signed-off-by: Arnaldo Carvalho de Melo commit fb7d0b3cefb80a105f7fd26bbc62e0cbf9192822 Author: Kyle McMartin Date: Mon Jan 24 11:13:04 2011 -0500 perf tool: Fix gcc 4.6.0 issues GCC 4.6.0 in Fedora rawhide turned up some compile errors in tools/perf due to the -Werror=unused-but-set-variable flag. I've gone through and annotated some of the assignments that had side effects (ie: return value from a function) with the __used annotation, and in some cases, just removed unused code. In a few cases, we were assigning something useful, but not using it in later parts of the function. kyle@dreadnought:~/src% gcc --version gcc (GCC) 4.6.0 20110122 (Red Hat 4.6.0-0.3) Cc: Ingo Molnar LKML-Reference: <20110124161304.GK27353@bombadil.infradead.org> Signed-off-by: Kyle McMartin [ committer note: Fixed up the annotation fixes, as that code moved recently ] Signed-off-by: Arnaldo Carvalho de Melo commit 1ff511e35ed87cc2ebade9e678e4a2fe39b6f9c5 Author: Masami Hiramatsu Date: Fri Feb 4 21:52:05 2011 +0900 tracing/kprobes: Add bitfield type Add bitfield type for tracing arguments on kprobe-tracer. The syntax of a bitfield type is: b@/ e.g. Accessing 2 bits-width field with 4 bits-offset in 32 bits-width data at 4 bytes offseted from the address pointed by AX register: +4(%ax):b2@4/32 Since the width of container data depends on the arch, so I just added the container-size at the end. Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110204125205.9507.11363.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit e3745369986ddcdaa19f70e2d24e658876b97e84 Author: Masami Hiramatsu Date: Fri Feb 4 21:51:59 2011 +0900 tracing/kprobes: Support longer (>128 bytes) command Expand command line buffer of kprobe-tracer to 4096 bytes. Reported-by: Arnaldo Carvalho de Melo Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110204125159.9507.20895.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 76022db323dd6d7c6958df3d595f7dedf7a14778 Author: Masami Hiramatsu Date: Fri Feb 4 21:51:53 2011 +0900 tracing/kprobes: Cleanup strict_strtol() using code Since strict_strtol() accepts minus digits started with '-', it doesn't need to invert after converting. Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110204125153.9507.49335.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit f50c2169bd054984e976e67e8651d28f3caf6ba3 Author: Franck Bui-Huu Date: Thu Jan 13 11:18:30 2011 +0100 perf probe: Rewrite find_lazy_match_lines() by using getline(3) Acked-by: Masami Hiramatsu Cc: Masami Hiramatsu Cc: lkml LKML-Reference: Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit ef4d001d79ac4bab6c2d81e9986a42059f877ec3 Author: Denis Kirjanov Date: Sat Feb 5 20:39:38 2011 +0000 perf top: Use pid_t for target_{pid|tid} Use pid_t data type for target_{pid|tid} vars. Cc: Ingo Molnar LKML-Reference: <20110205203938.GA15328@hera.kernel.org> Signed-off-by: Denis Kirjanov [ committer note: those variables are now in struct perf_top, fixed ] Signed-off-by: Arnaldo Carvalho de Melo commit 9c56dfeb784a586713f467e2028a127a2a58a238 Author: Michael Witten Date: Thu Feb 3 22:10:55 2011 -0600 perf tools: Makefile: Use $(QUIET_GEN) for perf.so So that we get this: CC /home/acme/git/build/perf/bench/mem-memcpy-x86-64-asm.o GEN perf-archive * GEN /home/acme/git/build/perf/python/perf.so CC /home/acme/git/build/perf/builtin-annotate.o Instead of silently building the python binding. LKML-Reference: <1296890359-22659-1-git-send-email-mfwitten@gmail.com> Signed-off-by: Michael Witten Signed-off-by: Arnaldo Carvalho de Melo commit 075de90c46562de1435db16c2129ec4ff92e5bd2 Merge: c7f9a6f 3653246 Author: Ingo Molnar Date: Mon Feb 7 08:45:48 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit c7f9a6f377fa64e5a74f8c128d4349765c28fab1 Merge: fe4b04f 8dbdea8 Author: Ingo Molnar Date: Mon Feb 7 08:44:11 2011 +0100 Merge branch 'linus' into perf/core Merge reason: Pick up perf fixes that are now upstream Signed-off-by: Ingo Molnar commit 36532461a0f60bb36c5470a0326f7394f19db23c Author: Arnaldo Carvalho de Melo Date: Sun Feb 6 14:54:44 2011 -0200 perf top: Ditch private annotation code, share perf annotate's Next step: Live TUI annotation in perf top, just press enter on a symbol line. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit f1e2701de02cff6d988b1dd49960620d5720cb89 Author: Arnaldo Carvalho de Melo Date: Sat Feb 5 18:51:38 2011 -0200 perf annotate: Separate objdump parsing from actual screen rendering Because in 'perf top' we'll need to parse just once and then, as samples come, render multiple times with evolving counter values. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 285c1a2c3a5f84ce1c811ab4cb1f8a17466e1a06 Merge: 1fb0ef3 a9fe8d5 Author: Thomas Gleixner Date: Sat Feb 5 21:49:57 2011 +0100 Merge branch 'irq/urgent' into irq/core Reason: Get mainline fixes integrated. Further patches conflict with them Signed-off-by: Thomas Gleixner commit d040bd363824f9f0ad6610b91ee6c65f292c066c Author: Arnaldo Carvalho de Melo Date: Sat Feb 5 15:37:31 2011 -0200 perf annotate: Config options for symbol__tty_annotate Max line# that should be printed, minimum percentage filter, just like 'perf top', alas, due to it :-) Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 2f525d0148ef2734c8a172201e5e1e9167a8a5fd Author: Arnaldo Carvalho de Melo Date: Fri Feb 4 13:43:24 2011 -0200 perf annotate: Support multiple histograms in annotation The perf annotate tool continues aggregating everything on just one histograms, but to support the top model add support for one histogram perf evsel in the evlist. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 78f7defedbb4da73b9a07635c357c1afcaa55c8f Author: Arnaldo Carvalho de Melo Date: Fri Feb 4 09:45:46 2011 -0200 perf annotate: Move annotate functions to util/ They will be used by perf top, so that we have just one set of routines to do annotation. Rename "struct sym_priv" to "struct annotation", etc, to clarify this code a bit. Rename "struct sym_ext" to "struct source_line", to give it a meaningful name, that clarifies that it is a the result of an addr2line call, that is sorted by percentage one particular source code line appeared in the annotation. And since we're moving things around also rename 'sym_hist->ip' to 'sym_hist->addr' as we want to do data structure annotation at some point. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 764328d3209dd81b02a55722556b07b6f35e3ca0 Author: Arnaldo Carvalho de Melo Date: Fri Feb 4 07:33:24 2011 -0200 perf top: Remove superfluous name_len field From the sym_entry struct, struct symbol already has this field. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 04bea68b2f0eeebb089ecc67b618795925268b4a Author: Sebastian Andrzej Siewior Date: Mon Jan 24 09:58:55 2011 +0530 of/pci: move of_irq_map_pci() into generic code There is a tiny difference between PPC32 and PPC64. Microblaze uses the PPC32 variant. Signed-off-by: Sebastian Andrzej Siewior [grant.likely@secretlab.ca: Added comment to #endif, moved documentation block to function implementation, fixed for non ppc and microblaze compiles] Signed-off-by: Grant Likely commit c64eae9a73a847c1698f913c893aa4012d2a30b0 Merge: c9e358d ebf5382 Author: Grant Likely Date: Fri Feb 4 11:46:43 2011 -0700 Merge commit 'v2.6.38-rc3' into devicetree/next commit d95f412200652694e63e64bfd49f0ae274a54479 Author: Mike Galbraith Date: Tue Feb 1 09:50:51 2011 -0500 sched: Add yield_to(task, preempt) functionality Currently only implemented for fair class tasks. Add a yield_to_task method() to the fair scheduling class. allowing the caller of yield_to() to accelerate another thread in it's thread group, task group. Implemented via a scheduler hint, using cfs_rq->next to encourage the target being selected. We can rely on pick_next_entity to keep things fair, so noone can accelerate a thread that has already used its fair share of CPU time. This also means callers should only call yield_to when they really mean it. Calling it too often can result in the scheduler just ignoring the hint. Signed-off-by: Rik van Riel Signed-off-by: Marcelo Tosatti Signed-off-by: Mike Galbraith Signed-off-by: Peter Zijlstra LKML-Reference: <20110201095051.4ddb7738@annuminas.surriel.com> Signed-off-by: Ingo Molnar commit ac53db596cc08ecb8040cfb6f71ae40c6f2041c4 Author: Rik van Riel Date: Tue Feb 1 09:51:03 2011 -0500 sched: Use a buddy to implement yield_task_fair() Use the buddy mechanism to implement yield_task_fair. This allows us to skip onto the next highest priority se at every level in the CFS tree, unless doing so would introduce gross unfairness in CPU time distribution. We order the buddy selection in pick_next_entity to check yield first, then last, then next. We need next to be able to override yield, because it is possible for the "next" and "yield" task to be different processen in the same sub-tree of the CFS tree. When they are, we need to go into that sub-tree regardless of the "yield" hint, and pick the correct entity once we get to the right level. Signed-off-by: Rik van Riel Signed-off-by: Peter Zijlstra LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com> Signed-off-by: Ingo Molnar commit 2c13c919d9e9a3db9896143a501f83dcbbe1ced4 Author: Rik van Riel Date: Tue Feb 1 09:48:37 2011 -0500 sched: Limit the scope of clear_buddies The clear_buddies function does not seem to play well with the concept of hierarchical runqueues. In the following tree, task groups are represented by 'G', tasks by 'T', next by 'n' and last by 'l'. (nl) / \ G(nl) G / \ \ T(l) T(n) T This situation can arise when a task is woken up T(n), and the previously running task T(l) is marked last. When clear_buddies is called from either T(l) or T(n), the next and last buddies of the group G(nl) will be cleared. This is not the desired result, since we would like to be able to find the other type of buddy in many cases. This especially a worry when implementing yield_task_fair through the buddy system. The fix is simple: only clear the buddy type that the task itself is indicated to be. As an added bonus, we stop walking up the tree when the buddy has already been cleared or pointed elsewhere. Signed-off-by: Rik van Riel Signed-off-by: Peter Zijlstra LKML-Reference: <20110201094837.6b0962a9@annuminas.surriel.com> Signed-off-by: Ingo Molnar commit 725e7580aaf98e9f7b22b8ccfc640ad0c09e2b08 Author: Rik van Riel Date: Tue Feb 1 09:47:15 2011 -0500 sched: Check the right ->nr_running in yield_task_fair() With CONFIG_FAIR_GROUP_SCHED, each task_group has its own cfs_rq. Yielding to a task from another cfs_rq may be worthwhile, since a process calling yield typically cannot use the CPU right now. Therefor, we want to check the per-cpu nr_running, not the cgroup local one. Signed-off-by: Rik van Riel Signed-off-by: Peter Zijlstra LKML-Reference: <20110201094715.798c4f86@annuminas.surriel.com> Signed-off-by: Ingo Molnar commit fe4b04fa31a6dcf4358aa84cf81e5a7fd079469b Author: Peter Zijlstra Date: Wed Feb 2 13:19:09 2011 +0100 perf: Cure task_oncpu_function_call() races Oleg reported that on architectures with __ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from task_oncpu_function_call() can land before perf_event_task_sched_in() and cause interesting situations for eg. perf_install_in_context(). This patch reworks the task_oncpu_function_call() interface to give a more usable primitive as well as rework all its users to hopefully be more obvious as well as remove the races. While looking at the code I also found a number of races against perf_event_task_sched_out() which can flip contexts between tasks so plug those too. Reported-and-reviewed-by: Oleg Nesterov Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 0606f422b453f76c31ab2b1bd52943ff06a2dcf2 Author: Richard Cochran Date: Tue Feb 1 13:52:35 2011 +0000 posix clocks: Introduce dynamic clocks This patch adds support for adding and removing posix clocks. The clock lifetime cycle is patterned after usb devices. Each clock is represented by a standard character device. In addition, the driver may optionally implement custom character device operations. The posix clock and timer system calls listed below now work with dynamic posix clocks, as well as the traditional static clocks. The following system calls are affected: - clock_adjtime (brand new syscall) - clock_gettime - clock_getres - clock_settime - timer_create - timer_delete - timer_gettime - timer_settime [ tglx: Adapted to the posix-timer cleanup. Moved clock_posix_dynamic to posix-clock.c and made all referenced functions static ] Signed-off-by: Richard Cochran Acked-by: John Stultz LKML-Reference: <20110201134420.164172635@linutronix.de> Signed-off-by: Thomas Gleixner commit 527087374faa488776a789375a7d6ea74fda6f71 Author: Thomas Gleixner Date: Wed Feb 2 12:10:09 2011 +0100 posix-timers: Cleanup namespace Rename register_posix_clock() to posix_timers_register_clock(). That's what the function really does. As a side effect this cleans up the posix_clock namespace for the upcoming dynamic posix_clock infrastructure. Signed-off-by: Thomas Gleixner Tested-by: Richard Cochran Cc: John Stultz LKML-Reference: commit 81e294cba2596f5f10848bbe19d98b344c2a2d5c Author: Richard Cochran Date: Tue Feb 1 13:52:32 2011 +0000 posix-timers: Add support for fd based clocks Extend the negative clockids which are currently used by posix cpu timers to encode the PID with a file descriptor based type which encodes the fd in the upper bits. Originally-from: Richard Cochran Signed-off-by: Thomas Gleixner Acked-by: John Stultz LKML-Reference: <20110201134420.062860200@linutronix.de> commit ce26efdefa5e8f22d933df72d7f7482725091d6d Author: Richard Cochran Date: Tue Feb 1 13:52:30 2011 +0000 x86: Add clock_adjtime for x86 This patch adds the clock_adjtime system call to the x86 architecture. Signed-off-by: Richard Cochran Acked-by: John Stultz LKML-Reference: <20110201134419.968905083@linutronix.de> Signed-off-by: Thomas Gleixner commit f1f1d5ebd10ffa4242bce7a90a56a222d6b7bc77 Author: Richard Cochran Date: Tue Feb 1 13:52:26 2011 +0000 posix-timers: Introduce a syscall for clock tuning. A new syscall is introduced that allows tuning of a POSIX clock. The new call, clock_adjtime, takes two parameters, the clock ID and a pointer to a struct timex. Any ADJTIMEX(2) operation may be requested via this system call, but various POSIX clocks may or may not support tuning. [ tglx: Adapted to the posix-timer cleanup series. Avoid copy_to_user in the error case ] Signed-off-by: Richard Cochran Acked-by: John Stultz LKML-Reference: <20110201134419.869804645@linutronix.de> Signed-off-by: Thomas Gleixner commit 65f5d80bdf83ec0d7f3887db10153bf3f36ed73c Author: Richard Cochran Date: Tue Feb 1 13:52:23 2011 +0000 time: Splitout compat timex accessors Split out the compat timex accessors into separate functions. Preparatory patch for a new syscall. [ tglx: Split that patch from Richards "posix-timers: Introduce a syscall for clock tuning.". Keeps the changes strictly separate ] Originally-from: Richard Cochran Acked-by: John Stultz Signed-off-by: Thomas Gleixner LKML-Reference: <20110201134419.772343089@linutronix.de> commit 094aa1881fdc1b8889b442eb3511b31f3ec2b762 Author: Richard Cochran Date: Tue Feb 1 13:52:20 2011 +0000 ntp: Add ADJ_SETOFFSET mode bit This patch adds a new mode bit into the timex structure. When set, the bit instructs the kernel to add the given time value to the current time. Signed-off-by: Richard Cochran Acked-by: John Stultz LKML-Reference: <20110201134320.688829863@linutronix.de> Signed-off-by: Thomas Gleixner commit c528f7c6c208f1fae6b4025957173dec045e5f21 Author: John Stultz Date: Tue Feb 1 13:52:17 2011 +0000 time: Introduce timekeeping_inject_offset This adds a kernel-internal timekeeping interface to add or subtract a fixed amount from CLOCK_REALTIME. This makes it so kernel users or interfaces trying to do so do not have to read the time, then add an offset and then call settimeofday(), which adds some extra error in comparision to just simply adding the offset in the kernel timekeeping core. Signed-off-by: John Stultz Signed-off-by: Richard Cochran LKML-Reference: <20110201134419.584311693@linutronix.de> Signed-off-by: Thomas Gleixner commit 0061748dd2400d0bcd4d49d258db5d7b5d106ca0 Author: Richard Cochran Date: Tue Feb 1 13:52:15 2011 +0000 posix-timer: Update comment Pick the cleanup to the comment in posix-timers.c from Richards all in one conversion patch. Originally-from: Richard Cochran Signed-off-by: Thomas Gleixner Acked-by: John Stultz LKML-Reference: <20110201134419.487708516@linutronix.de> commit bc2c8ea483d73e95fc88f1fc9e7755180f89b892 Author: Thomas Gleixner Date: Tue Feb 1 13:52:12 2011 +0000 posix-timers: Make posix-cpu-timers functions static All functions are accessed via clock_posix_cpu now. So make them static. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134419.389755466@linutronix.de> commit 0aa3975f02ce78f27be3076fbfa3d94ae5a659d5 Author: Thomas Gleixner Date: Tue Feb 1 13:52:09 2011 +0000 posix-timers: Remove CLOCK_DISPATCH leftovers All users gone. Remove the cruft. Huge thanks to Richard Cochran who tackled that maze first. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134419.294620613@linutronix.de> commit 6761c6702e2c647582e1829abe8cf90794f61d9d Author: Thomas Gleixner Date: Tue Feb 1 13:52:07 2011 +0000 posix-timers: Convert timer_delete() to clockid_to_kclock() Set the common function for CLOCK_MONOTONIC and CLOCK_REALTIME kclocks and use the new decoding function. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134419.198999420@linutronix.de> commit a7319fa253a549c4c6528fb550ae6e72a9c83811 Author: Thomas Gleixner Date: Tue Feb 1 13:52:04 2011 +0000 posix-timers: Convert timer_gettime() to clockid_to_kclock() Set the common function for CLOCK_MONOTONIC and CLOCK_REALTIME kclocks and use the new decoding function. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134419.101243181@linutronix.de> commit 27722df16ef143017db55ac7baac1703a68017ff Author: Thomas Gleixner Date: Tue Feb 1 13:52:01 2011 +0000 posix-timers: Convert timer_settime() to clockid_to_kclock() Set the common function for CLOCK_MONOTONIC and CLOCK_REALTIME kclocks and use the new decoding function. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134419.001863714@linutronix.de> commit 838394fbf989973ec7f5a0ad82cb6ff09e5c39aa Author: Thomas Gleixner Date: Tue Feb 1 13:51:58 2011 +0000 posix-timers: Convert timer_create() to clockid_to_kclock() Setup timer_create for CLOCK_MONOTONIC and CLOCK_REALTIME kclocks and remove the no_timer_create() implementation. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.903604289@linutronix.de> commit ebaac757acae0431e2c79c00e09f1debdabbddd7 Author: Thomas Gleixner Date: Tue Feb 1 13:51:56 2011 +0000 posix-timers: Remove useless res field from k_clock The res member of kclock is only used by mmtimer.c, but even there it contains redundant information. Remove the field and fixup mmtimer. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.808714587@linutronix.de> commit e5e542eea9075dd008993c2ee80b2cc9f31fc494 Author: Thomas Gleixner Date: Tue Feb 1 13:51:53 2011 +0000 posix-timers: Convert clock_getres() to clockid_to_kclock() Use the new kclock decoding. Fixup the fallout in mmtimer.c Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.709802797@linutronix.de> commit 4359ac0ace1a2a267927390ad27f781a2f8e0ab8 Author: Thomas Gleixner Date: Wed Feb 2 11:45:23 2011 +0100 posix-timers: Make clock_getres and clock_get mandatory Richard said: "I would think that we can require k_clocks to provide the read function. This could be checked and enforced in register_posix_clock()." Add checks for clock_getres and clock_get in the register function. Suggested-by: Richard Cochran Cc: John Stultz Signed-off-by: Thomas Gleixner commit 42285777631aa0654fbb6442057b3e176445c6c5 Author: Thomas Gleixner Date: Tue Feb 1 13:51:50 2011 +0000 posix-timers: Convert clock_gettime() to clockid_to_kclock() Use the new kclock decoding mechanism and rename the misnomed common_clock_get() to posix_clock_realtime_get(). Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.611097203@linutronix.de> commit 26f9a4796af330173d790c8d2b5e2efcc489e755 Author: Thomas Gleixner Date: Tue Feb 1 13:51:48 2011 +0000 posix-timers: Convert clock_settime to clockid_to_kclock() Use the new kclock decoding function in clock_settime and cleanup all kclocks which use the default functions. Rename the misnomed common_clock_set() to posix_clock_realtime_set(). Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.518851246@linutronix.de> commit 79c9da0d0539fb341a1b48a2a5a23d974726de90 Author: Thomas Gleixner Date: Tue Feb 1 13:51:45 2011 +0000 posix-cpu-timers: Remove the stub nanosleep functions CLOCK_THREAD_CPUTIME_ID implements stub functions for nanosleep and nanosleep_restart, which return -EINVAL. That return value is wrong. The correct return value is -ENOTSUP. Remove the stubs and let the new dispatch code return the correct error code. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.422446502@linutronix.de> commit d608c18203a969e5d14572a9861c646d0bb66872 Author: Thomas Gleixner Date: Tue Feb 1 13:51:43 2011 +0000 thread_info: Remove legacy arg0-3 from restart_block posix timers were the last users of the legacy arg0-3 members of restart_block. Remove the cruft. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.326209775@linutronix.de> commit 3751f9f29bcbc19bd10e92254a273486f150c245 Author: Thomas Gleixner Date: Tue Feb 1 13:51:20 2011 +0000 posix-timers: Cleanup restart_block usage posix timers still use the legacy arg0-arg3 members of restart_block. Use restart_block.nanosleep instead Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.232288779@linutronix.de> commit 59bd5bc24aa69f6c62da1e242c16f09f667def96 Author: Thomas Gleixner Date: Tue Feb 1 13:51:17 2011 +0000 posix-timers: Convert clock_nanosleep_restart to clockid_to_kclock() Use the new kclock decoding function in clock_nanosleep_restart. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.131263211@linutronix.de> commit a5cd2880106cb2c79b3fe24f1c53dadba6a542a0 Author: Thomas Gleixner Date: Tue Feb 1 13:51:11 2011 +0000 posix-timers: Convert clock_nanosleep to clockid_to_kclock() Use the new kclock decoding function in clock_nanosleep and cleanup all kclocks which use the default functions. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134418.034175556@linutronix.de> commit cc785ac22b17ed53e8ff5c1501e422be6d10be3c Author: Thomas Gleixner Date: Tue Feb 1 13:51:09 2011 +0000 posix-timers: Introduce clockid_to_kclock() New function to find the kclock for a given clockid. Returns a pointer to clock_posix_cpu if clockid < 0. If clockid >= MAXCLOCK or if the clock_getres pointer is not set it returns NULL. For valid clocks it returns a pointer to the matching posix_clock. Signed-off-by: Thomas Gleixner Cc: John Stultz Acked-by: Richard Cochran LKML-Reference: <20110201134417.938447839@linutronix.de> commit 1976945eeaab5fa461735a6225a82c3cf1e65d62 Author: Thomas Gleixner Date: Tue Feb 1 13:51:06 2011 +0000 posix-timers: Introduce clock_posix_cpu The CLOCK_DISPATCH() macro is a horrible magic. We call common functions if a function pointer is not set. That's just backwards. To support dynamic file decriptor based clocks we need to cleanup that dispatch logic. Create a k_clock struct clock_posix_cpu which has all the posix-cpu-timer functions filled in. After the cleanup the functions can be made static. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134417.841974553@linutronix.de> commit 2fd1f04089cb657c5d6c484b280ec4d3398aa157 Author: Thomas Gleixner Date: Tue Feb 1 13:51:03 2011 +0000 posix-timers: Cleanup struct initializers Cosmetic. No functional change Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134417.745627057@linutronix.de> commit 65da528d7cc94966cf24d2a1e0837b689159b543 Author: Thomas Gleixner Date: Tue Feb 1 13:51:01 2011 +0000 posix-timers: Define nanosleep not supported error separate Define the conditional nanosleep not supported error value outside of do_posix_clock_nonanosleep(). Preparatory patch for further cleanups. Signed-off-by: Thomas Gleixner Acked-by: John Stultz Tested-by: Richard Cochran LKML-Reference: <20110201134417.643486574@linutronix.de> commit 1e6d767924c74929c0cfe839ae8f37bcee9e544e Author: Richard Cochran Date: Tue Feb 1 13:50:58 2011 +0000 time: Correct the *settime* parameters Both settimeofday() and clock_settime() promise with a 'const' attribute not to alter the arguments passed in. This patch adds the missing 'const' attribute into the various kernel functions implementing these calls. Signed-off-by: Richard Cochran Acked-by: John Stultz LKML-Reference: <20110201134417.545698637@linutronix.de> Signed-off-by: Thomas Gleixner commit b84defe6036e6dea782d41b80a4590e54f249671 Merge: 8104a47 cdb0861 Author: Ingo Molnar Date: Wed Feb 2 07:11:02 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit 8104a4775ad8a7863af0b898224b15aa708582db Merge: f6bbc1d ebf5382 Author: Ingo Molnar Date: Wed Feb 2 07:10:03 2011 +0100 Merge commit 'v2.6.38-rc3' into perf/core Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar commit cdb0861c85c03fe80f4da033aab69df949579dc6 Author: Yinghai Lu Date: Tue Feb 1 10:51:23 2011 -0800 perf top: Fix TUI compilation > + slsmg_write_nstring(width >= syme->map->dso->long_name_len ? > + syme->map->dso->long_name : > + syme->map->dso->short_name, width); need update macro for that calling util/ui/browsers/top.c: In function ‘perf_top_browser__write’: util/ui/browsers/top.c:60:2: error: cast to pointer from integer of different size util/ui/browsers/top.c:60:2: error: comparison between pointer and integer util/ui/browsers/top.c:60:2: error: passing argument 1 of ‘SLsmg_write_nstring’ discards qualifiers from pointer target type /usr/include/slang.h:1728:16: note: expected ‘char *’ but argument is of type ‘const char *’ make: *** [util/ui/browsers/top.o] Error 1 Cc: Frederic Weisbecker Cc: H. Peter Anvin Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Tom Zanussi LKML-Reference: <4D48562B.20006@kernel.org> Signed-off-by: Yinghai Lu Signed-off-by: Arnaldo Carvalho de Melo commit 978f626c4e5b9524d1898788d8e34d86dfa00795 Author: Arnaldo Carvalho de Melo Date: Tue Feb 1 16:40:51 2011 -0200 perf tools: Don't try to build python bindings if Python.h not available Just leverage the test done for python support in 'python script', emitting a warning about losing those features if python-dev[el] is not installed. Reported-by: Peter Zijlstra Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 0015e2e101f5fd3256ab8b5a374c0e8806098871 Author: Arnaldo Carvalho de Melo Date: Tue Feb 1 16:18:10 2011 -0200 perf stat: Fix up resource release order That was causing a SEGV on selected old distros. Problem introduced in 7e2ed09. Reported-by: Peter Zijlstra Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 568bb7b8e856b9efb98a3f63259c717adc1b96b8 Author: Arnaldo Carvalho de Melo Date: Tue Feb 1 15:05:00 2011 -0200 perf tools: Fix up 'make clean' target It wasn't using $(OUTPUT) to rm *.o and there were some funny looking automake files that never get created but were being deleted anyway. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 067187fc9f1d09738fc833392e117f125cb6bbad Author: Arnaldo Carvalho de Melo Date: Tue Feb 1 14:57:02 2011 -0200 perf tools: Remove verbose build messages for the python binding Also now it builds it in a well known location: [acme@felicio linux]$ rm -rf ../build/perf/ [acme@felicio linux]$ mkdir ../build/perf [acme@felicio linux]$ make -j2 O=~acme/git/build/perf -C tools/perf/ [acme@felicio linux]$ ls -la ../build/perf/python/ total 152 -rwxrwxr-x 1 acme acme 147957 Feb 1 14:56 perf.so drwxrwxr-x 3 acme acme 17 Feb 1 14:56 temp [acme@felicio linux]$ [root@felicio ~]# strip ~acme/git/build/perf/python/perf.so [root@felicio ~]# ls -la ~acme/git/build/perf/python/perf.so -rwxrwxr-x 1 acme acme 46264 Feb 1 14:58 /home/acme/git/build/perf/python/perf.so [root@felicio ~]# export PYTHONPATH=~acme/git/build/perf/python/ [root@felicio ~]# ~acme/git/linux/tools/perf/python/twatch.py cpu: 0, pid: 7751, tid: 7751 { type: exit, pid: 7751, ppid: 7751, tid: 7751, ptid: 7751, time: 54562393512356} cpu: 0, pid: 13700, tid: 13700 { type: fork, pid: 7756, ppid: 13700, tid: 7756, ptid: 13700, time: 54562393746739} cpu: 1, pid: 7756, tid: 7756 { type: fork, pid: 7757, ppid: 7756, tid: 7757, ptid: 7756, time: 54562394246152} cpu: 1, pid: 7757, tid: 7757 { type: comm, pid: 7757, tid: 7757, comm: awk } cpu: 1, pid: 7757, tid: 7757 { type: exit, pid: 7757, ppid: 7757, tid: 7757, ptid: 7757, time: 54562395456813} Reported-by: Ingo Molnar Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7cf37e87dd2cfa17a64f28ea7f31eed4525f79e4 Author: Thomas Gleixner Date: Tue Feb 1 09:34:58 2011 +0100 time: Fix legacy arch fallout The xtime/dotimer cleanup broke architectures which do not implement clockevents. Time to send out another __do_IRQ threat. Signed-off-by: Thomas Gleixner Reported-by: Ingo Molnar Cc: Torben Hohn Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145905.23248.30458.stgit@localhost> Signed-off-by: Ingo Molnar commit f6bbc1daac964da551130dbf01809d3fbd178b2d Author: Arnaldo Carvalho de Melo Date: Mon Jan 31 20:56:27 2011 -0200 perf python: Fix build on 32-bit Where there are lots of errors related to python methods receiving 'char *' for things like file open mode, which break the build, also disable strict aliasing and fixup some other warnings. Now builds on both 32-bit and 64-bit fedora systems. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 823c7164a92a6347d46bb64aaae728b6d08f3bb8 Author: Arnaldo Carvalho de Melo Date: Mon Jan 31 19:45:38 2011 -0200 perf probe: Use %td for pointer arithmetic result %td is for ptrdiff_t, avoiding this warning on 32-bit: cc1: warnings being treated as errors builtin-probe.c: In function ‘opt_set_filter’: builtin-probe.c:176:4: error: format ‘%ld’ expects type ‘long int’, but argument 3 has type ‘int’ Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Masami Hiramatsu Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit c0443df1b69b59675fc6790e0ddce87c8ca00abf Author: Arnaldo Carvalho de Melo Date: Mon Jan 31 18:19:33 2011 -0200 perf top: Introduce slang based TUI Disabled by default as there are features found in the stdio based one that aren't implemented, like live annotation, filtering knobs data entry. Annotation hopefully will get somehow merged with the 'perf annotate' code. To use it: perf top --tui Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 229ade9ba36341f7369ecb4f134bcec9133520bf Author: Arnaldo Carvalho de Melo Date: Mon Jan 31 18:08:39 2011 -0200 perf tools: Don't fallback to setup_pager unconditionally Because in tools like 'top' we don't want the pager. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit e2830b5c1b2b2217894370a3b95af87d4a958401 Author: Torben Hohn Date: Thu Jan 27 16:00:32 2011 +0100 time: Make do_timer() and xtime_lock local to kernel/time/ All callers of do_timer() are converted to xtime_update(). The only users of xtime_lock are in kernel/time/. Make both local to kernel/time/ and remove them from the global header files. [ tglx: Reuse tick-internal.h instead of creating another local header file. Massaged changelog ] Signed-off-by: Torben Hohn Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org Signed-off-by: Thomas Gleixner commit eff9073790e1286aa12bf1c65814d3e0132b12e1 Author: Tejun Heo Date: Mon Jan 31 16:59:05 2011 +0100 x86: Rename incorrectly named parameter of numa_cpu_node() numa_cpu_node() prototype in numa_32.h has wrongly named parameter @apicid when it actually takes the CPU number. Change it to @cpu. Reported-by: Yinghai Lu Signed-off-by: Tejun Heo LKML-Reference: <20110131155905.GM7459@htj.dyndns.org> Signed-off-by: Ingo Molnar commit 8c3e10eb1968877d6a1957b7e790c6ce01bd56fc Author: Arnaldo Carvalho de Melo Date: Mon Jan 31 14:50:39 2011 -0200 perf top: Move display agnostic routines to util/top.[ch] Paving the way for a slang browser a la 'perf report --tui'. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7e2ed097538c57ff5268e9a6bced7c0b885809c8 Author: Arnaldo Carvalho de Melo Date: Sun Jan 30 11:59:43 2011 -0200 perf evlist: Store pointer to the cpu and thread maps So that we don't have to pass it around to the several methods that needs it, simplifying usage. There is one case where we don't have the thread/cpu map in advance, which is in the parsing routines used by top, stat, record, that we have to wait till all options are parsed to know if a cpu or thread list was passed to then create those maps. For that case consolidate the cpu and thread map creation via perf_evlist__create_maps() out of the code in top and record, while also providing a perf_evlist__set_maps() for cases where multiple evlists share maps or for when maps that represent CPU sockets, for instance, get crafted out of topology information or subsets of threads in a particular application are to be monitored, providing more granularity in specifying which cpus and threads to monitor. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 51563cd53c4b1c1790fccd2e0af0e2b756589af9 Merge: d123375 8161239 Author: Thomas Gleixner Date: Mon Jan 31 15:08:43 2011 +0100 Merge branch 'tip/rtmutex' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into core/locking *git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace tip/rtmutex: rtmutex: Simplify PI algorithm and make highest prio task get lock commit d12b0e24c56c6fb2398609f26858e5278d688840 Author: Torben Hohn Date: Thu Jan 27 16:00:27 2011 +0100 xtensa: Switch do_timer() to xtime_update() xtime_update() takes the xtime_lock itself. set_linux_timer() does not need to be protected by xtime_lock. [ tglx: This code is broken on SMP anyway. ] Signed-off-by: Torben Hohn Cc: Chris Zankel Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127150027.23248.61798.stgit@localhost> Signed-off-by: Thomas Gleixner commit 4ea1b72551d052a3993ef72ce06ecf5bdd125859 Author: Torben Hohn Date: Thu Jan 27 16:00:22 2011 +0100 sparc: Switch do_timer() to xtime_update() xtime_update() takes the xtime_lock itself. pcic_clear_clock_irq() and clear_clock_irq do not need to be protected by xtime_lock. Signed-off-by: Torben Hohn Acked-by: David S. Miller Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127150022.23248.80369.stgit@localhost> Signed-off-by: Thomas Gleixner commit bb1dfc1cf6c51ca42f7c05029a6f06df9092a0fc Author: Torben Hohn Date: Thu Jan 27 16:00:17 2011 +0100 parisc: Switch do_timer() to xtime_update() xtime_update() takes the xtime_lock itself. Signed-off-by: Torben Hohn Cc: hch@infradead.org Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: Helge Deller Cc: "James E.J. Bottomley" Cc: Kyle McMartin Cc: yong.zhang0@gmail.com LKML-Reference: <20110127150017.23248.22559.stgit@localhost> Signed-off-by: Thomas Gleixner commit e53f276beb655c711a5d1f25f800b61aa976e34f Author: Torben Hohn Date: Thu Jan 27 16:00:06 2011 +0100 m68k: Switch do_timer() to xtime_update() xtime_update() properly takes the xtime_lock Signed-off-by: Torben Hohn Cc: Sam Creasey Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: Roman Zippel Cc: hch@infradead.org Cc: yong.zhang0@gmail.com Cc: Geert Uytterhoeven Cc: Greg Ungerer LKML-Reference: <20110127150006.23248.71790.stgit@localhost> Signed-off-by: Thomas Gleixner commit 7bde2ab7cb51f14c6f6574f0f5a78445f2caed3e Author: Torben Hohn Date: Thu Jan 27 16:00:01 2011 +0100 m32r: Switch from do_timer() to xtime_update() xtime_update() does proper locking. Signed-off-by: Torben Hohn Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: Hirokazu Takata Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127150001.23248.68620.stgit@localhost> Signed-off-by: Thomas Gleixner commit 1aabd67d2e97e6affdf5a7c65f442ac91ace3f85 Author: Torben Hohn Date: Thu Jan 27 15:59:56 2011 +0100 ia64: Switch do_timer() to xtime_update() local_cpu_data->itm_next = new_itm; does not need to be protected by xtime_lock. xtime_update() takes the lock itself. Signed-off-by: Torben Hohn Cc: Fenghua Yu Cc: Tony Luck Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145956.23248.49107.stgit@localhost> Signed-off-by: Thomas Gleixner commit daad8b581e7f5e21a2f79e49d57d4f6a73b26510 Author: Torben Hohn Date: Thu Jan 27 15:59:51 2011 +0100 h8300: Switch do_timer() to xtime_update() xtime_update() takes the xtime_lock itself. Signed-off-by: Torben Hohn Cc: Yoshinori Sato Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145951.23248.92727.stgit@localhost> Signed-off-by: Thomas Gleixner commit 57464bd87f708e75b47312766e3fc8dc3aaf66ad Author: Torben Hohn Date: Thu Jan 27 15:59:46 2011 +0100 frv: Switch do_timer() to xtime_update() __set_LEDS() does not need to be protected by xtime_lock. its used unprotected in other places. [ tglx: Removed stale comment ] Signed-off-by: Torben Hohn Cc: hch@infradead.org Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: David Howells Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145946.23248.57952.stgit@localhost> Signed-off-by: Thomas Gleixner commit 36cb07bb8118cb14211ef25c58026f005877c47d Author: Torben Hohn Date: Thu Jan 27 15:59:41 2011 +0100 cris: arch-v32: Switch do_timer() to xtime_update() xtime_update() takes the xtime_lock itself. Signed-off-by: Torben Hohn Cc: hch@infradead.org Cc: Jesper Nilsson Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: Mikael Starvik Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145941.23248.92547.stgit@localhost> Signed-off-by: Thomas Gleixner commit 17588b99183ece563013622afeefd37eb8e68fd3 Author: Torben Hohn Date: Thu Jan 27 15:59:36 2011 +0100 cris: arch-v10: Switch do_timer() to xtime_update() This code failed to take the xtime_lock, which must be held when calling do_timer(). Use the safe version xtime_update() Signed-off-by: Torben Hohn Cc: hch@infradead.org Cc: Jesper Nilsson Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: Mikael Starvik Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145936.23248.16192.stgit@localhost> Signed-off-by: Thomas Gleixner commit 4196b892d55caaf2c98da05e80472ca482ca19fe Author: Torben Hohn Date: Thu Jan 27 15:59:31 2011 +0100 blackfin: Switch from do_timer() to xtime_update() xtime_update() takes the xtime_lock itself. Signed-off-by: Torben Hohn Cc: Mike Frysinger Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145931.23248.33917.stgit@localhost> Signed-off-by: Thomas Gleixner commit ec2dff2febf19ff2109c2eb3e56d5a969fe399e2 Author: Torben Hohn Date: Thu Jan 27 15:59:26 2011 +0100 arm/mach-clps711x: Switch do_timer() to xtime_update() do_timer() requires holding the xtime_lock, which this code did not do. Use the safe version xtime_update() Signed-off-by: Torben Hohn Cc: Russell King Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145926.23248.56369.stgit@localhost> Signed-off-by: Thomas Gleixner commit 6906e33cc555c390cd091f6f363b783322dfedf6 Author: Torben Hohn Date: Thu Jan 27 15:59:21 2011 +0100 arm: Switch from do_timer() to xtime_update() xtime_update takes the xtime_lock itself. Signed-off-by: Torben Hohn Cc: Russell King Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145920.23248.75541.stgit@localhost> Signed-off-by: Thomas Gleixner commit 1340f3e0b29b745a33f431455c3a37f48197bc81 Author: Torben Hohn Date: Thu Jan 27 15:59:15 2011 +0100 alpha: Change do_timer() to xtime_update() xtime_update() takes the xtime_lock itself. timer_interrupt() is only called on the boot cpu. See do_entInt(). So "state" in timer_interrupt does not require protection by xtime_lock. Signed-off-by: Torben Hohn Cc: Richard Henderson Cc: Ivan Kokshaysky Cc: Matt Turner Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: hch@infradead.org Cc: yong.zhang0@gmail.com LKML-Reference: <20110127145915.23248.20919.stgit@localhost> Signed-off-by: Thomas Gleixner commit f0af911a9dec9de702645182c8d269449e24d24b Author: Torben Hohn Date: Thu Jan 27 15:59:10 2011 +0100 time: Provide xtime_update() xtime_update() takes xtime_lock write locked and calls do_timer(). Provided to replace the do_timer() calls in the architecture code. Signed-off-by: Torben Hohn Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145910.23248.21379.stgit@localhost> Signed-off-by: Thomas Gleixner commit 79ecaf0d15344d78904becf0f25de3fc9b49d430 Author: Thomas Gleixner Date: Mon Jan 31 11:07:54 2011 +0100 time: Remove unused __get_wall_to_monotonic() No users left. Remove it. Signed-off-by: Thomas Gleixner commit 48cf76f7104f655bbd48a75c7759dce82c3e1ab6 Author: Torben Hohn Date: Thu Jan 27 15:59:05 2011 +0100 time: Provide get_xtime_and_monotonic_offset() The hrtimer code accesses timekeeping variables under xtime_lock. Provide a sensible accessor function and use it. [ tglx: Removed the conditionals, unused variable, fixed codingstyle and massaged changelog ] Signed-off-by: Torben Hohn Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145905.23248.30458.stgit@localhost> Signed-off-by: Thomas Gleixner commit fbad1ea94159a71bc0f68b00e57ae803606af9fb Author: Torben Hohn Date: Thu Jan 27 15:59:00 2011 +0100 time: Move get_jiffies_64 to kernel/time/jiffies.c Move the jiffies access functions to the jiffies clocksource code. [ tglx: Add missing include ] Signed-off-by: Torben Hohn Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145900.23248.73352.stgit@localhost> Signed-off-by: Thomas Gleixner commit 871cf1e5f2a17702f58539a3af8b18fc8666ad4c Author: Torben Hohn Date: Thu Jan 27 15:58:55 2011 +0100 time: Move do_timer() to kernel/time/timekeeping.c do_timer() is primary timekeeping related. calc_global_load() is called from do_timer() as well, but that's more for historical reasons. [ tglx: Fixed up the calc_global_load() reject andmassaged changelog ] Signed-off-by: Torben Hohn Cc: Peter Zijlstra Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145855.23248.56933.stgit@localhost> Signed-off-by: Thomas Gleixner commit 1fb0ef31f428f345a7c3666f8e7444a563edd537 Author: Thomas Gleixner Date: Mon Jan 31 08:57:41 2011 +0100 genirq: Fix affinity notifier fallout The new code of commit cd7eab44e(genirq: Add IRQ affinity notifiers) references irq_desc.affinity which fails to compile with CONFIG_GENERIC_HARDIRQS_NO_DEPRECATED=y. Use irq_desc.irq_data.affinity instead. Signed-off-by: Thomas Gleixner Cc: Ben Hutchings commit f8a9530939ed87b9a1b1a038b90e355098b679a2 Author: Arnaldo Carvalho de Melo Date: Sun Jan 30 10:46:46 2011 -0200 perf evlist: Move evlist methods to evlist.c They were on evsel.c because they came from refactoring existing evsel methods, so, to make reviewing the changes easier, I kept it there, now its a plain move. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 877108e42b1b9ba64857c4030cf356ecc120fd18 Author: Arnaldo Carvalho de Melo Date: Sat Jan 29 15:44:29 2011 -0200 perf tools: Initial python binding First clarifying that this kind of binding is not a replacement or an equivalent to the 'perf script' way of using python with perf. The 'perf script' way is to process events and look at a given script for some python function that matches the events to pass each event for processing. This is a python module, i.e. everything is driven from the python script, that merely uses "import perf" or "from perf import". perf script is focused on tracepoints, this binding is focused on profiling as an initial target. More work is needed to make available tracepoint specific variables as event variables accessible via this binding. There is one example of such usage model, in tools/perf/python/twatch.py, a tool to watch "cycles" events together with task (fork, exit) and comm perf events. For now, due to me not being able to grok how python distutils cope with building C extensions outside the sources dir the install target just builds it, I'm using it as: [root@emilia linux]# export PYTHONPATH=~acme/git/build/perf/lib.linux-x86_64-2.6/ [root@emilia linux]# tools/perf/python/twatch.py cpu: 4, pid: 30126, tid: 30126 { type: mmap, pid: 30126, tid: 30126, start: 0x4, length: 0x82e9ca03, offset: 0, filename: } cpu: 6, pid: 47, tid: 47 { type: mmap, pid: 47, tid: 47, start: 0x6, length: 0xbef87c36, offset: 0, filename: } cpu: 1, pid: 0, tid: 0 { type: mmap, pid: 0, tid: 0, start: 0x1, length: 0x775d1904, offset: 0, filename: } cpu: 7, pid: 0, tid: 0 { type: mmap, pid: 0, tid: 0, start: 0x7, length: 0xc750aeb6, offset: 0, filename: } cpu: 5, pid: 2255, tid: 2255 { type: mmap, pid: 2255, tid: 2255, start: 0x5, length: 0x76669635, offset: 0, filename: } cpu: 0, pid: 0, tid: 0 { type: mmap, pid: 0, tid: 0, start: 0, length: 0x6422ef6b, offset: 0, filename: } cpu: 2, pid: 2255, tid: 2255 { type: mmap, pid: 2255, tid: 2255, start: 0x2, length: 0xe078757a, offset: 0, filename: } cpu: 1, pid: 5769, tid: 5769 { type: fork, pid: 30127, ppid: 5769, tid: 30127, ptid: 5769, time: 103893991270534} cpu: 6, pid: 30127, tid: 30127 { type: comm, pid: 30127, tid: 30127, comm: ls } cpu: 6, pid: 30127, tid: 30127 { type: exit, pid: 30127, ppid: 30127, tid: 30127, ptid: 30127, time: 103893993273024} The first 8 mmap events in this 8 way machine are a mistery that is still being investigated. More of the tools/perf/util/ APIs will be exposed via this python binding as the need arises. For now the focus is on creating events and processing them, symbol resolution is an obvious next step, with tracepoint variables as a close second step. Cc: Clark Williams Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 8115d60c323dd9931b95221c0a392aeddc1d6ef3 Author: Arnaldo Carvalho de Melo Date: Sat Jan 29 14:01:45 2011 -0200 perf tools: Kill event_t typedef, use 'union perf_event' instead And move the event_t methods to the perf_event__ too. No code changes, just namespace consistency. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 8d50e5b4171a69cf48ca94a1e7c14033d0b4771d Author: Arnaldo Carvalho de Melo Date: Sat Jan 29 13:02:00 2011 -0200 perf tools: Rename 'struct sample_data' to 'struct perf_sample' Making the namespace more uniform. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 93fc64f14472ae24fd640bf3834a178f59142842 Author: Arnaldo Carvalho de Melo Date: Sat Jan 29 12:08:00 2011 -0200 perf top: Switch to non overwrite mode Just like 'perf record'. Warn the user when PERF_RECORD_LOST events happen. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 7bb41152b9be7e31f10d8919bce5034135525d9d Author: Arnaldo Carvalho de Melo Date: Sat Jan 29 09:08:13 2011 -0200 perf evlist: Support non overwrite mode in perf_evlist__read_on_cpu I.e. stash the overwrite mode in struct perf_evlist and act accordingly in perf_evlist__read_on_cpu, not checking for overwrites and touching the tail after consuming one event, like perf record does, for instance. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit ef2bf6d043ac9bd4a6f38d862af407154a4754d9 Author: Arnaldo Carvalho de Melo Date: Sat Jan 29 09:04:40 2011 -0200 perf events: Account PERF_RECORD_LOST events in event__process Right now this function is only used by perf top, that uses PROT_READ only, i.e. overwrite mode, so no PERF_RECORD_LOST events are generated, but don't forget those events. The patch that moved this out of perf top was made so that this routine could be used by 'perf probe' in the uprobes patchset, so perhaps there they need to check for LOST events and warn the user, as will be done in the following patches that will switch 'perf top' to non overwrite mode (mmap with PROT_READ|PROT_WRITE). Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit dc82009aac6ee6e423b48de43a251745c62ab012 Author: Arnaldo Carvalho de Melo Date: Fri Jan 28 14:49:19 2011 -0200 perf record: No need to check for overwrites As we open the mmap with (PROT_READ | PROT_WRITE), signalling the kernel with perf_mmap__write_tail() when consuming data, so the kernel will not overwrite. Suggested-by: Peter Zijlstra Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 4e62445b90ac4ef708bd11c7ae052b1d5ef765b5 Author: Tejun Heo Date: Fri Jan 28 17:22:48 2011 +0100 x86: Fix build failure on X86_UP_APIC Commit 4c321ff8 (x86: Replace cpu_2_logical_apicid[] with early percpu variable) and following changes introduced and used x86_cpu_to_logical_apicid percpu variable. It was declared and defined inside CONFIG_SMP && CONFIG_X86_32 but if CONFIG_X86_UP_APIC is set UP configuration makes use of it and build fails. Fix it by declaring and defining it inside CONFIG_X86_LOCAL_APIC && CONFIG_X86_32. Signed-off-by: Tejun Heo Reported-by: Ingo Molnar Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <20110128162248.GA25746@htj.dyndns.org> Signed-off-by: Ingo Molnar commit 8db78cc4b4048e3add40bca1bc3e55057c319256 Author: Tejun Heo Date: Sun Jan 23 14:37:42 2011 +0100 x86: Unify NUMA initialization between 32 and 64bit Now that everything else is unified, NUMA initialization can be unified too. * numa_init_array() and init_cpu_to_node() are moved from numa_64 to numa. * numa_32::initmem_init() is updated to call numa_init_array() and setup_arch() to call init_cpu_to_node() on 32bit too. * x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on 32bit too. This is safe now as numa_init_array() will initialize it early during boot. This makes NUMA mapping fully initialized before setup_per_cpu_areas() on 32bit too and thus makes the first percpu chunk which contains all the static variables and some of dynamic area allocated with NUMA affinity correctly considered. Signed-off-by: Tejun Heo Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-17-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar Reported-by: Eric Dumazet Reviewed-by: Pekka Enberg commit de2d9445f1627830ed2ebd00ee9d851986c940b5 Author: Tejun Heo Date: Sun Jan 23 14:37:41 2011 +0100 x86: Unify node_to_cpumask_map handling between 32 and 64bit x86_32 has been managing node_to_cpumask_map explicitly from map_cpu_to_node() and friends in a rather ugly way. With previous changes, it's now possible to share the code with 64bit. * When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are implemented in numa.c and shared by 32 and 64bit. CONFIG_NUMA_EMU versions still live in numa_64.c. NUMA_EMU's dependency on 64bit is planned to be removed and the above should go away together. * identify_cpu() now calls numa_add_cpu() for 32bit too. This makes the explicit mask management from map_cpu_to_node() unnecessary. * The whole x86_32 specific map_cpu_to_node() chunk is no longer necessary. Dropped. Signed-off-by: Tejun Heo Reviewed-by: Pekka Enberg Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-16-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar Cc: David Rientjes Cc: Shaohui Zheng commit 645a79195f66eb68ef3ab2b21d9829ac3aa085a9 Author: Tejun Heo Date: Sun Jan 23 14:37:40 2011 +0100 x86: Unify CPU -> NUMA node mapping between 32 and 64bit Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for CPU -> NUMA node mapping. Replace it with early_percpu variable x86_cpu_to_node_map and share the mapping code with 64bit. * USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too. * x86_cpu_to_node_map and numa_set/clear_node() are moved from numa_64 to numa. For now, on 32bit, x86_cpu_to_node_map is initialized with 0 instead of NUMA_NO_NODE. This is to avoid introducing unexpected behavior change and will be updated once init path is unified. * srat_detect_node() is now enabled for x86_32 too. It calls numa_set_node() and initializes the mapping making explicit cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary. Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar Cc: David Rientjes commit bbc9e2f452d9c4b166d1f9a78d941d80173312fe Author: Tejun Heo Date: Sun Jan 23 14:37:39 2011 +0100 x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit The mapping between cpu/apicid and node is done via apicid_to_node[] on 64bit and apicid_2_node[] + apic->x86_32_numa_cpu_node() on 32bit. This difference makes it difficult to further unify 32 and 64bit NUMA handling. This patch unifies it by replacing both apicid_to_node[] and apicid_2_node[] with __apicid_to_node[] array, which is accessed by two accessors - set_apicid_to_node() and numa_cpu_node(). On 64bit, numa_cpu_node() always consults __apicid_to_node[] directly while 32bit goes through apic->numa_cpu_node() method to allow apic implementations to override it. srat_detect_node() for amd cpus contains workaround for broken NUMA configuration which assumes relationship between APIC ID, HT node ID and NUMA topology. Leave it to access __apicid_to_node[] directly as mapping through CPU might result in undesirable behavior change. The comment is reformatted and updated to note the ugliness. Signed-off-by: Tejun Heo Reviewed-by: Pekka Enberg Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar Cc: David Rientjes commit 89e5dc218e084e13a3996db6693b01478912f4ee Author: Tejun Heo Date: Sun Jan 23 14:37:38 2011 +0100 x86: Replace apic->apicid_to_node() with ->x86_32_numa_cpu_node() apic->apicid_to_node() is 32bit specific apic operation which determines NUMA node for a CPU. Depending on the APIC implementation, it can be easier to determine NUMA node from either physical or logical apicid. Currently, ->apicid_to_node() takes @logical_apicid and calls hard_smp_processor_id() if the physical apicid is needed. This prevents NUMA mapping from being queried from a different CPU, which in turn makes it impossible to initialize NUMA mapping before SMP bringup. This patch replaces apic->apicid_to_node() with ->x86_32_numa_cpu_node() which takes @cpu, from which both logical and physical apicids can easily be determined. While at it, drop duplicate implementations from bigsmp_32 and summit_32, and use the default one. Signed-off-by: Tejun Heo Reviewed-by: Pekka Enberg Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-13-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit df04cf011b0657ddc782b48d455f7e232b9be41c Author: Tejun Heo Date: Sun Jan 23 14:37:37 2011 +0100 x86: Implement x86_32_early_logical_apicid() for numaq_32 Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-12-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 3b39d937843e071c59b3aeecbf7de4750f095b12 Author: Tejun Heo Date: Sun Jan 23 14:37:36 2011 +0100 x86: Implement x86_32_early_logical_apicid() for summit_32 Factor out logical apic id calculation from summit_init_apic_ldr() and use it for the x86_32_early_logical_apicid() callback. Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-11-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 12bf24a47c1a095233cc8a8b863b509a0d8e0f2c Author: Tejun Heo Date: Sun Jan 23 14:37:35 2011 +0100 x86: Implement x86_32_early_logical_apicid() for bigsmp_32 Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-10-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 3f6f6798889d50ec7ca8eef1d100cda37dc658ea Author: Tejun Heo Date: Sun Jan 23 14:37:34 2011 +0100 x86: Implement the default x86_32_early_logical_apicid() Implement x86_32_early_logical_apicid() for the default apic flat routing. Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-9-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit acb8bc09c6185e4d3d582d0076aaa6a89f19d8c5 Author: Tejun Heo Date: Sun Jan 23 14:37:33 2011 +0100 x86: Add apic->x86_32_early_logical_apicid() On x86_32, the mapping between cpu and logical apic ID differs depending on the specific apic implementation in use. The mapping is initialized while bringing up CPUs; however, this makes early inits ignore memory topology. Add a x86_32 specific apic->x86_32_early_logical_apicid() which is called early during boot to query the mapping. The mapping is later verified against the result of init_apic_ldr(). The method is allowed to return BAD_APICID if it can't be determined early. noop variant which always returns BAD_APICID is implemented and added to all x86_32 apic implementations. Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-8-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 7632611f534340182c832d2b139cb19676f24e1a Author: Tejun Heo Date: Sun Jan 23 14:37:32 2011 +0100 x86: Kill apic->cpu_to_logical_apicid() After the previous patch, apic->cpu_to_logical_apicid() is no longer used. Kill it. For apic types with custom cpu_to_logical_apicid() which is also used for other purposes, remove the function and modify its users to do the mapping directly. #ifdef's on CONFIG_SMP in es7000_32 and summit_32 are ignored during conversion as they are not used for UP kernels. Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-7-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 6f802c4bfa2acf1bffa8341fe9084da0205d581d Author: Tejun Heo Date: Sun Jan 23 14:37:31 2011 +0100 x86: Always use x86_cpu_to_logical_apicid for cpu -> logical apic id Currently, cpu -> logical apic id translation is done by apic->cpu_to_logical_apicid() callback which may or may not use x86_cpu_to_logical_apicid. This is unnecessary as it should always equal logical_smp_processor_id() which is known early during CPU bring up. Initialize x86_cpu_to_logical_apicid after apic->init_apic_ldr() in setup_local_APIC() and always use x86_cpu_to_logical_apicid for cpu -> logical apic id mapping. Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-6-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 4c321ff8a01a95badf5d5403d80ca4e0ab07fce7 Author: Tejun Heo Date: Sun Jan 23 14:37:30 2011 +0100 x86: Replace cpu_2_logical_apicid[] with early percpu variable Unlike x86_64, on x86_32, the mapping from cpu to logical apicid may vary depending on apic in use. cpu_2_logical_apicid[] array is used for this mapping. Replace it with early percpu variable x86_cpu_to_logical_apicid to make it better aligned with other mappings. Signed-off-by: Tejun Heo Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-5-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 1245e1668c6e52bee76a423f8fab3bfcdd6226ae Author: Tejun Heo Date: Sun Jan 23 14:37:29 2011 +0100 x86: Make default_send_IPI_mask_sequence/allbutself_logical() 32bit only Both functions are used only in 32bit. Put them inside CONFIG_X86_32. This is to prepare for logical apicid handling update. - Cyrill Gorcunov spotted that I forgot to move declarations in ipi.h under CONFIG_X86_32. Fixed. Signed-off-by: Tejun Heo Reviewed-by: Pekka Enberg Reviewed-by: Cyrill Gorcunov Acked-by: Yinghai Lu Cc: eric.dumazet@gmail.com Cc: brgerst@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-4-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit b78aa66b1fe4179d28e3f6502dc179773519a1bb Author: Tejun Heo Date: Sun Jan 23 14:37:28 2011 +0100 x86: Drop x86_32 MAX_APICID Commit 56d91f13 (x86, acpi: Add MAX_LOCAL_APIC for 32bit) added MAX_LOCAL_APIC for x86_32 but didn't replace MAX_APICID users with it. Convert MAX_APICID users to MAX_LOCAL_APIC and drop MAX_APICID. Signed-off-by: Tejun Heo Reviewed-by: Pekka Enberg Acked-by: Yinghai Lu Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-3-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit bd22a2f1982fa3e90ce7d5d011c37d88aa67e73c Author: Tejun Heo Date: Sun Jan 23 14:37:27 2011 +0100 x86: Kill unused static boot_cpu_logical_apicid in smpboot.c Signed-off-by: Tejun Heo Reviewed-by: Pekka Enberg Acked-by: Yinghai Lu Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-2-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar commit 54489c189b1a0c10eaf21c6d2c5916b50442c871 Author: Han Pingtian Date: Tue Jan 25 07:39:00 2011 +0800 perf test: Fix return values checking Fixing some cut'n'paste mistakes. LKML-Reference: <20110124233900.GA3443@epc900.nay.redhat.com> Signed-off-by: Han Pingtian [ committer note: I had already removed the CPU_ALLOC calls ] Signed-off-by: Arnaldo Carvalho de Melo commit 3c42258c9a4db70133fa6946a275b62a16792bb5 Author: Masami Hiramatsu Date: Thu Jan 20 23:15:45 2011 +0900 perf probe: Add filters support for available functions Add filters support for available function list. Default filter is "!_*" for filtering out local-purpose symbols. e.g.: # perf probe --filter="add*" -F add_disk add_disk_randomness add_input_randomness add_interrupt_randomness add_memory add_page_to_unevictable_list add_page_wait_queue ... Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Chase Douglas Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110120141545.25915.85930.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit bd09d7b5efeb13965b6725b4a3e9944908bca9d2 Author: Masami Hiramatsu Date: Thu Jan 20 23:15:39 2011 +0900 perf probe: Add variable filter support Add filters support for available variable list. Default filter is "!__k???tab_*&!__crc_*" for filtering out automatically generated symbols. The format of filter rule is "[!]GLOBPATTERN", so you can use wild cards. If the filter rule starts with '!', matched variables are filter out. e.g.: # perf probe -V schedule --externs --filter=cpu* Available variables at schedule @ cpumask_var_t cpu_callout_mask cpumask_var_t cpu_core_map cpumask_var_t cpu_isolated_map cpumask_var_t cpu_sibling_map int cpu_number long unsigned int* cpu_bit_bitmap ... Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Chase Douglas Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110120141539.25915.43401.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu [ committer note: Removed the elf.h include as it was fixed up in e80711c] Signed-off-by: Arnaldo Carvalho de Melo commit 68baa431ec2f14ba7510d4e79bceb6ceaf0d3b74 Author: Masami Hiramatsu Date: Thu Jan 20 23:15:30 2011 +0900 perf tools: Add strfilter for general purpose string filter Add strfilter for general purpose string filter. Every filter rules are descrived by glob matching pattern and '!' prefix which means Logical NOT. A strfilter consists of those filter rules connected with '&' and '|'. A set of rules can be folded by using '(' and ')'. It also accepts spaces around rules and those operators. Format: ::= | "!" | | "(" ")" ::= "&" | "|" e.g.: "(add* | del*) & *timer" filter rules pass strings which start with add or del and end with timer. This will be used by perf probe --filter. Changes in V2: - Fix to check result of strdup() and strfilter__alloc(). - Encapsulate and simplify interfaces as like regex(3). Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110120141530.25915.12673.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 8161239a8bcce9ad6b537c04a1fa3b5c68bae693 Author: Lai Jiangshan Date: Fri Jan 14 17:09:41 2011 +0800 rtmutex: Simplify PI algorithm and make highest prio task get lock In current rtmutex, the pending owner may be boosted by the tasks in the rtmutex's waitlist when the pending owner is deboosted or a task in the waitlist is boosted. This boosting is unrelated, because the pending owner does not really take the rtmutex. It is not reasonable. Example. time1: A(high prio) onwers the rtmutex. B(mid prio) and C (low prio) in the waitlist. time2 A release the lock, B becomes the pending owner A(or other high prio task) continues to run. B's prio is lower than A, so B is just queued at the runqueue. time3 A or other high prio task sleeps, but we have passed some time The B and C's prio are changed in the period (time2 ~ time3) due to boosting or deboosting. Now C has the priority higher than B. ***Is it reasonable that C has to boost B and help B to get the rtmutex? NO!! I think, it is unrelated/unneed boosting before B really owns the rtmutex. We should give C a chance to beat B and win the rtmutex. This is the motivation of this patch. This patch *ensures* only the top waiter or higher priority task can take the lock. How? 1) we don't dequeue the top waiter when unlock, if the top waiter is changed, the old top waiter will fail and go to sleep again. 2) when requiring lock, it will get the lock when the lock is not taken and: there is no waiter OR higher priority than waiters OR it is top waiter. 3) In any time, the top waiter is changed, the top waiter will be woken up. The algorithm is much simpler than before, no pending owner, no boosting for pending owner. Other advantage of this patch: 1) The states of a rtmutex are reduced a half, easier to read the code. 2) the codes become shorter. 3) top waiter is not dequeued until it really take the lock: they will retain FIFO when it is stolen. Not advantage nor disadvantage 1) Even we may wakeup multiple waiters(any time when top waiter changed), we hardly cause "thundering herd", the number of wokenup task is likely 1 or very little. 2) two APIs are changed. rt_mutex_owner() will not return pending owner, it will return NULL when the top waiter is going to take the lock. rt_mutex_next_owner() always return the top waiter. will not return NULL if we have waiters because the top waiter is not dequeued. I have fixed the code that use these APIs. need updated after this patch is accepted 1) Document/* 2) the testcase scripts/rt-tester/t4-l2-pi-deboost.tst Signed-off-by: Lai Jiangshan LKML-Reference: <4D3012D5.4060709@cn.fujitsu.com> Reviewed-by: Steven Rostedt Signed-off-by: Steven Rostedt commit d380eaaea70d775c0520dcb5702ea5d2a56b7be9 Merge: dda9911 ef1d1af Author: Ingo Molnar Date: Thu Jan 27 19:23:20 2011 +0100 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/core commit dda99116969142cc41e945a1047a419b937536af Author: Yinghai Lu Date: Fri Jan 21 15:30:01 2011 -0800 x86, perf: Change two init functions to static init_hw_perf_events() is called via early_initcall now. x86_pmu_event_init is x86_pmu member function. So we can change them to static. Signed-off-by: Yinghai Lu Cc: Peter Zijlstra Cc: Paul Mackerras Cc: Arnaldo Carvalho de Melo LKML-Reference: <4D3A16F9.109@kernel.org> Signed-off-by: Ingo Molnar commit 6ea72f12069306b235151c5b05ac0cca7e1dedfa Author: Peter Zijlstra Date: Wed Jan 26 13:36:03 2011 +0100 sched: Avoid expensive initial update_cfs_load(), on UP too Fix the build on UP. Signed-off-by: Peter Zijlstra Cc: Paul Turner LKML-Reference: <20110122044852.102126037@google.com> Signed-off-by: Ingo Molnar commit d123375425d7df4b6081a631fc1203fceafa59b2 Author: Thomas Gleixner Date: Wed Jan 26 21:32:01 2011 +0100 rwsem: Remove redundant asmregparm annotation Peter Zijlstra pointed out, that the only user of asmregparm (x86) is compiling the kernel already with -mregparm=3. So the annotation of the rwsem functions is redundant. Remove it. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: David Howells Cc: Benjamin Herrenschmidt Cc: Matt Turner Cc: Tony Luck Cc: Heiko Carstens Cc: Paul Mundt Cc: David Miller Cc: Chris Zankel LKML-Reference: Signed-off-by: Thomas Gleixner commit aac72277fda6ef788bb8d5deaa502ce9b9b6e472 Author: Thomas Gleixner Date: Wed Jan 26 20:06:06 2011 +0000 rwsem: Move duplicate function prototypes to linux/rwsem.h All architecture specific rwsem headers carry the same function prototypes. Just x86 adds asmregparm, which is an empty define on all other architectures. S390 has a stale rwsem_downgrade_write() prototype. Remove the duplicates and add the prototypes to linux/rwsem.h Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: David Howells Cc: Benjamin Herrenschmidt Cc: Richard Henderson Acked-by: Tony Luck Acked-by: Heiko Carstens Cc: Paul Mundt Acked-by: David Miller Cc: Chris Zankel LKML-Reference: <20110126195833.970840140@linutronix.de> Signed-off-by: Thomas Gleixner commit 41e5887fa39ab272d9266a09cbefdef270e28b93 Author: Thomas Gleixner Date: Wed Jan 26 20:06:03 2011 +0000 rwsem: Unify the duplicate rwsem_is_locked() inlines Instead of having the same implementation in each architecture, move it to linux/rwsem.h and remove the duplicates. It's unlikely that an arch will ever implement something different, but we can deal with that when it happens. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: David Howells Cc: Benjamin Herrenschmidt Cc: Matt Turner Acked-by: Tony Luck Acked-by: Heiko Carstens Cc: Paul Mundt Acked-by: David Miller Cc: Chris Zankel LKML-Reference: <20110126195833.876773757@linutronix.de> Signed-off-by: Thomas Gleixner commit 12249b34414dba7f386aadcf6be7ca36c6878300 Author: Thomas Gleixner Date: Wed Jan 26 20:06:00 2011 +0000 rwsem: Move duplicate init macros and functions to linux/rwsem.h The rwsem initializers and related macros and functions are mostly the same. Some of them lack the lockdep initializer, but having it in place does not matter for architectures which do not support lockdep. powerpc, sparc, x86: No functional change sh, s390: Removes the duplicate init_rwsem (inline and #define) alpha, ia64, xtensa: Use the lockdep capable init function in lib/rwsem.c which is just uninlining the init function for the LOCKDEP=n case Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: David Howells Cc: Benjamin Herrenschmidt Cc: Matt Turner Acked-by: Tony Luck Acked-by: Heiko Carstens Cc: Paul Mundt Acked-by: David Miller Cc: Chris Zankel LKML-Reference: <20110126195833.771812729@linutronix.de> commit 1c8ed640d918290ddc1de5ada02ef6686a733c9f Author: Thomas Gleixner Date: Wed Jan 26 20:05:56 2011 +0000 rwsem: Move duplicate struct rwsem declaration to linux/rwsem.h The difference between these declarations is the data type of the count member and the lack of lockdep in some architectures/ long is equivivalent to signed long and the #ifdef guarded dep_map member does not hurt anyone. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: David Howells Cc: Benjamin Herrenschmidt Cc: Matt Turner Acked-by: Tony Luck Acked-by: Heiko Carstens Cc: Paul Mundt Acked-by: David Miller Cc: Chris Zankel LKML-Reference: <20110126195833.679641914@linutronix.de> Signed-off-by: Thomas Gleixner commit bde11efbc21ea84c3351464a422b467eaefabb9a Author: Thomas Gleixner Date: Wed Jan 26 20:05:53 2011 +0000 x86: Cleanup rwsem_count_t typedef Remove the typedef which has no real reason to be there. Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: David Howells Cc: Benjamin Herrenschmidt Cc: Matt Turner Cc: Tony Luck Cc: Heiko Carstens Cc: Paul Mundt Cc: David Miller Cc: Chris Zankel LKML-Reference: <20110126195833.580335506@linutronix.de> commit c16a87ce063f79e0ec7d25ce2950e1bc6db03c72 Author: Thomas Gleixner Date: Wed Jan 26 20:05:50 2011 +0000 rwsem: Cleanup includes All rwsem implementations include the same headers. Include them from include/linux/rwsem.h Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: David Howells Cc: Benjamin Herrenschmidt Cc: Matt Turner Acked-by: Tony Luck Acked-by: Heiko Carstens Cc: Paul Mundt Acked-by: David Miller Cc: Chris Zankel LKML-Reference: <20110126195833.483520950@linutronix.de> commit d04fa5a3ba06c3b7a1c4a6860d0fa4825507a755 Author: Thomas Gleixner Date: Sun Jan 23 15:30:09 2011 +0100 locking: Remove deprecated lock initializers Last users are gone. Remove the left overs. Signed-off-by: Thomas Gleixner commit 10389a15e25fd4784d42de7e0e3fc8c242f2011d Author: Thomas Gleixner Date: Sun Jan 23 15:25:56 2011 +0100 cred: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner commit 92578c0b8078f6919f9b47e7e16a1cf770bd127b Author: Thomas Gleixner Date: Sun Jan 23 15:24:55 2011 +0100 kthread: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner commit 235454c99851ba21038061b9acf38d1a636068c5 Author: Thomas Gleixner Date: Sun Jan 23 15:23:14 2011 +0100 xtensa: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner Cc: Chris Zankel commit 22e650045899011b028e40625bc73df9f3260bac Author: Thomas Gleixner Date: Sun Jan 23 15:21:25 2011 +0100 um: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner Cc: Jeff Dike commit 24774fbdeab8f6ac05a19e81bd645b0f7e5d2bb7 Author: Thomas Gleixner Date: Sun Jan 23 15:19:12 2011 +0100 sparc: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner Acked-by: David S. Miller commit 7424cdf77b1b2975d82619084f20f0055f715166 Author: Thomas Gleixner Date: Sun Jan 23 15:16:35 2011 +0100 mips: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner Cc: Ralf Baechle commit e41c8ab174f47ed5ed10a365482d0d7b0e352beb Author: Thomas Gleixner Date: Sun Jan 23 15:14:15 2011 +0100 cris: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner Cc: Jesper Nilsson commit 61bb46082775acd18c712607615a8b7dbeff7873 Author: Thomas Gleixner Date: Sun Jan 23 15:10:51 2011 +0100 alpha: Replace deprecated spinlock initialization SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner Cc: Matt Turner commit f97b12cce6dea51880a6a89d4607c29c70a6a841 Merge: ccaa8d6 1bae4ce Author: Thomas Gleixner Date: Thu Jan 27 12:29:13 2011 +0100 Merge commit 'v2.6.38-rc2' into core/locking Reason: Update to mainline before adding the locking cleanup Signed-off-by: Thomas Gleixner commit ccaa8d657117bb1876d471bd91579d774106778d Author: Arnd Bergmann Date: Tue Jan 25 23:17:32 2011 +0100 rtmutex-tester: Remove BKL tests The BKL is going away, no need to test it any more. I left the definitions of the test case numbers in, so that the other tests do not get renumbered. Signed-off-by: Arnd Bergmann Cc: Arjan van de Ven Cc: Ingo Molnar Cc: Andrew Morton LKML-Reference: <1295993854-4971-19-git-send-email-arnd@arndb.de> Signed-off-by: Thomas Gleixner commit da7a735e51f9622eb3e1672594d4a41da01d7e4f Author: Peter Zijlstra Date: Mon Jan 17 17:03:27 2011 +0100 sched: Fix switch_from_fair() When a task is taken out of the fair class we must ensure the vruntime is properly normalized because when we put it back in it will assume to be normalized. The case that goes wrong is when changing away from the fair class while sleeping. Sleeping tasks have non-normalized vruntime in order to make sleeper-fairness work. So treat the switch away from fair as a wakeup and preserve the relative vruntime. Also update sysrq-n to call the ->switch_{to,from} methods. Reported-by: Onkalo Samu Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit a8941d7ec81678fb69aea7183338175f112f3e0d Author: Peter Zijlstra Date: Tue Jan 25 16:30:03 2011 +0100 sched: Simplify the idle scheduling class Since commit 48c5ccae88dcd (sched: Simplify cpu-hot-unplug task migration) this should no longer happen, so remove the code. Signed-off-by: Peter Zijlstra LKML-Reference: Signed-off-by: Ingo Molnar commit 414bee9ba613adb3804965e2d84db32d0599f9c6 Author: Venkatesh Pallipadi Date: Tue Dec 21 17:09:04 2010 -0800 softirqs: Account ksoftirqd time as cpustat softirq softirq time in ksoftirqd context is not accounted in ns granularity per cpu softirq stats, as we want that to be a part of ksoftirqd exec_runtime. Accounting them as softirq on /proc/stat separately. Tested-by: Shaun Ruffell Signed-off-by: Venkatesh Pallipadi Signed-off-by: Peter Zijlstra LKML-Reference: <1292980144-28796-6-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit abb74cefa9c682fb38ba86c17ca3c86fed6cc464 Author: Venkatesh Pallipadi Date: Tue Dec 21 17:09:03 2010 -0800 sched: Export ns irqtimes through /proc/stat CONFIG_IRQ_TIME_ACCOUNTING adds ns granularity irq time on each CPU. This info is already used in scheduler to do proper task chargeback (earlier patches). This patch retro-fits this ns granularity hardirq and softirq information to /proc/stat irq and softirq fields. The update is still done on timer tick, where we look at accumulated ns hardirq/softirq time and account the tick to user/system/irq/hardirq/guest accordingly. No new interface added. Earlier versions looked at adding this as new fields in some /proc files. This one seems to be the best in terms of impact to existing apps, even though it has somewhat more kernel code than earlier versions. Tested-by: Shaun Ruffell Signed-off-by: Venkatesh Pallipadi Signed-off-by: Peter Zijlstra LKML-Reference: <1292980144-28796-5-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit 70a89a6620f658d47a1488515bada4b8ee6291d8 Author: Venkatesh Pallipadi Date: Tue Dec 21 17:09:02 2010 -0800 sched: Refactor account_system_time separating id-update Refactor account_system_time, to separate out the logic of identifying the update needed and code that does actual update. This is used by following patch for IRQ_TIME_ACCOUNTING, which has different identification logic and same update logic. Tested-by: Shaun Ruffell Signed-off-by: Venkatesh Pallipadi Signed-off-by: Peter Zijlstra LKML-Reference: <1292980144-28796-4-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit a1dabb6bfffccb897eff3e1d102dacf2a4bedf3b Author: Venkatesh Pallipadi Date: Tue Dec 21 17:09:01 2010 -0800 time: Add nsecs_to_cputime64 interface for asm-generic Add nsecs_to_cputime64 interface. This is used in following patches that updates cpu irq stat based on ns granularity info in IRQ_TIME_ACCOUNTING. Tested-by: Shaun Ruffell Signed-off-by: Venkatesh Pallipadi Signed-off-by: Peter Zijlstra LKML-Reference: <1292980144-28796-3-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit 4dd53d891ca46dcc1fde0376a33540d3fd83cb9a Author: Venkatesh Pallipadi Date: Tue Dec 21 17:09:00 2010 -0800 softirqs: Free up pf flag PF_KSOFTIRQD Cleanup patch, freeing up PF_KSOFTIRQD and use per_cpu ksoftirqd pointer instead, as suggested by Eric Dumazet. Tested-by: Shaun Ruffell Signed-off-by: Venkatesh Pallipadi Signed-off-by: Peter Zijlstra LKML-Reference: <1292980144-28796-2-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar commit f07333bf6ee66d9b49286cec4371cf375e745b7a Author: Paul Turner Date: Fri Jan 21 20:45:03 2011 -0800 sched: Avoid expensive initial update_cfs_load() Since cfs->{load_stamp,load_last} are zero-initalized the initial load update will consider the delta to be 'since the beginning of time'. This results in a lot of pointless divisions to bring this large period to be within the sysctl_sched_shares_window. Fix this by initializing load_stamp to be 1 at cfs_rq initialization, this allows for an initial load_stamp > load_last which then lets standard idle truncation proceed. We avoid spinning (and slightly improve consistency) by fixing delta to be [period - 1] in this path resulting in a slightly more predictable shares ramp. (Previously the amount of idle time preserved by the overflow would range between [period/2,period-1].) Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20110122044852.102126037@google.com> Signed-off-by: Ingo Molnar commit 6d5ab2932a21ea54406ab95c43ecff90a3eddfda Author: Paul Turner Date: Fri Jan 21 20:45:01 2011 -0800 sched: Simplify update_cfs_shares parameters Re-visiting this: Since update_cfs_shares will now only ever re-weight an entity that is a relative parent of the current entity in enqueue_entity; we can safely issue the account_entity_enqueue relative to that cfs_rq and avoid the requirement for special handling of the enqueue case in update_cfs_shares. Signed-off-by: Paul Turner Signed-off-by: Peter Zijlstra LKML-Reference: <20110122044851.915214637@google.com> Signed-off-by: Ingo Molnar commit 792363d2beceb1c7d865e517fa9939c8b8c1442a Author: Yinghai Lu Date: Fri Jan 21 15:29:54 2011 -0800 x86: Don't copy per_cpu cpuinfo for BSP two times smp_store_cpu_info(0) will do that. Signed-off-by: Yinghai Lu Cc: Suresh Siddha Cc: Tejun Heo Cc: Borislav Petkov LKML-Reference: <4D3A16F2.5090902@kernel.org> Signed-off-by: Ingo Molnar commit b3d7336db553d318e7ec042eb50a70d307013339 Author: Yinghai Lu Date: Fri Jan 21 15:29:44 2011 -0800 x86: Move llc_shared_map out of cpu_info cpu_info is already with per_cpu, We can take llc_shared_map out of cpu_info, and declare it as per_cpu variable directly. So later referencing could be simple and directly instead of diving to find cpu_info at first. Also could make smp_store_cpu_info() much simple to avoid to do save and restore trick. Signed-off-by: Yinghai Lu Cc: Hans Rosenfeld Cc: Alok N Kataria Cc: Stephen Hemminger Cc: Hans J. Koch Cc: Tejun Heo Cc: Borislav Petkov Cc: Andreas Herrmann Cc: Robert Richter Cc: Suresh Siddha LKML-Reference: <4D3A16E8.5020608@kernel.org> Signed-off-by: Ingo Molnar commit 41b2610c3443e6c4760e61fc10eef73f96f9f6a5 Author: Hans Rosenfeld Date: Mon Jan 24 16:05:42 2011 +0100 x86, amd: Extend AMD northbridge caching code to support "Link Control" devices "Link Control" devices (NB function 4) will be used by L3 cache partitioning on family 0x15. Signed-off-by: Hans Rosenfeld Cc: LKML-Reference: <1295881543-572552-4-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: Ingo Molnar commit b453de02b786c63b8928ec822401468131db0a9b Author: Hans Rosenfeld Date: Mon Jan 24 16:05:41 2011 +0100 x86, amd: Enable L3 cache index disable on family 0x15 AMD family 0x15 CPUs support L3 cache index disable, so enable it on them. Signed-off-by: Hans Rosenfeld Cc: LKML-Reference: <1295881543-572552-3-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: Ingo Molnar commit d518573de63fb119e5e9a3137386544671387681 Author: Andreas Herrmann Date: Mon Jan 24 16:05:40 2011 +0100 x86, amd: Normalize compute unit IDs on multi-node processors On multi-node CPUs we don't need the socket wide compute unit ID but the node-wide compute unit ID. Thus we need to normalize the value. This is similar to what we do with cpu_core_id. A compute unit is then identified by physical_package_id, node_id, and compute_unit_id. Signed-off-by: Andreas Herrmann LKML-Reference: <1295881543-572552-2-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: Ingo Molnar commit 9599ec0471deae24044241e2173090d2cbc0e899 Author: Fenghua Yu Date: Mon Jan 17 17:39:15 2011 -0800 x86-64, mem: Convert memmove() to assembly file and fix return value bug memmove_64.c only implements memmove() function which is completely written in inline assembly code. Therefore it doesn't make sense to keep the assembly code in .c file. Currently memmove() doesn't store return value to rax. This may cause issue if caller uses the return value. The patch fixes this issue. Signed-off-by: Fenghua Yu LKML-Reference: <1295314755-6625-1-git-send-email-fenghua.yu@intel.com> Signed-off-by: H. Peter Anvin commit ef1d1af28ca37fdbc2745da040529cd2953c1af5 Author: Arnaldo Carvalho de Melo Date: Tue Jan 18 21:41:45 2011 -0200 perf evsel: Introduce perf_evsel__{in,ex}it Out of the {con,des}structor, as in interpreted language bindings we will need to go back from the wrapper object to the real thing. In that case using container_of will save us to have an extra pointer in the perf_evsel struct. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d0dd74e853a0a6f37e8061d6d50be41c7034c54c Author: Arnaldo Carvalho de Melo Date: Fri Jan 21 13:46:41 2011 -0200 perf tools: Move event__parse_sample to evsel.c To avoid linking more stuff in the python binding I'm working on, future csets will make the sample type be taken from the evsel itself, but for that we need to first have one file per cpu and per sample_type, not a single perf.data file. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit fd78260b5376173faeb17127bd63b3c99a8e8bfb Author: Arnaldo Carvalho de Melo Date: Tue Jan 18 15:15:24 2011 -0200 perf threads: Move thread_map to separate file To untangle it from struct thread handling, that is tied to symbols, etc. Right now in the python bindings I'm working on I need just a subset of the util/ files, untangling it allows me to do that. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 17ea1b70a87e28457821318341bead2b45563092 Author: Arnaldo Carvalho de Melo Date: Mon Jan 17 14:40:46 2011 -0200 perf tools: Pass the struct opt to the wildcard parsing routine It is needed because it will call parse_event for each tracepoint name that matches, and we pass the perf_evlist via opt->value. Problem introduced in 4503fdd where my assumption about opt being always non NULL made me not look at callers of parse_events outside builtin-*.c. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit d7065adb9b4f3384c2615f0a3dbdb6c3aae1eb18 Author: Franck Bui-Huu Date: Sun Jan 16 17:14:45 2011 +0100 perf record: auto detect when stdout is a pipe This patch gives the ability to 'perf record' to detect when its stdout has been redirected to a pipe. There's now no more need to add '-o -' switch in this case. However '-o ' option has always precedence, that is if specified and stdout has been connected via a pipe then the output will go into the specified output. LKML-Reference: Signed-off-by: Franck Bui-Huu Signed-off-by: Arnaldo Carvalho de Melo commit e80711ca8512c8586da0c3e18e2f1caf73c88731 Author: Masami Hiramatsu Date: Thu Jan 13 21:46:11 2011 +0900 perf probe: Add --funcs to show available functions in symtab Add --funcs to show available functions in symtab. Originally this feature came from Srikar's uprobes patches ( http://lkml.org/lkml/2010/8/27/244 ) e.g. ... __ablkcipher_walk_complete __absent_pages_in_range __account_scheduler_latency __add_pages __alloc_pages_nodemask __alloc_percpu __alloc_reserved_percpu __alloc_skb __alloc_workqueue_key __any_online_cpu __ata_ehi_push_desc ... This also supports symbols in module, e.g. ... cleanup_module cpuid_maxphyaddr emulate_clts emulate_instruction emulate_int_real emulate_invlpg emulator_get_dr emulator_set_dr emulator_task_switch emulator_write_emulated emulator_write_phys fx_init ... Original-patch-from: Srikar Dronamraju Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110113124611.22426.10835.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu [ committer note: Add missing elf.h for STB_GLOBAL that broke a RHEL4 build ] Signed-off-by: Arnaldo Carvalho de Melo commit 5069ed86be3c2f28bcdf7fae1374ec0c325aafba Author: Masami Hiramatsu Date: Thu Jan 13 21:46:05 2011 +0900 perf probe: Enable to put probe inline function call site Enable to put probe inline function call site. This will increase line-based probe-ability. $ ./perf probe -L schedule:48 pre_schedule(rq, prev); 50 if (unlikely(!rq->nr_running)) idle_balance(cpu, rq); put_prev_task(rq, prev); next = pick_next_task(rq); 56 if (likely(prev != next)) { sched_info_switch(prev, next); trace_sched_switch_out(prev, next); perf_event_task_sched_out(prev, next); $ ./perf probe -L schedule:48 48 pre_schedule(rq, prev); 50 if (unlikely(!rq->nr_running)) 51 idle_balance(cpu, rq); 53 put_prev_task(rq, prev); 54 next = pick_next_task(rq); 56 if (likely(prev != next)) { 57 sched_info_switch(prev, next); 58 trace_sched_switch_out(prev, next); 59 perf_event_task_sched_out(prev, next); Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110113124604.22426.48873.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu Signed-off-by: Arnaldo Carvalho de Melo commit 4cc9cec636e7f78aba7f17606ac13cac07ea5787 Author: Masami Hiramatsu Date: Thu Jan 13 21:45:58 2011 +0900 perf probe: Introduce lines walker interface Introduce die_walk_lines() for walking on the line list of given die, and use it in line_range finder and probe point finder. Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Franck Bui-Huu Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Srikar Dronamraju Cc: Steven Rostedt LKML-Reference: <20110113124558.22426.48170.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu [ committer note: s/%ld/%zd/ for a size_t nlines var that broke f14 x86 build] Signed-off-by: Arnaldo Carvalho de Melo commit b0e8572f3b29c0760b66ba5627a6d5426c88c97d Author: Arnaldo Carvalho de Melo Date: Sun Jan 16 17:39:15 2011 -0200 perf top: Add native_safe_halt to skip symbols Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 529363b76929beb85b81439c61063130af046a21 Author: Frederic Weisbecker Date: Fri Jan 14 04:52:01 2011 +0100 perf callchain: Don't give arbitrary gender to callchain tree nodes Some little callchain tree nodes shyly asked me if they can have sisters. How cute! Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1294977121-5700-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Frederic Weisbecker Signed-off-by: Arnaldo Carvalho de Melo commit 16537f1355017a285b904bfb6bf767464293e69c Author: Frederic Weisbecker Date: Fri Jan 14 04:52:00 2011 +0100 perf callchain: Rename register_callchain_param into callchain_register_param To make the callchain API naming more consistent. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1294977121-5700-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Frederic Weisbecker Signed-off-by: Arnaldo Carvalho de Melo commit f08c3154ac439c4b5762a40107d84e839e08fbc5 Author: Frederic Weisbecker Date: Fri Jan 14 04:51:59 2011 +0100 perf callchain: Rename cumul_hits into callchain_cumul_hits That makes the callchain API naming more consistent and reduce potential naming clashes. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1294977121-5700-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Frederic Weisbecker Signed-off-by: Arnaldo Carvalho de Melo commit 1b3a0e9592ebf174af934b3908a2bf6a6fa86169 Author: Frederic Weisbecker Date: Fri Jan 14 04:51:58 2011 +0100 perf callchain: Feed callchains into a cursor The callchains are fed with an array of a fixed size. As a result we iterate over each callchains three times: - 1st to resolve symbols - 2nd to filter out context boundaries - 3rd for the insertion into the tree This also involves some pairs of memory allocation/deallocation everytime we insert a callchain, for the filtered out array of addresses and for the array of symbols that comes along. Instead, feed the callchains through a linked list with persistent allocations. It brings several pros like: - Merge the 1st and 2nd iterations in one. That was possible before but in a way that would involve allocating an array slightly taller than necessary because we don't know in advance the number of context boundaries to filter out. - Much lesser allocations/deallocations. The linked list keeps persistent empty entries for the next usages and is extendable at will. - Makes it easier for multiple sources of callchains to feed a stacktrace together. This is deemed to pave the way for cfi based callchains wherein traditional frame pointer based kernel stacktraces will precede cfi based user ones, producing an overall callchain which size is hardly predictable. This requirement makes the static array obsolete and makes a linked list based iterator a much more flexible fit. Basic testing on a big perf file containing callchains (~ 176 MB) has shown a throughput gain of about 11% with perf report. Cc: Ingo Molnar Cc: Paul Mackerras Cc: Peter Zijlstra LKML-Reference: <1294977121-5700-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Frederic Weisbecker Signed-off-by: Arnaldo Carvalho de Melo commit de5fa3a8a05cd60f59622e88cfeb90416760d78e Author: Arnaldo Carvalho de Melo Date: Sat Jan 15 10:42:46 2011 -0200 perf test: Add test for the evlist mmap routines This test will generate random numbers of calls to some getpid syscalls, then establish an mmap for a group of events that are created to monitor these syscalls. It will receive the events, using mmap, use its PERF_SAMPLE_ID generated sample.id field to map back to its respective perf_evsel instance. Then it checks if the number of syscalls reported as perf events by the kernel corresponds to the number of syscalls made. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 04391debc3e1195222a4dbb162ace6542dd89c1c Author: Arnaldo Carvalho de Melo Date: Sat Jan 15 10:40:59 2011 -0200 perf evlist: Steal mmap reading routine from 'perf top' Will be used in the upcoming 'perf test' entry for the evlist mmap routines. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 98d77b78504a423fca911a26a17bee00ef2fdda2 Author: Han Pingtian Date: Sat Jan 15 07:00:50 2011 +0800 perf test: check if cpu_map__new() return NULL It looks like we should check if cpus is NULL after cpus = cpu_map__new(NULL); in test__open_syscall_event_on_all_cpus(). LKML-Reference: <20110114230050.GA7011@localhost> Signed-off-by: Han Pingtian Signed-off-by: Arnaldo Carvalho de Melo commit d2af9687c96f3864178de1860e6d83873aeef224 Author: Arnaldo Carvalho de Melo Date: Fri Jan 14 16:24:49 2011 -0200 perf test: Check counts on all cpus in test__open_syscall_event_on_all_cpus We were bailing out after the first count mismatch, do it in all to see if only some CPUs are not getting the expected number of events. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 915fce20ecf8f7ff4189d0fff42b62aebf6a57cc Author: Arnaldo Carvalho de Melo Date: Fri Jan 14 16:19:12 2011 -0200 perf tools: Add missing cpu_map__delete() Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 0a27d7f9f417c0305f7efa70631764a53c7af219 Author: Arnaldo Carvalho de Melo Date: Fri Jan 14 15:50:51 2011 -0200 perf record: Use perf_evlist__mmap There is more stuff that can go to the perf_ev{sel,list} layer, like detecting if sample_id_all is available, etc, but lets try using this in 'perf test' first. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 70db7533caef02350ec8d6852e589491bca3a951 Author: Arnaldo Carvalho de Melo Date: Wed Jan 12 22:39:13 2011 -0200 perf evlist: Move the mmap array from perf_evsel Adopting the new model used in 'perf record', where we don't have a map per thread per cpu, instead we have an mmap per cpu, established on the first fd for that cpu and ask the kernel using the PERF_EVENT_IOC_SET_OUTPUT ioctl to send events for the other fds on that cpu for the one with the mmap. The methods moved from perf_evsel to perf_evlist, but for easing review they were modified in place, in evsel.c, the next patch will move the migrated methods to evlist.c. With this 'perf top' now uses the same mmap model used by 'perf record' and the next patches will make 'perf record' use these new routines, establishing a common codebase for both tools. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 115d2d8963a426670ac3ce983fc4c4e001703943 Author: Arnaldo Carvalho de Melo Date: Wed Jan 12 17:11:53 2011 -0200 perf record: Move perf_mmap__write_tail to perf.h Close to perf_mmap__read_head() and the perf_mmap struct definition. This is useful for any recorder, and we will need it in 'perf test'. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 744bd8aa3c8b43447f689a27872fa95e700b8a4f Author: Arnaldo Carvalho de Melo Date: Wed Jan 12 17:07:28 2011 -0200 perf record: Use struct perf_mmap and helpers Paving the way to using perf_evsel->mmap, do this to reduce the patch noise in the next ones. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 70082dd92c4b288bd723a77897e2b555f0e63113 Author: Arnaldo Carvalho de Melo Date: Wed Jan 12 17:03:24 2011 -0200 perf evsel: Introduce mmap support Out of the code in 'perf top'. Record is next in line. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit dd7927f4f8ee75b032ff15aeef4bda49719a443a Author: Arnaldo Carvalho de Melo Date: Wed Jan 12 14:28:51 2011 -0200 perf record: Use perf_evsel__open Now its time to factor out the mmap handling bits into the perf_evsel class. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 72cb7013e08dec29631e0447f9496b7bacd3e14b Author: Arnaldo Carvalho de Melo Date: Wed Jan 12 10:52:47 2011 -0200 perf top: Use perf_evsel__open Now that it handles group_fd and inherit we can use it, sharing it with stat. Next step: 'perf record' should use, then move the mmap_array out of ->priv and into perf_evsel, with top and record sharing this, and at the same time, write a 'perf test' stress test. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 9d04f1781772e11bd58806391555fc23ebb54377 Author: Arnaldo Carvalho de Melo Date: Wed Jan 12 00:08:18 2011 -0200 perf evsel: Allow specifying if the inherit bit should be set As this is a per-cpu attribute, we can't set it up in advance and use it for all the calls. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit f08199d314458610d4ca52f8e86e0a4ec7a7bc54 Author: Arnaldo Carvalho de Melo Date: Tue Jan 11 23:42:19 2011 -0200 perf evsel: Support event groups The perf_evsel__open now have an extra boolean argument specifying if event grouping is desired. The first file descriptor created on a CPU becomes the group leader. Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi LKML-Reference: Signed-off-by: Arnaldo Carvalho de Melo commit 5c581041cf97aa7980b442de81ddea8273d6dcde Author: Arnaldo Carvalho de Melo Date: Tue Jan 11 22:30:02 2011 -0200 perf evlist: Adopt the pollfd array Allocating just the space needed for nr_cpus * nr_threads * nr_evsels, not the MAX_NR_CPUS and counters. LKML-Reference: Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit 361c99a661a78ed22264649440e87fe4fe8da1f2 Author: Arnaldo Carvalho de Melo Date: Tue Jan 11 20:56:53 2011 -0200 perf evsel: Introduce perf_evlist Killing two more perf wide global variables: nr_counters and evsel_list as a list_head. There are more operations that will need more fields in perf_evlist, like the pollfd for polling all the fds in a list of evsel instances. Use option->value to pass the evsel_list to parse_{events,filters}. LKML-Reference: Cc: Frederic Weisbecker Cc: Ingo Molnar Cc: Mike Galbraith Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Tom Zanussi Signed-off-by: Arnaldo Carvalho de Melo commit cd7eab44e9946c28d595abe3e9a43e945bc49141 Author: Ben Hutchings Date: Wed Jan 19 21:01:44 2011 +0000 genirq: Add IRQ affinity notifiers When initiating I/O on a multiqueue and multi-IRQ device, we may want to select a queue for which the response will be handled on the same or a nearby CPU. This requires a reverse-map of IRQ affinity. Add a notification mechanism to support this. This is based closely on work by Thomas Gleixner . Signed-off-by: Ben Hutchings Cc: linux-net-drivers@solarflare.com Cc: Tom Herbert Cc: David Miller LKML-Reference: <1295470904.11126.84.camel@bwh-desktop> Signed-off-by: Thomas Gleixner commit c9e358dfc4a8cb2227172ef77908c2e0ee17bcb9 Author: Grant Likely Date: Fri Jan 21 09:24:48 2011 -0700 driver-core: remove conditionals around devicetree pointers Having conditional around the of_match_table and the of_node pointers turns out to make driver code use ugly #ifdef blocks. Drop the conditionals and remove the #ifdef blocks from the affected drivers. Also tidy up minor whitespace issues within the same hunks. Signed-off-by: Grant Likely Acked-by: Greg Kroah-Hartman commit f005fe12b90c5b9fe180a09209a893e09affa8aa Author: Yinghai Lu Date: Mon Dec 27 16:48:32 2010 -0800 x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping() It is not related to init_memory_mapping(), and init_memory_mapping() is getting more bigger. So make it as seperated function and call it from reserve_brk() and that is point when _brk_end is concluded. Signed-off-by: Yinghai Lu LKML-Reference: <4D1933E0.7090305@kernel.org> Signed-off-by: H. Peter Anvin commit 1411e0ec3123ae4c4ead6bfc9fe3ee5a3ae5c327 Author: Yinghai Lu Date: Mon Dec 27 16:48:17 2010 -0800 x86-64, numa: Put pgtable to local node memory Introduce init_memory_mapping_high(), and use it with 64bit. It will go with every memory segment above 4g to create page table to the memory range itself. before this patch all page tables was on one node. with this patch, one RED-PEN is killed debug out for 8 sockets system after patch [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] RAMDISK: 7bc84000 - 7f745000 .... [ 0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used [ 0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used [ 0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used [ 0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used [ 0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used [ 0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used [ 0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used [ 0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used [ 0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used [ 0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff] [ 0.000000] 0100000000 - 1080000000 page 2M [ 0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff] [ 0.000000] memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff] [ 0.000000] 1080000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] memblock_x86_reserve_range: [0x207ffc0000-0x207fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00002080000000-0x0000307fffffff] [ 0.000000] 2080000000 - 3080000000 page 2M [ 0.000000] kernel direct mapping tables up to 3080000000 @ [0x307ff3d000-0x307fffffff] [ 0.000000] memblock_x86_reserve_range: [0x307ffc0000-0x307fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00003080000000-0x0000407fffffff] [ 0.000000] 3080000000 - 4080000000 page 2M [ 0.000000] kernel direct mapping tables up to 4080000000 @ [0x407fefd000-0x407fffffff] [ 0.000000] memblock_x86_reserve_range: [0x407ffc0000-0x407fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00004080000000-0x0000507fffffff] [ 0.000000] 4080000000 - 5080000000 page 2M [ 0.000000] kernel direct mapping tables up to 5080000000 @ [0x507febd000-0x507fffffff] [ 0.000000] memblock_x86_reserve_range: [0x507ffc0000-0x507fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00005080000000-0x0000607fffffff] [ 0.000000] 5080000000 - 6080000000 page 2M [ 0.000000] kernel direct mapping tables up to 6080000000 @ [0x607fe7d000-0x607fffffff] [ 0.000000] memblock_x86_reserve_range: [0x607ffc0000-0x607fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00006080000000-0x0000707fffffff] [ 0.000000] 6080000000 - 7080000000 page 2M [ 0.000000] kernel direct mapping tables up to 7080000000 @ [0x707fe3d000-0x707fffffff] [ 0.000000] memblock_x86_reserve_range: [0x707ffc0000-0x707fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00007080000000-0x0000807fffffff] [ 0.000000] 7080000000 - 8080000000 page 2M [ 0.000000] kernel direct mapping tables up to 8080000000 @ [0x807fdfc000-0x807fffffff] [ 0.000000] memblock_x86_reserve_range: [0x807ffbf000-0x807fffffff] PGTABLE [ 0.000000] Initmem setup node 0 [0000000000000000-000000107fffffff] [ 0.000000] NODE_DATA [0x0000107ffbd000-0x0000107ffc1fff] [ 0.000000] Initmem setup node 1 [0000001080000000-000000207fffffff] [ 0.000000] NODE_DATA [0x0000207ffbb000-0x0000207ffbffff] [ 0.000000] Initmem setup node 2 [0000002080000000-000000307fffffff] [ 0.000000] NODE_DATA [0x0000307ffbb000-0x0000307ffbffff] [ 0.000000] Initmem setup node 3 [0000003080000000-000000407fffffff] [ 0.000000] NODE_DATA [0x0000407ffbb000-0x0000407ffbffff] [ 0.000000] Initmem setup node 4 [0000004080000000-000000507fffffff] [ 0.000000] NODE_DATA [0x0000507ffbb000-0x0000507ffbffff] [ 0.000000] Initmem setup node 5 [0000005080000000-000000607fffffff] [ 0.000000] NODE_DATA [0x0000607ffbb000-0x0000607ffbffff] [ 0.000000] Initmem setup node 6 [0000006080000000-000000707fffffff] [ 0.000000] NODE_DATA [0x0000707ffbb000-0x0000707ffbffff] [ 0.000000] Initmem setup node 7 [0000007080000000-000000807fffffff] [ 0.000000] NODE_DATA [0x0000807ffba000-0x0000807ffbefff] Signed-off-by: Yinghai Lu LKML-Reference: <4D1933D1.9020609@kernel.org> Signed-off-by: H. Peter Anvin commit dbef7b56d2fc5115f26f72a0b080283bbf972cab Author: Yinghai Lu Date: Mon Dec 27 16:48:08 2010 -0800 x86-64, numa: Allocate memnodemap under max_pfn_mapped We need to access it right way, so make sure that it is mapped already. Prepare to put page table on local node, and nodemap is used before that. Signed-off-by: Yinghai Lu LKML-Reference: <4D1933C8.7060105@kernel.org> Signed-off-by: H. Peter Anvin commit 45635ab5e41bcde94a82f9a05d660ef77fe38c1b Author: Yinghai Lu Date: Mon Dec 27 16:47:54 2010 -0800 x86: Change get_max_mapped() to inline Move it into head file. to prepare use it in other files. [ hpa: added missing and changed type to phys_addr_t. ] Signed-off-by: Yinghai Lu LKML-Reference: <4D1933BA.8000508@kernel.org> Signed-off-by: H. Peter Anvin commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c Author: Yinghai Lu Date: Fri Dec 17 16:59:07 2010 -0800 memblock: Make find_memory_core_early() find from top-down That is used for find ram in node or bootmem type. We should make it top-down so it will be consistent to memblock_find, and to avoid allocating potentially valuable low memory before we actually need it. Suggested-by: Jeremy Fitzhardinge Signed-off-by: Yinghai Lu LKML-Reference: <4D0C075B.3040501@kernel.org> Signed-off-by: H. Peter Anvin commit 32e3f2b00c529477d26895c5428ed95bba537443 Author: Yinghai Lu Date: Fri Dec 17 16:58:40 2010 -0800 x86-64, gart: Fix allocation with memblock When trying to change alloc_bootmem with memblock to go with real top-down Found one old system: [ 0.000000] Node 0: aperture @ ac000000 size 64 MB [ 0.000000] Aperture pointing to e820 RAM. Ignoring. [ 0.000000] Your BIOS doesn't leave a aperture memory hole [ 0.000000] Please enable the IOMMU option in the BIOS setup [ 0.000000] This costs you 64 MB of RAM [ 0.000000] memblock_x86_reserve_range: [0x2020000000-0x2023ffffff] aperture64 [ 0.000000] Cannot allocate aperture memory hole (ffff882020000000,65536K) [ 0.000000] memblock_x86_free_range: [0x2020000000-0x2023ffffff] [ 0.000000] Kernel panic - not syncing: Not enough memory for aperture [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc5-tip-yh-06229-gb792dc2-dirty #331 [ 0.000000] Call Trace: [ 0.000000] [] ? panic+0x91/0x1a3 [ 0.000000] [] ? gart_iommu_hole_init+0x3d7/0x4a3 [ 0.000000] [] ? _etext+0x0/0x3 [ 0.000000] [] ? pci_iommu_alloc+0x47/0x71 [ 0.000000] [] ? mem_init+0x19/0xec [ 0.000000] [] ? start_kernel+0x20a/0x3e8 [ 0.000000] [] ? x86_64_start_reservations+0x9c/0xa0 [ 0.000000] [] ? x86_64_start_kernel+0x114/0x11b it means __alloc_bootmem_nopanic() get too high for that aperture. Use memblock_find_in_range() with limit directly. Signed-off-by: Yinghai Lu LKML-Reference: <4D0C0740.90104@kernel.org> Signed-off-by: H. Peter Anvin commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e Author: Yinghai Lu Date: Fri Dec 17 16:58:28 2010 -0800 x86-64, mm: Put early page table high While dubug kdump, found current kernel will have problem with crashkernel=512M. It turns out that initial mapping is to 512M, and later initial mapping to 4G (acutally is 2040M in my platform), will put page table near 512M. then initial mapping to 128g will be near 2g. before this patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x1fffc000-0x1fffffff] [ 0.000000] memblock_x86_reserve_range: [0x1fffc000-0x1fffdfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x7bc01000-0x7bc83fff] [ 0.000000] memblock_x86_reserve_range: [0x7bc01000-0x7bc7efff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] crashkernel reservation failed - No suitable area found. after patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] memblock_x86_reserve_range: [0x7f74c000-0x7f74dfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] memblock_x86_reserve_range: [0x207ff7d000-0x207fffafff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] memblock_x86_reserve_range: [0x17000000-0x36ffffff] CRASH KERNEL [ 0.000000] Reserving 512MB of memory at 368MB for crashkernel (System RAM: 133120MB) It means with the patch, page table for [0, 2g) will need 2g, instead of under 512M, page table for [4g, 128g) will be near 128g, instead of under 2g. That would good, if we have lots of memory above 4g, like 1024g, or 2048g or 16T, will not put related page table under 2g. that would be have chance to fill the under 2g if 1G or 2M page is not used. the code change will use add map_low_page() and update unmap_low_page() for 64bit, and use them to get access the corresponding high memory for page table setting. Signed-off-by: Yinghai Lu LKML-Reference: <4D0C0734.7060900@kernel.org> Signed-off-by: H. Peter Anvin