commit 08dcf29e01dcb786c13dc80045bd65f804117efb Merge: 11320d1... d546b67... Author: Linus Torvalds Date: Wed Mar 26 15:02:12 2008 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: x86: fix performance drop for glx x86: fix trim mtrr not to setup_memory two times x86: GEODE: add missing module.h include x86, cpufreq: fix Speedfreq-SMI call that clobbers ECX x86: fix memoryless node oops during boot x86: add dmi quirk for io_delay x86: convert mtrr/generic.c to kernel-doc x86: Documentation/i386/IO-APIC.txt: fix description commit 11320d17ce4ecf8002dc8f9b6f1e49cd18e45a94 Author: Nishanth Aravamudan Date: Wed Mar 26 14:40:20 2008 -0700 hugetlb: fix potential livelock in return_unused_surplus_hugepages() Running the counters testcase from libhugetlbfs results in on 2.6.25-rc5 and 2.6.25-rc5-mm1: BUG: soft lockup - CPU#3 stuck for 61s! [counters:10531] NIP: c0000000000d1f3c LR: c0000000000d1f2c CTR: c0000000001b5088 REGS: c000005db12cb360 TRAP: 0901 Not tainted (2.6.25-rc5-autokern1) MSR: 8000000000009032 CR: 48008448 XER: 20000000 TASK = c000005dbf3d6000[10531] 'counters' THREAD: c000005db12c8000 CPU: 3 GPR00: 0000000000000004 c000005db12cb5e0 c000000000879228 0000000000000004 GPR04: 0000000000000010 0000000000000000 0000000000200200 0000000000100100 GPR08: c0000000008aba10 000000000000ffff 0000000000000004 0000000000000000 GPR12: 0000000028000442 c000000000770080 NIP [c0000000000d1f3c] .return_unused_surplus_pages+0x84/0x18c LR [c0000000000d1f2c] .return_unused_surplus_pages+0x74/0x18c Call Trace: [c000005db12cb5e0] [c000005db12cb670] 0xc000005db12cb670 (unreliable) [c000005db12cb670] [c0000000000d24c4] .hugetlb_acct_memory+0x2e0/0x354 [c000005db12cb740] [c0000000001b5048] .truncate_hugepages+0x1d4/0x214 [c000005db12cb890] [c0000000001b50a4] .hugetlbfs_delete_inode+0x1c/0x3c [c000005db12cb920] [c000000000103fd8] .generic_delete_inode+0xf8/0x1c0 [c000005db12cb9b0] [c0000000001b5100] .hugetlbfs_drop_inode+0x3c/0x24c [c000005db12cba50] [c00000000010287c] .iput+0xdc/0xf8 [c000005db12cbad0] [c0000000000fee54] .dentry_iput+0x12c/0x194 [c000005db12cbb60] [c0000000000ff050] .d_kill+0x6c/0xa4 [c000005db12cbbf0] [c0000000000ffb74] .dput+0x18c/0x1b0 [c000005db12cbc70] [c0000000000e9e98] .__fput+0x1a4/0x1e8 [c000005db12cbd10] [c0000000000e61ec] .filp_close+0xb8/0xe0 [c000005db12cbda0] [c0000000000e62d0] .sys_close+0xbc/0x134 [c000005db12cbe30] [c00000000000872c] syscall_exit+0x0/0x40 Instruction dump: ebbe8038 38800010 e8bf0002 3bbd0008 7fa3eb78 38a50001 7ca507b4 4818df25 60000000 38800010 38a00000 7c601b78 <7fa3eb78> 2f800010 409d0008 38000010 This was tracked down to a potential livelock in return_unused_surplus_hugepages(). In the case where we have surplus pages on some node, but no free pages on the same node, we may never break out of the loop. To avoid this livelock, terminate the search if we iterate a number of times equal to the number of online nodes without freeing a page. Thanks to Andy Whitcroft and Adam Litke for helping with debugging and the patch. Signed-off-by: Nishanth Aravamudan Signed-off-by: Linus Torvalds commit a1de09195b294c6a4c5dec8c8defd0a2688d3f75 Author: Nishanth Aravamudan Date: Wed Mar 26 14:37:53 2008 -0700 hugetlb: indicate surplus huge page counts in per-node meminfo Currently we show the surplus hugetlb pool state in /proc/meminfo, but not in the per-node meminfo files, even though we track the information on a per-node basis. Printing it there can help track down dynamic pool bugs including the one in the follow-on patch. Signed-off-by: Nishanth Aravamudan Signed-off-by: Linus Torvalds commit d546b67a940eb42a99f56b86c5cd8d47c8348c2a Author: Suresh Siddha Date: Tue Mar 25 17:39:12 2008 -0700 x86: fix performance drop for glx fix the 3D performance drop reported at: http://bugzilla.kernel.org/show_bug.cgi?id=10328 fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with WC attribute. Recent changes in page attribute code made both ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This breaks the graphics performance, as the effective memory type is UC instead of expected WC. The correct way to fix this is to add ioremap_wc() (which uses UC- in the absence of PAT kernel support and WC with PAT) and change all the fb drivers to use this new ioremap_wc() API. We can take this correct and longer route for post 2.6.25. For now, revert back to the UC- behavior for ioremap/ioremap_nocache. Signed-off-by: Suresh Siddha Signed-off-by: Venkatesh Pallipadi Cc: Arjan van de Ven Signed-off-by: Ingo Molnar commit 76c324182bbd29dfe4298ca65efb15be18055df1 Author: Yinghai Lu Date: Sun Mar 23 00:16:49 2008 -0700 x86: fix trim mtrr not to setup_memory two times we could call find_max_pfn() directly instead of setup_memory() to get max_pfn needed for mtrr trimming. otherwise setup_memory() is called two times... that is duplicated... [ mingo@elte.hu: both Thomas and me simulated a double call to setup_bootmem_allocator() and can confirm that it is a real bug which can hang in certain configs. It's not been reported yet but that is probably due to the relatively scarce nature of MTRR-trimming systems. ] Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit 923a0cf82f2b504e316642e2d152d38b6c0be4ba Author: Andres Salomon Date: Wed Mar 26 14:13:01 2008 -0400 x86: GEODE: add missing module.h include On Wed, 26 Mar 2008 11:56:22 -0600 Jordan Crouse wrote: > On 26/03/08 14:31 +0100, Stefan Pfetzing wrote: > > Hello Jordan, > > > > I just tried to build your geodwdt driver for the geode watchdog. Therefore > > I pulled your repository from http://git.infradead.org/geode.git (or more, > > the git url). > > > > I tried to build the geodewdt driver as a module - which didn't work, and > > it failed with the same problem as earlier mentioned on lkmk [1]. I also > > checked the fix [2], but that seems to be already in your (or linus) tree - > > and so I'm unsure what the problem is. > > > > [1] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884074 > > [2] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884174 > > > > Building directly into the kernel seems to work. > > > > Maybe you have some idea? > > Hmm - that is strange. Exporting the symbols should work. I recommend > starting over with a clean tree. > > CCing Andres - any thoughts? > > Jordan > Er, yeah. The patch below should fix it. This should probably go into 2.6.25. Oops, EXPORT_SYMBOL_GPL wasn't being declared due to this header being missing. Signed-off-by: Andres Salomon Signed-off-by: Ingo Molnar commit c6e8256a7b15033bc5d7797e25c7e053040c4c7c Author: Stephan Diestelhorst Date: Mon Mar 10 16:05:41 2008 +0100 x86, cpufreq: fix Speedfreq-SMI call that clobbers ECX I have found that using SMI to change the cpu's frequency on my DELL Latitude L400 clobbers the ECX register in speedstep_set_state, causing unneccessary retries because the "state" variable has changed silently (GCC assumes it is still present in ECX). play safe and avoid gcc caching any register across IO port accesses that trigger SMIs. Signed-off by: Signed-off-by: Ingo Molnar commit 475613b9e374bf0c15340eb166a962da04aa02e8 Author: Yinghai Lu Date: Sun Feb 24 23:23:09 2008 -0800 x86: fix memoryless node oops during boot fix oops during boot reported in this thread: http://lkml.org/lkml/2008/2/6/65 enable booting on memoryless nodes. Reported-by: Kamalesh Babulal Signed-off-by: Ingo Molnar commit 3c274c2909e17aa0afeded4cd4520b7357357ca0 Author: Ingo Molnar Date: Fri Mar 21 10:06:32 2008 +0100 x86: add dmi quirk for io_delay reported by mereandor@gmail.com, in: http://bugzilla.kernel.org/show_bug.cgi?id=6307 Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit 1d3381ebf42de1b6f8c118732893cb5bdc37edcd Author: Randy Dunlap Date: Thu Mar 13 16:59:12 2008 -0700 x86: convert mtrr/generic.c to kernel-doc Convert function comment blocks to kernel-doc notation. Signed-off-by: Randy Dunlap Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit c0c20fb5a8f2e2eddf7f0e5467c7511fee907903 Author: Nick Andrew Date: Tue Mar 4 15:05:40 2008 -0800 x86: Documentation/i386/IO-APIC.txt: fix description The description of the interrupt routing doesn't match the (nice) diagram. Signed-off-by: Nick Andrew Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit 6edef97e17ca1c322b146023862da8a39e36204d Author: Masami Hiramatsu Date: Wed Mar 26 15:53:19 2008 -0400 kprobes: MAINTAINERS update Add Masami Hiramatsu to kprobes maintainers Signed-off-by: Masami Hiramatsu Signed-off-by: Linus Torvalds commit 5254149f6c4e938fea3735183434e208097bd188 Merge: 8f404fa... ec1f5ee... Author: Linus Torvalds Date: Wed Mar 26 11:30:17 2008 -0700 Merge branch 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm * 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm: slab: fix cache_cache bootstrap in kmem_cache_init() count_partial() is not used if !SLUB_DEBUG and !CONFIG_SLABINFO commit 8f404faa72f4e458e7bd81ac75ce55ae829e953d Merge: 729eb52... 06d8308... Author: Linus Torvalds Date: Wed Mar 26 11:29:35 2008 -0700 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt: NOHZ: reevaluate idle sleep length after add_timer_on() clocksource: revert: use init_timer_deferrable for clocksource_watchdog commit 729eb528c7e10a4828fece102872ec5255946f64 Merge: c8237a5... 5eb7f9f... Author: Linus Torvalds Date: Wed Mar 26 11:27:24 2008 -0700 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: relay: set an spd_release() hook for splice set relay file can not be read by pread(2) commit c8237a5fcea9d49a73275b4c8f541dd42f8da1a4 Author: Tom Tucker Date: Tue Mar 25 22:27:19 2008 -0400 SVCRDMA: Check num_sge when setting LAST_CTXT bit The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly when the last chunk in the read-list spanned multiple pages. This resulted in a kernel panic when the wrong context was used to build the RPC iovec page list. RDMA_READ is used to fetch RPC data from the client for NFS_WRITE requests. A scatter-gather is used to map the advertised client side buffer to the server-side iovec and associated page list. WR contexts are used to convey which scatter-gather entries are handled by each WR. When the write data is large, a single RPC may require multiple RDMA_READ requests so the contexts for a single RPC are chained together in a linked list. The last context in this list is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes, the CQ handler code can enqueue the RPC for processing. The code in rdma_read_xdr was setting this bit on the last two contexts on this list when the last read-list chunk spanned multiple pages. This caused the svc_rdma_recvfrom logic to incorrectly build the RPC and caused the kernel to crash because the second-to-last context doesn't contain the iovec page list. Modified the condition that sets this bit so that it correctly detects the last context for the RPC. Signed-off-by: Tom Tucker Tested-by: Roland Dreier Signed-off-by: J. Bruce Fields Signed-off-by: Linus Torvalds commit 12c22d6ef299ccf0955e5756eb57d90d7577ac68 Author: Linus Torvalds Date: Wed Mar 26 11:22:40 2008 -0700 Revert "PCI: remove transparent bridge sizing" This reverts commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f, which caused various interesting problems for people, including wrong resource allocations. See for example bugzilla entry "2.6.25-rc2: ohci1394 problem (MMIO broken)" at http://bugzilla.kernel.org/show_bug.cgi?id=10080 And Gary Hade says: "The same change had also exposed an issue reported by Paul Martin that has been causing an Oops while hotplugging ThinkPads to a ThinkPad Dock II. See http://lkml.org/lkml/2008/2/19/405 http://bugzilla.kernel.org/show_bug.cgi?id=9961 I have a fix for the ThinkPad docking Oops but if the issue being discussed here is caused by the transparent bridge sizing removal change I totally agree that it should be reverted." The transparent bridge sizing removal change was motivated by insufficient PCI memory resource for a transparent bridge window that was being created as a result of expansion ROM(s) being included in the transparent bridge sizing calculations. A later "PCI: Remove default PCI expansion ROM memory allocation" change ( re: http://lkml.org/lkml/2007/12/11/361 ) removes the expansion ROM(s) from the transparent bridge sizing calculations which actually resolves the original issue in a different manner. So, even if the "PCI: remove transparent bridge sizing" is not problematic it is no longer needed anyway." Identified-by: Ivan Kokshaysky Tested-by: Thomas Meyer Acked-by: Gary Hade Acked-by: Ingo Molnar Cc: Stefan Richter Signed-off-by: Linus Torvalds commit ec1f5eeeb5a79a0d48036de649a3498da42db565 Author: Daniel Yeisley Date: Tue Mar 25 23:59:08 2008 +0200 slab: fix cache_cache bootstrap in kmem_cache_init() Commit 556a169dab38b5100df6f4a45b655dddd3db94c1 ("slab: fix bootstrap on memoryless node") introduced bootstrap-time cache_cache list3s for all nodes but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This patch fixes list_add() corruption in mm/slab.c seen on the ES7000. Cc: Mel Gorman Cc: Olaf Hering Cc: Christoph Lameter Signed-off-by: Dan Yeisley Signed-off-by: Pekka Enberg Signed-off-by: Christoph Lameter commit 53625b4204753b904addd40ca96d9ba802e6977d Author: Christoph Lameter Date: Wed Mar 19 13:42:07 2008 -0700 count_partial() is not used if !SLUB_DEBUG and !CONFIG_SLABINFO Avoid warnings about unused functions if neither SLUB_DEBUG nor CONFIG_SLABINFO is defined. This patch will be reversed when slab defrag is merged since slab defrag requires count_partial() to determine the fragmentation status of slab caches. Signed-off-by: Christoph Lameter commit 5eb7f9fa847b8ab6e4864bfb8cb45f370844a47c Author: Jens Axboe Date: Wed Mar 26 12:04:09 2008 +0100 relay: set an spd_release() hook for splice relay doesn't reference the pages it adds, however we need a non-NULL hook or splice_to_pipe() can oops. Signed-off-by: Jens Axboe commit 37529fe9f62835e1c11895a1895064748b032dc1 Author: Lai Jiangshan Date: Wed Mar 26 12:01:28 2008 +0100 set relay file can not be read by pread(2) I found that relay files can be read by pread(2). I fix it, for relay files are not capable of seeking. Signed-off-by: Lai Jiangshan Signed-off-by: Jens Axboe commit 06d8308c61e54346585b2691c13ee3f90cb6fb2f Author: Thomas Gleixner Date: Sat Mar 22 09:20:24 2008 +0100 NOHZ: reevaluate idle sleep length after add_timer_on() add_timer_on() can add a timer on a CPU which is currently in a long idle sleep, but the timer wheel is not reevaluated by the nohz code on that CPU. So a timer can be delayed for quite a long time. This triggered a false positive in the clocksource watchdog code. To avoid this we need to wake up the idle CPU and enforce the reevaluation of the timer wheel for the next timer event. Add a function, which checks a given CPU for idle state, marks the idle task with NEED_RESCHED and sends a reschedule IPI to notify the other CPU of the change in the timer wheel. Call this function from add_timer_on(). Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra Acked-by: Ingo Molnar Cc: stable@kernel.org -- include/linux/sched.h | 6 ++++++ kernel/sched.c | 43 +++++++++++++++++++++++++++++++++++++++++++ kernel/timer.c | 10 +++++++++- 3 files changed, 58 insertions(+), 1 deletion(-) commit 898a19de1502649877091b398229026b4142c0e2 Author: Thomas Gleixner Date: Tue Mar 25 09:01:51 2008 +0100 clocksource: revert: use init_timer_deferrable for clocksource_watchdog Revert commit 1077f5a917b7c630231037826b344b2f7f5b903f Author: Parag Warudkar Date: Wed Jan 30 13:30:01 2008 +0100 clocksource.c: use init_timer_deferrable for clocksource_watchdog clocksource_watchdog can use a deferrable timer - reduces wakeups from idle per second. The watchdog timer needs to run with the specified interval. Otherwise it will miss the possible wrap of the watchdog clocksource. Signed-off-by: Thomas Gleixner Cc: stable@kernel.org