commit b922df7383749a1c0b7ea64c50fa839263d3816b Merge: c54dcd8... cdbb92b... Author: Linus Torvalds Date: Fri Oct 10 13:10:51 2008 -0700 Merge branch 'rcu-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'rcu-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits) rcu: RCU-based detection of stalled CPUs for Classic RCU, fix rcu: RCU-based detection of stalled CPUs for Classic RCU rcu: add rcu_read_lock_sched() / rcu_read_unlock_sched() rcu: fix sparse shadowed variable warning doc/RCU: fix pseudocode in rcuref.txt rcuclassic: fix compiler warning rcu: use irq-safe locks rcuclassic: fix compilation NG rcu: fix locking cleanup fallout rcu: remove redundant ACCESS_ONCE definition from rcupreempt.c rcu: fix classic RCU locking cleanup lockdep problem rcu: trace fix possible mem-leak rcu: just rename call_rcu_bh instead of making it a macro rcu: remove list_for_each_rcu() rcu: fixes to include/linux/rcupreempt.h rcu: classic RCU locking and memory-barrier cleanups rcu: prevent console flood when one CPU sees another AWOL via RCU rcu, debug: detect stalled grace periods, cleanups rcu, debug: detect stalled grace periods rcu classic: new algorithm for callbacks-processing(v2) ... commit c54dcd8ec9f05c8951d1e622e90904aef95379f9 Merge: b11ce8a... 9ac684f... 
Author: Linus Torvalds Date: Fri Oct 10 12:44:43 2008 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() selinux: use default proc sid on symlinks file capabilities: uninline cap_safe_nice Update selinux info in MAINTAINERS and Kconfig help text SELinux: add gitignore file for mdp script SELinux: add boundary support and thread context assignment securityfs: do not depend on CONFIG_SECURITY selinux: add support for installing a dummy policy (v2) security: add/fix security kernel-doc selinux: Unify for- and while-loop style selinux: conditional expression type validation was off-by-one smack: limit privilege by label SELinux: Fix a potentially uninitialised variable in SELinux hooks SELinux: trivial, remove unneeded local variable SELinux: Trivial minor fixes that change C null character style make selinux_write_opts() static commit b11ce8a26d26ed9019a8803aa90d580b52f23e79 Merge: f6bccf6... a5d8c34... 
Author: Linus Torvalds Date: Fri Oct 10 12:42:31 2008 -0700 Merge branch 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits) sched debug: add name to sched_domain sysctl entries sched: sync wakeups vs avg_overlap sched: remove redundant code in cpu_cgroup_create() sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq cpusets: scan_for_empty_cpusets(), cpuset doesn't seem to be so const sched: minor optimizations in wake_affine and select_task_rq_fair sched: maintain only task entities in cfs_rq->tasks list sched: fixup buddy selection sched: more sanity checks on the bandwidth settings sched: add some comments to the bandwidth code sched: fixlet for group load balance sched: rework wakeup preemption CFS scheduler: documentation about scheduling policies sched: clarify ifdef tangle sched: fix list traversal to use _rcu variant sched: turn off WAKEUP_OVERLAP sched: wakeup preempt when small overlap kernel/cpu.c: create a CPU_STARTING cpu_chain notifier kernel/cpu.c: Move the CPU_DYING notifiers sched: fix __load_balance_iterator() for cfq with only one task ... commit f6bccf695431da0e9bd773550ae91b8cb9ffb227 Merge: 3af73d3... a0f000e... 
Author: Linus Torvalds Date: Fri Oct 10 11:20:42 2008 -0700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: skcipher - Use RNG interface instead of get_random_bytes crypto: rng - RNG interface and implementation crypto: api - Add fips_enable flag crypto: skcipher - Move IV generators into their own modules crypto: cryptomgr - Test ciphers using ECB crypto: api - Use test infrastructure crypto: cryptomgr - Add test infrastructure crypto: tcrypt - Add alg_test interface crypto: tcrypt - Abort and only log if there is an error crypto: crc32c - Use Intel CRC32 instruction crypto: tcrypt - Avoid using contiguous pages crypto: api - Display larval objects properly crypto: api - Export crypto_alg_lookup instead of __crypto_alg_lookup crypto: Kconfig - Replace leading spaces with tabs commit 3af73d392c9c414ca527bab9c5d4c2a97698acbd Merge: 13dd7f8... eedd5d0... Author: Linus Torvalds Date: Fri Oct 10 11:16:33 2008 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (29 commits) RDMA/nes: Fix slab corruption IB/mlx4: Set RLKEY bit for kernel QPs RDMA/nes: Correct error_module bit mask RDMA/nes: Fix routed RDMA connections RDMA/nes: Enhanced PFT management scheme RDMA/nes: Handle AE bounds violation RDMA/nes: Limit critical error interrupts RDMA/nes: Stop spurious MAC interrupts RDMA/nes: Correct tso_wqe_length RDMA/nes: Fill in firmware version for ethtool RDMA/nes: Use ethtool timer value RDMA/nes: Correct MAX TSO frags value RDMA/nes: Enable MC/UC after changing MTU RDMA/nes: Free NIC TX buffers when destroying NIC QP RDMA/nes: Fix MDC setting RDMA/nes: Add wqm_quanta module option RDMA/nes: Module parameter permissions RDMA/cxgb3: Set active_mtu in ib_port_attr RDMA/nes: Add support for 4-port 1G HP blade card RDMA/nes: Make mini_cm_connect() static 
... commit 13dd7f876dffb44088c5435c3df1986e33cff960 Merge: b0af205... 27eccf4... Author: Linus Torvalds Date: Fri Oct 10 11:13:55 2008 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: choose better identifiers dlm: remove bkl dlm: fix address compare dlm: fix locking of lockspace list in dlm_scand dlm: detect available userspace daemon dlm: allow multiple lockspace creates commit b0af205afb111e17ac8db64c3b9c4f2c332de92a Merge: 73f6aa4... 0c2322e... Author: Linus Torvalds Date: Fri Oct 10 11:11:47 2008 -0700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: dm: detect lost queue dm: publish dm_vcalloc dm: publish dm_table_unplug_all dm: publish dm_get_mapinfo dm: export struct dm_dev dm crypt: avoid unnecessary wait when splitting bio dm crypt: tidy ctx pending dm crypt: fix async inc_pending dm crypt: move dec_pending on error into write_io_submit dm crypt: remove inc_pending from write_io_submit dm crypt: tidy write loop pending dm crypt: tidy crypt alloc dm crypt: tidy inc pending dm exception store: use chunk_t for_areas dm exception store: introduce area_location function dm raid1: kcopyd should stop on error if errors handled dm mpath: remove is_active from struct dm_path dm mpath: use more error codes Fixed up trivial conflict in drivers/md/dm-mpath.c manually. commit 73f6aa4d44ab6157badc456ddfa05b31e58de5f0 Author: Christoph Hellwig Date: Fri Oct 10 17:28:29 2008 +1100 Fix barrier fail detection in XFS Currently we disable barriers as soon as we get a buffer in xlog_iodone that has the XBF_ORDERED flag cleared. But this can be the case not only for buffers where the barrier failed, but also the first buffer of a split log write in case of a log wraparound. Due to the disabled barriers we can easily get directory corruption on unclean shutdowns. 
So instead of using this check add a new buffer flag for failed barrier writes. This is a regression vs 2.6.26 caused by patch to use the right macro to check for the ORDERED flag, as we previously got true returned for every buffer. Thanks to Toei Rei for reporting the bug. Signed-off-by: Christoph Hellwig Reviewed-by: Eric Sandeen Reviewed-by: David Chinner Signed-off-by: Tim Shimmin Signed-off-by: Linus Torvalds commit 445e1ceda377a681c6f53595311b0d654ca21003 Merge: ef5bef3... 254db57... Author: Linus Torvalds Date: Fri Oct 10 11:02:22 2008 -0700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Support for I/O barriers GFS2: Add UUID to GFS2 sb GFS2: high time to take some time over atime GFS2: The war on bloat GFS2: GFS2 will panic if you misspell any mount options GFS2: Direct IO write at end of file error GFS2: Use an IS_ERR test rather than a NULL test GFS2: Fix race relating to glock min-hold time GFS2: Fix & clean up GFS2 rename GFS2: rm on multiple nodes causes panic GFS2: Fix metafs mounts GFS2: Fix debugfs glock file iterator commit ef5bef357cdf49f3a386c7102dbf3be5f7e5c913 Merge: e26feff... 41bfcf9... 
Author: Linus Torvalds Date: Fri Oct 10 10:53:26 2008 -0700 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (37 commits) [SCSI] zfcp: fix double dbf id usage [SCSI] zfcp: wait on SCSI work to be finished before proceeding with init dev [SCSI] zfcp: fix erp list usage without using locks [SCSI] zfcp: prevent fc_remote_port_delete calls for unregistered rport [SCSI] zfcp: fix deadlock caused by shared work queue tasks [SCSI] zfcp: put threshold data in hba trace [SCSI] zfcp: Simplify zfcp data structures [SCSI] zfcp: Simplify get_adapter_by_busid [SCSI] zfcp: remove all typedefs and replace them with standards [SCSI] zfcp: attach and release SAN nameserver port on demand [SCSI] zfcp: remove unused references, declarations and flags [SCSI] zfcp: Update message with input from review [SCSI] zfcp: add queue_full sysfs attribute [SCSI] scsi_dh: suppress comparison warning [SCSI] scsi_dh: add Dell product information into rdac device handler [SCSI] qla2xxx: remove the unused SCSI_QLOGIC_FC_FIRMWARE option [SCSI] qla2xxx: fix printk format warnings [SCSI] qla2xxx: Update version number to 8.02.01-k8. [SCSI] qla2xxx: Ignore payload reserved-bits during RSCN processing. [SCSI] qla2xxx: Additional residual-count corrections during UNDERRUN handling. ... commit e26feff647ef34423b048b940540a0059001ddb0 Merge: d403a64... b911e47... 
Author: Linus Torvalds Date: Fri Oct 10 10:52:45 2008 -0700 Merge branch 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block: (132 commits) doc/cdrom: Trvial documentation error, file not present block_dev: fix kernel-doc in new functions block: add some comments around the bio read-write flags block: mark bio_split_pool static block: Find bio sector offset given idx and offset block: gendisk integrity wrapper block: Switch blk_integrity_compare from bdev to gendisk block: Fix double put in blk_integrity_unregister block: Introduce integrity data ownership flag block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1 bio.h: Remove unused conditional code block: remove end_{queued|dequeued}_request() block: change elevator to use __blk_end_request() gdrom: change to use __blk_end_request() memstick: change to use __blk_end_request() virtio_blk: change to use __blk_end_request() blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure block: add lld busy state exporting interface block: Fix blk_start_queueing() to not kick a stopped queue include blktrace_api.h in headers_install ... commit d403a6484f0341bf0624d17ece46f24f741b6a92 Merge: ed458df... e496e3d... Author: Linus Torvalds Date: Fri Oct 10 08:07:53 2008 -0700 Merge phase #1 of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip This merges phase 1 of the x86 tree, which is a collection of branches: x86/alternatives, x86/cleanups, x86/commandline, x86/crashdump, x86/debug, x86/defconfig, x86/doc, x86/exports, x86/fpu, x86/gart, x86/idle, x86/mm, x86/mtrr, x86/nmi-watchdog, x86/oprofile, x86/paravirt, x86/reboot, x86/sparse-fixes, x86/tsc, x86/urgent and x86/vmalloc and as Ingo says: "these are the easiest, purely independent x86 topics with no conflicts, in one nice Octopus merge". 
* 'x86-v28-for-linus-phase1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (147 commits) x86: mtrr_cleanup: treat WRPROT as UNCACHEABLE x86: mtrr_cleanup: first 1M may be covered in var mtrrs x86: mtrr_cleanup: print out correct type v2 x86: trivial printk fix in efi.c x86, debug: mtrr_cleanup print out var mtrr before change it x86: mtrr_cleanup try gran_size to less than 1M, v3 x86: mtrr_cleanup try gran_size to less than 1M, cleanup x86: change MTRR_SANITIZER to def_bool y x86, debug printouts: IOMMU setup failures should not be KERN_ERR x86: export set_memory_ro and set_memory_rw x86: mtrr_cleanup try gran_size to less than 1M x86: mtrr_cleanup prepare to make gran_size to less 1M x86: mtrr_cleanup safe to get more spare regs now x86_64: be less annoying on boot, v2 x86: mtrr_cleanup hole size should be less than half of chunk_size, v2 x86: add mtrr_cleanup_debug command line x86: mtrr_cleanup optimization, v2 x86: don't need to go to chunksize to 4G x86_64: be less annoying on boot x86, olpc: fix endian bug in openfirmware workaround ... commit ed458df4d2470adc02762a87a9ad665d0b1a2bd4 Author: Linus Torvalds Date: Fri Oct 10 08:00:17 2008 -0700 PnP: move pnpacpi/pnpbios_init to after PCI init We already did that a long time ago for pnp_system_init, but pnpacpi_init and pnpbios_init remained as subsys_initcalls, and get linked into the kernel before the arch-specific routines that finalize the PCI resources (pci_subsys_init). This means that the PnP routines would either register their resources before the PCI layer could, or would be unable to check whether a PCI resource had already been registered. Both are problematic. I wanted to do this before 2.6.27, but every time we change something like this, something breaks. That said, _every_ single time we trust some firmware (like PnP tables) more than we trust the hardware itself (like PCI probing), the problems have been worse. 
Signed-off-by: Linus Torvalds commit 82219fceeb654789a9dd7cd3c6cce12dbf659342 Merge: 3fa8749... 0395e61... Author: Linus Torvalds Date: Fri Oct 10 07:46:45 2008 -0700 Merge branch 'upstream-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: ata_piix: IDE Mode SATA patch for Intel Ibex Peak DeviceIDs libata-eh: clear UNIT ATTENTION after reset ata_piix: add Hercules EC-900 mini-notebook to ich_laptop short cable list libata: reorder ata_device to remove 8 bytes of padding on 64 bits [libata] pata_bf54x: Add proper PM operation pata_sil680: convert CONFIG_PPC_MERGE to CONFIG_PPC libata: Implement disk shock protection support [libata] Introduce ata_id_has_unload() PATA: RPC now selects HAVE_PATA_PLATFORM for pata platform driver ata_piix: drop merged SCR access and use slave_link instead libata: implement slave_link libata: misc updates to prepare for slave link libata: reimplement link iterator libata: make SCR access ops per-link commit 0c2322e4ce144e130c03d813fe92de3798662c5e Author: Alasdair G Kergon Date: Fri Oct 10 13:37:13 2008 +0100 dm: detect lost queue Detect and report buggy drivers that destroy their request_queue. Signed-off-by: Alasdair G Kergon Cc: Stefan Raspl Cc: Jens Axboe Cc: Andrew Morton commit 54160904260fa764ba6e2dc738770be30fdf9553 Author: Mikulas Patocka Date: Fri Oct 10 13:37:12 2008 +0100 dm: publish dm_vcalloc Publish dm_vcalloc in include/linux/device-mapper.h because this function is used by targets. Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon commit ea0ec640940c2ae3a8d71af3249fccf06a9997a3 Author: Mikulas Patocka Date: Fri Oct 10 13:37:11 2008 +0100 dm: publish dm_table_unplug_all Publish dm_table_unplug_all in include/linux/device-mapper.h because this function is used by targets. 
Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon commit 89343da077ad564ed130c46e5ea6a79388410fa5 Author: Mikulas Patocka Date: Fri Oct 10 13:37:10 2008 +0100 dm: publish dm_get_mapinfo Publish dm_get_mapinfo in include/linux/device-mapper.h because this function is used by targets. Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon commit 82b1519b345d61dcfae526e3fcb08128f39f9bcc Author: Mikulas Patocka Date: Fri Oct 10 13:37:09 2008 +0100 dm: export struct dm_dev Split struct dm_dev in two and publish the part that other targets need in include/linux/device-mapper.h. Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon commit 933f01d43326fb12a978a8e0bb062c28a2de4d5a Author: Milan Broz Date: Fri Oct 10 13:37:08 2008 +0100 dm crypt: avoid unnecessary wait when splitting bio Don't wait between submitting crypt requests for a bio unless we are short of memory. There are two situations when we must split an encrypted bio: 1) there are no free pages; 2) the new bio would violate underlying device restrictions (e.g. max hw segments). In case (2) we do not need to wait. Add output variable to crypt_alloc_buffer() to distinguish between these cases. Signed-off-by: Milan Broz Signed-off-by: Alasdair G Kergon commit c8081618a9f832fdf7ca81eb087f9f61f2bf07d5 Author: Milan Broz Date: Fri Oct 10 13:37:08 2008 +0100 dm crypt: tidy ctx pending Move the initialisation of ctx->pending into one place, at the start of crypt_convert(). Introduce crypt_finished to indicate whether or not the encryption is finished, for use in a later patch. No functional change. Signed-off-by: Milan Broz Signed-off-by: Alasdair G Kergon commit 4e59409891c9cc30cb4d5d73250b0c968af8e39b Author: Milan Broz Date: Fri Oct 10 13:37:07 2008 +0100 dm crypt: fix async inc_pending The pending reference count must be incremented *before* the async work is queued to another thread, not after. 
Otherwise there's a race if the work completes and decrements the reference count before it gets incremented. Signed-off-by: Milan Broz Signed-off-by: Alasdair G Kergon commit 6c031f41db15b6cb0cd33545cec28ca706cd3c7e Author: Milan Broz Date: Fri Oct 10 13:37:06 2008 +0100 dm crypt: move dec_pending on error into write_io_submit Make kcryptd_crypt_write_io_submit() responsible for decrementing the pending count after an error. Also fixes a bug in the async path that forgot to decrement it. Signed-off-by: Milan Broz Signed-off-by: Alasdair G Kergon commit 1e37bb8e557a186d327eb4d1387953880ffc2cdd Author: Alasdair G Kergon Date: Fri Oct 10 13:37:05 2008 +0100 dm crypt: remove inc_pending from write_io_submit Make the caller responsible for incrementing the pending count before calling kcryptd_crypt_write_io_submit() in the non-async case to bring it into line with the async case. Signed-off-by: Alasdair G Kergon commit fc5a5e9aa878f86642c962b309f793fb2db0727e Author: Milan Broz Date: Fri Oct 10 13:37:04 2008 +0100 dm crypt: tidy write loop pending Move kcryptd_crypt_write_convert_loop inside kcryptd_crypt_write_convert. This change is needed for a later patch. No functional change. Signed-off-by: Milan Broz Signed-off-by: Alasdair G Kergon commit dc440d1e56c481f80d5350daadc7d078a04ca729 Author: Milan Broz Date: Fri Oct 10 13:37:03 2008 +0100 dm crypt: tidy crypt alloc Factor out crypt io allocation code. Later patches will call it from another place. No functional change. Signed-off-by: Milan Broz Signed-off-by: Alasdair G Kergon commit 3e1a8bdd05d6b1734a8ccf7af28042d72c447780 Author: Milan Broz Date: Fri Oct 10 13:37:02 2008 +0100 dm crypt: tidy inc pending Move io pending to one place. No functional change; useful to simplify debugging.
Signed-off-by: Milan Broz Signed-off-by: Alasdair G Kergon commit fd14acf6fc9f4635be201960004d847b14236a20 Author: Mikulas Patocka Date: Fri Oct 10 13:37:01 2008 +0100 dm exception store: use chunk_t for_areas Change uint32_t into chunk_t to remove 32-bit limitation on the number of chunks on systems with 64-bit sector numbers. Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon commit a481db784682b33d078c7bf8a1d0581dc09946c1 Author: Mikulas Patocka Date: Fri Oct 10 13:37:00 2008 +0100 dm exception store: introduce area_location function Move this logic to a function, because it will be reused later. Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon commit f7c83e2e4783c4f7abe6f3a85a8c5e210f98bc7b Author: Jonathan Brassow Date: Fri Oct 10 13:36:59 2008 +0100 dm raid1: kcopyd should stop on error if errors handled dm-raid1 is setting the 'DM_KCOPYD_IGNORE_ERROR' flag unconditionally when assigning kcopyd work. kcopyd is responsible for copying an assigned section of disk to one or more other disks. The 'DM_KCOPYD_IGNORE_ERROR' flag affects kcopyd in the following way: When not set: kcopyd will immediately stop the copy operation when an error is encountered. When set: kcopyd will try to proceed regardless of errors and try to continue copying any remaining amount. Since dm-raid1 tracks regions of the address space that are (or are not) in sync and it now has the ability to handle these errors, we can safely enable this optimization. This optimization is conditional on whether mirror error handling has been enabled. Signed-off-by: Jonathan Brassow Signed-off-by: Alasdair G Kergon commit 6680073d3ec7c6dbdbf77870bf1fea869767d779 Author: Kiyoshi Ueda Date: Fri Oct 10 13:36:58 2008 +0100 dm mpath: remove is_active from struct dm_path This patch moves 'is_active' from struct dm_path to struct pgpath as it does not need exporting. 
Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Alasdair G Kergon commit 01460f3520c100010aacc8f8500cafcb17ce4665 Author: Benjamin Marzinski Date: Fri Oct 10 13:36:57 2008 +0100 dm mpath: use more error codes This patch allows path errors from the multipath ctr function to propagate up to userspace as errno values from the ioctl() call. This is in response to https://www.redhat.com/archives/dm-devel/2008-May/msg00000.html and https://bugzilla.redhat.com/show_bug.cgi?id=444421 The patch only lets through the errors that it needs to in order to get the path errors from parse_path(). Signed-off-by: Benjamin Marzinski Signed-off-by: Alasdair G Kergon commit b911e473d24633c19414b54b82b9ff0b1a2419d7 Author: Randy Dunlap Date: Fri Oct 10 08:22:44 2008 +0200 doc/cdrom: Trvial documentation error, file not present The sbpcd tester program is not included in the kernel source tree, so remove the reference to it. Signed-off-by: Randy Dunlap Reported-by: Nick Warne Signed-off-by: Jens Axboe commit eedd5d0a707a8ad704e03bda5fbfe6b1a8e5f028 Merge: a7e80ce... c752c78... b9012e0... e441d63... 943c246... 7097228... cd86f42... d57f5f7... 208dde2... fbcffcc... Author: Roland Dreier Date: Thu Oct 9 17:41:15 2008 -0700 Merge branches 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'mad', 'misc', 'mlx4', 'mthca' and 'nes' into for-next commit fbcffcc6a0536544fa53cd5bd5c4913efe1a5982 Author: Chien Tung Date: Thu Oct 9 17:41:05 2008 -0700 RDMA/nes: Fix slab corruption Referencing cm_node after it is freed via rem_ref_cm_node() causes a slab corruption. There is no need to set cm_node->cm_id to NULL in mini_cm_close(). Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 9ac684fc38cf17fbd25c0c9e388713c5ddfa3b14 Merge: 3fa8749... 81990fb... 
Author: James Morris Date: Fri Oct 10 11:09:47 2008 +1100 Merge branch 'next' into for-linus commit a5d8c3483a6e19aca95ef6a2c5890e33bfa5b293 Author: Ingo Molnar Date: Thu Oct 9 11:35:51 2008 +0200 sched debug: add name to sched_domain sysctl entries Add /proc/sys/kernel/sched_domain/cpu0/domain0/name, to make it easier to see which specific scheduler domain remained at that entry. Since we process the scheduler domain tree and simplify it, it's not always immediately clear during debugging which domain came from where. Depends on CONFIG_SCHED_DEBUG=y. Signed-off-by: Ingo Molnar commit 57d1b5366f46fe434e565b710baf683daff78dd8 Author: Randy Dunlap Date: Thu Oct 9 10:42:38 2008 +0200 block_dev: fix kernel-doc in new functions Fix kernel-doc in new functions: Error(mmotm-2008-1002-1617//fs/block_dev.c:895): duplicate section name 'Description' Error(mmotm-2008-1002-1617//fs/block_dev.c:924): duplicate section name 'Description' Warning(mmotm-2008-1002-1617//fs/block_dev.c:1282): No description found for parameter 'pathname' Signed-off-by: Randy Dunlap cc: Andrew Patterson Signed-off-by: Jens Axboe commit af5639424008ffe96f89b059bea1aec15e0115a9 Author: Jens Axboe Date: Thu Oct 9 09:01:10 2008 +0200 block: add some comments around the bio read-write flags Signed-off-by: Jens Axboe commit 6feef531f55cf4a20fd9eb39f5352e5745203603 Author: Denis ChengRq Date: Thu Oct 9 08:57:05 2008 +0200 block: mark bio_split_pool static Since all bio_split calls refer to the same single bio_split_pool, the bio_split function can use bio_split_pool directly instead of taking a mempool_t parameter. The mempool_t parameter can then be removed from the bio_split parameter list, and since bio_split_pool is only referenced in fs/bio.c, it can be marked static. Signed-off-by: Denis ChengRq Signed-off-by: Jens Axboe commit ad3316bf4eeb53c89164f759767f911072b56203 Author: Martin K.
Petersen Date: Wed Oct 1 22:42:53 2008 -0400 block: Find bio sector offset given idx and offset Helper function to find the sector offset in a bio given bvec index and page offset. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe commit b02739b01c5309d74a59859f2ce92c931d1f1955 Author: Martin K. Petersen Date: Thu Oct 2 18:47:49 2008 +0200 block: gendisk integrity wrapper This is a wrapper for accessing a gendisk's integrity bits. It allows the integrity support in MD to be compiled with BLK_DEV_INTEGRITY off. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe commit ad7fce93147d32ae53d25d9ea1a8ba31a239deee Author: Martin K. Petersen Date: Wed Oct 1 03:38:39 2008 -0400 block: Switch blk_integrity_compare from bdev to gendisk The DM and MD integrity support now depends on being able to use gendisks instead of block_devices when comparing integrity profiles. Change function parameters accordingly. Also update comparison logic so that two NULL profiles are a valid configuration. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe commit 0c032ab889e7b20b8a5a7d09313e4aca214a15f7 Author: Martin K. Petersen Date: Wed Oct 1 03:38:38 2008 -0400 block: Fix double put in blk_integrity_unregister - kobject_del already puts the parent. - Set integrity profile to NULL to prevent stale data. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe commit 74aa8c2cc010035a7eef2b4ca4d6430e0dae206a Author: Martin K. Petersen Date: Wed Oct 1 03:38:37 2008 -0400 block: Introduce integrity data ownership flag A filesystem might supply its own integrity metadata. Introduce a flag that indicates whether the filesystem or the block layer owns the integrity buffer. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe commit b04accc425d52ca59699290661e0dfd09b0feeeb Author: Jens Axboe Date: Thu Oct 2 12:53:22 2008 +0200 block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1 We need bdev_get_integrity() to support the pending md/dm patches. 
Signed-off-by: Jens Axboe commit 8deaf7210728c453295dc1cb2a5b66c68183ac85 Author: Alberto Bertogli Date: Thu Oct 2 12:46:53 2008 +0200 bio.h: Remove unused conditional code The whole bio_integrity() definition is inside an #ifdef CONFIG_BLK_DEV_INTEGRITY, so there's no need for the conditional code. Signed-off-by: Alberto Bertogli Signed-off-by: Jens Axboe commit d00e29fd99dd63d1c51917604e35dee824ed567f Author: Kiyoshi Ueda Date: Wed Oct 1 10:14:46 2008 -0400 block: remove end_{queued|dequeued}_request() This patch removes end_queued_request() and end_dequeued_request(), which are no longer used. As a result, end_request() became the only user of __end_request(). So the actual code in __end_request() is moved to end_request() and __end_request() is removed. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Jens Axboe commit 99cd3386f290eaf61f2b7596d5a4cc2007771174 Author: Kiyoshi Ueda Date: Wed Oct 1 10:13:44 2008 -0400 block: change elevator to use __blk_end_request() This patch converts the elevator to use __blk_end_request() directly so that end_{queued|dequeued}_request() can be removed. Related 'uptodate' arguments are converted to 'error'. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Jens Axboe commit 7afb3a6e752503d5ebeb038336aa0fa886a51b44 Author: Kiyoshi Ueda Date: Wed Oct 1 10:13:02 2008 -0400 gdrom: change to use __blk_end_request() This patch converts gdrom to use __blk_end_request() directly so that end_{queued|dequeued}_request() can be removed. gd.transfer is '1' in error cases and '0' in non-error cases, so gdrom hasn't been propagating any error code to the block layer. We can just convert error cases to '-EIO'.
Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Cc: Adrian McMenamin Signed-off-by: Jens Axboe commit 2a9df5055a99df25533daf4041fdb99f0ed3463c Author: Kiyoshi Ueda Date: Wed Oct 1 10:12:15 2008 -0400 memstick: change to use __blk_end_request() This patch converts memstick to use __blk_end_request() directly so that end_{queued|dequeued}_request() can be removed. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Cc: Alex Dubov Signed-off-by: Jens Axboe commit 8316982ac06d7d8875dc8738efbb030791dc33bb Author: Kiyoshi Ueda Date: Wed Oct 1 10:11:20 2008 -0400 virtio_blk: change to use __blk_end_request() This patch converts virtio_blk to use __blk_end_request() directly so that end_{queued|dequeued}_request() can be removed. Related 'uptodate' argument is converted to 'error'. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Cc: Rusty Russell Signed-off-by: Jens Axboe commit 0497b345e7d067109e0dd9bf9f4978a6847ee13b Author: Jens Axboe Date: Wed Oct 1 16:16:25 2008 +0200 blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure Define as 32, which is what BDEVNAME_SIZE is/was as well. This keeps the user interface the same and gets rid of the difference between kernel and user API here. Signed-off-by: Jens Axboe commit ef9e3facdf1fe1228721a7c295a76d1b7a0e57ec Author: Kiyoshi Ueda Date: Wed Oct 1 16:12:15 2008 +0200 block: add lld busy state exporting interface This patch adds a new interface, blk_lld_busy(), to check the lld's busy state from the block layer. blk_lld_busy() calls down into the low-level driver to check its busy state if the driver has set q->lld_busy_fn() using blk_queue_lld_busy(). This resolves the following performance problem on request stacking devices. Some drivers, like the SCSI mid layer, stop dispatching requests when they detect a busy state on their low-level device (host/target/device). It allows other requests to stay in the I/O scheduler's queue for a chance of merging.
Request stacking drivers like request-based dm should follow the same logic. However, there is no generic interface for the stacked device to check if the underlying device(s) are busy. If the request stacking driver dispatches and submits requests to the busy underlying device, the requests will stay in the underlying device's queue without a chance of merging. This causes a performance problem under bursty I/O load. With this patch, the busy state of the underlying device is exported via q->lld_busy_fn(), so the request stacking driver can check it and stop dispatching requests while the device is busy. The underlying device driver must return the busy state appropriately: 1: when the device driver can't process requests immediately. 0: when the device driver can process requests immediately, including abnormal situations where the device driver needs to kill all requests. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Cc: Andrew Morton Signed-off-by: Jens Axboe commit 336c3d8ce771608815b65bcfa27a17a83b297328 Author: Elias Oltmanns Date: Wed Oct 1 16:02:33 2008 +0200 block: Fix blk_start_queueing() to not kick a stopped queue blk_start_queueing() should act like the generic queue unplugging and kicking and ignore a stopped queue. Such a queue may not be run until after a call to blk_start_queue(). Signed-off-by: Elias Oltmanns Signed-off-by: Jens Axboe commit c0ddffa84a7d12da9943a94d04dadbfb1883b904 Author: Sven Schuetz Date: Fri Sep 26 10:58:02 2008 +0200 include blktrace_api.h in headers_install This header file is of interest for user space programming, i.e. for tools that process blktrace data. We would like to use it for a tool on top of blktrace which processes data provided by blktrace. For this purpose, it would be helpful if the blktrace API made it into /usr/include/linux. The git tree for the blktrace tools comes with its own copy of this header file. I didn't manage to replace that copy with the file generated by the patch below yet.
A few more cleanups would be needed. For example, the blktrace ioctl numbers, which are currently defined in usr/include/fs.h, might need to be moved. That should be feasible, though. Signed-off-by: Sven Schuetz Signed-off-by: Martin Peschke Signed-off-by: Jens Axboe commit e3ba9ae58a5599226e3976b29c8093041ae7c332 Author: Jens Axboe Date: Thu Sep 25 11:42:41 2008 +0200 block: reserve some tags just for sync IO By only allowing async IO to consume 3/4 of the tag depth, we always have slots free to serve sync IO. This is important to avoid having writes fill the entire tag queue, thus starving reads. Original patch and idea from Linus Torvalds Signed-off-by: Jens Axboe commit f7d7b7a7a3db6526a84ea755c1c54a051e9a52de Author: Jens Axboe Date: Thu Sep 25 11:37:50 2008 +0200 block: as/cfq ssd idle check update We really need to know about the hardware tagging support as well, since if the SSD does not do tagging then we still want to idle. Otherwise we have the same dependent sync IO vs flooding async IO problem as on rotational media. Signed-off-by: Jens Axboe commit 8bff7c6b0f63c7ee9c5e3a076338d74125b8debb Author: Jens Axboe Date: Wed Sep 24 13:05:10 2008 +0200 libata: set queue SSD flag for SSD devices SSD devices should give an RPM setting of 1 in word 217 of the ID page. If we see such a device, tell the block layer about it. Signed-off-by: Jens Axboe commit a68bbddba486020c9c74825ce90c4c1ec463e0e8 Author: Jens Axboe Date: Wed Sep 24 13:03:33 2008 +0200 block: add queue flag for SSD/non-rotational devices We don't want to idle in AS/CFQ if the device doesn't have a seek penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational device; low level drivers should set this flag upon discovery of an SSD or similar device type.
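The sync-IO tag reservation from commit e3ba9ae5 can be sketched as a tag allocator in which async requests only start searching at depth/4, so at most three quarters of the tags can ever hold async IO. struct tag_map, MAX_TAGS and tag_alloc() are hypothetical; this is one way to express the rule in a user-space model, not the block layer's tag code.

```c
#include <stdbool.h>

#define MAX_TAGS 64

/* Hypothetical tag map: used[i] is true while tag i is allocated. */
struct tag_map {
	int  depth;            /* usable tags, must be <= MAX_TAGS */
	bool used[MAX_TAGS];
};

/*
 * Allocate a tag. Sync requests may take any free tag; async requests
 * only search from depth/4 upward, so the lowest quarter of the tag
 * space always stays available for sync IO. Returns the tag, or -1.
 */
int tag_alloc(struct tag_map *map, bool is_sync)
{
	int start = is_sync ? 0 : map->depth / 4;

	for (int tag = start; tag < map->depth; tag++) {
		if (!map->used[tag]) {
			map->used[tag] = true;
			return tag;
		}
	}
	return -1;  /* no suitable tag free: caller must wait */
}
```

With a depth of 8, a flood of async writes can occupy at most tags 2 through 7, so tags 0 and 1 stay free for reads and other sync IO.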
Signed-off-by: Jens Axboe commit 9e49184c82e9ec3ab4d45f9ea5a17ccaf43869f0 Author: Keith Wansbrough Date: Mon Sep 22 14:57:17 2008 -0700 floppy: support arbitrary first-sector numbers The current floppy_struct allows floppies to number sectors starting from 0 or 1. This patch allows arbitrary first-sector numbers - for example, 0xC1 for Amstrad CPC disks. This extends the existing 1-bit field (FD_ZEROBASED, bit 2 of stretch) to 8 bits (FD_SECTMASK, bits 2 to 9). Currently 0x00 denotes a first sector number of 1, and 0x01 denotes a first sector number of 0. We extend this by interpreting FD_SECTMASK as the first sector number with the LSB flipped. Signed-off-by: Keith Wansbrough Cc: Alain Knaff Cc: Michael Kerrisk Cc: Karel Zak Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe commit 061837bc8687edc2739ef02f721b7ae0b8076390 Author: Julia Lawall Date: Mon Sep 22 14:57:16 2008 -0700 drivers/block: Use DIV_ROUND_UP The kernel.h macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) / (d)) but is perhaps more readable. An extract of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // @haskernel@ @@ #include @depends on haskernel@ expression n,d; @@ ( - (n + d - 1) / d + DIV_ROUND_UP(n,d) | - (n + (d - 1)) / d + DIV_ROUND_UP(n,d) ) @depends on haskernel@ expression n,d; @@ - DIV_ROUND_UP((n),d) + DIV_ROUND_UP(n,d) @depends on haskernel@ expression n,d; @@ - DIV_ROUND_UP(n,(d)) + DIV_ROUND_UP(n,d) // Signed-off-by: Julia Lawall Cc: Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe commit 905bd78f2188da69e74966918e3d71df3dff382b Author: scameron@beardog.cca.cpqcorp.net Date: Fri Sep 19 18:27:47 2008 -0700 cciss: Fix cciss SCSI rescan code to better notice device changes Fix cciss SCSI rescan code to better notice device changes. 
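The floppy first-sector encoding described above (FD_SECTMASK covering bits 2 to 9 of stretch, holding the first sector number with its LSB flipped) works out as follows. The mask value 0x3FC is derived from that bit range and the macro name is taken from the commit text; the helpers fd_first_sector() and fd_set_first_sector() are hypothetical.

```c
/* FD_SECTMASK: bits 2..9 of floppy_struct.stretch, per the text above. */
#define FD_SECTMASK 0x3FCu

/* Decode: the stored field is the first sector number with its LSB flipped. */
unsigned int fd_first_sector(unsigned int stretch)
{
	return ((stretch & FD_SECTMASK) >> 2) ^ 1u;
}

/* Encode first (0..255) into stretch, leaving the other bits untouched. */
unsigned int fd_set_first_sector(unsigned int stretch, unsigned int first)
{
	return (stretch & ~FD_SECTMASK) | (((first ^ 1u) << 2) & FD_SECTMASK);
}
```

Flipping the LSB is what keeps existing formats compatible: an all-zero field still means sectors start at 1, and the old FD_ZEROBASED bit (bit 2) decodes to a first sector of 0. An Amstrad CPC disk whose sectors start at 0xC1 is stored as 0xC0 in the field.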
If you hot-unplug a tape drive, then hot-plug a different tape drive into the same slot in a storage enclosure, the cciss driver wouldn't notice anything had changed, as it was only looking at the LUN address and device type. Now it also looks at the device identifier from inquiry page 0x83 and at the vendor and model strings. Signed-off-by: Stephen M. Cameron Signed-off-by: Jens Axboe commit 79eb014578b79fcfb9d9e7dc979d1316079220aa Author: FUJITA Tomonori Date: Thu Sep 18 09:35:28 2008 -0700 fix an example of scatterlists handling in DMA-API.txt This example isn't the proper way to handle scatterlists (it can't handle sg chaining). Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 4ee5eaf4516a60f8ef64d3c246c64c6be0cf8c3a Author: Kiyoshi Ueda Date: Thu Sep 18 10:46:13 2008 -0400 block: add a queue flag for request stacking support This patch adds a queue flag to indicate that the block device can be used for request stacking. Request stacking drivers may only stack their devices on top of devices whose q->request_fn is functional. Since bio stacking drivers (e.g. md, loop) basically initialize their queue using blk_alloc_queue() and don't set q->request_fn, checking (q->request_fn == NULL) looks sufficient for that purpose. However, dm will become both types of stacking driver (bio-based and request-based), and dm will always set q->request_fn even if the dm device is bio-based, in which case q->request_fn is not actually functional. So we need something else to distinguish the type of the device. Adding a queue flag is a solution for that. The reason why dm always sets q->request_fn is to keep compatibility with dm user-space tools. Currently, all dm user-space tools are using bio-based dm without specifying the type of the dm device they use. To use request-based dm without changing such tools, the kernel must decide the type of the dm device automatically.
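The cciss rescan change amounts to comparing more of a device's identity than just its address and type. The struct below and its field sizes are hypothetical, chosen only to show the before/after comparison; they are not the driver's data structures.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of what a rescan learns about one device. */
struct dev_ident {
	unsigned char lunid[8];       /* LUN address */
	int           devtype;        /* SCSI peripheral device type */
	unsigned char device_id[16];  /* identifier from INQUIRY VPD page 0x83 */
	char          vendor[9];      /* INQUIRY vendor, NUL-terminated */
	char          model[17];      /* INQUIRY model, NUL-terminated */
};

/* Old check: address and type only, so swapping one tape drive for
 * another of the same type in the same slot goes unnoticed. */
bool same_device_old(const struct dev_ident *a, const struct dev_ident *b)
{
	return memcmp(a->lunid, b->lunid, sizeof(a->lunid)) == 0 &&
	       a->devtype == b->devtype;
}

/* New check: additionally require matching device id, vendor and model. */
bool same_device_new(const struct dev_ident *a, const struct dev_ident *b)
{
	return same_device_old(a, b) &&
	       memcmp(a->device_id, b->device_id, sizeof(a->device_id)) == 0 &&
	       strcmp(a->vendor, b->vendor) == 0 &&
	       strcmp(a->model, b->model) == 0;
}
```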
The automatic type decision can't be made at device creation time and needs to be deferred until such tools load a mapping table, since the actual type is determined by the dm target type included in the mapping table. So a dm device has to be initialized using blk_init_queue() so that we can load either type of table. Then all queue members are set (e.g. q->request_fn), and nothing distinguishes whether the device is bio-based or request-based, even after a table is loaded and the type of the device is decided. By the way, some parts of the queue (e.g. request_list, elevator) are unneeded when the dm device is used as bio-based. But the memory overhead is not large (about 20KB per queue on ia64), so I hope it is acceptable for bio-based dm users. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Jens Axboe commit 82124d60354846623a4b94af335717a5e142a074 Author: Kiyoshi Ueda Date: Thu Sep 18 10:45:38 2008 -0400 block: add request submission interface This patch adds blk_insert_cloned_request(), a generic request submission interface for request stacking drivers. Request-based dm will use it to submit its clones to underlying devices. blk_rq_check_limits() is also added because the lower queue may have stricter limits than the upper queue if multiple drivers are stacking at the request level. Besides its internal use by blk_insert_cloned_request(), the function will also be used by request-based dm when the queue limits are modified (e.g. by replacing dm's table). Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Jens Axboe commit 32fab448e5e86694beade415e750363538ea5f49 Author: Kiyoshi Ueda Date: Thu Sep 18 10:45:09 2008 -0400 block: add request update interface This patch adds blk_update_request(), which updates a struct request by completing its data part but doesn't complete the struct request itself.
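The check blk_rq_check_limits() is described as performing, namely that a clone built against the upper queue still fits a lower queue whose limits may be stricter, can be sketched as follows. The two limits chosen (sectors and segments) and all struct and function names are assumptions for illustration; the real function's set of checks may differ.

```c
#include <stdbool.h>

/* Hypothetical subset of queue limits; field names are illustrative. */
struct queue_limits_model {
	unsigned int max_sectors;   /* largest request, in 512-byte sectors */
	unsigned int max_segments;  /* longest scatter/gather list */
};

/* Hypothetical view of a cloned request. */
struct request_model {
	unsigned int nr_sectors;
	unsigned int nr_segments;
};

/* A clone must fit the lower queue before it is inserted there. */
bool rq_fits_limits(const struct queue_limits_model *lim,
		    const struct request_model *rq)
{
	return rq->nr_sectors <= lim->max_sectors &&
	       rq->nr_segments <= lim->max_segments;
}
```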
Although it resembles end_that_request_first() from older kernels, blk_update_request() should be used only by request stacking drivers. Request-based dm will use it in the bio->bi_end_io callback to update the original request when a data part of a cloned request completes. The following is additional background on why request-based dm needs this interface. - Request stacking drivers can't use blk_end_request() directly from the lower driver's completion context (bio->bi_end_io or rq->end_io), because some device drivers (e.g. ide) may try to complete their request with the queue lock held, which may cause a deadlock. See below for a detailed description of the possible deadlock: - To solve that, request-based dm offloads the completion of the cloned struct request to softirq context (i.e. using blk_complete_request() from rq->end_io). - Though it is possible to use the same solution from bio->bi_end_io, it would delay the notification of bio completion to the original submitter. It would also cause inefficient partial completion, because the lower driver can't perform the cloned request anymore, and request-based dm would need to requeue and redispatch it to the lower driver later. That's not good. - So request-based dm needs blk_update_request() to perform the bio completion in the lower driver's completion context, which is more efficient. Signed-off-by: Kiyoshi Ueda Signed-off-by: Jun'ichi Nomura Signed-off-by: Jens Axboe commit e3335de94067dbebe22e3962632ead34e832cb60 Author: Jens Axboe Date: Thu Sep 18 09:22:54 2008 -0700 block: blk_cleanup_queue() should call blk_sync_queue() When a driver calls blk_cleanup_queue(), the device should be fully idle. However, the block layer may have pending plugging timers and the IO schedulers may have pending work in the work queues. So quiesce the device by waiting for the timer and flushing the work queues.
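The split between completing a request's data and ending the request itself, which is what blk_update_request() is described as providing, can be modelled with simple byte accounting. rq_model, rq_model_update() and rq_model_end() are hypothetical, and returning true while data remains is a choice made for this model, not a statement about blk_update_request()'s signature.

```c
#include <stdbool.h>

/* Hypothetical request model: only the byte accounting is represented. */
struct rq_model {
	unsigned int bytes_left;  /* data not yet completed */
	bool         ended;       /* set only when the whole request is ended */
};

/*
 * Complete nbytes of the data part without ending the request.
 * Returns true while data remains. Ending the request is a separate
 * step the caller performs later (e.g. from softirq context).
 */
bool rq_model_update(struct rq_model *rq, unsigned int nbytes)
{
	if (nbytes > rq->bytes_left)
		nbytes = rq->bytes_left;
	rq->bytes_left -= nbytes;
	return rq->bytes_left != 0;
}

void rq_model_end(struct rq_model *rq)
{
	rq->ended = true;
}
```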
Signed-off-by: Jens Axboe commit 9246b5f06deeea541e7c62437c2ad19a0b1172c0 Author: Chris Lalancette Date: Wed Sep 17 14:30:32 2008 -0700 block: Expand Xen blkfront for > 16 xvd Until recently, the maximum number of xvd block devices you could attach to a Xen domU was 16. This limitation turned out to be problematic for some users, so it was expanded to handle a much larger number of disks. However, this requires a couple of changes in the way that blkfront scans for disks. This functionality is already present in the Xen linux-2.6.18-xen.hg tree; the attached patch adds this functionality to the mainline xen-blkfront implementation. I successfully tested it on a 2.6.25 tree, and build tested it on 2.6.27-rc3. Signed-off-by: Chris Lalancette Acked-by: Jeremy Fitzhardinge Signed-off-by: Jens Axboe commit 9c02f2b02e29a2244e36c6e1f246080d8afc6cff Author: Jens Axboe Date: Thu Sep 18 09:31:53 2008 -0700 block: cleanup some of the integrity stuff in blkdev.h Don't put functions that are only used in fs/bio-integrity.c in blkdev.h; it's much cleaner to just keep them in that file. Also kill the completely unused bdev_get_tag_size(). Signed-off-by: Jens Axboe commit 7ba1fbaa4a478f72fbaf5a56af9c82a77966b4c7 Author: Jens Axboe Date: Tue Sep 16 09:54:11 2008 -0700 block: use rq complete marking in blk_abort_request() We cannot abort a request if we raced with the timeout handler already, or with the IO completion. So make blk_abort_request() mark the request as complete, and only continue if we succeeded. Found and suggested by Mike Anderson Signed-off-by: Jens Axboe commit 581d4e28d9195aa8b2231383dbabc288988d615e Author: Jens Axboe Date: Sun Sep 14 05:56:33 2008 -0700 block: add fault injection mechanism for faking request timeouts This only works for the generic request timer handling. It allows one to sporadically ignore request completions, thus exercising the timeout handling.
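The blk_abort_request() fix follows a common pattern: every path that wants to finish a request (normal completion, the timeout handler, an abort) first tries to atomically mark it complete, and only the winner proceeds. A C11 sketch of that pattern with hypothetical names, using atomic_flag in place of the kernel's own request flag handling:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request carrying a single "complete" marker. */
struct rq_mark {
	atomic_flag complete;
};

/* Returns true only for the first caller to mark the request complete. */
bool rq_mark_complete(struct rq_mark *rq)
{
	return !atomic_flag_test_and_set(&rq->complete);
}

/* Abort path: back off if completion or the timeout handler already won. */
bool rq_try_abort(struct rq_mark *rq)
{
	if (!rq_mark_complete(rq))
		return false;
	/* ... the winner would run the timeout/abort handling here ... */
	return true;
}
```

Because test-and-set is a single atomic step, two racing paths can never both see "not yet complete", which is exactly the race the commit closes.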
Signed-off-by: Jens Axboe commit 0a0d96b03a1f3bfd6bc3ea08008699e8e59fccd9 Author: Jens Axboe Date: Thu Sep 11 13:17:37 2008 +0200 block: add bio_kmalloc() Not all callers need (or want!) the mempool backing guarantee, which essentially means that you can only use bio_alloc() for short allocations and not for preallocating some bios at setup or init time. So add bio_kmalloc(), which does the same thing as bio_alloc(), except it just uses kmalloc() as the backing instead of the bio mempools. Signed-off-by: Jens Axboe commit 3e6053d76dcbd92b2f9f4ad5ece9bce83149523e Author: Hugh Dickins Date: Thu Sep 11 10:57:55 2008 +0200 block: adjust blkdev_issue_discard for swap Two mods to blkdev_issue_discard(), thinking ahead to its use on swap: 1. Add gfp_mask argument, so swap allocation can use it where GFP_KERNEL might deadlock but GFP_NOIO is safe. 2. Enlarge nr_sects argument from unsigned to sector_t: unsigned long is enough to cover a whole swap area, but sector_t suits any partition. Change sb_issue_discard()'s nr_blocks to sector_t too; but there is no apparent need for a gfp_mask there, so just pass GFP_KERNEL down to blkdev_issue_discard(). Signed-off-by: Hugh Dickins Signed-off-by: Jens Axboe commit 4677735f03f5b6b6f2182f457a921855cadfb85b Author: FUJITA Tomonori Date: Tue Sep 2 22:50:08 2008 +0900 sg: remove unnecessary blk_rq_unmap_user blk_rq_unmap_user in sg_finish_rem_req can take care of all the cases. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 0b6cb26c6686f1f24607c41f0a6d21ce54191710 Author: FUJITA Tomonori Date: Tue Sep 2 22:50:07 2008 +0900 sg: remove sg_read_xfer sg_read_xfer was used to copy data to user space for READ commands. blk_rq_unmap_user does the job, so sg_read_xfer does nothing useful. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit c3919af2354fff673026dcbeac6f009d2ce5ceee Author: FUJITA Tomonori Date: Tue Sep 2 22:50:06 2008 +0900 sg: remove sg_write_xfer sg_write_xfer was used to copy data from user space for WRITE commands.
blk_rq_map_user_iov and blk_rq_map_user do the job, so sg_write_xfer does nothing useful. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 626710c9d665ff381c7ec666b6a023f064ca5fef Author: FUJITA Tomonori Date: Tue Sep 2 22:50:05 2008 +0900 sg: incorporate sg_build_direct into sg_start_req Calling blk_rq_map_user() in a single place is better than in two different places. It makes the code more understandable. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 44c7b0eaa041007066e30ab4869d5bbf8dad5989 Author: FUJITA Tomonori Date: Tue Sep 2 22:50:04 2008 +0900 sg: remove __sg_start_req __sg_start_req() was used temporarily to call blk_get_request() while converting sg to use the block layer. Now sg always calls blk_get_request(), so we can move blk_get_request() to sg_start_req(). We don't need __sg_start_req anymore. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit fd1c1de0766844af4cfc39298e109ad273e72a9e Author: FUJITA Tomonori Date: Tue Sep 2 22:50:03 2008 +0900 sg: remove b_malloc_len in sg_scatter_hold struct It's not used for anything useful after the block layer conversion. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 7e56cb0f7e7a132803ffefa0a5a15fb2079afaf1 Author: FUJITA Tomonori Date: Tue Sep 2 22:50:02 2008 +0900 sg: remove SG_ALLOW_DIO_CODE define sg had lots of its own functions for direct IO, but now sg uses the block layer functions for it. Only five lines remain for direct IO. The SG_ALLOW_DIO_CODE define was used to compile out the direct IO code, but we don't need the define. If someone wants to remove the direct IO code, they can easily do so without the define.
Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit a91a3a20e06621b9931793888583efe37db4e4e8 Author: FUJITA Tomonori Date: Tue Sep 2 22:50:01 2008 +0900 sg: rename sg_cmd_done sg_rq_end_io The old sg_rq_end_io() was used to wrap sg_cmd_done while converting sg to use the block layer (to cover the difference between scsi_execute_async and blk_execute_rq_nowait). Now we don't need it, so let's remove it. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 224cb3e981f1b2f9f93dbd49eaef505d17d894c2 Author: Mike Anderson Date: Fri Aug 29 09:36:09 2008 +0200 dm: Call blk_abort_queue on failed paths Signed-off-by: Mike Anderson Signed-off-by: Jens Axboe commit 11914a53d2ec2974a565311af327b8983d8c820d Author: Mike Anderson Date: Sat Sep 13 20:31:27 2008 +0200 block: Add interface to abort queued requests Signed-off-by: Mike Anderson Signed-off-by: Jens Axboe commit 242f9dcb8ba6f68fcd217a119a7648a4f69290e9 Author: Jens Axboe Date: Sun Sep 14 05:55:09 2008 -0700 block: unify request timeout handling Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per queue. This avoids the overhead of having to tear down and set up a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson Signed-off-by: Jens Axboe commit 608aeef17a91747d6303de4df5e2c2e6899a95e8 Author: Andrew Patterson Date: Thu Sep 4 14:27:45 2008 -0600 Call flush_disk() after detecting an online resize. We call flush_disk() to make sure the buffer cache for the disk is flushed after a disk resize. There are two resize cases, growing and shrinking. Given that users can shrink and then grow a disk before revalidate_disk() is called, we treat the grow case identically to shrinking. We need to flush the buffer cache after an online shrink because, as James Bottomley puts it: "The two use cases for shrinking I can see are:
1. planned: the fs is already shrunk to within the new boundaries and all data is relocated, so invalidate is fine (any dirty buffers that might exist in the shrunk region are there only because they were relocated but not yet written to their original location). 2. unplanned: In this case, the fs is probably toast, so whether we invalidate or not isn't going to make a whole lot of difference; it's still going to try to read or write from sectors beyond the new size and get I/O errors." Immediately invalidating shrunk disks causes errors for outstanding reads/writes beyond the new end of the disk to be generated earlier than if we waited for the normal buffer cache operation. It also removes a potential security hole where we might keep old data around from beyond the end of the shrunk disk if the disk was not invalidated. Signed-off-by: Andrew Patterson Signed-off-by: Jens Axboe commit 56ade44b46780fa291fa68b824f1dafdcb11b0ca Author: Andrew Patterson Date: Thu Sep 4 14:27:40 2008 -0600 Added flush_disk to factor out common buffer cache flushing code. We need to be able to flush the buffer cache for more than just when a disk is changed, so we factor out common cache flush code in check_disk_change() to an internal flush_disk() routine. This routine will then be used for both disk changes and disk resizes (in a later patch). Include the disk name in the text indicating that there are busy inodes on the device and increase the KERN severity of the message. Signed-off-by: Andrew Patterson Signed-off-by: Jens Axboe commit f98a8cae12f2b2a8f9bfd7a53c990a1a405e880e Author: Andrew Patterson Date: Thu Sep 4 14:27:35 2008 -0600 SCSI sd driver calls revalidate_disk wrapper. Modify the SCSI disk driver to call the revalidate_disk() wrapper. This allows us to do some housekeeping such as accounting for a disk being resized online. The wrapper will call sd_revalidate_disk() at the appropriate time.
Signed-off-by: Andrew Patterson Signed-off-by: Jens Axboe commit 9bc3ffbfbdf71fefda8a261ef8d6fdc388a29b42 Author: Andrew Patterson Date: Thu Sep 4 14:27:30 2008 -0600 Check for device resize when rescanning partitions Check for device resize in the rescan_partitions() routine. If the device has been resized, the bdev size is set to match. The rescan_partitions() routine is called when opening the device and when calling the BLKRRPART ioctl. Signed-off-by: Andrew Patterson Signed-off-by: Jens Axboe commit c3279d1454cdfed02a557d789d8a6d08ab4cbe70 Author: Andrew Patterson Date: Thu Sep 4 14:27:25 2008 -0600 Adjust block device size after an online resize of a disk. The revalidate_disk routine now checks if a disk has been resized by comparing the gendisk capacity to the bdev inode size. If they are different (usually because the disk has been resized underneath the kernel) the bdev inode size is adjusted to match the capacity. Signed-off-by: Andrew Patterson Signed-off-by: Jens Axboe commit 0c002c2f74e10baa9021d3ecc50585c6eafea568 Author: Andrew Patterson Date: Thu Sep 4 14:27:20 2008 -0600 Wrapper for lower-level revalidate_disk routines. This is a wrapper for the lower-level revalidate_disk call-backs such as sd_revalidate_disk(). It allows us to perform pre and post operations when calling them. We will use this wrapper in a later patch to adjust block device sizes after an online resize (a _post_ operation). Signed-off-by: Andrew Patterson Signed-off-by: Jens Axboe commit 243294dae09c909c0442c8f04d470b69c3c19d6e Author: Tejun Heo Date: Thu Sep 4 09:17:31 2008 +0200 block: fix duplicate headers for /proc/partitions seqf can be started multiple times for a read and the header should be printed only for the initial one. Fix it. 
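The revalidate_disk() wrapper and the size check it performs as a post operation can be modelled like this. The structs, revalidate_disk_model() and demo_revalidate_grow() are hypothetical; sizes follow the block layer's 512-byte sector convention (hence the shift by 9). Per the flush_disk() commits above, the real code also flushes the buffer cache when the size changes, which this sketch omits.

```c
#include <stdint.h>

typedef uint64_t sector_t;  /* count of 512-byte sectors */

/* Hypothetical stand-ins for the gendisk and the block device inode. */
struct disk_model {
	sector_t capacity;                          /* as reported by the driver */
	int (*revalidate)(struct disk_model *disk); /* driver callback, may be NULL */
};

struct bdev_model {
	int64_t inode_size;                         /* bytes */
};

/* Example driver callback: pretend the LUN grew to 2048 sectors. */
int demo_revalidate_grow(struct disk_model *disk)
{
	disk->capacity = 2048;
	return 0;
}

/*
 * Wrapper: run the driver's revalidate callback, then, as a post
 * operation, make the bdev size match the capacity if they differ.
 */
int revalidate_disk_model(struct disk_model *disk, struct bdev_model *bdev)
{
	int ret = 0;

	if (disk->revalidate)
		ret = disk->revalidate(disk);

	int64_t disk_bytes = (int64_t)(disk->capacity << 9);  /* sectors -> bytes */
	if (disk_bytes != bdev->inode_size)
		bdev->inode_size = disk_bytes;  /* resized underneath the kernel */

	return ret;
}
```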
Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe commit fad7f01e61bf737fe8a3740d803f000db57ecac6 Author: FUJITA Tomonori Date: Tue Sep 2 16:20:20 2008 +0900 sg: set dxferp to NULL for READ with the older SG interface With the older SG interface, we don't know the user-space address to transfer data to when executing a SCSI command. So we can't pass a user-space address to blk_rq_map_user. This patch fixes sg to pass a NULL user-space address to blk_rq_map_user so that it just sets up a request and bios with page frames properly without data transfer. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 818827669d85b84241696ffef2de485db46b0b5e Author: FUJITA Tomonori Date: Tue Sep 2 16:20:19 2008 +0900 block: make blk_rq_map_user take a NULL user-space buffer This patch changes blk_rq_map_user to accept a NULL user-space buffer with a READ command if rq_map_data is not NULL. Thus a caller can pass page frames to blk_rq_map_user to just set up a request and bios with page frames properly. bio_uncopy_user (called via blk_rq_unmap_user) doesn't copy data to user space for such a request. Signed-off-by: FUJITA Tomonori Signed-off-by: Jens Axboe commit 839e96afba87117befd39cf4e43f156edc8047a7 Author: Jens Axboe Date: Tue Sep 2 09:25:21 2008 +0200 block: update comment on end_request() It refers to functions that no longer exist after the IO completion changes. Signed-off-by: Jens Axboe commit 55dc7db70a73a3809a2334063c9b5b0d8ccebdaa Author: Tejun Heo Date: Mon Sep 1 13:44:35 2008 +0200 init: DEBUG_BLOCK_EXT_DEVT requires explicit root= param DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers, so a root device number set using rdev becomes meaningless. Root devices should be explicitly specified using textual names. Warn about it if root can't be found and DEBUG_BLOCK_EXT_DEVT is enabled. Also, add a warning to the help text.
Signed-off-by: Tejun Heo Cc: Bartlomiej Zolnierkiewicz Signed-off-by: Jens Axboe commit 2bbedcb4c1abac498f18e5770d62ae66ff235ada Author: Tejun Heo Date: Fri Aug 29 11:41:51 2008 +0200 block: don't test for partition size in bdget_disk() and blk_lookup_devt() bdget_disk() and blk_lookup_devt() never cared whether the specified partition (or disk) is zero sized or not. I got confused while converting those not to depend on consecutive minor numbers in commit 5a6411b1178baf534aa9138052864dfa89d3eada, and later when dev0 was added it broke callers which expected a valid return for zero sized disk devices. So, they never needed nr_sects checks in the first place. Kill them. This problem was spotted and debugged by Bartlomiej Zolnierkiewicz. Signed-off-by: Tejun Heo Cc: Bartlomiej Zolnierkiewicz Signed-off-by: Jens Axboe commit 759f8ca3048f7438aa3129268d7252552505d662 Author: Jens Axboe Date: Fri Aug 29 09:06:29 2008 +0200 Change default value of CONFIG_DEBUG_BLOCK_EXT_DEVT to 'n' It's a debug option that you would explicitly enable to test this feature; we should default it to 'n' to prevent accidental surprises for now. Signed-off-by: Jens Axboe commit aeb3d3a81e81c6323a17fe914e91eb228b3f1aa1 Author: Harvey Harrison Date: Thu Aug 28 09:27:42 2008 +0200 block: kmalloc args reversed, small function definition fixes Noticed by sparse: block/blk-softirq.c:156:12: warning: symbol 'blk_softirq_init' was not declared. Should it be static?
block/genhd.c:583:28: warning: function 'bdget_disk' with external linkage has definition block/genhd.c:659:17: warning: incorrect type in argument 1 (different base types) block/genhd.c:659:17: expected unsigned int [unsigned] [usertype] size block/genhd.c:659:17: got restricted gfp_t block/genhd.c:659:29: warning: incorrect type in argument 2 (different base types) block/genhd.c:659:29: expected restricted gfp_t [usertype] flags block/genhd.c:659:29: got unsigned int block: kmalloc args reversed Signed-off-by: Harvey Harrison Signed-off-by: Jens Axboe commit 01cfcddd98f09e05a2e36031654ed46643b76f23 Author: FUJITA Tomonori Date: Thu Aug 28 15:05:59 2008 +0900 sg: use blk_rq_aligned helper function Signed-off-by: FUJITA Tomonori Cc: Douglas Gilbert Cc: Jens Axboe Signed-off-by: Jens Axboe commit 879040742cf09f2360a9ac41846288707e4e567c Author: FUJITA Tomonori Date: Thu Aug 28 15:05:58 2008 +0900 block: add blk_rq_aligned helper function This adds blk_rq_aligned helper function to see if alignment and padding requirement is satisfied for DMA transfer. This also converts blk_rq_map_kern and __blk_rq_map_user to use the helper function. Signed-off-by: FUJITA Tomonori Cc: Jens Axboe Signed-off-by: Jens Axboe commit 4d8ab62e087d9300883b82c2662e73e6eef803a3 Author: FUJITA Tomonori Date: Thu Aug 28 15:05:57 2008 +0900 bio: convert bio_copy_kern to use bio_copy_user bio_copy_kern and bio_copy_user are very similar. This converts bio_copy_kern to use bio_copy_user. Signed-off-by: FUJITA Tomonori Cc: Jens Axboe Signed-off-by: Jens Axboe commit 10db10d144c0248f285242f79daf6b9de6b00a62 Author: FUJITA Tomonori Date: Fri Aug 29 12:32:18 2008 +0200 sg: convert the indirect IO path to use the block layer This patch converts the indirect IO path (including mmap IO and old struct sg_header) to use the block layer functions (blk_get_request, blk_execute_rq_nowait, blk_rq_map_user, etc) instead of scsi_execute_async(). 
[Jens: fixed compile error with SCSI logging enabled] Signed-off-by: FUJITA Tomonori Signed-off-by: Douglas Gilbert Cc: Mike Christie Cc: James Bottomley Signed-off-by: Jens Axboe commit 6e5a30cba5e7c03b2cd564e968f1dd667a0f7c42 Author: FUJITA Tomonori Date: Thu Aug 28 16:17:08 2008 +0900 sg: convert the direct IO path to use the block layer This patch converts the direct IO path (SG_FLAG_DIRECT_IO) to use the block layer functions (blk_get_request, blk_execute_rq_nowait, blk_rq_map_user, etc) instead of scsi_execute_async(). Signed-off-by: FUJITA Tomonori Signed-off-by: Douglas Gilbert Cc: Mike Christie Cc: James Bottomley Signed-off-by: Jens Axboe commit 10865dfa34e7552c4c64606edcdf1e21a110c985 Author: FUJITA Tomonori Date: Thu Aug 28 16:17:07 2008 +0900 sg: convert the non-data path to use the block layer This patch converts the non-data path to use the block layer functions (blk_get_request, blk_execute_rq_nowait, etc) instead of scsi_execute_async(). Signed-off-by: FUJITA Tomonori Signed-off-by: Douglas Gilbert Cc: Mike Christie Cc: James Bottomley Signed-off-by: Jens Axboe commit 152e283fdfea0cd11e297d982378b55937842dde Author: FUJITA Tomonori Date: Thu Aug 28 16:17:06 2008 +0900 block: introduce struct rq_map_data to use reserved pages This patch introduces struct rq_map_data to enable bio_copy_user_iov() to use reserved pages. Currently, bio_copy_user_iov allocates bounce pages, but drivers/scsi/sg.c wants to allocate pages by itself and use them. struct rq_map_data can be used to pass allocated pages to bio_copy_user_iov. The current users of bio_copy_user_iov simply pass NULL (they don't want to use pre-allocated pages).
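The blk_rq_aligned() check described earlier (commit 87904074) asks whether a buffer's address and length both satisfy the queue's DMA alignment and padding requirements. In this sketch the two masks are passed in directly instead of being read from a request queue; rq_aligned_model() is a hypothetical stand-in, and the masks are assumed to have the form 2^n - 1 (e.g. 511 for 512-byte alignment).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Both the buffer address and its length must be multiples of the
 * combined alignment: OR-ing the two masks yields the stricter one
 * when both are of the form 2^n - 1.
 */
bool rq_aligned_model(uintptr_t addr, unsigned int len,
		      unsigned long dma_alignment, unsigned long dma_pad_mask)
{
	unsigned long mask = dma_alignment | dma_pad_mask;

	return (addr & mask) == 0 && (len & mask) == 0;
}
```

A caller that gets false here would typically fall back to copying through bounce pages (the bio_copy_user() path mentioned in these commits) instead of mapping the user buffer directly.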
Signed-off-by: FUJITA Tomonori Cc: Jens Axboe Cc: Douglas Gilbert Cc: Mike Christie Cc: James Bottomley Signed-off-by: Jens Axboe commit a3bce90edd8f6cafe3f63b1a943800792e830178 Author: FUJITA Tomonori Date: Thu Aug 28 16:17:05 2008 +0900 block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov Currently, blk_rq_map_user and blk_rq_map_user_iov always do GFP_KERNEL allocation. This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov so sg can use it (sg always does GFP_ATOMIC allocation). Signed-off-by: FUJITA Tomonori Signed-off-by: Douglas Gilbert Cc: Mike Christie Cc: James Bottomley Signed-off-by: Jens Axboe commit 45333d5a31296d0af886d94f1d08f128231cab8e Author: Aaron Carroll Date: Tue Aug 26 15:52:36 2008 +0200 cfq-iosched: fix queue depth detection CFQ's detection of queueing devices assumes a non-queuing device and detects if the queue depth reaches a certain threshold. Under some workloads (e.g. synchronous reads), CFQ effectively forces a unit queue depth, thus defeating the detection logic. This leads to poor performance on queuing hardware, since the idle window remains enabled. This patch inverts the sense of the logic: assume a queuing-capable device, and detect if the depth does not exceed the threshold. Signed-off-by: Aaron Carroll Signed-off-by: Jens Axboe commit 605401618ce4409045bc4db86e88d4b38f2ad585 Author: Jens Axboe Date: Tue Aug 26 13:34:34 2008 +0200 block: don't use bio_has_data() in the completion path We should just check for rq->bio, as that is really the information we are looking for. Even if the bio attached doesn't carry data, we still need to do IO post processing on it. Signed-off-by: Jens Axboe commit ab780f1ece0dc8d5e8e8e85435acc5e4747ccda3 Author: Jens Axboe Date: Tue Aug 26 10:25:02 2008 +0200 block: inherit CPU completion on bio->rq and rq->rq merges Somewhat incomplete, as we do allow merges of requests and bios that have different completion CPUs given. 
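The inverted CFQ queue depth detection can be sketched as a small state machine: start from hw_tag = true, track the peak driver depth over a window of samples, and clear hw_tag only if a full window never reaches the threshold. Samples with too little outstanding work are skipped, so a workload that itself forces unit depth (such as the synchronous reads mentioned above) doesn't count as evidence. The constants and names are illustrative assumptions, not necessarily CFQ's.

```c
#include <stdbool.h>

/* Illustrative constants, not necessarily CFQ's values. */
#define HW_QUEUE_MIN   5   /* depth that counts as "queuing" */
#define HW_TAG_SAMPLES 50  /* counted samples per detection window */

struct depth_detect {
	bool hw_tag;   /* current belief: device supports queuing */
	int  peak;     /* max requests seen in the driver this window */
	int  samples;  /* counted samples in this window */
};

void depth_detect_init(struct depth_detect *d)
{
	d->hw_tag = true;  /* assume a queuing-capable device */
	d->peak = 0;
	d->samples = 0;
}

/* Called with the number of requests in the driver and still queued. */
void depth_detect_sample(struct depth_detect *d, int in_driver, int queued)
{
	if (in_driver > d->peak)
		d->peak = in_driver;

	/* Too little outstanding work to tell either way: keep the guess. */
	if (queued <= HW_QUEUE_MIN && in_driver <= HW_QUEUE_MIN)
		return;

	if (++d->samples < HW_TAG_SAMPLES)
		return;

	/* Full window: conclude based on whether the depth ever got deep. */
	d->hw_tag = d->peak >= HW_QUEUE_MIN;
	d->peak = 0;
	d->samples = 0;
}
```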
This is done on the assumption that a larger IO is still more beneficial than CPU locality. Signed-off-by: Jens Axboe commit c7c22e4d5c1fdebfac4dba76de7d0338c2b0d832 Author: Jens Axboe Date: Sat Sep 13 20:26:01 2008 +0200 block: add support for IO CPU affinity This patch adds support for controlling the IO completion CPU of either all requests on a queue, or on a per-request basis. We export a sysfs variable (rq_affinity) which, if set, migrates completions of requests to the CPU that originally submitted them. A bio helper (bio_set_completion_cpu()) is also added, so that queuers can ask for completion on that specific CPU. In testing, this has been shown to cut the system time by as much as 20-40% on synthetic workloads where CPU affinity is desired. This requires a little help from the architecture, so it'll only work as designed for archs that are using the new generic smp helper infrastructure. Signed-off-by: Jens Axboe commit 18887ad910e56066233a07fd3cfb2fa11338b782 Author: Jens Axboe Date: Mon Jul 28 13:08:45 2008 +0200 block: make kblockd_schedule_work() take the queue as parameter Preparatory patch for checking queuing affinity. Signed-off-by: Jens Axboe commit b646fc59b332ef307895558c9cd1359dc2d25813 Author: Jens Axboe Date: Mon Jul 28 13:06:00 2008 +0200 block: split softirq handling into blk-softirq.c Signed-off-by: Jens Axboe commit 0835da67c11e879ed5dc23160934d8970470a2ce Author: Jens Axboe Date: Tue Aug 26 09:15:47 2008 +0200 block: use linux/uaccess.h in elevator.c instead of asm variant Signed-off-by: Jens Axboe commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe Author: Tejun Heo Date: Mon Aug 25 19:56:17 2008 +0900 block: allow disk to have extended device number Now that disk and partition handling is mostly unified, it's easy to allow a disk to have an extended device number. This patch makes add_disk() use an extended device number if disk->minors is zero. Both sd and ide-disk are updated to use this.
* sd_format_disk_name() is implemented, which can generically determine the drive name. This removes the disk number restriction stemming from limited device names. * If the sd index goes over SD_MAX_DISKS (which can be increased now BTW), sd simply doesn't initialize minors, letting the block layer choose an extended device number. * If CONFIG_DEBUG_EXT_DEVT is set, both sd and ide-disk always set minors to 0 and use extended device numbers. Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe commit 689d6fac40b41c7bf154f362deaf442548e4dc81 Author: Tejun Heo Date: Mon Aug 25 19:56:16 2008 +0900 block: replace @ext_minors with GENHD_FL_EXT_DEVT With previous changes, it's meaningless to limit the number of partitions. Replace @ext_minors with GENHD_FL_EXT_DEVT such that setting the flag allows the disk to have the maximum number of allowed partitions (only limited by the number of entries in parsed_partitions as determined by the MAX_PART constant). This kills the not-too-pretty alloc_disk_ext[_node]() functions and makes the @minors parameter to alloc_disk[_node]() unnecessary. The parameter is left alone to avoid disturbing the users. Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe commit 540eed5637b766bb1e881ef744c42617760b4815 Author: Tejun Heo Date: Mon Aug 25 19:56:15 2008 +0900 block: make partition array dynamic disk->__part used to be statically allocated to the maximum possible number of partitions. This patch makes partition array allocation dynamic. The added overhead is minimal, as the only real change is that one memory dereference becomes an RCU dereference. This saves both a bit of memory and the cpu cycles spent iterating through unoccupied slots, and makes increasing the partition limit easier. Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe commit 074a7aca7afa6f230104e8e65eba3420263714a5 Author: Tejun Heo Date: Mon Aug 25 19:56:14 2008 +0900 block: move stats from disk to part0 Move stats-related fields - stamp, in_flight, dkstats - from disk to part0 and unify stat handling such that...
    * part_stat_*() now updates part0 together if the specified partition is not part0, i.e. part_stat_*() are now essentially all_stat_*().
    * {disk|all}_stat_*() are gone.
    * part_round_stats() is updated similarly. It handles part0 stats automatically and disk_round_stats() is killed.
    * part_{inc|dec}_in_flight() is implemented, which automatically updates part0 stats for parts other than part0.
    * disk_map_sector_rcu() is updated to return part0 if no part matches. Combined with the above changes, this makes NULL special-case handling in callers unnecessary.
    * Separate stats show code paths for disk are collapsed into part stats show code paths.
    * Rename disk_stat_lock/unlock() to part_stat_lock/unlock().

    While at it, reposition stat handling macros a bit and add missing parentheses around macro parameters.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit eddb2e26b5ee3c5da68ba4bf1921ba20e2097bff
Author: Tejun Heo
Date:   Mon Aug 25 19:56:13 2008 +0900

    block: kill GENHD_FL_FAIL and use part0->make_it_fail

    GENHD_FL_FAIL for disk is what make_it_fail is for parts. Kill it and use part0->make_it_fail. Sysfs node handling is unified too.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 0762b8bde9729f10f8e6249809660ff2ec3ad735
Author: Tejun Heo
Date:   Mon Aug 25 19:56:12 2008 +0900

    block: always set bdev->bd_part

    Till now, bdev->bd_part was set only if the bdev was for a part other than part0. This patch makes bdev->bd_part always set so that code paths don't have to differentiate common handling.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 4c46501d1659475dc6c89554af6ce7fe6ecf615c
Author: Tejun Heo
Date:   Mon Aug 25 19:56:11 2008 +0900

    block: move holder_dir from disk to part0

    Move disk->holder_dir to part0->holder_dir. Kill the now mostly superfluous bdev_get_holder().

    While at it, kill the superfluous kobject_get/put() around holder_dir, slave_dir and cmd_filter creation and collapse disk_sysfs_add_subdirs() into register_disk().
    These serve no purpose but obfuscating the code.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit b7db9956e57c8151b930d5e5fe5c766e6aad3ff7
Author: Tejun Heo
Date:   Mon Aug 25 19:56:10 2008 +0900

    block: move policy from disk to part0

    Move disk->policy to part0->policy. Implement and use get_disk_ro().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit e56105214943ce5f0901d20e972a7cfd0d1d0656
Author: Tejun Heo
Date:   Mon Aug 25 19:56:09 2008 +0900

    block: unify sysfs size node handling

    Now that capacity and __dev are moved to part0, part0 and the other partitions can share the same method.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 548b10eb2959c96cef6fc29fc96e0931eeb53bc5
Author: Tejun Heo
Date:   Fri Aug 29 09:01:47 2008 +0200

    block: move __dev from disk to part0

    Move disk->__dev to part0->__dev. This simplifies bdget_disk() and lookup_devt() and allows common sysfs attributes to be unified. part_to_disk() is updated to handle part0 -> disk.

    Updated to include a fix from Bartlomiej Zolnierkiewicz, who writes:

    "part0 is a "special" partition and doesn't need to have capacity set - this fixes regression caused by "block: move __dev from disk to part0" commit."

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 80795aefb76d10c5d698e60c7e7750b5330787da
Author: Tejun Heo
Date:   Mon Aug 25 19:56:07 2008 +0900

    block: move capacity from disk to part0

    Move disk->capacity to part0->nr_sects and convert all users who directly accessed the field to use {get|set}_capacity(). This is done early to allow the __dev field to be moved.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit b5d0b9df0ba5d9a044f3a21e7544f53d90bd1465
Author: Tejun Heo
Date:   Wed Sep 3 09:06:42 2008 +0200

    block: introduce partition 0

    genhd and partition code handled disk and partitions separately. All information about the whole disk was in struct gendisk and partitions in struct hd_struct.
    However, the whole disk (part0) and other partitions have a lot in common, and the data structures end up having a good number of common fields and thus separate code paths doing the same thing. Also, the partition array was indexed by partno - 1, which gets pretty confusing at times.

    This patch introduces partition 0 and makes the partition array indexed by partno. Following patches will unify the handling of disk and parts piece-by-piece.

    This patch also implements disk_partitionable(), which tests whether a disk is partitionable. With the coming dynamic partition array change, the most common usage of disk_max_parts() will be testing whether a disk is partitionable, and the maximum number of partitions will become much less important.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit ed9e1982347b36573cd622ee5f4e2a7ccd79b3fd
Author: Tejun Heo
Date:   Mon Aug 25 19:56:05 2008 +0900

    block: implement and use {disk|part}_to_dev()

    Implement {disk|part}_to_dev() and use them to access the generic device instead of directly dereferencing {disk|part}->dev. To make sure no user is left behind, rename the generic device fields to __dev.

    This is in preparation for unifying partition 0 handling with other partitions.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 870d6656126add8e383645732b03df2b7ccd4f94
Author: Tejun Heo
Date:   Mon Aug 25 19:47:25 2008 +0900

    block: implement CONFIG_DEBUG_BLOCK_EXT_DEVT

    Extended devt introduces non-contiguous device numbers. This patch implements a debug option which forces most devt allocations to be from the extended area and spreads them out. This is enabled by default if DEBUG_KERNEL is set and achieves...

    1. Detects code paths in kernel or userland which expect predetermined consecutive device numbers.

    2. When something goes wrong, avoids corruption, as adding to the minor of an earlier partition won't lead to a wrong but valid device.
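The commit does not say how allocations are "spread out". One plausible scheme, assumed here purely for illustration, is to reverse the bit order of a 20-bit minor (Linux's internal dev_t uses MINORBITS = 20) so that consecutive allocation indices land far apart. `spread_minor()` is a hypothetical name, not a kernel function.

```c
#include <stdint.h>

/* Width of the minor field in Linux's internal dev_t layout. */
#define MINORBITS 20

/*
 * Hypothetical helper, not the kernel's code: reverse the low MINORBITS
 * bits of an allocation index so that consecutive indices map to minors
 * that are far apart. The mapping is its own inverse.
 */
static uint32_t spread_minor(uint32_t idx)
{
	uint32_t out = 0;
	int i;

	for (i = 0; i < MINORBITS; i++)
		out = (out << 1) | ((idx >> i) & 1u);
	return out;
}
```

With this mapping, indices 0, 1, 2 and 3 become minors 0, 524288, 262144 and 786432, so code that computes "disk minor + N" most likely lands on an unallocated number instead of on a different but valid device.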
    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit f615b48cc7df7cac3865ec76ac1a5bb04d3e07f4
Author: Tejun Heo
Date:   Mon Aug 25 19:47:24 2008 +0900

    sd/ide-disk: apply extended minors to sd and ide

    Update sd and ide-disk such that they can take advantage of extended minors.

    ide-disk already has 64 minors per device and currently doesn't use extended minors, although after this patch it can be turned on by simply tweaking constants.

    sd only had 16 minors per device, causing problems on certain peculiar configurations. This patch lifts the restriction and enables it to use up to 64 minors.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 1f0142905d4812966831613847db38a66da29eb8
Author: Tejun Heo
Date:   Mon Aug 25 19:47:23 2008 +0900

    block: adjust formatting for large minors and add ext_range sysfs attr

    With extended minors and the soon-to-follow debug feature, large minor numbers for block devices will be common. This patch does the following to make printouts pretty.

    * Adapt print formats such that large minors don't break the formatting.
    * For extended MAJ:MIN, the %02x%02x format for MAJ:MIN used in printk_all_partitions() doesn't cut it anymore. Update it such that %03x:%05x is used if either MAJ or MIN doesn't fit in %02x.
    * Implement the ext_range sysfs attribute, which shows the total number of minors the device can use, including both the conventional minor space and the extended one.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit bcce3de1be61e424deef35d1e86e86a35c4b6e65
Author: Tejun Heo
Date:   Mon Aug 25 19:47:22 2008 +0900

    block: implement extended dev numbers

    Implement extended device numbers. A block driver can tell the block layer that it wants to use extended device numbers. After the usual minor space is used up, the block layer automatically allocates devts from EXT_BLOCK_MAJOR.
    Currently only one major number is allocated for this, but as the allocation is strictly on-demand, the ~1mil minor space under it should suffice unless the system actually has more than ~1mil partitions; if that ever happens, adding more majors to the extended devt area is easy.

    Due to internal implementation issues, the first partition can't be allocated on the extended area. In other words, genhd->minors should be at least 1. This limitation will be lifted by later changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit c9959059161ddd7bf4670cf47367033d6b2f79c4
Author: Tejun Heo
Date:   Mon Aug 25 19:47:21 2008 +0900

    block: fix diskstats access

    There are two variants of stat functions: ones prefixed with double underbars, which don't care about preemption, and ones without, which disable preemption before manipulating per-cpu counters. It's unclear whether the underbarred ones assume that preemption is disabled on entry, as some callers don't do that.

    This patch unifies diskstats access by implementing disk_stat_lock() and disk_stat_unlock(), which take care of both RCU (for partition access) and preemption (for per-cpu counter access). diskstats access should always be enclosed between the two functions. As such, there's no need for the versions which disable preemption. They're removed and the double-underbar ones are renamed to drop the underbars. As an extra argument is added, there's no danger of using the old version unconverted.

    disk_stat_lock() uses get_cpu() and returns the cpu index, and all diskstat functions which access per-cpu counters now have a @cpu argument to help RT.

    This change adds RCU or preemption operations at some places but also collapses several preemption ops into one at others. Overall, the performance difference should be negligible as all involved ops are very lightweight per-cpu ones.
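The @cpu argument pattern can be illustrated in userspace with plain arrays standing in for per-cpu data. This is a sketch under stated assumptions: `NR_CPUS_SKETCH`, `struct disk_stats_sketch`, `struct part_sketch`, `stat_add_io()` and `stat_sum_ios()` are hypothetical names, and preemption and RCU are not modeled. The caller obtains a CPU index once (in the kernel, via get_cpu() inside disk_stat_lock()) and passes it to every update; readers sum all per-CPU slots.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count */

/* Hypothetical per-cpu counters: one slot per CPU. */
struct disk_stats_sketch {
	unsigned long ios[2];     /* [0] = reads, [1] = writes */
	unsigned long sectors[2];
};

struct part_sketch {
	struct disk_stats_sketch percpu[NR_CPUS_SKETCH];
};

/* Update only the slot of the CPU index the caller obtained up front. */
static void stat_add_io(struct part_sketch *p, int cpu, int rw,
			unsigned long nr_sectors)
{
	p->percpu[cpu].ios[rw]++;
	p->percpu[cpu].sectors[rw] += nr_sectors;
}

/* Readers fold all per-CPU slots into one total. */
static unsigned long stat_sum_ios(const struct part_sketch *p, int rw)
{
	unsigned long sum = 0;
	size_t cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += p->percpu[cpu].ios[rw];
	return sum;
}
```

Passing the index explicitly, rather than having each helper look it up, lets one lock/unlock pair cover several counter updates, which matches the commit's remark that several preemption ops collapse into one.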
    Signed-off-by: Tejun Heo
    Cc: Peter Zijlstra
    Signed-off-by: Jens Axboe

commit e71bf0d0ee89e51b92776391c5634938236977d5
Author: Tejun Heo
Date:   Wed Sep 3 09:03:02 2008 +0200

    block: fix disk->part[] dereferencing race

    disk->part[] is protected by its matching bdev's lock. However, non-critical accesses like collecting stats and printing out sysfs and proc information used to be performed without any locking. As partitions can come and go dynamically, partitions can go away underneath those non-critical accesses. As some of those accesses are writes, this can theoretically lead to silent corruption.

    This patch fixes the race by using RCU for the partition array and the dev reference counter to hold partitions.

    * Rename disk->part[] to disk->__part[] to make sure no one outside the genhd layer proper accesses it directly.
    * Use RCU for disk->__part[] dereferencing.
    * Implement disk_{get|put}_part(), which can be used to get and put partitions from gendisk respectively.
    * Iterators are implemented to help iterate through all partitions safely.
    * Functions which require the RCU read lock are marked with the _rcu suffix.
    * Use disk_put_part() in __blkdev_put() instead of directly putting the contained kobject.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit f331c0296f2a9fee0d396a70598b954062603015
Author: Tejun Heo
Date:   Wed Sep 3 09:01:48 2008 +0200

    block: don't depend on consecutive minor space

    * Implement disk_devt() and part_devt() and use them to directly access devt instead of computing it from ->major and ->first_minor.

      Note that all references to ->major and ->first_minor outside of the block layer are used to determine the devt of the disk (the part0), and as ->major and ->first_minor will continue to represent the devt for the disk, converting these users isn't strictly necessary. However, convert them for consistency.

    * Implement disk_max_parts() to avoid directly dereferencing genhd->minors.
    * Update bdget_disk() such that it doesn't assume consecutive minor space.
    * Move devt computation from register_disk() to add_disk() and make it the only one (all other usages use the initially determined value).

    These changes clean up the code and will help the disk->part dereference fix and extended block device numbers.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit cf771cb5a7b716f3f9e532fd42a1e3a0a75adec5
Author: Tejun Heo
Date:   Wed Sep 3 09:01:09 2008 +0200

    block: make variable and argument names more consistent

    In hd_struct, @partno is used to denote the partition number, and a number of other places use @part to denote hd_struct. Functions use @part and @index instead. This causes confusion and makes it difficult to use consistent variable names for hd_struct. Always use @partno if a variable represents a partition number.

    Also, print-out functions use @f or @part for the seq_file argument. Use @seqf uniformly instead.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 310a2c1012934f590192377f65940cad4aa72b15
Author: Tejun Heo
Date:   Mon Aug 25 19:47:17 2008 +0900

    block: misc updates

    This patch makes the following misc updates in preparation for the disk->part dereference fix and extended block devt support.

    * implement part_to_disk()
    * fix comment about gendisk->part indexing
    * rename get_part() to disk_map_sector()
    * don't use n, which is always zero, while printing disk information in diskstats_show()

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit 88e341261ca4d39eec21b212961c77eff51105f7
Author: Tejun Heo
Date:   Mon Aug 25 19:30:16 2008 +0900

    block: update add_partition() error handling

    d805dda4 tried to fix error case handling in add_partition() but had a few problems.

    * The disk->part[] entry is set early and left dangling if the operation fails.
    * Once the device is initialized, the last put_device() is responsible for freeing all the resources. The failure path freed part_stats and p regardless of put_device(), causing a double free.
    * The holders subdir holds a reference to the disk device, so the failure path should remove it to release resources properly, which was missing.

    This patch fixes the above problems and, while at it, moves the partition slot busy check into add_partition() for completeness and inlines holders subdirectory creation. Using a separate function for it just obfuscates the code.

    Signed-off-by: Tejun Heo
    Cc: Abdel Benamrouche
    Signed-off-by: Jens Axboe

commit ec2cdedf798385a9397ac50dd0405dd658f8529c
Author: Tejun Heo
Date:   Mon Aug 25 19:30:15 2008 +0900

    block: allow deleting zero length partition

    delete_partition() was a noop for zero length partitions. As the addition code allows creating zero length partitions and deletion is assumed to always succeed, this causes a memory leak for zero length partitions. Allow zero length partitions to end their meaningless lives.

    While at it, allow deleting zero length partitions via the BLKPG_DEL_PARTITION ioctl too.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit def4e38ddda9bef20b69bfa939195c2f79da7979
Author: Tejun Heo
Date:   Wed Sep 3 08:57:12 2008 +0200

    block: use class_dev_iterator instead of class_for_each_device()

    Recent block_class iteration updates 5c6f35c5..27f3025 converted all class device iteration to class_for_each_device() and class_find_device(), which are correct but a pain in the ass to use. This patch converts them to the newly introduced class_dev_iterator so that they can use more natural control structures instead of separate callbacks and a struct to pass parameters to them. This results in smaller and easier code.

    This patch also restores the original behavior of not printing a header in /proc/partitions if there's no partition to print. This is trivial but still user-visible behavior.
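The difference between callbacks and an iterator can be sketched outside the driver core. Everything below is hypothetical (`struct dev_sketch`, `struct dev_iter_sketch`, `dev_iter_init()`, `dev_iter_next()`, `count_partitioned()`, `find_by_name()`); it only mimics the init/next shape of an iterator over an array and is not the kernel's class_dev_iterator. The point is that accumulators and early exits become ordinary local variables and `break`, with no callback function or parameter struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a class device. */
struct dev_sketch {
	const char *name;
	int nr_parts;
};

/* Hypothetical iterator: the cursor lives in the caller's stack frame. */
struct dev_iter_sketch {
	struct dev_sketch *devs;
	size_t count;
	size_t next;
};

static void dev_iter_init(struct dev_iter_sketch *it,
			  struct dev_sketch *devs, size_t count)
{
	it->devs = devs;
	it->count = count;
	it->next = 0;
}

/* Returns the next device, or NULL once the list is exhausted. */
static struct dev_sketch *dev_iter_next(struct dev_iter_sketch *it)
{
	if (it->next >= it->count)
		return NULL;
	return &it->devs[it->next++];
}

/* Counting is a plain loop with a local accumulator. */
static int count_partitioned(struct dev_sketch *devs, size_t count)
{
	struct dev_iter_sketch it;
	struct dev_sketch *dev;
	int n = 0;

	dev_iter_init(&it, devs, count);
	while ((dev = dev_iter_next(&it)) != NULL)
		if (dev->nr_parts > 0)
			n++;
	return n;
}

/* Early exit is a plain break; returns NULL if no device matches. */
static struct dev_sketch *find_by_name(struct dev_sketch *devs, size_t count,
				       const char *name)
{
	struct dev_iter_sketch it;
	struct dev_sketch *dev;

	dev_iter_init(&it, devs, count);
	while ((dev = dev_iter_next(&it)) != NULL)
		if (strcmp(dev->name, name) == 0)
			break;
	return dev;
}
```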
    Signed-off-by: Tejun Heo
    Cc: Greg Kroah-Hartman
    Signed-off-by: Jens Axboe

commit 2ac3cee5298a247b2774f3319b28a05f588c3f0e
Author: Tejun Heo
Date:   Wed Sep 3 08:53:37 2008 +0200

    block: don't grab block_class_lock unnecessarily

    block_class_lock protects the major_names array and bdev_map and doesn't have anything to do with block class devices. Don't grab it while iterating over block class devices.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

commit ac65ece4eee10b03ac29ee925cadc179dc810bab
Author: Tejun Heo
Date:   Mon Aug 25 19:30:12 2008 +0900

    block: fix partition info printouts

    Recent block_class iteration updates 5c6f35c5..27f3025 broke partition info printouts.

    * printk_all_partitions(): Partition printout stops when it meets a partition hole. The partition printing inner loop should continue instead of exiting on an empty partition slot.

    * /proc/partitions and /proc/diskstats: If all information can't be read in a single read(), the information is truncated. This is because find_start() doesn't actually update the counter containing the initial seek. It runs to the end and ends up always reporting EOF on the second read.

    This patch fixes both problems.

    Signed-off-by: Tejun Heo
    Cc: Greg Kroah-Hartman
    Signed-off-by: Jens Axboe

commit 5a3ceb861663040f9ef0176df4aaa494bba5e352
Author: Tejun Heo
Date:   Mon Aug 25 19:50:19 2008 +0200

    driver-core: use klist for class device list and implement iterator

    Iterating over entries using a callback usually isn't too fun, especially when the entry being iterated over can't be manipulated freely. This patch converts class->p->class_devices to a klist and implements a class device iterator so that users can freely build their own control structures. Users are also free to call back into class code without worrying about locking.

    class_for_each_device() and class_find_device() are converted to use the new iterators, so their users don't have to worry about locking anymore either.
    Note: This depends on the klist-dont-iterate-over-deleted-entries patch because class_intf->add/remove_dev() depends on proper synchronization with device removal.

    Signed-off-by: Tejun Heo
    Cc: Greg Kroah-Hartman
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

commit a1ed5b0cffe4b16a93a6a3390e8cee0fbef94f86
Author: Tejun Heo
Date:   Mon Aug 25 19:50:16 2008 +0200

    klist: don't iterate over deleted entries

    A klist entry is kept on the list till all its current iterations are finished; however, a new iteration after deletion also iterates over deleted entries as long as their reference count stays above zero. This causes problems for cases where there are users which iterate over the list while synchronized against list manipulations and naturally expect already deleted entries not to show up during iteration.

    This patch implements a dead flag which gets set on deletion so that iteration can skip already deleted entries. The dead flag piggybacks on the lowest bit of knode->n_klist and is only visible to the klist implementation proper.

    While at it, drop klist_iter->i_head, as it's redundant and doesn't offer anything semantics- or performance-wise, since klist_iter->i_klist is dereferenced on every iteration anyway.

    Signed-off-by: Tejun Heo
    Cc: Greg Kroah-Hartman
    Cc: Alan Stern
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

commit 710027a48ede75428cc68eaa8ae2269b1e356e2c
Author: Randy Dunlap
Date:   Tue Aug 19 20:13:11 2008 +0200

    Add some block/ source files to the kernel-api docbook.

    Fix kernel-doc notation in them as needed. Fix changed function parameter names. Fix typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC.
    Signed-off-by: Randy Dunlap
    Signed-off-by: Jens Axboe

commit 5b99c2ffa980528a197f26c7d876cceeccce8dd5
Author: Jens Axboe
Date:   Fri Aug 15 10:56:11 2008 +0200

    block: make bi_phys_segments an unsigned int instead of short

    raid5 can overflow with more than 255 stripes, and we can increase it to an int for free on both 32 and 64-bit archs due to the padding.

    Signed-off-by: Jens Axboe

commit 960e739d9e9f1c2346d8bdc65299ee2e1ed42218
Author: Jens Axboe
Date:   Fri Aug 15 10:41:18 2008 +0200

    block: raid fixups for removal of bi_hw_segments

    Signed-off-by: Jens Axboe

commit 5df97b91b5d7ed426034fcc84cb6e7cf682b8838
Author: Mikulas Patocka
Date:   Fri Aug 15 10:20:02 2008 +0200

    drop vmerge accounting

    Remove hw_segments field from struct bio and struct request. Without virtual merge accounting they have no purpose.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

commit b8b3e16cfe6435d961f6aaebcfd52a1ff2a988c5
Author: Mikulas Patocka
Date:   Fri Aug 15 10:15:19 2008 +0200

    block: drop virtual merging accounting

    Remove virtual merge accounting.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Jens Axboe

commit 6a421c1dc94b12923294a359822346f12492de5e
Author: Aaron Carroll
Date:   Thu Aug 14 18:17:15 2008 +1000

    block: update documentation for deadline fifo_batch tunable

    Update the description of fifo_batch to match the current implementation, and include a description of how to tune it.

    Signed-off-by: Aaron Carroll
    Signed-off-by: Jens Axboe

commit 4fb72f7646e86874eb2798256eaa6bf3fbe4edcf
Author: Aaron Carroll
Date:   Thu Aug 14 18:17:14 2008 +1000

    deadline-iosched: non-functional fixes

    * convert goto to simpler while loop;
    * use rq_end_sector() instead of computing manually;
    * fix false comments;
    * remove spurious whitespace;
    * convert rq_rb_root macro to an inline function.
    Signed-off-by: Aaron Carroll
    Signed-off-by: Jens Axboe

commit 63de428b139d3d31d86ebe25ae97b33f6540fb7e
Author: Aaron Carroll
Date:   Thu Aug 14 18:17:13 2008 +1000

    deadline-iosched: allow non-sequential batching

    Deadline currently only batches sector-contiguous requests, so except for a few circumstances (e.g. requests in a single direction), it is essentially first come, first served. This is bad for throughput, so change it to CSCAN, which means requests in a batch do not need to be sequential and are issued in increasing sector order.

    Signed-off-by: Aaron Carroll
    Signed-off-by: Jens Axboe

commit 766ca4428d1239a970926856c447310c9c191af2
Author: Fernando Luis Vázquez Cao
Date:   Thu Aug 14 09:59:13 2008 +0200

    virtio_blk: use a wrapper function to access io context information of IO requests

    struct request has an ioprio member but it is never updated because currently bios do not hold io context information. The implication of this is that virtio_blk ends up passing useless information to the backend driver.

    That said, some IO schedulers such as CFQ do store io context information in struct request, but use private members for that, which means that that information cannot be directly accessed in an IO scheduler-independent way.

    This patch adds a function to obtain the ioprio of a request. We should avoid accessing ioprio directly and use this function instead, so that its users do not have to care about future changes in block layer structures or what the currently active IO controller is.

    This patch does not introduce any functional changes but paves the way for future clean-ups and enhancements.

    Signed-off-by: Fernando Luis Vazquez Cao
    Acked-by: Rusty Russell
    Signed-off-by: Jens Axboe

commit 1a8e2bddd5c29008f311613e75925fecbf522c5b
Author: David Woodhouse
Date:   Wed Aug 13 12:35:09 2008 +0100

    Kill REQ_TYPE_FLUSH

    It was only used by ps3disk, and it should probably have been REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.
    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit e17fc0a1ccf88f6d4dcb363729f3141b0958c325
Author: David Woodhouse
Date:   Sat Aug 9 16:42:20 2008 +0100

    Allow elevators to sort/merge discard requests

    But blkdev_issue_discard() still emits requests which are interpreted as soft barriers, because naïve callers might otherwise issue subsequent writes to those same sectors, which might cross on the queue (if they're reallocated quickly enough).

    Callers still _can_ issue non-barrier discard requests, but they have to take care of queue ordering for themselves.

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit d30a2605be9d5132d95944916e8f578fcfe4f976
Author: David Woodhouse
Date:   Mon Aug 11 15:58:42 2008 +0100

    Add BLKDISCARD ioctl to allow userspace to discard sectors

    We may well want mkfs tools to use this to mark the whole device as unwanted before they format it, for example.

    The ioctl takes a pair of uint64_ts, which are start offset and length in _bytes_. Although at the moment it might make sense for them both to be in 512-byte sectors, I don't want to limit the ABI to that.

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit 2ebca85abcfcbaaf1c0b242e39fc88ad3da90090
Author: OGAWA Hirofumi
Date:   Mon Aug 11 17:07:08 2008 +0100

    Use WRITE_BARRIER in blkdev_issue_flush(), not (1<

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit 35ba8f7083e87602b695d6eaca38a6464d5b74db
Author: David Woodhouse
Date:   Sun Aug 10 12:33:00 2008 +0100

    blktrace: simplify flags handling in __blk_add_trace

    Let the compiler see what's going on, and it can all get a lot simpler. On PPC64 this reduces the size of the code calculating these bits by about 60%. On x86_64 it's less of a win -- only 40%.
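The commit message does not spell out the technique. One common way to "let the compiler see what's going on" is to replace a test-and-set per flag with a mask and a constant shift, which compiles to straight-line code. The sketch below assumes two hypothetical bit positions (`RQ_BARRIER_BIT`, `TRACE_BARRIER_BIT`) and is not the actual blktrace code; it only shows that the two forms agree.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for illustration. */
#define RQ_BARRIER_BIT    2  /* flag bit in a request's rw field */
#define TRACE_BARRIER_BIT 21 /* corresponding bit in a trace action word */

/* The shift below moves the bit upward, so the target must be higher. */
_Static_assert(TRACE_BARRIER_BIT > RQ_BARRIER_BIT,
	       "shift direction assumes the target bit is above the source");

/* Branchy form: test the flag, then set the other bit. */
static uint32_t map_barrier_branchy(uint32_t rw)
{
	uint32_t what = 0;

	if (rw & (1u << RQ_BARRIER_BIT))
		what |= 1u << TRACE_BARRIER_BIT;
	return what;
}

/*
 * Branch-free form: isolate the bit and shift it straight into place.
 * Both positions are compile-time constants, so this is a mask and a
 * shift with no compare or jump.
 */
static uint32_t map_barrier_shift(uint32_t rw)
{
	return (rw & (1u << RQ_BARRIER_BIT))
		<< (TRACE_BARRIER_BIT - RQ_BARRIER_BIT);
}
```

When several flags are translated this way, the results can simply be OR-ed together, which is where a branch-free form tends to shrink the generated code.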
    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit 27b29e86bf3d4b3cf6641a0efd78ed11a9b633b2
Author: David Woodhouse
Date:   Sun Aug 10 11:21:57 2008 +0100

    blktrace: support discard requests

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit fdc53971bce56d299cb5f1f06ecbff30b34cbaf2
Author: David Woodhouse
Date:   Tue Aug 5 18:08:56 2008 +0100

    Support 'discard sectors' operation.

    We can benefit from knowing that the file system no longer cares about the contents of certain sectors, by throwing them away immediately and then never having to garbage collect them, and using the extra free space to make our operations more efficient.

    Do so.

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit eae9acd13a8d14b50c00a961fa959606f34bbd92
Author: David Woodhouse
Date:   Tue Aug 5 18:08:25 2008 +0100

    Support 'discard sectors' operation in translation layer support core

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit 8c540a96c175bdf55bda8707db04cec78b816454
Author: David Woodhouse
Date:   Tue Aug 5 18:05:46 2008 +0100

    Let the block device know when sectors can be discarded

    [hirofumi@mail.parknet.co.jp: discard _after_ checking for corrupt chains]
    Signed-off-by: David Woodhouse
    Acked-by: OGAWA Hirofumi
    Signed-off-by: Jens Axboe

commit fb2dce862d9f9a68e6b9374579056ec9eca02a63
Author: David Woodhouse
Date:   Tue Aug 5 18:01:53 2008 +0100

    Add 'discard' request handling

    Some block devices benefit from a hint that they can forget the contents of certain sectors. Add basic support for this to the block core, along with a 'blkdev_issue_discard()' helper function which issues such requests.

    The caller doesn't get to provide an end_io function, since blkdev_issue_discard() will automatically split the request up into multiple bios if appropriate. Neither does the function wait for completion -- it's expected that callers won't care about when, or even _if_, the request completes. It's only a hint to the device anyway.
    By definition, the file system doesn't _care_ about these sectors any more.

    [With feedback from OGAWA Hirofumi and Jens Axboe]
    Signed-off-by: Jens Axboe

commit d628eaef310533767ce68664873869c2d7f78f09
Author: David Woodhouse
Date:   Sat Aug 9 16:22:17 2008 +0100

    Fix up comments about matching flags between bio and rq

    Signed-off-by: David Woodhouse
    Signed-off-by: Jens Axboe

commit 36144077bce9f89763ce994bc631cbd1c9db7785
Author: Jens Axboe
Date:   Thu Aug 14 13:12:15 2008 +0200

    highmem: use bio_has_data() in the bounce path

    Signed-off-by: Jens Axboe

commit 051cc3952a8fb6fa875a4eff68d06cf42207dcf4
Author: Jens Axboe
Date:   Fri Aug 8 11:06:45 2008 +0200

    block: use bio_has_data() in the IO completion path

    Signed-off-by: Jens Axboe

commit a9c701e594669dd49fed448c27c64f20cfacc8a7
Author: Jens Axboe
Date:   Fri Aug 8 11:04:44 2008 +0200

    block: use bio_has_data() to check for data carrying bio

    Signed-off-by: Jens Axboe

commit 7a67f63b3233ff28e753854fe27891c44f8588ae
Author: Jens Axboe
Date:   Fri Aug 8 11:17:12 2008 +0200

    block: add bio_has_data() to detect whether a bio carries data or not

    Signed-off-by: Jens Axboe

commit 35e396cd100489dfe8f5a76e3613fb8049ffdff3
Author: xiphmont@xiph.org
Date:   Fri Aug 22 11:12:21 2008 +0200

    SG_IO block filter whitelist missing MMC SET READ AHEAD command

    I have another request for the block filter SG_IO command whitelist, specifically the MMC streaming command set SET READ AHEAD command. The command applies only to MMC CDROM/DVDROM drives with the streaming optional feature set. The command is useful to cdparanoia in that it allows explicit cache control side effects that are, on many drives, cdparanoia's most efficient way to flush/disable the media cache on cdrom drives. I am aware of no reason why it should not be accessible from userspace.

    Also note that the command is already fully accessible through the SCSI-native version of the SG_IO ioctl as well as the traditional SG interface. The command is only being refused on block devices.
    That means that on a typical stock distro, the command is available through /dev/sg* but not /dev/scd*, although both are typically available and accessible. Filtering the command is not providing any protection, only a confusing inconsistency.

    Signed-off-by: Jens Axboe

commit d57f5f72df1b0da501c4b55e56a1040b1631c1f3
Author: Vladimir Sokolovsky
Date:   Wed Oct 8 20:09:01 2008 -0700

    IB/mlx4: Set RLKEY bit for kernel QPs

    Set the RLKEY bit in the HW context for kernel QPs so that kernel QPs can use the reserved L_Key for memory reference.

    Signed-off-by: Vladimir Sokolovsky
    Signed-off-by: Roland Dreier

commit cdbb92b31d3c465aa96bd09f2d42c39b87b32bee
Merge: 2ec2b48... 6984937...
Author: Ingo Molnar
Date:   Thu Oct 9 00:17:25 2008 +0200

    Merge branch 'linus' into core/rcu

commit e2f5e7333a2fb51ef9e45280c3da9ca3bde65fde
Author: Chien Tung
Date:   Wed Oct 8 14:43:29 2008 -0700

    RDMA/nes: Correct error_module bit mask

    error_module is 5 bits wide, not 4. The corresponding crit_error_count array is correct with 32 entries.

    Signed-off-by: Chien Tung
    --
     drivers/infiniband/hw/nes/nes_hw.c |    2 +-
     1 files changed, 1 insertions(+), 1 deletions(-)

    Signed-off-by: Roland Dreier

commit 2fb7635c4cea310992a39580133099dd99ad151c
Author: Peter Zijlstra
Date:   Wed Oct 8 09:16:04 2008 +0200

    sched: sync wakeups vs avg_overlap

    While looking at the code I wondered why we always do:

      sync && avg_overlap < migration_cost

    Which is a bit odd, since the overlap test was meant to detect sync wakeups, so using it to specialize sync wakeups doesn't make much sense.

    Hence change the code to do:

      sync || avg_overlap < migration_cost

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

commit 990d0f2ced23052abc7efa09bd05bff34e00cf73
Merge: 85ba94b... 34b3ede... e545a61... d294eb8...
Author: Ingo Molnar
Date:   Wed Oct 8 11:31:02 2008 +0200

    Merge branches 'sched/devel', 'sched/cpu-hotplug', 'sched/cpusets' and 'sched/urgent' into sched/core

commit e496e3d645c93206faf61ff6005995ebd08cc39c
Merge: b159d7a... 5bbd4c3...
       175e438... 516cbf3... af2d237... 9b15684... 5b7e41f... 1befdef... a03352d... 7b22ff5... 2c7e9fd... 91030ca... dd55235... b3e15bd... 20211e4... efd327a... c7ffa6c... e51a1ac... 5df4551... d99e901... e621bd1...
Author: Ingo Molnar
Date:   Mon Oct 6 18:17:07 2008 +0200

    Merge branches 'x86/alternatives', 'x86/cleanups', 'x86/commandline', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/doc', 'x86/exports', 'x86/fpu', 'x86/gart', 'x86/idle', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/oprofile', 'x86/paravirt', 'x86/reboot', 'x86/sparse-fixes', 'x86/tsc', 'x86/urgent' and 'x86/vmalloc' into x86-v28-for-linus-phase1

commit b159d7a989e53ab3529084348aa80441520b8575
Merge: 0962f40... 4ab4ba3...
Author: Ingo Molnar
Date:   Mon Oct 6 18:16:40 2008 +0200

    Merge branch 'x86/tracehook' into x86-v28-for-linus-phase1

    Conflicts:

        arch/x86/kernel/signal_64.c

    Signed-off-by: Ingo Molnar

commit 0962f402af1bb0b53ccee626785d202a10c12fff
Merge: 19268ed... 8d7ccaa...
Author: Ingo Molnar
Date:   Mon Oct 6 16:18:26 2008 +0200

    Merge branch 'x86/prototypes' into x86-v28-for-linus-phase1

    Conflicts:

        arch/x86/kernel/process_32.c

    Signed-off-by: Ingo Molnar

commit 19268ed7449c561694d048a34601a30e2d1aaf79
Merge: b8cd9d0... 493cd91...
Author: Ingo Molnar
Date:   Mon Oct 6 16:17:23 2008 +0200

    Merge branch 'x86/pebs' into x86-v28-for-linus-phase1

    Conflicts:

        include/asm-x86/ds.h

    Signed-off-by: Ingo Molnar

commit b8cd9d056bbc5f2630ab1787dbf76f83bbb517c0
Merge: fec6ed1... 1503af6...
Author: Ingo Molnar
Date:   Mon Oct 6 16:15:57 2008 +0200

    Merge branch 'x86/header-guards' into x86-v28-for-linus-phase1

    Conflicts:

        include/asm-x86/dma-mapping.h
        include/asm-x86/gpio.h
        include/asm-x86/idle.h
        include/asm-x86/kvm_host.h
        include/asm-x86/namei.h
        include/asm-x86/uaccess.h

    Signed-off-by: Ingo Molnar

commit 34b3ede2353604ec9861c1d900b2a835ff85de47
Author: Li Zefan
Date:   Mon Oct 6 09:27:00 2008 +0800

    sched: remove redundant code in cpu_cgroup_create()

    css will be initialized by cgroup core.
Signed-off-by: Li Zefan Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 2c10c22af088ab5d94fae93ce3fe6436b2a208b4 Merge: f6121f4... fec6ed1... Author: Ingo Molnar Date: Mon Oct 6 08:13:18 2008 +0200 Merge branch 'linus' into sched/devel commit dd5523552c2897e3fde16fc2fc8f6332addf66ab Author: Yinghai Lu Date: Sat Oct 4 14:50:33 2008 -0700 x86: mtrr_cleanup: treat WRPROT as UNCACHEABLE For the purpose of MTRR canonicalization, treat WRPROT as UNCACHEABLE. Signed-off-by: Yinghai Lu Signed-off-by: H. Peter Anvin commit 99e1aa17ce434010dd820b583628370cc15f10f3 Author: Yinghai Lu Date: Sat Oct 4 14:50:32 2008 -0700 x86: mtrr_cleanup: first 1M may be covered in var mtrrs The first 1M doesn't matter when it comes to the variable MTRRs. Cover it as WB as a heuristic approximation; this is generally what we want, since it minimizes the number of registers. Signed-off-by: Yinghai Lu Signed-off-by: H. Peter Anvin commit 42fde7a05c5d6dc76e518ae12091ea897b1c132b Author: Yinghai Lu Date: Sat Oct 4 19:34:18 2008 -0700 x86: mtrr_cleanup: print out correct type v2 Print out the correct type when the Write Protected (WP) type is seen. Signed-off-by: Yinghai Lu Signed-off-by: H. Peter Anvin commit f6121f4f8708195e88cbdf8dd8d171b226b3f858 Author: Dario Faggioli Date: Fri Oct 3 17:40:46 2008 +0200 sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq While working on the new version of the code for SCHED_SPORADIC I noticed something strange in the present throttling mechanism, more specifically in the throttling timer handler in sched_rt.c (do_sched_rt_period_timer()) and in rt_rq_enqueue(). The problem is that, when unthrottling a runqueue, rt_rq_enqueue() only asks for rescheduling if the runqueue has a sched_entity associated with it (i.e., rt_rq->rt_se != NULL). Now, if the runqueue is the root rq (which has rt_se == NULL), rescheduling does not take place, and it is delayed to some undefined instant in the future.
This implies somewhat random bandwidth usage by the RT tasks under throttling. For instance, setting rt_runtime_us/rt_period_us = 950ms/1000ms, an RT task will get less than 95%. In our tests we got values varying between 70% and 95%. Using smaller time values, e.g., 95ms/100ms, things are even worse, and I can see values going down to 20-25%!! The tests we performed simply run 'yes' as a SCHED_FIFO task and check the CPU usage with top, but we can investigate more thoroughly if you think it is needed. Things go much better, for us, with the attached patch... Don't know if it is the best approach, but it solved the issue for us. Signed-off-by: Dario Faggioli Signed-off-by: Michael Trimarchi Acked-by: Peter Zijlstra Cc: Signed-off-by: Ingo Molnar commit 81990fbdd18b9cfdc93dc221ff3250f81468aed8 Author: Paul Moore Date: Fri Oct 3 10:51:15 2008 -0400 selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() At some point during the 2.6.27 development cycle two new fields were added to the SELinux context structure, a string pointer and a length field. The code in selinux_secattr_to_sid() was not modified and as a result these two fields were left uninitialized, which could result in erratic behavior, including kernel panics, when NetLabel is used. This patch fixes the problem by fully initializing the context in selinux_secattr_to_sid() before use and reducing the level of direct context manipulation done to help prevent future problems. Please apply this to the 2.6.27-rcX release stream. Signed-off-by: Paul Moore Signed-off-by: James Morris commit 7191a0a18228c8da9abc7776433c6a3953ff1e4b Author: Bob Sharp Date: Fri Oct 3 12:21:19 2008 -0700 RDMA/nes: Fix routed RDMA connections Fix routed RDMA connections to destinations where the next hop is not the final destination. Use neigh_*() to properly locate the neighbor.
Signed-off-by: Bob Sharp Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung commit 7e36d3d732438de894802f87a0ca21372e00fb74 Author: Vadim Makhervaks Date: Fri Oct 3 12:21:18 2008 -0700 RDMA/nes: Enhanced PFT management scheme Change management of the perfect filter table to allow enhanced performance applications. Signed-off-by: Vadim Makhervaks Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 41bfcf90101601f9507240ff0435c1b73d28a132 Author: Swen Schillig Date: Wed Oct 1 12:42:26 2008 +0200 [SCSI] zfcp: fix double dbf id usage Trace ids 107 and 3 are used twice; fix this to have unique ids for the erp triggers. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 091694a556d168dc9df4d79e3a40116550b183cf Author: Swen Schillig Date: Wed Oct 1 12:42:25 2008 +0200 [SCSI] zfcp: wait on SCSI work to be finished before proceeding with init dev Due to the nature of scheduled work, we cannot guarantee that the LUN registration has finished before an initial device tries to use it. Therefore we have to wait for the PENDING_SCSI_WORK flag to be cleared before proceeding. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 9fb3cd86e4870d54d71a80323e97c48df4de05bd Author: Swen Schillig Date: Wed Oct 1 12:42:24 2008 +0200 [SCSI] zfcp: fix erp list usage without using locks The zfcp_erp_thread was using the nolock version of the dbf function. This resulted in a list access while other tasks could be modifying the list. The symptom was an erp thread running at 100% CPU and never returning from the dbf function. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit e4e9ba5d9313f362d2192fb7a2d35a3bfb714b1e Author: Swen Schillig Date: Wed Oct 1 12:42:23 2008 +0200 [SCSI] zfcp: prevent fc_remote_port_delete calls for unregistered rport In case of an adapter reopen, all rports have to be deleted from the environment.
This should only happen for already registered rports; otherwise fc_remote_port_delete is called with a NULL pointer. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit b7f15f3c94196accac799727502ed88a029ae7ef Author: Swen Schillig Date: Wed Oct 1 12:42:22 2008 +0200 [SCSI] zfcp: fix deadlock caused by shared work queue tasks Each adapter reopen automatically triggers a scan_port task, which waits for the ERP to finish before further processing. Since the initial device setup enqueues adapter, port and LUN, which are individual ERP actions, this process would only start after everything is done. Unfortunately, the port_reopen requires another scheduled work item to finish, and that item is queued after the automatic scan_port -> deadlock! This fix creates a dedicated work queue for ERP-based nameserver requests. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 57069386699994c3e67042fc4928c418f3a39e01 Author: Swen Schillig Date: Wed Oct 1 12:42:21 2008 +0200 [SCSI] zfcp: put threshold data in hba trace Now that we removed the long messages for the bit error threshold data, put the data in the hba trace. This way, we get a short warning for the threshold event from the hardware and have the data in the trace for further analysis. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 0406289ed57955860a4f8d744a14f4c819260ce4 Author: Christof Schmitt Date: Wed Oct 1 12:42:20 2008 +0200 [SCSI] zfcp: Simplify zfcp data structures Reduce the size of zfcp data structures by removing unused and redundant members. scsi_lun is only the mangled version of the fcp_lun. So, remove the redundant field and use the fcp_lun instead. Since the queue lock and the pci_batch indicator are only used in the request queue, move them from the common queue struct to the adapter struct.
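The zfcp change above drops scsi_lun because it can be recomputed from fcp_lun whenever needed. A minimal sketch of that derivation, assuming the mangling follows the SCSI midlayer's scsilun_to_int() scheme (two-byte addressing levels folded into an int); fcp_lun_to_scsi_lun() is a hypothetical name, not a zfcp function:

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive the Linux integer LUN from an 8-byte FCP LUN.
 * Assumption: this follows the midlayer's scsilun_to_int() scheme, which
 * packs the first two 16-bit addressing levels into an unsigned int.
 */
static unsigned int fcp_lun_to_scsi_lun(uint64_t fcp_lun)
{
	unsigned char bytes[8];
	unsigned int lun = 0;
	int i;

	/* Lay the FCP LUN out big-endian, as it appears on the wire. */
	for (i = 0; i < 8; i++)
		bytes[i] = (unsigned char)(fcp_lun >> (56 - 8 * i));

	/* Level n (bytes 2n, 2n+1) lands in bits 16n..16n+15 of the result. */
	for (i = 0; i < (int)sizeof(lun); i += 2)
		lun |= (((unsigned int)bytes[i] << 8) | bytes[i + 1]) << (i * 8);

	return lun;
}
```

Under this assumption, FCP LUN 0x0001000000000000 maps to LUN 1, so storing both values side by side carries no extra information.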
Signed-off-by: Christof Schmitt Signed-off-by: Swen Schillig Signed-off-by: James Bottomley commit a1b449de5d35b9eec8981c6ea999eea263b19a0b Author: Swen Schillig Date: Wed Oct 1 12:42:19 2008 +0200 [SCSI] zfcp: Simplify get_adapter_by_busid Call the helper function from cio instead of looping through all zfcp adapters. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 7ba58c9cc16d296290fe645acb11db2b01276544 Author: Swen Schillig Date: Wed Oct 1 12:42:18 2008 +0200 [SCSI] zfcp: remove all typedefs and replace them with standards Remove typedefs from zfcp; use already existing types instead. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 5ab944f97e09a3d52951fe903eed9a7b88d810b2 Author: Swen Schillig Date: Wed Oct 1 12:42:17 2008 +0200 [SCSI] zfcp: attach and release SAN nameserver port on demand Change the zfcp behaviour from always having the nameserver port open to an on-demand strategy. This strategy reduces the use of limited resources like port connections. The patch provides a common infrastructure which could be used for all WKA ports in the future. Also reduce the number of nameserver lookups by changing the zfcp behaviour of always querying the nameserver for the corresponding destination ID of the remote port. If the destination ID has changed during the reopen process, we will be informed and then trigger a nameserver query on demand. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 44cc76f2d154aa24340354b4711a0fe7f8f08adc Author: Swen Schillig Date: Wed Oct 1 12:42:16 2008 +0200 [SCSI] zfcp: remove unused references, declarations and flags - Remove unused references and declarations, including one instance of the FC ls_adisc struct that was defined twice. - Also remove the flags COMMON_OPENING, COMMON_CLOSING, ADAPTER_REGISTERED and XPORT_OK that are only set and cleared, but not checked anywhere.
- Remove the zfcp-specific atomic_test_mask macro. Simply use atomic_read directly instead. - Remove the zfcp internal sg helper functions and switch the places where they are still used to call sg_virt directly. - With the update of the QDIO code, the QDIO data structures no longer use the volatile type qualifier. Now we can also remove the volatile qualifiers from the zfcp code. Signed-off-by: Swen Schillig Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit ff3b24fa5370a7ca618f212284d9b36fcedb9c0e Author: Christof Schmitt Date: Wed Oct 1 12:42:15 2008 +0200 [SCSI] zfcp: Update message with input from review Update the kernel messages in zfcp with input from the message review and remove some messages that have been identified as redundant. Signed-off-by: Christof Schmitt Signed-off-by: Swen Schillig Signed-off-by: James Bottomley commit 2450d3e7b8604d0abb042817f2502cb7ee0b782f Author: Stefan Raspl Date: Wed Oct 1 12:42:14 2008 +0200 [SCSI] zfcp: add queue_full sysfs attribute Adds a new sysfs attribute queue_full for adapters that records the number of incidents where a request could not be submitted due to insufficient free space on the request queue. Signed-off-by: Stefan Raspl Signed-off-by: Martin Peschke Signed-off-by: Christof Schmitt Signed-off-by: James Bottomley commit 7ae628d9d21a088b4a2d26a9d39c29c0acd2d03b Author: James Bottomley Date: Tue Sep 23 07:58:59 2008 -0700 [SCSI] scsi_dh: suppress comparison warning On Mon, 2008-09-22 at 14:56 -0700, akpm@linux-foundation.org wrote: > From: Andrew Morton > > s390: > > drivers/scsi/device_handler/scsi_dh_emc.c: In function 'parse_sp_info_reply': > drivers/scsi/device_handler/scsi_dh_emc.c:179: warning: comparison is always false due to limited range of data type > > because chars are unsigned, I assume. Fix by making csdev->buffer explicitly an unsigned char and dropping the < 0 test.
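The s390 warning above comes from plain char being unsigned on that architecture (it is signed on x86), so a "< 0" test on a char buffer can never be true there. A small sketch of the portable form the fix adopts, an explicit unsigned char buffer with no sign test; reply_byte() is a hypothetical helper, not the scsi_dh_emc code:

```c
#include <limits.h>

/* Plain char signedness is implementation-defined; CHAR_MIN exposes it. */
static int plain_char_is_signed(void)
{
	return CHAR_MIN < 0;	/* 0 on s390, 1 on x86 with gcc defaults */
}

/*
 * With an explicit unsigned char buffer, the byte 0xff always reads back
 * as 255 on every architecture, so a "< 0" check would be dead code and
 * is simply omitted.
 */
static int reply_byte(const unsigned char *buf, int idx)
{
	return buf[idx];
}
```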
Signed-off-by: James Bottomley commit 650849d71ca05d55a1553fe42fb21af9dce5612b Author: Yanqing_Liu@Dell.com Date: Thu Oct 2 12:18:33 2008 -0500 [SCSI] scsi_dh: add Dell product information into rdac device handler Add Dell Powervault storage arrays into device list of rdac device handler. Signed-off-by: Yanqing Liu Signed-off-by: James Bottomley commit b78ded89156827b50729518b1d13fea6defb46ee Author: Adrian Bunk Date: Mon Sep 22 14:56:49 2008 -0700 [SCSI] qla2xxx: remove the unused SCSI_QLOGIC_FC_FIRMWARE option This option was forgotten when the SCSI_QLOGIC_FC driver was removed. Reported-by: Robert P. J. Day Signed-off-by: Adrian Bunk Signed-off-by: Andrew Morton Signed-off-by: James Bottomley commit 4a3e1ea8b2f25f962e91883be880de678bad8e3e Author: Alexander Beregalov Date: Thu Sep 18 04:09:57 2008 +0400 [SCSI] qla2xxx: fix printk format warnings Signed-off-by: Alexander Beregalov Acked-by: Andrew Vasquez Signed-off-by: James Bottomley commit 36d1cdcffbbb33b966d6bdd6534bb03c22b3fd0c Author: Andrew Vasquez Date: Thu Sep 11 21:22:54 2008 -0700 [SCSI] qla2xxx: Update version number to 8.02.01-k8. Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit 59d72d873ccfaf59e9ceea1487459f5a57c0d504 Author: Ravi Anand Date: Thu Sep 11 21:22:53 2008 -0700 [SCSI] qla2xxx: Ignore payload reserved-bits during RSCN processing. As the driver is only interested in bits 0-9 of the 1st RSCN-payload word: rsvd[15:14]RscnEventQualifier[13:10]Fmt[9:8]Domain[7:0] Area[15:8]Alpa[7:0] Signed-off-by: Ravi Anand Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit 2d136938792ba04bf3b6bb2d3a2807dc05c396fc Author: Andrew Vasquez Date: Thu Sep 11 21:22:52 2008 -0700 [SCSI] qla2xxx: Additional residual-count corrections during UNDERRUN handling. Add additional tightening of residual-count handling (originally from commit 6acf8190025e9c4ea513d4084ff089d476112816) where the driver should discard any lower SCSI-status during firmware/transport residual-count mismatches. 
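For the RSCN change above ("Ignore payload reserved-bits during RSCN processing"), masking the first payload word down to bits 0-9 before interpreting it can be sketched as follows. The bit layout is the one quoted in that commit; the macro and function names are hypothetical, not qla2xxx identifiers.

```c
#include <stdint.h>

/*
 * First 16-bit RSCN payload word, as quoted in the commit:
 *   rsvd[15:14] RscnEventQualifier[13:10] Fmt[9:8] Domain[7:0]
 * Only bits 9..0 are used; everything above is ignored.
 */
#define RSCN_WORD0_USED_MASK 0x03ffu

static unsigned int rscn_fmt(uint16_t word0)
{
	return (word0 & RSCN_WORD0_USED_MASK) >> 8;	/* Fmt[9:8] */
}

static unsigned int rscn_domain(uint16_t word0)
{
	return word0 & 0xffu;	/* Domain[7:0] */
}

/* Words that differ only in ignored bits are treated identically. */
static int rscn_word0_equal(uint16_t a, uint16_t b)
{
	return (a & RSCN_WORD0_USED_MASK) == (b & RSCN_WORD0_USED_MASK);
}
```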
Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit cc3ef7bc40bbede7bbd0bb395d6452a575e95efe Author: Bjorn Helgaas Date: Thu Sep 11 21:22:51 2008 -0700 [SCSI] qla2xxx: Fix "occured" spelling errors. Fix "occured" spelling errors. Most of these are in comments, which I wouldn't normally bother with, but a couple are in printks, which irritate me more. So I just fixed them all at the same time. Signed-off-by: Bjorn Helgaas Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit 272976ca186982f7bbc4f22876c53d6c9f7b6e32 Author: Andrew Vasquez Date: Thu Sep 11 21:22:50 2008 -0700 [SCSI] qla2xxx: Add NPIV-Config Table support. To instantiate pre-configured vport entities defined within an HBA's flash memory. Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit c00d8994d91e51aa6b891ad0e877f66cc1011de2 Author: Andrew Vasquez Date: Thu Sep 11 21:22:49 2008 -0700 [SCSI] qla2xxx: Add Flash Layout Table support. The Flash Layout Table (FLT) present on many recent HBAs encodes flash usage information, organizes data stored into separate regions and presents the information uniformly to the driver. Use this information rather than using specific hard-coded values based on ISP type. Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit 4b89258c7320bab4155b692e76ae9ffdd85e79be Author: Andrew Vasquez Date: Thu Sep 11 21:22:48 2008 -0700 [SCSI] qla2xxx: Change GFP_ATOMIC to GFP_KERNEL for non-atomic allocations. Both call-sites are sleeping-capable. Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit 49fd462a1ba4a1b9bfbfe01d279d506017d85492 Author: Harish Zunjarrao Date: Thu Sep 11 21:22:47 2008 -0700 [SCSI] qla2xxx: Add input/output byte-count statistics. Currently the firmware does not have counters for input megabytes and output megabytes; therefore the driver counts these values depending on the status and direction of each SCSI command. The values are exported in the FC_HOST path.
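A rough sketch of driver-side byte accounting of the kind this commit describes. It assumes, rather than copies from qla2xxx, that only successfully completed commands are counted, that reads count as input and writes as output, and that megabytes are derived as bytes >> 20; all names are hypothetical.

```c
#include <stdint.h>

enum hypothetical_dir { DIR_NONE, DIR_READ, DIR_WRITE };

struct hypothetical_io_stats {
	uint64_t input_bytes;	/* data moved from the target to the host */
	uint64_t output_bytes;	/* data moved from the host to the target */
};

/* Count a completed command's payload by direction, skipping failures. */
static void account_command(struct hypothetical_io_stats *st,
			    enum hypothetical_dir dir, int ok, uint64_t bytes)
{
	if (!ok)
		return;		/* assumption: failed commands are not counted */
	if (dir == DIR_READ)
		st->input_bytes += bytes;
	else if (dir == DIR_WRITE)
		st->output_bytes += bytes;
}

/* Assumption: "megabytes" means 2^20 bytes. */
static uint64_t to_megabytes(uint64_t bytes)
{
	return bytes >> 20;
}
```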
Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit ac26d41dee65167109e7cdcd0289b44ca61cd741 Author: Mike Christie Date: Sat Sep 6 08:39:15 2008 -0500 [SCSI] libiscsi: return error passed in during iscsi recovery Due to a patch-building error on my side, we are still passing DID_BUS_BUSY for commands that are running, when we want to return whatever the caller of fail_all_commands wanted. This replaces the hardcoded error code with the value that is passed in. Signed-off-by: Mike Christie Signed-off-by: James Bottomley commit a7bbb57333447d0cf950992653b6b079585f3531 Author: Pierre Ossman Date: Sat Sep 6 10:57:57 2008 +0200 [SCSI] mmc_block: use generic helper to print capacities Signed-off-by: Pierre Ossman Signed-off-by: James Bottomley commit a30c3f69e6336cb9b09a989595e417367e4e9b1b Author: Andrew Vasquez Date: Fri Jul 18 08:32:52 2008 -0700 [SCSI] fc_transport: Add an API to allow an LLD to create vports There's already an fc_vport_terminate() call exported by the transport. This patch adds a symmetric call to the API to allow an NPIV-capable LLD to instantiate vports sans user intervention. Additional comments/updates: Re: scsi_fc_transport.txt Add a function prototype for fc_vport_terminate similar to what's done for fc_vport_create. Re: fc_vport_create I recommend we pass the channel number in fc_vport_create rather than fixing it at zero. Also, ids->vport_type should be set to FC_PORTTYPE_NPIV prior to calling fc_vport_create. The comment is also meaningless.
Added-by and Signed-off-by: James Smart Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit 7404ad3b6d04efbd918e9e2e776bf560fbedf47d Author: James Bottomley Date: Sun Aug 31 10:41:52 2008 -0500 [SCSI] sd: use generic helper to print capacities in both binary and SI Signed-off-by: James Bottomley commit 3c9f3681d0b4af09c1cbf04f92fdfb72bd81ad7b Author: James Bottomley Date: Sun Aug 31 10:13:54 2008 -0500 [SCSI] lib: add generic helper to print sizes rounded to the correct SI range This patch adds the ability to print sizes in either units of 10^3 (SI) or 2^10 (Binary) units. It rounds up to three significant figures and can be used for either memory or storage capacities. Oh, and I'm fully aware that 64 bits is only 16EiB ... the Zetta and Yotta units are added for future proofing against the day we have 128 bit computers ... [fujita.tomonori@lab.ntt.co.jp: fix missed unsigned long long cast] Signed-off-by: James Bottomley commit 114f1ea408944f9515e45aa7452238fb15a1fa00 Author: FUJITA Tomonori Date: Mon Aug 25 13:51:58 2008 -0700 [SCSI] scsi_dh: no need to initialize rq->cmd with blk_get_request blk_get_request initializes rq->cmd (rq_init does) so the users don't need to do that. Signed-off-by: FUJITA Tomonori Signed-off-by: Chandra Seetharaman Signed-off-by: James Bottomley commit 6f4267e3bd1211b3d09130e626b0b3d885077610 Author: James Bottomley Date: Fri Aug 22 16:53:31 2008 -0500 [SCSI] Update the SCSI state model to allow blocking in the created state Brian King reported that fibre channel devices can oops during scanning if their ports block (because the device goes from CREATED -> BLOCK -> RUNNING rather than CREATED -> BLOCK -> CREATED). Fix this by adding a new state: CREATED_BLOCK which can only transition back to CREATED and disallow the CREATED -> BLOCK transition. Now both the created and blocked states that the mid-layer recognises can include CREATED_BLOCK. 
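The transition rules described above can be modelled in a few lines. This is a simplified sketch covering only the states named in the commit, not the real scsi_device_set_state() table; the SKETCH_* names are hypothetical. The created/blocked predicates mirror what the companion commit 0f1d87a2 ("add inline functions for recognising created and blocked states") describes.

```c
/* Only the states named in the commit; the real sdev_state has more. */
enum sketch_sdev_state {
	SKETCH_CREATED,
	SKETCH_RUNNING,
	SKETCH_BLOCK,
	SKETCH_CREATED_BLOCK,
};

/* Returns 1 if old -> new_state is allowed under the new rules. */
static int sketch_transition_ok(enum sketch_sdev_state old,
				enum sketch_sdev_state new_state)
{
	switch (new_state) {
	case SKETCH_CREATED_BLOCK:
		return old == SKETCH_CREATED;	/* port blocks during scanning */
	case SKETCH_CREATED:
		return old == SKETCH_CREATED_BLOCK;	/* the only way back */
	case SKETCH_BLOCK:
		return old == SKETCH_RUNNING;	/* CREATED -> BLOCK now refused */
	case SKETCH_RUNNING:
		return old == SKETCH_CREATED || old == SKETCH_BLOCK;
	}
	return 0;
}

/* "Created" and "blocked" both include CREATED_BLOCK. */
static int sketch_is_created(enum sketch_sdev_state s)
{
	return s == SKETCH_CREATED || s == SKETCH_CREATED_BLOCK;
}

static int sketch_is_blocked(enum sketch_sdev_state s)
{
	return s == SKETCH_BLOCK || s == SKETCH_CREATED_BLOCK;
}
```

In this model the problematic path CREATED -> BLOCK -> RUNNING is rejected at its first step, and a device blocked during scanning returns to CREATED instead.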
Signed-off-by: James Bottomley commit 0f1d87a2acb8fd1f2ef8af109a785123ddc1a6cb Author: James Bottomley Date: Fri Aug 22 16:43:59 2008 -0500 [SCSI] add inline functions for recognising created and blocked states The created and blocked states are very shortly going to correspond to mixed sdev_state states. Signed-off-by: James Bottomley commit 22447be7d15aefcfab84e9bec4859a28198b0c62 Author: James Smart Date: Fri Aug 8 02:14:18 2008 -0400 [SCSI] scsi_netlink: Add transport and LLD receive and event support This patch adds scsi netlink receive and event support for transport and SCSI LLDDs. It is a reimplementation of the patch posted last week by David Somayajulu. http://marc.info/?l=linux-scsi&m=121745486221819&w=2 There are a few things done differently: - Transport support is included - Event delivery is included - The vendor message is now its own unique message type, considered part of the generic "SCSI Transport". - LLDD entry points are now registered rather than included in the scsi_host_template. Background: When I started to implement the event handler via template, I had to either: muck up scsi_add_host and scsi_remove_host; or have the event handler search all possible shosts. Neither was acceptable. Moving to a registration solves this, and also limits the scope of the changes to something that could be backported to a distro without breaking an already-released-distro kabi. However, I admit it isn't as elegant, as the passing of the LLDD host template in the registration and the complexity around dynamic add/remove show. - The receive path was augmented to require a unique identifier for the LLDD before the message was allowed to be handed off to the driver. Given how quickly very fatal errors occur if there are msg mismatches (which I saw in testing my own tools :), I believe this to be a very good thing. The id plays off the vendor id scheme already introduced for the vendor-unique event messages used by FC.
Additionally, the id is used as the basis of the registration/deregistration. - Send assist functions for both the transport and LLDDs are included. [fujita.tomonori@lab.ntt.co.jp: fix missing cast] Signed-off-by: James Smart Signed-off-by: James Bottomley commit 557cc476c04146ee610f4bc77bfe20ce1c823d7c Author: Nick Warne Date: Sun Aug 17 00:00:44 2008 +0200 [SCSI] tmscsim: Fixup KERN_INFO in printk Multiline kernel messages should contain a priority in every line; besides, some log daemons represent a tabulator as "^I". Fix both these issues. Signed-off-by: Nick Warne Signed-off-by: Guennadi Liakhovetski Signed-off-by: James Bottomley commit 3fd7f938831b7f2e7b6103118ad1a455f6f03110 Author: Matthew Wilcox Date: Wed Aug 13 21:36:55 2008 -0700 [SCSI] qla2xxx: Remove semaphore.h Now that qla2xxx has been converted to mutexes, it no longer needs the semaphore include. Signed-off-by: Matthew Wilcox Signed-off-by: Andrew Vasquez Signed-off-by: James Bottomley commit 315cb0ad124575e75da2d0e0a95990587fc23485 Author: James Smart Date: Thu Aug 7 20:49:30 2008 -0400 [SCSI] scsi_host_lookup: error returns and NULL pointers This patch cleans up the behavior of scsi_host_lookup(). The original implementation attempted to use the dual role of either returning a pointer value, or a negative error code. Users needed to use IS_ERR() to check the result. Additionally, the IS_ERR() macro never checks for when a NULL pointer was returned, so a NULL pointer actually passes as a success case. Note: scsi_host_get(), used by scsi_host_lookup(), can return a NULL pointer. Talk about a mudhole for the uninitiated to step into.... This patch converts scsi_host_lookup() to return either NULL or a valid pointer. The consumers were updated for the change.
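The pitfall described above can be reproduced in userspace with local copies of the kernel's ERR_PTR()/IS_ERR() convention, in which error codes occupy the top MAX_ERRNO (4095) addresses. This is an illustrative re-creation, not the kernel headers, and sketch_host_lookup() is a hypothetical stand-in for the new NULL-or-valid-pointer contract.

```c
#include <stddef.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno in a pointer, as ERR_PTR() does. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)error;
}

/* True only for the top 4095 addresses; NULL (0) is NOT an error here. */
static inline int sketch_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-SKETCH_MAX_ERRNO;
}

struct sketch_host {
	int host_no;
};

static struct sketch_host sketch_hosts[2] = { { 0 }, { 1 } };

/* New contract: NULL or a valid pointer, so callers need one NULL check. */
static struct sketch_host *sketch_host_lookup(unsigned int hostnum)
{
	return hostnum < 2 ? &sketch_hosts[hostnum] : NULL;
}
```

Because sketch_is_err(NULL) is false, a caller that only tests IS_ERR() would go on to dereference a NULL result, which is exactly the mudhole the commit removes.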
Signed-off-by: James Smart Signed-off-by: James Bottomley commit d294eb83d8d39a29f01dad391f15fc3a29aa04f9 Author: Frederic Weisbecker Date: Fri Oct 3 12:10:10 2008 +0200 cpusets: scan_for_empty_cpusets(), cpuset doesn't seem to be so const This fixes a warning on latest -tip: kernel/cpuset.c: In function 'scan_for_empty_cpusets': kernel/cpuset.c:1932: warning: passing argument 1 of 'list_add_tail' discards qualifiers from pointer target type Actually the struct cpuset *root passed as a parameter to scan_for_empty_cpusets is not supposed to be const, since an entry is added to the tail of its list. Just correct the qualifier. Signed-off-by: Frederic Weisbecker Signed-off-by: Ingo Molnar commit 2ec2b482b10a1ed3493c224f1893cddd3d33833b Author: Ingo Molnar Date: Fri Oct 3 10:41:00 2008 +0200 rcu: RCU-based detection of stalled CPUs for Classic RCU, fix fix the !CONFIG_RCU_CPU_STALL_DETECTOR path: kernel/rcuclassic.c: In function '__rcu_pending': kernel/rcuclassic.c:609: error: too few arguments to function 'check_cpu_stall' Signed-off-by: Ingo Molnar commit 2133b5d7ff531bc15a923db4a6a50bf96c561be9 Author: Paul E. McKenney Date: Thu Oct 2 16:06:39 2008 -0700 rcu: RCU-based detection of stalled CPUs for Classic RCU This patch adds stalled-CPU detection to Classic RCU. This capability is enabled by a new config variable CONFIG_RCU_CPU_STALL_DETECTOR, which defaults to disabled. This is a debugging feature to detect infinite loops in kernel code, not something that non-kernel-hackers would be expected to care about. This feature can detect looping CPUs in !PREEMPT builds and looping CPUs with preemption disabled in PREEMPT builds. This is essentially a port of this functionality from the treercu patch, replacing the stall debug patch that is already in tip/core/rcu (commit 67182ae1c4).
The changes from the patch in tip/core/rcu include making the config variable name match that in treercu, changing from seconds to jiffies to avoid spurious warnings, and printing a boot message when this feature is enabled. Signed-off-by: Paul E. McKenney Signed-off-by: Ingo Molnar commit b5259d944279d0b7e78a83849a352d8ba0447c4c Merge: 1c50b72... 94aca1d... Author: Ingo Molnar Date: Fri Oct 3 10:34:36 2008 +0200 Merge commit 'v2.6.27-rc8' into core/rcu commit 175e438f7a2de9d94110046be48697969569736a Author: Russ Anderson Date: Thu Oct 2 17:32:06 2008 -0500 x86: trivial printk fix in efi.c [patch] x86: Trivial printk fix in efi.c The following line is lacking a space between "memdesc" and "doesn't". "Kernel-defined memdescdoesn't match the one from EFI!" Fixed the printk by adding a space. Signed-off-by: Russ Anderson Cc: Russ Anderson Signed-off-by: Ingo Molnar commit 8bb39311bf461243f6cade3644764d848516dd23 Author: Yinghai Lu Date: Thu Oct 2 23:26:59 2008 -0700 x86, debug: mtrr_cleanup print out var mtrr before change it Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit 834836ee6a3a9deadff9394b23db965a08bcde35 Author: Yinghai Lu Date: Thu Oct 2 15:46:20 2008 -0700 x86: mtrr_cleanup try gran_size to less than 1M, v3 J.A. Magallón reported: >> Also, on a 64 bit box with 4Gb, it gives this: >> >> cicely:~# cat /proc/mtrr >> reg00: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1 >> reg01: base=0x100000000 (4096MB), size=1024MB: write-back, count=1 >> reg02: base=0x140000000 (5120MB), size= 512MB: write-back, count=1 >> reg03: base=0x160000000 (5632MB), size= 256MB: write-back, count=1 >> reg04: base=0x80000000 (2048MB), size=2048MB: uncachable, count=1 boundary handling has a problem ... fix it. Reported-by: J.A. Magallón Signed-off-by: Ingo Molnar commit 136c82c6f7f12c9bed8bf709f92123077966512c Author: J.A. 
Magallón Date: Fri Oct 3 00:32:37 2008 +0200 x86: mtrr_cleanup try gran_size to less than 1M, cleanup Patch below cleans up formatting, with space for big bases and sizes (64 Gb). Signed-off-by: Ingo Molnar commit 2ffb3501f6f356ff80e7149214bc64d3fa9021c4 Author: Yinghai Lu Date: Tue Sep 30 16:29:40 2008 -0700 x86: change MTRR_SANITIZER to def_bool y This option has been added in v2.6.26 as a default-disabled feature and went through several revisions since then. The feature fixes a wide range of MTRR setup problems that BIOSes leave us with: slow system, slow Xorg, slow system when adding lots of RAM, etc., so we want to enable it by default for v2.6.28. See: [Bug 10508] Upgrade to 4GB of RAM messes up MTRRs http://bugzilla.kernel.org/show_bug.cgi?id=10508 and the test results in: http://lkml.org/lkml/2008/9/29/273 1. hpa reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1 reg01: base=0x13c000000 (5056MB), size= 64MB: uncachable, count=1 reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1 reg03: base=0x100000000 (4096MB), size=1024MB: write-back, count=1 reg04: base=0xbf700000 (3063MB), size= 1MB: uncachable, count=1 reg05: base=0xbf800000 (3064MB), size= 8MB: uncachable, count=1 will get Found optimal setting for mtrr clean up gran_size: 1M chunk_size: 128M num_reg: 6 lose RAM: 0M range0: 0000000000000000 - 00000000c0000000 Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB hole: 00000000bf700000 - 00000000c0000000 Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC range0: 0000000100000000 - 0000000140000000 Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB hole: 000000013c000000 - 0000000140000000 Setting variable MTRR 5, base: 5056MB, range: 64MB, type UC 2. 
Dylan Taft reg00: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1 reg01: base=0x100000000 (4096MB), size= 512MB: write-back, count=1 reg02: base=0x120000000 (4608MB), size= 256MB: write-back, count=1 reg03: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1 reg04: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1 reg05: base=0xc7e00000 (3198MB), size= 2MB: uncachable, count=1 reg06: base=0xc8000000 (3200MB), size= 128MB: uncachable, count=1 will get Found optimal setting for mtrr clean up gran_size: 1M chunk_size: 4M num_reg: 6 lose RAM: 0M range0: 0000000000000000 - 00000000c8000000 Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB Setting variable MTRR 2, base: 3072MB, range: 128MB, type WB hole: 00000000c7e00000 - 00000000c8000000 Setting variable MTRR 3, base: 3198MB, range: 2MB, type UC rangeX: 0000000100000000 - 0000000130000000 Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB Setting variable MTRR 5, base: 4608MB, range: 256MB, type WB 3. 
Gabriel reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1 reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1 reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1 reg03: base=0x100000000 (4096MB), size= 512MB: write-back, count=1 reg04: base=0x120000000 (4608MB), size= 128MB: write-back, count=1 reg05: base=0x128000000 (4736MB), size= 64MB: write-back, count=1 reg06: base=0xcf600000 (3318MB), size= 2MB: uncachable, count=1 will get Found optimal setting for mtrr clean up gran_size: 1M chunk_size: 16M num_reg: 7 lose RAM: 0M range0: 0000000000000000 - 00000000d0000000 Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB Setting variable MTRR 2, base: 3072MB, range: 256MB, type WB hole: 00000000cf600000 - 00000000cf800000 Setting variable MTRR 3, base: 3318MB, range: 2MB, type UC rangeX: 0000000100000000 - 000000012c000000 Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB Setting variable MTRR 5, base: 4608MB, range: 128MB, type WB Setting variable MTRR 6, base: 4736MB, range: 64MB, type WB 4. 
Mika Fischer reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1 reg01: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1 reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1 reg03: base=0xbf700000 (3063MB), size= 1MB: uncachable, count=1 reg04: base=0xbf800000 (3064MB), size= 8MB: uncachable, count=1 will get Found optimal setting for mtrr clean up gran_size: 1M chunk_size: 16M num_reg: 5 lose RAM: 0M range0: 0000000000000000 - 00000000c0000000 Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB hole: 00000000bf700000 - 00000000c0000000 Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC rangeX: 0000000100000000 - 0000000140000000 Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit 1bb28499979d926806139bbdef6969fc37621118 Author: Faisal Latif Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Handle AE bounds violation Handle async error NES_AEQE_AEID_AMP_BOUNDS_VIOLATION. Signed-off-by: Faisal Latif Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 9d156947c734747065178331e0c95745cf3a55e1 Author: Chien Tung Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Limit critical error interrupts Mask off a critical error after 100 critical error interrupts to keep the system "sane". Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 068e80de6af2b920d2644bba3a2c060431834160 Author: Chien Tung Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Stop spurious MAC interrupts Mask off MAC interrupts on netdev_stop to prevent spurious MAC interrupts on unload/reload of iw_nes. 
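Two RDMA/nes changes in this log interact: error_module is 5 bits wide, so it can index all 32 crit_error_count entries, and a critical error is masked off after 100 interrupts. A sketch under stated assumptions: the field shift, the masked_modules bitmap and the struct are hypothetical, not the nes register layout or driver code.

```c
#include <stdint.h>

#define HYPOTHETICAL_ERROR_MODULE_SHIFT 8
#define ERROR_MODULE_MASK 0x1fu		/* 5 bits wide, not 4 (0x0f) */
#define CRIT_ERROR_LIMIT 100

struct hypothetical_nes_dev {
	uint32_t crit_error_count[32];	/* one counter per 5-bit module id */
	uint32_t masked_modules;	/* bit n set: module n is masked off */
};

/* Returns 1 if this interrupt caused the module to be masked off. */
static int handle_crit_error(struct hypothetical_nes_dev *dev, uint32_t status)
{
	unsigned int module = (status >> HYPOTHETICAL_ERROR_MODULE_SHIFT) &
			      ERROR_MODULE_MASK;

	if (dev->masked_modules & (1u << module))
		return 0;	/* already masked, nothing more to do */
	if (++dev->crit_error_count[module] >= CRIT_ERROR_LIMIT) {
		dev->masked_modules |= 1u << module;
		return 1;
	}
	return 0;
}
```

With a 4-bit mask, module 17 in this sketch would be miscounted as module 1, which is the kind of error the bit-mask correction addresses.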
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 168ac8244df5af1a9ab03bf39e4a9d3161dd9f11 Author: Chien Tung Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Correct tso_wqe_length Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 0c93ae355ed7301249d932e509f8546977d53376 Author: Chien Tung Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Fill in firmware version for ethtool Fill in the firmware version for ethtool_drvinfo. Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 27ffed603f555ce0a644de6e550d3462ff51d64f Author: John Lacombe Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Use ethtool timer value Use the timer value set via ethtool instead of #defines. Signed-off-by: John Lacombe Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit a06fd26d48eb3304db246f3f4a0aa5a50afb10ec Author: Bob Sharp Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Correct MAX TSO frags value Use the correct define for max TSO fragments. Signed-off-by: Bob Sharp Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit e0e31afbf9a9bb6ca934d3c64ef321cb5f873efe Author: Bob Sharp Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Enable MC/UC after changing MTU Re-enable multicast and unicast after changing the MTU. Signed-off-by: Bob Sharp Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 7a8d14070b3e2d52d2b531434ed09fa1787ae7ca Author: Bob Sharp Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Free NIC TX buffers when destroying NIC QP Signed-off-by: Bob Sharp Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit e88bd7b624133e0b07adb21c45c9e6f68f8fdda2 Author: Chien Tung Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Fix MDC setting Clear the MDC bits before setting them to a new value. Adjust the MDC value for 10G.
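"RDMA/nes: Fix MDC setting" clears the MDC bits before writing a new value. The generic read-modify-write pattern behind that kind of fix is sketched below; the 4-bit field at bits 8..11 and both function names are hypothetical and do not reflect the NetEffect register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout: a 4-bit MDC divider in bits 8..11. */
#define MDC_SHIFT 8u
#define MDC_MASK  (0xfu << MDC_SHIFT)

/* Buggy pattern: OR-ing in the new value keeps stale bits of the old one. */
uint32_t set_mdc_or_only(uint32_t reg, uint32_t mdc)
{
	return reg | ((mdc << MDC_SHIFT) & MDC_MASK);
}

/* Fixed pattern: clear the field first, then insert the new value.
 * Bits outside the field are preserved. */
uint32_t set_mdc(uint32_t reg, uint32_t mdc)
{
	reg &= ~MDC_MASK;
	reg |= (mdc << MDC_SHIFT) & MDC_MASK;
	return reg;
}
```

With an old divider of 0xa in the field (reg = 0xa00), OR-ing in 0x5 leaves 0xf in the field, while set_mdc() leaves the intended 0x5.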
Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 2b537c2824194d50072ab260f54d6fe4cb8d17e8 Author: Chien Tung Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Add wqm_quanta module option Add a module parameter wqm_quanta. It controls the number of segments transmitted at a time. Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit de182149c31786b2b07fa408fb076599b29232a1 Author: Chien Tung Date: Fri Sep 26 15:08:10 2008 -0500 RDMA/nes: Module parameter permissions Change the permissions to 0644 so root can set mpa_version, disable_mpa_crc, send_first, and nes_drv_opt at runtime. Signed-off-by: Sweta Bhatt Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit c752c78275fbf3fcb1d6d0af9b03ac999fe1963d Author: Jon Mason Date: Tue Sep 30 14:51:19 2008 -0700 RDMA/cxgb3: Set active_mtu in ib_port_attr When running ibv_devinfo, the active_mtu returned is garbage. This is due to the field not being populated in the query_port function in the driver. The patch below populates the active_mtu field with an MTU of 2K. It also zeros the struct, so that any new additions to it will return 0. Signed-off-by: Jon Mason Acked-by: Steve Wise Signed-off-by: Roland Dreier commit fcb7ad31beda842804167f0645ca54660713bcd6 Author: Chien Tung Date: Tue Sep 30 14:49:44 2008 -0700 RDMA/nes: Add support for 4-port 1G HP blade card Add support for the NetEffect 4-port 1G HP blade card. The mapping between physical port and MAC is different from the stand-up card.
Signed-off-by: Chien Tung Signed-off-by: Roland Dreier commit 54c86a8c838301e8a619e454b686288578002300 Author: Faisal Latif Date: Tue Sep 30 14:47:27 2008 -0700 RDMA/nes: Make mini_cm_connect() static Signed-off-by: Faisal Latif Signed-off-by: Roland Dreier commit a7e80ce26caa174b1caa5fdfbb3dbd740a87d33a Author: Hefty, Sean Date: Tue Sep 30 10:36:54 2008 -0700 IB/cm: Correctly free cm_device structure commit 110cf374 ("infiniband: make cm_device use a struct device and not a kobject.") introduced a memory leak, since it deleted cm_release_dev_obj(), which was where cm_dev was freed. Fix this by freeing the leaked structure after calling device_unregister(). Signed-off-by: Sean Hefty Signed-off-by: Roland Dreier commit 943c246e9ba9078a61b6bcc5b4a8131ce8befb64 Author: Roland Dreier Date: Tue Sep 30 10:36:21 2008 -0700 IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling tx_lock. Not only do we want to get rid of LLTX, this actually causes problems because of the skb_orphan() done with this tx_lock held: some skb destructors expect to be run with interrupts enabled. The simplest fix for this is to get rid of the driver-private tx_lock and stop using LLTX. We kill off priv->tx_lock and use netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit tricky because we need to update places that take priv->lock inside the tx_lock to disable IRQs, rather than relying on tx_lock having already disabled IRQs. Also, there are a couple of places where we need to disable BHs to make sure we have a consistent context to call netif_tx_lock() (since we no longer can use _irqsave() variants), and we also have to change ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather than directly, because ipoib_send_comp_handler() runs in interrupt context and drain_tx_cq() must run in BH context so it can call netif_tx_lock(). 
Signed-off-by: Roland Dreier commit 64b9e0294d24a4204232e13e01630b0690e48d61 Author: Amit K. Arora Date: Tue Sep 30 17:15:39 2008 +0530 sched: minor optimizations in wake_affine and select_task_rq_fair This patch does the following:
o Removes the unused variable and argument "rq".
o Optimizes one of the "if" conditions in wake_affine(): if "balanced" is true, we need not do the rest of the calculations in the condition.
o If this cpu is the same as the previous cpu (on which the woken-up task was running when it went to sleep), there is no need to call wake_affine at all.
Signed-off-by: Amit K Arora Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 1c50b728c3e734150b8a4a8310ce3e01bc5c70be Author: Mathieu Desnoyers Date: Mon Sep 29 11:06:46 2008 -0400 rcu: add rcu_read_lock_sched() / rcu_read_unlock_sched() Add rcu_read_lock_sched() and rcu_read_unlock_sched() to rcupdate.h to match the recently added write-side call_rcu_sched() and rcu_barrier_sched(). They also match the not-so-recently-added synchronize_sched(). This makes it easier to pair the update-side and read-side primitives consistently. These new read locks will replace the preempt_disable()/enable() pairs used with RCU-classic synchronization. Signed-off-by: Mathieu Desnoyers Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 9b1568458a3ef006361710dc12848aec891883b5 Author: Adam Jackson Date: Mon Sep 29 14:52:03 2008 -0400 x86, debug printouts: IOMMU setup failures should not be KERN_ERR The number of BIOSes that have an option to enable the IOMMU, or fix anything about its configuration, is vanishingly small. There's no good reason to punish quiet boot for this. Signed-off-by: Adam Jackson Signed-off-by: Ingo Molnar commit a03352d2c1dcb00970801fb8b800a39acd3103d9 Author: Bruce Allan Date: Mon Sep 29 20:19:22 2008 -0700 x86: export set_memory_ro and set_memory_rw Export set_memory_ro() and set_memory_rw() for use by drivers that need more debug information about who might be writing to their memory space.
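set_memory_ro() is an in-kernel interface, so it cannot be demonstrated directly from userspace. As a rough userspace analogue only, the sketch below uses mmap()/mprotect() to make a page read-only and then catches the fault that a stray write raises, which is the kind of "who is writing here" signal the x86 export commit is after. It assumes Linux semantics (a write to a PROT_READ page delivers SIGSEGV), and stray_write_faults() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* SIGSEGV handler: jump back out instead of retrying the faulting store. */
static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/*
 * Hypothetical helper: map one page, write to it, make it read-only, then
 * attempt another write. Returns 1 if that write faulted, 0 if it did not,
 * and -1 if setup failed.
 */
int stray_write_faults(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile char *p;
	void *mem;
	int faulted;

	if (page <= 0)
		return -1;
	mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	p = mem;
	p[0] = 'a';                     /* page is writable: succeeds */

	if (mprotect(mem, (size_t)page, PROT_READ) != 0) {
		munmap(mem, (size_t)page);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0) {
		munmap(mem, (size_t)page);
		return -1;
	}

	if (sigsetjmp(fault_env, 1) == 0) {
		p[0] = 'b';             /* page is read-only now: should fault */
		faulted = 0;
	} else {
		faulted = 1;
	}

	sigaction(SIGSEGV, &old, NULL);
	munmap(mem, (size_t)page);
	return faulted;
}
```

In the kernel the equivalent fault produces an oops with a backtrace, which is what identifies the writer; here the handler merely records that the write was trapped.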
This was initially developed for use while debugging a memory corruption problem with e1000e. Signed-off-by: Bruce Allan Signed-off-by: Jesse Brandeburg Signed-off-by: Ingo Molnar commit 208dde28b0f73c0e2dc6be74040fa562e129a6e8 Author: Roland Dreier Date: Mon Sep 29 21:37:33 2008 -0700 IB/mthca: Use pci_request_regions() Back in prehistoric (pre-git!) days, the kernel's MSI-X support did request_mem_region() on a device's MSI-X tables, which meant that a driver that enabled MSI-X couldn't use pci_request_regions() (since that would clash with the PCI layer's MSI-X request). However, that was removed (by me!) years ago, so mthca can just use pci_request_regions() and pci_release_regions() instead of its own much more complicated code that avoids requesting the MSI-X tables. Signed-off-by: Roland Dreier commit e441d6342890838bfc6d64ca2f0964aca08ae2a2 Author: Yannick Cote Date: Mon Sep 29 21:24:04 2008 -0700 IB/ipath: Fix hang on module unload Handle the case where posting a send is requested when the link is down. This fixes .
Signed-off-by: Yannick Cote Signed-off-by: Roland Dreier commit 4624065731751a3ace88e5824d8e5654e2d7abd3 Author: Yinghai Lu Date: Mon Sep 29 18:54:12 2008 -0700 x86: mtrr_cleanup try gran_size to less than 1M One system has a gran < 1M:
reg00: base=0xd8000000 (3456MB), size= 128MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg04: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
reg05: base=0xd7f80000 (3455MB), size= 512KB: uncachable, count=1
will get
Found optimal setting for mtrr clean up
gran_size: 512K chunk_size: 2M num_reg: 7 lose RAM: 0G
range0: 0000000000000000 - 00000000d8000000
Setting variable MTRR 0, base: 0GB, range: 2GB, type WB
Setting variable MTRR 1, base: 2GB, range: 1GB, type WB
Setting variable MTRR 2, base: 3GB, range: 256MB, type WB
Setting variable MTRR 3, base: 3328MB, range: 128MB, type WB
hole: 00000000d7f00000 - 00000000d7f80000
Setting variable MTRR 4, base: 3455MB, range: 512KB, type UC
rangeX: 0000000100000000 - 0000000128000000
Setting variable MTRR 5, base: 4GB, range: 512MB, type WB
Setting variable MTRR 6, base: 4608MB, range: 128MB, type WB

So start from 64K instead of 1M. Signed-off-by: Yinghai Lu Signed-off-by: H. Peter Anvin commit dd7e52224fd7438d701b4bb9834a9ddc06828210 Author: Yinghai Lu Date: Mon Sep 29 18:54:11 2008 -0700 x86: mtrr_cleanup prepare to make gran_size to less 1M Make the printout correct for sizes < 1M. Signed-off-by: Yinghai Lu Signed-off-by: H. Peter Anvin commit 73436a1d2501575f9c2e6ddb26889145e23cefd8 Author: Yinghai Lu Date: Mon Sep 29 13:39:17 2008 -0700 x86: mtrr_cleanup safe to get more spare regs now Delay exit to make sure we can actually get the optimal result in as many cases as possible. Signed-off-by: Yinghai Lu Signed-off-by: H. Peter Anvin commit ea6b184f7d521a503ecab71feca6e4057562252b Author: Stephen Smalley Date: Mon Sep 22 15:41:19 2008 -0400 selinux: use default proc sid on symlinks As we are not concerned with fine-grained control over reading of symlinks in proc, always use the default proc SID for all proc symlinks. This should help avoid permission issues upon changes to the proc tree, as in the /proc/net -> /proc/self/net example. This does not alter labeling of symlinks within /proc/pid directories. ls -Zd /proc/net output before and after the patch should show the difference. Signed-off-by: Stephen D. Smalley Signed-off-by: James Morris commit 12544697f12e0ecdcf971075415c7678fae502af Author: dcg Date: Sun Sep 28 18:49:46 2008 +0200 x86_64: be less annoying on boot, v2 Honour the "quiet" boot parameter in early_printk() calls. Signed-off-by: Diego Calleja Signed-off-by: Ingo Molnar commit 0395e61babd59c749fb5efe112affbfaa7d50eb7 Author: Seth Heasley Date: Wed Aug 27 16:40:06 2008 -0700 ata_piix: IDE Mode SATA patch for Intel Ibex Peak DeviceIDs This patch updates the Intel Ibex Peak (PCH) IDE mode SATA Controller DeviceIDs. Signed-off-by: Seth Heasley Signed-off-by: Jeff Garzik commit 11fc33da8d8413d6bfa5143f454dfcb998c27617 Author: Tejun Heo Date: Sat Aug 30 14:20:01 2008 +0200 libata-eh: clear UNIT ATTENTION after reset Resets make ATAPI devices raise UNIT ATTENTION, which fails the next command. As resets can happen asynchronously for unrelated reasons, this sometimes disrupts innocent users. For example, reading a DVD fails after the system wakes up from suspend or after the other device sharing the channel goes through a bus error. Clearing UA has some problems, as it might clear a UA that userland needs to know about. However, a UA after a reset can only be about the reset itself, and the benefits of clearing it outweigh the cons. A missed UA can only delay the failure to one of the following commands anyway.
For example, timeout while burning is in progress will trigger reset and reset the device state and probably corrupt the burning run. Although the userland application won't get the UA, its pending writes will fail. Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik commit d09addf65cb5b3b19a536aa3329efeedbc6bb56c Author: Herton Ronaldo Krzesinski Date: Wed Sep 17 14:29:05 2008 -0300 ata_piix: add Hercules EC-900 mini-notebook to ich_laptop short cable list Signed-off-by: Herton Ronaldo Krzesinski Signed-off-by: Jeff Garzik commit 6866e7bc83f13a1bc6de59099930e9db1ab0042f Author: Richard Kennedy Date: Mon Sep 22 14:47:13 2008 -0700 libata: reorder ata_device to remove 8 bytes of padding on 64 bits reduce size by 8 bytes from 1160 to 1152 allowing it to fit in 1 fewer cachelines. Signed-off-by: Richard Kennedy Signed-off-by: Andrew Morton Signed-off-by: Jeff Garzik commit 67e3e221d61c0e70b2f244fd921e5e601d6c7339 Author: Sonic Zhang Date: Mon Sep 22 14:47:10 2008 -0700 [libata] pata_bf54x: Add proper PM operation [akpm@linux-foundation.org: remove ifdefs, make things static] Signed-off-by: Sonic Zhang Signed-off-by: Bryan Wu Cc: Jeff Garzik Signed-off-by: Andrew Morton Signed-off-by: Jeff Garzik commit 47d692a946f12c299c21536fff6b39369311f002 Author: Kumar Gala Date: Mon Sep 22 14:47:33 2008 -0700 pata_sil680: convert CONFIG_PPC_MERGE to CONFIG_PPC Now that arch/ppc is dead CONFIG_PPC_MERGE is always defined for all powerpc platforms and we want to get rid of CONFIG_PPC_MERGE use CONFIG_PPC instead. 
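"libata: reorder ata_device to remove 8 bytes of padding on 64 bits" shrinks the struct from 1160 to 1152 bytes. Assuming 64-byte cache lines, that is the difference between 19 lines (1160 / 64 = 18.125, rounded up) and exactly 18. The sketch below shows the general effect with hypothetical fields, not the real struct ata_device: putting the 8-byte member first lets the two chars share one trailing slot instead of each bringing its own padding. Exact sizes are ABI-dependent; on a typical LP64 target they come out as 24 and 16 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a char on each side of a long-aligned member. */
struct padded {
	char flag_a;   /* followed by padding up to long's alignment */
	long counter;
	char flag_b;   /* followed by tail padding */
};

/* Same members, largest first: the two chars share the trailing slot. */
struct reordered {
	long counter;
	char flag_a;
	char flag_b;
};

/* Bytes saved by the reordering on the current ABI. */
size_t bytes_saved(void)
{
	return sizeof(struct padded) - sizeof(struct reordered);
}
```

Tools such as pahole report these holes directly, which is the usual way such reorderings are found.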
Signed-off-by: Kumar Gala Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Alan Cox Signed-off-by: Andrew Morton Signed-off-by: Jeff Garzik commit 45fabbb77bd95adff7a80bde1c7a0ace1075fde6 Author: Elias Oltmanns Date: Sun Sep 21 11:54:08 2008 +0200 libata: Implement disk shock protection support On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD FEATURE as specified in ATA-7 is issued to the device and processing of the request queue is stopped thereafter until the specified timeout expires or user space asks to resume normal operation. This is supposed to prevent the heads of a hard drive from accidentally crashing onto the platter when a heavy shock is anticipated (like a falling laptop expected to hit the floor). In fact, the whole port stops processing commands until the timeout has expired in order to avoid any resets due to failed commands on another device. Signed-off-by: Elias Oltmanns Signed-off-by: Jeff Garzik commit ea6ce53cd5d005455ec0a3cc1d45d3af0cb90919 Author: Elias Oltmanns Date: Fri Sep 19 23:46:01 2008 +0200 [libata] Introduce ata_id_has_unload() Add a function to check an ATA device's id for head unload support as specified in ATA-7. Signed-off-by: Elias Oltmanns Signed-off-by: Jeff Garzik commit 2ad69677b626fc311783b47af25dfecf7be2845b Author: Ben Dooks Date: Fri Sep 26 18:12:52 2008 +0100 PATA: RPC now selects HAVE_PATA_PLATFORM for pata platform driver The RPC machine type now selects HAVE_PATA_PLATFORM so we can remove the special case in the PATA_PLATFORM configuration code. Cc: Russell King Signed-off-by: Ben Dooks Signed-off-by: Jeff Garzik commit be77e43abb433c2d6f2fc69352289e34dcbf040a Author: Tejun Heo Date: Thu Jul 31 17:02:44 2008 +0900 ata_piix: drop merged SCR access and use slave_link instead Now that libata has slave_link, there's no need to keep ugly merged SCR access. Drop it and use slave_link instead. This results in simpler code and much better separate link handling for master and slave. 
Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik commit b1c72916abbdd0a55015c87358536ca0ebaf6735 Author: Tejun Heo Date: Thu Jul 31 17:02:43 2008 +0900 libata: implement slave_link Explanation taken from the comment of ata_slave_link_init(). In libata, a port contains links and a link contains devices. There is a single host link, but if a PMP is attached to it, there can be multiple fan-out links. On SATA, there's usually a single device connected to a link, but PATA and SATA controllers emulating a TF-based interface can have two: master and slave. However, there are a few controllers which don't fit into this abstraction too well: SATA controllers which emulate the TF interface with both master and slave devices but also have separate SCR register sets for each device. These controllers need separate links for physical link handling (e.g. onlineness, link speed) but should be treated like a traditional M/S controller for everything else (e.g. command issue, softreset). slave_link is libata's way of handling this class of controllers without impacting the core layer too much. For anything other than physical link handling, the default host link is used for both master and slave. For physical link handling, a separate @ap->slave_link is used. All the dirty details are implemented inside the libata core layer. From the LLD's POV, the only difference is that prereset, hardreset and postreset are called once more for the slave link, so the reset sequence looks like the following:

prereset(M) -> prereset(S) -> hardreset(M) -> hardreset(S) -> softreset(M) -> postreset(M) -> postreset(S)

Note that softreset is called only for the master. Softreset resets both M/S by definition, so SRST on the master should handle both (the standard method will work just fine). As slave_link excludes PMP support and only code paths which deal with the attributes of the physical link are affected, all the changes are localized to libata.h, libata-core.c and libata-eh.c.
* ata_is_host_link() updated so that slave_link is considered a host link too.
* Iterator extended to iterate over the slave_link when using the underbarred version.
* Force param handling updated such that devno 16 is mapped to the slave link/device.
* ata_link_on/offline() updated to return the combined result from the master and slave links. ata_phys_link_on/offline() are the direct versions.
* EH autopsy and report are performed separately for the master and slave links. Reset is updated to implement the reset sequence described above.

Except for the reset update, most changes are minor, many of them just changing dev->link to ata_dev_phys_link(dev) or using the phys online test instead. After this update, LLDs can take full advantage of per-dev SCR registers by simply turning on the slave link. Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik commit b5b3fa386b8f96c7fa92e507e5deddc2637924b4 Author: Tejun Heo Date: Thu Jul 31 17:02:42 2008 +0900 libata: misc updates to prepare for slave link
* Add ATA_EH_ALL_ACTIONS.
* Make sata_link_{on|off}_line() return bool instead of int.
Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik commit aadffb682cc5572f48cc24883681db65530bd284 Author: Tejun Heo Date: Thu Jul 31 17:02:41 2008 +0900 libata: reimplement link iterator Implement __ata_port_next_link() and reimplement __ata_port_for_each_link() and ata_port_for_each_link() using it. This removes relatively large inlined code and makes iteration easier to extend. Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik commit 82ef04fb4c82542b3eda81cca461f0594ce9cd0b Author: Tejun Heo Date: Thu Jul 31 17:02:40 2008 +0900 libata: make SCR access ops per-link Logically, SCR access ops should take @link; however, there was no compelling reason to convert all SCR access ops when adding the @link abstraction, as there's a one-to-one mapping between a port and a non-PMP link. However, that assumption won't hold anymore with the scheduled addition of the slave link. Make SCR access ops per-link.
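The reset order that "libata: implement slave_link" exposes to LLDs can be written down as a tiny model. This is a hypothetical sketch that calls nothing in libata; it only records the callback sequence quoted in that commit, including the single softreset on the master.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: which physical link a reset callback is run for. */
enum link_id { LINK_MASTER, LINK_SLAVE };

struct reset_log {
	char step[8][16];   /* e.g. "hardreset(S)" */
	int n;
};

static void log_step(struct reset_log *log, const char *op, enum link_id id)
{
	if (log->n < 8) {
		snprintf(log->step[log->n], sizeof(log->step[0]), "%s(%c)",
			 op, id == LINK_MASTER ? 'M' : 'S');
		log->n++;
	}
}

/* prereset and hardreset run once per link, master first. */
void slave_link_reset(struct reset_log *log)
{
	log->n = 0;
	log_step(log, "prereset", LINK_MASTER);
	log_step(log, "prereset", LINK_SLAVE);
	log_step(log, "hardreset", LINK_MASTER);
	log_step(log, "hardreset", LINK_SLAVE);
	/* SRST on the master resets both devices, so there is no
	 * softreset(S) step. */
	log_step(log, "softreset", LINK_MASTER);
	log_step(log, "postreset", LINK_MASTER);
	log_step(log, "postreset", LINK_SLAVE);
}
```

The model makes the asymmetry easy to see: the slave link gets three callbacks (prereset, hardreset and postreset), the master gets four.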
Signed-off-by: Tejun Heo Signed-off-by: Jeff Garzik commit 8f0afaa58e912bbe7d5b0bad9fb024337edf363e Author: Yinghai Lu Date: Sat Sep 27 20:26:06 2008 -0700 x86: mtrr_cleanup hole size should be less than half of chunk_size, v2 v2: check against half of the range0 size instead of chunk_size, so we don't end up with a silly big hole. In hpa's case, we could then auto-detect instead of adding mtrr_chunk_size to the command line. Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit 54d45ff4208836e62536fc40b0141586dbf6641f Author: Yinghai Lu Date: Sat Sep 27 00:30:06 2008 -0700 x86: add mtrr_cleanup_debug command line Add mtrr_cleanup_debug to print out more info about the layout. Signed-off-by: Yinghai Lu Cc: Yinghai Lu Cc: Andrew Morton Signed-off-by: Ingo Molnar commit 2313c2793d290a8cc37c428f8622c53f3fe1d6dc Author: Yinghai Lu Date: Sat Sep 27 00:30:08 2008 -0700 x86: mtrr_cleanup optimization, v2 Fix hpa's T61 with 4G RAM: change the layout from (n - 1)*chunksize + chunk_size - NC to n*chunksize - NC. Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit 7fc2368d1d0dce7a778beb2fba3acac8fa7a34b6 Author: Yinghai Lu Date: Sat Sep 27 00:30:07 2008 -0700 x86: don't need to go to chunksize to 4G Change the chunksize maximum back to 2G; otherwise we get a strange layout on a 2G RAM system, like 0 - 4G WB, 2040M - 2048M UC, 2048M - 4G NC instead of 0 - 2G WB, 2040M - 2048M UC. Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit de45e806a84909648623119dfe6fc1d31e71ceba Author: Serge E. Hallyn Date: Fri Sep 26 22:27:47 2008 -0400 file capabilities: uninline cap_safe_nice This reduces the kernel size by 289 bytes. Signed-off-by: Serge E. Hallyn Acked-by: Andrew G. Morgan Signed-off-by: James Morris commit 254db57f9b12daba841a4d91ddb9a8161e9c74ba Author: Steven Whitehouse Date: Fri Sep 26 10:23:22 2008 +0100 GFS2: Support for I/O barriers This patch adds barrier support to GFS2. There is not a lot of change really: we just add the barrier flag when we write journal header blocks.
If the underlying device refuses to support them, we fall back to the previous way of doing things (wait for the I/O and hope) since there is nothing else we can do. There is no user configuration, barriers will always be on unless the device refuses to support them. This seems a reasonable solution to me since this is a correctness issue. Signed-off-by: Steven Whitehouse commit c9da4bad5b80c3d9884e2c6ad8d2091252c32d5e Author: Roland Dreier Date: Thu Sep 25 15:26:15 2008 -0700 IPoIB: Fix crash when path record fails after path flush Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") changed how paths are flushed on an SM event. This change introduces a problem if the path record query triggered by fails, causing path->ah to become NULL. A later successful path query will then trigger WARN_ON() in path_rec_completion(), and crash because path->ah has already been freed, so the ipoib_put_ah() inside the lock in path_rec_completion() may actually drop the last reference (contrary to the comment that claims this is safe). Fix this by updating path->ah and freeing old_ah only when the path record query is successful. This prevents the neighbour AH and that path AH from getting out of sync. This fixes Reported-by: Rabah Salem Debugged-by: Eli Cohen Signed-off-by: Roland Dreier commit b87f17242da6b2ac6db2d179b2f93fb84cff2fbe Author: Bharata B Rao Date: Thu Sep 25 09:53:54 2008 +0530 sched: maintain only task entities in cfs_rq->tasks list cfs_rq->tasks list is used by the load balancer to iterate over all the tasks. Currently it holds all the entities (both task and group entities) because of which there is a need to check for group entities explicitly during load balancing. This patch changes the cfs_rq->tasks list to hold only task entities. 
Signed-off-by: Bharata B Rao Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit f6476774f1fe32593d3d71903b1e98514efbf685 Author: Bill Nottingham Date: Wed Sep 24 14:35:17 2008 -0400 x86_64: be less annoying on boot Remove mostly useless message on every boot. Signed-off-by: Bill Nottingham Signed-off-by: Ingo Molnar commit e51a1ac2dfca9ad869471e88f828281db7e810c0 Author: Harvey Harrison Date: Tue Sep 23 15:20:09 2008 -0700 x86, olpc: fix endian bug in openfirmware workaround Boardrev is always treated as a u32 everywhere else, no reason to byteswap the 0xc2 value. The only use is to print out if it is a prerelease board, the test being: (olpc_platform_info.boardrev & 0xf) < 8 Which is currently always true as be32_to_cpu(0xc2) & 0xf = 0 but I doubt that was the intention here. The consequences of the bug are pretty minor though (incorrect boardrev displayed in dmesg when ofw support not configured) Also annotate the temporary used to read the boardrev in the ofw case. The confusion was noticed by Sparse: arch/x86/kernel/olpc.c:206:32: warning: cast to restricted __be32 Signed-off-by: Harvey Harrison Signed-off-by: Ingo Molnar commit 493cd9122af5bd0b219974a48f0e31da0c29ff7e Author: Harvey Harrison Date: Tue Sep 23 14:56:44 2008 -0700 x86: ds.c ptrace.c integer as NULL pointer sparse fixes fix: arch/x86/kernel/ptrace.c:763:29: warning: Using plain integer as NULL pointer arch/x86/kernel/ptrace.c:777:46: warning: Using plain integer as NULL pointer arch/x86/kernel/ptrace.c:1115:45: warning: Using plain integer as NULL pointer arch/x86/kernel/ds.c:482:26: warning: Using plain integer as NULL pointer arch/x86/kernel/ds.c:487:25: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison Acked-by: Cyrill Gorcunov Signed-off-by: Ingo Molnar commit ebdd90a8cb2e3963f55499850f02ce6003558b55 Merge: 3c93390... 72d3105... 
Author: Ingo Molnar Date: Wed Sep 24 09:56:20 2008 +0200 Merge commit 'v2.6.27-rc7' into x86/pebs commit 57fdc26d4a734a3e00c6b2fc0e1e40ff8da4dc31 Author: Peter Zijlstra Date: Tue Sep 23 15:33:45 2008 +0200 sched: fixup buddy selection We should set the buddy even though we might already have the TIF_RESCHED flag set. Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 4653f803e6e0d970ffeac0efd2c01743eb6c5228 Author: Peter Zijlstra Date: Tue Sep 23 15:33:44 2008 +0200 sched: more sanity checks on the bandwidth settings While playing around with it, I noticed we missed some sanity checks. Also add some comments while we're there. Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 78333cdd0e472180743d35988e576d6ecc6f6ddb Author: Peter Zijlstra Date: Tue Sep 23 15:33:43 2008 +0200 sched: add some comments to the bandwidth code Hopefully clarify some of this code a little. Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 940959e93949e839c14f8ddc3b9b0e34a2ab6e29 Author: Peter Zijlstra Date: Tue Sep 23 15:33:42 2008 +0200 sched: fixlet for group load balance We should not only correct the increment for the initial group, but should be consistent and do so for all the groups we encounter. Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 63e5c39859a41591662466028c4d1281c033c05a Merge: 6956985... fa74820... c8bfff6... Author: Ingo Molnar Date: Tue Sep 23 16:23:05 2008 +0200 Merge branches 'sched/urgent' and 'sched/rt' into sched/devel commit 695698500912c4479ddf4723e492de3970ff8530 Author: Peter Zijlstra Date: Tue Sep 23 14:54:23 2008 +0200 sched: rework wakeup preemption Rework the wakeup preemption to work on real runtime instead of the virtual runtime. This greatly simplifies the code. 
Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 1a73ef6ac3f4b44abc9d1875eb9240d7524a7cf7 Author: Martin Steigerwald Date: Tue Sep 23 13:48:44 2008 +0200 CFS scheduler: documentation about scheduling policies The documentation about the CFS scheduler is scarce when it comes to scheduling policies. This patch adds a chapter about the scheduling policies it supports. Peter Zijlstra provided most of the information for it in http://marc.info/?l=linux-kernel&m=122210038326356&w=2 Signed-off-by: Martin Steigerwald Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 3a72dc8eb5a7122fff439a22bd22486a4fff505c Author: Harvey Harrison Date: Mon Sep 22 14:55:46 2008 -0700 rcu: fix sparse shadowed variable warning
kernel/rcuclassic.c:564:18: warning: symbol 'flags' shadows an earlier one
kernel/rcuclassic.c:527:16: originally declared here
Signed-off-by: Harvey Harrison Acked-by: Paul E. McKenney Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar commit 006c75f146e58e080d2b2725a6664f71886e112b Author: Andrew Morton Date: Mon Sep 22 14:55:46 2008 -0700 sched: clarify ifdef tangle
- Add some comments to try to make the ifdef puzzle a bit clearer.
- Explicitly inline one of the three init_hrtick() implementations.
Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar commit b3e15bdef689641e7f1bb03efbe56112c3ee82e2 Author: Aristeu Rozanski Date: Mon Sep 22 13:13:59 2008 -0400 x86, NMI watchdog: setup before enabling NMI watchdog There's a small window while the NMI watchdog is being set up during which, if any NMIs are triggered, the NMI code will make use of uninitialized wd_ops elements:

void setup_apic_nmi_watchdog(void *unused)
{
	if (__get_cpu_var(wd_enabled))
		return;

	/* cheap hack to support suspend/resume */
	/* if cpu0 is not active neither should the other cpus */
	if (smp_processor_id() != 0 && atomic_read(&nmi_active) <= 0)
		return;

	switch (nmi_watchdog) {
	case NMI_LOCAL_APIC:
		/* enable it before to avoid race with handler */
-->		__get_cpu_var(wd_enabled) = 1;
-->		if (lapic_watchdog_init(nmi_hz) < 0) {
(...)

asmlinkage notrace __kprobes void default_do_nmi(struct pt_regs *regs)
{
(...)
	if (nmi_watchdog_tick(regs, reason))
		return;
(...)

notrace __kprobes int nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
{
(...)
	if (!__get_cpu_var(wd_enabled))
		return rc;
	switch (nmi_watchdog) {
	case NMI_LOCAL_APIC:
		rc |= lapic_wd_event(nmi_hz);
(...)

int lapic_wd_event(unsigned nmi_hz)
{
	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
	u64 ctr;

-->	rdmsrl(wd->perfctr_msr, ctr);

and wd->*_msr will be initialized in each processor-type-specific setup, after enabling NMIs for PMIs. Since the counter was just set, the chance of a performance-counter-generated NMI is minimal, but any other unknown NMI would trigger the problem. This patch fixes the problem by setting everything up before enabling performance-counter-generated NMIs, and sets wd_enabled using a callback function.
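The race in "x86, NMI watchdog: setup before enabling NMI watchdog" comes down to ordering: the handler trusts the enabled flag, so the flag must be set last. The single-threaded model below is a hypothetical sketch (names and fields are illustrative, not the kernel's). It simulates an NMI arriving inside the setup window by calling the handler directly and ignores the memory-ordering concerns real code would have.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU watchdog state; field names are illustrative. */
struct wd_state {
	bool enabled;           /* handler only touches the counter when set */
	unsigned int ctr_msr;   /* 0 stands for "not initialised yet" */
	int bad_reads;          /* times the handler used an uninitialised MSR */
};

/* Model of the NMI tick: bail out unless enabled, then "read" the MSR. */
void nmi_tick(struct wd_state *wd)
{
	if (!wd->enabled)
		return;
	if (wd->ctr_msr == 0)
		wd->bad_reads++;    /* stands in for rdmsrl() on a bogus MSR */
}

/* Old order: flag first, MSR later; an NMI in between misbehaves. */
void setup_flag_first(struct wd_state *wd, unsigned int msr)
{
	wd->enabled = true;
	nmi_tick(wd);           /* simulated NMI inside the window */
	wd->ctr_msr = msr;
}

/* Fixed order: initialise everything, then enable as the last step. */
void setup_flag_last(struct wd_state *wd, unsigned int msr)
{
	wd->ctr_msr = msr;
	nmi_tick(wd);           /* simulated NMI: ignored, not enabled yet */
	wd->enabled = true;
}
```

In the model, setup_flag_first() records one bad read and setup_flag_last() records none. That mirrors the commit's move to setting wd_enabled only after the per-CPU setup has finished.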
Signed-off-by: Aristeu Rozanski Acked-by: Don Zickus Acked-by: Prarit Bhargava Acked-by: Vivek Goyal Signed-off-by: Ingo Molnar commit 28b166a700899a0f88b1cc283c449fb5bf72a635 Author: Aristeu Rozanski Date: Mon Sep 22 13:14:13 2008 -0400 x86, NMI watchdog: when booting with reset_devices, clear the performance counters P4s have a quirk that makes it necessary to clear the P4_CCCR_OVF bit in the CCCR every time the PMI is triggered. When booting the kernel with reset_devices (more specifically, in the kdump case), the counters reach zero and the PMI is generated. This is not a problem on other processors, but on P4s it will continue to generate NMIs until that bit is cleared. Since there may be other users of the performance counters, clear and disable all of them when booting with the reset_devices option. We have a P4 box here that crashes because of this problem. Since the kdump kernel usually boots with only one processor active, the second logical unit won't be set up; therefore, MSR_P4_IQ_CCCR1 (and other performance counter registers) won't be cleared, and P4_CCCR_OVF may still be set because the previous kernel was using this register. An NMI is triggered by MSR_P4_IQ_CCCR1 right after NMI delivery is enabled, triggering the race fixed in my previous email. Signed-off-by: Aristeu Rozanski Acked-by: Don Zickus Acked-by: Prarit Bhargava Acked-by: Vivek Goyal Signed-off-by: Ingo Molnar commit caea8a03702c147e8ae90da0801e7ba8297b1d46 Author: Chris Friesen Date: Mon Sep 22 11:06:09 2008 -0600 sched: fix list traversal to use _rcu variant load_balance_fair() calls rcu_read_lock() but then traverses the list using the regular list traversal routine. This patch converts the list traversal to use the _rcu version.
Signed-off-by: Chris Friesen Signed-off-by: Ingo Molnar commit f681bbd656b01439be904250a1581ca9c27505a1 Author: Ingo Molnar Date: Mon Sep 22 16:29:00 2008 +0200 sched: turn off WAKEUP_OVERLAP WAKEUP_OVERLAP is not a winner on a 16-way box, running psql+sysbench:

        .27-rc7-NO_WAKEUP_OVERLAP  .27-rc7-WAKEUP_OVERLAP
  -------------------------------------------------
     1:       694        811     +14.39%
     2:      1454       1427      -1.86%
     4:      3017       3070      +1.70%
     8:      5694       5808      +1.96%
    16:     10592      10612      +0.19%
    32:      9693       9647      -0.48%
    64:      8507       8262      -2.97%
   128:      8402       7087     -18.55%
   256:      8419       5124     -64.30%
   512:      7990       3671    -117.62%
  -------------------------------------------------
   SUM:     64466      55524     -16.11%

... so turn it off by default. Signed-off-by: Ingo Molnar commit 15afe09bf496ae10c989e1a375a6b5da7bd3e16e Author: Peter Zijlstra Date: Sat Sep 20 23:38:02 2008 +0200 sched: wakeup preempt when small overlap Lin Ming reported a 10% OLTP regression against 2.6.27-rc4. The difference seems to come from different preemption aggressiveness, which affects the cache footprint of the workload and its effective cache thrashing. Aggressively preempt a task if its avg overlap is very small; this should avoid the task going to sleep, so that we find it still running when we schedule back to it, saving a wakeup. Reported-by: Lin Ming Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 16dc552f35bc0ec6fec8ef83f8032eee352d17f5 Author: Yinghai Lu Date: Fri Sep 19 11:45:04 2008 -0700 x86: use WARN_ONCE in workaround for mtrr mask This should help draw attention to BIOS bugs in the MTRR mask setting. WARN_ONCE is already in mainline, so let's use it. Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit 0b88641f1bafdbd087d5e63987a30cc0eadd63b9 Merge: fbdbf70... 72d3105...
Author: Ingo Molnar Date: Mon Sep 22 13:08:57 2008 +0200 Merge commit 'v2.6.27-rc7' into x86/debug commit 153dab77e228931f3aff74f21b762927ac710ca7 Author: Akinobu Mita Date: Sun Sep 21 23:25:40 2008 +0900 x86: use platform_device_register_simple() Cleanup pcspeaker.c Signed-off-by: Akinobu Mita Signed-off-by: Ingo Molnar commit af2d237bf574f89ae5a1b67f2556a324c8f64ff5 Author: Akinobu Mita Date: Sun Sep 21 23:27:13 2008 +0900 x86: check for ioremap() failure in copy_oldmem_page() Add a check for ioremap() failure in copy_oldmem_page(). This patch also includes small coding style fixes. Signed-off-by: Akinobu Mita Signed-off-by: Ingo Molnar commit 6d80c39f9155e289fe8037a8b6352931ff916ceb Author: Steven Whitehouse Date: Mon Sep 22 07:29:31 2008 +0100 GFS2: Add UUID to GFS2 sb This patch adds a UUID to the GFS2 sb structure. This field is not actually referenced from kernel space at all, but is added for completeness and due to the userland tools which get their on-disk structure information from the gfs2_ondisk.h header file. Since we have to be backwards compatible, we will assume that any GFS2 sb for which the UUID is all 0 does not have a UUID as such. We should then be (after some userland changes) able to support the -U mount option. This addresses Fedora bugzilla #242689 Signed-off-by: Steven Whitehouse commit ab2b49518e743962f71b94246855c44ee9cf52cc Merge: f058925... 72d3105... Author: James Morris Date: Sun Sep 21 17:41:56 2008 -0700 Merge branch 'master' into next Conflicts: MAINTAINERS Thanks for breaking my tree :-) Signed-off-by: James Morris commit cd86f420614c1a2dea9c21d7f4f1acb5ec2465b2 Author: Julia Lawall Date: Sat Sep 20 20:06:32 2008 -0700 IB: Drop code after return statement A break after a return serves no purpose, remove it. 
Signed-off-by: Julia Lawall Reviewed-by: Richard Genoud Signed-off-by: Roland Dreier commit 7097228c54e7348d8c8c6dccc96e50191e39c2f8 Author: Michael Brooks Date: Sat Sep 20 20:06:16 2008 -0700 IB/mad: Don't discard BMA responses in kernel This fixes the problem of incoming BMA responses being dropped due to a bad "is response" check. Fix the test to use the ib_response_mad() predicate, which correctly handles BMA MADs. This fixes . Signed-off-by: Michael Brooks Acked-by: Sean Hefty Signed-off-by: Roland Dreier commit 940358967599ba9057b3c51ba906e1cd5b984729 Author: Ralph Campbell Date: Sat Sep 20 20:05:51 2008 -0700 IB/ipath: Fix SLID generation for RC/UC QPs when LMC > 0 The code to set the source LID in the sent LRH was not setting the low bits if LMC != 0 for RC/UC QPs. Signed-off-by: Ralph Campbell Signed-off-by: Roland Dreier commit b9012e0a4255c93e1d81f1ccee591de6414b5955 Author: Alexander Schmidt Date: Sat Sep 20 20:05:21 2008 -0700 IB/ehca: Generate flush status CQ entries When a QP goes into error state, it is required that CQ entries with a flush error status are delivered to the application for any outstanding work requests. eHCA does not do this in hardware, so this patch adds software flush CQE generation to the ehca driver. Whenever a QP gets into error state, it is added to the QP error list of its respective CQ. If the error QP list of a CQ is not empty, poll_cq() generates flush CQEs before polling the actual CQ. Signed-off-by: Alexander Schmidt Signed-off-by: Roland Dreier commit 279b0bbba2bb647348ad90e183b3960aa99eccfd Author: Yinghai Lu Date: Thu Sep 18 23:55:27 2008 -0700 x86: fix arch/x86/kernel/cpu/mtrr/main.c warning fix this warning reported by Andrew Morton: > arch/x86/kernel/cpu/mtrr/main.c: In function 'mtrr_bp_init': > arch/x86/kernel/cpu/mtrr/main.c:1170: warning: 'extra_remove_base' may be used uninitialized in this function the warning is bogus but the logic that prevents uninitialized use is a bit convoluted so simplify it all. 
Signed-off-by: Ingo Molnar commit 5e51900be6c15488b80343d3c3e62d4d605ba9a9 Merge: 9985647... adee14b... Author: Ingo Molnar Date: Fri Sep 19 09:15:50 2008 +0200 Merge commit 'v2.6.27-rc6' into x86/cleanups commit 719ee344675c2efed9115934f19aa66a526b6e5b Author: Steven Whitehouse Date: Thu Sep 18 13:53:59 2008 +0100 GFS2: high time to take some time over atime Until now, we've used the same scheme as GFS1 for atime. This has failed since atime is a per-vfsmnt flag, not a per-fs flag, and as such the "noatime" flag was not getting passed down to the filesystems. This patch removes all the "special casing" around atime updates and we simply use the VFS's atime code. The net result is that GFS2 will now support all the same atime-related mount options as any other filesystem on a per-vfsmnt basis. We do lose the "lazy atime" updates, but we gain "relatime". We could add lazy atime to the VFS at a later date, if there is still a requirement for that variant - I suspect relatime will be enough. Also we lose about 100 lines of code after this patch has been applied, and I have a suspicion that it will speed things up a bit, even when atime is "on". So it seems like a nice cleanup as well. From a user perspective, everything stays the same except the loss of the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very least, and to be honest I don't think anybody ever used it) and that a number of options which were ignored before now work correctly. Please let me know if you've got any comments. I'm pushing this out early so that you can all see what my plans are. Signed-off-by: Steven Whitehouse commit 37ec89e83c4ca98323fe74f139301ff3949cfdb6 Author: Steven Whitehouse Date: Thu Sep 18 13:49:32 2008 +0100 GFS2: The war on bloat The following patch shrinks the gfs2_args structure which is embedded in every GFS2 superblock.
It cuts down the size of the options to a single unsigned int (the 13 bits of bitfields will be rounded up to that size by the compiler) from the current 11 unsigned ints. So on x86 that's 44 bytes shrinking to 4 bytes, in each and every GFS2 superblock. Signed-off-by: Steven Whitehouse commit fbdbf709938d155c719c76b9894d28342632c797 Author: Uwe Kleine-König Date: Mon Sep 15 22:02:43 2008 +0200 x86, debug: gpio_free might sleep According to the documentation, gpio_free should only be called from task context. To make this more explicit, add a might_sleep() to all implementations. This patch changes the gpio_free implementations for the x86 architecture. Signed-off-by: Uwe Kleine-König Signed-off-by: Ingo Molnar commit 90f7d25c6b672137344f447a30a9159945ffea72 Author: Arjan van de Ven Date: Tue Sep 16 11:27:30 2008 -0700 x86: print DMI information in the oops trace In order to diagnose hard, system-specific issues, it's useful to have the system name (as provided by DMI) in the oops. Signed-off-by: Arjan van de Ven Signed-off-by: Ingo Molnar commit 998564789137921acae9e367b61c5a1dc295653d Author: Paul Bolle Date: Tue Sep 16 11:17:03 2008 +0200 x86 setup: drop SWAP_DEV Impact: None (cleanup) SWAP_DEV has been unused since 2.6.23-rc1. The comment had already been incorrect since (at least) 2.6.12. Signed-off-by: Paul Bolle Signed-off-by: H. Peter Anvin commit acd2c8aa02f302ed838348052e16ee575c645147 Author: Abhijith Das Date: Mon Sep 15 08:54:06 2008 -0500 GFS2: GFS2 will panic if you misspell any mount options The gfs2 superblock pointer is NULL after a failed mount. When control eventually goes to gfs2_kill_sb, we dereference this NULL pointer. This patch ensures that the gfs2 superblock pointer is not NULL before it is dereferenced in gfs2_kill_sb.
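The "GFS2: The war on bloat" commit above replaces 11 separate unsigned int options with bitfields totalling 13 bits. The sketch below illustrates that size difference; the struct and field names are hypothetical (loosely modelled on mount options, not the real gfs2_args members), and because bitfield layout is implementation-defined, the 4-byte result assumes a mainstream ABI such as GCC or Clang on x86.

```c
/* Hypothetical "before" layout: one full unsigned int per option (11 total). */
struct opts_wide {
	unsigned int spectator, localflocks, localcaching, debug, upgrade,
		     posix_acl, suiddir, meta, errors, quota, data;
};

/* Hypothetical "after" layout: nine 1-bit flags plus two 2-bit fields,
 * 13 bits in all, which the compiler rounds up to one unsigned int. */
struct opts_packed {
	unsigned int spectator:1;
	unsigned int localflocks:1;
	unsigned int localcaching:1;
	unsigned int debug:1;
	unsigned int upgrade:1;
	unsigned int posix_acl:1;
	unsigned int suiddir:1;
	unsigned int meta:1;
	unsigned int errors:1;
	unsigned int quota:2;	/* room for up to four quota modes */
	unsigned int data:2;	/* room for up to four data modes */
};
```

With a 4-byte unsigned int, sizeof(struct opts_wide) is 44 and sizeof(struct opts_packed) is 4, matching the 44-to-4-byte figure quoted in the commit. The trade-off is that multi-valued options must fit their field width, so a 2-bit field can hold at most four distinct values.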
Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse commit acb57a3652c614efed26080dad5972c0076166b1 Author: Bob Peterson Date: Thu Sep 11 15:35:37 2008 -0400 GFS2: Direct IO write at end of file error This patch fixes a problem whereby a direct_io write doesn't fall back to buffered write properly at end of file. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit b899219572350685e6163ce7535efb5ad9bcd6a4 Author: Alexey Dobriyan Date: Sun Sep 14 13:44:41 2008 +0400 x86: simpler SYSVIPC_COMPAT definition X86_64 part is entirely redundant. Signed-off-by: Alexey Dobriyan Signed-off-by: Ingo Molnar commit f058925b201357fba48d56cc9c1719ae274b2022 Author: Stephen Smalley Date: Thu Sep 11 09:20:26 2008 -0400 Update selinux info in MAINTAINERS and Kconfig help text Update the SELinux entry in MAINTAINERS and drop the obsolete information from the selinux Kconfig help text. Signed-off-by: Stephen Smalley Signed-off-by: James Morris commit 09b22a2f678ae733801b888c44756d0abd686b8a Merge: 3ba3557... adee14b... Author: Ingo Molnar Date: Thu Sep 11 13:37:28 2008 +0200 Merge commit 'v2.6.27-rc6' into sched/devel commit 91030ca1e739696812242c807b112ee3981a14be Author: Hugh Dickins Date: Tue Sep 9 16:42:45 2008 +0100 x86: unsigned long pte_pfn pte_pfn() has always been of type unsigned long, even on 32-bit PAE; but in the current tip/next/mm tree it works out to be unsigned long long on 64-bit, which gives an irritating warning if you try to printk a pfn with the usual %lx. Now use the same pte_pfn() function, moved from pgtable-3level.h to pgtable.h, for all models: as suggested by Jeremy Fitzhardinge. And pte_page() can well move along with it (remaining a macro to avoid dependence on mm_types.h). 
Signed-off-by: Hugh Dickins Acked-by: Jeremy Fitzhardinge Signed-off-by: Ingo Molnar commit e8aed68614c81f24d8c4cbcb4923f848ece846e1 Author: Lai Jiangshan Date: Wed Sep 10 11:01:07 2008 +0800 doc/RCU: fix pseudocode in rcuref.txt atomic_inc_not_zero(v) returns 0 if *v == 0. Use spin_lock instead of write_lock for the update lock. Signed-off-by: Lai Jiangshan Cc: "Paul E. McKenney" Signed-off-by: Ingo Molnar commit 429b022af41108f6942d72547592b1d30e9a51f0 Merge: 0cd418d... adee14b... Author: Ingo Molnar Date: Wed Sep 10 08:35:40 2008 +0200 Merge commit 'v2.6.27-rc6' into core/rcu commit 9c0bbee8a6fc14107e9a7af6750bfe1056cbf4bc Author: Alexey Dobriyan Date: Tue Sep 9 11:01:31 2008 +0400 seccomp: drop now bogus dependency on PROC_FS seccomp is prctl(2)-driven now. Signed-off-by: Alexey Dobriyan Signed-off-by: Ingo Molnar commit e545a6140b698b2494daf0b32107bdcc5e901390 Author: Manfred Spraul Date: Sun Sep 7 16:57:22 2008 +0200 kernel/cpu.c: create a CPU_STARTING cpu_chain notifier Right now, there is no notifier that is called on a new cpu before the new cpu begins processing interrupts/softirqs. Various kernel functions would need that notification; e.g. kvm works around it by calling smp_call_function_single(), and rcu polls cpu_online_map. The patch adds a CPU_STARTING notification. It also adds a helper function that sends the message to all cpu_chain handlers. Tested on x86-64. All other archs are untested. Especially on sparc, I'm not sure if I got it right. Signed-off-by: Manfred Spraul Signed-off-by: Ingo Molnar commit 5df45515512436a808d3476a90e83f2efb022422 Author: Ingo Molnar Date: Sat Sep 6 23:55:40 2008 +0200 x86, tsc calibration: fix my brown paperbag day ...
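The rcuref.txt fix above depends on atomic_inc_not_zero(v) refusing to increment a reference count that has already dropped to zero, which is what keeps an RCU reader from taking a reference on an element that is already being torn down. The sketch below reproduces that semantic in userspace with C11 `<stdatomic.h>` and a compare-and-swap loop; inc_not_zero() is a hypothetical name for illustration, not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is zero; return true if the increment happened.
 * Intended to mirror the semantics of the kernel's atomic_inc_not_zero(). */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value of *v,
		 * so the zero check is repeated against fresh data. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}
```

A reader that gets false back must treat the object as already dead and skip it, rather than resurrecting it.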
Signed-off-by: Ingo Molnar commit afe73824f52d6767c77e9456f573a76075108279 Author: Jan Beulich Date: Fri Aug 29 13:15:28 2008 +0100 x86-64: eliminate dead code Signed-off-by: Jan Beulich Signed-off-by: Ingo Molnar commit 17b746278da8d6642bc487ec35efe4be2333f03f Author: Jan Beulich Date: Fri Aug 29 12:51:32 2008 +0100 x86: pgd_{c,d}tor() cleanup Giving pgd_ctor() a properly typed parameter allows eliminating a local variable. Adjust pgd_dtor() to match. Signed-off-by: Jan Beulich Acked-by: Jeremy Fitzhardinge Cc: "Jeremy Fitzhardinge" Signed-off-by: Ingo Molnar commit 3ba35573ad9a149a3af19625b502679283382f6b Author: Manfred Spraul Date: Sun Aug 31 19:58:49 2008 +0200 kernel/cpu.c: Move the CPU_DYING notifiers When a cpu is taken offline, the CPU_DYING notifiers are called on the dying cpu. According to , the cpu should be "not running any task, not handling interrupts, soon dead". For the current implementation, this is not true: - __cpu_disable can fail. If it fails, then the cpu will remain alive and happy. - At least on x86, __cpu_disable() briefly enables the local interrupts to handle any outstanding interrupts. What about moving CPU_DYING down a few lines, behind the __cpu_disable() line? There are only two CPU_DYING handlers in the kernel right now: one in kvm, one in the scheduler. Both should work with the patch applied [and: I'm not sure if either one handles a failing __cpu_disable()] The patch survives simple offlining a cpu. kvm untested due to lack of a test setup. Signed-off-By: Manfred Spraul Signed-off-by: Ingo Molnar commit 0722bba8f14eb5271c8b67e97def74da50eceb15 Author: Christoph Hellwig Date: Mon Sep 1 18:14:51 2008 +0200 x86: kill sys32_pause It's an unused duplicate of the generic sys_pause. 
Signed-off-by: Christoph Hellwig Signed-off-by: Ingo Molnar commit 38736f475071b80b66be28af7b44c854073699cc Author: Gautham R Shenoy Date: Sat Sep 6 14:50:23 2008 +0530 sched: fix __load_balance_iterator() for cfq with only one task The __load_balance_iterator() returns NULL when there's only one sched_entity which is a task. It is caused by the following code path:

        /* Skip over entities that are not tasks */
        do {
                se = list_entry(next, struct sched_entity, group_node);
                next = next->next;
        } while (next != &cfs_rq->tasks && !entity_is_task(se));

        if (next == &cfs_rq->tasks)
                return NULL;
                ^^^^^^^^^^^^

This will return NULL even when se is a task. As a side effect, there has been a regression in sched_mc behavior since 2.6.25: iter_move_one_task(), when it calls load_balance_start_fair(), would not get any tasks to move! Fix this by checking whether the last entity was a task. Signed-off-by: Gautham R Shenoy Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 7f79d852ed30a06eebf7497afe9334a726db3d40 Merge: aef745f... 70bb089... Author: Ingo Molnar Date: Sat Sep 6 16:51:57 2008 +0200 Merge branch 'linus' into sched/devel commit c8bfff6dd4d41834f4952cbc49e28e31906a6188 Author: Krzysztof Helt Date: Fri Sep 5 23:46:19 2008 +0200 sched: compilation fix with gcc 3.4.6 I found that 2.6.27-rc5-mm1 does not compile with gcc 3.4.6. The error is:

  CC      kernel/sched.o
kernel/sched.c: In function `start_rt_bandwidth':
kernel/sched.c:208: sorry, unimplemented: inlining failed in call to 'rt_bandwidth_enabled': function body not available
kernel/sched.c:214: sorry, unimplemented: called from here
make[1]: *** [kernel/sched.o] Error 1
make: *** [kernel] Error 2

It seems that gcc 3.4.6 requires the full inline definition before first use. The patch below fixes the compilation problem. Signed-off-by: Krzysztof Helt (if needed> Signed-off-by: Ingo Molnar commit 5b7e41ff37267c35b0fcf9162ca0c32c3d8d2c5c Author: H.
Peter Anvin Date: Wed Sep 3 17:24:03 2008 -0700 x86: additional defconfig updates Additional updates to the x86 defconfigs. The goals are, as before: - Make them usable to testers, more so than distributors or end users, both of which are likely to have their own config already. - Keep 32 and 64 bits as similar as is practical. Changes: - Use a more generic CPU type (ppro and generic, respectively). - Bump number of CPUs to 64 (few if any NR_CPUS arrays left). - Enable PAT. - Enable OPTIMIZE_INLINE. - Enable microcode update support. - Build SMT scheduler support (in addition to MC). Signed-off-by: H. Peter Anvin Signed-off-by: Ingo Molnar commit 616ad8c44281c0c6711a72b560e01ec335ff27e0 Merge: 9980996... b380b0d... Author: Ingo Molnar Date: Fri Sep 5 18:56:57 2008 +0200 Merge branch 'linus' into x86/defconfig commit 4ab4ba32aa16b012cb0faabf1a27952508fe67f2 Author: Petr Tesarik Date: Wed Sep 3 13:31:42 2008 +0200 x86, tracehook: clean up implementation of syscall_get_error() The x86-tracehook code now contains this line in syscall_get_error(): return error >= -4095L ? error : 0; Hard-wiring a constant is not nice. Let's use the IS_ERR_VALUE macro from linux/err.h instead. Signed-off-by: Petr Tesarik Cc: utrace-devel@redhat.com Acked-by: Roland McGrath Signed-off-by: Ingo Molnar commit 28c3cfd5fb998bd3683bebeebbba38baa2101cad Merge: 04197c8... b380b0d... Author: Ingo Molnar Date: Fri Sep 5 17:53:05 2008 +0200 Merge branch 'linus' into x86/tracehook commit efd327a2d41214dded03cbfbb6d447530964cddd Author: Alex Nixon Date: Wed Sep 3 14:36:40 2008 +0100 x86/paravirt: Remove duplicate paravirt_pagetable_setup_{start, done}() They were already called once in arch/x86/kernel/setup.c - we don't need to call them again. 
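The syscall_get_error() cleanup above swaps the hard-wired `-4095L` for IS_ERR_VALUE from linux/err.h, which treats the top MAX_ERRNO (4095) values of the unsigned long range as error codes. The sketch below is a userspace mirror of that check, assuming the 2.6.27-era definition; it adds an explicit cast on the argument and drops the kernel's unlikely() hint, and syscall_error_sketch() is a hypothetical stand-in, not the tracehook function itself.

```c
/* Userspace mirror of the linux/err.h check: a value is an error code if,
 * viewed as unsigned long, it lies in [-MAX_ERRNO, -1]. */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical stand-in for syscall_get_error(): return the raw value if it
 * encodes an errno (-1 .. -4095), otherwise 0 for "no error". */
static long syscall_error_sketch(long ret)
{
	return IS_ERR_VALUE(ret) ? ret : 0;
}
```

Because the comparison is done as unsigned, ordinary non-negative return values land far below the error window and map to 0, while -4096 falls just outside it.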
Signed-off-by: Alex Nixon Signed-off-by: Ingo Molnar commit 27eccf46491e1f77f9af9bbe0778122ce6882890 Author: Andrew Morton Date: Fri Sep 5 08:42:08 2008 -0500 dlm: choose better identifiers sparc32: fs/dlm/config.c:397: error: expected identifier or '(' before '{' token fs/dlm/config.c: In function 'drop_node': fs/dlm/config.c:589: warning: initialization from incompatible pointer type fs/dlm/config.c:589: warning: initialization from incompatible pointer type fs/dlm/config.c: In function 'release_node': fs/dlm/config.c:601: warning: initialization from incompatible pointer type fs/dlm/config.c:601: warning: initialization from incompatible pointer type fs/dlm/config.c: In function 'show_node': fs/dlm/config.c:717: warning: initialization from incompatible pointer type fs/dlm/config.c:717: warning: initialization from incompatible pointer type fs/dlm/config.c: In function 'store_node': fs/dlm/config.c:726: warning: initialization from incompatible pointer type fs/dlm/config.c:726: warning: initialization from incompatible pointer type Cc: Christine Caulfield Signed-off-by: Andrew Morton Signed-off-by: David Teigland commit bd1eb8818cc2c8ddab86be027ab43fb852942704 Author: Julien Brunel Date: Mon Sep 1 10:51:22 2008 +0200 GFS2: Use an IS_ERR test rather than a NULL test In case of error, the function gfs2_inode_lookup returns an ERR pointer, but never returns a NULL pointer. So a NULL test that necessarily comes after an IS_ERR test should be deleted, and a NULL test that may come after a call to this function should be strengthened by an IS_ERR test. The semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // @match_bad_null_test@ expression x, E; statement S1,S2; @@ x = gfs2_inode_lookup(...) ... 
when != x = E * if (x != NULL) S1 else S2 // Signed-off-by: Julien Brunel Signed-off-by: Julia Lawall Signed-off-by: Steven Whitehouse commit dff5257473ca1e05002809809f51f858e9a966fc Author: Steven Whitehouse Date: Tue Sep 2 13:33:17 2008 +0100 GFS2: Fix race relating to glock min-hold time In the case that a request for a glock arrives right after the grant reply has arrived, it sometimes means that the gl_tstamp field hasn't been updated recently enough. The net result is that the min-hold time for the glock is ignored. If this happens often enough, it leads to poor performance. This patch adds an additional test, so that if the reply pending bit is set on a glock, then it will select the maximum length of time for the min-hold time, rather than looking at gl_tstamp. Signed-off-by: Steven Whitehouse commit b56c8c221d192e4ffa719d00907c3b60fbaa2737 Author: James Morris Date: Fri Sep 5 21:43:38 2008 +1000 SELinux: add gitignore file for mdp script Add gitignore file for scripts/selinux/mdp/mdp. Signed-off-by: James Morris commit 4156e9a8ef5b521185f451213d33fa661f38512e Author: Ingo Molnar Date: Thu Sep 4 22:47:47 2008 +0200 x86: quick TSC calibration, improve - make sure the final TSC timestamp is reliable too Signed-off-by: Ingo Molnar commit 6ac40ed0413ef4096720f966e11c7cdf259eee3f Author: Linus Torvalds Date: Thu Sep 4 10:41:22 2008 -0700 x86: quick TSC calibration Introduce a fast TSC-calibration method on sane hardware. It only uses 17920 PIT timer ticks to calibrate the TSC, plus 256 ticks on each side to make sure the TSC values were very close to the tick, so the whole calibration takes 15ms. Yet, despite only taking 15ms, we can actually give pretty stringent guarantees of accuracy: - the code requires that we hit each 256-counter block at least 50 times, so the TSC error is basically at *MOST* just a few PIT cycles off in any direction.
In practice, it's going to be about one microsecond off (which is how long it takes to read the counter) - so over 17920 PIT cycles, we can pretty much guarantee that the calibration error is less than one half of a percent. My testing bears this out: on my machine, the quick-calibration reports 2934.085kHz, while the slow one reports 2933.415. Yes, the slower calibration is still more precise. For me, the slow calibration is stable to within about one hundredth of a percent, so it's (at a guess) roughly an order-and-a-half of magnitude more precise. The longer you wait, the more precise you can be. However, the nice thing about the fast TSC PIT synchronization is that it's pretty much _guaranteed_ to give that 0.5% precision, and fail gracefully (and very quickly) if it doesn't get it. And it really is fairly simple (even if there's a lot of _details_ there, and I didn't get all of those right on the first try or even the second ;) The patch says "110 insertions", but 63 of those new lines are actually comments. Signed-off-by: Linus Torvalds Signed-off-by: Ingo Molnar --- arch/x86/kernel/tsc.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 110 insertions(+), 1 deletions(-) commit f9f2ed486256f3480e4d499ffd6bf730bc5e6fc6 Author: David Teigland Date: Thu Sep 4 12:51:20 2008 -0500 dlm: remove bkl The BKL from the recent pushdown is not needed. Signed-off-by: David Teigland commit dc44e65943169de2d1a1b494876f48a65a9737f1 Author: Andi Kleen Date: Thu Sep 4 13:47:38 2008 +0200 x86: capitalize function call interrupts consistently Impact: aesthetic Capitalize function call interrupts consistently. All other descriptions in /proc/interrupts are capitalized except for "function call interrupts". Capitalize it too for consistency. While that's technically a published ABI, I think the risk of anyone relying on that text to stay the same is negligible. Signed-off-by: Andi Kleen Signed-off-by: H.
Peter Anvin commit a977c400957451f3bd92b9ed6022f5fe8a6cbbf5 Author: Thomas Gleixner Date: Thu Sep 4 15:18:59 2008 +0000 x86: TSC make the calibration loop smarter The last changes made the calibration loop 250ms long, which is far too much. Try to do it more cleverly. Experiments have shown that using a 10ms delay for the PIT-based calibration gives us a good enough value. If we have a reference (HPET/PMTIMER) and the results of the PIT and the reference are close enough, then we can break out of the calibration loop on a match right away and use the reference value. Otherwise we just loop 3 times and then decide which value to take. One caveat is that in virtualized environments the PIT calibration often does not work at all, and I found out that 10us is a bit too short as well for the reference to give a sane result. The solution here is to make the last loop longer when the first two PIT calibrations failed. Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar commit 827014be05e4515fa0dfc32e3100c4dab2070a98 Author: Thomas Gleixner Date: Thu Sep 4 15:18:53 2008 +0000 x86: TSC: use one set of reference variables Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar commit d683ef7afe8b6dbac6a3c681cef8a908357793ca Author: Thomas Gleixner Date: Thu Sep 4 15:18:48 2008 +0000 x86: TSC: separate hpet/pmtimer calculation out Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar commit cce3e057242d3d46fea07b9eb3910b0076419be5 Author: Thomas Gleixner Date: Thu Sep 4 15:18:44 2008 +0000 x86: TSC: define the PIT latch value separate Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar commit 44be6fdf1056b685eb79e53e42bd2d321b085cfc Author: David Teigland Date: Thu Aug 28 11:36:19 2008 -0500 dlm: fix address compare Compare only the addr and port fields of sockaddr structures. Fixes a problem with ipv6 where sin6_scope_id does not match.
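The dlm address-compare fix above compares only the address and port of two sockaddr structures, so IPv6 endpoints that differ only in sin6_scope_id still match. The sketch below shows one userspace way to do that with the POSIX socket types; same_endpoint() is a hypothetical helper, not the dlm function, and it also ignores other non-identity fields such as sin6_flowinfo and padding.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Compare two socket addresses by family, address and port only.
 * Fields such as sin6_scope_id, sin6_flowinfo and padding are ignored. */
static bool same_endpoint(const struct sockaddr_storage *x,
			  const struct sockaddr_storage *y)
{
	if (x->ss_family != y->ss_family)
		return false;

	if (x->ss_family == AF_INET) {
		const struct sockaddr_in *a = (const struct sockaddr_in *)x;
		const struct sockaddr_in *b = (const struct sockaddr_in *)y;

		return a->sin_addr.s_addr == b->sin_addr.s_addr &&
		       a->sin_port == b->sin_port;
	}

	if (x->ss_family == AF_INET6) {
		const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)x;
		const struct sockaddr_in6 *b = (const struct sockaddr_in6 *)y;

		return memcmp(&a->sin6_addr, &b->sin6_addr,
			      sizeof(a->sin6_addr)) == 0 &&
		       a->sin6_port == b->sin6_port;
	}

	return false; /* unknown family: treat as different endpoints */
}
```

A whole-structure memcmp() would instead report a mismatch whenever any of the ignored fields differ, which is the kind of false negative the commit describes for IPv6.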
Signed-off-by: David Teigland commit a0f000ec9b61b99111757df138b11144236fc59b Author: Herbert Xu Date: Thu Aug 14 22:21:31 2008 +1000 crypto: skcipher - Use RNG interface instead of get_random_bytes This patch makes the IV generators use the new RNG interface so that the user can pick an RNG other than the default get_random_bytes. Signed-off-by: Herbert Xu commit 17f0f4a47df9aea9ee26c939f8057c35e0be1847 Author: Neil Horman Date: Thu Aug 14 22:15:52 2008 +1000 crypto: rng - RNG interface and implementation This patch adds a random number generator interface as well as a cryptographic pseudo-random number generator based on AES. It is meant to be used in cases where a deterministic CPRNG is required. One of the first applications will be as an input in the IPsec IV generation process. Signed-off-by: Neil Horman Signed-off-by: Herbert Xu commit ccb778e1841ce04b4c10b39f0dd2558ab2c6dcd4 Author: Neil Horman Date: Tue Aug 5 14:13:08 2008 +0800 crypto: api - Add fips_enable flag Add the ability to turn FIPS-compliant mode on or off at boot. In order to be FIPS compliant, several checks may need to be performed that may be construed as unnecessary in a non-compliant mode. This patch allows us to set a kernel flag indicating that we are running in FIPS-compliant mode from boot-up. It also exports that mode information to user space via a sysctl (/proc/sys/crypto/fips_enabled). Tested successfully by me. Signed-off-by: Neil Horman Signed-off-by: Herbert Xu commit 5be5e667a9a5d8d5553e009e67bc692d95e5916a Author: Herbert Xu Date: Sun Aug 17 18:04:30 2008 +1000 crypto: skcipher - Move IV generators into their own modules This patch moves the default IV generators into their own modules in order to break a dependency loop between cryptomgr, rng, and blkcipher.
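The quick TSC calibration commit earlier in this log rests on simple arithmetic. The PIT runs at a fixed rate (PIT_TICK_RATE, 1193182 Hz on PC hardware), so 17920 ticks last about 15 ms, and the TSC frequency follows from how many TSC cycles elapse in that window. The sketch below works through that arithmetic under those assumptions; delta_tsc is a hypothetical measured value, and this is not the code in arch/x86/kernel/tsc.c.

```c
#include <stdint.h>

#define PIT_TICK_RATE 1193182ULL /* PIT input clock in Hz */

/* Wall-clock length of a window of pit_ticks PIT ticks, in microseconds
 * (truncated). */
static uint64_t pit_window_us(uint64_t pit_ticks)
{
	return pit_ticks * 1000000ULL / PIT_TICK_RATE;
}

/* TSC frequency in kHz from delta_tsc cycles counted over pit_ticks ticks:
 * f = delta_tsc / (pit_ticks / PIT_TICK_RATE) Hz, divided by 1000 for kHz. */
static uint64_t tsc_khz_from_window(uint64_t delta_tsc, uint64_t pit_ticks)
{
	return delta_tsc * PIT_TICK_RATE / (pit_ticks * 1000ULL);
}

/* Relative error, in parts per million (truncated), of being err_ticks PIT
 * ticks off at the window edges. */
static uint64_t window_error_ppm(uint64_t err_ticks, uint64_t pit_ticks)
{
	return err_ticks * 1000000ULL / pit_ticks;
}
```

For 17920 ticks, pit_window_us() gives 15018 us, in line with the "15ms" in the commit. Each PIT tick of edge error costs roughly 56 ppm, so being "a few PIT cycles off" stays far below the 0.5% (5000 ppm) bound the commit promises.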
Signed-off-by: Herbert Xu commit 1aa4ecd95d8d67d21731a00646326a71295dafa3 Author: Herbert Xu Date: Sun Aug 17 17:01:56 2008 +1000 crypto: cryptomgr - Test ciphers using ECB As it is, we only test ciphers when combined with a mode. That means users that do not invoke a mode of operation may get an untested cipher. This patch tests all ciphers using the ECB mode so that simple cipher users such as ansi-cprng are also protected. Signed-off-by: Herbert Xu commit 73d3864a4823abda19ebc4387b6ddcbf416e3a77 Author: Herbert Xu Date: Sun Aug 3 21:15:23 2008 +0800 crypto: api - Use test infrastructure This patch makes use of the new testing infrastructure by requiring algorithms to pass a run-time test before they're made available to users. Signed-off-by: Herbert Xu commit da7f033ddc9fdebb3223b0bf88a2a2ab5b797608 Author: Herbert Xu Date: Thu Jul 31 17:08:25 2008 +0800 crypto: cryptomgr - Add test infrastructure This patch moves the newly created alg_test infrastructure into cryptomgr. This shall allow us to use it for testing at algorithm registration. Signed-off-by: Herbert Xu commit 01b323245e4f6d4a22ffd73754f145f45c85988c Author: Herbert Xu Date: Thu Jul 31 15:41:55 2008 +0800 crypto: tcrypt - Add alg_test interface This patch creates a new interface for algorithm testing. A test can be requested for a particular implementation of an algorithm. This is achieved by taking both the name of the algorithm and that of the implementation. The all-inclusive test has also been rewritten to no longer require a duplicate listing of all algorithms with tests. In that process a number of missing tests have also been discovered and rectified. Signed-off-by: Herbert Xu commit bdecd22821a0fab1f5c9e4c9b7fba894593507d4 Author: Herbert Xu Date: Thu Jul 31 14:03:44 2008 +0800 crypto: tcrypt - Abort and only log if there is an error The info printed is a complete waste of space when there is no error since it doesn't tell us anything that we don't already know.
If there is an error, we can also be more verbose. If there is an error, this patch also aborts the test and returns the error to the caller. In future this will be used to test algorithms at registration time. Signed-off-by: Herbert Xu commit 8cb51ba8e06570a5fff674b3744d12a1b089f2d0 Author: Austin Zhang Date: Thu Aug 7 09:57:03 2008 +0800 crypto: crc32c - Use Intel CRC32 instruction From the NHM processor onward, Intel processors support a hardware-accelerated CRC32c algorithm via the new CRC32 instruction in the SSE 4.2 instruction set. The patch detects the availability of the feature and chooses the most appropriate way to calculate the CRC32c checksum. Byte code instructions are used for compiler compatibility. No MMX / XMM registers are involved in the implementation. Signed-off-by: Austin Zhang Signed-off-by: Kent Liu Signed-off-by: Herbert Xu commit f139cfa7cdccd0b315fad098889897b5fcd389b0 Author: Herbert Xu Date: Thu Jul 31 12:23:53 2008 +0800 crypto: tcrypt - Avoid using contiguous pages If tcrypt is to be used as a run-time integrity test, it needs to be more resilient in a hostile environment. For a start, allocating 32K of physically contiguous memory is definitely out. This patch teaches it to use separate pages instead. Signed-off-by: Herbert Xu commit a7581a01fbc69771a2b391de4220ba670c0aa261 Author: Herbert Xu Date: Mon Aug 4 14:22:29 2008 +0800 crypto: api - Display larval objects properly Rather than displaying larval objects as real objects, this patch makes them show up under /proc/crypto as being of type larval. Signed-off-by: Herbert Xu commit c51b6c8102a82239163c8c04e404c7cc2857b4be Author: Herbert Xu Date: Mon Aug 4 11:44:59 2008 +0800 crypto: api - Export crypto_alg_lookup instead of __crypto_alg_lookup Since the only user of __crypto_alg_lookup is doing exactly what crypto_alg_lookup does, we can now use the latter in lieu of the former.
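The crc32c commit above accelerates CRC32c (the Castagnoli polynomial, 0x1EDC6F41) with the SSE 4.2 CRC32 instruction, and any accelerated path has to produce the same values as a plain software loop. The sketch below is a bit-at-a-time reference in the common reflected form (polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF). crc32c_sw() is a hypothetical name, not the kernel crypto API, and its seeding convention may differ from the kernel's internal crc32c() helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC32c. Pass crc = 0 for a fresh checksum; passing a
 * previous result continues the checksum over further data. */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	crc = ~crc; /* undo the final XOR (or apply the initial 0xFFFFFFFF) */
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* Shift right; XOR in the polynomial if the low bit was set. */
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}
```

The standard check value applies here: the CRC32c of the ASCII string "123456789" is 0xE3069283, which makes a convenient cross-check against a hardware-accelerated implementation.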
Signed-off-by: Herbert Xu commit b6d44341864b50a308f932c39f03fb8ad5efb021 Author: Adrian Bunk Date: Wed Jul 16 19:28:00 2008 +0800 crypto: Kconfig - Replace leading spaces with tabs Instead of tabs there were two spaces. Signed-off-by: Adrian Bunk Signed-off-by: Herbert Xu commit c1dcf65ffc5796bf4ff75c13f448e63b3a416fd6 Author: David Teigland Date: Mon Aug 18 14:03:25 2008 -0500 dlm: fix locking of lockspace list in dlm_scand The dlm_scand thread needs to lock the list of lockspaces when going through it. Signed-off-by: David Teigland commit dc68c7ed362a00a48290252573a8eb9f74463c3a Author: David Teigland Date: Mon Aug 18 11:43:30 2008 -0500 dlm: detect available userspace daemon If dlm_controld (the userspace daemon that controls the setup and recovery of the dlm) fails, the kernel should shut down the lockspaces in the kernel rather than leaving them running. This is detected by having dlm_controld hold a misc device open while running, and if the kernel detects a close while the daemon is still needed, it stops the lockspaces in the kernel. Knowing that the userspace daemon isn't running also allows the lockspace create/remove routines to avoid waiting on the daemon for join/leave operations. Signed-off-by: David Teigland commit 0f8e0d9a317406612700426fad3efab0b7bbc467 Author: David Teigland Date: Wed Aug 6 13:30:24 2008 -0500 dlm: allow multiple lockspace creates Add a count for lockspace create and release so that create can be called multiple times to use the lockspace from different places. Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with the previous behavior of returning -EEXIST if the lockspace already exists. Signed-off-by: David Teigland commit 1befdefcf476d5eb2fb4243fdf4d996a376708b1 Author: Luiz Fernando N. 
Capitulino Date: Thu Aug 28 11:00:07 2008 -0300 x86: remove 8254 timer texts from Documentation Commit ecd29476ae0143b1c3641edfa76c0fc3e9ad3021 removed the "disable_8254_timer" and "enable_8254_timer" kernel parameters from the kernel but did not remove the references to them from two files in the Documentation directory: kernel-parameters.txt and x86/x86_64/boot-options.txt. This change completes the removal. Signed-off-by: Luiz Fernando N. Capitulino Acked-by: Maciej W. Rozycki Signed-off-by: Ingo Molnar commit d9250dea3f89fe808a525f08888016b495240ed4 Author: KaiGai Kohei Date: Thu Aug 28 16:35:57 2008 +0900 SELinux: add boundary support and thread context assignment The purpose of this patch is to assign a per-thread security context under a constraint. It enables a multi-threaded server application to launch a request handler with an appropriate security context, and helps some userspace object managers handle users' requests. When we assign a per-thread security context, it must not have wider permissions than the original one. Because a multi-threaded process shares a single memory space, an arbitrary per-thread security context would also mean that another thread could easily access information it should not. The constraint on a per-thread security context requires that the new domain be equal to or weaker than its original one when it tries to assign a per-thread security context. A bounds relationship between two types is a way to ensure that a domain can never have wider permissions than its bounds. We can define it in two ways, explicitly or implicitly. The first way uses the new TYPEBOUNDS statement, which defines a boundary of types explicitly. The other extends the concept of the existing name-based hierarchy: if we define a type with a "."-separated name like "httpd_t.php", the toolchain implicitly sets its bounds on "httpd_t". This feature requires a new policy version.
The 24th version (POLICYDB_VERSION_BOUNDARY) allows them to be shipped into kernel space, and the following patch enables the kernel to handle it. Signed-off-by: KaiGai Kohei Acked-by: Stephen Smalley Signed-off-by: James Morris commit 7940ca3605b77f20cc6e9852e4ca6f2d725b5653 Author: Ingo Molnar Date: Tue Aug 19 13:40:47 2008 +0200 sched: extract walk_tg_tree(), fix fix: kernel/sched.c: In function '__rt_schedulable': kernel/sched.c:8771: error: implicit declaration of function 'walk_tg_tree' kernel/sched.c:8771: error: 'tg_nop' undeclared (first use in this function) kernel/sched.c:8771: error: (Each undeclared identifier is reported only once kernel/sched.c:8771: error: for each function it appears in.) Signed-off-by: Ingo Molnar commit aef745fca016aea45adae5c98e8698904dd8ad51 Author: Ingo Molnar Date: Thu Aug 28 11:34:43 2008 +0200 sched: clean up __might_sleep() add KERN_ to the printout and clean up the flow a bit. Signed-off-by: Ingo Molnar commit 29cbef4869bf288256ab76c7dc674cb132b35de2 Author: Joe Korty Date: Wed Aug 27 11:21:39 2008 -0400 make might_sleep() display the oopsing process Expand might_sleep's printk to indicate the oopsing process. Signed-off-by: Joe Korty Signed-off-by: Ingo Molnar commit aec0a5142cb52aaa152d962d84a838e25d520742 Author: Bharata B Rao Date: Thu Aug 28 14:42:49 2008 +0530 sched: call resched_task() conditionally from new task wake up path - During wake up of a new task, task_new_fair() can do a resched_task() on the current task. Later in the code path, check_preempt_curr() also ends up doing the same, which can be avoided. Check whether TIF_NEED_RESCHED is already set for the current task. - task_new_fair() does a resched_task() on the current task unconditionally. This is only needed when the child runs before the parent. So this is a small speedup.
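A simplified model of the two changes in the new-task wakeup path might look like the following. It assumes a single boolean stands in for TIF_NEED_RESCHED and a counter records how often resched_task() would do real work; all `demo_*` names are hypothetical, and the real task_new_fair() and check_preempt_curr() decide preemption from scheduler state rather than from a flag passed in.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the current task: need_resched models
 * TIF_NEED_RESCHED, resched_calls counts flag-setting work done. */
struct demo_task {
	bool need_resched;
	int resched_calls;
};

static void demo_resched_task(struct demo_task *t)
{
	t->need_resched = true;
	t->resched_calls++;
}

/* Later in the wakeup path: skip the work if a reschedule is
 * already pending for the current task. */
static void demo_check_preempt_curr(struct demo_task *curr, bool wants_preempt)
{
	if (wants_preempt && !curr->need_resched)
		demo_resched_task(curr);
}

/* New-task wakeup: reschedule the parent only when the child should
 * run first, instead of unconditionally. */
void demo_wake_up_new_task(struct demo_task *curr, bool child_runs_first,
			   bool wants_preempt)
{
	if (child_runs_first)
		demo_resched_task(curr);	/* task_new_fair() side */
	demo_check_preempt_curr(curr, wants_preempt);
}
```

When the child runs first and preemption is also wanted, the flag is set once instead of twice; when the parent keeps running, no reschedule work is done at all.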
Signed-off-by: Bharata B Rao Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 2c7e9fd4c6cb7f4b0bc7162e9a30847e51a1ca1b Author: Joe Korty Date: Wed Aug 27 10:35:06 2008 -0400 x86: make poll_idle behave more like the other idle methods Make poll_idle() behave more like the other idle methods. Currently, poll_idle() returns immediately. The other idle methods all wait indefinitely for some condition to come true before returning. poll_idle should emulate these other methods and also wait for a return condition, in this case for need_resched() to become 'true'. Without this delay the idle loop spends all of its time in the outer loop that calls poll_idle. This outer loop, these days, does real work, some of it under rcu locks. That work should only be done when idle is entered and when idle exits, not continuously while idle is spinning. Signed-off-by: Joe Korty Signed-off-by: Ingo Molnar commit da31894ed7b654e2e1741e7ac4ef6c15be0dd14b Author: Eric Paris Date: Fri Aug 22 11:35:57 2008 -0400 securityfs: do not depend on CONFIG_SECURITY Add a new Kconfig option SECURITYFS which will build securityfs support but does not require CONFIG_SECURITY. The only current user of securityfs does not depend on CONFIG_SECURITY and there is no reason the full LSM needs to be built to build this fs. Signed-off-by: Eric Paris Signed-off-by: James Morris commit 86d688984deefa3ae5a802880c11f2b408b5d6cf Merge: 93c06cb... 4c246ed... Author: James Morris Date: Thu Aug 28 10:47:34 2008 +1000 Merge branch 'master' into next commit 0188d6c5807b65e2e20dcb75a668efbe5418b27e Author: Steven Whitehouse Date: Tue Aug 26 09:38:26 2008 +0100 GFS2: Fix & clean up GFS2 rename This patch fixes a locking issue in the rename code by ensuring that we hold the per-sb rename lock over both directory and "other" renames which involve different parent directories.
At the same time, this moves the function gfs2_ok_to_move (which is only called from one place) into the file that it's called from, so we can mark it static. This should make the code a bit easier to follow. Signed-off-by: Steven Whitehouse Cc: Peter Staubach commit 0cd418ddb1ee88df7d16d5df06cb2da68eceb9e4 Author: Hiroshi Shimamoto Date: Mon Aug 18 14:39:21 2008 -0700 rcuclassic: fix compiler warning CC kernel/rcuclassic.o kernel/rcuclassic.c: In function 'rcu_init_percpu_data': kernel/rcuclassic.c:705: warning: comparison of distinct pointer types lacks a cast kernel/rcuclassic.c:713: warning: comparison of distinct pointer types lacks a cast flags should be unsigned long. Signed-off-by: Hiroshi Shimamoto Signed-off-by: Ingo Molnar commit a81726087428b541aa64604b8a94104a4d4aa8f9 Author: Hiroshi Shimamoto Date: Tue Aug 26 15:13:45 2008 -0700 x86: acpi: move acpi_mcfg_64bit_base_addr into CONFIG_PCI_MMCONFIG acpi_mcfg_64bit_base_addr is used only when CONFIG_PCI_MMCONFIG is enabled. Signed-off-by: Hiroshi Shimamoto Signed-off-by: Ingo Molnar commit 93c06cbbf9fea5d5be1778febb7fa9ab1a74e5f5 Author: Serge E. Hallyn Date: Tue Aug 26 14:47:57 2008 -0500 selinux: add support for installing a dummy policy (v2) In August 2006 I posted a patch generating a minimal SELinux policy. This week, David P. Quigley posted an updated version of that as a patch against the kernel. It also had nice logic for auto-installing the policy. Following is David's original patch intro (preserved especially because it has stats on the generated policies): se interested in the changes there were only two significant changes. The first is that the iteration through the list of classes used NULL as a sentinel value. The problem with this is that the class_to_string array actually has NULL entries in its table as placeholders for the user space object classes. The second change was that it would seem at some point the initial sids table was NULL terminated.
This is no longer the case so that iteration has to be done on array length instead of looking for NULL. Some statistics on the policy that it generates: The policy consists of 523 lines which contain no blank lines. Of those 523 lines 453 of them are class, permission, and initial sid definitions. These lines are usually little to no concern to the policy developer since they will not be adding object classes or permissions. Of the remaining 70 lines there is one type, one role, and one user statement. The remaining lines are broken into three portions. The first group are TE allow rules which make up 29 of the remaining lines, the second is assignment of labels to the initial sids which consist of 27 lines, and file system labeling statements which are the remaining 11. In addition to the policy.conf generated there is a single file_contexts file containing two lines which labels the entire system with base_t. This policy generates a policy.23 binary that is 7920 bytes. (then a few versions later...): The new policy is 587 lines (stripped of blank lines) with 476 of those lines being the boilerplate that I mentioned last time. The remaining 111 lines have the 3 lines for type, user, and role, 70 lines for the allow rules (one for each object class including user space object classes), 27 lines to assign types to the initial sids, and 11 lines for file system labeling. The policy binary is 9194 bytes. Changelog: Aug 26: Added Documentation/SELinux.txt Aug 26: Incorporated a set of comments by Stephen Smalley: 1. auto-setup SELINUXTYPE=dummy 2. don't auto-install if selinux is enabled with non-dummy policy 3. don't re-compute policy version 4. /sbin/setfiles not /usr/sbin/setfiles Aug 22: As per JMorris comments, made sure make distclean cleans up the mdp directory. Removed a check for file_contexts which is now created in the same file as the check, making it superfluous. 
Signed-off-by: Serge Hallyn Signed-off-by: David Quigley Signed-off-by: James Morris commit 65eb3dc609dec17deea48dcd4de2e549d29a9824 Author: Kevin Diggs Date: Tue Aug 26 10:26:54 2008 +0200 sched: add kernel doc for the completion, fix kernel-doc-nano-HOWTO.txt This patch adds kernel doc for the completion feature. An error in the split-man.pl PERL snippet in kernel-doc-nano-HOWTO.txt is also fixed. Signed-off-by: Kevin Diggs Signed-off-by: Ingo Molnar commit 3cf430b0636045dc524759a0852293ba037732a7 Merge: 93dcf55... 83097ac... Author: Ingo Molnar Date: Tue Aug 26 10:25:59 2008 +0200 Merge branch 'linus' into sched/devel commit bdd314616f7218e325aa9637a46159ecba44cfeb Author: H. Peter Anvin Date: Mon Aug 25 17:44:03 2008 -0700 x86: msr-on-cpu: remove unnecessary level of abstraction Remove an unnecessary level of abstraction in the msr-on-cpu library. Although this duplicates some code, the duplicated code is less than the additional code, and this way should be faster. Additionally, change the order of the functions to make the regular structure of this file more obvious. Signed-off-by: H. Peter Anvin commit 94d4ac2f4a58c6e37876827c6688c61cef21290c Merge: ed21763... 08970fc... Author: H. Peter Anvin Date: Mon Aug 25 22:45:37 2008 -0700 Merge branch 'x86/urgent' into x86/cleanups commit c7ffa6c26277b403920e2255d10df849bd613380 Author: Avi Kivity Date: Mon Aug 25 13:11:27 2008 +0300 x86: default to reboot via ACPI Triple-fault and keyboard reset may assert INIT instead of RESET; however INIT is blocked when Intel VT is enabled. This leads to a partially reset machine when invoking emergency_restart via sysrq-b: the processor is still working but other parts of the system are dead. Default to rebooting via ACPI, which correctly asserts RESET and reboots the machine. This is safe since we will fall back to keyboard reset and triple fault if acpi is not enabled or if the reset is not successful. 
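The fallback order described above (ACPI first, then keyboard reset, then triple fault) can be modelled as a small state machine. This is a sketch under assumptions: the three booleans are hypothetical inputs standing in for "ACPI is enabled", "the ACPI reset worked" and "the keyboard-controller reset worked". A real reset never returns, so this function instead reports which method would have rebooted the machine; the `demo_*` names are invented.

```c
#include <stdbool.h>

enum demo_reboot { DEMO_BOOT_ACPI, DEMO_BOOT_KBD, DEMO_BOOT_TRIPLE };

/* Returns the method that would finally have reset the machine. */
enum demo_reboot demo_emergency_restart(bool acpi_enabled, bool acpi_ok,
					bool kbd_ok)
{
	enum demo_reboot method = DEMO_BOOT_ACPI;	/* the new default */

	for (;;) {
		switch (method) {
		case DEMO_BOOT_ACPI:
			if (acpi_enabled && acpi_ok)
				return DEMO_BOOT_ACPI;	/* RESET asserted */
			method = DEMO_BOOT_KBD;		/* fall back */
			break;
		case DEMO_BOOT_KBD:
			if (kbd_ok)
				return DEMO_BOOT_KBD;
			method = DEMO_BOOT_TRIPLE;
			break;
		case DEMO_BOOT_TRIPLE:
		default:
			return DEMO_BOOT_TRIPLE;	/* last resort */
		}
	}
}
```

Making ACPI the default only changes the first state; the keyboard and triple-fault paths remain reachable when ACPI is disabled or its reset has no effect.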
Signed-off-by: Avi Kivity Signed-off-by: Ingo Molnar commit ed21763e7b0b3fb50e4efd9d4bc17ef5b035d304 Author: Robert Richter Date: Fri Aug 22 20:23:38 2008 +0200 x86: cleanup in amd_cpu_notify() small coding style fix. Signed-off-by: Robert Richter Signed-off-by: Ingo Molnar commit ea1c9de45ecb162841c9b4e0fa303a245d59b1c8 Merge: 4e1d112... a2bd727... Author: Ingo Molnar Date: Mon Aug 25 11:10:42 2008 +0200 Merge branch 'x86/urgent' into x86/cleanups commit 93dcf55f828b035fc93fc19eb03c1390e1e6d570 Author: Oleg Nesterov Date: Wed Aug 20 16:54:44 2008 -0700 wait_task_inactive: "improve" the returned value for ->nvcsw == 0 wait_task_inactive() returns 1 when p->nvcsw == 0 || p->nvcsw == 1. This means that two subsequent calls can return the same number while the task was scheduled in between. Change the code to return "nvcsw | LONG_MIN" instead of "nvcsw ?: 1", now the overlap always needs LONG_MAX schedules. Signed-off-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar commit f31e11d87a5d7601636710195891ba462ad99f11 Author: Oleg Nesterov Date: Wed Aug 20 16:54:44 2008 -0700 wait_task_inactive(): don't consider task->nivcsw If wait_task_inactive() returns success the task was deactivated. In that case schedule() always increments ->nvcsw which alone can be used as a "generation counter". If the next call returns the same number, we can be sure that the task was unscheduled. Otherwise, because we know that .on_rq == 0 again, ->nvcsw should have been changed in between. Q: perhaps it is better to do "ncsw = (p->nvcsw << 1) | 1" ? This decreases the possibility of "was it unscheduled" false positive when ->nvcsw == 0. Signed-off-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar commit 94d3d8247de22c5b0624aa00616ceca459498e55 Author: Oleg Nesterov Date: Wed Aug 20 16:54:41 2008 -0700 sched: do_wait_for_common: use signal_pending_state() Change do_wait_for_common() to use signal_pending_state() instead of open coding. 
Signed-off-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar commit b05f78f5c713eda2c34e495d92495ee4f1c3b5e1 Author: Yinghai Lu Date: Fri Aug 22 01:32:50 2008 -0700 x86_64: printout msr -v2 commandline show_msr=1 for bsp, show_msr=32 for all 32 cpus. [ mingo@elte.hu: added documentation ] Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit f86399396ce7a4f4069828b7dceac5aa5113dfb5 Author: Eduardo Habkost Date: Wed Jul 30 18:32:27 2008 -0300 x86, paravirt_ops: use unsigned long instead of u32 for alloc_p*() pfn args This patch changes the pfn args from 'u32' to 'unsigned long' on alloc_p*() functions on paravirt_ops, and the corresponding implementations for Xen and VMI. The prototypes for CONFIG_PARAVIRT=n are already using unsigned long, so paravirt.h now matches the prototypes on asm-x86/pgalloc.h. It shouldn't result in any changes in the generated code on 32-bit, with or without CONFIG_PARAVIRT. In both cases, 'codiff -f' didn't show any change after applying this patch. On 64-bit, there are (expected) binary changes only when CONFIG_PARAVIRT is enabled, as the patch is really supposed to change the size of the pfn args. [ v2: KVM_GUEST: use the right parameter type on kvm_release_pt() ] Signed-off-by: Eduardo Habkost Acked-by: Jeremy Fitzhardinge Acked-by: Zachary Amsden Signed-off-by: Ingo Molnar commit 275a89bdd3868af3008852594d2e169eaf69441b Author: Paul E. McKenney Date: Thu Aug 21 06:14:55 2008 -0700 rcu: use irq-safe locks Some earlier tip/core/rcu patches caused RCU to incorrectly enable irqs too early in boot. This caused Yinghai's repeated-kexec testing to hit oopses, presumably due to device interrupts left over from the prior kernel instance (which would oops the newly booting kernel before it got a chance to reset said devices). This patch therefore converts all the local_irq_disable()s in rcuclassic.c to local_irq_save(). Besides, I never did like local_irq_disable() anyway. ;-) Signed-off-by: Paul E.
McKenney Signed-off-by: Yinghai Lu Signed-off-by: Ingo Molnar commit 59dfc3f8fbabb8681ab4f2fb2df795f9211f40f9 Author: venkatesh.pallipadi@intel.com Date: Wed Aug 20 16:45:54 2008 -0700 x86: PAT documentation updates with debug info Documentation update for PAT. Reflect the latest API details. Also add details about ways to get more info in order to debug PAT. Signed-off-by: Venkatesh Pallipadi Signed-off-by: Ingo Molnar commit 470fba7ebe60ad9185056b080b331abad24b4df9 Merge: 7225e75... 6a55617... Author: Ingo Molnar Date: Thu Aug 21 13:28:24 2008 +0200 Merge branch 'linus' into x86/doc commit 4e1d112cac08049e764fe3c4f6d3ec92529f9f68 Author: Huang Weiyi Date: Wed Aug 20 16:43:07 2008 -0700 arch/x86/kernel/apm_32.c: remove duplicated #include Removed duplicated include file in arch/x86/kernel/apm_32.c. Signed-off-by: Huang Weiyi Acked-by: Pavel Machek Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton commit e621bd18958ef5dbace3129ebe17a0a475e127d9 Author: Dave Young Date: Wed Aug 20 16:43:03 2008 -0700 i386: vmalloc size fix Booting a kernel with vmalloc=[any size<=16m] will oops on my pc (i386/1G memory). BUG_ON in arch/x86/mm/init_32.c triggered: BUG_ON((unsigned long)high_memory > VMALLOC_START); It's due to the vm area hole. In include/asm-x86/pgtable_32.h: #define VMALLOC_OFFSET (8 * 1024 * 1024) #define VMALLOC_START (((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) \ & ~(VMALLOC_OFFSET - 1)) There are several related points: 1. MAXMEM : (-__PAGE_OFFSET - __VMALLOC_RESERVE). The space after VMALLOC_END is included as well, I set it to (VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE) 2. VMALLOC_OFFSET is not considered in __VMALLOC_RESERVE; fixed by adding VMALLOC_OFFSET to it. 3. VMALLOC_START : (((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) & ~(VMALLOC_OFFSET - 1)) So the hole is not always 8M; it can be bigger than 8M. I set it to ((unsigned long)high_memory + VMALLOC_OFFSET) 4. VMALLOC_RESERVE is an unused macro, so remove it here.
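The size of the hole under the old and new VMALLOC_START definitions can be checked with plain arithmetic. The sketch below copies both formulas from the commit message into functions of high_memory (passed as a plain address). The `8UL` literal and the sample high_memory values are assumptions chosen for illustration, not values from the reporter's machine.

```c
/* VMALLOC_OFFSET as in include/asm-x86/pgtable_32.h (8 MiB). */
#define VMALLOC_OFFSET (8UL * 1024 * 1024)

/* Old: round (high_memory + 8M) up to the next 8M boundary, so the
 * hole after high_memory is anywhere from 8M to just under 16M. */
unsigned long vmalloc_start_old(unsigned long high_memory)
{
	return (high_memory + 2 * VMALLOC_OFFSET - 1) & ~(VMALLOC_OFFSET - 1);
}

/* New: a fixed 8M guard hole after high_memory. */
unsigned long vmalloc_start_new(unsigned long high_memory)
{
	return high_memory + VMALLOC_OFFSET;
}
```

With high_memory at 0x38000000 (8 MiB aligned), the old hole is exactly 8 MiB. With high_memory one page later, at 0x38001000, it grows to 16 MiB minus 4 KiB, which is how a small vmalloc= value could leave VMALLOC_START beyond the reserved window. The new formula always yields 8 MiB.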
Signed-off-by: Dave Young Cc: akpm@linux-foundation.org Cc: hidave.darkstar@gmail.com Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton commit 2d96ae6b0dc03a568398c1655fb8967f01f8e40a Author: Andrew Morton Date: Wed Aug 20 16:43:02 2008 -0700 arch/x86/pci/irq.c: attempt to clean up code layout Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar commit f58e2c33ffa31b8d4a71609a5e71e8d893574a07 Author: Claudio Scordino Date: Wed Aug 20 15:18:45 2008 +0200 sched: new documentation about CFS Rewrite of the CFS documentation - because the old one was sorely out-dated. Signed-off-by: Claudio Scordino Acked-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 27990eac52dae87b909ef4f7e796fb6ec758bb94 Author: Jeremy Fitzhardinge Date: Tue Aug 19 13:10:07 2008 -0700 x86: another user of PTE_FLAGS_MASK Signed-off-by: Jeremy Fitzhardinge Signed-off-by: Ingo Molnar commit b6edbb1e045a7116d5571544dae25c6c37c94a48 Author: Jeremy Fitzhardinge Date: Tue Aug 19 13:04:19 2008 -0700 x86_64: use save/loadsegment in ia32 compat Use savesegment and loadsegment consistently in ia32 compat code. Signed-off-by: Jeremy Fitzhardinge Signed-off-by: Ingo Molnar commit 3f23d815c5049c9d7022226cec2242e384dd0b43 Author: Randy Dunlap Date: Sun Aug 17 21:44:22 2008 -0700 security: add/fix security kernel-doc Add security/inode.c functions to the kernel-api docbook. Use '%' on constants in kernel-doc notation. Fix several typos/spellos in security function descriptions. Signed-off-by: Randy Dunlap Signed-off-by: James Morris commit c171f465b7281f2d3b03e9145ec763d6a8bab176 Author: Uros Bizjak Date: Wed Aug 20 10:44:47 2008 +0200 x86, cleanup: use X86_CR4_PGE in x86/power/hibernate_asm_32.S Signed-off-by: Uros Bizjak Signed-off-by: Ingo Molnar commit 7393423dd9b5790a3115873be355e9fc862bce8f Merge: 8df9676... 1fca254... 
Author: Ingo Molnar Date: Wed Aug 20 11:52:15 2008 +0200 Merge branch 'linus' into x86/cleanups commit 9a7e0b180da21885988d47558671cf580279f9d6 Author: Peter Zijlstra Date: Tue Aug 19 12:33:06 2008 +0200 sched: rt-bandwidth fixes The last patch allows sysctl_sched_rt_runtime to disable bandwidth accounting for the group scheduler - however it doesn't deal with sched_setscheduler(), which will keep tasks out of groups that have no assigned runtime. If we relax this, we get into the situation where RT tasks can get into a group when we disable bandwidth control, and then starve them by enabling it again. Rework the schedulability code to check for this condition and fail to turn on bandwidth control with -EBUSY when this situation is found. Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit eb755805f21bd5ded84026e167b7a90887ac42e5 Author: Peter Zijlstra Date: Tue Aug 19 12:33:05 2008 +0200 sched: extract walk_tg_tree() Extract walk_tg_tree() and make it a little more generic so we can use it in the schedulability test. Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 0b148fa04852859972abbf848177b92daeef138a Author: Peter Zijlstra Date: Tue Aug 19 12:33:04 2008 +0200 sched: rt-bandwidth group disable fixes Disable bandwidth control more extensively. It allows sysctl_sched_rt_runtime to disable full group bandwidth control. Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 6f0d5c390e4206dcb3804a5072a048fdb7d2b428 Author: Peter Zijlstra Date: Tue Aug 19 12:33:03 2008 +0200 sched: rt-bandwidth accounting fix Fix an accounting bug where we would continue accumulating runtime even though bandwidth control is disabled. This would lead to very long throttle periods once bandwidth control gets turned on again.
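The accounting fix and the -EBUSY schedulability check from this rt-bandwidth series can be sketched together. This is a flat, hypothetical model: the kernel walks the task-group hierarchy with walk_tg_tree() and works with per-period nanosecond budgets, whereas here each group is an array entry with a signed rt_runtime (0 meaning no assigned runtime), a count of RT tasks, and an accumulated rt_time. The `demo_*` names are invented.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_rt_group {
	long rt_runtime;	/* signed, as in the rt-bandwidth interface */
	int nr_rt_tasks;	/* RT tasks currently in the group */
	long rt_time;		/* runtime consumed in this period */
};

static bool demo_bandwidth_enabled;	/* starts disabled */

/* Accounting fix: don't accumulate runtime while control is off,
 * otherwise re-enabling it would cause a very long throttle. */
void demo_account_rt(struct demo_rt_group *g, long delta)
{
	if (!demo_bandwidth_enabled)
		return;
	g->rt_time += delta;
}

/* Refuse to (re)enable bandwidth control with -EBUSY if any group
 * holds RT tasks but has no runtime, since they would starve. */
int demo_enable_rt_bandwidth(const struct demo_rt_group *groups, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (groups[i].rt_runtime == 0 && groups[i].nr_rt_tasks > 0)
			return -EBUSY;
	demo_bandwidth_enabled = true;
	return 0;
}
```

Enabling fails while a zero-runtime group still holds an RT task. It succeeds once that task has left, and only then does runtime start to accumulate.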
Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit af4491e51632d01fbc2b856ffa9ebcd4b38db68c Author: Peter Zijlstra Date: Tue Aug 19 12:33:02 2008 +0200 sched: rt-bandwidth for user grouping interface rt_runtime is a signed value Signed-off-by: Peter Zijlstra Signed-off-by: Ingo Molnar commit 0c925d79234fe77589d8ff3861f9f8bb9e7fc3f6 Author: Hiroshi Shimamoto Date: Mon Aug 18 21:49:51 2008 -0700 rcuclassic: fix compilation NG fix: CC kernel/rcuclassic.o kernel/rcuclassic.c: In function '__rcu_process_callbacks': kernel/rcuclassic.c:561: error: 'flags' undeclared (first use in this function) kernel/rcuclassic.c:561: error: (Each undeclared identifier is reported only once kernel/rcuclassic.c:561: error: for each function it appears in.) Declare missing variable flags. Signed-off-by: Hiroshi Shimamoto Signed-off-by: Ingo Molnar commit eff9b713ee3540ddab862095aaf4b1511a6758bc Author: Paul E. McKenney Date: Mon Aug 18 17:51:08 2008 -0700 rcu: fix locking cleanup fallout Given that the rcp->lock is now acquired from call_rcu(), which can be invoked from irq-disable regions, all acquisitions need to disable irqs. The following patch fixes this. Although I don't have any reason to believe that this is the cause of Yinghai's oops, it does need to be fixed. Signed-off-by: Paul E. McKenney Cc: Yinghai Lu Signed-off-by: Ingo Molnar commit 20211e4d344729f4d4c93da37a590fc1c3a1fd9b Author: Paolo Ciarrocchi Date: Mon Aug 18 21:25:38 2008 +0200 x86: Coding style fixes to arch/x86/oprofile/op_model_p4.c A coding style patch to arch/x86/oprofile/op_model_p4.c that removes 87 errors and 4 warnings. 
Before: total: 89 errors, 13 warnings, 722 lines checked After: total: 2 errors, 9 warnings, 721 lines checked Compile tested, binary verified as follows: paolo@paolo-desktop:~/linux.trees.git$ size /tmp/op_model_p4.o.* text data bss dec hex filename 2691 968 32 3691 e6b /tmp/op_model_p4.o.after 2691 968 32 3691 e6b /tmp/op_model_p4.o.before paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/op_model_p4.o.* 8c1c9823bab33333e1f7f76574e62561 /tmp/op_model_p4.o.after 8c1c9823bab33333e1f7f76574e62561 /tmp/op_model_p4.o.before Signed-off-by: Paolo Ciarrocchi Cc: robert.richter@amd.com Signed-off-by: Ingo Molnar commit 8df9676d6402563da91427e8d9f2da8a4598aede Author: H. Peter Anvin Date: Mon Aug 18 18:13:33 2008 -0700 x86: consistency cleanups Rename _ASM_MOV_UL to _ASM_MOV for consistency with other _ASM_ instructions (_ASM_ADD, _ASM_SUB and so on.) Add _ASM_SP, _ASM_BP, _ASM_SI, and _ASM_DI for consistency with _ASM_[ABCD]X. Signed-off-by: H. Peter Anvin commit ded00a56e99555c3f4000ef3eebfd5fe0d574565 Author: Paul E. McKenney Date: Sun Aug 17 12:50:36 2008 -0700 rcu: remove redundant ACCESS_ONCE definition from rcupreempt.c Remove the redundant definition of ACCESS_ONCE() from rcupreempt.c in favor of the one in compiler.h. Also merge the comment header from rcupreempt.c's definition into that in compiler.h. Signed-off-by: Paul E. McKenney Signed-off-by: Ingo Molnar commit 7b22ff5344fda666e0938e5261ea7b9a3dfce497 Author: FUJITA Tomonori Date: Mon Aug 18 00:36:18 2008 +0900 x86 gart: allocate size-aligned address for alloc_coherent, v2 This patch changes GART IOMMU to return a size-aligned address for dma_alloc_coherent, as DMA-mapping.txt defines: The cpu return address and the DMA bus master address are both guaranteed to be aligned to the smallest PAGE_SIZE order which is greater than or equal to the requested size.
This invariant exists (for example) to guarantee that if you allocate a chunk which is smaller than or equal to 64 kilobytes, the extent of the buffer you receive will not cross a 64K boundary. Signed-off-by: FUJITA Tomonori Signed-off-by: Ingo Molnar commit cd95851785bcfe95fdf73689e8ecb5a1c5959231 Author: Paul E. McKenney Date: Sun Aug 17 07:37:15 2008 -0700 rcu: fix classic RCU locking cleanup lockdep problem On Fri, Aug 15, 2008 at 04:24:30PM +0200, Ingo Molnar wrote: > > Paul, > > one of your two recent RCU patches caused this lockdep splat in -tip > testing: > > -------------------> > Brought up 2 CPUs > Total of 2 processors activated (6850.87 BogoMIPS). > PM: Adding info for No Bus:platform > khelper used greatest stack depth: 3124 bytes left > > ================================= > [ INFO: inconsistent lock state ] > 2.6.27-rc3-tip #1 > --------------------------------- > inconsistent {softirq-on-W} -> {in-softirq-W} usage. > ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes: > (&rcu_ctrlblk.lock){-+..}, at: [] __rcu_process_callbacks+0x1ac/0x1f0 > {softirq-on-W} state was registered at: > [] __lock_acquire+0x3f4/0x5b0 > [] lock_acquire+0x89/0xc0 > [] _spin_lock+0x3b/0x70 > [] rcu_init_percpu_data+0x29/0x80 > [] rcu_cpu_notify+0xaf/0xd0 > [] notifier_call_chain+0x2d/0x60 > [] __raw_notifier_call_chain+0x1e/0x30 > [] _cpu_up+0x79/0x110 > [] cpu_up+0x4d/0x70 > [] kernel_init+0xb1/0x200 > [] kernel_thread_helper+0x7/0x10 > [] 0xffffffff > irq event stamp: 14 > hardirqs last enabled at (14): [] trace_hardirqs_on+0xb/0x10 > hardirqs last disabled at (13): [] trace_hardirqs_off+0xb/0x10 > softirqs last enabled at (0): [] copy_process+0x276/0x1190 > softirqs last disabled at (11): [] call_on_stack+0x1a/0x30 > > other info that might help us debug this: > no locks held by ksoftirqd/0/4. > > stack backtrace: > Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27-rc3-tip #1 > [] print_usage_bug+0x16c/0x1b0 > [] mark_lock+0xa75/0xb10 > [] ? 
sched_clock+0x15/0x30 > [] __lock_acquire+0x3ad/0x5b0 > [] lock_acquire+0x89/0xc0 > [] ? __rcu_process_callbacks+0x1ac/0x1f0 > [] _spin_lock+0x3b/0x70 > [] ? __rcu_process_callbacks+0x1ac/0x1f0 > [] __rcu_process_callbacks+0x1ac/0x1f0 > [] rcu_process_callbacks+0x26/0x50 > [] __do_softirq+0x95/0x120 > [] ? __do_softirq+0x0/0x120 > [] call_on_stack+0x1a/0x30 > [] ? ksoftirqd+0x96/0x110 > [] ? ksoftirqd+0x0/0x110 > [] ? kthread+0x47/0x80 > [] ? kthread+0x0/0x80 > [] ? kernel_thread_helper+0x7/0x10 > ======================= > calling init_cpufreq_transition_notifier_list+0x0/0x20 > initcall init_cpufreq_transition_notifier_list+0x0/0x20 returned 0 after 0 msecs > calling net_ns_init+0x0/0x190 > net_namespace: 676 bytes > initcall net_ns_init+0x0/0x190 returned 0 after 0 msecs > calling cpufreq_tsc+0x0/0x20 > initcall cpufreq_tsc+0x0/0x20 returned 0 after 0 msecs > calling reboot_init+0x0/0x20 > initcall reboot_init+0x0/0x20 returned 0 after 0 msecs > calling print_banner+0x0/0x10 > Booting paravirtualized kernel on bare hardware > > <----------------------- > > my guess is on: > > commit 1f7b94cd3d564901f9e04a8bc5832ae7bfd690a0 > Author: Paul E. McKenney > Date: Tue Aug 5 09:21:44 2008 -0700 > > rcu: classic RCU locking and memory-barrier cleanups > > Ingo Fixes a problem detected by lockdep in which rcu->lock was acquired both in irq context and in process context, but without disabling irqs when acquired from process context. Signed-off-by: Paul E. McKenney Signed-off-by: Ingo Molnar commit 5bbd4c3724008c93cf3efdfc38a3402e245ab506 Author: Mathieu Desnoyers Date: Fri Aug 15 12:56:59 2008 -0400 x86: spinlock use LOCK_PREFIX Since we are now using DS prefixes instead of NOP to remove LOCK prefixes, there are no longer any problems with instruction boundaries moving around.
* Linus Torvalds (torvalds@linux-foundation.org) wrote: > > > On Thu, 14 Aug 2008, Mathieu Desnoyers wrote: > > > > Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment > > override prefix should fix this issue. Since the default of the atomic > > instructions is to use the DS segment anyway, it should not affect the > > behavior. > > Ok, so I think this is an _excellent_ patch, but I'd like to also then use > LOCK_PREFIX in include/asm-x86/futex.h. > > See commit 9d55b9923a1b7ea8193b8875c57ec940dc2ff027. > > Linus Unless there is a rationale for this, I think these should be changed to LOCK_PREFIX too. grep "lock ;" include/asm-x86/spinlock.h "lock ; cmpxchgw %w1,%2\n\t" asm volatile("lock ; xaddl %0, %1\n" "lock ; cmpxchgl %1,%2\n\t" Applies to 2.6.27-rc2. Signed-off-by: Mathieu Desnoyers Acked-by: Linus Torvalds CC: Linus Torvalds CC: H. Peter Anvin CC: Jeremy Fitzhardinge CC: Roland McGrath CC: Ingo Molnar Cc: Steven Rostedt CC: Steven Rostedt CC: Thomas Gleixner CC: Peter Zijlstra CC: Andrew Morton CC: David Miller CC: Ulrich Drepper CC: Rusty Russell CC: Gregory Haskins CC: Arnaldo Carvalho de Melo CC: "Luis Claudio R. Goncalves" CC: Clark Williams CC: Christoph Lameter CC: Andi Kleen CC: Harvey Harrison Signed-off-by: H. Peter Anvin commit 1f49a2c2aeb22d5abc6d4ea574ff63d37ca55fbe Author: Mathieu Desnoyers Date: Fri Aug 15 12:45:09 2008 -0400 x86: revert replace LOCK_PREFIX in futex.h Since we now use DS prefixes instead of NOP to remove LOCK prefixes, there are no longer any issues with instruction boundaries moving around. Depends on : x86 alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug On Thu, 14 Aug 2008, Mathieu Desnoyers wrote: > > Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment > override prefix should fix this issue. Since the default of the atomic > instructions is to use the DS segment anyway, it should not affect the > behavior.
Ok, so I think this is an _excellent_ patch, but I'd like to also then use LOCK_PREFIX in include/asm-x86/futex.h. See commit 9d55b9923a1b7ea8193b8875c57ec940dc2ff027. Linus Applies to 2.6.27-rc2 (and -rc3 unless hell broke loose in futex.h between rc2 and rc3). Signed-off-by: Mathieu Desnoyers CC: Linus Torvalds CC: H. Peter Anvin CC: Jeremy Fitzhardinge CC: Roland McGrath CC: Ingo Molnar Cc: Steven Rostedt CC: Steven Rostedt CC: Thomas Gleixner CC: Peter Zijlstra CC: Andrew Morton CC: David Miller CC: Ulrich Drepper CC: Rusty Russell CC: Gregory Haskins CC: Arnaldo Carvalho de Melo CC: "Luis Claudio R. Goncalves" CC: Clark Williams CC: Christoph Lameter CC: Andi Kleen CC: Harvey Harrison Signed-off-by: H. Peter Anvin commit f88f07e0f0fd6376e081b10930d272a08fbf082f Author: Mathieu Desnoyers Date: Thu Aug 14 16:58:15 2008 -0400 x86: alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug If a kernel thread is preempted in single-cpu mode right after the NOP (nop about to be turned into a lock prefix), then we CPU hotplug a CPU, and then the thread is scheduled back again, an SMP-unsafe atomic operation will be used on shared SMP variables, leading to corruption. No corruption would happen in the reverse case: going from SMP to UP is ok because we split a bit instruction into tiny pieces, which does not present this condition. Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment override prefix should fix this issue. Since the default of the atomic instructions is to use the DS segment anyway, it should not affect the behavior. The exceptions to this are references that use ESP/RSP and EBP/RBP as the base register (they will use the SS segment); however, in Linux (a) DS == SS at all times, and (b) we do not distinguish between segment violations reported as #SS as opposed to #GP, so there is no need to disassemble the instruction to figure out the suitable segment.
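The patching step described above can be sketched as a loop over recorded prefix sites. This is a simplified illustration, not the kernel's alternatives code (which records the sites at build time and writes them with text_poke()). Here the text is an ordinary byte array, and `demo_patch_lock_sites` with its `sites` offsets is hypothetical. The key point it encodes is that both 0xF0 (LOCK) and 0x3E (DS override) are prefixes of the following instruction. A 0x90 NOP, by contrast, is a separate instruction, so a task could be preempted between the NOP and the atomic operation.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOCK_PREFIX 0xF0	/* x86 LOCK prefix */
#define DEMO_DS_PREFIX   0x3E	/* DS segment-override prefix */

/* Switch every recorded lock-prefix site between LOCK (SMP) and a DS
 * override (UP). Either byte is part of the same instruction as the
 * atomic op that follows it, so no task can be preempted "between"
 * the prefix and the operation, and instruction boundaries never move. */
void demo_patch_lock_sites(uint8_t *text, const size_t *sites, size_t n,
			   int smp)
{
	for (size_t i = 0; i < n; i++)
		text[sites[i]] = smp ? DEMO_LOCK_PREFIX : DEMO_DS_PREFIX;
}
```

For example, the four bytes F0 0F C1 02 encode `lock xaddl %eax,(%edx)`; patching offset 0 for UP gives 3E 0F C1 02, which executes as a plain xadd through DS (and, per the AMD quote in the commit, as a null prefix in 64-bit mode).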
This patch assumes that the 0x3E prefix will leave atomic operations as-is (thus assuming they normally touch data in the DS segment). Since there seems to be no obvious misuse of other segment override prefixes for atomic operations, it should be safe. It can be verified with a quick grep -r LOCK_PREFIX include/asm-x86/ grep -A 1 -r LOCK_PREFIX arch/x86/ Taken from: This source : AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions States "Instructions that Reference a Non-Stack Segment—If an instruction encoding references any base register other than rBP or rSP, or if an instruction contains an immediate offset, the default segment is the data segment (DS). These instructions can use the segment-override prefix to select one of the non-default segments, as shown in Table 1-5." Therefore, forcing the DS segment on the atomic operations, which already use the DS segment, should not change their behavior. This source : http://wiki.osdev.org/X86_Instruction_Encoding States "In 64-bit the CS, SS, DS and ES segment overrides are ignored." Confirmed by "AMD 64-Bit Technology" A.7 http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_overview.pdf "In 64-bit mode, the DS, ES, SS and CS segment-override prefixes have no effect. These four prefixes are no longer treated as segment-override prefixes in the context of multiple-prefix rules. Instead, they are treated as null prefixes." This patch applies to 2.6.27-rc2, but would also have to be applied to earlier kernels (2.6.26, 2.6.25, ...). Performance impact of the fix: tests done on "xaddq" and "xaddl" show that it actually improves performance on Intel Xeon, AMD64 and Pentium M. It does not change the performance on Pentium II, Pentium 3 and Pentium 4.
    Xeon E5405 2.0GHz :
    NR_TESTS                                   10000000
    test empty                         cycles : 162207948
    test test 1-byte nop xadd          cycles : 170755422
    test test DS override prefix xadd  cycles : 170000118
    * test test LOCK xadd              cycles : 472012134

    AMD64 2.0GHz :
    NR_TESTS                                   10000000
    test empty                         cycles : 146674549
    test test 1-byte nop xadd          cycles : 150273860
    test test DS override prefix xadd  cycles : 149982382
    * test test LOCK xadd              cycles : 270000690

    Pentium 4 3.0GHz
    NR_TESTS                                   10000000
    test empty                         cycles : 290001195
    test test 1-byte nop xadd          cycles : 310000560
    test test DS override prefix xadd  cycles : 310000575
    * test test LOCK xadd              cycles : 1050103740

    Pentium M 2.0GHz
    NR_TESTS                                   10000000
    test empty                         cycles : 180000523
    test test 1-byte nop xadd          cycles : 320000345
    test test DS override prefix xadd  cycles : 310000374
    * test test LOCK xadd              cycles : 480000357

    Pentium 3 550MHz
    NR_TESTS                                   10000000
    test empty                         cycles : 510000231
    test test 1-byte nop xadd          cycles : 620000128
    test test DS override prefix xadd  cycles : 620000110
    * test test LOCK xadd              cycles : 800000088

    Pentium II 350MHz
    NR_TESTS                                   10000000
    test empty                         cycles : 200833494
    test test 1-byte nop xadd          cycles : 340000130
    test test DS override prefix xadd  cycles : 340000126
    * test test LOCK xadd              cycles : 530000078

    Speed test modules can be found at
    http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed-32.c
    http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed.c

    Macro-benchmarks

    2.0GHz E5405 Core 2 dual Quad-Core Xeon

    Summary

    * replace smp lock prefixes with DS segment selector prefixes

                       no lock prefix (s)  with lock prefix (s)  Speedup
    make -j1 kernel/   33.94 +/- 0.07      34.91 +/- 0.27         2.8 %
    hackbench 50        2.99 +/- 0.01       3.74 +/- 0.01        25.1 %

    * replace smp lock prefixes with 0x90 nops

                       no lock prefix (s)  with lock prefix (s)  Speedup
    make -j1 kernel/   34.16 +/- 0.32      34.91 +/- 0.27         2.2 %
    hackbench 50        3.00 +/- 0.01       3.74 +/- 0.01        24.7 %

    Detail :

    1 CPU, replace smp lock prefixes with DS segment selector prefixes

    make -j1 kernel/

    real 0m34.067s  user 0m30.630s  sys 0m2.980s
    real 0m33.867s  user 0m30.582s  sys 0m3.024s
    real 0m33.939s  user 0m30.738s  sys 0m2.876s
    real 0m33.913s  user 0m30.806s  sys 0m2.808s

    avg : 33.94s
    std. dev. : 0.07s

    hackbench 50

    Time: 2.978
    Time: 2.982
    Time: 3.010
    Time: 2.984
    Time: 2.982

    avg : 2.99
    std. dev. : 0.01

    1 CPU, noreplace-smp

    make -j1 kernel/

    real 0m35.326s  user 0m30.630s  sys 0m3.260s
    real 0m34.325s  user 0m30.802s  sys 0m3.084s
    real 0m35.568s  user 0m30.722s  sys 0m3.168s
    real 0m34.435s  user 0m30.886s  sys 0m2.996s

    avg : 34.91s
    std. dev. : 0.27s

    hackbench 50

    Time: 3.733
    Time: 3.750
    Time: 3.761
    Time: 3.737
    Time: 3.741

    avg : 3.74
    std. dev. : 0.01

    1 CPU, replace smp lock prefixes with 0x90 nops

    make -j1 kernel/

    real 0m34.139s  user 0m30.782s  sys 0m2.820s
    real 0m34.010s  user 0m30.630s  sys 0m2.976s
    real 0m34.777s  user 0m30.658s  sys 0m2.916s
    real 0m33.924s  user 0m30.634s  sys 0m2.924s
    real 0m33.962s  user 0m30.774s  sys 0m2.800s
    real 0m34.141s  user 0m30.770s  sys 0m2.828s

    avg : 34.16
    std. dev. : 0.32

    hackbench 50

    Time: 2.999
    Time: 2.994
    Time: 3.004
    Time: 2.991
    Time: 2.988

    avg : 3.00
    std. dev. : 0.01

    I did more runs (20 of each) to compare the nop case with the DS
    prefix case. Results are in seconds. They do not actually seem to
    show a significant difference.

    NOP
    34.155 33.955 34.012 35.299 35.679 34.141 33.995 35.016 34.254 33.957
    33.957 34.008 35.013 34.494 33.893 34.295 34.314 34.854 33.991 34.132

    DS
    34.080 34.304 34.374 35.095 34.291 34.135 33.940 34.208 35.276 34.288
    33.861 33.898 34.610 34.709 33.851 34.256 35.161 34.283 33.865 35.078

    I used http://www.graphpad.com/quickcalcs/ttest1.cfm?Format=C to do the
    t-test (yeah, I'm lazy):

    Group   Group One (DS prefix)  Group Two (nops)
    Mean    34.37815               34.37070
    SD       0.46108                0.51905
    SEM      0.10310                0.11606
    N       20                     20

    P value and statistical significance:
      The two-tailed P value equals 0.9620
      By conventional criteria, this difference is considered to be not
      statistically significant.
    Confidence interval:
      The mean of Group One minus Group Two equals 0.00745
      95% confidence interval of this difference: From -0.30682 to 0.32172

    Intermediate values used in calculations:
      t = 0.0480
      df = 38
      standard error of difference = 0.155

    So, unless these calculations are completely bogus, the difference
    between the nop and the DS case does not seem to be statistically
    significant.

    Signed-off-by: Mathieu Desnoyers
    Acked-by: H. Peter Anvin
    CC: Linus Torvalds
    CC: Jeremy Fitzhardinge
    CC: Roland McGrath
    CC: Ingo Molnar
    Cc: Steven Rostedt
    CC: Steven Rostedt
    CC: Thomas Gleixner
    CC: Peter Zijlstra
    CC: Andrew Morton
    CC: David Miller
    CC: Ulrich Drepper
    CC: Rusty Russell
    CC: Gregory Haskins
    CC: Arnaldo Carvalho de Melo
    CC: "Luis Claudio R. Goncalves"
    CC: Clark Williams
    CC: Christoph Lameter
    CC: Andi Kleen
    CC: Harvey Harrison
    Signed-off-by: H. Peter Anvin

commit f3efbe582b5396d134024c03a5fa253f2a85d9a6
Merge: 05d3ed0... b635ace...
Author: Ingo Molnar
Date:   Fri Aug 15 18:15:17 2008 +0200

    Merge branch 'linus' into x86/gart

commit 48e2bd56b1d1ae4b95fb21be778927b64d5c4235
Author: Gustavo F. Padovan
Date:   Sat Aug 2 12:50:37 2008 -0300

    x86: coding style fixes to arch/x86/kernel/traps_64.c

    Fix coding style of traps_64.c with improvements suggested by Ingo.

    Signed-off-by: Gustavo F. Padovan
    Signed-off-by: Ingo Molnar

commit 5802294f1b1895ee19a3d0ae72805da453afb9de
Author: Steven Rostedt
Date:   Wed Jul 30 14:20:55 2008 -0400

    rcu: trace fix possible mem-leak

    In the initialization of the RCU trace module, if
    rcupreempt_debugfs_init() fails, we never free the trace buffer.

    This patch frees the trace buffer in case the debugfs initialization
    fails.

    Signed-off-by: Steven Rostedt
    Reviewed-by: "Paul E. McKenney"
    Signed-off-by: Ingo Molnar

commit dd0078f4f04d939950a792c493d7d97d7ce663b8
Author: Steven Rostedt
Date:   Wed Jul 30 14:20:54 2008 -0400

    rcu: just rename call_rcu_bh instead of making it a macro

    It seems that I found a box that has a config that passes call_rcu_bh
    as a function pointer (see net/sctp/sm_make_chunk.c), so declaring
    call_rcu_bh as a function-like macro isn't good enough.

    For rcupreempt, this patch makes it just another name for call_rcu.

    Signed-off-by: Steven Rostedt
    Reviewed-by: "Paul E. McKenney"
    Signed-off-by: Ingo Molnar

commit c9c3dddd8f9a05b25d4ce53e8e80cc0ea1759d18
Author: Alexey Dobriyan
Date:   Fri Aug 1 03:51:38 2008 +0400

    x86_64: remove empty lines from stack traces/oopses

    Signed-off-by: Alexey Dobriyan
    Cc: ak@suse.de
    Cc: akpm@osdl.org
    Signed-off-by: Ingo Molnar

commit 3fb669dd6ec11e14819c0114a0e68a9ddcec65e1
Author: Richard Kennedy
Date:   Fri Aug 1 13:36:28 2008 +0100

    reorder struct prop_local_single to remove padding on 64 bit builds

    Reorder the structure to remove 8 bytes of padding on 64-bit builds
    (this also removes 8 bytes from task_struct).

    Signed-off-by: Richard Kennedy
    Cc: peterz@infradead.org
    Signed-off-by: Ingo Molnar

commit bee367ed066e26c14263d808136fba8eec3bd70a
Author: Richard Kennedy
Date:   Fri Aug 1 13:24:08 2008 +0100

    sched: reorder struct sched_rt_entity to remove padding on 64 bit builds

    Remove 8 bytes of padding on 64-bit builds (this also removes 8 bytes
    from task_struct).

    Signed-off-by: Richard Kennedy
    Signed-off-by: Ingo Molnar

commit 07dd20e0324f4d3e33bde1944d4f7771a09c498c
Author: Richard Kennedy
Date:   Fri Aug 1 13:18:04 2008 +0100

    sched: reorder signal_struct to remove 8 bytes on 64 bit builds

    Reorder the structure to remove 8 bytes of padding on 64-bit builds.

    Signed-off-by: Richard Kennedy
    Signed-off-by: Ingo Molnar

commit 04197c83b3e05546d1003cfa3ff43f1639c0057f
Merge: 71998e8... b635ace...
Author: Ingo Molnar
Date:   Fri Aug 15 17:07:34 2008 +0200

    Merge branch 'linus' into x86/tracehook

    Conflicts:

            arch/x86/Kconfig

    Signed-off-by: Ingo Molnar

commit 34d7c2b38d124219b7034356716e3455c439acd3
Author: Paul E. McKenney
Date:   Fri Aug 1 14:11:05 2008 -0700

    rcu: remove list_for_each_rcu()

    All of the in-tree uses of list_for_each_rcu() have been converted to
    list_for_each_entry_rcu(), so list_for_each_rcu() can now be removed.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

commit ff9cf2ce7afe76435d66c898cc9dacaa68e79d41
Author: Paul E. McKenney
Date:   Fri Aug 1 14:10:02 2008 -0700

    rcu: fixes to include/linux/rcupreempt.h

    Hello!

    I compared tip/core/rcu to my latest patchset and found the following
    issues:

    o       The memory barrier in rcu_exit_nohz() somehow got out of place
            (it is correct in mainline as of 2.6.26-rc7).

    o       There is a duplicate declaration of rcu_dyntick_sched.

    The attached patch fixes these.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

commit d9336a9b47d57db835453968efbd0d5cedfe0260
Author: Paolo Ciarrocchi
Date:   Sat Aug 2 21:25:43 2008 +0200

    x86: coding style fixes to arch/x86/kernel/paravirt_patch_32.c

    Before:
    total: 3 errors, 1 warnings, 49 lines checked

    After:
    total: 2 errors, 1 warnings, 49 lines checked

    paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/paravirt_patch_32.o.*
    a78eea4264723e18c49dcfbe0ee0aae7  /tmp/paravirt_patch_32.o.after
    a78eea4264723e18c49dcfbe0ee0aae7  /tmp/paravirt_patch_32.o.before

    Signed-off-by: Paolo Ciarrocchi
    Signed-off-by: Ingo Molnar

commit 3492cdf0176bde5e35223a1388d59676bc67c145
Author: Paolo Ciarrocchi
Date:   Sat Aug 2 21:25:13 2008 +0200

    x86: coding style fixes to arch/x86/lib/string_32.c

    Before:
    total: 21 errors, 0 warnings, 237 lines checked

    After:
    total: 0 errors, 0 warnings, 237 lines checked

    paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/string_32.o.*
    c55d059ef1612b32a8bb2771a72ae0d5  /tmp/string_32.o.after
    c55d059ef1612b32a8bb2771a72ae0d5  /tmp/string_32.o.before

    Signed-off-by: Paolo Ciarrocchi
    Signed-off-by: Ingo Molnar

commit 209b580fd8c3a42b69550c98de434671d41a4ebb
Author: Paolo Ciarrocchi
Date:   Sat Aug 2 21:24:45 2008 +0200

    x86: coding style fixes to arch/x86/lib/strstr_32.c

    Before:
    total: 3 errors, 0 warnings, 31 lines checked

    After:
    total: 0 errors, 0 warnings, 31 lines checked

    paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/strstr_32.o.*
    c96006ec3387862e5bacb139207a3098  /tmp/strstr_32.o.after
    c96006ec3387862e5bacb139207a3098  /tmp/strstr_32.o.before

    Signed-off-by: Paolo Ciarrocchi
    Signed-off-by: Ingo Molnar

commit 2070dae10f50ec244f58292436ace9a3f9dc1d71
Author: Paolo Ciarrocchi
Date:   Sat Aug 2 21:24:06 2008 +0200

    x86: coding style fixes to arch/x86/kernel/bios_uv.c

    paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/bios_uv.o.*
    9afe794594831166704744184e192ed8  /tmp/bios_uv.o.after
    9afe794594831166704744184e192ed8  /tmp/bios_uv.o.before

    Signed-off-by: Paolo Ciarrocchi
    Signed-off-by: Ingo Molnar

commit 020878ac427aa053414602cef975c2b5a2e33bf8
Author: Paolo Ciarrocchi
Date:   Sat Aug 2 21:23:36 2008 +0200

    x86: coding style fixes to arch/x86/boot/compressed/misc.c

    Before:
    total: 4 errors, 6 warnings, 439 lines checked

    After:
    total: 1 errors, 5 warnings, 441 lines checked

    Before
    -#include
    +#include

    paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/misc.o.*
    8b2394e1fe519a9542e9a7e3e7b69c39  /tmp/misc.o.after
    8b2394e1fe519a9542e9a7e3e7b69c39  /tmp/misc.o.before

    After
    -#include
    +#include

    paolo@paolo-desktop:~/linux.trees.git$ md5sum /tmp/misc.o.*
    59a2d264284be5e72b5af4f3a8ccfb47  /tmp/misc.o.after
    8b2394e1fe519a9542e9a7e3e7b69c39  /tmp/misc.o.before

    Signed-off-by: Paolo Ciarrocchi
    Signed-off-by: Ingo Molnar

commit 2bd455dbfebfd632a8dcf1d3d1612737986fde0a
Author: Li Zefan
Date:   Mon Aug 4 11:26:38 2008 +0800

    x86: remove nesting CONFIG_HOTPLUG_CPU

    prefill_possible_map() is defined inside CONFIG_HOTPLUG_CPU, so the
    nested CONFIG_HOTPLUG_CPU is just redundant.
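Here is a hypothetical illustration of why the nested guard is redundant. Code between an #ifdef and its #endif is compiled only when the symbol is defined, so testing the same symbol again inside that region can never exclude anything. The function and variable names below are invented for this sketch and are not the actual smpboot code. CONFIG_HOTPLUG_CPU is defined by hand only because no Kconfig is involved here.

```c
#include <assert.h>

/*
 * Defined by hand for this sketch; in a kernel build the symbol comes
 * from the generated configuration instead.
 */
#define CONFIG_HOTPLUG_CPU 1

static int possible_cpus;

#ifdef CONFIG_HOTPLUG_CPU
/* Hypothetical number of extra CPUs that may be hot-added later. */
static int additional_cpus = 2;

/*
 * Hypothetical stand-in for prefill_possible_map(). The whole function
 * already lives inside "#ifdef CONFIG_HOTPLUG_CPU", so wrapping part of
 * its body in a second "#ifdef CONFIG_HOTPLUG_CPU" could never exclude
 * anything; the nested guard can simply be dropped.
 */
static void example_prefill_possible_map(int present_cpus)
{
	assert(present_cpus >= 0);
	possible_cpus = present_cpus + additional_cpus;
}
#endif /* CONFIG_HOTPLUG_CPU */
```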
    Signed-off-by: Li Zefan
    Signed-off-by: Ingo Molnar

commit 516cbf3730c49739629d66313b20bdc50c98aa2c
Author: Tim Bird
Date:   Tue Aug 12 12:52:36 2008 -0700

    x86, bootup: add built-in kernel command line for x86 (v2)

    Allow x86 to support a built-in kernel command line. The built-in
    command line can override the one provided by the boot loader, for
    those cases where the boot loader is broken or it is difficult to
    change the command line in the boot loader.

    H. Peter Anvin wrote:
    > Ingo Molnar wrote:
    >> Best would be to make it really apparent in the code that nothing
    >> changes if this config option is not set. Preferably there should be
    >> no extra code at all in that case.
    >>
    >
    > I would like to see this:
    [...Nested ifdefs...]

    OK. This version changes absolutely nothing if CONFIG_CMDLINE_BOOL is
    not set (the default). Also, no space is appended when
    CONFIG_CMDLINE_BOOL is set but the builtin string is empty. This is
    less sloppy all the way around, IMHO.

    Note that I use the same option names as on other arches for this
    feature.

    [ mingo@elte.hu: build fix ]

    Signed-off-by: Tim Bird
    Cc: Matt Mackall
    Signed-off-by: Ingo Molnar

commit 1f7b94cd3d564901f9e04a8bc5832ae7bfd690a0
Author: Paul E. McKenney
Date:   Tue Aug 5 09:21:44 2008 -0700

    rcu: classic RCU locking and memory-barrier cleanups

    This patch simplifies the locking and memory-barrier usage in the
    Classic RCU grace-period-detection mechanism, incorporating Lai
    Jiangshan's feedback from the earlier version
    (http://lkml.org/lkml/2008/8/1/400 and http://lkml.org/lkml/2008/8/3/43).
    It passed 10 hours of rcutorture running concurrently with CPUs being
    put online and taken offline on a 128-hardware-thread Power machine.
    My apologies to whoever in the Eastern Hemisphere was planning to use
    this machine over the Western Hemisphere night, but it was sitting
    idle and...

    So this is ready for tip/core/rcu.
    This patch is in preparation for moving to a hierarchical algorithm
    to support very large SMP machines, as requested by some people at
    OLS; there also seem to have been a few recent patches in the
    4096-CPU direction. The general idea is to move to a much more
    conservative concurrency design, then apply a hierarchy to reduce
    contention on the global lock by a few orders of magnitude (larger
    machines would see greater reductions). The reason for taking a
    conservative approach is that this code isn't on any fast path.

    Prototype in progress.

    This patch is against the linux-tip git tree (tip/core/rcu). If you
    wish to test this against 2.6.26, use the following set of patches:

    http://www.rdrop.com/users/paulmck/patches/2.6.26-ljsimp-1.patch
    http://www.rdrop.com/users/paulmck/patches/2.6.26-ljsimpfix-3.patch

    The first patch combines commits 5127bed588a2f8f3a1f732de2a8a190b7df5dce3
    and 3cac97cbb14aed00d83eb33d4613b0fe3aaea863 from Lai Jiangshan, and
    the second patch contains my changes.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

commit 293a17ebc944c958e24e6ffbd1d5a49abdbf489e
Author: Paul E. McKenney
Date:   Tue Aug 12 17:25:03 2008 -0700

    rcu: prevent console flood when one CPU sees another AWOL via RCU

    One small change is needed to keep from flooding the console when one
    CPU notices that another is AWOL, unless I am missing something subtle.
    Otherwise the cleanups look good!

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

commit dbc74c65b3fd841985935f676388c82d6b85c485
Author: Vesa-Matti Kari
Date:   Thu Aug 7 03:18:20 2008 +0300

    selinux: Unify for- and while-loop style

    Replace "thing != NULL" comparisons with just "thing" to make
    the code look more uniform (mixed styles were used even in the
    same source file).

    Signed-off-by: Vesa-Matti Kari
    Acked-by: Stephen Smalley
    Signed-off-by: James Morris

commit 8d7ccaa545490cdffdfaff0842436a8dd85cf47b
Merge: b2139aa... 30a2f3c...
Author: Ingo Molnar
Date:   Thu Aug 14 12:19:59 2008 +0200

    Merge commit 'v2.6.27-rc3' into x86/prototypes

    Conflicts:

            include/asm-x86/dma-mapping.h

    Signed-off-by: Ingo Molnar

commit 72dbf4790fc6736f9cb54424245114acf0b0038c
Author: Bob Peterson
Date:   Tue Aug 12 13:39:29 2008 -0500

    GFS2: rm on multiple nodes causes panic

    This patch fixes a problem whereby simultaneous unlink, rmdir, rename
    and link operations (e.g. rm -fR *) from multiple nodes on the same
    GFS2 file system can cause kernel panics, hangs, and/or memory
    corruption. It also gets rid of all the non-rgrp calls to
    gfs2_glock_nq_m.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

commit 9b8df98fc8973ad1c5f0d7c4cf71c7fb84fe22c5
Author: Steven Whitehouse
Date:   Fri Aug 8 13:45:13 2008 +0100

    GFS2: Fix metafs mounts

    This patch is intended to fix the issues reported in bz #457798. Instead
    of having the metafs as a separate filesystem, it becomes a second root
    of gfs2. As a result it will appear as type gfs2 in /proc/mounts, but it
    is still possible (for backwards compatibility purposes) to mount it as
    type gfs2meta. A new mount flag "meta" is introduced so that it's
    possible to tell the two cases apart in /proc/mounts.

    As a result it becomes possible to mount type gfs2 with -o meta and
    get the same result as mounting type gfs2meta. So it is possible to
    mount just the metafs on its own. Currently if you do this, it's then
    impossible to mount the "normal" root of the gfs2 filesystem without
    first unmounting the metafs root. I'm not sure if that's a feature or a
    bug :-)

    Either way, this is a great improvement on the previous scheme and I've
    verified that it works ok with bind mounts on both the "normal" root and
    the metafs root in various combinations.

    There were also a bunch of functions in super.c which didn't belong
    there, so this moves them into ops_fstype.c where they can be static.
    Hopefully the mount/umount sequence is now more obvious as a result.
    Signed-off-by: Steven Whitehouse
    Cc: Alexander Viro

commit c1e817d03a7de57a963654c35e6e80af9a5dbff5
Author: Steven Whitehouse
Date:   Tue Jul 22 22:58:03 2008 +0100

    GFS2: Fix debugfs glock file iterator

    Due to an incorrect iterator, some glocks were being missed from the
    glock dumps obtained via debugfs. This patch fixes the problem and
    ensures that we don't miss any glocks in the future.

    Signed-off-by: Steven Whitehouse

commit 99809963c99e1ed868d9ebeb4a5e7ee1cbe0309f
Author: Jeff Chua
Date:   Wed Aug 6 19:09:53 2008 +0800

    x86: make sparsemem more available

    With CONFIG_X86_PC, I can set CONFIG_SPARSEMEM=y.

    With CONFIG_X86_GENERICARCH, CONFIG_SPARSEMEM depends on CONFIG_NUMA.

    I'm using the patch below to enable sparsemem instead of flatmem.
    The system booted and is running.

    Signed-off-by: Ingo Molnar

commit f2556896597c43cb48f04b1c16214938a6ccce9a
Merge: 7c13e6a... 10fec20...
Author: Ingo Molnar
Date:   Mon Aug 11 22:01:54 2008 +0200

    Merge branch 'linus' into x86/defconfig

commit 59f09ba2b62e6f89beeb4c8fc2c83fe14321dda9
Author: Philipp Kohlbecher
Date:   Wed Aug 6 15:25:26 2008 +0200

    x86: fix comment in protected mode header

    Comments in arch/x86/boot/compressed/head_32.S erroneously refer to the
    real mode pointer as the second and the heap area as the third argument
    to decompress_kernel(). In fact, these have been the first and second
    arguments, respectively, since v2.6.20.

    This patch corrects the comments. It introduces no code changes.

    Signed-off-by: Philipp Kohlbecher
    Signed-off-by: Ingo Molnar

commit 7c13e6a3d15a4ebcc3f40df5f4d19665479f8ca3
Author: Dimitri Sivanich
Date:   Mon Aug 11 10:46:46 2008 -0500

    x86: remove EXPERIMENTAL restriction from CONFIG_HOTPLUG_CPU

    This removes the EXPERIMENTAL restriction from CONFIG_HOTPLUG_CPU on
    the x86 architecture.

    Signed-off-by: Dimitri Sivanich
    Signed-off-by: Ingo Molnar

commit 78635fc739b1254f3e0362ac430edbdd2cff01dc
Author: Ingo Molnar
Date:   Mon Aug 11 13:34:15 2008 +0200

    rcu, debug: detect stalled grace periods, cleanups

    Small cleanups.
    Signed-off-by: Ingo Molnar

commit 67182ae1c42206e516f7efb292b745e826497b24
Author: Paul E. McKenney
Date:   Sun Aug 10 18:35:38 2008 -0700

    rcu, debug: detect stalled grace periods

    This is a diagnostic patch for Classic RCU. The approach is to record a
    timestamp at the beginning of the grace period (in rcu_start_batch()),
    then have rcu_check_callbacks() complain if:

    1.      it is running on a CPU that has been holding up grace periods
            for a long time (say one second). This will identify the
            culprit, assuming that the culprit has not disabled hardware
            irqs, instruction execution, or some such.

    2.      it is running on a CPU that is not holding up grace periods,
            but grace periods have been held up for an even longer time
            (say two seconds).

    It is enabled via the default-off CONFIG_DEBUG_RCU_STALL kernel config
    option.

    Rather than exponential backoff, it backs off to once per 30 seconds.
    My feeling upon thinking on it was that if you have stalled RCU grace
    periods for that long, a few extra printk() messages are probably the
    least of your worries...

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Yinghai Lu
    Cc: David Witbrodt
    Signed-off-by: Ingo Molnar

commit c4c0c56a7a85ed5725786219e4fbca7e840b1531
Merge: 5127bed... 796aade...
Author: Ingo Molnar
Date:   Mon Aug 11 13:27:47 2008 +0200

    Merge branch 'linus' into core/rcu

commit 8aeb4022633f7d0eca5e13a9622bd73df92bbf2a
Author: Huang Weiyi
Date:   Sun Aug 10 21:09:22 2008 +0800

    arch/x86/kernel/cpuid.c: removed duplicated #include

    Removed duplicated include file in arch/x86/kernel/cpuid.c.

    Signed-off-by: Huang Weiyi
    Signed-off-by: Ingo Molnar

commit 4c942654a4514d7d0a9b592a7d1b198a212e8a03
Author: Huang Weiyi
Date:   Sun Aug 10 20:57:45 2008 +0800

    arch/x86/kernel/acpi/boot.c: removed duplicated #include

    Removed duplicated include file in arch/x86/kernel/acpi/boot.c.

    Signed-off-by: Huang Weiyi
    Signed-off-by: Ingo Molnar

commit 6de9c70882ecdee63a652d493bf2353963bd4c22
Merge: d406d21... 796aade...
Author: Ingo Molnar
Date:   Mon Aug 11 12:57:01 2008 +0200

    Merge branch 'linus' into x86/cleanups

commit d406d21d90dce2e66c7eb4a44605aac947fe55fb
Author: Marcin Slusarz
Date:   Mon Aug 11 00:09:38 2008 +0200

    x86: mpparse.c: fix section mismatch warning

    WARNING: vmlinux.o(.text+0x118f7): Section mismatch in reference from the function construct_ioapic_table() to the function .init.text:MP_bus_info()
    The function construct_ioapic_table() references
    the function __init MP_bus_info().
    This is often because construct_ioapic_table lacks a __init
    annotation or the annotation of MP_bus_info is wrong.

    construct_ioapic_table is called only from construct_default_ISA_mptable
    which is __init

    Signed-off-by: Marcin Slusarz
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Signed-off-by: H. Peter Anvin

commit bafc1dae8215c862c2e6ae913ddadc20581e59b9
Author: Marcin Slusarz
Date:   Mon Aug 11 00:11:13 2008 +0200

    x86: mmconf: fix section mismatch warning

    WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi()
    The function __cpuinit init_amd() references
    a function __init check_enable_amd_mmconf_dmi().
    If check_enable_amd_mmconf_dmi is only used by init_amd then
    annotate check_enable_amd_mmconf_dmi with a matching annotation.

    check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit

    Signed-off-by: Marcin Slusarz
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Signed-off-by: H. Peter Anvin

commit 85a14437ed24244c78f9a70d58b8299753b03c92
Author: Marcin Slusarz
Date:   Mon Aug 11 00:12:37 2008 +0200

    x86: fix MP_processor_info section mismatch warning

    WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1fe7): Section mismatch in reference from the function MP_processor_info() to the variable .init.data:x86_quirks
    The function __cpuinit MP_processor_info() references
    a variable __initdata x86_quirks.
    If x86_quirks is only used by MP_processor_info then
    annotate x86_quirks with a matching annotation.

    MP_processor_info uses x86_quirks which is __init and is used only from
    smp_read_mpc and construct_default_ISA_mptable which are __init

    Signed-off-by: Marcin Slusarz
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Signed-off-by: H. Peter Anvin

commit 90936cfe6c8f7e90a6f8b0c5cb44d3a012dfd313
Author: Marcin Slusarz
Date:   Mon Aug 11 00:07:44 2008 +0200

    x86, tsc: fix section mismatch warning

    WARNING: vmlinux.o(.text+0x7950): Section mismatch in reference from the function native_calibrate_tsc() to the function .init.text:tsc_read_refs()
    The function native_calibrate_tsc() references
    the function __init tsc_read_refs().
    This is often because native_calibrate_tsc lacks a __init
    annotation or the annotation of tsc_read_refs is wrong.

    tsc_read_refs is called from native_calibrate_tsc which is not __init
    and native_calibrate_tsc cannot be marked __init

    Signed-off-by: Marcin Slusarz
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Signed-off-by: H. Peter Anvin

commit 421fae06be9e0dac45747494756b3580643815f9
Author: Vesa-Matti Kari
Date:   Wed Aug 6 18:24:51 2008 +0300

    selinux: conditional expression type validation was off-by-one

    expr_isvalid() in conditional.c was off-by-one and allowed
    invalid expression type COND_LAST. However, it is this header
    file that needs to be fixed. That way the if-statement's
    disjunction's second component reads more naturally,
    "if expr type is greater than the last allowed value"
    (rather than using ">=" in conditional.c):

        if (expr->expr_type <= 0 || expr->expr_type > COND_LAST)

    Signed-off-by: Vesa-Matti Kari
    Signed-off-by: James Morris

commit 15446235367fa4a621ff5abfa4b6ebbe25b33763
Author: Casey Schaufler
Date:   Wed Jul 30 15:37:11 2008 -0700

    smack: limit privilege by label

    There have been a number of requests to make the Smack LSM
    enforce MAC even in the face of privilege, either capability
    based or superuser based.
    This is not universally desired, however, so it seems desirable
    to make it optional. Further, at least one legacy OS implemented
    a scheme whereby only processes running with one particular
    label could be exempt from MAC. This patch supports these three
    cases.

    If /smack/onlycap is empty (unset or null-string), privilege is
    enforced in the normal way.

    If /smack/onlycap contains a label, only processes running with
    that label may be MAC exempt.

    If the label in /smack/onlycap is the star label ("*"), the
    semantics of the star label combine with the privilege
    restrictions to prevent any violations of MAC, even in the
    presence of privilege.

    Again, this will be independent of the privilege scheme.

    Signed-off-by: Casey Schaufler
    Reviewed-by: James Morris

commit cf9481e289247fe9cf40f2e2481220d899132049
Author: David Howells
Date:   Sun Jul 27 21:31:07 2008 +1000

    SELinux: Fix a potentially uninitialised variable in SELinux hooks

    Fix a potentially uninitialised variable in SELinux hooks that's given a
    pointer to the network address by selinux_parse_skb() passing a pointer
    back through its argument list. By restructuring selinux_parse_skb(),
    the compiler can see that the error case need not set it as the caller
    will return immediately.

    Signed-off-by: David Howells
    Signed-off-by: James Morris

commit 0c0e186f812457e527c420f7a4d02865fd0dc7d2
Author: Vesa-Matti J Kari
Date:   Mon Jul 21 02:50:20 2008 +0300

    SELinux: trivial, remove unneeded local variable

    Hello,

    Remove unneeded local variable:

        struct avtab_node *newnode

    Signed-off-by: Vesa-Matti Kari
    Signed-off-by: James Morris

commit df4ea865f09580b1cad621c0426612f598847815
Author: Vesa-Matti J Kari
Date:   Sun Jul 20 23:57:01 2008 +0300

    SELinux: Trivial minor fixes that change C null character style

    Trivial minor fixes that change C null character style.
    Signed-off-by: Vesa-Matti Kari
    Signed-off-by: James Morris

commit 3583a71183a02c51ca71cd180e9189cfb0411cc1
Author: Adrian Bunk
Date:   Tue Jul 22 20:21:23 2008 +0300

    make selinux_write_opts() static

    This patch makes the needlessly global selinux_write_opts() static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: James Morris

commit a677f58a8c8c541bf7d02c658545084040f3708d
Author: Yinghai Lu
Date:   Tue Jul 29 00:37:10 2008 -0700

    x86: print per_cpu data address to make sure per_cpu data on correct node.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

commit e9c8abb66cc37801bdb5d4360bb78d180c3bbb73
Author: Gustavo F. Padovan
Date:   Tue Jul 29 02:48:56 2008 -0300

    x86: coding style fixes to arch/x86/kernel/sys_x86_64.c

    Fix all errors and many warnings reported by checkpatch.pl without
    changing sys_x86_64.o.

    arch/x86/kernel/sys_x86_64.o:

       text    data     bss     dec     hex filename
       1567       0       0    1567     61f sys_x86_64.o.after
       1567       0       0    1567     61f sys_x86_64.o.before

    md5:
    de28ffedcb5851dfd7ec87a03afec1fd  sys_x86_64.o.after
    de28ffedcb5851dfd7ec87a03afec1fd  sys_x86_64.o.before

    Signed-off-by: Gustavo F. Padovan
    Signed-off-by: Ingo Molnar

commit 4df9e510a9fda29aca71d8acac853b98aa6884d1
Author: Gustavo F. Padovan
Date:   Tue Jul 29 02:48:55 2008 -0300

    x86: coding style fixes to arch/x86/kernel/traps_64.c

    Fix all errors and many warnings reported by checkpatch.pl.
    Apart from the include change, traps.o is identical before and after
    the changes.

    Signed-off-by: Gustavo F. Padovan
    Signed-off-by: Ingo Molnar

commit caa007dd3687d38a0252484d9d0a8f9d929ba932
Author: Gustavo F. Padovan
Date:   Tue Jul 29 02:48:54 2008 -0300

    x86: coding style fixes to arch/x86/kernel/signal_64.c

    Fix all errors and many warnings reported by checkpatch.pl without
    changing signal_64.o.

    arch/x86/kernel/signal_64.o

       text    data     bss     dec     hex filename
       5143       0       8    5151    141f signal_64.o.after
       5143       0       8    5151    141f signal_64.o.before

    md5:
    e68718092b3641cb27e79e55ce57e3ad  signal_64.o.after
    e68718092b3641cb27e79e55ce57e3ad  signal_64.o.before

    Signed-off-by: Gustavo F. Padovan
    Signed-off-by: Ingo Molnar

commit 08aadf069d0482ade033badefa8f03eb2fcddd9c
Author: Gustavo F. Padovan
Date:   Tue Jul 29 02:48:53 2008 -0300

    x86: coding style fixes to arch/x86/kernel/crash_dump_64.c

    Fix coding style without changing crash_dump_64.o.

    arch/x86/kernel/crash_dump_64.o

       text    data     bss     dec     hex filename
        129       0       0     129      81 crash_dump_64.o.after
        129       0       0     129      81 crash_dump_64.o.before

    md5:
    885b52c1b92737e6b12e5107e90fc1f1  crash_dump_64.o.after
    885b52c1b92737e6b12e5107e90fc1f1  crash_dump_64.o.before

    Signed-off-by: Gustavo F. Padovan
    Signed-off-by: Ingo Molnar

commit 8092c654de9a964c14d89da56834f73a80548a58
Author: Gustavo F. Padovan
Date:   Tue Jul 29 02:48:52 2008 -0300

    x86: add KERN_INFO to printks on process_64.c

    Fix many coding style warnings.

    Signed-off-by: Gustavo F. Padovan
    Signed-off-by: Ingo Molnar

commit 7de08b4e1ed8d80e6086f71b7e99fc4b397aae39
Author: Gustavo F. Padovan
Date:   Tue Jul 29 02:48:51 2008 -0300

    x86: coding styles fixes to arch/x86/kernel/process_64.c

    Fix about 50 errors and many warnings without changing process_64.o.

    arch/x86/kernel/process_64.o:

       text    data     bss     dec     hex filename
       5236       8      24    5268    1494 process_64.o.after
       5236       8      24    5268    1494 process_64.o.before

    md5:
    9c35e9debdea4e471288c6e8ca267a75  process_64.o.after
    9c35e9debdea4e471288c6e8ca267a75  process_64.o.before

    Signed-off-by: Gustavo F. Padovan
    Signed-off-by: Ingo Molnar

commit 71998e83c520c7a91b254dc9705baeedbee0d44f
Merge: c9272c4... 99bbc4b...
Author: Ingo Molnar
Date:   Mon Jul 28 17:03:43 2008 +0200

    Merge branch 'x86-tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace into x86/tracehook

commit 7225e75144b9718cbbe1820d9c011c809d5773fd
Author: Randy Dunlap
Date:   Sat Jul 26 17:54:22 2008 -0700

    documentation: move mtrr.txt to Doc/x86/ subdir

    Move mtrr.txt to the Documentation/x86/ subdirectory.
    Add 00-INDEX to the Documentation/x86/ subdirectory.

    Signed-off-by: Randy Dunlap
    Cc: Adrian Bunk
    Signed-off-by: Ingo Molnar

commit 99bbc4b1e677ac695431e8d9c8e710ef391c567f
Author: Roland McGrath
Date:   Sun Apr 20 14:35:12 2008 -0700

    x86: tracehook: CONFIG_HAVE_ARCH_TRACEHOOK

    The x86 arch code has all the prerequisites, so set HAVE_ARCH_TRACEHOOK.

    Signed-off-by: Roland McGrath

commit 59e52130f04537d2c80ea44bb007cadd1ad29543
Author: Roland McGrath
Date:   Sat Apr 19 19:10:57 2008 -0700

    x86: tracehook: TIF_NOTIFY_RESUME

    This adds TIF_NOTIFY_RESUME support for x86, both 64-bit and 32-bit.
    When set, we call tracehook_notify_resume() on the way to user mode.

    Signed-off-by: Roland McGrath

commit 4dfcbb997aa9f3a6a3ed8c192f0dac28b027e08f
Author: Roland McGrath
Date:   Sat Apr 19 15:37:09 2008 -0700

    x86 signals: use asm/syscall.h

    Replace local inlines with the asm/syscall.h interfaces that do
    the same things.

    Signed-off-by: Roland McGrath

commit 68bd0f4ef7750fc277e1268bf40f443898382409
Author: Roland McGrath
Date:   Fri Apr 18 17:08:44 2008 -0700

    x86: tracehook: asm/syscall.h

    Add asm/syscall.h for x86 with all the required entry points.
    This will allow arch-independent tracing code for system calls.

    Signed-off-by: Roland McGrath

commit eeea3c3ff8af7f6960a0515d46dff6479bdb91f9
Author: Roland McGrath
Date:   Sun Mar 16 23:36:28 2008 -0700

    x86: tracehook syscall

    This changes x86 syscall tracing to use the new tracehook.h entry points.
    There is no functional change, only cleanup.
    Signed-off-by: Roland McGrath

commit 36a033082b5243d45d508c5ccd47a754edbc6821
Author: Roland McGrath
Date:   Fri Mar 14 17:46:38 2008 -0700

    x86: tracehook_signal_handler

    This makes the x86 signal handling code use tracehook_signal_handler()
    in place of calling into ptrace guts. The call is moved after the
    sa_mask processing, but there is no other change. This cleanup doesn't
    matter to existing debuggers, but is the sensible thing: have all facets
    of the handler setup complete before the debugger inspects the task
    again.

    Signed-off-by: Roland McGrath

commit 3964cd3a6721f18ef1dd67b9a0a89dc5b36683b9
Author: Ingo Molnar
Date:   Sat Jul 26 19:35:20 2008 +0200

    x86: visws_quirks, fix build error

    fix:

    arch/x86/kernel/visws_quirks.c: In function ‘visws_early_detect’:
    arch/x86/kernel/visws_quirks.c:290: error: ‘skip_ioapic_setup’ undeclared (first use in this function)
    arch/x86/kernel/visws_quirks.c:290: error: (Each undeclared identifier is reported only once
    arch/x86/kernel/visws_quirks.c:290: error: for each function it appears in.)

    Signed-off-by: Ingo Molnar

commit 39eacc20f93614f7bab63eb1d45060503afc46d0
Author: Huang Weiyi
Date:   Fri Jul 25 23:30:13 2008 +0800

    arch/x86/kernel/visws_quirks.c: Removed duplicated #include

    Removed duplicated #include in arch/x86/kernel/visws_quirks.c.
      asm/apic.h
      asm/arch_hooks.h
      asm/io.h
      asm/visws/cobalt.h
      asm/visws/lithium.h
      asm/visws/piix4.h
      linux/init.h
      linux/interrupt.h
      linux/smp.h

    Signed-off-by: Huang Weiyi
    Signed-off-by: Ingo Molnar

commit 17f3ab748e3ee8a0af069e53b93e1487cf44aecc
Author: Joerg Roedel
Date:   Fri Jul 25 16:48:59 2008 +0200

    x86: convert discontig_32.c from round_up to roundup

    Signed-off-by: Joerg Roedel
    Signed-off-by: Ingo Molnar

commit be3e89ee6df8607356f705901dd90bcf3836c86e
Author: Joerg Roedel
Date:   Fri Jul 25 16:48:58 2008 +0200

    x86: convert numa_64.c from round_up to roundup

    Signed-off-by: Joerg Roedel
    Signed-off-by: Ingo Molnar

commit d86bb0dac792c6a9c92944b6db2687980c808094
Author: Joerg Roedel
Date:   Fri Jul 25 16:48:57 2008 +0200

    x86: convert init_64.c from round_up to roundup

    Signed-off-by: Joerg Roedel
    Signed-off-by: Ingo Molnar

commit 15ae2d76ceb037a1e3fcd8fc9b4fe3177f9f3831
Author: Joerg Roedel
Date:   Fri Jul 25 16:48:56 2008 +0200

    x86: convert pageattr.c from round_up to roundup

    Signed-off-by: Joerg Roedel
    Signed-off-by: Ingo Molnar

commit 1ddb5518052e4e28ab489237443f7443b3fd69ca
Author: Joerg Roedel
Date:   Fri Jul 25 16:48:55 2008 +0200

    x86: convert pci-dma.c from round_up to roundup

    Signed-off-by: Joerg Roedel
    Signed-off-by: Ingo Molnar

commit bda307ed7bdc160fcf1475a49f6c2e796fcb1294
Merge: 0791e13... 024e8ac...
Author: Ingo Molnar
Date:   Sat Jul 26 15:38:48 2008 +0200

    Merge branch 'linus' into x86/cleanups

commit 1503af661947b7a4a09355cc2ae6aa0d43f16776
Merge: a318631... 024e8ac...
Author: Ingo Molnar Date: Sat Jul 26 15:30:40 2008 +0200 Merge branch 'linus' into x86/header-guards Conflicts: include/asm-x86/gpio.h include/asm-x86/ide.h Signed-off-by: Ingo Molnar commit 3c9339049df5cc3a468c11de6c4101a1ea8c3d83 Author: Ingo Molnar Date: Fri Jul 25 11:32:36 2008 +0200 x86: fix ds.c build error fix: arch/x86/kernel/ds.c: In function ‘ds_allocate_buffer': arch/x86/kernel/ds.c:339: error: implicit declaration of function ‘PAGE_ALIGN' Signed-off-by: Ingo Molnar commit 0e2f65ee30eee2db054f7fd73f462c5da33ec963 Merge: da7878d... fb2e405... Author: Ingo Molnar Date: Fri Jul 25 11:37:07 2008 +0200 Merge branch 'linus' into x86/pebs Conflicts: arch/x86/Kconfig.cpu arch/x86/kernel/cpu/intel.c arch/x86/kernel/setup_64.c Signed-off-by: Ingo Molnar commit b2139aa0eec330c711c5a279db361e5ef1178e78 Author: Jaswinder Singh Date: Fri Jul 25 11:32:38 2008 +0530 X86_SMP: tlb_XX.c declare smp_invalidate_interrupt before they get used declare smp_invalidate_interrupt in asm-x86/hw_irq.h for X86_32 and X86_64 Signed-off-by: Jaswinder Singh commit 2907829cd0ffdef69083985ba28cf1cf3857c681 Author: Jaswinder Singh Date: Fri Jul 25 11:05:56 2008 +0530 X86_SMP: ipi.c declare functions before they get used move upwards for __send_IPI_shortcut and send_IPI_mask_bitmask Signed-off-by: Jaswinder Singh commit f86c99853b22576ee8dc4fa27ff6f3c0c7ce0ef8 Author: Jaswinder Singh Date: Fri Jul 25 10:52:53 2008 +0530 X86_SMP: smpboot.c declare idle_thread_array and smp_b_stepping as static Signed-off-by: Jaswinder Singh commit e7f08dfdaa82def3d685d16fdb99203cbb67ec95 Author: Jaswinder Singh Date: Fri Jul 25 10:42:26 2008 +0530 X86_SMP: smp.c declare functions before they get used declared following smp interrupts in asm-x86/hw_irq.h: smp_reschedule_interrupt, smp_call_function_interrupt, smp_call_function_single_interrupt Signed-off-by: Jaswinder Singh commit 4fe702c7e401f912c0edd294af6e37c02f451bbb Author: Jaswinder Singh Date: Fri Jul 25 10:19:27 2008 +0530 X86_32: declare 
pt_regs_access as unsigned long Fixed pt_regs_access to unsigned long as per X86_64 Signed-off-by: Jaswinder Singh commit 461d159694d683c9e43ead07720a3a0f17f6f060 Author: Jaswinder Singh Date: Fri Jul 25 09:19:05 2008 +0530 x86_64: Declare new_utsname in asm-x86/syscalls.h Signed-off-by: Jaswinder Singh commit 0791e13fbb1ea4e1808d055922c3f116b924bdc9 Author: Maciej W. Rozycki Date: Mon Jul 21 01:28:43 2008 +0100 x86: fix up a comment in ack_APIC_irq() Adjust a comment in ack_APIC_irq() according to the recent removal of CONFIG_X86_GOOD_APIC. Signed-off-by: Maciej W. Rozycki Signed-off-by: Ingo Molnar commit e0a5a5d9b006fd441e61685a051fa85d52fb172c Author: Alexander van Heukelum Date: Tue Jul 22 18:14:16 2008 +0200 x86, 64-bit, dwarf2: push pushes 8 bytes and popf pops 8 The CFI_ADJUST_CFA_OFFSET dwarf2 annotation of a push/popf pair in ret_from_fork wrongly used a value of 4. It should have been 8. Fix that. Signed-off-by: Alexander van Heukelum Cc: Andi Kleen Cc: heukelum@fastmail.fm Signed-off-by: Ingo Molnar commit 05d3ed0a1fe3ea05ab9f3b8d32576a0bc2e19660 Author: Prarit Bhargava Date: Mon Jul 21 10:15:22 2008 -0400 x86, pci: iommu fix potential overflow in alloc_iommu() It is possible that alloc_iommu()'s boundary_size overflows as dma_get_seg_boundary can return 0xffffffff. In that case, further usage of boundary_size triggers a BUG_ON() in the iommu code. 
Signed-off-by: Prarit Bhargava Signed-off-by: Ingo Molnar commit 71e3b818431957371c7f69fa1c576d4a403c1478 Author: Jaswinder Singh Date: Wed Jul 23 17:44:00 2008 +0530 x86: mach-default/setup.c declare no_broadcast before they get used included mach_ipi.h for no_broadcast declaration fixed minor spacing for no_broadcast Signed-off-by: Jaswinder Singh commit 01eb7858c017b1c63b962f8c2ad37133383ca560 Author: Jaswinder Singh Date: Wed Jul 23 17:41:59 2008 +0530 x86: mm/pgtable_32.c declare set_pmd_pfn before they get used Signed-off-by: Jaswinder Singh commit e0b7c8192ded4c2096388008d3ca6708caa8b601 Author: Jaswinder Singh Date: Wed Jul 23 17:40:34 2008 +0530 x86: mm/pageattr.c declare arch_report_meminfo before they get used declared arch_report_meminfo() in asm-x86/pgtable.h as it will be also accessible by fs/proc/proc_misc.c Signed-off-by: Jaswinder Singh commit 4b6e9f27d0034740e9cfa341b45c229ba30ec0c5 Author: Jaswinder Singh Date: Wed Jul 23 17:39:16 2008 +0530 x86: mm/ioremap.c declare early_ioremap_debug and early_ioremap_nested as static Signed-off-by: Jaswinder Singh commit 70ef56414ec7e01d787c8e959bb259845df4ee4f Author: Jaswinder Singh Date: Wed Jul 23 17:36:37 2008 +0530 x86: mm/fault.c declare do_page_fault before they get used declared do_page_fault() in asm-x86/trap.h for both X86_32 and X86_64 removed do_invalid_op declaration from mm/fault.c as it is already declared in asm-x86/trap.h Signed-off-by: Jaswinder Singh commit a80495ec927e8ec2b1ff085592bbe9bed77ffb3b Author: Jaswinder Singh Date: Wed Jul 23 17:33:57 2008 +0530 x86: mm/init_XX.c declare functions before they get used included in mm/init_32.c for zap_low_mappings() declared free_initmem() in asm-x86/page_XX.h Signed-off-by: Jaswinder Singh commit 8f7db5186cf126b56035d9a9735774d751090d66 Author: Jaswinder Singh Date: Wed Jul 23 17:31:02 2008 +0530 x86: vm86_32.c declare functions before they get used declared following syscalls in asm-x86/syscalls.h: sys_vm86old, sys_vm86 Signed-off-by: 
Jaswinder Singh commit 2b97df06ce44b1d145bd1299f50765803c2fabee Author: Jaswinder Singh Date: Wed Jul 23 17:13:14 2008 +0530 x86: apic_XX.c declare functions before they get used declared the following smp interrupts in asm-x86/hw_irq.h: smp_apic_timer_interrupt, smp_spurious_interrupt, smp_error_interrupt Signed-off-by: Jaswinder Singh commit a31863168660c6b6f6c7ffe05bb6a38e97803326 Author: Vegard Nossum Date: Tue Jul 22 21:53:53 2008 +0200 x86: consolidate header guards This patch consolidates the header guard names which are also used externally, i.e. in .c files. Signed-off-by: Vegard Nossum commit a021e5124a6c57325ffb02a60cd1d5f40342f8aa Author: H. Peter Anvin Date: Tue Jul 22 15:33:57 2008 -0400 x86: doc: boot.txt: fix the size of the start_sys field The start_sys field is two bytes, not four. Signed-off-by: H. Peter Anvin commit 5616c23ad9cd3c50af674d408fef7b90abeee81c Author: H. Peter Anvin Date: Tue Jul 22 15:32:38 2008 -0400 x86: doc: move x86-generic documentation from Doc/x86/i386 The boot protocol, USB legacy support, and zero-page documentation is common to the x86 platform, not i386-specific. Signed-off-by: H. Peter Anvin commit 77ef50a522717fa040636ee1017179ceba12ff62 Author: Vegard Nossum Date: Wed Jun 18 17:08:48 2008 +0200 x86: consolidate header guards This patch is the result of an automatic script that consolidates the format of the header guards in all the headers in include/asm-x86/. The format: 1. No leading underscore. Names with leading underscores are reserved. 2. Pathname components are separated by two underscores, so that we can distinguish between mm_types.h and mm/types.h. 3. Everything except letters and numbers is turned into single underscores.
Signed-off-by: Vegard Nossum commit a656c8efb40a8700046df20da2195f8aa39ce38a Author: Vegard Nossum Date: Tue Jul 22 21:27:11 2008 +0200 x86: fix spurious '#' in kvm header Signed-off-by: Vegard Nossum commit 1e84911c6c37fd1080ef07039e19c346628b31db Author: Jaswinder Singh Date: Mon Jul 21 22:58:29 2008 +0530 x86: mtrr/main.c declare range_state as static Signed-off-by: Jaswinder Singh commit 8fd329a1ac696973ba5467c510302ae1248cc11a Author: Jaswinder Singh Date: Mon Jul 21 22:54:56 2008 +0530 x86: common.c declare idle_regs before they get used Signed-off-by: Jaswinder Singh commit 1c6c727d9c12c84a612abe31b60948f06fc2ab2d Author: Jaswinder Singh Date: Mon Jul 21 22:40:37 2008 +0530 x86: proc.c declare cpuinfo_op before they get used Signed-off-by: Jaswinder Singh commit c1686aeaf0780055ffcd4b224b73d5ada77630e8 Author: Jaswinder Singh Date: Mon Jul 21 22:35:38 2008 +0530 x86: ptrace.c declare functions before they get used Signed-off-by: Jaswinder Singh commit 36454936c00c700ae86b5ff376d3c1c1a862c4f5 Author: Jaswinder Singh Date: Mon Jul 21 22:31:57 2008 +0530 x86: i387.c declare dump_fpu() before they get used Signed-off-by: Jaswinder Singh commit 791b897ccd1a2c6c184b88ca6d1aaf053499c3df Author: Jaswinder Singh Date: Mon Jul 21 22:28:22 2008 +0530 x86: pci-nommu.c declare nommu_dma_ops before they get used Signed-off-by: Jaswinder Singh commit 9321b8cbbbf3a8dbd4748e3722facaeb8401bd13 Author: Jaswinder Singh Date: Mon Jul 21 22:24:29 2008 +0530 x86: pci-dma.c declare iommu_bio_merge before they get used moved iommu_bio_merge from io_64.h to io.h because it is required for both. 
Signed-off-by: Jaswinder Singh commit a7b7511ac1404eaf0e7b6c445a7c61b48ccfcf0b Author: Jaswinder Singh Date: Mon Jul 21 22:19:29 2008 +0530 x86: e820.c declare pci_mem_start before they get used Signed-off-by: Jaswinder Singh commit 5314d48ed54c1a0111c597d1510f77850a1b3232 Author: Jaswinder Singh Date: Mon Jul 21 22:12:23 2008 +0530 x86: setup.c declare saved_video_mode before they get used Signed-off-by: Jaswinder Singh commit cc0384917bf69079088701a0725c5fc6b554bf35 Author: Jaswinder Singh Date: Mon Jul 21 21:52:51 2008 +0530 x86: time_XX.c declare functions before they get used Declare time_init() in asm-x86/time.h. Also did cleanup in asm-x86/timer.h: timer_ack is only required for X86_32, and int recalibrate_cpu_khz(void) is for X86_32. Signed-off-by: Jaswinder Singh commit b994b6c0332a5499b33880855dadad04d74cde54 Author: Jaswinder Singh Date: Mon Jul 21 21:37:52 2008 +0530 x86: signal_XX.c declare do_notify_resume before they get used Signed-off-by: Jaswinder Singh commit fb26132b441e75d6ba9996efc29b42081aee0abd Author: Jaswinder Singh Date: Mon Jul 21 21:36:40 2008 +0530 x86: process_32.c declare cpu_number before they get used Moved DECLARE_PER_CPU(int, cpu_number) from CONFIG_X86_32_SMP to CONFIG_X86_32 because cpu_number is required for both. Also include asm/smp.h in process_32.c. Signed-off-by: Jaswinder Singh commit bbc1f698a508927d21324b57500e863f9bd562b9 Author: Jaswinder Singh Date: Mon Jul 21 21:34:13 2008 +0530 x86: Introducing asm/syscalls.h Declare arch-dependent syscalls for the x86 architecture. Signed-off-by: Jaswinder Singh commit 5127bed588a2f8f3a1f732de2a8a190b7df5dce3 Author: Lai Jiangshan Date: Sun Jul 6 17:23:59 2008 +0800 rcu classic: new algorithm for callbacks-processing(v2) This is v2; it differs slightly from the v1 that I sent to lkml: use ACCESS_ONCE; use rcu_batch_after/rcu_batch_before for batch # comparison.
rcutorture test results (hotplugs: one cpu-online/offline cycle per second): No CONFIG_NO_HZ: OK, 12 hours. No CONFIG_NO_HZ, hotplugs: OK, 12 hours. CONFIG_NO_HZ=y: OK, 24 hours. CONFIG_NO_HZ=y, hotplugs: Failed. (It also failed without my patch applied; exactly the same bug occurred, http://lkml.org/lkml/2008/7/3/24) v1's email thread: http://lkml.org/lkml/2008/6/2/539 v1's description: The current callback-processing code/algorithm is very efficient and technical, but when I studied it I found a disadvantage: in multi-CPU systems, when a new RCU callback is queued (call_rcu[_bh]), the current implementation will very likely invoke it only after the grace period for the batch with batch number = rcp->cur+2 has completed. Actually, this callback can be invoked after the grace period for the batch with batch number = rcp->cur+1 has completed. The delayed invocation extends the latency of synchronize_rcu(). More importantly, callbacks usually free memory, and that work is delayed too! The reclaimer needs to free memory as soon as possible when free memory is low. A very simple way to solve this problem: a field (struct rcu_head::batch) is added to record the batch number for the RCU callback. When a new RCU callback is queued, we determine its batch number (head->batch = rcp->cur+1), and when we process callbacks we move it to rdp->donelist if we find that head->batch <= rcp->completed. This simple way reduces the wait before invocation considerably (from about 2.5 grace periods to about 1.5 grace periods on average in multi-CPU systems). This is my algorithm. But I do not add any field to struct rcu_head in my implementation. We only need to remember the last 2 batches and their batch numbers, because these 2 batches include all entries for which the grace period hasn't completed. So we use a special linked list rather than adding a field. Please see the comment of struct rcu_data.
Signed-off-by: Lai Jiangshan Cc: "Paul E. McKenney" Cc: Dipankar Sarma Cc: Gautham Shenoy Cc: Dhaval Giani Cc: Peter Zijlstra Signed-off-by: Ingo Molnar commit 3cac97cbb14aed00d83eb33d4613b0fe3aaea863 Author: Lai Jiangshan Date: Sun Jul 6 17:23:55 2008 +0800 rcu classic: simplify the next pending batch Use a batch number (rcp->pending) instead of a flag (rcp->next_pending). rcu_start_batch() needs to change this flag, so mb()s are needed for memory-access safety. But after this patch is applied, rcu_start_batch() does not change this batch number (rcp->pending); rcp->pending is managed by __rcu_process_callbacks only, so the troublesome mb()s are eliminated and the code looks simpler and clearer. Signed-off-by: Lai Jiangshan Cc: "Paul E. McKenney" Cc: Dipankar Sarma Cc: Gautham Shenoy Cc: Dhaval Giani Cc: Peter Zijlstra Signed-off-by: Ingo Molnar commit da7878d75b8520c9ae00d27dfbbce546a7bfdfbb Merge: 0e50a4c... 543cf4c... Author: Ingo Molnar Date: Wed Jun 25 12:32:01 2008 +0200 Merge branch 'linus' into x86/pebs commit 0e50a4c6ab94ffe7e5515b86b5df9e5abc8c6b13 Merge: 34b2cd5... f26a398... Author: Thomas Gleixner Date: Sat May 17 16:01:05 2008 +0200 Merge branch 'linus' into x86/pebs commit 34b2cd5b688b012975fcfc3b3970fc3508fa82c4 Author: Ingo Molnar Date: Sat May 17 08:30:07 2008 +0200 x86: PEBS cleanup Signed-off-by: Ingo Molnar commit 573da4224e8c3800e613d715e909c3179a7e3cb2 Author: Cyrill Gorcunov Date: Mon Apr 28 23:15:04 2008 +0400 x86: DS cleanup - dont treat 0 as NULL Signed-off-by: Cyrill Gorcunov Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit 970e725098a6da5a9c1f8128102c812e31a0444c Author: Andrew Morton Date: Wed Apr 16 16:40:17 2008 -0700 x86, ptrace: PEBS support, warning fix arch/x86/kernel/process_32.c:566: warning: unused variable 'ds_next' arch/x86/kernel/process_32.c:566: warning: unused variable 'ds_prev' Cc: Markus Metzger Cc: Andi Kleen Cc: "H.
Peter Anvin" Cc: "Siddha, Suresh B" Cc: Roland McGrath Cc: Michael Kerrisk Cc: Cc: stephane eranian Cc: Jason Wessel Signed-off-by: Andrew Morton Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit 93fa7636dfdc059b25df148f230c0991096afdef Author: Markus Metzger Date: Tue Apr 8 11:01:58 2008 +0200 x86, ptrace: PEBS support Polish the ds.h interface and add support for PEBS. Ds.c is meant to be the resource allocator for per-thread and per-cpu BTS and PEBS recording. It is used by ptrace/utrace to provide execution tracing of debugged tasks. It will be used by profilers (e.g. perfmon2). It may be used by kernel debuggers to provide a kernel execution trace. Changes in detail: - guard DS and ptrace by CONFIG macros - separate DS and BTS more clearly - simplify field accesses - add functions to manage PEBS buffers - add simple protection/allocation mechanism - added support for Atom Opens: - buffer overflow handling Currently, only circular buffers are supported. This is all we need for debugging. Profilers would want an overflow notification. This is planned to be added when perfmon2 is made to use the ds.h interface. - utrace intermediate layer Signed-off-by: Markus Metzger Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner