Summary of changes from v2.6.6 to v2.6.7-rc1
============================================

[PATCH] 8139too: more useful debug info for tx_timeout

Hi,

I think this patch is useful for checking whether it's a real driver bug or some other bug. What do you think of this? If ok, please apply.
--
OGAWA Hirofumi

[PATCH] 8139too: more useful debug info for tx_timeout

	/* disable Tx ASAP, if not already */
	tmp8 = RTL_R8 (ChipCmd);
	if (tmp8 & CmdTxEnb)
		RTL_W8 (ChipCmd, CmdRxEnb);

The above will clear the Tx Descs. So, this prints the debugging info before rtl8139_tx_timeout() does it. IntrStatus etc. are now also always printed for debugging.

- Converted Linux drivers to initialize DRM instances based on PCI IDs, not just a single instance. The PCI ID lists include a driver private field, which may be used by drivers for chip family or other information. Based on work by jonsmirl and Eric Anholt.

I've left out the PCI device naming for this patch as that might be a bit controversial.

clean up tdfx to look like everyone else..

From: Eric Anholt:
- Move IRQ functions from drm_dma.h to new drm_irq.h and disentangle them from __HAVE_DMA. This will be useful for adding vblank sync support to sis and tdfx. Rename dma_service to irq_handler, which is more accurately what it is.
- Fix the #if _HAVE_DMA_IRQ in radeon, r128, mga, i810, i830, gamma to have the right number of underscores. This may have been a problem in the case that the server died without doing its DRM_IOCTL_CONTROL to uninit

left gamma_dma.c out of last changeset

- Add DRM_GET_PRIV_WITH_RETURN macro. This can be used in shared code to get the drm_file_t * based on the filp passed in ioctl handlers.

From Eric Anholt:
Introduce a new ioctl, DRM_IOCTL_SET_VERSION. This ioctl allows the server or client to notify the DRM that it expects a certain version of the device dependent or device independent interface. If the major doesn't match or minor is too large, EINVAL is returned.
A major of -1 means that the requestor doesn't care about that portion of the interface. The ioctl returns the actual versions in the same struct.

From: Michel Daenzer:
Memory layout transition:
* the 2D driver initializes MC_FB_LOCATION and related registers sanely
* the DRM deduces the layout from these registers
* clients use the new SETPARAM ioctl to tell the DRM where they think the framebuffer is located in the card's address space
* the DRM uses all this information to check client state and fix it up if necessary

This is a prerequisite for things like direct rendering with IGP chips and video capturing.

From Eric Anholt:
some cleanups from AlanH:
- Tie the DRM to a specific device: setunique no longer succeeds when given a busid that doesn't correspond to the device the DRM is attached to. This breaks backwards compatibility only for the multiple-DRI-head case with X Servers that don't use interface 1.1.
- Move irq_busid to drm_irq.h and make it only return the IRQ for the current device. Retains compatibility with previous X Servers, cleans up unnecessary code. This means no irq_busid on !__HAVE_IRQ, but can be changed if necessary.
- Bump interface version to 1.2. This version when set signifies that the control ioctl should ignore the irq number passed in and enable the interrupt handler for the attached device. Otherwise it errors out when the passed-in irq is not equal to the device's.
- Store the highest version the interface has been set to in the device.

From Eric Anholt:
Return EBUSY when attempting to addmap a DRM_SHM area with a lock in it if dev->lock.hw_lock is already set. This fixes the case of two X Servers running on the same head on different VTs with interface 1.1, by making the 2nd head fail to initialize like before.

From Eric Anholt + Jon Smirl:
Don't ioremap the framebuffer area. The ioremapped area wasn't used by anything.

From Michel Daenzer:
Adapt to nopage() prototype change in Linux 2.6.1.
Reviewed by: Arjan van de Ven, additional feedback from William Lee Irwin III and Linus Torvalds.

More differentiated error codes for DRM(agp_acquire)

drm_ctx_dtor.patch
Submitted by: Erdi Chen

Miscellaneous changes from DRM CVS

radeon_drm.h: missing define from previous checkin

* Introduce COMMIT_RING() as in radeon DRM, stop using error prone writeback for ring read pointer (Paul Mackerras)
* Get rid of some superfluous stuff, minor fixes

From Jon Smirl:
This code allows the mesa drivers to use a single definition of the DRM sarea/IOCTLS

[PATCH] fealnx #0: replace dev->base_addr with ioaddr

[PATCH] fealnx #1: replace magic constants with enums

[PATCH] fealnx #2: add 'static'; fix wrapped comment

[PATCH] fealnx #3: fix pointer subtraction bug

[PATCH] fealnx #4: stop doing stop_nic_rx/writel(np->crvalue) in reset_rx_descriptors

this can inadvertently (re)enable tx and/or rx.

[PATCH] fealnx #5: introduce stop_nic_rxtx(), use it where it makes sense

[PATCH] fealnx #6: Francois' fixes for low memory handling; remove free_one_rx_descriptor (not used anymore)

[PATCH] fealnx #7: Garzik fix (IIRC): add locking to tx_timeout

[PATCH] fealnx #8: rework error handling

Add reset timer, fire it 1/2 sec after 'Too much work in interrupt'

Move reset code from tx_timeout into two separate routines: reset_and_disable_rxtx() and enable_rxtx()

New function reset_tx_descriptors(): clean up tx ring after tx_timeout.

tx_timeout now does:
	reset_and_disable_rxtx()
	reset_tx_descriptors()
	enable_rxtx()
	netif_wake_queue()

Absence of the netif_wake_queue() call was probably the cause of tx_timeout() stalling all future tx.

Remove stop_nic_tx(), not used anymore

[PATCH] fealnx #9: fix locking for set_rx_mode

[PATCH] fealnx #10: replace local delay functions with udelay

[PATCH] fealnx #11: cleanup and coding style

[PATCH] pcnet32 add register dump capability

At the next opportunity to add new code to 2.6.6, please apply the following patch to include the capability to dump chip registers.
Ethtool -d support.

[PATCH] pcnet32 timer to free tx skbs for 79C971/972

At the next opportunity to add new code to 2.6.6, please apply the following:

This patch uses an on-chip timer to free completed transmit skb's for the 79C971 and 972 versions which currently will leave completed transmit skb's on the transmit ring until new transmit traffic occurs.

define an empty driver pci ids for ffb driver

[PATCH] Cleanups for b44

Hi!

During some unrelated work I was confused by b44_init_hw. Its return is checked in _open() but nowhere else. I started adding missing checks, but then I found why it's so: it only ever returns 0. So this turns it into void. Killed #if 0-ed piece of code and fixed indentation at one point.

Please apply,
Pavel

[libata sata_sis] add new PCI id

Also remove constant from linux/pci_ids.h.

[libata] Promise driver split part 1: clone to sx4

Clone sata_promise to sata_sx4.

[libata] Promise driver split part 2: remove SX4 code from sata_promise

[libata] Promise driver split part 3: remove TX2/4 code from sata_sx4

[libata] Promise driver split part 4: common header

convert DRM to use pci device structures on Linux, move pci ids into a separate include file (this is auto-generated from the DRM tree)

drmP.h: remove unused structure

[CPUFREQ] powernow-k8 cpuid changes.

cpuid changes to support new processors that will be coming out in the future. Also works around a processor that we have released to the field that can have an erroneous cpuid value.

From paul.devriendt@amd.com

[CPUFREQ] powernow-k8 ignore invalid p-states.

From paul.devriendt@amd.com

[CPUFREQ] powernow-k8: prevent BIOSs offering a vid of 0x1f, which means off.

From paul.devriendt@amd.com

[PATCH] USB: fix usbfs iso interval problem

In 2.6, ISO transfers on USB require a value for urb->interval ... which usbfs didn't provide (until this patch), or let user mode drivers specify.
This patch initializes the urb->interval from the endpoint's descriptor, so ISO transfers should now work from userspace. It also fixes a related problem for interrupt transfers.

[PATCH] USB: root hubs can report remote wakeup feature

The patch lets HCDs report the root hub remote wakeup feature to usbcore through config descriptors, and lets usbcore say whether or not remote wakeup (of host from sleep, by devices) should be enabled.

Both OHCI and UHCI HCDs have some remote wakeup support already; I'm not too sure how well it works. Given (separate) patches, their root hubs can start to act more like other hubs in this area too. That'll make it easier to start using USB suspend mode.

[PATCH] USB: Remove unusual_devs entries for Minolta DiMAGE 7, 7Hi

It looks safe to conclude that the unusual_devs.h entries for the Minolta DiMAGE 7x cameras aren't needed. (Michael has tested the 7Hi and it's definitely unnecessary.) The two other DiMAGE entries probably aren't needed either, but we don't have any evidence of that so I'm leaving them.

[PATCH] USB: unusual_devs.h update

On Tue, 20 Apr 2004, Damian Ivereigh wrote:

> Here is the output of dmesg when plugging in an IBM USB MemKey
>
> usb-storage: This device (0a16,8888,0100 S 06 P 50) has unneeded SubClass and Protocol entries in unusual_devs.h
> Please send a copy of this message to

Thank you for sending this in. Greg and Pete, here's the patch.

[PATCH] USB: Implement endpoint_disable() for UHCI

This patch implements the endpoint_disable method for the UHCI driver, as you requested a while back. It guarantees that during unbinding events (disconnect, configuration change, rmmod) the UHCI driver will have finished using every URB for the interface being unbound. It doesn't quite guarantee that the completion handlers will have finished running, but it would take a pretty unlikely race to violate that assumption. (I think it's the same with the OHCI and EHCI drivers.)
Despite the patch numbering this one applies _after_ as249, which is a more important bugfix.

[PATCH] USB: Eliminate dead code from the UHCI driver

I'm not sure what this piece of code is doing in the UHCI driver. It looks like someone envisioned queuing several URBs for the same endpoint simultaneously. Anyway, the driver can't do that and this code can never run.

[PATCH] USB: fix devio compiler warnings created by previous patch.

[PATCH] USB: usbtest, smp unlink modes

Handle some SMP-visible unlink states better.

[PATCH] USB: re-factor enumeration logic

This is an update to some patches from the December/January timeframe, which will help sort out some of the mess for drivers that need to use the reset logic. It's one of the last significant patches in my gadget-2.6 tree that haven't yet been merged into the main kernel tree.

More refactoring of the enumeration code paths:

* The first half of usb_new_device() becomes the second half of a new hub_port_init() routine (resets, sets address, gets descriptor)
* The middle chunk of hub_port_connect_change() becomes the first half of that new hub_port_init() routine.
* Khubd uses that new routine in hub_port_connect_change().
* Now usb_new_device() cleans up better after faults, and has a more useful locking policy (caller owns dev->serialize).
* Has related minor cleanups including commenting some of the curious request sequences coming from khubd.

Refactoring means a lot of the current usb_reset_device() logic won't need to stay an imperfect clone of the enumeration code ... soon, it can just call hub_port_init().

Even without touching usb_reset_device(), this eliminates a deadlock. Previously, address0_sem was used both during probe and during reset, so probe routines can't implement DFU firmware download (involves a reset; DFU also uncovers other problems) or safely recover from probe faults by resetting (usb-storage can try that). Now that lock is no longer held during probe(); so those deadlocks are gone.
(And some drivers, like at76c503, can start to remove ugly workarounds.)

[PATCH] USB: khubd fixes

This goes on top of the other enumeration patch I just sent, to handle some dubious and/or broken hub configurations better.

Make khubd handle some cases better:

- Track power budget for bus-powered hubs. This version only warns when the budgets are exceeded. Eventually, the budgets should help prevent such errors.
- Rejects illegal USB setup: two consecutive bus powered hubs would exceed the voltage drop budget, causing much flakiness.
- For hosts with high speed hubs, warn when devices are hooked up to full speed hubs if they'd be faster on a high speed one.
- For hubs that don't do power switching, don't try to use it
- For hubs that aren't self-powered, don't report local power status

[PATCH] USB usbfs: take a reference to the usb device

Hi Greg,

this is the first of a series of patches that replace the per-file semaphore ps->devsem with the per-device semaphore ps->dev->serialize. The role of devsem was to protect against device disconnection. This can be done equally well using ps->dev->serialize. On the other hand, ps->dev->serialize protects against configuration and other changes, and has already been introduced into usbfs in several places. Using just one semaphore simplifies the code and removes some remaining race conditions. It should also fix the oopses some people have been seeing.

In this first patch, a reference is taken to the usb device as long as the usbfs file is open. That way we can use ps->dev->serialize for as long as ps exists.
 devio.c | 27 ++++++++++++++++-----------
 inode.c | 3 ---
 2 files changed, 16 insertions(+), 14 deletions(-)

[PATCH] USB usbfs: replace the per-file semaphore with the per-device semaphore

 devio.c | 43 +++++++++++++++++++++++--------------------
 usbdevice_fs.h | 1 -
 2 files changed, 23 insertions(+), 21 deletions(-)

[PATCH] USB usbfs: remove obsolete comment from proc_resetdevice

 devio.c | 3 ---
 1 files changed, 3 deletions(-)

[PATCH] USB usbfs: fix up proc_setconfig

The semaphore is now taken in the caller.

 devio.c | 2 --
 1 files changed, 2 deletions(-)

[PATCH] USB usbfs: fix up proc_ioctl

The semaphore is now taken in the caller.

 devio.c | 2 --
 1 files changed, 2 deletions(-)

[PATCH] USB usbfs: fix up releaseintf

The semaphore is now taken in the callers.

 devio.c | 2 --
 1 files changed, 2 deletions(-)

[PATCH] USB usbfs: destroy submitted urbs only on the disconnected interface

The remaining three patches contain miscellaneous fixes to usbfs.

This one fixes up the disconnect callback to only shoot down urbs on the disconnected interface, and not on all interfaces. It also adds a sanity check (this check is pointless because the interface could never have been claimed in the first place if it failed, but I feel better having it there).

 devio.c | 6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

[PATCH] USB usbfs: missing lock in proc_getdriver

Hi Oliver,

> I expect it to rarely matter, but it might matter now and then. It's
> just a question of hygiene. If you are using a temporary buffer I'd
> like to see it used to full advantage. So either drop the lock or do
> a direct copy. I'd prefer the first option your patch implemented.

I agree. Greg, please consider applying the updated patch:

Protect against driver binding changes while reading the driver name.

[PATCH] USB usbfs: drop pointless racy check

The check of interface->dev.driver requires a lock to be taken to protect against driver binding changes.
But in fact I think it is better just to drop the test. The result is that the caller is required to claim an interface before changing the altsetting, which is consistent with the other routines that operate on interfaces.

 devio.c | 6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

[PATCH] USB: Ignore URB_NO_INTERRUPT flag in UHCI

Following a suggestion of David Brownell's I have decided to remove support for the URB_NO_INTERRUPT flag in the UHCI driver. The overall effect of the flag is to reduce the number of interrupts, thereby improving throughput somewhat while increasing the duration of the remaining IRQ handlers quite a lot (i.e., increasing interrupt variance). So I think we're better off without it. Mind you, this is all in the absence of any firm measurements.

A common case where this will come up is during usb-storage bulk transfers. Such transfers are generally divided into scatter-gather components each corresponding to a single URB and transferring one memory page (4 KB). While generating an interrupt for each one is a little faster than ideal -- about every 3 ms -- it's better than waiting until 64 KB has been transferred and there are 1024 individual TDs to clean up during the IRQ.

[PATCH] USB: Cosmetic improvements for the UHCI driver

This patch makes a few minor improvements to the appearance of the UHCI driver. Please apply.

[PATCH] USB: Altsetting updates for USB media drivers

This patch implements the new altsetting regime for the drivers under usb/media. Not much needed to be changed. I'm unable to test any of the changes, but at least they compile all right (except that I didn't even try to compile the pwc driver since it's marked BROKEN).

The stv680 and w9968cf drivers still include an assumption that they are bound to interface number 0. Since the drivers are fairly tightly linked to a specific kind of device I didn't try to change those assumptions, but maybe they should be changed.
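
The figures quoted in the URB_NO_INTERRUPT entry above (one interrupt about every 3 ms per 4 KB URB, versus 1024 TDs per 64 KB) can be checked with a short calculation. The sketch below is illustrative only: it assumes a full-speed bulk endpoint with a 64-byte maximum packet size, one TD per packet, and the theoretical best case of 19 such packets per 1 ms frame. The function and macro names are made up for this example and are not part of the UHCI driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed full-speed bulk parameters (not taken from the driver source). */
#define MAX_PACKET_BYTES   64u   /* largest full-speed bulk packet */
#define PACKETS_PER_FRAME  19u   /* theoretical best case per 1 ms frame */

/* Number of TDs needed for a transfer, one TD per packet, rounded up. */
size_t tds_for_transfer(size_t bytes)
{
	return (bytes + MAX_PACKET_BYTES - 1) / MAX_PACKET_BYTES;
}

/* Best-case time on the wire in milliseconds (one frame lasts 1 ms). */
double best_case_ms(size_t bytes)
{
	assert(bytes > 0);
	return (double)bytes / (MAX_PACKET_BYTES * PACKETS_PER_FRAME);
}
```

Under these assumptions a 4 KB URB needs 64 TDs and takes a little over 3 ms on the bus, while a 64 KB transfer needs 1024 TDs, which is consistent with the numbers in the entry.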
[PATCH] USB: Altsetting update for USB misc drivers

This is the altsetting update for the drivers under usb/misc. As you can see, not much was needed at all.

[PATCH] USB: Altsetting update for USB net drivers

The only driver under usb/net that needed any altsetting changes was usbnet.

I'm not looking forward to going through all the source files under usb/serial. :-(

[libata] add ata_tf_{to,from}_fis helpers

[libata] clean up taskfile submission to hardware

When writing taskfile (an ATA command) to the controller, the exact setup of the taskfile is dependent on the taskfile "protocol": PIO, PIO Multiple, DMA, Non-data, etc.

So, we separate out the submission of the taskfile to hardware into a separate function ata_qc_issue_prot(), which will later be the home for more code.

Also, remove some dead code (#if 0'd).

[PATCH] sata_vsc initialization fix

drm_irq.h: remove NO_VERSION

MPT Fusion add back FC909 support

From: "Moore, Eric Dean"

[PATCH] qlogicfas: kill horrible irq probing

this patch kills the irq probe and also the I/O probe, because it isn't useful to probe I/O if we can't probe the irq later.

[PATCH] qlogicfas: split and create a new module

[PATCH] qlogic_cs: use qlogicfas408 module

this patch kills qlogic_core.c and I guess the same idea can be applied to other pcmcia scsi drivers. comments?

[PATCH] qla2xxx set current state fixes

- always set_current_state(TASK_UNINTERRUPTIBLE) unless we explicitly check for signals.
- make all timeouts take HZ based values.
[PATCH] SCSI tape log message fixes

This patch changes the st console/log messages:
- __GFP_NOWARN added to buffer allocation to suppress useless messages when having to use smaller than default segments
- move log message from enlarge_buffer() to caller so that the tape name can be printed and remove some debugging messages; now the st messages should include drive name where applicable (a problem reported by Hironobu Ishii)
- setting options is logged only when debugging; the most important options are now seen in sysfs

[PATCH] aic7xxx: fix oops when hardware is not present

From: Herbert Xu

This is because aic7xxx does not unregister itself properly if no devices are found. This patch fixes the problem.

aic7xxx: compile fix for EISA only case

We can't refer to PCI functions for a pure EISA machine.

[PATCH] Update aacraid MAINTAINERS entry

[PATCH] scsi: don't attach device if PQ indicates not connected

[PATCH] 3ware driver update

This patch includes the following driver changes:

1.26.00.038 - Roll driver minor version to 26 to denote kernel 2.6. Add support for cmds_per_lun module parameter.
1.26.00.039 - Fix bug in tw_chrdev_ioctl() polling code. Fix data_buffer_length usage in tw_chrdev_ioctl(). Update contact information.

[libata] move ATAPI startup from katad thread to workqueue thread

libata creates one thread per ata_port structure. This is inadequate for our needs, and also cumbersome to maintain, now that workqueues and Rusty's thread work are available.

This patch begins to move libata away from doing its own per-port thread, by moving the ATAPI command initiation code to work under the workqueue system.

This patch also creates a private workqueue, global to all of libata.

[libata] move PIO data xfer from katad thread to workqueue thread

[libata] move probe execution from katad thread to workqueue thread

This allows us to kill the katad thread itself, and several thread-related variables in struct ata_port.
[PATCH] fix module unload problem in sd

Move scsi_device_get out of sd probe path to allow module to be unloaded when devices are not open.

[libata] move ATAPI command initiation code from libata-scsi to libata-core

[libata] make ata_wq workqueue local to libata-core module

Now that libata-scsi module no longer calls queue_work() directly, we can localize the use of ata_wq.

[libata] internal cleanup: kill ata_pio_start

Integrate it into its caller.

[libata] some work on the ATAPI path

Remove a lot of redundant code in ATAPI packet submission.

ATAPI is still disabled, it doesn't work yet.

[libata] work queueing cleanups and fixes

Make sure to initialize PIO data xfer state.

Use queue_delayed_work() rather than manually calling schedule_timeout(), then queue_work(), ourselves.

[libata] increase max-sectors limit for modern drives

This is the much-discussed "speed up SATA" patch. It limits requests to 1MB as discussed, rather than the hardware maximum (32MB).

As soon as Jens Axboe's patch for dynamically determining request size is merged, max_sectors becomes what it properly should be -- a description of the absolute hardware maximum.

[libata] replace per-command semaphore with optional completion

The semaphore was initialized and up'd for each command, but nobody was listening. Replace this with a completion, which may or may not be present.

[libata promise] make sure our schedule_timeout(N) are never with N==0

Make sure we delay for a minimum desired length of time.
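
The last entry above guards against calling schedule_timeout(N) with N == 0. One way such a zero can arise is integer truncation when a delay in milliseconds is converted to timer ticks. The following is a minimal userspace sketch of that pitfall, not the code from the sata_promise patch; both helper names and the tick rates passed to them are hypothetical.

```c
#include <assert.h>

/*
 * Naive conversion: truncates, so a short delay at a low tick rate
 * becomes 0 ticks and the caller would not sleep at all.
 */
unsigned long msecs_to_ticks_truncating(unsigned long msecs, unsigned long hz)
{
	assert(hz > 0);
	return msecs * hz / 1000;
}

/*
 * Safer conversion: round up to the next whole tick and never return 0,
 * so the caller always waits at least the requested time.
 */
unsigned long msecs_to_ticks_min1(unsigned long msecs, unsigned long hz)
{
	unsigned long ticks;

	assert(hz > 0);
	ticks = (msecs * hz + 999) / 1000;
	return ticks ? ticks : 1;
}
```

With a tick rate of 100 per second, the truncating helper turns a 1 ms delay into 0 ticks, while the rounding helper returns 1.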
drm_pciids.h: add new tdfx id, and blank ffb ids

[PATCH] scsi_disk_release() warning fix

drivers/scsi/sd.c: In function `scsi_disk_release':
drivers/scsi/sd.c:1477: warning: unused variable `sdev'

[libata] remove unused struct ata_engine

[PATCH] sata_sx4.c warning fix

drivers/scsi/sata_sx4.c: In function `pdc20621_put_to_dimm':
drivers/scsi/sata_sx4.c:928: warning: comparison is always true due to limited range of data type

The code is doing, effectively:

	if ((long)(expr returning u32)) >= 0

but on 64-bit architectures, that will always be true. So cast the u32 result to s32 before promoting to long so that bit 31 correctly propagates into bits 32-63.

[PATCH] USB: mtouchusb update for 2.6.6-rc2

The attached patch is for the 3M Touch Systems Capacitive controller.

Quick list of changes:

* Changed reset from standard USB dev reset to vendor reset
* Changed data sent to host from compensated to raw coordinates
* Eliminated vendor/product module params
* Performed multiple successful tests with an EXII-5010UC

The changes are primarily due to comments from Vojtech Pavlik, as well as making the newer EXII-50XXUC controllers work. Thanks to 3M Touch Systems for sending me some new controllers to test with!

An updated HOWTO is also available at:
http://groomlakelabs.com/grandamp/code/microtouch/Linux-Input-USB-Touchscreen-HowTo.txt

[PATCH] USB: audits in usb_init()

there were some missing audits in usb_init()

[PATCH] USB: be assertive in usbfs

Be assertive.

[PATCH] USB: Altsetting updates for usb/serial

The updates needed for proper altsetting handling among the USB serial drivers turned out to be a lot easier than I expected, thanks to the organization of the drivers. Only a handful of changes were needed.

[PATCH] USB: usb-storage driver changes for 2.6.x [1/4]

Patch as239b from Alan Stern: This patch improves the interaction between a SCSI reset, an internally generated reset, and an abort.
This improves our error-recovery in cases where the device is hung (or almost hung) while we're trying to auto-reset.

[PATCH] USB: usb-storage driver changes for 2.6.x [2/4]

This is patch as248b from Alan Stern, modified by myself: This adds a flag which allows us to suppress the "unneeded unusual_devs.h entry" message. This is useful for times when idiotic device manufacturers break the rules and release two different devices with the same VID, PID, and revision number.

[PATCH] USB: usb-storage driver changes for 2.6.x [3/4]

This patch adds some clear-halt calls if a GetMaxLUN fails. Apparently, some devices (like certain early-rev Zip100s) stall their bulk pipes if they receive a GetMaxLUN.

[PATCH] USB: usb-storage driver changes for 2.6.x [4/4]

This is a trivial patch to remove some duplicate includes. sched.h and errno.h are already included in this file about a dozen lines or so above this point.

USB: switch struct urb to use a kref instead of its own atomic_t

[libata sata_sx4] trivial: fix filename in header

USB: removed unused atomic_t in keyspan driver structure.

USB: make ehci driver use a kref instead of an atomic_t

USB: fix incorrect usb-serial conversion for cur_altsetting from previous patch.

[PATCH] USB: Allocate interface structures dynamically

This is a revised version of an earlier patch; I feel a lot better about this one. Basically it does the same thing as before: allocate interfaces dynamically to avoid the problems with reusing them. The difference is that this patch adds a struct kref to the array of usb_interface_cache's, so the array can persist if needed after the device has been disconnected. Each interface takes a reference to it (along with the configuration itself), so as long as the interfaces remain pinned in memory the altsettings will also remain.

Here is a slight revision of patch as246b. This one allocates all the new interfaces before changing any other state; otherwise it's the same.
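
Several entries above replace open-coded atomic_t counters with a kref, and the interface-allocation patch uses one so that cached altsettings outlive a disconnect while interfaces still hold references. The sketch below shows the general reference-counting pattern in plain userspace C. It is a simplified, non-atomic stand-in and not the kernel's kref implementation; every name in it (struct ref, ref_get, ref_put, struct altsetting_cache, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, non-atomic stand-in for a kernel-style reference count. */
struct ref {
	int count;
	void (*release)(struct ref *ref);
};

void ref_init(struct ref *ref, void (*release)(struct ref *ref))
{
	ref->count = 1;          /* the creator holds the first reference */
	ref->release = release;
}

void ref_get(struct ref *ref)
{
	assert(ref->count > 0);  /* taking a reference on a dead object is a bug */
	ref->count++;
}

/* Drop one reference; returns 1 if this was the last one. */
int ref_put(struct ref *ref)
{
	assert(ref->count > 0);
	if (--ref->count == 0) {
		ref->release(ref);
		return 1;
	}
	return 0;
}

/* Hypothetical cache of altsettings shared by a device and its interfaces. */
struct altsetting_cache {
	struct ref ref;          /* must stay the first member, see below */
	int num_altsetting;
};

int caches_released;             /* demo counter so the release is observable */

static void cache_release(struct ref *ref)
{
	/* Valid because ref is the first member of struct altsetting_cache. */
	free((struct altsetting_cache *)ref);
	caches_released++;
}

struct altsetting_cache *cache_alloc(int num_altsetting)
{
	struct altsetting_cache *cache = malloc(sizeof(*cache));

	if (!cache)
		return NULL;
	ref_init(&cache->ref, cache_release);
	cache->num_altsetting = num_altsetting;
	return cache;
}
```

In this model the device would hold the initial reference and each interface would call ref_get(); the cache is freed only when the last holder calls ref_put(), whichever of them that turns out to be.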
USB: fix compiler warnings in devices.c file.

Cset exclude: jejb@mulgrave.(none)|ChangeSet|20040404150128|05866

scsi_get_device needs no NULL check

[PATCH] add SMBIOS tables to sysfs -- UPDATED

My cleanups to the smbios driver.

[PATCH] USB: add new USB PhidgetServo driver

Here is a driver for the usb servo controllers from Phidgets, using sysfs.

Note that the devices claim to be hid devices, so I've added them to the hid_blacklist (HID_QUIRK_IGNORE). A servo controller isn't really an hid device (or is it?).

[PATCH] USB: Lock devices during tree traversal

On Tue, 27 Apr 2004, Greg KH wrote:

> So, what's next in this patch series? :)

Funny you should ask...

While writing those patches I noted a problem, that the USB device tree can change while a process reading /proc/bus/usb/devices is traversing it, leading to an oops when a pointer to a no-longer-existing child device is dereferenced. The ensuing discussion led to the conclusion that the devices' ->serialize locks should be acquired, top-down, while going through the tree.

That means changing the code that populates the devices file and changing the code that adds and removes USB device structures. This patch takes care of the first part. I'm delaying the second part because that section of usbcore is still under change -- David Brownell's revisions have not yet been fully integrated.

A similar change should be made to usb_find_device() and match_device() in usb.c. You may want to add that yourself.

[PATCH] USB: fix sparc64 2.6.6-rc2-mm2 build busted: usb/core/hub.c hubstatus

> 2) An undefined 'hubstatus' variable in drivers/usb/core/hub.c:
>
> CC drivers/usb/core/hub.o
> drivers/usb/core/hub.c: In function `hub_port_connect_change':
> drivers/usb/core/hub.c:1343: error: `hubstatus' undeclared (first use in this function)
> drivers/usb/core/hub.c:1343: error: (Each undeclared identifier is reported only once
> drivers/usb/core/hub.c:1343: error: for each function it appears in.)
> make[3]: *** [drivers/usb/core/hub.o] Error 1
>
> As a total shot in the dark, the following fixes the build (I've no clue
> if it is the right fix):

Yes, it's the right fix. Greg, please merge the attached patch, which will be needed on any big-endian system.

[PATCH] USB: fix PhidgetServo driver

Somehow I managed to send the wrong version. Here is a patch which fixes that. (Remove a dev_info() which wasn't supposed to be there, and make sure that everything is still consistent in the unlikely event that kmalloc() fails). Just minor cleanups.

USB: remove the wait_for_urb function from bfusb driver as it's no longer needed.

USB: fix build error in hci_usb driver due to urb reference count change.

This really needs to get fixed the proper way, by making the urb allocation dynamic in the driver, instead of the hack it is currently doing...

[PATCH] PCI: pci.ids update from sf.net + add IXP4xx to pci_ids.h

[PATCH] PCI: I'm moving

Can you please feed the following patch to Andrew?

[PATCH] PCI: message cleanup in PCI probe

The messages read:

PCI: Address space collision on region 8 of bridge 0000:00:1f.0 [1180:11bf]
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
Transparent bridge - 0000:00:1e.0
PCI: Using IRQ router PIIX/ICH [8086/24cc] at 0000:00:1f.0
PCI: Found IRQ 11 for device 0000:00:1f.1

The following patch adds "PCI: " in front of the message and KERN_INFO as well. Compile&boot tested.

Jochen

[PATCH] PCI Hotplug: rpaphp: set eeh option (enabled) prior to any i/o to newly added IOA

The attached patch fixes the problem I found during DLPAR I/O slots testing on our new hardware. rpaphp needs to set eeh-option(enabled) for a newly added IOA prior to performing PCI config (pci_setup_device), otherwise the pci_dev of the IOA will have invalid base address information. Linas Vepstas implemented the eeh changes.
[PATCH] PCI Hotplug: RPA DLPAR remove slot, return code fix

[PATCH] PCI Hotplug: Clean up acpiphp_core.c: null checks

If the "struct hotplug_struct *" parameter to any function in hotplug_slots_ops is ever NULL something bogus is going on. In this case we should just oops and not hide the bug.

This also fixes the driver name used in debug messages.

[PATCH] PCI Hotplug: Clean up acpiphp_core.c: slot_paranoia_check

Matthew Wilcox wrote:
> On Thu, Apr 22, 2004 at 01:18:23PM +0200, Rolf Eike Beer wrote:
> > slot_paranoia_check is only another kind of checking everything for NULL.
> > Removing this leads to function get_slot is reduced to a simple cast, so
> > this function can be killed also.
>
> Since private is void *, you don't even need the casts.
>
> > static int enable_slot (struct hotplug_slot *hotplug_slot)
> > {
> > -	struct slot *slot = get_slot(hotplug_slot, __FUNCTION__);
> > +	struct slot *slot = (struct slot *)hotplug_slot->private;
>
> 	struct slot *slot = hotplug_slot->private;
>
> is enough.

Fixed.

[PATCH] PCI Hotplug: Clean up acpiphp_core.c: coding style

This patch kills the space before the opening brace in function declarations. It also beautifies some ugly return statements.

[PATCH] PCI Hotplug: Clean up acpiphp_core.c: kill hardware_test

The function hardware_test only tells that there are no tests. If we just kill it the file "test" in the slot's directory will not show up which means pretty much the same.

[PATCH] PCI Hotplug: Clean up acpiphp_core.c: use goto for error handling

This one converts the error handling in init_slots to use gotos to avoid code duplication.

[PATCH] PCI Hotplug: Clean up acpiphp_core.c: return

Fix 2 very ugly return constructs.

[PATCH] PCI Hotplug: Clean up acpiphp_core.c: remove 3 get_* functions

If we remove these 3 get_* functions the pci hotplug core will do the same thing for us.
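
Two of the acpiphp_core.c cleanups above are easy to show in isolation: reading a void * private pointer without a cast, and using goto so that every failure path in an init routine shares one unwind sequence. The following is a userspace sketch of both ideas, not the driver code. The structs are cut-down stand-ins, and demo_alloc(), demo_free(), allocs_left, live_allocs, make_slot() and slot_number() are all hypothetical names added so the failure path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Test hooks for the demo allocator (hypothetical, for illustration only). */
int allocs_left = -1;    /* < 0 means "never fail" */
int live_allocs;         /* allocations not yet freed */

static void *demo_alloc(size_t size)
{
	void *p;

	if (allocs_left == 0)
		return NULL;             /* simulated out-of-memory */
	if (allocs_left > 0)
		allocs_left--;
	p = calloc(1, size);
	if (p)
		live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	assert(p != NULL && live_allocs > 0);
	live_allocs--;
	free(p);
}

struct slot {
	int number;
};

struct hotplug_slot {
	void *private;           /* driver-private data, here a struct slot */
	char *name;
};

/* Allocate one slot; every failure path unwinds through a single ladder. */
struct hotplug_slot *make_slot(int number)
{
	struct hotplug_slot *hotplug_slot;
	struct slot *slot;

	hotplug_slot = demo_alloc(sizeof(*hotplug_slot));
	if (!hotplug_slot)
		goto error;

	hotplug_slot->name = demo_alloc(16);
	if (!hotplug_slot->name)
		goto error_hpslot;

	slot = demo_alloc(sizeof(*slot));
	if (!slot)
		goto error_name;

	slot->number = number;
	hotplug_slot->private = slot;
	return hotplug_slot;

error_name:
	demo_free(hotplug_slot->name);
error_hpslot:
	demo_free(hotplug_slot);
error:
	return NULL;
}

int slot_number(struct hotplug_slot *hotplug_slot)
{
	/* private is void *, so no cast is needed in C. */
	struct slot *slot = hotplug_slot->private;

	return slot->number;
}
```

Each label frees exactly what was allocated before the failing step, so adding another allocation means adding one check and one label rather than duplicating the cleanup in every branch.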
[PATCH] CompactPCI Hotplug: remove unneeded function for parameter handling

A special function for handling the parameters in the non-module case is not needed; the MODULE_* macros handle this for compiled-in situations as well.

[PATCH] CompactPCI Hotplug: kill magic number

slot->magic is not used anymore since slot_paranoia_check is dead, so just kill it.

[PATCH] CompactPCI Hotplug ZT5550: use new style of module parameters

Convert the driver to use the new interface for module parameters. Also fix the driver name used in debug messages.

Eike

[PATCH] ACPI PCI Hotplug: use new style of module parameters

This one converts acpiphp_core.c to use the new interface for module parameters.

[PATCH] ACPI PCI Hotplug: kill magic number

The magic slot number was only another type of checking the validity of a pointer. These checks are all gone so magic can follow them.

[PATCH] ACPI PCI Hotplug: use goto for error handling

This one fixes another space before an opening brace I missed before and optimizes the error paths in init_slots a bit more.

[PATCH] ACPI PCI Hotplug: coding style fixes

Some minor coding style fixes:
-space before opening brace of function
-wrap some long lines
-change some indentations from spaces to tabs

[PATCH] ACPI PCI Hotplug: add a BUG() where one should be

If there is a condition with the comment "should never happen" it is a good place for a BUG() if it is ever reached.

[PATCH] PCI Hotplug skeleton: use new style of module parameters

Convert the PCI hotplug skeleton driver to use new style of module parameter handling.

[PATCH] PCI Hotplug skeleton: remove useless NULL checks

This one removes all the useless NULL checks including slot_paranoia_check, get_slot and the magic number from the PCI hotplug skeleton driver. Also some lines containing only a single tab are fixed.
[PATCH] PCI Hotplug skeleton: fix codingstyle Coding style fixes for pcihp_skeleton.c: remove spaces before opening braces and change a comment in function hardware_test to make clearer that the functions purpose is not to tell the user there are no tests. [PATCH] PCI Hotplug skeleton: mark functions __init/__exit Add __init and __exit to some functions only called from __init/__exit context. [PATCH] PCI Hotplug skeleton: use goto for error handling Convert PCI hotplug skeleton driver to use goto for error handling in init_slots to avoid code duplication. [PATCH] PCI Hotplug skeleton: final cleanups Some final fixes for the skeleton driver: -spaces before opening brace -add a better example for hardware_test function -remove a "int retval" in a void function -some more coding style changes -changed enough stuff: increase version number -fix a typo in a comment [PATCH] PCI Express Hotplug: fix coding style [PATCH] PCI Express Hotplug: remove useless kmalloc casts The result of kmalloc does not need to be casted, it is a void * which can be assigned to any pointer variable. Also avoid code duplication in one if statement. [PATCH] PCI Express Hotplug: splut pciehp_ctrl.c::configure_new_function configure_new_function is way too big (>600 lines). Split it in 2 functions, one for the new functions and one for bridges. And split out a small piece from the bridge function which is used twice to it's own function. Patch is huge because of the identation changes but does nothing than the split and some minor coding style changes. [PATCH] Compaq PCI Hotplug: coding style fixes The usual coding style fixes, this time for cpqphp_ctrl.c and cpqphp.c. [PATCH] Compaq PCI Hotplug: remove useless NULL checks Remove some useless NULL and magic checks from Compaq PCI Hotplug driver. If one of this pointers is invalid we are in bad trouble anyway. 
[PATCH] Compaq PCI Hotplug: move huge inline function out of header file set_controller_speed is implemented in cpqphp.h but only used in cpqphp_ctrl.c and it's much too big to be defined in a header file. This patch moves it to cpqphp_ctrl.c. Also the inline attribute is removed, this function is called from 2 places and to big to be an inline. [PATCH] Compaq PCI Hotplug: use new style of module parameters Convert Compaq PCI Hotplug driver to use the new style of module parameters. [PATCH] Compaq PCI Hotplug: more coding style fixes Fix a lot of coding style issues in Compaq PCI hotplug: -spaces before opening brace of functions -much too much C++ style comments -wrap long lines -remove some comments where the code does not really need to be explained Eike [PATCH] Compaq PCI Hotplug: split up hardware_test This puts the LED shifting used as "hardware test" in a function to make cpqhp_hardware_test much smaller and easier to read. Also changes some comments from C++ to C style. [PATCH] Compaq PCI Hotplug: use goto for error handling Convert ctrl_slot_setup to use goto for error handling and fix some minor coding style things. [PATCH] Compaq PCI Hotplug: remove useless NULL checks from cpqphp_core.c Remove some useless NULL checks in cpqphp_core.c [PATCH] Compaq PCI Hotplug: fix C++ style comments This is not C++! Fix comments from C++ style to C style, removing some useless ones (e.g. no need to tell up and down protect a critical section). [PATCH] Compaq PCI Hotplug: use goto for error handling in cpqphp_ctrl.c Change cpqphp_ctrl.c to use goto for error handling. [PATCH] Compaq PCI Hotplug: coding style fixes for cpqphp_ctrl.c Some coding style fixes I missed last time. [PATCH] Compaq PCI Hotplug: some final fixes for cpqphp_core.c Final small fixes for cpqphp_core.c: -use better error handling in one_time_init -small coding style fixes -the name of the driver is not "pci_hotplug" -add an __exit for unload_cpqphp -changes enough to increment version, isn't it? 
[PATCH] PCI Hotplug: Remove type magic from kmalloc This patch removes the cast of kmalloc's results to the target pointer type. Also it fixes kmalloc to use sizeof(*foo) instead of sizeof(type_of_foo) as suggested by Matthew Wilcox. Also removes a few useless checks if a pointer is NULL before calling kfree: kfree checks this itself. [PATCH] [BUGFIX] shpchp_pci.c: fix missing braces after if Add missing braces around if statement, if not we will try to add devices for an empty slot. [PATCH] PCI Hotplug Core: use new style of module parameters Convert PCI Hotplug Core to new style of module parameter handling. [PATCH] PCI Hotplug: Move an often used while loop to an inline function Walking through a pci_resource list and freeing all members is done a lot of times in unload functions. This patch moves this to an inline function in pciehp_core.c, pciehp_pci.c, shpchp_core.c and shpchp_pci.c. This shrinks the code a lot (some 200 lines) and makes it much easier to read. Also adds some __exit. [PATCH] PCI Express Hotplug: use new style of module parameters This converts PCI Express Hotplug to the new style of module parameter handling. [PATCH] Compaq PCI Hotplug: fix missing braces Fix missing braces. It does not change the code but makes it easier to read. PCI Hotplug: fix stupid build bugs caused by previous patches. Doesn't anyone build their patches anymore before sending them out... [PATCH] PCI Express Hotplug: remove useless NULL checks Remove useless NULL checks and magic numbers from PCI Express Hotplug, also some minimal coding style fixes. [PATCH] RPA PCI Hotplug: Remove useless NULL checks Remove useless NULL checks and magic numbers from rpaphp. If one of these ever becomes invalid we are in serious trouble anyway. [PATCH] Compaq PCI Hotplug: kill useless kmalloc casts This patch removes the cast of kmalloc's results to the target pointer type. Also it fixes kmalloc to use sizeof(*foo) instead of sizeof(type_of_foo) as suggested by Matthew Wilcox. 
Also removes a few useless checks if a pointer is NULL before calling kfree: kfree checks this itself. [PATCH] PCI Express Hotplug: kill hardware_test The hardware_test function of the PCI Express Hotplug driver is empty. It's better to completely kill this to tell the user hardware tests are not supported by this driver. [PATCH] Compaq PCI Hotplug: remove useless NULL checks from cpqphp_ctrl.c Remove useless NULL checks from cpqphp_ctrl.c. Under normal circumstances there is no chance for any of these functions to get called with a NULL argument. If we are in such trouble that we get a NULL pointer don't hide it, just oops. [PATCH] PCI Express Hotplug: use goto for error handling This changes pciehp_core.c::init_slots to use goto for error handling. Also a magic missed by previous patches is killed. [PATCH] PCI Express Hotplug: codingstyle fixes for pciehp.h Some small coding style fixes and a typo fix for pciehp.h [PATCH] PCI Express Hotplug: remove useless kmalloc casts The result of kmalloc does not need to be cast to any other pointer type. Also use sizeof(*foo) instead of sizeof(type_of_foo) and wrap some long lines. [PATCH] PCI Express Hotplug: some cleanups Some coding style fixes and small cleanups for pciehp_core.c: -wrap long lines -kill spaces before opening braces of functions -remove code duplication where both parts of an if statement do exactly the same -kill some useless comments -kill an unneeded initialisation [PATCH] PCI Express Hotplug: mark global variables static Don't know why, but it looks like a good idea to mark these global variables static.
[PATCH] PCI Express Hotplug: kill more useless casts This patch does two things: -remove casts of pointers which are void* or already the correct type for the target -if we dereferenced a struct member and copied this to its own variable use this and don't dereference the member again [PATCH] PCI Express Hotplug: codingstyle fixes for pciehp_pci.c This is a bunch of coding style fixes (wrap long lines, whitespace etc.) for pciehp_pci.c [PATCH] RPA PCI Hotplug: use new style of module parameters The debug parameter of rpaphp is only used as a boolean so we can scan the commandline of it like a boolean parameter. [PATCH] RPA PCI Hotplug: kill get_cur_bus_speed from rpaphp_core.c The get_cur_bus_speed function of rpaphp does nothing that the PCI Hotplug Core would not do by itself if this function does not exist, so just kill it. [PATCH] RPA PCI Hotplug: codingstyle fixes for rpaphp_core.c Some coding style fixes for rpaphp_core.c: -s/return(foo)/return foo/ -some whitespace fixes -document function in proper way Eike [PATCH] RPA PCI Hotplug: fix up init_slots in rpaphp_core.c rpaphp_core.c::init_slots is not more than a for loop and is called only from one place, this inlines the important 3 lines. Also add some __init and __exit. [PATCH] RPA PCI Hotplug: remove useless NULL checks from rpaphp_core.c Remove two useless NULL checks from rpaphp_core.c [PATCH] RPA PCI Hotplug: use goto for error handling in rpaphp_slot.c Convert rpaphp_slot.c::alloc_slot_struct to use goto for error handling. Also some small coding style fixes. [PATCH] RPA PCI Hotplug: codingstyle fixes for rpaphp_pci.c Some coding style fixes for rpaphp_pci.c. [PATCH] SHPC PCI Hotplug: use new style of module parameters Convert shpchp_core.c to use new style of module handling. Eike [PATCH] SHPC PCI Hotplug: kill hardware_test shpchp_core.c::hardware_test is empty. If we remove it we tell the user that hardware tests are not supported at all.
[PATCH] SHPC PCI Hotplug: fix cleanup_slots to use a release function shpchp is the only driver which does not use a release function for the slot struct. This adds one and does some minor coding style fixes. Also no one cares about the return value of cleanup_slots (which is always 0 anyway) so we can make the function void. [PATCH] SHPC PCI Hotplug: use goto for error handling Convert shpchp_core.c::init_slots to use goto for error handling. [PATCH] SHPC PCI Hotplug: codingstyle fixes Some small coding style fixes for shpchp_core.c. [PATCH] SHPC PCI Hotplug: kill useless NULL checks [PATCH] SHPC PCI Hotplug: more coding style fixes A big bunch of coding style fixes for shpchp_ctrl.c and shpchp_pci.c Eike [PATCH] SHPC PCI Hotplug: remove some useless casts Remove a useless cast: pci_add_new_bus returns a struct pci_bus*, so no need to cast. PCI Hotplug: fix build error due to previous patches. Fix errors in [PATCH] aic7xxx: fix oops when hardware is not present This patch was causing a boot panic. Now fixed. [PATCH] USB: USB altsetting updates for ISDN Hisax driver The USB core is changing the way interfaces and altsettings are stored. They are no longer required to be in numerical order, and as a result, simply indexing the interface and altsetting arrays won't work as expected. This patch for the st5481 takes these changes into account. A simpler approach would be to store a pointer to the struct usb_host_interface rather than look it up repeatedly, but I'm not very familiar with this driver and didn't want to attempt such an alteration. [PATCH] USB: Alcatel TD10 Serial to USB converter cable support The Alcatel TD10 USB to Serial converter cable (for use with an Alcatel OT 535 or 735(i) mobile phone) seems to be a repackaged Alcatel version of the Prolific 2303 adapter. And as such, simply adding its vendor/product id (0x11f7/0x02df) to drivers/usb/serial/pl2303.c seems to be enough to make it work.
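A small user-space sketch of the allocation idiom the kmalloc cleanup entries above describe: no cast on the allocator's void * result, sizeof(*ptr) instead of sizeof(type), and no NULL check before freeing. The struct and function names are hypothetical, and malloc()/free() stand in for kmalloc()/kfree(); free(NULL) is a defined no-op in C, which mirrors the entry's note that kfree checks for NULL itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical resource node, loosely modelled on a linked resource list. */
struct demo_resource {
	unsigned int base;
	unsigned int length;
	struct demo_resource *next;
};

static struct demo_resource *demo_new_resource(unsigned int base,
					       unsigned int length)
{
	struct demo_resource *res;

	/*
	 * No cast: void * converts to any object pointer type in C.
	 * sizeof(*res) stays correct even if the type of res changes later.
	 */
	res = malloc(sizeof(*res));
	if (!res)
		return NULL;

	memset(res, 0, sizeof(*res));
	res->base = base;
	res->length = length;
	return res;
}

static void demo_release_resource(struct demo_resource **res)
{
	/* No "if (*res)" guard needed: free(NULL) does nothing. */
	free(*res);
	*res = NULL;
}
```

Calling demo_release_resource() twice on the same pointer is harmless because the pointer is reset to NULL after the first call.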
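The Alcatel TD10 entry above amounts to adding one more ID pair to a driver's device table. The following user-space sketch shows the idea with a hypothetical table and lookup function (demo_usb_id, demo_id_table, demo_id_match); it assumes 0x11f7 is the vendor ID and 0x02df the product ID, and that 0x067b/0x2303 identifies the original Prolific adapter. It does not reproduce the real struct usb_device_id or the pl2303 driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ID pair; the kernel's table type has more fields. */
struct demo_usb_id {
	uint16_t vendor;
	uint16_t product;
};

static const struct demo_usb_id demo_id_table[] = {
	{ 0x067b, 0x2303 },	/* assumed: Prolific PL2303 */
	{ 0x11f7, 0x02df },	/* assumed: Alcatel TD10 (new entry) */
};

/* Return 1 if the given vendor/product pair is listed in the table. */
static int demo_id_match(uint16_t vendor, uint16_t product)
{
	size_t i;
	size_t count = sizeof(demo_id_table) / sizeof(demo_id_table[0]);

	for (i = 0; i < count; i++) {
		if (demo_id_table[i].vendor == vendor &&
		    demo_id_table[i].product == product)
			return 1;
	}
	return 0;
}
```

Because matching is table-driven, supporting a repackaged device needs no new code paths, only a new row.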
[PATCH] USB Gadget: gadget zero and USB suspend/resume This patch lets gadget zero be more useful in testing usb suspend and resume. It prints messages on suspend() and resume(), and supports an "autoresume=N" mode to wake the host after N seconds. [PATCH] USB: reject urb submissions to suspended devices This patch rejects URB submissions to suspended devices, so that they don't get hardware-specific fault reports. Instead, they get the same code (-EHOSTUNREACH) for all HCDs. It also fixes a minor problem with colliding declarations of the symbol USB_STATE_SUSPENDED. [PATCH] USB: LEGO USB Tower driver v0.95 here is the latest version 0.95 of the LEGO USB Tower driver against 2.6.6-rc3 which corrects a lot of problems in the version currently in the kernel, most notably sleeping in interrupt context and improper locking. Please apply. It has been thoroughly tested with UHCI, OHCI and EHCI host controllers using Lejos and NQC. Firmware and program download, and with proper modifications all communication protocols supported by Lejos work, as well as firmware and program download and datalog upload in NQC. Notes to application maintainers/protocol designers: - Small modifications are needed in communication protocols because the tower tends to discard the first byte of transmissions. So for example LNP needs to send an extra byte like 0xff before the packet, and F7 handlers needs to cope with a lost 0x55. - I suggest /dev/usb/legousbtower0 etc. as the standard device names. This puts it in the same place as the other USB devices and makes clear which driver is responsible for these devices. [PATCH] USB: usbfs: change extern inline to static inline And change __inline__ to inline and get rid of an unused function while at it. [PATCH] USB: fix WARN_ON in usbfs On Tuesday 27 April 2004 10:58, Oliver Neukum wrote: > Am Dienstag, 27. 
April 2004 00:14 schrieb Greg KH: > > On Mon, Apr 26, 2004 at 04:05:17PM +0200, Duncan Sands wrote: > > > diff -Nru a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c > > > --- a/drivers/usb/core/devio.c Mon Apr 26 13:48:28 2004 > > > +++ b/drivers/usb/core/devio.c Mon Apr 26 13:48:28 2004 > > > @@ -350,8 +350,8 @@ > > > * all pending I/O requests; 2.6 does that. > > > */ > > > > > > - if (ifnum < 8*sizeof(ps->ifclaimed)) > > > - clear_bit(ifnum, &ps->ifclaimed); > > > + BUG_ON(ifnum >= 8*sizeof(ps->ifclaimed)); > > > > I've changed that to a WARN_ON(). Yeah, writing over memory is bad, but > > oopsing is worse. Let's be a bit nicer than that. > > You aren't nice that way. An oops has localised consequences. Scribbling > over memory can cause anything. Hi Greg, if won't accept a BUG_ON, how about the following? [PATCH] USB: dummy_hcd, root port wakeup/suspend Here's what's in my tree to make dummy_hcd do suspend and wakeup correctly ... that is, making its emulated root hub and gadget work more like real ones. It's easier to do this for fake hardware than the real stuff. But real drivers tend to need very similar changes ... :) - Dave p.s. This does not depend on the suspend/resume patch. And it doesn't do "global" suspend (of root hub). [PATCH] USB: DSBR-100 tiny patch On Fri, Feb 06, 2004 at 10:17:32AM -0800, Greg KH wrote: > On Fri, Feb 06, 2004 at 05:06:01PM +0100, Markus Demleitner wrote: > > Since I finally switched over to 2.6 I noticed that my dsbr100 driver > > produces a warning to the effect that I should provide a release > > callback. After a quick google on the issue I came to the conclusion > > No, you will have to fix up your driver to work properly, sorry. It's > due to the changes to the v4l layer to handle removable devices much > better (and to tie it into the driver model.) I didn't get around to doing real work on this until now, but finally in the attachment there's my stab at bringing dsbr100 up to kernel 2.6. 
I'm not really comfortable with the release callback issues (I've yet to find some HOWTO-like documentation on this...) on the v4l side, so I'd be grateful if you could have a look at it. I've basically tried to copy what stv680 does, which may or may not have been a good idea (in particular see the comment above the disconnect function). I've used the opportunity for some code beautifying, which of course makes the patch a bit of a mess. I hope you won't mind too much -- as you can see, it would have been pretty messy anyway. [PATCH] I2C: Add LM99 support to the lm90 driver The following patch adds support for the LM99 chip to the lm90 driver, by popular request. The nVidia GeForce FX 5900 series cards have such a chip on-board for monitoring the GPU temperature. Relevant pointers: http://archives.andrew.net.au/lm-sensors/msg07671.html http://secure.netroedge.com/~lm78/readticket.cgi?ticket=1661 http://secure.netroedge.com/~lm78/readticket.cgi?ticket=1662 Additional effects of the patch: * Do not consider the lm90 driver experimental anymore. I have had enough testers and not a single problem report, the driver is working OK. * Support the LM89. According to the datasheets, it is exactly the same chip as the LM99 (to the chip ID). We've never seen this chip in a computer so far, but it doesn't cost anything to support it (actually we cannot not support it, since we have no way to differentiate it from the LM99). * Scan two addresses instead of one. The LM99 and LM89 have a "-1" variant using an alternate address. * Update copyright year. * Reword the identification code a bit. It is hopefully slightly less unreadable. This patch was successfully tested by Corey Hickey. [PATCH] PCI Hotplug: pciehp-linkage-fix.patch This fixes allyesconfig I2C: rename i2c-ip4xx.c driver [PATCH] I2C: Voltage conversions in via686a My previous patch was actually not correct, reading from the chip's registers was fixed but writing limits to it wasn't.
This new version of the patch should be better. Sorry for the trouble. [PATCH] I2C: support I2C_M_NO_RD_ACK in i2c-algo-bit I have an I2C device (Samsung ks0127 video grabber) with a peculiar i2c implementation. When reading bytes, it only senses for the stop condition in the place where the acknowledge bit should be. So, to properly support this device acks need to be turned off during reads. There is an I2C_M_NO_RD_ACK bit already defined in i2c.h which appears to be what I want. Unfortunately it doesn't seem to be used anywhere in the current tree. At the end of this message is a patch to teach i2c_algo_bit to honor the bit. [PATCH] I2C: Update IXP4xx I2C bus driver The 2.6 IXP4xx code has been cleaned up to change all references to IXP42x/IXP425 with IXP4xx. The following patch updates the I2C bits. Before applying, you need to 'bk move i2c-ixp42x.c ixp-4xx.c". [PATCH] I2C: add I2C epson 8564 RTC chip driver Add support for the Epson 8564 RTC chip. Update cifs change log [PATCH] Class support for ppdev.c [PATCH] Add class support to drivers/char/ip2main.c [PATCH] add class support to drivers/block/paride/pg.c This patch adds class support to pg.c, the parallel port generic ATAPI device driver. I have verified it compiles but do not have the hardware. If someone does and could test that would be helpful. [PATCH] add class support to drivers/block/paride/pt.c This patch adds class support to pt.c which "the high-level driver for parallel port ATAPI tape drives based on chips supported by the paride module." Which I dont have in order to test. I have verified it compiles but can not test it. If someone who has the hardware could I would appreciate it. [PATCH] add class support to drivers/char/tipar.c This patch adds class support to the Texas Instruments graphing calculators with a parallel link cable. I have verified it compiles. If someone has the hardware please verify it works. 
[PATCH] Re: Platform device matching On Mon, Apr 26, 2004 at 12:27:33AM +0100, Russell King wrote: > So, this comment needs updating: > > * So, extract the from the device, and compare it against > * the name of the driver. Return whether they match or not. Want a patch? Add missing cifs protocol data unit definitions [TG3]: Add eeprom dump support. [PATCH] USB: esthetic and trivial patch. [PATCH] USB: Altsetting update for USB IrDA driver This patch updates the USB IrDA driver to take into account that the kernel may no longer store altsetting entries in numerical order. The driver only needed one change; this was a simple matter of using the entry corresponding to the altsetting that was just installed. [PATCH] USB: update for mtouchusb The attached patch for the 3M Touch Systems Capacitive controller. (again) Quick list of changes: * decrease mtouch->open counter in the event of a urb submission failure The changes are due to comments Oliver Neukum's comments on the touchkit.c driver. Good catch! Sorry I missed it. http://marc.theaimsgroup.com/?l=linux-usb-devel&m=108343028201159&w=2 ia64: Avoid ".save rp, r0" since the kernel unwinder doesn't support it yet. Once we switch to a libunwind-based kernel unwinder, this code can be re-enabled again. Add smb copy function fix truncated directory listings on large directories to Samba (when Unicode and Unix extensions enabled) [PATCH] sym53c500_cs PCMCIA SCSI driver (round 5) Fifth attempt at a PCMCIA SCSI driver for the Symbios 53c500 controller. This version has all the cleanup Christoph has requested to date, including removal of support for the obsolete (in 2.6) proc_info functionality. Support for additional sysfs class device attributes has been added: two are read-only (irq, ioport), one is read-write (fast_pio). The read-write attribute is a per-instance flag indicating the PIO speed of the particular HBA: valid values are 1 (enabled -- default) and 0 (disabled). 
[PATCH] PATCH: (as255) Handle Unit Attention during INQUIRY better Some buggy USB storage devices can return Unit Attention status for INQUIRY commands. The current code in scsi_scan.c checks for ASC = 0x28 = Not ready to ready transition, but these devices can also return ASC = 0x29 = Power-on or reset occurred. In addition, the code doesn't retry the INQUIRY when these codes are received. [PATCH] minor changes to qla1280 driver On one of our big machines we found a problem with posted writes while running AIM. Two writes of the Request Queue In pointer went out of order, making the chip think that it had a queue wrap. I took advantage of this opportunity to add relaxed reads, which helps the Altix. It should not affect other arches. All reads are relaxed except for the read of the Semaphore register. [PATCH] aic7xxx deadlock fix We cannot call del_timer_sync() from within that timer's handler function! [PATCH] (5/5) pcmcia/nsp: use kernel.h min/max/ARRAY_SIZE From: Michael Veeck Subject: [Kernel-janitors] [PATCH] drivers/scsi/pcmcia MIN/MAX/NUMBER removal Patch (against 2.6.6-rc1) removes unnecessary min/max/number macros and changes calls to use kernel.h macros instead. drivers/scsi/pcmcia/nsp_cs.c | 12 ++++++------ drivers/scsi/pcmcia/nsp_cs.h | 2 -- 2 files changed, 6 insertions(+), 8 deletions(-) [PATCH] (2/5) aic7xxx_old: use kernel.h min/max/ARRAY_SIZE From: Michael Veeck Subject: [Kernel-janitors] [PATCH] drivers/scsi/aic7xxx_old MIN/MAX/NUMBER removal Patch (against 2.6.6-rc1) removes unnecessary min/max/number macros and changes calls to use kernel.h macros instead.
drivers/scsi/aic7xxx_old.c | 43 ++++++++++++++------------------ drivers/scsi/aic7xxx_old/aic7xxx_proc.c | 6 ++-- 2 files changed, 23 insertions(+), 26 deletions(-) [PATCH] (4/5) nsp32 (ninja): use kernel.h min/max/ARRAY_SIZE From: Michael Veeck Subject: [Kernel-janitors] [PATCH] drivers/scsi/nsp MIN/MAX/NUMBER removal Patch (against 2.6.6-rc1) removes unnecessary min/max/number macros and changes calls to use kernel.h macros instead. drivers/scsi/nsp32.c | 24 ++++++++++++------------ drivers/scsi/nsp32.h | 4 ---- 2 files changed, 12 insertions(+), 16 deletions(-) [PATCH] (3/5) ncr53c8x: use kernel.h min/max From: Michael Veeck Subject: [Kernel-janitors] [PATCH] drivers/scsi/53c* MIN/MAX removal Patch (against 2.6.6-rc1) removes unnecessary min/max macros and changes calls to use kernel.h macros instead. drivers/scsi/ncr53c8xx.c | 6 +++--- drivers/scsi/sym53c8xx_comm.h | 5 +---- 2 files changed, 4 insertions(+), 7 deletions(-) [PATCH] support swsusp for aic7xxx From: Pavel Machek Marks threads as needed for suspend. DESC aic79xx_osm.c build fix EDESC drivers/scsi/aic7xxx/aic79xx_osm.c: In function `ahd_linux_dv_thread': drivers/scsi/aic7xxx/aic79xx_osm.c:2594: `PF_IOTHREAD' undeclared (first use in this function) [netdrvr b44] ethtool_ops support [netdrvr b44] use netdev_priv [netdrvr b44] use miilib for MII ioctl handling [PATCH] add simple class for adb This adds /sys/class/adb/, removes unused devfs lines and updates a comment to match reality. PCI Hotplug: revert broken PCI Express hotplug patch USB: add support for Zire 31 devices. Info was from Adriaan de Groot [PATCH] USB Gadget: fix pxa define in gadget_chips.h below is a trivial patch which fixes the PXA gadget define in drivers/linux/usb/gadget/gadget_chips.h Everywhere CONFIG_USB_GADGET_PXA2XX is used, except in that file, which bites obviously ... Fix define for PXA UDC. 
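The MIN/MAX/NUMBER removal entries above replace per-driver helper macros with shared ones. The sketch below illustrates the idea in user space, under stated simplifications: ARRAY_SIZE is defined locally, and typed inline functions (min_int, max_int) stand in for the kernel.h min()/max() macros, which are type-checking macros rather than functions. The sync_periods table and clamp_period() are hypothetical.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Typed helpers evaluate each argument exactly once, unlike a naive
 * "#define MIN(a, b) ((a) < (b) ? (a) : (b))" copied into each driver. */
static inline int min_int(int a, int b)
{
	return a < b ? a : b;
}

static inline int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Hypothetical, ascending table of supported values. */
static const int sync_periods[] = { 25, 50, 100, 200 };

/* Clamp a requested value to the range covered by the table. */
static int clamp_period(int requested)
{
	int lowest = sync_periods[0];
	int highest = sync_periods[ARRAY_SIZE(sync_periods) - 1];

	return min_int(max_int(requested, lowest), highest);
}
```

Keeping one shared definition means a fix to the helper reaches every caller, which is the point of the janitorial patches.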
[PATCH] USB Gadget: fix g_serial debug module parm g_serial.ko can't be load as module because "debug" is only defined if G_SERIAL_DEBUG is defined, but "debug" is referenced in MODULE_PARM(). [PATCH] USB: usbnet handles Billionton Systems USB2AR This adds another ax8817x device to "usbnet". [PATCH] USB: add support for eGalax Touchscreen USB this is the second version of the patch to add support for eGalax Touchkit USB touchscreen. changes since last patch: - fixed the bug in open, found by oliver neukum - renamed driver from touchkit.c to touchkitusb.c (since the thing also exists as RS232, PS/2 and I2C) - some minor coding style updates [PATCH] USB: Reduce kernel stack usage This patch allocates a temporary array from the heap instead of from the kernel's stack in usb_set_configuration(). It also updates a few comments. Please apply. [PATCH] USB Storage: unusual_devs.h update On 4 May 2004, Rajesh Kumble Nayak wrote: > The Above patch work fine for Sony Hc-85 > I shall post the dmesg entry soon. > > With many thanks > Rajesh Greg and Pete, here's the patch. It's possible that this entry could be combined with the previous one, but until we know definitely they should be kept separate. [PATCH] USB: Small change to CPiA USB driver Only one aspect of it is notable: The CPiA USB driver calls usb_driver_release_interface() during its disconnect() routine. That doesn't appear to be necessary, since it didn't call usb_driver_claim_interface() beforehand and since the interface will be released automatically when disconnect() returns. [PATCH] USB Storage: Kyocera Finecsm 3L -unusual_devs.h [PATCH] PATCH: (as268) Import device-reset changes from gadget-2.6 tree This patch imports the changes that David Brownell has made to the device-reset functions in his gadget-2.6 tree. Once these ongoing troubling questions about locking are settled, I'll add support for the "descriptors changed" case. 
[PATCH] PCI Hotplug: rpaphp doesn't initialize slot's name Attached is a revised version of rpaphp.patch. It has the following fixes: - Set up slot->name - Kill some dbgs - Eike's fixes - New fixes for incorrect "goto" in rpaphp_slot.c. [PATCH] USB Storage: Sony Clie I've received the following report which indicates that the Sony Clie needs the US_FL_FIX_INQUIRY flag set. http://bugs.debian.org/243650 Driver core: handle error if we run out of memory in kmap code [PATCH] missing audit in bus_register() |How about using a goto on the error path to clean up properly |instead of the different return sections. .. here goes Take 2: [PATCH] I2C: Sensors (W83627HF) in Tyan S2882 [PATCH] I2C: Invert as99127f beep bits in kernel space The following patch changes the way we invert beep bits for the AS99127F sensor chip. This chip behaves differently from the other chips in that a disabled bit is 1, not 0. So far we didn't handle that specificity in the w83781d driver, so it was left to user-space applications to handle it. For the sake of uniformity, it's obviously better if it's done in the driver instead (although the meaning of each bit is still chip-dependent). I already did a similar change to the 2.4 driver and the sensors program. I don't think that many user-space applications will be affected, since most of them don't handle the beep mask as far as I can tell. This also closes Debian bug #209299: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=209299 Successfully tested on my AS99127F rev.1 chip. Aurelien Jarno also checked that there were no regressions on non-Asus chips. [PATCH] I2C: Rewrite temperature conversions in via686a driver The following patch rewrites the temperature conversion macros and functions found in the via686a chip driver. Contrary to the voltage conversions a few weeks ago, temperature conversions were numerically correct, but artificially complex. The new ones are cleaner.
It also fixes a highly improbable array overflow (would take one of the measured temperatures to be over 145 degrees C). Successfully tested by Mark D. Studebaker and me, and already applied to our CVS repository. [PATCH] I2C: Fix memory leaks in w83781d and asb100 Quoting myself > U-ho. I think I've introduced a memory leak with this patch :( > > For drivers that handle subclients (asb100 and w83781d on i2c), the > subclient memory is never released if I read the code correctly. This > is because we now free the private data on unload, assuming that it > contains the i2c client data as well. That's true for the main i2c > client, but not for the subclients (data == NULL so nothing is freed). > > Could someone take a look and confirm? I could test and actually saw memory leaking when cycling the w83781d driver at a sustained rate (5/s). > I can see two different fixes: > > 1* When freeing the memory, free the data if it's not NULL (main > client), else free client (subclients). Cleaner (I suppose?). > > 2* When creating subclients, do data = &client instead of data = NULL. > Then freeing will work. Less code, faster. Are there side effects? (I > don't think so) > > My preference would go to 2*. I ended up implementing 1*. That's cleaner and there's actually almost no extra code. Mark, can you confirm that I'm doing the correct thing? I'll do something similar in our CVS repository (for now, the asb100 and w83781d drivers have not had their memory allocation scheme reworked there). [PATCH] I2C: add .class to i2c drivers In the "[RFC|PATCH][2.6] Additional i2c adapter flags for i2c client isolation" thread, the i2c people have agreed that a ".class" field should be added to struct i2c_driver. Currently only drivers do checks for plausibility ("Is this an adapter I can attach to?"), but adapters don't have a chance to keep drivers away from their bus.
If both drivers and adapters provide a .class entry, the i2c-core can easily compare them and let devices only probe on busses where they can really exist. Real world example: DVB i2c adapters cannot ensure that only known DVB i2c chipsets probe their busses. Most client drivers probe every bus they get their hands on. This will confuse some DVB i2c busses. With the new I2C_CLASS_ALL flag it will be possible that an adapter can request that really all drivers are probed on the adapter. On the other hand, drivers can make sure that they get the chance to probe on every i2c adapter out there (this is not encouraged, though) The attached patch does the first step: - add .class member to struct i2c_device - remove unused .flags member from struct i2c_adapter - rename I2C_ADAP_CLASS_xxx to I2C_CLASS_xxx (to be used both for drivers and adapters) - add new I2C_CLASS_ALL and I2C_CLASS_SOUND classes - follow these changes in the existing drivers with copy & paste [PATCH] ia64: perfmon update - Cleanup the read/write check routines for pfm_write_pmcs, pfm_write_pmds, pfm_read_pmds. - Autodetect the PMU model. No need to have the kernel compiled for Itanium, HP Simulator, or Itanium2. The support for all PMU models is included. Probing is based on processor family and platform_name, if necessary. With this patch, it is possible to use an Itanium2 compiled kernel on an Itanium 1 system and get perfmon to work. - Removed remaining dependency on CONFIG_MCKINLEY by adding a new field (flags) to pmu_config_t. Update /proc/perfmon to show the new field. - Fixed a bug in the Itanium2 pmc_ita2_write_check() where an inactive PMC13 would be considered active. [PATCH] ia64: initialize IO-port-base early (start_secondary): Set up IO port base here, in case early console needs it. (smp_callin): Move IO port base setup to start_secondary(). [PATCH] ia64: arch/ia64/kernel/smp.c: kill duplicate #include linux/cache.h is included more than once. 
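The ".class" entry above says that the i2c-core can compare the class of a driver with the class of an adapter before probing. A rough user-space sketch of such a comparison follows. It assumes the classes are bit flags and that a match means the two masks share at least one bit; the entry does not spell out the exact rule. All names and bit values (demo_adapter, demo_driver, DEMO_CLASS_*, class_bits) are hypothetical stand-ins, not the kernel's I2C_CLASS_* definitions.

```c
/* Hypothetical class bits; the kernel's real values may differ. */
#define DEMO_CLASS_HWMON	(1u << 0)
#define DEMO_CLASS_TV_DIGITAL	(1u << 1)
#define DEMO_CLASS_SOUND	(1u << 2)
#define DEMO_CLASS_ALL		(~0u)

/* class_bits stands in for the ".class" member named in the entry. */
struct demo_adapter {
	unsigned int class_bits;
};

struct demo_driver {
	unsigned int class_bits;
};

/*
 * Assumed matching rule: probe only if adapter and driver share a class.
 * An adapter using DEMO_CLASS_ALL therefore accepts every driver that
 * declares any class, and a driver using DEMO_CLASS_ALL probes everywhere.
 */
static int demo_may_probe(const struct demo_adapter *adap,
			  const struct demo_driver *drv)
{
	return (adap->class_bits & drv->class_bits) != 0;
}
```

Under this rule a DVB adapter that only advertises DEMO_CLASS_TV_DIGITAL would no longer be probed by a hardware-monitoring driver, which is the isolation the entry asks for.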
ia64: rename "mem" boot parameter to "max_addr" and implement proper "mem" Patch by Robert Picco: This patch renames the mem boot parameter to max_addr and implements the mem boot parameter to work as documented (i.e., to limit the amount of memory to be recognized by the kernel). [PATCH] PCI: PCI devices with no PCI_CACHE_LINE_SIZE implemented On Wed, May 05, 2004 at 03:31:02PM -0700, Greg KH wrote: > On Thu, Apr 29, 2004 at 02:53:01PM -0500, Matt Domsch wrote: > > a) need this be a warning, wouldn't KERN_DEBUG suffice, if a message > > is needed at all? This is printed in pci_generic_prep_mwi(). > > Yes, we should make that KERN_DEBUG. I don't have a problem with that. > Care to make a patch? [PATCH] ia64: SN2 - remove node_first_cpu member Remove node_first_cpu member from nodepda_s and replace its usage with calls to sn_get_node_first_cpu(). ia64: Fix spurious GAS dependency-violation (dv) warnings by taking advantage of two new GAS directives (.serialize.{data,instruction}). [CPUFREQ] Export scaling cur frequencies Many users want to know the current cpu frequency, even if not using the userspace frequency. On ->target cpufreq drivers (if they do their calls to cpufreq_notify_transition correctly) this just means reading out cpufreq_policy->cur. [CPUFREQ] Move cpufreq_get() from the userspace governor to the core. Contrary to the previous implementation, it now calls the cpufreq driver, and reads out the _actual_ current frequency, and not the frequency the CPUfreq core _thinks_ the CPU is running at. Most cpufreq drivers do provide such a "hw get" function (only ACPI-io can definitely not be supported, I'm not sure about sh, sparc64 and powermac) anyway, and it is useful for other issues. [CPUFREQ] Export cpufreq_get() to userspace. As it involves calls to hardware which might take some time, only let the super-user read out this value. [CPUFREQ] Fix 'out of sync' issue.
Sometimes we might discover during a call to cpufreq_get() that we're "out of sync", meaning the actual CPU frequency changed "behind our back". If this happens, the flag CPUFREQ_PANIC_OUTOFSYNC decides what can be done: if it is set, the kernel panics; if it is not set, the cpufreq transition notifiers are informed of this change, and a call to cpufreq_update_policy() is scheduled [using the default workqueue] so that the user-defined values override BIOS / external interaction. [CPUFREQ] (Hopefully) fix cpufreq resume support. Upon resuming, first CPUfreq hardware support needs to be re-enabled in certain cases (call to cpufreq_driver->resume()). Then, two different paths may need to be taken: a) frequency during suspend equals frequency during resume ==> everything is fine, b) frequencies differ ==> either we can't handle it, then panic (see flag CPUFREQ_PANIC_RESUME_OUTOFSYNC). Or we can handle it, then notify all [CPUFREQ] Handle CPUFREQ_RESUMECHANGE notifications Notifications in i386, sparc64, x86_64, sh-sci and sa11xx-pcmcia notifiers. sa1100-framebuffer doesn't seem to be able to handle frequency transitions behind its back well. So, sa11xx will be marked CPUFREQ_PANIC_OUTOFSYNC | CPUFREQ_PANIC_RESUME_OUTOFSYNC later. [CPUFREQ] use elanfreq's internal get function as ->get() [CPUFREQ] use gx-suspmod's internal get function as ->get() [CPUFREQ] Add a longhaul_get function. [CPUFREQ] Add longrun ->get Longrun users might be interested in their CPU's current frequency as well, so use a longrun-specific cpuid-call in longrun_get(). [CPUFREQ] Add p4-clockmod ->get p4-clockmod is a bit more complicated as it might run on SMP, HT, and the instructions need to run on the specific (physical) CPU. [CPUFREQ] powernow-k6 ->get powernow_k6 has almost all pieces in place for its own ->get() function. Add the rest. [CPUFREQ] powernow-k7->get() implementation by Bruno Ducrot. [CPUFREQ] Add powernowk8_get() but be careful as some code needs to run on specified CPU only.
[CPUFREQ] use speedstep_centrino's internal get function as ->get() [CPUFREQ] Use speedstep_lib's capabilities for ->get() in speedstep-ich.c [CPUFREQ] Use speedstep_lib's capabilities for ->get() in speedstep-smi.c JFS: Avoid race invalidating metadata page [CPUFREQ] arm-integrator ->get() implementation arm-integrator had its ->get() implementation inside integrator_cpufreq_init(). Move it to an extra function, and add it as ->get() function. [CPUFREQ] sa11x0 ->get sa11x0_getspeed can be used by both cpu-sa1100.c and cpu-sa1110.c as ->get() function. Update calling conventions, and un-export it as we fixed the handling of cpufreq_get in the cpufreq core. Also, remove special call to userspace-governor init as it isn't needed any longer. [CPUFREQ] Handle CPU frequency changing behind our back. Once we detected 50 consecutive ticks with lost ticks (and this is half of the amount needed to trigger the fallback to a "sane" timesource), verify the CPU frequency is in sync if cpufreq is used: sometimes the CPU frequency changes behind the user's back, and then the TSC detects lost ticks. By a call to cpufreq_get(), the frequency the TSC driver thinks the CPU is in is updated to the actual frequency, in case these differ. Works really nice on my notebook -- it's never falling back to a different timesource now, even if I plug in the power cord. [CPUFREQ] Also check whether the CPU frequency is out of sync once we get to cpufreq_notify_transition. [CPUFREQ] Handle P4 TSC scaling. Currently, the TSC cpufreq notifier does almost nothing on P4s, as we assumed the TSC to be constant independent of _all_ frequency transitions. Extensive testing by Karol Kozimor has shown, though, that only _throttling_ does not affect the TSC, but _scaling_ does.
So: - pass the CPUFREQ_CONST_LOOPS flags (to be exact, all flags) to cpufreq transition notifiers - skip TSC value changes if this flag is set - set this flag for P4 / P4-Ms only in p4-clockmod [On Pentium-M banias the TSC _is_ affected by p4-clock modulation [CPUFREQ] Clean up P4 centrino detection. Add a new "struct cpu_id" for better handling of different Pentium M steppings / revisions. [CPUFREQ] Improved Banias detection. The built-in tables are only valid for Pentium M (Banias) processors with CPUID 6/9/5. So, add a pointer to the proper struct cpu_id to the cpu_model struct, and re-name _CPU/CPU to _BANIAS/BANIAS [CPUFREQ] Add support for Pentium M (Dothan) processors. Until further review, only ACPI data will get this driver to run - no built-in tables will exist. Many thanks to Thomas Renninger for reporting the lack of, and testing the support for Dothan processors. [CPUFREQ] Add support for Pentium M (Dothan) processors for p4-clockmod. But warn loudly if anyone tries to use it -- you really should use speedstep-centrino instead. On Dothans, the TSC is _not_ affected by TSC transitions (contrary to Banias processors), so set the CPUFREQ_CONST_LOOPS flag. Many thanks to Thomas Renninger for reporting the lack of, and testing the support for Dothan processors. JFS: reduce stack usage [PATCH] ia64: fix MOD_{INC,DEC}_USE_COUNT use in prominfo set proper proc_entry owner instead. Patch OK'd by Jesse Barnes. JFS: [CHECKER] More robust error recovery in add_index If an error is encountered in add_index, it now leaves the index table in a consistent state. Since the return value is stored in the directory entry regardless of add_index's success, return zero instead of -EPERM (which made no sense). Add modules to sysfs This patch adds basic kobject support to struct module, and it creates a /sys/module directory which contains all of the individual modules. 
Each module currently exports the refcount (if it is unloadable) and any module parameters that are marked exportable in sysfs. Was written by me and Rusty over and over many times during the past 6 months. update readme for mode,uid,gid description [PATCH] ia64: make perfmon treat Ski simulator like real Itanium chip Remove perfmon_hpsim.c. Support is folded into perfmon_itanium.c or perfmon_itanium2.c depending on how Ski identifies itself via the CPU ID family (Merced or McKinley/Madison). Also fix firmware emulator PAL_PERFMON_INFO emulation to report Itanium2 information. [TG3]: Add 572x/575x PCI IDs. [TG3]: Add 5750 chip and PHY IDs. [TG3]: Prepare for 5750 support plus minor fixes. 1) Handle cases that apply to 5750 the same as 5705. 2) Only set CLOCK_CTRL_FORCE_CLKRUN on 5705_A0 3) Clear out on-chip and memory stats block right before setting MAC_MODE. 4) On bootup chip probe, always skip PHY reset if link is up. [TIGON3]: Detect and record PCI Express. [TG3]: PCI Express 5750_A0 chips need 5701_REG_WRITE_BUG treatment. [TG3]: Fix chiprev test in previous change. [TG3]: Do not set CLOCK_CTRL_DELAY_PCI_GRANT on PCI Express. [TG3]: Double delay after writing MAC_MI_MODE reg. [TG3]: Correct RDMAC/WDMAC mode settings on 5705/5750. [TG3]: Do not write stats coalescing ticks reg on 5705/5750. NTFS: 2.1.9 release - Fix two bugs in the decompression engine in handling of corner cases. [CPUFREQ] Warning fixes. On sparc64: drivers/cpufreq/cpufreq.c: In function `cpufreq_add_dev': drivers/cpufreq/cpufreq.c:394: warning: cast to pointer from integer of different size drivers/cpufreq/cpufreq.c: In function `handle_update': drivers/cpufreq/cpufreq.c:507: warning: cast from pointer to integer of different size [PATCH] Make SCSI timeout modifiable add a timeout field to struct scsi_device and expose it in sysfs. This patch allows LLDs to override the default timeout used for scsi devices and exposes it in sysfs.
The default timeout value used is too short for many RAID array devices, such as those created by the ipr driver. MPT Fusion driver 3.01.06 update From: Moore, Eric Dean [PATCH] PATCH [1/15] qla2xxx: Firmware dump fixes ISP dump routine fixes: o Properly release hardware_lock in failure path. o Fix inability to complete ISP2100 dump, by properly resetting the RISC after register reads. drivers/scsi/qla2xxx/qla_dbg.c | 34 ++++++++++++---------------------- 1 files changed, 12 insertions(+), 22 deletions(-) [PATCH] PATCH [2/15] qla2xxx: Remove flash routines Remove flash support from embedded driver: o Remove unused option-rom variables from host structure. o Remove flash manipulation routines. drivers/scsi/qla2xxx/qla_def.h | 2 drivers/scsi/qla2xxx/qla_gbl.h | 8 drivers/scsi/qla2xxx/qla_init.c | 3 drivers/scsi/qla2xxx/qla_sup.c | 446 ---------------------------------------- 4 files changed, 459 deletions(-) [PATCH] PATCH [3/15] qla2xxx: 2100 request-q constraints Older chips, notably the ISP2100, have some constraints on the request queue depth and number of scatter-gather elements allowed for a given command. For this chip, reduce request queue size to 128 and maximum number of scatter-gather entries for a command to 32. drivers/scsi/qla2xxx/qla_def.h | 14 +++----------- drivers/scsi/qla2xxx/qla_init.c | 9 +++++---- drivers/scsi/qla2xxx/qla_iocb.c | 14 +++++++------- drivers/scsi/qla2xxx/qla_os.c | 14 +++++++++----- drivers/scsi/qla2xxx/qla_rscn.c | 2 +- 5 files changed, 25 insertions(+), 28 deletions(-) [PATCH] PATCH [4/15] qla2xxx: PortID binding fixes Fix problem where port ID binding would not be honoured when a device was moved within the fabric. drivers/scsi/qla2xxx/qla_init.c | 33 ++++++++++++++++++++++++--------- 1 files changed, 24 insertions(+), 9 deletions(-) [PATCH] PATCH [5/15] qla2xxx: Debug messages during ISP abort Issue a kernel warning message before initiating an ISP abort (big hammer) -- additional debugging mechanism in case of event.
drivers/scsi/qla2xxx/qla_mbx.c | 9 +++++++++ drivers/scsi/qla2xxx/qla_os.c | 2 ++ drivers/scsi/qla2xxx/qla_rscn.c | 2 ++ 3 files changed, 13 insertions(+) [PATCH] PATCH [6/15] qla2xxx: LoopID downcast fix Fix problem where the driver would incorrectly down-cast the target loop_id while retrieving link statistics. drivers/scsi/qla2xxx/qla_gbl.h | 2 +- drivers/scsi/qla2xxx/qla_mbx.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) [PATCH] PATCH [7/15] qla2xxx: Firmware options fixes Cleanup retrieval and update of firmware options: o Update only valid for non-(2[12]00) ISPs. o Instruct firmware to return completed IOCBs without waiting for an ABTS to complete. drivers/scsi/qla2xxx/qla_init.c | 79 +++++++++++++++++++++++++--------------- 1 files changed, 50 insertions(+), 29 deletions(-) [PATCH] PATCH [8/15] qla2xxx: Volatile topology fixes Fix problem where during ISP initialization in a volatile topology (i.e. fabric environment with large number of streaming RSCNs) the driver would loop indefinitely or hang due to termination of an invalid thread pid. drivers/scsi/qla2xxx/qla_init.c | 142 ++++++++++++++++------------------------ drivers/scsi/qla2xxx/qla_os.c | 1 2 files changed, 60 insertions(+), 83 deletions(-) [PATCH] PATCH [9/15] qla2xxx: Tape command handling fixes Fix several problems when handling commands issued to tape devices: 1) ensure commands are not prematurely returned to the mid-layer with a failed status during loop/fabric transitions. 2) tape commands tend to have rather 'long' timeout values; unfortunately, as these values increase into the 17 to 20 minute range (and larger), the cumulative skew of the RISC's own timer results in commands being held for seconds beyond their defined timeout values. Compensate for this in the driver's command timeout function.
drivers/scsi/qla2xxx/qla_def.h | 3 + drivers/scsi/qla2xxx/qla_init.c | 4 ++ drivers/scsi/qla2xxx/qla_isr.c | 10 ++--- drivers/scsi/qla2xxx/qla_os.c | 74 ++++++++++++++++++++++++++++++++++++---- 4 files changed, 79 insertions(+), 12 deletions(-) [PATCH] PATCH [10/15] qla2xxx: Use readX_relaxed Jeremy Higdon : For those to whom this is new (it was discussed on linux-kernel and linux-ia64 I believe), normal PCI register reads imply that PCI DMA writes that occurred prior to the PCI MMR (memory mapped register) read (on the PCI bus) will be reflected in system memory once the MMR read is complete. On our platforms, we can speed up the MMR read significantly if that ordering requirement is "relaxed". So I attempted to find the common register reads that don't have a need for this ordering so that I could make them use this faster read. drivers/scsi/qla2xxx/qla_def.h | 3 +++ drivers/scsi/qla2xxx/qla_iocb.c | 6 +++--- drivers/scsi/qla2xxx/qla_isr.c | 2 +- 3 files changed, 7 insertions(+), 4 deletions(-) [PATCH] PATCH [11/15] qla2xxx: /proc fixes /proc file updates: o Address 'unaligned access' message on ia64 platforms while displaying bit-field flags. o Iterate through the OS target array to display target ID bindings. drivers/scsi/qla2xxx/qla_os.c | 30 ++++++++++++------------------ 1 files changed, 12 insertions(+), 18 deletions(-) [PATCH] PATCH [12/15] qla2xxx: RIO/ZIO fixes RIO/ZIO fixes: o Reduce register access during RIO operation by checking for a 'dirtied' signature. o Fix problem where ZIO mode handling could result in a nasty recursive call-frame. drivers/scsi/qla2xxx/qla_os.c | 5 +---- 1 files changed, 1 insertion(+), 4 deletions(-) [PATCH] PATCH [13/15] qla2xxx: Misc. code scrubbing Misc. driver scrubbing: o Use kernel #define for PCI command register bit. o Fix rate-limiting check of the queue-depth module parameter. o Clean-up comments.
drivers/scsi/qla2xxx/qla_init.c | 2 +- drivers/scsi/qla2xxx/qla_mbx.c | 1 - drivers/scsi/qla2xxx/qla_os.c | 7 +++---- 3 files changed, 4 insertions(+), 6 deletions(-) PATCH [14/15] qla2xxx: Resync with latest released firmware -- 3.02.28. From: Andrew Vasquez drivers/scsi/qla2xxx/ql2300_fw.c |12380 +++++++++++++++++++-------------------- drivers/scsi/qla2xxx/ql2322_fw.c |11812 ++++++++++++++++++------------------- drivers/scsi/qla2xxx/ql6312_fw.c |10174 ++++++++++++++++---------------- drivers/scsi/qla2xxx/ql6322_fw.c |10352 ++++++++++++++++---------------- 4 files changed, 22368 insertions(+), 22350 deletions(-) [PATCH] PATCH [15/15] qla2xxx: Update driver version Update version number to 8.00.00b12-k. drivers/scsi/qla2xxx/qla_version.h | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) [PATCH] imm/ppa style police fix remaining style problems after Al ressurrected the drivers. [PATCH] missing pci_set_master in megaraid [PATCH] mca_53c9x needs CONFIG_MCA_LEGACY sym53c500_cs remove irq,ioport scsi attributes From: Bob Tracy [PATCH] Initio INI-9X00U/UW error handling in 2.6 Plumb old error handling into new eh infrastructure. [Bluetooth] Fix disconnect race on ISOC interface The hci_usb_disconnect() gets called recursively when SCO support is enabled and used. This causes sysfs_hash_and_remove() finally to dereference a NULL pointer. Noticed by Sebastian Schmidt [Bluetooth] Adapt changes for USB core altsettings The USB core has changed its way the interfaces and the altsettings are stored. The probe routines of the USB based Bluetooth drivers must be changed and in some cases they are simplified. Patch from Alan Stern JFS: module unload was not removing /proc/fs/jfs/ Add IBM power RAID driver 2.0.6 From: Brian King [Bluetooth] Use type of the parent socket The SELinux fixes for kernel sockets assume that we always use the type SOCK_SEQPACKET, but this must not be the truth. 
Give the sock->type as argument to sock_create_lite() and everything is correct for the new child socket. Add SCSI IPR PCI Ids to pci_ids.h [PATCH] small scheduler cleanup From: Ingo Molnar From: Nick Piggin wrote: It removes the last place where we mess with run_list open coded. [PATCH] sched: improved resolution in find_busiest_node From: Nick Piggin From: Frank Cornelis In order to get the best possible resolution we need to use NR_CPUS instead of the constant value 10. load is an int, so no need to worry about overflows... [PATCH] sched: scheduler domain support From: Nick Piggin This is the core sched domains patch. It can handle any number of levels in a scheduling hierarchy, and allows architectures to easily customize how the scheduler behaves. It also provides progressive balancing backoff needed by SGI on their large systems (although they have not yet tested it). It is built on top of (well, uses ideas from) my previous SMP/NUMA work, and gets results very similar to them when using the default scheduling description. Benchmarks ========== Martin was seeing I think 10-20% better system times in kernbench on the 32 way. I was seeing improvements in dbench, tbench, kernbench, reaim, hackbench on a 16-way NUMAQ. Hackbench in fact had a non linear element which is all but eliminated. Large improvements in volanomark. Cross node task migration was decreased in all above benchmarks, sometimes by a factor of 100!! Cross CPU migration was also generally decreased. See this post: http://groups.google.com.au/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=a406c910b30cbac4&seekm=UAdQ.3hj.5%40gated-at.bofh.it#link2 Results on a hyperthreading P4 are equivalent to Ingo's shared runqueues patch (which is a big improvement).
Some examples on the 16-way NUMAQ (this is slightly older sched domain code): http://www.kerneltrap.org/~npiggin/w26/hbench.png http://www.kerneltrap.org/~npiggin/w26/vmark.html From: Jes Sorensen Tiny patch to make -mm3 compile on an NUMA box with NR_CPUS > BITS_PER_LONG. From: "Martin J. Bligh" Fix a minor nit with the find_busiest_group code. No functional change, but makes the code simpler and clearer. This patch does two things ... adds some more expansive comments, and removes this if clause: if (*imbalance < SCHED_LOAD_SCALE && max_load - this_load > SCHED_LOAD_SCALE) *imbalance = SCHED_LOAD_SCALE; If we remove the scaling factor, we're basically conditionally doing: if (*imbalance < 1) *imbalance = 1; Which is pointless, as the very next thing we do is to remove the scaling factor, rounding up to the nearest integer as we do: *imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT; Thus the if statement is redundant, and only makes the code harder to read ;-) From: Rick Lindsley In find_busiest_group(), after we exit the do/while, we select our imbalance. But max_load, avg_load, and this_load are all unsigned, so min(x,y) will make a bad choice if max_load < avg_load < this_load (that is, a choice between two negative [very large] numbers). Unfortunately, there is a bug when max_load never gets changed from zero (look in the loop and think what happens if the only load on the machine is being created by cpu groups of which we are a member). And you have a recipe for some really bogus values for imbalance. Even if you fix the max_load == 0 bug, there will still be times when avg_load - this_load will be negative (thus very large) and you'll make the decision to move stuff when you shouldn't have. This patch allows for this_load to set max_load, which if I understand the logic properly is correct. With this patch applied, the algorithm is *much* more conservative ... maybe *too* conservative but that's for another round of testing ... 
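The unsigned-arithmetic pitfall Rick Lindsley describes above can be reproduced in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: min_ul(), imbalance_naive() and imbalance_guarded() are hypothetical names, and the guard shown is just one way to avoid the wraparound (the patch itself instead allows this_load to set max_load).

```c
#include <limits.h>

/* Hypothetical helper: minimum of two unsigned longs. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/*
 * Naive selection: both differences are computed in unsigned
 * arithmetic, so a "negative" difference wraps around to a very
 * large value instead of going below zero.
 */
unsigned long imbalance_naive(unsigned long max_load,
                              unsigned long avg_load,
                              unsigned long this_load)
{
    return min_ul(max_load - avg_load, avg_load - this_load);
}

/*
 * Guarded selection (illustrative assumption, not the actual patch):
 * report no imbalance unless max_load > avg_load > this_load, so
 * neither subtraction can wrap.
 */
unsigned long imbalance_guarded(unsigned long max_load,
                                unsigned long avg_load,
                                unsigned long this_load)
{
    if (max_load <= avg_load || avg_load <= this_load)
        return 0;
    return min_ul(max_load - avg_load, avg_load - this_load);
}
```

With max_load = 0, avg_load = 5 and this_load = 10, both differences wrap to values close to ULONG_MAX, so the naive version reports a huge imbalance while the guarded version reports none.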
From: Ingo Molnar sched-find-busiest-fix [PATCH] sched_domain debugging From: Nick Piggin Anton was attempting to make a sched domain topology for his POWER5 and was having some trouble. This patch only includes code which is ifdefed out, but hopefully it will be of some use to implementors. [PATCH] scheduler domain balancing improvements From: Nick Piggin This patch gets the sched_domain scheduler working better WRT balancing. It's been tested on the NUMAQ. Among other things it changes the way SMT load calculation works so as not to active load balance when it shouldn't. It still has a problem with SMT and NUMA: it will put a task on each sibling in a node before moving tasks to another node. It should probably start moving tasks after each *physical* CPU is filled. To fix, you need "how much CPU power in this domain?" At the moment we approximate # runqueues == CPU power, and hack around it at the CPU physical domain by counting all sibling runqueues as 1. It isn't hard to correctly work the CPU power out, but once CPU hotplug is in the equation it becomes much more hotplug events. If anyone is actually interested in getting this fixed, that is. [PATCH] sched: cpu_sibling_map to cpu_mask From: Nick Piggin This is a (somewhat) trivial patch which converts cpu_sibling_map from an array of CPUs to an array of cpumasks. Needed for >2 siblings per package, but it actually can simplify code as it allows the cpu_sibling_map to be set up even when there is 1 sibling per package. Intel want this, I use it in the next patch to build scheduling domains for the P4 HT. From: Thomas Schlichter Build fix From: "Pallipadi, Venkatesh" Fix to handle more than 2 siblings per package. [PATCH] sched: implement domains for i386 HT From: Nick Piggin The following patch builds a scheduling description for the i386 architecture using cpu_sibling_map to set up SMT if CONFIG_SCHED_SMT is set. It could be made more fancy and collapse degenerate domains at runtime (ie.
1 sibling per CPU, or 1 NUMA node in the computer). From: Zwane Mwaikambo This fixes an oops due to cpu_sibling_map being uninitialised when a system with no MP table (most UP boxen) boots a CONFIG_SMT kernel. What also happens is that the cpu_group lists end up not being terminated properly, but this oops kills it first. Patch tested on UP w/o MP table, 2x P2 and UP Xeon w/ no siblings. From: "Martin J. Bligh" , Nick Piggin Change arch_init_sched_domains to use cpu_online_map From: Anton Blanchard Fix build with NR_CPUS > BITS_PER_LONG [PATCH] sched: handle inter-CPU jiffies skew From: Nick Piggin John Hawkes described this problem to me: There *is* a small problem in this area, though, that SuSE avoids. "jiffies" gets updated by cpu0. The other CPUs may, over time, get out of sync (and they're initialized on ia64 to start out being out of sync), so it's no guarantee that every CPU will wake up from its timer interrupt and see a "jiffies" value that is guaranteed to be last_jiffies+1. Sometimes the jiffies value may be unchanged since the last wakeup. Sometimes the jiffies value may have incremented by 2 (or more, especially if cpu0's interrupts are disabled for long stretches of time). So an algorithm that says, "I'll call load_balance() only when jiffies is *exactly* N" is going to fail on occasion, either by calling load_balance() too often or not often enough. *** I fixed this by adding a last_balance field to struct sched_domain, and working off that. [PATCH] sched_balance_exec(): don't fiddle with the cpus_allowed mask From: Rusty Russell , Nick Piggin The current sched_balance_exec() sets the task's cpus_allowed mask temporarily to move it to a different CPU. This has several issues, including the fact that a task will see its affinity at a bogus value. So we change the migration_req_t to explicitly specify a destination CPU, rather than the migration thread deriving it from cpus_allowed.
If the requested CPU is no longer valid (racing with another set_cpus_allowed, say), it can be ignored: if the task is not allowed on this CPU, there will be another migration request pending. This change allows sched_balance_exec() to tell the migration thread what to do without changing the cpus_allowed mask. So we rename __set_cpus_allowed() to move_task(), as the cpus_allowed mask is now set by the caller. And move_task_away(), which the migration thread uses to actually perform the move, is renamed __move_task(). I also ignore offline CPUs in sched_best_cpu(), so sched_migrate_task() doesn't need to check for offline CPUs. Ulterior motive: this approach also plays well with CPU Hotplug. Previously that patch might have seen a task with cpus_allowed only containing the dying CPU (temporarily due to sched_balance_exec) and forcibly reset it to all cpus, which might be wrong. The other approach is to hold the cpucontrol sem around sched_balance_exec(), which is too much of a bottleneck. [PATCH] sched-group-power From: Nick Piggin The following patch implements a cpu_power member to struct sched_group. This allows special casing to be removed for SMT groups in the balancing code. It does not take CPU hotplug into account yet, but that shouldn't be too hard. I have tested it on the NUMAQ by pretending it has SMT. Works as expected. Active balances across nodes. [PATCH] sched_domains: use cpu_possible_map From: Nick Piggin This changes sched domains to contain all possible CPUs, and check for online as needed. It's in order to play nicely with CPU hotplug. [PATCH] sched: SMT niceness handling From: Con Kolivas This patch provides full per-package priority support for SMT processors (aka pentium4 hyperthreading) when combined with CONFIG_SCHED_SMT. It maintains cpu percentage distribution within each physical cpu package by limiting the time a lower priority task can run on a sibling cpu concurrently with a higher priority task. 
It introduces a new flag into the scheduler domain unsigned int per_cpu_gain; /* CPU % gained by adding domain cpus */ This is empirically set to 15% for pentium4 at the moment and can be modified to support different values dynamically as newer processors come out with improved SMT performance. It should not matter how many siblings there are. How it works is it compares tasks running on sibling cpus and when a lower static priority task is running it will delay it till high_priority_timeslice * (100 - per_cpu_gain) / 100 <= low_prio_timeslice eg. a nice 19 task timeslice is 10ms and nice 0 timeslice is 102ms On vanilla the nice 0 task runs on one logical cpu while the nice 19 task runs unabated on the other logical cpu. With smtnice the nice 0 runs on one logical cpu for 102ms and the nice 19 sleeps till the nice 0 task has 12ms remaining and then will schedule. Real time tasks and kernel threads are not altered by this code, and kernel threads do not delay lower priority user tasks. with lots of thanks to Zwane Mwaikambo and Nick Piggin for help with the coding of this version. If this is merged, it is probably best to delay pushing this upstream in mainline till sched_domains gets tested for at least one major release. [PATCH] sched: add local load metrics From: Nick Piggin This patch removes the per runqueue array of NR_CPU arrays. Each time we want to check a remote CPU's load we check nr_running as well anyway, so introduce a cpu_load which is the load of the local runqueue and is kept updated in the timer tick. Put them in the same cacheline. This has additional benefits of having the cpu_load consistent across all CPUs and more up to date. It is sampled better too, being updated once per timer tick. This shouldn't make much difference in scheduling behaviour, but all benchmarks are either as good or better on the 16-way NUMAQ: hackbench, reaim, volanomark are about the same, tbench and dbench are maybe a bit better. kernbench is about one percent better. 
John reckons it isn't a big deal, but it does save 4K per CPU or 2MB total on his big systems, so I figure it must be a bit kinder on the caches. I think it is just nicer in general anyway. [PATCH] Reduce TLB flushing during process migration From: Martin Hicks Another optimization patch from Jack Steiner, intended to reduce TLB flushes during process migration. Most architectures should define tlb_migrate_prepare() to be flush_tlb_mm(), but on i386, it would be a wasted flush, because i386 disconnects previous cpus from the tlb flush automatically. [PATCH] sched: trivial fixes, cleanups From: Ingo Molnar The trivial fixes. - added recent trivial bits from Nick's and my patches. - hotplug CPU fix - early init cleanup [PATCH] Hotplug CPU sched_balance_exec Fix From: Rusty Russell From: Srivatsa Vaddagiri From: Andrew Morton From: Rusty Russell We want to get rid of lock_cpu_hotplug() in sched_migrate_task. Found that lockless migration of execing task is _extremely_ racy. The races I hit are described below, along with probable solutions. Task migration done elsewhere should be safe (?) since they either hold the lock (sys_sched_setaffinity) or are done entirely with preemption disabled (load_balance). sched_balance_exec does: a. disables preemption b. finds new_cpu for current c. enables preemption d. calls sched_migrate_task to migrate current to new_cpu and sched_migrate_task does: e. task_rq_lock(p) f. migrate_task(p, dest_cpu ..) (if we have to wait for migration thread) g. task_rq_unlock() h. wake_up_process(rq->migration_thread) i. wait_for_completion() Several things can happen here: 1. new_cpu can go down after h and before migration thread has got around to handle the request ==> we need to add a cpu_is_offline check in __migrate_task 2. new_cpu can go down between c and d or before f.
===> Even though this case is automatically handled by the above change (migrate_task being called on a running task, current, will delegate migration to migration thread), would it be good practice to avoid calling migrate_task in the first place itself when dest_cpu is offline. This means adding another cpu_is_offline check after e in sched_migrate_task 3. The 'current' task can get preempted _immediately_ after g and when it comes back, task_cpu(p) can be dead. In which case, it is invalid to do wake_up on a non-existent migration thread. (rq->migration_thread can be NULL). ===> We should disable preemption thr' g and h 4. Before migration thread gets around to handle the request, its cpu goes dead. This will leave unhandled migration requests in the dead cpu. ===> We need to wakeup sleeping requestors (if any) in CPU_DEAD notification. I really wonder if we can get rid of these issues by avoiding balancing at exec time and instead have it balanced during load_balance ..Alternately if this is valuable and we want to retain it, I think we still need to consider a read/write sem, with sched_migrate_task doing down_read_trylock. This may eliminate the deadlock I hit between cpu_up and CPU_UP_PREPARE notification, which had forced me away from r/w sem. Anyway patch below addresses the above races. It's against 2.6.6-rc2-mm1 and has been tested on a 4way Intel Pentium SMP m/c. Rusty sez: Two other changes: 1) I grabbed a reference to the thread, rather than using preempt_disable(). It's the more obvious way I think. 2) Why the wait_to_die code? It might be needed if we move tasks after stop_machine, but for now I don't see the problem with the migration thread running on the wrong CPU for a bit: nothing is on this runqueue so active_load_balance is safe, and __migrate_task will be a noop (due to cpu_is_offline() check). If there is a problem, your fix is racy, because we could be preempted immediately afterwards. So I just stop the kthread then wakeup any remaining...

[PATCH] sched: wakeup balancing fixes
From: Nick Piggin

Make affine wakes and "passive load balancing" more conservative. Aggressive affine wakeups were causing huge regressions in dbt3-pgsql on 8-way non NUMA systems at OSDL's STP.

[PATCH] sched: fix imbalance calculations
From: Nick Piggin

Imbalance calculations were not right. This would cause unneeded migration.

[PATCH] sched: altix tuning
From: Nick Piggin
From: John Hawkes

The following brings up performance on a 64-way Altix. This system being on the smaller end of the scale should also be applicable to other NUMA systems.

[PATCH] sched: oops fix
From: Nick Piggin

After the for_each_domain change, the warn here won't trigger, instead it will oops in the if statement. Also, make sure we don't pass an empty cpumask to for_each_cpu.

[PATCH] ppc64: sched-domain support
From: Anton Blanchard

Below are the diffs between the current ppc64 sched init stuff and x86.

- Ignore the POWER5 specific stuff, I don't set up a sibling map yet.
- What should I set cache_hot_time to?

large cpumask typechecking requirements (perhaps useful on x86 as well):

- cpu->cpumask = CPU_MASK_NONE -> cpus_clear(cpu->cpumask);
- cpus_and(nodemask, node_to_cpumask(i), cpu_possible_map) doesn't work, need to use a temporary

[PATCH] ARCH_HAS_SCHED_WAKE_BALANCE doesnt exist
From: Anton Blanchard

It seems someone has been making trivial changes without using grep.

[PATCH] sched: fix setup races
From: Nick Piggin

De-racify the sched domain setup code. This involves creating a dummy "init" domain during sched_init (which is called early).

When topology information becomes available, the sched domains are then built and attached. The attach mechanism is asynchronous and uses the migration threads, which perform the switch with interrupts off. This is a quiescent state, so domains can still be lockless on the read side. It also allows us to change the domains at runtime without much more work.
This is something SGI is interested in to elegantly do soft partitioning of their systems without having to use hard cpu affinities (which cause balancing problems of their own).

The current setup code also has a race somewhere because it is unable to boot on a 384 CPU system.

From: Anton Blanchard

This is basically a mindless ppc64 merge of the x86 changes to sched domain init code.

Actually if I produce a sibling_map[] then the x86 code and the ppc64 will be identical. Maybe we can merge it.

[PATCH] sched: minor cleanups
From: Nick Piggin

Minor cleanups from Ingo's patch including task_hot (do it right in try_to_wake_up too).

[PATCH] sched: uninlinings
From: Ingo Molnar

Uninline things

[PATCH] sched: add enqueue_task_head()
From: Ingo Molnar

Helper function for later patches

[PATCH] sched: extend sync wakeups
From: Ingo Molnar

The attached patch extends sync wakeups to the process sys_exit() path too: the chldwait wakeup can be done sync, since we know that the process is going to exit (and thus deschedule).

The most visible effect of this change is strace's behavior on SMP systems: it now stays on a single CPU, together with the traced child. (previously it would run in parallel to the child, bouncing around madly.)

[PATCH] sched: lock cpu_attach_domain for hotplug
From: Nick Piggin

The attached patch is required to work correctly with the CPU hotplug framework. John Hawkes reports successful booting with this.

[PATCH] sched: cleanups
From: Ingo Molnar

This re-adds cleanups which were lost in splitups of an earlier patch.

[PATCH] sched: passive balancing damping
From: Nick Piggin

This patch starts to balance woken processes when half the relevant domain's imbalance_pct is reached. Previously balancing would start after a small, constant difference in waker/wakee runqueue loads was reached, which would cause too much process movement when there are lots of processes running.

It also turns wake balancing into a domain flag while previously it was always on.
Now sched domains can "soft partition" an SMP system without using processor affinities.

[PATCH] sched: cpu load management cleanup
From: Ingo Molnar

This does the source/target cleanup. This is a no-functionality patch which also adds more comments to explain these functions.

[PATCH] sched: balance-on-clone
From: Ingo Molnar

Implement balancing during clone(). It does the following things:

- introduces SD_BALANCE_CLONE that can serve as a tool for an architecture to limit the search-idlest-CPU scope on clone(). E.g. the 512-CPU systems should rather not enable this.

- uses the highest sd for the imbalance_pct, not this_rq (which didn't make sense).

- unifies balance-on-exec and balance-on-clone via the find_idlest_cpu() function. Gets rid of sched_best_cpu() which was still a bit inconsistent IMO, it used 'min_load < load' as a condition for balancing - while a more correct approach would be to use half of the imbalance_pct, like passive balancing does.

- the patch also reintroduces the possibility to do SD_BALANCE_EXEC on SMP systems, and activates it - to get testing.

- NOTE: there's one thing in this patch that is slightly unclean: i introduced wake_up_forked_thread. I did this to make it easier to get rid of this patch later (wake_up_forked_process() has lots of dependencies in various architectures). If this capability remains in the kernel then i'll clean it up and introduce one function for wake_up_forked_process/thread.

- NOTE2: i added the SD_BALANCE_CLONE flag to the NUMA CPU template too. Some NUMA architectures probably want to disable this.

[PATCH] sched: reduce idle time
From: Nick Piggin

It makes NEWLY_IDLE balances cause find_busiest_group return the busiest available group even if there isn't an imbalance. Basically - try a bit harder to prevent schedule emptying the runqueue.
It is quite aggressive, but that isn't so bad because we don't (by default) do NEWLY_IDLE balancing across NUMA nodes, and NEWLY_IDLE balancing is always restricted to cache_hot tasks.

It picked up a little bit of idle time that dbt2-pgsql was seeing...

[PATCH] sched: micro-optimisation for wake_up
From: Nick Piggin

This actually does produce better code, especially under the locked section.

Turns a conditional + unconditional jump under the lock in the unlikely case into a cmov outside the lock.

[PATCH] sched: Look at another CPU's domain
From: Nick Piggin

The SMT wake_idle code really wants to look at a non-local CPU's domain in order to check for idle siblings.

So change the domain attachment code a little bit so we continue to hold a runqueue's lock while attaching a new domain.

This means the locking rules have changed to: you may access your own domain without any lock, you must hold a remote runqueue's lock in order to view its domain.

[PATCH] Move migrate_all_tasks to CPU_DEAD handling
From: Srivatsa Vaddagiri

migrate_all_tasks is currently run with the rest of the machine stopped. It iterates through the complete task table, turning off cpu affinity of any task that it finds affine to the dying cpu. Depending on the task table size this can take considerable time. All this time the machine is stopped, doing nothing.

Stopping the machine for such extended periods can be avoided if we do task migration in CPU_DEAD notification and that's precisely what this patch does.

The patch puts the idle task at the _front_ of the dying CPU's runqueue at the highest priority possible. This causes the idle thread to run _immediately_ after the kstopmachine thread yields. The idle thread notices that its cpu is offline and dies quickly. Task migration can then be done at leisure in CPU_DEAD notification, when the rest of the CPUs are running.

Some advantages with this approach are:

- More scalable. Predictable amount of time that the machine is stopped.
- No changes to hot path/core code.
We are just exploiting scheduler rules which run the next high-priority task on the runqueue. Also since I put the idle task at the _front_ of the runqueue, there are no races when an equally high priority task is woken up and added to the runqueue. It gets in at the back of the runqueue, _after_ the idle task!
- The cpu_is_offline check that is presently required in try_to_wake_up, idle_balance and rebalance_tick can be removed, thus speeding them up a bit.

From: Srivatsa Vaddagiri

Rusty mentioned that the unlikely hints against cpu_is_offline are redundant since the macro already has that hint. The patch below removes those redundant hints I added.

[PATCH] sched_getaffinity vs cpu hotplug race fix
From: Srivatsa Vaddagiri

Fix the race in sys_sched_getaffinity. The patch below takes the cpu_hotplug lock before reading the cpus_allowed mask of a task.

[PATCH] migration_thread() race fix
From: Srivatsa Vaddagiri

Noticed that migration_thread can examine "kthread_should_stop()?" without setting its state to TASK_INTERRUPTIBLE first. This can cause kthread_stop on that thread to block forever ...

P.S - I assumed that having the task state set to TASK_INTERRUPTIBLE while it is doing active_load_balance is fine. It seemed to be the case earlier also.

[PATCH] aacraid reset handler fix

This fixes a situation where the handler can exit too early.

[ACPI] handle _CRS outside _PRS -- even when non-zero avoid sharing IRQ12
http://bugzilla.kernel.org/show_bug.cgi?id=2665

[PATCH] Fix deadlock in journalled quota
From: Jan Kara

The attached patch should fix the reported deadlock in the journalled quota code. The quotactl() call was violating the locking rules and didn't start a transaction when it should.

From: Found a couple of symbols not exported that were needed by the ext3.ko module.

[PATCH] MIPS update
From: Ralf Baechle

- Kconfig cleanups:
  - enable DMA_NONCOHERENT, DMA_COHERENT or DMA_IP27 via reverse dependencies
  - untangle VRC4171 / VRC4173 selection
  - R10000 support enables PREFETCH
  - SEAD needs IRQ_CPU
- Update defconfig against latest Kconfig files.
- Fix computation of return address if syscall number was out of range
- Add power management hooks in signal code.
- Don't try to handle signals when previous context was not in user mode.
- Fix serial interface setup for VR41xx systems.
- Build fixes after CLEAR_BITMAP changed name.
- Removes bogus comment from - is dead.
- Start collecting common definitions for PMON firmware in
- Define ARCH_MIN_TASKALIGN to 8; we have 64-bit members even on 32-bit kernels if we're running on MIPS II or better.

[PATCH] mips: fix 2.6 fb setup
From: Ralf Baechle

[PATCH] mips: Simplify expression
From: Ralf Baechle

CONFIG_MIPS is always defined, for 32-bit and 64-bit.

[PATCH] mips: newport driver fixes
From: Ralf Baechle

Make the driver for Newport aka XL work in 2.6.

[PATCH] mips: remove VIDEO_TYPE_SNI_RM
From: Ralf Baechle

The RM200's onboard video really is a plain old boring Cyrix PCI card.

[PATCH] mips: GBE Video Driver
From: Ralf Baechle

This patch adds the GBE video driver for the video system in SGI IP32 aka O2 and its i386-based equivalent, the Visual Workstation. This driver obsoletes sgivwfb.c; but I'd prefer to play safe and remove it after some additional time, just in case.

[PATCH] mips: add missing IP22 Zilog bit
From: Ralf Baechle

Add missing definition PORT_IP22ZILOG which is needed by the ip22zilog driver.

[PATCH] mips: 64-bit MIPS needs compat stuff
From: Ralf Baechle

[PATCH] mips: remove dz driver
From: Ralf Baechle

This driver has been obsoleted by drivers/serial/dz.c.

[PATCH] mips: sgiwd93 2.6 fixes and crapectomy
From: Ralf Baechle

Get to work under 2.6 sorting out the giant mess this has been. Further cleanups would require a full crapectomy of wd33c93.c itself ...

[PATCH] Fixes in 32 bit ioctl emulation code
From: Raghavan , me

I am submitting a patch that fixes 2 race conditions in the 32 bit ioctl emulation code (fs/compat.c).

Since the search is not locked, corruption can occur when an ioctl_trans structure is deleted. The following scenarios discuss the race conditions:

1) While the search is happening, if any ioctl_trans structure gets deleted, then rather than searching the hash table, the code will start searching the free list.

    while (t && t->cmd != cmd) -

[PATCH] Fix nmi_watchdog=2 and P4 HT
From: Philippe Elie

With nmi_watchdog=2 and a P4 HT box the NMI occurs only on logical processor 0; it's better to get it on both.

With this patch, on x86 SMP and nmi_watchdog=2, NMI interrupts occur at 1000 Hz (if the cpu is loaded), not at the intended 1 Hz rate, but that's a distinct problem.

[PATCH] reduce NMI watchdog call frequency with local APIC.
From: Mikael Pettersson

The real problem is that SMP with nmi_watchdog=2 initialises the lapic NMI watchdog but doesn't check it and therefore doesn't reduce nmi_hz. This is an SMP bug.

The patch changes smpboot.c to do a check_nmi_watchdog() at the appropriate place, which fixes the high NMI frequency problem w/o changing anything else. I've verified that it solves the problem on my MP-capable UP box.

[PATCH] Fix ext3 bogus ENOSPC

With strange workloads which do a lot of quick truncation on small filesystems it is possible to get into a situation where there are free blocks on the disk, but they are not allocatable at this time due to their having been freed up in the current JBD transaction. Applications get unexpected ENOSPC errors.

We can fix that with this patch, originally by Andreas Dilger, which forces a single commit+retry when an ENOSPC is encountered.

[PATCH] sched: in_sched_functions() cleanup
From: Rusty Russell

1) Create an in_sched_functions() function in sched.c and make the archs use it. (Two archs have wchan #if 0'd out: left them alone).
2) Move __sched from linux/init.h to linux/sched.h and add comment.

3) Rename __scheduling_functions_start_here/end_here to __sched_text_start/end.

Thanks to wli and Sam Ravnborg for clue donation.

[PATCH] ext3 error handling fixes
From: Andreas Dilger

a) we don't call ext3_error() for an IO error in ext3_find_entry(), so we won't do the normal ext3 error handling (mark SB in error, remount-ro or panic if desired);

b) in empty_dir() we don't continue checking for non-empty blocks after a content error (ext3_check_dir_entry() calls ext3_error() already);

c) we had decided not to mark the SB in error for holes in directories to allow leeway in the indexed-directory implementation, but this change incorrectly also disabled marking the SB in error for real IO errors.

[PATCH] selinux: reopen descriptors closed on exec to /dev/null
From: Stephen Smalley

This patch changes the SELinux module to try to reset any descriptors it closes on exec (due to a lack of permission by the new domain to the inherited open file) to refer to the null device. This counters the problem of SELinux inducing program misbehavior, particularly due to having descriptors 0-2 closed when the new domain is not allowed access to the caller's tty. This is primarily to address the case where the caller is trusted with respect to the new domain, as the untrusted caller case is already handled via AT_SECURE and glibc secure mode.

The code is partly based on the OpenWall LSM, which in turn drew from the OpenWall kernel patch. Note that the code does not guarantee that the descriptor is always re-opened to /dev/null; it merely makes a reasonable effort to do so, but can fail under various conditions.

[PATCH] cyclades MAINTAINERS update
From: Marcelo Tosatti

[PATCH] Laptop Mode doc update
From: Richard Atterer reported that mutt does not play well with noatime (it uses access times to check whether new mail has arrived in a folder).
This patch warns about this in the doc, and adds a setting to the control script to disable the noatime remount.

[PATCH] AS: increase batch expiry intervals
From: Nick Piggin

Without disturbing the read/write ratio, increase the batch expiry intervals. This will have the effect of increasing latency a little, but with improved throughput.

[PATCH] Consolidate sys32_readv and sys32_writev
From: Arnd Bergmann

The seven implementations of this have gone out of sync and are mostly buggy. The new compat_sys_* version is based on the ppc64 implementation, which most closely resembles the code in sys_readv/sys_writev.

Tested on x86_64, ia64 and s390.

[PATCH] Consolidate do_execve32
From: Arnd Bergmann

The code for sys32_execve/do_execve32 in most of the seven versions was copied from fs/exec.c but not kept up-to-date. The new compat_do_execve() function is based on the mips code and has been resync'ed with do_execve().

IA64 changes are from Arun Sharma.

Tested on x86_64, ia64 and s390.

[PATCH] Consolidate sys32_select
From: Arnd Bergmann

sys32_select has seven mostly but not exactly identical versions, so consolidate them as compat_sys_select. Based on the ppc64 implementation, which most closely resembles sys_select.

One bug that was not caught by LTP has been fixed since the first version of this patch. Tested on x86_64, ia64 and s390.

[PATCH] Consolidate sys32_nfsservctl
From: Arnd Bergmann

sys32_nfsservctl is the largest remaining syscall emulation handler that can be consolidated. mips and ia64 currently don't use this at all, parisc has a simpler implementation than the one used by s390, sparc ppc and that the new compat_sys_nfsservctl is based on.

The user access checks in the code are inconsistent at least, which should be fixed here. Compile tested only due to lack of proper test setup.

[PATCH] es7000 subarch update
From: "Protasevich, Natalie"

The patch fixes a problem with the ES7000 Server Management mechanism that uses the platform register mip_port.
It was not initialized, so the mechanism was not functional. The patch also fixes the APIC destination for hierarchical and flat cluster models used in ES7000. The destination ID's reflect policies for Cascade based systems which use logical delivery and lowest priority mechanism, and for xAPIC based models that use physical delivery and fixed APIC destinations.

The patch also turns on NO_IOAPIC_CHECK (1) to avoid error messages and attempts to re-write the ID, because on ES7000 all ID's are hard coded in the BIOS and cannot be altered.

[PATCH] ppc32: ppc8xx build fixes
From: "Prof. BJ"

- m8xx_setup warning and mfmsr error fix
- ppc8xx_pic include error fix
- tqm8xxl.c typing (syntax) error fix
- commproc.c include error and prototype warning fix

(acked by Matt Porter)

[PATCH] Remove bootsect_helper and a comment fix
From: Coywolf Qi Hunt

Since "Direct booting from floppy is no longer supported", this patch removes the bootsect_helper code. It also includes a comment fix. The other two platforms, x86_64 and PC-9800, should be cleaned up too.

[PATCH] Remove bootsect_helper on x86_64 and pc98
From: Coywolf Qi Hunt

Since "Direct booting from floppy is no longer supported", this patch removes the bootsect_helper code from x86_64 and PC-9800.

[PATCH] remove some unused variables in s2io
From: Anton Blanchard

Found a few warnings when compiling with NAPI off.

[PATCH] New version of early CPU detect
From: Andi Kleen

We still need some kind of early CPU detection, e.g. for the AMD768 workaround and for the slab allocator to size its slabs correctly for the cache line. Also some other code already had private early CPU routines.

This patch takes a new approach compared to the previous patch which caused Andrew so much grief. It only fills in a few selected fields in boot_cpu_data (only the data needed to identify the CPU type and the cache alignment).
In particular the feature masks are not filled in, and the other fields are also not touched to prevent unwanted side effects.

Also convert the ppro workaround to use standard cpu data now.

I'm not sure if slab still has the necessary support to use the cache line size early; previously Manfred showed some serious memory saving with this for kernels that are compiled for a bigger cache line size than the CPU (as is often the case on distribution kernels). This code could be re-enabled now with this patch.

[PATCH] shrink_slab: improved handling of GFP_NOFS allocations

Currently, shrink_slab() will decide that it needs to scan a certain number of dentries, will call shrink_dcache_memory() requesting that this be done, and shrink_dcache_memory() will simply bale out without doing anything because the caller did not have __GFP_FS.

This has the potential to disrupt our lovely pagecache-vs-slab balancing act.

So change things so that shrinker callouts can return -1, indicating that they baled out. This way, shrink_slab can remember that this slab was owed a certain number of scannings and these will be correctly performed next time a __GFP_FS caller comes by.

[PATCH] fix 3c59x.c to allow 3c905c 100bT-FD
From: Burton Windle

Fix the 3c905C 10/100 transceiver initialisation woes.

[PATCH] partitioning cleanup: use DOS_EXTENDED_PARTITION
From: FabF

Use the pre-existing enum rather than magic numbers.

[PATCH] Reiserfs commit default fix
From: Bart Samwel

This patch from Micha Feigin fixes some bugs in the earlier reiserfs commit default patch. The changelog:

* If you remounted without any commit=NNN option, it would assume commit=0 and restore the defaults. This patch makes it leave the current state alone if you don't pass commit=NNN.

* Added range check for cast from unsigned long to unsigned int.
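
The two reiserfs fixes above (leave the commit interval alone when commit=NNN is absent, and range-check the value before narrowing it from unsigned long to unsigned int) can be sketched in userspace C. This is only an illustration under stated assumptions: `struct commit_opt`, `parse_commit()` and `apply_commit()` are hypothetical names, not the reiserfs code, and the default interval is supplied by the caller rather than read from the filesystem.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical parse result: "given" records whether commit= was passed. */
struct commit_opt {
    int given;
    unsigned int secs;
};

/*
 * Parse the value of a commit= option. arg == NULL models a remount
 * without the option. Returns 0 on success, -1 on a malformed or
 * out-of-range value (in which case *out is left untouched).
 */
static int parse_commit(const char *arg, struct commit_opt *out)
{
    char *end;
    unsigned long val;

    assert(out != NULL);
    if (arg == NULL) {
        out->given = 0;          /* option absent: caller keeps current state */
        return 0;
    }
    errno = 0;
    val = strtoul(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;               /* not a clean decimal number */
    if (val > UINT_MAX)
        return -1;               /* range check before the narrowing cast */
    out->given = 1;
    out->secs = (unsigned int)val;
    return 0;
}

/* Pick the interval to use after a remount (assumed policy, per the text). */
static unsigned int apply_commit(unsigned int current_secs,
                                 unsigned int default_secs,
                                 const struct commit_opt *opt)
{
    if (!opt->given)
        return current_secs;     /* no commit=NNN: leave state alone */
    return opt->secs == 0 ? default_secs : opt->secs;  /* 0 restores default */
}
```

With this shape, a remount without commit= yields `given == 0`, so `apply_commit()` returns the current interval unchanged; passing commit=0 is the only way to ask for the default.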

[PATCH] reiserfs: acl device node initialization
From: Chris Mason
From: jeffm@suse.com

properly init device inodes in the acl code

[PATCH] reiserfs: xattr support
From: Chris Mason
From: jeffm@suse.com

reiserfs support for xattrs

[PATCH] reiserfs: ACL support
From: Chris Mason
From: jeffm@suse.com

reiserfs acl support

[PATCH] reiserfs: support trusted xattrs
From: Chris Mason
From: jeffm@suse.com

reiserfs support for trusted xattrs

[PATCH] reiserfs: selinux support
From: Chris Mason
From: jeffm@suse.com

reiserfs support for selinux

[PATCH] reiserfs: xattr locking fixes
From: Chris Mason
From: jeffm@suse.com

reiserfs xattr locking fixes

[PATCH] reiserfs: quota support
From: Chris Mason

ReiserFS support for quotas. Originally from Jan Kara

[PATCH] reiserfs: xattr permission fix
From: Chris Mason
From: jeffm@suse.com

reiserfs permission bug fix for xattrs

[PATCH] reiserfs: add device info to diagnostic messages
From: Chris Mason
From: Jeff Mahoney

Add device info to the various reiserfs warnings and panics so you can tell which filesystem triggers the message. Loosely based on code from Oleg Drokin.

[PATCH] mptfusion depends on scsi
From: Olaf Hering

[PATCH] find_user locking and leak fix

find_user() is being called from set/get_priority(), but it doesn't take the needed lock, and those callers were forgetting to drop the refcount which find_user() took.

[PATCH] Improve laptop mode's block_dump output
From: "Theodore Ts'o"

This patch versus improves the output produced by "echo 1 > /proc/sys/vm/block_dump", in the following ways:

1) The messages are printed with KERN_DEBUG, so that even if sysklogd is running, if configured appropriately, it will not need to write to log files.

2) The inode which is dirtied by a process is now identified more precisely by inode number and filesystem ID, and by a dcache name if present.

3) In the generic filesystem sget function, the superblock id (s_id) is filled in with the filesystem type by default.
Filesystems which are block-device based will override s_id, but this allows pseudo filesystems such as tmpfs, procfs, etc. to be identified in (2).

[PATCH] com90xx error message patch: check_region() gone
From: Greg Aumann

This patch updates two error messages to reflect changes in the code.

[PATCH] Kill a warning while making pdfdocs.
From: Alexey Dobriyan

  DOCPROC Documentation/DocBook/parportbook.sgml
  Warning(drivers/parport/share.c:188): No description found for parameter 'drv'

(kernel-doc parameter name is incorrect.)

[PATCH] Kill some 'No description found...' warnings. (kernel-api.sgml)
From: Alexey Dobriyan

Fix various kernel-doc parameters.

[PATCH] Allow architectures to reenable interrupts on contended spinlocks
From: Keith Owens

As requested by Linus, update all architectures to add the common infrastructure. Tested on ia64 and i386.

Enable interrupts while waiting for a disabled spinlock, but only if interrupts were enabled before issuing spin_lock_irqsave().

This patch consists of three sections :-

* An architecture independent change to call _raw_spin_lock_flags() instead of _raw_spin_lock() when the flags are available.

* An ia64 specific change to implement _raw_spin_lock_flags() and to define _raw_spin_lock(lock) as _raw_spin_lock_flags(lock, 0) for the ASM_SUPPORTED case.

* Patches for all other architectures and for ia64 with !ASM_SUPPORTED to map _raw_spin_lock_flags(lock, flags) to _raw_spin_lock(lock). Architecture maintainers can define _raw_spin_lock_flags() to do something useful if they want to enable interrupts while waiting for a disabled spinlock.

[PATCH] Un-inline spinlocks on ppc64
From: Paul Mackerras

The patch below moves the ppc64 spinlocks and rwlocks out of line and into arch/ppc64/lib/locks.c, and implements _raw_spin_lock_flags for ppc64.
Part of the motivation for moving the spinlocks and rwlocks out of line was that I needed to add code to the slow paths to yield the processor to the hypervisor on systems with shared processors. On these systems, a cpu as seen by the kernel is a virtual processor that is not necessarily running full-time on a real physical cpu. If we are spinning on a lock which is held by another virtual processor which is not running at the moment, we are just wasting time. In such a situation it is better to do a hypervisor call to ask it to give the rest of our time slice to the lock holder so that forward progress can be made.

The one problem with out-of-line spinlock routines is that lock contention will show up in profiles in the spin_lock etc. routines rather than in the callers, as it does with inline spinlocks. I have added a CONFIG_SPINLINE config option for people that want to do profiling. In the longer term, Anton is talking about teaching the profiling code to attribute samples in the spin lock routines to the routine's caller.

This patch reduces the kernel by about 80kB on my G5. With inline spinlocks selected, the kernel gets about 4kB bigger than without the patch, because _raw_spin_lock_flags is slightly bigger than _raw_spin_lock.

This patch depends on the patch from Keith Owens to add _raw_spin_lock_flags.

[PATCH] Only Print Taint Message Once
From: Rusty Russell

Only print the tainted message the first time. Its purpose is to warn users that we can't support them, not to fill their logs.

[PATCH] blk_start_queue() should use kblockd

kblockd is the thread which runs unplug functions, not keventd.

[PATCH] EDD: follow sysfs convention, MODULE_VERSION, remove dead SCSI symlink
From: Matt Domsch

Clean up the edd.c driver.

* use kobject_set_name() instead of snprintf() per GregKH's recommendation.

* Add MODULE_VERSION()

* s/driverfs/sysfs/ in Kconfig

* Remove report URL message, as there have been too many BIOSs reported, virtually none of which are EDD-capable.
This may return if/when I develop a better reporting method and database to capture/store the data from users.

* Remove the unused code for creating a symlink to the scsi_device. This never worked right, and I'm going to show the relationship from a userspace tool which uses libsysfs instead.

[PATCH] cmpci OSS driver update
From: C.L. Tien

Current version from cmedia.

[PATCH] dentry and inode cache hash algorithm performance changes.
From: "Jose R. Santos"

It alleviates some issues seen with Linux when accessing millions of files on machines with large amounts of RAM (+32GB). Both algorithms are based on some studies that Dominique Heger was doing on hash table efficiencies in Linux. The dentry hash table has been tested in small systems with one internal IDE hard disk as well as in large SMP systems with many Fibre Channel disks. Dominique claims that in all the testing done, they did not see one case where this hash function provided worse performance and that in most tests they were seeing better performance.

The inode hash function was done by me based on Dominique's original work and has only been stress tested with SpecSFS. It provided a 3% improvement over the default algorithm in the SpecSFS results and speedups in the response time of almost all filesystem operations the benchmark stresses. With the better distribution it is also possible to reduce the number of inode buckets from 32 million to 16 million and still get slightly better results.

Anton was nice enough to provide some graphs that show the distribution before and after the patch at http://samba.org/~anton/linux/sfs/1/

For the dentry hash function, some of my other coworkers had put this hash function through various testing and have concluded that the hash function was equal or better than the default hash function. These runs were done with a (hopefully to be Open Source soon) benchmark called FFSB which can simulate various I/O patterns across many filesystems and variable file sizes.
The SpecSFS fileset is basically a lot of small files, the number of which varies depending on the size of the run. For a not so big SMP system the number of files is in the +20 million range. Of those 20 million files only 10% are accessed randomly by the client. The purpose of this is that the benchmark tries to stress not only the NFS layer but the VM and filesystem layers as well. The filesets are also hundreds of gigabytes in size in order to promote disk head movement by guaranteeing cache misses in memory. In SFS, 27% of the workload is lookups, and __d_lookup has been showing high in my profiles.

For the inode hash the problem that I see is that when running a benchmark with this huge fileset we end up trying to free a lot of inode entries during the run while trying to put new entries in cache. We end up calling ifind_fast() which calls find_inode_fast() held under inode_lock. In order to avoid holding the inode_lock we needed to avoid having long chains in that hash function.

When I took a look at the original hash function, I found it to be a bit too simple for any workload. My solution (for which I took advantage of Dominique's work) was to create a hash function that could generate completely different hashes depending on the hashval and the superblock in order to have the hash scale as we added more filesystems to the machine.

Both of these problems can be somewhat tuned out by increasing the number of buckets of both d and i cache but it got to a point where I had 256MB of inode and 128MB of dentry hash buckets on a not so large SMP. With the hash changes I have been able to reduce the number of buckets to 128MB for inode cache and to 32MB for dentry cache and still get better performance.

If it helps my case... I haven't been running this benchmark for long, so I haven't been able to find a way to cheat. I need to come up with generic solutions until I can find a cheat for the benchmark.
:)

SDET results:

Steve Pratt seems to have a SDET setup already and he did me the favor of running SDET with a reduced dentry hash table size. I believe that his table suggests that less than 3% change is acceptable variability, but overall he got a 5% better number using the new hash algorithm.

A) x4408way1.sdet.2.6.5100000-8p.04-05-05_12.08.44 vs
B) x4408way1.sdet.2.6.5+hash-100000-8p.04-05-05_11.48.02

Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 1048576 (order: 10, 4194304 bytes)

Results: Throughput

tolerance = 0.00 + 3.00% of A

                       A            B
    Threads      Ops/sec      Ops/sec    %diff         diff    tolerance
----------- ------------ ------------ -------- ------------ ------------
          1    4341.9300    4401.9500     1.38        60.02       130.26
          2    8242.2000    8165.1200    -0.94       -77.08       247.27
          4   15274.4900   15257.1000    -0.11       -17.39       458.23
          8   21326.9200   21320.7000    -0.03        -6.22       639.81
         16   23056.2100   24282.8000     5.32      1226.59       691.69 *
         32   23397.2500   24684.6100     5.50      1287.36       701.92 *
         64   23372.7600   23632.6500     1.11       259.89       701.18
        128   17009.3900   16651.9600    -2.10      -357.43       510.28
=========================================================================

[PATCH] Fix MTD suspend/resume
From: Russell King

This patch carries forward the following bug fix from MTD CVS. The bug it fixes causes a lot of noise after a suspend/resume cycle on ARM devices.

revision 1.127
date: 2003/07/02 20:29:38; author: acurtis; state: Exp; lines: +2 -1
Added FL_STATUS to the FL_READY case in put_chip(). (Eliminate noise)

[PATCH] remove blk_queue_bounce() printks
From: Matt Domsch

Jens Axboe wrote:

It should just be deleted. As you note, it is a debug message. I originally added it so we would have some clues as to dma capability for bug reports. There never was any, the check can go :)

[PATCH] fix deadlock in create_workqueue()

Fix bug identified by Srivatsa Vaddagiri :

There's a deadlock in __create_workqueue when CONFIG_HOTPLUG_CPU is set. This can happen when create_workqueue_thread fails to create a worker thread.
In that case, we call destroy_workqueue with cpu hotplug lock held. destroy_workqueue however also attempts to take the same lock. [PATCH] throttle P4 thermal warnings From: Zwane Mwaikambo In really bad conditions this can keep printing for a while, throttle the output somewhat. Also change the "CPU%d" formatting to better match the other boot output. [PATCH] pcmcia/i82365.c warning fix From: "Luiz Fernando N. Capitulino" drivers/pcmcia/i82365.c: At top level: drivers/pcmcia/i82365.c:71: warning: `version' defined but not used [PATCH] worker_thread race fix Fix a waitqueue-handling race in worker_thread(). [PATCH] Warn when smp_call_function() is called with interrupts disabled From: Keith Owens Almost every architecture has a comment above smp_call_function()

  * You must not call this function with disabled interrupts or from a
  * hardware interrupt handler or from a bottom half handler.

I have not seen any problems with calling smp_call_function() from a bottom half handler, but calling it with interrupts disabled can definitely deadlock. This bug is hard to reproduce and even harder to debug.

  CPU A                         CPU B
  Disable interrupts
                                smp_call_function()
                                Take call_lock
                                Send IPIs
                                Wait for all cpus to acknowledge IPI
                                CPU A has not responded, spin waiting
                                for cpu A to respond, holding call_lock
  smp_call_function()
  Spin waiting for call_lock
  Deadlock                      Deadlock

Change all smp_call_function() to WARN_ON(irqs_disabled()). It should be BUG_ON() but some buggy code like SCSI sg will break with BUG_ON, so just warn for now. Change it to BUG_ON after the buggy code has been fixed. [PATCH] fixup 68360 module refcounting From: Christoph Hellwig [PATCH] gcc-3.4.0 fixes for 2.6.6-rc3 x86_64 kernel From: Mikael Pettersson Here are some patches to fix compilation warnings from gcc-3.4.0 in the 2.6.6-rc3 x86_64 kernel.
- puts() type conflict in boot/compressed/misc.c: rename to putstr(), just like i386 did - cast-as-lvalue in ia32_copy_siginfo_from_user(): use temporary - code before declaration in io_apic.c: move decl up - code before declaration in ioremap.c: move existing #ifndef up - cast-as-lvalue (tons of them) from UP version of per_cpu(): merged asm-generic's version [PATCH] ppc64: use generic ipc syscall translation From: David Gibson Currently ppc64 has its own code to convert 32-bit ipc() syscalls to 64-bit, rather than using the common translation code from ipc/compat.c. This patch, tweaked slightly from an earlier version of Anton Blanchard's fixes that, replacing the ppc64 code with calls to the common code. I've run the LSB IPC tests, and as many of the LTP IPC tests as I could figure out how to run easily, and it seems to pass them all. [PATCH] fix ramdisk size assembler warning From: Jorn Engel AS arch/i386/boot/setup.o /usr/src/linux-2.6.5/arch/i386/boot/setup.S: Assembler messages: /usr/src/linux-2.6.5/arch/i386/boot/setup.S:159: Warning: value 0x37ffffff truncated to 0x37ffffff The warning is correct, the calculated value for ramdisk_max would be 0xb7ffffff instead of 0x37ffffff. Truncating 0xb7ffffff to 0x37ffffff is desired behaviour, so we should do it explicitly. [PATCH] cyclades cleanups From: Marcelo Tosatti - cleanups for cyclades Kconfig entry (Adrian Bunk/me) - janitors project: remove dead function (Don Koch) From: aris@cathedrallabs.org (Aristeu Sergio Rozanski Filho) Use the standard min/max macros [PATCH] jiffies-to-clockt fix From: john stultz This patch polishes up Tim Schmielau's (tim@physik3.uni-rostock.de) fix for jiffies_to_clock_t() and jiffies_64_to_clock_t(). The issues observed was w/ /proc output not matching up to wall time due to accumulated error caused by HZ not being exactly 1000 on i386 systems. The solution is to correct that error by using the more accurate TICK_NSEC in our calculation. 
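The accumulated error described above can be pictured with a small userspace sketch. It is not the code from the patch: the SKETCH_ names are made up for illustration, and the tick length of 999849 ns is an assumption derived from a PIT running at 1193180 Hz with a divisor of 1193, so that one jiffy is slightly shorter than the nominal 1 ms.

```c
#include <stdint.h>

/* Assumed i386 values from the 2.6 era; illustrative only, not taken from the patch. */
#define SKETCH_HZ           1000ULL        /* nominal timer interrupts per second */
#define SKETCH_USER_HZ      100ULL         /* clock_t ticks per second seen by userspace */
#define SKETCH_NSEC_PER_SEC 1000000000ULL
#define SKETCH_TICK_NSEC    999849ULL      /* assumed real length of one jiffy: 1193 / 1193180 s */

/* Naive conversion: pretends every jiffy is exactly 1/HZ seconds long. */
uint64_t sketch_jiffies_to_clock_t_naive(uint64_t jiffies)
{
    return jiffies / (SKETCH_HZ / SKETCH_USER_HZ);
}

/* Corrected conversion: scale by the real tick length in nanoseconds,
 * then divide by the number of nanoseconds in one clock_t tick. */
uint64_t sketch_jiffies_to_clock_t(uint64_t jiffies)
{
    return (jiffies * SKETCH_TICK_NSEC) / (SKETCH_NSEC_PER_SEC / SKETCH_USER_HZ);
}
```

Under these assumptions, 86400000 jiffies convert to 8640000 ticks with the naive version and 8638695 ticks with the corrected one, a gap of 1305 ticks or roughly 13 seconds at a USER_HZ of 100.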
Additionally, this patch corrects 3 warnings in the TCP layer uncovered by this change. [PATCH] readahead: keep file->f_ra sane When two threads are simultaneously pread()ing from the same fd (which is a legitimate thing to do), the readahead code thinks that a huge amount of seeking is happening and shrinks the window, damaging performance a lot. I don't see a sane way to avoid this within the readahead code, so take a private copy of the readahead state and restore it prior to returning from the read. [PATCH] CLOCK_TICK_RATE: introduce asm-*/8253pit.h, #define PIT_TICK_RATE constant. From: Thorsten Kranzkowski The calculation of the counter values in drivers/input/misc/pcspkr.c is incorrectly based on CLOCK_TICK_RATE. This goes unnoticed in i386 because there the system clock is driven by the same Programmable Interval Timer chip as the speaker. But this doesn't hold true on other archs, e.g. Alpha. To solve this problem I made these patches: 1/3: introduce asm-*/8253pit.h, #define PIT_TICK_RATE constant. It seems this is not always the same value. 2/3: use PIT_TICK_RATE in *spkr.c 3/3: use CLOCK_TICK_RATE where 1193180 was used in general timing calculations. (optional) There are still some places where the magic number is used instead of the #define (vt_ioctl.c, gameport.c) but I left them as-is. I got some responses from arch maintainers to specifically not touch their respective architectures so changing these places would mean breakage for them. Tested on Alpha and i386, ack'ed by Ralf Baechle for MIPS. This patch: introduce asm-*/8253pit.h, #define PIT_TICK_RATE constant. [PATCH] CLOCK_TICK_RATE: use PIT_TICK_RATE in *spkr.c From: Thorsten Kranzkowski [PATCH] CLOCK_TICK_RATE: use CLOCK_TICK_RATE From: Thorsten Kranzkowski use CLOCK_TICK_RATE where 1193180 was used in general timing calculations. (optional) [PATCH] es7000 subarch update for generic arch From: "Protasevich, Natalie" This is ES7000 sub architecture update. 
It makes ES7000 a part of the generic architecture, so the single compiled kernel will be able to choose a correct set of parameters, routines ("genapic"), and a boot path. It uses criteria provided by the subarch for platform identification. In case of ES7000, it is a unique product/vendor string in the ACPI/MP OEM table, and server control registers. The patch is confined to only es7000 subarch and generic subarch. It was tested on ES7000 as well as generic Intel 8x Xeon system. Andi Kleen has reviewed the changes. [PATCH] update Documentation/md.txt From: (Dick Streefland) The following patch documents the currently undocumented raid= kernel parameter. [PATCH] bfs filesystem read past the end of dir From: Jakub Jermar I found out that BFS filesystem will eventually try to read and interpret garbage past the end of directory in bfs_add_entry(). If the garbage (interpreted as i-node number) is not set to zero (does it have to be?) bfs_add_entry() will consider it a regular directory entry. This causes weird things like this: # touch a # rm a # ls # touch b # ls a My patch detects an attempt to read past the end of directory and explicitly clears the garbage that represents i-node number. Thus the correct behaviour is achieved. (was unable to contact Tigran) [PATCH] simplify mqueue_inode_info->messages allocation From: Chris Wright Currently, if a user creates an mqueue and passes an mq_attr, the info->messages will be created twice (and the extra one is properly freed). This patch simply delays the allocation so that it only ever happens once. The relevant mq_attr data is passed to lower levels via the dentry->d_fsdata fs private data. This also helps isolate the areas we'd need to touch to do rlimits on mqueues. 
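As a rough userspace picture of the "allocate once" idea in the mqueue change above: decide the queue depth first, taking it from the caller's attributes if any were passed, and only then allocate the message array. Everything here is hypothetical (the sketch_ names, the default depth of 10, the allocation counter); it is not the kernel code.

```c
#include <stdlib.h>

#define SKETCH_DFLT_MSGMAX 10   /* hypothetical default queue depth */

struct sketch_mq_attr { long mq_maxmsg; };
struct sketch_mq_info { long maxmsg; void **messages; };

int sketch_allocs = 0;          /* counts allocations, for illustration only */

/* Pick the depth first, then allocate the message array exactly once,
 * instead of allocating a default array and replacing it afterwards. */
int sketch_mq_init(struct sketch_mq_info *info, const struct sketch_mq_attr *attr)
{
    info->maxmsg = attr ? attr->mq_maxmsg : SKETCH_DFLT_MSGMAX;
    info->messages = calloc((size_t)info->maxmsg, sizeof(void *));
    if (!info->messages)
        return -1;
    sketch_allocs++;
    return 0;
}
```

With or without caller attributes, a single call performs a single allocation.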
[PATCH] swsusp documentation updates From: Pavel Machek [PATCH] blk: cache queue_congestion_on/off_threshold values From: "Chen, Kenneth W" It's kind of redundant that queue_congestion_on/off_threshold gets calculated on every I/O and they produce the same number over and over again unless q->nr_requests gets changed (which is probably a very rare event). We can cache those values in the request_queue structure. [PATCH] SElinux interface for reporting size of printk buffer From: Olaf Dabrunz Add the necessary hooks so that a SELinux-enabled kernel will allow the new "report the size of the printk buffer" query to work. [PATCH] Fix race on tty close From: Benjamin Herrenschmidt ldisc close can race with the flush_to_ldisc workqueue. This patch fixes it by killing the workqueue first. [PATCH] as-iosched barrier fix From: Jens Axboe AS does not correctly account requests inserted with INSERT_FRONT or INSERT_BACK, barriers for example. In other elevators, requeued requests also go through the insert path, but AS has its own requeue handler which means the code has never been tested. Also, make inserting a barrier with INSERT_SORT imply INSERT_BACK, which is the logical behaviour. Previously such insertions weren't rigorously defined. [PATCH] pcmcia/tcic.c warning fix. From: "Luiz Fernando N. Capitulino" drivers/pcmcia/tcic.c:63: warning: `version' defined but not used [PATCH] Lindent arch/i386/kernel/cpuid.c From: Hanna Linder Per Greg's request this is a patch of having run Lindent on cpuid.c. The tabs were not the right number of spaces before. I have verified it still compiles and boots with this "change". [PATCH] fix wrong var used in hotplug/shpchp_ctrl.c. From: "Luiz Fernando N. Capitulino" Zhenmin's checker tool detected this: 9. 
/drivers/pci/hotplug/shpchp_ctrl.c, Line 1575: err("%s: Failed to disable slot, error code(%d)\n", __FUNCTION__, rc); Maybe change to: err("%s: Failed to disable slot, error code(%d)\n", __FUNCTION__, retval); I think it is right because at line 1564 the slot is turned off, and this line (1575) checks the status to see if we got an error; if so, the error number is shown. This number is in 'retval', not in 'rc' ('rc' holds the return value of configure_new_device()). [PATCH] hugepage: fix add_to_page_cache() error handling From: David Gibson add_to_page_cache() locks the given page if and only if it succeeds. The hugepage code (every arch), however, does an unlock_page() after add_to_page_cache() before checking the return code, which could trip the BUG() in unlock_page() if add_to_page_cache() failed. In practice we've never hit this bug, because the only ways add_to_page_cache() can fail are when we fail to allocate a radix tree node (very rare), or when there is already a page at that offset in the radix tree, which never happens during prefault, obviously. We should probably fix it anyway, though. The analogous bug in some of the patches floating about for demand-allocation of hugepages is more of a problem, because multiple processes can race to instantiate a particular page in the radix tree - that's been hit at least once (which is how I found this). [PATCH] Add sysctl to define a hugetlb-capable group From: "Chen, Kenneth W" , "Seth, Rohit" This patch addresses the longstanding problem wherein Oracle needs CAP_IPC_LOCK to allocate SHM_HUGETLB shm memory, but people don't want to run Oracle as root, and capabilities are busted. Various ideas with rlimits didn't work out, mainly because these objects live beyond the lifetime of the user processes which establish them. What we do is to create root-writeable /proc/sys/vm/hugetlb_shm_group which specifies a single group ID. Users who belong to that group may allocate hugepages for SHM_HUGETLB shm segments.
So the sysadmin will create a new group, say `hugepageusers', will add the oracle user to that group and will write that group's ID into /proc/sys/vm/hugetlb_shm_group. [PATCH] cpqarray update for 2.6 From: This patch fixes 2 minor issues that break our Array Configuration utility. my_io was changed to a pointer so the & had to be removed when using it with copy_to_user(). Sometime in 2.5 SG_MAX got changed to 31. Maybe to copy cciss? Now I'm changing it back to 32 so our app can work. [PATCH] kill useless MOD_{INC,DEC}_USE_COUNT in sound/oss/msnd.c From: Christoph Hellwig Callers are exported register/unregister handlers so the module is locked in core by users of said exports. [PATCH] kill MOD_{INC,DEC}_USE_COUNT gunk in arch/cris/arch-v10/drivers/pcf8563.c From: Christoph Hellwig Driver already sets fops->owner so the open/close methods are entirely superfluous. [PATCH] fix MOD_{INC,DEC}_USE_COUNT gunk in arch/um/drivers/net_kern.c From: Christoph Hellwig Well, UML is pretty out of date in mainline, but I'd like to squash the last users of said beasts rather sooner than later. [PATCH] drivers/video/* MOD_INC_USE_COUNT fixes From: Christoph Hellwig A bunch of framebuffer drivers use MOD_INC_USE_COUNT to prevent themselves from unloading completely - but we have a much easier way to do so, that is simply removing the module_exit/cleanup_module handler. [PATCH] fix MOD_INC_USE_COUNT usage in mtd From: Christoph Hellwig mtd drivers need to get another reference if ->probe succeeds (strange design if you ask me, but what the heck..), and while most drivers have been switched to __module_get already two are still missing. [PATCH] remove MOD_INC_USE_COUNT usage in arch/um/drivers/harddog_kern.c From: Christoph Hellwig ->open already has a reference so use __module_get. The file has no maintainer noted in it, all credits are from the driver it's copied from.
[PATCH] minor RCU optimization From: Stephen Hemminger Minor tweak to rcu, use __list_splice instead of list_splice because the list has already been checked for empty. [PATCH] use core_initcall for binfmt initialisation We need to register the binfmts earlier, so normal initcalls can successfully run call_usermodehelper() to execute things. [PATCH] Make usermodehelper_init() use core_initcall() We may as well make usermodehelper_init() core_initcall as well, to make sure its services are available to all the other initcall levels. [PATCH] export con_set_default_unimap() fbcon needs this symbol. [PATCH] Crystal cs4235 mixer fix From: Joseph Parmelee Fixes improper setup of the mixer on Crystal soundcards with the CS4235 chip. [PATCH] remove kernel 2.2 code from drivers/net/hamradio/dmascc.c From: Adrian Bunk The patch below removes some #ifdef'd kernel 2.2 code from drivers/net/hamradio/dmascc.c. [PATCH] telephony/ixj.h: remove kernel 2.2 #ifdef's From: Adrian Bunk The patch below removes two #ifdef's for kernel 2.2 from linux-2.6.2-mm1/drivers/telephony/ixj.h [PATCH] fix some typos in sound docs From: Christoph Hellwig (partially from the debian kernel tree) [PATCH] make tags for selinux From: Olaf Hering make tags skips security/selinux/include because of find . -name include -prune This patch just adds it later. No idea if it can be done better. [PATCH] remove intermezzo Peter Braam said: I would just like to say that I have no difficulties with intermezzo being rm -rf'd. There are probably only a handful of users. In the past 4 years nobody has supported InterMezzo sufficiently for it to become successful. I have been fortunate to get really good support for the Lustre project. So I have focussed on that. Lustre 1.X has become really solid. The disconnected operation, caching and mirroring functionality of InterMezzo will become available in Lustre as a new feature in version 2. So I see no point in keeping InterMezzo if it is a nuisance.
The patch removes the references to intermezzo. Please do a `bk rm' of fs/intermezzo. [PATCH] PPC termio fix From: Paul Mackerras It turns out that we are not handling the TABDLY bits of the termios c_oflag field correctly on PPC, PPC64 and Alpha. These three architectures have a value for XTABS that is different from the TAB3 value. POSIX specifies that setting the TABDLY field to TAB3 should result in tabs being expanded to spaces. In n_tty.c:opost() we check for O_TABDLY(tty) == XTABS, which is fine on most architectures because they have XTABS == TAB3. I think the right thing to do is just to change the definition of XTABS to be the same as TAB3 on these architectures. The patch below does this for PPC and PPC64 (and I suggest the Alpha maintainer should do the same). At the moment, applications using either the XTABS or TAB3 values won't get the expected behaviour. With this patch, apps that use TAB3 will get the expected behaviour. Apps that use XTABS will need to be recompiled (but note that the POSIX-specified name to use is TAB3 not XTABS). [PATCH] Fix __down Tainting Kernel with CONFIG_MODVERSIONS=y From: Rusty Russell PowerPC64 ABI has ".funcname" (the actual function) and "funcname" (the function descriptor) and we strip off the dots in "dedotify" called from module_frob_arch_sections(). We need to also de-dotify the corresponding names in the __version section. Actually has nothing to do with __down, it's just that we only print the first symbol whose version is missing. Remove intermezzo, per instructions from Peter Braam. [PATCH] x86-64: convert sibling map to masks From: Andi Kleen From: Suresh B. Siddha Convert sibling map on x86-64 to cpumasks. This is needed for the SMT patches. [PATCH] Add SMT setup for domain scheduler on x86-64 From: Andi Kleen Set up SMT for the domain scheduler on x86-64. 
This way the scheduling works better on HyperThreading aware systems; in particular it will use both physical CPUs before sharing two virtual CPUs on the same package. This improves performance considerably in some cases. Based on the i386 code and a previous patch from Suresh B. Siddha. [PATCH] get_thread_area macro fixes From: Adam Lackorzynski one of the macros for get_thread_area extracts the wrong bit. The "32bit" field is in bit 22, not 23 (as can be seen in desc.h). [ Fix ia64/x86-64 too, while we're at it. Linus ] [PATCH] fix LLD module refcounting in sr.c The patch to close all the open/close/hotplug races in sr left the module refcounting broken so that the ULD housing the CD device now can't be removed until the device itself is removed. This patch (structurally identical to the one for sd.c to perform the same function) fixes the module refcounting. qla2100 fabric fixes From: "Andrew Vasquez" Ok, well there aren't too many folks using an QLA2100 in a fabric topology, if there were, they wouldn't have gotten very far in the driver load sequence. I've been able to scrape-up a QLA2100, 1Gig switch, and an JBOD. Upon loading the 8.00.00b12k driver, the firmware successfully logs into the switch, the driver receives a LOOP_UP event, but, the kernel panics due to NULL pointer dereference while trying to perform an RFT_ID -- the attached patch against current scsi-misc-2.6 fixes that problem. NTFS: Cleanup whitespace (trailing space removal, etc). [SERIAL] Fix exit function pointer initialisers This wraps pointer initialisers to functions marked __devexit with __devexit_p. [PATCH] ntfs cleanup ntfs_fill_super() and ntfs_read_inode_mount() cleaned up. Removed the kludges around the first iget() on NTFS. Instead of playing with (re)setting ->s_op we have the MFT_FILE inode set up by explicit new_inode()/ set ->i_ino/insert_inode_hash()/call ntfs_read_inode_mount() directly. 
That kills the need of second super_operations and it allows to return error from ntfs_read_inode_mount() without resorting to ugly "poisoning" tricks. [PATCH] Sun3x dummycon Sun3x: Like most other platforms, Sun3x needs conswitchp set if CONFIG_DUMMY_CONSOLE is defined (from Sam Creasey) [PATCH] M68k missing M68k: needs include for __attribute_const__ (from Richard Zidlicky) [PATCH] PA-RISC updates for 2.6.6 - Split PA7300LC from PA7100LC (Matthew Wilcox) - Handle 32-bit firmware and 64-bit kernel at runtime (Ryan Bradetich) - Fix building in a separate tree (Matthew Wilcox) - Update defconfigs (Randolph Chung) - Make WCHAN work (Randolph Chung) - Initial support for SMP in 2.6 (Grant Grundler) - Use 8-byte PTEs on 32-bit kernels (James Bottomley) - Implement L2/L3 hybrid page tables for 64 bit kernels (James Bottomley) - Support 8TB of physical and virtual address space (James Bottomley) - Macro'ise the tlb miss handlers (James Bottomley) - Check the ptrace flags correctly in the syscall return path (Randolph Chung) - Eliminate many magic numbers (James Bottomley) - Work around linker bug in vmlinux.lds.S (James Bottomley) - Many cache flushing fixes (James Bottomley) - first baby step for PA8800 support (Grant Grundler) - Self-aligning spinlocks (Randolph Chung) [PATCH] ppc64: extra barrier in I/O operations At the moment, on PPC64, the instruction we use for wmb() doesn't order cacheable stores vs. non-cacheable stores. (It does order cacheable vs. cacheable and non-cacheable vs. non-cacheable.) This causes problems in the sort of driver code that writes stuff into memory, does a wmb(), then a writel to the device to start a DMA operation to read the stuff it has just written to memory. This patch solves the problem by adding a sync instruction before the store in the write* and out* macros. The sync is a full barrier that orders all loads and stores, cacheable or not. 
The patch also moves the eieio instruction that we had after the store to before the load in the read* and in* macros. With the sync before the store, we don't need an eieio as well in a sequence of stores, but we still need an eieio between a store and a load. I think it is better to do this than to turn wmb() into a full memory barrier (a sync instruction) because the full barrier is slow and isn't needed with the sync in the write*/out* macros. This way, write*/out* are fully ordered with respect to preceding loads and stores, which is what driver writers expect, and we avoid penalizing users of wmb() who are only doing cacheable stores. [PATCH] radeon: fix overlapping copyarea This fixes a corruption problem with overlapping copyarea()'s in the radeon driver. [libata] preparation for writeback caching support * bug fix: make sure 'nsect' member of struct ata_queued_cmd is initialized each time a cmd is re-used. Only affects PIO data xfers, which nobody uses. * slightly change the way a device's flags are printed out. currently the only flag is 'lba48', but soon 'wcache' will appear also. * add WB-cache-related constants and macros to linux/ata.h [libata] Maintainer annotations In MAINTAINERS and in individual low-level drivers. ia64: Add support to the kernel unwinder for the ".save rp, r0" idiom. Based on patch by Keith Owens. Update readme with new information on symlinks to Samba ia64: Minor changes to get a (mostly) clean compile with GCC pre-3.5. [PATCH] kill warning in r8169 [PATCH] lance.c: fix for card with signature 0x52 0x49 From: Vesselin Kostadiov Problem: The lance.c driver did not work with my Racal Interlan EtherBlaster card. More info: I found that my card has a signature 0x52 0x49 that was not recognized by the driver. Explanation: Following your suggestion I created a static table with possible signatures, not too different from the table in ni65.c. 
The updated code compares the first byte of the card's signature with the first byte of the signatures from the table. If this succeeds then it reads the second byte and compares the whole signature with the values from the table. This way the minimal I/O reads approach is maintained. Side effect: The previous version would misdetect cards with signatures 0x52 0x57 and 0x57 0x44. This has been fixed as well. [PATCH] 8139too not running s3 suspend/resume pci fix From: "Adrian Yee" Having an 8139 based device in my notebook, I often switch between it and wireless. The problem is that the 8139too driver does not save/restore the pci configuration of the card if the device isn't running. This simple patch moves the save/restore code so that the code runs regardless of whether or not the device is running. I looked at other drivers and they all seem to do the same thing. Is there a reason why this isn't done like in the patch? [netdrvr gt961000eth] remove useless MOD_{INC,DEC}_USE_COUNT *grr* - this one slipped in after my last round of audits it seems. [netdrvr dmfe] netpoll support [sound/oss i810] fix wait queue race in drain_dac This particular one fixes a textbook race condition in drain_dac that causes it to timeout when it shouldn't. [sound/oss i810] fix race This patch fixes the value of swptr in case of an underrun/overrun. Overruns/underruns probably won't occur at all when the driver is fixed properly, but this doesn't hurt. [sound/oss] remove bogus CIV_TO_LVI This patch removes a pair of bogus LVI assignments. The explanation in the comment is wrong because the value of PCIB tells the hardware that the DMA buffer can be processed even if LVI == CIV. Setting LVI to CIV + 1 causes overruns with short writes (something that vmware is very fond of). [sound/oss i810] clean up with macros This patch adds a number of macros to clean up the code.
[sound/oss i810] fix partial DMA transfers This patch fixes a longstanding bug in this driver where partial fragments are fed to the hardware. Worse yet, those fragments are then extended while the hardware is doing DMA transfers causing all sorts of problems. [sound/oss i810] fix playback SETTRIGGER This patch fixes SETTRIGGER with playback so that the LVI is always set and the DAC is always started. [sound/oss i810] fix OSS fragments This patch makes userfragsize do what it's meant to do: do not start DAC/ADC until a full fragment is available. [sound/oss i810] remove divides on playback This patch removes a couple of divides on the playback path. [sound/oss i810] fix drain_dac loop when signals_allowed==0 This patch fixes another bug in the drain_dac wait loop when it is called with signals_allowed == 0. [sound/oss i810] fix reads/writes % 4 != 0 This patch removes another bogus chunk of code that breaks when the application does a partial write. In particular, a read/write of x bytes where x % 4 != 0 will loop forever. [sound/oss i810] fix deadlock in drain_dac This patch fixes a typo in a previous change that causes the driver to deadlock under SMP. [CPUFREQ] cpu_sibling_mask fixup. [CPUFREQ] Latency is in nanoseconds -- speedstep-centrino got it wrong [CPUFREQ] Avoid scheduling cpufreq_delayed_get_work() twice; but do call it a bit earlier. [CPUFREQ] Fix for longrun.c for degenerate case From H. Peter Anvin I ran into a system the other day which had a Transmeta processor, but configured in a degenerate, fixed-frequency configuration. It crashed booting Fedora Core 2 test 3 due to a division by zero in the longrun cpufreq driver. [CPUFREQ] Nehemiah improvements for longhaul driver. From Andreas Meisinger [PATCH] M68k superfluous whitespace M68k: Remove superfluous whitespace that hurts my eyes with `let c_space_errors=1' in vim. This includes correcting trailing whitespace and spaces in front of tabs. `diff -urNbB' shows no difference before/after. 
[ARM PATCH] 1828/2: rework SA11xx PCMCIA code structure for better sharing of generic code Patch from Nicolas Pitre [Updated after comments on patch #1828/1] This patch moves things around and renames a few files/functions/structures to be shared with PXA2xx PCMCIA support (coming in a separate patch) and maybe other SoCs. No functional changes were made so SA11xx users shouldn't see any difference. [ARM PATCH] 1829/1: base support for PCMCIA on the Intel PXA2xx chip Patch from Nicolas Pitre This provides base PCMCIA support for PXA2xx and relies on patch #1828/1. Board specific support must be added separately. This is based on initial work from Stefan Eletzhofer and Ian Molton. [ARM PATCH] 1830/2: PCMCIA support for Lubbock (PXA2xx based) Patch from Nicolas Pitre [Patch #1830/1 wasn't generated correctly. Here's the fixed one.] This adds PCMCIA support for Lubbock and relies on patch #1829/1. This is probably the only reference board using both the PXA2xx and the SA1111 for PCMCIA support. Other boards won't need to link sa1111_generic.o. [ARM PATCH] 1831/1: remove outdated SA11xx PCMCIA documentation Patch from Nicolas Pitre This documentation no longer reflects the code and is therefore misleading. Better to remove it and let people see the code, where multiple examples can now be followed. [ARM PATCH] 1843/2: supress PCMCIA build warnings Patch from Nicolas Pitre This is an alternative to patch #1843/1 keeping the ugliness (and possible future breakage) to the affected driver only. This one should be applied after patch #1830/2. NTFS: 2.1.10 - Force read-only (re)mounting of volumes with unsupported flags. [sound i810] silently ignore invalid PCM_ENABLE_xxx bits from userland We must guarantee that struct file's ->f_mode agrees with PCM_ENABLE_xxx bits from userland OSS apps. Other drivers silently ignore invalid bits, so we follow their lead.
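A minimal sketch of what "silently ignore invalid bits" can look like, assuming the usual values of 0x1/0x2 for PCM_ENABLE_INPUT/PCM_ENABLE_OUTPUT and for FMODE_READ/FMODE_WRITE. The constants are redefined under SKETCH_ names so the fragment stands alone, and sketch_sanitize_trigger() is a hypothetical helper, not a function from the i810 driver.

```c
/* Assumed to mirror the OSS and VFS definitions; redefined so the sketch stands alone. */
#define SKETCH_PCM_ENABLE_INPUT  0x1
#define SKETCH_PCM_ENABLE_OUTPUT 0x2
#define SKETCH_FMODE_READ        0x1
#define SKETCH_FMODE_WRITE       0x2

/* Drop any trigger bit the file was not opened for, instead of failing the ioctl. */
int sketch_sanitize_trigger(unsigned int f_mode, int trigger_bits)
{
    if (!(f_mode & SKETCH_FMODE_READ))
        trigger_bits &= ~SKETCH_PCM_ENABLE_INPUT;   /* not open for reading: no capture */
    if (!(f_mode & SKETCH_FMODE_WRITE))
        trigger_bits &= ~SKETCH_PCM_ENABLE_OUTPUT;  /* not open for writing: no playback */
    return trigger_bits;
}
```

A file opened write-only that asks for both input and output would, in this sketch, simply end up with the output bit set.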
USB: make functions static in usb drivers that should be Thanks to Tridge's findstatic.pl script for helping find these. [SCHED]: Fix double slash in include directive. [SPARC64]: Export pci_domain_nr to modules. Module attributes: fix build error if CONFIG_MODULE_UNLOAD=n Thanks to Andrew Morton for pointing this out to me. [PATCH] I2C: kill duplicate includes in i2c bus drivers Following a suggestion of Arthur Othieno, here is a trivial patch that kills duplicate inclusions of config.h in four i2c bus drivers. At first I was thinking of also removing inclusions of config.h wherever it doesn't seem to be necessary but Eugene doesn't seem to think it's a good idea. So I may give it a try later (in 2.7), but for now this simple patch will be enough. [PATCH] I2C: new i2c video decoder calls Attached patch adds three new calls to the i2c video decoder API. The changes were requested by Michael (CC'ed) and approved by Gerd Knorr (v4l maintainer, CC'ed). Short explanation: * INIT is a general initialization call with optional initialization data. Reason for this is that several i2c decoders (or general: clients) are being used by several adapters (main drivers), and in some cases, one adapter simply needs different settings than the other, either because the adapter is completely different or because the card was reverse engineered in a way that doesn't allow multiple adapters to run using the same original initialization data. Michael faces such a problem right now. Both he and me lack time to properly sit together and work out the exact details or a proper way to merge. * VBI_BYPASS and GPIO set specific pins on the decoder. This will be used in the saa7111 driver. My driver (zr36067, original user of the saa7111 driver) doesn't use any of this, but Michael's does. [PATCH] I2C: new i2c video decoder calls: saa7111 driver Attached patch implements the i2c calls in the saa7111 driver. 
The driver is still compatible with old behaviour, so the zr36067 driver (the original user of the saa7111 module) doesn't need any changes. I'll probably gradually make everything use DECODER_INIT instead of 0 (that was a nice hack back then) somewhere later. Can I just remove '0' later on? Or are there official ABI rules for stable kernel versions? [PATCH] I2C: Rename hardware monitoring I2C class Quoting myself: > Mmm, I once proposed that I2C_ADAP_CLASS_SMBUS would be better renamed > I2C_ADAP_CLASS_SENSORS (so I2C_CLASS_SENSORS now). What about that? I > think it would be great to embed that change into your patch, so that > the name changes only once. > > BTW, if HWMON is preferred to SENSORS, this is fine with me too, I > have no strong preference. Below is a patch that does that. I finally went for HWMON. Yes, it's big, but it's actually nothing more than s/I2C_CLASS_SMBUS/I2C_CLASS_HWMON/ (thanks perl -wip :)). [PATCH] Add class support to drivers/net/wan/cosa.c This patch adds sysfs class support to the Cosa driver. I have verified it compiles but do not have the hardware to test it. If someone could test it, that would be helpful. [PATCH] USB: Add class support to drivers/usb/misc/tiglusb.c [PATCH] CompactPCI: remove set_attention_status This one removes useless code and fixes the issue that the return code of set_attention_status for cpcihp is always 0 even if cpci_set_attention_status returns an error. Eike [PATCH] CompactPCI: remove two useless checks These two checks are useless: cpci_hp_register_bus is only called from two places, one has constant arguments, the other one passes the module parameters which are checked for this condition on module load. And the bus argument of cpci_hp_unregister_bus can never be NULL, all functions calling this function use fields of the bus struct before so they would oops if it were ever NULL.
Eike

[PATCH] CompactPCI: use goto for error handling

Convert cpci_hotplug_core.c::cpci_hp_register_bus to use goto for error handling.

Eike

[PATCH] CompactPCI: remove useless NULL checks

If the "struct hotplug_slot *" parameter is ever NULL then something bogus is going on; we should not hide this bug by catching it. Also the name used in debug messages is fixed.

Eike

[PATCH] CompactPCI: remove slot_paranoia_check and get_slot

slot_paranoia_check is only another kind of checking everything for NULL. Removing it reduces get_slot to a simple cast, so that function can be killed as well.

Eike

[PATCH] CompactPCI: remove two useless functions

These 2 get_* functions do the same thing the hotplug core would do if they were not there. Kill Bil^H^H^Hem!

Eike

[PATCH] PCI Express, SHPCHP: fix freeing wrong resources

[PATCH] SHPC PCI Hotplug: fix cleanup_slots again...

On Thursday, 6 May 2004 02:48, Sy, Dely L wrote:
> On Fri, 30 Apr 2004 12:31:18 +0200, Rolf Eike Beer wrote:
>> -
>> -static int cleanup_slots (struct controller * ctrl)
>> +static void cleanup_slots(const struct controller *ctrl)
>>  {
>> -        struct slot *old_slot, *next_slot;
>> +        struct slot *old_slot;
>>
>>          old_slot = ctrl->slot;
>>          ctrl->slot = NULL;
>>
>>          while (old_slot) {
>> -                next_slot = old_slot->next;
>> -                pci_hp_deregister (old_slot->hotplug_slot);
>> -                kfree(old_slot->hotplug_slot->info);
>> -                kfree(old_slot->hotplug_slot->name);
>> -                kfree(old_slot->hotplug_slot);
>> -                kfree(old_slot);
>> -                old_slot = next_slot;
>> +                pci_hp_deregister(old_slot->hotplug_slot);
>> +                old_slot = old_slot->next;
>>          }
>
> The variable next_slot and its assignment should be kept, for once
> pci_hp_deregister() is called and the release callback function, which
> you added, may come in and clean up old_slot structure before the next
> ptr is saved. The code should be:
>
> static void cleanup_slots(const struct controller *ctrl)
> {
>         struct slot *old_slot, *next_slot;
>
>         old_slot = ctrl->slot;
>         ctrl->slot = NULL;
>
>         while (old_slot) {
>                 next_slot = old_slot->next;
>                 pci_hp_deregister (old_slot->hotplug_slot);
>                 old_slot = next_slot;
>         }
> }

Yes, good point. Greg, this patch does it. Please apply on top of the other stuff.

[PATCH] PCI Hotplug: fix wrong var used in hotplug/shpchp_ctrl.c.

Zhenmin's checker tool detected this:

9. /drivers/pci/hotplug/shpchp_ctrl.c, Line 1575:

        err("%s: Failed to disable slot, error code(%d)\n", __FUNCTION__, rc);

Maybe change to:

        err("%s: Failed to disable slot, error code(%d)\n", __FUNCTION__, retval);

I think it is right because at line 1564 the slot is turned off, and on this line (1575) the status is checked to see if we got an error; if so, the error number is shown. This number is in 'retval', not in 'rc' ('rc' holds the return value of configure_new_device()). The patch below fixes that:

[PATCH] acpiphp_glue.c oops fix

It oopses during modprobe because the first load of acpiphp didn't clean up properly.

[PATCH] PCI crash with pciless box or pci=off workaround on Vaio's

From: Alan Cox

Reasoning

- Earlier in boot if we find a PCI bus we set raw_pci_ops to the function table to use for PCI accesses.

- When we enter this function we now check if we have such a method

- If not then we know it will otherwise crash because the call sequence through the code goes

        pci_irq_init
        pirq_peer_trick
        pci_scan_bus_parented

  which always leads down into the device scan and a pci config access via raw_pci_ops. The moment the scan searches for a device it has to crash as it has no access methods.

- The other case which does nothing but pci_fixup_irqs is a no-op with no PCI devices

- THUS any situation that is changed by the raw_pci_ops check was previously an oops case anyway or did nothing anyway.
Tested with pci=off and without NTFS: Only build logfile.o if building the driver with read-write support. [PATCH] USB: fix media/dsbr100.c unused variable. drivers/usb/media/dsbr100.c: In function `usb_dsbr100_probe': drivers/usb/media/dsbr100.c:239: warning: unused variable `videodev' [PATCH] USB Gadget: Fix file-storage gadget Request Sense length On Fri, 7 May 2004, kernel@metro.cx wrote: > Hi All, > > I don't know where else to report this, but I found a very very very > minor bug in the usb gadgets drivers, specifically the file_storage.c > mass storage driver. > > In the function do_request_sense(..) it says: > > buf[7] = 18 - 7; // Additional sense length > > Whereas (according to page 38 of the USB mass storage class, UFI command spec, > http://www.usb.org/developers/devclass_docs#approved) this clearly neads > to be equal to 10, not 11. > > I checked with the 2.6.5 source, it is still there. Hope someone will find this usefull, although most USB hosts seem to ignore length bits alltogether anyway.... > > Koen Martens You are quite right; thank you for pointing this out. Greg, please apply the patch below. [PATCH] USB: EHCI power management updates This patch updates EHCI suspend/resume so that its essential components work on a few different implementations: - make root hub suspend/resume work - make remote wakeup work (given CONFIG_USB_SUSPEND patch) - separate root hub suspend/resume from PCI suspend/resume - say if controller supports remote wakeup (on this system) - sysfs register dump unavailable if controller is suspended Plus a handful of minor cleanups. Please merge, along with the "hcd-0506.patch" I sent last week. Tested by modifying sysfs power/state files, since ACPI doesn't work on this system (so I can't test system suspend/resume): - For root hub(*) ... suspend/resume works, also remote wakeup - PCI controller ... 
suspend/resume works, remote wakeup signals PME# (according to "lspci -vv"), but that's ignored on my test sytem Regardless of whether USB was active, "echo 1 > /proc/acpi/sleep" produced a system that wouldn't resume, and the same result came from "echo standby > /sys/power/state". So that's about as far as I can take this testing for now. - Dave (*) Doing this relies on the CONFIG_USB_SUSPEND patch. Otherwise no USB devices respond to sysfs power/state updates. The PCI suspend/resume is a superset of this. [PATCH] ia64: perfmon dv serialization patch Add ia64_dv_serialize_*() macros to ia64_set_*br() calls to avoid DV warnings from the assembler (requires updated assembler). [PATCH] USB: Accept devices with funky interface/altsetting numbers Now that all the USB drivers have been audited, we can safely accept devices that have noncompliant numbering for their interfaces or altsettings. This patch skips bad or duplicate descriptors, allows gaps in the numbering, accepts more or fewer interfaces than bNumInterfaces, and logs warnings describing all these things. Also, the debugging log messages have been improved by David Brownell. This should please a sizeable group of users. NTFS: Really final white space cleanups. [PATCH] USB: more functional HCD PCI PM glue This patch makes the usbcore PCI suspend/resume logic behave much better. In particular: - Even HCs without PCI PM support will normally be able to support global suspend, saving power ... and will need to resume later. Let them try to suspend; lots of not-that-old USB controllers don't have PM caps. - Saner order for the boilerplate PCI stuff. It also explicitly disables the IRQ and DMA, which aren't available in D1/D2/D3 states anyway. - Uses pci_enable_wake() when the root hub supports remote wakeup. Didn't fully work in one test setup; that controller's PME# was evidently ignored. (Not enabled unless CONFIG_USB_SUSPEND.) 
It worked for me with brief tests with the current 2.6.6-rc uhci-hcd with one old UHCI; more extensive ones with various OHCIs (using patches which I'll post soonish); and not at all with EHCI (where PM hasn't ever worked). Those of you who've been having PM problems might find this helpful as-is, though I think that unless you're using UHCI you'll also need an HCD patch. - Dave [PATCH] USB: fixes of assumptions about waitqueues quoting Linus: -- > so there is no need to recheck the bit in do/while loop, because > there is no false wakeups now. You should never assume this. You should assume that there are _always_ false wakeups. Why? Because Linux has always allowed people to leave wait-queues active, without being "atomic". For example, the tty read/write layer used to (still does?) add itself on the wait-queue _once_, and then leave itself on the wait-queue while in a loop it does copies from/to user space. -- Unfortunately, this means us. Here's the first fix. Comments? - make sure timeouts are observed even if somebody left us on a queue [PATCH] USB: OHCI resume/reset stops deadlocking in PM code System-wide PM resume now happily deadlocks if one of the resuming devices tries to remove devices which vanished during the suspend(*). IMO that's unreasonable both because devices can/do vanish, and because 2.4 didn't deadlock in those cases; but no patch to fix that has been merged. The result is that ever since merging the "new" PM code, some OHCI-based systems deadlock on resume. So this patch handles the "lost power during resume" case differently: it doesn't disconnect the root hub (or its children) directly. 
Instead, it does part of that work immediately, and defers the rest to khubd: - add a "pending" list for live urbs, and use it after reset to abort pending URBs (and reclaim "live" EDs/TDs) - immediately mark all devices NOTATTACHED, so any operations on the devices before khubd handles the disconnects, including resume() callbacks, will fail - kick root hub so it can do the cleanup It also handles "fminterval" init/reinit a bit better, mostly to work better in some remote wakeup scenarios addressed in later patches: - save any initial value the boot firmware provided - use it during initialization (and eventually, remote wakeup) Other changes: - use better jiffies calculation for scheduled delays - the allocator does more of the one-time initialization - initialize hcd.can_wakeup according to boot firmware - move some inlines to the header - minor cleanups (*) http://marc.theaimsgroup.com/?l=linux-kernel&m=106606272103414&w=2 reported against 2.6.0-test7. [PATCH] USB: OHCI cleanups This splits out a few obvious fixes, to help shrink a PM patch: - when the HC is quiescing, don't schedule any more EDs or re-activate any after unlink completion. - when the HC is suspended, don't access registers through sysfs either. - simplify locking and call for donelist processing [PATCH] USB: khubd turns port power back on after reset This goes with the OHCI anti-deadlock patch, and is what ensures that when a root hub loses power during suspend, khubd can turn port power back on so devices can enumerate. [PATCH] USB: OHCI root hub suspend/resume/wakeup This patch goes on top of the previous two, and the hcd-0506 patch: - Moves root hub suspend/resume code out of PCI-specific bus glue into generic hub code. That way it's easy to re-use it even for non-PCI implementations like SA1111, OMAP, and LH7A404. (Plus, given CONFIG_USB_SUSPEND, it can be invoked with sysfs.) - Root hub suspend is a lot more careful, as is root hub resume. 
Pending transactions are now shut down more consistently; and more registers are re-initialized on resume.

- The PCI bus glue is now left with truly generic PCI stuff, plus some PMAC-specific stuff (which doesn't include irq disabling any more, hcd-0506 moves that up a level in the stack).

- Remote wakeup support is basically working for the root hub (given CONFIG_USB_SUSPEND to suspend devices and enable it).

- Idle HCs will now automatically suspend themselves, and resume as necessary. This saves a certain amount of power on most systems, and matches what UHCI has been doing for a while.

The large size of this patch is mostly because of moving that root hub suspend/resume code out of the PCI-specific glue.

[PATCH] USB: cosmetic fixes for cdc-acm

[PATCH] kobject_set_name - error handling

1) kobject_set_name-cleanup-01.patch

This patch corrects the following by checking the return code from kobject_set_name():

        bus_add_driver()
        bus_register()
        sys_dev_register()

o The following patch cleans up the kobject_set_name() users, basically by checking the return code from kobject_set_name(). There can be error returns like -ENOMEM or -EFAULT from kobject_set_name() if the name length exceeds KOBJ_NAME_LEN.

[PATCH] fix dev_printk to work even in the absence of an attached driver

[PATCH] kobject/sysfs race fix

The following patch fixes the race between unregistering a kobject and simultaneously opening a corresponding attribute file in sysfs. Ideally sysfs should take a ref. to the kobject as long as it has dentries referring to the kobject, but because of current limitations in module/kobject ref counting, sysfs's pinning of the kobject leads to hangs/delays in rmmod of certain modules. The patch checks for unhashed dentries in check_perm() while opening a sysfs file. If the dentry is still hashed then it goes ahead and takes the ref to the kobject. This is done under the per-dentry lock. It does this in the inline routine sysfs_get_kobject(dentry).
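
The check described in the kobject/sysfs race fix can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_dentry`, `model_kobject`, `model_get_kobject()` and `model_unhash()` are hypothetical stand-ins, a pthread mutex plays the role of the per-dentry lock, and none of this is the kernel's actual sysfs_get_kobject() code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel structures. */
struct model_kobject {
	int refcount;
};

struct model_dentry {
	pthread_mutex_t lock;        /* models the per-dentry lock */
	bool hashed;                 /* false once the entry has been unhashed */
	struct model_kobject *kobj;
};

/* Take a reference only while the dentry is still hashed. */
static struct model_kobject *model_get_kobject(struct model_dentry *d)
{
	struct model_kobject *k = NULL;

	assert(d != NULL);
	pthread_mutex_lock(&d->lock);
	if (d->hashed) {
		k = d->kobj;
		k->refcount++;       /* reference taken under the lock */
	}
	pthread_mutex_unlock(&d->lock);
	return k;                    /* NULL means the open must fail */
}

/* Unregister path: unhash under the same lock so later opens see it. */
static void model_unhash(struct model_dentry *d)
{
	assert(d != NULL);
	pthread_mutex_lock(&d->lock);
	d->hashed = false;
	pthread_mutex_unlock(&d->lock);
}
```

Because the hashed test and the reference grab happen under one lock, an open that loses the race with unregistration gets NULL instead of a reference to an object that is going away.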
[ACPI] create platform_rename_gsi() so ES7000 can munge IRQ numbers from Natalie Protasevich Fix oops when smb buffer can not be allocated [ACPI] if _STA.functional, set _STA.present (Bjorn Helgaas) workaround for Big Sur and Bull systems [SPARC]: Mark unaligned_panic as attribute used to workaround gcc-3.4 problem. [PATCH] ia64: sn_get_node_first_cpu() is redundant sn_get_node_first_cpu() is redundant, so kill it. Since calls to this routine happen rather late in the boot process, using the generic topology functions is safe. Fixup the callers and kill the function. [PATCH] ia64: kill warnings in sn2 specific pci init A couple of unused variable warnings cropped up in the sn2 pci init routine. This patch kills the unused variables. [BRIDGE]: Allow multiple interfaces with same address (necessary for VLAN's). [MODULES]: Fix endianness in modprobe. [TG3]: Update to 5788 capable 5705 TSO firmware, version 1.2.0 [TG3]: Update to non-5705 TSO firmware version 1.6.0 [TG3]: If asked to load TSO firmware on 5750, just return success. The 5750 does TSO in hardware, not via firmware code. ia64: Call print_modules() before printing tombstone. [PATCH] ia64: This patch kills some unused lines and redundant functions [PATCH] ia64: map display option ROMs on SN2 By default, the sn2 PCI init code doesn't map option ROM address ranges since PIO address space is limited. However, we do need to map display option ROMs in the event that userland applications want to read and emulate them. First of a set of eight patches that adds support for Intel's IXP4xx family of network processors. The code still needs some cleanup here and there, but it's to the point that it's should be OK to push upstream. 
Some of the remaining TODOs: - Cleanup GPIO IRQ handling for edge-trigered HW (none exists ATM) - Add IDE driver for various platforms - Misc cleanups This patch adds the changes to arch/arm/Makefile and arch/arm/Kconfig entry-armv.S, debug.S, bios32.c: IXP4xx support head.S, Makefile: IXP4xx support proc-xscale.S, Kconfig: IXP4xx support Add IXP4xx support pci_ids.h: IXP4xx support [TG3]: Add 5750 NVRAM programming plus 5704 MAC offset bug fix. [ACPI] Add MADT error checking (Yi Zhu) http://bugzilla.kernel.org/show_bug.cgi?id=1434 [ACPI] create kacpid thread to handle ACPI work in process context. Also will be needed for cpu hot-unplug. from Anil S Keshavamurthy and David Shaohua Li http://bugzilla.kernel.org/show_bug.cgi?id=2515 [TG3]: Update LED programming to support 5750. [TG3]: Updated ASF handling for 5750. [ARM PATCH] 1827/1: PXAFB patch updated based on comments in 1826 Patch from Ian Campbell The patch includes the PXA FB driver discussed recently on the arm-kernel mailing list, I have incorporated your (RMK's) comments from patch 1826. [ARM PATCH] 1853/1: Update OMAP low level debug functions Patch from Tony Lindgren Changes OMAP low level debug function to allow virtual IO != physical IO. Also removes waituart checking, as it only worked for first serial port. [ARM PATCH] 1854/1: Remove old board specific OMAP files Patch from Tony Lindgren Removes old duplicate board specific files that have been renamed to board-*.c [ARM PATCH] 1848/1: PXA suspend/resume improvements Patch from John K Luebs This adds support to preserve the GPIO level across a suspend and resume for PXA machine types. Removes the hack to preserve the FFUART state as this is unnecessary with the 2.6 serial driver code. 
[ARM PATCH] 1842/1: fix/clarify some comments Patch from Nicolas Pitre [ARM PATCH] 1863/2: definitions and mapping for the Intel PXA27x internal registers Patch from Nicolas Pitre This reworks PXA register mapping to accommodate the extra registers of the PXA27x chips (aka Bulverde). [ARM PATCH] 1864/1: separate PXA25x and PXA27x specific code Patch from Nicolas Pitre [ARM PATCH] 1865/1: DMA changes for PXA27x Patch from Nicolas Pitre [ARM PATCH] 1871/1: faster IRQ retrieval for PXA27x Patch from Nicolas Pitre On the PXA27x it's also possible to get the IRQ info from CP6 and it has a much lower latency than the memory mapped equivalent. [ARM PATCH] 1851/1: sa1100 fb support for collie Patch from John Lenz Add the sa1100fb_mach_info structure for collie [ARM PATCH] 1874/1: S3C2410 - fix for selection of low-level debug UART Patch from Ben Dooks Fix selection configuration entry for UART number for low-level debug common.c: new file [ARM PATCH] 1875/1: SMDK2410 machine support Patch from Ben Dooks Initial support for SMDK2410 and variants [ARM PATCH] 1876/1: Removed apostrophes from asembler source Patch from Ben Dooks Removed two 's from assembly code (binutils has problems with these) common.c: Remove unused flag variable from ixp4xx_timer_interrupt prpmc1100-setup.c, ixdp425-setup.c, coyote-setup.c: Add INIT_MACHINE() to ixp4xx platform setup code [PATCH] ia64: switch /proc/perfmon to seq_file avoid buffer overflows Switches /proc/perfmon to using the seq_file interface. This is more inline with the rest of the kernel and avoid crashes for very large machine configurations. Based on patch by Dean Nelson. [PATCH] USB: missing probe() diagnostics for CDC Ethernet This patch should help correct the "missing diagnostics with CONFIG_USB_DEBUG during CDC Ethernet probe()" issue. Some folk are having problems with firmware that doesn't respond properly to descriptor fetches -- which is unnecessarily confusing because the diagnostics aren't being printed. 
[PATCH] USB: Patch to remove interface indices from devio.c > I went ahead and created a patch to change all the places where devio.c > uses an interface index. Now it always uses just the interface number. > Does this look all right to you? I don't have a convenient way to test > it. Hi Alan, thanks for doing this. It looks and works OK. I added some name changes: all struct usb_interface pointers are now called intf; and, when reasonable, variables holding interface numbers are now all called ifnum. This drowns your original changes in a sea of churning names, I hope you don't mind. [PATCH] USB: Don't delete interfaces until all are unbound On Thu, 13 May 2004, Duncan Sands wrote: > No, but the pointer for another (previous) interface may just have been > set to NULL, causing an Oops when usb_ifnum_to_if loops over all > interfaces. Of course! I trust you won't mind me changing your suggested fix slightly. This should do an equally good job of repairing things, and it will prevent other possible invalid references as well. [PATCH] USB: fix ohci-hcd build error "Matt H." wrote: > > Just attempted to compile 2.6.6-mm2 and got this error > > CC [M] drivers/usb/core/driverfs.o > CC [M] drivers/usb/core/hcd-pci.o > LD [M] drivers/usb/core/usbcore.o > LD drivers/usb/host/built-in.o > CC [M] drivers/usb/host/ehci-hcd.o > CC [M] drivers/usb/host/ohci-hcd.o > In file included from drivers/usb/host/ohci-hcd.c:129: > drivers/usb/host/ohci-hub.c: In function `ohci_rh_resume': > drivers/usb/host/ohci-hub.c:313: error: `hcd' undeclared (first use in this > function) hm, not sure what's happened there... [ARM] Remove Documentation/ARM/XScale Patch from Deepak Saxena Documentation/ARM/XScale has not been updated by anyone in a long time; therefore, it is being deleted until someone volunteers to provide updated versions. ia64: fix spurious "timer tick before it's due" problem Patch Bjorn Helgaas: Fix the "timer tick before it's due" complaint from timer_interrupt(). 
The problem was that smp_callin() turned on the periodic timer tick before syncing the ITC with the BP. Syncing the ITC happens with interrupts disabled, and if you're unlucky enough to (1) pend a timer interrupt, and (2) set the ITC back before the ITM value that caused the timer interrupt, you can get stuck for several iterations in the following cycle (assume 100 clocks per tick):

                                    ITC   ITM
                                    ---   ---
    ia64_init_itm()                 100   200   schedule first tick at 200
    ia64_sync_itc()
      disable interrupts
                                    200   200   ITC == ITM; pend IT interrupt
                                    150         set ITC to sync with BP
      enable interrupts
    recognize pending IT interrupt
      disable IT interrupts
      timer_interrupt()
                                    160   200   notice that 160 < 200, printk "timer tick before it's due"
                                    200   200   ITC == ITM; pend IT interrupt
                                          300   set ITM for next tick
      re-enable IT interrupt
    recognize pending IT interrupt
      disable IT interrupts
      timer_interrupt()
                                    260   300   notice that 260 < 300, printk "timer tick before it's due"
      ...
    repeat until you're tired or timer_interrupt() takes long enough that the ITC lands after the ITM

This patch syncs the ITC with the BP before starting up the periodic tick, so the above scenario should never happen. This doesn't change how the timer tick on the BP is started; that happens quite early (and must be early because things like calibrate_delay() depend on jiffies updates).

ia64: Correct atomic_inc_and_test() and atomic64_inc_and_test().

USB: add snooping capability to usbfs for control messages.

Also fix up some of the other printk() calls to be dev_* calls.

MTD driver for Intel IXP4xx platform (from MTD CVS tree)

Patch from Deepak Saxena

[PATCH] USB: Merge support for Keyspan UPR-112 USB serial adapter from 2.4 to 2.6

Following patch merges the support for Keyspan UPR-112 USB serial adapter from 2.4 to 2.6.
[PATCH] USB: usbhid calls itself "hid" [ARM] Add config help and documentation for Intel IXP4xx platforms Patch from Deepak Saxena [libata] add new ->bmdma_setup hook In order to support some new taskfile protocols, particularly ATAPI, the setup-and-start-DMA hook needs to be split into its component pieces, 'setup' and 'start'. For PCI IDE-style controllers, most of the code is moved into the 'setup' portion, with the 'start' portion only flipping a single bit in hardware. [libata] use new ->bmdma_{start,setup} method to properly support ATAPI [libata] more ATAPI work - translate SCSI CDB to ATA PACKET Now that we can specify ATAPI as a taskfile protocol, we can utilize the existing SCSI->ATA translation infrastructure to build an ATA PACKET command quickly and easily. [libata] random minor bug fixes * Only call ata_sg_setup{_one} if ATA_QCFLAG_SG is set. Preparation for future use, as currently ATA_QCFLAG_SG is always set when ata_qc_issue is called. This change in theory is incorrect for Promise TX/SX4 drivers, since those drivers set up the Promise-specific packet in their ->fill_sg hook, which is now called conditionally. A FIXME that doesn't affect anything, for now. * ATA_PROT_ATAPI and ATA_PROT_ATAPI_DMA command issue need to be differentiated. * Create and use ata_qc_set_polling() to consistently set/clear the flags associated with using polling instead of interrupts. [libata] kill ATA_QCFLAG_POLL flag The standard ATA bit nIEN in the Device Control register serves as the indicator for whether we are polling or not. As it mirrors ATA_QCFLAG_POLL completely, eliminate that in favor of testing ATA_NIEN bit. NTFS: 2.1.11 - Driver internal cleanups. NTFS: 2.1.11 - Rename uchar_t to ntfschar. 
[netdrvr wan] remove comx driver the drivers have been broken since pre-2.4.0, like referencing a symbol that was made procfs-internal in 2.3.x, haven't received maintainer updates for about the same period and MOD_{INC,DEC}_USE_COUNT usage that pretty much unfixable (inside warts of _horrible_ procfs abuse). [ARM] Fix broken IXP4xx GPIO0 IRQ handling Patch from Yves Rutschle USB: remove magic number field from usb_serial_port as it's pretty useless. USB: remove magic number field from struct usb_serial as it's pretty useless. USB: removed port_paranoia_check() call for usb serial drivers. Pretty useless stuff. If this was hiding anything real, we need to find out. USB: remove serial_paranoia_check() function If this is hiding real problems, we need to find them. Add missing mount parameters [netdrvr] remove rcpci driver, for Red Creek Hardware VPNs Pete Popov, the author says This driver is obsolete and broken in 2.4 (and I'm pretty sure in 2.6). The hardware has not been available for a while. I wrote the driver for 2.2 but after I left RedCreek (which went out of business), someone in the community updated it to 2.4, but I don't think that person even had the hardware to test it so the driver remained broken in 2.4. So my recommendation is that it's really time to remove this driver from the kernel tree. Just my opinion but I thought I'd share it with you. [...] I can't imagine that there are any users because 2.4 was broken last time I checked (admittedly that was a year ago). [TG3]: Include mss in every txd, not just the first, on 5750. USB: remove get_usb_serial() as it's pretty much unneeded It also could hide real bugs, and that's not good. And the name implies that a reference is grabbed, and that's not true at all. [PATCH] USB: compile fix for usbfs snooping [TG3]: On 5750 with TSO, need to set some special reg bits. [PATCH] USB: ohci resume fix Prakash K. 
Cheemplavam wrote: > David Brownell wrote: > >>> There appear lines like >>> >>> usb usb2: string descriptor 0 read error: -108 >>> >>> bug or feature? They weren't there with 2.6.6-mm1. I have no usb2.0 >>> stuff to actually test. My usb1 stuff seems to work though. >> >> Bug; minor, since the only real symptom seems to be messages like >> that. Ignore them for now, I'll make a patch soonish. > > Ok, good. Thanks for the explanation of what is going on, though I don't > can make too much out of it. ;-) The short version is: it's missing this patch. [TG3]: Full chip reset tweaks for 5750. [PATCH] add ibmasm driver warning message [note, I changed this a bit to be nicer on the system log, greg k-h] [PATCH] sysfs_rename_dir-cleanup o The following patch cleans up sysfs_rename_dir(). It now checks the return code of kobject_set_name() and propagates the error code to its callers. Because of this there are changes in the following two APIs. Both return int instead of void. int sysfs_rename_dir(struct kobject * kobj, const char *new_name) int kobject_rename(struct kobject * kobj, char *new_name) [PATCH] USB: hcd-pci suspend tweak I needed this to get an APM + UHCI config to behave on resume. Applies against your BK of last night ... OHCI and EHCI do some of this manually, they could be simplified later. [TG3]: More 5750 chip reset tweaks. [TG3]: Do not enable slow clocks on 5750 with ASF. [TG3]: Rewrite dma_rwctrl settings to handle PCIX/PCIE. [TG3]: Add 572x/575x PCI IDs to driver table, update vers/reldate. USB: change usbserial core to use module_param() USB: convert pl2303 to use module_param() USB: convert visor to use module_param() [PATCH] I2C: ICH6/6300ESB i2c support This patch adds DID support for ICH6 and 6300ESB to i2c-i801.c(SMBus). In order to add this support I needed to patch pci_ids.h with the SMBus DID's. To keep things orginized I renumbered the ICH6 and ESB entries in pci_ids.h. 
I then patched the piix IDE and i810 audio drivers to reflect the updated #define's. I also removed an error from irq.c; there was a reference to a 6300ESB DID that does not exist. [PATCH] I2C: "probe" module param broken for it87 in Linux 2.6.6 Jean Delvare writes: > So I'd suggest that you simply use the standard exit sequence in the > it87 driver (the second one in your current patch). A patch for the 2.4 > driver would be appreciated as well. OK. I've attached a new version of the patch against linux-2.6.6. I'll send a patch against current lm_sensors CVS removing the extra exit command in a separate mail. Greg KH writes: > On Wed, May 12, 2004 at 04:38:03PM +0200, Bj?rn Mork wrote: >> + if (!it87_find(&addr)) { >> + printk("it87.o: new ISA address: 0x%04x\n", addr); > > That printk is wrong (no KERN_ level, or dev_printk() style use). > Please fix it in your next revision of this patch. Errh, I just added it to document my sloppyness. It was never meant to be in the patch I sent you. Sorry. Removed in the attached patch. The style of these drivers seem to be "just working, making no noise" so I assume informational printk's are unwanted. [PATCH] I2C: Missed ixp42x -> ixp4xx conversion Forgot to include this with my original patch a few weeks ago... do not try to grab the i_sem ever during revalidate path since the rename code can grab it before we get here [PATCH] Make users of page->count use the provided macros I'm about to change the meaning (and name) of page->count. Go through and fix up all those places which are open-coding references to it. [PATCH] Implement atomic_add_negative() on various architectures Lots of architectures have atomic_add_return() and no atomic_add_negative(). We can implement the latter in terms of the former. [PATCH] Implement atomic_inc_and_test() on various architectures It's easy to do when the arch provides atomic_inc_return(). 
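
The two atomic entries above build one primitive out of another. A minimal userspace sketch of that idea follows, assuming C11 `<stdatomic.h>` in place of the architecture-specific kernel code; `model_atomic_t` and the `model_*` functions are hypothetical names chosen for illustration and mirror the kernel's (value, pointer) argument order.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's atomic_t. */
typedef struct {
	atomic_int counter;
} model_atomic_t;

/* Add i to *v and return the NEW value (models atomic_add_return()). */
static inline int model_atomic_add_return(int i, model_atomic_t *v)
{
	return atomic_fetch_add(&v->counter, i) + i;
}

/* Increment *v and return the new value (models atomic_inc_return()). */
static inline int model_atomic_inc_return(model_atomic_t *v)
{
	return model_atomic_add_return(1, v);
}

/* True if the result of the addition is negative. */
static inline bool model_atomic_add_negative(int i, model_atomic_t *v)
{
	return model_atomic_add_return(i, v) < 0;
}

/* True if the result of the increment is zero. */
static inline bool model_atomic_inc_and_test(model_atomic_t *v)
{
	return model_atomic_inc_return(v) == 0;
}
```

Both derived helpers test the value returned by a single atomic read-modify-write, so the test cannot observe a value that some other CPU changed in between.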
[PATCH] alpha: atomic_inc_and_test()

From: Ivan Kokshaysky

It seems atomic_inc_and_test() is missing on alpha.

[PATCH] ia64 atomic_inc_and_test fix

From: David Mosberger

[PATCH] sparc64: implement atomic_add_negative()

[PATCH] Fix page double-freeing race

This has been there for nearly two years. See bugzilla #1403

vmscan.c does, in two places:

        spin_lock(zone->lru_lock)
        page = lru_to_page(&zone->inactive_list);
        if (page_count(page) == 0) {
                /* erk, it's being freed by __page_cache_release() or
                 * release_pages()
                 */
                put_it_back_on_the_lru();
        } else {
        --> window 1 <--
                page_cache_get(page);
                put_in_on_private_list();
        }
        spin_unlock(zone->lru_lock)

        use_the_private_list();
        page_cache_release(page);

whereas __page_cache_release() and release_pages() do:

        if (put_page_testzero(page)) {
        --> window 2 <--
                spin_lock(lru->lock);
                if (page_count(page) == 0) {
                        remove_it_from_the_lru();
                        really_free_the_page()
                }
                spin_unlock(zone->lru_lock)
        }

The race occurs if the vmscan.c path sees page_count()==1 and then the page_cache_release() path happens in that few-instruction "window 1" before vmscan's page_cache_get().

The page_cache_release() path does put_page_testzero(), which returns true. Then this CPU takes an interrupt...

The vmscan.c path then does page_cache_get(), taking the refcount to one. Then it uses the page and does page_cache_release(), taking the refcount to zero and the page is really freed.

Now, the CPU running page_cache_release() returns from the interrupt, takes the LRU lock, sees the page still has a refcount of zero and frees it again. Boom.

The patch fixes this by closing "window 1". We provide a "get_page_testone()" which grabs a ref on the page and returns true if the refcount was previously zero. If that happens the vmscan.c code simply drops the page's refcount again and leaves the page on the LRU.
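
As the remainder of this entry states, the stored count is offset by one so that a free page holds -1. A rough userspace sketch of how put_page_testzero() and get_page_testone() could fall out of that offset is given here, assuming C11 atomics; `struct model_page` and the `model_*` helpers are hypothetical and are not the kernel macros.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model page: _count stores (logical refcount - 1). */
struct model_page {
	atomic_int _count;
};

/* Logical refcount as callers see it. */
static inline int model_page_count(struct model_page *p)
{
	return atomic_load(&p->_count) + 1;
}

static inline void model_set_page_count(struct model_page *p, int v)
{
	atomic_store(&p->_count, v - 1);
}

/* Drop a ref; true if the logical count hit zero (stored value went negative). */
static inline bool model_put_page_testzero(struct model_page *p)
{
	return atomic_fetch_sub(&p->_count, 1) - 1 < 0;
}

/* Take a ref; true if the logical count was zero before (stored -1 became 0). */
static inline bool model_get_page_testone(struct model_page *p)
{
	return atomic_fetch_add(&p->_count, 1) + 1 == 0;
}
```

In this model the "was it zero?" question is answered by the same atomic operation that takes the reference, which is what removes the gap between the page_count() test and page_cache_get() in the vmscan.c pseudocode.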
All this happens under the zone->lru_lock, which is also taken by __page_cache_release() and release_pages(), so the vmscan code knows that the page has not been returned to the page allocator yet. In terms of implementation, the page counts are now offset by one: a free page has page->_count of -1. This is so that we can use atomic_add_negative() and atomic_inc_and_test() to provide put_page_testzero() and get_page_testone(). The macros hide all of this so the public interpretation of page_count() and set_page_count() remains unaltered. The compiler can usually constant-fold the offsetting of page->count. This patch increases an x86 SMP kernel's text by 32 bytes. The patch renames page->count to page->_count to break callers who aren't using the macros. This patch requires that the architecture implement atomic_add_negative(). It is currently present on arm arm26 i386 ia64 mips ppc s390 v850 x86_64 ppc implements this as #define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0) and atomic_add_return() is implemented on alpha cris h8300 ia64 m68knommu mips parisc ppc ppc ppc64 s390 sh sparc v850 so we're looking pretty good. [PATCH] sched: add missing local_irq_enable() From: Nick Piggin this_rq_lock does a local_irq_disable, and sched_yield() needs to undo that. [PATCH] MSEC_TO_JIFFIES consolidation From: Ingo Molnar We have various different implementations of MSEC[S]_TO_JIFFIES and JIFFIES_TO_MSEC[S]. We recently had a compile-time clash in USB. Fix all that up. - The SCTP version was very inefficient. Hopefully this version is accurate enough. - Optimise for the HZ=100 and HZ=1000 cases - This version does round-up, so sleep(9 milliseconds) works OK on 100HZ. - We still have lots of jiffies_to_msec and msec_to_jiffies implementations. From: William Lee Irwin III Optimize the cases where HZ is a divisor of 1000 or vice-versa in JIFFIES_TO_MSECS() and MSECS_TO_JIFFIES() by allowing the nonvanishing(!) 
integral ratios to appear as parenthesized expressions eligible for constant folding optimizations.

From: me

Use typesafe inlines for the jiffies-to-millisecond conversion functions. This means that milliseconds officially takes the type `unsigned int'. All current callers seem to be OK with that.

Drivers need to be fixed up to use this instead of their private versions.

[PATCH] Covert drivers to use msec_to_jiffies

Remove various private implementations of msecs_to_jiffies() and jiffies_to_msecs(). There are various uppercase versions which should be consolidated.

[PATCH] MSEC_TO_JIFFIES to msec_to_jiffies

Switch all users of MSEC[S]_TO_JIFFIES and JIFFIES_TO_MSEC[S] over to use jiffies_to_msecs() and msecs_to_jiffies(). Withdraw MSECS_TO_JIFFIES() and JIFFIES_TO_MSECS() from the kernel API.

[PATCH] revert the process-migration-speedup patch

David Mosberger asked that this be backed out:

"I do not believe that flushing the TLB before migration is be the right thing to do on ia64 machines which support global TLB purges (i.e., all but SGI's machines)."

It was of huge benefit for the SGI machines, so work is ongoing.

[PATCH] VM accounting fix

From: Hugh Dickins

Stas Sergeev wrote:

mprotect() fails to merge VMAs because one VMA can end up with VM_ACCOUNT flag set, and another without that flag. That makes several apps of mine malfunction.

Great find! Someone has got their test the wrong way round.

Since that VM_MAYACCT macro is being used in one place only, and just hiding what it's actually about, fold it into its callsite.

[PATCH] do_mounts_rd-malloc-fix

gcc-3.4.0 sez:

    init/do_mounts_rd.c:309: warning: conflicting types for built-in function 'malloc'

[PATCH] filtered wakeups

From: William Lee Irwin III

This patch series is solving the "thundering herd" problem that occurs in the mainline implementation of hashed waitqueues.
There are two sources of spurious wakeups in such arrangements:

(a) Hash collisions that place waiters on different objects on the same waitqueue, which wakes threads falsely when any of the objects hashed to the same queue receives a wakeup. i.e. loss of information about which object a wakeup event is related to.

(b) Loss of information about which object a given waiter is waiting on. This precludes wake-one semantics for mutual exclusion scenarios. For instance, a lock bit may be slept on. If there are any waiters on the object, a lock bit release event must wake at least one of them so as to prevent deadlock. But without information as to which waiter is waiting on which object, we must resort to waking all waiters who could possibly be waiting on it. Now, as the lock bit provides mutual exclusion, only one of the waiters woken can proceed, and the remainder will go back to sleep and wait for another event, creating unnecessary system load. Once wake-one semantics are established, only one of the waiters waiting to acquire a lock bit needs to be woken, which measurably reduces system load and improves efficiency (i.e. it's the subject of the benchmarking I've been sending to you).

Even beyond the measurable efficiency gains, there are reasons of robustness and responsiveness to motivate addressing the issue of thundering herds. In a real-life scenario I've been personally involved in resolving, the thundering herd issue caused powerful modern SMP machines with fast IO systems to be unresponsive to user input for a minute at a time or more. Analogues of these patches for the distro kernels involved fully resolved the issue to the customer's satisfaction and obviated workarounds to limit the pagecache's size.
The latest spin of these patches basically shoves more pieces of the logic into the wakeup functions, with some efficiency gains from sharing the hot codepath with the rest of the kernel, and a slightly larger diff than the patches with the newly-introduced entrypoint. Writing these was motivated by the push to insulate sched.c from more of the details of wakeup semantics by putting more of the logic into the wakeup functions. In order to accomplish this while still solving (b), the wakeup functions grew a new argument, passed by the waker, that communicates which object a wakeup event is related to.

=========

This patch provides an additional argument to wakeup functions so that information may be passed from the waker to the waiter. This is provided as a separate patch so that the overhead of the additional argument can be measured in isolation. No change in performance was observable here.

[PATCH] filtered wakeups: wakeup enhancements

From: William Lee Irwin III

This patch provides an additional argument to __wake_up_common() so that the information wakefunc.patch made waiters ready to receive may be passed to them by wakers. This is provided as a separate patch so that the overhead of the additional argument to __wake_up_common() can be measured in isolation. No change in performance was observable here.

[PATCH] filtered wakeups: apply to pagecache functions

From: William Lee Irwin III

This patch implements wake-one semantics for page wakeups in a single step. Discrimination between distinct pages is achieved by passing the page to the wakeup function, which compares it to a pointer in its own on-stack structure containing the waitqueue element and the page. Bit discrimination is achieved by storing the bit number in that same structure and testing the bit in the wakeup function. Wake-one semantics are achieved by using WQ_FLAG_EXCLUSIVE in the codepaths waiting to acquire the bit for mutual exclusion.
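The filtering described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the wait queue is a plain singly linked list, nothing actually sleeps, and every name except WQ_FLAG_EXCLUSIVE (`wait_entry`, `page_wait`, `page_wake_function`, `wake_up_common`) is made up for illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

struct wait_entry;
/* A wake function receives the waker's key and returns 1 if it woke its waiter. */
typedef int (*wake_fn)(struct wait_entry *w, void *key);

struct wait_entry {
    unsigned int flags;
    wake_fn func;
    struct wait_entry *next;
    int woken;               /* stands in for actually waking a task */
};

struct page_sketch {
    unsigned long flags;     /* bit set = still locked/busy */
};

/* On-stack structure of a waiter: queue element plus what it waits for. */
struct page_wait {
    struct wait_entry wait;  /* must stay first, see the cast below */
    struct page_sketch *page;
    int bit_nr;
};

static int page_wake_function(struct wait_entry *w, void *key)
{
    struct page_wait *pw = (struct page_wait *)w;

    if (pw->page != key)
        return 0;            /* another page hashed to the same queue */
    if (pw->page->flags & (1UL << pw->bit_nr))
        return 0;            /* the bit we wait on is still set */
    w->woken = 1;
    return 1;
}

/* Walk the queue, handing the key to every wake function. Stop once
 * nr_exclusive exclusive waiters have really been woken. */
static int wake_up_common(struct wait_entry *head, int nr_exclusive, void *key)
{
    int woken = 0;

    for (struct wait_entry *w = head; w != NULL; w = w->next) {
        if (w->func(w, key)) {
            woken++;
            if ((w->flags & WQ_FLAG_EXCLUSIVE) && --nr_exclusive == 0)
                break;
        }
    }
    return woken;
}
```

With three exclusive waiters on one queue, one for page `b` followed by two for page `a`, a call such as `wake_up_common(head, 1, &a)` skips the waiter on `b` and wakes only the first waiter on `a`, which is the wake-one behaviour the entry describes.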
[PATCH] filtered wakeups: apply to buffer_head functions

From: William Lee Irwin III

This patch implements wake-one semantics for buffer_head wakeups in a single step. The buffer_head being waited on is passed to the waiter's wakeup function by the waker, and the wakeup function compares that to a pointer stored in its on-stack structure, also checking the readiness of the bit there. Wake-one semantics are achieved by using WQ_FLAG_EXCLUSIVE in the codepaths waiting to acquire the bit for mutual exclusion.

[PATCH] rename rmap_lock to page_map_lock

Sync this up with Andrea's patches.

[PATCH] rmap-5-swap_unplug-page-revert

Revert the pre-2.6.6 per-address-space unplugging changes. This removes a swapper_space exceptionality, syncs things with Andrea and provides for simplification of the swap unplug function.

[PATCH] Add blk_run_page()

From: Andrea Arcangeli
From: Jens Axboe

Add blk_run_page() API. This is so that we can pass the target page all the way down to (for example) the swap unplug function. So swap can work out which blockdevs back this particular page.

[PATCH] blk_run_page(): fixup for swap_unplug_io_fn()

[PATCH] blk_run_page(): we don't trust bh->b_page

We don't trust bh->b_page to point to the right thing across all filesystems, so revert this bit.

[PATCH] swap speedups and fix

From: Andrea Arcangeli

I don't think we need an install_swap_bdev/remove_swap_bdev anymore, we should use the swap_info->bdev, not the swap_bdevs. The swap_info already has a ->bdev field, the only point of remove_swap_bdev/install_swap_bdev was to unplug all devices as efficiently as possible, we don't need that anymore with the page parameter.

Plus the semaphore should be a rwsem to allow parallel unplug from multiple pages.

After that I don't need to take the semaphore anymore during swapon, no swapcache with swp_type() pointing to such bdev, will be allowed until swapon is complete (SWP_ACTIVE is set a lot later after setting p->bdev).
In swapoff I only need a dummy serialization with the readers, after try_to_unuse is complete:

    err = try_to_unuse(type);
    current->flags &= ~PF_SWAPOFF;

    /* wait for any unplug function to finish */
    down_write(&swap_unplug_sem);
    up_write(&swap_unplug_sem);

that's all, no other locking and no install_swap_bdev/remove_swap_bdev.

(and the swap_bdevs[] compression code was busted)

[PATCH] ia64 cpu hotplug: core kernel initialisation

From: Ashok Raj

This patch changes __init to __devinit for init_idle so that, when a new cpu arrives, it can call these functions at a later time.

[PATCH] ia64 cpu hotplug: init section fixes

From: Ashok Raj

Contains changes from __init to __devinit to support cpu hotplug. Changes only arch/ia64 portions of the kernel tree.

[PATCH] ia64 cpu hotplug: sysfs additions

From: Ashok Raj

topology_init() creates the sysfs entries. The online control file is created separately, when cpu_up is invoked in arch independent code.

[PATCH] ia64 cpu hotplug: IRQ affinity work

From: Ashok Raj

Setting irq affinity via /proc was forcing iosapic rte programming. The correct way to do this is to perform it when an interrupt is pending.

[PATCH] ia64 cpu hotplug: /proc rework

From: Ashok Raj

Changes proc entries for cpu hotplug to be created via the cpu hotplug notifier callbacks. Also fixed a bug in the removal code that did not remove proc entries as expected.

[PATCH] Revisited: ia64-cpu-hotplug-cpu_present.patch

From: Paul Jackson

With a hotplug capable kernel, there is a requirement to distinguish a possible CPU from one actually present. The set of possible CPU numbers doesn't change during a single system boot, but the set of present CPUs changes as CPUs are physically inserted into or removed from a system. The cpu_possible_map does not change once initialized at boot, but the cpu_present_map changes dynamically as CPUs are inserted or removed.
Paul Jackson provided an expanded explanation:

Ashok's cpu hot plug patch adds a cpu_present_map, resulting in the following cpu maps being available. All the following maps are fixed size bitmaps of size NR_CPUS.

    #ifdef CONFIG_HOTPLUG_CPU
    cpu_possible_map - map with all NR_CPUS bits set
    cpu_present_map  - map with bit 'cpu' set iff cpu is populated
    cpu_online_map   - map with bit 'cpu' set iff cpu available to scheduler
    #else
    cpu_possible_map - map with bit 'cpu' set iff cpu is populated
    cpu_present_map  - copy of cpu_possible_map
    cpu_online_map   - map with bit 'cpu' set iff cpu available to scheduler
    #endif

In either case, NR_CPUS is fixed at compile time, as the static size of these bitmaps. The cpu_possible_map is fixed at boot time, as the set of CPU id's that it is possible might ever be plugged in at anytime during the life of that system boot. The cpu_present_map is dynamic(*), representing which CPUs are currently plugged in. And cpu_online_map is the dynamic subset of cpu_present_map, indicating those CPUs available for scheduling.

If HOTPLUG is enabled, then cpu_possible_map is forced to have all NR_CPUS bits set, otherwise it is just the set of CPUs that ACPI reports present at boot.

If HOTPLUG is enabled, then cpu_present_map varies dynamically, depending on what ACPI reports as currently plugged in, otherwise cpu_present_map is just a copy of cpu_possible_map.

(*) Well, cpu_present_map is dynamic in the hotplug case. If not hotplug, it's the same as cpu_possible_map, hence fixed at boot.

[PATCH] ia64 cpu hotplug: core

From: Ashok Raj

Supports basic ability to enable hotplug functions for IA64. Code is just evolving, and there are several loose ends to tie up.

What this code drop does
- Support logical online and offline
- Handles interrupt migration without loss of interrupts.
- Handles stress fine > 24+ hrs with make -j/ftp/rcp workloads
- Handles irq migration from a dying cpu without loss of interrupts.
What needs to be done
- Boot CPU removal support, with platform level authentication
- Putting cpu being removed in BOOT_RENDEZ mode.

[PATCH] Module ref counting for vt console drivers

From: Herbert Xu

The following patch adds basic module reference counting to vt console drivers. Currently modules like fbcon are not counted at all.

[PATCH] I2O subsystem fixing and cleanup for 2.6 - i2o-config-clean.patch

From: Markus Lidel

* Changes the formatting of the header in i2o_config.c

[PATCH] I2O subsystem fixing and cleanup for 2.6 - i2o-passthru.patch

From: Markus Lidel

* Add a pass-thru ioctl to i2o_config, which is needed to work with the Adaptec management software.

[PATCH] i2o: 64-bit fixes

From: Markus Lidel

Fix 64-bit problems.

[PATCH] I2O subsystem fixing and cleanup for 2.6 - i2o_block-cleanup.patch

From: Markus Lidel

* more than 3 "visible" disks (hda, hdb, hdc, hdd) lead to kernel panics.
* removes some unused code with partitions.
* I2O_LOCK was often called with the addresses of the controller, and not with the address of the device. Fixed.
* the cleanup function for gendisk (del_gendisk) doesn't work if the queue is shared between different devices. As a workaround, the queue is removed first.
* redundant code removed in module initialization and remove, use i2ob_new_device and i2ob_del_device instead.
* removed atomic_t queue_depth
* removed unnecessary and bogus code for queue handling

[PATCH] I2O subsystem fixing and cleanup for 2.6 - i2o-64-bit-fix.patch

From: Markus Lidel

* provides i2o_context_list_*() functions, which map 64-bit pointers to 32-bit context id's in a dynamic list. On 32-bit systems the functions are replaced with a static inline.
* i2o_scsi now uses the i2o_context_list_*() functions for transaction context, and therefore now works on 64-bit systems too.
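To make the pointer-to-id mapping concrete, here is a minimal user-space sketch of a dynamic list that hands out 32-bit ids for pointers and resolves them again. It is not the driver's code: the changelog does not show the signatures of the i2o_context_list_*() functions, so the names `context_list_add()` and `context_list_get()`, the struct layout and the id allocation scheme are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One remembered pointer and the 32-bit id handed out for it. */
struct context_entry {
    struct context_entry *next;
    uint32_t id;
    void *ptr;
};

struct context_list {
    struct context_entry *head;
    uint32_t last_id;
};

/* Remember ptr and return a nonzero 32-bit id for it, or 0 on
 * allocation failure. Simplification: ids come from a counter and a
 * wrapped counter is not checked against ids that are still live. */
static uint32_t context_list_add(struct context_list *list, void *ptr)
{
    struct context_entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return 0;
    if (++list->last_id == 0)   /* 0 is reserved to mean "no context" */
        list->last_id = 1;
    e->id = list->last_id;
    e->ptr = ptr;
    e->next = list->head;
    list->head = e;
    return e->id;
}

/* Look up id, remove its entry and return the pointer, or NULL if
 * the id is unknown. */
static void *context_list_get(struct context_list *list, uint32_t id)
{
    struct context_entry **pp = &list->head;

    while (*pp != NULL) {
        struct context_entry *e = *pp;

        if (e->id == id) {
            void *ptr = e->ptr;

            *pp = e->next;
            free(e);
            return ptr;
        }
        pp = &e->next;
    }
    return NULL;
}
```

On a 32-bit system the pointer itself already fits in the 32-bit context field, which is presumably why the entry says the functions collapse to a static inline there.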
[PATCH] I2O subsystem fixing and cleanup for 2.6 - i2o-makefile-cleanup.patch

From: Markus Lidel

* The Kconfig and Makefile in drivers/message/i2o still got a CONFIG_I2O_PCI entry, which is not used anymore. This one is replaced by a CONFIG_I2O_CONFIG entry, which now builds the i2o_config module.

[PATCH] d_flags locking fixes

A few filesystems modify dentry.d_flags under non-obvious locking. To consolidate that field with d_vfs_flags they need to take ->d_lock.

[PATCH] d_vfs_flags locking fix

Be consistent about d_vfs_flags locking: take dentry->d_lock when modifying it.

[PATCH] dentry shrinkage

Rework dentries so that the inline name length is between 31 and 48 bytes.

On SMP P4-compiled x86 each dentry consumes 160 bytes (24 per page).

Here's the histogram of name lengths on all 1.5M files on my workstation:

     1:  0%   2:  0%   3:  1%   4:  5%   5:  8%   6: 13%   7: 19%   8: 26%
     9: 33%  10: 42%  11: 49%  12: 55%  13: 60%  14: 64%  15: 67%  16: 69%
    17: 71%  18: 73%  19: 75%  20: 76%  21: 78%  22: 79%  23: 80%  24: 81%
    25: 82%  26: 83%  27: 85%  28: 86%  29: 87%  30: 88%  31: 89%  32: 90%
    33: 91%  34: 92%  35: 93%  36: 94%  37: 95%  38: 96%  39: 96%  40: 96%
    41: 96%  42: 96%  43: 96%  44: 97%  45: 97%  46: 97%  47: 97%  48: 97%
    49: 98%  50: 98%  51: 98%  52: 98%  53: 98%  54: 98%  55: 98%  56: 98%
    57: 98%  58: 98%  59: 98%  60: 99%  61: 99%  62: 99%  63: 99%  64: 99%

So on x86 we'll fit 89% of filenames into the inline name.

The patch also removes the NAME_ALLOC_LEN() rounding-up of the storage for the out-of-line names. That seems unnecessary.

[PATCH] dentry qstr consolidation

When dentries are given an external name we currently allocate an entire qstr for the external name.

This isn't needed. We can use the internal qstr and kmalloc only the string itself. This saves 12 bytes from externally-allocated names and 4 bytes from the dentry itself.
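A rough user-space sketch of that layout may help: the name descriptor lives inside the dentry, and only the string bytes are allocated separately when the name does not fit inline. The struct and function names below are hypothetical, malloc() stands in for kmalloc(), and the inline size of 32 is just a value picked for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DNAME_INLINE_LEN 32   /* assumed size, for illustration only */

/* Name descriptor embedded in the dentry itself. */
struct qstr_sketch {
    const char *name;
    unsigned int len;
};

struct dentry_sketch {
    struct qstr_sketch d_name;
    char d_iname[DNAME_INLINE_LEN];   /* storage for short names */
};

/* Point d_name at the inline buffer when the name fits, otherwise
 * allocate only the string. Returns 0 on success, -1 on failure. */
static int dentry_set_name(struct dentry_sketch *d, const char *name)
{
    size_t len = strlen(name);
    char *dst;

    if (len < DNAME_INLINE_LEN) {
        dst = d->d_iname;
    } else {
        dst = malloc(len + 1);
        if (dst == NULL)
            return -1;
    }
    memcpy(dst, name, len + 1);
    d->d_name.name = dst;
    d->d_name.len = (unsigned int)len;
    return 0;
}

/* A name is external exactly when it does not live in d_iname. */
static int dname_external(const struct dentry_sketch *d)
{
    return d->d_name.name != d->d_iname;
}

static void dentry_free_name(struct dentry_sketch *d)
{
    if (dname_external(d))
        free((void *)d->d_name.name);
}
```

Since d_name is used for both cases, an external name costs one allocation holding nothing but the characters and the terminating NUL.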
The saving of 4 bytes from the dentry doesn't actually decrease the dentry's storage requirements, but it makes four more bytes available for internal names, taking the internal/external ratio from 89% up to 93% on my 1.5M files.

Fix:

The qstr consolidation wasn't quite right, because it can cause qstr->len to be unstable during lockless lookup traversal.

Fix that up by taking d_lock earlier in lookup. This serialises against d_move.

Take the lock after comparing the parent and hash to preserve the mostly-lockless behaviour.

This obsoletes d_move_count, which is removed.

[PATCH] dentry d_bucket fix

The gap between checking d_bucket and sampling d_move_count looks like a bug to me.

It feels safer to be checking d_bucket after taking the lock, when we know that it is stable.

And it's a little faster to check d_bucket after having checked the hash rather than before.

[PATCH] more dentry shrinkage

- d_vfs_flags can be removed - just use d_flags. All modifications of dentry->d_flags are under dentry->d_lock.

On x86 this takes the internal string size up to 40 bytes. The internal/external ratio on my 1.5M files hits 96%.

[PATCH] dentry layout tweaks

Lookup typically touches three fields of the dentry: d_bucket, d_name.hash and d_parent.

Change the layout of things so that these will always be in the same cacheline.

[PATCH] H8/300: bitops.h add find_next_bit

From: Yoshinori Sato

- add find_next_bit

[PATCH] H8/300: ldscripts fix

From: Yoshinori Sato

- symbol prefix (use h8300 and v850) support
- include headers

[PATCH] H8/300: pic support

From: Yoshinori Sato

- add PIC binary support

[PATCH] H/8300 pic support fix

From: Yoshinori Sato

Sorry. There was the file which lacked.
[PATCH] H8/300: preempt support

From: Yoshinori Sato

- add preempt support
- add new syscalls
- code cleanup

[PATCH] H8/300: SCI driver fix

From: Yoshinori Sato

- fix h8300 depend setup sequence

[PATCH] H8/300: ne driver

From: Yoshinori Sato

- ne2k compatible NIC support

[PATCH] H8/300: Kconfig

From: Yoshinori Sato

- Separate target depends config.

[PATCH] H8/300: delete headers

From: Yoshinori Sato

- Delete obsolete header files

[PATCH] H8/300: more cleanup

From: Yoshinori Sato

- gcc-3.4 warning fix.
- io access address fix.
- cleanup code.

[PATCH] Add del_single_shot_timer()

From: Geoff Gustafson, "Chen, Kenneth W", Ingo Molnar, me.

The big-SMP guys are seeing high CPU load due to del_timer_sync()'s inefficiencies. The callers are fs/aio.c and schedule_timeout().

We note that neither of these callers' timer handlers actually re-add the timer - they are single-shot.

So we don't need all that complexity in del_timer_sync() - we can just run del_timer() and if that worked we know the timer is dead.

Add del_single_shot_timer(), export it to modules and use it in AIO and schedule_timeout().

(these numbers are for an earlier patch, but they'll be close)

    Before:           32p      4p
      Warm cache   29,000     505
      Cold cache   37,800    1220

    After:            32p      4p
      Warm cache       95      88
      Cold cache    1,800     140

[Measurements are CPU cycles spent in a call to del_timer_sync, the average of 1000 calls. 32p is 16-node NUMA, 4p is SMP.]

(I cleaned up a few things and added some commentary)

[PATCH] s390: core

From: Martin Schwidefsky

s390 core changes:
- Rename idle_cpu_mask to nohz_cpu_mask as agreed with Dipankar.
- Refine compiler version check for "Q" constraints in uaccess.h.
- Store per process ptrace information to the correct place.
- Fix per cpu data access for 64-bit modules.
- Add topology_init function for cpu hotplug.
- Define TASK_SIZE dependent on TIF_31BIT and define MM_VM_SIZE to 4TB to get rid of elf_map32 and arch_get_unmapped_area.
[PATCH] s390: common i/o layer

From: Martin Schwidefsky

Common i/o layer changes:
- Delay unregister/register of ccw devices reappearing on a different subchannel. Search for the old ccw_device & subchannel for the reattached device and deregister it too to avoid inconsistencies.
- Fix path grouping for devices that present command reject for SetPGID but not for SensePGID.

[PATCH] s390: dasd driver

From: Martin Schwidefsky

dasd driver changes:
- Do error recovery for error recovery requests.
- Retry request if the start_IO failed because of a timeout.

[PATCH] s390: 3270 console driver

From: Martin Schwidefsky

3270 device driver change:
- Don't allow activation of views while the initial size sensing is still in progress. Replace RAW3270_FLAGS_SHUTDOWN with RAW3270_FLAGS_READY.
- Make 3270 views loadable as modules.

[PATCH] s390: zfcp host adapter

From: Martin Schwidefsky

zfcp host adapter change:
- Prevent infinite retry of SCSI commands when FCP adapter is unavailable.
- Always queue error recovery structure to the error recovery running list.
- Add help text to zfcp config option.

[PATCH] s390: network driver

From: Martin Schwidefsky

Network driver changes:
- lcs: Add missing irb error checking.
- lcs: Fix multicasting.
- lcs: Use a separate lock (ipm_lock) for multicast list.
- lcs: Add missing in_dev_put in multicast address list handling.
- iucv: Set static variables to NULL after kfree.
- iucv: Do bus_unregister if module initialization fails.
- netiucv: Convert iucvMagic to EBCDIC in con_action_start.
- netiucv: Remove administration of ifno-stuff for device name.
- netiucv: Add attribute to remove a netiucv device.
- qeth: Add version string that is displayed at driver load time.
- qeth: Fix memory leak in qeth_arp_query.
- qeth: Remove duplicate case statements in qeth_do_ioctl.
- qeth: Fix OSA broadcast filtering.
- qeth: Increase timeout for purge ARP cache IPA.
- qeth: Fix hsi device naming.
- qeth: Add do_QDIO count to qeth performance statistics.
- qeth: Allow writing to IP address takeover attribute only in state DOWN or RECOVER.
- qeth: Fix hang when removing a vlan device.
- qeth: Cleanup error messages for ARP commands.
- qeth: Return EOPNOTSUPP for purge ARP on HiperSockets.
- qeth: Drop skbs if the net_device of a qeth device is down.
- qeth: Simplify ip address list processing.

[PATCH] befs: LBD support

From: "Sergey S. Kostyliov"

The LBD patch was merged a long time ago, so it is safe to pass u64 block numbers to sb_bread() when sector_t is large enough.

[PATCH] befs: microoptimisation, use befs_bread() instead of befs_bread_iaddr()

From: "Sergey S. Kostyliov"

We already have the block number (inode->i_ino), so there is no need to calculate it from befs_block_run before the sb_bread() call (this is what befs_bread_iaddr() does).

[PATCH] befs: binary search microoptimisation

From: "Sergey S. Kostyliov"

Move value initialisation out of the loop body.

[PATCH] befs: typo fix

From: "Sergey S. Kostyliov"

Fix really old typo in config help

[PATCH] befs: debugging code cleanup

From: "Sergey S. Kostyliov"

- Reduce stack usage.
- Kill useless duplication of error and warning messages when debug is on. Old behaviour was: ... BeFS(hda1):

[PATCH] befs: maintainer update

From: "Sergey S. Kostyliov"

Acked by Will Dyson.

[PATCH] befs: inode->i_flags thinko fix

Jorn Engel

inode->i_flags should never contain fs-specific flags. In fact, it doesn't; the checks against it cause "chattr +T" to be useless for ext[23].

Same bug was in befs as well.

[PATCH] export clear_pages on ppc32

From: Olaf Hering

ext3 as module is not possible in 2.6.6, clear_pages, called from clear_page, is not exported.

Also, unexport clear_page(), which is an inline.

[PATCH] PPC32: Fix __flush_dcache_icache_phys() for Book E

From: Matt Porter

This patch implements/uses __flush_dcache_icache_page() which kmaps on a Book E part, but keeps the existing behavior on other PowerPCs which can disable the MMU.
[PATCH] PPC32: Fix copy prefetch on non coherent PPCs

From: Matt Porter

This patch fixes the condition where prefetching cache lines beyond a buffer can cause data corruption on non cache coherent PPCs. It is a port of the version that went into 2.4. From Eugene Surovegin.

[PATCH] PPC32: Add Book E / PPC44x specific exception support

From: Matt Porter

Adds general Book E debug exception support and PPC44x-specific debug exception implementation.

[PATCH] PPC32: Add Book E / PPC44x specific exception support

From: Matt Porter

Adds general Book E machine check exception support and PPC44x-specific machine check exception implementation.

[PATCH] PPC32: New OCP core support (updated)

From: Matt Porter

New OCP infrastructure ported from 2.4 along with several enhancements. Updated patch with comments from hch and Valdis.

[PATCH] PPC32: Bubinga/405EP for new OCP

From: Matt Porter

Merge Bubinga/405EP support against new OCP.

[PATCH] PPC32: PPC44x lib support

From: Matt Porter

Merge PPC44x library support against new OCP.

[PATCH] PPC32: IBM PPC4xx-specific OCP support

From: Matt Porter

Merge PPC4xx-specific OCP support for new OCP core.

[PATCH] PPC32: 4xx core fixes and 440gx PIC support

From: Matt Porter

Merge misc. 4xx core fixes and support for the new cascade scheme in the 440gx.

[PATCH] PPC32: Update 4xx defconfigs

From: Matt Porter

Update all current 4xx defconfigs for new OCP.

[PATCH] PPC32: PPC40x ports for new OCP

From: Matt Porter

Merge all current PPC40x ports against new OCP.

[PATCH] PPC32: PPC44x ports for new OCP

From: Matt Porter

Merge all current PPC44x ports against new OCP.

[PATCH] ppc32: Fix pmac compile after OCP changes

From: Paul Mackerras

Matt Porter's recent changes broke the compile for non-4xx ppc32 systems, unfortunately. I get an error that mfdcr is not defined in include/asm-ppc/ocp.h when compiling for powermac (reasonable, since the mfdcr instruction only exists on 4xx processors). The patch below fixes it.
[PATCH] ppc32: Move declarations into headers

From: Paul Mackerras

The patch below moves some declarations from C files into the appropriate header file in include/asm-ppc (and removes an unused local variable in a function).

[PATCH] ppc64: fix radix tree allocation under spinlock

From: Anton Blanchard

We were allocating radix tree nodes in the interrupt code with GFP_KERNEL under a spinlock. Change it to GFP_ATOMIC.

[PATCH] ppc64: set MSR_RI in iseries exception code

From: Anton Blanchard

We need to set MSR_RI in iseries exception prolog.

[PATCH] ppc64: align some heavily used variables

From: Anton Blanchard

Based on feedback from the hardware guys align jiffies and tb_last_stamp. We update both regularly and there are other read only, heavily accessed things that share those cachelines.

[PATCH] ppc64: NVRAM fixes

From: Anton Blanchard

We check nvram_fetch/nvram_store against -1, so better not make these unsigned.

[PATCH] ppc64: remove iseries interrupt recursion workaround

From: Anton Blanchard

It turns out we do avoid irq recursion on iseries so remove the workaround.

[PATCH] ppc64: Kconfig bits for CONFIG_SPINLINE

From: Paul Mackerras

When I sent the patch to uninline the spinlocks, I inadvertently left out the change to arch/ppc64/Kconfig which defines the config symbol for inlining the locks (CONFIG_SPINLINE now). This patch adds it. It also adds a symbol CONFIG_PPC_SPLPAR which enables the code for calling the hypervisor on shared-processor logically-partitioned systems to yield the physical processor to the lock holder when spinning. (The code that depends on this symbol is already present in arch/ppc64/lib/locks.c.)

[PATCH] ppc64: strengthen I/O and memory barriers

From: Paul Mackerras

After I sent the recent patch to include/asm-ppc64/io.h which put stronger barriers in the I/O accessor macros, Paul McKenney pointed out to me that a writex/outx could still slide out from inside a spinlocked region.
This patch makes the barriers a bit stronger so that this can't happen. It means that we need to use a sync instruction for wmb (a full "heavyweight" sync), since drivers rely on wmb for ordering between writes to system memory and writes to a device.

I have left smp_wmb() as a lighter-weight barrier that orders stores, and doesn't impose an ordering between cacheable and non-cacheable accesses (the amusingly-named eieio instruction). I am assuming here that smp_wmb is only used for ordering stores to system memory so that another cpu will see them in order. It can't be used for enforcing any ordering that a device will see, because it is just a gcc barrier on UP.

This also changes the spinlock/rwlock unlock code to use lwsync ("light-weight sync") rather than eieio, since eieio doesn't order loads, and we need to ensure that loads stay inside the spinlocked region.

[PATCH] ppc64 iSeries: allow read only virtual disks

From: Stephen Rothwell

It is possible to attach a virtual disk to a logical partition on an iSeries machine so that it is read only to the partition. This patch allows Linux to use such virtual disks.

[PATCH] ppc64: Add proper SMP init on dual 970FX based machines

This patch fixes SMP boot on Apple Xserve G5

[ACPI] delete IOAPIC-disable workaround on x86_64/VIA

http://bugme.osdl.org/show_bug.cgi?id=1530

[PATCH] PowerPC Virtual Ethernet duplicate MAC addresses

This fixes a bug where different partitions were assigned the same MAC address. Also, according to Anton, gcc 3.5 didn't like our mac_addr_p gymnastics, so this ends up fixing that as well.

[PATCH] PowerPC Virtual Ethernet links to /sys/class/net/ethX

This adds links to the driver and device inside /sys/class/net/ethX for PowerPC Virtual Ethernet devices.

[PATCH] fix typo in avm_cs PCMCIA AVM B1 cardservice driver

This trivial fix makes the AVM B1 PCMCIA cards work with 2.6

[PATCH] x86_64 has buggy ffs() implementation

x86_64 has an incorrect ffs() implementation.
The asm uses "g" instead of "rm" for the bsfl instruction. (This was spotted by Yuri Per.)

bsfl does not accept constant values but only memory or register ones. On i386 the correct "rm" is used.

This causes NTFS build to fail as gcc optimizes a variable into a constant and ffs() then fails to assemble.

[sound/oss i810] bump driver to version 1.00

[sound/oss i810] pci id cleanups

The driver defined its own PCI id constants. Kill the majority, which were redundant, and move the rest to include/linux/pci_ids.h.

Also, move open-coded tests for "new ICH" audio chips to a single helper function. These tests were being patched with each new ICH motherboard from Intel, resulting in each new PCI id being added to several places in the driver.

Note that, even though this should be a harmless patch, there exists the remote possibility that I mis-matched some of the PCI ids, as I only tested ICH5.

Make dev_dbg() "use" its 'dev' argument even when not debugging.

This avoids warnings about unused variables.

Fix typo nonsense test in radeon PMAC backlight code.

[ARM PATCH] 1880/1: cache_type is uninitialised in the blockops_check() function

Patch from Catalin Marinas

In the blockops_check() function, cache_type is uninitialised because an "mcr" instruction is used instead of "mrc".

[ARM PATCH] 1881/1: Illegal strex instruction generated by gcc

Patch from Catalin Marinas

The _raw_write_(try)lock functions in include/asm-arm/spinlock.h should use the early clobber modifier (&) for the "tmp" register. A newer compiler (gcc-3.4.0) generates an "strexeq %0, %1, [%2]" instruction where %0 is the same as %2, which is illegal.

[ARM PATCH] 1882/1: Fixes in the v6_dma_(invalidate|flush)_range functions

Patch from Catalin Marinas

In v6_dma_invalidate_range, the "mcr" instruction for draining the write buffer requires r0 == 0. A "cmp" instruction for testing the end address is missing in the v6_dma_flush_range function.
[ARM PATCH] 1883/1: Bit 4 in pmd should be 0 for the ARMv6 architecture

Patch from Catalin Marinas

Unlike the v5 architecture, the ARM1136 requires that BIT4 is 0 in the first level page descriptor (ARM1136 TRM, page 6-39). It works at the moment but it might break future v6 cores.

[ARM PATCH] 1879/1: fix a few xscale "drain write & fill buffer" instructions

Patch from Robin Farine

Fix the xscale cache handling routines that were invalidating a D cache line instead of draining the write & fill buffer as intended.

[PATCH] Multiple (ICH3) IDE-controllers in a system

This fixes a problem with multiple IDE controllers in a system.

The problem is that pcibios_fixups table (in arch/i386/pci/fixup.c) uses the pci_fixup_ide_trash() quirk for Intel's ICH3 (my case specifically 8086:248b). This clears any bogus BAR information set up by the BIOS. A system which has multiple ICH3s can't use any of the IDE controllers besides the one on the first ICH3.

Anyhow, the fix is to make sure pci_fixup_ide_trash resets the BARs only the first time it is called, so the subsequent IDE controllers will use the BIOS BARs. This is better than "losing" all these IDE controllers in the case their BARs are set right.

The issue was discussed and agreed with Bartlomiej Zolnierkiewicz (see below).

[libata] internal cleanups

Remove unused 'done_late' arg to ata_qc_complete(), which was never useful in 2.4, and never used at all in 2.6.

This allows us to eliminate the same arg from ata_dma_complete(), and also make it more correct by passing the command rather than the ATA port structure as arg0.

[PATCH] x86: stack dumps using frame pointers

From: Adam Litke

Teach the x86 stack tracing code to use frame pointers, if they are available. It eliminates all the false-positives in the normal stack traces.

This is a big improvement, and -fomit-frame-pointer seems to make no difference at all to generated code size. Maybe we should kill off -fomit-frame-pointer.
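The reason frame pointers remove false positives is that each frame records exactly one return address plus a link to the caller's frame, so the tracer follows a chain instead of guessing which stack words look like code addresses. The following is a user-space simulation of such a walk over a fake stack, not the kernel's tracing code; `struct stack_frame`, `frame_in_stack()` and `walk_frames()` are hypothetical names, and the bounds check is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With frame pointers on x86, the frame pointer register points at
 * the caller's saved frame pointer, followed by the return address. */
struct stack_frame {
    struct stack_frame *next_frame;
    unsigned long return_address;
};

/* Is fp a whole frame lying inside the stack range [lo, hi)? */
static int frame_in_stack(const struct stack_frame *fp, uintptr_t lo, uintptr_t hi)
{
    uintptr_t p = (uintptr_t)fp;

    return p >= lo && p <= hi - sizeof(*fp);
}

/* Follow the chain of saved frame pointers, collecting one return
 * address per frame. Returns the number of addresses stored. */
static size_t walk_frames(const struct stack_frame *fp, uintptr_t lo, uintptr_t hi,
                          unsigned long *out, size_t max)
{
    size_t n = 0;

    while (n < max && frame_in_stack(fp, lo, hi)) {
        const struct stack_frame *next = fp->next_frame;

        out[n++] = fp->return_address;
        /* Callers live at higher addresses; anything else means the
         * chain has ended or is corrupt, so stop rather than loop. */
        if ((uintptr_t)next <= (uintptr_t)fp)
            break;
        fp = next;
    }
    return n;
}
```

A scan without frame pointers would instead report every stack word that happens to fall inside kernel text, including stale values left over from earlier calls.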
[PATCH] Fix writeback_inodes-vs-umount race

Fix bug identified by Chris Mason. If writeback_inodes is left holding a ref on the superblock's last inode then the superblock list walk can race with umount and the superblock can be released.

Take and put a ref against the superblock to fix that.

[PATCH] sched: improved cpu_load rounding

From: Nick Piggin

"Siddha, Suresh B" noticed a problem in the cpu_load averaging where the integer truncation could sometimes cause cpu_load to never quite reach its target.

I'm not sure that you could demonstrate a real world problem, but I quite like this fix.

[PATCH] sched: fix scheduler for unsynched processor sched_clock

From: Nick Piggin

Fine-tune the unsynched sched_clock handling.

Basically, you need to be careful about ensuring timestamps get correctly adjusted when moving CPUs, and you *can't* look at your unadjusted sched_clock() and a remote task's ->timestamp and try to come up with anything meaningful.

I think this second problem will really hit hard in the activate_task path on systems with unsynched sched_clock when you're waking up a remote task, which happens very often. Andi, I thought some Opterons have unsynched tscs? Maybe this is causing your unexplained bad interactivity?

Another problem is a fixup in pull_task. When adjusting ->timestamp from one processor to another, you must use timestamp_last_tick for the local processor too. Using sched_clock() will cause ->timestamp to creep forward.

A final small fix is for sync wakeups. They were using __activate_task for some reason, thus they don't get credited for sleeping at all AFAIKS.

And another thing, do we want to #ifdef timestamp_last_tick so it doesn't show on UP?

[PATCH] sched: less locking in balancing

From: Nick Piggin

Analysis and basic idea from Suresh Siddha

"This small change in load_balance() brings the performance back up to base scheduler (in fact I see a ~1.5% performance improvement now). Basically this fix removes the unnecessary double_lock.."

Workload is SpecJBB on 16-way Altix.

[PATCH] sched: reduce node balancing interval

From: Nick Piggin
From: Suresh Siddha

Node max rebalance interval is too large. It is currently dependent on number of online cpus. For a 16 cpu system, max node balance interval in busy case is 32 seconds. Agreed that it will use max 32 seconds only when it doesn't find imbalance for a long time. But this will lead to slow response time in cases where load runs for a second with no imbalance and suddenly creates an imbalance.

My patch makes the busy max node rebalance interval equal to the base [scheduler].

[PATCH] Use -msoft-float

From: Dave Jones

To catch accidental usage of floating point. Has been in -mm for ages.

[PATCH] autofs4: dnotify + autofs may create signal/restart syscall loop

From: Ian Kent
From: Jeff Mahoney

I saw a recent bug report that showed when a process set up a dnotify against the autofs root and then attempted an access(2) call inside the autofs namespace on a mount that would fail, it would create a signal/restart loop.

The cause is that the autofs code checks to see if any signals are pending after it waits on a response from the autofs daemon. If it finds any, it assumes that autofs_wait was interrupted, and that it should return -ERESTARTNOINTR.

The problem with this is that a signal_pending(current) check will return true if *any* signals were received, not just if a signal that interrupted the wait was received. autofs_wait explicitly blocks all signals except for SIGKILL, SIGQUIT, and SIGINT before calling interruptible_sleep_on.

The effect is that if a dnotify is set against the autofs root, when the autofs daemon creates the directory, a dnotify event will be sent to the originating process. Since the code in autofs_root_lookup doesn't check to see what signals are actually pending, it bails early, telling the caller to try again. The loop goes on forever until interrupted via one of the actual interrupting signals.

The following patch makes both autofs_root_lookup and autofs4_root_lookup verify that one of its defined "shutdown" signals is pending before bailing out early. Any other signal should be delivered later, as expected. It doesn't matter if the signal occurred outside of the sleep in autofs_wait. The calling process will either go away or try again.

[PATCH] autofs4: printk cleanups and memory leak fix

From: Ian Kent

- Correct text in DPRINTK messages and comments, a little reformatting and correct URL location for autofs v4 in Kconfig message.
- Fix error-path memory leak in autofs4_fill_super()

[PATCH] autofs4: locking rework

From: Ian Kent

Remove BKL from autofs4 module and add spinlock to serialise access to the automount daemon communication waitq.

Locking requirements are different in 2.6 and so I'm seeking comments and suggestions on this. I have taken a rather heavy handed approach to this in the patch. For example, the VFS operations that directly change the filesystem, such as autofs4_mkdir etc, hold the inode semaphore on entry so the BKL has been removed. I can't see why two locking mechanisms are needed. Rather than add locking all over the place, I'm looking for justification it's needed, as I don't see it myself.

[PATCH] autofs4: expiry refcount fixes

From: Ian Kent

This patch is the result of an e-mail discussion with Soni Maneesh. He felt that the use of reference counts in the expire module is unreliable (in the presence of rcu) and suggested it should use standard VFS calls where possible. This has been done. Once the boundary in autofs is reached we have no choice but to resort to using reference counts (but under the vfsmount_lock).

After review by hch:

- renamed autofs4_may_umount to __may_umount_tree, made it static and moved it to namespace.c.
- added stub function may_umount_tree with description
- altered may_umount to use above stub function and added little description
- added may_umount_tree prototype to fs.h
- removed the EXPORT_SYMBOL for vfsmount_lock
- updated expire.c to suit

[PATCH] autofs4: may_umount_tree() cleanup

From:

Patch to sync 2.6.6-rc2-mm2 with the result of my discussion with Christoph Hellwig.

Difference is that Christoph realised that merging may_umount_tree and may_umount was not worth it. They are now separate functions.

[PATCH] autofs4: readdir fixes

From: Ian Kent

a. Implement readdir and friends for directory lookup for late mounting. This is done largely by replacing a catch all condition in try_to_fill_dentry with appropriate cases.
b. Add path calc. function in waitq.c to get extended path to return to daemon (for direct mounts).
c. Add revalidate calls to sys_chdir and sys_chroot so that pwd lookups work correctly.
d. Add ioctl to retrieve minor version for automount daemon (and me) to recognise module fix level. Bumped minor version to 5.

From: Hugh Dickins

After chdir (or chroot) to non-existent directory on 2.6.5-mm5, you can no longer unmount filesystem holding working directory (or root).

[PATCH] autofs4: fix handling of chdir and chroot

From:

Pushed changes in sys_chdir and sys_chroot into the revalidate/lookup by using nameidata hint.

[PATCH] autofs4: add ioctl to query unmountability

From: Ian Kent

Add ioctl to find out if autofs mount can be umounted. When the daemon discovers this it's past the point of no return.

[PATCH] autofs4: readdir futureproofing

From: Ian Kent

Needed to support upcoming development plans.

[PATCH] autofs4 race fix

From: Ian Kent

The case where two processes simultaneously trigger a mount in autofs4 can cause multiple requests to the daemon for the same mount. The daemon handles this OK but it's possible for an incorrect error to be returned. For this reason I believe it is better to change the spin lock to a semaphore in waitq.c.
This makes the second and subsequent requests wait on the q as they're supposed to.

[PATCH] autofs4 compat ioctls

From:

These are the ioctls that need to be added to the compatibility layer. They are all essentially the same as the AUTOFS_IOC_PROTOVER in their requirements and so should be fine.

[PATCH] fbdev: radeonfb: fix garbled screen

From: Benjamin Herrenschmidt

> My screen is still a bit garbled after booting.
> Still like http://zodiac.dnsalias.org/images/garbage.jpg

Yes, usual boot-time x86 garbage. Well, let's imagine it's as simple as clearing the framebuffer during boot :) Try this patch and let me know. If that doesn't help, then the problem is definitely in fbcon.

[PATCH] fbdev: Neomagic driver update.

From: James Simmons

Here is an updated driver for the neomagic.

[PATCH] fbdev: video/tridentfb.c warning fix

From: "Luiz Fernando N. Capitulino"

Speaking with frame buffer people, we agree with this patch to fix the warning:

drivers/video/tridentfb.c:455: warning: `tridentfb_fillrect' defined but not used
drivers/video/tridentfb.c:473: warning: `tridentfb_copyarea' defined but not used

[PATCH] fbdev: video/hgafb.c warning fix

From: "Luiz Fernando N. Capitulino"

Make HGA acceleration functions selectable in kernel config, fix these warnings:

drivers/video/hgafb.c:452: warning: `hgafb_fillrect' defined but not used
drivers/video/hgafb.c:472: warning: `hgafb_copyarea' defined but not used
drivers/video/hgafb.c:502: warning: `hgafb_imageblit' defined but not used

[PATCH] fbdev: video/tdfxfb.c warning fix

From: "Luiz Fernando N. Capitulino"

Fix this:

drivers/video/tdfxfb.c:1005: warning: `tdfxfb_cursor' defined but not used

and make the acceleration function selectable (like hgafb and tridentfb)

Geert says:

tdfxfb_cursor() was not used before, causing a compiler warning. tdfxfb_cursor() may work, but we don't know, so we didn't dare to enable it by default. Now the user (he who has the hardware) can enable it, and tell us whether it works or not.

[PATCH] fbdev: video/imsttfb.c warning fix

From: "Luiz Fernando N. Capitulino"

drivers/video/imsttfb.c:1089: warning: `imsttfb_load_cursor_image' defined but not used
drivers/video/imsttfb.c:1159: warning: `imstt_set_cursor' defined but not used

[PATCH] fbdev: clean up logo handling

From: James Simmons

This makes the logo handling code easier to read. Merged the two code blocks since they test for the exact same condition.

[PATCH] fbdev: remove redundant p->vrows calculation

From: James Simmons

This patch removes the redundant calculation of p->vrows. This is done in fbcon_resize.

[PATCH] fbdev: remove redundant local

From: James Simmons

Remove extra variable. We use i instead of rc.

[PATCH] fbdev: set a default access_align value

From: James Simmons

Set the default access_align variable. This variable tells us how much data the hardware can handle in a single read/write cycle. For example the epson chipset can handle only 16 bit reads and writes to the framebuffer.

[PATCH] fbdev: Fix NULL-ptr dereference in pm2fb_probe

From: Jim Hague

It fixes the NULL pointer dereference and also a problem in pm2fb_blank().

[PATCH] fbdev: Virtual fbdev updates

From: James Simmons

This is attempt 2 at the virtual framebuffer patch. It migrates the driver to the framebuffer_release/framebuffer_alloc api. It doesn't enable the driver by default.

[PATCH] fbdev: Vesa Fbdev update

From: James Simmons

This patch migrates the Vesa Framebuffer driver over to the framebuffer_alloc/framebuffer_release api. It also fixes the error handling paths. The mtrr issue that Geert brought up has been fixed.

[PATCH] fbdev: Vesa Fbdev update fix

From: Geert Uytterhoeven

On Sun, 25 Apr 2004, James Simmons wrote:
> This patch migrates the Vesa Framebuffer driver over to the
> framebuffer_alloc/framebuffer_release api. It also fixes the error
> handling paths. The mtrr issue that Geert brought up has been fixed.
> With your approval Geert, Ben please apply this patch.

> +	/* Set video size according to vram boot option */
> +	if (vram && vram * 1024 * 1024 != vesafb_fix.smem_len)
> +		vesafb_fix.smem_len = vram * 1024 * 1024;

The second part of the test can be removed.

The rest looks OK to me.

[PATCH] fbdev: New Asiliant framebuffer driver.

From: James Simmons

This is the new asiliant framebuffer driver.

[PATCH] fbdev: Fix fbcon and unimap

From: Fabrice Menard

Trying to solve my latin1 char problems with the framebuffer console, I found that fbcon doesn't set a unicode map.

[PATCH] fbdev: Q40 fbdev updates.

From: James Simmons

It ports this driver to sysfs api and fixes a colormap issue.

[PATCH] Fix i2o_proc kernel panic on access of /proc/i2o/iop0/lct

From: Markus Lidel

The patch converts i2o_proc to seq_file, thereby fixing a bug in the i2o_proc.c module, where the kernel panics if you access /proc/i2o/iop0/lct and read more than 1024 bytes of it.

[PATCH] i2o_proc module owner fix

From: Warren Togami

[PATCH] slabify iocontext + request_queue

From: Jens Axboe

Move both request_queue and io_context allocation to a slab cache.

This is mainly a space-saving exercise. Some setups have a lot of disks and the kmalloc rounding-up can consume significant amounts of memory.

[PATCH] show last kernel-image symbol in /proc/kallsyms

From: Rusty Russell

The current code doesn't show the last symbol (usually _einittext) in /proc/kallsyms. The reason for this is subtle: s_start() returns an empty string for position 0 (ignored by s_show()), and s_next() returns the first symbol for position 1.

What should happen is that update_iter() for position 0 should fill in the first symbol. Unfortunately, the get_ksymbol_core() fills in the symbol information, *and* updates the iterator: we have to split these functions, which we do by making it return the length of the name offset.

Then we can call get_ksymbol_core() without moving the iterator, meaning that we can call it at position 0 (ie. s_start()).

[PATCH] Include Aliases in kallsyms

From: Rusty Russell

Kallsyms discards symbols with the same address, but these are sometimes useful. Skip this minor optimization and make kallsyms_lookup deal with aliases.

[PATCH] make buildcheck

From: Arjan van de Ven

The patch below adds a "make buildcheck" target which checks for the "uses exit in init" bug using Keith Owens' script. In the future other similar sanity checks can be added to this target, but even just this one has been quite useful already. I use it in the kernel rpm build process for example, and I'm sure the OSDL build testers can/want to use it too.

From: Keith Owens

They commented out the progress print statements, I prefer to have them present but no big deal. The licence is missing.

[PATCH] efivars: check that it's enabled

From: "Randy.Dunlap"

EFI-enabled kernels crash on non-EFI machines. efivars_init() and efivars_exit() need to check efi_enabled instead of assuming that the system is using EFI.

[PATCH] expose backing dev max read-ahead

From: Jens Axboe

Expose the blockdev's VM readahead in /sys/block/hda/queue/read_ahead_kbytes

This duplicates `blockdev --setra', but we're trying to get away from ioctls.

It would be nice to have a readahead-setting mechanism which also allows, say, NFS to be tuned. But there is no common exposure point for backing_dev_infos. One option might be per-superblock:

    mount -o remount,read_ahead_kbytes=64

but the generic remount code also has no visibility of the backing_dev, so it would need a new super_block operation. One which doesn't accidentally modify default_backing_dev_info.

[PATCH] ib700wdt watchdog driver fix

From: Patrice Bouchand

ibwdt_ping(): we should write the current timeout's index into the holdoff register, not the timeout's value in seconds.

[PATCH] ib700wdt watchdog driver fix #2

From: Patrice Bouchand

The value written in the WDT_STOP register is not important. As soon as something is written, the watchdog timer stops.
But things will be cleaner if we use the following patch.

[PATCH] laptop-mode documentation fix

From: Sau Dan Lee

The script /etc/acpi/actions/battery.sh in the document doesn't run, because of a wrong name.

[PATCH] create_workqueue locking fix

Fix some silliness in there.

[PATCH] Fix AladdinCard entry in parport_pc

From: Christian Groessler

Our AladdinCard also uses the oxsemi_840 chips and locks up when ecp mode is enabled.

[PATCH] Watchdog timer for Intel IXP4xx CPUs

From: Deepak Saxena

Following patch adds a driver for the watchdogs on the Intel IXP4xx family of network processors (ARM).

[PATCH] Update laptop mode control script with XFS_HZ=100

From: Bart Samwel

The laptop mode control script incorrectly guesses XFS_HZ=1000. This is incorrect, since the patches that made XFS use USER_HZ went into 2.6.6 as well. This changes XFS_HZ to 100 and removes the warning from the doc about checking XFS_HZ.

[PATCH] videodev: handle class_register() failure

From: "Randy.Dunlap"
From: (Walter Harms)

(acked by Gerd)

[PATCH] dquot_release oops fix

From: Jan Kara

Fix a null-pointer-deref oops in the quota code.

[PATCH] calculate NGROUPS_PER_BLOCK from PAGE_SIZE

From: Greg Edwards

On ia64, EXEC_PAGESIZE (max page size) is 65536, but the default page size is 16k. This results in NGROUPS_PER_BLOCK in include/linux/sched.h being calculated incorrectly when the page size is anything other than 64k.

For example, on a 16k page size kernel, a setgroups() call with a gidsetsize of 65536 will end up walking over memory since only 1/4 of the needed pages were allocated for the blocks[] array in the group_info struct.

Patch below calculates NGROUPS_PER_BLOCK from PAGE_SIZE instead.

[PATCH] PCI debug compile fix in sis_router_probe()

From: Pavel Roskin

I get a compile error when I define "DEBUG" in arch/i386/pci/pci.h. Variable rt is not defined in sis_router_probe(), file arch/i386/pci/irq.c.

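The "only 1/4 of the needed pages" figure in the NGROUPS_PER_BLOCK entry above can be checked with a short sketch. It assumes a 4-byte gid_t and one page per block, neither of which is stated in the entry, and blocks_for() is a hypothetical helper rather than a kernel function.

```c
#include <assert.h>

/*
 * Number of blocks allocated for 'gidsetsize' group ids when the
 * per-block capacity is derived from 'size_used' bytes and every id
 * occupies 'gid_bytes' bytes.
 */
unsigned long blocks_for(unsigned long gidsetsize, unsigned long size_used,
			 unsigned long gid_bytes)
{
	unsigned long per_block;

	assert(gid_bytes > 0 && size_used >= gid_bytes);
	per_block = size_used / gid_bytes;	/* ids assumed to fit in one block */
	return (gidsetsize + per_block - 1) / per_block;
}
```

Under these assumptions, blocks_for(65536, 65536, 4) gives 4 blocks when the capacity is derived from EXEC_PAGESIZE, while blocks_for(65536, 16384, 4) gives the 16 blocks that 16k pages can actually hold. Four of sixteen is the 1/4 mentioned in the entry.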
[PATCH] security: remove empty build of capability.o

From: Chris Wright

The build includes capability.c when CONFIG_SECURITY=n, yet the whole file is ifdef'd out. Remove unnecessary build step as well as superfluous ifdefs.

[PATCH] security: minor cleanups in capability.c

From: Chris Wright

Remove confusing error message when loading as secondary module, and ditch conditional MY_NAME macro.

[PATCH] fix linux doc errors

From: Alan Cox

[PATCH] fix block layer ioctl bug

From: Alan Cox

The block layer checks for -EINVAL from block layer driver ioctls. This is wrong: -ENOTTY is the "unknown ioctl" return, and some drivers correctly use it.

I suspect for an internal ioctl 2.7 should change to -ENOIOCTLCMD and bitch about old style returns.

This is a conservative fix for the 2.6 case; it keeps the bogus -EINVAL to avoid breaking stuff.

[PATCH] Fix reiserfs oom crash

From: Oleg Drokin

Thanks to the Stanford guys, a case where reiserfs can dereference a NULL pointer if memory allocation fails during mount was identified.

[PATCH] implement print_modules()

From: Arjan van de Ven, Rusty Russell

The patch below resolves the "Not Yet Implemented" print_modules() thing. This is a really useful feature for distros; it allows us to do statistical analysis on which modules are present how often in oopses compared to how often they are used normally. In addition it helps to spot candidates for certain bugs without having to go back to the customer asking for this information.

[PATCH] m68k: use print_modules()

From: Geert Uytterhoeven

[PATCH] Fix endianness in modpost when cross-compiling for sparc on i386

From: Mathieu Chouquet-Stringer

This patch makes the following code work again:

#ifdef STT_REGISTER
	if (info->hdr->e_machine == EM_SPARC ||
	    info->hdr->e_machine == EM_SPARCV9) {
		/* Ignore register directives. */
		if (ELF_ST_TYPE(sym->st_info) == STT_REGISTER)
			break;
	}
#endif

This portion of code is sparc specific and nothing else in modpost.c uses e_machine, meaning cross-compiling for sparc on i386 (or any little endian machine) is the only way to experience the bug.

Without it, e_machine has the wrong value and modpost then generates a lot of "*** Warning: \"symbol\" [filename.ko] undefined" messages.

[PATCH] fix cyclades compile with !PCI

From: Adrian Bunk

drivers/char/cyclades.c: In function `cy_cleanup_module':
drivers/char/cyclades.c:5638: warning: implicit declaration of function `pci_release_regions'

[PATCH] fix tlan.c for !PCI

From: Adrian Bunk

drivers/net/tlan.c: In function `tlan_remove_one':
drivers/net/tlan.c:449: warning: implicit declaration of function `pci_release_regions'

[PATCH] fix aic7xxx_old.c for !PCI

From: Adrian Bunk

drivers/scsi/aic7xxx_old.c: In function `aic7xxx_release':
drivers/scsi/aic7xxx_old.c:10971: warning: implicit declaration of function `pci_release_regions'

[PATCH] x86_64 msr.c warning fix

arch/x86_64/kernel/msr.c:1:10: warning: extra tokens at end of #ident directive

[PATCH] Make /proc/sysrq-trigger ignore sysrq_enabled

It's silly that writing to /proc/sysrq-trigger does nothing if you haven't enabled /proc/sys/kernel/sysrq. So provide a new __handle_sysrq() which ignores the sysrq_enabled check.

The patch also withdraws __handle_sysrq_nolock() from the kernel API. It had no callers.

[PATCH] remove driver model code in mwave driver

From: Christoph Hellwig

Someone blindly added sysfs support to the driver a long time ago without understanding the implications (and if they were understood the driver would need half a rewrite for it). Herbert Xu recently noticed the problems this causes on unload, so let's #if 0 out all that code and get the driver working again.

[PATCH] Fix x86_64 allmodconfig with gcc-3.4.0

From: Andi Kleen

*** Warning: "memcmp" [drivers/atm/zatm.ko] undefined!

gcc 3.4 specific problem. This patch should fix it.
Actually it would be better to move all these EXPORT_SYMBOLs into lib/string.c; it is silly that each arch has to duplicate all that.

[PATCH] Remove old sh-sci driver

From: Paul Mundt

The old drivers/char sh-sci driver is no longer used by anyone; both sh and h8300 are using the drivers/serial version at this point, so we can get rid of the old one entirely.

[PATCH] Export `laptop_mode' for XFS

From:

XFS needs `laptop_mode'.

[PATCH] floppy.c: better floppy_init error handling

From: "Randy.Dunlap"
From: "Luiz Fernando N. Capitulino"

Adds a better audit for floppy_init(). Fixes one real bug (in calling blk_queue_max_sectors()).

[PATCH] floppy.c: better/cleaner use of debugt

From: "Randy.Dunlap"
From: "Luiz Fernando N. Capitulino"

floppy_debugt.patch: better use of the debugt functions.

[PATCH] remove unused acpi_irq_to_vector()

From: Bjorn Helgaas

Now that everybody has acpi_gsi_to_irq(), we can nuke the deprecated acpi_irq_to_vector(). No references remain.

[PATCH] Laptop mode control script support for XFS *_centisecs sysctl values.

From: Bart Samwel

XFS now uses

    /proc/sys/fs/xfs/xfssyncd_centisecs
    /proc/sys/fs/xfs/xfsbufd_centisecs
    /proc/sys/fs/xfs/age_buffer_centisecs

Here's a patch to support these values in the laptop mode control script.

[PATCH] Increase xfsbufd_centisecs when in laptop mode

From: Bart Samwel

The attached patch is the outcome of a discussion with Nathan. When laptop mode is active, there is no need for XFS to wake up xfsbufd (the daemon that flushes buffers that are too old) too often. The default is once every second; this patch makes laptop mode do it once every 30 seconds.

[PATCH] ac97_plugin_ad1980 porting fix

From: "Uwe Bugla"

Fix up a mistake in the 2.4->2.6 forward-port of this driver.

[PATCH] groups_alloc(0) clobbers memory past end of block

From: Olaf Kirch

Authentication code in net/sunrpc makes frequent use of groups_alloc(0), which seems to clobber memory past the end of what it allocated.
If called with gidsetsize == 0, groups_alloc will set nblocks = 0, but still does a

    group_info->blocks[0] = group_info->small_block;

[PATCH] bootmem.c cleanup

From: Michael Buesch

- BUG_ON() conversion
- Remove redundant dump_stack() (BUG already does that)

[PATCH] radeonfb stack space fix

These (unused) arrays are causing huge stack utilisation when instantiated in an auto variable. Remove them for now.

[PATCH] typhoon locking fix

Initialise the semaphore even if !MODULE

[PATCH] Fix botched fbdev lvalue conversion

[PATCH] i2o_config build fix

Stomp a C99ism.

Fix bogus debug code in usb/misc/cytherm.c

Uncovered by recent cleanup of "dev_dbg()".

[PATCH] x86-64 updates

Various accumulated x86-64 patches and bug fixes.

It fixes one nasty bug that has been there since NX is used by default in the kernel. With heavy AGP memory allocation it would set NX on parts of the kernel mapping in some corner cases, which gave endless crash loops. Thanks goes to some wizards in AMD debug labs for getting a trace out of this.

Also various other fixes.

This patch only changes x86-64 specific files; I have some changes outside too that I am sending separately.

- Fix help test for CONFIG_NUMA
- Don't enable SMT nice on CMP
- Move HT and MWAIT checks up to generic code
- Update defconfig
- Remove duplicated includes (Arthur Othieno)
- Set up GSI entry for ACPI SCI correctly (from i386)
- Fix some comments
- Fix threadinfo printing in oopses
- Set task alignment to 16 bytes
- Handle NX bit for code pages correctly in change_page_attr()
- Use generic nops for non amd specific kernel
- Add __KERNEL__ checks in unistd.h (David Lee)

[PATCH] x86-64: fix /dev/mem caching behaviour

This changes the /dev/mem caching behaviour on x86-64 to be compatible with i386. By default everything is set cached. This actually makes WC MTRRs on AMD systems work, which would get overridden by the UC PAT bits that were set earlier. This can make DVD decoding with hardware support a lot faster.
It also supports O_SYNC now, like i386, although that is not really safe, because it allows the user to create undefined cache attribute conflicts that can corrupt caches in some circumstances. I kept it for now. It would be better to disallow it until Terrence Ripperda's PAT framework, which can avoid these problems, is merged. Actually it would probably be a good idea to add a printk here to catch broken programs for i386 and x86-64, but that is for another patch.

[PATCH] Handle empty nodes in sysfs on x86-64

This code is shared between i386 and x86-64, and x86-64 needs to check for empty nodes here. Otherwise you can get oopses at boot in some circumstances.

This handles empty nodes != 0; an empty node zero is still broken in other ways.

[PATCH] ide-disk.c: more write cache fixes

- many Maxtor disks incorrectly claim CACHE FLUSH EXT command support, fix it by checking both CACHE FLUSH EXT command and LBA48 support (thanks to Eric D. Mudama for help in fixing this)
- write_cache() was called with 'drive->id->cfs_enable_2 & 0x3000' as 'int arg' argument which was always truncated to zero due to 'u8 drive->wcache = arg' assignment so write cache was indeed enabled but drive->wcache was zero (thanks to Rene Herman for help in debugging this)
- flush cache in idedisk_start_power_step() only if ATA-6 CACHE FLUSH (EXT) bits are present in disk's identify data (prevents sending unknown commands)
- set drive->wcache in idedisk_setup() not idedisk_attach() (no need to check id->command_set_2 - we check id->cfs_enable_2 instead in write_cache() call)
- use ide_cacheflush_p() in idedisk_setup()
- minor cleanups

[PATCH] remove bogus drivers/ide/pci/cmd640.h

Trivia. CMD640 driver doesn't use generic IDE PCI code (it doesn't even include this header).

Fix gidsetsize == 0 for real this time.

We need to always allocate at least one indirect block pointer, since we always fill out blocks[0] even if we don't have any groups.

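The gidsetsize == 0 entry above comes down to a rounding-up division that yields zero for an empty set. The sketch below shows that arithmetic and a clamp to one; IDS_PER_BLOCK and both function names are hypothetical, and the clamp is only one plausible way to express "always allocate at least one indirect block pointer", not necessarily the exact change that was merged.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical per-block capacity; the real value depends on the page size. */
#define IDS_PER_BLOCK 4096UL

/* Block count by plain rounding-up division: zero for an empty group set. */
unsigned long nblocks_unfixed(unsigned long gidsetsize)
{
	assert(gidsetsize <= ULONG_MAX - (IDS_PER_BLOCK - 1));
	return (gidsetsize + IDS_PER_BLOCK - 1) / IDS_PER_BLOCK;
}

/* Block count clamped so that a blocks[0] slot always exists. */
unsigned long nblocks_fixed(unsigned long gidsetsize)
{
	unsigned long n = nblocks_unfixed(gidsetsize);

	return n ? n : 1;
}
```

With zero groups the unclamped count is 0, so a later store to blocks[0] would land past the end of the pointer array; the clamped count reserves one slot and leaves every non-empty case unchanged.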
[libata] minor stuff

* now that ATAPI is close to working, making ATAPI DMA interrupts in ata_host_intr
* remove unnecessary space character in printk() output (oh, the horror)

[CPUFREQ] Fix several operator precedence bugs.

[CPUFREQ] Makefile reordering issues.

As several cpufreq drivers are late_initcalls now [dependency on acpi/processor.c which is module_init()], we need to use Makefile ordering to assert that

- speedstep-centrino is loaded before acpi [faster: msr instead of io]
- speedstep-centrino, speedstep-ich and acpi are loaded before p4-clockmod [frequency and voltage scaling instead of throttling]

[CPUFREQ] Sync p4-clockmod MSR access across logical CPUs.

As noted and debugged by Rutger Nijlunsing and verified in section 13.15.3 of Intel's IA32 Intel Architecture Software Developer's Manual, Volume 3, the p4-clockmod msr needs to be set to the same value on all logical CPUs ("siblings") to function "properly". This patch implements this, and uses cpufreq_p4_get instead of a local copy in cpufreq_p4_setdc. The latter function now only does the actual setting, all other (notification, verification and set_cpus_allowed()) stuff is done in cpufreq_p4_target.

[CPUFREQ] Fix an invalid comment in speedstep-ich

This driver is for ICH only, not for PIIX4. Thanks to Christian Hilberg for noting this.

[CPUFREQ] Make powernow-k8 work right when ACPI is built as a module.

From: Tony Lindgren

[libata] handle non-data ATAPI commands via interrupt

It's easier to do it this way, than polling, at the moment.

Also, fix a test in ata_scsi_translate that was incorrectly erroring-out non-data commands.

[netdrvr b44] better reset behavior

This patch makes the b44-after-bcm4400 scenario work for me. What was happening is that the broadcom driver sets a "power off MAC" bit, and we didn't remove that when initializing the chip.

Also added some (a bit ugly, I know) logic to clear up the address filter stuff, which is what recent broadcom drivers do...

[PATCH] Quota fix 2

This fixes the problem with recursion into filesystem when inode of quota file needs a page + some other allocation problems. I hope I got the GFP mask setting right..

We need to use "memset_io()" when accessing PCI mapped memory.

A regular "memset()" may be using cache control instructions etc, which is not appropriate for memory-mapped IO.

This also fixes a warning.

[PATCH] alpha: fix GP-load symbol linkage

From: Ivan Kokshaysky

This skips the GP-loading function prologue (two instructions: 8 bytes) on BRSGP linkage correctly, fixing an oops on alpha while loading the aic7xxx driver.

[SERIAL] Remove base_baud default from 8250_pci

Since all boards specify base_baud, the code to default base_baud to the architecture-defined BASE_BAUD is redundant.

Also, defaulting to the architecture-defined BASE_BAUD is wrong since the UARTs on a serial PCI card will be clocked at the same rate no matter what the architecture of the host machine.

[ARM] Move a bunch of symbol exports from armksyms.c

This moves a bunch of EXPORT_SYMBOL() statements from armksyms.c into the file which defines the function/variable such that the exports are localised. This also means we can get rid of the ugly __xxx_to_xxx__is_a_macro in include/asm-arm/arch-*/memory.h

[ARM] Convert execve() to be a function rather than a SWI call.

This eliminates the last SWI user from the kernel - now all SWI calls will only come from userspace.

More importantly, this also allows us to empty the kernel stack when starting userspace programs from kernelspace, thereby ensuring that the user registers always appear at the top of the kernel stack.

Remove drivers/net/auto_irq.c.

No more users of the autoirq_xxx() API existed, so this file is not only unused, it isn't even listed in any makefiles.

[PATCH] Fixed quota recursion fix

This fixes the gfp_mask setting on the quota inode.

[libata] DMADIR support

DMADIR bit is necessary for some PATA->SATA bridges.
These bridges require the OS driver to specify the data xfer direction, for PACKET (a.k.a. scsi) commands.

A reliable DMADIR detection method hasn't yet been developed, and ATAPI is still a WIP, so DMADIR is enabled with an ifdef for now.

[libata] remove redundant use of ATA_QCFLAG_SG in ATAPI packet translation

ata_scsi_translate() sets this flag for all ATA->SCSI translated commands, so it need not be done in atapi_xlat().

The now-removed use in atapi_xlat() was also inconsistent WRT PIO versus DMA.

[libata] SCSI->ATA simulator hacking: INQUIRY command

The SCSI T10 committee is working on a document describing a standard method for translating ATA<->SCSI, since it is being done quite often these days.

Some of the recommendations are reasonable, and we implement two here:

* Mirror the ATA 'removable media' bit into INQUIRY output.
* Change behavior of INQUIRY output field 'product revision' from the libata software version number to the first 4 bytes of the ATA device's firmware revision number.

Rather than cache the firmware revision in struct ata_device, as was/is done with two other strings, I took the opportunity to eliminate the caching of the two other strings, 'vendor' and 'product'. These strings are now retrieved as needed from the IDENTIFY [PACKET] DEVICE info page, since we cache its entire contents.

Retrieving a string from the identify-device page is done via the helper function ata_dev_id_string(), which is now exported.

This patch winds up making struct ata_device 40 bytes smaller, and the libata core gets a bit smaller as well.

[libata] comments and constants

* note a nasty problem with shared interrupts that must be fixed before we turn on certain code paths.
* add a few comments to the READ CAPACITY scsi simulator
* remove a FIXME comment from the TEST UNIT READY scsi simulator
* add constant for ATA command CHECK POWER MODE, and associated "mandatory" power management feature set bit.

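To illustrate the INQUIRY entry above, the sketch below pulls the first 4 characters of the firmware revision out of a cached identify page. It is not the exported ata_dev_id_string() helper, whose signature is not given here; id_string() and ID_FW_REV_OFS are illustrative names. It assumes the identify words are already in CPU byte order, that ASCII strings are packed two characters per 16-bit word with the first character in the high byte, and that the firmware revision starts at word 23, as laid out in the ATA specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offset of the firmware revision in the identify page (assumed). */
#define ID_FW_REV_OFS 23

/*
 * Copy 'len' characters starting at word 'ofs' of the identify page into
 * 's' and NUL-terminate the result. Each 16-bit word holds two
 * characters, the first one in the high byte. 'len' must be even and
 * 's' must have room for len + 1 bytes.
 */
void id_string(const uint16_t *id, char *s, size_t ofs, size_t len)
{
	assert(id != NULL && s != NULL && len % 2 == 0);
	while (len > 0) {
		*s++ = (char)(id[ofs] >> 8);
		*s++ = (char)(id[ofs] & 0xff);
		ofs++;
		len -= 2;
	}
	*s = '\0';
}
```

A caller filling the 4-byte 'product revision' field would request 4 characters at ID_FW_REV_OFS. Reading the string from the cached page on demand is what lets the separate per-device string copies go away.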
[libata] scsi simulator improvements: MODE SENSE, SEEK(6,10), REZERO_UNIT * SEEK(6), SEEK(10), and REZERO_UNIT are no-ops. Unconditionally complete these commands with success. * MODE SENSE caching page work: * correct page length * set bit, if read-ahead is disabled * set bit, if writeback caching is enabled (previously, this bit was never set, even if writeback caching was enabled) * add MODE SENSE r/w error recovery page [libata] replace ATA_QCFLAG_ATAPI with inline helper Detection of an ATAPI taskfile is possible using a simple test on existing information, so there is no need to cache this value in a separate flag (ATA_QCFLAG_ATAPI). Instead, create and use a new helper function is_atapi_taskfile(). USB: fix build error in drivers/usb/serial/console.c Thanks to Adrian Bunk for pointing this out. [PATCH] USB: fix usb-serial serial_open oops No usb serial devices, just compiled in and the system has a USB controller. Unable to handle kernel NULL pointer dereference at virtual address 0000000c printing eip: c046a188 *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010246 (2.6.6-mm3) EIP is at serial_open+0x38/0x170 eax: 00000000 ebx: dc883000 ecx: c0613db8 edx: 00000000 esi: 00000001 edi: 00000000 ebp: dc84cef0 esp: dc84cedc ds: 007b es: 007b ss: 0068 Process serial (pid: 1073, threadinfo=dc84c000 task=ddffca50) Stack: 00000000 de8f4f5c ffffffed 00000100 de8f4f5c dc84cf14 c035a874 090115a0 0bc00000 dc883000 00000000 de8f4f5c 00000001 df8a2dfc dc84cf40 c0171270 dc84c000 00000001 00000000 de8f4f5c dbc75e94 00000000 de8f4f5c dbc75e94 Call Trace: [] show_stack+0x75/0x90 [] show_registers+0x11f/0x180 [] die+0xb6/0x170 [] do_page_fault+0x1e0/0x525 [] error_code+0x2d/0x40 [] tty_open+0x274/0x3b0 [] chrdev_open+0x160/0x340 [] dentry_open+0x156/0x230 [] filp_open+0x4d/0x50 [] sys_open+0x38/0x70 [] sysenter_past_esp+0x52/0x79 Code: de 63 c0 89 55 f0 c7 45 ec 00 00 00 00 85 f6 0f 85 31 01 00 00 c7 
83 8c 09 00 00 00 00 00 00 8b 43 08 e8 3c fe ff ff 31 d2 89 c7 <8a> 50 0c 8b 43 08 29 d0 8b 74 87 18 89 b3 8c 09 00 00 89 5e 04 (gdb) list *serial_open+0x38 0xc046a188 is in serial_open (drivers/usb/serial/usb-serial.c:465). 460 461 /* get the serial object associated with this tty pointer */ 462 serial = usb_serial_get_by_index(tty->index); 463 464 /* set up our port structure making the tty driver remember our port object, and us it */ 465 portNumber = tty->index - serial->minor; 466 port = serial->port[portNumber]; 467 tty->driver_data = port; 468 469 port->tty = tty; [PATCH] USB: fix obsolete header usage in usb storage drivers/scsi/hosts.h is obsolete, is the prefered header nowadays. (hosts.h is just a 1 line wrapper to include it for now anyway) [PATCH] USB: fix CONFIG_PM build issues [PATCH] USB: fix MSEC_TO_JIFFIES in usb code Here are some MSEC_TO_JIFFIES() fixes missed by whoever did it, plus a minor fix to grab root_hub->serialize() during OHCI suspend. (I forgot to cut/paste those lines from resume.) [PATCH] USB: further fix to mdc800 I made a mistake fixing that driver. Here's the fix. Please apply soon. - fix race condition leading to busy waiting [ARM] Fix use of page->count Patch from: Ian Campbell This changes the atomic_t in struct page named count into a private member _count which breaks arch/arm/mm/init.c at line 80 which reads page->count directly in show_mem(). The comments in the above changeset suggest that page_count(page) is precisely equal to the old page->count semantics, even though the semantics of _count are different, so I think the following is correct [ARM] Update atomic.h This re-jigs atomic.h by providing atomic_add_return and atomic_sub_return as other architectures do. This allows us to implement the atomic ops that test the new value without having to write the underlying atomic operation in various forms. [ARM] Add linux/module.h include for ioremap. [ARM] Fix bogus variable name in dev_dbg() call in dmabounce code. 
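The "[ARM] Update atomic.h" entry above says that atomic_add_return and atomic_sub_return let the value-testing operations be written once on top of them. A minimal userspace sketch of that layering, using C11 atomics rather than ARM assembly; the `sketch_` type and functions are hypothetical and only mirror the kernel names for readability.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } sketch_atomic_t;

/* The two primitives: perform the operation and return the NEW value.
 * atomic_fetch_add/sub return the old value, so adjust by i. */
static int sketch_atomic_add_return(int i, sketch_atomic_t *v)
{
    return atomic_fetch_add(&v->counter, i) + i;
}

static int sketch_atomic_sub_return(int i, sketch_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, i) - i;
}

/* Everything that tests the new value is a one-liner on top of them. */
static bool sketch_atomic_dec_and_test(sketch_atomic_t *v)
{
    return sketch_atomic_sub_return(1, v) == 0;
}

static bool sketch_atomic_inc_and_test(sketch_atomic_t *v)
{
    return sketch_atomic_add_return(1, v) == 0;
}

static bool sketch_atomic_add_negative(int i, sketch_atomic_t *v)
{
    return sketch_atomic_add_return(i, v) < 0;
}
```

Only the two primitives would need an architecture-specific implementation in this arrangement; the derived helpers stay portable.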
[PATCH] USB: Aiptek.c Driver patch [PATCH] USB: RNDIS (and CDC) filter flag handling This should fix the problem David Meggy found, where RNDIS was setting the OID_GEN_CURRENT_PACKET_FILTER state incorrectly. It's the same issue Andrew Morton noticed a while back, for that matter, but with more than just a "now compiles on 64 bit" fix. Basically the code needs to interpret 32 bits provided in the request from the (Windows) host, rather than 8 bits of other memory that's got some irrelevant value. The fix is just to save the 32 bits. I did the same thing with the CDC Ethernet filter, which should eventually be used the same way: to limit what packets get sent to the host. Also defined a couple more of the CDC requests. [PATCH] USB: ethernet/rndis gadget address params This resolves a FIXME by adding module parameters that can be used to provide stable (vs random) addresses, and gets rid of a runtime error from obsolete module parameter usage in the RNDIS code. The stable ethernet addresses are nice to hosts, which will normally want to save them away in config databases. For example, without stable addresses Windows XP will end up recording quite a lot of RNDIS devices. [PATCH] USB: new delay helper safe wrt waitqueues this is a new waiting helper safe even if we are left on a waitqueue. This version addresses Alan's concerns about ifdefs. Please apply. - add delay helper that is safe even if we are still on another waitqueue [PATCH] USB: purge wait_ms from core this makes the core use the new safe waiting helper. - remove wait_ms from hub driver USB: fix dumb compile error in aiptek driver Doesn't anyone ever actually build the patches they send me... USB: fix up formatting issues with aiptek driver PCI Hotplug: clean up a lot of global symbols that do not need to be. [libata] polish DocBook docs a bit Mainly involved fixing a great many docproc warnings, by filling in missing documentation in the source code. 
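The RNDIS filter entry above boils down to saving all 32 bits of the filter value supplied by the host instead of reading 8 bits from an unrelated location. A minimal sketch of that idea follows; the structure and function names are hypothetical, the real gadget driver's data structures differ, and the little-endian wire format is an assumption about RNDIS rather than something stated in this changelog.

```c
#include <stdint.h>

/* Hypothetical device state; not the gadget driver's real layout. */
struct sketch_rndis_state {
    uint32_t packet_filter;  /* full 32-bit filter flags from the host */
};

/* Decode a little-endian 32-bit value byte by byte, so the result does
 * not depend on the byte order or word size of the machine running it. */
static uint32_t sketch_get_le32(const uint8_t *buf)
{
    return (uint32_t)buf[0]
         | (uint32_t)buf[1] << 8
         | (uint32_t)buf[2] << 16
         | (uint32_t)buf[3] << 24;
}

/* Save the whole filter word from the request payload. */
static void sketch_set_packet_filter(struct sketch_rndis_state *st,
                                     const uint8_t *payload)
{
    st->packet_filter = sketch_get_le32(payload);
}
```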
switch to mempools for cifs request buf and mid allocation to avoid deadlocks in out of memory conditions free mempool in correct order [PATCH] ppc64: fix non-SMP build break arch/ppc64/lib/locks.c was recently added by Paulus' lock rewrite. It's always compiled, which breaks non-SMP builds. Below patch makes it depend on CONFIG_SMP. [ACPI] revert button module unload fix (2281) Cset exclude: len.brown@intel.com|ChangeSet|20040503042906|02093 Cset exclude: len.brown@intel.com|ChangeSet|20040428081825|02121 Cset exclude: len.brown@intel.com[lenb]|ChangeSet|20040428071221|03892 [ACPI] remove /proc files before unloading modules from Sau Dan Lee, Zhenyu Wang http://bugzilla.kernel.org/show_bug.cgi?id=2705 JFS: error in __get_metapage caused by invalid size from ea_get [ARM] Initialise an uninitialised spinlock. Resolve merge conflicts. Add msleep function to the kernel core to prevent duplication. Delete block/carmel.c's version of msleep() Remove libata's version of msleep() USB: remove usb_uninterruptible_sleep_ms() now that we have msleep() USB: clean up usages of wait_ms() now that we have msleep() USB: remove ehci and ohci's private sleep function and use msleep() instead. USB: remove wait_ms() from usb.h as it's no longer needed. I2C: change i2c_delay() to use msleep() instead. Input: remove wait_ms() in place of using msleep() Some more misc wait_ms() conversions to use msleep() [PATCH] e100: fix for incoherent arches * Changed mapping on Rx skb to bi-directional. skb->data holds both the RFD structure and the packet data, and the RFD is read/written by HW. Issue found on Xscale HW that doesn't handle cache syncs automatically. Other changes in patch are whitespace/spelling. [PATCH] e100: big-endian fix for ethtool -e/E * Reads/writes from/to eeprom using ethtool weren't working right on big-endian. Now they are. [PATCH] e100: netdev->priv to netdev_priv() * Convert all netdev->priv references to the fancy new netdev_priv().
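For readers unfamiliar with the accessor named in the last e100 entry, the sketch below shows the general pattern of reaching driver-private data through a function instead of a stored pointer. It assumes the private area is allocated in the same block as the device structure, at an aligned offset. Every name here is hypothetical, and the 32-byte alignment is an assumption for illustration; the changelog itself does not describe how netdev_priv() is implemented.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_ALIGN 32  /* assumed alignment of the private area */

/* Hypothetical, heavily reduced device structure. */
struct sketch_netdev {
    char name[16];
    int  mtu;
};

/* Offset of the private area: struct size rounded up to SKETCH_ALIGN. */
static size_t sketch_priv_offset(void)
{
    return (sizeof(struct sketch_netdev) + SKETCH_ALIGN - 1)
           & ~(size_t)(SKETCH_ALIGN - 1);
}

/* One zeroed allocation holds the device and the driver's private data. */
static struct sketch_netdev *sketch_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, sketch_priv_offset() + sizeof_priv);
}

/* Accessor: callers never touch a raw ->priv pointer. */
static void *sketch_netdev_priv(struct sketch_netdev *dev)
{
    return (char *)dev + sketch_priv_offset();
}
```

With an accessor like this, the layout behind it can change without touching every driver that uses it.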
do not recurse into the filesystem allocating sk buffs [PATCH] ppc32: IBM PowerPC 750GX Support From: Benjamin Herrenschmidt From: Bryan Rittmeyer This patch adds preliminary support for the IBM PowerPC 750GX. In summary this part is a PPC750FX ramped to 1 GHz with a 1MB 4-way L2 and more advanced I/O pipelining. It is beginning to appear in embedded systems and was rumored to be under evaluation inside Apple. Tested on PVR 70020101; please merge. http://www-3.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_750GX_Microprocessor [PATCH] ppc32: some whitespace fixes From: Paul Mackerras This patch does nothing but fix up whitespace in three files in arch/ppc. It deletes trailing blanks and tabs in several places and joins two lines that didn't need to be split. [PATCH] ppc32: Handle altivec assist exception properly From: Paul Mackerras On machines with Altivec (i.e. G4 and G5 processors), the altivec floating-point instructions can operate in two modes: one where denormalized inputs or outputs are truncated to zero, and one where they aren't. In the latter mode the processor can take an exception when it encounters denormalized floating-point inputs or outputs rather than dealing with them in hardware. This patch adds code to deal properly with the exception, by emulating the instruction that caused the exception. Previously the kernel just switched the altivec unit into the truncate-to-zero mode, which works but is a bit gross. Fortunately there are only a limited set of altivec instructions which can generate the assist exception, so we don't have to emulate the whole altivec instruction set. This patch also makes sure that we always have a handler for the altivec unavailable exception. Without this, if you run a kernel that is not configured for altivec support on a machine with altivec, it works fine until a user process tries to execute an altivec instruction. At that point the kernel thinks it has taken an unknown exception and panics. 
With this patch it sends a SIGILL to the process instead. [PATCH] ppc32: update defconfigs From: Paul Mackerras This patch updates several of the ppc32 defconfigs. [PATCH] ppc32: fix MOD_{INC,DEC}_USE_COUNT abuse in 4xx/8xx code From: Christoph Hellwig Note that most of the drivers are in a compiling shape currently, but I want to get rid of the last callers of those. (acked by Tom Rini) [PATCH] ppc32: update Motorola LoPEC and Sandpoint defconfigs From: Tom Rini Update the Motorola LoPEC and Sandpoint config files. [PATCH] ppc32: remove 'mem_pieces_append' From: Tom Rini From: Fabian.Frederick Remove mem_pieces_append, it is never used. [PATCH] ppc32: some fixes for 'make O=...' From: Tom Rini From: Geoffrey LEVAND Fix some of the problems with 'make O=...' Ack'd by Sam Ravnborg. [PATCH] ppc32: Fix ocp_register_driver() return value From: Matt Porter Fixes return value from ocp_register_driver(). [PATCH] ppc32: PPC4xx fixes From: Matt Porter Fixes 440GX UIC code, updates 440GX PVRs, and fixes a typo. [PATCH] ppc64: fix rtas error log length From: Anton Blanchard From: Jake Moilanen Fix for not vmalloc'n space for the sequence number in rtas_log_buf. [PATCH] ppc64: fix rtas error log location From: Anton Blanchard From: Nathan Lynch Somehow we've been placing the rtas error_log file at /proc/ppc64/error_log, which breaks at least one application I know of. It is supposed to be at /proc/ppc64/rtas/error_log (this is the 2.4 behavior). [PATCH] ppc64: add stack overflow detection From: Anton Blanchard I only got iseries first time around. Add CONFIG_DEBUG_STACKOVERFLOW for pseries/pmac too. [PATCH] ppc64: fix error return in mf_proc From: Anton Blanchard This patch was submitted by Olaf Hering to fix mf_proc.c where it does not return error values correctly. [PATCH] ppc64: fix rtas flash driver From: Anton Blanchard From: John Rose Please apply the following, which fixes a typo that prevents the creation of the manage_flash /proc file. 
[PATCH] ppc64: 4GB firmware flash fix From: Anton Blanchard From: Jake Moilanen We want to make sure flash list is above 4 gigs, not 4 megs. [PATCH] ppc64: oprofile fixes From: Anton Blanchard - support newer 970 and POWER5 chips. - use new SIHV/SIPR bits on POWER5. - fix oops at shutdown. [PATCH] ppc64: Make PMC6 spin From: Anton Blanchard Make PMC6 spin on POWER5 boxes. [PATCH] ppc64: correct return code in iommu_alloc_consistent From: Anton Blanchard From: Olof Johansson iommu_alloc_consistent should return NULL on failure. [PATCH] ppc64: more required exports From: Anton Blanchard IBM veth uses these symbols. [PATCH] ppc64: add device tree pointer for vio devices From: Anton Blanchard From: Olaf Hering Provide pointer into the device-tree for vio devices. [PATCH] md: Fix user-after-free bug in multipath From: NeilBrown If mddev->thread is non-null later, it gets used. [PATCH] i2o: reorder of fields in i2o_cmd_passthru structure From: Markus Lidel I have made a mistake in the kernel header i2o-dev.h. All structures there begin with the "iop" first, but in my structure the order is reversed. [PATCH] Fix incorrect PT_FPSCR definition Bryan Rosenburg pointed out that the definition of PT_FPSCR in include/asm-ppc64/ptrace.h is wrong. The patch below fixes it. [PATCH] add default ARM/ARM26 IDE host driver Add drivers/ide/arm/ide_arm.c for simple default IDE interfaces and clean obsolete ide_init_default_hwifs() implementations in asm-arm/arch-{cl7500,rpc,shark}/ide.h and asm-arm26/ide.h. This allows us to kill ide_init_default_hwifs() completely in the next patch (because lh7a40x and sa1100 are broken). Cross-compile tested on ARM. 
[PATCH] ARM/ARM26 IDE cleanups - clear hwif->hw in setup-pci.c before using it - fix arch/arm/Kconfig to allow IDE only on platforms supporting it - introduce IDE_ARCH_OBSOLETE_INIT and ide_default_io_ctl() so we can use generic ide_init_hwif_ports() and kill no longer needed (leave broken lh7a40x and sa1100 versions) Cross-compile tested on ARM. [PATCH] Fix reiserfs inode size update race reiserfs_file_write unlocks the pages it operated on before updating i_size. This can lead to races with writepage, which checks i_size when deciding how much of the file to zero out. This patch also replaces SetPageReferenced with mark_page_accessed() in reiserfs_file_write This was verified to fix the BitKeeper data corruption problems that Steven Cole has been debugging, where concurrent writes to a file and writebacks to disk would cause zeroes in the file when CONFIG_PREEMPT was enabled. [PATCH] H8/300 IDE support update From: Yoshinori Sato With minor fixes from me [PATCH] ide.c: use less stack in ide_unregister() From: Chris Wedgwood Separate function, cruft removed and old_hwif renamed to something less confusing. Fixed mismerge of cpufreq and pcmcia updates JFS: Don't return -EPERM for system xattrs. Also, we don't need to call jfs_permission directly anymore, just permission. Thanks Andreas Gruenbacher Shorten some PCI device names Avoid warnings about truncating them when building the name database. JFS: Implement multiple commit threads For scalability, jfs now allows specifying the number of commit threads with the module parameter commit_threads=. It defaults to the number of processors. You can also change the number of txBlocks and txLocks allocated with the nTxBlock and nTxLock module parameters. "modinfo jfs.ko" for specifics. [PATCH] alpha fp-emu vs module refcounting From: Ivan Kokshaysky This allows building the math-emu code as a module only when CONFIG_SMP is not set.
The fp trap handler cannot be preempted on a single-CPU (as CONFIG_PREEMPT is not going to be supported on alpha), so the module can be safely unloaded at any time. [PATCH] fealnx smp bugfix spinlock_t *lp = &((struct netdev_private *)dev->priv)->lock; doesn't mix with spin_unlock_irqrestore(&lp, flags); JFS: set GFP_NOFS to avoid recursing back into file system code [PATCH] ia64: fix 1-CPU PMC/PMD dump for /proc/perfmon when PFM_DEBUG is on [PATCH] system_state splitup Split the system_state state `SYSTEM_SHUTDOWN' into SYSTEM_HALT, SYSTEM_POWER_OFF and SYSTEM_RESTART and export system_state to modules. This allows driver shutdown routines to know why they are being shut down. The IDE subsystem wants this so that it knows to not spin the disks down across a reboot. [PATCH] blk_run_page() race fix blk_run_page() is incorrectly using page->mapping, which makes it racy against removal from swapcache. Make block_sync_page() use page_mapping(), and remove blk_run_page(), which only had one caller. [PATCH] put module license in swim3.c From: Paul Mackerras This patch adds module tags for the swim3 (macintosh floppy) driver. [PATCH] PPC32: Get full register set on bad kernel accesses From: Paul Mackerras At present on ppc32, if the kernel accesses a bad address and causes an oops, or drops into the xmon debugger, we only have the contents of the volatile registers available to print. The reason is that we only save the volatile registers on entry for a page fault. This patch restructures the code a bit so that if do_page_fault() determines that the page fault is caused by a bad kernel access, it returns to the caller, which then saves the full register set into the exception frame before calling bad_page_fault(). This way we get the full set of registers printed in the oops message. [PATCH] PPC64: iSeries virtual ethernet transmit errors From: Stephen Rothwell This patch stops the iseries_veth driver trying to send every packet to too many logical partitions.
Consequently, the number of transmit errors falls to (about) zero from a very large number. This should also improve performance a bit as the driver is no longer doing 31 extra skb_clone()s and skb_free()s for each packet. [PATCH] PPC64 iSeries virtual ethernet locking fix From: Olaf Hering Missing spin_unlock in the error path. [PATCH] PPC32: Minor OCP cleanups From: Matt Porter Fixes a warning and a printk format in OCP. [PATCH] Mark CONFIG_MAC_SERIAL (drivers/macintosh/macserial.c) as broken From: Arthur Othieno CONFIG_MAC_SERIAL (drivers/macintosh/macserial.c) is marked obsolete and currently doesn't build. benh says: "I though build got fixed recently ... well, anyway, the driver is indeed obsolete, there's a new one in drivers/serial now." [PATCH] dpt_i2o warning fixes drivers/scsi/dpt_i2o.c: In function `adpt_queue': drivers/scsi/dpt_i2o.c:442: warning: use of cast expressions as lvalues is deprecated drivers/scsi/dpt_i2o.c: In function `adpt_scsi_register': drivers/scsi/dpt_i2o.c:2213: warning: use of cast expressions as lvalues is deprecated [PATCH] speed up readahead for seeky loads From: Ram Pai Currently the readahead code tends to read one more page than it should with seeky database-style loads. This was to prevent bogus readahead triggering when we step into the last page of the current window. The patch removes that workaround and fixes up the suboptimal logic instead. wrt the "rounding errors" mentioned in this patch, Ram provided the following description: Say the i/o size is 20 pages. Our algorithm starts by a initial average i/o size of 'ra_pages/2' which is mostly say 16. Now every time we take a average, the 'average' progresses as follows (16+20)/2=18 (18+20)/2=19 (19+20)/2=19 (19+20)/2=19..... 
and the rounding error makes it never touch 20. Benchmarking sitrep: IOZONE run on a nfs mounted filesystem: client machine 2proc, 733MHz, 2GB memory server machine 8proc, 700Mhz, 8GB memory ./iozone -c -t1 -s 4096m -r 128k [PATCH] security: add disable param to capabilities module From: Chris Wright Add disable param to capabilities module. Similar to the SELinux param for disabling at boot time. This allows vendors to ship single binary image with capabilities compiled statically, and disable it if they provide another security model compiled as module. [PATCH] fix radio-cadet `readq' namespace clash It conflicts with the readq() I/O function. [PATCH] Remove hardcoded offsets from i386 asm From: Brian Gerst Generate offsets for thread_info, cpuinfo_x86, and a few others instead of hardcoding them. [PATCH] Fix madvise length checking Fix http://bugme.osdl.org/show_bug.cgi?id=2710. When the user passed madvise a length of -1 through -4095, madvise blindly rounds this up to 0 then "succeeds". [PATCH] dentry size tuning Experimenting with various values of DENTRY_STORAGE

dentry size    objs/slab    dentry size * objs/slab    inline string
148            26           3848                       32
152            26           3952                       36
156            25           3900                       40
160            24           4000                       44

We're currently at 160. The patch fairly arbitrarily takes it down to 152, so we can fit a 35-char name into the inline part of the dentry. Also, go back to the old way of sizing d_iname so that any arch-specific compiler-forced alignments are honoured. [PATCH] Fix arithmetic in shrink_zone() From: Nick Piggin If the zone has a very small number of inactive pages, local variable `ratio' can be huge and we do way too much scanning. So much so that Ingo hit an NMI watchdog expiry, although that was because the zone would have had a single refcount-zero page in it, and that logic recently got fixed up via get_page_testone(). Nick's patch simply puts a sane-looking upper bound on the number of pages which we'll scan in this round.
It fixes another failure case: if the inactive list becomes very small compared to the size of the active list, active list scanning (and therefore inactive list refilling) also becomes small. This patch causes inactive list scanning to be keyed off the size of the active+inactive lists. It has the plus of hiding active and inactive balancing implementation from the higher level scanning code. It will slightly change other aspects of scanning behaviour, but probably not significantly. [PATCH] slab: enable runtime cache line size on i386 From: Manfred Spraul the attached patch switches the SLAB_HWCACHE_ALIGN alignment from the compile time L1 cache line size to the runtime detected value for i386. x86-64 already uses the runtime detection. [PATCH] slab: allow arch override for kmem_bufctl_t From: Manfred Spraul The slab allocator keeps track of the free objects in a slab with a linked list of integers (typedef'ed to kmem_bufctl_t). Right now unsigned int is used for kmem_bufctl_t, i.e. 4 bytes per-object overhead. The attached patch implements a per-arch definition for this type: Theoretically, unsigned short is sufficient for kmem_bufctl_t and this would reduce the per-object overhead to 2 bytes. But some archs cannot operate on 16-bit values efficiently, thus it's not possible to switch everyone to ushort. The chosen types are a result of discussions with the various arch maintainers. [PATCH] slab: add kmem_cache_alloc_node From: Manfred Spraul The attached patch adds a simple kmem_cache_alloc_node function: allocate memory on a given node. The function is intended for cpu bound structures. It's used for alloc_percpu and for the slab-internal per-cpu structures. Jack Steiner reported a ~3% performance increase for AIM7 on a 64-way Itanium 2.
Port maintainers: The patch could cause problems if CPU_UP_PREPARE is called for a cpu on a node before the corresponding memory is attached and/or if alloc_pages_node doesn't fall back to memory from another node if there is no memory in the requested node. I think no one does that, but I'm not sure. [PATCH] Work around gcc 3.3.3-hammer sched miscompilation on x86-64 From: Andi Kleen The new domain scheduler got miscompiled on x86-64 with gcc 3.3.3-hammer, which is shipping with some distributions. The kernel deadlocks eventually under light stress on SMP systems with the right options. After some experiments it seems this simple change avoids the miscompilation. It also doesn't pessimize the code unduly for other architectures. [PATCH] BeFS MAINTAINERS update From: "Sergey S. Kostyliov" [PATCH] Fix for Makefiles to get KBUILD_OUTPUT working From: Mathieu Chouquet-Stringer If you use O=/someotherdir or KBUILD_OUTPUT=/someotherdir on the following architectures: alpha, mips, sh and cris, the build process is probably going to fail at one point or another, depending on the target you used, because make can't find scripts/Makefile.build or scripts/Makefile.clean. The following patch fixes this, I grepped the whole tree and these four were the only "offenders" I found. [PATCH] reserve syscall slots for kexec From: "Randy.Dunlap" kexec is a fairly major and popular feature. People are shipping it in products, although it is not known if Linux distributors plan to ship it. The patch reserves the kexec syscall slots to pin the ABI down for everyone. - add kexec_load prototype to syscalls.h - add LINUX_REBOOT_CMD_KEXEC to reboot.h - add kexec_load syscall for ia32, ia64, x86_64, ppc32, ppc64 [PATCH] fore200e.c warning fix drivers/atm/fore200e.c: In function `fore200e_close': drivers/atm/fore200e.c:1659: warning: use of cast expressions as lvalues is deprecated [PATCH] Remove blk_run_queues() remnants It no longer exists.
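The fore200e entry above (like the dpt_i2o one earlier) quotes gcc's "use of cast expressions as lvalues is deprecated" warning. The changelog does not show the affected lines, so the following is only a generic sketch of what the warning means and how such code is usually rewritten; all names are hypothetical and not taken from either driver.

```c
#include <stddef.h>

/* Hypothetical payload type, for illustration only. */
struct sketch_item {
    int id;
};

/* Deprecated form (a GCC extension that newer compilers reject):
 *     (char *)cursor += n;
 * Portable form: do the arithmetic in the cast type and hand the
 * result back as a value instead of assigning through the cast. */
static void *sketch_advance(void *cursor, size_t n)
{
    return (char *)cursor + n;
}

/* Deprecated form:
 *     (struct sketch_item *)*slot = item;
 * Portable form: leave the lvalue alone and cast the right-hand side
 * to the lvalue's own type. */
static void sketch_store(void **slot, struct sketch_item *item)
{
    *slot = (void *)item;
}
```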
[PATCH] replace MOD_INC_USE_COUNT in cyber2000fb From: Christoph Hellwig This driver is unloadable for the pci case, but not if vlb cards are found so we can't use the module_exit removal to lock it into memory. Replace the MOD_INC_USE_COUNT with __module_get in its module_init routine. [PATCH] don't mention MOD_INC_USE_COUNT/MOD_DEC_USE_COUNT in docs From: Christoph Hellwig If we want new drivers to not use obsolete interfaces we're better off not mentioning it in the documentation. [PATCH] mark the `planb' video driver broken From: Christoph Hellwig This one is missing updates from the v4l1 interfaces in 2.4 to the 2.6ish v4l2 and thus doesn't compile. While we're at it also remove the MOD_{INC,DEC}_USE_COUNT calls in it that were bogus even in 2.4 to avoid false positives in grep. [PATCH] Subject: [PATCH] kbuild SUBDIRS="more/ than/ one/" From: Andreas Gruenbacher Here is a patch that re-adds support for more than one directory in SUBDIRS. We have a number of packages that use this. The FORCE dependency of crmodverdir seems unnecessary; removing. (acked by Sam) [PATCH] correct ps2esdi module parm name From: "Randy.Dunlap" The module parameter name is incorrect (looks like a thinko). [PATCH] SELinux: fix error handling in selinuxfs From: Stephen Smalley This patch against 2.6.6 fixes error handling for two out-of-memory conditions in selinuxfs, avoiding potential deadlock due to returning without releasing a semaphore. The patch was submitted by Karl MacMillan of Tresys. [PATCH] Quota fix 3 - quota file corruption From: Jan Kara This patch fixes possible quota files corruption which could happen when root did not have any inodes&space allocated. Originally this could not happen as structure would not be written to disk in that case but with journalled quota we need to write even all-zero structure. The fix is not very nice but change of the format on disk is probably worse (I made a mistake with not including the usage-bitmaps into format :().
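The selinuxfs entry above describes a classic bug shape: an out-of-memory path that returns while still holding a semaphore. The sketch below shows the usual single-exit remedy in userspace C. The counter-based semaphore, the allocator wrapper with its `simulate_oom` switch, and every function name are hypothetical; none of this is the actual selinuxfs code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel semaphore: just counts holders. */
struct sketch_sem {
    int held;
};

static void sketch_down(struct sketch_sem *s) { s->held++; }
static void sketch_up(struct sketch_sem *s)   { s->held--; }

/* Hypothetical allocator wrapper so the failure path can be exercised. */
static void *sketch_alloc(size_t size, int simulate_oom)
{
    return simulate_oom ? NULL : malloc(size);
}

static int sketch_write_op(struct sketch_sem *sem, size_t size,
                           int simulate_oom)
{
    int rc = 0;
    char *buf;

    sketch_down(sem);

    buf = sketch_alloc(size, simulate_oom);
    if (!buf) {
        rc = -ENOMEM;
        goto out;  /* a bare "return -ENOMEM" here would keep sem held */
    }

    /* ... work on buf while holding the semaphore ... */
    free(buf);
out:
    sketch_up(sem);  /* every exit path releases the semaphore */
    return rc;
}
```

Routing every failure through one label keeps the release next to the acquire, which makes a missed unlock easier to spot in review.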
[PATCH] SubmittingDrivers completeness From: Jonathan Corbet I noticed a patch went in to Documentation/SubmittingDrivers which tweaked the URL for KernelTraffic. Here's a self-serving patch which makes that section more complete; to be fair, I added two other sites too. Just in case it's useful. [PATCH] EDD: remove unused SCSI header files From: Matt Domsch EDD: Remove no longer needed SCSI header file inclusion. Thanks to ArjanV for reminding me. [PATCH] efivars: add MODULE_VERSION, remove unnecessary check in exit From: Matt Domsch * Adds MODULE_VERSION * Remove check for efi_enabled in efivars_exit() - we aborted module load at init based on this already. [PATCH] do_generic_mapping_read() cleanup We just tested the page's uptodateness, no point in doing it again. [PATCH] drivers/cdrom/aztcd.c warning fix. From: "Luiz Fernando N. Capitulino" drivers/cdrom/azctd.c:379: warning: `pa_ok' defined but not used [PATCH] initialise mca_bus_type even if !MCA_bus From: "Randy.Dunlap" We need to call mca_system_init() to register MCA bus struct, otherwise find_mca_adapter() oopses with a NULL ptr dereference. Fixes this oops reported last week: http://marc.theaimsgroup.com/?l=linux-kernel&m=108455738606747&w=2 Thanks to James Bottomley for pointing this out. [PATCH] kNFSd: Use correct _bh locking on sv_lock. From: NeilBrown With the _bh, we can deadlock. [PATCH] kNFSd: Make sure CACHE_NEGATIVE is cleared when a cache entry is updated. From: NeilBrown This is important for update-in-place caches which may change from being negative to positive. Thanks to "J. Bruce Fields" and Olaf Kirch [PATCH] kNFSd: Allow larger writes to sunrpc/svc caches. From: NeilBrown We currently serialize all writes to these caches with queue_io_sem, so we only needed one buffer. There is some need for larger-than-one-page writes, so we can just statically allocate a buffer. [PATCH] kNFSd: Change fh_compose to NOT consume a reference to the dentry.
From: NeilBrown fh_compose currently consumes a reference to the dentry but not the export point. This is both inconsistent and confusing. It is better if a routine like this doesn't consume reference points, so with this patch, it doesn't. This fixes a couple of very subtle and unusual reference counting errors. [PATCH] kNFSd: Protect reference to exp across calls to nfsd_cross_mnt From: NeilBrown nfsd_cross_mnt can release the reference to the passed svc_export structure when it returns a different svc_export structure. So we need to make sure we have a counted reference before, and drop the reference afterwards. [PATCH] kNFSd: Fix race conditions in idmapper From: NeilBrown From: "J. Bruce Fields" Also fix leaks on error; split up code a bit to make it easier to verify correctness. [PATCH] kNFSd: Improve idmapper behaviour on failure. From: NeilBrown From: "J. Bruce Fields" Slightly better behavior on failed mapping (which may happen either because idmapd is not running, or because it has told us it doesn't know the mapping.): on name->id (setattr), return BADNAME. (I used ESRCH to communicate BADNAME, just because it was the first error in include/asm-generic/errno-base.h that had something to do with nonexistence of something, and that we weren't already using.) id->name (getattr), return a string representation of the numerical id. This is probably useless to the client, especially since we're unlikely to accept such a string on a setattr, but perhaps some client will find it mildly helpful. [PATCH] kNFSd: Reduce timeout when waiting for idmapper userspace daemon. From: NeilBrown From: "J. Bruce Fields" 1 second should be plenty of time; if we're going to take longer than that it's probably better just to return NFS4ERR_DELAY and let the client retry anyway. [PATCH] kNFSd: Remove check on number of threads waiting on user-space. From: NeilBrown From: "J.
Bruce Fields" Currently we are counting the number of threads already asleep and returning an immediate NFS4ERR_DELAY (==JUKEBOX) error if more than half are already asleep. This patch removes that logic, so instead we only return NFS4ERR_DELAY if an upcall times out (if it takes more than a second to return). With the thread counting there is the risk that even when all the relevant subsystems are responsive, the client may still see occasional NFS4ERR_DELAY returns just because, by coincidence, several upcalls were initiated at the same time. I expect clients will delay several seconds before retrying after NFS4ERR_DELAY, so this will be quite noticeable to users. Sporadic long delays like this are likely to lead users to suspect a problem somewhere, when in fact there is none. The current scheme ensures that we can still process requests not depending on upcalls, even when all threads would otherwise be tied up waiting on upcalls. However, this is not something that should happen under normal circumstances; if a server spends a significant portion of its time with all threads waiting for upcalls, this is a sign that something is seriously wrong. In such a circumstance (e.g., an ldap server dies), we can, at least, bound the waiting time to a second without the need for counting threads. In short, removing the thread-counting will allow us to behave predictably when things are working, while still allowing some progress when they don't. It would be a worthwhile project to measure the amount of time threads spend waiting for upcalls (or for reads, for that matter); if a significant portion of the time they spend handling requests is spent sleeping, then there's an opportunity to improve nfsd performance: if we can break the one-to-one mapping between requests and threads, then we can lower the number of threads required to keep the nfs server busy.
However, both the currently available options for doing this are problematic: returning JUKEBOX/DELAY errors at random times will lead to unpredictable performance, and saving a copy of the request to be processed from scratch again later is wasteful and makes it difficult to provide correct semantics, especially in the NFSv4 case. So for now I believe waits with short timeouts are the best option. [PATCH] kNFSd: Add a warning when upcalls fail, From: NeilBrown From: "J. Bruce Fields" To help the user diagnose problems caused by user-level daemons not running. [PATCH] svc_recv() fix From: "J. Bruce Fields" svc_recv may call svc_sock_release before rqstp->rq_res is initialized. [PATCH] VFS cache sizing fix for small machines From: Matt Mackall Doing the algebra:
c = (a - b) * 3/2
a' = a - c = a - 3/2(a - b) = (2a - 3a + 3b)/2 = (3b - a)/2
a' >= 0
3b - a >= 0
3b >= a
b >= a/3
nr_free_pages() >= mempages/3
We can indeed get into trouble if we try to load a large kernel on a very small box (ie kernel reserves more than 2/3 of usable memory). Surprisingly I haven't hit this, but here's a fix. [PATCH] vga16fb-fix The recent ARM-specific fix broke ia32. Hopefully the ARM team can find an arch-specific implementation of VGA_MAP_MEM() which makes it work. [PATCH] Fix overzealous use of online cpu iterators From: Rusty Russell The IA64 hotplug CPU merge seems to have included some core changes: in particular the recalc_bh_state() needs to sum for all (including offline) cpus, since we don't empty the counters on CPU down. The totals printed by /proc/stat (the first loop) should include offline cpus, too (apparently printing out the per-cpu lines for offline cpus confuses top). [PATCH] use-before-uninitialized value in ext3(2)_find_ goal From: Mingming Cao There is an uninitialized goal value being referenced in both ext3 and ext2 find goal block functions (ext3_find_goal() and ext2_find_goal()).
In the non-sequential write case, these functions check the goal value (non zero) before calling ext3(2)_find_near() to find the goal block to allocate. Since the goal value is uninitialized (non zero), ext3(2)_find_near() is never called in the non-sequential write case, thus ext3(2)_find_goal() fails to guide a goal block in the random write case. ext3(2)_new_block() takes the junk goal value and will turn it to goal 0 since it's normally beyond the filesystem block number limit. The fix is trivial. [PATCH] s390: core s390 From: Martin Schwidefsky s390 core changes: - Fix system call trace / audit interface. - Fix find_first_bit / find_next_bit inline assembly constraints. [PATCH] s390: dasd driver From: Martin Schwidefsky dasd device driver changes: - Reset pointer from ccw device to dasd_devmap on device removal. [PATCH] s390: zfcp host adapater From: Martin Schwidefsky zfcp host adapter change: - Remove misplaced dot in error message. - Remove unused performance statistics code. [PATCH] s390: network driver From: Martin Schwidefsky Network driver changes: - iucv: Make grab_param function SMP safe. - lcs: Fix null-pointer dereference after unsuccessful set_online. - qeth: Fix kmalloc flags in qeth_alloc_reply. - qeth: Show broadcast capability also in route4/6 sysfs attributes. - qeth: Remove debug code. - qeth: Add option to qetharp user space interface to strip unused fields from query arp records. - qeth: Add shortcut in outbound path for HiperSockets. - qeth: Add more info to qeth_perf_stats. - qeth: Add support for direct SNMP interface to OSA express cards. [PATCH] sir_dev locking fix From: Martin Diehl There was a spin_unlock missing in the raw mode tx-completion path. Probably it slipped through because the raw mode stuff is never reached with my Actisys hardware. [PATCH] s390 atomic_inc_and_test() fix From: David Mosberger [PATCH] raid locking fix.
From: Neil Brown Fix bug #2661 Raid currently calls ->unplug_fn under spin_lock_irqsave(), but unplug_fns can sleep. After a morning of scratching my head and trying to come up with something that does less locking, the following is the best I can come up with. I'm not proud of it but it should work. If I move "nr_pending" out of rdev into the per-personality structures (e.g. mirror_info), and if I had "atomic_inc_if_nonzero" I could do without locking so much, but random atomic* functions don't seem trivial. [PATCH] ucLinux: return 0 on success from do_munmap() for nommu version Added a nommu version of sysctl_max_map_count. Fix return value from do_munmap(), it should return 0 on success not EINVAL. [PATCH] m68knommu: fix cache flush for 5407 ColdFire CPU Fix the cache flushing code for the ColdFire 5407 CPU. The cpushl instruction arguments are wrong, causing it to miss some cache lines. [PATCH] m68knommu: big clean/fix of Dragonball frame buffer driver Big cleanup of the Motorola DragonBall 68x328 frame buffer. It was quite broken before. Patch from Georges Menie. [PATCH] m68knommu: add find_next_bit() to bitops.h A couple of fixups for asm-m68knommu/bitops.h: . re-order definition of fls(), to be outside __KERNEL__ . add code for find_next_bit() [PATCH] m68knommu: add init points for Dragonball frame buffer driver Create init points for the Motorola Dragonball 68x328 frame buffer driver. Patch from Georges Menie [PATCH] m68knommu: un-define IO instructions when using smc driver We should un-define all the x86 style IO routines when redefining local versions. [PATCH] m68knommu: remove ColdFire specific atomic functions Remove ColdFire specific code sections for atomic_add and atomic_sub. These are not needed; the m68k asm code for these functions is ColdFire clean. [PATCH] m68knommu: correct build line for Dragonbakk frame buffer driver Correct build lines for Motorola Dragonball 68x328 frame buffer driver.
Patch from Georges Menie [PATCH] m68knommu: remove un-used libgcc symbols Remove a lot of un-used and un-needed libgcc functions from export list for m68knommu syms. [PATCH] m68knommu: add newlines to debug trace in comempci.c Add newlines to some printk debug trace of comempci.c driver. [PATCH] ppc64: move kmem_bufctl_t inside #ifndef __ASSEMBLY__ When the kmem_bufctl_t typedef got added to include/asm-ppc64/types.h, it got added outside the #ifndef __ASSEMBLY__ section, causing assembler errors. This patch, from David Gibson, moves it inside the #ifndef __ASSEMBLY__ region. [SPARC64]: Update defconfig. [SPARC64]: Mark sort_memlist static. [SPARC]: Fix prom_prom_taken[].theres_more setting. [PATCH] mxser.c kernel-2.6.5 This adds support for the CP-104 Moxa Smartio serial cards. Just add the PCI ID information. [SPARC64]: Verify that boot CPU number is less than NR_CPUS. [NETLINK]: Fix typo in netlink_unicast. [IPV6]: Fix sock identity checking bug in tcp_ipv6_check_established. [IPV6] handle return value from ip6_push_pending_frames(). [IPV6] unify XXX_push_pending_frames() code path for rawv6 sockets. [IPV6] unify csum_ipv6_magic() code path for rawv6 sockets. [IPV6] put appropriate checksum for rawv6 sockets even if it was not initialized. [IPV6] ensure to evaluate the checksum for sockets with the IPV6_CHECKSUM option. JFS: [CHECKER] Memory leak on commonly executed path The jfs_log structure was never being freed at unmount time. [PATCH] Invalid notify_change(symlink, [ATTR_MODE]) in nfsd Make sure NFS client doesn't see errors from mode setting on new symlinks. When nfsd creates a symlink, it tries to set the mode as the mode is carried in the NFS request and some filesystems store a mode. If the filesystem refuses to set the mode (e.g. -EOPNOTSUPP), this error should not be returned to the client. [ARM] Fix IXP4xx CLOCK_TICK_RATE to match HW 66.66...
MHz [ARM] Fix IXP4XX_OST_RELOAD_MASK definition to not mask proper bits Current definition of OST_RELOAD_MASK masks off bit 2 of the timer reload value register when it should mask bits 0 and 1. This would cause small timeout values to be loaded incorrectly. [PATCH] Fix NFS long symlinks checks The NFS readlink() methods all take a buffer length argument. Use that instead of assuming PAGE_SIZE... We need to return ENAMETOOLONG rather than EIO. RPC: Ensure that if we reconnect, we delay by at least 15 seconds in order to avoid flooding of servers. NFS_O_DIRECT: there's a code path in nfs_direct_write_seg where NFS_I(inode)->data_updates can get out of sync with reality, which will lead to a BUG() in nfs_clear_inode later on. Patch by Olaf Kirch. NFS O_DIRECT: Change the NFS O_DIRECT path so that it no longer calls the generic VFS read and write routines. This allows all application read requests to pass through to the server, instead of just the ones that appear to be inside the file. This eliminates the requirement to use a GETATTR operation before each read or write to determine where the EOF is. This is a significant performance and scalability win. It also removes all requirements for holding the inode semaphore during NFS direct reads and writes, as the read and write logic no longer needs atomic access to the size of the file. This also helps client CPU scalability by reducing the serialization of writes against a single file. Patch by Chuck Lever NFSv4: Fix a bug in the open reboot-recovery code. Following a suggestion by Jamie Lokier RPC: Make "major" timeouts be of fixed length "timeo< NFS: Patch by Steve Dickson to improve error reporting when mounting an NFS filesystem. RPCSEC_GSS: this adds some new trace messages and makes existing ones consistent with other trace messages in the RPC client. Patch by Chuck Lever RPCSEC_GSS: Make a couple functions in the krb5 code more generally useful. This will help prepare for the spkm3 and lipkey mechanisms.
Patch by Bruce Fields RPCSEC_GSS: Fix module reference counting. Clean up the interface to the GSSAPI code. Patch by Bruce Fields RPCSEC_GSS: Move EXPORT_SYMBOL's to place where functions are defined. Patch by Bruce Fields RPCSEC_GSS: Split out integrity code in wrap and unwrap procedures; otherwise they're going to be ridiculously long after we add privacy support. Patch by Bruce Fields RPCSEC_GSS: The expiration time passed down in the gss context is (duh!) in seconds, not jiffies! Patch by Bruce Fields nfs_writepage_sync stack reduction Patch from akpm From: Arjan van de Ven and akpm nfs/read.c: dynamically allocate the big structs JFS: [CHECKER] if txCommit fails, don't call d_instantiate In several functions, d_instantiate is called before the transaction is committed. Under the rare condition that txCommit fails, the new inode is released, but the dentry continues to point to it. This can lead to a seg fault. The fix is to call d_instantiate after txCommit has run successfully. ia64: Fix bug in fsys_rt_sigprocmask() reported by Andreas Schwab. ia64: Reserve syscall number for kexec_load(). From: Dave Jones Remove a local 1k array. [PATCH] ide-disk.c: don't put disks in STANDBY mode on reboot From: Bartlomiej Zolnierkiewicz From: Rene Herman Prevent the disks from spinning down across a reboot. [ARM PATCH] 1867/1: support for the Intel Mainstone (PXA27x based) eval board Patch from Nicolas Pitre [ARM PATCH] 1868/1: support for LEDs on Mainstone Patch from Nicolas Pitre [ARM PATCH] 1870/1: defconfig for Mainstone Patch from Nicolas Pitre [ARM PATCH] 1889/1: don't select CONFIG_IWMMXT just yet with Mainstone Patch from Nicolas Pitre Since the iWMMXt patch (#1866/1) requires more time to be reviewed, this patch will allow merging Mainstone patches without breaking anything if iWMMXt support isn't merged yet. Should be applied after patch #1867/1. 
[PATCH] fix for stuck cpus at boot] From: Anton Blanchard From: Rusty Russell When hotplug cpu isn't enabled, cpu_is_offline is always false. I had a stuck cpu at boot that resulted in a lockup because we tried to start a migration thread on it. Instead of cpu_is_offline we can use !cpu_online which should cover both the hotplug cpu enabled and disabled cases. [PATCH] ppc64: Fix readq & writeq From: Benjamin Herrenschmidt This fixes busted asm constraints for readq & writeq implementation on ppc64 that resulted in garbage being generated for writeq (plus an obvious mistake in the prototype). [PATCH] ppc64: fix inline version of _raw_spin_trylock From: Paul Mackerras When I added the out-of-line spinlocks on PPC64, I inadvertently introduced a bug in the inline version of _raw_spin_trylock, where it returns the opposite of what it should return. The patch below fixes it. [PATCH] ppc64: update xmon debugger From: Paul Mackerras This patch fixes a whole pile of problems in the xmon kernel debugger for ppc64. This basically makes xmon SMP-safe. Now, when we enter xmon it sends an IPI to the other CPUs to get them into xmon too. It also changes the way we do single-stepping and breakpoints so that we don't have to remove a breakpoint to proceed from it (instead we either emulate the instruction where the breakpoint was, or execute it out of place). With this patch, if we get an exception inside xmon, it will just return to the xmon command loop instead of hanging the system as at present. The patch is quite large because it updates the disassembler to the latest version from binutils (trimmed a bit), which is why I didn't cc lkml. [PATCH] ppc64: trivial cleanup From: David Gibson The ppc64 head.S contains an enable_32b_mode function which is used nowhere. This patch removes it.
[PATCH] ppc64: make enter_rtas() take unsigned long arg From: Paul Mackerras We declare enter_rtas with a struct rtas_args * argument, though it is supposed to be a physical address, and then every time we call it we cast the unsigned long result from __pa() to a void *. This patch changes the declaration of enter_rtas to make it take an unsigned long argument, and removes the cast from all the callers. The actual enter_rtas() routine is in assembler and doesn't need to be changed. [PATCH] ramdisk fixes - Remove the ramdisk special-case in fs-writeback.c - it will soon be unneeded. - Fix rd_ioctl() to return -ENOTTY on invalid ioctl types, not -EINVAL. - Make ramdisk Kconfig friendlier. [PATCH] ramdisk memory allocation fixes Allocating pagecache pages within the disk request_fn is deadlocky and prone to page allocation failures, causing write I/O errors. Attempt to improve things by fiddling with gfp masks. [PATCH] ramdisk: lock blockdev pages during "IO". There's a race: one CPU writes a 1k block into a ramdisk page which isn't in the blockdev pagecache yet. It memsets the locked page to zeroes. While this is happening, another CPU comes in and tries to write a different 1k block to the "disk". But it doesn't lock the page so it races with the memset and can have its data scribbled over. Fix this up by locking the page even if it already existed in pagecache. Locking a pagecache page in a make_request_fn sounds deadlocky but it is not, because: a) ramdisk_writepage() does nothing but a set_bit(), and cannot recur onto the same page. b) Any higher-level code which holds a page lock is supposed to be allocating its memory with GFP_NOFS, and in 2.6 kernels that's equivalent to GFP_NOIO. (The distinction between GFP_NOIO and GFP_NOFS basically disappeared with the buffer_head LRU, although it was reused for writes to swap). 
[PATCH] ramdisk: use kmap_atomic() in rd_blkdev_pagecache_IO() We don't actually need to kmap the blockdev inode's pages at all, because they're ~__GFP_HIGHMEM. But it's future-safe, and cheap. [PATCH] ramdisk: fix PageUptodate() handling When a filesystem does getblk() to get a buffer_head against the ramdisk the VFS will allocate a new not-uptodate pagecache page and will attach buffers to it. The filesystem will then bring certain buffer_heads uptodate. But not the whole page. Later, various ramdisk a_ops see the not-uptodate page and wipe the whole thing out, including the parts to which the filesystem wrote! Fix that up by only zapping those parts of the page which are covered by non-uptodate buffers. [PATCH] ramdisk: implement writepages() Implement an empty ->writepages() so that attempts to write back ramdisk pages have less work to do. [PATCH] ramdisk: separate the blockdev backing_dev_info from the hosted inodes' Give appropriate and separate backing_dev_info's to both the ramdisk blockdev inode and to the files which live atop the ramdisk. Everything works now. [PATCH] Debugging option to put data symbols in kallsyms From: Rusty Russell kallsyms contains only function names, but some debuggers (eg. xmon on PPC/PPC64) use it to lookup symbols: it'd be much nicer if it included data symbols too. [PATCH] getblk() BUG removal We keep on getting BUG()s from isofs_read_super() because it passes an insane blocksize to bread(). See http://bugme.osdl.org/show_bug.cgi?id=2735 for example. I don't know what's up with isofs, but going BUG in there seems a bit rude. Change it to drop a bunch of diagnostics and a backtrace then return a null bh*. Most callers of getblk() don't expect it to fail, so they'll oops anyway. But isofs does actually check for a NULL return. This way, the machine stays up and we get better debug diagnostics. [PATCH] fbmem: rename sys_inbuf() and sys_outbuf() From: David Mosberger These aren't syscalls, so rename them. And make them static.
[PATCH] Fix NFSD oops in readdir From: Neil Brown If a single readdir entry needs to be split over two pages in the reply, we first encode it into a new page, and then copy the bits into place. When we do this relocation, we have to modify the "offset" pointer to be either in the first or the second page, as appropriate. If the pointer should be at the start of the second page, it is currently put past the end of the first page. Note that as the offset and whole response is known to be 4byte-aligned, the offset pointer will never be split over two pages. [PATCH] cleanup double semicolons From: Nuno Monteiro Remove lots of double-semicolons. [PATCH] trivial: Fix name of biovec slab From: Rusty Russell From: "Chen, Kenneth W" Pure cosmetic. Largest biovec slab is printed as biovec-BIO_MAX_PAGES in /proc/slabinfo. It would be more informative to print actual number instead of macro's name. [PATCH] trivial: scripts_kernel-doc should strip comments inside structs' From: Rusty Russell From: Long block comment before declaration moves it out of page in pdfs. Alexey $ ./linux-2.6.6-rc2/scripts/kernel-doc.orig -text test.c struct stuff: struct stuff { int a; /** comment here*/char b; }; Members: a aaaa b bbbbb Description: stuff $ ./linux-2.6.6-rc2/scripts/kernel-doc -text test.c struct stuff: struct stuff { int a; char b; }; Members: a aaaa b bbbbb Description: stuff [PATCH] trivial: add parantheses for if (necessary for cross-compilation) From: Rusty Russell From: Martin Schaffner [PATCH] trivial: fix old URLs in initrd doc From: Rusty Russell From: Marco Cova [PATCH] trivial: MAINTAINERS fbdev - web site change From: Rusty Russell From: David Eger Doesn't resolve. [PATCH] trivial: Fix #endif comment in linux_moduleparam.h From: Rusty Russell From: Pavel Machek If we are providing "helpful" comment, it should better be correct. 
[PATCH] trivial: fix /proc documentation lies about file-nr From: Rusty Russell From: Tommi Virtanen [PATCH] trivial: Make JFFS2 ready for Linux 2.7 From: Rusty Russell (OK from maintainer David Woodhouse ) From: Sam Ravnborg From: > since the code for Linux 2.4 compatibility in fs/jffs2 is gone, we can > clean up the Makefile a bit. Following patch makes the Makefile > compatible with Linux 2.7 instead . :) Please consider applying. If we are going to clean up this I prefer we get rid of the local variables. See attached patch. Sam [PATCH] trivial: drivers/media/video_ir-kbd-gpio.c: kill duplicate include From: Rusty Russell From: a.othieno@bluewin.ch (Arthur Othieno) [PATCH] trivial: add CC Trivial Patch Monkey to SubmittingPatches From: Rusty Russell From: maximilian attems The "Trivial Patch Monkey" is neither documented in MAINTAINERS nor was there a note in SubmittingPatches. [PATCH] trivial: fix counter in build_zonelists() From: Rusty Russell From: Stephen Leonard This fixes a counter that is unnecessarily incremented in build_zonelists(). [PATCH] trivial: swsusp section usage From: Rusty Russell From: Pavel Machek This patch fixes init section usage in swsusp.c: "read_suspend_image()" can be __init. [PATCH] Feed arch/i386/kernel/msr.c through Lindent [PATCH] msr.c touchups A few things which Lindent got wrong. [PATCH] Fix i386/x86_64 cpuid/msr BUG() on impossible CPUs From: Rusty Russell Matthieu Castet pointed out that testing cpu_online(cpu) on a UP system goes BUG(). That's because you're never supposed to ask cpu_online() about a CPU which is >= NR_CPUS. msr and cpuid devices use the minor to indicate the CPU number. Oops. Fix is to explicitly test cpu < NR_CPUS. Using cpu_online() is OK; although the CPU might go down before you actually read the file, that will simply cause junk to be returned. [PATCH] drop left-over #ifndef __ia64__ From: David Mosberger It used to be that loops_per_jiffy was a macro on ia64, hence it couldn't be exported. 
That's no longer the case though, so there is no point in inhibiting its export (not that it makes any _sense_ to export that value on ia64). [PATCH] Fix power/shutdown.c comments From: Roger Luethi Make the comments in drivers/base/power/shutdown.c somewhat less wrong. There's still room for improvement :-/. [PATCH] Neaten and fix init/main.c cpu bringup message From: Andrew Theurer Use num_online_cpus in smp_init instead of counting cpus which may or may not really be brought up. [PATCH] ramfs lfs limit From: Andrea Arcangeli This fixes the 2G limit on ramfs. ia64: Make cond_syscall() declare a dummy prototype so GCC doesn't complain. ia64: Kill a warning when arch/ia64/kernel/machvec.c gets compiled on UP. ia64: Update defconfig [PATCH] H8/300 mtd setup fix - config symbol fix [PATCH] H8/300 new ide driver support - new config items - interface setup - io cleanup [PATCH] H8/300 module support update - add module support code - add H8/300 ELF information - fix kcore ELF format Add 'mode' argument to vfs_symlink. Right now we ignore it, but we need to pass this down to the low-level filesystems if we want to ever make knfsd create symlinks with different permissions correctly. [IPV4]: Fix deadlock in IP tunnel error path. [BRIDGE]: Fix LL_RESERVED_SPACE usage in netfilter code. [IPSEC]: Fix state modifications in xfrm_state_update(). Doing a mod_timer on a live state without holding a lock or for that matter not even checking whether the state is dead is definitely a bad idea. [ARM PATCH] 1884/1: OMAP update 1/2: arch files Patch from Tony Lindgren This patch syncs the mainline kernel with the linux-omap tree.
The highlights of the patch are: - Changed the BOOT_MEM() to use the new IO address (Tony Lindgren) - Cleaned up interrupt handler (Juha Yrjölä) - DMA channel linking for 1610 (Samuel Ortiz) - GPIO fixes (Juha Yrjölä) - IRQ fix for OMAP-730 (Kevin Hilman) - OMAP-1510 FPGA interrupt fix (Dirk Behme) - OMAP-1610 voltage change settings (Todd Poynor) - Uncompress kernel serial output fixes (Tony Lindgren) [PATCH] block device layer: separate backing_dev_info infrastructure Sigh. ramdisk almost works, except it loses data on umount. This is because the files which are atop the ramdisk do not contribute to dirty memory accounting, but they do need writeback. So when sync() calls sync_inodes_sb() to do the work, sync_inodes_sb() hopelessly underestimates the number of pages which need writeback for a complete sync. If you run `sync' enough times, everything eventually hits "disk" and all is happy. The root cause here is that the ramdisk and the files which it hosts shared the same backing_dev_info. This is inappropriate because the hosted files *do* want to writeback and really should contribute to dirty memory accounting. But the ramdisk inode itself wants neither. So. The patch sets up the infrastructure which permits a blockdev to provide a separate backing_dev_info for the files which it hosts. It's a bit of a ramdisk-special. [REDO] ramdisk: separate the blockdev backing_dev_info from the hosted inodes This re-applies the separation of the ramdisk blockdev inode backing store and the files that live in the ramdisk. Cset exclude: torvalds@ppc970.osdl.org|ChangeSet|20040521210025|21437 [ARM PATCH] 1885/1: OMAP update 2/2: include files Patch from Tony Lindgren This patch syncs the mainline kernel with the linux-omap tree. 
The highlights of the patch are: - Changed the BOOT_MEM() to use the new IO address (Tony Lindgren) - Cleaned up interrupt handler (Juha Yrjölä) - DMA channel linking for 1610 (Samuel Ortiz) - GPIO fixes (Juha Yrjölä) - IRQ fix for OMAP-730 (Kevin Hilman) - OMAP-1510 FPGA interrupt fix (Dirk Behme) - OMAP-1610 voltage change settings (Todd Poynor) - Uncompress kernel serial output fixes (Tony Lindgren) [ARM PATCH] 1887/1: Update OMAP low level debug functions again Patch from Tony Lindgren This patch makes the low level debug functions work when support is compiled in for multiple OMAPs. The patch also removes now unnecessary include, incorrect comment, and SERIAL_REG_SHIFT ifdefs. [PATCH] Fix !CONFIG_SYSFS build From: Maneesh Soni The sysfs_rename_dir() interface was changed recently but I forgot to change the definition if CONFIG_SYSFS is not defined. [PATCH] vga16fb warning fix drivers/video/vga16fb.c:1350: warning: assignment makes pointer from integer without a cast [PATCH] gss_api build fix From: "J. Bruce Fields" Older gcc's don't like that dimensionless array. Remove it in favour of a pointer to the data. [PATCH] console autodetection for pmac From: Olaf Hering This one allows console autodetection for powermacs. [PATCH] fix sendfile on 64bit architectures From: Andi Kleen sys_sendfile has a hardcoded 2GB limit. 64bit architectures should probably always use sys_sendfile64() in their native system tables, because for them sizeof(off_t) == sizeof(loff_t). This patch does this. It seemed easier to just change the 64bit entry tables instead of fixing up all the emulation layers to do 2GB checks on their own. I changed all 64bit architectures except for parisc64, which seemed to already have a sendfile64. [PATCH] fbdev: mode switching fix. From: James Simmons This fixes the bugs that were in mode switch via stty. The problem was we couldn't set the mode just by using the x and y resolution. We use modedb to fill in the rest.
There also was a bug that allowed you to change the console resolution for drivers with fixed resolutions. This would mess up your display. Now that is fixed. [PATCH] trivial: use page_to_phys in dma_map_page() From: Trivial Patch Monkey From: Adam Lackorzynski dma_map_page() can be simplified by using page_to_phys instead of writing the calculation explicitly. [PATCH] trivial: remove duplicated #includes From: Rusty Russell From: a.othieno@bluewin.ch (Arthur Othieno) From: Vinay K Nallamothu Remove various duplicated #includes From: Vinay K Nallamothu Use mod_timer in drivers_block_floppy98.c From: carbonated beverage doc update for bk usage bk://... appears to be dead, use http://... instead. [PATCH] Sanitise handling of unneeded syscall stubs From: David Mosberger Below is a patch that tries to sanitize the dropping of unneeded system-call stubs in generic code. In some instances, it would be possible to move the optional system-call stubs into a library routine which would avoid the need for #ifdefs, but in many cases, doing so would require making several functions global (and possibly exporting additional data-structures in header-files). Furthermore, it would inhibit (automatic) inlining in the cases where the stubs are needed. For these reasons, the patch keeps the #ifdef-approach. This has been tested on ia64 and there were no objections from the arch-maintainers (and one positive response). The patch should be safe but arch-maintainers may want to take a second look to see if some __ARCH_WANT_foo macros should be removed for their architecture (I'm quite sure that's the case, but I wanted to play it safe and only preserved the status-quo in that regard). [PATCH] blk: clear completion stack pointer on return From: Jens Axboe It doesn't always look safe to let ->waiting remain set when returning from functions that set it to point to stack area, since various locations check for != NULL to see if it's valid.
So clear it on return from ide_do_drive_cmd() and blk_execute_rq(). [PATCH] swsusp: kill unneccessary debugging From: Pavel Machek This is no longer necessary. We have enough pauses elsewhere, and it works well enough that this is not needed. [PATCH] swsusp: fix devfs breakage introduced in 2.6.6 From: Pavel Machek This fixes bad interaction between devfs and swsusp. Check whether the swap device is the specified resume device, irrespective of whether they are specified by identical names. (Thus, device inode aliasing is allowed. You can say /dev/hda4 instead of /dev/ide/host0/bus0/target0/lun0/part4 [if using devfs] and they'll be considered the same device. This is *necessary* for devfs, since the resume code can only recognize the form /dev/hda4, but the suspend code would like the long name [as shown in 'cat /proc/mounts'].) [Thanks to devfs hero whose name I forgot.] [PATCH] i4l: Eicon driver: fix __devexit in prototype From: Armin Schindler Fixes a compiler warning about unused Eicon ISDN driver function if hotplug is disabled. [PATCH] x86 cpuid cache info update From: Francois Romieu Missing cache size format for Intel P4E (p.26 of doc. 241618-025, "Intel Processor Identification and the CPUID Instruction"). [PATCH] autofs4: printk cleanup From: Ian Kent This is a patch contributed by Joe Perches to automatically include the function name in the dprintk statements. [PATCH] autofs4: MAINTAINERS update From: Ian Kent This changes the autofs4 maintainer to me. Recommended by Joe Perches and OKed with Jeremy. [PATCH] JFFS2_FS_NAND=y compile error The case of CONFIG_JFFS2_FS_NAND=y got broken recently. The bug is obvious, and the fix is trivial: [PATCH] more comx removal The patch below removes the MAINTAINERS entry for the removed comx driver.
Additionally, the following comx header files could be removed: drivers/net/wan/mixcom.h drivers/net/wan/hscx.h drivers/net/wan/munich32x.h drivers/net/wan/falc-lh.h I've double-checked that none of them are used by any other driver. [PATCH] remove dead drivers/ide/ppc/swarm.c This driver was partially merged in 2.5.32 and never compiled in 2.5/2.6. It was fixed in linux-mips CVS but has been broken again about 5 months ago. Just remove it for now (it is in wrong directory anyway). [PATCH] two fixups for my ARM/ARM26 IDE changes - initializing needs to be set to 1 before calling ide_arm_init() - ide_default_io_ctl() should be 0 on arm26 [PATCH] IDE PCI: don't initialize fields of static chipset tables to zero Also remove unused EOL define from ide.h. This trivial patch makes grepping a lot easier. [IPSEC]: Lock policy in policy timer. [BRIDGE]: Handle delete of multiple devices with same address. This fixes the issue discovered when removing bluetooth devices from a bridge. Need to add special case code when forwarding table is being cleaned up to handle the case where several devices share the same hardware address. [BRIDGE]: Cleanup of bridge allocation. Minor cleanup (lead in to later sysfs support). Change new_nb to new_bridge_dev and return the net_device rather than bridge because that is what the caller wants anyway. [BRIDGE]: Relax locking on add/delete. Relax the locking on add/delete interfaces to a bridge. Since these operations are already called with RTNL semaphore, only need to hold the bridge lock while doing operations related to STP and processing path. This is necessary for later sysfs support where those operations might sleep. [BRIDGE]: Ioctl cleanup and consolidation. Merge the ioctl stub calls that just end up calling the sub-function to do the actual ioctl. Move br_get_XXX_ifindices into the ioctl file as well where they can be static. [BRIDGE]: Fix deadlock on device removal. 
Fix a deadlock where deleting a device call br_del_if with lock held. br_del_if doesn't want to be called under lock anymore. [BRIDGE]: Read forwarding table chunk at a time. Change how the read of forwarding table works. Instead of copying entries to user one at a time, use an intermediate kernel buffer and do up to a page at a chunk. This gets rid of some awkward code dealing with entries getting deleted during the copy. And allows same function to be used by later sysfs hook. [BRIDGE]: Expose timer_residue function for use by sysfs. Move the local function timer_residue to br_timer_value so it can be used by both ioctl and sysfs code. [BRIDGE]: Add sysfs support. [BRIDGE]: New ioctl interface for 32/64 compatability. Add four new ioctl's for the operations that can't be done through sysfs. The existing bridge ioctl's are multiplexed, and most go through SIOCDEVPRIVATE so they won't work in a mixed 32/64bit environment. The new release of bridge-utils will use these if possible, and fall back to the old interface. [BRIDGE]: Compat hooks for new-ioctl interface. Replacement 64 bit compatibility code for the new ioctl's. The new ioctl's all pass through clean, but for the old style ioctl's it uses the mis-feature of the earlier bridge-utils that they check the API version. So if an old 32bit version of brctl is run on a 64bit platform it will report bridge utilities not compatible with kernel version Tested on Itanium 1; but should solve issue for sparc, ppc, and x86_64 [BRIDGE]: Forwarding table sanity checks. Forwarding table paranoia: * Solve some potential problems if a device changes address and one or more device has the same address. * Warn if new device added to a bridge matches a entry that has shown up on the network. * Also don't put static entries in the timer list, they don't time out so shouldn't be there. Avoid type warning in comparison by making it explicit. 
(The difference between two pointers is a "size_t", while MAX_LEN and the result here are "int"s). [PATCH] Make swapper_space tree_lock irq-safe ->tree_lock is supposed to be IRQ-safe. Hugh worked out that with his changes, we never actually take it from interrupt context, so spin_lock() is sufficient. Apart from kinda freaking me out, the analysis which led to this decision becomes untrue with later patches. So make it irq-safe. [PATCH] __add_to_swap_cache and add_to_pagecache() simplification Simplify the logic in there a bit. [PATCH] revert recent swapcache handling changes Go back to the 2.6.5 concepts, with rmap additions. In particular: - Implement Andrea's flavour of page_mapping(). This function opaquely does the right thing for pagecache pages, anon pages and for swapcache pages. The critical thing here is that page_mapping() returns &swapper_space for swapcache pages without actually requiring the storage at page->mapping. This frees page->mapping for the anonmm/anonvma metadata. - Andrea and Hugh placed the pagecache index of swapcache pages into page->private rather than page->index. So add new page_index() function which hides this. - Make swapper_space.set_page_dirty() again point at __set_page_dirty_buffers(). If we don't do that, a bare set_page_dirty() will fall through to __set_page_dirty_buffers(), which is silly. This way, __set_page_dirty_buffers() can continue to use page->mapping. It should never go near anon or swapcache pages. - Give swapper_space a ->set_page_dirty address_space_operation method, so that set_page_dirty() will not fall through to __set_page_dirty_buffers() for swapcache pages. That function is not set up to handle them. The main effect of these changes is that swapcache pages are treated more similarly to pagecache pages. And we are again tagging swapcache pages as dirty in their radix tree, which is a requirement if we later wish to implement swapcache writearound based on tagged radix-tree walks. 
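As a rough illustration of the page_mapping() and page_index() behaviour described in the revert entry above, here is a user-space C model. The struct layouts and the boolean fields standing in for PageSwapCache() and PageAnon() are simplifications assumed for this sketch; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed for this sketch). */
struct address_space {
    const char *name;
};

struct page {
    bool swapcache;                /* models PageSwapCache(page) */
    bool anon;                     /* models PageAnon(page) */
    struct address_space *mapping; /* pagecache mapping; free for anon metadata otherwise */
    unsigned long index;           /* pagecache offset for file pages */
    unsigned long private;         /* swapcache offset for swapcache pages */
};

struct address_space swapper_space = { "swapper_space" };

/* Swapcache pages report &swapper_space without storing it in page->mapping. */
struct address_space *page_mapping(const struct page *page)
{
    if (page->swapcache)
        return &swapper_space;
    if (page->anon)
        return NULL;
    return page->mapping;
}

/* Hide the fact that swapcache pages keep their index in page->private. */
unsigned long page_index(const struct page *page)
{
    return page->swapcache ? page->private : page->index;
}
```

Callers that go through these two helpers never need to know which field actually holds the information, which is what leaves page->mapping available for the anonmm/anonvma metadata.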
[PATCH] vmscan: revert may_enter_fs changes Fix up the "may we call writepage" logic for the swapcache changes. [PATCH] Make sync_page use swapper_space again Revert recent changes to sync_page(). Now that page_mapping() returns &swapper_space for swapcache pages we don't need to test for PageSwapCache in sync_page(). [PATCH] __set_page_dirty_nobuffers race fix Running __mark_inode_dirty() against a swapcache page is illegal and will oops. I see a race in set_page_dirty() wherein it can be called with a PageSwapCache page, but if the page is removed from swapcache after __set_page_dirty_nobuffers() drops tree_lock(), we have the situation where PageSwapCache() is false, but local variable `mapping' points at swapcache. Handle that by checking for non-null mapping->host. We don't care about the page state at this point - we're only interested in the inode. There is a converse case: what if a page is added to swapcache as we are running set_page_dirty() against it? In this case the page gets its PG_dirty flag set but it is not tagged as dirty in the swapper_space radix tree. The swap writeout code will handle this OK and test_clear_page_dirty()'s call to radix_tree_tag_clear(PAGECACHE_TAG_DIRTY) will silently have no effect. The only downside is that future radix-tree-based writearound won't notice that such pages are dirty and swap IO scheduling will be a teensy bit worse. The patch also fixes the (silly) testing of local variable `mapping' to see if the page was truncated. We should test page_mapping() for that. [PATCH] rmap 7 object-based rmap From: Hugh Dickins Dave McCracken's object-based reverse mapping scheme for file pages: why build up and tear down chains of pte pointers for file pages, when page->mapping has i_mmap and i_mmap_shared lists of all the vmas which might contain that page, and it appears at one deterministic position within the vma (unless vma is nonlinear - see next patch)? 
Has some drawbacks: more work to locate the ptes from page_referenced and try_to_unmap, especially if the i_mmap lists contain a lot of vmas covering different ranges; has to down_trylock the i_shared_sem, and hope that doesn't fail too often. But attractive in that it uses less lowmem, and shifts the rmap burden away from the hot paths, to swapout. Hybrid scheme for the moment: carry on with pte_chains for anonymous pages, that's unchanged; but file pages keep mapcount in the pte union of struct page, where anonymous pages keep chain pointer or direct pte address: so page_mapped(page) works on both. Hugh massaged it a little: distinct page_add_file_rmap entry point; list searches check rss so as not to waste time on mms fully swapped out; check mapcount to terminate once all ptes have been found; and a WARN_ON if page_referenced should have but couldn't find all the ptes. [PATCH] rmap 8 unmap nonlinear From: Hugh Dickins The previous patch let the ptes of file pages be located via page ->mapping->i_mmap and i_mmap_shared lists of vmas; which works well unless the vma is VM_NONLINEAR - one in which sys_remap_file_pages has been used to place pages in unexpected places, to avoid an explosion of distinct unmergable vmas. Such pages were effectively locked in memory. page_referenced_file is already skipping nonlinear vmas, they'd just waste its time, and age unfairly any pages in their proper positions. Now extend try_to_unmap_file, to persuade it to swap from nonlinears. Ignoring the page requested, try to unmap cluster of 32 neighbouring ptes (in worst case all empty slots) in a nonlinear vma, then move on to the next vma; stopping when we've unmapped at least as many maps as the requested page had (vague guide of how hard to try), or have reached the end. With large sparse nonlinear vmas, this could take a long time: inserted a cond_resched while no locks are held, unusual at this level but I think okay, shrink_list does so. 
Use vm_private_data a little like the old mm->swap_address, as a cursor recording how far we got, so we don't attack the same ptes next time around (earlier tried inserting an empty marker vma in the list, but that got messy). How well this will work on real- life nonlinear vmas remains to be seen, but should work better than locking them all in memory, or swapping everything out all the time. Existing users of vm_private_data have either VM_RESERVED or VM_DONTEXPAND set, both of which are in the VM_SPECIAL category where we never try to merge vmas: so removed the vm_private_data test from is_mergeable_vma, so we can still merge VM_NONLINEARs. Of course, we could instead add another field to vm_area_struct. [PATCH] slab: consolidate panic code Many places do: if (kmem_cache_create(...) == NULL) panic(...); We can consolidate all that by passing another flag to kmem_cache_create() which says "panic if it doesn't work". [PATCH] rmap 9 remove pte_chains From: Hugh Dickins Lots of deletions: the next patch will put in the new anon rmap, which should look clearer if first we remove all of the old pte-pointer-based rmap from the core in this patch - which therefore leaves anonymous rmap totally disabled, anon pages locked in memory until process frees them. Leave arch files (and page table rmap) untouched for now, clean them up in a later batch. A few constructive changes amidst all the deletions: Choose names (e.g. page_add_anon_rmap) and args (e.g. no more pteps) now so we need not revisit so many files in the next patch. Inline function page_dup_rmap for fork's copy_page_range, simply bumps mapcount under lock. cond_resched_lock in copy_page_range. Struct page rearranged: no pte union, just mapcount moved next to atomic count, so two ints can occupy one long on 64-bit; i386 struct page now 32 bytes even with PAE. Never pass PageReserved to page_remove_rmap, only do_wp_page did so. 
From: Hugh Dickins Move page_add_anon_rmap's BUG_ON(page_mapping(page)) inside the rmap_lock (well, might as well just check mapping if !mapcount then): if this page is being mapped or unmapped on another cpu at the same time, page_mapping's PageAnon(page) and page->mapping are volatile. But page_mapping(page) is used more widely: I've a nasty feeling that clear_page_anon, page_add_anon_rmap and/or page_mapping need barriers added (also in 2.6.6 itself), [PATCH] rmap 10 add anonmm rmap From: Hugh Dickins Hugh's anonmm object-based reverse mapping scheme for anonymous pages. We have not yet decided whether to adopt this scheme, or Andrea's more advanced anon_vma scheme. anonmm is easier for me to merge quickly, to replace the pte_chain rmap taken out in the previous patch; a patch to install Andrea's anon_vma will follow in due course. Why build up and tear down chains of pte pointers for anonymous pages, when a page can only appear at one particular address, in a restricted group of mms that might share it? (Except: see next patch on mremap.) Introduce struct anonmm per mm to track anonymous pages, all forks from one exec sharing the same bundle of linked anonmms. Anonymous pages originate in one mm, but may be forked into another mm of the bundle later on. Callouts from fork.c to allocate, dup and exit the anonmm structure private to rmap.c. From: Hugh Dickins Two concurrent exits (of the last two mms sharing the anonhd). First exit_rmap brings anonhd->count down to 2, gets preempted (at the spin_unlock) by second, which brings anonhd->count down to 1, sees it's 1 and frees the anonhd (without making any change to anonhd->count itself), cpu goes on to do something new which reallocates the old anonhd as a new struct anonmm (probably not a head, in which case count will start at 1), first resumes after the spin_unlock and sees anonhd->count 1, frees "anonhd" again, it's used for something else, a later exit_rmap list_del finds list corrupt. 
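The hazard in the double-exit scenario above is that the count is read again after the lock has been dropped, by which time the structure may already have been freed and reused. A minimal sketch of the general remedy, deciding "was I the last user?" in the same atomic step as the decrement, is shown below. It uses C11 atomics rather than the kernel's spinlock, counts down to zero rather than to one, and the names anon_head and anon_head_put are hypothetical; it is not the fix that was actually merged.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared anonhd structure. */
struct anon_head {
    atomic_int count; /* number of users still holding a reference */
};

/*
 * Drop one reference. Returns true only for the caller that dropped the
 * last reference, so exactly one caller goes on to free the structure.
 * atomic_fetch_sub() returns the value held before the decrement, so the
 * decision never depends on a later re-read of head->count.
 */
bool anon_head_put(struct anon_head *head)
{
    return atomic_fetch_sub(&head->count, 1) == 1;
}
```

A caller would free the structure only when anon_head_put() returns true, and would not touch it at all afterwards otherwise.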
[PATCH] rmap 11 mremap moves From: Hugh Dickins A weakness of the anonmm scheme is its difficulty in tracking pages shared between two or more mms (one being an ancestor of the other), when mremap has been used to move a range of pages in one of those mms. mremap move is not very common anyway, and it's more often used on a page range exclusive to the mm; but uncommon though it may be, we must not allow unlocked pages to become unswappable. This patch follows Linus' suggestion, simply to take a private copy of the page in such a case: early C-O-W. My previous implementation was daft with respect to pages currently on swap: it insisted on swapping them in to copy them. No need for that: just take the copy when a page is brought in from swap, and its intended address is found to clash with what rmap has already noted. If do_swap_page has to make this copy in the mremap moved case (simply a call to do_wp_page), might as well do so also in the case when it's a write access but the page not exclusive, it's always seemed a little odd that swapin needed a second fault for that. A bug even: get_user_pages force imagines that a single call to handle_mm_fault must break C-O-W. Another bugfix: swapoff's unuse_process didn't check is_vm_hugetlb_page. Andrea's anon_vma has no such problem with mremap moved pages, handling them with elegant use of vm_pgoff - though at some cost to vma merging. How important is it to handle them efficiently? For now there's a msg printk(KERN_WARNING "%s: mremap moved %d cows\n", current->comm, cows); [PATCH] rmap 12 pgtable remove rmap From: Hugh Dickins Remove the support for pte_chain rmap from page table initialization, just continue to maintain nr_page_table_pages (but only for user page tables - it also counted vmalloc page tables before, little need, and I'm unsure if per-cpu stats are safe early enough on all arches). mm/memory.c is the only core file affected. 
But ppc and ppc64 have found the old rmap page table initialization useful to support their ptep_test_and_clear_young: so transfer rmap's initialization to them (even on kernel page tables? well, okay). [PATCH] rmap 13 include/asm deletions From: Hugh Dickins Delete include/asm*/rmap.h Delete pte_addr_t typedef from include/asm*/pgtable.h Delete KM_PTE2 from subset of include/asm*/kmap_types.h Beware when 4G/4G returns to -mm: i386 may need KM_FILLER for 8K stack. [PATCH] Convert i_shared_sem back to a spinlock Having a semaphore in there causes modest performance regressions on heavily mmap-intensive workloads on some hardware. Specifically, up to 30% in SDET on NUMAQ and big PPC64. So switch it back to being a spinlock. This does mean that unmap_vmas() needs to be told whether or not it is allowed to schedule away; that's simple to do via the zap_details structure. This change means that there will be high scheduling latencies when someone truncates a large file which is currently mmapped, but nobody does that anyway. The scheduling points in unmap_vmas() are mainly for munmap() and exit(), and they still will work OK for that. From: Hugh Dickins Sorry, my premature optimizations (trying to pass down NULL zap_details except when needed) have caught you out doubly: unmap_mapping_range_list was NULLing the details even though atomic was set; and if it hadn't, then zap_pte_range would have missed free_swap_and_cache and pte_clear when pte not present. Moved the optimization into zap_pte_range itself. Plus massive documentation update. From: Hugh Dickins Here's a second patch to add to the first: mremap's cows can't come home without releasing the i_mmap_lock, better move the whole "Subtle point" locking from move_vma into move_page_tables. And it's possible for the file that was behind an anonymous page to be truncated while we drop that lock, don't want to abort mremap because of VM_FAULT_SIGBUS.
(Eek, should we be checking do_swap_page of a vm_file area against the truncate_count sequence? Technically yes, but I doubt we need bother.) - We cannot hold i_mmap_lock across move_one_page() because move_one_page() needs to perform __GFP_WAIT allocations of pagetable pages. - Move the cond_resched() out so we test it once per page rather than only when move_one_page() returns -EAGAIN. [PATCH] rmap 14: i_shared_lock fixes From: Hugh Dickins First of batch of six patches which introduce Rajesh Venkatasubramanian's implementation of a radix priority search tree of vmas, to handle object-based reverse mapping corner cases well. rmap 14 i_shared_lock fixes Start the sequence with a couple of outstanding i_shared_lock fixes. Since i_shared_sem became i_shared_lock, we've had to shift and then temporarily remove mremap move's protection of concurrent truncation - if mremap moves ptes while unmap_mapping_range_list is making its way through the vmas, there's a danger we'd move a pte from an area yet to be cleaned back into an area already cleared. Now site the i_shared_lock with the page_table_lock in move_one_page. Replace page_table_present by get_one_pte_map, so we know when it's necessary to allocate a new page table: in which case have to drop i_shared_lock, trylock and perhaps reorder locks on the way back. Yet another fix: must check for NULL dst before pte_unmap(dst). And over in rmap.c, try_to_unmap_file's cond_resched amidst its lengthy nonlinear swapping was now causing might_sleep warnings: moved to a rather unsatisfactory and less frequent cond_resched_lock on i_shared_lock when we reach the end of the list; and one before starting on the nonlinears too: the "cursor" may become out-of-date if we do schedule, but I doubt it's worth bothering about. 
[PATCH] numa api: x86_64 support From: Andi Kleen Add NUMA API system calls on x86-64. This includes a bugfix to prevent miscompilation on gcc 3.2 of bitmap.h. [PATCH] numa api: Add i386 support From: Andi Kleen Add NUMA API system calls for i386. [PATCH] numa api: Add IA64 support From: Andi Kleen Add NUMA API system calls on IA64 and one bug fix required for it. [PATCH] numa api: Core NUMA API code From: Andi Kleen The following patches add support for configurable NUMA memory policy for user processes. It is based on the proposal from last kernel summit with feedback from various people. This NUMA API does not attempt to implement page migration or anything else complicated: all it does is to police the allocation when a page is first allocated or when a page is reallocated after swapping. Currently only support for shared memory and anonymous memory is there; policy for file based mappings is not implemented yet (although they get implicitly policied by the default process policy). It adds three new system calls: mbind to change the policy of a VMA, set_mempolicy to change the policy of a process, get_mempolicy to retrieve memory policy. User tools (numactl, libnuma, test programs, manpages) can be found in ftp://ftp.suse.com/pub/people/ak/numa/numactl-0.6.tar.gz For details on the system calls see the manpages http://www.firstfloor.org/~andi/mbind.html http://www.firstfloor.org/~andi/set_mempolicy.html http://www.firstfloor.org/~andi/get_mempolicy.html Most user programs should actually not use the system calls directly, but use the higher level functions in libnuma (http://www.firstfloor.org/~andi/numa.html) or the command line tools (http://www.firstfloor.org/~andi/numactl.html). The system calls allow user programs and administrators to set various NUMA memory policies for putting memory on specific nodes.
Here is a short description of the policies copied from the kernel patch:

 * NUMA policy allows the user to give hints in which node(s) memory should
 * be allocated.
 *
 * Support four policies per VMA and per process:
 *
 * The VMA policy has priority over the process policy for a page fault.
 *
 * interleave     Allocate memory interleaved over a set of nodes,
 *                with normal fallback if it fails.
 *                For VMA based allocations this interleaves based on the
 *                offset into the backing object or offset into the mapping
 *                for anonymous memory. For process policy an process counter
 *                is used.
 * bind           Only allocate memory on a specific set of nodes,
 *                no fallback.
 * preferred      Try a specific node first before normal fallback.
 *                As a special case node -1 here means do the allocation
 *                on the local CPU. This is normally identical to default,
 *                but useful to set in a VMA when you have a non default
 *                process policy.
 * default        Allocate on the local node first, or when on a VMA
 *                use the process policy. This is what Linux always did
 *                in a NUMA aware kernel and still does by, ahem, default.
 *
 * The process policy is applied for most non interrupt memory allocations
 * in that process' context. Interrupts ignore the policies and always
 * try to allocate on the local CPU. The VMA policy is only applied for memory
 * allocations for a VMA in the VM.
 *
 * Currently there are a few corner cases in swapping where the policy
 * is not applied, but the majority should be handled. When process policy
 * is used it is not remembered over swap outs/swap ins.
 *
 * Only the highest zone in the zone hierarchy gets policied. Allocations
 * requesting a lower zone just use default policy. This implies that
 * on systems with highmem kernel lowmem allocation don't get policied.
 * Same with GFP_DMA allocations.
 *
 * For shmfs/tmpfs/hugetlbfs shared memory the policy is shared between
 * all users and remembered even when nobody has memory mapped.

This patch:

This is the core NUMA API code.
This includes NUMA policy aware wrappers for get_free_pages and alloc_page_vma(). On non NUMA kernels these are defined away. The system calls mbind (see http://www.firstfloor.org/~andi/mbind.html), get_mempolicy (http://www.firstfloor.org/~andi/get_mempolicy.html) and set_mempolicy (http://www.firstfloor.org/~andi/set_mempolicy.html) are implemented here. Adds a vm_policy field to the VMA and to the process. The process also has field for interleaving. VMA interleaving uses the offset into the VMA, but that's not possible for process allocations. From: Andi Kleen > Andi, how come policy_vma() calls ->set_policy under i_shared_sem? I think this can be actually dropped now. In an earlier version I did walk the vma shared list to change the policies of other mappings to the same shared memory region. This turned out too complicated with all the corner cases, so I eventually gave in and added ->get_policy to the fast path. Also there is still the mmap_sem which prevents races in the same MM. Patch to remove it attached. Also adds documentation and removes the bogus __alloc_page_vma() prototype noticed by hch. From: Andi Kleen A few incremental fixes for NUMA API. - Fix a few comments - Add a compat_ function for get_mem_policy I considered changing the ABI to avoid this, but that would have made the API too ugly. I put it directly into the file because a mm/compat.c didn't seem worth it just for this. - Fix the algorithm for VMA interleave. From: Matthew Dobson 1) Move the extern of alloc_pages_current() into #ifdef CONFIG_NUMA. The only references to the function are in NUMA code in mempolicy.c 2) Remove the definitions of __alloc_page_vma(). They aren't used. 3) Move forward declaration of struct vm_area_struct to top of file. [PATCH] mpol in copy_vma From: Hugh Dickins I think Andi missed the copy_vma I recently added for mremap, and it'll need something like below.... (Doesn't look like it'll optimize away when it's not needed - rather bloaty.) 
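The interleave policy from the core NUMA API entry above picks a node from the page offset for VMA allocations and from a per-process counter for process allocations. The following user-space sketch shows one way such a selection could work over a bitmask of allowed nodes. The function names, the unsigned long node mask and the task_policy_sketch struct are assumptions made for illustration; the kernel's actual algorithm (which a later fix in this entry adjusts) may differ.

```c
/* Count the nodes set in a node mask (one bit per node). */
unsigned count_nodes(unsigned long mask)
{
    unsigned n = 0;

    for (; mask; mask &= mask - 1) /* clear the lowest set bit each round */
        n++;
    return n;
}

/* Return the index of the n-th set bit (0-based), or -1 if there is none. */
int nth_node(unsigned long mask, unsigned n)
{
    for (int node = 0; mask; node++, mask >>= 1) {
        if ((mask & 1UL) && n-- == 0)
            return node;
    }
    return -1;
}

/* VMA-style interleave: the page offset decides which allowed node is used. */
int interleave_by_offset(unsigned long mask, unsigned long pgoff)
{
    unsigned nodes = count_nodes(mask);

    if (nodes == 0)
        return -1;
    return nth_node(mask, (unsigned)(pgoff % nodes));
}

/* Process-style interleave: a per-process counter stands in for the offset. */
struct task_policy_sketch {
    unsigned long nodes; /* allowed nodes */
    unsigned long next;  /* hypothetical round-robin counter */
};

int interleave_by_counter(struct task_policy_sketch *pol)
{
    return interleave_by_offset(pol->nodes, pol->next++);
}
```

With a mask of 0xA (nodes 1 and 3), successive page offsets alternate between node 1 and node 3, so the same offset always maps to the same node regardless of allocation order.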
[PATCH] numa api core: use SLAB_PANIC [PATCH] Re-add NUMA API statistics From: Andi Kleen Patch re-adds the sysfs output of the NUMA API statistics. All my test scripts need this and it is very useful to check if the policy actually works. This got lost when the huge page numa api changes got dropped. I decided to not resend the huge pages NUMA API changes for now. Instead I will wait for this area to settle when demand paged large pages is merged. [PATCH] numa api: Add VMA hooks for policy From: Andi Kleen NUMA API adds a policy to each VMA. During VMA creation, merging and splitting, these policies must be handled properly. This patch adds the calls for this. It is a nop when CONFIG_NUMA is not defined. [PATCH] numa api: Add shared memory support From: Andi Kleen Add NUMA API support to tmpfs and hugetlbfs. Shared memory is a bit of a special case for NUMA policy. Normally policy is associated to VMAs or to processes, but for a shared memory segment you really want to share the policy. The core NUMA API has code for that, this patch adds the necessary changes to tmpfs and hugetlbfs. First it changes the custom swapping code in tmpfs to follow the policy set via VMAs. It is also useful to have a "backing store" of policy that saves the policy even when nobody has the shared memory segment mapped. This allows command line tools to pre configure policy, which is then later used by programs. Note that hugetlbfs needs more changes - it is also required to switch it to lazy allocation, otherwise the prefault prevents mbind() from working. [PATCH] small numa api fixups From: Christoph Hellwig - don't include mempolicy.h in sched.h and mm.h when a forward declaration is enough.
Andi argued against that in the past, but I'd really hate to add another header to two of the includes used in basically every driver when we can include it in the six files actually needing it instead (that number is for my ppc32 system, maybe other arches need more includes in their directories) - make numa api fields in task_struct conditional on CONFIG_NUMA, this gives us a few ugly ifdefs but avoids wasting memory on non-NUMA systems. [PATCH] numa api: Add statistics From: Andi Kleen Add NUMA hit/miss statistics to page allocation and display them in sysfs. This is not 100% required for NUMA API, but without this it is very The overhead is quite low because all counters are per CPU and only happens when CONFIG_NUMA is defined. [PATCH] numa api: Add policy support to anonymous memory From: Andi Kleen Change the core VM to use alloc_page_vma() instead of alloc_page(). Change the swap readahead to follow the policy of the VMA. [PATCH] numa api: fix end of memory handling in mbind From: Andi Kleen This fixes a user triggerable crash in mbind() in NUMA API. It would oops when running into the end of memory. Actually not really oops, because an oops with the mm sem held for writing always deadlocks. [PATCH] rmap 15: vma_adjust From: Hugh Dickins If file-based vmas are to be kept in a tree, according to the file offsets they map, then adjusting the vma's start pgoff or its end involves repositioning in the tree, while holding i_shared_lock (and page_table_lock). We used to avoid that if possible, e.g. when just moving end; but if we're heading that way, let's now tidy up vma_merge and split_vma, and do all the locking and adjustment in a new helper vma_adjust. And please, let's call the next vma in vma_merge "next" rather than "prev". Since these patches are diffed over 2.6.6-rc2-mm2, they include the NUMA mpolicy mods which you'll have to remove to go earlier in the series, sorry for that nuisance.
I have intentionally changed the one vma_mpol_equal to mpol_equal, to make the merge cases more alike. [PATCH] rmap 16: pretend prio_tree From: Hugh Dickins Pave the way for prio_tree by switching over to its interfaces, but actually still implement them with the same old lists as before. Most of the vma_prio_tree interfaces are straightforward. The interesting one is vma_prio_tree_next, used to search the tree for all vmas which overlap the given range: unlike the list_for_each_entry it replaces, it does not find every vma, just those that match. But this does leave handling of nonlinear vmas in a very unsatisfactory state: for now we have to search again over the maximum range to find all the nonlinear vmas which might contain a page, which of course takes away the point of the tree. Fixed in later patch of this batch. There is no need to initialize vma linkage all over, just do it before inserting the vma in list or tree. /proc/pid/statm had an odd test for its shared count: simplified to an equivalent test on vm_file. [PATCH] rmap 17: real prio_tree From: Hugh Dickins Rajesh Venkatasubramanian's implementation of a radix priority search tree of vmas, to handle object-based reverse mapping corner cases well. Amongst the objections to object-based rmap were test cases by akpm and by mingo, in which large numbers of vmas mapping disjoint or overlapping parts of a file showed strikingly poor performance of the i_mmap lists. Perhaps those tests are irrelevant in the real world? We cannot be too sure: the prio_tree is well-suited to solving precisely that problem, so unless it turns out to bring too much overhead, let's include it. Why is this prio_tree.c placed in mm rather than lib? See GET_INDEX: this implementation is geared throughout to use with vmas, though the first half of the file appears more general than the second half. 
Each node of the prio_tree is itself (contained within) a vma: might save memory by allocating distinct nodes from which to hang vmas, but wouldn't save much, and would complicate the usage with preallocations. Off each node of the prio_tree itself hangs a list of like vmas, if any. The connection from node to list is a little awkward, but probably the best compromise: it would be more straightforward to list likes directly from the tree node, but that would use more memory per vma, for the list_head and to identify that head. Instead, node's shared.vm_set.head points to next vma (whose shared.vm_set.head points back to node vma), and that next contains the list_head from which the rest hang - reusing fields already used in the prio_tree node itself. Currently lacks prefetch: Rajesh hopes to add some soon. [PATCH] rmap 18: i_mmap_nonlinear From: Hugh Dickins The prio_tree is of no use to nonlinear vmas: currently we're having to search the tree in the most inefficient way to find all its nonlinears. At the very least we need an indication of the unlikely case when there are some nonlinears; but really, we'd do best to take them out of the prio_tree altogether, into a list of their own - i_mmap_nonlinear. [PATCH] unmap_mapping_range: add comment [PATCH] rmap 19: arch prio_tree From: Hugh Dickins The previous patches of this prio_tree batch have been to generic only. Now the arm and parisc __flush_dcache_page are converted to using vma_prio_tree_next, and benefit from its selection of relevant vmas. They're still accessing the tree without i_shared_lock or any other, that's not forgotten but still under investigation. Include pagemap.h for the definition of PAGE_CACHE_SHIFT. s390 and x86_64 no longer initialize vma's shared field (whose type has changed), done later. [PATCH] vm_area_struct size comment From: Hugh Dickins Missed comment on the size of vm_area_struct: it is no longer 64 bytes on ia32. 
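The prio_tree entries above turn on one question: which vmas map any file page in a given range? The sketch below spells out that overlap test with a plain linear scan, which is what the old i_mmap lists amounted to; the prio_tree exists to answer the same query without visiting every vma. The struct, the 4K page size and the function names are assumptions for illustration and do not reproduce the kernel's vma_prio_tree_next interface.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4K pages */

/* Cut-down stand-in for vm_area_struct (hypothetical). */
struct vma_sketch {
    unsigned long vm_start; /* first byte of the mapping */
    unsigned long vm_end;   /* one past the last byte */
    unsigned long vm_pgoff; /* file offset of vm_start, in pages */
};

/* Last file page index covered by the vma. */
unsigned long vma_last_pgoff(const struct vma_sketch *vma)
{
    unsigned long pages = (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;

    return vma->vm_pgoff + pages - 1;
}

/* True if the vma maps at least one file page in [first, last]. */
int vma_overlaps(const struct vma_sketch *vma,
                 unsigned long first, unsigned long last)
{
    return vma->vm_pgoff <= last && vma_last_pgoff(vma) >= first;
}

/* Linear scan: cost grows with the number of vmas, matching or not. */
size_t count_overlaps(const struct vma_sketch *vmas, size_t n,
                      unsigned long first, unsigned long last)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (vma_overlaps(&vmas[i], first, last))
            hits++;
    return hits;
}
```

Nonlinear vmas break the assumption that a page sits at one deterministic offset inside the vma, which is why this kind of range test cannot locate them and they end up on a separate list.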
[PATCH] rmap.c comment/style fixups From: Christoph Hellwig [PATCH] rmap 20 i_mmap_shared into i_mmap From: Hugh Dickins Why should struct address_space have separate i_mmap and i_mmap_shared prio_trees (separating !VM_SHARED and VM_SHARED vmas)? No good reason, the same processing is usually needed on both. Merge i_mmap_shared into i_mmap, but keep i_mmap_writable count of VM_SHARED vmas (those capable of dirtying the underlying file) for the mapping_writably_mapped test. The VM_MAYSHARE test in the arm and parisc loops is not necessarily what they will want to use in the end: it's provided as a harmless example of what might be appropriate, but maintainers are likely to revise it later (that parisc loop is currently being changed in the parisc tree anyway). On the way, remove the now out-of-date comments on vm_area_struct size. [PATCH] rmap 21 try_to_unmap_one mapcount From: Hugh Dickins Why should try_to_unmap_anon and try_to_unmap_file take a copy of page->mapcount and pass it down for try_to_unmap_one to decrement? why not just check page->mapcount itself? asks akpm. Perhaps there used to be a good reason, but not any more: remove the mapcount arg. [PATCH] rmap 22 flush_dcache_mmap_lock From: Hugh Dickins arm and parisc __flush_dcache_page have been scanning the i_mmap(_shared) list without locking or disabling preemption. That may be even more unsafe now it's a prio tree instead of a list. It looks like we cannot use i_shared_lock for this protection: most uses of flush_dcache_page are okay, and only one would need lock ordering fixed (get_user_pages holds page_table_lock across flush_dcache_page); but there's a few (e.g. in net and ntfs) which look as if they're using it in I/O completion - and it would be restrictive to disallow it there. So, on arm and parisc only, define flush_dcache_mmap_lock(mapping) as spin_lock_irq(&(mapping)->tree_lock); on i386 (and other arches left to the next patch) define it away to nothing; and use where needed. 
While updating locking hierarchy in filemap.c, remove two layers of the fossil record from add_to_page_cache comment: no longer used for swap. I believe all the #includes will work out, but have only built i386. I can see several things about this patch which might cause revulsion: the name flush_dcache_mmap_lock? the reuse of the page radix_tree's tree_lock for this different purpose? spin_lock_irqsave instead? can't we somehow get i_shared_lock to handle the problem? [PATCH] rmap 23 empty flush_dcache_mmap_lock From: Hugh Dickins Most architectures (like i386) do nothing in flush_dcache_page, or don't scan i_mmap in flush_dcache_page, so don't need flush_dcache_mmap_lock to do anything: define it and flush_dcache_mmap_unlock away. Noticed arm26, cris, h8300 still defining flush_page_to_ram: delete it again. [PATCH] rmap 24 no rmap fastcalls From: Hugh Dickins I like CONFIG_REGPARM, even when it's forced on: because it's easy to force off for debugging - easier than editing out scattered fastcalls. Plus I've never understood why we make function foo a fastcall, but function bar not. Remove fastcall directives from rmap. And fix comment about mremap_moved race: it only applies to anon pages. [PATCH] rmap 27 memset 0 vma From: Hugh Dickins We're NULLifying more and more fields when initializing a vma (mpol_set_vma_default does that too, if configured to do anything). Now use memset to avoid specifying fields, and save a little code too. (Yes, I realize anon_vma will want to set vm_pgoff non-0, but I think that will be better handled at the core, since anon vm_pgoff is negotiable up until an anon_vma is actually assigned.) [PATCH] rmap 28 remove_vm_struct From: Hugh Dickins The callers of remove_shared_vm_struct then proceed to do several more identical things: gather them together in remove_vm_struct. 
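The memset initialization described in rmap 27 above can be illustrated with a small userspace sketch. The struct toy_vma and toy_vma_init below are hypothetical stand-ins, not the kernel's vm_area_struct or any real kernel function; the sketch only shows the pattern of zeroing the whole structure first and then setting the few fields that need non-zero values.

```c
#include <string.h>

/* Hypothetical, much-reduced stand-in for vm_area_struct. */
struct toy_vma {
	void *vm_mm;
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	unsigned long vm_pgoff;
	void *vm_file;
	void *vm_private_data;
	struct toy_vma *vm_next;
};

/*
 * Zero every field in one go, then fill in only what the caller knows.
 * Any field added to the struct later starts out zero without this
 * function needing to change. Assumes all-bits-zero is a null pointer,
 * which holds on the platforms Linux runs on.
 */
static void toy_vma_init(struct toy_vma *vma, void *mm,
			 unsigned long start, unsigned long end,
			 unsigned long flags)
{
	memset(vma, 0, sizeof(*vma));
	vma->vm_mm = mm;
	vma->vm_start = start;
	vma->vm_end = end;
	vma->vm_flags = flags;
}
```

Compared with assigning NULL or 0 to each field by hand, this avoids missing a field when the structure grows, which is the motivation the patch description gives.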
[PATCH] rmap 29 VM_RESERVED safety From: Hugh Dickins From: Andrea Arcangeli Set VM_RESERVED in videobuf_mmap_mapper, to warn do_no_page and swapout not to worry about its pages. Set VM_RESERVED in ia64_elf32_init, it too provides an unusual nopage which might surprise higher level checks. Future safety: they don't actually pose a problem in this current tree. [PATCH] rmap 30 fix bad mapcount From: Hugh Dickins From: Andrea Arcangeli page_alloc.c's bad_page routine should reset a bad mapcount; and it's more revealing to show the bad mapcount than just the boolean mapped. [PATCH] rmap 31 unlikely bad memory From: Hugh Dickins From: Andrea Arcangeli Sprinkle unlikelys throughout mm/memory.c, wherever we see a pgd_bad or a pmd_bad; likely or unlikely on pte_same or !pte_same. Put the jump in the error return from do_no_page, not in the fast path. [PATCH] rmap 32 zap_pmd_range wrap From: Hugh Dickins From: Andrea Arcangeli zap_pmd_range, alone of all those page_range loops, lacks the check for whether address wrapped. Hugh is in doubt as to whether this makes any difference to any config on any arch, but eager to fix the odd one out. [PATCH] rmap 33 install_arg_page vma From: Hugh Dickins anon_vma will need to pass vma to put_dirty_page, so change it and its various callers (setup_arg_pages and its 32-on-64-bit arch variants); and please, let's rename it to install_arg_page. Earlier attempt to do this (rmap 26 __setup_arg_pages) tried to clean up those callers instead, but failed to boot: so now apply rmap 27's memset initialization of vmas to these callers too; which relieves them from needing the recently included linux/mempolicy.h. While there, moved install_arg_page's flush_dcache_page up before page_table_lock - doesn't in fact matter at all, just saves one worry when researching flush_dcache_page locking constraints. [PATCH] rmap 34 vm_flags page_table_lock From: Hugh Dickins First of a batch of seven rmap patches, based on 2.6.6-mm3. 
Probably the final batch: remaining issues outstanding can have isolated patches. The first half of the batch is good for anonmm or anon_vma, the second half of the batch replaces my anonmm rmap by Andrea's anon_vma rmap. Judge for yourselves which you prefer. I do think I was wrong to call anon_vma more complex than anonmm (its lists are easier to understand than my refcounting), and I'm happy with its vma merging after the last patch. It just comes down to whether we can spare the extra 24 bytes (maximum, on 32-bit) per vma for its advantages in swapout and mremap. rmap 34 vm_flags page_table_lock Why do we guard vm_flags mods with page_table_lock when it's already down_write guarded by mmap_sem? There's probably a historical reason, but no sign of any need for it now. Andrea added a comment and removed the instance from mprotect.c, Hugh plagiarized his comment and removed the instances from madvise.c and mlock.c. Huge leap in scalability... not expected; but this should stop people asking why those spinlocks. 
[PATCH] rmap 35 mmap.c cleanups

From: Hugh Dickins

Before some real vma_merge work in mmap.c in the next patch, a patch of miscellaneous cleanups to cut down the noise:

- remove rb_parent arg from vma_merge: mm->mmap can do that case
- scatter pgoff_t around to ingratiate myself with the boss
- reorder is_mergeable_vma tests, vm_ops->close is least likely
- can_vma_merge_before take combined pgoff+pglen arg (from Andrea)
- rearrange do_mmap_pgoff's ever-confusing anonymous flags switch
- comment do_mmap_pgoff's mysterious (vm_flags & VM_SHARED) test
- fix ISO C90 warning on browse_rb if building with DEBUG_MM_RB
- stop that long MNT_NOEXEC line wrapping

Yes, buried in amidst these is indeed one pgoff replaced by "next->vm_pgoff - pglen" (reverting a mod of mine which took pgoff supplied by user too seriously in the anon case), and another pgoff replaced by 0 (reverting anon_vma mod which crept in with NUMA API): neither of them really matters, except perhaps in /proc/pid/maps.

[PATCH] rmap 36 mprotect use vma_merge

From: Hugh Dickins

Earlier on, in 2.6.6, we took the vma merging code out of mremap.c and let it rely on vma_merge instead (via copy_vma). Now take the vma merging code out of mprotect.c and let it rely on vma_merge too: so vma_merge becomes the sole vma merging engine. The fruit of this consolidation is that mprotect now merges file-backed vmas naturally. Make this change now because anon_vma will complicate the vma merging rules, let's keep them all in one place.

vma_merge remains where the decisions are made, whether to merge with prev and/or next; but now [addr,end) may be the latter part of prev, or first part or whole of next, whereas before it was always a new area.

vma_adjust carries out vma_merge's decision, but when sliding the boundary between vma and next, must temporarily remove next from the prio_tree too.
And it turned out (by oops) to have a surer idea of whether next needs to be removed than vma_merge, so the fput and freeing moves into vma_adjust. Too much decipherment of what's going on at the start of vma_adjust? Yes, and there's a delicate assumption that you may use vma_adjust in sliding a boundary, or splitting in two, or growing a vma (mremap uses it in that way), but not for simply shrinking a vma. Which is so, and must be so (how could pages mapped in the part to go, be zapped without first splitting?), but would feel better with some protection. __vma_unlink can then be moved from mm.h to mmap.c, and mm.h's more misleading than helpful can_vma_merge is deleted. [PATCH] rmap 37 page_add_anon_rmap vma From: Hugh Dickins Silly final patch for anonmm rmap: change page_add_anon_rmap's mm arg to vma arg like anon_vma rmap, to smooth the transition between them. [PATCH] rmap 38 remove anonmm rmap From: Hugh Dickins Before moving on to anon_vma rmap, remove now what's peculiar to anonmm rmap: the anonmm handling and the mremap move cows. Temporarily reduce page_referenced_anon and try_to_unmap_anon to stubs, so a kernel built with this patch will not swap anonymous at all. [PATCH] rmap 39 add anon_vma rmap From: Hugh Dickins Andrea Arcangeli's anon_vma object-based reverse mapping scheme for anonymous pages. Instead of tracking anonymous pages by pte_chains or by mm, this tracks them by vma. But because vmas are frequently split and merged (particularly by mprotect), a page cannot point directly to its vma(s), but instead to an anon_vma list of those vmas likely to contain the page - a list on which vmas can easily be linked and unlinked as they come and go. The vmas on one list are all related, either by forking or by splitting. 
This has three particular advantages over anonmm: that it can cope effortlessly with mremap moves; and no longer needs page_table_lock to protect an mm's vma tree, since try_to_unmap finds vmas via page -> anon_vma -> vma instead of using find_vma; and should use less cpu for swapout since it can locate its anonymous vmas more quickly. It does have disadvantages too: a lot more change in mmap.c to deal with anon_vmas, though small straightforward additions now that the vma merging has been refactored there; more lowmem needed for each anon_vma and vma structure; an additional restriction on the merging of vmas (cannot be merged if already assigned different anon_vmas, since then their pages will be pointing to different heads). (There would be no need to enlarge the vma structure if anonymous pages belonged only to anonymous vmas; but private file mappings accumulate anonymous pages by copy-on-write, so need to be listed in both anon_vma and prio_tree at the same time. A different implementation could avoid that by using anon_vmas only for purely anonymous vmas, and use the existing prio_tree to locate cow pages - but that would involve a long search for each single private copy, probably not a good idea.) Where before the vm_pgoff of a purely anonymous (not file-backed) vma was meaningless, now it represents the virtual start address at which that vma is mapped - which the standard file pgoff manipulations treat linearly as vmas are split and merged. But if mremap moves the vma, then it generally carries its original vm_pgoff to the new location, so pages shared with the old location can still be found. Magic. Hugh has massaged it somewhat: building on the earlier rmap patches, this patch is a fifth of the size of Andrea's original anon_vma patch. Please note that this posting will be his first sight of this patch, which he may or may not approve. 
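As a rough illustration of the page -> anon_vma -> vma lookup described in rmap 39 above, here is a userspace toy model. All struct and function names are hypothetical stand-ins (the real kernel structures carry locks, list_heads and much more), and PAGE_SHIFT is assumed to be 12. It assumes, as the description states, that an anonymous vma's vm_pgoff records the virtual start address (in pages) at which it was first mapped, and that a moved vma keeps that vm_pgoff.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12UL	/* assumed 4K pages */

struct toy_anon_vma;

/* Hypothetical, reduced stand-in for vm_area_struct. */
struct toy_vma {
	unsigned long vm_start;		/* first mapped address */
	unsigned long vm_end;		/* one past the last mapped address */
	unsigned long vm_pgoff;		/* original start address >> PAGE_SHIFT */
	struct toy_anon_vma *anon_vma;
	struct toy_vma *next_on_list;	/* next vma sharing this anon_vma */
};

/* Head of the list of related vmas (related by fork or by split). */
struct toy_anon_vma {
	struct toy_vma *head;
};

/* A page points at the anon_vma, not at any particular vma. */
struct toy_page {
	struct toy_anon_vma *anon_vma;
	unsigned long index;		/* fault address >> PAGE_SHIFT */
};

/* Link a vma onto an anon_vma's list. */
static void toy_anon_vma_link(struct toy_anon_vma *av, struct toy_vma *vma)
{
	assert(av != NULL && vma != NULL);
	vma->anon_vma = av;
	vma->next_on_list = av->head;
	av->head = vma;
}

/*
 * Address at which this vma would map the page, or 0 if the page falls
 * outside the vma. (0 as "not found" is a simplification for the toy.)
 */
static unsigned long toy_vma_address(const struct toy_page *page,
				     const struct toy_vma *vma)
{
	unsigned long addr;

	if (page->index < vma->vm_pgoff)
		return 0;
	addr = vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
	return addr < vma->vm_end ? addr : 0;
}

/* Walk page -> anon_vma -> vmas and count the vmas that could map it. */
static int toy_count_candidate_vmas(const struct toy_page *page)
{
	const struct toy_vma *vma;
	int n = 0;

	for (vma = page->anon_vma->head; vma; vma = vma->next_on_list)
		if (toy_vma_address(page, vma))
			n++;
	return n;
}
```

In this model, changing a vma's vm_start and vm_end while leaving vm_pgoff alone (as an mremap move would) still lets toy_vma_address locate the page at its new address, which is the "Magic" the description refers to.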
[PATCH] rmap 40 better anon_vma sharing From: Hugh Dickins anon_vma rmap will always necessarily be more restrictive about vma merging than before: according to the history of the vmas in an mm, they are liable to be allocated different anon_vma heads, and from that point on be unmergeable. Most of the time this doesn't matter at all; but in two cases it may matter. One case is that mremap refuses (-EFAULT) to span more than a single vma: so it is conceivable that some app has relied on vma merging prior to mremap in the past, and will now fail with anon_vma. Conceivable but unlikely, let's cross that bridge if we come to it: and the right answer would be to extend mremap, which should not be exporting the kernel's implementation detail of vma to user interface. The other case that matters is when a reasonable repetitive sequence of syscalls and faults ends up with a large number of separate unmergeable vmas, instead of the single merged vma it could have. Andrea's mprotect-vma-merging patch fixed some such instances, but left other plausible cases unmerged. There is no perfect solution, and the harder you try to allow vmas to be merged, the less efficient anon_vma becomes, in the extreme there being one to span the whole address space, from which hangs every private vma; but anonmm rmap is clearly superior to that extreme. Andrea's principle was that neighbouring vmas which could be mprotected into mergeable vmas should be allowed to share anon_vma: good insight. His implementation was to arrange this sharing when trying vma merge, but that seems to be too early. This patch sticks to the principle, but implements it in anon_vma_prepare, when handling the first write fault on a private vma: with better results. The drawback is that this first write fault needs an extra find_vma_prev (whereas prev was already to hand when implementing anon_vma sharing at try-to-merge time). 
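The sharing principle in rmap 40 above (neighbouring vmas which could be mprotected into mergeable vmas may share an anon_vma, decided at the first write fault) might be sketched as follows. This is a userspace toy under stated assumptions, not the patch's code: the structs, the flag values, and the function names toy_find_shareable_anon_vma and toy_anon_vma_prepare are all hypothetical, and "could be mprotected into mergeable" is reduced here to "adjacent, with flags equal apart from the read/write/exec bits".

```c
#include <stddef.h>

/* Illustrative flag values for the toy model. */
#define TOY_VM_READ	0x1UL
#define TOY_VM_WRITE	0x2UL
#define TOY_VM_EXEC	0x4UL
#define TOY_PROT_BITS	(TOY_VM_READ | TOY_VM_WRITE | TOY_VM_EXEC)

struct toy_anon_vma {
	int unused;
};

/* Hypothetical, reduced vma with direct neighbour pointers. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	struct toy_anon_vma *anon_vma;
	struct toy_vma *prev;
	struct toy_vma *next;
};

/* a directly precedes b, and only protection bits differ. */
static int toy_could_mprotect_into_merge(const struct toy_vma *a,
					 const struct toy_vma *b)
{
	return a->vm_end == b->vm_start &&
	       ((a->vm_flags ^ b->vm_flags) & ~TOY_PROT_BITS) == 0;
}

/* Look for a neighbour whose anon_vma this vma could share. */
static struct toy_anon_vma *
toy_find_shareable_anon_vma(const struct toy_vma *vma)
{
	if (vma->next && vma->next->anon_vma &&
	    toy_could_mprotect_into_merge(vma, vma->next))
		return vma->next->anon_vma;
	if (vma->prev && vma->prev->anon_vma &&
	    toy_could_mprotect_into_merge(vma->prev, vma))
		return vma->prev->anon_vma;
	return NULL;
}

/*
 * Called on the first write fault in a private vma: reuse a neighbour's
 * anon_vma when possible, otherwise take the freshly allocated one.
 */
static struct toy_anon_vma *
toy_anon_vma_prepare(struct toy_vma *vma, struct toy_anon_vma *fresh)
{
	if (!vma->anon_vma) {
		struct toy_anon_vma *shared = toy_find_shareable_anon_vma(vma);

		vma->anon_vma = shared ? shared : fresh;
	}
	return vma->anon_vma;
}
```

The toy keeps a prev pointer in each vma for brevity; the description notes that the real first write fault has to pay for an extra find_vma_prev to get at the previous vma.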
[PATCH] partial prefetch for vma_prio_tree_next

From: Rajesh Venkatasubramanian

This patch adds prefetches for walking a vm_set.list. Adding prefetches for prio tree traversals is tricky and may lead to cache thrashing. So this patch adds prefetches only when walking a vm_set.list.

I haven't done any benchmarks to show that this patch improves performance. However, this patch should help to improve performance when vm_set.lists are long, e.g., libc. Since we only prefetch vmas that are guaranteed to be used in the near future, this patch should not result in cache thrashing, theoretically.

I didn't add any NULL checks before prefetching because prefetch.h clearly says prefetch(0) is okay.

[PATCH] bogus sigaltstack calls by rt_sigreturn

There is a longstanding bug in the rt_sigreturn system call. This exists in both 2.4 and 2.6, and for almost every platform. I am referring to this code in sys_rt_sigreturn (arch/i386/kernel/signal.c):

	if (__copy_from_user(&st, &frame->uc.uc_stack, sizeof(st)))
		goto badframe;
	/* It is more difficult to avoid calling this function than to
	   call it and ignore errors. */
	/*
	 * THIS CANNOT WORK! "&st" is a kernel address, and "do_sigaltstack()"
	 * takes a user address (and verifies that it is a user address). End
	 * result: it does exactly _nothing_.
	 */
	do_sigaltstack(&st, NULL, regs->esp);

As the comment says, this is bogus. On vanilla i386 kernels, this is just harmlessly stupid--do_sigaltstack always does nothing and returns -EFAULT. However, this code actually bites users on kernels using Ingo Molnar's 4G/4G address space layout changes. There, some kernel stack address might very well be a lovely and readable user address as well. When that happens, we make a sigaltstack call with some random buffer, and then the fun begins.
To my knowledge, this has produced trouble in the real world only for 4G i386 kernels (RHEL and Fedora "hugemem" kernels) on machines that actually have several GB of physical memory (and in programs that are actually using sigaltstack and handling a lot of signals). However, the same clearly broken code has been blindly copied to most other architecture ports, and offhand I don't know the address space details of any other well enough to know if real kernel stack addresses and real user addresses are in fact disjoint as they are on i386 when not using the nonstandard 4GB address space layout.

The obvious intent of the call being there in the first place is to permit a signal handler to diddle its ucontext_t.uc_stack before returning, and have this effect a sigaltstack call on the signal handler return. This is not only an optimization vs doing the extra system call, but makes it possible to make a sigaltstack change when that handler itself was running on the signal stack. AFAICT this has never actually worked before, so certainly no one depends on it. But the code certainly suggests that someone intended at one time for that to be the behavior. Thus I am inclined to fix it so it works in that way, though it has not done so before. It would also be reasonable enough to simply rip out the bogus call and not have this functionality.

From the current state of code in both 2.4 and 2.6, there is no fathoming how this broken code came about. It's actually much simpler to just make it work! I can only presume that at some point in the past the sigaltstack implementation functions were different such that this made sense. Of the few ports I've looked at briefly, only the ppc/ppc64 porters (go paulus!) actually tried to understand what the i386 code was doing and implemented it correctly rather than just carefully transliterating the bug.

The patch below fixes only the i386 and x86_64 versions. The x86_64 patches I have not actually tested.
I think each and every arch (except ppc and ppc64) needs to make the corresponding fixes as well. Note that there is a function to fix for each native arch, and then one for each emulation flavor. The details differ minutely for getting the calls right in each emulation flavor, but I think that most or all of the arches with biarch/emulation support have similar enough code that each emulation flavor's fix will look very much like the arch/x86_64/ia32/ia32_signal.c patch here.

Linux 2.6.7-rc1