commit d3d373e0e3f51f335d8c722dd1340ab812fdf94b Merge: aceb91c 5084f89 Author: Linus Torvalds Date: Wed Feb 9 11:51:40 2011 -0800 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: console: Update Copyright virtio: console: Wake up outvq on host notifications commit aceb91cd351bc3a19a783c901fe8a8070d5f6fa9 Merge: ae8eed2 b8cf0e0 Author: Linus Torvalds Date: Wed Feb 9 11:45:21 2011 -0800 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cdrom: support devices that have check_events but not media_changed cfq-iosched: Don't wait if queue already has requests. blkio-throttle: Avoid calling blkiocg_lookup_group() for root group cfq: rename a function to give it more appropriate name cciss: make cciss_revalidate not loop through CISS_MAX_LUNS volumes unnecessarily. drivers/block/aoe/Makefile: replace the use of -objs with -y loop: queue_lock NULL pointer derefence in blk_throtl_exit drivers/block/Makefile: replace the use of -objs with -y blktrace: Don't output messages if NOTIFY isn't set. commit ae8eed2d0906bf0d8eb0c2a4651676a41d361297 Merge: 100b33c 02214dc Author: Linus Torvalds Date: Wed Feb 9 11:44:55 2011 -0800 Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: FIX: md: process hangs at wait_barrier after 0->10 takeover md_make_request: don't touch the bio after calling make_request md: Don't allow slot_store while resync/recovery is happening. md: don't clear curr_resync_completed at end of resync. md: Don't use remove_and_add_spares to remove failed devices from a read-only array Add raid1->raid0 takeover support md: Remove the AllReserved flag for component devices. md: don't abort checking spares as soon as one cannot be added. md: fix the test for finding spares in raid5_start_reshape. md: simplify some 'if' conditionals in raid5_start_reshape. md: revert change to raid_disks on failure. commit b8cf0e0e552ca48e9a00f518aeb4f5e03984022b Author: Simon Arlott Date: Wed Feb 9 14:21:07 2011 +0100 cdrom: support devices that have check_events but not media_changed Commit 93aae17af1172c40c6f74b7294e93a90c3cfaa5d ("sr: implement sr_check_events()") replaced the media_changed op with the check_events op in drivers/scsi/sr.c All users that check for the CDC_MEDIA_CHANGED capbility try both the check_events op and the media_changed op, but register_cdrom() was requiring media_changed. This patch fixes the capability checking. The cdrom_select_disc ioctl is also using the two operations, so they should be required for CDC_SELECT_DISC too. Signed-off-by: Simon Arlott Cc: Tejun Heo Cc: Kay Sievers Tested-by: Chris Clayton Signed-off-by: Jens Axboe commit 02a8f01b5a9f396d0327977af4c232d0f94c45fd Author: Justin TerAvest Date: Wed Feb 9 14:20:03 2011 +0100 cfq-iosched: Don't wait if queue already has requests. Commit 7667aa0630407bc07dc38dcc79d29cc0a65553c1 added logic to wait for the last queue of the group to become busy (have at least one request), so that the group does not lose out for not being continuously backlogged. The commit did not check for the condition that the last queue already has some requests. As a result, if the queue already has requests, wait_busy is set. Later on, cfq_select_queue() checks the flag, and decides that since the queue has a request now and wait_busy is set, the queue is expired. This results in early expiration of the queue. This patch fixes the problem by adding a check to see if queue already has requests. If it does, wait_busy is not set. As a result, time slices do not expire early. The queues with more than one request are usually buffered writers. Testing shows improvement in isolation between buffered writers. Cc: stable@kernel.org Signed-off-by: Justin TerAvest Reviewed-by: Gui Jianfeng Acked-by: Vivek Goyal Signed-off-by: Jens Axboe commit 5084f89303c0a138f66bf74662753f46878989bb Author: Amit Shah Date: Mon Jan 31 13:06:37 2011 +0530 virtio: console: Update Copyright Signed-off-by: Amit Shah Signed-off-by: Rusty Russell commit 2770c5ea501be69989a7acf54ec4cb3554c47191 Author: Amit Shah Date: Mon Jan 31 13:06:36 2011 +0530 virtio: console: Wake up outvq on host notifications The outvq needs to be woken up on host notifications so that buffers consumed by the host can be reclaimed, outvq freed, and application writes may proceed again. The need for this is now finally noticed when I have qemu patches ready to use nonblocking IO and flow control. CC: Hans de Goede CC: stable@kernel.org Signed-off-by: Amit Shah Signed-off-by: Rusty Russell Acked-by: Hans de Goede commit 02214dc5461c36da26a34014cab4e1bb484edba2 Author: Krzysztof Wojcik Date: Fri Feb 4 14:18:26 2011 +0100 FIX: md: process hangs at wait_barrier after 0->10 takeover Following symptoms were observed: 1. After raid0->raid10 takeover operation we have array with 2 missing disks. When we add disk for rebuild, recovery process starts as expected but it does not finish- it stops at about 90%, md126_resync process hangs in "D" state. 2. Similar behavior is when we have mounted raid0 array and we execute takeover to raid10. After this when we try to unmount array- it causes process umount hangs in "D" In scenarios above processes hang at the same function- wait_barrier in raid10.c. Process waits in macro "wait_event_lock_irq" until the "!conf->barrier" condition will be true. In scenarios above it never happens. Reason was that at the end of level_store, after calling pers->run, we call mddev_resume. This calls pers->quiesce(mddev, 0) with RAID10, that calls lower_barrier. However raise_barrier hadn't been called on that 'conf' yet, so conf->barrier becomes negative, which is bad. This patch introduces setting conf->barrier=1 after takeover operation. It prevents to become barrier negative after call lower_barrier(). Signed-off-by: Krzysztof Wojcik Signed-off-by: NeilBrown commit e91ece5590b3c728624ab57043fc7a05069c604a Author: Chris Mason Date: Mon Feb 7 19:21:48 2011 -0500 md_make_request: don't touch the bio after calling make_request md_make_request was calling bio_sectors() for part_stat_add after it was calling the make_request function. This is bad because the make_request function can free the bio and because the bi_size field can change around. The fix here was suggested by Jens Axboe. It saves the sector count before the make_request call. I hit this with CONFIG_DEBUG_PAGEALLOC turned on while trying to break his pretty fusionio card. Cc: Signed-off-by: Chris Mason Signed-off-by: NeilBrown commit c6751b2bde477f56ceef67aa1d298ce44e8e2e23 Author: NeilBrown Date: Wed Feb 2 11:57:13 2011 +1100 md: Don't allow slot_store while resync/recovery is happening. Activating a spare in an array while resync/recovery is already happening can lead the that spare being marked in-sync when it isn't really. So don't allow the 'slot' to be set (this activating the device) while resync/recovery is happening. Signed-off-by: NeilBrown commit 7281f8129c362436237b82c8c026494dd36479dc Author: NeilBrown Date: Mon Jan 31 14:30:27 2011 +1100 md: don't clear curr_resync_completed at end of resync. There is no need to set this to zero at this point. It will be set to zero by remove_and_add_spares or at the start of md_do_sync at the latest. And setting it to zero before MD_RECOVERY_RUNNING is cleared can make a 'zero' appear briefly in the 'sync_completed' sysfs attribute just as resync is finishing. So simply remove this setting to zero. Signed-off-by: NeilBrown commit a8c42c7f476b5bb39bb3a5b32d5473b9a46cadb9 Author: NeilBrown Date: Mon Jan 31 13:47:13 2011 +1100 md: Don't use remove_and_add_spares to remove failed devices from a read-only array remove_and_add_spares is called in two places where the needs really are very different. remove_and_add_spares should not be called on an array which is about to be reshaped as some extra devices might have been manually added and that would remove them. However if the array is 'read-auto', that will currently happen, which is bad. So in the 'ro != 0' case don't call remove_and_add_spares but simply remove the failed devices as the comment suggests is needed. Signed-off-by: NeilBrown commit fc3a08b85b7a4f6c1069e5f71f6ad40d925ff55b Author: Krzysztof Wojcik Date: Mon Jan 31 13:47:13 2011 +1100 Add raid1->raid0 takeover support This patch introduces raid 1 to raid0 takeover operation in kernel space. Signed-off-by: Krzysztof Wojcik Signed-off-by: Neil Brown commit f21e9ff7f77d41ceca4e1e5ee5a4efa5ad7a5e40 Author: NeilBrown Date: Mon Jan 31 12:10:09 2011 +1100 md: Remove the AllReserved flag for component devices. This flag is not needed and is used badly. Devices that are included in a native-metadata array are reserved exclusively for that array - and currently have AllReserved set. They all are bd_claimed for the rdev and so cannot be shared. Devices that are included in external-metadata arrays can be shared among multiple arrays - providing there is no overlap. These are bd_claimed for md in general - not for a particular rdev. When changing the amount of a device that is used in an array we need to check for overlap. This currently includes a check on AllReserved So even without overlap, sharing with an AllReserved device is not allowed. However the bd_claim usage already precludes sharing with these devices, so the test on AllReserved is not needed. And in fact it is wrong. As this is the only use of AllReserved, simply remove all usage and definition of AllReserved. Signed-off-by: NeilBrown commit 50da08409654e036c4c964a473567a61a654cb83 Author: NeilBrown Date: Mon Jan 31 11:57:43 2011 +1100 md: don't abort checking spares as soon as one cannot be added. As spares can be added manually before a reshape starts, we need to find them all to mark some of them as in_sync. Previously we would abort looking for spares when we found an unallocated spare what could not be added to the array (implying there was no room for new spares). However already-added spares could be later in the list, so we need to keep searching. Signed-off-by: NeilBrown commit 469518a3455c79619e9231aeffeffa2e2989f738 Author: NeilBrown Date: Mon Jan 31 11:57:43 2011 +1100 md: fix the test for finding spares in raid5_start_reshape. As spares can be added to the array before the reshape is started, we need to find and count them when checking there are enough. The array could have been degraded, so we need to check all devices, no just those out side of the range of devices in the array before the reshape. So instead of checking the index, check the In_sync flag as that reliably tells if the device is a spare or this purpose. Signed-off-by: NeilBrown commit 87a8dec91e15954f0cf86be6c21741d991d83621 Author: NeilBrown Date: Mon Jan 31 11:57:43 2011 +1100 md: simplify some 'if' conditionals in raid5_start_reshape. There are two consecutive 'if' statements. if (mddev->delta_disks >= 0) .... if (mddev->delta_disks > 0) The code in the second is equally valid if delta_disks == 0, and these two statements are the only place that 'added_devices' is used. So make them a single if statement, make added_devices a local variable, and re-indent it all. No functional change. Signed-off-by: NeilBrown commit de171cb9a52598cc023adceafc6c166112401386 Author: NeilBrown Date: Mon Jan 31 11:57:42 2011 +1100 md: revert change to raid_disks on failure. If we try to update_raid_disks and it fails, we should put 'delta_disks' back to zero. This is important because some code, such as slot_store, assumes that delta_disks has been validated. Signed-off-by: NeilBrown commit be2c6b1990904dbd43f3d9b90fa2c530504375cd Author: Vivek Goyal Date: Wed Jan 19 08:25:02 2011 -0700 blkio-throttle: Avoid calling blkiocg_lookup_group() for root group o Jeff Moyer was doing some testing on a RAM backed disk and blkiocg_lookup_group() showed up high overhead after memcpy(). Similarly somebody else reported that blkiocg_lookup_group() is eating 6% extra cpu. Though looking at the code I can't think why the overhead of this function is so high. One thing is that it is called with very high frequency (once for every IO). o For lot of folks blkio controller will be compiled in but they might not have actually created cgroups. Hence optimize the case of root cgroup where we can avoid calling blkiocg_lookup_group() if IO is happening in root group (common case). Reported-by: Jeff Moyer Signed-off-by: Vivek Goyal Acked-by: Jeff Moyer Signed-off-by: Jens Axboe commit ba5bd520f679c450fb6efa439618703bd0956daa Author: Vivek Goyal Date: Wed Jan 19 08:25:02 2011 -0700 cfq: rename a function to give it more appropriate name o Rename a function to give it more approprate name. We are calculating cfq queue slice and function name gives the impression as if cfq group slice length is being calculated. Signed-off-by: Vivek Goyal Signed-off-by: Jens Axboe commit 68264e9d6781f7163e92c517769bb470fa43f6cd Author: Stephen M. Cameron Date: Wed Jan 19 08:25:02 2011 -0700 cciss: make cciss_revalidate not loop through CISS_MAX_LUNS volumes unnecessarily. Signed-off-by: Stephen M. Cameron Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe commit a0700bdd0b0150ea445159b1dee587f1507c272f Author: Tracey Dent Date: Wed Jan 19 08:25:02 2011 -0700 drivers/block/aoe/Makefile: replace the use of -objs with -y Change Makefile to use -y instead of -objs because -objs is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent Cc: "Ed L. Cashin" Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe commit ee71a968672a9951aee6014c55511007596425bc Author: Sergey Senozhatsky Date: Wed Jan 19 08:25:02 2011 -0700 loop: queue_lock NULL pointer derefence in blk_throtl_exit Performing $ sudo mount -o loop -o umask=0 /dev/sdb1 /mnt/ mount: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so $ sudo modprobe -r loop results in oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [] do_raw_spin_lock+0x14/0x122 Process modprobe (pid: 6189, threadinfo ffff88009a898000, task ffff880154a88000) Call Trace: [] _raw_spin_lock_irq+0x4a/0x51 [] ? blk_throtl_exit+0x3b/0xa0 [] ? cancel_delayed_work_sync+0xd/0xf [] blk_throtl_exit+0x3b/0xa0 [] blk_release_queue+0x21/0x65 [] kobject_release+0x51/0x66 [] ? kobject_release+0x0/0x66 [] kref_put+0x43/0x4d [] kobject_put+0x47/0x4b [] blk_cleanup_queue+0x56/0x5b [] loop_exit+0x68/0x844 [loop] [] sys_delete_module+0x1e8/0x25b [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] system_call_fastpath+0x16/0x1b because of an attempt to acquire NULL queue_lock. I added the same lines as in blk_queue_make_request - index 44e18c0..49e6a54 100644`fall back to embedded per-queue lock'. Signed-off-by: Sergey Senozhatsky Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe commit 04de96c9c6981c5957aa5db39bbdc4d958d07efa Author: Tracey Dent Date: Wed Jan 19 08:25:02 2011 -0700 drivers/block/Makefile: replace the use of -objs with -y Change Makefile to use -y instead of -objs because -objs is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent Signed-off-by: Andrew Morton Signed-off-by: Jens Axboe commit 490da40d82b31c0562d3f5edb37810f492ca1c34 Author: Tao Ma Date: Wed Jan 19 10:51:44 2011 +0800 blktrace: Don't output messages if NOTIFY isn't set. Now if we enable blktrace, cfq has too many messages output to the trace buffer. It is fine if we don't specify any action mask. But if I do like this: blktrace /dev/sdb -a issue -a complete -o - | blkparse -i - I only want to see 'D' and 'C', while with the following command dd if=/mnt/ocfs2/test of=/dev/null bs=4k count=1 iflag=direct I will get(with a 2.6.37 vanilla kernel): 8,16 0 0 0.000000000 0 m N cfq3805 alloced 8,16 0 0 0.000004126 0 m N cfq3805 insert_request 8,16 0 0 0.000004884 0 m N cfq3805 add_to_rr 8,16 0 0 0.000008417 0 m N cfq workload slice:300 8,16 0 0 0.000009557 0 m N cfq3805 set_active wl_prio:0 wl_type:2 8,16 0 0 0.000010640 0 m N cfq3805 fifo= (null) 8,16 0 0 0.000011193 0 m N cfq3805 dispatch_insert 8,16 0 0 0.000012221 0 m N cfq3805 dispatched a request 8,16 0 0 0.000012802 0 m N cfq3805 activate rq, drv=1 8,16 0 1 0.000013181 3805 D R 114759 + 8 [dd] 8,16 0 2 0.000164244 0 C R 114759 + 8 [0] 8,16 0 0 0.000167997 0 m N cfq3805 complete rqnoidle 0 8,16 0 0 0.000168782 0 m N cfq3805 set_slice=100 8,16 0 0 0.000169874 0 m N cfq3805 arm_idle: 8 group_idle: 0 8,16 0 0 0.000170189 0 m N cfq schedule dispatch 8,16 0 0 0.000397938 0 m N cfq3805 slice expired t=0 8,16 0 0 0.000399763 0 m N cfq3805 sl_used=1 disp=1 charge=1 iops=0 sect=8 8,16 0 0 0.000400227 0 m N cfq3805 del_from_rr 8,16 0 0 0.000400882 0 m N cfq3805 put_queue See, there are 19 lines while I only need 2. I don't think it is appropriate for a user. So this patch will disable any messages if the BLK_TC_NOTIFY isn't set. Now the output for the same command will look like: 8,16 0 1 0.000000000 4908 D R 114759 + 8 [dd] 8,16 0 2 0.000146827 0 C R 114759 + 8 [0] Yes, it is what I want to see. Cc: Steven Rostedt Cc: Jeff Moyer Signed-off-by: Tao Ma Signed-off-by: Jens Axboe