commit ac7ac9f2b9bfd9b68a1571d27e4c8bebb4788914 Merge: ac89a91 8379e7c Author: Linus Torvalds Date: Sat Sep 5 13:38:37 2009 -0700 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: ocfs2_write_begin_nolock() should handle len=0 ocfs2: invalidate dentry if its dentry_lock isn't initialized. commit 8379e7c46cc48f51197dd663fc6676f47f2a1e71 Author: Sunil Mushran Date: Fri Sep 4 11:12:01 2009 -0700 ocfs2: ocfs2_write_begin_nolock() should handle len=0 Bug introduced by mainline commit e7432675f8ca868a4af365759a8d4c3779a3d922 The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran Cc: stable@kernel.org Signed-off-by: Joel Becker commit a1b08e75dff3dc18a88444803753e667bb1d126e Author: Tao Ma Date: Thu Aug 27 14:46:56 2009 +0800 ocfs2: invalidate dentry if its dentry_lock isn't initialized. In commit a5a0a630922a2f6a774b6dac19f70cb5abd86bb0, when ocfs2_attch_dentry_lock fails, we call an extra iput and reset dentry->d_fsdata to NULL. This resolve a bug, but it isn't completed and the dentry is still there. When we want to use it again, ocfs2_dentry_revalidate doesn't catch it and return true. That make future ocfs2_dentry_lock panic out. One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162. The resolution is to add a check for dentry->d_fsdata in revalidate process and return false if dentry->d_fsdata is NULL, so that a new ocfs2_lookup will be called again. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 2584e7986f235572d4b03bbe52fd1e85c1679b8e Merge: 7c0a57d c795b33 Author: Linus Torvalds Date: Mon Aug 24 14:41:28 2009 -0700 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/dlm: Wait on lockres instead of erroring cancel requests ocfs2: Add missing lock name ocfs2: Don't oops in ocfs2_kill_sb on a failed mount ocfs2: release the buffer head in ocfs2_do_truncate. ocfs2: Handle quota file corruption more gracefully commit c795b33ba171e41563ab7e25105c0cd4edd81cd7 Author: Goldwyn Rodrigues Date: Thu Aug 20 13:43:19 2009 -0500 ocfs2/dlm: Wait on lockres instead of erroring cancel requests In case a downconvert is queued, and a flock receives a signal, BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered because a lock cancel triggers a dlmunlock while an AST is scheduled. To avoid this, allow a LKM_CANCEL to pass through, and let it wait on __dlm_wait_on_lockres(). Signed-off-by: Goldwyn Rodrigues Acked-off-by: Mark Fasheh Signed-off-by: Joel Becker commit a8b88d3d49623ac701b5dc996cbd61219c793c7c Author: Jan Kara Date: Thu Aug 20 18:26:52 2009 +0200 ocfs2: Add missing lock name There is missing name for NFSSync cluster lock. This makes lockdep unhappy because we end up passing NULL to lockdep when initializing lock key. Fix it. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 5fd131893793567c361ae64cbeb28a2a753bbe35 Author: Jan Kara Date: Thu Jul 30 17:01:53 2009 +0200 ocfs2: Don't oops in ocfs2_kill_sb on a failed mount If we fail to mount the filesystem, we have to be careful not to dereference uninitialized structures in ocfs2_kill_sb. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 60e2ec48665b8495360ca4a6004c5cd52beb2bc1 Author: Tao Ma Date: Wed Aug 12 14:42:47 2009 +0800 ocfs2: release the buffer head in ocfs2_do_truncate. In ocfs2_do_truncate, we forget to release last_eb_bh which will cause memleak. So call brelse in the end. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit ada508274b8698a33cb0e5bd037db0f9dc781795 Author: Jan Kara Date: Mon Aug 3 18:24:21 2009 +0200 ocfs2: Handle quota file corruption more gracefully ocfs2_read_virt_blocks() does BUG when we try to read a block from a file beyond its end. Since this can happen due to filesystem corruption, it is not really an appropriate answer. Make ocfs2_read_quota_block() check the condition and handle it by calling ocfs2_error() and returning EIO. [ Modified to print ip_blkno in the error - Joel ] Reported-by: Tristan Ye Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit bc7af9ba154f648598bf92b391e446e31b09330a Merge: d58d2d1 b409d7a Author: Linus Torvalds Date: Thu Aug 13 11:17:40 2009 -0700 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits) ocfs2: Fix possible deadlock when extending quota file ocfs2: keep index within status_map[] ocfs2: Initialize the cluster we're writing to in a non-sparse extend ocfs2: Remove redundant BUG_ON in __dlm_queue_ast() ocfs2/quota: Release lock for error in ocfs2_quota_write. ocfs2: Define credit counts for quota operations ocfs2: Remove syncjiff field from quota info ocfs2: Fix initialization of blockcheck stats ocfs2: Zero out padding of on disk dquot structure ocfs2: Initialize blocks allocated to local quota file ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() ocfs2: Make global quota files blocksize aligned ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. ocfs2: Fix deadlock on umount ocfs2: Add extra credits and access the modified bh in update_edge_lengths. ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2: Fix error return in ocfs2_write_cluster() ocfs2: Fix compilation warning for fs/ocfs2/xattr.c ocfs2: Initialize count in aio_write before generic_write_checks ocfs2: log the actual return value of ocfs2_file_aio_write() ... commit b409d7a0ab46fe530efe52734984b4ed5d46c3eb Author: Jan Kara Date: Thu Aug 6 23:29:34 2009 +0200 ocfs2: Fix possible deadlock when extending quota file In OCFS2, allocator locks rank above transaction start. Thus we cannot extend quota file from inside a transaction less we could deadlock. We solve the problem by starting transaction not already in ocfs2_acquire_dquot() but only in ocfs2_local_read_dquot() and ocfs2_global_read_dquot() and we allocate blocks to quota files before starting the transaction. In case we crash, quota files will just have a few blocks more but that's no problem since we just use them next time we extend the quota file. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 8a57a9dda760ea7845390f1cd36f3eb2a49391d8 Author: Roel Kluin Date: Thu Aug 6 16:07:50 2009 -0700 ocfs2: keep index within status_map[] Do not exceed array status_map[] Signed-off-by: Roel Kluin Cc: Mark Fasheh Signed-off-by: Andrew Morton Signed-off-by: Joel Becker commit e7432675f8ca868a4af365759a8d4c3779a3d922 Author: Sunil Mushran Date: Thu Aug 6 16:12:58 2009 -0700 ocfs2: Initialize the cluster we're writing to in a non-sparse extend In a non-sparse extend, we correctly allocate (and zero) the clusters between the old_i_size and pos, but we don't zero the portions of the cluster we're writing to outside of pos<->len. It handles clustersize > pagesize and blocksize < pagesize. [Cleaned up by Joel Becker.] Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit ab57a40827d99e2d8e59066a56b93bf6c844c916 Author: Goldwyn Rodrigues Date: Fri Jul 31 14:28:02 2009 -0500 ocfs2: Remove redundant BUG_ON in __dlm_queue_ast() We BUG_ON() the same thing twice. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Joel Becker commit e9956fae7dbce6ac770e7a00436c496ddbbd3215 Author: Tao Ma Date: Thu Jul 30 16:07:10 2009 +0800 ocfs2/quota: Release lock for error in ocfs2_quota_write. ocfs2_quota_write needs to release the lock if it fails to read quota block. So use "goto out" instead of "return err". Signed-off-by: Tao Ma Acked-by: Jan Kara Signed-off-by: Joel Becker commit 0584974a77796581eb3a64b6c5005edac4a95128 Author: Jan Kara Date: Wed Jul 22 13:17:21 2009 +0200 ocfs2: Define credit counts for quota operations Numbers of needed credits for some quota operations were written as raw numbers. Create appropriate defines instead. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 4539f1df25bcd0fdf0d8a5e2c92de6bece83c7a0 Author: Jan Kara Date: Wed Jul 22 13:17:20 2009 +0200 ocfs2: Remove syncjiff field from quota info syncjiff is just a converted value of syncms. Some places which are updating syncms forgot to update syncjiff as well. Since the conversion is just a simple division / multiplication and it does not happen frequently, just remove the syncjiff field to avoid forgotten conversions. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 1c1d9793ff6720531c0125a28d321f283716e32f Author: Jan Kara Date: Wed Jul 22 13:17:19 2009 +0200 ocfs2: Fix initialization of blockcheck stats We just set blockcheck stats to zeros but we should also properly initialize the spinlock there. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 7669f54c55df225cbb2db939a6c9f111445ffc1c Author: Jan Kara Date: Wed Jul 22 13:17:18 2009 +0200 ocfs2: Zero out padding of on disk dquot structure Padding fields of on-disk dquot structure were not zeroed. Zero them so that it's easier to use them later. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 0e7f387bf386c99e8ee322649fe4fc23b9cccc6c Author: Jan Kara Date: Wed Jul 22 13:17:17 2009 +0200 ocfs2: Initialize blocks allocated to local quota file When we extend local quota file, we should initialize data in newly allocated block. Firstly because on recovery we could parse bogus data, secondly so that block checksums are properly computed. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 4b3fa1904c0d192461ebba692e4940a334b74353 Author: Jan Kara Date: Wed Jul 22 13:17:16 2009 +0200 ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() In a code path extending local quota files we marked new header buffer uptodate only after calling ocfs2_journal_access_dq() which triggers a bug. Fix it and also call ocfs2 variant of the function marking buffer uptodate. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit b57ac2c43e66cb4aa76c9d2d0e9e0e04f19cc5a6 Author: Jan Kara Date: Wed Jul 22 13:17:15 2009 +0200 ocfs2: Make global quota files blocksize aligned Change i_size of global quota files so that it always remains aligned to block size. This is mainly because the end of quota block may contain checksum (if checksumming is enabled) and it's a bit awkward for it to be "outside" of quota file (and it makes life harder for ocfs2-tools). Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 82e12644cf5227dab15201fbcaf0ca6330ebd70f Author: Tao Ma Date: Thu Jul 23 08:12:58 2009 +0800 ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. In ocfs2_adjust_adjacent_records, we will adjust adjacent records according to the extent_list in the lower level. But actually the lower level tree will either be a leaf or a branch. If we only use ocfs2_is_empty_extent we will meet with some problem if the lower tree is a branch (tree_depth > 1). So use !ocfs2_rec_clusters instead. And actually only the leaf record can have holes. So add a BUG_ON for non-leaf branch. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a Author: Jan Kara Date: Mon Jul 20 12:12:36 2009 +0200 ocfs2: Fix deadlock on umount In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock put process into ocfs2_wq. This causes problems during umount because ocfs2_wq can drop references to inodes while they are being invalidated by invalidate_inodes() causing all sorts of nasty things (invalidate_inodes() ending in an infinite loop, "Busy inodes after umount" messages etc.). We fix the problem by stopping ocfs2_wq from doing any further releasing of inode references on the superblock being unmounted, wait until it finishes the current round of releasing and finally cleaning up all the references in dentry_lock_list from ocfs2_put_super(). The issue was tracked down by Tao Ma . Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 3c5e10683e684ef45614c9071847e48f633d9806 Author: Tao Ma Date: Tue Jul 21 15:42:05 2009 +0800 ocfs2: Add extra credits and access the modified bh in update_edge_lengths. In normal tree rotation left process, we will never touch the tree branch above subtree_index and ocfs2_extend_rotate_transaction doesn't reserve the credits for them either. But when we want to delete the rightmost extent block, we have to update the rightmost records for all the rightmost branch(See ocfs2_update_edge_lengths), so we have to allocate extra credits for them. What's more, we have to access them also. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 1f4cea3790bf44137192f441ee84af256e3bf809 Author: Wengang Wang Date: Mon Jul 13 11:38:58 2009 +0800 ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2_get_block() does no allocation. Hole filling for writes should have happened farther up in the call chain. We detect this case and print an error, but we then continue with the function. We should be exiting immediately. Signed-off-by: Wengang Wang Signed-off-by: Joel Becker commit cbfa9639aea5007313b75ec43b689d5f9e0c037d Author: Wengang Wang Date: Mon Jul 13 11:38:23 2009 +0800 ocfs2: Fix error return in ocfs2_write_cluster() A typo caused ocfs2_write_cluster() to return 0 in some error cases. Fix it. Signed-off-by: Wengang Wang Signed-off-by: Joel Becker commit 44d8e4e1e9b941065eb7578669fe51eb65c73e63 Author: Subrata Modak Date: Tue Jul 14 01:19:31 2009 +0530 ocfs2: Fix compilation warning for fs/ocfs2/xattr.c gcc 4.4.1 generates the following build warning on i386: CC [M] fs/ocfs2/xattr.o fs/ocfs2/xattr.c: In function ‘ocfs2_xattr_block_get’: fs/ocfs2/xattr.c:1055: warning: ‘block_off’ may be used uninitialized in this function The following fix is based on a similar approach by David Howells few days back: http://lkml.org/lkml/2009/7/9/109, Signed-off-by: Subrata Modak, Signed-off-by: Joel Becker commit cefcb800fa9536bb6f29485c53e6c82a65b0c022 Author: Goldwyn Rodrigues Date: Sat Jul 11 10:57:27 2009 -0500 ocfs2: Initialize count in aio_write before generic_write_checks generic_write_checks() expects count to be initialized to the size of the write. Writes to files open with O_DIRECT|O_LARGEFILE write 0 bytes because count is uninitialized. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Joel Becker commit 405f55712dfe464b3240d7816cc4fe4174831be2 Author: Alexey Dobriyan Date: Sat Jul 11 22:08:37 2009 +0400 headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan Signed-off-by: Linus Torvalds commit 812e7a6a43fc34bc8f70c2b80db4ea5997d66ea8 Author: Wengang Wang Date: Fri Jul 10 13:26:04 2009 +0800 ocfs2: log the actual return value of ocfs2_file_aio_write() in ocfs2_file_aio_write(), log_exit() could don't log the value which is really returned. this patch fixes it. Signed-off-by: Wengang Wang Signed-off-by: Joel Becker commit 17ae26b669886efe237b77439e43eb390fceb119 Author: Jeff Liu Date: Tue Jul 7 15:51:40 2009 +0800 ocfs2: trivial fix for s/migrate/migration/ in dlmrecovery.c logging in dlmrecovery.c:1121, replace 'migrate' to 'migration' to keep the consistency by comparing to other lines with the similar log info in the same file. Signed-off-by: Jeff Liu Signed-off-by: Joel Becker commit 8b712cd58adfe6aeeb0be4ecc011dc35620719e7 Author: Jeff Mahoney Date: Tue Jul 7 17:22:12 2009 -0400 ocfs2: Fixup orphan scan cleanup after failed mount If the mount fails for any reason, ocfs2_dismount_volume calls ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init be called to setup the mutex and work queues, but that doesn't happen if the mount has failed and we oops accessing an uninitialized work queue. This patch splits the init and startup of the orphan scan, eliminating the oops. Signed-off-by: Jeff Mahoney Signed-off-by: Joel Becker commit cf5434e894a17bb8385997adc6d56642055a85d6 Merge: 7b58fc2 d246ab3 Author: Linus Torvalds Date: Tue Jun 23 19:36:02 2009 -0700 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. ocfs2: Add lockdep annotations vfs: Set special lockdep map for dirs only if not set by fs ocfs2: Disable orphan scanning for local and hard-ro mounts ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() ocfs2: Stop orphan scan as early as possible during umount ocfs2: Fix ocfs2_osb_dump() ocfs2: Pin journal head before accessing jh->b_committed_data ocfs2: Update atime in splice read if necessary. ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API. commit d246ab307d1d003c80fe279897dea22bf52b6e41 Author: Tao Ma Date: Thu Jun 18 13:12:06 2009 +0800 ocfs2/trivial: Wrap ocfs2_sysfile_cluster_lock_key within define. Actually ocfs2_sysfile_cluster_lock_key is only used if we enable CONFIG_DEBUG_LOCK_ALLOC. Wrap it so that we can avoid a building warning. fs/ocfs2/sysfile.c:53: warning: ‘ocfs2_sysfile_cluster_lock_key’ defined but not used Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit cb25797d451dc774d9dbc402a65f16a0e32199fe Author: Jan Kara Date: Thu Jun 4 15:26:50 2009 +0200 ocfs2: Add lockdep annotations Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit df152c241df9e9d2b9a65d37bd02961abe7f591a Author: Sunil Mushran Date: Mon Jun 22 11:40:07 2009 -0700 ocfs2: Disable orphan scanning for local and hard-ro mounts Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit 3211949f8998dde71d9fe2e063de045ece5e0473 Author: Sunil Mushran Date: Fri Jun 19 16:53:18 2009 -0700 ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init() We don't access the LVB in our ocfs2_*_lock_res_init() functions. Since the LVB can become invalid during some cluster recovery operations, the dlmglue must be able to handle an uninitialized LVB. For the orphan scan lock, we initialized an uninitialzed LVB with our scan sequence number plus one. This starts a normal orphan scan cycle. Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit 692684e19e317a374c18e70a44d6413e51f71c11 Author: Sunil Mushran Date: Fri Jun 19 16:53:17 2009 -0700 ocfs2: Stop orphan scan as early as possible during umount Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit c3d38840abaa45c1c5a5fabbb8ffc9a0d1a764d1 Author: Sunil Mushran Date: Fri Jun 19 14:45:55 2009 -0700 ocfs2: Fix ocfs2_osb_dump() Skip printing information that is not valid for local mounts. Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit 94e41ecfe0f202df948fdbb19a53308a58cf2184 Author: Sunil Mushran Date: Fri Jun 19 14:45:54 2009 -0700 ocfs2: Pin journal head before accessing jh->b_committed_data This patch adds jbd_lock_bh_state() and jbd_unlock_bh_state() around accessses to jh->b_committed_data. Fixes oss bugzilla#1131 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1131 Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit 1962f39abbb2d5643a7d59169422661a2d58793d Author: Tao Ma Date: Fri Jun 19 15:36:52 2009 +0800 ocfs2: Update atime in splice read if necessary. We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so that we can update atime in splice read if necessary. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 1c520dfbf391e1617ef61553f815b8006a066c44 Author: Joel Becker Date: Fri Jun 19 15:14:13 2009 -0700 ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API. The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and the DLM cannot reconstruct its state. Clients of the DLM need to know this. ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses track of the state. This is not a standard behavior, but ocfs2 has always relied on it. Thus, an o2dlm LVB is always "valid". ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When fs/dlm loses track of an LVBs state, it sets a flag (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of the LVB may be garbage or merely stale. ocfs2 doesn't want to try to guess at the validity of the stale LVB. Instead, it should be checking the VALNOTVALID flag. As this is the 'standard' way of treating LVBs, we will promote this behavior. We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when the LVB is valid. o2dlm will always return valid, while fs/dlm will check VALNOTVALID. Signed-off-by: Joel Becker Acked-by: Mark Fasheh commit 90c699a9ee4be165966d40f1837909ccb8890a68 Author: Bartlomiej Zolnierkiewicz Date: Fri Jun 19 08:08:50 2009 +0200 block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Jens Axboe commit 300df7dc89cc276377fc020704e34875d5c473b6 Merge: 661adc4 9af0b38 Author: Linus Torvalds Date: Tue Jun 16 12:11:57 2009 -0700 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/net: Use wait_event() in o2net_send_message_vec() ocfs2: Adjust rightmost path in ocfs2_add_branch. ocfs2: fdatasync should skip unimportant metadata writeout ocfs2: Remove redundant gotos in ocfs2_mount_volume() ocfs2: Add statistics for the checksum and ecc operations. ocfs2 patch to track delayed orphan scan timer statistics ocfs2: timer to queue scan of all orphan slots ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories ocfs2: Fix possible deadlock in quota recovery ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() ocfs2: Fix lock inversion in ocfs2_local_read_info() ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() ocfs2: update comments in masklog.h ocfs2: Don't printk the error when listing too many xattrs. commit 9af0b38ff3f4f79c62dd909405b113bf7c1a23aa Author: Sunil Mushran Date: Thu Jun 11 11:02:03 2009 -0700 ocfs2/net: Use wait_event() in o2net_send_message_vec() Replace wait_event_interruptible() with wait_event() in o2net_send_message_vec(). This is because this function is called by the dlm that expects signals to be blocked. Fixes oss bugzilla#1126 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1126 Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit 6b791bcc8b2ae21daf95d18cff2f1eca7a64c9a5 Author: Tao Ma Date: Fri Jun 12 14:18:36 2009 +0800 ocfs2: Adjust rightmost path in ocfs2_add_branch. In ocfs2_add_branch, we use the rightmost rec of the leaf extent block to generate the e_cpos for the newly added branch. In the most case, it is OK but if the parent extent block's rightmost rec covers more clusters than the leaf does, it will cause kernel panic if we insert some clusters in it. The message is something like: (7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression: le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count) (7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28, next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1 [] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2] [] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2] [] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2] ... The panic can be easily reproduced by the following small test case (with bs=512, cs=4K, and I remove all the error handling so that it looks clear enough for reading). int main(int argc, char **argv) { int fd, i; char buf[5] = "test"; fd = open(argv[1], O_RDWR|O_CREAT); for (i = 0; i < 30; i++) { lseek(fd, 40960 * i, SEEK_SET); write(fd, buf, 5); } ftruncate(fd, 1146880); lseek(fd, 1126400, SEEK_SET); write(fd, buf, 5); close(fd); return 0; } The reason of the panic is that: the 30 writes and the ftruncate makes the file's extent list looks like: Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 280 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 Now another write at 1126400(275 cluster) whiich will write at the gap between 271 and 280 will trigger ocfs2_add_branch, but the result after the function looks like: Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 280 86183 1 271 0 143592 So the extent record is intersected and make the following operation bug out. This patch just try to remove the gap before we add the new branch, so that the root(branch) rightmost rec will cover the same right position. So in the above case, before adding branch the tree will be changed to Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 271 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 And after branch add, the tree looks like Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 271 86183 1 271 0 143592 Signed-off-by: Tao Ma Acked-by: Mark Fasheh Signed-off-by: Joel Becker commit 337eb00a2c3a421999c39c94ce7e33545ee8baa7 Author: Alessio Igor Bogani Date: Tue May 12 15:10:54 2009 +0200 Push BKL down into ->remount_fs() [xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani Signed-off-by: Al Viro commit 6cfd0148425e528b859b26e436b01f23f6926224 Author: Christoph Hellwig Date: Tue May 5 15:40:36 2009 +0200 push BKL down into ->put_super Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro commit 94cb993f2ee99f3a9318e7b4dbb383497c4bedea Author: Christoph Hellwig Date: Mon Apr 27 09:46:44 2009 -0400 ocfs2: remove ->write_super and stop maintaining ->s_dirt Signed-off-by: Christoph Hellwig Acked-by: Joel Becker Signed-off-by: Al Viro commit e04cc15f52eb072937cec2bbd8499f37afe5e1ef Author: Hisashi Hifumi Date: Tue Jun 9 16:47:45 2009 +0900 ocfs2: fdatasync should skip unimportant metadata writeout In ocfs2, fdatasync and fsync are identical. I think fdatasync should skip committing transaction when inode->i_state is set just I_DIRTY_SYNC and this indicates only atime or/and mtime updates. Following patch improves fdatasync throughput. #sysbench --num-threads=16 --max-requests=300000 --test=fileio --file-block-size=4K --file-total-size=16G --file-test-mode=rndwr --file-fsync-mode=fdatasync run Results: -2.6.30-rc8 Test execution summary: total time: 107.1445s total number of events: 119559 total time taken by event execution: 116.1050 per-request statistics: min: 0.0000s avg: 0.0010s max: 0.1220s approx. 95 percentile: 0.0016s Threads fairness: events (avg/stddev): 7472.4375/303.60 execution time (avg/stddev): 7.2566/0.64 -2.6.30-rc8-patched Test execution summary: total time: 86.8529s total number of events: 300016 total time taken by event execution: 24.3077 per-request statistics: min: 0.0000s avg: 0.0001s max: 0.0336s approx. 95 percentile: 0.0001s Threads fairness: events (avg/stddev): 18751.0000/718.75 execution time (avg/stddev): 1.5192/0.05 Signed-off-by: Hisashi Hifumi Acked-by: Mark Fasheh Signed-off-by: Joel Becker commit 06c59bb896ce23c56829f6acf667faebd51dd3b8 Author: Tao Ma Date: Tue May 19 06:47:20 2009 +0800 ocfs2: Remove redundant gotos in ocfs2_mount_volume() Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 73be192b17e43b6dc4f492dab41d70ab5b9d2908 Author: Joel Becker Date: Tue Jan 6 14:57:08 2009 -0800 ocfs2: Add statistics for the checksum and ecc operations. It would be nice to know how often we get checksum failures. Even better, how many of them we can fix with the single bit ecc. So, we add a statistics structure. The structure can be installed into debugfs wherever the user wants. For ocfs2, we'll put it in the superblock-specific debugfs directory and pass it down from our higher-level functions. The stats are only registered with debugfs when the filesystem supports metadata ecc. Signed-off-by: Joel Becker commit 15633a220ffe74fc61bc8117e6a89a494011ea3d Author: Srinivas Eeda Date: Wed Jun 3 17:02:56 2009 -0700 ocfs2 patch to track delayed orphan scan timer statistics Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: Srinivas Eeda Signed-off-by: Joel Becker commit 83273932fbefb6ceef9c0b82ac4d23900728f4d9 Author: Srinivas Eeda Date: Wed Jun 3 17:02:55 2009 -0700 ocfs2: timer to queue scan of all orphan slots When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda Signed-off-by: Joel Becker commit edd45c08499a3e9d4c25431cd2b6a9ce5f692c92 Author: Jan Kara Date: Tue Jun 2 14:24:03 2009 +0200 ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin(). So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir() to also use this lock ordering. Signed-off-by: Jan Kara Acked-by: Mark Fasheh Signed-off-by: Joel Becker commit 80d73f15d12f087f3fe074f8a4d6e5c5624f2b47 Author: Jan Kara Date: Tue Jun 2 14:24:02 2009 +0200 ocfs2: Fix possible deadlock in quota recovery In ocfs2_finish_quota_recovery() we acquired global quota file lock and started recovering local quota file. During this process we need to get quota structures, which calls ocfs2_dquot_acquire() which gets global quota file lock again. This second lock can block in case some other node has requested the quota file lock in the mean time. Fix the problem by moving quota file locking down into the function where it is really needed. Then dqget() or dqput() won't be called with the lock held. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 65bac575e35915801ea518b9d8d8824367d125c8 Author: Jan Kara Date: Tue Jun 2 14:24:01 2009 +0200 ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() We called vfs_dq_transfer() with global quota file lock held. This can lead to deadlocks as if vfs_dq_transfer() has to allocate new quota structure, it calls ocfs2_dquot_acquire() which tries to get quota file lock again and this can block if another node requested the lock in the mean time. Since we have to call vfs_dq_transfer() with transaction already started and quota file lock ranks above the transaction start, we cannot just rely on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock if they need it. We fix the problem by acquiring pointers to all quota structures needed by vfs_dq_transfer() already before calling the function. By this we are sure that all quota structures are properly allocated and they can be freed only after we drop references to them. Thus we don't need quota file lock anywhere inside vfs_dq_transfer(). Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit b4c30de39a2596503e888a7b47d19792f25913d6 Author: Jan Kara Date: Tue Jun 2 14:24:00 2009 +0200 ocfs2: Fix lock inversion in ocfs2_local_read_info() This function is called with dqio_mutex held but it has to acquire lock from global quota file which ranks above this lock. This is not deadlockable lock inversion since this code path is take only during mount when noone else can race with us but let's clean this up to silence lockdep. We just drop the dqio_mutex in the beginning of the function and reacquire it in the end since we don't need it - noone can race with us at this moment. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit 4e8a301929bfa017e6ffe11e3cf78ccaf8492801 Author: Jan Kara Date: Tue Jun 2 14:23:59 2009 +0200 ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() It is not possible to get a read lock and then try to get the same write lock in one thread as that can block on downconvert being requested by other node leading to deadlock. So first drop the quota lock for reading and only after that get it for writing. Signed-off-by: Jan Kara Signed-off-by: Joel Becker commit e1defc4ff0cf57aca6c5e3ff99fa503f5943c1f1 Author: Martin K. Petersen Date: Fri May 22 17:17:49 2009 -0400 block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe commit 2b53bc7bff17341d8b5ac12115f5c2363638e628 Author: Coly Li Date: Tue May 5 20:03:28 2009 +0800 ocfs2: update comments in masklog.h In the mainline ocfs2 code, the interface for masklog is in files under /sys/fs/o2cb/masklog, but the comments in fs/ocfs2/cluster/masklog.h reference the old /proc interface. They are out of date. This patch modifies the comments in cluster/masklog.h, which also provides a bash script example on how to change the log mask bits. Signed-off-by: Coly Li Signed-off-by: Joel Becker commit a46fa684fcb7001d79c97f2968696997b3b79064 Author: Tao Ma Date: Mon May 4 05:18:09 2009 +0800 ocfs2: Don't printk the error when listing too many xattrs. Currently the kernel defines XATTR_LIST_MAX as 65536 in include/linux/limits.h. This is the largest buffer that is used for listing xattrs. But with ocfs2 xattr tree, we actually have no limit for the number. If filesystem has more names than can fit in the buffer, the kernel logs will be pollluted with something like this when listing: (27738,0):ocfs2_iterate_xattr_buckets:3158 ERROR: status = -34 (27738,0):ocfs2_xattr_tree_list_index_block:3264 ERROR: status = -34 So don't print "ERROR" message as this is not an ocfs2 error. Signed-off-by: Tao Ma Signed-off-by: Joel Becker