commit eda4b716ea1f2a647a39cebae66b3fae4c4b80e4 Merge: 55fb78a 7d8f987 Author: Linus Torvalds Date: Thu Dec 23 16:36:48 2010 -0800 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Fix system inodes cache overflow. ocfs2: Hold ip_lock when set/clear flags for indexed dir. ocfs2: Adjust masklog flag values Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. ocfs2/dlm: Migrate lockres with no locks if it has a reference commit 7d8f98769e7f4bc29c38789daeb416c6a7d7c241 Author: Tao Ma Date: Wed Dec 22 17:50:30 2010 +0800 ocfs2: Fix system inodes cache overflow. When we store system inodes cache in ocfs2_super, we use a array for global system inodes. But unfortunately, the range is calculated wrongly which makes it overflow and pollute ocfs2_super->local_system_inodes. This patch fix it by setting the range properly. The corresponding bug is ossbug1303. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303 Cc: stable@kernel.org Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 8ac33dc86d37ca76d282aa112d4f2794a731064e Author: Tao Ma Date: Wed Dec 15 16:30:00 2010 +0800 ocfs2: Hold ip_lock when set/clear flags for indexed dir. When we set/clear the dyn_features for an inode we hold the ip_lock. So do it when we set/clear OCFS2_INDEXED_DIR_FL also. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 41b41a26d4d6e4e3ad877d02377844ab9552dc16 Author: Sunil Mushran Date: Thu Dec 9 18:20:38 2010 -0800 ocfs2: Adjust masklog flag values Two masklogs had the same flag value. Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit 39c99f12f15c8bf8257985d9b2a2548a03d18c00 Author: Tristan Ye Date: Tue Dec 7 14:35:07 2010 +0800 Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. Due to newly-introduced 'coherency=full' O_DIRECT writes also takes the EX rw_lock like buffered writes did(rw_level == 1), it turns out messing the usage of 'level' in ocfs2_dio_end_io() up, which caused i_alloc_sem being failed to get up_read'd correctly. This patch tries to teach ocfs2_dio_end_io to understand well on all locking stuffs by explicitly introducing a new bit for i_alloc_sem in iocb's private data, just like what we did for rw_lock. Signed-off-by: Tristan Ye Signed-off-by: Joel Becker commit 388c4bcb4e63e88fb1f312a2f5f9eb2623afcf5b Author: Sunil Mushran Date: Fri Nov 19 15:06:50 2010 -0800 ocfs2/dlm: Migrate lockres with no locks if it has a reference o2dlm was not migrating resources with zero locks because it assumed that that resource would get purged by dlm_thread. However, some usage patterns involve creating and dropping locks at a high rate leading to the migrate thread seeing zero locks but the purge thread seeing an active reference. When this happens, the dlm_thread cannot purge the resource and the migrate thread sees no reason to migrate that resource. The spell is broken when the migrate thread catches the resource with a lock. The fix is to make the migrate thread also consider the reference map. This usage pattern can be triggered by userspace on userdlm locks and flocks. Signed-off-by: Sunil Mushran Signed-off-by: Joel Becker commit 226291aa4641fa13cb5dec3bcb3379faa83009e2 Author: dann frazier Date: Thu Nov 18 15:03:09 2010 -0700 ocfs2_connection_find() returns pointer to bad structure If ocfs2_live_connection_list is empty, ocfs2_connection_find() will return a pointer to the LIST_HEAD, cast as a ocfs2_live_connection. This can cause an oops when ocfs2_control_send_down() dereferences c->oc_conn: Call Trace: [] ocfs2_control_message+0x28c/0x2b0 [ocfs2_stack_user] [] ocfs2_control_write+0x35/0xb0 [ocfs2_stack_user] [] vfs_write+0xb8/0x1a0 [] ? do_page_fault+0x153/0x3b0 [] sys_write+0x51/0x80 [] system_call_fastpath+0x16/0x1b Fix by explicitly returning NULL if no match is found. Signed-off-by: dann frazier Signed-off-by: Joel Becker commit a2a2f55291918f6cf9287d7beaecc7bc007a9f1c Author: Milton Miller Date: Wed Nov 17 22:20:11 2010 -0600 ocfs2: char is not always signed Commit 1c66b360fe262 (Change some lock status member in ocfs2_lock_res to char.) states that these fields need to be signed due to comparision to -1, but only changed the type from unsigned char to char. However, it is a compiler option if char is a signed or unsigned type. Change these fields to signed char so the code will work with all compilers. Signed-off-by: Milton Miller Signed-off-by: Joel Becker commit 1989a80a60d2f620bad99196d6c1801c2afd7c71 Author: Tristan Ye Date: Mon Nov 15 21:39:09 2010 +0800 Ocfs2: Stop tracking a negative dentry after dentry_iput(). I suddenly hit the problem during 2.6.37-rc1 regression test, which was introduced by commit '5e98d492406818e6a94c0ba54c61f59d40cefa4a'(Track negative entries v3), following scenario reproduces the issue easily: Node A Node B ================ ============ $touch testfile $ls testfile $rm -rf testfile $touch testfile $ls testfile ls: cannot access testfile: No such file or directory This patch stops tracking the dentry which was negativated by a inode deletion, so as to force the revaliation in next lookup, in case we'll touch the inode again in the same node. It didn't hurt the performance of multiple lookup for none-existed files anyway, while regresses a bit in the first try after a file deletion. Signed-off-by: Tristan Ye Signed-off-by: Joel Becker commit 1cf257f511918ba5b2eabd64d9acd40f1d7866ef Author: Jiri Slaby Date: Sat Nov 6 10:06:52 2010 +0100 ocfs2: fix memory leak Stanse found that o2hb_heartbeat_group_make_item leaks some memory on fail paths. Fix the paths by adding a new label and jump there. Signed-off-by: Jiri Slaby Cc: Mark Fasheh Cc: Joel Becker Cc: ocfs2-devel@oss.oracle.com Cc: Alexander Viro Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Joel Becker commit a48a982a6bd3896274dd643397c72da9258411e2 Author: David Sterba Date: Tue Nov 2 23:36:02 2010 +0100 fs/ocfs2/dlm: Use GFP_ATOMIC under spin_lock coccinelle check scripts/coccinelle/locks/call_kern.cocci found that in fs/ocfs2/dlm/dlmdomain.c an allocation with GFP_KERNEL is done with locks held: dlm_query_region_handler spin_lock(dlm_domain_lock) dlm_match_regions kmalloc(GFP_KERNEL) Change it to GFP_ATOMIC. Signed-off-by: David Sterba CC: Joel Becker CC: Mark Fasheh CC: ocfs2-devel@oss.oracle.com -- Exists in v2.6.37-rc1 and current linux-next. Signed-off-by: Joel Becker commit 451a3c24b0135bce54542009b5fde43846c7cf67 Author: Arnd Bergmann Date: Wed Nov 17 16:26:55 2010 +0100 BKL: remove extraneous #include The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann Signed-off-by: Linus Torvalds commit 1ca7318cacdac5262492149cf46abc06c95693fa Merge: 0143832 1c66b36 Author: Linus Torvalds Date: Sun Nov 14 13:04:53 2010 -0800 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Change some lock status member in ocfs2_lock_res to char. commit 1c66b360fe26204e2aa14e45086b4a6b8890b1a2 Author: Tao Ma Date: Sat Nov 13 16:22:02 2010 +0800 ocfs2: Change some lock status member in ocfs2_lock_res to char. Commit 83fd9c7 changes l_level, l_requested and l_blocking of ocfs2_lock_res from int to unsigned char. But actually it is initially as -1(ocfs2_lock_res_init_common) which correspoding to 255 for unsigned char. So the whole dlm lock mechanism doesn't work now which means a disaster to ocfs2. Cc: Goldwyn Rodrigues Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 3c26ff6e499ee7e6f9f2bc7da5f2f30d80862ecf Author: Al Viro Date: Sun Jul 25 11:46:36 2010 +0400 convert get_sb_nodev() users Signed-off-by: Al Viro commit 152a08366671080f27b32e0c411ad620c5f88b57 Author: Al Viro Date: Sun Jul 25 00:46:55 2010 +0400 new helper: mount_bdev() ... and switch of the obvious get_sb_bdev() users to ->mount() Signed-off-by: Al Viro commit 85fe4025c616a7c0ed07bc2fc8c5371b07f3888c Author: Christoph Hellwig Date: Sat Oct 23 11:19:54 2010 -0400 fs: do not assign default i_ino in new_inode Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Al Viro commit 7de9c6ee3ecffd99e1628e81a5ea5468f7581a1f Author: Al Viro Date: Sat Oct 23 11:11:40 2010 -0400 new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro commit ebdec241d509cf69f6ebf1ecdc036359d3dbe154 Author: Christoph Hellwig Date: Wed Oct 6 10:47:23 2010 +0200 fs: kill block_prepare_write __block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write. Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro commit 229aebb873e29726b91e076161649cf45154b0bf Merge: 8de547e 50a23e6 Author: Linus Torvalds Date: Sun Oct 24 13:41:39 2010 -0700 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing|ed] => interest[ing|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c commit f8cae0f03f75adb54b1d48ddbc90f84a1f5de186 Author: Linus Torvalds Date: Fri Oct 22 19:30:38 2010 -0700 ocfs2: drop the BLKDEV_IFL_WAIT flag Commit dd3932eddf42 ("block: remove BLKDEV_IFL_WAIT") had removed the flag argument to blkdev_issue_flush(), but the ocfs2 merge brought in a new one. It didn't cause a merge conflict, so the merges silently worked out fine, but the result didn't actually compile. Signed-off-by: Linus Torvalds commit 092e0e7e520a1fca03e13c9f2d157432a8657ff2 Merge: 79f14b7 776c163 Author: Linus Torvalds Date: Fri Oct 22 10:52:56 2010 -0700 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek commit 79f14b7c56d3b3ba58f8b43d1f70b9b71477a800 Merge: c37927d 6d7bccc Author: Linus Torvalds Date: Fri Oct 22 10:52:01 2010 -0700 Merge branch 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: (30 commits) BKL: remove BKL from freevxfs BKL: remove BKL from qnx4 autofs4: Only declare function when CONFIG_COMPAT is defined autofs: Only declare function when CONFIG_COMPAT is defined ncpfs: Lock socket in ncpfs while setting its callbacks fs/locks.c: prepare for BKL removal BKL: Remove BKL from ncpfs BKL: Remove BKL from OCFS2 BKL: Remove BKL from squashfs BKL: Remove BKL from jffs2 BKL: Remove BKL from ecryptfs BKL: Remove BKL from afs BKL: Remove BKL from USB gadgetfs BKL: Remove BKL from autofs4 BKL: Remove BKL from isofs BKL: Remove BKL from fat BKL: Remove BKL from ext2 filesystem BKL: Remove BKL from do_new_mount() BKL: Remove BKL from cgroup BKL: Remove BKL from NTFS ... commit 2decd65a2630633cee04d0b83fdcee46ad2989a1 Author: Jeff Liu Date: Tue Oct 12 11:18:18 2010 +0800 ocfs2: Avoid to evaluate xattr block flags again. It was evaludated to indexed before, check it is ok i think. Signed-off-by: Jeff Liu Signed-off-by: Joel Becker commit fc3718918f13ad72827d62d36ea0f5fb55090644 Merge: 7bdb0d1 d4396ea Author: Joel Becker Date: Fri Oct 15 13:03:09 2010 -0700 Merge branch 'globalheartbeat-2' of git://oss.oracle.com/git/smushran/linux-2.6 into ocfs2-merge-window Conflicts: fs/ocfs2/ocfs2.h commit d4396eafe402b710a8535137b3bf2abe6c059a15 Author: Sunil Mushran Date: Fri Oct 15 11:57:21 2010 -0700 ocfs2/cluster: Release debugfs file elapsed_time_in_ms An earlier commit forgot to remove a debugfs file, elapsed_time_in_ms. Signed-off-by: Sunil Mushran commit 6038f373a3dc1f1c26496e60b6c40b164716f07e Author: Arnd Bergmann Date: Sun Aug 15 18:52:59 2010 +0200 llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann Cc: Julia Lawall Cc: Christoph Hellwig commit 7bdb0d18bfd381cc5491eb95973ec5604b356c7e Author: Tristan Ye Date: Mon Oct 11 16:46:39 2010 +0800 ocfs2: Add a mount option "coherency=*" to handle cluster coherency for O_DIRECT writes. Currently, the default behavior of O_DIRECT writes was allowing concurrent writing among nodes to the same file, with no cluster coherency guaranteed (no EX lock held). This can leave stale data in the cache for buffered reads on other nodes. The new mount option introduce a chance to choose two different behaviors for O_DIRECT writes: * coherency=full, as the default value, will disallow concurrent O_DIRECT writes by taking EX locks. * coherency=buffered, allow concurrent O_DIRECT writes without EX lock among nodes, which gains high performance at risk of getting stale data on other nodes. Signed-off-by: Tristan Ye Signed-off-by: Joel Becker commit 75d9bbc73804285020aa4d99bd2a9600edea8945 Author: Goldwyn Rodrigues Date: Mon Oct 11 12:57:09 2010 -0500 Initialize max_slots early Functions such as ocfs2_recovery_init() make use of osb->max_slots. Initialize osb->max_slots early so the functions may use the correct value. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Joel Becker commit f30d44f3e54a94e037da7a71d714b585dab28d9e Author: Poyo VL Date: Mon Oct 11 13:45:52 2010 -0700 When I tried to compile I got the following warning: fs/ocfs2/slot_map.c: In function ‘ocfs2_init_slot_info’: fs/ocfs2/slot_map.c:360: warning: ‘bytes’ may be used uninitialized in this function fs/ocfs2/slot_map.c:360: note: ‘bytes’ was declared here Compiler: gcc version 4.4.3 (GCC) on Mandriva I'm not sure why this warning occurs, I think compiler don't know that variable "bytes" is initialized when it is sent by reference to ocfs2_slot_map_physical_size and it throws that ugly warning. However, a simple initialization of "bytes" variable with 0 will fix it. Signed-off-by: Ionut Gabriel Popescu Signed-off-by: Joel Becker commit 9b5cd10e4c14a1a642076ace6a73be3d33c91fb6 Author: Srinivas Eeda Date: Tue Oct 5 15:53:06 2010 -0700 ocfs2: validate bg_free_bits_count after update This patch adds a safe check to ensure bg_free_bits_count doesn't exceed bg_bits in a group descriptor. This is to avoid on disk corruption that was seen recently. debugfs: group <52803072> Group Chain: 179 Parent Inode: 11 Generation: 2959379682 CRC32: 00000000 ECC: 0000 ## Block# Total Used Free Contig Size 0 52803072 32256 4294965350 34202 18207 4032 ...... Signed-off-by: Srinivas Eeda Signed-off-by: Joel Becker commit 4d94aa1b1d437f9513ddc89974d8bd214b8304f6 Author: Sunil Mushran Date: Sat Oct 9 10:27:04 2010 -0700 ocfs2/cluster: Bump up dlm protocol to version 1.1 dlm protocol 1.1. activates messages DLM_QUERY_REGION and DLM_QUERY_NODEINFO that are a must for global heartbeat. It also activates o2hb_global_heartbeat_active(). Signed-off-by: Sunil Mushran commit 43695d095dfaf266a8a940d9b07eed7f46076b49 Author: Sunil Mushran Date: Wed Oct 6 17:55:09 2010 -0700 ocfs2/cluster: Show per region heartbeat elapsed time This patch adds a per region debugfs file that shows the elapsed time since the time the o2hb timer was last armed. Signed-off-by: Sunil Mushran commit d6aa1c7c9e4b48081c2302e14b0f857017461efd Author: Sunil Mushran Date: Wed Oct 6 18:50:50 2010 -0700 ocfs2/cluster: Add mlogs for heartbeat up/down events This patch adds mlogs for o2hb up and down events. Signed-off-by: Sunil Mushran commit 1f28530537f106f83e5cf7ef0193075667b6d520 Author: Sunil Mushran Date: Wed Oct 6 17:55:12 2010 -0700 ocfs2/cluster: Create debugfs dir/files for each region This patch creates debugfs directory for each o2hb region and creates files to expose the region number and the per region live node bitmap. This information will be useful in debugging cluster issues. Signed-off-by: Sunil Mushran commit a6de013654b4839c8609e26241ebd9eb1ecc52e6 Author: Sunil Mushran Date: Wed Oct 6 17:55:13 2010 -0700 ocfs2/cluster: Create debugfs files for live, quorum and failed region bitmaps This patch prints the bitmaps of live, quorum and failed regions. This information will be useful in debugging cluster issues. Signed-off-by: Sunil Mushran commit b1c5ebfbe398b3360614a4788c02061cd153e60a Author: Sunil Mushran Date: Thu Oct 7 17:05:52 2010 -0700 ocfs2/cluster: Maintain bitmap of failed regions In global heartbeat mode, we track the bitmap of regions that have seen heartbeat timeouts. We fence if the number of such regions is greater than or equal to half the number of quorum regions. Signed-off-by: Sunil Mushran commit 43182d2a799865872041b6e4d8387131e9462f56 Author: Sunil Mushran Date: Wed Oct 6 17:55:16 2010 -0700 ocfs2/cluster: Maintain bitmap of quorum regions o2hb allows online adding of regions. However, a newly added region is not used in quorum calculations unless it has been added on all nodes. This patch tracks a bitmap of such quorum regions. Signed-off-by: Sunil Mushran commit e7d656baf6607a0775f4ca85464a4ead306741e5 Author: Sunil Mushran Date: Wed Oct 6 17:55:18 2010 -0700 ocfs2/cluster: Track bitmap of live heartbeat regions A heartbeat region becomes live (or active) after a fixed number of (steady) iterations. Signed-off-by: Sunil Mushran commit 536f0741f324f116d8b059295999945a2dac56bc Author: Sunil Mushran Date: Thu Oct 7 17:03:07 2010 -0700 ocfs2/cluster: Track number of global heartbeat regions In global heartbeat mode, we have a upper limit for the number of active regions. This patch adds the facility to track the number of active global heartbeat regions and fails to start heartbeat if the number exceeds the maximum. Signed-off-by: Sunil Mushran commit 823a637ae933fde8fdb280612dd3ff9912e301e3 Author: Sunil Mushran Date: Wed Oct 6 17:55:21 2010 -0700 ocfs2/cluster: Maintain live node bitmap per heartbeat region Currently we track a global livenode bitmap that keeps track of all nodes that are heartbeating in all regions. This patch adds the ability to track the livenode bitmap on a per region basis. We will use this facility in a later patch to allow us to withstand the loss of a minority number of regions. Signed-off-by: Sunil Mushran commit 8ca8b0bbd841b6bcd8ac05e51b0143aa61cfeff3 Author: Sunil Mushran Date: Thu Oct 7 17:01:27 2010 -0700 ocfs2/cluster: Reorganize o2hb debugfs init o2hb debugfs handling is reorganized to allow for easy expansion. Signed-off-by: Sunil Mushran commit 0e105d37c2adb19cb777aa6701a866f211764a30 Author: Sunil Mushran Date: Thu Oct 7 17:00:16 2010 -0700 ocfs2/cluster: Check slots for unconfigured live nodes o2hb currently checks slots for configured nodes only. This patch makes it check the slots for the live nodes too to take care of a race in which a node is removed from the configuration but not from the live map. Signed-off-by: Sunil Mushran commit 39a298563e0619b1b6e2e0974e58801de780621c Author: Sunil Mushran Date: Thu Oct 7 17:30:17 2010 -0700 ocfs2/cluster: Print messages when adding/removing nodes Prints messages when the user adds or removes nodes. Signed-off-by: Sunil Mushran commit 18c50cb0d3c293eabd6c2ef89c43f2a968e709ed Author: Sunil Mushran Date: Wed Oct 6 18:26:59 2010 -0700 ocfs2/cluster: Print messages when adding/removing heartbeat regions Prints messages when the user adds or removes heartbeat regions in global heartbeat mode. These messages are useful when debugging cluster related issues. Signed-off-by: Sunil Mushran commit 18cfdf1b1a8e83b09e4185c02396257ce7e7bee3 Author: Sunil Mushran Date: Thu Oct 7 16:47:03 2010 -0700 ocfs2/dlm: Add message DLM_QUERY_NODEINFO Adds new dlm message DLM_QUERY_NODEINFO that sends the attributes of all registered nodes. This message is sent if the negotiated dlm protocol is 1.1 or higher. If the information of the joining node does not match that of any existing nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran commit 5f3c6d9c615770708df464c170316f83cf437242 Author: Sunil Mushran Date: Wed Oct 6 17:55:29 2010 -0700 ocfs2: Print message if user mounts without starting global heartbeat In global heartbeat mode, the heartbeat is started by the user. This patch prints an error if the user attempts to mount a volume without starting the heartbeat. Signed-off-by: Sunil Mushran commit ea2034416b54700e30371f2ad6517cbb94674083 Author: Sunil Mushran Date: Sat Oct 9 10:26:23 2010 -0700 ocfs2/dlm: Add message DLM_QUERY_REGION Adds new dlm message DLM_QUERY_REGION that sends the names of all active heartbeat regions. This message is only sent in the global heartbeat mode. If the regions in the joining node do not fully match the ones in the active nodes, the join domain request is rejected. Signed-off-by: Sunil Mushran commit b3c85c4cdf77154acc940dd0f14d1fb99cbbaf75 Author: Sunil Mushran Date: Thu Oct 7 14:31:06 2010 -0700 ocfs2/cluster: Get all heartbeat regions Export function in o2hb to get a list of heartbeat regions. It also adds an upper limit to the length of the heartbeat region name. o2hb_global_heartbeat_active() currently disables global heartbeat. It will be enabled in a later patch after all the code is added. Signed-off-by: Sunil Mushran commit b1365d0bd14b912cceb424cbeed9fe939a9038e3 Author: Sunil Mushran Date: Wed Oct 6 17:55:34 2010 -0700 ocfs2/dlm: Expose dlm_protocol in dlm_state Add dlm_protocol to the list of info shown by the debugfs file, dlm_state. Signed-off-by: Sunil Mushran commit 2c442719e90a44a6982c033d69df4aae4b167cfa Author: Sunil Mushran Date: Thu Oct 7 15:23:50 2010 -0700 ocfs2: Add support for heartbeat=global mount option Adds support for heartbeat=global mount option. It ensures that the heartbeat mode passed matches the one enabled on disk. Signed-off-by: Sunil Mushran commit 98f486f23bc5b6a6fa90e1a0707b7e9fe0e7f3e4 Author: Sunil Mushran Date: Sat Oct 9 10:24:46 2010 -0700 ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for both userspace and o2cb cluster stacks. It also allows us to extend cluster info to include stack flags. This patch also adds stackflags to sb->s_clusterinfo. It also introduces a clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled global heartbeat mode. This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack. Signed-off-by: Sunil Mushran commit 54b5187b5a1ad6573ade8b18e065dda92501fc52 Author: Sunil Mushran Date: Thu Oct 7 15:26:08 2010 -0700 ocfs2/cluster: Add heartbeat mode configfs parameter Add heartbeat mode parameter to the configfs tree. This will be used to set/show the heartbeat mode. The user is free to toggle the mode between local and global as long as there is no active heartbeat region. Signed-off-by: Sunil Mushran commit 60056794127a25d641465b706e8828186f7a2e1f Author: Arnd Bergmann Date: Wed Feb 24 13:25:33 2010 +0100 BKL: Remove BKL from OCFS2 The BKL in ocfs2/dlmfs is used in put_super, fill_super and remount_fs that are all three protected by the superblocks s_umount rw_semaphore. The use in ocfs2_control_open is evidently unrelated and the function is protected by ocfs2_control_lock. Therefore it is safe to remove the BKL entirely. Signed-off-by: Arnd Bergmann Cc: Mark Fasheh Cc: Joel Becker commit db71922217a214e5c9268448e537b54fc1f301ea Author: Jan Blunck Date: Sun Aug 15 22:51:10 2010 +0200 BKL: Explicitly add BKL around get_sb/fill_super This patch is a preparation necessary to remove the BKL from do_new_mount(). It explicitly adds calls to lock_kernel()/unlock_kernel() around get_sb/fill_super operations for filesystems that still uses the BKL. I've read through all the code formerly covered by the BKL inside do_kern_mount() and have satisfied myself that it doesn't need the BKL any more. do_kern_mount() is already called without the BKL when mounting the rootfs and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called from various places without BKL: simple_pin_fs(), nfs_do_clone_mount() through nfs_follow_mountpoint(), afs_mntpt_do_automount() through afs_mntpt_follow_link(). Both later functions are actually the filesystems follow_link inode operation. vfs_kern_mount() is calling the specified get_sb function and lets the filesystem do its job by calling the given fill_super function. Therefore I think it is safe to push down the BKL from the VFS to the low-level filesystems get_sb/fill_super operation. [arnd: do not add the BKL to those file systems that already don't use it elsewhere] Signed-off-by: Jan Blunck Signed-off-by: Arnd Bergmann Cc: Matthew Wilcox Cc: Christoph Hellwig commit 817f2c842d6c38acfd58d20d29ba583ec467ae35 Author: Nikanth Karthikesan Date: Mon Sep 20 11:44:00 2010 +0530 Fix various typos of valid in comments Fix various typos of valid. Signed-off-by: Nikanth Karthikesan Signed-off-by: Jiri Kosina commit 93f3b86fb1bd0ad7b4a5eb1ad1fdae2b290633b7 Author: Joel Becker Date: Wed Sep 15 17:00:58 2010 -0700 ocfs2: Initialize the bktcnt variable properly, and call it bucket_count Signed-off-by: Joel Becker commit c0e1a3e80dc030cd54f06416efbf9e439bf6ecb7 Author: Joel Becker Date: Wed Sep 15 16:56:54 2010 -0700 ocfs2: Silence unused warning. When CONFIG_OCFS2_DEBUG_MASKLOG is undefined, we don't use the dentry variable in ocfs2_sync_file(). Let's just move all access to the dentry inside the logging call. Signed-off-by: Joel Becker commit 5e98d492406818e6a94c0ba54c61f59d40cefa4a Author: Goldwyn Rodrigues Date: Mon Jun 28 10:04:32 2010 -0500 Track negative entries v3 Track negative dentries by recording the generation number of the parent directory in d_fsdata. The generation number for the parent directory is recorded in the inode_info, which increments every time the lock on the directory is dropped. If the generation number of the parent directory and the negative dentry matches, there is no need to perform the revalidate, else a revalidate is forced. This improves performance in situations where nodes look for the same non-existent file multiple times. Thanks Mark for explaining the DLM sequence. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Joel Becker commit b4d693fcc5fe99ed211addb5c6a0f8398f0b266e Author: Tao Ma Date: Mon Aug 16 16:58:21 2010 +0800 ocfs2: Cache system inodes of other slots. Durring orphan scan, if we are slot 0, and we are replaying orphan_dir:0001, the general process is that for every file in this dir: 1. we will iget orphan_dir:0001, since there is no inode for it. we will have to create an inode and read it from the disk. 2. do the normal work, such as delete_inode and remove it from the dir if it is allowed. 3. call iput orphan_dir:0001 when we are done. In this case, since we have no dcache for this inode, i_count will reach 0, and VFS will have to call clear_inode and in ocfs2_clear_inode we will checkpoint the inode which will let ocfs2_cmt and journald begin to work. 4. We loop back to 1 for the next file. So you see, actually for every deleted file, we have to read the orphan dir from the disk and checkpoint the journal. It is very time consuming and cause a lot of journal checkpoint I/O. A better solution is that we can have another reference for these inodes in ocfs2_super. So if there is no other race among nodes(which will let dlmglue to checkpoint the inode), for step 3, clear_inode won't be called and for step 1, we may only need to read the inode for the 1st time. This is a big win for us. So this patch will try to cache system inodes of other slots so that we will have one more reference for these inodes and avoid the extra inode read and journal checkpoint. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 3bdb8efd94a73bb137e3315cd831cbc874052b4b Author: Patrick J. LoPresti Date: Thu Jul 22 15:05:57 2010 -0700 OCFS2: Allow huge (> 16 TiB) volumes to mount The OCFS2 developers have already done all of the hard work to allow volumes larger than 16 TiB. But there is still a "sanity check" in fs/ocfs2/super.c that prevents the mounting of such volumes, even when the cluster size and journal options would allow it. This patch replaces that sanity check with a more sophisticated one to mount a huge volume provided that (a) it is addressable by the raw word/address size of the system (borrowing a test from ext4); (b) the volume is using JBD2; and (c) the JBD2_FEATURE_INCOMPAT_64BIT flag is set on the journal. I factored out the sanity check into its own function. I also moved it from ocfs2_initialize_super() down to ocfs2_check_volume(); any earlier, and the journal will not have been initialized yet. This patch is one of a pair, and it depends on the other ("JBD2: Allow feature checks before journal recovery"). I have tested this patch on small volumes, huge volumes, and huge volumes without 64-bit block support in the journal. All of them appear to work or to fail gracefully, as appropriate. Signed-off-by: Patrick LoPresti Signed-off-by: Joel Becker commit 729963a1ff8d069d05dab6a024bfd59805ac622c Merge: 17ae521 6ea4843 Author: Joel Becker Date: Fri Sep 10 08:41:04 2010 -0700 Merge branch 'cow_readahead' of git://oss.oracle.com/git/tma/linux-2.6 into merge-2 commit 17ae521158d6fe89cfdd00a9990aa40d02e81391 Author: Tao Ma Date: Mon Aug 2 11:02:14 2010 +0800 ocfs2: Remove obsolete comments before ocfs2_start_trans. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit f9c57ada32ea3f2e12600cf274035fff063b2e0f Author: Tao Ma Date: Mon Aug 2 11:02:13 2010 +0800 ocfs2: Remove unused old_id in ocfs2_commit_cache. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 4c38881f87c21ada5609a4065cb10e3e74da0d6e Author: Jan Kara Date: Thu Aug 5 20:32:46 2010 +0200 ocfs2: Remove ocfs2_sync_inode() ocfs2_sync_inode() is used only from ocfs2_sync_file(). But all data has already been written before calling ocfs2_sync_file() and ocfs2 doesn't use inode's private_list for tracking metadata buffers thus sync_mapping_buffers() is superfluous as well. Signed-off-by: Jan Kara Acked-by: Mark Fasheh Signed-off-by: Joel Becker commit 83fd9c7f65634ac440a6b9b7a63ba562f213ac60 Author: Goldwyn Rodrigues Date: Thu Jun 10 17:21:36 2010 -0500 Reorganize data elements to reduce struct sizes Thanks for the comments. I have incorportated them all. CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled. Statistics now look like - ocfs2_write_ctxt: 2144 - 2136 = 8 ocfs2_inode_info: 1960 - 1848 = 112 ocfs2_journal: 168 - 160 = 8 ocfs2_lock_res: 336 - 304 = 32 ocfs2_refcount_tree: 512 - 472 = 40 Signed-off-by: Goldwyn Rodrigues Signed-off-by: Joel Becker commit 95fa859a268fd7d9bae6f2d4d095e3f61100ac1b Author: Tao Ma Date: Wed Jun 9 16:48:59 2010 +0800 ocfs2: Remove obscure error handling in direct_write. In ocfs2, actually we don't allow any direct write pass i_size, see the function ocfs2_prepare_inode_for_write. So we don't need the bogus simple_setsize. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit 3c3f20c9813391ba4004764072989744395cf405 Author: Tao Ma Date: Tue Jun 1 13:58:13 2010 +0800 ocfs2: Add some trace log for orphan scan. Now orphan scan worker has no trace log, so it is very hard to tell whether it is finished or blocked. So add 2 mlog trace log so that we can tell whether the current orphan scan worker is blocked or not. It does help when I analyzed a orphan scan bug. Signed-off-by: Tao Ma Signed-off-by: Joel Becker commit ddee5cdb70e6f87de2fc696b87bd7bd184a51eb8 Author: Tristan Ye Date: Sat May 22 16:26:33 2010 +0800 Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8. The reason why we need this ioctl is to offer the none-privileged end-user a possibility to get filesys info gathering. We use OCFS2_IOC_INFO to manipulate the new ioctl, userspace passes a structure to kernel containing an array of request pointers and request count, such as, * From userspace: struct ocfs2_info_blocksize oib = { .ib_req = { .ir_magic = OCFS2_INFO_MAGIC, .ir_code = OCFS2_INFO_BLOCKSIZE, ... } ... } struct ocfs2_info_clustersize oic = { ... } uint64_t reqs[2] = {(unsigned long)&oib, (unsigned long)&oic}; struct ocfs2_info info = { .oi_requests = reqs, .oi_count = 2, } ret = ioctl(fd, OCFS2_IOC_INFO, &info); * In kernel: Get the request pointers from *info*, then handle each request one bye one. Idea here is to make the spearated request small enough to guarantee a better backward&forward compatibility since a small piece of request would be less likely to be broken if filesys on raw disk get changed. Currently, the following 7 requests are supported per the requirement from userspace tool o2info, and I believe it will grow over time:-) OCFS2_INFO_CLUSTERSIZE OCFS2_INFO_BLOCKSIZE OCFS2_INFO_MAXSLOTS OCFS2_INFO_LABEL OCFS2_INFO_UUID OCFS2_INFO_FS_FEATURES OCFS2_INFO_JOURNAL_SIZE This ioctl is only specific to OCFS2. Signed-off-by: Tristan Ye Signed-off-by: Joel Becker commit 6ea4843f53282465f2bdbe5eedde7d8c3081dfdf Author: Tao Ma Date: Thu Aug 12 10:31:34 2010 +0800 ocfs2: Add readhead during CoW. In CoW, when we meet with a readahead page, we know it is time to move the readahead window. So carry out a new readahead. Signed-off-by: Tao Ma commit 7b61cf54a2445ad21df9dd44f0c8bb90154ddea8 Author: Tao Ma Date: Thu Jun 17 14:14:36 2010 +0800 ocfs2: Add readahead support for CoW. Add a new function ocfs2_readahead_for_cow so that we start readahead before we start our CoW. Signed-off-by: Tao Ma commit 155027121fe52f9b4f25e9d156c22f2f5012e5fe Author: Tao Ma Date: Thu Aug 12 10:36:38 2010 +0800 ocfs2: Add struct file to ocfs2_refcount_cow. Add a new parameter 'struct file *' to ocfs2_refcount_cow so that we can add readahead support later. Signed-off-by: Tao Ma commit b890823635e022467b924e3d9da8c5166a4e349c Author: Tao Ma Date: Thu Aug 12 10:27:14 2010 +0800 ocfs2: pass struct file* to ocfs2_prepare_inode_for_write. struct file * has file_ra_state to store the readahead state and data. So pass this to ocfs2_prepare_inode_for_write. so that it can be used in ocfs2_refcount_cow. Signed-off-by: Tao Ma commit 0378da0fda6edf5aaffda6f1248a78986bd955b5 Author: Tao Ma Date: Thu Aug 12 10:25:28 2010 +0800 ocfs2: pass struct file* to ocfs2_write_begin_nolock. struct file * has file_ra_state to store the readahead state and data. So pass this to ocfs2_write_begin_nolock so that it can be used in ocfs2_refcount_cow. Signed-off-by: Tao Ma