commit bda0233b89c10ae46ccecb78bffdaf0fd7833d17 Author: Sunil Mushran Date: Fri Sep 21 11:41:43 2007 -0700 ocfs2: Unlock mutex in local alloc failure case The fs was not unlocking the local alloc inode mutex in the code path in which it failed to find a window of free bits in the global bitmap. Signed-off-by: Sunil Mushran Signed-off-by: Mark Fasheh commit 813d974c53a2b353566a86bb127625b403696dae Author: Sunil Mushran Date: Thu Sep 20 10:59:48 2007 -0700 ocfs2: Pack vote message and response structures The ocfs2_vote_msg and ocfs2_response_msg structs needed to be packed to ensure similar sizeofs in 32-bit and 64-bit arches. Without this, we had inadvertently broken 32/64 bit cross mounts. Signed-off-by: Sunil Mushran Signed-off-by: Mark Fasheh commit 5c26a7b70f89c36e8d9acc95cb896c3cd205fc8d Author: Mark Fasheh Date: Tue Sep 18 17:49:29 2007 -0700 ocfs2: Don't double set write parameters The target page offsets were being incorrectly set a second time in ocfs2_prepare_page_for_write(), which was causing problems on a 16k page size kernel. Additionally, ocfs2_write_failure() was incorrectly using those parameters instead of the parameters for the individual page being cleaned up. Signed-off-by: Mark Fasheh commit db56246c6980e376b02d2da568d119da71f82fb9 Author: Mark Fasheh Date: Mon Sep 17 09:06:29 2007 -0700 ocfs2: Fix pos/len passed to ocfs2_write_cluster This was broken for file systems whose cluster size is greater than page size. Pos needs to be incremented as we loop through the descriptors, and len needs to be capped to the size of a single cluster. Signed-off-by: Mark Fasheh commit 415cb800375cc4e89fb5a6a454e484bd4adbffb4 Author: Mark Fasheh Date: Sun Sep 16 20:10:16 2007 -0700 ocfs2: Allow smaller allocations during large writes The ocfs2 write code loops through a page much like the block code, except that ocfs2 allocation units can be any size, including larger than page size. 
Typically it's equal to or larger than page size - most kernels run 4k pages, the minimum ocfs2 allocation (cluster) size. Some changes introduced during 2.6.23 changed the way writes to pages are handled, and inadvertently broke support for > 4k page size. Instead of just writing one cluster at a time, we now handle the whole page in one pass. This means that multiple (small) separate allocations might happen in the same pass. The allocation code, however, typically optimizes by getting the maximum which was reserved. This triggered a BUG_ON in the extend code where it'd ask for a single bit (for one part of a > 4k page) and get back more than it asked for. Fix this by providing a variant of the high level allocation function which allows the caller to specify a maximum. The traditional function remains and just calls the new one with a maximum determined from the initial reservation. Signed-off-by: Mark Fasheh commit e535e2efd295c3990bb9f654c8bb6bd176ebdc2b Author: Mark Fasheh Date: Fri Aug 31 10:23:41 2007 -0700 ocfs2: Fix calculation of i_blocks during truncate We were setting i_blocks too early - before truncating any allocation. Correct things to set i_blocks after the allocation change. Signed-off-by: Mark Fasheh commit 30b8548f2c270c0205558fe4826a6ab8e7fe51ad Author: tao.ma@oracle.com Date: Thu Sep 6 08:02:25 2007 +0800 [PATCH] ocfs2: Fix a wrong cluster calculation. In ocfs2_alloc_write_write_ctxt, the written clusters length is calculated by the byte length only. This may cause some problems if we start to write at some position in the end of one cluster and last to a second cluster while the "len" is smaller than a cluster size. In that case, we have to write 2 clusters actually. So we have to take the start position into consideration also. 
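The cluster calculation problem described in the commit just above can be sketched in plain C. This is only an illustration under stated assumptions, not the ocfs2 code: the helper names clusters_for_write() and clusters_from_len_only() and the cluster_bits parameter are hypothetical. It shows why a count derived from the byte length alone comes up short when a small write straddles a cluster boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of clusters touched by a write of `len`
 * bytes starting at byte offset `pos`, with clusters of
 * 2^cluster_bits bytes. Uses both the start position and the length. */
static uint32_t clusters_for_write(uint64_t pos, uint32_t len,
                                   unsigned int cluster_bits)
{
    assert(len > 0);

    uint64_t first = pos >> cluster_bits;             /* cluster of the first byte */
    uint64_t last = (pos + len - 1) >> cluster_bits;  /* cluster of the last byte */

    return (uint32_t)(last - first + 1);
}

/* Length-only estimate, shown for contrast: it ignores `pos`. */
static uint32_t clusters_from_len_only(uint32_t len, unsigned int cluster_bits)
{
    uint64_t cluster_size = 1ULL << cluster_bits;

    return (uint32_t)((len + cluster_size - 1) >> cluster_bits);
}
```

With 4k clusters (cluster_bits = 12), a 200 byte write starting at offset 4000 touches two clusters, while the length-only estimate reports one.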
Signed-off-by: Tao Ma Signed-off-by: Mark Fasheh commit c0123adef626607535f3c2c93b530c36780885e0 Author: Tiger Yang Date: Sat Sep 8 00:16:10 2007 +0800 [PATCH] ocfs2: fix mount option parsing For some mount option types, ocfs2_parse_options() will try to access sb->s_fs_info to get at the ocfs2 private superblock. Unfortunately, that hasn't been allocated yet and will cause a kernel crash. Fix this by storing options in a struct which can then get pushed into the ocfs2_super once it's been allocated later. If we need more options which store to the ocfs2_super in the future, we can just add fields to this struct. Signed-off-by: Tiger Yang Signed-off-by: Mark Fasheh commit e0dceaf0a4b8c55076a4dbcba7ac8b05755f5cc6 Author: Mark Fasheh Date: Thu Aug 9 16:52:30 2007 -0700 ocfs2: set non-default s_time_gran during mount We need to manually set this to '1' during mount, otherwise inode_setattr() will chop off the nanosecond portion of our timestamps. Signed-off-by: Mark Fasheh commit ce17204ae633001ef41318d487282730e96b9522 Author: Sunil Mushran Date: Mon Jul 30 11:02:50 2007 -0700 ocfs2: Retry sendpage() if it returns EAGAIN Instead of treating EAGAIN, returned from sendpage(), as an error, this patch retries the operation. Signed-off-by: Sunil Mushran Signed-off-by: Mark Fasheh commit 480214d71f1972756473415d31953647952400fb Author: Sunil Mushran Date: Mon Aug 6 15:11:56 2007 -0700 ocfs2: Fix rename/extend race If one process is extending a file while another is renaming it, there exists a window when rename could flush the old inode's stale i_size to disk. This patch recognizes the fact that rename is only updating the old inode's ctime, so it ensures only that value is flushed to disk. Signed-off-by: Sunil Mushran Signed-off-by: Mark Fasheh commit 6a18380e7ddd7d1a0493efe3be6475dd92323364 Author: Adrian Bunk Date: Mon Jul 23 10:01:21 2007 +0200 [2.6 patch] ocfs2_insert_extent(): remove dead code This patch removes some now dead code. Spotted by the Coverity checker. 
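The "Retry sendpage() if it returns EAGAIN" change listed above follows a common pattern that can be sketched in userspace C. Everything here is an assumption made for illustration: the send_fn callback type, send_with_retry(), and the flaky_send() test double are hypothetical, and the retry bound is added only so the sketch cannot spin forever (the log does not say whether the actual patch bounds its retries).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical send callback: returns bytes sent, or a negative errno. */
typedef long (*send_fn)(void *ctx, const char *buf, size_t len);

/* Call `fn` again while it reports -EAGAIN instead of treating that as a
 * hard error. Gives up after `max_tries` attempts (an assumption of this
 * sketch) and returns the last result. */
static long send_with_retry(send_fn fn, void *ctx, const char *buf,
                            size_t len, int max_tries)
{
    long ret = -EAGAIN;

    assert(max_tries > 0);
    for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = fn(ctx, buf, len);

    return ret;
}

/* Test double: fails with -EAGAIN until the int counter behind `ctx`
 * reaches zero, then reports the whole buffer as sent. */
static long flaky_send(void *ctx, const char *buf, size_t len)
{
    int *remaining_failures = ctx;

    (void)buf;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -EAGAIN;
    }
    return (long)len;
}
```

A caller that sees -EAGAIN come back from send_with_retry() knows the bound was exhausted; any other negative value is a real error from the callback.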
Signed-off-by: Adrian Bunk Signed-off-by: Mark Fasheh commit 5a25403175b8a945e93fc9c64ae9cf54f5730add Author: Mark Fasheh Date: Fri Jul 20 12:56:16 2007 -0700 ocfs2: Fix max offset calculations ocfs2_max_file_offset() was over-estimating the largest file size for several cases. This wasn't really a problem before, but now that we support sparse files, it needs to be more accurate. Signed-off-by: Mark Fasheh commit ce76fd30ce98cdaeb38dca0dfbb3fa6d2801c5ce Author: Mark Fasheh Date: Fri Jul 20 12:02:14 2007 -0700 ocfs2: check ia_size limits in setattr We have to manually check the requested truncate size as the check in vmtruncate() comes too late for Ocfs2. Signed-off-by: Mark Fasheh commit 7c08d70c69150148c14f02633855f1591219c37c Author: Mark Fasheh Date: Fri Jul 20 11:58:36 2007 -0700 ocfs2: Fix some casting errors related to file writes ocfs2_align_clusters_to_page_index() needs to cast the clusters shift to pgoff_t and ocfs2_file_buffered_write() needs loff_t when calculating destination start for memcpy. Signed-off-by: Mark Fasheh commit a00cce356b5592208e761525a48a25902322cce9 Author: Mark Fasheh Date: Fri Jul 20 11:28:30 2007 -0700 ocfs2: use s_maxbytes directly in ocfs2_change_file_space() There's no need to recalculate things via ocfs2_max_file_offset() as we've already done that to fill s_maxbytes, so use that instead. We can also un-export ocfs2_max_file_offset() then. Signed-off-by: Mark Fasheh commit c11e9fafb398411af7558fca913c2fa4a10b1f48 Author: Mark Fasheh Date: Fri Jul 20 11:24:53 2007 -0700 ocfs2: Restrict inode changes in ocfs2_update_inode_atime() ocfs2_update_inode_atime() calls ocfs2_mark_inode_dirty() to push changes from the struct inode into the ocfs2 disk inode. The problem is, ocfs2_mark_inode_dirty() might change other fields, depending on what happened to the struct inode. Since we don't always have locking to serialize changes to other fields (like i_size, etc), just fix things up to only touch the atime field. 
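The casting fixes listed above (widening before shifting when converting a page or cluster index to a byte offset) guard against a class of bug that can be shown in a few lines of C. This is a sketch, not kernel code: the function names are hypothetical and plain fixed-width integers stand in for pgoff_t and loff_t.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: the shift is evaluated in 32 bits, so the high bits are
 * lost before the result is widened to 64 bits. */
static uint64_t offset_truncated(uint32_t page_index, unsigned int shift)
{
    assert(shift < 32);
    return (uint32_t)(page_index << shift);
}

/* Fixed pattern: widen the index first, then shift. */
static uint64_t offset_widened(uint32_t page_index, unsigned int shift)
{
    assert(shift < 64);
    return (uint64_t)page_index << shift;
}
```

With a shift of 12 (4k pages), page index 1 << 20 corresponds to byte offset 4 GiB. The widened version returns that value, while the truncated version wraps to 0.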
Signed-off-by: Mark Fasheh commit 3836df6b520a2f93033bf53200b12a2cb5137395 Author: Jens Axboe Date: Tue Jul 24 10:17:50 2007 +0200 ocfs2: bad kunmap_atomic() kunmap_atomic() takes the virtual address, not the mapped page as argument. Signed-off-by: Jens Axboe Cc: Mark Fasheh Signed-off-by: Linus Torvalds commit 1833633803c7ef4d8f09877d3f1549cbd252f477 Author: Nick Piggin Date: Fri Jul 20 00:31:45 2007 -0700 fix some conversion overflows Fix page index to offset conversion overflows in buffer layer, ecryptfs, and ocfs2. It would be nice to convert the whole tree to page_offset, but for now just fix the bugs. Signed-off-by: Nick Piggin Cc: Michael Halcrow Cc: Mark Fasheh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 20c2df83d25c6a95affe6157a4c9cac4cf5ffaac Author: Paul Mundt Date: Fri Jul 20 10:11:58 2007 +0900 mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt commit f745bb1c73e2395e6b9961d4d915a8f8e2cd32cd Merge: ff86303... 385820a... Author: Linus Torvalds Date: Thu Jul 19 14:16:44 2007 -0700 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: ->fallocate() support commit d0217ac04ca6591841e5665f518e38064f4e65bd Author: Nick Piggin Date: Thu Jul 19 01:47:03 2007 -0700 mm: fault feedback #1 Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. 
FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7 Author: Nick Piggin Date: Thu Jul 19 01:46:59 2007 -0700 mm: merge populate and nopage into fault (fixes nonlinear) Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should not need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. 
The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin Signed-off-by: Randy Dunlap Cc: Mark Fasheh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit d00806b183152af6d24f46f0c33f14162ca1262a Author: Nick Piggin Date: Thu Jul 19 01:46:57 2007 -0700 mm: fix fault vs invalidate race for linear mappings Fix the race between invalidate_inode_pages and do_no_page. Andrea Arcangeli identified a subtle race between invalidation of pages from pagecache with userspace mappings, and do_no_page. The issue is that invalidation has to shoot down all mappings to the page, before it can be discarded from the pagecache. Between shooting down ptes to a particular page, and actually dropping the struct page from the pagecache, do_no_page from any process might fault on that page and establish a new mapping to the page just before it gets discarded from the pagecache. The most common case where such invalidation is used is in file truncation. This case was catered for by doing a sort of open-coded seqlock between the file's i_size, and its truncate_count. 
Truncation will decrease i_size, then increment truncate_count before unmapping userspace pages; do_no_page will read truncate_count, then find the page if it is within i_size, and then check truncate_count under the page table lock and back out and retry if it had subsequently been changed (ptl will serialise against unmapping, and ensure a potentially updated truncate_count is actually visible). Complexity and documentation issues aside, the locking protocol fails in the case where we would like to invalidate pagecache inside i_size. do_no_page can come in anytime and filemap_nopage is not aware of the invalidation in progress (as it is when it is outside i_size). The end result is that dangling (->mapping == NULL) pages that appear to be from a particular file may be mapped into userspace with nonsense data. Valid mappings to the same place will see a different page. Andrea implemented two working fixes, one using a real seqlock, another using a page->flags bit. He also proposed using the page lock in do_no_page, but that was initially considered too heavyweight. However, it is not a global or per-file lock, and the page cacheline is modified in do_no_page to increment _count and _mapcount anyway, so a further modification should not be a large performance hit. Scalability is not an issue. This patch implements this latter approach. ->nopage implementations return with the page locked if it is possible for their underlying file to be invalidated (in that case, they must set a special vm_flags bit to indicate so). do_no_page only unlocks the page after setting up the mapping completely. invalidation is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. 
Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 385820a38d5e7c70b20af4d68767b1920b1e4133 Author: Mark Fasheh Date: Thu Jul 19 00:14:38 2007 -0700 ocfs2: ->fallocate() support Plug ocfs2 into the ->fallocate() callback. This just re-uses the existing preallocation code. Signed-off-by: Mark Fasheh commit 86313c488a6848b7ec2ba04e74f25f79dd32a0b7 Author: Jeremy Fitzhardinge Date: Tue Jul 17 18:37:03 2007 -0700 usermodehelper: Tidy up waiting Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: Jeremy Fitzhardinge Cc: James Bottomley Cc: Randy Dunlap Cc: Christoph Hellwig Cc: Andi Kleen Cc: Paul Mackerras Cc: Johannes Berg Cc: Ralf Baechle Cc: Bjorn Helgaas Cc: Joel Becker Cc: Tony Luck Cc: Kay Sievers Cc: Srivatsa Vaddagiri Cc: Oleg Nesterov Cc: David Howells commit b8c638acacfe32c0bde361916467af00691f1965 Merge: ef9efe4... 8e1c091... Author: Linus Torvalds Date: Tue Jul 17 15:19:06 2007 -0700 Merge branch 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 * 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6: arch/i386/* fs/* ipc/*: mark variables with uninitialized_var() drivers/*: mark variables with uninitialized_var() commit 8e1c091cccd551557d24ce845715e8ceb6c49d36 Author: Jeff Garzik Date: Tue Jul 17 05:40:59 2007 -0400 arch/i386/* fs/* ipc/*: mark variables with uninitialized_var() Mark variables with uninitialized_var() if such a warning appears, and analysis proves that the var is initialized properly on all paths it is used. 
Signed-off-by: Jeff Garzik commit 3bd858ab1c451725c07a805dcb315215dc85b86e Author: Satyam Sharma Date: Tue Jul 17 15:00:08 2007 +0530 Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check Introduce is_owner_or_cap() macro in fs.h, and convert over relevant users to it. This is done because we want to avoid bugs in the future where we check for only effective fsuid of the current task against a file's owning uid, without simultaneously checking for CAP_FOWNER as well, thus violating its semantics. [ XFS uses special macros and structures, and in general looked ... untouchable, so we leave it alone -- but it has been looked over. ] The (current->fsuid != inode->i_uid) check in generic_permission() and exec_permission_lite() is left alone, because those operations are covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone. Signed-off-by: Satyam Sharma Cc: Al Viro Acked-by: Serge E. Hallyn Signed-off-by: Linus Torvalds commit a569425512253992cc64ebf8b6d00a62f986db3e Author: Christoph Hellwig Date: Tue Jul 17 04:04:28 2007 -0700 knfsd: exportfs: add exportfs.h header currently the export_operation structure and helpers related to it are in fs.h. fs.h is already far too large and there are very few places needing the export bits, so split them off into a separate header. [akpm@linux-foundation.org: fix cifs build] Signed-off-by: Christoph Hellwig Signed-off-by: Neil Brown Cc: Steven French Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit add096909da63ef32d6766f6771c07c9f16c6ee5 Merge: e245bef... 54c57dc... 
Author: Linus Torvalds Date: Mon Jul 16 10:52:55 2007 -0700 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits) [PATCH] ocfs2: zero_user_page conversion ocfs2: Support xfs style space reservation ioctls ocfs2: support for removing file regions ocfs2: update truncate handling of partial clusters ocfs2: btree support for removal of arbirtrary extents ocfs2: Support creation of unwritten extents ocfs2: support writing of unwritten extents ocfs2: small cleanup of ocfs2_write_begin_nolock() ocfs2: btree changes for unwritten extents ocfs2: abstract btree growing calls ocfs2: use all extent block suballocators ocfs2: plug truncate into cached dealloc routines ocfs2: simplify deallocation locking ocfs2: harden buffer check during mapping of page blocks ocfs2: shared writeable mmap ocfs2: factor out write aops into nolock variants ocfs2: rework ocfs2_buffered_write_cluster() ocfs2: take ip_alloc_sem during entire truncate ocfs2: Add "preferred slot" mount option [KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c ... commit 7b595756ec1f49e0049a9e01a1298d53a7faaa15 Author: Tejun Heo Date: Thu Jun 14 03:45:17 2007 +0900 sysfs: kill unnecessary attribute->owner sysfs is now completely out of driver/module lifetime game. After deletion, a sysfs node doesn't access anything outside sysfs proper, so there's no reason to hold onto the attribute owners. Note that often the wrong modules were accounted for as owners leading to accessing removed modules. This patch kills now unnecessary attribute->owner. Note that with this change, userland holding a sysfs node does not prevent the backing module from being unloaded. For more info regarding lifetime rule cleanup, please read the following message. 
http://article.gmane.org/gmane.linux.kernel/510293 (tweaked by Greg to not delete the field just yet, to make it easier to merge things properly.) Signed-off-by: Tejun Heo Cc: Cornelia Huck Cc: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 54c57dc3b6578356c0a428c767d4bf080254a2ee Author: Eric Sandeen Date: Wed Jun 20 17:15:10 2007 -0700 [PATCH] ocfs2: zero_user_page conversion Signed-off-by: Eric Sandeen Signed-off-by: Mark Fasheh commit b25801038da5823bba1b5440a57ca68afc51b6bd Author: Mark Fasheh Date: Fri Mar 9 16:53:21 2007 -0800 ocfs2: Support xfs style space reservation ioctls We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to allocate and deallocate regions to a file without zeroing data or changing i_size. Though renamed, the structure passed in from user is identical to struct xfs_flock64. The three fields that are actually used right now are l_whence, l_start and l_len. This should get ocfs2 immediate compatibility with userspace software using the pre-existing xfs ioctls. Signed-off-by: Mark Fasheh commit 063c4561f52a74de686fe0ff2f96f4f54c9fecd2 Author: Mark Fasheh Date: Tue Jul 3 13:34:11 2007 -0700 ocfs2: support for removing file regions Provide an internal interface for the removal of arbitrary file regions. ocfs2_remove_inode_range() takes a byte range within a file and will remove existing extents within that range. Partial clusters will be zeroed so that any read from within the region will return zeros. Signed-off-by: Mark Fasheh commit 35edec1d52c075975991471d624b33b9336226f2 Author: Mark Fasheh Date: Fri Jul 6 14:41:18 2007 -0700 ocfs2: update truncate handling of partial clusters The partial cluster zeroing code used during truncate usually assumes that the rightmost byte in the range to be zeroed lies on a cluster boundary. This makes sense for truncate, but punching holes might require zeroing on non-aligned rightmost boundaries. 
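The region removal and partial cluster handling described in the two commits above can be illustrated with a small range-splitting sketch. It is not ocfs2_remove_inode_range(): the struct punch_plan type and plan_punch() function are hypothetical, and the sketch only computes which byte ranges would be zeroed (partial clusters at either edge) and which would be removed (whole clusters in between). Empty ranges are represented with equal start and end values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: all values are byte offsets, ranges are
 * half-open [start, end), and start == end means "nothing to do". */
struct punch_plan {
    uint64_t head_zero_start, head_zero_end; /* partial cluster, left edge */
    uint64_t remove_start, remove_end;       /* whole clusters to remove */
    uint64_t tail_zero_start, tail_zero_end; /* partial cluster, right edge */
};

static struct punch_plan plan_punch(uint64_t start, uint64_t len,
                                    unsigned int cluster_bits)
{
    assert(len > 0 && cluster_bits < 32);

    uint64_t csize = 1ULL << cluster_bits;
    uint64_t end = start + len;
    uint64_t up = (start + csize - 1) & ~(csize - 1); /* start rounded up */
    uint64_t down = end & ~(csize - 1);               /* end rounded down */
    struct punch_plan p;

    if (up > down) {
        /* The range sits strictly inside one cluster: zero it all,
         * remove nothing. */
        p.head_zero_start = start;
        p.head_zero_end = end;
        p.remove_start = p.remove_end = end;
        p.tail_zero_start = p.tail_zero_end = end;
    } else {
        p.head_zero_start = start;
        p.head_zero_end = up;
        p.remove_start = up;
        p.remove_end = down;
        p.tail_zero_start = down;
        p.tail_zero_end = end;
    }
    return p;
}
```

For 4k clusters, punching 9000 bytes at offset 4000 zeroes [4000, 4096), removes [4096, 12288), and zeroes [12288, 13000). The non-aligned right edge is the case the truncate code did not previously have to handle.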
Signed-off-by: Mark Fasheh commit d0c7d7082ee1ec4f95ee57bf86ed39d1a27c4037 Author: Mark Fasheh Date: Tue Jul 3 13:27:22 2007 -0700 ocfs2: btree support for removal of arbirtrary extents Add code to the btree paths to support the removal of arbitrary regions within an existing extent. With proper higher level support this can be used to "punch holes" in a file. Truncate (a special case of hole punching) could also be converted to use these methods. Signed-off-by: Mark Fasheh commit 2ae99a60374f360ba07037ebbf33d19b89ac43a6 Author: Mark Fasheh Date: Fri Mar 9 16:43:28 2007 -0800 ocfs2: Support creation of unwritten extents This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh commit b27b7cbcf12a1bfff1ed68a73ddd7d11edc20daf Author: Mark Fasheh Date: Mon Jun 18 11:22:56 2007 -0700 ocfs2: support writing of unwritten extents Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: Mark Fasheh commit 0d172baa5586071ae0ae0c07356a378fdbedecdb Author: Mark Fasheh Date: Mon May 14 18:09:54 2007 -0700 ocfs2: small cleanup of ocfs2_write_begin_nolock() We can easily separate out the write descriptor setup and manipulation into helper functions. Signed-off-by: Mark Fasheh commit 328d5752e1259dfb29b7e65f6c2d145fddbaa750 Author: Mark Fasheh Date: Mon Jun 18 10:48:04 2007 -0700 ocfs2: btree changes for unwritten extents Writes to a region marked as unwritten might result in a record split or merge. We can support splits by making minor changes to the existing insert code. 
Merges require left rotations which mostly re-use right rotation support functions. Signed-off-by: Mark Fasheh commit c3afcbb34426a9291e4c038540129053a72c3cd8 Author: Mark Fasheh Date: Tue May 29 14:28:51 2007 -0700 ocfs2: abstract btree growing calls The top level calls and logic for growing a tree can easily be abstracted out of ocfs2_insert_extent() into a separate function - ocfs2_grow_tree(). This allows future code to easily grow btrees when needed. Signed-off-by: Mark Fasheh commit 1f6697d072e6fd0b332a4301c21060dcb89bd623 Author: Mark Fasheh Date: Mon Jun 25 14:53:33 2007 -0700 ocfs2: use all extent block suballocators Now that we have a method to deallocate blocks from them, each node should allocate extent blocks from their local suballocator file. Signed-off-by: Mark Fasheh commit 59a5e416d1ab543a5248a2b34d83202c4d55d132 Author: Mark Fasheh Date: Fri Jun 22 15:52:36 2007 -0700 ocfs2: plug truncate into cached dealloc routines Signed-off-by: Mark Fasheh commit 2b604351bc99b4e4504758cbac369b660b71de0b Author: Mark Fasheh Date: Fri Jun 22 15:45:27 2007 -0700 ocfs2: simplify deallocation locking Deallocation of suballocator blocks, most notably extent blocks, might involve multiple suballocator inodes. The locking for this can get extremely complicated, especially when the suballocator inodes to delete from aren't known until deep within an unrelated codepath. Implement a simple scheme for recording the blocks to be unlinked so that the actual deallocation can be done in a context which won't deadlock. Signed-off-by: Mark Fasheh commit bce997682fe3121516f5a20cf7bad2e6029ba018 Author: Mark Fasheh Date: Mon Jun 18 11:12:36 2007 -0700 ocfs2: harden buffer check during mapping of page blocks We don't want to submit buffer_new blocks for read i/o. This actually won't happen right now because those requests during an allocating write are all nicely aligned. It's probably a good idea to provide an explicit check though. 
Signed-off-by: Mark Fasheh commit 7307de80510a70e5e5aa98de1e80ccbb7d90a3a8 Author: Mark Fasheh Date: Wed May 9 15:16:19 2007 -0700 ocfs2: shared writeable mmap Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: Mark Fasheh commit 607d44aa3fa6f40b0facaf1028886ed362b92682 Author: Mark Fasheh Date: Wed May 9 15:14:45 2007 -0700 ocfs2: factor out write aops into nolock variants ocfs2_mkwrite() will want this so that it can add some mmap specific checks before asking for a write. Signed-off-by: Mark Fasheh commit 3a307ffc2730bfa1a4dfa94537be9d412338aad2 Author: Mark Fasheh Date: Tue May 8 17:47:32 2007 -0700 ocfs2: rework ocfs2_buffered_write_cluster() Use some ideas from the new-aops patch series and turn ocfs2_buffered_write_cluster() into a 2 stage operation with the caller copying data in between. The code now understands multiple cluster writes as a result of having to deal with a full page write for greater than 4k pages. This sets us up to easily call into the write path during ->page_mkwrite(). Signed-off-by: Mark Fasheh commit 2e89b2e48e1da09ed483f195968c9172aa95b5e2 Author: Mark Fasheh Date: Wed May 9 13:40:18 2007 -0700 ocfs2: take ip_alloc_sem during entire truncate Use of the alloc sem during truncate was too narrow - we want to protect the i_size change and page truncation against mmap now. Signed-off-by: Mark Fasheh commit baf4661a8225d3a39622b795a8db0e6aa845c1ec Author: Sunil Mushran Date: Mon Jun 18 17:00:24 2007 -0700 ocfs2: Add "preferred slot" mount option ocfs2 will attempt to assign the node the slot# provided in the mount option. Failure to assign the preferred slot is not an error. This small feature can be useful for automated testing. 
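The "preferred slot" behaviour described just above (try the requested slot, and fall back without error if it is taken) can be sketched as follows. This is not the ocfs2 slot map code: pick_slot(), the in_use array, and SKETCH_INVALID_SLOT are hypothetical names, and the fallback to the first free slot is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_INVALID_SLOT (-1) /* hypothetical "no slot available" value */

/* Return the preferred slot if it is valid and free; otherwise fall back
 * to the first free slot. A busy or out-of-range preference is not an
 * error. Returns SKETCH_INVALID_SLOT only when every slot is taken. */
static int pick_slot(const bool *in_use, int num_slots, int preferred)
{
    assert(in_use && num_slots > 0);

    if (preferred >= 0 && preferred < num_slots && !in_use[preferred])
        return preferred;

    for (int i = 0; i < num_slots; i++)
        if (!in_use[i])
            return i;

    return SKETCH_INVALID_SLOT;
}
```

Passing a negative value for preferred models a mount without the option. An automated test can pin each node to a known slot and still mount when that slot happens to be occupied.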
Signed-off-by: Sunil Mushran Signed-off-by: Mark Fasheh commit 5fb0f7f010ba07e373c30c3e99b0efd868c6c977 Author: Shani Moideen Date: Mon Jun 11 09:38:19 2007 +0530 [KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Replacing memset(,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Signed-off-by: Shani Moideen Signed-off-by: Mark Fasheh commit 800deef3f6f87fee3a2e89cf7237a1f20c1a78d7 Author: Christoph Hellwig Date: Thu May 17 16:03:13 2007 +0200 [PATCH] ocfs2: use list_for_each_entry where benefical Signed-off-by: Christoph Hellwig Signed-off-by: Mark Fasheh commit e6df3a663a5d1ee68aeae7f007197f272700d9cc Author: Joel Becker Date: Tue Feb 6 15:45:39 2007 -0800 ocfs2: Wake up a starting region if it gets killed in the background. Tell o2hb_region_dev_write() to wake up if rmdir(2) happens on the heartbeat region while it is starting up. Then o2hb_region_dev_write() can check to see if it is alive and act accordingly. This prevents a hang (not being woken) and a crash (if it's woken by a signal). Signed-off-by: Joel Becker Signed-off-by: Mark Fasheh commit 16c6a4f24de2933b26477ad5dfb71f518220d641 Author: Joel Becker Date: Tue Jun 19 11:34:03 2007 -0700 ocfs2: live heartbeat depends on the local node configuration Removing the local node configuration out from underneath a running heartbeat is "bad". Provide an API in the ocfs2 nodemanager to request a configfs dependency on the local node, then use it in heartbeat. Signed-off-by: Joel Becker Signed-off-by: Mark Fasheh commit 14829422be6d6b6721f61b1e749acf5a9cb664d8 Author: Joel Becker Date: Thu Jun 14 21:40:49 2007 -0700 ocfs2: Depend on configfs heartbeat items. ocfs2 mounts require a heartbeat region. Use the new configfs_depend_item() facility to actually depend on them so they can't go away from under us. First, teach cluster/nodemanager.c to depend an item on the o2cb subsystem. 
Then teach o2hb_register_callbacks to take a UUID and depend on the appropriate region. Finally, teach all users of o2hb to pass a UUID or NULL if they don't require a pin. Signed-off-by: Joel Becker Signed-off-by: Mark Fasheh commit e6bd07aee739566803425acdbf5cdb29919164e1 Author: Joel Becker Date: Fri Jul 6 23:33:17 2007 -0700 configfs: Convert subsystem semaphore to mutex Convert the su_sem member of struct configfs_subsystem to a struct mutex, as that's what it is. Also convert all the users and update Documentation/configfs.txt and Documentation/configfs_example.c accordingly. [ Conflict in fs/dlm/config.c with commit 3168b0780d06ace875696f8a648d04d6089654e5 manually resolved. --Mark ] Inspired-by: Satyam Sharma Signed-off-by: Joel Becker Signed-off-by: Mark Fasheh commit cac36bb06efe4880234524e117e0e712b10b1f16 Author: Jens Axboe Date: Thu Jun 14 13:10:48 2007 +0200 pipe: change the ->pin() operation to ->confirm() The name 'pin' was badly chosen, it doesn't pin a pipe buffer in the most commonly used sense in the kernel. So change the name to 'confirm', after debating this issue with Hugh Dickins a bit. A good return from ->confirm() means that the buffer is really there, and that the contents are good. Signed-off-by: Jens Axboe commit d6b29d7cee064f28ca097e906de7453541351095 Author: Jens Axboe Date: Mon Jun 4 09:59:47 2007 +0200 splice: divorce the splice structure/function definitions from the pipe header We need to move even more stuff into the header so that folks can use the splice_to_pipe() implementation instead of open-coding a lot of pipe knowledge (see relay implementation), so move to our own header file finally. Signed-off-by: Jens Axboe commit 5ffc4ef45b3b0a57872f631b4e4ceb8ace0d7496 Author: Jens Axboe Date: Fri Jun 1 11:49:19 2007 +0200 sendfile: remove .sendfile from filesystems that use generic_file_sendfile() They can use generic_file_splice_read() instead. Since sys_sendfile() now prefers that, there should be no change in behaviour. 
Signed-off-by: Jens Axboe commit 6a14b90bb6bc7cd83e2a444bf457a2ea645cbfe7 Author: Jens Axboe Date: Thu Jun 14 13:08:55 2007 +0200 vmsplice: add vmsplice-to-user support A bit of a cheat, it actually just copies the data to userspace. But this makes the interface nice and symmetric and enables people to build on splice, with room for future improvement in performance. Signed-off-by: Jens Axboe commit c66ab6fa705e1b2887a6d9246b798bdc526839e2 Author: Jens Axboe Date: Tue Jun 12 21:17:17 2007 +0200 splice: abstract out actor data For direct splicing (or private splicing), the output may not be a file. So abstract out the handling into a specified actor function and put the data in the splice_desc structure earlier, so we can build on top of that. This is the first step in better splice handling for drivers, and also for implementing vmsplice _to_ user memory. Signed-off-by: Jens Axboe