GIT c2d7a5132bf6404dd749c32983d4013902fc385c git+ssh://master.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw.git commit Author: Steven Whitehouse Date: Sun Aug 26 14:23:56 2007 +0100 [GFS2] Correct lock ordering in unlink This patch corrects the lock ordering in unlink to be the same as that in the rest of GFS2, i.e. parent -> child -> rgrp. Signed-off-by: Steven Whitehouse commit 99cde84d39f2b4adbce3c26e557163857c13c2e0 Author: Wendy Cheng Date: Fri Aug 24 09:15:01 2007 -0400 [GFS2] fix inode meta data corruption Fix a nasty inode meta data corruption issue by keeping the buffer head in icache array. This buffer needs to stay in memory until journal flush occurs Otherwise, gfs2_meta_inode_buffer could do a disk read before the inode hits disk. It ends up with meta data corruptions. The buffer will be released as part of the existing journal flush logic. Signed-off-by: S. Wendy Cheng Signed-off-by: Steven Whitehouse commit 478781d55a347853c66c4f32df7bd3642280a458 Author: Benjamin Marzinski Date: Thu Aug 23 13:19:05 2007 -0500 GFS2] delay glock demote for a minimum hold time When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in a cluster, it will deadlock. The reason is that do_no_page() will repeatedly call gfs2_sharewrite_nopage(), because each node keeps giving up the glock too early, and is forced to call unmap_mapping_range(). This bumps the mapping->truncate_count sequence count, forcing do_no_page() to retry. This patch institutes a minimum glock hold time a tenth a second. This insures that even in heavy contention cases, the node has enough time to get some useful work done before it gives up the glock. A second issue is that when gfs2_glock_dq() is called from within a page fault to demote a lock, and the associated page needs to be written out, it will try to acqire a lock on it, but it has already been locked at a higher level. This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this issue. This is the same patch as Steve Whitehouse originally proposed to fix this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock before it queues up the work on it. Signed-off-by: Benjamin E. Marzinski Signed-off-by: Steven Whitehouse commit 70b62dc923534d56311c47f1ab448294d9c1ce7e Author: Abhijith Das Date: Thu Aug 23 13:33:01 2007 -0500 [GFS2] panic after can't parse mount arguments When you try to mount gfs2 with -o garbage, the mount fails and the gfs2 superblock is deallocated and becomes NULL. The vfs comes around later on and calls gfs2_kill_sb. At this point the hidden gfs2 superblock pointer (sb->s_fs_info) is NULL and dereferencing it through gfs2_meta_syncfs causes the panic. (the other function call to gfs2_delete_debugfs_file() succeeds because this function already checks for a NULL pointer) Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse commit 674f3cc10376469ff43227b0d5fd8a47c7ededdb Author: Bob Peterson Date: Wed Aug 22 11:15:29 2007 -0500 [GFS2] Patch to protect sd_log_num_jdata This is a patch to GFS2 to protect sd_log_num_jdata with the gfs2_log_lock. Without this patch, there is a timing window where you can get hit the following assert from function gfs2_log_flush(): gfs2_assert_withdraw(sdp, sdp->sd_log_num_buf + sdp->sd_log_num_jdata == sdp->sd_log_commited_buf + sdp->sd_log_commited_databuf); I've tested it on my roth cluster and it fixes the problem. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit 0cf7b552a99491caf793fe60b1d22b9c8d192f4b Author: Abhijith Das Date: Tue Aug 21 09:57:29 2007 -0500 [GFS2] Wendy's dump lockname in hex & fix glock dump With this patch, gfs2 glockdump through the debugfs filesystem will only dump glocks for the specified filesystem instead of all glocks. Also, to aid debugging, the glock number is dumped in hex instead of decimal. Signed-off-by: Steven Whitehouse Signed-off-by: S. Wendy Cheng Signed-off-by: Abhijith Das commit 5899cb57a100ba1be62aa68cce3171e51f7d3ad6 Author: Patrick Caulfield Date: Mon Aug 20 15:13:38 2007 +0100 [DLM] Fix lowcomms socket closing This patch fixes the slight mess made in lowcomms closing by previous patches and fixes all sorts of DLM hangs. Signed-Off-By: Patrick Caulfield Signed-off-by: Steven Whitehouse commit 7c6e58e861bed4f2300f53e82ff5c62d028aeaf8 Author: Wendy Cheng Date: Mon Aug 20 09:29:53 2007 -0400 [GFS2] Reduce truncate IO traffic Current GFS2 setattr call unconditionally invokes do_shrink even the requested size and actual file size are equal. This has generated large amount of extra IOs found during NFS benchmark runs. This patch moves the relevant logic out of shrink code path. Since setattr is a system call, the time stamps update is still required. Signed-off-by: S. Wendy Cheng Signed-off-by: Steven Whitehouse commit b98fa1dc60289ed91b990cec762054eaf86881d9 Author: Benjamin Marzinski Date: Fri Aug 17 20:22:07 2007 -0500 [GFS2] Add NULL entry to token table match_token() was returning garbage data instead of a fail value. This data happened to match a valid option id for an option that required an argument (in this case, lockproto=%s) For match_token() to correctly fail if the option doesn't match any of the tokens, the token table must end with a NULL entry. This patch adds the NULL entry. Signed-off-by: Benjamin E. Marzinski Signed-off-by: Steven Whitehouse commit 3d56af0e3ca7573f9b0dc086f3139cd7afe65dd8 Author: Steven Whitehouse Date: Thu Aug 16 17:08:20 2007 +0100 [GFS2] Add a missing gfs2_trans_add_bh() This was missing from the dir_split_leaf() function although in most cases its not a problem due to other functions having already previously called gfs2_trans_add_bh. This makes certain that it is correct. Signed-off-by: Steven Whitehouse Cc: Wendy Cheng commit ab0248d0868ac8f5081fd0637dd8edf08e32cedc Author: Steven Whitehouse Date: Thu Aug 16 16:03:57 2007 +0100 [GFS2] Clean up invalidatepage/releasepage This patch fixes some bugs relating to journaled data files by cleaning up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now never block during gfs2_releasepage(), instead we always either release or refuse to release depending on the status of the buffers. This fixes Red Hat bugzillas #248969 and #252392. Signed-off-by: Steven Whitehouse Cc: Bob Peterson commit d8b5784198794669e0a52799996e62f7fd222154 Author: Abhijith Das Date: Wed Aug 15 11:25:05 2007 -0500 [GFS2] Fix quota do_list operation hang This is the filesystem part of the patches to fix this bz. There are additional userland patches (gfs2_quota, libgfs2) for the complete solution. This patch adds a new field qu_ll_next to the gfs2_quota structure. This field allows us to create linked lists of quotas in the ondisk quota inode. Instead of scanning through the entire sparse quota file for valid quotas, we can now simply walk through the user and group quota linked lists to perform the do_list operation. Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse commit d868841cd77f76363c348ca4a8d15a1f2851f716 Author: Denis Cheng Date: Wed Aug 15 23:54:44 2007 +0800 [GFS2] fixed a NULL pointer assignment BUG Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit 0633a2e4f7f34290b6ffc2cb002a14aa545bd05b Author: Abhijith Das Date: Tue Aug 14 15:34:58 2007 -0500 [GFS2] Force unstuff of hidden quota inode This patch forcibly unstuffs (if stuffed) the hidden quota inode at the first availble opportunity. In any practical scenario the quota inode won't be stuffed, so this is ok to do. Unstuffing the quota inode allows us to ignore the case of a stuffed quota inode in gfs2_adjust_quota(). Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse commit 645eed2e681f6d12ce79aadbd33418227789be47 Author: Denis Cheng Date: Mon Aug 13 11:01:58 2007 +0800 [GFS2] better code for translating characters the original code could work, but I think this code could work better. Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit 5b40f06de42733785c0cbbbee11586c5d2926f16 Author: Denis Cheng Date: Sat Aug 11 10:27:08 2007 +0800 [GFS2] unneeded typecast sb->s_fs_info is a void pointer, thus the type cast is not needed. Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit d7578845ec6c945c8fe73f5b0edf128e41a53f97 Author: Denis Cheng Date: Sat Aug 11 10:27:07 2007 +0800 [GFS2] use list_for_each_entry instead Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit 977692a686c34dad4dfa4381da52baa3a166598e Author: Bob Peterson Date: Wed Aug 8 17:08:14 2007 -0500 [GFS2] Ensure journal file cache is flushed after recovery This is for bugzilla bug #248176: GFS2: invalid metadata block Patches 1 thru 3 were accepted upstream, but there were problems with 4 and 5. Those issues have been resolved and now the recovery tests are passing without errors. This code has gone through 41 * 3 successful gfs2 recovery tests before it hit an unrelated (openais) problem. I'm continuing to test it. This is a complete rewrite of patch 5 for bug #248176, written by Steve Whitehouse. This is referred to in the bugzilla record as "new 6" and "a different solution". The problem was that the journal inodes, although protected by a glock, were not synched with the other nodes because they don't use the inode glock synch operations (i.e. no "glops" were defined). Therefore, journal recovery on a journal-recovering node were causing the blocks to get out of sync with the node that was actually trying to use that journal as it comes back up from a reboot. There are two possible solutions: (1) To make the journals use the normal inode glock sync operations, or (2) To make the journal operations take effect immediately (i.e. no caching). Although option 1 works, it turns out to be a lot more code. Steve opted for option 2, which is much simpler and therefore less prone to regression errors. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse -- commit ff12bca47052521d168b898ed8f436b3032a7f49 Author: Bob Peterson Date: Wed Aug 8 16:52:09 2007 -0500 [GFS2] invalid metadata block - REVISED This is for bugzilla bug #248176: GFS2: invalid metadata block Patches 1 thru 3 were accepted upstream, but there were problems with 4 and 5. Those issues have been resolved and now the recovery tests are passing without errors. This code has gone through 41 * 3 successful gfs2 recovery tests before it hit an unrelated (openais) problem. This is a complete rewrite of patch 4 for bug #248176. Part of the problem was that inodes were being recycled before their buffers were flushed to the journal logs. Another problem was that the clone bitmaps were being searched for deleted inodes to recycle, but only the "real" bitmaps should be searched for that purpose. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit 1642e2cd821c65dce475e504b812167a30ef1170 Author: Steven Whitehouse Date: Wed Aug 1 13:57:10 2007 +0100 [GFS2] Reduce number of gfs2_scand processes to one We only need a single gfs2_scand process rather than the one per filesystem which we had previously. As a result the parameter determining the frequency of gfs2_scand runs becomes a module parameter rather than a mount parameter as it was before. Signed-off-by: Steven Whitehouse commit 6504608f5d1933e3fc2f039211455d2b97b1645d Author: Denis Cheng Date: Tue Jul 31 18:31:12 2007 +0800 [GFS2] use the declaration of gfs2_dops in the header file instead Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit 8e18fb8890b7e6818be5982953dc90a321c6e13e Author: Denis Cheng Date: Tue Jul 31 18:31:11 2007 +0800 [GFS2] mark struct *_operations const these struct *_operations are all method tables, thus should be const. Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit 711657d35bf3dbca0f81c16e57a5e4d0377bfcf2 Author: Bob Peterson Date: Wed Jul 25 10:06:22 2007 -0500 [GFS2] Detach buf data during in-place writeback This is patch 5 of 5 for bug #248176 Metadata corruption was occurring because page references weren't being removed in all cases. I previously added a function called detach_bufdata, but I discovered there already WAS a function out there to do the job. It's called gfs2_meta_cache_flush. So I added a call to that to remove the page references. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit 89af6e960937babfc41bd6db0dde6bbf84f8980f Author: Denis Cheng Date: Wed Jul 25 17:53:58 2007 +0800 [GFS2] use an temp variable to reduce a spin_unlock this is more clear. Signed-off-by: Denis Cheng Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 7d542615a3ee0dca435f2ae2bc930ed991119be3 Author: Bob Peterson Date: Tue Jul 24 14:09:32 2007 -0500 [GFS2] Prevent infinite loop in try_rgrp_unlink() This is patch three of five for bug #248176. The try_rgrp_unlink code in rgrp.c had an infinite loop. This was caused because the bitmap function rgblk_search can return a block less than the "goal" block, in which case it was looping. The fix is to make it always march forward as needed. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit d579c56a3e2f86a499d8116a3ea9bbf551b86ced Author: Bob Peterson Date: Tue Jul 24 14:07:33 2007 -0500 [GFS2] Revert part of earlier log.c changes This is patch 2 of 5 for bug #248176. The list_move code previously concocted in log.c for bug #238162 (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23) never runs as bh can now never be NULL at this point. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit e9ed393eb9e2e7a5afc609ea1b4d30f95637d7e6 Author: Bob Peterson Date: Tue Jul 24 14:05:31 2007 -0500 [GFS2] Move some code inside the log lock This is the first of five patches for bug #248176: There were still some critical variables being manipulated outside the log_lock spinlock. That usually resulted in a hang. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit a3d2e6b8aaa9e954127cacbce86ba53075a72bf4 Author: Steven Whitehouse Date: Tue Jul 24 13:53:36 2007 +0100 [GFS2] Fix an oops in glock dumping This fixes an oops which was occurring during glock dumping due to the seq file code not taking a reference to the glock. Also this fixes a memory leak which occurred in certain cases, in turn preventing the filesystem from unmounting. Signed-off-by: Steven Whitehouse commit 0764a3d7f09e47f128418293e66abd46907bdebe Author: Steve French Date: Fri Jul 20 13:07:26 2007 -0500 [GFS2] GFS2 not checking pointer on create when running under nfsd When looking at an unrelated problem, I noticed that nfsd does not set nameidata pointer on create (ie nd is NULL). This should cause an oops in some cases in which when NFSd is mounted over GFS2. Signed-off-by: Steve French Signed-off-by: Steven Whitehouse commit c9c2841d0bb2977ccb97c4b353a0ea459945c80d Author: Jesper Juhl Date: Sat Jul 21 17:03:22 2007 +0200 [GFS2] Clean up duplicate includes in fs/gfs2/ This patch cleans up duplicate includes in fs/gfs2/ Signed-off-by: Jesper Juhl Signed-off-by: Steven Whitehouse commit 30bba2c49fa30e491a3f8b7fd8cb820d7978f701 Author: Josef Whiter Date: Mon Jul 23 10:02:40 2007 +0100 [GFS2] Fix calculation of demote state If a glock is in the exclusive state and a request for demote to deferred has been received, then further requests for demote to shared are being ignored. This patch fixes that by ensuring that we demote to unlocked in that case. Signed-off-by: Josef Whiter Signed-off-by: Steven Whitehouse commit 2d438d87c2f57b99f147a8bc5fbd32680b981a39 Author: Steven Whitehouse Date: Mon Jul 23 09:54:36 2007 +0100 [GFS2] Fix two races relating to glock callbacks One of the races relates to referencing a variable while not holding its protecting spinlock. The patch simply moves the test inside the spin lock. The other races occurs when a demote to unlocked request occurs during the time a demote to shared request is already running. This of course only happens in the case that the lock was in the exclusive mode to start with. The patch adds a check to see if another demote request has occurred in the mean time and if it has, then it performs a second demote. Signed-off-by: Steven Whitehouse fs/dlm/lowcomms.c | 16 +-- fs/gfs2/bmap.c | 32 +++++ fs/gfs2/daemon.c | 24 ---- fs/gfs2/daemon.h | 1 fs/gfs2/dir.c | 1 fs/gfs2/eaops.c | 8 + fs/gfs2/eaops.h | 4 - fs/gfs2/glock.c | 244 +++++++++++++++++++++++++++------------- fs/gfs2/glock.h | 4 - fs/gfs2/glops.c | 6 + fs/gfs2/incore.h | 7 + fs/gfs2/inode.c | 20 ++- fs/gfs2/locking/dlm/lock_dlm.h | 1 fs/gfs2/locking/dlm/plock.c | 11 +- fs/gfs2/locking/nolock/main.c | 1 fs/gfs2/log.c | 18 +-- fs/gfs2/lops.c | 20 +++ fs/gfs2/main.c | 3 fs/gfs2/mount.c | 5 + fs/gfs2/ops_address.c | 132 ++-------------------- fs/gfs2/ops_fstype.c | 34 ++---- fs/gfs2/ops_inode.c | 31 +++-- fs/gfs2/ops_super.c | 1 fs/gfs2/quota.c | 13 ++ fs/gfs2/recovery.c | 2 fs/gfs2/rgrp.c | 37 ++++-- fs/gfs2/super.c | 1 fs/gfs2/sys.c | 2 include/linux/gfs2_ondisk.h | 30 +++++ 29 files changed, 385 insertions(+), 324 deletions(-) diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index 9e9d2e8..62a8a6c 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -334,18 +334,8 @@ static void close_connection(struct conn con->rx_page = NULL; } - /* If we are an 'othercon' then NULL the pointer to us - from the parent and tidy ourself up */ - if (test_bit(CF_IS_OTHERCON, &con->flags)) { - struct connection *parent = __nodeid2con(con->nodeid, 0); - parent->othercon = NULL; - kmem_cache_free(con_cache, con); - } - else { - /* Parent connections get reused */ - con->retries = 0; - mutex_unlock(&con->sock_mutex); - } + con->retries = 0; + mutex_unlock(&con->sock_mutex); } /* We only send shutdown messages to nodes that are not part of the cluster */ @@ -731,6 +721,8 @@ static int tcp_accept_from_sock(struct c INIT_WORK(&othercon->swork, process_send_sockets); INIT_WORK(&othercon->rwork, process_recv_sockets); set_bit(CF_IS_OTHERCON, &othercon->flags); + } + if (!othercon->sock) { newcon->othercon = othercon; othercon->sock = newsock; newsock->sk->sk_user_data = othercon; diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index cd805a6..9b89904 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -1085,6 +1085,33 @@ static int do_shrink(struct gfs2_inode * return error; } +static int do_touch(struct gfs2_inode *ip, u64 size) +{ + struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); + struct buffer_head *dibh; + int error; + + error = gfs2_trans_begin(sdp, RES_DINODE, 0); + if (error) + return error; + + down_write(&ip->i_rw_mutex); + + error = gfs2_meta_inode_buffer(ip, &dibh); + if (error) + goto do_touch_out; + + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; + gfs2_trans_add_bh(ip->i_gl, dibh, 1); + gfs2_dinode_out(ip, dibh->b_data); + brelse(dibh); + +do_touch_out: + up_write(&ip->i_rw_mutex); + gfs2_trans_end(sdp); + return error; +} + /** * gfs2_truncatei - make a file a given size * @ip: the inode @@ -1105,8 +1132,11 @@ int gfs2_truncatei(struct gfs2_inode *ip if (size > ip->i_di.di_size) error = do_grow(ip, size); - else + else if (size < ip->i_di.di_size) error = do_shrink(ip, size); + else + /* update time stamps */ + error = do_touch(ip, size); return error; } diff --git a/fs/gfs2/daemon.c b/fs/gfs2/daemon.c index 3548d9f..3731ab0 100644 --- a/fs/gfs2/daemon.c +++ b/fs/gfs2/daemon.c @@ -35,30 +35,6 @@ #include "util.h" The kthread functions used to start these daemons block and flush signals. */ /** - * gfs2_scand - Look for cached glocks and inodes to toss from memory - * @sdp: Pointer to GFS2 superblock - * - * One of these daemons runs, finding candidates to add to sd_reclaim_list. - * See gfs2_glockd() - */ - -int gfs2_scand(void *data) -{ - struct gfs2_sbd *sdp = data; - unsigned long t; - - while (!kthread_should_stop()) { - gfs2_scand_internal(sdp); - t = gfs2_tune_get(sdp, gt_scand_secs) * HZ; - if (freezing(current)) - refrigerator(); - schedule_timeout_interruptible(t); - } - - return 0; -} - -/** * gfs2_glockd - Reclaim unused glock structures * @sdp: Pointer to GFS2 superblock * diff --git a/fs/gfs2/daemon.h b/fs/gfs2/daemon.h index 8010071..0de9b35 100644 --- a/fs/gfs2/daemon.h +++ b/fs/gfs2/daemon.h @@ -10,7 +10,6 @@ #ifndef __DAEMON_DOT_H__ #define __DAEMON_DOT_H__ -int gfs2_scand(void *data); int gfs2_glockd(void *data); int gfs2_recoverd(void *data); int gfs2_logd(void *data); diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index 2beb2f4..08c6dd0 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -1043,6 +1043,7 @@ static int dir_split_leaf(struct inode * error = gfs2_meta_inode_buffer(dip, &dibh); if (!gfs2_assert_withdraw(GFS2_SB(&dip->i_inode), !error)) { + gfs2_trans_add_bh(dip->i_gl, dibh, 1); dip->i_di.di_blocks++; gfs2_set_inode_blocks(&dip->i_inode); gfs2_dinode_out(dip, dibh->b_data); diff --git a/fs/gfs2/eaops.c b/fs/gfs2/eaops.c index 1ab3e9d..aa8dbf3 100644 --- a/fs/gfs2/eaops.c +++ b/fs/gfs2/eaops.c @@ -200,28 +200,28 @@ static int security_eo_remove(struct gfs return gfs2_ea_remove_i(ip, er); } -static struct gfs2_eattr_operations gfs2_user_eaops = { +static const struct gfs2_eattr_operations gfs2_user_eaops = { .eo_get = user_eo_get, .eo_set = user_eo_set, .eo_remove = user_eo_remove, .eo_name = "user", }; -struct gfs2_eattr_operations gfs2_system_eaops = { +const struct gfs2_eattr_operations gfs2_system_eaops = { .eo_get = system_eo_get, .eo_set = system_eo_set, .eo_remove = system_eo_remove, .eo_name = "system", }; -static struct gfs2_eattr_operations gfs2_security_eaops = { +static const struct gfs2_eattr_operations gfs2_security_eaops = { .eo_get = security_eo_get, .eo_set = security_eo_set, .eo_remove = security_eo_remove, .eo_name = "security", }; -struct gfs2_eattr_operations *gfs2_ea_ops[] = { +const struct gfs2_eattr_operations *gfs2_ea_ops[] = { NULL, &gfs2_user_eaops, &gfs2_system_eaops, diff --git a/fs/gfs2/eaops.h b/fs/gfs2/eaops.h index 508b4f7..da2f7fb 100644 --- a/fs/gfs2/eaops.h +++ b/fs/gfs2/eaops.h @@ -22,9 +22,9 @@ struct gfs2_eattr_operations { unsigned int gfs2_ea_name2type(const char *name, const char **truncated_name); -extern struct gfs2_eattr_operations gfs2_system_eaops; +extern const struct gfs2_eattr_operations gfs2_system_eaops; -extern struct gfs2_eattr_operations *gfs2_ea_ops[]; +extern const struct gfs2_eattr_operations *gfs2_ea_ops[]; #endif /* __EAOPS_DOT_H__ */ diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 3f0974e..931368a 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -25,8 +25,10 @@ #include #include #include #include -#include -#include +#include +#include +#include +#include #include "gfs2.h" #include "incore.h" @@ -48,7 +50,6 @@ struct glock_iter { int hash; /* hash bucket index */ struct gfs2_sbd *sdp; /* incore superblock */ struct gfs2_glock *gl; /* current glock struct */ - struct hlist_head *hb_list; /* current hash bucket ptr */ struct seq_file *seq; /* sequence file for debugfs */ char string[512]; /* scratch space */ }; @@ -59,8 +60,13 @@ static int gfs2_dump_lockstate(struct gf static int dump_glock(struct glock_iter *gi, struct gfs2_glock *gl); static void gfs2_glock_xmote_th(struct gfs2_glock *gl, struct gfs2_holder *gh); static void gfs2_glock_drop_th(struct gfs2_glock *gl); +static void run_queue(struct gfs2_glock *gl); + static DECLARE_RWSEM(gfs2_umount_flush_sem); static struct dentry *gfs2_root; +static struct task_struct *scand_process; +static unsigned int scand_secs = 5; +static struct workqueue_struct *glock_workqueue; #define GFS2_GL_HASH_SHIFT 15 #define GFS2_GL_HASH_SIZE (1 << GFS2_GL_HASH_SHIFT) @@ -276,6 +282,18 @@ static struct gfs2_glock *gfs2_glock_fin return gl; } +static void glock_work_func(struct work_struct *work) +{ + struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_work.work); + + spin_lock(&gl->gl_spin); + if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags)) + set_bit(GLF_DEMOTE, &gl->gl_flags); + run_queue(gl); + spin_unlock(&gl->gl_spin); + gfs2_glock_put(gl); +} + /** * gfs2_glock_get() - Get a glock, or create one if one doesn't exist * @sdp: The GFS2 superblock @@ -315,6 +333,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, gl->gl_name = name; atomic_set(&gl->gl_ref, 1); gl->gl_state = LM_ST_UNLOCKED; + gl->gl_demote_state = LM_ST_EXCLUSIVE; gl->gl_hash = hash; gl->gl_owner_pid = 0; gl->gl_ip = 0; @@ -323,10 +342,12 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, gl->gl_req_bh = NULL; gl->gl_vn = 0; gl->gl_stamp = jiffies; + gl->gl_tchange = jiffies; gl->gl_object = NULL; gl->gl_sbd = sdp; gl->gl_aspace = NULL; lops_init_le(&gl->gl_le, &gfs2_glock_lops); + INIT_DELAYED_WORK(&gl->gl_work, glock_work_func); /* If this glock protects actual on-disk data or metadata blocks, create a VFS inode to manage the pages/buffers holding them. */ @@ -440,6 +461,8 @@ static void wait_on_holder(struct gfs2_h static void gfs2_demote_wake(struct gfs2_glock *gl) { + BUG_ON(!spin_is_locked(&gl->gl_spin)); + gl->gl_demote_state = LM_ST_EXCLUSIVE; clear_bit(GLF_DEMOTE, &gl->gl_flags); smp_mb__after_clear_bit(); wake_up_bit(&gl->gl_flags, GLF_DEMOTE); @@ -545,12 +568,14 @@ static int rq_demote(struct gfs2_glock * return 0; } set_bit(GLF_LOCK, &gl->gl_flags); - spin_unlock(&gl->gl_spin); if (gl->gl_demote_state == LM_ST_UNLOCKED || - gl->gl_state != LM_ST_EXCLUSIVE) + gl->gl_state != LM_ST_EXCLUSIVE) { + spin_unlock(&gl->gl_spin); gfs2_glock_drop_th(gl); - else + } else { + spin_unlock(&gl->gl_spin); gfs2_glock_xmote_th(gl, NULL); + } spin_lock(&gl->gl_spin); return 0; @@ -679,10 +704,14 @@ static void gfs2_glmutex_unlock(struct g * practise: LM_ST_SHARED and LM_ST_UNLOCKED */ -static void handle_callback(struct gfs2_glock *gl, unsigned int state, int remote) +static void handle_callback(struct gfs2_glock *gl, unsigned int state, + int remote, unsigned long delay) { + int bit = delay ? GLF_PENDING_DEMOTE : GLF_DEMOTE; + spin_lock(&gl->gl_spin); - if (test_and_set_bit(GLF_DEMOTE, &gl->gl_flags) == 0) { + set_bit(bit, &gl->gl_flags); + if (gl->gl_demote_state == LM_ST_EXCLUSIVE) { gl->gl_demote_state = state; gl->gl_demote_time = jiffies; if (remote && gl->gl_ops->go_type == LM_TYPE_IOPEN && @@ -695,8 +724,9 @@ static void handle_callback(struct gfs2_ } return; } - } else if (gl->gl_demote_state != LM_ST_UNLOCKED) { - gl->gl_demote_state = state; + } else if (gl->gl_demote_state != LM_ST_UNLOCKED && + gl->gl_demote_state != state) { + gl->gl_demote_state = LM_ST_UNLOCKED; } spin_unlock(&gl->gl_spin); } @@ -723,6 +753,7 @@ static void state_change(struct gfs2_glo } gl->gl_state = new_state; + gl->gl_tchange = jiffies; } /** @@ -760,10 +791,20 @@ static void xmote_bh(struct gfs2_glock * if (!gh) { gl->gl_stamp = jiffies; - if (ret & LM_OUT_CANCELED) + if (ret & LM_OUT_CANCELED) { op_done = 0; - else + } else { + spin_lock(&gl->gl_spin); + if (gl->gl_state != gl->gl_demote_state) { + gl->gl_req_bh = NULL; + spin_unlock(&gl->gl_spin); + gfs2_glock_drop_th(gl); + gfs2_glock_put(gl); + return; + } gfs2_demote_wake(gl); + spin_unlock(&gl->gl_spin); + } } else { spin_lock(&gl->gl_spin); list_del_init(&gh->gh_list); @@ -799,7 +840,6 @@ out: gl->gl_req_gh = NULL; gl->gl_req_bh = NULL; clear_bit(GLF_LOCK, &gl->gl_flags); - run_queue(gl); spin_unlock(&gl->gl_spin); } @@ -817,7 +857,7 @@ out: * */ -void gfs2_glock_xmote_th(struct gfs2_glock *gl, struct gfs2_holder *gh) +static void gfs2_glock_xmote_th(struct gfs2_glock *gl, struct gfs2_holder *gh) { struct gfs2_sbd *sdp = gl->gl_sbd; int flags = gh ? gh->gh_flags : 0; @@ -871,7 +911,6 @@ static void drop_bh(struct gfs2_glock *g gfs2_assert_warn(sdp, !ret); state_change(gl, LM_ST_UNLOCKED); - gfs2_demote_wake(gl); if (glops->go_inval) glops->go_inval(gl, DIO_METADATA); @@ -884,10 +923,10 @@ static void drop_bh(struct gfs2_glock *g } spin_lock(&gl->gl_spin); + gfs2_demote_wake(gl); gl->gl_req_gh = NULL; gl->gl_req_bh = NULL; clear_bit(GLF_LOCK, &gl->gl_flags); - run_queue(gl); spin_unlock(&gl->gl_spin); gfs2_glock_put(gl); @@ -1195,9 +1234,10 @@ void gfs2_glock_dq(struct gfs2_holder *g { struct gfs2_glock *gl = gh->gh_gl; const struct gfs2_glock_operations *glops = gl->gl_ops; + unsigned delay = 0; if (gh->gh_flags & GL_NOCACHE) - handle_callback(gl, LM_ST_UNLOCKED, 0); + handle_callback(gl, LM_ST_UNLOCKED, 0, 0); gfs2_glmutex_lock(gl); @@ -1215,8 +1255,14 @@ void gfs2_glock_dq(struct gfs2_holder *g } clear_bit(GLF_LOCK, &gl->gl_flags); - run_queue(gl); spin_unlock(&gl->gl_spin); + + gfs2_glock_hold(gl); + if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) && + !test_bit(GLF_DEMOTE, &gl->gl_flags)) + delay = gl->gl_ops->go_min_hold_time; + if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0) + gfs2_glock_put(gl); } void gfs2_glock_dq_wait(struct gfs2_holder *gh) @@ -1443,18 +1489,21 @@ static void blocking_cb(struct gfs2_sbd unsigned int state) { struct gfs2_glock *gl; + unsigned long delay = 0; + unsigned long holdtime; + unsigned long now = jiffies; gl = gfs2_glock_find(sdp, name); if (!gl) return; - handle_callback(gl, state, 1); + holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time; + if (time_before(now, holdtime)) + delay = holdtime - now; - spin_lock(&gl->gl_spin); - run_queue(gl); - spin_unlock(&gl->gl_spin); - - gfs2_glock_put(gl); + handle_callback(gl, state, 1, delay); + if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0) + gfs2_glock_put(gl); } /** @@ -1495,7 +1544,8 @@ void gfs2_glock_cb(void *cb_data, unsign return; if (!gfs2_assert_warn(sdp, gl->gl_req_bh)) gl->gl_req_bh(gl, async->lc_ret); - gfs2_glock_put(gl); + if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0) + gfs2_glock_put(gl); up_read(&gfs2_umount_flush_sem); return; } @@ -1588,7 +1638,7 @@ void gfs2_reclaim_glock(struct gfs2_sbd if (gfs2_glmutex_trylock(gl)) { if (list_empty(&gl->gl_holders) && gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl)) - handle_callback(gl, LM_ST_UNLOCKED, 0); + handle_callback(gl, LM_ST_UNLOCKED, 0, 0); gfs2_glmutex_unlock(gl); } @@ -1617,7 +1667,7 @@ static int examine_bucket(glock_examiner goto out; gl = list_entry(head->first, struct gfs2_glock, gl_list); while(1) { - if (gl->gl_sbd == sdp) { + if (!sdp || gl->gl_sbd == sdp) { gfs2_glock_hold(gl); read_unlock(gl_lock_addr(hash)); if (prev) @@ -1635,6 +1685,7 @@ out: read_unlock(gl_lock_addr(hash)); if (prev) gfs2_glock_put(prev); + cond_resched(); return has_entries; } @@ -1663,20 +1714,6 @@ out_schedule: } /** - * gfs2_scand_internal - Look for glocks and inodes to toss from memory - * @sdp: the filesystem - * - */ - -void gfs2_scand_internal(struct gfs2_sbd *sdp) -{ - unsigned int x; - - for (x = 0; x < GFS2_GL_HASH_SIZE; x++) - examine_bucket(scan_glock, sdp, x); -} - -/** * clear_glock - look at a glock and see if we can free it from glock cache * @gl: the glock to look at * @@ -1701,7 +1738,7 @@ static void clear_glock(struct gfs2_gloc if (gfs2_glmutex_trylock(gl)) { if (list_empty(&gl->gl_holders) && gl->gl_state != LM_ST_UNLOCKED) - handle_callback(gl, LM_ST_UNLOCKED, 0); + handle_callback(gl, LM_ST_UNLOCKED, 0, 0); gfs2_glmutex_unlock(gl); } } @@ -1843,7 +1880,7 @@ static int dump_glock(struct glock_iter spin_lock(&gl->gl_spin); - print_dbg(gi, "Glock 0x%p (%u, %llu)\n", gl, gl->gl_name.ln_type, + print_dbg(gi, "Glock 0x%p (%u, 0x%llx)\n", gl, gl->gl_name.ln_type, (unsigned long long)gl->gl_name.ln_number); print_dbg(gi, " gl_flags ="); for (x = 0; x < 32; x++) { @@ -1963,6 +2000,35 @@ static int gfs2_dump_lockstate(struct gf return error; } +/** + * gfs2_scand - Look for cached glocks and inodes to toss from memory + * @sdp: Pointer to GFS2 superblock + * + * One of these daemons runs, finding candidates to add to sd_reclaim_list. + * See gfs2_glockd() + */ + +static int gfs2_scand(void *data) +{ + unsigned x; + unsigned delay; + + while (!kthread_should_stop()) { + for (x = 0; x < GFS2_GL_HASH_SIZE; x++) + examine_bucket(scan_glock, NULL, x); + if (freezing(current)) + refrigerator(); + delay = scand_secs; + if (delay < 1) + delay = 1; + schedule_timeout_interruptible(delay * HZ); + } + + return 0; +} + + + int __init gfs2_glock_init(void) { unsigned i; @@ -1974,52 +2040,69 @@ #ifdef GL_HASH_LOCK_SZ rwlock_init(&gl_hash_locks[i]); } #endif + + scand_process = kthread_run(gfs2_scand, NULL, "gfs2_scand"); + if (IS_ERR(scand_process)) + return PTR_ERR(scand_process); + + glock_workqueue = create_workqueue("glock_workqueue"); + if (IS_ERR(glock_workqueue)) { + kthread_stop(scand_process); + return PTR_ERR(glock_workqueue); + } + return 0; } +void gfs2_glock_exit(void) +{ + destroy_workqueue(glock_workqueue); + kthread_stop(scand_process); +} + +module_param(scand_secs, uint, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(scand_secs, "The number of seconds between scand runs"); + static int gfs2_glock_iter_next(struct glock_iter *gi) { + struct gfs2_glock *gl; + +restart: read_lock(gl_lock_addr(gi->hash)); - while (1) { - if (!gi->hb_list) { /* If we don't have a hash bucket yet */ - gi->hb_list = &gl_hash_table[gi->hash].hb_list; - if (hlist_empty(gi->hb_list)) { - read_unlock(gl_lock_addr(gi->hash)); - gi->hash++; - read_lock(gl_lock_addr(gi->hash)); - gi->hb_list = NULL; - if (gi->hash >= GFS2_GL_HASH_SIZE) { - read_unlock(gl_lock_addr(gi->hash)); - return 1; - } - else - continue; - } - if (!hlist_empty(gi->hb_list)) { - gi->gl = list_entry(gi->hb_list->first, - struct gfs2_glock, - gl_list); - } - } else { - if (gi->gl->gl_list.next == NULL) { - read_unlock(gl_lock_addr(gi->hash)); - gi->hash++; - read_lock(gl_lock_addr(gi->hash)); - gi->hb_list = NULL; - continue; - } - gi->gl = list_entry(gi->gl->gl_list.next, - struct gfs2_glock, gl_list); - } + gl = gi->gl; + if (gl) { + gi->gl = hlist_entry(gl->gl_list.next, + struct gfs2_glock, gl_list); if (gi->gl) - break; + gfs2_glock_hold(gi->gl); } read_unlock(gl_lock_addr(gi->hash)); + if (gl) + gfs2_glock_put(gl); + if (gl && gi->gl == NULL) + gi->hash++; + while(gi->gl == NULL) { + if (gi->hash >= GFS2_GL_HASH_SIZE) + return 1; + read_lock(gl_lock_addr(gi->hash)); + gi->gl = hlist_entry(gl_hash_table[gi->hash].hb_list.first, + struct gfs2_glock, gl_list); + if (gi->gl) + gfs2_glock_hold(gi->gl); + read_unlock(gl_lock_addr(gi->hash)); + gi->hash++; + } + + if (gi->sdp != gi->gl->gl_sbd) + goto restart; + return 0; } static void gfs2_glock_iter_free(struct glock_iter *gi) { + if (gi->gl) + gfs2_glock_put(gi->gl); kfree(gi); } @@ -2033,9 +2116,8 @@ static struct glock_iter *gfs2_glock_ite gi->sdp = sdp; gi->hash = 0; - gi->gl = NULL; - gi->hb_list = NULL; gi->seq = NULL; + gi->gl = NULL; memset(gi->string, 0, sizeof(gi->string)); if (gfs2_glock_iter_next(gi)) { @@ -2055,7 +2137,7 @@ static void *gfs2_glock_seq_start(struct if (!gi) return NULL; - while (n--) { + while(n--) { if (gfs2_glock_iter_next(gi)) { gfs2_glock_iter_free(gi); return NULL; @@ -2082,7 +2164,9 @@ static void *gfs2_glock_seq_next(struct static void gfs2_glock_seq_stop(struct seq_file *file, void *iter_ptr) { - /* nothing for now */ + struct glock_iter *gi = iter_ptr; + if (gi) + gfs2_glock_iter_free(gi); } static int gfs2_glock_seq_show(struct seq_file *file, void *iter_ptr) @@ -2095,7 +2179,7 @@ static int gfs2_glock_seq_show(struct se return 0; } -static struct seq_operations gfs2_glock_seq_ops = { +static const struct seq_operations gfs2_glock_seq_ops = { .start = gfs2_glock_seq_start, .next = gfs2_glock_seq_next, .stop = gfs2_glock_seq_stop, diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h index 7721ca3..f7a8e62 100644 --- a/fs/gfs2/glock.h +++ b/fs/gfs2/glock.h @@ -132,11 +132,11 @@ void gfs2_glock_cb(void *cb_data, unsign void gfs2_glock_schedule_for_reclaim(struct gfs2_glock *gl); void gfs2_reclaim_glock(struct gfs2_sbd *sdp); - -void gfs2_scand_internal(struct gfs2_sbd *sdp); void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait); int __init gfs2_glock_init(void); +void gfs2_glock_exit(void); + int gfs2_create_debugfs_file(struct gfs2_sbd *sdp); void gfs2_delete_debugfs_file(struct gfs2_sbd *sdp); int gfs2_register_debugfs(void); diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index 777ca46..7ef6b23 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -156,9 +156,11 @@ static void inode_go_sync(struct gfs2_gl ip = NULL; if (test_bit(GLF_DIRTY, &gl->gl_flags)) { - if (ip) + if (ip && !gfs2_is_jdata(ip)) filemap_fdatawrite(ip->i_inode.i_mapping); gfs2_log_flush(gl->gl_sbd, gl); + if (ip && gfs2_is_jdata(ip)) + filemap_fdatawrite(ip->i_inode.i_mapping); gfs2_meta_sync(gl); if (ip) { struct address_space *mapping = ip->i_inode.i_mapping; @@ -452,6 +454,7 @@ const struct gfs2_glock_operations gfs2_ .go_lock = inode_go_lock, .go_unlock = inode_go_unlock, .go_type = LM_TYPE_INODE, + .go_min_hold_time = HZ / 10, }; const struct gfs2_glock_operations gfs2_rgrp_glops = { @@ -462,6 +465,7 @@ const struct gfs2_glock_operations gfs2_ .go_lock = rgrp_go_lock, .go_unlock = rgrp_go_unlock, .go_type = LM_TYPE_RGRP, + .go_min_hold_time = HZ / 10, }; const struct gfs2_glock_operations gfs2_trans_glops = { diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 170ba93..23b611a 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -11,6 +11,7 @@ #ifndef __INCORE_DOT_H__ #define __INCORE_DOT_H__ #include +#include #define DIO_WAIT 0x00000010 #define DIO_METADATA 0x00000020 @@ -130,6 +131,7 @@ struct gfs2_glock_operations { int (*go_lock) (struct gfs2_holder *gh); void (*go_unlock) (struct gfs2_holder *gh); const int go_type; + const unsigned long go_min_hold_time; }; enum { @@ -161,6 +163,7 @@ enum { GLF_LOCK = 1, GLF_STICKY = 2, GLF_DEMOTE = 3, + GLF_PENDING_DEMOTE = 4, GLF_DIRTY = 5, }; @@ -193,6 +196,7 @@ struct gfs2_glock { u64 gl_vn; unsigned long gl_stamp; + unsigned long gl_tchange; void *gl_object; struct list_head gl_reclaim; @@ -203,6 +207,7 @@ struct gfs2_glock { struct gfs2_log_element gl_le; struct list_head gl_ail_list; atomic_t gl_ail_count; + struct delayed_work gl_work; }; struct gfs2_alloc { @@ -429,7 +434,6 @@ struct gfs2_tune { unsigned int gt_log_flush_secs; unsigned int gt_jindex_refresh_secs; /* Check for new journal index */ - unsigned int gt_scand_secs; unsigned int gt_recoverd_secs; unsigned int gt_logd_secs; unsigned int gt_quotad_secs; @@ -574,7 +578,6 @@ struct gfs2_sbd { /* Daemon stuff */ - struct task_struct *sd_scand_process; struct task_struct *sd_recoverd_process; struct task_struct *sd_logd_process; struct task_struct *sd_quotad_process; diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 34f7bcd..013f00b 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -244,6 +244,11 @@ static int gfs2_dinode_in(struct gfs2_in return 0; } +static void gfs2_inode_bh(struct gfs2_inode *ip, struct buffer_head *bh) +{ + ip->i_cache[0] = bh; +} + /** * gfs2_inode_refresh - Refresh the incore copy of the dinode * @ip: The GFS2 inode @@ -688,7 +693,7 @@ out: static void init_dinode(struct gfs2_inode *dip, struct gfs2_glock *gl, const struct gfs2_inum_host *inum, unsigned int mode, unsigned int uid, unsigned int gid, - const u64 *generation, dev_t dev) + const u64 *generation, dev_t dev, struct buffer_head **bhp) { struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); struct gfs2_dinode *di; @@ -743,13 +748,15 @@ static void init_dinode(struct gfs2_inod di->di_mtime_nsec = cpu_to_be32(tv.tv_nsec); di->di_ctime_nsec = cpu_to_be32(tv.tv_nsec); memset(&di->di_reserved, 0, sizeof(di->di_reserved)); + + set_buffer_uptodate(dibh); - brelse(dibh); + *bhp = dibh; } static int make_dinode(struct gfs2_inode *dip, struct gfs2_glock *gl, unsigned int mode, const struct gfs2_inum_host *inum, - const u64 *generation, dev_t dev) + const u64 *generation, dev_t dev, struct buffer_head **bhp) { struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); unsigned int uid, gid; @@ -770,7 +777,7 @@ static int make_dinode(struct gfs2_inode if (error) goto out_quota; - init_dinode(dip, gl, inum, mode, uid, gid, generation, dev); + init_dinode(dip, gl, inum, mode, uid, gid, generation, dev, bhp); gfs2_quota_change(dip, +1, uid, gid); gfs2_trans_end(sdp); @@ -909,6 +916,7 @@ struct inode *gfs2_createi(struct gfs2_h struct gfs2_inum_host inum = { .no_addr = 0, .no_formal_ino = 0 }; int error; u64 generation; + struct buffer_head *bh=NULL; if (!name->len || name->len > GFS2_FNAMESIZE) return ERR_PTR(-ENAMETOOLONG); @@ -935,7 +943,7 @@ struct inode *gfs2_createi(struct gfs2_h if (error) goto fail_gunlock; - error = make_dinode(dip, ghs[1].gh_gl, mode, &inum, &generation, dev); + error = make_dinode(dip, ghs[1].gh_gl, mode, &inum, &generation, dev, &bh); if (error) goto fail_gunlock2; @@ -945,6 +953,8 @@ struct inode *gfs2_createi(struct gfs2_h if (IS_ERR(inode)) goto fail_gunlock2; + gfs2_inode_bh(GFS2_I(inode), bh); + error = gfs2_inode_refresh(GFS2_I(inode)); if (error) goto fail_gunlock2; diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h index 24d70f7..9e8265d 100644 --- a/fs/gfs2/locking/dlm/lock_dlm.h +++ b/fs/gfs2/locking/dlm/lock_dlm.h @@ -13,7 +13,6 @@ #define LOCK_DLM_DOT_H #include #include #include -#include #include #include #include diff --git a/fs/gfs2/locking/dlm/plock.c b/fs/gfs2/locking/dlm/plock.c index fba1f1d..1f7b038 100644 --- a/fs/gfs2/locking/dlm/plock.c +++ b/fs/gfs2/locking/dlm/plock.c @@ -346,15 +346,16 @@ static ssize_t dev_write(struct file *fi static unsigned int dev_poll(struct file *file, poll_table *wait) { + unsigned int mask = 0; + poll_wait(file, &send_wq, wait); spin_lock(&ops_lock); - if (!list_empty(&send_list)) { - spin_unlock(&ops_lock); - return POLLIN | POLLRDNORM; - } + if (!list_empty(&send_list)) + mask = POLLIN | POLLRDNORM; spin_unlock(&ops_lock); - return 0; + + return mask; } static const struct file_operations dev_fops = { diff --git a/fs/gfs2/locking/nolock/main.c b/fs/gfs2/locking/nolock/main.c index 0d149c8..d3b8ce6 100644 --- a/fs/gfs2/locking/nolock/main.c +++ b/fs/gfs2/locking/nolock/main.c @@ -9,7 +9,6 @@ #include #include -#include #include #include #include diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index f49a12e..d0e6b42 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -83,11 +83,6 @@ static void gfs2_ail1_start_one(struct g gfs2_assert(sdp, bd->bd_ail == ai); - if (!bh){ - list_move(&bd->bd_ail_st_list, &ai->ai_ail2_list); - continue; - } - if (!buffer_busy(bh)) { if (!buffer_uptodate(bh)) { gfs2_log_unlock(sdp); @@ -130,11 +125,6 @@ static int gfs2_ail1_empty_one(struct gf bd_ail_st_list) { bh = bd->bd_bh; - if (!bh){ - list_move(&bd->bd_ail_st_list, &ai->ai_ail2_list); - continue; - } - gfs2_assert(sdp, bd->bd_ail == ai); if (buffer_busy(bh)) { @@ -155,13 +145,14 @@ static int gfs2_ail1_empty_one(struct gf static void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags) { - struct list_head *head = &sdp->sd_ail1_list; + struct list_head *head; u64 sync_gen; struct list_head *first; struct gfs2_ail *first_ai, *ai, *tmp; int done = 0; gfs2_log_lock(sdp); + head = &sdp->sd_ail1_list; if (list_empty(head)) { gfs2_log_unlock(sdp); return; @@ -228,6 +219,7 @@ static void gfs2_ail2_empty_one(struct g { struct list_head *head = &ai->ai_ail2_list; struct gfs2_bufdata *bd; + struct gfs2_inode *bh_ip; while (!list_empty(head)) { bd = list_entry(head->prev, struct gfs2_bufdata, @@ -237,6 +229,10 @@ static void gfs2_ail2_empty_one(struct g list_del(&bd->bd_ail_st_list); list_del(&bd->bd_ail_gl_list); atomic_dec(&bd->bd_gl->gl_ail_count); + if (bd->bd_bh->b_page->mapping) { + bh_ip = GFS2_I(bd->bd_bh->b_page->mapping->host); + gfs2_meta_cache_flush(bh_ip); + } brelse(bd->bd_bh); } } diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index 3b395c4..7ef3356 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -117,7 +117,7 @@ static void buf_lo_before_commit(struct struct buffer_head *bh; struct gfs2_log_descriptor *ld; struct gfs2_bufdata *bd1 = NULL, *bd2; - unsigned int total = sdp->sd_log_num_buf; + unsigned int total; unsigned int offset = BUF_OFFSET; unsigned int limit; unsigned int num; @@ -127,12 +127,16 @@ static void buf_lo_before_commit(struct limit = buf_limit(sdp); /* for 4k blocks, limit = 503 */ + gfs2_log_lock(sdp); + total = sdp->sd_log_num_buf; bd1 = bd2 = list_prepare_entry(bd1, &sdp->sd_log_le_buf, bd_le.le_list); while(total) { num = total; if (total > limit) num = limit; + gfs2_log_unlock(sdp); bh = gfs2_log_get_buf(sdp); + gfs2_log_lock(sdp); ld = (struct gfs2_log_descriptor *)bh->b_data; ptr = (__be64 *)(bh->b_data + offset); ld->ld_header.mh_magic = cpu_to_be32(GFS2_MAGIC); @@ -152,21 +156,27 @@ static void buf_lo_before_commit(struct break; } + gfs2_log_unlock(sdp); set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); + gfs2_log_lock(sdp); n = 0; list_for_each_entry_continue(bd2, &sdp->sd_log_le_buf, bd_le.le_list) { + gfs2_log_unlock(sdp); bh = gfs2_log_fake_buf(sdp, bd2->bd_bh); set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); + gfs2_log_lock(sdp); if (++n >= num) break; } + BUG_ON(total < num); total -= num; } + gfs2_log_unlock(sdp); } static void buf_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_ail *ai) @@ -482,11 +492,12 @@ static void databuf_lo_add(struct gfs2_s gfs2_trans_add_gl(bd->bd_gl); if (gfs2_is_jdata(ip)) { - sdp->sd_log_num_jdata++; gfs2_pin(sdp, bd->bd_bh); tr->tr_num_databuf_new++; } gfs2_log_lock(sdp); + if (gfs2_is_jdata(ip)) + sdp->sd_log_num_jdata++; sdp->sd_log_num_databuf++; list_add(&le->le_list, &sdp->sd_log_le_databuf); gfs2_log_unlock(sdp); @@ -524,7 +535,7 @@ static void databuf_lo_before_commit(str struct gfs2_log_descriptor *ld; unsigned int limit; unsigned int total_dbuf; - unsigned int total_jdata = sdp->sd_log_num_jdata; + unsigned int total_jdata; unsigned int num, n; __be64 *ptr = NULL; @@ -536,6 +547,7 @@ static void databuf_lo_before_commit(str */ gfs2_log_lock(sdp); total_dbuf = sdp->sd_log_num_databuf; + total_jdata = sdp->sd_log_num_jdata; bd2 = bd1 = list_prepare_entry(bd1, &sdp->sd_log_le_databuf, bd_le.le_list); while(total_dbuf) { @@ -621,10 +633,10 @@ static void databuf_lo_before_commit(str } gfs2_log_unlock(sdp); if (bh) { - set_buffer_mapped(bh); set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); bh = NULL; + ptr = NULL; } n = 0; gfs2_log_lock(sdp); diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c index d5d4e68..79c91fd 100644 --- a/fs/gfs2/main.c +++ b/fs/gfs2/main.c @@ -107,6 +107,8 @@ static int __init init_gfs2_fs(void) fail_unregister: unregister_filesystem(&gfs2_fs_type); fail: + gfs2_glock_exit(); + if (gfs2_bufdata_cachep) kmem_cache_destroy(gfs2_bufdata_cachep); @@ -127,6 +129,7 @@ fail: static void __exit exit_gfs2_fs(void) { + gfs2_glock_exit(); gfs2_unregister_debugfs(); unregister_filesystem(&gfs2_fs_type); unregister_filesystem(&gfs2meta_fs_type); diff --git a/fs/gfs2/mount.c b/fs/gfs2/mount.c index 4864659..b941f9f 100644 --- a/fs/gfs2/mount.c +++ b/fs/gfs2/mount.c @@ -42,6 +42,7 @@ enum { Opt_nosuiddir, Opt_data_writeback, Opt_data_ordered, + Opt_err, }; static match_table_t tokens = { @@ -64,7 +65,8 @@ static match_table_t tokens = { {Opt_suiddir, "suiddir"}, {Opt_nosuiddir, "nosuiddir"}, {Opt_data_writeback, "data=writeback"}, - {Opt_data_ordered, "data=ordered"} + {Opt_data_ordered, "data=ordered"}, + {Opt_err, NULL} }; /** @@ -237,6 +239,7 @@ int gfs2_mount_args(struct gfs2_sbd *sdp case Opt_data_ordered: args->ar_data = GFS2_DATA_ORDERED; break; + case Opt_err: default: fs_info(sdp, "unknown option: %s\n", o); error = -EINVAL; diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 42a5f58..8407d1d 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -616,58 +616,13 @@ static sector_t gfs2_bmap(struct address return dblock; } -static void discard_buffer(struct gfs2_sbd *sdp, struct buffer_head *bh) -{ - struct gfs2_bufdata *bd; - - gfs2_log_lock(sdp); - bd = bh->b_private; - if (bd) { - bd->bd_bh = NULL; - bh->b_private = NULL; - if (!bd->bd_ail && list_empty(&bd->bd_le.le_list)) - kmem_cache_free(gfs2_bufdata_cachep, bd); - } - gfs2_log_unlock(sdp); - - lock_buffer(bh); - clear_buffer_dirty(bh); - bh->b_bdev = NULL; - clear_buffer_mapped(bh); - clear_buffer_req(bh); - clear_buffer_new(bh); - clear_buffer_delay(bh); - unlock_buffer(bh); -} - static void gfs2_invalidatepage(struct page *page, unsigned long offset) { - struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host); - struct buffer_head *head, *bh, *next; - unsigned int curr_off = 0; - BUG_ON(!PageLocked(page)); if (offset == 0) ClearPageChecked(page); - if (!page_has_buffers(page)) - return; - bh = head = page_buffers(page); - do { - unsigned int next_off = curr_off + bh->b_size; - next = bh->b_this_page; - - if (offset <= curr_off) - discard_buffer(sdp, bh); - - curr_off = next_off; - bh = next; - } while (bh != head); - - if (!offset) - try_to_release_page(page, 0); - - return; + block_invalidatepage(page, offset); } /** @@ -736,59 +691,6 @@ out: } /** - * stuck_releasepage - We're stuck in gfs2_releasepage(). Print stuff out. - * @bh: the buffer we're stuck on - * - */ - -static void stuck_releasepage(struct buffer_head *bh) -{ - struct inode *inode = bh->b_page->mapping->host; - struct gfs2_sbd *sdp = inode->i_sb->s_fs_info; - struct gfs2_bufdata *bd = bh->b_private; - struct gfs2_glock *gl; -static unsigned limit = 0; - - if (limit > 3) - return; - limit++; - - fs_warn(sdp, "stuck in gfs2_releasepage() %p\n", inode); - fs_warn(sdp, "blkno = %llu, bh->b_count = %d\n", - (unsigned long long)bh->b_blocknr, atomic_read(&bh->b_count)); - fs_warn(sdp, "pinned = %u\n", buffer_pinned(bh)); - fs_warn(sdp, "bh->b_private = %s\n", (bd) ? "!NULL" : "NULL"); - - if (!bd) - return; - - gl = bd->bd_gl; - - fs_warn(sdp, "gl = (%u, %llu)\n", - gl->gl_name.ln_type, (unsigned long long)gl->gl_name.ln_number); - - fs_warn(sdp, "bd_list_tr = %s, bd_le.le_list = %s\n", - (list_empty(&bd->bd_list_tr)) ? "no" : "yes", - (list_empty(&bd->bd_le.le_list)) ? "no" : "yes"); - - if (gl->gl_ops == &gfs2_inode_glops) { - struct gfs2_inode *ip = gl->gl_object; - unsigned int x; - - if (!ip) - return; - - fs_warn(sdp, "ip = %llu %llu\n", - (unsigned long long)ip->i_no_formal_ino, - (unsigned long long)ip->i_no_addr); - - for (x = 0; x < GFS2_MAX_META_HEIGHT; x++) - fs_warn(sdp, "ip->i_cache[%u] = %s\n", - x, (ip->i_cache[x]) ? "!NULL" : "NULL"); - } -} - -/** * gfs2_releasepage - free the metadata associated with a page * @page: the page that's being released * @gfp_mask: passed from Linux VFS, ignored by us @@ -805,38 +707,31 @@ int gfs2_releasepage(struct page *page, struct gfs2_sbd *sdp = aspace->i_sb->s_fs_info; struct buffer_head *bh, *head; struct gfs2_bufdata *bd; - unsigned long t = jiffies + gfs2_tune_get(sdp, gt_stall_secs) * HZ; if (!page_has_buffers(page)) goto out; + gfs2_log_lock(sdp); head = bh = page_buffers(page); do { - while (atomic_read(&bh->b_count)) { - if (!atomic_read(&aspace->i_writecount)) - return 0; - - if (!(gfp_mask & __GFP_WAIT)) - return 0; - - if (time_after_eq(jiffies, t)) { - stuck_releasepage(bh); - /* should we withdraw here? */ - return 0; - } - - yield(); - } - + if (atomic_read(&bh->b_count)) + goto cannot_release; + bd = bh->b_private; + if (bd && bd->bd_ail) + goto cannot_release; gfs2_assert_warn(sdp, !buffer_pinned(bh)); gfs2_assert_warn(sdp, !buffer_dirty(bh)); + bh = bh->b_this_page; + } while(bh != head); + gfs2_log_unlock(sdp); + head = bh = page_buffers(page); + do { gfs2_log_lock(sdp); bd = bh->b_private; if (bd) { gfs2_assert_warn(sdp, bd->bd_bh == bh); gfs2_assert_warn(sdp, list_empty(&bd->bd_list_tr)); - gfs2_assert_warn(sdp, !bd->bd_ail); bd->bd_bh = NULL; if (!list_empty(&bd->bd_le.le_list)) bd = NULL; @@ -851,6 +746,9 @@ int gfs2_releasepage(struct page *page, out: return try_to_free_buffers(page); +cannot_release: + gfs2_log_unlock(sdp); + return 0; } const struct address_space_operations gfs2_file_aops = { diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index cf5aa50..314c113 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -28,18 +28,18 @@ #include "inode.h" #include "lm.h" #include "mount.h" #include "ops_fstype.h" +#include "ops_dentry.h" #include "ops_super.h" #include "recovery.h" #include "rgrp.h" #include "super.h" #include "sys.h" #include "util.h" +#include "log.h" #define DO 0 #define UNDO 1 -extern struct dentry_operations gfs2_dops; - static struct gfs2_sbd *init_sbd(struct super_block *sb) { struct gfs2_sbd *sdp; @@ -145,7 +145,8 @@ static int init_names(struct gfs2_sbd *s snprintf(sdp->sd_proto_name, GFS2_FSNAME_LEN, "%s", proto); snprintf(sdp->sd_table_name, GFS2_FSNAME_LEN, "%s", table); - while ((table = strchr(sdp->sd_table_name, '/'))) + table = sdp->sd_table_name; + while ((table = strchr(table, '/'))) *table = '_'; out: @@ -161,14 +162,6 @@ static int init_locking(struct gfs2_sbd if (undo) goto fail_trans; - p = kthread_run(gfs2_scand, sdp, "gfs2_scand"); - error = IS_ERR(p); - if (error) { - fs_err(sdp, "can't start scand thread: %d\n", error); - return error; - } - sdp->sd_scand_process = p; - for (sdp->sd_glockd_num = 0; sdp->sd_glockd_num < sdp->sd_args.ar_num_glockd; sdp->sd_glockd_num++) { @@ -229,7 +222,6 @@ fail: while (sdp->sd_glockd_num--) kthread_stop(sdp->sd_glockd_process[sdp->sd_glockd_num]); - kthread_stop(sdp->sd_scand_process); return error; } @@ -301,8 +293,9 @@ static int init_sb(struct gfs2_sbd *sdp, fs_err(sdp, "can't get root dentry\n"); error = -ENOMEM; iput(inode); - } - sb->s_root->d_op = &gfs2_dops; + } else + sb->s_root->d_op = &gfs2_dops; + out: gfs2_glock_dq_uninit(&sb_gh); return error; @@ -368,7 +361,7 @@ static int init_journal(struct gfs2_sbd ip = GFS2_I(sdp->sd_jdesc->jd_inode); error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, - LM_FLAG_NOEXP | GL_EXACT, + LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE, &sdp->sd_jinode_gh); if (error) { fs_err(sdp, "can't acquire journal inode glock: %d\n", @@ -818,7 +811,6 @@ static struct super_block* get_gfs2_sb(c struct nameidata nd; struct file_system_type *fstype; struct super_block *sb = NULL, *s; - struct list_head *l; int error; error = path_lookup(dev_name, LOOKUP_FOLLOW, &nd); @@ -830,8 +822,7 @@ static struct super_block* get_gfs2_sb(c error = vfs_getattr(nd.mnt, nd.dentry, &stat); fstype = get_fs_type("gfs2"); - list_for_each(l, &fstype->fs_supers) { - s = list_entry(l, struct super_block, s_instances); + list_for_each_entry(s, &fstype->fs_supers, s_instances) { if ((S_ISBLK(stat.mode) && s->s_dev == stat.rdev) || (S_ISDIR(stat.mode) && s == nd.dentry->d_inode->i_sb)) { sb = s; @@ -861,7 +852,7 @@ static int gfs2_get_sb_meta(struct file_ error = -ENOENT; goto error; } - sdp = (struct gfs2_sbd*) sb->s_fs_info; + sdp = sb->s_fs_info; if (sdp->sd_vfs_meta) { printk(KERN_WARNING "GFS2: gfs2meta mount already exists\n"); error = -EBUSY; @@ -896,7 +887,10 @@ error: static void gfs2_kill_sb(struct super_block *sb) { - gfs2_delete_debugfs_file(sb->s_fs_info); + if (sb->s_fs_info) { + gfs2_delete_debugfs_file(sb->s_fs_info); + gfs2_meta_syncfs(sb->s_fs_info); + } kill_block_super(sb); } diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index 911c115..2cbe5a3 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -69,7 +69,7 @@ static int gfs2_create(struct inode *dir mark_inode_dirty(inode); break; } else if (PTR_ERR(inode) != -EEXIST || - (nd->intent.open.flags & O_EXCL)) { + (nd && (nd->intent.open.flags & O_EXCL))) { gfs2_holder_uninit(ghs); return PTR_ERR(inode); } @@ -278,17 +278,25 @@ static int gfs2_unlink(struct inode *dir gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2); - error = gfs2_glock_nq_m(3, ghs); + error = gfs2_glock_nq(ghs); /* parent */ if (error) - goto out; + goto out_parent; + + error = gfs2_glock_nq(ghs + 1); /* child */ + if (error) + goto out_child; + + error = gfs2_glock_nq(ghs + 2); /* rgrp */ + if (error) + goto out_rgrp; error = gfs2_unlink_ok(dip, &dentry->d_name, ip); if (error) - goto out_gunlock; + goto out_rgrp; error = gfs2_trans_begin(sdp, 2*RES_DINODE + RES_LEAF + RES_RG_BIT, 0); if (error) - goto out_gunlock; + goto out_rgrp; error = gfs2_dir_del(dip, &dentry->d_name); if (error) @@ -298,12 +306,15 @@ static int gfs2_unlink(struct inode *dir out_end_trans: gfs2_trans_end(sdp); -out_gunlock: - gfs2_glock_dq_m(3, ghs); -out: - gfs2_holder_uninit(ghs); - gfs2_holder_uninit(ghs + 1); + gfs2_glock_dq(ghs + 2); +out_rgrp: gfs2_holder_uninit(ghs + 2); + gfs2_glock_dq(ghs + 1); +out_child: + gfs2_holder_uninit(ghs + 1); + gfs2_glock_dq(ghs); +out_parent: + gfs2_holder_uninit(ghs); gfs2_glock_dq_uninit(&ri_gh); return error; } diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c index 603d940..4316690 100644 --- a/fs/gfs2/ops_super.c +++ b/fs/gfs2/ops_super.c @@ -92,7 +92,6 @@ static void gfs2_put_super(struct super_ kthread_stop(sdp->sd_recoverd_process); while (sdp->sd_glockd_num--) kthread_stop(sdp->sd_glockd_process[sdp->sd_glockd_num]); - kthread_stop(sdp->sd_scand_process); if (!(sb->s_flags & MS_RDONLY)) { error = gfs2_make_fs_ro(sdp); diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index 6e546ee..addb51e 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -70,6 +70,7 @@ struct gfs2_quota_host { u64 qu_limit; u64 qu_warn; s64 qu_value; + u32 qu_ll_next; }; struct gfs2_quota_change_host { @@ -580,6 +581,7 @@ static void gfs2_quota_in(struct gfs2_qu qu->qu_limit = be64_to_cpu(str->qu_limit); qu->qu_warn = be64_to_cpu(str->qu_warn); qu->qu_value = be64_to_cpu(str->qu_value); + qu->qu_ll_next = be32_to_cpu(str->qu_ll_next); } static void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf) @@ -589,6 +591,7 @@ static void gfs2_quota_out(const struct str->qu_limit = cpu_to_be64(qu->qu_limit); str->qu_warn = cpu_to_be64(qu->qu_warn); str->qu_value = cpu_to_be64(qu->qu_value); + str->qu_ll_next = cpu_to_be32(qu->qu_ll_next); memset(&str->qu_reserved, 0, sizeof(str->qu_reserved)); } @@ -614,6 +617,16 @@ static int gfs2_adjust_quota(struct gfs2 s64 value; int err = -EIO; + if (gfs2_is_stuffed(ip)) { + struct gfs2_alloc *al = NULL; + al = gfs2_alloc_get(ip); + /* just request 1 blk */ + al->al_requested = 1; + gfs2_inplace_reserve(ip); + gfs2_unstuff_dinode(ip, NULL); + gfs2_inplace_release(ip); + gfs2_alloc_put(ip); + } page = grab_cache_page(mapping, index); if (!page) return -ENOMEM; diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c index 5ada38c..beb6c7a 100644 --- a/fs/gfs2/recovery.c +++ b/fs/gfs2/recovery.c @@ -469,7 +469,7 @@ int gfs2_recover_journal(struct gfs2_jde }; error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, - LM_FLAG_NOEXP, &ji_gh); + LM_FLAG_NOEXP | GL_NOCACHE, &ji_gh); if (error) goto fail_gunlock_j; } else { diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index ce48c45..2d7f7ea 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -31,6 +31,7 @@ #include "log.h" #include "inode.h" #define BFITNOENT ((u32)~0) +#define NO_BLOCK ((u64)~0) /* * These routines are used by the resource group routines (rgrp.c) @@ -116,8 +117,7 @@ static unsigned char gfs2_testbit(struct * @buffer: the buffer that holds the bitmaps * @buflen: the length (in bytes) of the buffer * @goal: start search at this block's bit-pair (within @buffer) - * @old_state: GFS2_BLKST_XXX the state of the block we're looking for; - * bit 0 = alloc(1)/free(0), bit 1 = meta(1)/data(0) + * @old_state: GFS2_BLKST_XXX the state of the block we're looking for. * * Scope of @goal and returned block number is only within this bitmap buffer, * not entire rgrp or filesystem. @buffer will be offset from the actual @@ -137,9 +137,13 @@ static u32 gfs2_bitfit(struct gfs2_rgrpd byte = buffer + (goal / GFS2_NBBY); bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE; end = buffer + buflen; - alloc = (old_state & 1) ? 0 : 0x55; + alloc = (old_state == GFS2_BLKST_FREE) ? 0x55 : 0; while (byte < end) { + /* If we're looking for a free block we can eliminate all + bitmap settings with 0x55, which represents four data + blocks in a row. If we're looking for a data block, we can + eliminate 0x00 which corresponds to four free blocks. */ if ((*byte & 0x55) == alloc) { blk += (8 - bit) >> 1; @@ -859,19 +863,24 @@ static int try_rgrp_fit(struct gfs2_rgrp static struct inode *try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked) { struct inode *inode; - u32 goal = 0; + u32 goal = 0, block; u64 no_addr; + struct gfs2_sbd *sdp = rgd->rd_sbd; for(;;) { if (goal >= rgd->rd_data) break; - goal = rgblk_search(rgd, goal, GFS2_BLKST_UNLINKED, - GFS2_BLKST_UNLINKED); - if (goal == BFITNOENT) + down_write(&sdp->sd_log_flush_lock); + block = rgblk_search(rgd, goal, GFS2_BLKST_UNLINKED, + GFS2_BLKST_UNLINKED); + up_write(&sdp->sd_log_flush_lock); + if (block == BFITNOENT) break; - no_addr = goal + rgd->rd_data0; + /* rgblk_search can return a block < goal, so we need to + keep it marching forward. */ + no_addr = block + rgd->rd_data0; goal++; - if (no_addr < *last_unlinked) + if (*last_unlinked != NO_BLOCK && no_addr <= *last_unlinked) continue; *last_unlinked = no_addr; inode = gfs2_inode_lookup(rgd->rd_sbd->sd_vfs, DT_UNKNOWN, @@ -1152,7 +1161,7 @@ int gfs2_inplace_reserve_i(struct gfs2_i struct gfs2_alloc *al = &ip->i_alloc; struct inode *inode; int error = 0; - u64 last_unlinked = 0; + u64 last_unlinked = NO_BLOCK; if (gfs2_assert_warn(sdp, al->al_requested)) return -EINVAL; @@ -1289,7 +1298,9 @@ static u32 rgblk_search(struct gfs2_rgrp allocatable block anywhere else, we want to be able wrap around and search in the first part of our first-searched bit block. */ for (x = 0; x <= length; x++) { - if (bi->bi_clone) + /* The GFS2_BLKST_UNLINKED state doesn't apply to the clone + bitmaps, so we must search the originals for that. */ + if (old_state != GFS2_BLKST_UNLINKED && bi->bi_clone) blk = gfs2_bitfit(rgd, bi->bi_clone + bi->bi_offset, bi->bi_len, goal, old_state); else @@ -1305,9 +1316,7 @@ static u32 rgblk_search(struct gfs2_rgrp goal = 0; } - if (old_state != new_state) { - gfs2_assert_withdraw(rgd->rd_sbd, blk != BFITNOENT); - + if (blk != BFITNOENT && old_state != new_state) { gfs2_trans_add_bh(rgd->rd_gl, bi->bi_bh, 1); gfs2_setbit(rgd, bi->bi_bh->b_data + bi->bi_offset, bi->bi_len, blk, new_state); diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index f916b97..5589802 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -58,7 +58,6 @@ void gfs2_tune_init(struct gfs2_tune *gt gt->gt_incore_log_blocks = 1024; gt->gt_log_flush_secs = 60; gt->gt_jindex_refresh_secs = 60; - gt->gt_scand_secs = 15; gt->gt_recoverd_secs = 60; gt->gt_logd_secs = 1; gt->gt_quotad_secs = 5; diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c index c26c21b..ba3a172 100644 --- a/fs/gfs2/sys.c +++ b/fs/gfs2/sys.c @@ -442,7 +442,6 @@ TUNE_ATTR(quota_simul_sync, 1); TUNE_ATTR(quota_cache_secs, 1); TUNE_ATTR(stall_secs, 1); TUNE_ATTR(statfs_quantum, 1); -TUNE_ATTR_DAEMON(scand_secs, scand_process); TUNE_ATTR_DAEMON(recoverd_secs, recoverd_process); TUNE_ATTR_DAEMON(logd_secs, logd_process); TUNE_ATTR_DAEMON(quotad_secs, quotad_process); @@ -464,7 +463,6 @@ static struct attribute *tune_attrs[] = &tune_attr_quota_cache_secs.attr, &tune_attr_stall_secs.attr, &tune_attr_statfs_quantum.attr, - &tune_attr_scand_secs.attr, &tune_attr_recoverd_secs.attr, &tune_attr_logd_secs.attr, &tune_attr_quotad_secs.attr, diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index a44a6a0..c3c19f9 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -170,6 +170,33 @@ struct gfs2_rgrp { }; /* + * quota linked list: user quotas and group quotas form two separate + * singly linked lists. ll_next stores uids or gids of next quotas in the + * linked list. + +Given the uid/gid, how to calculate the quota file offsets for the corresponding +gfs2_quota structures on disk: + +for user quotas, given uid, +offset = uid * sizeof(struct gfs2_quota); + +for group quotas, given gid, +offset = (gid * sizeof(struct gfs2_quota)) + sizeof(struct gfs2_quota); + + + uid:0 gid:0 uid:12 gid:12 uid:17 gid:17 uid:5142 gid:5142 ++-------+-------+ +-------+-------+ +-------+- - - -+ +- - - -+-------+ +| valid | valid | :: | valid | valid | :: | valid | inval | :: | inval | valid | ++-------+-------+ +-------+-------+ +-------+- - - -+ +- - - -+-------+ +next:12 next:12 next:17 next:5142 next:NULL next:NULL + | | | | |<-- user quota list | + \______|___________/ \______|___________/ group quota list -->| + | | | + \__________________/ \_______________________________________/ + +*/ + +/* * quota structure */ @@ -177,7 +204,8 @@ struct gfs2_quota { __be64 qu_limit; __be64 qu_warn; __be64 qu_value; - __u8 qu_reserved[64]; + __be32 qu_ll_next; /* location of next quota in list */ + __u8 qu_reserved[60]; }; /*