GIT 29cf885c92531b6f5bc2c6d9eb5f2d13adf592d1 git+ssh://master.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw.git commit Author: Bob Peterson Date: Wed Aug 8 17:08:14 2007 -0500 [GFS2] Ensure journal file cache is flushed after recovery This is for bugzilla bug #248176: GFS2: invalid metadata block Patches 1 thru 3 were accepted upstream, but there were problems with 4 and 5. Those issues have been resolved and now the recovery tests are passing without errors. This code has gone through 41 * 3 successful gfs2 recovery tests before it hit an unrelated (openais) problem. I'm continuing to test it. This is a complete rewrite of patch 5 for bug #248176, written by Steve Whitehouse. This is referred to in the bugzilla record as "new 6" and "a different solution". The problem was that the journal inodes, although protected by a glock, were not synched with the other nodes because they don't use the inode glock synch operations (i.e. no "glops" were defined). Therefore, journal recovery on a journal-recovering node were causing the blocks to get out of sync with the node that was actually trying to use that journal as it comes back up from a reboot. There are two possible solutions: (1) To make the journals use the normal inode glock sync operations, or (2) To make the journal operations take effect immediately (i.e. no caching). Although option 1 works, it turns out to be a lot more code. Steve opted for option 2, which is much simpler and therefore less prone to regression errors. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse -- commit 147466c9363276871a86bdbbd9e3fb4994aadfa8 Author: Bob Peterson Date: Wed Aug 8 16:52:09 2007 -0500 [GFS2] invalid metadata block - REVISED This is for bugzilla bug #248176: GFS2: invalid metadata block Patches 1 thru 3 were accepted upstream, but there were problems with 4 and 5. Those issues have been resolved and now the recovery tests are passing without errors. This code has gone through 41 * 3 successful gfs2 recovery tests before it hit an unrelated (openais) problem. This is a complete rewrite of patch 4 for bug #248176. Part of the problem was that inodes were being recycled before their buffers were flushed to the journal logs. Another problem was that the clone bitmaps were being searched for deleted inodes to recycle, but only the "real" bitmaps should be searched for that purpose. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit 2b1c1060e88ea53b234bf6c788568a375effc386 Author: David Teigland Date: Tue Aug 7 09:44:48 2007 -0500 [DLM] fix basts for granted PR waiting CW Fix a long standing bug where a blocking callback would be missed when there's a granted lock in PR mode and waiting locks in both PR and CW modes (and the PR lock was added to the waiting queue before the CW lock). The logic simply compared the numerical values of the modes to determine if a blocking callback was required, but in the one case of PR and CW, the lower valued CW mode blocks the higher valued PR mode. We just need to add a special check for this PR/CW case in the tests that decide when a blocking callback is needed. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 04659dcf8a24b70ff4d93881a470d5bd28095e0e Author: Patrick Caulfield Date: Thu Aug 2 14:58:14 2007 +0100 [DLM] More othercon fixes The last patch to clean out 'othercon' structures only fixed half the problem. The attached addresses the other situations too, and fixes bz#238490 Signed-Off-By: Patrick Caulfield Signed-off-by: Steven Whitehouse commit 57013452a8b9056828ccfe056c1694c180cd1744 Author: Steven Whitehouse Date: Wed Aug 1 13:57:10 2007 +0100 [GFS2] Reduce number of gfs2_scand processes to one We only need a single gfs2_scand process rather than the one per filesystem which we had previously. As a result the parameter determining the frequency of gfs2_scand runs becomes a module parameter rather than a mount parameter as it was before. Signed-off-by: Steven Whitehouse commit fbd8edbb6ce5232f35a63c722a94890fe2ab4da4 Author: Denis Cheng Date: Tue Jul 31 18:31:12 2007 +0800 [GFS2] use the declaration of gfs2_dops in the header file instead Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit 5713c576055065a35a73e0fbb87f3271978121bc Author: Denis Cheng Date: Tue Jul 31 18:31:11 2007 +0800 [GFS2] mark struct *_operations const these struct *_operations are all method tables, thus should be const. Signed-off-by: Denis Cheng Signed-off-by: Steven Whitehouse commit c885b3eaf3e8ee671b5b3e60f77d5f43d1063915 Author: Bob Peterson Date: Wed Jul 25 10:06:22 2007 -0500 [GFS2] Detach buf data during in-place writeback This is patch 5 of 5 for bug #248176 Metadata corruption was occurring because page references weren't being removed in all cases. I previously added a function called detach_bufdata, but I discovered there already WAS a function out there to do the job. It's called gfs2_meta_cache_flush. So I added a call to that to remove the page references. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit 3d94d4d0933cccde3c24282c6e98562e90cf50a2 Author: Denis Cheng Date: Wed Jul 25 17:53:58 2007 +0800 [GFS2] use an temp variable to reduce a spin_unlock this is more clear. Signed-off-by: Denis Cheng Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 911c123507b0ed54d0bd7c14a193b3dff281dfdc Author: Bob Peterson Date: Tue Jul 24 14:09:32 2007 -0500 [GFS2] Prevent infinite loop in try_rgrp_unlink() This is patch three of five for bug #248176. The try_rgrp_unlink code in rgrp.c had an infinite loop. This was caused because the bitmap function rgblk_search can return a block less than the "goal" block, in which case it was looping. The fix is to make it always march forward as needed. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit 9ed1a4d0154cd6cdb1f76e9208feb05a51daede9 Author: Bob Peterson Date: Tue Jul 24 14:07:33 2007 -0500 [GFS2] Revert part of earlier log.c changes This is patch 2 of 5 for bug #248176. The list_move code previously concocted in log.c for bug #238162 (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23) never runs as bh can now never be NULL at this point. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit e7cc0d311a927e2c594942799e00f68470cd0a4b Author: Bob Peterson Date: Tue Jul 24 14:05:31 2007 -0500 [GFS2] Move some code inside the log lock This is the first of five patches for bug #248176: There were still some critical variables being manipulated outside the log_lock spinlock. That usually resulted in a hang. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit c904abc6d16868b3ee2a70355f51c222e64e74f3 Author: Steven Whitehouse Date: Tue Jul 24 13:53:36 2007 +0100 [GFS2] Fix an oops in glock dumping This fixes an oops which was occurring during glock dumping due to the seq file code not taking a reference to the glock. Also this fixes a memory leak which occurred in certain cases, in turn preventing the filesystem from unmounting. Signed-off-by: Steven Whitehouse commit f30f5b78f94e77af96edb15e55d2f8b70d141a81 Author: Steve French Date: Fri Jul 20 13:07:26 2007 -0500 [GFS2] GFS2 not checking pointer on create when running under nfsd When looking at an unrelated problem, I noticed that nfsd does not set nameidata pointer on create (ie nd is NULL). This should cause an oops in some cases in which when NFSd is mounted over GFS2. Signed-off-by: Steve French Signed-off-by: Steven Whitehouse commit 84523ef874bb4101f11b9519497d306ccc3d3a89 Author: Jesper Juhl Date: Sat Jul 21 17:03:22 2007 +0200 [GFS2] Clean up duplicate includes in fs/gfs2/ This patch cleans up duplicate includes in fs/gfs2/ Signed-off-by: Jesper Juhl Signed-off-by: Steven Whitehouse commit c52005cb9a90e5d5fddfbdc6c822dd45f300b154 Author: Josef Whiter Date: Mon Jul 23 10:02:40 2007 +0100 [GFS2] Fix calculation of demote state If a glock is in the exclusive state and a request for demote to deferred has been received, then further requests for demote to shared are being ignored. This patch fixes that by ensuring that we demote to unlocked in that case. Signed-off-by: Josef Whiter Signed-off-by: Steven Whitehouse commit 827047b63977de650175112e8a2d06f0fdb4a528 Author: Steven Whitehouse Date: Mon Jul 23 09:54:36 2007 +0100 [GFS2] Fix two races relating to glock callbacks One of the races relates to referencing a variable while not holding its protecting spinlock. The patch simply moves the test inside the spin lock. The other races occurs when a demote to unlocked request occurs during the time a demote to shared request is already running. This of course only happens in the case that the lock was in the exclusive mode to start with. The patch adds a check to see if another demote request has occurred in the mean time and if it has, then it performs a second demote. Signed-off-by: Steven Whitehouse commit e7b6954fb8b17b7b10b26ae40ab5b75a7dbb8b54 Author: Jesper Juhl Date: Thu Jul 19 00:27:43 2007 +0200 [DLM] Fix memory leak in dlm_add_member() when dlm_node_weight() returns less than zero There's a memory leak in fs/dlm/member.c::dlm_add_member(). If "dlm_node_weight(ls->ls_name, nodeid)" returns < 0, then we'll return without freeing the memory allocated to the (at that point yet unused) 'memb'. This patch frees the allocated memory in that case and thus avoids the leak. Signed-off-by: Jesper Juhl Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit c656dbd5550b7137c2c601acb837fd8d3382640b Author: Steven Whitehouse Date: Thu Jul 19 16:12:50 2007 +0100 [GFS2] Revert remounting w/o acl option leaves acls enabled This reverts commit 569a7b6c2e8965ff4908003b925757703a3d649c. The code was correct originally. The default setting for ACLs after a remount should be to be the same as before the remount. Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse commit f07f5b4af772e2d523c82e3b548cc1097277c093 Author: Steven Whitehouse Date: Wed Jul 18 11:40:06 2007 +0100 [GFS2] Fix setting of inherit jdata attr Due to a mix up between the jdata attribute and inherit jdata attribute it has not been possible to set the inherit jdata attribute on directories. This is now fixed and the ioctl will report the inherit jdata attribute for directories rather than the jdata attribute as it did previously. This stems from our need to have the one bit in the ioctl attr flags mean two different things according to whether the underlying inode is a directory or not. Signed-off-by: Steven Whitehouse commit bdca96c1d010b52e80f126b0bd7d6cf3b5c00c44 Author: Patrick Caulfield Date: Tue Jul 17 16:53:15 2007 +0100 [DLM] zero unused parts of sockaddr_storage When we build a sockaddr_storage for an IP address, clear the unused parts as they could be used for node comparisons. I have seen this occasionally make sctp connections fail. Signed-Off-By: Patrick Caulfield Signed-off-by: Steven Whitehouse commit a24bde0b0c3efd6bba505098b67ebf30ef81192c Author: Steven Whitehouse Date: Tue Jul 17 10:29:02 2007 +0100 [GFS2] Fix incorrect error path in prepare_write() The error path in prepare_write() was incorrect in the (very rare) event that the transaction fails to start. The following prevents a NULL pointer dereference, Signed-off-by: Steven Whitehouse commit a6c9df4e00e61df51050ef40b13c0c77ec459f1c Author: Steven Whitehouse Date: Tue Jul 17 10:26:56 2007 +0100 [GFS2] Fix incorrect return code in rgrp.c The following patch fixes a bug where 0 was being used as a return code to indicate "nothing to do" when in fact 0 was a valid block location which might be returned by the function. Signed-off-by: Steven Whitehouse commit b12260a7a5cda9971c5aef45a732d7e9aa9fe20d Author: David Teigland Date: Fri Jul 13 14:49:06 2007 -0500 [DLM] fix NULL ls usage Fix regression in recent patch "[DLM] variable allocation" which attempts to dereference an "ls" struct when it's NULL. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 4dd2bcc9581dec0046fc73d85f5ede52facc66c0 Author: Bob Peterson Date: Thu Jul 12 16:58:50 2007 -0500 [GFS2] soft lockup in rgblk_search This patch seems to fix the problem described in bugzilla bug 246114. It was written by Steve Whitehouse with some tweaking by me. The code was looping in the relatively new section of code designed to search for and reuse unlinked inodes. In cases where it was finding an appropriate inode to reuse, it was looping around and finding the same block over and over because a "<=" check should have been a "<" when comparing the goal block to the last unlinked block found. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit f91253d9c111c4e3c8b5708defbd6304d6b1e5c9 Author: Bob Peterson Date: Wed Jul 11 15:55:23 2007 -0500 [GFS2] soft lockup detected in databuf_lo_before_commit This is part 2 of the patch for bug #245832, part 1 of which is already in the git tree. The problem was that sdp->sd_log_num_databuf was not always being protected by the gfs2_log_lock spinlock, but the sd_log_le_databuf (which it is supposed to reflect) was protected. That meant there was a timing window during which gfs2_log_flush called databuf_lo_before_commit and the count didn't match what was really on the linked list in that window. So when it ran out of items on the linked list, it decremented total_dbuf from 0 to -1 and thus never left the "while(total_dbuf)" loop. The solution is to protect the variable sdp->sd_log_num_databuf so that the value will always match the contents of the linked list, and therefore the number will never go negative, and therefore, the loop will be exited properly. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit f120be57c596c61cbb0733f76994a604bb5f3160 Author: Patrick Caulfield Date: Wed Jul 11 13:39:43 2007 +0100 [DLM] Clear othercon pointers when a connection is closed This patch clears the othercon pointer and frees the memory when a connnection is closed. This could cause a small memory leak when nodes leave the cluster. Signed-Off-By: Patrick Caulfield Signed-off-by: Steven Whitehouse fs/dlm/lock.c | 69 +++++++++++++--- fs/dlm/lowcomms.c | 24 ++++-- fs/dlm/member.c | 4 + fs/dlm/rcom.c | 7 +- fs/gfs2/daemon.c | 24 ------ fs/gfs2/daemon.h | 1 fs/gfs2/eaops.c | 8 +- fs/gfs2/eaops.h | 4 - fs/gfs2/glock.c | 169 +++++++++++++++++++++++++--------------- fs/gfs2/glock.h | 4 - fs/gfs2/incore.h | 2 fs/gfs2/locking/dlm/lock_dlm.h | 1 fs/gfs2/locking/dlm/plock.c | 11 +-- fs/gfs2/locking/nolock/main.c | 1 fs/gfs2/log.c | 16 +--- fs/gfs2/lops.c | 23 ++++- fs/gfs2/main.c | 3 + fs/gfs2/mount.c | 25 +++--- fs/gfs2/ops_address.c | 3 - fs/gfs2/ops_file.c | 29 ++++--- fs/gfs2/ops_fstype.c | 14 --- fs/gfs2/ops_inode.c | 2 fs/gfs2/ops_super.c | 1 fs/gfs2/recovery.c | 2 fs/gfs2/rgrp.c | 49 ++++++++---- fs/gfs2/super.c | 1 fs/gfs2/sys.c | 2 27 files changed, 290 insertions(+), 209 deletions(-) diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index b455919..2082daf 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -1670,9 +1670,10 @@ static int can_be_granted(struct dlm_rsb with a deadlk here, we'd have to generate something like grant_lock with the deadlk error.) */ -/* returns the highest requested mode of all blocked conversions */ +/* Returns the highest requested mode of all blocked conversions; sets + cw if there's a blocked conversion to DLM_LOCK_CW. */ -static int grant_pending_convert(struct dlm_rsb *r, int high) +static int grant_pending_convert(struct dlm_rsb *r, int high, int *cw) { struct dlm_lkb *lkb, *s; int hi, demoted, quit, grant_restart, demote_restart; @@ -1709,6 +1710,9 @@ static int grant_pending_convert(struct } hi = max_t(int, lkb->lkb_rqmode, hi); + + if (cw && lkb->lkb_rqmode == DLM_LOCK_CW) + *cw = 1; } if (grant_restart) @@ -1721,29 +1725,52 @@ static int grant_pending_convert(struct return max_t(int, high, hi); } -static int grant_pending_wait(struct dlm_rsb *r, int high) +static int grant_pending_wait(struct dlm_rsb *r, int high, int *cw) { struct dlm_lkb *lkb, *s; list_for_each_entry_safe(lkb, s, &r->res_waitqueue, lkb_statequeue) { if (can_be_granted(r, lkb, 0, NULL)) grant_lock_pending(r, lkb); - else + else { high = max_t(int, lkb->lkb_rqmode, high); + if (lkb->lkb_rqmode == DLM_LOCK_CW) + *cw = 1; + } } return high; } +/* cw of 1 means there's a lock with a rqmode of DLM_LOCK_CW that's blocked + on either the convert or waiting queue. + high is the largest rqmode of all locks blocked on the convert or + waiting queue. */ + +static int lock_requires_bast(struct dlm_lkb *gr, int high, int cw) +{ + if (gr->lkb_grmode == DLM_LOCK_PR && cw) { + if (gr->lkb_highbast < DLM_LOCK_EX) + return 1; + return 0; + } + + if (gr->lkb_highbast < high && + !__dlm_compat_matrix[gr->lkb_grmode+1][high+1]) + return 1; + return 0; +} + static void grant_pending_locks(struct dlm_rsb *r) { struct dlm_lkb *lkb, *s; int high = DLM_LOCK_IV; + int cw = 0; DLM_ASSERT(is_master(r), dlm_dump_rsb(r);); - high = grant_pending_convert(r, high); - high = grant_pending_wait(r, high); + high = grant_pending_convert(r, high, &cw); + high = grant_pending_wait(r, high, &cw); if (high == DLM_LOCK_IV) return; @@ -1751,27 +1778,41 @@ static void grant_pending_locks(struct d /* * If there are locks left on the wait/convert queue then send blocking * ASTs to granted locks based on the largest requested mode (high) - * found above. FIXME: highbast < high comparison not valid for PR/CW. + * found above. */ list_for_each_entry_safe(lkb, s, &r->res_grantqueue, lkb_statequeue) { - if (lkb->lkb_bastaddr && (lkb->lkb_highbast < high) && - !__dlm_compat_matrix[lkb->lkb_grmode+1][high+1]) { - queue_bast(r, lkb, high); + if (lkb->lkb_bastaddr && lock_requires_bast(lkb, high, cw)) { + if (cw && high == DLM_LOCK_PR) + queue_bast(r, lkb, DLM_LOCK_CW); + else + queue_bast(r, lkb, high); lkb->lkb_highbast = high; } } } +static int modes_require_bast(struct dlm_lkb *gr, struct dlm_lkb *rq) +{ + if ((gr->lkb_grmode == DLM_LOCK_PR && rq->lkb_rqmode == DLM_LOCK_CW) || + (gr->lkb_grmode == DLM_LOCK_CW && rq->lkb_rqmode == DLM_LOCK_PR)) { + if (gr->lkb_highbast < DLM_LOCK_EX) + return 1; + return 0; + } + + if (gr->lkb_highbast < rq->lkb_rqmode && !modes_compat(gr, rq)) + return 1; + return 0; +} + static void send_bast_queue(struct dlm_rsb *r, struct list_head *head, struct dlm_lkb *lkb) { struct dlm_lkb *gr; list_for_each_entry(gr, head, lkb_statequeue) { - if (gr->lkb_bastaddr && - gr->lkb_highbast < lkb->lkb_rqmode && - !modes_compat(gr, lkb)) { + if (gr->lkb_bastaddr && modes_require_bast(gr, lkb)) { queue_bast(r, gr, lkb->lkb_rqmode); gr->lkb_highbast = lkb->lkb_rqmode; } @@ -2235,7 +2276,7 @@ static int do_convert(struct dlm_rsb *r, before we try again to grant this one. */ if (is_demoted(lkb)) { - grant_pending_convert(r, DLM_LOCK_IV); + grant_pending_convert(r, DLM_LOCK_IV, NULL); if (_can_be_granted(r, lkb, 1)) { grant_lock(r, lkb); queue_cast(r, lkb, 0); diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index dd36273..9e9d2e8 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -313,6 +313,7 @@ static void make_sockaddr(struct sockadd in6_addr->sin6_port = cpu_to_be16(port); *addr_len = sizeof(struct sockaddr_in6); } + memset((char *)saddr + *addr_len, 0, sizeof(struct sockaddr_storage) - *addr_len); } /* Close a remote connection and tidy up */ @@ -332,8 +333,19 @@ static void close_connection(struct conn __free_page(con->rx_page); con->rx_page = NULL; } - con->retries = 0; - mutex_unlock(&con->sock_mutex); + + /* If we are an 'othercon' then NULL the pointer to us + from the parent and tidy ourself up */ + if (test_bit(CF_IS_OTHERCON, &con->flags)) { + struct connection *parent = __nodeid2con(con->nodeid, 0); + parent->othercon = NULL; + kmem_cache_free(con_cache, con); + } + else { + /* Parent connections get reused */ + con->retries = 0; + mutex_unlock(&con->sock_mutex); + } } /* We only send shutdown messages to nodes that are not part of the cluster */ @@ -631,7 +643,7 @@ out_resched: out_close: mutex_unlock(&con->sock_mutex); - if (ret != -EAGAIN && !test_bit(CF_IS_OTHERCON, &con->flags)) { + if (ret != -EAGAIN) { close_connection(con, false); /* Reconnect when there is something to send */ } @@ -1122,8 +1134,6 @@ static int tcp_listen_for_all(void) log_print("Using TCP for communications"); - set_bit(CF_IS_OTHERCON, &con->flags); - sock = tcp_create_listen_sock(con, dlm_local_addr[0]); if (sock) { add_sock(sock, con); @@ -1407,7 +1417,7 @@ void dlm_lowcomms_stop(void) for (i = 0; i <= max_nodeid; i++) { con = __nodeid2con(i, 0); if (con) { - con->flags |= 0xFF; + con->flags |= 0x0F; if (con->sock) con->sock->sk->sk_user_data = NULL; } @@ -1423,8 +1433,6 @@ void dlm_lowcomms_stop(void) con = __nodeid2con(i, 0); if (con) { close_connection(con, true); - if (con->othercon) - kmem_cache_free(con_cache, con->othercon); kmem_cache_free(con_cache, con); } } diff --git a/fs/dlm/member.c b/fs/dlm/member.c index 073599d..d099775 100644 --- a/fs/dlm/member.c +++ b/fs/dlm/member.c @@ -56,8 +56,10 @@ static int dlm_add_member(struct dlm_ls return -ENOMEM; w = dlm_node_weight(ls->ls_name, nodeid); - if (w < 0) + if (w < 0) { + kfree(memb); return w; + } memb->nodeid = nodeid; memb->weight = w; diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c index e3a1527..188b91c 100644 --- a/fs/dlm/rcom.c +++ b/fs/dlm/rcom.c @@ -386,8 +386,7 @@ static void receive_rcom_lock_reply(stru dlm_recover_process_copy(ls, rc_in); } -static int send_ls_not_ready(struct dlm_ls *ls, int nodeid, - struct dlm_rcom *rc_in) +static int send_ls_not_ready(int nodeid, struct dlm_rcom *rc_in) { struct dlm_rcom *rc; struct rcom_config *rf; @@ -395,7 +394,7 @@ static int send_ls_not_ready(struct dlm_ char *mb; int mb_len = sizeof(struct dlm_rcom) + sizeof(struct rcom_config); - mh = dlm_lowcomms_get_buffer(nodeid, mb_len, ls->ls_allocation, &mb); + mh = dlm_lowcomms_get_buffer(nodeid, mb_len, GFP_NOFS, &mb); if (!mh) return -ENOBUFS; memset(mb, 0, mb_len); @@ -465,7 +464,7 @@ void dlm_receive_rcom(struct dlm_header log_print("lockspace %x from %d type %x not found", hd->h_lockspace, nodeid, rc->rc_type); if (rc->rc_type == DLM_RCOM_STATUS) - send_ls_not_ready(ls, nodeid, rc); + send_ls_not_ready(nodeid, rc); return; } diff --git a/fs/gfs2/daemon.c b/fs/gfs2/daemon.c index 3548d9f..3731ab0 100644 --- a/fs/gfs2/daemon.c +++ b/fs/gfs2/daemon.c @@ -35,30 +35,6 @@ #include "util.h" The kthread functions used to start these daemons block and flush signals. */ /** - * gfs2_scand - Look for cached glocks and inodes to toss from memory - * @sdp: Pointer to GFS2 superblock - * - * One of these daemons runs, finding candidates to add to sd_reclaim_list. - * See gfs2_glockd() - */ - -int gfs2_scand(void *data) -{ - struct gfs2_sbd *sdp = data; - unsigned long t; - - while (!kthread_should_stop()) { - gfs2_scand_internal(sdp); - t = gfs2_tune_get(sdp, gt_scand_secs) * HZ; - if (freezing(current)) - refrigerator(); - schedule_timeout_interruptible(t); - } - - return 0; -} - -/** * gfs2_glockd - Reclaim unused glock structures * @sdp: Pointer to GFS2 superblock * diff --git a/fs/gfs2/daemon.h b/fs/gfs2/daemon.h index 8010071..0de9b35 100644 --- a/fs/gfs2/daemon.h +++ b/fs/gfs2/daemon.h @@ -10,7 +10,6 @@ #ifndef __DAEMON_DOT_H__ #define __DAEMON_DOT_H__ -int gfs2_scand(void *data); int gfs2_glockd(void *data); int gfs2_recoverd(void *data); int gfs2_logd(void *data); diff --git a/fs/gfs2/eaops.c b/fs/gfs2/eaops.c index 1ab3e9d..aa8dbf3 100644 --- a/fs/gfs2/eaops.c +++ b/fs/gfs2/eaops.c @@ -200,28 +200,28 @@ static int security_eo_remove(struct gfs return gfs2_ea_remove_i(ip, er); } -static struct gfs2_eattr_operations gfs2_user_eaops = { +static const struct gfs2_eattr_operations gfs2_user_eaops = { .eo_get = user_eo_get, .eo_set = user_eo_set, .eo_remove = user_eo_remove, .eo_name = "user", }; -struct gfs2_eattr_operations gfs2_system_eaops = { +const struct gfs2_eattr_operations gfs2_system_eaops = { .eo_get = system_eo_get, .eo_set = system_eo_set, .eo_remove = system_eo_remove, .eo_name = "system", }; -static struct gfs2_eattr_operations gfs2_security_eaops = { +static const struct gfs2_eattr_operations gfs2_security_eaops = { .eo_get = security_eo_get, .eo_set = security_eo_set, .eo_remove = security_eo_remove, .eo_name = "security", }; -struct gfs2_eattr_operations *gfs2_ea_ops[] = { +const struct gfs2_eattr_operations *gfs2_ea_ops[] = { NULL, &gfs2_user_eaops, &gfs2_system_eaops, diff --git a/fs/gfs2/eaops.h b/fs/gfs2/eaops.h index 508b4f7..da2f7fb 100644 --- a/fs/gfs2/eaops.h +++ b/fs/gfs2/eaops.h @@ -22,9 +22,9 @@ struct gfs2_eattr_operations { unsigned int gfs2_ea_name2type(const char *name, const char **truncated_name); -extern struct gfs2_eattr_operations gfs2_system_eaops; +extern const struct gfs2_eattr_operations gfs2_system_eaops; -extern struct gfs2_eattr_operations *gfs2_ea_ops[]; +extern const struct gfs2_eattr_operations *gfs2_ea_ops[]; #endif /* __EAOPS_DOT_H__ */ diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 3f0974e..559937c 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -25,8 +25,8 @@ #include #include #include #include -#include -#include +#include +#include #include "gfs2.h" #include "incore.h" @@ -48,7 +48,6 @@ struct glock_iter { int hash; /* hash bucket index */ struct gfs2_sbd *sdp; /* incore superblock */ struct gfs2_glock *gl; /* current glock struct */ - struct hlist_head *hb_list; /* current hash bucket ptr */ struct seq_file *seq; /* sequence file for debugfs */ char string[512]; /* scratch space */ }; @@ -61,6 +60,8 @@ static void gfs2_glock_xmote_th(struct g static void gfs2_glock_drop_th(struct gfs2_glock *gl); static DECLARE_RWSEM(gfs2_umount_flush_sem); static struct dentry *gfs2_root; +static struct task_struct *scand_process; +static unsigned int scand_secs = 5; #define GFS2_GL_HASH_SHIFT 15 #define GFS2_GL_HASH_SIZE (1 << GFS2_GL_HASH_SHIFT) @@ -545,12 +546,14 @@ static int rq_demote(struct gfs2_glock * return 0; } set_bit(GLF_LOCK, &gl->gl_flags); - spin_unlock(&gl->gl_spin); if (gl->gl_demote_state == LM_ST_UNLOCKED || - gl->gl_state != LM_ST_EXCLUSIVE) + gl->gl_state != LM_ST_EXCLUSIVE) { + spin_unlock(&gl->gl_spin); gfs2_glock_drop_th(gl); - else + } else { + spin_unlock(&gl->gl_spin); gfs2_glock_xmote_th(gl, NULL); + } spin_lock(&gl->gl_spin); return 0; @@ -695,8 +698,9 @@ static void handle_callback(struct gfs2_ } return; } - } else if (gl->gl_demote_state != LM_ST_UNLOCKED) { - gl->gl_demote_state = state; + } else if (gl->gl_demote_state != LM_ST_UNLOCKED && + gl->gl_demote_state != state) { + gl->gl_demote_state = LM_ST_UNLOCKED; } spin_unlock(&gl->gl_spin); } @@ -760,10 +764,20 @@ static void xmote_bh(struct gfs2_glock * if (!gh) { gl->gl_stamp = jiffies; - if (ret & LM_OUT_CANCELED) + if (ret & LM_OUT_CANCELED) { op_done = 0; - else + } else { + spin_lock(&gl->gl_spin); + if (gl->gl_state != gl->gl_demote_state) { + gl->gl_req_bh = NULL; + spin_unlock(&gl->gl_spin); + gfs2_glock_drop_th(gl); + gfs2_glock_put(gl); + return; + } gfs2_demote_wake(gl); + spin_unlock(&gl->gl_spin); + } } else { spin_lock(&gl->gl_spin); list_del_init(&gh->gh_list); @@ -817,7 +831,7 @@ out: * */ -void gfs2_glock_xmote_th(struct gfs2_glock *gl, struct gfs2_holder *gh) +static void gfs2_glock_xmote_th(struct gfs2_glock *gl, struct gfs2_holder *gh) { struct gfs2_sbd *sdp = gl->gl_sbd; int flags = gh ? gh->gh_flags : 0; @@ -1617,7 +1631,7 @@ static int examine_bucket(glock_examiner goto out; gl = list_entry(head->first, struct gfs2_glock, gl_list); while(1) { - if (gl->gl_sbd == sdp) { + if (!sdp || gl->gl_sbd == sdp) { gfs2_glock_hold(gl); read_unlock(gl_lock_addr(hash)); if (prev) @@ -1635,6 +1649,7 @@ out: read_unlock(gl_lock_addr(hash)); if (prev) gfs2_glock_put(prev); + cond_resched(); return has_entries; } @@ -1663,20 +1678,6 @@ out_schedule: } /** - * gfs2_scand_internal - Look for glocks and inodes to toss from memory - * @sdp: the filesystem - * - */ - -void gfs2_scand_internal(struct gfs2_sbd *sdp) -{ - unsigned int x; - - for (x = 0; x < GFS2_GL_HASH_SIZE; x++) - examine_bucket(scan_glock, sdp, x); -} - -/** * clear_glock - look at a glock and see if we can free it from glock cache * @gl: the glock to look at * @@ -1963,6 +1964,35 @@ static int gfs2_dump_lockstate(struct gf return error; } +/** + * gfs2_scand - Look for cached glocks and inodes to toss from memory + * @sdp: Pointer to GFS2 superblock + * + * One of these daemons runs, finding candidates to add to sd_reclaim_list. + * See gfs2_glockd() + */ + +static int gfs2_scand(void *data) +{ + unsigned x; + unsigned delay; + + while (!kthread_should_stop()) { + for (x = 0; x < GFS2_GL_HASH_SIZE; x++) + examine_bucket(scan_glock, NULL, x); + if (freezing(current)) + refrigerator(); + delay = scand_secs; + if (delay < 1) + delay = 1; + schedule_timeout_interruptible(delay * HZ); + } + + return 0; +} + + + int __init gfs2_glock_init(void) { unsigned i; @@ -1974,52 +2004,56 @@ #ifdef GL_HASH_LOCK_SZ rwlock_init(&gl_hash_locks[i]); } #endif + + scand_process = kthread_run(gfs2_scand, NULL, "gfs2_scand"); + if (IS_ERR(scand_process)) + return PTR_ERR(scand_process); + return 0; } +void gfs2_glock_exit(void) +{ + kthread_stop(scand_process); +} + +module_param(scand_secs, uint, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(scand_secs, "The number of seconds between scand runs"); + static int gfs2_glock_iter_next(struct glock_iter *gi) { + struct gfs2_glock *gl; + read_lock(gl_lock_addr(gi->hash)); - while (1) { - if (!gi->hb_list) { /* If we don't have a hash bucket yet */ - gi->hb_list = &gl_hash_table[gi->hash].hb_list; - if (hlist_empty(gi->hb_list)) { - read_unlock(gl_lock_addr(gi->hash)); - gi->hash++; - read_lock(gl_lock_addr(gi->hash)); - gi->hb_list = NULL; - if (gi->hash >= GFS2_GL_HASH_SIZE) { - read_unlock(gl_lock_addr(gi->hash)); - return 1; - } - else - continue; - } - if (!hlist_empty(gi->hb_list)) { - gi->gl = list_entry(gi->hb_list->first, - struct gfs2_glock, - gl_list); - } - } else { - if (gi->gl->gl_list.next == NULL) { - read_unlock(gl_lock_addr(gi->hash)); - gi->hash++; - read_lock(gl_lock_addr(gi->hash)); - gi->hb_list = NULL; - continue; - } - gi->gl = list_entry(gi->gl->gl_list.next, - struct gfs2_glock, gl_list); - } + gl = gi->gl; + if (gl) { + gi->gl = hlist_entry(gl->gl_list.next, struct gfs2_glock, + gl_list); if (gi->gl) - break; + gfs2_glock_hold(gi->gl); } read_unlock(gl_lock_addr(gi->hash)); + if (gl) + gfs2_glock_put(gl); + + while(gi->gl == NULL) { + gi->hash++; + if (gi->hash >= GFS2_GL_HASH_SIZE) + return 1; + read_lock(gl_lock_addr(gi->hash)); + gi->gl = hlist_entry(gl_hash_table[gi->hash].hb_list.first, + struct gfs2_glock, gl_list); + if (gi->gl) + gfs2_glock_hold(gi->gl); + read_unlock(gl_lock_addr(gi->hash)); + } return 0; } static void gfs2_glock_iter_free(struct glock_iter *gi) { + if (gi->gl) + gfs2_glock_put(gi->gl); kfree(gi); } @@ -2033,12 +2067,17 @@ static struct glock_iter *gfs2_glock_ite gi->sdp = sdp; gi->hash = 0; - gi->gl = NULL; - gi->hb_list = NULL; gi->seq = NULL; memset(gi->string, 0, sizeof(gi->string)); - if (gfs2_glock_iter_next(gi)) { + read_lock(gl_lock_addr(gi->hash)); + gi->gl = hlist_entry(gl_hash_table[gi->hash].hb_list.first, + struct gfs2_glock, gl_list); + if (gi->gl) + gfs2_glock_hold(gi->gl); + read_unlock(gl_lock_addr(gi->hash)); + + if (!gi->gl && gfs2_glock_iter_next(gi)) { gfs2_glock_iter_free(gi); return NULL; } @@ -2055,7 +2094,7 @@ static void *gfs2_glock_seq_start(struct if (!gi) return NULL; - while (n--) { + while(n--) { if (gfs2_glock_iter_next(gi)) { gfs2_glock_iter_free(gi); return NULL; @@ -2082,7 +2121,9 @@ static void *gfs2_glock_seq_next(struct static void gfs2_glock_seq_stop(struct seq_file *file, void *iter_ptr) { - /* nothing for now */ + struct glock_iter *gi = iter_ptr; + if (gi) + gfs2_glock_iter_free(gi); } static int gfs2_glock_seq_show(struct seq_file *file, void *iter_ptr) @@ -2095,7 +2136,7 @@ static int gfs2_glock_seq_show(struct se return 0; } -static struct seq_operations gfs2_glock_seq_ops = { +static const struct seq_operations gfs2_glock_seq_ops = { .start = gfs2_glock_seq_start, .next = gfs2_glock_seq_next, .stop = gfs2_glock_seq_stop, diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h index 7721ca3..f7a8e62 100644 --- a/fs/gfs2/glock.h +++ b/fs/gfs2/glock.h @@ -132,11 +132,11 @@ void gfs2_glock_cb(void *cb_data, unsign void gfs2_glock_schedule_for_reclaim(struct gfs2_glock *gl); void gfs2_reclaim_glock(struct gfs2_sbd *sdp); - -void gfs2_scand_internal(struct gfs2_sbd *sdp); void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait); int __init gfs2_glock_init(void); +void gfs2_glock_exit(void); + int gfs2_create_debugfs_file(struct gfs2_sbd *sdp); void gfs2_delete_debugfs_file(struct gfs2_sbd *sdp); int gfs2_register_debugfs(void); diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 170ba93..1390b30 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -429,7 +429,6 @@ struct gfs2_tune { unsigned int gt_log_flush_secs; unsigned int gt_jindex_refresh_secs; /* Check for new journal index */ - unsigned int gt_scand_secs; unsigned int gt_recoverd_secs; unsigned int gt_logd_secs; unsigned int gt_quotad_secs; @@ -574,7 +573,6 @@ struct gfs2_sbd { /* Daemon stuff */ - struct task_struct *sd_scand_process; struct task_struct *sd_recoverd_process; struct task_struct *sd_logd_process; struct task_struct *sd_quotad_process; diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h index 24d70f7..9e8265d 100644 --- a/fs/gfs2/locking/dlm/lock_dlm.h +++ b/fs/gfs2/locking/dlm/lock_dlm.h @@ -13,7 +13,6 @@ #define LOCK_DLM_DOT_H #include #include #include -#include #include #include #include diff --git a/fs/gfs2/locking/dlm/plock.c b/fs/gfs2/locking/dlm/plock.c index fba1f1d..1f7b038 100644 --- a/fs/gfs2/locking/dlm/plock.c +++ b/fs/gfs2/locking/dlm/plock.c @@ -346,15 +346,16 @@ static ssize_t dev_write(struct file *fi static unsigned int dev_poll(struct file *file, poll_table *wait) { + unsigned int mask = 0; + poll_wait(file, &send_wq, wait); spin_lock(&ops_lock); - if (!list_empty(&send_list)) { - spin_unlock(&ops_lock); - return POLLIN | POLLRDNORM; - } + if (!list_empty(&send_list)) + mask = POLLIN | POLLRDNORM; spin_unlock(&ops_lock); - return 0; + + return mask; } static const struct file_operations dev_fops = { diff --git a/fs/gfs2/locking/nolock/main.c b/fs/gfs2/locking/nolock/main.c index 0d149c8..d3b8ce6 100644 --- a/fs/gfs2/locking/nolock/main.c +++ b/fs/gfs2/locking/nolock/main.c @@ -9,7 +9,6 @@ #include #include -#include #include #include #include diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index f49a12e..00ab6c0 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -83,11 +83,6 @@ static void gfs2_ail1_start_one(struct g gfs2_assert(sdp, bd->bd_ail == ai); - if (!bh){ - list_move(&bd->bd_ail_st_list, &ai->ai_ail2_list); - continue; - } - if (!buffer_busy(bh)) { if (!buffer_uptodate(bh)) { gfs2_log_unlock(sdp); @@ -130,11 +125,6 @@ static int gfs2_ail1_empty_one(struct gf bd_ail_st_list) { bh = bd->bd_bh; - if (!bh){ - list_move(&bd->bd_ail_st_list, &ai->ai_ail2_list); - continue; - } - gfs2_assert(sdp, bd->bd_ail == ai); if (buffer_busy(bh)) { @@ -155,13 +145,14 @@ static int gfs2_ail1_empty_one(struct gf static void gfs2_ail1_start(struct gfs2_sbd *sdp, int flags) { - struct list_head *head = &sdp->sd_ail1_list; + struct list_head *head; u64 sync_gen; struct list_head *first; struct gfs2_ail *first_ai, *ai, *tmp; int done = 0; gfs2_log_lock(sdp); + head = &sdp->sd_ail1_list; if (list_empty(head)) { gfs2_log_unlock(sdp); return; @@ -228,6 +219,7 @@ static void gfs2_ail2_empty_one(struct g { struct list_head *head = &ai->ai_ail2_list; struct gfs2_bufdata *bd; + struct gfs2_inode *bh_ip; while (!list_empty(head)) { bd = list_entry(head->prev, struct gfs2_bufdata, @@ -237,6 +229,8 @@ static void gfs2_ail2_empty_one(struct g list_del(&bd->bd_ail_st_list); list_del(&bd->bd_ail_gl_list); atomic_dec(&bd->bd_gl->gl_ail_count); + bh_ip = GFS2_I(bd->bd_bh->b_page->mapping->host); + gfs2_meta_cache_flush(bh_ip); brelse(bd->bd_bh); } } diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index aff70f0..a0371f8 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -117,7 +117,7 @@ static void buf_lo_before_commit(struct struct buffer_head *bh; struct gfs2_log_descriptor *ld; struct gfs2_bufdata *bd1 = NULL, *bd2; - unsigned int total = sdp->sd_log_num_buf; + unsigned int total; unsigned int offset = BUF_OFFSET; unsigned int limit; unsigned int num; @@ -127,12 +127,16 @@ static void buf_lo_before_commit(struct limit = buf_limit(sdp); /* for 4k blocks, limit = 503 */ + gfs2_log_lock(sdp); + total = sdp->sd_log_num_buf; bd1 = bd2 = list_prepare_entry(bd1, &sdp->sd_log_le_buf, bd_le.le_list); while(total) { num = total; if (total > limit) num = limit; + gfs2_log_unlock(sdp); bh = gfs2_log_get_buf(sdp); + gfs2_log_lock(sdp); ld = (struct gfs2_log_descriptor *)bh->b_data; ptr = (__be64 *)(bh->b_data + offset); ld->ld_header.mh_magic = cpu_to_be32(GFS2_MAGIC); @@ -152,21 +156,27 @@ static void buf_lo_before_commit(struct break; } + gfs2_log_unlock(sdp); set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); + gfs2_log_lock(sdp); n = 0; list_for_each_entry_continue(bd2, &sdp->sd_log_le_buf, bd_le.le_list) { + gfs2_log_unlock(sdp); bh = gfs2_log_fake_buf(sdp, bd2->bd_bh); set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); + gfs2_log_lock(sdp); if (++n >= num) break; } + BUG_ON(total < num); total -= num; } + gfs2_log_unlock(sdp); } static void buf_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_ail *ai) @@ -486,8 +496,8 @@ static void databuf_lo_add(struct gfs2_s gfs2_pin(sdp, bd->bd_bh); tr->tr_num_databuf_new++; } - sdp->sd_log_num_databuf++; gfs2_log_lock(sdp); + sdp->sd_log_num_databuf++; list_add(&le->le_list, &sdp->sd_log_le_databuf); gfs2_log_unlock(sdp); } @@ -523,8 +533,8 @@ static void databuf_lo_before_commit(str struct buffer_head *bh = NULL,*bh1 = NULL; struct gfs2_log_descriptor *ld; unsigned int limit; - unsigned int total_dbuf = sdp->sd_log_num_databuf; - unsigned int total_jdata = sdp->sd_log_num_jdata; + unsigned int total_dbuf; + unsigned int total_jdata; unsigned int num, n; __be64 *ptr = NULL; @@ -535,6 +545,8 @@ static void databuf_lo_before_commit(str * into the log along with a header */ gfs2_log_lock(sdp); + total_dbuf = sdp->sd_log_num_databuf; + total_jdata = sdp->sd_log_num_jdata; bd2 = bd1 = list_prepare_entry(bd1, &sdp->sd_log_le_databuf, bd_le.le_list); while(total_dbuf) { @@ -620,10 +632,10 @@ static void databuf_lo_before_commit(str } gfs2_log_unlock(sdp); if (bh) { - set_buffer_mapped(bh); set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); bh = NULL; + ptr = NULL; } n = 0; gfs2_log_lock(sdp); @@ -653,6 +665,7 @@ static void databuf_lo_before_commit(str break; } bh = NULL; + BUG_ON(total_dbuf < num); total_dbuf -= num; total_jdata -= num; } diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c index d5d4e68..79c91fd 100644 --- a/fs/gfs2/main.c +++ b/fs/gfs2/main.c @@ -107,6 +107,8 @@ static int __init init_gfs2_fs(void) fail_unregister: unregister_filesystem(&gfs2_fs_type); fail: + gfs2_glock_exit(); + if (gfs2_bufdata_cachep) kmem_cache_destroy(gfs2_bufdata_cachep); @@ -127,6 +129,7 @@ fail: static void __exit exit_gfs2_fs(void) { + gfs2_glock_exit(); gfs2_unregister_debugfs(); unregister_filesystem(&gfs2_fs_type); unregister_filesystem(&gfs2meta_fs_type); diff --git a/fs/gfs2/mount.c b/fs/gfs2/mount.c index 6f006a8..4864659 100644 --- a/fs/gfs2/mount.c +++ b/fs/gfs2/mount.c @@ -82,19 +82,20 @@ int gfs2_mount_args(struct gfs2_sbd *sdp char *options, *o, *v; int error = 0; - /* If someone preloaded options, use those instead */ - spin_lock(&gfs2_sys_margs_lock); - if (!remount && gfs2_sys_margs) { - data = gfs2_sys_margs; - gfs2_sys_margs = NULL; - } - spin_unlock(&gfs2_sys_margs_lock); + if (!remount) { + /* If someone preloaded options, use those instead */ + spin_lock(&gfs2_sys_margs_lock); + if (gfs2_sys_margs) { + data = gfs2_sys_margs; + gfs2_sys_margs = NULL; + } + spin_unlock(&gfs2_sys_margs_lock); - /* Set some defaults */ - memset(args, 0, sizeof(struct gfs2_args)); - args->ar_num_glockd = GFS2_GLOCKD_DEFAULT; - args->ar_quota = GFS2_QUOTA_DEFAULT; - args->ar_data = GFS2_DATA_DEFAULT; + /* Set some defaults */ + args->ar_num_glockd = GFS2_GLOCKD_DEFAULT; + args->ar_quota = GFS2_QUOTA_DEFAULT; + args->ar_data = GFS2_DATA_DEFAULT; + } /* Split the options into tokens with the "," character and process them */ diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index ce90032..42a5f58 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -416,7 +416,7 @@ static int gfs2_prepare_write(struct fil error = gfs2_trans_begin(sdp, rblocks, 0); if (error) - goto out; + goto out_trans_fail; if (gfs2_is_stuffed(ip)) { if (end > sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) { @@ -434,6 +434,7 @@ prepare_write: out: if (error) { gfs2_trans_end(sdp); +out_trans_fail: if (alloc_required) { gfs2_inplace_release(ip); out_qunlock: diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c index 7734211..94d76ac 100644 --- a/fs/gfs2/ops_file.c +++ b/fs/gfs2/ops_file.c @@ -177,8 +177,8 @@ static const u32 fsflags_to_gfs2[32] = { [5] = GFS2_DIF_APPENDONLY, [7] = GFS2_DIF_NOATIME, [12] = GFS2_DIF_EXHASH, - [14] = GFS2_DIF_JDATA, - [20] = GFS2_DIF_DIRECTIO, + [14] = GFS2_DIF_INHERIT_JDATA, + [20] = GFS2_DIF_INHERIT_DIRECTIO, }; static const u32 gfs2_to_fsflags[32] = { @@ -187,8 +187,6 @@ static const u32 gfs2_to_fsflags[32] = { [gfs2fl_AppendOnly] = FS_APPEND_FL, [gfs2fl_NoAtime] = FS_NOATIME_FL, [gfs2fl_ExHash] = FS_INDEX_FL, - [gfs2fl_Jdata] = FS_JOURNAL_DATA_FL, - [gfs2fl_Directio] = FS_DIRECTIO_FL, [gfs2fl_InheritDirectio] = FS_DIRECTIO_FL, [gfs2fl_InheritJdata] = FS_JOURNAL_DATA_FL, }; @@ -207,6 +205,12 @@ static int gfs2_get_flags(struct file *f return error; fsflags = fsflags_cvt(gfs2_to_fsflags, ip->i_di.di_flags); + if (!S_ISDIR(inode->i_mode)) { + if (ip->i_di.di_flags & GFS2_DIF_JDATA) + fsflags |= FS_JOURNAL_DATA_FL; + if (ip->i_di.di_flags & GFS2_DIF_DIRECTIO) + fsflags |= FS_DIRECTIO_FL; + } if (put_user(fsflags, ptr)) error = -EFAULT; @@ -270,13 +274,6 @@ static int do_gfs2_set_flags(struct file if ((new_flags ^ flags) == 0) goto out; - if (S_ISDIR(inode->i_mode)) { - if ((new_flags ^ flags) & GFS2_DIF_JDATA) - new_flags ^= (GFS2_DIF_JDATA|GFS2_DIF_INHERIT_JDATA); - if ((new_flags ^ flags) & GFS2_DIF_DIRECTIO) - new_flags ^= (GFS2_DIF_DIRECTIO|GFS2_DIF_INHERIT_DIRECTIO); - } - error = -EINVAL; if ((new_flags ^ flags) & ~GFS2_FLAGS_USER_SET) goto out; @@ -315,11 +312,19 @@ out: static int gfs2_set_flags(struct file *filp, u32 __user *ptr) { + struct inode *inode = filp->f_path.dentry->d_inode; u32 fsflags, gfsflags; if (get_user(fsflags, ptr)) return -EFAULT; gfsflags = fsflags_cvt(fsflags_to_gfs2, fsflags); - return do_gfs2_set_flags(filp, gfsflags, ~0); + if (!S_ISDIR(inode->i_mode)) { + if (gfsflags & GFS2_DIF_INHERIT_JDATA) + gfsflags ^= (GFS2_DIF_JDATA | GFS2_DIF_INHERIT_JDATA); + if (gfsflags & GFS2_DIF_INHERIT_DIRECTIO) + gfsflags ^= (GFS2_DIF_DIRECTIO | GFS2_DIF_INHERIT_DIRECTIO); + return do_gfs2_set_flags(filp, gfsflags, ~0); + } + return do_gfs2_set_flags(filp, gfsflags, ~GFS2_DIF_JDATA); } static long gfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index cf5aa50..f0bcaa2 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -28,6 +28,7 @@ #include "inode.h" #include "lm.h" #include "mount.h" #include "ops_fstype.h" +#include "ops_dentry.h" #include "ops_super.h" #include "recovery.h" #include "rgrp.h" @@ -38,8 +39,6 @@ #include "util.h" #define DO 0 #define UNDO 1 -extern struct dentry_operations gfs2_dops; - static struct gfs2_sbd *init_sbd(struct super_block *sb) { struct gfs2_sbd *sdp; @@ -161,14 +160,6 @@ static int init_locking(struct gfs2_sbd if (undo) goto fail_trans; - p = kthread_run(gfs2_scand, sdp, "gfs2_scand"); - error = IS_ERR(p); - if (error) { - fs_err(sdp, "can't start scand thread: %d\n", error); - return error; - } - sdp->sd_scand_process = p; - for (sdp->sd_glockd_num = 0; sdp->sd_glockd_num < sdp->sd_args.ar_num_glockd; sdp->sd_glockd_num++) { @@ -229,7 +220,6 @@ fail: while (sdp->sd_glockd_num--) kthread_stop(sdp->sd_glockd_process[sdp->sd_glockd_num]); - kthread_stop(sdp->sd_scand_process); return error; } @@ -368,7 +358,7 @@ static int init_journal(struct gfs2_sbd ip = GFS2_I(sdp->sd_jdesc->jd_inode); error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, - LM_FLAG_NOEXP | GL_EXACT, + LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE, &sdp->sd_jinode_gh); if (error) { fs_err(sdp, "can't acquire journal inode glock: %d\n", diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index 911c115..5b8b994 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -69,7 +69,7 @@ static int gfs2_create(struct inode *dir mark_inode_dirty(inode); break; } else if (PTR_ERR(inode) != -EEXIST || - (nd->intent.open.flags & O_EXCL)) { + (nd && (nd->intent.open.flags & O_EXCL))) { gfs2_holder_uninit(ghs); return PTR_ERR(inode); } diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c index 603d940..4316690 100644 --- a/fs/gfs2/ops_super.c +++ b/fs/gfs2/ops_super.c @@ -92,7 +92,6 @@ static void gfs2_put_super(struct super_ kthread_stop(sdp->sd_recoverd_process); while (sdp->sd_glockd_num--) kthread_stop(sdp->sd_glockd_process[sdp->sd_glockd_num]); - kthread_stop(sdp->sd_scand_process); if (!(sb->s_flags & MS_RDONLY)) { error = gfs2_make_fs_ro(sdp); diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c index 5ada38c..beb6c7a 100644 --- a/fs/gfs2/recovery.c +++ b/fs/gfs2/recovery.c @@ -469,7 +469,7 @@ int gfs2_recover_journal(struct gfs2_jde }; error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, - LM_FLAG_NOEXP, &ji_gh); + LM_FLAG_NOEXP | GL_NOCACHE, &ji_gh); if (error) goto fail_gunlock_j; } else { diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index e4e0406..2d7f7ea 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -31,6 +31,7 @@ #include "log.h" #include "inode.h" #define BFITNOENT ((u32)~0) +#define NO_BLOCK ((u64)~0) /* * These routines are used by the resource group routines (rgrp.c) @@ -116,8 +117,7 @@ static unsigned char gfs2_testbit(struct * @buffer: the buffer that holds the bitmaps * @buflen: the length (in bytes) of the buffer * @goal: start search at this block's bit-pair (within @buffer) - * @old_state: GFS2_BLKST_XXX the state of the block we're looking for; - * bit 0 = alloc(1)/free(0), bit 1 = meta(1)/data(0) + * @old_state: GFS2_BLKST_XXX the state of the block we're looking for. * * Scope of @goal and returned block number is only within this bitmap buffer, * not entire rgrp or filesystem. @buffer will be offset from the actual @@ -137,9 +137,13 @@ static u32 gfs2_bitfit(struct gfs2_rgrpd byte = buffer + (goal / GFS2_NBBY); bit = (goal % GFS2_NBBY) * GFS2_BIT_SIZE; end = buffer + buflen; - alloc = (old_state & 1) ? 0 : 0x55; + alloc = (old_state == GFS2_BLKST_FREE) ? 0x55 : 0; while (byte < end) { + /* If we're looking for a free block we can eliminate all + bitmap settings with 0x55, which represents four data + blocks in a row. If we're looking for a data block, we can + eliminate 0x00 which corresponds to four free blocks. */ if ((*byte & 0x55) == alloc) { blk += (8 - bit) >> 1; @@ -859,20 +863,28 @@ static int try_rgrp_fit(struct gfs2_rgrp static struct inode *try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked) { struct inode *inode; - u32 goal = 0; + u32 goal = 0, block; u64 no_addr; + struct gfs2_sbd *sdp = rgd->rd_sbd; for(;;) { - goal = rgblk_search(rgd, goal, GFS2_BLKST_UNLINKED, - GFS2_BLKST_UNLINKED); - if (goal == 0) - return 0; - no_addr = goal + rgd->rd_data0; - if (no_addr <= *last_unlinked) + if (goal >= rgd->rd_data) + break; + down_write(&sdp->sd_log_flush_lock); + block = rgblk_search(rgd, goal, GFS2_BLKST_UNLINKED, + GFS2_BLKST_UNLINKED); + up_write(&sdp->sd_log_flush_lock); + if (block == BFITNOENT) + break; + /* rgblk_search can return a block < goal, so we need to + keep it marching forward. */ + no_addr = block + rgd->rd_data0; + goal++; + if (*last_unlinked != NO_BLOCK && no_addr <= *last_unlinked) continue; *last_unlinked = no_addr; inode = gfs2_inode_lookup(rgd->rd_sbd->sd_vfs, DT_UNKNOWN, - no_addr, -1); + no_addr, -1); if (!IS_ERR(inode)) return inode; } @@ -1149,7 +1161,7 @@ int gfs2_inplace_reserve_i(struct gfs2_i struct gfs2_alloc *al = &ip->i_alloc; struct inode *inode; int error = 0; - u64 last_unlinked = 0; + u64 last_unlinked = NO_BLOCK; if (gfs2_assert_warn(sdp, al->al_requested)) return -EINVAL; @@ -1286,7 +1298,9 @@ static u32 rgblk_search(struct gfs2_rgrp allocatable block anywhere else, we want to be able wrap around and search in the first part of our first-searched bit block. */ for (x = 0; x <= length; x++) { - if (bi->bi_clone) + /* The GFS2_BLKST_UNLINKED state doesn't apply to the clone + bitmaps, so we must search the originals for that. */ + if (old_state != GFS2_BLKST_UNLINKED && bi->bi_clone) blk = gfs2_bitfit(rgd, bi->bi_clone + bi->bi_offset, bi->bi_len, goal, old_state); else @@ -1302,9 +1316,7 @@ static u32 rgblk_search(struct gfs2_rgrp goal = 0; } - if (old_state != new_state) { - gfs2_assert_withdraw(rgd->rd_sbd, blk != BFITNOENT); - + if (blk != BFITNOENT && old_state != new_state) { gfs2_trans_add_bh(rgd->rd_gl, bi->bi_bh, 1); gfs2_setbit(rgd, bi->bi_bh->b_data + bi->bi_offset, bi->bi_len, blk, new_state); @@ -1313,7 +1325,7 @@ static u32 rgblk_search(struct gfs2_rgrp bi->bi_len, blk, new_state); } - return (blk == BFITNOENT) ? 0 : (bi->bi_start * GFS2_NBBY) + blk; + return (blk == BFITNOENT) ? blk : (bi->bi_start * GFS2_NBBY) + blk; } /** @@ -1393,6 +1405,7 @@ u64 gfs2_alloc_data(struct gfs2_inode *i goal = rgd->rd_last_alloc_data; blk = rgblk_search(rgd, goal, GFS2_BLKST_FREE, GFS2_BLKST_USED); + BUG_ON(blk == BFITNOENT); rgd->rd_last_alloc_data = blk; block = rgd->rd_data0 + blk; @@ -1437,6 +1450,7 @@ u64 gfs2_alloc_meta(struct gfs2_inode *i goal = rgd->rd_last_alloc_meta; blk = rgblk_search(rgd, goal, GFS2_BLKST_FREE, GFS2_BLKST_USED); + BUG_ON(blk == BFITNOENT); rgd->rd_last_alloc_meta = blk; block = rgd->rd_data0 + blk; @@ -1478,6 +1492,7 @@ u64 gfs2_alloc_di(struct gfs2_inode *dip blk = rgblk_search(rgd, rgd->rd_last_alloc_meta, GFS2_BLKST_FREE, GFS2_BLKST_DINODE); + BUG_ON(blk == BFITNOENT); rgd->rd_last_alloc_meta = blk; diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index f916b97..5589802 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -58,7 +58,6 @@ void gfs2_tune_init(struct gfs2_tune *gt gt->gt_incore_log_blocks = 1024; gt->gt_log_flush_secs = 60; gt->gt_jindex_refresh_secs = 60; - gt->gt_scand_secs = 15; gt->gt_recoverd_secs = 60; gt->gt_logd_secs = 1; gt->gt_quotad_secs = 5; diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c index c26c21b..ba3a172 100644 --- a/fs/gfs2/sys.c +++ b/fs/gfs2/sys.c @@ -442,7 +442,6 @@ TUNE_ATTR(quota_simul_sync, 1); TUNE_ATTR(quota_cache_secs, 1); TUNE_ATTR(stall_secs, 1); TUNE_ATTR(statfs_quantum, 1); -TUNE_ATTR_DAEMON(scand_secs, scand_process); TUNE_ATTR_DAEMON(recoverd_secs, recoverd_process); TUNE_ATTR_DAEMON(logd_secs, logd_process); TUNE_ATTR_DAEMON(quotad_secs, quotad_process); @@ -464,7 +463,6 @@ static struct attribute *tune_attrs[] = &tune_attr_quota_cache_secs.attr, &tune_attr_stall_secs.attr, &tune_attr_statfs_quantum.attr, - &tune_attr_scand_secs.attr, &tune_attr_recoverd_secs.attr, &tune_attr_logd_secs.attr, &tune_attr_quotad_secs.attr,