GIT 4c7b4f98252150aa61efefefd00739ce2b529dec git+ssh://master.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw.git commit Author: David Teigland Date: Tue May 29 08:47:51 2007 -0500 [DLM] show default protocol Display the initial value of the "protocol" config value in configfs. The default value has always been 0 in the past anyway, so it's always appeared to be correct. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 39377fd10f88815789c903528f1494c22a5215c3 Author: David Teigland Date: Tue May 29 08:47:04 2007 -0500 [DLM] dumping master locks Add a new debugfs file that dumps a compact list of mastered locks. This will be used by a userland daemon to collect state for deadlock detection. Also, for the existing function that prints all lock state, lock the rsb before going through the lock lists since they can be changing in the course of normal dlm activity. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 60bf765d660c548bc4be64ab4d2baa46a2e46f71 Author: David Teigland Date: Tue May 29 08:46:00 2007 -0500 [DLM] canceling deadlocked lock Add a function that can be used through libdlm by a system daemon to cancel another process's deadlocked lock. A completion ast with EDEADLK is returned to the process waiting for the lock. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit a3f169ffe3070e0276ef1bae9ee57726f9cc5aff Author: David Teigland Date: Tue May 29 08:44:23 2007 -0500 [DLM] timeout fixes Various fixes related to the new timeout feature: - add_timeout() missed setting TIMEWARN flag on lkb's when the TIMEOUT flag was already set - clear_proc_locks should remove a dead process's locks from the timeout list - the end-of-life calculation for user locks needs to consider that ETIMEDOUT is equivalent to -DLM_ECANCEL - make initial default timewarn_cs config value visible in configfs - change bit position of TIMEOUT_CANCEL flag so it's not copied to a remote master node - set timestamp on remote lkb's so a lock dump will display the time they've been waiting Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 88a9af626b497f7d2c7de207e0ff430949a600d8 Author: Steven Whitehouse Date: Tue May 29 11:14:21 2007 +0100 [DLM] Compile fix A one liner fix which got missed from the earlier patches. Signed-off-by: Steven Whitehouse Cc: Fabio Massimo Di Nitto Cc: David Teigland commit 00b74ee0a665a1a4fa6cd7bbe576fa2834f726b1 Author: Fabio Massimo Di Nitto Date: Tue May 22 09:00:24 2007 +0200 [GFS2] latest gfs2-nmw headers break userland build 2e8701a15cd6f7c95e74d6660615a69b09e453ef commit breaks libgfs2 build: gcc -Wall -I/usr/src/ubuntu/mypkgs/rhcluster/cluster/config -DHELPER_PROGRAM -D_FILE_OFFSET_BITS=64 -DGFS2_RELEASE_NAME=\"2.0\" -ggdb -I/usr/include -I../include -I../libgfs2 -c -o gfs2hex.o gfs2hex.c In file included from hexedit.h:22, from gfs2hex.c:27: /usr/include/linux/gfs2_ondisk.h:505: error: expected specifier-qualifier-list before ‘u32’ make[2]: *** [gfs2hex.o] Error 1 make[2]: Leaving directory `/usr/src/ubuntu/mypkgs/rhcluster/cluster/gfs2/edit' make[1]: *** [all] Error 2 make[1]: Leaving directory `/usr/src/ubuntu/mypkgs/rhcluster/cluster/gfs2' make: *** [gfs2] Error 2 Signed-off-by: Fabio Massimo Di Nitto Signed-off-by: Steven Whitehouse commit 3e8b0e1994f426b23765deb4b20f7d86c6a7115e Author: David Teigland Date: Fri May 18 16:02:57 2007 -0500 [DLM] fix compile breakage In the rush to get the previous patch set sent, a compilation bug I fixed shortly before sending somehow got clobbered, probably by a missed quilt refresh or something. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 02476d76e2bcfca704a87e5c258381ec91ed25ef Author: David Teigland Date: Fri May 18 09:03:35 2007 -0500 [DLM] wait for config check during join [6/6] Joining the lockspace should wait for the initial round of inter-node config checks to complete before returning. This way, if there's a configuration mismatch between the joining node and the existing nodes, the join can fail and return an error to the application. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 7c6792c4140548e78f6033f9306125879a608dd0 Author: David Teigland Date: Fri May 18 09:02:20 2007 -0500 [DLM] fix new_lockspace error exit [5/6] Fix the error path when exiting new_lockspace(). It was kfree'ing the lockspace struct at the end, but that's only valid if it exits before kobject_register occured. After kobject_register we have to let the kobject do the freeing. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 8107eec48f66e5a5af34c90953721b6776c1cb4e Author: David Teigland Date: Fri May 18 09:01:26 2007 -0500 [DLM] cancel in conversion deadlock [4/6] When conversion deadlock is detected, cancel the conversion and return EDEADLK to the application. This is a new default behavior where before the dlm would allow the deadlock to exist indefinately. The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the dlm from performing conversion deadlock detection/cancelation on it. The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the dlm to demote the granted mode of the lock being converted if it gets into a conversion deadlock. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 59fa6dceb980e3f95a61ee602210b241f0b0df94 Author: David Teigland Date: Fri May 18 09:00:32 2007 -0500 [DLM] dlm_device interface changes [3/6] Change the user/kernel device interface used by libdlm: - Add ability for userspace to check the version of the interface. libdlm can now adapt to different versions of the kernel interface. - Increase the size of the flags passed in a lock request so all possible flags can be used from userspace. - Add an opaque "xid" value for each lock. This "transaction id" will be used later to associate locks with each other during deadlock detection. - Add a "timeout" value for each lock. This is used along with the DLM_LKF_TIMEOUT flag. Also, remove a fragment of unused code in device_read(). This patch requires updating libdlm which is backward compatible with older kernels. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 5b5abe46b3dbe3551de1a1cfeb5c12dd2a2dac40 Author: David Teigland Date: Fri May 18 08:59:31 2007 -0500 [DLM] add lock timeouts and warnings [2/6] New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT flag is set, then the request/conversion will be canceled after waiting the specified number of centiseconds (specified per lock). This feature is only available for locks requested through libdlm (can be enabled for kernel dlm users if there's a use for it.) If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then a warning message will be sent to userspace (using genetlink) after a request/conversion has been waiting for a given number of centiseconds (configurable per node). The time warnings will be used in the future to do deadlock detection in userspace. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 9675bde93284a9b7ec74018a477693d8651d2304 Author: David Teigland Date: Fri May 18 08:58:15 2007 -0500 [DLM] block scand during recovery [1/6] Don't let dlm_scand run during recovery since it may try to do a resource directory removal while the directory nodes are changing. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit 13259f4e658a248b477145821a71be369baa2508 Author: Josef Bacik Date: Wed May 16 15:56:13 2007 -0400 [DLM] keep dlm from panicing when traversing rsb list in debugfs This problem was originally reported against GFS6.1, but the same issue exists in upstream DLM. This patch keeps the rsb iterator assigning under the rsbtbl list lock. Each time we process an rsb we grab a reference to it to make sure it is not freed out from underneath us, and then put it when we get the next rsb in the list or move onto another list. Signed-off-by: Josef Bacik Signed-off-by: Steven Whitehouse commit 8ba34a28981716bf07d26ead70b96c19e24736a4 Author: Abhijith Das Date: Wed May 16 17:02:19 2007 -0500 [GFS2] Quotas non-functional - fix bug This patch fixes an error in the quota code where a 'struct gfs2_quota_lvb*' was being passed to gfs2_adjust_quota() instead of a 'struct gfs2_quota_data*'. Also moved 'struct gfs2_quota_lvb' from fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion. Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse commit a7c18fc4baa4099667ae6549d3711629a51da649 Author: Steven Whitehouse Date: Tue May 15 15:37:50 2007 +0100 [GFS2] Clean up inode number handling This patch cleans up the inode number handling code. The main difference is that instead of looking up the inodes using a struct gfs2_inum_host we now use just the no_addr member of this structure. The tests relating to no_formal_ino can then be done by the calling code. This has advantages in that we want to do different things in different code paths if the no_formal_ino doesn't match. In the NFS patch we want to return -ESTALE, but in the ->lookup() path, its a bug in the fs if the no_formal_ino doesn't match and thus we can withdraw in this case. In order to later fix bz #201012, we need to be able to look up an inode without knowing no_formal_ino, as the only information that is known to us is the on-disk location of the inode in question. This patch will also help us to fix bz #236099 at a later date by cleaning up a lot of the code in that area. There are no user visible changes as a result of this patch and there are no changes to the on-disk format either. Signed-off-by: Steven Whitehouse commit 43abbfe7597ad85c9096859fc85190bb73938fe6 Author: Steven Whitehouse Date: Mon May 14 17:43:26 2007 +0100 [GFS2] Reduce size of struct gdlm_lock This patch removes the completion (which is rather large) from struct gdlm_lock in favour of using the wait_on_bit() functions. We don't need to add any extra fields to the structure to do this, so we save 32 bytes (on x86_64) per structure. This adds up to quite a lot when we may potentially have millions of these lock structures, Signed-off-by: Steven Whitehouse Acked-by: David Teigland commit 5e14f93d60295827f24842e9e16a4de8640a1448 Author: Robert Peterson Date: Mon May 14 12:42:18 2007 -0500 [GFS2] Addendum patch 2 for gfs2_grow This addendum patch 2 corrects three things: 1. It fixes a stupid mistake in the previous addendum that broke gfs2. Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html 2. It fixes a problem that Dave Teigland pointed out regarding the external declarations in ops_address.h being in the wrong place. 3. It recasts a couple more %llu printks to (unsigned long long) as requested by Steve Whitehouse. I would have loved to put this all in one revised patch, but there was a rush to get some patches for RHEL5. Therefore, the previous patches were applied to the git tree "as is" and therefore, I'm posting another addendum. Sorry. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse commit 887cdd75bed2912d98a93692382989a2c51328db Author: Nate Diller Date: Thu May 10 22:41:28 2007 -0700 [GFS2] use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller Cc: Steven Whitehouse Signed-off-by: Andrew Morton commit 65e61032a254c60ca20e246271e7325ac7c79006 Author: Robert Peterson Date: Thu May 10 16:54:38 2007 -0500 [GFS2] Kernel changes to support new gfs2_grow command (part 2) To avoid code redundancy, I separated out the operational "guts" into a new function called read_rindex_entry. Then I made two functions: the closer-to-original gfs2_ri_update (without the special condition checks) and gfs2_ri_update_special that's designed with that condition in mind. (I don't like the name, but if you have a suggestion, I'm all ears). Oh, and there's an added benefit: we don't need all the ugly gotos anymore. ;) This patch has been tested with gfs2_fsck_hellfire (which runs for three and a half hours, btw). Signed-off-By: Bob Peterson Signed-off-by: Steven Whitehouse commit eca7e9dc0ec6f6553373c7fa1429df91b1254179 Author: Robert Peterson Date: Wed May 9 09:37:57 2007 -0500 [GFS2] kernel changes to support new gfs2_grow command This is another revision of my gfs2 kernel patch that allows gfs2_grow to function properly. Steve Whitehouse expressed some concerns about the previous patch and I restructured it based on his comments. The previous patch was doing the statfs_change at file close time, under its own transaction. The current patch does the statfs_change inside the gfs2_commit_write function, which keeps it under the umbrella of the inode transaction. I can't call ri_update to re-read the rindex file during the transaction because the transaction may have outstanding unwritten buffers attached to the rgrps that would be otherwise blown away. So instead, I created a new function, gfs2_ri_total, that will re-read the rindex file just to total the file system space for the sake of the statfs_change. The ri_update will happen later, when gfs2 realizes the version number has changed, as it happened before my patch. Since the statfs_change is happening at write_commit time and there may be multiple writes to the rindex file for one grow operation. So one consequence of this restructuring is that instead of getting one kernel message to indicate the change, you may see several. For example, before when you did a gfs2_grow, you'd get a single message like: GFS2: File system extended by 247876 blocks (968MB) Now you get something like: GFS2: File system extended by 207896 blocks (812MB) GFS2: File system extended by 39980 blocks (156MB) This version has also been successfully run against the hours-long "gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck while interjecting file system damage. It does this repeatedly under a variety Resource Group conditions. Signed-off-By: Bob Peterson Signed-off-by: Steven Whitehouse commit 3947c5571e980b56eb725baa54b8802f0f9e3ec0 Author: Satyam Sharma Date: Tue May 8 09:18:58 2007 +0100 [DLM] fix a couple of races Fix two races in fs/dlm/config.c: (1) Grab the configfs subsystem semaphore before calling config_group_find_obj() in get_space(). This solves a potential race between get_space() and concurrent mkdir(2) or rmdir(2). (2) Grab a reference on the found config_item _while_ holding the configfs subsystem semaphore in get_comm(), and not after it. This solves a potential race between get_comm() and concurrent rmdir(2). Signed-off-by: Satyam Sharma Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse commit e495b3fdbaf15c13977e8ca1d444794f3a217e57 Author: Benjamin Marzinski Date: Wed May 2 09:44:03 2007 -0500 [GFS2] flush the glock completely in inode_go_sync Fix for bz #231910 When filemap_fdatawrite() is called on the inode mapping in data=ordered mode, it will add the glock to the log. In inode_go_sync(), if you do the gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock and its associated data buffers will be on the log again. This means you can demote a lock from exclusive, without having it flushed from the log. The attached patch simply moves the gfs2_log_flush up to after the filemap_fdatawrite() call. Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but that caused me to trip the following assert. GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed GFS2: fsid=cypher-36:test.0: function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61 It appears that gfs2_log_flush() puts some of the glocks buffers in the busy state and the filemap_fdatawrite() call is necessary to flush them. This makes me worry slightly that a related problem could happen because of moving the gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that gfs2_ail_empty_gl() would catch that case as well. Signed-off-by: Benjamin E. Marzinski Signed-off-by: Steven Whitehouse fs/dlm/Makefile | 1 fs/dlm/config.c | 25 ++ fs/dlm/config.h | 1 fs/dlm/debug_fs.c | 173 +++++++++++++++ fs/dlm/dlm_internal.h | 16 + fs/dlm/lock.c | 468 +++++++++++++++++++++++++++++++--------- fs/dlm/lock.h | 13 + fs/dlm/lockspace.c | 80 +++++-- fs/dlm/main.c | 11 + fs/dlm/member.c | 11 + fs/dlm/netlink.c | 153 +++++++++++++ fs/dlm/rcom.c | 4 fs/dlm/recoverd.c | 4 fs/dlm/user.c | 129 ++++++++--- fs/gfs2/bmap.c | 8 - fs/gfs2/dir.c | 56 ++++- fs/gfs2/dir.h | 9 - fs/gfs2/glock.c | 7 - fs/gfs2/glops.c | 2 fs/gfs2/incore.h | 12 - fs/gfs2/inode.c | 80 +++---- fs/gfs2/inode.h | 18 +- fs/gfs2/locking/dlm/lock.c | 11 + fs/gfs2/locking/dlm/lock_dlm.h | 2 fs/gfs2/locking/dlm/thread.c | 11 + fs/gfs2/meta_io.h | 2 fs/gfs2/ondisk.c | 37 +-- fs/gfs2/ops_address.c | 35 +++ fs/gfs2/ops_address.h | 2 fs/gfs2/ops_dentry.c | 24 +- fs/gfs2/ops_export.c | 29 +- fs/gfs2/ops_file.c | 4 fs/gfs2/ops_fstype.c | 16 + fs/gfs2/ops_inode.c | 20 +- fs/gfs2/quota.c | 4 fs/gfs2/rgrp.c | 189 ++++++++++++---- fs/gfs2/rgrp.h | 1 fs/gfs2/super.c | 2 fs/gfs2/util.c | 4 include/linux/Kbuild | 1 include/linux/dlm.h | 13 + include/linux/dlm_device.h | 22 +- include/linux/dlm_netlink.h | 56 +++++ include/linux/gfs2_ondisk.h | 19 +- 44 files changed, 1373 insertions(+), 412 deletions(-) diff --git a/fs/dlm/Makefile b/fs/dlm/Makefile index 604cf7d..d248e60 100644 --- a/fs/dlm/Makefile +++ b/fs/dlm/Makefile @@ -8,6 +8,7 @@ dlm-y := ast.o \ member.o \ memory.o \ midcomms.o \ + netlink.o \ lowcomms.o \ rcom.o \ recover.o \ diff --git a/fs/dlm/config.c b/fs/dlm/config.c index 822abdc..5069b2c 100644 --- a/fs/dlm/config.c +++ b/fs/dlm/config.c @@ -90,6 +90,7 @@ struct cluster { unsigned int cl_scan_secs; unsigned int cl_log_debug; unsigned int cl_protocol; + unsigned int cl_timewarn_cs; }; enum { @@ -103,6 +104,7 @@ enum { CLUSTER_ATTR_SCAN_SECS, CLUSTER_ATTR_LOG_DEBUG, CLUSTER_ATTR_PROTOCOL, + CLUSTER_ATTR_TIMEWARN_CS, }; struct cluster_attribute { @@ -162,6 +164,7 @@ CLUSTER_ATTR(toss_secs, 1); CLUSTER_ATTR(scan_secs, 1); CLUSTER_ATTR(log_debug, 0); CLUSTER_ATTR(protocol, 0); +CLUSTER_ATTR(timewarn_cs, 1); static struct configfs_attribute *cluster_attrs[] = { [CLUSTER_ATTR_TCP_PORT] = &cluster_attr_tcp_port.attr, @@ -174,6 +177,7 @@ static struct configfs_attribute *cluste [CLUSTER_ATTR_SCAN_SECS] = &cluster_attr_scan_secs.attr, [CLUSTER_ATTR_LOG_DEBUG] = &cluster_attr_log_debug.attr, [CLUSTER_ATTR_PROTOCOL] = &cluster_attr_protocol.attr, + [CLUSTER_ATTR_TIMEWARN_CS] = &cluster_attr_timewarn_cs.attr, NULL, }; @@ -429,6 +433,8 @@ static struct config_group *make_cluster cl->cl_toss_secs = dlm_config.ci_toss_secs; cl->cl_scan_secs = dlm_config.ci_scan_secs; cl->cl_log_debug = dlm_config.ci_log_debug; + cl->cl_protocol = dlm_config.ci_protocol; + cl->cl_timewarn_cs = dlm_config.ci_timewarn_cs; space_list = &sps->ss_group; comm_list = &cms->cs_group; @@ -748,9 +754,16 @@ static ssize_t node_weight_write(struct static struct space *get_space(char *name) { + struct config_item *i; + if (!space_list) return NULL; - return to_space(config_group_find_obj(space_list, name)); + + down(&space_list->cg_subsys->su_sem); + i = config_group_find_obj(space_list, name); + up(&space_list->cg_subsys->su_sem); + + return to_space(i); } static void put_space(struct space *sp) @@ -776,20 +789,20 @@ static struct comm *get_comm(int nodeid, if (cm->nodeid != nodeid) continue; found = 1; + config_item_get(i); break; } else { if (!cm->addr_count || memcmp(cm->addr[0], addr, sizeof(*addr))) continue; found = 1; + config_item_get(i); break; } } up(&clusters_root.subsys.su_sem); - if (found) - config_item_get(i); - else + if (!found) cm = NULL; return cm; } @@ -909,6 +922,7 @@ #define DEFAULT_TOSS_SECS 10 #define DEFAULT_SCAN_SECS 5 #define DEFAULT_LOG_DEBUG 0 #define DEFAULT_PROTOCOL 0 +#define DEFAULT_TIMEWARN_CS 500 /* 5 sec = 500 centiseconds */ struct dlm_config_info dlm_config = { .ci_tcp_port = DEFAULT_TCP_PORT, @@ -920,6 +934,7 @@ struct dlm_config_info dlm_config = { .ci_toss_secs = DEFAULT_TOSS_SECS, .ci_scan_secs = DEFAULT_SCAN_SECS, .ci_log_debug = DEFAULT_LOG_DEBUG, - .ci_protocol = DEFAULT_PROTOCOL + .ci_protocol = DEFAULT_PROTOCOL, + .ci_timewarn_cs = DEFAULT_TIMEWARN_CS }; diff --git a/fs/dlm/config.h b/fs/dlm/config.h index 967cc3d..a3170fe 100644 --- a/fs/dlm/config.h +++ b/fs/dlm/config.h @@ -27,6 +27,7 @@ struct dlm_config_info { int ci_scan_secs; int ci_log_debug; int ci_protocol; + int ci_timewarn_cs; }; extern struct dlm_config_info dlm_config; diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c index 61ba670..184be98 100644 --- a/fs/dlm/debug_fs.c +++ b/fs/dlm/debug_fs.c @@ -17,6 +17,7 @@ #include #include #include "dlm_internal.h" +#include "lock.h" #define DLM_DEBUG_BUF_LEN 4096 static char debug_buf[DLM_DEBUG_BUF_LEN]; @@ -26,6 +27,8 @@ static struct dentry *dlm_root; struct rsb_iter { int entry; + int master; + int header; struct dlm_ls *ls; struct list_head *next; struct dlm_rsb *rsb; @@ -85,6 +88,8 @@ static int print_resource(struct dlm_rsb struct dlm_lkb *lkb; int i, lvblen = res->res_ls->ls_lvblen, recover_list, root_list; + lock_rsb(res); + seq_printf(s, "\nResource %p Name (len=%d) \"", res, res->res_length); for (i = 0; i < res->res_length; i++) { if (isprint(res->res_name[i])) @@ -151,6 +156,59 @@ static int print_resource(struct dlm_rsb seq_printf(s, "\n"); } out: + unlock_rsb(res); + return 0; +} + +static void print_master_lock(struct seq_file *s, struct dlm_lkb *lkb, + struct dlm_rsb *r) +{ + struct dlm_user_args *ua; + unsigned int waiting = 0; + uint64_t xid = 0; + + if (lkb->lkb_flags & DLM_IFL_USER) { + ua = (struct dlm_user_args *) lkb->lkb_astparam; + if (ua) + xid = ua->xid; + } + + if (lkb->lkb_timestamp) + waiting = jiffies_to_msecs(jiffies - lkb->lkb_timestamp); + + /* id nodeid remid pid xid flags sts grmode rqmode time_ms len name */ + + seq_printf(s, "%x %d %x %u %llu %x %d %d %d %u %d \"%s\"\n", + lkb->lkb_id, + lkb->lkb_nodeid, + lkb->lkb_remid, + lkb->lkb_ownpid, + (unsigned long long)xid, + lkb->lkb_exflags, + lkb->lkb_status, + lkb->lkb_grmode, + lkb->lkb_rqmode, + waiting, + r->res_length, + r->res_name); +} + +static int print_master_resource(struct dlm_rsb *r, struct seq_file *s) +{ + struct dlm_lkb *lkb; + + lock_rsb(r); + + list_for_each_entry(lkb, &r->res_grantqueue, lkb_statequeue) + print_master_lock(s, lkb, r); + + list_for_each_entry(lkb, &r->res_convertqueue, lkb_statequeue) + print_master_lock(s, lkb, r); + + list_for_each_entry(lkb, &r->res_waitqueue, lkb_statequeue) + print_master_lock(s, lkb, r); + + unlock_rsb(r); return 0; } @@ -166,6 +224,9 @@ static int rsb_iter_next(struct rsb_iter read_lock(&ls->ls_rsbtbl[i].lock); if (!list_empty(&ls->ls_rsbtbl[i].list)) { ri->next = ls->ls_rsbtbl[i].list.next; + ri->rsb = list_entry(ri->next, struct dlm_rsb, + res_hashchain); + dlm_hold_rsb(ri->rsb); read_unlock(&ls->ls_rsbtbl[i].lock); break; } @@ -176,6 +237,7 @@ static int rsb_iter_next(struct rsb_iter if (ri->entry >= ls->ls_rsbtbl_size) return 1; } else { + struct dlm_rsb *old = ri->rsb; i = ri->entry; read_lock(&ls->ls_rsbtbl[i].lock); ri->next = ri->next->next; @@ -184,11 +246,13 @@ static int rsb_iter_next(struct rsb_iter ri->next = NULL; ri->entry++; read_unlock(&ls->ls_rsbtbl[i].lock); + dlm_put_rsb(old); goto top; } + ri->rsb = list_entry(ri->next, struct dlm_rsb, res_hashchain); read_unlock(&ls->ls_rsbtbl[i].lock); + dlm_put_rsb(old); } - ri->rsb = list_entry(ri->next, struct dlm_rsb, res_hashchain); return 0; } @@ -202,7 +266,7 @@ static struct rsb_iter *rsb_iter_init(st { struct rsb_iter *ri; - ri = kmalloc(sizeof *ri, GFP_KERNEL); + ri = kzalloc(sizeof *ri, GFP_KERNEL); if (!ri) return NULL; @@ -260,7 +324,17 @@ static int rsb_seq_show(struct seq_file { struct rsb_iter *ri = iter_ptr; - print_resource(ri->rsb, file); + if (ri->master) { + if (ri->header) { + seq_printf(file, "id nodeid remid pid xid flags sts " + "grmode rqmode time_ms len name\n"); + ri->header = 0; + } + if (is_master(ri->rsb)) + print_master_resource(ri->rsb, file); + } else { + print_resource(ri->rsb, file); + } return 0; } @@ -296,6 +370,83 @@ static const struct file_operations rsb_ }; /* + * Dump master lock state + */ + +static struct rsb_iter *master_iter_init(struct dlm_ls *ls, loff_t *pos) +{ + struct rsb_iter *ri; + + ri = kzalloc(sizeof *ri, GFP_KERNEL); + if (!ri) + return NULL; + + ri->ls = ls; + ri->entry = 0; + ri->next = NULL; + ri->master = 1; + + if (*pos == 0) + ri->header = 1; + + if (rsb_iter_next(ri)) { + rsb_iter_free(ri); + return NULL; + } + + return ri; +} + +static void *master_seq_start(struct seq_file *file, loff_t *pos) +{ + struct rsb_iter *ri; + loff_t n = *pos; + + ri = master_iter_init(file->private, pos); + if (!ri) + return NULL; + + while (n--) { + if (rsb_iter_next(ri)) { + rsb_iter_free(ri); + return NULL; + } + } + + return ri; +} + +static struct seq_operations master_seq_ops = { + .start = master_seq_start, + .next = rsb_seq_next, + .stop = rsb_seq_stop, + .show = rsb_seq_show, +}; + +static int master_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int ret; + + ret = seq_open(file, &master_seq_ops); + if (ret) + return ret; + + seq = file->private_data; + seq->private = inode->i_private; + + return 0; +} + +static const struct file_operations master_fops = { + .owner = THIS_MODULE, + .open = master_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release +}; + +/* * dump lkb's on the ls_waiters list */ @@ -362,6 +513,20 @@ int dlm_create_debug_file(struct dlm_ls return -ENOMEM; } + memset(name, 0, sizeof(name)); + snprintf(name, DLM_LOCKSPACE_LEN+8, "%s_master", ls->ls_name); + + ls->ls_debug_master_dentry = debugfs_create_file(name, + S_IFREG | S_IRUGO, + dlm_root, + ls, + &master_fops); + if (!ls->ls_debug_master_dentry) { + debugfs_remove(ls->ls_debug_waiters_dentry); + debugfs_remove(ls->ls_debug_rsb_dentry); + return -ENOMEM; + } + return 0; } @@ -371,6 +536,8 @@ void dlm_delete_debug_file(struct dlm_ls debugfs_remove(ls->ls_debug_rsb_dentry); if (ls->ls_debug_waiters_dentry) debugfs_remove(ls->ls_debug_waiters_dentry); + if (ls->ls_debug_master_dentry) + debugfs_remove(ls->ls_debug_master_dentry); } int dlm_register_debugfs(void) diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index 30994d6..f2c8549 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -151,6 +151,7 @@ struct dlm_args { void *bastaddr; int mode; struct dlm_lksb *lksb; + unsigned long timeout; }; @@ -213,6 +214,9 @@ #define DLM_IFL_DEAD 0x00040000 #define DLM_IFL_OVERLAP_UNLOCK 0x00080000 #define DLM_IFL_OVERLAP_CANCEL 0x00100000 #define DLM_IFL_ENDOFLIFE 0x00200000 +#define DLM_IFL_WATCH_TIMEWARN 0x00400000 +#define DLM_IFL_TIMEOUT_CANCEL 0x00800000 +#define DLM_IFL_DEADLOCK_CANCEL 0x01000000 #define DLM_IFL_USER 0x00000001 #define DLM_IFL_ORPHAN 0x00000002 @@ -243,6 +247,9 @@ struct dlm_lkb { struct list_head lkb_wait_reply; /* waiting for remote reply */ struct list_head lkb_astqueue; /* need ast to be sent */ struct list_head lkb_ownqueue; /* list of locks for a process */ + struct list_head lkb_time_list; + unsigned long lkb_timestamp; + unsigned long lkb_timeout_cs; char *lkb_lvbptr; struct dlm_lksb *lkb_lksb; /* caller's status block */ @@ -447,6 +454,9 @@ struct dlm_ls { struct mutex ls_orphans_mutex; struct list_head ls_orphans; + struct mutex ls_timeout_mutex; + struct list_head ls_timeout; + struct list_head ls_nodes; /* current nodes in ls */ struct list_head ls_nodes_gone; /* dead node list, recovery */ int ls_num_nodes; /* number of nodes in ls */ @@ -460,9 +470,12 @@ struct dlm_ls { struct dentry *ls_debug_rsb_dentry; /* debugfs */ struct dentry *ls_debug_waiters_dentry; /* debugfs */ + struct dentry *ls_debug_master_dentry; /* debugfs */ wait_queue_head_t ls_uevent_wait; /* user part of join/leave */ int ls_uevent_result; + struct completion ls_members_done; + int ls_members_result; struct miscdevice ls_device; @@ -472,6 +485,7 @@ struct dlm_ls { struct task_struct *ls_recoverd_task; struct mutex ls_recoverd_active; spinlock_t ls_recover_lock; + unsigned long ls_recover_begin; /* jiffies timestamp */ uint32_t ls_recover_status; /* DLM_RS_ */ uint64_t ls_recover_seq; struct dlm_recover *ls_recover_args; @@ -501,6 +515,7 @@ #define LSFL_RECOVERY_STOP 2 #define LSFL_RCOM_READY 3 #define LSFL_RCOM_WAIT 4 #define LSFL_UEVENT_WAIT 5 +#define LSFL_TIMEWARN 6 /* much of this is just saving user space pointers associated with the lock that we pass back to the user lib with an ast */ @@ -518,6 +533,7 @@ struct dlm_user_args { void __user *castaddr; void __user *bastparam; void __user *bastaddr; + uint64_t xid; }; #define DLM_PROC_FLAGS_CLOSING 1 diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index d8d6e72..de943af 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -82,10 +82,13 @@ static int send_bast(struct dlm_rsb *r, static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb); static int send_remove(struct dlm_rsb *r); static int _request_lock(struct dlm_rsb *r, struct dlm_lkb *lkb); +static int _cancel_lock(struct dlm_rsb *r, struct dlm_lkb *lkb); static void __receive_convert_reply(struct dlm_rsb *r, struct dlm_lkb *lkb, struct dlm_message *ms); static int receive_extralen(struct dlm_message *ms); static void do_purge(struct dlm_ls *ls, int nodeid, int pid); +static void del_timeout(struct dlm_lkb *lkb); +void dlm_timeout_warn(struct dlm_lkb *lkb); /* * Lock compatibilty matrix - thanks Steve @@ -194,17 +197,17 @@ void dlm_dump_rsb(struct dlm_rsb *r) /* Threads cannot use the lockspace while it's being recovered */ -static inline void lock_recovery(struct dlm_ls *ls) +static inline void dlm_lock_recovery(struct dlm_ls *ls) { down_read(&ls->ls_in_recovery); } -static inline void unlock_recovery(struct dlm_ls *ls) +void dlm_unlock_recovery(struct dlm_ls *ls) { up_read(&ls->ls_in_recovery); } -static inline int lock_recovery_try(struct dlm_ls *ls) +int dlm_lock_recovery_try(struct dlm_ls *ls) { return down_read_trylock(&ls->ls_in_recovery); } @@ -286,8 +289,22 @@ static void queue_cast(struct dlm_rsb *r if (is_master_copy(lkb)) return; + del_timeout(lkb); + DLM_ASSERT(lkb->lkb_lksb, dlm_print_lkb(lkb);); + /* if the operation was a cancel, then return -DLM_ECANCEL, if a + timeout caused the cancel then return -ETIMEDOUT */ + if (rv == -DLM_ECANCEL && (lkb->lkb_flags & DLM_IFL_TIMEOUT_CANCEL)) { + lkb->lkb_flags &= ~DLM_IFL_TIMEOUT_CANCEL; + rv = -ETIMEDOUT; + } + + if (rv == -DLM_ECANCEL && (lkb->lkb_flags & DLM_IFL_DEADLOCK_CANCEL)) { + lkb->lkb_flags &= ~DLM_IFL_DEADLOCK_CANCEL; + rv = -EDEADLK; + } + lkb->lkb_lksb->sb_status = rv; lkb->lkb_lksb->sb_flags = lkb->lkb_sbflags; @@ -581,6 +598,7 @@ static int create_lkb(struct dlm_ls *ls, kref_init(&lkb->lkb_ref); INIT_LIST_HEAD(&lkb->lkb_ownqueue); INIT_LIST_HEAD(&lkb->lkb_rsb_lookup); + INIT_LIST_HEAD(&lkb->lkb_time_list); get_random_bytes(&bucket, sizeof(bucket)); bucket &= (ls->ls_lkbtbl_size - 1); @@ -985,15 +1003,136 @@ void dlm_scan_rsbs(struct dlm_ls *ls) { int i; - if (dlm_locking_stopped(ls)) - return; - for (i = 0; i < ls->ls_rsbtbl_size; i++) { shrink_bucket(ls, i); + if (dlm_locking_stopped(ls)) + break; cond_resched(); } } +static void add_timeout(struct dlm_lkb *lkb) +{ + struct dlm_ls *ls = lkb->lkb_resource->res_ls; + + if (is_master_copy(lkb)) { + lkb->lkb_timestamp = jiffies; + return; + } + + if (test_bit(LSFL_TIMEWARN, &ls->ls_flags) && + !(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) { + lkb->lkb_flags |= DLM_IFL_WATCH_TIMEWARN; + goto add_it; + } + if (lkb->lkb_exflags & DLM_LKF_TIMEOUT) + goto add_it; + return; + + add_it: + DLM_ASSERT(list_empty(&lkb->lkb_time_list), dlm_print_lkb(lkb);); + mutex_lock(&ls->ls_timeout_mutex); + hold_lkb(lkb); + lkb->lkb_timestamp = jiffies; + list_add_tail(&lkb->lkb_time_list, &ls->ls_timeout); + mutex_unlock(&ls->ls_timeout_mutex); +} + +static void del_timeout(struct dlm_lkb *lkb) +{ + struct dlm_ls *ls = lkb->lkb_resource->res_ls; + + mutex_lock(&ls->ls_timeout_mutex); + if (!list_empty(&lkb->lkb_time_list)) { + list_del_init(&lkb->lkb_time_list); + unhold_lkb(lkb); + } + mutex_unlock(&ls->ls_timeout_mutex); +} + +/* FIXME: is it safe to look at lkb_exflags, lkb_flags, lkb_timestamp, and + lkb_lksb_timeout without lock_rsb? Note: we can't lock timeout_mutex + and then lock rsb because of lock ordering in add_timeout. We may need + to specify some special timeout-related bits in the lkb that are just to + be accessed under the timeout_mutex. */ + +void dlm_scan_timeout(struct dlm_ls *ls) +{ + struct dlm_rsb *r; + struct dlm_lkb *lkb; + int do_cancel, do_warn; + + for (;;) { + if (dlm_locking_stopped(ls)) + break; + + do_cancel = 0; + do_warn = 0; + mutex_lock(&ls->ls_timeout_mutex); + list_for_each_entry(lkb, &ls->ls_timeout, lkb_time_list) { + + if ((lkb->lkb_exflags & DLM_LKF_TIMEOUT) && + time_after_eq(jiffies, lkb->lkb_timestamp + + lkb->lkb_timeout_cs * HZ/100)) + do_cancel = 1; + + if ((lkb->lkb_flags & DLM_IFL_WATCH_TIMEWARN) && + time_after_eq(jiffies, lkb->lkb_timestamp + + dlm_config.ci_timewarn_cs * HZ/100)) + do_warn = 1; + + if (!do_cancel && !do_warn) + continue; + hold_lkb(lkb); + break; + } + mutex_unlock(&ls->ls_timeout_mutex); + + if (!do_cancel && !do_warn) + break; + + r = lkb->lkb_resource; + hold_rsb(r); + lock_rsb(r); + + if (do_warn) { + /* clear flag so we only warn once */ + lkb->lkb_flags &= ~DLM_IFL_WATCH_TIMEWARN; + if (!(lkb->lkb_exflags & DLM_LKF_TIMEOUT)) + del_timeout(lkb); + dlm_timeout_warn(lkb); + } + + if (do_cancel) { + log_debug(ls, "timeout cancel %x node %d %s", + lkb->lkb_id, lkb->lkb_nodeid, r->res_name); + lkb->lkb_flags &= ~DLM_IFL_WATCH_TIMEWARN; + lkb->lkb_flags |= DLM_IFL_TIMEOUT_CANCEL; + del_timeout(lkb); + _cancel_lock(r, lkb); + } + + unlock_rsb(r); + unhold_rsb(r); + dlm_put_lkb(lkb); + } +} + +/* This is only called by dlm_recoverd, and we rely on dlm_ls_stop() stopping + dlm_recoverd before checking/setting ls_recover_begin. */ + +void dlm_adjust_timeouts(struct dlm_ls *ls) +{ + struct dlm_lkb *lkb; + long adj = jiffies - ls->ls_recover_begin; + + ls->ls_recover_begin = 0; + mutex_lock(&ls->ls_timeout_mutex); + list_for_each_entry(lkb, &ls->ls_timeout, lkb_time_list) + lkb->lkb_timestamp += adj; + mutex_unlock(&ls->ls_timeout_mutex); +} + /* lkb is master or local copy */ static void set_lvb_lock(struct dlm_rsb *r, struct dlm_lkb *lkb) @@ -1275,10 +1414,8 @@ static int queue_conflict(struct list_he * queue for one resource. The granted mode of each lock blocks the requested * mode of the other lock." * - * Part 2: if the granted mode of lkb is preventing the first lkb in the - * convert queue from being granted, then demote lkb (set grmode to NL). - * This second form requires that we check for conv-deadlk even when - * now == 0 in _can_be_granted(). + * Part 2: if the granted mode of lkb is preventing an earlier lkb in the + * convert queue from being granted, then deadlk/demote lkb. * * Example: * Granted Queue: empty @@ -1287,41 +1424,52 @@ static int queue_conflict(struct list_he * * The first lock can't be granted because of the granted mode of the second * lock and the second lock can't be granted because it's not first in the - * list. We demote the granted mode of the second lock (the lkb passed to this - * function). + * list. We either cancel lkb's conversion (PR->EX) and return EDEADLK, or we + * demote the granted mode of lkb (from PR to NL) if it has the CONVDEADLK + * flag set and return DEMOTED in the lksb flags. + * + * Originally, this function detected conv-deadlk in a more limited scope: + * - if !modes_compat(lkb1, lkb2) && !modes_compat(lkb2, lkb1), or + * - if lkb1 was the first entry in the queue (not just earlier), and was + * blocked by the granted mode of lkb2, and there was nothing on the + * granted queue preventing lkb1 from being granted immediately, i.e. + * lkb2 was the only thing preventing lkb1 from being granted. + * + * That second condition meant we'd only say there was conv-deadlk if + * resolving it (by demotion) would lead to the first lock on the convert + * queue being granted right away. It allowed conversion deadlocks to exist + * between locks on the convert queue while they couldn't be granted anyway. * - * After the resolution, the "grant pending" function needs to go back and try - * to grant locks on the convert queue again since the first lock can now be - * granted. + * Now, we detect and take action on conversion deadlocks immediately when + * they're created, even if they may not be immediately consequential. If + * lkb1 exists anywhere in the convert queue and lkb2 comes in with a granted + * mode that would prevent lkb1's conversion from being granted, we do a + * deadlk/demote on lkb2 right away and don't let it onto the convert queue. + * I think this means that the lkb_is_ahead condition below should always + * be zero, i.e. there will never be conv-deadlk between two locks that are + * both already on the convert queue. */ -static int conversion_deadlock_detect(struct dlm_rsb *rsb, struct dlm_lkb *lkb) +static int conversion_deadlock_detect(struct dlm_rsb *r, struct dlm_lkb *lkb2) { - struct dlm_lkb *this, *first = NULL, *self = NULL; + struct dlm_lkb *lkb1; + int lkb_is_ahead = 0; - list_for_each_entry(this, &rsb->res_convertqueue, lkb_statequeue) { - if (!first) - first = this; - if (this == lkb) { - self = lkb; + list_for_each_entry(lkb1, &r->res_convertqueue, lkb_statequeue) { + if (lkb1 == lkb2) { + lkb_is_ahead = 1; continue; } - if (!modes_compat(this, lkb) && !modes_compat(lkb, this)) - return 1; - } - - /* if lkb is on the convert queue and is preventing the first - from being granted, then there's deadlock and we demote lkb. - multiple converting locks may need to do this before the first - converting lock can be granted. */ - - if (self && self != first) { - if (!modes_compat(lkb, first) && - !queue_conflict(&rsb->res_grantqueue, first)) - return 1; + if (!lkb_is_ahead) { + if (!modes_compat(lkb2, lkb1)) + return 1; + } else { + if (!modes_compat(lkb2, lkb1) && + !modes_compat(lkb1, lkb2)) + return 1; + } } - return 0; } @@ -1450,42 +1598,57 @@ static int _can_be_granted(struct dlm_rs if (!now && !conv && list_empty(&r->res_convertqueue) && first_in_list(lkb, &r->res_waitqueue)) return 1; - out: - /* - * The following, enabled by CONVDEADLK, departs from VMS. - */ - - if (conv && (lkb->lkb_exflags & DLM_LKF_CONVDEADLK) && - conversion_deadlock_detect(r, lkb)) { - lkb->lkb_grmode = DLM_LOCK_NL; - lkb->lkb_sbflags |= DLM_SBF_DEMOTED; - } - return 0; } -/* - * The ALTPR and ALTCW flags aren't traditional lock manager flags, but are a - * simple way to provide a big optimization to applications that can use them. - */ - -static int can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now) +static int can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now, + int *err) { - uint32_t flags = lkb->lkb_exflags; int rv; int8_t alt = 0, rqmode = lkb->lkb_rqmode; + int8_t is_convert = (lkb->lkb_grmode != DLM_LOCK_IV); + + if (err) + *err = 0; rv = _can_be_granted(r, lkb, now); if (rv) goto out; - if (lkb->lkb_sbflags & DLM_SBF_DEMOTED) + /* + * The CONVDEADLK flag is non-standard and tells the dlm to resolve + * conversion deadlocks by demoting grmode to NL, otherwise the dlm + * cancels one of the locks. + */ + + if (is_convert && can_be_queued(lkb) && + conversion_deadlock_detect(r, lkb)) { + if (lkb->lkb_exflags & DLM_LKF_CONVDEADLK) { + lkb->lkb_grmode = DLM_LOCK_NL; + lkb->lkb_sbflags |= DLM_SBF_DEMOTED; + } else if (!(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) { + if (err) + *err = -EDEADLK; + else { + log_print("can_be_granted deadlock %x now %d", + lkb->lkb_id, now); + dlm_dump_rsb(r); + } + } goto out; + } - if (rqmode != DLM_LOCK_PR && flags & DLM_LKF_ALTPR) + /* + * The ALTPR and ALTCW flags are non-standard and tell the dlm to try + * to grant a request in a mode other than the normal rqmode. It's a + * simple way to provide a big optimization to applications that can + * use them. + */ + + if (rqmode != DLM_LOCK_PR && (lkb->lkb_exflags & DLM_LKF_ALTPR)) alt = DLM_LOCK_PR; - else if (rqmode != DLM_LOCK_CW && flags & DLM_LKF_ALTCW) + else if (rqmode != DLM_LOCK_CW && (lkb->lkb_exflags & DLM_LKF_ALTCW)) alt = DLM_LOCK_CW; if (alt) { @@ -1500,10 +1663,20 @@ static int can_be_granted(struct dlm_rsb return rv; } +/* FIXME: I don't think that can_be_granted() can/will demote or find deadlock + for locks pending on the convert list. Once verified (watch for these + log_prints), we should be able to just call _can_be_granted() and not + bother with the demote/deadlk cases here (and there's no easy way to deal + with a deadlk here, we'd have to generate something like grant_lock with + the deadlk error.) */ + +/* returns the highest requested mode of all blocked conversions */ + static int grant_pending_convert(struct dlm_rsb *r, int high) { struct dlm_lkb *lkb, *s; int hi, demoted, quit, grant_restart, demote_restart; + int deadlk; quit = 0; restart: @@ -1513,14 +1686,29 @@ static int grant_pending_convert(struct list_for_each_entry_safe(lkb, s, &r->res_convertqueue, lkb_statequeue) { demoted = is_demoted(lkb); - if (can_be_granted(r, lkb, 0)) { + deadlk = 0; + + if (can_be_granted(r, lkb, 0, &deadlk)) { grant_lock_pending(r, lkb); grant_restart = 1; - } else { - hi = max_t(int, lkb->lkb_rqmode, hi); - if (!demoted && is_demoted(lkb)) - demote_restart = 1; + continue; } + + if (!demoted && is_demoted(lkb)) { + log_print("WARN: pending demoted %x node %d %s", + lkb->lkb_id, lkb->lkb_nodeid, r->res_name); + demote_restart = 1; + continue; + } + + if (deadlk) { + log_print("WARN: pending deadlock %x node %d %s", + lkb->lkb_id, lkb->lkb_nodeid, r->res_name); + dlm_dump_rsb(r); + continue; + } + + hi = max_t(int, lkb->lkb_rqmode, hi); } if (grant_restart) @@ -1538,7 +1726,7 @@ static int grant_pending_wait(struct dlm struct dlm_lkb *lkb, *s; list_for_each_entry_safe(lkb, s, &r->res_waitqueue, lkb_statequeue) { - if (can_be_granted(r, lkb, 0)) + if (can_be_granted(r, lkb, 0, NULL)) grant_lock_pending(r, lkb); else high = max_t(int, lkb->lkb_rqmode, high); @@ -1733,7 +1921,7 @@ static void confirm_master(struct dlm_rs } static int set_lock_args(int mode, struct dlm_lksb *lksb, uint32_t flags, - int namelen, uint32_t parent_lkid, void *ast, + int namelen, unsigned long timeout_cs, void *ast, void *astarg, void *bast, struct dlm_args *args) { int rv = -EINVAL; @@ -1776,10 +1964,6 @@ static int set_lock_args(int mode, struc if (flags & DLM_LKF_VALBLK && !lksb->sb_lvbptr) goto out; - /* parent/child locks not yet supported */ - if (parent_lkid) - goto out; - if (flags & DLM_LKF_CONVERT && !lksb->sb_lkid) goto out; @@ -1791,6 +1975,7 @@ static int set_lock_args(int mode, struc args->astaddr = ast; args->astparam = (long) astarg; args->bastaddr = bast; + args->timeout = timeout_cs; args->mode = mode; args->lksb = lksb; rv = 0; @@ -1845,6 +2030,7 @@ static int validate_lock_args(struct dlm lkb->lkb_lksb = args->lksb; lkb->lkb_lvbptr = args->lksb->sb_lvbptr; lkb->lkb_ownpid = (int) current->pid; + lkb->lkb_timeout_cs = args->timeout; rv = 0; out: return rv; @@ -1903,6 +2089,9 @@ static int validate_unlock_args(struct d if (is_overlap(lkb)) goto out; + /* don't let scand try to do a cancel */ + del_timeout(lkb); + if (lkb->lkb_flags & DLM_IFL_RESEND) { lkb->lkb_flags |= DLM_IFL_OVERLAP_CANCEL; rv = -EBUSY; @@ -1934,6 +2123,9 @@ static int validate_unlock_args(struct d if (is_overlap_unlock(lkb)) goto out; + /* don't let scand try to do a cancel */ + del_timeout(lkb); + if (lkb->lkb_flags & DLM_IFL_RESEND) { lkb->lkb_flags |= DLM_IFL_OVERLAP_UNLOCK; rv = -EBUSY; @@ -1984,7 +2176,7 @@ static int do_request(struct dlm_rsb *r, { int error = 0; - if (can_be_granted(r, lkb, 1)) { + if (can_be_granted(r, lkb, 1, NULL)) { grant_lock(r, lkb); queue_cast(r, lkb, 0); goto out; @@ -1994,6 +2186,7 @@ static int do_request(struct dlm_rsb *r, error = -EINPROGRESS; add_lkb(r, lkb, DLM_LKSTS_WAITING); send_blocking_asts(r, lkb); + add_timeout(lkb); goto out; } @@ -2009,16 +2202,32 @@ static int do_request(struct dlm_rsb *r, static int do_convert(struct dlm_rsb *r, struct dlm_lkb *lkb) { int error = 0; + int deadlk = 0; /* changing an existing lock may allow others to be granted */ - if (can_be_granted(r, lkb, 1)) { + if (can_be_granted(r, lkb, 1, &deadlk)) { grant_lock(r, lkb); queue_cast(r, lkb, 0); grant_pending_locks(r); goto out; } + /* can_be_granted() detected that this lock would block in a conversion + deadlock, so we leave it on the granted queue and return EDEADLK in + the ast for the convert. */ + + if (deadlk) { + /* it's left on the granted queue */ + log_debug(r->res_ls, "deadlock %x node %d sts%d g%d r%d %s", + lkb->lkb_id, lkb->lkb_nodeid, lkb->lkb_status, + lkb->lkb_grmode, lkb->lkb_rqmode, r->res_name); + revert_lock(r, lkb); + queue_cast(r, lkb, -EDEADLK); + error = -EDEADLK; + goto out; + } + /* is_demoted() means the can_be_granted() above set the grmode to NL, and left us on the granted queue. This auto-demotion (due to CONVDEADLK) might mean other locks, and/or this lock, are @@ -2041,6 +2250,7 @@ static int do_convert(struct dlm_rsb *r, del_lkb(r, lkb); add_lkb(r, lkb, DLM_LKSTS_CONVERT); send_blocking_asts(r, lkb); + add_timeout(lkb); goto out; } @@ -2274,7 +2484,7 @@ int dlm_lock(dlm_lockspace_t *lockspace, if (!ls) return -EINVAL; - lock_recovery(ls); + dlm_lock_recovery(ls); if (convert) error = find_lkb(ls, lksb->sb_lkid, &lkb); @@ -2284,7 +2494,7 @@ int dlm_lock(dlm_lockspace_t *lockspace, if (error) goto out; - error = set_lock_args(mode, lksb, flags, namelen, parent_lkid, ast, + error = set_lock_args(mode, lksb, flags, namelen, 0, ast, astarg, bast, &args); if (error) goto out_put; @@ -2299,10 +2509,10 @@ int dlm_lock(dlm_lockspace_t *lockspace, out_put: if (convert || error) __put_lkb(ls, lkb); - if (error == -EAGAIN) + if (error == -EAGAIN || error == -EDEADLK) error = 0; out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); dlm_put_lockspace(ls); return error; } @@ -2322,7 +2532,7 @@ int dlm_unlock(dlm_lockspace_t *lockspac if (!ls) return -EINVAL; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -2344,7 +2554,7 @@ int dlm_unlock(dlm_lockspace_t *lockspac out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); dlm_put_lockspace(ls); return error; } @@ -3111,9 +3321,10 @@ static void receive_request_reply(struct lkb->lkb_remid = ms->m_lkid; if (is_altmode(lkb)) munge_altmode(lkb, ms); - if (result) + if (result) { add_lkb(r, lkb, DLM_LKSTS_WAITING); - else { + add_timeout(lkb); + } else { grant_lock_pc(r, lkb, ms); queue_cast(r, lkb, 0); } @@ -3172,6 +3383,12 @@ static void __receive_convert_reply(stru queue_cast(r, lkb, -EAGAIN); break; + case -EDEADLK: + receive_flags_reply(lkb, ms); + revert_lock_pc(r, lkb); + queue_cast(r, lkb, -EDEADLK); + break; + case -EINPROGRESS: /* convert was queued on remote master */ receive_flags_reply(lkb, ms); @@ -3179,6 +3396,7 @@ static void __receive_convert_reply(stru munge_demoted(lkb, ms); del_lkb(r, lkb); add_lkb(r, lkb, DLM_LKSTS_CONVERT); + add_timeout(lkb); break; case 0: @@ -3298,8 +3516,7 @@ static void _receive_cancel_reply(struct case -DLM_ECANCEL: receive_flags_reply(lkb, ms); revert_lock_pc(r, lkb); - if (ms->m_result) - queue_cast(r, lkb, -DLM_ECANCEL); + queue_cast(r, lkb, -DLM_ECANCEL); break; case 0: break; @@ -3424,7 +3641,7 @@ int dlm_receive_message(struct dlm_heade } } - if (lock_recovery_try(ls)) + if (dlm_lock_recovery_try(ls)) break; schedule(); } @@ -3503,7 +3720,7 @@ int dlm_receive_message(struct dlm_heade log_error(ls, "unknown message type %d", ms->m_type); } - unlock_recovery(ls); + dlm_unlock_recovery(ls); out: dlm_put_lockspace(ls); dlm_astd_wake(); @@ -4034,13 +4251,13 @@ int dlm_recover_process_copy(struct dlm_ int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, int mode, uint32_t flags, void *name, unsigned int namelen, - uint32_t parent_lkid) + unsigned long timeout_cs) { struct dlm_lkb *lkb; struct dlm_args args; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = create_lkb(ls, &lkb); if (error) { @@ -4062,7 +4279,7 @@ int dlm_user_request(struct dlm_ls *ls, When DLM_IFL_USER is set, the dlm knows that this is a userspace lock and that lkb_astparam is the dlm_user_args structure. */ - error = set_lock_args(mode, &ua->lksb, flags, namelen, parent_lkid, + error = set_lock_args(mode, &ua->lksb, flags, namelen, timeout_cs, DLM_FAKE_USER_AST, ua, DLM_FAKE_USER_AST, &args); lkb->lkb_flags |= DLM_IFL_USER; ua->old_mode = DLM_LOCK_IV; @@ -4094,19 +4311,20 @@ int dlm_user_request(struct dlm_ls *ls, list_add_tail(&lkb->lkb_ownqueue, &ua->proc->locks); spin_unlock(&ua->proc->locks_spin); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); return error; } int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, - int mode, uint32_t flags, uint32_t lkid, char *lvb_in) + int mode, uint32_t flags, uint32_t lkid, char *lvb_in, + unsigned long timeout_cs) { struct dlm_lkb *lkb; struct dlm_args args; struct dlm_user_args *ua; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -4127,6 +4345,7 @@ int dlm_user_convert(struct dlm_ls *ls, if (lvb_in && ua->lksb.sb_lvbptr) memcpy(ua->lksb.sb_lvbptr, lvb_in, DLM_USER_LVB_LEN); + ua->xid = ua_tmp->xid; ua->castparam = ua_tmp->castparam; ua->castaddr = ua_tmp->castaddr; ua->bastparam = ua_tmp->bastparam; @@ -4134,19 +4353,19 @@ int dlm_user_convert(struct dlm_ls *ls, ua->user_lksb = ua_tmp->user_lksb; ua->old_mode = lkb->lkb_grmode; - error = set_lock_args(mode, &ua->lksb, flags, 0, 0, DLM_FAKE_USER_AST, - ua, DLM_FAKE_USER_AST, &args); + error = set_lock_args(mode, &ua->lksb, flags, 0, timeout_cs, + DLM_FAKE_USER_AST, ua, DLM_FAKE_USER_AST, &args); if (error) goto out_put; error = convert_lock(ls, lkb, &args); - if (error == -EINPROGRESS || error == -EAGAIN) + if (error == -EINPROGRESS || error == -EAGAIN || error == -EDEADLK) error = 0; out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); kfree(ua_tmp); return error; } @@ -4159,7 +4378,7 @@ int dlm_user_unlock(struct dlm_ls *ls, s struct dlm_user_args *ua; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -4194,7 +4413,7 @@ int dlm_user_unlock(struct dlm_ls *ls, s out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); kfree(ua_tmp); return error; } @@ -4207,7 +4426,7 @@ int dlm_user_cancel(struct dlm_ls *ls, s struct dlm_user_args *ua; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -4231,11 +4450,59 @@ int dlm_user_cancel(struct dlm_ls *ls, s out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); kfree(ua_tmp); return error; } +int dlm_user_deadlock(struct dlm_ls *ls, uint32_t flags, uint32_t lkid) +{ + struct dlm_lkb *lkb; + struct dlm_args args; + struct dlm_user_args *ua; + struct dlm_rsb *r; + int error; + + dlm_lock_recovery(ls); + + error = find_lkb(ls, lkid, &lkb); + if (error) + goto out; + + ua = (struct dlm_user_args *)lkb->lkb_astparam; + + error = set_unlock_args(flags, ua, &args); + if (error) + goto out_put; + + /* same as cancel_lock(), but set DEADLOCK_CANCEL after lock_rsb */ + + r = lkb->lkb_resource; + hold_rsb(r); + lock_rsb(r); + + error = validate_unlock_args(lkb, &args); + if (error) + goto out_r; + lkb->lkb_flags |= DLM_IFL_DEADLOCK_CANCEL; + + error = _cancel_lock(r, lkb); + out_r: + unlock_rsb(r); + put_rsb(r); + + if (error == -DLM_ECANCEL) + error = 0; + /* from validate_unlock_args() */ + if (error == -EBUSY) + error = 0; + out_put: + dlm_put_lkb(lkb); + out: + dlm_unlock_recovery(ls); + return error; +} + /* lkb's that are removed from the waiters list by revert are just left on the orphans list with the granted orphan locks, to be freed by purge */ @@ -4314,12 +4581,13 @@ void dlm_clear_proc_locks(struct dlm_ls { struct dlm_lkb *lkb, *safe; - lock_recovery(ls); + dlm_lock_recovery(ls); while (1) { lkb = del_proc_lock(ls, proc); if (!lkb) break; + del_timeout(lkb); if (lkb->lkb_exflags & DLM_LKF_PERSISTENT) orphan_proc_lock(ls, lkb); else @@ -4347,7 +4615,7 @@ void dlm_clear_proc_locks(struct dlm_ls } mutex_unlock(&ls->ls_clear_proc_locks); - unlock_recovery(ls); + dlm_unlock_recovery(ls); } static void purge_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc) @@ -4429,12 +4697,12 @@ int dlm_user_purge(struct dlm_ls *ls, st if (nodeid != dlm_our_nodeid()) { error = send_purge(ls, nodeid, pid); } else { - lock_recovery(ls); + dlm_lock_recovery(ls); if (pid == current->pid) purge_proc_locks(ls, proc); else do_purge(ls, nodeid, pid); - unlock_recovery(ls); + dlm_unlock_recovery(ls); } return error; } diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h index 64fc4ec..1720313 100644 --- a/fs/dlm/lock.h +++ b/fs/dlm/lock.h @@ -1,7 +1,7 @@ /****************************************************************************** ******************************************************************************* ** -** Copyright (C) 2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2005-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -24,6 +24,10 @@ void dlm_put_rsb(struct dlm_rsb *r); void dlm_hold_rsb(struct dlm_rsb *r); int dlm_put_lkb(struct dlm_lkb *lkb); void dlm_scan_rsbs(struct dlm_ls *ls); +int dlm_lock_recovery_try(struct dlm_ls *ls); +void dlm_unlock_recovery(struct dlm_ls *ls); +void dlm_scan_timeout(struct dlm_ls *ls); +void dlm_adjust_timeouts(struct dlm_ls *ls); int dlm_purge_locks(struct dlm_ls *ls); void dlm_purge_mstcpy_locks(struct dlm_rsb *r); @@ -34,15 +38,18 @@ int dlm_recover_master_copy(struct dlm_l int dlm_recover_process_copy(struct dlm_ls *ls, struct dlm_rcom *rc); int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, int mode, - uint32_t flags, void *name, unsigned int namelen, uint32_t parent_lkid); + uint32_t flags, void *name, unsigned int namelen, + unsigned long timeout_cs); int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, - int mode, uint32_t flags, uint32_t lkid, char *lvb_in); + int mode, uint32_t flags, uint32_t lkid, char *lvb_in, + unsigned long timeout_cs); int dlm_user_unlock(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, uint32_t flags, uint32_t lkid, char *lvb_in); int dlm_user_cancel(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, uint32_t flags, uint32_t lkid); int dlm_user_purge(struct dlm_ls *ls, struct dlm_user_proc *proc, int nodeid, int pid); +int dlm_user_deadlock(struct dlm_ls *ls, uint32_t flags, uint32_t lkid); void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc); static inline int is_master(struct dlm_rsb *r) diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index a677b2a..c8f0c15 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -197,13 +197,24 @@ static int do_uevent(struct dlm_ls *ls, else kobject_uevent(&ls->ls_kobj, KOBJ_OFFLINE); + log_debug(ls, "%s the lockspace group...", in ? "joining" : "leaving"); + + /* dlm_controld will see the uevent, do the necessary group management + and then write to sysfs to wake us */ + error = wait_event_interruptible(ls->ls_uevent_wait, test_and_clear_bit(LSFL_UEVENT_WAIT, &ls->ls_flags)); + + log_debug(ls, "group event done %d %d", error, ls->ls_uevent_result); + if (error) goto out; error = ls->ls_uevent_result; out: + if (error) + log_error(ls, "group %s failed %d %d", in ? "join" : "leave", + error, ls->ls_uevent_result); return error; } @@ -234,8 +245,13 @@ static int dlm_scand(void *data) struct dlm_ls *ls; while (!kthread_should_stop()) { - list_for_each_entry(ls, &lslist, ls_list) - dlm_scan_rsbs(ls); + list_for_each_entry(ls, &lslist, ls_list) { + if (dlm_lock_recovery_try(ls)) { + dlm_scan_rsbs(ls); + dlm_scan_timeout(ls); + dlm_unlock_recovery(ls); + } + } schedule_timeout_interruptible(dlm_config.ci_scan_secs * HZ); } return 0; @@ -395,6 +411,7 @@ static int new_lockspace(char *name, int { struct dlm_ls *ls; int i, size, error = -ENOMEM; + int do_unreg = 0; if (namelen > DLM_LOCKSPACE_LEN) return -EINVAL; @@ -417,11 +434,16 @@ static int new_lockspace(char *name, int goto out; memcpy(ls->ls_name, name, namelen); ls->ls_namelen = namelen; - ls->ls_exflags = flags; ls->ls_lvblen = lvblen; ls->ls_count = 0; ls->ls_flags = 0; + /* ls_exflags are forced to match among nodes, and we don't + need to require all nodes to have TIMEWARN active */ + if (flags & DLM_LSFL_TIMEWARN) + set_bit(LSFL_TIMEWARN, &ls->ls_flags); + ls->ls_exflags = (flags & ~DLM_LSFL_TIMEWARN); + size = dlm_config.ci_rsbtbl_size; ls->ls_rsbtbl_size = size; @@ -461,6 +483,8 @@ static int new_lockspace(char *name, int mutex_init(&ls->ls_waiters_mutex); INIT_LIST_HEAD(&ls->ls_orphans); mutex_init(&ls->ls_orphans_mutex); + INIT_LIST_HEAD(&ls->ls_timeout); + mutex_init(&ls->ls_timeout_mutex); INIT_LIST_HEAD(&ls->ls_nodes); INIT_LIST_HEAD(&ls->ls_nodes_gone); @@ -477,6 +501,8 @@ static int new_lockspace(char *name, int init_waitqueue_head(&ls->ls_uevent_wait); ls->ls_uevent_result = 0; + init_completion(&ls->ls_members_done); + ls->ls_members_result = -1; ls->ls_recoverd_task = NULL; mutex_init(&ls->ls_recoverd_active); @@ -513,32 +539,49 @@ static int new_lockspace(char *name, int error = dlm_recoverd_start(ls); if (error) { log_error(ls, "can't start dlm_recoverd %d", error); - goto out_rcomfree; + goto out_delist; } - dlm_create_debug_file(ls); - error = kobject_setup(ls); if (error) - goto out_del; + goto out_stop; error = kobject_register(&ls->ls_kobj); if (error) - goto out_del; + goto out_stop; + + /* let kobject handle freeing of ls if there's an error */ + do_unreg = 1; + + /* This uevent triggers dlm_controld in userspace to add us to the + group of nodes that are members of this lockspace (managed by the + cluster infrastructure.) Once it's done that, it tells us who the + current lockspace members are (via configfs) and then tells the + lockspace to start running (via sysfs) in dlm_ls_start(). */ error = do_uevent(ls, 1); if (error) - goto out_unreg; + goto out_stop; + + wait_for_completion(&ls->ls_members_done); + error = ls->ls_members_result; + if (error) + goto out_members; + + dlm_create_debug_file(ls); + + log_debug(ls, "join complete"); *lockspace = ls; return 0; - out_unreg: - kobject_unregister(&ls->ls_kobj); - out_del: - dlm_delete_debug_file(ls); + out_members: + do_uevent(ls, 0); + dlm_clear_members(ls); + kfree(ls->ls_node_array); + out_stop: dlm_recoverd_stop(ls); - out_rcomfree: + out_delist: spin_lock(&lslist_lock); list_del(&ls->ls_list); spin_unlock(&lslist_lock); @@ -550,7 +593,10 @@ static int new_lockspace(char *name, int out_rsbfree: kfree(ls->ls_rsbtbl); out_lsfree: - kfree(ls); + if (do_unreg) + kobject_unregister(&ls->ls_kobj); + else + kfree(ls); out: module_put(THIS_MODULE); return error; @@ -570,6 +616,8 @@ int dlm_new_lockspace(char *name, int na error = new_lockspace(name, namelen, lockspace, flags, lvblen); if (!error) ls_count++; + else if (!ls_count) + threads_stop(); out: mutex_unlock(&ls_lock); return error; @@ -696,7 +744,7 @@ static int release_lockspace(struct dlm_ dlm_clear_members_gone(ls); kfree(ls->ls_node_array); kobject_unregister(&ls->ls_kobj); - /* The ls structure will be freed when the kobject is done with */ + /* The ls structure will be freed when the kobject is done with */ mutex_lock(&ls_lock); ls_count--; diff --git a/fs/dlm/main.c b/fs/dlm/main.c index 162fbae..eca2907 100644 --- a/fs/dlm/main.c +++ b/fs/dlm/main.c @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -25,6 +25,8 @@ #else static inline int dlm_register_debugfs(void) { return 0; } static inline void dlm_unregister_debugfs(void) { } #endif +int dlm_netlink_init(void); +void dlm_netlink_exit(void); static int __init init_dlm(void) { @@ -50,10 +52,16 @@ static int __init init_dlm(void) if (error) goto out_debug; + error = dlm_netlink_init(); + if (error) + goto out_user; + printk("DLM (built %s %s) installed\n", __DATE__, __TIME__); return 0; + out_user: + dlm_user_exit(); out_debug: dlm_unregister_debugfs(); out_config: @@ -68,6 +76,7 @@ static int __init init_dlm(void) static void __exit exit_dlm(void) { + dlm_netlink_exit(); dlm_user_exit(); dlm_config_exit(); dlm_memory_exit(); diff --git a/fs/dlm/member.c b/fs/dlm/member.c index 85e2897..073599d 100644 --- a/fs/dlm/member.c +++ b/fs/dlm/member.c @@ -1,7 +1,7 @@ /****************************************************************************** ******************************************************************************* ** -** Copyright (C) 2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2005-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -233,6 +233,12 @@ int dlm_recover_members(struct dlm_ls *l *neg_out = neg; error = ping_members(ls); + if (!error || error == -EPROTO) { + /* new_lockspace() may be waiting to know if the config + is good or bad */ + ls->ls_members_result = error; + complete(&ls->ls_members_done); + } if (error) goto out; @@ -284,6 +290,9 @@ int dlm_ls_stop(struct dlm_ls *ls) dlm_recoverd_suspend(ls); ls->ls_recover_status = 0; dlm_recoverd_resume(ls); + + if (!ls->ls_recover_begin) + ls->ls_recover_begin = jiffies; return 0; } diff --git a/fs/dlm/netlink.c b/fs/dlm/netlink.c new file mode 100644 index 0000000..863b87d --- /dev/null +++ b/fs/dlm/netlink.c @@ -0,0 +1,153 @@ +/* + * Copyright (C) 2007 Red Hat, Inc. All rights reserved. + * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + */ + +#include +#include +#include + +#include "dlm_internal.h" + +static uint32_t dlm_nl_seqnum; +static uint32_t listener_nlpid; + +static struct genl_family family = { + .id = GENL_ID_GENERATE, + .name = DLM_GENL_NAME, + .version = DLM_GENL_VERSION, +}; + +static int prepare_data(u8 cmd, struct sk_buff **skbp, size_t size) +{ + struct sk_buff *skb; + void *data; + + skb = genlmsg_new(size, GFP_KERNEL); + if (!skb) + return -ENOMEM; + + /* add the message headers */ + data = genlmsg_put(skb, 0, dlm_nl_seqnum++, &family, 0, cmd); + if (!data) { + nlmsg_free(skb); + return -EINVAL; + } + + *skbp = skb; + return 0; +} + +static struct dlm_lock_data *mk_data(struct sk_buff *skb) +{ + struct nlattr *ret; + + ret = nla_reserve(skb, DLM_TYPE_LOCK, sizeof(struct dlm_lock_data)); + if (!ret) + return NULL; + return nla_data(ret); +} + +static int send_data(struct sk_buff *skb) +{ + struct genlmsghdr *genlhdr = nlmsg_data((struct nlmsghdr *)skb->data); + void *data = genlmsg_data(genlhdr); + int rv; + + rv = genlmsg_end(skb, data); + if (rv < 0) { + nlmsg_free(skb); + return rv; + } + + return genlmsg_unicast(skb, listener_nlpid); +} + +static int user_cmd(struct sk_buff *skb, struct genl_info *info) +{ + listener_nlpid = info->snd_pid; + printk("user_cmd nlpid %u\n", listener_nlpid); + return 0; +} + +static struct genl_ops dlm_nl_ops = { + .cmd = DLM_CMD_HELLO, + .doit = user_cmd, +}; + +int dlm_netlink_init(void) +{ + int rv; + + rv = genl_register_family(&family); + if (rv) + return rv; + + rv = genl_register_ops(&family, &dlm_nl_ops); + if (rv < 0) + goto err; + return 0; + err: + genl_unregister_family(&family); + return rv; +} + +void dlm_netlink_exit(void) +{ + genl_unregister_ops(&family, &dlm_nl_ops); + genl_unregister_family(&family); +} + +static void fill_data(struct dlm_lock_data *data, struct dlm_lkb *lkb) +{ + struct dlm_rsb *r = lkb->lkb_resource; + struct dlm_user_args *ua = (struct dlm_user_args *) lkb->lkb_astparam; + + memset(data, 0, sizeof(struct dlm_lock_data)); + + data->version = DLM_LOCK_DATA_VERSION; + data->nodeid = lkb->lkb_nodeid; + data->ownpid = lkb->lkb_ownpid; + data->id = lkb->lkb_id; + data->remid = lkb->lkb_remid; + data->status = lkb->lkb_status; + data->grmode = lkb->lkb_grmode; + data->rqmode = lkb->lkb_rqmode; + data->timestamp = lkb->lkb_timestamp; + if (ua) + data->xid = ua->xid; + if (r) { + data->lockspace_id = r->res_ls->ls_global_id; + data->resource_namelen = r->res_length; + memcpy(data->resource_name, r->res_name, r->res_length); + } +} + +void dlm_timeout_warn(struct dlm_lkb *lkb) +{ + struct dlm_lock_data *data; + struct sk_buff *send_skb; + size_t size; + int rv; + + size = nla_total_size(sizeof(struct dlm_lock_data)) + + nla_total_size(0); /* why this? */ + + rv = prepare_data(DLM_CMD_TIMEOUT, &send_skb, size); + if (rv < 0) + return; + + data = mk_data(send_skb); + if (!data) { + nlmsg_free(send_skb); + return; + } + + fill_data(data, lkb); + + send_data(send_skb); +} + diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c index 6bfbd61..f71c235 100644 --- a/fs/dlm/rcom.c +++ b/fs/dlm/rcom.c @@ -90,7 +90,7 @@ static int check_config(struct dlm_ls *l log_error(ls, "version mismatch: %x nodeid %d: %x", DLM_HEADER_MAJOR | DLM_HEADER_MINOR, nodeid, rc->rc_header.h_version); - return -EINVAL; + return -EPROTO; } if (rf->rf_lvblen != ls->ls_lvblen || @@ -98,7 +98,7 @@ static int check_config(struct dlm_ls *l log_error(ls, "config mismatch: %d,%x nodeid %d: %d,%x", ls->ls_lvblen, ls->ls_exflags, nodeid, rf->rf_lvblen, rf->rf_lsflags); - return -EINVAL; + return -EPROTO; } return 0; } diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c index 3cb636d..6657599 100644 --- a/fs/dlm/recoverd.c +++ b/fs/dlm/recoverd.c @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -190,6 +190,8 @@ static int ls_recover(struct dlm_ls *ls, dlm_clear_members_gone(ls); + dlm_adjust_timeouts(ls); + error = enable_locking(ls, rv->seq); if (error) { log_debug(ls, "enable_locking failed %d", error); diff --git a/fs/dlm/user.c b/fs/dlm/user.c index b0201ec..6438941 100644 --- a/fs/dlm/user.c +++ b/fs/dlm/user.c @@ -33,16 +33,17 @@ #ifdef CONFIG_COMPAT struct dlm_lock_params32 { __u8 mode; __u8 namelen; - __u16 flags; + __u16 unused; + __u32 flags; __u32 lkid; __u32 parent; - + __u64 xid; + __u64 timeout; __u32 castparam; __u32 castaddr; __u32 bastparam; __u32 bastaddr; __u32 lksb; - char lvb[DLM_USER_LVB_LEN]; char name[0]; }; @@ -68,6 +69,7 @@ struct dlm_lksb32 { }; struct dlm_lock_result32 { + __u32 version[3]; __u32 length; __u32 user_astaddr; __u32 user_astparam; @@ -102,6 +104,8 @@ static void compat_input(struct dlm_writ kb->i.lock.flags = kb32->i.lock.flags; kb->i.lock.lkid = kb32->i.lock.lkid; kb->i.lock.parent = kb32->i.lock.parent; + kb->i.lock.xid = kb32->i.lock.xid; + kb->i.lock.timeout = kb32->i.lock.timeout; kb->i.lock.castparam = (void *)(long)kb32->i.lock.castparam; kb->i.lock.castaddr = (void *)(long)kb32->i.lock.castaddr; kb->i.lock.bastparam = (void *)(long)kb32->i.lock.bastparam; @@ -115,6 +119,10 @@ static void compat_input(struct dlm_writ static void compat_output(struct dlm_lock_result *res, struct dlm_lock_result32 *res32) { + res32->version[0] = res->version[0]; + res32->version[1] = res->version[1]; + res32->version[2] = res->version[2]; + res32->user_astaddr = (__u32)(long)res->user_astaddr; res32->user_astparam = (__u32)(long)res->user_astparam; res32->user_lksb = (__u32)(long)res->user_lksb; @@ -130,6 +138,36 @@ static void compat_output(struct dlm_loc } #endif +/* Figure out if this lock is at the end of its life and no longer + available for the application to use. The lkb still exists until + the final ast is read. A lock becomes EOL in three situations: + 1. a noqueue request fails with EAGAIN + 2. an unlock completes with EUNLOCK + 3. a cancel of a waiting request completes with ECANCEL/EDEADLK + An EOL lock needs to be removed from the process's list of locks. + And we can't allow any new operation on an EOL lock. This is + not related to the lifetime of the lkb struct which is managed + entirely by refcount. */ + +static int lkb_is_endoflife(struct dlm_lkb *lkb, int sb_status, int type) +{ + switch (sb_status) { + case -DLM_EUNLOCK: + return 1; + case -DLM_ECANCEL: + case -ETIMEDOUT: + case -EDEADLK: + if (lkb->lkb_grmode == DLM_LOCK_IV) + return 1; + break; + case -EAGAIN: + if (type == AST_COMP && lkb->lkb_grmode == DLM_LOCK_IV) + return 1; + break; + } + return 0; +} + /* we could possibly check if the cancel of an orphan has resulted in the lkb being removed and then remove that lkb from the orphans list and free it */ @@ -176,25 +214,7 @@ void dlm_user_add_ast(struct dlm_lkb *lk log_debug(ls, "ast overlap %x status %x %x", lkb->lkb_id, ua->lksb.sb_status, lkb->lkb_flags); - /* Figure out if this lock is at the end of its life and no longer - available for the application to use. The lkb still exists until - the final ast is read. A lock becomes EOL in three situations: - 1. a noqueue request fails with EAGAIN - 2. an unlock completes with EUNLOCK - 3. a cancel of a waiting request completes with ECANCEL - An EOL lock needs to be removed from the process's list of locks. - And we can't allow any new operation on an EOL lock. This is - not related to the lifetime of the lkb struct which is managed - entirely by refcount. */ - - if (type == AST_COMP && - lkb->lkb_grmode == DLM_LOCK_IV && - ua->lksb.sb_status == -EAGAIN) - eol = 1; - else if (ua->lksb.sb_status == -DLM_EUNLOCK || - (ua->lksb.sb_status == -DLM_ECANCEL && - lkb->lkb_grmode == DLM_LOCK_IV)) - eol = 1; + eol = lkb_is_endoflife(lkb, ua->lksb.sb_status, type); if (eol) { lkb->lkb_ast_type &= ~AST_BAST; lkb->lkb_flags |= DLM_IFL_ENDOFLIFE; @@ -252,16 +272,18 @@ static int device_user_lock(struct dlm_u ua->castaddr = params->castaddr; ua->bastparam = params->bastparam; ua->bastaddr = params->bastaddr; + ua->xid = params->xid; if (params->flags & DLM_LKF_CONVERT) error = dlm_user_convert(ls, ua, params->mode, params->flags, - params->lkid, params->lvb); + params->lkid, params->lvb, + (unsigned long) params->timeout); else { error = dlm_user_request(ls, ua, params->mode, params->flags, params->name, params->namelen, - params->parent); + (unsigned long) params->timeout); if (!error) error = ua->lksb.sb_lkid; } @@ -299,6 +321,22 @@ static int device_user_unlock(struct dlm return error; } +static int device_user_deadlock(struct dlm_user_proc *proc, + struct dlm_lock_params *params) +{ + struct dlm_ls *ls; + int error; + + ls = dlm_find_lockspace_local(proc->lockspace); + if (!ls) + return -ENOENT; + + error = dlm_user_deadlock(ls, params->flags, params->lkid); + + dlm_put_lockspace(ls); + return error; +} + static int create_misc_device(struct dlm_ls *ls, char *name) { int error, len; @@ -348,7 +386,7 @@ static int device_create_lockspace(struc return -EPERM; error = dlm_new_lockspace(params->name, strlen(params->name), - &lockspace, 0, DLM_USER_LVB_LEN); + &lockspace, params->flags, DLM_USER_LVB_LEN); if (error) return error; @@ -524,6 +562,14 @@ #endif error = device_user_unlock(proc, &kbuf->i.lock); break; + case DLM_USER_DEADLOCK: + if (!proc) { + log_print("no locking on control device"); + goto out_sig; + } + error = device_user_deadlock(proc, &kbuf->i.lock); + break; + case DLM_USER_CREATE_LOCKSPACE: if (proc) { log_print("create/remove only on control device"); @@ -641,6 +687,9 @@ #endif int struct_len; memset(&result, 0, sizeof(struct dlm_lock_result)); + result.version[0] = DLM_DEVICE_VERSION_MAJOR; + result.version[1] = DLM_DEVICE_VERSION_MINOR; + result.version[2] = DLM_DEVICE_VERSION_PATCH; memcpy(&result.lksb, &ua->lksb, sizeof(struct dlm_lksb)); result.user_lksb = ua->user_lksb; @@ -699,6 +748,20 @@ #endif return error; } +static int copy_version_to_user(char __user *buf, size_t count) +{ + struct dlm_device_version ver; + + memset(&ver, 0, sizeof(struct dlm_device_version)); + ver.version[0] = DLM_DEVICE_VERSION_MAJOR; + ver.version[1] = DLM_DEVICE_VERSION_MINOR; + ver.version[2] = DLM_DEVICE_VERSION_PATCH; + + if (copy_to_user(buf, &ver, sizeof(struct dlm_device_version))) + return -EFAULT; + return sizeof(struct dlm_device_version); +} + /* a read returns a single ast described in a struct dlm_lock_result */ static ssize_t device_read(struct file *file, char __user *buf, size_t count, @@ -710,6 +773,16 @@ static ssize_t device_read(struct file * DECLARE_WAITQUEUE(wait, current); int error, type=0, bmode=0, removed = 0; + if (count == sizeof(struct dlm_device_version)) { + error = copy_version_to_user(buf, count); + return error; + } + + if (!proc) { + log_print("non-version read from control device %zu", count); + return -EINVAL; + } + #ifdef CONFIG_COMPAT if (count < sizeof(struct dlm_lock_result32)) #else @@ -747,11 +820,6 @@ #endif } } - if (list_empty(&proc->asts)) { - spin_unlock(&proc->asts_spin); - return -EAGAIN; - } - /* there may be both completion and blocking asts to return for the lkb, don't remove lkb from asts list unless no asts remain */ @@ -823,6 +891,7 @@ static const struct file_operations devi static const struct file_operations ctl_device_fops = { .open = ctl_device_open, .release = ctl_device_close, + .read = device_read, .write = device_write, .owner = THIS_MODULE, }; diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index c53a5d2..e76a887 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -885,7 +885,6 @@ static int gfs2_block_truncate_page(stru unsigned blocksize, iblock, length, pos; struct buffer_head *bh; struct page *page; - void *kaddr; int err; page = grab_cache_page(mapping, index); @@ -933,10 +932,7 @@ static int gfs2_block_truncate_page(stru if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip)) gfs2_trans_add_bh(ip->i_gl, bh, 0); - kaddr = kmap_atomic(page, KM_USER0); - memset(kaddr + offset, 0, length); - flush_dcache_page(page); - kunmap_atomic(kaddr, KM_USER0); + zero_user_page(page, offset, length, KM_USER0); unlock: unlock_page(page); @@ -1044,7 +1040,7 @@ static int trunc_end(struct gfs2_inode * ip->i_di.di_height = 0; ip->i_di.di_goal_meta = ip->i_di.di_goal_data = - ip->i_num.no_addr; + ip->i_no_addr; gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode)); } ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index a96fa07..9cdd71c 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -1456,7 +1456,7 @@ int gfs2_dir_read(struct inode *inode, u if (dip->i_di.di_entries != g.offset) { fs_warn(sdp, "Number of entries corrupt in dir %llu, " "ip->i_di.di_entries (%u) != g.offset (%u)\n", - (unsigned long long)dip->i_num.no_addr, + (unsigned long long)dip->i_no_addr, dip->i_di.di_entries, g.offset); error = -EIO; @@ -1488,24 +1488,54 @@ out: * Returns: errno */ -int gfs2_dir_search(struct inode *dir, const struct qstr *name, - struct gfs2_inum_host *inum, unsigned int *type) +struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name) { struct buffer_head *bh; struct gfs2_dirent *dent; + struct inode *inode; + + dent = gfs2_dirent_search(dir, name, gfs2_dirent_find, &bh); + if (dent) { + if (IS_ERR(dent)) + return ERR_PTR(PTR_ERR(dent)); + inode = gfs2_inode_lookup(dir->i_sb, + be64_to_cpu(dent->de_inum.no_addr), + be16_to_cpu(dent->de_type)); + brelse(bh); + return inode; + } + return ERR_PTR(-ENOENT); +} + +int gfs2_dir_check(struct inode *dir, const struct qstr *name, + const struct gfs2_inode *ip) +{ + struct buffer_head *bh; + struct gfs2_dirent *dent; + int ret = -ENOENT; dent = gfs2_dirent_search(dir, name, gfs2_dirent_find, &bh); if (dent) { if (IS_ERR(dent)) return PTR_ERR(dent); - if (inum) - gfs2_inum_in(inum, (char *)&dent->de_inum); - if (type) - *type = be16_to_cpu(dent->de_type); + if (ip) { + if (be64_to_cpu(dent->de_inum.no_addr) != ip->i_no_addr) + goto out; + if (be64_to_cpu(dent->de_inum.no_formal_ino) != + ip->i_no_formal_ino) + goto out; + if (unlikely(IF2DT(ip->i_inode.i_mode) != + be16_to_cpu(dent->de_type))) { + gfs2_consist_inode(GFS2_I(dir)); + ret = -EIO; + goto out; + } + } + ret = 0; +out: brelse(bh); - return 0; } - return -ENOENT; + return ret; } static int dir_new_leaf(struct inode *inode, const struct qstr *name) @@ -1565,7 +1595,7 @@ static int dir_new_leaf(struct inode *in */ int gfs2_dir_add(struct inode *inode, const struct qstr *name, - const struct gfs2_inum_host *inum, unsigned type) + const struct gfs2_inode *nip, unsigned type) { struct gfs2_inode *ip = GFS2_I(inode); struct buffer_head *bh; @@ -1580,7 +1610,7 @@ int gfs2_dir_add(struct inode *inode, co if (IS_ERR(dent)) return PTR_ERR(dent); dent = gfs2_init_dirent(inode, dent, name, bh); - gfs2_inum_out(inum, (char *)&dent->de_inum); + gfs2_inum_out(nip, dent); dent->de_type = cpu_to_be16(type); if (ip->i_di.di_flags & GFS2_DIF_EXHASH) { leaf = (struct gfs2_leaf *)bh->b_data; @@ -1700,7 +1730,7 @@ int gfs2_dir_del(struct gfs2_inode *dip, */ int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename, - struct gfs2_inum_host *inum, unsigned int new_type) + const struct gfs2_inode *nip, unsigned int new_type) { struct buffer_head *bh; struct gfs2_dirent *dent; @@ -1715,7 +1745,7 @@ int gfs2_dir_mvino(struct gfs2_inode *di return PTR_ERR(dent); gfs2_trans_add_bh(dip->i_gl, bh, 1); - gfs2_inum_out(inum, (char *)&dent->de_inum); + gfs2_inum_out(nip, dent); dent->de_type = cpu_to_be16(new_type); if (dip->i_di.di_flags & GFS2_DIF_EXHASH) { diff --git a/fs/gfs2/dir.h b/fs/gfs2/dir.h index 48fe890..8a468ca 100644 --- a/fs/gfs2/dir.h +++ b/fs/gfs2/dir.h @@ -16,15 +16,16 @@ struct inode; struct gfs2_inode; struct gfs2_inum; -int gfs2_dir_search(struct inode *dir, const struct qstr *filename, - struct gfs2_inum_host *inum, unsigned int *type); +struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *filename); +int gfs2_dir_check(struct inode *dir, const struct qstr *filename, + const struct gfs2_inode *ip); int gfs2_dir_add(struct inode *inode, const struct qstr *filename, - const struct gfs2_inum_host *inum, unsigned int type); + const struct gfs2_inode *ip, unsigned int type); int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *filename); int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque, filldir_t filldir); int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename, - struct gfs2_inum_host *new_inum, unsigned int new_type); + const struct gfs2_inode *nip, unsigned int new_type); int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip); diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 1815429..b3ed585 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1823,7 +1823,8 @@ static int dump_inode(struct glock_iter print_dbg(gi, " Inode:\n"); print_dbg(gi, " num = %llu/%llu\n", - ip->i_num.no_formal_ino, ip->i_num.no_addr); + (unsigned long long)ip->i_no_formal_ino, + (unsigned long long)ip->i_no_addr); print_dbg(gi, " type = %u\n", IF2DT(ip->i_inode.i_mode)); print_dbg(gi, " i_flags ="); for (x = 0; x < 32; x++) @@ -1909,8 +1910,8 @@ static int dump_glock(struct glock_iter } if (test_bit(GLF_DEMOTE, &gl->gl_flags)) { print_dbg(gi, " Demotion req to state %u (%llu uS ago)\n", - gl->gl_demote_state, - (u64)(jiffies - gl->gl_demote_time)*(1000000/HZ)); + gl->gl_demote_state, (unsigned long long) + (jiffies - gl->gl_demote_time)*(1000000/HZ)); } if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object) { if (!test_bit(GLF_LOCK, &gl->gl_flags) && diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index 7b82657..777ca46 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -156,9 +156,9 @@ static void inode_go_sync(struct gfs2_gl ip = NULL; if (test_bit(GLF_DIRTY, &gl->gl_flags)) { - gfs2_log_flush(gl->gl_sbd, gl); if (ip) filemap_fdatawrite(ip->i_inode.i_mapping); + gfs2_log_flush(gl->gl_sbd, gl); gfs2_meta_sync(gl); if (ip) { struct address_space *mapping = ip->i_inode.i_mapping; diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index d995441..b2079fc 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -213,8 +213,8 @@ enum { struct gfs2_inode { struct inode i_inode; - struct gfs2_inum_host i_num; - + u64 i_no_addr; + u64 i_no_formal_ino; unsigned long i_flags; /* GIF_... */ struct gfs2_dinode_host i_di; /* To be replaced by ref to block */ @@ -275,14 +275,6 @@ enum { QDF_LOCKED = 2, }; -struct gfs2_quota_lvb { - __be32 qb_magic; - u32 __pad; - __be64 qb_limit; /* Hard limit of # blocks to alloc */ - __be64 qb_warn; /* Warn user when alloc is above this # */ - __be64 qb_value; /* Current # blocks allocated */ -}; - struct gfs2_quota_data { struct list_head qd_list; unsigned int qd_count; diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index df0b8b3..58f5a67 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -41,9 +41,9 @@ #include "util.h" static int iget_test(struct inode *inode, void *opaque) { struct gfs2_inode *ip = GFS2_I(inode); - struct gfs2_inum_host *inum = opaque; + u64 *no_addr = opaque; - if (ip->i_num.no_addr == inum->no_addr && + if (ip->i_no_addr == *no_addr && inode->i_private != NULL) return 1; @@ -53,37 +53,37 @@ static int iget_test(struct inode *inode static int iget_set(struct inode *inode, void *opaque) { struct gfs2_inode *ip = GFS2_I(inode); - struct gfs2_inum_host *inum = opaque; + u64 *no_addr = opaque; - ip->i_num = *inum; - inode->i_ino = inum->no_addr; + inode->i_ino = (unsigned long)*no_addr; + ip->i_no_addr = *no_addr; return 0; } -struct inode *gfs2_ilookup(struct super_block *sb, struct gfs2_inum_host *inum) +struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr) { - return ilookup5(sb, (unsigned long)inum->no_addr, - iget_test, inum); + unsigned long hash = (unsigned long)no_addr; + return ilookup5(sb, hash, iget_test, &no_addr); } -static struct inode *gfs2_iget(struct super_block *sb, struct gfs2_inum_host *inum) +static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr) { - return iget5_locked(sb, (unsigned long)inum->no_addr, - iget_test, iget_set, inum); + unsigned long hash = (unsigned long)no_addr; + return iget5_locked(sb, hash, iget_test, iget_set, &no_addr); } /** * gfs2_inode_lookup - Lookup an inode * @sb: The super block - * @inum: The inode number + * @no_addr: The inode number * @type: The type of the inode * * Returns: A VFS inode, or an error */ -struct inode *gfs2_inode_lookup(struct super_block *sb, struct gfs2_inum_host *inum, unsigned int type) +struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned int type) { - struct inode *inode = gfs2_iget(sb, inum); + struct inode *inode = gfs2_iget(sb, no_addr); struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_glock *io_gl; int error; @@ -110,12 +110,12 @@ struct inode *gfs2_inode_lookup(struct s inode->i_op = &gfs2_dev_iops; } - error = gfs2_glock_get(sdp, inum->no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); + error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); if (unlikely(error)) goto fail; ip->i_gl->gl_object = ip; - error = gfs2_glock_get(sdp, inum->no_addr, &gfs2_iopen_glops, CREATE, &io_gl); + error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE, &io_gl); if (unlikely(error)) goto fail_put; @@ -144,14 +144,12 @@ static int gfs2_dinode_in(struct gfs2_in struct gfs2_dinode_host *di = &ip->i_di; const struct gfs2_dinode *str = buf; - if (ip->i_num.no_addr != be64_to_cpu(str->di_num.no_addr)) { + if (ip->i_no_addr != be64_to_cpu(str->di_num.no_addr)) { if (gfs2_consist_inode(ip)) gfs2_dinode_print(ip); return -EIO; } - if (ip->i_num.no_formal_ino != be64_to_cpu(str->di_num.no_formal_ino)) - return -ESTALE; - + ip->i_no_formal_ino = be64_to_cpu(str->di_num.no_formal_ino); ip->i_inode.i_mode = be32_to_cpu(str->di_mode); ip->i_inode.i_rdev = 0; switch (ip->i_inode.i_mode & S_IFMT) { @@ -247,7 +245,7 @@ int gfs2_dinode_dealloc(struct gfs2_inod if (error) goto out_qs; - rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr); + rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr); if (!rgd) { gfs2_consist_inode(ip); error = -EIO; @@ -366,8 +364,6 @@ struct inode *gfs2_lookupi(struct inode struct super_block *sb = dir->i_sb; struct gfs2_inode *dip = GFS2_I(dir); struct gfs2_holder d_gh; - struct gfs2_inum_host inum; - unsigned int type; int error; struct inode *inode = NULL; int unlock = 0; @@ -395,12 +391,9 @@ struct inode *gfs2_lookupi(struct inode goto out; } - error = gfs2_dir_search(dir, name, &inum, &type); - if (error) - goto out; - - inode = gfs2_inode_lookup(sb, &inum, type); - + inode = gfs2_dir_search(dir, name); + if (IS_ERR(inode)) + error = PTR_ERR(inode); out: if (unlock) gfs2_glock_dq_uninit(&d_gh); @@ -548,7 +541,7 @@ static int create_ok(struct gfs2_inode * if (!dip->i_inode.i_nlink) return -EPERM; - error = gfs2_dir_search(&dip->i_inode, name, NULL, NULL); + error = gfs2_dir_check(&dip->i_inode, name, NULL); switch (error) { case -ENOENT: error = 0; @@ -588,8 +581,7 @@ static void munge_mode_uid_gid(struct gf *gid = current->fsgid; } -static int alloc_dinode(struct gfs2_inode *dip, struct gfs2_inum_host *inum, - u64 *generation) +static int alloc_dinode(struct gfs2_inode *dip, u64 *no_addr, u64 *generation) { struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); int error; @@ -605,7 +597,7 @@ static int alloc_dinode(struct gfs2_inod if (error) goto out_ipreserv; - inum->no_addr = gfs2_alloc_di(dip, generation); + *no_addr = gfs2_alloc_di(dip, generation); gfs2_trans_end(sdp); @@ -760,7 +752,7 @@ static int link_dinode(struct gfs2_inode goto fail_quota_locks; } - error = gfs2_dir_add(&dip->i_inode, name, &ip->i_num, IF2DT(ip->i_inode.i_mode)); + error = gfs2_dir_add(&dip->i_inode, name, ip, IF2DT(ip->i_inode.i_mode)); if (error) goto fail_end_trans; @@ -844,7 +836,7 @@ struct inode *gfs2_createi(struct gfs2_h struct gfs2_inode *dip = ghs->gh_gl->gl_object; struct inode *dir = &dip->i_inode; struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); - struct gfs2_inum_host inum; + struct gfs2_inum_host inum = { .no_addr = 0, .no_formal_ino = 0 }; int error; u64 generation; @@ -864,7 +856,7 @@ struct inode *gfs2_createi(struct gfs2_h if (error) goto fail_gunlock; - error = alloc_dinode(dip, &inum, &generation); + error = alloc_dinode(dip, &inum.no_addr, &generation); if (error) goto fail_gunlock; @@ -877,7 +869,7 @@ struct inode *gfs2_createi(struct gfs2_h if (error) goto fail_gunlock2; - inode = gfs2_inode_lookup(dir->i_sb, &inum, IF2DT(mode)); + inode = gfs2_inode_lookup(dir->i_sb, inum.no_addr, IF2DT(mode)); if (IS_ERR(inode)) goto fail_gunlock2; @@ -976,10 +968,8 @@ int gfs2_rmdiri(struct gfs2_inode *dip, */ int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name, - struct gfs2_inode *ip) + const struct gfs2_inode *ip) { - struct gfs2_inum_host inum; - unsigned int type; int error; if (IS_IMMUTABLE(&ip->i_inode) || IS_APPEND(&ip->i_inode)) @@ -997,18 +987,10 @@ int gfs2_unlink_ok(struct gfs2_inode *di if (error) return error; - error = gfs2_dir_search(&dip->i_inode, name, &inum, &type); + error = gfs2_dir_check(&dip->i_inode, name, ip); if (error) return error; - if (!gfs2_inum_equal(&inum, &ip->i_num)) - return -ENOENT; - - if (IF2DT(ip->i_inode.i_mode) != type) { - gfs2_consist_inode(dip); - return -EIO; - } - return 0; } diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h index b57f448..05fc095 100644 --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -10,17 +10,17 @@ #ifndef __INODE_DOT_H__ #define __INODE_DOT_H__ -static inline int gfs2_is_stuffed(struct gfs2_inode *ip) +static inline int gfs2_is_stuffed(const struct gfs2_inode *ip) { return !ip->i_di.di_height; } -static inline int gfs2_is_jdata(struct gfs2_inode *ip) +static inline int gfs2_is_jdata(const struct gfs2_inode *ip) { return ip->i_di.di_flags & GFS2_DIF_JDATA; } -static inline int gfs2_is_dir(struct gfs2_inode *ip) +static inline int gfs2_is_dir(const struct gfs2_inode *ip) { return S_ISDIR(ip->i_inode.i_mode); } @@ -32,9 +32,15 @@ static inline void gfs2_set_inode_blocks (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT); } +static inline int gfs2_check_inum(const struct gfs2_inode *ip, u64 no_addr, + u64 no_formal_ino) +{ + return ip->i_no_addr == no_addr && ip->i_no_formal_ino == no_formal_ino; +} + void gfs2_inode_attr_in(struct gfs2_inode *ip); -struct inode *gfs2_inode_lookup(struct super_block *sb, struct gfs2_inum_host *inum, unsigned type); -struct inode *gfs2_ilookup(struct super_block *sb, struct gfs2_inum_host *inum); +struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned type); +struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr); int gfs2_inode_refresh(struct gfs2_inode *ip); @@ -47,7 +53,7 @@ struct inode *gfs2_createi(struct gfs2_h int gfs2_rmdiri(struct gfs2_inode *dip, const struct qstr *name, struct gfs2_inode *ip); int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name, - struct gfs2_inode *ip); + const struct gfs2_inode *ip); int gfs2_ok_to_move(struct gfs2_inode *this, struct gfs2_inode *to); int gfs2_readlinki(struct gfs2_inode *ip, char **buf, unsigned int *len); int gfs2_glock_nq_atime(struct gfs2_holder *gh); diff --git a/fs/gfs2/locking/dlm/lock.c b/fs/gfs2/locking/dlm/lock.c index c305255..542a797 100644 --- a/fs/gfs2/locking/dlm/lock.c +++ b/fs/gfs2/locking/dlm/lock.c @@ -174,7 +174,6 @@ static int gdlm_create_lp(struct gdlm_ls lp->cur = DLM_LOCK_IV; lp->lvb = NULL; lp->hold_null = NULL; - init_completion(&lp->ast_wait); INIT_LIST_HEAD(&lp->clist); INIT_LIST_HEAD(&lp->blist); INIT_LIST_HEAD(&lp->delay_list); @@ -399,6 +398,12 @@ static void gdlm_del_lvb(struct gdlm_loc lp->lksb.sb_lvbptr = NULL; } +static int gdlm_ast_wait(void *word) +{ + schedule(); + return 0; +} + /* This can do a synchronous dlm request (requiring a lock_dlm thread to get the completion) because gfs won't call hold_lvb() during a callback (from the context of a lock_dlm thread). */ @@ -424,10 +429,10 @@ static int hold_null_lock(struct gdlm_lo lpn->lkf = DLM_LKF_VALBLK | DLM_LKF_EXPEDITE; set_bit(LFL_NOBAST, &lpn->flags); set_bit(LFL_INLOCK, &lpn->flags); + set_bit(LFL_AST_WAIT, &lpn->flags); - init_completion(&lpn->ast_wait); gdlm_do_lock(lpn); - wait_for_completion(&lpn->ast_wait); + wait_on_bit(&lpn->flags, LFL_AST_WAIT, gdlm_ast_wait, TASK_UNINTERRUPTIBLE); error = lpn->lksb.sb_status; if (error) { printk(KERN_INFO "lock_dlm: hold_null_lock dlm error %d\n", diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h index d074c6e..24d70f7 100644 --- a/fs/gfs2/locking/dlm/lock_dlm.h +++ b/fs/gfs2/locking/dlm/lock_dlm.h @@ -101,6 +101,7 @@ enum { LFL_NOBAST = 10, LFL_HEADQUE = 11, LFL_UNLOCK_DELETE = 12, + LFL_AST_WAIT = 13, }; struct gdlm_lock { @@ -117,7 +118,6 @@ struct gdlm_lock { unsigned long flags; /* lock_dlm flags LFL_ */ int bast_mode; /* protected by async_lock */ - struct completion ast_wait; struct list_head clist; /* complete */ struct list_head blist; /* blocking */ diff --git a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c index 9cf1f16..1aca51e 100644 --- a/fs/gfs2/locking/dlm/thread.c +++ b/fs/gfs2/locking/dlm/thread.c @@ -44,6 +44,13 @@ static void process_blocking(struct gdlm ls->fscb(ls->sdp, cb, &lp->lockname); } +static void wake_up_ast(struct gdlm_lock *lp) +{ + clear_bit(LFL_AST_WAIT, &lp->flags); + smp_mb__after_clear_bit(); + wake_up_bit(&lp->flags, LFL_AST_WAIT); +} + static void process_complete(struct gdlm_lock *lp) { struct gdlm_ls *ls = lp->ls; @@ -136,7 +143,7 @@ static void process_complete(struct gdlm */ if (test_and_clear_bit(LFL_SYNC_LVB, &lp->flags)) { - complete(&lp->ast_wait); + wake_up_ast(lp); return; } @@ -214,7 +221,7 @@ out: if (test_bit(LFL_INLOCK, &lp->flags)) { clear_bit(LFL_NOBLOCK, &lp->flags); lp->cur = lp->req; - complete(&lp->ast_wait); + wake_up_ast(lp); return; } diff --git a/fs/gfs2/meta_io.h b/fs/gfs2/meta_io.h index e037425..527bf19 100644 --- a/fs/gfs2/meta_io.h +++ b/fs/gfs2/meta_io.h @@ -63,7 +63,7 @@ int gfs2_meta_indirect_buffer(struct gfs static inline int gfs2_meta_inode_buffer(struct gfs2_inode *ip, struct buffer_head **bhp) { - return gfs2_meta_indirect_buffer(ip, 0, ip->i_num.no_addr, 0, bhp); + return gfs2_meta_indirect_buffer(ip, 0, ip->i_no_addr, 0, bhp); } struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 extlen); diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c index d9ecfd2..cd4cf05 100644 --- a/fs/gfs2/ondisk.c +++ b/fs/gfs2/ondisk.c @@ -33,26 +33,10 @@ #define pv(struct, member, fmt) printk(K * first arg: the cpu-order structure */ -void gfs2_inum_in(struct gfs2_inum_host *no, const void *buf) +void gfs2_inum_out(const struct gfs2_inode *ip, struct gfs2_dirent *dent) { - const struct gfs2_inum *str = buf; - - no->no_formal_ino = be64_to_cpu(str->no_formal_ino); - no->no_addr = be64_to_cpu(str->no_addr); -} - -void gfs2_inum_out(const struct gfs2_inum_host *no, void *buf) -{ - struct gfs2_inum *str = buf; - - str->no_formal_ino = cpu_to_be64(no->no_formal_ino); - str->no_addr = cpu_to_be64(no->no_addr); -} - -static void gfs2_inum_print(const struct gfs2_inum_host *no) -{ - printk(KERN_INFO " no_formal_ino = %llu\n", (unsigned long long)no->no_formal_ino); - printk(KERN_INFO " no_addr = %llu\n", (unsigned long long)no->no_addr); + dent->de_inum.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); + dent->de_inum.no_addr = cpu_to_be64(ip->i_no_addr); } static void gfs2_meta_header_in(struct gfs2_meta_header_host *mh, const void *buf) @@ -74,9 +58,10 @@ void gfs2_sb_in(struct gfs2_sb_host *sb, sb->sb_multihost_format = be32_to_cpu(str->sb_multihost_format); sb->sb_bsize = be32_to_cpu(str->sb_bsize); sb->sb_bsize_shift = be32_to_cpu(str->sb_bsize_shift); - - gfs2_inum_in(&sb->sb_master_dir, (char *)&str->sb_master_dir); - gfs2_inum_in(&sb->sb_root_dir, (char *)&str->sb_root_dir); + sb->sb_master_dir.no_addr = be64_to_cpu(str->sb_master_dir.no_addr); + sb->sb_master_dir.no_formal_ino = be64_to_cpu(str->sb_master_dir.no_formal_ino); + sb->sb_root_dir.no_addr = be64_to_cpu(str->sb_root_dir.no_addr); + sb->sb_root_dir.no_formal_ino = be64_to_cpu(str->sb_root_dir.no_formal_ino); memcpy(sb->sb_lockproto, str->sb_lockproto, GFS2_LOCKNAME_LEN); memcpy(sb->sb_locktable, str->sb_locktable, GFS2_LOCKNAME_LEN); @@ -146,9 +131,8 @@ void gfs2_dinode_out(const struct gfs2_i str->di_header.__pad0 = 0; str->di_header.mh_format = cpu_to_be32(GFS2_FORMAT_DI); str->di_header.__pad1 = 0; - - gfs2_inum_out(&ip->i_num, &str->di_num); - + str->di_num.no_addr = cpu_to_be64(ip->i_no_addr); + str->di_num.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); str->di_mode = cpu_to_be32(ip->i_inode.i_mode); str->di_uid = cpu_to_be32(ip->i_inode.i_uid); str->di_gid = cpu_to_be32(ip->i_inode.i_gid); @@ -178,7 +162,8 @@ void gfs2_dinode_print(const struct gfs2 { const struct gfs2_dinode_host *di = &ip->i_di; - gfs2_inum_print(&ip->i_num); + printk(KERN_INFO " no_formal_ino = %llu\n", (unsigned long long)ip->i_no_formal_ino); + printk(KERN_INFO " no_addr = %llu\n", (unsigned long long)ip->i_no_addr); printk(KERN_INFO " di_size = %llu\n", (unsigned long long)di->di_size); printk(KERN_INFO " di_blocks = %llu\n", (unsigned long long)di->di_blocks); diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 30c1562..fb84478 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -1,6 +1,6 @@ /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. + * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions @@ -32,6 +32,7 @@ #include "quota.h" #include "trans.h" #include "rgrp.h" #include "ops_file.h" +#include "super.h" #include "util.h" #include "glops.h" @@ -450,6 +451,31 @@ out_uninit: } /** + * adjust_fs_space - Adjusts the free space available due to gfs2_grow + * @inode: the rindex inode + */ +static void adjust_fs_space(struct inode *inode) +{ + struct gfs2_sbd *sdp = inode->i_sb->s_fs_info; + struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master; + struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local; + u64 fs_total, new_free; + + /* Total up the file system space, according to the latest rindex. */ + fs_total = gfs2_ri_total(sdp); + + spin_lock(&sdp->sd_statfs_spin); + if (fs_total > (m_sc->sc_total + l_sc->sc_total)) + new_free = fs_total - (m_sc->sc_total + l_sc->sc_total); + else + new_free = 0; + spin_unlock(&sdp->sd_statfs_spin); + fs_warn(sdp, "File system extended by %llu blocks.\n", + (unsigned long long)new_free); + gfs2_statfs_change(sdp, new_free, new_free, 0); +} + +/** * gfs2_commit_write - Commit write to a file * @file: The file to write to * @page: The page containing the data @@ -511,6 +537,9 @@ static int gfs2_commit_write(struct file di->di_size = cpu_to_be64(inode->i_size); } + if (inode == sdp->sd_rindex) + adjust_fs_space(inode); + brelse(dibh); gfs2_trans_end(sdp); if (al->al_requested) { @@ -728,8 +757,8 @@ static unsigned limit = 0; return; fs_warn(sdp, "ip = %llu %llu\n", - (unsigned long long)ip->i_num.no_formal_ino, - (unsigned long long)ip->i_num.no_addr); + (unsigned long long)ip->i_no_formal_ino, + (unsigned long long)ip->i_no_addr); for (x = 0; x < GFS2_MAX_META_HEIGHT; x++) fs_warn(sdp, "ip->i_cache[%u] = %s\n", diff --git a/fs/gfs2/ops_address.h b/fs/gfs2/ops_address.h index 35aaee4..fa1b5b3 100644 --- a/fs/gfs2/ops_address.h +++ b/fs/gfs2/ops_address.h @@ -1,6 +1,6 @@ /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. + * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions diff --git a/fs/gfs2/ops_dentry.c b/fs/gfs2/ops_dentry.c index a6fdc52..793e334 100644 --- a/fs/gfs2/ops_dentry.c +++ b/fs/gfs2/ops_dentry.c @@ -21,6 +21,7 @@ #include "dir.h" #include "glock.h" #include "ops_dentry.h" #include "util.h" +#include "inode.h" /** * gfs2_drevalidate - Check directory lookup consistency @@ -40,14 +41,15 @@ static int gfs2_drevalidate(struct dentr struct gfs2_inode *dip = GFS2_I(parent->d_inode); struct inode *inode = dentry->d_inode; struct gfs2_holder d_gh; - struct gfs2_inode *ip; - struct gfs2_inum_host inum; - unsigned int type; + struct gfs2_inode *ip = NULL; int error; int had_lock=0; - if (inode && is_bad_inode(inode)) - goto invalid; + if (inode) { + if (is_bad_inode(inode)) + goto invalid; + ip = GFS2_I(inode); + } if (sdp->sd_args.ar_localcaching) goto valid; @@ -59,7 +61,7 @@ static int gfs2_drevalidate(struct dentr goto fail; } - error = gfs2_dir_search(parent->d_inode, &dentry->d_name, &inum, &type); + error = gfs2_dir_check(parent->d_inode, &dentry->d_name, ip); switch (error) { case 0: if (!inode) @@ -73,16 +75,6 @@ static int gfs2_drevalidate(struct dentr goto fail_gunlock; } - ip = GFS2_I(inode); - - if (!gfs2_inum_equal(&ip->i_num, &inum)) - goto invalid_gunlock; - - if (IF2DT(ip->i_inode.i_mode) != type) { - gfs2_consist_inode(dip); - goto fail_gunlock; - } - valid_gunlock: if (!had_lock) gfs2_glock_dq_uninit(&d_gh); diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c index aad9183..51a8a14 100644 --- a/fs/gfs2/ops_export.c +++ b/fs/gfs2/ops_export.c @@ -75,10 +75,10 @@ static int gfs2_encode_fh(struct dentry (connectable && *len < GFS2_LARGE_FH_SIZE)) return 255; - fh[0] = cpu_to_be32(ip->i_num.no_formal_ino >> 32); - fh[1] = cpu_to_be32(ip->i_num.no_formal_ino & 0xFFFFFFFF); - fh[2] = cpu_to_be32(ip->i_num.no_addr >> 32); - fh[3] = cpu_to_be32(ip->i_num.no_addr & 0xFFFFFFFF); + fh[0] = cpu_to_be32(ip->i_no_formal_ino >> 32); + fh[1] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF); + fh[2] = cpu_to_be32(ip->i_no_addr >> 32); + fh[3] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF); *len = GFS2_SMALL_FH_SIZE; if (!connectable || inode == sb->s_root->d_inode) @@ -90,10 +90,10 @@ static int gfs2_encode_fh(struct dentry igrab(inode); spin_unlock(&dentry->d_lock); - fh[4] = cpu_to_be32(ip->i_num.no_formal_ino >> 32); - fh[5] = cpu_to_be32(ip->i_num.no_formal_ino & 0xFFFFFFFF); - fh[6] = cpu_to_be32(ip->i_num.no_addr >> 32); - fh[7] = cpu_to_be32(ip->i_num.no_addr & 0xFFFFFFFF); + fh[4] = cpu_to_be32(ip->i_no_formal_ino >> 32); + fh[5] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF); + fh[6] = cpu_to_be32(ip->i_no_addr >> 32); + fh[7] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF); fh[8] = cpu_to_be32(inode->i_mode); fh[9] = 0; /* pad to double word */ @@ -144,7 +144,8 @@ static int gfs2_get_name(struct dentry * ip = GFS2_I(inode); *name = 0; - gnfd.inum = ip->i_num; + gnfd.inum.no_addr = ip->i_no_addr; + gnfd.inum.no_formal_ino = ip->i_no_formal_ino; gnfd.name = name; error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &gh); @@ -202,9 +203,9 @@ static struct dentry *gfs2_get_dentry(st /* System files? */ - inode = gfs2_ilookup(sb, inum); + inode = gfs2_ilookup(sb, inum->no_addr); if (inode) { - if (GFS2_I(inode)->i_num.no_formal_ino != inum->no_formal_ino) { + if (GFS2_I(inode)->i_no_formal_ino != inum->no_formal_ino) { iput(inode); return ERR_PTR(-ESTALE); } @@ -236,7 +237,7 @@ static struct dentry *gfs2_get_dentry(st gfs2_glock_dq_uninit(&rgd_gh); gfs2_glock_dq_uninit(&ri_gh); - inode = gfs2_inode_lookup(sb, inum, fh_obj->imode); + inode = gfs2_inode_lookup(sb, inum->no_addr, fh_obj->imode); if (!inode) goto fail; if (IS_ERR(inode)) { @@ -249,6 +250,10 @@ static struct dentry *gfs2_get_dentry(st iput(inode); goto fail; } + if (GFS2_I(inode)->i_no_formal_ino != inum->no_formal_ino) { + iput(inode); + goto fail; + } error = -EIO; if (GFS2_I(inode)->i_di.di_flags & GFS2_DIF_SYSTEM) { diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c index 064df88..550032c 100644 --- a/fs/gfs2/ops_file.c +++ b/fs/gfs2/ops_file.c @@ -502,7 +502,7 @@ static int gfs2_lock(struct file *file, struct gfs2_inode *ip = GFS2_I(file->f_mapping->host); struct gfs2_sbd *sdp = GFS2_SB(file->f_mapping->host); struct lm_lockname name = - { .ln_number = ip->i_num.no_addr, + { .ln_number = ip->i_no_addr, .ln_type = LM_TYPE_PLOCK }; if (!(fl->fl_flags & FL_POSIX)) @@ -557,7 +557,7 @@ static int do_flock(struct file *file, i gfs2_glock_dq_uninit(fl_gh); } else { error = gfs2_glock_get(GFS2_SB(&ip->i_inode), - ip->i_num.no_addr, &gfs2_flock_glops, + ip->i_no_addr, &gfs2_flock_glops, CREATE, &gl); if (error) goto out; diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 2c5f8e7..c682371 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -236,17 +236,17 @@ fail: return error; } -static struct inode *gfs2_lookup_root(struct super_block *sb, - struct gfs2_inum_host *inum) +static inline struct inode *gfs2_lookup_root(struct super_block *sb, + u64 no_addr) { - return gfs2_inode_lookup(sb, inum, DT_DIR); + return gfs2_inode_lookup(sb, no_addr, DT_DIR); } static int init_sb(struct gfs2_sbd *sdp, int silent, int undo) { struct super_block *sb = sdp->sd_vfs; struct gfs2_holder sb_gh; - struct gfs2_inum_host *inum; + u64 no_addr; struct inode *inode; int error = 0; @@ -289,10 +289,10 @@ static int init_sb(struct gfs2_sbd *sdp, sb_set_blocksize(sb, sdp->sd_sb.sb_bsize); /* Get the root inode */ - inum = &sdp->sd_sb.sb_root_dir; + no_addr = sdp->sd_sb.sb_root_dir.no_addr; if (sb->s_type == &gfs2meta_fs_type) - inum = &sdp->sd_sb.sb_master_dir; - inode = gfs2_lookup_root(sb, inum); + no_addr = sdp->sd_sb.sb_master_dir.no_addr; + inode = gfs2_lookup_root(sb, no_addr); if (IS_ERR(inode)) { error = PTR_ERR(inode); fs_err(sdp, "can't read in root inode: %d\n", error); @@ -449,7 +449,7 @@ static int init_inodes(struct gfs2_sbd * if (undo) goto fail_qinode; - inode = gfs2_lookup_root(sdp->sd_vfs, &sdp->sd_sb.sb_master_dir); + inode = gfs2_lookup_root(sdp->sd_vfs, sdp->sd_sb.sb_master_dir.no_addr); if (IS_ERR(inode)) { error = PTR_ERR(inode); fs_err(sdp, "can't read in master directory: %d\n", error); diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index d85f6e0..f8ecfec 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -157,7 +157,7 @@ static int gfs2_link(struct dentry *old_ if (error) goto out_gunlock; - error = gfs2_dir_search(dir, &dentry->d_name, NULL, NULL); + error = gfs2_dir_check(dir, &dentry->d_name, NULL); switch (error) { case -ENOENT: break; @@ -217,8 +217,7 @@ static int gfs2_link(struct dentry *old_ goto out_ipres; } - error = gfs2_dir_add(dir, &dentry->d_name, &ip->i_num, - IF2DT(inode->i_mode)); + error = gfs2_dir_add(dir, &dentry->d_name, ip, IF2DT(inode->i_mode)); if (error) goto out_end_trans; @@ -275,7 +274,7 @@ static int gfs2_unlink(struct inode *dir gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs); gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + 1); - rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr); + rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr); gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2); @@ -420,7 +419,7 @@ static int gfs2_mkdir(struct inode *dir, dent = (struct gfs2_dirent *)((char*)dent + GFS2_DIRENT_SIZE(1)); gfs2_qstr2dirent(&str, dibh->b_size - GFS2_DIRENT_SIZE(1) - sizeof(struct gfs2_dinode), dent); - gfs2_inum_out(&dip->i_num, &dent->de_inum); + gfs2_inum_out(dip, dent); dent->de_type = cpu_to_be16(DT_DIR); gfs2_dinode_out(ip, di); @@ -472,7 +471,7 @@ static int gfs2_rmdir(struct inode *dir, gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs); gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + 1); - rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr); + rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr); gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2); error = gfs2_glock_nq_m(3, ghs); @@ -614,7 +613,7 @@ static int gfs2_rename(struct inode *odi * this is the case of the target file already existing * so we unlink before doing the rename */ - nrgd = gfs2_blk2rgrpd(sdp, nip->i_num.no_addr); + nrgd = gfs2_blk2rgrpd(sdp, nip->i_no_addr); if (nrgd) gfs2_holder_init(nrgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + num_gh++); } @@ -653,7 +652,7 @@ static int gfs2_rename(struct inode *odi if (error) goto out_gunlock; - error = gfs2_dir_search(ndir, &ndentry->d_name, NULL, NULL); + error = gfs2_dir_check(ndir, &ndentry->d_name, NULL); switch (error) { case -ENOENT: error = 0; @@ -750,7 +749,7 @@ static int gfs2_rename(struct inode *odi if (error) goto out_end_trans; - error = gfs2_dir_mvino(ip, &name, &ndip->i_num, DT_DIR); + error = gfs2_dir_mvino(ip, &name, nip, DT_DIR); if (error) goto out_end_trans; } else { @@ -768,8 +767,7 @@ static int gfs2_rename(struct inode *odi if (error) goto out_end_trans; - error = gfs2_dir_add(ndir, &ndentry->d_name, &ip->i_num, - IF2DT(ip->i_inode.i_mode)); + error = gfs2_dir_add(ndir, &ndentry->d_name, ip, IF2DT(ip->i_inode.i_mode)); if (error) goto out_end_trans; diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index c186857..fcd3ee2 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -627,6 +627,8 @@ static int gfs2_adjust_quota(struct gfs2 err = 0; qd->qd_qb.qb_magic = cpu_to_be32(GFS2_MAGIC); qd->qd_qb.qb_value = cpu_to_be64(value); + ((struct gfs2_quota_lvb*)(qd->qd_gl->gl_lvb))->qb_magic = cpu_to_be32(GFS2_MAGIC); + ((struct gfs2_quota_lvb*)(qd->qd_gl->gl_lvb))->qb_value = cpu_to_be64(value); unlock: unlock_page(page); page_cache_release(page); @@ -709,7 +711,7 @@ static int do_sync(unsigned int num_qd, offset = qd2offset(qd); error = gfs2_adjust_quota(ip, offset, qd->qd_change_sync, (struct gfs2_quota_data *) - qd->qd_gl->gl_lvb); + qd); if (error) goto out_end_trans; diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 1727f50..30eb428 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1,6 +1,6 @@ /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. + * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions @@ -431,9 +431,94 @@ static int compute_bitstructs(struct gfs } /** - * gfs2_ri_update - Pull in a new resource index from the disk + * gfs2_ri_total - Total up the file system space, according to the rindex. + * + */ +u64 gfs2_ri_total(struct gfs2_sbd *sdp) +{ + u64 total_data = 0; + struct inode *inode = sdp->sd_rindex; + struct gfs2_inode *ip = GFS2_I(inode); + struct gfs2_rindex_host ri; + char buf[sizeof(struct gfs2_rindex)]; + struct file_ra_state ra_state; + int error, rgrps; + + mutex_lock(&sdp->sd_rindex_mutex); + file_ra_state_init(&ra_state, inode->i_mapping); + for (rgrps = 0;; rgrps++) { + loff_t pos = rgrps * sizeof(struct gfs2_rindex); + + if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size) + break; + error = gfs2_internal_read(ip, &ra_state, buf, &pos, + sizeof(struct gfs2_rindex)); + if (error != sizeof(struct gfs2_rindex)) + break; + gfs2_rindex_in(&ri, buf); + total_data += ri.ri_data; + } + mutex_unlock(&sdp->sd_rindex_mutex); + return total_data; +} + +/** + * read_rindex_entry - Pull in a new resource index entry from the disk * @gl: The glock covering the rindex inode * + * Returns: 0 on success, error code otherwise + */ + +static int read_rindex_entry(struct gfs2_inode *ip, + struct file_ra_state *ra_state) +{ + struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); + loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex); + char buf[sizeof(struct gfs2_rindex)]; + int error; + struct gfs2_rgrpd *rgd; + + error = gfs2_internal_read(ip, ra_state, buf, &pos, + sizeof(struct gfs2_rindex)); + if (!error) + return 0; + if (error != sizeof(struct gfs2_rindex)) { + if (error > 0) + error = -EIO; + return error; + } + + rgd = kzalloc(sizeof(struct gfs2_rgrpd), GFP_NOFS); + error = -ENOMEM; + if (!rgd) + return error; + + mutex_init(&rgd->rd_mutex); + lops_init_le(&rgd->rd_le, &gfs2_rg_lops); + rgd->rd_sbd = sdp; + + list_add_tail(&rgd->rd_list, &sdp->sd_rindex_list); + list_add_tail(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list); + + gfs2_rindex_in(&rgd->rd_ri, buf); + error = compute_bitstructs(rgd); + if (error) + return error; + + error = gfs2_glock_get(sdp, rgd->rd_ri.ri_addr, + &gfs2_rgrp_glops, CREATE, &rgd->rd_gl); + if (error) + return error; + + rgd->rd_gl->gl_object = rgd; + rgd->rd_rg_vn = rgd->rd_gl->gl_vn - 1; + return error; +} + +/** + * gfs2_ri_update - Pull in a new resource index from the disk + * @ip: pointer to the rindex inode + * * Returns: 0 on successful update, error code otherwise */ @@ -441,13 +526,11 @@ static int gfs2_ri_update(struct gfs2_in { struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct inode *inode = &ip->i_inode; - struct gfs2_rgrpd *rgd; - char buf[sizeof(struct gfs2_rindex)]; struct file_ra_state ra_state; - u64 junk = ip->i_di.di_size; + u64 rgrp_count = ip->i_di.di_size; int error; - if (do_div(junk, sizeof(struct gfs2_rindex))) { + if (do_div(rgrp_count, sizeof(struct gfs2_rindex))) { gfs2_consist_inode(ip); return -EIO; } @@ -455,50 +538,50 @@ static int gfs2_ri_update(struct gfs2_in clear_rgrpdi(sdp); file_ra_state_init(&ra_state, inode->i_mapping); - for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) { - loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex); - error = gfs2_internal_read(ip, &ra_state, buf, &pos, - sizeof(struct gfs2_rindex)); - if (!error) - break; - if (error != sizeof(struct gfs2_rindex)) { - if (error > 0) - error = -EIO; - goto fail; + for (sdp->sd_rgrps = 0; sdp->sd_rgrps < rgrp_count; sdp->sd_rgrps++) { + error = read_rindex_entry(ip, &ra_state); + if (error) { + clear_rgrpdi(sdp); + return error; } + } - rgd = kzalloc(sizeof(struct gfs2_rgrpd), GFP_NOFS); - error = -ENOMEM; - if (!rgd) - goto fail; - - mutex_init(&rgd->rd_mutex); - lops_init_le(&rgd->rd_le, &gfs2_rg_lops); - rgd->rd_sbd = sdp; - - list_add_tail(&rgd->rd_list, &sdp->sd_rindex_list); - list_add_tail(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list); - - gfs2_rindex_in(&rgd->rd_ri, buf); - error = compute_bitstructs(rgd); - if (error) - goto fail; + sdp->sd_rindex_vn = ip->i_gl->gl_vn; + return 0; +} - error = gfs2_glock_get(sdp, rgd->rd_ri.ri_addr, - &gfs2_rgrp_glops, CREATE, &rgd->rd_gl); - if (error) - goto fail; +/** + * gfs2_ri_update_special - Pull in a new resource index from the disk + * + * This is a special version that's safe to call from gfs2_inplace_reserve_i. + * In this case we know that we don't have any resource groups in memory yet. + * + * @ip: pointer to the rindex inode + * + * Returns: 0 on successful update, error code otherwise + */ +static int gfs2_ri_update_special(struct gfs2_inode *ip) +{ + struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); + struct inode *inode = &ip->i_inode; + struct file_ra_state ra_state; + int error; - rgd->rd_gl->gl_object = rgd; - rgd->rd_rg_vn = rgd->rd_gl->gl_vn - 1; + file_ra_state_init(&ra_state, inode->i_mapping); + for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) { + /* Ignore partials */ + if ((sdp->sd_rgrps + 1) * sizeof(struct gfs2_rindex) > + ip->i_di.di_size) + break; + error = read_rindex_entry(ip, &ra_state); + if (error) { + clear_rgrpdi(sdp); + return error; + } } sdp->sd_rindex_vn = ip->i_gl->gl_vn; return 0; - -fail: - clear_rgrpdi(sdp); - return error; } /** @@ -978,18 +1061,25 @@ int gfs2_inplace_reserve_i(struct gfs2_i { struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct gfs2_alloc *al = &ip->i_alloc; - int error; + int error = 0; if (gfs2_assert_warn(sdp, al->al_requested)) return -EINVAL; - error = gfs2_rindex_hold(sdp, &al->al_ri_gh); + /* We need to hold the rindex unless the inode we're using is + the rindex itself, in which case it's already held. */ + if (ip != GFS2_I(sdp->sd_rindex)) + error = gfs2_rindex_hold(sdp, &al->al_ri_gh); + else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */ + error = gfs2_ri_update_special(ip); + if (error) return error; error = get_local_rgrp(ip); if (error) { - gfs2_glock_dq_uninit(&al->al_ri_gh); + if (ip != GFS2_I(sdp->sd_rindex)) + gfs2_glock_dq_uninit(&al->al_ri_gh); return error; } @@ -1019,7 +1109,8 @@ void gfs2_inplace_release(struct gfs2_in al->al_rgd = NULL; gfs2_glock_dq_uninit(&al->al_rgd_gh); - gfs2_glock_dq_uninit(&al->al_ri_gh); + if (ip != GFS2_I(sdp->sd_rindex)) + gfs2_glock_dq_uninit(&al->al_ri_gh); } /** @@ -1379,7 +1470,7 @@ void gfs2_unlink_di(struct inode *inode) struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_sbd *sdp = GFS2_SB(inode); struct gfs2_rgrpd *rgd; - u64 blkno = ip->i_num.no_addr; + u64 blkno = ip->i_no_addr; rgd = rgblk_free(sdp, blkno, 1, GFS2_BLKST_UNLINKED); if (!rgd) @@ -1414,9 +1505,9 @@ static void gfs2_free_uninit_di(struct g void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip) { - gfs2_free_uninit_di(rgd, ip->i_num.no_addr); + gfs2_free_uninit_di(rgd, ip->i_no_addr); gfs2_quota_change(ip, -1, ip->i_inode.i_uid, ip->i_inode.i_gid); - gfs2_meta_wipe(ip, ip->i_num.no_addr, 1); + gfs2_meta_wipe(ip, ip->i_no_addr, 1); } /** diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h index b01e0cf..b4c6adf 100644 --- a/fs/gfs2/rgrp.h +++ b/fs/gfs2/rgrp.h @@ -65,5 +65,6 @@ void gfs2_rlist_add(struct gfs2_sbd *sdp void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist, unsigned int state, int flags); void gfs2_rlist_free(struct gfs2_rgrp_list *rlist); +u64 gfs2_ri_total(struct gfs2_sbd *sdp); #endif /* __RGRP_DOT_H__ */ diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 4fdda97..faccffd 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -360,7 +360,7 @@ int gfs2_jindex_hold(struct gfs2_sbd *sd name.len = sprintf(buf, "journal%u", sdp->sd_journals); name.hash = gfs2_disk_hash(name.name, name.len); - error = gfs2_dir_search(sdp->sd_jindex, &name, NULL, NULL); + error = gfs2_dir_check(sdp->sd_jindex, &name, NULL); if (error == -ENOENT) { error = 0; break; diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 601eaa1..3f5edc5 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -115,8 +115,8 @@ int gfs2_consist_inode_i(struct gfs2_ino "GFS2: fsid=%s: inode = %llu %llu\n" "GFS2: fsid=%s: function = %s, file = %s, line = %u\n", sdp->sd_fsname, - sdp->sd_fsname, (unsigned long long)ip->i_num.no_formal_ino, - (unsigned long long)ip->i_num.no_addr, + sdp->sd_fsname, (unsigned long long)ip->i_no_formal_ino, + (unsigned long long)ip->i_no_addr, sdp->sd_fsname, function, file, line); return rv; } diff --git a/include/linux/Kbuild b/include/linux/Kbuild index e101315..eafa88f 100644 --- a/include/linux/Kbuild +++ b/include/linux/Kbuild @@ -49,6 +49,7 @@ header-y += consolemap.h header-y += const.h header-y += cycx_cfm.h header-y += dlm_device.h +header-y += dlm_netlink.h header-y += dm-ioctl.h header-y += dn.h header-y += dqblk_v1.h diff --git a/include/linux/dlm.h b/include/linux/dlm.h index 1b1dcb9..5227a95 100644 --- a/include/linux/dlm.h +++ b/include/linux/dlm.h @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -85,7 +85,11 @@ #define DLM_RESNAME_MAXLEN 64 * Only relevant to locks originating in userspace. A persistent lock will not * be removed if the process holding the lock exits. * - * DLM_LKF_NODLKWT + * DLM_LKF_NODLCKWT + * + * Do not cancel the lock if it gets into conversion deadlock. + * Exclude this lock from being monitored due to DLM_LSFL_TIMEWARN. + * * DLM_LKF_NODLCKBLK * * net yet implemented @@ -149,6 +153,7 @@ #define DLM_LKF_ORPHAN 0x00004000 #define DLM_LKF_ALTPR 0x00008000 #define DLM_LKF_ALTCW 0x00010000 #define DLM_LKF_FORCEUNLOCK 0x00020000 +#define DLM_LKF_TIMEOUT 0x00040000 /* * Some return codes that are not in errno.h @@ -199,11 +204,11 @@ struct dlm_lksb { char * sb_lvbptr; }; +#define DLM_LSFL_NODIR 0x00000001 +#define DLM_LSFL_TIMEWARN 0x00000002 #ifdef __KERNEL__ -#define DLM_LSFL_NODIR 0x00000001 - /* * dlm_new_lockspace * diff --git a/include/linux/dlm_device.h b/include/linux/dlm_device.h index c2735ca..9642277 100644 --- a/include/linux/dlm_device.h +++ b/include/linux/dlm_device.h @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -18,21 +18,24 @@ #define DLM_USER_LVB_LEN 32 /* Version of the device interface */ -#define DLM_DEVICE_VERSION_MAJOR 5 -#define DLM_DEVICE_VERSION_MINOR 1 +#define DLM_DEVICE_VERSION_MAJOR 6 +#define DLM_DEVICE_VERSION_MINOR 0 #define DLM_DEVICE_VERSION_PATCH 0 /* struct passed to the lock write */ struct dlm_lock_params { __u8 mode; __u8 namelen; - __u16 flags; + __u16 unused; + __u32 flags; __u32 lkid; __u32 parent; - void __user *castparam; + __u64 xid; + __u64 timeout; + void __user *castparam; void __user *castaddr; void __user *bastparam; - void __user *bastaddr; + void __user *bastaddr; struct dlm_lksb __user *lksb; char lvb[DLM_USER_LVB_LEN]; char name[0]; @@ -62,9 +65,15 @@ struct dlm_write_request { } i; }; +struct dlm_device_version { + __u32 version[3]; +}; + /* struct read from the "device" fd, consists mainly of userspace pointers for the library to use */ + struct dlm_lock_result { + __u32 version[3]; __u32 length; void __user * user_astaddr; void __user * user_astparam; @@ -83,6 +92,7 @@ #define DLM_USER_QUERY 3 #define DLM_USER_CREATE_LOCKSPACE 4 #define DLM_USER_REMOVE_LOCKSPACE 5 #define DLM_USER_PURGE 6 +#define DLM_USER_DEADLOCK 7 /* Arbitrary length restriction */ #define MAX_LS_NAME_LEN 64 diff --git a/include/linux/dlm_netlink.h b/include/linux/dlm_netlink.h new file mode 100644 index 0000000..1927633 --- /dev/null +++ b/include/linux/dlm_netlink.h @@ -0,0 +1,56 @@ +/* + * Copyright (C) 2007 Red Hat, Inc. All rights reserved. + * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + */ + +#ifndef _DLM_NETLINK_H +#define _DLM_NETLINK_H + +enum { + DLM_STATUS_WAITING = 1, + DLM_STATUS_GRANTED = 2, + DLM_STATUS_CONVERT = 3, +}; + +#define DLM_LOCK_DATA_VERSION 1 + +struct dlm_lock_data { + uint16_t version; + uint32_t lockspace_id; + int nodeid; + int ownpid; + uint32_t id; + uint32_t remid; + uint64_t xid; + int8_t status; + int8_t grmode; + int8_t rqmode; + unsigned long timestamp; + int resource_namelen; + char resource_name[DLM_RESNAME_MAXLEN]; +}; + +enum { + DLM_CMD_UNSPEC = 0, + DLM_CMD_HELLO, /* user->kernel */ + DLM_CMD_TIMEOUT, /* kernel->user */ + __DLM_CMD_MAX, +}; + +#define DLM_CMD_MAX (__DLM_CMD_MAX - 1) + +enum { + DLM_TYPE_UNSPEC = 0, + DLM_TYPE_LOCK, + __DLM_TYPE_MAX, +}; + +#define DLM_TYPE_MAX (__DLM_TYPE_MAX - 1) + +#define DLM_GENL_VERSION 0x1 +#define DLM_GENL_NAME "DLM" + +#endif /* _DLM_NETLINK_H */ diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 8b7e4c1..9ecf929 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -59,13 +59,6 @@ struct gfs2_inum_host { __u64 no_addr; }; -static inline int gfs2_inum_equal(const struct gfs2_inum_host *ino1, - const struct gfs2_inum_host *ino2) -{ - return ino1->no_formal_ino == ino2->no_formal_ino && - ino1->no_addr == ino2->no_addr; -} - /* * Generic metadata head structure * Every inplace buffer logged in the journal must start with this. @@ -507,11 +500,19 @@ struct gfs2_quota_change_host { __u32 qc_id; }; +struct gfs2_quota_lvb { + __be32 qb_magic; + __u32 __pad; + __be64 qb_limit; /* Hard limit of # blocks to alloc */ + __be64 qb_warn; /* Warn user when alloc is above this # */ + __be64 qb_value; /* Current # blocks allocated */ +}; + #ifdef __KERNEL__ /* Translation functions */ +struct gfs2_inode; -extern void gfs2_inum_in(struct gfs2_inum_host *no, const void *buf); -extern void gfs2_inum_out(const struct gfs2_inum_host *no, void *buf); +extern void gfs2_inum_out(const struct gfs2_inode *ip, struct gfs2_dirent *dent); extern void gfs2_sb_in(struct gfs2_sb_host *sb, const void *buf); extern void gfs2_rindex_in(struct gfs2_rindex_host *ri, const void *buf); extern void gfs2_rindex_out(const struct gfs2_rindex_host *ri, void *buf);