GIT 281e4bacc5d74f28654929539f39897b7f0a8e5e git://git.linux-nfs.org/~bfields/linux.git#for-mm 89afedc4fe68f0758577a4fc8c5122e63d2c5c28 commit 89afedc4fe68f0758577a4fc8c5122e63d2c5c28 Author: Jeff Layton Date: Tue Apr 8 15:40:08 2008 -0400 NLM: don't let lockd exit on unexpected svc_recv errors (try #2) When svc_recv returns an unexpected error, lockd will print a warning and exit. This problematic for several reasons. In particular, it will cause the reference counts for the thread to be wrong, and can lead to a potential BUG() call. Rather than exiting on error from svc_recv, have the thread do a 1s sleep and then retry the loop. This is unlikely to cause any harm, and if the error turns out to be something temporary then it may be able to recover. Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields commit 88a2d9b1599dee623bff6d3422fc98f54962794d Author: Jeff Layton Date: Tue Apr 8 15:40:07 2008 -0400 NFS: don't let nfs_callback_svc exit on unexpected svc_recv errors (try #2) When svc_recv returns an unexpected error, nfs_callback_svc will print a warning and exit. This problematic for several reasons. In particular, it will cause the reference counts for the thread to be wrong, and no new thread will be started until all nfs4 mounts are unmounted. Rather than exiting on error from svc_recv, have the thread do a 1s sleep and then retry the loop. This is unlikely to cause any harm, and if the error turns out to be something temporary then it may be able to recover. Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields commit e07ac6b1ff5970e99138e5e4a255ee0dabdbf4d6 Author: Steven Whitehouse Date: Tue Apr 8 13:12:52 2008 +0100 Use a zero sized array for raw field in struct fid The raw field's size can vary so we use a zero sized array since gcc will not allow a variable sized array inside a union. This has been tested with ext3 and gfs2 and relates to the bug report: http://lkml.org/lkml/2007/10/24/374 and discussion thread: http://lkml.org/lkml/2008/4/7/65 Signed-off-by: Steven Whitehouse Cc: Christoph Hellwig Cc: Neil Brown Cc: Adrian Bunk Signed-off-by: J. Bruce Fields commit fb12dc4d60ddca5f1a438d75b84c8789beff33ae Author: Olga Kornievskaia Date: Fri Mar 28 16:04:56 2008 -0400 nfsd: use static memory for callback program and stats There's no need to dynamically allocate this memory, and doing so may create the possibility of races on shutdown of the rpc client. (We've witnessed it only after adding rpcsec_gss support to the server, after which the rpc code can send destroys calls that expect to still be able to access the rpc_stats structure after it has been destroyed.) Such races are in theory possible if the module containing this "static" memory is removed very quickly after an rpc client is destroyed, but we haven't seen that happen. Signed-off-by: J. Bruce Fields commit 191c55236da1331117afc2cf139bde46951f03c4 Author: Jeff Layton Date: Mon Apr 7 16:45:37 2008 -0400 SUNRPC: remove svc_create_thread() Now that the nfs4 callback thread uses the kthread API, there are no more users of svc_create_thread(). Remove it. Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields commit 6ba6528ea60ba91b40ca49cf4ef5b2f49d1c8065 Author: J. Bruce Fields Date: Mon Apr 7 13:09:47 2008 -0400 nfsd: fix comment Obvious comment nit. Signed-off-by: J. Bruce Fields commit 47a07491a04cd4986e13129ed302d630f269e6ea Author: J. Bruce Fields Date: Mon Apr 7 13:05:27 2008 -0400 lockd: Fix stale nlmsvc_unlink_block comment As of 5996a298da43a03081e9ba2116983d173001c862 ("NLM: don't unlock on cancel requests") we no longer unlock in this case, so the comment is no longer accurate. Thanks to Stuart Friedberg for pointing out the inconsistency. Cc: Stuart Friedberg Signed-off-by: J. Bruce Fields commit 1a9ef07085b82b94b5bb7c6aa7b148b8142adc4f Author: Robert P. J. Day Date: Tue Apr 1 21:48:37 2008 -0400 NFSD: Strip __KERNEL__ testing from unexported header files. Also, sort the Kbuild file. Signed-off-by: Robert P. J. Day Signed-off-by: J. Bruce Fields commit c7985013cd7a4b9693cd75fbeb12511fac649b41 Author: Kevin Coffman Date: Mon Mar 31 10:31:44 2008 -0400 sunrpc: make token header values less confusing g_make_token_header() and g_token_size() add two too many, and therefore their callers pass in "(logical_value - 2)" rather than "logical_value" as hard-coded values which causes confusion. This dates back to the original g_make_token_header which took an optional token type (token_id) value and added it to the token. This was removed, but the routine always adds room for the token_id rather than not. Signed-off-by: Kevin Coffman Signed-off-by: J. Bruce Fields commit 21dd89ce01dedde0ef537109a8c237ec709cde5e Author: Kevin Coffman Date: Mon Mar 31 10:31:33 2008 -0400 gss_krb5: consistently use unsigned for seqnum Consistently use unsigned (u32 vs. s32) for seqnum. In get_mic function, send the local copy of seq_send, rather than the context version. Signed-off-by: Kevin Coffman Signed-off-by: J. Bruce Fields commit 519b75b2c84072d2e467f8ee56355e49a2dee3a9 Author: Chuck Lever Date: Thu Mar 27 16:34:54 2008 -0400 NFSD: Remove NFSv4 dependency on NFSv3 Clean up: Because NFSD_V4 "depends on" NFSD_V3, it appears as a child of the NFSD_V3 menu entry, and is not visible if NFSD_V3 is unselected. Replace the dependency on NFSD_V3 with a "select NFSD_V3". This makes NFSD_V4 look and work just like NFS_V3, while ensuring that NFSD_V3 is enabled if NFSD_V4 is. Sam Ravnborg adds: "This use of select is questionable. In general it is bad to select a symbol with dependencies. In this case the dependencies of NFSD_V3 are duplicated for NFSD_V4 so we will not se erratic configurations but do you remember to update NFSD_V4 when you add a depends on NFSD_V3? But I see no other clean way to do it right now." Later he said: "My comment was more to say we have things to address in kconfig. This is abuse in the acceptable range." Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit a60fd2c6ff6f4c0521a493dadc3158c30004a1b6 Author: Chuck Lever Date: Thu Mar 27 16:34:47 2008 -0400 SUNRPC: Remove PROC_FS dependency Recently, commit 440bcc59 added a reverse dependency to fs/Kconfig to ensure that PROC_FS was enabled if SUNRPC_GSS was enabled. Apparently this isn't necessary because the auth_gss components under net/sunrpc will build correctly even if PROC_FS is disabled, though RPCSEC_GSS will not work without /proc. It also violates the guideline in Documentation/kbuild/kconfig-language.txt that states "In general use select only for non-visible symbols (no prompts anywhere) and for symbols with no dependencies." To address these issues, remove the dependency. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit df97a26fe3dd679469c94162e8d753c1911e4817 Author: Chuck Lever Date: Thu Mar 27 16:34:40 2008 -0400 NFSD: Use "depends on" for PROC_FS dependency Recently, commit 440bcc59 added a reverse dependency to fs/Kconfig to ensure that PROC_FS was enabled if NFSD_V4 was enabled. There is a guideline in Documentation/kbuild/kconfig-language.txt that states "In general use select only for non-visible symbols (no prompts anywhere) and for symbols with no dependencies." A quick grep around other Kconfig files reveals that no entry currently uses "select PROC_FS" -- every one uses "depends on". Thus CONFIG_NFSD_V4 should use "depends on PROC_FS" as well. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit 69b89f2842bae5b77c29d8e67e6a4f7e8834b1db Author: Tom Tucker Date: Tue Mar 25 11:14:02 2008 -0500 SVCRDMA: Check num_sge when setting LAST_CTXT bit The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly when the last chunk in the read-list spanned multiple pages. This resulted in a kernel panic when the wrong context was used to build the RPC iovec page list. RDMA_READ is used to fetch RPC data from the client for NFS_WRITE requests. A scatter-gather is used to map the advertised client side buffer to the server-side iovec and associated page list. WR contexts are used to convey which scatter-gather entries are handled by each WR. When the write data is large, a single RPC may require multiple RDMA_READ requests so the contexts for a single RPC are chained together in a linked list. The last context in this list is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes, the CQ handler code can enqueue the RPC for processing. The code in rdma_read_xdr was setting this bit on the last two contexts on this list when the last read-list chunk spanned multiple pages. This caused the svc_rdma_recvfrom logic to incorrectly build the RPC and caused the kernel to crash because the second-to-last context doesn't contain the iovec page list. Modified the condition that sets this bit so that it correctly detects the last context for the RPC. Signed-off-by: Tom Tucker Tested-by: Roland Dreier Signed-off-by: J. Bruce Fields commit f315aaac968e16ec514cdfb8f41af93742002c63 Author: J. Bruce Fields Date: Fri Mar 14 17:51:12 2008 -0400 nfsd: move most of fh_verify to separate function Move the code that actually parses the filehandle and looks up the dentry and export to a separate function. This simplifies the reference counting a little and moves fh_verify() a little closer to the kernel ideal of small, minimally-indentended functions. Clean up a few other minor style sins along the way. Signed-off-by: J. Bruce Fields Cc: Neil Brown commit f91b591f94b4773793f9f4e47611ff3902ce1d5e Author: Andrew Morton Date: Wed Mar 12 14:04:25 2008 -0700 net/sunrpc/svc.c: suppress unintialized var warning net/sunrpc/svc.c: In function '__svc_create_thread': net/sunrpc/svc.c:587: warning: 'oldmask.bits[0u]' may be used uninitialized in this function Cc: Neil Brown Cc: Trond Myklebust Cc: David S. Miller Cc: Tom Tucker Cc: Chuck Lever Signed-off-by: Andrew Morton Signed-off-by: J. Bruce Fields commit 85751a23d0e93f67f12f9de1b5d2fec53d8cbea2 Author: Kevin Coffman Date: Thu Feb 21 13:44:27 2008 -0500 Remove define for KRB5_CKSUM_LENGTH, which will become enctype-dependent cleanup: When adding new encryption types, the checksum length can be different for each enctype. Face the fact that the current code only supports DES which has a checksum length of 8. Signed-off-by: Kevin Coffman Signed-off-by: J. Bruce Fields commit f0b84a1644b0d80a7c27e75078335505ad371519 Author: Kevin Coffman Date: Thu Feb 21 13:44:12 2008 -0500 Correct grammer/typos in dprintks cleanup: Fix grammer/typos to use "too" instead of "to" Signed-off-by: Kevin Coffman Signed-off-by: J. Bruce Fields commit 255ce4a3fecc3ee346037dad667c41292b208604 Author: Tom Tucker Date: Tue Mar 11 12:44:27 2008 -0500 SVCRDMA: Add check for XPT_CLOSE in svc_rdma_send SVCRDMA: Add check for XPT_CLOSE in svc_rdma_send The svcrdma transport can crash if a send is waiting for an empty SQ slot and the connection is closed due to an asynchronous error. The crash is caused when svc_rdma_send attempts to send on a deleted QP. Signed-off-by: Tom Tucker Signed-off-by: J. Bruce Fields commit 2ca602d917d9e141a2682b6fb7f96c240b3bb447 Author: Felix Blyakher Date: Tue Feb 26 10:54:36 2008 -0800 nfsd: initialize lease type in nfs4_open_delegation() While lease is correctly checked by supplying the type argument to vfs_setlease(), it's stored with fl_type uninitialized. This breaks the logic when checking the type of the lease. The fix is to initialize fl_type. The old code still happened to function correctly since F_RDLCK is zero, and we only implement read delegations currently (nor write delegations). But that's no excuse for not fixing this. Signed-off-by: Felix Blyakher Signed-off-by: J. Bruce Fields commit 5e341b154d7da9a51eb1d7b41b02ee5b1e18239b Author: James Lentini Date: Mon Feb 25 12:20:13 2008 -0500 Documentation: NFS/RDMA instructions for 2.6.25-rc1 Add some instructions for using the new NFS/RDMA features. Signed-off-by: James Lentini Cc: Roland Dreier Signed-off-by: J. Bruce Fields commit 91335fb27b76432e1952fddcbd424070431140c2 Author: Jeff Layton Date: Wed Feb 20 08:55:30 2008 -0500 NFS: convert nfs4 callback thread to kthread API There's a general push to convert kernel threads to use the (much cleaner) kthread API. This patch converts the NFSv4 callback kernel thread to the kthread API. In addition to being generally cleaner this also removes the dependency on signals when shutting down the thread. Note that this patch depends on the recent patches to svc_recv() to make it check kthread_should_stop() periodically. Those patches are in Bruce's tree at the moment and are slated for 2.6.26 along with the lockd conversion, so this conversion is probably also appropriate for 2.6.26. Signed-off-by: Jeff Layton Acked-by: Trond Myklebust Signed-off-by: J. Bruce Fields commit 7beb8dfe6be60685f14ed6b252225bea4e73365c Author: Harvey Harrison Date: Wed Feb 20 12:49:02 2008 -0800 nfsd: fix sparse warning in vfs.c fs/nfsd/vfs.c:991:27: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison Signed-off-by: J. Bruce Fields commit 15f07f9164a7d4727fcc0e4c413d9b6995478bc4 Author: Harvey Harrison Date: Wed Feb 20 12:49:00 2008 -0800 nfsd: fix sparse warnings Add extern to nfsd/nfsd.h fs/nfsd/nfssvc.c:146:5: warning: symbol 'nfsd_nrthreads' was not declared. Should it be static? fs/nfsd/nfssvc.c:261:5: warning: symbol 'nfsd_nrpools' was not declared. Should it be static? fs/nfsd/nfssvc.c:269:5: warning: symbol 'nfsd_get_nrthreads' was not declared. Should it be static? fs/nfsd/nfssvc.c:281:5: warning: symbol 'nfsd_set_nrthreads' was not declared. Should it be static? fs/nfsd/export.c:1534:23: warning: symbol 'nfs_exports_op' was not declared. Should it be static? Add include of auth.h fs/nfsd/auth.c:27:5: warning: symbol 'nfsd_setuser' was not declared. Should it be static? Make static, move forward declaration closer to where it's needed. fs/nfsd/nfs4state.c:1877:1: warning: symbol 'laundromat_main' was not declared. Should it be static? Make static, forward declaration was already marked static. fs/nfsd/nfs4idmap.c:206:1: warning: symbol 'idtoname_parse' was not declared. Should it be static? fs/nfsd/vfs.c:1156:1: warning: symbol 'nfsd_create_setattr' was not declared. Should it be static? Signed-off-by: Harvey Harrison Signed-off-by: J. Bruce Fields commit ad46b97846f87d2d8f0480ebce4835b2cdd689bd Author: J. Bruce Fields Date: Wed Feb 20 15:40:15 2008 -0500 lockd: convert nsm_mutex to a spinlock There's no reason for a mutex here, except to allow an allocation under the lock, which we can avoid with the usual trick of preallocating memory for the new object and freeing it if it turns out to be unnecessary. Signed-off-by: J. Bruce Fields commit fd9378b8db8f7bdf6c026933b3df1985a49b4d5c Author: J. Bruce Fields Date: Wed Feb 20 15:27:31 2008 -0500 lockd: clean up __nsm_find() Use list_for_each_entry(). Also, in keeping with kernel style, make the normal case (kzalloc succeeds) unindented and handle the abnormal case with a goto. Signed-off-by: J. Bruce Fields commit 23087a2d8aeaad5511b774248fda8b8b109c5d82 Author: J. Bruce Fields Date: Wed Feb 20 14:02:47 2008 -0500 lockd: fix race in nlm_release() The sm_count is decremented to zero but left on the nsm_handles list. So in the space between decrementing sm_count and acquiring nsm_mutex, it is possible for another task to find this nsm_handle, increment the use count and then enter nsm_release itself. Thus there's nothing to prevent the nsm being freed before we acquire nsm_mutex here. Signed-off-by: J. Bruce Fields commit 5d40adb08bf84d50621e3fca8618b13418a5da6c Author: Harshula Jayasuriya Date: Wed Feb 20 10:56:56 2008 +1100 sunrpc: GSS integrity and decryption failures should return GARBAGE_ARGS In function svcauth_gss_accept() (net/sunrpc/auth_gss/svcauth_gss.c) the code that handles GSS integrity and decryption failures should be returning GARBAGE_ARGS as specified in RFC 2203, sections 5.3.3.4.2 and 5.3.3.4.3. Reviewed-by: Greg Banks Signed-off-by: Harshula Jayasuriya Signed-off-by: J. Bruce Fields commit c6c6ec20b3e5424a1ebaf3142b668199ff05a225 Author: Harvey Harrison Date: Mon Feb 18 02:01:49 2008 -0800 lockd: fix sparse warning in svcshare.c fs/lockd/svcshare.c:74:50: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison Cc: Neil Brown Signed-off-by: Andrew Morton Signed-off-by: J. Bruce Fields commit 48c62b5c5c3fbd634ea8a2e50c117213583d9e74 Author: Adrian Bunk Date: Wed Feb 13 23:30:26 2008 +0200 make nfsd_create_setattr() static This patch makes the needlessly global nfsd_create_setattr() static. Signed-off-by: Adrian Bunk Signed-off-by: J. Bruce Fields commit 7b950a28f756d8b0a800d28a8c82abe41b186654 Author: Jeff Layton Date: Tue Feb 12 11:47:24 2008 -0500 SUNRPC: allow svc_recv to break out of 500ms sleep when alloc_page fails svc_recv() calls alloc_page(), and if it fails it does a 500ms uninterruptible sleep and then reattempts. There doesn't seem to be any real reason for this to be uninterruptible, so change it to an interruptible sleep. Also check for kthread_stop() and signalled() after setting the task state to avoid races that might lead to sleeping after kthread_stop() wakes up the task. I've done some very basic smoke testing with this, but obviously it's hard to test the actual changes since this all depends on an alloc_page() call failing. Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields commit 31cf267db9c2b191d1e3c86260047253f37e3b16 Author: Chuck Lever Date: Mon Feb 11 17:12:38 2008 -0500 NFSD: Remove redundant "select" clauses in fs/Kconfig As far as I can tell, selecting the CRYPTO and CRYPTO_MD5 entries under CONFIG_NFSD is redundant, since CONFIG_NFSD_V4 already selects RPCSEC_GSS_KRB5, which selects these entries. Testing with "make menuconfig" shows that the entries under CRYPTO still properly reflect "Y" or "M" based on the setting of CONFIG_NFSD after this change is applied. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit 8e6d2db45398bf157be91ae7c8421ebcacc9e446 Author: Chuck Lever Date: Mon Feb 11 17:12:31 2008 -0500 NFSD: Move "select NFSD_V2_ACL if NFSD_V3_ACL" Clean up: since NFSD_V2_ACL is a boolean, it can be selected safely under the NFSD_V3_ACL entry (also a boolean). Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit 3a2f1bfa896b9067ac88eaaecdd2a5a796c6236e Author: Chuck Lever Date: Mon Feb 11 17:12:24 2008 -0500 NFSD: Move "select FS_POSIX_ACL if NFSD_V4" Clean up: since FS_POSIX_ACL is a non-visible boolean entry, it can be selected safely under the NFSD_V4 entry (also a boolean). Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit 8632e59a804e3dbf5ee9e33e1c30768cb7317761 Author: Chuck Lever Date: Mon Feb 11 17:11:54 2008 -0500 NFSD: Update help text for CONFIG_NFSD Clean up: refresh the help text for Kconfig items related to the NFS server. Remove obsolete URLs, and make the language consistent among the options. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit 6ba903e78db177bc250f51a6926b764ef7c96224 Author: Chuck Lever Date: Mon Feb 11 17:11:39 2008 -0500 NFSD: Remove NFSD_TCP kernel build option Likewise, distros usually leave CONFIG_NFSD_TCP enabled. TCP support in the Linux NFS server is stable enough that we can leave it on always. CONFIG_NFSD_TCP adds about 10 lines of code, and defaults to "Y" anyway. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields commit 578c581bb722b021e415f5560748294a92de4e84 Author: J. Bruce Fields Date: Mon Feb 11 15:48:47 2008 -0500 nfsd: clarify readdir/mountpoint-crossing code The code here is difficult to understand; attempt to clarify somewhat by pulling out one of the more mystifying conditionals into a separate function. While we're here, also add lease_time to the list of attributes that we don't really need to cross a mountpoint to fetch. Signed-off-by: J. Bruce Fields Cc: Peter Staubach commit 7836e8e7835ded7dc880d5a05bb1ff1a32f9536e Author: J. Bruce Fields Date: Thu Jan 31 16:14:54 2008 -0500 svcrpc: move unused field from cache_deferred_req This field is set once and never used; probably some artifact of an earlier implementation idea. Signed-off-by: J. Bruce Fields commit ee3f70f9415699284c02bc497ded859932c8d288 Author: J. Bruce Fields Date: Sat Jan 26 23:36:48 2008 -0500 nfsd4: kill unnecessary check in preprocess_stateid_op This condition is always true. Signed-off-by: J. Bruce Fields commit ac482aeed19780d3ca76edd95dd1352ffd30f8be Author: J. Bruce Fields Date: Sat Jan 26 19:08:12 2008 -0500 nfsd4: simplify stateid sequencing checks Pull this common code into a separate function. Signed-off-by: J. Bruce Fields commit c810aaa98e8c481d89b0e50ba2265001b7863654 Author: J. Bruce Fields Date: Sat Jan 26 14:58:45 2008 -0500 nfsd4: remove unnecessary CHECK_FH check in preprocess_seqid_op Every caller sets this flag, so it's meaningless. Signed-off-by: J. Bruce Fields commit 06b443e35b5aa7845bd75353d0896bf1708f2ac3 Author: J. Bruce Fields Date: Sat Jan 19 13:58:23 2008 -0500 nfs: remove unnecessary NFS_NEED_* defines Thanks to Robert Day for pointing out that these two defines are unused. Signed-off-by: J. Bruce Fields Cc: Trond Myklebust Trond Myklebust Cc: Neil Brown Cc: "Robert P. J. Day" commit cf3b44bc0f94062f94ac3868050738c9ced6b5a1 Author: Aurélien Charbon Date: Fri Jan 18 15:50:56 2008 +0100 IPv6 support for NFS server export caches This adds IPv6 support to the interfaces that are used to express nfsd exports. All addressed are stored internally as IPv6; backwards compatibility is maintained using mapped addresses. Thanks to Bruce Fields, Brian Haley, Neil Brown and Hideaki Joshifuji for comments Signed-off-by: Aurelien Charbon Cc: Neil Brown Cc: Brian Haley Cc: YOSHIFUJI Hideaki / 吉藤英明 Signed-off-by: J. Bruce Fields commit b0a9ba20029b83c52aabc4214f85cabb64e70102 Author: Jeff Layton Date: Thu Feb 7 16:34:55 2008 -0500 NLM: Convert lockd to use kthreads Have lockd_up start lockd using kthread_run. With this change, lockd_down now blocks until lockd actually exits, so there's no longer need for the waitqueue code at the end of lockd_down. This also means that only one lockd can be running at a time which simplifies the code within lockd's main loop. This also adds a check for kthread_should_stop in the main loop of nlmsvc_retry_blocked and after that function returns. There's no sense continuing to retry blocks if lockd is coming down anyway. Signed-off-by: Jeff Layton commit 87c4071d60e931f8a3605b818e71474adbd4174d Author: Jeff Layton Date: Thu Feb 7 16:34:54 2008 -0500 SUNRPC: have svc_recv() check kthread_should_stop() When using kthreads that call into svc_recv, we want to make sure that they do not block there for a long time when we're trying to take down the kthread. This patch changes svc_recv() to check kthread_should_stop() at the same places that it checks to see if it's signalled(). Also check just before svc_recv() tries to schedule(). By making sure that we check it just after setting the task state we can avoid having to use any locking or signalling to ensure it doesn't block for a long time. There's still a chance of a 500ms sleep if alloc_page() fails, but that should be a rare occurrence and isn't a terribly long time in the context of a kthread being taken down. Signed-off-by: Jeff Layton commit 61be93025b23e68dbe0bb9d9a4720369638f538e Author: Jeff Layton Date: Thu Feb 7 16:34:53 2008 -0500 SUNRPC: export svc_sock_update_bufs Needed since the plan is to not have a svc_create_thread helper and to have current users of that function just call kthread_run directly. Signed-off-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: J. Bruce Fields commit 7cfdda7379d190aa74aede0acb6fd361abf4e7d1 Author: NeilBrown Date: Fri Feb 8 13:03:37 2008 +1100 knfsd: Remove NLM_HOST_MAX and associated logic. Lockd caches information about hosts that have recently held locks to expedite the taking of further locks. It periodically discards this information for hosts that have not been used for a few minutes. lockd currently has a value NLM_HOST_MAX, and changes the 'garbage collection' behaviour when the number of hosts exceeds this threshold. However its behaviour is strange, and likely not what was intended. When the number of hosts exceeds the max, it scans *less* often (every 2 minutes vs every minute) and allows unused host information to remain around longer (5 minutes instead of 2). Having this limit is of dubious value anyway, and we have not suffered from the code not getting the limit right, so remove the limit altogether. We go with the larger values (discard 5 minute old hosts every 2 minutes) as they are probably safer. Maybe the periodic garbage collection should be replace to with 'shrinker' handler so we just respond to memory pressure.... Acked-by: Jeff Layton Signed-off-by: Neil Brown Signed-off-by: J. Bruce Fields ### Diffstat output ./fs/lockd/host.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff .prev/fs/lockd/host.c ./fs/lockd/host.c commit 0490a54a00c14212f22c5948c8c13a4553d745bd Author: Chuck Lever Date: Fri Mar 14 14:26:08 2008 -0400 lockd: introduce new function to encode private argument in SM_MON requests Clean up: refactor the encoding of the opaque 16-byte private argument in xdr_encode_mon(). This will be updated later to support IPv6 addresses. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 2ca7754d4c96d68e1475690422a202ba5f0443d8 Author: Chuck Lever Date: Fri Mar 14 14:26:01 2008 -0400 lockd: Fix up incorrect RPC buffer size calculations. Switch to using the new mon_id encoder function. Now that we've refactored the encoding of SM_MON requests, we've discovered that the pre-computed buffer length maximums are incorrect! Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit ea72a7f170e686baf00ceee57b6197bef686889c Author: Chuck Lever Date: Fri Mar 14 14:25:53 2008 -0400 lockd: document use of mon_id argument in SM_MON requests Clean up: document the argument type that xdr_encode_common() is marshalling by introducing a new function. The new function will replace xdr_encode_common() in just a sec. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 850c95fd07b63704d06909c9a081c51272d32db1 Author: Chuck Lever Date: Fri Mar 14 14:25:46 2008 -0400 lockd: refactor SM_MON my_id argument encoder Clean up: introduce a new XDR encoder specifically for the my_id argument of SM_MON requests. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 496951743100b2d2ccd8b268257e79425e489851 Author: Chuck Lever Date: Fri Mar 14 14:25:39 2008 -0400 lockd: refactor SM_MON mon_name argument encoder Clean up: introduce a new XDR encoder specifically for the mon_name argument of SM_MON requests. This will be updated later to support IPv6 addresses in addition to IPv4 addresses. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 099bd05f27ff24a3041d54e7ed42d2eb681484b9 Author: Chuck Lever Date: Fri Mar 14 14:25:32 2008 -0400 lockd: Ensure NSM strings aren't longer than protocol allows Introduce a special helper function to check the length of NSM strings before they are placed on the wire. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit f34ec991ae0f015f5cdc51ad46c3a317ffae2466 Author: Chuck Lever Date: Fri Mar 14 14:18:45 2008 -0400 lockd: bring a few function declarations up to date Clean-up: replace __inline__ and use up-to-date function declaration conventions. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit eb18860e1385bfc7f08fcb7ba362e4a5156c8324 Author: Chuck Lever Date: Fri Mar 14 14:18:37 2008 -0400 NLM: NLM protocol version numbers are u32 Clean up: RPC protocol version numbers are u32. Make sure we use an appropriate type for NLM version numbers when calling nlm_lookup_host(). Eliminates a harmless mixed sign comparison in nlm_host_lookup(). Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 90d5b18061656993410dfd57ddb88aa5a3f34563 Author: Chuck Lever Date: Fri Mar 14 14:18:30 2008 -0400 NLM: LOCKD fails to load if CONFIG_SYSCTL is not set Bruce Fields says: "By the way, we've got another config-related nit here: http://bugzilla.linux-nfs.org/show_bug.cgi?id=156 You can build lockd without CONFIG_SYSCTL set, but then the module will fail to load." For now, disable the sysctl registration calls in lockd if CONFIG_SYSCTL is not enabled. This allows the kernel to build properly if PROC_FS or SYSCTL is not enabled, but an NFS client is desired. In the long run, we would like to be able to build the kernel with an NFS client but without lockd. This makes sense, for example, if you want an NFSv4-only NFS client, as NFSv4 doesn't use NLM at all. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 1e40316bc44851c5920490df01482ed862e4dbc4 Author: Chuck Lever Date: Fri Mar 14 14:15:18 2008 -0400 SUNRPC: Add a default setting for CONFIG_SUNRPC_BIND34 Most distros will want support for rpcbind protocols 3 and 4 to default off until they have integrated user-space support for the new rpcbind daemon which supports IPv6 RPC services. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 327a299d8abd0f4580098cbd6c58b0f017417005 Author: Chuck Lever Date: Fri Mar 14 14:15:11 2008 -0400 SUNRPC: Update help Kconfig text Clean up: refresh the help text for Kconfig items related to the sunrpc module. Remove obsolete URLs, and make the language consistent among the options. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit ecfc555a8327ff09b07066d73a98c04115007eec Author: Chuck Lever Date: Fri Mar 14 14:14:56 2008 -0400 NFS: Always enable NFS direct I/O Since O_DIRECT is a standard feature that is enabled in most distros, eliminate the CONFIG_NFS_DIRECTIO build option, and change the fs/nfs/Makefile to always build in the NFS direct I/O engine. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 82d101d58a2312297ee79f96d44c1d8c7fe1032d Author: Chuck Lever Date: Fri Mar 14 14:10:37 2008 -0400 NFS: Show most mount options via nfs_show_options() Display all mount options in /proc/mount which may be needed to reconstruct a previous mount. Signed-off-by: Chuck Lever Cc: Miklos Szeredi Signed-off-by: Trond Myklebust commit 3f8400d1f1f9d5fb175bdbf6236e564dde454f28 Author: Chuck Lever Date: Fri Mar 14 14:10:30 2008 -0400 NFS: Save the values of the "mount*=" mount options Save the value of the mountproto= mountport= mountvers= and mountaddr= options so that these values can be displayed later via nfs_show_options(). This preserves the intent of the original mount options, should the file system need to be remounted based on what's displayed in /proc/mounts. Signed-off-by: Chuck Lever Cc: Miklos Szeredi Signed-off-by: Trond Myklebust commit f22d6d79fe227245363a8849ea8c85fe6c6598c3 Author: Chuck Lever Date: Fri Mar 14 14:10:22 2008 -0400 NFS: Save the value of the "port=" mount option During a remount based on the mount options displayed in /proc/mounts, we want to preserve the original behavior of the mount request. Let's save the original setting of the "port=" mount option in the mount's nfs_server structure. This allows us to simplify the default behavior of port setting for NFSv4 mounts: by default, NFSv2/3 mounts first try an RPC bind to determine the NFS server's port, unless the user specified the "port=" mount option; Users can force the client to skip the RPC bind by explicitly specifying "port=". NFSv4, by contrast, assumes the NFS server port is 2049 and skips the RPC bind, unless the user specifies "port=". Users can force an RPC bind for NFSv4 by explicitly specifying "port=0". I added a couple of extra comments to clarify this behavior. Signed-off-by: Chuck Lever Cc: Miklos Szeredi Signed-off-by: Trond Myklebust commit 78fa701f341564e60461de91cd08ff5f7fb09b31 Author: Chuck Lever Date: Fri Mar 14 14:10:15 2008 -0400 NFS: Fix up data types of fields in nfs_parsed_mount_options Clean up: make data types of fields in nfs_parsed_mount_options more consistent with other uses. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 2d76743227028a26f884284aade03d1e43f4f249 Author: Chuck Lever Date: Fri Mar 14 14:10:08 2008 -0400 NFS: numeric mount parameters are unsigned Clean up: use %u instead of %d when displaying NFS mount options. Nit: Fix reporting of "namlen=" option in nfs_show_mount_stats. The mount option is called "namlen" without the "e". Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust commit 7bda2cdf484a00e52b0ed925e99d4bf4696b2c7a Author: Jeff Layton Date: Fri Feb 22 14:50:01 2008 -0500 NFS: clean up short packet handling for NFSv4 readdir Currently, the NFS readdir decoders have a workaround for buggy servers that send an empty readdir response with the EOF bit unset. If the server sends a malformed response in some cases, this workaround kicks in and just returns an empty response rather than returning a proper error to the caller. This patch does 3 things: 1) have malformed responses with no entries return error (-EIO) 2) preserve existing workaround for servers that send empty responses with the EOF marker unset. 3) Add some comments to clarify the logic in decode_readdir(). Signed-off-by: Jeff Layton Signed-off-by: Trond Myklebust commit 643f81115baca3630e544f6874567648b605efae Author: Jeff Layton Date: Fri Feb 22 14:50:00 2008 -0500 NFS: clean up short packet handling for NFSv3 readdir Currently, the NFS readdir decoders have a workaround for buggy servers that send an empty readdir response with the EOF bit unset. If the server sends a malformed response in some cases, this workaround kicks in and just returns an empty response rather than returning a proper error to the caller. This patch does 3 things: 1) have malformed responses with no entries return error (-EIO) 2) preserve existing workaround for servers that send empty responses with the EOF marker unset. 3) Add some comments to clarify the logic in nfs3_xdr_readdirres(). Signed-off-by: Jeff Layton Signed-off-by: Trond Myklebust commit caa02bd540618e4b447a1f776363ba27c4c79090 Author: Jeff Layton Date: Fri Feb 22 14:49:59 2008 -0500 NFS: clean up short packet handling for NFSv2 readdir Currently, the NFS readdir decoders have a workaround for buggy servers that send an empty readdir response with the EOF bit unset. If the server sends a malformed response in some cases, this workaround kicks in and just returns an empty response rather than returning a proper error to the caller. This patch does 3 things: 1) have malformed responses with no entries return error (-EIO) 2) preserve existing workaround for servers that send empty responses with the EOF marker unset. 3) Add some comments to clarify the logic in nfs_xdr_readdirres(). Signed-off-by: Jeff Layton Signed-off-by: Trond Myklebust commit 4af68bffac444a23f027e18ff244101e63b79227 Author: Fred Isaman Date: Wed Mar 19 11:54:04 2008 -0400 nfs: remove duplicate initializations of nfs_read_data field Signed-off-by: Fred Isaman Signed-off-by: Trond Myklebust commit 6d884e8fc8114dc8877218f06a9a9a1d801901e4 Author: Fred Date: Wed Mar 19 11:24:38 2008 -0400 nfs: nfs_redirty_request Both flush functions have the same error handling routine. Pull it out as a function. Signed-off-by: Fred Isaman Signed-off-by: Trond Myklebust commit 2f42b5d043ee271d1e5d30ecd77186b6c4d4e534 Author: Fred Isaman Date: Thu Mar 13 15:26:30 2008 +0200 NFS: fix encode_fsinfo_maxsz The previous value was not taking into account space for bitmap array size. Signed-off-by: Fred Isaman Signed-off-by: Benny Halevy Signed-off-by: Trond Myklebust commit 98a8e3239427051f5d44f2025b398bdcc3918f37 Author: Trond Myklebust Date: Wed Mar 12 12:25:28 2008 -0400 SUNRPC: Add a helper rpcauth_lookup_generic_cred() The NFSv4 protocol allows clients to negotiate security protocols on the fly in the case where an administrator on the server changes the export settings and/or in the case where we may have a filesystem migration event. Instead of having the NFS client code cache credentials that are tied to a particular AUTH method it is therefore preferable to have a generic credential that can be converted into whatever AUTH is in use by the RPC client when the read/write/sillyrename/... is put on the wire. We do this by means of the new "generic" credential, which basically just caches the minimal information that is needed to look up an RPCSEC_GSS, AUTH_SYS, or AUTH_NULL credential. Signed-off-by: Trond Myklebust commit 5c691044ecbca04dd558fca4c754121689fe1b34 Author: Trond Myklebust Date: Wed Mar 12 16:21:07 2008 -0400 SUNRPC: Add an rpc_credop callback for binding a credential to an rpc_task We need the ability to treat 'generic' creds specially, since they want to bind instances of the auth cred instead of binding themselves. Signed-off-by: Trond Myklebust commit 9a559efd4199c9812d339e23cc1b6055366b224f Author: Trond Myklebust Date: Wed Mar 12 12:24:49 2008 -0400 SUNRPC: Add a generic RPC credential Add an rpc credential that is not tied to any particular auth mechanism, but that can be cached by NFS, and later used to look up a cred for whichever auth mechanism that turns out to be valid when the RPC call is being made. Signed-off-by: Trond Myklebust commit 4ccda2cdd8d156b6f49440653d5d6997e0facf97 Author: Trond Myklebust Date: Wed Mar 12 16:20:55 2008 -0400 SUNRPC: Clean up rpcauth_bindcred() Signed-off-by: Trond Myklebust commit af093835774931de898a9baf7b4041fa0d100f77 Author: Trond Myklebust Date: Wed Mar 12 12:12:16 2008 -0400 SUNRPC: Fix RPCAUTH_LOOKUP_ROOTCREDS The current RPCAUTH_LOOKUP_ROOTCREDS flag only works for AUTH_SYS authentication, and then only as a special case in the code. This patch removes the auth_sys special casing, and replaces it with generic code. Signed-off-by: Trond Myklebust commit 25337fdc85951dfeac944f16cb565904c619077a Author: Trond Myklebust Date: Wed Mar 12 14:40:14 2008 -0400 SUNRPC: Fix a bug in rpcauth_lookup_credcache() The hash bucket is for some reason always being set to zero. Signed-off-by: Trond Myklebust commit 5e4424af9a1f062c6451681dff24a26e27741cc6 Author: Trond Myklebust Date: Mon Feb 25 21:53:49 2008 -0800 SUNRPC: Remove now-redundant RCU-safe rpc_task free path Now that we've tightened up the locking rules for RPC queue wakeups, we can remove the RCU-safe kfree calls... Signed-off-by: Trond Myklebust commit ff2d7db848f8db7cade39e55f78f86d77e0de01a Author: Trond Myklebust Date: Mon Feb 25 21:40:51 2008 -0800 SUNRPC: Ensure that we read all available tcp data Don't stop until we run out of data, or we hit an error. Signed-off-by: Trond Myklebust commit f5fb7b06e4e4ab18326f067f4317b2016ce18af2 Author: Trond Myklebust Date: Mon Feb 25 21:40:50 2008 -0800 SUNRPC: Eliminate the now-redundant rpc_start_wakeup() Signed-off-by: Trond Myklebust commit eb276c0e10187702928aeaa133e1d3dbaf3eafc7 Author: Trond Myklebust Date: Fri Feb 22 17:27:59 2008 -0500 SUNRPC: Switch tasks to using the rpc_waitqueue's timer function Signed-off-by: Trond Myklebust commit 36df9aae3158ce8fc4ede241169dc94ac910d884 Author: Trond Myklebust Date: Wed Jul 18 16:18:52 2007 -0400 SUNRPC: Add a timer function to wait queues. This is designed to replace the timeout timer in the individual rpc_tasks. By putting the timer function in the wait queue, we will eventually be able to reduce the total number of timers in use by the RPC subsystem. Signed-off-by: Trond Myklebust commit f6a1cc89309f0ae847a9b6fe418d1c4215e5bc55 Author: Trond Myklebust Date: Fri Feb 22 17:06:55 2008 -0500 SUNRPC: Add a (empty for the moment) destructor for rpc_wait_queues Signed-off-by: Trond Myklebust commit 5d00837b90340af9106dcd93af75fd664c8eb87f Author: Trond Myklebust Date: Fri Feb 22 16:34:17 2008 -0500 SUNRPC: Run rpc timeout functions as callbacks instead of in softirqs An audit of the current RPC timeout functions shows that they don't really ever need to run in the softirq context. As long as the softirq is able to signal that the wakeup is due to a timeout (which it can do by setting task->tk_status to -ETIMEDOUT) then the callback functions can just run as standard task->tk_callback functions (in the rpciod/process context). The only possible border-line case would be xprt_timer() for the case of UDP, when the callback is used to reduce the size of the transport congestion window. In testing, however, the effect of moving that update to a callback would appear to be minor. Signed-off-by: Trond Myklebust commit fda1393938035559b417dd5b26b9cc293a7aee00 Author: Trond Myklebust Date: Fri Feb 22 16:34:12 2008 -0500 SUNRPC: Convert users of rpc_wake_up_task to use rpc_wake_up_queued_task Signed-off-by: Trond Myklebust commit 96ef13b283934fbf60b732e6c4ce23e8babd0042 Author: Trond Myklebust Date: Fri Feb 22 15:46:41 2008 -0500 SUNRPC: Add a new helper rpc_wake_up_queued_task() In all cases where we currently use rpc_wake_up_task(), we almost always know on which waitqueue the rpc_task is actually sleeping. This will allows us to simplify the queue locking in a future patch. Signed-off-by: Trond Myklebust commit fde95c7554aa77f9a242f32b0b5f8f15395abf52 Author: Trond Myklebust Date: Fri Feb 22 15:09:26 2008 -0500 SUNRPC: Clean up rpc_run_timer() All RPC timeout callback functions are expected to wake the task up. We can enforce this by moving the wakeup back into rpc_run_timer. Signed-off-by: Trond Myklebust commit 101070ca2fe67186f5f5517b66cb4757b17f4e29 Author: Trond Myklebust Date: Tue Feb 19 20:04:23 2008 -0500 NFS: Ensure that the asynchronous RPC calls complete on nfsiod. We want to ensure that rpc_call_ops that involve mntput() are run on nfsiod rather than on rpciod, so that they don't deadlock when the resulting umount calls rpc_shutdown_client(). Hence we specify that read, write and commit calls must complete on nfsiod. Ditto for NFSv4 open, lock, locku and close asynchronous calls. Signed-off-by: Trond Myklebust commit 5746006f1d17d9d5a3015051ea54de4341cb31f9 Author: Trond Myklebust Date: Tue Feb 19 20:04:22 2008 -0500 NFS: Add an nfsiod workqueue NFS post-rpciod cleanups often involve tasks that cannot be safely performed within the rpciod context (due to deadlock concerns). We therefore add a dedicated NFS workqueue that can perform tasks like cleaning up state after an interrupted NFSv4 open() call, or calling put_nfs_open_context() after an asynchronous read or write call. Signed-off-by: Trond Myklebust commit 32bfb5c0f495dd88ef6bac4b76885d0820563739 Author: Trond Myklebust Date: Tue Feb 19 20:04:21 2008 -0500 SUNRPC: Allow the rpc_release() callback to be run on another workqueue A lot of the work done by the rpc_release() callback is inappropriate for rpciod as it will often involve things like starting a new rpc call in order to clean up state after an interrupted NFSv4 open() call, or calls to mntput(), etc. This patch allows the caller of rpc_run_task() to specify that the rpc_release callback should run on a different workqueue than the default rpciod_workqueue. Signed-off-by: Trond Myklebust commit 383ba71938519959be8e0b598ec658f0c211ff45 Author: Trond Myklebust Date: Tue Feb 19 20:04:20 2008 -0500 NFS: Fix a deadlock with lazy umount We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare() and task->tk_ops->rpc_call_done() to call mntput() in any way, since that will cause a deadlock when the call to rpc_shutdown_client() attempts to wait on 'task' to complete. We can avoid the above deadlock by moving calls to mntput to task->tk_ops->rpc_release() callback, since at that time the task will be marked as completed, and so rpc_shutdown_client won't attempt to wait on it. Signed-off-by: Trond Myklebust commit 4b5621f6b127bce9218998c187bd25bf7f9fc371 Author: Trond Myklebust Date: Mon Feb 25 15:56:29 2008 -0800 NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c O_SYNC is stored in filp->f_flags. Thanks to Al Viro for pointing out the bug. Signed-off-by: Trond Myklebust commit cbc20059259edee4ea24c923e6627c394830c972 Author: Trond Myklebust Date: Thu Feb 14 11:11:30 2008 -0500 SUNRPC: Declare as const the rpc_message arguments to rpc_call_sync/async Signed-off-by: Trond Myklebust commit 2785259631697ebb0749a3782cca206e2e542939 Author: Nick Piggin Date: Mon Feb 4 23:48:37 2008 -0800 nfs: use GFP_NOFS preloads for radix-tree insertion NFS should use GFP_NOFS mode radix tree preloads rather than GFP_ATOMIC allocations at radix-tree insertion-time. This is important to reduce the atomic memory requirement. Signed-off-by: Nick Piggin Cc: Trond Myklebust Cc: "J. Bruce Fields" Signed-off-by: Andrew Morton Signed-off-by: Trond Myklebust Documentation/filesystems/nfs-rdma.txt | 252 ++++++++++++++++++++++++++++ fs/Kconfig | 179 ++++++++++---------- fs/lockd/host.c | 93 ++++++----- fs/lockd/mon.c | 113 ++++++++++--- fs/lockd/svc.c | 162 +++++++++--------- fs/lockd/svclock.c | 6 +- fs/lockd/svcshare.c | 3 +- fs/nfs/Makefile | 3 +- fs/nfs/callback.c | 93 ++++++----- fs/nfs/client.c | 16 ++ fs/nfs/dir.c | 2 +- fs/nfs/direct.c | 9 +- fs/nfs/file.c | 6 - fs/nfs/inode.c | 45 +++++- fs/nfs/internal.h | 13 +- fs/nfs/nfs2xdr.c | 37 +++- fs/nfs/nfs3xdr.c | 37 +++- fs/nfs/nfs4proc.c | 18 ++- fs/nfs/nfs4state.c | 8 +- fs/nfs/nfs4xdr.c | 39 +++-- fs/nfs/read.c | 18 +-- fs/nfs/super.c | 130 ++++++++++----- fs/nfs/symlink.c | 1 - fs/nfs/unlink.c | 2 +- fs/nfs/write.c | 86 +++++----- fs/nfsd/auth.c | 1 + fs/nfsd/export.c | 9 +- fs/nfsd/nfs4callback.c | 28 ++-- fs/nfsd/nfs4idmap.c | 2 +- fs/nfsd/nfs4state.c | 71 +++++---- fs/nfsd/nfs4xdr.c | 13 +- fs/nfsd/nfsctl.c | 22 ++- fs/nfsd/nfsfh.c | 228 ++++++++++++++------------ fs/nfsd/nfssvc.c | 2 - fs/nfsd/vfs.c | 4 +- include/linux/exportfs.h | 2 +- include/linux/lockd/lockd.h | 17 +- include/linux/lockd/sm_inter.h | 1 + include/linux/nfs3.h | 2 +- include/linux/nfs_fs_sb.h | 8 + include/linux/nfsd/Kbuild | 4 +- include/linux/nfsd/cache.h | 2 - include/linux/nfsd/nfsd.h | 11 +- include/linux/sunrpc/auth.h | 13 +- include/linux/sunrpc/cache.h | 1 - include/linux/sunrpc/clnt.h | 9 +- include/linux/sunrpc/gss_krb5.h | 6 +- include/linux/sunrpc/sched.h | 41 ++--- include/linux/sunrpc/svc.h | 1 - include/linux/sunrpc/svcauth.h | 5 +- include/net/ipv6.h | 9 + net/sunrpc/Makefile | 2 +- net/sunrpc/auth.c | 63 +++++--- net/sunrpc/auth_generic.c | 157 ++++++++++++++++++ net/sunrpc/auth_gss/auth_gss.c | 7 +- net/sunrpc/auth_gss/gss_generic_token.c | 4 +- net/sunrpc/auth_gss/gss_krb5_crypto.c | 6 +- net/sunrpc/auth_gss/gss_krb5_seal.c | 9 +- net/sunrpc/auth_gss/gss_krb5_seqnum.c | 4 +- net/sunrpc/auth_gss/gss_krb5_unseal.c | 2 +- net/sunrpc/auth_gss/gss_krb5_wrap.c | 8 +- net/sunrpc/auth_gss/gss_spkm3_seal.c | 4 +- net/sunrpc/auth_gss/svcauth_gss.c | 9 +- net/sunrpc/auth_null.c | 1 + net/sunrpc/auth_unix.c | 57 +++---- net/sunrpc/cache.c | 1 - net/sunrpc/clnt.c | 8 +- net/sunrpc/rpcb_clnt.c | 2 +- net/sunrpc/sched.c | 264 ++++++++++++++---------------- net/sunrpc/svc.c | 15 +-- net/sunrpc/svc_xprt.c | 30 +++- net/sunrpc/svcauth_unix.c | 118 +++++++++----- net/sunrpc/svcsock.c | 1 + net/sunrpc/xprt.c | 45 +++--- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 + net/sunrpc/xprtsock.c | 7 +- 76 files changed, 1731 insertions(+), 978 deletions(-) diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt new file mode 100644 index 0000000..1ae3487 --- /dev/null +++ b/Documentation/filesystems/nfs-rdma.txt @@ -0,0 +1,252 @@ +################################################################################ +# # +# NFS/RDMA README # +# # +################################################################################ + + Author: NetApp and Open Grid Computing + Date: February 25, 2008 + +Table of Contents +~~~~~~~~~~~~~~~~~ + - Overview + - Getting Help + - Installation + - Check RDMA and NFS Setup + - NFS/RDMA Setup + +Overview +~~~~~~~~ + + This document describes how to install and setup the Linux NFS/RDMA client + and server software. + + The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server + was first included in the following release, Linux 2.6.25. + + In our testing, we have obtained excellent performance results (full 10Gbit + wire bandwidth at minimal client CPU) under many workloads. The code passes + the full Connectathon test suite and operates over both Infiniband and iWARP + RDMA adapters. + +Getting Help +~~~~~~~~~~~~ + + If you get stuck, you can ask questions on the + + nfs-rdma-devel@lists.sourceforge.net + + mailing list. + +Installation +~~~~~~~~~~~~ + + These instructions are a step by step guide to building a machine for + use with NFS/RDMA. + + - Install an RDMA device + + Any device supported by the drivers in drivers/infiniband/hw is acceptable. + + Testing has been performed using several Mellanox-based IB cards, the + Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter. + + - Install a Linux distribution and tools + + The first kernel release to contain both the NFS/RDMA client and server was + Linux 2.6.25 Therefore, a distribution compatible with this and subsequent + Linux kernel release should be installed. + + The procedures described in this document have been tested with + distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). + + - Install nfs-utils-1.1.1 or greater on the client + + An NFS/RDMA mount point can only be obtained by using the mount.nfs + command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs + you are using, type: + + > /sbin/mount.nfs -V + + If the version is less than 1.1.1 or the command does not exist, + then you will need to install the latest version of nfs-utils. + + Download the latest package from: + + http://www.kernel.org/pub/linux/utils/nfs + + Uncompress the package and follow the installation instructions. + + If you will not be using GSS and NFSv4, the installation process + can be simplified by disabling these features when running configure: + + > ./configure --disable-gss --disable-nfsv4 + + For more information on this see the package's README and INSTALL files. + + After building the nfs-utils package, there will be a mount.nfs binary in + the utils/mount directory. This binary can be used to initiate NFS v2, v3, + or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4. + The standard technique is to create a symlink called mount.nfs4 to mount.nfs. + + NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed + on the NFS client machine. You do not need this specific version of + nfs-utils on the server. Furthermore, only the mount.nfs command from + nfs-utils-1.1.1 is needed on the client. + + - Install a Linux kernel with NFS/RDMA + + The NFS/RDMA client and server are both included in the mainline Linux + kernel version 2.6.25 and later. This and other versions of the 2.6 Linux + kernel can be found at: + + ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ + + Download the sources and place them in an appropriate location. + + - Configure the RDMA stack + + Make sure your kernel configuration has RDMA support enabled. Under + Device Drivers -> InfiniBand support, update the kernel configuration + to enable InfiniBand support [NOTE: the option name is misleading. Enabling + InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)]. + + Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or + iWARP adapter support (amso, cxgb3, etc.). + + If you are using InfiniBand, be sure to enable IP-over-InfiniBand support. + + - Configure the NFS client and server + + Your kernel configuration must also have NFS file system support and/or + NFS server support enabled. These and other NFS related configuration + options can be found under File Systems -> Network File Systems. + + - Build, install, reboot + + The NFS/RDMA code will be enabled automatically if NFS and RDMA + are turned on. The NFS/RDMA client and server are configured via the hidden + SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The + value of SUNRPC_XPRT_RDMA will be: + + - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client + and server will not be built + - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M, + in this case the NFS/RDMA client and server will be built as modules + - Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client + and server will be built into the kernel + + Therefore, if you have followed the steps above and turned no NFS and RDMA, + the NFS/RDMA client and server will be built. + + Build a new kernel, install it, boot it. + +Check RDMA and NFS Setup +~~~~~~~~~~~~~~~~~~~~~~~~ + + Before configuring the NFS/RDMA software, it is a good idea to test + your new kernel to ensure that the kernel is working correctly. + In particular, it is a good idea to verify that the RDMA stack + is functioning as expected and standard NFS over TCP/IP and/or UDP/IP + is working properly. + + - Check RDMA Setup + + If you built the RDMA components as modules, load them at + this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel + card: + + > modprobe ib_mthca + > modprobe ib_ipoib + + If you are using InfiniBand, make sure there is a Subnet Manager (SM) + running on the network. If your IB switch has an embedded SM, you can + use it. Otherwise, you will need to run an SM, such as OpenSM, on one + of your end nodes. + + If an SM is running on your network, you should see the following: + + > cat /sys/class/infiniband/driverX/ports/1/state + 4: ACTIVE + + where driverX is mthca0, ipath5, ehca3, etc. + + To further test the InfiniBand software stack, use IPoIB (this + assumes you have two IB hosts named host1 and host2): + + host1> ifconfig ib0 a.b.c.x + host2> ifconfig ib0 a.b.c.y + host1> ping a.b.c.y + host2> ping a.b.c.x + + For other device types, follow the appropriate procedures. + + - Check NFS Setup + + For the NFS components enabled above (client and/or server), + test their functionality over standard Ethernet using TCP/IP or UDP/IP. + +NFS/RDMA Setup +~~~~~~~~~~~~~~ + + We recommend that you use two machines, one to act as the client and + one to act as the server. + + One time configuration: + + - On the server system, configure the /etc/exports file and + start the NFS/RDMA server. + + Exports entries with the following format have been tested: + + /vol0 10.97.103.47(rw,async) 192.168.0.47(rw,async,insecure,no_root_squash) + + Here the first IP address is the client's Ethernet address and the second + IP address is the clients IPoIB address. + + Each time a machine boots: + + - Load and configure the RDMA drivers + + For InfiniBand using a Mellanox adapter: + + > modprobe ib_mthca + > modprobe ib_ipoib + > ifconfig ib0 a.b.c.d + + NOTE: use unique addresses for the client and server + + - Start the NFS server + + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), + load the RDMA transport module: + + > modprobe svcrdma + + Regardless of how the server was built (module or built-in), start the server: + + > /etc/init.d/nfs start + + or + + > service nfs start + + Instruct the server to listen on the RDMA transport: + + > echo rdma 2050 > /proc/fs/nfsd/portlist + + - On the client system + + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), + load the RDMA client module: + + > modprobe xprtrdma.ko + + Regardless of how the client was built (module or built-in), issue the mount.nfs command: + + > /path/to/your/mount.nfs :/ /mnt -i -o rdma,port=2050 + + To verify that the mount is using RDMA, run "cat /proc/mounts" and check the + "proto" field for the given mount. + + Congratulations! You're using NFS/RDMA! diff --git a/fs/Kconfig b/fs/Kconfig index d731282..846a25f 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -411,7 +411,7 @@ config JFS_STATISTICS to be made available to the user in the /proc/fs/jfs/ directory. config FS_POSIX_ACL -# Posix ACL utility routines (for now, only ext2/ext3/jfs/reiserfs) +# Posix ACL utility routines (for now, only ext2/ext3/jfs/reiserfs/nfs4) # # NOTE: you can implement Posix ACLs without these helpers (XFS does). # Never use this symbol for ifdefs. @@ -1637,105 +1637,86 @@ config NFS_V4 If unsure, say N. -config NFS_DIRECTIO - bool "Allow direct I/O on NFS files" - depends on NFS_FS - help - This option enables applications to perform uncached I/O on files - in NFS file systems using the O_DIRECT open() flag. When O_DIRECT - is set for a file, its data is not cached in the system's page - cache. Data is moved to and from user-level application buffers - directly. Unlike local disk-based file systems, NFS O_DIRECT has - no alignment restrictions. - - Unless your program is designed to use O_DIRECT properly, you are - much better off allowing the NFS client to manage data caching for - you. Misusing O_DIRECT can cause poor server performance or network - storms. This kernel build option defaults OFF to avoid exposing - system administrators unwittingly to a potentially hazardous - feature. - - For more details on NFS O_DIRECT, see fs/nfs/direct.c. - - If unsure, say N. This reduces the size of the NFS client, and - causes open() to return EINVAL if a file residing in NFS is - opened with the O_DIRECT flag. - config NFSD tristate "NFS server support" depends on INET select LOCKD select SUNRPC select EXPORTFS - select NFSD_V2_ACL if NFSD_V3_ACL select NFS_ACL_SUPPORT if NFSD_V2_ACL - select NFSD_TCP if NFSD_V4 - select CRYPTO_MD5 if NFSD_V4 - select CRYPTO if NFSD_V4 - select FS_POSIX_ACL if NFSD_V4 - select PROC_FS if NFSD_V4 - select PROC_FS if SUNRPC_GSS - help - If you want your Linux box to act as an NFS *server*, so that other - computers on your local network which support NFS can access certain - directories on your box transparently, you have two options: you can - use the self-contained user space program nfsd, in which case you - should say N here, or you can say Y and use the kernel based NFS - server. The advantage of the kernel based solution is that it is - faster. - - In either case, you will need support software; the respective - locations are given in the file in the - NFS section. - - If you say Y here, you will get support for version 2 of the NFS - protocol (NFSv2). If you also want NFSv3, say Y to the next question - as well. - - Please read the NFS-HOWTO, available from - . - - To compile the NFS server support as a module, choose M here: the - module will be called nfsd. If unsure, say N. + help + Choose Y here if you want to allow other computers to access + files residing on this system using Sun's Network File System + protocol. To compile the NFS server support as a module, + choose M here: the module will be called nfsd. + + You may choose to use a user-space NFS server instead, in which + case you can choose N here. + + To export local file systems using NFS, you also need to install + user space programs which can be found in the Linux nfs-utils + package, available from http://linux-nfs.org/. More detail about + the Linux NFS server implementation is available via the + exports(5) man page. + + Below you can choose which versions of the NFS protocol are + available to clients mounting the NFS server on this system. + Support for NFS version 2 (RFC 1094) is always available when + CONFIG_NFSD is selected. + + If unsure, say N. config NFSD_V2_ACL bool depends on NFSD config NFSD_V3 - bool "Provide NFSv3 server support" + bool "NFS server support for NFS version 3" depends on NFSD help - If you would like to include the NFSv3 server as well as the NFSv2 - server, say Y here. If unsure, say Y. + This option enables support in your system's NFS server for + version 3 of the NFS protocol (RFC 1813). + + If unsure, say Y. config NFSD_V3_ACL - bool "Provide server support for the NFSv3 ACL protocol extension" + bool "NFS server support for the NFSv3 ACL protocol extension" depends on NFSD_V3 + select NFSD_V2_ACL help - Implement the NFSv3 ACL protocol extension for manipulating POSIX - Access Control Lists on exported file systems. NFS clients should - be compiled with the NFSv3 ACL protocol extension; see the - CONFIG_NFS_V3_ACL option. If unsure, say N. + Solaris NFS servers support an auxiliary NFSv3 ACL protocol that + never became an official part of the NFS version 3 protocol. + This protocol extension allows applications on NFS clients to + manipulate POSIX Access Control Lists on files residing on NFS + servers. NFS servers enforce POSIX ACLs on local files whether + this protocol is available or not. + + This option enables support in your system's NFS server for the + NFSv3 ACL protocol extension allowing NFS clients to manipulate + POSIX ACLs on files exported by your system's NFS server. NFS + clients which support the Solaris NFSv3 ACL protocol can then + access and modify ACLs on your NFS server. + + To store ACLs on your NFS server, you also need to enable ACL- + related CONFIG options for your local file systems of choice. + + If unsure, say N. config NFSD_V4 - bool "Provide NFSv4 server support (EXPERIMENTAL)" - depends on NFSD && NFSD_V3 && EXPERIMENTAL + bool "NFS server support for NFS version 4 (EXPERIMENTAL)" + depends on NFSD && PROC_FS && EXPERIMENTAL + select NFSD_V3 + select FS_POSIX_ACL select RPCSEC_GSS_KRB5 help - If you would like to include the NFSv4 server as well as the NFSv2 - and NFSv3 servers, say Y here. This feature is experimental, and - should only be used if you are interested in helping to test NFSv4. - If unsure, say N. + This option enables support in your system's NFS server for + version 4 of the NFS protocol (RFC 3530). -config NFSD_TCP - bool "Provide NFS server over TCP support" - depends on NFSD - default y - help - If you want your NFS server to support TCP connections, say Y here. - TCP connections usually perform better than the default UDP when - the network is lossy or congested. If unsure, say Y. + To export files using NFSv4, you need to install additional user + space programs which can be found in the Linux nfs-utils package, + available from http://linux-nfs.org/. + + If unsure, say N. config ROOT_NFS bool "Root file system on NFS" @@ -1781,15 +1762,33 @@ config SUNRPC_XPRT_RDMA tristate depends on SUNRPC && INFINIBAND && EXPERIMENTAL default SUNRPC && INFINIBAND + help + This option enables an RPC client transport capability that + allows the NFS client to mount servers via an RDMA-enabled + transport. + + To compile RPC client RDMA transport support as a module, + choose M here: the module will be called xprtrdma. + + If unsure, say N. config SUNRPC_BIND34 bool "Support for rpcbind versions 3 & 4 (EXPERIMENTAL)" depends on SUNRPC && EXPERIMENTAL + default n help - Provides kernel support for querying rpcbind servers via versions 3 - and 4 of the rpcbind protocol. The kernel automatically falls back - to version 2 if a remote rpcbind service does not support versions - 3 or 4. + RPC requests over IPv6 networks require support for larger + addresses when performing an RPC bind. Sun added support for + IPv6 addressing by creating two new versions of the rpcbind + protocol (RFC 1833). + + This option enables support in the kernel RPC client for + querying rpcbind servers via versions 3 and 4 of the rpcbind + protocol. The kernel automatically falls back to version 2 + if a remote rpcbind service does not support versions 3 or 4. + By themselves, these new versions do not provide support for + RPC over IPv6, but the new protocol versions are necessary to + support it. If unsure, say N to get traditional behavior (version 2 rpcbind requests only). @@ -1803,12 +1802,13 @@ config RPCSEC_GSS_KRB5 select CRYPTO_DES select CRYPTO_CBC help - Provides for secure RPC calls by means of a gss-api - mechanism based on Kerberos V5. This is required for - NFSv4. + Choose Y here to enable Secure RPC using the Kerberos version 5 + GSS-API mechanism (RFC 1964). - Note: Requires an auxiliary userspace daemon which may be found on - http://www.citi.umich.edu/projects/nfsv4/ + Secure RPC calls with Kerberos require an auxiliary user-space + daemon which may be found in the Linux nfs-utils package + available from http://linux-nfs.org/. In addition, user-space + Kerberos support should be installed. If unsure, say N. @@ -1822,11 +1822,12 @@ config RPCSEC_GSS_SPKM3 select CRYPTO_CAST5 select CRYPTO_CBC help - Provides for secure RPC calls by means of a gss-api - mechanism based on the SPKM3 public-key mechanism. + Choose Y here to enable Secure RPC using the SPKM3 public key + GSS-API mechansim (RFC 2025). - Note: Requires an auxiliary userspace daemon which may be found on - http://www.citi.umich.edu/projects/nfsv4/ + Secure RPC calls with SPKM3 require an auxiliary userspace + daemon which may be found in the Linux nfs-utils package + available from http://linux-nfs.org/. If unsure, say N. diff --git a/fs/lockd/host.c b/fs/lockd/host.c index f1ef49f..a17664c 100644 --- a/fs/lockd/host.c +++ b/fs/lockd/host.c @@ -19,12 +19,11 @@ #define NLMDBG_FACILITY NLMDBG_HOSTCACHE -#define NLM_HOST_MAX 64 #define NLM_HOST_NRHASH 32 #define NLM_ADDRHASH(addr) (ntohl(addr) & (NLM_HOST_NRHASH-1)) #define NLM_HOST_REBIND (60 * HZ) -#define NLM_HOST_EXPIRE ((nrhosts > NLM_HOST_MAX)? 300 * HZ : 120 * HZ) -#define NLM_HOST_COLLECT ((nrhosts > NLM_HOST_MAX)? 120 * HZ : 60 * HZ) +#define NLM_HOST_EXPIRE (300 * HZ) +#define NLM_HOST_COLLECT (120 * HZ) static struct hlist_head nlm_hosts[NLM_HOST_NRHASH]; static unsigned long next_gc; @@ -42,11 +41,12 @@ static struct nsm_handle * nsm_find(const struct sockaddr_in *sin, /* * Common host lookup routine for server & client */ -static struct nlm_host * -nlm_lookup_host(int server, const struct sockaddr_in *sin, - int proto, int version, const char *hostname, - unsigned int hostname_len, - const struct sockaddr_in *ssin) +static struct nlm_host *nlm_lookup_host(int server, + const struct sockaddr_in *sin, + int proto, u32 version, + const char *hostname, + unsigned int hostname_len, + const struct sockaddr_in *ssin) { struct hlist_head *chain; struct hlist_node *pos; @@ -55,7 +55,7 @@ nlm_lookup_host(int server, const struct sockaddr_in *sin, int hash; dprintk("lockd: nlm_lookup_host("NIPQUAD_FMT"->"NIPQUAD_FMT - ", p=%d, v=%d, my role=%s, name=%.*s)\n", + ", p=%d, v=%u, my role=%s, name=%.*s)\n", NIPQUAD(ssin->sin_addr.s_addr), NIPQUAD(sin->sin_addr.s_addr), proto, version, server? "server" : "client", @@ -142,9 +142,7 @@ nlm_lookup_host(int server, const struct sockaddr_in *sin, INIT_LIST_HEAD(&host->h_granted); INIT_LIST_HEAD(&host->h_reclaim); - if (++nrhosts > NLM_HOST_MAX) - next_gc = 0; - + nrhosts++; out: mutex_unlock(&nlm_host_mutex); return host; @@ -175,9 +173,10 @@ nlm_destroy_host(struct nlm_host *host) /* * Find an NLM server handle in the cache. If there is none, create it. */ -struct nlm_host * -nlmclnt_lookup_host(const struct sockaddr_in *sin, int proto, int version, - const char *hostname, unsigned int hostname_len) +struct nlm_host *nlmclnt_lookup_host(const struct sockaddr_in *sin, + int proto, u32 version, + const char *hostname, + unsigned int hostname_len) { struct sockaddr_in ssin = {0}; @@ -460,7 +459,7 @@ nlm_gc_hosts(void) * Manage NSM handles */ static LIST_HEAD(nsm_handles); -static DEFINE_MUTEX(nsm_mutex); +static DEFINE_SPINLOCK(nsm_lock); static struct nsm_handle * __nsm_find(const struct sockaddr_in *sin, @@ -468,7 +467,7 @@ __nsm_find(const struct sockaddr_in *sin, int create) { struct nsm_handle *nsm = NULL; - struct list_head *pos; + struct nsm_handle *pos; if (!sin) return NULL; @@ -482,38 +481,43 @@ __nsm_find(const struct sockaddr_in *sin, return NULL; } - mutex_lock(&nsm_mutex); - list_for_each(pos, &nsm_handles) { - nsm = list_entry(pos, struct nsm_handle, sm_link); +retry: + spin_lock(&nsm_lock); + list_for_each_entry(pos, &nsm_handles, sm_link) { if (hostname && nsm_use_hostnames) { - if (strlen(nsm->sm_name) != hostname_len - || memcmp(nsm->sm_name, hostname, hostname_len)) + if (strlen(pos->sm_name) != hostname_len + || memcmp(pos->sm_name, hostname, hostname_len)) continue; - } else if (!nlm_cmp_addr(&nsm->sm_addr, sin)) + } else if (!nlm_cmp_addr(&pos->sm_addr, sin)) continue; - atomic_inc(&nsm->sm_count); - goto out; + atomic_inc(&pos->sm_count); + kfree(nsm); + nsm = pos; + goto found; } - - if (!create) { - nsm = NULL; - goto out; + if (nsm) { + list_add(&nsm->sm_link, &nsm_handles); + goto found; } + spin_unlock(&nsm_lock); + + if (!create) + return NULL; nsm = kzalloc(sizeof(*nsm) + hostname_len + 1, GFP_KERNEL); - if (nsm != NULL) { - nsm->sm_addr = *sin; - nsm->sm_name = (char *) (nsm + 1); - memcpy(nsm->sm_name, hostname, hostname_len); - nsm->sm_name[hostname_len] = '\0'; - atomic_set(&nsm->sm_count, 1); + if (nsm == NULL) + return NULL; - list_add(&nsm->sm_link, &nsm_handles); - } + nsm->sm_addr = *sin; + nsm->sm_name = (char *) (nsm + 1); + memcpy(nsm->sm_name, hostname, hostname_len); + nsm->sm_name[hostname_len] = '\0'; + atomic_set(&nsm->sm_count, 1); + goto retry; -out: - mutex_unlock(&nsm_mutex); +found: + spin_unlock(&nsm_lock); return nsm; } @@ -532,12 +536,9 @@ nsm_release(struct nsm_handle *nsm) { if (!nsm) return; - if (atomic_dec_and_test(&nsm->sm_count)) { - mutex_lock(&nsm_mutex); - if (atomic_read(&nsm->sm_count) == 0) { - list_del(&nsm->sm_link); - kfree(nsm); - } - mutex_unlock(&nsm_mutex); + if (atomic_dec_and_lock(&nsm->sm_count, &nsm_lock)) { + list_del(&nsm->sm_link); + spin_unlock(&nsm_lock); + kfree(nsm); } } diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c index 908b23f..e4d5635 100644 --- a/fs/lockd/mon.c +++ b/fs/lockd/mon.c @@ -18,6 +18,8 @@ #define NLMDBG_FACILITY NLMDBG_MONITOR +#define XDR_ADDRBUF_LEN (20) + static struct rpc_clnt * nsm_create(void); static struct rpc_program nsm_program; @@ -147,28 +149,55 @@ nsm_create(void) /* * XDR functions for NSM. + * + * See http://www.opengroup.org/ for details on the Network + * Status Monitor wire protocol. */ -static __be32 * -xdr_encode_common(struct rpc_rqst *rqstp, __be32 *p, struct nsm_args *argp) +static __be32 *xdr_encode_nsm_string(__be32 *p, char *string) { - char buffer[20], *name; - - /* - * Use the dotted-quad IP address of the remote host as - * identifier. Linux statd always looks up the canonical - * hostname first for whatever remote hostname it receives, - * so this works alright. - */ - if (nsm_use_hostnames) { - name = argp->mon_name; - } else { - sprintf(buffer, "%u.%u.%u.%u", NIPQUAD(argp->addr)); + size_t len = strlen(string); + + if (len > SM_MAXSTRLEN) + len = SM_MAXSTRLEN; + return xdr_encode_opaque(p, string, len); +} + +/* + * "mon_name" specifies the host to be monitored. + * + * Linux uses a text version of the IP address of the remote + * host as the host identifier (the "mon_name" argument). + * + * Linux statd always looks up the canonical hostname first for + * whatever remote hostname it receives, so this works alright. + */ +static __be32 *xdr_encode_mon_name(__be32 *p, struct nsm_args *argp) +{ + char buffer[XDR_ADDRBUF_LEN + 1]; + char *name = argp->mon_name; + + if (!nsm_use_hostnames) { + snprintf(buffer, XDR_ADDRBUF_LEN, + NIPQUAD_FMT, NIPQUAD(argp->addr)); name = buffer; } - if (!(p = xdr_encode_string(p, name)) - || !(p = xdr_encode_string(p, utsname()->nodename))) + + return xdr_encode_nsm_string(p, name); +} + +/* + * The "my_id" argument specifies the hostname and RPC procedure + * to be called when the status manager receives notification + * (via the SM_NOTIFY call) that the state of host "mon_name" + * has changed. + */ +static __be32 *xdr_encode_my_id(__be32 *p, struct nsm_args *argp) +{ + p = xdr_encode_nsm_string(p, utsname()->nodename); + if (!p) return ERR_PTR(-EIO); + *p++ = htonl(argp->prog); *p++ = htonl(argp->vers); *p++ = htonl(argp->proc); @@ -176,18 +205,48 @@ xdr_encode_common(struct rpc_rqst *rqstp, __be32 *p, struct nsm_args *argp) return p; } -static int -xdr_encode_mon(struct rpc_rqst *rqstp, __be32 *p, struct nsm_args *argp) +/* + * The "mon_id" argument specifies the non-private arguments + * of an SM_MON or SM_UNMON call. + */ +static __be32 *xdr_encode_mon_id(__be32 *p, struct nsm_args *argp) { - p = xdr_encode_common(rqstp, p, argp); - if (IS_ERR(p)) - return PTR_ERR(p); + p = xdr_encode_mon_name(p, argp); + if (!p) + return ERR_PTR(-EIO); - /* Surprise - there may even be room for an IPv6 address now */ + return xdr_encode_my_id(p, argp); +} + +/* + * The "priv" argument may contain private information required + * by the SM_MON call. This information will be supplied in the + * SM_NOTIFY call. + * + * Linux provides the raw IP address of the monitored host, + * left in network byte order. + */ +static __be32 *xdr_encode_priv(__be32 *p, struct nsm_args *argp) +{ *p++ = argp->addr; *p++ = 0; *p++ = 0; *p++ = 0; + + return p; +} + +static int +xdr_encode_mon(struct rpc_rqst *rqstp, __be32 *p, struct nsm_args *argp) +{ + p = xdr_encode_mon_id(p, argp); + if (IS_ERR(p)) + return PTR_ERR(p); + + p = xdr_encode_priv(p, argp); + if (IS_ERR(p)) + return PTR_ERR(p); + rqstp->rq_slen = xdr_adjust_iovec(rqstp->rq_svec, p); return 0; } @@ -195,7 +254,7 @@ xdr_encode_mon(struct rpc_rqst *rqstp, __be32 *p, struct nsm_args *argp) static int xdr_encode_unmon(struct rpc_rqst *rqstp, __be32 *p, struct nsm_args *argp) { - p = xdr_encode_common(rqstp, p, argp); + p = xdr_encode_mon_id(p, argp); if (IS_ERR(p)) return PTR_ERR(p); rqstp->rq_slen = xdr_adjust_iovec(rqstp->rq_svec, p); @@ -220,9 +279,11 @@ xdr_decode_stat(struct rpc_rqst *rqstp, __be32 *p, struct nsm_res *resp) } #define SM_my_name_sz (1+XDR_QUADLEN(SM_MAXSTRLEN)) -#define SM_my_id_sz (3+1+SM_my_name_sz) -#define SM_mon_id_sz (1+XDR_QUADLEN(20)+SM_my_id_sz) -#define SM_mon_sz (SM_mon_id_sz+4) +#define SM_my_id_sz (SM_my_name_sz+3) +#define SM_mon_name_sz (1+XDR_QUADLEN(SM_MAXSTRLEN)) +#define SM_mon_id_sz (SM_mon_name_sz+SM_my_id_sz) +#define SM_priv_sz (XDR_QUADLEN(SM_PRIV_SIZE)) +#define SM_mon_sz (SM_mon_id_sz+SM_priv_sz) #define SM_monres_sz 2 #define SM_unmonres_sz 1 diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 1ed8bd4..2169af4 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -48,14 +49,11 @@ EXPORT_SYMBOL(nlmsvc_ops); static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; -static pid_t nlmsvc_pid; +static struct task_struct *nlmsvc_task; static struct svc_serv *nlmsvc_serv; int nlmsvc_grace_period; unsigned long nlmsvc_timeout; -static DECLARE_COMPLETION(lockd_start_done); -static DECLARE_WAIT_QUEUE_HEAD(lockd_exit); - /* * These can be set at insmod time (useful for NFS as root filesystem), * and also changed through the sysctl interface. -- Jamie Lokier, Aug 2003 @@ -74,7 +72,9 @@ static const unsigned long nlm_timeout_min = 3; static const unsigned long nlm_timeout_max = 20; static const int nlm_port_min = 0, nlm_port_max = 65535; +#ifdef CONFIG_SYSCTL static struct ctl_table_header * nlm_sysctl_table; +#endif static unsigned long get_lockd_grace_period(void) { @@ -111,35 +111,30 @@ static inline void clear_grace_period(void) /* * This is the lockd kernel thread */ -static void -lockd(struct svc_rqst *rqstp) +static int +lockd(void *vrqstp) { - int err = 0; + int err = 0, preverr = 0; + struct svc_rqst *rqstp = vrqstp; unsigned long grace_period_expire; - /* Lock module and set up kernel thread */ - /* lockd_up is waiting for us to startup, so will - * be holding a reference to this module, so it - * is safe to just claim another reference - */ - __module_get(THIS_MODULE); - lock_kernel(); - - /* - * Let our maker know we're running. - */ - nlmsvc_pid = current->pid; - nlmsvc_serv = rqstp->rq_server; - complete(&lockd_start_done); - - daemonize("lockd"); + /* try_to_freeze() is called from svc_recv() */ set_freezable(); - /* Process request with signals blocked, but allow SIGKILL. */ + /* Allow SIGKILL to tell lockd to drop all of its locks */ allow_signal(SIGKILL); dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n"); + /* + * FIXME: it would be nice if lockd didn't spend its entire life + * running under the BKL. At the very least, it would be good to + * have someone clarify what it's intended to protect here. I've + * seen some handwavy posts about posix locking needing to be + * done under the BKL, but it's far from clear. + */ + lock_kernel(); + if (!nlm_timeout) nlm_timeout = LOCKD_DFLT_TIMEO; nlmsvc_timeout = nlm_timeout * HZ; @@ -148,10 +143,9 @@ lockd(struct svc_rqst *rqstp) /* * The main request loop. We don't terminate until the last - * NFS mount or NFS daemon has gone away, and we've been sent a - * signal, or else another process has taken over our job. + * NFS mount or NFS daemon has gone away. */ - while ((nlmsvc_users || !signalled()) && nlmsvc_pid == current->pid) { + while (!kthread_should_stop()) { long timeout = MAX_SCHEDULE_TIMEOUT; RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]); @@ -161,6 +155,7 @@ lockd(struct svc_rqst *rqstp) nlmsvc_invalidate_all(); grace_period_expire = set_grace_period(); } + continue; } /* @@ -179,14 +174,20 @@ lockd(struct svc_rqst *rqstp) * recvfrom routine. */ err = svc_recv(rqstp, timeout); - if (err == -EAGAIN || err == -EINTR) + if (err == -EAGAIN || err == -EINTR) { + preverr = err; continue; + } if (err < 0) { - printk(KERN_WARNING - "lockd: terminating on error %d\n", - -err); - break; + if (err != preverr) { + printk(KERN_WARNING "%s: unexpected error " + "from svc_recv (%d)\n", __func__, err); + preverr = err; + } + schedule_timeout_interruptible(HZ); + continue; } + preverr = err; dprintk("lockd: request from %s\n", svc_print_addr(rqstp, buf, sizeof(buf))); @@ -195,28 +196,19 @@ lockd(struct svc_rqst *rqstp) } flush_signals(current); + if (nlmsvc_ops) + nlmsvc_invalidate_all(); + nlm_shutdown_hosts(); - /* - * Check whether there's a new lockd process before - * shutting down the hosts and clearing the slot. - */ - if (!nlmsvc_pid || current->pid == nlmsvc_pid) { - if (nlmsvc_ops) - nlmsvc_invalidate_all(); - nlm_shutdown_hosts(); - nlmsvc_pid = 0; - nlmsvc_serv = NULL; - } else - printk(KERN_DEBUG - "lockd: new process, skipping host shutdown\n"); - wake_up(&lockd_exit); + unlock_kernel(); + + nlmsvc_task = NULL; + nlmsvc_serv = NULL; /* Exit the RPC thread */ svc_exit_thread(rqstp); - /* Release module */ - unlock_kernel(); - module_put_and_exit(0); + return 0; } /* @@ -261,14 +253,15 @@ static int make_socks(struct svc_serv *serv, int proto) int lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */ { - struct svc_serv * serv; - int error = 0; + struct svc_serv *serv; + struct svc_rqst *rqstp; + int error = 0; mutex_lock(&nlmsvc_mutex); /* * Check whether we're already up and running. */ - if (nlmsvc_pid) { + if (nlmsvc_serv) { if (proto) error = make_socks(nlmsvc_serv, proto); goto out; @@ -295,13 +288,28 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */ /* * Create the kernel thread and wait for it to start. */ - error = svc_create_thread(lockd, serv); - if (error) { + rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]); + if (IS_ERR(rqstp)) { + error = PTR_ERR(rqstp); + printk(KERN_WARNING + "lockd_up: svc_rqst allocation failed, error=%d\n", + error); + goto destroy_and_out; + } + + svc_sock_update_bufs(serv); + nlmsvc_serv = rqstp->rq_server; + + nlmsvc_task = kthread_run(lockd, rqstp, serv->sv_name); + if (IS_ERR(nlmsvc_task)) { + error = PTR_ERR(nlmsvc_task); + nlmsvc_task = NULL; + nlmsvc_serv = NULL; printk(KERN_WARNING - "lockd_up: create thread failed, error=%d\n", error); + "lockd_up: kthread_run failed, error=%d\n", error); + svc_exit_thread(rqstp); goto destroy_and_out; } - wait_for_completion(&lockd_start_done); /* * Note: svc_serv structures have an initial use count of 1, @@ -323,42 +331,28 @@ EXPORT_SYMBOL(lockd_up); void lockd_down(void) { - static int warned; - mutex_lock(&nlmsvc_mutex); if (nlmsvc_users) { if (--nlmsvc_users) goto out; - } else - printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid); - - if (!nlmsvc_pid) { - if (warned++ == 0) - printk(KERN_WARNING "lockd_down: no lockd running.\n"); - goto out; + } else { + printk(KERN_ERR "lockd_down: no users! task=%p\n", + nlmsvc_task); + BUG(); } - warned = 0; - kill_proc(nlmsvc_pid, SIGKILL, 1); - /* - * Wait for the lockd process to exit, but since we're holding - * the lockd semaphore, we can't wait around forever ... - */ - clear_thread_flag(TIF_SIGPENDING); - interruptible_sleep_on_timeout(&lockd_exit, HZ); - if (nlmsvc_pid) { - printk(KERN_WARNING - "lockd_down: lockd failed to exit, clearing pid\n"); - nlmsvc_pid = 0; + if (!nlmsvc_task) { + printk(KERN_ERR "lockd_down: no lockd running.\n"); + BUG(); } - spin_lock_irq(¤t->sighand->siglock); - recalc_sigpending(); - spin_unlock_irq(¤t->sighand->siglock); + kthread_stop(nlmsvc_task); out: mutex_unlock(&nlmsvc_mutex); } EXPORT_SYMBOL(lockd_down); +#ifdef CONFIG_SYSCTL + /* * Sysctl parameters (same as module parameters, different interface). */ @@ -443,6 +437,8 @@ static ctl_table nlm_sysctl_root[] = { { .ctl_name = 0 } }; +#endif /* CONFIG_SYSCTL */ + /* * Module (and sysfs) parameters. */ @@ -516,15 +512,21 @@ module_param(nsm_use_hostnames, bool, 0644); static int __init init_nlm(void) { +#ifdef CONFIG_SYSCTL nlm_sysctl_table = register_sysctl_table(nlm_sysctl_root); return nlm_sysctl_table ? 0 : -ENOMEM; +#else + return 0; +#endif } static void __exit exit_nlm(void) { /* FIXME: delete all NLM clients */ nlm_shutdown_hosts(); +#ifdef CONFIG_SYSCTL unregister_sysctl_table(nlm_sysctl_table); +#endif } module_init(init_nlm); diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index fe9bdb4..1f122c1 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -29,6 +29,7 @@ #include #include #include +#include #define NLMDBG_FACILITY NLMDBG_SVCLOCK @@ -226,8 +227,7 @@ failed: } /* - * Delete a block. If the lock was cancelled or the grant callback - * failed, unlock is set to 1. + * Delete a block. * It is the caller's responsibility to check whether the file * can be closed hereafter. */ @@ -887,7 +887,7 @@ nlmsvc_retry_blocked(void) unsigned long timeout = MAX_SCHEDULE_TIMEOUT; struct nlm_block *block; - while (!list_empty(&nlm_blocked)) { + while (!list_empty(&nlm_blocked) && !kthread_should_stop()) { block = list_entry(nlm_blocked.next, struct nlm_block, b_list); if (block->b_when == NLM_NEVER) diff --git a/fs/lockd/svcshare.c b/fs/lockd/svcshare.c index 068886d..b0ae070 100644 --- a/fs/lockd/svcshare.c +++ b/fs/lockd/svcshare.c @@ -71,7 +71,8 @@ nlmsvc_unshare_file(struct nlm_host *host, struct nlm_file *file, struct nlm_share *share, **shpp; struct xdr_netobj *oh = &argp->lock.oh; - for (shpp = &file->f_shares; (share = *shpp) != 0; shpp = &share->s_next) { + for (shpp = &file->f_shares; (share = *shpp) != NULL; + shpp = &share->s_next) { if (share->s_host == host && nlm_cmp_owner(share, oh)) { *shpp = share->s_next; kfree(share); diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile index df0f41e..ac6170c 100644 --- a/fs/nfs/Makefile +++ b/fs/nfs/Makefile @@ -5,7 +5,7 @@ obj-$(CONFIG_NFS_FS) += nfs.o nfs-y := client.o dir.o file.o getroot.o inode.o super.o nfs2xdr.o \ - pagelist.o proc.o read.o symlink.o unlink.o \ + direct.o pagelist.o proc.o read.o symlink.o unlink.o \ write.o namespace.o mount_clnt.o nfs-$(CONFIG_ROOT_NFS) += nfsroot.o nfs-$(CONFIG_NFS_V3) += nfs3proc.o nfs3xdr.o @@ -14,5 +14,4 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \ delegation.o idmap.o \ callback.o callback_xdr.o callback_proc.o \ nfs4namespace.o -nfs-$(CONFIG_NFS_DIRECTIO) += direct.o nfs-$(CONFIG_SYSCTL) += sysctl.o diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 66648dd..5606ae3 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -15,6 +15,7 @@ #include #include #include +#include #include @@ -27,9 +28,7 @@ struct nfs_callback_data { unsigned int users; struct svc_serv *serv; - pid_t pid; - struct completion started; - struct completion stopped; + struct task_struct *task; }; static struct nfs_callback_data nfs_callback_info; @@ -57,48 +56,44 @@ module_param_call(callback_tcpport, param_set_port, param_get_int, /* * This is the callback kernel thread. */ -static void nfs_callback_svc(struct svc_rqst *rqstp) +static int +nfs_callback_svc(void *vrqstp) { - int err; + int err, preverr = 0; + struct svc_rqst *rqstp = vrqstp; - __module_get(THIS_MODULE); - lock_kernel(); - - nfs_callback_info.pid = current->pid; - daemonize("nfsv4-svc"); - /* Process request with signals blocked, but allow SIGKILL. */ - allow_signal(SIGKILL); set_freezable(); - complete(&nfs_callback_info.started); - - for(;;) { - if (signalled()) { - if (nfs_callback_info.users == 0) - break; - flush_signals(current); - } + /* + * FIXME: do we really need to run this under the BKL? If so, please + * add a comment about what it's intended to protect. + */ + lock_kernel(); + while (!kthread_should_stop()) { /* * Listen for a request on the socket */ err = svc_recv(rqstp, MAX_SCHEDULE_TIMEOUT); - if (err == -EAGAIN || err == -EINTR) + if (err == -EAGAIN || err == -EINTR) { + preverr = err; continue; + } if (err < 0) { - printk(KERN_WARNING - "%s: terminating on error %d\n", - __FUNCTION__, -err); - break; + if (err != preverr) { + printk(KERN_WARNING "%s: unexpected error " + "from svc_recv (%d)\n", __func__, err); + preverr = err; + } + schedule_timeout_uninterruptible(HZ); + continue; } + preverr = err; svc_process(rqstp); } - - flush_signals(current); - svc_exit_thread(rqstp); - nfs_callback_info.pid = 0; - complete(&nfs_callback_info.stopped); unlock_kernel(); - module_put_and_exit(0); + nfs_callback_info.task = NULL; + svc_exit_thread(rqstp); + return 0; } /* @@ -107,14 +102,13 @@ static void nfs_callback_svc(struct svc_rqst *rqstp) int nfs_callback_up(void) { struct svc_serv *serv = NULL; + struct svc_rqst *rqstp; int ret = 0; lock_kernel(); mutex_lock(&nfs_callback_mutex); - if (nfs_callback_info.users++ || nfs_callback_info.pid != 0) + if (nfs_callback_info.users++ || nfs_callback_info.task != NULL) goto out; - init_completion(&nfs_callback_info.started); - init_completion(&nfs_callback_info.stopped); serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, NULL); ret = -ENOMEM; if (!serv) @@ -127,15 +121,28 @@ int nfs_callback_up(void) nfs_callback_tcpport = ret; dprintk("Callback port = 0x%x\n", nfs_callback_tcpport); - ret = svc_create_thread(nfs_callback_svc, serv); - if (ret < 0) + rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]); + if (IS_ERR(rqstp)) { + ret = PTR_ERR(rqstp); goto out_err; + } + + svc_sock_update_bufs(serv); nfs_callback_info.serv = serv; - wait_for_completion(&nfs_callback_info.started); + + nfs_callback_info.task = kthread_run(nfs_callback_svc, rqstp, + "nfsv4-svc"); + if (IS_ERR(nfs_callback_info.task)) { + ret = PTR_ERR(nfs_callback_info.task); + nfs_callback_info.serv = NULL; + nfs_callback_info.task = NULL; + svc_exit_thread(rqstp); + goto out_err; + } out: /* * svc_create creates the svc_serv with sv_nrthreads == 1, and then - * svc_create_thread increments that. So we need to call svc_destroy + * svc_prepare_thread increments that. So we need to call svc_destroy * on both success and failure so that the refcount is 1 when the * thread exits. */ @@ -152,19 +159,15 @@ out_err: } /* - * Kill the server process if it is not already up. + * Kill the server process if it is not already down. */ void nfs_callback_down(void) { lock_kernel(); mutex_lock(&nfs_callback_mutex); nfs_callback_info.users--; - do { - if (nfs_callback_info.users != 0 || nfs_callback_info.pid == 0) - break; - if (kill_proc(nfs_callback_info.pid, SIGKILL, 1) < 0) - break; - } while (wait_for_completion_timeout(&nfs_callback_info.stopped, 5*HZ) == 0); + if (nfs_callback_info.users == 0 && nfs_callback_info.task != NULL) + kthread_stop(nfs_callback_info.task); mutex_unlock(&nfs_callback_mutex); unlock_kernel(); } diff --git a/fs/nfs/client.c b/fs/nfs/client.c index c5c0175..93dfd75 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -170,6 +170,8 @@ static void nfs4_shutdown_client(struct nfs_client *clp) BUG_ON(!RB_EMPTY_ROOT(&clp->cl_state_owners)); if (__test_and_clear_bit(NFS_CS_IDMAP, &clp->cl_res_state)) nfs_idmap_delete(clp); + + rpc_destroy_wait_queue(&clp->cl_rpcwaitq); #endif } @@ -680,10 +682,22 @@ static int nfs_init_server(struct nfs_server *server, if (error < 0) goto error; + server->port = data->nfs_server.port; + error = nfs_init_server_rpcclient(server, &timeparms, data->auth_flavors[0]); if (error < 0) goto error; + /* Preserve the values of mount_server-related mount options */ + if (data->mount_server.addrlen) { + memcpy(&server->mountd_address, &data->mount_server.address, + data->mount_server.addrlen); + server->mountd_addrlen = data->mount_server.addrlen; + } + server->mountd_version = data->mount_server.version; + server->mountd_port = data->mount_server.port; + server->mountd_protocol = data->mount_server.protocol; + server->namelen = data->namlen; /* Create a client RPC handle for the NFSv3 ACL management interface */ nfs_init_server_aclclient(server); @@ -1062,6 +1076,8 @@ static int nfs4_init_server(struct nfs_server *server, server->acdirmin = data->acdirmin * HZ; server->acdirmax = data->acdirmax * HZ; + server->port = data->nfs_server.port; + error = nfs_init_server_rpcclient(server, &timeparms, data->auth_flavors[0]); error: diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 6cea747..d583654 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1966,7 +1966,7 @@ force_lookup: if (!NFS_PROTO(inode)->access) goto out_notsup; - cred = rpcauth_lookupcred(NFS_CLIENT(inode)->cl_auth, 0); + cred = rpc_lookup_cred(); if (!IS_ERR(cred)) { res = nfs_do_access(inode, cred, mask); put_rpccred(cred); diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 16844f9..e442005 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -280,6 +280,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq, .rpc_client = NFS_CLIENT(inode), .rpc_message = &msg, .callback_ops = &nfs_read_direct_ops, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; unsigned int pgbase; @@ -323,7 +324,7 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq, data->inode = inode; data->cred = msg.rpc_cred; data->args.fh = NFS_FH(inode); - data->args.context = ctx; + data->args.context = get_nfs_open_context(ctx); data->args.offset = pos; data->args.pgbase = pgbase; data->args.pages = data->pagevec; @@ -446,6 +447,7 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) struct rpc_task_setup task_setup_data = { .rpc_client = NFS_CLIENT(inode), .callback_ops = &nfs_write_direct_ops, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; @@ -537,6 +539,7 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq) .rpc_message = &msg, .callback_ops = &nfs_commit_direct_ops, .callback_data = data, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; @@ -546,6 +549,7 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq) data->args.fh = NFS_FH(data->inode); data->args.offset = 0; data->args.count = 0; + data->args.context = get_nfs_open_context(dreq->ctx); data->res.count = 0; data->res.fattr = &data->fattr; data->res.verf = &data->verf; @@ -682,6 +686,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq, .rpc_client = NFS_CLIENT(inode), .rpc_message = &msg, .callback_ops = &nfs_write_direct_ops, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; size_t wsize = NFS_SERVER(inode)->wsize; @@ -728,7 +733,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq, data->inode = inode; data->cred = msg.rpc_cred; data->args.fh = NFS_FH(inode); - data->args.context = ctx; + data->args.context = get_nfs_open_context(ctx); data->args.offset = pos; data->args.pgbase = pgbase; data->args.pages = data->pagevec; diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 5d2e9d9..8312b0f 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -238,10 +238,8 @@ nfs_file_read(struct kiocb *iocb, const struct iovec *iov, ssize_t result; size_t count = iov_length(iov, nr_segs); -#ifdef CONFIG_NFS_DIRECTIO if (iocb->ki_filp->f_flags & O_DIRECT) return nfs_file_direct_read(iocb, iov, nr_segs, pos); -#endif dfprintk(VFS, "nfs: read(%s/%s, %lu@%lu)\n", dentry->d_parent->d_name.name, dentry->d_name.name, @@ -387,9 +385,7 @@ const struct address_space_operations nfs_file_aops = { .write_end = nfs_write_end, .invalidatepage = nfs_invalidate_page, .releasepage = nfs_release_page, -#ifdef CONFIG_NFS_DIRECTIO .direct_IO = nfs_direct_IO, -#endif .launder_page = nfs_launder_page, }; @@ -447,10 +443,8 @@ static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov, ssize_t result; size_t count = iov_length(iov, nr_segs); -#ifdef CONFIG_NFS_DIRECTIO if (iocb->ki_filp->f_flags & O_DIRECT) return nfs_file_direct_write(iocb, iov, nr_segs, pos); -#endif dfprintk(VFS, "nfs: write(%s/%s(%ld), %lu@%Ld)\n", dentry->d_parent->d_name.name, dentry->d_name.name, diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 6f88d7c..5cb3345 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -523,8 +523,12 @@ struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx) static void __put_nfs_open_context(struct nfs_open_context *ctx, int wait) { - struct inode *inode = ctx->path.dentry->d_inode; + struct inode *inode; + if (ctx == NULL) + return; + + inode = ctx->path.dentry->d_inode; if (!atomic_dec_and_lock(&ctx->count, &inode->i_lock)) return; list_del(&ctx->list); @@ -610,7 +614,7 @@ int nfs_open(struct inode *inode, struct file *filp) struct nfs_open_context *ctx; struct rpc_cred *cred; - cred = rpcauth_lookupcred(NFS_CLIENT(inode)->cl_auth, 0); + cred = rpc_lookup_cred(); if (IS_ERR(cred)) return PTR_ERR(cred); ctx = alloc_nfs_open_context(filp->f_path.mnt, filp->f_path.dentry, cred); @@ -1218,6 +1222,36 @@ static void nfs_destroy_inodecache(void) kmem_cache_destroy(nfs_inode_cachep); } +struct workqueue_struct *nfsiod_workqueue; + +/* + * start up the nfsiod workqueue + */ +static int nfsiod_start(void) +{ + struct workqueue_struct *wq; + dprintk("RPC: creating workqueue nfsiod\n"); + wq = create_singlethread_workqueue("nfsiod"); + if (wq == NULL) + return -ENOMEM; + nfsiod_workqueue = wq; + return 0; +} + +/* + * Destroy the nfsiod workqueue + */ +static void nfsiod_stop(void) +{ + struct workqueue_struct *wq; + + wq = nfsiod_workqueue; + if (wq == NULL) + return; + nfsiod_workqueue = NULL; + destroy_workqueue(wq); +} + /* * Initialize NFS */ @@ -1225,6 +1259,10 @@ static int __init init_nfs_fs(void) { int err; + err = nfsiod_start(); + if (err) + goto out6; + err = nfs_fs_proc_init(); if (err) goto out5; @@ -1271,6 +1309,8 @@ out3: out4: nfs_fs_proc_exit(); out5: + nfsiod_stop(); +out6: return err; } @@ -1286,6 +1326,7 @@ static void __exit exit_nfs_fs(void) #endif unregister_nfs_fs(); nfs_fs_proc_exit(); + nfsiod_stop(); } /* Not quite true; I just maintain it */ diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 9319927..04ae867 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -46,9 +46,9 @@ struct nfs_parsed_mount_data { struct sockaddr_storage address; size_t addrlen; char *hostname; - unsigned int version; + u32 version; unsigned short port; - int protocol; + unsigned short protocol; } mount_server; struct { @@ -56,7 +56,8 @@ struct nfs_parsed_mount_data { size_t addrlen; char *hostname; char *export_path; - int protocol; + unsigned short port; + unsigned short protocol; } nfs_server; struct security_mnt_opts lsm_opts; @@ -115,13 +116,8 @@ extern void nfs_destroy_readpagecache(void); extern int __init nfs_init_writepagecache(void); extern void nfs_destroy_writepagecache(void); -#ifdef CONFIG_NFS_DIRECTIO extern int __init nfs_init_directcache(void); extern void nfs_destroy_directcache(void); -#else -#define nfs_init_directcache() (0) -#define nfs_destroy_directcache() do {} while(0) -#endif /* nfs2xdr.c */ extern int nfs_stat_to_errno(int); @@ -146,6 +142,7 @@ extern struct rpc_procinfo nfs4_procedures[]; extern int nfs_access_cache_shrinker(int nr_to_scan, gfp_t gfp_mask); /* inode.c */ +extern struct workqueue_struct *nfsiod_workqueue; extern struct inode *nfs_alloc_inode(struct super_block *sb); extern void nfs_destroy_inode(struct inode *); extern int nfs_write_inode(struct inode *,int); diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c index 1f7ea67..86a80b3 100644 --- a/fs/nfs/nfs2xdr.c +++ b/fs/nfs/nfs2xdr.c @@ -428,7 +428,7 @@ nfs_xdr_readdirres(struct rpc_rqst *req, __be32 *p, void *dummy) size_t hdrlen; unsigned int pglen, recvd; u32 len; - int status, nr; + int status, nr = 0; __be32 *end, *entry, *kaddr; if ((status = ntohl(*p++))) @@ -452,7 +452,12 @@ nfs_xdr_readdirres(struct rpc_rqst *req, __be32 *p, void *dummy) kaddr = p = kmap_atomic(*page, KM_USER0); end = (__be32 *)((char *)p + pglen); entry = p; - for (nr = 0; *p++; nr++) { + + /* Make sure the packet actually has a value_follows and EOF entry */ + if ((entry + 1) > end) + goto short_pkt; + + for (; *p++; nr++) { if (p + 2 > end) goto short_pkt; p++; /* fileid */ @@ -467,18 +472,32 @@ nfs_xdr_readdirres(struct rpc_rqst *req, __be32 *p, void *dummy) goto short_pkt; entry = p; } - if (!nr && (entry[0] != 0 || entry[1] == 0)) - goto short_pkt; + + /* + * Apparently some server sends responses that are a valid size, but + * contain no entries, and have value_follows==0 and EOF==0. For + * those, just set the EOF marker. + */ + if (!nr && entry[1] == 0) { + dprintk("NFS: readdir reply truncated!\n"); + entry[1] = 1; + } out: kunmap_atomic(kaddr, KM_USER0); return nr; short_pkt: + /* + * When we get a short packet there are 2 possibilities. We can + * return an error, or fix up the response to look like a valid + * response and return what we have so far. If there are no + * entries and the packet was short, then return -EIO. If there + * are valid entries in the response, return them and pretend that + * the call was successful, but incomplete. The caller can retry the + * readdir starting at the last cookie. + */ entry[0] = entry[1] = 0; - /* truncate listing ? */ - if (!nr) { - dprintk("NFS: readdir reply truncated!\n"); - entry[1] = 1; - } + if (!nr) + nr = -errno_NFSERR_IO; goto out; err_unmap: nr = -errno_NFSERR_IO; diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c index 3917e2f..fb03048 100644 --- a/fs/nfs/nfs3xdr.c +++ b/fs/nfs/nfs3xdr.c @@ -508,7 +508,7 @@ nfs3_xdr_readdirres(struct rpc_rqst *req, __be32 *p, struct nfs3_readdirres *res struct page **page; size_t hdrlen; u32 len, recvd, pglen; - int status, nr; + int status, nr = 0; __be32 *entry, *end, *kaddr; status = ntohl(*p++); @@ -542,7 +542,12 @@ nfs3_xdr_readdirres(struct rpc_rqst *req, __be32 *p, struct nfs3_readdirres *res kaddr = p = kmap_atomic(*page, KM_USER0); end = (__be32 *)((char *)p + pglen); entry = p; - for (nr = 0; *p++; nr++) { + + /* Make sure the packet actually has a value_follows and EOF entry */ + if ((entry + 1) > end) + goto short_pkt; + + for (; *p++; nr++) { if (p + 3 > end) goto short_pkt; p += 2; /* inode # */ @@ -581,18 +586,32 @@ nfs3_xdr_readdirres(struct rpc_rqst *req, __be32 *p, struct nfs3_readdirres *res goto short_pkt; entry = p; } - if (!nr && (entry[0] != 0 || entry[1] == 0)) - goto short_pkt; + + /* + * Apparently some server sends responses that are a valid size, but + * contain no entries, and have value_follows==0 and EOF==0. For + * those, just set the EOF marker. + */ + if (!nr && entry[1] == 0) { + dprintk("NFS: readdir reply truncated!\n"); + entry[1] = 1; + } out: kunmap_atomic(kaddr, KM_USER0); return nr; short_pkt: + /* + * When we get a short packet there are 2 possibilities. We can + * return an error, or fix up the response to look like a valid + * response and return what we have so far. If there are no + * entries and the packet was short, then return -EIO. If there + * are valid entries in the response, return them and pretend that + * the call was successful, but incomplete. The caller can retry the + * readdir starting at the last cookie. + */ entry[0] = entry[1] = 0; - /* truncate listing ? */ - if (!nr) { - dprintk("NFS: readdir reply truncated!\n"); - entry[1] = 1; - } + if (!nr) + nr = -errno_NFSERR_IO; goto out; err_unmap: nr = -errno_NFSERR_IO; diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 7ce0786..f38d057 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -51,6 +51,7 @@ #include "nfs4_fs.h" #include "delegation.h" +#include "internal.h" #include "iostat.h" #define NFSDBG_FACILITY NFSDBG_PROC @@ -773,6 +774,7 @@ static int _nfs4_proc_open_confirm(struct nfs4_opendata *data) .rpc_message = &msg, .callback_ops = &nfs4_open_confirm_ops, .callback_data = data, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; int status; @@ -910,6 +912,7 @@ static int _nfs4_proc_open(struct nfs4_opendata *data) .rpc_message = &msg, .callback_ops = &nfs4_open_ops, .callback_data = data, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; int status; @@ -1315,6 +1318,7 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, int wait) .rpc_client = server->client, .rpc_message = &msg, .callback_ops = &nfs4_close_ops, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; int status = -ENOMEM; @@ -1404,7 +1408,7 @@ nfs4_atomic_open(struct inode *dir, struct dentry *dentry, struct nameidata *nd) BUG_ON(nd->intent.open.flags & O_CREAT); } - cred = rpcauth_lookupcred(NFS_CLIENT(dir)->cl_auth, 0); + cred = rpc_lookup_cred(); if (IS_ERR(cred)) return (struct dentry *)cred; parent = dentry->d_parent; @@ -1439,7 +1443,7 @@ nfs4_open_revalidate(struct inode *dir, struct dentry *dentry, int openflags, st struct rpc_cred *cred; struct nfs4_state *state; - cred = rpcauth_lookupcred(NFS_CLIENT(dir)->cl_auth, 0); + cred = rpc_lookup_cred(); if (IS_ERR(cred)) return PTR_ERR(cred); state = nfs4_do_open(dir, &path, openflags, NULL, cred); @@ -1656,7 +1660,7 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr, nfs_fattr_init(fattr); - cred = rpcauth_lookupcred(NFS_CLIENT(inode)->cl_auth, 0); + cred = rpc_lookup_cred(); if (IS_ERR(cred)) return PTR_ERR(cred); @@ -1892,7 +1896,7 @@ nfs4_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr, struct rpc_cred *cred; int status = 0; - cred = rpcauth_lookupcred(NFS_CLIENT(dir)->cl_auth, 0); + cred = rpc_lookup_cred(); if (IS_ERR(cred)) { status = PTR_ERR(cred); goto out; @@ -2761,10 +2765,10 @@ nfs4_async_handle_error(struct rpc_task *task, const struct nfs_server *server) case -NFS4ERR_STALE_CLIENTID: case -NFS4ERR_STALE_STATEID: case -NFS4ERR_EXPIRED: - rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL, NULL); + rpc_sleep_on(&clp->cl_rpcwaitq, task, NULL); nfs4_schedule_state_recovery(clp); if (test_bit(NFS4CLNT_STATE_RECOVER, &clp->cl_state) == 0) - rpc_wake_up_task(task); + rpc_wake_up_queued_task(&clp->cl_rpcwaitq, task); task->tk_status = 0; return -EAGAIN; case -NFS4ERR_DELAY: @@ -3235,6 +3239,7 @@ static struct rpc_task *nfs4_do_unlck(struct file_lock *fl, .rpc_client = NFS_CLIENT(lsp->ls_state->inode), .rpc_message = &msg, .callback_ops = &nfs4_locku_ops, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; @@ -3419,6 +3424,7 @@ static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *f .rpc_client = NFS_CLIENT(state->inode), .rpc_message = &msg, .callback_ops = &nfs4_lock_ops, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; int ret; diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index b962397..7775435 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -292,8 +292,10 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server, struct spin_unlock(&clp->cl_lock); if (sp == new) get_rpccred(cred); - else + else { + rpc_destroy_wait_queue(&new->so_sequence.wait); kfree(new); + } return sp; } @@ -310,6 +312,7 @@ void nfs4_put_state_owner(struct nfs4_state_owner *sp) return; nfs4_remove_state_owner(clp, sp); spin_unlock(&clp->cl_lock); + rpc_destroy_wait_queue(&sp->so_sequence.wait); put_rpccred(cred); kfree(sp); } @@ -529,6 +532,7 @@ static void nfs4_free_lock_state(struct nfs4_lock_state *lsp) spin_lock(&clp->cl_lock); nfs_free_unique_id(&clp->cl_lockowner_id, &lsp->ls_id); spin_unlock(&clp->cl_lock); + rpc_destroy_wait_queue(&lsp->ls_sequence.wait); kfree(lsp); } @@ -731,7 +735,7 @@ int nfs_wait_on_sequence(struct nfs_seqid *seqid, struct rpc_task *task) list_add_tail(&seqid->list, &sequence->list); if (list_first_entry(&sequence->list, struct nfs_seqid, list) == seqid) goto unlock; - rpc_sleep_on(&sequence->wait, task, NULL, NULL); + rpc_sleep_on(&sequence->wait, task, NULL); status = -EAGAIN; unlock: spin_unlock(&sequence->lock); diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index db1ed9c..2b519f6 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -110,7 +110,7 @@ static int nfs4_stat_to_errno(int); #define decode_savefh_maxsz (op_decode_hdr_maxsz) #define encode_restorefh_maxsz (op_encode_hdr_maxsz) #define decode_restorefh_maxsz (op_decode_hdr_maxsz) -#define encode_fsinfo_maxsz (op_encode_hdr_maxsz + 2) +#define encode_fsinfo_maxsz (encode_getattr_maxsz) #define decode_fsinfo_maxsz (op_decode_hdr_maxsz + 11) #define encode_renew_maxsz (op_encode_hdr_maxsz + 3) #define decode_renew_maxsz (op_decode_hdr_maxsz) @@ -3481,7 +3481,7 @@ static int decode_readdir(struct xdr_stream *xdr, struct rpc_rqst *req, struct n size_t hdrlen; u32 recvd, pglen = rcvbuf->page_len; __be32 *end, *entry, *p, *kaddr; - unsigned int nr; + unsigned int nr = 0; int status; status = decode_op_hdr(xdr, OP_READDIR); @@ -3505,7 +3505,12 @@ static int decode_readdir(struct xdr_stream *xdr, struct rpc_rqst *req, struct n kaddr = p = kmap_atomic(page, KM_USER0); end = p + ((pglen + readdir->pgbase) >> 2); entry = p; - for (nr = 0; *p++; nr++) { + + /* Make sure the packet actually has a value_follows and EOF entry */ + if ((entry + 1) > end) + goto short_pkt; + + for (; *p++; nr++) { u32 len, attrlen, xlen; if (end - p < 3) goto short_pkt; @@ -3532,20 +3537,32 @@ static int decode_readdir(struct xdr_stream *xdr, struct rpc_rqst *req, struct n p += attrlen; /* attributes */ entry = p; } - if (!nr && (entry[0] != 0 || entry[1] == 0)) - goto short_pkt; + /* + * Apparently some server sends responses that are a valid size, but + * contain no entries, and have value_follows==0 and EOF==0. For + * those, just set the EOF marker. + */ + if (!nr && entry[1] == 0) { + dprintk("NFS: readdir reply truncated!\n"); + entry[1] = 1; + } out: kunmap_atomic(kaddr, KM_USER0); return 0; short_pkt: + /* + * When we get a short packet there are 2 possibilities. We can + * return an error, or fix up the response to look like a valid + * response and return what we have so far. If there are no + * entries and the packet was short, then return -EIO. If there + * are valid entries in the response, return them and pretend that + * the call was successful, but incomplete. The caller can retry the + * readdir starting at the last cookie. + */ dprintk("%s: short packet at entry %d\n", __FUNCTION__, nr); entry[0] = entry[1] = 0; - /* truncate listing ? */ - if (!nr) { - dprintk("NFS: readdir reply truncated!\n"); - entry[1] = 1; - } - goto out; + if (nr) + goto out; err_unmap: kunmap_atomic(kaddr, KM_USER0); return -errno_NFSERR_IO; diff --git a/fs/nfs/read.c b/fs/nfs/read.c index 5a70be5..d333f5f 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -58,22 +58,19 @@ struct nfs_read_data *nfs_readdata_alloc(unsigned int pagecount) return p; } -static void nfs_readdata_rcu_free(struct rcu_head *head) +static void nfs_readdata_free(struct nfs_read_data *p) { - struct nfs_read_data *p = container_of(head, struct nfs_read_data, task.u.tk_rcu); if (p && (p->pagevec != &p->page_array[0])) kfree(p->pagevec); mempool_free(p, nfs_rdata_mempool); } -static void nfs_readdata_free(struct nfs_read_data *rdata) -{ - call_rcu_bh(&rdata->task.u.tk_rcu, nfs_readdata_rcu_free); -} - void nfs_readdata_release(void *data) { - nfs_readdata_free(data); + struct nfs_read_data *rdata = data; + + put_nfs_open_context(rdata->args.context); + nfs_readdata_free(rdata); } static @@ -174,6 +171,7 @@ static void nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data, .rpc_message = &msg, .callback_ops = call_ops, .callback_data = data, + .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC | swap_flags, }; @@ -186,7 +184,7 @@ static void nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data, data->args.pgbase = req->wb_pgbase + offset; data->args.pages = data->pagevec; data->args.count = count; - data->args.context = req->wb_context; + data->args.context = get_nfs_open_context(req->wb_context); data->res.fattr = &data->fattr; data->res.count = count; @@ -253,7 +251,6 @@ static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigne data = nfs_readdata_alloc(1); if (!data) goto out_bad; - INIT_LIST_HEAD(&data->pages); list_add(&data->pages, &list); requests++; nbytes -= len; @@ -300,7 +297,6 @@ static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned if (!data) goto out_bad; - INIT_LIST_HEAD(&data->pages); pages = data->pagevec; while (!list_empty(head)) { req = nfs_list_entry(head->next); diff --git a/fs/nfs/super.c b/fs/nfs/super.c index f921902..140174d 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -441,10 +441,52 @@ static const char *nfs_pseudoflavour_to_name(rpc_authflavor_t flavour) return sec_flavours[i].str; } +static void nfs_show_mountd_options(struct seq_file *m, struct nfs_server *nfss, + int showdefaults) +{ + struct sockaddr *sap = (struct sockaddr *)&nfss->mountd_address; + + switch (sap->sa_family) { + case AF_INET: { + struct sockaddr_in *sin = (struct sockaddr_in *)sap; + seq_printf(m, ",mountaddr=" NIPQUAD_FMT, + NIPQUAD(sin->sin_addr.s_addr)); + break; + } + case AF_INET6: { + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)sap; + seq_printf(m, ",mountaddr=" NIP6_FMT, + NIP6(sin6->sin6_addr)); + break; + } + default: + if (showdefaults) + seq_printf(m, ",mountaddr=unspecified"); + } + + if (nfss->mountd_version || showdefaults) + seq_printf(m, ",mountvers=%u", nfss->mountd_version); + if (nfss->mountd_port || showdefaults) + seq_printf(m, ",mountport=%u", nfss->mountd_port); + + switch (nfss->mountd_protocol) { + case IPPROTO_UDP: + seq_printf(m, ",mountproto=udp"); + break; + case IPPROTO_TCP: + seq_printf(m, ",mountproto=tcp"); + break; + default: + if (showdefaults) + seq_printf(m, ",mountproto=auto"); + } +} + /* * Describe the mount options in force on this server representation */ -static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, int showdefaults) +static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, + int showdefaults) { static const struct proc_nfs_info { int flag; @@ -452,6 +494,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, const char *nostr; } nfs_info[] = { { NFS_MOUNT_SOFT, ",soft", ",hard" }, + { NFS_MOUNT_INTR, ",intr", ",nointr" }, + { NFS_MOUNT_POSIX, ",posix", "" }, { NFS_MOUNT_NOCTO, ",nocto", "" }, { NFS_MOUNT_NOAC, ",noac", "" }, { NFS_MOUNT_NONLM, ",nolock", "" }, @@ -462,18 +506,22 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, }; const struct proc_nfs_info *nfs_infop; struct nfs_client *clp = nfss->nfs_client; - - seq_printf(m, ",vers=%d", clp->rpc_ops->version); - seq_printf(m, ",rsize=%d", nfss->rsize); - seq_printf(m, ",wsize=%d", nfss->wsize); + u32 version = clp->rpc_ops->version; + + seq_printf(m, ",vers=%u", version); + seq_printf(m, ",rsize=%u", nfss->rsize); + seq_printf(m, ",wsize=%u", nfss->wsize); + if (nfss->bsize != 0) + seq_printf(m, ",bsize=%u", nfss->bsize); + seq_printf(m, ",namlen=%u", nfss->namelen); if (nfss->acregmin != 3*HZ || showdefaults) - seq_printf(m, ",acregmin=%d", nfss->acregmin/HZ); + seq_printf(m, ",acregmin=%u", nfss->acregmin/HZ); if (nfss->acregmax != 60*HZ || showdefaults) - seq_printf(m, ",acregmax=%d", nfss->acregmax/HZ); + seq_printf(m, ",acregmax=%u", nfss->acregmax/HZ); if (nfss->acdirmin != 30*HZ || showdefaults) - seq_printf(m, ",acdirmin=%d", nfss->acdirmin/HZ); + seq_printf(m, ",acdirmin=%u", nfss->acdirmin/HZ); if (nfss->acdirmax != 60*HZ || showdefaults) - seq_printf(m, ",acdirmax=%d", nfss->acdirmax/HZ); + seq_printf(m, ",acdirmax=%u", nfss->acdirmax/HZ); for (nfs_infop = nfs_info; nfs_infop->flag; nfs_infop++) { if (nfss->flags & nfs_infop->flag) seq_puts(m, nfs_infop->str); @@ -482,9 +530,24 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, } seq_printf(m, ",proto=%s", rpc_peeraddr2str(nfss->client, RPC_DISPLAY_PROTO)); + if (version == 4) { + if (nfss->port != NFS_PORT) + seq_printf(m, ",port=%u", nfss->port); + } else + if (nfss->port) + seq_printf(m, ",port=%u", nfss->port); + seq_printf(m, ",timeo=%lu", 10U * nfss->client->cl_timeout->to_initval / HZ); seq_printf(m, ",retrans=%u", nfss->client->cl_timeout->to_retries); seq_printf(m, ",sec=%s", nfs_pseudoflavour_to_name(nfss->client->cl_auth->au_flavor)); + + if (version != 4) + nfs_show_mountd_options(m, nfss, showdefaults); + +#ifdef CONFIG_NFS_V4 + if (clp->rpc_ops->version == 4) + seq_printf(m, ",clientaddr=%s", clp->cl_ipaddr); +#endif } /* @@ -529,10 +592,10 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt) seq_printf(m, "\n\tcaps:\t"); seq_printf(m, "caps=0x%x", nfss->caps); - seq_printf(m, ",wtmult=%d", nfss->wtmult); - seq_printf(m, ",dtsize=%d", nfss->dtsize); - seq_printf(m, ",bsize=%d", nfss->bsize); - seq_printf(m, ",namelen=%d", nfss->namelen); + seq_printf(m, ",wtmult=%u", nfss->wtmult); + seq_printf(m, ",dtsize=%u", nfss->dtsize); + seq_printf(m, ",bsize=%u", nfss->bsize); + seq_printf(m, ",namlen=%u", nfss->namelen); #ifdef CONFIG_NFS_V4 if (nfss->nfs_client->rpc_ops->version == 4) { @@ -546,9 +609,9 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt) /* * Display security flavor in effect for this mount */ - seq_printf(m, "\n\tsec:\tflavor=%d", auth->au_ops->au_flavor); + seq_printf(m, "\n\tsec:\tflavor=%u", auth->au_ops->au_flavor); if (auth->au_flavor) - seq_printf(m, ",pseudoflavor=%d", auth->au_flavor); + seq_printf(m, ",pseudoflavor=%u", auth->au_flavor); /* * Display superblock I/O counters @@ -683,7 +746,6 @@ static int nfs_parse_mount_options(char *raw, struct nfs_parsed_mount_data *mnt) { char *p, *string, *secdata; - unsigned short port = 0; int rc; if (!raw) { @@ -798,7 +860,7 @@ static int nfs_parse_mount_options(char *raw, return 0; if (option < 0 || option > 65535) return 0; - port = option; + mnt->nfs_server.port = option; break; case Opt_rsize: if (match_int(args, &mnt->rsize)) @@ -1048,7 +1110,8 @@ static int nfs_parse_mount_options(char *raw, } } - nfs_set_port((struct sockaddr *)&mnt->nfs_server.address, port); + nfs_set_port((struct sockaddr *)&mnt->nfs_server.address, + mnt->nfs_server.port); return 1; @@ -1169,7 +1232,9 @@ static int nfs_validate_mount_data(void *options, args->acregmax = 60; args->acdirmin = 30; args->acdirmax = 60; + args->mount_server.port = 0; /* autobind unless user sets port */ args->mount_server.protocol = XPRT_TRANSPORT_UDP; + args->nfs_server.port = 0; /* autobind unless user sets port */ args->nfs_server.protocol = XPRT_TRANSPORT_TCP; switch (data->version) { @@ -1706,28 +1771,6 @@ static void nfs4_fill_super(struct super_block *sb) } /* - * If the user didn't specify a port, set the port number to - * the NFS version 4 default port. - */ -static void nfs4_default_port(struct sockaddr *sap) -{ - switch (sap->sa_family) { - case AF_INET: { - struct sockaddr_in *ap = (struct sockaddr_in *)sap; - if (ap->sin_port == 0) - ap->sin_port = htons(NFS_PORT); - break; - } - case AF_INET6: { - struct sockaddr_in6 *ap = (struct sockaddr_in6 *)sap; - if (ap->sin6_port == 0) - ap->sin6_port = htons(NFS_PORT); - break; - } - } -} - -/* * Validate NFSv4 mount options */ static int nfs4_validate_mount_data(void *options, @@ -1751,6 +1794,7 @@ static int nfs4_validate_mount_data(void *options, args->acregmax = 60; args->acdirmin = 30; args->acdirmax = 60; + args->nfs_server.port = NFS_PORT; /* 2049 unless user set port= */ args->nfs_server.protocol = XPRT_TRANSPORT_TCP; switch (data->version) { @@ -1767,9 +1811,6 @@ static int nfs4_validate_mount_data(void *options, &args->nfs_server.address)) goto out_no_address; - nfs4_default_port((struct sockaddr *) - &args->nfs_server.address); - switch (data->auth_flavourlen) { case 0: args->auth_flavors[0] = RPC_AUTH_UNIX; @@ -1827,9 +1868,6 @@ static int nfs4_validate_mount_data(void *options, &args->nfs_server.address)) return -EINVAL; - nfs4_default_port((struct sockaddr *) - &args->nfs_server.address); - switch (args->auth_flavor_len) { case 0: args->auth_flavors[0] = RPC_AUTH_UNIX; diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c index 83e865a..412738d 100644 --- a/fs/nfs/symlink.c +++ b/fs/nfs/symlink.c @@ -10,7 +10,6 @@ * nfs symlink handling code */ -#define NFS_NEED_XDR_TYPES #include #include #include diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c index 7574153..3adf8b2 100644 --- a/fs/nfs/unlink.c +++ b/fs/nfs/unlink.c @@ -234,7 +234,7 @@ nfs_async_unlink(struct inode *dir, struct dentry *dentry) if (data == NULL) goto out; - data->cred = rpcauth_lookupcred(NFS_CLIENT(dir)->cl_auth, 0); + data->cred = rpc_lookup_cred(); if (IS_ERR(data->cred)) { status = PTR_ERR(data->cred); goto out_free; diff --git a/fs/nfs/write.c b/fs/nfs/write.c index bed6341..ce40cad 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -59,19 +59,13 @@ struct nfs_write_data *nfs_commit_alloc(void) return p; } -static void nfs_commit_rcu_free(struct rcu_head *head) +void nfs_commit_free(struct nfs_write_data *p) { - struct nfs_write_data *p = container_of(head, struct nfs_write_data, task.u.tk_rcu); if (p && (p->pagevec != &p->page_array[0])) kfree(p->pagevec); mempool_free(p, nfs_commit_mempool); } -void nfs_commit_free(struct nfs_write_data *wdata) -{ - call_rcu_bh(&wdata->task.u.tk_rcu, nfs_commit_rcu_free); -} - struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount) { struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOFS); @@ -93,21 +87,18 @@ struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount) return p; } -static void nfs_writedata_rcu_free(struct rcu_head *head) +static void nfs_writedata_free(struct nfs_write_data *p) { - struct nfs_write_data *p = container_of(head, struct nfs_write_data, task.u.tk_rcu); if (p && (p->pagevec != &p->page_array[0])) kfree(p->pagevec); mempool_free(p, nfs_wdata_mempool); } -static void nfs_writedata_free(struct nfs_write_data *wdata) +void nfs_writedata_release(void *data) { - call_rcu_bh(&wdata->task.u.tk_rcu, nfs_writedata_rcu_free); -} + struct nfs_write_data *wdata = data; -void nfs_writedata_release(void *wdata) -{ + put_nfs_open_context(wdata->args.context); nfs_writedata_free(wdata); } @@ -291,8 +282,6 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio, spin_unlock(&inode->i_lock); if (!nfs_pageio_add_request(pgio, req)) { nfs_redirty_request(req); - nfs_end_page_writeback(page); - nfs_clear_page_tag_locked(req); return pgio->pg_error; } return 0; @@ -366,15 +355,13 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc) /* * Insert a write request into an inode */ -static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req) +static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req) { struct nfs_inode *nfsi = NFS_I(inode); int error; error = radix_tree_insert(&nfsi->nfs_page_tree, req->wb_index, req); - BUG_ON(error == -EEXIST); - if (error) - return error; + BUG_ON(error); if (!nfsi->npages) { igrab(inode); if (nfs_have_delegation(inode, FMODE_WRITE)) @@ -384,8 +371,8 @@ static int nfs_inode_add_request(struct inode *inode, struct nfs_page *req) set_page_private(req->wb_page, (unsigned long)req); nfsi->npages++; kref_get(&req->wb_kref); - radix_tree_tag_set(&nfsi->nfs_page_tree, req->wb_index, NFS_PAGE_TAG_LOCKED); - return 0; + radix_tree_tag_set(&nfsi->nfs_page_tree, req->wb_index, + NFS_PAGE_TAG_LOCKED); } /* @@ -413,7 +400,7 @@ static void nfs_inode_remove_request(struct nfs_page *req) } static void -nfs_redirty_request(struct nfs_page *req) +nfs_mark_request_dirty(struct nfs_page *req) { __set_page_dirty_nobuffers(req->wb_page); } @@ -467,7 +454,7 @@ int nfs_reschedule_unstable_write(struct nfs_page *req) return 1; } if (test_and_clear_bit(PG_NEED_RESCHED, &req->wb_flags)) { - nfs_redirty_request(req); + nfs_mark_request_dirty(req); return 1; } return 0; @@ -597,6 +584,13 @@ static struct nfs_page * nfs_update_request(struct nfs_open_context* ctx, /* Loop over all inode entries and see if we find * A request for the page we wish to update */ + if (new) { + if (radix_tree_preload(GFP_NOFS)) { + nfs_release_request(new); + return ERR_PTR(-ENOMEM); + } + } + spin_lock(&inode->i_lock); req = nfs_page_find_request_locked(page); if (req) { @@ -607,28 +601,27 @@ static struct nfs_page * nfs_update_request(struct nfs_open_context* ctx, error = nfs_wait_on_request(req); nfs_release_request(req); if (error < 0) { - if (new) + if (new) { + radix_tree_preload_end(); nfs_release_request(new); + } return ERR_PTR(error); } continue; } spin_unlock(&inode->i_lock); - if (new) + if (new) { + radix_tree_preload_end(); nfs_release_request(new); + } break; } if (new) { - int error; nfs_lock_request_dontget(new); - error = nfs_inode_add_request(inode, new); - if (error) { - spin_unlock(&inode->i_lock); - nfs_unlock_request(new); - return ERR_PTR(error); - } + nfs_inode_add_request(inode, new); spin_unlock(&inode->i_lock); + radix_tree_preload_end(); req = new; goto zero_page; } @@ -806,6 +799,7 @@ static void nfs_write_rpcsetup(struct nfs_page *req, .rpc_message = &msg, .callback_ops = call_ops, .callback_data = data, + .workqueue = nfsiod_workqueue, .flags = flags, .priority = priority, }; @@ -822,7 +816,7 @@ static void nfs_write_rpcsetup(struct nfs_page *req, data->args.pgbase = req->wb_pgbase + offset; data->args.pages = data->pagevec; data->args.count = count; - data->args.context = req->wb_context; + data->args.context = get_nfs_open_context(req->wb_context); data->args.stable = NFS_UNSTABLE; if (how & FLUSH_STABLE) { data->args.stable = NFS_DATA_SYNC; @@ -851,6 +845,17 @@ static void nfs_write_rpcsetup(struct nfs_page *req, rpc_put_task(task); } +/* If a nfs_flush_* function fails, it should remove reqs from @head and + * call this on each, which will prepare them to be retried on next + * writeback using standard nfs. + */ +static void nfs_redirty_request(struct nfs_page *req) +{ + nfs_mark_request_dirty(req); + nfs_end_page_writeback(req->wb_page); + nfs_clear_page_tag_locked(req); +} + /* * Generate multiple small requests to write out a single * contiguous dirty area on one page. @@ -906,8 +911,6 @@ out_bad: nfs_writedata_release(data); } nfs_redirty_request(req); - nfs_end_page_writeback(req->wb_page); - nfs_clear_page_tag_locked(req); return -ENOMEM; } @@ -948,8 +951,6 @@ static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned i req = nfs_list_entry(head->next); nfs_list_remove_request(req); nfs_redirty_request(req); - nfs_end_page_writeback(req->wb_page); - nfs_clear_page_tag_locked(req); } return -ENOMEM; } @@ -1159,8 +1160,11 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data) #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4) -void nfs_commit_release(void *wdata) +void nfs_commit_release(void *data) { + struct nfs_write_data *wdata = data; + + put_nfs_open_context(wdata->args.context); nfs_commit_free(wdata); } @@ -1187,6 +1191,7 @@ static void nfs_commit_rpcsetup(struct list_head *head, .rpc_message = &msg, .callback_ops = &nfs_commit_ops, .callback_data = data, + .workqueue = nfsiod_workqueue, .flags = flags, .priority = priority, }; @@ -1203,6 +1208,7 @@ static void nfs_commit_rpcsetup(struct list_head *head, /* Note: we always request a commit of the entire inode */ data->args.offset = 0; data->args.count = 0; + data->args.context = get_nfs_open_context(first->wb_context); data->res.count = 0; data->res.fattr = &data->fattr; data->res.verf = &data->verf; @@ -1297,7 +1303,7 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata) } /* We have a mismatch. Write the page again */ dprintk(" mismatch\n"); - nfs_redirty_request(req); + nfs_mark_request_dirty(req); next: nfs_clear_page_tag_locked(req); } diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c index d13403e..294992e 100644 --- a/fs/nfsd/auth.c +++ b/fs/nfsd/auth.c @@ -10,6 +10,7 @@ #include #include #include +#include "auth.h" int nfsexp_flags(struct svc_rqst *rqstp, struct svc_export *exp) { diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 8a6f7c9..33bfcf0 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -35,6 +35,7 @@ #include #include #include +#include #define NFSDDBG_FACILITY NFSDDBG_EXPORT @@ -1548,6 +1549,7 @@ exp_addclient(struct nfsctl_client *ncp) { struct auth_domain *dom; int i, err; + struct in6_addr addr6; /* First, consistency check. */ err = -EINVAL; @@ -1566,9 +1568,10 @@ exp_addclient(struct nfsctl_client *ncp) goto out_unlock; /* Insert client into hashtable. */ - for (i = 0; i < ncp->cl_naddr; i++) - auth_unix_add_addr(ncp->cl_addrlist[i], dom); - + for (i = 0; i < ncp->cl_naddr; i++) { + ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &addr6); + auth_unix_add_addr(&addr6, dom); + } auth_unix_forget_old(dom); auth_domain_put(dom); diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index aae2b29..562abf3 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -344,6 +344,21 @@ static struct rpc_version * nfs_cb_version[] = { &nfs_cb_version4, }; +static struct rpc_program cb_program; + +static struct rpc_stat cb_stats = { + .program = &cb_program +}; + +#define NFS4_CALLBACK 0x40000000 +static struct rpc_program cb_program = { + .name = "nfs4_cb", + .number = NFS4_CALLBACK, + .nrvers = ARRAY_SIZE(nfs_cb_version), + .version = nfs_cb_version, + .stats = &cb_stats, +}; + /* Reference counting, callback cleanup, etc., all look racy as heck. * And why is cb_set an atomic? */ @@ -358,13 +373,12 @@ static int do_probe_callback(void *data) .to_maxval = (NFSD_LEASE_TIME/2) * HZ, .to_exponential = 1, }; - struct rpc_program * program = &cb->cb_program; struct rpc_create_args args = { .protocol = IPPROTO_TCP, .address = (struct sockaddr *)&addr, .addrsize = sizeof(addr), .timeout = &timeparms, - .program = program, + .program = &cb_program, .version = nfs_cb_version[1]->number, .authflavor = RPC_AUTH_UNIX, /* XXX: need AUTH_GSS... */ .flags = (RPC_CLNT_CREATE_NOPING), @@ -382,16 +396,8 @@ static int do_probe_callback(void *data) addr.sin_port = htons(cb->cb_port); addr.sin_addr.s_addr = htonl(cb->cb_addr); - /* Initialize rpc_program */ - program->name = "nfs4_cb"; - program->number = cb->cb_prog; - program->nrvers = ARRAY_SIZE(nfs_cb_version); - program->version = nfs_cb_version; - program->stats = &cb->cb_stat; - /* Initialize rpc_stat */ - memset(program->stats, 0, sizeof(cb->cb_stat)); - program->stats->program = program; + memset(args.program->stats, 0, sizeof(struct rpc_stat)); /* Create RPC client */ client = rpc_create(&args); diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c index 996bd88..5b39842 100644 --- a/fs/nfsd/nfs4idmap.c +++ b/fs/nfsd/nfs4idmap.c @@ -202,7 +202,7 @@ static struct cache_detail idtoname_cache = { .alloc = ent_alloc, }; -int +static int idtoname_parse(struct cache_detail *cd, char *buf, int buflen) { struct ent ent, *res; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index bcb97d8..b98b12f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1638,6 +1638,7 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, struct nfs4_sta locks_init_lock(&fl); fl.fl_lmops = &nfsd_lease_mng_ops; fl.fl_flags = FL_LEASE; + fl.fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK; fl.fl_end = OFFSET_MAX; fl.fl_owner = (fl_owner_t)dp; fl.fl_file = stp->st_vfs_file; @@ -1646,8 +1647,7 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, struct nfs4_sta /* vfs_setlease checks to see if delegation should be handed out. * the lock_manager callbacks fl_mylease and fl_change are used */ - if ((status = vfs_setlease(stp->st_vfs_file, - flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK, &flp))) { + if ((status = vfs_setlease(stp->st_vfs_file, fl.fl_type, &flp))) { dprintk("NFSD: setlease failed [%d], no delegation\n", status); unhash_delegation(dp); flag = NFS4_OPEN_DELEGATE_NONE; @@ -1762,10 +1762,6 @@ out: return status; } -static struct workqueue_struct *laundry_wq; -static void laundromat_main(struct work_struct *); -static DECLARE_DELAYED_WORK(laundromat_work, laundromat_main); - __be32 nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, clientid_t *clid) @@ -1873,7 +1869,11 @@ nfs4_laundromat(void) return clientid_val; } -void +static struct workqueue_struct *laundry_wq; +static void laundromat_main(struct work_struct *); +static DECLARE_DELAYED_WORK(laundromat_work, laundromat_main); + +static void laundromat_main(struct work_struct *not_used) { time_t t; @@ -1974,6 +1974,26 @@ io_during_grace_disallowed(struct inode *inode, int flags) && mandatory_lock(inode); } +static int check_stateid_generation(stateid_t *in, stateid_t *ref) +{ + /* If the client sends us a stateid from the future, it's buggy: */ + if (in->si_generation > ref->si_generation) + return nfserr_bad_stateid; + /* + * The following, however, can happen. For example, if the + * client sends an open and some IO at the same time, the open + * may bump si_generation while the IO is still in flight. + * Thanks to hard links and renames, the client never knows what + * file an open will affect. So it could avoid that situation + * only by serializing all opens and IO from the same open + * owner. To recover from the old_stateid error, the client + * will just have to retry the IO: + */ + if (in->si_generation < ref->si_generation) + return nfserr_old_stateid; + return nfs_ok; +} + /* * Checks for stateid operations */ @@ -2022,12 +2042,8 @@ nfs4_preprocess_stateid_op(struct svc_fh *current_fh, stateid_t *stateid, int fl goto out; stidp = &stp->st_stateid; } - if (stateid->si_generation > stidp->si_generation) - goto out; - - /* OLD STATEID */ - status = nfserr_old_stateid; - if (stateid->si_generation < stidp->si_generation) + status = check_stateid_generation(stateid, stidp); + if (status) goto out; if (stp) { if ((status = nfs4_check_openmode(stp,flags))) @@ -2035,7 +2051,7 @@ nfs4_preprocess_stateid_op(struct svc_fh *current_fh, stateid_t *stateid, int fl renew_client(stp->st_stateowner->so_client); if (filpp) *filpp = stp->st_vfs_file; - } else if (dp) { + } else { if ((status = nfs4_check_delegmode(dp, flags))) goto out; renew_client(dp->dl_client); @@ -2064,6 +2080,7 @@ nfs4_preprocess_seqid_op(struct svc_fh *current_fh, u32 seqid, stateid_t *statei { struct nfs4_stateid *stp; struct nfs4_stateowner *sop; + __be32 status; dprintk("NFSD: preprocess_seqid_op: seqid=%d " "stateid = (%08x/%08x/%08x/%08x)\n", seqid, @@ -2126,7 +2143,7 @@ nfs4_preprocess_seqid_op(struct svc_fh *current_fh, u32 seqid, stateid_t *statei } } - if ((flags & CHECK_FH) && nfs4_check_fh(current_fh, stp)) { + if (nfs4_check_fh(current_fh, stp)) { dprintk("NFSD: preprocess_seqid_op: fh-stateid mismatch!\n"); return nfserr_bad_stateid; } @@ -2149,15 +2166,9 @@ nfs4_preprocess_seqid_op(struct svc_fh *current_fh, u32 seqid, stateid_t *statei " confirmed yet!\n"); return nfserr_bad_stateid; } - if (stateid->si_generation > stp->st_stateid.si_generation) { - dprintk("NFSD: preprocess_seqid_op: future stateid?!\n"); - return nfserr_bad_stateid; - } - - if (stateid->si_generation < stp->st_stateid.si_generation) { - dprintk("NFSD: preprocess_seqid_op: old stateid!\n"); - return nfserr_old_stateid; - } + status = check_stateid_generation(stateid, &stp->st_stateid); + if (status) + return status; renew_client(sop->so_client); return nfs_ok; @@ -2193,7 +2204,7 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = nfs4_preprocess_seqid_op(&cstate->current_fh, oc->oc_seqid, &oc->oc_req_stateid, - CHECK_FH | CONFIRM | OPEN_STATE, + CONFIRM | OPEN_STATE, &oc->oc_stateowner, &stp, NULL))) goto out; @@ -2264,7 +2275,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp, if ((status = nfs4_preprocess_seqid_op(&cstate->current_fh, od->od_seqid, &od->od_stateid, - CHECK_FH | OPEN_STATE, + OPEN_STATE, &od->od_stateowner, &stp, NULL))) goto out; @@ -2317,7 +2328,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = nfs4_preprocess_seqid_op(&cstate->current_fh, close->cl_seqid, &close->cl_stateid, - CHECK_FH | OPEN_STATE | CLOSE_STATE, + OPEN_STATE | CLOSE_STATE, &close->cl_stateowner, &stp, NULL))) goto out; status = nfs_ok; @@ -2622,7 +2633,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_seqid_op(&cstate->current_fh, lock->lk_new_open_seqid, &lock->lk_new_open_stateid, - CHECK_FH | OPEN_STATE, + OPEN_STATE, &lock->lk_replay_owner, &open_stp, lock); if (status) @@ -2649,7 +2660,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_seqid_op(&cstate->current_fh, lock->lk_old_lock_seqid, &lock->lk_old_lock_stateid, - CHECK_FH | LOCK_STATE, + LOCK_STATE, &lock->lk_replay_owner, &lock_stp, lock); if (status) goto out; @@ -2846,7 +2857,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if ((status = nfs4_preprocess_seqid_op(&cstate->current_fh, locku->lu_seqid, &locku->lu_stateid, - CHECK_FH | LOCK_STATE, + LOCK_STATE, &locku->lu_stateowner, &stp, NULL))) goto out; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0e6a179..1ba7ad9 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1867,6 +1867,15 @@ out_serverfault: goto out; } +static inline int attributes_need_mount(u32 *bmval) +{ + if (bmval[0] & ~(FATTR4_WORD0_RDATTR_ERROR | FATTR4_WORD0_LEASE_TIME)) + return 1; + if (bmval[1] & ~FATTR4_WORD1_MOUNTED_ON_FILEID) + return 1; + return 0; +} + static __be32 nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd, const char *name, int namlen, __be32 *p, int *buflen) @@ -1888,9 +1897,7 @@ nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd, * we will not follow the cross mount and will fill the attribtutes * directly from the mountpoint dentry. */ - if (d_mountpoint(dentry) && - (cd->rd_bmval[0] & ~FATTR4_WORD0_RDATTR_ERROR) == 0 && - (cd->rd_bmval[1] & ~FATTR4_WORD1_MOUNTED_ON_FILEID) == 0) + if (d_mountpoint(dentry) && !attributes_need_mount(cd->rd_bmval)) ignore_crossmnt = 1; else if (d_mountpoint(dentry)) { int err; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 8516137..613bcb8 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -37,6 +37,7 @@ #include #include +#include /* * We have a single directory with 9 nodes in it. @@ -149,7 +150,6 @@ static const struct file_operations transaction_ops = { .release = simple_transaction_release, }; -extern struct seq_operations nfs_exports_op; static int exports_open(struct inode *inode, struct file *file) { return seq_open(file, &nfs_exports_op); @@ -222,6 +222,7 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size) struct auth_domain *clp; int err = 0; struct knfsd_fh *res; + struct in6_addr in6; if (size < sizeof(*data)) return -EINVAL; @@ -236,7 +237,11 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size) res = (struct knfsd_fh*)buf; exp_readlock(); - if (!(clp = auth_unix_lookup(sin->sin_addr))) + + ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6); + + clp = auth_unix_lookup(&in6); + if (!clp) err = -EPERM; else { err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen); @@ -257,6 +262,7 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size) int err = 0; struct knfsd_fh fh; char *res; + struct in6_addr in6; if (size < sizeof(*data)) return -EINVAL; @@ -271,7 +277,11 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size) res = buf; sin = (struct sockaddr_in *)&data->gd_addr; exp_readlock(); - if (!(clp = auth_unix_lookup(sin->sin_addr))) + + ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6); + + clp = auth_unix_lookup(&in6); + if (!clp) err = -EPERM; else { err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE); @@ -347,8 +357,6 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size) return mesg - buf; } -extern int nfsd_nrthreads(void); - static ssize_t write_threads(struct file *file, char *buf, size_t size) { /* if size > 0, look for a number of threads and call nfsd_svc @@ -371,10 +379,6 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size) return strlen(buf); } -extern int nfsd_nrpools(void); -extern int nfsd_get_nrthreads(int n, int *); -extern int nfsd_set_nrthreads(int n, int *); - static ssize_t write_pool_threads(struct file *file, char *buf, size_t size) { /* if size > 0, look for an array of number of threads per node diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 3e6b3f4..100ae56 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -113,6 +113,124 @@ static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp, } /* + * Use the given filehandle to look up the corresponding export and + * dentry. On success, the results are used to set fh_export and + * fh_dentry. + */ +static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) +{ + struct knfsd_fh *fh = &fhp->fh_handle; + struct fid *fid = NULL, sfid; + struct svc_export *exp; + struct dentry *dentry; + int fileid_type; + int data_left = fh->fh_size/4; + __be32 error; + + error = nfserr_stale; + if (rqstp->rq_vers > 2) + error = nfserr_badhandle; + if (rqstp->rq_vers == 4 && fh->fh_size == 0) + return nfserr_nofilehandle; + + if (fh->fh_version == 1) { + int len; + + if (--data_left < 0) + return error; + if (fh->fh_auth_type != 0) + return error; + len = key_len(fh->fh_fsid_type) / 4; + if (len == 0) + return error; + if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { + /* deprecated, convert to type 3 */ + len = key_len(FSID_ENCODE_DEV)/4; + fh->fh_fsid_type = FSID_ENCODE_DEV; + fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl(fh->fh_fsid[0]), ntohl(fh->fh_fsid[1]))); + fh->fh_fsid[1] = fh->fh_fsid[2]; + } + data_left -= len; + if (data_left < 0) + return error; + exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_auth); + fid = (struct fid *)(fh->fh_auth + len); + } else { + __u32 tfh[2]; + dev_t xdev; + ino_t xino; + + if (fh->fh_size != NFS_FHSIZE) + return error; + /* assume old filehandle format */ + xdev = old_decode_dev(fh->ofh_xdev); + xino = u32_to_ino_t(fh->ofh_xino); + mk_fsid(FSID_DEV, tfh, xdev, xino, 0, NULL); + exp = rqst_exp_find(rqstp, FSID_DEV, tfh); + } + + error = nfserr_stale; + if (PTR_ERR(exp) == -ENOENT) + return error; + + if (IS_ERR(exp)) + return nfserrno(PTR_ERR(exp)); + + error = nfsd_setuser_and_check_port(rqstp, exp); + if (error) + goto out; + + /* + * Look up the dentry using the NFS file handle. + */ + error = nfserr_stale; + if (rqstp->rq_vers > 2) + error = nfserr_badhandle; + + if (fh->fh_version != 1) { + sfid.i32.ino = fh->ofh_ino; + sfid.i32.gen = fh->ofh_generation; + sfid.i32.parent_ino = fh->ofh_dirino; + fid = &sfid; + data_left = 3; + if (fh->ofh_dirino == 0) + fileid_type = FILEID_INO32_GEN; + else + fileid_type = FILEID_INO32_GEN_PARENT; + } else + fileid_type = fh->fh_fileid_type; + + if (fileid_type == FILEID_ROOT) + dentry = dget(exp->ex_path.dentry); + else { + dentry = exportfs_decode_fh(exp->ex_path.mnt, fid, + data_left, fileid_type, + nfsd_acceptable, exp); + } + if (dentry == NULL) + goto out; + if (IS_ERR(dentry)) { + if (PTR_ERR(dentry) != -EINVAL) + error = nfserrno(PTR_ERR(dentry)); + goto out; + } + + if (S_ISDIR(dentry->d_inode->i_mode) && + (dentry->d_flags & DCACHE_DISCONNECTED)) { + printk("nfsd: find_fh_dentry returned a DISCONNECTED directory: %s/%s\n", + dentry->d_parent->d_name.name, dentry->d_name.name); + } + + fhp->fh_dentry = dentry; + fhp->fh_export = exp; + nfsd_nr_verified++; + return 0; +out: + exp_put(exp); + return error; +} + +/* * Perform sanity checks on the dentry in a client's file handle. * * Note that the file handle dentry may need to be freed even after @@ -124,115 +242,18 @@ static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp, __be32 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, int access) { - struct knfsd_fh *fh = &fhp->fh_handle; - struct svc_export *exp = NULL; + struct svc_export *exp; struct dentry *dentry; - __be32 error = 0; + __be32 error; dprintk("nfsd: fh_verify(%s)\n", SVCFH_fmt(fhp)); if (!fhp->fh_dentry) { - struct fid *fid = NULL, sfid; - int fileid_type; - int data_left = fh->fh_size/4; - - error = nfserr_stale; - if (rqstp->rq_vers > 2) - error = nfserr_badhandle; - if (rqstp->rq_vers == 4 && fh->fh_size == 0) - return nfserr_nofilehandle; - - if (fh->fh_version == 1) { - int len; - if (--data_left<0) goto out; - switch (fh->fh_auth_type) { - case 0: break; - default: goto out; - } - len = key_len(fh->fh_fsid_type) / 4; - if (len == 0) goto out; - if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { - /* deprecated, convert to type 3 */ - len = key_len(FSID_ENCODE_DEV)/4; - fh->fh_fsid_type = FSID_ENCODE_DEV; - fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl(fh->fh_fsid[0]), ntohl(fh->fh_fsid[1]))); - fh->fh_fsid[1] = fh->fh_fsid[2]; - } - if ((data_left -= len)<0) goto out; - exp = rqst_exp_find(rqstp, fh->fh_fsid_type, - fh->fh_auth); - fid = (struct fid *)(fh->fh_auth + len); - } else { - __u32 tfh[2]; - dev_t xdev; - ino_t xino; - if (fh->fh_size != NFS_FHSIZE) - goto out; - /* assume old filehandle format */ - xdev = old_decode_dev(fh->ofh_xdev); - xino = u32_to_ino_t(fh->ofh_xino); - mk_fsid(FSID_DEV, tfh, xdev, xino, 0, NULL); - exp = rqst_exp_find(rqstp, FSID_DEV, tfh); - } - - error = nfserr_stale; - if (PTR_ERR(exp) == -ENOENT) - goto out; - - if (IS_ERR(exp)) { - error = nfserrno(PTR_ERR(exp)); - goto out; - } - - error = nfsd_setuser_and_check_port(rqstp, exp); + error = nfsd_set_fh_dentry(rqstp, fhp); if (error) goto out; - - /* - * Look up the dentry using the NFS file handle. - */ - error = nfserr_stale; - if (rqstp->rq_vers > 2) - error = nfserr_badhandle; - - if (fh->fh_version != 1) { - sfid.i32.ino = fh->ofh_ino; - sfid.i32.gen = fh->ofh_generation; - sfid.i32.parent_ino = fh->ofh_dirino; - fid = &sfid; - data_left = 3; - if (fh->ofh_dirino == 0) - fileid_type = FILEID_INO32_GEN; - else - fileid_type = FILEID_INO32_GEN_PARENT; - } else - fileid_type = fh->fh_fileid_type; - - if (fileid_type == FILEID_ROOT) - dentry = dget(exp->ex_path.dentry); - else { - dentry = exportfs_decode_fh(exp->ex_path.mnt, fid, - data_left, fileid_type, - nfsd_acceptable, exp); - } - if (dentry == NULL) - goto out; - if (IS_ERR(dentry)) { - if (PTR_ERR(dentry) != -EINVAL) - error = nfserrno(PTR_ERR(dentry)); - goto out; - } - - if (S_ISDIR(dentry->d_inode->i_mode) && - (dentry->d_flags & DCACHE_DISCONNECTED)) { - printk("nfsd: find_fh_dentry returned a DISCONNECTED directory: %s/%s\n", - dentry->d_parent->d_name.name, dentry->d_name.name); - } - - fhp->fh_dentry = dentry; - fhp->fh_export = exp; - nfsd_nr_verified++; - cache_get(&exp->h); + dentry = fhp->fh_dentry; + exp = fhp->fh_export; } else { /* * just rechecking permissions @@ -242,7 +263,6 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, int access) dprintk("nfsd: fh_verify - just checking\n"); dentry = fhp->fh_dentry; exp = fhp->fh_export; - cache_get(&exp->h); /* * Set user creds for this exportpoint; necessary even * in the "just checking" case because this may be a @@ -281,8 +301,6 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, int access) access, ntohl(error)); } out: - if (exp && !IS_ERR(exp)) - exp_put(exp); if (error == nfserr_stale) nfsdstats.fh_stale++; return error; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 9647b0f..941041f 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -244,7 +244,6 @@ static int nfsd_init_socks(int port) if (error < 0) return error; -#ifdef CONFIG_NFSD_TCP error = lockd_up(IPPROTO_TCP); if (error >= 0) { error = svc_create_xprt(nfsd_serv, "tcp", port, @@ -254,7 +253,6 @@ static int nfsd_init_socks(int port) } if (error < 0) return error; -#endif return 0; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 46f59d5..17ac51b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -988,7 +988,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file, * flushing the data to disk is handled separately below. */ - if (file->f_op->fsync == 0) {/* COMMIT3 cannot work */ + if (!file->f_op->fsync) {/* COMMIT3 cannot work */ stable = 2; *stablep = 2; /* FILE_SYNC */ } @@ -1152,7 +1152,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, } #endif /* CONFIG_NFSD_V3 */ -__be32 +static __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, struct iattr *iap) { diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index adcbb05..de8387b 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -43,7 +43,7 @@ struct fid { u32 parent_ino; u32 parent_gen; } i32; - __u32 raw[6]; + __u32 raw[0]; }; }; diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index 4babb2a..acf39e1 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -173,8 +173,10 @@ void nlmclnt_next_cookie(struct nlm_cookie *); /* * Host cache */ -struct nlm_host *nlmclnt_lookup_host(const struct sockaddr_in *, int, int, - const char *, unsigned int); +struct nlm_host *nlmclnt_lookup_host(const struct sockaddr_in *sin, + int proto, u32 version, + const char *hostname, + unsigned int hostname_len); struct nlm_host *nlmsvc_lookup_host(struct svc_rqst *, const char *, unsigned int); struct rpc_clnt * nlm_bind_host(struct nlm_host *); @@ -217,8 +219,7 @@ void nlmsvc_mark_resources(void); void nlmsvc_free_host_resources(struct nlm_host *); void nlmsvc_invalidate_all(void); -static __inline__ struct inode * -nlmsvc_file_inode(struct nlm_file *file) +static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) { return file->f_file->f_path.dentry->d_inode; } @@ -226,8 +227,8 @@ nlmsvc_file_inode(struct nlm_file *file) /* * Compare two host addresses (needs modifying for ipv6) */ -static __inline__ int -nlm_cmp_addr(const struct sockaddr_in *sin1, const struct sockaddr_in *sin2) +static inline int nlm_cmp_addr(const struct sockaddr_in *sin1, + const struct sockaddr_in *sin2) { return sin1->sin_addr.s_addr == sin2->sin_addr.s_addr; } @@ -236,8 +237,8 @@ nlm_cmp_addr(const struct sockaddr_in *sin1, const struct sockaddr_in *sin2) * Compare two NLM locks. * When the second lock is of type F_UNLCK, this acts like a wildcard. */ -static __inline__ int -nlm_compare_locks(const struct file_lock *fl1, const struct file_lock *fl2) +static inline int nlm_compare_locks(const struct file_lock *fl1, + const struct file_lock *fl2) { return fl1->fl_pid == fl2->fl_pid && fl1->fl_owner == fl2->fl_owner diff --git a/include/linux/lockd/sm_inter.h b/include/linux/lockd/sm_inter.h index 22a6458..5a5448b 100644 --- a/include/linux/lockd/sm_inter.h +++ b/include/linux/lockd/sm_inter.h @@ -19,6 +19,7 @@ #define SM_NOTIFY 6 #define SM_MAXSTRLEN 1024 +#define SM_PRIV_SIZE 16 /* * Arguments for all calls to statd diff --git a/include/linux/nfs3.h b/include/linux/nfs3.h index 7f11fa5..539f3b5 100644 --- a/include/linux/nfs3.h +++ b/include/linux/nfs3.h @@ -96,7 +96,7 @@ struct nfs3_fh { #define MOUNTPROC3_UMNTALL 4 -#if defined(__KERNEL__) || defined(NFS_NEED_KERNEL_TYPES) +#if defined(__KERNEL__) /* Number of 32bit words in post_op_attr */ #define NFS3_POST_OP_ATTR_WORDS 22 diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index 3423c67..ac7e4fb 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -93,6 +93,7 @@ struct nfs_server { unsigned int wpages; /* write size (in pages) */ unsigned int wtmult; /* server disk block size */ unsigned int dtsize; /* readdir size */ + unsigned short port; /* "port=" setting */ unsigned int bsize; /* server block size */ unsigned int acregmin; /* attr cache timeouts */ unsigned int acregmax; @@ -117,6 +118,13 @@ struct nfs_server { atomic_t active; /* Keep trace of any activity to this server */ wait_queue_head_t active_wq; /* Wait for any activity to stop */ + + /* mountd-related mount options */ + struct sockaddr_storage mountd_address; + size_t mountd_addrlen; + u32 mountd_version; + unsigned short mountd_port; + unsigned short mountd_protocol; }; /* Server capabilities */ diff --git a/include/linux/nfsd/Kbuild b/include/linux/nfsd/Kbuild index e726fc3..fc97204 100644 --- a/include/linux/nfsd/Kbuild +++ b/include/linux/nfsd/Kbuild @@ -1,6 +1,6 @@ unifdef-y += const.h +unifdef-y += debug.h unifdef-y += export.h +unifdef-y += nfsfh.h unifdef-y += stats.h unifdef-y += syscall.h -unifdef-y += nfsfh.h -unifdef-y += debug.h diff --git a/include/linux/nfsd/cache.h b/include/linux/nfsd/cache.h index 7b5d784..04b355c 100644 --- a/include/linux/nfsd/cache.h +++ b/include/linux/nfsd/cache.h @@ -10,7 +10,6 @@ #ifndef NFSCACHE_H #define NFSCACHE_H -#ifdef __KERNEL__ #include #include @@ -77,5 +76,4 @@ void nfsd_reply_cache_shutdown(void); int nfsd_cache_lookup(struct svc_rqst *, int); void nfsd_cache_update(struct svc_rqst *, int, __be32 *); -#endif /* __KERNEL__ */ #endif /* NFSCACHE_H */ diff --git a/include/linux/nfsd/nfsd.h b/include/linux/nfsd/nfsd.h index 8caf4c4..21ee440 100644 --- a/include/linux/nfsd/nfsd.h +++ b/include/linux/nfsd/nfsd.h @@ -27,7 +27,6 @@ #define NFSD_VERSION "0.5" #define NFSD_SUPPORTED_MINOR_VERSION 0 -#ifdef __KERNEL__ /* * Special flags for nfsd_permission. These must be different from MAY_READ, * MAY_WRITE, and MAY_EXEC. @@ -56,12 +55,20 @@ extern struct svc_program nfsd_program; extern struct svc_version nfsd_version2, nfsd_version3, nfsd_version4; extern struct svc_serv *nfsd_serv; + +extern struct seq_operations nfs_exports_op; + /* * Function prototypes. */ int nfsd_svc(unsigned short port, int nrservs); int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp); +int nfsd_nrthreads(void); +int nfsd_nrpools(void); +int nfsd_get_nrthreads(int n, int *); +int nfsd_set_nrthreads(int n, int *); + /* nfsd/vfs.c */ int fh_lock_parent(struct svc_fh *, struct dentry *); int nfsd_racache_init(int); @@ -326,6 +333,4 @@ extern struct timeval nfssvc_boot; #endif /* CONFIG_NFSD_V4 */ -#endif /* __KERNEL__ */ - #endif /* LINUX_NFSD_NFSD_H */ diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h index 7a69ca3..e93cd8a 100644 --- a/include/linux/sunrpc/auth.h +++ b/include/linux/sunrpc/auth.h @@ -59,8 +59,8 @@ struct rpc_cred { /* * Client authentication handle */ -#define RPC_CREDCACHE_NR 8 -#define RPC_CREDCACHE_MASK (RPC_CREDCACHE_NR - 1) +#define RPC_CREDCACHE_HASHBITS 4 +#define RPC_CREDCACHE_NR (1 << RPC_CREDCACHE_HASHBITS) struct rpc_cred_cache { struct hlist_head hashtable[RPC_CREDCACHE_NR]; spinlock_t lock; @@ -89,7 +89,6 @@ struct rpc_auth { /* Flags for rpcauth_lookupcred() */ #define RPCAUTH_LOOKUP_NEW 0x01 /* Accept an uninitialised cred */ -#define RPCAUTH_LOOKUP_ROOTCREDS 0x02 /* This really ought to go! */ /* * Client authentication ops @@ -113,6 +112,7 @@ struct rpc_credops { void (*crdestroy)(struct rpc_cred *); int (*crmatch)(struct auth_cred *, struct rpc_cred *, int); + void (*crbind)(struct rpc_task *, struct rpc_cred *); __be32 * (*crmarshal)(struct rpc_task *, __be32 *); int (*crrefresh)(struct rpc_task *); __be32 * (*crvalidate)(struct rpc_task *, __be32 *); @@ -126,9 +126,12 @@ extern const struct rpc_authops authunix_ops; extern const struct rpc_authops authnull_ops; void __init rpc_init_authunix(void); +void __init rpc_init_generic_auth(void); void __init rpcauth_init_module(void); void __exit rpcauth_remove_module(void); +void __exit rpc_destroy_generic_auth(void); +struct rpc_cred * rpc_lookup_cred(void); int rpcauth_register(const struct rpc_authops *); int rpcauth_unregister(const struct rpc_authops *); struct rpc_auth * rpcauth_create(rpc_authflavor_t, struct rpc_clnt *); @@ -136,8 +139,8 @@ void rpcauth_release(struct rpc_auth *); struct rpc_cred * rpcauth_lookup_credcache(struct rpc_auth *, struct auth_cred *, int); void rpcauth_init_cred(struct rpc_cred *, const struct auth_cred *, struct rpc_auth *, const struct rpc_credops *); struct rpc_cred * rpcauth_lookupcred(struct rpc_auth *, int); -struct rpc_cred * rpcauth_bindcred(struct rpc_task *); -void rpcauth_holdcred(struct rpc_task *); +void rpcauth_bindcred(struct rpc_task *, struct rpc_cred *, int); +void rpcauth_generic_bind_cred(struct rpc_task *, struct rpc_cred *); void put_rpccred(struct rpc_cred *); void rpcauth_unbindcred(struct rpc_task *); __be32 * rpcauth_marshcred(struct rpc_task *, __be32 *); diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h index 03547d6..2d8b211 100644 --- a/include/linux/sunrpc/cache.h +++ b/include/linux/sunrpc/cache.h @@ -120,7 +120,6 @@ struct cache_deferred_req { struct list_head hash; /* on hash chain */ struct list_head recent; /* on fifo */ struct cache_head *item; /* cache item we wait on */ - time_t recv_time; void *owner; /* we might need to discard all defered requests * owned by someone */ void (*revisit)(struct cache_deferred_req *req, diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 129a86e..6fff7f8 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -127,11 +127,12 @@ int rpcb_getport_sync(struct sockaddr_in *, u32, u32, int); void rpcb_getport_async(struct rpc_task *); void rpc_call_start(struct rpc_task *); -int rpc_call_async(struct rpc_clnt *clnt, struct rpc_message *msg, - int flags, const struct rpc_call_ops *tk_ops, +int rpc_call_async(struct rpc_clnt *clnt, + const struct rpc_message *msg, int flags, + const struct rpc_call_ops *tk_ops, void *calldata); -int rpc_call_sync(struct rpc_clnt *clnt, struct rpc_message *msg, - int flags); +int rpc_call_sync(struct rpc_clnt *clnt, + const struct rpc_message *msg, int flags); struct rpc_task *rpc_call_null(struct rpc_clnt *clnt, struct rpc_cred *cred, int flags); void rpc_restart_call(struct rpc_task *); diff --git a/include/linux/sunrpc/gss_krb5.h b/include/linux/sunrpc/gss_krb5.h index 5a4b1e0..a10f1fb 100644 --- a/include/linux/sunrpc/gss_krb5.h +++ b/include/linux/sunrpc/gss_krb5.h @@ -70,8 +70,6 @@ enum seal_alg { SEAL_ALG_DES3KD = 0x0002 }; -#define KRB5_CKSUM_LENGTH 8 - #define CKSUMTYPE_CRC32 0x0001 #define CKSUMTYPE_RSA_MD4 0x0002 #define CKSUMTYPE_RSA_MD4_DES 0x0003 @@ -150,9 +148,9 @@ gss_decrypt_xdr_buf(struct crypto_blkcipher *tfm, struct xdr_buf *inbuf, s32 krb5_make_seq_num(struct crypto_blkcipher *key, int direction, - s32 seqnum, unsigned char *cksum, unsigned char *buf); + u32 seqnum, unsigned char *cksum, unsigned char *buf); s32 krb5_get_seq_num(struct crypto_blkcipher *key, unsigned char *cksum, - unsigned char *buf, int *direction, s32 * seqnum); + unsigned char *buf, int *direction, u32 *seqnum); diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index f689f02..d1a5c8c 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -11,7 +11,6 @@ #include #include -#include #include #include #include @@ -33,7 +32,8 @@ struct rpc_wait_queue; struct rpc_wait { struct list_head list; /* wait queue links */ struct list_head links; /* Links to related tasks */ - struct rpc_wait_queue * rpc_waitq; /* RPC wait queue we're on */ + struct list_head timer_list; /* Timer list */ + unsigned long expires; }; /* @@ -57,33 +57,25 @@ struct rpc_task { __u8 tk_cred_retry; /* - * timeout_fn to be executed by timer bottom half * callback to be executed after waking up * action next procedure for async tasks * tk_ops caller callbacks */ - void (*tk_timeout_fn)(struct rpc_task *); void (*tk_callback)(struct rpc_task *); void (*tk_action)(struct rpc_task *); const struct rpc_call_ops *tk_ops; void * tk_calldata; - /* - * tk_timer is used for async processing by the RPC scheduling - * primitives. You should not access this directly unless - * you have a pathological interest in kernel oopses. - */ - struct timer_list tk_timer; /* kernel timer */ unsigned long tk_timeout; /* timeout for rpc_sleep() */ unsigned short tk_flags; /* misc flags */ unsigned long tk_runstate; /* Task run status */ struct workqueue_struct *tk_workqueue; /* Normally rpciod, but could * be any workqueue */ + struct rpc_wait_queue *tk_waitqueue; /* RPC wait queue we're on */ union { struct work_struct tk_work; /* Async task work queue */ struct rpc_wait tk_wait; /* RPC wait */ - struct rcu_head tk_rcu; /* for task deletion */ } u; unsigned short tk_timeouts; /* maj timeouts */ @@ -123,6 +115,7 @@ struct rpc_task_setup { const struct rpc_message *rpc_message; const struct rpc_call_ops *callback_ops; void *callback_data; + struct workqueue_struct *workqueue; unsigned short flags; signed char priority; }; @@ -147,9 +140,7 @@ struct rpc_task_setup { #define RPC_TASK_RUNNING 0 #define RPC_TASK_QUEUED 1 -#define RPC_TASK_WAKEUP 2 -#define RPC_TASK_HAS_TIMER 3 -#define RPC_TASK_ACTIVE 4 +#define RPC_TASK_ACTIVE 2 #define RPC_IS_RUNNING(t) test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate) #define rpc_set_running(t) set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate) @@ -171,15 +162,6 @@ struct rpc_task_setup { smp_mb__after_clear_bit(); \ } while (0) -#define rpc_start_wakeup(t) \ - (test_and_set_bit(RPC_TASK_WAKEUP, &(t)->tk_runstate) == 0) -#define rpc_finish_wakeup(t) \ - do { \ - smp_mb__before_clear_bit(); \ - clear_bit(RPC_TASK_WAKEUP, &(t)->tk_runstate); \ - smp_mb__after_clear_bit(); \ - } while (0) - #define RPC_IS_ACTIVATED(t) test_bit(RPC_TASK_ACTIVE, &(t)->tk_runstate) /* @@ -192,6 +174,12 @@ struct rpc_task_setup { #define RPC_PRIORITY_HIGH (1) #define RPC_NR_PRIORITY (1 + RPC_PRIORITY_HIGH - RPC_PRIORITY_LOW) +struct rpc_timer { + struct timer_list timer; + struct list_head list; + unsigned long expires; +}; + /* * RPC synchronization objects */ @@ -204,6 +192,7 @@ struct rpc_wait_queue { unsigned char count; /* # task groups remaining serviced so far */ unsigned char nr; /* # tasks remaining for cookie */ unsigned short qlen; /* total # tasks waiting in queue */ + struct rpc_timer timer_list; #ifdef RPC_DEBUG const char * name; #endif @@ -229,9 +218,11 @@ void rpc_killall_tasks(struct rpc_clnt *); void rpc_execute(struct rpc_task *); void rpc_init_priority_wait_queue(struct rpc_wait_queue *, const char *); void rpc_init_wait_queue(struct rpc_wait_queue *, const char *); +void rpc_destroy_wait_queue(struct rpc_wait_queue *); void rpc_sleep_on(struct rpc_wait_queue *, struct rpc_task *, - rpc_action action, rpc_action timer); -void rpc_wake_up_task(struct rpc_task *); + rpc_action action); +void rpc_wake_up_queued_task(struct rpc_wait_queue *, + struct rpc_task *); void rpc_wake_up(struct rpc_wait_queue *); struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *); void rpc_wake_up_status(struct rpc_wait_queue *, int); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 64c9755..4b54c5f 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -386,7 +386,6 @@ struct svc_serv * svc_create(struct svc_program *, unsigned int, void (*shutdown)(struct svc_serv*)); struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool); -int svc_create_thread(svc_thread_fn, struct svc_serv *); void svc_exit_thread(struct svc_rqst *); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, void (*shutdown)(struct svc_serv*), diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index 22e1ef8..d39dbdc 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -24,6 +24,7 @@ struct svc_cred { }; struct svc_rqst; /* forward decl */ +struct in6_addr; /* Authentication is done in the context of a domain. * @@ -120,10 +121,10 @@ extern void svc_auth_unregister(rpc_authflavor_t flavor); extern struct auth_domain *unix_domain_find(char *name); extern void auth_domain_put(struct auth_domain *item); -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom); +extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom); extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new); extern struct auth_domain *auth_domain_find(char *name); -extern struct auth_domain *auth_unix_lookup(struct in_addr addr); +extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr); extern int auth_unix_forget_old(struct auth_domain *dom); extern void svcauth_unix_purge(void); extern void svcauth_unix_info_release(void *); diff --git a/include/net/ipv6.h b/include/net/ipv6.h index c0c019f..12d94f0 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -383,6 +383,15 @@ static inline int ipv6_addr_v4mapped(const struct in6_addr *a) a->s6_addr32[2] == htonl(0x0000ffff)); } +static inline void ipv6_addr_set_v4mapped(const __be32 addr, + struct in6_addr *v4mapped) +{ + ipv6_addr_set(v4mapped, + 0, 0, + htonl(0x0000FFFF), + addr); +} + /* * find the first different bit between two addresses * length of address must be a multiple of 32bits diff --git a/net/sunrpc/Makefile b/net/sunrpc/Makefile index 92e1dbe..5369aa3 100644 --- a/net/sunrpc/Makefile +++ b/net/sunrpc/Makefile @@ -8,7 +8,7 @@ obj-$(CONFIG_SUNRPC_GSS) += auth_gss/ obj-$(CONFIG_SUNRPC_XPRT_RDMA) += xprtrdma/ sunrpc-y := clnt.o xprt.o socklib.o xprtsock.o sched.o \ - auth.o auth_null.o auth_unix.o \ + auth.o auth_null.o auth_unix.o auth_generic.o \ svc.o svcsock.o svcauth.o svcauth_unix.o \ rpcb_clnt.o timer.o xdr.o \ sunrpc_syms.o cache.o rpc_pipe.o \ diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index eca941c..0632cd0 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -280,10 +281,9 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, struct hlist_node *pos; struct rpc_cred *cred = NULL, *entry, *new; - int nr = 0; + unsigned int nr; - if (!(flags & RPCAUTH_LOOKUP_ROOTCREDS)) - nr = acred->uid & RPC_CREDCACHE_MASK; + nr = hash_long(acred->uid, RPC_CREDCACHE_HASHBITS); rcu_read_lock(); hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) { @@ -356,7 +356,6 @@ rpcauth_lookupcred(struct rpc_auth *auth, int flags) put_group_info(acred.group_info); return ret; } -EXPORT_SYMBOL_GPL(rpcauth_lookupcred); void rpcauth_init_cred(struct rpc_cred *cred, const struct auth_cred *acred, @@ -375,41 +374,58 @@ rpcauth_init_cred(struct rpc_cred *cred, const struct auth_cred *acred, } EXPORT_SYMBOL_GPL(rpcauth_init_cred); -struct rpc_cred * -rpcauth_bindcred(struct rpc_task *task) +void +rpcauth_generic_bind_cred(struct rpc_task *task, struct rpc_cred *cred) +{ + task->tk_msg.rpc_cred = get_rpccred(cred); + dprintk("RPC: %5u holding %s cred %p\n", task->tk_pid, + cred->cr_auth->au_ops->au_name, cred); +} +EXPORT_SYMBOL_GPL(rpcauth_generic_bind_cred); + +static void +rpcauth_bind_root_cred(struct rpc_task *task) { struct rpc_auth *auth = task->tk_client->cl_auth; struct auth_cred acred = { - .uid = current->fsuid, - .gid = current->fsgid, - .group_info = current->group_info, + .uid = 0, + .gid = 0, }; struct rpc_cred *ret; - int flags = 0; dprintk("RPC: %5u looking up %s cred\n", task->tk_pid, task->tk_client->cl_auth->au_ops->au_name); - get_group_info(acred.group_info); - if (task->tk_flags & RPC_TASK_ROOTCREDS) - flags |= RPCAUTH_LOOKUP_ROOTCREDS; - ret = auth->au_ops->lookup_cred(auth, &acred, flags); + ret = auth->au_ops->lookup_cred(auth, &acred, 0); + if (!IS_ERR(ret)) + task->tk_msg.rpc_cred = ret; + else + task->tk_status = PTR_ERR(ret); +} + +static void +rpcauth_bind_new_cred(struct rpc_task *task) +{ + struct rpc_auth *auth = task->tk_client->cl_auth; + struct rpc_cred *ret; + + dprintk("RPC: %5u looking up %s cred\n", + task->tk_pid, auth->au_ops->au_name); + ret = rpcauth_lookupcred(auth, 0); if (!IS_ERR(ret)) task->tk_msg.rpc_cred = ret; else task->tk_status = PTR_ERR(ret); - put_group_info(acred.group_info); - return ret; } void -rpcauth_holdcred(struct rpc_task *task) +rpcauth_bindcred(struct rpc_task *task, struct rpc_cred *cred, int flags) { - struct rpc_cred *cred = task->tk_msg.rpc_cred; - if (cred != NULL) { - get_rpccred(cred); - dprintk("RPC: %5u holding %s cred %p\n", task->tk_pid, - cred->cr_auth->au_ops->au_name, cred); - } + if (cred != NULL) + cred->cr_ops->crbind(task, cred); + else if (flags & RPC_TASK_ROOTCREDS) + rpcauth_bind_root_cred(task); + else + rpcauth_bind_new_cred(task); } void @@ -550,6 +566,7 @@ static struct shrinker rpc_cred_shrinker = { void __init rpcauth_init_module(void) { rpc_init_authunix(); + rpc_init_generic_auth(); register_shrinker(&rpc_cred_shrinker); } diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c new file mode 100644 index 0000000..6a3f77c --- /dev/null +++ b/net/sunrpc/auth_generic.c @@ -0,0 +1,157 @@ +/* + * Generic RPC credential + * + * Copyright (C) 2008, Trond Myklebust + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef RPC_DEBUG +# define RPCDBG_FACILITY RPCDBG_AUTH +#endif + +struct generic_cred { + struct rpc_cred gc_base; + struct auth_cred acred; +}; + +static struct rpc_auth generic_auth; +static struct rpc_cred_cache generic_cred_cache; +static const struct rpc_credops generic_credops; + +/* + * Public call interface + */ +struct rpc_cred *rpc_lookup_cred(void) +{ + return rpcauth_lookupcred(&generic_auth, 0); +} +EXPORT_SYMBOL_GPL(rpc_lookup_cred); + +static void +generic_bind_cred(struct rpc_task *task, struct rpc_cred *cred) +{ + struct rpc_auth *auth = task->tk_client->cl_auth; + struct auth_cred *acred = &container_of(cred, struct generic_cred, gc_base)->acred; + struct rpc_cred *ret; + + ret = auth->au_ops->lookup_cred(auth, acred, 0); + if (!IS_ERR(ret)) + task->tk_msg.rpc_cred = ret; + else + task->tk_status = PTR_ERR(ret); +} + +/* + * Lookup generic creds for current process + */ +static struct rpc_cred * +generic_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) +{ + return rpcauth_lookup_credcache(&generic_auth, acred, flags); +} + +static struct rpc_cred * +generic_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) +{ + struct generic_cred *gcred; + + gcred = kmalloc(sizeof(*gcred), GFP_KERNEL); + if (gcred == NULL) + return ERR_PTR(-ENOMEM); + + rpcauth_init_cred(&gcred->gc_base, acred, &generic_auth, &generic_credops); + gcred->gc_base.cr_flags = 1UL << RPCAUTH_CRED_UPTODATE; + + gcred->acred.uid = acred->uid; + gcred->acred.gid = acred->gid; + gcred->acred.group_info = acred->group_info; + if (gcred->acred.group_info != NULL) + get_group_info(gcred->acred.group_info); + + dprintk("RPC: allocated generic cred %p for uid %d gid %d\n", + gcred, acred->uid, acred->gid); + return &gcred->gc_base; +} + +static void +generic_free_cred(struct rpc_cred *cred) +{ + struct generic_cred *gcred = container_of(cred, struct generic_cred, gc_base); + + dprintk("RPC: generic_free_cred %p\n", gcred); + if (gcred->acred.group_info != NULL) + put_group_info(gcred->acred.group_info); + kfree(gcred); +} + +static void +generic_free_cred_callback(struct rcu_head *head) +{ + struct rpc_cred *cred = container_of(head, struct rpc_cred, cr_rcu); + generic_free_cred(cred); +} + +static void +generic_destroy_cred(struct rpc_cred *cred) +{ + call_rcu(&cred->cr_rcu, generic_free_cred_callback); +} + +/* + * Match credentials against current process creds. + */ +static int +generic_match(struct auth_cred *acred, struct rpc_cred *cred, int flags) +{ + struct generic_cred *gcred = container_of(cred, struct generic_cred, gc_base); + + if (gcred->acred.uid != acred->uid || + gcred->acred.gid != acred->gid || + gcred->acred.group_info != acred->group_info) + return 0; + return 1; +} + +void __init rpc_init_generic_auth(void) +{ + spin_lock_init(&generic_cred_cache.lock); +} + +void __exit rpc_destroy_generic_auth(void) +{ + rpcauth_clear_credcache(&generic_cred_cache); +} + +static struct rpc_cred_cache generic_cred_cache = { + {{ NULL, },}, +}; + +static const struct rpc_authops generic_auth_ops = { + .owner = THIS_MODULE, +#ifdef RPC_DEBUG + .au_name = "Generic", +#endif + .lookup_cred = generic_lookup_cred, + .crcreate = generic_create_cred, +}; + +static struct rpc_auth generic_auth = { + .au_ops = &generic_auth_ops, + .au_count = ATOMIC_INIT(0), + .au_credcache = &generic_cred_cache, +}; + +static const struct rpc_credops generic_credops = { + .cr_name = "Generic cred", + .crdestroy = generic_destroy_cred, + .crbind = generic_bind_cred, + .crmatch = generic_match, +}; diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 6dac387..d34f6df 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -266,6 +266,7 @@ gss_release_msg(struct gss_upcall_msg *gss_msg) BUG_ON(!list_empty(&gss_msg->list)); if (gss_msg->ctx != NULL) gss_put_ctx(gss_msg->ctx); + rpc_destroy_wait_queue(&gss_msg->rpc_waitqueue); kfree(gss_msg); } @@ -408,13 +409,13 @@ gss_refresh_upcall(struct rpc_task *task) } spin_lock(&inode->i_lock); if (gss_cred->gc_upcall != NULL) - rpc_sleep_on(&gss_cred->gc_upcall->rpc_waitqueue, task, NULL, NULL); + rpc_sleep_on(&gss_cred->gc_upcall->rpc_waitqueue, task, NULL); else if (gss_msg->ctx == NULL && gss_msg->msg.errno >= 0) { task->tk_timeout = 0; gss_cred->gc_upcall = gss_msg; /* gss_upcall_callback will release the reference to gss_upcall_msg */ atomic_inc(&gss_msg->count); - rpc_sleep_on(&gss_msg->rpc_waitqueue, task, gss_upcall_callback, NULL); + rpc_sleep_on(&gss_msg->rpc_waitqueue, task, gss_upcall_callback); } else err = gss_msg->msg.errno; spin_unlock(&inode->i_lock); @@ -1299,6 +1300,7 @@ static const struct rpc_credops gss_credops = { .cr_name = "AUTH_GSS", .crdestroy = gss_destroy_cred, .cr_init = gss_cred_init, + .crbind = rpcauth_generic_bind_cred, .crmatch = gss_match, .crmarshal = gss_marshal, .crrefresh = gss_refresh, @@ -1310,6 +1312,7 @@ static const struct rpc_credops gss_credops = { static const struct rpc_credops gss_nullops = { .cr_name = "AUTH_GSS", .crdestroy = gss_destroy_cred, + .crbind = rpcauth_generic_bind_cred, .crmatch = gss_match, .crmarshal = gss_marshal, .crrefresh = gss_refresh_null, diff --git a/net/sunrpc/auth_gss/gss_generic_token.c b/net/sunrpc/auth_gss/gss_generic_token.c index ea8c92e..d83b881 100644 --- a/net/sunrpc/auth_gss/gss_generic_token.c +++ b/net/sunrpc/auth_gss/gss_generic_token.c @@ -148,7 +148,7 @@ int g_token_size(struct xdr_netobj *mech, unsigned int body_size) { /* set body_size to sequence contents size */ - body_size += 4 + (int) mech->len; /* NEED overflow check */ + body_size += 2 + (int) mech->len; /* NEED overflow check */ return(1 + der_length_size(body_size) + body_size); } @@ -161,7 +161,7 @@ void g_make_token_header(struct xdr_netobj *mech, int body_size, unsigned char **buf) { *(*buf)++ = 0x60; - der_write_length(buf, 4 + mech->len + body_size); + der_write_length(buf, 2 + mech->len + body_size); *(*buf)++ = 0x06; *(*buf)++ = (unsigned char) mech->len; TWRITE_STR(*buf, mech->data, ((int) mech->len)); diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c b/net/sunrpc/auth_gss/gss_krb5_crypto.c index 0dd7923..1d52308 100644 --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c @@ -66,8 +66,8 @@ krb5_encrypt( goto out; if (crypto_blkcipher_ivsize(tfm) > 16) { - dprintk("RPC: gss_k5encrypt: tfm iv size to large %d\n", - crypto_blkcipher_ivsize(tfm)); + dprintk("RPC: gss_k5encrypt: tfm iv size too large %d\n", + crypto_blkcipher_ivsize(tfm)); goto out; } @@ -102,7 +102,7 @@ krb5_decrypt( goto out; if (crypto_blkcipher_ivsize(tfm) > 16) { - dprintk("RPC: gss_k5decrypt: tfm iv size to large %d\n", + dprintk("RPC: gss_k5decrypt: tfm iv size too large %d\n", crypto_blkcipher_ivsize(tfm)); goto out; } diff --git a/net/sunrpc/auth_gss/gss_krb5_seal.c b/net/sunrpc/auth_gss/gss_krb5_seal.c index dedcbd6..5f1d36d 100644 --- a/net/sunrpc/auth_gss/gss_krb5_seal.c +++ b/net/sunrpc/auth_gss/gss_krb5_seal.c @@ -87,10 +87,10 @@ gss_get_mic_kerberos(struct gss_ctx *gss_ctx, struct xdr_buf *text, now = get_seconds(); - token->len = g_token_size(&ctx->mech_used, 22); + token->len = g_token_size(&ctx->mech_used, 24); ptr = token->data; - g_make_token_header(&ctx->mech_used, 22, &ptr); + g_make_token_header(&ctx->mech_used, 24, &ptr); *ptr++ = (unsigned char) ((KG_TOK_MIC_MSG>>8)&0xff); *ptr++ = (unsigned char) (KG_TOK_MIC_MSG&0xff); @@ -109,15 +109,14 @@ gss_get_mic_kerberos(struct gss_ctx *gss_ctx, struct xdr_buf *text, md5cksum.data, md5cksum.len)) return GSS_S_FAILURE; - memcpy(krb5_hdr + 16, md5cksum.data + md5cksum.len - KRB5_CKSUM_LENGTH, - KRB5_CKSUM_LENGTH); + memcpy(krb5_hdr + 16, md5cksum.data + md5cksum.len - 8, 8); spin_lock(&krb5_seq_lock); seq_send = ctx->seq_send++; spin_unlock(&krb5_seq_lock); if (krb5_make_seq_num(ctx->seq, ctx->initiate ? 0 : 0xff, - ctx->seq_send, krb5_hdr + 16, krb5_hdr + 8)) + seq_send, krb5_hdr + 16, krb5_hdr + 8)) return GSS_S_FAILURE; return (ctx->endtime < now) ? GSS_S_CONTEXT_EXPIRED : GSS_S_COMPLETE; diff --git a/net/sunrpc/auth_gss/gss_krb5_seqnum.c b/net/sunrpc/auth_gss/gss_krb5_seqnum.c index 43f3421..f160be6 100644 --- a/net/sunrpc/auth_gss/gss_krb5_seqnum.c +++ b/net/sunrpc/auth_gss/gss_krb5_seqnum.c @@ -43,7 +43,7 @@ s32 krb5_make_seq_num(struct crypto_blkcipher *key, int direction, - s32 seqnum, + u32 seqnum, unsigned char *cksum, unsigned char *buf) { unsigned char plain[8]; @@ -65,7 +65,7 @@ s32 krb5_get_seq_num(struct crypto_blkcipher *key, unsigned char *cksum, unsigned char *buf, - int *direction, s32 * seqnum) + int *direction, u32 *seqnum) { s32 code; unsigned char plain[8]; diff --git a/net/sunrpc/auth_gss/gss_krb5_unseal.c b/net/sunrpc/auth_gss/gss_krb5_unseal.c index e30a993..d91a5d0 100644 --- a/net/sunrpc/auth_gss/gss_krb5_unseal.c +++ b/net/sunrpc/auth_gss/gss_krb5_unseal.c @@ -82,7 +82,7 @@ gss_verify_mic_kerberos(struct gss_ctx *gss_ctx, struct xdr_netobj md5cksum = {.len = 0, .data = cksumdata}; s32 now; int direction; - s32 seqnum; + u32 seqnum; unsigned char *ptr = (unsigned char *)read_token->data; int bodysize; diff --git a/net/sunrpc/auth_gss/gss_krb5_wrap.c b/net/sunrpc/auth_gss/gss_krb5_wrap.c index 3bdc527..b00b1b4 100644 --- a/net/sunrpc/auth_gss/gss_krb5_wrap.c +++ b/net/sunrpc/auth_gss/gss_krb5_wrap.c @@ -137,7 +137,7 @@ gss_wrap_kerberos(struct gss_ctx *ctx, int offset, BUG_ON((buf->len - offset) % blocksize); plainlen = blocksize + buf->len - offset; - headlen = g_token_size(&kctx->mech_used, 22 + plainlen) - + headlen = g_token_size(&kctx->mech_used, 24 + plainlen) - (buf->len - offset); ptr = buf->head[0].iov_base + offset; @@ -149,7 +149,7 @@ gss_wrap_kerberos(struct gss_ctx *ctx, int offset, buf->len += headlen; BUG_ON((buf->len - offset - headlen) % blocksize); - g_make_token_header(&kctx->mech_used, 22 + plainlen, &ptr); + g_make_token_header(&kctx->mech_used, 24 + plainlen, &ptr); *ptr++ = (unsigned char) ((KG_TOK_WRAP_MSG>>8)&0xff); @@ -176,9 +176,7 @@ gss_wrap_kerberos(struct gss_ctx *ctx, int offset, if (krb5_encrypt(kctx->seq, NULL, md5cksum.data, md5cksum.data, md5cksum.len)) return GSS_S_FAILURE; - memcpy(krb5_hdr + 16, - md5cksum.data + md5cksum.len - KRB5_CKSUM_LENGTH, - KRB5_CKSUM_LENGTH); + memcpy(krb5_hdr + 16, md5cksum.data + md5cksum.len - 8, 8); spin_lock(&krb5_seq_lock); seq_send = kctx->seq_send++; diff --git a/net/sunrpc/auth_gss/gss_spkm3_seal.c b/net/sunrpc/auth_gss/gss_spkm3_seal.c index abf17ce..c832712 100644 --- a/net/sunrpc/auth_gss/gss_spkm3_seal.c +++ b/net/sunrpc/auth_gss/gss_spkm3_seal.c @@ -107,10 +107,10 @@ spkm3_make_token(struct spkm3_ctx *ctx, tokenlen = 10 + ctxelen + 1 + md5elen + 1; /* Create token header using generic routines */ - token->len = g_token_size(&ctx->mech_used, tokenlen); + token->len = g_token_size(&ctx->mech_used, tokenlen + 2); ptr = token->data; - g_make_token_header(&ctx->mech_used, tokenlen, &ptr); + g_make_token_header(&ctx->mech_used, tokenlen + 2, &ptr); spkm3_make_mic_token(&ptr, tokenlen, &mic_hdr, &md5cksum, md5elen, md5zbit); } else if (toktype == SPKM_WRAP_TOK) { /* Not Supported */ diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 481f984..5905d56 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1146,7 +1146,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) case RPC_GSS_SVC_INTEGRITY: if (unwrap_integ_data(&rqstp->rq_arg, gc->gc_seq, rsci->mechctx)) - goto auth_err; + goto garbage_args; /* placeholders for length and seq. number: */ svc_putnl(resv, 0); svc_putnl(resv, 0); @@ -1154,7 +1154,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) case RPC_GSS_SVC_PRIVACY: if (unwrap_priv_data(rqstp, &rqstp->rq_arg, gc->gc_seq, rsci->mechctx)) - goto auth_err; + goto garbage_args; /* placeholders for length and seq. number: */ svc_putnl(resv, 0); svc_putnl(resv, 0); @@ -1169,6 +1169,11 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) ret = SVC_OK; goto out; } +garbage_args: + /* Restore write pointer to its original value: */ + xdr_ressize_check(rqstp, reject_stat); + ret = SVC_GARBAGE; + goto out; auth_err: /* Restore write pointer to its original value: */ xdr_ressize_check(rqstp, reject_stat); diff --git a/net/sunrpc/auth_null.c b/net/sunrpc/auth_null.c index 537d0e8..3c26c18 100644 --- a/net/sunrpc/auth_null.c +++ b/net/sunrpc/auth_null.c @@ -125,6 +125,7 @@ static const struct rpc_credops null_credops = { .cr_name = "AUTH_NULL", .crdestroy = nul_destroy_cred, + .crbind = rpcauth_generic_bind_cred, .crmatch = nul_match, .crmarshal = nul_marshal, .crrefresh = nul_refresh, diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c index 5ed91e5..04e936a 100644 --- a/net/sunrpc/auth_unix.c +++ b/net/sunrpc/auth_unix.c @@ -60,7 +60,8 @@ static struct rpc_cred * unx_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { struct unx_cred *cred; - int i; + unsigned int groups = 0; + unsigned int i; dprintk("RPC: allocating UNIX cred for uid %d gid %d\n", acred->uid, acred->gid); @@ -70,21 +71,17 @@ unx_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) rpcauth_init_cred(&cred->uc_base, acred, auth, &unix_credops); cred->uc_base.cr_flags = 1UL << RPCAUTH_CRED_UPTODATE; - if (flags & RPCAUTH_LOOKUP_ROOTCREDS) { - cred->uc_uid = 0; - cred->uc_gid = 0; - cred->uc_gids[0] = NOGROUP; - } else { - int groups = acred->group_info->ngroups; - if (groups > NFS_NGROUPS) - groups = NFS_NGROUPS; - - cred->uc_gid = acred->gid; - for (i = 0; i < groups; i++) - cred->uc_gids[i] = GROUP_AT(acred->group_info, i); - if (i < NFS_NGROUPS) - cred->uc_gids[i] = NOGROUP; - } + + if (acred->group_info != NULL) + groups = acred->group_info->ngroups; + if (groups > NFS_NGROUPS) + groups = NFS_NGROUPS; + + cred->uc_gid = acred->gid; + for (i = 0; i < groups; i++) + cred->uc_gids[i] = GROUP_AT(acred->group_info, i); + if (i < NFS_NGROUPS) + cred->uc_gids[i] = NOGROUP; return &cred->uc_base; } @@ -118,26 +115,21 @@ static int unx_match(struct auth_cred *acred, struct rpc_cred *rcred, int flags) { struct unx_cred *cred = container_of(rcred, struct unx_cred, uc_base); - int i; + unsigned int groups = 0; + unsigned int i; - if (!(flags & RPCAUTH_LOOKUP_ROOTCREDS)) { - int groups; - if (cred->uc_uid != acred->uid - || cred->uc_gid != acred->gid) - return 0; + if (cred->uc_uid != acred->uid || cred->uc_gid != acred->gid) + return 0; + if (acred->group_info != NULL) groups = acred->group_info->ngroups; - if (groups > NFS_NGROUPS) - groups = NFS_NGROUPS; - for (i = 0; i < groups ; i++) - if (cred->uc_gids[i] != GROUP_AT(acred->group_info, i)) - return 0; - return 1; - } - return (cred->uc_uid == 0 - && cred->uc_gid == 0 - && cred->uc_gids[0] == (gid_t) NOGROUP); + if (groups > NFS_NGROUPS) + groups = NFS_NGROUPS; + for (i = 0; i < groups ; i++) + if (cred->uc_gids[i] != GROUP_AT(acred->group_info, i)) + return 0; + return 1; } /* @@ -245,6 +237,7 @@ static const struct rpc_credops unix_credops = { .cr_name = "AUTH_UNIX", .crdestroy = unx_destroy_cred, + .crbind = rpcauth_generic_bind_cred, .crmatch = unx_match, .crmarshal = unx_marshal, .crrefresh = unx_refresh, diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c index b5f2786..d75530f 100644 --- a/net/sunrpc/cache.c +++ b/net/sunrpc/cache.c @@ -571,7 +571,6 @@ static int cache_defer_req(struct cache_req *req, struct cache_head *item) return -ETIMEDOUT; dreq->item = item; - dreq->recv_time = get_seconds(); spin_lock(&cache_defer_lock); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 8834d68..d6701f7 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -544,7 +544,7 @@ EXPORT_SYMBOL_GPL(rpc_run_task); * @msg: RPC call parameters * @flags: RPC call flags */ -int rpc_call_sync(struct rpc_clnt *clnt, struct rpc_message *msg, int flags) +int rpc_call_sync(struct rpc_clnt *clnt, const struct rpc_message *msg, int flags) { struct rpc_task *task; struct rpc_task_setup task_setup_data = { @@ -575,7 +575,7 @@ EXPORT_SYMBOL_GPL(rpc_call_sync); * @data: user call data */ int -rpc_call_async(struct rpc_clnt *clnt, struct rpc_message *msg, int flags, +rpc_call_async(struct rpc_clnt *clnt, const struct rpc_message *msg, int flags, const struct rpc_call_ops *tk_ops, void *data) { struct rpc_task *task; @@ -1062,7 +1062,7 @@ call_transmit(struct rpc_task *task) if (task->tk_msg.rpc_proc->p_decode != NULL) return; task->tk_action = rpc_exit_task; - rpc_wake_up_task(task); + rpc_wake_up_queued_task(&task->tk_xprt->pending, task); } /* @@ -1531,7 +1531,7 @@ void rpc_show_tasks(void) proc = -1; if (RPC_IS_QUEUED(t)) - rpc_waitq = rpc_qname(t->u.tk_wait.rpc_waitq); + rpc_waitq = rpc_qname(t->tk_waitqueue); printk("%5u %04d %04x %6d %8p %6d %8p %8ld %8s %8p %8p\n", t->tk_pid, proc, diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c index 3164a08..f480c71 100644 --- a/net/sunrpc/rpcb_clnt.c +++ b/net/sunrpc/rpcb_clnt.c @@ -298,7 +298,7 @@ void rpcb_getport_async(struct rpc_task *task) /* Put self on queue before sending rpcbind request, in case * rpcb_getport_done completes before we return from rpc_run_task */ - rpc_sleep_on(&xprt->binding, task, NULL, NULL); + rpc_sleep_on(&xprt->binding, task, NULL); /* Someone else may have bound if we slept */ if (xprt_bound(xprt)) { diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 4c66912..6eab9bf 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -38,9 +38,9 @@ static struct kmem_cache *rpc_buffer_slabp __read_mostly; static mempool_t *rpc_task_mempool __read_mostly; static mempool_t *rpc_buffer_mempool __read_mostly; -static void __rpc_default_timer(struct rpc_task *task); static void rpc_async_schedule(struct work_struct *); static void rpc_release_task(struct rpc_task *task); +static void __rpc_queue_timer_fn(unsigned long ptr); /* * RPC tasks sit here while waiting for conditions to improve. @@ -57,41 +57,30 @@ struct workqueue_struct *rpciod_workqueue; * queue->lock and bh_disabled in order to avoid races within * rpc_run_timer(). */ -static inline void -__rpc_disable_timer(struct rpc_task *task) +static void +__rpc_disable_timer(struct rpc_wait_queue *queue, struct rpc_task *task) { + if (task->tk_timeout == 0) + return; dprintk("RPC: %5u disabling timer\n", task->tk_pid); - task->tk_timeout_fn = NULL; task->tk_timeout = 0; + list_del(&task->u.tk_wait.timer_list); + if (list_empty(&queue->timer_list.list)) + del_timer(&queue->timer_list.timer); } -/* - * Run a timeout function. - * We use the callback in order to allow __rpc_wake_up_task() - * and friends to disable the timer synchronously on SMP systems - * without calling del_timer_sync(). The latter could cause a - * deadlock if called while we're holding spinlocks... - */ -static void rpc_run_timer(struct rpc_task *task) +static void +rpc_set_queue_timer(struct rpc_wait_queue *queue, unsigned long expires) { - void (*callback)(struct rpc_task *); - - callback = task->tk_timeout_fn; - task->tk_timeout_fn = NULL; - if (callback && RPC_IS_QUEUED(task)) { - dprintk("RPC: %5u running timer\n", task->tk_pid); - callback(task); - } - smp_mb__before_clear_bit(); - clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate); - smp_mb__after_clear_bit(); + queue->timer_list.expires = expires; + mod_timer(&queue->timer_list.timer, expires); } /* * Set up a timer for the current task. */ -static inline void -__rpc_add_timer(struct rpc_task *task, rpc_action timer) +static void +__rpc_add_timer(struct rpc_wait_queue *queue, struct rpc_task *task) { if (!task->tk_timeout) return; @@ -99,27 +88,10 @@ __rpc_add_timer(struct rpc_task *task, rpc_action timer) dprintk("RPC: %5u setting alarm for %lu ms\n", task->tk_pid, task->tk_timeout * 1000 / HZ); - if (timer) - task->tk_timeout_fn = timer; - else - task->tk_timeout_fn = __rpc_default_timer; - set_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate); - mod_timer(&task->tk_timer, jiffies + task->tk_timeout); -} - -/* - * Delete any timer for the current task. Because we use del_timer_sync(), - * this function should never be called while holding queue->lock. - */ -static void -rpc_delete_timer(struct rpc_task *task) -{ - if (RPC_IS_QUEUED(task)) - return; - if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) { - del_singleshot_timer_sync(&task->tk_timer); - dprintk("RPC: %5u deleting timer\n", task->tk_pid); - } + task->u.tk_wait.expires = jiffies + task->tk_timeout; + if (list_empty(&queue->timer_list.list) || time_before(task->u.tk_wait.expires, queue->timer_list.expires)) + rpc_set_queue_timer(queue, task->u.tk_wait.expires); + list_add(&task->u.tk_wait.timer_list, &queue->timer_list.list); } /* @@ -161,7 +133,7 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task * list_add(&task->u.tk_wait.list, &queue->tasks[0]); else list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]); - task->u.tk_wait.rpc_waitq = queue; + task->tk_waitqueue = queue; queue->qlen++; rpc_set_queued(task); @@ -181,22 +153,18 @@ static void __rpc_remove_wait_queue_priority(struct rpc_task *task) list_move(&t->u.tk_wait.list, &task->u.tk_wait.list); list_splice_init(&task->u.tk_wait.links, &t->u.tk_wait.links); } - list_del(&task->u.tk_wait.list); } /* * Remove request from queue. * Note: must be called with spin lock held. */ -static void __rpc_remove_wait_queue(struct rpc_task *task) +static void __rpc_remove_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task) { - struct rpc_wait_queue *queue; - queue = task->u.tk_wait.rpc_waitq; - + __rpc_disable_timer(queue, task); if (RPC_IS_PRIORITY(queue)) __rpc_remove_wait_queue_priority(task); - else - list_del(&task->u.tk_wait.list); + list_del(&task->u.tk_wait.list); queue->qlen--; dprintk("RPC: %5u removed from queue %p \"%s\"\n", task->tk_pid, queue, rpc_qname(queue)); @@ -229,6 +197,9 @@ static void __rpc_init_priority_wait_queue(struct rpc_wait_queue *queue, const c INIT_LIST_HEAD(&queue->tasks[i]); queue->maxpriority = nr_queues - 1; rpc_reset_waitqueue_priority(queue); + queue->qlen = 0; + setup_timer(&queue->timer_list.timer, __rpc_queue_timer_fn, (unsigned long)queue); + INIT_LIST_HEAD(&queue->timer_list.list); #ifdef RPC_DEBUG queue->name = qname; #endif @@ -245,6 +216,12 @@ void rpc_init_wait_queue(struct rpc_wait_queue *queue, const char *qname) } EXPORT_SYMBOL_GPL(rpc_init_wait_queue); +void rpc_destroy_wait_queue(struct rpc_wait_queue *queue) +{ + del_timer_sync(&queue->timer_list.timer); +} +EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue); + static int rpc_wait_bit_killable(void *word) { if (fatal_signal_pending(current)) @@ -313,7 +290,6 @@ EXPORT_SYMBOL_GPL(__rpc_wait_for_completion_task); */ static void rpc_make_runnable(struct rpc_task *task) { - BUG_ON(task->tk_timeout_fn); rpc_clear_queued(task); if (rpc_test_and_set_running(task)) return; @@ -326,7 +302,7 @@ static void rpc_make_runnable(struct rpc_task *task) int status; INIT_WORK(&task->u.tk_work, rpc_async_schedule); - status = queue_work(task->tk_workqueue, &task->u.tk_work); + status = queue_work(rpciod_workqueue, &task->u.tk_work); if (status < 0) { printk(KERN_WARNING "RPC: failed to add task to queue: error: %d!\n", status); task->tk_status = status; @@ -343,7 +319,7 @@ static void rpc_make_runnable(struct rpc_task *task) * as it's on a wait queue. */ static void __rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task, - rpc_action action, rpc_action timer) + rpc_action action) { dprintk("RPC: %5u sleep_on(queue \"%s\" time %lu)\n", task->tk_pid, rpc_qname(q), jiffies); @@ -357,11 +333,11 @@ static void __rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task, BUG_ON(task->tk_callback != NULL); task->tk_callback = action; - __rpc_add_timer(task, timer); + __rpc_add_timer(q, task); } void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task, - rpc_action action, rpc_action timer) + rpc_action action) { /* Mark the task as being activated if so needed */ rpc_set_active(task); @@ -370,18 +346,19 @@ void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task, * Protect the queue operations. */ spin_lock_bh(&q->lock); - __rpc_sleep_on(q, task, action, timer); + __rpc_sleep_on(q, task, action); spin_unlock_bh(&q->lock); } EXPORT_SYMBOL_GPL(rpc_sleep_on); /** * __rpc_do_wake_up_task - wake up a single rpc_task + * @queue: wait queue * @task: task to be woken up * * Caller must hold queue->lock, and have cleared the task queued flag. */ -static void __rpc_do_wake_up_task(struct rpc_task *task) +static void __rpc_do_wake_up_task(struct rpc_wait_queue *queue, struct rpc_task *task) { dprintk("RPC: %5u __rpc_wake_up_task (now %lu)\n", task->tk_pid, jiffies); @@ -395,8 +372,7 @@ static void __rpc_do_wake_up_task(struct rpc_task *task) return; } - __rpc_disable_timer(task); - __rpc_remove_wait_queue(task); + __rpc_remove_wait_queue(queue, task); rpc_make_runnable(task); @@ -404,48 +380,32 @@ static void __rpc_do_wake_up_task(struct rpc_task *task) } /* - * Wake up the specified task + * Wake up a queued task while the queue lock is being held */ -static void __rpc_wake_up_task(struct rpc_task *task) +static void rpc_wake_up_task_queue_locked(struct rpc_wait_queue *queue, struct rpc_task *task) { - if (rpc_start_wakeup(task)) { - if (RPC_IS_QUEUED(task)) - __rpc_do_wake_up_task(task); - rpc_finish_wakeup(task); - } + if (RPC_IS_QUEUED(task) && task->tk_waitqueue == queue) + __rpc_do_wake_up_task(queue, task); } /* - * Default timeout handler if none specified by user + * Wake up a task on a specific queue */ -static void -__rpc_default_timer(struct rpc_task *task) +void rpc_wake_up_queued_task(struct rpc_wait_queue *queue, struct rpc_task *task) { - dprintk("RPC: %5u timeout (default timer)\n", task->tk_pid); - task->tk_status = -ETIMEDOUT; - rpc_wake_up_task(task); + spin_lock_bh(&queue->lock); + rpc_wake_up_task_queue_locked(queue, task); + spin_unlock_bh(&queue->lock); } +EXPORT_SYMBOL_GPL(rpc_wake_up_queued_task); /* * Wake up the specified task */ -void rpc_wake_up_task(struct rpc_task *task) +static void rpc_wake_up_task(struct rpc_task *task) { - rcu_read_lock_bh(); - if (rpc_start_wakeup(task)) { - if (RPC_IS_QUEUED(task)) { - struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq; - - /* Note: we're already in a bh-safe context */ - spin_lock(&queue->lock); - __rpc_do_wake_up_task(task); - spin_unlock(&queue->lock); - } - rpc_finish_wakeup(task); - } - rcu_read_unlock_bh(); + rpc_wake_up_queued_task(task->tk_waitqueue, task); } -EXPORT_SYMBOL_GPL(rpc_wake_up_task); /* * Wake up the next task on a priority queue. @@ -495,7 +455,7 @@ new_queue: new_owner: rpc_set_waitqueue_owner(queue, task->tk_owner); out: - __rpc_wake_up_task(task); + rpc_wake_up_task_queue_locked(queue, task); return task; } @@ -508,16 +468,14 @@ struct rpc_task * rpc_wake_up_next(struct rpc_wait_queue *queue) dprintk("RPC: wake_up_next(%p \"%s\")\n", queue, rpc_qname(queue)); - rcu_read_lock_bh(); - spin_lock(&queue->lock); + spin_lock_bh(&queue->lock); if (RPC_IS_PRIORITY(queue)) task = __rpc_wake_up_next_priority(queue); else { task_for_first(task, &queue->tasks[0]) - __rpc_wake_up_task(task); + rpc_wake_up_task_queue_locked(queue, task); } - spin_unlock(&queue->lock); - rcu_read_unlock_bh(); + spin_unlock_bh(&queue->lock); return task; } @@ -534,18 +492,16 @@ void rpc_wake_up(struct rpc_wait_queue *queue) struct rpc_task *task, *next; struct list_head *head; - rcu_read_lock_bh(); - spin_lock(&queue->lock); + spin_lock_bh(&queue->lock); head = &queue->tasks[queue->maxpriority]; for (;;) { list_for_each_entry_safe(task, next, head, u.tk_wait.list) - __rpc_wake_up_task(task); + rpc_wake_up_task_queue_locked(queue, task); if (head == &queue->tasks[0]) break; head--; } - spin_unlock(&queue->lock); - rcu_read_unlock_bh(); + spin_unlock_bh(&queue->lock); } EXPORT_SYMBOL_GPL(rpc_wake_up); @@ -561,26 +517,48 @@ void rpc_wake_up_status(struct rpc_wait_queue *queue, int status) struct rpc_task *task, *next; struct list_head *head; - rcu_read_lock_bh(); - spin_lock(&queue->lock); + spin_lock_bh(&queue->lock); head = &queue->tasks[queue->maxpriority]; for (;;) { list_for_each_entry_safe(task, next, head, u.tk_wait.list) { task->tk_status = status; - __rpc_wake_up_task(task); + rpc_wake_up_task_queue_locked(queue, task); } if (head == &queue->tasks[0]) break; head--; } - spin_unlock(&queue->lock); - rcu_read_unlock_bh(); + spin_unlock_bh(&queue->lock); } EXPORT_SYMBOL_GPL(rpc_wake_up_status); +static void __rpc_queue_timer_fn(unsigned long ptr) +{ + struct rpc_wait_queue *queue = (struct rpc_wait_queue *)ptr; + struct rpc_task *task, *n; + unsigned long expires, now, timeo; + + spin_lock(&queue->lock); + expires = now = jiffies; + list_for_each_entry_safe(task, n, &queue->timer_list.list, u.tk_wait.timer_list) { + timeo = task->u.tk_wait.expires; + if (time_after_eq(now, timeo)) { + dprintk("RPC: %5u timeout\n", task->tk_pid); + task->tk_status = -ETIMEDOUT; + rpc_wake_up_task_queue_locked(queue, task); + continue; + } + if (expires == now || time_after(expires, timeo)) + expires = timeo; + } + if (!list_empty(&queue->timer_list.list)) + rpc_set_queue_timer(queue, expires); + spin_unlock(&queue->lock); +} + static void __rpc_atrun(struct rpc_task *task) { - rpc_wake_up_task(task); + task->tk_status = 0; } /* @@ -589,7 +567,7 @@ static void __rpc_atrun(struct rpc_task *task) void rpc_delay(struct rpc_task *task, unsigned long delay) { task->tk_timeout = delay; - rpc_sleep_on(&delay_queue, task, NULL, __rpc_atrun); + rpc_sleep_on(&delay_queue, task, __rpc_atrun); } EXPORT_SYMBOL_GPL(rpc_delay); @@ -644,10 +622,6 @@ static void __rpc_execute(struct rpc_task *task) BUG_ON(RPC_IS_QUEUED(task)); for (;;) { - /* - * Garbage collection of pending timers... - */ - rpc_delete_timer(task); /* * Execute any pending callback. @@ -816,8 +790,6 @@ EXPORT_SYMBOL_GPL(rpc_free); static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *task_setup_data) { memset(task, 0, sizeof(*task)); - setup_timer(&task->tk_timer, (void (*)(unsigned long))rpc_run_timer, - (unsigned long)task); atomic_set(&task->tk_count, 1); task->tk_flags = task_setup_data->flags; task->tk_ops = task_setup_data->callback_ops; @@ -832,7 +804,7 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta task->tk_owner = current->tgid; /* Initialize workqueue for async tasks */ - task->tk_workqueue = rpciod_workqueue; + task->tk_workqueue = task_setup_data->workqueue; task->tk_client = task_setup_data->rpc_client; if (task->tk_client != NULL) { @@ -845,12 +817,11 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta task->tk_action = rpc_prepare_task; if (task_setup_data->rpc_message != NULL) { - memcpy(&task->tk_msg, task_setup_data->rpc_message, sizeof(task->tk_msg)); + task->tk_msg.rpc_proc = task_setup_data->rpc_message->rpc_proc; + task->tk_msg.rpc_argp = task_setup_data->rpc_message->rpc_argp; + task->tk_msg.rpc_resp = task_setup_data->rpc_message->rpc_resp; /* Bind the user cred */ - if (task->tk_msg.rpc_cred != NULL) - rpcauth_holdcred(task); - else - rpcauth_bindcred(task); + rpcauth_bindcred(task, task_setup_data->rpc_message->rpc_cred, task_setup_data->flags); if (task->tk_action == NULL) rpc_call_start(task); } @@ -868,13 +839,6 @@ rpc_alloc_task(void) return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOFS); } -static void rpc_free_task(struct rcu_head *rcu) -{ - struct rpc_task *task = container_of(rcu, struct rpc_task, u.tk_rcu); - dprintk("RPC: %5u freeing task\n", task->tk_pid); - mempool_free(task, rpc_task_mempool); -} - /* * Create a new task for the specified client. */ @@ -898,12 +862,25 @@ out: return task; } - -void rpc_put_task(struct rpc_task *task) +static void rpc_free_task(struct rpc_task *task) { const struct rpc_call_ops *tk_ops = task->tk_ops; void *calldata = task->tk_calldata; + if (task->tk_flags & RPC_TASK_DYNAMIC) { + dprintk("RPC: %5u freeing task\n", task->tk_pid); + mempool_free(task, rpc_task_mempool); + } + rpc_release_calldata(tk_ops, calldata); +} + +static void rpc_async_release(struct work_struct *work) +{ + rpc_free_task(container_of(work, struct rpc_task, u.tk_work)); +} + +void rpc_put_task(struct rpc_task *task) +{ if (!atomic_dec_and_test(&task->tk_count)) return; /* Release resources */ @@ -915,9 +892,11 @@ void rpc_put_task(struct rpc_task *task) rpc_release_client(task->tk_client); task->tk_client = NULL; } - if (task->tk_flags & RPC_TASK_DYNAMIC) - call_rcu_bh(&task->u.tk_rcu, rpc_free_task); - rpc_release_calldata(tk_ops, calldata); + if (task->tk_workqueue != NULL) { + INIT_WORK(&task->u.tk_work, rpc_async_release); + queue_work(task->tk_workqueue, &task->u.tk_work); + } else + rpc_free_task(task); } EXPORT_SYMBOL_GPL(rpc_put_task); @@ -937,9 +916,6 @@ static void rpc_release_task(struct rpc_task *task) } BUG_ON (RPC_IS_QUEUED(task)); - /* Synchronously delete any running timer */ - rpc_delete_timer(task); - #ifdef RPC_DEBUG task->tk_magic = 0; #endif @@ -1029,11 +1005,20 @@ rpc_destroy_mempool(void) kmem_cache_destroy(rpc_task_slabp); if (rpc_buffer_slabp) kmem_cache_destroy(rpc_buffer_slabp); + rpc_destroy_wait_queue(&delay_queue); } int rpc_init_mempool(void) { + /* + * The following is not strictly a mempool initialisation, + * but there is no harm in doing it here + */ + rpc_init_wait_queue(&delay_queue, "delayq"); + if (!rpciod_start()) + goto err_nomem; + rpc_task_slabp = kmem_cache_create("rpc_tasks", sizeof(struct rpc_task), 0, SLAB_HWCACHE_ALIGN, @@ -1054,13 +1039,6 @@ rpc_init_mempool(void) rpc_buffer_slabp); if (!rpc_buffer_mempool) goto err_nomem; - if (!rpciod_start()) - goto err_nomem; - /* - * The following is not strictly a mempool initialisation, - * but there is no harm in doing it here - */ - rpc_init_wait_queue(&delay_queue, "delayq"); return 0; err_nomem: rpc_destroy_mempool(); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index a290e15..71023c7 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -584,7 +584,7 @@ __svc_create_thread(svc_thread_fn func, struct svc_serv *serv, struct svc_rqst *rqstp; int error = -ENOMEM; int have_oldmask = 0; - cpumask_t oldmask; + cpumask_t uninitialized_var(oldmask); rqstp = svc_prepare_thread(serv, pool); if (IS_ERR(rqstp)) { @@ -613,16 +613,6 @@ out_thread: } /* - * Create a thread in the default pool. Caller must hold BKL. - */ -int -svc_create_thread(svc_thread_fn func, struct svc_serv *serv) -{ - return __svc_create_thread(func, serv, &serv->sv_pools[0]); -} -EXPORT_SYMBOL(svc_create_thread); - -/* * Choose a pool in which to create a new thread, for svc_set_num_threads */ static inline struct svc_pool * @@ -915,8 +905,7 @@ svc_process(struct svc_rqst *rqstp) case SVC_OK: break; case SVC_GARBAGE: - rpc_stat = rpc_garbage_args; - goto err_bad; + goto err_garbage; case SVC_SYSERR: rpc_stat = rpc_system_err; goto err_bad; diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 332eb47..d8e8d79 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -586,8 +587,12 @@ int svc_recv(struct svc_rqst *rqstp, long timeout) while (rqstp->rq_pages[i] == NULL) { struct page *p = alloc_page(GFP_KERNEL); if (!p) { - int j = msecs_to_jiffies(500); - schedule_timeout_uninterruptible(j); + set_current_state(TASK_INTERRUPTIBLE); + if (signalled() || kthread_should_stop()) { + set_current_state(TASK_RUNNING); + return -EINTR; + } + schedule_timeout(msecs_to_jiffies(500)); } rqstp->rq_pages[i] = p; } @@ -607,7 +612,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout) try_to_freeze(); cond_resched(); - if (signalled()) + if (signalled() || kthread_should_stop()) return -EINTR; spin_lock_bh(&pool->sp_lock); @@ -626,6 +631,20 @@ int svc_recv(struct svc_rqst *rqstp, long timeout) * to bring down the daemons ... */ set_current_state(TASK_INTERRUPTIBLE); + + /* + * checking kthread_should_stop() here allows us to avoid + * locking and signalling when stopping kthreads that call + * svc_recv. If the thread has already been woken up, then + * we can exit here without sleeping. If not, then it + * it'll be woken up quickly during the schedule_timeout + */ + if (kthread_should_stop()) { + set_current_state(TASK_RUNNING); + spin_unlock_bh(&pool->sp_lock); + return -EINTR; + } + add_wait_queue(&rqstp->rq_wait, &wait); spin_unlock_bh(&pool->sp_lock); @@ -641,7 +660,10 @@ int svc_recv(struct svc_rqst *rqstp, long timeout) svc_thread_dequeue(pool, rqstp); spin_unlock_bh(&pool->sp_lock); dprintk("svc: server %p, no data yet\n", rqstp); - return signalled()? -EINTR : -EAGAIN; + if (signalled() || kthread_should_stop()) + return -EINTR; + else + return -EAGAIN; } } spin_unlock_bh(&pool->sp_lock); diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index 3c64051..3f30ee6 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -11,7 +11,8 @@ #include #include #include - +#include +#include #define RPCDBG_FACILITY RPCDBG_AUTH @@ -85,7 +86,7 @@ static void svcauth_unix_domain_release(struct auth_domain *dom) struct ip_map { struct cache_head h; char m_class[8]; /* e.g. "nfsd" */ - struct in_addr m_addr; + struct in6_addr m_addr; struct unix_domain *m_client; int m_add_change; }; @@ -113,12 +114,19 @@ static inline int hash_ip(__be32 ip) return (hash ^ (hash>>8)) & 0xff; } #endif +static inline int hash_ip6(struct in6_addr ip) +{ + return (hash_ip(ip.s6_addr32[0]) ^ + hash_ip(ip.s6_addr32[1]) ^ + hash_ip(ip.s6_addr32[2]) ^ + hash_ip(ip.s6_addr32[3])); +} static int ip_map_match(struct cache_head *corig, struct cache_head *cnew) { struct ip_map *orig = container_of(corig, struct ip_map, h); struct ip_map *new = container_of(cnew, struct ip_map, h); return strcmp(orig->m_class, new->m_class) == 0 - && orig->m_addr.s_addr == new->m_addr.s_addr; + && ipv6_addr_equal(&orig->m_addr, &new->m_addr); } static void ip_map_init(struct cache_head *cnew, struct cache_head *citem) { @@ -126,7 +134,7 @@ static void ip_map_init(struct cache_head *cnew, struct cache_head *citem) struct ip_map *item = container_of(citem, struct ip_map, h); strcpy(new->m_class, item->m_class); - new->m_addr.s_addr = item->m_addr.s_addr; + ipv6_addr_copy(&new->m_addr, &item->m_addr); } static void update(struct cache_head *cnew, struct cache_head *citem) { @@ -150,22 +158,24 @@ static void ip_map_request(struct cache_detail *cd, struct cache_head *h, char **bpp, int *blen) { - char text_addr[20]; + char text_addr[40]; struct ip_map *im = container_of(h, struct ip_map, h); - __be32 addr = im->m_addr.s_addr; - - snprintf(text_addr, 20, "%u.%u.%u.%u", - ntohl(addr) >> 24 & 0xff, - ntohl(addr) >> 16 & 0xff, - ntohl(addr) >> 8 & 0xff, - ntohl(addr) >> 0 & 0xff); + if (ipv6_addr_v4mapped(&(im->m_addr))) { + snprintf(text_addr, 20, NIPQUAD_FMT, + ntohl(im->m_addr.s6_addr32[3]) >> 24 & 0xff, + ntohl(im->m_addr.s6_addr32[3]) >> 16 & 0xff, + ntohl(im->m_addr.s6_addr32[3]) >> 8 & 0xff, + ntohl(im->m_addr.s6_addr32[3]) >> 0 & 0xff); + } else { + snprintf(text_addr, 40, NIP6_FMT, NIP6(im->m_addr)); + } qword_add(bpp, blen, im->m_class); qword_add(bpp, blen, text_addr); (*bpp)[-1] = '\n'; } -static struct ip_map *ip_map_lookup(char *class, struct in_addr addr); +static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr); static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t expiry); static int ip_map_parse(struct cache_detail *cd, @@ -176,10 +186,10 @@ static int ip_map_parse(struct cache_detail *cd, * for scratch: */ char *buf = mesg; int len; - int b1,b2,b3,b4; + int b1, b2, b3, b4, b5, b6, b7, b8; char c; char class[8]; - struct in_addr addr; + struct in6_addr addr; int err; struct ip_map *ipmp; @@ -198,7 +208,23 @@ static int ip_map_parse(struct cache_detail *cd, len = qword_get(&mesg, buf, mlen); if (len <= 0) return -EINVAL; - if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4) + if (sscanf(buf, NIPQUAD_FMT "%c", &b1, &b2, &b3, &b4, &c) == 4) { + addr.s6_addr32[0] = 0; + addr.s6_addr32[1] = 0; + addr.s6_addr32[2] = htonl(0xffff); + addr.s6_addr32[3] = + htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4); + } else if (sscanf(buf, NIP6_FMT "%c", + &b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) == 8) { + addr.s6_addr16[0] = htons(b1); + addr.s6_addr16[1] = htons(b2); + addr.s6_addr16[2] = htons(b3); + addr.s6_addr16[3] = htons(b4); + addr.s6_addr16[4] = htons(b5); + addr.s6_addr16[5] = htons(b6); + addr.s6_addr16[6] = htons(b7); + addr.s6_addr16[7] = htons(b8); + } else return -EINVAL; expiry = get_expiry(&mesg); @@ -216,10 +242,7 @@ static int ip_map_parse(struct cache_detail *cd, } else dom = NULL; - addr.s_addr = - htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4); - - ipmp = ip_map_lookup(class,addr); + ipmp = ip_map_lookup(class, &addr); if (ipmp) { err = ip_map_update(ipmp, container_of(dom, struct unix_domain, h), @@ -239,7 +262,7 @@ static int ip_map_show(struct seq_file *m, struct cache_head *h) { struct ip_map *im; - struct in_addr addr; + struct in6_addr addr; char *dom = "-no-domain-"; if (h == NULL) { @@ -248,20 +271,24 @@ static int ip_map_show(struct seq_file *m, } im = container_of(h, struct ip_map, h); /* class addr domain */ - addr = im->m_addr; + ipv6_addr_copy(&addr, &im->m_addr); if (test_bit(CACHE_VALID, &h->flags) && !test_bit(CACHE_NEGATIVE, &h->flags)) dom = im->m_client->h.name; - seq_printf(m, "%s %d.%d.%d.%d %s\n", - im->m_class, - ntohl(addr.s_addr) >> 24 & 0xff, - ntohl(addr.s_addr) >> 16 & 0xff, - ntohl(addr.s_addr) >> 8 & 0xff, - ntohl(addr.s_addr) >> 0 & 0xff, - dom - ); + if (ipv6_addr_v4mapped(&addr)) { + seq_printf(m, "%s" NIPQUAD_FMT "%s\n", + im->m_class, + ntohl(addr.s6_addr32[3]) >> 24 & 0xff, + ntohl(addr.s6_addr32[3]) >> 16 & 0xff, + ntohl(addr.s6_addr32[3]) >> 8 & 0xff, + ntohl(addr.s6_addr32[3]) >> 0 & 0xff, + dom); + } else { + seq_printf(m, "%s" NIP6_FMT "%s\n", + im->m_class, NIP6(addr), dom); + } return 0; } @@ -281,16 +308,16 @@ struct cache_detail ip_map_cache = { .alloc = ip_map_alloc, }; -static struct ip_map *ip_map_lookup(char *class, struct in_addr addr) +static struct ip_map *ip_map_lookup(char *class, struct in6_addr *addr) { struct ip_map ip; struct cache_head *ch; strcpy(ip.m_class, class); - ip.m_addr = addr; + ipv6_addr_copy(&ip.m_addr, addr); ch = sunrpc_cache_lookup(&ip_map_cache, &ip.h, hash_str(class, IP_HASHBITS) ^ - hash_ip(addr.s_addr)); + hash_ip6(*addr)); if (ch) return container_of(ch, struct ip_map, h); @@ -319,14 +346,14 @@ static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom, time_t ex ch = sunrpc_cache_update(&ip_map_cache, &ip.h, &ipm->h, hash_str(ipm->m_class, IP_HASHBITS) ^ - hash_ip(ipm->m_addr.s_addr)); + hash_ip6(ipm->m_addr)); if (!ch) return -ENOMEM; cache_put(ch, &ip_map_cache); return 0; } -int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom) +int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom) { struct unix_domain *udom; struct ip_map *ipmp; @@ -355,7 +382,7 @@ int auth_unix_forget_old(struct auth_domain *dom) } EXPORT_SYMBOL(auth_unix_forget_old); -struct auth_domain *auth_unix_lookup(struct in_addr addr) +struct auth_domain *auth_unix_lookup(struct in6_addr *addr) { struct ip_map *ipm; struct auth_domain *rv; @@ -650,9 +677,24 @@ static int unix_gid_find(uid_t uid, struct group_info **gip, int svcauth_unix_set_client(struct svc_rqst *rqstp) { - struct sockaddr_in *sin = svc_addr_in(rqstp); + struct sockaddr_in *sin; + struct sockaddr_in6 *sin6, sin6_storage; struct ip_map *ipm; + switch (rqstp->rq_addr.ss_family) { + case AF_INET: + sin = svc_addr_in(rqstp); + sin6 = &sin6_storage; + ipv6_addr_set(&sin6->sin6_addr, 0, 0, + htonl(0x0000FFFF), sin->sin_addr.s_addr); + break; + case AF_INET6: + sin6 = svc_addr_in6(rqstp); + break; + default: + BUG(); + } + rqstp->rq_client = NULL; if (rqstp->rq_proc == 0) return SVC_OK; @@ -660,7 +702,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) ipm = ip_map_cached_get(rqstp); if (ipm == NULL) ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class, - sin->sin_addr); + &sin6->sin6_addr); if (ipm == NULL) return SVC_DENIED; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index c475977..a45c378 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1101,6 +1101,7 @@ void svc_sock_update_bufs(struct svc_serv *serv) } spin_unlock_bh(&serv->sv_lock); } +EXPORT_SYMBOL(svc_sock_update_bufs); /* * Initialize socket for RPC use and create svc_sock struct diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index d5553b8..85199c6 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -188,9 +188,9 @@ out_sleep: task->tk_timeout = 0; task->tk_status = -EAGAIN; if (req && req->rq_ntrans) - rpc_sleep_on(&xprt->resend, task, NULL, NULL); + rpc_sleep_on(&xprt->resend, task, NULL); else - rpc_sleep_on(&xprt->sending, task, NULL, NULL); + rpc_sleep_on(&xprt->sending, task, NULL); return 0; } EXPORT_SYMBOL_GPL(xprt_reserve_xprt); @@ -238,9 +238,9 @@ out_sleep: task->tk_timeout = 0; task->tk_status = -EAGAIN; if (req && req->rq_ntrans) - rpc_sleep_on(&xprt->resend, task, NULL, NULL); + rpc_sleep_on(&xprt->resend, task, NULL); else - rpc_sleep_on(&xprt->sending, task, NULL, NULL); + rpc_sleep_on(&xprt->sending, task, NULL); return 0; } EXPORT_SYMBOL_GPL(xprt_reserve_xprt_cong); @@ -453,7 +453,7 @@ void xprt_wait_for_buffer_space(struct rpc_task *task) struct rpc_xprt *xprt = req->rq_xprt; task->tk_timeout = req->rq_timeout; - rpc_sleep_on(&xprt->pending, task, NULL, NULL); + rpc_sleep_on(&xprt->pending, task, NULL); } EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); @@ -472,7 +472,7 @@ void xprt_write_space(struct rpc_xprt *xprt) if (xprt->snd_task) { dprintk("RPC: write space: waking waiting task on " "xprt %p\n", xprt); - rpc_wake_up_task(xprt->snd_task); + rpc_wake_up_queued_task(&xprt->pending, xprt->snd_task); } spin_unlock_bh(&xprt->transport_lock); } @@ -602,8 +602,7 @@ void xprt_force_disconnect(struct rpc_xprt *xprt) /* Try to schedule an autoclose RPC call */ if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) queue_work(rpciod_workqueue, &xprt->task_cleanup); - else if (xprt->snd_task != NULL) - rpc_wake_up_task(xprt->snd_task); + xprt_wake_pending_tasks(xprt, -ENOTCONN); spin_unlock_bh(&xprt->transport_lock); } EXPORT_SYMBOL_GPL(xprt_force_disconnect); @@ -653,7 +652,7 @@ void xprt_connect(struct rpc_task *task) task->tk_rqstp->rq_bytes_sent = 0; task->tk_timeout = xprt->connect_timeout; - rpc_sleep_on(&xprt->pending, task, xprt_connect_status, NULL); + rpc_sleep_on(&xprt->pending, task, xprt_connect_status); xprt->stat.connect_start = jiffies; xprt->ops->connect(task); } @@ -749,18 +748,19 @@ EXPORT_SYMBOL_GPL(xprt_update_rtt); void xprt_complete_rqst(struct rpc_task *task, int copied) { struct rpc_rqst *req = task->tk_rqstp; + struct rpc_xprt *xprt = req->rq_xprt; dprintk("RPC: %5u xid %08x complete (%d bytes received)\n", task->tk_pid, ntohl(req->rq_xid), copied); - task->tk_xprt->stat.recvs++; + xprt->stat.recvs++; task->tk_rtt = (long)jiffies - req->rq_xtime; list_del_init(&req->rq_list); /* Ensure all writes are done before we update req->rq_received */ smp_wmb(); req->rq_received = req->rq_private_buf.len = copied; - rpc_wake_up_task(task); + rpc_wake_up_queued_task(&xprt->pending, task); } EXPORT_SYMBOL_GPL(xprt_complete_rqst); @@ -769,17 +769,17 @@ static void xprt_timer(struct rpc_task *task) struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; + if (task->tk_status != -ETIMEDOUT) + return; dprintk("RPC: %5u xprt_timer\n", task->tk_pid); - spin_lock(&xprt->transport_lock); + spin_lock_bh(&xprt->transport_lock); if (!req->rq_received) { if (xprt->ops->timer) xprt->ops->timer(task); - task->tk_status = -ETIMEDOUT; - } - task->tk_timeout = 0; - rpc_wake_up_task(task); - spin_unlock(&xprt->transport_lock); + } else + task->tk_status = 0; + spin_unlock_bh(&xprt->transport_lock); } /** @@ -864,7 +864,7 @@ void xprt_transmit(struct rpc_task *task) if (!xprt_connected(xprt)) task->tk_status = -ENOTCONN; else if (!req->rq_received) - rpc_sleep_on(&xprt->pending, task, NULL, xprt_timer); + rpc_sleep_on(&xprt->pending, task, xprt_timer); spin_unlock_bh(&xprt->transport_lock); return; } @@ -875,7 +875,7 @@ void xprt_transmit(struct rpc_task *task) */ task->tk_status = status; if (status == -ECONNREFUSED) - rpc_sleep_on(&xprt->sending, task, NULL, NULL); + rpc_sleep_on(&xprt->sending, task, NULL); } static inline void do_xprt_reserve(struct rpc_task *task) @@ -895,7 +895,7 @@ static inline void do_xprt_reserve(struct rpc_task *task) dprintk("RPC: waiting for request slot\n"); task->tk_status = -EAGAIN; task->tk_timeout = 0; - rpc_sleep_on(&xprt->backlog, task, NULL, NULL); + rpc_sleep_on(&xprt->backlog, task, NULL); } /** @@ -1052,6 +1052,11 @@ static void xprt_destroy(struct kref *kref) xprt->shutdown = 1; del_timer_sync(&xprt->timer); + rpc_destroy_wait_queue(&xprt->binding); + rpc_destroy_wait_queue(&xprt->pending); + rpc_destroy_wait_queue(&xprt->sending); + rpc_destroy_wait_queue(&xprt->resend); + rpc_destroy_wait_queue(&xprt->backlog); /* * Tear down transport state and free the rpc_xprt */ diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 16fd3f6..af408fc 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -1036,6 +1036,8 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr) wait_event(xprt->sc_send_wait, atomic_read(&xprt->sc_sq_count) < xprt->sc_sq_depth); + if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags)) + return 0; continue; } /* Bumped used SQ WR count and post */ diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 30e7ac2..8bd3b0f 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -1073,6 +1073,7 @@ static void xs_tcp_data_ready(struct sock *sk, int bytes) { struct rpc_xprt *xprt; read_descriptor_t rd_desc; + int read; dprintk("RPC: xs_tcp_data_ready...\n"); @@ -1084,8 +1085,10 @@ static void xs_tcp_data_ready(struct sock *sk, int bytes) /* We use rd_desc to pass struct xprt to xs_tcp_data_recv */ rd_desc.arg.data = xprt; - rd_desc.count = 65536; - tcp_read_sock(sk, &rd_desc, xs_tcp_data_recv); + do { + rd_desc.count = 65536; + read = tcp_read_sock(sk, &rd_desc, xs_tcp_data_recv); + } while (read > 0); out: read_unlock(&sk->sk_callback_lock); }