GIT 4054953d4510f4cfbf5fb74d09c1446f81357e25 git://git.linux-nfs.org/~bfields/linux.git#for-mm

commit 14de4d794f93da7e117f546317cbf0e3e0d95a3e
Author: J. Bruce Fields
Date:   Tue Sep 25 11:57:19 2007 -0400

    locks: add warning about mandatory locking races

    The mandatory file locking implementation has long-standing races
    that probably render it useless. I know of no plans to fix them.
    Till we do, we should at least warn people.

    Signed-off-by: J. Bruce Fields

commit 27e031db5f6ef886a3f8a947a5e02e07331a42e3
Author: J. Bruce Fields
Date:   Mon Sep 24 18:52:09 2007 -0400

    Documentation: move mandatory locking documentation to filesystems/

    Shouldn't this mandatory-locking documentation be in the
    Documentation/filesystems directory?

    Give it a more descriptive name while we're at it, and update
    00-INDEX with a more inclusive description of
    Documentation/filesystems (which has already talked about more than
    just individual filesystems).

    Signed-off-by: J. Bruce Fields
    Acked-by: Randy Dunlap

commit 89dab77633d96474dc88ad471957a48160482d71
Author: Dr. David Alan Gilbert
Date:   Sat Aug 25 16:09:27 2007 +0100

    knfsd: Add source address to sunrpc svc errors

    This patch adds the address of the client that caused an error in
    sunrpc/svc.c so that you get errors that look like:

        svc: 192.168.66.28, port=709: unknown version (3 for prog 100003, nfsd)

    I've seen machines which get bunches of unknown version or similar
    errors from time to time, and while the recent patch to add the
    service helps to find which service has the wrong version it doesn't
    help find the potentially bad client.

    The patch is against a checkout of Linus's git tree made on
    2007-08-24.

    One observation is that the svc_print_addr function prints to a
    buffer which in this case makes life a little more complex; it just
    feels as if there must be lots of places that print a connection
    address - is there a better function to use anywhere?

    I think actually there are a few places with semi duplicated code;
    e.g.
    one_sock_name switches on the address family but only currently has
    IPV4; I wonder how many other places are similar.

    Signed-off-by: Dave Gilbert
    Cc: Randy Dunlap
    Signed-off-by: J. Bruce Fields

commit a5b51d88131cab6c1b99ce5da934028cf3927cc2
Author: Peter Staubach
Date:   Thu Aug 16 12:10:07 2007 -0400

    knfsd: 64 bit ino support for NFS server

    Modify the NFS server code to support 64 bit ino's, as appropriate
    for the system and the NFS protocol version.

    The gist of the changes is to query the underlying file system for
    attributes and not just to use the cached attributes in the inode.
    For this specific purpose, the inode only contains an ino field which
    is unsigned long, which is large enough on 64 bit platforms, but is
    not large enough on 32 bit platforms.

    I haven't been able to find any reason why ->getattr can't be called
    while i_mutex is held. The specification indicates that i_mutex is
    not required to be held in order to invoke ->getattr, but it doesn't
    say that i_mutex can't be held while invoking ->getattr.

    I also haven't come to any conclusions regarding the value of
    lease_get_mtime() and whether it should or should not be invoked by
    fill_post_wcc() too. I chose not to change this because I thought
    that it was safer to leave well enough alone. If we decide to make a
    change, it can be done separately.

    Signed-off-by: Peter Staubach
    Signed-off-by: J. Bruce Fields

commit b5a162085cd708ce643cbd5abfe8c9992255cbc2
Author: J. Bruce Fields
Date:   Thu Aug 9 20:16:22 2007 -0400

    svcgss: move init code into separate function

    We've let svcauth_gss_accept() get much too long and hairy. The
    RPC_GSS_PROC_INIT and RPC_GSS_PROC_CONTINUE_INIT cases share very
    little with the other cases, so it's very natural to split them off
    into a separate function.

    This will also nicely isolate the piece of code we need to
    parametrize in order to authenticate gss-protected NFSv4 callbacks
    on behalf of the NFS client.

    Signed-off-by: J. Bruce Fields

commit 0050cb82d8184eb09c08350d4a887dcef5836d35
Author: J.
Bruce Fields
Date:   Thu Aug 9 18:34:32 2007 -0400

    knfsd: remove code duplication in nfsd4_setclientid()

    Each branch of this if-then-else has a bunch of duplicated code that
    we could just put at the end.

    Signed-off-by: "J. Bruce Fields"

commit 64954a225e2dfe36957327b5e1e4936470b8f6db
Author: Andrew Morton
Date:   Thu Aug 9 00:53:50 2007 -0700

    nfsd warning fix

    fs/nfsd/nfsctl.c: In function 'write_filehandle':
    fs/nfsd/nfsctl.c:301: warning: 'maxsize' may be used uninitialized in this function

    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: "J. Bruce Fields"

commit 1489e2a1e9b921a545f26b581bf0924458a81ba6
Author: J. Bruce Fields
Date:   Tue Oct 24 18:33:17 2006 -0400

    knfsd: fix callback rpc cred

    It doesn't make sense to make the callback with credentials that the
    client made the setclientid with. Instead the spec requires that the
    callback occur with the credentials the client authenticated *to*.

    It probably doesn't matter what we use for auth_unix, and some more
    infrastructure will be needed for auth_gss, so let's just remove the
    cred lookup for now.

    Signed-off-by: J. Bruce Fields

commit 953eca0a59309c45b497d7e5e828eb87a3201efa
Author: J. Bruce Fields
Date:   Wed Aug 1 15:30:59 2007 -0400

    knfsd: move nfsv4 slab creation/destruction to module init/exit

    We have some slabs that the nfs4 server uses to store state objects.
    We're currently creating and destroying those slabs whenever the
    server is brought up or down. That seems excessive; may as well just
    do that in module initialization and exit.

    Also add some minor header cleanup. (Thanks to Andrew Morton for that
    and a compile fix.)

    Signed-off-by: "J. Bruce Fields"

commit 0720c30a16b0b8f96daff3851dac4a1bbe0e5b09
Author: J. Bruce Fields
Date:   Fri Jul 27 18:06:50 2007 -0400

    knfsd: spawn kernel thread to probe callback channel

    We want to allow gss on the callback channel, so people using krb5
    can still get the benefits of delegations. But looking up the rpc
    credential can take some time in that case.
    And we shouldn't delay the response to setclientid_confirm while we
    wait.

    It may be inefficient, but for now the simplest solution is just to
    spawn a new thread as necessary for the purpose.

    (Thanks to Adrian Bunk for catching a missing static here.)

    Signed-off-by: "J. Bruce Fields"
    Cc: Adrian Bunk

commit bb52ba85c4caf3052ed6a1ee832646e93164835c
Author: J. Bruce Fields
Date:   Fri Jul 27 16:36:45 2007 -0400

    knfsd: nfs4 name->id mapping not correctly parsing negative downcall

    Note that qword_get() returns length or -1, not an -ERROR.

    Signed-off-by: "J. Bruce Fields"

commit dd86d2fa3d5ce1975c3fac7efaecc2b9f1987179
Author: J. Bruce Fields
Date:   Fri Jul 27 16:10:37 2007 -0400

    knfsd: demote some printk()s to dprintk()s

    To quote a recent mail from Andrew Morton:

        Look: if there's a way in which an unprivileged user can trigger
        a printk we fix it, end of story.

    OK. I assume that goes double for printk()s that might be triggered
    by random hosts on the internet. So, disable some printk()s that look
    like they could be triggered by malfunctioning or malicious clients.
    For now, just downgrade them to dprintk()s.

    Signed-off-by: "J. Bruce Fields"

commit b9d17cb50e440d15ed1310aa94b60057e205462e
Author: J. Bruce Fields
Date:   Thu Jul 26 17:04:54 2007 -0400

    knfsd: cleanup of nfsd4 cmp_* functions

    Benny Halevy suggested renaming cmp_* to same_* to make the meaning
    of the return value clearer.

    Fix some nearby style deviations while we're at it, including a small
    swath of creative indentation in nfs4_preprocess_seqid_op().

    Signed-off-by: "J. Bruce Fields"

commit 13a207c62ebf5dd95818d05828c790fbf4d9dca3
Author: J. Bruce Fields
Date:   Tue Jul 24 21:38:18 2007 -0400

    knfsd: delete code made redundant by map_new_errors

    I moved this check into map_new_errors, but forgot to delete the
    original. Oops.

    Signed-off-by: "J.
Bruce Fields"

commit ad65be75d86a9d9f7a99504af8e78aeaf5b9d2f6
Author: Christoph Hellwig
Date:   Wed Mar 7 15:26:25 2007 +0000

    nfsd: fix horrible indentation in nfsd_setattr

    Signed-off-by: Christoph Hellwig

commit ea530dfdcf3428446356aed29c56ce9b44fbf18b
Author: J. Bruce Fields
Date:   Thu Jul 12 15:30:32 2007 -0400

    nfsd: remove unused cache_for_each macro

    This macro is unused.

    Signed-off-by: "J. Bruce Fields"

commit ab182f36fbfcd89a36d69cbdd2fd29c3b4f7307b
Author: J. Bruce Fields
Date:   Fri Jun 22 17:26:32 2007 -0400

    nfsd: tone down inaccurate dprintk

    The nfserr_dropit happens routinely on upcalls (so a kmalloc failure
    is almost never the actual cause), but I occasionally get a complaint
    from some tester that's worried because they ran across this message
    after turning on debugging to research some unrelated problem.

    Signed-off-by: "J. Bruce Fields"

commit 99abc6a91d00839e28f9620ce23be8e6a20d7828
Author: Pavel Emelyanov
Date:   Thu Sep 20 12:45:02 2007 +0400

    locks: Fix potential OOPS in generic_setlease()

    This code is run under lock_kernel(), which is dropped during
    sleeping operations, so the following race is possible:

    CPU1:                                   CPU2:
    vfs_setlease();                         vfs_setlease();
    lock_kernel();                          lock_kernel(); /* spin */
    generic_setlease():
      ...
      for (before = ...)
      /* here we found some lease after
       * which we will insert the new one */
      fl = locks_alloc_lock();
      /* go to sleep in this allocation and
       * drop the BKL */
                                            generic_setlease():
                                              ...
                                              for (before = ...)
                                              /* here we find the "before" pointing
                                               * at the one we found on CPU1 */
                                              ->fl_change(my_before, arg);
                                                lease_modify();
                                                  locks_free_lock();
                                                  /* and we freed it */
                                              ...
                                            unlock_kernel();
      locks_insert_lock(before, fl);
      /* OOPS! We have just tried to add the lease
       * at the tail of already removed one */

    Similar races are already handled in other code: all the allocations
    are performed before any checks/updates.

    Thanks to Kamalesh Babulal for testing and for a bug report on an
    earlier version.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: J.
Bruce Fields
    Cc: Kamalesh Babulal

commit 3279d0b110df5356bca6b3495af2a52a5bc54bb6
Author: Pavel Emelyanov
Date:   Wed Sep 19 16:44:07 2007 +0400

    Use list_first_entry in locks_wake_up_blocks

    This routine deletes all the elements from the list with the
    "while (!list_empty())" loop, and we already have a
    list_first_entry() macro to help it look nicer :)

    Signed-off-by: Pavel Emelyanov

commit be0ee4e871fbd9de9cddbc50e5fb1dc3cc499258
Author: J. Bruce Fields
Date:   Wed Sep 12 15:45:07 2007 -0400

    locks: fix flock_lock_file() comment

    This comment wasn't updated when lease support was added, and it
    makes essentially the same mistake that the code made before a
    recent bugfix.

    Signed-off-by: J. Bruce Fields

commit 1a27e6ff3f607218931f04eca740ca0b04b52e90
Author: Pavel Emelyanov
Date:   Tue Sep 11 16:38:13 2007 +0400

    Memory shortage can result in inconsistent flocks state

    When flock_lock_file() is called to change the flock from F_RDLCK
    to F_WRLCK or vice versa, the existing flock can be removed without
    appropriate warning. Look:

            for_each_lock(inode, before) {
                    struct file_lock *fl = *before;
                    if (IS_POSIX(fl))
                            break;
                    if (IS_LEASE(fl))
                            continue;
                    if (filp != fl->fl_file)
                            continue;
                    if (request->fl_type == fl->fl_type)
                            goto out;
                    found = 1;
                    locks_delete_lock(before); <<<<<< !
                    break;
            }

    If the subsequent locks_alloc_lock() fails after this point, the
    return code will be -ENOMEM, but the existing lock has already been
    removed.

    It is a known feature that such "re-locking" is not atomic, but in
    the racy case the file should stay locked (although by some other
    process); in this case the file ends up unlocked.

    The proposal is to prepare the lock in advance, leaving no chance
    for the later code to fail.

    Found during making the flocks pid-namespaces aware.

    (Note: Thanks to Reuben Farrelly for finding a bug in an earlier
    version of this patch.)

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: J. Bruce Fields
    Cc: Reuben Farrelly

commit 1854f8a4f08011f7b353bf32c3337aa4bde8bb11
Author: J.
Bruce Fields
Date:   Tue Nov 14 16:54:36 2006 -0500

    locks: kill redundant local variable

    There's no need for another variable local to this loop; we can use
    the variable (of the same name!) already declared at the top of the
    function, and not used till later (at which point it's initialized,
    so this is safe).

    Signed-off-by: J. Bruce Fields

commit d024dd770f8739ab9085da578e12c9a7c605b96a
Author: J. Bruce Fields
Date:   Thu May 10 19:02:07 2007 -0400

    locks: reverse order of posix_locks_conflict() arguments

    The first argument to posix_locks_conflict() is meant to be a lock
    request, and the second a lock from an inode's lock list.

    It doesn't really make a difference which order you call them in,
    since the only asymmetric test in posix_locks_conflict() is the check
    whether the second argument is a posix lock--and every caller already
    does that check for some reason.

    But may as well fix posix_test_lock() to call posix_locks_conflict()
    with the arguments in the same order as everywhere else.

    Signed-off-by: "J.
Bruce Fields"

Signed-off-by: Andrew Morton
---

 Documentation/00-INDEX                          |    4
 Documentation/filesystems/00-INDEX              |    2
 Documentation/filesystems/mandatory-locking.txt |  171 ++++++++++++++
 Documentation/locks.txt                         |   10
 Documentation/mandatory.txt                     |  152 ------------
 fs/locks.c                                      |   45 ++-
 fs/nfsd/nfs3xdr.c                               |   59 ++--
 fs/nfsd/nfs4callback.c                          |   86 ++-----
 fs/nfsd/nfs4idmap.c                             |    8
 fs/nfsd/nfs4proc.c                              |    4
 fs/nfsd/nfs4state.c                             |  163 +++++--------
 fs/nfsd/nfs4xdr.c                               |   17 -
 fs/nfsd/nfsctl.c                                |    7
 fs/nfsd/nfssvc.c                                |    8
 fs/nfsd/nfsxdr.c                                |    4
 fs/nfsd/vfs.c                                   |   43 ++-
 include/linux/nfsd/nfsd.h                       |   18 -
 include/linux/nfsd/nfsfh.h                      |   42 ---
 include/linux/nfsd/xdr4.h                       |    4
 include/linux/sunrpc/cache.h                    |   10
 net/sunrpc/auth_gss/svcauth_gss.c               |  144 ++++++-----
 net/sunrpc/svc.c                                |   40 ++-
 22 files changed, 506 insertions(+), 535 deletions(-)

diff -puN Documentation/00-INDEX~git-nfsd Documentation/00-INDEX
--- a/Documentation/00-INDEX~git-nfsd
+++ a/Documentation/00-INDEX
@@ -145,7 +145,7 @@ fb/
 feature-removal-schedule.txt
 	- list of files and features that are going to be removed.
 filesystems/
-	- directory with info on the various filesystems that Linux supports.
+	- info on the vfs and the various filesystems that Linux supports.
 firmware_class/
 	- request_firmware() hotplug interface info.
 floppy.txt
@@ -240,8 +240,6 @@ m68k/
 	- directory with info about Linux on Motorola 68k architecture.
 magic-number.txt
 	- list of magic numbers used to mark/protect kernel data structures.
-mandatory.txt
-	- info on the Linux implementation of Sys V mandatory file locking.
 mca.txt
 	- info on supporting Micro Channel Architecture (e.g. PS/2) systems.
 md.txt
diff -puN Documentation/filesystems/00-INDEX~git-nfsd Documentation/filesystems/00-INDEX
--- a/Documentation/filesystems/00-INDEX~git-nfsd
+++ a/Documentation/filesystems/00-INDEX
@@ -52,6 +52,8 @@ isofs.txt
 	- info and mount options for the ISO 9660 (CDROM) filesystem.
 jfs.txt
 	- info and mount options for the JFS filesystem.
+mandatory-locking.txt + - info on the Linux implementation of Sys V mandatory file locking. ncpfs.txt - info on Novell Netware(tm) filesystem using NCP protocol. ntfs.txt diff -puN /dev/null Documentation/filesystems/mandatory-locking.txt --- /dev/null +++ a/Documentation/filesystems/mandatory-locking.txt @@ -0,0 +1,171 @@ + Mandatory File Locking For The Linux Operating System + + Andy Walker + + 15 April 1996 + (Updated September 2007) + +0. Why you should avoid mandatory locking +----------------------------------------- + +The Linux implementation is prey to a number of difficult-to-fix race +conditions which in practice make it not dependable: + + - The write system call checks for a mandatory lock only once + at its start. It is therefore possible for a lock request to + be granted after this check but before the data is modified. + A process may then see file data change even while a mandatory + lock was held. + - Similarly, an exclusive lock may be granted on a file after + the kernel has decided to proceed with a read, but before the + read has actually completed, and the reading process may see + the file data in a state which should not have been visible + to it. + - Similar races make the claimed mutual exclusion between lock + and mmap similarly unreliable. + +1. What is mandatory locking? +------------------------------ + +Mandatory locking is kernel enforced file locking, as opposed to the more usual +cooperative file locking used to guarantee sequential access to files among +processes. File locks are applied using the flock() and fcntl() system calls +(and the lockf() library routine which is a wrapper around fcntl().) It is +normally a process' responsibility to check for locks on a file it wishes to +update, before applying its own lock, updating the file and unlocking it again. +The most commonly used example of this (and in the case of sendmail, the most +troublesome) is access to a user's mailbox. 
The mail user agent and the mail +transfer agent must guard against updating the mailbox at the same time, and +prevent reading the mailbox while it is being updated. + +In a perfect world all processes would use and honour a cooperative, or +"advisory" locking scheme. However, the world isn't perfect, and there's +a lot of poorly written code out there. + +In trying to address this problem, the designers of System V UNIX came up +with a "mandatory" locking scheme, whereby the operating system kernel would +block attempts by a process to write to a file that another process holds a +"read" -or- "shared" lock on, and block attempts to both read and write to a +file that a process holds a "write " -or- "exclusive" lock on. + +The System V mandatory locking scheme was intended to have as little impact as +possible on existing user code. The scheme is based on marking individual files +as candidates for mandatory locking, and using the existing fcntl()/lockf() +interface for applying locks just as if they were normal, advisory locks. + +Note 1: In saying "file" in the paragraphs above I am actually not telling +the whole truth. System V locking is based on fcntl(). The granularity of +fcntl() is such that it allows the locking of byte ranges in files, in addition +to entire files, so the mandatory locking rules also have byte level +granularity. + +Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite +borrowing the fcntl() locking scheme from System V. The mandatory locking +scheme is defined by the System V Interface Definition (SVID) Version 3. + +2. Marking a file for mandatory locking +--------------------------------------- + +A file is marked as a candidate for mandatory locking by setting the group-id +bit in its file mode but removing the group-execute bit. This is an otherwise +meaningless combination, and was chosen by the System V implementors so as not +to break existing user programs. 
+ +Note that the group-id bit is usually automatically cleared by the kernel when +a setgid file is written to. This is a security measure. The kernel has been +modified to recognize the special case of a mandatory lock candidate and to +refrain from clearing this bit. Similarly the kernel has been modified not +to run mandatory lock candidates with setgid privileges. + +3. Available implementations +---------------------------- + +I have considered the implementations of mandatory locking available with +SunOS 4.1.x, Solaris 2.x and HP-UX 9.x. + +Generally I have tried to make the most sense out of the behaviour exhibited +by these three reference systems. There are many anomalies. + +All the reference systems reject all calls to open() for a file on which +another process has outstanding mandatory locks. This is in direct +contravention of SVID 3, which states that only calls to open() with the +O_TRUNC flag set should be rejected. The Linux implementation follows the SVID +definition, which is the "Right Thing", since only calls with O_TRUNC can +modify the contents of the file. + +HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not +just mandatory locks. That would appear to contravene POSIX.1. + +mmap() is another interesting case. All the operating systems mentioned +prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX +also disallows advisory locks for such a file. SVID actually specifies the +paranoid HP-UX behaviour. + +In my opinion only MAP_SHARED mappings should be immune from locking, and then +only from mandatory locks - that is what is currently implemented. + +SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for +mandatory locks, so reads and writes to locked files always block when they +should return EAGAIN. + +I'm afraid that this is such an esoteric area that the semantics described +below are just as valid as any others, so long as the main points seem to +agree. + +4. 
Semantics +------------ + +1. Mandatory locks can only be applied via the fcntl()/lockf() locking + interface - in other words the System V/POSIX interface. BSD style + locks using flock() never result in a mandatory lock. + +2. If a process has locked a region of a file with a mandatory read lock, then + other processes are permitted to read from that region. If any of these + processes attempts to write to the region it will block until the lock is + released, unless the process has opened the file with the O_NONBLOCK + flag in which case the system call will return immediately with the error + status EAGAIN. + +3. If a process has locked a region of a file with a mandatory write lock, all + attempts to read or write to that region block until the lock is released, + unless a process has opened the file with the O_NONBLOCK flag in which case + the system call will return immediately with the error status EAGAIN. + +4. Calls to open() with O_TRUNC, or to creat(), on a existing file that has + any mandatory locks owned by other processes will be rejected with the + error status EAGAIN. + +5. Attempts to apply a mandatory lock to a file that is memory mapped and + shared (via mmap() with MAP_SHARED) will be rejected with the error status + EAGAIN. + +6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED) + that has any mandatory locks in effect will be rejected with the error status + EAGAIN. + +5. Which system calls are affected? +----------------------------------- + +Those which modify a file's contents, not just the inode. That gives read(), +write(), readv(), writev(), open(), creat(), mmap(), truncate() and +ftruncate(). truncate() and ftruncate() are considered to be "write" actions +for the purposes of mandatory locking. + +The affected region is usually defined as stretching from the current position +for the total number of bytes read or written. 
For the truncate calls it is
+defined as the bytes of a file removed or added (we must also consider bytes
+added, as a lock can specify just "the whole file", rather than a specific
+range of bytes.)
+
+Note 3: I may have overlooked some system calls that need mandatory lock
+checking in my eagerness to get this code out the door. Please let me know, or
+better still fix the system calls yourself and submit a patch to me or Linus.
+
+6. Warning!
+-----------
+
+Not even root can override a mandatory lock, so runaway processes can wreak
+havoc if they lock crucial files. The way around it is to change the file
+permissions (remove the setgid bit) before trying to read or write to it.
+Of course, that might be a bit tricky if the system is hung :-(
+
diff -puN Documentation/locks.txt~git-nfsd Documentation/locks.txt
--- a/Documentation/locks.txt~git-nfsd
+++ a/Documentation/locks.txt
@@ -53,11 +53,11 @@ fcntl(), with all the problems that impl
 1.3 Mandatory Locking As A Mount Option
 ---------------------------------------
 
-Mandatory locking, as described in 'Documentation/mandatory.txt' was prior
-to this release a general configuration option that was valid for all
-mounted filesystems. This had a number of inherent dangers, not the least
-of which was the ability to freeze an NFS server by asking it to read a
-file for which a mandatory lock existed.
+Mandatory locking, as described in 'Documentation/filesystems/mandatory-locking.txt'
+was prior to this release a general configuration option that was valid for
+all mounted filesystems. This had a number of inherent dangers, not the
+least of which was the ability to freeze an NFS server by asking it to read
+a file for which a mandatory lock existed.
 
 From this release of the kernel, mandatory locking can be turned on and off
 on a per-filesystem basis, using the mount options 'mand' and 'nomand'.
diff -puN Documentation/mandatory.txt~git-nfsd /dev/null --- a/Documentation/mandatory.txt +++ /dev/null @@ -1,152 +0,0 @@ - Mandatory File Locking For The Linux Operating System - - Andy Walker - - 15 April 1996 - - -1. What is mandatory locking? ------------------------------- - -Mandatory locking is kernel enforced file locking, as opposed to the more usual -cooperative file locking used to guarantee sequential access to files among -processes. File locks are applied using the flock() and fcntl() system calls -(and the lockf() library routine which is a wrapper around fcntl().) It is -normally a process' responsibility to check for locks on a file it wishes to -update, before applying its own lock, updating the file and unlocking it again. -The most commonly used example of this (and in the case of sendmail, the most -troublesome) is access to a user's mailbox. The mail user agent and the mail -transfer agent must guard against updating the mailbox at the same time, and -prevent reading the mailbox while it is being updated. - -In a perfect world all processes would use and honour a cooperative, or -"advisory" locking scheme. However, the world isn't perfect, and there's -a lot of poorly written code out there. - -In trying to address this problem, the designers of System V UNIX came up -with a "mandatory" locking scheme, whereby the operating system kernel would -block attempts by a process to write to a file that another process holds a -"read" -or- "shared" lock on, and block attempts to both read and write to a -file that a process holds a "write " -or- "exclusive" lock on. - -The System V mandatory locking scheme was intended to have as little impact as -possible on existing user code. The scheme is based on marking individual files -as candidates for mandatory locking, and using the existing fcntl()/lockf() -interface for applying locks just as if they were normal, advisory locks. 
- -Note 1: In saying "file" in the paragraphs above I am actually not telling -the whole truth. System V locking is based on fcntl(). The granularity of -fcntl() is such that it allows the locking of byte ranges in files, in addition -to entire files, so the mandatory locking rules also have byte level -granularity. - -Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite -borrowing the fcntl() locking scheme from System V. The mandatory locking -scheme is defined by the System V Interface Definition (SVID) Version 3. - -2. Marking a file for mandatory locking ---------------------------------------- - -A file is marked as a candidate for mandatory locking by setting the group-id -bit in its file mode but removing the group-execute bit. This is an otherwise -meaningless combination, and was chosen by the System V implementors so as not -to break existing user programs. - -Note that the group-id bit is usually automatically cleared by the kernel when -a setgid file is written to. This is a security measure. The kernel has been -modified to recognize the special case of a mandatory lock candidate and to -refrain from clearing this bit. Similarly the kernel has been modified not -to run mandatory lock candidates with setgid privileges. - -3. Available implementations ----------------------------- - -I have considered the implementations of mandatory locking available with -SunOS 4.1.x, Solaris 2.x and HP-UX 9.x. - -Generally I have tried to make the most sense out of the behaviour exhibited -by these three reference systems. There are many anomalies. - -All the reference systems reject all calls to open() for a file on which -another process has outstanding mandatory locks. This is in direct -contravention of SVID 3, which states that only calls to open() with the -O_TRUNC flag set should be rejected. 
The Linux implementation follows the SVID -definition, which is the "Right Thing", since only calls with O_TRUNC can -modify the contents of the file. - -HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not -just mandatory locks. That would appear to contravene POSIX.1. - -mmap() is another interesting case. All the operating systems mentioned -prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX -also disallows advisory locks for such a file. SVID actually specifies the -paranoid HP-UX behaviour. - -In my opinion only MAP_SHARED mappings should be immune from locking, and then -only from mandatory locks - that is what is currently implemented. - -SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for -mandatory locks, so reads and writes to locked files always block when they -should return EAGAIN. - -I'm afraid that this is such an esoteric area that the semantics described -below are just as valid as any others, so long as the main points seem to -agree. - -4. Semantics ------------- - -1. Mandatory locks can only be applied via the fcntl()/lockf() locking - interface - in other words the System V/POSIX interface. BSD style - locks using flock() never result in a mandatory lock. - -2. If a process has locked a region of a file with a mandatory read lock, then - other processes are permitted to read from that region. If any of these - processes attempts to write to the region it will block until the lock is - released, unless the process has opened the file with the O_NONBLOCK - flag in which case the system call will return immediately with the error - status EAGAIN. - -3. If a process has locked a region of a file with a mandatory write lock, all - attempts to read or write to that region block until the lock is released, - unless a process has opened the file with the O_NONBLOCK flag in which case - the system call will return immediately with the error status EAGAIN. - -4. 
Calls to open() with O_TRUNC, or to creat(), on a existing file that has - any mandatory locks owned by other processes will be rejected with the - error status EAGAIN. - -5. Attempts to apply a mandatory lock to a file that is memory mapped and - shared (via mmap() with MAP_SHARED) will be rejected with the error status - EAGAIN. - -6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED) - that has any mandatory locks in effect will be rejected with the error status - EAGAIN. - -5. Which system calls are affected? ------------------------------------ - -Those which modify a file's contents, not just the inode. That gives read(), -write(), readv(), writev(), open(), creat(), mmap(), truncate() and -ftruncate(). truncate() and ftruncate() are considered to be "write" actions -for the purposes of mandatory locking. - -The affected region is usually defined as stretching from the current position -for the total number of bytes read or written. For the truncate calls it is -defined as the bytes of a file removed or added (we must also consider bytes -added, as a lock can specify just "the whole file", rather than a specific -range of bytes.) - -Note 3: I may have overlooked some system calls that need mandatory lock -checking in my eagerness to get this code out the door. Please let me know, or -better still fix the system calls yourself and submit a patch to me or Linus. - -6. Warning! ------------ - -Not even root can override a mandatory lock, so runaway processes can wreak -havoc if they lock crucial files. The way around it is to change the file -permissions (remove the setgid bit) before trying to read or write to it. 
-Of course, that might be a bit tricky if the system is hung :-( - diff -puN fs/locks.c~git-nfsd fs/locks.c --- a/fs/locks.c~git-nfsd +++ a/fs/locks.c @@ -534,7 +534,9 @@ static void locks_insert_block(struct fi static void locks_wake_up_blocks(struct file_lock *blocker) { while (!list_empty(&blocker->fl_block)) { - struct file_lock *waiter = list_entry(blocker->fl_block.next, + struct file_lock *waiter; + + waiter = list_first_entry(&blocker->fl_block, struct file_lock, fl_block); __locks_delete_block(waiter); if (waiter->fl_lmops && waiter->fl_lmops->fl_notify) @@ -668,7 +670,7 @@ posix_test_lock(struct file *filp, struc for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) { if (!IS_POSIX(cfl)) continue; - if (posix_locks_conflict(cfl, fl)) + if (posix_locks_conflict(fl, cfl)) break; } if (cfl) @@ -715,8 +717,7 @@ next_task: } /* Try to create a FLOCK lock on filp. We always insert new FLOCK locks - * at the head of the list, but that's secret knowledge known only to - * flock_lock_file and posix_lock_file. + * after any leases, but before any posix locks. * * Note that if called with an FL_EXISTS argument, the caller may determine * whether or not a lock was successfully freed by testing the return @@ -733,6 +734,15 @@ static int flock_lock_file(struct file * lock_kernel(); if (request->fl_flags & FL_ACCESS) goto find_conflict; + + if (request->fl_type != F_UNLCK) { + error = -ENOMEM; + new_fl = locks_alloc_lock(); + if (new_fl == NULL) + goto out; + error = 0; + } + for_each_lock(inode, before) { struct file_lock *fl = *before; if (IS_POSIX(fl)) @@ -754,10 +764,6 @@ static int flock_lock_file(struct file * goto out; } - error = -ENOMEM; - new_fl = locks_alloc_lock(); - if (new_fl == NULL) - goto out; /* * If a higher-priority process was blocked on the old file lock, * give it the opportunity to lock the file. 
@@ -819,7 +825,7 @@ static int __posix_lock_file(struct inod lock_kernel(); if (request->fl_type != F_UNLCK) { for_each_lock(inode, before) { - struct file_lock *fl = *before; + fl = *before; if (!IS_POSIX(fl)) continue; if (!posix_locks_conflict(request, fl)) @@ -1337,6 +1343,7 @@ int fcntl_getlease(struct file *filp) int generic_setlease(struct file *filp, long arg, struct file_lock **flp) { struct file_lock *fl, **before, **my_before = NULL, *lease; + struct file_lock *new_fl = NULL; struct dentry *dentry = filp->f_path.dentry; struct inode *inode = dentry->d_inode; int error, rdlease_count = 0, wrlease_count = 0; @@ -1363,6 +1370,11 @@ int generic_setlease(struct file *filp, || (atomic_read(&inode->i_count) > 1))) goto out; + error = -ENOMEM; + new_fl = locks_alloc_lock(); + if (new_fl == NULL) + goto out; + /* * At this point, we know that if there is an exclusive * lease on this file, then we hold it on this filp @@ -1405,18 +1417,15 @@ int generic_setlease(struct file *filp, if (!leases_enable) goto out; - error = -ENOMEM; - fl = locks_alloc_lock(); - if (fl == NULL) - goto out; - - locks_copy_lock(fl, lease); + locks_copy_lock(new_fl, lease); + locks_insert_lock(before, new_fl); - locks_insert_lock(before, fl); + *flp = new_fl; + return 0; - *flp = fl; - error = 0; out: + if (new_fl != NULL) + locks_free_lock(new_fl); return error; } EXPORT_SYMBOL(generic_setlease); diff -puN fs/nfsd/nfs3xdr.c~git-nfsd fs/nfsd/nfs3xdr.c --- a/fs/nfsd/nfs3xdr.c~git-nfsd +++ a/fs/nfsd/nfs3xdr.c @@ -174,9 +174,6 @@ static __be32 * encode_fattr3(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) { - struct dentry *dentry = fhp->fh_dentry; - struct timespec time; - *p++ = htonl(nfs3_ftypes[(stat->mode & S_IFMT) >> 12]); *p++ = htonl((u32) stat->mode); *p++ = htonl((u32) stat->nlink); @@ -191,10 +188,9 @@ encode_fattr3(struct svc_rqst *rqstp, __ *p++ = htonl((u32) MAJOR(stat->rdev)); *p++ = htonl((u32) MINOR(stat->rdev)); p = encode_fsid(p, fhp); - p = 
xdr_encode_hyper(p, (u64) stat->ino); + p = xdr_encode_hyper(p, stat->ino); p = encode_time3(p, &stat->atime); - lease_get_mtime(dentry->d_inode, &time); - p = encode_time3(p, &time); + p = encode_time3(p, &stat->mtime); p = encode_time3(p, &stat->ctime); return p; @@ -203,31 +199,9 @@ encode_fattr3(struct svc_rqst *rqstp, __ static __be32 * encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) { - struct inode *inode = fhp->fh_dentry->d_inode; - /* Attributes to follow */ *p++ = xdr_one; - - *p++ = htonl(nfs3_ftypes[(fhp->fh_post_mode & S_IFMT) >> 12]); - *p++ = htonl((u32) fhp->fh_post_mode); - *p++ = htonl((u32) fhp->fh_post_nlink); - *p++ = htonl((u32) nfsd_ruid(rqstp, fhp->fh_post_uid)); - *p++ = htonl((u32) nfsd_rgid(rqstp, fhp->fh_post_gid)); - if (S_ISLNK(fhp->fh_post_mode) && fhp->fh_post_size > NFS3_MAXPATHLEN) { - p = xdr_encode_hyper(p, (u64) NFS3_MAXPATHLEN); - } else { - p = xdr_encode_hyper(p, (u64) fhp->fh_post_size); - } - p = xdr_encode_hyper(p, ((u64)fhp->fh_post_blocks) << 9); - *p++ = fhp->fh_post_rdev[0]; - *p++ = fhp->fh_post_rdev[1]; - p = encode_fsid(p, fhp); - p = xdr_encode_hyper(p, (u64) inode->i_ino); - p = encode_time3(p, &fhp->fh_post_atime); - p = encode_time3(p, &fhp->fh_post_mtime); - p = encode_time3(p, &fhp->fh_post_ctime); - - return p; + return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); } /* @@ -246,6 +220,7 @@ encode_post_op_attr(struct svc_rqst *rqs err = vfs_getattr(fhp->fh_export->ex_mnt, dentry, &stat); if (!err) { *p++ = xdr_one; /* attributes follow */ + lease_get_mtime(dentry->d_inode, &stat.mtime); return encode_fattr3(rqstp, p, fhp, &stat); } } @@ -284,6 +259,23 @@ encode_wcc_data(struct svc_rqst *rqstp, return encode_post_op_attr(rqstp, p, fhp); } +/* + * Fill in the post_op attr for the wcc data + */ +void fill_post_wcc(struct svc_fh *fhp) +{ + int err; + + if (fhp->fh_post_saved) + printk("nfsd: inode locked twice during operation.\n"); + + err = vfs_getattr(fhp->fh_export->ex_mnt, 
fhp->fh_dentry, + &fhp->fh_post_attr); + if (err) + fhp->fh_post_saved = 0; + else + fhp->fh_post_saved = 1; +} /* * XDR decode functions @@ -643,8 +635,11 @@ int nfs3svc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p, struct nfsd3_attrstat *resp) { - if (resp->status == 0) + if (resp->status == 0) { + lease_get_mtime(resp->fh.fh_dentry->d_inode, + &resp->stat.mtime); p = encode_fattr3(rqstp, p, &resp->fh, &resp->stat); + } return xdr_ressize_check(rqstp, p); } @@ -802,7 +797,7 @@ nfs3svc_encode_readdirres(struct svc_rqs static __be32 * encode_entry_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, - int namlen, ino_t ino) + int namlen, u64 ino) { *p++ = xdr_one; /* mark entry present */ p = xdr_encode_hyper(p, ino); /* file id */ @@ -873,7 +868,7 @@ compose_entry_fh(struct nfsd3_readdirres #define NFS3_ENTRYPLUS_BAGGAGE (1 + 21 + 1 + (NFS3_FHSIZE >> 2)) static int encode_entry(struct readdir_cd *ccd, const char *name, int namlen, - loff_t offset, ino_t ino, unsigned int d_type, int plus) + loff_t offset, u64 ino, unsigned int d_type, int plus) { struct nfsd3_readdirres *cd = container_of(ccd, struct nfsd3_readdirres, common); diff -puN fs/nfsd/nfs4callback.c~git-nfsd fs/nfsd/nfs4callback.c --- a/fs/nfsd/nfs4callback.c~git-nfsd +++ a/fs/nfsd/nfs4callback.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -343,26 +344,28 @@ static struct rpc_version * nfs_cb_versi &nfs_cb_version4, }; -/* - * Use the SETCLIENTID credential - */ -static struct rpc_cred * -nfsd4_lookupcred(struct nfs4_client *clp, int taskflags) +/* Reference counting, callback cleanup, etc., all look racy as heck. + * And why is cb_set an atomic? 
*/ + +static int do_probe_callback(void *data) { - struct auth_cred acred; - struct rpc_clnt *clnt = clp->cl_callback.cb_client; - struct rpc_cred *ret; + struct nfs4_client *clp = data; + struct nfs4_callback *cb = &clp->cl_callback; + struct rpc_message msg = { + .rpc_proc = &nfs4_cb_procedures[NFSPROC4_CLNT_CB_NULL], + .rpc_argp = clp, + }; + int status; + + status = rpc_call_sync(cb->cb_client, &msg, RPC_TASK_SOFT); - get_group_info(clp->cl_cred.cr_group_info); - acred.uid = clp->cl_cred.cr_uid; - acred.gid = clp->cl_cred.cr_gid; - acred.group_info = clp->cl_cred.cr_group_info; - - dprintk("NFSD: looking up %s cred\n", - clnt->cl_auth->au_ops->au_name); - ret = rpcauth_lookup_credcache(clnt->cl_auth, &acred, taskflags); - put_group_info(clp->cl_cred.cr_group_info); - return ret; + if (status) { + rpc_shutdown_client(cb->cb_client); + cb->cb_client = NULL; + } else + atomic_set(&cb->cb_set, 1); + put_nfs4_client(clp); + return 0; } /* @@ -390,11 +393,7 @@ nfsd4_probe_callback(struct nfs4_client .authflavor = RPC_AUTH_UNIX, /* XXX: need AUTH_GSS... 
*/ .flags = (RPC_CLNT_CREATE_NOPING), }; - struct rpc_message msg = { - .rpc_proc = &nfs4_cb_procedures[NFSPROC4_CLNT_CB_NULL], - .rpc_argp = clp, - }; - int status; + struct task_struct *t; if (atomic_read(&cb->cb_set)) return; @@ -426,16 +425,11 @@ nfsd4_probe_callback(struct nfs4_client /* the task holds a reference to the nfs4_client struct */ atomic_inc(&clp->cl_count); - msg.rpc_cred = nfsd4_lookupcred(clp,0); - if (IS_ERR(msg.rpc_cred)) - goto out_release_clp; - status = rpc_call_async(cb->cb_client, &msg, RPC_TASK_ASYNC, &nfs4_cb_null_ops, NULL); - put_rpccred(msg.rpc_cred); + t = kthread_run(do_probe_callback, clp, "nfs4_cb_probe"); - if (status != 0) { - dprintk("NFSD: asynchronous NFSPROC4_CB_NULL failed!\n"); + if (IS_ERR(t)) goto out_release_clp; - } + return; out_release_clp: @@ -447,30 +441,6 @@ out_err: (int)clp->cl_name.len, clp->cl_name.data); } -static void -nfs4_cb_null(struct rpc_task *task, void *dummy) -{ - struct nfs4_client *clp = (struct nfs4_client *)task->tk_msg.rpc_argp; - struct nfs4_callback *cb = &clp->cl_callback; - __be32 addr = htonl(cb->cb_addr); - - dprintk("NFSD: nfs4_cb_null task->tk_status %d\n", task->tk_status); - - if (task->tk_status < 0) { - dprintk("NFSD: callback establishment to client %.*s failed\n", - (int)clp->cl_name.len, clp->cl_name.data); - goto out; - } - atomic_set(&cb->cb_set, 1); - dprintk("NFSD: callback set to client %u.%u.%u.%u\n", NIPQUAD(addr)); -out: - put_nfs4_client(clp); -} - -static const struct rpc_call_ops nfs4_cb_null_ops = { - .rpc_call_done = nfs4_cb_null, -}; - /* * called with dp->dl_count inc'ed. * nfs4_lock_state() may or may not have been called. 
@@ -491,10 +461,6 @@ nfsd4_cb_recall(struct nfs4_delegation * if ((!atomic_read(&clp->cl_callback.cb_set)) || !clnt) return; - msg.rpc_cred = nfsd4_lookupcred(clp, 0); - if (IS_ERR(msg.rpc_cred)) - goto out; - cbr->cbr_trunc = 0; /* XXX need to implement truncate optimization */ cbr->cbr_dp = dp; @@ -515,8 +481,6 @@ nfsd4_cb_recall(struct nfs4_delegation * status = rpc_call_sync(clnt, &msg, RPC_TASK_SOFT); } out_put_cred: - put_rpccred(msg.rpc_cred); -out: if (status == -EIO) atomic_set(&clp->cl_callback.cb_set, 0); /* Success or failure, now we're either waiting for lease expiration diff -puN fs/nfsd/nfs4idmap.c~git-nfsd fs/nfsd/nfs4idmap.c --- a/fs/nfsd/nfs4idmap.c~git-nfsd +++ a/fs/nfsd/nfs4idmap.c @@ -207,6 +207,7 @@ idtoname_parse(struct cache_detail *cd, { struct ent ent, *res; char *buf1, *bp; + int len; int error = -EINVAL; if (buf[buflen - 1] != '\n') @@ -248,10 +249,11 @@ idtoname_parse(struct cache_detail *cd, goto out; /* Name */ - error = qword_get(&buf, buf1, PAGE_SIZE); - if (error == -EINVAL) + error = -EINVAL; + len = qword_get(&buf, buf1, PAGE_SIZE); + if (len < 0) goto out; - if (error == -ENOENT) + if (len == 0) set_bit(CACHE_NEGATIVE, &ent.h.flags); else { if (error >= IDMAP_NAMESZ) { diff -puN fs/nfsd/nfs4proc.c~git-nfsd fs/nfsd/nfs4proc.c --- a/fs/nfsd/nfs4proc.c~git-nfsd +++ a/fs/nfsd/nfs4proc.c @@ -238,12 +238,12 @@ nfsd4_open(struct svc_rqst *rqstp, struc break; case NFS4_OPEN_CLAIM_DELEGATE_PREV: open->op_stateowner->so_confirmed = 1; - printk("NFSD: unsupported OPEN claim type %d\n", + dprintk("NFSD: unsupported OPEN claim type %d\n", open->op_claim_type); status = nfserr_notsupp; goto out; default: - printk("NFSD: Invalid OPEN claim type %d\n", + dprintk("NFSD: Invalid OPEN claim type %d\n", open->op_claim_type); status = nfserr_inval; goto out; diff -puN fs/nfsd/nfs4state.c~git-nfsd fs/nfsd/nfs4state.c --- a/fs/nfsd/nfs4state.c~git-nfsd +++ a/fs/nfsd/nfs4state.c @@ -462,26 +462,28 @@ copy_cred(struct svc_cred *target, struc } static 
inline int -same_name(const char *n1, const char *n2) { +same_name(const char *n1, const char *n2) +{ return 0 == memcmp(n1, n2, HEXDIR_LEN); } static int -cmp_verf(nfs4_verifier *v1, nfs4_verifier *v2) { - return(!memcmp(v1->data,v2->data,sizeof(v1->data))); +same_verf(nfs4_verifier *v1, nfs4_verifier *v2) +{ + return 0 == memcmp(v1->data, v2->data, sizeof(v1->data)); } static int -cmp_clid(clientid_t * cl1, clientid_t * cl2) { - return((cl1->cl_boot == cl2->cl_boot) && - (cl1->cl_id == cl2->cl_id)); +same_clid(clientid_t *cl1, clientid_t *cl2) +{ + return (cl1->cl_boot == cl2->cl_boot) && (cl1->cl_id == cl2->cl_id); } /* XXX what about NGROUP */ static int -cmp_creds(struct svc_cred *cr1, struct svc_cred *cr2){ - return(cr1->cr_uid == cr2->cr_uid); - +same_creds(struct svc_cred *cr1, struct svc_cred *cr2) +{ + return cr1->cr_uid == cr2->cr_uid; } static void @@ -507,7 +509,7 @@ check_name(struct xdr_netobj name) { if (name.len == 0) return 0; if (name.len > NFS4_OPAQUE_LIMIT) { - printk("NFSD: check_name: name too long(%d)!\n", name.len); + dprintk("NFSD: check_name: name too long(%d)!\n", name.len); return 0; } return 1; @@ -546,7 +548,7 @@ find_confirmed_client(clientid_t *clid) unsigned int idhashval = clientid_hashval(clid->cl_id); list_for_each_entry(clp, &conf_id_hashtbl[idhashval], cl_idhash) { - if (cmp_clid(&clp->cl_clientid, clid)) + if (same_clid(&clp->cl_clientid, clid)) return clp; } return NULL; @@ -559,7 +561,7 @@ find_unconfirmed_client(clientid_t *clid unsigned int idhashval = clientid_hashval(clid->cl_id); list_for_each_entry(clp, &unconf_id_hashtbl[idhashval], cl_idhash) { - if (cmp_clid(&clp->cl_clientid, clid)) + if (same_clid(&clp->cl_clientid, clid)) return clp; } return NULL; @@ -753,7 +755,7 @@ nfsd4_setclientid(struct svc_rqst *rqstp * or different ip_address */ status = nfserr_clid_inuse; - if (!cmp_creds(&conf->cl_cred, &rqstp->rq_cred) + if (!same_creds(&conf->cl_cred, &rqstp->rq_cred) || conf->cl_addr != sin->sin_addr.s_addr) { 
dprintk("NFSD: setclientid: string in use by client" "at %u.%u.%u.%u\n", NIPQUAD(conf->cl_addr)); @@ -772,14 +774,8 @@ nfsd4_setclientid(struct svc_rqst *rqstp new = create_client(clname, dname); if (new == NULL) goto out; - copy_verf(new, &clverifier); - new->cl_addr = sin->sin_addr.s_addr; - copy_cred(&new->cl_cred,&rqstp->rq_cred); gen_clid(new); - gen_confirm(new); - gen_callback(new, setclid); - add_to_unconfirmed(new, strhashval); - } else if (cmp_verf(&conf->cl_verifier, &clverifier)) { + } else if (same_verf(&conf->cl_verifier, &clverifier)) { /* * CASE 1: * cl_name match, confirmed, principal match @@ -804,13 +800,7 @@ nfsd4_setclientid(struct svc_rqst *rqstp new = create_client(clname, dname); if (new == NULL) goto out; - copy_verf(new,&conf->cl_verifier); - new->cl_addr = sin->sin_addr.s_addr; - copy_cred(&new->cl_cred,&rqstp->rq_cred); copy_clid(new, conf); - gen_confirm(new); - gen_callback(new, setclid); - add_to_unconfirmed(new,strhashval); } else if (!unconf) { /* * CASE 2: @@ -823,14 +813,8 @@ nfsd4_setclientid(struct svc_rqst *rqstp new = create_client(clname, dname); if (new == NULL) goto out; - copy_verf(new,&clverifier); - new->cl_addr = sin->sin_addr.s_addr; - copy_cred(&new->cl_cred,&rqstp->rq_cred); gen_clid(new); - gen_confirm(new); - gen_callback(new, setclid); - add_to_unconfirmed(new, strhashval); - } else if (!cmp_verf(&conf->cl_confirm, &unconf->cl_confirm)) { + } else if (!same_verf(&conf->cl_confirm, &unconf->cl_confirm)) { /* * CASE3: * confirmed found (name, principal match) @@ -850,19 +834,19 @@ nfsd4_setclientid(struct svc_rqst *rqstp new = create_client(clname, dname); if (new == NULL) goto out; - copy_verf(new,&clverifier); - new->cl_addr = sin->sin_addr.s_addr; - copy_cred(&new->cl_cred,&rqstp->rq_cred); gen_clid(new); - gen_confirm(new); - gen_callback(new, setclid); - add_to_unconfirmed(new, strhashval); } else { /* No cases hit !!! 
*/ status = nfserr_inval; goto out; } + copy_verf(new, &clverifier); + new->cl_addr = sin->sin_addr.s_addr; + copy_cred(&new->cl_cred, &rqstp->rq_cred); + gen_confirm(new); + gen_callback(new, setclid); + add_to_unconfirmed(new, strhashval); setclid->se_clientid.cl_boot = new->cl_clientid.cl_boot; setclid->se_clientid.cl_id = new->cl_clientid.cl_id; memcpy(setclid->se_confirm.data, new->cl_confirm.data, sizeof(setclid->se_confirm.data)); @@ -910,16 +894,16 @@ nfsd4_setclientid_confirm(struct svc_rqs goto out; if ((conf && unconf) && - (cmp_verf(&unconf->cl_confirm, &confirm)) && - (cmp_verf(&conf->cl_verifier, &unconf->cl_verifier)) && + (same_verf(&unconf->cl_confirm, &confirm)) && + (same_verf(&conf->cl_verifier, &unconf->cl_verifier)) && (same_name(conf->cl_recdir,unconf->cl_recdir)) && - (!cmp_verf(&conf->cl_confirm, &unconf->cl_confirm))) { + (!same_verf(&conf->cl_confirm, &unconf->cl_confirm))) { /* CASE 1: * unconf record that matches input clientid and input confirm. * conf record that matches input clientid. * conf and unconf records match names, verifiers */ - if (!cmp_creds(&conf->cl_cred, &unconf->cl_cred)) + if (!same_creds(&conf->cl_cred, &unconf->cl_cred)) status = nfserr_clid_inuse; else { /* XXX: We just turn off callbacks until we can handle @@ -933,7 +917,7 @@ nfsd4_setclientid_confirm(struct svc_rqs } } else if ((conf && !unconf) || ((conf && unconf) && - (!cmp_verf(&conf->cl_verifier, &unconf->cl_verifier) || + (!same_verf(&conf->cl_verifier, &unconf->cl_verifier) || !same_name(conf->cl_recdir, unconf->cl_recdir)))) { /* CASE 2: * conf record that matches input clientid. @@ -941,18 +925,18 @@ nfsd4_setclientid_confirm(struct svc_rqs * unconf->cl_name or unconf->cl_verifier don't match the * conf record. 
*/ - if (!cmp_creds(&conf->cl_cred,&rqstp->rq_cred)) + if (!same_creds(&conf->cl_cred, &rqstp->rq_cred)) status = nfserr_clid_inuse; else status = nfs_ok; } else if (!conf && unconf - && cmp_verf(&unconf->cl_confirm, &confirm)) { + && same_verf(&unconf->cl_confirm, &confirm)) { /* CASE 3: * conf record not found. * unconf record found. * unconf->cl_confirm matches input confirm */ - if (!cmp_creds(&unconf->cl_cred, &rqstp->rq_cred)) { + if (!same_creds(&unconf->cl_cred, &rqstp->rq_cred)) { status = nfserr_clid_inuse; } else { unsigned int hash = @@ -967,8 +951,8 @@ nfsd4_setclientid_confirm(struct svc_rqs conf = unconf; status = nfs_ok; } - } else if ((!conf || (conf && !cmp_verf(&conf->cl_confirm, &confirm))) - && (!unconf || (unconf && !cmp_verf(&unconf->cl_confirm, + } else if ((!conf || (conf && !same_verf(&conf->cl_confirm, &confirm))) + && (!unconf || (unconf && !same_verf(&unconf->cl_confirm, &confirm)))) { /* CASE 4: * conf record not found, or if conf, conf->cl_confirm does not @@ -1019,7 +1003,7 @@ nfsd4_free_slab(struct kmem_cache **slab *slab = NULL; } -static void +void nfsd4_free_slabs(void) { nfsd4_free_slab(&stateowner_slab); @@ -1207,10 +1191,12 @@ move_to_close_lru(struct nfs4_stateowner } static int -cmp_owner_str(struct nfs4_stateowner *sop, struct xdr_netobj *owner, clientid_t *clid) { - return ((sop->so_owner.len == owner->len) && - !memcmp(sop->so_owner.data, owner->data, owner->len) && - (sop->so_client->cl_clientid.cl_id == clid->cl_id)); +same_owner_str(struct nfs4_stateowner *sop, struct xdr_netobj *owner, + clientid_t *clid) +{ + return (sop->so_owner.len == owner->len) && + 0 == memcmp(sop->so_owner.data, owner->data, owner->len) && + (sop->so_client->cl_clientid.cl_id == clid->cl_id); } static struct nfs4_stateowner * @@ -1219,7 +1205,7 @@ find_openstateowner_str(unsigned int has struct nfs4_stateowner *so = NULL; list_for_each_entry(so, &ownerstr_hashtbl[hashval], so_strhash) { - if (cmp_owner_str(so, &open->op_owner, 
&open->op_clientid)) + if (same_owner_str(so, &open->op_owner, &open->op_clientid)) return so; } return NULL; @@ -1738,7 +1724,7 @@ out: if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS && flag == NFS4_OPEN_DELEGATE_NONE && open->op_delegate_type != NFS4_OPEN_DELEGATE_NONE) - printk("NFSD: WARNING: refusing delegation reclaim\n"); + dprintk("NFSD: WARNING: refusing delegation reclaim\n"); open->op_delegate_type = flag; } @@ -2147,7 +2133,7 @@ nfs4_preprocess_seqid_op(struct svc_fh * *sopp = NULL; if (ZERO_STATEID(stateid) || ONE_STATEID(stateid)) { - printk("NFSD: preprocess_seqid_op: magic stateid!\n"); + dprintk("NFSD: preprocess_seqid_op: magic stateid!\n"); return nfserr_bad_stateid; } @@ -2181,25 +2167,24 @@ nfs4_preprocess_seqid_op(struct svc_fh * lkflg = setlkflg(lock->lk_type); if (lock->lk_is_new) { - if (!sop->so_is_open_owner) + if (!sop->so_is_open_owner) + return nfserr_bad_stateid; + if (!same_clid(&clp->cl_clientid, lockclid)) return nfserr_bad_stateid; - if (!cmp_clid(&clp->cl_clientid, lockclid)) - return nfserr_bad_stateid; - /* stp is the open stateid */ - status = nfs4_check_openmode(stp, lkflg); - if (status) - return status; - } else { - /* stp is the lock stateid */ - status = nfs4_check_openmode(stp->st_openstp, lkflg); - if (status) - return status; + /* stp is the open stateid */ + status = nfs4_check_openmode(stp, lkflg); + if (status) + return status; + } else { + /* stp is the lock stateid */ + status = nfs4_check_openmode(stp->st_openstp, lkflg); + if (status) + return status; } - } if ((flags & CHECK_FH) && nfs4_check_fh(current_fh, stp)) { - printk("NFSD: preprocess_seqid_op: fh-stateid mismatch!\n"); + dprintk("NFSD: preprocess_seqid_op: fh-stateid mismatch!\n"); return nfserr_bad_stateid; } @@ -2215,22 +2200,22 @@ nfs4_preprocess_seqid_op(struct svc_fh * goto check_replay; if (sop->so_confirmed && flags & CONFIRM) { - printk("NFSD: preprocess_seqid_op: expected" + dprintk("NFSD: preprocess_seqid_op: expected" " unconfirmed 
stateowner!\n"); return nfserr_bad_stateid; } if (!sop->so_confirmed && !(flags & CONFIRM)) { - printk("NFSD: preprocess_seqid_op: stateowner not" + dprintk("NFSD: preprocess_seqid_op: stateowner not" " confirmed yet!\n"); return nfserr_bad_stateid; } if (stateid->si_generation > stp->st_stateid.si_generation) { - printk("NFSD: preprocess_seqid_op: future stateid?!\n"); + dprintk("NFSD: preprocess_seqid_op: future stateid?!\n"); return nfserr_bad_stateid; } if (stateid->si_generation < stp->st_stateid.si_generation) { - printk("NFSD: preprocess_seqid_op: old stateid!\n"); + dprintk("NFSD: preprocess_seqid_op: old stateid!\n"); return nfserr_old_stateid; } renew_client(sop->so_client); @@ -2242,7 +2227,7 @@ check_replay: /* indicate replay to calling function */ return nfserr_replay_me; } - printk("NFSD: preprocess_seqid_op: bad seqid (expected %d, got %d)\n", + dprintk("NFSD: preprocess_seqid_op: bad seqid (expected %d, got %d)\n", sop->so_seqid, seqid); *sopp = NULL; return nfserr_bad_seqid; @@ -2561,7 +2546,7 @@ find_lockstateowner_str(struct inode *in struct nfs4_stateowner *op; list_for_each_entry(op, &lock_ownerstr_hashtbl[hashval], so_strhash) { - if (cmp_owner_str(op, owner, clid)) + if (same_owner_str(op, owner, clid)) return op; } return NULL; @@ -2855,7 +2840,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, stru file_lock.fl_type = F_WRLCK; break; default: - printk("NFSD: nfs4_lockt: bad lock type!\n"); + dprintk("NFSD: nfs4_lockt: bad lock type!\n"); status = nfserr_inval; goto out; } @@ -3025,7 +3010,7 @@ nfsd4_release_lockowner(struct svc_rqst INIT_LIST_HEAD(&matches); for (i = 0; i < LOCK_HASH_SIZE; i++) { list_for_each_entry(sop, &lock_ownerid_hashtbl[i], so_idhash) { - if (!cmp_owner_str(sop, owner, clid)) + if (!same_owner_str(sop, owner, clid)) continue; list_for_each_entry(stp, &sop->so_stateids, st_perstateowner) { @@ -3149,11 +3134,14 @@ nfs4_check_open_reclaim(clientid_t *clid /* initialization to perform at module load time: */ -void +int 
nfs4_state_init(void) { - int i; + int i, status; + status = nfsd4_init_slabs(); + if (status) + return status; for (i = 0; i < CLIENT_HASH_SIZE; i++) { INIT_LIST_HEAD(&conf_id_hashtbl[i]); INIT_LIST_HEAD(&conf_str_hashtbl[i]); @@ -3182,6 +3170,7 @@ nfs4_state_init(void) for (i = 0; i < CLIENT_HASH_SIZE; i++) INIT_LIST_HEAD(&reclaim_str_hashtbl[i]); reclaim_str_hashtbl_size = 0; + return 0; } static void @@ -3242,20 +3231,15 @@ __nfs4_state_start(void) set_max_delegations(); } -int +void nfs4_state_start(void) { - int status; - if (nfs4_init) - return 0; - status = nfsd4_init_slabs(); - if (status) - return status; + return; nfsd4_load_reboot_recovery_data(); __nfs4_state_start(); nfs4_init = 1; - return 0; + return; } int @@ -3313,7 +3297,6 @@ nfs4_state_shutdown(void) nfs4_lock_state(); nfs4_release_reclaim(); __nfs4_state_shutdown(); - nfsd4_free_slabs(); nfs4_unlock_state(); } diff -puN fs/nfsd/nfs4xdr.c~git-nfsd fs/nfsd/nfs4xdr.c --- a/fs/nfsd/nfs4xdr.c~git-nfsd +++ a/fs/nfsd/nfs4xdr.c @@ -1683,7 +1683,7 @@ out_acl: if (bmval0 & FATTR4_WORD0_FILEID) { if ((buflen -= 8) < 0) goto out_resource; - WRITE64((u64) stat.ino); + WRITE64(stat.ino); } if (bmval0 & FATTR4_WORD0_FILES_AVAIL) { if ((buflen -= 8) < 0) @@ -1825,16 +1825,15 @@ out_acl: WRITE32(stat.mtime.tv_nsec); } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { - struct dentry *mnt_pnt, *mnt_root; - if ((buflen -= 8) < 0) goto out_resource; - mnt_root = exp->ex_mnt->mnt_root; - if (mnt_root->d_inode == dentry->d_inode) { - mnt_pnt = exp->ex_mnt->mnt_mountpoint; - WRITE64((u64) mnt_pnt->d_inode->i_ino); - } else - WRITE64((u64) stat.ino); + if (exp->ex_mnt->mnt_root->d_inode == dentry->d_inode) { + err = vfs_getattr(exp->ex_mnt->mnt_parent, + exp->ex_mnt->mnt_mountpoint, &stat); + if (err) + goto out_nfserr; + } + WRITE64(stat.ino); } *attrlenp = htonl((char *)p - (char *)attrlenp - 4); *countp = p - buffer; diff -puN fs/nfsd/nfsctl.c~git-nfsd fs/nfsd/nfsctl.c --- a/fs/nfsd/nfsctl.c~git-nfsd +++ 
a/fs/nfsd/nfsctl.c @@ -298,7 +298,7 @@ static ssize_t write_filehandle(struct f * qword quoting is used, so filehandle will be \x.... */ char *dname, *path; - int maxsize; + int uninitialized_var(maxsize); char *mesg = buf; int len; struct auth_domain *dom; @@ -679,11 +679,13 @@ static int __init init_nfsd(void) int retval; printk(KERN_INFO "Installing knfsd (copyright (C) 1996 okir@monad.swb.de).\n"); + retval = nfs4_state_init(); /* nfs4 locking state */ + if (retval) + return retval; nfsd_stat_init(); /* Statistics */ nfsd_cache_init(); /* RPC reply cache */ nfsd_export_init(); /* Exports table */ nfsd_lockd_init(); /* lockd->nfsd callbacks */ - nfs4_state_init(); /* NFSv4 locking state */ nfsd_idmap_init(); /* Name to ID mapping */ if (proc_mkdir("fs/nfs", NULL)) { struct proc_dir_entry *entry; @@ -712,6 +714,7 @@ static void __exit exit_nfsd(void) nfsd_stat_shutdown(); nfsd_lockd_shutdown(); nfsd_idmap_shutdown(); + nfsd4_free_slabs(); unregister_filesystem(&nfsd_fs_type); } diff -puN fs/nfsd/nfssvc.c~git-nfsd fs/nfsd/nfssvc.c --- a/fs/nfsd/nfssvc.c~git-nfsd +++ a/fs/nfsd/nfssvc.c @@ -349,9 +349,7 @@ nfsd_svc(unsigned short port, int nrserv error = nfsd_racache_init(2*nrservs); if (error<0) goto out; - error = nfs4_state_start(); - if (error<0) - goto out; + nfs4_state_start(); nfsd_reset_versions(); @@ -546,10 +544,8 @@ nfsd_dispatch(struct svc_rqst *rqstp, __ /* Now call the procedure handler, and encode NFS status. 
*/ nfserr = proc->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp); nfserr = map_new_errors(rqstp->rq_vers, nfserr); - if (nfserr == nfserr_jukebox && rqstp->rq_vers == 2) - nfserr = nfserr_dropit; if (nfserr == nfserr_dropit) { - dprintk("nfsd: Dropping request due to malloc failure!\n"); + dprintk("nfsd: Dropping request; may be revisited later\n"); nfsd_cache_update(rqstp, RC_NOCACHE, NULL); return 0; } diff -puN fs/nfsd/nfsxdr.c~git-nfsd fs/nfsd/nfsxdr.c --- a/fs/nfsd/nfsxdr.c~git-nfsd +++ a/fs/nfsd/nfsxdr.c @@ -523,6 +523,10 @@ nfssvc_encode_entry(void *ccdv, const ch cd->common.err = nfserr_toosmall; return -EINVAL; } + if (ino > ~((u32) 0)) { + cd->common.err = nfserr_fbig; + return -EINVAL; + } *p++ = xdr_one; /* mark entry present */ *p++ = htonl((u32) ino); /* file id */ p = xdr_encode_array(p, name, namlen);/* name length & name */ diff -puN fs/nfsd/vfs.c~git-nfsd fs/nfsd/vfs.c --- a/fs/nfsd/vfs.c~git-nfsd +++ a/fs/nfsd/vfs.c @@ -295,7 +295,8 @@ nfsd_setattr(struct svc_rqst *rqstp, str if (!iap->ia_valid) goto out; - /* NFSv2 does not differentiate between "set-[ac]time-to-now" + /* + * NFSv2 does not differentiate between "set-[ac]time-to-now" * which only requires access, and "set-[ac]time-to-X" which * requires ownership. * So if it looks like it might be "set both to the same time which @@ -308,25 +309,33 @@ nfsd_setattr(struct svc_rqst *rqstp, str */ #define BOTH_TIME_SET (ATTR_ATIME_SET | ATTR_MTIME_SET) #define MAX_TOUCH_TIME_ERROR (30*60) - if ((iap->ia_valid & BOTH_TIME_SET) == BOTH_TIME_SET - && iap->ia_mtime.tv_sec == iap->ia_atime.tv_sec - ) { - /* Looks probable. Now just make sure time is in the right ballpark. - * Solaris, at least, doesn't seem to care what the time request is. - * We require it be within 30 minutes of now. 
- */ - time_t delta = iap->ia_atime.tv_sec - get_seconds(); - if (delta<0) delta = -delta; - if (delta < MAX_TOUCH_TIME_ERROR && - inode_change_ok(inode, iap) != 0) { - /* turn off ATTR_[AM]TIME_SET but leave ATTR_[AM]TIME - * this will cause notify_change to set these times to "now" + if ((iap->ia_valid & BOTH_TIME_SET) == BOTH_TIME_SET && + iap->ia_mtime.tv_sec == iap->ia_atime.tv_sec) { + /* + * Looks probable. + * + * Now just make sure time is in the right ballpark. + * Solaris, at least, doesn't seem to care what the time + * request is. We require it be within 30 minutes of now. */ - iap->ia_valid &= ~BOTH_TIME_SET; - } + time_t delta = iap->ia_atime.tv_sec - get_seconds(); + if (delta < 0) + delta = -delta; + if (delta < MAX_TOUCH_TIME_ERROR && + inode_change_ok(inode, iap) != 0) { + /* + * Turn off ATTR_[AM]TIME_SET but leave ATTR_[AM]TIME. + * This will cause notify_change to set these times + * to "now" + */ + iap->ia_valid &= ~BOTH_TIME_SET; + } } - /* The size case is special. It changes the file as well as the attributes. */ + /* + * The size case is special. + * It changes the file as well as the attributes. 
+	 */
 	if (iap->ia_valid & ATTR_SIZE) {
 		if (iap->ia_size < inode->i_size) {
 			err = nfsd_permission(rqstp, fhp->fh_export, dentry, MAY_TRUNC|MAY_OWNER_OVERRIDE);
diff -puN include/linux/nfsd/nfsd.h~git-nfsd include/linux/nfsd/nfsd.h
--- a/include/linux/nfsd/nfsd.h~git-nfsd
+++ a/include/linux/nfsd/nfsd.h
@@ -153,19 +153,21 @@ extern int nfsd_max_blksize;
  */
 #ifdef CONFIG_NFSD_V4
 extern unsigned int max_delegations;
-void nfs4_state_init(void);
-int nfs4_state_start(void);
+int nfs4_state_init(void);
+void nfsd4_free_slabs(void);
+void nfs4_state_start(void);
 void nfs4_state_shutdown(void);
 time_t nfs4_lease_time(void);
 void nfs4_reset_lease(time_t leasetime);
 int nfs4_reset_recoverydir(char *recdir);
 #else
-static inline void nfs4_state_init(void){};
-static inline int nfs4_state_start(void){return 0;}
-static inline void nfs4_state_shutdown(void){}
-static inline time_t nfs4_lease_time(void){return 0;}
-static inline void nfs4_reset_lease(time_t leasetime){}
-static inline int nfs4_reset_recoverydir(char *recdir) {return 0;}
+static inline int nfs4_state_init(void) { return 0; }
+static inline void nfsd4_free_slabs(void) { }
+static inline void nfs4_state_start(void) { }
+static inline void nfs4_state_shutdown(void) { }
+static inline time_t nfs4_lease_time(void) { return 0; }
+static inline void nfs4_reset_lease(time_t leasetime) { }
+static inline int nfs4_reset_recoverydir(char *recdir) { return 0; }
 #endif
 
 /*
diff -puN include/linux/nfsd/nfsfh.h~git-nfsd include/linux/nfsd/nfsfh.h
--- a/include/linux/nfsd/nfsfh.h~git-nfsd
+++ a/include/linux/nfsd/nfsfh.h
@@ -150,17 +150,7 @@ typedef struct svc_fh {
 	struct timespec fh_pre_ctime; /* ctime before oper */
 
 	/* Post-op attributes saved in fh_unlock */
-	umode_t fh_post_mode; /* i_mode */
-	nlink_t fh_post_nlink; /* i_nlink */
-	uid_t fh_post_uid; /* i_uid */
-	gid_t fh_post_gid; /* i_gid */
-	__u64 fh_post_size; /* i_size */
-	unsigned long fh_post_blocks; /* i_blocks */
-	unsigned long fh_post_blksize;/* i_blksize */
-	__be32 fh_post_rdev[2];/* i_rdev */
-	struct timespec fh_post_atime; /* i_atime */
-	struct timespec fh_post_mtime; /* i_mtime */
-	struct timespec fh_post_ctime; /* i_ctime */
+	struct kstat fh_post_attr; /* full attrs after operation */
 #endif /* CONFIG_NFSD_V3 */
 
 } svc_fh;
@@ -297,36 +287,12 @@ fill_pre_wcc(struct svc_fh *fhp)
 	if (!fhp->fh_pre_saved) {
 		fhp->fh_pre_mtime = inode->i_mtime;
 		fhp->fh_pre_ctime = inode->i_ctime;
-			fhp->fh_pre_size = inode->i_size;
-			fhp->fh_pre_saved = 1;
+		fhp->fh_pre_size = inode->i_size;
+		fhp->fh_pre_saved = 1;
 	}
 }
 
-/*
- * Fill in the post_op attr for the wcc data
- */
-static inline void
-fill_post_wcc(struct svc_fh *fhp)
-{
-	struct inode *inode = fhp->fh_dentry->d_inode;
-
-	if (fhp->fh_post_saved)
-		printk("nfsd: inode locked twice during operation.\n");
-
-	fhp->fh_post_mode = inode->i_mode;
-	fhp->fh_post_nlink = inode->i_nlink;
-	fhp->fh_post_uid = inode->i_uid;
-	fhp->fh_post_gid = inode->i_gid;
-	fhp->fh_post_size = inode->i_size;
-	fhp->fh_post_blksize = BLOCK_SIZE;
-	fhp->fh_post_blocks = inode->i_blocks;
-	fhp->fh_post_rdev[0] = htonl((u32)imajor(inode));
-	fhp->fh_post_rdev[1] = htonl((u32)iminor(inode));
-	fhp->fh_post_atime = inode->i_atime;
-	fhp->fh_post_mtime = inode->i_mtime;
-	fhp->fh_post_ctime = inode->i_ctime;
-	fhp->fh_post_saved = 1;
-}
+extern void fill_post_wcc(struct svc_fh *);
 #else
 #define fill_pre_wcc(ignored)
 #define fill_post_wcc(notused)
diff -puN include/linux/nfsd/xdr4.h~git-nfsd include/linux/nfsd/xdr4.h
--- a/include/linux/nfsd/xdr4.h~git-nfsd
+++ a/include/linux/nfsd/xdr4.h
@@ -428,8 +428,8 @@ set_change_info(struct nfsd4_change_info
 	cinfo->atomic = 1;
 	cinfo->before_ctime_sec = fhp->fh_pre_ctime.tv_sec;
 	cinfo->before_ctime_nsec = fhp->fh_pre_ctime.tv_nsec;
-	cinfo->after_ctime_sec = fhp->fh_post_ctime.tv_sec;
-	cinfo->after_ctime_nsec = fhp->fh_post_ctime.tv_nsec;
+	cinfo->after_ctime_sec = fhp->fh_post_attr.ctime.tv_sec;
+	cinfo->after_ctime_nsec = fhp->fh_post_attr.ctime.tv_nsec;
 }
 
 int nfs4svc_encode_voidres(struct svc_rqst *, __be32 *, void *);
diff -puN include/linux/sunrpc/cache.h~git-nfsd include/linux/sunrpc/cache.h
--- a/include/linux/sunrpc/cache.h~git-nfsd
+++ a/include/linux/sunrpc/cache.h
@@ -136,16 +136,6 @@ sunrpc_cache_update(struct cache_detail
 		struct cache_head *new, struct cache_head *old, int hash);
 
 
-#define cache_for_each(pos, detail, index, member) \
-	for (({read_lock(&(detail)->hash_lock); index = (detail)->hash_size;}) ; \
-		({if (index==0)read_unlock(&(detail)->hash_lock); index--;}); \
-		) \
-		for (pos = container_of((detail)->hash_table[index], typeof(*pos), member); \
-			&pos->member; \
-			pos = container_of(pos->member.next, typeof(*pos), member))
-
-
-
 extern void cache_clean_deferred(void *owner);
 
 static inline struct cache_head *cache_get(struct cache_head *h)
diff -puN net/sunrpc/auth_gss/svcauth_gss.c~git-nfsd net/sunrpc/auth_gss/svcauth_gss.c
--- a/net/sunrpc/auth_gss/svcauth_gss.c~git-nfsd
+++ a/net/sunrpc/auth_gss/svcauth_gss.c
@@ -631,7 +631,8 @@ svc_safe_putnetobj(struct kvec *resv, st
 	return 0;
 }
 
-/* Verify the checksum on the header and return SVC_OK on success.
+/*
+ * Verify the checksum on the header and return SVC_OK on success.
  * Otherwise, return SVC_DROP (in the case of a bad sequence number)
  * or return SVC_DENIED and indicate error in authp.
  */
@@ -961,6 +962,78 @@ gss_write_init_verf(struct svc_rqst *rqs
 }
 
 /*
+ * Having read the cred already and found we're in the context
+ * initiation case, read the verifier and initiate (or check the results
+ * of) upcalls to userspace for help with context initiation. If
+ * the upcall results are available, write the verifier and result.
+ * Otherwise, drop the request pending an answer to the upcall.
+ */
+static int svcauth_gss_handle_init(struct svc_rqst *rqstp,
+			struct rpc_gss_wire_cred *gc, __be32 *authp)
+{
+	struct kvec *argv = &rqstp->rq_arg.head[0];
+	struct kvec *resv = &rqstp->rq_res.head[0];
+	struct xdr_netobj tmpobj;
+	struct rsi *rsip, rsikey;
+
+	/* Read the verifier; should be NULL: */
+	*authp = rpc_autherr_badverf;
+	if (argv->iov_len < 2 * 4)
+		return SVC_DENIED;
+	if (svc_getnl(argv) != RPC_AUTH_NULL)
+		return SVC_DENIED;
+	if (svc_getnl(argv) != 0)
+		return SVC_DENIED;
+
+	/* Martial context handle and token for upcall: */
+	*authp = rpc_autherr_badcred;
+	if (gc->gc_proc == RPC_GSS_PROC_INIT && gc->gc_ctx.len != 0)
+		return SVC_DENIED;
+	memset(&rsikey, 0, sizeof(rsikey));
+	if (dup_netobj(&rsikey.in_handle, &gc->gc_ctx))
+		return SVC_DROP;
+	*authp = rpc_autherr_badverf;
+	if (svc_safe_getnetobj(argv, &tmpobj)) {
+		kfree(rsikey.in_handle.data);
+		return SVC_DENIED;
+	}
+	if (dup_netobj(&rsikey.in_token, &tmpobj)) {
+		kfree(rsikey.in_handle.data);
+		return SVC_DROP;
+	}
+
+	/* Perform upcall, or find upcall result: */
+	rsip = rsi_lookup(&rsikey);
+	rsi_free(&rsikey);
+	if (!rsip)
+		return SVC_DROP;
+	switch (cache_check(&rsi_cache, &rsip->h, &rqstp->rq_chandle)) {
+	case -EAGAIN:
+	case -ETIMEDOUT:
+	case -ENOENT:
+		/* No upcall result: */
+		return SVC_DROP;
+	case 0:
+		/* Got an answer to the upcall; use it: */
+		if (gss_write_init_verf(rqstp, rsip))
+			return SVC_DROP;
+		if (resv->iov_len + 4 > PAGE_SIZE)
+			return SVC_DROP;
+		svc_putnl(resv, RPC_SUCCESS);
+		if (svc_safe_putnetobj(resv, &rsip->out_handle))
+			return SVC_DROP;
+		if (resv->iov_len + 3 * 4 > PAGE_SIZE)
+			return SVC_DROP;
+		svc_putnl(resv, rsip->major_status);
+		svc_putnl(resv, rsip->minor_status);
+		svc_putnl(resv, GSS_SEQ_WIN);
+		if (svc_safe_putnetobj(resv, &rsip->out_token))
+			return SVC_DROP;
+	}
+	return SVC_COMPLETE;
+}
+
+/*
  * Accept an rpcsec packet.
  * If context establishment, punt to user space
  * If data exchange, verify/decrypt
@@ -974,11 +1047,9 @@ svcauth_gss_accept(struct svc_rqst *rqst
 	struct kvec *argv = &rqstp->rq_arg.head[0];
 	struct kvec *resv = &rqstp->rq_res.head[0];
 	u32 crlen;
-	struct xdr_netobj tmpobj;
 	struct gss_svc_data *svcdata = rqstp->rq_auth_data;
 	struct rpc_gss_wire_cred *gc;
 	struct rsc *rsci = NULL;
-	struct rsi *rsip, rsikey;
 	__be32 *rpcstart;
 	__be32 *reject_stat = resv->iov_base + resv->iov_len;
 	int ret;
@@ -1023,30 +1094,14 @@ svcauth_gss_accept(struct svc_rqst *rqst
 	if ((gc->gc_proc != RPC_GSS_PROC_DATA) && (rqstp->rq_proc != 0))
 		goto auth_err;
 
-	/*
-	 * We've successfully parsed the credential. Let's check out the
-	 * verifier. An AUTH_NULL verifier is allowed (and required) for
-	 * INIT and CONTINUE_INIT requests. AUTH_RPCSEC_GSS is required for
-	 * PROC_DATA and PROC_DESTROY.
-	 *
-	 * AUTH_NULL verifier is 0 (AUTH_NULL), 0 (length).
-	 * AUTH_RPCSEC_GSS verifier is:
-	 * 6 (AUTH_RPCSEC_GSS), length, checksum.
-	 * checksum is calculated over rpcheader from xid up to here.
-	 */
 	*authp = rpc_autherr_badverf;
 	switch (gc->gc_proc) {
 	case RPC_GSS_PROC_INIT:
 	case RPC_GSS_PROC_CONTINUE_INIT:
-		if (argv->iov_len < 2 * 4)
-			goto auth_err;
-		if (svc_getnl(argv) != RPC_AUTH_NULL)
-			goto auth_err;
-		if (svc_getnl(argv) != 0)
-			goto auth_err;
-		break;
+		return svcauth_gss_handle_init(rqstp, gc, authp);
 	case RPC_GSS_PROC_DATA:
 	case RPC_GSS_PROC_DESTROY:
+		/* Look up the context, and check the verifier: */
 		*authp = rpcsec_gsserr_credproblem;
 		rsci = gss_svc_searchbyctx(&gc->gc_ctx);
 		if (!rsci)
@@ -1067,51 +1122,6 @@ svcauth_gss_accept(struct svc_rqst *rqst
 
 	/* now act upon the command: */
 	switch (gc->gc_proc) {
-	case RPC_GSS_PROC_INIT:
-	case RPC_GSS_PROC_CONTINUE_INIT:
-		*authp = rpc_autherr_badcred;
-		if (gc->gc_proc == RPC_GSS_PROC_INIT && gc->gc_ctx.len != 0)
-			goto auth_err;
-		memset(&rsikey, 0, sizeof(rsikey));
-		if (dup_netobj(&rsikey.in_handle, &gc->gc_ctx))
-			goto drop;
-		*authp = rpc_autherr_badverf;
-		if (svc_safe_getnetobj(argv, &tmpobj)) {
-			kfree(rsikey.in_handle.data);
-			goto auth_err;
-		}
-		if (dup_netobj(&rsikey.in_token, &tmpobj)) {
-			kfree(rsikey.in_handle.data);
-			goto drop;
-		}
-
-		rsip = rsi_lookup(&rsikey);
-		rsi_free(&rsikey);
-		if (!rsip) {
-			goto drop;
-		}
-		switch(cache_check(&rsi_cache, &rsip->h, &rqstp->rq_chandle)) {
-		case -EAGAIN:
-		case -ETIMEDOUT:
-		case -ENOENT:
-			goto drop;
-		case 0:
-			if (gss_write_init_verf(rqstp, rsip))
-				goto drop;
-			if (resv->iov_len + 4 > PAGE_SIZE)
-				goto drop;
-			svc_putnl(resv, RPC_SUCCESS);
-			if (svc_safe_putnetobj(resv, &rsip->out_handle))
-				goto drop;
-			if (resv->iov_len + 3 * 4 > PAGE_SIZE)
-				goto drop;
-			svc_putnl(resv, rsip->major_status);
-			svc_putnl(resv, rsip->minor_status);
-			svc_putnl(resv, GSS_SEQ_WIN);
-			if (svc_safe_putnetobj(resv, &rsip->out_token))
-				goto drop;
-		}
-		goto complete;
 	case RPC_GSS_PROC_DESTROY:
 		if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq))
 			goto auth_err;
@@ -1158,7 +1168,7 @@ svcauth_gss_accept(struct svc_rqst *rqst
 		goto out;
 	}
 auth_err:
-	/* Restore write pointer to original value: */
+	/* Restore write pointer to its original value: */
 	xdr_ressize_check(rqstp, reject_stat);
 	ret = SVC_DENIED;
 	goto out;
diff -puN net/sunrpc/svc.c~git-nfsd net/sunrpc/svc.c
--- a/net/sunrpc/svc.c~git-nfsd
+++ a/net/sunrpc/svc.c
@@ -777,6 +777,30 @@ svc_register(struct svc_serv *serv, int
 }
 
 /*
+ * Printk the given error with the address of the client that caused it.
+ */
+static int
+__attribute__ ((format (printf, 2, 3)))
+svc_printk(struct svc_rqst *rqstp, const char *fmt, ...)
+{
+	va_list args;
+	int r;
+	char buf[RPC_MAX_ADDRBUFLEN];
+
+	if (!net_ratelimit())
+		return 0;
+
+	printk(KERN_WARNING "svc: %s: ",
+		svc_print_addr(rqstp, buf, sizeof(buf)));
+
+	va_start(args, fmt);
+	r = vprintk(fmt, args);
+	va_end(args);
+
+	return r;
+}
+
+/*
  * Process the RPC request.
  */
 int
@@ -963,14 +987,13 @@ svc_process(struct svc_rqst *rqstp)
 	return 0;
 
 err_short_len:
-	if (net_ratelimit())
-		printk("svc: short len %Zd, dropping request\n", argv->iov_len);
+	svc_printk(rqstp, "short len %Zd, dropping request\n",
+			argv->iov_len);
 
 	goto dropit; /* drop request */
 
 err_bad_dir:
-	if (net_ratelimit())
-		printk("svc: bad direction %d, dropping request\n", dir);
+	svc_printk(rqstp, "bad direction %d, dropping request\n", dir);
 
 	serv->sv_stats->rpcbadfmt++;
 	goto dropit; /* drop request */
@@ -1000,8 +1023,7 @@ err_bad_prog:
 	goto sendit;
 
 err_bad_vers:
-	if (net_ratelimit())
-		printk("svc: unknown version (%d for prog %d, %s)\n",
+	svc_printk(rqstp, "unknown version (%d for prog %d, %s)\n",
 		vers, prog, progp->pg_name);
 
 	serv->sv_stats->rpcbadfmt++;
@@ -1011,16 +1033,14 @@ err_bad_vers:
 	goto sendit;
 
 err_bad_proc:
-	if (net_ratelimit())
-		printk("svc: unknown procedure (%d)\n", proc);
+	svc_printk(rqstp, "unknown procedure (%d)\n", proc);
 
 	serv->sv_stats->rpcbadfmt++;
 	svc_putnl(resv, RPC_PROC_UNAVAIL);
 	goto sendit;
 
 err_garbage:
-	if (net_ratelimit())
-		printk("svc: failed to decode args\n");
+	svc_printk(rqstp, "failed to decode args\n");
 
 	rpc_stat = rpc_garbage_args;
 err_bad:
_