From: Miklos Szeredi

This patch removes the steal_locks() function.

steal_locks() doesn't work correctly with any filesystem that does its own lock management, including NFS, CIFS, etc.

In addition, it has weird semantics on local filesystems when tasks sharing file-descriptor tables do POSIX locking operations in parallel with execve().

The steal_locks() function has an effect on applications doing:

  clone(CLONE_FILES)
    /* in child */
    lock
    execve
    lock

POSIX locks acquired before the execve (by "child", "parent" or any further task sharing the files_struct) will, after the execve, be owned exclusively by "child".

According to Chris Wright, some LSB/LTP-style test suite fails without the stealing behavior, but there's no known real-world application that would also fail.

Apps using NPTL are not affected, since all other threads are killed before execve.

Apps using LinuxThreads are only affected if they

  - have multiple threads during exec (LinuxThreads doesn't kill other threads; the app may do it with pthread_kill_other_threads_np())
  - rely on POSIX locks being inherited across exec

Both conditions are documented, but not their interaction.

Apps using clone() natively are affected if they

  - use clone(CLONE_FILES)
  - rely on POSIX locks being inherited across exec

The above scenarios are unlikely, but possible.

If the patch is vetoed, there's a plan B that mostly keeps the weird stealing semantics but changes the way lock ownership is handled, so that network and local filesystems work consistently. That would add more complexity, though, so this solution seems to be preferred by most people.
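A simplified user-space model of the ownership change described above may help. It relies on one fact visible in the removed code: a POSIX lock's owner (fl_owner) is a files_struct pointer, so every task sharing a table through CLONE_FILES holds its locks under the same owner, and __steal_locks() rewrote that owner to current->files. The types and functions below (model_files, model_inode, model_steal_locks() and so on) are hypothetical stand-ins, not kernel APIs, and the model ignores lock ranges, conflicts and kernel locking entirely.

```c
/*
 * Simplified user-space model of POSIX lock ownership across execve(),
 * with and without steal_locks(). All names are hypothetical stand-ins.
 */
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct files_struct: only its address matters, since the
 * address is what serves as the lock owner. */
struct model_files {
	int id;
};

/* Stand-in for struct file_lock: which open file, and who owns it. */
struct model_lock {
	int file;                  /* which open file the lock is on */
	struct model_files *owner; /* plays the role of fl_owner */
};

#define MODEL_MAX_LOCKS 8

/* Stand-in for the inode's i_flock list, as a fixed-size array. */
struct model_inode {
	struct model_lock locks[MODEL_MAX_LOCKS];
	size_t nlocks;
};

/* Record a lock taken by a task whose file table is 'owner'. */
static void model_lock_file(struct model_inode *ino, int file,
			    struct model_files *owner)
{
	assert(ino->nlocks < MODEL_MAX_LOCKS);
	ino->locks[ino->nlocks].file = file;
	ino->locks[ino->nlocks].owner = owner;
	ino->nlocks++;
}

/* Mirrors __steal_locks(): every lock on 'file' owned by 'from' is
 * handed to 'to', the exec'ing task's new private table. */
static void model_steal_locks(struct model_inode *ino, int file,
			      struct model_files *from,
			      struct model_files *to)
{
	for (size_t i = 0; i < ino->nlocks; i++)
		if (ino->locks[i].file == file && ino->locks[i].owner == from)
			ino->locks[i].owner = to;
}

/* Count the locks currently owned by 'owner'. */
static size_t model_count_owned(const struct model_inode *ino,
				const struct model_files *owner)
{
	size_t n = 0;

	for (size_t i = 0; i < ino->nlocks; i++)
		if (ino->locks[i].owner == owner)
			n++;
	return n;
}
```

As a usage sketch: record two locks under a shared table, then call model_steal_locks() to mimic the old exec path, and model_count_owned() reports both locks under the exec'ing task's new table and none under the shared one. Skipping the call mimics the patched kernel, where the locks simply keep their original owner.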
Signed-off-by: Miklos Szeredi
Cc: Trond Myklebust
Cc: Matthew Wilcox
Cc: Chris Wright
Cc: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
---

 fs/binfmt_elf.c    |    1
 fs/binfmt_misc.c   |    1
 fs/exec.c          |    1
 fs/locks.c         |   57 -------------------------------------------
 include/linux/fs.h |    1
 5 files changed, 61 deletions(-)

diff -puN fs/binfmt_elf.c~remove-steal_locks fs/binfmt_elf.c
--- devel/fs/binfmt_elf.c~remove-steal_locks	2006-05-19 16:00:39.000000000 -0700
+++ devel-akpm/fs/binfmt_elf.c	2006-05-19 16:00:39.000000000 -0700
@@ -759,7 +759,6 @@ static int load_elf_binary(struct linux_
 	/* Discard our unneeded old files struct */
 	if (files) {
-		steal_locks(files);
 		put_files_struct(files);
 		files = NULL;
 	}
diff -puN fs/binfmt_misc.c~remove-steal_locks fs/binfmt_misc.c
--- devel/fs/binfmt_misc.c~remove-steal_locks	2006-05-19 16:00:39.000000000 -0700
+++ devel-akpm/fs/binfmt_misc.c	2006-05-19 16:00:39.000000000 -0700
@@ -204,7 +204,6 @@ static int load_misc_binary(struct linux
 		goto _error;
 	if (files) {
-		steal_locks(files);
 		put_files_struct(files);
 		files = NULL;
 	}
diff -puN fs/exec.c~remove-steal_locks fs/exec.c
--- devel/fs/exec.c~remove-steal_locks	2006-05-19 16:00:39.000000000 -0700
+++ devel-akpm/fs/exec.c	2006-05-19 16:00:39.000000000 -0700
@@ -866,7 +866,6 @@ int flush_old_exec(struct linux_binprm *
 	bprm->mm = NULL;		/* We're using it now */
 	/* This is the point of no return */
-	steal_locks(files);
 	put_files_struct(files);
 	current->sas_ss_sp = current->sas_ss_size = 0;
diff -puN fs/locks.c~remove-steal_locks fs/locks.c
--- devel/fs/locks.c~remove-steal_locks	2006-05-19 16:00:39.000000000 -0700
+++ devel-akpm/fs/locks.c	2006-05-19 16:00:39.000000000 -0700
@@ -2204,63 +2204,6 @@ int lock_may_write(struct inode *inode,
 EXPORT_SYMBOL(lock_may_write);
-static inline void __steal_locks(struct file *file, fl_owner_t from)
-{
-	struct inode *inode = file->f_dentry->d_inode;
-	struct file_lock *fl = inode->i_flock;
-
-	while (fl) {
-		if (fl->fl_file == file &&
-		    fl->fl_owner == from)
-			fl->fl_owner = current->files;
-		fl = fl->fl_next;
-	}
-}
-
-/* When getting ready for executing a binary, we make sure that current
- * has a files_struct on its own. Before dropping the old files_struct,
- * we take over ownership of all locks for all file descriptors we own.
- * Note that we may accidentally steal a lock for a file that a sibling
- * has created since the unshare_files() call.
- */
-void steal_locks(fl_owner_t from)
-{
-	struct files_struct *files = current->files;
-	int i, j;
-	struct fdtable *fdt;
-
-	if (from == files)
-		return;
-
-	lock_kernel();
-	j = 0;
-
-	/*
-	 * We are not taking a ref to the file structures, so
-	 * we need to acquire ->file_lock.
-	 */
-	spin_lock(&files->file_lock);
-	fdt = files_fdtable(files);
-	for (;;) {
-		unsigned long set;
-		i = j * __NFDBITS;
-		if (i >= fdt->max_fdset || i >= fdt->max_fds)
-			break;
-		set = fdt->open_fds->fds_bits[j++];
-		while (set) {
-			if (set & 1) {
-				struct file *file = fdt->fd[i];
-				if (file)
-					__steal_locks(file, from);
-			}
-			i++;
-			set >>= 1;
-		}
-	}
-	spin_unlock(&files->file_lock);
-	unlock_kernel();
-}
-EXPORT_SYMBOL(steal_locks);
-
 static int __init filelock_init(void)
 {
 	filelock_cache = kmem_cache_create("file_lock_cache",
diff -puN include/linux/fs.h~remove-steal_locks include/linux/fs.h
--- devel/include/linux/fs.h~remove-steal_locks	2006-05-19 16:00:39.000000000 -0700
+++ devel-akpm/include/linux/fs.h	2006-05-19 16:00:39.000000000 -0700
@@ -782,7 +782,6 @@ extern int setlease(struct file *, long,
 extern int lease_modify(struct file_lock **, int);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
-extern void steal_locks(fl_owner_t from);
 struct fasync_struct {
 	int	magic;
_
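The core of the removed steal_locks() is a walk over the open-descriptor bitmap, one unsigned long word at a time, testing the low bit and shifting right. The sketch below reproduces that loop in user space so it can be exercised in isolation. MODEL_NFDBITS stands in for __NFDBITS and model_scan_open_fds() is a hypothetical name; unlike the kernel loop, it also checks max_fds inside each word, and it has no equivalent of max_fdset, file_lock, lock_kernel() or the fdt->fd[] lookup.

```c
/*
 * User-space sketch of the open-descriptor walk performed by the removed
 * steal_locks(). Names are hypothetical; in the kernel the bitmap is
 * fdt->open_fds->fds_bits and the word width is __NFDBITS.
 */
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits per bitmap word, playing the role of __NFDBITS. */
#define MODEL_NFDBITS (CHAR_BIT * sizeof(unsigned long))

/* Store the index of every set bit in 'bits' below 'max_fds' into 'out'
 * (which must have room for max_fds entries) and return the count. */
static size_t model_scan_open_fds(const unsigned long *bits, size_t max_fds,
				  int *out)
{
	size_t count = 0;
	size_t j = 0;

	assert(out != NULL);
	for (;;) {
		/* First descriptor number covered by word j. */
		size_t i = j * MODEL_NFDBITS;
		if (i >= max_fds)
			break;
		unsigned long set = bits[j++];
		/* Test the low bit, then shift: one bit per descriptor. */
		while (set) {
			if ((set & 1) && i < max_fds)
				out[count++] = (int)i;
			i++;
			set >>= 1;
		}
	}
	return count;
}
```

Stopping the inner loop as soon as the shifted word becomes zero is what lets the original skip the tail of each sparsely populated word cheaply.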