From: Oleg Nesterov Two threads, T1 and T2. T2 ptraces P, and P is not a child of ptracer's thread group. P exits and goes to TASK_ZOMBIE. T1 does wait_task_zombie(P): P->exit_state = TASK_DEAD; ... read_unlock(&tasklist_lock); T2 does exit(), takes tasklist, forget_original_parent() does __ptrace_unlink(P) but doesn't call do_notify_parent(P) because p->exit_state == EXIT_DEAD. Now, P is not visible to our process: __ptrace_unlink() removed it from ->children. We should send notification to P->parent and release P if and only if SIGCHLD is ignored. And we have 3 bugs: 1. P->parent does do_wait() and gets -ECHILD (P is on ->parent->children, but its state is TASK_DEAD). 2. // wait_task_zombie() continues if (put_user(...)) { // TODO: is this safe? p->exit_state = EXIT_ZOMBIE; return; } we return without notification/release, task_struct leaked. Solution: ignore -EFAULT and proceed. It is an application's bug if we can't fill infop/stat_addr (in case of VM_FAULT_OOM we have much more problems). 3. // wait_task_zombie() continues if (p->real_parent != p->parent) { // Not taken, it was untraced'ed ... } release_task(p); we released the task which we shouldn't. Solution: check ->real_parent != ->parent before, under tasklist_lock, but use ptrace_unlink() instead of __ptrace_unlink() to check ->ptrace. This patch hopefully solves 2 and 3, the 1st bug will be fixed later, we need some cleanups in forget_original_parent/reparent_thread. However, the first race is very unlikely and not critical, so I hope it makes sense to fix 1 and 2 for now. 4. Small cleanup: don't "restore" EXIT_ZOMBIE unless we know we are not going to realease the child. Signed-off-by: Oleg Nesterov Cc: Ingo Molnar Cc: Roland McGrath Signed-off-by: Andrew Morton --- kernel/exit.c | 45 +++++++++++++++++++++------------------------ 1 files changed, 21 insertions(+), 24 deletions(-) diff -puN kernel/exit.c~wait_task_zombie-fix-2-3-races-vs-forget_original_parent kernel/exit.c --- a/kernel/exit.c~wait_task_zombie-fix-2-3-races-vs-forget_original_parent +++ a/kernel/exit.c @@ -1166,8 +1166,7 @@ static int wait_task_zombie(struct task_ int __user *stat_addr, struct rusage __user *ru) { unsigned long state; - int retval; - int status; + int retval, status, traced; if (unlikely(noreap)) { pid_t pid = p->pid; @@ -1209,7 +1208,10 @@ static int wait_task_zombie(struct task_ return 0; } - if (likely(p->real_parent == p->parent)) { + /* traced means p->ptrace, but not vice versa */ + traced = (p->real_parent != p->parent); + + if (likely(!traced)) { struct signal_struct *psig; struct signal_struct *sig; @@ -1291,35 +1293,30 @@ static int wait_task_zombie(struct task_ retval = put_user(p->pid, &infop->si_pid); if (!retval && infop) retval = put_user(p->uid, &infop->si_uid); - if (retval) { - // TODO: is this safe? - p->exit_state = EXIT_ZOMBIE; - return retval; - } - retval = p->pid; - if (p->real_parent != p->parent) { + if (!retval) + retval = p->pid; + + if (traced) { write_lock_irq(&tasklist_lock); - /* Double-check with lock held. */ - if (p->real_parent != p->parent) { - __ptrace_unlink(p); - // TODO: is this safe? - p->exit_state = EXIT_ZOMBIE; - /* - * If this is not a detached task, notify the parent. - * If it's still not detached after that, don't release - * it now. - */ + /* We dropped tasklist, ptracer could die and untrace */ + ptrace_unlink(p); + /* + * If this is not a detached task, notify the parent. + * If it's still not detached after that, don't release + * it now. + */ + if (p->exit_signal != -1) { + do_notify_parent(p, p->exit_signal); if (p->exit_signal != -1) { - do_notify_parent(p, p->exit_signal); - if (p->exit_signal != -1) - p = NULL; + p->exit_state = EXIT_ZOMBIE; + p = NULL; } } write_unlock_irq(&tasklist_lock); } if (p != NULL) release_task(p); - BUG_ON(!retval); + return retval; } _