From: Oleg Nesterov

schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD
to ensure that the exiting task will be deactivated.  Note that this EXIT_DEAD
is in fact a "random" value, we can use any bit except normal TASK_XXX values.

It is better to set this state in do_exit() along with PF_DEAD flag and remove
that check in schedule().

We are safe wrt concurrent try_to_wake_up() (for example ptrace, tkill), it
can not change task's ->state: the 'state' argument of try_to_wake_up() can't
have EXIT_DEAD bit.  And in case when try_to_wake_up() sees a stale value of
->state == TASK_RUNNING it will do nothing.

Signed-off-by: Oleg Nesterov
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
---

 kernel/exit.c  |    1 +
 kernel/sched.c |    3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff -puN kernel/exit.c~set-exit_dead-state-in-do_exit-not-in-schedule kernel/exit.c
--- a/kernel/exit.c~set-exit_dead-state-in-do_exit-not-in-schedule
+++ a/kernel/exit.c
@@ -958,6 +958,7 @@ fastcall NORET_TYPE void do_exit(long co
 	preempt_disable();
 	BUG_ON(tsk->flags & PF_DEAD);
 	tsk->flags |= PF_DEAD;
+	tsk->state = EXIT_DEAD;
 
 	schedule();
 	BUG();
diff -puN kernel/sched.c~set-exit_dead-state-in-do_exit-not-in-schedule kernel/sched.c
--- a/kernel/sched.c~set-exit_dead-state-in-do_exit-not-in-schedule
+++ a/kernel/sched.c
@@ -3311,9 +3311,6 @@ need_resched_nonpreemptible:
 
 	spin_lock_irq(&rq->lock);
 
-	if (unlikely(prev->flags & PF_DEAD))
-		prev->state = EXIT_DEAD;
-
 	switch_count = &prev->nivcsw;
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
 		switch_count = &prev->nvcsw;
_