From: Matt Helsley

Task watchers is a notifier chain that sends notifications to registered callers whenever a task forks, execs, changes its [re][ug]id, or exits. The goal is to keep these paths comparatively simple while enabling existing and proposed kernel features to add per-task initialization, monitoring, and tear-down functions.

Here are some performance numbers for those interested in a rough idea of how applying each patch affects performance. I ran the benchmarks with two other patches I've posted recently that aren't in this series:

	profile_make_notifier_blocks_read_mostly
	semundo_simplify

Kernbench on NUMAQ: 16 x 700MHz PIII processors, Debian Sarge. Each summary line is the mean of the three runs listed below it.

+Patch 0 - None
Elapsed: 100.99s User: 1163.63s System: 224.24s CPU: 1373.67%
1163.68user 224.49system 1:41.35elapsed 1369%CPU (0avgtext+0avgdata 0maxresident)k
1164.11user 222.03system 1:40.41elapsed 1380%CPU (0avgtext+0avgdata 0maxresident)k
1163.10user 226.21system 1:41.20elapsed 1372%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 0 - None (second run)
Elapsed: 100.57s User: 1163.10s System: 224.01s CPU: 1378.33%
1163.48user 223.40system 1:40.06elapsed 1385%CPU (0avgtext+0avgdata 0maxresident)k
1161.64user 224.99system 1:40.63elapsed 1377%CPU (0avgtext+0avgdata 0maxresident)k
1164.18user 223.64system 1:41.01elapsed 1373%CPU (0avgtext+0avgdata 0maxresident)k

+Patch - profile_make_notifier_blocks_read_mostly (Posted outside of this series)
Elapsed: 100.74s User: 1163.92s System: 223.10s CPU: 1376.66%
1164.58user 222.78system 1:41.33elapsed 1369%CPU (0avgtext+0avgdata 0maxresident)k
1163.26user 224.21system 1:40.17elapsed 1385%CPU (0avgtext+0avgdata 0maxresident)k
1163.93user 222.32system 1:40.73elapsed 1376%CPU (0avgtext+0avgdata 0maxresident)k

+Patch - semundo_simplify (Posted outside of this series)
Elapsed: 100.92s User: 1162.45s System: 224.52s CPU: 1373.67%
1163.16user 224.83system 1:40.85elapsed 1376%CPU (0avgtext+0avgdata 0maxresident)k
1162.95user 223.99system 1:41.33elapsed 1368%CPU (0avgtext+0avgdata 0maxresident)k
1161.23user 224.74system 1:40.58elapsed 1377%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 1 - task_watchers
Elapsed: 100.99s User: 1163.45s System: 224.78s CPU: 1374%
1163.45user 224.74system 1:40.30elapsed 1384%CPU (0avgtext+0avgdata 0maxresident)k
1164.55user 223.82system 1:41.12elapsed 1372%CPU (0avgtext+0avgdata 0maxresident)k
1162.34user 225.78system 1:41.56elapsed 1366%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 2 - add a process events task watcher
Elapsed: 100.87s User: 1163.36s System: 225.11s CPU: 1375.67%
1164.12user 225.32system 1:41.13elapsed 1373%CPU (0avgtext+0avgdata 0maxresident)k
1163.05user 226.86system 1:40.87elapsed 1377%CPU (0avgtext+0avgdata 0maxresident)k
1162.92user 223.16system 1:40.61elapsed 1377%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 3 - refactor process events
Elapsed: 100.66s User: 1162.81s System: 227.08s CPU: 1380.33%
1162.62user 226.87system 1:40.69elapsed 1379%CPU (0avgtext+0avgdata 0maxresident)k
1163.26user 226.93system 1:40.56elapsed 1382%CPU (0avgtext+0avgdata 0maxresident)k
1162.54user 227.45system 1:40.72elapsed 1380%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 4 - process events module
Elapsed: 101.06s User: 1162.57s System: 225.67s CPU: 1373%
1164.08user 224.63system 1:40.57elapsed 1380%CPU (0avgtext+0avgdata 0maxresident)k
1160.70user 226.88system 1:40.79elapsed 1376%CPU (0avgtext+0avgdata 0maxresident)k
1162.94user 225.50system 1:41.82elapsed 1363%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 5 - switch to use a blocking notifier chain
Elapsed: 102.52s User: 1162.54s System: 224.73s CPU: 1353.67%
1162.57user 224.95system 1:40.50elapsed 1380%CPU (0avgtext+0avgdata 0maxresident)k
1162.77user 224.85system 1:46.22elapsed 1306%CPU (0avgtext+0avgdata 0maxresident)k
1162.28user 224.39system 1:40.84elapsed 1375%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 6 - add an audit task watcher
Elapsed: 101.07s User: 1161.91s System: 224.90s CPU: 1371.33%
1162.64user 224.15system 1:40.79elapsed 1375%CPU (0avgtext+0avgdata 0maxresident)k
1162.21user 225.85system 1:41.50elapsed 1367%CPU (0avgtext+0avgdata 0maxresident)k
1160.88user 224.69system 1:40.92elapsed 1372%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 7 - add a delayacct task watcher
Elapsed: 101.00s User: 1163.59s System: 224.08s CPU: 1373.33%
1163.09user 225.15system 1:41.04elapsed 1373%CPU (0avgtext+0avgdata 0maxresident)k
1164.51user 221.55system 1:40.89elapsed 1373%CPU (0avgtext+0avgdata 0maxresident)k
1163.17user 225.55system 1:41.06elapsed 1374%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 8 - add a profile task watcher
Elapsed: 100.61s User: 1162.95s System: 224.36s CPU: 1378.33%
1163.38user 224.60system 1:40.93elapsed 1375%CPU (0avgtext+0avgdata 0maxresident)k
1162.59user 224.10system 1:41.00elapsed 1372%CPU (0avgtext+0avgdata 0maxresident)k
1162.88user 224.38system 1:39.89elapsed 1388%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 8 - add a profile task watcher (second run)
Elapsed: 100.95s User: 1164.07s System: 224.58s CPU: 1375%
1164.13user 224.69system 1:40.90elapsed 1376%CPU (0avgtext+0avgdata 0maxresident)k
1164.74user 224.09system 1:40.96elapsed 1375%CPU (0avgtext+0avgdata 0maxresident)k
1163.34user 224.96system 1:40.99elapsed 1374%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 9 - introduce per-task watchers
Elapsed: 100.76s User: 1162.49s System: 225.37s CPU: 1377%
1162.65user 225.80system 1:40.89elapsed 1376%CPU (0avgtext+0avgdata 0maxresident)k
1162.09user 225.44system 1:40.53elapsed 1380%CPU (0avgtext+0avgdata 0maxresident)k
1162.72user 224.86system 1:40.85elapsed 1375%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 10 - add a semundo task watcher
Elapsed: 101.27s User: 1163.71s System: 224.82s CPU: 1370.67%
1163.84user 224.10system 1:41.55elapsed 1366%CPU (0avgtext+0avgdata 0maxresident)k
1163.43user 224.76system 1:41.16elapsed 1372%CPU (0avgtext+0avgdata 0maxresident)k
1163.86user 225.59system 1:41.10elapsed 1374%CPU (0avgtext+0avgdata 0maxresident)k

+Patch 11 - switch the semundo task watcher to a per-task watcher
Elapsed: 100.96s User: 1162.93s System: 224.76s CPU: 1374%
1163.14user 223.37system 1:40.61elapsed 1378%CPU (0avgtext+0avgdata 0maxresident)k
1162.92user 226.02system 1:41.34elapsed 1370%CPU (0avgtext+0avgdata 0maxresident)k
1162.73user 224.88system 1:40.94elapsed 1374%CPU (0avgtext+0avgdata 0maxresident)k

This patch:

Use a notifier chain to inform watchers that a task is forking, execing, changing an id, or exiting. This allows watchers to monitor these paths without adding their own code directly to them.

A watcher is likely to be much more maintainable when it does not depend on its position in the chain. Watchers should therefore avoid setting the priority field of the notifier blocks they register. If ordering is necessary, adding calls directly to the paths in question is probably a better idea.

WATCH_TASK_INIT is called before fork/clone completes.

WATCH_TASK_CLONE is called just before fork/clone completes. Watchers may prevent the fork/clone from succeeding by returning a value with NOTIFY_STOP_MASK set from WATCH_TASK_CLONE. However, watchers are strongly discouraged from returning a value with NOTIFY_STOP_MASK set from WATCH_TASK_INIT, because it may interfere with the operation of other watchers.

WATCH_TASK_EXEC is called just before successfully returning from the exec system call.

WATCH_TASK_UID is called every time a task's real or effective user id changes.

WATCH_TASK_GID is called every time a task's real or effective group id changes.

WATCH_TASK_EXIT is called at the beginning of do_exit() when a task is exiting for any reason.

WATCH_TASK_FREE is called before critical task structures like the mm_struct become inaccessible and the task is subsequently freed. Watchers must never return a value with NOTIFY_STOP_MASK set in response to WATCH_TASK_FREE: doing so stops the chain before later watchers can clean up and could cause a wide variety of "bad things" to happen.
For every WATCH_TASK_INIT and WATCH_TASK_CLONE, a corresponding WATCH_TASK_FREE is guaranteed. Because another watcher may cause fork/clone to fail, a watcher may see a WATCH_TASK_FREE without a preceding WATCH_TASK_INIT or WATCH_TASK_CLONE.

Signed-off-by: Matt Helsley
Cc: Jes Sorensen
Cc: Alan Stern
Cc: Chandra S. Seetharaman
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
---

 fs/exec.c                |    2 ++
 include/linux/notifier.h |   14 ++++++++++++++
 include/linux/sched.h    |    1 +
 kernel/exit.c            |    8 +++++++-
 kernel/fork.c            |   19 +++++++++++++++----
 kernel/sys.c             |   31 +++++++++++++++++++++++++++++++
 6 files changed, 70 insertions(+), 5 deletions(-)

diff -puN fs/exec.c~task-watchers-task-watchers fs/exec.c
--- a/fs/exec.c~task-watchers-task-watchers
+++ a/fs/exec.c
@@ -50,6 +50,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1097,6 +1098,7 @@ int search_binary_handler(struct linux_b
 					fput(bprm->file);
 				bprm->file = NULL;
 				current->did_exec = 1;
+				notify_watchers(WATCH_TASK_EXEC, current);
 				proc_exec_connector(current);
 				return retval;
 			}
diff -puN include/linux/notifier.h~task-watchers-task-watchers include/linux/notifier.h
--- a/include/linux/notifier.h~task-watchers-task-watchers
+++ a/include/linux/notifier.h
@@ -154,5 +154,19 @@ extern int raw_notifier_call_chain(struc
 #define CPU_DOWN_FAILED		0x0006 /* CPU (unsigned)v NOT going down */
 #define CPU_DEAD		0x0007 /* CPU (unsigned)v dead */
 
+extern int register_task_watcher(struct notifier_block *nb);
+extern int unregister_task_watcher(struct notifier_block *nb);
+#define WATCH_FLAGS_MASK ((-1) ^ 0x0FFFFUL)
+#define get_watch_event(v) ({ ((v) & ~WATCH_FLAGS_MASK); })
+#define get_watch_flags(v) ({ ((v) & WATCH_FLAGS_MASK); })
+
+#define WATCH_TASK_INIT		0x00000001 /* initialize task_struct */
+#define WATCH_TASK_CLONE	0x00000002 /* "after" clone */
+#define WATCH_TASK_EXEC		0x00000003
+#define WATCH_TASK_UID		0x00000004 /* [re]uid changed */
+#define WATCH_TASK_GID		0x00000005 /* [re]gid changed */
+#define WATCH_TASK_EXIT		0x0000FFFE
+#define WATCH_TASK_FREE		0x0000FFFF
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_NOTIFIER_H */
diff -puN include/linux/sched.h~task-watchers-task-watchers include/linux/sched.h
--- a/include/linux/sched.h~task-watchers-task-watchers
+++ a/include/linux/sched.h
@@ -211,6 +211,7 @@ extern void cpu_init (void);
 extern void trap_init(void);
 extern void update_process_times(int user);
 extern void scheduler_tick(void);
+extern int notify_watchers(unsigned long, void *);
 
 #ifdef CONFIG_DETECT_SOFTLOCKUP
 extern void softlockup_tick(void);
diff -puN kernel/exit.c~task-watchers-task-watchers kernel/exit.c
--- a/kernel/exit.c~task-watchers-task-watchers
+++ a/kernel/exit.c
@@ -40,6 +40,7 @@
 #include
 #include /* for audit_free() */
 #include
+#include
 #include
 #include
@@ -849,8 +850,11 @@ fastcall NORET_TYPE void do_exit(long co
 	struct task_struct *tsk = current;
 	struct taskstats *tidstats;
 	int group_dead;
+	int notify_result;
 
 	profile_task_exit(tsk);
+	tsk->exit_code = code;
+	notify_result = notify_watchers(WATCH_TASK_EXIT, tsk);
 
 	WARN_ON(atomic_read(&tsk->fs_excl));
 
@@ -907,9 +911,12 @@ fastcall NORET_TYPE void do_exit(long co
 #endif
 	if (unlikely(tsk->audit_context))
 		audit_free(tsk);
+	tsk->exit_code = code;
 	taskstats_exit_send(tsk, tidstats, group_dead);
 	taskstats_exit_free(tidstats);
 	delayacct_tsk_exit(tsk);
+	notify_result = notify_watchers(WATCH_TASK_FREE, tsk);
+	WARN_ON(notify_result & NOTIFY_STOP_MASK);
 
 	exit_mm(tsk);
 
@@ -930,7 +937,6 @@ fastcall NORET_TYPE void do_exit(long co
 	if (tsk->binfmt)
 		module_put(tsk->binfmt->module);
 
-	tsk->exit_code = code;
 	proc_exit_connector(tsk);
 	exit_notify(tsk);
 #ifdef CONFIG_NUMA
diff -puN kernel/fork.c~task-watchers-task-watchers kernel/fork.c
--- a/kernel/fork.c~task-watchers-task-watchers
+++ a/kernel/fork.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -950,6 +951,7 @@ static task_t *copy_process(unsigned lon
 					int pid)
 {
 	int retval;
+	int notify_result;
 	struct task_struct *p = NULL;
 
 	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
@@ -1044,6 +1046,14 @@ static task_t *copy_process(unsigned lon
 	p->io_context = NULL;
 	p->io_wait = NULL;
 	p->audit_context = NULL;
+
+	p->tgid = p->pid;
+	if (clone_flags & CLONE_THREAD)
+		p->tgid = current->tgid;
+
+	notify_result = notify_watchers(WATCH_TASK_INIT, p);
+	if (notify_result & NOTIFY_STOP_MASK)
+		goto bad_fork_cleanup;
 	cpuset_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_copy(p->mempolicy);
@@ -1061,10 +1071,6 @@ static task_t *copy_process(unsigned lon
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
-	p->tgid = p->pid;
-	if (clone_flags & CLONE_THREAD)
-		p->tgid = current->tgid;
-
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	if ((retval = audit_alloc(p)))
@@ -1228,6 +1234,9 @@ static task_t *copy_process(unsigned lon
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+	notify_result = notify_watchers(WATCH_TASK_CLONE, p);
+	if (notify_result & NOTIFY_STOP_MASK)
+		goto bad_fork_cleanup_namespaces;
 	proc_fork_connector(p);
 	return p;
 
@@ -1252,6 +1261,8 @@ bad_fork_cleanup_audit:
 	audit_free(p);
 bad_fork_cleanup_security:
 	security_task_free(p);
+	notify_result = notify_watchers(WATCH_TASK_FREE, p);
+	WARN_ON(notify_result & NOTIFY_STOP_MASK);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
diff -puN kernel/sys.c~task-watchers-task-watchers kernel/sys.c
--- a/kernel/sys.c~task-watchers-task-watchers
+++ a/kernel/sys.c
@@ -435,6 +435,29 @@ int unregister_reboot_notifier(struct no
 
 EXPORT_SYMBOL(unregister_reboot_notifier);
 
+/* task watchers notifier chain */
+static ATOMIC_NOTIFIER_HEAD(task_watchers);
+
+int register_task_watcher(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&task_watchers, nb);
+}
+
+EXPORT_SYMBOL_GPL(register_task_watcher);
+
+int unregister_task_watcher(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&task_watchers, nb);
+}
+
+EXPORT_SYMBOL_GPL(unregister_task_watcher);
+
+int notify_watchers(unsigned long val, void *v)
+{
+	return atomic_notifier_call_chain(&task_watchers, val, v);
+}
+
+
 static int set_one_prio(struct task_struct *p, int niceval, int error)
 {
 	int no_nice;
@@ -840,6 +863,7 @@ asmlinkage long sys_setregid(gid_t rgid,
 	current->egid = new_egid;
 	current->gid = new_rgid;
 	key_fsgid_changed(current);
+	notify_watchers(WATCH_TASK_GID, current);
 	proc_id_connector(current, PROC_EVENT_GID);
 	return 0;
 }
@@ -880,6 +904,7 @@ asmlinkage long sys_setgid(gid_t gid)
 		return -EPERM;
 
 	key_fsgid_changed(current);
+	notify_watchers(WATCH_TASK_GID, current);
 	proc_id_connector(current, PROC_EVENT_GID);
 	return 0;
 }
@@ -970,6 +995,7 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	current->fsuid = current->euid;
 
 	key_fsuid_changed(current);
+	notify_watchers(WATCH_TASK_UID, current);
 	proc_id_connector(current, PROC_EVENT_UID);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE);
@@ -1018,6 +1044,7 @@ asmlinkage long sys_setuid(uid_t uid)
 	current->suid = new_suid;
 
 	key_fsuid_changed(current);
+	notify_watchers(WATCH_TASK_UID, current);
 	proc_id_connector(current, PROC_EVENT_UID);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID);
@@ -1067,6 +1094,7 @@ asmlinkage long sys_setresuid(uid_t ruid
 	current->suid = suid;
 
 	key_fsuid_changed(current);
+	notify_watchers(WATCH_TASK_UID, current);
 	proc_id_connector(current, PROC_EVENT_UID);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES);
@@ -1120,6 +1148,7 @@ asmlinkage long sys_setresgid(gid_t rgid
 	current->sgid = sgid;
 
 	key_fsgid_changed(current);
+	notify_watchers(WATCH_TASK_GID, current);
 	proc_id_connector(current, PROC_EVENT_GID);
 	return 0;
 }
@@ -1163,6 +1192,7 @@ asmlinkage long sys_setfsuid(uid_t uid)
 	}
 
 	key_fsuid_changed(current);
+	notify_watchers(WATCH_TASK_UID, current);
 	proc_id_connector(current, PROC_EVENT_UID);
 
 	security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
@@ -1192,6 +1222,7 @@ asmlinkage long sys_setfsgid(gid_t gid)
 		}
 		current->fsgid = gid;
 		key_fsgid_changed(current);
+		notify_watchers(WATCH_TASK_GID, current);
 		proc_id_connector(current, PROC_EVENT_GID);
 	}
 	return old_fsgid;
_