From: Eric Dumazet On some workloads, (for example when lot of close() syscalls are done), RCU qlen can be quite large, and RCU heads are no longer in cpu cache when rcu_do_batch() is called. This patch adds a prefetch() in rcu_do_batch() to give CPU a hint to bring back cache lines containing 'struct rcu_head's. Most list manipulations macros include prefetch(), but not open coded ones (at least with current C compilers :) ) I got a nice speedup on a trivial benchmark (3.48 us per iteration instead of 3.95 us on a 1.6 GHz Pentium-M) while (1) { pipe(p); close(fd[0]); close(fd[1]);} Signed-off-by: Eric Dumazet Cc: "Paul E. McKenney" Signed-off-by: Andrew Morton --- kernel/rcupdate.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletion(-) diff -puN kernel/rcupdate.c~rcu-add-a-prefetch-in-rcu_do_batch kernel/rcupdate.c --- a/kernel/rcupdate.c~rcu-add-a-prefetch-in-rcu_do_batch +++ a/kernel/rcupdate.c @@ -235,12 +235,14 @@ static void rcu_do_batch(struct rcu_data list = rdp->donelist; while (list) { - next = rdp->donelist = list->next; + next = list->next; + prefetch(next); list->func(list); list = next; if (++count >= rdp->blimit) break; } + rdp->donelist = list; local_irq_disable(); rdp->qlen -= count; _