Trigger softirq less frequently

Before this patch we always trigger the softirq at the interval given in
sd->interval. However, if the queue is busy then it is sufficient to
schedule the tasklet at sd->interval*busy_factor.

So we modify the calculation of the next time to balance by adding the
interval to last_balance again. This is only the right value if the
idle/busy situation continues as is.

There are two potential trouble spots:

- If the queue was idle and now gets busy then we call rebalance early.
  However, that is not a problem because we will then use the longer
  interval for the next period.

- If the queue was busy and becomes idle then we potentially wait too
  long before rebalancing. However, when the task goes idle then
  idle_balance is called. We add another calculation of the next balance
  time based on sd->interval in idle_balance so that we will rebalance
  soon.

V2->V3:
- Calculate rebalance time based on current jiffies and not based on the
  jiffies at the last time we load balanced. We no longer rely on
  staggering and therefore we can afford to do this now.

Signed-off-by: Christoph Lameter

Index: linux-2.6.19-rc5-mm1/kernel/sched.c
===================================================================
--- linux-2.6.19-rc5-mm1.orig/kernel/sched.c	2006-11-13 18:11:07.034196836 -0600
+++ linux-2.6.19-rc5-mm1/kernel/sched.c	2006-11-13 19:04:40.316317645 -0600
@@ -2777,14 +2777,28 @@ out_balanced:
 static void idle_balance(int this_cpu, struct rq *this_rq)
 {
 	struct sched_domain *sd;
+	int pulled_task = 0;
+	unsigned long next_balance = jiffies + 60 * HZ;
 
 	for_each_domain(this_cpu, sd) {
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			/* If we've pulled tasks over stop searching: */
-			if (load_balance_newidle(this_cpu, this_rq, sd))
+			pulled_task = load_balance_newidle(this_cpu,
+							this_rq, sd);
+			if (time_after(next_balance,
+				sd->last_balance + sd->balance_interval))
+				next_balance = sd->last_balance
+						+ sd->balance_interval;
+			if (pulled_task)
 				break;
 		}
 	}
+	if (!pulled_task)
+		/*
+		 * We are going idle. next_balance may be set based on
+		 * a busy processor. So reset next_balance.
+		 */
+		this_rq->next_balance = next_balance;
 }
 
 /*
@@ -2907,7 +2921,7 @@ static void run_rebalance_domains(struct
 			 */
 			idle = NOT_IDLE;
 		}
-		sd->last_balance += interval;
+		sd->last_balance = jiffies;
 	}
 	if (time_after(next_balance, sd->last_balance + interval))
 		next_balance = sd->last_balance + interval;
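
For illustration only, the next-balance calculation described in the
changelog can be modeled in user space roughly as below. This is a sketch
under stated assumptions, not kernel code: struct domain, domain_due() and
earliest_due() are hypothetical names, the time_after() macro is a
simplified copy of the kernel macro without its type checking, and the
horizon parameter stands in for the "jiffies + 60 * HZ" default used in
idle_balance.

```c
#include <assert.h>

/* Wraparound-safe "a is later than b", modeled on the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical stand-in for the sched_domain fields used in this sketch. */
struct domain {
	unsigned long last_balance;	/* tick at which we last balanced */
	unsigned long interval;		/* base balance interval in ticks */
	unsigned int busy_factor;	/* interval multiplier when busy */
};

/* Tick at which one domain is next due; busy CPUs use the longer interval. */
unsigned long domain_due(const struct domain *sd, int idle)
{
	unsigned long interval = sd->interval;

	if (!idle)
		interval *= sd->busy_factor;
	return sd->last_balance + interval;
}

/*
 * Earliest due tick over all domains. The search starts from a far-future
 * default (now + horizon) and is pulled in by every domain due sooner.
 */
unsigned long earliest_due(const struct domain *sds, int n, int idle,
			   unsigned long now, unsigned long horizon)
{
	unsigned long next = now + horizon;
	int i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		unsigned long due = domain_due(&sds[i], idle);

		if (time_after(next, due))
			next = due;
	}
	return next;
}
```

With two domains last balanced at the same tick, calling earliest_due()
with idle set returns the short, unscaled deadline, while the busy case
returns the deadline stretched by busy_factor. That gap is why the patch
recomputes next_balance when a processor goes idle.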