Schedule load balance tasklet less frequently

Before this patch we always schedule the tasklet with the interval in
sd->balance_interval. However, if the queue is busy then it is sufficient
to schedule the tasklet with sd->balance_interval * busy_factor.

So we modify the calculation of the next balance time to use last_balance
plus the interval that was actually used. This is only the right value if
the idle/busy situation continues as is. There are two potential trouble
spots:

- If the queue was idle and now gets busy then we call rebalance early.
  That is not a problem because we will then use the longer (busy)
  interval for the next period.

- If the queue was busy and becomes idle then we potentially wait too long
  before rebalancing. However, when the cpu goes idle, idle_balance() is
  called. We add another calculation of the next balance time based on
  sd->balance_interval in idle_balance() so that we will rebalance soon.

V2:
- Calculate the rebalance time based on the current jiffies and not based
  on the jiffies at the last time we load balanced. We no longer rely on
  staggering and therefore we can afford to do this now.

Signed-off-by: Christoph Lameter

Index: linux-2.6.19-rc4-mm2/kernel/sched.c
===================================================================
--- linux-2.6.19-rc4-mm2.orig/kernel/sched.c	2006-11-10 22:44:56.749870566 -0600
+++ linux-2.6.19-rc4-mm2/kernel/sched.c	2006-11-10 22:45:20.320246539 -0600
@@ -2758,14 +2758,28 @@ out_balanced:
 static void idle_balance(int this_cpu, struct rq *this_rq)
 {
 	struct sched_domain *sd;
+	int pulled_task = 0;
+	unsigned long next_balance = jiffies + 60 * HZ;
 
 	for_each_domain(this_cpu, sd) {
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			/* If we've pulled tasks over stop searching: */
-			if (load_balance_newidle(this_cpu, this_rq, sd))
+			pulled_task = load_balance_newidle(this_cpu,
+							this_rq, sd);
+			if (time_after(next_balance,
+				sd->last_balance + sd->balance_interval))
+				next_balance = sd->last_balance
+						+ sd->balance_interval;
+			if (pulled_task)
 				break;
 		}
 	}
+	if (!pulled_task)
+		/*
+		 * We are going idle. next_balance may be set based on
+		 * a busy processor. So reset next_balance.
+		 */
+		this_rq->next_balance = next_balance;
 }
 
 /*
@@ -2880,7 +2894,7 @@ static void rebalance_domains(unsigned l
 		if (unlikely(!interval))
 			interval = 1;
 
-		if (jiffies - sd->last_balance >= interval) {
+		if (time_after_eq(jiffies, sd->last_balance + interval)) {
 			if (load_balance(this_cpu, this_rq, sd, idle)) {
 				/*
 				 * We've pulled tasks over so either we're no
@@ -2889,10 +2903,18 @@ static void rebalance_domains(unsigned l
 				 */
 				idle = NOT_IDLE;
 			}
-			sd->last_balance += interval;
+			sd->last_balance = jiffies;
 		}
-		next_balance = min(next_balance,
-			sd->last_balance + sd->balance_interval);
+		/*
+		 * Calculate the next balancing point assuming that
+		 * the idle state does not change. If we are idle and then
+		 * start running a process then this will be recalculated.
+		 * If we are running a process and then become idle
+		 * then idle_balance will reset next_balance so that we
+		 * rebalance earlier.
+		 */
+		if (time_after(next_balance, sd->last_balance + interval))
+			next_balance = sd->last_balance + interval;
 	}
 	this_rq->next_balance = next_balance;
 }
@@ -3149,7 +3171,7 @@ void scheduler_tick(void)
 		task_running_tick(rq, p);
 #ifdef CONFIG_SMP
 	update_load(rq);
-	if (jiffies >= rq->next_balance)
+	if (time_after_eq(jiffies, rq->next_balance))
 		tasklet_schedule(&rebalance);
 #endif
 }
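
Note for readers not familiar with the balancing code: below is a small
stand-alone sketch of the interval selection the patch relies on. It is not
kernel code; struct dom and msecs_to_jiffies_sketch are simplified stand-ins
for struct sched_domain and msecs_to_jiffies, and HZ plus the domain values
are made-up example numbers. It only models the arithmetic that
rebalance_domains performs after this patch: stretch the per-domain interval
by busy_factor when the CPU is busy, then keep the earliest
last_balance + interval across all domains as the point at which the tasklet
must run again.

/*
 * User space sketch of the next_balance selection (assumptions: HZ, the
 * domain values and the busy flag are examples, not kernel defaults).
 */
#include <stdio.h>

#define HZ 250

struct dom {
	unsigned long balance_interval;	/* in ms, as in struct sched_domain */
	unsigned int busy_factor;
	unsigned long last_balance;	/* in jiffies */
};

static unsigned long msecs_to_jiffies_sketch(unsigned long ms)
{
	unsigned long j = ms * HZ / 1000;

	return j ? j : 1;		/* never use a zero interval */
}

int main(void)
{
	struct dom doms[] = {
		{ .balance_interval = 8,  .busy_factor = 32, .last_balance = 1000 },
		{ .balance_interval = 64, .busy_factor = 32, .last_balance = 1000 },
	};
	unsigned long jiffies = 1000;
	unsigned long next_balance = jiffies + 60 * HZ;	/* far-future default */
	int busy = 1;					/* queue has runnable tasks */
	int i;

	for (i = 0; i < 2; i++) {
		unsigned long interval = doms[i].balance_interval;

		if (busy)
			interval *= doms[i].busy_factor;
		interval = msecs_to_jiffies_sketch(interval);

		/* keep the earliest balancing point of all domains */
		if (next_balance > doms[i].last_balance + interval)
			next_balance = doms[i].last_balance + interval;
	}

	printf("tasklet next due at jiffy %lu (now %lu)\n", next_balance, jiffies);
	return 0;
}

With these example values the busy intervals come out to 64 and 512 jiffies,
so the tasklet is next due at jiffy 1064 instead of every 2 jiffies as it
would be with the unscaled 8ms interval.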
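
The hunks in rebalance_domains() and scheduler_tick() also switch the raw
jiffies comparisons over to time_after_eq(). The short user space demo below
shows why: the subtract-and-cast form keeps working when the jiffies counter
wraps around, while a plain >= does not. The macro here is a simplified
stand-in for the kernel's time_after_eq() in include/linux/jiffies.h, which
additionally type-checks its arguments.

#include <stdio.h>

/* Simplified stand-in for the kernel's wrap-safe time_after_eq(). */
#define time_after_eq(a, b)	((long)((a) - (b)) >= 0)

int main(void)
{
	unsigned long next_balance = (unsigned long)-5;	/* just before wrap */
	unsigned long jiffies = 10;			/* just after wrap */

	/* The raw comparison thinks the deadline is still far away... */
	printf("jiffies >= next_balance: %d\n", jiffies >= next_balance);
	/* ...the wrap-safe form correctly reports it has already passed. */
	printf("time_after_eq():         %d\n",
		time_after_eq(jiffies, next_balance));
	return 0;
}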