Subject: spufs: hack to avoid losing wakeups From: Christoph Hellwig On Wed, Feb 14, 2007 at 05:14:33PM +1100, Jeremy Kerr wrote: > I have a simple testcase that starts a number of contexts, then sends > mailbox messages to them in a predefined order. Of course, with the old > sceduler, it all works until you send a message to the 17th context > (ie, n_spes + 1), as while the others are still running, spu_acquire() > for the 17th context just blocks. > > We make a little more progress with the new patches, but it still > doesn't complete the whole set of message-sending. When we receive a > tick, we spu_deactivate() the context. Now (as far as I can tell), > there's no way for that context to ever be rescheduled, as it's been > removed from the run queue. > > I've worked-around this by doing a spu_activate() after the > spu_activate(). This gets us a little further, but I seem to get stuck > further along in the test - I'll try and figure out what's happening > tomorrow. I've plaied around a little with this and by disabling the nowake optimization it goes quite a lot further. The problem then is that all threads hand in spufs_wait in spufs_run_spu. We're missing a wakeup there somewhere, and I doubt this problem is in the scheduler. With a nasty little patch I've attached below (and the nowake removal patch in bz #32127) I can completet your test case. I doubt the lost wakeup is in the scheduler, so we should probably open up a bugzilla on this problem. Signed-off-by: Arnd Bergmann Index: linux-2.6/arch/powerpc/platforms/cell/spufs/run.c =================================================================== --- linux-2.6.orig/arch/powerpc/platforms/cell/spufs/run.c +++ linux-2.6/arch/powerpc/platforms/cell/spufs/run.c @@ -306,6 +306,27 @@ static inline int spu_process_events(str return ret; } +#define spufs_wait_timeout(wq, condition, tmo) \ +({ \ + int __ret = 0; \ + DEFINE_WAIT(__wait); \ + for (;;) { \ + prepare_to_wait(&(wq), &__wait, TASK_INTERRUPTIBLE); \ + if (condition) \ + break; \ + if (!signal_pending(current)) { \ + spu_release(ctx); \ + schedule_timeout(tmo); \ + spu_acquire(ctx); \ + continue; \ + } \ + __ret = -ERESTARTSYS; \ + break; \ + } \ + finish_wait(&(wq), &__wait); \ + __ret; \ +}) + long spufs_run_spu(struct file *file, struct spu_context *ctx, u32 *npc, u32 *event) { @@ -316,6 +337,11 @@ long spufs_run_spu(struct file *file, st if (down_interruptible(&ctx->run_sema)) return -ERESTARTSYS; + /* update context priority */ + ctx->rt_priority = current->rt_priority; + ctx->policy = current->policy; + ctx->prio = current->static_prio; + ctx->ops->master_start(ctx); ctx->event_return = 0; ret = spu_run_init(ctx, npc); @@ -323,7 +349,7 @@ long spufs_run_spu(struct file *file, st goto out; do { - ret = spufs_wait(ctx->stop_wq, spu_stopped(ctx, &status)); + ret = spufs_wait_timeout(ctx->stop_wq, spu_stopped(ctx, &status), HZ); spu = ctx->spu; if (unlikely(ret)) break;