From peterz@infradead.org Mon Oct 29 13:21:10 2007
Date: Mon, 29 Oct 2007 21:21:00 +0100
From: Peter Zijlstra
To: Christoph Lameter
Cc: Dave Chinner
Subject: Re: XFS and dirty handling in 2.6.24-rc1 (fwd)

On Mon, 2007-10-29 at 12:40 -0700, Christoph Lameter wrote:
> On Mon, 29 Oct 2007, Peter Zijlstra wrote:
>
> > > Didn't Peter Z's per-BDI rate-based throttling get merged in .24-rc1?
> > > If so, that's the most likely candidate that introduced this hang.
> >
> > Yes they did, any more details on the issue?
>
> 8p 4 node IA64 system 5G RAM. Just doing scp hangs like that after
> transferring a few k bytes. Then after 10 minutes it suddenly finishes.
>
> I have seen the same issue with less holdoffs on an 8p x86 system with 8G
> RAM.
>
> > One thing that _might_ also influence the whole behaviour is Wu's inode
> > writeback patches, I've seen one hang in there and they hold locks that
> > can stall balance_dirty_pages().
>
> What can I do to figure out more?

I always start with something like this, to figure out why we're not
breaking out of the congestion loop.

---
 mm/page-writeback.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c	2007-10-30 13:29:23.475573261 -0700
+++ linux-2.6/mm/page-writeback.c	2007-10-31 11:35:20.607309470 -0700
@@ -420,6 +420,19 @@ static void balance_dirty_pages(struct a
 		if (pages_written >= write_chunk)
 			break;		/* We've done our duty */
 
+		printk(KERN_DEBUG "global: %llu + %llu + %llu < %llu\n"
+				  "bdi: %llu + %llu < %llu\n",
+				(u64)global_page_state(NR_FILE_DIRTY),
+				(u64)global_page_state(NR_UNSTABLE_NFS),
+				(u64)global_page_state(NR_WRITEBACK),
+				(u64)dirty_thresh,
+
+				(u64)bdi_stat(bdi, BDI_RECLAIMABLE),
+				(u64)bdi_stat(bdi, BDI_WRITEBACK),
+				(u64)bdi_thresh
+				);
+
+
 		congestion_wait(WRITE, HZ/10);
 	}
 
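
To make the debug output from the patch above easier to read, here is a small
userspace sketch that parses the two printed lines and evaluates the
inequalities exactly as they are written in the format strings. It is only an
aid for reading the log, not kernel code: struct dirty_sample and the four
helper functions are hypothetical names made up for this example, and it
assumes the "global:" and "bdi:" lines reach the log unmodified apart from
whatever prefix the log adds in front of them.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one pair of debug lines; not a kernel structure. */
struct dirty_sample {
	unsigned long long dirty, unstable, writeback, dirty_thresh;
	unsigned long long bdi_reclaimable, bdi_writeback, bdi_thresh;
};

/* Parse "global: a + b + c < d", skipping any log prefix before the tag. */
bool parse_global(const char *line, struct dirty_sample *s)
{
	const char *p = strstr(line, "global:");

	if (!p)
		return false;
	return sscanf(p, "global: %llu + %llu + %llu < %llu",
		      &s->dirty, &s->unstable, &s->writeback,
		      &s->dirty_thresh) == 4;
}

/* Parse "bdi: a + b < c", skipping any log prefix before the tag. */
bool parse_bdi(const char *line, struct dirty_sample *s)
{
	const char *p = strstr(line, "bdi:");

	if (!p)
		return false;
	return sscanf(p, "bdi: %llu + %llu < %llu",
		      &s->bdi_reclaimable, &s->bdi_writeback,
		      &s->bdi_thresh) == 3;
}

/* True if the global sum is below the global threshold, as printed. */
bool global_under(const struct dirty_sample *s)
{
	return s->dirty + s->unstable + s->writeback < s->dirty_thresh;
}

/* True if the per-BDI sum is below the per-BDI threshold, as printed. */
bool bdi_under(const struct dirty_sample *s)
{
	return s->bdi_reclaimable + s->bdi_writeback < s->bdi_thresh;
}
```

Feed each pair of lines from the log to parse_global() and parse_bdi(). If
one of the two checks keeps coming back false across iterations while the task
sits in congestion_wait(), that side is the one to look at first.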