From peterz@infradead.org Mon Oct 29 13:21:10 2007
Date: Mon, 29 Oct 2007 21:21:00 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: Dave Chinner <dgc@sgi.com>
Subject: Re: XFS and dirty handling in 2.6.24-rc1 (fwd)


On Mon, 2007-10-29 at 12:40 -0700, Christoph Lameter wrote:
> On Mon, 29 Oct 2007, Peter Zijlstra wrote:
> 
> > > Didn't Peter Z's per-BDI rate-based throttling get merged in .24-rc1?
> > > If so, that's the most likely candidate that introduced this hang.
> > 
> > Yes they did, any more details on the issue?
> 
> 8p, 4-node IA64 system with 5G RAM. Just doing an scp hangs like that
> after transferring a few KB; then, after 10 minutes, it suddenly finishes.
> 
> I have seen the same issue, with shorter holdoffs, on an 8p x86 system
> with 8G RAM.
> 
> > One thing that _might_ also influence the whole behaviour is Wu's inode
> > writeback patches; I've seen one hang in there, and they hold locks that
> > can stall balance_dirty_pages().
> 
> What can I do to figure out more?

I always start with something like this to figure out why we're not
breaking out of the congestion loop in balance_dirty_pages().


---
 mm/page-writeback.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c	2007-10-25 18:28:41.000000000 -0700
+++ linux-2.6/mm/page-writeback.c	2007-11-01 19:39:12.000000000 -0700
@@ -420,6 +420,19 @@ static void balance_dirty_pages(struct a
 		if (pages_written >= write_chunk)
 			break;		/* We've done our duty */
 
+		printk(KERN_CRIT "global: %llu + %llu + %llu < %llu "
+				"bdi: %llu + %llu < %llu\n",
+				(u64)global_page_state(NR_FILE_DIRTY),
+				(u64)global_page_state(NR_UNSTABLE_NFS),
+				(u64)global_page_state(NR_WRITEBACK),
+				(u64)dirty_thresh,
+
+				(u64)bdi_stat(bdi, BDI_RECLAIMABLE),
+				(u64)bdi_stat(bdi, BDI_WRITEBACK),
+				(u64)bdi_thresh
+		      );
+
+
 		congestion_wait(WRITE, HZ/10);
 	}