From: NeilBrown

When NFSD receives a write request, the data is typically in a number of
1448 byte segments and writev is used to collect them together.

Unfortunately, generic_file_buffered_write passes these to the filesystem
one at a time, so e.g. a 32K over-write becomes a series of partial-page
writes to each page, causing the filesystem to have to pre-read those pages,
which is wasted effort.

generic_file_buffered_write handles one segment of the vector at a time as
it has to pre-fault in each segment to avoid deadlocks.  When writing from
kernel-space (as nfsd does) this is not an issue, so
generic_file_buffered_write does not need to break an iovec from nfsd into
little pieces.

This patch avoids the splitting when get_fs is KERNEL_DS, as it is for
writes from NFSd.

This issue was introduced by commit 6527c2bdf1f833cc18e8f42bd97973d583e4aa83

Acked-by: Nick Piggin
Cc: Norman Weathers
Cc: Vladimir V. Saveliev
Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
---

 mm/filemap.c |   32 +++++++++++++++++++-------------
 1 files changed, 19 insertions(+), 13 deletions(-)

diff -puN mm/filemap.c~knfsd-stop-nfsd-writes-from-being-broken-into-lots-of-little-writes-to-filesystem mm/filemap.c
--- a/mm/filemap.c~knfsd-stop-nfsd-writes-from-being-broken-into-lots-of-little-writes-to-filesystem
+++ a/mm/filemap.c
@@ -2033,21 +2033,27 @@ generic_file_buffered_write(struct kiocb
 		/* Limit the size of the copy to the caller's write size */
 		bytes = min(bytes, count);
 
-		/*
-		 * Limit the size of the copy to that of the current segment,
-		 * because fault_in_pages_readable() doesn't know how to walk
-		 * segments.
+		/* We only need to worry about prefaulting when writes are from
+		 * user-space.  NFSd uses vfs_writev with several non-aligned
+		 * segments in the vector, and limiting to one segment a time is
+		 * a noticeable performance for re-write
 		 */
-		bytes = min(bytes, cur_iov->iov_len - iov_base);
-
-		/*
-		 * Bring in the user page that we will copy from _first_.
-		 * Otherwise there's a nasty deadlock on copying from the
-		 * same page as we're writing to, without it being marked
-		 * up-to-date.
-		 */
-		fault_in_pages_readable(buf, bytes);
+		if (!segment_eq(get_fs(), KERNEL_DS)) {
+			/*
+			 * Limit the size of the copy to that of the current
+			 * segment, because fault_in_pages_readable() doesn't
+			 * know how to walk segments.
+			 */
+			bytes = min(bytes, cur_iov->iov_len - iov_base);
 
+			/*
+			 * Bring in the user page that we will copy from
+			 * _first_.  Otherwise there's a nasty deadlock on
+			 * copying from the same page as we're writing to,
+			 * without it being marked up-to-date.
+			 */
+			fault_in_pages_readable(buf, bytes);
+		}
 		page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
 		if (!page) {
 			status = -ENOMEM;
_
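
To illustrate the cost described in the changelog, here is a rough user-space
sketch (not kernel code) that counts how many copies a write is broken into,
and how many of those cover less than a whole page.  The names
simulate_write, struct write_stats and PAGE_SIZE_BYTES are made up for this
illustration, and it assumes 4096 byte pages, a write starting at file offset
0, and segments that are all exactly seg_len bytes except possibly the last.
A real NFS request will not be laid out this neatly.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u	/* assumed page size for the illustration */

struct write_stats {
	size_t chunks;		/* copies handed to the filesystem */
	size_t partial_chunks;	/* copies covering less than a whole page */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Walk a write of 'total' bytes made of 'seg_len' byte segments, sizing
 * each copy roughly the way the loop in the patch does.  With
 * split_on_segments set, a copy never crosses a segment boundary (the
 * user-space path); with it clear, only page boundaries limit the copy
 * (the KERNEL_DS path after the patch).
 */
struct write_stats simulate_write(size_t total, size_t seg_len,
				  int split_on_segments)
{
	struct write_stats st = { 0, 0 };
	size_t pos = 0;			/* file position, page-aligned start */
	size_t seg_left = seg_len;	/* bytes left in the current segment */

	if (seg_len == 0)
		return st;

	while (pos < total) {
		size_t offset = pos % PAGE_SIZE_BYTES;
		size_t bytes = PAGE_SIZE_BYTES - offset;

		/* Limit the size of the copy to the caller's write size */
		bytes = min_size(bytes, total - pos);

		if (split_on_segments) {
			/* Limit the copy to the current segment */
			bytes = min_size(bytes, seg_left);
			seg_left -= bytes;
			if (seg_left == 0)
				seg_left = seg_len;
		}

		st.chunks++;
		if (bytes < PAGE_SIZE_BYTES)
			st.partial_chunks++;
		pos += bytes;
	}
	return st;
}
```

Under these assumptions, simulate_write(32768, 1448, 1) reports 30 copies,
all of them partial-page, while simulate_write(32768, 1448, 0) reports 8
whole-page copies and no partial ones.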