Currently cpusets are not able to do proper writeback since dirty ratio
calculations and writeback are all done for the system as a whole. This
may result in a large percentage of the nodes in a cpuset becoming dirty
without background writeout being triggered and without synchronous
writes occurring. Instead writeout occurs during reclaim when memory is
tight, which may lead to dicey VM situations.

In order to fix the problem we first of all introduce a method to
establish a map of dirty nodes for each struct address_space. Secondly,
we modify the dirty limit calculation to be based on the current state
of memory on the nodes of the cpuset that the current task belongs to.
If the current task is part of a cpuset that is not allowed to allocate
from all nodes in the system, then we select for writeback only those
inodes that have pages on the nodes that we are allowed to allocate
from. (Rough sketches of these pieces follow below the changelog.)

Tested on:   IA64 NUMA 128p, 12p
Compiles on: i386 SMP, x86_64 UP

Changelog: RFC->V1
------------------
- Rework the dirty_map logic to allocate the map dynamically on larger
  NUMA systems. Move it to struct address_space and address various
  minor issues.
- Dynamically allocate dirty maps only if an inode is dirtied.
- Clear the dirty map only when an inode is cleared (this simplifies
  locking, and we need to keep the dirty state even after the dirty
  state of all pages has been cleared for NFS writeout to occur
  correctly).
- Drop the nr_node_ids patches.
- Drop the NR_UNRECLAIMABLE patch. There may be other ideas around on
  how to accomplish the same in a more elegant way.
- Drop mentioning the NFS issues since Peter is working on those.
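
The dirty node map can be pictured as a nodemask attached to each
struct address_space, with a bit set for every node that holds dirty
pages of that mapping. The helper below is a minimal sketch of how such
a bit would be set when a page is dirtied; the helper name and the
placement of the map are assumptions for illustration, not the actual
patch code.

#include <linux/mm.h>		/* page_to_nid() */
#include <linux/nodemask.h>	/* nodemask_t, node_set() */

/*
 * Record that 'page', which has just been dirtied, lives on a
 * particular node.  'dirty_nodes' stands for the per-address_space
 * map described above (allocated lazily on large NUMA systems).
 */
static inline void mapping_mark_dirty_node(nodemask_t *dirty_nodes,
					   struct page *page)
{
	node_set(page_to_nid(page), *dirty_nodes);
}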
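
For the dirty limit side, the idea is to derive the thresholds from the
memory on the cpuset's nodes instead of from global counters. Below is
a rough sketch of a background-writeout check, assuming CONFIG_CPUSETS
(so current->mems_allowed is populated) and the per-node vmstat
interface of that kernel generation; the real code in
mm/page-writeback.c also accounts for writeback and unstable NFS pages
and the various dirty ratio knobs.

#include <linux/mmzone.h>	/* NODE_DATA() */
#include <linux/nodemask.h>	/* for_each_node_mask() */
#include <linux/sched.h>	/* current->mems_allowed */
#include <linux/vmstat.h>	/* node_page_state() */

/*
 * Should background writeout start?  Compare the dirty pages on the
 * nodes this task may allocate from against a percentage of the
 * memory present on those same nodes.  Illustrative only.
 */
static int cpuset_over_bg_dirty_thresh(int dirty_background_ratio)
{
	unsigned long dirty = 0, present = 0;
	int node;

	for_each_node_mask(node, current->mems_allowed) {
		dirty += node_page_state(node, NR_FILE_DIRTY);
		present += NODE_DATA(node)->node_present_pages;
	}

	return dirty * 100 > present * dirty_background_ratio;
}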
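
Inode selection for writeback then reduces to a nodemask intersection
test: only inodes whose mapping has dirty pages on nodes in the task's
allowed set are worth writing out from that context. A minimal sketch,
again assuming the hypothetical per-mapping dirty map from the first
sketch:

#include <linux/nodemask.h>	/* nodemask_t, nodes_intersects() */

/*
 * Is this inode relevant for writeback started from a cpuset-
 * constrained context?  'dirty_nodes' is the per-address_space map,
 * 'allowed' is the set of nodes the task may allocate from.
 */
static inline int inode_dirty_on_allowed_nodes(const nodemask_t *dirty_nodes,
					       const nodemask_t *allowed)
{
	return nodes_intersects(*dirty_nodes, *allowed);
}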