RCU git Tree

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git

The following branches are of interest:

  1. origin/rcu/next: commits intended for the next release, or perhaps the one after that.
  2. origin/rcu/testing: commits that might go into an upcoming release, but which might be rebased or even dropped, depending on the outcome of further testing. This branch is normally a superset of origin/rcu/next.
  3. origin/rcu/urgent: fixes for regressions in mainline.

This tree may be accessed as follows:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
cd linux-2.6-rcu
git checkout origin/rcu/testing

Once the clone exists, you can bring your local copy up to date as follows:

git remote update
git checkout origin/rcu/testing

RCU To-Do List

Things on Paul's Reasonably Immediate List

  1. ftrace plugins for RCU.
  2. Merge SRCU into TREE_RCU. As currently implemented, SRCU cannot be made hierarchical, and thus is a scalability bottleneck.
  3. Make CONFIG_RCU_FAST_NO_HZ work for CONFIG_TREE_PREEMPT_RCU.
  4. Once CONFIG_TINY_PREEMPT_RCU is deemed sufficiently reliable, eliminate the UP-only code paths from the hierarchical implementations of RCU. Currently slating this for mid-2011.

Software-Engineering Enhancements

  1. Make a call_rcu_blocking(), call_rcu_blocking_bh(), and call_rcu_blocking_sched() that invoke synchronize_rcu() (or the corresponding API) if callbacks are backing up. However, the base call_rcu() family of functions already have checks to accelerate grace periods, and these primitives have extremely simple implementations, so they can wait until there is a strong need. (Mathieu Desnoyers has added a defer_rcu() call to userspace RCU that has similar properties.)
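
The throttling idea in the item above can be sketched with a userspace mock. Everything here is an assumption made for illustration: the mock_ names, the backlog limit of 3, and the stand-in for synchronize_rcu(), which simply pretends a grace period has elapsed and runs the queued callbacks. It is not kernel code and not a proposed implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace mock; none of these names are kernel APIs. */
struct mock_cb {
	struct mock_cb *next;
	void (*func)(struct mock_cb *cb);
};

#define MOCK_BACKLOG_LIMIT 3	/* assumed threshold for "backing up" */

static struct mock_cb *mock_pending;	/* queued callbacks, newest first */
static int mock_pending_count;
static int mock_sync_calls;		/* times the caller had to wait */
static int mock_invoked;		/* callbacks run so far */

/* Example callback: just counts invocations. */
static void mock_note_invoked(struct mock_cb *cb)
{
	(void)cb;
	mock_invoked++;
}

/* Stand-in for call_rcu(): queue the callback and return at once. */
static void mock_call_rcu(struct mock_cb *cb, void (*func)(struct mock_cb *cb))
{
	assert(cb != NULL && func != NULL);
	cb->func = func;
	cb->next = mock_pending;
	mock_pending = cb;
	mock_pending_count++;
}

/* Stand-in for synchronize_rcu(): pretend a grace period elapsed and
 * run everything that was queued before it. */
static void mock_synchronize_rcu(void)
{
	struct mock_cb *cb = mock_pending;

	mock_pending = NULL;
	mock_pending_count = 0;
	mock_sync_calls++;
	while (cb) {
		struct mock_cb *next = cb->next;

		cb->func(cb);
		cb = next;
	}
}

/* Sketch of the proposed primitive: queue, then throttle the caller
 * by waiting whenever the backlog reaches the limit. */
static void mock_call_rcu_blocking(struct mock_cb *cb,
				   void (*func)(struct mock_cb *cb))
{
	mock_call_rcu(cb, func);
	if (mock_pending_count >= MOCK_BACKLOG_LIMIT)
		mock_synchronize_rcu();
}
```

With a limit of 3, the first two calls return immediately and the third one waits, which is the point of the primitive: a caller that floods the system with callbacks pays for the backlog itself.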

Optimizations

  1. Make rcutree track the largest CPU number in use, and restrict scans to that number. This change would be helpful in the common case where NR_CPUS is much larger than the actual number of CPUs.
  2. Consider adding checks for RT threads to __call_rcu() so that we can have deterministic RCU updates. The first step of this is slated for 2.6.40 (commit 2655d57ef35aa327a2e58a1c5dc7b65c65003f4e).
  3. It may be possible to accelerate TREE_PREEMPT_RCU grace periods based on the fact that a CPU can unambiguously determine whether or not it is in an RCU read-side critical section at a specific point in time. A similar trick might also work (albeit probabilistically) for CONFIG_PREEMPT builds of TREE_RCU. In both cases, the first guess would be that note_new_gpnum() would be a good place for such an optimization.
  4. Attempt to make RCU independent of the scheduling-clock interrupt, in order to reduce OS jitter. This was a topic of informal discussion at the 2010 Linux Plumbers Conference. A first step in this direction, which should support Frederic Weisbecker's suppression of the scheduling clock interrupt for CPUs having but one runnable task, is in my -rcu tree.
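
To illustrate the first optimization in this list, here is a minimal userspace sketch of bounding a scan by the largest CPU number seen rather than by the build-time maximum. The demo_ names, the table layout, and the value 4096 are all hypothetical; rcutree's real data structures look nothing like this.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NR_CPUS 4096	/* hypothetical build-time maximum */

struct demo_cpu_table {
	unsigned char present[DEMO_NR_CPUS];
	int max_cpu;		/* largest CPU number seen, or -1 if none */
};

static void demo_table_init(struct demo_cpu_table *t)
{
	memset(t->present, 0, sizeof(t->present));
	t->max_cpu = -1;
}

/* Record a CPU and keep the running maximum up to date. */
static void demo_cpu_add(struct demo_cpu_table *t, int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	t->present[cpu] = 1;
	if (cpu > t->max_cpu)
		t->max_cpu = cpu;
}

/* Count present CPUs, visiting only slots 0..max_cpu.
 * *visited reports how many slots were examined. */
static int demo_scan(const struct demo_cpu_table *t, int *visited)
{
	int n = 0;

	*visited = 0;
	for (int cpu = 0; cpu <= t->max_cpu; cpu++) {
		(*visited)++;
		if (t->present[cpu])
			n++;
	}
	return n;
}
```

With CPUs 0, 1, and 5 recorded, the scan examines 6 slots instead of 4096.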

Simple Cleanups

  1. Make rcupreempt_trace.c use seqfile. Li Zefan (lizf@cn.fujitsu.com) is taking this on, but using Frederic Weisbecker's statistical ftrace. Li Zefan's earlier non-ftrace patch may be found here.
  2. Simplify the handling of memory barriers in TREE_RCU's dyntick interface by creating a PUBLISH_AND_STORE() and a LOAD_AND_ACQUIRE() primitive. This should have the same readability benefits as did the change from naked memory barriers to rcu_assign_pointer() and rcu_dereference().
  3. Prevent grace-period counters from overflowing while a given CPU is in dyntick-idle state. Even on a 32-bit machine, such overflow takes more than a month, so this is still relatively low priority.
  4. Move #ifdef'ed functions from kernel/rcutree.c to kernel/rcutree_plugin.h.
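
The pairing proposed in the second cleanup above can be approximated in userspace with C11 atomics. This is only a sketch: publish_and_store() and load_and_acquire() below are hypothetical lower-case stand-ins for the proposed primitives, and the demo structure is invented for the example. The kernel versions would be built on the kernel's own memory-barrier primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed primitives (C11 atomics). */

/* Store v so that every earlier write by this thread is visible to
 * any thread that later sees v through load_and_acquire(). */
static inline void publish_and_store(atomic_int *p, int v)
{
	atomic_store_explicit(p, v, memory_order_release);
}

/* Load *p so that later reads by this thread cannot be reordered
 * before the load. */
static inline int load_and_acquire(atomic_int *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}

struct demo_msg {
	int payload;		/* plain data, written before the flag */
	atomic_int ready;	/* flag guarding payload */
};

static void demo_writer(struct demo_msg *m, int value)
{
	m->payload = value;			/* ordinary store ... */
	publish_and_store(&m->ready, 1);	/* ... ordered before the flag */
}

/* Returns 1 and fills *out if the payload has been published, else 0. */
static int demo_reader(struct demo_msg *m, int *out)
{
	assert(out != NULL);
	if (!load_and_acquire(&m->ready))
		return 0;
	*out = m->payload;	/* safe: ordered after the flag load */
	return 1;
}
```

The readability gain is the same one the item describes: the ordering requirement is attached to the access that needs it, instead of living in a separate barrier statement that the reader must match up by hand.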

Inspection

  1. Look for places where people (wrongly, for preemptable RCU) assume that rcu_read_lock() disables preemption.
  2. Look for places where people assume that disabling preemption and/or irqs makes rcu_read_lock() unnecessary.
  3. Make sure Documentation/RCU/*.txt files better match reality. (This is an ongoing work item, as it is amazing how quickly the documentation decays.)

Possibly Dubious Changes

  1. Allow RCU read-side critical sections in idle loops. This has come up enough times over the past several years that it is time to go do it. However, the interactions with dyntick-idle mode are challenging.
  2. Remove the function pointer from the struct rcu_head and group these structures by function in order to:
    1. shrink the size of the struct rcu_head, and
    2. improve the efficiency of callback processing by grouping the calls through pointers.
    This has some non-trivial consequences; please see the discussion between Manfred and Paul for some of the issues involved.

    Another approach is to maintain a separate queue that tracks the functions to be invoked and the memory blocks to invoke them on, as Mathieu Desnoyers suggested. This means that the new defer_rcu() primitive can now block, either when waiting for a grace period to elapse or when allocating new memory. Alternatively, this primitive could return an error if the callback could not be enqueued immediately. Of course, the queue of callbacks must be present, so that memory savings will depend on the workload.

  3. Make rcutree keep track of quiescent states that a CPU passes through after a given grace period starts, but before that CPU is aware that a new grace period has started. The goal would be to reduce grace-period latency, but it is not clear how helpful this change would be. It is clear that this change could be quite difficult to get right.
  4. Change the size of the grace-period-number counter from long to int. I do not currently believe that this change is worthwhile, as the memory savings are minuscule, it could result in worse code on 64-bit systems, and it requires a change to the exported rcu_batches_completed() and rcu_batches_completed_bh() APIs.
  5. Replace Hierarchical RCU's manual bitmask scans with something like ffs() or ffsl(). Although this seems unlikely to make the code measurably faster, it might simplify the code a little bit.
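
For the last item in this list, the difference between a manual scan and an ffsl()-style scan looks roughly like the userspace sketch below. The function names are hypothetical, and __builtin_ffsl() is the GCC/Clang builtin standing in for the kernel's own find-first-set helpers, so this assumes one of those compilers.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_ULONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Manual scan: test every bit position in turn.
 * Stores the set positions in out[] and returns how many were found. */
static int demo_scan_manual(unsigned long mask, int *out, int max)
{
	unsigned long bit = 1UL;
	int n = 0;

	for (int i = 0; i < DEMO_BITS_PER_ULONG; i++, bit <<= 1) {
		if (mask & bit) {
			assert(n < max);
			out[n++] = i;
		}
	}
	return n;
}

/* ffsl-style scan: jump straight to each set bit.
 * The loop runs once per set bit rather than once per bit position. */
static int demo_scan_ffsl(unsigned long mask, int *out, int max)
{
	int n = 0;

	while (mask) {
		/* __builtin_ffsl() returns a 1-based index of the lowest set bit. */
		int i = __builtin_ffsl((long)mask) - 1;

		assert(n < max);
		out[n++] = i;
		mask &= mask - 1;	/* clear the lowest set bit */
	}
	return n;
}
```

Both functions report the same positions in the same order, so the change would be about code shape more than speed, as the item says.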

RCU Done List

  1. Make a “fire and forget” primitive that kfree()'s the specified memory block after a grace period elapses. Lai Jiangshan and Manfred Spraul are working on patches for this; see Lai's and Manfred's patches. Thanks to Lai Jiangshan, this is in mainline as of 2.6.40 (commit 9ab1544eb4196ca8d05c433b2eb56f74496b1ee3).
  2. Getting Preemptable RCU priority boosting to mainline. This is in mainline as of 2.6.40 (commit 27f4d28057adf98750cf863c40baefb12f5b6d21).
  3. Get rid of CONFIG_RCU_CPU_STALL_DETECTOR, thus making stall detection unconditional. The overhead of this option is negligible, it has proven quite useful, and making it unconditional would reduce the number of combinations of configuration parameters that must be tested. However, recent experience has shown that it is valuable to be able to suppress RCU CPU stall detection at times, so this item is waiting on experience with the boot-time and sysfs controls over RCU CPU stall detection. Currently slating this for early 2011. This is in mainline as of 2.6.40 (commit a00e0d714fbded07a7a2254391ce9ed5a5cb9d82).
  4. Create a small-memory-footprint RCU for single-CPU embedded devices. A starting point may be found here, with the LWN writeup here. This is in mainline as of 2.6.33 (commit 9b1d82fa1611706fa7ee1505f290160a18caf95d).
  5. It appears that some versions of gcc do not like forward references to static inline functions. But kernel/rcutree_plugin.h really wants to define forward-referenced static inline functions. What is the best way to handle this?
    1. Tell people to tune their gcc warnings down a bit.
    2. Remove the "inline" modifiers.
    3. Move the #include of kernel/rcutree_plugin.h back up near the front of kernel/rcutree.c, bringing back the ugly list of functions referenced by the plugins.
    Removing the "inline" modifiers seems best at the moment, and hit mainline as of 2.6.33 (commit dbe01350fa8ce0c11948ab7d6be71a4d901be151).
  6. Reduce the need for force_quiescent_state() to send IPIs by making rcu_check_callbacks() invoke the much cheaper set_need_resched(). This appeared in mainline in 2.6.35 (commit d25eb9442bb2c38c1e742f0fa764d7132d72593f).
  7. Make a module parameter that controls the frequency with which force_quiescent_state() is invoked. Increasing this frequency can be useful when torture-testing RCU. This appeared in mainline in v2.6.34 (commit bf66f18e79e34c421bbd8f6511e2c556b779df2f).
  8. Forbid grace periods from starting while force_quiescent_state() is running. This would have little or no effect on performance, while greatly simplifying race conditions. This is in mainline as of v2.6.34 (commit 07079d5357a4d53c2b13126c4a38fb40e6e04966).
  9. Fix TREE_PREEMPT_RCU's implementation of synchronize_rcu_expedited(), with a hack for 2.6.32 (commit 019129d595caaa5bd0b41d128308da1be6a91869) and real fix in 2.6.33 (commit d9a3da0699b24a589b27a61e1a5b5bd30d9db669).
  10. Fix additional bugs in TREE_PREEMPT_RCU as they arise. And there have been one or two of them. Chasing them down was more entertaining than usual.
  11. Fixed an RCU performance bug located by Nick Piggin in which RCU takes a big performance hit trying to push RCU callbacks through the system. In 2.6.32 commit 37c72e56f6b234ea7387ba530434a80abf2658d8.
  12. Fold rcu_pending() into rcu_check_callbacks(). In 2.6.32-rc1 commit a157229cabd6dd8cfa82525fc9bf730c94cc9ac2.
  13. Rename rcu_qsctr_inc() to rcu_report_qs() and rcu_bh_qsctr_inc() to rcu_bh_report_qs(). Actually remapped differently as a result of making rcu_sched, rcu_bh, and rcu_preempt (when present) the basic RCU implementations. This is in 2.6.32-rc1 commit d6714c22b43fbcbead7e7b706ff270e15f04a791.
  14. Expedited grace periods. This is in 2.6.31 commit 03b042bf1dc14a268a3d65d38b4ec2a4261e8477. This implementation does not expedite preemptable RCU, but that is on the way.
  15. Additional work on user-level RCU implementations. Numerous variants in flight, no shortage! Best place to start is git://lttng.org/userspace-rcu.git, or, if you prefer tarballs, here.
  16. Applying lessons learned from the Hierarchical RCU experience to Preemptable RCU. This also includes moving the preemptable RCU state machine out of the scheduling-clock interrupt handler, speeding up the read-side primitives, and speeding up the grace periods. Also cutting about a thousand lines of code. See commit f41d911f8c49a5d65c86504c19e8204bb605c4fd.
  17. Make rcutree's synchronize_rcu() be barrier() for !SMP builds. Commit a682604838763981613e42015cd0e39f2989d6bb handles that and all the other RCU implementations while it is at it.
  18. Lai Jiangshan fixed a long-standing CPU-hotplug race in rcu_barrier(). Commit f69b17d7e745d8edd7c0d90390cbaa77e63c5ea3.
  19. Stomping out Hierarchical RCU bugs!!!
    1. Grace-period latency on small systems. Commit c12172c0251761c54260376eb29a5f6547495580.
    2. Suspend/resume issues. Commit 90a4d2c0106bb690f0b6af3d506febc35c658aa7.
    3. A couple of rcutorture bugs. Commits c59ab97e9ecdee9084d2da09e5a8ceea9a396508 and c9d557c19f94df42db78d4a5de4d25feee694bad.
    4. Commit a682604838763981613e42015cd0e39f2989d6bb teaching RCU that the idle task is not a quiescent state at boot time. (Actually a bug for all flavors of RCU, not just hierarchical RCU.)
    5. Commit ef631b0ca01655d24e9ca7e199262c4a46416a26 making RCU less IPI-happy.
  20. The C standard specifies “undefined behavior” for overflow of signed integral variables. Could this possibly bite us? Inspect and consider changing some of the long variables to unsigned long. First attempts to do this were quite ugly, so I am happy to assume that all machines that Linux runs on will be two's-complement machines.
  21. Finish merging commits from Arnd's RCU sparse checking. These will appear in 2.6.37 (commit 67bdbffd696f29a0b68aa8daa285783a06651583).
  22. Create a TINY_PREEMPT_RCU, then drive the choice of RCU implementation directly off of CONFIG_SMP and CONFIG_PREEMPT. This is now in mainline (commit a57eb940d130477a799dfb24a570ee04979c0f7f).
  23. Can we remove one or more of the ACCESS_ONCE() primitives in TREE_PREEMPT_RCU's __rcu_read_lock() and __rcu_read_unlock()? (Yes, commit 80dcf60e6b97c7363971e7a0a788d8484d35f8a6.)
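
For reference on item 20 above, this is roughly what the unsigned-long alternative looks like: unsigned arithmetic wraps with defined behavior, so a counter comparison can be written to survive overflow. The demo_ names are hypothetical, and the comparison is only meaningful while the two counters are within ULONG_MAX / 2 of each other.

```c
#include <assert.h>
#include <limits.h>

/* Advance a grace-period-style counter. Unsigned overflow wraps to 0
 * and is fully defined by the C standard, unlike signed overflow. */
static unsigned long demo_ctr_advance(unsigned long c)
{
	return c + 1;
}

/* True if counter a is at or after counter b, allowing for wraparound.
 * Assumes a and b are less than half the counter range apart. */
static int demo_ctr_ge(unsigned long a, unsigned long b)
{
	return (a - b) <= ULONG_MAX / 2;
}

/* A few sanity checks, including the wrap from ULONG_MAX to 0. */
static void demo_ctr_selftest(void)
{
	unsigned long before = ULONG_MAX;
	unsigned long after = demo_ctr_advance(before);

	assert(after == 0UL);
	assert(demo_ctr_ge(after, before));	/* 0 is "after" ULONG_MAX */
	assert(!demo_ctr_ge(before, after));
	assert(demo_ctr_ge(7UL, 7UL));
}
```

A plain `a >= b` on the same values would report that the counter went backwards at the wrap, which is the failure this style of comparison avoids.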