RCU git Tree
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
The following branches are of interest:
- origin/rcu/next: commits intended for the next
release, or perhaps the one after that.
- origin/rcu/testing: commits that might go into
an upcoming release, but which might be rebased or even
dropped, depending on the outcome of further testing.
This branch is normally a superset of origin/rcu/next.
- origin/rcu/urgent: fixes for regressions in
mainline.
This tree may be accessed as follows:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
cd linux-2.6-rcu
git checkout origin/rcu/testing
Once the clone has been created, you can update your local copy
as follows:
git remote update
git checkout origin/rcu/testing
RCU To-Do List
Things on Paul's Reasonably Immediate List
- ftrace plugins for RCU.
- Merge SRCU into TREE_RCU.
As currently implemented, SRCU cannot be made hierarchical,
and thus is a scalability bottleneck.
- Make CONFIG_RCU_FAST_NO_HZ work for CONFIG_TREE_PREEMPT_RCU.
- Once CONFIG_TINY_PREEMPT_RCU is deemed sufficiently
reliable, eliminate the UP-only code paths from the hierarchical
implementations of RCU.
Currently slated for mid-2011.
Software-Engineering Enhancements
- Make a call_rcu_blocking(), call_rcu_blocking_bh(),
and call_rcu_blocking_sched() that invoke synchronize_rcu()
(or the corresponding API) if callbacks are backing up.
However, the base call_rcu() family of functions
already has checks to accelerate grace periods, and these
primitives have extremely simple implementations, so they can wait
until there is a strong need.
(Mathieu Desnoyers has added a defer_rcu() call to userspace
RCU that has similar properties.)
Optimizations
- Make rcutree track the largest CPU number, and restrict scans
to that number.
This change would be helpful in the common case where
NR_CPUS is much larger than the actual number of CPUs.
- Consider adding checks for RT threads to __call_rcu()
so that we can have deterministic RCU updates.
The first step of this is slated for 2.6.40
(commit 2655d57ef35aa327a2e58a1c5dc7b65c65003f4e).
- It may be possible to accelerate TREE_PREEMPT_RCU grace periods
based on the fact that a CPU can unambiguously
determine whether or not it is in an RCU read-side critical
section at a specific point in time.
A similar trick might also work (albeit probabilistically)
for CONFIG_PREEMPT builds of TREE_RCU.
In both cases, a reasonable first guess is that note_new_gpnum()
would be a good place for such an optimization.
- Attempt to make RCU independent of the scheduling-clock
interrupt, in order to reduce OS jitter.
This was a topic of informal discussion at the
2010 Linux Plumbers Conference.
A first step in this direction, which should support Frederic
Weisbecker's suppression of the scheduling clock interrupt
for CPUs having but one runnable task, is in my -rcu tree.
Simple Cleanups
- Make rcupreempt_trace.c use seqfile.
Li Zefan (lizf@cn.fujitsu.com) is taking this on, but using
Frederic Weisbecker's statistical ftrace.
Li Zefan's earlier non-ftrace patch may be found
here.
- Simplify the handling of memory barriers in TREE_RCU's
dyntick interface by creating a PUBLISH_AND_STORE()
and a LOAD_AND_ACQUIRE() primitive.
This should have the same readability benefits as did the
change from naked memory barriers to rcu_assign_pointer()
and rcu_dereference().
- Prevent grace-period counters from overflowing while a given
CPU is in dyntick-idle state.
Even on a 32-bit machine, such overflow takes more than a
month, so this is still relatively low priority.
- Move #ifdef'ed functions from kernel/rcutree.c
to kernel/rcutree_plugin.h.
Inspection
- Look for places where people (wrongly, for preemptable RCU)
assume that rcu_read_lock() disables preemption.
- Look for places where people assume that disabling preemption
and/or irqs makes rcu_read_lock() unnecessary.
- Make sure Documentation/RCU/*.txt files better match reality.
(This is an ongoing work item, as it is amazing how quickly
the documentation decays.)
Possibly Dubious Changes
- Allow RCU read-side critical sections in idle loops.
This has come up enough times over the past several years that
it is time to go do it.
However, the interactions with dyntick-idle mode are challenging.
- Remove the function pointer from the struct rcu_head
and group these structures by function in order to:
  - shrink the size of the struct rcu_head, and
  - improve the efficiency of callback processing by grouping
  the calls through pointers.
This has some non-trivial consequences; please see the
discussion between Manfred and Paul for some of the issues involved.
Another approach is to maintain a separate queue that tracks
the functions to be invoked and the memory blocks to invoke them on,
as Mathieu Desnoyers
suggested. This means that the new defer_rcu()
primitive can now block, either when waiting for a grace period
to elapse or when allocating new memory. Alternatively, this
primitive could return an error if the callback could not
be enqueued immediately. Of course, the queue of callbacks
must still be present, so any memory savings will depend on the
workload.
- Make rcutree keep track of quiescent states that a CPU
passes through after a given grace period starts, but
before that CPU is aware that a new grace period has
started.
The goal would be to reduce grace-period latency, but it is
not clear how helpful this change would be.
It is clear that this change could be quite difficult to get right.
- Change the size of the grace-period-number counter from long to int.
I do not currently believe that this change is worthwhile, as
the memory savings are minuscule, it could result in worse code
on 64-bit systems, and it requires a change to the exported
rcu_batches_completed() and rcu_batches_completed_bh() APIs.
- Replace Hierarchical RCU's manual bitmask scans with something
like ffs() or ffsl().
Although this seems unlikely to make the code measurably faster,
it might simplify the code a little bit.
RCU Done List
- Make a “fire and forget” primitive that kfree()s the specified
memory block after a grace period elapses. Lai Jiangshan and Manfred
Spraul are working on patches for this; see Lai's and Manfred's
patches.
Thanks to Lai Jiangshan, this is in mainline as of 2.6.40
(commit 9ab1544eb4196ca8d05c433b2eb56f74496b1ee3).
- Get preemptable RCU priority boosting into mainline.
This is in mainline as of 2.6.40
(commit 27f4d28057adf98750cf863c40baefb12f5b6d21).
- Get rid of CONFIG_RCU_CPU_STALL_DETECTOR, thus
making stall detection unconditional.
The overhead of this option is negligible, it has proven
quite useful, and making it unconditional would reduce the
number of combinations of configuration parameters that must
be tested.
However, recent experience has shown that it is valuable to
be able to suppress RCU CPU stall detection at times, so
this item is waiting on experience with the boot-time and
sysfs controls over RCU CPU stall detection.
Was slated for early 2011.
This is in mainline as of 2.6.40
(commit a00e0d714fbded07a7a2254391ce9ed5a5cb9d82).
- Create a small-memory-footprint RCU for single-CPU embedded
devices.
A starting point may be found here,
with the LWN writeup here.
This is in mainline as of 2.6.33 (commit 9b1d82fa1611706fa7ee1505f290160a18caf95d).
- It appears that some versions of gcc do not like forward
references to static inline functions.
But kernel/rcutree_plugin.h really wants to
define forward-referenced static inline functions.
What is the best way to handle this?
  - Tell people to tune their gcc warnings down a bit.
  - Remove the "inline" modifiers.
  - Move the #include of kernel/rcutree_plugin.h back up near the front
  of kernel/rcutree.c, bringing back the
  ugly list of functions referenced by the plugins.
Removing the "inline" modifiers seemed best, and this change
hit mainline as of 2.6.33
(commit dbe01350fa8ce0c11948ab7d6be71a4d901be151).
- Reduce the need for force_quiescent_state() to send
IPIs by making rcu_check_callbacks() invoke the
much cheaper set_need_resched().
This appeared in mainline in 2.6.35
(commit d25eb9442bb2c38c1e742f0fa764d7132d72593f).
- Make a module parameter that controls the frequency
with which force_quiescent_state() is invoked.
Increasing this frequency can be useful when torture-testing
RCU.
This appeared in mainline in v2.6.34
(commit bf66f18e79e34c421bbd8f6511e2c556b779df2f).
- Forbid grace periods from starting while
force_quiescent_state() is running.
This would have little or no effect on performance, while
greatly simplifying race conditions.
This is in mainline as of v2.6.34
(commit 07079d5357a4d53c2b13126c4a38fb40e6e04966).
- Fix TREE_PREEMPT_RCU's implementation of synchronize_rcu_expedited(),
with a hack for 2.6.32
(commit 019129d595caaa5bd0b41d128308da1be6a91869)
and a real fix in 2.6.33
(commit d9a3da0699b24a589b27a61e1a5b5bd30d9db669).
- Fix additional bugs in TREE_PREEMPT_RCU as they arise.
And there have been one or two of them.
Chasing them down was more entertaining than usual.
- Fixed an RCU performance bug located by Nick Piggin in
which RCU takes a big performance hit trying to push RCU
callbacks through the system.
In 2.6.32 commit 37c72e56f6b234ea7387ba530434a80abf2658d8.
- Fold rcu_pending() into rcu_check_callbacks(). In 2.6.32-rc1
commit a157229cabd6dd8cfa82525fc9bf730c94cc9ac2.
- Rename rcu_qsctr_inc() to rcu_report_qs() and rcu_bh_qsctr_inc()
to rcu_bh_report_qs().
Actually remapped differently as a result of making rcu_sched,
rcu_bh, and rcu_preempt (when present) the basic RCU implementations.
This is in 2.6.32-rc1
commit d6714c22b43fbcbead7e7b706ff270e15f04a791.
- Expedited grace periods.
This is in 2.6.31 commit 03b042bf1dc14a268a3d65d38b4ec2a4261e8477.
This implementation does not expedite preemptable RCU,
but that is on the way.
- Additional work on user-level RCU implementations.
Numerous variants in flight, no shortage!
The best place to start is git://lttng.org/userspace-rcu.git,
or, if you prefer tarballs, here.
- Applying lessons learned from the Hierarchical RCU experience
to Preemptable RCU.
This also includes moving the preemptable RCU state machine
out of the scheduling-clock interrupt handler, speeding
up the read-side primitives, and speeding up the grace periods.
Also cutting about a thousand lines of code.
See commit f41d911f8c49a5d65c86504c19e8204bb605c4fd.
- Make rcutree's synchronize_rcu() be barrier() for !SMP builds.
Commit a682604838763981613e42015cd0e39f2989d6bb handles that
and all the other RCU implementations while it is at it.
- Lai Jiangshan fixed a long-standing CPU-hotplug race in
rcu_barrier(). Commit f69b17d7e745d8edd7c0d90390cbaa77e63c5ea3.
- Stomping out Hierarchical RCU bugs!!!
  - Grace-period latency on small systems.
  Commit c12172c0251761c54260376eb29a5f6547495580.
  - Suspend/resume issues.
  Commit 90a4d2c0106bb690f0b6af3d506febc35c658aa7.
  - A couple of rcutorture bugs.
  Commits c59ab97e9ecdee9084d2da09e5a8ceea9a396508 and
  c9d557c19f94df42db78d4a5de4d25feee694bad.
  - Commit a682604838763981613e42015cd0e39f2989d6bb
  teaching RCU that the idle task is not a quiescent state
  at boot time. (Actually a bug for all flavors of RCU,
  not just hierarchical RCU.)
  - Commit ef631b0ca01655d24e9ca7e199262c4a46416a26 making
  RCU less IPI-happy.
- The C standard specifies “undefined behavior” for
overflow of signed integral variables.
Could this possibly bite us?
Inspect and consider changing some of the long
variables to unsigned long.
First attempts to do this were quite ugly, so I am happy to assume
that all machines that Linux runs on will be two's-complement
machines.
- Finish merging commits from Arnd's RCU sparse checking.
These will appear in 2.6.37 (commit
67bdbffd696f29a0b68aa8daa285783a06651583).
- Create a TINY_PREEMPT_RCU, then drive the choice of RCU
implementation directly off of CONFIG_SMP and CONFIG_PREEMPT.
This is now in mainline
(commit a57eb940d130477a799dfb24a570ee04979c0f7f).
- Can we remove one or more of the ACCESS_ONCE()
primitives in TREE_PREEMPT_RCU's __rcu_read_lock()
and __rcu_read_unlock()?
(Yes, commit 80dcf60e6b97c7363971e7a0a788d8484d35f8a6.)