An incorrect assumption over radix_tree_tag_get()

David Howells

Apr 6, 2010, 12:30:02 PM

Hi,

I think I've made a bad assumption over my usage of radix_tree_tag_get() in
fs/fscache/page.c.

I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
and radix_tree_tag_clear() by the RCU read lock. However, now I'm not so
sure. I think it's only protected against removal of part of the tree.

Can you confirm?

David

Nick Piggin

Apr 6, 2010, 1:10:01 PM

On Tue, Apr 06, 2010 at 05:19:49PM +0100, David Howells wrote:
>
> Hi,
>
> I think I've made a bad assumption over my usage of radix_tree_tag_get() in
> fs/fscache/page.c.
>
> I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
> and radix_tree_tag_clear() by the RCU read lock. However, now I'm not so
> sure. I think it's only protected against removal of part of the tree.
>
> Can you confirm?

It is safe. Synchronization requirements for using the radix tree API
are documented.

David Howells

Apr 6, 2010, 3:00:01 PM

Nick Piggin <npi...@suse.de> wrote:

> It is safe. Synchronization requirements for using the radix tree API
> are documented.

I presume you mean the big comment on it in radix-tree.h.

According to that, it is not safe:

* - any function _modifying_ the tree or tags (inserting or deleting
* items, setting or clearing tags) must exclude other modifications, and
* exclude any functions reading the tree.

David

David Howells

Apr 6, 2010, 3:20:01 PM

David Howells <dhow...@redhat.com> wrote:

> Nick Piggin <npi...@suse.de> wrote:
>
> > It is safe. Synchronization requirements for using the radix tree API
> > are documented.
>
> I presume you mean the big comment on it in radix-tree.h.
>
> According to that, it is not safe:
>
> * - any function _modifying_ the tree or tags (inserting or deleting
> * items, setting or clearing tags) must exclude other modifications, and
> * exclude any functions reading the tree.

Having said that, the next few lines say that it is:

* The notable exceptions to this rule are the following functions:
* radix_tree_lookup
* radix_tree_lookup_slot
* radix_tree_tag_get
* radix_tree_gang_lookup
* radix_tree_gang_lookup_slot
* radix_tree_gang_lookup_tag
* radix_tree_gang_lookup_tag_slot
* radix_tree_tagged

However, I'm not sure I agree that radix_tree_tag_get() belongs in this list.

The bug symptoms are this:

Someone is seeing a bug in which an apparently corrupt radix tree tag chain
is observed in radix_tree_tag_get(). Leastways, the BUG() on line 602 in
radix_tree_tag_get() trips once in a while:

kernel BUG at /usr/src/linux-2.6-2.6.33/debian/build/source_i386_none/lib/radix-tree.c:602!
RIP: 0010:[<ffffffff81182040>] radix_tree_tag_get+0xbc/0xe3
[<ffffffffa0247b67>] ? __fscache_maybe_release_page+0x42/0x115
[<ffffffffa0372e7d>] ? nfs_fscache_release_page+0x66/0x99 [nfs]
[<ffffffff810b6dee>] ? invalidate_inode_pages2_range+0x15a/0x262
[<ffffffffa035312f>] ? nfs_invalidate_mapping_nolock+0x18/0xb4
[<ffffffffa0354097>] ? nfs_revalidate_mapping+0x85/0x99 [nfs]
[<ffffffffa0351158>] ? nfs_file_splice_read+0x5b/0x8e [nfs]
[<ffffffff811043d3>] ? splice_direct_to_actor+0xbe/0x188
[<ffffffff81104a1c>] ? direct_splice_actor+0x0/0x1e
[<ffffffff81113274>] ? ep_scan_ready_list+0x132/0x151
[<ffffffff811044e7>] ? do_splice_direct+0x4a/0x64
[<ffffffff810e8fa8>] ? do_sendfile+0x12d/0x1a8
[<ffffffff8106685b>] ? getnstimeofday+0x55/0xaf
[<ffffffff810e906c>] ? sys_sendfile64+0x49/0x88
[<ffffffff8103145f>] ? sysenter_dispatch+0x7/0x2e

which is this:

	if (!tag_get(node, tag, offset))
		saw_unset_tag = 1;
	if (height == 1) {
		int ret = tag_get(node, tag, offset);

-->		BUG_ON(ret && saw_unset_tag);
		return !!ret;
	}

In fs/fscache/page.c, __fscache_maybe_release_page() does a radix_tree_lookup()
with just the RCU read lock held, and then calls radix_tree_tag_get() a couple
of times. In this case, it's the first instance, before we grab the
stores_lock spinlock (which is used to serialise alteration of the radix tree),
that is the problem:

	/* see if the page is actually undergoing storage - if so we can't get
	 * rid of it till the cache has finished with it */
	if (radix_tree_tag_get(&cookie->stores, page->index,
			       FSCACHE_COOKIE_STORING_TAG)) {
		rcu_read_unlock();
		goto page_busy;
	}

Looking at radix_tree_tag_get(), I can see that it carefully uses
rcu_dereference_raw() to protect itself against pointer modification - but
looking at radix_tree_tag_set/clear(), no pointers are modified, no nodes are
replaced. radix_tree_tag_get()'s attempts to protect itself count for nothing
as set/clear() modify the node directly.

So, what I'm seeing is that the two calls to tag_get() on the same bit
occasionally show a different value, and, looking at the code, I can't see any
reason for the confidence displayed in the documentation that this cannot
happen.
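
The race described above can be sketched in userspace. This is only a minimal
single-threaded model, not the kernel code: struct tag_path, PATH_HEIGHT,
path_tag_set() and walk_sees_inconsistency() are hypothetical names made up
for illustration, and the "concurrent" writer is simulated by calling it
between the two reads of the same bit.

```c
#include <stdbool.h>

#define PATH_HEIGHT 2	/* hypothetical: levels from root to leaf */

/* One tag bit per level on the path to a single slot; index 0 is the root. */
struct tag_path {
	bool tag[PATH_HEIGHT];
};

/* Stand-in for a tag-setting writer: marks every level on the path. */
static void path_tag_set(struct tag_path *p)
{
	for (int level = 0; level < PATH_HEIGHT; level++)
		p->tag[level] = true;
}

/*
 * Follows the shape of the quoted loop, but returns true where the kernel
 * code would trip BUG_ON(ret && saw_unset_tag). If writer_races is true,
 * the writer runs between the two reads of the leaf-level bit.
 */
static bool walk_sees_inconsistency(struct tag_path *p, bool writer_races)
{
	bool saw_unset_tag = false;

	for (int level = 0; level < PATH_HEIGHT; level++) {
		if (!p->tag[level])		/* first read of this level's bit */
			saw_unset_tag = true;
		if (level == PATH_HEIGHT - 1) {
			if (writer_races)
				path_tag_set(p);	/* simulated concurrent set */
			bool ret = p->tag[level];	/* second read, same bit */
			return ret && saw_unset_tag;
		}
	}
	return false;
}
```

On an all-clear path the walk reports nothing unusual when no writer
intervenes, and reports the inconsistency when the writer runs between the
two reads. Nothing in the model depends on pointers changing, which is the
point made above about rcu_dereference_raw() not helping here.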

Dave Chinner

Apr 6, 2010, 7:40:01 PM

On Wed, Apr 07, 2010 at 03:09:03AM +1000, Nick Piggin wrote:
> On Tue, Apr 06, 2010 at 05:19:49PM +0100, David Howells wrote:
> >
> > Hi,
> >
> > I think I've made a bad assumption over my usage of radix_tree_tag_get() in
> > fs/fscache/page.c.
> >
> > I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
> > and radix_tree_tag_clear() by the RCU read lock. However, now I'm not so
> > sure. I think it's only protected against removal of part of the tree.
> >
> > Can you confirm?
>
> It is safe. Synchronization requirements for using the radix tree API
> are documented.

I don't think it is safe - I made modifications to XFS that modified
radix tree tags under a read lock (not RCU), but this resulted in
corrupted tag state as concurrent tag set/clear operations for
different slots propagated through the tree and got mixed up.
Christoph fixed the problem (f1f724e4b523d444c5a598d74505aefa3d6844d2)
by putting all tag modifications under the write lock. I can't see
how doing tag modifications under RCU read locks is any safer than
doing it under a spinning read lock....

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
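
The kind of mix-up Dave describes, where set and clear operations on
different slots interleave, can be sketched as follows. This is a heavily
simplified model, assuming a single parent summary bit over two slots; struct
mini_tree and all the function names are hypothetical, and the real
radix_tree_tag_set()/radix_tree_tag_clear() walk more levels than this.

```c
#include <stdbool.h>

/* Hypothetical two-slot node with one summary bit in its parent. */
struct mini_tree {
	bool parent_tag;	/* should be true iff any slot tag is set */
	bool slot_tag[2];
};

static bool any_slot_tagged(const struct mini_tree *t)
{
	return t->slot_tag[0] || t->slot_tag[1];
}

/* The invariant that tag propagation is meant to maintain. */
static bool tags_consistent(const struct mini_tree *t)
{
	return t->parent_tag == any_slot_tagged(t);
}

/* Set: mark the slot and the parent summary bit. */
static void mini_tag_set(struct mini_tree *t, int slot)
{
	t->slot_tag[slot] = true;
	t->parent_tag = true;
}

/* Clear, step 1: clear the slot and decide whether the parent must go too. */
static bool mini_tag_clear_begin(struct mini_tree *t, int slot)
{
	t->slot_tag[slot] = false;
	return !any_slot_tagged(t);
}

/* Clear, step 2: act on the decision made in step 1. */
static void mini_tag_clear_finish(struct mini_tree *t, bool clear_parent)
{
	if (clear_parent)
		t->parent_tag = false;
}

/* Writers excluded from each other: the clear completes before the set. */
static bool run_serialized(void)
{
	struct mini_tree t = { .parent_tag = true, .slot_tag = { true, false } };

	mini_tag_clear_finish(&t, mini_tag_clear_begin(&t, 0));
	mini_tag_set(&t, 1);
	return tags_consistent(&t);
}

/* Writers not excluded: the set lands in the middle of the clear. */
static bool run_interleaved(void)
{
	struct mini_tree t = { .parent_tag = true, .slot_tag = { true, false } };
	bool clear_parent = mini_tag_clear_begin(&t, 0);

	mini_tag_set(&t, 1);			/* other writer slips in */
	mini_tag_clear_finish(&t, clear_parent);	/* stale decision */
	return tags_consistent(&t);
}
```

In the serialized run the summary bit ends up matching the slots. In the
interleaved run slot 1 is tagged but the parent bit has been cleared, so a
lookup that trusts the parent bit would miss it.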

Nick Piggin

Apr 7, 2010, 4:00:01 AM

On Wed, Apr 07, 2010 at 09:34:38AM +1000, Dave Chinner wrote:
> On Wed, Apr 07, 2010 at 03:09:03AM +1000, Nick Piggin wrote:
> > On Tue, Apr 06, 2010 at 05:19:49PM +0100, David Howells wrote:
> > >
> > > Hi,
> > >
> > > I think I've made a bad assumption over my usage of radix_tree_tag_get() in
> > > fs/fscache/page.c.
> > >
> > > I've assumed that radix_tree_tag_get() is protected from radix_tree_tag_set()
> > > and radix_tree_tag_clear() by the RCU read lock. However, now I'm not so
> > > sure. I think it's only protected against removal of part of the tree.
> > >
> > > Can you confirm?
> >
> > It is safe. Synchronization requirements for using the radix tree API
> > are documented.
>
> I don't think it is safe - I made modifications to XFS that modified
> radix tree tags under a read lock (not RCU), but this resulted in
> corrupted tag state as concurrent tag set/clear operations for
> different slots propagated through the tree and got mixed up.
> Christoph fixed the problem (f1f724e4b523d444c5a598d74505aefa3d6844d2)
> by putting all tag modifications under the write lock. I can't see
> how doing tag modifications under RCU read locks is any safer than
> doing it under a spinning read lock....

No, the modifications must all be serialized, but they can run in
parallel with a radix_tree_tag_get().
