Unlocking without locking here?

2 views
Skip to first unread message

Lukas Bulwahn

unread,
Apr 26, 2019, 12:19:59 AM4/26/19
to clang-bu...@googlegroups.com, Himanshu Jha
Dear all,

I have been applying the clang thread static analyser on the kernel,
and after around hundreds of annotations came around this code in
v5.1-rc5:

drivers/md/md.c:3393:3: warning: releasing mutex
'mddev->reconfig_mutex' that was not held [-Wthread-safety-analysis]
mddev_unlock(mddev);
^

This looks as follows:

static ssize_t
rdev_attr_store(struct kobject *kobj, struct attribute *attr,
const char *page, size_t length)
{
struct rdev_sysfs_entry *entry = container_of(attr, struct
rdev_sysfs_entry, attr);
struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
ssize_t rv;
struct mddev *mddev = rdev->mddev;

if (!entry->store)
return -EIO;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
rv = mddev ? mddev_lock(mddev): -EBUSY;
if (!rv) {
if (rdev->mddev == NULL)
rv = -EBUSY;
else
rv = entry->store(rdev, page, length);
mddev_unlock(mddev);
}
return rv;
}


If my brain is not completely messed up after looking at so much
locking code, there should be an issue here:

Assume rdev->mddev==NULL (or by aliasing the local mddev==NULL).
Then rv==-EBUSY
Then, we take the then branch in if(!rv), check again (rdev->mddev ==
NULL) and set rv (that is redundant at this point), but then we unlock
where we never locked.

Is this a bug or where my thinking mistake?

Lukas

Arnd Bergmann

unread,
Apr 26, 2019, 4:26:47 AM4/26/19
to Lukas Bulwahn, clang-bu...@googlegroups.com, Himanshu Jha, NeilBrown
On Fri, Apr 26, 2019 at 6:20 AM Lukas Bulwahn <lukas....@gmail.com> wrote:
>
> Dear all,
>
> I have been applying the clang thread static analyser on the kernel,
> and after around hundreds of annotations came around this code in
> v5.1-rc5:
>
> drivers/md/md.c:3393:3: warning: releasing mutex
> 'mddev->reconfig_mutex' that was not held [-Wthread-safety-analysis]
> mddev_unlock(mddev);
> ^

Nice find!

> This looks as follows:
>
> rv = mddev ? mddev_lock(mddev): -EBUSY;
> if (!rv) {
> if (rdev->mddev == NULL)
> rv = -EBUSY;
> else
> rv = entry->store(rdev, page, length);
> mddev_unlock(mddev);
> }
> return rv;
> }
>
>
> If my brain is not completely messed up after looking at so much
> locking code, there should be an issue here:
>
> Assume rdev->mddev==NULL (or by aliasing the local mddev==NULL).
> Then rv==-EBUSY
> Then, we take the then branch in if(!rv), check again (rdev->mddev ==
> NULL) and set rv (that is redundant at this point), but then we unlock
> where we never locked.
>
> Is this a bug or where my thinking mistake?

Generally speaking, try to look at the git history 'git log -p $FILE'
to see how this came about, and you'll find commit 27c529bb8e90
("md: lock access to rdev attributes properly") that explains the
intention.

Adding Neil Brown to Cc as the author of that commit.

The idea here is that rdev->mddev can change underneath us
from another thread, and we take a snapshot of it at the start of the
function. If it's non-NULL, we take the lock that prevents it from
becoming NULL, and then check if it has changed in the meantime.

Finally, once we know that rdev is valid and remains valid, we
call the store() function, and then unlock it again. It is probably
a valid assumption that rdev->mmdev can change from
a valid pointer to a NULL pointer, but not to another pointer,
which would break this spectacularly.

What I think the thread analyser has found is the aliasing
between mddev and rdev->mddev that allows the compiler to
eliminate the 'mddev' variable and dereference rdev->mddev
twice, leading to a race of

A B
if (rdev->mddev)
rdev->mddev = NULL
mutex_lock(&rdev->mddev->reconfig_mutex)

We could add a spinlock around the access to the rdev->mddev
pointer here, or possibly just use a READ_ONCE/WRITE_ONCE
pair like

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 05ffffb8b769..495b3e4377d5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2259,7 +2259,7 @@ static void unbind_rdev_from_array(struct md_rdev *rdev)
bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
list_del_rcu(&rdev->same_set);
pr_debug("md: unbind<%s>\n", bdevname(rdev->bdev,b));
- rdev->mddev = NULL;
+ WRITE_ONCE(rdev->mddev, NULL);
sysfs_remove_link(&rdev->kobj, "block");
sysfs_put(rdev->sysfs_state);
rdev->sysfs_state = NULL;
@@ -3378,7 +3378,7 @@ rdev_attr_store(struct kobject *kobj, struct
attribute *attr,
struct rdev_sysfs_entry *entry = container_of(attr, struct
rdev_sysfs_entry, attr);
struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
ssize_t rv;
- struct mddev *mddev = rdev->mddev;
+ struct mddev *mddev = READ_ONCE(rdev->mddev);

if (!entry->store)
return -EIO;

I have not looked at the code to see if that is sufficient to fix the race,
but I assume we need at least that.

Arnd

Lukas Bulwahn

unread,
Apr 26, 2019, 4:50:59 AM4/26/19
to Arnd Bergmann, clang-bu...@googlegroups.com, Himanshu Jha, NeilBrown
Thanks for the hint. I did that as well, after writing the email
above. It was 6:20 in the morning and I just found a locking bug where
many other tools and people must have failed or been equally confused.
Arnd, I actually thought of this patch:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 05ffffb8b769..aded861a8864 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3384,12 +3384,11 @@ rdev_attr_store(struct kobject *kobj, struct
attribute *attr,
return -EIO;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
- rv = mddev ? mddev_lock(mddev): -EBUSY;
+ if (!mddev)
+ return -EBUSY;
+ rv = mddev_lock(mddev);
if (!rv) {
- if (rdev->mddev == NULL)
- rv = -EBUSY;
- else
- rv = entry->store(rdev, page, length);
+ rv = entry->store(rdev, page, length);
mddev_unlock(mddev);
}
return rv;


If you think this solves the issue, I prepare a proper patch and send
it to the right mailing lists and maintainers.

Lukas

Arnd Bergmann

unread,
Apr 26, 2019, 5:08:08 AM4/26/19
to Lukas Bulwahn, clang-bu...@googlegroups.com, Himanshu Jha, NeilBrown
I think this patch does not fix the issue that the thread analyser found
but instead revives the bug that was fixed by the 27c529bb8e90 commit,
it does not prevent the local 'mddev' alias for rdev->mddev from
becoming NULL due to a concurrent unbind_rdev_from_array.

On the other hand, this only works if every path leading up to
unbind_rdev_from_array() also takes the mddev_lock(), which doesn't
seem to be the case here.

Arnd

NeilBrown

unread,
Apr 26, 2019, 8:41:12 PM4/26/19
to Arnd Bergmann, Lukas Bulwahn, clang-bu...@googlegroups.com, Himanshu Jha
I cannot see the point of this part. I don't think it matter how often
'NULL' is written to rdev->mddev.

> sysfs_remove_link(&rdev->kobj, "block");
> sysfs_put(rdev->sysfs_state);
> rdev->sysfs_state = NULL;
> @@ -3378,7 +3378,7 @@ rdev_attr_store(struct kobject *kobj, struct
> attribute *attr,
> struct rdev_sysfs_entry *entry = container_of(attr, struct
> rdev_sysfs_entry, attr);
> struct md_rdev *rdev = container_of(kobj, struct md_rdev, kobj);
> ssize_t rv;
> - struct mddev *mddev = rdev->mddev;
> + struct mddev *mddev = READ_ONCE(rdev->mddev);

This is a reasonable change to make, and I agree that it would fix what
the analysis seems to have found.

Thanks,
NeilBrown
signature.asc

Arnd Bergmann

unread,
Apr 27, 2019, 12:54:07 PM4/27/19
to NeilBrown, Lukas Bulwahn, clang-bu...@googlegroups.com, Himanshu Jha
On Sat, Apr 27, 2019 at 2:41 AM NeilBrown <ne...@suse.com> wrote:
> On Fri, Apr 26 2019, Arnd Bergmann wrote:
> > On Fri, Apr 26, 2019 at 6:20 AM Lukas Bulwahn <lukas....@gmail.com> wrote:

> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 05ffffb8b769..495b3e4377d5 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -2259,7 +2259,7 @@ static void unbind_rdev_from_array(struct md_rdev *rdev)
> > bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
> > list_del_rcu(&rdev->same_set);
> > pr_debug("md: unbind<%s>\n", bdevname(rdev->bdev,b));
> > - rdev->mddev = NULL;
> > + WRITE_ONCE(rdev->mddev, NULL);
>
> I cannot see the point of this part. I don't think it matter how often
> 'NULL' is written to rdev->mddev.

Ah, I had assumed that READ_ONCE()/WRITE_ONCE() must
be paired in the same way as smp_rmb() and smp_wmb(), but
I probably misunderstood that then.

On a related note, can you confirm that any invocation of
unbind_rdev_from_array() is done with mddev_lock() held?
I don't see where we take the lock, and without it, the protection
is pointless here.

Arnd

NeilBrown

unread,
Apr 27, 2019, 7:10:06 PM4/27/19
to Arnd Bergmann, Lukas Bulwahn, clang-bu...@googlegroups.com, Himanshu Jha
The lock is taken in:
- md_ioctl() if the action was tiggered by an ioctl,
- rdev_attr_store() or md_attr_store() if the action was triggered
by a write to a sysfs file, or
- md_check_recovery() if the action is triggered by md itself.

Certainly unbind_rdev_from_array() should only ever be called with
reconfig_mutex held.

Thanks,
NeilBrown

signature.asc
Reply all
Reply to author
Forward
0 new messages