Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 0/13] sysfs lazification.

0 views
Skip to first unread message

Eric W. Biederman

unread,
Nov 3, 2009, 6:54:33 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise

The sysfs code updates the vfs caches immediately when the sysfs data
structures change causing a lot of unnecessary complications. The
following patchset untangles that beast. Allowing for simpler
more straight forward code, the removal of a hack from the vfs
to support sysfs, and human comprehensible locking on sysfs.

Most of these patches have already been reviewed and acked from the
last time I had time to work on sysfs.

In net the patches look like:

fs/namei.c | 22 ---
fs/sysfs/dir.c | 388 ++++++++++++++++---------------------------------
fs/sysfs/file.c | 41 +----
fs/sysfs/inode.c | 178 ++++++++++++++---------
fs/sysfs/symlink.c | 11 +-
fs/sysfs/sysfs.h | 9 +-
include/linux/namei.h | 1 -
7 files changed, 256 insertions(+), 394 deletions(-)

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Eric W. Biederman

unread,
Nov 3, 2009, 6:57:38 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

The granularity of sysfs time when we keep it is 1 ns. Which
when passed to timestamp_trunc results in a nop. So remove
the unnecessary function call making sysfs_setattr slightly
easier to read.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---
fs/sysfs/inode.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 8a08250..fed7a74 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -103,14 +103,11 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
if (ia_valid & ATTR_GID)
iattrs->ia_gid = iattr->ia_gid;
if (ia_valid & ATTR_ATIME)
- iattrs->ia_atime = timespec_trunc(iattr->ia_atime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_atime = iattr->ia_atime;
if (ia_valid & ATTR_MTIME)
- iattrs->ia_mtime = timespec_trunc(iattr->ia_mtime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_mtime = iattr->ia_mtime;
if (ia_valid & ATTR_CTIME)
- iattrs->ia_ctime = timespec_trunc(iattr->ia_ctime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;

--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:57:42 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

With the implementation of sysfs_getattr and sysfs_permission
sysfs becomes able to lazily propogate inode attribute changes
from the sysfs_dirents to the vfs inodes. This paves the way
for deleting significant chunks of now unnecessary code.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 2 +
fs/sysfs/inode.c | 64 ++++++++++++++++++++++++++++++++++++++-------------
fs/sysfs/symlink.c | 3 ++
fs/sysfs/sysfs.h | 2 +
4 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index fa37126..25d052a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -800,7 +800,9 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,

const struct inode_operations sysfs_dir_inode_operations = {
.lookup = sysfs_lookup,
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fccfb55..2dcafe8 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -37,7 +37,9 @@ static struct backing_dev_info sysfs_backing_dev_info = {
};

static const struct inode_operations sysfs_inode_operations ={
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

@@ -196,7 +198,6 @@ static inline void set_default_inode_attr(struct inode * inode, mode_t mode)

static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
{
- inode->i_mode = iattr->ia_mode;
inode->i_uid = iattr->ia_uid;
inode->i_gid = iattr->ia_gid;
inode->i_atime = iattr->ia_atime;
@@ -227,38 +228,56 @@ static int sysfs_count_nlink(struct sysfs_dirent *sd)
return nr + 2;
}

+static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
+{
+ struct sysfs_inode_attrs *iattrs = sd->s_iattr;
+
+ inode->i_mode = sd->s_mode;
+ if (iattrs) {
+ /* sysfs_dirent has non-default attributes
+ * get them from persistent copy in sysfs_dirent
+ */
+ set_inode_attr(inode, &iattrs->ia_iattr);
+ security_inode_notifysecctx(inode,
+ iattrs->ia_secdata,
+ iattrs->ia_secdata_len);
+ }
+
+ if (sysfs_type(sd) == SYSFS_DIR)
+ inode->i_nlink = sysfs_count_nlink(sd);
+}
+
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ struct inode *inode = dentry->d_inode;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ generic_fillattr(inode, stat);
+ return 0;
+}
+
static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
{
struct bin_attribute *bin_attr;
- struct sysfs_inode_attrs *iattrs;

inode->i_private = sysfs_get(sd);
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
- inode->i_ino = sd->s_ino;
lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

- iattrs = sd->s_iattr;
- if (iattrs) {
- /* sysfs_dirent has non-default attributes
- * get them for the new inode from persistent copy
- * in sysfs_dirent
- */
- set_inode_attr(inode, &iattrs->ia_iattr);
- if (iattrs->ia_secdata)
- security_inode_notifysecctx(inode,
- iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- } else
- set_default_inode_attr(inode, sd->s_mode);
+ set_default_inode_attr(inode, sd->s_mode);
+ sysfs_refresh_inode(sd, inode);

/* initialize inode according to type */
switch (sysfs_type(sd)) {
case SYSFS_DIR:
inode->i_op = &sysfs_dir_inode_operations;
inode->i_fop = &sysfs_dir_operations;
- inode->i_nlink = sysfs_count_nlink(sd);
break;
case SYSFS_KOBJ_ATTR:
inode->i_size = PAGE_SIZE;
@@ -341,3 +360,14 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
else
return -ENOENT;
}
+
+int sysfs_permission(struct inode *inode, int mask)
+{
+ struct sysfs_dirent *sd = inode->i_private;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ return generic_permission(inode, mask, NULL);
+}
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 1137418..c5eff49 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = sysfs_follow_link,
.put_link = sysfs_put_link,
+ .setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
+ .permission = sysfs_permission,
};


diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index a96d967..12ccc07 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -156,7 +156,9 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
+int sysfs_permission(struct inode *inode, int mask);
int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);
int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:57:59 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Cleanly separate the work that is specific to setting the
attributes of a sysfs_dirent from what is needed to update
the attributes of a vfs inode.

Additionally grab the sysfs_mutex to keep any nasties from
surprising us when updating the sysfs_dirent.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
fs/sysfs/sysfs.h | 1 +
2 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fed7a74..fccfb55 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)

return attrs;
}
-int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+
+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
{
- struct inode * inode = dentry->d_inode;
- struct sysfs_dirent * sd = dentry->d_fsdata;
struct sysfs_inode_attrs *sd_attrs;
struct iattr *iattrs;
unsigned int ia_valid = iattr->ia_valid;
- int error;
-
- if (!sd)
- return -EINVAL;

sd_attrs = sd->s_iattr;

- error = inode_change_ok(inode, iattr);
- if (error)
- return error;
-
- iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
-
- error = inode_setattr(inode, iattr);
- if (error)
- return error;
-
if (!sd_attrs) {
/* setting attributes for the first time, allocate now */
sd_attrs = sysfs_init_inode_attrs(sd);
@@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)


iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;
-

- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
- mode &= ~S_ISGID;
iattrs->ia_mode = sd->s_mode = mode;


}
}
+ return 0;
+}
+

+int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+{
+ struct inode * inode = dentry->d_inode;
+ struct sysfs_dirent * sd = dentry->d_fsdata;
+ int error;
+
+ if (!sd)
+ return -EINVAL;
+
+ error = inode_change_ok(inode, iattr);
+ if (error)
+ return error;
+
+ iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
+ if (iattr->ia_valid & ATTR_MODE) {
+ if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+ iattr->ia_mode &= ~S_ISGID;
+ }
+
+ error = inode_setattr(inode, iattr);
+ if (error)
+ return error;
+
+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setattr(sd, iattr);
+ mutex_unlock(&sysfs_mutex);
+
return error;
}

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index af4c4e7..a96d967 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -155,6 +155,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
*/


struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);

+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);


int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);

int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);

--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:58:03 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Now that sysfs_getattr and sysfs_permission refresh the vfs
inode there is no need to immediatly push the mode change
into the vfs cache. Reducing the amount of work needed and
simplifying the locking.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/file.c | 31 ++++++++-----------------------
1 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index faa1a80..dc30d9e 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -579,38 +579,23 @@ EXPORT_SYMBOL_GPL(sysfs_add_file_to_group);
*/
int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
{
- struct sysfs_dirent *victim_sd = NULL;
- struct dentry *victim = NULL;
- struct inode * inode;
+ struct sysfs_dirent *sd;
struct iattr newattrs;
int rc;

- rc = -ENOENT;
- victim_sd = sysfs_get_dirent(kobj->sd, attr->name);
- if (!victim_sd)
- goto out;
+ mutex_lock(&sysfs_mutex);

- mutex_lock(&sysfs_rename_mutex);
- victim = sysfs_get_dentry(victim_sd);
- mutex_unlock(&sysfs_rename_mutex);
- if (IS_ERR(victim)) {
- rc = PTR_ERR(victim);
- victim = NULL;
+ rc = -ENOENT;
+ sd = sysfs_find_dirent(kobj->sd, attr->name);
+ if (!sd)
goto out;
- }
-
- inode = victim->d_inode;

- mutex_lock(&inode->i_mutex);
-
- newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
+ newattrs.ia_mode = (mode & S_IALLUGO) | (sd->s_mode & ~S_IALLUGO);
newattrs.ia_valid = ATTR_MODE;
- rc = sysfs_setattr(victim, &newattrs);
+ rc = sysfs_sd_setattr(sd, &newattrs);

- mutex_unlock(&inode->i_mutex);
out:
- dput(victim);
- sysfs_put(victim_sd);
+ mutex_unlock(&sysfs_mutex);
return rc;
}
EXPORT_SYMBOL_GPL(sysfs_chmod_file);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:58:08 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

By teaching sysfs_revalidate to hide a dentry for
a sysfs_dirent if the sysfs_dirent has been renamed,
and by teaching sysfs_lookup to return the original
dentry if the sysfs dirent has been renamed. I can
show the results of renames correctly without having to
update the dcache during the directory rename.

This massively simplifies the rename logic allowing a lot
of weird sysfs special cases to be removed along with
a lot of now unnecesary helper code.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/namei.c | 22 -------
fs/sysfs/dir.c | 158 ++++++++++---------------------------------------
fs/sysfs/inode.c | 12 ----
fs/sysfs/sysfs.h | 1 -
include/linux/namei.h | 1 -
5 files changed, 32 insertions(+), 162 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d11f404..d3c190c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1279,28 +1279,6 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
return __lookup_hash(&this, base, NULL);
}

-/**
- * lookup_one_noperm - bad hack for sysfs
- * @name: pathname component to lookup
- * @base: base directory to lookup from
- *
- * This is a variant of lookup_one_len that doesn't perform any permission
- * checks. It's a horrible hack to work around the braindead sysfs
- * architecture and should not be used anywhere else.
- *
- * DON'T USE THIS FUNCTION EVER, thanks.
- */
-struct dentry *lookup_one_noperm(const char *name, struct dentry *base)
-{
- int err;
- struct qstr this;
-
- err = __lookup_one_len(name, &this, base, strlen(name));
- if (err)
- return ERR_PTR(err);
- return __lookup_hash(&this, base, NULL);
-}
-
int user_path_at(int dfd, const char __user *name, unsigned flags,
struct path *path)
{
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index a05b027..0b60212 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -25,7 +25,6 @@
#include "sysfs.h"

DEFINE_MUTEX(sysfs_mutex);
-DEFINE_MUTEX(sysfs_rename_mutex);
DEFINE_SPINLOCK(sysfs_assoc_lock);

static DEFINE_SPINLOCK(sysfs_ino_lock);
@@ -85,46 +84,6 @@ static void sysfs_unlink_sibling(struct sysfs_dirent *sd)
}

/**
- * sysfs_get_dentry - get dentry for the given sysfs_dirent
- * @sd: sysfs_dirent of interest
- *
- * Get dentry for @sd. Dentry is looked up if currently not
- * present. This function descends from the root looking up
- * dentry for each step.
- *
- * LOCKING:
- * mutex_lock(sysfs_rename_mutex)
- *
- * RETURNS:
- * Pointer to found dentry on success, ERR_PTR() value on error.
- */
-struct dentry *sysfs_get_dentry(struct sysfs_dirent *sd)
-{
- struct dentry *dentry = dget(sysfs_sb->s_root);
-
- while (dentry->d_fsdata != sd) {
- struct sysfs_dirent *cur;
- struct dentry *parent;
-
- /* find the first ancestor which hasn't been looked up */
- cur = sd;
- while (cur->s_parent != dentry->d_fsdata)
- cur = cur->s_parent;
-
- /* look it up */
- parent = dentry;
- mutex_lock(&parent->d_inode->i_mutex);
- dentry = lookup_one_noperm(cur->s_name, parent);
- mutex_unlock(&parent->d_inode->i_mutex);
- dput(parent);
-
- if (IS_ERR(dentry))
- break;
- }
- return dentry;
-}
-
-/**
* sysfs_get_active - get an active reference to sysfs_dirent
* @sd: sysfs_dirent to get an active reference to
*
@@ -315,6 +274,14 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
if (sd->s_flags & SYSFS_FLAG_REMOVED)
goto out_bad;

+ /* The sysfs dirent has been moved? */
+ if (dentry->d_parent->d_fsdata != sd->s_parent)
+ goto out_bad;
+
+ /* The sysfs dirent has been renamed */
+ if (strcmp(dentry->d_name.name, sd->s_name) != 0)
+ goto out_bad;
+
mutex_unlock(&sysfs_mutex);
out_valid:
return 1;
@@ -322,6 +289,12 @@ out_bad:
/* Remove the dentry from the dcache hashes.
* If this is a deleted dentry we use d_drop instead of d_delete
* so sysfs doesn't need to cope with negative dentries.
+ *
+ * If this is a dentry that has simply been renamed we
+ * use d_drop to remove it from the dcache lookup on its
+ * old parent. If this dentry persists later when a lookup
+ * is performed at its new name the dentry will be readded
+ * to the dcache hashes.
*/
is_dir = (sysfs_type(sd) == SYSFS_DIR);
mutex_unlock(&sysfs_mutex);
@@ -705,10 +678,15 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
}

/* instantiate and hash dentry */
- dentry->d_op = &sysfs_dentry_ops;
- dentry->d_fsdata = sysfs_get(sd);
- d_instantiate(dentry, inode);
- d_rehash(dentry);
+ ret = d_find_alias(inode);
+ if (!ret) {
+ dentry->d_op = &sysfs_dentry_ops;
+ dentry->d_fsdata = sysfs_get(sd);
+ d_add(dentry, inode);
+ } else {
+ d_move(ret, dentry);
+ iput(inode);
+ }

out_unlock:
mutex_unlock(&sysfs_mutex);
@@ -785,62 +763,32 @@ void sysfs_remove_dir(struct kobject * kobj)
int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
{
struct sysfs_dirent *sd = kobj->sd;
- struct dentry *parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
const char *dup_name = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
+ mutex_lock(&sysfs_mutex);

error = 0;
if (strcmp(sd->s_name, new_name) == 0)
goto out; /* nothing to rename */

- /* get the original dentry */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
-
- parent = old_dentry->d_parent;
-
- /* lock parent and get dentry for new name */
- mutex_lock(&parent->d_inode->i_mutex);
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(sd->s_parent, new_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(parent, new_name);
- if (!new_dentry)
- goto out_unlock;
+ goto out;

/* rename sysfs_dirent */
error = -ENOMEM;
new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
if (!new_name)
- goto out_unlock;
+ goto out;

dup_name = sd->s_name;
sd->s_name = new_name;

- /* rename */
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
-
error = 0;
- out_unlock:
+ out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&parent->d_inode->i_mutex);
kfree(dup_name);
- dput(old_dentry);
- dput(new_dentry);
- out:
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

@@ -848,12 +796,11 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;
- struct dentry *old_parent, *new_parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
BUG_ON(!sd->s_parent);
+
+ mutex_lock(&sysfs_mutex);
new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
new_parent_kobj->sd : &sysfs_root;

@@ -861,61 +808,20 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
if (sd->s_parent == new_parent_sd)
goto out; /* nothing to move */

- /* get dentries */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
- old_parent = old_dentry->d_parent;
-
- new_parent = sysfs_get_dentry(new_parent_sd);
- if (IS_ERR(new_parent)) {
- error = PTR_ERR(new_parent);
- new_parent = NULL;
- goto out;
- }
-
-again:
- mutex_lock(&old_parent->d_inode->i_mutex);
- if (!mutex_trylock(&new_parent->d_inode->i_mutex)) {
- mutex_unlock(&old_parent->d_inode->i_mutex);
- goto again;
- }
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(new_parent, sd->s_name);
- if (!new_dentry)
- goto out_unlock;
-
- error = 0;
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
+ goto out;

/* Remove from old parent's list and insert into new parent's list. */
sysfs_unlink_sibling(sd);
sysfs_get(new_parent_sd);
- drop_nlink(old_parent->d_inode);
sysfs_put(sd->s_parent);
sd->s_parent = new_parent_sd;
- inc_nlink(new_parent->d_inode);
sysfs_link_sibling(sd);

- out_unlock:
+ error = 0;
+out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&new_parent->d_inode->i_mutex);
- mutex_unlock(&old_parent->d_inode->i_mutex);
- out:
- dput(new_parent);
- dput(old_dentry);
- dput(new_dentry);
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 2dcafe8..082a3eb 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -205,17 +205,6 @@ static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
inode->i_ctime = iattr->ia_ctime;
}

-
-/*
- * sysfs has a different i_mutex lock order behavior for i_mutex than other
- * filesystems; sysfs i_mutex is called in many places with subsystem locks
- * held. At the same time, many of the VFS locking rules do not apply to
- * sysfs at all (cross directory rename for example). To untangle this mess
- * (which gives false positives in lockdep), we're giving sysfs inodes their
- * own class for i_mutex.
- */
-static struct lock_class_key sysfs_inode_imutex_key;
-


static int sysfs_count_nlink(struct sysfs_dirent *sd)

{
struct sysfs_dirent *child;
@@ -268,7 +257,6 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)


inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;

- lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

set_default_inode_attr(inode, sd->s_mode);
sysfs_refresh_inode(sd, inode);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 90b3501..98a15bf 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -103,7 +103,6 @@ extern struct kmem_cache *sysfs_dir_cachep;
* dir.c
*/
extern struct mutex sysfs_mutex;
-extern struct mutex sysfs_rename_mutex;
extern spinlock_t sysfs_assoc_lock;

extern const struct file_operations sysfs_dir_operations;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index ec0f607..0289467 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -76,7 +76,6 @@ extern struct file *nameidata_to_filp(struct nameidata *nd, int flags);
extern void release_open_intent(struct nameidata *);

extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
-extern struct dentry *lookup_one_noperm(const char *, struct dentry *);

extern int follow_down(struct path *);
extern int follow_up(struct path *);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:58:22 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

These two functions do 90% of the same work and it doesn't significantly
obfuscate the function to allow both the parent dir and the name to change
at the same time. So merge them together to simplify maintenance, and
increase testing.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 62 +++++++++++++++++++++++++----------------------------
fs/sysfs/sysfs.h | 3 ++
2 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 0b60212..e1a86d1 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -760,30 +760,42 @@ void sysfs_remove_dir(struct kobject * kobj)
__sysfs_remove_dir(sd);
}

-int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name)
{
- struct sysfs_dirent *sd = kobj->sd;


const char *dup_name = NULL;
int error;

mutex_lock(&sysfs_mutex);

error = 0;
- if (strcmp(sd->s_name, new_name) == 0)
+ if ((sd->s_parent == new_parent_sd) &&
+ (strcmp(sd->s_name, new_name) == 0))


goto out; /* nothing to rename */

error = -EEXIST;
- if (sysfs_find_dirent(sd->s_parent, new_name))
+ if (sysfs_find_dirent(new_parent_sd, new_name))


goto out;

/* rename sysfs_dirent */

- error = -ENOMEM;
- new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
- if (!new_name)
- goto out;
+ if (strcmp(sd->s_name, new_name) != 0) {
+ error = -ENOMEM;
+ new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
+ if (!new_name)
+ goto out;
+
+ dup_name = sd->s_name;
+ sd->s_name = new_name;
+ }

- dup_name = sd->s_name;
- sd->s_name = new_name;
+ /* Remove from old parent's list and insert into new parent's list. */
+ if (sd->s_parent != new_parent_sd) {
+ sysfs_unlink_sibling(sd);
+ sysfs_get(new_parent_sd);
+ sysfs_put(sd->s_parent);
+ sd->s_parent = new_parent_sd;
+ sysfs_link_sibling(sd);
+ }

error = 0;
out:
@@ -792,37 +804,21 @@ int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
return error;
}

+int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+{
+ return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
+}
+

int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;

- int error;

BUG_ON(!sd->s_parent);
-
- mutex_lock(&sysfs_mutex);
- new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
+ new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
new_parent_kobj->sd : &sysfs_root;

- error = 0;
- if (sd->s_parent == new_parent_sd)
- goto out; /* nothing to move */
-
- error = -EEXIST;
- if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out;
-
- /* Remove from old parent's list and insert into new parent's list. */
- sysfs_unlink_sibling(sd);
- sysfs_get(new_parent_sd);
- sysfs_put(sd->s_parent);
- sd->s_parent = new_parent_sd;
- sysfs_link_sibling(sd);


-
- error = 0;

-out:
- mutex_unlock(&sysfs_mutex);
- return error;
+ return sysfs_rename(sd, new_parent_sd, sd->s_name);
}

/* Relationship between s_mode and the DT_xxx types */
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 98a15bf..ca52e7b 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -130,6 +130,9 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
struct sysfs_dirent **p_sd);
void sysfs_remove_subdir(struct sysfs_dirent *sd);

+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name);
+
static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
{
if (sd) {
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:58:49 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
From: Eric W. Biederman <ebie...@aristanetworks.com>

With lazy inode updates and dentry operations bringing everything
into sync on demand there is no longer any need to immediately
update the vfs or grab i_mutex to protect those updates as we
make changes to sysfs.

Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 91 ++---------------------------------------------------
fs/sysfs/sysfs.h | 2 -
2 files changed, 4 insertions(+), 89 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 25d052a..a05b027 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -386,12 +386,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
return NULL;
}

-static int sysfs_ilookup_test(struct inode *inode, void *arg)
-{
- struct sysfs_dirent *sd = arg;
- return inode->i_ino == sd->s_ino;
-}
-
/**
* sysfs_addrm_start - prepare for sysfs_dirent add/remove
* @acxt: pointer to sysfs_addrm_cxt to be used
@@ -399,47 +393,20 @@ static int sysfs_ilookup_test(struct inode *inode, void *arg)
*
* This function is called when the caller is about to add or
* remove sysfs_dirent under @parent_sd. This function acquires
- * sysfs_mutex, grabs inode for @parent_sd if available and lock
- * i_mutex of it. @acxt is used to keep and pass context to
+ * sysfs_mutex. @acxt is used to keep and pass context to
* other addrm functions.
*
* LOCKING:
* Kernel thread context (may sleep). sysfs_mutex is locked on
- * return. i_mutex of parent inode is locked on return if
- * available.
+ * return.
*/
void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
struct sysfs_dirent *parent_sd)
{
- struct inode *inode;
-
memset(acxt, 0, sizeof(*acxt));
acxt->parent_sd = parent_sd;

- /* Lookup parent inode. inode initialization is protected by
- * sysfs_mutex, so inode existence can be determined by
- * looking up inode while holding sysfs_mutex.
- */
mutex_lock(&sysfs_mutex);
-
- inode = ilookup5(sysfs_sb, parent_sd->s_ino, sysfs_ilookup_test,
- parent_sd);
- if (inode) {
- WARN_ON(inode->i_state & I_NEW);
-
- /* parent inode available */
- acxt->parent_inode = inode;
-
- /* sysfs_mutex is below i_mutex in lock hierarchy.
- * First, trylock i_mutex. If fails, unlock
- * sysfs_mutex and lock them in order.
- */
- if (!mutex_trylock(&inode->i_mutex)) {
- mutex_unlock(&sysfs_mutex);
- mutex_lock(&inode->i_mutex);
- mutex_lock(&sysfs_mutex);
- }
- }
}

/**
@@ -471,11 +438,6 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sd->s_parent = sysfs_get(acxt->parent_sd);

- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- inc_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-
sysfs_link_sibling(sd);

/* Update timestamps on the parent */
@@ -579,40 +541,6 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
-
- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- drop_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-}
-
-/**
- * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
- * @sd: target sysfs_dirent
- *
- * Decrement nlink for @sd. @sd must have been unlinked from its
- * parent on entry to this function such that it can't be looked
- * up anymore.
- */
-static void sysfs_dec_nlink(struct sysfs_dirent *sd)
-{
- struct inode *inode;
-
- inode = ilookup(sysfs_sb, sd->s_ino);
- if (!inode)
- return;
-
- /* adjust nlink and update timestamp */
- mutex_lock(&inode->i_mutex);
-
- inode->i_ctime = CURRENT_TIME;
- drop_nlink(inode);
- if (sysfs_type(sd) == SYSFS_DIR)
- drop_nlink(inode);
-
- mutex_unlock(&inode->i_mutex);
-
- iput(inode);
}

/**
@@ -621,25 +549,15 @@ static void sysfs_dec_nlink(struct sysfs_dirent *sd)
*
* Finish up sysfs_dirent add/remove. Resources acquired by
* sysfs_addrm_start() are released and removed sysfs_dirents are
- * cleaned up. Timestamps on the parent inode are updated.
+ * cleaned up.
*
* LOCKING:
- * All mutexes acquired by sysfs_addrm_start() are released.
+ * sysfs_mutex is released.
*/
void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
{
/* release resources acquired by sysfs_addrm_start() */
mutex_unlock(&sysfs_mutex);
- if (acxt->parent_inode) {
- struct inode *inode = acxt->parent_inode;
-
- /* if added/removed, update timestamps on the parent */
- if (acxt->cnt)
- inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-
- mutex_unlock(&inode->i_mutex);
- iput(inode);
- }

/* kill removed sysfs_dirents */
while (acxt->removed) {
@@ -648,7 +566,6 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 12ccc07..90b3501 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -89,9 +89,7 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
*/
struct sysfs_addrm_cxt {
struct sysfs_dirent *parent_sd;
- struct inode *parent_inode;
struct sysfs_dirent *removed;
- int cnt;
};

/*
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:58:58 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Using dentry instead of d in the function name is what
several other filesystems are doing and it seems to be
a more readable convention.

Acked-by: Tejun Heo <t...@kernel.org>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index e020183..130dfc3 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,7 +298,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

-static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
+static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{


struct sysfs_dirent * sd = dentry->d_fsdata;

@@ -307,7 +307,7 @@ static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
}

static const struct dentry_operations sysfs_dentry_ops = {
- .d_iput = sysfs_d_iput,
+ .d_iput = sysfs_dentry_iput,
};



struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)

--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:59:06 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Lining up the functions in sysfs_symlink_inode_operations
follows the pattern in the rest of sysfs and makes things
slightly more readable.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/symlink.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index c5081ad..1137418 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -210,10 +210,10 @@ static void sysfs_put_link(struct dentry *dentry, struct nameidata *nd, void *co


}

const struct inode_operations sysfs_symlink_inode_operations = {

- .setxattr = sysfs_setxattr,
- .readlink = generic_readlink,
- .follow_link = sysfs_follow_link,
- .put_link = sysfs_put_link,
+ .setxattr = sysfs_setxattr,
+ .readlink = generic_readlink,
+ .follow_link = sysfs_follow_link,
+ .put_link = sysfs_put_link,
};


--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:59:34 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Currently sysfs updates the timestamps on the vfs directory
inode when we create or remove a directory entry but doesn't
update the cached copy on the sysfs_dirent, fix that oversight.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b5e8499..fa37126 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -464,6 +464,8 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
*/


int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

{
+ struct sysfs_inode_attrs *ps_iattr;
+
if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
return -EEXIST;

@@ -476,6 +478,13 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sysfs_link_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+
return 0;
}

@@ -554,10 +563,19 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
*/


void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

{
+ struct sysfs_inode_attrs *ps_iattr;
+
BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);

sysfs_unlink_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+


sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
--

1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:59:41 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Calling d_drop unconditionally when a sysfs_dirent is deleted has
the potential to leak mounts, so instead implement dentry delete
and revalidate operations that cause sysfs dentries to be removed
at the appropriate time.

Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 73 +++++++++++++++++++++++++++++++++++--------------------
1 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 130dfc3..b5e8499 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,6 +298,46 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

+static int sysfs_dentry_delete(struct dentry *dentry)


+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;

+ return !!(sd->s_flags & SYSFS_FLAG_REMOVED);
+}
+
+static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ int is_dir;
+
+ mutex_lock(&sysfs_mutex);
+
+ /* The sysfs dirent has been deleted */
+ if (sd->s_flags & SYSFS_FLAG_REMOVED)
+ goto out_bad;
+
+ mutex_unlock(&sysfs_mutex);
+out_valid:
+ return 1;
+out_bad:
+ /* Remove the dentry from the dcache hashes.
+ * If this is a deleted dentry we use d_drop instead of d_delete
+ * so sysfs doesn't need to cope with negative dentries.
+ */
+ is_dir = (sysfs_type(sd) == SYSFS_DIR);
+ mutex_unlock(&sysfs_mutex);
+ if (is_dir) {
+ /* If we have submounts we must allow the vfs caches
+ * to lie about the state of the filesystem to prevent
+ * leaks and other nasty things.
+ */
+ if (have_submounts(dentry))
+ goto out_valid;
+ shrink_dcache_parent(dentry);
+ }
+ d_drop(dentry);


+ return 0;
+}
+

static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;

@@ -307,6 +347,8 @@ static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)


}

static const struct dentry_operations sysfs_dentry_ops = {

+ .d_revalidate = sysfs_dentry_revalidate,
+ .d_delete = sysfs_dentry_delete,
.d_iput = sysfs_dentry_iput,
};

@@ -527,44 +569,21 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
}

/**
- * sysfs_drop_dentry - drop dentry for the specified sysfs_dirent
+ * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
* @sd: target sysfs_dirent
*
- * Drop dentry for @sd. @sd must have been unlinked from its
+ * Decrement nlink for @sd. @sd must have been unlinked from its


* parent on entry to this function such that it can't be looked

* up anymore.
*/
-static void sysfs_drop_dentry(struct sysfs_dirent *sd)
+static void sysfs_dec_nlink(struct sysfs_dirent *sd)
{
struct inode *inode;
- struct dentry *dentry;



inode = ilookup(sysfs_sb, sd->s_ino);

if (!inode)
return;

- /* Drop any existing dentries associated with sd.
- *
- * For the dentry to be properly freed we need to grab a
- * reference to the dentry under the dcache lock, unhash it,
- * and then put it. The playing with the dentry count allows
- * dput to immediately free the dentry if it is not in use.
- */
-repeat:
- spin_lock(&dcache_lock);
- list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
- if (d_unhashed(dentry))
- continue;
- dget_locked(dentry);
- spin_lock(&dentry->d_lock);
- __d_drop(dentry);
- spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
- dput(dentry);
- goto repeat;
- }
- spin_unlock(&dcache_lock);


-
/* adjust nlink and update timestamp */

mutex_lock(&inode->i_mutex);

@@ -611,7 +630,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)


acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_drop_dentry(sd);
+ sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:59:50 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
From: Eric W. Biederman <ebie...@aristanetworks.com>

The sysfs_mutex is required to ensure updates are and will remain
atomic with respect to other inode iattr updates, that do not happen
through the filesystem.

Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 41 +++++++++++++++++++++++++++++------------
1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index e28cecf..8a08250 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -122,23 +122,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;
}

+static int sysfs_sd_setsecdata(struct sysfs_dirent *sd, void **secdata, u32 *secdata_len)
+{
+ struct sysfs_inode_attrs *iattrs;
+ void *old_secdata;
+ size_t old_secdata_len;
+
+ iattrs = sd->s_iattr;
+ if (!iattrs)
+ iattrs = sysfs_init_inode_attrs(sd);
+ if (!iattrs)
+ return -ENOMEM;
+
+ old_secdata = iattrs->ia_secdata;
+ old_secdata_len = iattrs->ia_secdata_len;
+
+ iattrs->ia_secdata = *secdata;
+ iattrs->ia_secdata_len = *secdata_len;
+
+ *secdata = old_secdata;
+ *secdata_len = old_secdata_len;


+ return 0;
+}
+

int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags)

{


struct sysfs_dirent *sd = dentry->d_fsdata;

- struct sysfs_inode_attrs *iattrs;
void *secdata;
int error;
u32 secdata_len = 0;

if (!sd)
return -EINVAL;
- if (!sd->s_iattr)
- sd->s_iattr = sysfs_init_inode_attrs(sd);
- if (!sd->s_iattr)
- return -ENOMEM;
-


- iattrs = sd->s_iattr;

if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
@@ -150,12 +166,13 @@ int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
&secdata, &secdata_len);
if (error)
goto out;
- if (iattrs->ia_secdata)
- security_release_secctx(iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- iattrs->ia_secdata = secdata;
- iattrs->ia_secdata_len = secdata_len;

+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setsecdata(sd, &secdata, &secdata_len);
+ mutex_unlock(&sysfs_mutex);
+
+ if (secdata)
+ security_release_secctx(secdata, secdata_len);
} else
return -EINVAL;
out:
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 3, 2009, 6:59:59 AM11/3/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Currently every caller of sysfs_chmod_file happens at either
file creation time to set a non-default mode or in response
to a specific user requested space change in policy. Making
timestamps of when the chmod happens and notification of
a file changing mode uninteresting.

Remove the unnecessary time stamp and filesystem change
notification, and removes the last of the explicit inotify
and donitfy support from sysfs.

Acked-by: Tejun Heo <t...@kernel.org>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/file.c | 10 +---------
1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index f5ea468..faa1a80 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -604,17 +604,9 @@ int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
mutex_lock(&inode->i_mutex);



newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);

- newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- newattrs.ia_ctime = current_fs_time(inode->i_sb);
+ newattrs.ia_valid = ATTR_MODE;
rc = sysfs_setattr(victim, &newattrs);

- if (rc == 0) {
- fsnotify_change(victim, newattrs.ia_valid);
- mutex_lock(&sysfs_mutex);
- victim_sd->s_mode = newattrs.ia_mode;
- mutex_unlock(&sysfs_mutex);
- }
-
mutex_unlock(&inode->i_mutex);
out:
dput(victim);
--
1.6.5.2.143.g8cc62

Serge E. Hallyn

unread,
Nov 3, 2009, 8:48:56 AM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman

You're ignoring the potential -ENOMEM return value here?

Worse, if -ENOMEM, then secdata was never set, so you call
security_release_secctx() on a random value left on the stack...

> + mutex_unlock(&sysfs_mutex);
> +
> + if (secdata)
> + security_release_secctx(secdata, secdata_len);
> } else
> return -EINVAL;
> out:
> --
> 1.6.5.2.143.g8cc62
>
> --

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Eric W. Biederman

unread,
Nov 3, 2009, 4:09:28 PM11/3/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman

No. It is more elegant than that.
If sysfs_sd_setsecdata fails secdata holds the freshly allocated secdata value.
If sysfs_sd_setsecdata succeeds secdata holds the old value of secdata.

Eric

Serge E. Hallyn

unread,
Nov 3, 2009, 4:31:54 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman

Gah, sorry.

Acked-by: Serge Hallyn <se...@us.ibm.com>

(as an aside, I can't see any reason to not just return if strncmp(name,
XATTR_SECURITY_PREFIX,) above to avoid a level of indent and needless gotos)

thanks,
-serge

Eric W. Biederman

unread,
Nov 3, 2009, 5:14:21 PM11/3/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:
>
> Gah, sorry.

No prob.

> Acked-by: Serge Hallyn <se...@us.ibm.com>
>
> (as an aside, I can't see any reason to not just return if strncmp(name,
> XATTR_SECURITY_PREFIX,) above to avoid a level of indent and needless gotos)

Sounds like it is worth a patch. For now I'm concentrating on getting
this patchset merged, and extraneous cleanups have a way of dragging
the process out beyond what time I have to give.

Eric

Serge E. Hallyn

unread,
Nov 3, 2009, 5:27:57 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Calling d_drop unconditionally when a sysfs_dirent is deleted has
> the potential to leak mounts, so instead implement dentry delete
> and revalidate operations that cause sysfs dentries to be removed
> at the appropriate time.
>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

I like where it's heading (and spent some time staring at the
related code, looks kosher).

Acked-by: Serge Hallyn <se...@us.ibm.com>

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 9:43:47 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Currently every caller of sysfs_chmod_file happens at either
> file creation time to set a non-default mode or in response
> to a specific user requested space change in policy. Making
> timestamps of when the chmod happens and notification of
> a file changing mode uninteresting.

But these changes can occur by togging values in sysfs files
(i.e. f71805f.c), right? Is this (specifically not doing inotify)
definately uncontroversial?

I can't exactly picture an admin sitting there watching
nautilus for a sysfs file to become writeable, but could
imagine some site's automation getting hung... Or am I way
off base?

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 9:57:30 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> The granularity of sysfs time when we keep it is 1 ns. Which
> when passed to timestamp_trunc results in a nop. So remove
> the unnecessary function call making sysfs_setattr slightly
> easier to read.
>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Serge Hallyn <se...@us.ibm.com>

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 10:16:47 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):

Was it a bug that before this patch this wasn't cleared before the
actual inode_setattr()?

Since the S_ISGID will be set for the *new* gid, that is,
iattr->ia_gid, shouldn't the user be required to be
in_group_p(iattr->i_gid)? Note you haven't done the
inode_setattr() yet.

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Eric W. Biederman

unread,
Nov 3, 2009, 10:43:19 PM11/3/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebie...@xmission.com):
>> From: Eric W. Biederman <ebie...@xmission.com>
>>
>> Currently every caller of sysfs_chmod_file happens at either
>> file creation time to set a non-default mode or in response
>> to a specific user requested space change in policy. Making
>> timestamps of when the chmod happens and notification of
>> a file changing mode uninteresting.
>
> But these changes can occur by togging values in sysfs files
> (i.e. f71805f.c), right? Is this (specifically not doing inotify)
> definately uncontroversial?

The fs_notify_change was not introduced to deliberately support
a feature but as a side effect of other cleanups. So there
is no indication that anyone cares about inotify support.

> I can't exactly picture an admin sitting there watching
> nautilus for a sysfs file to become writeable, but could
> imagine some site's automation getting hung... Or am I way
> off base?

I would be stunned if the shell script in the automation that writes
to a sysfs file to make things writeable doesn't on it's next line
kick off whatever needs it to be writable.

With no benefit to using inotify and with only a handful of sysfs
files affected I don't expect this change to break anything in
userspace and I have been happily running with it for a year or so on
all of our machines at work with no one problems.

The reason I am making the change is that the goal of this patchset is
to get sysfs to act like any other distributed filesystem in linux,
and to use the same interfaces in roughly the same ways as other
distributed filesystems. Unfortunately there is not a good interface
for distributed filesystems to support inotify or I would use it.

Eric

Eric W. Biederman

unread,
Nov 3, 2009, 10:50:13 PM11/3/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:

Not strictly as inode_setattr performs the exact same check.

> Since the S_ISGID will be set for the *new* gid, that is,
> iattr->ia_gid, shouldn't the user be required to be
> in_group_p(iattr->i_gid)? Note you haven't done the
> inode_setattr() yet.

Interesting point, yes I am potentially testing the wrong gid
there. The dangers of moving code around.

Thank you for catching that. !#@#!#

Eric

Serge E. Hallyn

unread,
Nov 3, 2009, 10:54:28 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Currently sysfs updates the timestamps on the vfs directory
> inode when we create or remove a directory entry but doesn't
> update the cached copy on the sysfs_dirent, fix that oversight.

confused... why not do this in sysfs_addrm_finish()?

I guess you'd have to do at it at top before dropping sysfs_mutex
so it wouldn't be as pretty as I was thinking, but at least you
could just do it once.

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Eric W. Biederman

unread,
Nov 3, 2009, 11:12:27 PM11/3/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebie...@xmission.com):
>> From: Eric W. Biederman <ebie...@xmission.com>
>>
>> Currently sysfs updates the timestamps on the vfs directory
>> inode when we create or remove a directory entry but doesn't
>> update the cached copy on the sysfs_dirent, fix that oversight.
>
> confused... why not do this in sysfs_addrm_finish()?
>
> I guess you'd have to do at it at top before dropping sysfs_mutex
> so it wouldn't be as pretty as I was thinking, but at least you
> could just do it once.

Well sysfs_addrm_finish doesn't really know if you did anything.

Beyond that my ultimate goal is to kill sysfs_addrm_start and
sysfs_addrm_finish. Of course that requires fixing all of the
sysfs users that depend on the impossible to get right recursive
directory removal in sysfs, so it is not the subject of this patchset.

Eric

Serge E. Hallyn

unread,
Nov 3, 2009, 11:21:01 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):

So the inode->i_mutex is not needed?

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 11:23:59 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Now that sysfs_getattr and sysfs_permission refresh the vfs
> inode there is no need to immediatly push the mode change
> into the vfs cache. Reducing the amount of work needed and
> simplifying the locking.
>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Nice.

Acked-by: Serge Hallyn <se...@us.ibm.com>

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 11:33:03 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@aristanetworks.com>
>
> With lazy inode updates and dentry operations bringing everything
> into sync on demand there is no longer any need to immediately
> update the vfs or grab i_mutex to protect those updates as we
> make changes to sysfs.
>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Serge Hallyn <se...@us.ibm.com>

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 11:33:51 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Using dentry instead of d in the function name is what
> several other filesystems are doing and it seems to be
> a more readable convention.
>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Serge Hallyn <se...@us.ibm.com>

> ---
> fs/sysfs/dir.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e020183..130dfc3 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -298,7 +298,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
> goto repeat;
> }
>
> -static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
> +static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
> {
> struct sysfs_dirent * sd = dentry->d_fsdata;
>
> @@ -307,7 +307,7 @@ static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
> }
>
> static const struct dentry_operations sysfs_dentry_ops = {
> - .d_iput = sysfs_d_iput,
> + .d_iput = sysfs_dentry_iput,
> };
>
> struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
> --
> 1.6.5.2.143.g8cc62
>
> --

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 11:34:05 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Lining up the functions in sysfs_symlink_inode_operations
> follows the pattern in the rest of sysfs and makes things
> slightly more readable.
>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Serge Hallyn <se...@us.ibm.com>

> ---
> fs/sysfs/symlink.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
> index c5081ad..1137418 100644
> --- a/fs/sysfs/symlink.c
> +++ b/fs/sysfs/symlink.c
> @@ -210,10 +210,10 @@ static void sysfs_put_link(struct dentry *dentry, struct nameidata *nd, void *co
> }
>
> const struct inode_operations sysfs_symlink_inode_operations = {
> - .setxattr = sysfs_setxattr,
> - .readlink = generic_readlink,
> - .follow_link = sysfs_follow_link,
> - .put_link = sysfs_put_link,
> + .setxattr = sysfs_setxattr,
> + .readlink = generic_readlink,
> + .follow_link = sysfs_follow_link,
> + .put_link = sysfs_put_link,
> };
>
>
> --
> 1.6.5.2.143.g8cc62
>
> --

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 3, 2009, 11:56:00 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman

Ok - I personally agree, but I know there are admins out there with
very different mindsets from mine

Acked-by: Serge Hallyn <se...@us.ibm.com>

-serge

Serge E. Hallyn

unread,
Nov 3, 2009, 11:58:33 PM11/3/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> "Serge E. Hallyn" <se...@us.ibm.com> writes:
>
> > Quoting Eric W. Biederman (ebie...@xmission.com):
> >> From: Eric W. Biederman <ebie...@xmission.com>
> >>
> >> Currently sysfs updates the timestamps on the vfs directory
> >> inode when we create or remove a directory entry but doesn't
> >> update the cached copy on the sysfs_dirent, fix that oversight.
> >
> > confused... why not do this in sysfs_addrm_finish()?
> >
> > I guess you'd have to do at it at top before dropping sysfs_mutex
> > so it wouldn't be as pretty as I was thinking, but at least you
> > could just do it once.
>
> Well sysfs_addrm_finish doesn't really know if you did anything.

Oh right - well it used to through cnt right? but not after your
last patch.

> Beyond that my ultimate goal is to kill sysfs_addrm_start and
> sysfs_addrm_finish. Of course that requires fixing all of the
> sysfs users that depend on the impossible to get right recursive
> directory removal in sysfs, so it is not the subject of this patchset.

I didn't see the patch nixing inode->i_mtime (and cnt) changing from
sysfs_addrm_finish() until after responding. Got it now.

Acked-by: Serge Hallyn <se...@us.ibm.com>

-serge

Eric W. Biederman

unread,
Nov 4, 2009, 12:50:45 AM11/4/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:

> So the inode->i_mutex is not needed?

Good question. Nothing in sysfs needs it. The VFS does not grab the
inode mutex on this path, but the vfs does grab the inode mutex when
writing to the inode.

Since the VFs isn't grabbing the inode_mutex there is probably a race in
here somewhere if someone looks at things just right.

I am too tired tonight to be that person.

Eric

Eric W. Biederman

unread,
Nov 4, 2009, 7:04:25 AM11/4/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn

inode_change_ok already clears the SGID bit when necessary so there is
no reason for sysfs_setattr to carry code to do the same, and it is
good to kill the extra copy because when I moved the code, I goofed
and in certain corner cases the code will look at the wrong gid.

Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

---
fs/sysfs/inode.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 72e2e99..e2595a7 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -120,10 +120,6 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;



iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */

- if (iattr->ia_valid & ATTR_MODE) {


- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))

- iattr->ia_mode &= ~S_ISGID;


- }

error = inode_setattr(inode, iattr);

if (error)
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 4, 2009, 7:05:27 AM11/4/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn

In general everything that writes to vfs inodes holds the
inode mutex, so hold the inode mutex over sysfs_refresh_inode.
The sysfs data structures don't need this but it looks like the
vfs might.

Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index e2595a7..ad549f5 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -240,9 +240,11 @@ int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *sta


struct sysfs_dirent *sd = dentry->d_fsdata;

struct inode *inode = dentry->d_inode;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

generic_fillattr(inode, stat);
return 0;
@@ -353,9 +355,11 @@ int sysfs_permission(struct inode *inode, int mask)
{


struct sysfs_dirent *sd = inode->i_private;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

return generic_permission(inode, mask, NULL);

Eric W. Biederman

unread,
Nov 4, 2009, 7:57:14 AM11/4/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman

Turns out it wasn't as bad as it looks because inode_change_ok already
does what I am trying to do properly.

Although come to think of it, SGID on a sysfs file is completely bogus.
We don't exec files and we don't allow users to create anything.

I have already sent the follow up patch to remove those few lines of code.

Serge E. Hallyn

unread,
Nov 4, 2009, 9:24:40 AM11/4/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> "Serge E. Hallyn" <se...@us.ibm.com> writes:
>
> > So the inode->i_mutex is not needed?
>
> Good question. Nothing in sysfs needs it. The VFS does not grab the
> inode mutex on this path, but the vfs does grab the inode mutex when
> writing to the inode.

All callers of fs/attr.c:notify_change() do seem to take the i_mutex,
though. And Documentation/filesystem/Locking claims that ->setattr()
does need i_mutex. So I assume that setting of inode->i_ctime etc,
which is what you're doing here, needs to be protected by the i_mutex.

> Since the VFs isn't grabbing the inode_mutex there is probably a race in
> here somewhere if someone looks at things just right.
>
> I am too tired tonight to be that person.

The readers take no lock of any sort (i.e. generic_fillattr and its
callers) so IIUC they could get inconsistent data...

-serge

Serge E. Hallyn

unread,
Nov 4, 2009, 9:25:51 AM11/4/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
Quoting Eric W. Biederman (ebie...@xmission.com):
>
> inode_change_ok already clears the SGID bit when necessary so there is
> no reason for sysfs_setattr to carry code to do the same, and it is
> good to kill the extra copy because when I moved the code, I goofed
> and in certain corner cases the code will look at the wrong gid.
>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Serge Hallyn <se...@us.ibm.com>

> ---
> fs/sysfs/inode.c | 4 ----
> 1 files changed, 0 insertions(+), 4 deletions(-)
>
> diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
> index 72e2e99..e2595a7 100644
> --- a/fs/sysfs/inode.c
> +++ b/fs/sysfs/inode.c
> @@ -120,10 +120,6 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
> return error;
>
> iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
> - if (iattr->ia_valid & ATTR_MODE) {
> - if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
> - iattr->ia_mode &= ~S_ISGID;
> - }
>
> error = inode_setattr(inode, iattr);
> if (error)
> --
> 1.6.5.2.143.g8cc62
>
> --

> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Serge E. Hallyn

unread,
Nov 4, 2009, 9:27:45 AM11/4/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
Quoting Eric W. Biederman (ebie...@xmission.com):
>
> In general everything that writes to vfs inodes holds the
> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
> The sysfs data structures don't need this but it looks like the
> vfs might.
>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Oh right so pls disregard my last reply to patch 9 :)


Acked-by: Serge Hallyn <se...@us.ibm.com>

Eric W. Biederman

unread,
Nov 4, 2009, 3:51:36 PM11/4/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebie...@xmission.com):
>>
>> In general everything that writes to vfs inodes holds the
>> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
>> The sysfs data structures don't need this but it looks like the
>> vfs might.
>>
>> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
>
> Oh right so pls disregard my last reply to patch 9 :)

I also checked and nfs has the same basic structure and does take the
inode mutex on these paths, inside of nfs_refresh_inode which is
called from __nfs_revalidate_inode.

So it is definitely the consistent thing to do.

Eric

Serge E. Hallyn

unread,
Nov 4, 2009, 4:50:10 PM11/4/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> By teaching sysfs_revalidate to hide a dentry for
> a sysfs_dirent if the sysfs_dirent has been renamed,
> and by teaching sysfs_lookup to return the original
> dentry if the sysfs dirent has been renamed. I can
> show the results of renames correctly without having to
> update the dcache during the directory rename.
>
> This massively simplifies the rename logic allowing a lot
> of weird sysfs special cases to be removed along with
> a lot of now unnecesary helper code.
>
> Acked-by: Tejun Heo <t...@kernel.org>

> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Patch looks *great*, except:

..

> @@ -315,6 +274,14 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)


> if (sd->s_flags & SYSFS_FLAG_REMOVED)

> goto out_bad;
>
> + /* The sysfs dirent has been moved? */
> + if (dentry->d_parent->d_fsdata != sd->s_parent)
> + goto out_bad;
> +
> + /* The sysfs dirent has been renamed */
> + if (strcmp(dentry->d_name.name, sd->s_name) != 0)
> + goto out_bad;
> +
> mutex_unlock(&sysfs_mutex);
> out_valid:
> return 1;
> @@ -322,6 +289,12 @@ out_bad:


> /* Remove the dentry from the dcache hashes.

> * If this is a deleted dentry we use d_drop instead of d_delete

> * so sysfs doesn't need to cope with negative dentries.
> + *

> + * If this is a dentry that has simply been renamed we
> + * use d_drop to remove it from the dcache lookup on its
> + * old parent. If this dentry persists later when a lookup
> + * is performed at its new name the dentry will be readded
> + * to the dcache hashes.
> */


> is_dir = (sysfs_type(sd) == SYSFS_DIR);

> mutex_unlock(&sysfs_mutex);

After this, if (is_dir) and (have_submounts(dentry)) then you'll
still goto out_valid and return 1. Is that what you want?

-serge

Eric W. Biederman

unread,
Nov 4, 2009, 4:59:57 PM11/4/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebie...@xmission.com):
>> From: Eric W. Biederman <ebie...@xmission.com>
>>
>> By teaching sysfs_revalidate to hide a dentry for
>> a sysfs_dirent if the sysfs_dirent has been renamed,
>> and by teaching sysfs_lookup to return the original
>> dentry if the sysfs dirent has been renamed. I can
>> show the results of renames correctly without having to
>> update the dcache during the directory rename.
>>
>> This massively simplifies the rename logic allowing a lot
>> of weird sysfs special cases to be removed along with
>> a lot of now unnecesary helper code.
>>
>> Acked-by: Tejun Heo <t...@kernel.org>
>> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
>
> Patch looks *great*, except:
>

> ...

It isn't what I want but it is what the VFS requires. If let the vfs
continue on it's delusional state we will leak the vfs mount and
everything mounted on top of it, with no way to remove the mounts.

Eric

Serge E. Hallyn

unread,
Nov 4, 2009, 5:20:10 PM11/4/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> These two functions do 90% of the same work and it doesn't significantly
> obfuscate the function to allow both the parent dir and the name to change
> at the same time. So merge them together to simplify maintenance, and
> increase testing.

>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Based on just this patchset it seems sysfs_rename() could be static,
but since it isn't static I assume your later patchset will use it
outside fs/sysfs/dir.c?

Acked-by: Serge Hallyn <se...@us.ibm.com>


> ---
> fs/sysfs/dir.c | 62 +++++++++++++++++++++++++----------------------------
> fs/sysfs/sysfs.h | 3 ++
> 2 files changed, 32 insertions(+), 33 deletions(-)
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 0b60212..e1a86d1 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -760,30 +760,42 @@ void sysfs_remove_dir(struct kobject * kobj)
> __sysfs_remove_dir(sd);
> }
>
> -int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
> +int sysfs_rename(struct sysfs_dirent *sd,
> + struct sysfs_dirent *new_parent_sd, const char *new_name)
> {
> - struct sysfs_dirent *sd = kobj->sd;
> const char *dup_name = NULL;
> int error;
>
> mutex_lock(&sysfs_mutex);
>
> error = 0;
> - if (strcmp(sd->s_name, new_name) == 0)
> + if ((sd->s_parent == new_parent_sd) &&
> + (strcmp(sd->s_name, new_name) == 0))
> goto out; /* nothing to rename */
>
> error = -EEXIST;
> - if (sysfs_find_dirent(sd->s_parent, new_name))
> + if (sysfs_find_dirent(new_parent_sd, new_name))
> goto out;
>
> /* rename sysfs_dirent */
> - error = -ENOMEM;
> - new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
> - if (!new_name)
> - goto out;
> + if (strcmp(sd->s_name, new_name) != 0) {
> + error = -ENOMEM;
> + new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
> + if (!new_name)
> + goto out;
> +
> + dup_name = sd->s_name;
> + sd->s_name = new_name;
> + }
>
> - dup_name = sd->s_name;
> - sd->s_name = new_name;
> + /* Remove from old parent's list and insert into new parent's list. */
> + if (sd->s_parent != new_parent_sd) {
> + sysfs_unlink_sibling(sd);
> + sysfs_get(new_parent_sd);
> + sysfs_put(sd->s_parent);
> + sd->s_parent = new_parent_sd;
> + sysfs_link_sibling(sd);
> + }
>
> error = 0;
> out:
> @@ -792,37 +804,21 @@ int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
> return error;
> }
>
> +int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
> +{
> + return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
> +}
> +
> int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
> {
> struct sysfs_dirent *sd = kobj->sd;
> struct sysfs_dirent *new_parent_sd;
> - int error;
>
> BUG_ON(!sd->s_parent);
> -
> - mutex_lock(&sysfs_mutex);
> - new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
> + new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
> new_parent_kobj->sd : &sysfs_root;
>
> - error = 0;
> - if (sd->s_parent == new_parent_sd)
> - goto out; /* nothing to move */
> -
> - error = -EEXIST;
> - if (sysfs_find_dirent(new_parent_sd, sd->s_name))
> - goto out;
> -
> - /* Remove from old parent's list and insert into new parent's list. */
> - sysfs_unlink_sibling(sd);
> - sysfs_get(new_parent_sd);
> - sysfs_put(sd->s_parent);
> - sd->s_parent = new_parent_sd;
> - sysfs_link_sibling(sd);
> -
> - error = 0;
> -out:
> - mutex_unlock(&sysfs_mutex);
> - return error;
> + return sysfs_rename(sd, new_parent_sd, sd->s_name);
> }
>
> /* Relationship between s_mode and the DT_xxx types */
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 98a15bf..ca52e7b 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -130,6 +130,9 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
> struct sysfs_dirent **p_sd);
> void sysfs_remove_subdir(struct sysfs_dirent *sd);
>
> +int sysfs_rename(struct sysfs_dirent *sd,
> + struct sysfs_dirent *new_parent_sd, const char *new_name);
> +
> static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
> {
> if (sd) {
> --
> 1.6.5.2.143.g8cc62
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in

Eric W. Biederman

unread,
Nov 4, 2009, 5:24:10 PM11/4/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebie...@xmission.com):
>> From: Eric W. Biederman <ebie...@xmission.com>
>>
>> These two functions do 90% of the same work and it doesn't significantly
>> obfuscate the function to allow both the parent dir and the name to change
>> at the same time. So merge them together to simplify maintenance, and
>> increase testing.
>>
>> Acked-by: Tejun Heo <t...@kernel.org>
>> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
>
> Based on just this patchset it seems sysfs_rename() could be static,
> but since it isn't static I assume your later patchset will use it
> outside fs/sysfs/dir.c?

To implement sysfs_rename_link, but that is too much to digest/review/push
all at once.

I have a snapshot of my development tree up on kernel.org if you are
interested.

Eric

Serge E. Hallyn

unread,
Nov 4, 2009, 5:37:44 PM11/4/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> "Serge E. Hallyn" <se...@us.ibm.com> writes:
>
> > Quoting Eric W. Biederman (ebie...@xmission.com):
> >> From: Eric W. Biederman <ebie...@xmission.com>
> >>
> >> These two functions do 90% of the same work and it doesn't significantly
> >> obfuscate the function to allow both the parent dir and the name to change
> >> at the same time. So merge them together to simplify maintenance, and
> >> increase testing.
> >>
> >> Acked-by: Tejun Heo <t...@kernel.org>
> >> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
> >
> > Based on just this patchset it seems sysfs_rename() could be static,
> > but since it isn't static I assume your later patchset will use it
> > outside fs/sysfs/dir.c?
>
> To implement sysfs_rename_link, but that is too much to digest/review/push
> all at once.

Ok, just making sure it'll get used.

To repeat myself,

Acked-by: Serge Hallyn <se...@us.ibm.com>

> I have a snapshot of my development tree up on kernel.org if you are
> interested.

I think this was an appropriately sized set :) Might take a look this
weekend though.

thanks,
-serge

Eric W. Biederman

unread,
Nov 4, 2009, 11:51:39 PM11/4/09
to Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
"Serge E. Hallyn" <se...@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebie...@xmission.com):
>> "Serge E. Hallyn" <se...@us.ibm.com> writes:
>>
>> > Quoting Eric W. Biederman (ebie...@xmission.com):
>> >> From: Eric W. Biederman <ebie...@xmission.com>
>> >>
>> >> These two functions do 90% of the same work and it doesn't significantly
>> >> obfuscate the function to allow both the parent dir and the name to change
>> >> at the same time. So merge them together to simplify maintenance, and
>> >> increase testing.
>> >>
>> >> Acked-by: Tejun Heo <t...@kernel.org>
>> >> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
>> >
>> > Based on just this patchset it seems sysfs_rename() could be static,
>> > but since it isn't static I assume your later patchset will use it
>> > outside fs/sysfs/dir.c?
>>
>> To implement sysfs_rename_link, but that is too much to digest/review/push
>> all at once.
>
> Ok, just making sure it'll get used.
>
> To repeat myself,
>
> Acked-by: Serge Hallyn <se...@us.ibm.com>
>
>> I have a snapshot of my development tree up on kernel.org if you are
>> interested.
>
> I think this was an appropriately sized set :) Might take a look this
> weekend though.

Thanks for the review.

Eric

Greg KH

unread,
Nov 6, 2009, 6:37:50 PM11/6/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>
> The sysfs code updates the vfs caches immediately when the sysfs data
> structures change causing a lot of unnecessary complications. The
> following patchset untangles that beast. Allowing for simpler
> more straight forward code, the removal of a hack from the vfs
> to support sysfs, and human comprehensible locking on sysfs.
>
> Most of these patches have already been reviewed and acked from the
> last time I had time to work on sysfs.
>
> In net the patches look like:

Can you resend these based on the review that you just got, with the new
acks and changes so that I can apply them?

thanks,

greg k-h

Tejun Heo

unread,
Nov 6, 2009, 9:07:30 PM11/6/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Eric W. Biederman wrote:
> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
> index 1137418..c5eff49 100644
> --- a/fs/sysfs/symlink.c
> +++ b/fs/sysfs/symlink.c

> @@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
> .readlink = generic_readlink,
> .follow_link = sysfs_follow_link,
> .put_link = sysfs_put_link,
> + .setattr = sysfs_setattr,

It would be nice either to separate addition of setattr to symlinks
into a separate patch or note it in the description.

Thanks.

--
tejun

Tejun Heo

unread,
Nov 6, 2009, 9:09:10 PM11/6/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Eric W. Biederman wrote:
> From: Eric W. Biederman <ebie...@aristanetworks.com>
>
> With lazy inode updates and dentry operations bringing everything
> into sync on demand there is no longer any need to immediately
> update the vfs or grab i_mutex to protect those updates as we
> make changes to sysfs.
>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Tejun Heo <t...@kernel.org>

Tejun Heo

unread,
Nov 6, 2009, 9:12:19 PM11/6/09
to Eric W. Biederman, Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Hello,

Eric W. Biederman wrote:
> It isn't what I want but it is what the VFS requires. If let the vfs
> continue on it's delusional state we will leak the vfs mount and
> everything mounted on top of it, with no way to remove the mounts.

This is caused by not having any way to prevent deletion on
directories with submounts, right? How does other distributed
filesystems deal with directories with submounts going away underneath
it?

Thanks.

--
tejun

Eric W. Biederman

unread,
Nov 6, 2009, 9:16:52 PM11/6/09
to Tejun Heo, Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Tejun Heo <t...@kernel.org> writes:

> Hello,
>
> Eric W. Biederman wrote:
>> It isn't what I want but it is what the VFS requires. If let the vfs
>> continue on it's delusional state we will leak the vfs mount and
>> everything mounted on top of it, with no way to remove the mounts.
>
> This is caused by not having any way to prevent deletion on
> directories with submounts, right? How does other distributed
> filesystems deal with directories with submounts going away underneath
> it?

NFS does exactly the same thing I am doing.

Eric

Tejun Heo

unread,
Nov 6, 2009, 9:18:56 PM11/6/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn
Eric W. Biederman wrote:
> inode_change_ok already clears the SGID bit when necessary so there is
> no reason for sysfs_setattr to carry code to do the same, and it is
> good to kill the extra copy because when I moved the code, I goofed
> and in certain corner cases the code will look at the wrong gid.
>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Tejun Heo <t...@kernel.org>

I would much prefer this one being prior to 06 but if it's too
painful, please at least note it in the description of 06.

Thanks.

--
tejun

Tejun Heo

unread,
Nov 6, 2009, 9:20:44 PM11/6/09
to Eric W. Biederman, Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Hello,

Eric W. Biederman wrote:
> NFS does exactly the same thing I am doing.

Oh... well, sysfs directories parenting other filesystems are pretty
rare and well defined. Although it's not very pretty, I don't think
we'll see any actual problem there. Thanks for the explanation.

--
tejun

Tejun Heo

unread,
Nov 6, 2009, 9:27:15 PM11/6/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn
Eric W. Biederman wrote:
> In general everything that writes to vfs inodes holds the
> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
> The sysfs data structures don't need this but it looks like the
> vfs might.
>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Tejun Heo <t...@kernel.org>

Sidenote: Hmmm... Originally, sysfs completely depended on vfs locking
but with sysfs_dirent separation, the tree structure itself and some
attributes went under the protection of sysfs_mutex while leaving more
vfs oriented fields under vfs locking. This patchset makes sysfs
lazier so it can't depend on any vfs layer locking. I think you've
converted all necessary places while removing dependency on
dentry/inode from update operations but it might be a good idea to do
a audit pass over how fields are being protected now.

Thanks for your patience.

--
tejun

Tejun Heo

unread,
Nov 6, 2009, 9:27:55 PM11/6/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Eric W. Biederman wrote:
> From: Eric W. Biederman <ebie...@aristanetworks.com>
>
> The sysfs_mutex is required to ensure updates are and will remain
> atomic with respect to other inode iattr updates, that do not happen
> through the filesystem.

>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Acked-by: Tejun Heo <t...@kernel.org>

--

Eric W. Biederman

unread,
Nov 6, 2009, 9:35:23 PM11/6/09
to Tejun Heo, Serge E. Hallyn, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Tejun Heo <t...@kernel.org> writes:

> Hello,
>
> Eric W. Biederman wrote:
>> NFS does exactly the same thing I am doing.
>
> Oh... well, sysfs directories parenting other filesystems are pretty
> rare and well defined. Although it's not very pretty, I don't think
> we'll see any actual problem there. Thanks for the explanation.

To be perfectly clear, if we hit this ugly case. The internal
sysfs_dirent tree is always what it should be. Only the VFS dentry
cache doesn't get updated.

Eric

Miklos Szeredi

unread,
Nov 7, 2009, 6:13:47 AM11/7/09
to Eric W. Biederman, t...@kernel.org, se...@us.ibm.com, gre...@suse.de, kay.s...@vrfy.org, gr...@kroah.com, linux-...@vger.kernel.org, cornel...@de.ibm.com, linux-...@vger.kernel.org, eric.d...@gmail.com, bc...@lhnet.ca, ebie...@aristanetworks.com
On Fri, 06 Nov 2009, ebie...@xmission.com (Eric W. Biederman wrote:
> Tejun Heo <t...@kernel.org> writes:
>
> > Hello,
> >
> > Eric W. Biederman wrote:
> >> It isn't what I want but it is what the VFS requires. If let the vfs
> >> continue on it's delusional state we will leak the vfs mount and
> >> everything mounted on top of it, with no way to remove the mounts.

"umount -l" on the whole thing will clear any submounts up too.

> >
> > This is caused by not having any way to prevent deletion on
> > directories with submounts, right? How does other distributed
> > filesystems deal with directories with submounts going away underneath
> > it?
>
> NFS does exactly the same thing I am doing.

Yes, this is a problem for NFS too. You cannot tell the NFS server
"this directory is mounted on some client, don't let anything happen
to it!". Basically the remaining choices are:

a) let the old path leading up to the mount still be accessible, even
though it doesn't exist anymore on the server (or has been replaced
with something different)

b) automatically dissolve any submounts if the path disappeard on the
server

I think Al was arguing in favor of b), while Linus said that mounts
must never just disappear, so a) is better. I don't think an
agreement was reached.

Thanks,
Miklos

Eric W. Biederman

unread,
Nov 7, 2009, 6:57:54 AM11/7/09
to Miklos Szeredi, t...@kernel.org, se...@us.ibm.com, gre...@suse.de, kay.s...@vrfy.org, gr...@kroah.com, linux-...@vger.kernel.org, cornel...@de.ibm.com, linux-...@vger.kernel.org, eric.d...@gmail.com, bc...@lhnet.ca, ebie...@aristanetworks.com
Miklos Szeredi <mik...@szeredi.hu> writes:

I haven't seen that conversation. I do know it is non-intutive and if
you attempt to delete what is a mount point in another mount namespace
and it won't go away. (What we do for non-distributed filesystems).
So I would favor mount points dissolving if we had the infrastructure.

Regardless the goal for now is to simply catch up with other distributed
filesystems.

Eric

Eric W. Biederman

unread,
Nov 8, 2009, 1:04:31 AM11/8/09
to Tejun Heo, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Tejun Heo <t...@kernel.org> writes:

> Eric W. Biederman wrote:
>> diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
>> index 1137418..c5eff49 100644
>> --- a/fs/sysfs/symlink.c
>> +++ b/fs/sysfs/symlink.c
>> @@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
>> .readlink = generic_readlink,
>> .follow_link = sysfs_follow_link,
>> .put_link = sysfs_put_link,
>> + .setattr = sysfs_setattr,
>
> It would be nice either to separate addition of setattr to symlinks
> into a separate patch or note it in the description.

Good point note added.

Eric

Eric W. Biederman

unread,
Nov 8, 2009, 2:04:23 AM11/8/09
to Tejun Heo, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn
Tejun Heo <t...@kernel.org> writes:

> Eric W. Biederman wrote:
>> In general everything that writes to vfs inodes holds the
>> inode mutex, so hold the inode mutex over sysfs_refresh_inode.
>> The sysfs data structures don't need this but it looks like the
>> vfs might.
>>
>> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
>
> Acked-by: Tejun Heo <t...@kernel.org>
>
> Sidenote: Hmmm... Originally, sysfs completely depended on vfs locking
> but with sysfs_dirent separation, the tree structure itself and some
> attributes went under the protection of sysfs_mutex while leaving more
> vfs oriented fields under vfs locking. This patchset makes sysfs
> lazier so it can't depend on any vfs layer locking. I think you've
> converted all necessary places while removing dependency on
> dentry/inode from update operations but it might be a good idea to do
> a audit pass over how fields are being protected now.

You raised a good point. I took a quick second pass through.
I did not see anything I have missed, and I did not change anything
else on the vfs path.

So at the very least I don't expect there are any locking related
regressions.

Eric

Eric W. Biederman

unread,
Nov 8, 2009, 2:06:32 AM11/8/09
to Greg KH, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
Greg KH <gr...@kroah.com> writes:

> On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>>
>> The sysfs code updates the vfs caches immediately when the sysfs data
>> structures change causing a lot of unnecessary complications. The
>> following patchset untangles that beast. Allowing for simpler
>> more straight forward code, the removal of a hack from the vfs
>> to support sysfs, and human comprehensible locking on sysfs.
>>
>> Most of these patches have already been reviewed and acked from the
>> last time I had time to work on sysfs.
>>
>> In net the patches look like:
>
> Can you resend these based on the review that you just got, with the new
> acks and changes so that I can apply them?

Coming in just a minute. The fixes are the trailing patches.

Eric

Eric W. Biederman

unread,
Nov 8, 2009, 2:25:44 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn

The sysfs code updates the vfs caches immediately when the sysfs data
structures change causing a lot of unnecessary complications. The
following patchset untangles that beast. Allowing for simpler
more straight forward code, the removal of a hack from the vfs
to support sysfs, and human comprehensible locking on sysfs.

Most of these patches have already been reviewed and acked from the
last time I had time to work on sysfs.

This acks have been folded in and the two small bugs found in the
previous review have been fixed in the trailing patches (they are
minor enough nits that even a bisect that happens to land in the
middle should not see sysfs problems).

In net the patches look like:

fs/namei.c | 22 ---
fs/sysfs/dir.c | 388 ++++++++++++++++---------------------------------
fs/sysfs/file.c | 41 +----
fs/sysfs/inode.c | 178 ++++++++++++++---------
fs/sysfs/symlink.c | 11 +-
fs/sysfs/sysfs.h | 9 +-
include/linux/namei.h | 1 -
7 files changed, 256 insertions(+), 394 deletions(-)

Eric W. Biederman

unread,
Nov 8, 2009, 2:27:40 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Using dentry instead of d in the function name is what
several other filesystems are doing and it seems to be
a more readable convention.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

---
fs/sysfs/dir.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index e020183..130dfc3 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,7 +298,7 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

-static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
+static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;

@@ -307,7 +307,7 @@ static void sysfs_d_iput(struct dentry * dentry, struct inode * inode)
}

static const struct dentry_operations sysfs_dentry_ops = {
- .d_iput = sysfs_d_iput,
+ .d_iput = sysfs_dentry_iput,
};

struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:27:49 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Currently every caller of sysfs_chmod_file happens at either
file creation time to set a non-default mode or in response
to a specific user requested space change in policy. Making
timestamps of when the chmod happens and notification of
a file changing mode uninteresting.

Remove the unnecessary time stamp and filesystem change
notification, and removes the last of the explicit inotify
and donitfy support from sysfs.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/file.c | 10 +---------
1 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index f5ea468..faa1a80 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -604,17 +604,9 @@ int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
mutex_lock(&inode->i_mutex);

newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
- newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- newattrs.ia_ctime = current_fs_time(inode->i_sb);
+ newattrs.ia_valid = ATTR_MODE;
rc = sysfs_setattr(victim, &newattrs);

- if (rc == 0) {
- fsnotify_change(victim, newattrs.ia_valid);
- mutex_lock(&sysfs_mutex);
- victim_sd->s_mode = newattrs.ia_mode;
- mutex_unlock(&sysfs_mutex);
- }
-
mutex_unlock(&inode->i_mutex);
out:
dput(victim);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:27:57 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@maxwell.aristanetworks.com>

In general everything that writes to vfs inodes holds the
inode mutex, so hold the inode mutex over sysfs_refresh_inode.
The sysfs data structures don't need this but it looks like the
vfs might.

Acked-by: Serge Hallyn <se...@us.ibm.com>
Acked-by: Tejun Heo <t...@kernel.org>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 8197e1a..75516cd 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -237,9 +237,11 @@ int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *sta
struct sysfs_dirent *sd = dentry->d_fsdata;
struct inode *inode = dentry->d_inode;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

generic_fillattr(inode, stat);
return 0;
@@ -349,9 +351,11 @@ int sysfs_permission(struct inode *inode, int mask)
{
struct sysfs_dirent *sd = inode->i_private;

+ mutex_lock(&inode->i_mutex);
mutex_lock(&sysfs_mutex);
sysfs_refresh_inode(sd, inode);
mutex_unlock(&sysfs_mutex);
+ mutex_unlock(&inode->i_mutex);

return generic_permission(inode, mask, NULL);
}
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:28:07 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@maxwell.aristanetworks.com>

inode_change_ok already clears the SGID bit when necessary


so there is no reason for sysfs_setattr to carry code to do
the same, and it is good to kill the extra copy because when

I moved the code last in certain corner cases the code will


look at the wrong gid.

Acked-by: Serge Hallyn <se...@us.ibm.com>


Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 082a3eb..8197e1a 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -117,10 +117,6 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;

iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
- if (iattr->ia_valid & ATTR_MODE) {
- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
- iattr->ia_mode &= ~S_ISGID;
- }

error = inode_setattr(inode, iattr);
if (error)
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:28:15 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

With the implementation of sysfs_getattr and sysfs_permission
sysfs becomes able to lazily propogate inode attribute changes
from the sysfs_dirents to the vfs inodes. This paves the way
for deleting significant chunks of now unnecessary code.

While doing this we did not reference sysfs_setattr from
sysfs_symlink_inode_operations so I added along with
sysfs_getattr and sysfs_permission.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>

Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 2 +
fs/sysfs/inode.c | 64 ++++++++++++++++++++++++++++++++++++++-------------
fs/sysfs/symlink.c | 3 ++
fs/sysfs/sysfs.h | 2 +
4 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index fa37126..25d052a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -800,7 +800,9 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,

const struct inode_operations sysfs_dir_inode_operations = {
.lookup = sysfs_lookup,
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fccfb55..2dcafe8 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -37,7 +37,9 @@ static struct backing_dev_info sysfs_backing_dev_info = {
};

static const struct inode_operations sysfs_inode_operations ={
+ .permission = sysfs_permission,
.setattr = sysfs_setattr,
+ .getattr = sysfs_getattr,
.setxattr = sysfs_setxattr,
};

@@ -196,7 +198,6 @@ static inline void set_default_inode_attr(struct inode * inode, mode_t mode)

static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
{
- inode->i_mode = iattr->ia_mode;
inode->i_uid = iattr->ia_uid;
inode->i_gid = iattr->ia_gid;
inode->i_atime = iattr->ia_atime;
@@ -227,38 +228,56 @@ static int sysfs_count_nlink(struct sysfs_dirent *sd)
return nr + 2;
}

+static void sysfs_refresh_inode(struct sysfs_dirent *sd, struct inode *inode)
+{
+ struct sysfs_inode_attrs *iattrs = sd->s_iattr;
+
+ inode->i_mode = sd->s_mode;
+ if (iattrs) {
+ /* sysfs_dirent has non-default attributes
+ * get them from persistent copy in sysfs_dirent
+ */
+ set_inode_attr(inode, &iattrs->ia_iattr);
+ security_inode_notifysecctx(inode,
+ iattrs->ia_secdata,
+ iattrs->ia_secdata_len);
+ }
+
+ if (sysfs_type(sd) == SYSFS_DIR)
+ inode->i_nlink = sysfs_count_nlink(sd);
+}
+
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;
+ struct inode *inode = dentry->d_inode;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ generic_fillattr(inode, stat);
+ return 0;
+}
+
static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
{
struct bin_attribute *bin_attr;
- struct sysfs_inode_attrs *iattrs;

inode->i_private = sysfs_get(sd);
inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;
- inode->i_ino = sd->s_ino;
lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

- iattrs = sd->s_iattr;
- if (iattrs) {
- /* sysfs_dirent has non-default attributes
- * get them for the new inode from persistent copy
- * in sysfs_dirent
- */
- set_inode_attr(inode, &iattrs->ia_iattr);
- if (iattrs->ia_secdata)
- security_inode_notifysecctx(inode,
- iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- } else
- set_default_inode_attr(inode, sd->s_mode);
+ set_default_inode_attr(inode, sd->s_mode);
+ sysfs_refresh_inode(sd, inode);

/* initialize inode according to type */
switch (sysfs_type(sd)) {
case SYSFS_DIR:
inode->i_op = &sysfs_dir_inode_operations;
inode->i_fop = &sysfs_dir_operations;
- inode->i_nlink = sysfs_count_nlink(sd);
break;
case SYSFS_KOBJ_ATTR:
inode->i_size = PAGE_SIZE;
@@ -341,3 +360,14 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
else
return -ENOENT;
}
+
+int sysfs_permission(struct inode *inode, int mask)
+{
+ struct sysfs_dirent *sd = inode->i_private;
+
+ mutex_lock(&sysfs_mutex);
+ sysfs_refresh_inode(sd, inode);
+ mutex_unlock(&sysfs_mutex);
+
+ return generic_permission(inode, mask, NULL);
+}


diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 1137418..c5eff49 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -214,6 +214,9 @@ const struct inode_operations sysfs_symlink_inode_operations = {
.readlink = generic_readlink,
.follow_link = sysfs_follow_link,
.put_link = sysfs_put_link,
+ .setattr = sysfs_setattr,

+ .getattr = sysfs_getattr,
+ .permission = sysfs_permission,
};


diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index a96d967..12ccc07 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -156,7 +156,9 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);
int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);
+int sysfs_permission(struct inode *inode, int mask);
int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
+int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);
int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:28:22 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@maxwell.aristanetworks.com>

With lazy inode updates and dentry operations bringing everything


into sync on demand there is no longer any need to immediately
update the vfs or grab i_mutex to protect those updates as we
make changes to sysfs.

Acked-by: Serge Hallyn <se...@us.ibm.com>
Acked-by: Tejun Heo <t...@kernel.org>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 91 ++---------------------------------------------------
fs/sysfs/sysfs.h | 2 -
2 files changed, 4 insertions(+), 89 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 25d052a..a05b027 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -386,12 +386,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
return NULL;
}

-static int sysfs_ilookup_test(struct inode *inode, void *arg)
-{
- struct sysfs_dirent *sd = arg;
- return inode->i_ino == sd->s_ino;
-}
-
/**
* sysfs_addrm_start - prepare for sysfs_dirent add/remove
* @acxt: pointer to sysfs_addrm_cxt to be used
@@ -399,47 +393,20 @@ static int sysfs_ilookup_test(struct inode *inode, void *arg)
*
* This function is called when the caller is about to add or
* remove sysfs_dirent under @parent_sd. This function acquires
- * sysfs_mutex, grabs inode for @parent_sd if available and lock
- * i_mutex of it. @acxt is used to keep and pass context to
+ * sysfs_mutex. @acxt is used to keep and pass context to
* other addrm functions.
*
* LOCKING:
* Kernel thread context (may sleep). sysfs_mutex is locked on
- * return. i_mutex of parent inode is locked on return if
- * available.
+ * return.
*/
void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
struct sysfs_dirent *parent_sd)
{
- struct inode *inode;
-
memset(acxt, 0, sizeof(*acxt));
acxt->parent_sd = parent_sd;

- /* Lookup parent inode. inode initialization is protected by
- * sysfs_mutex, so inode existence can be determined by
- * looking up inode while holding sysfs_mutex.
- */
mutex_lock(&sysfs_mutex);
-
- inode = ilookup5(sysfs_sb, parent_sd->s_ino, sysfs_ilookup_test,
- parent_sd);
- if (inode) {
- WARN_ON(inode->i_state & I_NEW);
-
- /* parent inode available */
- acxt->parent_inode = inode;
-
- /* sysfs_mutex is below i_mutex in lock hierarchy.
- * First, trylock i_mutex. If fails, unlock
- * sysfs_mutex and lock them in order.
- */
- if (!mutex_trylock(&inode->i_mutex)) {
- mutex_unlock(&sysfs_mutex);
- mutex_lock(&inode->i_mutex);
- mutex_lock(&sysfs_mutex);
- }
- }
}

/**
@@ -471,11 +438,6 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sd->s_parent = sysfs_get(acxt->parent_sd);

- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- inc_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-
sysfs_link_sibling(sd);

/* Update timestamps on the parent */
@@ -579,40 +541,6 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
-
- if (sysfs_type(sd) == SYSFS_DIR && acxt->parent_inode)
- drop_nlink(acxt->parent_inode);
-
- acxt->cnt++;
-}
-
-/**
- * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
- * @sd: target sysfs_dirent
- *
- * Decrement nlink for @sd. @sd must have been unlinked from its
- * parent on entry to this function such that it can't be looked
- * up anymore.
- */
-static void sysfs_dec_nlink(struct sysfs_dirent *sd)
-{
- struct inode *inode;
-
- inode = ilookup(sysfs_sb, sd->s_ino);
- if (!inode)
- return;
-
- /* adjust nlink and update timestamp */
- mutex_lock(&inode->i_mutex);
-
- inode->i_ctime = CURRENT_TIME;
- drop_nlink(inode);
- if (sysfs_type(sd) == SYSFS_DIR)
- drop_nlink(inode);
-
- mutex_unlock(&inode->i_mutex);
-
- iput(inode);
}

/**
@@ -621,25 +549,15 @@ static void sysfs_dec_nlink(struct sysfs_dirent *sd)
*
* Finish up sysfs_dirent add/remove. Resources acquired by
* sysfs_addrm_start() are released and removed sysfs_dirents are
- * cleaned up. Timestamps on the parent inode are updated.
+ * cleaned up.
*
* LOCKING:
- * All mutexes acquired by sysfs_addrm_start() are released.
+ * sysfs_mutex is released.
*/
void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
{
/* release resources acquired by sysfs_addrm_start() */
mutex_unlock(&sysfs_mutex);
- if (acxt->parent_inode) {
- struct inode *inode = acxt->parent_inode;
-
- /* if added/removed, update timestamps on the parent */
- if (acxt->cnt)
- inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-
- mutex_unlock(&inode->i_mutex);
- iput(inode);
- }

/* kill removed sysfs_dirents */
while (acxt->removed) {
@@ -648,7 +566,6 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 12ccc07..90b3501 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -89,9 +89,7 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
*/
struct sysfs_addrm_cxt {
struct sysfs_dirent *parent_sd;
- struct inode *parent_inode;
struct sysfs_dirent *removed;
- int cnt;
};

/*
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:28:41 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Lining up the functions in sysfs_symlink_inode_operations
follows the pattern in the rest of sysfs and makes things
slightly more readable.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>

Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/symlink.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index c5081ad..1137418 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -210,10 +210,10 @@ static void sysfs_put_link(struct dentry *dentry, struct nameidata *nd, void *co


}

const struct inode_operations sysfs_symlink_inode_operations = {

- .setxattr = sysfs_setxattr,
- .readlink = generic_readlink,
- .follow_link = sysfs_follow_link,
- .put_link = sysfs_put_link,
+ .setxattr = sysfs_setxattr,
+ .readlink = generic_readlink,
+ .follow_link = sysfs_follow_link,
+ .put_link = sysfs_put_link,
};


--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:28:55 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

By teaching sysfs_revalidate to hide a dentry for
a sysfs_dirent if the sysfs_dirent has been renamed,
and by teaching sysfs_lookup to return the original
dentry if the sysfs dirent has been renamed. I can
show the results of renames correctly without having to
update the dcache during the directory rename.

This massively simplifies the rename logic allowing a lot
of weird sysfs special cases to be removed along with
a lot of now unnecesary helper code.

Acked-by: Tejun Heo <t...@kernel.org>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/namei.c | 22 -------
fs/sysfs/dir.c | 158 ++++++++++---------------------------------------
fs/sysfs/inode.c | 12 ----
fs/sysfs/sysfs.h | 1 -
include/linux/namei.h | 1 -
5 files changed, 32 insertions(+), 162 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d11f404..d3c190c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1279,28 +1279,6 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
return __lookup_hash(&this, base, NULL);
}

-/**
- * lookup_one_noperm - bad hack for sysfs
- * @name: pathname component to lookup
- * @base: base directory to lookup from
- *
- * This is a variant of lookup_one_len that doesn't perform any permission
- * checks. It's a horrible hack to work around the braindead sysfs
- * architecture and should not be used anywhere else.
- *
- * DON'T USE THIS FUNCTION EVER, thanks.
- */
-struct dentry *lookup_one_noperm(const char *name, struct dentry *base)
-{
- int err;
- struct qstr this;
-
- err = __lookup_one_len(name, &this, base, strlen(name));
- if (err)
- return ERR_PTR(err);
- return __lookup_hash(&this, base, NULL);
-}
-
int user_path_at(int dfd, const char __user *name, unsigned flags,
struct path *path)
{
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index a05b027..0b60212 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -25,7 +25,6 @@
#include "sysfs.h"

DEFINE_MUTEX(sysfs_mutex);
-DEFINE_MUTEX(sysfs_rename_mutex);
DEFINE_SPINLOCK(sysfs_assoc_lock);

static DEFINE_SPINLOCK(sysfs_ino_lock);
@@ -85,46 +84,6 @@ static void sysfs_unlink_sibling(struct sysfs_dirent *sd)
}

/**
- * sysfs_get_dentry - get dentry for the given sysfs_dirent
- * @sd: sysfs_dirent of interest
- *
- * Get dentry for @sd. Dentry is looked up if currently not
- * present. This function descends from the root looking up
- * dentry for each step.
- *
- * LOCKING:
- * mutex_lock(sysfs_rename_mutex)
- *
- * RETURNS:
- * Pointer to found dentry on success, ERR_PTR() value on error.
- */
-struct dentry *sysfs_get_dentry(struct sysfs_dirent *sd)
-{
- struct dentry *dentry = dget(sysfs_sb->s_root);
-
- while (dentry->d_fsdata != sd) {
- struct sysfs_dirent *cur;
- struct dentry *parent;
-
- /* find the first ancestor which hasn't been looked up */
- cur = sd;
- while (cur->s_parent != dentry->d_fsdata)
- cur = cur->s_parent;
-
- /* look it up */
- parent = dentry;
- mutex_lock(&parent->d_inode->i_mutex);
- dentry = lookup_one_noperm(cur->s_name, parent);
- mutex_unlock(&parent->d_inode->i_mutex);
- dput(parent);
-
- if (IS_ERR(dentry))
- break;
- }
- return dentry;
-}
-
-/**
* sysfs_get_active - get an active reference to sysfs_dirent
* @sd: sysfs_dirent to get an active reference to
*
@@ -315,6 +274,14 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
if (sd->s_flags & SYSFS_FLAG_REMOVED)
goto out_bad;

+ /* The sysfs dirent has been moved? */
+ if (dentry->d_parent->d_fsdata != sd->s_parent)
+ goto out_bad;
+
+ /* The sysfs dirent has been renamed */
+ if (strcmp(dentry->d_name.name, sd->s_name) != 0)
+ goto out_bad;
+
mutex_unlock(&sysfs_mutex);
out_valid:
return 1;
@@ -322,6 +289,12 @@ out_bad:
/* Remove the dentry from the dcache hashes.
* If this is a deleted dentry we use d_drop instead of d_delete
* so sysfs doesn't need to cope with negative dentries.
+ *
+ * If this is a dentry that has simply been renamed we
+ * use d_drop to remove it from the dcache lookup on its
+ * old parent. If this dentry persists later when a lookup
+ * is performed at its new name the dentry will be readded
+ * to the dcache hashes.
*/
is_dir = (sysfs_type(sd) == SYSFS_DIR);
mutex_unlock(&sysfs_mutex);
@@ -705,10 +678,15 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
}

/* instantiate and hash dentry */
- dentry->d_op = &sysfs_dentry_ops;
- dentry->d_fsdata = sysfs_get(sd);
- d_instantiate(dentry, inode);
- d_rehash(dentry);
+ ret = d_find_alias(inode);
+ if (!ret) {
+ dentry->d_op = &sysfs_dentry_ops;
+ dentry->d_fsdata = sysfs_get(sd);
+ d_add(dentry, inode);
+ } else {
+ d_move(ret, dentry);
+ iput(inode);
+ }

out_unlock:
mutex_unlock(&sysfs_mutex);
@@ -785,62 +763,32 @@ void sysfs_remove_dir(struct kobject * kobj)
int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
{
struct sysfs_dirent *sd = kobj->sd;
- struct dentry *parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
const char *dup_name = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
+ mutex_lock(&sysfs_mutex);

error = 0;
if (strcmp(sd->s_name, new_name) == 0)
goto out; /* nothing to rename */

- /* get the original dentry */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
-
- parent = old_dentry->d_parent;
-
- /* lock parent and get dentry for new name */
- mutex_lock(&parent->d_inode->i_mutex);
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(sd->s_parent, new_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(parent, new_name);
- if (!new_dentry)
- goto out_unlock;
+ goto out;

/* rename sysfs_dirent */
error = -ENOMEM;
new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
if (!new_name)
- goto out_unlock;
+ goto out;

dup_name = sd->s_name;
sd->s_name = new_name;

- /* rename */
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
-
error = 0;
- out_unlock:
+ out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&parent->d_inode->i_mutex);
kfree(dup_name);
- dput(old_dentry);
- dput(new_dentry);
- out:
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

@@ -848,12 +796,11 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;
- struct dentry *old_parent, *new_parent = NULL;
- struct dentry *old_dentry = NULL, *new_dentry = NULL;
int error;

- mutex_lock(&sysfs_rename_mutex);
BUG_ON(!sd->s_parent);
+
+ mutex_lock(&sysfs_mutex);
new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
new_parent_kobj->sd : &sysfs_root;

@@ -861,61 +808,20 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
if (sd->s_parent == new_parent_sd)
goto out; /* nothing to move */

- /* get dentries */
- old_dentry = sysfs_get_dentry(sd);
- if (IS_ERR(old_dentry)) {
- error = PTR_ERR(old_dentry);
- old_dentry = NULL;
- goto out;
- }
- old_parent = old_dentry->d_parent;
-
- new_parent = sysfs_get_dentry(new_parent_sd);
- if (IS_ERR(new_parent)) {
- error = PTR_ERR(new_parent);
- new_parent = NULL;
- goto out;
- }
-
-again:
- mutex_lock(&old_parent->d_inode->i_mutex);
- if (!mutex_trylock(&new_parent->d_inode->i_mutex)) {
- mutex_unlock(&old_parent->d_inode->i_mutex);
- goto again;
- }
- mutex_lock(&sysfs_mutex);
-
error = -EEXIST;
if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out_unlock;
-
- error = -ENOMEM;
- new_dentry = d_alloc_name(new_parent, sd->s_name);
- if (!new_dentry)
- goto out_unlock;
-
- error = 0;
- d_add(new_dentry, NULL);
- d_move(old_dentry, new_dentry);
+ goto out;

/* Remove from old parent's list and insert into new parent's list. */
sysfs_unlink_sibling(sd);
sysfs_get(new_parent_sd);
- drop_nlink(old_parent->d_inode);
sysfs_put(sd->s_parent);
sd->s_parent = new_parent_sd;
- inc_nlink(new_parent->d_inode);
sysfs_link_sibling(sd);

- out_unlock:
+ error = 0;
+out:
mutex_unlock(&sysfs_mutex);
- mutex_unlock(&new_parent->d_inode->i_mutex);
- mutex_unlock(&old_parent->d_inode->i_mutex);
- out:
- dput(new_parent);
- dput(old_dentry);
- dput(new_dentry);
- mutex_unlock(&sysfs_rename_mutex);
return error;
}

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 2dcafe8..082a3eb 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -205,17 +205,6 @@ static inline void set_inode_attr(struct inode * inode, struct iattr * iattr)
inode->i_ctime = iattr->ia_ctime;
}

-
-/*
- * sysfs has a different i_mutex lock order behavior for i_mutex than other
- * filesystems; sysfs i_mutex is called in many places with subsystem locks
- * held. At the same time, many of the VFS locking rules do not apply to
- * sysfs at all (cross directory rename for example). To untangle this mess
- * (which gives false positives in lockdep), we're giving sysfs inodes their
- * own class for i_mutex.
- */
-static struct lock_class_key sysfs_inode_imutex_key;
-


static int sysfs_count_nlink(struct sysfs_dirent *sd)

{
struct sysfs_dirent *child;
@@ -268,7 +257,6 @@ static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)


inode->i_mapping->a_ops = &sysfs_aops;
inode->i_mapping->backing_dev_info = &sysfs_backing_dev_info;
inode->i_op = &sysfs_inode_operations;

- lockdep_set_class(&inode->i_mutex, &sysfs_inode_imutex_key);

set_default_inode_attr(inode, sd->s_mode);
sysfs_refresh_inode(sd, inode);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 90b3501..98a15bf 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -103,7 +103,6 @@ extern struct kmem_cache *sysfs_dir_cachep;
* dir.c
*/
extern struct mutex sysfs_mutex;
-extern struct mutex sysfs_rename_mutex;
extern spinlock_t sysfs_assoc_lock;

extern const struct file_operations sysfs_dir_operations;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index ec0f607..0289467 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -76,7 +76,6 @@ extern struct file *nameidata_to_filp(struct nameidata *nd, int flags);
extern void release_open_intent(struct nameidata *);

extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
-extern struct dentry *lookup_one_noperm(const char *, struct dentry *);

extern int follow_down(struct path *);
extern int follow_up(struct path *);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:29:01 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

These two functions do 90% of the same work and it doesn't significantly


obfuscate the function to allow both the parent dir and the name to change
at the same time. So merge them together to simplify maintenance, and
increase testing.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 62 +++++++++++++++++++++++++----------------------------
fs/sysfs/sysfs.h | 3 ++
2 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 0b60212..e1a86d1 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -760,30 +760,42 @@ void sysfs_remove_dir(struct kobject * kobj)
__sysfs_remove_dir(sd);
}

-int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name)
{
- struct sysfs_dirent *sd = kobj->sd;


const char *dup_name = NULL;
int error;

mutex_lock(&sysfs_mutex);

error = 0;
- if (strcmp(sd->s_name, new_name) == 0)
+ if ((sd->s_parent == new_parent_sd) &&
+ (strcmp(sd->s_name, new_name) == 0))


goto out; /* nothing to rename */

error = -EEXIST;
- if (sysfs_find_dirent(sd->s_parent, new_name))
+ if (sysfs_find_dirent(new_parent_sd, new_name))


goto out;

/* rename sysfs_dirent */

- error = -ENOMEM;
- new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
- if (!new_name)
- goto out;
+ if (strcmp(sd->s_name, new_name) != 0) {
+ error = -ENOMEM;
+ new_name = dup_name = kstrdup(new_name, GFP_KERNEL);
+ if (!new_name)
+ goto out;
+
+ dup_name = sd->s_name;
+ sd->s_name = new_name;
+ }

- dup_name = sd->s_name;
- sd->s_name = new_name;
+ /* Remove from old parent's list and insert into new parent's list. */
+ if (sd->s_parent != new_parent_sd) {
+ sysfs_unlink_sibling(sd);
+ sysfs_get(new_parent_sd);
+ sysfs_put(sd->s_parent);
+ sd->s_parent = new_parent_sd;
+ sysfs_link_sibling(sd);
+ }

error = 0;
out:
@@ -792,37 +804,21 @@ int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
return error;
}

+int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
+{
+ return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
+}
+

int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
{
struct sysfs_dirent *sd = kobj->sd;
struct sysfs_dirent *new_parent_sd;

- int error;

BUG_ON(!sd->s_parent);
-
- mutex_lock(&sysfs_mutex);
- new_parent_sd = (new_parent_kobj && new_parent_kobj->sd) ?
+ new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
new_parent_kobj->sd : &sysfs_root;

- error = 0;
- if (sd->s_parent == new_parent_sd)
- goto out; /* nothing to move */
-
- error = -EEXIST;
- if (sysfs_find_dirent(new_parent_sd, sd->s_name))
- goto out;
-
- /* Remove from old parent's list and insert into new parent's list. */
- sysfs_unlink_sibling(sd);
- sysfs_get(new_parent_sd);
- sysfs_put(sd->s_parent);
- sd->s_parent = new_parent_sd;
- sysfs_link_sibling(sd);


-
- error = 0;

-out:
- mutex_unlock(&sysfs_mutex);
- return error;
+ return sysfs_rename(sd, new_parent_sd, sd->s_name);
}

/* Relationship between s_mode and the DT_xxx types */
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 98a15bf..ca52e7b 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -130,6 +130,9 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
struct sysfs_dirent **p_sd);
void sysfs_remove_subdir(struct sysfs_dirent *sd);

+int sysfs_rename(struct sysfs_dirent *sd,
+ struct sysfs_dirent *new_parent_sd, const char *new_name);
+
static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
{
if (sd) {
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:29:19 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Now that sysfs_getattr and sysfs_permission refresh the vfs
inode there is no need to immediatly push the mode change
into the vfs cache. Reducing the amount of work needed and
simplifying the locking.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/file.c | 31 ++++++++-----------------------
1 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index faa1a80..dc30d9e 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -579,38 +579,23 @@ EXPORT_SYMBOL_GPL(sysfs_add_file_to_group);
*/


int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)

{
- struct sysfs_dirent *victim_sd = NULL;
- struct dentry *victim = NULL;
- struct inode * inode;
+ struct sysfs_dirent *sd;
struct iattr newattrs;
int rc;

- rc = -ENOENT;
- victim_sd = sysfs_get_dirent(kobj->sd, attr->name);
- if (!victim_sd)
- goto out;
+ mutex_lock(&sysfs_mutex);

- mutex_lock(&sysfs_rename_mutex);
- victim = sysfs_get_dentry(victim_sd);
- mutex_unlock(&sysfs_rename_mutex);
- if (IS_ERR(victim)) {
- rc = PTR_ERR(victim);
- victim = NULL;
+ rc = -ENOENT;
+ sd = sysfs_find_dirent(kobj->sd, attr->name);
+ if (!sd)
goto out;
- }
-
- inode = victim->d_inode;

- mutex_lock(&inode->i_mutex);
-
- newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
+ newattrs.ia_mode = (mode & S_IALLUGO) | (sd->s_mode & ~S_IALLUGO);
newattrs.ia_valid = ATTR_MODE;
- rc = sysfs_setattr(victim, &newattrs);
+ rc = sysfs_sd_setattr(sd, &newattrs);

- mutex_unlock(&inode->i_mutex);
out:
- dput(victim);
- sysfs_put(victim_sd);
+ mutex_unlock(&sysfs_mutex);
return rc;
}
EXPORT_SYMBOL_GPL(sysfs_chmod_file);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:29:38 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Currently sysfs updates the timestamps on the vfs directory
inode when we create or remove a directory entry but doesn't
update the cached copy on the sysfs_dirent, fix that oversight.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b5e8499..fa37126 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -464,6 +464,8 @@ void sysfs_addrm_start(struct sysfs_addrm_cxt *acxt,
*/


int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

{
+ struct sysfs_inode_attrs *ps_iattr;
+
if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
return -EEXIST;

@@ -476,6 +478,13 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

sysfs_link_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+
return 0;
}

@@ -554,10 +563,19 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
*/


void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)

{
+ struct sysfs_inode_attrs *ps_iattr;
+
BUG_ON(sd->s_flags & SYSFS_FLAG_REMOVED);

sysfs_unlink_sibling(sd);

+ /* Update timestamps on the parent */
+ ps_iattr = acxt->parent_sd->s_iattr;
+ if (ps_iattr) {
+ struct iattr *ps_iattrs = &ps_iattr->ia_iattr;
+ ps_iattrs->ia_ctime = ps_iattrs->ia_mtime = CURRENT_TIME;
+ }
+


sd->s_flags |= SYSFS_FLAG_REMOVED;
sd->s_sibling = acxt->removed;
acxt->removed = sd;
--

1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:30:01 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

The granularity of sysfs time when we keep it is 1 ns. Which
when passed to timestamp_trunc results in a nop. So remove
the unnecessary function call making sysfs_setattr slightly
easier to read.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 8a08250..fed7a74 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -103,14 +103,11 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
if (ia_valid & ATTR_GID)
iattrs->ia_gid = iattr->ia_gid;
if (ia_valid & ATTR_ATIME)
- iattrs->ia_atime = timespec_trunc(iattr->ia_atime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_atime = iattr->ia_atime;
if (ia_valid & ATTR_MTIME)
- iattrs->ia_mtime = timespec_trunc(iattr->ia_mtime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_mtime = iattr->ia_mtime;
if (ia_valid & ATTR_CTIME)
- iattrs->ia_ctime = timespec_trunc(iattr->ia_ctime,
- inode->i_sb->s_time_gran);
+ iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;

--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:30:05 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Calling d_drop unconditionally when a sysfs_dirent is deleted has
the potential to leak mounts, so instead implement dentry delete
and revalidate operations that cause sysfs dentries to be removed
at the appropriate time.

Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Serge Hallyn <se...@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/dir.c | 73 +++++++++++++++++++++++++++++++++++--------------------
1 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 130dfc3..b5e8499 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -298,6 +298,46 @@ void release_sysfs_dirent(struct sysfs_dirent * sd)
goto repeat;
}

+static int sysfs_dentry_delete(struct dentry *dentry)


+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;

+ return !!(sd->s_flags & SYSFS_FLAG_REMOVED);
+}
+
+static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)


+{
+ struct sysfs_dirent *sd = dentry->d_fsdata;

+ int is_dir;
+
+ mutex_lock(&sysfs_mutex);
+
+ /* The sysfs dirent has been deleted */
+ if (sd->s_flags & SYSFS_FLAG_REMOVED)
+ goto out_bad;
+
+ mutex_unlock(&sysfs_mutex);
+out_valid:
+ return 1;
+out_bad:
+ /* Remove the dentry from the dcache hashes.
+ * If this is a deleted dentry we use d_drop instead of d_delete
+ * so sysfs doesn't need to cope with negative dentries.
+ */
+ is_dir = (sysfs_type(sd) == SYSFS_DIR);
+ mutex_unlock(&sysfs_mutex);
+ if (is_dir) {
+ /* If we have submounts we must allow the vfs caches
+ * to lie about the state of the filesystem to prevent
+ * leaks and other nasty things.
+ */
+ if (have_submounts(dentry))
+ goto out_valid;
+ shrink_dcache_parent(dentry);
+ }
+ d_drop(dentry);


+ return 0;
+}
+

static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)
{
struct sysfs_dirent * sd = dentry->d_fsdata;

@@ -307,6 +347,8 @@ static void sysfs_dentry_iput(struct dentry * dentry, struct inode * inode)


}

static const struct dentry_operations sysfs_dentry_ops = {

+ .d_revalidate = sysfs_dentry_revalidate,
+ .d_delete = sysfs_dentry_delete,
.d_iput = sysfs_dentry_iput,
};

@@ -527,44 +569,21 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
}

/**
- * sysfs_drop_dentry - drop dentry for the specified sysfs_dirent
+ * sysfs_dec_nlink - Decrement link count for the specified sysfs_dirent
* @sd: target sysfs_dirent
*
- * Drop dentry for @sd. @sd must have been unlinked from its
+ * Decrement nlink for @sd. @sd must have been unlinked from its


* parent on entry to this function such that it can't be looked

* up anymore.
*/
-static void sysfs_drop_dentry(struct sysfs_dirent *sd)
+static void sysfs_dec_nlink(struct sysfs_dirent *sd)
{
struct inode *inode;
- struct dentry *dentry;



inode = ilookup(sysfs_sb, sd->s_ino);

if (!inode)
return;

- /* Drop any existing dentries associated with sd.
- *
- * For the dentry to be properly freed we need to grab a
- * reference to the dentry under the dcache lock, unhash it,
- * and then put it. The playing with the dentry count allows
- * dput to immediately free the dentry if it is not in use.
- */
-repeat:
- spin_lock(&dcache_lock);
- list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
- if (d_unhashed(dentry))
- continue;
- dget_locked(dentry);
- spin_lock(&dentry->d_lock);
- __d_drop(dentry);
- spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
- dput(dentry);
- goto repeat;
- }
- spin_unlock(&dcache_lock);


-
/* adjust nlink and update timestamp */

mutex_lock(&inode->i_mutex);

@@ -611,7 +630,7 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)


acxt->removed = sd->s_sibling;
sd->s_sibling = NULL;

- sysfs_drop_dentry(sd);
+ sysfs_dec_nlink(sd);
sysfs_deactivate(sd);
unmap_bin_file(sd);
sysfs_put(sd);
--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:30:31 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@xmission.com>

Cleanly separate the work that is specific to setting the
attributes of a sysfs_dirent from what is needed to update
the attributes of a vfs inode.

Additionally grab the sysfs_mutex to keep any nasties from
surprising us when updating the sysfs_dirent.

Acked-by: Tejun Heo <t...@kernel.org>


Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 52 ++++++++++++++++++++++++++++++++--------------------
fs/sysfs/sysfs.h | 1 +
2 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index fed7a74..fccfb55 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -64,30 +64,15 @@ struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)

return attrs;
}
-int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+
+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr * iattr)
{
- struct inode * inode = dentry->d_inode;
- struct sysfs_dirent * sd = dentry->d_fsdata;
struct sysfs_inode_attrs *sd_attrs;
struct iattr *iattrs;
unsigned int ia_valid = iattr->ia_valid;
- int error;
-
- if (!sd)
- return -EINVAL;

sd_attrs = sd->s_iattr;

- error = inode_change_ok(inode, iattr);
- if (error)
- return error;
-
- iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
-
- error = inode_setattr(inode, iattr);
- if (error)
- return error;
-
if (!sd_attrs) {
/* setting attributes for the first time, allocate now */
sd_attrs = sysfs_init_inode_attrs(sd);
@@ -110,12 +95,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)


iattrs->ia_ctime = iattr->ia_ctime;
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;
-

- if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))

- mode &= ~S_ISGID;
iattrs->ia_mode = sd->s_mode = mode;


}
}
+ return 0;
+}
+

+int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
+{
+ struct inode * inode = dentry->d_inode;
+ struct sysfs_dirent * sd = dentry->d_fsdata;
+ int error;
+
+ if (!sd)
+ return -EINVAL;
+
+ error = inode_change_ok(inode, iattr);
+ if (error)
+ return error;
+
+ iattr->ia_valid &= ~ATTR_SIZE; /* ignore size changes */
+ if (iattr->ia_valid & ATTR_MODE) {
+ if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
+ iattr->ia_mode &= ~S_ISGID;
+ }
+
+ error = inode_setattr(inode, iattr);
+ if (error)
+ return error;
+
+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setattr(sd, iattr);
+ mutex_unlock(&sysfs_mutex);
+
return error;
}

diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index af4c4e7..a96d967 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -155,6 +155,7 @@ static inline void __sysfs_put(struct sysfs_dirent *sd)
*/


struct inode *sysfs_get_inode(struct sysfs_dirent *sd);
void sysfs_delete_inode(struct inode *inode);

+int sysfs_sd_setattr(struct sysfs_dirent *sd, struct iattr *iattr);


int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);

int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags);

--
1.6.5.2.143.g8cc62

Eric W. Biederman

unread,
Nov 8, 2009, 2:30:54 AM11/8/09
to Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman, Eric W. Biederman
From: Eric W. Biederman <ebie...@maxwell.aristanetworks.com>

The sysfs_mutex is required to ensure updates are and will remain


atomic with respect to other inode iattr updates, that do not happen
through the filesystem.

Acked-by: Serge Hallyn <se...@us.ibm.com>


Acked-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
---

fs/sysfs/inode.c | 41 +++++++++++++++++++++++++++++------------
1 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index e28cecf..8a08250 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -122,23 +122,39 @@ int sysfs_setattr(struct dentry * dentry, struct iattr * iattr)
return error;
}

+static int sysfs_sd_setsecdata(struct sysfs_dirent *sd, void **secdata, u32 *secdata_len)
+{
+ struct sysfs_inode_attrs *iattrs;
+ void *old_secdata;
+ size_t old_secdata_len;
+
+ iattrs = sd->s_iattr;
+ if (!iattrs)
+ iattrs = sysfs_init_inode_attrs(sd);
+ if (!iattrs)
+ return -ENOMEM;
+
+ old_secdata = iattrs->ia_secdata;
+ old_secdata_len = iattrs->ia_secdata_len;
+
+ iattrs->ia_secdata = *secdata;
+ iattrs->ia_secdata_len = *secdata_len;
+
+ *secdata = old_secdata;
+ *secdata_len = old_secdata_len;


+ return 0;
+}
+

int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
size_t size, int flags)

{


struct sysfs_dirent *sd = dentry->d_fsdata;

- struct sysfs_inode_attrs *iattrs;
void *secdata;
int error;
u32 secdata_len = 0;

if (!sd)
return -EINVAL;
- if (!sd->s_iattr)
- sd->s_iattr = sysfs_init_inode_attrs(sd);
- if (!sd->s_iattr)
- return -ENOMEM;
-


- iattrs = sd->s_iattr;

if (!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) {
const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
@@ -150,12 +166,13 @@ int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
&secdata, &secdata_len);
if (error)
goto out;
- if (iattrs->ia_secdata)
- security_release_secctx(iattrs->ia_secdata,
- iattrs->ia_secdata_len);
- iattrs->ia_secdata = secdata;
- iattrs->ia_secdata_len = secdata_len;

+ mutex_lock(&sysfs_mutex);
+ error = sysfs_sd_setsecdata(sd, &secdata, &secdata_len);
+ mutex_unlock(&sysfs_mutex);
+
+ if (secdata)
+ security_release_secctx(secdata, secdata_len);
} else
return -EINVAL;
out:
--
1.6.5.2.143.g8cc62

Tejun Heo

unread,
Nov 8, 2009, 10:58:22 PM11/8/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn
Eric W. Biederman wrote:
> The sysfs code updates the vfs caches immediately when the sysfs data
> structures change causing a lot of unnecessary complications. The
> following patchset untangles that beast. Allowing for simpler
> more straight forward code, the removal of a hack from the vfs
> to support sysfs, and human comprehensible locking on sysfs.
>
> Most of these patches have already been reviewed and acked from the
> last time I had time to work on sysfs.
>
> This acks have been folded in and the two small bugs found in the
> previous review have been fixed in the trailing patches (they are
> minor enough nits that even a bisect that happens to land in the
> middle should not see sysfs problems).

Thanks a lot for bringing some sanity to sysfs. :-)

--
tejun

Serge E. Hallyn

unread,
Nov 9, 2009, 9:15:19 AM11/9/09
to Tejun Heo, Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Tejun Heo (t...@kernel.org):
> Hello,
>
> Eric W. Biederman wrote:
> > It isn't what I want but it is what the VFS requires. If let the vfs
> > continue on it's delusional state we will leak the vfs mount and
> > everything mounted on top of it, with no way to remove the mounts.
>
> This is caused by not having any way to prevent deletion on
> directories with submounts, right? How does other distributed
> filesystems deal with directories with submounts going away underneath
> it?

Ooooh. I see, I was thinking only about the rename case, and forgot
this was the path for deleted files, too. For the rename case it
should be ok to let the dentry be put since the submounts will be
accessible at the new location, right? Should that be handled
separately?

-serge

Serge E. Hallyn

unread,
Nov 9, 2009, 9:26:55 AM11/9/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Eric W. Biederman
Quoting Eric W. Biederman (ebie...@xmission.com):
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Cleanly separate the work that is specific to setting the
> attributes of a sysfs_dirent from what is needed to update
> the attributes of a vfs inode.
>
> Additionally grab the sysfs_mutex to keep any nasties from
> surprising us when updating the sysfs_dirent.
>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Oh, sorry, guess i never did

Acked-by: Serge Hallyn <se...@us.ibm.com>

Eric Biederman

unread,
Nov 9, 2009, 5:35:08 PM11/9/09
to Serge E. Hallyn, Tejun Heo, Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-...@vger.kernel.org, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
On Mon, Nov 9, 2009 at 6:14 AM, Serge E. Hallyn <se...@us.ibm.com> wrote:
> Quoting Tejun Heo (t...@kernel.org):
>> Hello,
>>
>> Eric W. Biederman wrote:
>> > It isn't what I want but it is what the VFS requires. �If let the vfs
>> > continue on it's delusional state we will leak the vfs mount and
>> > everything mounted on top of it, with no way to remove the mounts.
>>
>> This is caused by not having any way to prevent deletion on
>> directories with submounts, right? �How does other distributed
>> filesystems deal with directories with submounts going away underneath
>> it?
>
> Ooooh. �I see, I was thinking only about the rename case, and forgot
> this was the path for deleted files, too. �For the rename case it
> should be ok to let the dentry be put since the submounts will be
> accessible at the new location, right? �Should that be handled
> separately?

No in the rename case it isn't ok to let the dentry be discarded put as mounts
are implemented using a hash of the struct dentry's address, and if you aren't
the mount point you are referenced as d_parent.

For rename I am slightly better than NFS. sysfs does not support hard
links so if I am looking up the new name I can look for a preexisting
dentry for my
inode and if I find one I call d_move on it to lazily perform the rename.

Eric

Eric W. Biederman

unread,
Nov 17, 2009, 4:11:51 AM11/17/09
to Greg KH, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
Greg KH <gr...@kroah.com> writes:

> On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>>
>> The sysfs code updates the vfs caches immediately when the sysfs data
>> structures change causing a lot of unnecessary complications. The
>> following patchset untangles that beast. Allowing for simpler
>> more straight forward code, the removal of a hack from the vfs
>> to support sysfs, and human comprehensible locking on sysfs.
>>
>> Most of these patches have already been reviewed and acked from the
>> last time I had time to work on sysfs.
>>

>> In net the patches look like:
>

> Can you resend these based on the review that you just got, with the new
> acks and changes so that I can apply them?

Are you going to apply them soon? If you don't have time I can start
a sysfs tree and just ask Linus to pull them.

Greg KH

unread,
Nov 17, 2009, 10:42:33 AM11/17/09
to Eric W. Biederman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
On Tue, Nov 17, 2009 at 01:11:22AM -0800, Eric W. Biederman wrote:
> Greg KH <gr...@kroah.com> writes:
>
> > On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
> >>
> >> The sysfs code updates the vfs caches immediately when the sysfs data
> >> structures change causing a lot of unnecessary complications. The
> >> following patchset untangles that beast. Allowing for simpler
> >> more straight forward code, the removal of a hack from the vfs
> >> to support sysfs, and human comprehensible locking on sysfs.
> >>
> >> Most of these patches have already been reviewed and acked from the
> >> last time I had time to work on sysfs.
> >>
> >> In net the patches look like:
> >
> > Can you resend these based on the review that you just got, with the new
> > acks and changes so that I can apply them?
>
> Are you going to apply them soon? If you don't have time I can start
> a sysfs tree and just ask Linus to pull them.

Well, as the merge window is not open right now, that might be tough to
get Linus to take them at the moment :)

I was away last week, and will apply them to my tree this week, thanks
for your patience.

greg k-h

Eric W. Biederman

unread,
Nov 17, 2009, 10:56:30 AM11/17/09
to Greg KH, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise
Greg KH <gr...@kroah.com> writes:

> On Tue, Nov 17, 2009 at 01:11:22AM -0800, Eric W. Biederman wrote:
>> Greg KH <gr...@kroah.com> writes:
>>
>> > On Tue, Nov 03, 2009 at 03:53:56AM -0800, Eric W. Biederman wrote:
>> >>
>> >> The sysfs code updates the vfs caches immediately when the sysfs data
>> >> structures change causing a lot of unnecessary complications. The
>> >> following patchset untangles that beast. Allowing for simpler
>> >> more straight forward code, the removal of a hack from the vfs
>> >> to support sysfs, and human comprehensible locking on sysfs.
>> >>
>> >> Most of these patches have already been reviewed and acked from the
>> >> last time I had time to work on sysfs.
>> >>
>> >> In net the patches look like:
>> >
>> > Can you resend these based on the review that you just got, with the new
>> > acks and changes so that I can apply them?
>>
>> Are you going to apply them soon? If you don't have time I can start
>> a sysfs tree and just ask Linus to pull them.
>
> Well, as the merge window is not open right now, that might be tough to
> get Linus to take them at the moment :)

True. ;)

> I was away last week, and will apply them to my tree this week, thanks
> for your patience.

I'm working on it.

Eric

Greg KH

unread,
Nov 20, 2009, 4:21:11 PM11/20/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman
On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
> From: Eric W. Biederman <ebie...@xmission.com>
>
> Cleanly separate the work that is specific to setting the
> attributes of a sysfs_dirent from what is needed to update
> the attributes of a vfs inode.
>
> Additionally grab the sysfs_mutex to keep any nasties from
> surprising us when updating the sysfs_dirent.
>
> Acked-by: Tejun Heo <t...@kernel.org>
> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>

Due to the extended attr work for sysfs that went into Linus's tree
recently, this patch doesn't apply at all anymore.

I've applied the first 5 to my tree, care to respin the other 10 and
resend them?

thanks,

greg k-h

Eric W. Biederman

unread,
Nov 20, 2009, 4:39:35 PM11/20/09
to Greg KH, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman
Greg KH <gr...@kroah.com> writes:

> On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
>> From: Eric W. Biederman <ebie...@xmission.com>
>>
>> Cleanly separate the work that is specific to setting the
>> attributes of a sysfs_dirent from what is needed to update
>> the attributes of a vfs inode.
>>
>> Additionally grab the sysfs_mutex to keep any nasties from
>> surprising us when updating the sysfs_dirent.
>>
>> Acked-by: Tejun Heo <t...@kernel.org>
>> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
>
> Due to the extended attr work for sysfs that went into Linus's tree
> recently, this patch doesn't apply at all anymore.

I will take a look. I'm scratching my about how that caused a problem
because that patch came out of my tree originally, before the rest of
these patches. So if anything I should have problems in the other direction.

Eric

Eric W. Biederman

unread,
Nov 20, 2009, 7:07:38 PM11/20/09
to Greg KH, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman
Greg KH <gr...@kroah.com> writes:

> On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
>> From: Eric W. Biederman <ebie...@xmission.com>
>>
>> Cleanly separate the work that is specific to setting the
>> attributes of a sysfs_dirent from what is needed to update
>> the attributes of a vfs inode.
>>
>> Additionally grab the sysfs_mutex to keep any nasties from
>> surprising us when updating the sysfs_dirent.
>>
>> Acked-by: Tejun Heo <t...@kernel.org>
>> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
>
> Due to the extended attr work for sysfs that went into Linus's tree
> recently, this patch doesn't apply at all anymore.

It looks like the issue is the a trivial context conflict with
sysfs-mark-a-locally-only-used-function-static.patch

Which marks one of those new functions as static.

Here come the patches 6-15 rebased on top of that.

Eric

Greg KH

unread,
Nov 21, 2009, 12:15:27 AM11/21/09
to Eric W. Biederman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn, Eric W. Biederman
On Fri, Nov 20, 2009 at 04:07:11PM -0800, Eric W. Biederman wrote:
> Greg KH <gr...@kroah.com> writes:
>
> > On Sat, Nov 07, 2009 at 11:27:04PM -0800, Eric W. Biederman wrote:
> >> From: Eric W. Biederman <ebie...@xmission.com>
> >>
> >> Cleanly separate the work that is specific to setting the
> >> attributes of a sysfs_dirent from what is needed to update
> >> the attributes of a vfs inode.
> >>
> >> Additionally grab the sysfs_mutex to keep any nasties from
> >> surprising us when updating the sysfs_dirent.
> >>
> >> Acked-by: Tejun Heo <t...@kernel.org>
> >> Signed-off-by: Eric W. Biederman <ebie...@aristanetworks.com>
> >
> > Due to the extended attr work for sysfs that went into Linus's tree
> > recently, this patch doesn't apply at all anymore.
>
> It looks like the issue is the a trivial context conflict with
> sysfs-mark-a-locally-only-used-function-static.patch

Ah, sorry about that, I should have noticed that, I was looking for a
much harder problem :(

I'll go queue these up in a bit.

thanks,

greg k-h

Greg KH

unread,
Nov 30, 2009, 4:58:24 PM11/30/09
to Eric W. Biederman, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn
On Mon, Nov 30, 2009 at 01:33:37PM -0800, Greg KH wrote:

> On Sat, Nov 07, 2009 at 11:25:03PM -0800, Eric W. Biederman wrote:
> >
> > The sysfs code updates the vfs caches immediately when the sysfs data
> > structures change causing a lot of unnecessary complications. The
> > following patchset untangles that beast. Allowing for simpler
> > more straight forward code, the removal of a hack from the vfs
> > to support sysfs, and human comprehensible locking on sysfs.
> >
> > Most of these patches have already been reviewed and acked from the
> > last time I had time to work on sysfs.
> >
> > This acks have been folded in and the two small bugs found in the
> > previous review have been fixed in the trailing patches (they are
> > minor enough nits that even a bisect that happens to land in the
> > middle should not see sysfs problems).
>
> I've applied all of these to my tree now, and sorry, but something is
> broken pretty badly.
>
> When doing a simple 'ls /sys/class/input/' the process locks up. This
> means that X can't find any input devices, which makes for a bit of a
> problem when wanting to use your mouse or keyboard :(
>
> Attached is the state of my processes when this happens, if that helps
> out any.
>
> So I'm going to drop all of these from my tree again, as they are not
> ready for merging at this point :(

In looking at the stuck processes, it seems your last patch was the
problem. Removing that caused things to work again, so I've only
dropped that one.

Next time, please test your patches before submitting them :(

Eric W. Biederman

unread,
Nov 30, 2009, 7:15:58 PM11/30/09
to Greg KH, Greg Kroah-Hartman, Kay Sievers, linux-...@vger.kernel.org, Tejun Heo, Cornelia Huck, linux-...@vger.kernel.org, Eric Dumazet, Benjamin LaHaise, Serge Hallyn
Greg KH <gr...@kroah.com> writes:

Weird I thought I had tested this.

That last patch to add locking that is only needed for vfs coherency
has certainly seen less testing than the others.

I also remember verify that nfs does the same thing, when in fact
nfs takes inode->i_lock not inode->i_mutex in the same situation.

generic_permission takes no locks so this is really about serializing
writes to the inode. The vfs only takes inode->i_mutex, when calling
notify_change.

So it appears I have stepped into a murky corner of the vfs.

I will take a look and do a bit more testing. At the moment it looks
like a solution to serializing writes to the stat attributes on the inode is
going to be simply holding sysfs_mutex over inode_setattr in
sysfs_setattr. Assuming a solution is needed at all.

Eric

0 new messages