The interface is now:
int send_kevent(enum kevent type, struct kset *kset,
struct kobject *kobj, const char *signal)
Say your processor (with kobject "kobj") is overheating. You might do
send_kevent(KEVENT_POWER, NULL, kobj, "overheating");
We could get rid of signal and just require passing a specific attribute
file in sysfs, which would presumably explain the reason for the event,
but I think having a single signal value is acceptable. The rest of the
payload has been ditched.
The basic idea here is to represent to user-space events as changes to
sysfs. Media was changed? Then that block device in sysfs emits a
"media_change" event.
This patch includes two example events: file system mount and unmount.
Kay has some utilities and examples at
http://vrfy.org/projects/kevents/
and
http://vrfy.org/projects/kdbusd/
The intention of this work is to hook the kernel into D-BUS, although
the implementation is agnostic and should work with any user-space
setup.
Best,
Robert Love
Kernel Events Layer. A simple sysfs change notifier over netlink.
Signed-Off-By: Robert Love <r...@novell.com>
fs/super.c | 11 ++++-
include/linux/kevent.h | 42 +++++++++++++++++++
include/linux/netlink.h | 1
init/Kconfig | 14 ++++++
kernel/Makefile | 1
kernel/kevent.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 173 insertions(+), 1 deletion(-)
diff -urN linux-2.6.9-rc1-mm2/fs/super.c linux/fs/super.c
--- linux-2.6.9-rc1-mm2/fs/super.c 2004-08-31 16:46:15.912777856 -0400
+++ linux/fs/super.c 2004-08-31 16:44:41.023203264 -0400
@@ -35,6 +35,7 @@
#include <linux/vfs.h>
#include <linux/writeback.h> /* for the emergency remount stuff */
#include <linux/idr.h>
+#include <linux/kevent.h>
#include <asm/uaccess.h>
@@ -875,8 +876,12 @@
up_write(&s->s_umount);
deactivate_super(s);
s = ERR_PTR(error);
- } else
+ } else {
s->s_flags |= MS_ACTIVE;
+ if (bdev->bd_disk)
+ send_kevent(KEVENT_FS, NULL,
+ &bdev->bd_disk->kobj, "mount");
+ }
}
return s;
@@ -891,6 +896,10 @@
void kill_block_super(struct super_block *sb)
{
struct block_device *bdev = sb->s_bdev;
+
+ if (bdev->bd_disk)
+ send_kevent(KEVENT_FS, NULL, &bdev->bd_disk->kobj, "umount");
+
generic_shutdown_super(sb);
set_blocksize(bdev, sb->s_old_blocksize);
close_bdev_excl(bdev);
diff -urN linux-2.6.9-rc1-mm2/include/linux/kevent.h linux/include/linux/kevent.h
--- linux-2.6.9-rc1-mm2/include/linux/kevent.h 1969-12-31 19:00:00.000000000 -0500
+++ linux/include/linux/kevent.h 2004-08-31 16:21:05.706364128 -0400
@@ -0,0 +1,42 @@
+#ifndef _LINUX_KEVENT_H
+#define _LINUX_KEVENT_H
+
+#include <linux/config.h>
+#include <linux/kobject.h>
+
+/* kevent types - these are used as the multicast group */
+enum kevent {
+ KEVENT_GENERAL = 0,
+ KEVENT_STORAGE = 1,
+ KEVENT_POWER = 2,
+ KEVENT_FS = 3,
+ KEVENT_HOTPLUG = 4,
+};
+
+#ifdef __KERNEL__
+#ifdef CONFIG_KERNEL_EVENTS
+
+int send_kevent(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal);
+
+int send_kevent_atomic(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal);
+
+#else
+
+static inline int send_kevent(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal)
+{
+ return 0;
+}
+
+static inline int send_kevent_atomic(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal)
+{
+ return 0;
+}
+
+#endif /* CONFIG_KERNEL_EVENTS */
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_KEVENT_H */
diff -urN linux-2.6.9-rc1-mm2/include/linux/netlink.h linux/include/linux/netlink.h
--- linux-2.6.9-rc1-mm2/include/linux/netlink.h 2004-08-31 16:46:16.316716448 -0400
+++ linux/include/linux/netlink.h 2004-08-31 16:44:41.372150216 -0400
@@ -17,6 +17,7 @@
#define NETLINK_ROUTE6 11 /* af_inet6 route comm channel */
#define NETLINK_IP6_FW 13
#define NETLINK_DNRTMSG 14 /* DECnet routing messages */
+#define NETLINK_KEVENT 15 /* Kernel messages to userspace */
#define NETLINK_TAPBASE 16 /* 16 to 31 are ethertap */
#define MAX_LINKS 32
diff -urN linux-2.6.9-rc1-mm2/init/Kconfig linux/init/Kconfig
--- linux-2.6.9-rc1-mm2/init/Kconfig 2004-08-31 16:46:16.454695472 -0400
+++ linux/init/Kconfig 2004-08-31 16:44:41.445139120 -0400
@@ -149,6 +149,20 @@
logging of avc messages output). Does not do system-call
auditing without CONFIG_AUDITSYSCALL.
+config KERNEL_EVENTS
+ bool "Kernel Events Layer"
+ depends on NET
+ default y
+ help
+ This option enables the kernel events layer, which is a simple
+ mechanism for kernel-to-user communication over a netlink socket.
+ The goal of the kernel events layer is to provide a simple and
+ efficient logging, error, and events system. Specifically, code
+ is available to link the events into D-BUS. Say Y, unless you
+ are building a system requiring minimal memory consumption.
+
+ D-BUS is available at http://dbus.freedesktop.org/
+
config AUDITSYSCALL
bool "Enable system-call auditing support"
depends on AUDIT && (X86 || PPC64 || ARCH_S390 || IA64)
diff -urN linux-2.6.9-rc1-mm2/kernel/kevent.c linux/kernel/kevent.c
--- linux-2.6.9-rc1-mm2/kernel/kevent.c 1969-12-31 19:00:00.000000000 -0500
+++ linux/kernel/kevent.c 2004-08-31 16:20:00.280310400 -0400
@@ -0,0 +1,105 @@
+/*
+ * kernel/kevent.c - sysfs event delivery via netlink socket
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2004 Novell, Inc. All rights reserved.
+ *
+ * Licensed under the GNU GPL v2.
+ *
+ * Authors:
+ * Robert Love <r...@novell.com>
+ * Kay Sievers <kay.s...@vrfy.org>
+ * Arjan van de Ven <arj...@redhat.com>
+ */
+
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/netlink.h>
+#include <linux/string.h>
+#include <linux/kobject.h>
+#include <linux/kevent.h>
+#include <net/sock.h>
+
+static struct sock *kevent_sock = NULL; /* kevent's global netlink socket */
+
+/**
+ * send_kevent - send a message to user-space via the kernel events layer
+ */
+static int do_send_kevent(enum kevent type, int gfp_mask,
+ const char *object, const char *signal)
+{
+ struct sk_buff *skb;
+ char *buffer;
+ int len;
+
+ if (!kevent_sock)
+ return -EIO;
+
+ if (!object || !signal)
+ return -EINVAL;
+
+ len = strlen(object) + 1 + strlen(signal);
+
+ skb = alloc_skb(len, gfp_mask);
+ if (!skb)
+ return -ENOMEM;
+
+ buffer = skb_put(skb, len);
+
+ sprintf(buffer, "%s\n%s", object, signal);
+
+ return netlink_broadcast(kevent_sock, skb, 0, (1 << type), gfp_mask);
+}
+
+int send_kevent(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal)
+{
+ const char *path;
+ int ret;
+
+ path = kobject_get_path(kset, kobj, GFP_KERNEL);
+ if (!path)
+ return -ENOMEM;
+
+ ret = do_send_kevent(type, GFP_KERNEL, path, signal);
+ kfree(path);
+
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(send_kevent);
+
+int send_kevent_atomic(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal)
+{
+ const char *path;
+ int ret;
+
+ path = kobject_get_path(kset, kobj, GFP_ATOMIC);
+ if (!path)
+ return -ENOMEM;
+
+ ret = do_send_kevent(type, GFP_ATOMIC, path, signal);
+ kfree(path);
+
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(send_kevent_atomic);
+
+static int kevent_init(void)
+{
+ kevent_sock = netlink_kernel_create(NETLINK_KEVENT, NULL);
+
+ if (!kevent_sock) {
+ printk(KERN_ERR
+ "kevent: unable to create netlink socket!\n");
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+module_init(kevent_init);
diff -urN linux-2.6.9-rc1-mm2/kernel/Makefile linux/kernel/Makefile
--- linux-2.6.9-rc1-mm2/kernel/Makefile 2004-08-31 16:46:16.471692888 -0400
+++ linux/kernel/Makefile 2004-08-31 16:45:49.025865288 -0400
@@ -27,6 +27,7 @@
obj-$(CONFIG_AUDIT) += audit.o
obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
obj-$(CONFIG_KPROBES) += kprobes.o
+obj-$(CONFIG_KERNEL_EVENTS) += kevent.o
ifneq ($(CONFIG_IA64),y)
# According to Alan Modra <al...@linuxcare.com.au>, the -fno-omit-frame-pointer is
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Buffer overrun, methinks.
> Why not share the implementation here?
Because we will probably want to export do_send_kevent(), with a
different name, if this thing starts getting used, because there are
places where the kobj path is already computed and although inexpensive
it does cost a few cycles to go kobject -> path as a string.
Robert Love
> I *still* thinking putting all the possible messages into something
> like include/linux/kevent_msg.h is a good idea to prevent people
> creating all sorts of stupid ad-hoc messages over time that userspace
> will ineviitably not be able to track....
I still agree, but since that does not change the API (I'd still want
the signal to be a string) that bit can come later when there are real
users. Or I will take a patch now. ;-)
Since we ditched the payload, the only thing left is the signal, which
is going to be "changed" half the time anyhow.
> Robert Love <r...@ximian.com> wrote:
> >
> > - len = strlen(object) + 1 + strlen(signal);
> > + len = strlen(object) + 1 + strlen(signal) + 1;
> >
> > should fix it, right?
Attached.
Robert Love
+ len = strlen(object) + 1 + strlen(signal) + 1;
+
+ skb = alloc_skb(len, gfp_mask);
+ if (!skb)
+ return -ENOMEM;
+
+ buffer = skb_put(skb, len);
+
+ sprintf(buffer, "%s\n%s", object, signal);
+
+ return netlink_broadcast(kevent_sock, skb, 0, (1 << type), gfp_mask);
+}
+
+int send_kevent(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal)
+{
+ const char *path;
+ int ret;
+
+ path = kobject_get_path(kset, kobj, GFP_KERNEL);
+ if (!path)
+ return -ENOMEM;
+
+ ret = do_send_kevent(type, GFP_KERNEL, path, signal);
+ kfree(path);
+
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(send_kevent);
+
+int send_kevent_atomic(enum kevent type, struct kset *kset,
+ struct kobject *kobj, const char *signal)
+{
+ const char *path;
+ int ret;
+
+ path = kobject_get_path(kset, kobj, GFP_ATOMIC);
+ if (!path)
+ return -ENOMEM;
+
That's unrelated. I meant:
static int __send_kevent(enum kevent type, struct kset *kset,
struct kobject *kobj, const char *signal, int gfp_flags)
{
const char *path;
int ret;
path = kobject_get_path(kset, kobj, gfp_flags);
if (!path)
return -ENOMEM;
ret = do_send_kevent(type, gfp_flags, path, signal);
kfree(path);
return ret;
}
int send_kevent(enum kevent type, struct kset *kset,
struct kobject *kobj, const char *signal)
{
return __send_kevent(type, kset, kobj, signal, GFP_KERNEL);
}
EXPORT_SYMBOL_GPL(send_kevent);
int send_kevent_atomic(enum kevent type, struct kset *kset,
struct kobject *kobj, const char *signal)
{
return __send_kevent(type, kset, kobj, signal, GFP_ATOMIC);
}
EXPORT_SYMBOL_GPL(send_kevent_atomic);
> That's unrelated. I meant:
Not unrelated - if it were not the case, I'd see your point. But, ugh,
add another level of redirection to save the duplication of three lines?
Robert Love
Hrm, but len is the right size...
Oh, missing the NULL, eh?
So
- len = strlen(object) + 1 + strlen(signal);
+ len = strlen(object) + 1 + strlen(signal) + 1;
should fix it, right?
Robert Love
Hi Robert,
Are you limiting the kernel event mechanism a little too much by getting
rid of the payload? Wouldn't it be useful to sometimes have data at
event time?
Thanks,
Dan
The motivation for doing this, is the ambitioned idea, that _data_ should
only be exposed through sysfs values to userspace. This would make it
possible for userspace to scan any state at any time, regardless of a
received event. Which should make the whole setup more reliable, as
applications can just read in the state at startup. We do a similar job
with udevstart, as all lost hotplug events during the early boot are
recovered just by reading sysfs and synthesize these events for creating
device nodes.
The same applies to the way back to the kernel. We don't want to send
data over the netlink back to the kernel, we can write to sysfs.
Very "simple" data still can be specified by the signal string, just
like a "verb" that describes, what actually happened with the kobject.
In the mount case, we send a "mount/unmount" event for the physical
device, and userspace can read "/proc/mounts" for the data, as
applications do today by polling.
Another version to do this (similar to Robert's CPU overheating example
above), is to create a owner value in the blockdevice's kset and let the
device claiming code fill that value. Then the signal may just be a simple
"add/remove" event for the "owner" file at device claiming and release.
Yes, it may require, that some things in the kernel need to use kobjects
to represent it's state to userspace, but that is a nice side effect and
better than a complicated definition, what the event may carry ot of the
kernel with every specific event, I think.
If you can think of any case we can't expose enough data with this model
or we will not be able to use sysfs, let us know.
Thanks,
Kay
Why is the kset needed? We can determine that from the kobject.
How about changing this to:
int send_kevent(struct kobject *kobj, struct attribute *attr);
which just tells userspace that a specific attribute needs to be read,
as something "important" has changed.
Will passing the attribute name be able to successfully handle the
"enum kevent" and "signal" combinations?
thanks,
greg k-h
Do all events require an attribute? What about the "overheating"
example? Would you need an attribute or would getting a "signal" for a
specific kobj be enough?
Binding an attribute to an event would at least tell you the name of the
attribute to check. Otherwise, how does an app know the name of the
attribute that changed? Or am I missing something?
Thanks,
Dan
I expect it's because:
fill_kobj_path(struct kset *kset, struct kobject *kobj, char *path, int length)
get_kobj_path_length(struct kset *kset, struct kobject *kobj)
and therefore the exported:
kobject_get_path(struct kset *kset, struct kobject *kobj, int gfp_mask)
are all passing the kset. If they all are not needed, they can go too?
> How about changing this to:
> int send_kevent(struct kobject *kobj, struct attribute *attr);
> which just tells userspace that a specific attribute needs to be read,
> as something "important" has changed.
Hmm, in most cases this will work. But in mandates the creation of an
attribute instead of the lazy signal string. Yes, it would be nicer in
the long run and closer to the idea, that the whole event data should be
available through sysfs, but it may be hard to reach this in some subsystems.
> Will passing the attribute name be able to successfully handle the
> "enum kevent" and "signal" combinations?
What should we do in the hotplug case? We may send a NULL attr for
the kset creation. But how can we distinguish between "add" and "remove"?
Just by looking if we find the sysfs dir? I'm not sure in this case.
Any ideas?
Thanks,
Kay
Hmm, both can be the case at the moment. We need to decide, if we want to
mandate the use of a sysfs attribute instead of the string value as the signal.
> Binding an attribute to an event would at least tell you the name of the
> attribute to check. Otherwise, how does an app know the name of the
> attribute that changed? Or am I missing something?
That's a valid point, right. If we don't use "verbs" as the signal string,
and know what we can expect from that particular kind of event, we don't
know which attribute has changed.
The remaining questions are:
o Do we want the multicast - "channels" for the events? It may be nice, to
have it, if a application only interested in e.g. hotplug events, can get
only the interesting events?
o What kind of signal do we need? A lazy string, a well defined set like
ADD/REMOVE/CHANGE?
Or can we get rid of the whole signal? But how can we distinguish between
add and remove? Watching if the sysfs file comes or goes is not a option,
I think.
Thanks,
Kay
> Why is the kset needed? We can determine that from the kobject.
>
> How about changing this to:
> int send_kevent(struct kobject *kobj, struct attribute *attr);
> which just tells userspace that a specific attribute needs to be read,
> as something "important" has changed.
>
> Will passing the attribute name be able to successfully handle the
> "enum kevent" and "signal" combinations?
We can drop the kset if you say we never need it. Why do all the kobj
get_path functions take a kset then? That is what confused me.
We can also drop the "enum kevent" if we decide we don't want to take
advantage of the multicasting of the netlink socket. The enum defines
what multicast group the netlink message is sent out in. I actually
have been talking to Kay about ditching it, and we are trying to figure
out if we ever _need_ it.
So that is 2 for 2. But ...
I don't dig replacing the signal string with an attribute. I think it
will really limit what we can do - having the signal as a verb
describing the event is really important. We might also not always have
an attribute. Kay's points are all valid.
So what if we had
int send_kevent(struct kobject *kobj, const char *signal);
Which was a way of notifying user-space of a change of "signal" on the
object "kobj" in sysfs.
If we ever wanted to send an attribute, we can do something like
const char *signal;
signal = attribute_to_string(attribute);
send_kevent(kobj, signal);
kfree(signal);
Dig that?
Best,
Robert Love
> o What kind of signal do we need? A lazy string, a well defined set like
> ADD/REMOVE/CHANGE?
> Or can we get rid of the whole signal? But how can we distinguish between
> add and remove? Watching if the sysfs file comes or goes is not a option,
> I think.
I think (from our off-list discussions) that we really need the signal.
Agreed?
I do think that defining the signal to specific values makes sense, e.g.
KEVENT_ADD, KEVENT_REMOVE, KEVENT_MOUNTED, etc. We could also send the
attribute as a string.
To get around the hotplug issue that would occur without 'enum kevent',
as we discussed, we could have a "hotplug_added" signal or whatever.
Nothing wrong with that.
Cool or not?
Robert Love
The only problem I see is making an app sift through all of the events
to get to a specific type of event. They'd have to parse and match the
signal, that could be costly.
Will there be small single purpose applications listening on the netlink
socket or off dbus for specific signals? Or do you see this mainly
handled by a single kevent daemon?
> So that is 2 for 2. But ...
>
> I don't dig replacing the signal string with an attribute. I think it
> will really limit what we can do - having the signal as a verb
> describing the event is really important. We might also not always have
> an attribute. Kay's points are all valid.
>
> So what if we had
>
> int send_kevent(struct kobject *kobj, const char *signal);
>
> Which was a way of notifying user-space of a change of "signal" on the
> object "kobj" in sysfs.
>
> If we ever wanted to send an attribute, we can do something like
>
> const char *signal;
>
> signal = attribute_to_string(attribute);
> send_kevent(kobj, signal);
> kfree(signal);
>
> Dig that?
I thought you wanted to standardize the signals? Would you have a
delimiter in your signal to have a standard prefix to use to identify
the string in User Space?
Why not add the attribute name to the path you get from the kobj?
send_attribute_kevent(kobj, attr->name, signal);
and
send_kevent(kobj, signal);
Thanks,
Dan
> The only problem I see is making an app sift through all of the events
> to get to a specific type of event. They'd have to parse and match the
> signal, that could be costly.
I'd hope that they were not that many events that it would ever be
costly.
But this should be mitigated by your event aggregator in user-space,
whether that is D-BUS or whatever else. If it is D-BUS, you could then
subscribe via D-BUS only to the events you care about.
> Will there be small single purpose applications listening on the netlink
> socket or off dbus for specific signals? Or do you see this mainly
> handled by a single kevent daemon?
Whatever you want to do.
I think they way we would set up our Linux desktop product would have D-
BUS listening on the kevent socket and propagating the events up the
stack (or have a separate daemon listen and funnel the events to D-BUS.
whatever).
Then applications up the stack would respond to and handle the D-BUS
signals.
Robert Love
Ok.. so if I wanted to send an event (that included data at event time)
from a driver for a particular device, I would send the event with the
send_kevent() call and create and maintain an attribute for that
specific event. In order for an application to receive the data for the
event, the driver would need to store the data for that event somewhere
and keep it. Unless there's a write attribute to tell me it's been read,
I must maintain it or write over it if the same event occurs again.
Is this how it's supposed to work?
Even though 1) there won't be many events and 2) few events will include
data - don't you think this is a bit too much development overhead for
the driver?
If you had the payload with the event, you could fire and forget. It
would be fewer steps for the driver and require less management and
storage.
Thanks,
Dan
No, the current version(idea) is mainly a sysfs/kobject notification,
not a data channel, or something that requires a more complicated logic.
(but, yes, your example applies to simple event sequence numbers too. We
can't expose these kind of "snapshot data" by a sysfs value).
> Even though 1) there won't be many events and 2) few events will include
> data - don't you think this is a bit too much development overhead for
> the driver?
>
> If you had the payload with the event, you could fire and forget. It
> would be fewer steps for the driver and require less management and
> storage.
There will be only very few cases for that, but some drivers will want
to send these kind of information to userspace (even binary blobs).
Don't know if this kind of data dump should be part of kevent or better
handled by something driver specific. It gets even more complicated, if
the userspace listener must interpret all these different data formats.
Maybe we just need to rename "kevent" to "kobject_notify" to make the
focus more clear :)
Thanks,
Kay
Thanks, Kay, for answering my questions.
I'm wondering if you've narrowed the interface too much in respect to
possible events. I'm interested in error event notification. My goal is
to work on creating common, consistent, and meaningful error messages or
notifications that can be easily understood or used to trigger events.
Possible responses to error messages - in User Space - could be to spit
out a more detailed error message to the console that includes possible
causes, possible actions, and the erroring device information (gathered
from HAL and/or sysfs) or to automatically launch a diagnostic.
While I'm currently more interested in the actual error events, I
thought I could take advantage of the proposed kevent interface because
parsing /var/log/messages is cumbersome. But now I'm not so sure.
I was thinking the send_kevent form that included the payload could be
put into dev_err() macros, so we didn't have to add yet another
interface to drivers. We could just patch dev_err(). I even thought we
could add a dev_event() for using specific events.
Now I'm wondering if this interface would be useful for error events.
Most error events don't require data at event time, but there are some
that do.
If you curious about the error events, I've started a list of actionable
error events that includes the current message string, possible causes,
and possible actions (it's a work in progress):
http://linux-diag.sourceforge.net/first_failure/FirstFailure.html
Thanks,
Dan
On Thu, Sep 02, 2004 at 12:25:21PM -0400, Robert Love wrote:
> On Thu, 2004-09-02 at 10:34 +0200, Greg KH wrote:
>
> > Why is the kset needed? We can determine that from the kobject.
> >
> > How about changing this to:
> > int send_kevent(struct kobject *kobj, struct attribute *attr);
> > which just tells userspace that a specific attribute needs to be read,
> > as something "important" has changed.
> >
> > Will passing the attribute name be able to successfully handle the
> > "enum kevent" and "signal" combinations?
>
> We can drop the kset if you say we never need it. Why do all the kobj
> get_path functions take a kset then? That is what confused me.
Look at kobject_hotplug. We can determine the kset assigned to a
kobject with the logic in that function. Then we use the kset to get
the path and other good stuff that the hotplug call needs.
> We can also drop the "enum kevent" if we decide we don't want to take
> advantage of the multicasting of the netlink socket. The enum defines
> what multicast group the netlink message is sent out in. I actually
> have been talking to Kay about ditching it, and we are trying to figure
> out if we ever _need_ it.
Sounds fine to me.
> So that is 2 for 2. But ...
>
> I don't dig replacing the signal string with an attribute. I think it
> will really limit what we can do - having the signal as a verb
> describing the event is really important. We might also not always have
> an attribute. Kay's points are all valid.
>
> So what if we had
>
> int send_kevent(struct kobject *kobj, const char *signal);
>
> Which was a way of notifying user-space of a change of "signal" on the
> object "kobj" in sysfs.
Ok, I'll accept that, and I like it, it's simple, and yet powerful for
pretty much everything we'll need in the future.
In fact, I like that type of function so much, I wrote it a few years
ago:
void kobject_hotplug(const char *action, struct kobject *kobj)
You fell right into my trap :)
So, we're back to the original issue. Why is this kernel event system
different from the hotplug system? I would argue there isn't one,
becides the transport, as you seem to want everything that we currently
provide in the current kobject_hotplug() call.
But transports are important, I agree.
How about you just add the ability to send hotplug calls across netlink?
Make it so the kobject_hotplug() function does both the exec() call, and
a netlink call (based on a config option for those people who like to
configure such stuff.)
That way, programs who want to listen to netlink messages to get hotplug
events do so, and so does programs who want to do the /etc/hotplug.d/
type of notification?
Oh, and attributes. How about we just change kobject_hotplug() to be:
int kobject_hotplug(const char *action, struct kobject *kobj, struct attribute *attr);
and if we set attr to be a valid pointer, we make the DEVPATH paramater
contain the attribute name at the end of it.
I'd be glad to accept a patch that implements this.
Look acceptable?
thanks,
greg k-h
I don't think this "event notification" system should be used for error
event notification. For errors, you want to never drop them, or want to
rely on userspace reading the sysfs attribute file in time before it
changes again.
And as my previous message shows, I think we just evolved back to the
current hotplug interface, which really isn't a good one for errors :)
> If you curious about the error events, I've started a list of actionable
> error events that includes the current message string, possible causes,
> and possible actions (it's a work in progress):
>
> http://linux-diag.sourceforge.net/first_failure/FirstFailure.html
That's a great start. Hopefully it will help in figuring out what error
messages should be standardized on across drivers that belong to the
same class.
thanks,
greg k-h
If we add this to the kobject_hotplug function, how do we prevent the
execution of /sbin/hotplug for ksets that have positive hotplug filters?
So I've created a new function for now:
int kobj_notify(const char *signal, struct kobject *kobj, struct attribute *attr)
which can be used from the subsystems. This function is also called for
the normal /sbin/hotplug event. (The subsystems may provide a additional
environment for the /sbin/hotplug events, this is ignored by now.)
Here is the debug output of a simple listener, while inserting a
USB-memory-stick, mounting, unmounting and removing it:
[root@pim kdbusd]# ./kdbusd
main: [1094349091] 'add' from '/org/kernel/devices/pci0000:00/0000:00:1d.1/usb3/3-1'
main: [1094349091] 'add' from '/org/kernel/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0'
main: [1094349091] 'add' from '/org/kernel/class/scsi_host/host2'
main: [1094349091] 'add' from '/org/kernel/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/host2/2:0:0:0'
main: [1094349091] 'add' from '/org/kernel/block/sda'
main: [1094349091] 'add' from '/org/kernel/class/scsi_device/2:0:0:0'
main: [1094349091] 'add' from '/org/kernel/class/scsi_generic/sg0'
main: [1094349092] 'add' from '/org/kernel/block/sda/sda1'
main: [1094349094] 'claim' from '/org/kernel/block/sda/sda1'
main: [1094349101] 'release' from '/org/kernel/block/sda/sda1'
main: [1094349106] 'remove' from '/org/kernel/class/scsi_generic/sg0'
main: [1094349106] 'remove' from '/org/kernel/class/scsi_device/2:0:0:0'
main: [1094349106] 'remove' from '/org/kernel/block/sda/sda1'
main: [1094349106] 'remove' from '/org/kernel/block/sda'
main: [1094349106] 'remove' from '/org/kernel/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/host2/2:0:0:0'
main: [1094349106] 'remove' from '/org/kernel/class/scsi_host/host2'
main: [1094349106] 'remove' from '/org/kernel/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0'
main: [1094349106] 'remove' from '/org/kernel/devices/pci0000:00/0000:00:1d.1/usb3/3-1'
Thanks,
Kay
> So, we're back to the original issue. Why is this kernel event system
> different from the hotplug system? I would argue there isn't one,
> becides the transport, as you seem to want everything that we currently
> provide in the current kobject_hotplug() call.
>
> But transports are important, I agree.
>
> How about you just add the ability to send hotplug calls across netlink?
> Make it so the kobject_hotplug() function does both the exec() call, and
> a netlink call (based on a config option for those people who like to
> configure such stuff.)
This smells.
Look, I agree that unifying the two ideas and transports as much as
possible is the right way to proceed. But the fact is, as you said,
transports _are_ important. And simply always sending out a hotplug
event _and_ a netlink event is silly and superfluous. We need to make
up our minds.
I don't think anyone argues that netlink makes sense for these low
priority asynchronous events.
I'd prefer to integrate the two approaches as much as possible, but keep
the two transports separate. Use hotplug for hotplug events as we do
now and use kevent, which is over netlink, for the new events we want to
add.
Maybe always do the kevent from the hotplug, but definitely do not do
the hotplug from all kevents. It is redundant and extra overhead.
Doing both simultaneous begs the question of why have both. Picking the
right tool for the job is, well, the right tool for the job.
Robert Love
> If we add this to the kobject_hotplug function, how do we prevent the
> execution of /sbin/hotplug for ksets that have positive hotplug filters?
Ah, another good point.
> So I've created a new function for now:
> int kobj_notify(const char *signal, struct kobject *kobj, struct attribute *attr)
> which can be used from the subsystems. This function is also called for
> the normal /sbin/hotplug event. (The subsystems may provide a additional
> environment for the /sbin/hotplug events, this is ignored by now.)
This is basically the last patch I posted, with the removel of "enum
kevent" and the addition of struct attribute *attr".
Which is exactly what I want. ;-)
Best,
> But transports are important, I agree.
Oh, I have another thought (see my previous email first).
The proper course of action based on your suggestion is to cleanly
abstract the concept of the "backend transport" from the notifier, and
offer a compile-time option of hotplug, netlink, and/or whatever else.
Make the transport pluggable and configurable. Do it cleanly. Yada
yada.
But that is a lot of code and a lot of work. More than I think is
warranted, right?
Accepting that the above is the clean and proper way to do what you say,
let's carry it through. What is the ideal situation? People pick
either hotplug or netlink or foo as their transport. Why pick more than
one? Most people select hotplug because that is there now and works.
Maybe in the future people would choose netlink and move to that. This
is all ideally.
In practice, however, we get people enabling both hotplug and netlink,
because they need hotplug for hotplug and want netlink for the new
kevent stuff. So this approach leads to no one ever picking the ideal.
What we want is people using hotplug for hotplug, and kevent over
netlink for the event stuff.
So why stick the two together?
We have kobject_hotplug() and kobject_notify() and everything makes
sense.
Robert Love
in addition I consider the 2 *uses* ortogonal. Hotplug historically has
been 1) reasonably heavy weight and 2) concerned with hardware changes.
General events such as say, "thermal throttling started" could of course
be munged into hotplug but to me that is something artificial.
Saying "look ma, the interfaces have the same types, so it has to be one
function" is imo the wrong thing to do. The usage is different (although
I will admit there is an overlapping area). The cost of sending a
message is different.
Yes, it doesn't make much sense to pipe the kevents through
/sbin/hotplug, but I definitely want the hotplug-events over netlink,
to get rid of the SEQNUM reorder nightmare and unpredictable delay of
the execution of the /etc/hotplug.d/ helpers, we currently can't handle
very well with HAL.
I expect, that we need to have both transports (for hotplug), cause early
boot is unable to react to netlink messages.
The /sbin/hotplug can easily switched off, after some "advanced event daemon"
is running by: "echo -n "" > /proc/sys/kernel/hotplug".
What about moving the /sbin/hotplug execution from lib/kobject.c to
kernel/kobj_notify.c and merge it with our netlink code? This would
separate the "object-storage" from the "object-notification" which is
nice. And we would have a single place to implement all kind of event
transports.
If anybody invents some other kind of transport it can go into
kobj_notify.c. All transports can share some code and are highly
configurable then.
kernel/kobj_notify.c would export:
kobject_hotplug(const char *action, struct kobject *kobj)
kobj_notify (const char *signal, struct kobject *kobj, struct attribute *attr)
lib/kobject.c just calls kobject_hotplug() and we get the event on both
transports at the same time. All other events just use kobj_notify() which
doesn't do /sbin/hotplug.
How does it sound?
Thanks,
Kay
I tried it! Here is a version that moves kobject_hotplug() event out of the
kobject core to the new "userspace event"-file kernel/kobject_uevent.c.
The /sbin/hotplug message is now just a special kind of userspace notification.
All hotplug messages are also send over netlink, with the whole hotplug
environment attached to the message. It looks like this on the wire:
recv(3, "add@/class/input/mouse2\0ACTION=add\0DEVPATH=/class/input/mouse2\0SEQNUM=768\0SUBSYSTEM=input\0", 1024, 0) = 90
All other events like the filesystem device claim/release are only sent
through the netlink interface by kobject_uevent().
Thanks,
Kay
(Andrew, you can drop your other patch in your stack that contained a
version of this.)
thanks,
greg k-h
----------------------
Subject: Kobject Userspace Event Notification
Implemetation of userspace events through a netlink socket. The kernel events
layer provides the functionality to raise an event from a given kobject
represented by its sysfs-path and a signal string to describe the type of
event.
Currently, kobject additions and removals are signalized to userspace by forking
the /sbin/hotplug helper. This patch moves this special case of userspace-event
out of the kobject core to the new kobject_uevent implementation. This makes it
possible to send all hotplug messages also through the new netlink transport.
Possible new users of the kernel userspace functionality are filesystem
mount events (block device claim/release) or simple device state transitions
(cpu overheating).
To send an event, the user needs to pass the kobject, a optional
sysfs-attribute and the signal string to the following function:
kobject_uevent(const char *signal,
struct kobject *kobj,
struct attribute *attr)
Example:
kobject_uevent("overheating", &cpu->kobj, NULL);
The message itself is sent over multicast netlink socket, which makes
it possible for userspace to listen with multiple applications for the same
messages.
Signed-off-by: Robert Love <r...@novell.com>
Signed-off-by: Kay Sievers <kay.s...@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gr...@kroah.com>
diff -Nru a/include/linux/kobject.h b/include/linux/kobject.h
--- a/include/linux/kobject.h 2004-09-10 16:45:59 -07:00
+++ b/include/linux/kobject.h 2004-09-10 16:45:59 -07:00
@@ -62,9 +62,7 @@
extern struct kobject * kobject_get(struct kobject *);
extern void kobject_put(struct kobject *);
-extern void kobject_hotplug(const char *action, struct kobject *);
-
-extern char * kobject_get_path(struct kset *, struct kobject *, int);
+extern char * kobject_get_path(struct kobject *, int);
struct kobj_type {
void (*release)(struct kobject *);
@@ -236,6 +234,34 @@
extern int subsys_create_file(struct subsystem * , struct subsys_attribute *);
extern void subsys_remove_file(struct subsystem * , struct subsys_attribute *);
+
+
+#ifdef CONFIG_HOTPLUG
+extern void kobject_hotplug(const char *action, struct kobject *kobj);
+#else
+static inline void kobject_hotplug(const char *action, struct kobject *kobj) { }
+#endif
+
+
+#ifdef CONFIG_KOBJECT_UEVENT
+extern int kobject_uevent(const char *signal, struct kobject *kobj,
+ struct attribute *attr);
+
+extern int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
+ struct attribute *attr);
+#else
+static inline int kobject_uevent(const char *signal, struct kobject *kobj,
+ struct attribute *attr)
+{
+ return 0;
+}
+
+static inline int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
+ struct attribute *attr)
+{
+ return 0;
+}
+#endif
#endif /* __KERNEL__ */
#endif /* _KOBJECT_H_ */
diff -Nru a/include/linux/netlink.h b/include/linux/netlink.h
--- a/include/linux/netlink.h 2004-09-10 16:45:59 -07:00
+++ b/include/linux/netlink.h 2004-09-10 16:45:59 -07:00
@@ -17,6 +17,7 @@
#define NETLINK_ROUTE6 11 /* af_inet6 route comm channel */
#define NETLINK_IP6_FW 13
#define NETLINK_DNRTMSG 14 /* DECnet routing messages */
+#define NETLINK_KOBJECT_UEVENT 15 /* Kernel messages to userspace */
#define NETLINK_TAPBASE 16 /* 16 to 31 are ethertap */
#define MAX_LINKS 32
diff -Nru a/init/Kconfig b/init/Kconfig
--- a/init/Kconfig 2004-09-10 16:45:59 -07:00
+++ b/init/Kconfig 2004-09-10 16:45:59 -07:00
@@ -195,6 +195,25 @@
agent" (/sbin/hotplug) to load modules and set up software needed
to use devices as you hotplug them.
+config KOBJECT_UEVENT
+ bool "Kernel Userspace Events"
+ depends on NET
+ default y
+ help
+ This option enables the kernel userspace event layer, which is a
+ simple mechanism for kernel-to-user communication over a netlink
+ socket.
+ The goal of the kernel userspace events layer is to provide a simple
+ and efficient events system, that notifies userspace about kobject
+ state changes. This will enable applications to just listen for
+ events instead of polling system devices and files.
+ Hotplug events (kobject addition and removal) are also available on
+ the netlink socket in addition to the execution of /sbin/hotplug if
+ CONFIG_HOTPLUG is enabled.
+
+ Say Y, unless you are building a system requiring minimal memory
+ consumption.
+
config IKCONFIG
bool "Kernel .config support"
---help---
diff -Nru a/lib/Makefile b/lib/Makefile
--- a/lib/Makefile 2004-09-10 16:45:59 -07:00
+++ b/lib/Makefile 2004-09-10 16:45:59 -07:00
@@ -6,7 +6,7 @@
lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
kobject.o kref.o idr.o div64.o parser.o int_sqrt.o \
- bitmap.o extable.o
+ bitmap.o extable.o kobject_uevent.o
lib-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
diff -Nru a/lib/kobject.c b/lib/kobject.c
--- a/lib/kobject.c 2004-09-10 16:45:59 -07:00
+++ b/lib/kobject.c 2004-09-10 16:45:59 -07:00
@@ -63,7 +63,7 @@
return container_of(entry,struct kobject,entry);
}
-static int get_kobj_path_length(struct kset *kset, struct kobject *kobj)
+static int get_kobj_path_length(struct kobject *kobj)
{
int length = 1;
struct kobject * parent = kobj;
@@ -79,7 +79,7 @@
return length;
}
-static void fill_kobj_path(struct kset *kset, struct kobject *kobj, char *path, int length)
+static void fill_kobj_path(struct kobject *kobj, char *path, int length)
{
struct kobject * parent;
@@ -99,146 +99,24 @@
* kobject_get_path - generate and return the path associated with a given kobj
* and kset pair. The result must be freed by the caller with kfree().
*
- * @kset: kset in question, with which to build the path
* @kobj: kobject in question, with which to build the path
* @gfp_mask: the allocation type used to allocate the path
*/
-char * kobject_get_path(struct kset *kset, struct kobject *kobj, int gfp_mask)
+char *kobject_get_path(struct kobject *kobj, int gfp_mask)
{
char *path;
int len;
- len = get_kobj_path_length(kset, kobj);
+ len = get_kobj_path_length(kobj);
path = kmalloc(len, gfp_mask);
if (!path)
return NULL;
memset(path, 0x00, len);
- fill_kobj_path(kset, kobj, path, len);
+ fill_kobj_path(kobj, path, len);
return path;
}
-#ifdef CONFIG_HOTPLUG
-
-u64 hotplug_seqnum;
-#define BUFFER_SIZE 1024 /* should be enough memory for the env */
-#define NUM_ENVP 32 /* number of env pointers */
-static spinlock_t sequence_lock = SPIN_LOCK_UNLOCKED;
-
-static void kset_hotplug(const char *action, struct kset *kset,
- struct kobject *kobj)
-{
- char *argv [3];
- char **envp = NULL;
- char *buffer = NULL;
- char *scratch;
- int i = 0;
- int retval;
- char *kobj_path = NULL;
- char *name = NULL;
- unsigned long seq;
-
- /* If the kset has a filter operation, call it. If it returns
- failure, no hotplug event is required. */
- if (kset->hotplug_ops->filter) {
- if (!kset->hotplug_ops->filter(kset, kobj))
- return;
- }
-
- pr_debug ("%s\n", __FUNCTION__);
-
- if (!hotplug_path[0])
- return;
-
- envp = kmalloc(NUM_ENVP * sizeof (char *), GFP_KERNEL);
- if (!envp)
- return;
- memset (envp, 0x00, NUM_ENVP * sizeof (char *));
-
- buffer = kmalloc(BUFFER_SIZE, GFP_KERNEL);
- if (!buffer)
- goto exit;
-
- if (kset->hotplug_ops->name)
- name = kset->hotplug_ops->name(kset, kobj);
- if (name == NULL)
- name = kset->kobj.name;
-
- argv [0] = hotplug_path;
- argv [1] = name;
- argv [2] = NULL;
-
- /* minimal command environment */
- envp [i++] = "HOME=/";
- envp [i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
-
- scratch = buffer;
-
- envp [i++] = scratch;
- scratch += sprintf(scratch, "ACTION=%s", action) + 1;
-
- spin_lock(&sequence_lock);
- seq = ++hotplug_seqnum;
- spin_unlock(&sequence_lock);
-
- envp [i++] = scratch;
- scratch += sprintf(scratch, "SEQNUM=%ld", seq) + 1;
-
- kobj_path = kobject_get_path(kset, kobj, GFP_KERNEL);
- if (!kobj_path)
- goto exit;
-
- envp [i++] = scratch;
- scratch += sprintf (scratch, "DEVPATH=%s", kobj_path) + 1;
-
- if (kset->hotplug_ops->hotplug) {
- /* have the kset specific function add its stuff */
- retval = kset->hotplug_ops->hotplug (kset, kobj,
- &envp[i], NUM_ENVP - i, scratch,
- BUFFER_SIZE - (scratch - buffer));
- if (retval) {
- pr_debug ("%s - hotplug() returned %d\n",
- __FUNCTION__, retval);
- goto exit;
- }
- }
-
- pr_debug ("%s: %s %s %s %s %s %s %s\n", __FUNCTION__, argv[0], argv[1],
- envp[0], envp[1], envp[2], envp[3], envp[4]);
- retval = call_usermodehelper (argv[0], argv, envp, 0);
- if (retval)
- pr_debug ("%s - call_usermodehelper returned %d\n",
- __FUNCTION__, retval);
-
-exit:
- kfree(kobj_path);
- kfree(buffer);
- kfree(envp);
- return;
-}
-
-void kobject_hotplug(const char *action, struct kobject *kobj)
-{
- struct kobject * top_kobj = kobj;
-
- /* If this kobj does not belong to a kset,
- try to find a parent that does. */
- if (!top_kobj->kset && top_kobj->parent) {
- do {
- top_kobj = top_kobj->parent;
- } while (!top_kobj->kset && top_kobj->parent);
- }
-
- if (top_kobj->kset && top_kobj->kset->hotplug_ops)
- kset_hotplug(action, top_kobj->kset, kobj);
-}
-#else
-void kobject_hotplug(const char *action, struct kobject *kobj)
-{
- return;
-}
-#endif /* CONFIG_HOTPLUG */
-
/**
* kobject_init - initialize object.
* @kobj: object in question.
@@ -654,7 +532,6 @@
EXPORT_SYMBOL(kobject_add);
EXPORT_SYMBOL(kobject_del);
EXPORT_SYMBOL(kobject_rename);
-EXPORT_SYMBOL(kobject_hotplug);
EXPORT_SYMBOL(kset_register);
EXPORT_SYMBOL(kset_unregister);
diff -Nru a/lib/kobject_uevent.c b/lib/kobject_uevent.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/lib/kobject_uevent.c 2004-09-10 16:45:59 -07:00
@@ -0,0 +1,264 @@
+/*
+ * kernel userspace event delivery
+ *
+ * Copyright (C) 2004 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2004 Novell, Inc. All rights reserved.
+ * Copyright (C) 2004 IBM, Inc. All rights reserved.
+ *
+ * Licensed under the GNU GPL v2.
+ *
+ * Authors:
+ * Robert Love <r...@novell.com>
+ * Kay Sievers <kay.s...@vrfy.org>
+ * Arjan van de Ven <arj...@redhat.com>
+ * Greg Kroah-Hartman <gr...@kroah.com>
+ */
+
+#include <linux/spinlock.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/netlink.h>
+#include <linux/string.h>
+#include <linux/kobject.h>
+#include <net/sock.h>
+
+#ifdef CONFIG_KOBJECT_UEVENT
+static struct sock *uevent_sock;
+
+/**
+ * send_uevent - notify userspace by sending event trough netlink socket
+ *
+ * @signal: signal name
+ * @obj: object path (kobject)
+ * @buf: buffer used to pass auxiliary data like the hotplug environment
+ * @buflen:
+ * gfp_mask:
+ */
+static int send_uevent(const char *signal, const char *obj, const void *buf,
+ int buflen, int gfp_mask)
+{
+ struct sk_buff *skb;
+ char *pos;
+ int len;
+
+ if (!uevent_sock)
+ return -EIO;
+
+ len = strlen(signal) + 1;
+ len += strlen(obj) + 1;
+ len += buflen;
+
+ skb = alloc_skb(len, gfp_mask);
+ if (!skb)
+ return -ENOMEM;
+
+ pos = skb_put(skb, len);
+
+ pos += sprintf(pos, "%s@%s", signal, obj) + 1;
+ memcpy(pos, buf, buflen);
+
+ return netlink_broadcast(uevent_sock, skb, 0, 1, gfp_mask);
+}
+
+static int do_kobject_uevent(const char *signal, struct kobject *kobj,
+ struct attribute *attr, int gfp_mask)
+{
+ char *path;
+ char *attrpath;
+ int len;
+ int rc = -ENOMEM;
+
+ path = kobject_get_path(kobj, gfp_mask);
+ if (!path)
+ return -ENOMEM;
+
+ if (attr) {
+ len = strlen(path);
+ len += strlen(attr->name) + 2;
+ attrpath = kmalloc(len, gfp_mask);
+ if (!attrpath)
+ goto exit;
+ sprintf(attrpath, "%s/%s", path, attr->name);
+ rc = send_uevent(signal, attrpath, NULL, 0, gfp_mask);
+ kfree(attrpath);
+ } else {
+ rc = send_uevent(signal, path, NULL, 0, gfp_mask);
+ }
+
+exit:
+ kfree(path);
+ return rc;
+}
+
+/**
+ * kobject_uevent - notify userspace by sending event through netlink socket
+ *
+ * @signal: signal name
+ * @kobj: struct kobject that the event is happening to
+ * @attr: optional struct attribute the event belongs to
+ */
+int kobject_uevent(const char *signal, struct kobject *kobj,
+ struct attribute *attr)
+{
+ return do_kobject_uevent(signal, kobj, attr, GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(kobject_uevent);
+
+int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
+ struct attribute *attr)
+{
+ return do_kobject_uevent(signal, kobj, attr, GFP_ATOMIC);
+}
+
+EXPORT_SYMBOL_GPL(kobject_uevent_atomic);
+
+static int __init kobject_uevent_init(void)
+{
+ uevent_sock = netlink_kernel_create(NETLINK_KOBJECT_UEVENT, NULL);
+
+ if (!uevent_sock) {
+ printk(KERN_ERR
+ "kobject_uevent: unable to create netlink socket!\n");
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+core_initcall(kobject_uevent_init);
+
+#else
+static inline int send_uevent(const char *signal, const char *obj,
+ const void *buf, int buflen, int gfp_mask)
+{
+ return 0;
+}
+
+#endif /* CONFIG_KOBJECT_UEVENT */
+
+
+#ifdef CONFIG_HOTPLUG
+u64 hotplug_seqnum;
+static spinlock_t sequence_lock = SPIN_LOCK_UNLOCKED;
+
+#define BUFFER_SIZE 1024 /* should be enough memory for the env */
+#define NUM_ENVP 32 /* number of env pointers */
+/**
+ * kobject_hotplug - notify userspace by executing /sbin/hotplug
+ *
+ * @action: action that is happening (usually "ADD" or "REMOVE")
+ * @kobj: struct kobject that the action is happening to
+ */
+void kobject_hotplug(const char *action, struct kobject *kobj)
+{
+ char *argv [3];
+ char **envp = NULL;
+ char *buffer = NULL;
+ char *scratch;
+ int i = 0;
+ int retval;
+ char *kobj_path = NULL;
+ char *name = NULL;
+ u64 seq;
+ struct kobject *top_kobj = kobj;
+ struct kset *kset;
+
+ if (!top_kobj->kset && top_kobj->parent) {
+ do {
+ top_kobj = top_kobj->parent;
+ } while (!top_kobj->kset && top_kobj->parent);
+ }
+
+ if (top_kobj->kset && top_kobj->kset->hotplug_ops)
+ kset = top_kobj->kset;
+ else
+ return;
+
+ /* If the kset has a filter operation, call it.
+ Skip the event, if the filter returns zero. */
+ if (kset->hotplug_ops->filter) {
+ if (!kset->hotplug_ops->filter(kset, kobj))
+ return;
+ }
+
+ pr_debug ("%s\n", __FUNCTION__);
+
+ envp = kmalloc(NUM_ENVP * sizeof (char *), GFP_KERNEL);
+ if (!envp)
+ return;
+ memset (envp, 0x00, NUM_ENVP * sizeof (char *));
+
+ buffer = kmalloc(BUFFER_SIZE, GFP_KERNEL);
+ if (!buffer)
+ goto exit;
+
+ if (kset->hotplug_ops->name)
+ name = kset->hotplug_ops->name(kset, kobj);
+ if (name == NULL)
+ name = kset->kobj.name;
+
+ argv [0] = hotplug_path;
+ argv [1] = name;
+ argv [2] = NULL;
+
+ /* minimal command environment */
+ envp [i++] = "HOME=/";
+ envp [i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
+
+ scratch = buffer;
+
+ envp [i++] = scratch;
+ scratch += sprintf(scratch, "ACTION=%s", action) + 1;
+
+ kobj_path = kobject_get_path(kobj, GFP_KERNEL);
+ if (!kobj_path)
+ goto exit;
+
+ envp [i++] = scratch;
+ scratch += sprintf (scratch, "DEVPATH=%s", kobj_path) + 1;
+
+ spin_lock(&sequence_lock);
+ seq = ++hotplug_seqnum;
+ spin_unlock(&sequence_lock);
+
+ envp [i++] = scratch;
+ scratch += sprintf(scratch, "SEQNUM=%lld", seq) + 1;
+
+ envp [i++] = scratch;
+ scratch += sprintf(scratch, "SUBSYSTEM=%s", name) + 1;
+
+ if (kset->hotplug_ops->hotplug) {
+ /* have the kset specific function add its stuff */
+ retval = kset->hotplug_ops->hotplug (kset, kobj,
+ &envp[i], NUM_ENVP - i, scratch,
+ BUFFER_SIZE - (scratch - buffer));
+ if (retval) {
+ pr_debug ("%s - hotplug() returned %d\n",
+ __FUNCTION__, retval);
+ goto exit;
+ }
+ }
+
+ pr_debug ("%s: %s %s %s %s %s %s %s\n", __FUNCTION__, argv[0], argv[1],
+ envp[0], envp[1], envp[2], envp[3], envp[4]);
+
+ send_uevent(action, kobj_path, buffer, scratch - buffer, GFP_KERNEL);
+
+ if (!hotplug_path[0])
+ goto exit;
+
+ retval = call_usermodehelper (argv[0], argv, envp, 0);
+ if (retval)
+ pr_debug ("%s - call_usermodehelper returned %d\n",
+ __FUNCTION__, retval);
+
+exit:
+ kfree(kobj_path);
+ kfree(buffer);
+ kfree(envp);
+ return;
+}
+EXPORT_SYMBOL(kobject_hotplug);
+#endif /* CONFIG_HOTPLUG */
+
+
Sorry I missed the flare up of this topic. What about events for which
there is no associated kobject?
why is the kobject argument not first? Seems weird..
What happened to a formatted string argument? The signal argument can
become the pre-formatted string, and someone can provide a wrapper
that takes a printf() like format and args.
kobject_uevent_printf(kobj, "something bad: 0x%08x", err);
Tim
Tough, no event for them :)
> why is the kobject argument not first? Seems weird..
Yeah, it is a bit "odd", but it follows my old kobject_hotplug() way.
> What happened to a formatted string argument? The signal argument can
> become the pre-formatted string, and someone can provide a wrapper
> that takes a printf() like format and args.
> kobject_uevent_printf(kobj, "something bad: 0x%08x", err);
Use an attribute, and have userspace read that formatted argument if
need be. This keeps the kernel interface much simpler, and doesn't
allow you to abuse it for things it is not intended for (like error
reporting stuff...)
thanks,
greg k-h
Not to be cheeky, but cpu "overheating" isn't an error event? <grin>
Errm, not for error reporting? So the "driver hardening" and fault
logging people shouldn't use this?
Greg, looks good to me. Thank you much for picking this up.
I am currently out of town (foo camp) so I cannot really test it, but I
did a quick patch review and it looks right on.
> (Andrew, you can drop your other patch in your stack that contained a
> version of this.)
Nod.
> +/**
> + * send_uevent - notify userspace by sending event trough netlink socket
s/trough/through/ ;-)
...
Kay, any thoughts?
We should probably add at least _some_ user. The filesystem mount
events are good, since we want to add those to HAL.
Again, thanks.
Robert Love
> Not to be cheeky, but cpu "overheating" isn't an error event? <grin>
I think the key importance is event vs. error. E.g., "overheating" is a
state, something that can be handled in user-space. A driver piping out
"error 501" is, well, something different.
Personally, I am not against using kevent for driver events/errors, but
the exact scope needs to be decided by the community. You do need a
kevent, though.
Robert Love
> > What happened to a formatted string argument? The signal argument can
> > become the pre-formatted string, and someone can provide a wrapper
> > that takes a printf() like format and args.
> > kobject_uevent_printf(kobj, "something bad: 0x%08x", err);
>
> Use an attribute, and have userspace read that formatted argument if
> need be. This keeps the kernel interface much simpler, and doesn't
> allow you to abuse it for things it is not intended for (like error
> reporting stuff...)
Erm, no. This will just encourage folks to sprintf to a buffer first
and pass the result to kobject_uevent_printf().
nitpick: Also, if this isn't taking formatted input, shouldn't the name of the
function lose the 'f' ?
Dave
Heh, give something for the "spelling nit-pickers" to submit a patch
against :)
> We should probably add at least _some_ user. The filesystem mount
> events are good, since we want to add those to HAL.
True, anyone want to send me a patch with a user of this?
thanks,
greg k-h
The "driver hardening" people hopefully have been scared away from Linux
so they never show their face around here again. And if they do, no,
this is not what they want to do (what they need to do is get a clue...)
The fault logging people should also not use this, as this is for
notifying userspace that an event has happened. And yes, "overheat" is
an example of a event that some people consider a "fault" :)
thanks,
greg k-h
Yeah, I agree. I think we need to standardize on the "events", and make
it hard to misspell them. I'll work on that on Monday.
> nitpick: Also, if this isn't taking formatted input, shouldn't the name of the
> function lose the 'f' ?
The function name is kobject_uevent(), no 'f' in sight :)
thanks,
greg k-h
Do we agree on the model that the signal is a simple verb and we keep
only a small dictionary of _static_ signal strings and no fancy compositions?
And we should reserve the "add" and "remove" only for the hotplug events.
Here is the first user besides the hotplug event, for notification of
filesystem changes with "mount" and "umount" signals for the blockdevice.
Thanks,
Kay
What's wrong about fixing acpi to have something like
/sys/devices/acpi/buttons/power/, that spits out the event?
Just curious...
Jan
It's a "real" one in that it is generated :)
request_firmware() causes the hotplug event to happen, so that we know
to load firmware to the device that requested it.
> > I didn't put a check in the kobject_uevent() calls to prevent the add and
> > remove, but now it's a lot easier to do so if you think it's necessary.
>
> Don't think that this is needed. I will add somthing to the kobject
> documentation, if it's stable and merged.
Ok, I'll add this patch and wait for the documentation :)
thanks,
greg k-h
I don't have any concrete examples right now, but it seems that this is
being locked down pretty tightly for no real reason...
Just a passing thought.
You'd still need to spit out a payload with the status. Interesting idea
for the evolution of acpi, though...
Yes, it's a wide range of stuff that all should be consolidated.
But I haven't seen many people step up to do the work, that's my biggest
complaint. I know I don't have the time to do it either :)
> > > What I think we'll find is that fringe users will hack around it. It will
> > > become a documentum that the "insert" event of a Foo really means
> > > something else. People will adapt to the limited "verbs" and overload
> > > them to mean whatever it is that they need.
> >
> > Since when did I ever state that the verbs would be "limited"? One of
> > the original issues that people were worried about was the possiblity of
> > a typo in a verb that we would be forced to live with. The patch I just
> > proposed fixes that issue.
>
> Sorry, don't get me wrong - I fully agree with amking it an enum or some
> such. In fact, I think I suggested that in the first draft, too. But
> adding a dozen enum verbs, of which each are used once in one single
> driver is just not going to fly. The barrier to creating a new "verb" is
> not high, but there is a slight barrier. Rather than deal with the
> barrier, I fear (and I could be wrong) that people will just overload
> existing verbs.
We'll see how it gets used. If people start to do this, we'll
reconsider the kernel code. The interface to userspace will still be
the same, so we aren't painting ourselves into a corner right now.
> > > As much as we all like to malign "driver hardening", there is a *lot* that
> > > can be done to make drivers more robust and to report better diagnostics
> > > and failure events.
> >
> > I agree. But this interface is not designed or intended for that.
>
> Right. I originally asked Robert if there was some way to make this
> interface capable of handling that, too. Maybe the answer is merely "no,
> not this API".
"No, not this API."
Seriously, that's not what this interface is for. This is a simple
event notification interface.
> > > I'd like to have a standardized way to spit things like ECC errors up to
> > > userspace, but I don't think that's what Greg K-H wants these used for.
> >
> > You are correct. I also would like to see a way ECC and other types of
> > errors and diagnostics be sent to userspace in a common and unified
> > manner. But I have yet to see a proposal to do this that is acceptable.
>
> Well, let's open that discussion, then! :) What requirements do you see?
>
> Basically, they need to be exactly like this, except there needs to be
> some amount of buffering of messages (somehow) and they need a data
> payload.
Sounds good to me.
> For buffering, I could live with "all events with no listener get
> printk()ed and they get buffered in dmesg". Now all we need to solve is
> the payload issue.
>
> Really, other than payload, why NOT use this API for ECC and driver
> faults?
The payload is a pretty good reason why to not use this right now. No
one has proposed a way to handle such a payload in a sane manner.
> > > I'd like to ACPI events move to a standardized event system, but they
> > > *require* a data payload.
> >
> > Great, that also would be wonderful. Create such a event system for
> > ACPI if you desire. I think the ACPI developers are still working on
> > bigger issues currently.
>
> ACPI *has* it's own event system. It's fine, but it's Yet Another Event
> System.
Yeah, it's pretty old too, that's why it is the way it is.
> I'd love to see it use this. It has mostly the same behaviors,
> except it has a data payload (a string and couple unsigned ints, if I
> recall). If this API can't handle that, then we have to keep ACPI's
> current event system. Wouldn't it be nicer to remove code and encourage
> migrating towards standard(ish) APIs?
>
> Again, other than payload, why NOT use this API for ACPI?
Again, the payload is the big thing, right?
thanks,
greg k-h
That's fine. I'll dump them into a separate header file to make it a
bit easier to find and add new ones to.
> In other words, I like the safety and typo prevention that this gives
> us, but want to keep things flexible and easy.
Heh, you want it all, don't you :)
Yeah, it's much reduced in flexibility from where it started.
> But so long as you can always add a new action, what complaint do you
> have? In other words, all this does is force the use of the enum, which
> ensures that we try to reuse existing actions, prevent typos, and so on.
Well, it will be what it will be, I think. I know several people who
wanted it to be more than it is turning out to be, but that's not
unexpected. Of course we can cope with what it is.
What I think we'll find is that fringe users will hack around it. It will
become a documentum that the "insert" event of a Foo really means
something else. People will adapt to the limited "verbs" and overload
them to mean whatever it is that they need.
As much as we all like to malign "driver hardening", there is a *lot* that
can be done to make drivers more robust and to report better diagnostics
and failure events.
I'd like to have a standardized way to spit things like ECC errors up to
userspace, but I don't think that's what Greg K-H wants these used for.
I'd like to ACPI events move to a standardized event system, but they
*require* a data payload.
There are *way* too many places (IMHO) where we throw a printk() and punt,
or do something which is less than ideal. If I had my druthers, we would
examine most places that call printk() at runtime (not startup, etc) and
figure out if an event makes more sense.
This model serves well for "eth0 has a link" and "hda1 was mounted" sorts
of events. [Though namespaces make mounting a lot of fun. Which namespace
was it mounted on? Why should my app in namespace X see an event about
namespace Y?]
If that is all it's good for, then it is better than nothing, though not
as good as it might be.
Tim
Well, what's the status supposed to mean? Is this something like
buttondown, buttonup, ...? Couldn't this be the 'event'. I mean that it
is an event is kind of selfexplaining if it comes through the event layer.
Jan
That's because the original proposal was not very well defined at all.
> > But so long as you can always add a new action, what complaint do you
> > have? In other words, all this does is force the use of the enum, which
> > ensures that we try to reuse existing actions, prevent typos, and so on.
>
> Well, it will be what it will be, I think. I know several people who
> wanted it to be more than it is turning out to be, but that's not
> unexpected. Of course we can cope with what it is.
Who are those people? What did people want it to be that it now is not?
Will they not speak up publicly? If not, how do they expect it to
address things they want?
> What I think we'll find is that fringe users will hack around it. It will
> become a documentum that the "insert" event of a Foo really means
> something else. People will adapt to the limited "verbs" and overload
> them to mean whatever it is that they need.
Since when did I ever state that the verbs would be "limited"? One of
the original issues that people were worried about was the possiblity of
a typo in a verb that we would be forced to live with. The patch I just
proposed fixes that issue.
All new "verbs" will be submitted to the same kind of review that any
other portion of the kernel code would be subjected to. Nothing wrong
with that, right?
> As much as we all like to malign "driver hardening", there is a *lot* that
> can be done to make drivers more robust and to report better diagnostics
> and failure events.
I agree. But this interface is not designed or intended for that.
> I'd like to have a standardized way to spit things like ECC errors up to
> userspace, but I don't think that's what Greg K-H wants these used for.
You are correct. I also would like to see a way ECC and other types of
errors and diagnostics be sent to userspace in a common and unified
manner. But I have yet to see a proposal to do this that is acceptable.
> I'd like to ACPI events move to a standardized event system, but they
> *require* a data payload.
Great, that also would be wonderful. Create such a event system for
ACPI if you desire. I think the ACPI developers are still working on
bigger issues currently.
> There are *way* too many places (IMHO) where we throw a printk() and punt,
> or do something which is less than ideal. If I had my druthers, we would
> examine most places that call printk() at runtime (not startup, etc) and
> figure out if an event makes more sense.
Please do.
> This model serves well for "eth0 has a link" and "hda1 was mounted" sorts
> of events. [Though namespaces make mounting a lot of fun. Which namespace
> was it mounted on? Why should my app in namespace X see an event about
> namespace Y?]
Yeah, namespaces are interesting :)
> If that is all it's good for, then it is better than nothing, though not
> as good as it might be.
Patches are always welcome.
thanks,
greg k-h
I agree with this. And because of that, we should enforce that, and
help prevent typos in the signals. So, here's a patch that does just
that, making it a lot harder to mess up (although you still can, as
enumerated types aren't checked by the compiler...) This patch booted
on my test box.
Anyone object to me adding this patch? If not, I'll fix up Kay's patch
for mounting to use this interface and add both of them.
> And we should reserve the "add" and "remove" only for the hotplug events.
I don't know, the firmware objects already use "add" for an event. I
didn't put a check in the kobject_uevent() calls to prevent the add and
remove, but now it's a lot easier to do so if you think it's necessary.
thanks,
greg k-h
-----------
kevent: force users to use a type, not a string to help prevent typos
and so that we all can keep track of the different event types much
easier.
===== drivers/base/firmware_class.c 1.13 vs edited =====
--- 1.13/drivers/base/firmware_class.c 2004-09-13 09:46:22 -07:00
+++ edited/drivers/base/firmware_class.c 2004-09-14 16:33:20 -07:00
@@ -420,7 +420,7 @@
add_timer(&fw_priv->timeout);
}
- kobject_hotplug("add", &class_dev->kobj);
+ kobject_hotplug(&class_dev->kobj, KOBJ_ADD);
wait_for_completion(&fw_priv->completion);
set_bit(FW_STATUS_DONE, &fw_priv->status);
===== include/linux/kobject.h 1.30 vs edited =====
--- 1.30/include/linux/kobject.h 2004-09-13 12:58:51 -07:00
+++ edited/include/linux/kobject.h 2004-09-14 16:24:00 -07:00
@@ -236,27 +236,33 @@
extern void subsys_remove_file(struct subsystem * , struct subsys_attribute *);
+enum kobject_action {
+ KOBJ_ADD = 0x00,
+ KOBJ_REMOVE = 0x01,
+ KOBJ_MAX_ACTION,
+};
+
#ifdef CONFIG_HOTPLUG
-extern void kobject_hotplug(const char *action, struct kobject *kobj);
+extern void kobject_hotplug(struct kobject *kobj, enum kobject_action action);
#else
-static inline void kobject_hotplug(const char *action, struct kobject *kobj) { }
+static inline void kobject_hotplug(struct kobject *kobj, enum kobject_action action) { }
#endif
#ifdef CONFIG_KOBJECT_UEVENT
-extern int kobject_uevent(const char *signal, struct kobject *kobj,
+extern int kobject_uevent(struct kobject *kobj, enum kobject_action action,
struct attribute *attr);
-extern int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
+extern int kobject_uevent_atomic(struct kobject *kobj, enum kobject_action action,
struct attribute *attr);
#else
-static inline int kobject_uevent(const char *signal, struct kobject *kobj,
+static inline int kobject_uevent(struct kobject *kobj, enum kobject_action action,
struct attribute *attr)
{
return 0;
}
-static inline int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
+static inline int kobject_uevent_atomic(struct kobject *kobj, enum kobject_action action,
struct attribute *attr)
{
return 0;
===== lib/kobject.c 1.55 vs edited =====
--- 1.55/lib/kobject.c 2004-09-13 09:54:04 -07:00
+++ edited/lib/kobject.c 2004-09-14 16:20:21 -07:00
@@ -185,7 +185,7 @@
if (parent)
kobject_put(parent);
} else {
- kobject_hotplug("add", kobj);
+ kobject_hotplug(kobj, KOBJ_ADD);
}
return error;
@@ -299,7 +299,7 @@
void kobject_del(struct kobject * kobj)
{
- kobject_hotplug("remove", kobj);
+ kobject_hotplug(kobj, KOBJ_REMOVE);
sysfs_remove_dir(kobj);
unlink(kobj);
}
===== lib/kobject_uevent.c 1.2 vs edited =====
--- 1.2/lib/kobject_uevent.c 2004-09-11 18:50:05 -07:00
+++ edited/lib/kobject_uevent.c 2004-09-14 16:28:21 -07:00
@@ -22,6 +22,20 @@
#include <linux/kobject.h>
#include <net/sock.h>
+/* these match up with the values for enum kobject_action */
+static char *actions[] = {
+ "add", /* 0x00 */
+ "remove", /* 0x01 */
+};
+
+static char *action_to_string(enum kobject_action action)
+{
+ if (action >= KOBJ_MAX_ACTION)
+ return NULL;
+ else
+ return actions[action];
+}
+
#ifdef CONFIG_KOBJECT_UEVENT
static struct sock *uevent_sock;
@@ -60,11 +74,12 @@
return netlink_broadcast(uevent_sock, skb, 0, 1, gfp_mask);
}
-static int do_kobject_uevent(const char *signal, struct kobject *kobj,
+static int do_kobject_uevent(struct kobject *kobj, enum kobject_action action,
struct attribute *attr, int gfp_mask)
{
char *path;
char *attrpath;
+ char *signal;
int len;
int rc = -ENOMEM;
@@ -72,6 +87,10 @@
if (!path)
return -ENOMEM;
+ signal = action_to_string(action);
+ if (!signal)
+ return -EINVAL;
+
if (attr) {
len = strlen(path);
len += strlen(attr->name) + 2;
@@ -97,17 +116,17 @@
* @kobj: struct kobject that the event is happening to
* @attr: optional struct attribute the event belongs to
*/
-int kobject_uevent(const char *signal, struct kobject *kobj,
+int kobject_uevent(struct kobject *kobj, enum kobject_action action,
struct attribute *attr)
{
- return do_kobject_uevent(signal, kobj, attr, GFP_KERNEL);
+ return do_kobject_uevent(kobj, action, attr, GFP_KERNEL);
}
EXPORT_SYMBOL_GPL(kobject_uevent);
-int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
+int kobject_uevent_atomic(struct kobject *kobj, enum kobject_action action,
struct attribute *attr)
{
- return do_kobject_uevent(signal, kobj, attr, GFP_ATOMIC);
+ return do_kobject_uevent(kobj, action, attr, GFP_ATOMIC);
}
EXPORT_SYMBOL_GPL(kobject_uevent_atomic);
@@ -149,7 +168,7 @@
* @action: action that is happening (usually "ADD" or "REMOVE")
* @kobj: struct kobject that the action is happening to
*/
-void kobject_hotplug(const char *action, struct kobject *kobj)
+void kobject_hotplug(struct kobject *kobj, enum kobject_action action)
{
char *argv [3];
char **envp = NULL;
@@ -159,6 +178,7 @@
int retval;
char *kobj_path = NULL;
char *name = NULL;
+ char *action_string;
u64 seq;
struct kobject *top_kobj = kobj;
struct kset *kset;
@@ -183,6 +203,10 @@
pr_debug ("%s\n", __FUNCTION__);
+ action_string = action_to_string(action);
+ if (!action_string)
+ return;
+
envp = kmalloc(NUM_ENVP * sizeof (char *), GFP_KERNEL);
if (!envp)
return;
@@ -208,7 +232,7 @@
scratch = buffer;
envp [i++] = scratch;
- scratch += sprintf(scratch, "ACTION=%s", action) + 1;
+ scratch += sprintf(scratch, "ACTION=%s", action_string) + 1;
kobj_path = kobject_get_path(kobj, GFP_KERNEL);
if (!kobj_path)
@@ -242,7 +266,7 @@
pr_debug ("%s: %s %s %s %s %s %s %s\n", __FUNCTION__, argv[0], argv[1],
envp[0], envp[1], envp[2], envp[3], envp[4]);
- send_uevent(action, kobj_path, buffer, scratch - buffer, GFP_KERNEL);
+ send_uevent(action_string, kobj_path, buffer, scratch - buffer, GFP_KERNEL);
if (!hotplug_path[0])
goto exit;
Tim Hockin wrote:
> Take another example: ECC. I'd love to spit out ECC messages in a
> standard way. I've got some code in devel to do better ECC handling. But
> you have to spit out the CPU and the address, at least, and possibly more.
Well, time for /sys/devices/memory/memory<n>/. That would perhaps also
be suitable for numa which want to know which memory module is near
which cpu. So you could have symbolic links in /sys/devices/cpu/cpu<n>
to the corresponding memory modules.
Jan
Don't we already have something like that. On an SN2 near me at this
time, running 2.6.9-rc1-mm4:
# pwd
/sys/devices/system/node/node0
# ls -lt cpu? | cut -c33-
0 Sep 15 00:50 cpu0 -> ../../../../devices/system/cpu/cpu0
0 Sep 15 00:50 cpu1 -> ../../../../devices/system/cpu/cpu1
This tells me that CPUs 0 and 1 are on node 0.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <p...@sgi.com> 1.650.933.1373
Well, the fact that you'd have to somehow:
1) pass the list of all the drivers that register notify handlers to the
userspace
2) make userspace daemons hold ~10 sysfs nodes open
Not that it's not feasible, but that's simply ugly IMHO.
Best regards,
--
Karol Kozimor
kkoz...@aurox.org
And how do you know which memory modules are near cpu0 and 1 ?
Is there already a devices/system/memory/ thing which also gets linked
from the node0 directory?
(Sorry, no SN2 to check handy ;-) )
Jan
IIRC the two possible future destinations for ACPI events are this,
and the input layer. There are some ACPI events that clearly should go
through this mechanism (e.g. thermal), some the input layer (e.g.
weird laptop extra keys), and maybe some in between? I know David
Bronaugh was looking into this a few weeks ago, maybe he'll pop back
up.
My 2c -- Andy
The problem here is that the current picture is merely a sysfs notification
(like a hotplug event is) and we really don't want to use that for error
handling and other things, that needs a reliable form of communcation.
Here we should consider a new transactional interface which can also carry
binary snapshot data from the drivers.
This is a complete different use and should have something like a event
filesystem (relayfs?) or similar and not in any kind something like the here
proposed printk fallback.
For the hotplug event we use a kind of "payload". It's just the already
computed list of '\0'-terminated environment strings, cause we need to know,
what subsystem we are working on and need the hotplug sequence number.
But hotplug is some special kind of event and every event is still strongly
tight to a sysfs device and all "real data" is available from the device
itself.
The same for the mount event, we don't want to carry any specific data,
cause the real data is already available in /proc/mounts, we just want to
know about the device state change. Then we can reread that data, instead
of constantly polling for it.
Yeah, for the ACPI event, I see that it would be possible to use the
payload model, like we use for the hotplug event, but if we open that in
general to the public, I expect something like printk over netlink.
From my point of view, the kobject, an event is generated for, should be
a "real" device and never a whole subsystem. As long as ACPI can _not_ send
events for specific sysfs devices, I'm not for using kobject_uevent.
Thanks,
Kay
So far as the current code is concerned, "memory" and "node" are synonyms.
Notice for example the files such as:
/sys/devices/system/node/node0/meminfo
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <p...@sgi.com> 1.650.933.1373
I *think* the suggestion was to add sysfs nodes for ACPI objects so that
the assosicated kobject could be the "source" argument to the uevent API.
Not that each kobject would have an event stream of it's own :)
I'm not deeply into ACPI, so I'm not sure if it is possible to enumerate
all event sources, or if GPEs would come from nowhere...
Yepp.
> I'm not deeply into ACPI, so I'm not sure if it is possible to enumerate
> all event sources, or if GPEs would come from nowhere...
Well if you do a find /sys | grep acpi, there are basically all nodes
already there - just no values in there :-(
I think Andrew's suggestions (in this same thread) make more sense, ie.
deliver button events via the input layer...
Jan
> IIRC the two possible future destinations for ACPI events are this,
> and the input layer. There are some ACPI events that clearly should go
> through this mechanism (e.g. thermal), some the input layer (e.g.
> weird laptop extra keys), and maybe some in between? I know David
> Bronaugh was looking into this a few weeks ago, maybe he'll pop back
> up.
Hey, Andy.
I'd like, if possible, to have ACPI use kevents. ACPI makes sense as a
user.
The first question is whether or not ACPI could use the current kevent
model. The current reactionary response is that kevent is too limited,
but that misses the point a bit. The point is that you create kobjects
and better integrate with sysfs, and then kevent becomes trivial.
So if ACPI better took advantage of sysfs, would kevents be sufficient?
Robert Love
And here's the patch I applied to my trees and will show up in the next
-mm release.
I'll go convert Kay's mount patch to the new interface and add it too.
(and yes, I've added a "change" event that drivers and such can use to
tell userspace they they need to go look at a specific sysfs file. I
figured that this was going to be the next action request so I might as
well add it now...)
thanks,
greg k-h
-----
kevent: standardize on the event types
This prevents any potential typos from happening.
Signed-off-by: Greg Kroah-Hartman <gr...@kroah.com>
diff -Nru a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
--- a/drivers/base/firmware_class.c 2004-09-15 11:52:49 -07:00
+++ b/drivers/base/firmware_class.c 2004-09-15 11:52:49 -07:00
@@ -420,7 +420,7 @@
add_timer(&fw_priv->timeout);
}
- kobject_hotplug("add", &class_dev->kobj);
+ kobject_hotplug(&class_dev->kobj, KOBJ_ADD);
wait_for_completion(&fw_priv->completion);
set_bit(FW_STATUS_DONE, &fw_priv->status);
diff -Nru a/include/linux/kobject.h b/include/linux/kobject.h
--- a/include/linux/kobject.h 2004-09-15 11:52:49 -07:00
+++ b/include/linux/kobject.h 2004-09-15 11:52:49 -07:00
@@ -22,6 +22,7 @@
#include <linux/sysfs.h>
#include <linux/rwsem.h>
#include <linux/kref.h>
+#include <linux/kobject_uevent.h>
#include <asm/atomic.h>
#define KOBJ_NAME_LEN 20
@@ -235,32 +236,10 @@
extern int subsys_create_file(struct subsystem * , struct subsys_attribute *);
extern void subsys_remove_file(struct subsystem * , struct subsys_attribute *);
-
#ifdef CONFIG_HOTPLUG
-extern void kobject_hotplug(const char *action, struct kobject *kobj);
-#else
-static inline void kobject_hotplug(const char *action, struct kobject *kobj) { }
-#endif
-
-
-#ifdef CONFIG_KOBJECT_UEVENT
-extern int kobject_uevent(const char *signal, struct kobject *kobj,
- struct attribute *attr);
-
-extern int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
- struct attribute *attr);
+extern void kobject_hotplug(struct kobject *kobj, enum kobject_action action);
#else
-static inline int kobject_uevent(const char *signal, struct kobject *kobj,
- struct attribute *attr)
-{
- return 0;
-}
-
-static inline int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
- struct attribute *attr)
-{
- return 0;
-}
+static inline void kobject_hotplug(struct kobject *kobj, enum kobject_action action) { }
#endif
#endif /* __KERNEL__ */
diff -Nru a/include/linux/kobject_uevent.h b/include/linux/kobject_uevent.h
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/include/linux/kobject_uevent.h 2004-09-15 11:52:49 -07:00
@@ -0,0 +1,50 @@
+/*
+ * kobject_uevent.h - list of kobject user events that can be generated
+ *
+ * Copyright (C) 2004 IBM Corp.
+ * Copyright (C) 2004 Greg Kroah-Hartman <gr...@kroah.com>
+ *
+ * This file is released under the GPLv2.
+ *
+ */
+
+#ifndef _KOBJECT_EVENT_H_
+#define _KOBJECT_EVENT_H_
+
+/*
+ * If you add an action here, you must also add the proper string to the
+ * lib/kobject_uevent.c file.
+ */
+
+enum kobject_action {
+ KOBJ_ADD = 0x00, /* add event, for hotplug */
+ KOBJ_REMOVE = 0x01, /* remove event, for hotplug */
+ KOBJ_CHANGE = 0x02, /* a sysfs attribute file has changed */
+ KOBJ_MOUNT = 0x03, /* mount event for block devices */
+ KOBJ_MAX_ACTION, /* must be last action listed */
+};
+
+
+#ifdef CONFIG_KOBJECT_UEVENT
+int kobject_uevent(struct kobject *kobj,
+ enum kobject_action action,
+ struct attribute *attr);
+int kobject_uevent_atomic(struct kobject *kobj,
+ enum kobject_action action,
+ struct attribute *attr);
+#else
+static inline int kobject_uevent(struct kobject *kobj,
+ enum kobject_action action,
+ struct attribute *attr)
+{
+ return 0;
+}
+static inline int kobject_uevent_atomic(struct kobject *kobj,
+ enum kobject_action action,
+ struct attribute *attr)
+{
+ return 0;
+}
+#endif
+
+#endif
diff -Nru a/lib/kobject.c b/lib/kobject.c
--- a/lib/kobject.c 2004-09-15 11:52:49 -07:00
+++ b/lib/kobject.c 2004-09-15 11:52:49 -07:00
@@ -185,7 +185,7 @@
if (parent)
kobject_put(parent);
} else {
- kobject_hotplug("add", kobj);
+ kobject_hotplug(kobj, KOBJ_ADD);
}
return error;
@@ -299,7 +299,7 @@
void kobject_del(struct kobject * kobj)
{
- kobject_hotplug("remove", kobj);
+ kobject_hotplug(kobj, KOBJ_REMOVE);
sysfs_remove_dir(kobj);
unlink(kobj);
}
diff -Nru a/lib/kobject_uevent.c b/lib/kobject_uevent.c
--- a/lib/kobject_uevent.c 2004-09-15 11:52:49 -07:00
+++ b/lib/kobject_uevent.c 2004-09-15 11:52:49 -07:00
@@ -19,9 +19,29 @@
#include <linux/skbuff.h>
#include <linux/netlink.h>
#include <linux/string.h>
+#include <linux/kobject_uevent.h>
#include <linux/kobject.h>
#include <net/sock.h>
+/*
+ * These must match up with the values for enum kobject_action
+ * as found in include/linux/kobject_uevent.h
+ */
+static char *actions[] = {
+ "add", /* 0x00 */
+ "remove", /* 0x01 */
+ "change", /* 0x02 */
+ "mount", /* 0x03 */
+};
+
+static char *action_to_string(enum kobject_action action)
+{
+ if (action >= KOBJ_MAX_ACTION)
+ return NULL;
+ else
+ return actions[action];
+}
+
#ifdef CONFIG_KOBJECT_UEVENT
static struct sock *uevent_sock;
@@ -60,11 +80,12 @@
return netlink_broadcast(uevent_sock, skb, 0, 1, gfp_mask);
}
-static int do_kobject_uevent(const char *signal, struct kobject *kobj,
+static int do_kobject_uevent(struct kobject *kobj, enum kobject_action action,
struct attribute *attr, int gfp_mask)
{
char *path;
char *attrpath;
+ char *signal;
int len;
int rc = -ENOMEM;
@@ -72,6 +93,10 @@
if (!path)
return -ENOMEM;
+ signal = action_to_string(action);
+ if (!signal)
+ return -EINVAL;
+
if (attr) {
len = strlen(path);
len += strlen(attr->name) + 2;
@@ -97,17 +122,17 @@
* @kobj: struct kobject that the event is happening to
* @attr: optional struct attribute the event belongs to
*/
-int kobject_uevent(const char *signal, struct kobject *kobj,
+int kobject_uevent(struct kobject *kobj, enum kobject_action action,
struct attribute *attr)
{
- return do_kobject_uevent(signal, kobj, attr, GFP_KERNEL);
+ return do_kobject_uevent(kobj, action, attr, GFP_KERNEL);
}
EXPORT_SYMBOL_GPL(kobject_uevent);
-int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
+int kobject_uevent_atomic(struct kobject *kobj, enum kobject_action action,
struct attribute *attr)
{
- return do_kobject_uevent(signal, kobj, attr, GFP_ATOMIC);
+ return do_kobject_uevent(kobj, action, attr, GFP_ATOMIC);
}
EXPORT_SYMBOL_GPL(kobject_uevent_atomic);
@@ -149,7 +174,7 @@
* @action: action that is happening (usually "ADD" or "REMOVE")
* @kobj: struct kobject that the action is happening to
*/
-void kobject_hotplug(const char *action, struct kobject *kobj)
+void kobject_hotplug(struct kobject *kobj, enum kobject_action action)
{
char *argv [3];
char **envp = NULL;
@@ -159,6 +184,7 @@
int retval;
char *kobj_path = NULL;
char *name = NULL;
+ char *action_string;
u64 seq;
struct kobject *top_kobj = kobj;
struct kset *kset;
@@ -183,6 +209,10 @@
pr_debug ("%s\n", __FUNCTION__);
+ action_string = action_to_string(action);
+ if (!action_string)
+ return;
+
envp = kmalloc(NUM_ENVP * sizeof (char *), GFP_KERNEL);
if (!envp)
return;
@@ -208,7 +238,7 @@
scratch = buffer;
envp [i++] = scratch;
- scratch += sprintf(scratch, "ACTION=%s", action) + 1;
+ scratch += sprintf(scratch, "ACTION=%s", action_string) + 1;
kobj_path = kobject_get_path(kobj, GFP_KERNEL);
if (!kobj_path)
@@ -242,7 +272,7 @@
pr_debug ("%s: %s %s %s %s %s %s %s\n", __FUNCTION__, argv[0], argv[1],
envp[0], envp[1], envp[2], envp[3], envp[4]);
- send_uevent(action, kobj_path, buffer, scratch - buffer, GFP_KERNEL);
+ send_uevent(action_string, kobj_path, buffer, scratch - buffer, GFP_KERNEL);
if (!hotplug_path[0])
goto exit;
> And here's the patch I applied to my trees and will show up in the next
> -mm release.
>
> I'll go convert Kay's mount patch to the new interface and add it too.
I think you want an "unmount" signal, too.
> (and yes, I've added a "change" event that drivers and such can use to
> tell userspace they they need to go look at a specific sysfs file. I
> figured that this was going to be the next action request so I might as
> well add it now...)
Good.
Robert Love
> So do we actually plan to handle namespaces at all wrt kevents?
I don't see why we have to.
A mount event should really just cause a rescan of the mount table.
In which namespace? All of them? Is that an information leak that might
be hazardous (I'm bad with security stuff).
> > A mount event should really just cause a rescan of the mount table.
>
> In which namespace? All of them? Is that an information leak that might
> be hazardous (I'm bad with security stuff).
You can only see your own namespace. So e.g. /proc/mtab is your name
space's mount table and you can rescan that when receiving the mount
signal.
That is sort of the coolness of this approach - push the payload out to
user-space, and we don't have to worry about things like information
leak, name spaces, security, etc. We are just providing a notify
mechanism for state information that should already be exposed through
sysfs.
So there is no information leak.
Robert Love
In the namespace of sysfs :).
It's just the physical device.
Thanks,
Kay
> So there is no information leak.
Are you not sending it with some specific device as the source? Or is it
just coming from some abstract root kobject?
> If you're notifying me of mounts and unmounts, I really want to know about
> all of them, not just mounts that have a hard local device. I'd rather
> get "something was mounted" and be forced to probe that (that's a leak,
> too, but less important).
If its a big deal, we can use a generic kobject instead of the physical
device, but I don't think it is a big deal.
It is technically not a security issue, anyhow, since the kevent
requires root to read. But I suppose most uses are going to shovel all
the events onto a user visible bus and we don't want to do arbitration
in user-space.
Robert Love
You can listen only as root!
All information is already in /proc/mounts.
> I'm no security expert, but that seems to me to be a gratuitous leak.
I don't aggree on the second part :)
> Maybe it's just another example of why namespaces need to go away.
>
> 2) If you send me an event "/dev/hda3 mounted" do I also get an event when
> I loopback mount /tmp/rh9.0-1.iso or when I bind mount /foo to /bar or
> when I mount server:/export/home on /home?
You get an event if fs-code claims/relases a genhd. It's a claim/release
event to be more precise. Only the first mount will emit a event and the
last umount.
thaks,
Kay
> Doh! You're right. Here's Kay's patch ported to the new interface, and
> adding a umount event type. I've applied it to my trees.
I am actually thinking that Kay's approach is less than ideal, since it
does not catch all mounts and unmounts.
Robert Love
Ok, for now, no payload.
The kevent interface is well defined, bounded and simple. Let's not try
to muddy it up with arbitraty blobs being passed as a payload (and for
those people who want fimware binary crap in event log messages, please,
just use a sysfs file for access and send a kevent with KOBJ_CHANGE
action.)
The event logging people are working on figuring out what actually needs
to be logged in the first place. Let's let them finialize that, and
then let them propose a way to get that standardized information out of
the kernel (hint, I don't think that netlink is going to cut it for
them...)
thanks,
greg k-h
> We aren't giving absolute /dev entries here, that's the beauty of the
> kobject tree :)
Not that I agree, but I don't think it is the absolute /dev entries that
bother him: it is the fact that knowledge of the mount itself is an
information leak.
Which it is. As root, in my name space, I should rest in the knowledge
that my mounts are secret, I guess. But I just do not see it as a big
problem.
Robert Love
> It's a can of worms, is what it is. And I'm not sure what a good fix
> would be. Would it just be enough to send a generic "mount-table changed"
> event, and let userspace figure out the rest?
"Can of worms" is a tough description for something that there is no
practical security issue for, just a lot of hand waving. No one even
uses name spaces.
Anyhow, I already said that we could send out a generic kobject instead
of the one tied to the specific device.
> Or really, why is the kernel broadcasting a mount, which originated in
> userland. Couldn't mount (or a mount wrapper) do that? It's already
> running in the right namespace...
In practice stuff like that never works. Besides, it is not mount(1)
that we want to wrap but the mount(2) system call. And, uh, I'd rather
stab myself than try to get that patch by Uli.
> Anyone can watch the refcount on the fs-modules, they increment on every
> device claim. Is that a leak in your eyes too :)
Haha, Kay, rocking point.
> The "big deal" is that namespaces are not *ever* considered when things
> like this go on.
>
> We keep adding things that don't think namepsaces are a big deal, and
> we're painting ourselves into a corner where NONE of the advanced
> functionality will work in the face of namespaces.
This stuff does not break name spaces, though. Not at all. Root gets
the event and it or some user rescans their mount table.
I can use name spaces with this and they do not break.
You do have an argument with respect to the information leak, but I've
never looked at name spaces as a method of hiding what is mounted.
Not if I don't build my fs as a module? And yeah, in the strictest sense
it IS a leak if a user can do that.
Now, I'm not *really* concerned withs ecurity, I just want to make sure we
don't flippantly disregard it.
I'd be happy if the actual block device was not part of the event.
What worries me more is that only some mount/umount calls get events, and
not all of them.
That should be fixed, and I think Robert has already said they will fix
that.
thanks,
greg k-h
--
Hm, this is an issue that I and the FreeBSD author of jail were talking
about a week or so ago when I was describing why we did udev in
userspace and the hotplug stuff. He pointed out the namespace issue as
the biggest problem for FreeBSD to be able to do the same kind of thing
that we are doing.
But I agree, I don't think it's a big deal right now either.
thanks,
greg k-h
Ok, well if you two agree that something else should be done, send me a
patch against this last one :)
thanks,
greg k-h
I was only looking from the sysfs side and want the event for a block
device. A generic mount notification was not on the plan that time and
will also not fit in the current kobject picture. We may name it "claim"
and "release" if you like that more.
Kay
> Are you not sending it with some specific device as the source? Or is it
> just coming from some abstract root kobject?
It comes the the physical device.
Is there really a specific issue that you are seeing?
Robert Love
ah, sorry, that is wrong, we (linux-vserver)
_do_ use namespaces extensively, and probably
other 'jail' solutions will use it too ...
best,
Herbert
Great. So, how do you handle the /sbin/hotplug call today? How would
you want to handle this kevent notifier?
thanks,
greg k-h
most of the time, not at all, but if, then in the
'initial' namespace where other userspace helpers
are handled too (means on the host)
> How would you want to handle this kevent notifier?
if there was a notifier telling about mounts - real
and virtual - then they would make sense _inside_
the respective namespace they happen .. e.g.
usb-device attached:
helper is called on host, and does some stuff
result is a mount of some device, which happens
on the host/initial namespace, notifier happens
there ...
process in namespace does --bind mount:
this might be interesting for the host too, but
probably it is more useful for the namespace
where it happened ...
but keep in mind, that might not be true for the
usual, we put our sendmail in a separate namespace
best,
Herbert
Ok, then you could handle the kevent message the same way, right?
> > How would you want to handle this kevent notifier?
>
> if there was a notifier telling about mounts - real
> and virtual - then they would make sense _inside_
> the respective namespace they happen .. e.g.
>
> usb-device attached:
> helper is called on host, and does some stuff
> result is a mount of some device, which happens
> on the host/initial namespace, notifier happens
> there ...
>
> process in namespace does --bind mount:
> this might be interesting for the host too, but
> probably it is more useful for the namespace
> where it happened ...
But in which namespace did it happen? That's probaby the hard part to
determine, right?
yes probably ...
> > > How would you want to handle this kevent notifier?
> >
> > if there was a notifier telling about mounts - real
> > and virtual - then they would make sense _inside_
> > the respective namespace they happen .. e.g.
> >
> > usb-device attached:
> > helper is called on host, and does some stuff
> > result is a mount of some device, which happens
> > on the host/initial namespace, notifier happens
> > there ...
> >
> > process in namespace does --bind mount:
> > this might be interesting for the host too, but
> > probably it is more useful for the namespace
> > where it happened ...
>
> But in which namespace did it happen? That's probaby the hard part to
> determine, right?
well, in the mount --bind case, not really, because
the current process already identifies a namespace
which could be used for that ...
best,
Herbert
In any case, how is it that all kernel code _should_ be sending events
to userspace? GPL the kernel code in question and use kobject_uevent?
;) It'd be nice if non-GPL kernel code could send events through this
interface too.
please advise, thanks.
Mike
On 9/5/04, Kay Sievers <kay.s...@vrfy.org> wrote:
> diff -Nru a/kernel/kobject_uevent.c b/kernel/kobject_uevent.c
> --- /dev/null Wed Dec 31 16:00:00 196900
> +++ b/kernel/kobject_uevent.c 2004-09-06 03:47:59 +02:00
<snip>
> +
> +int kobject_uevent(const char *signal, struct kobject *kobj,
> + struct attribute *attr)
> +{
> + return do_kobject_uevent(signal, kobj, attr, GFP_KERNEL);
> +}
> +
> +EXPORT_SYMBOL(kobject_uevent);
> +
> +int kobject_uevent_atomic(const char *signal, struct kobject *kobj,
> + struct attribute *attr)
> +{
> + return do_kobject_uevent(signal, kobj, attr, GFP_ATOMIC);
> +}
> +
> +EXPORT_SYMBOL(kobject_uevent_atomic);
> +
Yes it is, why are you trying to argue about GPL issues on lkml?
> but I'd like to understand why kobject_uevent and kbject_uevent_atomic
> are EXPORT_SYMBOL_GPL rather than EXPORT_SYMBOL.
Because only GPL code should be causing kevents.
> During the evoloution from a separate kevents over netlink (rml, kay,
> arjan) then folding it in to kobject with hotplug (kay, greg kh, etc)
> it went from GPL to not, as listed below in one of kay's early
> patches, back to EXPORT_SYMBOL_GPL as it stands today.
I asked that it be put back. Is there a problem with this? Do you have
non-GPL kernel code that wants to use this interface?
thanks,
greg k-h