Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH/RFC] Significantly reworked LTT core

3 views
Skip to first unread message

Karim Yaghmour

unread,
Jul 1, 2005, 10:40:06 PM7/1/05
to

A few months back, there was a very large thread of discussion
about the inclusion of the ltt code by Andrew in -mm. Following this
discussion, relayfs was quite heavily trimmed down. However, unlike
what I had promised, I never got around to actually do the same to
the ltt code. Part of it was my not being ready to actually gut 5
years of coding ... that was just kind of difficult. Lately though,
through active discussion on the ltt-dev list, this issue has
resurfaced and a few pieces of revamped code started going around.
Thanks to Mathieu Desnoyers (Ecole Polytechnique) and Michael
Raymond (SGI) getting things moving again, I got back to thinking
about the best way to get the LTT code down to a palatable structure.
And this time around, I gave simplicity a chance ...

Which brings me to the patch below. This is a significantly cut down
version of the ltt core. It's now 5K instead of the initial 100K.
While the size has been trimmed down, much of the functionality can
still be easily obtained through the introduction of a new method:
the ltt multiplexer (ltt_mux). Basically, this is the function that
controls the tracing behavior. If none is provided, no tracing goes
on. Typically, such a function would be implemented as part of a
loadable "control" module. Said module would be responsible for:
- Allocating and managing relayfs buffers for storing events
- Allowing the user-space tracing daemon to control tracing, such as
by controling event masks, etc.
- Communicate with the user-space daemon for committing buffered data
- Providing primitives for having multiple tracing streams, including
flight-recording.
- Provide abstractions for registering new facilities and events.
- Maintaining overall sanity of tracing functionality.
IOW, much of what was purged can now be modularized and loaded
separately. Obviously this doesn't preclude having those modules
still packaged with the rest of the kernel, but it does make things
much cleaner.

This patch isn't definitive, it's truely experimental. I've only
compile-tested it for now. I'm posting it here mostly as a preview.
Of course, your feedback is welcome.

Signed-off-by: Karim Yaghmour <ka...@opersys.com>

-----

MAINTAINERS | 10 +++++
include/linux/ltt-core.h | 35 ++++++++++++++++++
init/Kconfig | 19 ++++++++++
kernel/Makefile | 1
kernel/ltt-core.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 153 insertions(+)


diff -urpN linux-2.6.12-rc4-mm2/include/linux/ltt-core.h linux-2.6.12-rc4-mm2-ltt/include/linux/ltt-core.h
--- linux-2.6.12-rc4-mm2/include/linux/ltt-core.h 1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.12-rc4-mm2-ltt/include/linux/ltt-core.h 2005-07-01 22:00:56.000000000 -0400
@@ -0,0 +1,35 @@
+/*
+ * linux/include/linux/ltt-core.h
+ *
+ * Copyright (C) 1999-2005 Karim Yaghmour (ka...@opersys.com)
+ *
+ * Definitions for the Linux Trace Toolkit core.
+ */
+
+#ifndef _LTT_CORE_H
+#define _LTT_CORE_H
+
+#include <linux/types.h>
+#include <linux/relayfs_fs.h>
+
+#define LTT_RELAYFS_ROOT "ltt"
+
+struct ltt_event_header {
+ struct timeval time;
+ u8 eid;
+ uint32_t length;
+};
+
+struct ltt_chan_buf {
+ struct rchan_buf* rbuf;
+
+ struct ltt_chan_buf* next;
+};
+
+extern struct ltt_chan_buf* (*ltt_mux)(uint8_t cpuid, uint8_t eid, uint32_t length, char* buf);
+
+extern struct dentry* ltt_root_dentry; /* Root of the directory tree */
+
+extern void ltt_log_event(uint8_t eid, uint32_t length, char* buf);
+
+#endif /* _LTT_CORE_H */
diff -urpN linux-2.6.12-rc4-mm2/init/Kconfig linux-2.6.12-rc4-mm2-ltt/init/Kconfig
--- linux-2.6.12-rc4-mm2/init/Kconfig 2005-05-18 15:12:03.000000000 -0400
+++ linux-2.6.12-rc4-mm2-ltt/init/Kconfig 2005-07-01 21:28:03.000000000 -0400
@@ -238,6 +238,25 @@ config CPUSETS

Say N if unsure.

+config LTT_CORE
+ bool "Support for LTT core"
+ depends on RELAYFS_FS
+ default n
+ help
+ The Linux Trace Toolkit (LTT) is a combination of kernel
+ components and user-space control and visualization tools for
+ tracing the kernel and analyzing its dynamic behavior.
+
+ For more information, please visit:
+ http://www.opersys.com/ltt
+
+ Experimental code for next-generation user-space tools
+ can be found here:
+ http://ltt.polymtl.ca
+
+ Say N if unsure.
+
+
menuconfig EMBEDDED
bool "Configure standard kernel features (for small systems)"
help
diff -urpN linux-2.6.12-rc4-mm2/kernel/ltt-core.c linux-2.6.12-rc4-mm2-ltt/kernel/ltt-core.c
--- linux-2.6.12-rc4-mm2/kernel/ltt-core.c 1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.12-rc4-mm2-ltt/kernel/ltt-core.c 2005-07-01 22:00:30.000000000 -0400
@@ -0,0 +1,88 @@
+/*
+ * ltt-core.c
+ *
+ * (C) Copyright, 1999, 2000, 2001, 2002, 2003, 2004, 2005 -
+ * Karim Yaghmour (ka...@opersys.com)
+ *
+ * Contains the core kernel functionality for the Linux Trace Toolkit.
+ *
+ * Authors:
+ * Karim Yaghmour (ka...@opersys.com)
+ * Tom Zanussi (zan...@us.ibm.com)
+ * Bob Wisniewski (b...@watson.ibm.com)
+ * Mathieu Desnoyers (mathieu....@polymtl.ca)
+ * Michael A. Raymond (mray...@sgi.com)
+ */
+#include <linux/types.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/ltt-core.h>
+#include <linux/module.h>
+#include <linux/relayfs_fs.h>
+#include <linux/spinlock.h>
+
+/* Main multiplexer */
+struct ltt_chan_buf* (*ltt_mux)(uint8_t, uint8_t, uint32_t, char*) = NULL;
+
+/* LTT root in relayfs */
+struct dentry* ltt_root_dentry = NULL;
+
+/* This is the base function that all event specific functions will use to
+ actually write the log record. */
+void ltt_log_event(uint8_t eid,
+ uint32_t length,
+ char* buf)
+{
+ uint8_t cpu;
+ unsigned long flags;
+ struct timeval event_time;
+ struct ltt_chan_buf* ltt_buf = NULL;
+ struct ltt_event_header* header;
+
+ /* Can't go further without an active multiplexer, do this ASAP not to
+ impact performance */
+ if (!ltt_mux)
+ return;
+
+ local_irq_save(flags);
+
+ cpu = smp_processor_id();
+
+ /* Do we log the event? */
+ if (!(ltt_buf = ltt_mux(cpu, eid, length, buf)))
+ goto log_done;
+
+ /* Get timestamp just once */
+ do_gettimeofday(&event_time);
+
+ /* Loop through the returned list of rchans and write to each one. */
+ do {
+ if ((header = relay_reserve(ltt_buf->rbuf->chan,
+ sizeof(header) + length))) {
+ memcpy(&header->time, &event_time, sizeof(event_time));
+ header->eid = eid;
+ header->length = length;
+ memcpy(header + sizeof(header), buf, length);
+
+ relay_commit(ltt_buf->rbuf, header, sizeof(header) + length);
+ }
+ } while ((ltt_buf = ltt_buf->next));
+
+log_done:
+ local_irq_restore(flags);
+}
+
+/* Initialize the LTT state at boot time */
+static int __init ltt_init(void)
+{
+ /* Create the basic RelayFS directory tree */
+ if (!(ltt_root_dentry = relayfs_create_dir(LTT_RELAYFS_ROOT, NULL)))
+ return -1;
+
+ return 0;
+}
+__initcall(ltt_init);
+
+EXPORT_SYMBOL(ltt_root_dentry);
+EXPORT_SYMBOL(ltt_mux);
+EXPORT_SYMBOL(ltt_log_event);
diff -urpN linux-2.6.12-rc4-mm2/kernel/Makefile linux-2.6.12-rc4-mm2-ltt/kernel/Makefile
--- linux-2.6.12-rc4-mm2/kernel/Makefile 2005-05-18 15:12:03.000000000 -0400
+++ linux-2.6.12-rc4-mm2-ltt/kernel/Makefile 2005-07-01 21:17:31.000000000 -0400
@@ -31,6 +31,7 @@ obj-$(CONFIG_DETECT_SOFTLOCKUP) += softl
obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_SECCOMP) += seccomp.o
+obj-$(CONFIG_LTT_CORE) += ltt-core.o

ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <al...@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -urpN linux-2.6.12-rc4-mm2/MAINTAINERS linux-2.6.12-rc4-mm2-ltt/MAINTAINERS
--- linux-2.6.12-rc4-mm2/MAINTAINERS 2005-05-18 15:12:03.000000000 -0400
+++ linux-2.6.12-rc4-mm2-ltt/MAINTAINERS 2005-07-01 21:28:36.000000000 -0400
@@ -1444,6 +1444,16 @@ L: linux-secu...@wirex.com
W: http://lsm.immunix.org
S: Supported

+LINUX TRACE TOOLKIT
+P: Karim Yaghmour
+M: ka...@opersys.com
+P: Mathieu Desnoyers
+M: mathieu....@polymtl.ca
+W: http://www.opersys.com/LTT
+W: http://ltt.polymtl.ca
+L: ltt...@listserv.shafik.org
+S: Maintained
+
LM83 HARDWARE MONITOR DRIVER
P: Jean Delvare
M: kh...@linux-fr.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Michael Raymond

unread,
Jul 2, 2005, 9:20:07 AM7/2/05
to
Karim, three comments:
- CPU ID needs be bigger than 8-bits, else you can only support up to 256p
- Are 8-bit event IDs big enough? A comment explainig that sub-IDs should
be placed within the log record would help avoid the issue of the 8-bit
space disappearing too quickly
- To avoid the problem of RelayFS non-reenterancy I think ltt_log_event() will
need a hook at its end so that whatever mechanism was used to avoid the
situation can be disabled.
Thanks,
Michael
--
Michael A. Raymond Office: (651) 683-3434
Core OS Group Real-Time System Software

Karim Yaghmour

unread,
Jul 2, 2005, 10:30:11 AM7/2/05
to

Thanks for the feedback Michael.

Michael Raymond wrote:
> Karim, three comments:
> - CPU ID needs be bigger than 8-bits, else you can only support up to 256p

I don't mind. uint16_t?

> - Are 8-bit event IDs big enough? A comment explainig that sub-IDs should
> be placed within the log record would help avoid the issue of the 8-bit
> space disappearing too quickly

.... uint16_t?

> - To avoid the problem of RelayFS non-reenterancy I think ltt_log_event() will
> need a hook at its end so that whatever mechanism was used to avoid the
> situation can be disabled.

I think this one needs more thinking ... I did think about adding such a
hook, but I didn't like the idea of it, it just seems unclean. After all
the problem is limited to a very small subset of events, namely NMIs and
page fault traps. One thing I was thinking about is that in both these
cases there's always an entry event and an exit event. ltt_mux could then
coordinate using these events. IOW, if it just got an NMI entry, it
doesn't allow any other event to get logged until it sees the NMI exit
go by ... same for page faults. By doing this, we can just keep just one
hook: ltt_mux. Right?

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || ka...@opersys.com || 1-866-677-4546

Karim Yaghmour

unread,
Jul 2, 2005, 12:00:09 PM7/2/05
to

Karim Yaghmour wrote:
> I think this one needs more thinking ... I did think about adding such a
> hook, but I didn't like the idea of it, it just seems unclean. After all
> the problem is limited to a very small subset of events, namely NMIs and
> page fault traps. One thing I was thinking about is that in both these
> cases there's always an entry event and an exit event. ltt_mux could then
> coordinate using these events. IOW, if it just got an NMI entry, it
> doesn't allow any other event to get logged until it sees the NMI exit
> go by ... same for page faults. By doing this, we can just keep just one
> hook: ltt_mux. Right?

Brain was slow this morning. What we're trying to do is to avoid getting
those very NMIs and page faults into the logging path if we're already
in that path ... That's different from what my brain came up with this
morning. We may need that hook after all ... something like:

if (ltt_post)
ltt_post(...);

Christoph Hellwig

unread,
Jul 2, 2005, 12:10:10 PM7/2/05
to
This code is rather pointless. The ltt_mux is doing all the real
work and it's not included. And while we're at it the layering for
it is wrong aswell - the ltt_log_event API should be implemented by
the actual multiplexer with what's in ltt_log_event now minus the
irq disabling becoming a library function.

Exporting a pointer to the root dentry seems like a very wrong API
aswell, that's an implementation detail that should be hidden.

Besides that the code is not following Documentation/CodingStyle
at all, please read it.

Besides that I'd sugest scrapping the ltt name and ltt_ prefix - we know
we're on linux, adn we don't care whether it's a toolkit, but spelling trace_
out would actually be a lot more descriptive. So what about trace_* symbol
names and trace.[ch] filenames?

Karim Yaghmour

unread,
Jul 2, 2005, 5:10:08 PM7/2/05
to

Thanks for the feedback, it's greatly appreciated.

Christoph Hellwig wrote:
> This code is rather pointless. The ltt_mux is doing all the real
> work and it's not included. And while we're at it the layering for
> it is wrong aswell - the ltt_log_event API should be implemented by
> the actual multiplexer with what's in ltt_log_event now minus the
> irq disabling becoming a library function.

Actually I kind of disagree here. Yes, you're partially right, ltt_mux
is doing a lot of work, and it's not included. However, what work
ltt_mux is doing is administrative and that's what was complained
about a lot last time the ltt patches were included. So yes, I could
provide a very basic ltt_mux that would instantiate a single relayfs
channel and does no filtering whatsoever, but that would be
insufficient for real usage. And if I provided a full mux, then we'd
pretty much end up with the same code we had previously.

By having it this way, the essential part of the mechanism, its
logging code, is shared by all, yet there can be any number of muxes
loaded on top of it. The LKST project, for example, has got a module
that just counts the events that occur. Plug that as the mux, and
always return NULL (no channel to write to) and you've ready to go.

For ltt, the mux would be quite involved, including having netlink
sockets going back and forth talking to a user-space daemon, and
allowing quite a few options/features to be set.

In other cases, it should be fairly simple to implement a mux
local to a given subsystem that a developer needs to monitor. He
can then manage everything about how tracing goes on without having
to rewrite his own logging function.

The rational here is simple: there is no need to have multiple
logging functions, but there are already multiple existing
implementations of deciding how and what needs to be logged,
how it's control, and how it interfaces with the outside world
(be it user-space or otherwise.) This code, simplistic as it may be,
serves this reality quite well.

If what's in ltt_log_event goes into the multiplexer, then we're
back to having each implementation have its own buffering mechanism
and yet no single entry-point for tracing inside the kernel.

Replacing local_irq_disable/enable() with function pointers is not
a problem, if that's something desirable.

> Exporting a pointer to the root dentry seems like a very wrong API
> aswell, that's an implementation detail that should be hidden.

That's just to allow add-on modules to find the hook point within
relayfs. We could just leave it to the multiplexer to "init tracing"
and then provide an API such as ltt_chan_add, ltt_chan_del,
ltt_chan_ctl, etc.

> Besides that the code is not following Documentation/CodingStyle
> at all, please read it.

I'll double-check, thanks.

> Besides that I'd sugest scrapping the ltt name and ltt_ prefix - we know
> we're on linux, adn we don't care whether it's a toolkit, but spelling trace_
> out would actually be a lot more descriptive. So what about trace_* symbol
> names and trace.[ch] filenames?

I don't feel in any way patriotic about the ltt_ prefix, any other
name will do just fine. However, trace_ is so generic as to be
confusing. There are already plenty of trace* in the kernel. If
you've got any other more-specific/less-generic name, I'm all ears.
How about evtrace?

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || ka...@opersys.com || 1-866-677-4546

Christoph Hellwig

unread,
Jul 7, 2005, 10:20:28 AM7/7/05
to
On Sat, Jul 02, 2005 at 05:15:16PM -0400, Karim Yaghmour wrote:
> Christoph Hellwig wrote:
> > This code is rather pointless. The ltt_mux is doing all the real
> > work and it's not included. And while we're at it the layering for
> > it is wrong aswell - the ltt_log_event API should be implemented by
> > the actual multiplexer with what's in ltt_log_event now minus the
> > irq disabling becoming a library function.
>
> Actually I kind of disagree here. Yes, you're partially right, ltt_mux
> is doing a lot of work, and it's not included. However, what work
> ltt_mux is doing is administrative and that's what was complained
> about a lot last time the ltt patches were included. So yes, I could
> provide a very basic ltt_mux that would instantiate a single relayfs
> channel and does no filtering whatsoever, but that would be
> insufficient for real usage. And if I provided a full mux, then we'd
> pretty much end up with the same code we had previously.

We're not gonna add hooks to the kernel so you can copile the same
horrible code you had before against it out of tree. Do a sane demux
and submit it.

Karim Yaghmour

unread,
Jul 8, 2005, 9:20:07 AM7/8/05
to

Christoph Hellwig wrote:
> We're not gonna add hooks to the kernel so you can copile the same
> horrible code you had before against it out of tree. Do a sane demux
> and submit it.

If I just wanted hooks, I would have submitted a patch that did just
that, without any logging function. The code for the mux that goes
on top of that code is actually on its way to be completely rewritten.
I can see that you may have read my posting as indicating that we were
recompiling the same previous code out of tree, but that is certainly
not the intent.

FWIW, we'll look submitting a minimal mux with the patch.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || ka...@opersys.com || 1-866-677-4546

0 new messages