
KLive: Linux Kernel Live Usage Monitor

Andrea Arcangeli

Aug 29, 2005, 11:10:50 PM
to linux-...@vger.kernel.org
Hello,

During the Kernel Summit somebody raised the point that it's not clear
how much testing each rc/pre/git kernel gets before the final release.

So I set up a server to automatically track the amount of testing that
each kernel gets. Clearly this will be a very rough approximation and it
can't be fully reliable, but perhaps it'll be useful. If it turns out not
to be useful, no problem: I spent very little time on it ;).

All the details can be found in the project website:

http://klive.cpushare.com/

Full source (server included) is here:

http://klive.cpushare.com/downloads/klive-0.0.tar.bz2

To run the client:

wget http://klive.cpushare.com/klive.tac

Then at every boot (like in /etc/init.d/boot.local):

twistd -oy klive.tac

In theory we could get rid of the client entirely and make it a kernel
config option, but I've no idea if this project is useful, so I don't
want to spend too much time on it at this point.

Thank you.

Sven Ladegast

Aug 30, 2005, 4:04:28 AM
to linux-...@vger.kernel.org
On Tue, 30 Aug 2005, Andrea Arcangeli wrote:

> During the Kernel Summit somebody raised the point that it's not clear
> how much testing each rc/pre/git kernel gets before the final release.

Generally this is a good idea to track the usage/testing time of different
versions.

> In theory we could get rid of the client entirely and make it a kernel
> config option, but I've no idea if this project is useful, so I don't
> want to spend too much time on it at this point.

The idea isn't bad but lots of people could think that this is some kind
of home-phoning or spy software. I guess lots of people would turn this
feature off...and of course you can't enable it by default. But combined
with an automatic oops/panic/bug-report this would be _very_ useful I think.

Let's see what others say about it.

Sven

Rogier Wolff

Aug 30, 2005, 4:31:15 AM
to Sven Ladegast, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 10:01:21AM +0200, Sven Ladegast wrote:
> The idea isn't bad but lots of people could think that this is some kind
> of home-phoning or spy software. I guess lots of people would turn this
> feature off...and of course you can't enable it by default. But combined
> with an automatic oops/panic/bug-report this would be _very_ useful I think.

It IS some "home phoning" and "spy software". However, when the
goal is to sign you up for more direct marketing, people tend to
object. When the goal is to keep track of running kernels, I'm
hopeful that people will recognise that this is different.

A trick to use would be to send a UDP packet at boot (after 1 minute
or so), and then randomly, say, "once a month" (i.e. about a 1/30 chance of
sending a packet each day). The number of these random packets
received is a measure of the number of CPU-months that the kernel
runs.

Roger.

--
** R.E....@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
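Rogier's estimator can be checked with a small simulation. This is a hypothetical Python sketch, not KLive code. Python is used because the KLive client is a Twisted .tac file. The fleet size, the uptime range and the function names are assumptions. The boot packet is left out because it counts boots rather than runtime.

```python
import random
from typing import Tuple

DAILY_PROBABILITY = 1.0 / 30.0  # "about 1/30 chance" per day, i.e. roughly once a month


def random_packets(uptime_days: int, rng: random.Random) -> int:
    """Count the random packets one machine sends over its uptime.

    The boot packet is not counted: it measures boots, not runtime.
    """
    return sum(1 for _ in range(uptime_days) if rng.random() < DAILY_PROBABILITY)


def simulate(n_machines: int, max_days: int = 90, seed: int = 0) -> Tuple[float, int]:
    """Return (true CPU-months, random packets received) for a made-up fleet.

    Each received packet stands for roughly one CPU-month, so the two
    numbers should be close for a large fleet.
    """
    rng = random.Random(seed)
    uptimes = [rng.randint(1, max_days) for _ in range(n_machines)]
    received = sum(random_packets(days, rng) for days in uptimes)
    true_cpu_months = sum(uptimes) / 30.0
    return true_cpu_months, received
```

If packets are lost at a uniform random rate p, the expected count shrinks by a factor of (1 - p). Occasional lost datagrams therefore bias the estimate by that factor rather than breaking it. Loss that is correlated with load would not behave this way.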

Sven Ladegast

Aug 30, 2005, 4:53:56 AM
to Rogier Wolff, linux-...@vger.kernel.org
On Tue, 30 Aug 2005, Rogier Wolff wrote:

> It IS some "home phoning" and "spy software". However, when the
> goal is to sign you up for more direct marketing, people tend to
> object. When the goal is to keep track of running kernels, I'm
> hopeful that people will recognise that this is different.

The problem is that people have had bad experiences with home-phoning
software in the past. Changing their opinion about this isn't easy, I think.
I can almost see the headlines: Spy software found in recent Linux
kernels... :o)

Although home-phoning can be useful under certain circumstances it is the
wrong way to implement it in a kernel. IMHO a userspace tool is the better
solution: Everyone can decide if he/she wants to report what kernel
version is running on their systems.

> A trick to use would be to send a UDP packet at boot (after 1 minute
> or so), and then randomly, say, "once a month" (i.e. about a 1/30 chance of
> sending a packet each day). The number of these random packets
> received is a measure of the number of CPU-months that the kernel
> runs.

This could be a solution, but as you know UDP packets may or may not
arrive at the destination address. So the packet loss with this method could
be very high, especially if you send only one packet. Using a
TCP connection for this is a lot more reliable and the payload can be
encrypted too.

Once again: I think this is a userspace task.

Sven

Rogier Wolff

Aug 30, 2005, 5:41:33 AM
to Sven Ladegast, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 10:53:13AM +0200, Sven Ladegast wrote:
> >A trick to use would be to send a UDP packet at boot (after 1 minute
> >or so), and then randomly, say, "once a month" (i.e. about a 1/30 chance of
> >sending a packet each day). The number of these random packets
> >received is a measure of the number of CPU-months that the kernel
> >runs.
>
> This could be a solution, but as you know UDP packets may or may not
> arrive at the destination address. So the packet loss with this method could
> be very high, especially if you send only one packet. Using a
> TCP connection for this is a lot more reliable and the payload can be
> encrypted too.

The "load" that a UDP packet poses on a system is much lower than
for a TCP connection. The fact that UDP packets sometimes get lost
is not much of an issue: Those packets simply wouldn't get logged.
So what?

In 90% (my guess, 90% of statistics is made up....) of the cases
where the first packet doesn't reach the destination, any subsequent
packets also wouldn't. So when it matters as little as it does here, why
bother with the extra overhead of a TCP connection?

The "in kernel module" that might send this, could put some easily
gathered information into the packet. The goal of logging kernels-
that-get-run would then be met. Installing a userspace program is
something that most testers won't be bothered to do.

A kernel option that clearly documents exactly what info is logged
would IMHO work better. (A userspace program is technically a better
solution, the social aspect of getting a bigger user-base is the main
reason for me to suggest the in-kernel approach).

(the people who go upgrading kernels tend to be different people from
those who go installing programs for fun.)

Roger.


Bernd Petrovitsch

Aug 30, 2005, 5:55:30 AM
to Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Tue, 2005-08-30 at 11:40 +0200, Rogier Wolff wrote:
[...]

> would IMHO work better. (A userspace program is technically a better
> solution, the social aspect of getting a bigger user-base is the main
> reason for me to suggest the in-kernel approach).

So *if* a user wants to participate, he/she also installs and configures
some daemon for this (which is also easier to look into if one is
curious) and there are no "Linux kernel phones home" stories.
And if not, he/she can easily remove it (again).

Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services

Alan Cox

Aug 30, 2005, 10:08:17 AM
to Sven Ladegast, linux-...@vger.kernel.org
On Maw, 2005-08-30 at 10:01 +0200, Sven Ladegast wrote:
> The idea isn't bad but lots of people could think that this is some kind
> of home-phoning or spy software. I guess lots of people would turn this
> feature off...and of course you can't enable it by default. But combined
> with an automatic oops/panic/bug-report this would be _very_ useful I think.

Wrong way around - you need to let people turn it on. Perhaps distribute
it with the kernel so you can

make register
[Reports hardware, stashes a unique sha-1 hashed cookie]
[Asks for permission, installs UDP ping daemon]

make unregister


but it would have to be opt in. That might lower coverage but should
increase quality, especially if the id in the cookie can be put into
bugzilla reports, and the hardware reporting is done so it can be
machine processed (ie so you can ask stuff like 'reliability with Nvidia
IDE')

Alan

Andrea Arcangeli

Aug 30, 2005, 10:47:17 AM
to Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 10:29:01AM +0200, Rogier Wolff wrote:
> sending a packet each day). The number of these random packets
> received is a measure of the number of CPU-months that the kernel
> runs.

This is more or less what klive currently does, except it's a bit more
sophisticated than that: you don't risk losing uptime if a UDP
packet is lost (or if the server goes down, or if DNS resolution fails).
Secondly, klive correctly handles suspend to disk: while the system is
suspended, that time is not accounted as "uptime".

Andrea Arcangeli

Aug 30, 2005, 10:56:35 AM
to Sven Ladegast, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 10:01:21AM +0200, Sven Ladegast wrote:
> [..] combined
> with an automatic oops/panic/bug-report this would be _very_ useful I think.

That would be a nice addition IMHO. It'll be more complex since it'll
involve netconsole dumping and passing the klive session to the kernel
somehow (userland would be too unreliable to push the oops to the
server). The worst part is that oops dumping might expose random kernel
data (it could contain ssh keys as well), so I would either need to
purify the stack/code/register lines, making the oops quite useless, or
not show it at all (and only show the count of the oopses
publicly). A parameter could be used to tell the kernel whether the whole
oops should be sent to the klive server, or only a notification that an
oops happened (without sending the payload with potentially
sensitive data inside).
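The two reporting modes Andrea describes might be modelled like this. It is a hypothetical sketch: the mode names, the field names and the "notify" default are assumptions. As Andrea notes, a real implementation would need netconsole and kernel help rather than a userland builder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OopsReport:
    session: str                   # hypothetical KLive session identifier
    count: int                     # number of oopses seen in this session
    payload: Optional[str] = None  # full oops text, only in "full" mode


def build_oops_report(session: str, oopses: List[str], mode: str = "notify") -> OopsReport:
    """Build a report in 'notify' mode (count only) or 'full' mode.

    'notify' never includes the oops text, because stack, code and register
    lines may expose sensitive kernel data.
    """
    if mode not in ("notify", "full"):
        raise ValueError(f"unknown mode: {mode!r}")
    payload = "\n\n".join(oopses) if (mode == "full" and oopses) else None
    return OopsReport(session=session, count=len(oopses), payload=payload)
```

Defaulting to "notify" means the public oops count never depends on the user remembering to turn off the sensitive payload.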

Andrea Arcangeli

Aug 30, 2005, 11:11:12 AM
to Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 11:40:58AM +0200, Rogier Wolff wrote:
> packets also wouldn't. So when it matters as little as it does here, why
> bother with the extra overhead of a TCP connection?

I agree TCP isn't needed. I also don't see SSL as very useful here: I use
it extensively for other projects, and it would have been even simpler to
use SSL over TCP than cleartext UDP with twisted, but it was
pointless to hide the contents of the packet on the network when I then
show all of them on the website ;) So I'd rather save some packets on the
network and some cpu as well.

> A kernel option that clearly documents exactly what info is logged
> would IMHO work better. (A userspace program is technically a better

It's certainly much easier to tweak the kernel config before compiling
the kernel than to edit the mess in /etc/init.d/* with all the
gratuitous differences of the userland flavours.

Clearly it would be an option to keep disabled by default.

The objective of the project is to know how much testing an rc/pre kernel
had before release, and most of the testers are supposed to tweak the
config options themselves, so having a config tweak would make it very
easy to set up. It'll be a bit lighter too: twisted currently takes 6 MB of
RSS on x86.

However I'm quite neutral, the main advantage of the userland solution
is that it has been orders of magnitude simpler to develop.

I could perhaps write an auto-installer script, that fetches the tac
file with wget and adds a line to /etc/init.d/boot.local to make life
easier.

Alan Cox

Aug 30, 2005, 12:08:22 PM
to Andrea Arcangeli, Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Maw, 2005-08-30 at 17:10 +0200, Andrea Arcangeli wrote:
> It's certainly much easier to tweak the kernel config before compiling
> the kernel than to edit the mess in /etc/init.d/* with all the
> gratuitous differences of the userland flavours.

Just follow the LSB specification and about the only thing that's totally
out of field is Slackware.

> easy to set up. It'll be a bit lighter too: twisted currently takes 6 MB of
> RSS on x86.

Right, that's my first reaction: 6Mbytes of unauditable weirdness versus a
tiny C program or a shell script using netcat.

echo "Reporting boot: "
(echo "BOOT:"$(cat /etc/lum-serial)":"$(uname -a)"::") | nc -u -w 10 testhost.example.com 7658

> I could perhaps write an auto-installer script, that fetches the tac
> file with wget and adds a line to /etc/init.d/boot.local to make life
> easier.

For one distro perhaps. Using a proper init service script makes it work
for pretty much everyone.
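For comparison, the same fire-and-forget UDP report needs only the Python standard library. The message layout mirrors Alan's netcat line above. /etc/lum-serial, testhost.example.com and port 7658 are his placeholders, not a real service, and the function names are hypothetical.

```python
import socket


def build_boot_message(serial: str, uname_text: str) -> str:
    """Same layout as the shell line: BOOT:<serial>:<uname -a text>::

    The shell's $(...) strips trailing newlines; rstrip("\\n") does the same.
    """
    return "BOOT:" + serial.rstrip("\n") + ":" + uname_text.rstrip("\n") + "::"


def send_report(message: str, host: str, port: int) -> None:
    """Send a single UDP datagram; like nc -u, there is no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))


# Usage at boot (not executed here; host, port and path are placeholders):
#   import os
#   serial = open("/etc/lum-serial").read()
#   uname_text = " ".join(os.uname())
#   send_report(build_boot_message(serial, uname_text), "testhost.example.com", 7658)
```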

Andrea Arcangeli

Aug 30, 2005, 12:17:16 PM
to Alan Cox, Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 05:33:38PM +0100, Alan Cox wrote:
> Just follow the LSB specification and about the only thing that's totally
> out of field is Slackware.

Fair enough, though one line like '(sleep 60; twistd ...) &' in
/etc/init.d/boot.local would have been a bit simpler for a quick and
dirty autoinstall .sh script (that's the simplest way I install it on my system).

> Right, that's my first reaction: 6Mbytes of unauditable weirdness versus a

;)

> tiny C program or a shell script using netcat.
>
> echo "Reporting boot: "
> (echo "BOOT:"$(cat /etc/lum-serial)":"$(uname -a)"::") | nc -u -w 10 testhost.example.com 7658

A completely stateless client couldn't handle suspend to disk correctly,
as far as I can tell.

A tiny C program would be less tiny than the current tac file, and the
package would immediately become arch-dependent. Plus, if you want to run
it as user nobody, the twistd -u/-g, --pidfile, --logfile options and all
the rest in twisted make life so much easier. On my systems I have other
services running in the background with twistd, so perhaps I'm biased
because I share almost all of it ;).

> For one distro perhaps. Using a proper init service script makes it work
> for pretty much everyone.

I'm not very optimistic about the dependency chain being distro
independent, but I will look into that shortly. I guess I'm running a
bit off-topic here.

Thanks!

Alan Cox

Aug 30, 2005, 12:28:50 PM
to Andrea Arcangeli, Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Maw, 2005-08-30 at 18:16 +0200, Andrea Arcangeli wrote:
> A tiny C program would be less tiny than the current tac file, and the
> package would immediately become arch-dependent.

I doubt there is anything needed that can't be done in sh and nc here.
Catching boots can be done by adding one to a boot number and sending
that as well. How does suspend to disk handle uptime? If the uptime
stops, then sending the uptime will deal with it.

I'm happy to whack on some sh scripts to do the client side.
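Alan's boot-number idea can be sketched as follows, together with the server-side bookkeeping that stops a lost packet from losing uptime. Everything here is hypothetical: the counter file, the function names and the in-memory table are assumptions. The only fact relied on is that the first field of /proc/uptime is seconds since boot. Whether time spent suspended is counted there is exactly the question Alan raises, so the sketch does not depend on it.

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_uptime(proc_uptime_text: str) -> float:
    """/proc/uptime holds two numbers; the first is seconds since boot."""
    return float(proc_uptime_text.split()[0])


def next_boot_number(counter_file: Path) -> int:
    """Increment and persist a boot counter; call once per boot."""
    try:
        current = int(counter_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        current = 0  # first boot, or an unreadable counter
    current += 1
    counter_file.write_text(f"{current}\n")
    return current


def record(table: Dict[Tuple[str, int], float], serial: str, boot: int, uptime: float) -> None:
    """Server side: keep the largest uptime seen for each (serial, boot).

    Each report carries the cumulative uptime of that boot, so a lost or
    reordered packet is superseded by the next one that arrives.
    """
    key = (serial, boot)
    table[key] = max(table.get(key, 0.0), uptime)


def total_uptime(table: Dict[Tuple[str, int], float], serial: str) -> float:
    """Sum the best-known uptime over all boots of one box."""
    return sum(up for (s, _), up in table.items() if s == serial)
```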

Jesper Juhl

Aug 30, 2005, 12:37:43 PM
to Alan Cox, Andrea Arcangeli, Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On 8/30/05, Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
> On Maw, 2005-08-30 at 17:10 +0200, Andrea Arcangeli wrote:
> > It's certainly much easier to tweak the kernel config before compiling
> > the kernel than to edit the mess in /etc/init.d/* with all the
> > gratuitous differences of the userland flavours.
>
> Just follow the LSB specification and about the only thing that's totally
> out of field is Slackware.
>
These days Slackware has /etc/rc.d/rc.sysvinit to run SystemV style
init scripts in addition to its own BSD init scripts. So as long as
the script is well written (takes start/stop arguments) and is placed
in /etc/rc.d/rc${runlevel}.d/ as both K* and S*, then all a slackware
Slackware user needs to do to run it at boot is to chmod +x
/etc/rc.d/rc.sysvinit if they haven't already.


--
Jesper Juhl <jespe...@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

Andrea Arcangeli

Aug 30, 2005, 12:41:35 PM
to Alan Cox, Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 05:56:33PM +0100, Alan Cox wrote:
> I doubt there is anything needed that can't be done in sh and nc here.
> Catching boots can be done by adding one to a boot number and sending
> that as well. How does suspend to disk handle uptime? If the uptime
> stops, then sending the uptime will deal with it.

I agree it's feasible.

> I'm happy to whack on some sh scripts to do the client side.

You're welcome, the client directory currently contains the tac file, we
can add more clients no problem ;).

Wilkerson, Bryan P

Aug 30, 2005, 1:12:55 PM
to Andrea Arcangeli, linux-...@vger.kernel.org

On Tue, Aug 30, 2005 at 10:01:21AM +0200, Sven Ladegast wrote:

> The idea isn't bad but lots of people could think that this is some kind
> of home-phoning or spy software. I guess lots of people would turn this
> feature off...and of course you can't enable it by default. But combined
> with an automatic oops/panic/bug-report this would be _very_ useful I think.

I think this is useful and would personally participate if it were a
config tweak. There are a couple of issues that come to mind.

1. Possibly paranoia, but given the apparent numbers of people with
malicious intent on the Internet and knowing that there are some
financially motivated to make Linux kernel developers overconfident in
their work, I'm not sure I'd trust or use the data unless it was
somehow authenticated.

2. Some of us sit behind corporate firewalls and proxies that have
oppressive rules that would have made Stalin proud. The solution must
be proxy aware and if it used HTTP, even better because it's more likely
to work anywhere. The proxy settings could also be a .config thing.

3. Again security; I haven't cleared this with my corporate superiors
but I'm not sure they'll like the fact that anyone could intercept the
data and compute how many people in the company are running Linux test
kernels. I know this almost sounds anti-open but we're breaking them in
slowly to the model and I don't think they are ready for this one just
yet. :)

-bryan
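For Bryan's proxy concern, an HTTP(S) entry point would let a client reuse the machine's existing proxy settings. This is a hypothetical sketch: the URL and form fields are invented, and the endpoint is only something Andrea offers to add. In Python's standard library, urlopen() goes through a default ProxyHandler that reads the http_proxy/https_proxy environment variables, so no extra proxy code is needed.

```python
import urllib.parse
import urllib.request


def build_report_request(url: str, fields: dict) -> urllib.request.Request:
    """Build a form-encoded POST. urlopen() routes it through any proxy
    configured in the http_proxy / https_proxy environment variables."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Perform the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Using an https:// URL would also address Bryan's third point about cleartext on the wire, assuming the proxy permits HTTPS.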

linux-os (Dick Johnson)

Aug 30, 2005, 1:46:00 PM
to Wilkerson, Bryan P, Andrea Arcangeli, linux-...@vger.kernel.org

The beginnings of "Magic Lantern" and "Carnivore"? Good, now just
use port 25 because everybody has port 25 open ..... Just like
Microsnitch^M^M^M^M^Msoft.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.13 on an i686 machine (5589.48 BogoMips).
Warning : 98.36% of all statistics are fiction.

Andrea Arcangeli

Aug 30, 2005, 4:31:59 PM
to Wilkerson, Bryan P, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 10:08:38AM -0700, Wilkerson, Bryan P wrote:
> their work, I'm not sure I'd trust or use the data unless it was
> somehow authenticated.

I doubt many testers would be willing to register on yet another website
just for this. So I doubt adding authentication is a good idea.

However if you really want to authenticate I could add an email based
authentication method similar to the CPUShare authentication method that
is already implemented and fully secure.

Then I can add a button to hide all not authenticated users from the
listing. Things will be substantially more complicated on the server
side, so I'd prefer that we solve the points below first.

> 2. Some of us sit behind corporate firewalls and proxies that have
> oppressive rules that would have made Stalin proud. The solution must
> be proxy aware and if it used HTTP, even better because it's more likely
> to work anywhere. The proxy settings could also be a .config thing.

I can easily add a second entry point to the server that can pass
through the proxy no problem.

> 3. Again security; I haven't cleared this with my corporate superiors
> but I'm not sure they'll like the fact that anyone could intercept the
> data and compute how many people in the company are running Linux test
> kernels. I know this almost sounds anti-open but we're breaking them in
> slowly to the model and I don't think they are ready for this one just
> yet. :)

Sure, I understand. KLive wasn't designed with corporate firewalls in mind,
where everything behind the firewall must stay hidden. (I wonder how the
proxy prevents people from searching on Google, though; I bet a few of the
cleartext search queries and the SYN and TCP timestamp sequence numbers
will reveal much more than whatever could ever be sent to klive in
cleartext ;).

Then I guess all you need is for me to use https instead of http for the
secondary entry point discussed above (assuming your proxy lets you do
https).

Still, routers along the path could count the SYN packets
that you send to klive.cpushare.com, and by watching the statistics with
many computers coming from the same host md5-sum they may be able to
guess which "host" corresponds to the IP that is sending
the many SYNs.

So before I add features for your special needs, I'd rather make sure
that you can live with this worst case: "SYN" guessing on traffic
coming from your proxy with destination klive.cpushare.com.

Thanks a lot!

Bill Davidsen

Aug 30, 2005, 6:07:05 PM
to Rogier Wolff, linux-...@vger.kernel.org
Rogier Wolff wrote:
> On Tue, Aug 30, 2005 at 10:53:13AM +0200, Sven Ladegast wrote:
>
>>>A trick to use would be to send a UDP packet at boot (after 1 minute
>>>or so), and then randomly, say, "once a month" (i.e. about a 1/30 chance of
>>>sending a packet each day). The number of these random packets
>>>received is a measure of the number of CPU-months that the kernel
>>>runs.
>>
>>This could be a solution, but as you know UDP packets may or may not
>>arrive at the destination address. So the packet loss with this method could
>>be very high, especially if you send only one packet. Using a
>>TCP connection for this is a lot more reliable and the payload can be
>>encrypted too.

The information will be public anyway, so what's the gain? This is a way
for people to voluntarily give you information, so keep it simple. And to
that end, run it as a user program. It should call home at start (boot),
stop, and from time to time to prove it's up. A single UDP packet can do
all that, and if you see machine X boot 2.6.99-rc5 and drop back to rc4
in ten minutes, that's valuable information. And if people stop running
the test kernel and drop back to a vendor kernel, THAT'S valuable info
as well. Time of use is not the only indication here, fallback is an
indication that a kernel boot was not pleasing in some way.
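Bill's fallback signal is easy to compute on the server once boot, stop and periodic "alive" events are logged. This is a hypothetical sketch: the event fields and the ten-minute default window (taken from his example) are assumptions. It flags any quick switch to a different release, not only a downgrade.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    host: str      # opaque machine identifier
    time: float    # seconds since the epoch
    kind: str      # "boot", "alive" or "stop"
    release: str   # e.g. "2.6.99-rc5"


def find_fallbacks(events: List[Event], window: float = 600.0) -> List[Tuple[str, str, str]]:
    """Return (host, tried_release, next_release) whenever a host boots one
    kernel and then boots a different one within `window` seconds."""
    last_boot: Dict[str, Event] = {}
    found = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind != "boot":
            continue  # only boot events matter for this check
        prev = last_boot.get(ev.host)
        if prev is not None and prev.release != ev.release and ev.time - prev.time <= window:
            found.append((ev.host, prev.release, ev.release))
        last_boot[ev.host] = ev
    return found
```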


>
>
> The "load" that a UDP packet poses on a system is much lower than
> for a TCP connection. The fact that UDP packets sometimes get lost
> is not much of an issue: Those packets simply wouldn't get logged.
> So what?
>
> In 90% (my guess, 90% of statistics is made up....) of the cases
> where the first packet doesn't reach the destination, any subsequent
> packets also wouldn't. So when it matters as little as it does here, why
> bother with the extra overhead of a TCP connection?

Your assumption is unrealistic, a fair number of routers start dropping
packets under load, ping first, then some other icmp, then udp, tcp last
(usually).


>
> The "in kernel module" that might send this, could put some easily
> gathered information into the packet. The goal of logging kernels-
> that-get-run would then be met. Installing a userspace program is
> something that most testers won't be bothered to do.

I think you have it backward, people will add a user program after the
fact, they may not recompile just to add a feature.


>
> A kernel option that clearly documents exactly what info is logged
> would IMHO work better. (A userspace program is technically a better
> solution, the social aspect of getting a bigger user-base is the main
> reason for me to suggest the in-kernel approach).

The social issue is that on a stable machine I'm not going to change the
kernel, but any one user can run the program to provide usage without
having access to anything important, so I can install this as nobody or
its own UID, and feel better than putting it in my kernel.


>
> (the people who go upgrading kernels tend to be different people from
> those who go installing programs for fun.)

Part of my testing involves booting a kernel and seeing if it stays up,
the user program could be added after the fact.

The kernel info, memory, CPU, and machine name can be gotten without
privs, as can uptime. As far as how to install it, provide a script and
put it in cron (system or user). That way you can actually track some info on
the system, like load. A week running while I was on vacation doesn't
test much, a week running on a loaded server tests other things.

The use will depend on how easy it is to install, patch and build isn't
easy. Crontab is. And I bet developers would be interested in how long
it takes a new release to be used in production. There are lots of
things you could add, but get it working first.
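Bill's point that several of these items are readable without privileges can be illustrated with a small collector. This is a hypothetical sketch: the field names and dict layout are assumptions. os.uname() and the /proc files used are standard on Linux. The reader function is injectable so the parsing can be exercised without a live /proc.

```python
import os
from typing import Callable, Dict, Tuple


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()


def parse_meminfo_total_kb(meminfo_text: str) -> int:
    """Return MemTotal (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def parse_loadavg(loadavg_text: str) -> Tuple[float, float, float]:
    """First three fields of /proc/loadavg: 1, 5 and 15 minute load averages."""
    one, five, fifteen = loadavg_text.split()[:3]
    return float(one), float(five), float(fifteen)


def snapshot(read: Callable[[str], str] = read_text) -> Dict[str, object]:
    """Collect a report from world-readable sources only (no root needed)."""
    u = os.uname()
    return {
        "host": u.nodename,  # may identify the owner; consider hashing or dropping it
        "release": u.release,
        "machine": u.machine,
        "mem_kb": parse_meminfo_total_kb(read("/proc/meminfo")),
        "uptime_s": float(read("/proc/uptime").split()[0]),
        "load": parse_loadavg(read("/proc/loadavg")),
    }
```

Run from a user crontab, for example with an entry like `*/30 * * * * /usr/local/bin/klive-report` (a hypothetical script wrapping `snapshot()`), this needs neither root nor a kernel rebuild. That is the ease-of-install argument Bill makes.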

--
-bill davidsen (davi...@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

Sven Ladegast

Aug 30, 2005, 6:44:42 PM
to Alan Cox, linux-...@vger.kernel.org
On Tue, 30 Aug 2005, Alan Cox wrote:

> but it would have to be opt in. That might lower coverage but should
> increase quality, especially if the id in the cookie can be put into
> bugzilla reports, and the hardware reporting is done so it can be
> machine processed (ie so you can ask stuff like 'reliability with Nvidia
> IDE')

Maybe I used the wrong words... But you are right: It has to be opt-in! A
change in the kernel sources which automagically sends data, regardless
what kind of data, to somewhere in the net must not be enabled by default.

But until klive is implemented one day it is interesting thinking about
what possibilities (and maybe even possible misuse) such a data
collection has. What data does klive send? Is the data just a hash of
different system variables or is it also possible to identify one single
computer (or person)? Data protection...laws etc. are things that must be
considered too maybe.

I think the problem is not the technical implementation. The bigger
problem is the data, where it comes from and the most interesting point
what to do with it at the end.

Sven

Alan Cox

Aug 30, 2005, 6:56:30 PM
to Sven Ladegast, linux-...@vger.kernel.org
On Mer, 2005-08-31 at 00:43 +0200, Sven Ladegast wrote:
> collection has. What data does klive send? Is the data just a hash of
> different system variables or is it also possible to identify one single
> computer (or person)? Data protection...laws etc. are things that must be
> considered too maybe.

My thinking is something like this

"Register a box + optional PCI id list/CPU info"
Reply with a secured serial number

Uptime data then can just be boot number, time up


> I think the problem is not the technical implementation. The bigger
> problem is the data, where it comes from and the most interesting point
> what to do with it at the end.

We don't need personally identifiable data (email, name, ip address etc)

What to do with it will be most interesting indeed.
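Alan's register-then-report flow could look roughly like this on the server. This is a hypothetical sketch. "Secured serial number" is read here as an unguessable random token from Python's `secrets` module, which is an assumption; he may have meant something signed. Storage is an in-memory dict purely for illustration.

```python
import secrets
from typing import Dict, Optional


class Registry:
    """Toy in-memory server: issue opaque serials, then accept uptime reports.

    Nothing personally identifying is stored: only the serial, whatever
    optional hardware info the user chose to send, and per-boot uptimes.
    """

    def __init__(self) -> None:
        self.hardware: Dict[str, dict] = {}
        self.uptimes: Dict[str, Dict[int, float]] = {}

    def register(self, hardware_info: Optional[dict] = None) -> str:
        serial = secrets.token_hex(16)  # 128 random bits, hard to guess or forge
        self.hardware[serial] = dict(hardware_info or {})
        self.uptimes[serial] = {}
        return serial

    def report(self, serial: str, boot: int, seconds_up: float) -> bool:
        """Accept a (boot number, time up) report; reject unknown serials."""
        boots = self.uptimes.get(serial)
        if boots is None:
            return False
        boots[boot] = seconds_up
        return True
```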

Sven Ladegast

Aug 30, 2005, 7:20:33 PM
to Alan Cox, linux-...@vger.kernel.org
On Wed, 31 Aug 2005, Alan Cox wrote:

> "Register a box + optional PCI id list/CPU info"
> Reply with a secured serial number

Registering means to create an ID for the system? Something out of
timestamp plus your PCI IDs and CPU info and so on?

Sven

Andrea Arcangeli

Aug 30, 2005, 9:50:14 PM
to Bill Davidsen, Rogier Wolff, linux-...@vger.kernel.org
On Tue, Aug 30, 2005 at 06:11:26PM -0400, Bill Davidsen wrote:
> the system, like load. A week running while I was on vacation doesn't
> test much, a week running on a loaded server tests other things.

btw, I thought about adding the load average too, but it wasn't really
interesting, since sometimes a server is stressed a lot for a few minutes
and then goes back to idle. A kernel bug will not necessarily
trigger just because some app is I/O bound all the time. Certainly more load
is a factor that increases the probability of bugs and race conditions,
though; it's just not obvious how to assign a 0-100% score to a certain
KLive "session".

> The use will depend on how easy it is to install, patch and build isn't
> easy. Crontab is. And I bet developers would be interested in how long

crontab really is easy and standard; it actually seems the only way I
could write a few-line autoinstall script.

Alan Cox

Aug 31, 2005, 9:11:58 AM
to Sven Ladegast, linux-...@vger.kernel.org
On Mer, 2005-08-31 at 01:19 +0200, Sven Ladegast wrote:
> On Wed, 31 Aug 2005, Alan Cox wrote:
>
> > "Register a box + optional PCI id list/CPU info"
> > Reply with a secured serial number
>
> Registering means to create an ID for the system? Something out of
> timestamp plus your PCI IDs and CPU info and so on?

Or have the other end issue you some kind of secure cookie, which was my
thought. Generating it locally as you suggest would be even better as a
hardware change would make a box change identity automatically

Sven Ladegast

Aug 31, 2005, 10:29:32 AM
to Alan Cox, linux-...@vger.kernel.org
On Wed, 31 Aug 2005, Alan Cox wrote:

>> Registering means to create an ID for the system? Something out of
>> timestamp plus your PCI IDs and CPU info and so on?
>
> Or have the other end issue you some kind of secure cookie, which was my
> thought. Generating it locally as you suggest would be even better as a
> hardware change would make a box change identity automatically

Reading twice is sometimes better. :) It must have been late yesterday...

Well, changing the ID automagically can be okay: if a system changes its ID
from time to time, you cannot track a certain system/person easily.

Why not generate a unique system ID at the compilation stage of the kernel
if the appropriate kernel option is enabled? This needn't have anything to
do with klive...just a unique kernel ID or something like that.

klive, userspace or not, would then use this ID to generate live
stats of kernel usage. PCI IDs, CPU and whatever else could be used as a
salt to generate a really UNIQUE ID...

Sven
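Sven's locally generated ID can be sketched as a hash over a per-build random value plus hardware details. This is hypothetical: the inputs, the NUL separators and the choice of SHA-256 are assumptions (the thread mentions SHA-1 for Alan's cookie, and either works for a non-secret identifier). The PCI IDs in any example are illustrative only.

```python
import hashlib
import secrets
from typing import List


def make_system_id(pci_ids: List[str], cpu_model: str, build_nonce: str) -> str:
    """Derive an ID from a per-build random value plus hardware details.

    Sorting the PCI IDs makes the result independent of enumeration order.
    Changing any hardware, or rebuilding (which yields a new nonce), changes
    the ID, so a box cannot be tracked across such changes.
    """
    h = hashlib.sha256()
    h.update(build_nonce.encode())
    for pci in sorted(pci_ids):
        h.update(b"\0" + pci.encode())
    h.update(b"\0" + cpu_model.encode())
    return h.hexdigest()


def new_build_nonce() -> str:
    """What a build step might generate once per kernel compile (hypothetical)."""
    return secrets.token_hex(16)
```

This matches Alan's observation that a locally derived ID changes automatically when the hardware changes.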

tony...@intel.com

Aug 31, 2005, 3:15:46 PM
to Andrea Arcangeli, Bill Davidsen, Rogier Wolff, linux-...@vger.kernel.org
Do you want to try to handle version skew? All kernels built
from GIT trees look like 2.6.13 until Linus releases 2.6.14-rc1.
Possible approaches (requiring changes to the kernel Makefile):
1) Use the SHA1 of HEAD to provide a precise identification.
2) Use $(git-rev-tree linus ^v${VERSION}.${PATCHLEVEL}.${SUBLEVEL}${EXTRAVERSION} | wc -l)
to get an approximate distance from the base version
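Both approaches can be combined into one identifier. Present-day git provides `git rev-parse HEAD` for the SHA1 and `git rev-list --count <tag>..HEAD` for the commit distance, which do the same job as the `git-rev-tree ... | wc -l` pipeline above. This is a minimal sketch; the `<base>+<distance>-g<sha1>` format is a hypothetical choice.

```python
import subprocess


def git_output(*args, cwd="."):
    """Run a git command in `cwd` and return its stripped stdout."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()


def describe_kernel(base_version, head_sha1, distance):
    """Hypothetical identifier format: <base>+<commits since base>-g<short sha1>."""
    return f"{base_version}+{distance}-g{head_sha1[:8]}"


def describe_tree(base_tag, cwd="."):
    # Approach 1: the SHA1 of HEAD identifies the tree precisely.
    head = git_output("rev-parse", "HEAD", cwd=cwd)
    # Approach 2: count commits reachable from HEAD but not from the base tag.
    distance = int(git_output("rev-list", "--count", f"{base_tag}..HEAD", cwd=cwd))
    return describe_kernel(base_tag.lstrip("v"), head, distance)
```

For example, `describe_tree("v2.6.13")` run inside a kernel tree would return something like `2.6.13+42-g1a2b3c4d`.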

Another version issue is use of "localversion" ... I use it to tag
kernels with a summary of the config file I used during build (e.g.
-tiger-smp, or -generic-up). Looking at the results you've collected
so far, there appear to be a variety of other conventions in use
that prevent aggregation of results.

-Tony

Andrea Arcangeli

Aug 31, 2005, 3:47:34 PM
to tony...@intel.com, Bill Davidsen, Rogier Wolff, linux-...@vger.kernel.org
On Wed, Aug 31, 2005 at 12:14:23PM -0700, tony...@intel.com wrote:
> Do you want to try to handle version skew? All kernels built
> from GIT trees look like 2.6.13 until Linus releases 2.6.14-rc1.
> Possible approaches (requiring changes to the kernel Makefile):
> 1) Use the SHA1 of HEAD to provide a precise identification.
> 2) Use $(git-rev-tree linus ^v${VERSION}.${PATCHLEVEL}.${SUBLEVEL}${EXTRAVERSION} | wc -l)
> to get an approximate distance from the base version
>
> Another version issue is use of "localversion" ... I use it to tag
> kernels with a summary of the config file I used during build (e.g.
> -tiger-smp, or -generic-up). Looking at the results you've collected
> so far, there appear to be a variety of other conventions in use
> that prevent aggregation of results.

Aggregation of results seems to be the biggest problem right now. If we add
the git tag, we really have to aggregate the git revisions before showing
the main page (or there would be too many of them), so we need at least
a standard way to do that. Perhaps it's simpler to export it via a
read-only sysctl or /proc entry and pass it separately to the server (not
mixed into the uname strings)? I can extend the protocol without
invalidating the old clients and old data.

I'm thinking of adding optional aggregations for (\d+)\.(\d+)\.(\d+)\D and
for different archs, so you can watch ia64 only, or 2.6.13 only, etc.
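Per-branch and per-arch aggregation with that regex could look roughly like this. This is a minimal sketch that assumes each report is a (release, arch, uptime seconds) tuple. One caveat: as written, `\D` needs a character after the third number, so a plain `2.6.13` with nothing after it would not match. The sketch therefore uses the lookahead `(?!\d)` instead.

```python
import re
from collections import defaultdict

# Same idea as (\d+)\.(\d+)\.(\d+)\D, but "(?!\d)" also accepts a release
# string that ends right after the third number, such as a plain "2.6.13".
BRANCH_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?!\d)")


def branch_of(release):
    """Map '2.6.13-tiger-smp' or '2.6.13-rc7' to '2.6.13'; None if no match."""
    m = BRANCH_RE.match(release)
    return ".".join(m.groups()) if m else None


def aggregate(reports):
    """Sum uptime seconds per (branch, arch); reports are (release, arch, seconds)."""
    totals = defaultdict(int)
    for release, arch, seconds in reports:
        totals[(branch_of(release), arch)] += seconds
    return dict(totals)
```

Keying on the branch alone folds localversion suffixes such as `-tiger-smp` into their base release. The full release string is still available if per-row detail is wanted.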

The "-tiger-smp/-generic-up" makes life harder indeed ;).

If there were a more standard way to add extraversions and localversions,
aggregation would be easier and more reliable.

Andrea Arcangeli

Aug 31, 2005, 5:23:39 PM
to Sven Ladegast, Alan Cox, linux-...@vger.kernel.org
On Wed, Aug 31, 2005 at 04:28:59PM +0200, Sven Ladegast wrote:
> Why not generate a unique system ID at the compilation stage of the kernel
> if the appropriate kernel option is enabled? This needn't have anything to
> do with klive... just a unique kernel ID or something like that.

I could also store a unique ID on disk without involving the kernel, if
all you want is to track a single computer. But I didn't want to track
single computers. The main reason there is a "host" field (an md5 of the IP)
is to give more weight to info coming from different IPs (assuming not
everyone is out there to confuse the data). But it's not really about
tracking.
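One way to give reports from different IPs more weight is to count distinct hosts per kernel instead of raw packets. This is a minimal sketch that assumes reports arrive as (ip, release) pairs. Counting distinct hosts is an illustrative assumption about how that weighting might be done.

```python
import hashlib
from collections import defaultdict


def host_id(ip):
    """The 'host' value: md5 hex digest of the reporting IP address."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def distinct_hosts(reports):
    """Count distinct hosts per release; reports are (ip, release) pairs."""
    hosts = defaultdict(set)
    for ip, release in reports:
        hosts[release].add(host_id(ip))
    return {release: len(ids) for release, ids in hosts.items()}
```

An unsalted md5 of an IPv4 address can be reversed by brute force over the 2^32 address space. It keeps raw IPs out of casual view, but it does not anonymize them.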

However, I like the idea of uploading the `lspci -v` output, since it
could be useful to know which hardware and drivers work really well.

About the cookie, I'm skeptical about the need for it, because it wouldn't
be secure anyway (there's no way for me to verify that the PCI IDs are
the real ones in the computer, so any notion of security is
quite pointless here). If anything, we need an ack that the packet was
not lost, so the client knows whether it should keep sending the PCI IDs
in the next packet too.
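The ack idea could be handled on the client by attaching the PCI IDs to every packet until the server confirms it received them. This is a minimal sketch. The class name, the dict-shaped packets and the `pci_ids` field name are all hypothetical.

```python
class PciIdReporter:
    """Hypothetical client state: attach the PCI IDs until the server acks them."""

    def __init__(self, pci_ids):
        self.pci_ids = list(pci_ids)
        self.acked = False

    def next_packet(self, payload):
        packet = dict(payload)
        if not self.acked:
            # No ack yet: the previous packet may have been lost, so resend.
            packet["pci_ids"] = self.pci_ids
        return packet

    def on_ack(self):
        self.acked = True
```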

The only reason to use SSL would be to hide the PCI IDs during the network
transfer (not really to make the cookie secure).

BTW, in the meantime I wrote a completely generic installer (it's not
an rpm/deb kind of installer, it's a quick and dirty approach, but it
should run on all distros and all archs):

wget http://klive.cpushare.com/install.sh
sh install.sh --install

That will make it persistent. It goes into /var/tmp/klive-*

To uninstall it *completely*:

sh install.sh --uninstall
rm install.sh

You don't need root for the above (in fact I never tested it as root, but
it should work as root too ;).

Please let me know if there are problems with the quick and dirty
installer (I finished it a few minutes ago), thanks!

Sven Ladegast

Sep 1, 2005, 8:27:31 AM
to Andrea Arcangeli, linux-...@vger.kernel.org
On Tue, 30 Aug 2005, Andrea Arcangeli wrote:

> That would be a nice addition IMHO. It'll be more complex since it'll
> involve netconsole dumping and passing the klive session to the kernel
> somehow (userland would be too unreliable to push the oops to the
> server). The worst part is that oops dumping might expose random kernel
> data (it could contain ssh keys as well), so I would either need to
> purify the stack/code/register lines making the oops quite useless, or
> not show it at all (and only show the count of the oopses
> publicly). A parameter could be used to tell the kernel if the whole
> oops should be sent to the klive server or if only the notification of an
> oops should be sent (without sending the payload with potentially
> sensitive data inside).

This could be a config option too: whether to send the payload data or not.
And people should see a warning that they may expose sensitive data
when using the report function.

Sven

Pavel Machek

Sep 1, 2005, 11:06:23 AM
to Andrea Arcangeli, Sven Ladegast, linux-...@vger.kernel.org
Hi!

> > [..] combined
> > with an automatic oops/panic/bug-report this would be _very_ useful I think.
>
> That would be a nice addition IMHO. It'll be more complex since it'll
> involve netconsole dumping and passing the klive session to the kernel
> somehow (userland would be too unreliable to push the oops to the
> server). The worst part is that oops dumping might expose random kernel
> data (it could contain ssh keys as well), so I would either need to
> purify the stack/code/register lines making the oops quite useless, or
> not show it at all (and only show the count of the oopses
> publicly). A parameter could be used to tell the kernel if the whole
> oops should be sent to the klive server or if only the notification of an
> oops should be sent (without sending the payload with potentially
> sensitive data inside).

Well, you could remove everything that is not valid kernel text from the backtrace.

That should make ssh keys a non-issue and still provide useful information.

Oh, and you probably want to somehow identify modified kernels.
Otherwise, if I do some development on 2.3.4-foo5, you'll get many oopsen
caused by my development code... it is getting complex.
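Removing everything that is not valid kernel text from the backtrace could work like the following minimal sketch. It keeps only stack words that fall inside the kernel text range and masks everything else. It assumes the bounds come from somewhere like the `_stext` and `_etext` symbols in /proc/kallsyms. It does not handle module text, and a secret that happened to land inside kernel text would still get through.

```python
def scrub_stack(words, text_start, text_end):
    """Keep stack words that point into kernel text [text_start, text_end).

    Anything else (data, possibly key material) is replaced by '?' marks of
    the same width. `words` are hex strings as printed in an oops stack dump.
    """
    scrubbed = []
    for word in words:
        value = int(word, 16)
        scrubbed.append(word if text_start <= value < text_end else "?" * len(word))
    return scrubbed
```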
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

Pavel Machek

Sep 1, 2005, 11:06:52 AM
to Andrea Arcangeli, Alan Cox, Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
Hi!

> > tiny C program or a shell script using netcat.
> >
> > echo "Reporting boot: "
> > (echo "BOOT:"$(cat /etc/lum-serial)":"$(uname -a)"::") | nc -u -w 10
> > testhost.example.com 7658
>
> Client completely stateless couldn't get right suspend to disk as far as
> I can tell.

I'd say "ignore suspend". Machines using it are probably not connected
to the network anyway, and it stresses the system quite a lot.

I'm afraid that if you compared a completely idle system and a system running
one hour a day, suspended for the rest, the first system would likely reach better
uptime.
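The stateless netcat one-liner quoted above sends a single UDP datagram of the form `BOOT:<serial>:<uname -a output>::` and never waits for a reply. This is a minimal Python equivalent. The serial file, host and port are the example values from that one-liner. `os.uname()` only approximates the `uname -a` output.

```python
import os
import socket


def read_serial(path="/etc/lum-serial"):
    """Read the box serial, like $(cat /etc/lum-serial) without the newline."""
    with open(path) as f:
        return f.read().strip()


def boot_message(serial, uname):
    """Same layout as the netcat one-liner: BOOT:<serial>:<uname>::"""
    return f"BOOT:{serial}:{uname}::".encode()


def report_boot(serial, host="testhost.example.com", port=7658):
    # os.uname() gives sysname, nodename, release, version and machine,
    # which is close to (not identical to) `uname -a` output.
    uname = " ".join(os.uname())
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Fire and forget, like `nc -u`: there is no ack and no state.
        sock.sendto(boot_message(serial, uname), (host, port))
```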


Andrea Arcangeli

Sep 1, 2005, 11:20:56 AM
to Pavel Machek, Alan Cox, Rogier Wolff, Sven Ladegast, linux-...@vger.kernel.org
On Wed, Aug 31, 2005 at 08:32:00PM +0200, Pavel Machek wrote:
> I'd say "ignore suspend". Machines using it are probably not connected
> to the network anyway, and it stresses the system quite a lot.

Currently it's fine even if you're not connected to the network, as long
as you connect sometimes. If a packet manages to reach the server in
a window when you're connected, your stats will be fine and the seconds
of uptime will be checked against the time that passed on the server.

> I'm afraid that if you compared a completely idle system and a system running
> one hour a day, suspended for the rest, the first system would likely reach better
> uptime.

That shouldn't be the case. Anyway, I can trivially detect the suspended
systems, so if you want I can add a tag like "suspended" on the right of
the table (in the future we can add filters to the main page so you can
filter out the archs you don't want, SMP, etc.).
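Checking uptime against the time that passed on the server could classify each interval between two packets. This is a minimal sketch. It assumes the reported uptime does not advance while the box is suspended. The `slack` tolerance and the category names are hypothetical.

```python
def classify_interval(prev_uptime, prev_server_time, uptime, server_time, slack=120):
    """Compare client uptime progress with server wall-clock time between packets.

    Assumes the reported uptime does not advance while the box is suspended.
    """
    uptime_delta = uptime - prev_uptime
    wall_delta = server_time - prev_server_time
    if uptime_delta < 0:
        return "rebooted"        # uptime went backwards: a new boot
    if uptime_delta > wall_delta + slack:
        return "inconsistent"    # more uptime than real time passed: faked or broken
    if uptime_delta < wall_delta - slack:
        return "suspended"       # real time passed that the uptime did not count
    return "running"
```

Packets missed while the box is offline do not skew the result, because both deltas grow together until the next packet arrives.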

Andrea Arcangeli

Sep 1, 2005, 11:24:16 AM
to Pavel Machek, Sven Ladegast, linux-...@vger.kernel.org
On Wed, Aug 31, 2005 at 08:20:51PM +0200, Pavel Machek wrote:
> Well, you could remove everything that is not valid kernel text from the backtrace.

What if the corruption wrote the ssh key inside the kernel text?

As suggested before, I suspect the only way would be to make it
optional.

> Oh, and you probably want to somehow identify modified kernels.
> Otherwise, if I do some development on 2.3.4-foo5, you'll get many oopsen
> caused by my development code... it is getting complex.

Agreed; however, there's no way to do it reliably: if you apply a
patch before compiling the kernel, there's no way to know it unless we
run md5sum over the whole source at every compilation, and that would be
too slow ;)

Thanks.

Andrea Arcangeli

Sep 5, 2005, 2:26:53 PM
to tony...@intel.com, Bill Davidsen, Rogier Wolff, linux-...@vger.kernel.org, kl...@cpushare.com
On Wed, Aug 31, 2005 at 09:47:01PM +0200, Andrea Arcangeli wrote:
> I'm thinking of adding optional aggregations for (\d+)\.(\d+)\.(\d+)\D and
> for different archs, so you can watch ia64 only, or 2.6.13 only, etc.
>
> The "-tiger-smp/-generic-up" makes life harder indeed ;).

I have now implemented some basic per-arch and per-branch aggregation, but I'm
not yet merging into the same row kernels that differ only in their
localversion (example: 2.6.13-ppc64 isn't merged with 2.6.13). The
problem is that localversions may be random, which complicates
things as said above (it'd be really nice to have a way to identify the
localversion reliably). Suggestions are welcome. Thanks.

Marc Giger

Sep 5, 2005, 6:05:44 PM
to Andrea Arcangeli, linux-...@vger.kernel.org
Hi Andrea

Two little details:

The following line does not print what you expect on
Alphas:

MHZ = int(re.search(r' (\d+)\.?\d?',
                    os.popen("grep -i mhz /proc/cpuinfo | head -n 1").read()).group(1))

My /proc/cpuinfo:

cpu : Alpha
cpu model : EV56
cpu variation : 7
cpu revision : 0
cpu serial number :
system type : EB164
system variation : LX164
system revision : 0
system serial number :
cycle frequency [Hz] : 533171392 est.
timer frequency [Hz] : 1024.00
page size [bytes] : 8192
phys. address bits : 40
max. addr. space # : 127
BogoMIPS : 1059.80
kernel unaligned acc : 61926 (pc=fffffc00005f7ccc,va=fffffc003feee60e)
user unaligned acc : 0 (pc=0,va=0)
platform string : Digital AlphaPC 164LX 533 MHz
cpus detected : 1
L1 Icache : 8K, 1-way, 32b line
L1 Dcache : 8K, 1-way, 32b line
L2 cache : 96K, 3-way, 64b line
L3 cache : 2048K, 1-way, 64b line
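With this /proc/cpuinfo, `grep -i mhz` only matches the "platform string" line, and the regex then picks up the 164 of the board name 164LX instead of 533. A more robust parser could key on field names. This is a minimal sketch that handles only the x86-style `cpu MHz` line the original grep targets and the Alpha `cycle frequency [Hz]` line shown above. Other architectures would need their own cases.

```python
import re


def cpu_mhz(cpuinfo):
    """Best-effort CPU clock in MHz from /proc/cpuinfo text, or None.

    Handles only the x86-style "cpu MHz : 1994.999" line and the Alpha
    "cycle frequency [Hz] : 533171392 est." line.
    """
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        number = re.match(r"\s*(\d+)", value)
        if number is None:
            continue
        key = key.strip().lower()
        if key == "cpu mhz":
            return int(number.group(1))               # integer part, as before
        if key == "cycle frequency [hz]":
            return int(number.group(1)) // 1_000_000  # Hz -> MHz
    return None
```

With the Alpha dump above, `cpu_mhz(open("/proc/cpuinfo").read())` would return 533.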

Second, you should mention somewhere that it needs at least Twisted
1.3.0 to work correctly. Did you?

Oh, another point:
Some of my machines have long uptimes, and I won't reboot them
just to match the klive runtime. So the reported uptime
is (in my case) far from the truth.


It is very interesting to see how often a vanilla/-git/-mm etc. kernel is
tested. Perhaps klive could be extended to automatically report oopses
and/or other troubles if possible. What about reporting the core features
used on the machine, like fs, scheduler, RAID, LVM etc., so that
the devs can see which subsystems got a lot of testing and which are not
used much?

Thanks

Marc


On Tue, 30 Aug 2005 05:09:59 +0200
Andrea Arcangeli <and...@cpushare.com> wrote:

> Hello,
>
> During the Kernel Summit somebody raised the point that it's not clear
> how much testing each rc/pre/git kernel gets before the final release.
>
> So I setup a server to track automatically the amount of testing that
> each kernel gets. Clearly this will be a very rough approximation and it
> can't be reliable, but perhaps it'll be useful. If this won't be useful,
> the time I spent on it is very minor so no problem ;).
>
> All the details can be found in the project website:
>
> http://klive.cpushare.com/
>
> Full source (server included) is here:
>
> http://klive.cpushare.com/downloads/klive-0.0.tar.bz2
>
> To run the client:
>
> wget http://klive.cpushare.com/klive.tac
>
> Then at every boot (like in /etc/init.d/boot.local):
>
> twistd -oy klive.tac
>
> In theory we could get rid of the client entirely and make it a kernel
> config option, but I've no idea if this project is useful, so I don't
> want to spend too much time on it at this point.
>
> Thank you.

Andrea Arcangeli

Sep 5, 2005, 7:14:03 PM
to Marc Giger, linux-...@vger.kernel.org, kl...@cpushare.com
On Tue, Sep 06, 2005 at 12:05:07AM +0200, Marc Giger wrote:
> Hi Andrea
>
> Two little details:
>
> The following line does not print what you expect on
> Alphas:
>
> MHZ = int(re.search(r' (\d+)\.?\d?',
>                     os.popen("grep -i mhz /proc/cpuinfo | head -n 1").read()).group(1))

Thanks for reminding me about it ;)

> Second, you should mention somewhere that it needs at least Twisted
> 1.3.0 to work correctly. Did you?

I didn't; I actually hoped it would work with older Twisted too ;)

> Oh, another point:
> Some of my machines have long uptimes, and I won't reboot them
> just to match the klive runtime. So the reported uptime
> is (in my case) far from the truth.

You don't need to reboot them; however, I can't trust past uptimes, or it
would be way too easy to fake the results (it's still easy, but it takes
a lot more effort).
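Not trusting past uptimes amounts to crediting only the uptime that accrued while the server was receiving packets. That credit is capped by the wall-clock time that passed on the server. This is a minimal sketch of that rule. The function name and arguments are hypothetical.

```python
def credited_uptime(first_uptime, first_server_time, uptime, server_time):
    """Seconds of uptime to credit, counting only what the server could witness.

    Uptime accumulated before the first packet (e.g. a box already up for
    months when klive was installed) is ignored, and the credit can never
    exceed the wall-clock time that passed on the server.
    """
    observed = uptime - first_uptime
    elapsed = server_time - first_server_time
    return max(0, min(observed, elapsed))
```

A machine that was already up for months gets credit only from its first packet onward, without needing a reboot.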

> It is very interesting to see how often a vanilla/-git/-mm etc. kernel is
> tested. [..]

This is the objective yes.

> [..]. Perhaps klive could be extended to automatically report oopses
> and/or other troubles if possible. What about reporting the core features
> used on the machine, like fs, scheduler, RAID, LVM etc., so that
> the devs can see which subsystems got a lot of testing and which are not
> used much?

So it sounds like the next thing to do is to extend the protocol to add
_optional_ reporting of more info on the subsystems and hardware
involved (turned off by default). Oops reporting will require
kernel changes, and, as per previous emails, it'd be a very interesting
feature to enable optionally too (via sysctl etc.).

The old protocol (number 0) will stay; in fact, it will remain the
default.
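Keeping protocol 0 as the default while adding opt-in fields could look roughly like this on the server side. This is a minimal sketch. It models messages as plain dicts, which is an assumption since the real klive wire format is not shown here. The protocol number 1 and the field names `subsystems` and `pci_ids` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical names for the opt-in fields of a future protocol 1.
OPTIONAL_V1_FIELDS = ("subsystems", "pci_ids")


@dataclass
class Report:
    release: str
    arch: str
    uptime: int
    protocol: int = 0
    extras: dict = field(default_factory=dict)


def parse_report(msg):
    """Accept protocol 0 (the default) and a hypothetical protocol 1 with opt-in extras."""
    protocol = int(msg.get("protocol", 0))
    if protocol not in (0, 1):
        raise ValueError(f"unsupported protocol {protocol}")
    extras = {}
    if protocol == 1:
        # Optional fields are simply absent when the user did not enable them.
        extras = {k: msg[k] for k in OPTIONAL_V1_FIELDS if k in msg}
    return Report(msg["release"], msg["arch"], int(msg["uptime"]), protocol, extras)
```

Old clients never send a protocol number, so they keep parsing as protocol 0. Any extra keys they might send are ignored.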

Thanks.
