Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Query about kdump_msg hook into crash_kexec()

43 views
Skip to first unread message

Vivek Goyal

unread,
Jan 31, 2011, 6:10:03 PM1/31/11
to
Hi,

I noticed following commit which hooks into crash_kexec() for calling
kmsg_dump().

I think it is not a very good idea to pass control to modules after
crash_kexec() has been called. Because modules can try to take locks
or try to do some other operations which we really should not be doing
now and fail kdump also. The whole design of kdump is built on the
fact that in crashing kernel we do minimal thing and try to make
transition to second kernel robust. Now with this hook, kmsg dumper
breaks that assumption.

Anyway, if an image is loaded and we have setup to capture dump also
why do we need to save kmsg with the help of an helper. I am assuming
this is more of a debugging aid if we have no other way to capture the
kernel log buffer. So if somebody has setup kdump to capture the
vmcore, why to call another handler which tries to save part of the
vmcore (kmsg) separately.

Thanks
Vivek

commit 0f4bd46ec252887f44f1f065b41867cac8f70dfb
Author: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Date: Tue Dec 22 03:15:43 2009 +0000

kmsg_dump: Dump on crash_kexec as well

crash_kexec gets called before kmsg_dump(KMSG_DUMP_OOPS) if
panic_on_oops is set, so the kernel log buffer is not stored
for this case.

This patch adds a KMSG_DUMP_KEXEC dump type which gets called
when crash_kexec() is invoked. To avoid getting double dumps,
the old KMSG_DUMP_PANIC is moved below crash_kexec(). The
mtdoops driver is modified to handle KMSG_DUMP_KEXEC in the
same way as a panic.

Signed-off-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Acked-by: Simon Kagstrom <simon.k...@netinsight.net>
Signed-off-by: David Woodhouse <David.W...@intel.com>

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Américo Wang

unread,
Feb 1, 2011, 2:30:02 AM2/1/11
to
On Mon, Jan 31, 2011 at 05:59:39PM -0500, Vivek Goyal wrote:
>
>Anyway, if an image is loaded and we have setup to capture dump also
>why do we need to save kmsg with the help of an helper. I am assuming
>this is more of a debugging aid if we have no other way to capture the
>kernel log buffer. So if somebody has setup kdump to capture the
>vmcore, why to call another handler which tries to save part of the
>vmcore (kmsg) separately.
>

Well, kmsg dumper is not only for panic/oops, it can also save kernel
messages when the system is reboot/halt.

Yeah, it is true that vmcore contains log_buf, but I think the users
of ramoops/mtdoops are mainly those who don't have kdump configured
in their kernels, they are cheaper than kdump, but less powerful.

Thanks.

Américo Wang

unread,
Feb 1, 2011, 2:40:01 AM2/1/11
to
On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
>Américo Wang <xiyou.w...@gmail.com> writes:
>
>> On Mon, Jan 31, 2011 at 05:59:39PM -0500, Vivek Goyal wrote:
>>>
>>>Anyway, if an image is loaded and we have setup to capture dump also
>>>why do we need to save kmsg with the help of an helper. I am assuming
>>>this is more of a debugging aid if we have no other way to capture the
>>>kernel log buffer. So if somebody has setup kdump to capture the
>>>vmcore, why to call another handler which tries to save part of the
>>>vmcore (kmsg) separately.
>>>
>>
>> Well, kmsg dumper is not only for panic/oops, it can also save kernel
>> messages when the system is reboot/halt.
>>
>> Yeah, it is true that vmcore contains log_buf, but I think the users
>> of ramoops/mtdoops are mainly those who don't have kdump configured
>> in their kernels, they are cheaper than kdump, but less powerful
>
>The issue is the inane call inside crash_kexec.
>
>It requires both a kexec kernel to be loaded and it requires you to be
>crashing. Given that when I audited the kmsg_dump handlers they really
>weren't safe in a crash dump scenario we should just remove it.
>

Probably, I think we need to get rid of KMSG_DUMP_KEXEC.

Eric W. Biederman

unread,
Feb 1, 2011, 2:40:02 AM2/1/11
to
Américo Wang <xiyou.w...@gmail.com> writes:

> On Mon, Jan 31, 2011 at 05:59:39PM -0500, Vivek Goyal wrote:
>>
>>Anyway, if an image is loaded and we have setup to capture dump also
>>why do we need to save kmsg with the help of an helper. I am assuming
>>this is more of a debugging aid if we have no other way to capture the
>>kernel log buffer. So if somebody has setup kdump to capture the
>>vmcore, why to call another handler which tries to save part of the
>>vmcore (kmsg) separately.
>>
>
> Well, kmsg dumper is not only for panic/oops, it can also save kernel
> messages when the system is reboot/halt.
>
> Yeah, it is true that vmcore contains log_buf, but I think the users
> of ramoops/mtdoops are mainly those who don't have kdump configured

> in their kernels, they are cheaper than kdump, but less powerful

The issue is the inane call inside crash_kexec.

It requires both a kexec kernel to be loaded and it requires you to be
crashing. Given that when I audited the kmsg_dump handlers they really
weren't safe in a crash dump scenario we should just remove it.

Eric

Américo Wang

unread,
Feb 1, 2011, 3:20:01 AM2/1/11
to
On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
>On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
>>The issue is the inane call inside crash_kexec.
>>
>>It requires both a kexec kernel to be loaded and it requires you to be
>>crashing. Given that when I audited the kmsg_dump handlers they really
>>weren't safe in a crash dump scenario we should just remove it.
>>
>
>Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
>

Here we go.

--------->

KMSG_DUMP_KEXEC is useless because we already save kernel messages
inside /proc/vmcore, and it is unsafe to allow modules to do
other stuffs in a crash dump scenario.

Reported-by: Vivek Goyal <vgo...@redhat.com>
Cc: "Eric W. Biederman" <ebie...@xmission.com>
Signed-off-by: WANG Cong <xiyou.w...@gmail.com>


---
diff --git a/drivers/char/ramoops.c b/drivers/char/ramoops.c
index 1a9f5f6..8653ba4 100644
--- a/drivers/char/ramoops.c
+++ b/drivers/char/ramoops.c
@@ -69,8 +69,7 @@ static void ramoops_do_dump(struct kmsg_dumper *dumper,
struct timeval timestamp;

if (reason != KMSG_DUMP_OOPS &&
- reason != KMSG_DUMP_PANIC &&
- reason != KMSG_DUMP_KEXEC)
+ reason != KMSG_DUMP_PANIC)
return;

/* Only dump oopses if dump_oops is set */
diff --git a/drivers/mtd/mtdoops.c b/drivers/mtd/mtdoops.c
index e3e40f4..56eac4e 100644
--- a/drivers/mtd/mtdoops.c
+++ b/drivers/mtd/mtdoops.c
@@ -308,8 +308,7 @@ static void mtdoops_do_dump(struct kmsg_dumper *dumper,
char *dst;

if (reason != KMSG_DUMP_OOPS &&
- reason != KMSG_DUMP_PANIC &&
- reason != KMSG_DUMP_KEXEC)
+ reason != KMSG_DUMP_PANIC)
return;

/* Only dump oopses if dump_oops is set */
diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 2a0d7d6..05fa2a9 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -17,7 +17,6 @@
enum kmsg_dump_reason {
KMSG_DUMP_OOPS,
KMSG_DUMP_PANIC,
- KMSG_DUMP_KEXEC,
KMSG_DUMP_RESTART,
KMSG_DUMP_HALT,
KMSG_DUMP_POWEROFF,
diff --git a/kernel/kexec.c b/kernel/kexec.c
index ec19b92..3ac0218 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -32,7 +32,6 @@
#include <linux/console.h>
#include <linux/vmalloc.h>
#include <linux/swap.h>
-#include <linux/kmsg_dump.h>

#include <asm/page.h>
#include <asm/uaccess.h>
@@ -1078,8 +1077,6 @@ void crash_kexec(struct pt_regs *regs)
if (kexec_crash_image) {
struct pt_regs fixed_regs;

- kmsg_dump(KMSG_DUMP_KEXEC);
-
crash_setup_regs(&fixed_regs, regs);
crash_save_vmcoreinfo();
machine_crash_shutdown(&fixed_regs);

Vivek Goyal

unread,
Feb 1, 2011, 10:30:02 AM2/1/11
to
On Tue, Feb 01, 2011 at 04:13:13PM +0800, Américo Wang wrote:
> On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> >>The issue is the inane call inside crash_kexec.
> >>
> >>It requires both a kexec kernel to be loaded and it requires you to be
> >>crashing. Given that when I audited the kmsg_dump handlers they really
> >>weren't safe in a crash dump scenario we should just remove it.
> >>
> >
> >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> >
>
> Here we go.
>
> --------->
>
> KMSG_DUMP_KEXEC is useless because we already save kernel messages
> inside /proc/vmcore, and it is unsafe to allow modules to do
> other stuffs in a crash dump scenario.
>

I think this is right thing to do. First of all kmsg_dump() call inside
crash_kexec() does not make any sense. Secondly, if kdump kernel is
loaded, we anyway are going to capture log buffers in vmcore and there
should not be any need to try to save messages twice.

Now one can argue that what if kdump does not capture the dump. I think
then right solution (though painful one) is to try to make kdump as
reliable as possible instead of trying to come up with backup mechanisms
to save logs in case kdump fails.

Acked-by: Vivek Goyal <vgo...@redhat.com>

Thanks
Vivek

Jarod Wilson

unread,
Feb 1, 2011, 11:10:01 AM2/1/11
to
On Tue, Feb 01, 2011 at 10:28:45AM -0500, Vivek Goyal wrote:
> On Tue, Feb 01, 2011 at 04:13:13PM +0800, Américo Wang wrote:
> > On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> > >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> > >>The issue is the inane call inside crash_kexec.
> > >>
> > >>It requires both a kexec kernel to be loaded and it requires you to be
> > >>crashing. Given that when I audited the kmsg_dump handlers they really
> > >>weren't safe in a crash dump scenario we should just remove it.
> > >>
> > >
> > >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> > >
> >
> > Here we go.
> >
> > --------->
> >
> > KMSG_DUMP_KEXEC is useless because we already save kernel messages
> > inside /proc/vmcore, and it is unsafe to allow modules to do
> > other stuffs in a crash dump scenario.
> >
>
> I think this is right thing to do. First of all kmsg_dump() call inside
> crash_kexec() does not make any sense. Secondly, if kdump kernel is
> loaded, we anyway are going to capture log buffers in vmcore and there
> should not be any need to try to save messages twice.
>
> Now one can argue that what if kdump does not capture the dump. I think
> then right solution (though painful one) is to try to make kdump as
> reliable as possible instead of trying to come up with backup mechanisms
> to save logs in case kdump fails.

I'm of the same mind, kmsg_dump is fine by itself, but wedging it in to
duplicate something kdump already does on the grounds that kdump might
fail is Doing It Wrong. If one has concerns about kdump failing, put
effort into solidifying kdump.

Acked-by: Jarod Wilson <ja...@redhat.com>

--
Jarod Wilson
ja...@redhat.com

KOSAKI Motohiro

unread,
Feb 2, 2011, 8:00:02 PM2/2/11
to
Hi

> Hi,
>
> I noticed following commit which hooks into crash_kexec() for calling
> kmsg_dump().
>
> I think it is not a very good idea to pass control to modules after
> crash_kexec() has been called. Because modules can try to take locks
> or try to do some other operations which we really should not be doing
> now and fail kdump also. The whole design of kdump is built on the
> fact that in crashing kernel we do minimal thing and try to make
> transition to second kernel robust. Now with this hook, kmsg dumper
> breaks that assumption.

I guess you talked about some foolish can shoot their own foot. if so,
Yes. Any kernel module can make kernel panic or more disaster result.


> Anyway, if an image is loaded and we have setup to capture dump also
> why do we need to save kmsg with the help of an helper. I am assuming
> this is more of a debugging aid if we have no other way to capture the
> kernel log buffer. So if somebody has setup kdump to capture the
> vmcore, why to call another handler which tries to save part of the
> vmcore (kmsg) separately.

No.

kmsg_dump() is desingned for embedded. and some embedded devices uses
kexec for non dumping purpose. (Have you seen your embedded devices
show "Now storing dump image.." message?)

Anyway, you can feel free to avoid to use ksmg_dump().

KOSAKI Motohiro

unread,
Feb 2, 2011, 8:00:01 PM2/2/11
to
> On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> >>The issue is the inane call inside crash_kexec.
> >>
> >>It requires both a kexec kernel to be loaded and it requires you to be
> >>crashing. Given that when I audited the kmsg_dump handlers they really
> >>weren't safe in a crash dump scenario we should just remove it.
> >>
> >
> >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> >
>
> Here we go.
>
> --------->
>
> KMSG_DUMP_KEXEC is useless because we already save kernel messages
> inside /proc/vmcore, and it is unsafe to allow modules to do
> other stuffs in a crash dump scenario.
>
> Reported-by: Vivek Goyal <vgo...@redhat.com>
> Cc: "Eric W. Biederman" <ebie...@xmission.com>
> Signed-off-by: WANG Cong <xiyou.w...@gmail.com>

I wrote why this is no good idea by another mail. Please see it.
Anyway you have a right to don't use this feature.

Nack.

Vivek Goyal

unread,
Feb 2, 2011, 9:10:02 PM2/2/11
to
On Thu, Feb 03, 2011 at 09:59:04AM +0900, KOSAKI Motohiro wrote:
> > On Tue, Feb 01, 2011 at 03:38:53PM +0800, Américo Wang wrote:
> > >On Mon, Jan 31, 2011 at 11:33:15PM -0800, Eric W. Biederman wrote:
> > >>The issue is the inane call inside crash_kexec.
> > >>
> > >>It requires both a kexec kernel to be loaded and it requires you to be
> > >>crashing. Given that when I audited the kmsg_dump handlers they really
> > >>weren't safe in a crash dump scenario we should just remove it.
> > >>
> > >
> > >Probably, I think we need to get rid of KMSG_DUMP_KEXEC.
> > >
> >
> > Here we go.
> >
> > --------->
> >
> > KMSG_DUMP_KEXEC is useless because we already save kernel messages
> > inside /proc/vmcore, and it is unsafe to allow modules to do
> > other stuffs in a crash dump scenario.
> >
> > Reported-by: Vivek Goyal <vgo...@redhat.com>
> > Cc: "Eric W. Biederman" <ebie...@xmission.com>
> > Signed-off-by: WANG Cong <xiyou.w...@gmail.com>
>
> I wrote why this is no good idea by another mail. Please see it.
> Anyway you have a right to don't use this feature.
>

But you have not explained that why do you need to hook into crash_kexec()
and you have also not explained why do you need to send out kdump_msg()
notification if kdump is configured.

Some detailed explanation here would help.

Thanks
Vivek

Vivek Goyal

unread,
Feb 2, 2011, 9:10:02 PM2/2/11
to
On Thu, Feb 03, 2011 at 09:55:41AM +0900, KOSAKI Motohiro wrote:
> Hi
>
> > Hi,
> >
> > I noticed following commit which hooks into crash_kexec() for calling
> > kmsg_dump().
> >
> > I think it is not a very good idea to pass control to modules after
> > crash_kexec() has been called. Because modules can try to take locks
> > or try to do some other operations which we really should not be doing
> > now and fail kdump also. The whole design of kdump is built on the
> > fact that in crashing kernel we do minimal thing and try to make
> > transition to second kernel robust. Now with this hook, kmsg dumper
> > breaks that assumption.
>
> I guess you talked about some foolish can shoot their own foot. if so,
> Yes. Any kernel module can make kernel panic or more disaster result.

Yes, the difference is that once a fool shoots his foot, kernel tries
to take a meaningful action to figure out what went wrong. Like displayig
an oops backtrace or like dumping a core (if kdump is configured) so
that one can figure out who was the fool and what did who do.

Now think give the control to two fools. First fool shoots his foot
and then kernel transfers the control to another fool which completely
screws up the situation and one can not save the core.

>
>
> > Anyway, if an image is loaded and we have setup to capture dump also
> > why do we need to save kmsg with the help of an helper. I am assuming
> > this is more of a debugging aid if we have no other way to capture the
> > kernel log buffer. So if somebody has setup kdump to capture the
> > vmcore, why to call another handler which tries to save part of the
> > vmcore (kmsg) separately.
>
> No.
>
> kmsg_dump() is desingned for embedded.

Great. And I like the idea of trying to save some useful information
to non volatile RAM or flash or something like that.

> kexec for non dumping purpose. (Have you seen your embedded devices
> show "Now storing dump image.." message?)

No I have not seen. Can you explain a bit more that apart from kernel
dump, what are the other purposes of kdump.

>
> Anyway, you can feel free to avoid to use ksmg_dump().

Yes, that is one more way but this information is not even exported to
user space to figure out if there are any registerd users of kmsg_dump.

Seconly there are two more important things.

- Why do you need a notification from inside crash_kexec(). IOW, what
is the usage of KMSG_DUMP_KEXEC.

- One can anyway call kmsg_dump() outside crash_kexec() before it so
that kmsg_dump notification will go out before kdump gets the control.
What I am arguing here is that it is not necessarily a very good idea
because external modules can try to do any amount of unsafe actions
once we export the hook.

Doing this is still fine if kdump is not configured as anyway syste would
have rebooted. But if kdump is configured, then we are just reducing
the reliability of the operation by passing the control in the hands
of unaudited code and trusting it when kernel data structures are
corrupt.

So to me, sending out kmsg_dump notifications are perfectly fine when
kdump is not configured. But if it is configured, then it probably is
not a good idea. Anyway, if you have configured the system to capture
the full dump, why do you also need kmsg_dump. And if you are happy
with kmsg_dump() then you don't need kdump. So these both seem to be
mutually exclusive anyway.

Thanks
Vivek

KOSAKI Motohiro

unread,
Feb 3, 2011, 12:00:01 AM2/3/11
to
> On Thu, Feb 03, 2011 at 09:55:41AM +0900, KOSAKI Motohiro wrote:
> > Hi
> >
> > > Hi,
> > >
> > > I noticed following commit which hooks into crash_kexec() for calling
> > > kmsg_dump().
> > >
> > > I think it is not a very good idea to pass control to modules after
> > > crash_kexec() has been called. Because modules can try to take locks
> > > or try to do some other operations which we really should not be doing
> > > now and fail kdump also. The whole design of kdump is built on the
> > > fact that in crashing kernel we do minimal thing and try to make
> > > transition to second kernel robust. Now with this hook, kmsg dumper
> > > breaks that assumption.
> >
> > I guess you talked about some foolish can shoot their own foot. if so,
> > Yes. Any kernel module can make kernel panic or more disaster result.
>
> Yes, the difference is that once a fool shoots his foot, kernel tries
> to take a meaningful action to figure out what went wrong. Like displayig
> an oops backtrace or like dumping a core (if kdump is configured) so
> that one can figure out who was the fool and what did who do.
>
> Now think give the control to two fools. First fool shoots his foot
> and then kernel transfers the control to another fool which completely
> screws up the situation and one can not save the core.

If you really want to full control, you should disable CONFIG_MODULES,
kprobes, ftrace and perf. We have a lot of kernel capturing way already.
So, only one feature diabling don't solve anything. Alternatively,
I can imagine to improve security modules and audit loaded kernel
module (and other injection code) more carefully.

So, I'm curious why do you hate so much a part of them and not all of them.

> > > Anyway, if an image is loaded and we have setup to capture dump also
> > > why do we need to save kmsg with the help of an helper. I am assuming
> > > this is more of a debugging aid if we have no other way to capture the
> > > kernel log buffer. So if somebody has setup kdump to capture the
> > > vmcore, why to call another handler which tries to save part of the
> > > vmcore (kmsg) separately.
> >
> > No.
> >
> > kmsg_dump() is desingned for embedded.
>
> Great. And I like the idea of trying to save some useful information
> to non volatile RAM or flash or something like that.

Yeah, thanks.

>
> > kexec for non dumping purpose. (Have you seen your embedded devices
> > show "Now storing dump image.." message?)
>
> No I have not seen. Can you explain a bit more that apart from kernel
> dump, what are the other purposes of kdump.
>
> >
> > Anyway, you can feel free to avoid to use ksmg_dump().
>
> Yes, that is one more way but this information is not even exported to
> user space to figure out if there are any registerd users of kmsg_dump.
>
> Seconly there are two more important things.
>
> - Why do you need a notification from inside crash_kexec(). IOW, what
> is the usage of KMSG_DUMP_KEXEC.

AFAIK, kexec is used sneak rebooting way when the system face unexpected
scenario on some devices. (Some embedded is running very long time, then
it can't avoid memory bit corruption. all of reset is a last resort.
and a vendor gather logs at periodically checkback).

The main purpose of to introduce KMSG_DUMP_KEXEC is to be separate it
from KMSG_DUMP_PANIC. At kmsg_dump() initial patch, KMSG_DUMP_PANIC
is always called both kdump is configured or not. But it's no good idea
the same log is to be appeared when both kexec was successed and failured.
Moreover someone don't want any log at kexec phase. They only want logs
when real panic (ie kexec failure) route. Then, I've separated it to two.
Two separated argument can solve above both requreiment.


> - One can anyway call kmsg_dump() outside crash_kexec() before it so
> that kmsg_dump notification will go out before kdump gets the control.
> What I am arguing here is that it is not necessarily a very good idea
> because external modules can try to do any amount of unsafe actions
> once we export the hook.

I wrote why I don't think I added new risk. (shortly, It can be a lot of
another way)
Can you please tell me your view of my point? I'm afraid I haven't
understand your worry. So, I hope to understand it before making
alternative fixing patch.

> Doing this is still fine if kdump is not configured as anyway syste would
> have rebooted. But if kdump is configured, then we are just reducing
> the reliability of the operation by passing the control in the hands
> of unaudited code and trusting it when kernel data structures are
> corrupt.

At minimum, I'm fully agree we need reliable kdump. I only put a doubtness
this removing is a real solution or not.

> So to me, sending out kmsg_dump notifications are perfectly fine when
> kdump is not configured. But if it is configured, then it probably is
> not a good idea. Anyway, if you have configured the system to capture
> the full dump, why do you also need kmsg_dump. And if you are happy
> with kmsg_dump() then you don't need kdump. So these both seem to be
> mutually exclusive anyway.

Honestly, I haven't heared anyone are using both at the same time. But
I can guess some reason. 1) If the system is very big box, kdump is
really slooooooow operation. example Some stock exchange system have half
terabytes memory and it mean dump delivery need to hald days at worse. But
market should open just 9:00 at next day. So, summry information (eg log and
trace information) spoiling is important thing. 2) Two sequence crash (ie
crash kdump reboot-start next-crash-before-finish-reboot) can override former
dump image. Usually admin _guess_ the reason of two are same and report boss so.
But unfortunatelly customers at high end area often refuse a _guess_ report.

Or, it's for business competition reason. As far as I heared, IBM and HP
UNI*X system can save the logs both dump and special flash device.

PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but
unfortunately I don't know its detail. I hope anyone tell us it.

KOSAKI Motohiro

unread,
Feb 3, 2011, 12:00:01 AM2/3/11
to
> > I wrote why this is no good idea by another mail. Please see it.
> > Anyway you have a right to don't use this feature.
> >
>
> But you have not explained that why do you need to hook into crash_kexec()
> and you have also not explained why do you need to send out kdump_msg()
> notification if kdump is configured.
>
> Some detailed explanation here would help.

Hi,

I send it you now :)

KOSAKI Motohiro

unread,
Feb 3, 2011, 12:30:01 AM2/3/11
to
> AFAIK, kexec is used sneak rebooting way when the system face unexpected
> scenario on some devices. (Some embedded is running very long time, then
> it can't avoid memory bit corruption. all of reset is a last resort.
> and a vendor gather logs at periodically checkback).
>
> The main purpose of to introduce KMSG_DUMP_KEXEC is to be separate it
> from KMSG_DUMP_PANIC. At kmsg_dump() initial patch, KMSG_DUMP_PANIC
> is always called both kdump is configured or not. But it's no good idea
> the same log is to be appeared when both kexec was successed and failured.
> Moreover someone don't want any log at kexec phase. They only want logs
> when real panic (ie kexec failure) route. Then, I've separated it to two.
> Two separated argument can solve above both requreiment.

A bit additional explanation, An original patch have kmsg_dump(KMSG_DUMP_PANIC)
callsite at following point. I didn't think it makes either embedded or
kdump folks happiness. Thus I moved it after crash_kexec().


---------------------------------------------------------------------
@@ -74,6 +75,7 @@ NORET_TYPE void panic(const char * fmt, ...)
dump_stack();
#endif

+ kmsg_dump(KMSG_DUMP_PANIC);
/*
* If we have crashed and we have a crash kernel loaded let it handle
* everything else.
* Do we want to call this before we try to display a message?
*/
crash_kexec(NULL);
---------------------------------------------------------------------

Seiji Aguchi

unread,
Feb 3, 2011, 1:50:01 PM2/3/11
to
Hi,

>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but
> unfortunately I don't know its detail. I hope anyone tell us it.

I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.

[Background]
In our support service experience, we always need to detect root cause
of OS panic.
So, customers in enterprise area never forgive us if kdump fails and
we can't detect the root cause of panic due to lack of materials for
investigation.

>- Why do you need a notification from inside crash_kexec(). IOW, what
> is the usage of KMSG_DUMP_KEXEC.


The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting
useful information for investigating kernel crash in case kdump
kernel doesn't boot.

Kdump kernel may not start booting because there is a sha256 checksum
verified over the kdump kernel before it starts booting.
This means kdump kernel may fail even if there is no bug in kdump and
we can't get any information for detecting root cause of kernel crash.

As I mentioned in [Background], We must avoid lack of materials for
investigation.
So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
area.

>- One can anyway call kmsg_dump() outside crash_kexec() before it so
> that kmsg_dump notification will go out before kdump gets the control.
> What I am arguing here is that it is not necessarily a very good idea
> because external modules can try to do any amount of unsafe actions
> once we export the hook.
>

kmsg_dump() is a feature for specific servers capable NVRAM or flash memory.
So, it should be provided as a option.

By providing notification, linux kernel is able to support
this feature in following different kinds of servers flexibly.

- equipped with some NVRAMs or flash memories
- equipped with a NVRAM or flash memory

Also, linux kernel is able to support servers which don't have NVRAM/flash memory

Seiji

Eric W. Biederman

unread,
Feb 3, 2011, 4:20:02 PM2/3/11
to
Seiji Aguchi <seiji....@hds.com> writes:

> Hi,
>
>>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but
>> unfortunately I don't know its detail. I hope anyone tell us it.
>
> I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.
>
> [Background]
> In our support service experience, we always need to detect root cause
> of OS panic.
> So, customers in enterprise area never forgive us if kdump fails and
> we can't detect the root cause of panic due to lack of materials for
> investigation.
>
>>- Why do you need a notification from inside crash_kexec(). IOW, what
>> is the usage of KMSG_DUMP_KEXEC.
>
>
> The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting
> useful information for investigating kernel crash in case kdump
> kernel doesn't boot.
>
> Kdump kernel may not start booting because there is a sha256 checksum
> verified over the kdump kernel before it starts booting.
> This means kdump kernel may fail even if there is no bug in kdump and
> we can't get any information for detecting root cause of kernel crash

Sure it is theoretically possible that the sha256 checksum gets
corrupted (I have never seen it happen or heard reports of it
happening). It is a feature that if someone has corrupted your code the
code doesn't try and run anyway and corrupt anything else.

That you are arguing against have such a feature in the code you use to
write to persistent storage is scary.

> As I mentioned in [Background], We must avoid lack of materials for
> investigation.
> So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
> area.

That sounds wonderful, but it doesn't jive with the
code. kmsg_dump(KMSG_DUMP_KEXEC) when I read through it was simply not
written to be robust when most of the kernel is suspect. Making it in
appropriate for use on the crash_kexec path. I do not believe kmsg_dump
has seen any testing in kernel failure scenarios.

There is this huge assumption that kmsg_dump is more reliable than
crash_kexec, from my review of the code kmsg_dump is simply not safe in
the context of a broken kernel. The kmsg_dump code last I looked code
won't work if called with interrupts disabled.

Furthermore kmsg_dump(KMSG_DUMP_KEXEC) is only useful for debugging
crash_kexec. Which has it backwards as it is kmsg_dump that needs the
debugging.

You just argued that it is better to corrupt the target of your
kmsg_dump in the event of a kernel failure instead of to fail silently.

I don't want that unreliable code that wants to corrupt my jffs
partition anywhere near my machines.

Eric

Seiji Aguchi

unread,
Feb 3, 2011, 5:20:01 PM2/3/11
to
Hi Eric,

Thank you for your prompt reply.

I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.

(1) Needs in enterprise area
In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
for detecting root cause of kernel crash.

(2) Implementation of kmsg_dump
You suggest to review/test cording of kmsg_dump() more.

What do you think about (1)?
Is it acceptable for you?

Seiji

>-----Original Message-----
>From: Eric W. Biederman [mailto:ebie...@xmission.com]
>Sent: Thursday, February 03, 2011 4:13 PM
>To: Seiji Aguchi
>Cc: Vivek Goyal; KOSAKI Motohiro; linux kernel mailing list; Jarod Wilson
>Subject: Re: Query about kdump_msg hook into crash_kexec()
>

Américo Wang

unread,
Feb 3, 2011, 9:30:02 PM2/3/11
to
On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
>Hi Eric,
>
>Thank you for your prompt reply.
>
>I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
>
>(1) Needs in enterprise area
> In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
> for detecting root cause of kernel crash.
>
>(2) Implementation of kmsg_dump
> You suggest to review/test cording of kmsg_dump() more.
>
>What do you think about (1)?
>Is it acceptable for you?
>

For "in case of kdump fails", we can move
KMSG_DUMP_OOPS/KMSG_DUMP_PANIC before calling crash_kexec(),
so you can capture the log before kdump happens.

Vivek Goyal

unread,
Feb 3, 2011, 10:00:01 PM2/3/11
to
On Fri, Feb 04, 2011 at 10:24:27AM +0800, Américo Wang wrote:
> On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
> >Hi Eric,
> >
> >Thank you for your prompt reply.
> >
> >I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
> >
> >(1) Needs in enterprise area
> > In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
> > for detecting root cause of kernel crash.
> >
> >(2) Implementation of kmsg_dump
> > You suggest to review/test cording of kmsg_dump() more.
> >
> >What do you think about (1)?
> >Is it acceptable for you?
> >
>
> For "in case of kdump fails", we can move
> KMSG_DUMP_OOPS/KMSG_DUMP_PANIC before calling crash_kexec(),
> so you can capture the log before kdump happens.

True. If the idea is to capture the logs before kdump starts saving
core, then kdump_msg() call, can be put before crash_kexec() call and
there is no need to hook into crash_kexec().

Secondly now the question of whether kdump_msg() call should be before
crash_kexec() or not because modules might want to do lots of unreliable
things, I am now split on that. Especially because of the fact that if
somebody wants it probably can use kprobe to hook into crash_kexec()
or panic() or something like that to execute its code before everything
else.

The other reason I am little split on this because in the past I have seen
kdump fail many a times either because of chipset issues or because of driver
issues etc. So though I don't like it and I want drivers to be fixed
to take care of booting in kdump environment, things not necessarily
worked as well as we expected them to and it is not hard to meet a kdump
failure.

Hence though I do not prefer it but I will not stand in the way of kdump_msg()
being called before crash_kexec() gets the control.

But we should atleast remove the hook from crash_kexec() and get rid
of additional config option (KMSG_DUMP_KEXEC).

Thanks
Vivek

Américo Wang

unread,
Feb 3, 2011, 10:30:01 PM2/3/11
to
On Thu, Feb 03, 2011 at 09:50:42PM -0500, Vivek Goyal wrote:
>On Fri, Feb 04, 2011 at 10:24:27AM +0800, Américo Wang wrote:
>> On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
>> >Hi Eric,
>> >
>> >Thank you for your prompt reply.
>> >
>> >I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
>> >
>> >(1) Needs in enterprise area
>> > In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
>> > for detecting root cause of kernel crash.
>> >
>> >(2) Implementation of kmsg_dump
>> > You suggest to review/test cording of kmsg_dump() more.
>> >
>> >What do you think about (1)?
>> >Is it acceptable for you?
>> >
>>
>> For "in case of kdump fails", we can move
>> KMSG_DUMP_OOPS/KMSG_DUMP_PANIC before calling crash_kexec(),
>> so you can capture the log before kdump happens.
>
>True. If the idea is to capture the logs before kdump starts saving
>core, then kdump_msg() call, can be put before crash_kexec() call and
>there is no need to hook into crash_kexec().
>

Totally agreed.

>Secondly now the question of whether kdump_msg() call should be before
>crash_kexec() or not because modules might want to do lots of unreliable
>things, I am now split on that. Especially because of the fact that if
>somebody wants it probably can use kprobe to hook into crash_kexec()
>or panic() or something like that to execute its code before everything
>else.

If kprobe is the reason, then probably we can remove lots of other
kernel API's like kmsg dumper.


>
>The other reason I am little split on this because in the past I have seen
>kdump fail many a times either because of chipset issues or because of driver
>issues etc. So though I don't like it and I want drivers to be fixed
>to take care of booting in kdump environment, things not necessarily
>worked as well as we expected them to and it is not hard to meet a kdump
>failure.
>
>Hence though I do not prefer it but I will not stand in the way of kdump_msg()
>being called before crash_kexec() gets the control.
>

The point is that we don't have to choose one of them (kmsg dumper or
kdump), we can choose to have them both. Like Seiji said, they always
want the kernel messages to be saved. Duplicates are much better than
nothing.

KOSAKI Motohiro

unread,
Feb 4, 2011, 1:50:01 AM2/4/11
to
> >Secondly now the question of whether kdump_msg() call should be before
> >crash_kexec() or not because modules might want to do lots of unreliable
> >things, I am now split on that. Especially because of the fact that if
> >somebody wants it probably can use kprobe to hook into crash_kexec()
> >or panic() or something like that to execute its code before everything
> >else.
>
> If kprobe is the reason, then probably we can remove lots of other
> kernel API's like kmsg dumper.

Really?

On enterprise, distro kernel has CONFIG_KPROBE=y. It has both pros and cons,
pros) kprobe can provide alternative hooking way therefore kmsg dump is not
must. cons) even if we remove kmsg dump, we can't get full kernel control.
kprobe can inject unrealiable code. It's two sides of the same coin. But,
many embedded don't have kprobe feature, therefore kprobe can't be
alternative of kmsg dumper.

Vivek Goyal

unread,
Feb 4, 2011, 10:00:02 AM2/4/11
to
On Thu, Feb 03, 2011 at 01:52:01PM +0900, KOSAKI Motohiro wrote:

[..]


> > Seconly there are two more important things.
> >
> > - Why do you need a notification from inside crash_kexec(). IOW, what
> > is the usage of KMSG_DUMP_KEXEC.
>
> AFAIK, kexec is used sneak rebooting way when the system face unexpected
> scenario on some devices. (Some embedded is running very long time, then
> it can't avoid memory bit corruption. all of reset is a last resort.
> and a vendor gather logs at periodically checkback).
>
> The main purpose of to introduce KMSG_DUMP_KEXEC is to be separate it
> from KMSG_DUMP_PANIC. At kmsg_dump() initial patch, KMSG_DUMP_PANIC
> is always called both kdump is configured or not. But it's no good idea
> the same log is to be appeared when both kexec was successed and failured.
> Moreover someone don't want any log at kexec phase. They only want logs
> when real panic (ie kexec failure) route. Then, I've separated it to two.
> Two separated argument can solve above both requreiment.

This point is not very clear to me. Let me add some details.

- crash_kexec() path is not taken during regular kexec boot. It is only
taken when first kernel has crashed and we want to boot in kdump
kernel

- If kdump fails, most likely you will either be hung or panic in second
kernel or something like that.

- Current code looks as follows.

if (mutex_trylock(&kexec_mutex)) {


if (kexec_crash_image) {
struct pt_regs fixed_regs;

kmsg_dump(KMSG_DUMP_KEXEC);

crash_setup_regs(&fixed_regs, regs);
crash_save_vmcoreinfo();
machine_crash_shutdown(&fixed_regs);
machine_kexec(kexec_crash_image);
}
mutex_unlock(&kexec_mutex);

If a kdump kernel is not loaded you will call kmsg_dump(PANIC) and if
a kdump kernel is loaded you will call kmsg_dump(KEXEC). This is
irrespective of the fact whether kdump succeeds or fails.

- ramoops driver is not differentiating between KMSG_DUMP_KEXEC or
KMSG_DUMP_PANIC. Similiarly mtdoops driver is not differentiating
between two reasons. IOW, to me there does not seem to be any difference
between events KMSG_DUMP_KEXEC and KMSG_DUMP_PANIC.

So we should simply get rid of extra event KMSG_DUMP_KEXEC?

>
>
> > - One can anyway call kmsg_dump() outside crash_kexec() before it so
> > that kmsg_dump notification will go out before kdump gets the control.
> > What I am arguing here is that it is not necessarily a very good idea
> > because external modules can try to do any amount of unsafe actions
> > once we export the hook.
>
> I wrote why I don't think I added new risk. (shortly, It can be a lot of
> another way)
> Can you please tell me your view of my point? I'm afraid I haven't
> understand your worry. So, I hope to understand it before making
> alternative fixing patch.

Once you provide a hook to modules they can try doing anything. As long
as modules keep it really simple, it probably is fine. But what if
they start doing some writes to regular file system/devices in an
attempt to save kernel logs or even some more data and that can lead
to hard lockups or further corruption and possibly resulting in kdump
not even getting a chance to run.

If we keep the kmsg dump driver simple I guess things will be just fine in
most of the cases. So as I said in other mail, I am fine with kmsg
notifications being sent before crash_kexec(). Atleast we can audit in
kernel driver code and bigger worry is vendor specific outside the tree
modules. But I think we can live with that and down the line possibly
provide a tunable to change this behavior if it becomes an issue.

So until and unlesss we have a very clear reason of differentiating
between KEXEC and PANIC event, lets get rid of it. So far I have not
been able to understand what's the difference between two.

Thanks
Vivek

Vivek Goyal

unread,
Feb 4, 2011, 10:10:02 AM2/4/11
to

And I think to compensate for that somebody introduced additional
kmsg_dump(KEXEC) call inside crash_kexec() and put it under CONFIG
option so that one can change the behavior based on config options.

I think this makes the logic somewhat twisted and an unnecessary call
inside crash_kexec(). So until and unless there is a strong reason we
can get rid of KEXEC event and move kmsg_dump call before crash_kexec()
for now and see how does it go, IMHO.

Vivek

Vivek Goyal

unread,
Feb 8, 2011, 11:50:01 AM2/8/11
to
On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
> Hi Eric,
>
> Thank you for your prompt reply.
>
> I would like to consider "Needs in enterprise area" and "Implementation of kmsg_dump()" separately.
>
> (1) Needs in enterprise area
> In case of kdump failure, we would like to store kernel buffer to NVRAM/flush memory
> for detecting root cause of kernel crash.
>
> (2) Implementation of kmsg_dump
> You suggest to review/test cording of kmsg_dump() more.
>
> What do you think about (1)?
> Is it acceptable for you?

Ok, I am just trying to think loud about this problem and see if something
fruitful comes out which paves the way forward.

- So ideally we would like kdump_msg() to be called after crash_kexec() so
that any unaudited (third party modules), unreliable calls do not
compromise the realiability of kdump operation.

But hitachi folks seems to be wanting to save atleast kernel buffers
somwhere in the NVRAM etc because they think that kdump can be
unreliable and we might not capture any information after the crash. So
they kind of want two mechanisms in place. One is light weight which
tries to save kernel buffers in NVRAM and then one heavy weight one
which tries to save the entire/filtered kernel core.

Personally I am not too excited about the idea but I guess I can live
with it. We can try to audit atleast in kernel module and for external
modules we don't have much control and live with the fact that if
modules screw up, we don't capture the dump.

Those who don't want this behavior can do three things.

- Disable kdump_msg() at compile time.
- Do not load any module which registers for kdump_msg()
- Implement a /proc tunable which allows controlling this
behavior.

- Ok, having said why do we want it, comes the question of how to
do it so that it works reasonably well.

- There seems to be on common requirement of kmsg_dump() and kdump()
and that is stop other cpus reliably (use nmi if possible). Can
we try to share this code between kmsg_dump and crash_kexec(). So
something like as follows.

- panic happens
- Do all the activities related to printing panic string and
stack dump.
- Stop other cpus.
- This can be probably be done with the equivalent of
machine_crash_shutdown() function. In fact this function
can probably be broken down in two parts. First part
does shutdown_prepare() where all other cpus are shot
down and second part can do the actual disabling of
LAPIC/IOAPIC and saving cpu registers etc.

if (mutex_trylock(some_shutdown_mutex)) {
/* setp regs, fix vmcoreinfo etc */
crash_kexec_prepare();
machine_shutdown_prepare();
kdump_msg();
crash_kexec_execute()
/* Also call panic_notifier_list here ? */
}

crash_kexec_prepare () {
crash_setup_regs(&fixed_regs, regs);
crash_save_vmcoreinfo();
}

crash_kexec_execute() {
/* Shutdown lapic/ioapic, save this cpu register etc */
machine_shutdown();
machine_kexec()
}

So basically we break down machine_shutdown() function in two parts
and start sharing common part between kdump_msg(), crash_kexec and
possibly panic_notifiers.

If kdump is not configured, then after executing kdump_msg() and panic
notifiers, we should either be sitting in tight loop with interrupt
enabled for somebody to press Ctrl-boot or reboot system upon lapse
of panic_timeout().

Eric, does it make sense to you?

Thanks
Vivek

Eric W. Biederman

unread,
Feb 8, 2011, 12:40:01 PM2/8/11
to
Vivek Goyal <vgo...@redhat.com> writes:

kexec on panic doesn't strictly require that we stop other cpus.

What makes sense to me at this point is for someone on the kmsg_dump
side to make a strong case that the code actually works in a crash dump
scenario. We have lots of experience over the years that says a design
like kmsg_dump is attractive but turns out to be a unreliable piece of
junk that fails just when you need it. Because developers only test
the case when the kernel is happy and because people share code with
the regular path drivers, and that code assumes things are happy.

I forget exactly why but last I looked.
local_irq_disable()
kmsg_dump()
local_irq_enalbe()

Was a recipe for disaster, and you have be at least that good to even
have a chance of working in a crash dump scenario.

In part I am puzzled why the kmsg dump doesn't just use the printk
interface. Strangely enough printk works in the event of a crash and
has been shown to be reliable over the years.

Vivek Goyal

unread,
Feb 8, 2011, 2:30:01 PM2/8/11
to

Yes but it is desirable.

- We don't want cpus to be scribbling on old memory and possibly on
new kernel's memory also in case of corrupted pointer and crash
the freshly booted kernel (New kerenl's memory is mapped in old
kernel)

- We don't want other cpus to complete panic() and jump to BIOS or
lead to some kind of triple fault and reset the system etc.

So that would suggest to be robust, stopping other cpus is required.

On a side note, kdump_msg() hook is present in emergency_reboot() too.
So if these paths are not properly designed, then system might not
even reboot automatically even if panic_timeout() has been specified.

>
> What makes sense to me at this point is for someone on the kmsg_dump
> side to make a strong case that the code actually works in a crash dump
> scenario. We have lots of experience over the years that says a design
> like kmsg_dump is attractive but turns out to be a unreliable piece of
> junk that fails just when you need it. Because developers only test
> the case when the kernel is happy and because people share code with
> the regular path drivers, and that code assumes things are happy.
>
> I forget exactly why but last I looked.
> local_irq_disable()
> kmsg_dump()
> local_irq_enalbe()

I agree that kdump_msg() code should be able to work with interrupts
disabled, atleast.

There seem to be two pieces to it. One is generic call which calls
all the dumpers and then respective dumpers.

Looking at generic dumping call, I can't think why does it need interrupts
to be enabled. There is one spin_lock() and then rcu_read_lock(). That's
it.

Regarding mtdoops, it is hard to tell. There is lot of code. But the good
thing is that they have introduced a separate write path for panic context.
That way atleast one can do special casing in panic path to avoid
taking locks and not be dependent on interrupts.

ramoops seems to be simple. It seems to be just memcpy() except
do_gettimeofday(). I noticed that in the past you raised concern about usage
of do_gettimeofday(), but I am not sure what is the concern here (silly
question i guess).

So to me, ramoops seems to be simple memcpy (atleast in principle) and
mtdoops has a dedicated path for handling panic context. So atleast
it is fixable for possible issues. Generic code seems harmless to me
at this point of time.



>
> Was a recipe for disaster, and you have be at least that good to even
> have a chance of working in a crash dump scenario.
>
> In part I am puzzled why the kmsg dump doesn't just use the printk
> interface. Strangely enough printk works in the event of a crash and
> has been shown to be reliable over the years.

Are you suggesting implementing these things as console driver and
register with printk as console? Sounds interesting. I think one of
the issues they probably will find that they don't want to log everything.
They want to do selective logging. Not sure how would they get this
info.

In some cases like emergency_restart(), there are no printk() and they
just consider it one of the events to dump the kernel buffer contents.

Thanks
Vivek

Eric W. Biederman

unread,
Feb 8, 2011, 3:00:01 PM2/8/11
to
Vivek Goyal <vgo...@redhat.com> writes:

Ouch! Ouch! Ouch!

emergency_restart is just supposed to be the restart bits, so
that it always works! We wouldn't need to use emergency restart
if we could trust the normal hardware shutdown code. Sigh.

I guess this is another issue I have with the design of kmsg_dump.
Instead of being a good citizen like everyone else and placing their
hooks explicitly where they can be seen. The kmsg_dump hooks were
buried in other routines changing the semantics of those routines.

What makes this scary is the modified routines are deliberately simple
so they are robust and can be audited and now on those code paths we
have the hooks that are so big everyone gives up before reading and
understanding all of the code involved.

>> What makes sense to me at this point is for someone on the kmsg_dump
>> side to make a strong case that the code actually works in a crash dump
>> scenario. We have lots of experience over the years that says a design
>> like kmsg_dump is attractive but turns out to be a unreliable piece of
>> junk that fails just when you need it. Because developers only test
>> the case when the kernel is happy and because people share code with
>> the regular path drivers, and that code assumes things are happy.
>>
>> I forget exactly why but last I looked.
>> local_irq_disable()
>> kmsg_dump()
>> local_irq_enalbe()
>
> I agree that kdump_msg() code should be able to work with interrupts
> disabled, atleast.
>
> There seem to be two pieces to it. One is generic call which calls
> all the dumpers and then respective dumpers.
>
> Looking at generic dumping call, I can't think why does it need interrupts
> to be enabled. There is one spin_lock() and then rcu_read_lock(). That's
> it.
>
> Regarding mtdoops, it is hard to tell. There is lot of code. But the good
> thing is that they have introduced a separate write path for panic context.
> That way atleast one can do special casing in panic path to avoid
> taking locks and not be dependent on interrupts.

I think that is part of my problem. There is a lot of code so it is
really hard to audit, and really hard to trust. I'm still leary with
the amount of code on the kexec on panic code path.

> ramoops seems to be simple. It seems to be just memcpy() except
> do_gettimeofday(). I noticed that in the past you raised concern about usage
> of do_gettimeofday(), but I am not sure what is the concern here (silly
> question i guess).
>
> So to me, ramoops seems to be simple memcpy (atleast in principle) and
> mtdoops has a dedicated path for handling panic context. So atleast
> it is fixable for possible issues. Generic code seems harmless to me
> at this point of time.
>
>>
>> Was a recipe for disaster, and you have be at least that good to even
>> have a chance of working in a crash dump scenario.
>>
>> In part I am puzzled why the kmsg dump doesn't just use the printk
>> interface. Strangely enough printk works in the event of a crash and
>> has been shown to be reliable over the years.
>
> Are you suggesting implementing these things as console driver and
> register with printk as console? Sounds interesting. I think one of
> the issues they probably will find that they don't want to log everything.
> They want to do selective logging. Not sure how would they get this
> info.
>
> In some cases like emergency_restart(), there are no printk() and they
> just consider it one of the events to dump the kernel buffer contents.

Sorry. I am tired and I'm really not ready to deal with yet another
mechanism like this, that is over designed and was flaky the first time
I looked at it. That flakiness combined with the authors not stepping
forward and rushing off to fix things or telling me they have fixed
things, means that I don't have any confidence in the mechanism.

Vivek if you want to step forward and be one of the people taking care
of kmsg_dump, then this conversation might have a point.

What I know for certain is that kmsg_dump and kexec on panic are
used in pretty exclusive ways, so getting the kmsg_dump code out
of crash_kexec does not hurt anyone today.

Right now I can't even get 2.6.38-rc4 to compile a reasonably sized
program, so I see a lot worse regressions to deal with than something
I can just avoid for now.

Eric

KOSAKI Motohiro

unread,
Mar 7, 2011, 8:40:02 PM3/7/11
to
I'm sorry I've missed this mail long time.

> > ---------------------------------------------------------------------
> > @@ -74,6 +75,7 @@ NORET_TYPE void panic(const char * fmt, ...)
> > dump_stack();
> > #endif
> >
> > + kmsg_dump(KMSG_DUMP_PANIC);
> > /*
> > * If we have crashed and we have a crash kernel loaded let it handle
> > * everything else.
> > * Do we want to call this before we try to display a message?
> > */
> > crash_kexec(NULL);
> > ---------------------------------------------------------------------
>
> And I think to compensate for that somebody introduced additional
> kmsg_dump(KEXEC) call inside crash_kexec() and put it under CONFIG
> option so that one can change the behavior based on config options.
>
> I think this makes the logic somewhat twisted and an unnecessary call
> inside crash_kexec(). So until and unless there is a strong reason we
> can get rid of KEXEC event and move kmsg_dump call before crash_kexec()
> for now and see how does it go, IMHO.

I think I can agree your proposal. But could you please explain why do
you think kmsg _before_ kdump and kmsg _in_ kdump are so different?
I think it is only C level difference. CPU don't care C function and
anyway the kernel call kmsg_dump() because invoke second kernel even
if you proposal applied.

It is only curious. I'm not against your proposal.

Thanks.

Andrew Morton

unread,
May 26, 2011, 4:20:01 PM5/26/11
to
On Thu, 3 Feb 2011 13:53:01 +0900 (JST)
KOSAKI Motohiro <kosaki....@jp.fujitsu.com> wrote:

> > > I wrote why this is no good idea by another mail. Please see it.
> > > Anyway you have a right to don't use this feature.
> > >
> >
> > But you have not explained that why do you need to hook into crash_kexec()
> > and you have also not explained why do you need to send out kdump_msg()
> > notification if kdump is configured.
> >
> > Some detailed explanation here would help.
>
> Hi,
>
> I send it you now :)
>

What happened with this? kexec-remove-kmsg_dump_kexec.patch has two acks
and one unexplained nack :(

Eric W. Biederman

unread,
May 27, 2011, 9:50:01 PM5/27/11
to
Andrew Morton <ak...@linux-foundation.org> writes:

> On Thu, 3 Feb 2011 13:53:01 +0900 (JST)
> KOSAKI Motohiro <kosaki....@jp.fujitsu.com> wrote:
>
>> > > I wrote why this is no good idea by another mail. Please see it.
>> > > Anyway you have a right to don't use this feature.
>> > >
>> >
>> > But you have not explained that why do you need to hook into crash_kexec()
>> > and you have also not explained why do you need to send out kdump_msg()
>> > notification if kdump is configured.
>> >
>> > Some detailed explanation here would help.
>>
>> Hi,
>>
>> I send it you now :)
>>
>
> What happened with this? kexec-remove-kmsg_dump_kexec.patch has two acks
> and one unexplained nack :(

As I recall the nack was based on a theoretical use case, and a cleanup
of kmsg_dump to make it more robust which I don't believe has happened,
instead of something real.

My feel is that we should remove kmsg_dump and if a real use case comes
up reconsider it at that time.

I don't think anyone cares too strongly at the moment because the
features are not expected to be used in conjunction with each other, nor
even expected to be compiled into the same kernel. However given that
they are not used to be used in conjunction with each other a call into
kmsg_dump from crash_kexec is really just cluttering the code for no
benefit to anyone.

I do believe kmsg_dump suffers from all of the same problems lkdtm
suffered from long ago which is it only works in a working kernel,
and it is unlikely to tell you anything on system failure.

Eric

KOSAKI Motohiro

unread,
May 30, 2011, 1:20:02 AM5/30/11
to
(2011/05/27 5:10), Andrew Morton wrote:
> On Thu, 3 Feb 2011 13:53:01 +0900 (JST)
> KOSAKI Motohiro <kosaki....@jp.fujitsu.com> wrote:
>
>>>> I wrote why this is no good idea by another mail. Please see it.
>>>> Anyway you have a right to don't use this feature.
>>>>
>>>
>>> But you have not explained that why do you need to hook into crash_kexec()
>>> and you have also not explained why do you need to send out kdump_msg()
>>> notification if kdump is configured.
>>>
>>> Some detailed explanation here would help.
>>
>> Hi,
>>
>> I send it you now :)
>>
>
> What happened with this? kexec-remove-kmsg_dump_kexec.patch has two acks
> and one unexplained nack :(

http://groups.google.com/group/linux.kernel/browse_thread/thread/1084f406573d76ac/ee19e34b45f83536?lnk=raot&pli=1

At last mail, Vivek proposed move kms_dump() instead remove. and I asked following question and
I've got no response. I'm still waiting his.

Américo Wang

unread,
May 30, 2011, 3:40:02 AM5/30/11
to

Seiji once proposed NVRAM which uses kmsg_dump, he wants
to have another way to store panic information when kdump doesn't
work which is _not_ rare in the real world.

Seiji Aguchi

unread,
May 31, 2011, 5:00:02 PM5/31/11
to
Hi,

As Americo said, sometimes kdump doesn't work in the real world.

So, I would like to keep discussing implementation of kmsg_dump(KMSG_DUMP_KEXEC)
rather than just removing it.

Seiji

Vivek Goyal

unread,
May 31, 2011, 5:40:02 PM5/31/11
to
On Tue, May 31, 2011 at 04:58:00PM -0400, Seiji Aguchi wrote:
> Hi,
>
> As Americo said, sometimes kdump doesn't work in the real world.
>
> So, I would like to keep discussing implementation of kmsg_dump(KMSG_DUMP_KEXEC)
> rather than just removing it.

What are you using kmsg_dump() for? Using mtdoops, ramoops or something
else? Is it working reliably for you?

Thanks
Vivek

Vivek Goyal

unread,
May 31, 2011, 6:00:02 PM5/31/11
to

Few reasons.

- There is no correlation between crash_kexec() and kdump_msg(). What
you are creating is equivalent of panic notifiers and calling those
notifiers before dump happened. So calling it inside of crash_kexec()
does not make much sense from code point of view.

- Why does somebody need to keep track of event KMSG_DUMP_KEXEC?

- There is one kernel CONFIG option introduce which looks completely
superfluous.

My general take on the whole issue.

- In general I think exporting a hook to module so that they can do
anything before crash is a bad idea. Now this can be overloaded to
do things like sending crash notifications in clustered environement
where we recommend doing it in second kernel.

- Even if we really have to do it, there seemed to be two concern
areas.

- Reliability of kdump_msg() generic infrastructure and its
capability in terms of handling races with other cpus and
NMIs.

- Reliability of module which is getting the callback from
kdump_msg().

I think in one of the mails I was discussing that common infrastructure
between kdump and kmsg_dump() can be put in a separate function, like
stopping all cpus etc to avoid races in generic infrastrucutre and
then first we can all kmsg_dump() and then crash_kexec().

But this still does not provide us any protection against modules getting
control after crash and possiblly worsen the situation.

Thanks
Vivek

Seiji Aguchi

unread,
May 31, 2011, 6:30:02 PM5/31/11
to
Hi,

>What are you using kmsg_dump() for? Using mtdoops, ramoops or something
>else? Is it working reliably for you?

I plan to use kmsg_dump() for set_variable service of UEFI.
I proposed a prototype patch this month and will improve it.
(kmsg_dump is used inside pstore.)

https://lkml.org/lkml/2011/5/10/340

Seiji

Eric W. Biederman

unread,
Jun 1, 2011, 11:30:02 PM6/1/11
to
Seiji Aguchi <seiji....@hds.com> writes:

> Hi,
>
>>What are you using kmsg_dump() for? Using mtdoops, ramoops or something
>>else? Is it working reliably for you?
>
> I plan to use kmsg_dump() for set_variable service of UEFI.
> I proposed a prototype patch this month and will improve it.
> (kmsg_dump is used inside pstore.)
>
> https://lkml.org/lkml/2011/5/10/340

Shudder. Firmware calls in the crash path.

If that is the use, we need to remove the kmsg_dump(KMSG_DUMP_KEXEC)
hook from crash_kexec yesterday. It is leading to some really ludicrous
suggestions that are on the way from making kexec on panic unreliable
and useless.

There will always be EFI implementations where that will not work and
there will be no way we can fix those.

There is a long history of people trying to do things in a crashing
kernel, things that simply do not work when the system is in a bad
state. kmsg_dump() when I reviewed the code had significant
implementation problems for being called from interrupt handlers
and the like.

To introduce a different solution for capturing information when a
kernel crashes we need to see numbers that in a large number of
situations that the mechanism you are proposing is more reliable and/or
more maintainable than the current kexec on panic implementation.

The best work I know of on the reliability of the current situation
is "Evaluating Linux Kernel Crash Dumping Mechanisms", by Fernando Luis Vazquez Cao.
http://www.linuxsymposium.org/archives/OLS/Reprints-2006/cao-reprint.pdf


Now it does happen to be a fact that our efi support in linux is
so buggy kexec does not work let alone kexec on panic (if the target
kernel has any efi support). But our efi support being buggy is not
a reason to add more ways to fail when we have a kernel with efi
support. It is an argument to remove our excessive use of EFI
calls.

So let's just remove the ridiculous kmsg_dump(KMSG_DUMP_KEXEC) hook from
crash_kexec and remove any temptation for abuses like wanting to use
kmsg_dump() on anything but a deeply embedded system where there simply
is not enough memory for 2 kernels.

Eric

Andrew Morton

unread,
Jun 7, 2011, 8:10:02 PM6/7/11
to

So am I allowed to merge kexec-remove-kmsg_dump_kexec.patch yet?


From: WANG Cong <xiyou.w...@gmail.com>

KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
/proc/vmcore, and it is unsafe to allow modules to do other stuffs in a
crash dump scenario.

[ak...@linux-foundation.org: fix powerpc build]
Signed-off-by: WANG Cong <xiyou.w...@gmail.com>
Reported-by: Vivek Goyal <vgo...@redhat.com>
Acked-by: Vivek Goyal <vgo...@redhat.com>
Acked-by: Jarod Wilson <ja...@redhat.com>
Cc: "Eric W. Biederman" <ebie...@xmission.com>
Cc: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
---

arch/powerpc/platforms/pseries/nvram.c | 1 -
drivers/char/ramoops.c | 3 +--
drivers/mtd/mtdoops.c | 3 +--
include/linux/kmsg_dump.h | 1 -
kernel/kexec.c | 3 ---
5 files changed, 2 insertions(+), 9 deletions(-)

diff -puN drivers/char/ramoops.c~kexec-remove-kmsg_dump_kexec drivers/char/ramoops.c
--- a/drivers/char/ramoops.c~kexec-remove-kmsg_dump_kexec
+++ a/drivers/char/ramoops.c
@@ -69,8 +69,7 @@ static void ramoops_do_dump(struct kmsg_
struct timeval timestamp;

if (reason != KMSG_DUMP_OOPS &&
- reason != KMSG_DUMP_PANIC &&
- reason != KMSG_DUMP_KEXEC)
+ reason != KMSG_DUMP_PANIC)
return;

/* Only dump oopses if dump_oops is set */
diff -puN drivers/mtd/mtdoops.c~kexec-remove-kmsg_dump_kexec drivers/mtd/mtdoops.c
--- a/drivers/mtd/mtdoops.c~kexec-remove-kmsg_dump_kexec
+++ a/drivers/mtd/mtdoops.c
@@ -308,8 +308,7 @@ static void mtdoops_do_dump(struct kmsg_
char *dst;

if (reason != KMSG_DUMP_OOPS &&
- reason != KMSG_DUMP_PANIC &&
- reason != KMSG_DUMP_KEXEC)
+ reason != KMSG_DUMP_PANIC)
return;

/* Only dump oopses if dump_oops is set */
diff -puN include/linux/kmsg_dump.h~kexec-remove-kmsg_dump_kexec include/linux/kmsg_dump.h
--- a/include/linux/kmsg_dump.h~kexec-remove-kmsg_dump_kexec
+++ a/include/linux/kmsg_dump.h
@@ -18,7 +18,6 @@
enum kmsg_dump_reason {
KMSG_DUMP_OOPS,
KMSG_DUMP_PANIC,
- KMSG_DUMP_KEXEC,
KMSG_DUMP_RESTART,
KMSG_DUMP_HALT,
KMSG_DUMP_POWEROFF,
diff -puN kernel/kexec.c~kexec-remove-kmsg_dump_kexec kernel/kexec.c
--- a/kernel/kexec.c~kexec-remove-kmsg_dump_kexec
+++ a/kernel/kexec.c
@@ -32,7 +32,6 @@
#include <linux/console.h>
#include <linux/vmalloc.h>
#include <linux/swap.h>
-#include <linux/kmsg_dump.h>
#include <linux/syscore_ops.h>

#include <asm/page.h>
@@ -1079,8 +1078,6 @@ void crash_kexec(struct pt_regs *regs)


if (kexec_crash_image) {
struct pt_regs fixed_regs;

- kmsg_dump(KMSG_DUMP_KEXEC);
-
crash_setup_regs(&fixed_regs, regs);
crash_save_vmcoreinfo();
machine_crash_shutdown(&fixed_regs);
diff -puN arch/powerpc/platforms/pseries/nvram.c~kexec-remove-kmsg_dump_kexec arch/powerpc/platforms/pseries/nvram.c
--- a/arch/powerpc/platforms/pseries/nvram.c~kexec-remove-kmsg_dump_kexec
+++ a/arch/powerpc/platforms/pseries/nvram.c
@@ -490,7 +490,6 @@ static void oops_to_nvram(struct kmsg_du
/* These are almost always orderly shutdowns. */
return;
case KMSG_DUMP_OOPS:
- case KMSG_DUMP_KEXEC:
break;
case KMSG_DUMP_PANIC:
panicking = true;
_

KOSAKI Motohiro

unread,
Jun 9, 2011, 7:10:02 AM6/9/11
to
Hi

Sorry for the delay. I had got stuck LinuxCon Japan and piled up plenty
paper works.

>>> I think I can agree your proposal. But could you please explain why do
>>> you think kmsg _before_ kdump and kmsg _in_ kdump are so different?
>>> I think it is only C level difference. CPU don't care C function and
>>> anyway the kernel call kmsg_dump() because invoke second kernel even
>>> if you proposal applied.
>>> It is only curious. I'm not against your proposal.
>>> Thanks.
>
> Few reasons.
>
> - There is no correlation between crash_kexec() and kdump_msg(). What
> you are creating is equivalent of panic notifiers and calling those
> notifiers before dump happened. So calling it inside of crash_kexec()
> does not make much sense from code point of view.

Thank you for the replay. I got you _think_ no makes sense, but I haven't
explain what you talk about the code of "code point of view".
If you read the code, you can understand kdump_msg() and panic_notifiers
are not same point.


> - Why does somebody need to keep track of event KMSG_DUMP_KEXEC?

I believe I answered already at last thread.

http://groups.google.com/group/linux.kernel/browse_thread/thread/1084f406573d76ac/daa1a08804385089?q=kexec%3A+remove+KMSG_DUMP_KEXEC&lnk=ol&


> - There is one kernel CONFIG option introduce which looks completely
> superfluous.

What you mean "superfluous"? We already have billion kernel CONFIG.
Is it also bad?

> My general take on the whole issue.
>
> - In general I think exporting a hook to module so that they can do
> anything before crash is a bad idea. Now this can be overloaded to
> do things like sending crash notifications in clustered environement
> where we recommend doing it in second kernel.

??
It's not my issue and I haven't talked about such thing. I guess you
confuse I and Aguch-san or someone else.

>
> - Even if we really have to do it, there seemed to be two concern
> areas.
>
> - Reliability of kdump_msg() generic infrastructure and its
> capability in terms of handling races with other cpus and
> NMIs.
>
> - Reliability of module which is getting the callback from
> kdump_msg().

Indeed. I thought Aguch-san said he promised he work on improve them.
However it doesn't happen yet. Okay, I


> I think in one of the mails I was discussing that common infrastructure
> between kdump and kmsg_dump() can be put in a separate function, like
> stopping all cpus etc to avoid races in generic infrastrucutre and
> then first we can all kmsg_dump() and then crash_kexec().

Nice idea! Yes. I didn't think enterprise folks start to use this feature
and it now happen.
If nobody are working on this, I'll do it.


> But this still does not provide us any protection against modules getting
> control after crash and possiblly worsen the situation.

I think this doesn't big matter. If module author hope to get hook, they
can use kprobe in nowadays. I don't think we can make perfect kprobe protection.
I think I wrote this at last thread too.

Probably most reliability stupid module detect way is, watching lkml and revewing
kmsg_dump() user conteniously. However, if you strongly worry about this issue,
I can agree we make tiny foolish module protection. (but I don't have concrete
idea yet)

KOSAKI Motohiro

unread,
Jun 9, 2011, 7:20:02 AM6/9/11
to
Hi

> Seiji Aguchi <seiji....@hds.com> writes:
>
>> Hi,
>>
>>> What are you using kmsg_dump() for? Using mtdoops, ramoops or something
>>> else? Is it working reliably for you?
>>
>> I plan to use kmsg_dump() for set_variable service of UEFI.
>> I proposed a prototype patch this month and will improve it.
>> (kmsg_dump is used inside pstore.)
>>
>> https://lkml.org/lkml/2011/5/10/340
>
> Shudder. Firmware calls in the crash path.
>
> If that is the use, we need to remove the kmsg_dump(KMSG_DUMP_KEXEC)
> hook from crash_kexec yesterday. It is leading to some really ludicrous
> suggestions that are on the way from making kexec on panic unreliable
> and useless.

Do you have concrete example?

If you only talked about theoretical issue, probably making boot parameter
is enough and reasonable way.


> There will always be EFI implementations where that will not work and
> there will be no way we can fix those.
>
> There is a long history of people trying to do things in a crashing
> kernel, things that simply do not work when the system is in a bad
> state. kmsg_dump() when I reviewed the code had significant
> implementation problems for being called from interrupt handlers
> and the like.
>
> To introduce a different solution for capturing information when a
> kernel crashes we need to see numbers that in a large number of
> situations that the mechanism you are proposing is more reliable and/or
> more maintainable than the current kexec on panic implementation.
>
> The best work I know of on the reliability of the current situation
> is "Evaluating Linux Kernel Crash Dumping Mechanisms", by Fernando Luis Vazquez Cao.
> http://www.linuxsymposium.org/archives/OLS/Reprints-2006/cao-reprint.pdf

Every reliability improvement idea is welcome! This also improve embedded too.


> Now it does happen to be a fact that our efi support in linux is
> so buggy kexec does not work let alone kexec on panic (if the target
> kernel has any efi support). But our efi support being buggy is not
> a reason to add more ways to fail when we have a kernel with efi
> support. It is an argument to remove our excessive use of EFI
> calls.

Which part is buggy? As far as I know, IBM, HP and Fujitsu have EFI supported
server product and it works.

So, if you are suffered from buggy efi, I think the best way is to make config
option and you disable it and Aguch-san enable it.


> So let's just remove the ridiculous kmsg_dump(KMSG_DUMP_KEXEC) hook from
> crash_kexec and remove any temptation for abuses like wanting to use
> kmsg_dump() on anything but a deeply embedded system where there simply
> is not enough memory for 2 kernels.

This sentence has two misunderstands. 1) modern embedded (e.g. Android) has
lots memory rather than 10 years past PC 2) they often don't need full feature
second kernel. they often don't need full crash dump.

Thanks.

Vivek Goyal

unread,
Jun 14, 2011, 6:20:02 PM6/14/11
to
On Thu, Jun 09, 2011 at 08:00:08PM +0900, KOSAKI Motohiro wrote:
> Hi
>
> Sorry for the delay. I had got stuck LinuxCon Japan and piled up plenty
> paper works.
>
> >>> I think I can agree your proposal. But could you please explain why do
> >>> you think kmsg _before_ kdump and kmsg _in_ kdump are so different?
> >>> I think it is only C level difference. CPU don't care C function and
> >>> anyway the kernel call kmsg_dump() because invoke second kernel even
> >>> if you proposal applied.
> >>> It is only curious. I'm not against your proposal.
> >>> Thanks.
> >
> > Few reasons.
> >
> > - There is no correlation between crash_kexec() and kdump_msg(). What
> > you are creating is equivalent of panic notifiers and calling those
> > notifiers before dump happened. So calling it inside of crash_kexec()
> > does not make much sense from code point of view.
>
> Thank you for the replay. I got you _think_ no makes sense, but I haven't
> explain what you talk about the code of "code point of view".
> If you read the code, you can understand kdump_msg() and panic_notifiers
> are not same point.
>
>
> > - Why does somebody need to keep track of event KMSG_DUMP_KEXEC?
>
> I believe I answered already at last thread.
>
> http://groups.google.com/group/linux.kernel/browse_thread/thread/1084f406573d76ac/daa1a08804385089?q=kexec%3A+remove+KMSG_DUMP_KEXEC&lnk=ol&
>

Frankly speaking I never understood it. The only thing I got is that
you are saying embedded devices want to do something upon a KEXEC and
I do not understand what's that action embedded devices want to take
upon an KEXEC.

And it is not just KEXEC, I am also curious about hooks in places
like emergency_restart(). Who needs to know about it and is it
safe to do.

So if would be great if you could explain it in detail again.

>
> > - There is one kernel CONFIG option introduce which looks completely
> > superfluous.
>
> What you mean "superfluous"? We already have billion kernel CONFIG.
> Is it also bad?

We have lots of config options but every config option goes through
some thought process and it is taken in if it makes sense. In this
case this additional config option did not make any sense.

>
> > My general take on the whole issue.
> >
> > - In general I think exporting a hook to module so that they can do
> > anything before crash is a bad idea. Now this can be overloaded to
> > do things like sending crash notifications in clustered environement
> > where we recommend doing it in second kernel.
>
> ??
> It's not my issue and I haven't talked about such thing. I guess you
> confuse I and Aguch-san or someone else.

Once you export the hook to a module, anybody can do that. In the
past people have asked for it reapeatedly. So I am just giving you
one example that what will people start doing once the hook is
there.

>
> >
> > - Even if we really have to do it, there seemed to be two concern
> > areas.
> >
> > - Reliability of kdump_msg() generic infrastructure and its
> > capability in terms of handling races with other cpus and
> > NMIs.
> >
> > - Reliability of module which is getting the callback from
> > kdump_msg().
>
> Indeed. I thought Aguch-san said he promised he work on improve them.
> However it doesn't happen yet. Okay, I
>
>
> > I think in one of the mails I was discussing that common infrastructure
> > between kdump and kmsg_dump() can be put in a separate function, like
> > stopping all cpus etc to avoid races in generic infrastrucutre and
> > then first we can all kmsg_dump() and then crash_kexec().
>
> Nice idea! Yes. I didn't think enterprise folks start to use this feature
> and it now happen.
> If nobody are working on this, I'll do it.

It would be great if you could work on it and make sure kdump_msg()
and crash_kexec() could share common infrastructure which takes care
of common actions like stopping cpus, taking care of NMIs and invocation
of panic() on mutliple cpus etc.

>
>
> > But this still does not provide us any protection against modules getting
> > control after crash and possiblly worsen the situation.
>
> I think this doesn't big matter. If module author hope to get hook, they
> can use kprobe in nowadays. I don't think we can make perfect kprobe protection.
> I think I wrote this at last thread too.
>
> Probably most reliability stupid module detect way is, watching lkml and revewing
> kmsg_dump() user conteniously. However, if you strongly worry about this issue,
> I can agree we make tiny foolish module protection. (but I don't have concrete
> idea yet)

I do worry about modules. Once the system has paniced, I personally don't
think that anbody and everybody should be able to look at that event and
take whatever actions they want to.

Having said that personally I like the idea of being able to save
backtrace on some non volatile storage and access it over next boot
through pstore interface.

So if care is taken to make kmsg_dump() generic infrastructure fool proof
probably it is good start and then we can look into module thing later.

Thanks
Vivek

0 new messages