Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH v22] edac, ras/hw_event.h: use events to handle hw issues

8 views
Skip to first unread message

Mauro Carvalho Chehab

unread,
May 10, 2012, 3:56:47 PM5/10/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Add a new tracepoint-based hardware events report method for
outputing Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

We have several subsystems & methods for reporting hardware errors:

1) EDAC ("Error Detection and Correction"). In its original form
this consisted of a platform specific driver that read topology
information and error counts from chipset registers and reported
the results via a sysfs interface.

2) mcelog - x86 specific decoding of machine check bank registers
reporting in binary form via /dev/mcelog. Recent additions make use
of the APEI extensions that were documented in version 4.0a of the
ACPI specification to acquire more information about errors without
having to rely reading chipset registers directly. A user level
programs decodes into somewhat human readable format.

3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
decodes errors reported via machine check bank registers in AMD
processors to the console log using printk();

Each of these mechanisms has a band of followers ... and none
of them appear to meet all the needs of all users.

As part of a hardware event subsystem, let's encapsulate the memory
error hardware events into a trace facility.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <aroz...@redhat.com>
Cc: Doug Thompson <nor...@yahoo.com>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>
---
drivers/edac/edac_core.h | 2 +-
drivers/edac/edac_mc.c | 25 +++++++++++----
include/ras/hw_event.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 96 insertions(+), 8 deletions(-)
create mode 100644 include/ras/hw_event.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog);
+ const void *arch_log);

/*
* edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..11f0178 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
#include "edac_core.h"
#include "edac_module.h"

+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/hw_event.h>
+
/* lock to memory controller's control array */
static DEFINE_MUTEX(mem_ctls_mutex);
static LIST_HEAD(mc_devices);
@@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
* which will perform kobj unregistration and the actual free
* will occur during the kobject callback operation
*/
+
return mci;
}
EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -982,7 +987,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog)
+ const void *arch_log)
{
/* FIXME: too much for stack: move it to some pre-alocated area */
char detail[80], location[80];
@@ -1119,21 +1124,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
}

/* Memory type dependent details about the error */
- if (type == HW_EVENT_ERR_CORRECTED) {
+ if (type == HW_EVENT_ERR_CORRECTED)
snprintf(detail, sizeof(detail),
"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
page_frame_number, offset_in_page,
grain, syndrome);
- edac_ce_error(mci, pos, msg, location, label, detail,
- other_detail, enable_per_layer_report,
- page_frame_number, offset_in_page, grain);
- } else {
+ else
snprintf(detail, sizeof(detail),
"page:0x%lx offset:0x%lx grain:%d",
page_frame_number, offset_in_page, grain);

+ /* Report the error via the trace interface */
+ trace_mc_error(type, mci->mc_idx, msg, label, location,
+ detail, other_detail);
+
+ /* Report the error via the edac_mc_printk() interface */
+ if (type == HW_EVENT_ERR_CORRECTED)
+ edac_ce_error(mci, pos, msg, location, label, detail,
+ other_detail, enable_per_layer_report,
+ page_frame_number, offset_in_page, grain);
+ else
edac_ue_error(mci, pos, msg, location, label, detail,
other_detail, enable_per_layer_report);
- }
}
EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/include/ras/hw_event.h b/include/ras/hw_event.h
new file mode 100644
index 0000000..66981ef
--- /dev/null
+++ b/include/ras/hw_event.h
@@ -0,0 +1,77 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hw_event
+
+#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_MC_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+#include <linux/ktime.h>
+
+/*
+ * Hardware Events Report
+ *
+ * Those events are generated when hardware detected a corrected or
+ * uncorrected event, and are meant to replace the current API to report
+ * errors defined on both EDAC and MCE subsystems.
+ *
+ * FIXME: Add events for handling memory errors originated from the
+ * MCE subsystem.
+ */
+
+/*
+ * Hardware-independent Memory Controller specific events
+ */
+
+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_error,
+
+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *msg,
+ const char *label,
+ const char *location,
+ const char *detail,
+ const char *driver_detail),
+
+ TP_ARGS(err_type, mc_index, msg, label, location,
+ detail, driver_detail),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, err_type )
+ __field( unsigned int, mc_index )
+ __string( msg, msg )
+ __string( label, label )
+ __string( detail, detail )
+ __string( location, location )
+ __string( driver_detail, driver_detail )
+ ),
+
+ TP_fast_assign(
+ __entry->err_type = err_type;
+ __entry->mc_index = mc_index;
+ __assign_str(msg, msg);
+ __assign_str(label, label);
+ __assign_str(location, location);
+ __assign_str(detail, detail);
+ __assign_str(driver_detail, driver_detail);
+ ),
+
+ TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
+ __entry->mc_index,
+ (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+ ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+ "Fatal" : "Uncorrected"),
+ __get_str(msg),
+ __get_str(label),
+ __get_str(location),
+ __get_str(detail),
+ __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.7.8

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Borislav Petkov

unread,
May 10, 2012, 4:41:03 PM5/10/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Tony Luck
It should be:

Subject: RAS: Add a tracepoint for reporting memory controller errors

and not

Subject: edac, ras/hw_event.h: use events to handle hw issues

because events don't handle hw issues.

On Thu, May 10, 2012 at 04:56:28PM -0300, Mauro Carvalho Chehab wrote:
> Add a new tracepoint-based hardware events report method for
> outputing Memory Controller events.

reporting
ras
This still says "mce" and it should say "MC" or "mem_ctl" or similar.

> + __entry->mc_index,
> + (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> + ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> + "Fatal" : "Uncorrected"),
> + __get_str(msg),
> + __get_str(label),
> + __get_str(location),
> + __get_str(detail),
> + __get_str(driver_detail))
> +);
> +
> +#endif /* _TRACE_HW_EVENT_MC_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>

I'm assuming you have all those changes on the experimental branch so
that I can continue reviewing from there?

@Tony: this is adding a RAS tracepoint for memory controller errors
coming from EDAC (for now), any objections/issues?

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 10, 2012, 4:55:52 PM5/10/12
to Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar, Tony Luck
Then, the header name should be called as "ras.h", otherwise an error happens:

In file included from include/ras/hw_event.h:77:0,
from drivers/edac/edac_mc.c:38:
include/trace/define_trace.h:79:43: fatal error: ../../include/ras/ras.h: Arquivo ou diretório não encontrado
Yes.

>
> @Tony: this is adding a RAS tracepoint for memory controller errors
> coming from EDAC (for now), any objections/issues?
>
> Thanks.
>

Thanks,
Mauro

Mauro Carvalho Chehab

unread,
May 10, 2012, 5:00:47 PM5/10/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

We have several subsystems & methods for reporting hardware errors:

1) EDAC ("Error Detection and Correction"). In its original form
this consisted of a platform specific driver that read topology
information and error counts from chipset registers and reported
the results via a sysfs interface.

2) mcelog - x86 specific decoding of machine check bank registers
reporting in binary form via /dev/mcelog. Recent additions make use
of the APEI extensions that were documented in version 4.0a of the
ACPI specification to acquire more information about errors without
having to rely reading chipset registers directly. A user level
programs decodes into somewhat human readable format.

3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
decodes errors reported via machine check bank registers in AMD
processors to the console log using printk();

Each of these mechanisms has a band of followers ... and none
of them appear to meet all the needs of all users.

As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <aroz...@redhat.com>
Cc: Doug Thompson <nor...@yahoo.com>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>
---
drivers/edac/edac_core.h | 2 +-
drivers/edac/edac_mc.c | 25 +++++++++++----
include/ras/ras.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 96 insertions(+), 8 deletions(-)
create mode 100644 include/ras/ras.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog);
+ const void *arch_log);

/*
* edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..7493adb 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
#include "edac_core.h"
#include "edac_module.h"

+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras.h>
diff --git a/include/ras/ras.h b/include/ras/ras.h
new file mode 100644
index 0000000..13ea4ee
--- /dev/null
+++ b/include/ras/ras.h
@@ -0,0 +1,77 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+ TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
+ __entry->mc_index,
+ (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+ ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+ "Fatal" : "Uncorrected"),
+ __get_str(msg),
+ __get_str(label),
+ __get_str(location),
+ __get_str(detail),
+ __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.7.8

Luck, Tony

unread,
May 10, 2012, 5:10:49 PM5/10/12
to Borislav Petkov, Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
>> + TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
>
> This still says "mce" and it should say "MC" or "mem_ctl" or similar.

I'm trying to look at how this will look to an end user who is not intimately
acquainted with the internals of how memory subsystems work.

Whether the string starts with "mce" or "MC" or whatever ... what will the
user do with the mc_index that is printed with that first %d? I don't think
it helps them find the DIMM when they open the box. I suppose it is useful
if there are multiple messages ... and they see that the same memory controller
is mentioned in each. But I almost think it belongs inside the parentheses at
the end as the "low level details that most users won't need to care about.

Next %s is "Corrected" or "Fatal" or "Uncorrected" ... that's good.

What are the options for the next "%s" (msg)?

"memory stick"?? I suppose "DIMM" is a bit implementation dependent (SIMMs
are long gone ... but perhaps there will be some new acronym for stacked
memory ... STIMS :-) )

Then label (from SMBIOS) ... then the internal details. Good.

-Tony

Mauro Carvalho Chehab

unread,
May 10, 2012, 6:08:16 PM5/10/12
to Luck, Tony, Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 10-05-2012 18:10, Luck, Tony escreveu:
>>> + TP_printk(HW_ERR "mce#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
>>
>> This still says "mce" and it should say "MC" or "mem_ctl" or similar.
>
> I'm trying to look at how this will look to an end user who is not intimately
> acquainted with the internals of how memory subsystems work.

This is what patch v23 prints on sb_edac:

# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 30/30 #P:32
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kworker/u:6-201 [007] .N.. 186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1 page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
kworker/u:6-201 [007] .N.. 186.239536: mc_error: [Hardware Error]: mem_ctl#1: Corrected error memory read error on memory stick "DIMM_E2" (channel:0 slot:0 page:0x93180b offset:0x927 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=1 channel=2/mask=4 rank=0)

There are still some space to improve the fields provided by the drivers.

> Whether the string starts with "mce" or "MC" or whatever ... what will the
> user do with the mc_index that is printed with that first %d? I don't think
> it helps them find the DIMM when they open the box.

Well, it helps to match the memory information on the trace with the sysfs nodes
that are memory-controller based and with the dmesg info.

Calling it as "mc" is more coherent with the dmesg prints.

> I suppose it is useful
> if there are multiple messages ... and they see that the same memory controller
> is mentioned in each. But I almost think it belongs inside the parentheses at
> the end as the "low level details that most users won't need to care about.

I don't have a strong preference, although I think it is better to have it at
the beginning.

> Next %s is "Corrected" or "Fatal" or "Uncorrected" ... that's good.
>
> What are the options for the next "%s" (msg)?

The type of the error. In the above, it is "memory read error".

> "memory stick"?? I suppose "DIMM" is a bit implementation dependent (SIMMs
> are long gone ... but perhaps there will be some new acronym for stacked
> memory ... STIMS :-) )

Memory stick is described at edac.h as:

* Memory Stick: A printed circuit board that aggregates multiple
* memory devices in parallel. In general, this is the
* Field Replaceable Unit (FRU) which gets replaced, in
* the case of excessive errors. Most often it is also
* called DIMM (Dual Inline Memory Module).
*

DIMM is implementation dependent. As EDAC is also used on non-x86 archs, calling
it as DIMM on ARM is probably wrong.

Calling it as "STIMS" (or any other unusual acronym) seems worse ;)

>
> Then label (from SMBIOS) ... then the internal details. Good.

Regards,
Mauro

Luck, Tony

unread,
May 10, 2012, 6:38:02 PM5/10/12
to Mauro Carvalho Chehab, Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
kworker/u:6-201 [007] .N.. 186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1 page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)

The word "error" appears *five* times on this line (once with a capital E).
I feel beaten, bruised and ready to give up on this machine with just one
actual error reported :-)

We could get rid of one by:
s/Corrected error memory read error/Corrected memory read error/

(though we'd need to see if things still read well for all other "msg" options.

Or perhaps it could say:
... Corrected error: memory read on memory stick ...
or even:
... Corrected error: read on memory stick ...

This part could get shortened too:
mc_error: [Hardware Error]:
will mc_error ever report something that isn't a "Hardware Error"?
I don't think we have to preserve this legacy string when moving
to a new reporting mechanism.


> There are still some space to improve the fields provided by the drivers.
Apart from reporting "channel" twice, that doesn't look too bad. Maybe
the "1 error(s)" could say "count: 1"?

-Tony

Steven Rostedt

unread,
May 10, 2012, 6:47:08 PM5/10/12
to Mauro Carvalho Chehab, Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker, Ingo Molnar, Tony Luck
On Thu, 2012-05-10 at 17:55 -0300, Mauro Carvalho Chehab wrote:

> >> EXPORT_SYMBOL_GPL(edac_mc_handle_error);
> >> diff --git a/include/ras/hw_event.h b/include/ras/hw_event.h
> >> new file mode 100644
> >> index 0000000..66981ef
> >> --- /dev/null
> >> +++ b/include/ras/hw_event.h
> >> @@ -0,0 +1,77 @@
> >> +#undef TRACE_SYSTEM
> >> +#define TRACE_SYSTEM hw_event
> >
> > ras
>
> Then, the header name should be called as "ras.h", otherwise an error happens:
>
> In file included from include/ras/hw_event.h:77:0,
> from drivers/edac/edac_mc.c:38:
> include/trace/define_trace.h:79:43: fatal error: ../../include/ras/ras.h: Arquivo ou diretório não encontrado

From samples/trace_events/trace-events-sample.h:

/*
* TRACE_INCLUDE_FILE is not needed if the filename and TRACE_SYSTEM are equal
*/
#define TRACE_INCLUDE_FILE trace-events-sample

Thus, you could do:

+++ b/include/ras/hw_event.h

+undef TRACE_SYSTEM
+define TRACE_SYSTEM ras

[...]

#define TRACE_INCLUDE_FILE hw_event


-- Steve

Mauro Carvalho Chehab

unread,
May 10, 2012, 7:17:58 PM5/10/12
to Steven Rostedt, Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker, Ingo Molnar, Tony Luck
Thanks, Steve!

I'll rename it to include/ras/ras_event.h

Regards,
Mauro

Mauro Carvalho Chehab

unread,
May 10, 2012, 9:49:52 PM5/10/12
to Luck, Tony, Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 10-05-2012 19:37, Luck, Tony escreveu:
> kworker/u:6-201 [007] .N.. 186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1 page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
>
> The word "error" appears *five* times on this line (once with a capital E).
> I feel beaten, bruised and ready to give up on this machine with just one
> actual error reported :-)

:)

Several of them come from the driver-provided details.

The edac-mc core contributes with "mc_error", "[Hardware Error]" and "Corrected error".
The sb-edac driver contributes with "memory read error" and "1 error(s)".

We can get easily get rid of "[Hardware Error]" by removing HW_ERR from:

TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",

replacing mc_error by something else is not hard, but this is the name of the trace call:

TRACE_EVENT(mc_error,
..

Maybe the better is to do s/mc_error/mc_event/g.

The error count msg ("1 error(s)") could be replaced by "count:1".

>
> We could get rid of one by:
> s/Corrected error memory read error/Corrected memory read error/

This is the hardest possible solution ;) Changing it will cause weird messages
all over EDAC drivers ;)

This is how sb_edac.c provides the "memory read error" string:

switch (optypenum) {
case 0:
optype = "generic undef request error";
break;
case 1:
optype = "memory read error";
break;
case 2:
optype = "memory write error";
break;
case 3:
optype = "addr/cmd error";
break;
case 4:
optype = "memory scrubbing error";
break;
default:
optype = "reserved";
break;

In the last case of switch, for this driver, the error would be printed as "Corrected reserved".

On i7core_edac, there's also one error that would be weird:

switch (optypenum) {
case 0:
optype = "generic undef request";
break;

Drivers like i5100_edac also provide error messages without the word "error" on it, like:

static const char *i5100_err_msg(unsigned err)
{
static const char *merrs[] = {
"unknown", /* 0 */
"uncorrectable data ECC on replay", /* 1 */
"unknown", /* 2 */
"unknown", /* 3 */
"aliased uncorrectable demand data ECC", /* 4 */
"aliased uncorrectable spare-copy data ECC", /* 5 */
"aliased uncorrectable patrol data ECC", /* 6 */
"unknown", /* 7 */
"unknown", /* 8 */
"unknown", /* 9 */
"non-aliased uncorrectable demand data ECC", /* 10 */
"non-aliased uncorrectable spare-copy data ECC", /* 11 */
"non-aliased uncorrectable patrol data ECC", /* 12 */
..

On _several_ drivers, the error type is simply the name of the driver, or blank:

amd76x_edac.c:
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
mci->csrows[row]->first_page, 0, 0,
row, 0, -1,
mci->ctl_name, "", NULL);

i3200_edac:
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
0, 0, 0,
eccerrlog_row(channel, log),
-1, -1,
"i3000 UE", "", NULL);

Btw, you should not forget that, while simple usecases will be to just read the
/sys/kernel/debug/tracing/trace file, a monitoring tool will use the
binary data information, and store each trace field on a separate database
field.

So, the contents of the error message field should be consistent.

>
> (though we'd need to see if things still read well for all other "msg" options.
>
> Or perhaps it could say:
> ... Corrected error: memory read on memory stick ...
> or even:
> ... Corrected error: read on memory stick ...
>
> This part could get shortened too:
> mc_error: [Hardware Error]:
> will mc_error ever report something that isn't a "Hardware Error"?
> I don't think we have to preserve this legacy string when moving
> to a new reporting mechanism.
>
>
>> There are still some space to improve the fields provided by the drivers.
> Apart from reporting "channel" twice, that doesn't look too bad. Maybe
> the "1 error(s)" could say "count: 1"?

Agreed.

See the enclosed patch. The TP_printk() message after it is:

mc_event: Corrected error:memory read error on memory stick "DIMM_A1" (mc:0 channel:0 slot:0 page:0x1a3706 offset:0xff1 grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:8 rank:0)

If ok, I'll merge the edac core part together with this changeset, and the
sb_edac part together with the patch that cleans the sb_edac logs.

Regards,
Mauro

--

edac: Improve error messages on sb-edac and edac-mc

From: Mauro Carvalho Chehab <mch...@redhat.com>

After this patch, /sys/kernel/debug/tracing/trace displays:

kworker/u:6-201 [007] .N.. 161.136624: mc_event: Corrected error:memory read error on memory stick "DIMM_A1" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:1)
kworker/u:6-201 [007] .N.. 161.155708: mc_event: Corrected error:memory read error on memory stick "DIMM_E1" (mc:1 channel:0 slot:0 page:0x987f45 offset:0x14c grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:1 channel_mask:4 rank:0)
kworker/u:6-201 [007] .N.. 161.174817: mc_event: Corrected error:memory read error on memory stick "DIMM_C1" (mc:0 channel:0 slot:1 page:0x2bf618 offset:0xb2e grain:32 syndrome:0x0 count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:4 rank:4)

Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 0550cb4..b7492e8 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -35,7 +35,7 @@

#define CREATE_TRACE_POINTS
#define TRACE_INCLUDE_PATH ../../include/ras
-#include <ras/ras.h>
+#include <ras/ras_event.h>

/* lock to memory controller's control array */
static DEFINE_MUTEX(mem_ctls_mutex);
@@ -1174,7 +1174,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
page_frame_number, offset_in_page, grain);

/* Report the error via the trace interface */
- trace_mc_error(type, mci->mc_idx, msg, label, location,
+ trace_mc_event(type, mci->mc_idx, msg, label, location,
detail, other_detail);

/* Report the error via the edac_mc_printk() interface */
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 69c807c..60dbefe 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -786,7 +786,7 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
u8 *socket,
long *channel_mask,
u8 *rank,
- char *area_type, char *msg)
+ char **area_type, char *msg)
{
struct mem_ctl_info *new_mci;
struct sbridge_pvt *pvt = mci->pvt_info;
@@ -841,7 +841,7 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
sprintf(msg, "Can't discover the memory socket");
return -EINVAL;
}
- area_type = get_dram_attr(reg);
+ *area_type = get_dram_attr(reg);
interleave_mode = INTERLEAVE_MODE(reg);

pci_read_config_dword(pvt->pci_sad0, interleave_list[n_sads],
@@ -1339,7 +1339,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
struct mem_ctl_info *new_mci;
struct sbridge_pvt *pvt = mci->pvt_info;
enum hw_event_mc_err_type tp_event;
- char *type, *optype, msg[256], *recoverable_msg;
+ char *type, *optype, msg[256];
bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
bool overflow = GET_BITFIELD(m->status, 62, 62);
bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
@@ -1352,7 +1352,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
long channel_mask, first_channel;
u8 rank, socket;
int rc, dimm;
- char *area_type = "Unknown";
+ char *area_type = NULL;

if (uncorrected_error) {
if (ripv) {
@@ -1404,7 +1404,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
}

rc = get_memory_error_data(mci, m->addr, &socket,
- &channel_mask, &rank, area_type, msg);
+ &channel_mask, &rank, &area_type, msg);
if (rc < 0)
goto err_parsing;
new_mci = get_mci_for_node_id(socket);
@@ -1424,10 +1424,6 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
else
dimm = 2;

- if (uncorrected_error && recoverable)
- recoverable_msg = " recoverable";
- else
- recoverable_msg = "";

/*
* FIXME: On some memory configurations (mirror, lockstep), the
@@ -1436,14 +1432,13 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
* to the group of dimm's where the error may be happening.
*/
snprintf(msg, sizeof(msg),
- "%d error(s)%s: %s%s: Err=%04x:%04x socket=%d channel=%ld/mask=%ld rank=%d",
+ "count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
core_err_cnt,
overflow ? " OVERFLOW" : "",
+ (uncorrected_error && recoverable) ? " recoverable" : "",
area_type,
- recoverable_msg,
mscod, errcode,
socket,
- first_channel,
channel_mask,
rank);

diff --git a/include/ras/ras.h b/include/ras/ras_event.h
similarity index 80%
rename from include/ras/ras.h
rename to include/ras/ras_event.h
index 13ea4ee..66f6a43 100644
--- a/include/ras/ras.h
+++ b/include/ras/ras_event.h
@@ -1,5 +1,6 @@
#undef TRACE_SYSTEM
#define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE ras_event

#if !defined(_TRACE_HW_EVENT_MC_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_HW_EVENT_MC_H
@@ -26,25 +27,25 @@
/*
* Default error mechanisms for Memory Controller errors (CE and UE)
*/
-TRACE_EVENT(mc_error,
+TRACE_EVENT(mc_event,

TP_PROTO(const unsigned int err_type,
const unsigned int mc_index,
- const char *msg,
+ const char *error_msg,
const char *label,
const char *location,
- const char *detail,
+ const char *core_detail,
const char *driver_detail),

- TP_ARGS(err_type, mc_index, msg, label, location,
- detail, driver_detail),
+ TP_ARGS(err_type, mc_index, error_msg, label, location,
+ core_detail, driver_detail),

TP_STRUCT__entry(
__field( unsigned int, err_type )
__field( unsigned int, mc_index )
- __string( msg, msg )
+ __string( msg, error_msg )
__string( label, label )
- __string( detail, detail )
+ __string( detail, core_detail )
__string( location, location )
__string( driver_detail, driver_detail )
),
@@ -52,20 +53,20 @@ TRACE_EVENT(mc_error,
TP_fast_assign(
__entry->err_type = err_type;
__entry->mc_index = mc_index;
- __assign_str(msg, msg);
+ __assign_str(msg, error_msg);
__assign_str(label, label);
__assign_str(location, location);
- __assign_str(detail, detail);
+ __assign_str(detail, core_detail);
__assign_str(driver_detail, driver_detail);
),

- TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
- __entry->mc_index,
+ TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
(__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
((__entry->err_type == HW_EVENT_ERR_FATAL) ?
"Fatal" : "Uncorrected"),
__get_str(msg),
__get_str(label),
+ __entry->mc_index,
__get_str(location),
__get_str(detail),
__get_str(driver_detail))

Borislav Petkov

unread,
May 11, 2012, 6:05:02 AM5/11/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Let me repeat myself:

> It should be:
>
> Subject: RAS: Add a tracepoint for reporting memory controller errors
>
> and not
>
> Subject: edac, ras/hw_event.h: use events to handle hw issues
>
> because events don't handle hw issues.

You don't use events to handle hardware issues - you use tracepoints to
report hardware issues! And besides, "event" is so overloaded in kernel
land that actually refraining from its use makes me keep at least a bit
of sanity when looking at yet another header called something event.h.

Please be more precise when naming your patches.

Thanks.
> To unsubscribe from this list: send the line "unsubscribe linux-edac" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Borislav Petkov

unread,
May 11, 2012, 6:25:49 AM5/11/12
to Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Thu, May 10, 2012 at 10:48:40PM -0300, Mauro Carvalho Chehab wrote:
> Em 10-05-2012 19:37, Luck, Tony escreveu:
> > kworker/u:6-201 [007] .N.. 186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1 page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
> >
> > The word "error" appears *five* times on this line (once with a capital E).
> > I feel beaten, bruised and ready to give up on this machine with just one
> > actual error reported :-)
>
> :)
>
> Several of them come from the driver-provided details.
>
> The edac-mc core contributes with "mc_error", "[Hardware Error]" and "Corrected error".
> The sb-edac driver contributes with "memory read error" and "1 error(s)".
>
> We can get easily get rid of "[Hardware Error]" by removing HW_ERR from:
>
> TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
>
> replacing mc_error by something else is not hard, but this is the name of the trace call:
>
> TRACE_EVENT(mc_error,
> ...
>
> Maybe the better is to do s/mc_error/mc_event/g.

HW_ERR is the "official" prefix used by the MCE code in the kernel.
Maybe we can shorten it but it is needed to raise attention when staring
at dmesg output.

Now, since this tracepoint is not dmesg, we don't need it there at all
since we know that trace_mc_error reports memory errors.

"mc_error" is also not needed.

> The error count msg ("1 error(s)") could be replaced by "count:1".

Is there even a possibility to report more than one error when invoking
trace_mc_error once? If not, simply drop the count completely.

> > We could get rid of one by:
> > s/Corrected error memory read error/Corrected memory read error/
>
> This is the hardest possible solution ;) Changing it will cause weird messages
> all over EDAC drivers ;)

I agree with Tony here - repeating error a gazillion times on one report
only is a "naaah!"

Here's how it should look:

kworker/u:6-201 [007] .N.. 161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)

* count is gone
* MC-drivers shouldn't say "error" when reporting an error
* UE/CE moves into the brackets
* socket moves earlier in the brackets, and keep the whole deal hierarchical.
* drop "err_code" what is that?
* drop second "socket"
* drop "area" Area "DRAM" - are there other?
* what is "channel_mask"?
* move "rank" to earlier

Now this is an output format I can get on board with.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 11, 2012, 8:38:35 AM5/11/12
to Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 11-05-2012 07:25, Borislav Petkov escreveu:
> On Thu, May 10, 2012 at 10:48:40PM -0300, Mauro Carvalho Chehab wrote:
>> Em 10-05-2012 19:37, Luck, Tony escreveu:
>>> kworker/u:6-201 [007] .N.. 186.197280: mc_error: [Hardware Error]: mem_ctl#0: Corrected error memory read error on memory stick "DIMM_A1" (channel:0 slot:1 page:0x2f1eb3 offset:0x446 grain:32 syndrome:0x0 1 error(s): Unknown: Err=0001:0090 socket=0 channel=0/mask=1 rank=5)
>>>
>>> The word "error" appears *five* times on this line (once with a capital E).
>>> I feel beaten, bruised and ready to give up on this machine with just one
>>> actual error reported :-)
>>
>> :)
>>
>> Several of them come from the driver-provided details.
>>
>> The edac-mc core contributes with "mc_error", "[Hardware Error]" and "Corrected error".
>> The sb-edac driver contributes with "memory read error" and "1 error(s)".
>>
>> We can get easily get rid of "[Hardware Error]" by removing HW_ERR from:
>>
>> TP_printk(HW_ERR "mem_ctl#%d: %s error %s on memory stick \"%s\" (%s %s %s)",
>>
>> replacing mc_error by something else is not hard, but this is the name of the trace call:
>>
>> TRACE_EVENT(mc_error,
>> ...
>>
>> Maybe the better is to do s/mc_error/mc_event/g.
>
> HW_ERR is the "official" prefix used by the MCE code in the kernel.
> Maybe we can shorten it but it is needed to raise attention when staring
> at dmesg output.
>
> Now, since this tracepoint is not dmesg, we don't need it there at all
> since we know that trace_mc_error reports memory errors.

IMO, we can get rid of it on trace, keeping it at dmesg.

> "mc_error" is also not needed.

Some name is required there. This parameter affects:
- the name of the tracepoint function;
- the TP_printk() output;
- the trace filter name.

mc_event is probably a good name.

>> The error count msg ("1 error(s)") could be replaced by "count:1".
>
> Is there even a possibility to report more than one error when invoking
> trace_mc_error once? If not, simply drop the count completely.

Good point. It makes sense to add a count parameter to the error handling core,
in order to handle "count". AFAIKT, the only two drivers that currently reports
error counts are sb_edac and i7core_edac.

With regards to sb_edac, it also needs another logic to be handled by the core:
channel_mask. This is required to handle lockstep mode and mirror mode.

Adding support for it is already on my TODO list. I'll also put "count"
on my TODO list.

>>> We could get rid of one by:
>>> s/Corrected error memory read error/Corrected memory read error/
>>
>> This is the hardest possible solution ;) Changing it will cause weird messages
>> all over EDAC drivers ;)
>
> I agree with Tony here - repeating error a gazillion times on one report
> only is a "naaah!"
>
> Here's how it should look:
>
> kworker/u:6-201 [007] .N.. 161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)

As already explained: the error_msg is not consistent among the drivers.

So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
work on other drivers.

Changing this field on each driver requires deep knowledge of each memory
controller, and not all datasheets are publicly available.

For example, this is the code for Freescale MPC85xx EDAC driver:

if (err_detect & DDR_EDE_SBE)
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
pfn, err_addr & ~PAGE_MASK, syndrome,
row_index, 0, -1,
mci->ctl_name, "", NULL);

if (err_detect & DDR_EDE_MBE)
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
pfn, err_addr & ~PAGE_MASK, syndrome,
row_index, 0, -1,
mci->ctl_name, "", NULL);

It uses a blank value for the error message. putting the error_msg at the beginning,
as you proposed would print:

[Hardware Error]: on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )

> * count is gone

While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.

The right thing here seems to move it to a core property and increment the error counters
according with the error count.

> * MC-drivers shouldn't say "error" when reporting an error
> * UE/CE moves into the brackets

It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.

> * socket moves earlier in the brackets, and keep the whole deal hierarchical.

Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
were running the code when an error was generated, not the CPU that were managing the memory.

The CPU managing the memory is called "memory controller".

It might be used by something that would kill the affected task, but doing such things
is probably not easy.

So, the hierarchical information that translates into "DIMM_A1" is "mc:0 channel:0 slot:0".

> * drop "err_code" what is that?

In this case:
u32 errcode = GET_BITFIELD(m->status, 0, 15);

> * drop second "socket"
> * drop "area" Area "DRAM" - are there other?

Yes. there are 4 types of area at sb_edac.

> * what is "channel_mask"?

It is a bitmask mask showing all channels affected by an error, on sb_edac.
Handing it is on my TODO list.

> * move "rank" to earlier

Why? This is the least relevant information provided by the driver-specific logic:

snprintf(msg, sizeof(msg),
"count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
core_err_cnt,
overflow ? " OVERFLOW" : "",
(uncorrected_error && recoverable) ? " recoverable" : "",
area_type,
mscod, errcode,
socket,
channel_mask,
rank);

Error count, overflow, recoverable bit, etc are more relevant than it, as it actually affects
the user.

Rank, on the other hand, might only help someone interested on replacing a DRAM chip. Still,
it is doubtful that this would be used, in practice, as companies that replaces DIMM chip likely
have some specific equipments to test the DIMMs.

So, I almost dropped it. The only reason for me to not drop is that it allows me to compare
the driver results with some other testing tools I'm using on driver's development.


> Now this is an output format I can get on board with.
>

Regards,
Mauro.

Mauro Carvalho Chehab

unread,
May 11, 2012, 10:54:38 AM5/11/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
The tracepoint printk will be displayed like:

mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])

Where:
[error msg] is the driver-specific error message
(e. g. "memory read", "bus error", ...);
[location] is the location in terms of memory controller and
branch/channel/slot, channel/slot or csrow/channel;
[label] is the memory stick label;
[edac_mc detail] describes the address location of the error
and the syndrome;
[driver detail] is driver-specifig error message details,
when needed/provided (e. g. "area:DMA", ...)

For example:

mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their MIBs.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <aroz...@redhat.com>
Cc: Doug Thompson <nor...@yahoo.com>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>
---
drivers/edac/edac_core.h | 2 +-
drivers/edac/edac_mc.c | 25 ++++++++++----
include/ras/ras_event.h | 78 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 97 insertions(+), 8 deletions(-)
create mode 100644 include/ras/ras_event.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog);
+ const void *arch_log);

/*
* edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..0153cd98 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
#include "edac_core.h"
#include "edac_module.h"

+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras_event.h>
+ trace_mc_event(type, mci->mc_idx, msg, label, location,
+ detail, other_detail);
+
+ /* Report the error via the edac_mc_printk() interface */
+ if (type == HW_EVENT_ERR_CORRECTED)
+ edac_ce_error(mci, pos, msg, location, label, detail,
+ other_detail, enable_per_layer_report,
+ page_frame_number, offset_in_page, grain);
+ else
edac_ue_error(mci, pos, msg, location, label, detail,
other_detail, enable_per_layer_report);
- }
}
EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
new file mode 100644
index 0000000..66f6a43
--- /dev/null
+++ b/include/ras/ras_event.h
@@ -0,0 +1,78 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE ras_event
+TRACE_EVENT(mc_event,
+
+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ const char *location,
+ const char *core_detail,
+ const char *driver_detail),
+
+ TP_ARGS(err_type, mc_index, error_msg, label, location,
+ core_detail, driver_detail),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, err_type )
+ __field( unsigned int, mc_index )
+ __string( msg, error_msg )
+ __string( label, label )
+ __string( detail, core_detail )
+ __string( location, location )
+ __string( driver_detail, driver_detail )
+ ),
+
+ TP_fast_assign(
+ __entry->err_type = err_type;
+ __entry->mc_index = mc_index;
+ __assign_str(msg, error_msg);
+ __assign_str(label, label);
+ __assign_str(location, location);
+ __assign_str(detail, core_detail);
+ __assign_str(driver_detail, driver_detail);
+ ),
+
+ TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
+ (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+ ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+ "Fatal" : "Uncorrected"),
+ __get_str(msg),
+ __get_str(label),
+ __entry->mc_index,

Luck, Tony

unread,
May 11, 2012, 1:02:53 PM5/11/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
> For example:
>
> mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

This is looking so much better.

I looked through your examples from drivers on what text we might see
in the "memory read" position ... and agree that it would be a lot of
work to make them all come up with grammatically clean messages, especially
for all the poorly documented (or undocumented) "default/unknown/..." cases.

Back to my "does the casual user really need to know" soapbox. What different
actions do we expect a user to take when we tell them "read error" or "write
error" or "unknown error"? I'm beginning to think that this belongs inside
the brackets! Perhaps as: type:"memory read"?

Then we'd have:

mc_event: Corrected error on memory stick "DIMM_1A" (bunch of stuff for deep diagnosis by vendor)

Knowing that the error was Corrected/Uncorrected is vital to the user. It lets them know
the urgency with which they need to take action ... we need to educate them that a few
"Corrected" errors are perfectly normal and nothing to raise blood pressure about.

Knowing which memory stick was involved - also very important. If they do take action,
they should be able to swap out the memory stick that was the source of the problem.

Everything else is just for memory geeks like me, you and Boris (and OEMs who want to
diagnose root cause of problems they see by pattern analysis across errors from multiple
machines with DIMMS from different batches/vendors).

-Tony

Borislav Petkov

unread,
May 11, 2012, 1:07:21 PM5/11/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
From Documentation/SubmittingPatches:

"2) Describe your changes.

Describe the technical detail of the change(s) your patch includes.

Be as specific as possible."

"use tracepoint to handle hw issues" is not specific - it is bullshit
bingo with a random word generator.

I'm getting tired of this crap: either take the suggestion as is,
without randomly changing it for no reason whatsoever, or give a good
reason why it is not a good suggestion and propose a better one!

Randomly changing it in the next version of the patch for another, even
crappier one doesn't help.
> To unsubscribe from this list: send the line "unsubscribe linux-edac" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Borislav Petkov

unread,
May 11, 2012, 1:24:55 PM5/11/12
to Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 11, 2012 at 09:37:48AM -0300, Mauro Carvalho Chehab wrote:
> > HW_ERR is the "official" prefix used by the MCE code in the kernel.
> > Maybe we can shorten it but it is needed to raise attention when staring
> > at dmesg output.
> >
> > Now, since this tracepoint is not dmesg, we don't need it there at all
> > since we know that trace_mc_error reports memory errors.
>
> IMO, we can get rid of it on trace, keeping it at dmesg.
>
> > "mc_error" is also not needed.
>
> Some name is required there. This parameter affects:
> - the name of the tracepoint function;
> - the TP_printk() output;
> - the trace filter name.
>
> mc_event is probably a good name.

If this tracepoint is going to report memory errors and we drop HW_ERR,
then I guess "mc_error" is the most fitting one because it says what the
message is.

> >> The error count msg ("1 error(s)") could be replaced by "count:1".
> >
> > Is there even a possibility to report more than one error when invoking
> > trace_mc_error once? If not, simply drop the count completely.
>
> Good point. It makes sense to add a count parameter to the error handling core,
> in order to handle "count". AFAIKT, the only two drivers that currently reports
> error counts are sb_edac and i7core_edac.

What does that mean? You report multiple errors with one tracepoint
invocation? How do you report error details then?

If you only report single errors, error count will always be 1 so drop
it.

> With regards to sb_edac, it also needs another logic to be handled by
> the core: channel_mask. This is required to handle lockstep mode and
> mirror mode.

What does "handle lockstep mode" mean and how is that important for me
staring at the tracepoint output.

[ … ]

> > Here's how it should look:
> >
> > kworker/u:6-201 [007] .N.. 161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)
>
> As already explained: the error_msg is not consistent among the drivers.
>
> So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
> work on other drivers.
>
> Changing this field on each driver requires deep knowledge of each memory
> controller, and not all datasheets are publicly available.
>
> For example, this is the code for Freescale MPC85xx EDAC driver:
>
> if (err_detect & DDR_EDE_SBE)
> edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> pfn, err_addr & ~PAGE_MASK, syndrome,
> row_index, 0, -1,
> mci->ctl_name, "", NULL);
>
> if (err_detect & DDR_EDE_MBE)
> edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
> pfn, err_addr & ~PAGE_MASK, syndrome,
> row_index, 0, -1,
> mci->ctl_name, "", NULL);
>
> It uses a blank value for the error message. putting the error_msg at the beginning,
> as you proposed would print:
>
> [Hardware Error]: on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )

that's fine since we know the tracepoint purpose:

"mc_error: on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)"

is pretty clear to me, IMHO.

> > * count is gone
>
> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
>
> The right thing here seems to move it to a core property and increment the error counters
> according with the error count.

Incrementing the error counters shouldn't have anything to do with the
error reporting.

> > * MC-drivers shouldn't say "error" when reporting an error
> > * UE/CE moves into the brackets
>
> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.

Why?

> > * socket moves earlier in the brackets, and keep the whole deal hierarchical.
>
> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
> were running the code when an error was generated, not the CPU that were managing the memory.

Then drop it if it's not relevant.

> > * drop "err_code" what is that?
>
> In this case:
> u32 errcode = GET_BITFIELD(m->status, 0, 15);

Either decode it like I do in amd_decode_err_code() or remove it
completely - no naked bit values.

> > * drop second "socket"
> > * drop "area" Area "DRAM" - are there other?
>
> Yes. there are 4 types of area at sb_edac.

And they are?

> > * what is "channel_mask"?
>
> It is a bitmask mask showing all channels affected by an error, on sb_edac.
> Handing it is on my TODO list.

What can the user do with it when it sees it reported?

> > * move "rank" to earlier
>
> Why? This is the least relevant information provided by the driver-specific logic:
>
> snprintf(msg, sizeof(msg),
> "count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
> core_err_cnt,
> overflow ? " OVERFLOW" : "",
> (uncorrected_error && recoverable) ? " recoverable" : "",
> area_type,
> mscod, errcode,
> socket,
> channel_mask,
> rank);
>
> Error count, overflow, recoverable bit, etc are more relevant than it, as it actually affects
> the user.
>
> Rank, on the other hand, might only help someone interested on replacing a DRAM chip. Still,
> it is doubtful that this would be used, in practice, as companies that replaces DIMM chip likely
> have some specific equipments to test the DIMMs.
>
> So, I almost dropped it. The only reason for me to not drop is that it allows me to compare
> the driver results with some other testing tools I'm using on driver's development.

Then drop it and keep a local version for your testing needs only.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 11, 2012, 2:39:02 PM5/11/12
to Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 11-05-2012 14:24, Borislav Petkov escreveu:
> On Fri, May 11, 2012 at 09:37:48AM -0300, Mauro Carvalho Chehab wrote:
>>> HW_ERR is the "official" prefix used by the MCE code in the kernel.
>>> Maybe we can shorten it but it is needed to raise attention when staring
>>> at dmesg output.
>>>
>>> Now, since this tracepoint is not dmesg, we don't need it there at all
>>> since we know that trace_mc_error reports memory errors.
>>
>> IMO, we can get rid of it on trace, keeping it at dmesg.
>>
>>> "mc_error" is also not needed.
>>
>> Some name is required there. This parameter affects:
>> - the name of the tracepoint function;
>> - the TP_printk() output;
>> - the trace filter name.
>>
>> mc_event is probably a good name.
>
> If this tracepoint is going to report memory errors and we drop HW_ERR,
> then I guess "mc_error" is the most fitting one because it says what the
> message is.

True. Yet, "mc_event" fits better, as some drivers can also report events
that aren't errors. So, calling it as "mc_error", as on my original patch
is actually wrong. For example, i5100 can report those types of events:

"spare copy initiated", /* 20 */
"spare copy completed", /* 21 */

Spare copy events don't fit into "errors". Other drivers can also report events
like passing some temperature thresholds that aren't really errors.

Eventually, we might need yet-another severity type for those types of events,
as they aren't Corrected/Uncorrected/Fatal errors, but just notify events.

>>>> The error count msg ("1 error(s)") could be replaced by "count:1".
>>>
>>> Is there even a possibility to report more than one error when invoking
>>> trace_mc_error once? If not, simply drop the count completely.
>>
>> Good point. It makes sense to add a count parameter to the error handling core,
>> in order to handle "count". AFAIKT, the only two drivers that currently reports
>> error counts are sb_edac and i7core_edac.
>
> What does that mean? You report multiple errors with one tracepoint
> invocation? How do you report error details then?

When a burst of errors happen at the same memory address (within a grain), a
few memory controllers can merge them into a single report, providing an error
count and a overflow flag, if the burst is higher than the register size.

> If you only report single errors, error count will always be 1 so drop
> it.

No, it is not single errors. It is multiple errors at the same address/grain.

>> With regards to sb_edac, it also needs another logic to be handled by
>> the core: channel_mask. This is required to handle lockstep mode and
>> mirror mode.
>
> What does "handle lockstep mode" mean and how is that important for me
> staring at the tracepoint output.

That were already explained dozens of time on the patch review threads:

The error check on lookstep mode is calculated over a 128 bits cacheline.
The memory controller sometimes is not able to distinguish if the error
happened on the first channel or on the second channel.

So, either one (or the two) envolved DIMMs should be responsible for it.

>
> [ … ]
>
>>> Here's how it should look:
>>>
>>> kworker/u:6-201 [007] .N.. 161.136624: [Hardware Error]: memory read on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 rank:1 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 channel_mask:1)
>>
>> As already explained: the error_msg is not consistent among the drivers.
>>
>> So, "memory read on ..." will work fine on sb_edac/i7core_edac but it _won't_
>> work on other drivers.
>>
>> Changing this field on each driver requires deep knowledge of each memory
>> controller, and not all datasheets are publicly available.
>>
>> For example, this is the code for Freescale MPC85xx EDAC driver:
>>
>> if (err_detect & DDR_EDE_SBE)
>> edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
>> pfn, err_addr & ~PAGE_MASK, syndrome,
>> row_index, 0, -1,
>> mci->ctl_name, "", NULL);
>>
>> if (err_detect & DDR_EDE_MBE)
>> edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
>> pfn, err_addr & ~PAGE_MASK, syndrome,
>> row_index, 0, -1,
>> mci->ctl_name, "", NULL);
>>
>> It uses a blank value for the error message. putting the error_msg at the beginning,
>> as you proposed would print:
>>
>> [Hardware Error]: on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0 )
>
> that's fine since we know the tracepoint purpose:
>
> "mc_error: on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)"
>
> is pretty clear to me, IMHO.

yes, but:
mc_event: on memory stick "DIMM_A1" (type: corrected socket:0 mc:0 channel:0 slot:0)

is not clear if this is either an error or something else.

>
>>> * count is gone
>>
>> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
>>
>> The right thing here seems to move it to a core property and increment the error counters
>> according with the error count.
>
> Incrementing the error counters shouldn't have anything to do with the
> error reporting.

Why? The error counters is part of the API. Some utilities like edac-ctl actually don't even
care about the error logs. They only look into the error counting.

The API has 3 ways to report errors:
- via counters;
- via tracepoints;
- via dmesg.

The error handling routine should update the error on all of them.

Of course, we may opt to remove error counting from the Kernel as a hole, letting it for
userspace handling. Even so, the driver needs to provide how many errors were reported,
in order to implement it on userspace.

>>> * MC-drivers shouldn't say "error" when reporting an error
>>> * UE/CE moves into the brackets
>>
>> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.
>
> Why?

Also already discussed dozens of time: this is the error severity. The way users handle
UE errors are different than the way they handle CE, e. g. CE errors are "tolerable".
UE errors might not be.

>
>>> * socket moves earlier in the brackets, and keep the whole deal hierarchical.
>>
>> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
>> were running the code when an error was generated, not the CPU that were managing the memory.
>
> Then drop it if it's not relevant.

Because irrelevant != "not very relevant".

I might eventually drop it on some latter cleanups on sb_edac driver.

>>> * drop "err_code" what is that?
>>
>> In this case:
>> u32 errcode = GET_BITFIELD(m->status, 0, 15);
>
> Either decode it like I do in amd_decode_err_code() or remove it
> completely - no naked bit values.

It is decoded, but providing it might help with debugging.

I might eventually drop it on some latter cleanups on sb_edac driver.

>
>>> * drop second "socket"
>>> * drop "area" Area "DRAM" - are there other?
>>
>> Yes. there are 4 types of area at sb_edac.
>
> And they are?

See the driver:

static char *get_dram_attr(u32 reg)
{
switch(DRAM_ATTR(reg)) {
case 0:
return "DRAM";
case 1:
return "MMCFG";
case 2:
return "NXM";
default:
return "unknown";
}
}

>
>>> * what is "channel_mask"?
>>
>> It is a bitmask mask showing all channels affected by an error, on sb_edac.
>> Handing it is on my TODO list.
>
> What can the user do with it when it sees it reported?

Instead of blaming just one DIMM, blaming 2 or 4 DIMMs.

>>> * move "rank" to earlier
>>
>> Why? This is the least relevant information provided by the driver-specific logic:
>>
>> snprintf(msg, sizeof(msg),
>> "count:%d%s%s area:%s err_code:%04x:%04x socket:%d channel_mask:%ld rank:%d",
>> core_err_cnt,
>> overflow ? " OVERFLOW" : "",
>> (uncorrected_error && recoverable) ? " recoverable" : "",
>> area_type,
>> mscod, errcode,
>> socket,
>> channel_mask,
>> rank);
>>
>> Error count, overflow, recoverable bit, etc are more relevant than it, as it actually affects
>> the user.
>>
>> Rank, on the other hand, might only help someone interested on replacing a DRAM chip. Still,
>> it is doubtful that this would be used, in practice, as companies that replaces DIMM chip likely
>> have some specific equipments to test the DIMMs.
>>
>> So, I almost dropped it. The only reason for me to not drop is that it allows me to compare
>> the driver results with some other testing tools I'm using on driver's development.
>
> Then drop it and keep a local version for your testing needs only.

It is too early to do that, as there literally thousands of ways to configure SB Memory Controllers,
and I was not able to test all of them.

So, at least while we don't have a report of having all possible SB modes tested by somebody,
I won't be dropping things that will help me to address bugzillas opened for this driver.

Regards,
Mauro

Mauro Carvalho Chehab

unread,
May 11, 2012, 2:54:01 PM5/11/12
to Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 11-05-2012 14:02, Luck, Tony escreveu:
>> For example:
>>
>> mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
>
> This is looking so much better.
>
> I looked through your examples from drivers on what text we might see
> in the "memory read" position ... and agree that it would be a lot of
> work to make them all come up with grammatically clean messages, especially
> for all the poorly documented (or undocumented) "default/unknown/..." cases.
>
> Back to my "does the casual user really need to know" soapbox. What different
> actions do we expect a user to take when we tell them "read error" or "write
> error" or "unknown error"? I'm beginning to think that this belongs inside
> the brackets! Perhaps as: type:"memory read"?

Indeed, read/write errors are equal for the user, but other events like (i5100):

"SPD protocol error", /* 18 */
"spare copy initiated", /* 20 */
"spare copy completed", /* 21 */

Or (i5000):
"Northbound CRC error on non-redundant retry";
">Tmid Thermal event with intelligent throttling disabled";
specific = "DIMM-spare copy started";
specific = "DIMM-spare copy completed";

May mean that the DIMM is ok, but the error maybe on some other part of the system
(like an overheated cabinet, a badly-inserted DIMM or PCI device or maybe just some
data mirroring in progress).

So, IMHO, keeping it at the main part of the error is valuable, at least when the
driver can generate such kinds of event.

> Then we'd have:
>
> mc_event: Corrected error on memory stick "DIMM_1A" (bunch of stuff for deep diagnosis by vendor)

With the current implementation, this can actually be done at driver-basis: just fill
error_msg with a blank string and add all details at the driver-specific error detail.

> Knowing that the error was Corrected/Uncorrected is vital to the user. It lets them know
> the urgency with which they need to take action ... we need to educate them that a few
> "Corrected" errors are perfectly normal and nothing to raise blood pressure about.
>
> Knowing which memory stick was involved - also very important. If they do take action,
> they should be able to swap out the memory stick that was the source of the problem.

> Everything else is just for memory geeks like me, you and Boris (and OEMs who want to
> diagnose root cause of problems they see by pattern analysis across errors from multiple
> machines with DIMMS from different batches/vendors).

Agreed (except for the cases like the above described).

Regards,
Mauro

Tony Luck

unread,
May 11, 2012, 4:07:43 PM5/11/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 11, 2012 at 11:53 AM, Mauro Carvalho Chehab
<mch...@redhat.com> wrote:
> With the current implementation, this can actually be done at driver-basis: just fill
> error_msg with a blank string and add all details at the driver-specific error detail.

Cool. Then we just need some documentation telling EDAC driver
writers that error_msg is only intended for information that will help
a casual user understand what (if anything) to do about the error.

-Tony

Borislav Petkov

unread,
May 11, 2012, 6:31:36 PM5/11/12
to Mauro Carvalho Chehab, Borislav Petkov, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 11, 2012 at 02:10:30PM -0300, Mauro Carvalho Chehab wrote:
> Well, tell me what title you want for this patch, instead of coming
> with offending comments like above, and I'll change it to match your
> desire.

Already told you but here it is again:

> Let me repeat myself:
>
> > It should be:
> >
> > Subject: RAS: Add a tracepoint for reporting memory controller errors
> >
> > and not
> >
> > Subject: edac, ras/hw_event.h: use events to handle hw issues
> >
> > because events don't handle hw issues.
>
> You don't use events to handle hardware issues - you use tracepoints to
> report hardware issues! And besides, "event" is so overloaded in kernel
> land that actually refraining from its use makes me keep at least a bit
> of sanity when looking at yet another header called something event.h.
>
> Please be more precise when naming your patches.
>
> Thanks.

Luck, Tony

unread,
May 11, 2012, 6:35:32 PM5/11/12
to Borislav Petkov, Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
> > Subject: RAS: Add a tracepoint for reporting memory controller errors

s/errors/events/ ... since Mauro has pointed out that some events like:
"spare copy initiated" and "spare copy completed" may be reported using
this tracepoint ... and they are not errors

-Tony

Mauro Carvalho Chehab

unread,
May 12, 2012, 10:24:50 AM5/12/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
The tracepoint printk will be displayed like:

mc_event: (Corrected|Uncorrected|Fatal) error:[error msg] on memory stick "[label]" ([location] [edac_mc detail] [driver_detail])

Where:
[error msg] is the driver-specific error message
(e. g. "memory read", "bus error", ...);
[location] is the location in terms of memory controller and
branch/channel/slot, channel/slot or csrow/channel;
[label] is the memory stick label;
[edac_mc detail] describes the address location of the error
and the syndrome;
[driver detail] is driver-specifig error message details,
when needed/provided (e. g. "area:DMA", ...)

For example:

mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their MIBs.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Reviewed-by: Aristeu Rozanski <aroz...@redhat.com>
Cc: Doug Thompson <nor...@yahoo.com>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>
---
drivers/edac/edac_core.h | 2 +-
drivers/edac/edac_mc.c | 46 +++++++++++++++++++++++----
include/ras/ras_event.h | 78 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 118 insertions(+), 8 deletions(-)
create mode 100644 include/ras/ras_event.h

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog);
+ const void *arch_log);

/*
* edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index e5b5563..eb078ec 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
#include "edac_core.h"
#include "edac_module.h"

+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras_event.h>
+
/* lock to memory controller's control array */
static DEFINE_MUTEX(mem_ctls_mutex);
static LIST_HEAD(mc_devices);
@@ -381,6 +385,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
* which will perform kobj unregistration and the actual free
* will occur during the kobject callback operation
*/
+
return mci;
}
EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -972,6 +977,27 @@ static void edac_ue_error(struct mem_ctl_info *mci,
}

#define OTHER_LABEL " or "
+
+/**
+ * edac_mc_handle_error - reports a memory event to userspace
+ *
+ * @type: severity of the error (CE/UE/Fatal)
+ * @mci: a struct mem_ctl_info pointer
+ * @page_frame_number: mem page where the error occurred
+ * @offset_in_page: offset of the error inside the page
+ * @syndrome: ECC syndrome
+ * @layer0: Memory layer0 position
+ * @layer1: Memory layer2 position
+ * @layer2: Memory layer3 position
+ * @msg: Message meaningful to the end users that
+ * explains the event
+ * @other_detail: Technical details about the event that
+ * may help hardware manufacturers and
+ * EDAC developers to analyse the event
+ * @arch_log: Architecture-specific struct that can
+ * be used to add extended information to the
+ * tracepoint, like dumping MCE registers.
+ */
void edac_mc_handle_error(const enum hw_event_mc_err_type type,
struct mem_ctl_info *mci,
const unsigned long page_frame_number,
@@ -982,7 +1008,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog)
+ const void *arch_log)
{
/* FIXME: too much for stack: move it to some pre-alocated area */
char detail[80], location[80];
@@ -1119,21 +1145,27 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
}

/* Memory type dependent details about the error */
- if (type == HW_EVENT_ERR_CORRECTED) {
+ if (type == HW_EVENT_ERR_CORRECTED)
snprintf(detail, sizeof(detail),
"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx",
page_frame_number, offset_in_page,
grain, syndrome);
- edac_ce_error(mci, pos, msg, location, label, detail,
- other_detail, enable_per_layer_report,
- page_frame_number, offset_in_page, grain);
- } else {
+ else
snprintf(detail, sizeof(detail),
"page:0x%lx offset:0x%lx grain:%d",
page_frame_number, offset_in_page, grain);

+ /* Report the error via the trace interface */
+ trace_mc_event(type, mci->mc_idx, msg, label, location,
+ detail, other_detail);
+
+ /* Report the error via the edac_mc_printk() interface */
+ if (type == HW_EVENT_ERR_CORRECTED)
+ edac_ce_error(mci, pos, msg, location, label, detail,
+ other_detail, enable_per_layer_report,
+ page_frame_number, offset_in_page, grain);
+ else
edac_ue_error(mci, pos, msg, location, label, detail,
other_detail, enable_per_layer_report);
- }
}
EXPORT_SYMBOL_GPL(edac_mc_handle_error);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
new file mode 100644
index 0000000..66f6a43
--- /dev/null
+++ b/include/ras/ras_event.h
@@ -0,0 +1,78 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ras
+#define TRACE_INCLUDE_FILE ras_event
+TRACE_EVENT(mc_event,
+
+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ const char *location,
+ const char *core_detail,
+ const char *driver_detail),
+
+ TP_ARGS(err_type, mc_index, error_msg, label, location,
+ core_detail, driver_detail),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, err_type )
+ __field( unsigned int, mc_index )
+ __string( msg, error_msg )
+ __string( label, label )
+ __string( detail, core_detail )
+ __string( location, location )
+ __string( driver_detail, driver_detail )
+ ),
+
+ TP_fast_assign(
+ __entry->err_type = err_type;
+ __entry->mc_index = mc_index;
+ __assign_str(msg, error_msg);
+ __assign_str(label, label);
+ __assign_str(location, location);
+ __assign_str(detail, core_detail);
+ __assign_str(driver_detail, driver_detail);
+ ),
+
+ TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
+ (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+ ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+ "Fatal" : "Uncorrected"),
+ __get_str(msg),
+ __get_str(label),
+ __entry->mc_index,

Borislav Petkov

unread,
May 14, 2012, 9:34:36 AM5/14/12
to Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 11, 2012 at 03:38:37PM -0300, Mauro Carvalho Chehab wrote:
> True. Yet, "mc_event" fits better, as some drivers can also report events
> that aren't errors. So, calling it as "mc_error", as on my original patch
> is actually wrong. For example, i5100 can report those types of events:
>
> "spare copy initiated", /* 20 */
> "spare copy completed", /* 21 */
>
> Spare copy events don't fit into "errors". Other drivers can also report events
> like passing some temperature thresholds that aren't really errors.
>
> Eventually, we might need yet-another severity type for those types of events,
> as they aren't Corrected/Uncorrected/Fatal errors, but just notify events.

Stupid i5100... :-)

> >>>> The error count msg ("1 error(s)") could be replaced by "count:1".
> >>>
> >>> Is there even a possibility to report more than one error when invoking
> >>> trace_mc_error once? If not, simply drop the count completely.
> >>
> >> Good point. It makes sense to add a count parameter to the error handling core,
> >> in order to handle "count". AFAIKT, the only two drivers that currently reports
> >> error counts are sb_edac and i7core_edac.
> >
> > What does that mean? You report multiple errors with one tracepoint
> > invocation? How do you report error details then?
>
> When a burst of errors happen at the same memory address (within a grain), a
> few memory controllers can merge them into a single report, providing an error

What does "a few memory controllers can merge them into a single report"
mean exactly? Which few? Do you have an example output?

> count and a overflow flag, if the burst is higher than the register size.
>
> > If you only report single errors, error count will always be 1 so drop
> > it.
>
> No, it is not single errors. It is multiple errors at the same address/grain.

Oh, so the SB MC can report multiple errors with one MCE or whatever it uses?

> >> With regards to sb_edac, it also needs another logic to be handled by
> >> the core: channel_mask. This is required to handle lockstep mode and
> >> mirror mode.
> >
> > What does "handle lockstep mode" mean and how is that important for me
> > staring at the tracepoint output.
>
> That were already explained dozens of time on the patch review threads:
>
> The error check on lookstep mode is calculated over a 128 bits cacheline.
> The memory controller sometimes is not able to distinguish if the error
> happened on the first channel or on the second channel.

Ah, channel interleaving.

> So, either one (or the two) envolved DIMMs should be responsible for it.

Ok, so in that case having channel mask in there doesn't begin to
explain the user what that means - only you and Intel people know that.

In that case, you probably want to make it much more explicit:

"kworker/u:6-201 [007] .N.. 161.136624: [Hardware Error]: memory read on unknown memory... (... channel: ...)"

like in the case with interleaved csrow controllers or something to that
effect saying that you cannot pinpoint the DIMM but it could be either
of the n DIMMs on channels ....

?
Ok, I'd say we call it "mc_error" and i5100's two messages about spare
copying simply lose. We're reporting predominantly errors here anyway.

> >>> * count is gone
> >>
> >> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
> >>
> >> The right thing here seems to move it to a core property and increment the error counters
> >> according with the error count.
> >
> > Incrementing the error counters shouldn't have anything to do with the
> > error reporting.
>
> Why?

Because incrementing error counters and reporting errors should be two
different things.

> >>> * MC-drivers shouldn't say "error" when reporting an error
> >>> * UE/CE moves into the brackets
> >>
> >> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.
> >
> > Why?
>
> Also already discussed dozens of time: this is the error severity. The way users handle
> UE errors are different than the way they handle CE, e. g. CE errors are "tolerable".
> UE errors might not be.

And why does that warrant having the error type in the beginning of the
message? I can still read it if it is in the brackets a couple of chars
later on.

> >>> * socket moves earlier in the brackets, and keep the whole deal hierarchical.
> >>
> >> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
> >> were running the code when an error was generated, not the CPU that were managing the memory.
> >
> > Then drop it if it's not relevant.
>
> Because irrelevant != "not very relevant".
>
> I might eventually drop it on some latter cleanups on sb_edac driver.

You still don't give me a valid, technical reason to keep it.

And yes, it _IS_ black or white: the socket either tells you a valuable
piece of information which you need for handling the error after
reporting it, or not, is useless, and no one needs to have it in the
error log.

It is that simple.

>
> >>> * drop "err_code" what is that?
> >>
> >> In this case:
> >> u32 errcode = GET_BITFIELD(m->status, 0, 15);
> >
> > Either decode it like I do in amd_decode_err_code() or remove it
> > completely - no naked bit values.
>
> It is decoded, but providing it might help with debugging.

I only see naked bit values, where is it decoded? It should be properly
decoded in the final error message that the user gets to see.

> >>> * drop second "socket"
> >>> * drop "area" Area "DRAM" - are there other?
> >>
> >> Yes. there are 4 types of area at sb_edac.
> >
> > And they are?
>
> See the driver:
>
> static char *get_dram_attr(u32 reg)
> {
> switch(DRAM_ATTR(reg)) {
> case 0:
> return "DRAM";
> case 1:
> return "MMCFG";
> case 2:
> return "NXM";
> default:
> return "unknown";
> }

You mean 3 - "unknown" is not a memory region.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 14, 2012, 10:27:33 AM5/14/12
to Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Hmm... I've already released 2 versions after this one, addressing all the pointed
issues pointed by you and Tony at:

http://permalink.gmane.org/gmane.linux.kernel/1296116

It would be good if you could comment against the latest patch version.

Em 14-05-2012 10:34, Borislav Petkov escreveu:
> On Fri, May 11, 2012 at 03:38:37PM -0300, Mauro Carvalho Chehab wrote:
>> True. Yet, "mc_event" fits better, as some drivers can also report events
>> that aren't errors. So, calling it as "mc_error", as on my original patch
>> is actually wrong. For example, i5100 can report those types of events:
>>
>> "spare copy initiated", /* 20 */
>> "spare copy completed", /* 21 */
>>
>> Spare copy events don't fit into "errors". Other drivers can also report events
>> like passing some temperature thresholds that aren't really errors.
>>
>> Eventually, we might need yet-another severity type for those types of events,
>> as they aren't Corrected/Uncorrected/Fatal errors, but just notify events.
>
> Stupid i5100... :-)

:)

>
>>>>>> The error count msg ("1 error(s)") could be replaced by "count:1".
>>>>>
>>>>> Is there even a possibility to report more than one error when invoking
>>>>> trace_mc_error once? If not, simply drop the count completely.
>>>>
>>>> Good point. It makes sense to add a count parameter to the error handling core,
>>>> in order to handle "count". AFAIKT, the only two drivers that currently reports
>>>> error counts are sb_edac and i7core_edac.
>>>
>>> What does that mean? You report multiple errors with one tracepoint
>>> invocation? How do you report error details then?
>>
>> When a burst of errors happen at the same memory address (within a grain), a
>> few memory controllers can merge them into a single report, providing an error
>
> What does "a few memory controllers can merge them into a single report"
> mean exactly? Which few? Do you have an example output?

sb_edac and i7core_edac are two examples. Other memory controllers might have this
feature, but, AFAIKT, no other drivers implement it.

I don't have any example handy.

>> count and a overflow flag, if the burst is higher than the register size.
>>
>>> If you only report single errors, error count will always be 1 so drop
>>> it.
>>
>> No, it is not single errors. It is multiple errors at the same address/grain.
>
> Oh, so the SB MC can report multiple errors with one MCE or whatever it uses?

Yes.

>
>>>> With regards to sb_edac, it also needs another logic to be handled by
>>>> the core: channel_mask. This is required to handle lockstep mode and
>>>> mirror mode.
>>>
>>> What does "handle lockstep mode" mean and how is that important for me
>>> staring at the tracepoint output.
>>
>> That were already explained dozens of time on the patch review threads:
>>
>> The error check on lookstep mode is calculated over a 128 bits cacheline.
>> The memory controller sometimes is not able to distinguish if the error
>> happened on the first channel or on the second channel.
>
> Ah, channel interleaving.

It is due to channel interleaving, but some chipsets provide more than one
ECC algorithm: one algo uses 72 bits for ECC while others use 144 bits for ECC.

On those, when channel interleaving is enabled and lockstep is disabled, it uses
two 72 bits ECC, one for each DIMM, with points to a single DIMM on errors,
at the expense of correcting less errors.

>> So, either one (or the two) envolved DIMMs should be responsible for it.
>
> Ok, so in that case having channel mask in there doesn't begin to
> explain the user what that means - only you and Intel people know that.

With the current way, yes.

> In that case, you probably want to make it much more explicit:
>
> "kworker/u:6-201 [007] .N.. 161.136624: [Hardware Error]: memory read on unknown memory... (... channel: ...)"
>
> like in the case with interleaved csrow controllers or something to that
> effect saying that you cannot pinpoint the DIMM but it could be either
> of the n DIMMs on channels ....
>
> ?

What I intend to do is to display all affected DIMMs, like:

Corrected error: memory read on "DIMM1 or DIMM2" ...

but filtering the "channel_mask" requires more core changes, so I won't start
working on it before merging the current patch series.
It is not only i5100 that have non-error events like that. It is ok to call the event function
as *_handle_error, as this is something internal to the subsystem, but, when outputting to
userspace, we should avoid calling "error" something that it isn't an error.

>>>>> * count is gone
>>>>
>>>> While count is not properly addressed, I'll keep it on sb_edac/i7core_edac.
>>>>
>>>> The right thing here seems to move it to a core property and increment the error counters
>>>> according with the error count.
>>>
>>> Incrementing the error counters shouldn't have anything to do with the
>>> error reporting.
>>
>> Why?
>
> Because incrementing error counters and reporting errors should be two
> different things.

It is not different things: it is just different ways of exporting the same
information. Some tools only use the error counters currently to output memory errors
(see edac-util).

>>>>> * MC-drivers shouldn't say "error" when reporting an error
>>>>> * UE/CE moves into the brackets
>>>>
>>>> It seems that both Tony and me agrees that UE/CE/Fatal should be outside the brackets.
>>>
>>> Why?
>>
>> Also already discussed dozens of time: this is the error severity. The way users handle
>> UE errors are different than the way they handle CE, e. g. CE errors are "tolerable".
>> UE errors might not be.
>
> And why does that warrant having the error type in the beginning of the
> message? I can still read it if it is in the brackets a couple of chars
> later on.

What were agreed (at rebase version 3?) is that user-relevant info is outside parenthesis,
while developers/manufactures relevant info is inside parenthesis.

>>>>> * socket moves earlier in the brackets, and keep the whole deal hierarchical.
>>>>
>>>> Socket doesn't bring a very relevant information. It provides the CPU socket with the core that
>>>> were running the code when an error was generated, not the CPU that were managing the memory.
>>>
>>> Then drop it if it's not relevant.
>>
>> Because irrelevant != "not very relevant".
>>
>> I might eventually drop it on some latter cleanups on sb_edac driver.
>
> You still don't give me a valid, technical reason to keep it.
>
> And yes, it _IS_ black or white: the socket either tells you a valuable
> piece of information which you need for handling the error after
> reporting it, or not, is useless, and no one needs to have it in the
> error log.
>
> It is that simple.

As I said, "rank" on sb_edac is relevant for me, as the driver's author,
when handling issues related to this driver.

Please, don't pretend to know better what's relevant at sb_edac than
the driver's author.

>>>>> * drop "err_code" what is that?
>>>>
>>>> In this case:
>>>> u32 errcode = GET_BITFIELD(m->status, 0, 15);
>>>
>>> Either decode it like I do in amd_decode_err_code() or remove it
>>> completely - no naked bit values.
>>
>> It is decoded, but providing it might help with debugging.
>
> I only see naked bit values, where is it decoded? It should be properly
> decoded in the final error message that the user gets to see.

It is decoded at the sb_driver. See the drivers's code for further details.

>>>>> * drop second "socket"
>>>>> * drop "area" Area "DRAM" - are there other?
>>>>
>>>> Yes. there are 4 types of area at sb_edac.
>>>
>>> And they are?
>>
>> See the driver:
>>
>> static char *get_dram_attr(u32 reg)
>> {
>> switch(DRAM_ATTR(reg)) {
>> case 0:
>> return "DRAM";
>> case 1:
>> return "MMCFG";
>> case 2:
>> return "NXM";
>> default:
>> return "unknown";
>> }
>
> You mean 3 - "unknown" is not a memory region.
>

Yes. So, there are actually 3 types of area.

Regards,
Mauro

Borislav Petkov

unread,
May 15, 2012, 11:10:27 AM5/15/12
to Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Mon, May 14, 2012 at 11:27:09AM -0300, Mauro Carvalho Chehab wrote:
> Hmm... I've already released 2 versions after this one, addressing all the pointed
> issues pointed by you and Tony at:
>
> http://permalink.gmane.org/gmane.linux.kernel/1296116
>
> It would be good if you could comment against the latest patch version.

Ok, here it is:

---
> commit 771823672b7c244b9a57523c955ead9fd87f2412
> Author: Mauro Carvalho Chehab <mch...@redhat.com>
> Date: Thu Feb 23 08:10:34 2012 -0300
>
> RAS: Add a tracepoint for reporting memory controller events
Ah ok, so count, area and rank are driver-specific stuff and they're not part of
the mandatory string output, I guess that's fine.

Here's what an error looks like on my system here:

mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )

There's still this trailing " " at the end of the error line which
shouldn't be there and also two spaces between "channel" and "page".

Also, according to the output above "amd64_edac" is supposed to be
[error msg] which is strange.

I believe this comes from this call in f1x_map_sysaddr_to_csrow():

edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
page, offset, syndrome,
csrow, chan, -1,
EDAC_MOD_STR, "", NULL);

I guess you want to do the following instead:

mcegen.py-2868 [007] .N.. 178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)

maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
corrected/uncorrected error?

Also, make sure the calls in the other drivers don't generate such
strange output.

> For example:
>
> mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
you need a space after "error:"?

> Of course, any userspace tools meant to handle errors should not parse
> the above data. They should, instead, use the binary fields provided by
> the tracepoint, mapping them directly into their MIBs.

What is a MIB?

> NOTE: The original patch was providing an additional mechanism for
> MCA-based trace events that also contained MCA error register data.
> Hoever, as no agreement was reached so far for the MCA-based trace
> events, for now, let's add events only for memory errors.
> A latter patch is planned to change the tracepoint, for those types
> of event.
>
> Reviewed-by: Aristeu Rozanski <aroz...@redhat.com>
> Cc: Doug Thompson <nor...@yahoo.com>
> Cc: Steven Rostedt <ros...@goodmis.org>
> Cc: Frederic Weisbecker <fwei...@gmail.com>
> Cc: Ingo Molnar <mi...@redhat.com>
> Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>
>
> diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
> index f06ce9ab692c..eee73605c5a0 100644
> --- a/drivers/edac/edac_core.h
> +++ b/drivers/edac/edac_core.h
> @@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
> const int layer2,
> const char *msg,
> const char *other_detail,
> - const void *mcelog);
> + const void *arch_log);
>
> /*
> * edac_device APIs
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index e5b55632359f..eb078ec62044 100644
analyze it.
> index 000000000000..66f6a43151dc
--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 15, 2012, 12:06:18 PM5/15/12
to Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Yep.

>
> Here's what an error looks like on my system here:
>
> mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
>
> There's still this trailing " " at the end of the error line which
> shouldn't be there and also two spaces between "channel" and "page".

If you take a look at the trace printk:

+ TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
+ (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+ ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+ "Fatal" : "Uncorrected"),
+ __get_str(msg),
+ __get_str(label),
+ __entry->mc_index,
+ __get_str(location),
+ __get_str(detail),
+ __get_str(driver_detail))

There are not extra spaces there. The first extra space is probably because
there is an extra space at the label string. This should be easy to fix.

The other extra space at the end is because amd64 currently doesn't provide
driver_detail information.

> Also, according to the output above "amd64_edac" is supposed to be
> [error msg] which is strange.
>
> I believe this comes from this call in f1x_map_sysaddr_to_csrow():
>
> edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> page, offset, syndrome,
> csrow, chan, -1,
> EDAC_MOD_STR, "", NULL);
>
> I guess you want to do the following instead:
>
> mcegen.py-2868 [007] .N.. 178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
>
> maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
> corrected/uncorrected error?

The issue here is because amd64_edac (just like a few other drivers) use
its driver name (EDAC_MOD_STR) as the error message, instead of using
something meaningful, like "read error" or "ECC error".

> Also, make sure the calls in the other drivers don't generate such
> strange output.

The same kind of strange output is also at the printk's, and it is there
even with the current calls: the output syntax is broken on several drivers,
and fixing for some will break for the others.

So, this needs to be reviewed driver-per-driver. I'll handle that with the
machines I have for test. For the other drivers, maybe we can just fill
the error_msg information with "error".

Anyway, I intend to work on that after merging this huge patch series.
As Tony said, a change on that is not trivial.

>> For example:
>>
>> mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)
> you need a space after "error:"?
>
>> Of course, any userspace tools meant to handle errors should not parse
>> the above data. They should, instead, use the binary fields provided by
>> the tracepoint, mapping them directly into their MIBs.
>
> What is a MIB?

Management Information Base. This is how anyone that works with Element
Management calls the model of information that represents each management
property. It is generally written using ITU-T ASN.1 syntax. Almost all
management software use that.

[1] http://en.wikipedia.org/wiki/Management_information_base
Analyse is the same as analyze [2].

[2] http://dictionary.reference.com/browse/analyse

Borislav Petkov

unread,
May 15, 2012, 12:39:19 PM5/15/12
to Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Tue, May 15, 2012 at 01:05:48PM -0300, Mauro Carvalho Chehab wrote:
> > Here's what an error looks like on my system here:
> >
> > mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> >
> > There's still this trailing " " at the end of the error line which
> > shouldn't be there and also two spaces between "channel" and "page".
>
> If you take a look at the trace printk:
>
> + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
> + (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> + ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> + "Fatal" : "Uncorrected"),
> + __get_str(msg),
> + __get_str(label),
> + __entry->mc_index,
> + __get_str(location),
> + __get_str(detail),
> + __get_str(driver_detail))
>
> There are not extra spaces there. The first extra space is probably because
> there is an extra space at the label string. This should be easy to fix.
>
> The other extra space at the end is because amd64 currently doesn't provide
> driver_detail information.

Remind me again why do we need two strings: detail and driver_detail?

Because they could very well be lumped together with a single "%s"
format - "(mc:%d %s)" - and be printed.

And detail will always contain something which is not the empty string,
so problem solved.

> > Also, according to the output above "amd64_edac" is supposed to be
> > [error msg] which is strange.
> >
> > I believe this comes from this call in f1x_map_sysaddr_to_csrow():
> >
> > edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
> > page, offset, syndrome,
> > csrow, chan, -1,
> > EDAC_MOD_STR, "", NULL);
> >
> > I guess you want to do the following instead:
> >
> > mcegen.py-2868 [007] .N.. 178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
> >
> > maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
> > corrected/uncorrected error?
>
> The issue here is because amd64_edac (just like a few other drivers) use
> its driver name (EDAC_MOD_STR) as the error message, instead of using
> something meaningful, like "read error" or "ECC error".

No, the issue is here that edac_mc_handle_ce() used to say "CE..."
and edac_mc_handle_ue() used to say "UE.. " and yours don't say that
anymore. In other words, you need to add the "CE/UE" thing to the string
based on the HW_EVENT_ERR_* flag or something to that effect.

[ … ]

> >> Of course, any userspace tools meant to handle errors should not parse
> >> the above data. They should, instead, use the binary fields provided by
> >> the tracepoint, mapping them directly into their MIBs.
> >
> > What is a MIB?
>
> Management Information Base. This is how anyone that works with Element
> Management calls the model of information that represents each management
> property. It is generally written using ITU-T ASN.1 syntax. Almost all
> management software use that.
>
> [1] http://en.wikipedia.org/wiki/Management_information_base

That looks like an ACPI or some other idiotic spec speak, pls remove it.

[ … ]

> >> + * edac_mc_handle_error - reports a memory event to userspace
> >> + *
> >> + * @type: severity of the error (CE/UE/Fatal)
> >> + * @mci: a struct mem_ctl_info pointer
> >> + * @page_frame_number: mem page where the error occurred
> >> + * @offset_in_page: offset of the error inside the page
> >> + * @syndrome: ECC syndrome
> >> + * @layer0: Memory layer0 position
> >> + * @layer1: Memory layer2 position
> >> + * @layer2: Memory layer3 position
> >> + * @msg: Message meaningful to the end users that
> >> + * explains the event
> >> + * @other_detail: Technical details about the event that
> >> + * may help hardware manufacturers and
> >> + * EDAC developers to analyse the event
> >
> > analyze it.
>
> Analyse is the same as analyze [2].

I know that. What I meant is

s/EDAC developers to analyse the event/EDAC developers to analyse it/

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 16, 2012, 7:22:32 AM5/16/12
to Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 15-05-2012 13:38, Borislav Petkov escreveu:
> On Tue, May 15, 2012 at 01:05:48PM -0300, Mauro Carvalho Chehab wrote:
>>> Here's what an error looks like on my system here:
>>>
>>> mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
>>>
>>> There's still this trailing " " at the end of the error line which
>>> shouldn't be there and also two spaces between "channel" and "page".
>>
>> If you take a look at the trace printk:
>>
>> + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
>> + (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
>> + ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
>> + "Fatal" : "Uncorrected"),
>> + __get_str(msg),
>> + __get_str(label),
>> + __entry->mc_index,
>> + __get_str(location),
>> + __get_str(detail),
>> + __get_str(driver_detail))
>>
>> There are not extra spaces there. The first extra space is probably because
>> there is an extra space at the label string. This should be easy to fix.
>>
>> The other extra space at the end is because amd64 currently doesn't provide
>> driver_detail information.
>
> Remind me again why do we need two strings: detail and driver_detail?
>
> Because they could very well be lumped together with a single "%s"
> format - "(mc:%d %s)" - and be printed.

As already explained, after merging two different fields, they can't be easily
unmerged.

The main reason for moving from printk to tracepoint is to allow userspace
programs to get data in binary format, avoiding them to parse a printk message.

So, it is more important to provide the right information at the trace fields
than the niceness of the printk data.

>
> And detail will always contain something which is not the empty string,
> so problem solved.
>
>>> Also, according to the output above "amd64_edac" is supposed to be
>>> [error msg] which is strange.
>>>
>>> I believe this comes from this call in f1x_map_sysaddr_to_csrow():
>>>
>>> edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
>>> page, offset, syndrome,
>>> csrow, chan, -1,
>>> EDAC_MOD_STR, "", NULL);
>>>
>>> I guess you want to do the following instead:
>>>
>>> mcegen.py-2868 [007] .N.. 178.261607: mc_event: amd64_edac: corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)
>>>
>>> maybe concatenate EDAC_MOD_STR with the proper string it reports, i.e.
>>> corrected/uncorrected error?
>>
>> The issue here is because amd64_edac (just like a few other drivers) use
>> its driver name (EDAC_MOD_STR) as the error message, instead of using
>> something meaningful, like "read error" or "ECC error".
>
> No, the issue is here that edac_mc_handle_ce() used to say "CE..."
> and edac_mc_handle_ue() used to say "UE.. " and yours don't say that
> anymore. In other words, you need to add the "CE/UE" thing to the string
> based on the HW_EVENT_ERR_* flag or something to that effect.


Huh?

>>> mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )

CE/UE is there. The only change is that, instead of using acronyms, it is now
saying "Corrected error"/"Uncorrected error", as the idea is to provide something
that the user will better understand.

> [ … ]
>
>>>> Of course, any userspace tools meant to handle errors should not parse
>>>> the above data. They should, instead, use the binary fields provided by
>>>> the tracepoint, mapping them directly into their MIBs.
>>>
>>> What is a MIB?
>>
>> Management Information Base. This is how anyone that works with Element
>> Management calls the model of information that represents each management
>> property. It is generally written using ITU-T ASN.1 syntax. Almost all
>> management software use that.
>>
>> [1] http://en.wikipedia.org/wiki/Management_information_base
>
> That looks like an ACPI or some other idiotic spec speak, pls remove it.

No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP,
and defined by ITU-T (M.3000 series) and by ISO.

It is used by everybody that knows a little bit about error management,
as all management systems and protocols (like SNMP) are based on MIB
concepts. This concept is also used by other international forums, like
IETF, to define the Information Model to be used to manage the machine
and/or network resources. For example, the TCP/IP stack management base
is defined at IETF RFC1213.

>
> [ … ]
>
>>>> + * edac_mc_handle_error - reports a memory event to userspace
>>>> + *
>>>> + * @type: severity of the error (CE/UE/Fatal)
>>>> + * @mci: a struct mem_ctl_info pointer
>>>> + * @page_frame_number: mem page where the error occurred
>>>> + * @offset_in_page: offset of the error inside the page
>>>> + * @syndrome: ECC syndrome
>>>> + * @layer0: Memory layer0 position
>>>> + * @layer1: Memory layer2 position
>>>> + * @layer2: Memory layer3 position
>>>> + * @msg: Message meaningful to the end users that
>>>> + * explains the event
>>>> + * @other_detail: Technical details about the event that
>>>> + * may help hardware manufacturers and
>>>> + * EDAC developers to analyse the event
>>>
>>> analyze it.
>>
>> Analyse is the same as analyze [2].
>
> I know that. What I meant is
>
> s/EDAC developers to analyse the event/EDAC developers to analyse it/
>

Ah, OK, I'll fix that.

Regards,
Mauro

Steven Rostedt

unread,
May 16, 2012, 8:48:19 AM5/16/12
to Borislav Petkov, Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker, Ingo Molnar
On Tue, 2012-05-15 at 18:38 +0200, Borislav Petkov wrote:
> On Tue, May 15, 2012 at 01:05:48PM -0300, Mauro Carvalho Chehab wrote:
> > > Here's what an error looks like on my system here:
> > >
> > > mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> > >
> > > There's still this trailing " " at the end of the error line which
> > > shouldn't be there and also two spaces between "channel" and "page".
> >
> > If you take a look at the trace printk:
> >
> > + TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s %s)",
> > + (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> > + ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> > +
> > + __get_str(msg),
> > + __get_str(label),
> > + __entry->mc_index,
> > + __get_str(location),
> > + __get_str(detail),
> > + __get_str(driver_detail))
> >
> > There are not extra spaces there. The first extra space is probably because
> > there is an extra space at the label string. This should be easy to fix.
> >
> > The other extra space at the end is because amd64 currently doesn't provide
> > driver_detail information.
>
> Remind me again why do we need two strings: detail and driver_detail?
>
> Because they could very well be lumped together with a single "%s"
> format - "(mc:%d %s)" - and be printed.
>
> And detail will always contain something which is not the empty string,
> so problem solved.

Here's another trick if you want to get rid of the space and keep both
fields:

TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s%s%s)",
(__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
((__entry->err_type == HW_EVENT_ERR_FATAL) ?
"Fatal" : "Uncorrected"),

__get_str(msg),
__get_str(label),
__entry->mc_index,
__get_str(location),
__get_str(detail),
strlen(__get_str(detail)) &&
strlen(__get_str(driver_detail) ? " ": "",
__get_str(driver_detail))

-- Steve

Borislav Petkov

unread,
May 16, 2012, 9:17:40 AM5/16/12
to Mauro Carvalho Chehab, Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
This doesn't answer my question. My question was: "why can't 'detail'
and 'driver_detail' be a single parameter, i.e. 'detail' and this way
solve both pretty printing and getting binary data problems?
Ok, so let's change the string output from the above version to:

"mcegen.py-2868 [007] .N.. 178.261607: mc_event: amd64_edac: Corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)"

I.e., "mc_event" [DRIVER] [ERROR TYPE] [DETAIL].

> >>>> Of course, any userspace tools meant to handle errors should not parse
> >>>> the above data. They should, instead, use the binary fields provided by
> >>>> the tracepoint, mapping them directly into their MIBs.
> >>>
> >>> What is a MIB?
> >>
> >> Management Information Base. This is how anyone that works with Element
> >> Management calls the model of information that represents each management
> >> property. It is generally written using ITU-T ASN.1 syntax. Almost all
> >> management software use that.
> >>
> >> [1] http://en.wikipedia.org/wiki/Management_information_base
> >
> > That looks like an ACPI or some other idiotic spec speak, pls remove it.
>
> No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP,
> and defined by ITU-T (M.3000 series) and by ISO.
>
> It is used by everybody that knows a little bit about error management,
> as all management systems and protocols (like SNMP) are based on MIB
> concepts. This concept is also used by other international forums, like
> IETF, to define the Information Model to be used to manage the machine
> and/or network resources. For example, the TCP/IP stack management base
> is defined at IETF RFC1213.

More idiotic drivel.

Oh, well, you won't let go of this crap, so at least add the wikipedia
URL to the "MIB" reference so that mere mortals like me who don't know
jack of error management can understand what you're talking about.

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Steven Rostedt

unread,
May 16, 2012, 9:28:25 AM5/16/12
to Borislav Petkov, Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker, Ingo Molnar
On Wed, 2012-05-16 at 15:16 +0200, Borislav Petkov wrote:

> > >>>> Of course, any userspace tools meant to handle errors should not parse
> > >>>> the above data. They should, instead, use the binary fields provided by
> > >>>> the tracepoint, mapping them directly into their MIBs.
> > >>>
> > >>> What is a MIB?
> > >>
> > >> Management Information Base. This is how anyone that works with Element
> > >> Management calls the model of information that represents each management
> > >> property. It is generally written using ITU-T ASN.1 syntax. Almost all
> > >> management software use that.
> > >>
> > >> [1] http://en.wikipedia.org/wiki/Management_information_base
> > >
> > > That looks like an ACPI or some other idiotic spec speak, pls remove it.
> >
> > No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP,
> > and defined by ITU-T (M.3000 series) and by ISO.
> >
> > It is used by everybody that knows a little bit about error management,
> > as all management systems and protocols (like SNMP) are based on MIB
> > concepts. This concept is also used by other international forums, like
> > IETF, to define the Information Model to be used to manage the machine
> > and/or network resources. For example, the TCP/IP stack management base
> > is defined at IETF RFC1213.
>
> More idiotic drivel.
>
> Oh, well, you won't let go of this crap, so at least add the wikipedia
> URL to the "MIB" reference so that mere mortals like me who don't know
> jack of error management can understand what you're talking about.

I should change goodmis to goodmib. That sounds better. For those that
don't know, goodmis comes from a project I started a long time ago.

Good Object Oriented Database Management Information System

Which was inspired by a database I wrote at a company I worked for in
the 90's. Called "Object Based Information v1" or OBI-1.

:-)

-- Steve

Borislav Petkov

unread,
May 16, 2012, 9:33:07 AM5/16/12
to Steven Rostedt, Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker, Ingo Molnar
On Wed, May 16, 2012 at 09:27:51AM -0400, Steven Rostedt wrote:
> I should change goodmis to goodmib. That sounds better. For those that
> don't know, goodmis comes from a project I started a long time ago.

I always thought "goodmis" meant the good thing you missed :-) And I
wanted to ask you about it the next time we're at a conf somewhere.

> Good Object Oriented Database Management Information System

Which one was the bad OODBMIS?

> Which was inspired by a database I wrote at a company I worked for in
> the 90's. Called "Object Based Information v1" or OBI-1.

LOOL.

For the same reason, x86 hw guys came up with an instruction called LEA.

:-)

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 16, 2012, 11:16:58 AM5/16/12
to Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
This is the 24th version of this very same patch... We started reviewing this patch
in January... All arguments about why this is a separate data were already said and
discussed.

In summary: all edac messages provide "detail" as this contains the error location
in terms of channel/slot. So, any MIB for EDAC could handle those parameters properly.
With regards to driver_detail, this have per-driver details. So, per-driver MIB is
required for them, if some userspace program wants to properly store that information.

Merging those two separate fields together only makes harder for userspace to store
the error detail information on their MIB.
As I've explained, there is a consistency issue with the [ERROR MSG]. On the ones
that properly implement it, [ERROR MSG] is "read error", "spare copy", "bus error", ...

If you think we need to add an extra field for the driver name, then let's add a
"driver_name" field there, but a driver's name shouldn't be merged with an
error type message, as it is abusing the syntax, and the syntax of each field
should be clear enough to allow it to be stored on a MIB.

In any case, those drivers that fill the error msg with something else should
be changes to write there something like "ECC error" instead of "amd64_edac:".

Btw, I'm almost convinced that we should break the "detail" into its components,
e. g., instead of one string, it should be:
int layer0, layer1, layer2; /* Location */
unsigned long memory_address; /* or maybe page/offset */
u32 grain;
unsigned syndrome;

That would be better from MIB point of view.

>
>>>>>> Of course, any userspace tools meant to handle errors should not parse
>>>>>> the above data. They should, instead, use the binary fields provided by
>>>>>> the tracepoint, mapping them directly into their MIBs.
>>>>>
>>>>> What is a MIB?
>>>>
>>>> Management Information Base. This is how anyone that works with Element
>>>> Management calls the model of information that represents each management
>>>> property. It is generally written using ITU-T ASN.1 syntax. Almost all
>>>> management software use that.
>>>>
>>>> [1] http://en.wikipedia.org/wiki/Management_information_base
>>>
>>> That looks like an ACPI or some other idiotic spec speak, pls remove it.
>>
>> No, MIB is not an idiotic speak. It is a widely-used term, older than TCP/IP,
>> and defined by ITU-T (M.3000 series) and by ISO.
>>
>> It is used by everybody that knows a little bit about error management,
>> as all management systems and protocols (like SNMP) are based on MIB
>> concepts. This concept is also used by other international forums, like
>> IETF, to define the Information Model to be used to manage the machine
>> and/or network resources. For example, the TCP/IP stack management base
>> is defined at IETF RFC1213.
>
> More idiotic drivel.
>
> Oh, well, you won't let go of this crap, so at least add the wikipedia
> URL to the "MIB" reference so that mere mortals like me who don't know
> jack of error management can understand what you're talking about.

Ok, I'll add it.

Regards,
Mauro

Mauro Carvalho Chehab

unread,
May 16, 2012, 11:24:18 AM5/16/12
to Steven Rostedt, Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker, Ingo Molnar
Great! I'll use that trick, thanks!
>
> -- Steve
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-edac" in

Borislav Petkov

unread,
May 16, 2012, 11:47:32 AM5/16/12
to Mauro Carvalho Chehab, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Wed, May 16, 2012 at 12:16:35PM -0300, Mauro Carvalho Chehab wrote:
> > This doesn't answer my question. My question was: "why can't 'detail'
> > and 'driver_detail' be a single parameter, i.e. 'detail' and this way
> > solve both pretty printing and getting binary data problems?
>
> This is the 24th version of this very same patch...

This would have been the case if you'd split your patches and sent them
in 10-15 patches sets like normal people, _after_ gathering all review feedback.

But, you wanted to fix EDAC and the whole world while at it and besides,
your patches caused boot errors here and there.

Then, you went and rebased the whole patchset after me reviewing one or
two and incremented for that rebase the version number.

So v24 means nothing to me - you might just as well use it for your
internal tracking.

> In summary: all edac messages provide "detail" as this contains the
> error location in terms of channel/slot. So, any MIB for EDAC could
> handle those parameters properly. With regards to driver_detail, this
> have per-driver details. So, per-driver MIB is required for them, if
> some userspace program wants to properly store that information.
>
> Merging those two separate fields together only makes harder for
> userspace to store the error detail information on their MIB.

There's that MIB crap again.

And it doesn't make it harder for anything because in userspace you can
do everything with those strings, cut them, replace them, whatever your
heart desires, even store the correct error detail information "on their
MIB." Basically, you have one string in userspace and you can massage
the hell out of it and even fit it to the MIB or whatever...

[ … ]

> >>>>> mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
> >>
> >> CE/UE is there. The only change is that, instead of using acronyms, it is now
> >> saying "Corrected error"/"Uncorrected error", as the idea is to provide something
> >> that the user will better understand.
> >
> > Ok, so let's change the string output from the above version to:
> >
> > "mcegen.py-2868 [007] .N.. 178.261607: mc_event: amd64_edac: Corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)"
> >
> > I.e., "mc_event" [DRIVER] [ERROR TYPE] [DETAIL].
>
> As I've explained, there is a consistency issue with the [ERROR MSG]. On the ones
> that properly implement it, [ERROR MSG] is "read error", "spare copy", "bus error", ...
>
> If you think we need to add an extra field for the driver name, then let's add a
> "driver_name" field there, but a driver's name shouldn't be merged with an
> error type message, as it is abusing the syntax, and the syntax of each field
> should be clear enough to allow it to be stored on a MIB.
>
> In any case, those drivers that fill the error msg with something else should
> be changes to write there something like "ECC error" instead of "amd64_edac:".
>
> Btw, I'm almost convinced that we should break the "detail" into its components,
> e. g., instead of one string, it should be:
> int layer0, layer1, layer2; /* Location */
> unsigned long memory_address; /* or maybe page/offset */
> u32 grain;
> unsigned syndrome;
>
> That would be better from MIB point of view.

Dude, enough with this MIB crap already!

Simply move the EDAC_MOD_STR thing earlier in the tracepoint so that you
can get:

"mc_event:" EDAC_MOD_STR [ERROR_TYPE] [DETAIL]

EDAC_MOD_STR is _always_ the driver name:

0 amd64_edac.h 148 #define EDAC_MOD_STR "amd64_edac"
1 amd76x_edac.c 23 #define EDAC_MOD_STR "amd76x_edac"
2 amd8111_edac.c 37 #define AMD8111_EDAC_MOD_STR "amd8111_edac"
3 amd8131_edac.c 37 #define AMD8131_EDAC_MOD_STR "amd8131_edac"
4 cpc925_edac.c 34 #define CPC925_EDAC_MOD_STR "cpc925_edac"
5 e752x_edac.c 28 #define EDAC_MOD_STR "e752x_edac"
6 e7xxx_edac.c 33 #define EDAC_MOD_STR "e7xxx_edac"
7 i3000_edac.c 21 #define EDAC_MOD_STR "i3000_edac"
8 i3200_edac.c 22 #define EDAC_MOD_STR "i3200_edac"
9 i5000_edac.c 31 #define EDAC_MOD_STR "i5000_edac"
a i5400_edac.c 38 #define EDAC_MOD_STR "i5400_edac"
b i7300_edac.c 36 #define EDAC_MOD_STR "i7300_edac"
c i7core_edac.c 65 #define EDAC_MOD_STR "i7core_edac"
d i82443bxgx_edac.c 36 #define EDAC_MOD_STR "i82443bxgx_edac"
e i82860_edac.c 20 #define EDAC_MOD_STR "i82860_edac"
f i82875p_edac.c 24 #define EDAC_MOD_STR "i82875p_edac"
g i82975x_edac.c 20 #define EDAC_MOD_STR "i82975x_edac"
h mpc85xx_edac.h 15 #define EDAC_MOD_STR "MPC85xx_edac"
i mv64x60_edac.h 16 #define EDAC_MOD_STR "MV64x60_edac"
j r82600_edac.c 26 #define EDAC_MOD_STR "r82600_edac"
k sb_edac.c 38 #define EDAC_MOD_STR "sbridge_edac"
l x38_edac.c 21 #define EDAC_MOD_STR "x38_edac"

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Steven Rostedt

unread,
May 16, 2012, 1:05:23 PM5/16/12
to Mauro Carvalho Chehab, Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Frederic Weisbecker, Ingo Molnar
On Wed, 2012-05-16 at 12:24 -0300, Mauro Carvalho Chehab wrote:

> > Here's another trick if you want to get rid of the space and keep both
> > fields:
> >
> > TP_printk("%s error:%s on memory stick \"%s\" (mc:%d %s %s%s%s)",
> > (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> > ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
> > "Fatal" : "Uncorrected"),
> >
> > __get_str(msg),
> > __get_str(label),
> > __entry->mc_index,
> > __get_str(location),
> > __get_str(detail),
> > strlen(__get_str(detail)) &&
> > strlen(__get_str(driver_detail) ? " ": "",
> > __get_str(driver_detail))
>
> Great! I'll use that trick, thanks!

strlen() may be too overblown. The following should work and is more
efficient:

(((char *)__get_str(detail))[0] &&
((char *)__get_str(driver_detail))[0]) ? " " : "",

-- Steve


--

Mauro Carvalho Chehab

unread,
May 16, 2012, 2:08:54 PM5/16/12
to Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 16-05-2012 12:47, Borislav Petkov escreveu:
> On Wed, May 16, 2012 at 12:16:35PM -0300, Mauro Carvalho Chehab wrote:
>>> This doesn't answer my question. My question was: "why can't 'detail'
>>> and 'driver_detail' be a single parameter, i.e. 'detail' and this way
>>> solve both pretty printing and getting binary data problems?
>>
>> This is the 24th version of this very same patch...
>
> This would have been the case if you'd split your patches and sent them
> in 10-15 patches sets like normal people, _after_ gathering all review feedback.

The first version had only 3 patches. Patches were being added there due to
the huge delay to get them reviewed/accepted.

> But, you wanted to fix EDAC and the whole world while at it and besides,
> your patches caused boot errors here and there.
>
> Then, you went and rebased the whole patchset after me reviewing one or
> two and incremented for that rebase the version number.

Because you requested that by not accepting incremental patches at the end
of the series.

> So v24 means nothing to me - you might just as well use it for your
> internal tracking.
>
>> In summary: all edac messages provide "detail" as this contains the
>> error location in terms of channel/slot. So, any MIB for EDAC could
>> handle those parameters properly. With regards to driver_detail, this
>> have per-driver details. So, per-driver MIB is required for them, if
>> some userspace program wants to properly store that information.
>>
>> Merging those two separate fields together only makes harder for
>> userspace to store the error detail information on their MIB.
>
> There's that MIB crap again.
>
> And it doesn't make it harder for anything because in userspace you can
> do everything with those strings, cut them, replace them, whatever your
> heart desires, even store the correct error detail information "on their
> MIB." Basically, you have one string in userspace and you can massage
> the hell out of it and even fit it to the MIB or whatever...

The rationale behind providing binary information instead of a printk is
to avoid kernel to spend time with printk formatting and userspace to
parse it and try to recover the original data.

If this weren't a requirement, the better would be to just not use any
tracepoint for errors at all.

Also, the API should be handled in a way that it will work on userspace.

I'm foreseen the need of implementing the error counters on userspace
on my plans for the edac-utils userspace (or at least, some advanced
error counter logic, like burst detection, decay, etc).

For that to be properly implemented, userspace may need to get access
to each of the components of the "detail" field.

>
> [ … ]
>
>>>>>>> mcegen.py-2868 [007] .N.. 178.261607: mc_event: Corrected error:amd64_edac on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed )
>>>>
>>>> CE/UE is there. The only change is that, instead of using acronyms, it is now
>>>> saying "Corrected error"/"Uncorrected error", as the idea is to provide something
>>>> that the user will better understand.
>>>
>>> Ok, so let's change the string output from the above version to:
>>>
>>> "mcegen.py-2868 [007] .N.. 178.261607: mc_event: amd64_edac: Corrected error on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0x34ed)"
>>>
>>> I.e., "mc_event" [DRIVER] [ERROR TYPE] [DETAIL].
>>
>> As I've explained, there is a consistency issue with the [ERROR MSG]. On the ones
>> that properly implement it, [ERROR MSG] is "read error", "spare copy", "bus error", ...
>>
>> If you think we need to add an extra field for the driver name, then let's add a
>> "driver_name" field there, but a driver's name shouldn't be merged with an
>> error type message, as it is abusing the syntax, and the syntax of each field
>> should be clear enough to allow it to be stored on a MIB.
>>
>> In any case, those drivers that fill the error msg with something else should
>> be changes to write there something like "ECC error" instead of "amd64_edac:".
>>
>> Btw, I'm almost convinced that we should break the "detail" into its components,
>> e. g., instead of one string, it should be:
>> int layer0, layer1, layer2; /* Location */
>> unsigned long memory_address; /* or maybe page/offset */
>> u32 grain;
>> unsigned syndrome;
>>
>> That would be better from MIB point of view.
>
> Dude, enough with this MIB crap already!

Your feedback about the patches are very welcome, but it seems that you
have no experience with MIB handling and/or with userspace logic for
error management, so, please stop nonsense about something that doesn't
seem to be on your expertise area.

> Simply move the EDAC_MOD_STR thing earlier in the tracepoint so that you
> can get:
>
> "mc_event:" EDAC_MOD_STR [ERROR_TYPE] [DETAIL]
>
> EDAC_MOD_STR is _always_ the driver name:
>
> 0 amd64_edac.h 148 #define EDAC_MOD_STR "amd64_edac"
> 1 amd76x_edac.c 23 #define EDAC_MOD_STR "amd76x_edac"

> 2 amd8111_edac.c 37 #define AMD8111_EDAC_MOD_STR "amd8111_edac"
> 3 amd8131_edac.c 37 #define AMD8131_EDAC_MOD_STR "amd8131_edac"
> 4 cpc925_edac.c 34 #define CPC925_EDAC_MOD_STR "cpc925_edac"

Those above use another name.

> 5 e752x_edac.c 28 #define EDAC_MOD_STR "e752x_edac"
> 6 e7xxx_edac.c 33 #define EDAC_MOD_STR "e7xxx_edac"
> 7 i3000_edac.c 21 #define EDAC_MOD_STR "i3000_edac"
> 8 i3200_edac.c 22 #define EDAC_MOD_STR "i3200_edac"
> 9 i5000_edac.c 31 #define EDAC_MOD_STR "i5000_edac"
> a i5400_edac.c 38 #define EDAC_MOD_STR "i5400_edac"
> b i7300_edac.c 36 #define EDAC_MOD_STR "i7300_edac"
> c i7core_edac.c 65 #define EDAC_MOD_STR "i7core_edac"
> d i82443bxgx_edac.c 36 #define EDAC_MOD_STR "i82443bxgx_edac"
> e i82860_edac.c 20 #define EDAC_MOD_STR "i82860_edac"
> f i82875p_edac.c 24 #define EDAC_MOD_STR "i82875p_edac"
> g i82975x_edac.c 20 #define EDAC_MOD_STR "i82975x_edac"
> h mpc85xx_edac.h 15 #define EDAC_MOD_STR "MPC85xx_edac"
> i mv64x60_edac.h 16 #define EDAC_MOD_STR "MV64x60_edac"
> j r82600_edac.c 26 #define EDAC_MOD_STR "r82600_edac"
> k sb_edac.c 38 #define EDAC_MOD_STR "sbridge_edac"
> l x38_edac.c 21 #define EDAC_MOD_STR "x38_edac"
>

Ok, I'll add a driver_name field with *EDAC_MOD_STR, on a separate
patchset (as this will require touching, again, on every single
driver).

Regards,
Mauro

Borislav Petkov

unread,
May 16, 2012, 4:00:18 PM5/16/12
to Mauro Carvalho Chehab, Borislav Petkov, Luck, Tony, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Wed, May 16, 2012 at 01:52:21PM -0300, Mauro Carvalho Chehab wrote:
> Em 16-05-2012 12:47, Borislav Petkov escreveu:
> > On Wed, May 16, 2012 at 12:16:35PM -0300, Mauro Carvalho Chehab wrote:
> >>> This doesn't answer my question. My question was: "why can't 'detail'
> >>> and 'driver_detail' be a single parameter, i.e. 'detail' and this way
> >>> solve both pretty printing and getting binary data problems?
> >>
> >> This is the 24th version of this very same patch...
> >
> > This would have been the case if you'd split your patches and sent them
> > in 10-15 patches sets like normal people, _after_ gathering all review feedback.
>
> The first version had only 3 patches. Patches were being added there due to
> the huge delay to get them reviewed/accepted.
>
> > But, you wanted to fix EDAC and the whole world while at it and besides,
> > your patches caused boot errors here and there.
> >
> > Then, you went and rebased the whole patchset after me reviewing one or
> > two and incremented for that rebase the version number.
>
> Because you requested that by not accepting incremental patches at the end
> of the series.

No, because this is how patch series are done. Incremental patches are
for people who get paid on the amount of patches they get into the
kernel.

> > So v24 means nothing to me - you might just as well use it for your
> > internal tracking.
> >
> >> In summary: all edac messages provide "detail" as this contains the
> >> error location in terms of channel/slot. So, any MIB for EDAC could
> >> handle those parameters properly. With regards to driver_detail, this
> >> have per-driver details. So, per-driver MIB is required for them, if
> >> some userspace program wants to properly store that information.
> >>
> >> Merging those two separate fields together only makes harder for
> >> userspace to store the error detail information on their MIB.
> >
> > There's that MIB crap again.
> >
> > And it doesn't make it harder for anything because in userspace you can
> > do everything with those strings, cut them, replace them, whatever your
> > heart desires, even store the correct error detail information "on their
> > MIB." Basically, you have one string in userspace and you can massage
> > the hell out of it and even fit it to the MIB or whatever...
>
> The rationale behind providing binary information instead of a printk is
> to avoid kernel to spend time with printk formatting and userspace to
> parse it and try to recover the original data.
>
> If this weren't a requirement, the better would be to just not use any
> tracepoint for errors at all.
>
> Also, the API should be handled in a way that it will work on userspace.

No, userspace will be doing parsing because it is the only sensible
thing to do. The kernel's job is to carry out enough information for
the user to handle the error in a way which is just enough. No more, no
less. It is in no f*cking way expected to make it pretty and in suitable
portions so that userspace can consume it.

Ok, this took longer than I thought and I'm getting really tired of the
crap so let me save you the trouble:

* either make the RAS tracepoint patch the way I'm suggesting.

* or give a really good reason for doing it differently (and yes,
userspace will be doing parsing).

* or consider it nacked.

It is that simple.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Luck, Tony

unread,
May 16, 2012, 4:27:33 PM5/16/12
to Borislav Petkov, Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
> No, userspace will be doing parsing because it is the only sensible
> thing to do. The kernel's job is to carry out enough information for
> the user to handle the error in a way which is just enough. No more, no
> less. It is in no f*cking way expected to make it pretty and in suitable
> portions so that userspace can consume it.

My requirement on prettiness was just that the information useful
to a normal system owner come first. Enough to know how serious the
problem is, and which component is affected. The fine details must
be in the "( ... )" part - and will generally only be of interest
to very sophisticated users.

We've reached that goal.

Looking at the next part ... which is how to write useful analysis
tools that study these logs. We have some common elements in trace
records - but a there is (deliberately) plenty of scope for drivers
to add their own customizations. If these bits are going to be
used by the user mode tools - then they will have per-driver
parsing code that will be tightly linked to the version of the driver.

Are we happy with this? Once the patch goes it, we have created an
ABI which will be hard to change.

-Tony

Borislav Petkov

unread,
May 16, 2012, 5:05:38 PM5/16/12
to Luck, Tony, Borislav Petkov, Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Wed, May 16, 2012 at 08:27:24PM +0000, Luck, Tony wrote:
> > No, userspace will be doing parsing because it is the only sensible
> > thing to do. The kernel's job is to carry out enough information for
> > the user to handle the error in a way which is just enough. No more, no
> > less. It is in no f*cking way expected to make it pretty and in suitable
> > portions so that userspace can consume it.
>
> My requirement on prettiness was just that the information useful
> to a normal system owner come first. Enough to know how serious the
> problem is, and which component is affected. The fine details must
> be in the "( ... )" part - and will generally only be of interest
> to very sophisticated users.
>
> We've reached that goal.
>
> Looking at the next part ... which is how to write useful analysis
> tools that study these logs. We have some common elements in trace
> records - but a there is (deliberately) plenty of scope for drivers
> to add their own customizations. If these bits are going to be
> used by the user mode tools - then they will have per-driver
> parsing code that will be tightly linked to the version of the driver.

This is exactly what I'm trying to explain - the kernel carries out the
information, userspace parses it the way it sees fit.

> Are we happy with this? Once the patch goes it, we have created an
> ABI which will be hard to change.

Yes, very true.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 17, 2012, 5:28:23 PM5/17/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
For example:

mc_event: Corrected error:memory read on memory stick "DIMM_1A" (mc:0 channel:0 slot:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their MIBs.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
Hoever, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Cc: Aristeu Rozanski <aroz...@redhat.com>
Cc: Doug Thompson <nor...@yahoo.com>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>
---

v24b: Remove extra whitespaces and improve messages on amd64_edac.

drivers/edac/amd64_edac.c | 29 ++++++++--------
drivers/edac/amd64_edac.h | 3 +-
drivers/edac/edac_core.h | 2 +-
drivers/edac/edac_mc.c | 56 +++++++++++++++++++++++++------
include/ras/ras_event.h | 80 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 143 insertions(+), 27 deletions(-)
create mode 100644 include/ras/ras_event.h

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 60479f9..5be52e1 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1052,8 +1052,8 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
page, offset, syndrome,
-1, -1, -1,
- EDAC_MOD_STR,
"failed to map error addr to a node",
+ EDAC_MOD_STR,
NULL);
return;
}
@@ -1064,8 +1064,8 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
page, offset, syndrome,
-1, -1, -1,
- EDAC_MOD_STR,
"failed to map error addr to a csrow",
+ EDAC_MOD_STR,
NULL);
return;
}
@@ -1085,8 +1085,8 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
page, offset, syndrome,
csrow, -1, -1,
- EDAC_MOD_STR,
"unknown syndrome - possible error reporting race",
+ EDAC_MOD_STR,
NULL);
return;
}
@@ -1105,7 +1105,7 @@ static void k8_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, src_mci,
page, offset, syndrome,
csrow, channel, -1,
- EDAC_MOD_STR, "", NULL);
+ "", EDAC_MOD_STR, NULL);
}

static int ddr2_cs_size(unsigned i, bool dct_width)
@@ -1595,8 +1595,8 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
page, offset, syndrome,
-1, -1, -1,
- EDAC_MOD_STR,
"failed to map error addr to a csrow",
+ EDAC_MOD_STR,
NULL);
return;
}
@@ -1612,7 +1612,7 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
page, offset, syndrome,
csrow, chan, -1,
- EDAC_MOD_STR, "", NULL);
+ "", EDAC_MOD_STR, NULL);
}

/*
@@ -1896,8 +1896,8 @@ static void amd64_handle_ce(struct mem_ctl_info *mci, struct mce *m)
edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
0, 0, 0,
-1, -1, -1,
- EDAC_MOD_STR,
"HW has no ERROR_ADDRESS available",
+ EDAC_MOD_STR,
NULL);
return;
}
@@ -1925,8 +1925,8 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
0, 0, 0,
-1, -1, -1,
- EDAC_MOD_STR,
"HW has no ERROR_ADDRESS available",
+ EDAC_MOD_STR,
NULL);
return;
}
@@ -1945,8 +1945,9 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
page, offset, 0,
-1, -1, -1,
+ "ERROR ADDRESS NOT mapped to a MC",
EDAC_MOD_STR,
- "ERROR ADDRESS NOT mapped to a MC", NULL);
+ NULL);
return;
}

@@ -1959,14 +1960,14 @@ static void amd64_handle_ue(struct mem_ctl_info *mci, struct mce *m)
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
page, offset, 0,
-1, -1, -1,
- EDAC_MOD_STR,
"ERROR ADDRESS NOT mapped to CS",
+ EDAC_MOD_STR,
NULL);
} else {
edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
page, offset, 0,
csrow, -1, -1,
- EDAC_MOD_STR, "", NULL);
+ "", EDAC_MOD_STR, NULL);
}
}

@@ -2471,7 +2472,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;

mci->edac_cap = amd64_determine_edac_cap(pvt);
- mci->mod_name = EDAC_MOD_STR;
+ mci->mod_name = EDAC_DRIVER_STR;
mci->mod_ver = EDAC_AMD64_VERSION;
mci->ctl_name = fam->ctl_name;
mci->dev_name = pci_name(pvt->F2);
@@ -2731,7 +2732,7 @@ static const struct pci_device_id amd64_pci_table[] __devinitdata = {
MODULE_DEVICE_TABLE(pci, amd64_pci_table);

static struct pci_driver amd64_pci_driver = {
- .name = EDAC_MOD_STR,
+ .name = EDAC_DRIVER_STR,
.probe = amd64_probe_one_instance,
.remove = __devexit_p(amd64_remove_one_instance),
.id_table = amd64_pci_table,
@@ -2750,7 +2751,7 @@ static void setup_pci_device(void)

pvt = mci->pvt_info;
amd64_ctl_pci =
- edac_pci_create_generic_ctl(&pvt->F2->dev, EDAC_MOD_STR);
+ edac_pci_create_generic_ctl(&pvt->F2->dev, EDAC_DRIVER_STR);

if (!amd64_ctl_pci) {
pr_warning("%s(): Unable to create PCI control\n",
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 9a666cb..acea42c 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -145,7 +145,8 @@
*/

#define EDAC_AMD64_VERSION "3.4.0"
-#define EDAC_MOD_STR "amd64_edac"
+#define EDAC_DRIVER_STR "amd64_edac"
+#define EDAC_MOD_STR "driver:" EDAC_DRIVER_STR

/* Extended Model from CPUID, for CPU Revision numbers */
#define K8_REV_D 1
diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog);
+ const void *arch_log);

/*
* edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 7246a3c..461f6f8 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -33,6 +33,10 @@
#include "edac_core.h"
#include "edac_module.h"

+#define CREATE_TRACE_POINTS
+#define TRACE_INCLUDE_PATH ../../include/ras
+#include <ras/ras_event.h>
+
/* lock to memory controller's control array */
static DEFINE_MUTEX(mem_ctls_mutex);
static LIST_HEAD(mc_devices);
@@ -384,6 +388,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
* which will perform kobj unregistration and the actual free
* will occur during the kobject callback operation
*/
+
return mci;
}
EXPORT_SYMBOL_GPL(edac_mc_alloc);
@@ -909,12 +914,12 @@ static void edac_ce_error(struct mem_ctl_info *mci,
if (edac_mc_get_log_ce()) {
if (other_detail && *other_detail)
edac_mc_printk(mci, KERN_WARNING,
- "CE %s on %s (%s%s - %s)\n",
+ "CE %s on %s (%s %s - %s)\n",
msg, label, location,
detail, other_detail);
else
edac_mc_printk(mci, KERN_WARNING,
- "CE %s on %s (%s%s)\n",
+ "CE %s on %s (%s %s)\n",
msg, label, location,
detail);
}
@@ -953,12 +958,12 @@ static void edac_ue_error(struct mem_ctl_info *mci,
if (edac_mc_get_log_ue()) {
if (other_detail && *other_detail)
edac_mc_printk(mci, KERN_WARNING,
- "UE %s on %s (%s%s - %s)\n",
+ "UE %s on %s (%s %s - %s)\n",
msg, label, location, detail,
other_detail);
else
edac_mc_printk(mci, KERN_WARNING,
- "UE %s on %s (%s%s)\n",
+ "UE %s on %s (%s %s)\n",
msg, label, location, detail);
}

@@ -975,6 +980,27 @@ static void edac_ue_error(struct mem_ctl_info *mci,
}

#define OTHER_LABEL " or "
+
+/**
+ * edac_mc_handle_error - reports a memory event to userspace
+ *
+ * @type: severity of the error (CE/UE/Fatal)
+ * @mci: a struct mem_ctl_info pointer
+ * @page_frame_number: mem page where the error occurred
+ * @offset_in_page: offset of the error inside the page
+ * @syndrome: ECC syndrome
+ * @layer0: Memory layer0 position
+ * @layer1: Memory layer2 position
+ * @layer2: Memory layer3 position
+ * @msg: Message meaningful to the end users that
+ * explains the event
+ * @other_detail: Technical details about the event that
+ * may help hardware manufacturers and
+ * EDAC developers to analyse the event
+ * @arch_log: Architecture-specific struct that can
+ * be used to add extended information to the
+ * tracepoint, like dumping MCE registers.
+ */
void edac_mc_handle_error(const enum hw_event_mc_err_type type,
struct mem_ctl_info *mci,
const unsigned long page_frame_number,
@@ -985,7 +1011,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog)
+ const void *arch_log)
{
/* FIXME: too much for stack: move it to some pre-alocated area */
char detail[80], location[80];
@@ -1120,23 +1146,31 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
edac_layer_name[mci->layers[i].type],
pos[i]);
}
+ if (p > location)
+ *(p - 1) = '\0';
index 0000000..2f9069d
--- /dev/null
+++ b/include/ras/ras_event.h
@@ -0,0 +1,80 @@
+ TP_printk("%s error:%s%s on memory stick \"%s\" (mc:%d %s %s%s%s)",
+ (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+ ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+ "Fatal" : "Uncorrected"),
+ ((char *)__get_str(msg))[0] ? " " : "",
+ __get_str(msg),
+ __get_str(label),
+ __entry->mc_index,
+ __get_str(location),
+ __get_str(detail),
+ ((char *)__get_str(driver_detail))[0] ? " " : "",
+ __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.7.8

Borislav Petkov

unread,
May 17, 2012, 5:49:23 PM5/17/12
to Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Nacked-by: Borislav Petkov <borisla...@amd.com>

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Ingo Molnar

unread,
May 18, 2012, 3:12:57 AM5/18/12
to Borislav Petkov, Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Just wondering why this got nacked, and what the
suggestions/plans are to improve the situation: I assume Mauro
is working on these things to solve problems, or to add
features, Mauro could you please give a higher level list of
those problems or features? There must be more to it than just a
new tracepoint! :-)

Thanks,

Ingo

Borislav Petkov

unread,
May 18, 2012, 5:57:10 AM5/18/12
to Ingo Molnar, Borislav Petkov, Mauro Carvalho Chehab, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 18, 2012 at 09:12:44AM +0200, Ingo Molnar wrote:
> > > Of course, any userspace tools meant to handle errors should not parse
> > > the above data. They should, instead, use the binary fields provided by
> > > the tracepoint, mapping them directly into their MIBs.
> >
> > Nacked-by: Borislav Petkov <borisla...@amd.com>
>
> Just wondering why this got nacked, and what the
> suggestions/plans are to improve the situation:

Basically this is the thread which lead to it: http://marc.info/?l=linux-kernel&m=133709477524773&w=2

> I assume Mauro is working on these things to solve problems, or to
> add features, Mauro could you please give a higher level list of
> those problems or features? There must be more to it than just a new
> tracepoint! :-)

My main objection was that the tracepoint to report errors from edac
contains the following prototype:

+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ const char *location,
+ const char *core_detail,
+ const char *driver_detail),

and that the last args should be merged simply into one 'const char
*detail' which every driver can populate as it sees fit.

But Mauro did not want to parse the string in userspace but feed it
straight into a MIB (which could mean "Men In Black" for all I know),
right from the tracepoint:

> Of course, any userspace tools meant to handle errors should not parse
> the above data. They should, instead, use the binary fields provided by
> the tracepoint, mapping them directly into their MIBs.

And I wanted to have a generic, usable-for-all tracepoint output
which anyone in userspace can parse, decode, cut, paste as she sees
fit without forcing kernel output formatting into any abstract error
management hierarchy or whatever.

As Tony put it, we need to hammer that out properly now before it
becomes an ABI.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 18, 2012, 7:00:46 AM5/18/12
to Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
The "detail" field above is, in fact, a group of the following integer
values:

- physical address of the error, provided as 3 integers: page, offset an
the error offset granularity;

- ECC syndrome, with has a XOR value used to determine how many and what
bit(s) have errors.

"driver_detail" contains other driver-specific details about the errors.

The "location" field is also given by 3 integers: layer0, layer1, layer2 with
contains the branch/slot/channel or slot/channel or chip select row/channel
position of the affected DIMM.

On my last comment, I said that I'm convinced that those two fields should
be broken into their integer components.

The fields at "detail" and "location" are generated by a snprintf() logic
inside the EDAC core. "driver_detail" is filled by the drivers.

I did that merge in order to provide a clearer output message, but this
doesn't help userspace tools that could get the binary information at
the trace directly.

There's a conceptual issue here, and it seems that Boris is not understanding,
probably because he doesn't have any experience on handling errors on userspace.

In this point, my previous experience on writing network management
and my work at the userspace EDAC tool helps me to see both faces of
the coin.

Currently, EDAC prints an error message, and increments some Kernel counters
to indicate where the error is located.

Userspace tools can either get the dmesg logs or read the error counters.
The edac-tools (with is the only public tool for EDAC that I'm aware of)
prefers to rely on the error counters, instead of parsing the error logs.

I suspect that one of the reasons is because printk messages are not a
consistent ABI, as changing them are not considered as a Kernel regression.

There are several issues on relying at the Kernel error counters: they don't
have decay or any other kind of logic that would allow detecting bursts.

So, on a machine running for 30 days with, let's say, 10 corrected errors,
it is not possible to tell if they were all generated on a burst (because
of some temporary interference) or if they are sparse errors generated at
the same DRAM, indicating a bad memory.

The big advantage (maybe the only advantage) of using a tracepoint is that
the binary data can be exported directly.

If the location and memory location integers are exported as integers, it
is simple to change the edac-tools to work with tracepoints, instead of working
with the error counters.

With this simple change, the tool can be improved to provide a more
consistent memory error report, as userspace should not be worried on
parsing fields that may change in future without notice.

Also, doing that will avoid the extra effort of joining everything into
a single string, and then breaking them back into their individual fields
on userspace.

> but feed it
> straight into a MIB (which could mean "Men In Black" for all I know),
> right from the tracepoint:

I can do a s/MIB/Management Information Base/. We can also avoid all other
acronyms that have more than one sense, like RAS (with all network
people will associate it with "Remote Access Service").

>> Of course, any userspace tools meant to handle errors should not parse
>> the above data. They should, instead, use the binary fields provided by
>> the tracepoint, mapping them directly into their MIBs.
>
> And I wanted to have a generic, usable-for-all tracepoint output
> which anyone in userspace can parse, decode, cut, paste as she sees
> fit without forcing kernel output formatting into any abstract error
> management hierarchy or whatever.

s/printk/trace/? That doesn't sound a good idea. The advantage of
tracepoints of printk's is that they provide an ABI that allows passing
strucutured information to userspace.

> As Tony put it, we need to hammer that out properly now before it
> becomes an ABI.

Yes, sure.

Regards,
Mauro

Borislav Petkov

unread,
May 18, 2012, 8:44:05 AM5/18/12
to Mauro Carvalho Chehab, Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 18, 2012 at 07:59:55AM -0300, Mauro Carvalho Chehab wrote:
> The "detail" field above is, in fact, a group of the following integer
> values:
>
> - physical address of the error, provided as 3 integers: page, offset an
> the error offset granularity;
>
> - ECC syndrome, with has a XOR value used to determine how many and what
> bit(s) have errors.
>
> "driver_detail" contains other driver-specific details about the errors.
>
> The "location" field is also given by 3 integers: layer0, layer1, layer2 with
> contains the branch/slot/channel or slot/channel or chip select row/channel
> position of the affected DIMM.
>
> On my last comment, I said that I'm convinced that those two fields should
> be broken into their integer components.
>
> The fields at "detail" and "location" are generated by a snprintf() logic
> inside the EDAC core. "driver_detail" is filled by the drivers.
>
> I did that merge in order to provide a clearer output message, but this
> doesn't help userspace tools that could get the binary information at
> the trace directly.
>
> There's a conceptual issue here, and it seems that Boris is not understanding,
> probably because he doesn't have any experience on handling errors on userspace.

Probably you're too stuck up with your error handling "experience" and
you fail to see that I'm trying to make this as generic as possible and
not fit it into anything so that EVERYONE can use it.

> In this point, my previous experience on writing network management
> and my work at the userspace EDAC tool helps me to see both faces of
> the coin.
>
> Currently, EDAC prints an error message, and increments some Kernel counters
> to indicate where the error is located.
>
> Userspace tools can either get the dmesg logs or read the error counters.
> The edac-tools (with is the only public tool for EDAC that I'm aware of)
> prefers to rely on the error counters, instead of parsing the error logs.
>
> I suspect that one of the reasons is because printk messages are not a
> consistent ABI, as changing them are not considered as a Kernel regression.

Oh yeah, teach me more about that, that's like not appearing maybe once
a week on lkml but what do I know...

> There are several issues on relying at the Kernel error counters: they
> don't have decay or any other kind of logic that would allow detecting
> bursts.

Guess what, you can do decay very successfully in userspace.

> So, on a machine running for 30 days with, let's say, 10 corrected
> errors, it is not possible to tell if they were all generated on a
> burst (because of some temporary interference) or if they are sparse
> errors generated at the same DRAM, indicating a bad memory.

Why not, you don't have timestamps?

> The big advantage (maybe the only advantage) of using a tracepoint is that
> the binary data can be exported directly.

Really?? You don't say. Teach me more about tracepoints and why it is a
good thing to use them for error reporting.

> If the location and memory location integers are exported as integers,
> it is simple to change the edac-tools to work with tracepoints,
> instead of working with the error counters.

Yeah, so?

> With this simple change, the tool can be improved to provide a more
> consistent memory error report, as userspace should not be worried on
> parsing fields that may change in future without notice.

Which userspace, your userspace? What happens if another userspace comes
with slightly different error formatting requirements? We change the
kernel again or we can't because this tracepoint is being used and we
add another tracepoint and let the bloat begin?

> Also, doing that will avoid the extra effort of joining everything into
> a single string, and then breaking them back into their individual fields
> on userspace.

I'm being told this is very easy to do in userspace in your garden
variety language.

> > but feed it
> > straight into a MIB (which could mean "Men In Black" for all I know),
> > right from the tracepoint:
>
> I can do a s/MIB/Management Information Base/. We can also avoid all other
> acronyms that have more than one sense, like RAS (with all network
> people will associate it with "Remote Access Service").

Oh, I know, MIB=Mauro Is Best.

> >> Of course, any userspace tools meant to handle errors should not parse
> >> the above data. They should, instead, use the binary fields provided by
> >> the tracepoint, mapping them directly into their MIBs.
> >
> > And I wanted to have a generic, usable-for-all tracepoint output
> > which anyone in userspace can parse, decode, cut, paste as she sees
> > fit without forcing kernel output formatting into any abstract error
> > management hierarchy or whatever.
>
> s/printk/trace/? That doesn't sound a good idea. The advantage of
> tracepoints of printk's is that they provide an ABI that allows passing
> strucutured information to userspace.

WTF? Maybe you don't understand what I'm talking about at all and I'm
wasting my time completely.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 18, 2012, 9:24:03 AM5/18/12
to Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 18-05-2012 09:43, Borislav Petkov escreveu:

(offensive/ironic comments skipped)

> you fail to see that I'm trying to make this as generic as possible and
> not fit it into anything so that EVERYONE can use it.

Tracepoints are generic enough. Everyone can use it. If need new fields are
needed, just add a new tracepoint. Converting it into a new printk way
won't make it usable by everyone. It will just make the ABI unstable.

For sure the fields required for memory errors are different than the ones
required by PCI, CPU or other types of hardware.

>> There are several issues on relying at the Kernel error counters: they
>> don't have decay or any other kind of logic that would allow detecting
>> bursts.
>
> Guess what, you can do decay very successfully in userspace.

Yes, if and only if userspace receives the errors on a proper way.

>> So, on a machine running for 30 days with, let's say, 10 corrected
>> errors, it is not possible to tell if they were all generated on a
>> burst (because of some temporary interference) or if they are sparse
>> errors generated at the same DRAM, indicating a bad memory.
>
> Why not, you don't have timestamps?

At the error counters? No.

There are timestamps at printk's, but printk's can't be reliably parsed, as
formats are not consistent among drivers, and the "ABI" can break on every
Kernel release.

>> With this simple change, the tool can be improved to provide a more
>> consistent memory error report, as userspace should not be worried on
>> parsing fields that may change in future without notice.
>
> Which userspace, your userspace? What happens if another userspace comes
> with slightly different error formatting requirements? We change the
> kernel again or we can't because this tracepoint is being used and we
> add another tracepoint and let the bloat begin?

As said before, this is not about error formatting requirements! TP_printk() macro
does formatting. I'm not discussing TP_printk() output.

All Userspace tools require access to the error fields.

The real issue with tracepoint is to provide access to the structured data that
userspace needs (so, what fields should be exported), and not about what printk
formats. With regards to the printk format there's already an agreement that
seems to satisfy you, me and Tony.

By adding all fields there at the tracepoint, no changes will be needed inside
Kernel to satisfy any userspace requirements, and the ABI will be reliable and
stable.

>> Also, doing that will avoid the extra effort of joining everything into
>> a single string, and then breaking them back into their individual fields
>> on userspace.
>
> I'm being told this is very easy to do in userspace in your garden
> variety language.

Well, ask the one that told you that to write a parser that will get all
EDAC error reports on all drivers, on several kernel versions using dmesg.
This is not easy, as the messages are not consistent, the right fields are
sometimes at the wrong places, and different kernel versions have different
printk's.

Just to give you an example, this is how sb_edac currently outputs an error:

EDAC MC1: CE row 10, channel 0, label "DIMM_5": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=6 Err=0008:00c2 (ch=2), addr = 0x22719d0780 => socket=1, Channel=3(mask=8), rank=4

The above error happened at the memory controlled by the second CPU socket,
at slot #1 of channel 3.

A similar report for amd64_edac (if amd64 would have 4 channels) would be:

EDAC MC1: amd64_edac: row 1, channel 3, label "DIMM_5" [...]

E. g. the error location fields are on different places. Writing a parser for
that would require to look inside each driver, and will require parsers
rewrite on each kernel version (for example, on this patch series, the '='
delimiter were replaced by ':' due to upstream feedback).

I think we merged some printk change for sb_edac on 3.4. If so, this driver,
that started on 3.3, will have 3 different printk ABI variants: one for
3.3, one for 3.4 and another for 3.5.

Also, as the "mask" needs to be properly parsed, 3.6 will also have yet another
printk format.

Regards,
Mauro

Borislav Petkov

unread,
May 18, 2012, 10:05:47 AM5/18/12
to Mauro Carvalho Chehab, Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 18, 2012 at 10:23:39AM -0300, Mauro Carvalho Chehab wrote:
> Em 18-05-2012 09:43, Borislav Petkov escreveu:
>
> (offensive/ironic comments skipped)
>
> > you fail to see that I'm trying to make this as generic as possible and
> > not fit it into anything so that EVERYONE can use it.
>
> Tracepoints are generic enough. Everyone can use it. If need new fields are
> needed, just add a new tracepoint.

This is exactly what I'm talking about - this is wrong on so many levels
that it ain't even funny! You can add as many tracepoints if you want
to to your pet project but in the kernel we add one tracepoint for one
thing which fits all users.

I want to see you be successful at adding a new tracepoint
to, say, prepare_task_switch() in kernel/sched/core.c because
trace_sched_switch() does not fit your needs.

> Converting it into a new printk way won't make it usable by everyone.
> It will just make the ABI unstable.

WTF? We're not even talking about converting it into a new printk!

[ … ]

> > Guess what, you can do decay very successfully in userspace.
>
> Yes, if and only if userspace receives the errors on a proper way.

Doh, am I talking to the wall here, do you even read what I'm sayin?

> >> So, on a machine running for 30 days with, let's say, 10 corrected
> >> errors, it is not possible to tell if they were all generated on a
> >> burst (because of some temporary interference) or if they are sparse
> >> errors generated at the same DRAM, indicating a bad memory.
> >
> > Why not, you don't have timestamps?
>
> At the error counters? No.
>
> There are timestamps at printk's, but printk's can't be reliably parsed, as
> formats are not consistent among drivers, and the "ABI" can break on every
> Kernel release.

Yes, enough already! We know about the printks! The whole world knows.

> >> With this simple change, the tool can be improved to provide a more
> >> consistent memory error report, as userspace should not be worried on
> >> parsing fields that may change in future without notice.
> >
> > Which userspace, your userspace? What happens if another userspace comes
> > with slightly different error formatting requirements? We change the
> > kernel again or we can't because this tracepoint is being used and we
> > add another tracepoint and let the bloat begin?
>
> As said before, this is not about error formatting requirements! TP_printk() macro
> does formatting. I'm not discussing TP_printk() output.
>
> All Userspace tools require access to the error fields.
>
> The real issue with tracepoint is to provide access to the structured data that
> userspace needs (so, what fields should be exported), and not about what printk
> formats. With regards to the printk format there's already an agreement that
> seems to satisfy you, me and Tony.
>
> By adding all fields there at the tracepoint, no changes will be needed inside
> Kernel to satisfy any userspace requirements, and the ABI will be reliable and
> stable.

Right, and this is why I'm asking you to have the following tracepoint proto:

+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ const char *location,
+ const char *detail)

where detail contains all the crap one driver adds for technical people
to pinpoint where the error is.

And not have _TWO_ detail arguments!

Btw, the output looks like this here:

<...>-2723 [001] .N.. 89.107045: mc_event: Corrected error: on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0xfc5b driver:amd64_edac)

Come to think of it, the "driver:amd64_edac" is not really needed
because on every single system there's only one EDAC driver running and
I don't think the fact that we're telling in the tracepoint who detected
the error is meaningfull information.

Which means, you can remove the EDAC_MOD_STR argument you're passing to
edac_mc_handle_error() and have one less argument.

> >> Also, doing that will avoid the extra effort of joining everything into
> >> a single string, and then breaking them back into their individual fields
> >> on userspace.
> >
> > I'm being told this is very easy to do in userspace in your garden
> > variety language.
>
> Well, ask the one that told you that to write a parser that will get all
> EDAC error reports on all drivers, on several kernel versions using dmesg.

Nobody is talking about dmesg - I'm talking about the "const char
*detail" string in the tracepoint above and how I want it to be a single
char * so that userspace can read it out and fumble with it the way it
sees fit.

Having "detail" and "other_detail" is simply misleading and unneeded and
a clear sign that this ABI is not designed properly yet.

That's it.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 18, 2012, 10:32:17 AM5/18/12
to Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 18-05-2012 11:05, Borislav Petkov escreveu:
> On Fri, May 18, 2012 at 10:23:39AM -0300, Mauro Carvalho Chehab wrote:
>> Em 18-05-2012 09:43, Borislav Petkov escreveu:
>>
>> (offensive/ironic comments skipped)
>>
>>> you fail to see that I'm trying to make this as generic as possible and
>>> not fit it into anything so that EVERYONE can use it.
>>
>> Tracepoints are generic enough. Everyone can use it. If need new fields are
>> needed, just add a new tracepoint.
>
> This is exactly what I'm talking about - this is wrong on so many levels
> that it ain't even funny! You can add as many tracepoints if you want
> to to your pet project but in the kernel we add one tracepoint for one
> thing which fits all users.
>
> I want to see you be successful at adding a new tracepoint
> to, say, prepare_task_switch() in kernel/sched/core.c because
> trace_sched_switch() does not fit your needs.

Ok, but you won't use trace_sched_switch() as a memory tracepoint, as
they represent different things.

Memory errors are different than CPU errors. So, their tracepoints
will be different.

<skip>

> Right, and this is why I'm asking you to have the following tracepoint proto:
>
> + TP_PROTO(const unsigned int err_type,
> + const unsigned int mc_index,
> + const char *error_msg,
> + const char *label,
> + const char *location,
> + const char *detail)
>
> where detail contains all the crap one driver adds for technical people
> to pinpoint where the error is.
>
> And not have _TWO_ detail arguments!

And what I'm saying is that this should be, instead:

+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ int layer0,
+ int layer1,
+ int layer2,
+ unsigned long pfn,
+ unsigned long offset,
+ unsigned long grain,
+ unsigned long syndrome,
+ const char *driver_detail),

So, having just one detail argument, filled by the driver, and not
folding "location" and core "details" into strings, but keeping as they
are.

>
> Btw, the output looks like this here:
>
> <...>-2723 [001] .N.. 89.107045: mc_event: Corrected error: on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0xfc5b driver:amd64_edac)
>
> Come to think of it, the "driver:amd64_edac" is not really needed
> because on every single system there's only one EDAC driver running and
> I don't think the fact that we're telling in the tracepoint who detected
> the error is meaningfull information.
>
> Which means, you can remove the EDAC_MOD_STR argument you're passing to
> edac_mc_handle_error() and have one less argument.

That's what I said you, but you didn't seem to agree, as I understood that
you've required to keep "amd64_edac" at the trace, due to:
http://markmail.org/message/nr3ooep7gc7mhgdl.

If you're ok, I'll remove EDAC_MOD_STR argument from the amd64_edac calls
on a separate patch (with can be merged latter with the patch that converted
amd64_edac to the new function calls).

>
>>>> Also, doing that will avoid the extra effort of joining everything into
>>>> a single string, and then breaking them back into their individual fields
>>>> on userspace.
>>>
>>> I'm being told this is very easy to do in userspace in your garden
>>> variety language.
>>
>> Well, ask the one that told you that to write a parser that will get all
>> EDAC error reports on all drivers, on several kernel versions using dmesg.
>
> Nobody is talking about dmesg - I'm talking about the "const char
> *detail" string in the tracepoint above and how I want it to be a single
> char * so that userspace can read it out and fumble with it the way it
> sees fit.
>
> Having "detail" and "other_detail" is simply misleading and unneeded and
> a clear sign that this ABI is not designed properly yet.
>
> That's it.

I agree with that.

It follows a version removing the "core_detail" parameter from the trace.

I removed the changes at amd64_edac from it. I'll address them on a latter
patch, removing the EDAC_MOD_STR argument from the calls, after we've
done with this patch.

Regards,
Mauro

-

RAS: Add a tracepoint for reporting memory controller events

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their Management Information
Base.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
However, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Cc: Aristeu Rozanski <aroz...@redhat.com>
Cc: Doug Thompson <nor...@yahoo.com>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mch...@redhat.com>

diff --git a/drivers/edac/edac_core.h b/drivers/edac/edac_core.h
index f06ce9a..eee7360 100644
--- a/drivers/edac/edac_core.h
+++ b/drivers/edac/edac_core.h
@@ -468,7 +468,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const int layer2,
const char *msg,
const char *other_detail,
- const void *mcelog);
+ const void *arch_log);

/*
* edac_device APIs
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 10f3750..5006461 100644
@@ -1120,6 +1146,13 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
edac_layer_name[mci->layers[i].type],
pos[i]);
}
+ if (p > location)
+ *(p - 1) = '\0';
+
+ /* Report the error via the trace interface */
+ trace_mc_event(type, mci->mc_idx, msg, label, layer0, layer1, layer2,
+ page_frame_number, offset_in_page, grain, syndrome,
+ other_detail);

/* Memory type dependent details about the error */
if (type == HW_EVENT_ERR_CORRECTED) {
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
new file mode 100644
index 0000000..65835df
--- /dev/null
+++ b/include/ras/ras_event.h
@@ -0,0 +1,100 @@
+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ int layer0,
+ int layer1,
+ int layer2,
+ unsigned long pfn,
+ unsigned long offset,
+ unsigned long grain,
+ unsigned long syndrome,
+ const char *driver_detail),
+
+ TP_ARGS(err_type, mc_index, error_msg, label, layer0, layer1, layer2,
+ pfn, offset, grain, syndrome, driver_detail),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, err_type )
+ __field( unsigned int, mc_index )
+ __string( msg, error_msg )
+ __string( label, label )
+ __field( int, layer0 )
+ __field( int, layer1 )
+ __field( int, layer2 )
+ __field( int, pfn )
+ __field( int, offset )
+ __field( int, grain )
+ __field( int, syndrome )
+ __string( driver_detail, driver_detail )
+ ),
+
+ TP_fast_assign(
+ __entry->err_type = err_type;
+ __entry->mc_index = mc_index;
+ __assign_str(msg, error_msg);
+ __assign_str(label, label);
+ __entry->layer0 = layer0;
+ __entry->layer1 = layer1;
+ __entry->layer2 = layer2;
+ __entry->pfn = pfn;
+ __entry->offset = offset;
+ __entry->grain = grain;
+ __entry->syndrome = syndrome;
+ __assign_str(driver_detail, driver_detail);
+ ),
+
+ TP_printk("%s error:%s%s on memory stick \"%s\" (mc:%d location:%d:%d:%d page:0x%08x offset:0x%08x grain:%d syndrome:0x%08x%s%s)",
+ (__entry->err_type == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+ ((__entry->err_type == HW_EVENT_ERR_FATAL) ?
+ "Fatal" : "Uncorrected"),
+ ((char *)__get_str(msg))[0] ? " " : "",
+ __get_str(msg),
+ __get_str(label),
+ __entry->mc_index,
+ __entry->layer0,
+ __entry->layer1,
+ __entry->layer2,
+ __entry->pfn,
+ __entry->offset,
+ __entry->grain,
+ __entry->syndrome,
+ ((char *)__get_str(driver_detail))[0] ? " " : "",
+ __get_str(driver_detail))
+);
+
+#endif /* _TRACE_HW_EVENT_MC_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>

Borislav Petkov

unread,
May 18, 2012, 12:40:45 PM5/18/12
to Mauro Carvalho Chehab, Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 18, 2012 at 11:31:46AM -0300, Mauro Carvalho Chehab wrote:
> Ok, but you won't use trace_sched_switch() as a memory tracepoint, as
> they represent different things.
>
> Memory errors are different than CPU errors. So, their tracepoints
> will be different.

WTF? A tracepoint is a tracepoint.
And this way you're enforcing an interface that all drivers will have
to adhere to. What if "grain" doesn't mean a thing for a driver, or
"syndrome" or whatever? What if some other entity wants to use that
tracepoint?

See what I'm sayin?

Having

TP_PROTO(const unsigned int err_type,
const unsigned int mc_index,
const char *error_msg,
const char *label,
const char *location,
const char *detail)

is a bit more generic and userspace can parse it however it likes.

Actually, I'd slim this up even more:

TP_PROTO(const unsigned int mc_index,
const char *error_msg,
const char *label,
const char *location,
const char *detail)

and have error_msg contain the "Corrected/Uncorrected/Fatal" things
and this way you can drop all the ternary operators in the tracepoint
definition.

> > Btw, the output looks like this here:
> >
> > <...>-2723 [001] .N.. 89.107045: mc_event: Corrected error: on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0xfc5b driver:amd64_edac)
> >
> > Come to think of it, the "driver:amd64_edac" is not really needed
> > because on every single system there's only one EDAC driver running and
> > I don't think the fact that we're telling in the tracepoint who detected
> > the error is meaningfull information.
> >
> > Which means, you can remove the EDAC_MOD_STR argument you're passing to
> > edac_mc_handle_error() and have one less argument.
>
> That's what I said you, but you didn't seem to agree, as I understood that
> you've required to keep "amd64_edac" at the trace, due to:
> http://markmail.org/message/nr3ooep7gc7mhgdl.
>
> If you're ok, I'll remove EDAC_MOD_STR argument from the amd64_edac calls
> on a separate patch (with can be merged latter with the patch that converted
> amd64_edac to the new function calls).

Ok.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 18, 2012, 1:27:39 PM5/18/12
to Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 18-05-2012 13:40, Borislav Petkov escreveu:
> On Fri, May 18, 2012 at 11:31:46AM -0300, Mauro Carvalho Chehab wrote:
>> Ok, but you won't use trace_sched_switch() as a memory tracepoint, as
>> they represent different things.
>>
>> Memory errors are different than CPU errors. So, their tracepoints
>> will be different.
>
> WTF? A tracepoint is a tracepoint.

This is the event you've mentioned:

TRACE_EVENT(sched_switch,

TP_PROTO(struct task_struct *prev,
struct task_struct *next),

It doesn't have the same arguments of a memory tracepoint, like:

TRACE_EVENT(mm_vmscan_wakeup_kswapd,

TP_PROTO(int nid, int zid, int order),

trace events for different things will have different arguments.
This enforcement is already there at the EDAC API/ABI. This patch series
won't change that.

Memory controllers that can't get the error address (page/offset/grain)
or the syndrome fills those information with 0.

Maybe there is a driver deficiency (or a bad hardware implementation)
that would not allow to report where the error happened, but that doesn't
mean that we should create a crappy API due to bad hardware.

>
> See what I'm sayin?
>
> Having
>
> TP_PROTO(const unsigned int err_type,
> const unsigned int mc_index,
> const char *error_msg,
> const char *label,
> const char *location,
> const char *detail)
>
> is a bit more generic and userspace can parse it however it likes.
>
> Actually, I'd slim this up even more:
>
> TP_PROTO(const unsigned int mc_index,
> const char *error_msg,
> const char *label,
> const char *location,
> const char *detail)

Well, this even more generic:
TP_PROTO(const char *msg)

but the more we convert other fields into strings, the harder is for
userspace to handle it, as fields may be on different order, string
conversion patches might change or break the parsing behavior,
different drivers will require different parsing expressions, etc.

> and have error_msg contain the "Corrected/Uncorrected/Fatal" things
> and this way you can drop all the ternary operators in the tracepoint
> definition.
>
>>> Btw, the output looks like this here:
>>>
>>> <...>-2723 [001] .N.. 89.107045: mc_event: Corrected error: on memory stick "unknown memory" (mc:0 csrow:3 channel:1 page:0x5bac7 offset:0x388 grain:0 syndrome:0xfc5b driver:amd64_edac)
>>>
>>> Come to think of it, the "driver:amd64_edac" is not really needed
>>> because on every single system there's only one EDAC driver running and
>>> I don't think the fact that we're telling in the tracepoint who detected
>>> the error is meaningfull information.
>>>
>>> Which means, you can remove the EDAC_MOD_STR argument you're passing to
>>> edac_mc_handle_error() and have one less argument.
>>
>> That's what I said you, but you didn't seem to agree, as I understood that
>> you've required to keep "amd64_edac" at the trace, due to:
>> http://markmail.org/message/nr3ooep7gc7mhgdl.
>>
>> If you're ok, I'll remove EDAC_MOD_STR argument from the amd64_edac calls
>> on a separate patch (with can be merged latter with the patch that converted
>> amd64_edac to the new function calls).
>
> Ok.

Ok, I'll prepare a patch for it after we finish discussing this patch.

Regards,
Mauro

Borislav Petkov

unread,
May 18, 2012, 2:53:24 PM5/18/12
to Mauro Carvalho Chehab, Borislav Petkov, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 18, 2012 at 02:27:10PM -0300, Mauro Carvalho Chehab wrote:
> Em 18-05-2012 13:40, Borislav Petkov escreveu:
> > On Fri, May 18, 2012 at 11:31:46AM -0300, Mauro Carvalho Chehab wrote:
> >> Ok, but you won't use trace_sched_switch() as a memory tracepoint, as
> >> they represent different things.
> >>
> >> Memory errors are different than CPU errors. So, their tracepoints
> >> will be different.
> >
> > WTF? A tracepoint is a tracepoint.
>
> This is the event you've mentioned:
>
> TRACE_EVENT(sched_switch,
>
> TP_PROTO(struct task_struct *prev,
> struct task_struct *next),
>
> It doesn't have the same arguments of a memory tracepoint, like:
>
> TRACE_EVENT(mm_vmscan_wakeup_kswapd,
>
> TP_PROTO(int nid, int zid, int order),
>
> trace events for different things will have different arguments.

It seems you and I speak two different english languages. Here's what I meant:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0533a688ce22..96947fb50328 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1918,6 +1918,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
prepare_lock_switch(rq, next);
prepare_arch_switch(next);
trace_sched_switch(prev, next);
+ trace_sched_switch_rq(prev, next, rq);
}

because for some reason we want to dump something from the runqueue
pointer now. We can't change the current tracepoint because userspace
scripts use it.

Now imagine the bloat levels when people start adding
tracepoints left and right...
Setting something to 0 because it is N/A is the first sign that
something is wrong with your ABI. Having an arbitrary string which each
driver parses as it sees fit is perfectly fine.

> > See what I'm sayin?
> >
> > Having
> >
> > TP_PROTO(const unsigned int err_type,
> > const unsigned int mc_index,
> > const char *error_msg,
> > const char *label,
> > const char *location,
> > const char *detail)
> >
> > is a bit more generic and userspace can parse it however it likes.
> >
> > Actually, I'd slim this up even more:
> >
> > TP_PROTO(const unsigned int mc_index,
> > const char *error_msg,
> > const char *label,
> > const char *location,
> > const char *detail)
>
> Well, this even more generic:
> TP_PROTO(const char *msg)
>
> but the more we convert other fields into strings, the harder is for
> userspace to handle it,

For the millionth time, it is not hard for userspace to parse anything!

> as fields may be on different order, string conversion patches might
> change or break the parsing behavior, different drivers will require
> different parsing expressions, etc.

That's why _each_ _driver_ will have its format and the userspace tools
parsing it will know about it!

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Luck, Tony

unread,
May 18, 2012, 3:11:07 PM5/18/12
to Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
> That's why _each_ _driver_ will have its format and the userspace tools
> parsing it will know about it!

Sounds like a full employment program for parser writers.

There are some interesting fields that should be common to all
drivers ... so have twenty parsers that can handle:

paddr: 0x1234
PADDR: 0x1234
Paddr = 0x1234
Phys = 1234
addr: 0x1234
Address: 0x000000001234

looks like a lot of make-work ... when the OS can standardize in the ABI
that there is a 64-bit binary value that is the physical address of the
error (and another 64-bit mask saying which, if any, bits are valid).

So we should be looking for the set of always relevant values that can
be kept explicitly separate in fields in TP_PROTO, and per-driver specific
stuff (grain/syndrome??) bits that will have to go into the "details"
string and require some driver specific user-mode parsing to use.

-Tony

Borislav Petkov

unread,
May 18, 2012, 5:12:38 PM5/18/12
to Luck, Tony, Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Fri, May 18, 2012 at 07:10:42PM +0000, Luck, Tony wrote:
> > That's why _each_ _driver_ will have its format and the userspace tools
> > parsing it will know about it!
>
> Sounds like a full employment program for parser writers.
>
> There are some interesting fields that should be common to all
> drivers ... so have twenty parsers that can handle:
>
> paddr: 0x1234
> PADDR: 0x1234
> Paddr = 0x1234
> Phys = 1234
> addr: 0x1234
> Address: 0x000000001234
>
> looks like a lot of make-work ... when the OS can standardize in the ABI
> that there is a 64-bit binary value that is the physical address of the
> error (and another 64-bit mask saying which, if any, bits are valid).

Makes sense, I gotta say :)

> So we should be looking for the set of always relevant values that
> can be kept explicitly separate in fields in TP_PROTO, and per-driver
> specific stuff (grain/syndrome??) bits that will have to go into the
> "details" string and require some driver specific user-mode parsing to
> use.

How about we put all the values which are globally valid for all drivers
in separate fields and leave the "(...)" brackets for details where each
driver can put its own relevant stuff?

For the record, I very much like this reasoning :).

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Borislav Petkov

unread,
May 19, 2012, 5:27:28 AM5/19/12
to Luck, Tony, Mauro Carvalho Chehab, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On a second thought, filtering out the globally valid fields for all
drivers might require a lot of careful driver auditing.

What would be better and easier is to add those single fields to the
tracepoint which are relevant to the user (and which are more or less
globally valid for all drivers by inferrence) and leave the rest lumped
together in a single char *.

Which is basically what I'm suggesting for a couple of days now :-)

TP_PROTO(const unsigned int err_type,
const unsigned int mc_index,
const char *error_msg,
const char *label,
const char *location,
const char *detail)

and I'm really not sure about err_type - this is an edac-specific define
and it means nothing outside the kernel so its string representation
could very well could be merged with error_msg and we can drop the ( ? :
) ugliness in the tracepoint definition itself.

IOW:


TP_PROTO(const unsigned int mc_index,
const char *error_msg,
const char *label,
const char *location,
const char *detail)

Now I get all warm and cosy simply I'm staring at this :-).

Hmmm.

Mauro Carvalho Chehab

unread,
May 21, 2012, 11:35:09 AM5/21/12
to Borislav Petkov, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 19-05-2012 06:26, Borislav Petkov escreveu:
> On Fri, May 18, 2012 at 11:12:11PM +0200, Borislav Petkov wrote:
>> On Fri, May 18, 2012 at 07:10:42PM +0000, Luck, Tony wrote:
>>>> That's why _each_ _driver_ will have its format and the userspace tools
>>>> parsing it will know about it!
>>>
>>> Sounds like a full employment program for parser writers.
>>>
>>> There are some interesting fields that should be common to all
>>> drivers ... so have twenty parsers that can handle:
>>>
>>> paddr: 0x1234
>>> PADDR: 0x1234
>>> Paddr = 0x1234
>>> Phys = 1234
>>> addr: 0x1234
>>> Address: 0x000000001234
>>>
>>> looks like a lot of make-work ... when the OS can standardize in the ABI
>>> that there is a 64-bit binary value that is the physical address of the
>>> error (and another 64-bit mask saying which, if any, bits are valid).

The EDAC API works with the address/address mask as 3 values: address page,
address offset inside the page and grain.

The values for address page/offset/grain and syndrome are mandatory on the EDAC
error calls. Drivers that can't obtain that fill with zero. EDAC prints those
values via the current API.

So, passing those values for the tracepoint makes sense.

In order to use a cleaner API, we may add a flags bitfield there, and add a few
flags to mark if the address (page, offset, grain) is valid and if the syndrome
is valid.

>>
>> Makes sense, I gotta say :)
>>
>>> So we should be looking for the set of always relevant values that
>>> can be kept explicitly separate in fields in TP_PROTO, and per-driver
>>> specific stuff (grain/syndrome??) bits that will have to go into the
>>> "details" string and require some driver specific user-mode parsing to
>>> use.
>>
>> How about we put all the values which are globally valid for all drivers
>> in separate fields and leave the "(...)" brackets for details where each
>> driver can put its own relevant stuff?

You're again concerned about the printk formatting, and not about the fields.
I don't care that much about that. Whatever you/Tony wants for the printk
is fine for me.

The TP_printk logic can change on newer patches, as printk output is not
stable in Linux Kernel.

>>
>> For the record, I very much like this reasoning :).
>
> On a second thought, filtering out the globally valid fields for all
> drivers might require a lot of careful driver auditing.

If you want to explode the driver details, then yes, but the address and
syndrome are part of the EDAC internal ABI.

Most drivers are capable of getting address and syndrome, for CE.
For UE, syndrome is generally not calculated.

Anyway, on _all_ internal ABI calls, when syndrome or address aren't
calculated, the drivers pass a value of 0 for them.

So, it is easy to add a logic inside the code that will rise a flag to
indicate userspace when those fields are filled or not. All the EDAC
core needs to do is to check if the value is 0 or not.

> What would be better and easier is to add those single fields to the
> tracepoint which are relevant to the user (and which are more or less
> globally valid for all drivers by inferrence) and leave the rest lumped
> together in a single char *.

Address and syndrome can be relevant for a proper edac handling daemon, as
some advanced error check mechanism could use those values to be able
to filter out some random interference from an error that is occurring at
the same memory grain, e. g. if a machine is producing an error (even being
very sparsed in time) at the same DRAM, and no errors outside it,
it is very likely that such DRAM chip is damaged.

>
> Which is basically what I'm suggesting for a couple of days now :-)
>
> TP_PROTO(const unsigned int err_type,
> const unsigned int mc_index,
> const char *error_msg,
> const char *label,
> const char *location,
> const char *detail)
>
> and I'm really not sure about err_type - this is an edac-specific define
> and it means nothing outside the kernel so its string representation
> could very well could be merged with error_msg and we can drop the ( ? :
> ) ugliness in the tracepoint definition itself.

All error management systems use a mandatory field for the error severity.
On all implementations I know, this is the first field at the management
base, and, together with the error message, the most important one.

Btw, even the Linux Kernel printk logic starts with severity (error level).

Regards,
Mauro

Borislav Petkov

unread,
May 21, 2012, 12:01:08 PM5/21/12
to Mauro Carvalho Chehab, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
This is exactly the issue - we don't want to overcomplicate the
tracepoint! Simply output the single address value and userspace can cut
and split and paste it however it wants.

<snip some bullshit which I'm tired of addressing everytime>

> >> For the record, I very much like this reasoning :).
> >
> > On a second thought, filtering out the globally valid fields for all
> > drivers might require a lot of careful driver auditing.
>
> If you want to explode the driver details, then yes, but the address and
> syndrome are part of the EDAC internal ABI.
>
> Most drivers are capable of getting address and syndrome, for CE.
> For UE, syndrome is generally not calculated.
>
> Anyway, on _all_ internal ABI calls, when syndrome or address aren't
> calculated, the drivers pass a value of 0 for them.
>
> So, it is easy to add a logic inside the code that will rise a flag to
> indicate userspace when those fields are filled or not. All the EDAC
> core needs to do is to check if the value is 0 or not.

No, we're not going to do that! Adding a bit about the validity of a bit
is wrong design and is screaming B0RKED.

We're going to have single fields for EDAC-global valid values and leave
the driver-specific stuff lumped in one char * string.

Then, if a driver cannot specify sindrome or whatever, it simply DOESN'T
PUT IT IN THE STRING instead of adding validity bits.

[ … ]

> > Which is basically what I'm suggesting for a couple of days now :-)
> >
> > TP_PROTO(const unsigned int err_type,
> > const unsigned int mc_index,
> > const char *error_msg,
> > const char *label,
> > const char *location,
> > const char *detail)
> >
> > and I'm really not sure about err_type - this is an edac-specific define
> > and it means nothing outside the kernel so its string representation
> > could very well could be merged with error_msg and we can drop the ( ? :
> > ) ugliness in the tracepoint definition itself.
>
> All error management systems use a mandatory field for the error severity.

.. I'm pretty sure the defines you're exporting are not the same as
your error management systems require. So some kind of mapping is
still needed. Which shows that you're going to need to massage that
information in userspace after all.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 21, 2012, 1:28:41 PM5/21/12
to Borislav Petkov, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
That's exactly what the latest version of this patch does.

> <snip some bullshit which I'm tired of addressing everytime>
>
>>>> For the record, I very much like this reasoning :).
>>>
>>> On a second thought, filtering out the globally valid fields for all
>>> drivers might require a lot of careful driver auditing.
>>
>> If you want to explode the driver details, then yes, but the address and
>> syndrome are part of the EDAC internal ABI.
>>
>> Most drivers are capable of getting address and syndrome, for CE.
>> For UE, syndrome is generally not calculated.
>>
>> Anyway, on _all_ internal ABI calls, when syndrome or address aren't
>> calculated, the drivers pass a value of 0 for them.
>>
>> So, it is easy to add a logic inside the code that will rise a flag to
>> indicate userspace when those fields are filled or not. All the EDAC
>> core needs to do is to check if the value is 0 or not.
>
> No, we're not going to do that! Adding a bit about the validity of a bit
> is wrong design and is screaming B0RKED.

Huh? Nobody said to add a bit to tell about the validity of another bit.

I'm saying is that a bitfield could be used to indicate that certain integer
values (and not bits) are filled or not. Btw, MCE has it: depending on the
MCE status registers, reading other registers may be needed to evaluate
fully the error.

> We're going to have single fields for EDAC-global valid values and leave
> the driver-specific stuff lumped in one char * string.
>
> Then, if a driver cannot specify sindrome or whatever, it simply DOESN'T
> PUT IT IN THE STRING instead of adding validity bits.
>
> [ … ]

Parsing strings that can be changed from Kernel versions and drivers is a
maintenance nightmare. Both I and Tony pointed it with different examples.

>>> Which is basically what I'm suggesting for a couple of days now :-)
>>>
>>> TP_PROTO(const unsigned int err_type,
>>> const unsigned int mc_index,
>>> const char *error_msg,
>>> const char *label,
>>> const char *location,
>>> const char *detail)
>>>
>>> and I'm really not sure about err_type - this is an edac-specific define
>>> and it means nothing outside the kernel so its string representation
>>> could very well could be merged with error_msg and we can drop the ( ? :
>>> ) ugliness in the tracepoint definition itself.
>>
>> All error management systems use a mandatory field for the error severity.
>
> ... I'm pretty sure the defines you're exporting are not the same as
> your error management systems require. So some kind of mapping is
> still needed. Which shows that you're going to need to massage that
> information in userspace after all.

Yes, field mapping is needed at the management systems, but this is not the
point. It is not Kernel's task to help mapping fields inside userspace apps,
but it is the task of a properly designed API to avoid userspace to guess about
what the kernel is reporting.

A printk/sprintk-designed API, as you're proposing, requires userspace to guess
what the events mean, with the help of a very careful designed parsing rules
that needs to take into account every single EDAC driver (as the (s)printk message
message will vary among them), and needs to be reviewed on every single new kernel
version, as the (s)printk output can change among different Kernel versions. Even
-stable versions might require changing the parsers, as a fixup patch might be
changing the (s)printk arguments.

This is a b0rked design.

Regards,
Mauro

Borislav Petkov

unread,
May 21, 2012, 4:40:26 PM5/21/12
to Mauro Carvalho Chehab, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Mon, May 21, 2012 at 01:40:08PM -0300, Mauro Carvalho Chehab wrote:
> That's exactly what the latest version of this patch does.

Really, where is the address field?

+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ const char *location,
+ const char *core_detail,
+ const char *driver_detail),


[ … ]

> Huh? Nobody said to add a bit to tell about the validity of another bit.
>
> I'm saying is that a bitfield could be used to indicate that certain integer
> values (and not bits) are filled or not. Btw, MCE has it: depending on the
> MCE status registers, reading other registers may be needed to evaluate
> fully the error.

Bullshit, there'll be no bits showing the validity of other fields.

> > We're going to have single fields for EDAC-global valid values and leave
> > the driver-specific stuff lumped in one char * string.
> >
> > Then, if a driver cannot specify sindrome or whatever, it simply DOESN'T
> > PUT IT IN THE STRING instead of adding validity bits.
> >
> > [ … ]
>
> Parsing strings that can be changed from Kernel versions and drivers is a
> maintenance nightmare. Both I and Tony pointed it with different examples.

Bullshit, stop making up reasons to prove your point.
Enough with the crap already - now you're really desperately looking for
bogus arguments to prove your point.

We're going to have single fields for EDAC-global valid values and leave
the driver-specific stuff lumped in one char * string.


--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 21, 2012, 11:05:08 PM5/21/12
to Borislav Petkov, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
<offensive comments skipped>

Em 21-05-2012 17:40, Borislav Petkov escreveu:
> On Mon, May 21, 2012 at 01:40:08PM -0300, Mauro Carvalho Chehab wrote:
>> That's exactly what the latest version of this patch does.
>
> Really, where is the address field?
>
> + TP_PROTO(const unsigned int err_type,
> + const unsigned int mc_index,
> + const char *error_msg,
> + const char *label,
> + const char *location,
> + const char *core_detail,
> + const char *driver_detail),
>
>
> [ … ]

The above is not the latest version of it. The latest version is:
http://www.spinics.net/lists/kernel/msg1343822.html

The definition there is:

+/*
+ * Default error mechanisms for Memory Controller errors (CE and UE)
+ */
+TRACE_EVENT(mc_event,
+
+ TP_PROTO(const unsigned int err_type,
+ const unsigned int mc_index,
+ const char *error_msg,
+ const char *label,
+ int layer0,
+ int layer1,
+ int layer2,
+ unsigned long pfn,
+ unsigned long offset,
+ unsigned long grain,
+ unsigned long syndrome,
+ const char *driver_detail),
+
+ TP_ARGS(err_type, mc_index, error_msg, label, layer0, layer1, layer2,
+ pfn, offset, grain, syndrome, driver_detail),

The address is there using the edac way to represent it (page, offset, grain).

> We're going to have single fields for EDAC-global valid values and leave
> the driver-specific stuff lumped in one char * string.

That's exactly what I said.

See above. driver_detail is a char string, with
the driver specific stuff. The EDAC global values are represented as-is
without being converted to integers.

Regards,
Mauro

Borislav Petkov

unread,
May 22, 2012, 5:28:17 AM5/22/12
to Mauro Carvalho Chehab, Borislav Petkov, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Tue, May 22, 2012 at 12:04:48AM -0300, Mauro Carvalho Chehab wrote:
> +TRACE_EVENT(mc_event,
> +
> + TP_PROTO(const unsigned int err_type,
> + const unsigned int mc_index,
> + const char *error_msg,
> + const char *label,
> + int layer0,
> + int layer1,
> + int layer2,

Those are EDAC-internal layer representation, why are they exported to
userspace? Userspace needs only the location and label AFAICT.

If you export them to userspace, they need much more meaningful names -
layer{0,1,2} mean nothing outside of the kernel.

> + unsigned long pfn,
> + unsigned long offset,
> + unsigned long grain,

Why aren't those a single 'unsigned long address' since they all are
computed from it?

> + unsigned long syndrome,
> + const char *driver_detail),
> +
> + TP_ARGS(err_type, mc_index, error_msg, label, layer0, layer1, layer2,
> + pfn, offset, grain, syndrome, driver_detail),
>
> The address is there using the edac way to represent it (page, offset, grain).

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Mauro Carvalho Chehab

unread,
May 22, 2012, 6:18:41 AM5/22/12
to Borislav Petkov, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
Em 22-05-2012 06:28, Borislav Petkov escreveu:
> On Tue, May 22, 2012 at 12:04:48AM -0300, Mauro Carvalho Chehab wrote:
>> +TRACE_EVENT(mc_event,
>> +
>> + TP_PROTO(const unsigned int err_type,
>> + const unsigned int mc_index,
>> + const char *error_msg,
>> + const char *label,
>> + int layer0,
>> + int layer1,
>> + int layer2,
>
> Those are EDAC-internal layer representation, why are they exported to
> userspace? Userspace needs only the location and label AFAICT.

Those are not the EDAC internal layer representation. They're the physical
location of the DIMM or rank.

> If you export them to userspace, they need much more meaningful names -
> layer{0,1,2} mean nothing outside of the kernel.

Ok. Do you have a better naming suggestion?

What about layer0_pos, layer1_pos, layer2_pos?

>
>> + unsigned long pfn,
>> + unsigned long offset,
>> + unsigned long grain,
>
> Why aren't those a single 'unsigned long address' since they all are
> computed from it?

We can merge pfn and offset into "unsigned long address".

With regards to the grain, it is an address mask, written with a "short" way.
So, grain 32, for example, means:
ffff:ffff:ffff:fffe0

As the current EDAC API exports it as grain, IMO, it is better to keep it as-is,
but it won't be hard to do:
unsigned long mask = ((unsigned long) -1) && (1 - grain)

What do you think?

>> + unsigned long syndrome,
>> + const char *driver_detail),
>> +
>> + TP_ARGS(err_type, mc_index, error_msg, label, layer0, layer1, layer2,
>> + pfn, offset, grain, syndrome, driver_detail),
>>
>> The address is there using the edac way to represent it (page, offset, grain).
>

Regards,
Mauro

Borislav Petkov

unread,
May 22, 2012, 9:05:22 AM5/22/12
to Mauro Carvalho Chehab, Luck, Tony, Ingo Molnar, Linux Edac Mailing List, Linux Kernel Mailing List, Aristeu Rozanski, Doug Thompson, Steven Rostedt, Frederic Weisbecker, Ingo Molnar
On Tue, May 22, 2012 at 07:18:21AM -0300, Mauro Carvalho Chehab wrote:
> Em 22-05-2012 06:28, Borislav Petkov escreveu:
> > On Tue, May 22, 2012 at 12:04:48AM -0300, Mauro Carvalho Chehab wrote:
> >> +TRACE_EVENT(mc_event,
> >> +
> >> + TP_PROTO(const unsigned int err_type,
> >> + const unsigned int mc_index,
> >> + const char *error_msg,
> >> + const char *label,
> >> + int layer0,
> >> + int layer1,
> >> + int layer2,
> >
> > Those are EDAC-internal layer representation, why are they exported to
> > userspace? Userspace needs only the location and label AFAICT.
>
> Those are not the EDAC internal layer representation. They're the physical
> location of the DIMM or rank.

Ok, you've replaced the location char * with the layers.

> > If you export them to userspace, they need much more meaningful names -
> > layer{0,1,2} mean nothing outside of the kernel.
>
> Ok. Do you have a better naming suggestion?
>
> What about layer0_pos, layer1_pos, layer2_pos?

Actually, I'd like them to be called channel/rank/row or something. Having them
numbered I don't know which layer is the top layer (channel/branch/slot)
and the lowest (rank/csrow/...)

Maybe top_layer, middle_layer, lowest_layer? Or something like that...

> >
> >> + unsigned long pfn,
> >> + unsigned long offset,
> >> + unsigned long grain,
> >
> > Why aren't those a single 'unsigned long address' since they all are
> > computed from it?
>
> We can merge pfn and offset into "unsigned long address".

Just have a single "unsigned long address" field and userspace can pick
out the stuff it needs from it.

> With regards to the grain, it is an address mask, written with a "short" way.
> So, grain 32, for example, means:
> ffff:ffff:ffff:fffe0
>
> As the current EDAC API exports it as grain, IMO, it is better to keep it as-is,
> but it won't be hard to do:
> unsigned long mask = ((unsigned long) -1) && (1 - grain)
>
> What do you think?

Why are we even exporting grain actually with each tracepoint
invocation? This is the granularity of reported error in bytes, and it,
as such, is statically assigned to a value in each driver. Userspace can
certainly figure out that value in a different way.

But the more important question is: does the grain help us when handling
the error info in userspace?

It tells us that at this physical address with "grain" granularity we
had an error. So?

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
0 new messages