mptutil(8) segfault on IBM xSeries 3550

Charles Owens

Feb 12, 2010, 2:25:33 PM
to freebsd-...@freebsd.org
Howdy,

We're working with IBM hardware (xSeries 3550) that has an
mpt-based RAID controller... after initial success with testing the
mptutil utility, now operations other than "show adapter" and "show
volume" are resulting in segfaults.

While it was working properly we created and removed volumes several
times, force-failed drives, and just generally put it through its
paces... and all seemed fine. Then, after a reboot, it suddenly started
failing with segfault as described, and nothing we do has helped to get
it out of this state (including trying to use the LSI in-BIOS manager to
create/delete volumes -- which in and of itself works fine).

We found a recent thread
http://docs.freebsd.org/cgi/mid.cgi?4B56CD4C.80503 and hoped that it
might somehow relate... and even tried the patch that John Baldwin
posted, but to no avail.

Has anyone seen this behavior and/or have a suggested fix or workaround?


Here's the output of "mptutil show adapter":

mpt0 Adapter:
Board Name: SR-BR10i
Board Assembly: L3-25116-01H
Chip Name: C1068E
Chip Revision: UNUSED
RAID Levels: RAID0, RAID1, RAID1E
RAID0 Stripes: 64K
RAID1E Stripes: 64K
RAID0 Drives/Vol: 1-10
RAID1 Drives/Vol: 2
RAID1E Drives/Vol: 3-10


This work is being done using FreeBSD 8.0-RELEASE-p2 + PAE.

Thanks very much,

Charles


--
Charles Owens
Great Bay Software, Inc.

Charles Owens

Feb 15, 2010, 5:25:15 PM
to freebsd-...@freebsd.org


I should add that the RAID controller in question is the IBM
ServeRAID-BR10i SAS/SATA Controller which is based on the LSI 1068E
processor, as described here:
http://www-01.ibm.com/common/ssi/rep_ca/4/872/ENUSAG09-0104/index.html

Charles Owens

Feb 15, 2010, 5:31:59 PM
to freebsd-...@freebsd.org
PR created -- http://www.freebsd.org/cgi/query-pr.cgi?pr=143972

Charles Owens
Great Bay Software, Inc.

pluknet

Feb 15, 2010, 11:34:01 PM
to Charles Owens, freebsd-...@freebsd.org

Hi, would you show ktrace output?

--
wbr,
pluknet

Charles Owens

Feb 17, 2010, 8:41:51 AM
to pluknet, freebsd-...@freebsd.org

I'm going to do this today. Should I run ktrace with any particular
arguments (i.e., the "-t" option)?

Thanks, Charles


Charles Owens

Feb 17, 2010, 3:12:51 PM
to pluknet, freebsd-...@freebsd.org

I've attached a kdump'd ktrace for command "mptutil show drives". I
also have one for "show config"... but at quick glance they appear to
end very similarly (but I'll send it, of course, if you need it).

Here's a bit more info that may relate: when I first played with this
system it would boot with the following SCSI drives detected:

* da0, da1 -- real SCSI drives... not in RAID volume
* da2, da3 -- funky "Linux Virtual Drive" and "Linux Virtual Floppy"
that appear to be on some USB bus (something internal... maybe
related to the IBM on-board management controller?)


The first few times that I used mptutil to create the RAID volume, it
would show up as da4... not too surprisingly. The final time I created
it successfully, it showed up as da0... for no reason that I could see.
From that point on, as I've described, mptutil would bomb whenever
trying to do anything beyond "show adapter" and "show volumes".

Thanks in advance for looking at this,
Charles

show-drives_kt.txt

John Baldwin

Feb 17, 2010, 2:23:36 PM
to freebsd-...@freebsd.org, Charles Owens
On Monday 15 February 2010 5:31:59 pm Charles Owens wrote:
> PR created -- http://www.freebsd.org/cgi/query-pr.cgi?pr=143972
>
> Charles Owens
> Great Bay Software, Inc.

Can you build mptutil with debug symbols (make DEBUG_FLAGS=-g clean all
install) and get a coredump?

--
John Baldwin

John Baldwin

Feb 18, 2010, 10:23:08 AM
to freebsd-...@freebsd.org, Charles Owens

Try this updated patch. It should fix the problems with 'mptutil show drives'
displaying all daX devices in the system rather than just the ones for the
mptX bus. I had incorrectly interpreted the XPT matches as being an AND
rather than an OR. This changes the code to first do a lookup for the logical
"path" (SCSI bus) for mptX devices and then do a second lookup to fetch any
daX devices on that path. I tested it on a machine with an mpt controller and
a USB disk. Unfortunately I wasn't able to test any of the RAID stuff, just
'show drives'. This mpt(4) controller doesn't support RAID either, so I was
also able to verify the fix you had already tested for cleaning up 'show
adapter' output in that case.
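
The OR-versus-AND distinction described above can be illustrated with a small,
self-contained simulation. This is only a sketch: it does not call the real
CAM API, and the struct, the sim_* function names, and the device inventory
(one disk behind mpt0 plus one USB disk) are all hypothetical stand-ins chosen
for illustration. The real lookup matches a bus entry to get the path id; here
the path id is simply read from the peripheral table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record: one peripheral and the bus it sits on. */
struct sim_periph {
	const char *ctlr;	/* controller driver name, e.g. "mpt" */
	int ctlr_unit;		/* controller unit, e.g. 0 for mpt0 */
	int path_id;		/* logical path (SCSI bus) id */
	const char *name;	/* peripheral driver name, e.g. "da" */
};

/* Illustrative inventory: one disk behind mpt0 and one USB disk. */
const struct sim_periph sim_devs[] = {
	{ "mpt",   0, 0, "da" },
	{ "mpt",   0, 0, "pass" },
	{ "umass", 0, 1, "da" },
	{ "umass", 0, 1, "pass" },
};
const size_t sim_ndevs = sizeof(sim_devs) / sizeof(sim_devs[0]);

/*
 * Single lookup with OR semantics: the bus pattern adds its own match but
 * does not narrow the peripheral pattern, so every "da" device is counted.
 */
size_t
sim_count_or(const struct sim_periph *devs, size_t n, const char *name)
{
	size_t i, hits = 0;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (strcmp(devs[i].name, name) == 0)
			hits++;
	return (hits);
}

/* First lookup: resolve controller name and unit to a path id, or -1. */
int
sim_find_path(const struct sim_periph *devs, size_t n, const char *ctlr,
    int unit)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(devs[i].ctlr, ctlr) == 0 &&
		    devs[i].ctlr_unit == unit)
			return (devs[i].path_id);
	return (-1);
}

/* Second lookup: one pattern whose path and name must both match. */
size_t
sim_count_on_path(const struct sim_periph *devs, size_t n, int path_id,
    const char *name)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (devs[i].path_id == path_id &&
		    strcmp(devs[i].name, name) == 0)
			hits++;
	return (hits);
}
```

With this inventory, sim_count_or(sim_devs, sim_ndevs, "da") counts both
disks, while resolving mpt0 with sim_find_path() and then calling
sim_count_on_path() counts only the disk behind the controller.
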

Index: mpt_cam.c
===================================================================
--- mpt_cam.c (revision 204004)
+++ mpt_cam.c (working copy)
@@ -56,15 +56,75 @@
return (xptfd);
}

+/* Fetch the path id of bus 0 for the opened mpt controller. */
+static int
+fetch_path_id(path_id_t *path_id)
+{
+ struct bus_match_pattern *b;
+ union ccb ccb;
+ size_t bufsize;
+
+ if (xpt_open() < 0)
+ return (ENXIO);
+
+ /* First, find the path id of bus 0 for this mpt controller. */
+ bzero(&ccb, sizeof(ccb));
+
+ ccb.ccb_h.func_code = XPT_DEV_MATCH;
+
+ bufsize = sizeof(struct dev_match_result) * 1;
+ ccb.cdm.num_matches = 0;
+ ccb.cdm.match_buf_len = bufsize;
+ ccb.cdm.matches = calloc(1, bufsize);
+
+ bufsize = sizeof(struct dev_match_pattern) * 1;
+ ccb.cdm.num_patterns = 1;
+ ccb.cdm.pattern_buf_len = bufsize;
+ ccb.cdm.patterns = calloc(1, bufsize);
+
+ /* Match mptX bus 0. */
+ ccb.cdm.patterns[0].type = DEV_MATCH_BUS;
+ b = &ccb.cdm.patterns[0].pattern.bus_pattern;
+ snprintf(b->dev_name, sizeof(b->dev_name), "mpt");
+ b->unit_number = mpt_unit;
+ b->bus_id = 0;
+ b->flags = BUS_MATCH_NAME | BUS_MATCH_UNIT | BUS_MATCH_BUS_ID;
+
+ if (ioctl(xptfd, CAMIOCOMMAND, &ccb) < 0) {
+ free(ccb.cdm.matches);
+ free(ccb.cdm.patterns);
+ return (errno);
+ }
+ free(ccb.cdm.patterns);
+
+ if (((ccb.ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) ||
+ (ccb.cdm.status != CAM_DEV_MATCH_LAST)) {
+ warnx("fetch_path_id got CAM error %#x, CDM error %d\n",
+ ccb.ccb_h.status, ccb.cdm.status);
+ free(ccb.cdm.matches);
+ return (EIO);
+ }
+
+ /* We should have exactly 1 match for the bus. */
+ if (ccb.cdm.num_matches != 1 ||
+ ccb.cdm.matches[0].type != DEV_MATCH_BUS) {
+ free(ccb.cdm.matches);
+ return (ENOENT);
+ }
+ *path_id = ccb.cdm.matches[0].result.bus_result.path_id;
+ free(ccb.cdm.matches);
+ return (0);
+}
+
int
mpt_query_disk(U8 VolumeBus, U8 VolumeID, struct mpt_query_disk *qd)
{
- struct bus_match_pattern *b;
struct periph_match_pattern *p;
struct periph_match_result *r;
union ccb ccb;
+ path_id_t path_id;
size_t bufsize;
- int i;
+ int error, i;

/* mpt(4) only handles devices on bus 0. */
if (VolumeBus != 0)
@@ -73,6 +133,11 @@
if (xpt_open() < 0)
return (ENXIO);

+ /* Find the path ID of bus 0. */
+ error = fetch_path_id(&path_id);
+ if (error)
+ return (error);
+
bzero(&ccb, sizeof(ccb));

ccb.ccb_h.func_code = XPT_DEV_MATCH;
@@ -85,25 +150,18 @@
ccb.cdm.match_buf_len = bufsize;
ccb.cdm.matches = calloc(1, bufsize);

- bufsize = sizeof(struct dev_match_pattern) * 2;
- ccb.cdm.num_patterns = 2;
+ bufsize = sizeof(struct dev_match_pattern) * 1;
+ ccb.cdm.num_patterns = 1;
ccb.cdm.pattern_buf_len = bufsize;
ccb.cdm.patterns = calloc(1, bufsize);

- /* Match mptX bus 0. */
- ccb.cdm.patterns[0].type = DEV_MATCH_BUS;
- b = &ccb.cdm.patterns[0].pattern.bus_pattern;
- snprintf(b->dev_name, sizeof(b->dev_name), "mpt");
- b->unit_number = mpt_unit;
- b->bus_id = 0;
- b->flags = BUS_MATCH_NAME | BUS_MATCH_UNIT | BUS_MATCH_BUS_ID;
-
/* Look for a "da" device at the specified target and lun. */
- ccb.cdm.patterns[1].type = DEV_MATCH_PERIPH;
- p = &ccb.cdm.patterns[1].pattern.periph_pattern;
+ ccb.cdm.patterns[0].type = DEV_MATCH_PERIPH;
+ p = &ccb.cdm.patterns[0].pattern.periph_pattern;
+ p->path_id = path_id;
snprintf(p->periph_name, sizeof(p->periph_name), "da");
p->target_id = VolumeID;
- p->flags = PERIPH_MATCH_NAME | PERIPH_MATCH_TARGET;
+ p->flags = PERIPH_MATCH_PATH | PERIPH_MATCH_NAME | PERIPH_MATCH_TARGET;

if (ioctl(xptfd, CAMIOCOMMAND, &ccb) < 0) {
i = errno;
@@ -122,25 +180,22 @@
}

/*
- * We should have exactly 2 matches, 1 for the bus and 1 for
- * the peripheral. However, if we only have 1 match and it is
- * for the bus, don't print an error message and return
- * ENOENT.
+ * We should have exactly 1 match for the peripheral.
+ * However, if we don't get a match, don't print an error
+ * message and return ENOENT.
*/
- if (ccb.cdm.num_matches == 1 &&
- ccb.cdm.matches[0].type == DEV_MATCH_BUS) {
+ if (ccb.cdm.num_matches == 0) {
free(ccb.cdm.matches);
return (ENOENT);
}
- if (ccb.cdm.num_matches != 2) {
- warnx("mpt_query_disk got %d matches, expected 2",
+ if (ccb.cdm.num_matches != 1) {
+ warnx("mpt_query_disk got %d matches, expected 1",
ccb.cdm.num_matches);
free(ccb.cdm.matches);
return (EIO);
}
- if (ccb.cdm.matches[0].type != DEV_MATCH_BUS ||
- ccb.cdm.matches[1].type != DEV_MATCH_PERIPH) {
- warnx("mpt_query_disk got wrong CAM matches");
+ if (ccb.cdm.matches[0].type != DEV_MATCH_PERIPH) {
+ warnx("mpt_query_disk got wrong CAM match");
free(ccb.cdm.matches);
return (EIO);
}
@@ -336,47 +391,44 @@
{
CONFIG_PAGE_IOC_2 *ioc2;
struct mpt_standalone_disk *disks;
- struct bus_match_pattern *b;
struct periph_match_pattern *p;
struct periph_match_result *r;
struct cam_device *dev;
union ccb ccb;
+ path_id_t path_id;
size_t bufsize;
u_int i;
- int count;
+ int count, error;

if (xpt_open() < 0)
return (ENXIO);

+ error = fetch_path_id(&path_id);
+ if (error)
+ return (error);
+
for (count = 100;; count+= 100) {
/* Try to fetch 'count' disks in one go. */
bzero(&ccb, sizeof(ccb));

ccb.ccb_h.func_code = XPT_DEV_MATCH;

- bufsize = sizeof(struct dev_match_result) * (count + 2);
+ bufsize = sizeof(struct dev_match_result) * (count + 1);
ccb.cdm.num_matches = 0;
ccb.cdm.match_buf_len = bufsize;
ccb.cdm.matches = calloc(1, bufsize);

- bufsize = sizeof(struct dev_match_pattern) * 2;
- ccb.cdm.num_patterns = 2;
+ bufsize = sizeof(struct dev_match_pattern) * 1;
+ ccb.cdm.num_patterns = 1;
ccb.cdm.pattern_buf_len = bufsize;
ccb.cdm.patterns = calloc(1, bufsize);

- /* Match mptX bus 0. */
- ccb.cdm.patterns[0].type = DEV_MATCH_BUS;
- b = &ccb.cdm.patterns[0].pattern.bus_pattern;
- snprintf(b->dev_name, sizeof(b->dev_name), "mpt");
- b->unit_number = mpt_unit;
- b->bus_id = 0;
- b->flags = BUS_MATCH_NAME | BUS_MATCH_UNIT | BUS_MATCH_BUS_ID;
-
/* Match any "da" peripherals. */
- ccb.cdm.patterns[1].type = DEV_MATCH_PERIPH;
- p = &ccb.cdm.patterns[1].pattern.periph_pattern;
+ ccb.cdm.patterns[0].type = DEV_MATCH_PERIPH;
+ p = &ccb.cdm.patterns[0].pattern.periph_pattern;
+ p->path_id = path_id;
snprintf(p->periph_name, sizeof(p->periph_name), "da");
- p->flags = PERIPH_MATCH_NAME;
+ p->flags = PERIPH_MATCH_PATH | PERIPH_MATCH_NAME;

if (ioctl(xptfd, CAMIOCOMMAND, &ccb) < 0) {
i = errno;
@@ -406,21 +458,16 @@
break;
}

- /*
- * We should have N + 1 matches, 1 for the bus and 1 for each
- * "da" device.
- */
- if (ccb.cdm.num_matches < 1) {
- warnx("mpt_fetch_disks didn't get any matches");
+ /* Shortcut if we don't have any "da" devices. */
+ if (ccb.cdm.num_matches == 0) {
free(ccb.cdm.matches);
- return (EIO);
+ *ndisks = 0;
+ *disksp = NULL;
+ return (0);
}
- if (ccb.cdm.matches[0].type != DEV_MATCH_BUS) {
- warnx("mpt_fetch_disks got wrong CAM matches");
- free(ccb.cdm.matches);
- return (EIO);
- }
- for (i = 1; i < ccb.cdm.num_matches; i++) {
+
+ /* We should have N matches, 1 for each "da" device. */
+ for (i = 0; i < ccb.cdm.num_matches; i++) {
if (ccb.cdm.matches[i].type != DEV_MATCH_PERIPH) {
warnx("mpt_fetch_disks got wrong CAM matches");
free(ccb.cdm.matches);
@@ -428,14 +475,6 @@
}
}

- /* Shortcut if we don't have any "da" devices. */
- if (ccb.cdm.num_matches == 1) {
- free(ccb.cdm.matches);
- *ndisks = 0;
- *disksp = NULL;
- return (0);
- }
-
/*
* Some of the "da" peripherals may be for RAID volumes, so
* fetch the IOC 2 page (list of RAID volumes) so we can
@@ -444,7 +483,7 @@
ioc2 = mpt_read_ioc_page(fd, 2, NULL);
disks = calloc(ccb.cdm.num_matches, sizeof(*disks));
count = 0;
- for (i = 1; i < ccb.cdm.num_matches; i++) {
+ for (i = 0; i < ccb.cdm.num_matches; i++) {
r = &ccb.cdm.matches[i].result.periph_result;
if (periph_is_volume(ioc2, r))
continue;
@@ -480,10 +519,9 @@
int
mpt_rescan_bus(int bus, int id)
{
- struct bus_match_pattern *b;
union ccb ccb;
path_id_t path_id;
- size_t bufsize;
+ int error;

/* mpt(4) only handles devices on bus 0. */
if (bus != -1 && bus != 0)
@@ -492,54 +530,12 @@
if (xpt_open() < 0)
return (ENXIO);

- /* First, find the path id of bus 0 for this mpt controller. */
- bzero(&ccb, sizeof(ccb));
+ error = fetch_path_id(&path_id);
+ if (error)
+ return (error);

- ccb.ccb_h.func_code = XPT_DEV_MATCH;
-
- bufsize = sizeof(struct dev_match_result) * 1;
- ccb.cdm.num_matches = 0;
- ccb.cdm.match_buf_len = bufsize;
- ccb.cdm.matches = calloc(1, bufsize);
-
- bufsize = sizeof(struct dev_match_pattern) * 1;
- ccb.cdm.num_patterns = 1;
- ccb.cdm.pattern_buf_len = bufsize;
- ccb.cdm.patterns = calloc(1, bufsize);
-
- /* Match mptX bus 0. */
- ccb.cdm.patterns[0].type = DEV_MATCH_BUS;
- b = &ccb.cdm.patterns[0].pattern.bus_pattern;
- snprintf(b->dev_name, sizeof(b->dev_name), "mpt");
- b->unit_number = mpt_unit;
- b->bus_id = 0;
- b->flags = BUS_MATCH_NAME | BUS_MATCH_UNIT | BUS_MATCH_BUS_ID;
-
- if (ioctl(xptfd, CAMIOCOMMAND, &ccb) < 0) {
- free(ccb.cdm.matches);
- free(ccb.cdm.patterns);
- return (errno);
- }
- free(ccb.cdm.patterns);
-
- if (((ccb.ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) ||
- (ccb.cdm.status != CAM_DEV_MATCH_LAST)) {
- warnx("mpt_rescan_bus got CAM error %#x, CDM error %d\n",
- ccb.ccb_h.status, ccb.cdm.status);
- free(ccb.cdm.matches);
- return (EIO);
- }
-
- /* We should have exactly 1 match for the bus. */
- if (ccb.cdm.num_matches != 1 ||
- ccb.cdm.matches[0].type != DEV_MATCH_BUS) {
- free(ccb.cdm.matches);
- return (ENOENT);
- }
- path_id = ccb.cdm.matches[0].result.bus_result.path_id;
- free(ccb.cdm.matches);
-
- /* Now perform the actual rescan. */
+ /* Perform the actual rescan. */
+ bzero(&ccb, sizeof(ccb));
ccb.ccb_h.path_id = path_id;
if (id == -1) {
ccb.ccb_h.func_code = XPT_SCAN_BUS;
Index: mpt_show.c
===================================================================
--- mpt_show.c (revision 204004)
+++ mpt_show.c (working copy)
@@ -78,6 +78,7 @@
CONFIG_PAGE_MANUFACTURING_0 *man0;
CONFIG_PAGE_IOC_2 *ioc2;
CONFIG_PAGE_IOC_6 *ioc6;
+ U16 IOCStatus;
int fd, comma;

if (ac != 1) {
@@ -108,7 +109,7 @@

free(man0);

- ioc2 = mpt_read_ioc_page(fd, 2, NULL);
+ ioc2 = mpt_read_ioc_page(fd, 2, &IOCStatus);
if (ioc2 != NULL) {
printf(" RAID Levels:");
comma = 0;
@@ -151,9 +152,11 @@
printf(" none");
printf("\n");
free(ioc2);
- }
+ } else if ((IOCStatus & MPI_IOCSTATUS_MASK) !=
+ MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+ warnx("mpt_read_ioc_page(2): %s", mpt_ioc_status(IOCStatus));

- ioc6 = mpt_read_ioc_page(fd, 6, NULL);
+ ioc6 = mpt_read_ioc_page(fd, 6, &IOCStatus);
if (ioc6 != NULL) {
display_stripe_map(" RAID0 Stripes",
ioc6->SupportedStripeSizeMapIS);
@@ -172,7 +175,9 @@
printf("-%u", ioc6->MaxDrivesIME);
printf("\n");
free(ioc6);
- }
+ } else if ((IOCStatus & MPI_IOCSTATUS_MASK) !=
+ MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+ warnx("mpt_read_ioc_page(6): %s", mpt_ioc_status(IOCStatus));

/* TODO: Add an ioctl to fetch IOC_FACTS and print firmware version. */

@@ -541,7 +546,8 @@
for (i = 0; i <= 0xff; i++) {
pinfo = mpt_pd_info(fd, i, &IOCStatus);
if (pinfo == NULL) {
- if (IOCStatus != MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
+ if ((IOCStatus & MPI_IOCSTATUS_MASK) !=
+ MPI_IOCSTATUS_CONFIG_INVALID_PAGE)
warnx("mpt_pd_info(%d): %s", i,
mpt_ioc_status(IOCStatus));
continue;

--
John Baldwin

Charles Owens

Feb 19, 2010, 1:01:38 PM
to John Baldwin, freebsd-...@freebsd.org
> [patch omitted]


John,

The patch appears to have resolved the problem. We're still banging on
it, but so far it looks very good!

Thanks very much!

Charles

John Baldwin

Feb 19, 2010, 1:15:13 PM
to Charles Owens, freebsd-...@freebsd.org

Excellent, thanks! I've committed it to HEAD and will MFC it in a week or
so. It is probably too late to make 7.3 however.

--
John Baldwin

Charles Owens

Mar 15, 2010, 4:03:23 PM
to John Baldwin, freebsd-...@freebsd.org

Again, thanks for the patch... overall it is working well... we're now
able to successfully do what we need to do with the RAID system. We are,
though, seeing some sort of error message:

# mptutil show volumes
mpt0 Volumes:
Id Size Level Stripe State Write-Cache Name
mptutil: mpt_query_disk got 4 matches, expected 2
0 ( 279G) RAID-1 OPTIMAL Disabled

# mptutil show config
mpt0 Configuration: 1 volumes, 2 drives
mptutil: mpt_query_disk got 4 matches, expected 2
volume 0 (279G) RAID-1 OPTIMAL spans:
drive 1 (279G) ONLINE <WD3000BLFS-23YBU 4V04> SATA
drive 0 (279G) ONLINE <WD3000BLFS-23YBU 4V04> SATA
spare pools: 0


We can certainly live with this, but I wanted to let you know in case
you thought it was worth digging into. Let me know if you need any
additional debug info beyond this:

# camcontrol devlist
<LSILOGIC Logical Volume 3000> at scbus0 target 0 lun 0 (pass0,da0)
<ATA WD3000BLFS-23YBU 4V04> at scbus1 target 1 lun 0 (pass1)
<Linux Virtual CD/DVD 0316> at scbus2 target 0 lun 0 (pass2,cd0)
<Linux Virtual Floppy 0316> at scbus3 target 0 lun 0 (da1,pass3)
<Linux Virtual Floppy 0316> at scbus3 target 0 lun 1 (da2,pass4)


Thanks,

Charles

John Baldwin

Mar 17, 2010, 11:14:11 AM
to Charles Owens, freebsd-...@freebsd.org

Are you sure this is a fixed binary? The new binary doesn't print out that
message anymore, it only says 'got %d matches, expected 1'. Also, the 4
instead of 2 is consistent with the old bug in that the two Linux virtual
floppies (da1 and da2) would be reported as extra for 'mptutil show drives' in
this case I think.
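
The count of 4 can be checked by hand against the camcontrol devlist output
quoted below. The following is a rough sketch, not mptutil code: the table
transcribes that listing, and it assumes that scbus0 is bus 0 of mpt0, that
the volume being queried is target 0, and that pass-only entries do not count
as "da" peripherals. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the "camcontrol devlist" output quoted in this thread. */
struct devlist_row {
	int scbus;
	int target;
	int lun;
	const char *periph;	/* non-pass peripheral driver, "" if none */
};

const struct devlist_row devlist[] = {
	{ 0, 0, 0, "da" },	/* LSILOGIC Logical Volume (da0) */
	{ 1, 1, 0, "" },	/* ATA WD3000BLFS (pass1 only) */
	{ 2, 0, 0, "cd" },	/* Linux Virtual CD/DVD (cd0) */
	{ 3, 0, 0, "da" },	/* Linux Virtual Floppy (da1) */
	{ 3, 0, 1, "da" },	/* Linux Virtual Floppy (da2) */
};
const size_t devlist_len = sizeof(devlist) / sizeof(devlist[0]);

/* Unpatched logic: 1 bus match, plus every "da" at the target on any bus. */
size_t
old_query_matches(int target)
{
	size_t i, hits = 1;	/* the match for the mpt bus itself */

	for (i = 0; i < devlist_len; i++)
		if (strcmp(devlist[i].periph, "da") == 0 &&
		    devlist[i].target == target)
			hits++;
	return (hits);
}

/* Patched logic: only "da" devices at the target on the given path. */
size_t
new_query_matches(int path_id, int target)
{
	size_t i, hits = 0;

	assert(path_id >= 0);
	for (i = 0; i < devlist_len; i++)
		if (devlist[i].scbus == path_id &&
		    strcmp(devlist[i].periph, "da") == 0 &&
		    devlist[i].target == target)
			hits++;
	return (hits);
}
```

Under these assumptions old_query_matches(0) gives 4 (the bus plus da0, da1
and da2, all at target 0), while new_query_matches(0, 0) gives the single
match the patched code expects.
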

> We can certainly live with this, but I wanted to let you know in case
> you thought it was worth digging into. Let me know if you need any
> additional debug info beyond this:
>
> # camcontrol devlist
> <LSILOGIC Logical Volume 3000> at scbus0 target 0 lun 0 (pass0,da0)
> <ATA WD3000BLFS-23YBU 4V04> at scbus1 target 1 lun 0 (pass1)
> <Linux Virtual CD/DVD 0316> at scbus2 target 0 lun 0 (pass2,cd0)
> <Linux Virtual Floppy 0316> at scbus3 target 0 lun 0 (da1,pass3)
> <Linux Virtual Floppy 0316> at scbus3 target 0 lun 1 (da2,pass4)

--
John Baldwin

Charles Owens

Mar 17, 2010, 4:43:11 PM
to John Baldwin, freebsd-...@freebsd.org

You're right! It appears on one of my two devel systems I misapplied
the patch somehow. Much better now... thanks!
