
Bug#688198: megasas: Failed to alloc kernel SGL buffer for IOCTL - Possible regression from 2.6.32.41~3

Todd Fleisher

Nov 20, 2012, 10:50:01 AM
FYI - I'm seeing this same issue in Ubuntu 12.04: Linux deb015.pod02 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


Bjørn Mork

Nov 20, 2012, 12:40:01 PM
Todd Fleisher <to...@fleetstreetops.com> writes:

> FYI - I'm seeing this same issue in Ubuntu 12.04: Linux deb015.pod02
> 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64
> x86_64 x86_64 GNU/Linux

Shit! I have a bad feeling I might be responsible here...

Looks like the "fix" I submitted a while ago results in leaking
dma_allocated memory instead of BUGing out. Maybe slightly better in
the short term, but slightly more difficult to notice. Does it take a
while before this error starts appearing? Do you run some smartctl
commands periodically?

I'd appreciate it if the good Debian kernel team could take a look at
this before it goes upstream, but I believe something like the attached
patch might fix the bug. This patch is based on v3.2.34, but I'll
rebase it on current mainline and submit it upstream with Cc stable if
any of you confirm that this looks sane.


Bjørn

0001-megaraid_sas-fix-memory-leak-if-SGL-has-0-length-ent.patch

Todd Fleisher

Nov 20, 2012, 3:40:02 PM
I get this periodically (seemingly random - but usually once it starts happening it sticks around for a while, then disappears only to return later) when I'm using LSI's MegaCli64 utility. When the kernel logs the error the MegaCli64 command doesn't return any data either.

Ex:
ro...@deb015.pod02:~# MegaCli64 -PDList -aALL


Exit Code: 0x00


Which is paired with a kernel message:
Nov 20 20:29:50 deb015 kernel: [797020.797811] megasas: Failed to alloc kernel SGL buffer for IOCTL

Other times that same command (or other MegaCli64 commands) will succeed and return the associated data. When this happens, there is no megasas kernel message.

-T
> From 4c41818461c2604f859d2fecda2657827071f0d4 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bj...@mork.no>
> Date: Tue, 20 Nov 2012 18:17:48 +0100
> Subject: [PATCH] megaraid_sas: fix memory leak if SGL has 0 length entries
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> commit 98cb7e44 ([SCSI] megaraid_sas: Sanity check user
> supplied length before passing it to dma_alloc_coherent())
> introduced a memory leak. Memory allocated for entries
> following zero length SGL entries will not be freed.
>
> Signed-off-by: Bjørn Mork <bj...@mork.no>
> ---
> drivers/scsi/megaraid/megaraid_sas_base.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 7c471eb..f013432 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -4886,8 +4886,9 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
>                          sense, sense_handle);
>          }
>
> -        for (i = 0; i < ioc->sge_count && kbuff_arr[i]; i++) {
> -                dma_free_coherent(&instance->pdev->dev,
> +        for (i = 0; i < ioc->sge_count; i++) {
> +                if (kbuff_arr[i])
> +                        dma_free_coherent(&instance->pdev->dev,
>                                      kern_sge32[i].length,
>                                      kbuff_arr[i], kern_sge32[i].phys_addr);
>          }
> --
> 1.7.10.4
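
The difference between the two cleanup loops in the patch above can be sketched in userspace. This is only an illustration, not the driver code: it assumes, as the commit message implies, that a zero-length SGL entry leaves its kbuff_arr slot NULL. The names mock_free, fill, cleanup_old and cleanup_new are hypothetical, and malloc/free stand in for dma_alloc_coherent/dma_free_coherent.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define SGE_COUNT 4

/* Counts buffers released by the most recent cleanup call. */
static int freed;

/* Hypothetical stand-in for dma_free_coherent(): frees and counts. */
static void mock_free(void *buf)
{
    free(buf);
    freed++;
}

/* Fill the array the way the sketch assumes the ioctl path does:
 * zero-length entries get no buffer and keep a NULL slot. */
static void fill(void **kbuff_arr, const size_t *lens, int sge_count)
{
    int i;

    for (i = 0; i < sge_count; i++) {
        kbuff_arr[i] = lens[i] ? malloc(lens[i]) : NULL;
        assert(lens[i] == 0 || kbuff_arr[i] != NULL);
    }
}

/* Old cleanup: the loop condition stops at the first NULL slot,
 * so every buffer after a zero-length entry is leaked. */
static int cleanup_old(void **kbuff_arr, int sge_count)
{
    int i;

    freed = 0;
    for (i = 0; i < sge_count && kbuff_arr[i]; i++) {
        mock_free(kbuff_arr[i]);
        kbuff_arr[i] = NULL;
    }
    return freed;
}

/* Patched cleanup: visit every slot and skip only the NULL ones. */
static int cleanup_new(void **kbuff_arr, int sge_count)
{
    int i;

    freed = 0;
    for (i = 0; i < sge_count; i++) {
        if (kbuff_arr[i]) {
            mock_free(kbuff_arr[i]);
            kbuff_arr[i] = NULL;
        }
    }
    return freed;
}
```

With entry lengths {16, 0, 32, 64}, cleanup_old releases only the first buffer and leaves the last two allocated, while cleanup_new releases all three.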

Bjørn Mork

Nov 21, 2012, 2:30:01 AM
Todd Fleisher <to...@fleetstreetops.com> writes:

> I get this periodically (seemingly random - but usually once it starts happening it sticks around for a while, then disappears only to return later) when I'm using LSI's MegaCli64 utility. When the kernel logs the error the MegaCli64 command doesn't return any data either.
>
> Ex:
> ro...@deb015.pod02:~# MegaCli64 -PDList -aALL
>
>
> Exit Code: 0x00
>
>
> Which is paired with a kernel message:
> Nov 20 20:29:50 deb015 kernel: [797020.797811] megasas: Failed to alloc kernel SGL buffer for IOCTL
>
> Other times that same command (or other MegaCli64 commands) will succeed and return the associated data. When this happens, there is no megasas kernel message.


Thanks. I don't know what the MegaCli64 utility does, but I assume it
uses the driver-specific ioctls to send passthrough commands like the
smartmontools do. That is consistent with your description.

But I was jumping to conclusions, as usual. The bug I found needs to be
fixed, but it cannot be the cause of this problem. If it were, then you
would most likely see many other effects on your system. And the same
bug has been backported to 2.6.32 as well. And if it had not been, and
you were in fact hit by it, then your system would have crashed instead.

So that cannot be the problem. And then I don't know what could have
changed between 2.6.32 and 3.2. Could be something outside this driver.

It would be interesting to know something about the size of the buffers
which cannot be allocated. But running with debug patches is maybe out
of the question? Otherwise you could try running with something like
this to get a better picture of why this is failing:


diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f013432..1c0fa1d 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4797,6 +4797,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
                 if (!kbuff_arr[i]) {
                         printk(KERN_DEBUG "megasas: Failed to alloc "
                                "kernel SGL buffer for IOCTL \n");
+                        printk(KERN_DEBUG "megasas: iov_len=%zu\n", ioc->sgl[i].iov_len);
                         error = -ENOMEM;
                         goto out;
                 }




Bjørn

Todd Fleisher

Nov 21, 2012, 11:50:02 AM
FWIW - I don't experience the problem/message on a Debian Squeeze box running Linux deb003.pod01 2.6.32-5-xen-amd64 #1 SMP Mon Jan 16 20:48:30 UTC 2012 x86_64 GNU/Linux

I'm not currently able to recompile my 3.2 Ubuntu 12.04 kernel, but will try to find a comparable system to do it on.