Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

mlx4_core: DMA-API: device driver tries to sync DMA memory it has not allocated

121 views
Skip to first unread message

Gerald Schaefer

unread,
Dec 10, 2013, 8:50:02 AM12/10/13
to
Hi,

During network stress test with CONFIG_DMA_API_DEBUG=y we get the following
message on the server side:

[57523.955982] mlx4_core 0000:00:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000143d2c202] [size=1526 bytes]
[57523.956018] ------------[ cut here ]------------
[57523.956021] WARNING: at /home/dailybuild/builder/packages/BUILD/linux-3.12.3-20131209/lib/dma-debug.c:986
[57523.956022] Modules linked in: mlx4_en dm_multipath scsi_dh mlx4_core eadm_sch dm_mod lcs autofs4
[57523.956034] CPU: 3 PID: 20 Comm: ksoftirqd/3 Not tainted 3.12.3-65.x.20131209-s390xdefault #1
[57523.956037] task: 000000026f47cb60 ti: 000000026f484000 task.ti: 000000026f484000
[57523.956044] Krnl PSW : 0404e00180000000 00000000004a0422 (check_sync+0xf2/0x52c)
[57523.956047] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
[57523.956047] Krnl GPRS: 0000000000000111 000000008b767e9c 0000000000000092 0000000000000000
[57523.956051] 00000000004a041e 0000000000000000 00000000000005f6 0000000143d2c202
[57523.956053] 000000026f487a98 0000000000000001 0000000000a71144 000000026f10dd60
[57523.956055] 0000000000000000 000000000075bb18 00000000004a041e 000000026f487980
[57523.956064] Krnl Code: 00000000004a0412: c020002143d7 larl %r2,8c8bc0
[57523.956064] 00000000004a0418: c0e50012717a brasl %r14,6ee70c
[57523.956064] #00000000004a041e: a7f40001 brc 15,4a0420
[57523.956064] >00000000004a0422: e330a0000012 lt %r3,0(%r10)
[57523.956064] 00000000004a0428: a7740201 brc 7,4a082a
[57523.956064] 00000000004a042c: a7f401f3 brc 15,4a0812
[57523.956064] 00000000004a0430: d5078030c030 clc 48(8,%r8),48(%r12)
[57523.956064] 00000000004a0436: a7c40070 brc 12,4a0516
[57523.956081] Call Trace:
[57523.956084] ([<00000000004a041e>] check_sync+0xee/0x52c)
[57523.956087] [<00000000004a0a26>] debug_dma_sync_single_for_cpu+0x66/0x74
[57523.956094] [<000003ff801fcae0>] mlx4_en_complete_rx_desc+0x15c/0x240 [mlx4_en]
[57523.956098] [<000003ff801fe1fa>] mlx4_en_process_rx_cq+0x56a/0xb58 [mlx4_en]
[57523.956101] [<000003ff801fe8ea>] mlx4_en_poll_rx_cq+0x92/0xf8 [mlx4_en]
[57523.956105] [<00000000005d558c>] net_rx_action+0xa0/0x2f0
[57523.956109] [<000000000013cd10>] __do_softirq+0x1e8/0x480
[57523.956111] [<000000000013cfe4>] run_ksoftirqd+0x3c/0x80
[57523.956114] [<000000000016f496>] smpboot_thread_fn+0x2c6/0x2e8
[57523.956118] [<0000000000163834>] kthread+0xd0/0xe4
[57523.956123] [<00000000006ff19e>] kernel_thread_starter+0x6/0xc
[57523.956125] [<00000000006ff198>] kernel_thread_starter+0x0/0xc
[57523.956127] 1 lock held by ksoftirqd/3/20:
[57523.956129] #0: (&(&dma_entry_hash[i].lock)->rlock){-.-...}, at: [<000000000049f1d6>] get_hash_bucket+0x4a/0x64
[57523.956135] Last Breaking-Event-Address:
[57523.956137] [<00000000004a041e>] check_sync+0xee/0x52c
[57523.956139] ---[ end trace ec82af84e1d16c5f ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Amir Vadai

unread,
Feb 27, 2014, 12:10:03 PM2/27/14
to
Hi Gerald,

It seems that I missed this mail.

I didn't see this warning in our lab in x86 platform.
Is it reproduced easily on PPC?
Any hints how to reproduce it in our lab?

Do you have a 'good' version that didn't have the warning?

Thanks,
Amir

Gerald Schaefer

unread,
Feb 28, 2014, 4:30:03 PM2/28/14
to
Hi Amir,

I could narrow it down to a uperf testrun with the following profile
and CONFIG_DMA_API_DEBUG=y (it usually takes multiple runs to show):

<profile name="test250">
<group nthreads="250">
<transaction iterations="1">
<flowop type="accept" options="remotehost=10.100.1.36 protocol=tcp"/>
</transaction>
<transaction duration="300">
<flowop type="write" options="size=1"/>
<flowop type="read" options="size=1"/>
</transaction>
<transaction iterations="1">
<flowop type="disconnect" />
</transaction>
</group>
</profile>

>
> Do you have a 'good' version that didn't have the warning?

No, it showed on 3.12 and 3.13, before that we had other warnings but
not this one as far as I remember.

Thanks,
Gerald

Gerald Schaefer

unread,
Mar 3, 2014, 1:00:02 PM3/3/14
to
On Sun, 2 Mar 2014 10:40:47 +0200
Amir Vadai <am...@mellanox.com> wrote:
I checked again with 3.10 and 3.11, and couldn't reproduce the warning
there. However, there is another warning on those kernel versions:

mlx4_core 0000:00:00.0: DMA-API: device driver failed to check map error[device address=0x00000001046af186] [size=258 bytes] [mapped as single]

That one was fixed with commit 237a3a3b "net/mlx4_en: Fix handling of dma_map failure".
When I apply commit 237a3a3b on top of 3.11, I can also reproduce the
new "sync DMA" warning there.

So, either the new "sync DMA" warning was introduced with commit 237a3a3b,
or it was just occluded by the previous "map error" warning.

> >
> > Thanks,
> > Gerald
> >
>
> Ok, We're preparing a setup to reproduce the issue.
>
> Amir

Eyal Perry

unread,
Mar 5, 2014, 11:30:02 AM3/5/14
to
On 03/03/14 18:58 +0100, Gerald Schaefer wrote:
> I checked again with 3.10 and 3.11, and couldn't reproduce the warning
> there. However, there is another warning on those kernel versions:
>
> mlx4_core 0000:00:00.0: DMA-API: device driver failed to check map error[device address=0x00000001046af186] [size=258 bytes] [mapped as single]
>
> That one was fixed with commit 237a3a3b "net/mlx4_en: Fix handling of dma_map failure".
> When I apply commit 237a3a3b on top of 3.11, I can also reproduce the
> new "sync DMA" warning there.
>
> So, either the new "sync DMA" warning was introduced with commit 237a3a3b,
> or it was just occluded by the previous "map error" warning.
Hi Gerald,
Thanks for the update and for your efforts on investigating this issue,
we are currently working on reproducing this issue on our lab in order
to understand thoroughly the root cause.

Eyal
0 new messages