Regarding tx-nocache-copy in the Sheevaplug


Lluís Batlle i Rossell

Oct 13, 2014, 7:00:01 AM
Hello,

On the 7th of January 2014 this patch was applied:
https://lkml.org/lkml/2014/1/7/307

[PATCH v2] net: Do not enable tx-nocache-copy by default

On the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
sent corrupted. I think this machine has something special about its cache.

After re-enabling tx-nocache-copy (as it was before the patch),
transfers work fine again. I think that most people who encounter this problem
disable tx offload completely instead of re-enabling this setting.

Is this an ARM kernel problem regarding this platform?

Thank you,
Lluís
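
For reference, the setting discussed in this message is a per-interface feature flag that ethtool can query and change. The following is only a minimal sketch under stated assumptions: the interface is taken to be eth0, "tx offload" is assumed to mean TX checksum offload (the `tx` flag), and the helper names feature_state, enable_nocache_copy and disable_tx_offload are hypothetical. Only the `ethtool -k` and `ethtool -K` invocations are real commands.

```shell
# Interface name is an assumption; override with IFACE=... if it differs.
IFACE=${IFACE:-eth0}

# feature_state NAME: print "on" or "off" for one feature listed by
# `ethtool -k` (hypothetical helper, not part of ethtool).
feature_state() {
    ethtool -k "$IFACE" | awk -v f="$1:" '$1 == f { print $2 }'
}

# Turn tx-nocache-copy back on, as it was before the patch (needs root).
enable_nocache_copy() {
    ethtool -K "$IFACE" tx-nocache-copy on
}

# The alternative workaround mentioned above: turn TX checksum offload off.
# Assumes "tx offload" refers to the `tx` feature flag.
disable_tx_offload() {
    ethtool -K "$IFACE" tx off
}
```

Running `feature_state tx-nocache-copy` before and after `enable_nocache_copy` shows whether the change took effect.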
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Eric Dumazet

Oct 13, 2014, 8:30:02 AM
On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> Hello,
>
> on the 7th of January 2014 this patch was applied:
> https://lkml.org/lkml/2014/1/7/307
>
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.
>
> Enabling back this tx-nocache-copy (as it used to be before the patch) the
> transfers work fine again. I think that most people, encountering this problem,
> completely disable the tx offload instead of enabling back this setting.
>
> Is this an ARM kernel problem regarding this platform?

Which NIC and driver is this, exactly?

Lluís Batlle i Rossell

Oct 13, 2014, 8:40:02 AM
On Mon, Oct 13, 2014 at 05:26:11AM -0700, Eric Dumazet wrote:
> On Mon, 2014-10-13 at 12:52 +0200, Lluís Batlle i Rossell wrote:
> > Hello,
> >
> > on the 7th of January 2014 this patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> >
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > sent corrupted. I think this machine has something special about the cache.
> >
> > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > transfers work fine again. I think that most people, encountering this problem,
> > completely disable the tx offload instead of enabling back this setting.
> >
> > Is this an ARM kernel problem regarding this platform?
>
> Which NIC and driver is this exactly ?

According to dmesg in 3.10.1:
[ 7.858872] mv643xx_eth: MV-643xx 10/100/1000 ethernet driver version 1.4
[ 7.866001] mv643xx_eth_port mv643xx_eth_port.0 eth0: port 0 with MAC address 00:50:43:01:d1:bb

Regards,
Lluís.

Andrew Lunn

Oct 13, 2014, 10:30:02 AM
On Mon, Oct 13, 2014 at 12:52:46PM +0200, Lluís Batlle i Rossell wrote:
> Hello,
>
> on the 7th of January 2014 this patch was applied:
> https://lkml.org/lkml/2014/1/7/307
>
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.

Hi Lluís

Could you please describe your test setup? I would like to try to
reproduce the problem. I have a machine based on the Kirkwood 6282 with the
same Ethernet.

Thanks
Andrew

Lluís Batlle i Rossell

Oct 13, 2014, 10:40:01 AM
With tx offload enabled and tx-nocache-copy disabled, making the machine *send* a
lot of ssh traffic (sftp, for example) makes ssh fail its HMAC check. It's quite
easy to reproduce here.

As for the hardware, it's an old Sheevaplug board.
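
The reproduction described above can be made repeatable by streaming a known file to another machine and comparing SHA-256 digests. This is a minimal sketch, not the test that was actually run: verify_send is a hypothetical helper, and the file name and the user@peer host in the usage note are placeholders.

```shell
# verify_send FILE CMD [ARGS...]
# Streams FILE into CMD, which is expected to print a sha256sum-style line
# for the bytes it received, and compares that digest with the local one.
verify_send() {
    file=$1
    shift
    local_sum=$(sha256sum < "$file" | cut -d' ' -f1)
    remote_sum=$("$@" < "$file" | cut -d' ' -f1)
    if [ "$local_sum" = "$remote_sum" ]; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

A possible invocation is `verify_send big.bin ssh user@peer sha256sum`. Over ssh, a corrupted payload would be expected to abort the connection on the HMAC failure, in which case no digest comes back and the helper reports a mismatch.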

Eric Dumazet

Oct 13, 2014, 10:50:03 AM
On Mon, 2014-10-13 at 16:31 +0200, Lluís Batlle i Rossell wrote:
> Enabling tx offload and disabling tx-nocache-copy, making the machine *send* a
> lot of ssh traffic (sftp for example) makes ssh fail HMAC. It's quite easy to
> reproduce here.
>
> As for the hardware, it's an old sheevaplug board.


Have you tried disabling TSO only, and are you using the latest kernel?

Ezequiel Garcia added a lot of changes recently.

Lluís Batlle i Rossell

Oct 13, 2014, 11:50:02 AM
On Mon, Oct 13, 2014 at 07:49:19AM -0700, Eric Dumazet wrote:
> On Mon, 2014-10-13 at 16:31 +0200, Lluís Batlle i Rossell wrote:
> > Enabling tx offload and disabling tx-nocache-copy, making the machine *send* a
> > lot of ssh traffic (sftp for example) makes ssh fail HMAC. It's quite easy to
> > reproduce here.
> >
> > As for the hardware, it's an old sheevaplug board.
>
>
> Have you tried disabling TSO only, and are you using the latest kernel ?
>
> Ezequiel Garcia added lot of changes recently.
>
>

Is TSO TCP segmentation offload? It's disabled. The kernel is 3.16.3 (Debian).
https://packages.debian.org/testing/kernel/linux-image-3.16-2-kirkwood

Benjamin Poirier

Oct 15, 2014, 6:00:02 PM
On 2014/10/13 12:52, Lluís Batlle i Rossell wrote:
> Hello,
>
> on the 7th of January 2014 this patch was applied:
> https://lkml.org/lkml/2014/1/7/307
>
> [PATCH v2] net: Do not enable tx-nocache-copy by default
>
> In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> sent corrupted. I think this machine has something special about the cache.
>
> Enabling back this tx-nocache-copy (as it used to be before the patch) the
> transfers work fine again. I think that most people, encountering this problem,
> completely disable the tx offload instead of enabling back this setting.
>
> Is this an ARM kernel problem regarding this platform?

This is odd: only x86 defines ARCH_HAS_NOCACHE_UACCESS. On ARM,
skb_do_copy_data_nocache() should end up using __copy_from_user()
regardless of tx-nocache-copy.
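
The claim about which architectures define the macro can be checked against a kernel source tree with grep. A minimal sketch, assuming a tree has been checked out somewhere; archs_with_nocache_uaccess is a hypothetical helper and the path passed to it is a placeholder.

```shell
# archs_with_nocache_uaccess KSRC: list the arch/ subdirectories whose
# headers define ARCH_HAS_NOCACHE_UACCESS. KSRC is an assumed path to a
# kernel source tree. Requires a grep that supports -r and --include.
archs_with_nocache_uaccess() {
    ( cd "$1/arch" &&
      grep -rl --include='*.h' 'define ARCH_HAS_NOCACHE_UACCESS' . ) |
        cut -d/ -f2 | sort -u
}
```

If the statement above holds for the tree in question, `archs_with_nocache_uaccess /path/to/linux` should list x86 and nothing else.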

Eric Dumazet

Oct 15, 2014, 6:50:02 PM
On Wed, 2014-10-15 at 14:57 -0700, Benjamin Poirier wrote:
> On 2014/10/13 12:52, Lluís Batlle i Rossell wrote:
> > Hello,
> >
> > on the 7th of January 2014 this patch was applied:
> > https://lkml.org/lkml/2014/1/7/307
> >
> > [PATCH v2] net: Do not enable tx-nocache-copy by default
> >
> > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > sent corrupted. I think this machine has something special about the cache.
> >
> > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > transfers work fine again. I think that most people, encountering this problem,
> > completely disable the tx offload instead of enabling back this setting.
> >
> > Is this an ARM kernel problem regarding this platform?
>
> This is odd, only x86 defines ARCH_HAS_NOCACHE_UACCESS. On arm,
> skb_do_copy_data_nocache() should end up using __copy_from_user()
> regardless of tx-nocache-copy.

kmap_atomic()/kunmap_atomic() is missing, so we lack
__cpuc_flush_dcache_area() operations.

Benjamin Poirier

Oct 16, 2014, 1:40:02 PM
On 2014/10/15 15:45, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 14:57 -0700, Benjamin Poirier wrote:
> > On 2014/10/13 12:52, Lluís Batlle i Rossell wrote:
> > > Hello,
> > >
> > > on the 7th of January 2014 this patch was applied:
> > > https://lkml.org/lkml/2014/1/7/307
> > >
> > > [PATCH v2] net: Do not enable tx-nocache-copy by default
> > >
> > > In the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this made packets to be
> > > sent corrupted. I think this machine has something special about the cache.
> > >
> > > Enabling back this tx-nocache-copy (as it used to be before the patch) the
> > > transfers work fine again. I think that most people, encountering this problem,
> > > completely disable the tx offload instead of enabling back this setting.
> > >
> > > Is this an ARM kernel problem regarding this platform?
> >
> > This is odd, only x86 defines ARCH_HAS_NOCACHE_UACCESS. On arm,
> > skb_do_copy_data_nocache() should end up using __copy_from_user()
> > regardless of tx-nocache-copy.
>
> kmap_atomic()/kunmap_atomic() is missing, so we lack
> __cpuc_flush_dcache_area() operations.
>

You lost me there.
1) I don't see the link
2) It seems kmap_atomic and so on are there:
$ grep kmap_atomic System.map-3.16-2-kirkwood
c0014838 T kmap_atomic
c001491c T kmap_atomic_pfn
c00149a4 T kmap_atomic_to_page

MACH_KIRKWOOD selects CPU_FEROCEON which has
__cpuc_flush_dcache_area ->
cpu_cache.flush_kern_dcache_area ->
feroceon_flush_kern_dcache_area

Lluís Batlle i Rossell

Oct 16, 2014, 1:50:02 PM
Hello all,

It seems I was a bit wrong: although re-enabling tx-nocache-copy makes the
tx errors (ssh complaining about HMAC) happen much less often, they still
happen. It seems that something introduced in a recent kernel broke
tx offload.

I have no idea what it could be, but from 2.6 until at least 3.10 the network
driver worked fine with tx offload on this Sheevaplug board.

Regards,
Lluís.

Eric Dumazet

Oct 16, 2014, 1:50:05 PM
On Thu, 2014-10-16 at 10:34 -0700, Benjamin Poirier wrote:
> On 2014/10/15 15:45, Eric Dumazet wrote:

> > kmap_atomic()/kunmap_atomic() is missing, so we lack
> > __cpuc_flush_dcache_area() operations.
> >
>
> You lost me there.
> 1) I don't see the link
> 2) It seems kmap_atomic and so on are there:
> $ grep kmap_atomic System.map-3.16-2-kirkwood
> c0014838 T kmap_atomic
> c001491c T kmap_atomic_pfn
> c00149a4 T kmap_atomic_to_page
>
> MACH_KIRKWOOD selects CPU_FEROCEON which has
> __cpuc_flush_dcache_area ->
> cpu_cache.flush_kern_dcache_area ->
> feroceon_flush_kern_dcache_area

I meant to put a '?' instead of a '.'

Note that tcp does a copy, using :

Benjamin Poirier

Oct 17, 2014, 5:00:02 PM
On 2014/10/16 19:46, Lluís Batlle i Rossell wrote:
[...]
>
> Hello all,
>
> it seems I was a bit wrong - although enabling back tx-nocache-copy makes the
> tx-errors happen much less often (ssh complaining about HMAC), they still
> happen. It seems that something was introduced in some recent kernels that broke
> the tx offload.
>
> I have no idea what it can be, but since 2.6 until at least 3.10 the network
> driver worked fine with tx offload in this sheevaplug board.

It's not the most pleasant alternative, but if you can reliably tell
whether the problem is occurring, you could try bisecting,
possibly limiting the bisection to mv643xx:

$ git bisect start v3.16.3 v3.10 -- drivers/net/ethernet/marvell/mv643xx_eth.c
Bisecting: 16 revisions left to test after this (roughly 4 steps)

The problem might be outside of the driver though.
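
Because each candidate revision has to be built, booted on the board and tested by hand, the loop after `git bisect start` comes down to reporting one verdict per revision. A minimal sketch of that step follows; mark_step is a hypothetical helper and the verdict words "ok" and "corrupt" are arbitrary labels, while `git bisect good`, `git bisect bad` and `git bisect skip` are the real subcommands.

```shell
# mark_step VERDICT: record the result for the currently checked-out
# revision and let git pick the next one to test.
mark_step() {
    case "$1" in
        ok)      git bisect good ;;   # transfers arrived intact
        corrupt) git bisect bad ;;    # HMAC failures or digest mismatch
        *)       git bisect skip ;;   # e.g. the kernel did not build or boot
    esac
}
```

After the last step git prints the first bad commit, and `git bisect reset` returns the tree to where it started.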