[RFC PATCH] drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs

118 views
Skip to first unread message

Uenal Mutlu

unread,
May 10, 2019, 3:27:29 PM5/10/19
to Jens Axboe, Maxime Ripard, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Uenal Mutlu, linux...@googlegroups.com, u-b...@lists.denx.de, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara
Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from
default 0x0 each to 0x3 each gives a write performance boost of 120MB/s
from lame 36MB/s to 45MB/s previously. Read performance is about 200MB/s
[tested on SSD using dd bs=4K count=512K].

Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
These devices are RaspberryPi-like small devices.

RFC: Since more than about 25 similar SBC/SoC models do use the
ahci_sunxi driver, users are encouraged to test it on all the
affected boards and give feedback.

List of the affected sunxi and other boards and SoCs with SATA using
the ahci_sunxi driver:
$ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
and http://linux-sunxi.org/Category:Devices_with_SATA_port

Signed-off-by: Uenal Mutlu <u...@mutluit.com>
---
drivers/ata/ahci_sunxi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ata/ahci_sunxi.c b/drivers/ata/ahci_sunxi.c
index 911710643305..257986431c79 100644
--- a/drivers/ata/ahci_sunxi.c
+++ b/drivers/ata/ahci_sunxi.c
@@ -158,7 +158,7 @@ static void ahci_sunxi_start_engine(struct ata_port *ap)
struct ahci_host_priv *hpriv = ap->host->private_data;

/* Setup DMA before DMA start */
- sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ff00, 0x00004400);
+ sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff, 0x00004433);

/* Start DMA */
sunxi_setbits(port_mmio + PORT_CMD, PORT_CMD_START);
--
2.11.0

Stefan Monnier

unread,
May 11, 2019, 9:38:47 AM5/11/19
to linux...@googlegroups.com, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, u-b...@lists.denx.de
> Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from
> default 0x0 each to 0x3 each gives a write performance boost of 120MB/s
> from lame 36MB/s to 45MB/s previously. Read performance is about 200MB/s
> [tested on SSD using dd bs=4K count=512K].

Such a simple patch to fix such a long-standing performance problem that
everyone [ well, apparently not quite everyone ] assumed was a hardware
limitation...

And yet, April 1st is long gone.

Is it really for real?


Stefan

U.Mutlu

unread,
May 11, 2019, 2:12:10 PM5/11/19
to Stefan Monnier, linu...@vger.kernel.org, linux...@googlegroups.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, u-b...@lists.denx.de
Yes, it's indeed real, Stefan; really no April 1st joke. :-)

As you indicated, this problem of slow SATA write-speed
with these small devices lasts now for more than 5 years.
This patch finally solves the problem.

On my test device (BPI-R1) the optimum blocksize seems to be 12K
as it then gives even 129 MB/s write speed.

Here are some test results with different blocksizes, all giving
a write speed of 125 to 129 MB/s:

time sh -c "dd if=/dev/zero of=test.tmp bs=$bs count=$count conv=fdatasync"


------------ bs=8K / count=256K / 1 ------------------
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 16.9237 s, 127 MB/s

real 0m16.935s
user 0m0.388s
sys 0m15.777s

------------ bs=8K / count=256K / 2 ------------------
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 16.9916 s, 126 MB/s

real 0m17.973s
user 0m0.326s
sys 0m16.806s

------------ bs=8K / count=256K / 3 ------------------
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 17.0085 s, 126 MB/s

real 0m17.993s
user 0m0.442s
sys 0m16.588s

------------ bs=12K / count=171K / 1 ------------------
175104+0 records in
175104+0 records out
2151677952 bytes (2.2 GB) copied, 16.8474 s, 128 MB/s

real 0m16.860s
user 0m0.205s
sys 0m15.705s

------------ bs=12K / count=171K / 2 ------------------
175104+0 records in
175104+0 records out
2151677952 bytes (2.2 GB) copied, 16.6934 s, 129 MB/s

real 0m17.669s
user 0m0.227s
sys 0m16.355s

------------ bs=12K / count=171K / 3 ------------------
175104+0 records in
175104+0 records out
2151677952 bytes (2.2 GB) copied, 16.6684 s, 129 MB/s

real 0m17.654s
user 0m0.388s
sys 0m16.118s

------------ bs=16K / count=128K / 1 ------------------
131072+0 records in
131072+0 records out
2147483648 bytes (2.1 GB) copied, 17.1845 s, 125 MB/s

real 0m17.200s
user 0m0.251s
sys 0m16.060s

------------ bs=16K / count=128K / 2 ------------------
131072+0 records in
131072+0 records out
2147483648 bytes (2.1 GB) copied, 16.9221 s, 127 MB/s

real 0m17.902s
user 0m0.170s
sys 0m16.763s

------------ bs=16K / count=128K / 3 ------------------
131072+0 records in
131072+0 records out
2147483648 bytes (2.1 GB) copied, 16.8845 s, 127 MB/s

real 0m17.868s
user 0m0.167s
sys 0m16.736s


Maxime Ripard

unread,
May 12, 2019, 8:12:53 AM5/12/19
to Uenal Mutlu, Jens Axboe, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, u-b...@lists.denx.de, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara
Hi,
Having comments / defines here would be great, once fixed:
Acked-by: Maxime Ripard <maxime...@bootlin.com>

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

U.Mutlu

unread,
May 12, 2019, 12:08:23 PM5/12/19
to Maxime Ripard, Jens Axboe, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, u-b...@lists.denx.de, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara
Hi Maxime & Others,

what follows is a somewhat lengthy technical story behind this patch;
you can just skip it and jump to the end.


As can be seen in the ahci_sunxi.c, the port used in this patch
is this one (32bit):
#define AHCI_P0DMACR 0x0170
It's a so called "Vendor Specific Port" according to the SATA/AHCI specs by Intel.
The data behind it is actually a struct, consisting of 4 fields,
each 4bits long, plus a 16bits long field that is marked as Reserved
in secondary literature (see below):

struct AHCI_P0DMACR_t
{
unsigned TXTS : 4,
RXTS : 4,
TXABL : 4,
RXABL : 4,
Res1 : 16;
};

This struct is just my creation for my own tests as it's not part of the
driver source. The patch touches only the first 2 fields: TXTS and RXTS.

See this similar product documentation by Texas Instruments for the above struct:
https://www.ti.com/lit/ug/sprugj8c/sprugj8c.pdf
TMS320C674x/OMAP-L1x Processor, Serial ATA (SATA) Controller, User's Guide,
Literature Number: SPRUGJ8C, March 2011,
Page 68, Chapter 4.33 "Port DMA Control Register (P0DMACR)"

The above TI document describes the two fields as follows:

TXTS:
Transmit Transaction Size (TX_TRANSACTION_SIZE). This field defines the DMA
transaction size in
DWORDs for transmit (system bus read, device write) operation. [...]

RXTS:
Receive Transaction Size (RX_TRANSACTION_SIZE). This field defines the Port
DMA transaction size
in DWORDs for receive (system bus write, device read) operation. [...]


So, in my patch the fields TXTS and RXTS are set to 3 each.
Without the patch, these fields seem to have some random content
(I'vee seen 5 and 6, 4 and 4, and 0 and 0 on different devices),
as the previous code doesn't touch these 2 fields (ie. these two fields
are not within the used old mask of 0xff00; cf. ahci_sunxi.c, function
ahci_sunxi_start_engine(...)).


Some background story in my hunt for obtaining product documentation:

I couldn't find any product documentation for the SATA/AHCI
in these SoCs by Allwinner Technology (allwinnertech.com),
unlike with such products from other such companies.

I asked Allwinner, but they just said that the A20 of my SBC
would (allegedly) no more be actual and that the support for it
has ended (but this statement somehow cannot be true as the
A20 SoC is still continued being marketed at their website).
They instead sent me a bunch of really irrelevant PDFs which have
nothing to do with SATA/AHCI.

So, the company Allwinner Technology unfortunately was not cooperative
to provide me information on their SATA/AHCI-implementation in their SoCs :-(
Even the ports used in the actual ahci_sunxi.c in the linux tree are undocumented;
it is even commented with "/* This magic is from the original code */"
and below it many ports are used for which no documentation is available,
or at least I couldn't find any on the Internet. And the initial programmer
in 2014 and prior was Daniel Wang (danie...@allwinnertech.com),
but email to that address bounces.

So, I was forced to research secondary literature from other vendors
like Texas Instruments (thanks TI !) and Intel, and also studying
very old source codes in the old Linux repositories (as it differs
much from the current version) going back to the year 2014, and had
to do many (blind) experiments until I found this solution.

The above given User's Guide by Texas Instruments (and their such
documents for their newer such products) helped me much to find the solution.
It's of course not really the correct documentation for the Allwinner SoCs,
but still better than nothing.

If I only had the right documentation, then I for sure could try
to further improve that already achieved result by this patch,
as with SATA-II upto 300 MiB/s is possible.


Yes, I'll resend the patch with some appropriate comments.

Uenal Mutlu

Maxime Ripard

unread,
May 12, 2019, 1:40:45 PM5/12/19
to U.Mutlu, Jens Axboe, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, u-b...@lists.denx.de, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara
Hi,
That's awesome research and explanation, thanks! :)

Uenal Mutlu

unread,
May 12, 2019, 5:00:29 PM5/12/19
to Jens Axboe, Maxime Ripard, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Uenal Mutlu, linux...@googlegroups.com, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS, ie.
TX_TRANSACTION_SIZE and RX_TRANSACTION_SIZE) from default 0x0 each
to 0x3 each, gives a write performance boost of 120 MiB/s to 132 MiB/s
from lame 36 MiB/s to 45 MiB/s previously.
Read performance is about 200 MiB/s.
[tested on SSD using dd bs=2K/4K/8K/12K/16K/24K/32K: peak-perf at 12K].

Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
These devices are RaspberryPi-like small devices.

This problem of slow SATA write-speed with these small devices lasts now
for more than 5 years. Many commentators throughout the years wrongly
assumed the slow write speed was a hardware limitation. This patch finally
solves the problem, which in fact was just a hard-to-fix software problem
(b/c of lack of documentation by the SoC-maker Allwinner Technology).

RFC: Since more than about 25 similar SBC/SoC models do use the
ahci_sunxi driver, users are encouraged to test it on all the
affected boards and give feedback.

Lists of the affected sunxi and other boards and SoCs with SATA using
the ahci_sunxi driver:
$ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
and http://linux-sunxi.org/SATA#Devices_with_SATA_ports
See also http://linux-sunxi.org/Category:Devices_with_SATA_port

Patch v2:
- Commented the patch in-place in ahci_sunxi.c
- With bs=12K and no conv=... passed to dd, the write performance
rises further to 132 MiB/s
- Changed MB/s to MiB/s
- Posted the story behind the patch:
http://lkml.iu.edu/hypermail/linux/kernel/1905.1/03506.html
- Posted a dd test script to find optimal bs, and some results:
https://bit.ly/2YoOzEM

Patch v1:
- States bs=4K for dd and a write performance of 120 MiB/s

Signed-off-by: Uenal Mutlu <u...@mutluit.com>
---
drivers/ata/ahci_sunxi.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/ahci_sunxi.c b/drivers/ata/ahci_sunxi.c
index 911710643305..ed19f19808c5 100644
--- a/drivers/ata/ahci_sunxi.c
+++ b/drivers/ata/ahci_sunxi.c
@@ -157,8 +157,51 @@ static void ahci_sunxi_start_engine(struct ata_port *ap)
void __iomem *port_mmio = ahci_port_base(ap);
struct ahci_host_priv *hpriv = ap->host->private_data;

- /* Setup DMA before DMA start */
- sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ff00, 0x00004400);
+ /* Setup DMA before DMA start
+ *
+ * NOTE: A similar SoC with SATA/AHCI by Texas Instruments documents
+ * this Vendor Specific Port (P0DMACR, aka PxDMACR) in its
+ * User's Guide document (TMS320C674x/OMAP-L1x Processor
+ * Serial ATA (SATA) Controller, Literature Number: SPRUGJ8C,
+ * March 2011, Chapter 4.33 Port DMA Control Register (P0DMACR),
+ * p.68, https://www.ti.com/lit/ug/sprugj8c/sprugj8c.pdf)
+ * as equivalent to the following struct:
+ *
+ * struct AHCI_P0DMACR_t
+ * {
+ * unsigned TXTS : 4,
+ * RXTS : 4,
+ * TXABL : 4,
+ * RXABL : 4,
+ * Reserved : 16;
+ * };
+ *
+ * TXTS: Transmit Transaction Size (TX_TRANSACTION_SIZE).
+ * This field defines the DMA transaction size in DWORDs for
+ * transmit (system bus read, device write) operation. [...]
+ *
+ * RXTS: Receive Transaction Size (RX_TRANSACTION_SIZE).
+ * This field defines the Port DMA transaction size in DWORDs
+ * for receive (system bus write, device read) operation. [...]
+ *
+ * TXABL: Transmit Burst Limit.
+ * This field allows software to limit the VBUSP master read
+ * burst size. [...]
+ *
+ * RXABL: Receive Burst Limit.
+ * Allows software to limit the VBUSP master write burst
+ * size. [...]
+ *
+ * Reserved: Reserved.
+ *
+ *
+ * NOTE: According to the above document, the following alternative
+ * to the code below could perhaps be a better option
+ * (or preparation) for possible further improvements later:
+ * sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff,
+ * 0x00000033);
+ */
+ sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff, 0x00004433);

Hans de Goede

unread,
May 13, 2019, 3:44:51 AM5/13/19
to Uenal Mutlu, Jens Axboe, Maxime Ripard, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
Hi,

On 12-05-19 22:59, Uenal Mutlu wrote:
> Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS, ie.
> TX_TRANSACTION_SIZE and RX_TRANSACTION_SIZE) from default 0x0 each
> to 0x3 each, gives a write performance boost of 120 MiB/s to 132 MiB/s
> from lame 36 MiB/s to 45 MiB/s previously.
> Read performance is about 200 MiB/s.
> [tested on SSD using dd bs=2K/4K/8K/12K/16K/24K/32K: peak-perf at 12K].
>
> Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
> with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
> These devices are RaspberryPi-like small devices.
>
> This problem of slow SATA write-speed with these small devices lasts now
> for more than 5 years. Many commentators throughout the years wrongly
> assumed the slow write speed was a hardware limitation. This patch finally
> solves the problem, which in fact was just a hard-to-fix software problem
> (b/c of lack of documentation by the SoC-maker Allwinner Technology).
>
> RFC: Since more than about 25 similar SBC/SoC models do use the
> ahci_sunxi driver, users are encouraged to test it on all the
> affected boards and give feedback

The SATA controller on these boards is inside the A10/A20 SoC, the
A10 and A20 use the same controller, so it is the same on all the boards.

IOW I don't see this only being tested on 1 board as a reason for the patch
to be RFC.
Have you tried / benchmarked the 0x00000033 option?

Regards,

Hans

Maxime Ripard

unread,
May 13, 2019, 5:59:22 AM5/13/19
to Uenal Mutlu, Jens Axboe, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
Hi,
Acked-by: Maxime Ripard <maxime...@bootlin.com>

Just a minor nitpick though, the part starting with RFC: and with the
version changelog should be after the --- below so that it doesn't get
applied as part of the commit log.

U.Mutlu

unread,
May 13, 2019, 6:34:55 AM5/13/19
to Hans de Goede, Jens Axboe, Maxime Ripard, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
Hans de Goede wrote on 05/13/2019 09:44 AM:
> On 12-05-19 22:59, Uenal Mutlu wrote:
>> Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS, ie.
>> TX_TRANSACTION_SIZE and RX_TRANSACTION_SIZE) from default 0x0 each
>> to 0x3 each, gives a write performance boost of 120 MiB/s to 132 MiB/s
>> from lame 36 MiB/s to 45 MiB/s previously.
>> Read performance is about 200 MiB/s.
>> [tested on SSD using dd bs=2K/4K/8K/12K/16K/24K/32K: peak-perf at 12K].
>>
>> Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
>> with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
>> These devices are RaspberryPi-like small devices.
>>
>> This problem of slow SATA write-speed with these small devices lasts now
>> for more than 5 years. Many commentators throughout the years wrongly
>> assumed the slow write speed was a hardware limitation. This patch finally
>> solves the problem, which in fact was just a hard-to-fix software problem
>> (b/c of lack of documentation by the SoC-maker Allwinner Technology).
>>
>> RFC: Since more than about 25 similar SBC/SoC models do use the
>> ahci_sunxi driver, users are encouraged to test it on all the
>> affected boards and give feedback
>
> The SATA controller on these boards is inside the A10/A20 SoC, the
> A10 and A20 use the same controller, so it is the same on all the boards.

Ok, thanks for the clarification.
This fact of course simplifies the whole issue.

> IOW I don't see this only being tested on 1 board as a reason for the patch
> to be RFC.

I just wanted to be on the safe side :-) since I personally
have only 2 of the 25+ affected systems here for testing.
But I now understand that if it works on the tested two
A20 systems, then it normally should work on all of the
affected different boards/models as they all are using
the same SATA/AHCI-IP-Core, much like you also stated.

But of course I still wouldn't like it if someone loses data
and possibly blames my patch or even me myself, as the issue
is indeed a sensitive one where data loss (corruption of the
filesystem, partition, or the partition table) can indeed happen.
To be honest, it happened to me during my experiments with
some wrong, too high, values.
But I think the current version is mature and stable.
Yes, I did, but not that extensively yet. So far I couldn't see any
difference in the outcome.
There is in the TI doc just the following statement regarding this:

"Note that programming a burst size of greater than a transaction size,
while not invalid, is meaningless because the DMA maximizes out at
transaction size." (Ch. 3.4 DMA, p.15 in the TI doc).

I understand this as a neutral statement, ie. it doesn't hurt or make
any difference in practice if one sets TXABL effectively higher than
TXTS and/or RXABL effectively higher than RXTS,
FYI: these value are just some index-values, ie. index-value x means real value y.
0 for TXABL and/or RXABL means "Limit VBUSP burst size to 256 DWORDS",
ie. to the highest possible value for Transmit Burst Limit (TXABL) and
Receive Burst Limit (RXABL), respectively.

I'll test this alternative version now extensively in my test series.

But I could need an advice on what step I should take next in this
issue regarding getting the patch merged into the mainline kernel.
Shall I post a v3 of the patch with RFC removed, some more comments
added, and switching to the above alternative function argument
together with its test results, or shall I do these additions only
after the current version has already been merged into the kernel?
[actually I'm now unsure if patches with "RFC" flag ever get merged :-].

Thx.

> Regards,
>
> Hans

Hans de Goede

unread,
May 13, 2019, 7:20:27 AM5/13/19
to U.Mutlu, Jens Axboe, Maxime Ripard, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
Hi,
If you're not seeing much difference in performance and all your
current testing has been done with 0x00004433, then it is fine to just
stick with that version, the original 0x00004400 values comes from
Allwinner's own SDK, presumably they had a reason for this.

> But I could need an advice on what step I should take next in this
> issue regarding getting the patch merged into the mainline kernel.
> Shall I post a v3 of the patch with RFC removed, some more comments
> added, and switching to the above alternative function argument
> together with its test results, or shall I do these additions only
> after the current version has already been merged into the kernel?
> [actually I'm now unsure if patches with "RFC" flag ever get merged :-].

The next step would be sending a non RFC version, with Maxime's Acked-by
added above your Signed-off-by, you may also add my:

Reviewed-by: Hans de Goede <hdeg...@redhat.com>, above your S-o-b
line.

Regards,

Hans

U.Mutlu

unread,
May 13, 2019, 7:20:56 AM5/13/19
to Maxime Ripard, Jens Axboe, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, linux...@googlegroups.com, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
Thx!

> Just a minor nitpick though, the part starting with RFC: and with the
> version changelog should be after the --- below so that it doesn't get
> applied as part of the commit log.

Ok, I'll do it better in the future.
Thx again.

Uenal Mutlu

unread,
May 13, 2019, 10:24:44 AM5/13/19
to Jens Axboe, Maxime Ripard, Chen-Yu Tsai, linu...@vger.kernel.org, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, Uenal Mutlu, linux...@googlegroups.com, linux-...@amarulasolutions.com, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS, ie.
TX_TRANSACTION_SIZE and RX_TRANSACTION_SIZE) from default 0x0 each
to 0x3 each, gives a write performance boost of 120 MiB/s to 132 MiB/s
from lame 36 MiB/s to 45 MiB/s previously.
Read performance is above 200 MiB/s.
[tested on SSD using dd bs=4K/8K/12K/16K/20K/24K/32K: peak-perf at 12K]

Tested on the SBCs Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 which
are based on the Allwinner A20 32bit-SoC (ARMv7-a / arm-linux-gnueabihf).
These devices are RaspberryPi-like small devices.

This problem of slow SATA write-speed with these small devices lasts
for about 7 years now (beginning with the A10 SoC). Many commentators
throughout the years wrongly assumed the slow write speed was a
hardware limitation. This patch finally solves the problem, which
in fact was just a hard-to-find software problem due to lack of
SATA/AHCI documentation by the SoC-maker Allwinner Technology.

Lists of the affected sunxi and other boards and SoCs with SATA using
the ahci_sunxi driver:
$ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
and http://linux-sunxi.org/SATA#Devices_with_SATA_ports
See also http://linux-sunxi.org/Category:Devices_with_SATA_port

Acked-by: Maxime Ripard <maxime...@bootlin.com>
Reviewed-by: Hans de Goede <hdeg...@redhat.com>
Signed-off-by: Uenal Mutlu <u...@mutluit.com>
---

v3:
* Removed RFC from Subject line, and also the explicit call for RFC
in the text, thereby submitting the patch for official merging.

v2:
* Commented the patch in-place in ahci_sunxi.c
* With bs=12K and no conv=... passed to dd, the write performance
rises further to 132 MiB/s
* Changed MB/s to MiB/s
* Posted the story behind the patch:
http://lkml.iu.edu/hypermail/linux/kernel/1905.1/03506.html
* Posted a dd test script to find optimal bs, and some results:
https://bit.ly/2YoOzEM

v1:
* States bs=4K for dd and a write performance of 120 MiB/s
---
drivers/ata/ahci_sunxi.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/ahci_sunxi.c b/drivers/ata/ahci_sunxi.c
index 911710643305..018186a39a69 100644
--- a/drivers/ata/ahci_sunxi.c
+++ b/drivers/ata/ahci_sunxi.c
@@ -157,8 +157,51 @@ static void ahci_sunxi_start_engine(struct ata_port *ap)
void __iomem *port_mmio = ahci_port_base(ap);
struct ahci_host_priv *hpriv = ap->host->private_data;

- /* Setup DMA before DMA start */
- sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ff00, 0x00004400);
+ /* Setup DMA before DMA start
+ *
+ * NOTE: A similar SoC with SATA/AHCI by Texas Instruments documents
+ * this Vendor Specific Port (P0DMACR, aka PxDMACR) in its
+ * User's Guide document (TMS320C674x/OMAP-L1x Processor
+ * Serial ATA (SATA) Controller, Literature Number: SPRUGJ8C,
+ * March 2011, Chapter 4.33 Port DMA Control Register (P0DMACR),
+ * p.68, https://www.ti.com/lit/ug/sprugj8c/sprugj8c.pdf)
+ * as equivalent to the following struct:
+ *
+ * struct AHCI_P0DMACR_t
+ * {
+ * unsigned TXTS : 4;
+ * unsigned RXTS : 4;
+ * unsigned TXABL : 4;
+ * unsigned RXABL : 4;
+ * unsigned Reserved : 16;

Chen-Yu Tsai

unread,
Jul 5, 2019, 4:49:00 AM7/5/19
to Jens Axboe, Hans de Goede, Maxime Ripard, linu...@vger.kernel.org, linux-arm-kernel, linux-kernel, Uenal Mutlu, linux-sunxi, linux-amarula, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, FUKAUMI Naoki, Andre Przywara, Stefan Monnier
On Mon, May 13, 2019 at 10:24 PM Uenal Mutlu <u...@mutluit.com> wrote:
>
> Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS, ie.
> TX_TRANSACTION_SIZE and RX_TRANSACTION_SIZE) from default 0x0 each
> to 0x3 each, gives a write performance boost of 120 MiB/s to 132 MiB/s
> from lame 36 MiB/s to 45 MiB/s previously.
> Read performance is above 200 MiB/s.
> [tested on SSD using dd bs=4K/8K/12K/16K/20K/24K/32K: peak-perf at 12K]
>
> Tested on the SBCs Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 which
> are based on the Allwinner A20 32bit-SoC (ARMv7-a / arm-linux-gnueabihf).
> These devices are RaspberryPi-like small devices.
>
> This problem of slow SATA write-speed with these small devices lasts
> for about 7 years now (beginning with the A10 SoC). Many commentators
> throughout the years wrongly assumed the slow write speed was a
> hardware limitation. This patch finally solves the problem, which
> in fact was just a hard-to-find software problem due to lack of
> SATA/AHCI documentation by the SoC-maker Allwinner Technology.
>
> Lists of the affected sunxi and other boards and SoCs with SATA using
> the ahci_sunxi driver:
> $ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
> and http://linux-sunxi.org/SATA#Devices_with_SATA_ports
> See also http://linux-sunxi.org/Category:Devices_with_SATA_port
>
> Acked-by: Maxime Ripard <maxime...@bootlin.com>
> Reviewed-by: Hans de Goede <hdeg...@redhat.com>
> Signed-off-by: Uenal Mutlu <u...@mutluit.com>

Tested-by: Chen-Yu Tsai <we...@csie.org>

on a Lamabo R1 as well.

Maybe we could merge this soon so it makes the next merge window?

Thanks.
Reply all
Reply to author
Forward
0 new messages