Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] dma: omap-dma: add support for pause of non-cyclic transfers

22 views
Skip to first unread message

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 4:50:05 AM8/7/15
to
This DMA driver is used by 8250-omap on DRA7-evm. There is one
requirement that is to pause a transfer. This is currently used on the RX
side. It is possible that the UART HW aborted the RX (UART's RX-timeout)
but the DMA controller starts the transfer shortly after.
Before we can manually purge the FIFO we need to pause the transfer,
check how many bytes it already received and terminate the transfer
without it making any progress.

From testing on the TX side it seems that it is possible that we invoke
pause once the transfer has completed which is indicated by the missing
CCR_ENABLE bit but before the interrupt has been noticed. In that case the
interrupt will come even after disabling it.

The AM572x manual says that we have to wait for the CCR_RD_ACTIVE &
CCR_WR_ACTIVE bits to be gone before programming it again here is the
drain loop. Also it looks like without the drain the TX-transfer makes
sometimes progress.

One note: The pause + resume combo is broken because after resume the
the complete transfer will be programmed again. That means the already
transferred bytes (until the pause event) will be sent again. This is
currently not important for my UART user because it does only pause +
terminate.

Cc: <sta...@vger.kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <big...@linutronix.de>
---
drivers/dma/omap-dma.c | 54 ++++++++++++++++++++++++++++++++++++++------------
1 file changed, 41 insertions(+), 13 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 249445c8a4c6..6b8497203caf 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -299,7 +299,7 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d)
omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE);
}

-static void omap_dma_stop(struct omap_chan *c)
+static int omap_dma_stop(struct omap_chan *c)
{
struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device);
uint32_t val;
@@ -342,8 +342,26 @@ static void omap_dma_stop(struct omap_chan *c)

omap_dma_glbl_write(od, OCP_SYSCONFIG, sysconfig);
} else {
+ int i = 0;
+
+ if (!(val & CCR_ENABLE))
+ return -EINVAL;
+
val &= ~CCR_ENABLE;
omap_dma_chan_write(c, CCR, val);
+ do {
+ val = omap_dma_chan_read(c, CCR);
+ if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)))
+ break;
+ if (i > 100)
+ break;
+ udelay(5);
+ } while (1);
+
+ if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))
+ dev_err(c->vc.chan.device->dev,
+ "DMA drain did not complete on lch %d\n",
+ c->dma_ch);
}

mb();
@@ -358,6 +376,7 @@ static void omap_dma_stop(struct omap_chan *c)

omap_dma_chan_write(c, CLNK_CTRL, val);
}
+ return 0;
}

static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
@@ -728,6 +747,8 @@ static enum dma_status omap_dma_tx_status(struct dma_chan *chan,
} else {
txstate->residue = 0;
}
+ if (ret == DMA_IN_PROGRESS && c->paused)
+ ret = DMA_PAUSED;
spin_unlock_irqrestore(&c->vc.lock, flags);

return ret;
@@ -1053,28 +1074,33 @@ static int omap_dma_terminate_all(struct dma_chan *chan)
static int omap_dma_pause(struct dma_chan *chan)
{
struct omap_chan *c = to_omap_dma_chan(chan);
+ struct omap_dmadev *od = to_omap_dma_dev(chan->device);
+ unsigned long flags;
+ int ret = -EINVAL;

- /* Pause/Resume only allowed with cyclic mode */
- if (!c->cyclic)
- return -EINVAL;
+ spin_lock_irqsave(&od->irq_lock, flags);

- if (!c->paused) {
- omap_dma_stop(c);
- c->paused = true;
+ if (!c->paused && c->desc) {
+ ret = omap_dma_stop(c);
+ if (!ret)
+ c->paused = true;
}

- return 0;
+ spin_unlock_irqrestore(&od->irq_lock, flags);
+
+ return ret;
}

static int omap_dma_resume(struct dma_chan *chan)
{
struct omap_chan *c = to_omap_dma_chan(chan);
+ struct omap_dmadev *od = to_omap_dma_dev(chan->device);
+ unsigned long flags;
+ int ret = -EINVAL;

- /* Pause/Resume only allowed with cyclic mode */
- if (!c->cyclic)
- return -EINVAL;
+ spin_lock_irqsave(&od->irq_lock, flags);

- if (c->paused) {
+ if (c->paused && c->desc) {
mb();

/* Restore channel link register */
@@ -1082,9 +1108,11 @@ static int omap_dma_resume(struct dma_chan *chan)

omap_dma_start(c, c->desc);
c->paused = false;
+ ret = 0;
}
+ spin_unlock_irqrestore(&od->irq_lock, flags);

- return 0;
+ return ret;
}

static int omap_dma_chan_init(struct omap_dmadev *od)
--
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Peter Ujfalusi

unread,
Aug 7, 2015, 5:50:06 AM8/7/15
to
On 08/07/2015 11:41 AM, Sebastian Andrzej Siewior wrote:
> This DMA driver is used by 8250-omap on DRA7-evm. There is one
> requirement that is to pause a transfer. This is currently used on the RX
> side. It is possible that the UART HW aborted the RX (UART's RX-timeout)
> but the DMA controller starts the transfer shortly after.
> Before we can manually purge the FIFO we need to pause the transfer,
> check how many bytes it already received and terminate the transfer
> without it making any progress.
>
> From testing on the TX side it seems that it is possible that we invoke
> pause once the transfer has completed which is indicated by the missing
> CCR_ENABLE bit but before the interrupt has been noticed. In that case the
> interrupt will come even after disabling it.
>
> The AM572x manual says that we have to wait for the CCR_RD_ACTIVE &
> CCR_WR_ACTIVE bits to be gone before programming it again here is the
> drain loop. Also it looks like without the drain the TX-transfer makes
> sometimes progress.
>
> One note: The pause + resume combo is broken because after resume the
> the complete transfer will be programmed again. That means the already
> transferred bytes (until the pause event) will be sent again. This is
> currently not important for my UART user because it does only pause +
> terminate.

with a short testing audio did not broke (the only user of pause/resume)
Some comments embedded.

> Cc: <sta...@vger.kernel.org>

Why stable? This is not fixing any bugs since the PAUSE was not allowed for
non cyclic transfers.
if (++i > 100)
break;
to avoid infinite loop?


> + break;
> + udelay(5);
> + } while (1);
> +
> + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))

if (i > 100) ?
Péter

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 6:40:07 AM8/7/15
to
Hmmm. The DRA7x was using pause before for UART. I just did not see it
coming that it was not allowed here. John made a similar change to the
edma driver and I assumed it went stable but now I see that it was just
cherry-picked into the ti tree.
If you are not comfortable it being stable material I can drop it.
Ah. So I forgot to increment the counter. A few lines above there is
the same loop as a workaround for something. This is the same loop. I
could merge the loop + warning if you prefer. to have those things in
one place. I could also just increment i. Merging the two loops might
be better.

>> + break;
>> + udelay(5);
>> + } while (1);
>> +
>> + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))
>
> if (i > 100) ?

While that would work, too I think it is more explicit to the reader if
you check for the condition that is important to you.

>> + dev_err(c->vc.chan.device->dev,
>> + "DMA drain did not complete on lch %d\n",
>> + c->dma_ch);
>> }
>>
>> mb();

Sebastian

Russell King - ARM Linux

unread,
Aug 7, 2015, 7:00:07 AM8/7/15
to
On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote:
> This DMA driver is used by 8250-omap on DRA7-evm. There is one
> requirement that is to pause a transfer. This is currently used on the RX
> side. It is possible that the UART HW aborted the RX (UART's RX-timeout)
> but the DMA controller starts the transfer shortly after.
> Before we can manually purge the FIFO we need to pause the transfer,
> check how many bytes it already received and terminate the transfer
> without it making any progress.
>
> >From testing on the TX side it seems that it is possible that we invoke
> pause once the transfer has completed which is indicated by the missing
> CCR_ENABLE bit but before the interrupt has been noticed. In that case the
> interrupt will come even after disabling it.

How do you cope with the OMAP DMA hardware clearing its FIFO when you
pause it?

The problem is that on mem-to-device transfers, the DMA hardware can
prefetch some data from memory into its FIFO before the device wants
the data. If you then disable the channel, the hardware clears the
FIFO.

It is unspecified whether the hardware updates the source address in
this case, or by how much. So it's pretty hard to undo the prefetching
in software.

The net result is: data loss.

This is why I explicitly did not implement pause/resume for non-cyclic
channels.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Peter Ujfalusi

unread,
Aug 7, 2015, 7:50:07 AM8/7/15
to
This change is needed for the UART DMA support if I'm not mistaken and this
mode is not really supported by older kernels, so having this to implement
something which is not going to be used in the stable kernels feels somehow wrong.
The other loop is for handling the ERRATA i541 and the two loops can not be
merged since the errata handling also require to change in SYSCONFIG register.

>
>>> + break;
>>> + udelay(5);
>>> + } while (1);
>>> +
>>> + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))
>>
>> if (i > 100) ?
>
> While that would work, too I think it is more explicit to the reader if
> you check for the condition that is important to you.

Yeah, I see that the errata handling is doing the same, fine by me.

>>> + dev_err(c->vc.chan.device->dev,
>>> + "DMA drain did not complete on lch %d\n",
>>> + c->dma_ch);
>>> }
>>>
>>> mb();
>
> Sebastian
>


--
Péter

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 8:40:07 AM8/7/15
to
On 08/07/2015 12:55 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote:
>> This DMA driver is used by 8250-omap on DRA7-evm. There is one
>> requirement that is to pause a transfer. This is currently used on the RX
>> side. It is possible that the UART HW aborted the RX (UART's RX-timeout)
>> but the DMA controller starts the transfer shortly after.
>> Before we can manually purge the FIFO we need to pause the transfer,
>> check how many bytes it already received and terminate the transfer
>> without it making any progress.
>>
>> >From testing on the TX side it seems that it is possible that we invoke
>> pause once the transfer has completed which is indicated by the missing
>> CCR_ENABLE bit but before the interrupt has been noticed. In that case the
>> interrupt will come even after disabling it.
>
> How do you cope with the OMAP DMA hardware clearing its FIFO when you
> pause it?

I don't

> The problem is that on mem-to-device transfers, the DMA hardware can
> prefetch some data from memory into its FIFO before the device wants
> the data. If you then disable the channel, the hardware clears the
> FIFO.
>
> It is unspecified whether the hardware updates the source address in
> this case, or by how much. So it's pretty hard to undo the prefetching
> in software.
>
> The net result is: data loss.
>
> This is why I explicitly did not implement pause/resume for non-cyclic
> channels.

Right now the 820-omap (8250-dma in general, too but they don't use
this driver) pause only the RX transfer in an error condition. This
means it is only device-to-mem transfer. I only mentioned the TX
transfer here since this was easier to test.

When I first tested the 8250_omap + DMA on EDMA it seemed that once the
UART hardware triggered an time-out interrupt the DMA transfer _never_
started. Based on this I could just cancel the transfer since the
RX-DMA and assume zero bytes were transfer. Therefore I ignored the
lack of pause support in EDMA.
Things were fine until someone used 3mbaud instead 115200. Its been
observed that at this speed the UART triggers the time-out interrupt
and the DMA transfer just started. Without the support for pause
we lost some bytes here and once pause support was added the problem
was gone. Now I've been chasing another bug on Dra7 which uses this
driver and the lack of pause support is a candidate for my current
problem. So at this point, things can't get worse if we pause a
transfer and the hardware reported the wrong number of received bytes.
The situation can improve if we get the correct number :)
The 8250_omap does not pause the TX/RX transfer for the fun of it. As I
said, on the TX side we avoid it and on the RX side the transfer is only
paused if we receive an error interrupt (= no way out).

Sebastian

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 8:50:06 AM8/7/15
to
On 08/07/2015 01:44 PM, Peter Ujfalusi wrote:
>>>> Cc: <sta...@vger.kernel.org>
>>>
>>> Why stable? This is not fixing any bugs since the PAUSE was not allowed for
>>> non cyclic transfers.
>>
>> Hmmm. The DRA7x was using pause before for UART. I just did not see it
>> coming that it was not allowed here. John made a similar change to the
>> edma driver and I assumed it went stable but now I see that it was just
>> cherry-picked into the ti tree.
>> If you are not comfortable it being stable material I can drop it.
>
> This change is needed for the UART DMA support if I'm not mistaken and this
> mode is not really supported by older kernels, so having this to implement
> something which is not going to be used in the stable kernels feels somehow wrong.

We have the DT pieces since v3.19-rc1. And if I remember correctly I
tested this on am335x-evm and dra7-evm by I the time I posted the
patches. I agree that dra7 support was not the best back then but I am
almost sure that I had vanilla running for testing.
But I don't insist on the stable tag. Consider it dropped.
yes, but I had in mind is to put the loop into one function so we gain:

+static void omap_dma_drain_chan(struct omap_chan *c)
+{
+ int i;
+ uint32_t val;
+
+ /* Wait for sDMA FIFO to drain */
+ for (i = 0; ; i++) {
+ val = omap_dma_chan_read(c, CCR);
+ if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)))
+ break;
+
+ if (i > 100)
+ break;
+
+ udelay(5);
+ }
+
+ if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))
+ dev_err(c->vc.chan.device->dev,
+ "DMA drain did not complete on lch %d\n",
+ c->dma_ch);
+}

which is invoked by both parts of the if case (handling the errata not
not) instead of having the same loop twice.

>>>> + break;
>>>> + udelay(5);
>>>> + } while (1);
>>>> +
>>>> + if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))
>>>
>>> if (i > 100) ?
>>
>> While that would work, too I think it is more explicit to the reader if
>> you check for the condition that is important to you.
>
> Yeah, I see that the errata handling is doing the same, fine by me.
good.

Sebastian

Russell King - ARM Linux

unread,
Aug 7, 2015, 9:20:06 AM8/7/15
to
On Fri, Aug 07, 2015 at 02:35:45PM +0200, Sebastian Andrzej Siewior wrote:
> On 08/07/2015 12:55 PM, Russell King - ARM Linux wrote:
> > On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote:
> >> This DMA driver is used by 8250-omap on DRA7-evm. There is one
> >> requirement that is to pause a transfer. This is currently used on the RX
> >> side. It is possible that the UART HW aborted the RX (UART's RX-timeout)
> >> but the DMA controller starts the transfer shortly after.
> >> Before we can manually purge the FIFO we need to pause the transfer,
> >> check how many bytes it already received and terminate the transfer
> >> without it making any progress.
> >>
> >> >From testing on the TX side it seems that it is possible that we invoke
> >> pause once the transfer has completed which is indicated by the missing
> >> CCR_ENABLE bit but before the interrupt has been noticed. In that case the
> >> interrupt will come even after disabling it.
> >
> > How do you cope with the OMAP DMA hardware clearing its FIFO when you
> > pause it?
>
> I don't

... and so you introduce a silent data loss bug into the driver. That's
not very clever.

> Right now the 820-omap (8250-dma in general, too but they don't use
> this driver) pause only the RX transfer in an error condition. This
> means it is only device-to-mem transfer. I only mentioned the TX
> transfer here since this was easier to test.

That may be how 8250 works, but 8250 is not everything. You can't ignore
this problem. You have to deal with it - either by not allowing a channel
that would loose data to be paused, or by recovering from that condition.
You're not doing either in your patch.

Therefore, I have no other option but to NAK your change. Sorry.

Please fix this.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,
Aug 7, 2015, 9:30:07 AM8/7/15
to
On Fri, Aug 07, 2015 at 12:36:14PM +0200, Sebastian Andrzej Siewior wrote:
> On 08/07/2015 11:44 AM, Peter Ujfalusi wrote:
> > with a short testing audio did not broke (the only user of pause/resume)
> > Some comments embedded.
> >
> >> Cc: <sta...@vger.kernel.org>
> >
> > Why stable? This is not fixing any bugs since the PAUSE was not allowed for
> > non cyclic transfers.
>
> Hmmm. The DRA7x was using pause before for UART. I just did not see it
> coming that it was not allowed here. John made a similar change to the
> edma driver and I assumed it went stable but now I see that it was just
> cherry-picked into the ti tree.

This is *NOT* stable material.

Pause of these channels is something that omap-dma has *never* supported.
Therefore, it is *not* a regression. What you are doing is *adding* a
feature to the omap-dma driver. That is not stable material in any sense.
Stable is for bug fixes to existing code, not feature enhancements.

If something else has been converted to pause channels and that is causing
a problem, then _that_ conversion is where the bug lies, not the lack of
support in the omap-dma.

If it's a result of using some new driver with omap-dma, then the problem
is with whatever introduced that new combination - it's not that omap-dma
is buggy.

Don't fix bugs in -stable by adding features. That's _no_ way to fix bugs.

NAK on this feature patch having any kind of stable tag.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,
Aug 7, 2015, 9:30:07 AM8/7/15
to
On Fri, Aug 07, 2015 at 03:22:56PM +0200, Sebastian Andrzej Siewior wrote:
> Would it be okay if I only allow pause for RX-transfers?

Yes, that satisfies one of my suggestions above.

> For TX-transfers, I would need to update the start-address so the
> transfers begins where it stopped. However based on your concern I
> can't really assume that the position reported by the HW is the correct
> one.

Exactly - I don't believe that existing OMAP DMA hardware can ever support
pausing a mem-to-device transfer without data loss as you have no idea
how many bytes have been prefetched by the DMA, and therefore you have
no idea how many bytes to unwind the hardware position. It gets worse
than that when you have to cross into the previous descriptor. It's
really not nice.

So, disallowing pause for mem-to-device is entirely reasonable given the
data loss implications.

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 9:30:07 AM8/7/15
to
Would it be okay if I only allow pause for RX-transfers?
For TX-transfers, I would need to update the start-address so the
transfers begins where it stopped. However based on your concern I
can't really assume that the position reported by the HW is the correct
one.

Sebastian

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 9:50:06 AM8/7/15
to
On 08/07/2015 03:22 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2015 at 12:36:14PM +0200, Sebastian Andrzej Siewior wrote:
>> On 08/07/2015 11:44 AM, Peter Ujfalusi wrote:
>>> with a short testing audio did not broke (the only user of pause/resume)
>>> Some comments embedded.
>>>
>>>> Cc: <sta...@vger.kernel.org>
>>>
>>> Why stable? This is not fixing any bugs since the PAUSE was not allowed for
>>> non cyclic transfers.
>>
>> Hmmm. The DRA7x was using pause before for UART. I just did not see it
>> coming that it was not allowed here. John made a similar change to the
>> edma driver and I assumed it went stable but now I see that it was just
>> cherry-picked into the ti tree.
>
> This is *NOT* stable material.
>
> Pause of these channels is something that omap-dma has *never* supported.
> Therefore, it is *not* a regression. What you are doing is *adding* a
> feature to the omap-dma driver. That is not stable material in any sense.
> Stable is for bug fixes to existing code, not feature enhancements.

I didn't consider this as a feature.

> If something else has been converted to pause channels and that is causing
> a problem, then _that_ conversion is where the bug lies, not the lack of
> support in the omap-dma.

So we had the 8250-DMA doing pause and all its current users implement
it. We have a DMA driver tree which is not used and it not implementing
pause (not implementing pause at all). Later we get a combo of 8250-DMA
+ DMA driver that is broken because the lack of pause and this is
noticed a few kernel releases later.
The only way of fixing the bug is by implementing the pause feature.
Now you are saying that even if I implement this missing feature in a
newer kernel I am not allowed to mark it stable despite the fact that
it fixes an existing problem in older kernels because it is not a
regression.

> If it's a result of using some new driver with omap-dma, then the problem
> is with whatever introduced that new combination - it's not that omap-dma
> is buggy.
>
> Don't fix bugs in -stable by adding features. That's _no_ way to fix bugs.
>
> NAK on this feature patch having any kind of stable tag.

I already accepted this.

Sebastian

Russell King - ARM Linux

unread,
Aug 7, 2015, 10:00:07 AM8/7/15
to
On Fri, Aug 07, 2015 at 03:42:06PM +0200, Sebastian Andrzej Siewior wrote:
> On 08/07/2015 03:22 PM, Russell King - ARM Linux wrote:
> > On Fri, Aug 07, 2015 at 12:36:14PM +0200, Sebastian Andrzej Siewior wrote:
> >> On 08/07/2015 11:44 AM, Peter Ujfalusi wrote:
> >>> with a short testing audio did not broke (the only user of pause/resume)
> >>> Some comments embedded.
> >>>
> >>>> Cc: <sta...@vger.kernel.org>
> >>>
> >>> Why stable? This is not fixing any bugs since the PAUSE was not allowed for
> >>> non cyclic transfers.
> >>
> >> Hmmm. The DRA7x was using pause before for UART. I just did not see it
> >> coming that it was not allowed here. John made a similar change to the
> >> edma driver and I assumed it went stable but now I see that it was just
> >> cherry-picked into the ti tree.
> >
> > This is *NOT* stable material.
> >
> > Pause of these channels is something that omap-dma has *never* supported.
> > Therefore, it is *not* a regression. What you are doing is *adding* a
> > feature to the omap-dma driver. That is not stable material in any sense.
> > Stable is for bug fixes to existing code, not feature enhancements.
>
> I didn't consider this as a feature.

As it is something that the driver has _not_ supported, you are clearly
adding a feature to an existing driver. It's not a bug fix.

> > If something else has been converted to pause channels and that is causing
> > a problem, then _that_ conversion is where the bug lies, not the lack of
> > support in the omap-dma.
>
> So we had the 8250-DMA doing pause and all its current users implement
> it. We have a DMA driver tree which is not used and it not implementing
> pause (not implementing pause at all). Later we get a combo of 8250-DMA
> + DMA driver that is broken because the lack of pause and this is
> noticed a few kernel releases later.

Right, so the patch which caused the regression is the one which arranged
for the 8250-dma + omap-dma combination to work together, not the missing
pause support in omap-dma.

It doesn't matter that it's several releases old, it's that change caused
the regression you have today. That change is incorrect today, and it was
just as incorrect at the time that it was merged.

> The only way of fixing the bug is by implementing the pause feature.

That's not the only way of fixing the bug.

As the binding of drivers is controlled by DT, you can disable the binding
of these two drivers there and 8250 will go back to using non-DMA mode -
which is the situation which existed prior to the change which coupled the
two drivers together. That's an acceptable change for -stable trees,
because it's reverting the change which caused the regression, taking us
back to a situation we _know_ worked previously.

Then, in mainline during the next merge window, we can introduce the pause
feature to omap-dma, and re-enable the 8250 driver to use it. _Once_ that's
proven stable, we can then take a view whether those changes should _then_
be backported to stable kernels with greater confidence that backporting
the feature addition won't itself cause a new regression.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Peter Hurley

unread,
Aug 7, 2015, 10:50:07 AM8/7/15
to
On 08/07/2015 09:25 AM, Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2015 at 03:22:56PM +0200, Sebastian Andrzej Siewior wrote:

>> For TX-transfers, I would need to update the start-address so the
>> transfers begins where it stopped. However based on your concern I
>> can't really assume that the position reported by the HW is the correct
>> one.
>
> Exactly - I don't believe that existing OMAP DMA hardware can ever support
> pausing a mem-to-device transfer without data loss as you have no idea
> how many bytes have been prefetched by the DMA, and therefore you have
> no idea how many bytes to unwind the hardware position. It gets worse
> than that when you have to cross into the previous descriptor. It's
> really not nice.
>
> So, disallowing pause for mem-to-device is entirely reasonable given the
> data loss implications.

Thanks for adding your hard-won knowledge to this discussion, Russell.
This saves us a bunch of wasted effort trying to fix x_char with DMA
(and TCSANOW termios changes and throttling).

Regards,
Peter Hurley

Peter Hurley

unread,
Aug 7, 2015, 11:10:06 AM8/7/15
to
[ + Greg KH ]
FWIW, the actual bug is the api that silently does nothing.


>> So we had the 8250-DMA doing pause and all its current users implement
>> it. We have a DMA driver tree which is not used and it not implementing
>> pause (not implementing pause at all). Later we get a combo of 8250-DMA
>> + DMA driver that is broken because the lack of pause and this is
>> noticed a few kernel releases later.
>
> Right, so the patch which caused the regression is the one which arranged
> for the 8250-dma + omap-dma combination to work together, not the missing
> pause support in omap-dma.

That would be the original submission patch set for an entire driver,
the 8250_omap driver.


> It doesn't matter that it's several releases old, it's that change caused
> the regression you have today. That change is incorrect today, and it was
> just as incorrect at the time that it was merged.
>
>> The only way of fixing the bug is by implementing the pause feature.
>
> That's not the only way of fixing the bug.
>
> As the binding of drivers is controlled by DT, you can disable the binding
> of these two drivers

No. 8250 dma is not a stand-alone driver. Even if it were, how would you go
back and fix DTs in the wild?

The "binding" is built-in with a CONFIG_ switch.


> there and 8250 will go back to using non-DMA mode -
> which is the situation which existed prior to the change which coupled the
> two drivers together. That's an acceptable change for -stable trees,
> because it's reverting the change which caused the regression, taking us
> back to a situation we _know_ worked previously.

What you're suggesting here is only possible if the entire 8250_omap driver
is removed, so that's a non-starter.

I suggest to wait on any solution until the correct fix is mainlined
and backported, as you note below.

Regards,
Peter Hurley

> Then, in mainline during the next merge window, we can introduce the pause
> feature to omap-dma, and re-enable the 8250 driver to use it. _Once_ that's
> proven stable, we can then take a view whether those changes should _then_
> be backported to stable kernels with greater confidence that backporting
> the feature addition won't itself cause a new regression.

--

Russell King - ARM Linux

unread,
Aug 7, 2015, 11:40:09 AM8/7/15
to
On Fri, Aug 07, 2015 at 11:08:48AM -0400, Peter Hurley wrote:
> [ + Greg KH ]
>
> On 08/07/2015 09:57 AM, Russell King - ARM Linux wrote:
> > As it is something that the driver has _not_ supported, you are clearly
> > adding a feature to an existing driver. It's not a bug fix.
> >
> >>> If something else has been converted to pause channels and that is causing
> >>> a problem, then _that_ conversion is where the bug lies, not the lack of
> >>> support in the omap-dma.
>
> FWIW, the actual bug is the api that silently does nothing.

Incorrect.

static int omap_dma_pause(struct dma_chan *chan)
{
struct omap_chan *c = to_omap_dma_chan(chan);

/* Pause/Resume only allowed with cyclic mode */
if (!c->cyclic)
return -EINVAL;

Asking for the channel to be paused will return an error code indicating
that the request failed. That will be propagated back through to the
return code of dmaengine_pause().

If we look at what 8250-dma.c is doing:

if (dma->rx_running) {
dmaengine_pause(dma->rxchan);

It's 8250-dma.c which is silently _ignoring_ the return code, failing
to check that the operation it requested worked. Maybe this should be
WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a
message?

> > Right, so the patch which caused the regression is the one which arranged
> > for the 8250-dma + omap-dma combination to work together, not the missing
> > pause support in omap-dma.
>
> That would be the original submission patch set for an entire driver,
> the 8250_omap driver.

Well, that's where the bug lies, and I don't agree with your assessment
that it would mean reverting the whole thing.

The binding between the two drivers is controlled via DT. DT tells it
which DMA controller and which DMA input to use. So, as DMA is currently
broken on this, the solution is to break that link so that 8250-omap goes
back to using PIO only.

> > As the binding of drivers is controlled by DT, you can disable the binding
> > of these two drivers
>
> No. 8250 dma is not a stand-alone driver. Even if it were, how would you go
> back and fix DTs in the wild?

Well, we have reached an impass then.

I'm not going to accept a feature addition to a driver as a stable patch
without it being adequately tested over _several_ kernel revisions to
ensure that we don't end up cocking up some driver which uses the DMA
interfaces correctly. It's too big a risk.

So, I guess that means that older kernels will just have to remain broken -
all because the basic testing of the original code was never undertaken
to ensure that basic stuff like reception of characters worked properly.

Sorry, I have little sympathy here.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 11:40:10 AM8/7/15
to
This DMA driver is used by 8250-omap on DRA7-evm. There is one
requirement that is to pause a transfer. This is currently used on the RX
side. It is possible that the UART HW aborted the RX (UART's RX-timeout)
but the DMA controller starts the transfer shortly after.
Before we can manually purge the FIFO we need to pause the transfer,
check how many bytes it already received and terminate the transfer
without it making any progress.

From testing on the TX side it seems that it is possible that we invoke
pause once the transfer has completed which is indicated by the missing
CCR_ENABLE bit but before the interrupt has been noticed. In that case the
interrupt will come even after disabling it.

The AM572x manual says that we have to wait for the CCR_RD_ACTIVE &
CCR_WR_ACTIVE bits to be gone before programming it again here is the
drain loop. Also it looks like without the drain the TX-transfer makes
sometimes progress.

One note: The pause + resume combo is broken because after resume the
the complete transfer will be programmed again. That means the already
transferred bytes (until the pause event) will be sent again. This is
currently not important for my UART user because it does only pause +
terminate.

v1…v2:
- move the drain loop into omap_dma_drain_chan() instead of having it
twice.
- allow pause only for DMA_DEV_TO_MEM transfers if non-cyclic. Add a
comment why DMA_MEM_TO_DEV not allowed.
- clear pause on terminate_all. Otherwise pause() + terminate_all()
will keep the pause bit set and we can't pause the following
transfer.

Signed-off-by: Sebastian Andrzej Siewior <big...@linutronix.de>
---
drivers/dma/omap-dma.c | 107 +++++++++++++++++++++++++++++++++----------------
1 file changed, 73 insertions(+), 34 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 249445c8a4c6..6bbf089d2d7f 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -299,7 +299,30 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d)
omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE);
}

-static void omap_dma_stop(struct omap_chan *c)
+static void omap_dma_drain_chan(struct omap_chan *c)
+{
+ int i;
+ uint32_t val;
+
+ /* Wait for sDMA FIFO to drain */
+ for (i = 0; ; i++) {
+ val = omap_dma_chan_read(c, CCR);
+ if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)))
+ break;
+
+ if (i > 100)
+ break;
+
+ udelay(5);
+ }
+
+ if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))
+ dev_err(c->vc.chan.device->dev,
+ "DMA drain did not complete on lch %d\n",
+ c->dma_ch);
+}
+
+static int omap_dma_stop(struct omap_chan *c)
{
struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device);
uint32_t val;
@@ -312,7 +335,6 @@ static void omap_dma_stop(struct omap_chan *c)
val = omap_dma_chan_read(c, CCR);
if (od->plat->errata & DMA_ERRATA_i541 && val & CCR_TRIGGER_SRC) {
uint32_t sysconfig;
- unsigned i;

sysconfig = omap_dma_glbl_read(od, OCP_SYSCONFIG);
val = sysconfig & ~DMA_SYSCONFIG_MIDLEMODE_MASK;
@@ -323,27 +345,18 @@ static void omap_dma_stop(struct omap_chan *c)
val &= ~CCR_ENABLE;
omap_dma_chan_write(c, CCR, val);

- /* Wait for sDMA FIFO to drain */
- for (i = 0; ; i++) {
- val = omap_dma_chan_read(c, CCR);
- if (!(val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE)))
- break;
-
- if (i > 100)
- break;
-
- udelay(5);
- }
-
- if (val & (CCR_RD_ACTIVE | CCR_WR_ACTIVE))
- dev_err(c->vc.chan.device->dev,
- "DMA drain did not complete on lch %d\n",
- c->dma_ch);
+ omap_dma_drain_chan(c);

omap_dma_glbl_write(od, OCP_SYSCONFIG, sysconfig);
} else {
+
+ if (!(val & CCR_ENABLE))
+ return -EINVAL;
+
val &= ~CCR_ENABLE;
omap_dma_chan_write(c, CCR, val);
+
+ omap_dma_drain_chan(c);
}

mb();
@@ -358,6 +371,7 @@ static void omap_dma_stop(struct omap_chan *c)

omap_dma_chan_write(c, CLNK_CTRL, val);
}
+ return 0;
}

static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
@@ -728,6 +742,8 @@ static enum dma_status omap_dma_tx_status(struct dma_chan *chan,
} else {
txstate->residue = 0;
}
+ if (ret == DMA_IN_PROGRESS && c->paused)
+ ret = DMA_PAUSED;
spin_unlock_irqrestore(&c->vc.lock, flags);

return ret;
@@ -1038,10 +1054,8 @@ static int omap_dma_terminate_all(struct dma_chan *chan)
omap_dma_stop(c);
}

- if (c->cyclic) {
- c->cyclic = false;
- c->paused = false;
- }
+ c->cyclic = false;
+ c->paused = false;

vchan_get_all_descriptors(&c->vc, &head);
spin_unlock_irqrestore(&c->vc.lock, flags);
@@ -1053,28 +1067,51 @@ static int omap_dma_terminate_all(struct dma_chan *chan)
static int omap_dma_pause(struct dma_chan *chan)
{
struct omap_chan *c = to_omap_dma_chan(chan);
+ struct omap_dmadev *od = to_omap_dma_dev(chan->device);
+ unsigned long flags;
+ int ret = -EINVAL;
+ bool can_pause;

- /* Pause/Resume only allowed with cyclic mode */
- if (!c->cyclic)
- return -EINVAL;
+ spin_lock_irqsave(&od->irq_lock, flags);

- if (!c->paused) {
- omap_dma_stop(c);
- c->paused = true;
+ if (!c->desc)
+ goto out;
+
+ if (c->cyclic)
+ can_pause = true;
+
+ else if (c->desc->dir == DMA_DEV_TO_MEM)
+ /*
+ * We do not allow DMA_MEM_TO_DEV transfers to be paused.
+ * According to RMK the OMAP hardware might prefetch bytes from
+ * memory into its FIFO and not send it to the device due to the
+ * pause. The bytes in the FIFO are cleared on pause. It is
+ * unspecified by how many bytes the source address is updated
+ * if at all.
+ */
+ can_pause = true;
+
+ if (can_pause && !c->paused) {
+ ret = omap_dma_stop(c);
+ if (!ret)
+ c->paused = true;
}
+out:
+ spin_unlock_irqrestore(&od->irq_lock, flags);

- return 0;
+ return ret;
}

static int omap_dma_resume(struct dma_chan *chan)
{
struct omap_chan *c = to_omap_dma_chan(chan);
+ struct omap_dmadev *od = to_omap_dma_dev(chan->device);
+ unsigned long flags;
+ int ret = -EINVAL;

- /* Pause/Resume only allowed with cyclic mode */
- if (!c->cyclic)
- return -EINVAL;
+ spin_lock_irqsave(&od->irq_lock, flags);

- if (c->paused) {
+ if (c->paused && c->desc) {
mb();

/* Restore channel link register */
@@ -1082,9 +1119,11 @@ static int omap_dma_resume(struct dma_chan *chan)

omap_dma_start(c, c->desc);
c->paused = false;
+ ret = 0;
}
+ spin_unlock_irqrestore(&od->irq_lock, flags);

- return 0;
+ return ret;
}

static int omap_dma_chan_init(struct omap_dmadev *od)
--
2.5.0

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 11:50:06 AM8/7/15
to
I think this is what Peter meant by saying "silently does nothing".

> So, I guess that means that older kernels will just have to remain broken -
> all because the basic testing of the original code was never undertaken
> to ensure that basic stuff like reception of characters worked properly.

Hehe. I wouldn't describe testing at 3mbaud as basic. This reads as I
didn't do any kind of testing at all prior submitting the driver. This
was not the case.

Sebastian

Peter Hurley

unread,
Aug 7, 2015, 12:10:08 PM8/7/15
to
Thanks for the suggestion; I'll hold on to that and push it after we add
the 8250 omap dma pause in mainline.


>>> Right, so the patch which caused the regression is the one which arranged
>>> for the 8250-dma + omap-dma combination to work together, not the missing
>>> pause support in omap-dma.
>>
>> That would be the original submission patch set for an entire driver,
>> the 8250_omap driver.
>
> Well, that's where the bug lies, and I don't agree with your assessment
> that it would mean reverting the whole thing.
>
> The binding between the two drivers is controlled via DT. DT tells it
> which DMA controller and which DMA input to use. So, as DMA is currently
> broken on this, the solution is to break that link so that 8250-omap goes
> back to using PIO only.
>
>>> As the binding of drivers is controlled by DT, you can disable the binding
>>> of these two drivers
>>
>> No. 8250 dma is not a stand-alone driver. Even if it were, how would you go
>> back and fix DTs in the wild?
>
> Well, we have reached an impass then.
>
> I'm not going to accept a feature addition to a driver as a stable patch
> without it being adequately tested over _several_ kernel revisions to
> ensure that we don't end up cocking up some driver which uses the DMA
> interfaces correctly. It's too big a risk.
>
> So, I guess that means that older kernels will just have to remain broken -
> all because the basic testing of the original code was never undertaken
> to ensure that basic stuff like reception of characters worked properly.


Well, instead of me saying something snide about the lack of upstream serial
driver unit tests, I'll say I've been working on cleaning up and organizing
my own tty/serial subsystem and driver units tests which I hope to upstream
in the fall.

Those include i/o validators that ran this driver for days at time without
error at max line rate. Unfortunately, that hardware does not exhibit the
same problem as the DRA7 noted in the changelog.



> Sorry, I have little sympathy here.


--

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 12:30:07 PM8/7/15
to
On 08/07/2015 06:07 PM, Peter Hurley wrote:
>> If we look at what 8250-dma.c is doing:
>>
>> if (dma->rx_running) {
>> dmaengine_pause(dma->rxchan);
>>
>> It's 8250-dma.c which is silently _ignoring_ the return code, failing
>> to check that the operation it requested worked. Maybe this should be
>> WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a
>> message?
>
> Thanks for the suggestion; I'll hold on to that and push it after we add
> the 8250 omap dma pause in mainline.

I have a patch ready with WARN_ON_ONCE() for 8250-omap and 8250-dma.
This warning would trigger on am335x/edma until v4.2-rc1 and omap-dma
based version is open. I could post it if you want me to.
Besides that those two, there are four other drivers ignoring the
return code dmaengine_pause().

Sebastian

Russell King - ARM Linux

unread,
Aug 7, 2015, 12:30:07 PM8/7/15
to
On Fri, Aug 07, 2015 at 05:36:05PM +0200, Sebastian Andrzej Siewior wrote:
> + /*
> + * We do not allow DMA_MEM_TO_DEV transfers to be paused.
> + * According to RMK the OMAP hardware might prefetch bytes from
> + * memory into its FIFO and not send it to the device due to the
> + * pause. The bytes in the FIFO are cleared on pause. It is
> + * unspecified by how many bytes the source address is updated
> + * if at all.

Would you mind rewording the above.

Please take the time to read the manuals for the device you are playing
with. It's mostly documented in there. See the OMAP4430 ES2.x TRM,
16.4.18 and 16.4.19.

16.4.19 states that:

"When a source-synchronized channel is disabled during a transfer...
"In all other cases, the channel undergoes an abort."

A source-synchronised channel is one where the fetching of data is under
control of the device. In other words, a device-to-memory transfer.

So, a destination-synchronised channel (which would be a memory-to-device
transfer) undergoes an abort if you clear the enable bit.

16.4.20.4.6.2 defines what happens when a channel aborts:

"If an abort trigger occurs, the channel aborts immediately after
completion of current read/write transactions and then FIFO is
cleaned up."

It doesn't define what "cleaned up" means, so that's open to some
interpretation. That's why I contacted TI about this back in 2012:
| What is the behaviour of the OMAP DMA hardware when the DMA4_CCRi[7]
| enable bit is cleared and subsequently set without any additional
| reprogramming? I'm thinking specifically about memory-to-device DMA
| operations, in particular the UART ports.
|
| Will this allow a transfer to be temporarily paused, and then
| subsequently resumed with out loss of data in the DMA hardware, as if
| nothing had actually happened?

The answer I received was to check that RD_ACTIVE and WR_ACTIVE are both
clear _before_ disabling the channel, otherwise data loss will occur.

The problem is that if the channel is active, then device activity can
result in DMA activity starting between reading those as both clear and
the write to DMA_CCR to clear the enable bit hitting the hardware.

The explanation went on to say that if the DMA hardware can't drain
the data in its FIFO to the destination, then data loss "might" occur.

That will occur if we're destination-synchronised (eg) to a UART and
the UART is not accepting any further data.

Due to this, the OMAP DMA engine driver was submitted with this in the
cover note:

"For the OMAP DMAengine driver, there's a few short-comings:

1. pause/resume support is not implemented; it's not clear whether the
OMAP hardware is capable of supporting this sanely."

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,
Aug 7, 2015, 12:40:05 PM8/7/15
to
Maybe Peter should phrase his replies better. "the actual bug is the
api that silently does nothing." to me means he is complaining that
dmaengine_pause() had no effect. _That_ is what I'm objecting to,
and claiming that Peter's comment is wrong.

He's now blaming me for snide remarks. I could call his one above an
incorrect snide remark against the quality of DMA engine code. He's
totally wrong, and been proven wrong by my analysis above, plain and
simple.

He should now accept that he's wrong and move along with sorting out
this mess _without_ requiring optional features to be implemented in
other subsystems to fix bugs in code he's supposed to be maintaining.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,
Aug 7, 2015, 12:40:06 PM8/7/15
to
Why wait? You're hiding a data loss bug which is clearly the result of
code you allegedly maintain.

> Well, instead of me saying something snide about the lack of upstream serial
> driver unit tests, I'll say I've been working on cleaning up and organizing
> my own tty/serial subsystem and driver units tests which I hope to upstream
> in the fall.
>
> Those include i/o validators that ran this driver for days at time without
> error at max line rate. Unfortunately, that hardware does not exhibit the
> same problem as the DRA7 noted in the changelog.

What you have is a race condition in the code you a responsible for
maintaining, caused by poorly implemented code. Fix it, rather than
whinging about drivers outside of your subsystem having never implemented
_optional_ things that you choose to merge broken code which relied upon
it _without_ checking that the operation succeeded.

It is _entirely_ your code which is wrong here.

I will wait for that to be fixed before acking the omap-dma change since
you obviously need something to test with.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,
Aug 7, 2015, 12:40:07 PM8/7/15
to
On Fri, Aug 07, 2015 at 06:20:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 08/07/2015 06:07 PM, Peter Hurley wrote:
> >> If we look at what 8250-dma.c is doing:
> >>
> >> if (dma->rx_running) {
> >> dmaengine_pause(dma->rxchan);
> >>
> >> It's 8250-dma.c which is silently _ignoring_ the return code, failing
> >> to check that the operation it requested worked. Maybe this should be
> >> WARN_ON(dmaengine_pause(dma->rxchan)) or at least it should print a
> >> message?
> >
> > Thanks for the suggestion; I'll hold on to that and push it after we add
> > the 8250 omap dma pause in mainline.
>
> I have a patch ready with WARN_ON_ONCE() for 8250-omap and 8250-dma.
> This warning would trigger on am335x/edma until v4.2-rc1 and omap-dma
> based version is open. I could post it if you want me to.
> Besides that those two, there are four other drivers ignoring the
> return code dmaengine_pause().

It'll probably be a good idea if those are fixed too, and also have a
__must_check annotation on the declaration of dmaengine_pause() to
prevent future mishaps like this.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Peter Hurley

unread,
Aug 7, 2015, 1:30:05 PM8/7/15
to
Yes, I missed that the api included a return code which the 8250 dma
code should be checking.

> He's now blaming me for snide remarks. I could call his one above an
> incorrect snide remark against the quality of DMA engine code.

You'd be reading a lot into my statement.

> He's
> totally wrong, and been proven wrong by my analysis above, plain and
> simple.
>
> He should now accept that he's wrong

Done.

> and move along with sorting out
> this mess _without_ requiring optional features to be implemented in
> other subsystems to fix bugs in code he's supposed to be maintaining.

This is simply a bug that flew under the radar, much like every other
bug that gets fixed daily in mainline.

The omap-serial driver which doesn't use dma is still the preferred
stable driver for omap, for the moment.

One of the main features of the 8250_omap integration was the addition
of dma support. Without it, 8250_omap is ttyO in ttyS clothing.

Regards,
Peter Hurley

Russell King - ARM Linux

unread,
Aug 7, 2015, 1:50:06 PM8/7/15
to
On Fri, Aug 07, 2015 at 01:23:20PM -0400, Peter Hurley wrote:
> The omap-serial driver which doesn't use dma is still the preferred
> stable driver for omap, for the moment.
>
> One of the main features of the 8250_omap integration was the addition
> of dma support. Without it, 8250_omap is ttyO in ttyS clothing.

Correct. omap-serial used to have broken DMA support, and I had a
series of patches fixing it. Unfortunately, despite TI asking me to
work on this, TI went ahead and removed the DMA support behind my
back, which totally screwed my patch series after I merged some of
its pre-requists. It was then decided that DMA support was no longer
an important feature, so I dropped the patch set.

So, I'm well aware of the issues here, and the problems of using the
OMAP sDMA with the UARTs.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Greg KH

unread,
Aug 7, 2015, 2:00:07 PM8/7/15
to
On Fri, Aug 07, 2015 at 10:41:57AM +0200, Sebastian Andrzej Siewior wrote:
> This DMA driver is used by 8250-omap on DRA7-evm. There is one
> requirement that is to pause a transfer. This is currently used on the RX
> side. It is possible that the UART HW aborted the RX (UART's RX-timeout)
> but the DMA controller starts the transfer shortly after.
> Before we can manually purge the FIFO we need to pause the transfer,
> check how many bytes it already received and terminate the transfer
> without it making any progress.
>
> >From testing on the TX side it seems that it is possible that we invoke
> pause once the transfer has completed which is indicated by the missing
> CCR_ENABLE bit but before the interrupt has been noticed. In that case the
> interrupt will come even after disabling it.
>
> The AM572x manual says that we have to wait for the CCR_RD_ACTIVE &
> CCR_WR_ACTIVE bits to be gone before programming it again here is the
> drain loop. Also it looks like without the drain the TX-transfer makes
> sometimes progress.
>
> One note: The pause + resume combo is broken because after resume the
> the complete transfer will be programmed again. That means the already
> transferred bytes (until the pause event) will be sent again. This is
> currently not important for my UART user because it does only pause +
> terminate.
>
> Cc: <sta...@vger.kernel.org>

You don't get a "add support" patch into the stable tree unless it's a
trivial device id or quirk table addition, please go re-read
Documentation/stable_kernel_rules.txt

thanks,

greg k-h

Sebastian Andrzej Siewior

unread,
Aug 7, 2015, 2:00:07 PM8/7/15
to
* Russell King - ARM Linux | 2015-08-07 17:26:48 [+0100]:

>On Fri, Aug 07, 2015 at 05:36:05PM +0200, Sebastian Andrzej Siewior wrote:
>> + /*
>> + * We do not allow DMA_MEM_TO_DEV transfers to be paused.
>> + * According to RMK the OMAP hardware might prefetch bytes from
>> + * memory into its FIFO and not send it to the device due to the
>> + * pause. The bytes in the FIFO are cleared on pause. It is
>> + * unspecified by how many bytes the source address is updated
>> + * if at all.
>
>Would you mind rewording the above.
>
>Please take the time to read the manuals for the device you are playing
>with. It's mostly documented in there. See the OMAP4430 ES2.x TRM,
>16.4.18 and 16.4.19.

I didn't connect the dots that well. Thank you.



I updated the comment to:

/*
* We do not allow DMA_MEM_TO_DEV transfers to be paused.
* From the AM572x TRM, 16.1.4.18 Disabling a Channel During Transfer:
* "When a channel is disabled during a transfer, the channel undergoes
* an abort, unless it is hardware-source-synchronized …".
* A source-synchronised channel is one where the fetching of data is
* under control of the device. In other words, a device-to-memory
* transfer. So, a destination-synchronised channel (which would be a
* memory-to-device transfer) undergoes an abort if the the CCR_ENABLE
* bit is cleared.
* From 16.1.4.20.4.6.2 Abort: "If an abort trigger occurs, the channel
* aborts immediately after completion of current read/write
* transactions and then the FIFO is cleaned up." The term "cleaned up"
* is not defined. TI recommends to check that RD_ACTIVE and WR_ACTIVE
* are both clear _before_ disabling the channel, otherwise data loss
* will occur.
* The problem is that if the channel is active, then device activity
* can result in DMA activity starting between reading those as both
* clear and the write to DMA_CCR to clear the enable bit hitting the
* hardware. If the DMA hardware can't drain the data in its FIFO to the
* destination, then data loss "might" occur (say if we write to an UART
* and the UART is not accepting any further data).
*/

would that be okay?

>Due to this, the OMAP DMA engine driver was submitted with this in the
>cover note:
>
>"For the OMAP DMAengine driver, there's a few short-comings:
>
>1. pause/resume support is not implemented; it's not clear whether the
> OMAP hardware is capable of supporting this sanely."

If I google for it, I find it. pause/resume support for cyclic was added
later without a note why it is only supported for cyclic.

Sebastian

Peter Hurley

unread,
Aug 7, 2015, 2:30:07 PM8/7/15
to
[ + Heikki ]
Because it will generate tons of unnecessary reports when the patch that
WARNs inevitably ends up in mainline a version earlier than the patch that
fixes it.

>> Well, instead of me saying something snide about the lack of upstream serial
>> driver unit tests, I'll say I've been working on cleaning up and organizing
>> my own tty/serial subsystem and driver units tests which I hope to upstream
>> in the fall.
>>
>> Those include i/o validators that ran this driver for days at time without
>> error at max line rate. Unfortunately, that hardware does not exhibit the
>> same problem as the DRA7 noted in the changelog.
>
> What you have is a race condition in the code you a responsible for
> maintaining, caused by poorly implemented code. Fix it, rather than
> whinging about drivers outside of your subsystem having never implemented
> _optional_ things that you choose to merge broken code which relied upon
> it _without_ checking that the operation succeeded.
>
> It is _entirely_ your code which is wrong here.
>
> I will wait for that to be fixed before acking the omap-dma change since
> you obviously need something to test with.

I'm not sure to what you're referring here.

A WARNing fixes nothing.

If you mean some patch, as yet unwritten, that handles the dma cases when
dmaengine_pause() is unimplemented without data loss, ok, but please confirm
that's what you mean.

However, at some point one must look at the api and wonder if the separation
of concern has been drawn in the right place.

Regards,
Peter Hurley

Russell King - ARM Linux

unread,
Aug 7, 2015, 2:40:06 PM8/7/15
to
On Fri, Aug 07, 2015 at 02:21:59PM -0400, Peter Hurley wrote:
> [ + Heikki ]
>
> On 08/07/2015 12:33 PM, Russell King - ARM Linux wrote:
> > What you have is a race condition in the code you a responsible for
> > maintaining, caused by poorly implemented code. Fix it, rather than
> > whinging about drivers outside of your subsystem having never implemented
> > _optional_ things that you choose to merge broken code which relied upon
> > it _without_ checking that the operation succeeded.
> >
> > It is _entirely_ your code which is wrong here.
> >
> > I will wait for that to be fixed before acking the omap-dma change since
> > you obviously need something to test with.
>
> I'm not sure to what you're referring here.
>
> A WARNing fixes nothing.

The warning can wait.

> If you mean some patch, as yet unwritten, that handles the dma cases when
> dmaengine_pause() is unimplemented without data loss, ok, but please confirm
> that's what you mean.

But the regression needs fixing.

> However, at some point one must look at the api and wonder if the separation
> of concern has been drawn in the right place.

It _is_ in the right place. dmaengine_pause() always has been permitted
to fail. It's the responsibility of the user of this API to _check_ the
return code to find out whether it had the desired effect. Not checking
the return code is a bug in the caller's code.

If that wasn't the case, dmaengine_pause() would have a void return type.
It doesn't. It has an 'int' to allow failure, or to allow non-
implementation for cases where the underlying hardware can't pause the
channel without causing data loss.

What would you think is better: an API which silently loses data, or
one which refuses to stop the transfer and reports an error code back
to the caller.

You seem to be arguing for the former, and as such, there's no way I
can take you seriously.

In any case, Greg has now commented on the patch adding the feature,
basically refusing it for stable tree inclusion. So the matter is
settled: omap-dma isn't going to get the pause feature added in stable
trees any time soon. So a different solution now needs to be found,
which is what I've been saying all along...

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Russell King - ARM Linux

unread,
Aug 7, 2015, 2:50:05 PM8/7/15
to
Better, if a tad verbose. I guess no one will miss that. :)

> If I google for it, I find it. pause/resume support for cyclic was added
> later without a note why it is only supported for cyclic.

Try searching for the "[PATCH 11/11] ASoC: omap-pcm: Convert to use
dmaengine" thread, which was the initial round of patches converting
omap-pcm to DMA engine, and where there was some discussion of how
to handle it at that time.

The result of that was it was felt that the safest approach was to
limit it to cyclic transfers for ASoC, and to use the method which
had been well proven over previous years/decade on numerous different
OMAP hardware.

It's all entirely sensible given that omap-dma has to cope with many
different hardware revisions with two major hardware versions and
people having limited testing resources for validation. So you will
understand, given the data loss issues here, why it was decided that
omap-dma will only provide pause/resume support for cases where it
has been proven to work sufficiently well and where data loss is not
that a major issue (if a few samples of audio get lost over a
suspend/resume, it's not going to corrupt your data.)

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Peter Hurley

unread,
Aug 7, 2015, 9:50:05 PM8/7/15
to
On 08/07/2015 02:32 PM, Russell King - ARM Linux wrote:
> On Fri, Aug 07, 2015 at 02:21:59PM -0400, Peter Hurley wrote:
>> [ + Heikki ]
>>
>> On 08/07/2015 12:33 PM, Russell King - ARM Linux wrote:
>>> What you have is a race condition in the code you a responsible for
>>> maintaining, caused by poorly implemented code. Fix it, rather than
>>> whinging about drivers outside of your subsystem having never implemented
>>> _optional_ things that you choose to merge broken code which relied upon
>>> it _without_ checking that the operation succeeded.
>>>
>>> It is _entirely_ your code which is wrong here.
>>>
>>> I will wait for that to be fixed before acking the omap-dma change since
>>> you obviously need something to test with.
>>
>> I'm not sure to what you're referring here.
>>
>> A WARNing fixes nothing.
>
> The warning can wait.
>
>> If you mean some patch, as yet unwritten, that handles the dma cases when
>> dmaengine_pause() is unimplemented without data loss, ok, but please confirm
>> that's what you mean.
>
> But the regression needs fixing.

I too would prefer the bug to be fixed.

But calling it a regression is incorrect. There is no previous SHA in which this
problem didn't exist, except before either 8250_dma or 8250_omap was added.

From the outset, both the 8250 dma code and the 8250_omap driver (mistakenly)
relied on dmaengine_pause.


>> However, at some point one must look at the api and wonder if the separation
>> of concern has been drawn in the right place.
>
> It _is_ in the right place. dmaengine_pause() always has been permitted
> to fail. It's the responsibility of the user of this API to _check_ the
> return code to find out whether it had the desired effect. Not checking
> the return code is a bug in the caller's code.
>
> If that wasn't the case, dmaengine_pause() would have a void return type.
> It doesn't. It has an 'int' to allow failure

A resource error is significantly different than ENOSYS or EINVAL.


> or to allow non-
> implementation for cases where the underlying hardware can't pause the
> channel without causing data loss.


That's your assertion; I've seen no documentation to back that up
(other than the de facto commit).

And quite frankly, that's absurd.

1. No other driver implements _only some_ use-cases of dmaengine_pause().
2. The number of users expecting dmaengine_pause to be implemented for
non-cyclic dma transfers _dwarfs_ cyclic users.
3. There's a dedicated query interface, dma_get_slave_caps(), for which
omap-dma returns /true/ -- not /maybe/ -- to indicate dmaengine_pause()
is implemented.

As a consumer of the api, I'd much rather opt-out at device initialization
time knowing that a required feature is unimplemented, than discover it
at i/o time when it's too late.


> What would you think is better: an API which silently loses data, or
> one which refuses to stop the transfer and reports an error code back
> to the caller.

An api which provides a means of determining if necessary functionality
is implemented _during setup_. That way the consumer of the api can
determine if the feature is supportable.

For example, dma_get_slave_caps() could differentiate
* pause for cyclic support
* pause for non-cyclic support
* pause and resume support
* pause and terminate support
....


> You seem to be arguing for the former, and as such, there's no way I
> can take you seriously.

Leaping to conclusions.


> In any case, Greg has now commented on the patch adding the feature,
> basically refusing it for stable tree inclusion. So the matter is
> settled: omap-dma isn't going to get the pause feature added in stable
> trees any time soon. So a different solution now needs to be found,
> which is what I've been saying all along...

While Sebastian's initial patch is a good first-cut at addressing
8250_omap's use of omap-dma, none of the patches address the general
design problem I have outlined above; namely, that simply returning
an error at use time for an unimplemented slave transaction is
fundamentally flawed.

Regards,
Peter Hurley

Russell King - ARM Linux

unread,
Aug 8, 2015, 5:10:06 AM8/8/15
to
On Fri, Aug 07, 2015 at 09:41:19PM -0400, Peter Hurley wrote:
> That's your assertion; I've seen no documentation to back that up
> (other than the de facto commit).

So, you can't be bothered to read the thread where I quoted bits from
the manual, including paraphrasing emails I had discussing this issue
with TI.

What you're effectively saying that I'm a liar. Fine, there's no point
in engaging in further discussion. This will be my last reply to you.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
0 new messages