atmel_serial driver modifications for 6Mb/s throughput in synchronous mode

wzab

Oct 15, 2011, 4:47:31 AM
Hi,

I need to transmit serial data at 6 Mb/s through the AT91SAM9260
USART.
I have successfully modified the driver to work in synchronous mode
(see thread: https://groups.google.com/group/comp.arch.embedded/browse_thread/thread/503dd4dfd723e74f
)
However, I then ran into strange problems at high data rates.
I use the USART with DMA. The application sets the tty into "raw" mode.
At 6 Mb/s I found that some data are lost, but neither
errors nor any messages in the system log are generated.
I suspected that this is caused by a DMA buffer overrun (the buffer
length of 512 bytes is sufficient for ca. 580 µs of transmission at
6 Mb/s, so the user space application may have trouble receiving the
data in time).
Therefore I changed the length of this buffer (the PDC_BUFFER_SIZE
constant) from the original 512 to 65536.
However, after this modification the transmission was even worse.
Maybe some parts of the code rely on the fact that the DMA buffer is,
e.g., not bigger than one memory page?
Maybe some pointers used to communicate with this buffer use only
a limited number of bits?
Has anyone faced similar problems with the atmel_serial driver?
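
A rough way to sanity-check the buffer estimate above is to compute how long a buffer lasts at a given line rate. The sketch below is a user-space illustration, not driver code; the function name and the bits-per-character parameter are assumptions (8 for an unframed synchronous stream, 10 for asynchronous 8N1). With 8 bits per byte, 512 bytes last roughly 683 µs at 6 Mb/s, which is the same order of magnitude as the figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in microseconds that a receive buffer of `bytes` bytes can absorb
 * at `bit_rate` bits per second, with `bits_per_char` line bits per byte.
 * bits_per_char is an assumption about the framing: 8 for a synchronous
 * stream without start/stop bits, 10 for asynchronous 8N1.
 */
double buffer_fill_time_us(uint32_t bytes, uint32_t bits_per_char,
                           uint32_t bit_rate)
{
    assert(bit_rate > 0);
    return (double)bytes * bits_per_char * 1e6 / (double)bit_rate;
}
```

One thing worth checking in the datasheet before enlarging the buffer: if the PDC transfer counter is a 16-bit field, a length of 65536 would not fit in it.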

Looking at the amount of code associated with the TTY layer
surrounding the serial driver, I'm considering getting rid of the
atmel_serial driver for this particular USART port and writing my own
lightweight and optimized DMA-based driver for high-speed transmission
of raw data. Maybe such a solution already exists?
--
TIA & Regards,
Wojtek

Jukka Marin

Oct 15, 2011, 6:00:20 AM
On 2011-10-15, wzab <wza...@gmail.com> wrote:
> I have successfully modified the driver to work in synchronous mode
> (see thread: https://groups.google.com/group/comp.arch.embedded/browse_thread/thread/503dd4dfd723e74f
> )
> However, I then ran into strange problems at high data rates.
> I use the USART with DMA. The application sets the tty into "raw" mode.
> At 6 Mb/s I found that some data are lost, but neither
> errors nor any messages in the system log are generated.

I don't know about the driver you are using, but we've had severe problems
with Atmel chips and SPI running at high speed. Atmel is proud to have DMA
for various peripherals, but they are too stupid to implement any kind of
FIFO buffers. If the MCU bus gets too busy, the DMA system can steal no
bus cycles and it simply drops the received data without telling anyone.
I guess the same problem affects data transmission.

And yes, this really happens. We had to change our hardware and stop using
SPI altogether. The good old Motorola MCUs (68k series, e.g.) have both DMA
_and_ FIFO, so they work reliably. Wish Atmel borrowed a brain and fixed
their chips.

You may try to place the serial buffer in the internal MCU RAM - it has
larger bandwidth and on the newer chips, it can be accessed even while the
external memory bus is busy.

Good luck.

-jm

wzab

Oct 16, 2011, 6:40:03 AM
On Oct 15, 12:00 pm, Jukka Marin <jma...@pyy.embedtronics.fi> wrote:
> On 2011-10-15, wzab <wza...@gmail.com> wrote:
>
> > I have successfully modified the driver to work in synchronous mode
> > (see thread:https://groups.google.com/group/comp.arch.embedded/browse_thread/thre...
> > )
> > However, I then ran into strange problems at high data rates.
> > I use the USART with DMA. The application sets the tty into "raw" mode.
> > At 6 Mb/s I found that some data are lost, but neither
> > errors nor any messages in the system log are generated.
>
> I don't know about the driver you are using, but we've had severe problems
> with Atmel chips and SPI running at high speed.  Atmel is proud to have DMA
> for various peripherals, but they are too stupid to implement any kind of
> FIFO buffers.  If the MCU bus gets too busy, the DMA system can steal no
> bus cycles and it simply drops the received data without telling anyone.
> I guess the same problem affects data transmission.
>

> And yes, this really happens.  We had to change our hardware and stop using
> SPI altogether.  The good old Motorola MCUs (68k series, e.g.) have both DMA
> _and_ FIFO, so they work reliably.  Wish Atmel borrowed a brain and fixed
> their chips.
>
> You may try to place the serial buffer in the internal MCU RAM - it has
> larger bandwidth and on the newer chips, it can be accessed even while the
> external memory bus is busy.
>

Thanks a lot, it seems that you are right. So I'll need to move my
USART DMA buffer to the internal SRAM.
As far as I can see, there are two areas, each 4 KB long, at physical
addresses 0x200000-0x200fff and 0x300000-0x300fff.
I have found a structure, at91sam9260_sram_desc, describing how this
area should be mapped by the Linux system:
http://lxr.linux.no/linux+v3.0.4/arch/arm/mach-at91/at91sam9260.c#L37

However, I can't see any support for allocating DMA buffers from
this area :-(.
There is no special zone (well, for memory consisting of one page it
makes no sense to define such a zone ;-) ).
Does it mean that I should allocate the buffer myself with something
like:
request_mem_region(AT91_IO_VIRT_BASE - AT91SAM9260_SRAM0_SIZE, PDC_BUFFER_SIZE, "usart") ?
I wouldn't like to tie my buffer to the first PDC_BUFFER_SIZE bytes of
this area.
(There may be other drivers using SRAM for their high-speed buffers,
e.g. the MAC.)
Is there any memory management provided for allocating small parts
of the SRAM?
I've found http://www.at91.com/forum/viewtopic.php/f,12/t,4991/ ,
but it didn't provide me with all the needed info...

In fact, allocating DMA buffers in SRAM for all USARTs would be
overkill.
I have only one USART running at 6 Mb/s, so it seems that I'll
need to customize the platform code for USART allocation anyway...
I'll appreciate any hints...

wzab

Oct 16, 2011, 11:32:52 AM
OK. I have found a patch with a full implementation of an Ethernet TX
buffer in SRAM:
ftp://www.at91.com/pub/linux/2.6.27-at91/2.6.27-at91-exp.patch.gz
It makes it a little clearer how to allocate the buffer:

+#if defined(CONFIG_ARCH_AT91) && defined(CONFIG_MACB_TX_SRAM)
+#if defined(CONFIG_ARCH_AT91SAM9260)
+	if (request_mem_region(AT91SAM9260_SRAM0_BASE, TX_DMA_SIZE, "macb")) {
+		bp->tx_ring_dma = AT91SAM9260_SRAM0_BASE;
+	} else {
+		if (request_mem_region(AT91SAM9260_SRAM1_BASE, TX_DMA_SIZE, "macb")) {
+			bp->tx_ring_dma = AT91SAM9260_SRAM1_BASE;
+		} else {
+			printk(KERN_WARNING "Cannot request SRAM memory for TX ring, already used\n");
+			return -EBUSY;
+		}
+	}
+

However, it doesn't explain how to avoid collisions between different
drivers that need buffers in SRAM.
E.g. in my case (Ethernet-connected data acquisition, using a USART via
optoisolation to receive data from AD converters) I need both the
Ethernet and the USART to have buffers in internal SRAM...
--
Regards,
WZab

wzab

Oct 16, 2011, 12:07:35 PM
I have found the patch: http://lists.infradead.org/pipermail/linux-arm-kernel/2011-April/048279.html
which makes use of http://lxr.linux.no/linux+v3.0.4/include/linux/genalloc.h
Maybe this is the right way to go.
I'll share my results if I find a satisfactory solution...
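
To illustrate the idea of letting several drivers share one small SRAM bank without colliding, here is a minimal user-space model of a chunk-based sub-allocator. It is NOT the kernel's genalloc API; all names (sram_pool, sram_pool_alloc, sram_pool_free) and the 64-byte granularity are hypothetical and chosen only for this sketch. It hands out byte offsets within a 4 KB bank using a first-fit search over a bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SRAM_SIZE   4096u                     /* one 4 KB SRAM bank, as in the thread */
#define CHUNK_SIZE  64u                       /* allocation granularity (assumption) */
#define NCHUNKS     (SRAM_SIZE / CHUNK_SIZE)  /* 64 chunks fit in one 64-bit bitmap */

struct sram_pool {
    uint64_t used;  /* bit i set means chunk i is allocated */
};

/* Bitmask with the lowest `need` bits set (1 <= need <= 64). */
static uint64_t chunk_mask(size_t need)
{
    return (need == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << need) - 1);
}

/* First-fit allocation; returns the byte offset in the bank, or -1 if full. */
long sram_pool_alloc(struct sram_pool *p, size_t size)
{
    size_t need = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    if (need == 0 || need > NCHUNKS)
        return -1;
    uint64_t mask = chunk_mask(need);
    for (size_t start = 0; start + need <= NCHUNKS; start++) {
        if ((p->used & (mask << start)) == 0) {
            p->used |= mask << start;
            return (long)(start * CHUNK_SIZE);
        }
    }
    return -1;
}

/* Release a region previously returned by sram_pool_alloc with the same size. */
void sram_pool_free(struct sram_pool *p, long offset, size_t size)
{
    size_t need = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    size_t start = (size_t)offset / CHUNK_SIZE;
    assert(offset >= 0 && need > 0 && start + need <= NCHUNKS);
    p->used &= ~(chunk_mask(need) << start);
}
```

In this model a USART driver could take 512 bytes and an Ethernet driver 1536 bytes from the same bank, each receiving a distinct offset, which is the collision-avoidance property asked about above.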
--
Regards,
Wojtek

Ulf Samuelsson

Oct 16, 2011, 2:56:16 PM
The new AT91SAM9Gx5 chips have got rid of the PDC in favour
of a real DMA controller with built-in FIFO.
This will allow burst access to the SDRAM.
With the pricing of SDRAM going up, the DDR2 interface
of these chips will make them more cost effective.

BR
Ulf Samuelsson

wzab

Oct 16, 2011, 5:31:54 PM
On Oct 16, 8:56 pm, Ulf Samuelsson <ulf_samuels...@invalid.telia.com>
wrote:
>
> The new AT91SAM9Gx5 chips have got rid of the PDC in favour
> of a real DMA controller with built-in FIFO.
> This will allow burst access to the SDRAM.
> With the pricing of SDRAM going up, the DDR2 interface
> of these chips will make them more cost effective.
>
> BR
> Ulf Samuelsson

Yes, I know, but unfortunately my hardware platform is fixed, and I
really have to force THIS hardware (AT91SAM9260) to provide the
required performance...

BR
Wojtek

Ulf Samuelsson

Oct 17, 2011, 2:35:58 PM
Anyone running at high speed with the PDC needs to understand
that the PDC needs access to the bus *often*.
At 6 Mbps and 10 bits per character (start, 8 data bits, stop),
it will take 1 2/3 µs to handle one character.
The PDC will request the bus when there is a byte in the holding
register. This byte needs to be moved to memory before the next
byte is received.
If any group of peripherals occupies the bus for more than 1.67 µs,
then you lose characters on reception.
If you put the receive buffer in SRAM, then the bus matrix
will be of great help, as long as nothing else is eating up
ALL the bandwidth to the bus. A PDC transfer should then
only take 10 ns.
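
The per-character deadline described above can be worked out with a few lines of arithmetic. The helper below is a user-space sketch with a hypothetical name; it returns the time available to move one received character out of the holding register, truncated to whole nanoseconds. For 6 Mbps and 10 bits per character it gives 1666 ns, matching the 1 2/3 µs figure; for an unframed 8-bit synchronous stream (an assumption about the original poster's setup) the budget shrinks to 1333 ns.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in nanoseconds (truncated) between two consecutive characters
 * on the line, i.e. the deadline for the PDC to fetch each byte.
 */
uint32_t char_time_ns(uint32_t bit_rate, uint32_t bits_per_char)
{
    assert(bit_rate > 0);
    /* 64-bit intermediate so bits_per_char * 1e9 cannot overflow */
    return (uint32_t)((uint64_t)bits_per_char * 1000000000u / bit_rate);
}
```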

It should be OK to have the transmit buffer in SDRAM, saving
the precious 4 kB of SRAM that normally is used for data.
You might also consider setting up the bus matrix
to prioritize the PDC.
This is not done by default.

If you have to use Ethernet, you might be screwed
unless you put the receive buffer in SRAM.

BR
Ulf Samuelsson

wzab

Oct 22, 2011, 5:42:49 PM
> Anyone running at high speed with the PDC needs to understand
> that the PDC needs access to the bus *often*.
> At 6 Mbps and 10 bits per character (start, 8 data bits, stop),
> it will take 1 2/3 µs to handle one character.
> The PDC will request the bus when there is a byte in the holding
> register. This byte needs to be moved to memory before the next
> byte is received.
> If any group of peripherals occupies the bus for more than 1.67 µs,
> then you lose characters on reception.
> If you put the receive buffer in SRAM, then the bus matrix
> will be of great help, as long as nothing else is eating up
> ALL the bandwidth to the bus. A PDC transfer should then
> only take 10 ns.
>
> It should be OK to have the transmit buffer in SDRAM, saving
> the precious 4 kB of SRAM that normally is used for data.
> You might also consider setting up the bus matrix
> to prioritize the PDC.
> This is not done by default.

Well, I have managed to allocate the DMA RX buffer for the particular
USART in SRAM. It required a small modification in atmel_serial.c.
However, now of course I get a kernel panic when atmel_tasklet_func
calls atmel_rx_from_dma, which in turn calls dma_sync_single_for_cpu.
As the mapping doesn't exist for ioremapped memory, this leads to a
kernel panic.

Well, I can check whether the currently serviced USART uses a buffer in
SRAM and bypass the calls to the synchronization functions, but I'm not
sure whether that is safe on this platform.
Aren't there any cache mechanisms active for SRAM memory?
When the PDC stores something to the SRAM buffer, is it immediately
visible to the CPU, or should I invalidate the associated cache?
--
BR & TIA,
Wojtek

wzab

Oct 22, 2011, 5:51:35 PM

> Aren't there any cache mechanisms active for SRAM memory?
> When the PDC stores something to the SRAM buffer is it immediately
> visible for CPU, or should I invalidate associated cache?
>

Maybe for the buffer ioremapped from the SRAM memory I should simply
directly call consistent_sync (as it is done in fact in the
dma_sync_single_for... routines:
http://lxr.linux.no/linux+v3.0.4/arch/xtensa/include/asm/dma-mapping.h#L101
)?

wzab

Oct 23, 2011, 4:53:53 AM
On Oct 22, 11:51 pm, wzab <wza...@gmail.com> wrote:

> Maybe for the buffer ioremapped from the SRAM memory I should simply
> directly call consistent_sync (as it is done in fact in the
> dma_sync_single_for... routines:http://lxr.linux.no/linux+v3.0.4/arch/xtensa/include/asm/dma-mapping....
> )?

Oooops, sorry, I had not noticed that consistent_sync does not exist
for the arm architecture.
The implementation of dma_sync_single... for arm is different:
http://lxr.linux.no/linux+v3.0.4/arch/arm/include/asm/dma-mapping.h#L503
--
Wojtek

wzab

Oct 23, 2011, 3:48:31 PM
I have changed the implementation of atmel_rx_from_dma to use
__dma_single_cpu_to_dev and __dma_single_dev_to_cpu for the USART which
uses the RX buffer in SRAM. However, I then get the kernel BUG at:
http://lxr.linux.no/linux+v3.0.4/arch/arm/mm/dma-mapping.c#L451

Investigating the problem more thoroughly, I've found that my buffer
is allocated at 0x200000 (SRAM0) and ioremapped to 0xc4890000, while
high_memory is at 0xc4000000. Therefore my buffer address fails the
address validity test at http://lxr.linux.no/linux+v3.0.4/arch/arm/include/asm/memory.h#L292
.
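
The failure described above can be reproduced with a small user-space model of a linear-map range check. This is a sketch, not the kernel's code: the function names are hypothetical, the start of the linear mapping is assumed to be 0xc0000000, and the high_memory value is the one reported in this post. An ioremapped address above high_memory fails the check, while an address inside the linear map passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_OFFSET  UINT32_C(0xc0000000)  /* assumed start of the linear map */
#define MODEL_HIGH_MEMORY  UINT32_C(0xc4000000)  /* high_memory as reported in the post */

/* True if kaddr lies inside the modelled linear (lowmem) mapping. */
bool model_virt_addr_valid(uint32_t kaddr)
{
    return kaddr >= MODEL_PAGE_OFFSET && kaddr < MODEL_HIGH_MEMORY;
}

/* True if both the first and the last byte of the buffer pass the check. */
bool model_range_valid(uint32_t kaddr, uint32_t size)
{
    assert(size > 0);
    return model_virt_addr_valid(kaddr) &&
           model_virt_addr_valid(kaddr + size - 1);
}
```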

I've checked whether the location of my buffer above high_memory
makes synchronisation unnecessary (in fact I thought that virtual
memory returned by ioremap is accessed bypassing the cache
mechanisms), but when I simply skipped the synchronization routines, I
still got a corrupted data stream.

So either my buffer in SRAM0 still requires DMA
synchronization (but how to provide it?), or, even though it is located
in SRAM, I still lose characters due to "traffic jams" in the Bus
Matrix (as Ulf suggested a few posts above).
If the latter is the case, I should probably boost the priority of the
USART in the Bus Matrix, but this topic seems to be not well
documented :-(.
I have found http://lxr.linux.no/linux+v3.0.4/arch/arm/mach-at91/include/mach/at91sam9260_matrix.h
but the only places where the matrix seems to be used are:
http://lxr.linux.no/linux+v3.0.4/arch/arm/mach-at91/at91sam9260_devices.c#L418
and http://lxr.linux.no/linux+v3.0.4/arch/arm/mach-at91/at91sam9260_devices.c#L1275

Does it mean, that Linux on AT91SAM9260 uses the Bus Matrix in its
default state?

Ulf Samuelsson

Oct 24, 2011, 6:39:40 PM
Linux does not set up the Matrix.
This is (should be) done in at91bootstrap.
The Atmel at91bootstrap hardly touches the matrix,
the argument being that only the customer knows
what the priorities should be.
I checked at91bootstrap v2.13 into OpenEmbedded
(www.openembedded.org), where I set up the matrix
for the SAM9263.
It was done as an example, but I have not verified
the throughput details.

BR
Ulf Samuelsson

wzab

Oct 26, 2011, 10:56:58 AM
On Oct 25, 00:39, Ulf Samuelsson <ulf_samuels...@invalid.telia.com>
wrote:

> Linux does not set up the Matrix.
> This is (should be) done in at91bootstrap.
> The Atmel at91bootstrap hardly touches the matrix,
> the argument being that only the customer knows
> what the priorities should be.
> I checked at91bootstrap v2.13 into OpenEmbedded
> (www.openembedded.org), where I set up the matrix
> for the SAM9263.
> It was done as an example, but I have not verified
> the throughput details.
>
> BR
> Ulf Samuelsson

I have put the matrix setup code into the board initialization code
(board-mmnet1000c, probably derived from board-sam9260ek.c):

static void __init ek_board_init(void)
{
	/* Serial */
	at91_add_device_serial();
	/* USB Host */
	at91_add_device_usbh(&ek_usbh_data);
	/* USB Device */
	at91_add_device_udc(&ek_udc_data);
	/* SPI */
	at91_add_device_spi(ek_spi_devices, ARRAY_SIZE(ek_spi_devices));
	/* I2S */
	at91_add_device_ssc(AT91SAM9260_ID_SSC, ATMEL_SSC_RX);
	/* NAND */
	at91_add_device_nand(&ek_nand_data);
	/* Ethernet */
	at91_add_device_eth(&ek_macb_data);
	/* MMC */
	at91_add_device_mmc(0, &ek_mmc_data);
	/* I2C */
	at91_add_device_i2c(NULL, 0);
	/* LEDs */
	at91_gpio_leds(ek_leds, ARRAY_SIZE(ek_leds));
	/* Push Buttons */
	ek_add_device_buttons();
	/* Modify the Bus Matrix settings, logging the previous values first */
	printk("<1>AT91_MATRIX_SCFG0 was %x\n", at91_sys_read(AT91_MATRIX_SCFG0));
	at91_sys_write(AT91_MATRIX_SCFG0, 0x010a0010);
	printk("<1>AT91_MATRIX_SCFG1 was %x\n", at91_sys_read(AT91_MATRIX_SCFG1));
	at91_sys_write(AT91_MATRIX_SCFG1, 0x010a0010);
	printk("<1>AT91_MATRIX_PRAS0 was %x\n", at91_sys_read(AT91_MATRIX_PRAS0));
	at91_sys_write(AT91_MATRIX_PRAS0, 0x00200300);
	printk("<1>AT91_MATRIX_PRAS1 was %x\n", at91_sys_read(AT91_MATRIX_PRAS1));
	at91_sys_write(AT91_MATRIX_PRAS1, 0x00200300);
}
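
To read back what a PRAS value such as 0x00200300 means, a small decoder helps. The sketch below is a user-space illustration with a hypothetical function name. It assumes each bus master has a 2-bit priority field placed every 4 bits (master 0 at bits 0-1, master 1 at bits 4-5, and so on), which is the layout I believe at91sam9260_matrix.h describes; the layout and the mapping from master index to peripheral should be verified against the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Priority (0..3) of bus master `master` in a PRAS-style register value.
 * Assumed layout: a 2-bit field per master, one field every 4 bits.
 */
unsigned pras_priority(uint32_t pras, unsigned master)
{
    assert(master < 8);
    return (pras >> (4 * master)) & 0x3u;
}
```

Under that assumption, 0x00200300 assigns priority 3 to master 2 and priority 2 to master 5, leaving all other masters at priority 0.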

I have also checked that my ioremapped buffers do not require DMA
synchronization.

Anyway, the problem has not been successfully solved, as the overhead
introduced by the whole tty-related layer of the atmel_serial driver is
simply too big.

When I receive data at up to ca. 170 kB/s, everything works fine. When
the data stream reaches the desired 340 kB/s, some data are lost,
probably not between the USART and SDC but due to software buffer
overruns.

Probably I should get rid of the whole atmel_serial.c driver (very
good, but from my point of view too sophisticated) and instead write a
minimalistic DMA-based driver passing data directly from the DMA buffer
to mmapped memory in user space...

Thanks for the help in resolving the bus-matrix-related problems.
--
BR Wojtek

wzab

Oct 26, 2011, 11:05:57 AM
On Oct 26, 16:56, wzab <wza...@gmail.com> wrote:

> Probably I should get rid of the whole atmel_serial.c driver (very
> good, but from my point of view too sophisticated) and instead write a
> minimalistic DMA-based driver passing data directly from the DMA buffer
> to mmapped memory in user space...

Ooops, of course I was mistaken. It should be a memory buffer
allocated in the kernel driver and mmapped by this driver for the
user-mode application.

To move data directly to user-space memory I would have to use
get_user_pages, but this again is too complex for this purpose ;-).
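
One possible shape for such a shared buffer is a single-producer, single-consumer ring: the driver side advances a head index as data arrive, and the application advances a tail index as it consumes them. The sketch below is a plain user-space model with hypothetical names (rx_ring, rx_ring_put, rx_ring_get); it shows only the index arithmetic. A real driver would additionally need the mmap plumbing and appropriate memory barriers, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u  /* must be a power of two so index wrap-around is a mask */

struct rx_ring {
    volatile uint32_t head;   /* total bytes written, advanced by the producer */
    volatile uint32_t tail;   /* total bytes read, advanced by the consumer */
    uint8_t data[RING_SIZE];
};

/* Producer side: copy up to len bytes in; returns the number actually stored. */
size_t rx_ring_put(struct rx_ring *r, const uint8_t *src, size_t len)
{
    size_t space = RING_SIZE - (uint32_t)(r->head - r->tail);
    if (len > space)
        len = space;              /* drop what does not fit (overrun) */
    for (size_t i = 0; i < len; i++)
        r->data[(r->head + i) & (RING_SIZE - 1)] = src[i];
    r->head += (uint32_t)len;
    return len;
}

/* Consumer side: copy up to len bytes out; returns the number actually read. */
size_t rx_ring_get(struct rx_ring *r, uint8_t *dst, size_t len)
{
    size_t avail = (uint32_t)(r->head - r->tail);
    if (len > avail)
        len = avail;
    for (size_t i = 0; i < len; i++)
        dst[i] = r->data[(r->tail + i) & (RING_SIZE - 1)];
    r->tail += (uint32_t)len;
    return len;
}
```

Because each index is written by only one side, no lock is needed in this model; a short return from rx_ring_put tells the producer that an overrun occurred.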
--
BR Wojtek