Tyrel Newton wrote:
>> As to the aligned pbuf payload: I think the code currently relies on mem_malloc returning aligned data (and that should be OK with your current settings), so you might want to check the return values of your libc malloc.
>
> As the pbuf code is written (I think I'm running the latest stable, 1.3.2), there is no way to guarantee a 32-bit aligned payload pointer at the start of the Ethernet frame with MEM_ALIGNMENT=4. This is because in pbuf_alloc, the payload pointer for PBUF_RAM types is initialized at an offset that is itself memory aligned (this offset is equal to the size of the pbuf structure plus the various header lengths). When the 14-byte Ethernet header is eventually uncovered, it will always be only 16-bit aligned, since the original payload pointer was 32-bit aligned. This of course assumes PBUF_LINK_HLEN=14.

I see... I must say I haven't checked that yet. And as my code itself requires the payload to be aligned (otherwise I would have to use packed structs to access the contents), I just ended up with 16-bit DMA transfers (using an Altera NIOS-II system with a standard Altera RAM-to-RAM DMA engine). I always planned to write my own DMA engine in VHDL that can do 32-bit transfers from 16-bit aligned data, but I haven't gotten around to it yet.
Anyway, if there is a requirement to let pbuf_alloc produce an unaligned payload so that the outer header is aligned, please file a bug report or patch at savannah!
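To make the alignment arithmetic in this exchange concrete, here is a minimal sketch in C. It is only a simplified model of the offset calculation described for PBUF_RAM pbufs, not the actual lwIP code. All names and values are hypothetical: a 16-byte pbuf structure, 20-byte TCP and IP headers, a 14-byte link header, and an allocator that returns a 4-byte-aligned block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only; check your own port's values. */
#define MODEL_PBUF_STRUCT_SIZE 16u /* assumed sizeof(struct pbuf) on a 32-bit target */
#define MODEL_TRANSPORT_HLEN   20u /* assumed TCP header length */
#define MODEL_IP_HLEN          20u /* assumed IP header length */
#define MODEL_LINK_HLEN        14u /* Ethernet header, as in the post */

/* Round addr up to the next multiple of alignment (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

/*
 * Model: the payload pointer is placed after the pbuf structure and all
 * headers, then rounded up to mem_alignment. Uncovering the headers later
 * moves the pointer back by the total header length, which gives the
 * address where the Ethernet frame starts.
 */
static uintptr_t eth_header_addr(uintptr_t pbuf_addr, uintptr_t mem_alignment)
{
    uintptr_t hdrs = MODEL_TRANSPORT_HLEN + MODEL_IP_HLEN + MODEL_LINK_HLEN;
    uintptr_t payload =
        align_up(pbuf_addr + MODEL_PBUF_STRUCT_SIZE + hdrs, mem_alignment);
    return payload - hdrs;
}
```

Under these assumptions, a pbuf allocated at a hypothetical address 0x1000 gives eth_header_addr(0x1000, 4) with a remainder of 2 modulo 4 (only 16-bit aligned), while eth_header_addr(0x1000, 2) has a remainder of 0 (32-bit aligned). The second result depends on the allocator still handing out 4-byte-aligned blocks when MEM_ALIGNMENT=2, which this model simply assumes.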
> The moral for me is that I actually see higher throughput by setting MEM_ALIGNMENT=2, which guarantees that when the Ethernet header is uncovered, it will be 32-bit aligned. Even though the TCP/IP headers are unaligned, the final copy to the MAC's transmit buffer is much faster if the source pointer is 32-bit aligned, i.e. at the start of the actual Ethernet frame.

The question is whether the final copy is what matters, or the rest of the processing: when the final copy is done in the background by a DMA engine, the misalignment might not even be harmful. While it is true that the transfer takes longer, it only has to finish before the previous frame has been sent. The only difference then is how long the DMA transfer puts a background load on the RAM bus, and whether it uses so much RAM bandwidth that the processor cannot work normally.
However, if the processor does the final copy (without a DMA engine), then it's a bad thing if the data is not aligned. But you should be able to include a DMA engine in your FPGA, so...
> Btw, this is also assuming the outgoing data is copied into the stack such that all the outgoing pbufs are PBUF_RAM-type.

Single PBUF_RAM pbufs or chained pbufs?
> Interesting results, but pretty esoteric since this is not an oft-used platform (MicroBlaze w/ xps_ethernetlite IP core).

Not that different from my own platform ;-) And after all, we need examples for task #7896 (Support zero-copy drivers), and this is one example to start with.
Simon