Like William said, we can't really answer your question without more
detail, but I'll take a guess. The DRAM that's shared with the PRU is
marked as non-cacheable memory since the PRU can modify it. That means
that in a typical memory copy loop *EACH* word read from DRAM pays the
full CPU to DRAM to CPU round-trip read latency, instead of only the
first read paying it while triggering a cache-line fill that serves
the reads that follow.
You probably want a memory copy routine that uses a bunch more
registers and does burst reads from the PRU memory region (as big a
burst as you can manage for performance, but at least a cache line
long). The ARM folks themselves publish several useful routines, along
with the benefits and drawbacks of each:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html
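To give a rough idea of the "more registers, bigger reads" approach, here is a minimal C sketch. It is my own illustration, not one of the ARM routines (those are hand-written assembly). The function name copy_from_uncached and the chunk size of 8 words (32 bytes) are assumptions; size the chunk to suit your cache line and register budget. Whether the compiler actually turns the grouped loads into a multi-register load depends on your compiler and flags, so check the disassembly before trusting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the ARM article): copy n_words 32-bit
 * words from an uncached region into ordinary memory, reading eight
 * words per loop iteration. */
void copy_from_uncached(uint32_t *restrict dst,
                        const uint32_t *restrict src,
                        size_t n_words)
{
    /* Both pointers are assumed to be word aligned. */
    assert(((uintptr_t)dst % 4) == 0);
    assert(((uintptr_t)src % 4) == 0);

    while (n_words >= 8) {
        /* Do all eight loads first so the compiler is free to group
         * them, then do the eight stores. */
        uint32_t r0 = src[0], r1 = src[1], r2 = src[2], r3 = src[3];
        uint32_t r4 = src[4], r5 = src[5], r6 = src[6], r7 = src[7];

        dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3;
        dst[4] = r4; dst[5] = r5; dst[6] = r6; dst[7] = r7;

        src += 8;
        dst += 8;
        n_words -= 8;
    }

    /* Copy whatever is left over one word at a time. */
    while (n_words > 0) {
        *dst++ = *src++;
        n_words--;
    }
}
```

If the generated code still reads one word at a time, that's the point where dropping to one of the assembly routines from the ARM link is probably the better bet.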
--
Charles Steinkuehler
cha...@steinkuehler.net