Can't get SPI data up to 10 MHz - PRU

57 views
Skip to first unread message

fred.p....@gmail.com

unread,
Dec 4, 2018, 4:25:32 AM12/4/18
to BeagleBoard
Hi, I need your help,

I need to read data of an SPI Master device on the BeagleBone. Since the SPI kernel driver doesn't support an SPI slave mode I have to implement it in the PRU (Because the maximum frequency one BeagleBone's GPIO can be toggle is only 15 KHz).
From what I have read, the PRU cycle time is 200 MHz, even so, I can't read data up to 10 MHz. I am doing the following:



                         k=0;
shift = 0;

while((__R31&cs) != cs){ // CS = 0
while((__R31&sclk) == sclk);
while((__R31&sclk) == sclk){ //sclk = 1

if((__R31&cs) == cs)
goto END;
}
while((__R31&sclk) != sclk); //SCLK = 0
buffer[cnt] = ((__R31&miso) == miso)?  buffer[cnt] | 0x01 << shift: buffer[cnt]; // Here is where my cycle is being stranded!!!
k++;
shift = (shift==32)? 0 : shift + 1;
cnt = ((k%32)==0)? cnt+1 : cnt; 

If I comment on the reading line, I can't catch clocks at 20 MHz. However, if a uncomment it I only can get all of them at 9 MHz (I need 10 MHz). My question is if it is possible to get it a bit faster. I can't understand why it is so slow (given that the PRU is a real-time processor  and has a high speed between instructions).

Regards,
-- Fred Gomes

Fred Gomes

unread,
Dec 4, 2018, 4:30:42 AM12/4/18
to beagl...@googlegroups.com
I've forgotten to mention. I am sending the clocks from the PRU 1 to PRU 0 at this time. So I know exactly how many clocks there is and the frequency. The objective is to plug the sensor which I want to read the data after having this working .

-- Fred Gomes


--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to a topic in the Google Groups "BeagleBoard" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beagleboard/Vtw23FneFLk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beagleboard...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beagleboard/d8ea84f9-cb22-47bd-89ac-328214ded03e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Charles Steinkuehler

unread,
Dec 4, 2018, 7:39:43 AM12/4/18
to beagl...@googlegroups.com
Try writing in assembly, or at least providing a listing of the PRU
code your compiler is generating. The C code you've written could
turn into very ugly assembly with a lot of memory reads (which are
*VERY* expensive on the PRU) depending on the compiler.

In general, for speed you only want the PRU to write to your buffer[]
memory region, you should never read from it. So use a local variable
to build a 32-bit word and then write that into the buffer[] memory
instead of bit-wise oring the data piece-meal into your buffer[].

If you need to maintain a rate of 10 MHz, you only have 20 assembly
instructions per bit, which isn't many when you're trying to code in C.
>> <https://groups.google.com/d/msgid/beagleboard/d8ea84f9-cb22-47bd-89ac-328214ded03e%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>


--
Charles Steinkuehler
cha...@steinkuehler.net

Gerhard Hoffmann

unread,
Dec 4, 2018, 10:15:38 PM12/4/18
to beagl...@googlegroups.com


Am 04.12.18 um 10:25 schrieb fred.p....@gmail.com:
Hi, I need your help,

I need to read data of an SPI Master device on the BeagleBone. Since the SPI kernel driver doesn't support an SPI slave mode I have to implement it in the PRU (Because the maximum frequency one BeagleBone's GPIO can be toggle is only 15 KHz).
From what I have read, the PRU cycle time is 200 MHz, even so, I can't read data up to 10 MHz. I am doing the following:



                         k=0;
shift = 0;

while((__R31&cs) != cs){ // CS = 0


Why not simply     

#define CS  15              /* whatever */

while(__R31 & CS) {};


I hope that cs is a #defined macro and not a variable;

In C there is the convention that macros are written in upper case,

as a warning that there may be more action inside than meets the eye.

And macros should employ parentheses generously to avoid bugs like this:

#define A  2 + 3

if (5 * A)  .....  which results in

if (5 * 2 + 3)  , which is probably not what was intended.


Note that EVERY computation result can be used as a boolean.

0 is false, EVERYTHING else is true, so your compares to

generate a boolean are just a non-wanted complication.

 

while((__R31&sclk) == sclk);
while((__R31&sclk) == sclk){ //sclk = 1

if((__R31&cs) == cs)
goto END;

I don't know where END is, but gotos are known to not work well

with optimizing compilers since the the compiler may have to

throw assumptions overboard that are not safe any more since the

goto is non-local.


buffer[cnt] = ((__R31&miso) == miso)?  buffer[cnt] | 0x01 << shift: buffer[cnt]; // Here is where my cycle is being stranded!!!

The conditional assignment makes the program read back buffer[cnt]. That is bad

in concept since the buffer may be in shared ram and reading that may cost many clocks

for arbitration. Writing the shared RAM may be a little bit cheaper since the bus logic

might buffer the write and let the PRU run on.


Thinking about it, even the PRU local data RAM is somehow shared since the

ARM CPU and the other PRU can read & write it; let's hope that the priorities are set

in a sensible way.


Also the address of buffer[cnt] is computed 3 times in this line.

The compiler might spot that it has the values already. But you could kick it into the

right direction:


register int tmp;        // I have no idea if the TI compiler honors that. It is just a hint.

int *bufferp =  KJHGKJHJHM;


tmp = __R31 & MISO;  // the & result is already usable as a boolean

if (tmp) tmp = tmp | (1 << shift);  // that feels somehow wrong since the original bit is kept.

*bufferp++ = tmp;


You might also leave that shifting to the ARM CPU; it probably can be done

when the data is further proccessed.


regards,

Gerhard

Fred Gomes

unread,
Dec 7, 2018, 4:52:29 AM12/7/18
to beagl...@googlegroups.com
Hi Gerhard,

Thank you very much for your answer. 

In fact, I did not have the CS and Clock pins defined as "define", I made the program a way faster.
Another tip that improved the program's performance a lot is to access the storage array through a pointer, rather than a direct indexation, like the following: *(buffer_shared + count) = var. 

About your advice of not processing the data as it is being received, I don't think that's possible. The reason is that I want to catch 98 bytes * 102 bytes = 79.184 bits (which corresponds to an image of CMOS sensor). I don't have enough memory space to save those bits without doing the shift processing inside the function ...

Here's the code where I got the better results:

while(__R31&cs); // CS = 1 

while(__R31&sclk){ //sclk = 1

if(__R31&cs){
goto END;}
}
while(!(__R31&sclk)); //SCLK = 0 
var =  (__R31&miso)? var |(0x01 << shift): var;

k++;
shift--;

if(shift == -1){
*(buffer_shared + cnt) = var;
cnt++;
shift=31;
var = 0;
}
END:
*(buffer_shared+2600) = k; // Here I am storing the value of the counter, to check if all the clocks wre catched. 

I have only one small problem though, the sensor is sending data every time, and sometimes it catches more than one frame. I don't understand quite well why is it happening since the time between frames is about 10 ms, and in that why it is difficult to have sure that I haven't miss clocks in the frame. So, to have sure I am not missing clocks I add the following line:

LABEL:
if(k==79184)
goto END;

And I noticed that sometimes I miss one clock (although here I am introducing one extra instruction, which will make the program a bit slow). If you have any suggestion I should implement to tackle this problem, please let me know ... 

Thank you very much for your help.
-- Fred Gomes


--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to a topic in the Google Groups "BeagleBoard" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beagleboard/Vtw23FneFLk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beagleboard...@googlegroups.com.

Charles Steinkuehler

unread,
Dec 7, 2018, 8:47:15 AM12/7/18
to beagl...@googlegroups.com
Again, please either write in assembly or post the assembly listing
the compiler is generating. You have 12 lines of C code above, and
that doesn't even look like the entire loop (just the inner loop for
one 32-bit word). As mentioned previously, you have time for about 20
PRU assembly instructions, which isn't many for 12 lines of C.

Without the full code and seeing the assembly your compiler is
generating, I can only make a few suggestions. An optimizing compiler
would probably do much of this for you, but it doesn't sound like your
compiler is doing much optimization (again, let's see some listing
files!):

* Get rid of the count variable and create a dedicated buffer pointer,
then increment that, ie:

*(buffer_shared + cnt) = var;
cnt++;

...turns into:

// Initial setup
uint32_t *buffer_ptr = buffer_shared

// ...other code goes here...

*(buffer_ptr) = var;
buffer_ptr++;

* Review the generated code for the var update. The PRU has lots of
bit test and shift instructions, so changing this calculation slightly
could reduce the number of instructions needed. If I was writing in
assembly I would shift var first then conditionally or in a '1'
depending on the state of the input pin. In C it would look like this
(but I don't know if this will generate the actual assembly I'm
thinking of, it depends on your compiler):

var >>= 1;
if (__R31&miso) var |= 0x8000;

* Review the code generated for the various bit test instructions (eg:
__R31 & <val>). These should be a single bit test instruction in PRU
assembly, but that may not be what the compiler is generating.

There are other "tricks" you can try (eg: get rid of the shift count
and use a bit in var instead). Without being able to see the
generated assembly you can easily wind up making things much worse,
but you can try it if you like. Make sure to initialize var to 0x8000
and set the buffer_ptr value in your setup code:

while(!(__R31&sclk)); //SCLK = 0

k++; // I assume this is needed outside the loop

// 1 in the LSB means this is the last bit of our 32-bit word
if (var & 0x01)
{
// Shift in the last bit
var >>= 1;
if (__R31&miso) var |= 0x8000;

// Store the value
*(buffer_ptr) = var;

// Setup for the next word
buffer_ptr++;
var = 0x8000;
} else {
// Shift in the next bit
var >>= 1;
if (__R31&miso) var |= 0x8000;
}

--
Charles Steinkuehler
cha...@steinkuehler.net

Fred Gomes

unread,
Dec 8, 2018, 4:11:35 PM12/8/18
to beagl...@googlegroups.com
Hi Charles, thank you very much for your answer,

I followed your advice but something overly odd is happening. 

First of all, let me contextualize you: the image sensor is always sending data through the CS, SCLK, and MOSI pins. The delay between each frame is about 10 ms. That said, I wrote the code in attachment "prog_pru_v1". In the first stage, I test it using the another PRU from beagle bone. So, I send some clocks from PRU 1 and count them on the PRU1; here I checked that the PRU can get all the pulses up to a frequency clock of 20 MHz (I only need 10 MHz). The problem comes when I shortened the delay time between frames, it stops catching all the clocks (in fact, it catches very few clocks) and I don't understand the reason since it should be in a loop (between the variation edges from the CS pin.

The other problem I did (prog_pru_v2.c) seems to work a bit better. It sometimes catches all the clocks from the sensor and fails another times. Surprisingly, the assembly code seems to be much uncluttered on this code.

I can't understand why is this happening, it doesn't seem to have much sense. I validate the code successfully, but the problem comes when I start sending much data spaced about 10 ms!

Find attached the code programs mentioned above, both the .c code and assembly. 

Let me know if you have any other suggestion I should test,

Bets regards,
-- Fred Gomes



--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to a topic in the Google Groups "BeagleBoard" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beagleboard/Vtw23FneFLk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beagleboard...@googlegroups.com.
BBB.rar

Charles Steinkuehler

unread,
Dec 9, 2018, 6:34:50 PM12/9/18
to beagl...@googlegroups.com
As mentioned, you can't really tell what a compiler is going to turn
your C code into. As you discovered, the v2 code that is actually more
suited to the PRU low-level assembly instructions turns into worse
assembly because of the compiler. I suspect you have at least two
problems with your code:

1) You are on the edge of having enough time to capture all the clocks
even with the "better" assembly produced by v1 of the C code.

2) You may be encountering problems with noise or metastability with
your code which looks for the clock edges. It's not at all uncommon to
have an occasional glitch which could be why the v1 code isn't working
reliably.

I *HIGHLY* recommend you code this in assembly, and add a small bit of
filtering to the clock signal (eg: require the clock signal to be the
same for two or more clocks before detecting the edge).
> <mailto:cha...@steinkuehler.net>> escreveu no dia sexta, 7/12/2018 à(s)
> cha...@steinkuehler.net <mailto:cha...@steinkuehler.net>
>
> --
> For more options, visit http://beagleboard.org/discuss
> ---
> You received this message because you are subscribed to a topic in
> the Google Groups "BeagleBoard" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/beagleboard/Vtw23FneFLk/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> beagleboard...@googlegroups.com
> <mailto:beagleboard%2Bunsu...@googlegroups.com>.
> --
> For more options, visit http://beagleboard.org/discuss
> ---
> You received this message because you are subscribed to the Google
> Groups "BeagleBoard" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to beagleboard...@googlegroups.com
> <mailto:beagleboard...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/beagleboard/CAJHs20z_yfufDZ227vseD4zzUiE0kUAVBG0KBJcP6UnxgGYR1A%40mail.gmail.com
> <https://groups.google.com/d/msgid/beagleboard/CAJHs20z_yfufDZ227vseD4zzUiE0kUAVBG0KBJcP6UnxgGYR1A%40mail.gmail.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

--
Charles Steinkuehler
cha...@steinkuehler.net
Reply all
Reply to author
Forward
0 new messages