PRU - Can't read data up to 2.5 MHz


fred.p....@gmail.com

unread,
Nov 28, 2018, 7:18:57 AM11/28/18
to BeagleBoard
Hello, I need your help 

I have to read data from an SPI master device, which sends the clock at 10 MHz. Since the SPI kernel driver only allows the BeagleBone to work as an SPI master, I had to implement SPI slave reception using a PRU.

From what I've read around the internet, the PRU runs at 200 MHz, so I thought I could easily read data at 10 MHz. Oddly, at transmission rates above 2.5 MHz I am unable to catch all the rising edges on the clock pin.

So, here's what I did:

In the PRU0 I wrote the following code:

bool WaitForRisingEdge_sclk(uint32_t sclk)
{
    state[0] = ((__R31 & sclk) == sclk);
    if (state[0] && !previous_state[0])
    {
        previous_state[0] = state[0];
        return true;  // rising edge
    }
    else
    {
        previous_state[0] = state[0];
        return false;
    }
}

void main(void)
{
    while (1)
    {
START:
        count_clocks = 0;
        while (gpio2[GPIO_DATAIN / 4] & P8_8)  // notification from the BeagleBone to start reading
        {
LOOP:
            if (WaitForRisingEdge_sclk(sclk))  // wait for rising edge on clock pin
            {
                //buffer[i] = ((__R31 & miso) == miso) ? buffer[i] | 0x01 << k : buffer[i] | 0x00 << k;
                //buffer[i] = buffer[i] | 0x01 << k;
                count_clocks++;
            }
            else if (WaitForRisingEdge_cs(cs))
            {
                gpio2[GPIO_SETDATAOUT / 4] = P8_7;  // notify the BeagleBone that the data has been read
                buffer[2600] = count_clocks;
                goto START;
            }
            else
            {
                goto LOOP;
            }
        }
    }
}


and I wrote a simple program on PRU1 which sends a clock at a configurable frequency. I came to the conclusion that at transmission rates above 2.5 MHz I can't count all the clocks. I was wondering whether there is a better way to detect the rising edge; I might be losing performance in that function itself.

Thank you very much for your help,
-- Fred Gomes



Gerhard Hoffmann

unread,
Nov 28, 2018, 8:42:34 AM11/28/18
to beagl...@googlegroups.com


On 28.11.18 at 12:08, fred.p....@gmail.com wrote:
...

state[0] = ((__R31&sclk) == sclk) ?  true :  false;

can be shortened to

state[0] = (__R31&sclk) == sclk;

It does the same thing, but I would expect the compiler to optimize that away anyway. Unrolling the loops and inlining should also help.

This is how I do the read:

Remember that I read now the SPI data into a CPLD and fetch
them bytewise.

I switch the 2 byte-address lines to the CPLD and then have to wait
7 ns for propagation through the CPLD, plus some more time until
the ringing at the P8/P9 connector has calmed down. So I must
wait, say, 4 instructions at 5 nsec each before I really get valid data.
That is done with some volatile reads. I had the impression that
the number of instructions and the delay did not always scale 1:1,
so it took some pruning with the oscilloscope until I was satisfied.

The canonical solution for your problem is probably to use the
hardware SPI interface with the PRU, which should work to 48 MBit/s.
I could not make that work, and in the end I wanted 100 MBit/s anyway.

cheers,
Gerhard

------------------------------------------------------


// data avail is either (not busy) or (not drl). It is high active.
// The CPLD takes a little more than 32 Clocks at 100 MHz
// to get the 32 bits. Then we can read them out, bytewise, and
// we select the byte using 2 port bits as address.
// It is probably harmless if that extends slightly into the next
// conversion since the read activity is decoupled from the ADC core
// Reading the CPLD does not toggle ADC pins.
//
// inline saves 20 nsec of procedure overhead.

inline void wait_data_avail(void){

    while  (  __R31 & (1 << DAT_AVAIL)) {};    // wait for the high time of p9.26 = data_avail
    while  (!(__R31 & (1 << DAT_AVAIL))){};    // wait for the low time
    // Now we are at the start of the high time. The ADC transaction window opens.
    // next 320 ns we will read the data into the CPLD or program the ADC
}


// read 4 bytes from the CPLD, mask them, shift them & convert to one int.
// I must read at least 3 times so that the results are right (for address setup time).
// removing a single read makes it 60 nsec faster, 15 nsec per read. Should be 5 nsec???
// reading 3 times takes 40 nsec per bit. That should be enough.
// reading 4 times takes 60 nsec per bit. Reading __R31 takes abt. 20 ns. :-(
// From the rising edge of data_available at P9 until the return it takes 725 nsec.
// kill 320 nsec, the time the CPLD needs to fill the shift register
// Once through the empty loop costs 5 nsec.
// for( retval=60; retval;  retval--){};       

// In the meantime I have changed the CPLD so that it signals when I can
// fetch the data immediately, so I gain 350 nsec previously spent busy waiting.
// Now I should be able to process 3 channels.
// Using the scope is essential to see where time is lost.

inline int read_adc(void){

    int retval;

    // Without volatile this runs 3 times as fast, even though __R31 is volatile
    // The compiler seems to assume incorrectly that reading __R31 has no
    // side effects. But it has. It spends time and data might change.
    //
    // maybe we could do the merging of the result in the setup time
    // but when the compiler re-arranges instructions that might fail.

    volatile unsigned int byte0, byte1, byte2, byte3;

    wait_data_avail();

    // from here to parking the address at return it takes 350 nsec.

    __R30   &= ~(3 << QSEL);    // address 0
    byte0    = __R31;            // address setup time for byte 0
    byte0    = __R31;
//    byte0    = __R31;
    byte0    = __R31;

    __R30   |= (1 << QSEL);        // address 1
    byte1    = __R31;
    byte1    = __R31;
//    byte1    = __R31;
    byte1    = __R31;

    __R30   &= ~(3 << QSEL);    // address 2,   remove old bit field
    __R30   |=  (2 << QSEL);    // insert new bit field
    byte2   = __R31;
    byte2   = __R31;
//    byte2   = __R31;
    byte2   = __R31;

    __R30   |= (1<< QSEL);        // increment to address 3
    byte3   = __R31;            //
    byte3   = __R31;
//    byte3   = __R31;
    byte3   = __R31;            // get the last byte

    retval  = ((byte0 & 0xff)      )
            | ((byte1 & 0xff) << 8 )
            | ((byte2 & 0xff) << 16)
            | ((byte3 & 0xff) << 24);

    __R30   &= ~(3 << QSEL);    // park address at 0, may be removed.
                                // but makes it easy to spot the action on the scope.
    return  retval;
}

------------------------------------------------------

Fred Gomes

unread,
Dec 3, 2018, 5:23:03 AM12/3/18
to beagl...@googlegroups.com
Hi Gerhard, thank you very much for your answer.

I replaced the "if" statements with "while" loops, as shown in your example, and the communication got a lot faster. However, I still can't read data at 10 MHz. I think the problem has to do with writing to the shared memory zone; I've got the following piece of code:
                      

#define PRU_SHARED_MEM_ADDR 0x00010000

volatile int *buffer = (volatile int *) PRU_SHARED_MEM_ADDR;
int k = 0;
while ((__R31 & cs) != cs) {          // CS = 0

    while ((__R31 & sclk) == sclk) {  // SCLK = 1
        if ((__R31 & cs) == cs)
            goto END;
    }
    while ((__R31 & sclk) != sclk);   // SCLK = 0 --> rising edge
    buffer[k / 32] |= ((__R31 & miso) ? 1 : 0) << (k % 32);  // <-- I lose clocks here
    k++;
}
END: ;

Any idea how I can make it a little bit faster? If I remove the MISO-reading line I can catch all the SCLK clocks; the problem appears when I add that line. I think it gets slower because of the access to that memory zone.

Thank you very much,
Fred Gomes
