state[0] = (__R31&sclk) == sclk;
should do the same thing, but I would expect the compiler to
optimize that away. Unrolling the loops and inlining should
also help.
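
To illustrate the comparison form used in that line: it always yields exactly 0 or 1, whatever bit position the mask occupies. The following is only a sketch that can be tried on a host PC. The helper name `bit_is_set` is hypothetical, and the register value is passed in as a plain parameter instead of being read from `__R31`.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if every bit of mask is set in reg,
   otherwise 0. The result is already normalized to 0/1, so no extra
   shift is needed to store it as a logic state. */
static inline uint32_t bit_is_set(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}
```

On the PRU the call would look like `state[0] = bit_is_set(__R31, sclk);`, which a reasonable compiler should inline down to the same few instructions.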
This is how I do the read:
Remember that I now read the SPI data into a CPLD and fetch it
bytewise.
I switch the 2 byte-address lines to the CPLD and then have to
wait 7 ns for propagation through the CPLD, plus some more time
until the ringing at the P8/P9 connector has calmed down. So I
must wait, say, 4 instructions at 5 ns each before I really get
the data. That is done with some volatile reads.
I had the impression that the number of instructions and the
delay did not always scale 1:1, so it took some tuning with the
oscilloscope until I was satisfied.
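
The wait budget can be written down as a small calculation. This is only a sketch, assuming the 5 ns per instruction figure used above. The helper name `settle_cycles` is hypothetical, and the 13 ns ringing allowance in the usage note is an illustrative guess, not a measured value.

```c
#include <stdint.h>

/* Assumption: one PRU instruction takes 5 ns (the figure used in the text). */
#define PRU_NS_PER_INSTR 5u

/* Hypothetical helper: number of instruction slots needed to cover a
   given settle time in nanoseconds, rounded up to whole instructions. */
static inline uint32_t settle_cycles(uint32_t settle_ns)
{
    return (settle_ns + PRU_NS_PER_INSTR - 1u) / PRU_NS_PER_INSTR;
}
```

With 7 ns of CPLD propagation plus an assumed 13 ns for ringing, `settle_cycles(7 + 13)` gives 4, which matches the four instructions mentioned above. Since the measured delay did not scale 1:1 with the instruction count, the result is only a starting point to be checked on the scope.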
The canonical solution for your problem is probably to use the
hardware SPI interface with the PRU, which should work up to
48 MBit/s. I could not make that work, and in the end I wanted
100 MBit/s anyway.
cheers,
Gerhard
------------------------------------------------------
// data avail is either (not busy) or (not drl). It is active high.
// The CPLD takes a little more than 32 clocks at 100 MHz
// to get the 32 bits. Then we can read them out, bytewise, and
// we select the byte using 2 port bits as address.
// It is probably harmless if that extends slightly into the next
// conversion since the read activity is decoupled from the ADC core.
// Reading the CPLD does not toggle ADC pins.
//
// inline saves 20 nsec of procedure overhead.
inline void wait_data_avail(void){
    while ( __R31 & (1 << DAT_AVAIL)) {};   // wait out the high time of p9.26 = data_avail
    while (!(__R31 & (1 << DAT_AVAIL))){};  // wait out the low time
    // Now we are at the start of the high time. The ADC transaction
    // window opens.
    // For the next 320 ns we will read the data into the CPLD or
    // program the ADC.
}
// Read 4 bytes from the CPLD, mask them, shift them & convert to one int.
// I must read at least 3 times so that the results are right (for
// address setup time).
// Removing a single read makes it 60 nsec faster, 15 nsec per read.
// Should be 5 nsec???
// Reading 3 times takes 40 nsec per byte. That should be enough.
// Reading 4 times takes 60 nsec per byte. Reading __R31 takes abt. 20 ns. :-(
// From the rising edge of data_available on P9 until the return,
// it takes 725 nsec.
// Kill 320 nsec, the time the CPLD needs to fill the shift register.
// Once through the empty loop costs 5 nsec.
//    for( retval=60; retval; retval--){};
// In the meantime I have changed the CPLD so that it tells me
// immediately when I can fetch the data, so I gain 350 nsec that
// were previously spent busy-waiting.
// Now I should be able to process 3 channels.
// Using the scope is essential to see where time is lost.
inline int read_adc(void){
    int retval;
    // Without volatile this runs 3 times as fast, even though __R31 is
    // volatile. The compiler seems to assume incorrectly that reading
    // __R31 has no side effects. But it has: it takes time, and the
    // data might change.
    //
    // Maybe we could do the merging of the result in the setup time,
    // but when the compiler re-arranges instructions that might fail.
    volatile unsigned int byte0, byte1, byte2, byte3;

    wait_data_avail();
    // From here to parking the address at return it takes 350 nsec.
    __R30 &= ~(3 << QSEL);      // address 0
    byte0 = __R31;              // address setup time for byte 0
    byte0 = __R31;
    // byte0 = __R31;
    byte0 = __R31;

    __R30 |= (1 << QSEL);       // address 1
    byte1 = __R31;
    byte1 = __R31;
    // byte1 = __R31;
    byte1 = __R31;

    __R30 &= ~(3 << QSEL);      // address 2, remove old bit field
    __R30 |= (2 << QSEL);       // insert new bit field
    byte2 = __R31;
    byte2 = __R31;
    // byte2 = __R31;
    byte2 = __R31;

    __R30 |= (1 << QSEL);       // increment to address 3
    byte3 = __R31;
    byte3 = __R31;
    // byte3 = __R31;
    byte3 = __R31;              // get the last byte

    retval = ((byte0 & 0xff)      )
           | ((byte1 & 0xff) <<  8)
           | ((byte2 & 0xff) << 16)
           | ((byte3 & 0xff) << 24);

    __R30 &= ~(3 << QSEL);      // park address at 0, may be removed,
                                // but makes it easy to spot the action
                                // on the scope.
    return retval;
}
------------------------------------------------------
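
Two pieces of read_adc() can be checked on a host PC without the hardware: replacing the 2-bit address field and merging the four bytes. The following is a minimal sketch under stated assumptions. The names `with_addr`, `merge_bytes` and `QSEL_SKETCH` are hypothetical, and the bit position 2 is only a placeholder because the real value of QSEL is not shown above. The functions work on plain values instead of `__R30` and `__R31`.

```c
#include <stdint.h>

/* Assumption: placeholder for the bit position of the 2-bit address
   field in the output register. The real QSEL value is not given. */
#define QSEL_SKETCH 2u

/* Return a copy of r30 with the 2-bit address field set to addr.
   Always clears the old field first, so it works for any address
   order, not only for counting up from 0. */
static inline uint32_t with_addr(uint32_t r30, uint32_t addr)
{
    r30 &= ~(3u << QSEL_SKETCH);           /* remove old bit field */
    r30 |= (addr & 3u) << QSEL_SKETCH;     /* insert new bit field */
    return r30;
}

/* Combine the low byte of four register samples into one 32-bit
   word, byte 0 least significant. Unsigned arithmetic avoids
   shifting into the sign bit at << 24. */
static inline uint32_t merge_bytes(uint32_t b0, uint32_t b1,
                                   uint32_t b2, uint32_t b3)
{
    return  (b0 & 0xffu)
         | ((b1 & 0xffu) << 8)
         | ((b2 & 0xffu) << 16)
         | ((b3 & 0xffu) << 24);
}
```

On the PRU, `__R30 = with_addr(__R30, 2);` would replace the clear-then-set pair used for address 2. Whether it costs extra instructions compared with the incremental `|=` used for addresses 1 and 3 would have to be checked on the scope.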