Hi,
Yes, pipelining from SDRAM definitely sounds like a good solution, and I think the current implementation should support it reasonably well. In general, how you do this sort of pipelining depends on how much DTCM you need for the data and how long you can afford to wait for each transfer. The idea is that once the first transfer is done, you start the next one and do some processing while it happens, which masks the transfer time.
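As a rough illustration of why overlapping helps, the C sketch below compares the two orderings under an idealised cost model: every block takes t_dma to transfer and t_proc to process, and starting a transfer is free. The model, the function names and the numbers are assumptions for illustration only, not measurements of SpiNNaker. simulated_time walks through the wait / start-next / process loop step by step and agrees with the closed form in pipelined_time.

```c
#include <assert.h>
#include <stdint.h>

/* Idealised cost model (an assumption, not measured hardware figures):
 * each block takes t_dma to transfer and t_proc to process, and starting
 * a transfer costs nothing. Times are in arbitrary units. */

/* No overlap: wait for each transfer, then process that block. */
static uint32_t sequential_time(uint32_t n_blocks, uint32_t t_dma, uint32_t t_proc) {
    return n_blocks * (t_dma + t_proc);
}

/* Double buffering: after the first transfer, each step costs the slower of
 * "next transfer" and "process current block"; the last block has nothing
 * left to overlap with. */
static uint32_t pipelined_time(uint32_t n_blocks, uint32_t t_dma, uint32_t t_proc) {
    if (n_blocks == 0) {
        return 0;
    }
    uint32_t step = (t_dma > t_proc) ? t_dma : t_proc;
    return t_dma + (n_blocks - 1) * step + t_proc;
}

/* Step-by-step version of the loop: wait for block i, start block i + 1,
 * then process block i. */
static uint32_t simulated_time(uint32_t n_blocks, uint32_t t_dma, uint32_t t_proc) {
    if (n_blocks == 0) {
        return 0;
    }
    uint32_t now = 0;
    uint32_t dma_end = t_dma;              /* first transfer starts at time 0 */
    for (uint32_t i = 0; i < n_blocks; i++) {
        if (now < dma_end) {
            now = dma_end;                 /* wait for transfer i to finish */
        }
        if (i + 1 < n_blocks) {
            dma_end = now + t_dma;         /* start transfer i + 1 */
        }
        now += t_proc;                     /* process block i */
    }
    return now;
}
```

For example, with 4 blocks and t_dma = t_proc = 10, sequential_time gives 80 and pipelined_time gives 50. The gain is largest when transfer and processing take similar time; if one of them dominates, the total is bounded by that one.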
In terms of making it work, I think the easiest approach is to do this in the loop that goes through the sources, transferring the data for one or more sources at a time (depending on the size of the data). You need a global variable with enough space to store the data being transferred, and I would advise setting up two of these buffers so you can switch between them: one that data is currently being transferred into, and one that you are currently processing, with another variable or two tracking which is which. You then start the transfer into the first buffer; then, in the loop, wait for the transfer to finish, start the next transfer (if you are not at the end of the data), and then process the data that has just been transferred.
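The buffer-switching logic can be sketched in plain C as follows. It is a minimal sketch, assuming hypothetical block sizes, a summing process_block step, and hypothetical start_transfer / wait_for_transfer stubs that simply copy with memcpy, so nothing actually overlaps here; on a real core the stubs would be replaced by the DMA calls for whichever core model you use. Only the bookkeeping (which buffer is being filled, which is being processed) is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 4   /* hypothetical size of one block, in words */
#define N_BLOCKS    5   /* hypothetical number of blocks held in SDRAM */

/* Stand-in for the data held in SDRAM. */
static uint32_t sdram_data[N_BLOCKS * BLOCK_WORDS];

/* The two global DTCM buffers we switch between. */
static uint32_t dtcm_buffer[2][BLOCK_WORDS];

/* Result of the hypothetical processing step. */
static uint64_t total = 0;

/* Hypothetical transfer stubs: they copy immediately, whereas a real DMA
 * would complete later, asynchronously. */
static volatile bool transfer_done = false;

static void start_transfer(uint32_t block, uint32_t buffer) {
    assert(block < N_BLOCKS && buffer < 2);
    transfer_done = false;
    memcpy(dtcm_buffer[buffer], &sdram_data[block * BLOCK_WORDS],
           sizeof(dtcm_buffer[buffer]));
    transfer_done = true;
}

static void wait_for_transfer(void) {
    while (!transfer_done) {
        /* real code would sleep (joined-up) or poll (split-core) here */
    }
}

static void process_block(const uint32_t *block) {
    for (size_t i = 0; i < BLOCK_WORDS; i++) {
        total += block[i];
    }
}

static void process_all_blocks(void) {
    uint32_t write_buffer = 0;             /* buffer the DMA is filling */
    start_transfer(0, write_buffer);       /* prime the pipeline */
    for (uint32_t block = 0; block < N_BLOCKS; block++) {
        wait_for_transfer();
        uint32_t read_buffer = write_buffer;   /* now holds `block` */
        write_buffer ^= 1;                     /* other buffer is free again */
        if (block + 1 < N_BLOCKS) {
            start_transfer(block + 1, write_buffer);  /* overlap the next transfer... */
        }
        process_block(dtcm_buffer[read_buffer]);      /* ...with this processing */
    }
}
```

The buffer that is refilled at each step is always the one processed in the previous iteration, so the DMA never writes into data that is still being read.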
From looking into it a bit more, there are a couple of tricky things to work around in the current codebase, related to how you initiate the DMA and how you then know it has finished. The problems come from there being two different ways of processing neurons: either a joined-up synapse-neuron-processing core, where everything happens on a single core, or split synapse-processing and neuron-processing cores. The DMA is handled differently in each case: the joined-up core model uses the spin1_api and simulation.h code to handle DMAs, whereas the split-core model takes a more direct approach, programming the hardware itself. My suggestion would be to pick which of these you are going to use and concentrate on getting that working first; we can always look at resolving the difference later. You are probably using the joined-up model if you are using a 1.0 ms timestep, and the split model if you are using a 0.1 ms timestep.
If you are using the joined-up model, you can use the functions in simulation.h (simulation_dma_transfer_done_callback_on) to register a DMA handler function against a tag, and have that handler set a variable to "true" when the DMA is complete. To do a transfer, set the variable to false and then start the DMA using spin1_dma_transfer with that tag. You can then "wait" for the variable to become true within a while loop; to make that more efficient, call spin1_wfi() inside the loop, which puts the core to sleep until an interrupt is raised. You still need to check your variable, as wfi will wake up on any interrupt, not just the DMA completion.
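For the joined-up model, the flag-and-wfi pattern can be sketched as below. Because spin1_dma_transfer, simulation_dma_transfer_done_callback_on and spin1_wfi only exist on the SpiNNaker toolchain, this sketch uses hypothetical stubs (stub_register_callback, stub_dma_read, stub_wfi) that play the same roles; the stub wfi deliberately wakes twice for unrelated interrupts before the DMA completes, which is why the loop must re-check the flag. The tag value, the callback typedef and all stub names are assumptions; only on_dma_complete, start_read and wait_read show the pattern itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback shape modelled on spin1-style callbacks taking two uints (assumption). */
typedef void (*dma_done_callback_t)(uint32_t unused, uint32_t tag);

#define MY_DMA_TAG 7u  /* hypothetical tag; must not clash with other DMA users */

/* ---- Hypothetical stand-ins for the runtime ---------------------------------
 * On a joined-up core these roles are played by
 * simulation_dma_transfer_done_callback_on, spin1_dma_transfer and spin1_wfi. */
static dma_done_callback_t registered_callback = NULL;
static uint32_t registered_tag = 0;
static bool dma_in_flight = false;
static uint32_t in_flight_tag = 0;
static uint32_t unrelated_irqs_pending = 0;
static uint32_t wakeups = 0;

static void stub_register_callback(uint32_t tag, dma_done_callback_t cb) {
    registered_tag = tag;
    registered_callback = cb;
}

static void stub_dma_read(uint32_t tag, const void *src, void *dst, uint32_t bytes) {
    assert(!dma_in_flight);        /* this stub models one transfer at a time */
    memcpy(dst, src, bytes);       /* data copied now, completion reported later */
    dma_in_flight = true;
    in_flight_tag = tag;
    unrelated_irqs_pending = 2;    /* e.g. a timer tick and a packet arrive first */
}

/* "Sleep until any interrupt": returns after unrelated interrupts as well. */
static void stub_wfi(void) {
    wakeups++;
    if (unrelated_irqs_pending > 0) {
        unrelated_irqs_pending--;
        return;
    }
    if (dma_in_flight) {
        dma_in_flight = false;
        if (registered_callback != NULL && in_flight_tag == registered_tag) {
            registered_callback(0, in_flight_tag);
        }
    }
}

/* ---- The pattern itself ------------------------------------------------------ */
static volatile bool dma_complete = false;

static void on_dma_complete(uint32_t unused, uint32_t tag) {
    (void) unused;
    (void) tag;
    dma_complete = true;           /* the handler only sets the flag */
}

static void start_read(const void *sdram_src, void *dtcm_dst, uint32_t bytes) {
    dma_complete = false;          /* clear the flag BEFORE starting the DMA */
    stub_dma_read(MY_DMA_TAG, sdram_src, dtcm_dst, bytes);
}

static void wait_read(void) {
    while (!dma_complete) {        /* re-check after every wake-up */
        stub_wfi();
    }
}
```

In the double-buffered loop, start_read would stand in for "start the next transfer" and wait_read for "wait for the transfer"; the callback would normally only need to be registered once, at initialisation.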
If you are using the split-core model, the process is similar, but with a couple of differences. The more direct interaction with the DMA hardware can be done with the functions in dma_common.h. You don't need to register a callback in this case, since wait_for_dma_complete reads the DMA hardware directly to check whether any DMAs are in progress. This should be OK because, while you are running this part of the code, nothing else will be using the DMA hardware. So in this case, you can first call cancel_dmas to make sure the hardware is in the right state, then use do_fast_dma_read to start each transfer and wait_for_dma_complete when you need to wait (you don't need spin1_wfi here). Everything else is the same: you still need a pair of buffers, one for the data being transferred and one for the data you are processing.
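For the split-core model, the cancel / start / poll sequence can be sketched like this. The real cancel_dmas, do_fast_dma_read and wait_for_dma_complete in dma_common.h program and read the DMA controller directly, so this sketch replaces them with hypothetical stubs over a pretend status that reports "busy" for four reads; the stub names, the prime_pipeline helper and that behaviour are assumptions for illustration, not the real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ---- Hypothetical stand-ins for the dma_common.h functions --------------- */
static uint32_t busy_polls_left = 0;  /* pretend status: reads until "idle" */
static uint32_t status_reads = 0;     /* how many times the status was read */

static void stub_cancel_dmas(void) {
    busy_polls_left = 0;              /* drop anything that was in flight */
}

static void stub_do_fast_dma_read(const void *src, void *dst, uint32_t bytes) {
    assert(busy_polls_left == 0);     /* this stub models one transfer at a time */
    memcpy(dst, src, bytes);          /* data copied now...                      */
    busy_polls_left = 4;              /* ...but reported busy for 4 status reads */
}

static bool stub_dma_busy(void) {     /* stands in for reading the hardware status */
    status_reads++;
    if (busy_polls_left > 0) {
        busy_polls_left--;
        return true;
    }
    return false;
}

static void stub_wait_for_dma_complete(void) {
    while (stub_dma_busy()) {
        /* busy-wait on the status: no callback and no wfi needed */
    }
}

/* ---- The sequence described above ------------------------------------------ */
/* Called once before the loop: reset the controller, then start the first read.
 * Later reads in the loop would call stub_do_fast_dma_read directly. */
static void prime_pipeline(const void *first_src, void *first_dst, uint32_t bytes) {
    stub_cancel_dmas();
    stub_do_fast_dma_read(first_src, first_dst, bytes);
}
```

On real hardware, any processing done between starting the read and calling the wait reduces how long the polling loop spins, which is the point of overlapping the transfer with work.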
Hopefully that gives you enough to get started with, but let us know if you need more help.
Andrew 🙂