Hi,
I am new to the group, so first, I would like to thank you for all your hard work.
I have started using Neo to work with my Plexon data, and I have several problems.
The data typically comprises 48 channels with both MUA and LFP recorded, and the resulting files are around 2-3 GB in size. My machine is a MacBook Pro running OS X 10.7.5 with Anaconda Python 2.7 installed. I set up the same environment on a shared 24-core computer that I can access by ssh or screen sharing (it is operational now, but it doesn't solve the problems below). I installed Neo as a prerequisite for Elephant, and am trying to get it working before delving into the actual analysis.
One more thing worth mentioning: I installed the Anaconda Accelerate add-on, which includes the MKL packages that are supposed to parallelize NumPy computations (I haven't looked into this in detail yet).
First, reading the .plx files takes ages (~30 minutes), even when I don't load the spike waveforms. A single core runs at 100% until maybe the last 5 minutes, when the RAM starts to fill up. I understand from previous threads that the reading itself cannot be made more efficient. However, I would like to know: 1) couldn't the computationally intensive part be parallelized? It currently uses only one core, and using N cores should divide the reading time by roughly N. And 2) would reading be faster from NeuroExplorer (.nex) files?
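For reference, here is roughly what I am doing (a minimal sketch; the filename is a placeholder, and the keyword argument may be named differently in other Neo versions):

```python
import neo

# Open the Plexon file ('my_recording.plx' is just an example name)
r = neo.io.PlexonIO(filename='my_recording.plx')

# Read everything into one segment, skipping the spike waveforms,
# since loading them is even slower
seg = r.read_segment(load_spike_waveform=False)
```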
Second, the memory management looks terrible: the final space occupied in memory is roughly three times the original file's size (even without loading waveforms!). There also seems to be a memory leak: if I do seg = r.read_segment() and then seg = [], or simply read the file without assigning the result to a variable, the memory stays occupied "forever" (it is only released when I quit the Python kernel). Am I doing something wrong? How can I manage or improve this? I thought Python was supposed to be good at memory management, so I was really surprised by this behavior.
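For what it's worth, this is the kind of thing I tried to force the memory to be released (plain Python, nothing Neo-specific, using the reader r from above); it did not help:

```python
import gc

seg = r.read_segment()  # RAM usage jumps to ~3x the file size
del seg                 # drop the only reference to the data
gc.collect()            # force a full garbage-collection pass
# -> in my hands the memory is still not returned to the OS
#    until the Python kernel itself exits
```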
Finally, I tried to save the data to a more tractable format, hoping to be able to re-read it much more quickly (if the long PlexonIO read only has to be done once, the current speed would be acceptable). I tried HDF5, through the corresponding Neo IO. However, the resulting file was again three times larger than the original, writing took ages, and it didn't seem to work either (an empty segment was read back from the HDF5 file). Is it necessary to reorganize the data manually (build segments from scratch, etc.) before writing?
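Concretely, I did something like the following (a sketch from memory; the exact NeoHdf5IO method names may differ between Neo versions, and 'data.h5' is a placeholder):

```python
from neo.io import NeoHdf5IO

# Write the segment obtained from PlexonIO to HDF5
iom = NeoHdf5IO(filename='data.h5')
iom.write_segment(seg)
iom.close()

# Later, try to read it back
iom = NeoHdf5IO(filename='data.h5')
seg2 = iom.read_segment()  # this comes back empty for me
iom.close()
```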
Overall, I am simply looking for a way to save the data quickly and efficiently in a format I can read back into memory just as quickly. Does anyone have suggestions? (Maybe just a raw binary file, if that is possible?)
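For example, would something along these lines be sensible for the LFP part (a sketch using plain NumPy; the filename is an example, and I would still need to store the metadata and spike trains separately)?

```python
import numpy as np

# Stack the LFP channels into one (n_channels x n_samples) float32 array
lfp = np.vstack([np.asarray(sig, dtype=np.float32)
                 for sig in seg.analogsignals])

# Dump the raw samples to disk (shape and dtype must be stored separately)
lfp.tofile('lfp_48ch_float32.bin')

# Reading back is nearly instantaneous; memmap even avoids loading
# the whole array into RAM at once
lfp2 = np.memmap('lfp_48ch_float32.bin', dtype=np.float32,
                 mode='r').reshape(48, -1)
```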
Up until now, my LFP data has been digitized at 10 kHz, so one of the first things I will do is record the LFP at only 1 kHz from now on. However, this is only a partial fix: it may help a lot, but I don't think it solves the underlying problems described above.
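In the meantime, I suppose I can downsample the data already recorded at 10 kHz offline, e.g. with SciPy (a sketch; a factor of 10 takes 10 kHz down to 1 kHz, and decimate applies a low-pass filter before subsampling to avoid aliasing):

```python
import numpy as np
from scipy.signal import decimate

# One LFP channel at 10 kHz, as a plain float array
lfp_10k = np.asarray(seg.analogsignals[0], dtype=np.float64).ravel()

# Low-pass filter, then keep every 10th sample -> 1 kHz
lfp_1k = decimate(lfp_10k, 10)
```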
All the best,
Florian Gerard-Mercier