Reading large Plexon data, writing to useful format (Neo)


Florian Gerard-Mercier

Nov 24, 2015, 12:37:19 AM
to Neural Ensemble
Hi,


I am new to the group, so first, I would like to thank you for all your hard work.


I have started using Neo to work with my Plexon data, and I have several problems.
The data is typically 48 channels with MUA and LFP recorded, and the resulting files are around 2-3 GB in size. My hardware is a MacBook Pro running OS X 10.7.5 with Anaconda Python 2.7 installed. I set up the same environment on a shared 24-core computer that I can access over ssh or shared screens (it is now operational, but it doesn't solve the problems below). I installed Neo in order to use Elephant, and I am trying to get it working before delving into the analysis.
Another thing worth mentioning: I installed the Anaconda Accelerate add-on, which includes the MKL packages that are supposed to allow NumPy computations to be parallelized (I haven't looked into this in detail yet).

First, reading the .plx files takes literally ages (~30 min), even when I don't load the spike waveforms. One processor core runs at 100% until maybe the last 5 min, when the RAM starts to fill up. I understand from previous threads that the reading cannot be made more efficient. However, I would like to know 1) whether the computationally intensive part could be parallelized (it currently uses only one core; if it could use many, the reading time would be divided accordingly), and 2) whether reading would be faster with NeuroExplorer (.nex) files?
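For reference, here is roughly how I am reading the files. Treat it as a sketch: the keyword for skipping waveforms is an assumption on my part and may be named differently in other Neo versions.

    from neo.io import PlexonIO

    r = PlexonIO(filename="session1.plx")
    # load_spike_waveform=False is what I mean by "not loading the
    # waveforms" (the argument name is an assumption; check your Neo version)
    seg = r.read_segment(load_spike_waveform=False)

    print(len(seg.analogsignals), "analog signals")
    print(len(seg.spiketrains), "spike trains")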

Second, the memory management looks terrible: the final space occupied in memory is roughly three times the original file's size (even without loading waveforms!). There also seems to be a memory leak, since if I just do seg = r.read_segment() and then seg = [], or just read the file without assigning the result to a variable, the memory stays occupied "forever" (it is only freed when I quit that Python kernel). Is there something I am doing wrong? How can I manage or improve this? I thought Python was supposed to be good at memory management, so I was really surprised by this behaviour.
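Concretely, what I do looks roughly like the following; the explicit del / gc.collect() at the end is only an illustration of what I would expect to release the memory.

    import gc
    from neo.io import PlexonIO

    r = PlexonIO(filename="session1.plx")
    seg = r.read_segment()

    # ... work with seg ...

    # Rebinding (seg = []) only drops one name; the Neo objects reference
    # each other (segment <-> signals), and the reader may hold references
    # of its own.  Dropping everything and forcing a collection is what I
    # would expect to give the memory back to the interpreter:
    del seg
    del r
    gc.collect()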

Finally, I tried to save the data I had read into a more tractable format, hoping to be able to reread it much more quickly (if I only have to do the long PlexonIO read once, it could be acceptable as it is). I tried HDF5, through the corresponding Neo IO. However, again the final file was three times larger than the original, and it took ages to write. It also didn't seem to work (an empty segment was read back from the HDF5 file). Is it necessary to reorganize the data manually (build segments from scratch, etc.) before writing?
Overall, I am simply looking for a way to quickly and efficiently save the data into a format I can quickly read back into memory. Does anyone have suggestions? (Maybe just a raw binary file, if that is possible?)
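By "raw binary file" I mean something as simple as the following NumPy sketch (array names and shapes are only illustrative); np.load with mmap_mode would even let me read slices without pulling the whole recording into RAM.

    import numpy as np

    # lfp: a (channels x samples) array pulled out of the Neo segment
    # (illustrative placeholder)
    lfp = np.zeros((48, 600000), dtype=np.float32)

    np.save("session1_lfp.npy", lfp)   # fast, uncompressed binary dump

    # later: memory-map the file so only the slices actually accessed
    # are read from disk
    lfp_again = np.load("session1_lfp.npy", mmap_mode="r")
    chunk = lfp_again[:, 0:10000]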


Up until now my LFP data was digitized at 10 kHz, so one of the first things I will do is collect the LFP at only 1 kHz from now on. However, this is only a partial fix: it may help a lot, but I don't think it will solve the underlying problems described above.
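For the files already recorded at 10 kHz, downsampling offline should give roughly the same saving; here is a sketch with SciPy, assuming the signal has been pulled out as a plain (channels x samples) NumPy array.

    import numpy as np
    from scipy.signal import decimate

    lfp_10khz = np.zeros((48, 600000))   # illustrative placeholder

    # decimate applies an anti-aliasing filter before keeping every 10th
    # sample, so it is safer than plain slicing (lfp_10khz[:, ::10])
    lfp_1khz = decimate(lfp_10khz, 10, axis=1)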


All the best,

Florian Gerard-Mercier
lab

Samuel Garcia

Nov 24, 2015, 3:41:10 AM
to neurale...@googlegroups.com
Hi,
PlexonIO is written in pure Python, and the way the file format is organised forces a brute-force loop to collect and concatenate all the chunks.
The loop runs in Python, and the chunks are very small and of variable size. This is the worst possible case, so this IO, in its current state, will never be efficient.
One way forward would be to recode it (at least the central loop) in Cython or something equivalent.
This file format is known to be inefficient to read; even Plexon has changed their file format. Unfortunately the new Plexon format is not in neo.io.
Multi-CPU and accelerated libraries won't help here either: the job is still to read small, variable-size chunks from disk, scale them, and concatenate them in memory.

So:
1. High memory use and one CPU at 100% is to be expected.
2. The HDF5 file being bigger than the original is probably because the signals are stored as float32 or float64 (see the sketch below).
3. Reading the HDF5 file back should nevertheless be much faster, even if the file is bigger.
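To illustrate point 2: the .plx continuous data are (as far as I know) stored as 16-bit ADC values, while Neo keeps signals in floating point after scaling to physical units, so storing them as float64 roughly quadruples the size per sample. A small sketch with made-up sizes:

    import numpy as np
    import h5py

    n_channels, n_samples = 48, 600000   # one minute at 10 kHz, illustrative

    raw = np.zeros((n_channels, n_samples), dtype=np.int16)   # as on disk in .plx
    scaled = raw.astype(np.float64)                           # as held by Neo

    print(raw.nbytes / 1e6, "MB as int16")       # ~57.6 MB
    print(scaled.nbytes / 1e6, "MB as float64")  # ~230.4 MB

    # When writing to HDF5 you can choose a smaller on-disk dtype
    # (and optional compression) to keep the file size down:
    with h5py.File("lfp.h5", "w") as f:
        f.create_dataset("lfp", data=scaled, dtype="float32",
                         compression="gzip", compression_opts=4)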



Samuel

Florian Gerard-Mercier

Nov 24, 2015, 7:30:50 AM
to Neural Ensemble
Thanks for your reply.

I understand the problem with .plx files.
But what about .nex files? Do NeuroExplorer-generated files suffer from the same inefficient format?

Also, I don't understand how the memory leak (the fact that the data remains in memory even after the variable it was assigned to has been cleared) could be explained by the inefficiency of the .plx format. That is something I would like to understand.

OK, so are you saying I should use HDF5 in any case? Is there no "better" way to store data for quick reading and writing?
On this topic, should I use NeoHdf5IO, or should I do my own thing with the h5py package? Or does the Neo IO already rely on h5py?
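To make the question concrete, "doing my own thing with h5py" would be something like the minimal sketch below; the dataset name and attributes are my own invention, not a Neo or HDF5 convention.

    import h5py
    import numpy as np

    def save_lfp(path, lfp, sampling_rate, t_start=0.0):
        """Store one (channels x samples) LFP array in an HDF5 file."""
        with h5py.File(path, "w") as f:
            dset = f.create_dataset("lfp", data=lfp, dtype="float32",
                                    compression="gzip", compression_opts=4)
            dset.attrs["sampling_rate_hz"] = float(sampling_rate)
            dset.attrs["t_start_s"] = float(t_start)

    def load_lfp(path, channels=None):
        """Read the LFP back; slicing avoids loading unused channels."""
        with h5py.File(path, "r") as f:
            dset = f["lfp"]
            data = dset[:] if channels is None else dset[channels, :]
            return data, dset.attrs["sampling_rate_hz"], dset.attrs["t_start_s"]

    # usage with a made-up array:
    # lfp = np.random.randn(48, 600000).astype("float32")
    # save_lfp("session1.h5", lfp, sampling_rate=1000.0)
    # data, fs, t0 = load_lfp("session1.h5", channels=[0, 1, 2])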

Thanks,

Florian

Andrew Davison

Nov 24, 2015, 7:48:25 AM
to neurale...@googlegroups.com
Hi Florian,

Concerning HDF5, there are two new "standard" formats for electrophysiology data, NIX and NWB, both of which are based on HDF5. Neither has Neo support yet, but we're working on it.

The NeoHdf5IO format will probably be deprecated in Neo 0.4, in favour of NIX and NWB. My proposal (which hasn't yet been discussed) is to make NeoHdf5IO read-only, so that people can read in data created with Neo 0.3, but then have to save it in one of the new formats.

Rather than doing your own thing with h5py, I suggest trying the Python bindings for NIX or NWB:

https://github.com/G-Node/nixpy
https://github.com/AllenInstitute/nwb-api/tree/master/ainwb
https://github.com/NeurodataWithoutBorders/api-python

If you are interested in helping out, you could try writing a Neo IO class for NWB - see https://github.com/NeuralEnsemble/python-neo/issues/221 - let me know if you're interested.

Cheers,

Andrew

Florian Gerard-Mercier

Nov 25, 2015, 12:20:16 AM
to Neural Ensemble
Hi Andrew,

Thanks for the update. It is good to know that NeoHdf5IO will probably be deprecated; that way I can avoid using it.

I have checked out these two new formats. It seems like they are still very much in flux, and I wouldn't want to be stranded... Also, I am not -at all- interested in shareability (for better or worse, I work alone...), and my main interest is fast reading and writing. Do these two formats also offer the best read/write performance?

I am too much of a newbie to think about writing an IO class. But if I find a satisfying solution, I will of course share it.

Cheers,

Florian