NixIO is unbearably slow to create new neo h5 files


gite...@berkeley.edu

Sep 1, 2017, 4:21:00 PM
to Neural Ensemble
Hello,

I'm a PhD candidate at UC Berkeley and have been a happy neo user for over a year!

I recently updated to neo 0.5.1 and have encountered a problem. I'm attempting to create a neo file with the NixIO.write_block() method. I create my neo block, which has n segments (n = number of trials); each segment has k spiketrains (k = 20-30) and 70 AnalogSignal arrays. The block object is created successfully, but when I pass it to the NixIO.write_block() method it can take up to three hours to save (for something that took 6 minutes with neo 0.4.1).

Has anyone experienced this problem before? If not, could someone provide me with tips on how to create a neo h5 file in a reasonable amount of time please? I really don't want to stop using neo because of this but I don't know what's wrong.

Thanks!
Greg Telian

Achilleas Koutsou

Sep 3, 2017, 7:15:31 AM
to Neural Ensemble
Hello Greg,

I wrote the NIX IO for Neo and I am aware of the performance issues. I've been working on optimising certain parts for some time now. Some of these changes have been merged (after 0.5.1) and some are still being worked on.

If you're curious (and for posterity), most issues with the performance of the NIX IO are related to certain operations which don't scale well with the number of objects (SpikeTrains, AnalogSignals, etc.). One issue, for example, is related to conflict resolution between objects that share the same name (or have no name at all). This situation isn't a problem for Neo, but it is for NIX and HDF5, so during write the IO needs to resolve these conflicts, and the original implementation wasn't very efficient. This particular issue has been fixed in the master branch of Neo.

I'd be curious to know whether using the master branch has a significant impact on your 3-hour write time, though I wouldn't recommend using the master branch for anything other than testing and profiling. Of course, I understand you may not have time to run tests for us, so I'll have a look at what the performance looks like on my end with a Neo object structure that fits your description (i.e., 30 spike trains and 70 analog signals per segment). What's a typical value for n (number of segments) in your file?

One more question: Could you tell me the version of nixio you're using? Some of the issues also depend on the underlying NIX package (the python implementation of which is called nixio, not to be confused with the NixIO in Neo which handles reading and writing), so optimisations are being worked on there as well.

Thanks for the report,
Achilleas

gite...@berkeley.edu

Sep 5, 2017, 6:50:33 PM
to Neural Ensemble
Hello Achilleas,

Thank you so much for your explanation and offering your time to help!

I pulled the master branch from GitHub and created a neo file. Unfortunately, it does not appear to save any faster than before. I also noticed that the structure of the neo file is different from what I expected (I don't think this is unique to the master branch). For example, my files created with neo 0.4.1 have the standard block/segments/analogsignals, spiketrains, etc. structure, which I really like and is the reason I love neo. But the new files seem to lose this structure. The new structure is something like this: block/groups/neo.segment.<long string of numbers and characters>. I believe the string of characters is the unique name nixio gives my segments. I don't understand why this happens, because I make sure to provide a unique name for each segment I create. I've attached an example neo file created with the old and the new format. It isn't clear to me how I can create a neo file with the newer version and maintain the same great neo core file structure.

FYI, I am using nixio v1.4.2, and my typical experiments have 1300 segments.

Thank you for your help!
Greg

Achilleas Koutsou

Sep 6, 2017, 7:16:19 AM
to Neural Ensemble
Hi Greg,

First, a note about the old HDF5IO vs the current NixIO. The difference you're seeing in the object paths ("/block/groups/..." instead of "/block/segments/...") is due to the structure defined by the NIX library. NIX uses HDF5 for storage, but it defines an object hierarchy of its own. The NixIO in Neo is responsible for converting objects and data that follow the Neo structure into the corresponding objects defined by NIX. Groups are the equivalent of Segments, so the object you identified as "neo.segment.<long string of numbers and characters>" is in fact a NIX Group object. The change from HDF5 to NIX as a backend is the reason for all the differences you're noting, including the performance drop. The NIX format is more rigid but also more descriptive than plain HDF5, which means the NixIO does more than just save the Neo structure to file; it also has to convert objects and relations to the format defined by NIX.

The change in naming scheme for individual objects was a general fix to avoid the checks involved in name conflict resolution, as you correctly mentioned. You can find the discussion that led to this decision in the following issue on GitHub, if you're curious: https://github.com/NeuralEnsemble/python-neo/issues/311. You don't need to read the discussion, as it's quite long, but I thought I'd link it for posterity.
The short version is that we needed a general way to handle name conflicts, but also wanted to be able to determine whether an object had already been written to a file, to know whether an object should overwrite a previously saved one or be stored alongside it. For example, if a block is created, written, modified, and then written again to the same file, the file shouldn't contain two blocks.

You bring up a valid concern with our naming approach, however. The current state of the IO assumes (perhaps not explicitly) that users won't be manipulating the underlying NIX or HDF5 files written by the Neo NixIO. This is implicit in the naming: the object names chosen by the user are stored in the metadata of each object, while the visible object name is replaced with what you noted, the Neo type followed by a long string of numbers (a UUID). There's a bit of a conflict of use cases here. Users who read, write, and generally work primarily in Neo, and simply pick NIX as their storage backend, would probably prefer the reliability of uniquely identifiable objects when reading, writing, and overwriting, and may never be exposed to the UUID names unless they use NIX or HDF5 tools to inspect their data. On the other hand, users who use the NixIO as a way to get their data into NIX or HDF5 format (i.e., using Neo as a conversion layer), and who may have carefully chosen the names of their objects (meaningful signal and spiketrain names), as I suspect you are doing, don't want the "conversion layer" renaming their objects in the process.

The easy fix to this would be to have function arguments to specify behaviour. This was mentioned as an option in the issue discussion linked above, not so much for disabling the naming method, but for defining whether objects should be overwritten or not. I'm not against adding arguments that allow users to specify behaviour, especially in cases like this, so I'll definitely look into doing this in a nice, clean way that doesn't disrupt common workflows.

Back on the main topic of the write time: I'm surprised the change had little or no effect. I didn't expect the difference to be huge, but I did expect it to be noticeable. Given the number of segments (1300), I'm not surprised it's taking very long, though. Three hours is of course unreasonable; I just mean that I can see how such a large number of objects can make the write time grow so much.

Thanks for the extra info and linking to the data.
I've been profiling different parts of the IO as well as NIX itself and I have a good idea of which parts are the worst offenders, but I'm still a while away from fixing everything.

gite...@berkeley.edu

Sep 6, 2017, 1:16:44 PM
to Neural Ensemble
Achilleas,

Wow, thank you for such a thorough explanation! I really appreciate it, and it helps me understand the behavior I've observed.

Given that I am creating my own files from raw data, would you recommend I use nixio exclusively, or change the way I create my neo files? If the latter, what should I change to increase write performance? Perhaps create multiple blocks instead of many segments?

Again, thank you, you've been incredibly helpful,
Greg T.

Achilleas Koutsou

Jul 19, 2018, 7:05:10 AM
to Neural Ensemble
Hey again.

It's been almost a year but I have some good news.

First, a small feature is being added: I've finally implemented an option that lets users force the NIX objects to inherit the names of the corresponding Neo objects. The relevant pull request is still open (as of writing): https://github.com/NeuralEnsemble/python-neo/pull/551. The write_block() and write_all_blocks() functions now accept use_obj_names=True to enable this. Name conflicts are caught before any writing happens and raise exceptions accordingly.

The more important news, however, is the progress on the write optimisations for NIXPy. It's still in development and I need to test some edge cases, but in its current state it is much faster at both writing and reading objects. I don't mean to jump the gun, but if you want to try it out, you can get it from the development branch using pip:

You'll also need a compatible dev version of Neo to test it, since the IO requires some changes. You can also get this from a dev branch using pip:
Note that this Neo version also includes the aforementioned use_obj_names feature.

If you're still using Neo with the NixIO and want to test these out, I'd be happy to hear whether you hit any issues and whether the performance improvements make your life easier.
