Hi Greg,
First, a note about the old HDF5IO vs the current NixIO. The difference you're seeing in the object paths ("/block/groups/..." instead of "/block/segments/...") is due to the structure defined by the NIX library. NIX uses HDF5 for storage, but it defines an object hierarchy of its own. The NixIO in Neo is responsible for converting objects and data that follow the Neo structure to the corresponding objects defined by NIX. Groups are the NIX equivalent of Segments, so the object you identified as "neo.segment.<long string of numbers and characters>" is in fact a NIX Group object. The change from HDF5 to NIX as a backend is the reason for all the differences you're noting, including the performance drop. The NIX format is more rigid but also more descriptive than plain HDF5, which means the NixIO does more than just save the Neo structure to file; it also has to convert objects and relations to the format defined by NIX.
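To make the mapping concrete, here's a minimal sketch: write a small Block through the NixIO, then walk the raw HDF5 layout with h5py. Treat the resulting paths and the "ow" mode as assumptions on my part; the exact details depend on your Neo and NIX versions.

    import numpy as np
    import quantities as pq
    import h5py
    from neo.core import Block, Segment, AnalogSignal
    from neo.io import NixIO

    # A tiny Neo hierarchy: one Block, one Segment, one signal
    blk = Block(name="example-block")
    seg = Segment(name="example-segment")
    seg.analogsignals.append(
        AnalogSignal(np.random.rand(100), units="mV", sampling_rate=1 * pq.kHz)
    )
    blk.segments.append(seg)

    # Write through the NixIO ("ow" should overwrite any existing file)
    io = NixIO("example.nix", mode="ow")
    io.write_block(blk)
    io.close()

    # Walk the raw HDF5 hierarchy: the Segment appears under a
    # ".../groups/..." path because the layout is defined by NIX, not Neo
    with h5py.File("example.nix", "r") as f:
        f.visit(print)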
The change in the naming scheme for individual objects was made as a general fix to avoid the name-conflict-resolution checks you correctly mentioned. You can find the discussion which led to this decision in the following issue on GitHub, if you're curious:
https://github.com/NeuralEnsemble/python-neo/issues/311. You don't need to read the discussion as it's quite long, but I thought I'd link it for posterity.
The short version is that we needed a general way to handle name conflicts, but we also wanted to be able to determine whether an object had already been written to a file, to know whether a write should overwrite a previously saved object or store a new one alongside it. For example, if a block is created, written, modified, and then written again to the same file, the file shouldn't contain two blocks.
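As a rough sketch of the intended round-trip behaviour (method names from the current NixIO, written from memory, so double-check them):

    from neo.core import Block, Segment
    from neo.io import NixIO

    blk = Block(name="session-block")
    io = NixIO("session.nix", mode="rw")
    io.write_block(blk)                        # first write
    blk.segments.append(Segment(name="late"))  # modify the same Block object
    io.write_block(blk)                        # should overwrite, not duplicate
    assert len(io.read_all_blocks()) == 1      # still exactly one block on file
    io.close()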
You bring up a valid concern with our naming approach, however. The current state of the IO assumes (perhaps not explicitly) that users won't be manipulating the underlying NIX or HDF5 files written by the Neo-NixIO. This is implicit in the naming: the object names chosen by the user are stored in the metadata of each object, and the visible object name is replaced with what you noted, the Neo type followed by a long string of numbers and characters (a UUID).

There's perhaps a conflict of use cases here. Users who read, write, and generally work primarily through Neo, and simply pick NIX as their storage backend, would probably prefer the reliability of uniquely identifiable objects when reading, writing, and overwriting, and may never be exposed to the UUID names unless they use NIX or HDF5 tools to inspect their data. On the other hand, users who use the NixIO to get their data into NIX or HDF5 format (i.e., using Neo as a conversion layer), and who may have carefully chosen the names of their objects (meaningful signal and spiketrain names), as I suspect you are doing, don't want the "conversion layer" renaming their objects in the process.
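You can see both names side by side with the nixio library directly. I'm writing the "neo_name" metadata key from memory, so treat it as an assumption and check it against one of your files:

    import nixio

    nf = nixio.File.open("example.nix", nixio.FileMode.ReadOnly)
    for group in nf.blocks[0].groups:
        print(group.name)                  # "neo.segment.<uuid>"
        print(group.metadata["neo_name"])  # user-chosen Segment name (assumed key)
    nf.close()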
The easy fix to this use-case conflict would be to have function arguments that specify the behaviour. This was mentioned as an option in the issue discussion linked above, not so much for disabling the naming method, but for defining whether objects should be overwritten or not. I'm not against adding arguments that let users specify behaviour, especially in cases like this, so I'll definitely look into doing it in a nice, clean way that doesn't disrupt common workflows.
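Something along these lines, which I should stress is purely hypothetical at this point (neither the argument name nor the behaviour exists yet):

    # Hypothetical: keep the user-chosen names instead of generating UUIDs
    io.write_block(blk, use_obj_names=True)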
Back on the main topic of the write time: I'm surprised the change had little or no effect. I didn't expect the difference to be huge, but I did expect it to be noticeable. Given the number of segments (1300), though, I'm not surprised it's taking very long. Three hours is, of course, unreasonable; I just mean that I can see how such a large number of objects makes the write time grow so much.
Thanks for the extra info and linking to the data.
I've been profiling different parts of the IO as well as NIX itself and I have a good idea of which parts are the worst offenders, but I'm still a while away from fixing everything.
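In case you want to poke at it on your end in the meantime, the standard-library profiler is enough to see where the time goes; nothing NixIO-specific here:

    import cProfile
    import pstats

    # Profile a single write and print the 20 most expensive calls
    cProfile.run("io.write_block(blk)", "nixio_write.prof")
    pstats.Stats("nixio_write.prof").sort_stats("cumulative").print_stats(20)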