Writing a RawIO


Achilleas Koutsou

Apr 19, 2018, 7:12:13 AM
to Neural Ensemble
Hello all,

We've been looking into writing a Neo RawIO (reader) for NIX files and reading through the documentation and examples. We've been looking at the examplerawio as well as the other raw IOs in the repository for reference.

I have a few questions about how the header is built (the most important part of the IO, of course). I understand everything revolves around channels and the header is constructed in a way that makes it straightforward to reference signals and other data through the recording channels. So, a couple of specific questions:

  •  In the example, it's assumed that there are 16 channels. Are these 16 channel indexes (in Neo terms, 16x ChannelIndex objects) or is it 16 channels in a single ChannelIndex object?
  • The information collected for each channel (units, sampling_rate, etc) should presumably come from the Signals that the channel references/contains. This implies that all Signals referenced from a single ChannelIndex share the same values. In practical terms, I see how this makes sense (all signals coming from the same channel should be measured in the same units, for instance), but as far as I know, this is not enforced in the Neo model in any way, correct? While building the header dictionary for a RawIO, should any checks be made that this consistency exists, or should it be assumed that the signals in the file are sane and that the information for any one signal represents the information for all signals coming from the same ChannelIndex? I see the requirements are addressed in the docstring of the baserawio, but I'm not entirely sure I fully got all the details of the requirements.
  • Following from the above question: if the requirements are not met (and assuming they're not enforced in any way), will reading data through the RawIO produce errors or will it work but behave unpredictably?
  • Is there a script (or test) that creates a Neo structure with fake data that complies with the requirements (or limitations) of the RawIO? If not, I could write one once I get a better handle on what these limitations are.

For the last point, I'm thinking something along the lines of:
  1. Create a Neo tree that complies with RawIO requirements
  2. Write it to a file using any of the writers that have a RawIO counterpart
  3. Use the read functions and check expected output.
I understand there are already tests for all the RawIOs and a common raw IO test, but something like this would make it clear how a traditional Neo structure translates to what comes out of a raw reader. Ideally, any IO (that has a raw reader) could be replaced in the second step and the test should work, making it a nice test to work against.

Thanks.

Achilleas Koutsou

Samuel Garcia

Apr 24, 2018, 4:10:28 AM
to neurale...@googlegroups.com
Hello Achilleas,
Thank you for jumping into rawio.




  •  In the example, it's assumed that there are 16 channels. Are these 16 channel indexes (in Neo terms, 16x ChannelIndex objects) or is it 16 channels in a single ChannelIndex object?
In rawio, signal channels have a "group_id" field.
All channels with the same group_id must have the same sampling_rate/t_start/length_in_segment/units.

In BaseFromRaw, the generic class that bridges neo.rawio to the legacy neo.io, all channels from a group are placed in the same ChannelIndex when signal_group_mode='group-by-same-units'.
You can also split the channels into N ChannelIndex objects, one per channel, with signal_group_mode='split-all' (as was the case for many IOs in old versions of Neo).
So the user has two possibilities.
Each IO that inherits from BaseFromRaw has a _prefered_signal_group_mode class attribute that sets the **default** behavior for that IO.
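
As a rough illustration of the two grouping modes described above, here is a minimal sketch in plain Python. It is not Neo's implementation: the dict-based channel records and the helper name channel_index_layout are made up for the example, and only sampling_rate and units are checked.

```python
from collections import defaultdict

# Hypothetical, simplified channel records (a real rawio header holds more fields).
channels = [
    {"name": "ch0", "group_id": 0, "sampling_rate": 10000.0, "units": "uV"},
    {"name": "ch1", "group_id": 0, "sampling_rate": 10000.0, "units": "uV"},
    {"name": "aux0", "group_id": 1, "sampling_rate": 1000.0, "units": "V"},
]


def channel_index_layout(channels, signal_group_mode="group-by-same-units"):
    """Return a list of channel-name lists, one per would-be ChannelIndex."""
    if signal_group_mode == "split-all":
        # One ChannelIndex per channel.
        return [[ch["name"]] for ch in channels]
    if signal_group_mode != "group-by-same-units":
        raise ValueError(f"unknown signal_group_mode: {signal_group_mode!r}")

    groups = defaultdict(list)
    signature = {}
    for ch in channels:
        gid = ch["group_id"]
        props = (ch["sampling_rate"], ch["units"])
        # Every channel in a group must match the first channel seen in it.
        if signature.setdefault(gid, props) != props:
            raise ValueError(f"group_id {gid} mixes {signature[gid]} and {props}")
        groups[gid].append(ch["name"])
    return [groups[gid] for gid in sorted(groups)]


grouped = channel_index_layout(channels)
split = channel_index_layout(channels, "split-all")
```

With these example records, the grouped layout has two entries (one per group_id) and the split layout has three (one per channel).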



  • The information collected for each channel (units, sampling_rate, etc) should presumably come from the Signals that the channel references/contains. This implies that all Signals referenced from a single ChannelIndex share the same values. In practical terms, I see how this makes sense (all signals coming from the same channel should be measured in the same units, for instance), but as far as I know, this is not enforced in the Neo model in any way, correct? While building the header dictionary for a RawIO, should any checks be made that this consistency exists, or should it be assumed that the signals in the file are sane and that the information for any one signal represents the information for all signals coming from the same ChannelIndex? I see the requirements are addressed in the docstring of the baserawio, but I'm not entirely sure I fully got all the details of the requirements.
Yes, rawio enforces consistency for a channel across segments. This is not the case for the Neo model, which is more liberal.
If the underlying file format allows a channel to be inconsistent across segments, you have to check for this and create two channels.
But in that case, consider the scenario: "neo model A with irregularity" > "to hdf5 file" > "from file with rawio" > "neo model B".
Here modelA != modelB because the channels will be different.
This is the limitation of rawio: a true Neo model with irregularities in channel attributes cannot be rendered with rawio.



  • Following from the above question: if the requirements are not met (and assuming they're not enforced in any way), will reading data through the RawIO produce errors or will it work but behave unpredictably?
Most file formats don't have this problem.
Only flexible HDF5-based files should have this very special case.
I think that for NixIO, to handle the special case of a channel whose sampling_rate changes across segments, you should have a flag:
irregularity_mode='raise_error'/'split_channel', with strong documentation.
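
To make the suggested flag concrete, here is a minimal sketch of how the two modes could behave for a single channel. The irregularity_mode flag is only a proposal in this thread; the function name resolve_channel, its arguments, and the "_0"/"_1" naming of split channels are assumptions for illustration. How a split channel should be represented in segments where it has no data is left open.

```python
def resolve_channel(name, rates_per_segment, irregularity_mode="raise_error"):
    """Check one channel's sampling rate across segments.

    rates_per_segment: the channel's sampling rate in each segment, in order.
    Returns a list of (channel_name, sampling_rate, segment_indexes) tuples.
    """
    rates = list(rates_per_segment)
    if not rates:
        raise ValueError("at least one segment is required")

    unique = sorted(set(rates))
    if len(unique) == 1:
        # Consistent channel: nothing to do.
        return [(name, unique[0], list(range(len(rates))))]

    if irregularity_mode == "raise_error":
        raise ValueError(f"channel {name!r} changes sampling rate: {unique}")
    if irregularity_mode == "split_channel":
        # One derived channel per distinct rate, covering only matching segments.
        return [
            (f"{name}_{i}", rate, [s for s, r in enumerate(rates) if r == rate])
            for i, rate in enumerate(unique)
        ]
    raise ValueError(f"unknown irregularity_mode: {irregularity_mode!r}")


consistent = resolve_channel("ch0", [1000.0, 1000.0])
split = resolve_channel("ch0", [1000.0, 2000.0, 1000.0], "split_channel")
```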


  • Is there a script (or test) that creates a Neo structure with fake data that complies with the requirements (or limitations) of the RawIO? If not, I could write one once I get a better handle on what these limitations are.

In neo/test/iotest/common_io_test.py there are two methods test_write_then_read and test_read_then_write.
In neo/test/iotest/common_io_test.py there is write_generic()
In neo/test/iotest/generate_datasets there is generate_from_supported_objects()

By playing with that function, you could add some kwargs for generating an asymmetric Neo tree object to test this.


For the last point, I'm thinking something along the lines of:
  1. Create a Neo tree that complies with RawIO requirements
  2. Write it to a file using any of the writers that have a RawIO counterpart
  3. Use the read functions and check expected output.
I understand there are already tests for all the RawIOs and a common raw IO test, but something like this would make it clear how a traditional Neo structure translates to what comes out of a raw reader. Ideally, any IO (that has a raw reader) could be replaced in the second step and the test should work, making it a nice test to work against.

Coding a class in neo.io that writes a generic Neo tree is quite challenging, even before rawio.
And coding a class that both reads with rawio and writes is a super challenge.



Don't spend too much time writing code to handle special cases that will never be used.
100% of my analysis code (spike sorting/spectral analysis/averaging) assumes that sampling_rate and units are consistent for a channel.
I guess that other toolboxes do the same, and all end users too.


Best,


Samuel







Achilleas Koutsou

Apr 24, 2018, 8:35:03 AM
to Neural Ensemble
Hi Samuel.

Thanks for the clarifications and advice. It's very useful. We'll keep working on it and get back to you if we have any further questions.

Achilleas Koutsou

May 22, 2018, 11:39:20 AM
to Neural Ensemble
Hello again,

Our student who has been working on the RawIO has it mostly implemented. Certain things are a bit unclear still however, so I'd like to ask a few followup questions. Some of these are a bit basic still, but I wanted to be absolutely sure there are no misunderstandings.

- In the BaseRawIO docstring, it is written that "Only one channel set for AnalogSignal (aka ChannelIndex) stable along Segment". If so, why is it that in many functions, for example the _get_analogsignal_chunk, both seg_index and channel_indexes have to be given as parameters? How do Segment and ChannelIndex objects relate to one another and how is their number and relationship limited exactly in the cases where the RawIO is used?

- How exactly do the indexes of a ChannelIndex map to signals? Say we have 3 AnalogSignals, in the same segment: A1, A2, and A3:
1. If A1 contains two signals, can A2 and A3 contain any number of signals or are they limited to the same shape as A1?
2. If each have 2 signals, and they are all referenced by a single ChannelIndex object, what does an index = [1] refer to? Is it the first signal of each AnalogSignal (A1[:, 0], A2[:, 0], A3[:, 0]) or just the first one? If it's just the first one, does index = [3] then refer to A2[:, 0]? Is it something else completely?
3. Does the length of all signals need to be the same for the RawIO to make sense?

Thanks again.

Achilleas

Samuel Garcia

May 22, 2018, 3:28:12 PM
to neurale...@googlegroups.com
Hi,
The raw API could be improved; feel free to also add that in the future PR.

At the very beginning of rawio, only one global sample_rate and one global length per segment were possible.
Then the concept of 'group_id' was introduced for signals that share '_common_sig_characteristics', to enable the case with several sampling_rate/length values for signals.

At the neo.io level, all signal_channel entries sharing the same group_id will be grouped in the same ChannelIndex.




- In the BaseRawIO docstring, it is written that "Only one channel set for AnalogSignal (aka ChannelIndex) stable along Segment". If so, why is it that in many functions, for example the _get_analogsignal_chunk, both seg_index and channel_indexes have to be given as parameters? How do Segment and ChannelIndex objects relate to one another and how is their number and relationship limited exactly in the cases where the RawIO is used?

In RawIO, signal_channel (aka ChannelIndex in neo.core) is supposed to be stable across segments.
In short, a group of signal_channel entries (with the same group_id) is stable across segments.
len(ChannelIndex) x len(segments) = total number of AnalogSignal.
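
A small worked example of that count, assuming one AnalogSignal per channel group per segment (the numbers and variable names are made up for illustration):

```python
n_segments = 3
# Hypothetical group_id of each of five signal channels.
group_ids = [0, 0, 1, 1, 1]

# Each distinct group_id maps to one ChannelIndex.
channel_groups = sorted(set(group_ids))

# One AnalogSignal per (segment, group) pair.
signals = [(seg, gid) for seg in range(n_segments) for gid in channel_groups]
n_signals = len(signals)
```

Here two groups across three segments give six AnalogSignal objects, each identified by a (seg_index, group) pair, which is why the read functions take both a segment index and channel indexes.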



In legacy neo.io, everything is possible, even strange things.


- How exactly do the indexes of a ChannelIndex map to signals? Say we have 3 AnalogSignals, in the same segment: A1, A2, and A3:
1. If A1 contains two signals, can A2 and A3 contain any number of signals or are they limited to the same shape as A1?
A1, A2 and A3 come from different group_id values, so they can have different length, nb_channel (so shape), sampling_rate, t_start, ...



2. If each have 2 signals, and they are all referenced by a single ChannelIndex object, what does an index = [1] refer to? Is it the first signal of each AnalogSignal (A1[:, 0], A2[:, 0], A3[:, 0]) or just the first one? If it's just the first one, does index = [3] then refer to A2[:, 0]? Is it something else completely?
That does not make sense.
They won't share the same ChannelIndex, even if each has 2 channels.
You will have 3 ChannelIndex objects.


3. Does the length of all signals need to be the same for the RawIO to make sense?

Not necessarily, even if that is the general case.
If you have different lengths inside a segment, you must declare different group_id values.
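
One way to derive group_id values when building the header is to give every distinct combination of signal characteristics its own id. The sketch below assumes a single segment and uses made-up dict records and a hypothetical helper name, assign_group_ids; it is not taken from Neo.

```python
def assign_group_ids(channels):
    """Give channels with identical characteristics the same group_id."""
    keys = ("sampling_rate", "length", "t_start", "units")
    known = {}
    group_ids = []
    for ch in channels:
        signature = tuple(ch[k] for k in keys)
        # A new signature gets the next free id; a known one reuses its id.
        group_ids.append(known.setdefault(signature, len(known)))
    return group_ids


# Hypothetical channels: the third one is shorter, so it needs its own group.
channels = [
    {"sampling_rate": 1000.0, "length": 1000, "t_start": 0.0, "units": "mV"},
    {"sampling_rate": 1000.0, "length": 1000, "t_start": 0.0, "units": "mV"},
    {"sampling_rate": 1000.0, "length": 500, "t_start": 0.0, "units": "mV"},
]
group_ids = assign_group_ids(channels)
```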