Possible improvements to the measurement group

Darren Dale

Jan 26, 2010, 4:37:53 PM
to ma...@googlegroups.com
This discussion started in a thread titled "quick comment about
multidimensional hdf5 arrays".

Matt Newville wrote:
> If I understand correctly, some names and layout will be mandated,
> perhaps as
>   Entry/Positioners/
> and
>   Entry/Detectors/
> (All names here being negotiable, decorating with "NX_" as desired).

I have been using:

/entry1/measurement/scalar_data  # signals and scanned positioners
/entry1/measurement/positioners  # starting positions of every motor on the beamline
/entry1/measurement/mca1

That is, the raw data is contained in a measurement group, and what
you are calling positioners and detectors are collected into
scalar_data. I mention it only as a reminder of how I have been doing
it for some time; the organization you suggest up to this point is
similar in its essentials.

> If that is already the case, I don't see that allowing
>   Entry/Detectors/Canberra Ge Elem #3/
>          with attributes including (type='MCA', npts=2048, dimension=1)
>
> is all that much better than
>   Entry/Detectors/mca003/
>      with attributes including
>       (label='Canberra MED Ge Element #3, APS DetectorPool GE2',
>        npts=2048, dimension=1)

I think the latter still needs a type attribute, but otherwise I am with you.

> Having to deal with whitespace, punctuation, (and unicode?) in dataset
> names seems like a bad idea to me.  The detector label is meant to be
> read by a human, but the dataset name will need to be traversed by the
> reading program.

I agree, it's a bad idea. In practice, we have a lot of spec data files
here at the lab that contain spaces in the names of positioners or
counters, and I would have a hard time arguing to someone that "you
can't use names with spaces" when hdf5 in fact allows it.
Discouraging, ok, but disallowing means enforcing...
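
One way to discourage rather than disallow such names would be to derive
a conservative dataset name from the free-form label and keep the
original label as an attribute. A minimal sketch of that idea follows;
`safe_name`, `describe`, and the replacement rule are made up for
illustration and are not part of spec, hdf5, or phynx.

```python
import re


def safe_name(label):
    """Derive a conservative dataset name from a free-form label (hypothetical rule)."""
    # Collapse every run of characters outside [A-Za-z0-9_] into one underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", label).strip("_")
    # Avoid empty names and names that start with a digit.
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def describe(label):
    """Pair a program-friendly name with the human-readable label."""
    return {"name": safe_name(label), "attrs": {"label": label}}


entry = describe("Canberra Ge Elem #3")
print(entry["name"])
print(entry["attrs"]["label"])
```

Nothing is lost this way: the reading program traverses the derived
name, and a person still sees the label exactly as it was typed.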

> The First approach would require looking for the
> required 'type' attribute (which must be one of some pre-defined list
> of valid types) to understand what the data was, and guarantees a
> user-defined label.  The Second approach means the dataset name itself
> tells the type (and the dataset name cannot be absent!), and has a
> user-defined label as an attribute to help understand the values in
> the dataset.

Ok.
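
To make the difference between the two approaches concrete, here is a
rough sketch of how a reading program might resolve a detector's type
under either one. The list of valid types and the function name
`detector_type` are assumptions for illustration; nothing in this thread
fixes them.

```python
# Hypothetical list of allowed detector types; the thread does not define one.
VALID_TYPES = ("MCA", "Counter", "Image")


def detector_type(name, attrs):
    """Return the detector type from a 'type' attribute or, failing that, the name."""
    if "type" in attrs:
        # First approach: free-form name, type carried by a required attribute.
        if attrs["type"] not in VALID_TYPES:
            raise ValueError(f"unknown detector type: {attrs['type']!r}")
        return attrs["type"]
    # Second approach: the name itself encodes the type, e.g. "mca003".
    prefix = name.rstrip("0123456789").lower()
    for valid in VALID_TYPES:
        if prefix == valid.lower():
            return valid
    raise ValueError(f"cannot infer detector type from name {name!r}")


print(detector_type("Canberra Ge Elem #3", {"type": "MCA", "npts": 2048}))
print(detector_type("mca003", {"label": "Canberra MED Ge Element #3", "npts": 2048}))
```

A free-form name with no type attribute raises an error here, which is
the case the required attribute is meant to prevent.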

>>> If I typically collect 3 ion chamber intensities and 40 ROIs in
>>> addition to 4 x 2048 MCAs at each point in the scan, how many detector
>>> datasets should I have?  I can see how it could be 1, 2, 44 or 47:
>>>    1:   1 array of (npts, 43+4*2048)
>>
>> Yuck.
>>
>>>    2:   1 array of (npts,43)  1 array of (npts, 4, 2048)
>>>   44:  43 arrays of (npts,)   1 array of (npts, 4, 2048)
>>>   47:  43 arrays of (npts,)   4 array of (npts, 2048)
>>
>> I think 44 or 47 makes sense. Armando suggested that if you have
>> independent calibrations for each of your 4 MCAs, it makes sense to
>> separate them. I agree.
>
> That's a good point, and I can certainly see how 47 datasets looks
> like the most reasonable solution here.
>
> Well, except that this assumes that calibration information is best
> held in attributes, which may or may not be the case. It may be best
> to store an (npts, 4, 2048) array for the data and a (4, 2048) array
> for the MCA Energies at each Channel.

In which case, maybe it makes sense for the detector to be a group,
and the spectra and the calibration would be datasets in that group.
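
For reference, the four candidate layouts from the example above, and
the group-per-detector variant, can be written down as plain shapes.
This is only a bookkeeping sketch in plain Python, not hdf5 calls; the
member names "counts" and "energy" are placeholders I made up.

```python
# Detector counts taken from the example quoted above.
N_ION_CHAMBERS, N_ROIS, N_MCAS, N_CHANNELS = 3, 40, 4, 2048
N_SCALARS = N_ION_CHAMBERS + N_ROIS  # 43 scalar signals per scan point


def candidate_layouts(npts):
    """Return {number of datasets: list of dataset shapes} for the four options."""
    return {
        1: [(npts, N_SCALARS + N_MCAS * N_CHANNELS)],
        2: [(npts, N_SCALARS), (npts, N_MCAS, N_CHANNELS)],
        44: [(npts,)] * N_SCALARS + [(npts, N_MCAS, N_CHANNELS)],
        47: [(npts,)] * N_SCALARS + [(npts, N_CHANNELS)] * N_MCAS,
    }


def mca_group(npts):
    """One detector as a group: spectra and calibration are sibling datasets.

    The member names "counts" and "energy" are made up for this sketch.
    """
    return {"counts": (npts, N_CHANNELS), "energy": (N_CHANNELS,)}


for n_datasets, shapes in candidate_layouts(npts=100).items():
    print(n_datasets, len(shapes), shapes[-1])
print(mca_group(npts=100))
```

With a group per MCA, the calibration travels with the spectra it
belongs to without becoming an attribute.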

> I think this just expresses the inherent tension of when something is
> an attribute and when it is data.  My bias is to use attributes
> sparingly, and use them for labels, flags, and descriptive integers
> (dimensions, etc) that describe or modify the data contained in the
> dataset.  If an attribute is required for all Detectors or Positions,
> I tend to think something may be wrong.

Ok.

> But perhaps, especially given the "intrinsic_dimensionality"
> confusion, we should postpone the Detectors discussion and limit the
> topic to how to organize the Positioners Group.
>
>>> OK, I got it now.  Thanks, and sorry for adding noise.  If detectors
>>> are stored with their "intrinsic dimension" at each point in the scan
>>> (seems to be the consensus), wouldn't the shape of the data array be
>>> self-describing?
>>
>> Only for the case where any scan is flattened into (npts,...) rather
>> than (nx,ny,...). If datasets are structured according to the latter
>> scheme, which appears to be the normal way for nexus, then you need
>> more information to know whether a 3-dimensional dataset, taken out of
>> context, is a 2D scan of spectra, a 3D scan of a counter, or a 1D scan
>> of images.
>
> I'm definitely in favor of flattening the positioners from (nx,
> ny,...) to (npts,...).  But, as above, what is less clear to me is how
> to store the Positioner data beyond that. One could have
>
>  Entry/Positioners/Positions         (npts, NPositioners)
>  Entry/Positioners/PositionLabels (NPositioners)
>  Entry/Positioners/PositionAddrs  (NPositioners)
>
> at each Point i in the scan, the values of all relevant positioners
> are recorded.  All their labels and addresses are also stored. As a
> bonus, the need for the Positioners Group seems weak, which could
> flatten the structure.

Or save it in an hdf5 table. (I'm not advocating for this.)
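
On the flattening question quoted above: the same 3-dimensional shape
reads differently depending on the detector's intrinsic dimension, and
that dimension is exactly the extra information needed when the scan
axes are not flattened. A small sketch, with a made-up shape and helper
names:

```python
from math import prod


def split_shape(shape, intrinsic_ndim):
    """Split a dataset shape into (scan axes, detector axes)."""
    cut = len(shape) - intrinsic_ndim
    return shape[:cut], shape[cut:]


def flatten_scan(shape, intrinsic_ndim):
    """Shape after collapsing all scan axes into a single npts axis."""
    scan_axes, detector_axes = split_shape(shape, intrinsic_ndim)
    return (prod(scan_axes),) + detector_axes


shape = (21, 31, 2048)  # arbitrary example shape
# intrinsic_ndim 0: 3D scan of a counter
# intrinsic_ndim 1: 2D scan of spectra
# intrinsic_ndim 2: 1D scan of images
for intrinsic_ndim in (0, 1, 2):
    print(intrinsic_ndim, split_shape(shape, intrinsic_ndim),
          flatten_scan(shape, intrinsic_ndim))
```

Once flattened, every dataset starts with the same npts axis, and
whatever axes remain belong to the detector.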

> I have the sense that some would prefer
>  Entry/Positioners/Position1      (npts,)  attributes: label, address
>  Entry/Positioners/Position2      (npts,)  attributes: label, address
>  ...
>  Entry/Positioners/PositionN      (npts,)  attributes: label, address
>
> At this point, I have a slight preference for the first variation, as
> it has predictable names for datasets, 2-D array of numerical data,
> and replaces attributes which are required by every dataset with
> "attribute arrays".

I would prefer that a positioner's data and attributes (like units)
were encapsulated in a single entity: a dataset. Plus, I think it
would be more difficult to work interactively with such monolithic
data structures. I guess it would be possible to extend phynx so that
it inspected an array of position labels and made those labels
available for tab completion and dictionary-like access, but I still
have a very strong preference for the second approach, with more
descriptive labels than Position1 (we already know it is a position,
it's in the Positioners group).
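
To illustrate the trade-off, here is a rough plain-Python sketch of both
layouts. `PositionTable` stands in for the kind of wrapper the
monolithic layout would need in order to offer dictionary-like access.
It is a hypothetical helper, not phynx code, and the positioner names
and values are invented.

```python
class PositionTable:
    """Dict-like view over the monolithic layout (hypothetical helper)."""

    def __init__(self, positions, labels):
        self._positions = positions  # rows are scan points, columns are positioners
        self._index = {label: i for i, label in enumerate(labels)}

    def keys(self):
        return list(self._index)

    def __getitem__(self, label):
        column = self._index[label]
        return [row[column] for row in self._positions]


# First variation: one (npts, NPositioners) array plus a parallel label array.
positions = [[0.0, 10.0], [0.5, 10.0], [1.0, 10.0]]
table = PositionTable(positions, ["theta", "sample_x"])

# Second variation: one dataset per positioner, each carrying its own attributes.
per_positioner = {
    "theta": {"data": [0.0, 0.5, 1.0], "attrs": {"units": "deg"}},
    "sample_x": {"data": [10.0, 10.0, 10.0], "attrs": {"units": "mm"}},
}

print(table.keys())
print(table["theta"] == per_positioner["theta"]["data"])
```

In the second variation the units sit next to the data they describe,
and no wrapper is needed to look a positioner up by name.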

> But we should probably think about how to move from long email thread
> to actual written documents.  Google Wave doesn't seem quite ready for
> us (or vice versa).  Perhaps a wiki or group-wide Google Doc?

Wave would be my first choice (by far), but that suggestion was not
well received. I don't have a preference between the other two; they
are both good suggestions. Someone (Carlos?) was uncomfortable with
requiring a Google account. Is that necessary for Google Docs? I have
a wiki at http://dale.chess.cornell.edu/chess-wiki; it is
write-protected to prevent spam.

Darren
