Hi Darren, All,
Thanks for the replies.
Darren wrote:
> Concerning dead time, I think I have an elegant solution in phynx
I agree that the phynx solution is elegant. It does put some burden
on API implementers in other languages, and also means that the
corrected data is not actually stored.
I think there are enough subtleties with dead time corrections that
placing the burden of doing the correction at the point of origin is
preferred. For example, if there are two dead-times for a detector
(detectors using XIA's DXP electronics have this feature), the full
correction may include dead times (in nanoseconds) that have been
determined separately, and then the correction is done iteratively...
at least that's one way to do it. It's hard to imagine that this sort
of correction (or ALL the variations on how to store deadtime) would
be done by every library that can read an HDF5 file.
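As an illustration of why this is better done at the point of origin: here is a minimal sketch of one common iterative scheme, assuming a paralyzable (extending) dead-time model, measured = true * exp(-true * tau). The model choice and the numbers are mine, not anything agreed on this list:

```python
import math

def correct_deadtime(measured_rate, tau, tol=1e-9, max_iter=100):
    """Recover the true count rate from a measured rate, assuming a
    paralyzable dead-time model: measured = true * exp(-true * tau).
    Solved by fixed-point iteration (one way to do it, as noted above).
    measured_rate in counts/s, tau in seconds."""
    true_rate = measured_rate
    for _ in range(max_iter):
        new_rate = measured_rate * math.exp(true_rate * tau)
        if abs(new_rate - true_rate) < tol * true_rate:
            return new_rate
        true_rate = new_rate
    raise RuntimeError("dead-time correction did not converge")

# example: 200 ns dead time, 100 kcps measured rate
true_rate = correct_deadtime(100_000.0, 200e-9)
```

Even this simple version needs a model choice, a convergence criterion, and a failure mode -- multiply that by two dead times per detector and several storage conventions, and it's clear why every reader library shouldn't reimplement it.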
I think that for an interchange format, it's best to not rely on
anything besides trivial calculations at the user end.
==Issues with IDL
Darren wrote:
> You mentioned on the website that IDL sometimes cannot read a well
> formed file, and that it might have been due to the file being open
> multiple times. Do you mean that you had the file open in multiple
> programs at the same time? If so, I think that is unfortunately a
> situation that needs to be avoided. I don't think the hdf5 library
> uses locks to prevent multiple processes from modifying/accessing the
> file at the same time, so this can result in corrupted data. However,
> I may be mistaken, and would love to have someone correct me. If I
> understood Andrew Collette (author of h5py) correctly, this situation
> is improved if hdf5 is compiled with support for mpi. Preliminary
> tests seem to show that hdf5 can be built as a shared library
> (required for python bindings) with mpi support, but I don't think the
> hdf5 group tests this configuration.
I need to (or better yet, somebody else needs to) look into this more
carefully. I had a hard time getting reliable failures. I was
definitely doing all of these things, which may lead to troubles:
- overwriting test files
- using the HDF5 Viewer, and not always closing the Viewer.
- reading files on both Linux and Windows.
- using files sitting on networked drives.
I never had trouble with Python (h5py only, I didn't test pytables),
but had several problems with IDL. I was using IDL 7.0, on both
Windows and Linux (mostly Linux, simply because restarting IDL is faster).
My suspicion is that IDL and the HDF5 Viewer are not always good at
actually closing file handles. I believe I never had trouble when IDL
opened a "brand new file" that had never been touched by another
application. So, perhaps I was beating on HDF5 files more than they
expect -- that worries me a little.
Armando wrote:
> Concerning IDL and HDF5, I thought the support level was good.
> Nevertheless, I think IDL also gets some "good promotion" in Scientific
> fields from the fact people like Stefan Vogts (MAPS), Chris Ryan
> (GeoPIXE) and Manolo Sanchez (XOP) use it. I am unable to judge it, but
> I guess IF the HDF5 support in IDL is very limited, IDL itself could do
> some effort to improve it if the request comes from "good customers"...
I won't claim to be a good IDL customer, so that's not my fight.
FWIW, the release notes for IDL 7.1 (May, 2009) say it supports HDF5
1.6.7. According to the HDF5 web pages, 1.6.7 was released in Jan,
2008, 1.6.8 in Nov, 2008, and 1.6.9 in May, 2009. 1.8.0 was released
in Feb, 2008. So IDL is 6 to 12 months behind HDF5 releases, and more
reluctant to move up minor versions; both of which seem reasonable.
It does mean that assuming an application can read HDF5 1.8 files may
not be a good idea for a long time (many folks are still using IDL 6).
Personally, I'm more concerned that files written with HDF5 1.8 can
*crash* applications linked with the HDF5 1.6 library. That seems
like mostly an HDF5 problem to me.
==Data Layout
Armando also wrote:
> Personally I prefer to have an attribute based layout than a name based
> layout but nothing prevents you from having both.
I agree with this. I added "Version" and "Beamline" attributes to the
top-level data group. In principle, attributes such as these ought to
be able to explain what the data layout is well enough for a library
to read data from several different sources.
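To make that concrete, here is a sketch of the attribute-based tagging I mean, using h5py (the attribute names are the ones I used; the values and file name here are made up for illustration):

```python
import h5py

# Write a top-level data group carrying "Version" and "Beamline"
# attributes; a reader can dispatch on these rather than on fixed
# group names.  Values below are placeholders.
with h5py.File("example_xrf.h5", "w") as f:
    grp = f.create_group("data")
    grp.attrs["Version"] = "1.0"
    grp.attrs["Beamline"] = "Example-Beamline"
    grp.create_dataset("data", data=[[1, 2], [3, 4]])

# Read side: inspect the attributes to decide how to interpret the layout.
with h5py.File("example_xrf.h5", "r") as f:
    version = f["data"].attrs["Version"]
```

A library reading files from several sources would look at these attributes first, then pick the appropriate reader, instead of assuming one fixed hierarchy.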
> For simplicity, at this mailing list we had decided to put the data in
> /data/data till we get a better layout. If you take a look at the Pigment
> dataset I provided, and you look at the attributes, you will see that
> /data is an NXentry group, /data/data is a link and the relevant
> information is inside an NXdata group. So, the file is following "NeXus
> rules" despite having an agreed name based layout. When I said in my
> PANDATA talk, that NeXus could save us a lot of discussions, I meant that
> some things have been already thought and solved and we could use
> them. The main problem I see with NeXus is that it is an Instrument based
> approach to store the data that is far from adequate from the analysis
> point of view. Darren and I have come up with a proposal that may help
> to simplify the situation. In the extreme case (just a Measurement
> group), it only borrows NXentry from NeXus.
OK, but why 'data/data'?? What is gained by following NeXus conventions?
Thanks,
--Matt Newville