Dear colleagues,
I vote for Option 2.
In principle I follow the arguments of Dieter and Andy; my replies to
their emails below contain a few more ideas on how to address the open points.
Bests,
Markus
----------------------------------------------------------------
Dr.-Ing. Markus Kühbach
Max-Planck-Institut für Eisenforschung GmbH
Department "Microstructure Physics and Alloy Design"
Research group "Theory and Simulation"
Room 649
+49 211 6792 385
m.kue...@mpie.de
I would tend to agree with Dieter, we are lacking in general instructions in the document (beyond
basic descriptions).
>> Let me take the experimentalist's perspective and assume I have no idea about HDF5. When I start reading the draft, let me see where
I get to. I asked myself the following questions. I feel they need to be answered, so below I try to give at least a sketch of an answer to each:
What is the key problem the standard solves?
-> This is unclear to me, possible answer:
-> The content of most established APT file formats which hold the large datasets is not human-readable.
-> The standard solves this and makes the content of an APT measurement human-readable.
-> The standard is also a solution for storing the metadata, i.e. the data (descriptions) about the content of the file.
-> Examples of metadata are units, descriptions of the purpose of variables, or the dimensions of the dataset.
I know there are RHIT and HITS. I heard these file formats are not open source. Does the standard change this?
-> Yes and no. Yes, because the standard specifies a file format which is human-readable and accessible with open tools.
-> No, because RHIT and HITS files contain unprocessed raw data, and a protocol to transform
these into a human-readable, CAMECA-agnostic format is not possible because the IVAS source code is not open.
Can I import an APT-HDF5 file back into APSuite6/IVAS4, or older IVAS versions?
-> No, not at this point.
So why should I care then, why is an APT-HDF5 file relevant for me? Will APSuite6/IVAS4 only write these in the
future?
-> This is a tough comment, but I think it is a valid criticism of us...
-> There are a number of complementary software tools from the APT community which can already read APT-HDF5 files.
Support is expected to grow and is part of a vision for a more open and accessible approach to APT processing.
-> Open formats which keep the metadata are useful because they enable me as an experimentalist to organize my results
better. Such a format also enables me to go back to an analysis and helps me figure out what the heck I did back then.
Proprietary formats, or doing this exclusively with IVAS, work only if the version of IVAS has not changed.
(The last statement may be too inaccurate a claim, please check.)
Why HDF5, what the heck is this at all?
The Hierarchical Data Format version 5, or HDF5 for short, is an open source library and file format specification
which facilitates open data exchange and fast file operations for both small and very large datasets.
Is there an example? What should I do with the specification?
>> There is no hint here in the standard, so maybe this is too complicated...
>> It seems that some people just tried to agree on something. Reading the author list, this is likely good, but why is it important for me?
-> We need to have an example POS/EPOS file and a transcoder tool to go from POS/EPOS to APT-HDF5 and back.
-> We need to have a Python and a Matlab script with which people can access the content.
-> We also need an explanation of how to view an HDF5 file. HDFView and h5dump are not common knowledge among APTers.
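As a starting point for the transcoder and access scripts proposed above, here is a minimal Python sketch. It assumes the commonly used POS layout of four big-endian 32-bit floats per ion (x, y, z, mass-to-charge). The function names, the HDF5 dataset paths and the unit attributes are hypothetical and are not taken from the draft standard. The HDF5 step additionally assumes that the third-party h5py package is installed.

```python
import struct

# One POS record: x, y, z (nm) and mass-to-charge (Da), big-endian float32.
POS_RECORD = struct.Struct(">4f")


def read_pos(path):
    """Return four lists (x, y, z, mq) read from a POS file."""
    with open(path, "rb") as fh:
        raw = fh.read()
    if len(raw) % POS_RECORD.size != 0:
        raise ValueError("file size is not a multiple of 16 bytes")
    x, y, z, mq = [], [], [], []
    for rec in POS_RECORD.iter_unpack(raw):
        x.append(rec[0])
        y.append(rec[1])
        z.append(rec[2])
        mq.append(rec[3])
    return x, y, z, mq


def write_pos(path, x, y, z, mq):
    """Write four equally long sequences back into a POS file."""
    with open(path, "wb") as fh:
        for rec in zip(x, y, z, mq):
            fh.write(POS_RECORD.pack(*rec))


def pos_to_hdf5(pos_path, h5_path):
    """Copy POS content into an HDF5 file (dataset names are hypothetical)."""
    import h5py  # third-party; imported lazily so the POS part runs without it

    x, y, z, mq = read_pos(pos_path)
    with h5py.File(h5_path, "w") as h5:
        pos = h5.create_dataset("ions/position", data=list(zip(x, y, z)), dtype="f4")
        pos.attrs["unit"] = "nm"  # metadata travels with the data
        m = h5.create_dataset("ions/mass_to_charge", data=mq, dtype="f4")
        m.attrs["unit"] = "Da"
```

A file written this way could then be opened in HDFView, or its structure listed on the command line with `h5dump -H`, which would be a natural first step for the viewing instructions.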
Who are the authors of the standard?
>> Unclear
Where can I post my questions?
>> Unclear, so again why should I care
I don't think much should be said about sample preparation (for example) apart from that which
was necessary to identify the specimen in an associated body of work.
>> I tend to agree, because I fear that many will misunderstand the term "sample prep" and then maybe write down all the details of their polishing and milling steps.
As of now this is not even part of the file spec, and hence it is unclear what to say here. Andy's comment here likely gives an implicit answer as to what is expected when
populating this field.
We're not looking to document everything in the HDF5 file, rather we want to capture the experimentally determined atom probe data and related meta-data.
>> My fear is that power readers will think they need to populate all fields. Maybe we need to be stronger on this one:
mark what is essential in green or black, mark what is optional in yellow or grayed out, and kick out what is maybe not even necessary.
I guess you could cast the meta-data net quite wide, but I don't think we are advocating having a total description of the experiment in there either. Perhaps a guiding principle might be, "Whatever is useful to know for the atom probe analysis", so approximate or bulk composition, expected phases,
material state (quenched, annealed, irradiated), perhaps how the specimen was prepared - as this can influence what species are found.
>> THIS IS EXACTLY WHY WE NEED TO WORK ALONG SUCH THOUGHTS. Different people will have different opinions about what is essential to an analysis.
We just need to make sure that we make the documenting of metadata as convenient, i.e. as automatic, as possible and store more metadata with an experiment.
>> I would like to strongly advocate against trying to weave an overly complicated metadata graph into this file format spec.
>> Yes, metadata are embedded in a graph, and yes, the entire workflow of an APT research study can be seen as a very large graph with directed and undirected sections,
from which multiple sub-graphs can be taken and inspected. But we are not there at all yet, and this immediately goes quite far beyond
a pure file spec and over the heads and interests of many practitioners.
I agree we should think along such lines, but this is research, which, as I frequently get told, is not what the APT TC is supposed to be doing.
My idea behind defining exactly named fields, as we have right now in the standard, was to take out the smallest possible sub-graph and the metadata which are
specific to just taking a measurement with an atom probe tomograph and getting a results file.
I also think we need a little more development of the code. I know when I was trying to get Matlab
to write something the C++ validator was happy with, we found lots of bugs/issues.
>> I was not aware of these challenges, but this is typically somewhat expected during software development.
>> In the long run, it might be better to have the validator in Python and Matlab, as these are more accessible than C++
and performance is not much of an issue for the validator.
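To illustrate how small a Python validator could be, here is a minimal sketch that checks a nested dictionary, standing in for the groups and datasets of an opened file, against lists of essential and optional fields. All field names are hypothetical placeholders and are not taken from the draft standard.

```python
# Hypothetical field lists; the real names would come from the standard.
ESSENTIAL = {"ions/position", "ions/mass_to_charge", "specimen/name"}
OPTIONAL = {"specimen/description", "instrument/laser_spot_size"}


def flatten(tree, prefix=""):
    """Yield slash-separated paths of all leaves in a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + "/")
        else:
            yield path


def validate(tree):
    """Return (missing essential fields, fields unknown to the spec), both sorted."""
    present = set(flatten(tree))
    missing = sorted(ESSENTIAL - present)
    unknown = sorted(present - ESSENTIAL - OPTIONAL)
    return missing, unknown
```

Reporting unknown fields separately from missing ones would also give us a place to decide how strictly "additional fields are discouraged" should be enforced.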
Andy
Vote: Option 2
"Withhold the Draft standard for further revisions"
Comments:
(1) there should be an introduction that explains what this is for (what are the benefits, who should use this?). Very few people in the community might be able to guess a broader purpose (beyond the stated “utilise existing file storage technologies … [for]
high performance primitives").
>> Yes
(2) there should be a brief explanation of what HDF5 is, and why it is chosen for the proposed standard (benefits?) - I would not assume this is broadly established in the broader APT community.
>> Why HDF5?
-Open format
-Human readable
-Fast performance for both small and PB large files
-Metadata capabilities
-In-place compression capabilities
-Defined endianness
-Defined data layout and dimensions
-Possibility to store structured data in a hierarchy to help organizing thoughts and data
(3) how are we dealing with parameters that are not easy to come by, e.g., how do I know the "Virtual imaging distance” for my instrument? Laser spot size, beyond the nominal value given for a specific instrument type by hearsay? Is there an established procedure
how to measure that, e.g. FWHM of a laser scan?
>> I have no clue, I am not an expert on this
(4) there seem still to be missing some basic experimental parameters, e.g. the base vacuum of the instrument or perhaps vacuum evolution during the experiment. Specimen preparation details? Perhaps the intent is to have Specimen Prep included with “SampleDescription”,
at 5MB there is plenty of room, compared to the max 100 or 200 characters limit for other specimen descriptors, but this is not explicitly asked for. These may be minor points, but I think the standard should be reasonably complete and detail-balanced before
presenting it to the community for comment.
(5) In terms of extensibility of the format, should “Additional fields are strongly discouraged” really be the only thing we have to say? Are there any plans how to deal with potential revisions or extensions, beyond including a HDF5 version number?
>> I think there is no clear concept yet.