Re: A question for RAW to read H5 files


Richard Gillilan

Jun 2, 2021, 11:56:35 AM
to Shuo Sui, Richard Gillilan, bioxt...@googlegroups.com
RAW should be able to read arbitrary HDF5, but I’m not sure if anything special is needed in the metadata. Probably not. I’ve copied this message to the bioxtas raw list in case Jesse or anyone else has suggestions.
You may wish to subscribe to this list if you plan to use RAW in the future.

Richard


On Jun 2, 2021, at 11:50 AM, Shuo Sui <ss4...@cornell.edu> wrote:

Hi Richard,
 
This is Shuo from the Lois Pollack lab. We are preparing for an XFEL beamtime and hope to have real-time data analysis capability. The data that come out of the epix detector are stored in HDF5 files, and we hope to import them directly into RAW to visualize the scattering curves. Do you have suggestions about which parameters should be included in the constructed HDF5 files so that they are compatible with RAW? I'd appreciate your advice very much!
 
Best,
 
Shuo  

Jesse Hopkins

Jun 2, 2021, 1:26:35 PM
to Shuo Sui, Richard Gillilan, bioxt...@googlegroups.com
Hi Shuo,

While Richard is correct in that RAW can read hdf5 files generally, in order to actually load anything useful from them RAW needs to know where the data is stored in the file. By default, RAW can load any file type that can be read by fabio (http://www.silx.org/doc/fabio/latest/index.html), plus a few others that we have custom definitions for.

If your file type cannot be read by fabio you have a couple of options. 
1) Ask the folks who maintain fabio to add a reader for it (this would be my preferred approach, because it will benefit the whole community in the long run). Then if you build RAW from source with the version of fabio that reads your filetype, RAW will be able to read in the images.

2) Write a custom reader for your filetype, and add it to the RAW source code. I could help you if you go this route. You could then submit that change to RAW to have it incorporated into future releases. In the meantime, you would build your custom version of RAW from source to use your new reader until the next release. This is probably the best approach if you're in a hurry and have someone who's happy writing python code.


My recommendation is to first test if RAW can read your image as is (i.e. if fabio can load it). If not, then decide which approach you want to take and let me know. We can go from there.

Note that once you can read in the images, you'll need to make a custom configuration file for the experiment. If you want to read in additional values (like incident intensity for normalization purposes) you'll need to either have a header format that RAW can read, or define a new header format for RAW and add it to the code. You can learn more about making config files for RAW here:

All the best.

- Jesse

----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source



Thomas Grant

Jun 2, 2021, 5:05:49 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Jesse,

Just to clarify, we're not actually interested in reading the 2D images or performing any integration in RAW. We just have a large stack of radial profiles saved in an hdf5 format. We make this hdf5 ourselves, so we can format it any way we like (i.e. we can call the datasets in the hdf5 file whatever would work for RAW, be it /data/data or something else).

Is there something in RAW that allows us to select an hdf5 dataset to read from, or would setting it to /data/data work fine? How to read q values for instance?

Thanks,
Tom

Stephen Paul Meisburger

Jun 2, 2021, 5:45:46 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi all,

Sounds like you need to reverse engineer the RAW data series format. You could try saving some SEC profiles in h5 and opening them in Python. There are also h5 datasets included in the tutorials.

- Steve 


Thomas Grant

Jun 2, 2021, 5:57:39 PM
to bioxt...@googlegroups.com, Richard Gillilan, Shuo Sui
Good idea. We’ll look at those tutorial sec h5 files. 

Thanks,
Tom

Jesse Hopkins

Jun 2, 2021, 6:05:29 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi all,

Tom: Thanks for the clarification. Reading in a set of profiles from an hdf5 file is a very different task than reading in a set of images. So it's good to know exactly what you want to do.

I wouldn't recommend trying to reverse engineer the RAW series .hdf5 files (though if you want to, it's all in the save/load functions for the series, in the SASFileIO.py file). There's a lot of stuff in there you don't need, and it would be a pain to have to add it to each of your files.

I think there's a better approach here. RAW has an (undocumented) feature that lets you provide JSON-formatted definition files that describe a data structure within an HDF5 file for loading. These definition files can be added to any version of RAW using the Options->Add HDF5 Definition menu option, so in theory you wouldn't even have to use a from-source build of RAW; you can just make a definition and add it to whatever RAW you have installed.

At the moment, this feature is only lightly tested, in that I've made one definition file (for LiX .hdf5 formatted reduced data), and it works for that. However, I think with a little bit of staring at the code and the definitions file, it would be easy to define one that will work for your data.

You can find the definition files here:

The loading function that uses these definitions is here in the loadHdf5File function:

If that sounds like what you want, let me know, and I can refresh my memory on how these definition files work, and what the minimal set of definitions you would need are.

All the best.

- Jesse
----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source

Thomas Grant

Jun 2, 2021, 8:09:31 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Jesse,

Thanks, that sounds like the way to go. I'll look into getting the definitions file with the JSON format working for us.  I also was curious about your APS SEC-SAXS .h5 files you use at BioCAT.  I see the datasets are something like /profiles/XXXXXX or /subtracted_profiles/XXXXXXX. Can we follow that format also?  That format seems more in line with what we'll have, just a series of profiles to drop into RAW.  Is there a definitions file somewhere for that?

Thanks,
Tom

Jesse Hopkins

Jun 2, 2021, 8:26:37 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Tom,

That’s just the generic RAW series format, it’s not particular to BioCAT (though we do use it). You can take a look at the load_series function in the SASFileIO.py to see how that’s loaded (and the save_series function to see how it’s saved). If you want to use the series format you’ll need to define most of the data groups and attributes you see in that load/save function.

I think the JSON approach is going to be easier than creating the full RAW series file, because the series file is going to expect data groups like baseline_subtracted_profiles (even if it’s empty) and things like that, but you’re welcome to use either. There’s no reason you can’t have your profiles in a similar format to the basic RAW series file but still use the JSON definition, using /profiles/XXXXX for your profiles for example.
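A minimal sketch of laying profiles out under /profiles/XXXXX with h5py. The function names, the zero-padded six-digit keys, and the choice to store each profile as one (n_q, 3) dataset of q, I, err columns are assumptions made for illustration. This is not the RAW series format, and RAW would still need a matching JSON definition to load a file like this.

```python
import h5py
import numpy as np


def write_profile_groups(h5_path, profiles):
    """Store each [q, I, err] entry as the dataset /profiles/<index>."""
    with h5py.File(h5_path, "w") as h5:
        group = h5.create_group("profiles")
        for idx, (q, intensity, err) in enumerate(profiles):
            # One (n_q, 3) array per profile: columns are q, I, err.
            data = np.column_stack((q, intensity, err))
            group.create_dataset(f"{idx:06d}", data=data)


def read_profile_groups(h5_path):
    """Read the profiles back in key order as a list of [q, I, err]."""
    profiles = []
    with h5py.File(h5_path, "r") as h5:
        for key in sorted(h5["profiles"]):
            data = h5["profiles"][key][:]
            profiles.append([data[:, 0], data[:, 1], data[:, 2]])
    return profiles
```

Zero-padding the keys keeps alphabetical order and acquisition order the same, which matters once there are more than ten profiles.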

Does that make sense?

I’m happy to help with creating the definition file you want.

All the best.

- Jesse


----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source

Jesse Hopkins

Jun 2, 2021, 8:32:57 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Tom,

Another thought. I’ve been maybe thinking too inside the box here. If you’d like, there might be a simple script-based solution. You could use h5py to load your custom hdf5 files with the profiles, then extract the data and use the RAW API to save it as either standard .dat files or a single standard RAW series file, whichever you prefer. Those could then be loaded into the RAW GUI as normal. That might be the easiest and/or most efficient way to do this, though it has the disadvantage of requiring an extra step to load the data files, which might get annoying depending on how many hdf5 files you’re generating.
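A minimal sketch of the script-based route, assuming the custom file stores a shared q axis in a dataset named `q` and 2D stacks named `intensity` and `error` with one row per profile. Those dataset names, the function name, and the output file naming are hypothetical. It writes plain three-column text with `numpy.savetxt` instead of going through the RAW API, so it runs without RAW installed; whether RAW is happy with the header line as written is worth checking on one file first.

```python
import os

import h5py
import numpy as np


def hdf5_profiles_to_dat(h5_path, out_dir, prefix="profile"):
    """Write one three-column (q, I, err) text file per profile row."""
    os.makedirs(out_dir, exist_ok=True)
    written = []

    with h5py.File(h5_path, "r") as h5:
        # Hypothetical layout: 'q' is 1D, 'intensity' and 'error' are
        # (n_profiles, n_q) stacks that share that q axis.
        q = h5["q"][:]
        intensity = h5["intensity"]
        error = h5["error"]

        for idx in range(intensity.shape[0]):
            # Read one row at a time so a large stack is never fully in memory.
            columns = np.column_stack((q, intensity[idx], error[idx]))
            out_path = os.path.join(out_dir, f"{prefix}_{idx:06d}.dat")
            np.savetxt(out_path, columns, fmt="%.8e", header="q I err")
            written.append(out_path)

    return written
```

With tens of thousands of profiles this produces tens of thousands of files, so it is probably best suited to a first look at a small subset.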

All the best.

- Jesse

----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source

Thomas Grant

Jun 2, 2021, 8:54:59 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Jesse,

Yes, that's true. I could always just write a bajillion .dat files just to get going, then later work on making a single file to contain it all for RAW.  Hmm...

Right now I'm going to see how difficult it really is to just copy the basic outline of the definitions file from LiX. I'm trying to figure out what are the minimum datasets required.  Can you please tell me, what is the difference between "batch" and "series"?

Thanks,
Tom

Jesse Hopkins

Jun 2, 2021, 9:05:12 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Tom,

You can also use the RAW API to write a RAW series file (our standard hdf5 file) from a bunch of profiles, so it all gets saved as one file.

I guess I’m not understanding how far upstream you’re controlling the data. If you are writing a python script that generates this hdf5 file in the first place, you can skip the middle step and just write directly to a format RAW can read. Here’s an example: assume that you have numpy arrays of q, I, and uncertainty in a list for each profile, as [q, I, err].

Using the RAW API you would do something like this:

my_profiles = []

for prof in profiles:
    my_profiles.append[raw.make_profile(q, I, err) for q, I, err in prof]

my_series = raw.profiles_to_series(my_profiles)

raw.save_series(my_series, 'my_series.hdf5', './my_datadir')


And now you have an hdf5 file in the standard RAW series format that you can load into RAW without any other work.

If, on the other hand, you’re not generating this initial hdf5 file yourself, but just providing specifications to some format, then the other options I’ve laid out should work just fine.

To answer your question, if a dataset is ‘batch’, then the profiles in it load in the Profiles tab in RAW (i.e. as individual profiles). If it’s a series, then they all load as a single series in the Series tab in RAW.

All the best.

- Jesse

Jesse Hopkins

Jun 2, 2021, 9:11:37 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
In the previous email it should have been:
my_profiles.append(raw.make_profile(q, I, err) for q, I, err in prof)

Using () instead of [] (I switched from a list comprehension to a loop and didn’t excise all my brackets).

- Jesse

----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source

Jesse Hopkins

Jun 2, 2021, 9:14:57 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
One more time, because I’m not getting code right (and then I’m done writing code for the night). It really should be:

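# Module path as given in the RAW API documentation.
import bioxtasraw.RAWAPI as raw

my_profiles = []

# profiles is a list with one [q, I, err] entry (numpy arrays) per profile.
for prof in profiles:
    q = prof[0]
    I = prof[1]
    err = prof[2]
    my_profiles.append(raw.make_profile(q, I, err))

# Combine the profiles into one series and save it in the RAW series format.
my_series = raw.profiles_to_series(my_profiles)

raw.save_series(my_series, 'my_series.hdf5', './my_datadir')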


Though I suspect you got the idea the first time around.
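The loop above expects `profiles` to be a list with one [q, I, err] entry per profile. For trying it out before real data exist, here is a small sketch that builds such a list from synthetic curves. The function name, the Guinier-shaped intensity, and every parameter value are illustrative assumptions, not anything RAW requires.

```python
import numpy as np


def make_fake_profiles(n_profiles=5, n_q=100, rg=20.0, i0=1.0, seed=0):
    """Return a list of [q, I, err] entries with Guinier-shaped intensities."""
    rng = np.random.default_rng(seed)
    q = np.linspace(0.005, 0.3, n_q)
    # Guinier-shaped curve: I(q) = I0 * exp(-(q * Rg)^2 / 3).
    ideal = i0 * np.exp(-(q * rg) ** 2 / 3.0)

    profiles = []
    for _ in range(n_profiles):
        # Constant uncertainty at 1% of I0, used as the noise level too.
        err = 0.01 * i0 * np.ones_like(q)
        intensity = ideal + rng.normal(0.0, err)
        profiles.append([q.copy(), intensity, err])
    return profiles
```

Setting `profiles = make_fake_profiles()` gives the loop something to iterate over without touching any detector output.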

All the best.

- Jesse

----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source

Thomas Grant

Jun 2, 2021, 9:21:19 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Jesse,

That's great, that's just what I needed. Indeed I am writing the hdf5 files myself. We read just a giant stream of all data you could imagine, then I write what I want to an hdf5 file (such as radial profiles, q values, any metadata, etc.). So I will try doing what you mention and running the RAW API to write the series.

Thanks again,
Tom

Jesse Hopkins

Jun 2, 2021, 10:13:19 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Tom,

Sounds good. You can find API documentation here:

I just noticed that the main API reference (not the getting started or examples, but the function calls) seems to not be building for this version of the documentation. So I’ll look into that and try to solve it tomorrow, so you can take a look at that as well as the rest of the documentation for the API.

Let me know how it goes.

All the best.

- Jesse


----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source

Jesse Hopkins

Jun 3, 2021, 11:07:12 AM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Hi Tom,

I've fixed the build issue so now all of the API documentation is available here:

All the best.

- Jesse

----
Jesse Hopkins, PhD
Beamline Scientist
BioCAT, Sector 18
Advanced Photon Source

Thomas Grant

Jun 3, 2021, 1:55:18 PM
to bioxt...@googlegroups.com, Shuo Sui, Richard Gillilan
Thanks Jesse. So far so good getting the RAW series saved with the API using some fake data.

I'll update you when I have some real data next week.

Best,
Tom

Jesse Hopkins

Jun 22, 2021, 3:03:19 PM
to bioxt...@googlegroups.com
Hi Tom,

I'm curious if this ended up working for you.

All the best.

- Jesse

----
Jesse Hopkins, PhD
Deputy Director
BioCAT, Sector 18
Advanced Photon Source

Thomas Grant

Jun 22, 2021, 3:11:04 PM
to bioxt...@googlegroups.com
Hi Jesse,

Sorry, I meant to follow up. Yes, I was able to get it to work. The main issue turned out to be that our files have tens of thousands of profiles in them (typically 30-60k), which proved to be too much for my computer to handle in RAW. RAW was able to open the file, but I wasn't able to actually do much with it without it hanging/crashing. But that's okay. I think doing it in RAW is not the best approach anyway. We will need some careful outlier filtering and scaling, which I think just needs some dedicated code.
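For reference, one generic way to cut a 30-60k stack down to something a GUI can handle is to drop obvious outliers and then average consecutive profiles in blocks. The sketch below is an illustration of that idea, not the filtering actually used here: the function names, the summed-intensity criterion, and the threshold of 5 median absolute deviations are all assumptions.

```python
import numpy as np


def reject_outliers(intensity, n_mads=5.0):
    """Return a boolean mask of profiles whose summed intensity is typical."""
    totals = intensity.sum(axis=1)
    median = np.median(totals)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = np.median(np.abs(totals - median))
    return np.abs(totals - median) <= n_mads * mad


def block_average(intensity, error, block_size):
    """Average consecutive profiles in blocks, propagating uncertainties."""
    n_blocks = intensity.shape[0] // block_size
    n_q = intensity.shape[1]
    # Trailing profiles that do not fill a whole block are dropped.
    used = n_blocks * block_size
    i_blocks = intensity[:used].reshape(n_blocks, block_size, n_q)
    e_blocks = error[:used].reshape(n_blocks, block_size, n_q)
    mean_i = i_blocks.mean(axis=1)
    # Error of a mean of independent values: sqrt(sum(err^2)) / N.
    mean_e = np.sqrt((e_blocks ** 2).sum(axis=1)) / block_size
    return mean_i, mean_e
```

Averaging 60,000 profiles in blocks of 100 would leave 600, which is within the range of a few thousand profiles per dataset.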

Thanks for your help,
Tom

Jesse Hopkins

Jun 22, 2021, 3:48:47 PM
to bioxt...@googlegroups.com
Tom,

Got it, thanks for the update. Yes, I haven't worked with more than a few thousand profiles in a single dataset in RAW, so I guess I'm not terribly surprised that it has some issues. Something to work on eventually.

All the best.

- Jesse

----
Jesse Hopkins, PhD
Deputy Director
BioCAT, Sector 18
Advanced Photon Source
