Globals file not properly overwritten


Victor Youri Helson

Dec 16, 2021, 5:02:04 AM
to the labscript suite
Hello everyone,

We've been using labscript for several years now and have started to experience some serious issues with runmanager, which have worsened over time.
Basically, the software would start freezing for a few seconds every time a global was updated, and it took several seconds for runmanager to create the shot files and send them over to BLACS.
I've traced the issue back to the hdf5 file used by runmanager to store the globals (we went for a single-file approach, which holds every single global variable, split into groups). The file was actually growing over time (incremented each time a global's value is changed in runmanager), to the point that when I checked yesterday its size was about 500 MB. I've temporarily replaced it with one of our recent experimental shot files and the freezing issues are gone!
However, the new file still grows in size whenever a global is changed. This is very easy to diagnose: we just have to click a boolean variable's checkbox multiple times and watch the file size grow live, in increments of ~10 kB. This was tested on our experimental setup and on a second one, which runs a more recent version of labscript.
We first thought the issue was actually a feature we were not aware of, i.e. that the globals file was meant to keep a "memory" of previous globals values, and that modifying one global would simply append a new "line" to the file with the new set of globals. However, when opening the hdf5 file (by hand or with e.g. python), we can only access the most recent state of runmanager. It seems like the globals effectively get replaced in the hdf5 structure of the file, but somehow the old values do not get deleted from the binary itself.
Moreover, the issue does not seem to come from updating the globals values themselves but rather from updating the expansions in the file. Indeed, the expansion subgroups in the hdf5 file seem to hold a memory of variables previously deleted from runmanager. This was again tested on two setups, by creating "dummy" variables and deleting them right away: they would still appear in the expansions of the hdf5 file, albeit not in the globals.
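For reference, here's a minimal h5py sketch of the kind of inspection I mean (the file path is a placeholder; it just walks the file and prints which attributes each group holds, so the expansion entries can be compared against the globals):

```python
import h5py

GLOBALS_FILE = 'globals.h5'  # placeholder path to the runmanager globals file

def print_attrs(name, obj):
    # Print each group's/dataset's path together with its attribute names,
    # so entries in the expansion subgroups can be compared with the globals.
    if len(obj.attrs):
        print(name, '->', sorted(obj.attrs.keys()))

with h5py.File(GLOBALS_FILE, 'r') as f:
    f.visititems(print_attrs)
```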

I've not pushed the analysis / deciphering of the source code much further because I first wanted to know if we were doing something obviously wrong or if we were missing some feature of the software (or if this has been patched in recent versions).

For reference, we are using an older version of the software dating from early 2017, which we never really took the time to update; the other version that was tested dates from early 2019, I believe.

Best regards

Victor Helson

Zak V

Dec 16, 2021, 12:34:22 PM
to the labscript suite
Hi Victor,

Unfortunately both of the undesirable behaviors you see, namely the bloating file size and slow global updates, are aspects of the hdf5 format, not issues with runmanager itself.

The increasing file size is due to the fact that hdf5 files don't reuse the free space created when data is deleted (see https://support.hdfgroup.org/HDF5/doc/H5.user/Performance.html). This means that each time a global's value is changed, a new region of space is allocated in the hdf5 file to store the new value, without freeing the region used to store the previous value. As you saw, this causes the labscript globals hdf5 file to bloat over time. I've noticed this is particularly bad when doing optimizations with M-LOOP, as that involves editing many global values very frequently. When doing those regularly I often see my globals files grow to a few GB.
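In case it's useful, here's a rough standalone sketch (the file name and iteration count are arbitrary) that mimics what runmanager does - open the file, overwrite one attribute, close the file - and prints the on-disk size as it goes, so you can watch the bloat happen outside of labscript:

```python
import os
import h5py

FILENAME = 'bloat_test.h5'  # throwaway test file

# Start from a fresh file containing a single group.
with h5py.File(FILENAME, 'w') as f:
    f.create_group('globals')

# Repeatedly overwrite one string attribute, opening and closing the file each
# time, and print the file size periodically to watch how it evolves.
for i in range(1000):
    with h5py.File(FILENAME, 'r+') as f:
        f['globals'].attrs['some_global'] = 'value_%d' % i
    if i % 100 == 0:
        print(i, os.path.getsize(FILENAME), 'bytes')
```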

If you'd like to shrink the globals file back to a more reasonable size, the best thing to do is create a new globals file and copy over the values from the old one. There are a few ways to do this. I usually do it with the h5repack utility (make a backup before deleting or overwriting the old/bloated file!). You should also be able to do this from within runmanager itself, though I haven't taken that approach. Just create a new globals file and copy over all of the globals from the old one by right-clicking on the globals file or group in the globals tab and selecting "copy...". If you don't want to manually go through and activate the same globals groups in the new file, you can close runmanager, then replace the bloated file with the new one (i.e. delete the old/bloated one and rename the new file to the old file's name), then reopen runmanager.

I've heard others suggest that the slow global update times seemed to be associated with the file bloating, but I didn't see that in my tests. I included some plots in runmanager PR #96 where I plotted the time taken to update a global's value, and the size of the globals file, while repeatedly updating a global's value. I found that the update time was pretty much independent of the size of the globals file. However, there were occasional outliers where it would randomly take a long time to update a global's value. In particular, I saw that happen more when using a fairly old/low-performing computer to host the network drive with the hdf5 globals file on it. Some updates were still slower than others when using a newer/higher-performing computer, but not to the same extent as with the old machine. It could be that this was specific to my machines, though; it sounds like your global updates are reliably slow?

In some more recent investigations I noticed that when the globals took a long time to update, this was the line that slowed everything down. That line calls `h5py.File.close()` - a method from the `h5py` module. At that point I didn't investigate much further because the issue seemed to be outside of labscript. I'm not sure why closing a file would randomly, instead of reliably, be slow. It could be network or antivirus issues but I'm really not sure. Let us know if you investigate and find anything!
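If you do end up digging into it, a simple starting point might be something like the sketch below (run it against a scratch copy of the globals file - the path and attribute name are placeholders), which does runmanager-style open/modify/close updates and times just the close() call to look for those outliers:

```python
import time
import h5py

FILENAME = 'globals_copy.h5'  # work on a copy, not the live globals file

close_times = []
for i in range(200):
    f = h5py.File(FILENAME, 'r+')
    f.attrs['_timing_test'] = bool(i % 2)  # toggle a throwaway attribute
    start = time.perf_counter()
    f.close()  # the call that occasionally seems to be slow
    close_times.append(time.perf_counter() - start)

close_times.sort()
print('median close time: %.4f s' % close_times[len(close_times) // 2])
print('worst close time:  %.4f s' % close_times[-1])
```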

Cheers,
Zak

Philip Starkey

Dec 20, 2021, 8:16:07 AM
to labscri...@googlegroups.com
Hi Victor,

Thanks for raising this issue on the mailing list! This is an issue we were aware of. Unfortunately it's somewhat due to the way the HDF5 format works, in that space freed inside the file is not necessarily reused. You've possibly uncovered a bug regarding expansion entries not being deleted when globals are, but that likely pales in comparison to the file size growth from just editing a value (or toggling a boolean).

I did have some hope that recent versions of h5py might address this. There are some new keyword arguments that can be used at file creation (so they have to be set when you create a new file; useless for existing files): fs_strategy and fs_persist. However, I've just tried all combinations of those and it doesn't seem to change how the file size grows (tested with h5py v3.2.1). I'm wondering if it's to do with the way we store the globals in the h5 file (as attributes rather than in datasets, in order to bypass problems with fixed-length string storage in datasets - the h5py issue that discussed adding this functionality was focused on datasets). I'm unsure if that's expected behaviour. E.g. I would have hoped that setting fs_persist to True would have allowed unallocated space to be tracked and reused across file open/close (FYI, we open/close the H5 file every time we change a global so that we can properly support safe shared access using zlock/h5lock). It could be worth attempting to replicate this issue with attributes outside of the labscript suite so that it could be reported upstream to h5py (and possibly HDF5, depending on what the h5py devs say). Something along the lines of the sketch below might be a starting point.
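To be explicit, a standalone replication attempt could look something like this (the file name and counts are arbitrary, and this only tries one of the fs_strategy/fs_persist combinations):

```python
import os
import h5py

FILENAME = 'fs_test.h5'  # throwaway test file

# Create the file with free-space tracking enabled and persisted, then rewrite
# a single attribute many times (open/close each time, as runmanager does) and
# check whether the file still bloats.
with h5py.File(FILENAME, 'w', fs_strategy='fsm', fs_persist=True,
               fs_threshold=1) as f:
    f.create_group('globals')

for i in range(1000):
    with h5py.File(FILENAME, 'r+') as f:
        f['globals'].attrs['some_global'] = 'value_%d' % i

print('final size:', os.path.getsize(FILENAME), 'bytes')
```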

As an aside, you can shrink the globals file with h5repack, using something like h5repack -L -c 10 -s 20:dtype old.h5 new.h5
(or just copy the relevant groups out from one file to another using python)
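A rough version of the python approach, with placeholder file names, would be:

```python
import h5py

# Copy every top-level group (plus any root attributes) from the bloated file
# into a freshly created file, which only allocates space for the current data.
with h5py.File('old_globals.h5', 'r') as src, \
        h5py.File('new_globals.h5', 'w') as dst:
    for name in src:
        src.copy(name, dst)
    for key, value in src.attrs.items():
        dst.attrs[key] = value
```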

If you have the time and would like to investigate further, we'd appreciate the help. I think it's a matter of working out if there is a way to make h5py reuse the space (when updating attributes) and either integrating that knowledge into the labscript suite or reporting the issue upstream. But if not, don't worry, we'll add it to the bug list and try to look into it more when we can!

Cheers,
Phil



--
Dr Philip Starkey
Senior Technician (Lightboard developer)

School of Physics & Astronomy
Monash University
10 College Walk, Clayton Campus
Victoria 3800, Australia

Victor Youri Helson

Dec 20, 2021, 9:49:54 AM
to the labscript suite
Hi Phil and Zak, 

Thanks for the answers.

As pointed out by Zak, we did indeed have the globals file hosted on a local server. However, moving it to a drive on the actual labscript computer did not help with runmanager being extra slow to respond. The issue really does seem to stem from the size of the file itself.

Our current workaround is a routine, scheduled to run every week, which copies the globals group from the file into a new one and renames the latter so that runmanager recognizes it as the new globals file.
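In case it helps anyone else, the routine is essentially along these lines (paths are placeholders, and it only runs while runmanager is not writing to the file):

```python
import os
import h5py

OLD = 'globals.h5'           # the file runmanager currently points at (placeholder)
TMP = 'globals_repacked.h5'  # freshly packed copy
BACKUP = 'globals_backup.h5' # keep the bloated file around just in case

# Simplified sketch: rebuild the globals in a new file, then swap it into place
# under the original name so runmanager keeps using the same path.
with h5py.File(OLD, 'r') as src, h5py.File(TMP, 'w') as dst:
    for name in src:
        src.copy(name, dst)

os.replace(OLD, BACKUP)
os.replace(TMP, OLD)
```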

I don't have much time to dedicate to the issue sadly; there's always another experimental annoyance showing up!

Best

Victor

Philip Starkey

Dec 20, 2021, 5:29:28 PM
to labscri...@googlegroups.com
Hi Victor,

Your solution sounds like a good one! We might consider implementing something like that inside runmanager if we can't find a better solution inside h5py.

For anyone else confused (or with an overzealous spam filter) like me, Zak's response can be viewed here: https://groups.google.com/g/labscriptsuite/c/pwfhb9QX224 (I had not seen Zak's response when I wrote my own last night - apologies for the information overlap!)

Cheers,
Phil


