Jean-Gabriel,
This is an extremely common pain point, and tackling it in a more general way is fairly high on my infinitely long list of things I'd like to do with my very finite time. At the end of the day, labscript was initially designed for a BEC experiment where each shot took multiple seconds, so per-shot transition delays weren't a serious concern and design choices were made under that assumption. You are likely correct that the transition functions for your devices are the limiting factor now. Even if everything else were perfect (which it likely isn't), I strongly suspect the ultimate limit on cycle times is the serialized access to the shot h5 file, which lives on disk. While it has been a while since I've looked into the problem, the times you are seeing are fairly close to what I would expect as optimal, and it appears you are already doing many of the things I would normally suggest. Johannes has provided some pretty good resources for more involved ways to tackle the problem. There are also a number of historical threads of people tackling this problem (here or here, for instance) that are useful reading to get a feel for the wider problem.
All that said, I would highly encourage you to look into less involved solutions first. One day I'll formalize this advice in the docs, but for now I'll continue to workshop it here on the list-serve. Organized in roughly priority order:
- Invest in your control computer(s). I find it interesting that people will spend hundreds of thousands on experiment hardware/lasers/synths and then cheap out on the control computer. You are going to use that computer for >8 hrs/day for years; it is OK to spend $3-5k on good hardware. I don't have hard numbers for specs since there are a ton of confounding factors in practice, but the following are good places to focus. In general, latency matters more than raw processing power once you meet a certain base threshold. I also highly recommend not buying the absolute latest and greatest, so that drivers and support are mature and your system is more robust.
- The fastest storage you can reasonably obtain (ideally a modern NVMe SSD). Given that serialized access to the shot file is a common bottleneck, being able to read from and write to disk as fast as possible is very important.
- Lots of RAM, with the fastest clock speed your motherboard supports. Labscript relies heavily on having many simultaneous Python threads; keeping all of them in fast memory is important.
- Decent graphics card. BLACS cycle times include updates to the GUI within the hot loop (for now at least). Ensuring those updates are not limited by how quickly you can draw to the screen is important.
- Obviously, a fast multi-core processor with decent-sized caches. Be wary of going too far here: having too many cores can actually cause subtle slowdowns in software that tries to aggressively parallelize (like certain numpy backends).
- Labscript best practices
- No remote shot storage. Keep your shots local on the computer running BLACS; the added latency of serialized access to a remote shot file can be quite large.
- Ensure devices don't open the shot file read/write if they will only read. Opening read-only allows multiple simultaneous readers, breaking part of the serialization bottleneck.
- Ensure devices don't hold the shot file open longer than they need to. A common problem is a device that takes a long time to read its data off: if you keep the shot file open during a one-second-long data transfer, nothing else can transition during that time either. Best to open the file to get the necessary info, close it, read the data off the device, then re-open the file to save the data (the first sketch after this list shows this pattern, along with the read-only point above).
- Optimize your script to limit reprogramming and excessively large instruction counts. Novatechs are a common culprit here: specifying dense ramps can lead to thousands of instructions, and the serial comms interface on the device is very slow. Determine whether you really need that many instructions (second sketch after this list).
- Leverage the smart cache carefully. Even for a slow-to-program device, if nothing changes between shots the smart cache will prevent reprogramming entirely. If you have a particularly slow device, reprogramming only the individual instructions that changed, rather than the entire table, may be warranted (third sketch after this list).
- Avoid slow comms devices. Basic serial interfaces, overloaded GPIB buses, USB 1.1/2.0 devices, or slow Ethernet devices can all slow things down. Use the fastest interface available, or even find a different device with a better interface.
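To make the shot-file points concrete, here is a minimal sketch of the open/close pattern inside a device worker's transition_to_manual. The file layout, the stashed self.h5_file attribute, and download_traces() are hypothetical stand-ins for your device's specifics:

```python
import labscript_utils.h5_lock  # labscript's h5 file locking; import before h5py
import h5py

# Inside your device's worker class (self.h5_file assumed stashed during
# transition_to_buffered; download_traces() is a hypothetical slow readout):
def transition_to_manual(self):
    # Open read-only just long enough to grab what we need, so other
    # devices can still read the shot file concurrently.
    with h5py.File(self.h5_file, 'r') as f:
        n_samples = f[f'devices/{self.device_name}'].attrs['n_samples']

    # The slow part: pull data off the instrument with the file CLOSED,
    # so everything else can keep transitioning in the meantime.
    traces = self.download_traces(n_samples)  # may take ~1 s

    # Re-open read/write only for the brief moment needed to save results.
    with h5py.File(self.h5_file, 'r+') as f:
        f.require_group('data').create_dataset(self.device_name, data=traces)
    return True
```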
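On the instruction-count point, here is the same 0.5 s sweep specified at two sample rates using labscript's standard ramp() call (channel name and values are hypothetical). The coarse version compiles to 100x fewer instructions for the slow serial link to swallow:

```python
# The same frequency sweep, specified two ways (use one or the other):
dds0.frequency.ramp(t, duration=0.5, initial=80e6, final=85e6,
                    samplerate=10e3)  # compiles to ~5000 instructions
dds0.frequency.ramp(t, duration=0.5, initial=80e6, final=85e6,
                    samplerate=100)   # compiles to ~50 instructions
```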
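And on the smart-cache point, a minimal sketch of per-row reprogramming, assuming a plain 2D numpy instruction table, a smart_cache dict that persists between shots, and a hypothetical program_row() wrapper around the device's slow serial protocol:

```python
import numpy as np

# Inside your device's worker class:
def program_table(self, table, fresh):
    # self.smart_cache is assumed to be a dict that persists between shots,
    # holding the table programmed on the previous shot.
    cached = self.smart_cache.get('table')
    if fresh or cached is None or cached.shape != table.shape:
        rows = range(len(table))  # no usable cache: program everything
    else:
        # Only the rows that actually differ from the previous shot.
        rows = np.flatnonzero((table != cached).any(axis=1))
    for i in rows:
        self.program_row(i, table[i])  # hypothetical slow serial write
    self.smart_cache['table'] = table.copy()
```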
- Labscript hacks
- Try to move some of your experiment into the programming time. A common case is performing your MOT reload between shots, where precise timing and dynamic control aren't necessary (see the first sketch after this list).
- Do multiple experiments in a single shot. This distributes the programming penalty over more data points, at the expense of a more complicated script and analysis (second sketch below).
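For the MOT-reload case, the usual trick is to end each shot in the loading configuration so the MOT refills during the inter-shot dead time. A sketch, with hypothetical channel names and values:

```python
# At the end of the experiment script, return to MOT-loading conditions
# before calling stop(), so loading happens during programming/readout time.
t += 1e-3
mot_shutter.open(t)
mot_coils.constant(t, 1.2)       # loading-gradient coil current
mot_detuning.constant(t, -18e6)  # loading detuning
stop(t + 1e-3)
```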
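For multiple experiments per shot, the script-side change is essentially just a loop; your analysis then has to slice the repetitions back apart. A sketch, with a hypothetical do_sequence() helper containing one repetition of your sequence:

```python
start()
t = 0
for detuning in [-2e6, 0.0, 2e6]:  # three data points in one shot
    t = do_sequence(t, detuning)   # hypothetical: returns its end time
    t += mot_reload_time           # reload between repetitions
stop(t)
```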
- Optimized dependencies
- Use a modern Python. CPython has seen real interpreter speedups starting with 3.11, so moving to a later version can provide modest, essentially free gains.
- Use optimized backends. Not all of the software labscript relies on is Python; make sure you aren't being limited by a compiled dependency. For instance, labscript uses BLAS/LAPACK via numpy/scipy, and a BLAS/LAPACK build optimized for your architecture can provide some benefit. By default, the numpy from conda's defaults channel is built against the highly optimized MKL, which tends to be faster for heavy computational loads on Intel CPUs (a quick way to check what you have is shown below).
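You can check which backend your numpy is linked against from any interpreter; look for MKL or OpenBLAS in the output:

```python
import numpy as np
np.show_config()  # prints the BLAS/LAPACK libraries numpy was built against
```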
- Carefully profile
- Once the above have been implemented and/or have failed to help, you need to think about proper optimization, and that requires proper profiling (i.e., more than the times BLACS flashes in the GUI). For very slow things, checking timestamps in the logs may provide a hint, but realistically a full profiling solution will be necessary. Setting up a common recipe for this is on my todo list, but I would recommend looking into a statistical profiler (like Scalene), as it requires less tampering and overhead; a simple first-pass timing wrapper is sketched below. In any case, it is very hard to fix what you can't quantitatively measure, so you really should do this before moving on to the next step, so you can focus your efforts where they'll give the most bang for your time.
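Before committing to a full profiler, a simple timing wrapper around a suspect worker method can already tell you a lot. A minimal sketch using only the standard library:

```python
import functools
import logging
import time

def timed(func):
    """Log the wall-clock duration of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            logging.info('%s took %.3f s', func.__name__,
                         time.perf_counter() - t0)
    return wrapper

# e.g. in a device worker class:
#     transition_to_buffered = timed(transition_to_buffered)
```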
- Tune labscript internals for your use case
- Labscript is designed to be quite general, and we are generally loath to implement an optimization in the mainline repos that limits flexibility. That said, most experiments don't need labscript's full flexibility and can make local customizations that trade unneeded features for greater speed (i.e., the kinds of things Johannes linked). Be wary of deviating too far from mainline without good reason: once your code diverges, merging updates from mainline can get pretty painful.
In any case, if you (or others) make progress on identifying bottlenecks or improving speed, please do post something to the list-serve. That experience is invaluable to the community, but it can be very hard to find when it is buried deep in random GitHub repos, theses, and lab notes.
-David