ZMQError: error: lock not held


David La Mantia

Feb 6, 2026, 4:56:25 PM
to the labscript suite
Hello all,

TYIA for any advice.

I have a Pendulum frequency counter as a user_device that's working quite well, except when I want it to take data for longer than about 10 s.

In that case, the last thing I have it do in transition_to_buffered is print 'Pendulum is done counting'.

Then it throws the error chain below. I've tried doing what https://groups.google.com/g/labscriptsuite/c/QJYqdTslVPM/m/iuRModpEBAAJ suggests, but to no avail.

Ideas?

Taking measurement: 100%|##########| 100/100 [01:40<00:00, 1.01s/s]

 

Pendulum is done counting.

Pendulum is done counting.

2026-02-06 14:07:46,367 ERROR BLACS.pendulum_main_worker.worker: Exception in job:
Traceback (most recent call last):
  File "C:\Users\Lab\anaconda3\envs\labscript\lib\site-packages\blacs\tab_base_classes.py", line 898, in _transition_to_buffered
    return self.transition_to_buffered(
  File "C:\Users\Lab\labscript-suite\userlib\user_devices\Pendulum\blacs_workers.py", line 99, in transition_to_buffered
    print('Pendulum is done counting.')
  File "C:\Users\Lab\anaconda3\envs\labscript\lib\site-packages\labscript_utils\h5_lock.py", line 85, in __exit__
    self.close()
  File "C:\Users\Lab\anaconda3\envs\labscript\lib\site-packages\labscript_utils\h5_lock.py", line 64, in close
    self.zlock.release()
  File "C:\Users\Lab\anaconda3\envs\labscript\lib\site-packages\zprocess\zlock\__init__.py", line 246, in release
    self.client.release(self.key, self._client_id)
  File "C:\Users\Lab\anaconda3\envs\labscript\lib\site-packages\zprocess\zlock\__init__.py", line 195, in release
    raise zmq.ZMQError(response)
zmq.error.ZMQError: error: lock not held


Chris Billington

Feb 6, 2026, 5:08:58 PM
to labscri...@googlegroups.com
Hi David,

To prevent multiple programs from reading or writing an HDF5 file whilst another program is writing to it, access to the files is protected by a lock. But the lock has a timeout (I believe it's something like 45 seconds by default), and if a program doesn't release the lock in that time (by closing the HDF5 file), it is released automatically. The error you're seeing indicates an HDF5 file was kept open for longer than that timeout.
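A toy model of that failure mode (this is NOT the real zprocess.zlock implementation, just a sketch of the behaviour: the lock server revokes a lock held past its timeout, so a late release fails):

```python
import time

class TimeoutLock:
    """Toy stand-in for a lock server that auto-revokes a lock
    held longer than `timeout` seconds (not the real zprocess API)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.acquired_at = None

    def acquire(self):
        self.acquired_at = time.monotonic()

    def release(self):
        held_for = time.monotonic() - self.acquired_at
        self.acquired_at = None
        if held_for > self.timeout:
            # The server already revoked the lock, so our release fails:
            # analogous to the "lock not held" error in the traceback above.
            raise RuntimeError('error: lock not held')

lock = TimeoutLock(timeout=0.05)
lock.acquire()
time.sleep(0.1)  # like holding the shot file open for a 100 s acquisition
try:
    lock.release()
except RuntimeError as err:
    print(err)  # prints: error: lock not held
```

A prompt release (well within the timeout) succeeds; only holding the lock past the timeout triggers the error.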

This suggests your code keeps the HDF5 file open for a long time during the acquisition. Actually, it looks like the acquisition all happens within transition_to_buffered(), which takes 100 seconds.

You should have transition_to_buffered() open the HDF5 file, read anything it needs, and then close it. If some data-reading code needs to keep running during the shot, transition_to_buffered() should spawn a thread to do this, so that it can continue after transition_to_buffered() returns. The thread should store its results temporarily in memory rather than writing to the HDF5 file. Then in transition_to_manual(), the thread should be stopped, the HDF5 file opened again, and the acquired data written to it.

That way the file is only ever open for short times, which is important so other devices' blacs workers can access it if they need to, and will avoid holding the lock for longer than the timeout.
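A simplified sketch of that pattern (this is not the real BLACS worker API; the hardware read and the HDF5 I/O are stand-ins, marked in the comments):

```python
import threading
import time

class PendulumWorkerSketch:
    """Hypothetical worker illustrating the pattern: touch the HDF5 file
    only briefly, and do the long acquisition in a background thread."""

    def transition_to_buffered(self, config):
        # In a real worker: open the shot file here, read what you need,
        # and close it again immediately. `config` stands in for that.
        self.n_samples = config['n_samples']
        self.data = []  # acquired data buffered in memory, not in the file
        self.stop_event = threading.Event()
        self.thread = threading.Thread(target=self._acquire)
        self.thread.start()
        # Returns quickly; the acquisition continues during the shot.

    def _acquire(self):
        # No HDF5 access from this thread: results go to memory only.
        for i in range(self.n_samples):
            if self.stop_event.is_set():
                break
            self.data.append(i)  # stand-in for a real counter reading
            time.sleep(0.001)

    def transition_to_manual(self):
        # Stop the thread, then (in a real worker) briefly reopen the
        # HDF5 file and write self.data to it.
        self.stop_event.set()
        self.thread.join()
        return list(self.data)
```

With this structure the file is never held open across the acquisition, so the lock is always released well within the timeout.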

Hope that's helpful,

Chris




Ryan McGill

Mar 23, 2026, 4:05:55 PM
to the labscript suite
Chris, thanks for your insight. I was able to rectify the issue by following your guidance (opening and closing the HDF5 file for short periods instead of keeping it open during the long run).

However, we ran into another issue: during long runs of the experiment, the device would time out. It looks like BLACS expects call-and-response between runs, and if we did not respond within 300 s we would get a timeout error. One obvious solution would be to send pings between runs, but for this particular device we wanted a "set and forget" method, so I modified experiment_queue.py so that the "Transitioning to buffered mode" timeout is ignored. Aside from pings, is there a cleverer way to ignore this timeout (possibly for a single device)? This is a crude solution, but if the device were to time out, the experiment would fail either way without handling, so it works in this case for the short term.

Thanks,
Ryan McGill

Chris Billington

Mar 23, 2026, 9:03:25 PM
to labscri...@googlegroups.com
Hi Ryan,

It sounds like your whole experiment might be taking place during transition_to_buffered(). Whilst doing that and increasing/ignoring the timeout may work, the intended approach is for transition_to_buffered() to be short, and only configure the device for the rest of the run. Then, during transition_to_manual() afterwards, acquired data is saved to the HDF5 file. If work needs to be done in between (e.g. to continuously read data from the device), other labscript devices do things like spawn a thread during transition_to_buffered(), which keeps running to acquire data during the shot and is stopped during transition_to_manual(). You might look at the IMAQdxCamera BLACS worker for an example.

Regards,

Chris

