Issues when compilation time is long.

Alessandro Restelli

Sep 9, 2020, 7:16:28 PM
to the labscript suite
Dear Labscript team,

We are working on a custom pseudoclock that can store up to 8 million instructions.
We found that when compiling a script in runmanager, as compilation time increases, at some point the ZeroMQ lock is no longer held. I suspect this may be due to a timeout, and I'm wondering if it is possible to extend it. However, the problem could be something else, for example a memory overflow.

The error in runmanager is the following, and there is no further traceback information:

Traceback (most recent call last):
  File "/storage/Tier3/ananya/labscript-suite/userlib/labscriptlib/example_apparatus/many_instr_test.py", line 102, in <module>
    stop(t)
zmq.error.ZMQError: error: lock not held
Compilation aborted.

Any idea of what might be causing this issue?
Thank you in advance for your help!

All the best,

Alessandro (JQI)

Chris Billington

Sep 9, 2020, 10:03:27 PM
to labscri...@googlegroups.com
Hi Alessandro,

Zlock clients that hold their lock too long have it automatically released (this is mostly to ensure that programs that hard-crash don't hold locks indefinitely). The timeout is set by the client when it acquires the lock; the default is 45 seconds, as set in labscript_utils.ls_zprocess.

You may increase it by doing:

from labscript_utils.ls_zprocess import ProcessTree
ProcessTree.instance().zlock_client.set_default_timeout(<desired timeout in seconds>)

If you call this any time prior to calling stop() within your experiment script, the locks acquired during compilation should use the longer timeout.

Compilation exceeding the current default of 45 seconds seems quite long though!

This problem has always annoyed me: we want locks to be released if a program crashes, but not if it is still running just fine. We usually try to ensure programs hold HDF5 files open for as little time as possible to minimise lock contention, but during compilation we instead optimise for minimising compilation time by not repeatedly reopening the HDF5 file in the same thread. That causes the issue you're seeing when compilation takes a long time, even though most of that time is not file I/O.

I might think about having zlock run a background thread that communicates with the server to renew the lease on a lock that is held for a long time.
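
As a rough sketch of the idea, using nothing but the standard library; renew_lock below is a hypothetical stand-in for a zlock request that would extend the lease, which does not exist in zprocess today:

import threading

def keep_lease_alive(renew_lock, interval, stop_event):
    # Call renew_lock() every `interval` seconds until stop_event is set.
    # renew_lock is a hypothetical callable that would ask the zlock server
    # to extend the lease on a held lock.
    while not stop_event.wait(interval):
        renew_lock()

# Usage sketch: start the heartbeat after acquiring the lock, stop it on release:
# stop_event = threading.Event()
# threading.Thread(target=keep_lease_alive,
#                  args=(renew_lock, 20, stop_event), daemon=True).start()
# ... long compilation with the lock held ...
# stop_event.set()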

Hope that helps!

-Chris


Alessandro Restelli

Sep 10, 2020, 10:50:46 AM
to the labscript suite
Dear Chris,
thanks a lot for your quick reply! That is exactly the information we were looking for.
The reason compilation takes so long is likely the way we generate events for our pseudoclock, which is not particularly efficient (a for loop).

Here is what we are doing: the script generates a list of 640,000 instructions that are sent to our pseudoclock:

----------------------------------------------------------------------------------------------------------------------------------------------------------------
from labscript import *
from labscript_devices.Pyncmaster import Pyncmaster
from labscript_devices.NI_PCIe_6363 import NI_PCIe_6363
from labscript_devices.NovaTechDDS9M import NovaTechDDS9M
from labscript_devices.Camera import Camera
from labscript_devices.PineBlaster import PineBlaster
from labscript_devices.NI_PCI_6733 import NI_PCI_6733
from labscript_utils.unitconversions import *

Pyncmaster(name='pyncmaster', board_number=0,
           time_based_stop_workaround=True,
           time_based_stop_workaround_extra_time=0.5)

DigitalOut('ch_1', pyncmaster.direct_outputs, 'flag 0')
DigitalOut('ch_2', pyncmaster.direct_outputs, 'flag 1')
DigitalOut('ch_4', pyncmaster.direct_outputs, 'flag 3')
DigitalOut('ch_9', pyncmaster.direct_outputs, 'flag 8')

# Begin program
start()
t = 0

t += 0.1

# Loop over the list of instructions: each iteration emits 8 edges over 40 us,
# so 80000 iterations produce 640000 instructions in total
for i in range(80000):
    ch_1.go_high(t)
    ch_2.go_high(t)
    t += 0.00001
    ch_4.go_high(t)
    ch_9.go_high(t)
    t += 0.00001
    ch_1.go_low(t)
    ch_2.go_low(t)
    t += 0.00001
    ch_4.go_low(t)
    ch_9.go_low(t)
    t += 0.00001

t += 0.1

# End program
stop(t)

----------------------------------------------------------------------------------------------------------------------------------------------------------------

We are creating events in this inefficient way only to test whether the hardware can handle the maximum number of instructions; it is unlikely that a similar loop would be used in an actual experiment.
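
One approach we have not tried, which might avoid the per-edge loop entirely: labscript's DigitalQuantity appears to offer a repeat_pulse_sequence method that describes a repeating pulse train as a single functional instruction. A rough, untested sketch for ch_1 alone, assuming the signature repeat_pulse_sequence(t, duration, pulse_sequence, period, samplerate) with pulse_sequence a list of (time, state) tuples within one period; check the method's docstring for its sample-rate and clocking caveats:

# Untested sketch: the same 40 us period as the loop above, for ch_1 only.
pulse_sequence = [(0.0, 1), (0.00002, 0)]  # high for 20 us, then low for 20 us
ch_1.repeat_pulse_sequence(t=0.1,                     # start of the loop above
                           duration=80000 * 0.00004,  # 80000 periods of 40 us
                           pulse_sequence=pulse_sequence,
                           period=0.00004,
                           samplerate=1e6)  # must resolve the 10 us features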

All the best,

Alessandro

Alessandro Restelli

Sep 10, 2020, 5:45:47 PM
to the labscript suite
Dear Chris,
thanks again, your suggestion was very helpful. For some reason adding the line to the script did not work, but we hacked our way around it by changing DEFAULT_TIMEOUT directly in the zprocess module source, and now the lock does not expire. This is a temporary solution that lets us deal with slow compilation until we optimize our code (there ought to be a better way to generate pulse trains than a for loop).
Best,

Alessandro

Chris Billington

Sep 20, 2020, 10:34:27 PM
to labscri...@googlegroups.com
Hi Alessandro,

Good to hear you figured it out!

Yes, my suggestion to add the timeout-modifying code to your experiment script was incorrect. It's even worse than I said: the HDF5 file is already open and the lock already held before any of your script's code runs.

But changing the hard-coded value of course works.
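
If editing installed source ever becomes a nuisance, an untested alternative might be to monkeypatch the constant from whichever process acquires the lock, before compilation begins. The module path below is an assumption and would need checking against your zprocess version:

# Untested sketch: override the timeout without editing zprocess's source.
# Which module defines DEFAULT_TIMEOUT varies between zprocess versions, so
# the import below is an assumption; this must also run in the process that
# actually acquires the lock, before any locks are taken.
import zprocess.locking
zprocess.locking.DEFAULT_TIMEOUT = 600  # seconds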

Cheers,

Chris
