Running labscript in a distributed web server architecture

Rohit Prasad Bhatt

Nov 18, 2018, 6:52:58 AM
to The labscript suite
Hi all,
We are soon getting started with labscript. We plan to run it as a web server. What I mean is the following:

1. Labscript will be running on a dedicated computer (which we call the experiment server) for experimental control. This computer will be hosted on a network, and all executions will be done by communicating remotely with it.

2. There can be multiple users simultaneously communicating with the experiment server. There has to be a way to queue all execution requests properly.

3. The results of all requests submitted by various users should be grouped according to the users who submitted them and then sent to an analysis/storage/user-specified directory. For example, if A and B are two users simultaneously communicating with the experiment server, then at the end the results of the shots submitted by A should be grouped separately from those of B.


One way I could think of to implement this is the following:

1. Install BLACS on the experiment server. Ask all users to install runmanager on their computers.

2. All runmanagers would then communicate with the BLACS on the experiment server. One issue I can foresee is simultaneous access to this single BLACS by runmanagers on several different computers.

Although the BLACS documentation mentions that it queues the various HDF files received from runmanager (e.g. when performing shots over a Cartesian product of parameters), would this still hold in this case, with several runmanagers and one BLACS?

Also, can BLACS keep queuing requests while executing them? I.e., while it is executing a shot, can it simultaneously add requests received from five different users to the execution queue?

3. The outputs of the various users should be grouped properly and sent to the user-specified directory. A user can specify a path on his/her personal computer or on a shared space. If a shared space is chosen, he/she has to be careful to avoid confusion with the outputs of other users. One possibility is to give a clearly identifying output directory name for his/her shots, e.g. "18_Nov_2018_Rohit", but maybe there are better ways.

4. Once everyone has his/her own shot outputs, they will want to analyse them. They can use lyse either on their own PC, or on an analysis server (like the experiment server above) where different users submit their analysis requests in a similarly queued fashion.


If I was not clear on any of the points above, I am happy to answer questions. I would welcome your suggestions on how to implement such an architecture with labscript.

Philip Starkey

Nov 19, 2018, 1:21:33 AM
to The labscript suite
Hi Rohit,

Yes, for the most part your suggested implementation is the way to go. BLACS is designed to handle this workload. It can continue accepting new shots from runmanager while executing an experiment. It can also accept requests simultaneously from multiple instances of runmanager (although it serialises these, so in theory simultaneous submission from too many instances could cause some runmanager requests to time out; I doubt this will be an issue, and it could easily be fixed by double-buffering the paths on the BLACS side if it ever is). Shots will be queued and executed by BLACS in the order it receives them, so they may be interleaved if multiple people are producing shots at the same time.

In such a situation, shots must be stored on a network share accessible to all PCs. This is typically the default behaviour though. You just need to have the network share mounted on each PC, and specify the local path to the mount in the experiment_shot_storage parameter in the labconfig file of each PC. The labscript suite will convert shot file paths to an agnostic variant for communication between PCs, and then convert the agnostic path to the correct local path before accessing the files from other PCs. So it all runs transparently, and this is well tested (we've been doing it for over 5 years with no issues).
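For illustration, the relevant labconfig entries on each PC might look something like the fragment below. The section and key names follow the standard labconfig layout, but the apparatus name and paths are placeholders; the key point is that shared_drive is the local mount of the same network share on every machine.

```ini
[DEFAULT]
apparatus_name = example_apparatus
; Local mount point of the shared network drive -- this differs per PC/OS
; (e.g. Z:\ on Windows, /mnt/labshare on Linux):
shared_drive = Z:\
; Shot files live under the share, so every PC resolves the same files:
experiment_shot_storage = %(shared_drive)s\Experiments\%(apparatus_name)s
```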

Shot files are also (by default) stored in a folder structure that follows <experiment_shot_storage>/<labscript python filename>/<year>-<month>/<day>/ so as long as each user has their own uniquely named labscript experiment logic Python file, then shots will already be segregated as you want. All users will need to share the same network share drive though (as BLACS can only be configured to access files from one experiment shot storage folder).
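To make the layout concrete, the default folder structure described above can be mirrored by a small helper like this (illustrative only; runmanager constructs the real path itself, and the storage root and script name here are placeholders):

```python
import posixpath
from datetime import date

def default_shot_folder(shot_storage, script_name, when=None):
    """Mirror the default output layout:
    <experiment_shot_storage>/<labscript filename>/<year>-<month>/<day>/
    posixpath is used to keep the example deterministic across platforms;
    the real suite uses native paths."""
    when = when or date.today()
    return posixpath.join(
        shot_storage,
        script_name,
        f"{when.year:04d}-{when.month:02d}",
        f"{when.day:02d}",
    )
```

Because the first path component after the storage root is the experiment script's filename, uniquely named scripts per user give the segregation described above for free.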

Analysis is slightly more complicated as we don't have an inbuilt way to distribute completed shot files to multiple instances of lyse. However, BLACS does have a plugin system, so you could write some custom code to create a BLACS plugin that can handle distributions to multiple analysis systems. Plugins like this are useful because you can write them to do whatever you like, and you don't have to rely on your required feature being built into the core of the labscript suite.

I hope that helps with your planning.

Cheers,
Phil.

Russell Anderson

Nov 19, 2018, 1:43:13 AM
to labscri...@googlegroups.com
Hi Rohit,

Further to Phil's suggestion, another way to distribute completed shot files for analysis to the host that submitted them would be to run an instance of lyse on the control server, which acted solely as a broker of shots. It would have one single-shot analysis routine that spoofed the functionality of submit_waiting_files() in blacs/analysis_submission.py, dutifully sending shot file paths to one host per shot. Each of these hosts would be running lyse, and would see shots being added to the queue as though blacs had sent them.

Kind regards,

--
You received this message because you are subscribed to the Google Groups "The labscript suite" group.
To unsubscribe from this group and stop receiving emails from it, send an email to labscriptsuit...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Russell Anderson
Lecturer

School of Physics and Astronomy
19 Rainforest Walk
Monash University VIC 3800
Australia

T:  +61 3 9905 5943                      

Rohit Prasad Bhatt

Jan 14, 2021, 3:52:41 AM
to 'Philip Starkey' via The labscript suite
Hi all,
I would be interested to hear if anyone has ever implemented a user management system for labscript. What I mean is the following:

1. Several users submit their shots to be run. The user management system requires you to be a registered user and provide some kind of token or key, which is given to you on sign-up. To submit shots you must provide your token, which is compared against the registered-user database.

2. The shots submitted by registered users are run, and the resulting HDF files are stored in a directory tree with a folder per user (perhaps labeled by the unique key given to each registered user). Inside the folder of a particular user there would be sub-folders for the various shots he/she submitted. The exact directory structure inside the user folder could be the same as the default labscript structure (i.e. Year/Month/Day/Shot). But it would be nice to have some kind of "job-id" for shots, so that the user can access all shots of one submission in a /Shot folder via this "job-id".
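A minimal sketch of these two pieces, assuming an in-memory user database and invented user keys and job-ids (a real system would use a proper database and credential storage):

```python
import hmac
import posixpath

# Hypothetical registered-user database: user key -> API token.
REGISTERED_USERS = {"user_a1b2": "token-for-a", "user_c3d4": "token-for-b"}

def authenticate(user_key, token):
    """Compare a submitted token against the database in constant time,
    so timing differences don't leak information about the stored token."""
    expected = REGISTERED_USERS.get(user_key)
    return expected is not None and hmac.compare_digest(expected, token)

def shot_folder(root, user_key, job_id, year, month, day):
    """Per-user directory tree with the default Year/Month/Day layout
    underneath, plus a job-id level so a user can fetch all shots of
    one submission at once."""
    return posixpath.join(
        root, user_key, f"{year:04d}", f"{month:02d}", f"{day:02d}", job_id
    )
```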

A possible scenario is one where the users are spread across the globe. It's similar to running experiments on IBM quantum computers.

I remember posting similar questions earlier (https://groups.google.com/g/labscriptsuite/c/jY2GVHF4cFA/m/PVSRGiMbAQAJ), but we are more focused on this topic now (and our wishlist has changed), so I initiated a new discussion.

I am looking forward to your suggestions on a possible implementation.

Regards,
Rohit Prasad Bhatt

Ian B. Spielman

Jan 14, 2021, 12:57:42 PM
to labscri...@googlegroups.com
Dear Rohit,

I have a feeling that this is a larger project than you may think, really for two reasons.

(1) Robust user authentication is not easy (I am sure it would be easy to hack an insecure solution).

(2) The shared Windows-drive paradigm labscript uses is really not suitable for a scalable remote-submission solution (i.e., how does the compiled h5 file get from the remote user's computer to blacs, and when the shot is done, where is the final data stored? How does this interact with the whole labscript system in a truly distributed environment?)

— Ian

Ian B. Spielman

Fellow, Joint Quantum Institute
National Institute of Standards and Technology and the University of Maryland

----- WEB -----
http://ultracold.jqi.umd.edu

----- EMAIL -----
spie...@jqi.umd.edu

----- ZOOM -----
https://umd.zoom.us/j/7984811536

----- PHONE -----
(301) 246-2482

----- MAIL -----
UMD:
2207 Computer & Space Sciences Bldg.
College Park, MD 20742

NIST:
100 Bureau Drive, Stop 8424
Gaithersburg, MD 20899-8424 USA

----- OFFICE -----
UMD: Physical Sciences Complex, Room 2153
NIST: Building 216, Room B131

Zak V

Jan 14, 2021, 2:59:53 PM
to the labscript suite
Hi Rohit,

This isn't exactly what you're looking for with your most recent email, but our lab has a setup that uses some of the ideas from the previous email chain that you linked. Based on Russ's suggestion, we run blacs and a copy of lyse on a server computer. That instance of lyse only runs one singleshot routine, which we call `forward_shot.py`. That script simply sends the shot along to another copy of lyse at the IP address specified by a global. To run shots, we open runmanager and lyse on a client computer, set the "BLACS Hostname" to the server's IP address, set the IP-address global to the client's IP address, and click Engage. The shot is then compiled, sent to blacs, run by blacs, sent to the server's lyse instance, and finally forwarded back to the client's lyse instance. To keep files from different clients separate for easier organization, the clients each have a different value for `apparatus_name` in their labconfig, which puts their shots in different directories. I'd be happy to share that lyse script if you're interested.
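For concreteness, a forwarding routine along these lines might look like the sketch below. It assumes lyse's default port (42519), that the destination lyse accepts `{'filepath': ...}` requests via zprocess's `zmq_get` the same way BLACS submits to it, and a client-set global named `analysis_ip`; check all of these against your installed version before relying on them.

```python
# forward_shot.py -- sketch of a single-shot "broker" routine for the
# server's lyse instance. The port number, module paths, and the
# 'analysis_ip' global name are assumptions to verify locally.

def dest_host_from_globals(shot_globals, key="analysis_ip"):
    """Pick the destination host out of the shot's globals."""
    host = shot_globals.get(key)
    if not host:
        raise KeyError(f"shot has no '{key}' global; cannot route it")
    return host

def forward(filepath, dest_host, port=42519, timeout=5):
    """Send the shot's path to the client's lyse instance."""
    # Local imports keep this file importable without the full suite:
    from labscript_utils.ls_zprocess import zmq_get
    from labscript_utils.shared_drive import path_to_agnostic
    # The share-agnostic form lets the client re-localise the path to
    # its own mount point of the network drive:
    data = {"filepath": path_to_agnostic(filepath)}
    return zmq_get(port, dest_host, data=data, timeout=timeout)

# In the routine body itself you would then do (requires lyse):
#     import lyse
#     run = lyse.Run(lyse.path)
#     forward(lyse.path, dest_host_from_globals(run.get_globals()))
```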

That approach does require the clients and server to have access to the same network drive, which probably isn't very scalable as Ian pointed out. Maybe it is possible to do a shared drive over the internet to allow remote users, though I'm not sure how reliable or secure that would be.

If you really want to go all out, you could write some software to act as an intermediary between the different labscript components which moves files via scp or the like. Clients would have runmanager running on their local machines and set the "BLACS Hostname" to `localhost`. But instead of running blacs on their machines, clients would run a custom software component that would receive shot submissions from runmanager, copy the shot files over scp to the server running blacs, and then add the shots to the blacs queue. The client would poll the server, copy shot files back over scp once they have run, and then send them to a local instance of lyse running on the client machine. The nice thing about using scp like this is that you could have secure communication and add/remove users the same way a typical computing cluster would. It would probably take a while to get an approach like this working though, so it may or may not be worth your while.
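The transfer step of that intermediary could be as simple as shelling out to scp; a tiny sketch, with the username, hostname, and directories as placeholders (queue management and polling are left out):

```python
import subprocess

def scp_push_command(local_shot, user, server, remote_dir, port=22):
    """Build the scp argument list that pushes a compiled shot file to
    the machine running blacs. Returned as a list (not a shell string)
    so it can be passed straight to subprocess.run without quoting bugs."""
    return ["scp", "-P", str(port), local_shot, f"{user}@{server}:{remote_dir}"]

def push_shot(local_shot, user, server, remote_dir):
    """Run the transfer; raises CalledProcessError if scp fails."""
    subprocess.run(scp_push_command(local_shot, user, server, remote_dir), check=True)
```

Authentication then rides on ordinary SSH keys, which is what makes the add/remove-users story the same as on a computing cluster.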

Cheers,
Zak

Philip Starkey

Jan 14, 2021, 8:03:31 PM
to labscri...@googlegroups.com
Hi All,

Ian and Zak have made some very good points here. I agree the biggest challenge would be HDF5 file access. I can see several options here:
  1. You use an authenticated VPN to a network containing the control PC and a SMB network share of some kind, that limits folder access based on the authenticated user (preferably using the same credentials as the VPN). This might be achievable using an enterprise grade NAS (although you would want to audit the security of that and regularly update the NAS firmware at a minimum - as I believe internet exposed NAS's are often hacking targets for cryptomining malware). This will probably only work well for people who have really good internet connections (as my experience with SMB over VPN even with a good connection has been average).
  2. You write custom software that utilises SSH (aka Zak's suggestion of using SCP) for tunneling instead of a VPN (the rest of the implementation and concerns would probably be similar to point 1, but it may hold up better on slower internet connections).
  3. You write some new software that mimics the BLACS ZMQ interface on the user side, mimics the runmanager/lyse ZMQ interface on the server side, and handles authentication and transfer of HDF5 files over a persistent network socket (probably something like a secure websocket so you are less likely to run into NAT/firewall issues). You might be able to use Django + a 3rd-party Django authentication library + Django channels to do this on the server side (in order to keep it all in Python). Using well-maintained Python libraries for communication/authentication reduces the risk of being hacked. This would allow users to install runmanager and lyse locally and submit shots from runmanager to your custom application, which transfers them over a websocket (possibly via an intermediate webserver under your control) to a remote counterpart that saves to disk and submits to BLACS. BLACS notifies this program when shots are completed, and it then sends the completed file back over the websocket to the user's program, which forwards it on to lyse (approximately, anyway - there might be some tweaks as to where the websocket ends, dealing with parallel websockets and heterogeneous communication speeds between clients, etc. Some of this is perhaps better offloaded to a server running in the cloud that can access your network share, or something like that). If you want to go this route and don't have experience with Django or AWS/Google Cloud/etc. dev-ops, you probably want to hire someone locally to at least advise, if not scope and develop the solution for you.
There would of course be a heap of other concerns: making sure authenticated users have an up-to-date connection table, making sure they can't execute arbitrary code in BLACS, and protecting your physical equipment with a robust interlock system (preferably in hardware and isolated from the labscript suite) that monitors and shuts down/protects the equipment if things overheat, drift, fail open, fail closed, etc. Many people on this mailing list have experience with various aspects of this and might be willing to share their knowledge (although some of it strays into general lab practice rather than labscript suite questions).
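To make option 3 slightly more concrete, the wire format for moving an HDF5 file over such a socket could be as simple as a JSON envelope with an integrity hash. The field names here are invented, not any labscript protocol, and real authentication would sit on top (e.g. the Django layer mentioned above):

```python
import base64
import hashlib
import json

def pack_shot(token, filename, payload):
    """Wrap a shot file (bytes) for transmission: JSON header, base64
    body, and a SHA-256 digest so the receiver can detect corruption."""
    return json.dumps({
        "token": token,
        "filename": filename,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "body": base64.b64encode(payload).decode("ascii"),
    })

def unpack_shot(message):
    """Verify integrity and return (token, filename, payload)."""
    msg = json.loads(message)
    payload = base64.b64decode(msg["body"])
    if hashlib.sha256(payload).hexdigest() != msg["sha256"]:
        raise ValueError("shot file corrupted in transit")
    return msg["token"], msg["filename"], payload
```

In practice a websocket library would handle framing and TLS would handle confidentiality; this only illustrates that the payload plumbing itself is straightforward compared to the authentication and operational concerns.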

In short, I believe it's doable, but there are a lot of things to consider, design, and implement if you want to make it work for an arbitrary set of global users! We can share theoretical ideas here easily enough, but I want to stress that's a long way from a viable solution that can be implemented successfully and safely!

Very interested to continue the conversation though and hear how it goes if you do go down this route.

Cheers,
Phil




--
Dr Philip Starkey
Senior Technician (Lightboard developer)

School of Physics & Astronomy
Monash University
10 College Walk, Clayton Campus
Victoria 3800, Australia

Rohit Prasad Bhatt

Jan 14, 2021, 11:36:54 PM
to 'Philip Starkey' via The labscript suite
Hi all,
Thanks for the comments and suggestions. Maybe I should mention the following:

1. We are not planning to share the HDF files directly with the user. The idea is that the user would submit an experiment script and a globals file. These would be given to runmanager, and the resulting HDF shots always remain on the server storage. The user can only retrieve results from the shots using a "job-id". At least for a start, the plan is that the user does not have lower-level access to the HDF files directly.

This means things have to be standardized on the experiment in terms of coding, timings, operations, etc. Then one could write an API whose functions can be used in the experiment script. These API functions could be things like "prepare a BEC in a given state" or "get the number of atoms from the images". It's similar to using Qiskit to run experiments on IBM quantum computers.
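As a gesture at what the validation side of such an API could look like, a submitted script's requested operations could be checked against a whitelist before anything reaches runmanager (the operation names below are invented examples, not an existing labscript interface):

```python
# Hypothetical whitelist of operations users may call in submitted scripts:
ALLOWED_OPS = {"prepare_bec", "image_atoms", "get_atom_number"}

def validate_script_ops(requested_ops):
    """Reject a submitted script that calls anything outside the
    standardized public API, before it is handed to runmanager."""
    illegal = set(requested_ops) - ALLOWED_OPS
    if illegal:
        raise ValueError(f"operations not in the public API: {sorted(illegal)}")
    return True
```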

2. We were indeed thinking of using the Django framework for user management. There seems to be some open code available which goes in that direction (https://github.com/ornl-ndav/django-remote-submission). We will also have to look into properly queuing requests from several users.

Anyway, this is just a start and the exact implementation will evolve as we work on it. I will keep posting updates on the project.

Thanks again for your help!

Regards,
Rohit Prasad Bhatt
