FW locks in large workflow


michael...@berkeley.edu

Jul 5, 2018, 7:06:07 PM
to fireworkflows

Hello FireWorks Users,



Background about my workflow


I am using fireworks on LBNL's NERSC Cori system. I have a workflow which contains ~50,000 fireworks with no interdependencies. Each firework consists of two firetasks:


1) ScriptTask which executes a parallel program (currently ~3 min CPU time per execution)

2) ScriptTask which runs a small Python script that processes some output


I am running on the 68-core KNL nodes, so I am using ‘rocket_launch: rlaunch multi 34’ in my qadapter script, with 2 threads in my ScriptTasks for the parallel program.


Hundreds of thousands of these fireworks will eventually need to be run with an even more expensive version of the firetask #1 software.


Issue


When I have only a few thousand of these fireworks in my workflow, everything runs perfectly well. However, when I scale up to ~50,000, I begin to get many errors of the following type in the FW_job%.out file after ~10 fireworks have successfully completed:



2018-07-05 15:13:52,532 INFO fw_id 63526 locked. Can't refresh!



I believe the fireworks’ lock attempts are expiring because they wait past the WFLOCK_EXPIRATION_SECS time limit set in the config file without getting a lock on the DB: too many fireworks finish around the same time, and the database updates are apparently taking too long. I have extended WFLOCK_EXPIRATION_SECS from 5 min to 10 min, but this did not solve the problem. In any case, I can’t even afford 5 minutes of downtime between each firework completion.


Setting WFLOCK_EXPIRATION_KILL to True does not solve my problem, because then too many fireworks force locks on the database.


I have tried the ‘lpad admin maintain --infinite --maintain_interval 60’ command to no avail. I have also poked around with the database profiler but I am not sure what to look at.


I have heard that adding indices to the LaunchPad may improve update speed. I know I can do this in the form LaunchPad(..., user_indices=['spec.parameter1'], ...), but I am not sure what ‘parameter1’ should be.
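
As a rough sketch of what that constructor call could look like: an index generally helps only for fields that queries filter or sort on, so ‘parameter1’ would be whichever spec key you search by. The key names `job_id` and `batch`, and the host, port, and database name below, are placeholders assumed for illustration, not values from this thread.

```python
def launchpad_settings(spec_keys):
    """Build LaunchPad keyword arguments with one user index per spec key.

    The connection values are placeholders; replace them with your own.
    """
    return {
        "host": "localhost",   # placeholder MongoDB host
        "port": 27017,         # default MongoDB port
        "name": "fireworks",   # placeholder database name
        # Each entry names a field of the Firework document, e.g. "spec.job_id"
        "user_indices": [f"spec.{key}" for key in spec_keys],
    }


def make_launchpad(spec_keys):
    """Create a LaunchPad that carries the requested user indices."""
    # Imported here so launchpad_settings() works without FireWorks installed
    from fireworks import LaunchPad

    return LaunchPad(**launchpad_settings(spec_keys))
```

For example, `make_launchpad(["job_id", "batch"])` passes `user_indices=["spec.job_id", "spec.batch"]` to the constructor. Indexing spec keys targets query speed and may not affect the workflow lock described above.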


Is a fix possible or is my workflow already too large?


Please let me know if more information would be helpful.


I would greatly appreciate any advice.



Thank you.


Anubhav Jain

Jul 5, 2018, 7:17:06 PM
to michael...@berkeley.edu, fireworkflows
Hi Michael

A quick question before getting too detailed - if there are no interdependencies between the Fireworks, then why not use a single workflow? The main reason to put multiple FWs into one workflow is to ensure that dependencies are executed correctly.

I am asking because the lock is specific to a workflow. If you instead used 50,000 workflows, each with a single FW (instead of 1 workflow with 50,000 Fireworks) then you probably wouldn't run into the locking issue.
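
A minimal sketch of that layout, assuming each job is described by two shell commands as in the original post. The program name, script name, and file naming are hypothetical; only `Firework`, `ScriptTask.from_str`, and `LaunchPad.add_wf` are FireWorks calls.

```python
def shell_commands(job_id):
    """Return the two shell commands for one job (hypothetical names)."""
    return [
        f"./parallel_program input_{job_id}.in",           # task 1: parallel run
        f"python process_output.py output_{job_id}.out",   # task 2: post-processing
    ]


def submit_as_separate_workflows(launchpad, job_ids):
    """Add every job to the LaunchPad as its own single-Firework workflow."""
    # Imported here so shell_commands() works without FireWorks installed
    from fireworks import Firework, ScriptTask

    for job_id in job_ids:
        tasks = [ScriptTask.from_str(cmd) for cmd in shell_commands(job_id)]
        fw = Firework(tasks, name=f"job_{job_id}")
        # add_wf wraps a lone Firework in a one-Firework Workflow, so jobs
        # no longer share a single workflow-level lock
        launchpad.add_wf(fw)
```

With a configured LaunchPad, a call such as `submit_as_separate_workflows(LaunchPad.auto_load(), range(50000))` would create 50,000 one-Firework workflows instead of one workflow holding all of them.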

Best,
Anubhav

--
You received this message because you are subscribed to the Google Groups "fireworkflows" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fireworkflow...@googlegroups.com.
To post to this group, send email to firewo...@googlegroups.com.
Visit this group at https://groups.google.com/group/fireworkflows.
To view this discussion on the web visit https://groups.google.com/d/msgid/fireworkflows/fb576419-616e-4880-9184-3580b0bd84bd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Anubhav Jain

Jul 5, 2018, 7:19:18 PM
to michael...@berkeley.edu, fireworkflows
Quick typo correction: "then why not use a single workflow?" should have read "then why use only a single workflow (instead of putting each FW in its own workflow)?"
--
Best,
Anubhav

michael...@berkeley.edu

Jul 6, 2018, 4:52:32 PM
to fireworkflows
Hello Anubhav, 


I have implemented this fix and it is working great. I'm not sure why I didn't try this out! 

Thank you for such a fast reply. 


Best, 

Michael 

Anubhav Jain

Jul 6, 2018, 8:40:09 PM
to michael...@berkeley.edu, fireworkflows
Hi Michael,

That's great to hear!

Let us know if you encounter any more issues.

Best,
Anubhav

