How to re-run workflows with bad states_list (Wait fws in the middle)?


specter119

May 10, 2016, 10:25:48 AM
to fireworkflows
I found that some of my workflows containing the element Gd have abnormal states_list values (shown below). All of them quickly turn to FIZZLED even if I rerun them.
I don't know how to rerun these workflows. Could anyone kindly give me some suggestions?
Also, does anyone know the reason for this situation?



```json
{
    "name": "Cs1 Rh1 Te2--809",
    "state": "RUNNING",
    "states_list": "C-C-C-C-C-C-C-C-REA-W-RUN",
    "created_on": "2016-05-06T02:17:48.807000"
},

{
    "name": "Ag1 Gd1 Se2--2353",
    "state": "RUNNING",
    "states_list": "C-C-C-C-C-C-W-W-W-W-W-C-REA",
    "created_on": "2016-05-06T02:17:31.385000"
},

{
    "name": "Gd1 Rb1 Se2--2567",
    "state": "RUNNING",
    "states_list": "C-C-C-C-C-C-W-W-W-W-W-C-REA",
    "created_on": "2016-05-06T02:17:35.948000"
},

{
    "name": "Ce1 K1 Te2--2656",
    "state": "RUNNING",
    "states_list": "C-C-C-C-C-C-W-W-W-W-W-C-RES",
    "created_on": "2016-05-06T02:17:37.625000"
},
{
    "name": "Cu1 Gd1 Se2--2586",
    "state": "RUNNING",
    "states_list": "C-C-C-C-C-C-W-W-W-W-W-C-REA",
    "created_on": "2016-05-06T02:17:37.209000"
},

{
    "name": "Gd1 Se2 Tl1--2691",
    "state": "RUNNING",
    "states_list": "C-C-C-C-C-C-W-W-W-W-W-C-REA",
    "created_on": "2016-05-06T02:17:39.414000"
},

{
    "name": "Cu1 Gd1 Te2--2592",
    "state": "RUNNING",
    "states_list": "C-C-C-C-C-C-W-W-W-W-W-C-REA",
    "created_on": "2016-05-06T02:17:37.076000"
},
```

Anubhav Jain

May 10, 2016, 12:27:10 PM
to specter119, fireworkflows
Please explain what is "bad" about the states list. Note that the order of the states_list is arbitrary. Also note that a workflow is considered RUNNING if part of it is finished and part of it is unfinished. So I don't understand the problem from the information you are showing. You can use the "-d more" option to show things in more detail.
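
Since the order of the entries carries no meaning, one way to read a states_list is to tally it. Below is a minimal sketch, assuming the abbreviations map as shown (states starting with "R" keep three letters, the others keep one); the mapping and the helper name `count_states` are assumptions for illustration, not something stated in this thread.

```python
from collections import Counter

# Assumed abbreviation scheme (not confirmed in this thread):
# states starting with "R" keep three letters, all others keep one.
ABBREVIATIONS = {
    "C": "COMPLETED",
    "W": "WAITING",
    "F": "FIZZLED",
    "REA": "READY",
    "RES": "RESERVED",
    "RUN": "RUNNING",
}


def count_states(states_list):
    """Tally a states_list string; the order of entries is ignored."""
    tokens = states_list.split("-")
    # Unknown abbreviations are kept as-is rather than guessed.
    return Counter(ABBREVIATIONS.get(token, token) for token in tokens)


counts = count_states("C-C-C-C-C-C-W-W-W-W-W-C-REA")
for state, number in sorted(counts.items()):
    print(state, number)
```

Read this way, the string from the original post is just a count of fireworks per state, so the position of the "W" entries in the middle does not by itself indicate a problem.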


Anubhav Jain

May 10, 2016, 12:28:47 PM
to fireworkflows
The order of states in the states_list is arbitrary.

Try using the "-d more" option. It will show you more details about exactly what is ready, completed, or waiting.

As for the FIZZLED state, you will need to provide more details. I cannot read the mind of your workflow. For example, what is the error (as reported by the launch or the output file)? Is the error related to FireWorks itself or to some feature of your workflow? In the latter case, this list is not the appropriate help channel.

specter119

May 10, 2016, 12:46:44 PM
to fireworkflows

Thanks for your patience, Anubhav Jain.
Let's take the last one as an example:

lpad get_wflows -i 2592 -d more
{
    "name": "Cu1 Gd1 Te2--2592",
    "state": "FIZZLED",
    "states": {
        "Cu1_Gd1_Te2--Add_to_SNL_database--469": "COMPLETED",
        "Cu1_Gd1_Te2--GGA_optimize_structure_(2x)--470": "COMPLETED",
        "Cu1_Gd1_Te2--VASP_db_insertion--471": "COMPLETED",
        "Cu1_Gd1_Te2--Controller_add_Electronic_Structure_v2--472": "COMPLETED",
        "Cu1_Gd1_Te2--GGA_static_v2--2590": "COMPLETED",
        "Cu1_Gd1_Te2--VASP_db_insertion--2591": "COMPLETED",
        "Cu1_Gd1_Te2--GGA_Uniform_v2--2592": "WAITING",
        "Cu1_Gd1_Te2--VASP_db_insertion--2593": "WAITING",
        "Cu1_Gd1_Te2--GGA_band_structure_v2--2594": "WAITING",
        "Cu1_Gd1_Te2--VASP_db_insertion--2595": "WAITING",
        "Cu1_Gd1_Te2--GGA_Boltztrap--2596": "WAITING",
        "Cu1_Gd1_Te2--GGA_static_v2--2606": "COMPLETED",
        "Cu1_Gd1_Te2--VASP_db_insertion--2607": "FIZZLED"
    },
    "created_on": "2016-05-06T02:17:37.076000",
    "launch_dirs": {
        "Cu1_Gd1_Te2--Add_to_SNL_database--469": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-03-03-23-617978"
        ],
        "Cu1_Gd1_Te2--GGA_optimize_structure_(2x)--470": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-03-03-58-948288"
        ],
        "Cu1_Gd1_Te2--VASP_db_insertion--471": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-08-40-426531"
        ],
        "Cu1_Gd1_Te2--Controller_add_Electronic_Structure_v2--472": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-08-50-754977"
        ],
        "Cu1_Gd1_Te2--GGA_static_v2--2590": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-09-31-230718"
        ],
        "Cu1_Gd1_Te2--VASP_db_insertion--2591": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-27-58-486517"
        ],
        "Cu1_Gd1_Te2--GGA_Uniform_v2--2592": [],
        "Cu1_Gd1_Te2--VASP_db_insertion--2593": [],
        "Cu1_Gd1_Te2--GGA_band_structure_v2--2594": [],
        "Cu1_Gd1_Te2--VASP_db_insertion--2595": [],
        "Cu1_Gd1_Te2--GGA_Boltztrap--2596": [],
        "Cu1_Gd1_Te2--GGA_static_v2--2606": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-28-33-768387"
        ],
        "Cu1_Gd1_Te2--VASP_db_insertion--2607": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-10-14-01-49-524562"
        ]
    },
    "updated_on": "2016-05-10T14:01:55.656000"
}
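
A report like the one above can be filtered for the fireworks that actually failed. This is only a sketch, assuming the output has been saved as JSON with the same "states" and "launch_dirs" keys; the helper name `fizzled_launch_dirs` and the trimmed sample are hypothetical.

```python
import json

# Trimmed copy of the report above, reduced to three fireworks.
REPORT_JSON = """
{
    "name": "Cu1 Gd1 Te2--2592",
    "state": "FIZZLED",
    "states": {
        "Cu1_Gd1_Te2--GGA_Uniform_v2--2592": "WAITING",
        "Cu1_Gd1_Te2--GGA_static_v2--2606": "COMPLETED",
        "Cu1_Gd1_Te2--VASP_db_insertion--2607": "FIZZLED"
    },
    "launch_dirs": {
        "Cu1_Gd1_Te2--GGA_Uniform_v2--2592": [],
        "Cu1_Gd1_Te2--GGA_static_v2--2606": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-28-33-768387"
        ],
        "Cu1_Gd1_Te2--VASP_db_insertion--2607": [
            "/lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-10-14-01-49-524562"
        ]
    }
}
"""


def fizzled_launch_dirs(report):
    """Map each FIZZLED firework to its last listed launch directory (or None)."""
    result = {}
    for name, state in report["states"].items():
        if state == "FIZZLED":
            dirs = report["launch_dirs"].get(name, [])
            result[name] = dirs[-1] if dirs else None
    return result


fizzled = fizzled_launch_dirs(json.loads(REPORT_JSON))
for name, launch_dir in fizzled.items():
    print(name, launch_dir)
```

For this workflow the only FIZZLED entry is the VASP_db_insertion firework 2607, whose launch directory is the one dated 2016-05-10; the log files shown next come from the directory of the GGA_static_v2 firework 2606.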
more /lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-28-33-768387/Cu1_Gd1_Te2--GGA_sta-16114.*
::::::::::::::
Cu1_Gd1_Te2--GGA_sta-16114.error
::::::::::::::
INFO:launchpad:Task completed: Unconverged Handler Task
INFO:launchpad:Task started: Vasp Custodian Task.
/lustre/home/umjzhh-1/kl_me2/codes/pymatgen/pymatgen/io/vasp/outputs.py:389: UnconvergedVASPWarning: vasprun.xml is an unconverged VASP run.
Electronic convergence reached: False.
Ionic convergence reached: True.
  warnings.warn(msg, UnconvergedVASPWarning)
::::::::::::::
Cu1_Gd1_Te2--GGA_sta-16114.out
::::::::::::::
kl_me2 environment
2016-05-07 12:28:36,091 INFO Hostname/IP lookup (this will take a few seconds)
2016-05-07 12:28:36,094 INFO Launching Rocket
2016-05-07 12:28:38,430 DEBUG Querying for duplicates, fw_id: 2606
2016-05-07 12:28:38,441 DEBUG Verifying for duplicates, fw_ids: 2606, 2590
2016-05-07 12:28:38,448 DEBUG Verifying for duplicates, fw_ids: 2606, 2606
2016-05-07 12:28:38,464 DEBUG FW with id: 2606 is unique!
2016-05-07 12:28:38,467 DEBUG Created/updated Launch with launch_id: 1230
2016-05-07 12:28:38,505 DEBUG Checked out FW with id: 2606
2016-05-07 12:28:38,515 INFO RUNNING fw_id: 2606 in directory: /lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-28-33-768387
2016-05-07 12:28:38,526 INFO Task started: Vasp Copy Task.
COPYING /lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-09-31-230718/INCAR.gz INCAR.gz
COPYING /lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-09-31-230718/KPOINTS.gz KPOINTS.gz
COPYING /lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-09-31-230718/POSCAR.gz POSCAR.gz
COPYING /lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-09-31-230718/POTCAR.gz POTCAR.gz
COPYING /lustre/home/umjzhh-1/launcher/layered_material/ycute2/substitution_01stRun/block_2016-05-06-02-19-20-552442/launcher_2016-05-07-04-09-31-230718/CONTCAR.gz CONTCAR.gz
2016-05-07 12:28:38,625 INFO Task completed: Vasp Copy Task
2016-05-07 12:28:38,625 INFO Task started: Unconverged Handler Task.
2016-05-07 12:28:38,719 INFO Task completed: Unconverged Handler Task
2016-05-07 12:28:38,721 INFO Task started: Vasp Custodian Task.

On Wednesday, May 11, 2016 at 12:28:47 AM UTC+8, Anubhav Jain wrote:

Anubhav Jain

May 10, 2016, 2:14:05 PM
to fireworkflows
It looks like your VASP run is unconverged and (likely) the Unconverged Handler Task is throwing an error. This has nothing to do with FireWorks.
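
The convergence flags can be pulled out of a launcher's .error file with a few lines of Python. This is a rough sketch, assuming the warning text has the same shape as the excerpt posted earlier in this thread; the helper name `convergence_flags` is hypothetical.

```python
import re

# Matches lines such as "Electronic convergence reached: False."
FLAG_PATTERN = re.compile(r"(Electronic|Ionic) convergence reached: (True|False)")


def convergence_flags(log_text):
    """Return e.g. {'electronic': False, 'ionic': True} from a log excerpt."""
    return {
        kind.lower(): value == "True"
        for kind, value in FLAG_PATTERN.findall(log_text)
    }


# Excerpt copied from the .error file quoted earlier in the thread.
sample_log = """\
INFO:launchpad:Task started: Vasp Custodian Task.
UnconvergedVASPWarning: vasprun.xml is an unconverged VASP run.
Electronic convergence reached: False.
Ionic convergence reached: True.
"""

flags = convergence_flags(sample_log)
print(flags)
```

If the electronic flag is False, the problem lies with the VASP calculation settings rather than with the workflow engine.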

Please review the revised policies for posting to this list. Note that in the future, such messages will be rejected.

specter119

May 10, 2016, 9:42:09 PM
to fireworkflows
Thank you, Anubhav Jain. I will review the revised policies carefully.

On Wednesday, May 11, 2016 at 2:14:05 AM UTC+8, Anubhav Jain wrote: