Re: Run workflow from PC to server?


Joseph Montoya

Aug 6, 2018, 2:04:20 PM
to tayeb....@gmail.com, atomate
Hi Tayeb,

A few things:

1) Your error message for lpad.reset makes me think that there might be an issue with your mongo connection. Can you give us some more information? Have you confirmed that your database credentials work for connecting to the mongo server from the supercomputer? Also, can you post the full traceback for your errors?

2) It's likely possible to submit jobs from your own computer (i.e. via passwordless login), but this won't work if there's an issue connecting to the database as above.

3) The vasp_cmd shouldn't need to be set in the custodian code; it should be supplied as a parameter to the workflow constructor, or as an env_chk parameter in both the constructor and the fireworker file (see the sketch below).
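
For example, a minimal sketch of the env_chk approach (OptimizeFW is a standard atomate Firework; the structure file, db_file token, and launchpad config are placeholders):

from fireworks import LaunchPad, Workflow
from pymatgen.core import Structure
from atomate.vasp.fireworks.core import OptimizeFW

structure = Structure.from_file("POSCAR")  # placeholder input structure

# ">>vasp_cmd<<" is an env_chk token: at run time it is replaced by the
# "vasp_cmd" entry in the FireWorker's env (my_fworker.yaml), so the workflow
# built on your PC never hard-codes the cluster's VASP command.
fw = OptimizeFW(structure, vasp_cmd=">>vasp_cmd<<", db_file=">>db_file<<")
wf = Workflow([fw], name="optimization via env_chk")

lpad = LaunchPad.auto_load()  # reads my_launchpad.yaml (MongoDB credentials)
lpad.add_wf(wf)

On the cluster side, my_fworker.yaml would then carry the actual command in its env section, e.g. vasp_cmd: "srun vasp_std" (the launch command is site-specific).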

Best,
Joey

On Aug 6, 2018, at 1:03 AM, tayeb....@gmail.com wrote:

Hi atomate group, Hi Joey,


I think I have a restriction on running workflows on the supercomputer (lpad.reset is not working: OperationFailure: there are no users authenticated, even though the same notebook runs on my PC), so I am planning to prepare the workflow (notebook) on my PC and submit it to the server. Is this possible?

My attempt so far has been to create an ssh connection so I can submit a job without a password through the command line: "ssh m...@server.adress sbatch /lustre/home/path/atomate.slurm"


But I don't know how to configure this in atomate. My attempt was to set vasp_cmd in custodian/vasp/jobs.py, but it expects a path, and atomate prints: FileNotFoundError: [Errno 2] No such file or directory:

Appreciate any help.

Thank you.
Tayeb. Ph.D
QEERI.



tayeb....@gmail.com

Aug 9, 2018, 10:15:09 AM
to atomate
Thank you, Joseph,

I have moved away from that idea. The problem now is that when I use rapidfire from fireworks.core.rocket_launcher it starts properly and prepares the files, but I am using the SLURM queue system, so it will not work properly.

When I import rapidfire and launch_rocket_to_queue from fireworks.queue.queue_launcher, properly defining launchpad = LaunchPad(host='', port=, name='', username='', password=''),
I get this error:

2018-08-09 16:49:40,530 INFO getting queue adapter
2018-08-09 16:49:40,531 INFO Found previous block, using /lustre/home/elbentr81/atomate/block_2018-08-09-13-19-25-442029
2018-08-09 16:49:41,696 INFO The number of jobs currently in the queue is: 0
2018-08-09 16:49:41,696 INFO 0 jobs in queue. Maximum allowed by user: 0
2018-08-09 16:49:42,811 INFO Launching a rocket!
2018-08-09 16:49:43,913 INFO Created new dir /lustre/home/elbentr81/atomate/block_2018-08-09-13-19-25-442029/launcher_2018-08-09-13-49-43-912035
2018-08-09 16:49:43,913 INFO moving to launch_dir /lustre/home/elbentr81/atomate/block_2018-08-09-13-19-25-442029/launcher_2018-08-09-13-49-43-912035
/lustre/home/elbentr81/anaconda3/lib/python3.6/site-packages/fireworks/queue/queue_adapter.py:143: UserWarning: Key nnodes has been specified in qadapter but it is not present in template, please check template (/lustre/home/elbentr81/atomate/atomate.slurm) for supported keys.
  .format(subs_key, self.template_file))



Does this mean that it is not reading MongoDB? (lpad get_fws is working and gives me a list of the workflows.)


Best regards.
Tayeb.

Joseph Montoya

Aug 9, 2018, 10:26:41 AM
to tayeb....@gmail.com, atomate
Hi Tayeb,

The error "UserWarning: Key nnodes has been specified in qadapter but it is not present in template” is saying that there’s a line in your qadapter specifying “nnodes”, but the slurm template uses “nodes” instead.  Can you look in your qadapter file, if there’s a line that specifies nnodes, change it to nodes.  You can see the slurm template here:


Best,
Joey



tayeb....@gmail.com

Aug 10, 2018, 11:13:46 AM
to atomate
Thank you, the warning is solved. The problem I am having now is that FireWorks does not generate the VASP input files when I use fireworks.queue.queue_launcher. I have moved the question to the fireworkflows group:
https://groups.google.com/d/msgid/fireworkflows/5f618fe9-b6fa-43d9-99a6-b0ada1b8f5cd%40googlegroups.com?utm_medium=email&utm_source=footer



Thank you.
Tayeb.

tayeb....@gmail.com

Aug 10, 2018, 1:56:30 PM
to atomate
Dear Dr. Montoya,

The log looks OK: the job starts, then crashes after VASP starts and cannot find the INCAR, POTCAR, etc. The job directory after the crash contains only: atomate-2562062.error, block_2018-08-10-15-08-49-829801, vasp.log, atomate-2562062.out, FW_submit.script.

With rapidfire from fireworks.core.rocket_launcher, on the other hand, I can find all the VASP files (POSCAR, INCAR, ...).

The Python script I am running:

from fireworks import Firework, FWorker, LaunchPad
from fireworks.queue.queue_launcher import rapidfire
from atomate.vasp.firetasks.run_calc import RunVaspCustodian, RunVaspFake, RunVaspDirect, RunNoVasp
from fireworks.utilities.fw_serializers import load_object_from_file

# from atomate.vasp.powerups import remove_custodian
queueadapter = load_object_from_file("/lustre/home/elbentr81/auto-dir/my_qadapter.yaml")
launchpad = LaunchPad(host='ds247061.mlab.com', port=***, name='tatitechno', username='tayeb', password='****')

if __name__ == "__main__":
    rapidfire(launchpad, FWorker(), queueadapter, launch_dir='.', nlaunches=2, njobs_queue=4,
              njobs_block=500, sleep_time=None, reserve=False, strm_lvl='INFO', timeout=None,
              fill_mode=False)


I get:

/lustre/home/elbentr81/anaconda3/lib/python3.6/site-packages/pymatgen/__init__.py:35: UserWarning: With effect from pmg 5.0, all pymatgen settings are prefixed with a "PMG_". E.g., "PMG_VASP_PSP_DIR" instead of "VASP_PSP_DIR".
  warnings.warn('With effect from pmg 5.0, all pymatgen settings are'
2018-08-10 20:15:07,039 INFO getting queue adapter
2018-08-10 20:15:07,039 INFO Found previous block, using /lustre/home/elbentr81/atomate/block_2018-08-10-15-07-03-101203
2018-08-10 20:15:08,232 INFO The number of jobs currently in the queue is: 0
2018-08-10 20:15:08,233 INFO 0 jobs in queue. Maximum allowed by user: 0
2018-08-10 20:15:09,349 INFO Launching a rocket!
2018-08-10 20:15:10,452 INFO Created new dir /lustre/home/elbentr81/atomate/block_2018-08-10-15-07-03-101203/launcher_2018-08-10-17-15-10-451541
2018-08-10 20:15:10,452 INFO moving to launch_dir /lustre/home/elbentr81/atomate/block_2018-08-10-15-07-03-101203/launcher_2018-08-10-17-15-10-451541
2018-08-10 20:15:10,454 INFO submitting queue script
2018-08-10 20:15:11,625 INFO Job submission was successful and job_id is 2562066
2018-08-10 20:15:11,626 INFO Sleeping for 5 seconds...zzz...
2018-08-10 20:15:17,736 INFO Launching a rocket!
2018-08-10 20:15:18,842 INFO Created new dir /lustre/home/elbentr81/atomate/block_2018-08-10-15-07-03-101203/launcher_2018-08-10-17-15-18-841873
2018-08-10 20:15:18,842 INFO moving to launch_dir /lustre/home/elbentr81/atomate/block_2018-08-10-15-07-03-101203/launcher_2018-08-10-17-15-18-841873
2018-08-10 20:15:18,845 INFO submitting queue script
2018-08-10 20:15:20,067 INFO Job submission was successful and job_id is 2562067
2018-08-10 20:15:20,068 INFO Launched allowed number of jobs: 2
###########################################
atomate-2562062.out 
<<FW_ auto Tayeb>>
2018-08-10 18:08:49,828 INFO getting queue adapter
2018-08-10 18:08:49,830 INFO Created new dir /lustre/home/elbentr81/atomate/block_2018-08-10-15-07-03-101203/launcher_2018-08-10-15-08-45-061561/block_2018-08-10-15-08-49-829801
2018-08-10 18:08:49,853 INFO The number of jobs currently in the queue is: 1
2018-08-10 18:08:49,854 INFO 1 jobs in queue. Maximum allowed by user: 4
OMP_NUM_THREADS= 1
######################################
 atomate-2562062.error
/lustre/home/elbentr81/anaconda3/lib/python3.6/site-packages/pymatgen/__init__.py:35: UserWarning: With effect from pmg 5.0, all pymatgen settings are prefixed with a "PMG_". E.g., "PMG_VASP_PSP_DIR" instead of "VASP_PSP_DIR".
  warnings.warn('With effect from pmg 5.0, all pymatgen settings are'



Thank you very much for your help!


Best regards.
Tayeb.


On Monday, August 6, 2018 at 9:04:20 PM UTC+3, Joseph Montoya wrote:

Anubhav Jain

Aug 10, 2018, 11:20:09 PM
to atomate
Hi Tayeb,

It looks like you have this same issue posted on the FireWorks forum. Let's continue with the issue here rather than there.

- First, it is strange that you have this directory structure:

/lustre/home/elbentr81/atomate/block_2018-08-10-15-07-03-101203/launcher_2018-08-10-15-08-45-061561/block_2018-08-10-15-08-49-829801

It looks like you are running qlaunch rapidfire inside of a launcher directory, which is very unusual.

- Second, there should be more output files in the actual directory that the job ran in.

If I had to guess, I would say that your "my_qadapter.yaml" file is set up incorrectly - perhaps the "rocket_launch" command is set to "qlaunch" instead of "rlaunch". Can you attach the my_qadapter.yaml file that you have?
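
For reference, a typical my_qadapter.yaml for SLURM looks roughly like this (paths and resource values are placeholders; the key point is that rocket_launch calls rlaunch, not qlaunch):

_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch rapidfire
nodes: 1
walltime: '24:00:00'
queue: null
account: null
job_name: null
pre_rocket: null
post_rocket: null
logdir: /path/to/logs
# if you use a custom template such as your atomate.slurm, it would also have:
# _fw_template_file: /lustre/home/elbentr81/atomate/atomate.slurm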

tayeb....@gmail.com

Aug 12, 2018, 1:29:38 PM
to atomate
Dear Dr. Anubhav,

Thank you for your help. What you said about qlaunch was right: I was using qlaunch after rlaunch didn't work (and this was wrong). I made a fresh installation, due to some modifications in fireworks/atomate/custodian during my learning curve; the calculations are starting now and running fine (for most of them), and I found the error that was causing VASP to start without finding the INCAR file:
pymongo.errors.ServerSelectionTimeoutError: [Errno 104] Connection reset by peer

So it's a connection problem between the supercomputer and MongoDB. I am trying to set up a connection between a workstation (with a local MongoDB) and the supercomputer to avoid this problem.
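
(A quick way to check whether a given machine can reach and authenticate against the database; all connection details below are placeholders:)

from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Placeholder URI: substitute the real port and password.
uri = "mongodb://tayeb:PASSWORD@ds247061.mlab.com:12345/tatitechno"
client = MongoClient(uri, serverSelectionTimeoutMS=5000)

try:
    client.admin.command("ping")                         # reachability check (no auth required)
    print(client["tatitechno"].list_collection_names())  # requires valid credentials
except PyMongoError as exc:
    print("connection/authentication failed:", exc)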

Thank you, and thanks to Dr. Montoya, for the prompt support in this ticket.

Appreciate it.
Tayeb.