Help with Rosetta on the cloud


Rick Baker

Mar 9, 2018, 1:40:28 PM
to Cryo-EM in the cloud
Hi Mike,

I have some questions about using your cloud Rosetta tools. I am following the protocol from http://cryoem-tools.cloud/rosetta-aws/

For the most part it is very helpful, but it gets confusing about how to set this up when my model has multiple chains.

I have a 4 Å map that I want to model. The complex has 5 chains. I successfully made the .fasta and .hrr files and ran your first cloud command, rosetta_refinement_on_aws.py. I ran this command 5 separate times, each time with the fasta file that contains all five chains and an .hrr file that contains the sequence alignment for a single chain. This seems to run fine, and I get a folder for each chain that contains various models for that chain.

So now I have 25 pdb files (I've changed the base name for each for clarity):
chaina_201.pdb     chaina_202.pdb     chaina_203.pdb     chaina_204.pdb     chaina_205.pdb
chainb_201.pdb     chainb_202.pdb     chainb_203.pdb     chainb_204.pdb     chainb_205.pdb
chainc_201.pdb     chainc_202.pdb     chainc_203.pdb     chainc_204.pdb     chainc_205.pdb
chaind_201.pdb     chaind_202.pdb     chaind_203.pdb     chaind_204.pdb     chaind_205.pdb
chaine_201.pdb     chaine_202.pdb     chaine_203.pdb     chaine_204.pdb     chaine_205.pdb

From what I gather, I need to combine these 25 files into 5 files, grouped by their weight (i.e. all 201 together, all 202 together, etc.). This will give me 5 pdbs, each containing one copy of chains a, b, c, d, and e. So, the "Weight 1" model will be a concatenation of chaina_201.pdb, chainb_201.pdb, chainc_201.pdb, chaind_201.pdb, and chaine_201.pdb.

My primary concern is how to make this large pdb file, which contains all 5 chains. Should I concatenate using the command line? Or should I dock all of the models in Chimera and then save a new pdb relative to the map? Chimera seems to not change much in the PDB file, but it does add some extra lines. I'm mainly concerned because your instructions are very clear that the PDB file needs to be made in a specific way, while the output from rosetta_refinement_on_aws.py seems to indicate that the pdb files need to be docked and saved with new coordinates relative to the map.

Finally, some of my chains have the same label. Is it important for each chain to have a unique label id? I'm assuming yes and am editing this using Chimera.

Thanks!
Rick
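
One way to assemble the per-weight, multi-chain files described above is sketched below in Python. It is not taken from the tutorial: the function name merge_chains is hypothetical, and the sketch makes three assumptions. It keeps only ATOM/HETATM records (so each file's REMARK header is dropped, which may or may not be what the downstream scripts expect), it writes a TER record after each chain, and it assigns chain IDs A, B, C, ... in input order so that no two chains share a label.

```python
from typing import List, Sequence

# Only coordinate records are carried over (assumption: headers are not needed).
COORD_RECORDS = ("ATOM", "HETATM")


def merge_chains(chain_texts: Sequence[str],
                 chain_ids: str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ") -> str:
    """Concatenate single-chain PDB texts into one multi-chain PDB text."""
    if len(chain_texts) > len(chain_ids):
        raise ValueError("more chains than available chain IDs")
    merged: List[str] = []
    for chain_id, text in zip(chain_ids, chain_texts):
        for line in text.splitlines():
            if line.startswith(COORD_RECORDS):
                # The chain ID lives in column 22 of a PDB coordinate record
                # (index 21 in a Python string).
                line = line.ljust(22)
                merged.append(line[:21] + chain_id + line[22:])
        # Close each chain with TER instead of the per-file END.
        merged.append("TER")
    merged.append("END")
    return "\n".join(merged) + "\n"
```

For the weight 201 model this would be called with the contents of chaina_201.pdb through chaine_201.pdb, in that order. The sketch does not move any atoms, so whether the chains first need to be docked into the map is a separate question.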

Rick Baker

Mar 9, 2018, 2:04:33 PM
to Cryo-EM in the cloud
Related to making a multi-chain PDB file, what about the header/ remark?

Each PDB file created from the .hrr file has a single remark at the beginning, which is the sequence of the chain. How should I treat this remark when I concatenate the files?

When I write out all 5 pdb files in Chimera, it only includes the header from a single file. If I manually include all 5, then how does it work with chain identifiers?


Rick Baker

Mar 9, 2018, 4:09:31 PM
to Cryo-EM in the cloud
I went ahead and made the PDB files as best I could. I just tried to submit to the cloud for CM and I got this error:

~/AWS/cryoem-cloud-tools-master/rosetta/rosetta_refinement_on_aws.py --em_map=cryosparc_exp001160_004_sharp_zflip_110_box.mrc --fasta=AP2-NECAP.fasta --AMI=ami-ae0784ce --pdb_list=pdb_list.txt --outdir=CM_run1



Starting Rosetta model refinement in the cloud ...


Traceback (most recent call last):
  File "/home/ribaker/AWS/cryoem-cloud-tools-master/rosetta/rosetta_refinement_on_aws.py", line 320, in <module>
    cmd='%s/rosetta_prepare_input_files.py --pdb_list=%s --em_map=%s --fasta=%s --outdir=%s/ %s'  %(rosettadir,pdb_list,params['em_map'], params['fasta'],params['outdir'])
TypeError: not enough arguments for format string

Rick Baker

Mar 9, 2018, 4:26:36 PM
to Cryo-EM in the cloud
So I think there might be a bug in the code. On line 320 the format string contains 6 %s placeholders (the --outdir flag is followed by two of them), but only 5 variables are listed. If you look at line 314 you can see the same formatting, but it seems to be correct on line 309. I changed the line of code and the error went away.
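
The mismatch can be reproduced in isolation. The sketch below uses the format string from the traceback with placeholder values (the paths and file names are made up for illustration) and shows one possible repair, dropping the trailing %s. Whether the actual fix removed the extra placeholder or supplied a sixth value is not stated in this thread.

```python
# Format string as it appears in the traceback: six %s placeholders.
template = ("%s/rosetta_prepare_input_files.py --pdb_list=%s --em_map=%s "
            "--fasta=%s --outdir=%s/ %s")

# Only five values are supplied (illustrative values, not the real ones).
values = ("/path/to/rosetta", "pdb_list.txt", "map.mrc", "seq.fasta", "CM_run1")

placeholders = template.count("%s")

try:
    cmd = template % values
    error = None
except TypeError as exc:
    # Python refuses to format when placeholders outnumber the values.
    error = str(exc)

# One possible repair (an assumption): drop the unused trailing placeholder.
fixed_template = ("%s/rosetta_prepare_input_files.py --pdb_list=%s --em_map=%s "
                  "--fasta=%s --outdir=%s/")
fixed_cmd = fixed_template % values
```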

Now I get this error. It seems to maybe be related to the name of the PDB files? I renamed my five PDB files to model_201.pdb, model_202.pdb, model_203.pdb, model_204.pdb, and model_205.pdb after aligning them to the map and re-writing each to include 5 chains of the full complex I'm trying to model.

 ~/AWS/cryoem-cloud-tools-master/rosetta/rosetta_refinement_on_aws.py --em_map=cryosparc_exp001160_004_sharp_zflip_110_box.mrc --fasta=AP2-NECAP.fasta --AMI=ami-ae0784ce --pdb_list=pdb_list.txt --outdir=CM_run1



Starting Rosetta model refinement in the cloud ...


Traceback (most recent call last):
  File "/home/ribaker/AWS/cryoem-cloud-tools-master//aws//../rosetta//rosetta_prepare_input_files.py", line 272, in <module>
    makeCMfile(params,outdir)
  File "/home/ribaker/AWS/cryoem-cloud-tools-master//aws//../rosetta//rosetta_prepare_input_files.py", line 203, in makeCMfile
    replace_weight = 'weight="%s"' %(splitPdb[1])
IndexError: list index out of range

Starting Rosetta job on 6 x c4.8xlarge virtual machines on AWS in region us-west-2a (initialization will take a few minutes)

Rick Baker

Mar 9, 2018, 5:17:21 PM
to Cryo-EM in the cloud
The script correctly started all of the c4.8xlarge instances and copied the files. The output says that the jobs were killed because of what it assumed were "bad inputs".

I attached rosetta.out and rosetta.err


rosetta.out
rosetta.err

Michael Cianfrocco

Mar 11, 2018, 8:55:29 PM
to Cryo-EM in the cloud
Hey Rick,

Thanks for all this useful info! I'm going through this myself right now with a multi-chain model. This might take me a few days but as I figure it out, I'll update here. 

Mike

Michael Cianfrocco

Mar 11, 2018, 10:24:44 PM
to Cryo-EM in the cloud
Hey Rick - 

While I'm still in the midst of running Rosetta-CM, I do know that 1) you were right about the bug on line 320, and I've fixed it in the GitHub repo; 2) your error is caused by the pdb_list.txt file missing the 'weight' information.

The information on the website was missing the fact that you need to add a '1' or '0' next to each pdb file that you are listing in the pdb_list.txt file. I've updated the website to include this. 

Hopefully that is the last bug  ¯\_(ツ)_/¯

Mike
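
A quick pre-flight check for the missing-weight problem described above can be sketched in Python. The exact layout of pdb_list.txt is an assumption here: the sketch supposes one entry per line, with the file name and the '1' or '0' weight separated by whitespace, which is consistent with the splitPdb[1] lookup in the traceback but is not confirmed by it. The function name check_pdb_list is hypothetical.

```python
from typing import List


def check_pdb_list(text: str) -> List[str]:
    """Return a list of problems found in the text of a pdb_list.txt file."""
    problems: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 2:
            # This is the case that would make splitPdb[1] fail.
            problems.append(f"line {lineno}: no weight after {fields[0]!r}")
        elif fields[1] not in ("0", "1"):
            problems.append(f"line {lineno}: weight {fields[1]!r} is not 0 or 1")
    return problems
```

An empty result means every listed file has a weight; any message points at the line that would otherwise fail once the job is already being prepared.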

Rick Baker

Mar 13, 2018, 1:03:52 PM
to Cryo-EM in the cloud
Thanks for the fast turn-around, Mike!

Putting the weight in the pdb_list.txt file fixed the error, and the job is now running on the cloud. We will see what happens in a few hours.

I have some more questions:

1. In Part II where you describe making a PDB file with multiple chains, you say that after replacing "END" with "TER", one should run relabechain.pl on the file. What is this? I don't see it in the Rosetta scripts in your package. 

2. Can you clarify whether the pdb files need to be docked into the mrc map and re-written with new coordinates, or whether I can simply use the files as-is from the initial Rosetta output? (This is from Part II of your tutorial.)

Rick Baker

Mar 13, 2018, 3:25:24 PM
to Cryo-EM in the cloud
So the job fired up and then stopped running within 30 minutes or so.  

A few things:

1. The AWS script seemed to stop following the job. My local rosetta.out file is nearly blank (it stops with the line 'Rosetta job submitted on AWS! Monitor output file: 2018-03-13-160012-Rosetta-CM/rosetta.out to check status of job'). Also, rosetta.err never got copied to my local directory. I had to log in to the AWS instances to find the actual rosetta.out and rosetta.err files.

2. Is the above problem due to some ssh error? Or permission error? I see this output in my command line after I submit the command:

[rick@figaro full_complex]$ ~/AWS/cryoem-cloud-tools-master/rosetta/rosetta_refinement_on_aws.py --em_map=cryosparc_exp001160_004_sharp_zflip_110_box.mrc --fasta=AP2-NECAP.fasta --AMI=ami-ae0784ce --pdb_list=pdb_list.txt



Starting Rosetta model refinement in the cloud ...


Starting Rosetta job on 6 x c4.8xlarge virtual machines on AWS in region us-west-2a (initialization will take a few minutes)


...uploading files to AWS ...

ssh  -o "StrictHostKeyChecking no" -q -n -f -i /home/ribaker/.aws/rick_oregon.pem ubu...@35.165.244.110 "export PATH=/usr/bin/$PATH && export PATH=/home/Rosetta/2017_08/main/source/:$PATH && /usr/local/bin/parallel -j36 ./run_final.sh {} ::: {1..36}> /home/ubuntu/rosetta.out 2> /home/ubuntu/rosetta.err < /dev/null &"

ssh  -o "StrictHostKeyChecking no" -q -n -f -i /home/ribaker/.aws/rick_oregon.pem ubu...@34.215.182.116 "export PATH=/usr/bin/$PATH && export PATH=/home/Rosetta/2017_08/main/source/:$PATH && /usr/local/bin/parallel -j36 ./run_final.sh {} ::: {1..36}> /home/ubuntu/rosetta.out 2> /home/ubuntu/rosetta.err < /dev/null &"

ssh  -o "StrictHostKeyChecking no" -q -n -f -i /home/ribaker/.aws/rick_oregon.pem ubu...@54.213.51.177 "export PATH=/usr/bin/$PATH && export PATH=/home/Rosetta/2017_08/main/source/:$PATH && /usr/local/bin/parallel -j36 ./run_final.sh {} ::: {1..36}> /home/ubuntu/rosetta.out 2> /home/ubuntu/rosetta.err < /dev/null &"

ssh  -o "StrictHostKeyChecking no" -q -n -f -i /home/ribaker/.aws/rick_oregon.pem ubu...@35.165.236.143 "export PATH=/usr/bin/$PATH && export PATH=/home/Rosetta/2017_08/main/source/:$PATH && /usr/local/bin/parallel -j36 ./run_final.sh {} ::: {1..36}> /home/ubuntu/rosetta.out 2> /home/ubuntu/rosetta.err < /dev/null &"

ssh  -o "StrictHostKeyChecking no" -q -n -f -i /home/ribaker/.aws/rick_oregon.pem ubu...@35.167.191.216 "export PATH=/usr/bin/$PATH && export PATH=/home/Rosetta/2017_08/main/source/:$PATH && /usr/local/bin/parallel -j36 ./run_final.sh {} ::: {1..36}> /home/ubuntu/rosetta.out 2> /home/ubuntu/rosetta.err < /dev/null &"

ssh  -o "StrictHostKeyChecking no" -q -n -f -i /home/ribaker/.aws/rick_oregon.pem ubu...@52.32.163.235 "export PATH=/usr/bin/$PATH && export PATH=/home/Rosetta/2017_08/main/source/:$PATH && /usr/local/bin/parallel -j36 ./run_final.sh {} ::: {1..36}> /home/ubuntu/rosetta.out 2> /home/ubuntu/rosetta.err < /dev/null &"


Rosetta job submitted on AWS! Monitor output file: 2018-03-13-160012-Rosetta-CM/rosetta.out to check status of job


3. I attached rosetta.err and rosetta.out. I only included the first 1000 and last 1000 lines of rosetta.out (it is 3 million lines long)
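
For a log of this size, the first and last N lines can be extracted in a single pass without loading the whole file. This is a minimal Python sketch of that truncation, not the method actually used here; the function name head_and_tail is hypothetical.

```python
from collections import deque
from itertools import islice
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def head_and_tail(lines: Iterable[T], n: int = 1000) -> Tuple[List[T], List[T]]:
    """Return the first n items and the last n of the remaining items."""
    it = iter(lines)
    head = list(islice(it, n))   # consume the first n lines
    tail = deque(it, maxlen=n)   # keep only the last n of what is left
    return head, list(tail)
```

Passing an open file object (for example, open("rosetta.out")) streams it line by line, so memory use stays bounded by 2 * n lines. If the file has fewer than n lines, the tail is empty rather than overlapping the head.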

I'll look into these errors and see what I can find. Indrajit doesn't know what any of the errors mean.

Rick
rosetta.err
rosetta.out.truncate

Rick Baker

Mar 13, 2018, 3:27:27 PM
to Cryo-EM in the cloud
Also, as the job was running, run.err only had a single error:

sh: 1: ps: not found

Rick Baker

Mar 13, 2018, 5:28:45 PM
to Cryo-EM in the cloud
I also see this output in the terminal that I launch from. Every few minutes or so it writes out a number, and then when the job is "over" I get the error at the end. Related to problems syncing files?

13
13
13
[the same value, 13, is printed 37 times in total]


Traceback (most recent call last):
  File "/home/ribaker/AWS/cryoem-cloud-tools-master//aws//../rosetta//rosetta_waiting.py", line 103, in <module>
    if float(numtot) > 25: 
ValueError: could not convert string to float: 
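
The empty value after the colon in this ValueError suggests that numtot was an empty string when float() was called. What numtot holds is not shown in this thread; presumably it is text captured from some status command. A defensive parse, sketched below in Python with a hypothetical helper name parse_count, would let a polling loop skip an empty or malformed reading instead of crashing.

```python
from typing import Optional


def parse_count(raw: str) -> Optional[float]:
    """Convert captured text to a number, or return None if it is unusable."""
    raw = raw.strip()
    if not raw:
        return None  # empty capture: this is the case float() rejects
    try:
        return float(raw)
    except ValueError:
        return None  # non-numeric capture


def exceeds(raw: str, threshold: float = 25) -> bool:
    """Mirror the 'float(numtot) > 25' test without raising on bad input."""
    count = parse_count(raw)
    return count is not None and count > threshold
```

This only hides the crash. Why the reading came back empty (for example, a failed remote command) would still need to be tracked down separately.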
