Running from command line


Matt K.

Sep 27, 2012, 2:43:10 PM
to sate...@googlegroups.com
I have a few questions about running 2.2.4 from the command line.
1. When I run it, I get "Refused to clean" errors, usually after step 1. (MacOS 10.7.5, Python 2.7.2 as supplied by Apple)
Refused to clean '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r0': not created by SATe
Refused to clean '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r1': not created by SATe
Refused to clean '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r2': not created by SATe
SATe ERROR: SATe is exiting because of an error:
Path exists: '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r2/d1'
I get similar errors on a CentOS cluster. There's nothing wrong with the permissions of the folders SATe is trying to remove; I can remove them manually from the shell after the run quits. I also specified a different directory for temporary files and got the same result.

2. I believe there's an error in the configuration file example given on the download page. The [sate] section needs to be [SATe] or you get an error that the config file doesn't have the proper headers. The error: 'The file "config" does not appear to be a valid configuration file format. It lacks section headers.'

3. Could you tell me more about how the config file is set up? Can I include the input file names in this file, or is that always done on the command line? I've noticed that if I put a value for temporaries (e.g. "temporaries = tmp/") into the config file, it is ignored without an error, but that setting does work if given on the command line. Are only some options read? Also, what other section headers are valid besides those shown in the sample?

Thank you,
Matt K

Jamie Oaks

Sep 29, 2012, 1:20:58 AM
to sate...@googlegroups.com
Hi Matt,

> I have a few questions about running 2.2.4 from the command line.
> 1. When I run it, I get "Refused to clean" errors, usually after step 1. (MacOS 10.7.5, Python 2.7.2 as supplied by Apple)
> Refused to clean '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r0': not created by SATe
> Refused to clean '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r1': not created by SATe
> Refused to clean '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r2': not created by SATe
> SATe ERROR: SATe is exiting because of an error:
> Path exists: '/Users/mk/.sate/tree/tempQyUgkp/step0/centroid/r2/d1'
> I get similar errors on a CentOS cluster. There's nothing wrong with the permissions of the folders SATe is trying to remove; I can remove them manually from the shell after the run quits. I also specified a different directory for temporary files and got the same result.

Can you please send me the exact command line invocation you are using when you are getting this message?  The good news is that this error will have no effect on the results.  After the analysis, when SATe is trying to clean up after itself (i.e., remove all the temporary files it created), it is refusing to delete some of the temp directories it created during the analysis.  In an effort to prevent SATe from deleting anything important on users' file systems, we have forced it to be very conservative when it comes to deleting things.  If you send me the command line invocation, I'll see if I can replicate the error messages and figure out the cause.
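
As an illustration of the conservative clean-up policy described above, here is a minimal Python sketch. It is not SATe's actual code: the class name TempDirRegistry and its methods are hypothetical, and the only idea taken from the thread is "delete a directory only if this program created it, otherwise refuse and report".

```python
import os
import shutil


class TempDirRegistry:
    """Hypothetical helper: remembers directories it created and deletes only those."""

    def __init__(self):
        self._created = set()

    def create(self, parent, name):
        # Record the resolved path so later lookups are not fooled by symlinks.
        path = os.path.realpath(os.path.join(parent, name))
        os.makedirs(path)
        self._created.add(path)
        return path

    def clean(self, path):
        real = os.path.realpath(path)
        if real not in self._created:
            # Conservative choice: never delete something we did not create.
            print("Refused to clean %r: not created by this registry" % path)
            return False
        shutil.rmtree(real)
        self._created.discard(real)
        return True
```

With a policy like this, a directory that was never registered is left on disk and only a message is printed, which matches the harmless-but-noisy behaviour reported here.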


> 2. I believe there's an error in the configuration file example given on the download page. The [sate] section needs to be [SATe] or you get an error that the config file doesn't have the proper headers. The error: 'The file "config" does not appear to be a valid configuration file format. It lacks section headers.'

SATe configuration files are case sensitive, and all of the section headings and options should be lowercase. So "[sate]" should work, but "[SATe]" should not. I suspect something else in the configuration file is causing the error. Are you referring to the sample configuration file shown on the SATe webpage? I just looked it over, and there are several things wrong with it. Sorry about this! I will update the webpage ASAP. For example, all of the tool section headers like "[mafft aligner]" should contain only the tool name, like "[mafft]". As it is now, none of those tool sections would be recognized by SATe. I will get a better example up there very soon.
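
To make the case-sensitivity point concrete, here is a small Python 3 sketch using the standard configparser module. Whether SATe's own parser behaves identically is an assumption (SATe 2.2.4 runs on Python 2, and its parsing code is not shown in this thread); the helper names has_section and has_any_section_header are made up for illustration.

```python
import configparser


def has_section(text, name):
    """Return True if the config text contains a section with exactly this name."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Section names are matched case-sensitively by configparser.
    return parser.has_section(name)


def has_any_section_header(text):
    """Return False if options appear before any [section] header."""
    try:
        configparser.ConfigParser(interpolation=None).read_string(text)
    except configparser.MissingSectionHeaderError:
        return False
    return True
```

Under these assumptions, a file containing "[sate]" answers True for "sate" and False for "SATe", a file containing "[mafft aligner]" has no "mafft" section, and a file that starts with "aligner = mafft" and has no header at all fails the second check.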


> 3. Could you tell me more about how the config file is set up? Can I include the input file names in this file, or is that always done on the command line? I've noticed that if I put a value for temporaries (e.g. "temporaries = tmp/") into the config file, it is ignored without an error, but that setting does work if given on the command line. Are only some options read? Also, what other section headers are valid besides those shown in the sample?

Every time SATe runs, it writes out a configuration file named something like "satejob_temp_sate_config.txt". This file contains all of the settings of the current analysis (I am pasting an example at the bottom of this e-mail), so it is a great way to get a sample config file that you can then modify as needed. You can specify every SATe option via the config file. The config file created by SATe is comprehensive; that is, it contains fields for all options. You must specify each option under the correct section header, but other than that, order does not matter. For example, to specify the input and the temporaries directory you would need something like:

[commandline]
input = path/to/input/data.fasta
temporaries = tmp/
multilocus = False

OR

[commandline]
input = path/to/input_directory/
temporaries = tmp/
multilocus = True

If multilocus is false, the path to the fasta data file needs to be specified, but if it is true, then the path to the directory containing the multiple fasta files (which must have a ".fasta" or ".fas" extension to be recognized) needs to be specified.
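
The rule above can be sketched in Python 3 as a quick pre-flight check on a config file. This is an illustration only, not SATe's code: the function name resolve_inputs is hypothetical, and treating the ".fasta"/".fas" match as case-sensitive is an assumption.

```python
import configparser
import os

# Extensions named in the thread; case-sensitive matching is an assumption.
FASTA_EXTENSIONS = (".fasta", ".fas")


def resolve_inputs(config_text):
    """Return the list of input files implied by the [commandline] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["commandline"]
    path = section["input"]
    if section.getboolean("multilocus", fallback=False):
        # Multilocus: 'input' must be a directory of FASTA files.
        if not os.path.isdir(path):
            raise ValueError("multilocus = True but %r is not a directory" % path)
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith(FASTA_EXTENSIONS)
        )
    # Single locus: 'input' must be one data file.
    if not os.path.isfile(path):
        raise ValueError("multilocus = False but %r is not a file" % path)
    return [path]
```

Running a check like this before launching a long analysis catches the common mistake of pointing "input" at a directory while leaving multilocus set to False, or the reverse.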

SATe has default settings for almost all of its options.  Thus, many mistakes in the config file will be ignored and the default setting used.  We need to improve the messaging of SATe to warn users when it finds unrecognized sections and/or options in the configuration file.  For now, it's always a good idea to look at the configuration file generated by SATe to know exactly what the analysis settings were.
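
Until such warnings exist, one workaround is to compare a hand-written config against the config file SATe generated. The Python 3 sketch below does that with the standard configparser module; the function name unrecognized_entries is hypothetical, and using the generated file as the list of valid sections and options is an assumption based on the statement above that it contains fields for all options.

```python
import configparser


def _load(text):
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names as written so the comparison is case-sensitive.
    parser.optionxform = str
    parser.read_string(text)
    return parser


def unrecognized_entries(user_text, reference_text):
    """List sections and options in the user config that the reference lacks."""
    user = _load(user_text)
    reference = _load(reference_text)
    problems = []
    for section in user.sections():
        if not reference.has_section(section):
            problems.append("[%s]" % section)
            continue
        for option in user.options(section):
            if not reference.has_option(section, option):
                problems.append("[%s] %s" % (section, option))
    return problems
```

Given a reference with "[commandline]", "[sate]" and "[mafft]" sections, a user file containing "[SATe]", "[mafft aligner]" or a misspelled option would each be reported, while a correctly placed "temporaries" under "[commandline]" would not.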

Many thanks for posting your issues.  Hopefully some of this information will help.  We will get the sample configuration file on the webpage fixed very soon, and we will also do our best to fix the annoying clean-up error message you are getting.

Cheers,

Jamie


Here's a sample config file:

[commandline]
aligned = False
auto = False
datatype = dna
input = .
job = satejob
keepalignmenttemps = True
keeptemp = True
multilocus = True
raxml_search_after = False
temporaries = sateout
timesfile = time.txt
treefile = starting.tre
two_phase = False
untrusted = False

[sate]
after_blind_iter_term_limit = -1
after_blind_iter_without_imp_limit = 1
after_blind_time_term_limit = -1.0
after_blind_time_without_imp_limit = -1.0
aligner = mafft
blind_after_iter_without_imp = -1
blind_after_time_without_imp = -1.0
blind_after_total_iter = -1
blind_after_total_time = -1.0
blind_mode_is_final = True
break_strategy = longest
iter_limit = -1
iter_without_imp_limit = -1
max_mem_mb = 2048
max_subproblem_frac = 0.5
max_subproblem_size = 75
merger = muscle
move_to_blind_on_worse_score = True
num_cpus = 1
output_directory = sateout
return_final_tree_and_alignment = False
start_tree_search_from_current = True
time_limit = -1.0
time_without_imp_limit = -1.0
tree_estimator = fasttree

[clustalw2]
path = /Users/jamieoaks/projects/sate/sate-core/bin/clustalw2

[fakealigner]
path = 

[faketree]
path = 

[fasttree]
args = 
model = 
options = 
path = /Users/jamieoaks/projects/sate/sate-core/bin/fasttree

[mafft]
path = /Users/jamieoaks/projects/sate/sate-core/bin/mafft

[muscle]
path = /Users/jamieoaks/projects/sate/sate-core/bin/muscle

[opal]
path = /Users/jamieoaks/projects/sate/sate-core/bin/opal.jar

[padaligner]
path = 

[prank]
path = /Users/jamieoaks/projects/sate/sate-core/bin/prank

[probalign]
path = /Users/jamieoaks/projects/sate/sate-core/bin/probalign

[randtree]
path = 

[raxml]
args = 
model = 
path = /Users/jamieoaks/projects/sate/sate-core/bin/raxml

Matt K.

Oct 1, 2012, 9:50:57 AM
to sate...@googlegroups.com
Thanks Jamie,
1. Good to hear the "Refused to clean" message isn't affecting the run. I just tried running with -k to keep the temporary files and then there were no errors. That's enough of a solution for me for now.

Here's the complete input/output with the settings that give the error. As I mentioned before, it also happens when I redirect the temporary folder location and when I run on a Linux system (the output below is from a Mac desktop).

$ python ../sate-core/run_sate.py -i small.fasta -t small.tree -j test --auto
SATe INFO: Reading input sequences from 'small.fasta'...
SATe INFO: Configuration written to "/Users/mk/Downloads/satesrc-v2.2.4-2012Jul18/test/test_temp_sate_config.txt".

SATe INFO: Reading input sequences from 'small.fasta'...
SATe INFO: Directory for temporary files created at /Users/mk/.sate/test/tempYd19V7
SATe INFO: Reading starting trees from "small.tree"...
SATe INFO: Name translation information saved to /Users/mk/Downloads/satesrc-v2.2.4-2012Jul18/test/test_temp_name_translation.txt as safe name, original name, blank line format.
SATe INFO: Starting SATe algorithm on initial tree...
SATe INFO: Step 0. Realigning with decomposition strategy set to centroid
SATe INFO: Step 0. Alignment obtained. Tree inference beginning...
SATe INFO: realignment accepted and score improved.
SATe INFO: current score: -29481.808, best score: -29481.808
SATe INFO: Step 1. Realigning with decomposition strategy set to centroid
SATe INFO: Step 1. Alignment obtained. Tree inference beginning...
SATe INFO: realignment accepted and despite the score not improving.
SATe INFO: current score: -29481.855, best score: -29481.808
SATe INFO: Writing resulting alignment to /Users/mk/Downloads/satesrc-v2.2.4-2012Jul18/test/test.marker001.small.aln
SATe INFO: Writing resulting tree to /Users/mk/Downloads/satesrc-v2.2.4-2012Jul18/test/test.tre
SATe INFO: Writing resulting likelihood score to /Users/mk/Downloads/satesrc-v2.2.4-2012Jul18/test/test.score.txt
SATe INFO: The resulting alignment (with the names in a "safe" form) was first written as the file "/Users/mk/Downloads/satesrc-v2.2.4-2012Jul18/test/test_temp_iteration_0_seq_alignment.txt"
SATe INFO: The resulting tree (with the names in a "safe" form) was first written as the file "/Users/mk/Downloads/satesrc-v2.2.4-2012Jul18/test/test_temp_iteration_0_tree.tre"
Refused to clean '/Users/mk/.sate/test/tempYd19V7/step0/centroid/r0': not created by SATe
Refused to clean '/Users/mk/.sate/test/tempYd19V7/step0/centroid/r1': not created by SATe
Refused to clean '/Users/mk/.sate/test/tempYd19V7/step1/centroid/r0': not created by SATe
Refused to clean '/Users/mk/.sate/test/tempYd19V7/step1/centroid/r1': not created by SATe
SATe INFO: Total time spent: 24.9129729271s

2 and 3. Thank you for the advice on the configuration file settings. Looking at the "satejob_temp_sate_config.txt" file will surely answer my questions.

Matt

Matt K.

Oct 5, 2012, 2:38:33 PM
to sate...@googlegroups.com
I found the SATe tutorial which answered some of my basic questions: http://phylo.bio.ku.edu/software/sate/sate_tutorial.pdf

One question I still have from looking at the config files generated by the GUI: both max_subproblem_frac and max_subproblem_size are set to a value. Which one is used?

For example when I read in large.fasta in the GUI and have the program choose settings automatically, Max. Subproblem is set to Size with a value of 200. The config file then has both of these lines:
max_subproblem_size = 200
max_subproblem_frac = 0.5

If I change the GUI radio button for Max. Subproblem to Percentage and arbitrarily leave it at 50%, the config file has these lines:
max_subproblem_size = 3
max_subproblem_frac = 0.5

I realize that letting the program choose these values (--auto) is recommended for basic runs, and I also saw in the tutorial PDF the rule used for automatically selecting Max. Subproblem. However, we may choose to do some runs with a preset value, and I'm not sure how to set it correctly.

Thanks
Matt

Jamie Oaks

Oct 9, 2012, 11:41:05 PM
to sate...@googlegroups.com
Hi Matt,

Sorry for the delay, and thanks for posting the details associated with the "refused to clean" message.

Regarding "max_subproblem_size" and "max_subproblem_frac," SATe will use whichever option specifies more individuals (tips). For example, if your data file has 100 sequences, and you specify

max_subproblem_size = 51
max_subproblem_frac = 0.5

max_subproblem_size will be used, and the tree decomposition will continue until all subtrees have at most 51 tips.  Whereas if you specify

max_subproblem_size = 49
max_subproblem_frac = 0.5

max_subproblem_frac will be used, and the tree decomposition will continue until all subtrees have <= 50 tips.
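
The rule in these two examples can be written as a short Python sketch. This is a reading of the explanation in this message, not SATe's source code: the function name is hypothetical, truncating the fractional limit to an integer is an assumption, and breaking ties in favour of max_subproblem_size is arbitrary.

```python
def effective_max_subproblem(num_sequences, max_subproblem_size, max_subproblem_frac):
    """Return (winning option name, tip limit) under the 'larger limit wins' rule."""
    # Assumption: the fractional limit is truncated to a whole number of tips.
    limit_from_frac = int(num_sequences * max_subproblem_frac)
    if max_subproblem_size >= limit_from_frac:
        return ("max_subproblem_size", max_subproblem_size)
    return ("max_subproblem_frac", limit_from_frac)


# The two cases from this message, with 100 sequences:
print(effective_max_subproblem(100, 51, 0.5))  # ('max_subproblem_size', 51)
print(effective_max_subproblem(100, 49, 0.5))  # ('max_subproblem_frac', 50)
```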

However, while playing with these options to make sure I answered this correctly, I realized that SATe behaves oddly when processing them from the command line and GUI. If you specify only one of the options, and its value corresponds to fewer tips than the default of the other option, your value is ignored: because the default of the other option specifies more individuals, the default gets used instead. I will change this behavior so that it is more intuitive ASAP. In the meantime, it is safe to use a config file. Just be aware that the option that specifies the most individuals "wins". It is also safe to specify both options via the command line or GUI. Also, the "--auto" option adjusts both options so that they do not conflict.

Many thanks for your help in posting this issue.

Cheers,

Jamie
