multiple trees in treefile


Felipe Barreto

Apr 29, 2013, 3:50:51 PM
to pamlso...@googlegroups.com
Hi,

I'm trying to run codeml across multiple data sets, and I've created an input file with multiple phylip alignments as well as a treefile with the tree topologies for the alignments, arranged in the same order as the alignments.  My goal is to perform runmode = 0 for each dataset.

When I try it, the first data set gets analyzed well, but it gives me an error when starting the second data set: "Species SIXb?"
That's the species listed first on the second tree. And yes, the ndata is set to the correct number of data sets.  Here's a copy of my tree file:

(FIVE,FOUR,(SIX,(THREE,(ONE,TWO))));
(SIXb,(THREEb,(ONEb,TWOb)),(FOURb,FIVEb));

The error still occurs when I switch the order of the data sets in both input files, with the error being, accordingly: "Species FIVE?"

If I run each data set individually, they both work.  But I want to be able to do this for thousands of data sets, so I need to solve this issue.

I believe my error is in the treefile, since the alignment file is read correctly when I use runmode = -2 (in order to ignore the treefile), and also by yn00, which does not use the tree file.

Assuming the issue is in the treefile, am I doing something wrong in specifying multiple trees in the file?

Thanks for any input!!



Ziheng

May 25, 2013, 7:02:44 AM
to pamlso...@googlegroups.com
When you use ndata, each alignment is analyzed using the same set of trees.  In your case the trees for the alignments are different, so you should not use ndata.  You need to prepare separate tree files for the different alignments, and you can write some simple Perl/Python scripts to manage the jobs.
ziheng
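
For anyone following this thread, here is a minimal Python sketch of the "separate tree files" step. It assumes the combined tree file contains only Newick trees, each ending in `;`, with no header line (as in Felipe's example). The function names, the `gene_N` folder names, and the `gene.tree` file name are made up for illustration and are not part of PAML.

```python
from pathlib import Path


def split_newick(text):
    """Return one Newick string per tree, each ending in ';'.

    Assumes the text holds only trees (no 'ntaxa ntrees' header line).
    """
    return [t.strip() + ";" for t in text.split(";") if t.strip()]


def write_tree_files(treefile_text, out_dir):
    """Write tree i to out_dir/gene_i/gene.tree and return the paths."""
    paths = []
    for i, tree in enumerate(split_newick(treefile_text), start=1):
        job_dir = Path(out_dir) / f"gene_{i}"  # hypothetical folder layout
        job_dir.mkdir(parents=True, exist_ok=True)
        path = job_dir / "gene.tree"
        path.write_text(tree + "\n")
        paths.append(path)
    return paths
```

The alignments would need to be split into the same folders in the same order, so that each folder holds one alignment and its matching tree.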
 

hvk

Jul 4, 2013, 8:06:16 PM
to pamlso...@googlegroups.com

Dear Felipe,

I am currently analyzing my chloroplast dataset with PAML. I have run several tests/models on a single gene with codeml, but it looks tricky to do this on a multiple-gene dataset. I used the G option as follows:

5 2382 G

G 2 475 319

seq_1      ATGTCACCACAAACAGAGACTAAAGCAAGTGTTGGATTCAAAGCT...

seq_2      ATGTCACCACAAACAGAGACTAAAGCAAGTGTTGGATTCAAAGCT...

seq_3      ATGTCACCACAAACAGAGACTAAAGCAAGTGTTGGATTCAAAGCT...

seq_4      ATGTCACCACAAACAGAGACTAAAGCAAGTGTTGGATTCAAAGCT...

seq_5      ATGTCACCACAAACAGAGACTAAAGCAAGTGTTGGATTCAAAGCT...

 

with a master tree (a tree built from all the gene sequences). When I run this with codeml it gives me the same error you got. As there are more than 70 genes, it would be tedious to do it one gene at a time, so I would be grateful if you could tell me where the problem comes from, or how I should do a multiple-gene (concatenated genes) analysis.

Best and regards

Hossein

Ziheng

Jul 21, 2013, 5:08:17 AM
to pamlso...@googlegroups.com
First, see my reply above.
Second, you need to decide whether the 70 genes should be analyzed in one combined analysis or analyzed separately.  In the former case you use option G.  The models for this kind of analysis are described in
Yang Z, Swanson WJ (2002) Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19:49-57
I suspect that you intend to analyze the genes separately.  Then if the tree is always the same, you can use the ndata option.  If the trees are different, you should write some perl or python scripts to generate the control files, tree files etc. in different folders and run the analyses separately.
Nowadays people routinely analyze thousands of genes, so 70 is not such a big deal, but you need to learn to write scripts to manage the jobs, as analyzing the genes one by one manually is error-prone and not feasible.
ziheng
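
As a rough illustration of the scripting approach described above, the sketch below creates one folder per gene, copies in the alignment and tree, writes a small control file, and runs codeml in that folder. The control file shown is deliberately minimal and its values (seqtype, model, NSsites) are placeholders, so a real analysis will likely need more settings. The file names `gene.phy`, `gene.tree`, and `mlc`, and the helper names, are assumptions for the example, and `codeml` is assumed to be on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Placeholder settings; adjust to the analysis you actually want to run.
CTL_TEMPLATE = """\
      seqfile = {seqfile}
     treefile = {treefile}
      outfile = {outfile}
      runmode = 0
      seqtype = 1
        model = 0
      NSsites = 0
"""


def prepare_job(name, alignment, tree, root):
    """Create root/name with the alignment, tree, and a codeml.ctl file."""
    job_dir = Path(root) / name
    job_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(alignment, job_dir / "gene.phy")
    shutil.copy(tree, job_dir / "gene.tree")
    ctl = CTL_TEMPLATE.format(seqfile="gene.phy", treefile="gene.tree", outfile="mlc")
    (job_dir / "codeml.ctl").write_text(ctl)
    return job_dir


def run_job(job_dir, codeml="codeml"):
    """Run codeml inside job_dir; raises CalledProcessError if it fails."""
    return subprocess.run([codeml, "codeml.ctl"], cwd=job_dir, check=True)
```

Running each gene in its own folder keeps the output files from overwriting one another, and a loop over `prepare_job` and `run_job` handles 70 genes or several thousand the same way.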


Lingyun Chen

Jul 26, 2013, 4:02:51 AM
to pamlso...@googlegroups.com

Hello everyone,

'Then if the tree is always the same, you can use the ndata option'.
However, how can I make a single file that includes all 70 genes? What format should it be in? Could you show me an example?

Best regards.

Sincerely yours,
Ling-Yun Chen


Ziheng

Sep 5, 2013, 6:14:17 PM
to pamlso...@googlegroups.com
Just one alignment followed by another, separated by some empty lines.
There may be some examples in the paml package, or you can use evolver to simulate 10 data sets, say, and then look at the data file.
Ziheng yang
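
A minimal Python sketch of building such a file, assuming each gene is already a separate alignment file in the sequence format codeml reads. The function name and file names are hypothetical. It simply places one alignment after another with an empty line in between and returns the number of alignments, which is the value to use for ndata.

```python
from pathlib import Path


def combine_alignments(paths, out_path):
    """Concatenate alignment files, separated by one empty line.

    Returns the number of alignments written (the value for ndata).
    """
    # Trim leading/trailing newlines so the separator is exactly one empty line.
    blocks = [Path(p).read_text().strip("\n") for p in paths]
    Path(out_path).write_text("\n\n".join(blocks) + "\n")
    return len(blocks)
```

For example, `combine_alignments(sorted(Path("genes").glob("*.phy")), "all_genes.phy")` would combine every `.phy` file in a hypothetical `genes` folder in sorted name order.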