Hi Alexis and Hideyuki,
Hrdy et al. (2004) is available at the following URL.
http://dx.doi.org/10.1038/nature03149
Because the GTR model assumes compositional equilibrium among OTUs, GTR
and similar models are not suitable for data sets whose composition is
highly heterogeneous among OTUs. "Data recoding", such as RY-coding,
AGY-coding, or Dayhoff-coding, is a technique for homogenizing character
composition.
6-state Dayhoff coding should translate
A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V
into
0, 1, 2, 2, 3, 2, 2, 0, 1, 4, 4, 1, 4, 5, 0, 0, 0, 5, 5, 4, respectively.
This translation might cancel out the compositional heterogeneity, and we
can then reconstruct an ML tree with RAxML under the MULTIGAMMA model.
The "pgrecodeseq" command in Phylogears can apply the above translation.
Phylogears is available at
http://www.fifthdimension.jp/products/phylogears/ . For
example, you can apply 6-state Dayhoff coding to an amino-acid sequence
data set with the following command.
pgrecodeseq --type=ANY "ARNDCQEGHILKMFPSTWYVX-01223220144145000554?" inputfile outputfile
This command can read FASTA, NEXUS, PHYLIP, relaxed-PHYLIP, and
Treefinder format sequence files.
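If Phylogears is not at hand, the same translation can be sketched in a few lines of Python. This is only a minimal illustration of the mapping listed above, not the pgrecodeseq implementation: the function names are hypothetical, it handles plain FASTA text only, and it assumes that gaps ("-") are kept as they are and that X or any other unlisted symbol becomes "?".

```python
# Minimal sketch of 6-state Dayhoff recoding (not the pgrecodeseq code).
# The table is the one given in this message; gap and unknown handling
# below are assumptions made for illustration.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
DAYHOFF6 = "01223220144145000554"
RECODE = dict(zip(AMINO_ACIDS, DAYHOFF6))
RECODE["-"] = "-"  # assumption: alignment gaps are left unchanged


def recode_dayhoff6(seq):
    """Translate one amino-acid sequence into the six Dayhoff states."""
    # Assumption: X and any other unlisted symbol are treated as missing data.
    return "".join(RECODE.get(c, "?") for c in seq.upper())


def recode_fasta(text):
    """Recode every sequence line of a FASTA-formatted string."""
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(line)  # keep header lines untouched
        else:
            out.append(recode_dayhoff6(line.strip()))
    return "\n".join(out)


if __name__ == "__main__":
    example = ">taxon1\nARNDCQEG-X\n>taxon2\nHILKMFPSTW\n"
    print(recode_fasta(example))
```

The recoded alignment then contains only the symbols 0 to 5 plus "-" and "?", which is the kind of multi-state input discussed above.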
Best regards,
On Fri, 11 Jul 2014 15:56:01 +0200
Alexandros Stamatakis <alexandros...@gmail.com> wrote in
<53BFECF...@gmail.com>
> > I am trying to recode the protein sequences into six Dayhoff groups (Hrdy
> > et al. 2004)
> > and construct the phylogenetic tree.
>
> Could you please send a link to this paper, I am not able to find it.
>
> > Is it possible to do that with treating the six Dayhoff groups on the
> > alignment
> > as the multi-state characters?
>
> I guess so, you'd have to map the groups to six multi-state characters 0,1,...,5 and then you can infer a tree using, for instance a GTR model for multi-state characters, but I am not sure if this is what you want to do because I am unaware of the model.
--
Akifumi S. Tanabe, Ph.D.
National Research Institute of Fisheries Science, Fisheries Research Agency,
2-12-4 Fuku-ura, Kanazawa-ku, Yokohama, Kanagawa 236-8648, Japan.