partition file with an intron in phase one


Timo Kosonen

Feb 9, 2018, 12:38:21 PM
to raxml

Hi,

I have a dataset of protein-coding sequences. There are three introns; two of them lie between codons (phase 0), but one lies between codon positions 1 and 2 (phase 1). I am also partitioning the data (codon positions 1 and 2 together, and position 3 as a separate partition).

My question: do I have to take the phase 1 intron into account when assigning partitions, as I have done here? The phase 1 intron is at 538-593. It looks logical, but in the end I became unsure how the program reads and executes the partition file.

Partition file:

DNA, EF1Codon1andCodon2 = 103-144\3, 104-144\3, 399-537\3, 400-537\3, 596-1202\3, 594-1202\3
DNA, EF1Codon3 = 105-144\3, 401-537\3, 595-1202\3
DNA, EF1intron = 1-102, 145-398, 538-593

 

Timo

Alexandros Stamatakis

Feb 9, 2018, 3:24:30 PM
to ra...@googlegroups.com
Hi Timo,

Are you unsure about the format of the partition file, or about the way to
assign the sites/introns to different partitions?

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

Timo Kosonen

Feb 9, 2018, 3:55:17 PM
to raxml
Hi,

I am unsure how to assign partitions when the intron is in the middle of a codon. If there were only phase 0 introns in the dataset, I would have written it like this (changes marked with asterisks):

DNA, EF1Codon1andCodon2 = 103-144\3, 104-144\3, 399-537\3, 400-537\3, *594-1202\3, *595-1202\3
DNA, EF1Codon3 = 105-144\3, 401-537\3, *596-1202\3
DNA, EF1intron = 1-102, 145-398, 538-593


...but since the last intron is in phase 1, I first thought that I of course need to take it into account, since the nucleotide at position 594 is the second in its codon and not the first. But then I thought that maybe the algorithm somehow takes it into account, since the length of the previous exon is not a multiple of three, and that I was just making it all unnecessarily complicated. Somewhere around here I realized I didn't really know how the program creates a partition, or how I should assign the partitions in the case of a phase 1 intron.

Timo

Grimm

Feb 10, 2018, 7:56:42 AM
to raxml
Hi Timo,

As a geneticist, I find it strange that an intron should insert between the 1st and 2nd codon position. An intron is a non-coding part within a gene, flanked on each side by an exon. And the exons should usually have a nucleotide sequence composed of a multiple of three nucleotides (only intact codons).

But well, there may be exceptions to the rule. How did you define the starts/ends of the exons? Have you used an amino acid sequence of the protein as a template to identify the 1st, 2nd and 3rd codon positions and the positions of the introns?
If not, you may just have misplaced the codon by one or two nucleotides. I notice that your third exon (594-1202), the one that according to your question should start with a 2nd codon position, has a length of 609 nucleotides, so it either has one nucleotide too many, or is short two.

Analysis-wise it doesn't matter, as long as all first/second and third codon positions are correctly defined, which is the case in your original definition:
after the last intron, the exon starts with the 2nd codon position (594), then comes the 3rd (595), and then the 1st (596). So all is in order as far as the analysis is concerned.
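
The bookkeeping above can be sketched in a few lines of Python. This is only an illustration, not part of RAxML: the function name `strided_ranges` is made up here, and it assumes that the first exon begins at codon position 1 and that the exon coordinates are the ones in Timo's partition file.

```python
def strided_ranges(exons):
    """Return RAxML-style 'start-end\\3' ranges per codon position (1, 2, 3).

    `exons` is a list of (start, end) alignment coordinates, 1-based and
    inclusive. Assumption: the first exon starts at codon position 1.
    """
    ranges = {1: [], 2: [], 3: []}
    coding_sites_so_far = 0  # exon nucleotides seen before the current exon
    for start, end in exons:
        for k in range(3):
            site = start + k
            if site > end:
                break
            # Codon position of this site, with the frame carried across introns.
            position = (coding_sites_so_far + k) % 3 + 1
            ranges[position].append(f"{site}-{end}\\3")
        coding_sites_so_far += end - start + 1
    return ranges


# Exon coordinates taken from the partition file in the first post.
exons = [(103, 144), (399, 537), (594, 1202)]
ranges = strided_ranges(exons)
for position in (1, 2, 3):
    print(position, ", ".join(ranges[position]))
```

With these coordinates the third exon contributes 596-1202\3 to position 1, 594-1202\3 to position 2 and 595-1202\3 to position 3, which matches the original partition file.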

Still, I would check whether you recognised the introns correctly. In case you don't have a reference protein sequence, you can just check the number of variable sites for your 1st/2nd and 3rd codon partitions. The latter should always have more variable sites than the former; also, the optimised model for the 3rd codon position usually approaches an HKY-like or similar situation (transversions with much lower probability than transitions) more decisively than the one for the 1st and 2nd codon positions. RAxML includes the corresponding partition-wise information in the RAxML info file.
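
One way to run that check outside RAxML is to count variable columns per partition directly. A minimal sketch, assuming the alignment is held as a list of equal-length strings; the function name `variable_sites` and the three toy sequences are hypothetical and are not data from this thread.

```python
def variable_sites(alignment, sites):
    """Count the 1-based columns in `sites` that show more than one
    distinct nucleotide, ignoring gaps and fully ambiguous characters."""
    count = 0
    for site in sites:
        column = {seq[site - 1].upper() for seq in alignment}
        column -= {"-", "N", "?"}
        if len(column) > 1:
            count += 1
    return count


# Hypothetical toy alignment: three codons, differences only at 3rd positions.
toy = ["ATGGCTAAA", "ATGGCCAAG", "ATGGCAAAA"]
first_and_second = [1, 2, 4, 5, 7, 8]
third = [3, 6, 9]
print(variable_sites(toy, first_and_second), variable_sites(toy, third))
```

If the 1st/2nd partition turns out to be more variable than the 3rd, that would be a hint that the frame is shifted somewhere.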

Cheers, Guido

Timo Kosonen

Feb 10, 2018, 3:12:39 PM
to raxml

> But well, there may be exceptions to the rule. How did you define the starts/ends of the exons? Have you used an amino acid sequence of the protein as a template to identify the 1st, 2nd and 3rd codon positions and the positions of the introns?
> If not, you may just have misplaced the codon by one or two nucleotides. I notice that your third exon (594-1202), the one that according to your question should start with a 2nd codon position, has a length of 609 nucleotides, so it either has one nucleotide too many, or is short two.

Yes, I examined the translation, and with that I could identify the introns. Introns start with GT (with a few rare exceptions of GC) and end with AG. If exons also have special codons marking their start and end, I am not aware of them. The last exon is not complete and has apparently been cut in the middle of a codon (when the tails were cut).

> Analysis-wise it doesn't matter, as long as all first/second and third codon positions are correctly defined, which is the case in your original definition:
> after the last intron, the exon starts with the 2nd codon position (594), then comes the 3rd (595), and then the 1st (596). So all is in order as far as the analysis is concerned.

That's good to hear, less lost now.
 
> Still, I would check whether you recognised the introns correctly. In case you don't have a reference protein sequence, you can just check the number of variable sites for your 1st/2nd and 3rd codon partitions. The latter should always have more variable sites than the former; also, the optimised model for the 3rd codon position usually approaches an HKY-like or similar situation (transversions with much lower probability than transitions) more decisively than the one for the 1st and 2nd codon positions. RAxML includes the corresponding partition-wise information in the RAxML info file.

This gene (tEF1a / Ascomycota) is not the easiest to align, so I very much agree (an extra check is always justified). But there is reference data available, and at least the exon alignment should be right (it is). What's tricky here is that introns are often in slightly different positions in different clades. The alignment, even if correct, is not that "pretty". For the particular intron in question, I have not found an alternative solution that would result in a correct translation.

Thanks for help!
Timo

Alexandros Stamatakis

Feb 12, 2018, 6:19:39 AM
to ra...@googlegroups.com
Dear Timo,

First of all, thanks Guido for your help :-)

RAxML will complain, i.e., exit with an error message if (i) one site is
assigned to two or more partitions or (ii) one or more sites are not
assigned to any partition. So you can't really do much wrong here.
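
Those two conditions can also be checked by hand before starting a run. A minimal Python sketch, assuming the partition file uses only the `start-end`, `start-end\stride` and single-site forms seen in this thread; `expand` and `check_partitions` are hypothetical helpers, not RAxML functions, and the alignment length of 1202 is assumed from the largest coordinate in the partition file.

```python
from collections import Counter


def expand(spec):
    """Expand one range such as '399-537\\3', '1-102' or '7' into site numbers."""
    spec = spec.strip()
    stride = 1
    if "\\" in spec:
        spec, step = spec.split("\\")
        stride = int(step)
    if "-" in spec:
        first, last = (int(x) for x in spec.split("-"))
    else:
        first = last = int(spec)
    return range(first, last + 1, stride)


def check_partitions(partition_text, n_sites):
    """Return (sites assigned more than once, sites assigned to no partition)."""
    counts = Counter()
    for line in partition_text.strip().splitlines():
        if not line.strip():
            continue
        _, ranges = line.split("=")
        for spec in ranges.split(","):
            if spec.strip():
                counts.update(expand(spec))
    duplicated = sorted(site for site, n in counts.items() if n > 1)
    missing = [site for site in range(1, n_sites + 1) if site not in counts]
    return duplicated, missing


# Partition file from the first post; n_sites=1202 is an assumption.
partitions = r"""
DNA, EF1Codon1andCodon2 = 103-144\3, 104-144\3, 399-537\3, 400-537\3, 596-1202\3, 594-1202\3
DNA, EF1Codon3 = 105-144\3, 401-537\3, 595-1202\3
DNA, EF1intron = 1-102, 145-398, 538-593
"""
duplicated, missing = check_partitions(partitions, n_sites=1202)
print("assigned twice:", duplicated)
print("unassigned:", missing)
```

For the partition file from the first post both lists come back empty. The same holds for the phase 0 variant, because a coverage check of this kind says nothing about whether a site is labelled with the right codon position.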

Hope that helps a bit,

Alexis