4DTV

718 views
Skip to first unread message

Pascal

unread,
Mar 17, 2014, 11:34:07 AM3/17/14
to pamlso...@googlegroups.com
Hej!

I want to calculate the 4-fold degenerate transversion rate for 2 sequences. Since im using CodeML for calculation of dN/dS anyway i thought it would be easiest to have it calculate the 4DTV also. Is it possible to do that in codeml? It seems that there is some calculation on degenerated sites done since a file "4fold.nuc" is created. However i didnt find documentation what this file actually contains.

Best regards,
Pascal

Ziheng

unread,
Apr 16, 2014, 4:02:51 PM4/16/14
to pamlso...@googlegroups.com
I was really excited to see that 4D TV has arrived (my TV is only 3D and that is already too high tech for me), but then realise this post is not about TV.

I think the file 4fold you mentioned collects columns of the alignment which are decided to be four fold degenerate. You can make up a small dataset and test. You can then analyze this file using baseml to estimate the distances etc. codeml also has an ML method for estimating d4 between two sequences. It is perhaps turned on only if runmode = -2 and the output may be on the monitor and not in the output file.
The details are described on pages 63-64 of my book (yang 2006).
I did not distinguish transitions and transversions at the 4-fold sites, so perhaps it is not what you want after all.
Ziheng


En-Hua Xia

unread,
Jul 22, 2014, 8:36:55 AM7/22/14
to pamlso...@googlegroups.com
Have you ever wrote any perl script to calculate 4Dtv? or Did you know how to calculate the 4DTV value between gene pairs? Thanks a lot.

Pascal

unread,
Aug 20, 2014, 7:58:54 AM8/20/14
to pamlso...@googlegroups.com
Yes, i ended up writing my own python script (see attachment...). As input it uses a codon alignment file. Check the source for more comments and for usage instructions. Best regards, Pascal
4fold.py

newgumtree

unread,
Aug 26, 2014, 4:41:39 PM8/26/14
to pamlso...@googlegroups.com
Hi Pascal,

Thanks for the code, it's very nice! Just a question, when you count the alignment length using FD.get_alignment_length(), could it be possible that gaps in the alignment (noted by '-') are also counted?

I also noticed that in the biopython-PAML module, the online document mentioned running the baseml program can give the ts and tv estimate for each branch, but when I tried it, no such information is given. 

Prof. Yang, I wonder whether the ts and tv estimates are available using baseml? If so, what parameters should I use and which file should I look at?

Thanks,
Qu

Ziheng

unread,
Oct 16, 2014, 5:55:19 PM10/16/14
to pamlso...@googlegroups.com
My memory about this is blurry. I seem to remember implementing a k80 model with differnt kappa for differnt branches in baseml. I think it is specified using Nhomo = 2, and it was for a paper I published with Anne yoder in JME in 1999 (?). I am not sure whether the implementation still works. You can test and let me know if it is broken, And I am not sure whether the model is available for hky85 as well.
By the way, the mysterious pattern described in that paper (higher kappa for more closely related species) remains a mystery.
Given kappa and the distance (branch length), it should be trivial to calculate ts and tv.
Of course if you are happy to assme the same kappa for all branches, you can easily get ts and tv for each branch by using the common kappa and the branch length.
Ziheng

Ziheng

unread,
Oct 16, 2014, 6:07:29 PM10/16/14
to pamlso...@googlegroups.com
For divergent sequences, my way of estimating d4 should be advantageous to collecting four fold sites and then running baseml.
Perhaps given d4 and the estimate of kappa from the codon model, you can calculate ts and tv as well, by assuming a model such as hky85.
You can use some of the formulas in chapter 1 of my book. For example, in eq. 1.30 in yang (2014), the numerator is ts and the denominator is tv. Or more precisely, distance t multiplied by the numerator will be the expected number of transitions per site, and t multiplied by the denominator is the expected number of transversions per site.
Ziheng

Hamed Bostan

unread,
Jan 6, 2017, 11:58:37 AM1/6/17
to PAML discussion group
Hi Pascal, 

thanks for the code but I could never make it work. I am not expert in the field but I have 2 sequence that want to calculate its 4dtv. the sequences are not also the same length. I know that people do the calculation for the whole genes in the genomes (pairwise) but I have no clue how. Can you please provide me the complete code since it seems it still requires some filling, and also the inputs files required. for example by alignment you mean the blast outfmt 0 of 2 sequence?

Sorry if the questions are too much elementary but it will be a great help tp me if resolved,
Cheers 
Reply all
Reply to author
Forward
0 new messages