Questions about calculating phastCons and phyloP scores


lfj

Oct 11, 2017, 11:37:01 AM
to gen...@soe.ucsc.edu
Dear Sir or Madam,
Recently, I wanted to recalculate the phastCons and phyloP scores of the 46-way alignment in hg19. To extract the 4d sites and build the model used to calculate the conservation scores, I followed the phastCons HOWTO (http://compgen.cshl.edu/phast/phastCons-HOWTO.html), and I took some key parameters from this page (http://genomewiki.ucsc.edu/index.php/Human/hg19/GRCh37_46-way_multiple_alignment). The only differences are that I used the annotation file from GENCODE V19 rather than the RefSeq (coding + reviewed) annotation, and that my PHAST version is the v1.4 release. I also calculated the conservation scores with the 46-way model files that the UCSC Genome Browser provides for hg19. The confusing result is that the scores calculated with my own models are almost the same as those calculated with the UCSC models, yet both are quite different from the scores available directly from the UCSC Genome Browser. What could be the reason? Since the scores from the models I built myself nearly match those from the UCSC models, I believe my way of constructing the model is correct, and since the procedure for calculating the phastCons and phyloP scores is almost the same, I am puzzled why my results differ from the scores on the UCSC Genome Browser.
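To put a number on "quite different", the recomputed and downloaded scores can be compared position by position. A minimal Python sketch, not taken from either pipeline: it assumes both score sets are fixedStep wiggle text (the format phastCons writes, that `phyloP --wig-scores` produces, and that UCSC's `.wigFix.gz` downloads use), and `parse_fixed_step` and `compare_scores` are hypothetical helper names.

```python
import math
from typing import Dict, Iterable, Tuple

Position = Tuple[str, int]  # (chromosome, 1-based position)


def parse_fixed_step(lines: Iterable[str]) -> Dict[Position, float]:
    """Read fixedStep wiggle text into a {(chrom, pos): score} dict."""
    scores: Dict[Position, float] = {}
    chrom, pos, step = None, 0, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "#")):
            continue
        if line.startswith("fixedStep"):
            # e.g. "fixedStep chrom=chr1 start=10918 step=1"; other keys are ignored
            fields = dict(kv.split("=", 1) for kv in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", "1"))
            continue
        if chrom is None:
            raise ValueError("score line before any fixedStep header")
        scores[(chrom, pos)] = float(line)
        pos += step
    return scores


def compare_scores(a: Dict[Position, float], b: Dict[Position, float]) -> Dict[str, float]:
    """Pearson r and maximum absolute difference over positions scored in both."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared positions")
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    n = len(shared)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # r is undefined when either side is constant over the shared positions
    r = cov / math.sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else math.nan
    return {
        "shared": n,
        "pearson_r": r,
        "max_abs_diff": max(abs(x - y) for x, y in zip(xs, ys)),
    }
```

For real files, open each one with `gzip.open(path, "rt")` and pass the handle to `parse_fixed_step`. Holding a whole chromosome in a dict takes a lot of memory, so comparing a single region first is a reasonable start.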
So, could you please provide a detailed protocol and scripts for calculating the conservation scores? Either hg19 or hg38 is fine.
By the way, I have attached the scripts I used for the calculation. I hope you can find where my error is and let me know. Thanks!

Sincerely,
Sherking.


01 download_maf.sh
02 extract_maf.sh
03 extract_gff.sh
04 extract_4d_sites.sh
05 build_model.sh
06 scores.sh
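The 4d-site step is where the GENCODE-versus-RefSeq choice enters, because the annotation decides which codons count as coding. As a reminder of what is being extracted, here is a minimal Python sketch that marks fourfold-degenerate (4d) third-codon positions in a single in-frame CDS under the standard genetic code. It is an illustration only, with a hypothetical function name: the phastCons HOWTO does the extraction on the alignment with PHAST's own tools, and this sketch ignores the other species, codons split across exons, and non-standard genetic codes.

```python
from typing import List

# First two codon bases whose third position is fourfold degenerate under the
# standard genetic code: Ala (GC), Pro (CC), Ser (TC), Thr (AC),
# Leu (CT), Val (GT), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = {"GC", "CC", "TC", "AC", "CT", "GT", "CG", "GG"}


def fourfold_sites(cds: str) -> List[int]:
    """Return 0-based offsets of 4d third-codon positions in an in-frame CDS."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    sites = []
    for i in range(0, len(cds), 3):
        if cds[i:i + 2] in FOURFOLD_PREFIXES:
            sites.append(i + 2)  # the third (wobble) position of this codon
    return sites
```

For example, `fourfold_sites("GCTAAACGG")` marks the third bases of GCT (Ala) and CGG (Arg) but not of AAA (Lys).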

lfj

Oct 12, 2017, 11:18:42 AM
to lfj, gen...@soe.ucsc.edu
Dear Sir or Madam,
I reconstructed the model used to calculate the phastCons and phyloP scores of the 46-way alignment in hg19 following the protocol described on this page (http://genomewiki.ucsc.edu/index.php/Human/hg19/GRCh37_46-way_multiple_alignment), and compared it with the models available at both http://genomewiki.ucsc.edu/index.php/Human/hg19/GRCh37_46-way_multiple_alignment and http://hgdownload.soe.ucsc.edu/goldenPath/hg19/phastCons46way/. The result is again confusing: they differ. As I described in my last email, the reconstructed model is the same as the one I built myself, but different from the ones available from UCSC. I am at my wits' end. Could you please help me find the reason?

By the way, the attachment contains the results of my model comparison.


model_comparison.docx

Cath Tyner

Oct 12, 2017, 6:17:51 PM
to lfj, UCSC Genome Browser Public Help Forum
Hello Sherking,

Thank you for contacting the UCSC Genome Browser support team regarding the issues you are having when reconstructing the phastCons and phyloP scores of the 46-way alignment in hg19. Thank you also for including your file attachments.

Our procedure for calculating the branch lengths on these trees is documented in our make doc, hg19.txt. For hg19 we did two separate calculations: one for chrX and a second for all other chromosomes. Look in the make doc for the sections titled:

#########################################################################
Phylogenetic tree from 46-way for chrX (DONE - 2009-10-26 - Hiram)
We need two trees, one for chrX only, and a second for all other chroms
#########################################################################
Phylogenetic tree from 46-way for non-chrX (DONE - 2009-10-27 - Hiram)
We need two trees, one for chrX only, and a second for all other chroms
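The two-tree split means each chromosome has to be scored with the matching model: chrX with the chrX tree and every other chromosome with the non-chrX tree. A minimal Python sketch of that routing, building one phyloP command line per chromosome. The model file names are hypothetical placeholders, and although `--method LRT --mode CONACC --wig-scores --msa-format MAF` are options of the PHAST phyloP program, they are not copied from hg19.txt, so check the make doc for the exact options UCSC used.

```python
from pathlib import Path
from typing import List

# Hypothetical file names: substitute the chrX and non-chrX models you built or downloaded.
CHRX_MODEL = Path("46way.chrX.mod")
NON_CHRX_MODEL = Path("46way.nonChrX.mod")


def model_for(chrom: str) -> Path:
    """chrX gets its own tree; every other chromosome uses the non-chrX tree."""
    return CHRX_MODEL if chrom == "chrX" else NON_CHRX_MODEL


def phylop_command(chrom: str, maf: Path) -> List[str]:
    """Build a phyloP argument list (model first, then alignment) for one chromosome."""
    return [
        "phyloP",
        "--method", "LRT",
        "--mode", "CONACC",
        "--wig-scores",
        "--msa-format", "MAF",
        str(model_for(chrom)),
        str(maf),
    ]
```

Each list can be passed to `subprocess.run(cmd, stdout=out_handle, check=True)` to write that chromosome's wiggle scores to an open file.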

You may also be able to find more information by searching our archived mailing list forums. For example, you can search for "46-way hg19".

Please respond to this list if you have further questions!

Thank you for contacting the UCSC Genome Browser support team.
Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-announce+subscribe...@soe.ucsc.edu 
  * Unsubscribe: Email genome-announce+unsubscri...@soe.ucsc.edu

Join us on social media: Facebook, Twitter, WordPress Blog, YouTube

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute



lfj

Oct 13, 2017, 12:08:36 PM
to Cath Tyner, UCSC Genome Browser Public Help Forum
Sorry for the trouble, and thank you very much for your reply. I will check my procedure against the pipeline you described.
By the way, does the version of the PHAST program affect the results much? I encountered many errors and warnings while installing the old version 0.9.9.10, which never happened with the latest version.


Enjoy,
Sherking







Matthew Speir

Oct 19, 2017, 2:34:46 PM
to lfj, UCSC Genome Browser Public Help Forum
Hi Sherking,

Thank you for your follow-up questions.

The version of phastCons we use is from 2010-12-30. One of our engineers notes that the phyloFit program is not a completely deterministic algorithm and that running it twice on the same input data will produce different results.
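Because phyloFit is not deterministic, two independently fitted models should not be expected to match exactly, so comparing branch lengths within a tolerance is more informative than an exact diff. A minimal Python sketch, assuming the PHAST `.mod` text format in which the tree is stored on a single `TREE:` line in Newick notation; the helper names are hypothetical, and the check only works for trees written with the same topology and leaf order.

```python
import re
from typing import List, Tuple

# A Newick branch length: ":" followed by a decimal number, optionally in e-notation.
LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def parse_tree(mod_text: str) -> Tuple[str, List[float]]:
    """Return (topology with lengths stripped, branch lengths in order) from .mod text."""
    for line in mod_text.splitlines():
        if line.startswith("TREE:"):
            newick = line[len("TREE:"):].strip()
            return LENGTH.sub("", newick), [float(x) for x in LENGTH.findall(newick)]
    raise ValueError("no TREE: line found")


def max_relative_branch_diff(mod_a: str, mod_b: str) -> float:
    """Largest relative branch-length difference between two same-topology trees."""
    topo_a, lens_a = parse_tree(mod_a)
    topo_b, lens_b = parse_tree(mod_b)
    if topo_a != topo_b:
        raise ValueError("trees differ in topology or leaf order")
    worst = 0.0
    for x, y in zip(lens_a, lens_b):
        denom = max(abs(x), abs(y))
        if denom > 0:  # two zero-length branches count as identical
            worst = max(worst, abs(x - y) / denom)
    return worst
```

What tolerance counts as "the same" is a judgment call. A small relative difference is consistent with the run-to-run variation described above, whereas a topology mismatch or large differences would suggest a different input or procedure.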

Any questions about the details of how phastCons and phyloP work should be passed on to the Siepel lab at Cold Spring Harbor Laboratory (CSHL): http://compgen.cshl.edu/phast/.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group