Question about conservation tracks (Multiz Align and phastCons)


엄다훈

Nov 15, 2021, 12:47:43 PM
to UCSC Genome Browser Discussion List
Hello!


I have a few questions about conservation tracks.
In the mouse genome (mm10), there is a 60-vertebrate conservation track. As I understand it, the phastCons score is calculated from the multiple alignment (Multiz Align). However, some regions do not look consistent with the Multiz alignment. I added a screenshot as an example. In this picture, case 1 is easy to understand (many organisms have alignments). Cases 2 and 3 are unusual: alignments exist in few or many organisms, yet the phastCons scores are higher or lower than I would expect.

So, can some regions be inconsistent between the Multiz alignment and the phastCons score?


Also, I have one more question about the phastCons score. I want to see individual phastCons scores, such as for Chicken or Turkey (compared with mouse). However, phastCons scores are provided for only 3 species sets. Is there a way to get individual phastCons scores, such as mouse vs. human or mouse vs. chicken? Because Multiz Align gives individual sequences, I think it may be possible to calculate individual phastCons scores.


Thank you for reading these ambiguous questions :)


Dahun

Daniel Schmelter

Nov 15, 2021, 8:28:29 PM
to 엄다훈, UCSC Genome Browser Discussion List

Hello Dahun,

Thank you for emailing Genome Browser support with your questions about genome conservation scores and alignments.

It sounds like you meant to attach a screenshot and some example regions, but they were not included in your previous email. The MultiZ output and PhastCons are intentionally different, with separate goals for their analysis. The PhastCons data is meant to identify Conserved Elements, based on individual and adjacent bases, compared with the set of 60 genomes. You may be interested in reading the track description here:

https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm10&g=cons60way

PhastCons is written and intended for multiple alignments. If you want to see pairwise alignments, you can use the MultiZ organism filters to select only your desired organism. Scores exist for these alignments across the genome and can be exported or downloaded. It may help us to understand what you want to do with these pairwise alignment scores.

I hope this was helpful. If you have any more questions, please reply-all to gen...@soe.ucsc.edu. All messages sent to that address are publicly archived. If your question includes sensitive data, please reply-all to genom...@soe.ucsc.edu.

All the best,

Daniel Schmelter
UCSC Genome Browser



엄다훈

Nov 16, 2021, 11:31:30 AM
to Daniel Schmelter, UCSC Genome Browser Discussion List
Hello Daniel,

I appreciate your kind explanation :)
Oh, I forgot to add the screenshot with some examples. Before your email, I understood that the phastCons score is based on the Multiz Align tracks.
conservation_track_example.png
This is the screenshot I wanted to add. Although Multiz Align and phastCons are intentionally different, I think they look largely consistent as shown in the UCSC Genome Browser. Could you briefly explain why there are some inconsistencies? I think they may come from the way the scores are calculated.

In the Table Browser, I can download three types of phastCons scores. You said it is possible to get pairwise phastCons scores. However, the Table Browser has no option for filtering by organism (the only option to select organisms is under 'Configure Conservation track set' in the browser). Also, my attempts to download phastCons scores from the Table Browser failed many times because of connection errors... TT (So I used the FTP service, but I think those scores are not pairwise.) Could you give me some tips for downloading pairwise phastCons scores?



Again, thank you for your help!!

Best regards,
Dahun





On Tue, Nov 16, 2021 at 10:28 AM, Daniel Schmelter <dsch...@ucsc.edu> wrote:

Brian Lee

Nov 17, 2021, 5:37:14 PM
to 엄다훈, Daniel Schmelter, UCSC Genome Browser Discussion List

Dear Dahun,

Thank you for using the UCSC Genome Browser and your question about MultiZ and phastCons data.

The phastCons scores are not a linear combination of the alignments displayed in the separate MultiZ tracks. The phastCons method uses a statistical Hidden Markov Model (HMM), in which parameters estimated during training shape the final scores, and each score predicts the likelihood of conservation. The method can give more weight to a match with a more distantly related species, the thinking being that one would expect a lot more divergence from those distant genomes. Reviewing some of the references will provide a more detailed explanation, such as "Phylogenetic Hidden Markov Models" available here: https://link.springer.com/chapter/10.1007/0-387-27733-1_12 or this help page for the program: http://compgen.cshl.edu/phast/help-pages/phastCons.txt
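As a rough illustration of why an HMM-based score need not track the number of aligned species column by column, here is a toy two-state HMM (conserved / non-conserved) in Python. This is only a sketch and not the phastCons implementation: the per-column likelihoods, the transition probabilities, and the function name posterior_conserved are all made-up assumptions. In phastCons the per-column likelihoods come from phylogenetic models, which this sketch does not attempt to reproduce.

```python
# Toy two-state HMM (state 0 = conserved, state 1 = non-conserved).
# NOT the phastCons implementation; all numbers below are invented.

def posterior_conserved(lik_cons, lik_non, stay_cons=0.9, stay_non=0.9):
    """Return P(conserved | all columns) for each column (forward-backward)."""
    n = len(lik_cons)
    emit = list(zip(lik_cons, lik_non))
    trans = [[stay_cons, 1.0 - stay_cons],
             [1.0 - stay_non, stay_non]]
    start = [0.5, 0.5]

    # Forward pass, rescaled at each column to avoid numeric underflow.
    fwd = []
    for i in range(n):
        if i == 0:
            cur = [start[s] * emit[0][s] for s in (0, 1)]
        else:
            prev = fwd[-1]
            cur = [emit[i][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, rescaled the same way.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        total = sum(cur)
        bwd[i] = [c / total for c in cur]

    # Combine both passes into a per-column posterior for the conserved state.
    post = []
    for f, b in zip(fwd, bwd):
        joint = [f[0] * b[0], f[1] * b[1]]
        post.append(joint[0] / (joint[0] + joint[1]))
    return post


# Columns 2 and 7 have identical per-column likelihoods (0.4 vs 0.5),
# but column 2 sits among conserved-looking neighbours and column 7 does not.
LIK_CONS = [0.9, 0.9, 0.4, 0.9, 0.9, 0.2, 0.2, 0.4, 0.2, 0.2]
LIK_NON = [0.3, 0.3, 0.5, 0.3, 0.3, 0.6, 0.6, 0.5, 0.6, 0.6]

if __name__ == "__main__":
    for i, p in enumerate(posterior_conserved(LIK_CONS, LIK_NON)):
        print(i, round(p, 3))
```

In this toy example the two columns with identical likelihoods end up with different scores, because each score also depends on the neighbouring columns. That is one way a region in the browser can show a score that looks out of step with the alignments directly above it.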

For accessing the phastCons data, it may help to understand that even when you show fewer species in the multiple alignment MultiZ track, that doesn't change the phastCons scores displayed. There are three mm10 tracks: Euarchontoglires Conservation, Placental Mammal Conservation, and 60 Vertebrates Conservation by PhastCons. Scores for any other subset of species would have to be recalculated independently with the program. That is to say, if you wanted the PhastCons scores for just a selection of fish species, it would require independent work to compute those values.

As Daniel shared, the track description page, https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm10&g=cons60way, provides many details that can be helpful to review. For accessing the data, while you can use the Table Browser, it is imperfect and not recommended when looking at wider regions. On the Track Description page there is a "Downloads for data in this track are available" section just below the settings options, where you can click a link for "PhastCons conservation (WIG format)" and arrive at our downloads page, where you can extract data directly: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/

There are programmatic command-line ways to extract data from these files. If it might be helpful, the bigWigToWig tool can extract coordinate regions from these files. For example, the following extracts just one region of data from the 60way file:

 bigWigToWig -chrom=chr12 -start=56707600 -end=56707900 http://hgdownload.soe.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons.bw stdout

That data output can even be loaded back as a custom track by adding a line like "track name=bigWigOutput type=wig" to the top of the file.  You can get the bigWigToWig utility here: http://hgdownload.soe.ucsc.edu/admin/exe/
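If you would rather script these two steps, a minimal Python sketch might look like the following. It assumes bigWigToWig is installed and on your PATH and that it takes the single-dash options -chrom=, -start= and -end=. The function names are hypothetical helpers for illustration, not part of any UCSC package, and the track line is the one quoted above.

```python
import subprocess

# URL of the mm10 60-way phastCons bigWig from the downloads page.
BIGWIG_URL = ("http://hgdownload.soe.ucsc.edu/goldenPath/mm10/"
              "phastCons60way/mm10.60way.phastCons.bw")


def bigwig_to_wig_command(chrom, start, end, source=BIGWIG_URL):
    """Build the bigWigToWig argument list for one region, writing to stdout."""
    return ["bigWigToWig", f"-chrom={chrom}", f"-start={start}",
            f"-end={end}", source, "stdout"]


def with_track_line(wig_text, name="bigWigOutput"):
    """Prepend a custom-track header line so the output can be uploaded."""
    return f"track name={name} type=wig\n{wig_text}"


def extract_region(chrom, start, end, out_path):
    """Run bigWigToWig (assumed to be on PATH) and save a custom-track file."""
    result = subprocess.run(bigwig_to_wig_command(chrom, start, end),
                            capture_output=True, text=True, check=True)
    with open(out_path, "w") as handle:
        handle.write(with_track_line(result.stdout))
```

For example, extract_region("chr12", 56707600, 56707900, "region.wig") would fetch the same region as the command above and write a file that is ready to load as a custom track.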

Lastly, it might help to explain the unique properties of the MultiZ view. On the Track Description page under the "Display Conventions and Configuration" section there is a note on how the "full" display (like the one shared in your screenshot) is an indication of alignment quality. I think that helps show that the wiggle display in the MultiZ track is not a score. Rather, it is an attempt by the Browser software to represent, at a wider zoom level, the individual base information that becomes visible when you zoom in close enough.

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please send new questions to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum to help others find answers to similar questions. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu, which is a private internal list to our support team.

All the best,

