Hi,
I would like to download a track visualizing the conservation between mouse and human. Is there a BED file of conserved regions or a bigWig file available for download? So far I have only found the conservation track for 100 vertebrates.
Thank you
Best wishes
Sebastian
Hello, Sebastian.
You can download the MAF files associated with the hg19 100-way Conservation track from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz100way/maf/. You can then extract only the human and mouse entries from those files using our mafSpeciesSubset utility, which can be downloaded from http://hgdownload.cse.ucsc.edu/admin/exe/.
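To make the subsetting step concrete, here is a minimal Python sketch of what keeping only the human and mouse rows of a MAF file amounts to. It is an illustration only, not the mafSpeciesSubset utility and not a description of how that utility is implemented. The assembly names hg19 and mm10 come from this thread; the function name, the sample rows, and the choice to keep only blocks that contain both assemblies are assumptions made for the example.

```python
# Illustration only: a plain-Python filter over MAF text, NOT mafSpeciesSubset.
# SAMPLE_MAF is made-up data; subset_maf_species is a hypothetical helper name.

SAMPLE_MAF = """##maf version=1
a score=100.0
s hg19.chr3 100 5 + 198022430 ACGTA
s mm10.chr9 500 5 + 124595110 ACGTT
s panTro4.chr3 90 5 + 200000000 ACGTA

a score=50.0
s hg19.chr3 200 4 + 198022430 GGCC
s panTro4.chr3 190 4 + 200000000 GGCC
"""


def subset_maf_species(lines, keep=("hg19", "mm10")):
    """Return alignment blocks reduced to the assemblies named in `keep`.

    Each returned block is a list: the 'a' line followed by one 's' line per
    kept assembly. Blocks missing any of the kept assemblies are dropped
    (an assumption of this sketch).
    """
    blocks = []  # each entry: [a_line, {assembly: s_line}]
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("a"):
            # An 'a' line opens a new alignment block.
            blocks.append([line, {}])
        elif line.startswith("s ") and blocks:
            # The source field looks like "hg19.chr3"; the assembly is the prefix.
            assembly = line.split()[1].split(".")[0]
            if assembly in keep:
                blocks[-1][1][assembly] = line
        # Header lines and other line types (i, e, q) are ignored.

    result = []
    for a_line, rows in blocks:
        if len(rows) == len(keep):
            result.append([a_line] + [rows[name] for name in keep])
    return result


kept_blocks = subset_maf_species(SAMPLE_MAF.splitlines())
for block in kept_blocks:
    print("\n".join(block))
    print()
```

In this made-up sample, only the first block survives, because the second block has no mm10 row.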
Please contact us again at gen...@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
---
Steve Heitner
UCSC Genome Bioinformatics Group
--
Hi Steve,
Thank you. The track looks fine, though I am confused that conservation between human (hg19) and mouse (mm10) seems to be very low (e.g. 13 short regions, 2-108 bp in length, on chr3). Is there a dataset that I can view in IGV at low resolution to get information at the kb or Mb scale?
Thx
Best wishes
Sebastian
From: Steve Heitner [mailto:st...@soe.ucsc.edu]
Sent: Monday, August 17, 2015 1:36 PM
To: Preissl, Sebastian <spre...@ucsd.edu>; gen...@soe.ucsc.edu
Subject: RE: [genome] mouse human conservation track
Hello, Sebastian.
You can download the MAF files associated with the hg19 100-way Conservation track from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz100way/maf/. You can then extract only the human and mouse entries from those files using our mafSpeciesSubset utility which can be downloaded from http://hgdownload.cse.ucsc.edu/admin/exe/.
Please contact us again at gen...@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
---
Steve Heitner
UCSC Genome Bioinformatics Group
Hello Sebastian,
One of our engineers has already constructed a summary view somewhat like you describe. You can find it at http://genome-test.cse.ucsc.edu/~hiram/Synteny/index.html. If you would like a track that you can display and browse in IGV, then you have several options depending on the exact type of data that you want to display.
The phyloP and phastCons conservation scores can be downloaded in wiggle format from our download server at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/ in the phyloP100way/ and phastCons100way/ subdirectories, respectively. Those conservation scores were calculated from the full 100-way alignment, however, which may not be what you are looking for.
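Since the question was about viewing scores at the kb or Mb scale, the following Python sketch shows one way per-base wiggle scores could be averaged into fixed-size windows and written out as bedGraph-style rows. It is a rough illustration under stated assumptions: the input is fixedStep wiggle text with a span of 1, the function name and sample values are made up, and the mean is taken only over bases that actually have a score.

```python
# Sketch: average fixedStep wiggle scores into fixed-size windows.
# Assumes span=1; bin_fixed_step and the sample lines are hypothetical.
from collections import defaultdict


def bin_fixed_step(lines, bin_size=10_000):
    """Return (chrom, start, end, mean_score) tuples, 0-based half-open."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    chrom, pos, step = None, 0, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "#")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr3 start=1 step=1".
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"]) - 1  # wiggle starts are 1-based
            step = int(fields.get("step", 1))
            continue
        key = (chrom, pos // bin_size)
        sums[key] += float(line)
        counts[key] += 1
        pos += step

    rows = []
    for name, index in sorted(sums):
        mean = sums[(name, index)] / counts[(name, index)]
        # Note: the last window of a chromosome may extend past its end.
        rows.append((name, index * bin_size, (index + 1) * bin_size, mean))
    return rows


sample = [
    "fixedStep chrom=chr3 start=1 step=1",
    "0.0",
    "1.0",
    "fixedStep chrom=chr3 start=11 step=1",
    "0.5",
]
for row in bin_fixed_step(sample, bin_size=10):
    print("\t".join(str(value) for value in row))
```

A larger bin_size (for example 1_000_000) gives a coarser summary. The last window of each chromosome would need clipping to the chromosome length before loading the result into a viewer.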
If you would like to display the regions of the human genome that were successfully aligned to the mouse genome, then you will need other files. http://hgdownload.soe.ucsc.edu/goldenPath/hg19/vsMm10/ contains two files: hg19.mm10.all.chain.gz and hg19.mm10.net.gz. These files are the chain and net files for the human hg19 <-> mouse mm10 genome alignment. The difference between chains and nets is discussed at http://genomewiki.ucsc.edu/index.php/Chains_Nets, but the short story is that the chains file contains all of the successful alignments and the net file filters that list down to prevent alignments from overlapping on the human side. If you are only concerned with coverage of the human genome, then there should be little or no difference between them.
IGV does not support either of these file formats, so you would need to convert them into formats that IGV does support. The all.chain file can be converted into a PSL file using our chainToPsl tool, while the .net file can be converted into a BED file using our netToBed tool. Both tools are provided for several computer architectures on our download server at http://hgdownload.soe.ucsc.edu/admin/exe/. We also provide source code in the userApps.src.tgz file if you would prefer to compile them yourself.
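To show what kind of information a chain file carries, here is a small Python sketch that walks through chain records and lists the ungapped aligned blocks on the target (human) side as BED-like intervals. It is not chainToPsl or netToBed and does not reproduce their output. It assumes the standard chain layout (a header line starting with "chain", then "size dt dq" lines and a final "size" line) and a target strand of "+". The function name and the sample record are invented for the example.

```python
# Sketch: list target-side aligned blocks from chain-format text.
# chain_target_blocks and SAMPLE_CHAIN are hypothetical; this is not chainToPsl.

SAMPLE_CHAIN = """chain 1000 chr3 198022430 + 100 250 chr9 124595110 + 500 645 1
50 10 0
40 0 5
50
"""


def chain_target_blocks(lines):
    """Yield (chrom, start, end, chain_id) for each ungapped block."""
    t_name, t_pos, chain_id = None, 0, None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # a blank line separates chains
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            if fields[4] != "+":
                raise ValueError("this sketch assumes a + target strand")
            t_name = fields[2]
            t_pos = int(fields[5])
            chain_id = fields[12]
            continue
        size = int(fields[0])
        yield (t_name, t_pos, t_pos + size, chain_id)
        t_pos += size
        if len(fields) == 3:
            # dt is the gap on the target side before the next block.
            t_pos += int(fields[1])


blocks = list(chain_target_blocks(SAMPLE_CHAIN.splitlines()))
for chrom, start, end, chain_id in blocks:
    print(chrom, start, end, chain_id, sep="\t")
```

In the sample record the three blocks cover 100-150, 160-200, and 200-250 on chr3, which matches the target start and end given in the header line.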
If you decide to use netToBed, please note that you will probably also want to use the -maxGap option to avoid having the BED file display coverage where there is actually a gap in the alignment. An example of this would be the centromere regions. Some chains have a large gap that spans the centromere with aligned sequence on both sides. Without the -maxGap option set to something suitably small (e.g., -maxGap=100), those gaps will be ignored and the resulting BED file will display them as aligned regions for those chains.
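The effect of a gap threshold can be sketched in a few lines of Python. The example below merges the aligned blocks of one chain into BED-style intervals, joining neighbours only when the gap between them is at most max_gap. It illustrates the idea described above and is not a statement of how netToBed is implemented. The function name, the block coordinates, and the very large threshold used to mimic "no limit" are assumptions.

```python
# Sketch: why a small gap threshold matters when turning blocks into intervals.
# merge_with_max_gap and the coordinates are hypothetical, not netToBed itself.

def merge_with_max_gap(blocks, max_gap=100):
    """Merge (start, end) blocks whose separating gap is <= max_gap."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Two nearby blocks, then one block on the far side of a 3 Mb gap
# (standing in for a chain that spans a centromere).
chain_blocks = [(100, 150), (160, 200), (3_000_200, 3_000_400)]

strict = merge_with_max_gap(chain_blocks, max_gap=100)
loose = merge_with_max_gap(chain_blocks, max_gap=10**9)
print("max_gap=100:", strict)
print("effectively unlimited:", loose)
```

With the small threshold the large gap stays uncovered. With an effectively unlimited threshold the whole span is reported as one aligned interval, which is the misleading display described above.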
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--