Human-chimpanzee divergent sites

Alexa Bracci

Mar 9, 2020, 12:26:59 PM
to gen...@soe.ucsc.edu
Hi, 

I am looking for a list of fixed divergent sites between human (hg19) and chimpanzee (panTro6) reference genomes, preferably in terms of the human reference. Do you have a list of these sites available for download? Or is there a way I could generate a file like this from other files you have available?

Thank you.

Best,
Alexa
--
Alexa Bracci
PhD Candidate, Koren Lab
Department of Molecular Biology and Genetics
Cornell University - Ithaca, NY

Daniel Schmelter

Mar 12, 2020, 8:12:50 PM
to Alexa Bracci, UCSC Genome Browser Discussion List

Hello Alexa,

Thank you for your question about fixed divergent sites in humans compared with chimpanzee panTro6.

We have a few resources that compare human and chimpanzee, but at the moment we only have hg19 vs panTro4 alignment chain files. It is possible we could run a genome alignment for hg19 against panTro6. Alternatively, you could use LiftOver to convert coordinates to the latest assembly, or switch to hg38, which has pre-computed panTro6 alignments. You may also be interested in the output options of our Primates Alignment track in the Table Browser, although this may require a bit of scripting. You may have to do an inverse intersect with a variation dataset like dbSNP.

http://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_table=chainPanTro4&hgta_track=primateChainNet&hgta_group=compGeno

To help us determine exactly what you need, could you be a bit more specific about the output data format you are looking for? It would also help if you could define "Fixed Divergent Sites" for us. We assume it means sites that do not vary within human or chimp, but which differ between the two species. More information will allow us to give you a better idea of how to find this data.

I hope this was helpful. If you have any more questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are publicly archived. If your question includes sensitive data, please send it instead to genom...@soe.ucsc.edu.

All the best,

Daniel Schmelter
UCSC Genome Browser



Alexa Bracci

Mar 16, 2020, 12:06:59 PM
to Daniel Schmelter, UCSC Genome Browser Discussion List
Hi Daniel, 

Thank you for your response. 

You are correct, by fixed divergent sites between human and chimpanzee, I do mean sites that do not vary within human or chimpanzee, but that do vary between the two species. The output I am looking for is a list of these sites with their location in both species. I would prefer the output in terms of panTro6 and hg19 reference genomes, but I could easily convert between hg19 and hg38 with LiftOver as you suggested.

Could you clarify what you mean by 'inverse intersect' with dbSNP? I understand that the alignment tracks will give me orthologous regions between human and chimpanzee, but I am unsure how to use that to identify sites where human and chimpanzee differ.

Thank you!

Best,
Alexa

Matthew Speir

Mar 23, 2020, 5:33:11 PM
to Alexa Bracci, Daniel Schmelter, UCSC Genome Browser Discussion List
Hello, Alexa.

Unfortunately, there's not a terribly straightforward process for getting this data. This is due to the limited availability of variation data for chimp and of the LiftOver files we have available. For example, the latest dbSNP data for chimp that we have at the UCSC Genome Browser is for panTro1; however, there is dbSNP data for a slightly newer assembly (panTro4) on the dbSNP archive: ftp://ftp.ncbi.nih.gov/snp/organisms/archive/chimpanzee_9598/VCF/. We have LiftOver files between hg19/panTro4, but to get the data to panTro6 you would have to lift it from panTro4 to panTro5, and then again from panTro5 to panTro6. NCBI's ReMap may also provide files that you can use to convert coordinates between these different assembly builds: ftp://ftp.ncbi.nlm.nih.gov/pub/remap.
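
The two-step lift described above (panTro4 to panTro5, then panTro5 to panTro6) could be scripted roughly as follows. This is only a sketch: it assumes the `liftOver` command-line utility is installed, that the dbSNP VCF has already been converted to BED (liftOver works on BED-style coordinates), and that the two chain files have been downloaded. All file names below are hypothetical placeholders.

```python
import subprocess
from pathlib import Path


def two_step_lift_commands(bed_in, chain_4_to_5, chain_5_to_6, out_dir):
    """Build the two liftOver command lines for panTro4 -> panTro5 -> panTro6.

    liftOver usage is: liftOver oldFile map.chain newFile unMapped
    The output of step 1 is the input of step 2.
    """
    out = Path(out_dir)
    step1_out = out / "sites.panTro5.bed"   # hypothetical intermediate file
    step2_out = out / "sites.panTro6.bed"   # hypothetical final file
    return [
        ["liftOver", str(bed_in), str(chain_4_to_5),
         str(step1_out), str(out / "unmapped.step1.bed")],
        ["liftOver", str(step1_out), str(chain_5_to_6),
         str(step2_out), str(out / "unmapped.step2.bed")],
    ]


def run_two_step_lift(bed_in, chain_4_to_5, chain_5_to_6, out_dir):
    """Run both lifts in order; raises CalledProcessError if either step fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in two_step_lift_commands(bed_in, chain_4_to_5, chain_5_to_6, out_dir):
        subprocess.run(cmd, check=True)
```

Sites that fail to map at either step end up in the corresponding unmapped file, so it is worth checking how many records are lost at each hop.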

Some of what you are looking for could be accomplished for human using the Table Browser "Intersect" feature, our dbSNP tracks, and our chain alignments between panTro6 and hg38. The Intersect feature allows you to find regions in different tracks that do or do not overlap with each other, and this may be a useful step in finding your desired output. You may also want to try changing the Output Format to BED or sequence (FASTA). Using these tools could give you a list of regions or sequences that are not annotated as having any human variation. You may also want to start your investigation with a subset of the genome, such as a particular chromosome or only coding regions, as these intersections are computationally intensive. You may also be interested in the Data Integrator tool, which also has intersection features. We wish you the best for your research!
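
To make the "inverse intersect" idea concrete, here is a minimal sketch of the logic rather than a Table Browser recipe. It assumes you have already extracted gapped, aligned human and chimp sequences for a block (for example from a pairwise alignment file) on the plus strand with 0-based start coordinates, and that known variant positions for each species are available as sets. The function names and the toy data are hypothetical.

```python
def mismatch_sites(h_chrom, h_start, h_seq, c_chrom, c_start, c_seq):
    """List aligned positions where the human and chimp bases differ.

    h_seq and c_seq are gapped alignment rows of equal length ('-' = gap).
    Coordinates are 0-based and both rows are assumed to be on the plus strand.
    Returns tuples: (h_chrom, h_pos, h_base, c_chrom, c_pos, c_base).
    """
    sites = []
    h_pos, c_pos = h_start, c_start
    for h, c in zip(h_seq.upper(), c_seq.upper()):
        # Only count substitutions between unambiguous bases; skip gaps and N.
        if h in "ACGT" and c in "ACGT" and h != c:
            sites.append((h_chrom, h_pos, h, c_chrom, c_pos, c))
        # Advance each genome coordinate only when that row is not a gap.
        if h != "-":
            h_pos += 1
        if c != "-":
            c_pos += 1
    return sites


def drop_polymorphic(sites, human_snps, chimp_snps):
    """Inverse intersect: keep only sites absent from both variant sets.

    human_snps and chimp_snps are sets of (chrom, 0-based position).
    """
    return [
        s for s in sites
        if (s[0], s[1]) not in human_snps and (s[3], s[4]) not in chimp_snps
    ]


# Toy example with made-up coordinates and sequences.
divergent = mismatch_sites("chr1", 100, "ACG-TA", "chr1", 500, "ATGCTG")
fixed = drop_polymorphic(divergent, human_snps={("chr1", 104)}, chimp_snps=set())
```

Note that a site missing from dbSNP is only not known to vary. Given the limited chimp variation data mentioned above, the surviving sites are candidates for fixed differences rather than confirmed ones.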

Unfortunately, beyond this, we do not have any other resources or tools to help you further. You may want to ask your question on a more general bioinformatics forum like Biostars (https://www.biostars.org/) or ask others in the chimp research community.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues? Host a workshop: http://bit.ly/ucscTraining



--
Matthew Speir
User Experience, Quality Assurance and User Support
HCA, CIRM, and UCSC Genome Browser
UCSC Genomics Institute