Genomic coordinates from the HapMap track


Sandy Pineda-Gonzalez

Apr 16, 2018, 10:39:08 AM
to gen...@soe.ucsc.edu

Dear all,

I’ve been going through all the forums and public spaces (without any success) to try to work out how the coordinates for the HapMap track were calculated. To start with, I have the original set from the HapMap website, which looks as follows (where the columns denote chromosome, position, rate and map):

chr1 55550 2.981822 0
chr1 82571 2.082414 0.080572
chr1 88169 2.081358 0.092229
chr1 254996 3.354927 0.439456
chr1 564598 2.887498 1.478148
chr1 564621 2.885864 1.478214
chr1 565433 2.883892 1.480558
chr1 568322 2.88757 1.488889
chr1 568527 2.89542 1.489481
chr1 721290 2.655176 1.931794
chr1 723819 2.669992 1.938509
chr1 728242 2.671779 1.950319
chr1 729948 2.675202 1.954877
chr1 739010 2.677693 1.979119


As you can see, the original dataset above does not have the start-end coordinates, which are the ones I’m after. The UCSC HapMap release 24 recombination map does have start-end coordinates, but it seems to be missing some of the data at the beginning of chromosome 1 and is missing chromosome X completely. As an example, here is what I get from the track download (where the columns denote chromosome, start position, end position and rate):

chr1 10000 55550 -1
chr1 568322 568527 -1
chr1 568527 721290 2.68581
chr1 721290 723819 2.82227
chr1 723819 723891 2.98131
chr1 723891 728242 2.98062
chr1 728242 729948 3.0781
chr1 729948 740857 3.07513
chr1 740857 750235 1.7835


So in short I would like to know:

A) How are the coordinates calculated? (Do you guys assume the start position of the next coordinate is the same as the end coordinate from the previous one?)
B) Is there any specific reason why chromosome X was not included?
C) The rates of recombination are also different between the two datasets.
D) Is there any documentation anywhere that I could use as a guide to calculate the chromosome X coordinates based on the original HapMap dataset?
E) Why is the interval of each coordinate different in the track?

Any help would be awesome, and I’m sorry if I missed the answer somewhere else in the forum.

Regards,

Sandy


Matthew Speir

Apr 25, 2018, 7:38:15 PM
to Sandy Pineda-Gonzalez, gen...@soe.ucsc.edu
Hello, Sandy.

Thank you for your question about HapMap data in the UCSC Genome Browser. 

A) how are the coordinates calculated then? (Do you guys assume the start position of the next coordinate is the same as the end coordinate from the previous one?)

Yes, the start coordinate is used as the end coordinate of the previous item. We downloaded the data in hg18 (NCBI36) coordinates from https://mathgen.stats.ox.ac.uk/wtccc-software/recombination_rates/genetic_map_b36_combined.tgz . The file for chr1 begins like this:

position COMBINED_rate(cM/Mb) Genetic_Map(cM)
45413 -1 0
558185 -1 0
558390 -1 0
711153 2.6858076690 0
713682 2.8222713027 0.0067924076
713754 2.9813105581 0.0069956111
718105 2.9806151254 0.0199672934
719811 3.0780969498 0.0250522228
730720 3.0751332930 0.0586311824
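
As a side check on how the columns in this excerpt relate, the Genetic_Map(cM) column appears to be a running sum of rate times distance, with each row's rate (cM/Mb) applied over the stretch up to the next row's position. The Python sketch below recomputes the column from the nine rows above. The function name `cumulative_map` and the treatment of -1 as "no rate available, contributes 0 cM" are assumptions made for illustration, not something documented in the file.

```python
from typing import List, Tuple

# (position, COMBINED_rate in cM/Mb, Genetic_Map in cM), copied from the excerpt above.
ROWS: List[Tuple[int, float, float]] = [
    (45413, -1.0, 0.0),
    (558185, -1.0, 0.0),
    (558390, -1.0, 0.0),
    (711153, 2.6858076690, 0.0),
    (713682, 2.8222713027, 0.0067924076),
    (713754, 2.9813105581, 0.0069956111),
    (718105, 2.9806151254, 0.0199672934),
    (719811, 3.0780969498, 0.0250522228),
    (730720, 3.0751332930, 0.0586311824),
]


def cumulative_map(rows: List[Tuple[int, float, float]]) -> List[float]:
    """Recompute cumulative cM, assuming a row's rate applies up to the next row."""
    total = 0.0
    out = [total]
    for (pos, rate, _), (next_pos, _, _) in zip(rows, rows[1:]):
        if rate >= 0:  # assumption: -1 marks a missing rate and adds nothing
            total += rate * (next_pos - pos) / 1e6  # cM/Mb times Mb
        out.append(total)
    return out


if __name__ == "__main__":
    for (pos, _, listed), computed in zip(ROWS, cumulative_map(ROWS)):
        print(pos, listed, computed)
```

For these rows the recomputed values agree with the listed column to within rounding, which suggests that in the source file a row's rate describes the stretch starting at that row's position.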

To get coordinates on hg19 (GRCh37), the hg18 files were processed by liftOver with our hg18ToHg19.over.chain.gz.
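
The adjacency rule described above can be sketched in Python: consecutive positions in a per-chromosome file become (start, end) pairs, so each item ends where the next one begins. This is only an illustration, not the script used to build the track; `rows_to_intervals` and its `rate_from` parameter are hypothetical names. Which row supplies the rate for an interval is an assumption: comparing the hg18 excerpt above with the hg19 track lines quoted in the question (offset by a constant 10137 bp in this region), each track interval appears to carry the rate from the row at its end position, so that is the default here.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, float]                  # (position, rate in cM/Mb)
Interval = Tuple[str, int, int, float]   # (chrom, start, end, rate)


def rows_to_intervals(chrom: str, rows: Sequence[Row],
                      rate_from: str = "end") -> List[Interval]:
    """Pair consecutive positions into adjacent intervals.

    rate_from="end" takes the rate from the row at the interval's end
    (what the track excerpt in this thread appears to do);
    rate_from="start" takes it from the row at the interval's start.
    """
    if rate_from not in ("end", "start"):
        raise ValueError("rate_from must be 'end' or 'start'")
    ordered = sorted(rows)
    intervals: List[Interval] = []
    for (start, start_rate), (end, end_rate) in zip(ordered, ordered[1:]):
        rate = end_rate if rate_from == "end" else start_rate
        intervals.append((chrom, start, end, rate))
    return intervals


if __name__ == "__main__":
    # First rows of the hg18 chr1 excerpt quoted above.
    hg18_rows = [
        (45413, -1.0),
        (558185, -1.0),
        (558390, -1.0),
        (711153, 2.6858076690),
        (713682, 2.8222713027),
    ]
    for chrom, start, end, rate in rows_to_intervals("chr1", hg18_rows):
        print(f"{chrom}\t{start}\t{end}\t{rate}")
```

Lines produced this way are still in hg18 coordinates and would then be the input to liftOver with the chain file named above. Intervals that liftOver cannot map are left out of its output, which might be why the track excerpt in the question has no item between 55550 and 568322.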

B) Is there any specific reason why chromosome X was not included?

chrX was not included in the tar file mentioned above that we downloaded when creating the track. You should contact the people at Oxford who created the data regarding why chrX data was not included in this tar file.

C) The rates of recombination are also different between the two datasets.

This is likely because the data used for the "HapMap Release 24 recombination maps" track for hg18 and hg19 came from Oxford, not directly from HapMap.

D) Is there any documentation anywhere that I could use as a guide to calculate the chromosome X coordinates based on the original HapMap dataset?

There are chrX files present in the directory https://mathgen.stats.ox.ac.uk/wtccc-software/recombination_rates/, but again it is unclear whether this data is actually derived from HapMap data.

E) why is the interval of each coordinate different in the track?

Again, it would be best to ask the authors of this data at Oxford how this data was generated. It looks like this data was generated by Gil McVean's group at Oxford: https://www.bdi.ox.ac.uk/Team/gilean-mcvean.

I apologize for the confusion in the names of these tracks. It's not clear that the data in the "HapMap Release 24 combined recombination map" track under the "deCODE Recomb" track was actually created from HapMap data. We are looking for ways to rename this track to make the source of this data clearer.

Matthew Speir
UCSC Genome Bioinformatics Group
