Centromere regions in GRCh38


Neeba Dijo

Jul 1, 2014, 11:25:25 AM
to gen...@soe.ucsc.edu
Hi Team,

For hg19, I stored each chromosome's length, the centromere start and end, and the PAR regions in a database.

I took the data from the gap.txt file.

But for hg38, I didn't find any centromere data in the gap file.
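For reference, the hg19 approach boils down to filtering the gap table on its type column. Below is a minimal Python sketch, assuming the tab-separated UCSC gap table layout (bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge) with no header row; the function name gaps_of_type is hypothetical. Fed a few of the hg38 chr1 rows quoted in this message, a "centromere" filter comes back empty, which matches what I saw.

```python
import csv
import io

# Assumed column order of the UCSC gap table (tab-separated, no header row).
GAP_COLUMNS = ["bin", "chrom", "chromStart", "chromEnd", "ix", "n", "size", "type", "bridge"]


def gaps_of_type(lines, gap_type):
    """Return (chrom, start, end) for every gap row whose type equals gap_type."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(GAP_COLUMNS, row))
        if rec["type"] == gap_type:
            hits.append((rec["chrom"], int(rec["chromStart"]), int(rec["chromEnd"])))
    return hits


# Three hg38 chr1 rows copied from the gap data in this thread.
HG38_SAMPLE = (
    "585\tchr1\t0\t10000\t1\tN\t10000\ttelomere\tno\n"
    "1538\tchr1\t124977944\t124978326\t1857\tN\t382\tscaffold\tyes\n"
    "0\tchr1\t125184587\t143184587\t1871\tN\t18000000\theterochromatin\tno\n"
)

if __name__ == "__main__":
    # hg38 has no gaps of type "centromere", so this prints an empty list.
    print(gaps_of_type(io.StringIO(HG38_SAMPLE), "centromere"))  # → []
```

The same function accepts an open file handle for the full gap.txt, since csv.reader works on any iterable of lines.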

I know there has been a change:

"Centromere representation - Debuting in this release, the large megabase-sized gaps that were previously used to represent centromeric regions in human assemblies have been replaced by sequences from centromere models created by Karen Miga et al. using centromere databases developed during her work in the Willard lab at Duke University and analysis software developed while working in the Kent lab at UCSC. The models, which provide the approximate repeat number and order for each centromere, will be useful for read mapping and variation studies"

Here is the chr1 data from the hg38 gap file:

bin   chrom  chromStart  chromEnd   ix    n  size      type             bridge
585   chr1   0           10000      1     N  10000     telomere         no
586   chr1   207666      257666     5     N  50000     contig           no
587   chr1   297968      347968     7     N  50000     contig           no
589   chr1   535988      585988     10    N  50000     contig           no
605   chr1   2702781     2746290    48    N  43509     scaffold         yes
85    chr1   12954384    13004384   224   N  50000     scaffold         yes
713   chr1   16799163    16849163   277   N  50000     scaffold         yes
810   chr1   29552233    29553835   491   N  1602      scaffold         yes
1515  chr1   121976459   122026459  1845  N  50000     contig           no
1517  chr1   122224535   122224635  1847  N  100       contig           no
1519  chr1   122503147   122503247  1849  N  100       contig           no
1537  chr1   124785432   124785532  1851  N  100       contig           no
1537  chr1   124849129   124849229  1853  N  100       contig           no
1538  chr1   124932724   124932824  1855  N  100       contig           no
1538  chr1   124977944   124978326  1857  N  382       scaffold         yes
1538  chr1   125013060   125013223  1859  N  163       scaffold         yes
1538  chr1   125026048   125026071  1861  N  23        scaffold         yes
1538  chr1   125029104   125029169  1863  N  65        scaffold         yes
1539  chr1   125103213   125103233  1865  N  20        scaffold         yes
1539  chr1   125130246   125131847  1867  N  1601      scaffold         yes
1539  chr1   125171347   125173583  1869  N  2236      scaffold         yes
0     chr1   125184587   143184587  1871  N  18000000  heterochromatin  no
286   chr1   223558935   223608935  2922  N  50000     contig           no
36    chr1   228558364   228608364  2985  N  50000     scaffold         yes
2484  chr1   248946422   248956422  3281  N  10000     telomere         no

Can I take the "scaffold" gaps as the centromere regions?
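One quick way to test that idea is to summarize the chr1 gaps above by type. Below is a minimal Python sketch (the helper name summarize_by_type is hypothetical) that uses the 25 rows from this message, reduced to (chromStart, chromEnd, type). For each type it reports the number of gaps, their total size, the leftmost start, and the rightmost end. The scaffold gaps run from about 2.7 Mb to about 228.6 Mb, which suggests they are scattered along chr1 rather than forming one centromeric block.

```python
# chr1 gaps from the hg38 gap data in this thread, as (chromStart, chromEnd, type).
CHR1_GAPS = [
    (0, 10000, "telomere"),
    (207666, 257666, "contig"),
    (297968, 347968, "contig"),
    (535988, 585988, "contig"),
    (2702781, 2746290, "scaffold"),
    (12954384, 13004384, "scaffold"),
    (16799163, 16849163, "scaffold"),
    (29552233, 29553835, "scaffold"),
    (121976459, 122026459, "contig"),
    (122224535, 122224635, "contig"),
    (122503147, 122503247, "contig"),
    (124785432, 124785532, "contig"),
    (124849129, 124849229, "contig"),
    (124932724, 124932824, "contig"),
    (124977944, 124978326, "scaffold"),
    (125013060, 125013223, "scaffold"),
    (125026048, 125026071, "scaffold"),
    (125029104, 125029169, "scaffold"),
    (125103213, 125103233, "scaffold"),
    (125130246, 125131847, "scaffold"),
    (125171347, 125173583, "scaffold"),
    (125184587, 143184587, "heterochromatin"),
    (223558935, 223608935, "contig"),
    (228558364, 228608364, "scaffold"),
    (248946422, 248956422, "telomere"),
]


def summarize_by_type(gaps):
    """Map gap type -> (count, total_bases, min_start, max_end)."""
    summary = {}
    for start, end, gap_type in gaps:
        count, total, lo, hi = summary.get(gap_type, (0, 0, start, end))
        summary[gap_type] = (count + 1, total + (end - start), min(lo, start), max(hi, end))
    return summary


if __name__ == "__main__":
    for gap_type, (count, total, lo, hi) in sorted(summarize_by_type(CHR1_GAPS).items()):
        print(f"{gap_type:16} n={count:<3} bases={total:<9} span={lo}-{hi}")
```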

Thanks & Regards,
Neeba Sebastian

Brian Lee

Jul 3, 2014, 4:49:35 PM
to Neeba Dijo, gen...@soe.ucsc.edu
Dear Neeba,

Thank you for using the UCSC Genome Browser and your question about the hg38 centromere regions.

How you interpret the hg38 centromere regions depends on the purpose of your work. As an example, please load this session, which shows the centromere of chromosome 9 in hg38: http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=AngieHinrichs&hgS_otherUserSessionName=hg38_chr9_cen

New to hg38 on our genome-test development site is a centromeres track (the bright red track below the Hg19 Diff track; its table is named centromeres). This track shows the locations of Karen Miga's centromere model sequences. However, these annotations can be smaller than the centromeres shown in the chromosome ideogram and the Chromosome Bands track. The hg38.cytoBandIdeo table contains the rows used to draw the centromere in each chromosome ideogram: each chromosome has two adjacent entries, p11 (or p11.1) and q11 (or q11.1). For chr9, these are 9p11.1 at chr9:42200000-43000000 and 9q11 at chr9:43000000-45500000.
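If you want a single centromere start and end per chromosome from the ideogram definition, the two adjacent bands can be merged. Below is a minimal Python sketch, assuming rows in the UCSC cytoBand layout (chrom, chromStart, chromEnd, name, gieStain) and assuming the centromeric bands are the ones whose gieStain value is "acen"; the function name centromere_from_bands is hypothetical. With the two chr9 bands above, it yields chr9:42200000-45500000.

```python
def centromere_from_bands(rows):
    """Merge each chromosome's centromeric bands into one (start, end) span.

    Assumes rows are (chrom, chromStart, chromEnd, name, gieStain) and that
    centromeric bands are marked with gieStain == "acen".
    """
    spans = {}
    for chrom, start, end, _name, stain in rows:
        if stain != "acen":
            continue  # only centromeric bands contribute
        start, end = int(start), int(end)  # accept ints or strings from a file
        if chrom in spans:
            lo, hi = spans[chrom]
            spans[chrom] = (min(lo, start), max(hi, end))
        else:
            spans[chrom] = (start, end)
    return spans


# The two chr9 centromeric bands quoted in this thread (hg38 cytoBandIdeo).
CHR9_BANDS = [
    ("chr9", 42200000, 43000000, "p11.1", "acen"),
    ("chr9", 43000000, 45500000, "q11", "acen"),
]

if __name__ == "__main__":
    print(centromere_from_bands(CHR9_BANDS))  # → {'chr9': (42200000, 45500000)}
```

To run it on a downloaded copy of the table, split each line on tabs and pass the resulting tuples in; the int() calls handle string coordinates.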

A couple of things to note in the above session:
1. There are annotations extending into 9p11.1 and 9q11.
2. The Gap track has some tiny gaps in 9q11, close to the centromere (bright red) track, but notice the large gap labeled "heterochromatin". That's not the centromere, but it is not accessible to current sequencing technologies either.

In summary, depending on your purpose, you could use the centromere model regions (red; centromeres table), the broader Chromosome Bands Ideogram definition of the centromere, which overlaps some annotations (cytoBandIdeo table), or the regions of the assembly that are just runs of N's (Gap track).

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

