Hi Jake,
Thank you for the suggestion. I will make sure it is taken into
consideration for future responses about getting rRNA gene
coordinates from the GENCODE track in the UCSC Genome Browser.
As for a track that may have a complete set of rRNA annotations, I'm
not sure that we have one on our public site. I was going to suggest
the RefSeq Genes track, but even that only appears to contain a
subset of the possible rRNA gene annotations. According to the
GenBank description of the 28S rRNA,
https://www.ncbi.nlm.nih.gov/nuccore/NR_003287.2?report=genbank, the
regions containing the 45S rRNA precursor for the 18S, 5.8S and
28S rRNA should be found on chromosomes 13, 14, 15, 21 and 22.
However, our RefSeq Genes track only contains annotations for these
three rRNA genes on chr21, an unlocalized chr22 scaffold, and an
unplaced scaffold:
NR_003287 chr21 + 8213887 8401980 rRNA RNA28S5
NR_003285 chr21 + 8212571 8212727 rRNA RNA5-8S5
NR_003285 chr21 + 8256780 8256936 rRNA RNA5-8S5
NR_003286 chr21 + 8209630 8211499 rRNA RNA18S5
NR_003285 chr21 + 8395606 8395762 rRNA RNA5-8S5
NR_003285 chr21 + 8439822 8439978 rRNA RNA5-8S5
NR_046235 chr21 + 8433221 8446572 rRNA RNA45S5
NR_003287 chr21 + 8441145 8446211 rRNA RNA28S5
NR_003286 chr21 + 8436875 8438744 rRNA RNA18S5
NR_003286 chr21 + 8392665 8394534 rRNA RNA18S5
NR_003287 chrUn_GL000220v1 + 113347 118417 rRNA RNA28S5
NR_046235 chrUn_GL000220v1 + 105423 118780 rRNA RNA45S5
NR_003285 chrUn_GL000220v1 + 112024 112180 rRNA RNA5-8S5
NR_003286 chrUn_GL000220v1 + 109077 110946 rRNA RNA18S5
NR_003285 chrUn_GL000220v1 + 155996 156152 rRNA RNA5-8S5
NR_003286 chrUn_GL000220v1 + 153049 154918 rRNA RNA18S5
NR_003287 chr22_KI270733v1_random + 130203 135280 rRNA RNA28S5
NR_046235 chr22_KI270733v1_random + 122272 135645 rRNA RNA45S5
NR_003285 chr22_KI270733v1_random + 128876 129032 rRNA RNA5-8S5
NR_003286 chr22_KI270733v1_random + 125930 127799 rRNA RNA18S5
NR_003285 chr22_KI270733v1_random + 173955 174111 rRNA RNA5-8S5
NR_003286 chr22_KI270733v1_random + 171011 172880 rRNA RNA18S5
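If it helps to see how the annotations are distributed, here is a
minimal Python sketch that tallies rows like those above by sequence
and gene symbol. It assumes the columns are accession, sequence name,
strand, start, end, type, and gene symbol (my reading of the listing,
not an official schema), and it uses only a few rows copied from the
list; the names LISTING and tally are just for illustration.

```python
from collections import Counter

# A few rows copied from the listing above. Assumed column order:
# accession, sequence name, strand, start, end, type, gene symbol.
LISTING = """\
NR_046235 chr21 + 8433221 8446572 rRNA RNA45S5
NR_003287 chr21 + 8441145 8446211 rRNA RNA28S5
NR_003286 chr21 + 8436875 8438744 rRNA RNA18S5
NR_046235 chrUn_GL000220v1 + 105423 118780 rRNA RNA45S5
NR_003286 chrUn_GL000220v1 + 109077 110946 rRNA RNA18S5
NR_046235 chr22_KI270733v1_random + 122272 135645 rRNA RNA45S5
"""


def tally(listing):
    """Count annotations per (sequence name, gene symbol) pair."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed rows
        _acc, seq, _strand, _start, _end, _type, gene = fields
        counts[(seq, gene)] += 1
    return counts


counts = tally(LISTING)
for (seq, gene), n in sorted(counts.items()):
    print(seq, gene, n)
```

Pasting the full listing into LISTING should give the per-sequence
counts for each rRNA gene.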
In addition to these annotations, there are a number of 5S rRNAs.
The RefSeq Genes track is based on aligning RNAs from RefSeq to the
genome and then selecting the best alignments. In cases where
there are two best alignments, both are kept. I know that GENCODE
annotated a number of 5S pseudogenes, so it's possible that some of
the multiple alignments for a particular 5S gene could overlap with
some of the GENCODE 5S pseudogenes, though I haven't investigated
this. You can read more about how the RefSeq Genes track is
constructed on the track description page here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refGene. If
you're interested, I can give some instructions on how to extract
rRNA annotations from the RefSeq Genes track.
Even though this output contains a few more annotations for the
larger rRNA genes, I doubt that this would be a complete annotation
of all the rRNA genes in the genome. This is because, according to
Wikipedia, each cluster on chromosomes 13, 14, 15, 21, and 22
contains 30-40 repeats of the 45S rRNA precursor gene whereas the
above list only contains one 45S annotation per chromosome and only
a few 5.8S, 18S, and 28S annotations outside of these. That's not
even counting the rRNA annotations missing from chr13, chr14, and
chr15. You can try asking this question on other, more general
online biology help forums including Biostars or SEQanswers to see
if others have recommendations for finding a full set of all rRNA
genes in the genome or how to best mask these rRNA genes from your
RNA-seq analysis.
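As a possible starting point for masking, here is a minimal Python
sketch that turns rows in the format of the listing above into
six-column BED lines. It assumes the start and end values are already
0-based, half-open coordinates as in our tables, and that the columns
are accession, sequence name, strand, start, end, type, and gene
symbol; the names ROWS and listing_to_bed are hypothetical, and only a
few rows from the listing are included.

```python
# A few rows copied from the listing above. Assumed column order:
# accession, sequence name, strand, start, end, type, gene symbol.
ROWS = """\
NR_046235 chr21 + 8433221 8446572 rRNA RNA45S5
NR_003285 chr21 + 8212571 8212727 rRNA RNA5-8S5
NR_003286 chr21 + 8209630 8211499 rRNA RNA18S5
"""


def listing_to_bed(listing):
    """Return BED6 lines sorted by sequence name and start position."""
    records = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed rows
        acc, seq, strand, start, end, _type, gene = fields
        records.append((seq, int(start), int(end), f"{gene}|{acc}", strand))
    records.sort(key=lambda r: (r[0], r[1]))
    # BED6 columns: chrom, start, end, name, score, strand.
    # The score is set to 0 because the listing carries no score.
    return [
        "\t".join([seq, str(start), str(end), name, "0", strand])
        for seq, start, end, name, strand in records
    ]


bed = listing_to_bed(ROWS)
print("\n".join(bed))
```

Keep in mind that a BED file built this way only covers the rRNA
annotations that are present in the track, so it would not mask the
repeats that are missing from the listing.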