hg38.ncbiRefSeq missing transcripts found in hg38.refFlat

Tim Fennell

Jan 8, 2018, 1:53:04 PM

to gen...@soe.ucsc.edu

Hi,

I'm trying to understand why some transcripts are present in the UCSC alignments of RefSeq RNAs, but are not present in the data imported directly from RefSeq. I've read through the announcement blog post and the FAQ entry that I could find, but neither answered my question.

An example is probably easiest. Take the gene PDGFRA. If I run the following two queries against the public MySQL server:

select * from hg38.refFlat where geneName = 'PDGFRA' and name like 'NM_%';

select * from hg38.ncbiRefSeq where name2 = 'PDGFRA' and name like 'NM_%';

the first one returns five transcripts (NM_001347827, NM_001347828, NM_001347829, NM_001347830, NM_006206) while the second one returns only a single transcript: NM_006206.4.

I thought perhaps that RefSeq only provided alignments for the single transcript. I couldn't find a reference at UCSC for exactly how the alignments from RefSeq are accessed, but on the RefSeq site I found the following GFF which I'm assuming contains data from the same source:

ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz

That GFF contains entries for all five transcripts, with alignments.
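
One way to repeat that check is to scan the GFF for rows annotated with the gene. This is a rough sketch only: it assumes the rows in NCBI's GFF3 carry `gene=` and `transcript_id=` keys in the ninth column, and the function name `transcripts_for_gene` is made up for illustration.

```python
def transcripts_for_gene(gff_lines, gene, prefix="NM_"):
    """Collect transcript accessions (version included) annotated for one gene."""
    found = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip header and directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature row
        # Column 9 holds semicolon-separated key=value attributes.
        # ASSUMPTION: 'gene' and 'transcript_id' are the relevant keys.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        accession = attrs.get("transcript_id", "")
        if attrs.get("gene") == gene and accession.startswith(prefix):
            found.add(accession)
    return sorted(found)
```

With the file downloaded, something like `transcripts_for_gene(gzip.open(path, "rt"), "PDGFRA")` should list the accessions, where `path` is wherever the GFF was saved and `gzip` is the standard-library module.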

More broadly, it appears that roughly 12% of the NM transcripts present in hg38.refFlat are not present in hg38.ncbiRefSeq. The following query returns a count of 7,725, while the total number of NM transcripts in hg38.refFlat is 65,664:

select count(distinct rf.name)
from hg38.refFlat rf
left join hg38.ncbiRefSeq nm on rf.geneName = nm.name2 and rf.name = substr(nm.name, 1, instr(nm.name, '.')-1)
where nm.name is null
and rf.name like 'NM_%';
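
The logic of that anti-join can be tried on a small scale without the public server. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL (both accept `substr` and `instr` with these arguments) and tables reduced to the columns the join uses. The helper name `missing_from_ncbi` is hypothetical, and the rows are just the PDGFRA accessions quoted above.

```python
import sqlite3

def missing_from_ncbi(ref_flat_rows, ncbi_rows):
    """Return refFlat NM accessions with no version-stripped match in ncbiRefSeq."""
    con = sqlite3.connect(":memory:")
    # Toy stand-ins for the two UCSC tables: only the columns the join needs.
    con.execute("CREATE TABLE refFlat (geneName TEXT, name TEXT)")
    con.execute("CREATE TABLE ncbiRefSeq (name2 TEXT, name TEXT)")
    con.executemany("INSERT INTO refFlat VALUES (?, ?)", ref_flat_rows)
    con.executemany("INSERT INTO ncbiRefSeq VALUES (?, ?)", ncbi_rows)
    rows = con.execute(
        """
        SELECT DISTINCT rf.name
        FROM refFlat rf
        LEFT JOIN ncbiRefSeq nm
          ON rf.geneName = nm.name2
         -- ncbiRefSeq names carry a version suffix (e.g. '.4'); strip it
         AND rf.name = substr(nm.name, 1, instr(nm.name, '.') - 1)
        WHERE nm.name IS NULL
          AND rf.name LIKE 'NM_%'
        ORDER BY rf.name
        """
    ).fetchall()
    con.close()
    return [row[0] for row in rows]

# The PDGFRA example from this message.
ref_flat = [("PDGFRA", acc) for acc in (
    "NM_001347827", "NM_001347828", "NM_001347829", "NM_001347830", "NM_006206",
)]
ncbi = [("PDGFRA", "NM_006206.4")]
print(missing_from_ncbi(ref_flat, ncbi))
# ['NM_001347827', 'NM_001347828', 'NM_001347829', 'NM_001347830']
```

The left join keeps every refFlat row, and the `IS NULL` filter then retains only those that found no partner, which is why the four newer accessions come back and NM_006206 does not.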

Any light that could be shed on these differences would be greatly appreciated! Thanks,

-t

Cath Tyner

Jan 8, 2018, 8:42:53 PM

to Tim Fennell, UCSC Genome Browser Public Help Forum

Hi Tim,

Thanks for contacting the UCSC Genome Browser support team. Our ncbiRefSeq table is updated manually and will therefore be out of sync with (older than) the automatically updated refGene/refLink tables.

The accessions you mentioned (NM_001347827, NM_001347828, NM_001347829, NM_001347830) all seem to have been added to NCBI only recently:

e.g.,

https://www.ncbi.nlm.nih.gov/nuccore/NM_001347828

29-DEC-2017

Whereas the hg38 ncbiRefSeq table was last updated before those additions:

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=ncbiRefSeq

Data last updated: 2016-11-29

When this ncbiRefSeq table is updated, we'll send an announcement out to our low-volume email announcements list (subscription info below).

Please respond to this list if you have further questions, and please always feel free to search our mailing list archives for related posts.

Thank you for contacting the UCSC Genome Browser support team.

Please send new and follow-up questions to one of our mailing lists below:

 * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives

 * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives

 * Confidential/private help: Email genom...@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, Wordpress Blog, YouTube

UCSC Genome Browser Announcements List (for new data & software)

Request on-site training & workshops at your institution

Enjoy,

Cath

Cath Tyner

UCSC Genome Browser, Software QA & User Support

UC Santa Cruz Genomics Institute

UCSC Genome Browser

Tim Fennell

Jan 9, 2018, 11:43:37 AM

to Cath Tyner, UCSC Genome Browser Public Help Forum

Thanks Cath. I did wonder about that. Looking at the revision history at NCBI they've been around a bit longer than that:

https://www.ncbi.nlm.nih.gov/nuccore/NM_001347828.1?report=girevhist

but still only since Dec 15, 2016, which is still later than the last update date on the ncbiRefSeq table.

I'm a little surprised to find that ncbiRefSeq hasn't been updated in over a year, especially as the default settings for the RefSeq track in the browser are now to show only data from ncbiRefSeqCurated for hg38. Are there any plans to increase the update frequency? Thanks again for your help,

-t

Cath Tyner

Jan 10, 2018, 3:56:56 PM

to Tim Fennell, UCSC Genome Browser Public Help Forum

Hello again Tim,

Our engineering team is currently developing a process for updating this track on a regular basis. Since it's a new track in the Browser, we have initially needed to complete much of this work manually. As you've mentioned, the track is important and in high demand, so we are working to streamline these procedures. If you subscribe to our announcements list, we'll certainly post a notice there when this track has been updated. In the future, it is our goal to provide updates for this track on a frequent basis.

For reference, here is the source that we aim to be in sync with:
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCF_000001405.37_GRCh38.p11/

Thanks for your patience, and thanks for using the UCSC Genome Browser.

Cath
