Comparison of UCSC RefSeq & RefSeq Curated

Yung-Chih Lai

Dec 7, 2017, 11:11:23 AM
to UCSC Genome Browser Discussion List
Hi,

When I analyze RNA-Seq data, I always use the UCSC RefSeq annotation for mouse. Now there are similar but different options (i.e. RefSeq Curated, RefSeq Predicted, RefSeq Other, RefSeq Alignments). Should I still use the UCSC RefSeq annotation? Could you tell me the advantages and disadvantages of these similar annotations? Usually, we produce about 30 million single-end 75 bp reads for each sample, and we are interested in gene-level differential expression analysis. Many thanks.

Best,

Gary 

Cath Tyner

Dec 7, 2017, 6:56:26 PM
to gen...@soe.ucsc.edu, yungc...@gmail.com
Hi Gary,

Thank you for contacting the UCSC Genome Browser. As you have found, and as seen in our news page, we released the new NCBI RefSeq tracks for mm10 last month.

Here is a related FAQ which points out the differences between the older UCSC RefSeq track (which you were used to using) and the newer NCBI RefSeq track: 


To summarize the FAQ linked above:
Because direct coordinates were previously unavailable from NCBI, the older UCSC RefSeq track was generated by aligning NCBI RefSeq sequences to the genome with our BLAT alignment tool and then filtering for the best matches. In some (relatively rare) cases, a sequence would map to multiple loci. Only the highest-scoring matches among these multiply mapped regions were kept, and all of them were given the same RefSeq accession identifier. Without knowing that an alignment process was used, it could be confusing to find the same identifier in more than one location. In some research cases, we believe it could be helpful to know about these high-scoring multiple matches with the same identifier, since that links them back to their .fa source.
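The "keep only the highest-scoring matches" step described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline UCSC actually ran: the exact filtering criteria are not given in this thread, so the rule used here (keep every hit tied for the top score of its accession) is an assumption, and the accessions, coordinates, and scores are made up.

```python
# Hypothetical alignment hits: (accession, chrom, start, score).
# These values are invented for illustration; they are not real BLAT output.
hits = [
    ("NM_EXAMPLE1", "chrX", 1000, 980),
    ("NM_EXAMPLE1", "chrX", 50000, 980),  # ties the best score, so it is kept too
    ("NM_EXAMPLE1", "chr2", 7000, 640),   # lower score, dropped
    ("NM_EXAMPLE2", "chr5", 300, 1500),
]


def keep_top_scoring(hits):
    """Keep, for each accession, every hit tied for that accession's best score.

    Assumed rule for illustration only; the real track used its own criteria.
    """
    best = {}
    for acc, _chrom, _start, score in hits:
        best[acc] = max(best.get(acc, score), score)
    return [hit for hit in hits if hit[3] == best[hit[0]]]


kept = keep_top_scoring(hits)
for acc, chrom, start, score in kept:
    print(acc, chrom, start, score)
```

Under this assumed rule, one accession can end up at two loci whenever two alignments tie, which is the situation the paragraph above describes.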

Recently, direct coordinates from NCBI became available, giving us the opportunity to create the new NCBI RefSeq track. With the direct coordinates from NCBI (and not just the .fa sequence), an aligner was no longer needed. We are now able to display the non-redundant NCBI RefSeq annotations in our browser without any discrepancy from NCBI. Meanwhile, we wanted to keep the older UCSC annotations of RefSeq as well, as a subtrack.

Digging even deeper, you can read a UCSC Genome Browser blog post which was written when this same NCBI RefSeq track was released for human GRCh38/hg38 (the blog also pertains to mm10). 

Example showing the difference between UCSC RefSeq and NCBI RefSeq tracks:
A great example to visualize these differences is to go to the hg38 browser and enter this RefSeq accession ID into the search field: "NM_173571". You'll be taken to a search results page which shows just 1 hit from the non-redundant NCBI RefSeq track. However, from the BLAT-aligned UCSC RefSeq track, you'll see 10 hits for this accession where the sequence aligned with high scores to various loci on chrX. 

To really see what's happening with this multiple alignment, you can copy the sequence of the NCBI accession NM_173571 and then use our BLAT tool to find its alignments to the genome. There you'll see all BLAT results, not just the 10 matches of 100% identity that passed the UCSC RefSeq track criteria.
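For a gene-level analysis it can be useful to check which accessions in an annotation table appear at more than one locus, as NM_173571 does in the UCSC RefSeq track. The following is a rough Python sketch under stated assumptions: the five-column tab-separated layout (name, chrom, strand, txStart, txEnd) is a simplified stand-in for a real annotation download, and the accessions and coordinates are invented.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in an assumed simplified layout:
# name <TAB> chrom <TAB> strand <TAB> txStart <TAB> txEnd
TABLE = "\n".join([
    "NM_EXAMPLE1\tchrX\t+\t1000\t5000",
    "NM_EXAMPLE1\tchrX\t-\t90000\t94000",
    "NM_EXAMPLE2\tchr5\t+\t300\t2300",
])


def loci_by_accession(handle):
    """Map each accession to the set of distinct loci where it is annotated."""
    loci = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, chrom, strand, tx_start, tx_end = row
        loci[name].add((chrom, strand, int(tx_start), int(tx_end)))
    return loci


def multi_locus(loci):
    """Return only the accessions annotated at more than one locus."""
    return {acc: sorted(places) for acc, places in loci.items() if len(places) > 1}


dupes = multi_locus(loci_by_accession(io.StringIO(TABLE)))
for acc, places in dupes.items():
    print(acc, len(places), "loci")
```

With a real table, the column order would need to be adapted to the file actually downloaded. An empty result would be expected for a non-redundant annotation such as the one described for the NCBI RefSeq track.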

Understanding the NCBI RefSeq subtracks:
To understand more about the NCBI RefSeq subtracks (RefSeq Curated, RefSeq Predicted, RefSeq Other, RefSeq Alignments) in the UCSC Genome Browser, check out the mm10 track description page. Once you review the description page, you can learn more about the accession types over at NCBI:


NCBI resources
Please note that RefSeq is "the NCBI database of reference sequences; a curated, non-redundant set including genomic DNA contigs, mRNAs and proteins for known genes, and entire chromosomes."

From the summary section in the link above:
"If multiple INSDC submissions represent the same molecule for an organism, the "best" sequence is chosen to represent as the RefSeq record. "

Please respond to this list if you have further questions, and please always feel free to search our mailing list archives for related posts.

Thank you for contacting the UCSC Genome Browser support team. 
Please send new and follow-up questions to one of our mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute

