UCSC genome browser error?


Pepperell, John

Jun 9, 2017, 11:39:51 AM
to gen...@soe.ucsc.edu

Hello,

I work in the clinical genetics laboratory at Women & Infants Hospital in Providence, RI. I have often used the UCSC Genome Browser and the University of Miami/Oklahoma SNP evaluation tool (Genomic Oligoarray and SNP array evaluation tool v3.0, http://firefly.ccs.miami.edu/cgi-bin/ROH/ROH_analysis_tool.cgi), and have found both very useful in my analysis of microarray data.

 

Today (6/9/2017 at 10 am) I used both websites and noticed a discrepancy between the list of genes provided by the SNP evaluation tool and the list of genes identified by the UCSC Genome Browser/NCBI link. I examined the coordinates chr1:146016526-147929323 (set to GRCh37/hg19 for both websites). Using the SNP evaluation tool v3.0, 23 OMIM genes are identified (GPR89A > NBPF11), while using the UCSC browser, 10 OMIM genes are identified (NBPF11 > GPR89B). I believe I am entering the data into both websites correctly, and I wonder whether there may be an error with one of the websites. I will also contact the authors of the Genomic Oligoarray and SNP array evaluation tool v3.0 about this observation.

 

I hope you will be able to help in this matter.

 

Thank you.

 

John

 

 

John Pepperell, Ph.D.

Associate Research Scientist

Genetics Division, Department of Pathology and Laboratory Medicine

Women & Infants Hospital

Providence, Rhode Island 02905

Tel: 401-453-7652.

FAX 401-453-7547.

 





Brian Lee

Jun 9, 2017, 1:05:18 PM
to Pepperell, John, gen...@soe.ucsc.edu
Dear Dr. Pepperell,

Thank you for using the UCSC Genome Browser and for your question about the output differences between the UCSC browser and the University of Miami/Oklahoma SNP tool.

After investigating the differences, the main takeaway is that different tools may process the same input differently. The results are likely not a bug or data discrepancy at either site, but rather a reflection of how each tool attempts to provide the most useful results (each with its own potential issues).

Thank you for introducing the Miami/Oklahoma SNP evaluation tool. Using the provided coordinate range chr1:146016526-147929323, I was able to replicate the results you shared: the PartC All genes tab shows the list of 23 genes you described (GPR89A > NBPF11). At UCSC, I see the smaller number of genes (about 10), as seen in this session link:

An important point when looking at the Miami/Oklahoma SNP evaluation tool results is that some of their OMIM genes fall outside the input coordinate range of chr1:146016526-147929323. An example is POLR3C, which lies a bit to the left of that range. However, I do note that POLR3C falls inside some of the NBPF genes that share part of the original coordinate range, so perhaps the SNP tool is algorithmically expanding the potential result space to give users a fuller list of results. Here is a session where you can see the POLR3C gene highlighted outside of the original large highlighted region: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg19.POLR3C.miami
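To make the "inside or outside the input range" check concrete, here is a minimal Python sketch of a 1-based, inclusive interval overlap test. The gene coordinates in it are hypothetical placeholders chosen only to illustrate the situation described above (a gene left of the query that sits inside a larger gene overlapping the query); they are not real annotations, and this is not a description of how the SNP tool actually selects genes.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_position(text: str) -> Interval:
    """Parse a browser-style position such as 'chr1:146016526-147929323'."""
    chrom, span = text.split(":")
    start, end = span.replace(",", "").split("-")
    return Interval(chrom, int(start), int(end))


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


query = parse_position("chr1:146016526-147929323")

# HYPOTHETICAL coordinates, for illustration only (not real gene positions).
left_gene = Interval("chr1", 145_900_000, 145_950_000)      # left of the query
spanning_gene = Interval("chr1", 145_800_000, 146_100_000)  # reaches into the query

print(overlaps(query, left_gene))          # False: outside the input range
print(overlaps(query, spanning_gene))      # True: shares part of the range
print(overlaps(spanning_gene, left_gene))  # True: left gene sits inside the larger gene
```

A tool that reports every gene overlapping any gene that overlaps the query would list `left_gene` here, while a strict overlap query would not. Whether the SNP tool works this way is a guess, which is why contacting its authors is worthwhile.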

So in essence, the differences in results are likely from differences in how the two tools work, and it may require some investigation by users to interpret how to best use results; it is an excellent idea to contact the SNP tool authors to learn more.

One of the Browser engineers also noted that this region in GRCh37/hg19 has had many of the genomic regions dropped in the GRCh38/hg38 assembly. In the second session above with POLR3C, at the very top you will see many red regions indicating where hg19 contigs were dropped in the construction of the hg38 assembly. This suggests that much of this region in hg19 was considered obsolete sequence for the latest human assembly, and that may be another thing to consider as you pursue analyzing results.

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute


Pepperell, John

Jun 9, 2017, 4:18:57 PM
to Brian Lee, gen...@soe.ucsc.edu

Dear Brian,

Thank you for your quick reply. I also received a reply from the author of the Miami/Oklahoma SNP tool and have copied and pasted it below. I did try to convert the coordinates chr1:146016526-147929323 to hg38 using the LiftOver tool, but received a “#Split in new” error message, so I was unable to try the different-build comparison suggested by the author. However, I am glad that it looks like there is no fundamental problem with either site.

 

Thanks again.

John

 

>> 

The main reason for the differences is that you compared different genome assembly versions. Our tool converts the input coordinates to genome version hg38, then looks for OMIM genes located within those coordinates. The genome assembly in the UCSC Genome Browser session you mentioned is very likely set to hg19 by default. If you change the genome version to hg38 in the UCSC Genome Browser like this, http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A145512842%2D148457202&hgsid=595207749_7InAFcqbA6Oij1FOarezXlLd5qMi

then you should see similar OMIM IDs.
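As a small aside, a link like the one above can be built without the session-specific parameters. The sketch below assumes only the `db` and `position` query parameters that already appear in that URL; the helper name `browser_url` is made up for illustration.

```python
from urllib.parse import urlencode

HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db: str, chrom: str, start: int, end: int) -> str:
    """Build an hgTracks link for one assembly and one position."""
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"{HGTRACKS}?{query}"


# Same region as in the link above, stated explicitly as hg38.
hg38_link = browser_url("hg38", "chr1", 145512842, 148457202)
# Original query region, stated explicitly as hg19.
hg19_link = browser_url("hg19", "chr1", 146016526, 147929323)

print(hg38_link)
print(hg19_link)
```

Spelling out `db` in every link makes it harder to compare results from two assemblies by accident.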

In addition, our tool provides six more OMIM IDs (#108770, #116200, #274000, #602390, #614049, #614920), which are OMIM phenotype IDs and are usually not provided by the UCSC Genome Browser.

Hope this answers your question.

Best,
Zhijie


 



Brian Lee

Jun 9, 2017, 6:18:21 PM
to Pepperell, John, gen...@soe.ucsc.edu
Dear John,

Thank you for sharing the response from the Miami/Oklahoma SNP tool. It sounds like their tool is doing its best to provide useful information by pulling from hg38.

To help explain the “#Split in new” error message you saw when converting the coordinates chr1:146016526-147929323 to hg38 with the LiftOver tool: it follows from the earlier note that this region in GRCh37/hg19 has had many of its contigs dropped in the GRCh38/hg38 assembly. LiftOver does not return a single result, since there is no single clearly corresponding location in hg38. Another way to see this is to click the "View" menu while on the region in hg19, choose "In Other Genomes (Convert)", and select hg38. You will see a list of many different locations in hg38, where the top hit, an 85.2% match, has the same coordinates provided by the SNP tool team (chr1:145512842-148457202, which includes the POLR3C observation from before).

So it looks like, as you noted, there is no fundamental problem with either site. Thank you again for your inquiry, for using the UCSC Genome Browser, and for sharing information about the SNP tool. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genomics Institute