[genome] How to find out which is the nearest RefSeq gene to a given hg19 coordinate


Enrique Medina-Acosta

Sep 20, 2017, 4:33:29 PM
to UCSC Genome Browser Mailing List

Dear UCSC Genome Browser support team,

I am trying, without success, to find a Table Browser-based way to get the start and end positions of the genes closest to a list of coordinate positions (hg19).

Say, for example, I have the list of coordinates of interest (hg19) shown below, and I want to find the coordinates of the nearest annotated genes (RefSeq genes) in both directions (to the left and to the right).

Example list of coordinates:

chr1	1413937	1413938
chr5	121812697	121812698
chr5	180702296	180702297
chr6	27799457	27799458
chr6	31779620	31779621
chr7	45808644	45808645
chr8	22877790	22877791
chr8	22921537	22921538
chr21	40751102	40751103
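For command-line tools, a list like this can be stored as a tab-separated BED3 file (chromosome, start, end). BED intervals are 0-based and half-open, so each line spans a single base. A minimal sketch; the file name hg19.coords is an assumption chosen to match the file name used in the reply's commands:

```shell
# Write the example hg19 positions to a tab-separated BED3 file.
# printf reuses its format string until all arguments are consumed,
# so each group of three arguments becomes one line: chrom, start, end.
printf '%s\t%s\t%s\n' \
  chr1  1413937   1413938 \
  chr5  121812697 121812698 \
  chr5  180702296 180702297 \
  chr6  27799457  27799458 \
  chr6  31779620  31779621 \
  chr7  45808644  45808645 \
  chr8  22877790  22877791 \
  chr8  22921537  22921538 \
  chr21 40751102  40751103 \
  > hg19.coords
```

Using printf rather than pasting from an email client avoids the spaces that often replace tabs when text is copied.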

I tried both the Table Browser's intersection option and the Data Integrator, but I could not find a table feature that corresponds to “nearest gene.”

Any suggestions?

Best regards,

Enrique


Prof. Dr. Enrique Medina-Acosta, M.Sc., PhD.
MyCVLattes - CNPq
MyResearcherID (ISI Web of Knowledge)
MyORCID

Senior Associate Professor
Unit Coordinator - Molecular Identification and Diagnostics Unit - NUDIM
Laboratory of Biotechnology, Center for Biosciences and Biotechnology
Universidade Estadual do Norte Fluminense Darcy Ribeiro - UENF
Building P4, rooms 212/228
Avenida Alberto Lamego 2000, Parque Califórnia, CEP 28013-602, Campos dos Goytacazes, RJ, Brazil.
Tel: +55 22 27397085



Christopher Lee

Sep 22, 2017, 12:40:21 PM
to Enrique Medina-Acosta, UCSC Genome Browser Mailing List

Hi Enrique,

Thank you for your question about finding the RefSeq genes closest to a given coordinate. Unfortunately, we do not have a tool for finding the genes closest to a given coordinate, but we do have one for finding the "nearest" genes to another gene (via a large selection of "nearest" criteria), the Gene Sorter:
http://genome.ucsc.edu/cgi-bin/hgNear

However, since you want to find the genes nearest to a set of coordinates, I think the closest-features tool from the BEDOPS suite is probably your best bet:
https://bedops.readthedocs.io/en/latest/

Here is a method to find the nearest RefSeq transcripts on either side (left and right) of your given coordinates:

# copy and paste coords of interest into a tab-separated file, then sort
# (BEDOPS expects sort-bed order: chromosome compared byte-wise, then numeric start and end):
$ LC_ALL=C sort -k1,1 -k2,2n -k3,3n hg19.coords > hg19.coords.sorted

# get hg19 RefSeq transcripts as BED4 (chrom, txStart, txEnd, name) and sort the same way:
$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -Ne "select chrom, txStart, txEnd, name from refGene" hg19 | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > hg19.refSeq.bed

# find closest:
$ closest-features hg19.coords.sorted hg19.refSeq.bed
chr1    1413937    1413938|chr1    1413494    1431584    NM_001317238|chr1    1447522    1470067    NM_001170535
chr21    40751102    40751103|chr21    40714240    40721047    NM_004965|chr21    40752169    40769815    NM_004627
chr5    121812697    121812698|chr5    121772191    121814782    NR_051996|chr5    121917193    121920295    NR_134281
chr5    180702296    180702297|chr5    180688212    180699308    NR_102762|chr5    180750506    180755196    NR_028322
chr6    27799457    27799458|chr6    27798951    27799305    NM_003541|chr6    27805657    27806117    NM_003510
chr6    31779620    31779621|chr6    31777395    31782835    NM_005527|chr6    31783290    31785719    NM_005345
chr7    45808644    45808645|chr7    45763385    45808617    NR_024271|chr7    45844827    45847509    NR_146388
chr8    22877790    22877791|chr8    22877647    22926700    NR_027140|chr8    22925741    22941132    NR_038873
chr8    22921537    22921538|chr8    22877647    22926700    NR_027140|chr8    22925741    22941132    NR_038873
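If the distance to each neighboring transcript is also wanted, the "|"-delimited output can be post-processed. A minimal sketch, assuming the output format shown above (query|left|right, whitespace-separated fields, transcript name in the fourth field) and assuming that a side with no feature is printed as NA. annotate_gaps is a hypothetical helper name; its gap is the number of bases between half-open BED intervals, which may not match the convention BEDOPS uses when it reports distances itself:

```shell
# Hypothetical helper: read closest-features output lines ("query|left|right")
# on stdin and print, tab-separated: query chrom, start, end, then the left
# transcript name and gap, then the right transcript name and gap.
annotate_gaps() {
  awk -F'|' 'BEGIN { OFS = "\t" }
    # Bases between half-open intervals [qs,qe) and [fs,fe); 0 when they overlap.
    function gap(qs, qe, fs, fe) {
      if (fe <= qs) return qs - fe   # feature ends before the query starts
      if (fs >= qe) return fs - qe   # feature starts after the query ends
      return 0                       # feature overlaps the query
    }
    # Assumed: a side with no feature is printed as "NA" (fewer than 4 fields).
    function describe(part, qs, qe,    f, n) {
      n = split(part, f, " ")
      if (n < 4) return "NA" OFS "NA"
      return f[4] OFS gap(qs, qe, f[2] + 0, f[3] + 0)
    }
    {
      split($1, q, " ")
      print q[1], q[2], q[3], describe($2, q[2] + 0, q[3] + 0), describe($3, q[2] + 0, q[3] + 0)
    }'
}
```

Usage, after defining the function in the same shell session: `closest-features hg19.coords.sorted hg19.refSeq.bed | annotate_gaps`. For the chr6 27799457 query above, this would report NM_003541 at 152 bases to the left and NM_003510 at 6199 bases to the right.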

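Please note that closest-features expects both inputs to be sorted, as done above. Each output line shows the query position, then the nearest RefSeq transcript to its left, then the nearest to its right, separated by the "|" character. "Left" and "right" refer to genomic coordinates, not to transcript strand, and a transcript that overlaps the query can be reported as a nearest feature (as in the chr1 example, where NM_001317238 spans the query position).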
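The refGene table has one row per transcript, so a gene with several isoforms can appear several times, and the nearest "gene" is reported as one of its transcripts. If gene-level spans are preferred, one option is to collapse transcripts by gene symbol (the refGene column name2) before running closest-features. A minimal sketch, assuming a tab-separated four-column input of chrom, txStart, txEnd, name2 (for example from `select chrom, txStart, txEnd, name2 from refGene`); collapse_by_gene is a hypothetical helper, and this naive collapse merges same-named loci on one chromosome into a single span:

```shell
# Hypothetical helper: collapse per-transcript BED4 rows (chrom, start, end, gene)
# into one span per (chrom, gene), from the minimum start to the maximum end.
# Caveat: same-named genes at distant loci on one chromosome are merged.
collapse_by_gene() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      key = $1 SUBSEP $4
      if (!(key in lo) || $2 + 0 < lo[key]) lo[key] = $2 + 0
      if (!(key in hi) || $3 + 0 > hi[key]) hi[key] = $3 + 0
    }
    END {
      for (key in lo) {
        split(key, k, SUBSEP)
        print k[1], lo[key], hi[key], k[2]
      }
    }' | LC_ALL=C sort -k1,1 -k2,2n -k3,3n   # awk emits keys in arbitrary order
}
```

The collapsed output is already sorted, so it can be passed to closest-features in place of hg19.refSeq.bed.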

Please let us know if you have any further questions!

Thank you again for your inquiry and using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.

Christopher Lee
UCSC Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.
