Dear UCSC Genome Browser support team,
I am trying, without success, to find a Table Browser-based way to get the closest gene start/end positions for a list of coordinate positions (hg19) in the genome.
For example, say I have the list of hg19 coordinates of interest shown below, and I want to find the coordinates of the nearest annotated genes (RefSeq genes) in both directions (to the left and to the right).
Example list of coordinates:
chr1 1413937 1413938
chr5 121812697 121812698
chr5 180702296 180702297
chr6 27799457 27799458
chr6 31779620 31779621
chr7 45808644 45808645
chr8 22877790 22877791
chr8 22921537 22921538
chr21 40751102 40751103
I tried both the intersection and the Data Integrator tools, but I could not find a feature that corresponds to “nearest gene.”
Any suggestions?
Best regards,
Enrique
Hi Enrique,
Thank you for your question on how to find the closest RefSeq genes
to a given coordinate. Unfortunately we do not have a way to find the
closest genes to a given coordinate, but we do have a tool for finding
the "nearest" genes to another gene (via a large selection of "nearest"
criteria), the Gene Sorter:
http://genome.ucsc.edu/cgi-bin/hgNear
However, since you want to find the genes nearest to a set of coordinates, I think the closest-features
tool from the BEDOPS suite is probably your best bet:
https://bedops.readthedocs.io/en/latest/
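If installing BEDOPS is not an option, a similar left/right lookup can be approximated in plain awk. The function below is a minimal sketch, not the BEDOPS algorithm: the name nearest_left_right is hypothetical, it assumes whitespace-separated "chrom start end name" gene lines and "chrom start end" query lines with BED-style coordinates (0-based start, exclusive end), and it scans every gene on the query's chromosome. That brute-force scan is fine for a handful of coordinates but slow for large query sets. A gene overlapping the query is reported as the left neighbour with a gap of 0, and unlike closest-features it does not need sorted input.

```shell
# Hypothetical helper, NOT the BEDOPS algorithm: brute-force nearest
# left/right gene lookup in plain awk.
# Usage: nearest_left_right GENES_FILE QUERIES_FILE
#   GENES_FILE:   "chrom start end name" per line
#   QUERIES_FILE: "chrom start end" per line
# Output (tab-separated): chrom, start, end, left name:start-end, left gap,
#   right name:start-end, right gap; "NA" when no neighbour exists.
# A gene overlapping the query is reported on the left with a gap of 0.
nearest_left_right() {
  awk 'BEGIN { OFS = "\t" }
    NR == FNR {                          # first file: load genes per chromosome
      n = ++cnt[$1]
      gs[$1, n] = $2 + 0; ge[$1, n] = $3 + 0; gn[$1, n] = $4
      next
    }
    {                                    # second file: one query per line
      c = $1; qs = $2 + 0; qe = $3 + 0
      left = "NA"; right = "NA"; bl = -1; br = -1
      for (i = 1; i <= cnt[c]; i++) {
        s = gs[c, i]; e = ge[c, i]
        if (s < qe && e > qs) d = 0      # gene overlaps the query
        else if (e <= qs) d = qs - e     # gene lies entirely to the left
        else d = -1                      # gene lies entirely to the right
        if (d >= 0 && (bl < 0 || d < bl)) { bl = d; left = gn[c, i] ":" s "-" e }
        if (s >= qe && (br < 0 || s - qe < br)) { br = s - qe; right = gn[c, i] ":" s "-" e }
      }
      print c, qs, qe, left, (bl < 0 ? "NA" : bl), right, (br < 0 ? "NA" : br)
    }' "$1" "$2"
}
```

With the files produced by the method below, this would be run as: nearest_left_right hg19.refSeq.bed hg19.coords.sorted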
Here is a method to find the nearest RefSeq transcripts to the left and to the right of each of your coordinates:
# paste the coordinates of interest into a tab-separated file, then sort:
$ sort -k1,1 -k2,2n hg19.coords > hg19.coords.sorted

# get the hg19 refGene entries and sort them; cut keeps the name, chrom,
# strand, txStart and txEnd columns, and awk reorders them into BED order
# (chrom, txStart, txEnd, name):
$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -Ne "select * from refGene" hg19 \
    | cut -f2-6 \
    | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $1}' \
    | sort -k1,1 -k2,2n > hg19.refSeq.bed

# find the closest features:
$ closest-features hg19.coords.sorted hg19.refSeq.bed
chr1 1413937 1413938|chr1 1413494 1431584 NM_001317238|chr1 1447522 1470067 NM_001170535
chr21 40751102 40751103|chr21 40714240 40721047 NM_004965|chr21 40752169 40769815 NM_004627
chr5 121812697 121812698|chr5 121772191 121814782 NR_051996|chr5 121917193 121920295 NR_134281
chr5 180702296 180702297|chr5 180688212 180699308 NR_102762|chr5 180750506 180755196 NR_028322
chr6 27799457 27799458|chr6 27798951 27799305 NM_003541|chr6 27805657 27806117 NM_003510
chr6 31779620 31779621|chr6 31777395 31782835 NM_005527|chr6 31783290 31785719 NM_005345
chr7 45808644 45808645|chr7 45763385 45808617 NR_024271|chr7 45844827 45847509 NR_146388
chr8 22877790 22877791|chr8 22877647 22926700 NR_027140|chr8 22925741 22941132 NR_038873
chr8 22921537 22921538|chr8 22877647 22926700 NR_027140|chr8 22925741 22941132 NR_038873
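Please note that closest-features expects both inputs to be sorted. Each output line lists the query coordinate, the nearest feature to its left, and the nearest feature to its right, separated by the "|" character. "Left" and "right" refer to genome coordinates rather than strand-aware upstream/downstream, and a feature that overlaps the query (as in the chr1 line above) can be reported as the nearest one.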
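To load the results into a spreadsheet, the "|"-separated lines can be split into tab-separated columns with the gap to each neighbour in bases. This is a minimal sketch: summarize_closest is a hypothetical helper, not part of BEDOPS. It assumes each part is "chrom start end name", treats any part with fewer than three fields as a missing neighbour, and reports a gap of 0 when the neighbour overlaps the query.

```shell
# Hypothetical helper (not part of BEDOPS): turn closest-features output of
# the form "query|left|right" into tab-separated columns:
#   chrom, start, end, left name, left gap (bp), right name, right gap (bp)
# A gap of 0 means the neighbouring feature overlaps the query.
summarize_closest() {
  awk -F'|' 'BEGIN { OFS = "\t" }
  {
    split($1, q, /[ \t]+/)                   # q[1]=chrom q[2]=start q[3]=end
    if (split($2, l, /[ \t]+/) >= 3) {
      lgap = (l[3] + 0 <= q[2] + 0) ? q[2] - l[3] : 0
      left = l[4] "\t" lgap
    } else left = "NA\tNA"                  # no left neighbour reported
    if (split($3, r, /[ \t]+/) >= 3) {
      rgap = (r[2] + 0 >= q[3] + 0) ? r[2] - q[3] : 0
      right = r[4] "\t" rgap
    } else right = "NA\tNA"                 # no right neighbour reported
    print q[1], q[2], q[3], left, right
  }' "$@"
}
```

For example, piping the closest-features command above through summarize_closest turns the chr21 line into chr21, 40751102, 40751103, NM_004965, 30055, NM_004627, 1066.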
Please let us know if you have any further questions!
Thank you again for your inquiry and for using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.
Christopher Lee
UCSC Genomics Institute
--
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.