Dear Matthew,
Thank you for using the UCSC Genome Browser and for your question about extracting gene names in proximity to a custom track.
You can accomplish this goal with the intersection feature of the Table Browser:
http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection
The first step would be increasing the size of your input regions. While the Table Browser has manipulations that allow you to select regions upstream or downstream of an inputted custom region, the easiest way to change the size of your regions is likely to use the command-line awk utility.
For example, if you had this input BED custom track:
track name=origData description="Original 30bp regions"
chr1 100000 100030 Data1
chr1 500000 500030 Data2
chr1 900000 900030 Data3
You can put the data lines into a file, for example one called inputBED, and use awk on the command line to subtract 50,000bp from the start coordinate and add 50,000bp to the end coordinate:
awk '{print $1, $2-50000, $3+50000, $4}' < inputBED
You could put this into a new custom track that would look like this:
track name=dataRange description="Original 30bp regions plus 50kb each side"
chr1 50000 150030 Data1
chr1 450000 550030 Data2
chr1 850000 950030 Data3
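The widening step above can also be sketched a little more defensively. This is only an illustration, not a required part of the workflow: the variable name FLANK and the output file name widenedBED are made up for this example, and the clamp at 0 is an added assumption to avoid negative start coordinates for regions within 50,000bp of a chromosome start. It does not check the end coordinate against the chromosome length.

```shell
# Recreate the example data lines from above in a file named inputBED.
cat > inputBED <<'EOF'
chr1 100000 100030 Data1
chr1 500000 500030 Data2
chr1 900000 900030 Data3
EOF

# FLANK is a hypothetical variable name for the padding added on each side.
FLANK=50000

# Widen each region, clamp the start at 0, and write tab-separated output.
awk -v flank="$FLANK" 'BEGIN { OFS = "\t" }
{
    start = $2 - flank
    if (start < 0) start = 0      # assumption: never emit a negative start
    print $1, start, $3 + flank, $4
}' inputBED > widenedBED

cat widenedBED
```

For the three example regions this gives the same coordinates as the dataRange track shown above, separated by tabs instead of spaces.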
With this new wider custom track added to the Browser you can go to the Table Browser tool and select a gene track of interest by setting "group:" to "Genes and Gene Predictions" and the "track:" to "NCBI RefSeq" and then the table to "RefSeq All (ncbiRefSeq)".
After ensuring the region is set to (*) genome, click the "create" button next to "intersection:". On the new "Intersect with ncbiRefSeq" screen, set the "group:" to "Custom Tracks" and then the "track:" to the name of your custom track ("dataRange" in the enlarged example above, made with awk). Leaving the setting as "All ncbiRefSeq records that have any overlap with dataRange", click "submit", then set the "output format" to "custom track" and click "get output." On the new "Output ncbiRefSeq as Custom Track" page you can edit the name and description before clicking "get custom track in genome browser" (or a file if you prefer). The result will be all the entries in the ncbiRefSeq gene track that overlap with any of your enlarged dataRange regions.
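To make the "any overlap" setting concrete, here is a rough command-line sketch of the same idea. It is not how the Table Browser works internally, only an illustration of the rule that two intervals on the same chromosome overlap when each one starts before the other ends. The file names regions.bed, genes.bed and overlaps.bed, and the gene intervals geneA, geneB and geneC, are made up for this example and are not real annotations.

```shell
# The widened regions from the dataRange example above.
cat > regions.bed <<'EOF'
chr1 50000 150030 Data1
chr1 450000 550030 Data2
chr1 850000 950030 Data3
EOF

# Made-up gene intervals, for illustration only; not real annotations.
cat > genes.bed <<'EOF'
chr1 140000 160000 geneA
chr1 300000 310000 geneB
chr1 949000 960000 geneC
EOF

# First pass (NR == FNR) stores the regions; second pass prints each gene
# line that overlaps at least one region on the same chromosome.
awk 'NR == FNR { chrom[NR] = $1; s[NR] = $2; e[NR] = $3; n = NR; next }
{
    for (i = 1; i <= n; i++) {
        if ($1 == chrom[i] && $2 < e[i] && $3 > s[i]) { print; break }
    }
}' regions.bed genes.bed > overlaps.bed

cat overlaps.bed
```

With these placeholder intervals, geneA and geneC are kept because they extend into Data1 and Data3 respectively, while geneB falls between regions and is dropped.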
You can perform these intersections with other annotation tracks besides ncbiRefSeq, including data provided by other groups in Public Hubs. It happens there is a Public Hub with anoCar2 gene annotations provided by CESAR of human exons mapped to lizard. The same steps above can be used where the "group:" can be changed to a connected remote hub of data such as the "CESAR Gene Mappings" and the same intersection can be performed.
In the following session, you can see the two custom tracks above and the two intersections with these three regions: a JAG2 gene is identified from the CESAR hub for Data1, and XM_016998117.1 and XM_003214277.3 are identified from ncbiRefSeq:
http://genome.ucsc.edu/s/brianlee/anoCar2_intersect
You may also want to spend some time looking at our mailing list archives of similar questions. Other tools and scripts exist to try to approximate your goal such as closestGene.sh (http://genomewiki.ucsc.edu/index.php/Finding_nearby_genes), closest-features (https://bedops.readthedocs.io/en/latest/content/reference/set-operations/closest-features.html) and resources at Galaxy (https://galaxyproject.github.io/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html). To search our mailing-list archives you can type in any query and find previous answers such as these that may be of interest:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/bBan5oEFIjQ/LtRKiW-qAwAJ
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/zVjaN_CMiQQ/YDu-_eCzBQAJ
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/P2dPUMuL2YM/9IlMB1xbDgAJ
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,