Extracting Gene Names in Proximity to Custom Track

Hale, Matthew D (mdh7cz)

Jul 29, 2020, 11:42:51 AM

to gen...@soe.ucsc.edu

Hi,

I’m trying to determine if Table Browser (or some other tool) can be used to extract all features found within some distance of a custom track. I’m using PWMScan to create a custom track of transcription factor binding motifs throughout the AnoCar2.0 Anolis genome and would like to identify all coding sequences within 100kb of those motifs.

Is this possible? Thank you!

Matthew D. Hale

Brian Lee

Aug 3, 2020, 1:56:49 PM

to Hale, Matthew D (mdh7cz), gen...@soe.ucsc.edu

Dear Matthew,

Thank you for using the UCSC Genome Browser and for your question about extracting gene names in proximity to a custom track.

You can accomplish this goal with the intersection feature of the Table Browser: http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection

The first step would be increasing the size of your input regions. While the Table Browser has manipulations that allow you to select regions upstream or downstream of an inputted custom region, the easiest way to change the size of your regions is likely to use the command-line awk utility.

For example, if you had this input BED custom track:

track name=origData description="Original 30bp regions"
chr1 100000 100030 Data1
chr1 500000 500030 Data2
chr1 900000 900030 Data3

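You can put the data lines into a file called, for example, inputBED, and use awk on the command line to subtract 50,000bp from the start coordinate and add 50,000bp to the end coordinate. The start is held at 0 if the subtraction would make it negative, since a BED start coordinate cannot be below 0. Use 100000 in place of 50000 if you want 100kb on each side:

awk '{s = $2 - 50000; if (s < 0) s = 0; print $1, s, $3 + 50000, $4}' < inputBED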

You could put this into a new custom track that would look like this:

track name=dataRange description="Original 30bp regions plus 50kb each side"
chr1 50000 150030 Data1
chr1 450000 550030 Data2
chr1 850000 950030 Data3
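
If you would rather script this step than use an awk one-liner, the same widening can be sketched in Python. This is only an illustration: the function name widen_bed_lines and the flank parameter are hypothetical, and it assumes whitespace-separated BED lines with at least three columns. It holds the start at 0 but does not know chromosome lengths, so end coordinates are not clamped.

```python
def widen_bed_lines(lines, flank=50000):
    """Return BED lines with `flank` bp added on each side of every region."""
    widened = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and custom-track header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        new_start = max(0, start - flank)  # BED starts cannot be negative
        new_end = end + flank              # not clamped: chromosome sizes unknown here
        widened.append(f"{chrom}\t{new_start}\t{new_end}\t{name}")
    return widened


if __name__ == "__main__":
    example = [
        'track name=origData description="Original 30bp regions"',
        "chr1 100000 100030 Data1",
        "chr1 500000 500030 Data2",
        "chr1 900000 900030 Data3",
    ]
    print("\n".join(widen_bed_lines(example)))
```

Calling widen_bed_lines(lines, flank=100000) would give the 100kb window on each side mentioned in the original question.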

With this new wider custom track added to the Browser you can go to the Table Browser tool and select a gene track of interest by setting "group:" to "Genes and Gene Predictions" and the "track:" to "NCBI RefSeq" and then the table to "RefSeq All (ncbiRefSeq)".

With the region set to (*) genome, click the "create" button next to "intersection:". On the new "Intersect with ncbiRefSeq" screen, set the "group:" to "Custom Tracks" and the "track:" to the name of your custom track ("dataRange" in the enlarged example above). Leave the setting as "All ncbiRefSeq records that have any overlap with dataRange" and click "submit". Then set the "output format" to "custom track" and click "get output." On the new "Output ncbiRefSeq as Custom Track" page, you can edit the name and description before clicking "get custom track in genome browser" (or a file if you prefer). The result will be all the entries in the ncbiRefSeq gene track that overlap any of your enlarged dataRange regions.
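
Conceptually, the "any overlap" setting keeps a gene record when it shares at least one base with some widened region on the same chromosome. The sketch below illustrates that test on in-memory tuples. It is not how the Table Browser is implemented, and the function names and sample gene coordinates are made up for illustration. Coordinates are assumed to be BED-style (0-based start, end exclusive).

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def genes_overlapping(genes, regions):
    """Return names of genes overlapping any region on the same chromosome.

    Both arguments are lists of (chrom, start, end, name) tuples.
    """
    hits = []
    for g_chrom, g_start, g_end, g_name in genes:
        for r_chrom, r_start, r_end, _ in regions:
            if g_chrom == r_chrom and overlaps(g_start, g_end, r_start, r_end):
                hits.append(g_name)
                break  # one overlapping region is enough
    return hits


if __name__ == "__main__":
    regions = [("chr1", 50000, 150030, "Data1")]
    # Hypothetical gene records, not real anoCar2 annotations.
    genes = [
        ("chr1", 140000, 160000, "geneA"),  # overlaps the end of Data1
        ("chr1", 150030, 170000, "geneB"),  # starts exactly where Data1 ends: no overlap
        ("chr2", 60000, 70000, "geneC"),    # different chromosome
    ]
    print(genes_overlapping(genes, regions))  # prints ['geneA']
```

The nested loop is fine for a handful of regions; for genome-wide motif sets, the Table Browser or a dedicated interval tool will be far faster.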

You can perform these intersections with other annotation tracks besides ncbiRefSeq, including data provided by other groups in Public Hubs. For example, there is a Public Hub with anoCar2 gene annotations provided by CESAR, which maps human exons to lizard. The same steps apply: change the "group:" to a connected remote hub such as "CESAR Gene Mappings" and perform the same intersection.

In the following session, you can see the two custom tracks above and the two intersections for these three regions: a JAG2 gene is identified from the CESAR hub for Data1, and XM_016998117.1 and XM_003214277.3 are identified from ncbiRefSeq: http://genome.ucsc.edu/s/brianlee/anoCar2_intersect

You may also want to spend some time looking at our mailing list archives of similar questions. Other tools and scripts exist to try to approximate your goal such as closestGene.sh (http://genomewiki.ucsc.edu/index.php/Finding_nearby_genes), closest-features (https://bedops.readthedocs.io/en/latest/content/reference/set-operations/closest-features.html) and resources at Galaxy (https://galaxyproject.github.io/training-material/topics/introduction/tutorials/galaxy-intro-peaks2genes/tutorial.html). To search our mailing-list archives you can type in any query and find previous answers such as these that may be of interest:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/bBan5oEFIjQ/LtRKiW-qAwAJ
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/zVjaN_CMiQQ/YDu-_eCzBQAJ
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/P2dPUMuL2YM/9IlMB1xbDgAJ
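
An alternative to widening regions is to measure the distance from each motif to each gene directly and keep those within a cutoff. The sketch below shows that idea in Python. It is not the algorithm used by closestGene.sh or closest-features, the function names and the max_dist parameter are hypothetical, and the gene records in the comment are invented. Coordinates are assumed to be BED-style (0-based start, end exclusive).

```python
def gap(a_start, a_end, b_start, b_end):
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def genes_within(motif, genes, max_dist=100000):
    """Return (name, distance) pairs for genes within max_dist bp of the motif.

    `motif` and each entry of `genes` are (chrom, start, end, name) tuples.
    Results are sorted nearest first.
    """
    m_chrom, m_start, m_end, _ = motif
    found = []
    for g_chrom, g_start, g_end, g_name in genes:
        if g_chrom != m_chrom:
            continue  # distance is only defined on the same chromosome
        d = gap(m_start, m_end, g_start, g_end)
        if d <= max_dist:
            found.append((g_name, d))
    return sorted(found, key=lambda item: item[1])


# Example with made-up genes:
# genes_within(("chr1", 100000, 100030, "Data1"),
#              [("chr1", 150000, 160000, "near"), ("chr1", 300000, 310000, "far")])
```

Reporting the distance alongside each gene name makes it easy to tighten or relax the cutoff later without rerunning the search.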

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee, QA Manager

UCSC Genome Browser - UC Santa Cruz Genomics Institute
