Hello Anjuska,
Thank you for using the UCSC Genome Browser and for your inquiry.
Could you give an example of what you are trying to achieve? Do you want to create a unique data format such as the following?
Unfortunately, there is no simple solution, and the final solution will require a fair amount of scripting to get the data formatted as you requested, which is beyond the scope of this mailing list. You may find some scripting help from other bioinformatics forums such as BioStars.
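That being said, we can help you get started. After you have the three files described below (the per-base sequence file, the exon overlaps, and the intron overlaps), you can then do a bit of scripting to merge them.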
To create a file that contains each base and position, we can use the twoBitToFa utility. You can download this tool from the utilities directory for your operating system. Using this tool, you can use a BED4 file to specify a region to extract sequence from the hg38 2bit file. If you are not familiar with how we store coordinates in different formats (0-start BED vs. 1-start positional), you can learn more from the following blog post: The UCSC Genome Browser Coordinate Counting Systems. For example, to extract the sequence for the following regions:
We will have to convert these ranges into a BED4 file that describes one base per line:
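As a rough sketch of this conversion, the following Python function expands 1-start positional ranges into one-base-per-line BED4 records. The region shown (chr1:1000-1003), the function name, and the naming scheme for the fourth column are hypothetical placeholders, not values from your data. The name column stores the 1-based position so it can serve as a key when merging later.

```python
def ranges_to_per_base_bed4(regions):
    """Expand 1-start, fully closed ranges into one BED4 line per base.

    regions: iterable of (chrom, start, end) in 1-start positional
    coordinates, e.g. chr1:1000-1003. BED uses 0-start, half-open
    coordinates, so the base at 1-based position p becomes (p - 1, p).
    """
    lines = []
    for chrom, start, end in regions:
        for pos in range(start, end + 1):
            name = f"{chrom}:{pos}"  # hypothetical naming scheme (1-based position)
            lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{name}")
    return lines


if __name__ == "__main__":
    # Hypothetical region, for illustration only.
    for line in ranges_to_per_base_bed4([("chr1", 1000, 1003)]):
        print(line)
```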
You can then use the twoBitToFa utility using the newly created BED4 to limit the sequence output to your regions of interest:
Which should create a file that describes each base position and its nucleotide sequence:
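If it helps, the FASTA output can be flattened into a two-column table with a few lines of Python. This is only a sketch: it assumes the file was produced with a command along the lines of `twoBitToFa -bed=bases.bed hg38.2bit bases.fa` (the file names are placeholders), and that each FASTA header carries the name field from the BED4 file. The records in the example are made up.

```python
def fasta_to_rows(fasta_text):
    """Turn FASTA text into (name, sequence) pairs, one pair per record."""
    rows = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                rows.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        rows.append((name, "".join(chunks)))
    return rows


if __name__ == "__main__":
    # Made-up records for illustration; real input comes from twoBitToFa.
    example = ">chr1:1000\nA\n>chr1:1001\nc\n"
    for name, base in fasta_to_rows(example):
        print(f"{name}\t{base}")
```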
To learn whether each base overlaps with an exon or intron, we will use the Table Browser's intersection tool. We will create a custom track that contains regions for each exon/intron inside of your chosen gene annotation track. Here's a previously answered question that shows how to get exons-only and introns-only positions as custom tracks using the GENCODE V24 dataset. With these two custom tracks, you can then intersect them with the BED file from Step 1 to filter overlaps.
Exon and intron regions will depend on the gene track you use, so you should select whichever gene track best fits your research purpose. For this example, I will use the knownGene table and here is a session with the exon-only custom track and the BED4 file from Step 1 as a custom track:
After loading this session, next to intersection:, click the create radio button. Once on the new page, select the custom track tb_knownGene, and then select: "All User Track records that have any overlap with tb_knownGene". Once this option is selected, click submit. After you are back on the main Table Browser page, click get output. You should get output that describes which bases overlap with exon regions.
Now you would just need to append the columns based on the key matching field (position) back into your BED4 file. There is an example of this in this previously answered question:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/R8CstMtiJZM/TFeA7iIYAQAJ
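A minimal sketch of that append step in Python is shown here. The function name, the column layout (chrom, start, end, base, label), and the choice to prefer "exon" when a base falls in both sets are all assumptions made for illustration; adjust them to match the format you need.

```python
def annotate_bases(bed4_lines, base_by_name, exon_names, intron_names):
    """Append the nucleotide and an exon/intron label to each BED4 record.

    bed4_lines:   per-base BED4 lines (chrom, start, end, name)
    base_by_name: dict mapping BED name -> nucleotide
    exon_names, intron_names: sets of BED names returned by the
        Table Browser intersections against each custom track
    """
    merged = []
    for line in bed4_lines:
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        if name in exon_names:
            label = "exon"  # assumption: exon wins if a base is in both sets
        elif name in intron_names:
            label = "intron"
        else:
            label = ""  # no overlap with either track
        base = base_by_name.get(name, "")
        merged.append("\t".join([chrom, start, end, base, label]))
    return merged


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    bed = ["chr1\t999\t1000\tchr1:1000", "chr1\t1000\t1001\tchr1:1001"]
    bases = {"chr1:1000": "A", "chr1:1001": "C"}
    for row in annotate_bases(bed, bases, {"chr1:1000"}, set()):
        print(row)
```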
The result should look something like the following, with five fields; the fifth field may be blank where the base overlaps neither an exon nor an intron:
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genomics Institute