Hello Eric,
Thank you for using the UCSC Genome Browser and for your inquiry.
Our engineers note that returning the sequence for all exons, even overlapping ones, would be simple, since exons make up only a small part of the genome. However, the sequence returned by your query is almost three times the size of the whole genome, because a gene can have multiple isoforms at the same locus and each isoform is returned separately. The procedure below will produce a FASTA file with 5,066,752,749 bases (2,765,158 N's) in 72,577 sequences.
To avoid the timeout caused by such a large query, you can extract the annotations yourself using the public MySQL server, the hg38 2bit file, the bedClip utility, and the twoBitToFa utility. The bedClip utility also needs the chrom.sizes file for hg38, so you will have to download that as well. You can download the necessary files and utilities from our downloads server:
hg38 2bit: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
chrom.sizes: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
twoBitToFa and bedClip: http://hgdownload.soe.ucsc.edu/admin/exe/
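As a rough sketch, the downloads could be scripted as shown here. The function names list_urls and fetch_all are only illustrative, and the linux.x86_64 platform folder is an assumption about your machine; other platform folders are listed under the admin/exe/ directory.

```shell
#!/bin/sh
# Base locations taken from the links above.
BIGZIPS="https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips"
EXE="http://hgdownload.soe.ucsc.edu/admin/exe"
# Assumption: 64-bit Linux. Set PLATFORM to another folder name if needed.
PLATFORM="${PLATFORM:-linux.x86_64}"

# Print the four URLs needed for this procedure, one per line.
list_urls() {
    printf '%s\n' \
        "$BIGZIPS/hg38.2bit" \
        "$BIGZIPS/hg38.chrom.sizes" \
        "$EXE/$PLATFORM/twoBitToFa" \
        "$EXE/$PLATFORM/bedClip"
}

# Download everything into the current directory and make the tools executable.
fetch_all() {
    for url in $(list_urls); do
        wget --timestamping "$url" || return 1
    done
    chmod +x twoBitToFa bedClip
}

# To start the download, run:
#   fetch_all
```

Note that the hg38.2bit file is large, so the download may take a while.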
After downloading the files and tools, you can query the public MySQL server to create a BED file that covers each transcript (exons and introns) plus 4,000 bases upstream and downstream of the gene. The following command queries the public MySQL server and writes the BED file, output.bed:
hgsql hg38 -Ne "select chrom, txStart, txEnd, name from ncbiRefSeqCurated" | awk 'BEGIN { OFS = "\t" } { $2 = $2 - 4000; $3 = $3 + 4000; print }' | bedClip stdin hg38.chrom.sizes output.bed
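If you would like to see what the padding step does before running it on the full table, here is a small sketch that can be tried on a few lines of test data. The function name pad_and_clamp is hypothetical, and it is a variation on the command above rather than a replacement: it pads each interval and then truncates it to the chromosome bounds in awk, whereas how bedClip treats out-of-range items (dropping or truncating them) depends on its options, so please check its usage message.

```shell
#!/bin/sh
# pad_and_clamp CHROM_SIZES PAD
# Reads 4-column BED (chrom, start, end, name) on stdin, widens each interval
# by PAD bases on both sides, and truncates the result to [0, chromosome size].
# Lines on chromosomes missing from CHROM_SIZES are skipped.
pad_and_clamp() {
    awk -v pad="$2" 'BEGIN { OFS = "\t" }
        NR == FNR { size[$1] = $2 + 0; next }   # first file: chromosome sizes
        !($1 in size) { next }                  # unknown chromosome: skip
        {
            s = $2 - pad; if (s < 0) s = 0
            e = $3 + pad; if (e > size[$1]) e = size[$1]
            print $1, s, e, $4
        }' "$1" -
}

# Example with the public MySQL server, for those without hgsql configured:
#   mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -N \
#       -e "select chrom, txStart, txEnd, name from ncbiRefSeqCurated" hg38 \
#       | pad_and_clamp hg38.chrom.sizes 4000 > output.bed
```

For a 10,000 base test chromosome, an item at 1000-2000 becomes 0-6000 and an item at 7000-9000 becomes 3000-10000.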
Once you have created the BED file for the hg38 genome, you can use the twoBitToFa command with the -bed option to get your sequence.
twoBitToFa hg38.2bit output.fa -bed=output.bed
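To confirm the output looks as expected, you can summarize the FASTA file. The faSize utility, available from the same utilities directory, reports these totals; the sketch below is a plain awk stand-in for cases where faSize is not at hand, and the function name fa_summary is only illustrative.

```shell
#!/bin/sh
# fa_summary FILE
# Prints the total number of bases, the number of N bases (either case),
# and the number of sequences in a FASTA file.
fa_summary() {
    awk '/^>/ { nseq++; next }                  # header line: count a sequence
         {
             line = toupper($0)
             total += length(line)
             n += gsub(/N/, "", line)           # gsub returns the number of Ns replaced
         }
         END { printf "%.0f bases (%.0f N) in %.0f sequences\n", total, n, nseq }' "$1"
}

# Usage:
#   fa_summary output.fa
```

The totals should be close to the figures quoted at the start of this message, although they may differ if the ncbiRefSeqCurated table has been updated since.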
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genome Browser