procedures have exceeded timeout: 1200 seconds, function has ended

Eric Ho

May 14, 2019, 11:39:20 AM
to genome
Hi,

I would like to download human hg38 exon and intron sequences, plus 4,000 bp upstream and downstream of each gene.

So, I used the Table Browser tool with certain parameters (see the attached screenshot).

However, I encountered the timeout problem stated in the subject line.

I knew the file was going to be big, so I specified gzip output. Even so, the query did not succeed.

I am wondering how I can download such a large amount of data from the UCSC Genome Browser. Is there a way to output the query as SQL so that I can use it to fetch the data from the public MySQL server?

Your help is much appreciated.

best regards,
Eric Ho
Assistant Professor of Biology
Lafayette College
Screen Shot 2019-05-14 at 11.12.44 AM.png

Jairo Navarro Gonzalez

May 16, 2019, 4:52:59 PM
to Eric Ho, genome

Hello Eric,

Thank you for using the UCSC Genome Browser and your inquiry.

Our engineers note that returning the sequence for all the exons would be simple, even where they overlap, because exons make up only a small part of the genome. Your query, however, returns almost three times the size of the whole genome, because each isoform of a gene at the same locus is output as a separate sequence. The procedure would produce a FASTA file with 5,066,752,749 bases (2,765,158 N's) in 72,577 sequences.

To avoid the timeout caused by the large query, you can extract the annotations using the public MySQL server, the hg38 2bit file, the bedClip utility, and the twoBitToFa utility. The bedClip utility also needs the chrom.sizes file for hg38, which it uses to check that each interval lies within its chromosome. You can download the necessary files and utilities from our downloads server:

hg38 2bit: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
chrom.sizes: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
twoBitToFa and bedClip: http://hgdownload.soe.ucsc.edu/admin/exe/
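
The downloads listed above could be scripted roughly as follows. This is only a sketch: the function name fetch_ucsc_inputs and the variable names are made up for illustration, and the linux.x86_64 subdirectory of admin/exe is an assumption about your platform (other platforms have their own subdirectories there).

```shell
# Base URL of the UCSC downloads server (taken from the links above).
UCSC_BASE="https://hgdownload.soe.ucsc.edu"
# Assumption: 64-bit Linux binaries; change this for other platforms.
UCSC_PLATFORM="${UCSC_PLATFORM:-linux.x86_64}"

# Hypothetical helper: fetch the genome, chrom.sizes and the two utilities
# into the current directory. Nothing is downloaded until it is called.
fetch_ucsc_inputs() {
  curl -fLO "$UCSC_BASE/goldenPath/hg38/bigZips/hg38.2bit" || return 1
  curl -fLO "$UCSC_BASE/goldenPath/hg38/bigZips/hg38.chrom.sizes" || return 1
  for tool in twoBitToFa bedClip; do
    curl -fLO "$UCSC_BASE/admin/exe/$UCSC_PLATFORM/$tool" || return 1
    chmod +x "$tool"
  done
}

# Usage: fetch_ucsc_inputs
```

The 2bit file is large, so the function stops at the first failed download instead of continuing with a partial set of inputs.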

After downloading the files and tools, you can query the public MySQL server to create a BED file covering each transcript (exons and introns) plus 4,000 bases upstream and downstream. The following command queries the public MySQL server and creates the BED file, output.bed. The start coordinate is held at 0 so that transcripts within 4,000 bases of a chromosome start do not end up with a negative coordinate:

hgsql hg38 -Ne "select chrom, txStart, txEnd, name from ncbiRefSeqCurated" | awk -v OFS='\t' '{ $2 = ($2 > 4000 ? $2 - 4000 : 0); $3 = $3 + 4000; print }' | bedClip stdin hg38.chrom.sizes output.bed
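
If hgsql is not set up on your machine, the same step can be sketched with the standard mysql client and awk alone. The function pad_intervals below is a hypothetical helper, not a UCSC tool: it pads each interval and trims it to the chromosome bounds read from the chrom.sizes file, whereas bedClip by default removes out-of-range lines. The mysql options shown in the usage comment (user genome on genome-mysql.soe.ucsc.edu) are the public server settings as I understand them, so please verify them against the current UCSC documentation.

```shell
# Hypothetical helper: pad BED intervals read from stdin and trim them
# to the chromosome bounds.
#   $1 = padding in bases
#   $2 = chrom.sizes file (chromosome name, size)
pad_intervals() {
  awk -v pad="$1" -v OFS='\t' '
    NR == FNR { size[$1] = $2 + 0; next }  # first input: chromosome sizes
    !($1 in size) { next }                 # skip chromosomes not in chrom.sizes
    {
      start = $2 - pad; if (start < 0) start = 0
      end = $3 + pad;   if (end > size[$1]) end = size[$1]
      print $1, start, end, $4
    }
  ' "$2" -
}

# Usage (assumed public server settings, not run here):
#   mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -N \
#     -e "select chrom, txStart, txEnd, name from ncbiRefSeqCurated" hg38 \
#     | pad_intervals 4000 hg38.chrom.sizes > output.bed
```

Because the padding is the same on both sides, the strand of the transcript does not matter for this step.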

Once you have created the BED file for the hg38 genome, you can use the twoBitToFa command with the -bed option to get your sequence.

twoBitToFa hg38.2bit output.fa -bed=output.bed
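
Since the output is expected to run to several billion bases, it may be worth estimating the size from the BED file before running twoBitToFa. A minimal sketch, where bed_total_bases is a made-up helper name:

```shell
# Hypothetical helper: sum the interval lengths (end - start) in a BED file.
#   $1 = BED file
bed_total_bases() {
  # %.0f avoids integer overflow in awk implementations with 32-bit %d
  awk '{ total += $3 - $2 } END { printf "%.0f\n", total }' "$1"
}

# Usage: bed_total_bases output.bed
```

The total counts overlapping transcripts separately, which matches what twoBitToFa writes when each BED line becomes its own FASTA record.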

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro 
UCSC Genome Browser


