get fasta format sequence from genome


DESSEN Philippe

Feb 10, 2017, 10:16:15 AM
to gen...@soe.ucsc.edu, DESSEN Philippe


Dear colleague,

Could you tell me how to obtain a genomic sequence, given a BED definition, from a specific genome assembly?
For example: how can I obtain (interactively or by a URL) the human hg38 sequence defined by chr14:31246363-31251138 (+) in FASTA format?

I have not found this utility in the help pages.

Many thanks

Philippe Dessen
IGR, Villejuif, France

Jairo Navarro Gonzalez

Feb 16, 2017, 12:41:33 PM
to DESSEN Philippe, gen...@soe.ucsc.edu

Hello Philippe,

Thank you for using the UCSC Genome Browser and your question about obtaining a sequence in FASTA format from the genome. 
This can be done in two ways, using the Table Browser or using the command-line tool, twoBitToFa.


Using the Table Browser

If you would like to use the Table Browser to get the FASTA sequences, you can do so by creating a custom track.
To create a custom track with your regions of interest, go to http://genome.ucsc.edu/cgi-bin/hgCustom and paste or upload your bed file. After you have created your custom track, navigate to the Table Browser.
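
For the region in the original question, the BED line to paste could be produced with a small shell helper like the one below. This is only a sketch: region_to_bed is a hypothetical function, not a UCSC tool, and it assumes the coordinates are 1-based and inclusive, as they appear in the Genome Browser position box. BED uses a 0-based start and an exclusive end, so only the start is lowered by one.

```shell
# Hypothetical helper (not part of any UCSC tool): print one BED6 line for a
# region whose start is 1-based and inclusive. BED starts are 0-based and BED
# ends are exclusive, so only the start is decremented.
region_to_bed() {
    chrom=$1; start=$2; end=$3; strand=${4:-+}; name=${5:-region1}
    printf '%s\t%d\t%d\t%s\t0\t%s\n' "$chrom" "$((start - 1))" "$end" "$name" "$strand"
}

# Print the BED line for the region in the question.
# Redirect to a file (for example "> regions.bed") to upload it instead of pasting.
region_to_bed chr14 31246363 31251138 +
```

If the coordinates are already in BED convention, they can be pasted unchanged.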

Here are my settings for this example:

clade: Mammal
genome: Human
assembly: Dec. 2013 (GRCh38/hg38)
group: Custom Tracks
track: User Track
output format: sequence

After configuring the Table Browser settings, click the get output button. 
You will now be at a new page where you can configure the formatting of the sequence inside the FASTA file. 
Once you have configured the settings on this page, click get sequence.


Using the twoBitToFa utility

If you would like to use the command line to get the FASTA sequences, install the twoBitToFa utility from our downloads page: 
http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads
To view all of the options available for any kent utility, run the command without any arguments.

Once you have this utility installed on your machine, you can pass the URL of the hg38 2bit file directly as the input to twoBitToFa. To restrict the output to one region, append the chromosome and range to the URL:

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit:chr14:31246363-31251138

The command for your region of interest would be:

$ twoBitToFa http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit:chr14:31246363-31251138 output.fa
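
One point worth checking before running the command is the coordinate convention. The twoBitToFa usage message describes its start positions as zero-based and its end positions as non-inclusive, so if the region was copied from the browser position box (1-based), the start would probably need to be lowered by one to avoid dropping the first base. The sketch below builds the input argument under that assumption; twobit_spec and HG38_2BIT are hypothetical names used only for illustration.

```shell
# Hypothetical helper: build the "file:chrom:start-end" argument for twoBitToFa
# from a 1-based, inclusive region. Assumes twoBitToFa reads the range as
# 0-based and half-open, so only the start is decremented.
twobit_spec() {
    twobit=$1; chrom=$2; start=$3; end=$4
    printf '%s:%s:%d-%d\n' "$twobit" "$chrom" "$((start - 1))" "$end"
}

HG38_2BIT=http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit

# Print the argument for the region in the question.
twobit_spec "$HG38_2BIT" chr14 31246363 31251138

# With twoBitToFa installed, the printed value can be used directly:
#   twoBitToFa "$(twobit_spec "$HG38_2BIT" chr14 31246363 31251138)" output.fa
```

If your coordinates already follow the BED convention, use them as they are.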

If you have your regions of interest in a BED file, you can use the "-bed=<your_file>" option. Note that this option excludes introns from the output, which matters when a BED record defines exon blocks.
The command should now be the following:

$ twoBitToFa -bed=<your_file> http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit output.fa
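
As a quick sanity check on the result, the length of each FASTA record can be compared with the size of the corresponding BED interval (end minus start for a record without blocks). The helper below is a minimal sketch using awk; fa_lengths is a hypothetical name, and output.fa is the file written by the commands above.

```shell
# Hypothetical helper: print "<record name><TAB><sequence length>" for every
# record in a FASTA file, summing the lengths of all sequence lines.
fa_lengths() {
    awk '
        /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
             { len += length($0) }
        END  { if (name != "") print name "\t" len }
    ' "$1"
}

# Usage, once output.fa exists:
#   fa_lengths output.fa
```

For the region in the question, a BED interval of chr14 31246362 31251138 should give a record of 4776 bases.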

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro 
UCSC Genomics Institute

