Unable to download CDS from bigBed track


David da Silva Pires

May 20, 2015, 1:54:21 PM
to gen...@soe.ucsc.edu
Hi.

I have built a bigBed track called "SMPs v5.2" in the following assembly hub:

http://www.vision.ime.usp.br/~davidsp/hub/geneNetwork2/hub.txt

Since this track was built from a BED 12 file, every feature carries its block count, block sizes, and block starts. However, when I click on a feature to open its details page, there is no way to download just the coding sequence (CDS). The only option offered covers the entire window, including introns and intergenic regions.

What am I supposed to do in order to download just the CDS?

Thanks.

--
David da Silva Pires

Jonathan Casper

May 20, 2015, 4:04:43 PM
to David da Silva Pires, gen...@soe.ucsc.edu

Hello David,

Thank you for your question about obtaining CDS sequence for items in your BED 12 track. We would like to improve the DNA retrieval options for the description pages of individual features, but there is no timetable for it right now. In the interim, the easiest way to get the sequence filtering options you describe is to use the Table Browser. First load your hub on our site, and then follow these steps:

1. Open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables (or click "Table Browser" from the top "Tools" menu on our site).
2. Select your track hub and track from the drop-down menus, set the region to "genome", then click the "Identifiers: paste list" button.
3. On the new page, enter the name of the BED 12 item whose sequence you want in the text box, then click "submit".
4. Select the output format "sequence" and click "get output".

On the resulting page, you should be able to choose which portions of your feature to retrieve sequence for (CDS, exons, UTR, etc.).
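
For readers who would rather script this, the CDS of a BED 12 item is the part of its blocks that falls between thickStart and thickEnd. The following is a minimal Python sketch of that intersection, assuming the standard tab-separated BED 12 column order and 0-based, half-open coordinates. The function name `cds_blocks` and the example line are hypothetical, and this is not how the Table Browser itself is implemented.

```python
def cds_blocks(bed12_line):
    """Return (chrom, [(start, end), ...]) for the coding part of one BED 12 line."""
    fields = bed12_line.rstrip("\n").split("\t")
    chrom, chrom_start = fields[0], int(fields[1])
    thick_start, thick_end = int(fields[6]), int(fields[7])
    # blockSizes and blockStarts are comma-separated and may end with a trailing comma
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]

    blocks = []
    for size, rel_start in zip(sizes, starts):
        start = chrom_start + rel_start  # blockStarts are relative to chromStart
        end = start + size
        # clip each block to the thickStart..thickEnd (coding) interval
        clipped_start, clipped_end = max(start, thick_start), min(end, thick_end)
        if clipped_start < clipped_end:
            blocks.append((clipped_start, clipped_end))
    return chrom, blocks


# Hypothetical item: three blocks, coding region from 1100 to 1800
example = "chr1\t1000\t2000\titem1\t0\t+\t1100\t1800\t0\t3\t200,300,100,\t0,400,900,"
print(cds_blocks(example))  # ('chr1', [(1100, 1200), (1400, 1700)])
```

The resulting intervals could then be used to pull sequence from the assembly's 2bit file. For minus-strand items the concatenated sequence would also need to be reverse-complemented, which this sketch does not do.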

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group


David da Silva Pires

Jul 20, 2015, 3:35:56 PM
to Jonathan Casper, gen...@soe.ucsc.edu
Hi, Jonathan.

Thank you very much for the detailed explanation.

If I follow your steps and, at step 4, choose "all fields from selected table" or "selected fields from primary and related tables", I get the header plus the single line for the feature whose ID I pasted under the "Identifiers: paste list" button.

However, if I set the output format to "sequence" in order to obtain the nucleotides of the CDS region, the result is a giant FASTA file containing every feature in the track.

Am I doing something wrong? How can I obtain the nucleotides of the CDS region of a feature contained in a BED 12 track from my assembly hub?

Thanks in advance.

--
David da Silva Pires

Jonathan Casper

Jul 27, 2015, 5:10:34 PM
to David da Silva Pires, gen...@soe.ucsc.edu

Hello David,

Thank you for reporting that the identifiers filter does not correctly limit sequence output. One of our engineers is working on several problems related to accessing assembly hubs with the UCSC Table Browser, and this one is on the list. Until it is fixed, you can continue to use the Table Browser on our test server at http://genome-test.soe.ucsc.edu, where any fix for this problem will appear first. As a temporary workaround, you can pass the output of your Table Browser query through our faFilter tool (available at http://hgdownload.soe.ucsc.edu/admin/exe). faFilter accepts either a single record name or a file listing the names of the FASTA records you want, and extracts only those records from the data:

   faFilter -name='accession' TableBrowserOutput.fa MyCDSResults.fa

or

   faFilter -namePatList=accessionFile TableBrowserOutput.fa MyCDSResults.fa
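
If the faFilter binary is not available for a given platform, the same kind of name-based filtering can be approximated in a few lines of Python. This is only a sketch and not faFilter itself: it assumes the record name is the first whitespace-delimited word of each FASTA header and matches it against a shell-style wildcard. The function names and the header strings used when trying it out are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_fasta(lines, pattern):
    """Yield only the FASTA records whose name matches a shell-style wildcard."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # the record name is taken as the first word after '>'
            words = line[1:].split()
            name = words[0] if words else ""
            keep = fnmatchcase(name, pattern)
        if keep:
            yield line


def filter_fasta_file(in_path, out_path, pattern):
    """Copy the matching records from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, pattern))
```

For example, `filter_fasta_file("TableBrowserOutput.fa", "MyCDSResults.fa", "*accession*")` would keep every record whose name contains the given accession. The surrounding wildcards are there in case the names in the downloaded FASTA carry a prefix in front of the item name, so it is worth checking a few header lines first.
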

--
Jonathan Casper
UCSC Genome Bioinformatics Group
