length of transcripts

Bogdan Tanasa

Nov 19, 2015, 6:28:03 PM

to gen...@soe.ucsc.edu

Dear all,

given a list of gene names, is there a quick way to find the lengths of the corresponding transcripts? And if multiple transcripts exist for a gene, how can we identify the canonical transcript? Many thanks,

bogdan

Cath Tyner

Nov 20, 2015, 7:45:37 PM

to Bogdan Tanasa, gen...@soe.ucsc.edu

Hi Bogdan,

Thank you for using the UCSC Genome Browser and for submitting your question regarding canonical transcript lengths from a list of gene names. In short, there is no quick and easy way to retrieve transcript lengths. However, you can write a script to perform a calculation from the Table Browser output to get those results. To get output for the canonical transcripts, you can use a table called "knownCanonical" which describes the canonical splice variant of a gene (and generally contains one unique transcript per associated gene).

You can use our "Table Browser" tool (see our Table Browser User's Guide) to accomplish this by following the steps below:

1. Navigate to "Tools > Table Browser" in the top horizontal blue menu bar from our home page.

2. Set your conditions:

Clade: Mammal
Genome: Human
Assembly: Dec. 2013 (GRCh38/hg38)
Group: Genes and Gene Prediction
Track: GENCODE v22
Table: knownCanonical
Region: genome
Identifiers (names/accessions): Click "paste list" or "upload list" to attach your list of genes.
Output format: selected fields from primary and related tables

3. Click "get output" to move to the next step.

4. Under the section, "Select Fields from hg38.knownCanonical," check the following checkboxes:

chrom
chromStart
chromEnd
transcript

5. Under the hg38.kgXref fields, check the checkbox for "Gene Symbol."

6. Click the "get output" button at the top of the page.

From your output, you can now find the transcript length by calculating (chromEnd - chromStart).
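The calculation above can be scripted. The following is a minimal sketch, not an official UCSC tool: it assumes the Table Browser output is tab-separated with the columns in the order chrom, chromStart, chromEnd, transcript, geneSymbol, and the function name `genomic_spans` is hypothetical. Note that (chromEnd - chromStart) is the genomic span of the transcript, so introns are included in the number.

```python
import csv
import io


def genomic_spans(table_text):
    """Map each gene symbol to (transcript ID, chromEnd - chromStart).

    Assumes tab-separated Table Browser output whose first five columns are
    chrom, chromStart, chromEnd, transcript, geneSymbol (an assumption based
    on the field selection described in the steps above).
    """
    spans = {}
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        # Skip blank lines and the "#"-prefixed header line.
        if not row or row[0].startswith("#"):
            continue
        _chrom, start, end, transcript, symbol = row[:5]
        spans[symbol] = (transcript, int(end) - int(start))
    return spans


# Example rows taken from elsewhere in this thread; the header is abbreviated.
example = (
    "#chrom\tchromStart\tchromEnd\ttranscript\tgeneSymbol\n"
    "chr1\t169853073\t169893959\tuc001ggs.5\tSCYL3\n"
    "chr1\t169795048\t169854080\tuc001ggp.4\tC1orf112\n"
    "chr1\t27612063\t27635277\tuc001bom.4\tFGR\n"
)

for symbol, (transcript, span) in genomic_spans(example).items():
    print(symbol, transcript, span)
```

In practice you would read the saved Table Browser file instead of the inline `example` string.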

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Enjoy,

Cath
. . .
Cath Tyner
UC Santa Cruz Genomics Institute

Bogdan Tanasa

Nov 25, 2015, 4:12:59 PM

to Cath Tyner, gen...@soe.ucsc.edu

Dear Cath, and all,

I was trying to download the sequences of the canonical transcripts as you suggested below. When I do, all of the downloaded sequences are annotated on the "+" strand, while I would have expected about half of them to correspond to transcripts on the "-" strand.

Could you please let me know why all of the sequences are annotated on the "+" strand? Many thanks, and happy Thanksgiving!

-- bogdan

Christopher Lee

Nov 25, 2015, 5:22:04 PM

to Bogdan Tanasa, Cath Tyner, gen...@soe.ucsc.edu

Hi Bogdan,

Thank you for using the UCSC Genome Browser and for submitting your question regarding canonical transcript strand annotation. All the transcripts appear to be annotated on the "+" strand because, when a transcript is annotated on the "-" strand, the chromStart field is actually the end position of the transcript, and the chromEnd field is actually the start position of the transcript.

You can see text explaining this at the top of the page when you perform Step 5 as Cath noted above: read the description in the "Select Fields from hg38.knownCanonical table" section. The note for chromStart says: "Start position (0 based). Represents transcription start for + strand genes, end for - strand genes".

One way to display the strand your transcript occurs on is to allow selections from the knownGene table:

1. Follow all steps Cath noted, up to and including Step 5.

2. Scroll down, find the knownGene table in the "Linked Tables" section, and check its box.

3. Scroll to the bottom of the page and click "allow selection from selected tables".

4. Choose strand from the "hg38.knownGene fields" section.

5. Click the "get output" button at the top of the page.

Here is an example of what you will see:

#hg38.knownCanonical.chrom	hg38.knownCanonical.chromStart	hg38.knownCanonical.chromEnd	hg38.knownCanonical.transcript	hg38.kgXref.geneSymbol	hg38.knownGene.strand
chr1	169853073	169893959	uc001ggs.5	SCYL3	-
chr1	169795048	169854080	uc001ggp.4	C1orf112	+
chr1	27612063	27635277	uc001bom.4	FGR	-
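With the strand column available, the chromStart/chromEnd interpretation described above can be applied in a short script. This is only a sketch under stated assumptions: the function name `transcription_bounds` is hypothetical, and the coordinates are passed through unchanged, with no conversion of the 0-based start position.

```python
def transcription_bounds(chrom_start, chrom_end, strand):
    """Return (transcription start, transcription end) for one row.

    For "+" strand transcripts, chromStart is the transcription start.
    For "-" strand transcripts, chromStart is the transcription end,
    so the two values are swapped. Coordinates are returned as given.
    """
    if strand == "+":
        return chrom_start, chrom_end
    if strand == "-":
        return chrom_end, chrom_start
    raise ValueError(f"unexpected strand: {strand!r}")


# Rows from the example output above:
# chrom, chromStart, chromEnd, transcript, geneSymbol, strand
rows = [
    ("chr1", 169853073, 169893959, "uc001ggs.5", "SCYL3", "-"),
    ("chr1", 169795048, 169854080, "uc001ggp.4", "C1orf112", "+"),
    ("chr1", 27612063, 27635277, "uc001bom.4", "FGR", "-"),
]

for chrom, start, end, transcript, symbol, strand in rows:
    tx_start, tx_end = transcription_bounds(start, end, strand)
    print(symbol, strand, tx_start, tx_end)
```

The value of (chromEnd - chromStart) is the same for either strand; only which end counts as the transcription start changes.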

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Thanks,

Christopher

Bogdan Tanasa

Nov 25, 2015, 5:32:16 PM

to Christopher Lee, Cath Tyner, gen...@soe.ucsc.edu

Thank you Chris et al., happy Thanksgiving!
