Hello ruvalcabatrejo,
Thank you for your question about finding the transcription start site (TSS) and coding sequence (CDS) start for genes. Unfortunately, your sample output is missing from the question, so I can't advise you about those results.
If you would like to obtain the TSS and CDS start positions for a list of genes, I suggest you use the UCSC Table Browser as follows. Here I will assume that you want to obtain gene coordinates from the UCSC Genes track of the human hg19 genome assembly.
1. Open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables
2. Select the following options
Clade: Mammals
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Predictions
Track: UCSC Genes
Table: knownGene
Region: genome
Output format: selected fields from primary and related tables
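Note: After you select the knownGene table, you can click the "describe table schema" button to get a description of each field. In particular, txStart holds the transcription start position and cdsStart holds the coding region start position. Both are given relative to the + strand, so for a gene on the - strand the TSS is txEnd and the CDS begins at cdsEnd; in that case you will also want the strand, txEnd, and cdsEnd fields. Start coordinates are 0-based and end coordinates are 1-based (see http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms for more information).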
3. For "identifiers (names/accessions)", click "paste list" and paste your list of genes into the text box that appears. Click "submit" when you are done.
4. Click "get output".
5. On the next page, select the following options.
From hg19.knownGene: name, txStart, cdsStart
From the linked hg19.kgXref table: geneSymbol
6. Click "get output"
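If you would rather script this selection, the same fields can be requested with SQL. The sketch below only builds the query text and parameter list; it does not connect to anything. It assumes the hg19 knownGene and kgXref tables are joined on knownGene.name = kgXref.kgID, that you are matching on gene symbols, and that your MySQL driver uses %s placeholders. The function name build_tss_query is hypothetical. I have added strand, txEnd, and cdsEnd so that genes on the - strand can be handled.

```python
def build_tss_query(gene_symbols):
    """Return (sql, params) selecting TSS/CDS fields for the given gene symbols.

    Assumption: run against the hg19 database, where knownGene.name
    matches kgXref.kgID. Nothing is executed here.
    """
    symbols = list(gene_symbols)
    if not symbols:
        raise ValueError("need at least one gene symbol")
    # One %s placeholder per symbol, so the driver handles quoting.
    placeholders = ", ".join(["%s"] * len(symbols))
    sql = (
        "SELECT kg.name, kg.chrom, kg.strand, kg.txStart, kg.txEnd, "
        "kg.cdsStart, kg.cdsEnd, x.geneSymbol "
        "FROM knownGene AS kg "
        "JOIN kgXref AS x ON x.kgID = kg.name "
        f"WHERE x.geneSymbol IN ({placeholders})"
    )
    return sql, symbols


sql, params = build_tss_query(["IL1RN"])
print(sql)
print(params)
```

The Table Browser steps above give the same kind of result without any scripting.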
You will be presented with data for each transcript of your matching genes. Here is the output I got for the IL1RN gene:
#hg19.knownGene.name	hg19.knownGene.txStart	hg19.knownGene.cdsStart	hg19.kgXref.geneSymbol
uc002tix.1	113856936	113856936	IL1RN
uc002tiy.3	113875469	113885303	IL1RN
uc002tiz.3	113875469	113875595	IL1RN
uc002tja.3	113875469	113875595	IL1RN
uc002tjb.3	113885137	113885201	IL1RN
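Because the start coordinates are 0-based, you may want to convert them to the 1-based positions shown in the Genome Browser display. Here is a minimal Python sketch of that conversion, assuming the output was saved as tab-separated text with the four columns selected above. The function name parse_table_browser_output is hypothetical. It also assumes every transcript is on the + strand, as IL1RN is; for - strand genes you would need txEnd and cdsEnd instead.

```python
import csv
import io


def parse_table_browser_output(text):
    """Parse tab-separated Table Browser output and add 1 to each start.

    Assumptions: columns are name, txStart, cdsStart, geneSymbol, and all
    transcripts are on the + strand.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and the '#'-prefixed header line.
        if not fields or fields[0].startswith("#"):
            continue
        name, tx_start, cds_start, symbol = fields[:4]
        rows.append({
            "transcript": name,
            "gene": symbol,
            "tss_1based": int(tx_start) + 1,
            "cds_start_1based": int(cds_start) + 1,
        })
    return rows


SAMPLE = (
    "#hg19.knownGene.name\thg19.knownGene.txStart\t"
    "hg19.knownGene.cdsStart\thg19.kgXref.geneSymbol\n"
    "uc002tix.1\t113856936\t113856936\tIL1RN\n"
    "uc002tiy.3\t113875469\t113885303\tIL1RN\n"
    "uc002tiz.3\t113875469\t113875595\tIL1RN\n"
    "uc002tja.3\t113875469\t113875595\tIL1RN\n"
    "uc002tjb.3\t113885137\t113885201\tIL1RN\n"
)

for row in parse_table_browser_output(SAMPLE):
    print(row["transcript"], row["gene"], row["tss_1based"], row["cds_start_1based"])
```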
If you are interested in working further with the UCSC Table Browser and related tools, I suggest you begin with the resources on our training page at http://genome.ucsc.edu/training.html. The OpenHelix video tutorials in particular offer a guided, example-driven introduction to our website.
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--
#hg38.knownGene.name	hg38.knownGene.txStart	hg38.knownGene.cdsStart	hg38.kgXref.geneSymbol
uc002tix.1	113099359	113099359	IL1RN
uc002tiy.3	113117892	113127726	IL1RN
uc002tiz.3	113117892	113118018	IL1RN
uc002tja.3	113117892	113118018	IL1RN
uc002tjb.3	113127560	113127624	IL1RN
Hello Laura,
I used the hg19 assembly in my example because that is still the default human assembly for the UCSC Genome Browser. While GRCh38/hg38 is a more complete assembly than GRCh37/hg19, it is still quite new. Much of the annotation on the hg19 assembly (and there is a lot) has not yet been constructed for the hg38 assembly. That does not mean that hg19 is more accurate or trustworthy.
If you are interested in learning more about genome assemblies, NCBI provides a short primer at http://www.ncbi.nlm.nih.gov/assembly/basics/. You may also be interested in the NCBI Insights blog, which gives further information about some of their projects. You can find posts about the hg38 genome assembly at http://ncbiinsights.ncbi.nlm.nih.gov/tag/grch38/.
--
Jonathan Casper
UCSC Genome Bioinformatics Group