chr6 | 46139583 | 46139628 | Signal peptide | 1000 | 0 | 46139583 | 46139628 | 0,0,0 | 1 | 45 | 0 | signal peptide | amino acids 1-15 on protein Q9Y6X5 | | Q9Y6X5 |
chr6 | 46139628 | 46143499 | Extracellular | 1000 | 0 | 46139628 | 46143499 | 100,0,0 | 3 | 781,171,224 | 0,1423,3647 | topological domain | amino acids 16-407 on protein Q9Y6X5 | Extracellular | Q9Y6X5 |
chr6 | 46139628 | 46143637 | Bis(5'-adenosy... | 1000 | 0 | 46139628 | 46143637 | 0,0,0 | 3 | 781,171,362 | 0,1423,3647 | chain | amino acids 16-453 on protein Q9Y6X5 | Bis(5'-adenosyl)-triphosphatase ENPP4 | Q9Y6X5 |
chr6 | 46139682 | 46139685 | ion-binding | 1000 | 0 | 46139682 | 46139685 | 0,0,0 | 1 | 3 | 0 | metal ion-binding site | amino acid 34 on protein Q9Y6X5 | Zinc 1; catalytic | Q9Y6X5 |
chr6 | 46139790 | 46139793 | enzyme act site | 1000 | 0 | 46139790 | 46139793 | 0,0,0 | 1 | 3 | 0 | active site | amino acid 70 on protein Q9Y6X5 | AMP-threonine intermediate | Q9Y6X5 |
chr6 | 46139790 | 46139793 | ion-binding | 1000 | 0 | 46139790 | 46139793 | 0,0,0 | 1 | 3 | 0 | metal ion-binding site | amino acid 70 on protein Q9Y6X5 | Zinc 1; catalytic | Q9Y6X5 |
chr6 | 46139853 | 46139856 | bind | 1000 | 0 | 46139853 | 46139856 | 0,0,0 | 1 | 3 | 0 | binding site | amino acid 91 on protein Q9Y6X5 | Substrate | Q9Y6X5 |
chr6 | 46140042 | 46140045 | bind | 1000 | 0 | 46140042 | 46140045 | 0,0,0 | 1 | 3 | 0 | binding site | amino acid 154 on protein Q9Y6X5 | Substrate | Q9Y6X5 |
chr6 | 46140045 | 46140048 | glyco | 1000 | 0 | 46140045 | 46140048 | 0,100,100 | 1 | 3 | 0 | glycosylation site | amino acid 155 on protein Q9Y6X5 | N-linked (GlcNAc... | Q9Y6X5 |
chr6 | 46140078 | 46140081 | glyco | 1000 | 0 | 46140078 | 46140081 | 0,100,100 | 1 | 3 | 0 | glycosylation site | amino acid 166 on protein Q9Y6X5 | N-linked (GlcNAc... | Q9Y6X5 |
chr6 | 46140147 | 46140150 | bind | 1000 | 0 | 46140147 | 46140150 | 0,0,0 | 1 | 3 | 0 | binding site | amino acid 189 on protein Q9Y6X5 | Substrate | Q9Y6X5 |
chr6 | 46140147 | 46140150 | ion-binding | 1000 | 0 | 46140147 | 46140150 | 0,0,0 | 1 | 3 | 0 | metal ion-binding site | amino acid 189 on protein Q9Y6X5 | Zinc 2; catalytic | Q9Y6X5 |
chr6 | 46140159 | 46140162 | ion-binding | 1000 | 0 | 46140159 | 46140162 | 0,0,0 | 1 | 3 | 0 | metal ion-binding site | amino acid 193 on protein Q9Y6X5 | Zinc 2; catalytic | Q9Y6X5 |
chr6 | 46140291 | 46140294 | ion-binding | 1000 | 0 | 46140291 | 46140294 | 0,0,0 | 1 | 3 | 0 | metal ion-binding site | amino acid 237 on protein Q9Y6X5 | Zinc 1; catalytic | Q9Y6X5 |
chr6 | 46140294 | 46140297 | ion-binding | 1000 | 0 | 46140294 | 46140297 | 0,0,0 | 1 | 3 | 0 | metal ion-binding site | amino acid 238 on protein Q9Y6X5 | Zinc 1; catalytic | Q9Y6X5 |
chr6 | 46140342 | 46140345 | disulf bond | 1000 | 0 | 46140342 | 46140345 | 100,100,100 | 1 | 3 | 0 | disulfide bond | amino acid 254 on protein Q9Y6X5 | disulfide bond to position 287 | Q9Y6X5 |
chr6 | 46140408 | 46141053 | glyco | 1000 | 0 | 46140408 | 46141053 | 0,100,100 | 2 | 1,2 | 0,643 | glycosylation site | amino acid 276 on protein Q9Y6X5 | N-linked (GlcNAc... | Q9Y6X5 |
chr6 | 46141083 | 46141086 | disulf bond | 1000 | 0 | 46141083 | 46141086 | 100,100,100 | 1 | 3 | 0 | disulfide bond | amino acid 287 on protein Q9Y6X5 | disulfide bond to position 254 | Q9Y6X5 |
chr6 | 46143283 | 46143286 | ion-binding | 1000 | 0 | 46143283 | 46143286 | 0,0,0 | 1 | 3 | 0 | metal ion-binding site | amino acid 336 on protein Q9Y6X5 | Zinc 2; catalytic | Q9Y6X5 |
chr6 | 46143433 | 46143436 | glyco | 1000 | 0 | 46143433 | 46143436 | 0,100,100 | 1 | 3 | 0 | glycosylation site | amino acid 386 on protein Q9Y6X5 | N-linked (GlcNAc... | Q9Y6X5 |
chr6 | 46143457 | 46143460 | disulf bond | 1000 | 0 | 46143457 | 46143460 | 100,100,100 | 1 | 3 | 0 | disulfide bond | amino acid 394 on protein Q9Y6X5 | disulfide bond to position 401 | Q9Y6X5 |
chr6 | 46143478 | 46143481 | disulf bond | 1000 | 0 | 46143478 | 46143481 | 100,100,100 | 1 | 3 | 0 | disulfide bond | amino acid 401 on protein Q9Y6X5 | disulfide bond to position 394 | Q9Y6X5 |
chr6 | 46143499 | 46143562 | Transmembrane | 1000 | 0 | 46143499 | 46143562 | 0,0,100 | 1 | 63 | 0 | transmembrane region | amino acids 408-428 on protein Q9Y6X5 | Helical | Q9Y6X5 |
chr6 | 46143562 | 46143637 | Cytoplasmic | 1000 | 0 | 46143562 | 46143637 | 100,0,0 | 1 | 75 | 0 | topological domain | amino acids 429-453 on protein Q9Y6X5 | Cytoplasmic | Q9Y6X5 |
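The rows above appear to follow a BED12-style layout, in which a feature that spans introns is stored as a start coordinate plus a list of block sizes and block offsets. The sketch below is only an illustration under that assumption (the table carries no header, so the column meanings are inferred). It expands the blocks of the Extracellular row, taking its block sizes as 781, 171 and 224 and its offsets as 0, 1423 and 3647, and checks that the total number of bases equals three per amino acid. The function name `bed12_blocks` is made up for this example.

```python
def bed12_blocks(chrom_start, block_sizes, block_starts):
    """Return absolute (start, end) pairs for each block of a BED12-style row.

    block_sizes and block_starts are comma-separated strings; the starts are
    offsets relative to chrom_start.
    """
    sizes = [int(s) for s in block_sizes.split(",") if s]
    starts = [int(s) for s in block_starts.split(",") if s]
    if len(sizes) != len(starts):
        raise ValueError("block size and block start lists differ in length")
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(starts, sizes)]


# Extracellular topological domain, amino acids 16-407 on Q9Y6X5.
blocks = bed12_blocks(46139628, "781,171,224", "0,1423,3647")
coding_bases = sum(end - start for start, end in blocks)
residues = 407 - 16 + 1

print(blocks)
print(coding_bases, residues * 3)
```

The last block ends at the row's end coordinate (46143499), and the 392 residues account for exactly 1176 bases, which is a quick way to confirm that the block fields were read correctly.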
Hi Kostas,
Thank you for your question about downloading protein-coding genes. The currently recommended method is to select your gene track of interest and then filter for "cdsStart!=cdsEnd" in the free-form query section, which limits the output to the coding transcripts in that track. However, because a single gene can have multiple alternatively spliced transcripts, the result will still contain multiple entries for one locus. Fortunately, we provide a table called "knownCanonical", which contains one "canonical" transcript for each locus. You can therefore filter for transcripts that have cdsStart!=cdsEnd and are also in the knownCanonical table to get roughly one transcript per locus:
1. Navigate to the Table Browser: https://genome.ucsc.edu/cgi-bin/hgTables.
2. Select your organism and assembly of interest. In this example I use Human Dec. 2013 GRCh38/hg38 (hg38).
3. Make the following selections:
- group: Genes and Gene Predictions
- track: GENCODE V24
- table: knownGene
4. Click the "create" button next to "filter".
5. In the free-form query box in the "Filter on Fields from hg38.knownGene" section, enter "cdsStart!=cdsEnd" without the quotes.
6. Scroll down to the "Linked Tables" section and check the box next to knownCanonical. Scroll all the way to the bottom of the page and click "allow filtering using fields in checked tables".
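7. In the "hg38.knownCanonical based filters" free-form query box, enter "1" without the quotes. This condition is always true, so its only effect is to restrict the output to transcripts that are present in knownCanonical.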
8. Click "submit".
9. Select your output format of interest (BED, custom track, etc.) and, if you would like the results saved to a file, enter an output file name.
10. Click "get output".
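For readers who prefer to see the logic of these steps outside the Table Browser, here is a minimal Python sketch of the same two-part filter applied to rows already loaded into memory. It is an illustration only, not how the Table Browser works internally. The field names "name", "cdsStart" and "cdsEnd" follow the knownGene table; the transcript IDs and coordinates are toy values invented for the example, and it assumes the transcript identifiers in knownCanonical match the "name" field of knownGene.

```python
def canonical_coding(known_gene_rows, canonical_ids):
    """Keep transcripts that have a non-empty CDS and are marked canonical.

    known_gene_rows: iterable of dicts with at least 'name', 'cdsStart', 'cdsEnd'.
    canonical_ids:   set of transcript IDs taken from knownCanonical.
    """
    kept = []
    for row in known_gene_rows:
        # Non-coding transcripts are stored with cdsStart equal to cdsEnd.
        is_coding = int(row["cdsStart"]) != int(row["cdsEnd"])
        if is_coding and row["name"] in canonical_ids:
            kept.append(row)
    return kept


# Toy data (hypothetical IDs and coordinates, not real annotations).
rows = [
    {"name": "txA.1", "chrom": "chr1", "cdsStart": "1000", "cdsEnd": "5000"},
    {"name": "txA.2", "chrom": "chr1", "cdsStart": "1200", "cdsEnd": "4800"},
    {"name": "txB.1", "chrom": "chr1", "cdsStart": "9000", "cdsEnd": "9000"},
]
canonical = {"txA.1", "txB.1"}

selected = [row["name"] for row in canonical_coding(rows, canonical)]
print(selected)
```

Here txA.2 is dropped because it is not the canonical transcript for its locus, and txB.1 is dropped because it is non-coding, leaving one coding transcript for the locus that has one.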
Here is a session containing a custom track created via the above
steps, where you can see that even though there are multiple coding
transcripts in the GENCODE V24 track, the Table Browser query limits the
output to one coding transcript for each gene shown (MTOR and ANGPTL7):
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=chmalee&hgS_otherUserSessionName=hg38_proteinCodingOnly
Since you are new to the UCSC Genome Browser, you may also find the following pages helpful:
- UCSC Genome Browser - Training
http://genome.ucsc.edu/training/index.html
- UCSC Genome Browser Tutorials by OpenHelix:
http://www.openhelix.com/ucsc
- UCSC Genome Browser Videos:
https://www.youtube.com/channel/UCQnUJepyNOw0p8s2otX4RYQ/videos
Lastly, all emails sent to gen...@soe.ucsc.edu are publicly archived in the following Google Group, which you can search for topics of interest, for example:
https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/find$20TSS
Thank you again for your inquiry and for using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.
Christopher Lee
UCSC Genomics Institute