clade: Mammal
genome: Human
assembly: Dec. 2013 (GRCh38/hg38)
group: Genes and Gene Predictions
track: NCBI RefSeq
table: RefSeq Curated (ncbiRefSeqCurated)
region: genome
output format: selected fields from primary and related tables
output file: enter a file name or leave blank to view in web browser
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To post to this group, send email to gen...@soe.ucsc.edu.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/CAC4d_2TyGztJpMb_du%2Bb9_7sWseJGORL0Mkk46tufEZQ9pEZBg%40mail.gmail.com.
For more options, visit https://groups.google.com/a/soe.ucsc.edu/d/optout.
GCAGCCGCAG CTCGGGGGCG GTGCCTGCCT TGCAGCCTCC CCTCGGCGAT 50
CGCGCAGCCC CATCTTTGTC CGGCCTCCGC GCTTTGTTCT CGGCGCCCGG 100
GCCTTGGCCA GCCTGGCCAG CCGCCGAGCA GCCCCCACGC CGCGCTGGCG 150
TCGTCCTCGC CTCCCTCGCC GCCGCCCCCC GCGCGCGGCC GGGCCTTGCC 200
CCCCATGGTG TCCCGGCCAG AGCCCGAGGG CGAGGCCATG GACGCCGAGC 250
TGGCGGTAGC GCCGCCGGGC TGCTCGCACC TGGGCAGCTT CAAGGTGGAC 300
AACTGGAAGC AGAACCTGCG GGCCATCTAC CAGTGCTTCG TGTGGAGCGG 350
CACGGCTGAG GCCCGCAAGC GCAAGGCCAA GTCCTGTATC TGCCATGTCT 400
GTGGCGTCCA CCTCAACAGG CTGCATTCCT GCCTCTACTG TGTCTTCTTC 450
GGCTGTTTCA CAAAGAAGCA TATTCACGAG CATGCGAAGG CGAAGCGGCA 500
CAACCTGGCC ATTGATCTGA TGTACGGAGG CATCTACTGT TTTCTGTGCC 550
AGGACTACAT CTATGACAAA GACATGGAAA TAATCGCCAA GGAGGAGCAG 600
CGAAAAGCTT GGAAAATGCA AGGCGTTGGA GAGAAGTTTT CAACTTGGGA 650
ACCAACCAAA CGGGAGCTTG AACTGCTGAA GCACAACCCG AAAAGGAGAA 700
AGATCACCTC GAACTGCACC ATAGGTCTGC GTGGGCTGAT CAACCTTGGG 750
AACACATGCT TCATGAACTG CATCGTGCAG GCCCTGACCC ACACGCCACT 800
TCTGCGGGAC TTCTTCCTGT CTGACAGGCA CCGCTGTGAG ATGCAGAGCC 850
CCAGCTCCTG TCTGGTCTGT GAGATGTCCT CACTGTTTCA GGAGTTTTAC 900
TCTGGACACC GGTCCCCTCA CATCCCGTAT AAGTTGCTGC ACCTGGTGTG 950
GACCCACGCG AGGCACCTAG CAGGCTACGA GCAGCAGGAC GCCCACGAGT 1000
TCCTCATCGC GGCCCTGGAC GTGCTCCACC GACACTGCAA AGGTGATGAC 1050
AATGGGAAGA AGGCCAACAA CCCCAACCAC TGCAACTGCA TCATAGACCA 1100
GATCTTCACA GGCGGGTTGC AGTCAGACGT CACCTGCCAA GTCTGCCATG 1150
GAGTCTCCAC CACCATCGAC CCCTTCTGGG ACATCAGCTT GGATCTCCCC 1200
GGCTCTTCCA CCCCATTCTG GCCCCTGAGC CCAGGGAGCG AGGGCAACGT 1250
GGTAAACGGG GAAAGCCACG TGTCGGGAAC CACCACGCTC ACGGACTGCC 1300
TGCGACGATT CACCAGACCA GAGCACTTGG GCAGCAGCGC CAAGATCAAG 1350
TGCAGCGGTT GCCATAGCTA CCAGGAGTCC ACAAAGCAGC TCACTATGAA 1400
GAAACTGCCC ATCGTAGCCT GTTTTCATCT CAAACGATTT GAACACTCAG 1450
CCAAGCTGCG GCGGAAGATC ACCACGTATG TGTCCTTCCC CCTGGAGCTG 1500
GACATGACCC CTTTCATGGC CTCCAGCAAA GAGAGCAGGA TGAATGGACA 1550
GTACCAGCAG CCCACGGACA GTCTCAACAA TGACAACAAG TATTCCCTGT 1600
TTGCTGTTGT TAACCATCAA GGGACCTTGG AGAGTGGCCA CTACACCAGC 1650
TTTATCCGGC AGCACAAAGA CCAGTGGTTC AAGTGTGACG ATGCCATCAT 1700
CACCAAGGCC AGCATCAAGG ACGTCCTGGA CAGCGAAGGG TACTTGCTGT 1750
TCTATCACAA ACAGTTCCTG GAATACGAGT AGCCTTATCT GCAGCTGGTC 1800
AGAAAAACAA AGGCAATGCA TTGGCAAGCC TCACAAAGTG ATCCTCCCTG 1850
GCCCCCCCCT CCCCCAAGTC TCCCGCCGCC TCCCCGGCCT GGTGACACCA 1900
CCTCCCATGC AGATGTGGCC CCTCTGCACC TGGGACCCAT CGGGTCGGGA 1950
TGGACCACAC GGACGGGGAG GCTCCTGGAG CTGCTTTGAA GATGGATGAG 2000
ATGAGGGGTG TGCTCTGGGT GGGAGGAGCA GCGTACACCC GTCACCAGAA 2050
CATCTCTTGT GTCATGACAT GGGGGTGCAA CGGGGGCCTC ACAGCACAGA 2100
GTGACCGCTG CCTGGCGTTC CCCAGCACTC GGTGTGGAAA GGCCCCTACC 2150
TGCTGTAAGA TTATGGGTCC ATGAAAGCAG TAAGCTGGAC ACAGAGGTGT 2200
AGTGTGCGGG ACAGAGGGCC TTGCAGATGC CTTTCTGTTG GTGTTTTAGT 2250
GTTAAAATAC GGAGAGTATG GAACTCTTCA CCTCCATTTT CTCAGCGGCT 2300
GTGAAGCAGC CTCCTAGCTT CGGAAGTACG GACACTACGT CGCGTTTTCA 2350
AGCGTGTCTG TTCTGCAGGT AACAGCATCA AGCTGCACGT GGAAGCATCT 2400
CGCGGTTTTC TAGAAACAGG CATTTTCTTA TCCCTCTCCC GCTCCTTTTT 2450
CCACAAAGGT GAATTTCATA AATGTAATAC TAGTAAAGTG AATGAATTAC 2500
TGAGTTTATA CAGAAATTTA GGTAACTTCT CCTTTAGTCT CAAGAGCGAG 2550
TCTTGCTTTT TAATGGGTGC CGTTTATGTT GCTGCCCGCC CTGTGTGCCT 2600
GGCTCCTCTG GGTGCCTTGG TGTCTGCTGG TGGCTGGCAG TGGGCGCAGC 2650
GGAGGAGAGT TGTGCTGCAG CTCATACGGT GTGTCTGTCA TCTCAGTCTG 2700
GAGTAAATGC AGTGTCTGCC GGTGTCTGAT GGGTTCTGTC CCTCGTATTT 2750
TCTTTGCCTT CTATCCCATT GCCTGGCTAC CGCTGCCTGG CAGCCAAGGG 2800
TGTTGGTCGC GAAGCTGGAG TGGCCTCTGG TGGAGCCTGC ATCTTGTCTC 2850
GTCTGCCTCT GCTTTACATT TGGTGTACTT TCGGGCGTGG TGGCAGTAAA 2900
ATGACACCGT GATTGAGCTT GTCAGCAGAG CTGAAAGAGA AAGTAGAAGG 2950
ATGTGCATTG TTTCTTGTAA GATATCTTGC ATGTATCTGT GTATTCAAAT 3000
TCAAACAGAG ATGGTTTGTC CATTTGTCCA CTGAGAAATT AGAAACTAGG 3050
GACAAGGGGG AGGAAAAGTA CTGAAATACA GTTTATGAAG CAAGTGTGTC 3100
TCGGGCTGTG CTTGTCCCAG GAGCCCCAGC AGCATCTGAA CTGAGGCTTC 3150
TTCAGTCCTG CAGGAACAGG ATCATCTGTC TCAGCGGTGG GCAGATGTTT 3200
TCATAGACAG CCAGGGAGTA AACACTGTTG GCTCTGTGGG CTGTATGGTC 3250
TCTGCCATAA ATAGTACAGA GATGTGGCTG TGTCTAGTAC AACTTTTAGA 3300
CACAGAAATC TGAATGACAT ATATTGTTCT GTGTCAAGAA ACTTAGATTT 3350
TTTTTTTAAC TATTTAAAAA CGTGAAACCT ATTCTTAGCT CACAGGCCAT 3400
GGAGAAGCTG GTGGGGACCA GACCCAGCTC CTTAGCTGGC TGGGCTGGGG 3450
AGGGGGTAGT GACAGTGGCA GCTGCTACTC ACTGCTCAGT GTGGAAAACA 3500
CAGGACTTGG CAATCACAGC CCGCAGAACC ATCATGTGTG GCAGAAGCCT 3550
GAGGGATGCG GTTTCTTGCC CACGTGCTCT GTTCATTTTC TGTTGTTTTT 3600
CTGCACTTAA AGAATTCACA TGGAAGCATG TTTTATAAAA TGAATTACCA 3650
GAGAAACAGA GATGGGCCGA GATTTTCAGA AATGGTCCCA TGTGACCAAG 3700
TTCTGCTGTT TGGGTGACAG TGCTTTGAAG ATCTCCTTTG AGGATGTGCA 3750
GTCTTTTTTT TTTTTTTTTT GAGATGGAGT TTGTTGCCCA GGCTGGAGTG 3800
AGTGGCACAG TCTCGGCTCA CTGCAACCTC CACCTCCTGG GTTCAAGCAG 3850
TTCTCGTGCC GCAGCCTCCC AAGTAGCTGG GACTACAGGC ATGCACCACC 3900
ACGCCAGGCT AATTTTTGTA TTTTTAGTAG AGATGGGGTT TCACCATGTC 3950
TCAAACTCCT GACCTCAGGC GATCCACCCA CCTCAGCGTC CCAAAGTGCT 4000
GGGATTATAG GCGTGAGCCA CCGCACCTGG CCTATGAGTG GTCTTTTAAT 4050
TAGGAACAAA TCTAATGGAA AGGAGAGTTG ACTGAAGTTG GCCCACAGGA 4100
TTGTGAGCTG GGCAGTGCCT TCATGAAGGC TTGCCACCTT GGGACGCCCC 4150
AGTTTACTGG GGTGTCTTGC GGAGTGCAGA AGGCTTTCTG GCAGCTGCCT 4200
GGGTTTGGCC AGACCCTGCC TCCCCTCCCG CCGGCCAACC CCTAGTCCCC 4250
TTCCTGTCTC CACTTGCATT CAGGGGTGGC TGCTGTTCTG AGAACATTAG 4300
AACTGGGAAG AGAGATGGAG TCACATGGAT TTTTGGTGGG CATTATTCTA 4350
AACTTTCGTA TCCAAGTTAG TCCCCCTTAT TCCACTGTGG CATTGCCGTT 4400
CTAAGCAGTT ACCTGATGCC TGCTGCTGAA GAGCTGCTCA CAGGAGGCGG 4450
CGGCGGCCCT GGCACTGCCC CTTGCATTAG GTCTTGTGTT TGATGTGTTC 4500
TTGTGAATTT ACTTTGTCAG AACAAAATAT TTACGCGTTG GGTTCAGGAA 4550
TTTCTTTTAG CTCCCCATCT GGCTGTGAAA TTCAGGAAAC CTCCCGTTGC 4600
CTAGTAATCA CCCCATGTAG GTGTACATTG TGACAAAGTG CATCTGACCA 4650
CTAAGGGGCC CCCTTGGTGA CCCCAGCACA TTCACAGCAG TGTTAAAATG 4700
GCCTGCATTT TGGAGATGCT GGCTGGCCTT TCAGTGCCTC CCAGGAAGAC 4750
ACATGGCCTT TCCCTCTTCA GATGCCTGAA GGGAGTGCTT TGAGGCAGGT 4800
GATGTGCTGG GAGTGTGGGC GGCCTCCCTC TGGCCCCGGG GCCCTCTGTG 4850
GACCTTGGCT CCCTCCGTGG ACCTGGGCTT CGTGGTGAGC ACTGCAGCCT 4900
CCCTGGGCAT TCCCTCCAGC GCCAGCACCA CTGCAACATA TAGACCTGAG 4950
TGCTATTGTA TTTTGGCTTG GTGTGTATGC TCTTCATTGT GTAAAATTGC 5000
TGTTCTTTTG ACAATTTAAG TGATTGTTTT GTTTACTGTA AGTTTGAAAA 5050
TAAAAATGAA GAAAAAAATT CCAATGACTG TGCTGTGGTT GGAGACTTTA 5100
TTTACCAAGA TGTTTACTCT TCCTTTCCCC TTCCATTTTG AGGAGCTGTG 5150
TCACTCCTCC TCCCCCCCAG TGCTTTGTAG TCTCTCCTAT GTCATAATAA 5200
AGCTACATTT TCTCTGAGAA
=============================================================
Thank you for using the UCSC Genome Browser and for your question.
One of our engineers suggests that your goal could be accomplished with some scripting like this:
1. Make a BED file, using the CDS coordinates and the size of each RefSeq transcript, that enumerates the regions to be output (in BED 0-based half open coordinates; see http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/). For example, it would have these 4 lines for NM_015276.1 above (CDS = 205..1782, size = 5220):
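2. Use faToTwoBit to convert the RefSeq mRNA fasta file (e.g. seqNcbiRefSeq.rna.fa) into a .2bit file.
3. Use twoBitToFa -bed=<fileCreatedInStep1> to get the desired fasta.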
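As a rough illustration of step 1, here is a minimal Python sketch that turns one transcript's CDS coordinates and size into BED lines. It assumes the regions of interest are the 5' UTR, the CDS, and the 3' UTR, which may not match the exact lines intended in the example above. The function name and the region labels (utr5, cds, utr3) are hypothetical. The CDS coordinates are taken as 1-based and inclusive, so only the start is shifted by one to get 0-based half-open BED coordinates.

```python
def transcript_regions_to_bed(name, cds_start, cds_end, size):
    """Return BED lines (0-based, half-open) for the 5' UTR, CDS and 3' UTR.

    cds_start and cds_end are 1-based and inclusive, as in a CDS
    annotation such as "205..1782". The three-region split and the
    labels are assumptions made for illustration only.
    """
    regions = [
        (0, cds_start - 1, "utr5"),      # everything before the CDS
        (cds_start - 1, cds_end, "cds"),  # the CDS itself
        (cds_end, size, "utr3"),         # everything after the CDS
    ]
    lines = []
    for start, end, label in regions:
        if end > start:  # skip empty regions, e.g. a transcript with no 5' UTR
            lines.append(f"{name}\t{start}\t{end}\t{name}_{label}")
    return lines


if __name__ == "__main__":
    for line in transcript_regions_to_bed("NM_015276.1", 205, 1782, 5220):
        print(line)
```

For NM_015276.1 this prints three tab-separated lines covering 0-204, 204-1782, and 1782-5220. In the sequence above, the CDS region starts at the ATG at bases 205-207 and ends at the TAG at bases 1780-1782.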
Step 1 is definitely the complex part, but with some background in bioinformatics and scripting it is possible. The size of each mRNA can be obtained from the fasta file like this:
faSize -detailed seqNcbiRefSeq.rna.fa > mrna.sizes
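The resulting sizes file can then be loaded into a lookup table for the BED-building script. This is a small sketch that assumes mrna.sizes holds one transcript per line, with the name and the size separated by whitespace. The function name read_sizes is hypothetical.

```python
import io


def read_sizes(handle):
    """Read "name<whitespace>size" lines into a dict of name -> int size."""
    sizes = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, size = line.split()[:2]
        sizes[name] = int(size)
    return sizes


if __name__ == "__main__":
    # In practice: with open("mrna.sizes") as handle: sizes = read_sizes(handle)
    example = io.StringIO("NM_015276.1\t5220\n")
    sizes = read_sizes(example)
    print(sizes["NM_015276.1"])
```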
It may be helpful to be aware that some transcripts have an incomplete or complex CDS. For example, a few transcripts depend on ribosomal slippage, e.g. NM_001134939.1 with CDS "join(168..260,262..741)". At this point, the transcripts with an incomplete CDS all seem to be XM_ accessions.
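To handle such cases in a script, the CDS string could be parsed into a list of segments before any BED lines are written. The sketch below is one possible approach, not a complete parser. It assumes the CDS is given either as a plain range or as join(...), and that an incomplete CDS is marked with "<" or ">" in front of a coordinate. The function name parse_cds is hypothetical.

```python
import re


def parse_cds(location):
    """Parse a CDS location string into (segments, is_partial).

    segments is a list of (start, end) tuples, 1-based and inclusive.
    is_partial is True when a "<" or ">" marker is present.
    Only plain ranges and join(...) are handled; anything else raises.
    """
    is_partial = "<" in location or ">" in location
    text = location.strip()
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = re.fullmatch(r"\s*<?(\d+)\.\.>?(\d+)\s*", part)
        if match is None:
            raise ValueError(f"unsupported CDS location: {location!r}")
        segments.append((int(match.group(1)), int(match.group(2))))
    return segments, is_partial


if __name__ == "__main__":
    print(parse_cds("205..1782"))
    print(parse_cds("join(168..260,262..741)"))
```

Each (start, end) segment converts to BED coordinates as (start - 1, end). A script could also use is_partial to skip or flag transcripts with an incomplete CDS.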
Utilities such as faToTwoBit, faSize, and twoBitToFa can be obtained here: http://hgdownload.soe.ucsc.edu/admin/exe/
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UC Santa Cruz Genomics Institute