RSEM+STAR with modified GTF


kallet

Jul 29, 2020, 7:34:52 AM
to RSEM Users
Hi all RSEM users,

I'm applying RSEM to quantify transcript level expression with STAR aligner in mouse, and it works great with the GTF file for the GRCm38 genome I downloaded from Ensembl FTP site.

However, one isoform is missing from the mouse annotation. We see it in our data and it exists in human, so I'd like to include it in the GTF. It should be fairly simple, as it only skips one exon, but adding the isoform as new GTF lines doesn't seem to work properly.

The rsem-prepare-reference step with option --star goes through and finishes as expected, but when doing rsem-calculate-expression, it stalls and I get a warning 'RSEM can not recognize reference sequence name Slc12b2!'
which is the new alternative isoform name I included in the GTF, right after the original one.

I've thought of two possibilities. Either rsem-prepare-reference has skipped this isoform because I've added it incorrectly, although the lines seem correct to me: I believe RSEM should only care about the start, CDS and stop lines in the GTF, together with the gene ID and isoform ID on those lines, though I've included all lines for the extra isoform. Or it matters whether the isoform really exists in the Ensembl database, though I suppose RSEM doesn't care about that? Any suggestions for getting around this are warmly welcome!

bw, Kalle, Univ Helsinki
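
The step described in this post, deriving an exon-skipping isoform from an existing transcript's GTF rows, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `derive_skipped_isoform`, the IDs `T1`/`T1b`, and the toy coordinates are hypothetical, and the input is assumed to be tab-separated with exons listed in transcript order. It copies `exon` rows only; CDS and codon rows are left out because skipping an exon can shift the reading frame, so their phases would need to be worked out separately.

```python
import re

def derive_skipped_isoform(gtf_lines, source_id, new_id, skip_exon_number):
    """Copy the exon rows of source_id, drop one exon, and relabel the copy."""
    out = []
    next_number = 1
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Only well-formed 9-column exon rows of the source transcript are copied.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = fields[8]
        if f'transcript_id "{source_id}"' not in attrs:
            continue
        match = re.search(r'exon_number "(\d+)"', attrs)
        # Rows without an exon_number, and the exon to be skipped, are dropped.
        if match is None or int(match.group(1)) == skip_exon_number:
            continue
        attrs = attrs.replace(f'transcript_id "{source_id}"',
                              f'transcript_id "{new_id}"')
        # Renumber the remaining exons consecutively.
        attrs = re.sub(r'exon_number "\d+"', f'exon_number "{next_number}"', attrs)
        next_number += 1
        fields[8] = attrs
        out.append("\t".join(fields))
    return out

def exon_row(start, end, number):
    """Build a toy exon row (made-up gene and transcript IDs)."""
    attrs = f'gene_id "G1"; transcript_id "T1"; exon_number "{number}";'
    return "\t".join(["18", "ensembl", "exon", str(start), str(end),
                      ".", "+", ".", attrs])

toy = [exon_row(100, 200, 1), exon_row(300, 400, 2), exon_row(500, 600, 3)]
new_rows = derive_skipped_isoform(toy, "T1", "T1b", skip_exon_number=2)
for row in new_rows:
    print(row)
```

Generating the rows programmatically rather than editing them by hand keeps the new transcript ID identical on every row.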


Peng

Jul 29, 2020, 1:39:48 PM
to RSEM Users
Hi Kalle,

If possible, would you mind sharing the GTF lines you added for that isoform, along with the GTF lines for a few other example isoforms? They would help diagnose the potential issue.

Best,
Peng  

kallet

Jul 30, 2020, 11:03:03 AM
to RSEM Users
Hi Peng,
Unfortunately the GTF lines don't seem to fit here. Here is the last line of the original GTF plus the lines I've added myself (hopefully they all show up):

18 ensembl three_prime_utr 57944052 57946821 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "ENSMUST00000115366"; transcript_version "2"; gene_name "Slc12a2"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "Slc12a2-201"; transcript_source "ensembl"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; tag "basic"; transcript_support_level "1";
18 own transcript 57878678 57946821 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; tag "basic"; transcript_support_level "1";
18 own exon 57878678 57879544 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "1"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000702618"; exon_version "2"; tag "basic"; transcript_support_level "1";
18 own CDS 57878807 57879544 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "1"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own start_codon 57878807 57878809 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "1"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; tag "basic"; transcript_support_level "1";
18 own exon 57896282 57896401 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "2"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143634"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57896282 57896401 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "2"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57897344 57897419 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "3"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000492306"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57897344 57897419 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "3"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57898059 57898154 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "4"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143631"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57898059 57898154 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "4"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57899260 57899399 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "5"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000353984"; exon_version "2"; tag "basic"; transcript_support_level "1";
18 own CDS 57899260 57899399 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "5"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57900022 57900132 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "6"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143612"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57900022 57900132 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "6"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57901356 57901464 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "7"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143633"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57901356 57901464 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "7"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57904124 57904251 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "8"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143609"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57904124 57904251 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "8"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57904338 57904422 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "9"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000470856"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57904338 57904422 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "9"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57905953 57906104 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "10"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143623"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57905953 57906104 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "10"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57910241 57910348 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "11"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000464372"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57910241 57910348 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "11"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57911890 57912013 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "12"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143613"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57911890 57912013 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "12"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57912837 57912938 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "13"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000329891"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57912837 57912938 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "13"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57914106 57914261 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "14"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143636"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57914106 57914261 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "14"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57915409 57915508 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "15"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143622"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57915409 57915508 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "15"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57919456 57919567 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "16"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000496856"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57919456 57919567 . + 1 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "16"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57921746 57921886 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "17"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143632"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57921746 57921886 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "17"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57926407 57926513 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "18"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143630"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57926407 57926513 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "18"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57930155 57930234 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "19"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143626"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57930155 57930234 . + 1 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "19"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57932480 57932605 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "20"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143624"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57932480 57932605 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "20"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57936308 57936430 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "21"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000507140"; exon_version "2"; tag "basic"; transcript_support_level "1";
18 own CDS 57936308 57936430 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "21"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57937660 57937771 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "22"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143637"; exon_version "3"; tag "basic"; transcript_support_level "1";
18 own CDS 57937660 57937771 . + 2 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "22"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57940135 57940221 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "23"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000143642"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57940135 57940221 . + 1 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "23"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57941009 57941144 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "24"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000411210"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57941009 57941144 . + 1 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "24"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57941753 57941820 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "25"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000465019"; exon_version "1"; tag "basic"; transcript_support_level "1";
18 own CDS 57941753 57941820 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "25"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own exon 57943916 57946821 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "26"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; exon_id "ENSMUSE00000350837"; exon_version "2"; tag "basic"; transcript_support_level "1";
18 own CDS 57943916 57944048 . + 1 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "26"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; protein_id "ENSMUSP00000111023"; protein_version "2"; tag "basic"; transcript_support_level "1";
18 own stop_codon 57944049 57944051 . + 0 gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; exon_number "26"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; tag "basic"; transcript_support_level "1";
18 own five_prime_utr 57878678 57878806 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; tag "basic"; transcript_support_level "1";
18 own three_prime_utr 57944052 57946821 . + . gene_id "ENSMUSG00000024597"; gene_version "10"; transcript_id "Slc12a2b "; transcript_version "2"; gene_name "Slc12a2"; gene_source "own"; gene_biotype "protein_coding"; transcript_name "Slc12a2-202"; transcript_source "own"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS37826"; tag "basic"; transcript_support_level "1";
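
One detail visible in the lines as posted: the added rows carry `transcript_id "Slc12a2b "`, with a space before the closing quote, whereas the Ensembl row has `transcript_id "ENSMUST00000115366"` with none. The thread does not establish whether this is the cause of the warning, and the space may also be an artifact of pasting. A minimal sketch for checking a GTF for ID values with stray leading or trailing whitespace is given below; the helper name `find_padded_ids` is hypothetical, and the sample rows are shortened versions of the lines above.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

def find_padded_ids(gtf_lines, keys=("gene_id", "transcript_id")):
    """Return (line_number, key, value) for ID values with surrounding whitespace."""
    hits = []
    for number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#"):  # skip GTF header comments
            continue
        for key, value in ATTR_RE.findall(line):
            if key in keys and value != value.strip():
                hits.append((number, key, value))
    return hits

sample = [
    '18 ensembl three_prime_utr 57944052 57946821 . + . '
    'gene_id "ENSMUSG00000024597"; transcript_id "ENSMUST00000115366";',
    '18 own exon 57878678 57879544 . + . '
    'gene_id "ENSMUSG00000024597"; transcript_id "Slc12a2b "; exon_number "1";',
]

for number, key, value in find_padded_ids(sample):
    # repr() makes the otherwise invisible whitespace visible.
    print(f"line {number}: {key} = {value!r}")
```

To run this on a real file, the lines can be read with `open(path)` and passed in directly, since the check does not depend on whether columns are separated by tabs or spaces.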

kallet

Jul 30, 2020, 11:07:11 AM
to RSEM Users
Well apparently it didn't fit!

Judging by grep, the new isoform has been placed in the reference data as expected, but it is not being used for estimating expression:

grep 'Slc' transcriptInfo.tab -A1 -B1

ENSMUST00000115366 1169893525 1169961668 1169745912 1 54 278221 47020
Slc12a2b 1169893525 1169961668 1169961668 1 26 278275 47020
ENSMUST00000025497 1170023470 1170224773 1169961668 2 65 278301 47021
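
The grep output shows that the isoform is listed, but the warning is about a name not being recognized, so it may help to compare the names held in two different reference files character by character. The sketch below is an assumption-laden illustration: the thread does not say which files to compare, the helper names are hypothetical, and the FASTA headers in the example are made up. It reports names unique to each list and pairs that differ only by surrounding whitespace.

```python
def compare_names(names_a, names_b):
    """Return names only in a, names only in b, and whitespace-only near matches."""
    set_a, set_b = set(names_a), set(names_b)
    only_a = sorted(set_a - set_b)
    only_b = sorted(set_b - set_a)
    # Index the unmatched names of b by their whitespace-stripped form.
    stripped_b = {}
    for name in only_b:
        stripped_b.setdefault(name.strip(), []).append(name)
    near = [(a, b) for a in only_a for b in stripped_b.get(a.strip(), [])]
    return only_a, only_b, near

def first_column(lines):
    """First whitespace-separated field of each non-empty line."""
    return [line.split()[0] for line in lines if line.strip()]

def fasta_names(lines):
    """Full header text after '>' for each FASTA record, whitespace preserved."""
    return [line[1:].rstrip("\n") for line in lines if line.startswith(">")]

info_lines = [
    "ENSMUST00000115366 1169893525 1169961668 1169745912 1 54 278221 47020",
    "Slc12a2b 1169893525 1169961668 1169961668 1 26 278275 47020",
]
# Hypothetical FASTA headers, invented for illustration only.
fasta_lines = [">ENSMUST00000115366\n", ">Slc12a2b \n"]

only_a, only_b, near = compare_names(first_column(info_lines),
                                     fasta_names(fasta_lines))
for a, b in near:
    print(repr(a), "vs", repr(b))
```

Printing with `repr` is deliberate: two names that look identical on screen can still differ by a trailing space or another invisible character.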

Peng

Jul 30, 2020, 5:03:38 PM
to RSEM Users
Hi Kalle,

The warning message you received (as from your first post) is:


'RSEM can not recognize reference sequence name Slc12b2!'

But the transcript in your GTF lines is named 'Slc12a2b' (as in your later posts), which differs from the 'Slc12b2' in the warning message. Would this help diagnose the issue?

Best,
Peng

kallet

Jul 31, 2020, 5:57:14 AM
to RSEM Users
Hi,
Unfortunately not; I had mistyped the name in my original post. I'll edit that.
bw, Kalle

kallet

Jul 31, 2020, 5:57:47 AM
to RSEM Users


On Wednesday, July 29, 2020 at 14:34:52 UTC+3, kallet wrote:
Hi all RSEM users,

I'm applying RSEM to quantify transcript level expression with STAR aligner in mouse, and it works great with the GTF file for the GRCm38 genome I downloaded from Ensembl FTP site.

However, one isoform is missing from the mouse annotation. We see it in our data and it exists in human, so I'd like to include it in the GTF. It should be fairly simple, as it only skips one exon, but adding the isoform as new GTF lines doesn't seem to work properly.

The rsem-prepare-reference step with option --star goes through and finishes as expected, but when doing rsem-calculate-expression, it stalls and I get a warning 'RSEM can not recognize reference sequence name Slc12a2!'
which is the new alternative isoform name I included in the GTF, right after the original one.

I've thought of two possibilities. Either rsem-prepare-reference has skipped this isoform because I've added it incorrectly, although the lines seem correct to me: I believe RSEM should only care about the start, CDS and stop lines in the GTF, together with the gene ID and isoform ID on those lines, though I've included all lines for the extra isoform. Or it matters whether the isoform really exists in the Ensembl database, though I suppose RSEM doesn't care about that? Any suggestions for getting around this are warmly welcome!

bw, Kalle, Univ Helsinki

