Strange gene quantification


Eugene Bolotin

May 1, 2017, 1:35:50 PM
to Sailfish Users Group
Hi Rob,
I have been using the latest version of Sailfish (0.9.2).
I used GENCODE 26: gencode.v26.annotation.gtf and gencode.v26.pc_transcripts.fa.
I am trying to quantify genes, and I noticed some odd behavior. The transcript quantification seems to be correct, but when I try to summarize at the gene level I run into problems.
See the example below.
This is grep output for the CD19 gene from quant.sf and quant.genes.sf:
The header is: Name    Length  EffectiveLength TPM     NumReads
../counts_ebi/SRR1022945/quant.sf:ENST00000538922.5|ENSG00000177455.12|OTTHUMG00000097049.4|-|CD19-201|CD19|1957|UTR5:1-62|CDS:63-1736|UTR3:1737-1957|                   1957     1830.53  41.1203     423.406
../counts_ebi/SRR1022945/quant.sf:ENST00000324662.7|ENSG00000177455.12|OTTHUMG00000097049.4|OTTHUMT00000214152.2|CD19-001|CD19|1932|UTR5:1-44|CDS:45-1715|UTR3:1716-1932|1932     1805.53  259.113     2631.59
../counts_ebi/SRR1022945/quant.sf:ENST00000567541.5|ENSG00000177455.12|OTTHUMG00000097049.4|OTTHUMT00000432708.2|CD19-004|CD19|1707|UTR5:1-33|CDS:34-1707|               1707     1580.53  0           0

../counts_ebi/SRR1022945/quant.genes.sf:ENSG00000177455.12                                                                                                               1533     1406.53  0           0

Summing substantial transcript-level expression to 0 at the gene level looks suspect.
I ran the command as directed in the manual.
This is the output of cmd_info.json:
{
    "sf_version": "0.9.2",
    "index": "/data/SRA/gencode26/gencode26",
    "libType": "IU",
    "mates1": "/dev/fd/63",
    "mates2": "/dev/fd/62",
    "output": "/data/SRA/counts_ebi/SRR1022945",
    "geneMap": "/data/SRA/gencode26/gencode.v26.annotation.gtf",
    "threads": "12"
}
Thanks,
Eugene

Eugene Bolotin

May 1, 2017, 1:57:10 PM
to Sailfish Users Group
I searched the archive and found a previous post referencing the warning about GTF parsing for a single transcript.
I think I may have the same issue. I agree with you, Rob, that the warning should be much stronger. I'll try using tximport and see what happens.
Eugene

Rob

May 1, 2017, 2:39:54 PM
to Sailfish Users Group
Hi Eugene,

You're very efficient at answering your own questions ;P You've been beating me to the answers. Indeed, this warning should be stronger. Also, there is a bug fix for how aggregation is handled in the case of missing transcript <-> gene mappings (in Salmon v >= 0.8.0) that has not yet been backported to Sailfish. tximport should fix this problem, and you could also check whether Salmon handles this case as you would expect.
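
For anyone who wants to sanity-check a gene-level total by hand, here is a minimal Python sketch. It is not what tximport, Salmon, or Sailfish do internally; it only sums TPM and NumReads per gene straight from quant.sf. It assumes quant.sf is tab-separated with the header shown above, and that the gene ID is the second pipe-delimited field of the transcript name (as in the GENCODE pc_transcripts names quoted in this thread). The helper names `gene_id_from_name` and `sum_by_gene` are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def gene_id_from_name(name):
    """Pull the gene ID out of a GENCODE-style pipe-delimited name.

    Assumption: the second field is the gene ID. A name with no second
    field is returned unchanged, i.e. treated as its own gene.
    """
    fields = name.split("|")
    return fields[1] if len(fields) > 1 and fields[1] else name


def sum_by_gene(quant_sf):
    """Sum TPM and NumReads per gene from an open quant.sf-style file."""
    totals = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    for row in csv.DictReader(quant_sf, delimiter="\t"):
        gene = gene_id_from_name(row["Name"])
        totals[gene]["TPM"] += float(row["TPM"])
        totals[gene]["NumReads"] += float(row["NumReads"])
    return dict(totals)


# The three CD19 rows quoted earlier in the thread.
rows = [
    ["Name", "Length", "EffectiveLength", "TPM", "NumReads"],
    ["ENST00000538922.5|ENSG00000177455.12|OTTHUMG00000097049.4|-|CD19-201|"
     "CD19|1957|UTR5:1-62|CDS:63-1736|UTR3:1737-1957|",
     "1957", "1830.53", "41.1203", "423.406"],
    ["ENST00000324662.7|ENSG00000177455.12|OTTHUMG00000097049.4|"
     "OTTHUMT00000214152.2|CD19-001|CD19|1932|UTR5:1-44|CDS:45-1715|"
     "UTR3:1716-1932|",
     "1932", "1805.53", "259.113", "2631.59"],
    ["ENST00000567541.5|ENSG00000177455.12|OTTHUMG00000097049.4|"
     "OTTHUMT00000432708.2|CD19-004|CD19|1707|UTR5:1-33|CDS:34-1707|",
     "1707", "1580.53", "0", "0"],
]
example = io.StringIO("\n".join("\t".join(r) for r in rows) + "\n")

totals = sum_by_gene(example)
for gene, vals in totals.items():
    print(gene, vals["TPM"], vals["NumReads"])
```

On the three rows above, the totals for ENSG00000177455.12 come to about 300.23 TPM and about 3055 reads, which is what the gene-level line would be expected to show instead of 0. To run it on a real file, pass `open("quant.sf")` to `sum_by_gene` in place of the in-memory example.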

Best,
Rob

Eugene Bolotin

May 1, 2017, 4:34:51 PM
to Sailfish Users Group
Hi Rob,
I did use tximport and it worked as advertised. =)
Thanks,
Eugene