High difference in raw counts between salmon and bowtie2

Guglielmo Puccio

Jan 24, 2019, 9:15:45 AM

to Sailfish Users Group

Hi,

I used salmon for transcript quantification and then fed the quant.sf files to DESeq2, following the DESeq2 guide and using the tximport package. Everything went smoothly, but the number of DEGs obtained looked really low. I tried salmon with many options, such as quasi-mapping, --validateMappings, --rangeFactorizationBins 4, and --gcBias. I even got a warning message about the library type and tried the stranded library format (I don't know whether the reads are stranded or unstranded). All these attempts gave similar raw counts per transcript, with some differences, of course.

Then I decided to go the old way: I mapped the reads with bowtie2 (--very-sensitive) to the same transcriptome I was using with salmon and used samtools idxstats to obtain the raw counts. The result is much higher raw counts and a higher number of DEGs.

The strange thing is that I find lots of transcripts that have 0 or 1 counts with salmon and hundreds with bowtie2. I was wondering whether I was doing something wrong with the salmon command.

I have also tried with raw reads and with trimmed reads (using Trimmomatic), but I didn't find any significant difference.

Here is what I used to launch salmon:

salmon quant -i ../AC_combine/ac_combine_index_quasi/ --validateMappings --gcBias -l A -1 10_001_1P.fastq -2 10_001_2P.fastq -p 2 -o prova_10

and the bowtie2 command:

bowtie2 -p 12 --very-sensitive -x ../bt2_index/index -1 ../trimmed/10_001_1P.fastq -2 ../trimmed/10_001_2P.fastq -S prova_10_mapped.sam

The last thing I tried was using salmon with the BAM file generated by bowtie2. This gave lower counts than the bowtie2/samtools method, but way better than salmon run on the raw reads.

Here I add the raw counts of some transcripts, obtained with salmon and with samtools/bowtie2:
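
For reference, the bowtie2-then-salmon route described here could be assembled roughly as in the sketch below. It only builds and prints the two command lines and runs nothing. The `--no-mixed` and `--no-discordant` flags follow the recommendation given in the reply further down this thread. The salmon alignment-mode invocation (`-t`, `-a`) is an assumption, since the thread does not show the command that was actually used, and `transcripts.fa` and `prova_10_bam` are hypothetical names.

```python
import shlex


def build_commands(index, fq1, fq2, transcripts_fa, sam_out, quant_out, threads=12):
    """Return (bowtie2_cmd, salmon_cmd) as argument lists; nothing is executed."""
    bowtie2_cmd = [
        "bowtie2", "-p", str(threads), "--very-sensitive",
        # Suppress orphan and discordant alignments, as advised later in the thread.
        "--no-mixed", "--no-discordant",
        "-x", index, "-1", fq1, "-2", fq2, "-S", sam_out,
    ]
    salmon_cmd = [
        "salmon", "quant",
        "-t", transcripts_fa,  # transcript FASTA the reads were aligned to
        "-l", "A",             # automatic library type, as in the read-based run
        "-a", sam_out,         # alignment-based mode: quantify from existing alignments
        "-o", quant_out,
    ]
    return bowtie2_cmd, salmon_cmd


bowtie2_cmd, salmon_cmd = build_commands(
    index="../bt2_index/index",
    fq1="../trimmed/10_001_1P.fastq",
    fq2="../trimmed/10_001_2P.fastq",
    transcripts_fa="transcripts.fa",  # hypothetical path, not given in the thread
    sam_out="prova_10_mapped.sam",
    quant_out="prova_10_bam",         # hypothetical output directory
)
print(shlex.join(bowtie2_cmd))
print(shlex.join(salmon_cmd))
```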

transcript  bowtie2    salmon     quasi  quasi_stranded  salmon_bam
Locus_1.1       352     1.000     1.000           1.000     174.337
Locus_1.2       464     0.000     0.000           0.000     234.700
Locus_1.3       607     0.000     0.000           0.000     306.498
Locus_2.2       443    16.781    16.828          16.912     224.651
Locus_2.3       445     0.000     0.000           0.000     226.578
Locus_2.4       407     0.000     0.000           0.000     205.300
Locus_2.5       253    21.760    21.760          21.760     123.791
Locus_2.6        23     7.417     7.417           7.417      11.270
Locus_2.7       430   194.972   194.999         194.936     213.972
Locus_2.8       264   174.525   174.519         174.526     134.232
Locus_2.9       464   144.395   144.384         144.398     231.653
Locus_2.10      465   147.022   147.075         147.002     236.608
Locus_2.11      472  1117.128  1117.019        1117.048     238.188
Locus_3.1         2     0.000     0.000           0.000       1.000
Locus_3.2        38     8.303     8.304           8.303      19.000
Locus_3.3         6     0.000     0.000           0.000       3.999
Locus_3.4        41    17.411    17.411          17.411      20.542
Locus_3.5        35    15.763    15.763          15.763      17.458
Locus_3.6        50    16.522    16.522          16.522      25.001
Locus_3.7        11     0.000     0.000           0.000       5.000
Locus_4.1      1003   820.253   820.254         820.253     505.749
Locus_4.2       680    16.747    16.746          16.747     337.546
Locus_5.1       589     0.000     0.000           0.000     301.228
Locus_5.2       168     0.000     0.000           0.000      85.571
Locus_5.3       646     0.000     0.000           0.000     323.983
Locus_5.4       597     0.000     0.000           0.000     300.976
Locus_5.5       229     0.000     0.000           0.000     118.925
Locus_5.6       170     0.000     0.000           0.000      84.879
Locus_5.7       615     0.000     0.000           0.000     306.200
Locus_5.8       638     0.000     0.000           0.000     321.332

(hope it is understandable)
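
One way to read the table is to compare columns numerically. The sketch below uses only the rows posted above and computes two things: the ratio between the bowtie2/idxstats counts and the salmon_bam counts, and the per-locus sums of the isoform counts for the salmon and salmon_bam columns. Grouping isoforms by the part of the name before the last "." is an assumption about the naming scheme.

```python
from collections import defaultdict
from statistics import median

# Rows copied from the table above:
# transcript, bowtie2, salmon, quasi, quasi_stranded, salmon_bam
TABLE = """\
Locus_1.1 352 1.000 1.000 1.000 174.337
Locus_1.2 464 0.000 0.000 0.000 234.700
Locus_1.3 607 0.000 0.000 0.000 306.498
Locus_2.2 443 16.781 16.828 16.912 224.651
Locus_2.3 445 0.000 0.000 0.000 226.578
Locus_2.4 407 0.000 0.000 0.000 205.300
Locus_2.5 253 21.760 21.760 21.760 123.791
Locus_2.6 23 7.417 7.417 7.417 11.270
Locus_2.7 430 194.972 194.999 194.936 213.972
Locus_2.8 264 174.525 174.519 174.526 134.232
Locus_2.9 464 144.395 144.384 144.398 231.653
Locus_2.10 465 147.022 147.075 147.002 236.608
Locus_2.11 472 1117.128 1117.019 1117.048 238.188
Locus_3.1 2 0.000 0.000 0.000 1.000
Locus_3.2 38 8.303 8.304 8.303 19.000
Locus_3.3 6 0.000 0.000 0.000 3.999
Locus_3.4 41 17.411 17.411 17.411 20.542
Locus_3.5 35 15.763 15.763 15.763 17.458
Locus_3.6 50 16.522 16.522 16.522 25.001
Locus_3.7 11 0.000 0.000 0.000 5.000
Locus_4.1 1003 820.253 820.254 820.253 505.749
Locus_4.2 680 16.747 16.746 16.747 337.546
Locus_5.1 589 0.000 0.000 0.000 301.228
Locus_5.2 168 0.000 0.000 0.000 85.571
Locus_5.3 646 0.000 0.000 0.000 323.983
Locus_5.4 597 0.000 0.000 0.000 300.976
Locus_5.5 229 0.000 0.000 0.000 118.925
Locus_5.6 170 0.000 0.000 0.000 84.879
Locus_5.7 615 0.000 0.000 0.000 306.200
Locus_5.8 638 0.000 0.000 0.000 321.332
"""

rows = []
for line in TABLE.splitlines():
    name, *values = line.split()
    bowtie2, salmon, quasi, quasi_stranded, salmon_bam = map(float, values)
    rows.append((name, bowtie2, salmon, salmon_bam))

# Ratio of idxstats counts to salmon counts derived from the same bowtie2 alignments.
ratios = [bowtie2 / salmon_bam for _, bowtie2, _, salmon_bam in rows]
median_ratio = median(ratios)

# Sum isoform-level counts per locus (assumed: the locus is the name before the last ".").
locus_salmon = defaultdict(float)
locus_salmon_bam = defaultdict(float)
for name, _, salmon, salmon_bam in rows:
    locus = name.rsplit(".", 1)[0]
    locus_salmon[locus] += salmon
    locus_salmon_bam[locus] += salmon_bam

print(f"median bowtie2/salmon_bam ratio: {median_ratio:.2f}")
for locus in locus_salmon:
    print(f"{locus}: salmon={locus_salmon[locus]:.1f} salmon_bam={locus_salmon_bam[locus]:.1f}")
```

On these rows the ratio is close to 2 almost everywhere. That would be consistent with samtools idxstats counting each mate of a pair as a separate read segment while salmon counts fragments, though nothing in this thread confirms it. The per-locus sums of the two salmon columns are similar for Locus_2 and Locus_4 but remain far apart for Locus_1 and Locus_5. Since the table is only an excerpt of the transcriptome, this is a hint about where to look rather than a conclusion.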

Do you have any idea about what could be going wrong in the salmon method?

thank you

Guglielmo

Rob

Jan 24, 2019, 11:02:38 AM

to Sailfish Users Group

Hi Guglielmo,

Thank you for the detailed report! I have a couple of thoughts, but it would really be easiest to debug this if you were able to provide the fastq files you are using (and the reference against which you are quantifying). First, `idxstats` is certainly not doing the right thing here, as multi-mapping reads are counted multiple times; that is, the sum of counts will be the number of alignments, not the number of reads (https://www.biostars.org/p/281879/). The difference between using salmon's mapping and feeding it the bowtie2 BAM is certainly striking. However, I'd have to dig into things a bit more to see what is actually going on. Would you be able to provide the data somewhere? Also, one more minor note: because of the way that Bowtie2 outputs "orphan" mappings, it becomes very difficult to deal with them properly during quantification (the SAM records are not paired in a meaningful way). So, it's always best to run Bowtie2 with `--no-mixed` and `--no-discordant` before feeding the BAM to salmon (though I doubt that is the cause of any issue here).
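
The point about alignments versus reads can be illustrated with a toy example. The sketch below uses hypothetical read and transcript names and contrasts per-alignment tallies, which is what summing per-reference counts amounts to, with a count in which every read contributes a total weight of one. The equal split among a read's targets is a deliberately naive stand-in and is not the model salmon uses.

```python
from collections import Counter, defaultdict

# Hypothetical alignments as (read_id, transcript) pairs.
# r2 maps to two isoforms and r3 to three; r1 maps uniquely.
alignments = [
    ("r1", "tA"),
    ("r2", "tA"), ("r2", "tB"),
    ("r3", "tA"), ("r3", "tB"), ("r3", "tC"),
]


def count_alignments(alns):
    """Tally every alignment record against its reference."""
    return Counter(transcript for _, transcript in alns)


def count_reads_equal_split(alns):
    """Give each read a total weight of 1, split equally among its targets."""
    targets = defaultdict(set)
    for read, transcript in alns:
        targets[read].add(transcript)
    counts = defaultdict(float)
    for transcripts in targets.values():
        for transcript in transcripts:
            counts[transcript] += 1.0 / len(transcripts)
    return dict(counts)


per_alignment = count_alignments(alignments)
per_read = count_reads_equal_split(alignments)
n_reads = len({read for read, _ in alignments})

print("reads:", n_reads)
print("sum of per-alignment counts:", sum(per_alignment.values()))
print("sum of per-read counts:", round(sum(per_read.values()), 6))
```

Here the per-alignment total exceeds the number of reads as soon as any read has more than one reported alignment, whereas the per-read total always equals the number of reads.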

Best,

Rob

Guglielmo Puccio

Jan 24, 2019, 12:02:47 PM

to Sailfish Users Group

Hi Rob,

Thank you for your quick response and for the precious tips.

Unfortunately, I have left the office and will be out of town for the weekend.

I will surely send you the files on Monday.

Thank you again.

Best,

Guglielmo
