boost::math::digamma<long double>(long double): Evaluation of function at pole -nan Exception

Daniel Gaston

May 30, 2016, 9:36:59 AM

to Sailfish Users Group

I am running Salmon on Ion Torrent data from a collaborator. With bootstrapping and VBOpt enabled, Salmon ran successfully on every sample except one. The command line I am using is:

salmon quant -i transcripts.salmon.index -l SF -p 24 --useVBOpt --numBootstraps 100 --biasCorrect --useFSPD -r sample.fastq -o quants/sample_quant/

The output I am getting is:

Version Info: This is the most recent version of Salmon.
# salmon (mapping-based) v0.6.0
# [ program ] => salmon
# [ command ] => quant
# [ index ] => { ../../../human_kshv_2.1_transcripts.salmon.index }
# [ libType ] => { SF }
# [ threads ] => { 24 }
# [ useVBOpt ] => { }
# [ numBootstraps ] => { 100 }
# [ biasCorrect ] => { }
# [ useFSPD ] => { }
# [ unmatedReads ] => { ../../11_TT.bam.fastq }
# [ output ] => { quants/11_TT_quant/ }
Logs will be written to quants/11_TT_quant/logs
there is 1 lib
[2016-05-30 04:07:38.946] [jointLog] [info] parsing read library format
Loading 32-bit quasi index[2016-05-30 04:07:39.742] [stderrLog] [info] Loading Suffix Array
[2016-05-30 04:07:39.742] [stderrLog] [info] Loading Position Hash
[2016-05-30 04:07:39.740] [jointLog] [info] Loading Quasi index
[2016-05-30 04:07:44.900] [stderrLog] [info] Loading Transcript Info
[2016-05-30 04:07:46.190] [stderrLog] [info] Loading Rank-Select Bit Array
[2016-05-30 04:07:46.547] [stderrLog] [info] There were 180353 set bits in the bit array
[2016-05-30 04:07:46.584] [stderrLog] [info] Computing transcript lengths
[2016-05-30 04:07:46.584] [stderrLog] [info] Waiting to finish loading hash
Index contained 180353 targets
[2016-05-30 04:08:02.653] [jointLog] [info] done
[2016-05-30 04:08:02.653] [stderrLog] [info] Done loading index

processed 56000000 fragments
hits: 106673811; hits per frag: 1.90754

[2016-05-30 04:09:59.659] [jointLog] [info] Computed 221711 rich equivalence classes for further processing
[2016-05-30 04:09:59.659] [jointLog] [info] Counted 31318356 total reads in the equivalence classes
[2016-05-30 04:09:59.774] [jointLog] [info] Mapping rate = 55.5334%

[2016-05-30 04:09:59.774] [jointLog] [info] finished quantifyLibrary()
[2016-05-30 04:09:59.774] [jointLog] [info] Starting optimizer
[2016-05-30 04:10:00.014] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2016-05-30 04:10:00.025] [jointLog] [info] iteration = 0 | max rel diff. = 312
[2016-05-30 04:10:00.422] [jointLog] [info] iteration 50, recomputing effective lengths
Exception : [Error in function boost::math::digamma<long double>(long double): Evaluation of function at pole -nan]
/mnt/shared-data/Software/SalmonBeta-0.6.1_DebianSqueeze/bin/salmon quant was invoked improperly.
For usage information, try /mnt/shared-data/Software/SalmonBeta-0.6.1_DebianSqueeze/bin/salmon quant --help

I haven't used Salmon extensively; I am just switching over to it from my Bowtie+Cufflinks pipeline. Seeing the -nan in the error message, I would intuitively suspect some sort of underflow or overflow issue. Any suggestions for investigating or fixing this problem?

Rob

May 30, 2016, 10:02:14 AM

to Sailfish Users Group

Hi Daniel,

Welcome to the user group. I'm sorry you ran into this issue. Indeed, what you're seeing is a numerical underflow issue. It happens when the digamma function is evaluated on an argument that is too small (this typically happens when a transcript with some ambiguous mass has its fragments reassigned and ends up with a fractional assignment of fragments that is close to, but not quite, 0). There is a fix for this upstream in the nb branch (https://github.com/COMBINE-lab/salmon/tree/nb), though it hasn't been incorporated into a release yet. There are some great new features, nearly finished with testing, that I was hoping would make it into the next release, but perhaps it makes sense to address this issue in a bugfix release ASAP. One issue, however, is that (as you can see) this bug is rather rare, and I don't currently have access to test data sets on which I can validate my fix.

I should note that this problem is also likely to go away if you drop the `--useVBOpt` flag. This is because the standard EM algorithm doesn't work in log space, and doesn't try to evaluate the digamma function on very small arguments at any point.
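As a rough illustration of how a tiny argument can turn into the NaN reported in the exception, here is a minimal Python sketch. It is not Salmon's code: the names `digamma` and `vb_weights` are hypothetical, the series-based digamma is a stand-in for `boost::math::digamma`, and the assumption that weights are formed as normalized `exp(digamma(alpha))` values is made only for the purpose of the example.

```python
import math


def digamma(x: float) -> float:
    """Digamma for x > 0 (illustrative stand-in, not boost::math::digamma).

    Uses the recurrence psi(x) = psi(x + 1) - 1/x until x >= 6, then the
    asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    if math.isnan(x) or x <= 0.0:
        # Loosely mirrors the library refusing a NaN or pole argument.
        raise ValueError(f"digamma undefined for argument {x!r}")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # dominates for tiny x: psi(x) is roughly -1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


def vb_weights(alphas):
    """Normalized exp(digamma(alpha)) weights (assumed form, for illustration)."""
    raw = [math.exp(digamma(a)) for a in alphas]
    total = sum(raw)
    if total == 0.0:
        # In C++ floating point, 0.0 / 0.0 silently yields NaN; Python would
        # raise ZeroDivisionError instead, so the NaN is emulated explicitly.
        return [float("nan")] * len(alphas)
    return [r / total for r in raw]


if __name__ == "__main__":
    print(vb_weights([2.0, 3.0]))    # well-behaved weights
    print(vb_weights([1e-6, 1e-8]))  # every weight underflows, result is NaN
```

In this sketch, once every weight in a group underflows to zero, the normalization produces NaN, and feeding that NaN back into `digamma` on a later iteration fails, which resembles the error message in the log above. Whether Salmon reaches the NaN by exactly this route is an assumption.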

Best,

Rob

Daniel Gaston

May 31, 2016, 6:50:13 AM

to Sailfish Users Group

Hi Rob,

Thanks for the quick reply. I used to work in molecular phylogenetics so I'm used to the underflow issue when working in log space. I'll use the EM optimization algorithm on the data for now and re-analyze with VBOpt when a fix is released.