Error in while (abs(logLik[iter] - logLik[iter - 1]) > eps & iter < 10) { : missing value where TRUE/FALSE needed

eddieasalinas

unread,

May 30, 2012, 11:40:16 AM5/30/12

to mosaics_u...@googlegroups.com

Hi Dongjun,

I am testing with a two-sample run, but am getting a different error this time
(as seen highlighted below and in the subject line of this post)..
can you please provide any ideas or clues as to what's going on
and/or why the analysis isn't running to completion?

-Eddie

[salinasea2@xxxxxxxxx]/jobbed0>cat rlog.txt
Loading required package: methods
Loading required package: Rcpp
[1] "Before mosaicsRunAll..."
Use 'parallel' package for parallel computing.
Info: constructing bin-level files...
------------------------------------------------------------
Info: setting summary
------------------------------------------------------------
Name of aligned read file: /jobbed0/Sample_GH490_sorted.eland
Aligned read file format: Eland result
Directory of processed bin-level files: /jobbed0
Construct bin-level files by chromosome? N
------------------------------------------------------------
Info: setting summary
------------------------------------------------------------
Name of aligned read file: /jobbed0/Sample_GH490_sorted.control.eland
Aligned read file format: Eland result
Directory of processed bin-level files: /jobbed0
Construct bin-level files by chromosome? N
List of chromosomes to be excluded?
Fragment length: 35
Bin size: 35
------------------------------------------------------------
List of chromosomes to be excluded?
Fragment length: 35
Bin size: 35
------------------------------------------------------------
Info: reading the aligned read file and processing it into bin-level files...
Info: reading the aligned read file and processing it into bin-level files...
Info: done!
------------------------------------------------------------
Info: processing summary
------------------------------------------------------------
Directory of processed bin-level files: /jobbed0
Processed bin-level file: /jobbed0/Sample_GH490_sorted.eland_fragL35_bin35.txt
------------------------------------------------------------
Info: done!
------------------------------------------------------------
Info: processing summary
------------------------------------------------------------
Directory of processed bin-level files: /jobbed0
Processed bin-level file: /jobbed0/Sample_GH490_sorted.control.eland_fragL35_bin35.txt
------------------------------------------------------------
Info: analyzing bin-level files...
Info: fitting MOSAiCS model & call peaks...
Use 'multicore' package for parallel computing.
Info: reading and preprocessing bin-level data...
Info: data contains more than one chromosome.
Info: done!

Use 'parallel' package for parallel computing.
Info: background estimation method is determined based on data.
Info: background estimation based on bins with low tag counts.
Info: two-sample analysis (Input only).
Info: use adaptive griding.
Info: fitting background model...
Info: done!
Info: fitting one-signal-component model...
Error in while (abs(logLik[iter] - logLik[iter - 1]) > eps & iter < 10) { :
missing value where TRUE/FALSE needed
Calls: mosaicsRunAll ... mosaicsFit -> .local -> .mosaicsFit_IO -> .mosaicsZ1_1S
Execution halted
[salinasea2@bxxxxxxx]/jobbed0>

eddieasalinas

unread,

May 30, 2012, 12:57:33 PM5/30/12

to mosaics_u...@googlegroups.com

If it would help, here is the Rscript code:

library('mosaics')
library('parallel')
print("Before mosaicsRunAll...")
mosaicsRunAll(
        chipFile="/jobbed0/Sample_GH490_sorted.eland",
        chipFileFormat="eland_result",
        controlFile="/jobbed0/Sample_GH490_sorted.control.eland",
        controlFileFormat="eland_result",
        binfileDir="/jobbed0",
        peakFile="/jobbed0/peaks.bed",
        peakFileFormat="bed",
        reportSummary=TRUE,
        summaryFile=/jobbed0/summary.txt",
        reportExploratory=TRUE,
        exploratoryFile="/jobbed0/xploratory.pdf",
        reportGOF=TRUE,
        gofFile="/jobbed0/gof.pdf",
        fragLen=35,
        binSize=35,
        capping=0,
        bgEst="automatic",
        parallel=TRUE,
        nCore=12
        )
print("After mosaicsRunAll...")
save.image()
print("Saved image....exiting!")

mosaics.TS.R (END)
(note that path names have had their bases removed up to "/jobbed0")

Dongjun Chung

unread,

May 30, 2012, 5:10:19 PM5/30/12

to mosaics_u...@googlegroups.com

Hi Eddie,

In your output, I found that you set fragLen and binSize as 35. I guess that 35 bp might be read length. fragLen actually means the fragment length and in many cases, it is usually around 200 bp. I also recommend to set the bin size equal to the fragment length, e.g., 200 bp.

Could you please try larger values, such as 200, for fragLen and binSize and see whether it works?

Thanks,
Dongjun

sara.a...@gmail.com

unread,

Dec 10, 2013, 10:06:44 PM12/10/13

to mosaics_u...@googlegroups.com

Hi Dongjun,

I am getting the same error described here when analyzing ENCODE TAF1 ChIP-seq data for human H1 cells. I've successfully processed several other ENCODE data sets in MOSAiCS with the exact same scripts, starting from the .bam files on the ENCODE website, all aligned to the hg19 genome. All of the data have reads with 36 bp lengths, and I've constructed bins with a 200 bp fragment length and bin size. I've tested the actual fragment lengths of the data, which are generally between 100 and 125 bp. I'd be concerned about that, but the fact is that other data sets showed the same fragment length and I was able to apply MOSAiCS to that data using 200 bp fragment lengths and bins.

Loading required package: methods

Loading required package: Rcpp

Info: reading and preprocessing bin-level data...

Info: assume that data contains more than one chromosome

Info: done!

------------------------------------------------------------

Info: preprocessing summary

------------------------------------------------------------

[Note] Bins with ambiguous sequences will be excluded from the analysis.

Coordinates before & after preprocessing:

chr1: 0 - 249240400 -> 9800 - 249240400

chr10: 0 - 135524800 -> 59800 - 135524200

chr11: 0 - 134946400 -> 59800 - 134946000

chr12: 0 - 133841400 -> 59800 - 133841400

chr13: 0 - 115109800 -> 19019800 - 115109200

chr14: 0 - 107289400 -> 18999800 - 107289400

chr15: 0 - 102521200 -> 19999800 - 102520600

chr16: 0 - 90294800 -> 59800 - 90293200

chr17: 0 - 81195000 -> 0 - 81194800

chr18: 0 - 78017000 -> 9800 - 78017000

chr19: 0 - 59119000 -> 59800 - 59118600

chr2: 0 - 243185000 -> 9800 - 243184600

chr20: 0 - 62965000 -> 59800 - 62965000

chr3: 0 - 197959600 -> 59800 - 197900400

chr4: 0 - 191044400 -> 9800 - 191044400

chr5: 0 - 180903200 -> 9800 - 180787600

chr6: 0 - 171051000 -> 59800 - 171047600

chr7: 0 - 159128600 -> 9800 - 159128600

chr8: 0 - 146303400 -> 9800 - 146301400

chr9: 0 - 141136200 -> 9800 - 141112000

chrX: 0 - 155260400 -> 59800 - 155259800

------------------------------------------------------------

Info: background estimation method is determined based on data.

Info: background estimation based on bins with low tag counts.

Info: two-sample analysis (with mappability & GC content).

Info: use adaptive griding.

Info: fitting background model...

Info: grid = 0.01

Info: grid = 0.02

Info: grid = 0.04

Info: grid = 0.1

Info: grid = 0.2

Info: grid = 0.5

Info: done!

Info: fitting one-signal-component model...

Error in while (abs(logLik[iter] - logLik[iter - 1]) > eps & iter < 10) { :

missing value where TRUE/FALSE needed

Calls: mosaicsFit ... mosaicsFit -> .local -> .mosaicsFit_TS -> .mosaicsZ1_1S

Execution halted

Of the 12 data sets this data for TAF1 is the only data set that fails in this way, i.e. at all. Can you give me any suggestions for other issues that can cause this error, other than trying a different fragment length? The problem is eluding my attempts to resolve this.

Thanks so much!

Best,

Sara

Dongjun Chung

unread,

Dec 16, 2013, 12:14:08 AM12/16/13

to mosaics_u...@googlegroups.com

Hi Sara,

Sorry about the inconvenience due to this problem.

You got the errors while MOSAiCS tried to fit the signal component
and using slightly different background component could be helpful.

First, please try different background estimation approach.

In your case, MOSAiCS automatically chose bgEst="matchLow" based on the data.

Instead, you can explicitly specify

> fit <- mosaicsFit( ..., bgEst="rMOM", ... )

If this still does not work, you can try different analysis type.
You are currently using analysisType="TS" and you can instead try

> fit <- mosaicsFit( ..., analysisType="IO", ... )

In addition, when you use analysisType="IO",
using smaller values for truncProb argument (default = 0.999) could help, e.g.,

> fit <- mosaicsFit( ..., analysisType="IO", truncProb = 0.8, ... )

Please try these and let me know whether they work for you.

Best,
Dongjun

--
You received this message because you are subscribed to the Google Groups "MOSAiCS User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mosaics_user_gr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

sara.a...@gmail.com

unread,

Dec 20, 2013, 10:08:08 AM12/20/13

to mosaics_u...@googlegroups.com

Hi Dongjun,

I have successfully resolved problems I had with some fits to data using the IO procedure, and using the truncProb=0.8 setting. I have a followup question. Is there a principled way to decide a fit is sufficiently good to call the analysis a success? I can produce fits, and GOF plots that look reasonable, but there can still be places where the full MOSAICS model does not meet perfectly with the actual data distribution. Is there a principled way to say a fit is sufficiently good based on the BIC score? So far I feel uncomfortable with making such determinations.

Also, I am concerned about using the capping variable in the constructBins function. In general it should be set to a small integer number like 1 or 2. For one very deeply sequenced data set in yeast, I looked at the distribution for the number of reads per nucleotide in the data, and set a capping limit because there was a clear distribution out to 75 reads per nucleotide and the rest of the distribution above that value seemed to be associated with a background distribution. Should the caping variable be tuned for data sets, or should capping=1 be applied uniformly, regardless of sequencing depth?

Part of my motivation for asking is that I have other data that I processed with no capping (capping=0), which was perhaps simply a mistake on my part. I'm concerned I need to reprocess that data with capping =1, or even to tune the capping parameter for each data set, before I can be confident with the results.

Thanks for any words of wisdom you can provide. I want to converge to some good results, and to be on solid ground here.

Best,

Sara

Dongjun Chung

unread,

Dec 23, 2013, 4:17:29 PM12/23/13

to mosaics_u...@googlegroups.com

Hi Sara,

analysisType="TS" works differently from analysisType="IO" only for the bins with small input tag counts (default: input tag counts <= 2). If the input data is deeply sequenced, we have only very small number of such bins and this could make problems in model fitting in some cases when analysisType="TS". Moreover, when the input data is deeply sequenced, the input data usually provides good estimates for background & analysisType="IO" usually suffices. Hence, in general, for the deeply sequenced ChIP & input data, we suggest to consider analysisType="IO" first rather than analysisType="TS".

In summary, the recommended model fitting approach is:

1. If input control is not incorporated:
analysisType="OS".

Otherwise,

(1) If input data is NOT deeply sequenced: analysisType="TS".

(2) If input data is deeply sequenced: analysisType="IO".

2. If you have any fitting problems with default tuning parameters,

try

bgEst="rMOM" (for any analysisType) &

truncProb=[some values smaller than 0.999] (for analysisType="IO")

[Note] When analysisType="IO", input count > [truncProb quantile] is capped at [truncProb quantile] in the model fitting. Hence, the larger truncProb value is more desirable.

Regarding the capping, we recommend to choose capping depending on sequencing depth. When a data is NOT deeply sequenced, capping at small number (such as 3) is desirable. However, if a data is deeply sequenced, capping at small number might remove too much information from data. So, at this point, we recommend either (1) not to use capping at all or (2) to consider capping at very high number when the data is deeply sequenced.