collapse value


eddieasalinas

May 25, 2012, 3:04:06 PM
to MOSAiCS User Group
When preprocessing the eland input, what is a good value for
[collapse]?

The sample command (on the page http://www.stat.wisc.edu/~keles/Software/mosaics/)
under the section called "Preprocess Eland output to bin-level files"
uses 3 as the collapse value.

Why was 3 chosen for the value?

Is 3 a sort of threshold collapse value for declaration of a peak?

-Eddie

Dongjun Chung

May 26, 2012, 6:22:35 PM
to MOSAiCS User Group
Hi Eddie,

You no longer need the perl script to process eland files.
In more recent versions of the mosaics package, you can process various
types of aligned read files much more easily using the constructBins()
function. The "collapse" parameter in the perl script corresponds to
the "capping" parameter in the constructBins() function. Please check the
vignette of a recent version of the mosaics package, e.g.,

http://www.bioconductor.org/packages/2.10/bioc/html/mosaics.html

There is no theory or rigorous rule for choosing the "capping" value, and
it is not related to the peak calling step either. Capping is used
only in the preprocessing step, and its purpose is to remove possible PCR
amplification artifacts.

capping = 3 means that at most 3 reads are allowed to map to
exactly the same nucleotide position on the genome. capping = 3 is
appropriate for data with low sequencing depth because we usually
observe only one read at each nucleotide position. However, for data
with high sequencing depth (e.g., HiSeq), a small
capping value could instead introduce bias, and in this case we
recommend not using capping at all. You can turn capping off by
setting it to a non-positive value, e.g., capping = 0.
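
To make the effect of capping concrete, here is a minimal base-R sketch. It is only an illustration: `cap_reads` is a hypothetical helper, not a function of the mosaics package, and it ignores strand, which the real preprocessing may treat differently.

```r
# Illustration only: mimic a cap on the number of reads per position.
# `cap_reads` is a hypothetical helper, not part of the mosaics package.
cap_reads <- function(positions, capping = 3) {
  tab <- table(positions)
  counts <- setNames(as.vector(tab), names(tab))  # reads per start position
  if (capping > 0) {
    counts <- pmin(counts, capping)  # keep at most `capping` reads per position
  }
  counts
}

# Five reads pile up at position 100, one at 250, two at 400.
starts <- c(100, 100, 100, 100, 100, 250, 400, 400)
capped   <- cap_reads(starts, capping = 3)  # position 100 is reduced to 3 reads
uncapped <- cap_reads(starts, capping = 0)  # non-positive value: no capping
print(capped)
print(uncapped)
```

With deep sequencing, many positions can legitimately carry more than 3 reads, which is why a small cap may throw away real signal.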

Thanks,
Dongjun


eddieasalinas

May 29, 2012, 11:11:59 AM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Thank you!

I have learned to pay attention to which R vignette comes up in Google searches; usually it is a version that is *not* the most recent, which didn't help me. Thank you for making sure that I use the most recent version.

I have used the function you mentioned to construct bins.

Then, I used the "readBins" function to read in the bins using the command below:
binData<-readBins(type=c("chip","M","GC","N"),fileName=c("./XXX.eland_fragL35_bin35.txt","./XXX.eland_fragL35_bin35.txt","./XXX.eland_fragL35_bin35.txt","./XXX.eland_fragL35_bin35.txt"))

I thought to use this function as suggested by the vignette.

Then I tried to use the mosaicsFit function for fitting (also per the vignette).
As you can see, I have a one-sample analysis. I am trying to speed things up by using the parallel package and all available cores.

However, I get an error "Error in .rlmFit_OS(parEst = fitParam, mean_thres = meanThres, Y = binData@tagCount,  :
  insufficient # of proper strata! Cannot proceed!"

Do you have an idea what I'm doing wrong?

> myFit<-mosaicsFit(binData, analysisType="OS", bgEst="automatic" ,parallel=TRUE, nCore=32)
Use 'parallel' package for parallel computing.
Info: background estimation method is determined based on data.
Info: background estimation based on bins with low tag counts.
Info: one-sample analysis.
Info: use adaptive griding.
Info: fitting background model...
Info: grid = 0.01
Info: grid = 0.02
Info: grid = 0.04
Info: grid = 0.1
Info: grid = 0.2
Info: grid = 0.5
Error in .rlmFit_OS(parEst = fitParam, mean_thres = meanThres, Y = binData@tagCount,  :
  insufficient # of proper strata! Cannot proceed!
>


-Eddie

Dongjun Chung

May 30, 2012, 5:13:15 PM
to MOSAiCS User Group
Hi Eddie,

I guess that this problem might be related to your choice of fragLen
and binSize.

In your output, I found that you set fragLen and binSize to 35. I
guess that 35 bp might be the read length. fragLen actually means the
fragment length, which in many cases is around 200 bp. I also
recommend setting the bin size equal to the fragment length, e.g., 200
bp.

Could you please try larger values, such as 200, for fragLen and
binSize and let me know whether it works?
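
As a rough sketch of that rerun, assuming the input is an Eland result file: "chip_reads.eland" is a placeholder name, and `expected_bin_file` is a hypothetical helper that only mirrors the output naming pattern visible in this thread (e.g., "XXX.eland_fragL35_bin35.txt"). The mosaics call is guarded so that nothing runs unless the package and the file are present.

```r
# Placeholder input name; replace with your own aligned read file.
chip_file <- "chip_reads.eland"

# Hypothetical helper: naming pattern inferred from the file names in this thread.
expected_bin_file <- function(infile, fragLen, binSize) {
  sprintf("%s_fragL%d_bin%d.txt", basename(infile), fragLen, binSize)
}

if (requireNamespace("mosaics", quietly = TRUE) && file.exists(chip_file)) {
  mosaics::constructBins(
    infile     = chip_file,
    fileFormat = "eland_result",
    byChr      = FALSE,
    fragLen    = 200,  # approximate fragment length, not the read length
    binSize    = 200   # set equal to fragLen, as recommended above
  )
} else {
  message("mosaics or the input file is unavailable; nothing was run.")
}

print(expected_bin_file(chip_file, 200, 200))
```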

Thanks,
Dongjun

eddieasalinas

May 31, 2012, 1:21:00 PM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Thank you for your help.  I have made some progress...

Using a two-sample analysis, I was able to get peak and bed files.  Thank you.

However, for a one-sample analysis, I am still unable to get results.
I am getting the same "insufficient strata" error as before.

This time I am using 146, the proper tag size!

Any thoughts?

See below:

[salinasea2@p66]/analysis/MOSAiCS>R

R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library('parallel')
> library('mosaics')
Loading required package: Rcpp
> constructBins(infile="XXXXX_5_cat.sorted.sam.eland",fileFormat="eland_result",byChr=FALSE,fragLen=146, binSize=146)
------------------------------------------------------------
Info: setting summary
------------------------------------------------------------
Name of aligned read file: XXXXX_5_cat.sorted.sam.eland
Aligned read file format: Eland result 
Directory of processed bin-level files: ./ 
Construct bin-level files by chromosome? N 
List of chromosomes to be excluded?  
Fragment length: 146 
Bin size: 146 
------------------------------------------------------------
Info: reading the aligned read file and processing it into bin-level files...
Info: done!
------------------------------------------------------------
Info: processing summary
------------------------------------------------------------
Directory of processed bin-level files: ./ 
Processed bin-level file: XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt
------------------------------------------------------------
> binData<-readBins(type=c("chip","M","GC","N"),fileName=c("XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt","XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt","XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt","XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt"))
Info: reading and preprocessing bin-level data...
Info: data contains more than one chromosome.
Info: done!

------------------------------------------------------------
Info: preprocessing summary
------------------------------------------------------------
[Note] Bins with ambiguous sequences will be excluded from the analysis.
Coordinates before & after preprocessing:
chr10.fa: 0 - 135524792 -> 0 - 135524792
chr11.fa: 0 - 134946486 -> 0 - 134946486
chr11_gl000202_random.fa: 0 - 39274 -> 0 - 39274
chr12.fa: 0 - 133841266 -> 0 - 133840974
chr13.fa: 0 - 115109466 -> 0 - 115109174
chr14.fa: 0 - 107289268 -> 0 - 107289268
chr15.fa: 0 - 102521054 -> 0 - 102521054
chr16.fa: 0 - 90294722 -> 0 - 90294430
chr17_ctg5_hap1.fa: 0 - 1680606 -> 0 - 1680314
chr17.fa: 0 - 81194980 -> 0 - 81194980
chr17_gl000203_random.fa: 0 - 37376 -> 0 - 37084
chr17_gl000204_random.fa: 0 - 79424 -> 0 - 79424
chr17_gl000205_random.fa: 0 - 174616 -> 0 - 174470
chr17_gl000206_random.fa: 0 - 40588 -> 292 - 40588
chr18.fa: 0 - 78016998 -> 0 - 78016998
chr18_gl000207_random.fa: 0 - 4088 -> 0 - 4088
chr19.fa: 0 - 59118904 -> 0 - 59118904
chr19_gl000208_random.fa: 0 - 92564 -> 0 - 92564
chr19_gl000209_random.fa: 0 - 158702 -> 0 - 158410
chr1.fa: 0 - 249240542 -> 0 - 249240250
chr1_gl000191_random.fa: 0 - 106142 -> 0 - 105996
chr1_gl000192_random.fa: 0 - 547354 -> 292 - 547354
chr20.fa: 0 - 62965420 -> 0 - 62965274
chr21.fa: 0 - 48119848 -> 0 - 48119702
chr21_gl000210_random.fa: 0 - 27448 -> 292 - 27156
chr22.fa: 0 - 51243664 -> 0 - 51243372
chr2.fa: 0 - 243189426 -> 0 - 243189134
chr3.fa: 0 - 197956290 -> 0 - 197955998
chr4_ctg9_hap1.fa: 0 - 590424 -> 0 - 590424
chr4.fa: 0 - 191044212 -> 0 - 191044212
chr4_gl000193_random.fa: 0 - 187464 -> 0 - 187464
chr4_gl000194_random.fa: 0 - 190384 -> 0 - 190384
chr5.fa: 0 - 180904950 -> 0 - 180904804
chr6_apd_hap1.fa: 0 - 4622214 -> 0 - 4621922
chr6_cox_hap2.fa: 0 - 4793910 -> 0 - 4793764
chr6_dbb_hap3.fa: 0 - 4609220 -> 0 - 4609220
chr6.fa: 0 - 171051264 -> 0 - 171051264
chr6_mann_hap4.fa: 0 - 4648932 -> 0 - 4648640
chr6_mcf_hap5.fa: 0 - 4833330 -> 0 - 4833330
chr6_qbl_hap6.fa: 0 - 4611848 -> 0 - 4611556
chr6_ssto_hap7.fa: 0 - 4927354 -> 0 - 4927062
chr7.fa: 0 - 159128320 -> 0 - 159128320
chr7_gl000195_random.fa: 0 - 182938 -> 0 - 182792
chr8.fa: 0 - 146301198 -> 0 - 146301198
chr8_gl000196_random.fa: 0 - 38836 -> 438 - 38544
chr8_gl000197_random.fa: 0 - 37084 -> 0 - 37084
chr9.fa: 0 - 141129586 -> 0 - 141129440
chr9_gl000198_random.fa: 0 - 90082 -> 0 - 90082
chr9_gl000199_random.fa: 0 - 169506 -> 0 - 169360
chr9_gl000200_random.fa: 0 - 186880 -> 0 - 186588
chr9_gl000201_random.fa: 0 - 36062 -> 0 - 36062
chrM.fa: 0 - 16644 -> 0 - 16644
chrUn_gl000211.fa: 0 - 166002 -> 0 - 166002
chrUn_gl000212.fa: 0 - 186296 -> 0 - 186150
chrUn_gl000213.fa: 0 - 164104 -> 146 - 163812
chrUn_gl000214.fa: 0 - 137386 -> 146 - 137240
chrUn_gl000215.fa: 0 - 171550 -> 0 - 170966
chrUn_gl000216.fa: 0 - 172280 -> 0 - 172134
chrUn_gl000217.fa: 0 - 172134 -> 146 - 171988
chrUn_gl000218.fa: 0 - 161038 -> 0 - 161038
chrUn_gl000219.fa: 0 - 179142 -> 0 - 179142
chrUn_gl000220.fa: 0 - 161768 -> 0 - 161768
chrUn_gl000221.fa: 0 - 155198 -> 0 - 155052
chrUn_gl000222.fa: 0 - 186442 -> 146 - 186150
chrUn_gl000223.fa: 0 - 179434 -> 292 - 179142
chrUn_gl000224.fa: 0 - 179726 -> 0 - 179434
chrUn_gl000225.fa: 0 - 210970 -> 0 - 210824
chrUn_gl000226.fa: 0 - 15038 -> 0 - 15038
chrUn_gl000227.fa: 0 - 128334 -> 0 - 128334
chrUn_gl000228.fa: 0 - 129064 -> 0 - 128772
chrUn_gl000229.fa: 0 - 19710 -> 146 - 19710
chrUn_gl000230.fa: 0 - 43508 -> 0 - 43216
chrUn_gl000231.fa: 0 - 27448 -> 0 - 27156
chrUn_gl000232.fa: 0 - 40588 -> 0 - 40588
chrUn_gl000233.fa: 0 - 45844 -> 0 - 45260
chrUn_gl000234.fa: 0 - 40588 -> 0 - 40588
chrUn_gl000235.fa: 0 - 34310 -> 0 - 34310
chrUn_gl000236.fa: 0 - 41318 -> 0 - 41172
chrUn_gl000237.fa: 0 - 45844 -> 146 - 45698
chrUn_gl000238.fa: 0 - 39712 -> 0 - 39420
chrUn_gl000239.fa: 0 - 33726 -> 0 - 33726
chrUn_gl000240.fa: 0 - 41464 -> 0 - 41318
chrUn_gl000241.fa: 0 - 42194 -> 0 - 42194
chrUn_gl000242.fa: 0 - 43216 -> 0 - 43216
chrUn_gl000243.fa: 0 - 42340 -> 0 - 42340
chrUn_gl000244.fa: 0 - 38836 -> 292 - 38690
chrUn_gl000245.fa: 0 - 36062 -> 0 - 36062
chrUn_gl000246.fa: 0 - 37814 -> 0 - 37814
chrUn_gl000247.fa: 0 - 36354 -> 0 - 36354
chrUn_gl000248.fa: 0 - 39712 -> 0 - 39712
chrUn_gl000249.fa: 0 - 38252 -> 0 - 37960
chrX.fa: 0 - 155260050 -> 0 - 155259904
chrY.fa: 0 - 59363016 -> 0 - 59362870
------------------------------------------------------------
>  myFit<-mosaicsFit(binData, analysisType="OS", bgEst="automatic" ) 
Info: background estimation method is determined based on data.
Info: background estimation based on bins with low tag counts.
Info: one-sample analysis.
Info: use adaptive griding.
Info: fitting background model...
Info: grid = 0.01
Info: grid = 0.02
Info: grid = 0.04
Info: grid = 0.1
Info: grid = 0.2
Info: grid = 0.5
Error in .rlmFit_OS(parEst = fitParam, mean_thres = meanThres, Y = binData@tagCount,  : 
  insufficient # of proper strata! Cannot proceed!


-Eddie

Dongjun Chung

Jun 3, 2012, 4:26:03 PM
to mosaics_u...@googlegroups.com
Hi Eddie,

The command lines indicate that you used the same file ("XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt") for ChIP, M, GC, & N, and I think that this causes the problem. I guessed so because of this call:

> binData <- readBins( type=c("chip","M","GC","N"),

fileName=c("XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt",
"XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt",
"XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt",
"XXXXX_5_cat.sorted.sam.eland_fragL146_bin146.txt"))

If this is the case, you need to use proper M, GC, and N files. From the output, I guess that you are working on the human genome. Are you using hg19 as the reference genome? If so, I have preprocessed M, GC, and N files for hg19 with fragment length = bin size = 200 bp here:

http://www.stat.wisc.edu/~chungdon/hg19/

Could you please generate the bin-level file for the ChIP sample using fragment length = bin size = 200 bp and try the one-sample analysis using the proper M, GC, and N files?
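
A minimal sketch of such a call, with one distinct file per type. The ChIP file name is a placeholder, and the M, GC, and N names assume the downloaded hg19 files are called as shown; adjust them to the files you actually have. The mosaics calls are guarded so that nothing runs unless the package and all four files are present.

```r
# Placeholder ChIP name; M, GC, and N names are assumptions about the download.
bin_files <- c(
  chip = "chip_reads.eland_fragL200_bin200.txt",
  M    = "map_fragL200_bin200.txt",
  GC   = "GC_fragL200_bin200.txt",
  N    = "N_fragL200_bin200.txt"
)

# Guard against the mistake discussed above: one file reused for several types.
stopifnot(anyDuplicated(bin_files) == 0)

if (requireNamespace("mosaics", quietly = TRUE) && all(file.exists(bin_files))) {
  binData <- mosaics::readBins(
    type     = names(bin_files),
    fileName = unname(bin_files)
  )
  myFit <- mosaics::mosaicsFit(binData, analysisType = "OS", bgEst = "automatic")
} else {
  message("mosaics or the bin-level files are unavailable; nothing was run.")
}
```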

Thanks,
Dongjun

eddieasalinas

Jun 4, 2012, 7:26:01 AM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Yes, the alignment is/was against hg19.

I used the link that you gave me and downloaded HG19 data.

I tried to use the files downloaded with the CHIP data (for a one-sample analysis).

I first got an error that there were no chromosomes in common.

To address that, I removed ".fa" from the bin files generated by the MOSAICS constructBins function.

After that, I retried the analysis, but now I get a different error, "...differing number of rows."

See the log below....any ideas???

Should I reformat the BIN files from HG19 with the 146-value from the chip data?  Would that make a difference????

[username@xyx]/data/xxxxx/job_single>R

R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library('mosaics')
Loading required package: Rcpp
> binData<-readBins(
+ type=c("chip","M","GC","N"),
+ fileName=c("./Sample_GH489_sorted.eland_fragL146_bin146.nofa.txt",
+ "./map_fragL200_bin200.txt",
+ "./GC_fragL200_bin200.txt",
+ "./N_fragL200_bin200.txt")
+ )
Info: reading and preprocessing bin-level data...
Info: data contains more than one chromosome.
Info: done!

------------------------------------------------------------
Info: preprocessing summary
------------------------------------------------------------
[Note] Bins with ambiguous sequences will be excluded from the analysis.
Coordinates before & after preprocessing:
chr1: 0 - 249239958 -> 715400 - 249236600
chr10: 0 - 135524792 -> 4365400 - 135517200
chr11: 0 - 134946486 -> 4365400 - 134933200
chr12: 0 - 133840244 -> 4365400 - 133838200
chr13: 0 - 115109904 -> Inf - -Inf
chr14: 0 - 107289414 -> Inf - -Inf
chr15: 0 - 102521054 -> Inf - -Inf
chr16: 0 - 90292678 -> 4365400 - 90286400
chr17: 0 - 81193082 -> 0 - 81190600
chr18: 0 - 78016852 -> 715400 - 78007800
chr19: 0 - 59118174 -> 4365400 - 59115400
chr2: 0 - 243184608 -> 715400 - 243177600
chr20: 0 - 62964982 -> 4365400 - 62955200
chr21: 0 - 48119410 -> Inf - -Inf
chr22: 0 - 51239722 -> Inf - -Inf
chr3: 0 - 197956144 -> 4365400 - 197946800
chr4: 0 - 191044212 -> 715400 - 191041000
chr5: 0 - 180900716 -> 715400 - 180894000
chr6: 0 - 171047468 -> 4365400 - 171039000
chr7: 0 - 159128466 -> 715400 - 159125400
chr8: 0 - 146301490 -> 715400 - 146292000
chr9: 0 - 141146522 -> 715400 - 141138200
chrX: 0 - 154929360 -> 4365400 - 154920600
chrY: 0 - 59033056 -> 715400 - 59027800
------------------------------------------------------------
There were 48 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
2: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
3: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
4: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
5: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
6: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
7: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
8: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
9: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
10: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
11: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
12: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
13: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
14: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
15: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
16: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
17: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
18: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
19: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
20: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
21: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
22: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
23: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
24: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
25: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
26: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
27: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
28: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
29: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
30: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
31: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
32: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
33: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
34: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
35: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
36: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
37: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
38: In Y > 0 & M == 0 :
  longer object length is not a multiple of shorter object length
39: In min(out[[i]]$coord) : no non-missing arguments to min; returning Inf
40: In max(out[[i]]$coord) : no non-missing arguments to max; returning -Inf
41: In min(out[[i]]$coord) : no non-missing arguments to min; returning Inf
42: In max(out[[i]]$coord) : no non-missing arguments to max; returning -Inf
43: In min(out[[i]]$coord) : no non-missing arguments to min; returning Inf
44: In max(out[[i]]$coord) : no non-missing arguments to max; returning -Inf
45: In min(out[[i]]$coord) : no non-missing arguments to min; returning Inf
46: In max(out[[i]]$coord) : no non-missing arguments to max; returning -Inf
47: In min(out[[i]]$coord) : no non-missing arguments to min; returning Inf
48: In max(out[[i]]$coord) : no non-missing arguments to max; returning -Inf
> myFit<-mosaicsFit(binData, analysisType="OS", bgEst="automatic" ) 
Info: background estimation method is determined based on data.
Info: background estimation based on bins with low tag counts.
Info: one-sample analysis.
Info: use adaptive griding.
Info: fitting background model...
Info: grid = 0.01
Error in data.frame(Y, as.factor(S1)) : 
  arguments imply differing number of rows: 12629491, 14307455
> 



-Eddie










Dongjun Chung

Jun 4, 2012, 7:17:11 PM
to mosaics_u...@googlegroups.com
Hi Eddie,

As you correctly noted, you need to remove ".fa" from the bin-level ChIP file. This is because the readBins() function compares chromosome IDs in the ChIP, M, GC, and N files exactly as they appear in these files.

You also need to use the same binSize value for all of the ChIP, M, GC, and N files. You are not required to use the same fragLen values, but it is still recommended to set fragLen = binSize.

So please reconstruct bin-level ChIP files with fragLen = binSize = 200 and try the one-sample analysis with the provided M, GC, and N files. If this works and you need M, GC, and N files for fragLen = binSize = 146, I can generate these files for you. However, I expect that the difference in results by using 200 and 146 should be minor.
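
Both checks can be sketched in base R before calling readBins(). This assumes a bin-level file layout that is not confirmed in this thread: tab-delimited, no header, with chromosome in the first column and bin start coordinate in the second. `strip_fa` and `bin_size` are hypothetical helpers, and the small data frames stand in for files that would be read with read.table().

```r
# Assumed layout (not confirmed in this thread): column 1 = chromosome,
# column 2 = bin start coordinate, column 3 = value.
# Real files could be loaded with, e.g.:
#   read.table(path, sep = "\t", header = FALSE, stringsAsFactors = FALSE)

# Hypothetical helper: drop a trailing ".fa" from chromosome names.
strip_fa <- function(bins) {
  bins[[1]] <- sub("\\.fa$", "", bins[[1]])
  bins
}

# Hypothetical helper: smallest positive gap between bin coordinates
# within a chromosome, taken as the bin size.
bin_size <- function(bins) {
  gaps <- unlist(tapply(bins[[2]], bins[[1]], function(x) diff(sort(x))))
  min(gaps[gaps > 0])
}

# Toy stand-ins for a ChIP file (bin size 146) and an M file (bin size 200).
chip <- data.frame(
  chr   = c("chr1.fa", "chr1.fa", "chr1.fa", "chr2.fa", "chr2.fa"),
  coord = c(0, 146, 292, 0, 146),
  count = c(1, 0, 4, 2, 0),
  stringsAsFactors = FALSE
)
m <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  coord = c(0, 200, 400, 0, 200),
  score = c(1, 1, 0.5, 1, 1),
  stringsAsFactors = FALSE
)

chip <- strip_fa(chip)
common_chr <- intersect(unique(chip$chr), unique(m$chr))  # IDs now match
same_bins  <- bin_size(chip) == bin_size(m)               # 146 vs 200
print(common_chr)
print(same_bins)
```

If the bin sizes differ, as they do here, the ChIP file needs to be rebuilt with the bin size of the M, GC, and N files.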

Thanks,
Dongjun

eddieasalinas

Jun 5, 2012, 2:14:04 PM
to mosaics_u...@googlegroups.com
Hi Dongjun,

Great!  I made these changes and I am getting peaks now.

Thank you!

The peak-examiners will let me know if they are okay.

Thank you again, Dongjun!

-Eddie



eddieasalinas

Jun 5, 2012, 3:34:00 PM
to mosaics_u...@googlegroups.com
DongJun,

If you have a chance, can you format the reference data (hg19)
for value=146 (instead of 200)?

Also, if you have a chance, can you show me precisely which files
and which commands you used to perform the formatting, so that I can replicate the
construction for other values if necessary?

Thanks!!!


-Eddie

Dongjun Chung

Jun 21, 2012, 11:29:08 PM
to mosaics_u...@googlegroups.com
Hi Eddie,

Sorry for the late response. I had some urgent things to take care of.

You can now find the files for mappability, GC content, and ambiguity score for hg19 with fragment length = bin size = 146 bp here:

https://www.dropbox.com/sh/rt2s3bv3b9mfto0/wfjRUaXQrl

I also moved the files for mappability, GC content, and ambiguity score for hg19 with fragment length = bin size = 200 bp to this location as well.

Please try mosaics with these files and just let me know if you have any questions.

Best,
Dongjun