Two questions on bamCoverage and computeMatrix for clean normalisation


Jean-Philippe Villemin

Jan 31, 2018, 4:37:10 AM
to deepTools

I have been testing deepTools for a few days and was wondering whether you have any advice on the normalisation step.


For example, say you have done ChIP-seq for one histone mark.

You have 2 replicates and one input.

All are in BAM format (from a bwa alignment, for example).

In the end you want a single "normalised" wiggle file, in RPKM or RPM, to display in a viewer or to plot the signal profile around several coordinates.


This is what I am doing (there is a step where I use WiggleTools from Ensembl to average the replicate wiggles for the treatment and subtract the input signal), and I have some questions about computeMatrix.


1/ For computeMatrix, what are the defaults for binSize and averageTypeBins?

If bamCoverage was run with a binSize of 10, would it be appropriate to use the same binSize in computeMatrix? And isn't the signal already averaged over that bin size by bamCoverage?

Then, when you use plotProfile, it will use the statistic set in averageTypeBins for the binSize set in computeMatrix. But if averageTypeBins or binSize is not set, what is used by default?


2/ There is no need to preprocess the BAM files with Picard tools to mark duplicates beforehand, since deepTools handles that by itself, right?
Just to be sure, from the documentation (--ignoreDuplicates):
If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate’s position also has to coincide to ignore a read.
So deepTools checks this by itself and no pre-treatment is needed, right?


Thanks


echo "Create bigwig rep1"
bamCoverage --bam "${rep1BamPath}" --normalizeUsingRPKM --outFileName "${OUT}${name}.rep1.new.bw" --ignoreDuplicates --outFileFormat bigwig --binSize 10 --extendReads "${fragSize}" --minMappingQuality 20


echo "Create bigwig rep2"
bamCoverage --bam "${rep2BamPath}" --normalizeUsingRPKM --outFileName "${OUT}${name}.rep2.new.bw" --ignoreDuplicates --outFileFormat bigwig --binSize 10 --extendReads "${fragSize}" --minMappingQuality 20


echo "Create bigwig control"
bamCoverage --bam "${controlBamPath}" --normalizeUsingRPKM --outFileName "${OUT}${name}.control.new.bw" --ignoreDuplicates --outFileFormat bigwig --binSize 10 --extendReads "${fragSize}" --minMappingQuality 20


echo "wiggletools: mean of replicates, then subtract control"
wiggletools write "${OUT}${name}.mean.normalised.wig" diff mean "${OUT}${name}.rep1.new.bw" "${OUT}${name}.rep2.new.bw" : "${OUT}${name}.control.new.bw"


echo "wigToBigWig"
wigToBigWig -clip "${OUT}${name}.mean.normalised.wig" "${chromLengthIndex}" "${OUT}${name}.mean.normalised.bw"


echo "computeMatrix"
computeMatrix reference-point --regionsFileName "${joinedPathToBed}" --scoreFileName "${OUT}${name}.mean.normalised.bw" --outFileName "${OUT}${name}.mean.normalised.gz" -a "${winInExon}" -b "${winInIntron}"

# --binSize (binSize was already set to 10 in bamCoverage, so no need to set it again?)
# --averageTypeBins (what is used by default?)

echo "plotProfile"
plotProfile -m "${OUT}${name}.mean.normalised.gz" -out "${OUT}${name}.mean.normalised.png" --plotTitle "${name}"

Devon Ryan

Jan 31, 2018, 7:54:12 AM
to Jean-Philippe Villemin, deepTools
1. The default bin size is 10, the default average type is "mean". The
bins used in computeMatrix will often only partially overlap those in
the underlying bigWig files. The computeMatrix bins generally affect
the resolution of the resulting image whereas those in the bigWig
files more directly affect the underlying data. It doesn't make sense
to use a bin size of 1 in computeMatrix when you've used a bin size of
50 in bamCoverage/bamCompare, of course. The defaults (50 for bigWig
files and 10 in computeMatrix) represent a convenient trade off, where
space is saved in the bigWig files but you can still get some decent
resolution in the resulting images. A bin averaging type of "mean" is
the default for both.

2. Right, no need to preprocess.
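
To make the partial-overlap remark in point 1 a bit more concrete, here is a rough sketch of the arithmetic only. It is not deepTools' own code: it assumes a bedGraph-like signal on a single chromosome, and the bin_mean helper and the two example bins are made up for illustration. It computes a per-base mean over a window, so a window that straddles two bigWig bins ends up with a length-weighted blend of their values.

```shell
#!/bin/sh
# Conceptual sketch (hypothetical helper, not part of deepTools):
# per-base mean of a bedGraph-like signal over one window [start, end).
# stdin lines: start end value (0-based, half-open, one chromosome assumed).
bin_mean() {
    awk -v s="$1" -v e="$2" '
        BEGIN { s += 0; e += 0 }            # force numeric comparison
        {
            lo = ($1 > s) ? $1 : s          # start of the overlap
            hi = ($2 < e) ? $2 : e          # end of the overlap
            if (hi > lo) { sum += (hi - lo) * $3; covered += hi - lo }
        }
        END {
            if (covered > 0) printf "%.2f\n", sum / covered
            else print "nan"
        }
    '
}

# Two made-up 50 bp bigWig bins with values 2 and 4.
signal='0 50 2
50 100 4'

printf '%s\n' "$signal" | bin_mean 40 50   # window inside the first bin -> 2.00
printf '%s\n' "$signal" | bin_mean 45 55   # window straddling both bins -> 3.00
```

With 10 bp windows on top of 50 bp bigWig bins, most windows simply repeat the underlying bin value and only the ones crossing a bin boundary are blended, which is why a finer computeMatrix bin mostly changes the resolution of the plot rather than the data. If you want the defaults to be visible in your script, passing --binSize 10 --averageTypeBins mean to computeMatrix spells them out.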

Devon
--
Devon Ryan, Ph.D.
Email: dpr...@dpryan.com
Data Manager/Bioinformatician
Max Planck Institute of Immunobiology and Epigenetics
Stübeweg 51
79108 Freiburg
Germany