Normalization and Heatmaps


ATACseq

Feb 13, 2018, 12:24:42 PM
to deepTools
Hi,
I'm learning deepTools and also exploring ATAC-seq and ChIP-seq data. I have some very basic questions.
1) How do I select an appropriate normalization option in bamCoverage? Are there any plots I can make with deepTools that would help me assess the normalization, smoothing, and binning?
2) How do I plot heatmaps for a specified region, all TSSs, or a list of genes using deepTools?

I'd appreciate any help with this.
Thank you

Devon Ryan

Feb 13, 2018, 1:48:18 PM
to ATACseq, deepTools
1) I like 1x normalization (termed RPGC in deepTools 3.0). I don't
recommend smoothing; accept that your data is a bit noisy. The default
bin size is sufficient for most use cases, though you might decrease the
bin size a bit if you want to look for footprints in your data (good
luck, you need crazy high coverage to see that).

2) computeMatrix and then plotHeatmap.
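
A minimal sketch of the RPGC suggestion in (1), assuming deepTools 3.0 or later. The file names `sample.bam` and `sample.bw` are placeholders, and the effective genome size is simply the value quoted for a human sample later in this thread; substitute the one for your own assembly. The `run` helper is hypothetical: it prints the command and only executes it when `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command, execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Placeholder inputs; adjust to your data.
BAM=sample.bam
BIGWIG=sample.bw
GENOME_SIZE=2913022398   # assumption: value quoted elsewhere in this thread

# 1x normalization as spelled in deepTools >= 3.0. No smoothing is
# requested (no --smoothLength) and the default bin size is kept.
run bamCoverage --bam "$BAM" -o "$BIGWIG" \
    --normalizeUsing RPGC \
    --effectiveGenomeSize "$GENOME_SIZE"
```

On deepTools 2.x the same normalization was requested with `--normalizeTo1x` followed by the genome size.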

Devon
--
Devon Ryan, PhD
Bioinformatician / Data manager
Bioinformatics Core Facility
Max Planck Institute for Immunobiology and Epigenetics
Email: dpry...@gmail.com

ATAC

Feb 14, 2018, 5:44:18 PM
to deepTools
Hi Devon,
I searched the group for this but could not find anything relevant. Sorry if this has been asked previously.
Is there a way to customize the sample labels in computeMatrix?
My raw .bw files have very long names. When I run them through computeMatrix and plotHeatmap, the names overflow and overlap, since by default the labels are taken from the input file names. Is there a way to customize the names at the computeMatrix level?
Also, how can I arrange the plots in a grid, e.g. 3x3 or 4x4? Currently I get a single row of plots. Here is the code I ran:

computeMatrix reference-point \
-R hg19genes_ucsc.bed \
-S *.bw \
-b 1000 -a 1000 \
--binSize 20 \
--sortRegions no \
--referencePoint TSS \
--outFileName Chip_matrix.tab.gz

plotHeatmap \
-m Chip_matrix.tab.gz \
-out test.png \
--colorMap hot_r \
--missingDataColor .4 \
--heatmapHeight 7 \
--plotTitle 'Chip Seq Data' \
--whatToShow 'heatmap and colorbar' \
--sortRegions ascend

Devon Ryan

Feb 15, 2018, 2:30:55 AM
to ATAC, deepTools
In deepTools 3 have a look at the `--smartLabels` option, which will
strip file paths and extensions for you. Otherwise, you'll want the
`--samplesLabel` option in plotHeatmap and/or computeMatrix (I think I
added that option to computeMatrix in version 3, so it may only exist
in plotHeatmap for earlier versions).
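
As a rough illustration of the `--samplesLabel` route, short labels can be derived from the file names in the shell and then passed on the command line. This is only a sketch: the `.bw` names below are made up, and the `sed` patterns that trim them are an assumption about what such long names might look like. The script prints the resulting computeMatrix command rather than running it.

```shell
#!/bin/sh
# Hypothetical long bigWig names standing in for the real files.
FILES="run42_H3K4me3_rep1_sorted_dedup_normalized.bw run42_Input_rep1_sorted_dedup_normalized.bw"

LABELS=""
for f in $FILES; do
    # Drop the directory and .bw extension, then the assumed "run42_"
    # prefix and everything from "_sorted" onwards.
    short=$(basename "$f" .bw | sed -e 's/^run42_//' -e 's/_sorted.*$//')
    LABELS="$LABELS $short"
done
LABELS=${LABELS# }   # remove the leading space

# Print the command; labels must be listed in the same order as the files.
echo "computeMatrix reference-point -R hg19genes_ucsc.bed -S $FILES --samplesLabel $LABELS -b 1000 -a 1000 --referencePoint TSS -o Chip_matrix.tab.gz"
```

The same labels could instead be given to plotHeatmap with `--samplesLabel` if the installed computeMatrix does not accept the option.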

Devon
--
Devon Ryan, Ph.D.
Email: dpr...@dpryan.com
Data Manager/Bioinformatician
Max Planck Institute of Immunobiology and Epigenetics
Stübeweg 51
79108 Freiburg
Germany

sund...@gmail.com

Feb 21, 2018, 4:30:33 PM
to deepTools
Hi Devon,
I run bamCoverage separately on each sample, then, for a given BED file, run computeMatrix on all of them together, followed by plotHeatmap.
I am trying to understand the scores that computeMatrix generates.
What are the scores and how are they calculated?
The .bw files used here were generated with the default bin size in bamCoverage. If I set a new bin size in computeMatrix, what does it affect?
When clustering is applied, is it applied to a single sample (.bw file) or to all the files together? I am not sure whether applying clustering is good practice.

Devon Ryan

Feb 21, 2018, 6:46:39 PM
to deep...@googlegroups.com
By default, the score is the per-bin average of whatever is in the
bigWig file. If that's "read coverage" or "fragment coverage" then
that's the score. If the bigWig files contain log2 fold-changes vs.
input then that's the score.

I assume that your second question pertains to the bin size setting in
computeMatrix. This effectively sets its resolution and, thus, the size
of the file it produces. This is convenient since it allows you to
produce images with multiple data resolutions. You are, of course,
limited by the resolution of the underlying bigWig files, but since in
the profiles (e.g., from plotProfile or the top of the default
plotHeatmap output) you're averaging over a typically huge number of
regions, a relatively coarse resolution in the bigWig files still
allows for good signal resolution in the resulting plots.

Clustering is applied to all samples at once. Relatedly, all samples
will always be identically sorted, so if you make a heatmap you can be
confident that a given row in one sample came from the same region as
the same row in the other samples.
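
To make "per-bin average" concrete, here is a toy calculation in plain shell and awk. It is not deepTools code: six made-up per-base coverage values are averaged in bins of three, which mirrors the kind of summary described above. A larger bin size gives fewer, coarser scores for the same region.

```shell
#!/bin/sh
# Made-up per-base coverage for a 6 bp region (assumption, not real data).
COVERAGE="2 4 6 10 10 40"
BIN_SIZE=3

# Average consecutive groups of BIN_SIZE values.
SCORES=$(echo "$COVERAGE" | awk -v bin="$BIN_SIZE" '{
    out = ""
    for (i = 1; i <= NF; i += bin) {
        sum = 0; n = 0
        for (j = i; j < i + bin && j <= NF; j++) { sum += $j; n++ }
        sep = (out == "") ? "" : " "
        out = out sep (sum / n)
    }
    print out
}')

echo "$SCORES"   # one score per bin
```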

Devon

Mthabisi Moyo

Jun 14, 2018, 2:14:47 AM
to deepTools
Devon,

I have also been trying to figure out how to interpret the score. I made my bigwigs (ChIP-seq from human samples) using bamCoverage:

bamCoverage --bam in.bam -o out.bw --outFileFormat bigwig --binSize 10 --normalizeUsing CPM --effectiveGenomeSize 2913022398 --numberOfProcessors max --ignoreDuplicates

1) Would you see a significant difference in the heatmap if you normalized with CPM instead of 1x normalization (RPGC)? I also just noticed that the effectiveGenomeSize parameter is for RPGC. Does it negatively affect the output if used when normalizing with CPM?

2) Is there a range in the colorbar/score that is correlated with the quality of ChIP-seq i.e. does a score range of between 0-1 (1 being zMax in the colorbar) reflect poorly on the quality of a ChIP vs a score between 10-50?

Thanks,

Mthabisi

Devon Ryan

Jun 14, 2018, 2:52:14 AM
to Mthabisi Moyo, deepTools
All normalization does is change the scale on the Y-axis; it has no
effect otherwise. So if you see an effect with CPM-normalized data,
the curves would look identical if you changed the normalization
(the only difference being the numbers on the Y-axis). The effective
genome size will be ignored unless you specify RPGC, so don't worry
that you messed anything up :)
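
A small numeric sketch of why only the scale changes within one sample. The read counts, library size, and fragment length are made up, and the two scale factors are simplified versions of how CPM and RPGC are usually described for deepTools, so treat the formulas as assumptions rather than the exact implementation. Every bin ends up with the same RPGC/CPM ratio, meaning the two tracks differ only by a constant factor.

```shell
#!/bin/sh
RAW_BINS="5 50 500"        # made-up raw read counts per bin
MAPPED_READS=20000000      # made-up library size
FRAGMENT_LENGTH=200        # made-up fragment length in bp
GENOME_SIZE=2913022398     # effective genome size quoted in this thread

RATIOS=$(echo "$RAW_BINS" | awk -v n="$MAPPED_READS" -v len="$FRAGMENT_LENGTH" -v g="$GENOME_SIZE" '{
    cpm_factor  = 1000000 / n       # assumed: counts per million mapped reads
    rpgc_factor = g / (n * len)     # assumed: scale to 1x average coverage
    out = ""
    for (i = 1; i <= NF; i++) {
        sep = (out == "") ? "" : " "
        out = out sep sprintf("%.3f", ($i * rpgc_factor) / ($i * cpm_factor))
    }
    print out
}')

echo "$RATIOS"   # the same ratio is printed for every bin
```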

BTW, the main point behind the various normalizations is to make
different samples comparable, so if you're only looking at a single
sample then you don't need to worry about this.

I wouldn't try to assess ChIP quality from heatmap score ranges
without a lot of background knowledge on the particular mark that
you're looking at. What we find more useful is to look at the output
from plotFingerprint, particularly if you have an input sample to
compare to.
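
A minimal sketch of such a plotFingerprint call, assuming two indexed BAM files named `chip.bam` and `input.bam` (placeholder names). The guard around the command is an addition for safety: nothing is executed unless deepTools is installed and both files exist.

```shell
#!/bin/sh
# Placeholder file names; replace with your ChIP and input BAMs.
CHIP=chip.bam
INPUT=input.bam

if command -v plotFingerprint >/dev/null 2>&1 && [ -f "$CHIP" ] && [ -f "$INPUT" ]; then
    # One curve per BAM file, labelled in the same order as the files.
    plotFingerprint -b "$CHIP" "$INPUT" --labels ChIP Input -plot fingerprint.png
    STATUS=ran
else
    # Nothing is executed if deepTools or the BAM files are missing.
    STATUS=skipped
fi

echo "plotFingerprint: $STATUS"
```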

Devon