Downsample BAM files before running bamCoverage

Edroaldo Lummertz da Rocha

Jun 28, 2016, 10:54:12 AM

to deepTools

Hi All,

deepTools is great software, thank you! I am using deepTools to analyze ChIP-seq data and I am not having any technical problems, but I have a question about downsampling BAM files before using bamCoverage to compensate for differences in sequencing depth between multiple ChIP-seq libraries. In the supplementary information of this paper: Acquired Tissue-Specific Promoter Bivalency Is a Basis for PRC2 Necessity in Adult Cells, Cell 165 (6), 1389-1400, the authors used the Downsample BAM tool in Galaxy to down-sample the larger libraries to the depth of the smallest library.
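For reference, down-sampling to the depth of the smallest library comes down to keeping a fixed fraction of the reads from each larger library. A minimal sketch in Python, assuming the mapped-read counts are already known; the sample names and counts below are made up for illustration:

```python
# Hypothetical mapped-read counts per library; replace with real values
# (e.g. obtained with `samtools view -c -F 4 sample.bam`).
mapped_reads = {
    "cellA_drug1": 42_000_000,
    "cellA_drug2": 30_000_000,
    "cellB_drug1": 21_000_000,
}


def downsample_fractions(counts):
    """Return the fraction of reads to keep in each library so that
    every library ends up at roughly the depth of the smallest one."""
    target = min(counts.values())
    return {name: target / n for name, n in counts.items()}


fractions = downsample_fractions(mapped_reads)
for name, fraction in fractions.items():
    print(f"{name}\tkeep fraction {fraction:.3f}")
```

Each fraction could then be passed to a subsampling tool of choice, for example as the fractional part of the `-s` option of `samtools view`.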

I have an experiment with 8 ChIP-seq samples for 2 different cell lines and 4 drug treatments (we have the input for each experimental condition). I was visualizing (using plotHeatmap) all experimental conditions in a single heatmap by passing all .bw files to computeMatrix.

Given the down-sampling performed in the paper mentioned above, I was not sure how meaningful the heatmaps I was creating with deepTools (computeMatrix -> plotHeatmap) for all my conditions are. However, since bamCoverage provides a depth-normalized file for each BAM file, I would think that my heatmaps are correct.

I was checking the documentation of computeMatrix and plotHeatmap but I have not found anything saying that these functions perform any additional normalization when taking several .bw files as input.

I am sorry for the long message, but in summary, this is my question: Are the depth-normalized .bw files generated by bamCoverage "ready to go" for visualizing several ChIP-seq experiments together, or should I also consider compensating for differences in sequencing depth before using bamCoverage?

Hope it is clear and looking forward to hearing from you!

Thank you very much!

Friederike Dündar

Jun 28, 2016, 11:28:19 AM

to Edroaldo Lummertz da Rocha, deepTools

Hi,

If you used bamCoverage with the --normalizeTo1x setting, you should be fine. The whole point of the normalizations offered in bamCompare and bamCoverage is to account for differences in sequencing depth.
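To make the idea behind 1x normalization concrete: the coverage is multiplied by a factor chosen so that the average coverage over the effective genome size becomes 1, which removes the dependence on the number of mapped reads. A rough sketch of that arithmetic in Python (this is not the deepTools code itself; the read counts, fragment length, and effective genome size are illustrative assumptions):

```python
def scale_factor_1x(mapped_reads, fragment_length, effective_genome_size):
    """Factor that brings the genome-wide average coverage to 1x."""
    current_coverage = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / current_coverage


EFFECTIVE_GENOME_SIZE = 2_150_570_000  # assumed value; depends on the genome
FRAGMENT_LENGTH = 200                  # assumed average fragment length in bp

# Hypothetical libraries that differ four-fold in sequencing depth.
libraries = {"shallow": 10_000_000, "deep": 40_000_000}

factors = {
    name: scale_factor_1x(n, FRAGMENT_LENGTH, EFFECTIVE_GENOME_SIZE)
    for name, n in libraries.items()
}
for name, factor in factors.items():
    print(f"{name}\tscale factor {factor:.4f}")
```

The deeper library receives a proportionally smaller factor, so both tracks end up on the same 1x scale regardless of how many reads were sequenced.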

You are correctly assuming that computeMatrix does not perform an additional normalization, but again, if you used bamCompare or bamCoverage to generate the bigWig files you're supplying to computeMatrix, you can, in principle, compare the signal strength.

Note, however, that different ChIP-seq experiments may yield very different signal amplitudes for a variety of technical reasons other than sequencing depth (e.g., pull-down efficiency). I would be very cautious about drawing biological conclusions when simply comparing the signal of ChIP A to the signal of ChIP B. What may make more sense is to compare the differences in the signals for different regions, e.g. does ChIP A show similar signals around promoters and at exons while ChIP B tends to be stronger at promoters?
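One way to express this kind of within-sample comparison is as a ratio of mean signals between region sets, computed separately for each ChIP. A small sketch with made-up numbers (the values are purely hypothetical; in practice they would be derived from the signal over each region set):

```python
# Hypothetical mean signal per region set for two ChIP experiments.
mean_signal = {
    "ChIP_A": {"promoters": 8.0, "exons": 7.5},
    "ChIP_B": {"promoters": 3.0, "exons": 1.0},
}


def promoter_to_exon_ratio(signal_by_region):
    """Relative enrichment at promoters versus exons within one sample."""
    return signal_by_region["promoters"] / signal_by_region["exons"]


ratios = {name: promoter_to_exon_ratio(s) for name, s in mean_signal.items()}
for name, ratio in ratios.items():
    print(f"{name}\tpromoter/exon ratio {ratio:.2f}")
```

Here ChIP A has the higher absolute signal everywhere, yet only ChIP B shows a clear preference for promoters. The ratio is unaffected by a global difference in amplitude between the two experiments, which is why it is the safer quantity to compare.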

That being said, you can also use bamCompare to compare different ChIP-signals.

Hope that helps!

Best,

Friederike

Friederike Dündar

Jun 28, 2016, 11:37:18 AM

to Edroaldo Lummertz da Rocha, deepTools

As a follow-up for your specific case: if you want to assess the impact that the drug treatment may have had on the binding of your TF of interest, you should definitely have replicates. If the replicates of the treatment give consistently lower signals than the untreated ones, that may be a hint that the drug did have an effect on the binding of the TF. Yet, you would have to rule out that the drug treatment interfered with your ChIP protocol in general, so if you saw regions where the signals reach similar levels between the two conditions, that'd be another hint that the effect you're seeing may not be a complete artifact, but specific to certain regions. You could also test this by doing ChIP (qPCR) on treated samples for a TF that you don't expect to be affected - this would give you some insights into the variability of the signal.

Sorry if these suggestions seem trivial to you - I just wanted to get across that even the best bioinformatics normalization scheme will not be able to account for all the possible sources of artifacts, particularly not for ChIP-seq.

Best,

Friederike

Edroaldo Lummertz da Rocha

Jun 28, 2016, 12:58:43 PM

to Friederike Dündar, deepTools

Hi Friederike,

Thank you so much for your detailed answer! Each sentence was very valuable to me. I will run bamCoverage again with the option --normalizeTo1x, as I did not explicitly use it before, and compare with my current results. I am plotting the heatmaps with all treatments to get an overall idea of the binding profiles across treatments, but I am definitely planning to use bamCompare for specific comparisons.

It was really helpful! Thank you very much!

--

Edroaldo

kyl...@connect.hku.hk

Oct 13, 2017, 12:32:39 AM

to deepTools

Hi Friederike & list,

Thanks for the detailed explanation of the normalization issue. I've read the posts in the group to get an idea of how to do the normalization step correctly. May I ask for some further clarification here? I hope you can point out any misunderstanding on my part.

Example
Samples: 1. H3K4me3 ChIP, 2. Input for H3K4me3 ChIP, 3. H3K9me3 ChIP, 4. Input for H3K9me3 ChIP
Goal: Compare the K4 and K9 ChIP experiments
.bw generation: bamCompare with --normalizeTo1x to process the samples
Resulting .bw: K4.bw (b1: sample 1, b2: sample 2), K9.bw (b1: sample 3, b2: sample 4)
Normalization achieved: K4.bw and K9.bw are normalized to each other by sequencing depth;
possible differences between input and ChIP sample have been accounted for individually in each pair

--> K4.bw and K9.bw can then be compared to each other, e.g. using plotHeatmap or plotProfile

Is the flow correct?

I've got a question regarding the bamCompare setting in the above example. The description of the --ratio argument says that "Only with --ratio subtract can --normalizeTo1x or --normalizeUsingRPKM be used." In this case, should the setting be "--ratio subtract --normalizeTo1x", which results in a single .bw file (K4.bw or K9.bw as described above), with no need to obtain the log2 .bw (as the aim is not to compare the difference between ChIP and input)?

Thanks very much! Hoping for your reply.

Best Regards,
Kylie

Devon Ryan

Oct 13, 2017, 4:52:06 AM

to kyl...@connect.hku.hk, deepTools

Hi Kylie,

In bamCompare, --normalizeTo1x is only actually used if you use
"--ratio subtract", it's ignored otherwise. I understand that this is
completely unclear and we're trying to clarify how all of this works
for the next release (this requires restructuring some of the
arguments).

If your K4 and K9 samples have different inputs that you're
normalizing against, you'll need to take some care when comparing the
two marks. If your goal is simply to see if there's enrichment of the
signals in similar places then you're fine. However, if you want to
perform quantitative comparisons (e.g., there's more K4 than K9 in a
given spot), then the fact that you've normalized to different inputs
won't allow you to do that. Such comparisons are problematic to begin
with, but would be a bit more so in this setup.

You can use "--ratio subtract" if you prefer, in which case if you
then specify "--normalizeTo1x" then the difference between IP and
input is normalized to 1x. Again, this may or may not be what you
actually want and I understand that the documentation is
fairly...opaque...here, we're hoping to change that in the next
release.
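To illustrate why the choice of operation matters, here is a toy comparison of a subtraction and a log2 ratio for a single bin. This is only a sketch of the arithmetic, not the bamCompare implementation: the read-count scaling (larger library scaled down to the smaller one), the pseudocount of 1, and all numbers are assumptions made for the example.

```python
import math


def read_count_factors(n_chip, n_input):
    """Scale the larger library down to the size of the smaller one."""
    smallest = min(n_chip, n_input)
    return smallest / n_chip, smallest / n_input


def compare_bin(chip, inp, f_chip, f_input, mode, pseudocount=1.0):
    """Combine the scaled ChIP and input coverage of a single bin."""
    a, b = chip * f_chip, inp * f_input
    if mode == "subtract":
        return a - b
    if mode == "log2":
        return math.log2((a + pseudocount) / (b + pseudocount))
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical library sizes and raw coverage in one bin.
f_chip, f_input = read_count_factors(n_chip=20_000_000, n_input=10_000_000)
difference = compare_bin(40, 10, f_chip, f_input, "subtract")
log2_ratio = compare_bin(40, 10, f_chip, f_input, "log2")
print(f"subtract: {difference:.2f}\tlog2 ratio: {log2_ratio:.3f}")
```

The subtraction stays in coverage units, so a further depth normalization of the result is at least meaningful, whereas the log2 ratio is already a relative quantity.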

Devon
--
Devon Ryan, Ph.D.
Email: dpr...@dpryan.com
Data Manager/Bioinformatician
Max Planck Institute of Immunobiology and Epigenetics
Stübeweg 51
79108 Freiburg
Germany

Kylie Mak

Oct 13, 2017, 5:51:39 AM

to Devon Ryan, deepTools

Dear Devon,

I'm happy to get your reply. I actually simplified the whole comparison a bit in the last post for clarity.

In the real setup, K4 and K9 are just for examining the enrichment patterns at similar locations. However, a quantitative comparison of K9 ChIP across different differentiation time points is needed (again, the K9 ChIP from each time point has its own input). How should the normalization be done in this case?

Thanks so much for actively answering users' questions. It makes me feel very encouraged to use the tools.

Cheers,

Kylie

Devon Ryan

Oct 13, 2017, 6:09:32 AM

to Kylie Mak, deepTools

We've mostly discussed the issues surrounding making quantitative
comparisons elsewhere: https://www.biostars.org/p/190362/

Have a read through the replies there.

Devon
