deeptool BioCoverage


sanjay chahar

unread,
Feb 26, 2014, 8:34:52 AM
to deep...@googlegroups.com
Hi, I am a postdoc researcher at IGBMC, Strasbourg, France. It would be very kind of you to answer my question concerning deepTools. I am using bamCoverage to normalize two BAM files (control vs. KO H3K18ac) to equal sequencing depth.

I have run MACS on the control and KO ChIP-seq data. I see 10 million sequence reads (tags mapped to the genome after filtering) in the KO sample and 30 million in the control, so before processing the files further, I wanted to normalize both to the same sequencing depth, i.e. an equal number of reads.

To do so, I used bamCoverage from deepTools with a 3x scaling factor for the KO sample (based on the control having 3 times more mapped reads) and no scaling for the WT. My question is: was this the right way to normalize? And what is the difference between the other normalization parameters, such as normalize to 1x and normalize to RPKM? Can you please suggest the most appropriate way to normalize two files for sequencing depth? I see differences in my data when I compare RPKM to the scaling factor of 3.

Thanks a lot

Best wishes
Sanjay Chahar, Postdoc fellow
IGBMC, Strasbourg





Friederike Dündar

unread,
Feb 27, 2014, 5:40:25 AM
to cha...@googlemail.com, deep...@googlegroups.com
Dear Sanjay,

thanks for using deepTools! Here's our reply to your questions; we hope it helps:


1. Regarding your approach using bamCoverage and --scaleFactor:

- I assume that you're not using exactly 3 as the scaling factor, but the precise ratio based on the actual numbers of mapped reads. Just checking :)

- The way you did it was to scale the smaller sample up to match the more deeply sequenced sample. To be honest, I would recommend doing it the other way around and scaling the more deeply sequenced sample down. The reason is that it is difficult to tell whether a region with no reads at all in the shallowly sequenced sample lacks coverage or lacks mappability (in which case the region will not be covered in the more deeply sequenced sample either). Regions with zero coverage in the less deeply sequenced sample will still have zero coverage after being multiplied, while the remaining regions will get "artificially" inflated read numbers. That's why I would recommend multiplying your control sample by 1/3 if you were to stick to your procedure.
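If it helps, the down-scaling described above can be sketched in a few lines of Python. The read counts are the round numbers from the original post, purely for illustration; the exact values should come from your own counts of mapped, filtered reads:

```python
# Illustrative read counts from the thread; substitute your actual mapped-read counts.
ko_reads = 10_000_000       # mapped reads in the KO sample
control_reads = 30_000_000  # mapped reads in the control sample

# Scale the more deeply sequenced sample DOWN to the shallower one,
# e.g. by passing this value to bamCoverage via --scaleFactor:
control_scale = ko_reads / control_reads
ko_scale = 1.0  # the shallower KO sample stays untouched

print(round(control_scale, 3))  # 0.333
```

Scaling down rather than up avoids inventing reads in regions that may simply be unmappable, as explained above.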



2. Regarding your question about RPKM and 1x sequencing depth normalization:

Let's assume the following example sequencing sample:
- mouse sample (effective genome size of the mouse: ~ 2.15057 x 10^9 bp)
- 50 million mapped reads
- average size of sequenced DNA fragments: 200 bp
- bin size for the bigWig: 25 bp
- 2 exemplary bins: no. 1 with 10 overlapping reads, no. 2 with 12 overlapping reads

RPKM takes the bin size and the number of mapped reads into consideration; it does not care about the genome size:

RPKM (per bin) = number of reads per bin / ( number of mapped reads (in millions) * bin length (kb) )

For the example above, this would mean:
RPKM(bin1) = 10 / (50 * 0.025) = 8.
For the second bin: RPKM(bin2) = 12 / (50 * 0.025) = 9.6
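As a quick sanity check, the RPKM arithmetic above can be reproduced with a small Python function. This is just a sketch of the formula as stated here, not deepTools' actual implementation:

```python
def rpkm(reads_in_bin, mapped_reads, bin_size_bp):
    """RPKM per bin = reads / (mapped reads in millions * bin length in kb)."""
    return reads_in_bin / ((mapped_reads / 1e6) * (bin_size_bp / 1e3))

# The example from the thread: 50 million mapped reads, 25 bp bins.
print(rpkm(10, 50_000_000, 25))  # bin 1 -> 8.0
print(rpkm(12, 50_000_000, 25))  # bin 2 -> 9.6
```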

RPGC, on the other hand, does not only take the total number of reads into consideration; it also needs the effective genome size. This differs from the "real" genome size because regions where the sequence is either undetermined or too repetitive for reads to be mapped should not be taken into account when calculating the coverage. Note that the exact effective genome size might be bigger than the values we indicate in the help texts if you have very long sequencing reads. For the example above, RPGC would work as follows:
sequencing depth = (total number of mapped reads * fragment length) / effective genome size = (50 x 10^6 * 200) / (2.15057 x 10^9) = 4.65
RPGC scaling factor = 1 / sequencing depth = 1 / 4.65 = 0.22
RPGC(bin1) = 0.22 * 10 = 2.2
RPGC(bin2) = 0.22 * 12 = 2.64
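Under the same assumptions, the RPGC scaling can be written out like this (again only a sketch of the arithmetic in this reply, not deepTools' code; note that the per-bin values 2.2 and 2.64 above use the factor rounded to 0.22, while the unrounded factor gives ~2.15 and ~2.58):

```python
def rpgc_scale(mapped_reads, fragment_length_bp, effective_genome_size_bp):
    """1x-coverage scale factor: 1 / (mapped reads * fragment length / genome size)."""
    depth = mapped_reads * fragment_length_bp / effective_genome_size_bp
    return 1.0 / depth

# Mouse example: 50M mapped reads, 200 bp fragments, ~2.15 Gb effective genome.
sf = rpgc_scale(50_000_000, 200, 2.15057e9)
print(round(sf, 2))       # 0.22
print(round(sf * 10, 2))  # bin 1 -> 2.15
print(round(sf * 12, 2))  # bin 2 -> 2.58
```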



3. Our advice

Your approach is fine; however, it will only work for this specific pair of samples, i.e. you might not be able to compare the resulting profiles to those you obtain from other sources (with yet again different sequencing depths). Therefore, we highly recommend using the 1x sequencing depth normalization, which will ensure that you can add new profiles in the future, too, and compare them likewise.

That being said, in addition to normalizing the individual samples for sequencing depth, I hope that you have input samples for both the control and the KO sample. If that is the case, I would recommend using the bamCompare tool, so that both ChIP-seq samples are normalized not only for sequencing depth but also for possible differences in the input (the KO treatment might have influenced the chromatin). bamCompare can be used either with the SES scaling factor (if bamFingerprint gave you a result like the left-most plot here: https://github.com/fidelram/deepTools/wiki/QC#wiki-bamFingerprint) or with read count normalization. The result of bamCompare can be, for example, the log2 ratio (ChIP/input), which you can then use to compare the different conditions. For more details, see the information in the wiki: https://github.com/fidelram/deepTools/wiki/Normalizations#wiki-bamCompare

I hope that helps!

Do not hesitate to get back to us if you have more questions!

Best wishes,

Friederike




--
You received this message because you are subscribed to the Google Groups "deepTools" group.
To unsubscribe from this group and stop receiving emails from it, send an email to deeptools+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


tess.k...@gmail.com

unread,
Apr 10, 2017, 9:18:16 AM
to deepTools, cha...@googlemail.com
Hi Friederike,

Thanks for this comprehensive explanation of the question!

I just started analyzing single-end ChIP-seq data sets, and I was also wondering how I should normalize my samples. I have not included input (yet), so for now I would mostly just like to compare the coverage across genes in a heatmap. To normalize for sequencing depth, I used the --normalizeUsingRPKM option with a bin size of 1.
However, I now get extremely high coverage values that I didn't expect. Also, in your example calculation above you use 0.025 for the bin size instead of 25 bp. Why is this? And could it explain why my coverage is so high?

Thanks a lot for your help!

Best regards,

Tess

Friederike Dündar

unread,
Apr 13, 2017, 1:04:31 PM
to tess.k...@gmail.com, deepTools
Hi Tess,

What do you mean by "extremely high coverage values"? And are those localized, e.g. in regions where you may have peaks, or spread all over the genome?

For RPKM, the bin size is used in kb, 25 bp = 0.025 kb.

Best,

Friederike




Tess Korthout

unread,
Apr 14, 2017, 4:59:27 AM
to deepTools
Hi Friederike,

Thanks for your help! I now understand the bin size is in kb instead of bp :-)
I got high values all over the genome, which makes sense because I set the bin size to 1 bp, so the counts are divided by 0.001.
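This effect can be checked numerically with the RPKM formula from earlier in the thread; shrinking the bin from 25 bp to 1 bp inflates every per-bin value 25-fold (a sketch, assuming the same illustrative read counts as in the example above):

```python
def rpkm(reads_in_bin, mapped_reads, bin_size_bp):
    # reads / (mapped reads in millions * bin length in kb)
    return reads_in_bin / ((mapped_reads / 1e6) * (bin_size_bp / 1e3))

# Same per-bin read count, 50 million mapped reads:
print(rpkm(10, 50_000_000, 25))  # 25 bp bin -> 8.0
print(rpkm(10, 50_000_000, 1))   # 1 bp bin  -> 200.0
```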

Best,

Tess

Friederike Dündar

unread,
Apr 17, 2017, 3:52:55 PM
to Tess Korthout, deepTools
Right! Happy belated Easter :)

