Hi Ian and Moshe,
Thanks for your responses. I have followed the same Biostar post to generate the 'treatment pileup' and it worked fine. But the control piluep (lambda) generation follows different procedure, and there is not much documentation on 'How the control pileup will be generated for a Paired-end data'.
So, I tried to figure it out by going through the code, and I guess the following is roughly the steps to generate the control pileup:
For control library, treat the PE as SE tags.
1. Extract the left end and right end of the fragment as two reads: left read (+ve strand) and right read -ve strand) [check function 'pileup_a_chromosome_c' in CallPeakUnit.pyx]. So, due to this we have twice the amount of fragments in control, and this is confirmed from this line in PeakDetect.pyx
if self.PE_MODE:
d = self.treat.average_template_length
control_total = self.control.total * 2 # in PE mode, entire fragment is counted as 1
# in treatment whereas both ends of fragment are counted in control/input.
2. Extend each tag generated from step1 towards the 3' direction, by d/2 (d is derived as average fragment length from treatment). [If you goto 'pileup_a_chromosome_c' function, the function by default performs directional read extension (calling function does not set the direction=False flag). This is different from how the control pileup is generated with a SE library]
3. Extension is also done for slocal(1kb regions), llocal (10kb region).
4. lambda_bg (whole genome_depth), which is the baseline value to be filled for no coverage regions is computed as
lambda_bg = (sum of fragment length from treatment) /genome_size [this is the case if the number of fragments in treat is lesser than control; Since the control fragments are counted twice the original size, the control size is always the higher library is my case. Otherwise the formula switches: refer line 147-173 in PeakDetect.pyx]
5. For each window/bp, substitute the max value from pileup generated from step 2-4.
6. If SPMR is given in callpeaks command, scale the pileup values to adjust for sequencing depth difference [value from this variable 'effective_depth_in_million' is used]
It will be really helpful if anyone can confirm or a the amend the above "steps for generating control pileup for a Paired-end (PE) library".
Thanks and regards,
Aarthi