Merging un-normalized matrices vs. merging data before alignment


Debbie Chasman

Feb 21, 2018, 1:10:22 PM
to HiC-Pro
Hello,

I'm wondering if these two orders of operations should give equivalent results. 

Say we have two different technical replicates of the same experiment and want to pool them to increase the sequencing depth.

Option 1: Use HiC-Pro to align and quantify each replicate separately, as two different "samples". Then add up the raw (unnormalized) matrices from the two samples. Finally, apply ICE normalization to the single combined raw matrix.

Option 2: Align and quantify both replicates together by putting all fastq files within the same sample data directory. Use HiC-Pro as usual to quantify and then ICE-normalize one matrix.

My intuition is that the two final matrices should be equivalent, because alignment, and presumably QC, are done independently for each read pair. But perhaps I'm missing something. I'd really appreciate your insight.

Many thanks,

Best,
Deborah
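
The summation step in Option 1 can be sketched as follows. This is a minimal illustration, not HiC-Pro code: it assumes the raw matrices are stored as sparse triplets (`bin_i`, `bin_j`, `count`, tab-separated) on identical bins, and the helper names `read_triplets` and `sum_raw_matrices` are hypothetical.

```python
from collections import Counter


def read_triplets(lines):
    """Parse 'bin_i<TAB>bin_j<TAB>count' lines into a Counter keyed by (i, j).

    Assumption: raw (unnormalized) counts are integers.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        i, j, value = line.split()
        counts[(int(i), int(j))] += int(value)
    return counts


def sum_raw_matrices(*matrices):
    """Add sparse matrices entry by entry; all must use the same binning."""
    total = Counter()
    for matrix in matrices:
        total.update(matrix)  # Counter.update adds counts
    return total


# Toy replicates (made-up numbers, for illustration only).
rep1 = read_triplets(["1\t1\t10", "1\t2\t3"])
rep2 = read_triplets(["1\t2\t4", "2\t2\t7"])

pooled = sum_raw_matrices(rep1, rep2)
for (i, j), n in sorted(pooled.items()):
    print(f"{i}\t{j}\t{n}")
```

With real files, each replicate's lines would come from an open file handle, and the summed matrix would then go through ICE normalization as a single sample.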

nservant

Feb 21, 2018, 3:13:29 PM
to HiC-Pro
Hi Debbie,

Yes, the two options should give exactly the same results.
On my side, I usually use Option 1: it lets you check the quality of each replicate independently and then merge them to reach a higher resolution.
Best
Nicolas

Debbie Chasman

Feb 21, 2018, 4:38:33 PM
to HiC-Pro
Okay, great. Thank you very much for your response! 

Ashley S Doane

Jun 18, 2019, 2:45:03 PM
to HiC-Pro
Hi Nicolas,
Can you, or someone who has done this, advise on pooling replicate samples? Ideally I could cat the validPairs files for replicate 1 and replicate 2 and then sort the result, or do something similar.

thanks,
Ashley

nservant

Jun 19, 2019, 8:07:27 AM
to HiC-Pro
Hi Ashley

Yes, this is exactly the best way to go.
Note that in practice, it should give exactly the same result as simply adding the two raw contact maps.
Best
N
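
The claim that pooling valid pairs and adding raw contact maps agree follows from binning being a per-pair count. A toy check, under stated assumptions: pairs are reduced to hypothetical `(chrom1, pos1, chrom2, pos2)` tuples, `BIN_SIZE` is an arbitrary resolution, and no duplicate removal happens after pooling.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # hypothetical resolution, in base pairs


def bin_pairs(pairs, bin_size=BIN_SIZE):
    """Count contacts per (chrom1, bin1, chrom2, bin2) cell."""
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        counts[(chrom1, pos1 // bin_size, chrom2, pos2 // bin_size)] += 1
    return counts


# Made-up valid pairs for two replicates.
rep1_pairs = [
    ("chr1", 100_000, "chr1", 2_500_000),
    ("chr1", 900_000, "chr2", 300_000),
]
rep2_pairs = [
    ("chr1", 150_000, "chr1", 2_700_000),
]

# Pool the pairs first, then build one map ...
map_of_pooled = bin_pairs(rep1_pairs + rep2_pairs)
# ... or build one map per replicate, then add the maps.
sum_of_maps = bin_pairs(rep1_pairs) + bin_pairs(rep2_pairs)

print(map_of_pooled == sum_of_maps)
```

The equality holds only because each pair contributes exactly one count regardless of which other pairs are present; any step that looks across pairs, such as duplicate removal on the pooled set, can break it.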

Moshe Olshansky

Jun 20, 2019, 2:54:24 AM
to HiC-Pro
Hi Nicolas,

Won't more duplicated pairs of reads be potentially removed when using Option 2?

nservant

Jun 20, 2019, 12:21:29 PM
to HiC-Pro
Option 2: Align and quantify both replicates together by putting all fastq files within the same sample data directory. Use HiC-Pro as usual to quantify and then ICE-normalize one matrix.

Yes, absolutely!
That's why I recommend putting one sample per folder first, so that each sample is run independently (including duplicate removal).
Then create another input folder containing links to the validPairs files (all in the same folder now). Set rm_dup=FALSE and re-run HiC-Pro in stepwise mode, just to generate the maps.
Does that make sense to you?
Nicolas
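
The difference in duplicate handling between the two orders can be illustrated with a toy example. This is a sketch, not HiC-Pro's implementation: it assumes a duplicate is any pair sharing the same `(chrom1, pos1, chrom2, pos2)` as one already seen, and the read names and coordinates are made up.

```python
def remove_duplicates(pairs):
    """Keep the first pair seen for each (chrom1, pos1, chrom2, pos2) key."""
    seen = set()
    kept = []
    for read_name, chrom1, pos1, chrom2, pos2 in pairs:
        key = (chrom1, pos1, chrom2, pos2)
        if key not in seen:
            seen.add(key)
            kept.append((read_name, chrom1, pos1, chrom2, pos2))
    return kept


rep1 = [
    ("r1_a", "chr1", 1000, "chr1", 5000),
    ("r1_b", "chr1", 1000, "chr1", 5000),  # duplicate within replicate 1
    ("r1_c", "chr2", 200, "chr3", 900),
]
rep2 = [
    ("r2_a", "chr1", 1000, "chr1", 5000),  # same coordinates as r1_a
    ("r2_b", "chr4", 50, "chr4", 7000),
]

# Deduplicate each replicate on its own, then pool without deduplicating again.
per_replicate = remove_duplicates(rep1) + remove_duplicates(rep2)
# Pool first, then deduplicate across both replicates.
pooled_first = remove_duplicates(rep1 + rep2)

print(len(per_replicate), len(pooled_first))  # prints "4 3"
```

Pooling first discards `r2_a` because it matches a pair from the other replicate, which is why duplicate removal is switched off when the per-sample validPairs are combined.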

Moshe Olshansky

Jun 20, 2019, 9:25:45 PM
to HiC-Pro
Yes, definitely.
Thank you.

Daniel Bsteh

Aug 20, 2019, 9:35:20 PM
to HiC-Pro
Okay, but I am confused about how to proceed after merging the replicates from the .validPairs files as you described above. I want to visualize the data in Juicebox or HiGlass, and the input seems to require the allValidPairs files. I just want to make sure the duplicates are handled correctly. Does the hicpro2juicebox utility actually remove duplicates?
Thanks!

Moshe Olshansky

Aug 21, 2019, 12:32:23 AM
to HiC-Pro
hicpro2juicebox does not remove duplicates, and that is fine. Duplicates have already been removed from each sample during the creation of allValidPairs. So now you can sort-merge all individual validPairs into one validPairs file and then run hicpro2juicebox on it.

Daniel Bsteh

Aug 21, 2019, 2:01:24 AM
to HiC-Pro
Okay, thanks! So, to get this straight: the .validPairs files contain duplicates but the .allValidPairs files don't? So I can't set rm_dup=0 in the config file with .validPairs, only with the .allValidPairs files? And you merge and sort the allValidPairs files outside of HiC-Pro, with something like cat?

Moshe Olshansky

Aug 22, 2019, 7:47:33 AM
to HiC-Pro
Normally, duplicated reads for each sample are removed at some stage before allValidPairs is produced, so you should not remove them again. As for merging the files from several samples, look at the Unix sort command with the -m flag.
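
The idea behind `sort -m` (merging inputs that are already sorted, without re-sorting them) can be sketched in Python with `heapq.merge`. This is an illustration under assumptions: each line is tab-separated with the read name first, followed by chromosome, position and strand for each mate, and every input is already sorted by the same key. The key used here orders chromosome names lexicographically, which may differ from the ordering in your files.

```python
import heapq


def sort_key(line):
    """Order by (chrom1, pos1, chrom2, pos2), taken from fields 2, 3, 5, 6."""
    fields = line.rstrip("\n").split("\t")
    return (fields[1], int(fields[2]), fields[4], int(fields[5]))


def merge_sorted(*sorted_inputs):
    """Lazily merge already-sorted iterables of lines into one sorted stream."""
    return heapq.merge(*sorted_inputs, key=sort_key)


# Made-up, pre-sorted lines standing in for two per-sample files.
rep1 = [
    "r1_a\tchr1\t100\t+\tchr1\t900\t-",
    "r1_b\tchr2\t50\t+\tchr2\t70\t-",
]
rep2 = [
    "r2_a\tchr1\t500\t+\tchr3\t10\t+",
]

merged = list(merge_sorted(rep1, rep2))
for line in merged:
    print(line)
```

Because `heapq.merge` is lazy, open file handles can be passed in place of the lists and the merged lines written out one at a time, so the inputs never need to fit in memory. No duplicate filtering is applied here, in line with the advice above.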