How to construct an integrated contact map from multiple FASTQ files using HiC-Pro


Takeshi

Apr 26, 2016, 3:54:13 AM
to HiC-Pro
Hi, Nicolas, 

I want to combine multiple samples, such as SRR***1_R1.fastq, SRR***1_R2.fastq, SRR***2_R1.fastq, SRR***2_R2.fastq, etc.,
and then construct integrated contact maps using HiC-Pro.

By default, your pipeline generates individual raw and corrected contact maps for each sample (SRR***1, SRR***2, …).

Could you please tell me how to generate integrated contact maps from multiple samples (fastq files) by using HiC-Pro?

Best, 

Takeshi


nservant

Apr 26, 2016, 5:24:21 AM
to HiC-Pro
Hi Takeshi

It depends on how you organize your input folder.
HiC-Pro merges all data within the same sample folder and generates maps per sample.
For instance, if you have the following organization:
rawdata/
++SAMPLE1
++++SRR***1_R1.fastq
++++SRR***1_R2.fastq
++SAMPLE2
++++SRR***2_R1.fastq
++++SRR***2_R2.fastq
 
... you will have one map per sample.
If you organize your data as follows:
rawdata/
++SAMPLE
++++SRR***1_R1.fastq
++++SRR***1_R2.fastq
++++SRR***2_R1.fastq
++++SRR***2_R2.fastq

All data will be merged and processed as one sample.
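
As a minimal sketch of the merged layout above, the following shell snippet gathers the FASTQ pairs of several runs under a single sample folder. The run names (`SRR0001`, `SRR0002`), the `downloads/` folder, `hicpro_out` and `config.txt` are placeholders, and the snippet creates empty stand-in files in a scratch directory so it can be tried anywhere. It assumes that HiC-Pro follows symbolic links in the input folder (copy the files instead if it does not). The final command is only printed, not executed.

```shell
set -eu

# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Empty stand-ins for the downloaded runs (hypothetical names).
mkdir -p downloads
for run in SRR0001 SRR0002; do
    touch "downloads/${run}_R1.fastq" "downloads/${run}_R2.fastq"
done

# One sample folder holding every run that should be merged.
mkdir -p rawdata/SAMPLE
for fq in downloads/*.fastq; do
    ln -s "$PWD/$fq" "rawdata/SAMPLE/$(basename "$fq")"
done

# Show the resulting layout.
ls rawdata/SAMPLE

# Full pipeline on the merged sample (printed only, not executed).
echo "HiC-Pro -i rawdata -o hicpro_out -c config.txt"
```

With this layout there is a single sample folder, so the runs should end up in one set of maps named after `SAMPLE`.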
Note that if you already processed the data independently, you might not want to rerun the complete analysis.
In that case, you can simply create a new folder with links to the validPairs files, and run HiC-Pro in step-wise (-s) mode.
For instance:
data_valid/
++SAMPLE
++++SRR****1.validPairs
++++SRR****2.validPairs

And run HiC-Pro with -s build_contact_maps
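
A rough sketch of that shortcut, assuming the earlier results sit under a folder laid out as `OUTPUT_FOLDER/hic_results/data/<run>/` (the pattern quoted later in this thread). The folder name `old_output`, the run names and the `hg18` suffix are placeholders, and empty stand-in files are created so the snippet runs anywhere. The HiC-Pro command is only printed.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for the results of earlier, independent HiC-Pro runs
# (hypothetical run names; files follow <run>_<genome>.bwt2pairs.validPairs).
for run in SRR0001 SRR0002; do
    mkdir -p "old_output/hic_results/data/$run"
    touch "old_output/hic_results/data/$run/${run}_hg18.bwt2pairs.validPairs"
done

# New input folder: one sub-folder per merged sample,
# holding links to the existing validPairs files.
mkdir -p data_valid/SAMPLE
for vp in old_output/hic_results/data/*/*.validPairs; do
    ln -s "$PWD/$vp" data_valid/SAMPLE/
done

ls data_valid/SAMPLE

# Step-wise run on the linked files (printed only, not executed).
echo "HiC-Pro -i data_valid -o merge_maps -c config.txt -s build_contact_maps"
```

Depending on the HiC-Pro version, `-s merge_persample` may be needed before `-s build_contact_maps`, as discussed later in this thread.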
Hope it helps
Nicolas
 

Takeshi

Apr 26, 2016, 6:44:23 AM
to HiC-Pro
Hi Nicolas, 

Thank you for responding to my question.

I understand I can construct an integrated contact map 
by (1) organizing an input folder, or (2) rerunning a stepwise function (when I have individual HiC-Pro results).

********

All data will be merged and processed as one sample.
Note that if you already processed the data independently, you might not want to rerun the complete analysis.
In that case, you can simply create a new folder with links to the validPairs files, and run HiC-Pro in step-wise (-s) mode.
For instance:
data_valid/
++SAMPLE
++++ SRR****1.validPairs
++++ SRR****2.validPairs

And run HiC-Pro with -s build_contact_maps



There is a validPairs-typed file, "SRR***1(or 2)_hg18.bwt2pairs.validPairs"
 in an output folder, "OUTPUT_FOLDER/hic_results/data/SRR***1(or 2)" in a human data case.

Can I use this "SRR***1(or 2)_hg18.bwt2pairs.validPairs"  
only to build an integrated contact map 
by (1) creating a new folder "data_valid" and copying the validPairs files in the folder as described above,
and then 
(2) executing a command line "HiC-Pro -s build_contact_maps"?

Do I need any arguments in the command line for the "data_valid" folder ?


Best, 

Takeshi




On Tuesday, April 26, 2016 at 18:24:21 UTC+9, nservant wrote:

nservant

Apr 26, 2016, 7:42:58 AM
to HiC-Pro
On 26/04/2016 at 12:44, Takeshi wrote:

> Can I use this "SRR***1(or 2)_hg18.bwt2pairs.validPairs"
> only to build an integrated contact map
> by (1) creating a new folder "data_valid" and copying the validPairs files in the folder as described above,
> and then

Yes, absolutely.

> (2) executing a command line "HiC-Pro -s build_contact_maps"?
>
> Do I need any arguments in the command line for the "data_valid" folder ?

You just have to specify the correct input folder, for instance:
HiC-Pro -i data_valid -o merge_maps -c config.txt -s build_contact_maps

Nicolas

Takeshi

May 2, 2016, 5:47:11 AM
to HiC-Pro
Hi Nicolas, 

Following your advice, I was able to build integrated contact maps using the step-by-step function "build_contact_maps".
I appreciate your help.

Best,

Takeshi


On Tuesday, April 26, 2016 at 20:42:58 UTC+9, nservant wrote:

Samad Elka

Feb 22, 2018, 5:39:59 AM
to HiC-Pro
Hi Nicolas, 
I recently started using the HiC-Pro pipeline to analyse Hi-C data, and I am enjoying it.
I have the same issue as discussed here, but I realize that "HiC-Pro -s build_contact_maps" in the recent version (HiC-Pro_2.10) has changed since this discussion, which matters when I want to build a matrix from multiple validPairs files.

When I follow the steps explained here, I get the error below:
Exit: Error: Directory Hierarchy of rawdata '/data_valid/sample/' is not correct. No '_allValidPairs' files detected
data_valid/
++sample
++++ SRR****1.validPairs
++++ SRR****2.validPairs

When I change ".validPairs" to "_allValidPairs" in the file names, I get another error:

cat: rawdata/sample/sample_allValidPairs: No such file or directory
HiC-Pro_2.10.0/bin/../scripts//Makefile:163: recipe for target 'build_raw_maps' failed
make: *** [build_raw_maps] Error 1

To be precise, it works only when the file name follows the valid_data/sample+"_allValidPairs" pattern, i.e. the sample folder name followed by "_allValidPairs".
So I am confused by this.

Any insight regarding this would be very helpful.
Cheers, 
Samad

nservant

Feb 22, 2018, 8:49:21 AM
to HiC-Pro
Hi Samad,

Indeed, I may have to change that in the next versions ...
To fix it, you have to use the options "-s merge_persample -s build_contact_maps".
The first one takes the .validPairs files and generates the _allValidPairs file.
The second one generates the maps.
Thanks for your feedback, I will add a note on that point!
Cheers
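
The rule described in this reply can be sketched as a small shell helper that looks at a sample folder and prints the `-s` options to use. The function name `choose_steps` and the demo file names are hypothetical; the two branches only restate what is said in this thread (per-run `.validPairs` files need `merge_persample` first, an existing merged `allValidPairs` file does not). It does not check the stricter file-name pattern that version 2.10 expects for the merged file.

```shell
set -eu

# Hypothetical helper: print the -s options suited to the files in a sample folder.
choose_steps() {
    sample_dir=$1
    if ls "$sample_dir" | grep -q 'allValidPairs$'; then
        # A merged file already exists: only the maps need to be built.
        echo "-s build_contact_maps"
    elif ls "$sample_dir" | grep -q '\.validPairs$'; then
        # Only per-run files: merge them first, then build the maps.
        echo "-s merge_persample -s build_contact_maps"
    else
        echo "no validPairs files found in $sample_dir" >&2
        return 1
    fi
}

# Demo on a scratch folder containing two empty placeholder files.
demo=$(mktemp -d)
mkdir -p "$demo/data_valid/sample"
touch "$demo/data_valid/sample/SRR0001.validPairs" \
      "$demo/data_valid/sample/SRR0002.validPairs"

steps=$(choose_steps "$demo/data_valid/sample")

# Command to run (printed only, not executed).
echo "HiC-Pro -i data_valid -o merge_maps -c config.txt $steps"
```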

Samad Elka

Feb 22, 2018, 10:08:58 AM
to HiC-Pro
Hi Nicolas, 
Thank you, I really appreciate the very quick answer. It fixed the issue: using "-s merge_persample" along with "-s build_contact_maps" saved me from renaming the files and/or merging them manually!
Thanks again.
Cheers, 
Samad

nservant

Feb 22, 2018, 12:36:08 PM
to HiC-Pro
Just one detail: "-s merge_persample" will try to remove duplicates if the option is set in the config file.
Best
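
To illustrate what merging with duplicate removal means conceptually, here is a toy sketch on three made-up records. It is not HiC-Pro's actual implementation. It assumes tab-separated validPairs lines whose first columns are read name, chr1, pos1, strand1, chr2, pos2, strand2, and it treats two pairs as duplicates when chr1, pos1, chr2 and pos2 are all identical.

```shell
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/sample"

# Two tiny stand-in files (made-up records; further columns omitted).
printf 'r1\tchr1\t100\t+\tchr1\t900\t-\n' >  "$demo/sample/SRR0001.validPairs"
printf 'r2\tchr1\t500\t+\tchr2\t300\t+\n' >> "$demo/sample/SRR0001.validPairs"
printf 'r3\tchr1\t100\t+\tchr1\t900\t-\n' >  "$demo/sample/SRR0002.validPairs"

# Merge: concatenate every per-run file of the sample.
cat "$demo"/sample/*.validPairs > "$demo/sample/merged.tmp"

# Duplicate removal: keep the first pair seen for each (chr1, pos1, chr2, pos2) key.
awk -F '\t' '!seen[$2 FS $3 FS $5 FS $6]++' "$demo/sample/merged.tmp" \
    > "$demo/sample/sample_allValidPairs"
rm "$demo/sample/merged.tmp"

# Number of pairs left after merging and de-duplication.
wc -l < "$demo/sample/sample_allValidPairs"
```

Here `r3` repeats the positions of `r1`, so it is dropped. Whether such pairs should be removed when merging replicates is an analysis decision controlled by the config file.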

nservant

Aug 2, 2018, 7:02:38 AM
to HiC-Pro
Update on this post: it will be fixed in HiC-Pro 2.11.0.
Using -s build_contact_maps still requires the .allValidPairs extension (note the '.', and no longer the "_").
However, the file name is much more flexible ... fixing the issue!
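
For merged files that still carry the older "_allValidPairs" suffix, one possible way to expose them under the new ".allValidPairs" extension is to link them into a fresh input folder. This is only a sketch: the folder names `data_valid` and `data_valid_new` and the file `sample_allValidPairs` are placeholders, and it assumes links are acceptable as input.

```shell
set -eu

demo=$(mktemp -d)
src="$demo/data_valid/sample"
dst="$demo/data_valid_new/sample"
mkdir -p "$src" "$dst"
touch "$src/sample_allValidPairs"        # empty placeholder, old naming

# Link each old-style file under a name ending in .allValidPairs.
for old in "$src"/*_allValidPairs; do
    [ -e "$old" ] || continue            # nothing matched the pattern
    name=$(basename "$old")
    ln -s "$old" "$dst/${name%_allValidPairs}.allValidPairs"
done

ls "$dst"
```

The new folder (`data_valid_new` here) would then be passed with `-i` together with `-s build_contact_maps`.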

Andrea Perreault

Jan 22, 2019, 2:19:08 PM
to HiC-Pro
Nicolas,

I'm having some trouble with the organization of directories to run -s build_contact_maps on multiple replicates. I have conducted the full HiC-Pro pipeline on each replicate individually and would like to build a merged interaction matrix. Below is my current directory structure:
HiChIP
+Rep1
++data
+++raw
++++_r1.fastq
++++_r2.fastq
++hic_pro_results
+++bowtie_results
+++hic_results
+Rep2 (same as Rep1)
+Rep3 (same as Rep1)
+Rep4 (same as Rep1)
+reproducibility
++data
+++Rep1.bwt2pairs.ValidPairs (copied from hic_results)
+++Rep2.bwt2pairs.ValidPairs (copied from hic_results)
+++Rep3.bwt2pairs.ValidPairs (copied from hic_results)
+++Rep4.bwt2pairs.ValidPairs (copied from hic_results)

I run the following command:
HiC-Pro -c /media/bryan/2Tb_Workspace/AP/HiChIP/config_mm10-AP-BWT_v3.txt -i /media/bryan/2Tb_Workspace/AP/HiChIP/reproducibility -o /media/bryan/2Tb_Workspace/AP/HiChIP/reproducibility/output -s merge_persample -s build_contact_maps

and get the following error:
find: File system loop detected; ‘rawdata/output/rawdata’ is part of the same file system loop as ‘rawdata’.
/home/bryan/HiC-Pro-2.10.0/bin/../scripts//Makefile:150: recipe for target 'merge_valid_interactions' failed
make: *** [merge_valid_interactions] Error 1

I also tried HiC-Pro -c /media/bryan/2Tb_Workspace/AP/HiChIP/config_mm10-AP-BWT_v3.txt -i /media/bryan/2Tb_Workspace/AP/HiChIP/reproducibility -o /media/bryan/2Tb_Workspace/AP/HiChIP/reproducibility/output -s build_contact_maps
and got the following error: Exit: Error: Directory Hierarchy of rawdata '/media/bryan/2Tb_Workspace/AP/HiChIP/reproducibility' is not correct. No '_allValidPairs' files detected

Any help would be appreciated!
-Andrea

nservant

Jan 23, 2019, 4:19:23 AM
to HiC-Pro
Hi Andrea,
Could you try to use separate folders for input and output, please? It seems that there is a file system loop in the data architecture.
Something like:

HiC-Pro \
-c /media/bryan/2Tb_Workspace/AP/HiChIP/config_mm10-AP-BWT_v3.txt \
-i /media/bryan/2Tb_Workspace/AP/HiChIP/reproducibility \
-o /media/bryan/2Tb_Workspace/AP/HiChIP/merged_results \
-s merge_persample -s build_contact_maps

Thanks
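
A small guard along these lines can catch the problem before launching HiC-Pro. It is a sketch, not part of HiC-Pro: the helper name `is_inside` is hypothetical, both folders must already exist, and the demo uses scratch folders named after the ones in this thread.

```shell
set -eu

# Hypothetical helper: succeed when directory $1 is $2 itself or lies inside it.
is_inside() {
    child=$(cd "$1" && pwd -P)       # resolved physical path of the candidate
    parent=$(cd "$2" && pwd -P)      # resolved physical path of the input folder
    case "$child/" in
        "$parent"/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Demo: one output folder nested in the input folder, one kept separate.
demo=$(mktemp -d)
mkdir -p "$demo/reproducibility/output" "$demo/merged_results"

for out in "$demo/reproducibility/output" "$demo/merged_results"; do
    if is_inside "$out" "$demo/reproducibility"; then
        echo "$(basename "$out"): inside the input folder, choose another location"
    else
        echo "$(basename "$out"): separate from the input folder, fine to use"
    fi
done
```

In the failing command earlier in the thread the output folder was `reproducibility/output`, i.e. nested in the input folder; the suggested `merged_results` folder sits next to it instead.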

Andrea Perreault

Jan 25, 2019, 11:14:41 AM
to HiC-Pro
Thank you, that worked! The directory structure can be confusing at times.

-Andrea