HiCPro unable to read input files


Manjulapramila Thimmajanarthanan

Aug 17, 2016, 9:37:00 AM
to HiC-Pro
I am trying to run HiC-Pro on my samples, but it fails to detect the input files in the folder given with the -i option.

HiC-Pro_2.7.8$ bin/HiC-Pro -i /data_b/Ago1_HiC/ -o /data_b/Ago1_HiCProOut/ -c config-hicpro.txt
Exit: Error: Directory Hierarchy of rawdata '/data_b/Ago1_HiC/' is not correct. No '.fastq(.gz)' files detected

But the files are definitely there:

thimmamp@kw12556:~/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8$ ls -altr /data_b/Ago1_HiC/
total 9581660
-rw-r--r--  1 thimmamp kw-users 5918986319 Jul 26 18:07 Ago1_pool_R1_out.fastq.gz
-rw-r--r--  1 thimmamp kw-users 3892611527 Jul 26 18:09 Ago1_pool_R2_out.fastq.gz
drwxr-xr-x  2 thimmamp kw-users       4096 Aug 17 09:04 .
drwxrwxrwx 10 hpcadmin kw-users       4096 Aug 17 09:16 ..

nservant

Aug 18, 2016, 11:19:24 AM
to HiC-Pro
Hi,
This is due to how the data are organized.
The input folder must contain one sub-folder per sample, with the fastq files inside those sub-folders.
For instance:
data
---sample1
------f1_R1.fastq
------f1_R2.fastq
---sample2
------...

Hope it helps,
Nicolas
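
The layout rule above can be checked before launching a run. Below is a minimal Python sketch, assuming the only requirement is the one stated in this thread (one sub-folder per sample, with .fastq or .fastq.gz files inside it). The names check_input_layout and layout_problems are illustrative and are not part of HiC-Pro.

```python
import os
import sys

FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def check_input_layout(input_dir):
    """Return {sample_folder: [fastq files]} for each sub-folder of input_dir."""
    samples = {}
    for entry in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, entry)
        if os.path.isdir(path):
            samples[entry] = sorted(
                f for f in os.listdir(path) if f.endswith(FASTQ_SUFFIXES)
            )
    return samples


def layout_problems(input_dir):
    """List human-readable problems; an empty list means the layout looks fine."""
    problems = []
    # FASTQ files sitting directly in the input folder are the case from this thread.
    loose = sorted(
        f for f in os.listdir(input_dir)
        if f.endswith(FASTQ_SUFFIXES)
        and os.path.isfile(os.path.join(input_dir, f))
    )
    if loose:
        problems.append("FASTQ files directly under input dir: " + ", ".join(loose))
    samples = check_input_layout(input_dir)
    if not samples:
        problems.append("no sample sub-folders found")
    for name, files in samples.items():
        if not files:
            problems.append("sample folder '%s' has no .fastq(.gz) files" % name)
    return problems


if __name__ == "__main__":
    # Usage: python check_layout.py /path/to/input_dir [...]
    for target in sys.argv[1:]:
        for message in layout_problems(target) or ["layout looks fine"]:
            print("%s: %s" % (target, message))
```

Run against /data_b/Ago1_HiC/ as listed in the first post, this would report the two .fastq.gz files as sitting directly under the input folder, with no sample sub-folders.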

Manjulapramila Thimmajanarthanan

Aug 24, 2016, 2:27:58 AM
to HiC-Pro
Hi Nicolas,

Thanks for this! After reorganising the data folder structure, the run got further, but it now fails at the step that pairs the R1 and R2 tags for a sample.


bin/HiC-Pro -i /data_b/indata/ -o /data_b/HiCPro_out/ -c config-hicpro.txt
/data_b/HiCPro_out/ folder alreads exists. Do you want to delete it before running ?(y/n) [n] :
y

Run HiC-Pro 2.7.8
--------------------------------------------
Sun Aug 21 09:01:58 AST 2016
Bowtie2 alignment step1 ...
/home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/scripts/bowtie_wrap.sh -c /home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/config-hicpro.txt -u >> hicpro.log
--------------------------------------------
Tue Aug 23 14:23:26 AST 2016
Bowtie2 alignment step2 ...
/home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/scripts/bowtie_wrap.sh -c /home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/config-hicpro.txt -l >> hicpro.log
--------------------------------------------
Tue Aug 23 22:53:24 AST 2016
Combine both alignment ...
/home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/scripts/bowtie_combine.sh -c /home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/config-hicpro.txt >> hicpro.log
[bam_sort_core] merging from 36 files...
[bam_sort_core] merging from 58 files...
[bam_sort_core] merging from 48 files...
[bam_sort_core] merging from 78 files...
[bam_sort_core] merging from 46 files...
[bam_sort_core] merging from 90 files...
[bam_sort_core] merging from 30 files...
[bam_sort_core] merging from 52 files...
--------------------------------------------
Wed Aug 24 02:13:58 AST 2016
Bowtie2 mapping statistics for R1 and R2 tags ...
/home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/scripts/mapping_stat.sh -c /home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/config-hicpro.txt >> hicpro.log
--------------------------------------------
Wed Aug 24 02:48:39 AST 2016
Pairing of R1 and R2 tags ...
/home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/scripts/bowtie_pairing.sh -c /home/thimmamp/bioinformatics_tools/HiCPro_learning/HiC-Pro_2.7.8/config-hicpro.txt >> hicpro.log
make: *** [bowtie_pairing] Error 1

When I checked the corresponding mergeSAM.log file, it showed:

## mergeBAM.py
## forward= bowtie_results/bwt2/Ago1_HiC/Ago1_pool_R1_out_hg19.bwt2merged.bam
## reverse= bowtie_results/bwt2/Ago1_HiC/Ago1_pool_R2_out_hg19.bwt2merged.bam
## output= bowtie_results/bwt2/Ago1_HiC/Ago1_pool_out_hg19.bwt2pairs.bam
## min mapq= 0
## report_single= False
## report_multi= False
## verbose= True
## Merging forward and reverse tags ...
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.

nservant

Aug 24, 2016, 5:03:30 AM
to HiC-Pro
Hi,
Did you check the error message?
When we merge the R1 and R2 alignment files, we assume that both files are sorted and that the two mates of a pair have the same read name.
Here I think you may have an issue with the read names. In some datasets, the R1 or R2 extension is added directly to the read names.
If so, you will need to update the read names in the fastq files before running HiC-Pro.
Looking at the first lines of the fastq or bam files should make this easy to check.
N
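
The check suggested above (do the first reads of R1 and R2 carry the same names?) can be automated. Here is a minimal Python sketch, assuming standard four-line FASTQ records and taking the read name to be the header text up to the first whitespace. The helper names are illustrative and not part of HiC-Pro.

```python
import gzip
import itertools


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_ids(path, limit=None):
    """Yield the read name of each record (header up to the first whitespace)."""
    with open_fastq(path) as handle:
        headers = itertools.islice(handle, 0, None, 4)  # every 4th line is a header
        for header in itertools.islice(headers, limit):
            fields = header[1:].split()  # drop the leading '@'
            yield fields[0] if fields else ""


def first_mismatch(r1_path, r2_path, limit=1000):
    """Return (record_index, name1, name2) for the first differing pair, else None.

    A missing record on one side shows up as None for that side.
    """
    pairs = itertools.zip_longest(read_ids(r1_path, limit), read_ids(r2_path, limit))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

A None result over the first records only says the files start out in step; it does not prove that the whole files are paired.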

Manjulapramila Thimmajanarthanan

Aug 30, 2016, 3:09:55 AM
to HiC-Pro
Hi Nicolas,

Sorry, I didn't quite understand the part about read names!

In the configuration file, do the following lines expect each read name in the R1 and R2 fastq.gz files to carry an _R1 or _R2 extension?

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

For instance, my read names currently look like this:
gunzip -c /data_b/indataraw/Ago1/M_16_D0519_Ago1_pool_7_GTCCGC_L004_R1_001.fastq.gz | head -10
@K00235:13:H7VGLBBXX:4:1101:1091:1281 1:N:0:GTCCGC
NGGATGGATGGATGGATGATGATAATGAAGATTAATTATGTCAAATCAACAAAGGAGGCNGNNNNNNNNNNNNNNNNNCAGGGAAGTGGGGAAGTGGAATGCTCCAAAAATGAGGAGTTAAGACTTGTGGGCCTACTCTTGGTTCTGCCAC
+
#AAFAFFKFKFKKKKKKKKKKKAFKKKKKKFF<KKFKKKKKKKKK7<KKFFKKKKKKKF#K#################KKKKKAKKFKKKK,AFFKKKKKKKKKKKKKAFAFAKKKKKKFKKAAKKKFFK,<AA,<7FKAFKKKKKKKFFK
@K00235:13:H7VGLBBXX:4:1101:1132:1281 1:N:0:GTCCGC
NCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCACGCCCGGCTAATTTTTTATATTTTNANNNNNNNNNNNNNNNNNCCGTGTTAGCCAGGATGGTCTCCATCTCCTGACCTCGTGATCCGCCAGCCTCGGTGGCCTCCCAAAGTGCTGG
+
#AAFFFKKKFKKKKKFKKKKKKKKFFKKKFAFFAKKKF<F<AKFKK7K<<FFKKKKKKK#A#################<7FFF<KFFKKKKKAF,<,,FFKFFFKKKKKKFAFKFKKKKKK7<<FK<FFFKFFFKKFFF7AFFAAKKK<KK
@K00235:13:H7VGLBBXX:4:1101:1538:1281 1:N:0:GTCCGC
NTTTATCACCTTGATCTATTTAAAAATATATTTATGAGGCACTGATATTTCACAAGCAANGNNNNNNNNNNNNNNNNNATAATTAAAAACTTAGGGCTTCTTAAATTTGGCTGTAATTCTGATATCTATTTACAATTATTCAAGAGTACAA

gunzip -c /data_b/indataraw/Ago1/M_16_D0519_Ago1_pool_7_GTCCGC_L004_R2_001.fastq.gz | head -10
@K00235:13:H7VGLBBXX:4:1101:1091:1281 2:N:0:GTCCGC
AAAATTAATTTCAGTNTNNNTTANNNNCTTATGTAACTTATGAACATGTAAGGATGATCACTTTGTTCTAATATGTAAATTTTTACAAAGTATTTTAGTAAAATGATATATGTCCTTCAGTGAATTCATATTGCTNNNCAANAAAACATTC
+
AAAF<,AFAAFKKKA#,###FAK####<A<K,7,AFK7FKKAKKKFKKAKFFFKFKKKAFA7<,AKKKKKKK,7,7AAAKKFAAKK<KFKKKKKF7FKKKKKK7FKAAA<FFF,,,7,A,,,F7,<7,FKKA,7F###7AF#AFAFAFF<<
@K00235:13:H7VGLBBXX:4:1101:1132:1281 2:N:0:GTCCGC
AGTACATGGTCATCANTNNNAAANNNNAGAGAGGTAGGCATACATAAAGAAAAAAACTAAGCGGCTTCGTTGAGAAGAGGGTATGTAAAATACGCTATAAAGCGTGAGTAGCACTTTTAGCTCACTGACGAATTGNNNAAANAAAAAAAAA
+
AA,,FA,A,,,,,,7#7###A,,####77,,,,7F,F,,,<FKKFKKFKF77,F7A,A,,77,7,(,7F,7AFKK,7,7FFAA,7,7,AF7<,7<,FFKKKKKK,,AF,<AAFA,,,,,,,,,,,,7,,<,,,7,###AFK#KKKFAF7F<
@K00235:13:H7VGLBBXX:4:1101:1538:1281 2:N:0:GTCCGC
TTTTTAAGGATTGCCNNNNNGTTNNNNATTGAGGCTGCACCATTTTACATTCCCACCAATGGTGCACAAGCTAGCTTTACAGGAATCAACCTAATCTTACACTTTCTATATTAATCCATTGTACTCTTGAATAATNNNAAANAGATATCAG

Should these files be changed so that each read name carries an _R1 or _R2 extension?

Best,
Manjula

nservant

Sep 1, 2016, 5:41:21 AM
to HiC-Pro
Hi Manjula,
So you ran head on the R1 file. Could you please do the same on the R2 file?
In theory, the read names in the R1 and R2 files should be the same.
If they are not, HiC-Pro will crash at the pairing step, because reads are paired based on their read names.
Thanks
Nicolas

Manjulapramila Thimmajanarthanan

Sep 1, 2016, 5:46:43 AM
to HiC-Pro
Hi Nicolas,

My raw files are named as shown:


/data_b/indata/Ago1_HiC/
total 44232992
-rw-r--r-- 1 thimmamp kw-users 27928517662 Jul 26 18:07 Ago1_pool_R1_out.fastq
-rw-r--r-- 1 thimmamp kw-users 17366047226 Jul 26 18:09 Ago1_pool_R2_out.fastq

So I changed the config file to:
## Data
#########################################################################

PAIR1_EXT = R1_out
PAIR2_EXT = R2_out


After this change I ran HiC-Pro again, but the error still occurs at the same mergeSAM step.

The log file contains:

## mergeBAM.py
## forward= bowtie_results/bwt2/Ago1_HiC/Ago1_pool_R1_out_hg19.bwt2merged.bam
## reverse= bowtie_results/bwt2/Ago1_HiC/Ago1_pool_R2_out_hg19.bwt2merged.bam
## output= bowtie_results/bwt2/Ago1_HiC/Ago1_pool__hg19.bwt2pairs.bam
## min mapq= 0
## report_single= False
## report_multi= False
## verbose= True
## Merging forward and reverse tags ...
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.

What could be going wrong?

Please help me sort this out.
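
Comparing the output= lines of the two mergeBAM.py logs in this thread suggests that PAIR1_EXT only affects how the paired output file is named, not how reads are matched. The Python sketch below reproduces both names under that assumption: that the pair prefix is the R1 file stem with the PAIR1_EXT string removed. This is inferred from the log output alone, not from HiC-Pro's source, and pair_prefix is a hypothetical helper.

```python
def pair_prefix(r1_stem, pair1_ext):
    """Drop the first occurrence of pair1_ext from the R1 file stem.

    Assumption inferred from the logs in this thread, not taken from HiC-Pro.
    """
    return r1_stem.replace(pair1_ext, "", 1)


def pairs_bam_name(r1_stem, pair1_ext, genome="hg19"):
    """Build the .bwt2pairs.bam name as it appears in the two logs."""
    return "%s_%s.bwt2pairs.bam" % (pair_prefix(r1_stem, pair1_ext), genome)


# PAIR1_EXT = _R1 (first run) versus PAIR1_EXT = R1_out (second run)
print(pairs_bam_name("Ago1_pool_R1_out", "_R1"))
print(pairs_bam_name("Ago1_pool_R1_out", "R1_out"))
```

Both runs stop with the same "not paired" message, which is consistent with the setting changing file names only and leaving the read names inside the files untouched.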

nservant

Sep 1, 2016, 5:51:40 AM
to HiC-Pro
Can you run

 cat -c /data_b/indata/Ago1_HiC/Ago1_pool_R1_out.fastq | head -10

 cat -c /data_b/indata/Ago1_HiC/Ago1_pool_R2_out.fastq | head -10

and check the number of reads in R1 and R2? Your file sizes look very different: 27 GB versus 17 GB!?

nservant

Sep 1, 2016, 5:54:47 AM
to HiC-Pro
Sorry, of course remove the -c option from 'cat'; I just copy/pasted your gunzip command.
N
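
Counting the reads in each file, as requested above, takes only a few lines. This is a minimal Python sketch, assuming standard FASTQ with exactly four lines per record; count_reads is an illustrative name. For properly paired files, the two counts should be equal.

```python
import gzip


def count_reads(path):
    """Count FASTQ records, assuming the usual four lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4:
        # A line count that is not a multiple of 4 usually means a truncated file.
        raise ValueError(
            "%s: %d lines is not a multiple of 4 (truncated file?)" % (path, lines)
        )
    return lines // 4


def same_read_count(r1_path, r2_path):
    """Return (n_r1, n_r2, equal) for a pair of FASTQ files."""
    n1, n2 = count_reads(r1_path), count_reads(r2_path)
    return n1, n2, n1 == n2
```

On files of tens of gigabytes this reads every line once, so it takes a while, but it needs very little memory.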

nservant

Sep 14, 2016, 11:22:14 AM
to HiC-Pro
Hi Lea,
In your case, the input directory should be "data" and not "sample1".

So specify
-i /DCEG/Branches/LTG/Chanock/Lea/HiC/data
instead of
-i /DCEG/Branches/LTG/Chanock/Lea/HiC/data/sample1

Nicolas

On Wednesday, 14 September 2016 17:18:47 UTC+2, Lea Jessop wrote:
I'm getting the same error.  I have arranged my fastq files so that only R1 and R2 for one sample are in a folder, so not sure what I'm still doing wrong.


[jessopl@build-compute HiC]$ cd data
[jessopl@build-compute data]$ ls
sample1  sample2
[jessopl@build-compute data]$ cd sample1
[jessopl@build-compute sample1]$ ls
UO31_S1_R1.fastq  UO31_S1_R2.fastq
[jessopl@build-compute sample1]$ /DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/bin/HiC-Pro -i /DCEG/Branches/LTG/Chanock/Lea/HiC/data/sample1 -o /DCEG/Branches/LTG/Chanock/Lea/HiC/HiC-Pro_out -c /DCEG/Branches/LTG/Chanock/Lea/HiC/config-hicpro.txt -p
Exit: Error: Directory Hierarchy of rawdata '/DCEG/Branches/LTG/Chanock/Lea/HiC/data/sample1' is not correct. No '.fastq(.gz)' files detected

Lea

Lea Jessop

Sep 15, 2016, 10:01:26 AM
to HiC-Pro
Yes, thank you! Step 1 ran successfully. Unfortunately, I just submitted the second script (HiCPro_step2_.sh) and I get the following error:
ImportError: No module named iced
make: *** [ice_norm] Error 1



[jessopl@build-compute HiC-Pro_out]$ make --file /mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/scripts/Makefile CONFIG_FILE=/DCEG/Branches/LTG/Chanock/Lea/HiC/config-hicpro.txt CONFIG_SYS=/mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/config-system.txt all_persample 2>&1
--------------------------------------------
Thu Sep 15 09:44:55 EDT 2016
Merge multiple files from the same sample ...
/mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/scripts/merge_valid_interactions.sh -c /DCEG/Branches/LTG/Chanock/Lea/HiC/config-hicpro.txt >> hicpro.log
--------------------------------------------
Thu Sep 15 09:48:33 EDT 2016
Merge stat files per sample ...
/mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/scripts/merge_stats.sh -c /DCEG/Branches/LTG/Chanock/Lea/HiC/config-hicpro.txt >> hicpro.log
--------------------------------------------
Thu Sep 15 09:48:35 EDT 2016
Generate binned matrix files ...
/mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/scripts/build_raw_maps.sh -c /DCEG/Branches/LTG/Chanock/Lea/HiC/config-hicpro.txt
--------------------------------------------
Thu Sep 15 09:50:28 EDT 2016
Run quality checks for all samples ...
/mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/scripts/make_plots.sh -c /DCEG/Branches/LTG/Chanock/Lea/HiC/config-hicpro.txt -p "all" >> hicpro.log
--------------------------------------------
Thu Sep 15 09:51:03 EDT 2016
Run ICE Normalization ...
/mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/scripts/ice_norm.sh -c /DCEG/Branches/LTG/Chanock/Lea/HiC/config-hicpro.txt >> hicpro.log
Traceback (most recent call last):
  File "/mnt/nfs/gigantor/ifs/DCEG/Resources/Tools/HiC-Pro/2.7.8/opt/HiC-Pro_2.7.8/scripts/ice", line 8, in <module>
    import iced
ImportError: No module named iced
make: *** [ice_norm] Error 1

nservant

Sep 15, 2016, 12:08:31 PM
to HiC-Pro
This is because you need the iced Python module.
In theory, the package is installed together with HiC-Pro, but there is a small bug in the installation process:
iced is installed for /usr/bin/python and not for the Python defined in the configuration file and used by HiC-Pro.
I'm working on this bug ...

In the meantime, you simply have to install the iced module (in scripts/src) for the Python you put in the configuration file, using:

cd HiC-Pro_2.7.8/scripts/src/ice_mod/
${MY_PYTHON_PATH}/python setup.py install --user

To be sure that it is correctly installed, you can open a Python terminal and try 'import iced'.
N
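
The 'import iced' test above is only meaningful if it is run with the same interpreter that HiC-Pro uses. Below is a small Python sketch that reports which interpreter is running and whether a module imports in it. module_status is an illustrative helper; the code uses nothing beyond the standard library and is written to run on both Python 2.7 and Python 3.

```python
import sys


def module_status(name):
    """Try to import `name` in the running interpreter and report the outcome."""
    try:
        module = __import__(name)
    except ImportError as error:
        # Also catches a package that is found but whose compiled part is missing.
        return {"interpreter": sys.executable, "module": name,
                "found": False, "detail": str(error)}
    return {"interpreter": sys.executable, "module": name,
            "found": True, "detail": getattr(module, "__file__", "built-in")}


if __name__ == "__main__":
    # Usage: /path/to/the/python script.py iced
    for wanted in sys.argv[1:]:
        print(module_status(wanted))
```

Saved to a file and launched with the Python path from the configuration file, it shows directly whether that interpreter can load the module and from which location.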

Lea Jessop

Sep 15, 2016, 4:35:51 PM
to HiC-Pro
thank you.  Second script is now running! 
-Lea

Manjulapramila Thimmajanarthanan

Sep 18, 2016, 7:09:30 AM
to HiC-Pro
Dear Nicolas,

Thanks a lot for your support! Finally it worked!

/hic_results_ago1_testing/
total 20
drwxr-xr-x 3 thimmamp kw-users 4096 Sep  7 18:03 data
drwxr-xr-x 3 thimmamp kw-users 4096 Sep  7 19:06 pic
drwxr-xr-x 6 thimmamp kw-users 4096 Sep  7 19:06 ..
drwxr-xr-x 3 thimmamp kw-users 4096 Sep  7 19:06 matrix

I can see the interaction matrices created for my input data.

Can you suggest any tools to call TADs from these interaction matrices?

nservant

Sep 19, 2016, 3:38:42 AM
to HiC-Pro
Hi,
Glad it works.
Regarding TAD calling, I usually use both the directionality index and the insulation score.
I know that other papers/methods are available, but I have never really tested them.
I would suggest having a look at the recent review from the Ren lab (Schmitt et al. 2016), which presents different methods for TAD calling.
Best

Ashwin Kelkar

Nov 17, 2017, 4:33:22 AM
to HiC-Pro
Dear Nicolas
Thanks for the very elegant toolset for HiC analysis.
I keep getting the same error as the one reported earlier in this thread, namely an error during ICE normalisation.

Fri Nov 17 14:40:16 IST 2017
Run ICE Normalization ...
/home/ashwin/tools/HiCPro/HiC-Pro_2.9.0/scripts/ice_norm.sh -c /home/ashwin/datasets/3d/config_test_latest.txt >> hicpro.log 
Traceback (most recent call last):
  File "/home/ashwin/tools/HiCPro/HiC-Pro_2.9.0/scripts/ice", line 8, in <module>
    import iced
ImportError: No module named iced
make: *** [ice_norm] Error 1


I tried using the standalone iced package as well, which gives me the same error.
I also followed your suggestion of installing iced separately by going to the appropriate /src/iced-mod folder and running 'python setup.py install' there.
The installation reported no errors, but importing the iced module in Python still does not work:

Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import iced
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iced/__init__.py", line 1, in <module>
    from . import normalization
  File "iced/normalization.py", line 3, in <module>
    from ._normalization_ import _update_normalization_csr
ImportError: No module named _normalization_


Everything up to this point works perfectly fine, but I am stuck at the last stage.
Any suggestions would be helpful.
Thanks,
Ashwin

nservant

Nov 17, 2017, 10:41:35 AM
to HiC-Pro
For the record: this was fixed by comparing the Python version used by HiC-Pro with the one where iced was installed.
N
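
A comparison like the one described above can be scripted. Here is a hedged Python 3 sketch that asks several interpreters, one after another, whether they can import a module. probe_interpreter is an illustrative name, and the interpreter paths passed to it are whatever applies on your system (for example /usr/bin/python and the Python set in the HiC-Pro configuration).

```python
import subprocess
import sys

# Code run inside each probed interpreter; valid on Python 2.7 and Python 3.
PROBE = "import sys, {module}; sys.stdout.write(sys.version.split()[0])"


def probe_interpreter(python_path, module):
    """Run `python_path -c 'import module'` and report the version or the failure."""
    try:
        proc = subprocess.run(
            [python_path, "-c", PROBE.format(module=module)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError as error:  # interpreter path does not exist or is not executable
        return {"python": python_path, "ok": False, "detail": str(error)}
    if proc.returncode != 0:
        last = proc.stderr.strip().splitlines()[-1:]
        detail = last[0] if last else "exit code %d" % proc.returncode
        return {"python": python_path, "ok": False, "detail": detail}
    return {"python": python_path, "ok": True, "detail": "Python " + proc.stdout}


if __name__ == "__main__":
    # Usage: python3 probe.py iced /usr/bin/python /path/to/other/python
    if len(sys.argv) > 2:
        for candidate in sys.argv[2:]:
            print(probe_interpreter(candidate, sys.argv[1]))
```

If one interpreter reports ok and the other does not, the module was installed for the wrong Python, which is the situation described in this thread.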