Using HiC-Pro output for DI and TADs calling


nservant

Jun 20, 2016, 12:51:39 PM
to HiC-Pro
I am reporting here an issue from the GitHub website about how to use the HiC-Pro output to call TADs with the Dixon et al. method.

-----------

I just used the HiC-Pro output matrix from $PATH2/HiC-Pro/HiC.run/OUTPUT/hic_results/matrix/rawdata/iced/20000/rawdata_20000_iced.matrix

141133748 lines for rawdata_20000_iced.matrix

head rawdata_20000_iced.matrix

1 1 15
1 5 1
1 6 1
1 42 1
1 388 1
1 1378 1
1 1697 1
1 2190 1
1 2920 1
1 4955 1

155130 lines for rawdata_20000_ord.bed

head rawdata_20000_ord.bed
chr1 0 20000 1
chr1 20000 40000 2
chr1 40000 60000 3
chr1 60000 80000 4
chr1 80000 100000 5
chr1 100000 120000 6
chr1 120000 140000 7
chr1 140000 160000 8
chr1 160000 180000 9
chr1 180000 200000 10

Should I use different files? Which files from the HiC-Pro pipeline output should I use? I used human (hg19).

Thanks,


nservant

Jun 20, 2016, 12:52:09 PM
to HiC-Pro
I think everything is fine in the way you ran HiC-Pro.
The BED file has 155130 lines, so the matrix built by the sparseToDense script should be 155130 x 155130, which is not the case (you have 155129 x 155131).
Did you build the maps at lower resolution too? Does it work at other resolutions?
I guess you are using the latest version of the iced package, 0.2.2? If not, could you please update it.
I will need to reproduce the bug. Would you agree to share these files with me?
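
For anyone who wants to run the same sanity check before sharing files, here is a minimal sketch. It is not part of HiC-Pro; the function names are hypothetical. It assumes the BED file has one bin per line and the sparse matrix holds whitespace-separated, 1-based "i j count" triplets, as in the excerpts above.

```python
def count_bins(bed_lines):
    """Number of bins = number of non-empty lines in the BED file."""
    return sum(1 for line in bed_lines if line.strip())


def max_bin_index(matrix_lines):
    """Largest 1-based bin index used in the sparse 'i j count' triplets."""
    largest = 0
    for line in matrix_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        largest = max(largest, int(fields[0]), int(fields[1]))
    return largest


def check_consistency(bed_lines, matrix_lines):
    """Return (n_bins, largest_index, ok); ok is False when the matrix
    refers to a bin that the BED file does not define."""
    n_bins = count_bins(bed_lines)
    largest = max_bin_index(matrix_lines)
    return n_bins, largest, largest <= n_bins
```

Both arguments can be open file handles, for example `check_consistency(open("rawdata_20000_ord.bed"), open("rawdata_20000_iced.matrix"))`. A largest index above the bin count would point to a mismatch between the two files.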
Thanks
Nicolas

nservant

Jun 20, 2016, 12:53:44 PM
to HiC-Pro

I am not sure why I get the error >,< I am using the iced package 0.2.2 (the one you uploaded in HiC-Pro_2.7.7/scripts/src/iced-0.2.2).

[]$ python
Python 2.7.6 (default, Dec 17 2013, 17:06:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import iced
>>> iced.__version__
'0.2.2-git'

I tried again on a fresh machine and, interestingly, I got a different error:
python /HiCPro/HiC-Pro_2.7.7/bin/utils/sparseToDense.py -b rawdata_20000_ord.bed -d -o test.matrix.txt rawdata_20000_iced.matrix

Traceback (most recent call last):
  File "/HiCPro/HiC-Pro_2.7.7/bin/utils/sparseToDense.py", line 35, in <module>
    counts = counts.toarray()
  File "/.local/lib/python2.7/site-packages/scipy/sparse/coo.py", line 274, in toarray
    B = self._process_toarray_args(order, out)
  File "/.local/lib/python2.7/site-packages/scipy/sparse/base.py", line 793, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError

I guess it could be due to memory error.
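
A back-of-the-envelope estimate supports the memory explanation. The sketch below is only an illustration, with hypothetical helper names. It assumes the dense matrix is stored as 8-byte floats, which is NumPy's default float64 and may not match the dtype the script actually uses.

```python
def dense_matrix_bytes(n_bins, bytes_per_value=8):
    """Memory needed for a dense n_bins x n_bins matrix.

    bytes_per_value=8 assumes float64 storage (an assumption, not
    something taken from sparseToDense.py itself).
    """
    return n_bins * n_bins * bytes_per_value


def to_gib(n_bytes):
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return n_bytes / float(1024 ** 3)


if __name__ == "__main__":
    n_bins = 155130  # line count of rawdata_20000_ord.bed reported above
    print("%.1f GiB" % to_gib(dense_matrix_bytes(n_bins)))
```

With 155130 bins this comes to 192,522,535,200 bytes, or about 179 GiB, for a single genome-wide dense matrix at 20 kb. A 1 Mb map of the same genome has on the order of 3,100 bins and needs well under 100 MB, which is consistent with the 1000kb run working.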

When I tried with 1000kb ones, it worked ;)

python /HiCPro/HiC-Pro_2.7.7/bin/utils/sparseToDense.py -b rawdata_1000000_ord.bed -d -o test.matrix.txt rawdata_1000000_iced.matrix

However, the test.matrix.txt from this run gives an error at the HMM step of the domain caller (Dixon et al., 2012):
matlab < HMM_calls.m > dumpfile
Error using chol
Matrix must be positive definite.

Error in gaussLogprob (line 52)
R = chol(Sigma);

Error in mixGaussInferLatent (line 17)
logPz(:, k) = logMix(k) + gaussLogprob(mu(:, k), Sigma(:, :, k), X);

Error in mixGaussFit>estep (line 53)
[weights, ll] = mixGaussInferLatent(model, data);

Error in emAlgo (line 60)
[ess, ll] = estep(model, data);

Error in mixGaussFit (line 26)
[model, loglikHist] = emAlgo(model, data, initFn, @estep, @mstep , ...

I think this is because the test.matrix.txt from HiC-Pro (after running sparseToDense.py) has a different distribution from the example test dataset that the Bing Ren group uploaded at http://chromosome.sdsc.edu/mouse/hi-c/download.html - I was able to call TADs on the example test dataset from this link using the domain caller.

I also tested with the example test data you uploaded at https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz

I was able to make test.matrix.txt for the 1000000 bin size, but got the same errors as above at the HMM step of the domain caller.


nservant

Jun 20, 2016, 12:54:42 PM
to HiC-Pro

OK, it took me some time, but I just did it and it worked.
Actually, using this code is not that easy, so here is what I did:


0- I'm in the iced/1000000 folder, and I used the test dataset

0bis - I'm not a Matlab expert, but it seems that the libraries or modules need to be loaded from the current folder. So I created a link to the required modules from the domains_call software itself!
ln -s Apps/domains_call/domaincall_software/required_modules/ require_modules


1- Convert to a dense matrix. This creates the file dixon_2M_1000000_iced_dense.matrix
Apps/HiC-Pro_2.7.7/bin/utils/sparseToDense.py dixon_2M_1000000_iced.matrix -b ../../raw/1000000/dixon_2M_1000000_abs.bed -d


2- Create the DI file
Apps/domains_call/domaincall_software/perl_scripts/DI_from_matrix.pl dixon_2M_1000000_iced_dense.matrix 1000000 2000000 Apps/HiC-Pro_2.7.7/annotation/chrom_hg19.sizes > dixon_2M_1000000_iced_dense.di


3- Then, we need a couple of additional changes because the domains caller does not like chrX, Y and M
grep -v "M" dixon_2M_1000000_iced_dense.di | grep -v "Y" | sed -e 's/^X/23/g' > dixon_2M_1000000_iced_dense_v2.di


4- Copy the file HMM_calls.m in the current folder and manually edit it (!!!) to set the input, ie. the v2.di file and the output ...


5- run Matlab
nice /bioinfo/opt/Matlab/bin/matlab < HMM_calls_iced_test.m
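
To make steps 2 and 3 less of a black box, here is a rough Python sketch of what they compute. It is not the code of DI_from_matrix.pl. The first function follows the directionality index formula published in Dixon et al. (2012). The second mirrors the grep/sed line of step 3 but matches only on the first column. All function names are hypothetical. The sketch assumes that the two numeric arguments in step 2 are the bin size and the window size (so the window is 2 bins at 1 Mb), and that the DI file is tab-separated with a bare chromosome name in the first column, as the sed pattern suggests.

```python
def directionality_index(matrix, window_bins):
    """DI per bin for one chromosome, from a square contact matrix.

    A = contacts with the window_bins upstream bins,
    B = contacts with the window_bins downstream bins,
    E = (A + B) / 2,
    DI = sign(B - A) * ((A - E)**2 / E + (B - E)**2 / E).
    """
    n = len(matrix)
    di = []
    for i in range(n):
        lo = max(0, i - window_bins)
        hi = min(n, i + window_bins + 1)
        a = sum(matrix[i][lo:i])       # upstream contacts
        b = sum(matrix[i][i + 1:hi])   # downstream contacts
        e = (a + b) / 2.0
        if e == 0 or a == b:
            di.append(0.0)             # no bias, or no signal at all
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


def filter_di_lines(lines):
    """Drop chrM/chrY rows and rename X to 23 (human numbering)."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] in ("M", "Y"):
            continue
        if fields[0] == "X":
            fields[0] = "23"
        kept.append("\t".join(fields))
    return kept
```

Unlike `grep -v "M"`, which drops any line containing an M anywhere, the filter only looks at the chromosome column. The value 23 is specific to human; another genome would need a different number for X.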

Nelle Varoquaux

Jun 20, 2016, 5:46:04 PM
to nservant, HiC-Pro
I wouldn't be surprised if there is an off-by-one error somewhere in the code (I'd be responsible for that one…).
I'll check this out and come back to you.
Thanks,
N


liyufan...@gmail.com

Jul 9, 2018, 5:06:36 AM
to HiC-Pro
Hello,
I am calling TADs using the Ren lab method, and I ran into a problem at the first step. For matrices at resolutions finer than 150k, such as 40k, 20k, or even 5k, the sparseToDense.py step of HiC-Pro always fails. Matrices at 500k and 1000k do not have this problem.
Here is the error:

 /data/users/robots/software/HiC-Pro/HiC-Pro_2.10.0/bin/utils/sparseToDense.py dixon_2M_4000_iced.matrix -b /data/users/liyufang/HiC_Pro_test/hicpro_latest_test/hic_results/data/matrix/dixon_2M/raw/4000/dixon_2M_4000_abs.bed -d

Traceback (most recent call last):
  File "/data/users/robots/software/HiC-Pro/HiC-Pro_2.10.0/bin/utils/sparseToDense.py", line 87, in <module>
    counts = counts.toarray()
  File "/data/users/robots/software/python/Python-2.7.12/lib/python2.7/site-packages/scipy/sparse/coo.py", line 259, in toarray
    B = self._process_toarray_args(order, out)
  File "/data/users/robots/software/python/Python-2.7.12/lib/python2.7/site-packages/scipy/sparse/base.py", line 1130, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)

Could you help me to solve the problem? Thank you very much! 



On Tuesday, June 21, 2016 at 12:54:42 AM UTC+8, nservant wrote:

nservant

Jul 9, 2018, 7:38:48 AM
to HiC-Pro
I think that this is a memory issue. To generate a dense matrix at 20/40 kb resolution, you need several GB of RAM ...
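
One way to keep memory down is to build the dense matrix one chromosome at a time instead of genome-wide, since the domain caller works per chromosome anyway. The sketch below is not a HiC-Pro utility; the function names are hypothetical. It assumes the fourth BED column is the 1-based bin index, that each chromosome's bins are contiguous, and that the sparse file stores each symmetric pair only once (upper triangle), which is why values are mirrored.

```python
def read_bin_ranges(bed_lines):
    """Map chromosome -> (first_index, last_index), 1-based and inclusive."""
    ranges = {}
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, idx = fields[0], int(fields[3])
        first, last = ranges.get(chrom, (idx, idx))
        ranges[chrom] = (min(first, idx), max(last, idx))
    return ranges


def dense_for_chromosome(matrix_lines, first, last):
    """Dense intra-chromosomal matrix for bins first..last (inclusive)."""
    n = last - first + 1
    dense = [[0.0] * n for _ in range(n)]
    for line in matrix_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        i, j, value = int(fields[0]), int(fields[1]), float(fields[2])
        if first <= i <= last and first <= j <= last:
            dense[i - first][j - first] = value
            dense[j - first][i - first] = value  # mirror the upper triangle
    return dense
```

Nested lists are used here only to keep the sketch dependency-free; for real data a NumPy array would be far more compact. As a rough guide, hg19 chr1 at 20 kb is about 12,500 bins, so its dense float64 matrix is around 1.2 GB rather than the roughly 180 GiB of the genome-wide matrix.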
Nicolas