I just used the HiC-Pro output matrix from $PATH2/HiC-Pro/HiC.run/OUTPUT/hic_results/matrix/rawdata/iced/20000/rawdata_20000_iced.matrix
141133748 lines for rawdata_20000_iced.matrix
head rawdata_20000_iced.matrix
1 1 15
1 5 1
1 6 1
1 42 1
1 388 1
1 1378 1
1 1697 1
1 2190 1
1 2920 1
1 4955 1
155130 lines for rawdata_20000_ord.bed
head rawdata_20000_ord.bed
chr1 0 20000 1
chr1 20000 40000 2
chr1 40000 60000 3
chr1 60000 80000 4
chr1 80000 100000 5
chr1 100000 120000 6
chr1 120000 140000 7
chr1 140000 160000 8
chr1 160000 180000 9
chr1 180000 200000 10
Should I use different files? Which file from the HiC-Pro pipeline output should I use? I used human (hg19).
Thanks,
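For readers following along, here is a minimal sketch of how the two files shown above relate, assuming the usual reading of this format: each matrix line is a triplet (bin i, bin j, value), and the 4th column of the .bed file is the bin index. The function names are made up for illustration and are not part of HiC-Pro; the sample lines are copied from the post.

```python
# Sample lines copied from the post; the real files are far larger.
BED_LINES = """chr1 0 20000 1
chr1 20000 40000 2
chr1 40000 60000 3
chr1 60000 80000 4
chr1 80000 100000 5
chr1 100000 120000 6""".splitlines()

MATRIX_LINES = """1 1 15
1 5 1
1 6 1""".splitlines()


def load_bins(bed_lines):
    """Map each bin index (4th BED column) to (chrom, start, end)."""
    bins = {}
    for line in bed_lines:
        chrom, start, end, index = line.split()
        bins[int(index)] = (chrom, int(start), int(end))
    return bins


def annotate_contacts(matrix_lines, bins):
    """Return (coords of bin i, coords of bin j, value) for each triplet."""
    contacts = []
    for line in matrix_lines:
        i, j, value = line.split()
        contacts.append((bins[int(i)], bins[int(j)], float(value)))
    return contacts


bins = load_bins(BED_LINES)
contacts = annotate_contacts(MATRIX_LINES, bins)
for left, right, value in contacts:
    print("%s:%d-%d  %s:%d-%d  %s" % (left + right + (value,)))
```

Only non-zero pairs are stored, which is why the sparse file stays manageable even at 20 kb resolution.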
I am not sure why I get the error >,< I am using the iced package 0.2.2 (the one you uploaded in HiC-Pro_2.7.7/scripts/src/iced-0.2.2).
[]$ python
Python 2.7.6 (default, Dec 17 2013, 17:06:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import iced
>>> iced.__version__
'0.2.2-git'
I tried again on a fresh machine and, interestingly, I get a different error:
python /HiCPro/HiC-Pro_2.7.7/bin/utils/sparseToDense.py -b rawdata_20000_ord.bed -d -o test.matrix.txt rawdata_20000_iced.matrix
Traceback (most recent call last):
  File "/HiCPro/HiC-Pro_2.7.7/bin/utils/sparseToDense.py", line 35, in <module>
    counts = counts.toarray()
  File "/.local/lib/python2.7/site-packages/scipy/sparse/coo.py", line 274, in toarray
    B = self._process_toarray_args(order, out)
  File "/.local/lib/python2.7/site-packages/scipy/sparse/base.py", line 793, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
I guess the machine ran out of memory while building the dense 20 kb matrix.
When I tried with the 1000 kb matrix, it worked ;)
python /HiCPro/HiC-Pro_2.7.7/bin/utils/sparseToDense.py -b rawdata_1000000_ord.bed -d -o test.matrix.txt rawdata_1000000_iced.matrix
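A back-of-the-envelope check supports the memory explanation. The sketch below estimates the size of the dense array that `toarray()` tries to allocate, using the 155130 bins reported for rawdata_20000_ord.bed. The 8 bytes per entry is an assumption (a 64-bit dtype), and the 3100-bin figure for the 1 Mb matrix is only a rough guess for hg19, not a number from this thread.

```python
N_BINS_20KB = 155130        # line count of rawdata_20000_ord.bed, from the post
ASSUMED_BINS_1MB = 3100     # hypothetical: rough hg19 bin count at 1 Mb
BYTES_PER_ENTRY = 8         # assumption: 64-bit values


def dense_bytes(n_bins, itemsize=BYTES_PER_ENTRY):
    """Bytes needed for a full n_bins x n_bins array."""
    return n_bins * n_bins * itemsize


for label, n_bins in [("20 kb", N_BINS_20KB), ("1 Mb (assumed)", ASSUMED_BINS_1MB)]:
    print("%-15s %.3f GB" % (label, dense_bytes(n_bins) / 1e9))
```

Under these assumptions the 20 kb genome-wide dense matrix needs on the order of 190 GB, while the 1 Mb one fits in well under 1 GB, which matches what happened above.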
However, the test.matrix.txt from this run gives errors at the HMM step of the domain caller (Dixon et al., 2012):
matlab < HMM_calls.m > dumpfile
Error using chol
Matrix must be positive definite.
Error in gaussLogprob (line 52)
R = chol(Sigma);
Error in mixGaussInferLatent (line 17)
logPz(:, k) = logMix(k) + gaussLogprob(mu(:, k), Sigma(:, :, k), X);
Error in mixGaussFit>estep (line 53)
[weights, ll] = mixGaussInferLatent(model, data);
Error in emAlgo (line 60)
[ess, ll] = estep(model, data);
Error in mixGaussFit (line 26)
[model, loglikHist] = emAlgo(model, data, initFn, @estep, @mstep , ...
I think this is because the test.matrix.txt from HiC-Pro (after running sparseToDense.py) has a different distribution from the example test dataset that the Bing Ren group uploaded at http://chromosome.sdsc.edu/mouse/hi-c/download.html. With the example dataset from that link, I was able to call TADs using the domain caller.
I also tested with the example test data you uploaded: https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
I was able to make test.matrix.txt at the 1000000 bin size, but got the same errors as above at the HMM step of the domain caller.
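As a side note on what the `chol` error means in general: a Cholesky factorization only exists for a positive definite matrix, so it fails when a covariance estimate has zero (or undefined) variance or perfectly correlated variables. The pure-Python sketch below illustrates just that general fact. Whether a collapsed covariance is what actually happens inside the domain caller here is an assumption, not something this thread confirms.

```python
import math


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix, or raise ValueError."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                # "not pivot > 0" also rejects NaN pivots.
                if not pivot > 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def is_positive_definite(matrix):
    """True if the Cholesky factorization succeeds."""
    try:
        cholesky(matrix)
        return True
    except ValueError:
        return False


print(is_positive_definite([[2.0, 0.5], [0.5, 1.0]]))   # ordinary covariance
print(is_positive_definite([[0.0]]))                    # zero variance
print(is_positive_definite([[1.0, 1.0], [1.0, 1.0]]))   # perfectly correlated
```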
OK, it took me some time, but I just did it and it worked.
Actually, using this code is not that easy. So here is what I did:
0- I'm in the iced/1000000 folder, and I used the test dataset
0bis - I'm not a Matlab expert, but it seems that the libraries or modules need to be loaded from the current folder. So I created a link to the required modules from the domains_call software itself !!!
ln -s Apps/domains_call/domaincall_software/required_modules/ require_modules
1- Convert to a dense matrix. This creates the dense file used as input in step 2
Apps/HiC-Pro_2.7.7/bin/utils/sparseToDense.py dixon_2M_1000000_iced.matrix -b ../../raw/1000000/dixon_2M_1000000_abs.bed -d
2- Create the DI file
Apps/domains_call/domaincall_software/perl_scripts/DI_from_matrix.pl dixon_2M_1000000_iced_dense.matrix 1000000 2000000 Apps/HiC-Pro_2.7.7/annotation/chrom_hg19.sizes > dixon_2M_1000000_iced_dense.di
3- Then, we need a couple of additional changes because the domain caller does not like chrX, Y and M
grep -v "M" dixon_2M_1000000_iced_dense.di | grep -v "Y" | sed -e 's/^X/23/g' > dixon_2M_1000000_iced_dense_v2.di
4- Copy the file HMM_calls.m to the current folder and manually edit it (!!!) to set the input, i.e. the v2.di file, and the output ...
5- Run Matlab
nice /bioinfo/opt/Matlab/bin/matlab < HMM_calls_iced_test.m
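For anyone who prefers not to rely on grep patterns, step 3 could be sketched in Python as below. This is only an illustration under assumptions: it assumes the .di file is tab-separated with the chromosome name (without the "chr" prefix) in the first column, and the function name and sample rows are made up. It matches the chromosome column exactly instead of filtering on any line containing "M" or "Y".

```python
DROP = {"M", "MT", "Y"}   # chromosomes removed in step 3
RENAME = {"X": "23"}      # step 3 renames X to 23


def clean_di_lines(lines):
    """Drop M/Y rows and rename X to 23, leaving other rows unchanged."""
    cleaned = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0]
        if chrom in DROP:
            continue
        fields[0] = RENAME.get(chrom, chrom)
        cleaned.append("\t".join(fields))
    return cleaned


# Made-up rows for illustration only; these are not real DI values.
sample = [
    "1\t0\t1000000\t0.5",
    "X\t0\t1000000\t-1.2",
    "Y\t0\t1000000\t0.0",
    "M\t0\t1000000\t0.0",
]
for row in clean_di_lines(sample):
    print(row)
```

To use it on a real file, read the lines of the .di file, pass them to `clean_di_lines`, and write the result to the v2.di file that HMM_calls.m is edited to read in step 4.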