sparseToDense.py error: row index exceeds matrix dimensions

Matthew Denholtz

Sep 29, 2017, 9:26:23 AM
to HiC-Pro
Hi all,
I'm running into an error with the sparseToDense.py script, in full:

# Traceback (most recent call last):
#  File "/home/software/HiC-Pro/bin/utils/sparseToDense.py", line 75, in <module>
#   counts = io.load_counts(args.filename, lengths=lengths)
#  File "/usr/local/lib/python2.7/dist-packages/iced/io/_io_else.py", line 30, in load_counts
#    counts = sparse.coo_matrix((X[:, 2], (X[:, 0], X[:, 1])), shape=shape)
#  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/coo.py", line 184, in __init__
#    self._check()
#  File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/coo.py", line 230, in _check
#    raise ValueError('row index exceeds matrix dimensions')
# ValueError: row index exceeds matrix dimensions 

The min and max bin numbers in the bed file and matrix file are the same.
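The traceback comes from SciPy's `coo_matrix`, which rejects any row index that is not smaller than the number of rows in `shape`. The traceback passes `lengths=lengths` to `load_counts`, which suggests the shape is derived from the number of bins rather than from the bin IDs. If that is the case, matching minimum and maximum IDs are not enough: a chromosome whose IDs run from 500 to 599 has only 100 bins. The helper below is hypothetical (it is not part of HiC-Pro or iced) and sketches that check. The `base` parameter is also an assumption. It covers a loader that subtracts 1 from 1-based indices.

```python
import numpy as np


def check_bin_indices(bed_bin_ids, rows, cols, base=0):
    """Hypothetical diagnostic: do the matrix indices fit the implied shape?

    Assumes the dense shape is n x n with n = number of BED bins, and that
    the loader subtracts `base` from each index (0 if it does not).
    """
    n_bins = len(bed_bin_ids)
    max_index = int(max(np.max(rows), np.max(cols))) - base
    return n_bins, max_index, max_index < n_bins


# Made-up example: a chromosome whose bin IDs continue from the previous one.
bed_ids = np.arange(500, 600)        # 100 bins, IDs 500..599
rows = np.array([500, 510, 599])     # same min/max IDs as the BED file
cols = np.array([505, 550, 599])
print(check_bin_indices(bed_ids, rows, cols))  # (100, 599, False)
```

The BED file and the matrix agree on the minimum (500) and maximum (599) here, yet the largest index still falls outside a 100 x 100 matrix.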

This error is only raised for a subset of my datasets; others work fine. Has anyone dealt with this and found a solution? I'm happy to post some of the data, but I'm hoping someone else has had this problem and already has a fix.

Thanks,
Matt

nservant

Oct 4, 2017, 3:29:03 AM
to HiC-Pro
Hi Matt,
Is there anything I could use to try to reproduce your issue?
Thanks

Lorenzo Concia

Mar 26, 2018, 1:52:43 PM
to HiC-Pro
Hello

I am having the same issue. I normalized per-chromosome matrices with HiTC and was trying to export them to a file.

It works only for the first chromosome, not for the rest of the genome.
The problem seems to be in the .bed file, in the numbering of the bins.

At the moment I am trying to renumber the bins from zero, as they already are for the first chromosome.

Lorenzo Concia

Mar 26, 2018, 2:45:35 PM
to HiC-Pro
I renumbered the bins of each chromosome from 0 to N-1 (N = number of bins in the chromosome), and now it works.

Is there any way to export from HiTC in a compatible format?

Lorenzo Concia

Mar 26, 2018, 3:10:26 PM
to HiC-Pro
Here is what I did.

First, I removed the first two lines of each file, which start with "#" (tail -n +3 input > output).
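Instead of dropping a fixed number of lines with `tail -n +3`, header lines can be filtered by their leading "#". This keeps working if a file has a different number of header lines. A minimal Python sketch, assuming every header line starts with "#" and no data line does (`strip_header` is a hypothetical name):

```python
def strip_header(lines):
    """Drop header/comment lines that start with '#'.

    Assumes header lines always begin with '#' and data lines never do,
    so the number of header lines does not matter.
    """
    return [line for line in lines if not line.startswith("#")]


example = ["# header 1\n", "# header 2\n", "chr2\t0\t100000\t500\n"]
print(strip_header(example))  # ['chr2\t0\t100000\t500\n']
```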


Then, for the BED file:

awk '{ if (NR == 1 ) { Value = $4 } { print $1"\t"$2"\t"$3"\t"($4-Value) } }' input.bed > output.bed
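For readers who prefer Python, here is a sketch that does the same thing as the BED awk command. The offset is the bin ID on the first line, and it is subtracted from column 4. The function name is hypothetical. It assumes the header lines are already gone and that each line has exactly four tab-separated fields: chromosome, start, end and bin ID. It also returns the offset so the same value can be reused for the matrix.

```python
def renumber_bed(bed_lines):
    """Shift the bin IDs in column 4 so the chromosome's first bin becomes 0.

    As in the awk command, the offset is the bin ID on the first line.
    Assumes four tab-separated fields per line: chrom, start, end, bin ID.
    """
    fields = [line.rstrip("\n").split("\t") for line in bed_lines if line.strip()]
    offset = int(fields[0][3])
    out = [f"{chrom}\t{start}\t{end}\t{int(bin_id) - offset}"
           for chrom, start, end, bin_id in fields]
    return out, offset


bed = ["chr2\t0\t100000\t500\n", "chr2\t100000\t200000\t501\n"]
new_bed, offset = renumber_bed(bed)
print(offset)   # 500
print(new_bed)  # ['chr2\t0\t100000\t0', 'chr2\t100000\t200000\t1']
```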



For the sparse matrix:

awk '{ if (NR == 1 ) { Value = $1 } { print ($1-Value)"\t"($2-Value)"\t"$3 } }' input.sparse_matrix > output.sparse_matrix
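The matrix command takes its offset from the first matrix line. That only matches the BED offset if the chromosome's first bin appears in the matrix. A sparse file lists only bins with contacts, so if the first bin has none, the first line starts at a later bin and the matrix and BED end up shifted relative to each other. The sketch below avoids this by taking the offset from the BED file as a parameter. `renumber_matrix` is a hypothetical name, and the code assumes whitespace-separated `i j count` lines with the headers already removed.

```python
def renumber_matrix(matrix_lines, offset):
    """Subtract a BED-derived offset from both bin columns of a sparse matrix.

    `offset` should be the bin ID of the chromosome's first BED line, not
    the first matrix row, which may start at a later bin if the first bin
    has no contacts.
    """
    out = []
    for line in matrix_lines:
        if not line.strip():
            continue  # skip blank lines
        i, j, count = line.split()
        out.append(f"{int(i) - offset}\t{int(j) - offset}\t{count}")
    return out


mat = ["502\t502\t10.5", "502\t510\t3.2"]
print(renumber_matrix(mat, 500))  # ['2\t2\t10.5', '2\t10\t3.2']
```

In this made-up example the chromosome's first bin is 500, but its first contact is at bin 502. Taking the offset from the first matrix line would use 502 and shift every matrix index by 2 relative to the BED file.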



Good luck!

nservant

Mar 27, 2018, 4:07:11 AM
to HiC-Pro
Hi guys,
I would be happy to fix this issue, but I'm not sure I understand your point.
Is it related to sparseToDense.py or to the HiTC exportC function? Or both?
The next version of HiTC (April) has a new exportC function that allows removing the "#" header.
Could you summarize for me what's wrong with the row/col indexes?
Thanks

Lorenzo Concia

Mar 28, 2018, 9:39:59 AM
to HiC-Pro
Hello Nicholas

I think it's a compatibility issue between the two tools.

sparseToDense.py expects the bins of each chromosome to be numbered from 0, not to continue from the last bin of the previous chromosome. That's why, in my hands, only the first chromosome worked: its bins already start at 0.
So I had to subtract the number of the first bin from every bin number, so that the IDs run from 0 to N-1 (N = number of bins in that chromosome).
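This diagnosis can be reproduced directly with SciPy. `coo_matrix` requires every row and column index to be smaller than the corresponding dimension of `shape`, so bin IDs that continue from the previous chromosome, combined with a per-chromosome shape, raise the same ValueError. The bin numbers below are made up for illustration. The exact error message differs between SciPy versions, so the sketch only relies on the exception type.

```python
import numpy as np
from scipy import sparse

n_bins = 100                       # bins in this (made-up) chromosome
rows = np.array([500, 520, 599])   # IDs continue from the previous chromosome
cols = np.array([500, 530, 599])
counts = np.array([12.0, 3.0, 8.0])

# Global IDs with a per-chromosome shape: SciPy rejects the indices.
try:
    sparse.coo_matrix((counts, (rows, cols)), shape=(n_bins, n_bins))
    failed = False
except ValueError as err:
    failed = True
    print("fails:", err)  # wording depends on the SciPy version

# Subtracting the chromosome's first bin ID brings indices into 0..N-1.
offset = 500
m = sparse.coo_matrix((counts, (rows - offset, cols - offset)),
                      shape=(n_bins, n_bins))
print(m.shape, m.nnz)  # (100, 100) 3
```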


As an alternative, it would be great to be able to export single chromosomes directly in dense format, instead of exporting the sparse format and converting it to dense afterwards.
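Once a chromosome's bins are numbered from 0, the sparse-to-dense step itself is short. The sketch below uses SciPy and is not the code of HiTC or sparseToDense.py. `triplets_to_dense` is a hypothetical name. It assumes the sparse file stores only one triangle (i <= j), so with `symmetric=True` it mirrors the other triangle and counts the diagonal once. If a file already lists both triangles, pass `symmetric=False`; otherwise off-diagonal counts would be doubled.

```python
import numpy as np
from scipy import sparse


def triplets_to_dense(rows, cols, counts, n_bins, symmetric=True):
    """Build a dense n_bins x n_bins array from 0-based (i, j, count) triplets.

    Assumes only one triangle is stored; symmetric=True mirrors it and
    subtracts the diagonal once so it is not counted twice.
    """
    m = sparse.coo_matrix((counts, (rows, cols)),
                          shape=(n_bins, n_bins)).toarray()
    if symmetric:
        m = m + m.T - np.diag(np.diag(m))
    return m


dense = triplets_to_dense(np.array([0, 0, 2]), np.array([0, 2, 2]),
                          np.array([5.0, 1.0, 4.0]), n_bins=3)
print(dense)
```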

Another thing I am noticing is that the export is extremely slow: about 10 minutes for a chromosome of 500x500 bins.
I suspect it is converting from dense to sparse "on the fly" (am I right?)

Thank you