[SciPy-User] Reading / writing sparse matrices

Lutz Maibaum

Jun 18, 2010, 9:31:59 PM
to scipy...@scipy.org
How can I write a sparse matrix with elements of type uint64 to a file, and recover it while preserving the data type? For example:

>>> import numpy as np
>>> import scipy.sparse
>>> a=scipy.sparse.lil_matrix((5,5), dtype=np.uint64)
>>> a[0,0]=9876543210

Now I save this matrix to a file:

>>> import scipy.io
>>> scipy.io.mmwrite("test.mtx", a, field='integer')

If I do not specify the field argument of mmwrite, I get an "unexpected dtype of kind u" exception. The generated file test.mtx looks as expected. But when I try to read this matrix, it is converted to int32:

>>> b=scipy.io.mmread("test.mtx")
>>> b.dtype
dtype('int32')
>>> b.data
array([-2147483648], dtype=int32)

As far as I can tell, it is not possible to specify a dtype when calling mmread. Is there a better way to go about this?
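
One possible workaround, if the Matrix Market format itself is not a requirement, is SciPy's own .npz container, which stores the data array together with its dtype. The sketch below assumes a SciPy release that provides scipy.sparse.save_npz and scipy.sparse.load_npz (both were added well after this thread was written). The matrix is built directly in COO form because save_npz only accepts the csc, csr, bsr, dia and coo formats; a LIL matrix would have to be converted first, for example with tocoo(). The file name is arbitrary.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Same content as the matrix in the question: a 5x5 uint64 matrix
# with a single entry at (0, 0), built directly in COO form.
data = np.array([9876543210], dtype=np.uint64)
rows = np.array([0])
cols = np.array([0])
a = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(5, 5))

# Write to a temporary directory so the example leaves no files behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), "test.npz")
scipy.sparse.save_npz(path, a)

# load_npz restores the sparse format, the shape and the dtype.
b = scipy.sparse.load_npz(path)
print(b.dtype, b.data[0])
```

The trade-off is portability: an .npz file is specific to NumPy/SciPy, whereas a Matrix Market file can be read by many other tools.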

Any help is much appreciated.

Lutz

_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Matthew Brett

Nov 11, 2010, 8:27:37 PM
to SciPy Users List
Hi,

I had a quick look at the code, and then at the Matrix Market format,
and it looks to me:

http://math.nist.gov/MatrixMarket/reports/MMformat.ps.gz

as if Matrix Market only allows integer, real or complex - hence the
(somewhat unhelpful) error.

Best,

Matthew

Lutz Maibaum

Nov 11, 2010, 10:36:37 PM
to SciPy Users List
On Nov 11, 2010, at 5:27 PM, Matthew Brett wrote:
> I had a quick look at the code, and then at the Matrix Market format,
> and it looks to me:
>
> http://math.nist.gov/MatrixMarket/reports/MMformat.ps.gz
>
> as if Matrix Market only allows integer, real or complex - hence the
> (somewhat unhelpful) error.

Yes, the Matrix Market file format has only these 3 types, and scipy.io.mmwrite (actually, scipy.io.mmio.MMFile._write) has to guess which of these to use for a given dtype:

    if field is None:
        kind = a.dtype.kind
        if kind == 'i':
            field = 'integer'
        elif kind == 'f':
            field = 'real'
        elif kind == 'c':
            field = 'complex'
        else:
            raise TypeError('unexpected dtype kind ' + kind)

It would be nice if this algorithm were extended to handle unsigned integers as well (they seem to have kind == 'u', but I'm not sure whether that check is both necessary and sufficient); these could also translate to "integer" in the MM file.
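
The extension suggested here could look roughly like the following. This is only a sketch: guess_field is a hypothetical standalone helper, not the actual scipy.io code, and it assumes that treating kind 'u' (NumPy's kind code for unsigned integers) the same way as kind 'i' is acceptable.

```python
import numpy as np


def guess_field(dtype):
    """Map a NumPy dtype to a Matrix Market field name (hypothetical helper)."""
    kind = np.dtype(dtype).kind
    if kind in ('i', 'u'):
        # Signed and unsigned integers both become 'integer' in the file.
        return 'integer'
    elif kind == 'f':
        return 'real'
    elif kind == 'c':
        return 'complex'
    else:
        raise TypeError('unexpected dtype kind ' + kind)


print(guess_field(np.uint64))
print(guess_field(np.float32))
```

Note that this only settles the writing side; a value written this way can still exceed the integer type that the reader chooses.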

The opposite problem occurs when the file is read by mmread, which has to figure out how to translate the three Matrix Market types to Python's numeric types. Using the system's default types for int, float and complex is very reasonable, but it would be nice if one could override this default by specifying an optional dtype argument (as is used, for example, by numpy.loadtxt).
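
To illustrate what a caller-specified dtype could look like, here is a minimal sketch of a hand-written reader. read_mm_coordinate is a hypothetical helper, not part of scipy.io. It assumes a coordinate-format file with the integer field and general symmetry, and it ignores everything else the format allows.

```python
import io

import numpy as np
import scipy.sparse


def read_mm_coordinate(stream, dtype):
    """Read an 'integer general' coordinate file into a COO matrix of `dtype`."""
    header = [token.lower() for token in stream.readline().split()]
    expected = ["%%matrixmarket", "matrix", "coordinate", "integer", "general"]
    if header[:5] != expected:
        raise ValueError("this sketch only handles coordinate/integer/general files")

    # Skip comment lines and blank lines before the size line.
    line = stream.readline()
    while line and (line.startswith("%") or not line.strip()):
        line = stream.readline()
    if not line:
        raise ValueError("missing size line")
    nrows, ncols, nnz = (int(token) for token in line.split())

    rows = np.empty(nnz, dtype=np.intp)
    cols = np.empty(nnz, dtype=np.intp)
    vals = np.empty(nnz, dtype=dtype)
    for k in range(nnz):
        i, j, v = stream.readline().split()
        rows[k] = int(i) - 1  # Matrix Market indices are 1-based
        cols[k] = int(j) - 1
        vals[k] = int(v)      # parse as an exact Python int, then store as `dtype`
    return scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(nrows, ncols))


text = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "% example file\n"
    "5 5 1\n"
    "1 1 9876543210\n"
)
m = read_mm_coordinate(io.StringIO(text), np.uint64)
print(m.dtype, m.data[0])
```

Parsing each value as a Python int first avoids any intermediate fixed-width or floating-point type, so the requested dtype is the only place where the value's range matters.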

Thanks for looking into this,

Lutz

Matthew Brett

Nov 11, 2010, 11:58:57 PM
to SciPy Users List
Hi,

On Thu, Nov 11, 2010 at 7:36 PM, Lutz Maibaum <lutz.m...@gmail.com> wrote:
> Yes, the Matrix Market file format has only these 3 types, and scipy.io.mmwrite (actually, scipy.io.mmio.MMFile._write) has to guess which of these to use for a given dtype:

...


> It would be nice if this algorithm were extended to handle unsigned integers as well (they seem to have kind == 'u', but I'm not sure whether that check is both necessary and sufficient); these could also translate to "integer" in the MM file.

> The opposite problem occurs when the file is read by mmread, which has to figure out how to translate the three Matrix Market types to Python's numeric types. Using the system's default types for int, float and complex is very reasonable, but it would be nice if one could override this default by specifying an optional dtype argument (as is used, for example, by numpy.loadtxt).

The problem I can see is that this would be confusing:

a=scipy.sparse.lil_matrix((5,5), dtype=np.uint64)
a[0,0]=9876543210

mmwrite(fname, a)
res = mmread(fname)


res.data
array([-2147483648], dtype=int32)

That is, I think the writer shouldn't silently write something that
will be read back incorrectly by default. So, how about a
compromise:

In [7]: mmwrite(fname, a)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
...
TypeError: Will not write unsigned integers by default. Please pass
field="integer" to write unsigned integers
In [8]: mmwrite(fname, a, field='integer')
In [9]: res = mmread(fname, dtype=np.uint64)
In [11]: res.todense()[0,0]
Out[11]: 9876543210

?

Best,

Matthew

Lutz Maibaum

Nov 12, 2010, 12:33:16 AM
to SciPy Users List
On Thu, Nov 11, 2010 at 8:58 PM, Matthew Brett <matthe...@gmail.com> wrote:
> The problem I can see is that this would be confusing:
>
> a=scipy.sparse.lil_matrix((5,5), dtype=np.uint64)
> a[0,0]=9876543210
> mmwrite(fname, a)
> res = mmread(fname)
> res.data
> array([-2147483648], dtype=int32)
>
> That is, I think the writer shouldn't silently write something that
> will be read back incorrectly by default. So, how about a
> compromise:
>
> In [7]: mmwrite(fname, a)
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> ...
> TypeError: Will not write unsigned integers by default. Please pass
> field="integer" to write unsigned integers
> In [8]: mmwrite(fname, a, field='integer')
> In [9]: res = mmread(fname, dtype=np.uint64)
> In [11]: res.todense()[0,0]
> Out[11]: 9876543210

That's one possibility, but I find it somewhat odd that this would
generate an exception when the matrix is being saved, even though
there is no ambiguity at this stage. It also wouldn't eliminate the
potential for confusion if someone tries to load a matrix that they
didn't save themselves, but got from some other source.

Are there other situations where the automated conversion from mmread
may cause problems? For example, reading a matrix with 64-bit integers
on a system where the default int dtype is only 32 bit?

I think it would be ideal if mmread would generate a warning or throw
an exception if the numerical value of the generated integer does not
coincide with the string that has been read from the file. I don't know
if that is feasible. Alternatively, one could store additional
information about the integer data type in the Matrix Market header
section as a comment.
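
As a rough illustration of the per-value check described above: the string from the file can be parsed into an exact Python int and compared against the range of the target dtype. fits_in_dtype is a hypothetical helper name, and whether such a check is cheap enough inside mmread is left open here.

```python
import numpy as np


def fits_in_dtype(text, dtype):
    """Return True if the integer literal `text` is representable in `dtype`."""
    info = np.iinfo(dtype)  # integer limits of the target dtype
    value = int(text)       # Python ints are arbitrary precision, so this is exact
    return info.min <= value <= info.max


for dtype in (np.int32, np.int64, np.uint64):
    print(np.dtype(dtype).name, fits_in_dtype("9876543210", dtype))
```

A reader could use such a test to warn or raise instead of silently storing a different number.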

I understand that these solutions would require much more thought.
Your solution would be a nice initial patch.

Thanks,

Lutz

Matthew Brett

Nov 12, 2010, 3:46:17 AM
to SciPy Users List
Hi,

Yes, of course, we can't protect people from getting unexpected or
wrong results in that case.

> Are there other situations where the automated conversion from mmread
> may cause problems? For example, reading a matrix with 64-bit integers
> on a system where the default int dtype is only 32 bit?

Yes, the general case is saving anything that cannot be read back into
the default integer on the system on which the file is being read.
So, for example, uint16 is always safe.
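
Whether a given dtype is safe in this sense can be checked with numpy.can_cast, which by default reports whether every value of one dtype fits into another. A small sketch, where int32 and int64 stand in for the possible default integer types of the reading system:

```python
import numpy as np

# can_cast uses the 'safe' casting rule by default: it returns True only
# if every value of the source dtype is representable in the target dtype.
for source in (np.uint16, np.uint32, np.uint64):
    for target in (np.int32, np.int64):
        print(np.dtype(source).name, "->", np.dtype(target).name,
              np.can_cast(source, target))

# The default integer dtype of the system running this code.
default_int = np.dtype(int)
print("default int on this system:", default_int.name)
```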

> I think it would be ideal if mmread would generate a warning or throw
> an exception if the numerical value of the generated integer does not
> coincide with the string that has been read from the file. I don't know
> if that is feasible. Alternatively, one could store additional
> information about the integer data type in the Matrix Market header
> section as a comment.

I don't know the code well enough to know if it's practical to check
every value for integer overflow. The comment idea sounds reasonable.
I'll have a look. I had already implemented the suggestion I sent
you before, so at least that should go in, unless we come up with
something better.
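
For what the comment idea might look like on the reading side, here is a minimal sketch. The "% dtype: uint64" line is a made-up convention used only for illustration; it is not part of the Matrix Market format or of scipy.io, and dtype_from_comments is a hypothetical helper.

```python
import numpy as np

DTYPE_TAG = "dtype:"  # hypothetical marker inside a comment line


def dtype_from_comments(lines, default=np.dtype(int)):
    """Return the dtype named in a '% dtype: ...' comment, else `default`."""
    for line in lines:
        if not line.startswith("%"):
            break  # comments come before the size line, so stop here
        body = line.lstrip("%").strip()
        if body.startswith(DTYPE_TAG):
            return np.dtype(body[len(DTYPE_TAG):].strip())
    return default


lines = [
    "%%MatrixMarket matrix coordinate integer general",
    "% dtype: uint64",
    "5 5 1",
    "1 1 9876543210",
]
print(dtype_from_comments(lines))
```

Readers that do not know the convention would simply treat the line as an ordinary comment, so such files would stay valid Matrix Market files.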

Cheers,

Matthew

Lutz Maibaum

Nov 12, 2010, 12:30:14 PM
to SciPy Users List
On Fri, Nov 12, 2010 at 12:46 AM, Matthew Brett <matthe...@gmail.com> wrote:
>  I'll have a look.  I had already implemented the suggestion I sent
> you before, so at least that should go in, unless we come up with
> something better.

That sounds good. Perhaps more important than automatic type detection
in mmread would be the possibility for the caller to explicitly
request a specific type, similar to what numpy.loadtxt provides.

Thanks,

Lutz
