larry indexing breaks for dtype=object due to numpy isscalar call

25 views
Skip to first unread message

Thomas Coffee

unread,
Jan 27, 2012, 6:37:05 PM1/27/12
to labele...@googlegroups.com
For instance, the following object-dtype larry does not index properly:

>>> class C:
... pass
...
>>> c = larry([C()])
>>> c[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "la/deflarry.py", line 1719, in __getitem__
return larry(x, label)
File "la/deflarry.py", line 116, in __init__
raise ValueError, '0d larrys are not supported'
ValueError: 0d larrys are not supported


Whereas a numerical larry does:

>>> z = larry([0])
>>> z[0]
0


I believe the reason is that the larry.__getitem__ implementation
determines the dimension of the result of the indexing call by calling
numpy.isscalar on the result, which returns False for a generic object
(deflarry.py: line 1717).

I am not sure what is necessary to ensure the correct behavior always,
but in my case, the problem can be fixed by inserting the lines

if self.ndim <= 1:
return x

before line 1645, at the end if the clause "if isscalar(index):" in
the definition of larry.__getitem__.

Keith Goodman

unread,
Jan 27, 2012, 7:28:31 PM1/27/12
to labele...@googlegroups.com

Great bug find. Thank you.

Replacing the last if statement in __getitem__ with

if not isinstance(x, np.ndarray):
return x

seems to work and might be cleaner than adding a second if statement.
Plus is makes __getitem__ a tiny bit faster.

Here's the bug on the issue tracker: https://github.com/kwgoodman/la/issues/38

Keith Goodman

unread,
Jan 27, 2012, 7:56:29 PM1/27/12
to labele...@googlegroups.com

Thomas Coffee

unread,
Jan 28, 2012, 11:45:45 PM1/28/12
to labele...@googlegroups.com
The solution you suggest was my first thought also, but it can fail
for a larry of dtype=object when an element of the larry is an
instance of np.ndarray. This does not seem far-fetched to me, since in
my own case I am using larrys with elements of a type derived from
larry, and could easily imagine using larrys with elements of a type
derived from np.ndarray.

To handle correctly in all cases, it seems to me necessary to
explicitly consider how the index affects the dimensionality of the
result, i.e., whether the index is scalar-like rather than array-like.
As far as I can tell, the addition I've proposed to the "if
isscalar(index)" clause makes the final "if np.isscalar(x)" clause
unnecessary, thus still only one if statement would be required.
However, while I believe removing the latter clause would leave the
behavior consistent with my expectations, I'm not entirely certain it
would not change the behavior of any existing code (I don't know
enough about np.isscalar), so I thought it might be safer to leave
both.

- Thomas


On Sat, Jan 28, 2012 at 3:22 PM, <labele...@googlegroups.com> wrote:
>   Today's Topic Summary
>
> Group: http://groups.google.com/group/labeled-array/topics
>
> larry indexing breaks for dtype=object due to numpy isscalar call [3
> Updates]
>
>  larry indexing breaks for dtype=object due to numpy isscalar call
>
> Thomas Coffee <thomas...@gmail.com> Jan 27 06:37PM -0500

> Keith Goodman <kwgo...@gmail.com> Jan 27 04:28PM -0800


>
>>                return x
>
>> before line 1645, at the end if the clause "if isscalar(index):" in
>> the definition of larry.__getitem__.
>
> Great bug find. Thank you.
>
> Replacing the last if statement in __getitem__ with
>
> if not isinstance(x, np.ndarray):
> return x
>
> seems to work and might be cleaner than adding a second if statement.
> Plus is makes __getitem__ a tiny bit faster.
>
> Here's the bug on the issue tracker:
> https://github.com/kwgoodman/la/issues/38
>
>
>

> Keith Goodman <kwgo...@gmail.com> Jan 27 04:56PM -0800


>
>
>> seems to work and might be cleaner than adding a second if statement.
>> Plus is makes __getitem__ a tiny bit faster.
>
>> Here's the bug on the issue tracker:
>> https://github.com/kwgoodman/la/issues/38
>
> I pushed a bug fix:
> https://github.com/kwgoodman/la/commit/fce96d51dafed38cfc91fb67ba26f78a11494191
>
>
>

> You received this message because you are subscribed to the Google Group
> labeled-array.
> You can post via email.
> To unsubscribe from this group, send an empty message.
> For more options, visit this group.

--
______________________________________

Thomas Coffee
Massachusetts Institute of Technology
Department of Aeronautics & Astronautics
Space Systems Laboratory
617.549.5492

Please avoid sending attachments in proprietary formats like Word or PowerPoint.
For more information, see http://www.gnu.org/philosophy/no-word-attachments.html
______________________________________

Keith Goodman

unread,
Jan 29, 2012, 12:52:53 PM1/29/12
to labele...@googlegroups.com
On Sat, Jan 28, 2012 at 8:45 PM, Thomas Coffee <thomas...@gmail.com> wrote:
> The solution you suggest was my first thought also, but it can fail
> for a larry of dtype=object when an element of the larry is an
> instance of np.ndarray. This does not seem far-fetched to me, since in
> my own case I am using larrys with elements of a type derived from
> larry, and could easily imagine using larrys with elements of a type
> derived from np.ndarray.

Thanks for the bug fix review. It is very helpful.

I did think of the case of an array of arrays but did not think that
that was used. But makes sense to try to handle that case.

> To handle correctly in all cases, it seems to me necessary to
> explicitly consider how the index affects the dimensionality of the
> result, i.e., whether the index is scalar-like rather than array-like.
> As far as I can tell, the addition I've proposed to the "if
> isscalar(index)" clause makes the final "if np.isscalar(x)" clause
> unnecessary, thus still only one if statement would be required.
> However, while I believe removing the latter clause would leave the
> behavior consistent with my expectations, I'm not entirely certain it
> would not change the behavior of any existing code (I don't know
> enough about np.isscalar), so I thought it might be safer to leave
> both.

Your solution works for a 1d array of object dtype with ndarray elements:

>> a = np.empty(2, dtype=object)
>> a[0] = np.array([1, 2, 3])
>> a[1] = np.array([4, 5, 6])
>> lar = la.larry(a)
>> lar[0]
array([1, 2, 3])

But not for the 2d or higher case:

>> a = np.empty((2, 1), dtype=object)
>> a[0,0] = np.array([1, 2, 3])
>> a[1,0] = np.array([4, 5, 6])
>> lar = la.larry(a)
>> lar[0,0]
ValueError: Exactly one label per dimension needed

In the 2d case the index is a tuple of scalars. I guess that means
we'd have to check for that in the for loop over the tuple elements.

Adding that check allowed me to remove the final if statement in getitem.

Here's the commit:
https://github.com/kwgoodman/la/commit/5fe90b2bc46720ded0003b6088efa5916ca91aa6

Reply all
Reply to author
Forward
0 new messages