pandas index: intersection and union problem


Skipper Seabold

Sep 6, 2011, 3:24:36 PM
to pystat...@googlegroups.com
I'm trying to outer join two DataFrames whose indexes hold tuples,
and I'm running into problems with the intersection and union. Don't
worry, I made a patch.

https://github.com/jseabold/pandas/compare/master...fix-index

AFAICT, the problem comes from trying to assign a record array using
the slice notation. To get the 1d record array, you have to iterate
through the list items. You'll probably want to cythonize this
iteration I imagine, but the code works for my problems now. I don't
know of any side effects yet...

import pandas
import numpy as np

# record dtype shared by both indexes
rec_dtype = [('num', int), ('let', 'a1')]

idx1 = np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B')], dtype=rec_dtype)
idx2 = np.array([(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B'), (1, 'C'), (2, 'C')],
                dtype=rec_dtype)

idx1 = pandas.Index(idx1)
idx2 = pandas.Index(idx2)

# intersection broken?
idx1.intersection(idx2)
# needs to be 1d like idx1 and idx2
int_idx = pandas.Index(sorted(set(idx1) & set(idx2)))

# union broken
union_idx = idx1.union(idx2)
print(union_idx.shape)  # needs to be 1d like idx1 and idx2

Skipper

Wes McKinney

Sep 6, 2011, 3:27:49 PM
to pystat...@googlegroups.com

What git revision are you using?

Wes McKinney

Sep 6, 2011, 3:38:50 PM
to pystat...@googlegroups.com

I fixed this here. I think you're on a git revision prior to 8/21, when
I overhauled the intersection/union functions with some speedier
Cython code (but it had a bug with tuple handling, because asarray is
not safe for creating 1-d arrays of tuples):

https://github.com/wesm/pandas/commit/1f0c920698b86949f448d6fae4f84ed50bd79725
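The asarray remark can be illustrated with a minimal sketch. It is not the code from the commit above; the helper name `tuples_to_1d` is hypothetical, and the behaviour shown is that of a present-day NumPy. Converting a list of equal-length tuples with `np.asarray` gives a 2-d array, while filling a preallocated object array one element at a time keeps each tuple as a single item.

```python
import numpy as np

tuples = [(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B')]

# asarray treats each tuple as a row, so the result is 2-d
as_2d = np.asarray(tuples, dtype=object)


def tuples_to_1d(seq):
    """Hypothetical helper: build a 1-d object array whose items are the tuples."""
    out = np.empty(len(seq), dtype=object)
    for i, item in enumerate(seq):
        # integer indexing stores the tuple itself in slot i
        out[i] = item
    return out


one_d = tuples_to_1d(tuples)
print(as_2d.shape)
print(one_d.shape)
```

The Python-level loop is the slow part, which is presumably why the thread mentions moving this kind of iteration into Cython.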

BTW the new GitHub web style just freaked me out a little bit (too
much coffee?). Who moved my cheese!!??

Also, if you're using tuples for your index, why not go full blown
hierarchical index?
http://pandas.sourceforge.net/indexing.html#hierarchical-indexing-multiindex
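As a rough sketch of the hierarchical-index route, the example below uses the present-day pandas API (`pandas.MultiIndex.from_tuples`, `DataFrame.join`), which may differ from what was available at the time of this thread. The column names `x` and `y` and the level names are made up for illustration.

```python
import pandas as pd

labels1 = [(1, 'A'), (2, 'A'), (1, 'B'), (2, 'B')]
labels2 = labels1 + [(1, 'C'), (2, 'C')]

# Two-level indexes instead of flat indexes of tuples
mi1 = pd.MultiIndex.from_tuples(labels1, names=['num', 'let'])
mi2 = pd.MultiIndex.from_tuples(labels2, names=['num', 'let'])

common = mi1.intersection(mi2)   # labels present in both
combined = mi1.union(mi2)        # labels present in either

# Outer join of two frames that carry these indexes
df1 = pd.DataFrame({'x': range(len(mi1))}, index=mi1)
df2 = pd.DataFrame({'y': range(len(mi2))}, index=mi2)
joined = df1.join(df2, how='outer')

print(len(common), len(combined), len(joined))
```

Rows that exist only in `df2` get missing values in the `x` column after the outer join.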

- W

Skipper Seabold

Sep 6, 2011, 3:42:31 PM
to pystat...@googlegroups.com
On Tue, Sep 6, 2011 at 3:27 PM, Wes McKinney <wesm...@gmail.com> wrote:

master as of this morning

Skipper

Skipper Seabold

Sep 6, 2011, 3:44:41 PM
to pystat...@googlegroups.com

Slightly afraid of being too bleeding edge?

Skipper

Skipper Seabold

Sep 7, 2011, 6:57:39 PM
to pystat...@googlegroups.com
On Tue, Sep 6, 2011 at 3:24 PM, Skipper Seabold <jsse...@gmail.com> wrote:

Apparently this was a numpy problem. Something changed in how object
arrays were handled. Current pandas master does not work with numpy
'2.0.0.dev-a1e7be3' (though my patch fixes the problem with this
version), but current pandas master works with recent master of numpy
'2.0.0.dev-900d82e'. Probably not a problem as long as it works across
releases.

Skipper

Wes McKinney

Sep 7, 2011, 7:03:32 PM
to pystat...@googlegroups.com

Cool. I plan to just target the latest NumPy release, and we should
report any bugs to the NumPy team.

- Wes
