Selecting multiple values from one level of a MultiIndex


Brendan Barnwell

Nov 2, 2013, 8:48:35 PM
to pyd...@googlegroups.com
I have a DataFrame with a MultiIndex.  I want to select all the elements where the first level of the MultiIndex is one of a list of specified values.  `loc` seems to be the way to do this, but it is giving me strange behavior.  This is with the dev version from GitHub.  Here's an example DataFrame:

    >>> d = pandas.DataFrame({
    ...     'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    ...     'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']
    ... })
    >>> print d
       X  Y
    0  1  a
    1  1  b
    2  1  b
    3  1  c
    4  2  c
    5  2  c
    6  2  d
    7  3  d
    8  3  d
    9  4  d

If I have a single-level index, as above, I can use `loc` with a list to select multiple rows:

    >>> d.loc[[0, 2, 4]]
       X  Y
    0  1  a
    2  1  b
    4  2  c

Now I give it a MultiIndex:

    >>> d.set_index(["X", "Y"], inplace=True, drop=False)
    >>> print d
         X  Y
    X Y     
    1 a  1  a
      b  1  b
      b  1  b
      c  1  c
    2 c  2  c
      c  2  c
      d  2  d
    3 d  3  d
      d  3  d
    4 d  4  d

Selecting rows by providing both index levels works:

    >>> d.loc[1, 'b']
         X  Y
    X Y     
    1 b  1  b
      b  1  b

However, things begin to get strange if I try to pass lists of indices.  If I pass the two indices as a list, the rows seem to be missing:

    >>> d.loc[[1, 'b']]
        X    Y
    1 NaN  NaN
    b NaN  NaN

Notice that the index of the returned DataFrame mixes items from two different levels of the original DataFrame, which is weird.  It's not clear what it's doing here; if it's selecting from the first level, it ought to raise an error, because 'b' isn't a value in that level.

Passing a list of first-level values is stranger still:

    >>> d.loc[[1, 2]]
         X  Y
    X Y     
    1 b  1  b
      b  1  b

What it gives me is rows 1 and 2 *by position* in the original DataFrame, i.e., the same as d.iloc[[1, 2]]!  (I figured this out by trying other values; e.g., d.loc[[2, 6]] gives the same as d.iloc[[2, 6]].)  This is quite surprising, as the docs state vehemently that loc is ONLY for label-based indexing.

Is this a bug?  Given a sequence x, how can I index into the DataFrame to get all the rows where the first level of the MultiIndex is any of the values in x?
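For readers arriving later: in current pandas (far newer than the 2013 dev build discussed here), one way to answer the question directly is a boolean mask built from the first level. A minimal sketch, assuming the example frame above and a hypothetical list `x` of wanted first-level values:

```python
import pandas as pd

# Rebuild the example frame from the question, indexed by (X, Y).
d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
}).set_index(['X', 'Y'], drop=False)

x = [1, 4]  # hypothetical: the first-level values we want to keep

# get_level_values pulls one level out as a flat Index; isin turns it into
# a boolean mask that is True wherever that level's value appears in x.
mask = d.index.get_level_values('X').isin(x)
selected = d[mask]
print(selected)
```

Because the mask is computed from the level values themselves, it does not depend on the index being sorted or on the wanted values being contiguous.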

Jeffrey Tratner

Nov 2, 2013, 8:52:57 PM
to pyd...@googlegroups.com
You need to pass a tuple inside the list, i.e.:

    df.loc[[(1, 'b')]]

df.loc[1, 'b'] is implicitly a tuple.

Not sure about your later example.
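A minimal sketch of the tuple-in-a-list form, assuming a recent pandas release (not the 2013 dev version) and the example frame from the question:

```python
import pandas as pd

# Example frame from the question, indexed by both columns.
d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
}).set_index(['X', 'Y'], drop=False)

# Each list element is one complete (X, Y) key. Duplicate index entries
# are all returned, so (1, 'b') yields two rows.
one_key = d.loc[[(1, 'b')]]

# Several full keys can be listed at once.
two_keys = d.loc[[(1, 'a'), (4, 'd')]]
print(one_key)
print(two_keys)
```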


Jeff Reback

Nov 2, 2013, 8:56:32 PM
to pyd...@googlegroups.com
Try using xs if you don't want to specify a fully qualified tuple.

You can also pass a list of tuples to loc (as Jeff indicated).
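`xs` takes a partial key, so it can pull out everything under one first-level value. A minimal sketch, assuming current pandas and the example frame from the question; note that `xs` takes a single key here, not a list:

```python
import pandas as pd

d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
}).set_index(['X', 'Y'], drop=False)

# All rows whose first level (X) equals 1. By default xs drops the level
# it matched on, so the result is indexed by Y alone.
under_one = d.xs(1, level='X')

# drop_level=False keeps the full (X, Y) MultiIndex on the result.
under_one_full = d.xs(1, level='X', drop_level=False)
print(under_one)
print(under_one_full)
```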

Brendan Barnwell

Nov 2, 2013, 9:23:02 PM
to pyd...@googlegroups.com
Sorry if I'm being dense, but I still don't see how those approaches answer my question, or address the odd behavior.

Given the list [1, 2], how can I get all rows from the dataframe where the FIRST level of the index is one of the values in my list?  That is, the union of d.ix[1] and d.ix[2].  xs doesn't seem to accept a list of indices either.

Also, I still don't understand why d.loc[[1, 2]] returns items by numerical index, nor why d.loc[[1, 'b']] returns a DataFrame whose index is a hybrid of the two levels of the source DataFrame.

Jeffrey Tratner

Nov 2, 2013, 9:27:01 PM
to pyd...@googlegroups.com
df.loc doesn't fail if you pass it a list of items and they aren't in the index. `1` and `'b'` aren't in the Index, so it creates a new DataFrame of NaN values.  The same thing would happen if you did `df.loc[['silly', 'name']]`.
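Later pandas releases changed this: as far as I know, since pandas 1.0 `.loc` raises a `KeyError` for list-like labels missing from the index, and `.reindex` is the explicit way to get the NaN-filled result described above. A minimal sketch on a small single-level frame (the frame `df` and its labels are hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical single-level frame used only for illustration.
df = pd.DataFrame({'v': [10, 20]}, index=['a', 'b'])

try:
    df.loc[['silly', 'name']]
    raised = False
except KeyError:
    # Modern .loc refuses list-like labels that are not in the index.
    raised = True

# reindex asks for those labels explicitly and fills the gaps with NaN.
filled = df.reindex(['silly', 'name'])
print(raised)
print(filled)
```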

Jeff Reback

Nov 2, 2013, 9:28:45 PM
to pyd...@googlegroups.com

You can do

    df.loc[1:2]

If they are disjoint, then I think at the moment you would have to select them separately and then combine; there are just too many parameters to deal with a list in xs, as it needs to handle level and axis as well.

df.loc[[
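A minimal sketch of both suggestions, assuming current pandas and the example frame from the question: a label slice for a contiguous run of first-level values, and select-then-`pd.concat` for disjoint ones. Label slicing a MultiIndex expects a sorted index, hence the `sort_index()` call (the example index is already sorted).

```python
import pandas as pd

d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
}).set_index(['X', 'Y'], drop=False)
d = d.sort_index()  # label slicing on a MultiIndex expects sorted labels

# Contiguous first-level values: a label slice, inclusive at both ends.
span = d.loc[1:2]

# Disjoint first-level values: select each one, then stitch them together.
parts = [d.xs(k, level='X', drop_level=False) for k in [1, 4]]
combined = pd.concat(parts)
print(span)
print(combined)
```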

Jeff Reback

Nov 2, 2013, 9:33:14 PM
to pyd...@googlegroups.com
df.loc[[1, 2]] is really pretty ambiguous because it tries the indices on separate axes (it's possible that this is using some of the fallback integer indexing because this is a MultiIndex); I'm not sure.

You need to explicitly use a tuple:

    df.loc[(1, 'b')]

xs works similarly (but allows you to specify only a partial key).
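The partial key does not have to come from the first level. A minimal sketch, assuming current pandas and the example frame from the question, selecting on the second level with `level='Y'`:

```python
import pandas as pd

d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
}).set_index(['X', 'Y'], drop=False)

# Every row whose second level (Y) is 'd'; the matched Y level is dropped,
# leaving the result indexed by X only.
only_d = d.xs('d', level='Y')

# Same kind of partial key, but keep both index levels on the result.
b_rows = d.xs('b', level='Y', drop_level=False)
print(only_d)
print(b_rows)
```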

Jeffrey Tratner

Nov 2, 2013, 9:36:49 PM
to pyd...@googlegroups.com
We might look into the interaction between loc and MI (but it's somewhat complicated in some ways); however, the key takeaway is that, if you have a MultiIndex and you're trying to get something from it, you should use xs or pass tuples to loc.
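Later pandas releases added `pd.IndexSlice`, which is just a convenient way to build such a tuple key, with a list or slice per level. A minimal sketch, assuming current pandas and the example frame from the question:

```python
import pandas as pd

d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
}).set_index(['X', 'Y'], drop=False)
d = d.sort_index()  # MultiIndex slicing expects a sorted index

idx = pd.IndexSlice
# idx[[1, 2], :] is simply the tuple ([1, 2], slice(None)): a list of
# values for the first level, everything for the second.
key = idx[[1, 2], :]
first_level = d.loc[key, :]

# The same idea on the inner level: any X, with Y in ['c', 'd'].
inner_level = d.loc[idx[:, ['c', 'd']], :]
print(first_level)
print(inner_level)
```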

Brendan Barnwell

Nov 2, 2013, 9:38:08 PM
to pyd...@googlegroups.com
Okay, but that's contrary to the documentation, which says:

"ALL of the labels for which you ask, must be in the index or a KeyError will be raised!"

Also, the value 1 *is* in the index, and d.loc[1] works.  If d.loc[1] works and d.loc['b'] fails (which is the case), then I would expect that d.loc[[1, 'b']] should either fail (because one of the passed values is not available), or return the same as d.ix[(1, 'b')] (if it interprets the two values on separate index levels).

I still can't see how the result for d.loc[[1, 'b']] makes sense, nor why d.loc[[1, 2]] returns values by numerical index.

Jeff Reback

Nov 2, 2013, 9:45:20 PM
to pyd...@googlegroups.com
I believe it's a bug that loc falls back to integer indexing for MultiIndexes.
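This behavior has since changed: as far as I know, in current pandas a plain list passed to `.loc` on a MultiIndex is matched against the first level, so the question's `d.loc[[1, 2]]` now returns the rows whose first level is 1 or 2 rather than positional rows. A minimal sketch, assuming current pandas and the example frame, with `.iloc` shown for contrast:

```python
import pandas as pd

d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
}).set_index(['X', 'Y'], drop=False)

by_label = d.loc[[1, 2]]      # label-based: rows whose X is 1 or 2
by_position = d.iloc[[1, 2]]  # position-based: rows 1 and 2, for contrast
print(by_label)
print(by_position)
```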

dartdog

Nov 3, 2013, 2:25:29 PM
to pyd...@googlegroups.com
I seem to have a similar gap in understanding, which I'm trying to get to the bottom of on Stack Overflow here, if anyone can take a look or have a crack at it.