I have a DataFrame with a MultiIndex. I want to select all the rows where the first level of the MultiIndex is one of a list of specified values. `loc` seems to be the way to do this, but it is giving me strange behavior. I'm using the dev version from GitHub. Here's an example DataFrame:
>>> import pandas
>>> d = pandas.DataFrame({
... 'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
... 'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']
... })
>>> print(d)
   X  Y
0  1  a
1  1  b
2  1  b
3  1  c
4  2  c
5  2  c
6  2  d
7  3  d
8  3  d
9  4  d
If I have a single-level index, as above, I can use `loc` with a list to select multiple rows:
>>> d.loc[[0, 2, 4]]
   X  Y
0  1  a
2  1  b
4  2  c
Now I give it a MultiIndex:
>>> d.set_index(["X", "Y"], inplace=True, drop=False)
>>> print(d)
     X  Y
X Y
1 a  1  a
  b  1  b
  b  1  b
  c  1  c
2 c  2  c
  c  2  c
  d  2  d
3 d  3  d
  d  3  d
4 d  4  d
Selecting rows by providing both index levels works:
>>> d.loc[1, 'b']
     X  Y
X Y
1 b  1  b
  b  1  b
However, things get strange when I pass a list. If I pass the two index values as a list, the result has the wrong index and contains only NaN:
>>> d.loc[[1, 'b']]
     X    Y
1  NaN  NaN
b  NaN  NaN
Notice that the index of the returned DataFrame mixes values from two different levels of the original DataFrame, which is weird. It's not clear what it's doing here; if it's selecting from the first level, it ought to raise an error, because 'b' isn't a value in that level. Passing a list of values that do all exist in the first level is stranger still:
>>> d.loc[[1, 2]]
     X  Y
X Y
1 b  1  b
  b  1  b
What it gives me now is rows 1 and 2 *by position* from the original DataFrame, that is, the same result as `d.iloc[[1, 2]]`! (I figured this out by trying other values; for example, `d.loc[[2, 6]]` gives the same result as `d.iloc[[2, 6]]`.) This is quite surprising, as the docs state vehemently that `loc` is ONLY for label-based indexing.
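The comparison I ran by hand can be scripted. This is only a sketch: it rebuilds the example frame and reports whether `loc` with a list of integers matches `iloc` with the same list, without assuming which answer a given pandas version returns. The variable names are just illustrative.

```python
import pandas as pd

# Rebuild the example frame with the two-level index used above.
d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
})
d = d.set_index(['X', 'Y'], drop=False)

by_label = d.loc[[1, 2]]      # should select by first-level label
by_position = d.iloc[[1, 2]]  # always selects rows 1 and 2 by position

# If this prints True, loc is behaving positionally, as described above.
same_as_positional = bool(by_label.equals(by_position))
print(same_as_positional)
```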
Is this a bug? Given a sequence `x`, how can I index into the DataFrame to get all the rows where the first level of the MultiIndex is any of the values in `x`?
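To make the goal concrete, here is a minimal sketch of the result I'm after, built with a boolean mask rather than `loc`. It assumes `get_level_values` and `isin` are available on the index in the installed pandas; the name `x` and its contents are just illustrative.

```python
import pandas as pd

# Rebuild the example frame with the two-level index used above.
d = pd.DataFrame({
    'X': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    'Y': ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd'],
})
d = d.set_index(['X', 'Y'], drop=False)

x = [1, 2]  # illustrative list of first-level values to keep

# True for each row whose first index level is one of the values in x.
mask = d.index.get_level_values(0).isin(x)
selected = d[mask]
print(selected)
```

With `x = [1, 2]`, this should keep the seven rows whose first level is 1 or 2 and drop the rest. It works around the problem rather than explaining what `loc` is doing.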