create a MultiIndex dataframe from a list of tuples?


Michael

Nov 24, 2014, 10:57:24 AM
to pyd...@googlegroups.com
Given a list of tuples like 
L = [(1,1,2,3), (1,2,3,4), (1,3,4,5)]

can I use pandas.DataFrame.from_records() to create a DataFrame with a MultiIndex built from the first two fields?
I tried from_records(), but I don't know what to pass as the index parameter; everything I tried threw an exception.

If not, I will use list comprehensions to split the list L into a MultiIndex and the column data,
but I am really curious whether it can be done in a single line with from_records().
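
For reference, the list-comprehension fallback described above could look roughly like the sketch below. It assumes pd.MultiIndex.from_tuples for building the index; the names index and data are only illustrative. Note that the remaining columns end up labelled 0 and 1 here, not 2 and 3.

```python
import pandas as pd

L = [(1, 1, 2, 3), (1, 2, 3, 4), (1, 3, 4, 5)]

# The first two fields of each tuple become the two index levels.
index = pd.MultiIndex.from_tuples([t[:2] for t in L])
# The remaining fields become the column data.
data = [t[2:] for t in L]

df = pd.DataFrame(data, index=index)
print(df)
```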

Wouter Overmeire

Nov 24, 2014, 11:03:10 AM
to pyd...@googlegroups.com


In [120]: df = pd.DataFrame.from_records(L).set_index([0,1])

In [121]: df
     2  3
0 1      
1 1  2  3
  2  3  4
  3  4  5

Joris Van den Bossche

Nov 24, 2014, 11:03:20 AM
to pyd...@googlegroups.com
You can just set it with set_index after the dataframe creation:

In [102]: L = [(1,1,2,3), (1,2,3,4), (1,3,4,5)]

In [104]: pd.DataFrame(L).set_index([0,1])
Out[104]:
     2  3
0 1
1 1  2  3
  2  3  4
  3  4  5

Or use the index argument in from_records with [0, 1]:

In [107]: pd.DataFrame.from_records(L, index=[0, 1])
Out[107]:
     2  3
0 1
1 1  2  3
  2  3  4
  3  4  5

Michael

Nov 24, 2014, 11:29:50 AM
to pyd...@googlegroups.com
set_index() didn't cross my mind; it's a real "d'oh" moment.

But I have a question about the second variant, because I tried exactly that, except that I also passed a columns parameter, and got an exception.
I tried:
pd.DataFrame.from_records(data=L, index=[0,1], columns='a b c d'.split())
# AssertionError: 2 columns passed, passed data had 4 columns

pd.DataFrame.from_records(data=L, index=[0,1], columns='c d'.split())
# ValueError: Shape of passed values is (4, 3), indices imply (4, 2)

Is it a bug, or did I do something wrong?

Michael

Nov 24, 2014, 11:31:57 AM
to pyd...@googlegroups.com
I pasted the exceptions the wrong way round.
It's actually:

pd.DataFrame.from_records(data=L, index=[0,1], columns='a b c d'.split())
# ValueError: Shape of passed values is (4, 3), indices imply (4, 2)

pd.DataFrame.from_records(data=L, index=[0,1], columns='c d'.split())
# AssertionError: 2 columns passed, passed data had 4 columns

Joris Van den Bossche

Nov 24, 2014, 11:47:59 AM
to pyd...@googlegroups.com
It works when using the given names in the index kwarg:

In [112]: pd.DataFrame.from_records(data=L, columns='a b c d'.split(), index=['a', 'b'])
Out[112]:
     c  d
a b
1 1  2  3
  2  3  4
  3  4  5


The reason it fails with [0, 1] is that from_records interprets those as the actual values to use as the index, not as the positions of the columns to set as the index. Selecting by position is not supported in from_records, only by exact label. Without the columns argument, [0, 1] does work, because the columns then have integer labels.
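
A small sketch of the three cases, for comparison. The variable names (by_int_labels, by_names, failed) are just for illustration, and the exact exception type and message in the failing case may differ between pandas versions, so the sketch only records the exception's name.

```python
import pandas as pd

L = [(1, 1, 2, 3), (1, 2, 3, 4), (1, 3, 4, 5)]

# No columns given: the columns are labelled 0, 1, 2, 3,
# so [0, 1] happens to match two column labels.
by_int_labels = pd.DataFrame.from_records(L, index=[0, 1])

# Columns named: the index fields have to be referred to by those names.
by_names = pd.DataFrame.from_records(
    L, columns=["a", "b", "c", "d"], index=["a", "b"]
)

# Columns named but index=[0, 1]: 0 and 1 are no longer column labels,
# so they are taken as index values, and two values do not fit three rows.
failed = None
try:
    pd.DataFrame.from_records(L, columns=["a", "b", "c", "d"], index=[0, 1])
except Exception as exc:  # exact type may vary across pandas versions
    failed = type(exc).__name__

print(by_int_labels)
print(by_names)
print(failed)
```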

Michael

Nov 24, 2014, 11:59:02 AM
to pyd...@googlegroups.com
Now it's clear.

Can anything be done to support both variants?
I'm not sure whether it's wise to open an issue for it,
but you just saw the thinking of an average programmer (me):
index=['a', 'b'] is a harder route for my brain (at least) than index=[0,1].

Joris Van den Bossche

Nov 24, 2014, 12:11:17 PM
to pyd...@googlegroups.com
Setting the index can only be done using the column labels, not the positions. Note that the [0, 1] in the one case that works are *also* labels and not positions (from_records gives a DataFrame with integer column names if no column names are specified).
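
If thinking in positions is more natural, one possible workaround is to translate the positions into labels before calling from_records. This is only a sketch; positions and index_labels are illustrative names, not part of the pandas API.

```python
import pandas as pd

L = [(1, 1, 2, 3), (1, 2, 3, 4), (1, 3, 4, 5)]
columns = "a b c d".split()
positions = [0, 1]  # positions of the fields that should form the index

# Look up the label at each position, then pass the labels to from_records.
index_labels = [columns[i] for i in positions]
df = pd.DataFrame.from_records(L, columns=columns, index=index_labels)
print(df)
```

The same lookup also works with set_index after the DataFrame has been created, since set_index likewise expects column labels.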