slicing dataframes MultiIndex lexsort depth error

Martin De Kauwe

Sep 23, 2013, 3:53:40 AM

to pyd...@googlegroups.com

Hi,

(Apologies if this is cross-posted; I think I first sent it to the wrong mailing list: https://groups.google.com/forum/#!topic/pystatsmodels/LL486HdjLfs)

I am reading multiple CSV files, each quite large (4383 rows x 80 columns), and merging them into one large dataframe that I use later on. With only a few of these CSV files everything works fine, but as I increase the number I run into an error I don't understand:

KeyError: 'MultiIndex lexsort depth 0, key was length 3'

The following test example reproduces it:

import pandas as pd
import numpy as np
import datetime as dt
import cPickle as pickle

model_list = ["GDAY","SDVM","LPJX"]
#model_list = ["GDAY","SDVM"]
df_list = []
key_list = []
treatment = "AMB"
exp = "AVG"

for model in model_list:
    df = pd.DataFrame(np.random.randn(4383, 80),
                      index=pd.date_range('20010101', periods=4383),
                      columns=['YEAR','DOY','CO2','PPT','PAR','AT','ST','VPD',\
                               'SW','NDEP','NEP','GPP','NPP','CEX','CVOC','RECO',\
                               'RAUTO','RLEAF','RWOOD','RROOT','RGROW','RHET',\
                               'RSOIL','ET','T','ES','EC','RO','DRAIN','LE',\
                               'SH','CL','CW','CCR','CFR','TNC','CFLIT',\
                               'CFLITA','CFLITB','CCLITB','CSOIL','GL',\
                               'GW','GCR','GR','CLLFALL','CRLIN','CWIN','LAI',\
                               'LMA','NCON','NCAN','NWOOD','NCR','NFR',\
                               'NSTOR','NLIT','NRLIT','NDW','NSOIL','NPOOLM',\
                               'NPOOLO','NFIX','NLITIN','NWLIN','NRLIN','NUP',\
                               'NGMIN','NMIN','NVOL','NLEACH','NGL','NGW',\
                               'NGCR','NGR','APARd','GCd','GAd','GBd','Betad'])
    df_list.append(df)
    # allows us to select by m, s or t
    key_list.append((model,treatment,exp))

dfs = pd.concat(df_list, axis=1, keys=key_list,
                names=["model","treatment","exp"])

dfs.to_pickle("models_output.pkl")
dfs = pd.read_pickle("models_output.pkl")

print dfs["GDAY","AMB","AVG"]

This produces the error with three models in the loop, but not with only two (the commented-out model_list). How can I fix this?

Thanks

Jeff Reback

Sep 23, 2013, 6:20:23 AM

to pyd...@googlegroups.com

.sortlevel()

http://pandas.pydata.org/pandas-docs/dev/indexing.html#the-need-for-sortedness-with-multiindex


Martin De Kauwe

Sep 23, 2013, 6:42:02 AM

to pyd...@googlegroups.com

I saw that, thanks, but to be honest I am having trouble following it and applying it to my test case. Suggestions welcome...

Jeff

Sep 23, 2013, 8:21:24 AM

to pyd...@googlegroups.com

dfs.columns.lexsort_depth
0

This is not sorted.

dfs = dfs.sortlevel(0,axis=1)
dfs.columns.lexsort_depth
4

dfs[("GDAY","AMB","AVG")]
DatetimeIndex: 4383 entries, 2001-01-01 00:00:00 to 2012-12-31 00:00:00
Freq: D
Data columns (total 80 columns):
YEAR     4383 non-null values
DOY      4383 non-null values
CO2      4383 non-null values
PPT      4383 non-null values
PAR      4383 non-null values
AT       4383 non-null values
ST       4383 non-null values
VPD      4383 non-null values
SW       4383 non-null values
NDEP     4383 non-null values
NEP      4383 non-null values
GPP      4383 non-null values
NPP      4383 non-null values
CEX      4383 non-null values
CVOC     4383 non-null values
RECO     4383 non-null values
RAUTO    4383 non-null values
RLEAF    4383 non-null values
RWOOD    4383 non-null values
RROOT    4383 non-null values
RGROW    4383 non-null values
RHET     4383 non-null values
RSOIL    4383 non-null values
ET       4383 non-null values
T        4383 non-null values
ES       4383 non-null values
EC       4383 non-null values
RO       4383 non-null values
DRAIN    4383 non-null values
LE       4383 non-null values
SH       4383 non-null values
CL       4383 non-null values
CW       4383 non-null value
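
For anyone reading this on a current pandas release: `sortlevel` was deprecated in pandas 0.20 and later removed, with `sort_index` as its replacement. Below is a minimal sketch of the same fix, assuming Python 3 and a recent pandas. The 5 x 3 frames and the three variable names are a shrunken, hypothetical stand-in for the 4383 x 80 test case, and the sketch checks sortedness with `is_monotonic_increasing` rather than `lexsort_depth`. Newer versions may only warn, not raise, on this kind of lookup, so treat it as an illustration of the sorting step, not a reproduction of the original error.

```python
import numpy as np
import pandas as pd

# Hypothetical, shrunken stand-in for the test case in this thread:
# 5 rows and 3 variables per model instead of 4383 x 80.
model_list = ["GDAY", "SDVM", "LPJX"]  # note: not in alphabetical order
variables = ["NEP", "GPP", "NPP"]

df_list = []
key_list = []
for model in model_list:
    df = pd.DataFrame(
        np.random.randn(5, len(variables)),
        index=pd.date_range("20010101", periods=5),
        columns=variables,
    )
    df_list.append(df)
    key_list.append((model, "AMB", "AVG"))

dfs = pd.concat(df_list, axis=1, keys=key_list,
                names=["model", "treatment", "exp"])

# The columns come out in insertion order (GDAY, SDVM, LPJX),
# which is not lexicographically sorted.
was_sorted = dfs.columns.is_monotonic_increasing

# sort_index(axis=1) sorts the column MultiIndex on all of its levels.
dfs_sorted = dfs.sort_index(axis=1)
now_sorted = dfs_sorted.columns.is_monotonic_increasing

# Selecting on the first three levels leaves only the variable level.
gday = dfs_sorted[("GDAY", "AMB", "AVG")]

print(was_sorted, now_sorted)
print(gday.shape)
```

Sorting once, right after the `concat`, keeps every later partial-key selection on a sorted index, which is the condition the linked documentation section describes.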
