Dataset columns throwing KeyError


Hannah Rowland

Apr 11, 2021, 12:46:23 PM
to The ETE toolkit
Hi everyone,

I'm trying to use a "layout" function that is executed for each node in the tree, but I get a KeyError saying that the species cannot be found. I'm stuck on understanding why my code is looking for the species in the column headers and not in the column values. Your help is very much appreciated!

My best,
Hannah

Here are the steps I take:

I have loaded a .csv file 

df = pd.read_csv("statecsv") 
names= df.columns.str.split(';').tolist()
df= df.iloc[:,0].str.split(';', expand=True)
df.columns=['family', 'Q', 'K', 'E', 'L', 'T','H','R']             
df

df = df.set_index('family')
df

This has the following structure:
Screenshot 2021-04-11 at 12.10.17.png

And then, to plot a circular tree with each node marked with its trait state:

def layout(node):
    node.img_style['vt_line_color']="steelblue"
    node.img_style['hz_line_color']="steelblue"
    node.img_style['size']=0
    node.img_style['vt_line_width']=4
    node.img_style['hz_line_width']=4

    rF_w = 25
    rF_h = 12
    marginL = 5
    
    if node.is_leaf():

        #### trait_1
        if (df.loc[node.name].Q==1):
            rectF = RectFace(width=rF_w, height=rF_h, fgcolor='white', bgcolor='sienna')
            rectF.margin_left=marginL + 15
            add_face_to_node(rectF, node, column=0, position="aligned")

This is the key error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Rhea'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-32-4091e222b32f> in <module>
     15
     16         #### trait_1
---> 17         if (df.loc[node.name].Q==1):
     18             rectF = RectFace(width=rF_w, height=rF_h, fgcolor='white', bgcolor='sienna')
     19             rectF.margin_left=marginL + 15

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    877
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880
    881     def _is_scalar_access(self, key: Tuple):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1108         # fall thru to straight lookup
   1109         self._validate_key(key, axis)
-> 1110         return self._get_label(key, axis=axis)
   1111
   1112     def _get_slice_axis(self, slice_obj: slice, axis: int):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
   1057     def _get_label(self, label, axis: int):
   1058         # GH#5667 this will fail if the label is not present in the axis.
-> 1059         return self.obj.xs(label, axis=axis)
   1060
   1061     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3489             loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
   3490         else:
-> 3491             loc = self.index.get_loc(key)
   3492
   3493             if isinstance(loc, np.ndarray):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898
   2899         if tolerance is not None:

KeyError: 'Rhea'

dengzi...@gmail.com

Apr 12, 2021, 4:31:23 AM
to The ETE toolkit
Hi,
Your code seems OK. The check
```
....
if node.is_leaf():
    if (df.loc[node.name].Q==1):
....
```
tests whether a leaf node of your tree has a matching row in your dataframe and whether the "Q" column of that row equals 1. Note that `df.loc[...]` looks the name up in the row index (your "family" column after `set_index`), not in the column headers. The line itself executes correctly; it fails only because "Rhea", which exists as a leaf in your tree, is not found in the "family" index of the df.

So it may be worth checking your tree leaves, because it seems that not all of them have a matching row in your df.
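To make that check concrete, here is a minimal sketch that lists the leaf names with no matching index label. The helper name `compare_names` and the example values are hypothetical; in a real session the two inputs would be something like `tree.get_leaf_names()` and `df.index`. It also flags names that would match after stripping whitespace or ignoring case, on the assumption that a stray space or capitalisation difference is one possible cause.

```python
def compare_names(leaf_names, index_labels):
    """Return (missing, near_misses) for leaf names versus index labels."""
    labels = set(index_labels)
    # Map a normalised form (stripped, lower-case) back to the original label.
    normalised = {str(label).strip().lower(): label for label in labels}

    # Leaves with no exact match in the index.
    missing = [name for name in leaf_names if name not in labels]
    # Of those, the ones that would match after normalisation.
    near_misses = {
        name: normalised[name.strip().lower()]
        for name in missing
        if name.strip().lower() in normalised
    }
    return missing, near_misses


# Hypothetical example values, not taken from the real tree or CSV.
leaves = ["Rhea", "Struthio", "Apteryx"]
index = ["Rhea ", "Struthio"]

missing, near_misses = compare_names(leaves, index)
print("leaves without a row:", missing)
print("would match after cleaning:", near_misses)
```

If `near_misses` is not empty, cleaning the index (for example with `df.index.str.strip()`) may be a better fix than skipping those leaves.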

If you only want to use the leaves that match your df and ignore the unmatched ones, add a step that checks whether the key exists in the df, like:
```
if node.is_leaf():
    if node.name in df.index:  # skip leaves that have no row in the dataframe index
        if df.loc[node.name].Q == 1:
            ...  # add the RectFace here, as in the original layout()
``` 
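One further thing that might be worth checking, although it is not visible in the thread: columns produced by `str.split(';', expand=True)` hold strings, so `Q==1` can be False even for a matched leaf. The sketch below uses a small made-up two-row file (the values are hypothetical) to compare that route with letting `read_csv` parse the `;` separator directly.

```python
import io

import pandas as pd

# Hypothetical two-row file in the same ';'-separated layout as in the thread.
raw = "family;Q;K;E;L;T;H;R\nRhea;1;0;0;1;0;0;1\nStruthio;0;1;0;0;1;0;0\n"

# Route used in the thread: read one text column, then split it on ';'.
split_df = pd.read_csv(io.StringIO(raw))
split_df = split_df.iloc[:, 0].str.split(";", expand=True)
split_df.columns = ["family", "Q", "K", "E", "L", "T", "H", "R"]
split_df = split_df.set_index("family")

# Alternative: let read_csv handle the separator so the 0/1 columns are numeric.
parsed_df = pd.read_csv(io.StringIO(raw), sep=";", index_col="family")

split_hit = bool(split_df.loc["Rhea"].Q == 1)    # compares the string '1' with 1
parsed_hit = bool(parsed_df.loc["Rhea"].Q == 1)  # compares an integer with 1
print(split_hit, parsed_hit)
```

With the split route, either compare against the string `'1'` or convert the columns first, for example with `df.astype(int)`.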
Hope it helps.

Ziqi