pandas.DataFrame.mask behavior and exceptions raised when they shouldn't be / Possible feature request


mush...@gmail.com

Sep 4, 2017, 3:35:42 AM
to PyData
Hello.

I have a dataframe whose elements I want to convert to sets: cells holding a string or a list should become sets, and cells holding None should become an empty set.

          id    super_graph    sub_graph
    GO1    GO1    ['GO4', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9']    GO9
    GO2    GO2    ['GO4', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9']    GO11
    GO3    GO3    ['GO1', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9']    GO12
    GO4    GO4    ['GO1', 'GO6', 'GO7']   
    GO5    GO5    ['GO5']   
    GO6    GO6    ['GO1', 'GO5', 'GO7', 'GO3', 'GO9']   
    GO7    GO7    ['GO2', 'GO5', 'GO6', 'GO7', 'GO8', 'GO10', 'GO11', 'GO12']   
    GO8    GO8    ['GO2', 'GO3', 'GO4', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9']   
    GO9    GO9       



I managed to do that in two steps: first wrapping the strings in lists, then converting those lists to sets:

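    # The frame shown in the example above
    initial_frame = count_frame.loc[:, ['id', 'super_graph', 'sub_graph']]
    # True for cells that are neither a list nor None, i.e. the strings
    initial_frame_mask = ~initial_frame.applymap(lambda cell: isinstance(cell, list) or cell is None)

    # Wrap each string in a one-element list; lists and None are left as they are
    list_frame = initial_frame.mask(initial_frame_mask, initial_frame.applymap(lambda l: [l]))
    # set() is an empty set; {} would be an empty dict
    list_frame2 = list_frame.applymap(lambda l: set(l) if l is not None else set())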

The trick here is to build the list with a list literal, [l], rather than with list(l) when creating list_frame, because the two behave differently: [l] keeps a string whole as a single element, whereas list(l) splits a string into its individual characters.
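The difference between the literal forms and the constructor calls can be checked on a single value. This is a minimal sketch; the variable names are made up for illustration and "GO9" is just one of the identifiers from the frame above.

```python
term = "GO9"

# Literals keep the string whole as a single element
as_literal_list = [term]   # ['GO9']
as_literal_set = {term}    # {'GO9'}

# Constructor calls iterate over the string, one character at a time
as_list_call = list(term)  # ['G', 'O', '9']
as_set_call = set(term)    # a set of the three characters 'G', 'O', '9'
```

The set literal has one extra constraint that the list literal does not: every element must be hashable, so a list cannot go inside {...}.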

Then I convert these lists with set(), using a conditional expression to handle None. (The end goal is to combine the three columns of each row into one set. There may be better methods, I don't know, but I would still like an answer to the question below, for my own edification.)

I actually intended to do this in one step, using code as follows:

    initial_frame = count_frame.loc[:,['id', "super_graph", "sub_graph"]]
    initial_frame_mask =  ~initial_frame.applymap(lambda cell: isinstance(cell, list))
   
    list_frame = initial_frame.mask(initial_frame_mask, initial_frame.applymap(lambda l: {l} if l is not None else {}))
   
but Python won't let me do what I want :)
In fact set() accepts lists and strings, but it behaves like list() toward them and splits a string into its characters. I therefore intended to use a set literal, {l}, instead, but it doesn't work and throws this exception:

    list_frame = initial_frame.mask(initial_frame_mask,initial_frame.applymap(lambda l: {l} if l is not None else {}))
   
    TypeError: ("unhashable type: 'list'", 'occurred at index super_graph')

This is exactly the same as doing:

    In [354]: l=[1,2]
    In [355]: {l}
    Traceback (most recent call last):
   
      File "<ipython-input-355-37b01148d270>", line 1, in <module>
        {l}
   
    TypeError: unhashable type: 'list'

So I believe mask selects the data only after the operation has been performed on the whole frame, and that triggers this error. I should not actually see it, since my initial_frame_mask is tailored to exclude the problematic values:

        id    super_graph    sub_graph
    GO1    True    False    True
    GO2    True    False    True
    GO3    True    False    True
    GO4    True    False    False
    GO5    True    False    False
    GO6    True    False    False
    GO7    True    False    False
    GO8    True    False    False
    GO9    True    False    False
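On the ordering question, one point holds regardless of how mask is implemented: Python evaluates the arguments of a call before the call happens, so the applymap expression passed as the replacement runs over every cell, lists included, before mask ever sees the boolean frame. The sketch below reproduces this without pandas; mask_like and wrap_in_set are hypothetical stand-ins, not pandas functions.

```python
def wrap_in_set(cell):
    # A set literal needs a hashable element, so a list cell raises TypeError
    return {cell}


def mask_like(values, cond, other):
    # Hypothetical stand-in for mask: take other[i] where cond[i] is True
    return [o if c else v for v, c, o in zip(values, cond, other)]


values = ["GO9", ["GO4", "GO5"]]
cond = [True, False]  # only the string is meant to be replaced

try:
    # The replacement list is built for every cell before mask_like is called,
    # so the list cell fails even though cond would have skipped it
    result = mask_like(values, cond, [wrap_in_set(v) for v in values])
except TypeError as exc:
    result = None
    error = str(exc)
```

Here mask_like is never entered, which suggests the error in the post comes from the applymap call itself rather than from the order of operations inside mask.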

So I would like to know how I can do this in one step (maybe with a similar function that, unlike mask, skips the bad values to begin with, or with another way of doing the conversion).
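One possible one-step approach is to move the type check into the function applied to each cell, so that no mask is needed at all. This is only a sketch under assumptions: to_set is a made-up helper name, the two-row frame is a small stand-in for the real data (built with dtype=object so that None stays None), and the element-wise pass uses Series.map on each column, although the same function could be passed to applymap as in the code above.

```python
import pandas as pd


def to_set(cell):
    """None -> empty set, list -> set of its items, anything else -> one-element set."""
    if cell is None:
        return set()
    if isinstance(cell, list):
        return set(cell)
    return {cell}


# Small stand-in for the frame in the post (two of its rows)
frame = pd.DataFrame(
    {
        "id": ["GO1", "GO4"],
        "super_graph": [["GO4", "GO5"], ["GO1", "GO6", "GO7"]],
        "sub_graph": ["GO9", None],
    },
    index=["GO1", "GO4"],
    dtype=object,
)

# Apply to_set to every cell, column by column
set_frame = frame.apply(lambda column: column.map(to_set))
```

Because the branch on the cell type happens inside to_set, the set literal is only ever built from strings, and the unhashable-list error cannot occur.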

This also borders on a feature request: it would be convenient to have a setting that reverses the order of operations inside mask, so that it does not raise an error on the very values the mask is meant to exclude.
Thanks in advance.

