Hello.
I have a dataframe like the one below. I want to convert its elements to sets when they are strings or lists, and replace them with an empty set when they are None.
id super_graph sub_graph
GO1 GO1 ['GO4', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9'] GO9
GO2 GO2 ['GO4', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9'] GO11
GO3 GO3 ['GO1', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9'] GO12
GO4 GO4 ['GO1', 'GO6', 'GO7']
GO5 GO5 ['GO5']
GO6 GO6 ['GO1', 'GO5', 'GO7', 'GO3', 'GO9']
GO7 GO7 ['GO2', 'GO5', 'GO6', 'GO7', 'GO8', 'GO10', 'GO11', 'GO12']
GO8 GO8 ['GO2', 'GO3', 'GO4', 'GO5', 'GO6', 'GO7', 'GO8', 'GO9']
GO9 GO9
I managed to do that in two steps: first wrapping the strings in lists, then converting those lists to sets:
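# The frame whose sample is shown above
initial_frame = count_frame.loc[:, ['id', 'super_graph', 'sub_graph']]
# True where the cell is a bare string, i.e. neither a list nor None
initial_frame_mask = ~initial_frame.applymap(lambda cell: isinstance(cell, list) or cell is None)
# Wrap the bare strings in one-element lists
list_frame = initial_frame.mask(initial_frame_mask, initial_frame.applymap(lambda l: [l]))
# Convert each list to a set; set() is the empty set, whereas {} would be an empty dict
list_frame2 = list_frame.applymap(lambda l: set(l) if l is not None else set())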
The trick here is to use the list literal [l] (I called it a "constructor" at first, but that word has a very specific meaning in programming languages) instead of list(l) when creating list_frame, because they behave differently: [l] keeps a string as a single element, whereas list(l) breaks a string down into its characters.
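To make the difference concrete, here is a minimal sketch in plain Python. The value `cell` is a made-up example, not a cell taken from the actual frame:

```python
# Hypothetical sample value, not taken from the original frame.
cell = "GO9"

wrapped = [cell]        # list literal: one element, the whole string
exploded = list(cell)   # list() iterates over the string's characters

print(wrapped)    # ['GO9']
print(exploded)   # ['G', 'O', '9']

# set() iterates like list() does, so wrapping first keeps the string whole.
as_set = set(wrapped)
print(as_set)     # {'GO9'}
```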
Then I convert these lists with set(), using a conditional expression to avoid passing None to it. The end goal is to combine the three sets of each row. Maybe there are better methods, I don't know, but in any case I would like an answer to the question that follows, for my own edification.
I actually intended to do this in one step, using code as follows:
initial_frame = count_frame.loc[:,['id', "super_graph", "sub_graph"]]
initial_frame_mask = ~initial_frame.applymap(lambda cell: isinstance(cell, list))
list_frame = initial_frame.mask(initial_frame_mask, initial_frame.applymap(lambda l: {l} if l is not None else {}))
but Python won't let me do what I want :)
Actually, set() accepts both lists and strings and, like list(), iterates over them. I therefore intended to use the {l} literal instead, but it doesn't work and raises this exception:
list_frame = initial_frame.mask(initial_frame_mask,initial_frame.applymap(lambda l: {l} if l is not None else {}))
TypeError: ("unhashable type: 'list'", 'occurred at index super_graph')
It is exactly like doing:
In [354]: l=[1,2]
In [355]: {l}
Traceback (most recent call last):
File "<ipython-input-355-37b01148d270>", line 1, in <module>
{l}
TypeError: unhashable type: 'list'
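For reference, a small plain-Python sketch of how set() and the {…} literal differ. The names `value` and `label` are made-up examples. It also shows that the bare `{}` literal is an empty dict rather than an empty set:

```python
value = ["GO4", "GO5"]   # hypothetical list cell
label = "GO9"            # hypothetical string cell

from_list = set(value)   # set() iterates over the list: {'GO4', 'GO5'}
from_str = {label}       # set literal: one element, the whole string

try:
    {value}              # a list is unhashable, so it cannot be a set element
    raised = False
except TypeError:
    raised = True

empty_literal = {}       # an empty dict, not an empty set
empty_set = set()        # the empty set

print(raised, type(empty_literal).__name__, type(empty_set).__name__)
```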
So I believe that mask selects the data only after the operation has been applied to the whole frame, and that this is what triggers the error. I would not expect to see it, since my initial_frame_mask is tailored to exclude the problematic values:
id super_graph sub_graph
GO1 True False True
GO2 True False True
GO3 True False True
GO4 True False False
GO5 True False False
GO6 True False False
GO7 True False False
GO8 True False False
GO9 True False False
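As far as I can tell, the error does not come from mask itself: Python evaluates the arguments of a call before the call happens, so the replacement expression runs over every cell first. Below is a minimal sketch of that, under a few assumptions: the two-row frame and the helper `to_singleton_set` are hypothetical, and `df.apply(lambda col: col.map(...))` stands in for `applymap` so the snippet does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the one in the question.
df = pd.DataFrame({
    "id": ["GO1", "GO4"],
    "super_graph": [["GO4", "GO5"], ["GO1"]],
})

calls = []

def to_singleton_set(cell):
    calls.append(cell)   # record every cell the function is applied to
    return {cell}        # raises TypeError for lists, which are unhashable

try:
    # This is what would be passed to mask() as the replacement.
    # It is evaluated in full here, before mask() could ever be called.
    other = df.apply(lambda col: col.map(to_singleton_set))
    error = None
except TypeError as exc:
    error = exc

# A list cell was visited even though a mask would have excluded it.
print(any(isinstance(c, list) for c in calls), error)
```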
So I would like to know how to do this in one step, either with a similar function that, unlike mask, skips the unwanted values from the start, or with another way of doing the conversion.
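One possible approach, sketched under assumptions rather than offered as the definitive answer, is to put the branching inside a single element-wise function so that no mask is needed. The helper `to_set` and the three-row frame are hypothetical. Missing cells are assumed to be None (hence `dtype=object`), and `df.apply(lambda col: col.map(...))` is used in place of `applymap`:

```python
import pandas as pd

def to_set(cell):
    """None -> empty set, list/tuple/set -> set of its items, anything else -> one-element set."""
    if cell is None:
        return set()
    if isinstance(cell, (list, tuple, set)):
        return set(cell)
    return {cell}

# Hypothetical three-row frame; dtype=object keeps None as None.
df = pd.DataFrame({
    "id": ["GO1", "GO4", "GO9"],
    "super_graph": [["GO4", "GO5"], ["GO1"], None],
    "sub_graph": ["GO9", None, None],
}, dtype=object)

# One pass over every cell, no mask involved.
set_frame = df.apply(lambda col: col.map(to_set))

# Union of the three sets on each row.
combined = [set().union(*row) for row in set_frame.itertuples(index=False)]
print(combined)
```

Because the type check happens per cell, `{cell}` is only ever built for hashable values, which avoids the TypeError.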
This also borders on a feature request: it would be convenient to have an option that reverses the order of operations inside mask, so that the replacement does not collide with, and raise an error on, the very values the mask is meant to exclude.
Thanks in advance.