Maybe I missed something about the split-apply-combine strategy*, but
I don't see how to simply retrieve the contents of some rows for all, or
only some, of the groups.
get_group() does this beautifully for one group but not for several (I tried passing sets and lists, to no effect).
I can easily come up with something like:
groups_dict= {k: list(grouped_df.get_group(k).loc[:,idcolumn]) for k in grouped_df.groups.keys()}
But I guess this is not computationally efficient, and it has to be repeated for every column you want to aggregate.
I actually came up with another solution (agg_df being the original dataframe; you may need to reset its index if you used some exploding strategy):
groups_dict2= {k: list(agg_df2.loc[v,'n']) for k,v in grouped.indices.items()}
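The index reset matters because `.indices` holds positional row numbers, whereas `.groups` holds index labels. Here is a minimal sketch of the difference, using made-up data (the `toy` frame and its values are hypothetical, not my real dataframe):

```python
import pandas as pd

# Hypothetical toy data with a non-default index
toy = pd.DataFrame(
    {"n": [10, 20, 30, 40], "B": ["x", "y", "x", "y"]},
    index=[100, 101, 102, 103],
)
toy_grouped = toy.groupby("B")

# .indices maps each group key to positional row numbers, so .iloc matches it
by_position = {k: toy["n"].iloc[v].tolist() for k, v in toy_grouped.indices.items()}

# .groups maps each group key to index labels, so .loc matches it
by_label = {k: toy.loc[v, "n"].tolist() for k, v in toy_grouped.groups.items()}
```

With a default RangeIndex the positions and labels coincide, which is presumably why resetting the index makes the `.loc` version work.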
Are there any better methods, or wouldn't it be relevant to have a setting that outputs the actual values instead of an index?
idcolumn="A"
col="B"
current_wdf=df[[idcolumn,col]].dropna()
current_col=current_wdf.loc[:,col]
exploded_df=current_col.str.split(',').apply(pd.Series).stack()  # much slower, but keeps the index; I could have used substitution with enumerate after dropping the level
exploded_df.index=exploded_df.index.droplevel(-1)
exploded_df.name=col
agg_df=pd.DataFrame(current_wdf.loc[:,idcolumn]).join(exploded_df)
grouped=agg_df.groupby([col])
groups_dict= {k: list(grouped.get_group(k).loc[:,idcolumn]) for k in grouped.groups.keys()}
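
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same pipeline on made-up data. The frame contents are hypothetical, and it assumes a pandas version that provides `DataFrame.explode` (0.25 or later). It also tries one possible alternative to the dict comprehension: selecting the id column on the groupby and collecting each group into a list. I have not benchmarked it against the versions above.

```python
import pandas as pd

# Hypothetical toy data: "A" is the id column, "B" holds comma-separated values
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["x,y", "y", None, "x,z"]})
idcolumn, col = "A", "B"

current_wdf = df[[idcolumn, col]].dropna()

# Split each cell into a list, then give every list element its own row
agg_df = current_wdf.assign(**{col: current_wdf[col].str.split(",")}).explode(col)

# Select the id column on the groupby and collect each group into a list
groups_dict = agg_df.groupby(col)[idcolumn].apply(list).to_dict()
```

Here `groups_dict` maps every distinct value of `B` to the list of `A` ids that contain it, which is the same shape as the dictionaries built with get_group().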