Work with chunked df like not-chunked df

Łukasz Wilk

Jul 8, 2016, 2:51:32 PM
to PyData
Hi,

I am working with a chunked DataFrame, and writing code to query it is time-consuming.
For example, if I would like to perform:
df.col1.nunique()

then I have to write:
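values = set()
for df_chunk in df_chunked:
    # set.union() returns a new set, so update() is needed to accumulate in place
    # dropna() mirrors the default behaviour of nunique(), which ignores NaN
    values.update(df_chunk.col1.dropna().unique())
len(values)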

Maybe it would be a good idea to create a mechanism that performs these functions out of the box?
My idea is to add an object to the DataFrame that holds the logic for looping through all the chunks, so that:
df.chunks.nunique()

would run the loop from the second code example above.
I am not sure whether it's doable this way, or whether it would be valuable.
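
To make the idea concrete, here is a minimal sketch of such a wrapper. ChunkedFrame and its make_chunks argument are hypothetical names used only for illustration; nothing like this exists in pandas itself. It assumes the chunks can be re-created on demand, for example by re-opening a CSV reader.

```python
import pandas as pd


class ChunkedFrame:
    """Hypothetical wrapper (not part of pandas) around a source of DataFrame chunks."""

    def __init__(self, make_chunks):
        # make_chunks is a zero-argument callable returning a fresh iterator of
        # DataFrames, e.g. lambda: pd.read_csv(path, chunksize=100_000)
        self._make_chunks = make_chunks

    def nunique(self, column):
        # Accumulate distinct non-null values chunk by chunk, then count them.
        values = set()
        for chunk in self._make_chunks():
            values.update(chunk[column].dropna().unique())
        return len(values)


# Small in-memory frame split into chunks of three rows, standing in for a large file.
df = pd.DataFrame({"col1": [1, 2, 2, 3, 3, 3, None, 1]})
chunked = ChunkedFrame(lambda: (df.iloc[i:i + 3] for i in range(0, len(df), 3)))

n_unique = chunked.nunique("col1")
```

The set of distinct values still has to fit in memory, so this only helps when the column has far fewer distinct values than rows.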

Regards,
Łukasz

Tom Augspurger

Jul 8, 2016, 2:59:51 PM
to pyd...@googlegroups.com
You might want to look at Dask, specifically dask.dataframe which builds on top of pandas, and will handle all this for you.