pandas -- maximum size of records / size of dataframe

Simon Cropper

Jul 8, 2014, 8:01:55 AM
to pyd...@googlegroups.com
Hi,

Can someone please document, or point me to some documentation on, the
maximum number of records / size of dataframe that can be manipulated
by pandas at one time?

My understanding is that a dataframe is resident in memory while it is
being worked on, so is the limit set by available memory, or does pandas
cache sections of the dataframe?

Any information would be appreciated.

--
Cheers Simon

Simon Cropper - Open Content Creator

Free and Open Source Software Workflow Guides
------------------------------------------------------------
Introduction http://www.fossworkflowguides.com
GIS Packages http://www.fossworkflowguides.com/gis
bash / Python http://www.fossworkflowguides.com/scripting

Jeff

Jul 8, 2014, 8:11:41 AM
to pyd...@googlegroups.com
There isn't a maximum size per se.

However, most pandas operations return a new dataframe rather than modifying the existing one, so you need room for the copies as well. The key is that Python
has to be able to allocate a single block of memory to hold the data (this is per dtype). The numpy arrays, which form
the basis for storage, are kept contiguous in memory.
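
To get a feel for how much memory that means in practice, here is a rough sketch (not from the original post) that builds a small two-column frame and asks pandas for its per-column footprint. The column names and row count are made up for illustration, and it assumes a reasonably recent pandas where `DataFrame.memory_usage` is available.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one million rows, two 8-byte numeric columns.
n_rows = 1_000_000
df = pd.DataFrame({
    "id": np.arange(n_rows, dtype="int64"),
    "value": np.zeros(n_rows, dtype="float64"),
})

# Bytes used by each column's underlying array (index excluded).
bytes_per_column = df.memory_usage(index=False)
total_bytes = int(bytes_per_column.sum())

print(bytes_per_column)
print(f"total: {total_bytes / 1e6:.1f} MB")
```

Since an operation that returns a new dataframe needs a second allocation of about the same size, it is worth budgeting for a few multiples of this figure rather than just one.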

Simon Cropper

Jul 8, 2014, 8:19:41 AM
to pyd...@googlegroups.com
So Jeff, in reality you would not be able to work on millions of records from a large relational database.

It would be better to use SQL to extract what you need and manipulate the smaller dataset.
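
As a minimal sketch of that approach (not from the original post), the filtering can be pushed into the SQL query so that only the needed rows and columns ever reach pandas. The table, column names, and in-memory SQLite database here are hypothetical stand-ins for a real relational database.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for a large relational database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (id INTEGER, region TEXT, value REAL)")
con.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(i, "north" if i % 2 == 0 else "south", float(i)) for i in range(1000)],
)
con.commit()

# Let the database do the filtering; pandas only sees the matching rows.
subset = pd.read_sql_query(
    "SELECT id, value FROM records WHERE region = ? AND value >= ?",
    con,
    params=("north", 990.0),
)
con.close()

print(subset)
```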

Jeff Reback

Jul 8, 2014, 8:22:54 AM
to pyd...@googlegroups.com
Sure, that is a possibility.

You can also read a CSV in chunks and use HDF5.

Chunked reading from SQL is coming in a future pandas release.

It all depends on what you are doing.
Even millions of records are very feasible with a reasonable amount of memory, say 10 GB or more (and of course using 64-bit is a must).

You need to work within your memory and disk limits.
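
For the chunked CSV route, something along these lines should work (a sketch, not from the original post). It uses the `chunksize` argument of `pandas.read_csv`, which returns an iterator of dataframes instead of loading the whole file. The tiny in-memory CSV and the chunk size of 4 are only there to keep the example self-contained; a real file would use a path and a much larger chunk size.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large CSV file on disk.
csv_data = io.StringIO(
    "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
)

total = 0
n_rows = 0
# Each iteration yields a dataframe of at most `chunksize` rows,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += int(chunk["value"].sum())
    n_rows += len(chunk)

print(n_rows, total)
```

This only helps when the computation can be done incrementally (sums, counts, filters); anything that needs all rows at once still has to fit in memory.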