
Re: Not sure why this is filling my sys memory


Jonathan Gardner

Feb 20, 2010, 8:44:57 PM
to Vincent Davis, pytho...@python.org
On Sat, Feb 20, 2010 at 5:07 PM, Vincent Davis <vin...@vincentdavis.net> wrote:
> Code is below. The files are about 5 MB and 230,000 rows each. I have 43
> of them, and when I get to the 35th (reading it in) my system gets so
> slow that it is nearly unusable. I am on a Mac, and Activity Monitor
> shows that Python is using 2.99 GB of memory (of 4 GB). (Python 2.6, 64-bit.)
> getsizeof() returns 6424 bytes for alldata, so I am not sure what
> is happening.

With this kind of data set, you should start looking at BDBs or
PostgreSQL to hold your data. While processing files this large is
possible, it isn't easy. Your time is better spent letting the DB
figure out how to arrange your data for you.
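A side note on the 6424-byte figure in the quoted message: sys.getsizeof() measures only the container object itself, not the objects it references, so a dict holding hundreds of thousands of rows can report just a few KB while the rows themselves occupy gigabytes. A rough recursive sizer makes the difference visible (the `rows` dict here is made-up illustration data, not the original alldata):

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively sum the size of an object and everything it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # avoid double-counting shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

# A small stand-in for the real data: 1,000 rows of 4 floats each.
rows = {i: [float(i), float(i), float(i), float(i)] for i in range(1000)}

print(sys.getsizeof(rows))    # size of the dict object alone
print(deep_getsizeof(rows))   # dict plus all keys, lists, and floats
```

The first number stays small no matter how much data the dict holds; the second grows with every row, which is what the OS actually sees.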

--
Jonathan Gardner
jgar...@jonathangardner.net

Jonathan Gardner

Feb 21, 2010, 12:34:41 AM
to Vincent Davis, pytho...@python.org
On Sat, Feb 20, 2010 at 5:53 PM, Vincent Davis <vin...@vincentdavis.net> wrote:

> On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <jgar...@jonathangardner.net> wrote:
>
>> With this kind of data set, you should start looking at BDBs or
>> PostgreSQL to hold your data. While processing files this large is
>> possible, it isn't easy. Your time is better spent letting the DB
>> figure out how to arrange your data for you.
>
> I really do need all of it at one time; it is DNA microarray data. Sure, there are 230,000 rows, but only 4 columns of small numbers. Would it help to make them float()? I need to at some point. I know in numpy there is a way to set the type for the whole array, astype() I think.
> What I don't get is that getsizeof() shows the dict with all the data to be only 6424 bytes. What is using up all the memory?
>

Look into getting PostgreSQL to organize the data for you. It's much
easier to do processing properly with a database handle than a file
handle. You may also discover that writing functions in Python inside
of PostgreSQL can scale very well for whatever data needs you have.
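Since numpy was mentioned: keeping the four columns in a single float64 array rather than nested Python lists is also very compact. A back-of-the-envelope sketch, using the row and column counts quoted in the thread (the file name in the comment is hypothetical):

```python
import numpy as np

# 230,000 rows x 4 columns of float64: 8 bytes per element,
# so one file's worth of data is 230000 * 4 * 8 bytes.
arr = np.zeros((230000, 4), dtype=np.float64)
print(arr.nbytes)  # 7360000 bytes, roughly 7 MB per file

# Reading a whitespace-delimited text file straight into such an array
# (hypothetical file name) would look like:
#     arr = np.loadtxt("datafile.txt", dtype=np.float64)
# All 43 files together would then total well under 350 MB, versus the
# multi-GB footprint of the same values stored as Python objects.
```

Each Python float is its own heap object (24+ bytes plus list overhead), which is why the list-of-lists version balloons; the numpy array stores the raw 8-byte values contiguously.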

--
Jonathan Gardner
jgar...@jonathangardner.net
