Memory error due to big input file
MRAB  
Newsgroups: comp.lang.python
From: MRAB <pyt...@mrabarnett.plus.com>
Date: Mon, 13 Jul 2009 18:20:43 +0100
Local: Mon, Jul 13 2009 1:20 pm
Subject: Re: Memory error due to big input file
sityee kong wrote:
> Hi All,

> I have a similar problem that many new python users might encounter. I
> would really appreciate if you could help me fix the error.
> I have a big text file with size more than 2GB. It turned out memory
> error when reading in this file. Here is my python script, the error
> occurred at line -- self.fh.readlines().

[snip code]
Your 'error' is that you're running it on a computer with insufficient
memory.

s...@pobox.com  
Newsgroups: comp.lang.python
From: s...@pobox.com
Date: Mon, 13 Jul 2009 12:18:25 -0500
Local: Mon, Jul 13 2009 1:18 pm
Subject: Re: Memory error due to big input file

    phoebe> I have a big text file with size more than 2GB. It turned out
    phoebe> memory error when reading in this file. Here is my python
    phoebe> script, the error occurred at line -- self.fh.readlines().

    phoebe> import math
    phoebe> import time

    phoebe> class textfile:
    phoebe>   def __init__(self,fname):
    phoebe>      self.name=fname
    phoebe>      self.fh=open(fname)
    phoebe>      self.fh.readline()
    phoebe>      self.lines=self.fh.readlines()

Don't do that.  The problem is that you are trying to read the entire file
into memory.  Learn to operate a line (or a few lines) at a time.  Try
something like:

    a = open("/home/sservice/nfbc/GenoData/CompareCalls3.diff")
    for line in a:
        pass  # do your per-line work here
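
A minimal sketch of the same line-at-a-time idea, assuming Python 3. The function name `count_lines` and its `skip_header` parameter are hypothetical and not part of the original script; the header skip only mirrors the single `readline()` call in the quoted `__init__`. Counting lines stands in for whatever the real per-line work is.

```python
import os
import tempfile


def count_lines(path, skip_header=True):
    """Count data lines while holding only one line in memory at a time."""
    count = 0
    with open(path) as fh:
        if skip_header:
            next(fh, None)  # discard the first line, like the original readline()
        for line in fh:  # the file object yields one line per iteration
            count += 1  # stand-in for the real per-line work
    return count


# Small self-contained demonstration using a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nrow 1\nrow 2\nrow 3\n")
    demo_path = tmp.name

demo_count = count_lines(demo_path)
os.remove(demo_path)
print(demo_count)
```

Memory use here depends on the longest line rather than on the size of the file.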

--
Skip Montanaro - s...@pobox.com - http://www.smontanaro.net/
    when i wake up with a heart rate below 40, i head right for the espresso
    machine. -- chaos @ forums.usms.org


Dave Angel  
Newsgroups: comp.lang.python
From: Dave Angel <da...@ieee.org>
Date: Mon, 13 Jul 2009 17:01:17 -0400
Local: Mon, Jul 13 2009 5:01 pm
Subject: Re: Memory error due to big input file

Others have pointed out that you have too little memory for a 2 GB data
structure.  If you're running on a 32-bit system, chances are it won't
matter how much memory you add: a process is limited to 4 GB of address
space, the OS typically takes about half of it, and your code and other
data take some more, so you don't have 2 GB left.  A 64-bit version of
Python, running on a 64-bit OS, might be able to "just work."

Anyway, loading the whole file into a list is seldom the best answer,
except for files under a meg or so.  It's usually better to process the
file in sequence.  It looks like you're also making slices of that data,
so they could potentially be pretty big as well.

If you can be assured that you only need the current line and the
previous two (for example), then you can use a list of just those three,
and delete the oldest one, and add a new one to that list each time
through the loop.
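
One possible way to keep only the last three lines, sketched under the assumption that Python 3 and `collections.deque` are acceptable. The helper name `sliding_windows` and the window size of 3 are illustrative choices, not something from the original script.

```python
from collections import deque


def sliding_windows(lines, size=3):
    """Yield tuples of the most recent `size` lines, oldest first."""
    window = deque(maxlen=size)  # appending to a full deque drops the oldest item
    for line in lines:
        window.append(line)
        if len(window) == size:
            yield tuple(window)


# Any iterable of lines works, including an open file object.
sample = ["a", "b", "c", "d"]
windows = list(sliding_windows(sample))
print(windows)
```

Because `lines` can be an open file, the whole file never has to be in memory; only `size` lines are held at once.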

Or, you can add some methods to that 'textfile' class that fetch a line
by index.  Brute force, you could pre-scan the file, and record all the
file offsets for the lines you find, rather than storing the actual
line.  So you still have just as big a list, but it's a list of
integers.  Then when somebody calls your method, he passes an integer,
and you return the particular line.  A little caching for performance,
and you're good to go.
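
A rough sketch of that offset-index idea, assuming Python 3 and UTF-8 text. The class name `IndexedTextFile` and the method `get_line` are hypothetical, and the caching mentioned above is left out to keep the example short.

```python
import os
import tempfile


class IndexedTextFile:
    """Fetch lines by index while storing only byte offsets, not line text."""

    def __init__(self, path):
        self.fh = open(path, "rb")  # binary mode keeps byte offsets exact
        self.offsets = []
        offset = 0
        for raw in self.fh:  # pre-scan: one pass, one line in memory at a time
            self.offsets.append(offset)
            offset += len(raw)

    def __len__(self):
        return len(self.offsets)

    def get_line(self, index):
        """Seek to the recorded offset and read just that one line."""
        self.fh.seek(self.offsets[index])
        return self.fh.readline().decode("utf-8").rstrip("\r\n")

    def close(self):
        self.fh.close()


# Small self-contained demonstration using a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nrow 1\nrow 2\n")
    demo_path = tmp.name

indexed = IndexedTextFile(demo_path)
third_line = indexed.get_line(2)
line_total = len(indexed)
indexed.close()
os.remove(demo_path)
print(third_line, line_total)
```

The list of offsets still grows with the number of lines, but each entry is a single integer instead of a full line of text.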

Anyway, if you organize it that way, you can code the rest of the module
to not care whether the whole file is really in memory or not.

BTW, you should derive all your classes from something.  If nothing
else, use object.
  class textfile(object):


Aaron Scott  
Newsgroups: comp.lang.python
From: Aaron Scott <aaron.hildebra...@gmail.com>
Date: Mon, 13 Jul 2009 14:20:13 -0700 (PDT)
Local: Mon, Jul 13 2009 5:20 pm
Subject: Re: Memory error due to big input file

> BTW, you should derive all your classes from something.  If nothing
> else, use object.
>   class textfile(object):

Just out of curiosity... why is that? I've been coding in Python for
a long time, and I never derive my base classes. What's the advantage
to deriving them?

Vilya Harvey  
Newsgroups: comp.lang.python
From: Vilya Harvey <vilya.har...@gmail.com>
Date: Mon, 13 Jul 2009 22:51:01 +0100
Local: Mon, Jul 13 2009 5:51 pm
Subject: Re: Memory error due to big input file
2009/7/13 Aaron Scott <aaron.hildebra...@gmail.com>:

>> BTW, you should derive all your classes from something.  If nothing
>> else, use object.
>>   class textfile(object):

> Just out of curiosity... why is that? I've been coding in Python for
> a long time, and I never derive my base classes. What's the advantage
> to deriving them?

    class Foo:

uses the old object model.

    class Foo(object):

uses the new object model.

See http://docs.python.org/reference/datamodel.html (specifically
section 3.3) for details of the differences.

Vil.


Chris Rebert  
Newsgroups: comp.lang.python
From: Chris Rebert <c...@rebertia.com>
Date: Mon, 13 Jul 2009 15:02:37 -0700
Local: Mon, Jul 13 2009 6:02 pm
Subject: Re: Memory error due to big input file

Note that Python 3.0 makes explicitly subclassing `object` unnecessary
since it removes old-style classes; a class that doesn't explicitly
subclass anything will implicitly subclass `object`.
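
A quick check of that behaviour, assuming a Python 3 interpreter. The class names `Foo` and `Bar` are placeholders.

```python
class Foo:  # no explicit base class
    pass


class Bar(object):  # explicit base class
    pass


# In Python 3 both spellings produce a class whose only base is object.
print(Foo.__bases__)
print(Bar.__bases__)
print(Foo.__bases__ == Bar.__bases__)
```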

Cheers,
Chris
--
http://blog.rebertia.com


Steven D'Aprano  
Newsgroups: comp.lang.python
From: Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au>
Date: 14 Jul 2009 03:49:30 GMT
Local: Mon, Jul 13 2009 11:49 pm
Subject: Re: Memory error due to big input file

On Mon, 13 Jul 2009 14:20:13 -0700, Aaron Scott wrote:
>> BTW, you should derive all your classes from something.  If nothing
>> else, use object.
>>   class textfile(object):

> Just out of curiosity... why is that? I've been coding in Python for a
> long time, and I never derive my base classes. What's the advantage to
> deriving them?

"Old style" classes (those whose base classes aren't derived from
anything) have a few disadvantages:

(1) Properties don't work correctly:

>>> class Parrot:  # old-style class
...     def __init__(self):
...             self._x = 3
...     def _setter(self, value):
...             self._x = value
...     def _getter(self):
...             print "Processing ..."
...             return self._x + 1
...     x = property(_getter, _setter)
...
>>> p = Parrot()
>>> p.x
Processing ...
4
>>> p.x = 2
>>> p.x
2

In general, anything that uses the descriptor protocol, not just
property, will fail to work correctly with old-style classes.
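
For contrast, here is a sketch of the same property on a new-style class, assuming Python 3, where every class is new-style and old-style behaviour can no longer be reproduced. The `print` call is dropped so the example stays focused on the values.

```python
class Parrot:  # Python 3: implicitly derives from object
    def __init__(self):
        self._x = 3

    def _getter(self):
        return self._x + 1

    def _setter(self, value):
        self._x = value

    x = property(_getter, _setter)


p = Parrot()
before = p.x  # getter runs: 3 + 1
p.x = 2       # setter runs and stores 2 in self._x
after = p.x   # getter runs again: 2 + 1
print(before, after)
```

Here the assignment goes through `_setter` instead of creating an instance attribute that shadows the property, so the second read still passes through `_getter`.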

(2) Classes using multiple inheritance with diamond-shaped inheritance
will be broken.

(3) __slots__ is just an attribute.

(4) super() doesn't work.
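
As an illustration of what new-style classes provide for points (2) and (4), here is a small diamond hierarchy using `super()`, assuming Python 3. The class names and the `describe` method are made up for the example.

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Bottom(Left, Right):  # diamond: both parents share Base
    def describe(self):
        return ["Bottom"] + super().describe()


order = Bottom().describe()
mro_names = [cls.__name__ for cls in Bottom.__mro__]
print(order)
print(mro_names)
```

Each class is visited exactly once, and `Base` comes after both `Left` and `Right` in the method resolution order.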

And, depending on whether you consider this a disadvantage or an
advantage:

(5) Special methods like __len__ can be overridden on the instance, not
just the class:

>>> class K:
...     def __len__(self):
...             return 0
...
>>> k = K()
>>> len(k)
0
>>> k.__len__ = lambda : 42
>>> len(k)
42
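
The equivalent experiment with a new-style class, sketched for Python 3, gives a different result because `len()` looks up `__len__` on the type rather than on the instance.

```python
class K:  # Python 3: new-style class
    def __len__(self):
        return 0


k = K()
k.__len__ = lambda: 42        # stored on the instance only
via_builtin = len(k)          # len() ignores the instance attribute
via_attribute = k.__len__()   # an explicit attribute call still finds it
print(via_builtin, via_attribute)
```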

In their favour:

(1) Less typing.

(2) If you're not using descriptors, including property(), or multiple
inheritance with diamond diagrams, they work fine.

(3) They're (apparently) a tiny bit faster.

--
Steven

