for i, line in enumerate( open( textfile ) ):
    if i == 0:
        print 'First line is: ' + line
    elif i == 1:
        print 'Second line is: ' + line
    .......
    .......
I thought about f = open( textfile ) and then f[0], f[1], etc but that
throws a TypeError: 'file' object is unsubscriptable.
Is there a simpler way?
filehandle = open('/path/to/foo.txt')
for line in filehandle:
    # do something...
But you could also do lines = filehandle.readlines(), which returns a
list of all lines in the file; that's a bit memory-hungry, though.
If all you need is sequential access, you can use the next() method of
the file object:
nextline = open(textfile).next
print 'First line is: %r' % nextline()
print 'Second line is: %r' % nextline()
...
For random access, the easiest way is to slurp the whole file into a
list using file.readlines().
HTH,
George
>>> f = open("/etc/passwd")
>>> lines = f.readlines()
>>> lines[5]
'# lookupd DirectoryServices \n'
>>>
You can also check out the fileinput module. That ought to be slightly
more efficient and provides some additional functionality. I think
there are some restrictions on accessing lines out of order, though.
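As a minimal sketch of that fileinput approach (shown in modern Python 3
syntax, with a throwaway temp file standing in for a real one):

```python
import fileinput
import os
import tempfile

# Create a small sample file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# fileinput.input() chains the named files into one stream of lines;
# filelineno() reports the 1-based line number within the current file.
second = None
for line in fileinput.input([path]):
    if fileinput.filelineno() == 2:
        second = line.rstrip()
        break
fileinput.close()  # stop reading once we have what we need

print(second)  # -> beta
```

As noted, the iteration is strictly sequential; you can't jump backwards.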
-Jeff
On 7/25/07, Daniel Nogradi <nog...@gmail.com> wrote:
> A very simple question: I currently use a cumbersome-looking way of
> getting the first, second, etc. line of a text file:
>
> for i, line in enumerate( open( textfile ) ):
>     if i == 0:
>         print 'First line is: ' + line
>     elif i == 1:
>         print 'Second line is: ' + line
>     .......
>     .......
>
> I thought about f = open( textfile ) and then f[0], f[1], etc but that
> throws a TypeError: 'file' object is unsubscriptable.
>
> Is there a simpler way?
> Depending on the size of your file, you can just use
> file.readlines. Note that file.readlines is going to read the
> entire file into memory, so don't use it on your plain-text
> version of War and Peace.
I don't think that would actually be a problem for any recent
machine.
The Project Gutenberg version of W&P is 3.1MB of text in 67403
lines. I just did an f.readlines() on it and it was pretty
much instantaneous, and the python interpreter instance that
contains that list of 67403 lines is using a bit less than 8MB
of RAM. An "empty" interpreter uses about 2.7MB. So, doing
f.readlines() on War and Peace requires a little over 5MB of RAM
-- not really much of a concern on any machine that's likely to
be running Python.
--
Grant Edwards                   grante at visi.com
Yow! Now I understand the meaning of "THE MOD SQUAD"!
That might be a memory problem if you are running multiple processes
regularly, such as on a webserver.
YMMD :)
Regards,
Björn
--
BOFH excuse #335:
the AA battery in the wallclock sends magnetic interference
> That might be a memory problem if you are running multiple processes
> regularly, such as on a webserver.
I suppose if you ran it in 50 parallel processes, you could use
up 250MB of RAM. Still not a big deal on many servers. A
decent OS will swap regions that aren't being used to disk, so
it's likely not to be a problem.
If you're talking several hundred instances, you could start to
use up serious amounts of VM. Still, I say do it the simple,
obvious way first, and optimize it _after_ you've determined
you have a performance problem (and determined where the
bottleneck is). Premature optimization...
--
Grant Edwards                   grante at visi.com
Yow! This PORCUPINE knows his ZIPCODE ... And he has "VISA"!!
This is the same logic but less cumbersome, if that's what you mean:
to_get = [0, 3, 7, 11, 13]
got = dict((i,s) for (i,s) in enumerate(open(textfile)) if i in to_get)
print got[3]
This would probably be the best way for really big files, when you know
all of the lines you want ahead of time. If you need to access the file
multiple times at arbitrary positions, you may need to seek(0), cache
lines already read, or slurp the whole thing, which has already been
suggested.
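As a sketch of the seek(0) option (Python 3 syntax, using io.StringIO
as a stand-in for a real open file):

```python
import io

def nth_line(f, n):
    """Return line n (0-based) of an open file, rewinding first.

    Simple but O(n) per lookup -- fine for small files or rare access.
    """
    f.seek(0)  # rewind so enumerate counts from the first line again
    for i, line in enumerate(f):
        if i == n:
            return line
    raise IndexError(n)

f = io.StringIO("one\ntwo\nthree\n")
print(nth_line(f, 2).rstrip())  # -> three
print(nth_line(f, 0).rstrip())  # -> one (works because of the rewind)
```

Caching the lines already read, as suggested above, would avoid the
repeated scans at the cost of memory.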
James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
Or, you might be reading from a text file dramatically larger than a 3MB copy of War and Peace. I regularly deal with log files that are often many times that size, including some that have been well over a GB. Trust me, you don't want to read in the entire file when it's a 1.5GB text file. It's true that readlines() will often work fine, but there are certainly cases where it's not acceptable for memory and performance reasons.
-Jay
> A very simple question: I currently use a cumbersome-looking way of
> getting the first, second, etc. line of a text file:
>
> for i, line in enumerate( open( textfile ) ):
>     if i == 0:
>         print 'First line is: ' + line
>     elif i == 1:
>         print 'Second line is: ' + line
>     .......
>     .......
from itertools import islice
first_five_lines = list(islice(open(textfile), 5))
print 'first line is', first_five_lines[0]
print 'second line is', first_five_lines[1]
...
> Daniel Nogradi wrote:
>> A very simple question: I currently use a cumbersome-looking way of
>> getting the first, second, etc. line of a text file:
>
> to_get = [0, 3, 7, 11, 13]
> got = dict((i,s) for (i,s) in enumerate(open(textfile)) if i in to_get)
> print got[3]
>
> This would probably be the best way for really big files and if you know
> all of the lines you want ahead of time.
But it still has to read the complete file (although it does not keep the
unwanted lines).
Combining this with Paul Rubin's suggestion of itertools.islice I think we
get the best solution:
got = dict((i, s)
           for (i, s) in enumerate(islice(open(textfile), max(to_get) + 1))
           if i in to_get)
--
Gabriel Genellina
Thanks! This looks the best; I only need the first couple of lines
sequentially, so I never need to read in the whole file.
A lazy evaluation scheme might be useful for random access that
only slurps as much as you need.
class LazySlurper(object):
    r"""Lazily read a file using readline, allowing random access to the
    results with __getitem__.

    >>> import StringIO
    >>> infile = StringIO.StringIO(
    ...     "Line 0\n"
    ...     "Line 1\n"
    ...     "Line 2\n"
    ...     "Line 3\n"
    ...     "Line 4\n"
    ...     "Line 5\n"
    ...     "Line 6\n"
    ...     "Line 7\n")
    >>> slurper = LazySlurper(infile)
    >>> print slurper[0],
    Line 0
    >>> print slurper[5],
    Line 5
    >>> print slurper[1],
    Line 1
    >>> infile.close()
    """
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.upto = 0
        self.lines = []
        self._readupto(0)

    def _readupto(self, n):
        while self.upto <= n:
            line = self.fileobj.readline()
            if line == "":
                break
            self.lines.append(line)
            self.upto += 1

    def __getitem__(self, n):
        self._readupto(n)
        return self.lines[n]
--
Neil Cerutti
Eddie Robinson is about one word: winning and losing. --Eddie Robinson's agent
Paul Collier
if you only ever need the first few lines of a file, why not keep it
simple and do something like this?
mylines = open("c:\\myfile.txt","r").readlines()[:5]
that will give you the first five lines of the file. Replace 5 with
whatever number you need. next will work, too, obviously, but won't
that use of next hold the file open until you are done with it? Or,
more specifically, since you do not have a file object at all, won't
you have to wait until the function goes out of scope to release the
file? Would that be a problem? Or am I just being paranoid?
f.readlines()[:5]
reads the whole file in and generates a list of the lines just so it can
slice the first five off. Compare that, on a large file, with something like
[f.next() for _ in range(5)]
and I think you will see that the latter is significantly better in
almost all respects.
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------
or even faster:
wanted = set([0, 3, 7, 11, 13])
with open(textfile) as src:
got = dict((i, s) for (i, s) in enumerate(islice(src,
min(wanted), max(wanted) + 1))
if i in wanted)
Of course that could just as efficiently create a list as a dict.
Note that using a list rather than a set for wanted takes len(wanted)
comparisons on misses, and len(wanted)/2 on average on hits, but most
likely a single hash probe for a set whether it is a hit or a miss.
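That cost difference is easy to see in miniature (Python 3 syntax; the
values are just illustrative):

```python
# A list membership test scans elements one at a time (O(n) worst case);
# a set hashes the key and probes one bucket (O(1) on average).
wanted_list = [0, 3, 7, 11, 13]
wanted_set = set(wanted_list)

# Both give the same answers -- only the lookup cost differs.
hit = 7 in wanted_set     # single hash probe
miss = 12 in wanted_set   # also a single probe, even on a miss
print(hit, miss)  # -> True False
```

For a handful of line numbers either works; the set only starts to pay
off as wanted grows.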
--Scott David Daniels
Scott....@Acm.Org
[nice recipe to retrieve only certain lines of a file]
I think your time machine needs an adjustment, it spits things almost two
years later :)
--
Gabriel Genellina