Re: How to find the beginning of last line of a big text file ?

Sebastian Bassi

Jan 1, 2009, 11:54:44 AM
to pytho...@python.org
On Thu, Jan 1, 2009 at 2:19 PM, Barak, Ron <Ron....@lsi.com> wrote:
> I have a very big text file: I need to find the place where the last line
> begins (namely, the offset of the one-before-the-last '\n' + 1).
> Could you suggest a way to do that without getting all the file into memory
> (as I said, it's a big file), or having to readline() all lines (ditto)?

for line in open(filename):
    lastline = line
print("the last line is: %s" % lastline)

This will read all the lines, but line by line, so you will never have
the whole file in memory.
There may be more efficient ways to do this, such as using itertools.
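
A minimal sketch that extends the loop above so it also reports the byte
offset the question asked for. It assumes Python 3; the function name
last_line_by_scanning is made up for illustration. The file is opened in
binary mode so that len(line) counts bytes rather than decoded characters.
It still reads every line, so it only avoids the memory cost, not the
full scan.

```python
def last_line_by_scanning(path):
    """Return (offset, line) for the last line, reading line by line."""
    offset = 0        # byte offset at which the current line starts
    last_start = 0    # byte offset at which the most recent line started
    last = b""        # most recent line seen (empty for an empty file)
    with open(path, "rb") as f:
        for line in f:
            last_start = offset
            last = line
            offset += len(line)   # the next line starts right after this one
    return last_start, last
```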

Best,
SB.

MRAB

Jan 1, 2009, 12:31:06 PM
to pytho...@python.org
Barak, Ron wrote:
> Hi,
>
> I have a _very_ big text file: I need to find the place where the last
> line begins (namely, the offset of the one-before-the-last '\n' + 1).
>
> Could you suggest a way to do that without getting all the file into
> memory (as I said, it's a big file), or having to readline() all lines
> (ditto)?
>
You could seek() to near the end of the file before reading lines with
readline(). Remember that the seek will almost certainly put the file
pointer somewhere in the middle of a line, but that doesn't matter
provided that it's not the last line (i.e., if the second readline()
returns "" then the first readline() started somewhere in the middle of the
last line of the file). If you find that the seek put the file pointer
somewhere in the middle of the last line, then try again, but this time
seeking further back from the end of file before reading. Repeat as
necessary.
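
The seek-and-retry idea described above could be sketched roughly as
follows. This is only an illustration under some assumptions: Python 3,
a file opened in binary mode, a made-up function name last_line_offset,
and a step parameter (also invented here) that doubles on each retry.

```python
import os

def last_line_offset(path, step=1024):
    """Return the byte offset at which the last line of `path` begins."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        back = step
        while True:
            pos = max(0, size - back)
            f.seek(pos)
            if pos > 0:
                f.readline()      # probably a partial line: discard it
            cur = f.tell()        # now positioned at a line boundary
            last = None
            line = f.readline()
            while line:           # remember where each complete line starts
                last = cur
                cur = f.tell()
                line = f.readline()
            if last is not None:
                return last       # start of the final line that was read
            if pos == 0:
                return 0          # empty file
            back *= 2             # landed inside the last line: go further back
```

For example, last_line_offset('some_file.txt', step=4096) would start
4096 bytes from the end and only look further back if that position
turns out to be inside the last line.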

Tim Chase

Jan 1, 2009, 1:03:46 PM
to Sebastian Bassi, pytho...@python.org
Sebastian Bassi wrote:
> On Thu, Jan 1, 2009 at 2:19 PM, Barak, Ron <Ron....@lsi.com> wrote:
>> I have a very big text file: I need to find the place where the last line
>> begins (namely, the offset of the one-before-the-last '\n' + 1).
>> Could you suggest a way to do that without getting all the file into memory
>> (as I said, it's a big file), or having to readline() all lines (ditto)?
>
> for line in open(filename):
>     lastline = line
> print("the last line is: %s" % lastline)
>
> This will read all the lines, but line by line, so you will never have
> the whole file in memory.
> There may be more efficient ways to do this, such as using itertools.

I think the OP wanted to do it without having to touch each line
in the file. The following should do the trick, returning both
the offset in the file, and that last line's content.

from os import stat

def last_line(fname, estimated_line_size=1024):
    assert estimated_line_size > 0
    file_size = stat(fname).st_size
    if not file_size:
        return 0, b""
    f = open(fname, 'rb')
    f.seek(-1, 2)  # grab the last character
    if f.read(1) == b'\n':  # a "proper" text file
        file_size -= 1  # ignore the trailing newline
    offset = file_size
    content = b""
    # Read blocks backwards until a newline shows up or the start of
    # the file is reached. The test is "> 0" rather than ">= 0" so
    # that a one-line file cannot loop forever at offset 0.
    while offset > 0 and b'\n' not in content:
        offset -= estimated_line_size
        if offset < 0:
            estimated_line_size += offset  # back it off
            offset = 0
        f.seek(offset)
        block = f.read(estimated_line_size)
        content = block + content
    f.close()
    loc = content.rfind(b'\n') + 1  # after the newline
    return offset + loc, content[loc:]

offset, line = last_line('some_file.txt')
print("[%r] was found at offset %i" % (line, offset))

In theory, it should even handle "malformed" text-files that
don't end in a newline. There might be some odd edge-cases that
I missed, but I think I caught most of them.

-tkc


Barak, Ron

Jan 4, 2009, 6:25:18 AM
to Tim Chase, pytho...@python.org
Hi Tim,

Thanks for the solution (and effort), and for teaching me some interesting new tricks.

Happy 2009!
Ron.
