
Re: How to change a generator ?


MRAB

Dec 24, 2008, 12:03:58 PM
to pytho...@python.org
Barak, Ron wrote:
> Hi,
>
> I have a generator whose aim is to return consecutive lines from a file
> (the listing below is a simplified version).
> However, as it is written now, the generator method moves the text
> file pointer to the end of the file after the first invocation.
> Namely, the file pointer changes from 0 to 6623 on line 24.
>
It might be that the generator method of self.input_file is reading the
file a chunk at a time for efficiency even though it's yielding a line
at a time.
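This hypothesis can be observed directly in Python 3 (an assumption: the thread's code is Python 2, and the sample file below is a hypothetical stand-in for sac.log.50lines). Reading a single line through the iterator makes the buffered reader pull the whole small file from the OS in one chunk: `f.raw.tell()`, the position of the underlying unbuffered file, is already at the end, while `f.tell()` subtracts what is still sitting in the buffer.

```python
import os
import tempfile

# Hypothetical sample data standing in for sac.log.50lines.
DATA = b"first line\nsecond line\nthird line\n"

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(DATA)
    with open(path, "rb") as f:
        first = next(f)     # the iterator hands out a single line...
        logical = f.tell()  # ...the buffered reader reports where that line ends...
        raw = f.raw.tell()  # ...but the OS-level file was read in one chunk, to the end
finally:
    os.remove(path)

print(first, logical, raw)  # b'first line\n' 11 34
```

Python 3's buffered binary reader compensates for its buffer in `tell()`; the Python 2 run quoted in this thread instead shows `tell()` reporting the end of the read-ahead (6623, the size of the file).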

> Can you suggest how the generator could be changed so that it will allow me
> to get the current location in the file after each yield?
>
> Thanks,
> Ron.
>
> $ cat -n generator.py
>  1  #!/usr/bin/env python
>  2
>  3  import gzip
>  4  from Debug import _line as line
>  5
>  6  class LogStream():
>  7
>  8      def __init__(self, filename):
>  9          self.filename = filename
> 10          self.input_file = self.open_file(filename)
> 11
> 12      def open_file(self, in_file):
> 13          try:
> 14              f = gzip.GzipFile(in_file, "r")
> 15              f.readline()
> 16          except IOError:
> 17              f = open(in_file, "r")
> 18              f.readline()
> 19          f.seek(0)
> 20          return(f)
> 21
> 22      def line_generator(self):
> 23          print line()+". self.input_file.tell()==",self.input_file.tell()
> 24          for line_ in self.input_file:
> 25              print line()+". self.input_file.tell()==",self.input_file.tell()
> 26              yield line_.strip()
> 27
> 28
> 29  if __name__ == "__main__":
> 30
> 31      filename = "sac.log.50lines"
> 32      log_stream = LogStream(filename)
> 33      log_stream.input_file.seek(0)
> 34      line_generator = log_stream.line_generator()
> 35      line_ = line_generator.next()
>
> $ python generator.py
> 23. self.input_file.tell()== 0
> 25. self.input_file.tell()== 6623
>
> $ wc -c sac.log.50lines
> 6623 sac.log.50lines
>

Gabriel Genellina

Dec 24, 2008, 12:37:53 PM
to pytho...@python.org
On Wed, 24 Dec 2008 15:03:58 -0200, MRAB <goo...@mrabarnett.plus.com>
wrote:

>> I have a generator whose aim is to return consecutive lines from a
>> file (the listing below is a simplified version).
>> However, as it is written now, the generator method moves the text
>> file pointer to the end of the file after the first invocation.
>> Namely, the file pointer changes from 0 to 6623 on line 24.
>>
> It might be that the generator method of self.input_file is reading the
> file a chunk at a time for efficiency even though it's yielding a line
> at a time.

I think this is the case too.
I can think of 3 alternatives:

a) open the file unbuffered (bufsize=0). But I think this would greatly
decrease performance.

b) keep track internally of file position (by adding each line length).
The file should be opened in binary mode in this case (to avoid any '\n'
translation).

c) return line numbers only, instead of file positions. Seeking to a
certain line number requires re-reading the file from the start;
depending on how often this is required and how big the file is, this
might be acceptable.
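The three alternatives can be sketched as follows. This is a minimal sketch that assumes Python 3 (the code earlier in the thread is Python 2); the function names, the temporary file, and the sample data are hypothetical stand-ins, not part of the original program. (a) relies on `buffering=0`, which Python 3 only accepts in binary mode; (b) opens the file in binary mode so that the summed lengths are byte counts; (c) re-reads the file from the start on every lookup, using `itertools.islice` to skip ahead.

```python
import itertools
import os
import tempfile

# Hypothetical sample data standing in for sac.log.50lines.
DATA = b"first line\nsecond line\nthird line\n"


def unbuffered_positions(path):
    """(a) Yield (line, position) pairs from an unbuffered file.

    Python 3 accepts buffering=0 only in binary mode; it returns a raw
    io.FileIO whose readline() asks the OS for one byte per read() call,
    so tell() is exact but reading is slow.
    """
    with open(path, "rb", buffering=0) as f:
        for raw_line in f:
            yield raw_line.rstrip(b"\r\n"), f.tell()


def counted_positions(path):
    """(b) Yield (line, position) pairs by summing line lengths.

    position is the byte offset just past the line, i.e. where the next
    line starts. Binary mode keeps len() equal to the bytes on disk.
    """
    position = 0
    with open(path, "rb") as f:
        for raw_line in f:
            position += len(raw_line)
            yield raw_line.rstrip(b"\r\n"), position


def line_at(path, lineno):
    """(c) Return line `lineno` (0-based) by re-reading from the start."""
    with open(path, "rb") as f:
        for raw_line in itertools.islice(f, lineno, lineno + 1):
            return raw_line.rstrip(b"\r\n")
    return None  # the file has fewer than lineno + 1 lines


# Demo: write the sample to a temporary file and run all three.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(DATA)
    a_results = list(unbuffered_positions(path))
    b_results = list(counted_positions(path))
    c_result = line_at(path, 1)
finally:
    os.remove(path)

print(a_results)  # [(b'first line', 11), (b'second line', 23), (b'third line', 34)]
print(b_results)  # same as a_results
print(c_result)   # b'second line'
```

Options (a) and (b) report the same positions here; the difference is that (a) pays for exactness with one read() system call per byte, while (b) keeps normal buffering and does the bookkeeping itself.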

--
Gabriel Genellina

MRAB

Dec 24, 2008, 1:00:26 PM
to pytho...@python.org
readline() appears to work as expected, leaving the file position at the
start of the next line.
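A minimal Python 3 sketch of that approach (an assumption; the thread's code is Python 2) replaces the for loop with repeated readline() calls and reports tell() after each line. The standalone function below is hypothetical, modelled on Ron's line_generator method; it takes an already-open binary file so that it could be handed either a plain open(path, "rb") object or, presumably, gzip.open(path, "rb"), whose tell() should report positions in the uncompressed data.

```python
import io


def line_generator(input_file):
    """Yield (line, position) pairs read with readline() instead of iteration.

    position is input_file.tell() right after the line was read, i.e. the
    offset where the next line starts. input_file is assumed to be opened in
    binary mode, so readline() returns b"" at end of file.
    """
    # iter(callable, sentinel) keeps calling readline() until it returns b"".
    for raw_line in iter(input_file.readline, b""):
        yield raw_line.strip(), input_file.tell()


# Usage with an in-memory stand-in for the log file (hypothetical data).
sample = io.BytesIO(b"first line\nsecond line\nthird line\n")
for line, position in line_generator(sample):
    print(position, line)
# 11 b'first line'
# 23 b'second line'
# 34 b'third line'
```

The b"" sentinel assumes binary mode; for a file opened in text mode, readline() returns "" at end of file, so the sentinel would change accordingly. In Python 3 text mode the readline() form matters even more, because calling tell() inside a `for line in f` loop raises OSError there.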