Classic situation - I have to process an input stream of unknown length until I reach its end (EOF, End Of File). How do I check for EOF? The input stream can be anything from an opened file through sys.stdin to a network socket. And it's binary and potentially huge (gigabytes), thus "for line in stream.readlines()" isn't really the way to go.
For now I have roughly:
stream = sys.stdin
while True:
    data = stream.read(1024)
    process_data(data)
    if len(data) < 1024:    ## (*)
        break
I smell a fragile point at (*) because as far as I know e.g. network socket streams may return less data than requested even when the socket is still open.
I'd rather have something like:
while not stream.eof():
    ...
but there is no eof() method :-(
This is probably a trivial problem but I haven't found a decent solution.
> Classic situation - I have to process an input stream of unknown length
> until I reach its end (EOF, End Of File). How do I check for EOF? The
> input stream can be anything from an opened file through sys.stdin to a
> network socket. And it's binary and potentially huge (gigabytes), thus
> "for line in stream.readlines()" isn't really the way to go.
> For now I have roughly:
> stream = sys.stdin
> while True:
>     data = stream.read(1024)
      if len(data) == 0:
          break #EOF
>     process_data(data)
--
Grant Edwards                   grante at visi.com
Yow! CALIFORNIA is where people from IOWA or NEW YORK go to subscribe to CABLE TELEVISION!!
Grant Edwards wrote:
> On 2007-02-19, GiBo <g...@gentlemail.com> wrote:
>> Hi!
>> stream = sys.stdin
>> while True:
>>     data = stream.read(1024)
>     if len(data) == 0:
>         break #EOF
>>     process_data(data)
Right, not a big difference though. Isn't there a cleaner / more intuitive way? Like using some wrapper objects around the streams or something?
> Grant Edwards wrote:
>> On 2007-02-19, GiBo <g...@gentlemail.com> wrote:
>>> stream = sys.stdin
>>> while True:
>>>     data = stream.read(1024)
>>     if len(data) == 0:
>>         break #EOF
>>>     process_data(data)
> Right, not a big difference though. Isn't there a cleaner / more
> intuitive way? Like using some wrapper objects around the streams or
> something?
Read the documentation... For a true file object:
    read([size]) ... An empty string is returned when EOF is encountered immediately.
All the other "file-like" objects (like StringIO, socket.makefile, etc) maintain this behavior.
So this is the way to check for EOF. If you don't like how it was spelled, try this:
if data=="": break
If your data is made of lines of text, you can use the file as its own iterator, yielding lines:
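A minimal sketch of that idea, assuming text input. Here io.StringIO stands in for the real stream and process_line is a hypothetical handler, neither of which comes from the thread:

```python
import io

def process_line(line):
    # Hypothetical handler: strip the trailing newline and return the text.
    return line.rstrip("\n")

# io.StringIO is a stand-in for any text-mode file-like object.
stream = io.StringIO("first\nsecond\nthird\n")

results = []
for line in stream:          # the file object yields one line per iteration
    results.append(process_line(line))
```

The loop ends on its own at EOF, so no explicit check is needed. This only suits line-oriented text, not the binary stream from the original question.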
>>> stream = sys.stdin
>>> while True:
>>>     data = stream.read(1024)
>>     if len(data) == 0:
>>         break #EOF
>>>     process_data(data)
> Right, not a big difference though. Isn't there a cleaner /
> more intuitive way?
A file is at EOF when read() returns ''. The above is the cleanest, simplest, most direct way to do what you specified. Everybody does it that way, and everybody recognizes what's being done.
It's also the "standard, Pythonic" way to do it.
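A minimal sketch of that loop, with io.BytesIO standing in for the real stream and read_all_chunks as a hypothetical helper name (both are assumptions for illustration). Note that in Python 3 a binary stream returns b'' rather than '' at EOF, so testing "not data" covers both cases:

```python
import io

def read_all_chunks(stream, bufsize=1024):
    """Collect chunks until read() returns an empty result (EOF)."""
    chunks = []
    while True:
        data = stream.read(bufsize)
        if not data:         # '' or b'' both mean EOF
            break
        chunks.append(data)
    return chunks

# 2500 bytes read in 1024-byte requests: two full chunks and one short one.
stream = io.BytesIO(b"x" * 2500)
chunks = read_all_chunks(stream)
sizes = [len(c) for c in chunks]
```

The last chunk is shorter than bufsize, but the loop only stops on the empty read that follows it.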
> Like using some wrapper objects around the streams or
> something?
You can do that, but then you're mostly just obfuscating things.
--
Grant Edwards                   grante at visi.com
Yow! Vote for ME -- I'm well-tapered, half-cocked, ill-conceived and TAX-DEFERRED!
In article <mailman.4219.1171936242.32031.python-l...@python.org>, Gabriel Genellina wrote:
> So this is the way to check for EOF. If you don't like how it was spelled,
> try this:
> On 2/19/07, Gabriel Genellina <gagsl...@yahoo.com.ar> wrote:
> > On Mon, 19 Feb 2007 21:50:11 -0300, GiBo <g...@gentlemail.com> wrote:
> data = f.read(bufsize)
> while data:
>     # ... process data.
>     data = f.read(bufsize)
> -The only annoying bit is the duplicated line. I find I often follow
> this pattern, and I realize Python doesn't plan to have any sort of
> do-while construct, but even so I prefer this idiom. What's the
> consensus here?
> What about creating a standard binary-file iterator:
> def blocks_of(infile, bufsize = 1024):
>     data = infile.read(bufsize)
>     if data:
>         yield data
> -the use would look like this:
> for block in blocks_of(myfile, bufsize = 2**16):
>     process_data(block) # len(block) <= bufsize...
(ahem), make that iterator something that works, like:
def blocks_of(infile, bufsize = 1024):
    data = infile.read(bufsize)
    while data:
        yield data
        data = infile.read(bufsize)
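A quick check of the corrected generator, as a sketch. The io.BytesIO object and the small bufsize are assumptions chosen to keep the example short:

```python
import io

def blocks_of(infile, bufsize=1024):
    """Yield successive blocks from infile until read() returns nothing."""
    data = infile.read(bufsize)
    while data:
        yield data
        data = infile.read(bufsize)

# Ten bytes read four at a time: two full blocks, then a short final block.
myfile = io.BytesIO(b"abcdefghij")
blocks = list(blocks_of(myfile, bufsize=4))
```

With "if" in place of "while" the generator would yield only the first block and then stop, which is why the loop is needed.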
> On Mon, 19 Feb 2007 21:50:11 -0300, GiBo <g...@gentlemail.com> wrote:
> > Grant Edwards wrote:
> >> On 2007-02-19, GiBo <g...@gentlemail.com> wrote:
> >>> stream = sys.stdin
> >>> while True:
> >>>     data = stream.read(1024)
> >>     if len(data) == 0:
> >>         break #EOF
> >>>     process_data(data)
> > Right, not a big difference though. Isn't there a cleaner / more
> > intuitive way? Like using some wrapper objects around the streams or
> > something?
> Read the documentation... For a true file object:
> read([size]) ... An empty string is returned when EOF is encountered
> immediately.
> All the other "file-like" objects (like StringIO, socket.makefile, etc)
> maintain this behavior.
> So this is the way to check for EOF. If you don't like how it was spelled,
> try this:
> if data=="": break
> If your data is made of lines of text, you can use the file as its own
> iterator, yielding lines:
data = f.read(bufsize)
while data:
    # ... process data.
    data = f.read(bufsize)
-The only annoying bit is the duplicated line. I find I often follow this pattern, and I realize Python doesn't plan to have any sort of do-while construct, but even so I prefer this idiom. What's the consensus here?
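For readers on Python 3.8 or later, an assignment expression removes the duplicated read call. A minimal sketch, where io.BytesIO is a stand-in stream and process_data is a hypothetical handler that only records block sizes:

```python
import io

seen = []

def process_data(block):
    # Hypothetical handler: remember how many bytes each block held.
    seen.append(len(block))

f = io.BytesIO(b"a" * 10)
bufsize = 4

# Read and test in one expression; the loop ends on the first empty read.
while (data := f.read(bufsize)):
    process_data(data)
```

This needs Python 3.8+, so it was not an option when the thread was written.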
What about creating a standard binary-file iterator:
def blocks_of(infile, bufsize = 1024):
    data = infile.read(bufsize)
    if data:
        yield data
-the use would look like this:
for block in blocks_of(myfile, bufsize = 2**16):
    process_data(block) # len(block) <= bufsize...
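The same kind of block loop can also be spelled with the two-argument form of the built-in iter(), which calls a function repeatedly until it returns a sentinel value. A sketch under the assumption of a binary stream (io.BytesIO here), so the sentinel is b"" and not "":

```python
import io
from functools import partial

myfile = io.BytesIO(b"0123456789")
bufsize = 4

blocks = []
# iter(callable, sentinel) keeps calling myfile.read(bufsize)
# and stops as soon as the result equals the sentinel b"".
for block in iter(partial(myfile.read, bufsize), b""):
    blocks.append(block)
```

The sentinel has to match the stream type: b"" for binary streams, "" for text streams. A mismatch would never compare equal and the loop would not end.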
On Feb 19, 6:58 pm, GiBo <g...@gentlemail.com> wrote:
> Hi!
> Classic situation - I have to process an input stream of unknown length
> until I reach its end (EOF, End Of File). How do I check for EOF? The
> input stream can be anything from an opened file through sys.stdin to a
> network socket. And it's binary and potentially huge (gigabytes), thus
> "for line in stream.readlines()" isn't really the way to go.
Could you use xreadlines()? It's a lazily-evaluated stream reader.
> For now I have roughly:
> stream = sys.stdin
> while True:
>     data = stream.read(1024)
>     process_data(data)
>     if len(data) < 1024:    ## (*)
>         break
> I smell a fragile point at (*) because as far as I know e.g. network
> socket streams may return less data than requested even when the socket
> is still open.
Well, it depends on a lot of things. Is the stream blocking or non-blocking (on sockets and some other sorts of streams, you can pick this yourself)? What are the underlying semantics (reliable-and-blocking TCP or dropping-and-unordered UDP)? Unfortunately, you really need to just know what you're working with (and there's really no better solution; trying to hide the underlying semantics under a prescribed overlaid set of semantics can only lead to badness in the long run).
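To make the short-read point concrete, here is a small sketch for a blocking stream socket. It uses socket.socketpair() as a local stand-in for a real network connection, and recv_until_eof is a hypothetical helper name; both are assumptions for illustration:

```python
import socket

def recv_until_eof(sock, bufsize=1024):
    """Read from a blocking stream socket until the peer closes it."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:         # b'' means the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)

writer, reader = socket.socketpair()

writer.sendall(b"hello")
# Far fewer than 1024 bytes come back, yet the peer is still open,
# so a short read cannot be used as an EOF test.
first = reader.recv(1024)

writer.sendall(b" world")
writer.close()               # only now will the reader see EOF
rest = recv_until_eof(reader)
reader.close()
```

On a non-blocking socket recv() raises an exception when no data is ready instead of returning b'', so this loop applies to the blocking case only.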
> I'd rather have something like:
> while not stream.eof():
>     ...
> but there is no eof() method :-(
> This is probably a trivial problem but I haven't found a decent solution.
For your case, it's not so hard: http://pyref.infogami.com/EOFError says "read() and readline() methods of file objects return an empty string when they hit EOF." so you should assume that if something is claiming to be a file-like object that it will work this way.
> Any hints?
So:
stream = sys.stdin
while True:
    data = stream.read(1024)
    if data=="":
        break
    process_data(data)