
How to do this in Python?


Jim Garrison

Mar 17, 2009, 6:04:36 PM
I'm an experienced C/Java/Perl developer learning Python.

What's the canonical Python way of implementing this pseudocode?

String buf
File f
while ((buf=f.read(10000)).length() > 0)
{
    do something....
}

In other words, I want to read a potentially large file in 10000 byte
chunks (or some other suitably large chunk size). Since the Python
'file' object implements __next__() only in terms of lines (even,
it seems, for files opened in binary mode) I can't see how to use
the Python for statement in this context.

Am I missing something basic, or is this the canonical way:

with open(filename, "rb") as f:
    buf = f.read(10000)
    while len(buf) > 0:
        # do something....
        buf = f.read(10000)

Josh Holland

Mar 17, 2009, 6:10:20 PM
to pytho...@python.org
On Tue, Mar 17, 2009 at 05:04:36PM -0500, Jim Garrison wrote:
> What's the canonical Python way of implementing this pseudocode?
>
> String buf
> File f
> while ((buf=f.read(10000)).length() > 0)
> {
> do something....
> }

That looks more like C than pseudocode to me...
Someone's been spending far too much time on C-like languages, if that's
what your idea of simply readable code looks like. Thank heavens you
found Python before it was too late!

--
Josh Holland <j...@joshh.co.uk>
http://joshh.co.uk
madmartian on irc.freenode.net

andrew cooke

Mar 17, 2009, 6:30:06 PM
to Jim Garrison, pytho...@python.org

embarrassed by the other reply i have read, but not doing much binary i/o
myself, i suggest:

with open(...) as f:
    while (True):
        buf = f.read(10000)
        if not buf: break
        ...

but are you sure you don't want:

with open(...) as f:
    for line in f:
        ...

andrew

Matthew Woodcraft

Mar 17, 2009, 6:34:31 PM
Jim Garrison <jgar...@troux.com> writes:

> buf = f.read(10000)
> while len(buf) > 0:
>     # do something....
>     buf = f.read(10000)

I think it's more usual to use a 'break' rather than duplicate the read.

That is, something more like

while True:
    buf = f.read(10000)
    if len(buf) == 0:
        break
    # do something

-M-

Tim Chase

Mar 17, 2009, 6:34:37 PM
to Jim Garrison, pytho...@python.org
> Am I missing something basic, or is this the canonical way:
>
> with open(filename, "rb") as f:
>     buf = f.read(10000)
>     while len(buf) > 0:
>         # do something....
>         buf = f.read(10000)

That will certainly do. Since read() should simply return a
0-length string when you're sucking air, you can just use the
test "while buf" instead of "while len(buf) > 0".

However, if you use it multiple places, you might consider
writing an iterator/generator you can reuse:

def chunk_file(fp, chunksize=10000):
    s = fp.read(chunksize)
    while s:
        yield s
        s = fp.read(chunksize)

with open(filename1, 'rb') as f:
    for portion in chunk_file(f):
        do_something_with(portion)

with open(filename2, 'rb') as f:
    for portion in chunk_file(f, 1024):
        do_something_with(portion)
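
As a quick check of the generator above, here is a minimal sketch that runs it against an in-memory stream instead of a real file. The io.BytesIO object, the 25-byte payload, and the chunk size of 10 are assumptions made up for the demonstration.

```python
import io

def chunk_file(fp, chunksize=10000):
    """Yield successive chunks from fp until read() returns an empty result."""
    s = fp.read(chunksize)
    while s:
        yield s
        s = fp.read(chunksize)

# io.BytesIO is a stand-in for a file opened with "rb" (an assumption for the demo).
stream = io.BytesIO(b"x" * 25)
sizes = [len(portion) for portion in chunk_file(stream, 10)]
print(sizes)  # 25 bytes split into chunks of at most 10 -> [10, 10, 5]
```

The last chunk is simply shorter than the rest, so the caller never has to special-case the end of the file.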

-tkc

Luis Zarrabeitia

Mar 17, 2009, 6:35:50 PM
to pytho...@python.org
On Tuesday 17 March 2009 06:04:36 pm Jim Garrison wrote:
>
> Am I missing something basic, or is this the canonical way:
>
> with open(filename, "rb") as f:
>     buf = f.read(10000)
>     while len(buf) > 0:
>         # do something....
>         buf = f.read(10000)

well, a bit more canonical would be:
    ...
    while buf:
        # do something
        ...
instead of comparing len(buf) with 0. But that's a minor detail.

One could use this:

with open(filename, "rb") as f:
    for buf in iter(lambda: f.read(1000), ''):
        do_something(buf)

but I don't really like a lambda in there. I guess one could use
functools.partial instead, but it still looks ugly to me. Oh, well, I guess I
also want to see the canonical way of doing it.

--
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie

Armin

Mar 16, 2009, 8:05:24 PM
to pytho...@python.org
On Tuesday 17 March 2009 19:10:20 Josh Holland wrote:
> On Tue, Mar 17, 2009 at 05:04:36PM -0500, Jim Garrison wrote:
> > What's the canonical Python way of implementing this pseudocode?
> >
> > String buf
> > File f
> > while ((buf=f.read(10000)).length() > 0)
> > {
> > do something....
> > }
>
> That looks more like C than pseudocode to me...
> Someone's been spending far too much time on C-like languages, if that's
> what your idea of simply readable code looks like. Thank heavens you
> found Python before it was too late!

I should agree, that looks too much like C. (except there are no ; at the end
of first two lines). And I'm sure you will much enjoy your adventure as a
pythonista (pythanista?) just as I have after migration from C++.

--
Armin Moradi

Jim Garrison

Mar 17, 2009, 8:54:54 PM

Ah. That's the Pythonesque way I was looking for. I knew
it would be a generator/iterator but haven't got the Python
mindset down yet and haven't played with writing my own
generator. I'm still trying to think in purely object-
oriented terms where I would override __next__() to
return a chunk of the appropriate size.

Give a man some code and you solve his immediate problem.
Show him a pattern and you've empowered him to solve
his own problems. Thanks!

Jim Garrison

Mar 17, 2009, 9:00:51 PM
andrew cooke wrote:
> Jim Garrison wrote:
>> I'm an experienced C/Java/Perl developer learning Python.
>>
>> What's the canonical Python way of implementing this pseudocode?
[snip]

>
> embarrassed by the other reply i have read,

There's always some "trollish" behavior in any comp.lang.*
group. Too many people treat languages as religions instead
of tools. They all have strengths and weaknesses :-)

> but not doing much binary i/o
> myself, i suggest:
>
> with open(...) as f:
>     while (True):
>         buf = f.read(10000)
>         if not buf: break
>         ...
>
> but are you sure you don't want:
>
> with open(...) as f:
>     for line in f:
>         ...
>
> andrew

For a one-off, your first example would work fine. See the
other reply from Tim Chase for a much more Pythonesque
pattern. I don't want "for line in f:" because binary
files don't necessarily have lines and I'm bulk processing
files potentially 100MB and larger. Reading them one line
at a time would be highly inefficient.

Thanks

MRAB

Mar 17, 2009, 9:19:07 PM
to pytho...@python.org
Jim Garrison wrote:
[snip]

> Ah. That's the Pythonesque way I was looking for.
>
FYI, the correct word is "Pythonic". "Pythonesque" refers to Monty
Python.

Terry Reedy

Mar 17, 2009, 9:33:27 PM
to pytho...@python.org
Jim Garrison wrote:

> Ah. That's the Pythonesque way I was looking for. I knew
> it would be a generator/iterator but haven't got the Python
> mindset down yet and haven't played with writing my own
> generator. I'm still trying to think in purely object-
> oriented terms where I would override __next__() to
> return a chunk of the appropriate size.
>
> Give a man some code and you solve his immediate problem.
> Show him a pattern and you've empowered him to solve
> his own problems. Thanks!

Python's iterator-fed for-loops are its primary motor for calculation.
Anytime one thinks of processing successive items with a while-loop, one
could consider factoring out the production of the successive items with
an iterator. While loops are really only needed for repeated processing
of a single object.

tjr

Grant Edwards

Mar 17, 2009, 11:59:30 PM

if not f: break
# do something

--
Grant

Grant Edwards

Mar 18, 2009, 12:05:02 AM

That's not pythonic unless you really do need to use
chunk_file() in a lot of places (IMO, more than 3 or 4). If it is
only going to be used once, then just do the usual thing:

f = open(...)
while True:
    buf = f.read()
    if not buf: break
    # whatever.
f.close()

Or, you can substitute a with if you want.

--
Grant

Grant Edwards

Mar 18, 2009, 12:06:37 AM

Ow! Botched that in a couple ways....

with open(filename, "rb") as f:
    while True:
        buf = f.read(10000)
        if not buf: break

bief...@gmail.com

Mar 18, 2009, 3:59:43 AM
On Mar 18, 2:00 am, Jim Garrison <j...@acm.org> wrote:

>  I don't want "for line in f:" because binary
> files don't necessarily have lines and I'm bulk processing
> files potentially 100MB and larger.  Reading them one line
> at a time would be highly inefficient.
>

> Thanks

As far as I know, there are at least two levels of cache between your
application and the actual file: the Python interpreter buffers its
reads, and the operating system does that too. So if you are worried
about reading the file efficiently, I think you can stop worrying.
If, on the other hand, you are processing files which might not have
line terminators at all, then reading in blocks is the right thing to do.

Ciao
----
FB

Hrvoje Niksic

Mar 18, 2009, 4:45:22 AM
Luis Zarrabeitia <ky...@uh.cu> writes:

> One could use this:
>
> with open(filename, "rb") as f:
>     for buf in iter(lambda: f.read(1000), ''):
>         do_something(buf)

This is by far the most pythonic solution, it uses the standard 'for'
loop, and clearly marks the sentinel value. lambda may look strange
at first, but this kind of thing is exactly what lambdas are for.
Judging by the other responses, it would seem that few people are
aware of the two-argument 'iter'.

> but I don't really like a lambda in there. I guess one could use
> functools.partial instead, but it still looks ugly to me. Oh, well,
> I guess I also want to see the canonical way of doing it.

I believe you've just written it.

Tim Chase

Mar 18, 2009, 6:23:16 AM
to Grant Edwards, pytho...@python.org
>>> def chunk_file(fp, chunksize=10000):
>>>     s = fp.read(chunksize)
>>>     while s:
>>>         yield s
>>>         s = fp.read(chunksize)
>>
>> Ah. That's the Pythonesque way I was looking for.
>
> That's not pythonic unless you really do need to use
> chunk_file() in a lot of places (IMO, more than 3 or 4). If it is
> only going to be used once, then just do the usual thing:

Different strokes for different folks -- my reuse threshold tends
towards "more than once". So even a mere 2 copies of the same
pattern would warrant refactoring out this idiom.

Thanks also to those in the thread that have modeled the
two-argument iter() sentinel form -- nifty.

-tkc


Ulrich Eckhardt

Mar 18, 2009, 7:09:14 AM
Grant Edwards wrote:
> with open(filename, "rb") as f:
>     while True:
>         buf = f.read(10000)
>         if not buf: break
>         # do something

The pattern

with foo() as bar:
    # do something with bar

is equivalent to

bar = foo()
if bar:
    # do something with bar

except for the calls to __enter__ and __exit__, right? What I was wondering
was whether a similar construct was considered for a while loop or even an
if clause, because then the above could be written like this:

if open(filename, 'rb') as f:
    while f.read(1000) as buf:
        # do something with 'buf'
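
For readers on Python 3.8 or later, assignment expressions (PEP 572) give something close to the "while ... as buf" spelling imagined above. A minimal sketch, with io.BytesIO standing in for the opened file and the chunk size of 8 chosen arbitrarily for the demonstration:

```python
import io

# io.BytesIO stands in for open(filename, "rb"); the data is illustrative.
f = io.BytesIO(b"abcdefghij" * 3)

chunks = []
# ":=" binds the result of read() and tests it in one expression (Python 3.8+).
while buf := f.read(8):
    chunks.append(buf)

print([len(c) for c in chunks])  # 30 bytes in reads of 8 -> [8, 8, 8, 6]
```

This keeps a single read() call in the loop header, which is what the original pseudocode was reaching for. It does not exist in the Python versions current at the time of this thread.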

Uli

--
Sator Laser GmbH
Managing director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

Jim Garrison

Mar 18, 2009, 11:07:49 AM
Luis Zarrabeitia wrote:
> On Tuesday 17 March 2009 06:04:36 pm Jim Garrison wrote:
> with open(filename, "rb") as f:
>     for buf in iter(lambda: f.read(1000), ''):
>         do_something(buf)

This is the most pythonic solution yet.

Thanks to all the responders who took time to ponder this seemingly
trivial question. I learned a lot about the Python mind-set.

Mel

Mar 18, 2009, 11:10:40 AM
Jim Garrison wrote:
> andrew cooke wrote:
>> Jim Garrison wrote:
>>> I'm an experienced C/Java/Perl developer learning Python.
>>> What's the canonical Python way of implementing this pseudocode?
[ ... ]

>> but not doing much binary i/o
>> myself, i suggest:
>>
>> with open(...) as f:
>>     while (True):
>>         buf = f.read(10000)
>>         if not buf: break
>>         ...
[ ... ]

> For a one-off, your first example would work fine. See the
> other reply from Tim Chase for a much more Pythonesque
> pattern. I don't want "for line in f:" because binary
> files don't necessarily have lines and I'm bulk processing
> files potentially 100MB and larger. Reading them one line
> at a time would be highly inefficient.

It would be more work, but subclassing the file class, with a next method
yielding the binary record you want would be fairly clean.
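
A rough sketch of that idea for Python 3, where there is no built-in file class to subclass: a small wrapper whose __next__ returns fixed-size chunks. The name ChunkReader is hypothetical, and it wraps any object with a read() method rather than subclassing one; io.BytesIO stands in for a real file.

```python
import io

class ChunkReader:
    """Iterate over a binary stream in fixed-size chunks (hypothetical helper)."""

    def __init__(self, fp, chunksize=10000):
        self.fp = fp
        self.chunksize = chunksize

    def __iter__(self):
        return self

    def __next__(self):
        buf = self.fp.read(self.chunksize)
        if not buf:
            # An empty read means end of file, so end the iteration.
            raise StopIteration
        return buf

# io.BytesIO stands in for a real file opened in binary mode.
records = list(ChunkReader(io.BytesIO(b"0123456789ABCDEF"), chunksize=6))
print(records)  # [b'012345', b'6789AB', b'CDEF']
```

Compared with a generator function this is more code for the same behaviour, which is probably why the generator version got the warmer reception in this thread.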

Mel.

Jim Garrison

Mar 18, 2009, 11:51:58 AM
Jim Garrison wrote:
> Luis Zarrabeitia wrote:
>> On Tuesday 17 March 2009 06:04:36 pm Jim Garrison wrote:
>> with open(filename, "rb") as f:
>>     for buf in iter(lambda: f.read(1000), ''):
>>         do_something(buf)

>
> This is the most pythonic solution yet.
>
> Thanks to all the responders who took time to ponder this seemingly
> trivial question. I learned a lot about the Python mind-set.

I just tried the code as given above and it results in an infinite loop.

Since f.read() returns a byte string when in binary mode, the sentinel
has to be b''. Is there a value that will compare equal to both '' and b''?
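
Under Python 3 the two-argument iter() loop does terminate once the sentinel has the same type that read() returns. A minimal sketch, using functools.partial in place of the lambda; io.BytesIO and the 2500-byte payload are stand-ins for a real file:

```python
import io
from functools import partial

# io.BytesIO stands in for open(filename, "rb"); the payload is illustrative.
f = io.BytesIO(b"a" * 2500)

total = 0
count = 0
# iter(callable, sentinel) calls f.read(1000) until it returns b"" (end of file).
for buf in iter(partial(f.read, 1000), b""):
    total += len(buf)
    count += 1

print(count, total)  # 3 reads (1000 + 1000 + 500 bytes) -> 3 2500
```

For a file opened in text mode the sentinel would be '' instead, so the sentinel has to be chosen to match the mode.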

It's a shame the iter(o,sentinel) builtin does the
comparison itself, instead of being defined as iter(callable,callable)
where the second argument implements the termination test and returns a
boolean. This would seem to add much more generality... is
it worthy of a PEP?

Andrii V. Mishkovskyi

Mar 18, 2009, 12:01:22 PM
to Jim Garrison, pytho...@python.org

Just before you start writing a PEP, take a look at `takewhile'
function in `itertools' module. ;)

>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Wbr, Andrii V. Mishkovskyi.

He's got a heart of a little child, and he keeps it in a jar on his desk.

Jim Garrison

Mar 18, 2009, 12:23:17 PM
Andrii V. Mishkovskyi wrote:
> Just before you start writing a PEP, take a look at `takewhile'
> function in `itertools' module. ;)

OK, after reading the itertools docs I'm not sure how to use it
in this context. takewhile() requires a sequence, and turning
f.read(bufsize) into an iterable requires iter() (no?) which
wants to do its own termination testing. The following kludge
would subvert iter()'s termination testing but this is starting
to look Perlishly byzantine.

with open(filename, "rb") as f:
    for buf in itertools.takewhile(
            lambda b: b,
            iter(lambda: f.read(1000), None)):
        do_something(buf)

As opposed to

with open(filename, "rb") as f:
    for buf in iter(lambda: f.read(1000), lambda b: b):
        do_something(buf)

where iter(callable,callable) is defined to

1) call the first argument
2) pass the returned value to the second argument
3) yield the first result and continue if the return value
from the second call is True, or terminate if False
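
The three steps above can be sketched as an ordinary generator without changing iter() itself. The name iter_while is hypothetical, and io.BytesIO stands in for the file:

```python
import io

def iter_while(func, predicate):
    """Call func() repeatedly, yielding each result while predicate(result) is true."""
    while True:
        value = func()            # 1) call the first argument
        if not predicate(value):  # 2) pass the result to the second argument
            return                # 3) terminate when the test is false...
        yield value               #    ...otherwise yield and continue

f = io.BytesIO(b"z" * 2300)  # stand-in for open(filename, "rb")
lengths = [len(buf) for buf in iter_while(lambda: f.read(1000), bool)]
print(lengths)  # [1000, 1000, 300]
```

Passing bool as the predicate works for both str and bytes, since an empty value of either type is false.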

S Arrowsmith

Mar 18, 2009, 12:23:05 PM
Jim Garrison <j...@acm.org> wrote:
>It's a shame the iter(o,sentinel) builtin does the
>comparison itself, instead of being defined as iter(callable,callable)
>where the second argument implements the termination test and returns a
>boolean. This would seem to add much more generality... is
>it worthy of a PEP?

class sentinel:
    def __eq__(self, other):
        return termination_test()

for x in iter(callable, sentinel()):
    ...

Writing a sensible sentinel.__init__ is left as an exercise....
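
One way to fill in that exercise, as a sketch only: the constructor stores a predicate, and __eq__ applies it to whatever value iter() compares against the sentinel. The class name PredicateSentinel and the use of io.BytesIO are illustrative assumptions.

```python
import io

class PredicateSentinel:
    """Compares equal to any value for which the stored predicate is true."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __eq__(self, other):
        return bool(self.predicate(other))

f = io.BytesIO(b"q" * 1200)  # stand-in for a file opened with "rb"
# Stop as soon as read() returns something empty, whether '' or b''.
stop_on_empty = PredicateSentinel(lambda value: len(value) == 0)
lengths = [len(buf) for buf in iter(lambda: f.read(500), stop_on_empty)]
print(lengths)  # [500, 500, 200]
```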

--
\S

under construction

Jim Garrison

Mar 18, 2009, 1:06:50 PM

If I understand correctly, this pattern allows me to create
an object (instance of class sentinel) that implements whatever
equality semantics I need to effect loop termination. In the
case in point, then, I end up with

class sentinel:
    def __eq__(self, other):
        return other == '' or other == b''

with open(filename, "rb") as f:
    for buf in iter(lambda: f.read(1000), sentinel()):
        do_something(buf)

i.e. sentinel is really "object that compares equal to both ''
and b''". While I appreciate how this works, I think the
introduction of a whole new class is a bit of overkill for
what should be expressible in iter()

andrew cooke

Mar 18, 2009, 1:18:24 PM
to Jim Garrison, pytho...@python.org

have you looked in operators? that might avoid the need for a class.



andrew cooke

Mar 18, 2009, 1:19:29 PM
to Jim Garrison, pytho...@python.org
sorry, ignore that. hit send before thinking properly.

Luis Zarrabeitia

Mar 18, 2009, 2:08:35 PM
to pytho...@python.org

Quoting Jim Garrison <j...@acm.org>:

> Jim Garrison wrote:
> > Luis Zarrabeitia wrote:
> >> On Tuesday 17 March 2009 06:04:36 pm Jim Garrison wrote:
> >> with open(filename, "rb") as f:
> >>     for buf in iter(lambda: f.read(1000), ''):
> >>         do_something(buf)
> >
> > This is the most pythonic solution yet.
> >
> > Thanks to all the responders who took time to ponder this seemingly
> > trivial question. I learned a lot about the Python mind-set.
>
> I just tried the code as given above and it results in an infinite loop.
>
> Since f.read() returns a byte string when in binary mode, the sentinel
> has to be b''. Is there a value that will compare equal to both '' and b''?

Thank you for the correction.
It works in python2.5 (on my Debian, at least), but I can see why it doesn't in
python3.

> It's a shame the iter(o,sentinel) builtin does the
> comparison itself, instead of being defined as iter(callable,callable)
> where the second argument implements the termination test and returns a
> boolean.

A shame indeed. The "takewhile" workaround is way too obfuscated, and to create
a new class only for this purpose is an overkill (and I'd also classify it as
obfuscated).

> This would seem to add much more generality... is
> it worthy of a PEP?

and, it wouldn't need to replace the current sentinel implementation... one
keyword argument, "predicate", would suffice.

+1.

Cheers,

--
Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie


Take part in Universidad 2010, February 8 to 12, 2010
Havana, Cuba
http://www.universidad2010.cu

afr...@yahoo.co.uk

Mar 18, 2009, 8:27:13 PM
On Mar 18, 3:05 pm, Grant Edwards <gra...@visi.com> wrote:
> [snip] ... If it is
> only going to be used once, then just do the usual thing:
>
> f = open(...)
> while True:
>    buf = f.read()
>    if not buf: break
>    # whatever.
> f.close()

+1

That's the canonical way (maybe using "with ... as" nowadays). Surely
everything else is simply overkill (or unwarranted cleverness) here.
In any case this is what most practising Pythonistas would comprehend
instantly.

Jervis Whitley

Mar 18, 2009, 9:31:00 PM
to Ulrich Eckhardt, pytho...@python.org
> What I was wondering
> was whether a similar construct was considered for a while loop or even an
> if clause, because then the above could be written like this:
>
>  if open(filename, 'rb') as f:
>      while f.read(1000) as buf:
>          # do something with 'buf'
>

see here, and the associated bug tracker item.

http://www.python.org/dev/peps/pep-0379/

There is also a thread on python-ideas (sorry no links handy) that
deals with this issue, and the
difference between the use of 'as' and assignment.

This is the PEP that describes the 'with' statement and it goes into
the choice of the 'as' keyword.
http://www.python.org/dev/peps/pep-0343/

(from PEP 343) ..
So now the final hurdle was that the PEP 310 syntax:

    with VAR = EXPR:
        BLOCK1

would be deceptive, since VAR does *not* receive the value of
EXPR. Borrowing from PEP 340, it was an easy step to:

    with EXPR as VAR:
        BLOCK1

Cheers,

Jervis

bief...@gmail.com

Mar 19, 2009, 10:05:12 AM
> what should be expressible in iter()


In this specific case it should not be necessary to create a class,
because, at least with Python 2.6:

>>> b'' == ''
True
>>> u'' == ''
True
>>>

so you should be able to do:

with open(filename, "rb") as f:
    for buf in iter(lambda: f.read(1000), ""):
        do_something(buf)


Ciao
------
FB


Scott David Daniels

Mar 19, 2009, 11:01:16 AM

Ah, you misunderstand the short-term expedient that 2.6 took.
Effectively, it simply said, bytes = str.

In 2.6:
>>> str is bytes
True
in 3.X:
>>> str is bytes
False
>>> b'' == ''
False
>>> type(b''), type('')
(<class 'bytes'>, <class 'str'>)
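
The practical consequence for the loop discussed earlier can be checked without hanging the interpreter. A small sketch for Python 3, using itertools.islice to cap the number of reads; io.BytesIO and the 3-byte payload are stand-ins for a real binary file:

```python
import io
from itertools import islice

f = io.BytesIO(b"abc")  # stand-in for a file opened in binary mode

# With the str sentinel '' the loop never sees a match, because b'' != ''.
# islice caps it at 5 reads so the demonstration terminates.
reads = list(islice(iter(lambda: f.read(2), ''), 5))
print(reads)  # [b'ab', b'c', b'', b'', b'']

f.seek(0)
# With the bytes sentinel the iteration stops at end of file.
good = list(iter(lambda: f.read(2), b''))
print(good)  # [b'ab', b'c']
```

The run of empty reads in the first result is the infinite loop reported above, just truncated.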

--Scott David Daniels
Scott....@Acm.Org

Josh Holland

Mar 21, 2009, 11:20:07 AM
to pytho...@python.org
On Tue, Mar 17, 2009 at 08:00:51PM -0500, Jim Garrison wrote:
> There's always some "trollish" behavior in any comp.lang.*
> group. Too many people treat languages as religions instead
> of tools. They all have strengths and weaknesses :-)
If you're referring to my reply (about his pseudocode looking like C), I
hope you realise that it was tongue-in-cheek. For the record, I intend
to learn C in the near future and know it is a very powerful language.
How people would write a kernel in Python?
--
Josh Holland <j...@joshh.co.uk>
http://joshh.co.uk
madmartian on irc.freenode.net

Grant Edwards

Mar 21, 2009, 11:52:14 AM
On 2009-03-21, Josh Holland <j...@joshh.co.uk> wrote:

> If you're referring to my reply (about his pseudocode looking
> like C), I hope you realise that it was tongue-in-cheek. For
> the record, I intend to learn C in the near future and know it
> is a very powerful language.

> How people would write a kernel in Python?

You'd probably use a carefully selected subset of the language,
and limit yourself when it comes to using library modules. I've
seen references to work done on writing OSes in languages like
Python, but I don't know how far they've gotten.

--
Grant

Josh Holland

Mar 21, 2009, 12:11:01 PM
to pytho...@python.org
Sorry, I meant to write "How *many* people ..."

JanC

Mar 21, 2009, 12:25:16 PM
Josh Holland wrote:

> How people would write a kernel in Python?

Like this:
http://code.google.com/p/cleese/wiki/CleeseArchitecture
http://code.google.com/p/cleese/ ?


--
JanC
