
getting file size


Bob Smith

Jan 21, 2005, 8:49:35 PM

Are these the same:

1. f_size = os.path.getsize(file_name)

2. fp1 = file(file_name, 'r')
data = fp1.readlines()
last_byte = fp1.tell()

I always get the same value when doing 1. or 2. Is there a reason I
should do both? When reading to the end of a file, won't tell() be just
as accurate as os.path.getsize()?

Thanks guys,

Bob

John Machin

Jan 21, 2005, 11:59:39 PM

Read the docs. Note the hint that you get what the stdio serves up.
ftell() can only be _guaranteed_ to give you a magic cookie that you
may later use with fseek(magic_cookie) to return to the same place in a
more reliable manner than with Hansel & Gretel's non-magic
bread-crumbs. On 99.99% of modern filesystems, the cookie obtained by
ftell() when positioned at EOF is in fact the size in bytes. But why
chance it? os.path.getsize does as its name suggests; why not use it,
instead of a method with a side-effect? As for doing _both_, why would
you??
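A Python 3 sketch of the `stat`-based route John recommends (the helper names are hypothetical): `os.path.getsize()` returns `os.stat(path).st_size`, and for a file that is already open, `os.fstat()` on its descriptor gives the size without the side effect of moving the file position.

```python
import os


def size_of_path(path):
    # os.path.getsize(path) is equivalent to os.stat(path).st_size;
    # nothing is opened or read.
    return os.path.getsize(path)


def size_of_open_file(fp):
    # For a file that is already open, fstat() on its descriptor reports
    # the size without reading and without changing the current position
    # (unlike seeking to the end and calling tell()).
    return os.fstat(fp.fileno()).st_size
```

For a file open for writing, call `flush()` first, since `st_size` only reflects data already handed to the operating system.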

Marc 'BlackJack' Rintsch

Jan 22, 2005, 4:21:06 PM

You don't always get the same value, even on systems where `tell()`
returns a byte position. You need the rights to read the file in case 2.

>>> import os
>>> os.path.getsize('/etc/shadow')
612L
>>> f = open('/etc/shadow', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: [Errno 13] Permission denied: '/etc/shadow'
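The same point as a Python 3 sketch, assuming a POSIX system: `os.stat()` needs only search permission on the containing directory, while `open()` needs read permission on the file itself. The helper `describe_size` is hypothetical.

```python
import os


def describe_size(path):
    # The size comes from os.stat() and is available even when the
    # file itself cannot be opened for reading.
    size = os.path.getsize(path)
    try:
        with open(path, "rb"):
            readable = True
    except PermissionError:
        # Python 3 raises PermissionError (a subclass of OSError) for
        # errno EACCES, which Python 2 reported as IOError.
        readable = False
    return size, readable
```

Run as an ordinary user against a root-only file such as `/etc/shadow` on a typical Linux system, this would be expected to report the size with `readable` set to `False`; run as root, both calls succeed.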

Ciao,
Marc 'BlackJack' Rintsch

Tim Roberts

Jan 23, 2005, 12:19:07 AM

Bob Smith <bob_smi...@hotmail.com> wrote:

On Windows, those two are not equivalent. Besides the newline conversion
done by reading text files, the solution in 2. will stop as soon as it sees
a ctrl-Z.

If you used 'rb', you'd be much closer.
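Tim's suggestion applied to Bob's second approach, as a Python 3 sketch (the function name is hypothetical): in binary mode there is no newline translation and no special meaning for the ctrl-Z byte (0x1A), so after reading to EOF, `tell()` is the number of bytes consumed.

```python
def size_by_reading(path):
    # Bob's approach 2 in binary mode: every byte, including '\r' and
    # 0x1A, is returned as data, so the final position equals the size.
    with open(path, "rb") as fp:
        fp.readlines()  # read to EOF; the lines themselves are discarded
        return fp.tell()
```

It still reads the whole file, so `os.path.getsize()` remains the cheaper choice when only the size is wanted.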
--
- Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

John Machin

Jan 23, 2005, 4:18:01 AM

Tim Roberts wrote:
> Bob Smith <bob_smi...@hotmail.com> wrote:
>
> >Are these the same:
> >
> >1. f_size = os.path.getsize(file_name)
> >
> >2. fp1 = file(file_name, 'r')
> > data = fp1.readlines()
> > last_byte = fp1.tell()
> >
> >I always get the same value when doing 1. or 2. Is there a reason I
> >should do both? When reading to the end of a file, won't tell() be just
> >as accurate as os.path.getsize()?
>
> On Windows, those two are not equivalent. Besides the newline conversion
> done by reading text files,

Doesn't appear to me to go wrong due to newline conversion:

Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
>>> import os.path
>>> txt = 'qwertyuiop\nasdfghjkl\nzxcvbnm\n'
>>> file('bob', 'w').write(txt)
>>> len(txt)
29
>>> os.path.getsize('bob')
32L ##### as expected
>>> f = file('bob', 'r')
>>> lines = f.readlines()
>>> lines
['qwertyuiop\n', 'asdfghjkl\n', 'zxcvbnm\n']
>>> f.tell()
32L ##### as expected
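The transcript is Python 2 on Windows, where text-mode `tell()` happened to return the byte offset. In Python 3, text-mode `tell()` returns an opaque cookie, so a sketch that makes the newline conversion visible compares the decoded character count with the on-disk size instead. The helper name is hypothetical, and the file is written with `newline="\r\n"` so the result should be the same on any platform.

```python
import os
import tempfile


def text_vs_disk_length(path):
    # Text mode with universal newlines (the Python 3 default) turns
    # each CRLF pair on disk into a single '\n' in the returned string.
    with open(path, "r", encoding="ascii") as fp:
        text = fp.read()
    return len(text), os.path.getsize(path)


if __name__ == "__main__":
    txt = "qwertyuiop\nasdfghjkl\nzxcvbnm\n"
    # newline="\r\n" writes CRLF line endings on every platform,
    # mimicking Windows text mode in the transcript.
    with tempfile.NamedTemporaryFile("w", newline="\r\n", encoding="ascii",
                                     delete=False) as fp:
        fp.write(txt)
        name = fp.name
    print(text_vs_disk_length(name))  # (29, 32)
    os.remove(name)
```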

> the solution in 2. will stop as soon as it sees
> a ctrl-Z.

... and the value returned by f.tell() is not the position of the
ctrl-Z but more likely the position of the end of the current block --
which could be thousands/millions of bytes before the physical end of
the file.

Good ol' CP/M.

>
> If you used 'rb', you'd be much closer.

And be much less hassled when that ctrl-Z wasn't meant to mean EOF, it
just happened to appear in an unvalidated data field part way down a
critical file :-(
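If a file might contain a stray ctrl-Z in a data field, a minimal sketch (the helper `find_ctrl_z` is hypothetical) can locate it by scanning in binary mode, where 0x1A is just another byte. Reading in fixed-size chunks keeps memory use bounded; a single-byte marker cannot straddle two chunks, so no overlap handling is needed.

```python
CTRL_Z = b"\x1a"


def find_ctrl_z(path, chunk_size=64 * 1024):
    # Return the byte offset of the first 0x1A in the file, or -1.
    offset = 0
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                return -1  # reached EOF without finding the byte
            idx = chunk.find(CTRL_Z)
            if idx != -1:
                return offset + idx
            offset += len(chunk)
```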
