The only solutions that come to mind in such a case are using a
thread/fork or having a non-blocking version of listdir() that returns an
iterator.
What do you think about that?
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas
> I would find very useful having a version of os.listdir returning a
> generator.
If there are no technical issues in the way, such a replacement (rather
than addition) would be in line with other list -> iterator replacements in
3.0 (range, dict.items, etc.). A list could then be obtained with
list(os.listdir).
tjr
But how common is this use case really?
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
-1
The problem is that reading a directory requires an open file handle;
given a generator context, there's no clear mechanism for determining
when to close the handle. Because the list needs to be created in the
first place, why bother with a generator?
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"Typing is cheap. Thinking is expensive." --Roy Smith
Whenever the generator is __del__ed, or whenever the iteration
completes, whichever comes first?
> Because the list needs to be created in the first place
How so?
Maybe what we really want is the functionality of
the C opendir and readdir functions exposed in the os
module. Then we could have an explicit method for
closing the file handle.
--
Greg
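The "closed on completion or on deletion, whichever comes first" behaviour can be sketched with a generator and try/finally. This is only an illustration: FakeDirHandle and iter_names are hypothetical names standing in for a real directory handle and are not part of the os module.

```python
class FakeDirHandle:
    """Stand-in for an OS directory handle (hypothetical, for illustration)."""

    def __init__(self, names):
        self._names = list(names)
        self.closed = False

    def read(self):
        # Return the next name, or "" once the directory is exhausted.
        return self._names.pop(0) if self._names else ""

    def close(self):
        self.closed = True


def iter_names(handle):
    """Yield names one at a time; always close the handle on the way out."""
    try:
        while True:
            name = handle.read()
            if not name:
                return
            yield name
    finally:
        # Runs when iteration completes, when close() is called on the
        # generator, or when a started generator is garbage collected.
        handle.close()


# Case 1: iteration runs to completion.
full = FakeDirHandle(["a", "b", "c"])
names = list(iter_names(full))

# Case 2: iteration is abandoned early and the generator is closed explicitly.
partial = FakeDirHandle(["a", "b", "c"])
gen = iter_names(partial)
first = next(gen)
gen.close()
```

Relying on garbage collection alone is the weak point: CPython usually finalizes an abandoned generator as soon as its last reference goes away, but other implementations may delay it, which is why an explicit close remains useful.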
It doesn't, actually. On Windows, os.listdir uses FindFirstFile and
FindNextFile, on OS/2 it's DosFindFirst and DosFindNext, and on
everything else it's POSIX opendir and readdir. All of these are
incremental, so a generator is the most natural way to expose the
underlying API.
That's just a set of facts and a single opinion. Past that I personally
have no preference.
Neil
What about an os.iterdir() generator which uses opendir/readdir as proposed?
The generator's close() could also call closedir(), and you could have a
warning in the docs about making sure to have it closed at some point.
One could even use an enclosing with closing(os.iterdir()) as d: block.
Georg
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
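Georg's closing() idea could look roughly like the sketch below. Note the assumptions: os.iterdir() does not exist, so iterdir here is a hypothetical stand-in, and it is built on os.scandir(), which was added to Python long after this thread (3.5, with close() in 3.6).

```python
import os
import tempfile
from contextlib import closing


def iterdir(path):
    """Hypothetical os.iterdir(): lazily yield the names in a directory."""
    scanner = os.scandir(path)  # holds the open directory handle
    try:
        for entry in scanner:
            yield entry.name
    finally:
        scanner.close()  # plays the role of closedir()


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("a.txt", "b.txt", "c.txt"):
        open(os.path.join(tmp, filename), "w").close()

    with closing(iterdir(tmp)) as names:
        first = next(names)  # stop after a single entry

    # closing() has called names.close(), so nothing more is produced.
    leftover = list(names)
```

The with block guarantees the handle is released even though only one entry was consumed.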
I was feeling in the mood for a diversion, so I whipped up
a Pyrex prototype of an opendir() object that can be used
either as a file-like object or an iterator.
Here's the docstring:
"""opendir(pathname) --> an open directory object
Opens a directory and provides incremental access to
the filenames it contains. May be used as a file-like
object or as an iterator.
When used as a file-like object, each call to read()
returns one filename, or an empty string when the end
of the directory is reached. The close() method should
be called when finished with the directory.
The close() method should also be called when used as
an iterator and iteration is stopped prematurely. If
iteration proceeds to completion, the directory is
closed automatically."""
Source, setup.py and a brief test attached.
--
Greg
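For readers without the attachment, the behaviour in that docstring can be approximated in pure Python. This is a sketch under stated assumptions, not the Pyrex source: the class name OpenDir is made up, it relies on os.scandir() (Python 3.6+ for close()), and it leaves out tell() and seek() because os.scandir() does not expose them.

```python
import os
import tempfile


class OpenDir:
    """Pure-Python approximation of the opendir() object described above."""

    def __init__(self, pathname):
        self._scanner = os.scandir(pathname)
        self.closed = False

    def read(self):
        """Return one filename, or "" when the end of the directory is reached."""
        if self.closed:
            raise ValueError("read() on a closed directory")
        entry = next(self._scanner, None)
        return entry.name if entry is not None else ""

    def close(self):
        if not self.closed:
            self._scanner.close()
            self.closed = True

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise StopIteration
        name = self.read()
        if not name:
            self.close()  # iteration ran to completion: close automatically
            raise StopIteration
        return name


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("one", "two", "three"):
        open(os.path.join(tmp, filename), "w").close()

    # File-like use: read() until the empty string, then close() by hand.
    d = OpenDir(tmp)
    read_names = []
    while True:
        name = d.read()
        if not name:
            break
        read_names.append(name)
    d.close()

    # Iterator use: the directory is closed when iteration completes.
    d2 = OpenDir(tmp)
    iterated = sorted(d2)
    closed_after_iteration = d2.closed
```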
On 23 Nov, 02:40, "Guido van Rossum" <gu...@python.org> wrote:
> On Nov 22, 2007 3:25 PM, Terry Reedy <tjre...@udel.edu> wrote:
>
> > "Giampaolo Rodola'" <gne...@gmail.com> wrote
> > > I would find very useful having a version of os.listdir returning a
> > > generator.
>
> > If there are no technical issues in the way, such a replacement (rather
> > than addition) would be in line with other list -> iterator replacements in
> > 3.0 (range, dict.items, etc.). A list could then be obtained with
> > list(os.listdir).
>
> But how common is this use case really?
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
from opendir import opendir

print "READ"
d = opendir(".")
while 1:
    name = d.read()
    if not name:
        break
    print " ", name
print "EOF"

print "ITERATE"
d = opendir(".")
for name in d:
    print " ", name
print "STOP"

print "TELL/SEEK"
d = opendir(".")
for i in range(3):
    name = d.read()
    print " ", name
pos = d.tell()
for i in range(3):
    name = d.read()
    print " ", name
d.seek(pos)
while 1:
    name = d.read()
    if not name:
        break
    print " ", name
print "EOF"
This is exactly the usage I was talking about.
Enh. That is not reliable without work, and getting it reliable is a
waste of work. The proposed idea for adding an opendir() function is
workable, but it still doesn't solve the need for closing the handle
within listdir().
No matter what, changing the semantics of listdir() to leave a handle
lying around is going to cause problems for some people.
>> Because the list needs to be created in the first place
>
> How so?
If you're going to ask a question, it would be nice to leave the entire
original context in place, especially given that it's not a particularly
long chunk of text.
Anyway, the Windows case aside, if you don't have a reliable close()
mechanism, you need to slurp the whole thing into a list in one swell
foop so that you can just close the handle. Even in the Windows case,
you need a handle, and I don't know what the consequences are of leaving
it lying around.
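The "slurp it all, then close" approach described here can be sketched as follows. listdir_eager is a hypothetical name, and the sketch assumes os.scandir() as a context manager (Python 3.6+), which did not exist when this was written.

```python
import os
import tempfile


def listdir_eager(path):
    """Read every name into a list, closing the handle before returning."""
    with os.scandir(path) as scanner:
        return [entry.name for entry in scanner]


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("x", "y"):
        open(os.path.join(tmp, filename), "w").close()
    eager = sorted(listdir_eager(tmp))
    builtin = sorted(os.listdir(tmp))
```

The caller never sees the handle, so there is nothing left to close; the price is that the whole list is built before the first name is available.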
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"Typing is cheap. Thinking is expensive." --Roy Smith
--Guido
I sincerely don't know.
Surely it's a rather specific use case, but it is one of the tasks
which takes the longest amount of time on an FTP server. 20,000 is
probably an exaggerated hypothetical situation, so I did a simple test
with a more realistic scenario.
On Windows, a very crowded directory is C:\windows\system32. Currently
the C:\windows\system32 of my Windows XP workstation contains 2201
files.
I ran the code below, which is how an FTP server should
properly respond to a "LIST" command issued by a client.
It took 1.70300006866 seconds to complete the first time and
0.266000032425 the second time.
I don't know whether such a specific use case could justify adding
listdir generator support to the stdlib, but having something like
Greg Ewing's opendir module could have saved a lot of time in this
specific case.
-- Giampaolo
import os, stat, time
from tarfile import filemode
try:
    import pwd, grp
except ImportError:
    pwd = grp = None

def format_list(directory):
    """Return a directory listing emulating "/bin/ls -lA" UNIX
    command output.

    This is how output appears to client:

    -rw-rw-rw- 1 owner group 7045120 Sep 02 3:47 music.mp3
    drwxrwxrwx 1 owner group 0 Aug 31 18:50 e-books
    -rw-rw-rw- 1 owner group 380 Sep 02 3:40 module.py
    """
    listing = os.listdir(directory)
    result = []
    for basename in listing:
        file = os.path.join(directory, basename)
        # if the file is a broken symlink, use lstat to get stat for
        # the link
        try:
            stat_result = os.stat(file)
        except (OSError,AttributeError):
            stat_result = os.lstat(file)
        perms = filemode(stat_result.st_mode)  # permissions
        nlinks = stat_result.st_nlink  # number of links to inode
        if not nlinks:  # non-posix system, let's use a bogus value
            nlinks = 1
        if pwd and grp:
            # get user and group name, else just use the raw uid/gid
            try:
                uname = pwd.getpwuid(stat_result.st_uid).pw_name
            except KeyError:
                uname = stat_result.st_uid
            try:
                gname = grp.getgrgid(stat_result.st_gid).gr_name
            except KeyError:
                gname = stat_result.st_gid
        else:
            # on non-posix systems the only chance we use default
            # bogus values for owner and group
            uname = "owner"
            gname = "group"
        size = stat_result.st_size  # file size
        # stat.st_mtime could fail (-1) if file's last modification
        # time is too old, in that case we return local time as last
        # modification time.
        try:
            mtime = time.strftime("%b %d %H:%M",
                                  time.localtime(stat_result.st_mtime))
        except ValueError:
            mtime = time.strftime("%b %d %H:%M")
        # if the file is a symlink, resolve it, e.g. "symlink -> real_file"
        if stat.S_ISLNK(stat_result.st_mode):
            basename = basename + " -> " + os.readlink(file)
        # formatting is matched with proftpd ls output
        result.append("%s %3s %-8s %-8s %8s %s %s\r\n" %(
            perms, nlinks, uname, gname, size, mtime, basename))
    return ''.join(result)

if __name__ == '__main__':
    before = time.time()
    format_list(r'C:\windows\system32')
    print time.time() - before
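For comparison, a lazy variant of the listing above might yield one formatted line at a time, so a server could start sending before the whole directory has been read. This is a simplified sketch, not the code that was benchmarked: iter_list_lines is a hypothetical name, it assumes Python 3.6+ (os.scandir() and stat.filemode() postdate this thread), and it uses placeholder owner/group values and skips symlink resolution.

```python
import os
import stat
import tempfile
import time


def iter_list_lines(directory):
    """Yield one "ls -l"-style line per entry instead of one big string."""
    with os.scandir(directory) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            mtime = time.strftime("%b %d %H:%M", time.localtime(st.st_mtime))
            # Placeholder owner/group, as in the non-POSIX branch of the original.
            yield "%s %3s %-8s %-8s %8s %s %s\r\n" % (
                stat.filemode(st.st_mode), st.st_nlink or 1,
                "owner", "group", st.st_size, mtime, entry.name)


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("music.mp3", "module.py"):
        open(os.path.join(tmp, filename), "w").close()
    lines = list(iter_list_lines(tmp))
```

Because the with block sits inside the generator, the directory handle is released when the generator is exhausted or closed.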
Your code calls os.stat() on each file. I know from past experience
that os.stat() is *extremely* expensive. Because os.listdir() runs at C
speed, it only gets slow when run against hundreds of thousands of
entries.
(One directory on a work server has over 200K entries, and it takes
os.listdir() about twenty seconds. I believe that if we switched from
ext3 to something more appropriate that would get reduced.)
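One way to check where the time goes is to time the listdir() call and the per-entry stat calls separately. The sketch below does that on a throwaway directory; time_listing is a hypothetical helper, the 200-file directory is arbitrary, and the numbers will vary widely with filesystem and cache state, so no output is shown.

```python
import os
import tempfile
import time


def time_listing(path):
    """Return (entry count, seconds in listdir, seconds in per-entry stat)."""
    t0 = time.perf_counter()
    names = os.listdir(path)
    t1 = time.perf_counter()
    for name in names:
        os.lstat(os.path.join(path, name))
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1


with tempfile.TemporaryDirectory() as tmp:
    for i in range(200):
        open(os.path.join(tmp, "file%03d" % i), "w").close()
    count, listdir_seconds, stat_seconds = time_listing(tmp)
    print("%d entries: listdir %.6fs, stat %.6fs"
          % (count, listdir_seconds, stat_seconds))
```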
> I don't know whether such a specific use case could justify adding
> listdir generator support to the stdlib, but having something like
> Greg Ewing's opendir module could have saved a lot of time in this
> specific case.
Doubtful.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"Typing is cheap. Thinking is expensive." --Roy Smith