
lists and files question


hokiegal99

Jul 22, 2003, 7:57:06 PM
This code:

import os, re, string
setpath = raw_input("Enter the path: ")
for root, dirs, files in os.walk(setpath):
    id = re.compile('Microsoft Excel Worksheet')
    fname = files
    # print fname
    content = open(fname[1],'rb')

Produces this error:

IOError: Error[2] No such file or directory 'Name of File'

The strange thing is that it correctly identifies the file that it says
doesn't exist. Could someone explain why this is?

Also, is "files" a nested list? It looks like one, but I'm not entirely
sure as I'm still relatively new to Python. Thanks!

Mark Day

Jul 22, 2003, 8:33:22 PM
In article <3F1DCF52...@hotmail.com>, hokiegal99
<hokie...@hotmail.com> wrote:

> This code:
>
> import os, re, string
> setpath = raw_input("Enter the path: ")
> for root, dirs, files in os.walk(setpath):
>     id = re.compile('Microsoft Excel Worksheet')
>     fname = files
>     # print fname
>     content = open(fname[1],'rb')
>
> Produces this error:
>
> IOError: Error[2] No such file or directory 'Name of File'
>
> The strange thing is that it correctly identifies the file that it says
> doesn't exist. Could someone explain why this is?

The problem is that file doesn't exist in the current working
directory; it's in another directory (stored in "root" in your code).

Try this:
content = open(os.path.join(root,fname[1]), 'rb')

> Also, is "files" a nested list? It looks like one, but I'm not entirely
> sure as I'm still relatively new to Python. Thanks!

It is a list of strings. Each string is the name of one of the files
in the directory (whose path is in "root" above).
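A minimal sketch of that fix, written for Python 3 rather than the Python 2.3 used in this thread. The helper name `full_paths` and the throwaway directory tree are assumptions made only so the example runs anywhere; the point is that each bare name from `files` is joined to `root` before it is opened.

```python
import os
import tempfile


def full_paths(top):
    """Return the full path of every file found under top."""
    paths = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # files holds bare names, so prepend the directory being visited
            paths.append(os.path.join(root, name))
    return sorted(paths)


# Build a small temporary tree (hypothetical file names) and open each file.
with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, "sub")
    os.mkdir(sub)
    with open(os.path.join(sub, "report.xls"), "wb") as handle:
        handle.write(b"Microsoft Excel Worksheet")
    for path in full_paths(top):
        with open(path, "rb") as content:
            print(os.path.basename(path), len(content.read()))
```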

-Mark

Sean 'Shaleh' Perry

Jul 22, 2003, 8:37:43 PM
On Tuesday 22 July 2003 16:57, hokiegal99 wrote:
> This code:
>
> import os, re, string
> setpath = raw_input("Enter the path: ")
> for root, dirs, files in os.walk(setpath):
>     id = re.compile('Microsoft Excel Worksheet')
>     fname = files
>     # print fname
>     content = open(fname[1],'rb')
>
> Produces this error:
>
> IOError: Error[2] No such file or directory 'Name of File'
>

If you replace your logic with some prints, you will quickly see the problem.

What happens is os.walk() passes filenames without their path. You need to
use os.path.join() to get the full name back.
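The "replace your logic with prints" suggestion might look like the sketch below (Python 3 syntax, not the thread's Python 2.3). The function name `describe_walk` and the sample tree are hypothetical; the output shows that `files` never includes the directory part.

```python
import os
import tempfile


def describe_walk(top):
    """Collect one line per directory showing what os.walk yields."""
    lines = []
    for root, dirs, files in os.walk(top):
        # Show root relative to top so the output is short and stable.
        shown_root = os.path.relpath(root, top)
        lines.append("root=%r files=%r" % (shown_root, sorted(files)))
    return lines


with tempfile.TemporaryDirectory() as top:
    open(os.path.join(top, "a.txt"), "w").close()
    os.mkdir(os.path.join(top, "sub"))
    open(os.path.join(top, "sub", "b.xls"), "w").close()
    for line in describe_walk(top):
        print(line)
```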


hokiegal99

Jul 22, 2003, 10:29:44 PM
to Sean 'Shaleh' Perry

"print fname" prints out the list of files in "setpath" w/o problem. How
does it do that if os.walk doesn't give it the path to the files?

Here's some output from "print fname":

['index.txt', 'CELL-MINUTES.xls', '.nautilus-metafile.xml']
['2000 Schedule.xls', '2001 State.pdf', '2001.pdf', 'A Little More Like
Bill.doc', 'AARP.doc', "Accounting's of Dad's Est.xls",
'Amortization_Table.xls', 'huey letter.doc', 'utt-R&D.pdf', 'utt.pdf',
'rest.xls', 'Debts and Assets.xls', 'First Accounting - Estate.xls',
'Friends.doc', "Grace's.doc", 'Home Address.doc', 'Ins.doc',
'Insurance.doc', 'Interro.doc', 'Marshall.doc', 'NBB home loan.doc',
'Position Description.doc', "andy's", "Andy's Travel Voucher.xls",
"Andy's Travel.xls", 'Rebuttal.doc', 'Refinance.doc', 'TaxReturn 2.pdf',
'TaxReturn 3.pdf', 'TaxReturn 4.pdf', 'TaxReturn 5.pdf',
'TaxReturn.pdf', 'The Casey Song.doc', "Touch of the Hand.xls", 'Workout
Sheet.xls', 'application.pdf', 'budsum.pdf']

When I add os.path.join like this:

setpath = raw_input("Enter the path: ")
for root, dirs, files in os.walk(setpath):
    id = re.compile('Microsoft Excel Worksheet')
    fname = files

    print fname
    content = open(os.path.join(root,fname[0]),'rb')

I get a "IndexError: list index out of range" error.

This is a Linux 2.4 computer running Python 2.3b2... if that matters.

Thanks!

John Machin

Jul 23, 2003, 8:28:04 AM
hokiegal99 <hokie...@hotmail.com> wrote in message news:<3F1DF31...@hotmail.com>...

> "print fname" prints out the list of files in "setpath" w/o problem. How
> does it do that if os.walk doesn't give it the path to the files?

If you do open('x.txt') and your current directory is (say) /tmp, it
will in effect be trying to open('/tmp/x.txt') which will presumably
fail with the error that you saw because there's no such file ...
hence Sean's advice to use os.path.join so that you are in effect
doing
open('/usr/hokiepokie/weird_files/x.txt')
thus opening the file that you want and does exist -- unless of course
some other process moved it or deleted it after the os.walk() and
before the open() --

This is not a Python thingy nor a Linux thingy; implicitly assuming
that a "short" filename is relative to the calling process's current
directory/folder/group/whatever has been accepted behaviour with any
hierarchical file system that I've ever seen.

Now answering your question:
fname contains (say) ['x.txt', 'foo.xls', 'bar.doc'] and root contains
(say) '/usr/hokiepokie/weird_files'. So of course
print fname
manages to print
['x.txt', 'foo.xls', 'bar.doc']
without reference to root, just as
zot = 42; print zot
manages to print
42
without reference to root.
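The point about "short" filenames can be checked directly. This is a small sketch in Python 3; `x.txt` is just the illustrative name used above and does not need to exist, because `os.path.abspath` only computes the path that a relative name would resolve to.

```python
import os

name = "x.txt"  # a bare filename, as os.walk would report it

# A relative name is interpreted against the current working directory,
# not against the directory that os.walk happened to be visiting.
resolved = os.path.abspath(name)

print(os.getcwd())
print(resolved)
print(os.path.dirname(resolved) == os.getcwd())
```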


>
> Here's some output from "print fname":
>
> ['index.txt', 'CELL-MINUTES.xls', '.nautilus-metafile.xml']

... large chunks of your life history snipped ...

> Sheet.xls', 'application.pdf', 'budsum.pdf']
>
> When I add os.path.join like this:
>
> setpath = raw_input("Enter the path: ")
> for root, dirs, files in os.walk(setpath):
>     id = re.compile('Microsoft Excel Worksheet')
>     fname = files
>     print fname
>     content = open(os.path.join(root,fname[0]),'rb')
>
> I get a "IndexError: list index out of range" error.
>
> This is a Linux 2.4 computer running Python 2.3b2... if that matters.

It's always good to tell the OS and Python version; however in this
case it doesn't matter.

Put your thinking cap on: "list index out of range" ... which list?
what was the value of the index? ... "in range" means 0 <= index <
length of list; which of those 2 constraints was violated? what does
that tell you?
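One plausible reading of that hint, sketched in Python 3: `os.walk` also visits directories that contain no files, and for those `files` is an empty list, so even index 0 is out of range. The helper name `first_file_in_each_dir` and the sample tree are assumptions for illustration only.

```python
import os
import tempfile


def first_file_in_each_dir(top):
    """Return the path of one file per directory, skipping empty ones."""
    found = []
    for root, dirs, files in os.walk(top):
        if not files:
            # files == [] here, so files[0] would raise IndexError
            continue
        found.append(os.path.join(root, sorted(files)[0]))
    return found


with tempfile.TemporaryDirectory() as top:
    open(os.path.join(top, "a.txt"), "w").close()
    os.mkdir(os.path.join(top, "empty"))  # a directory with no files in it
    for path in first_file_in_each_dir(top):
        print(os.path.basename(path))
```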

John Hunter

Jul 23, 2003, 9:04:37 AM

Others have already answered your question; I just want to point out
a few other things:

import os, re, string
setpath = raw_input("Enter the path: ")
for root, dirs, files in os.walk(setpath):
    id = re.compile('Microsoft Excel Worksheet')

1) id is a built in function; you may not want to override it with
your variable name

>>> x = 1
>>> id(x)
135313208

2) The reason to use re.compile is for efficiency. There is no need
to call it inside the loop, since you're just recompiling the same
regex over and over again. Instead, compile the regex outside the
loop

>>> rgx = re.compile('[A-Z]+')
>>> for some_text in some_list:
...     m = rgx.match(some_text)

3) If you want to match 'Microsoft Excel Worksheet', you don't need
regular expressions since this is a string literal. You will
probably be better off just using the string find method, as in

s.find('Microsoft Excel Worksheet')
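As a rough illustration in Python 3 (where a file opened with 'rb' yields bytes, so the marker is written as a bytes literal): `find` returns -1 when the text is absent and the starting offset otherwise. The helper name `contains_marker` is hypothetical.

```python
MARKER = b"Microsoft Excel Worksheet"


def contains_marker(data):
    """Return True if the marker appears anywhere in data (bytes)."""
    # find() gives the offset of the first match, or -1 if there is none
    return data.find(MARKER) != -1


print(contains_marker(b"header Microsoft Excel Worksheet trailer"))
print(contains_marker(b"just some plain text"))
```

If only a yes/no answer is needed, `MARKER in data` expresses the same test.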

4) You may want to look at the path module, which provides a nice
interface for walking over files:
http://www.jorendorff.com/articles/python/path/

>>> from path import path
>>> xldir = path(setpath)
>>> for f in xldir.files('*.xls'):
...     print f.read().find('Microsoft Excel Worksheet')
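For comparison, a similar filter can be sketched with the standard library alone, using `os.walk` and `fnmatch.filter` in place of the path module (this is a swapped-in technique, not that module's API). It is written for Python 3, the name `find_marked_files` is hypothetical, and it descends into subdirectories, which may differ from what the path module's `files()` does.

```python
import fnmatch
import os
import tempfile


def find_marked_files(top, pattern="*.xls", marker=b"Microsoft Excel Worksheet"):
    """Return paths under top that match pattern and contain marker."""
    hits = []
    for root, dirs, files in os.walk(top):
        # Keep only the names matching the shell-style pattern.
        for name in fnmatch.filter(files, pattern):
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                if marker in handle.read():
                    hits.append(path)
    return sorted(hits)


with tempfile.TemporaryDirectory() as top:
    with open(os.path.join(top, "budget.xls"), "wb") as handle:
        handle.write(b"... Microsoft Excel Worksheet ...")
    with open(os.path.join(top, "notes.doc"), "wb") as handle:
        handle.write(b"Microsoft Excel Worksheet")  # wrong extension, skipped
    for path in find_marked_files(top):
        print(os.path.basename(path))
```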

Cheers,
John Hunter

hokieghal99

Jul 23, 2003, 4:29:45 PM
Thanks to everyone for the feedback on this. I've learned a lot from you
guys.

Skip Montanaro

Jul 23, 2003, 5:44:32 PM

Jumping in late, and not meaning to step on John's toes:

>> 2) The reason to use re.compile is for efficiency. There is no need
>> to call it inside the loop, since you're just recompiling the same
>> regex over and over again. Instead, compile the regex outside the
>> loop
>>
>> >>> rgx = re.compile('[A-Z]+')
>> >>> for some_text in some_list:
>> ...     m = rgx.match(some_text)

This usually doesn't buy you much because both regular expression modules
delivered with Python (sre via re, and pre if you choose it explicitly)
cache regular expression objects they compile from strings. Details are in
sre.py:_compile and pre.py:_cachecompile.
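The caching can be observed informally in a current CPython with the sketch below (Python 3, where the sre and pre modules named above no longer exist under those names). Getting the same object back from two `re.compile` calls is an implementation detail, not a documented guarantee, so treat the identity check as an observation only.

```python
import re

pattern = "[A-Z]+"

first = re.compile(pattern)
second = re.compile(pattern)

# In CPython the second call is typically served from the module's cache,
# so both names usually refer to the same compiled pattern object.
print(first is second)

# Module-level helpers such as re.match compile through the same cache,
# so passing the pattern string on every iteration is usually cheap too.
print(re.match(pattern, "ABC def").group())
```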

Skip
