Using try-catch to handle multiple possible file types?

Victor Hooi

Nov 19, 2013, 2:13:47 AM
Hi,

I have a script that needs to handle input files of different types (uncompressed, gzipped etc.).

My question is regarding how I should handle the different cases.

My first thought was to use a try-catch block and attempt to open it using the most common filetype, then if that failed, try the next most common type etc. before finally erroring out.

So basically, using exception handling for flow-control.

However, is that considered bad practice, or un-Pythonic?

What alternative constructs could I use, and what are their pros and cons?

(I was thinking I could also use python-magic which wraps libmagic, or I can just rely on file extensions).

Other thoughts?

Cheers,
Victor

Chris Angelico

Nov 19, 2013, 2:22:50 AM
to pytho...@python.org
On Tue, Nov 19, 2013 at 6:13 PM, Victor Hooi <victo...@gmail.com> wrote:
> My first thought was to use a try-catch block and attempt to open it using the most common filetype, then if that failed, try the next most common type etc. before finally erroring out.
>
> So basically, using exception handling for flow-control.
>
> However, is that considered bad practice, or un-Pythonic?

It's fairly common to work that way. But you may want to be careful
what order you try them in; some codecs might be technically capable
of reading other formats than you wanted, so start with the most
specific.

Alternatively, looking at a file's magic number (either with
python-magic/libmagic or by manually reading in a few bytes) might be
more efficient. Either way can work, take your choice!
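
For reference, a minimal sketch of the "manually reading in a few bytes" option, assuming gzip is the only compressed format that needs to be told apart from plain data. Gzip streams start with the two bytes 0x1f 0x8b; the helper name open_by_magic is made up for illustration and is not from any library.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_by_magic(path):
    """Open path with gzip.open if it starts with the gzip magic number.

    Hypothetical helper: anything that is not gzip is treated as a
    plain file, so other formats would need their own checks.
    """
    # Peek at the first two bytes, then close the probe handle.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    return opener(path, "rb")
```

Because the check happens before the real open, no exception handling is needed to choose between the two openers.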

ChrisA

Amit Saha

Nov 19, 2013, 2:22:13 AM
to Victor Hooi, pytho...@python.org
How about starting with a dictionary like this:

file_opener = {'.gz': gz_opener,
               '.txt': text_opener,
               '.zip': zip_opener}
# and so on.

where the *_opener values are, say, functions that do the job of
actually opening the files.
The above dictionary is keyed on file extensions, but perhaps you
would be better off using MIME types instead.

Assuming you go ahead with MIME types, how about using python-magic
to detect the type and then looking it up in the dictionary above to
see whether there is a corresponding opener? If you get a KeyError,
you can raise an exception saying that you cannot handle this file.
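
A rough sketch of that dispatch idea, keyed on file extensions so that it needs only the standard library. The table FILE_OPENERS and the function open_by_extension are hypothetical names; with python-magic, the detected type would replace the os.path.splitext step as the dictionary key.

```python
import gzip
import os

# Hypothetical table mapping file extensions to opener callables.
FILE_OPENERS = {
    ".gz": gzip.open,
    ".txt": open,
}


def open_by_extension(path, mode="rb"):
    """Look up an opener by extension and fail loudly if none is known."""
    ext = os.path.splitext(path)[1].lower()
    try:
        opener = FILE_OPENERS[ext]
    except KeyError:
        # Turn the lookup failure into a clearer, caller-facing error.
        raise ValueError("cannot handle file type %r" % ext)
    return opener(path, mode)
```

Supporting a new type then means adding one entry to the table rather than another branch of try/except.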


How does that sound?

Best,
Amit.


--
http://echorand.me

Mark Lawrence

Nov 19, 2013, 4:36:47 AM
to pytho...@python.org
On 19/11/2013 07:13, Victor Hooi wrote:
>
> So basically, using exception handling for flow-control.
>
> However, is that considered bad practice, or un-Pythonic?
>

If it works for you use it, practicality beats purity :)

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence

Victor Hooi

Nov 19, 2013, 7:30:46 PM
Hi,

Is either approach (try-excepts, or using libmagic) considered more idiomatic? What would you guys prefer yourselves?

Also, is it possible to use either approach with a context manager ("with"), without duplicating lots of code?

For example:

try:
    with gzip.open('blah.txt', 'rb') as f:
        for line in f:
            print(line)
except IOError as e:
    with open('blah.txt', 'rb') as f:
        for line in f:
            print(line)

I'm not sure how to do this without duplicating the processing lines (everything inside the with)?

And using:

try:
    f = gzip.open('blah.txt', 'rb')
except IOError as e:
    f = open('blah.txt', 'rb')
finally:
    for line in f:
        print(line)

won't work, since the exception isn't raised until you actually try to read from the file. Plus, I'm under the impression that I should be using context managers where I can.

Also, on another note, python-magic will return a string as a result, e.g.:

gzip compressed data, was "blah.txt", from Unix, last modified: Wed Nov 20 10:48:35 2013

I suppose it's enough to just do:

if "gzip compressed data" in results:

or is there a better way?

Cheers,
Victor

Steven D'Aprano

Nov 19, 2013, 8:56:05 PM
On Tue, 19 Nov 2013 16:30:46 -0800, Victor Hooi wrote:

> Hi,
>
> Is either approach (try-excepts, or using libmagic) considered more
> idiomatic? What would you guys prefer yourselves?

Specifically in the case of file types, I consider it better to use
libmagic. But as a general technique, using try...except is a reasonable
approach in many situations.


> Also, is it possible to use either approach with a context manager
> ("with"), without duplicating lots of code?
>
> For example:
>
> try:
>     with gzip.open('blah.txt', 'rb') as f:
>         for line in f:
>             print(line)
> except IOError as e:
>     with open('blah.txt', 'rb') as f:
>         for line in f:
>             print(line)
>
> I'm not sure how to do this without duplicating the processing lines
> (everything inside the with)?

Write a helper function:

def process(opener):
    with opener('blah.txt', 'rb') as f:
        for line in f:
            print(line)


try:
    process(gzip.open)
except IOError:
    process(open)


If you have many different things to try:


for opener in [gzip.open, open, ...]:
    try:
        process(opener)
    except IOError:
        continue
    else:
        break
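
A self-contained variant of the loop above, as a sketch only: it takes the path as a parameter, returns the lines instead of printing them, and raises if every opener fails (the loop above would finish silently in that case). The name read_lines and its default opener tuple are assumptions for illustration. The most specific opener goes first, because plain open() accepts any file.

```python
import gzip


def read_lines(path, openers=(gzip.open, open)):
    """Return the lines of path using the first opener that can read it."""
    last_error = None
    for opener in openers:
        try:
            with opener(path, "rb") as f:
                # Reading happens inside the try, because gzip only
                # rejects a non-gzip file on the first read.
                return f.readlines()
        except OSError as exc:  # IOError is an alias of OSError in Python 3
            last_error = exc
    raise OSError("no opener could read %r" % path) from last_error
```

One trade-off of this design is that a missing file is reported through the generic final error, with the original exception attached as its cause.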



[...]
> Also, on another note, python-magic will return a string as a result,
> e.g.:
>
> gzip compressed data, was "blah.txt", from Unix, last modified: Wed Nov
> 20 10:48:35 2013
>
> I suppose it's enough to just do a?
>
> if "gzip compressed data" in results:
>
> or is there a better way?

*shrug*

Read the docs of python-magic. Do they offer a programmable API? If not,
that kinda sucks.



--
Steven

Mark Lawrence

Nov 19, 2013, 8:50:02 PM
to pytho...@python.org
Something like

for filetype in filetypes:
    try:
        process(filetype)
        break
    except IOError:
        pass

??? as it's 01:50 GMT and I can't sleep :(

Neil Cerutti

Nov 20, 2013, 10:05:03 AM
to pytho...@python.org
Steven D'Aprano <steve+comp....@pearwood.info> wrote:
> Write a helper function:
>
> def process(opener):
>     with opener('blah.txt', 'rb') as f:
>         for line in f:
>             print(line)

As another option, you can enter the context manager after you decide.

try:
    f = gzip.open('blah.txt', 'rb')
except IOError:
    f = open('blah.txt', 'rb')
with f:
    # processing
    for line in f:
        print(line)

contextlib.ExitStack was designed to handle cases where entering
context is optional, and so also works for this use case.

with contextlib.ExitStack() as stack:
    try:
        f = gzip.open('blah.txt', 'rb')
    except IOError:
        f = open('blah.txt', 'rb')
    stack.enter_context(f)
    for line in f:
        print(line)
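
One caveat with choosing the file object before entering the context: as noted earlier in the thread, gzip.open() does not check the file format when it opens the file, only on the first read. A possible workaround, sketched here under that assumption, is to force one small read and rewind before handing the object to "with". The helper name open_gzip_or_plain is hypothetical.

```python
import gzip


def open_gzip_or_plain(path):
    """Return a readable binary file object for a gzip or plain file."""
    f = gzip.open(path, "rb")
    try:
        f.read(1)  # forces the gzip header check
        f.seek(0)  # rewind so the caller starts at the beginning
    except OSError:
        # Not gzip data: release the handle and fall back to a plain open.
        f.close()
        f = open(path, "rb")
    return f
```

The result can then be used as "with open_gzip_or_plain('blah.txt') as f:", keeping the processing code in a single place. A missing file still raises from the initial gzip.open() call, since that call sits outside the try block.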

--
Neil Cerutti