I am writing a program that analyzes files of different formats. I
would like to use a function for each format. Obviously, functions can
be mapped to file formats, for example like this:
if file.endswith('xyz'):
    xyz(file)
elif file.endswith('abc'):
    abc(file)
...
However, I would prefer something of the following kind:
func = file[-3:]
apply_func(func, file)
Can something of this kind be done in Python?
Klaus
A file extension is not necessarily 3 chars long.
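A quick illustration of the point (assuming Python 3; the file names and the helper names last_three and extension are made up for this example) comparing "slice off the last three characters" with os.path.splitext() from the standard library:

```python
import os.path

def last_three(name):
    """Naive approach from the question: assume a 3-character extension."""
    return name[-3:]

def extension(name):
    """Extension without the leading dot, or '' if the name has none."""
    return os.path.splitext(name)[1][1:]

# Hypothetical file names, chosen to show where slicing breaks down.
for name in ["report.xyz", "script.py", "README", "archive.tar.gz"]:
    print(name, repr(last_three(name)), repr(extension(name)))
```

Note that splitext() only splits off the last suffix, so "archive.tar.gz" gives "gz", and a name without a dot gives an empty extension rather than three arbitrary characters.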
> apply_func(func, file)
>
> Can something of this kind be done in Python?
The simplest (and canonical) solution is to use a dict:
import os

def handle_txt(path):
    # code here
    pass

def handle_py(path):
    # code here
    pass

# etc...

def handle_default(path):
    # for anything else
    pass

handlers = {
    ".txt": handle_txt,
    ".py": handle_py,
    # etc
}

def handle_file(path):
    dummy, ext = os.path.splitext(path)
    handler = handlers.get(ext, handle_default)
    return handler(path)
HTH
No, of course not. But it is, if I choose to use only (self-made) file
endings that are 3 chars long. Anyway, it was just an example.
> handlers = {
> ".txt" : handle_txt,
> ".py" : handle_py,
> # etc
> }
>
That is exactly what I would like to avoid: having to map the function
'handle_txt' to '.txt'. Firstly, because I don't want to repeat
anything, and secondly, because I will one day add a new function and
forget to add its name to the dictionary. (This is not severe if there
is only one dictionary for mapping functions, but it will make life a
lot harder if a lot of mappings of this kind are used.)
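One common idiom that addresses the "forget to add it to the dictionary" problem (not proposed in this thread; sketched here assuming Python 3, with the names handler, HANDLERS, dispatch and the handle_* functions all hypothetical) is a registration decorator. The extension is still written once, but right next to the function, so defining a handler and registering it become a single step:

```python
import os.path

HANDLERS = {}  # maps extension (without the dot) -> handler function

def handler(*extensions):
    """Decorator that registers the decorated function for the given extensions."""
    def register(func):
        for ext in extensions:
            HANDLERS[ext] = func
        return func  # the function itself is returned unchanged
    return register

@handler("txt", "rst")
def handle_text(path):
    return "text: " + path

@handler("py")
def handle_python(path):
    return "python: " + path

def handle_default(path):
    return "unknown: " + path

def dispatch(path):
    ext = os.path.splitext(path)[1][1:]  # strip the leading dot
    return HANDLERS.get(ext, handle_default)(path)

print(dispatch("notes.txt"))  # text: notes.txt
print(dispatch("setup.py"))   # python: setup.py
print(dispatch("data.bin"))   # unknown: data.bin
```

A side benefit is that one function can serve several extensions, and the function name no longer has to match the extension.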
What I want is to call the string directly. In Prolog, I would use
something like:
get_file_ending(File, Ending),
Predicate =.. [Ending, File],
call(Predicate).
More directly to your question, the best suggestion is to build a (const)
dictionary:
apply_func = { "xyz":xyz, "abc":abc }
which maps the strings to functions. This line can be at outer scope,
as long as it follows all the appropriate function definitions. Notice
that the individual functions need not be in the same module, if you use
a fully qualified name in the dictionary. And of course, there's no
necessity of naming the function exactly the same as the extension. So
you could implement the functions in another module 'implem', and use
the following:
import implem
apply_func = {"xyz": implem.process_xyz_files,
              "abc": implem.process_abc_files}
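Now, you use it by something like this (splitext() keeps the leading
dot, so strip it before looking up the key):
import os
dummy, func_ext = os.path.splitext(my_filename)  # e.g. ".xyz"
apply_func[func_ext[1:]](my_filename)            # index the dict, then call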
(all code untested)
DaveA
> > handlers = {
> > ".txt" : handle_txt,
> > ".py" : handle_py,
> > # etc
> > }
> >
>
> That is exactly what I would like to avoid: Having to map the function
> 'handle_txt' to '.txt'. Firstly, because I don't want to repeat
> anything and secondly, because I will one day add a new function and
> forget to add its name to the dictionary.
Use the dictionary maintained by the runtime:
def handle(extension):
    funname = "handle_" + extension
    return globals()[funname]

handle('txt')  # => function handle_txt
w.
You basically need a getattr lookup. If you're prepared to instantiate
a class or to import a handlers module then you can just look up against
that:
<handlers.py>
def handle_py(stuff):
    print("handling py")

def handle_default(stuff):
    print("handling default")
</handlers.py>
<main>
import handlers

ext = "py"
handler = getattr(handlers, "handle_" + ext, handlers.handle_default)
handler("stuff")
</main>
You can do the equivalent by having a Handlers class with
the appropriate methods (handle_py, etc.) and which
you then instantiate.
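A minimal sketch of that class-based variant, assuming Python 3; the handle_<ext> naming follows the convention above, while the dispatch method is a hypothetical addition for illustration:

```python
import os.path

class Handlers:
    """Each handle_<ext> method processes one file type."""

    def handle_py(self, path):
        return "handling py: " + path

    def handle_txt(self, path):
        return "handling txt: " + path

    def handle_default(self, path):
        return "handling default: " + path

    def dispatch(self, path):
        ext = os.path.splitext(path)[1][1:]  # 'py' for 'main.py'
        # getattr with a default: unknown extensions fall back to handle_default
        method = getattr(self, "handle_" + ext, self.handle_default)
        return method(path)

handlers = Handlers()
print(handlers.dispatch("main.py"))    # handling py: main.py
print(handlers.dispatch("image.png"))  # handling default: image.png
```

Compared with a plain module, an instance lets the handlers share state (configuration, counters, open resources) through self.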
If you want to keep everything in one module, you should be
able to achieve the same effect by looking the module up
in sys.modules and then proceeding as above:
<whatever.py>
import sys

def handle_py(stuff):
    print("handling py")

def handle_default(stuff):
    print("handling default")

ext = "py"
me = sys.modules[__name__]
handler = getattr(me, "handle_" + ext, me.handle_default)
handler("blah")
</whatever.py>
(All untested...)
TJG
import os

class Handlers:
    class NoHandler(Exception):
        pass

    @staticmethod
    def txt(fileName):
        print('I am processing a txt file')

    @staticmethod
    def tar(fileName):
        print('I am processing a tar file')

    @classmethod
    def default(cls, fileName):
        raise cls.NoHandler("I don't know how to handle %s " % fileName)

for fileName in ['/tmp/test.txt', '/tmp/sdfsd.sfds']:
    _, extension = os.path.splitext(fileName)
    func = getattr(Handlers, extension.replace('.', ''), Handlers.default)
    try:
        func(fileName)
    except Handlers.NoHandler as exc:
        print(exc)
JM
Others have already pointed you to the approach of using a dict, or a
module/class namespace with functions/methods to do this. Either of the
latter two would be my favourite, depending on the complexity of the
handlers. A class is more suitable as a container for short, highly
correlated handlers, whereas a module makes more sense for handlers that do
rather different things, or that are longer than a single function. A
mixture of the two, e.g. a module of classes, where an entire class is used
to implement a complete handler over several methods (potentially including
some inheritance hierarchy between handlers that share functionality) might
also be a solution. Note that objects can be callable in Python (via the
special method __call__), and you can exploit that here.
What you are implementing here is commonly called a dispatch mechanism,
BTW. There are several ways to do that, also within Python. A web search
should reveal some more.
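To illustrate the "classes as handlers, with callable objects" idea, here is a sketch assuming Python 3; all class names are hypothetical. Handlers that share functionality inherit it from a common base class, and each instance is invoked directly through __call__, so it can sit in a dict exactly where a plain function would:

```python
import os.path

class Handler:
    """Base class holding behaviour shared by all handlers."""

    def __call__(self, path):
        # Common pre-processing, then the handler-specific part.
        return self.process(os.path.basename(path))

    def process(self, name):
        raise NotImplementedError

class TextHandler(Handler):
    def process(self, name):
        return "text " + name

class MarkupHandler(TextHandler):
    # Reuses TextHandler's behaviour and extends it.
    def process(self, name):
        return "markup " + super().process(name)

# Instances are callable, so they can be stored and called like functions.
handlers = {"txt": TextHandler(), "html": MarkupHandler()}

print(handlers["txt"]("/tmp/a.txt"))    # text a.txt
print(handlers["html"]("/tmp/b.html"))  # markup text b.html
```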
Stefan
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
As mentioned, a dictionary dispatch will do what you want, but you can
also use the self-registering technique outlined here:
http://effbot.org/zone/metaclass-plugins.htm [Fredrik Lundh]
(For Plugin, read Handler in this case.)
One idea might be to have Handler classes such as:
class TextHandler(HandlerType):
    extensions = ['', 'txt', 'rst']
    def run(self, *args, **kw):
        pass  # ... do stuff
then the __init__ of HandlerType's metaclass:
def __init__(cls, name, bases, attrs):
    for ext in attrs.get('extensions', []):
        registry[ext] = cls
then use it like:
registry['txt']().run()
If you don't need state, you could perhaps make 'run' a staticmethod and
store it rather than the class,
e.g. registry[ext] = cls.run
and then just:
registry['txt']()
hth
G.F
------------------------------------------------------------------------
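registry = {}

class HandlerType(object):
    # Python 2 metaclass syntax; Python 3 ignores a nested __metaclass__
    # and needs class HandlerType(metaclass=...) instead.
    class __metaclass__(type):
        def __init__(cls, name, bases, attrs):
            for ext in attrs.get('extensions', []):
                registry[ext] = cls

class TextHandler(HandlerType):
    extensions = ['', 'txt']

print(registry)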
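The snippet above relies on Python 2 metaclass syntax; under Python 3 the nested __metaclass__ is silently ignored and registry stays empty. Here is a sketch of the same self-registering idea with the Python 3 metaclass= keyword, also storing the stateless staticmethod variant mentioned above (the names HandlerMeta and run_registry are assumptions for this example):

```python
registry = {}      # extension -> handler class
run_registry = {}  # extension -> static run function (stateless variant)

class HandlerMeta(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        # Register every class that declares its own 'extensions' list.
        for ext in attrs.get("extensions", []):
            registry[ext] = cls
            run_registry[ext] = cls.run

class HandlerType(metaclass=HandlerMeta):
    @staticmethod
    def run(*args, **kw):
        raise NotImplementedError

class TextHandler(HandlerType):
    extensions = ["", "txt", "rst"]

    @staticmethod
    def run(*args, **kw):
        return "handled as text"

print(sorted(registry))         # ['', 'rst', 'txt']
print(registry["txt"]().run())  # handled as text
print(run_registry["rst"]())    # handled as text
```

Since Python 3.6, a base class can get the same effect without a metaclass by defining __init_subclass__, which is called for every new subclass.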
And have you tried eval()?
import sys

def functext():
    print("text")

def funcdoc():
    print("doc")

def funcabc():
    print("abc")

if __name__ == "__main__":
    # replace filename with suitable value
    filename = sys.argv[1].split('.')[1]
    try:
        eval('func' + filename + '()')
    except Exception:
        print('error')
I may have missed a bit of this thread -- so I have to ask: Has anyone
mentioned using getattr yet? It's a way of looking up *any* attribute
using a string to specify the name. Like this for your particular example:
import os

class Functions:  # This could be a module instead of a class
    @staticmethod
    def xyz(path):
        pass  # process an .xyz file
    @staticmethod
    def abc(path):
        pass  # process an .abc file
    # ... and so on ...

ext = os.path.splitext(file)[1][1:]  # Parse out the extension, minus the dot
fn = getattr(Functions, ext)         # Look up the correct function
fn(file)                             # and call it
Gary Herron
WARNING: eval() is almost always the wrong answer to any question
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
import antigravity
Warning: it works!
Another question?
Works for what?
--
Aahz
>>> WARNING: eval() is almost always the wrong answer to any question
>>
>>warning : it works !
>
> Works for what?
Code injection security bugs, of course.
http://en.wikipedia.org/wiki/Code_injection
It is surprisingly difficult to sanitize strings in Python to make them
safe to pass to eval. Unless you are prepared to trust the input data
explicitly, it's best to just avoid eval.
--
Steven
JM
Go to hell ;-), it is part of the language, and it seems to match the
aforementioned question.
I don't see it as a problem.
Let's wait for Klaus's opinion.
Olivier
Despite the fact that it's used in the standard library...
>> It is surprisingly difficult to sanitize strings in Python to make them
>> safe to pass to eval. Unless you are prepared to trust the input data
>> explicitly, it's best to just avoid eval.
>
> Despite the fact that it's used in the standard library...
Wisely or not, the standard library implicitly trusts its input.
That's one of the many reasons why it's so hard to have a restricted
subset of Python.
--
Steven
And if the extension happens to be valid Python code, you might inject
malicious code through the filename. Great idea!
globals()["function_" + ext]()
is all you need, and doesn't suffer from that attack vector.
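A small sketch of why the lookup is safer, assuming Python 3; the function_<ext> naming follows the line above, while dispatch and the sample file names are hypothetical. The extension is only ever used as a dictionary key, so a crafted file name can at worst miss, never execute. The callable() check is an extra safeguard so that the lookup only hits functions:

```python
import os.path

def function_txt(path):
    return "text: " + path

def function_csv(path):
    return "csv: " + path

def dispatch(path):
    ext = os.path.splitext(path)[1][1:]
    # The string is only a key into the module namespace; it is never
    # parsed or executed, unlike eval('function_' + ext + '()').
    func = globals().get("function_" + ext)
    if not callable(func):
        raise ValueError("no handler for %r" % path)
    return func(path)

print(dispatch("notes.txt"))  # text: notes.txt

# A crafted name just fails the lookup instead of running code.
try:
    dispatch("evil.__import__('os').getcwd()")
except ValueError as exc:
    print(exc)
```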
Diez
That's right. In fact, your code is the precise analogy of my Prolog
example in Python. Obviously, eval() and call() are both inherently
dangerous. They should never be used in programs that get input from
people other than the author. Yet, my program is supposed to parse
files that I have created myself and that are on my laptop. It is not
supposed to interact with anybody other than me.
On the other hand, I think, it is worthwhile getting acquainted with
the getattr-stuff, because this method can be useful in many contexts.
Anyway, thanks to all who participated in this thread. It taught me a
lot.
Famous last words.
Stefan
I knew it.
Olivier
All right, I admit that eval() is evil and should never be used. Under
no circumstances. (Which is, of course, the reason why Python has
eval().) The same applies to knives. You shouldn't use them. You
shouldn't even use them in your own kitchen. A man might enter your
kitchen, take your knife away and use it against you.
Can you tell the difference between your above statement and the following:
"""
eval() is potentially dangerous and can make code harder to debug. 99%
of the proposed use cases for eval() are covered by simpler, less
dangerous and easier to understand solutions, so the GoodPractice(tm) is
to favor these solutions and only use eval() - with appropriate care -
for the remaining 1% of _real_ use cases.
"""
If you can't tell the difference, then you're about as (im)mature as my
13 year old son and it might eventually be time to grow up.
> The same applies to knives. You shouldn't use them. You
> shouldn't even use them in your own kitchen. A man might enter your
> kitchen, take your knife away and use it against you.
Knives - especially the kind I use in my kitchen - are indeed potentially
dangerous, and I indeed had to educate my son so he wouldn't do anything
stupid with them - like pointing a knife at someone, running across the
house with a knife in his hand, or using them instead of a more
appropriate tool.
The probability that someone will enter your kitchen and use one of your
knives against you, while not null, is low enough to be ignored IMHO. I
wish I could say the same about script kiddies or more educated (and
dangerous) bad guys trying to attack our servers.
But you obviously have never had to either fix a compromised server or
raise a kid - else you'd know better. Hopefully you didn't raise my kid -
now I just pray none of your code will ever run on our servers.
As already pointed out in my second post (though perhaps not
explicitly enough), I like the getattr-stuff better than eval(). That
is why I will not use eval(). I don't have a reason to use eval(). All
I wanted to say is this: If there are no circumstances at all under
which eval() can reasonably be used, then it should not be part of
Python. As it is part of Python (and as Python is a carefully designed
language), there will most probably be some situations in which one might
want to use it.
Or perhaps it is me who failed to re-read a bit more of the thread
before answering - I obviously missed the irony (and made an a... of
myself), sorry :-/
> All
> I wanted to say is this: If there are no circumstances at all under
> which eval() can reasonably be used, then it should not be part of
> Python. As it is part of Python (and as Python is a carefully designed
> language), there will most probably be some situations in which one might
> want to use it.
Indeed.
There is nothing to be sorry about. I am grateful to all participants
of this thread. I know a lot more about Python than before.