if filename.endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"
Currently this is not valid Python and I must use the ugly
if filename.endswith('.jpg') or filename.endswith('.jpeg') \
   or filename.endswith('.gif') or filename.endswith('.png'):
    print "This is a valid image file"
Of course a direct implementation is quite easy:
import sys
class Str(str):
    def endswith(self,suffix,start=0,end=sys.maxint):#not sure about sys.maxint
        endswith=super(Str,self).endswith
        if isinstance(suffix,tuple):
            return sum([endswith(s,start,end) for s in suffix]) # multi-or
        return endswith(suffix,start,end)
if Str(filename).endswith(('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"
Nevertheless, I think this kind of check is quite common, and it would be
worth having it in standard Python.
Any reactions or comments?
Michele
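A minimal sketch of the same override as a plain function instead of a str subclass, for a current Python 3 interpreter (the name `endswith_any` is hypothetical). It mirrors the `Str.endswith` logic above, but uses any(), which stops at the first matching suffix, whereas `sum([...])` evaluates every one; `len(text)` stands in for `sys.maxint`, which no longer exists in Python 3.

```python
def endswith_any(text, suffix, start=0, end=None):
    """Function version of the Str.endswith override: a tuple means "any of these"."""
    if end is None:
        end = len(text)  # stands in for sys.maxint: search up to the end of the string
    if isinstance(suffix, tuple):
        # any() stops at the first suffix that matches; sum([...]) tests them all
        return any(text.endswith(s, start, end) for s in suffix)
    return text.endswith(suffix, start, end)


image_exts = ('.jpg', '.jpeg', '.gif', '.png')
print(endswith_any("holiday.jpeg", image_exts))     # True
print(endswith_any("notes.txt", image_exts))        # False
print(endswith_any("photo.png.bak", ".png", 0, 9))  # True: only "photo.png" is examined
```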
In your special case:
import os
if os.path.splitext(filename)[1] in ['.jpg','.jpeg','.gif','.png']:
    print "This is a valid image file"
perhaps even os.path.splitext(filename)[1].lower()
for filesystems that are not case sensitive.
Daniel
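A sketch of the splitext() approach with the case-insensitivity suggestion folded in, assuming a set of extensions built once at import time (`IMAGE_EXTENSIONS` and `is_image_file` are illustrative names):

```python
import os.path

# Built once at import time; a frozenset gives fast membership tests.
IMAGE_EXTENSIONS = frozenset(['.jpg', '.jpeg', '.gif', '.png'])


def is_image_file(filename):
    """True if the last extension of filename is an image extension, ignoring case."""
    extension = os.path.splitext(filename)[1]   # "photo.JPG" -> ".JPG"
    return extension.lower() in IMAGE_EXTENSIONS


print(is_image_file("photo.JPG"))        # True
print(is_image_file("archive.tar.gz"))   # False: splitext() sees only ".gz"
```

Because splitext() only splits at the last dot, "archive.tar.gz" yields ".gz" and is rejected, while "photo.JPG" is accepted after lower().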
Hi,
I like this feature request.
If the argument to endswith is not a string,
it should try to treat the argument as a list or tuple.
thomas
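A sketch of the suggested semantics as a standalone function (`endswith_flexible` is a hypothetical name): a str is matched directly and anything else is treated as an iterable of strings. For comparison, the built-in method in current Python accepts a str or a tuple of str, but raises TypeError for a list.

```python
def endswith_flexible(text, suffix):
    """Hypothetical variant: a str is matched as usual; anything else is
    treated as an iterable of str (list, tuple, set, generator, ...)."""
    if isinstance(suffix, str):
        return text.endswith(suffix)
    return any(text.endswith(s) for s in suffix)


print(endswith_flexible("a.gif", [".jpg", ".gif"]))   # True
print(endswith_flexible("a.gif", []))                 # False

# The built-in method is stricter: a tuple is accepted, a list is not.
try:
    "a.gif".endswith([".jpg", ".gif"])
except TypeError as exc:
    print("TypeError:", exc)
```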
extensions = ('.jpg', '.jpeg', '.gif', '.png')
if filter(filename.endswith, extensions):
    print "This is a valid image file"
Jp
--
"Pascal is Pascal is Pascal is dog meat."
-- M. Devine and P. Larson, Computer Science 340
Using filter, Michele's original statement becomes:
if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    print "This is a valid image file"
IMHO this is simple enough to not require a change to the
.endswith method...
--Irmen
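A caveat for anyone reading the filter() idiom with a modern interpreter: in Python 2, filter() on a tuple returns a tuple (and on a list, a list), so an empty result is false. In Python 3, filter() returns a lazy iterator, which is always true, so `if filter(...)` would accept every filename. A minimal Python 3 spelling of the same test uses any() with map():

```python
extensions = ('.jpg', '.jpeg', '.gif', '.png')
filename = 'notes.txt'

# Python 3: a filter object is an iterator and is always true,
# even though it would yield nothing for this filename.
print(bool(filter(filename.endswith, extensions)))   # True

# any() actually consumes the iterator and stops at the first hit.
print(any(map(filename.endswith, extensions)))       # False
print(any(map('holiday.png'.endswith, extensions)))  # True
```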
Michele> if filename.endswith(('.jpg','.jpeg','.gif','.png')):
Michele>     print "This is a valid image file"
This is analogous to how isinstance works, where its second arg can be a
class or type or a tuple containing classes and types.
I suggest you submit a feature request to SF. A patch to stringobject.c and
unicodeobject.c would help improve chances of acceptance, and for symmetry
you should probably also modify the startswith methods of both types.
Skip
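For reference, this request was eventually granted: since Python 2.5, str.startswith() and str.endswith() accept a tuple and return True if any element matches, much as isinstance() accepts a tuple of types. A quick check in a current interpreter, including an empty tuple (never matches) and a tuple subclass (accepted like a plain tuple):

```python
from collections import namedtuple

filename = "holiday.png"

# isinstance() accepts a tuple of types ...
print(isinstance(3, (int, float)))                            # True
# ... and str.endswith()/startswith() accept a tuple of strings the same way.
print(filename.endswith(('.jpg', '.jpeg', '.gif', '.png')))   # True
print(filename.startswith(('img_', 'holiday')))               # True

# Edge cases: an empty tuple never matches; tuple subclasses are accepted.
print(filename.endswith(()))                                  # False
Suffixes = namedtuple("Suffixes", "first second")
print(filename.endswith(Suffixes('.gif', '.png')))            # True
```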
alternative 1:
>>> import re
>>> has_image_file_extn = re.compile(r".*[.](jpg|jpeg|png|gif)$").match
>>> has_image_file_extn('foo.jpg')
<_sre.SRE_Match object at 0x00769F30>
>>> has_image_file_extn('foo.txt')
>>>
The above has factored out the common "." but is otherwise general for
any list of suffixes.
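A sketch that builds such a matcher from an arbitrary list of suffixes (`suffix_matcher` is a hypothetical helper). re.escape() keeps each "." literal, and `\Z` anchors at the true end of the string; `$` also matches just before a trailing newline, so the pattern above accepts "foo.jpg\n".

```python
import re


def suffix_matcher(suffixes, ignore_case=False):
    """Return a function that finds one of `suffixes` at the very end of a string."""
    # re.escape() keeps each '.' literal; \Z matches only at the end of the string.
    pattern = r"(?:%s)\Z" % "|".join(re.escape(s) for s in suffixes)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(pattern, flags).search


has_image_file_extn = suffix_matcher(['.jpg', '.jpeg', '.png', '.gif'], ignore_case=True)
print(bool(has_image_file_extn('foo.JPG')))     # True
print(bool(has_image_file_extn('foo.txt')))     # False
print(bool(has_image_file_extn('foo.jpg\n')))   # False, unlike the '$' version
```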
alternative 2:
>>> has_image_file_extn = lambda f: f.split('.')[-1] in ['jpg','jpeg','png','gif']
>>> has_image_file_extn('foo.jpg')
1
>>> has_image_file_extn('foo.txt')
0
>>>
This is of course restricted to cases where you can isolate the suffix
lexically. If the list is long and/or used frequently, then it might
be better to use a built-at-module-start-up dictionary instead.
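One subtlety in alternative 2: split('.')[-1] returns the whole name when there is no dot, so a file literally named "jpg" passes. A sketch using str.rpartition(), which reports whether a separator was found, with the extension set built once at module start-up as suggested above (function names are illustrative):

```python
IMAGE_EXTNS = {'jpg', 'jpeg', 'png', 'gif'}   # built once, at module start-up


def naive_has_image_file_extn(f):
    return f.split('.')[-1] in IMAGE_EXTNS     # alternative 2 as written


def has_image_file_extn(f):
    _, dot, extn = f.rpartition('.')
    # dot is '' when f contains no '.' at all
    return bool(dot) and extn in IMAGE_EXTNS


print(naive_has_image_file_extn('jpg'))   # True, although there is no extension
print(has_image_file_extn('jpg'))         # False
print(has_image_file_extn('foo.jpg'))     # True
```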
One of my favorite languages is APL. All APL variables are arrays of 0 or
more dimensions, and most of the operations in APL take arrays as arguments
and return arrays as results. So I am often frustrated in Python when I
have to write a construct such as your example, when, IMHO, it would be
easy to extend Python to accept sequences as arguments where only single
values are currently allowed. One example is providing a sequence of
integers to index another sequence, viz. given seq = [3, 5, 7, 1, 4],
seq[(1, 3, 0)] would produce [5, 1, 3]. Currently the tersest way to do this
is [seq[x] for x in (1,3,0)]. The new extension would probably execute
faster, and is more readable. There are probably more situations that could
also benefit from such an extension. I know that I can create some of these
using magic methods, but how much nicer if they were native.
In APL one can specify indexes for the various dimensions of an array. If B
is a rank 2 array, B[1 2;3 4] retrieves columns 3 and 4 of rows 1 and 2.
WIBNI one could in a similar way drill into a nested list. I know the
various importable array modules do some of these things, but they are
limited to homogeneous data.
Bob Gailer
bga...@alum.rpi.edu
303 442 2625
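Part of this is already possible without changing the language: operator.itemgetter() accepts several indexes and returns a tuple. For the nested-list case, a short comprehension gives a rough analogue of B[1 2;3 4]; the helper `take` is hypothetical, and Python indexes from 0 where the APL example counts from 1. Because these are ordinary lists, the elements need not be homogeneous.

```python
from operator import itemgetter

seq = [3, 5, 7, 1, 4]
print(itemgetter(1, 3, 0)(seq))    # (5, 1, 3)


def take(matrix, rows, cols):
    """Rough analogue of APL's B[rows;cols] for a list of lists."""
    return [[matrix[r][c] for c in cols] for r in rows]


B = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34]]
# APL's B[1 2;3 4] (1-based) corresponds to rows 0, 1 and columns 2, 3 here.
print(take(B, [0, 1], [2, 3]))     # [[13, 14], [23, 24]]
```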
I hadn't thought of "filter". True, it works, but is it really
readable? I had to think to understand what it was doing.
My (implicit) rationale for
filename.endswith(('.jpg','.jpeg','.gif','.png'))
was that it works exactly like "isinstance", so it is quite
obvious what it is doing. I am asking just for a convenience,
which already has a precedent in the language and respects
the Principle of Least Surprise.
Michele
Too bad my skills with C are essentially nonexistent :-(
Michele
Numeric isn't limited to homogeneous data, and I presume numarray
isn't, either: see typecode 'O' in Numeric, IIRC.
John
Michele> Too bad my skills with C are essentially nonexistent :-(
Look at it as an opportunity to enhance those skills. You have plenty of
time until 2.4. ;-)
In any case, even if you can't whip up the actual C code, a complete feature
request on SF would keep it from being entirely forgotten.
Skip
[Jp]
> > > extensions = ('.jpg', '.jpeg', '.gif', '.png')
> > > if filter(filename.endswith, extensions):
> > >     print "This is a valid image file"
> > >
> > > Jp
[Irmen]
> > Using filter Michele's original statement becomes:
> >
> > if filter(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
> >     print "This is a valid image file"
> >
> > IMHO this is simple enough to not require a change to the
> > .endswith method...
[Michele]
> I hadn't thought of "filter". True, it works, but is it really
> readable? I had to think to understand what it was doing.
> My (implicit) rationale for
>
> filename.endswith(('.jpg','.jpeg','.gif','.png'))
>
> was that it works exactly like "isinstance", so it is quite
> obvious what it is doing. I am asking just for a convenience,
> which already has a precedent in the language and respects
> the Principle of Least Surprise.
I prefer that this feature not be added. Convenience functions
like this one rarely pay for themselves because:
-- The use case is not that common (after all, endswith() isn't even
used that often).
-- It complicates the heck out of the C code
-- Checking for optional arguments results in a slight slowdown
for the normal case.
-- It is easy to implement a readable version in only two or three
lines of pure python.
-- It is harder to read because it requires background knowledge
of how endswith() handles a tuple (quick, does it take any
iterable or just a tuple, how about a subclass of tuple; is it
like min() and max() in that *args works just as well as a
single tuple argument; which python version implemented it, etc.).
-- It is a pain to keep the language consistent. Change endswith()
and you should change startswith(). Change the string object and
you should also change the unicode object and UserString and
perhaps mmap. Update the docs for each and add test cases for
each (including weird cases with zero-length tuples and such).
-- The use case above encroaches on scanning patterns that are
already efficiently implemented by the re module.
-- Worst of all, it increases the sum total of python language to be
learned without providing much in return.
-- In general, the language can be kept more compact, efficient, and
maintainable by not trying to vectorize everything (the recent addition
of the __builtin__.sum() is a rare exception that is worth it). It is
better to use a general purpose vectorizing function (like map, filter,
or reduce). This particular case is best implemented in terms of the
some() predicate documented in the examples for the new itertools module
(though any() might have been a better name for it):
some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))
The implementation of some() is better than the filter version because
it provides an "early-out" upon the first successful hit.
Raymond Hettinger
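A minimal sketch of a predicate with that early-out behavior, keeping the predicate-first argument order of the call above. It is not claimed to be the exact itertools documentation recipe; the logging predicate exists only to show that the scan stops at the first hit. The built-in any(), added in Python 2.5, behaves the same way: `any(map(pred, iterable))`.

```python
def some(pred, iterable):
    """Return True as soon as pred(item) is true for some item, else False."""
    for item in iterable:
        if pred(item):
            return True      # early out: later items are never examined
    return False


calls = []


def logged_endswith(suffix, filename='holiday.jpg'):
    calls.append(suffix)     # record every suffix the predicate is asked about
    return filename.endswith(suffix)


print(some(logged_endswith, ('.jpg', '.jpeg', '.gif', '.png')))  # True
print(calls)                                                    # ['.jpg']
```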
This is arguable.
> -- It complicates the heck out of the C code
Really? Of course, you are the expert. I would do it by analogy with
"isinstance", internally calling "ifilter" as you suggest.
> -- Checking for optional arguments results in a slight slowdown
> for the normal case.
Perhaps slight enough to be negligible? Of course, without an
implementation we cannot say, but I would be surprised to see a
noticeable slowdown.
> -- It is easy to implement a readable version in only two or three
> lines of pure python.
Yes, but not immediately obvious. See later.
> -- It is harder to read because it requires background knowledge
> of how endswith() handles a tuple (quick, does it take any
> iterable or just a tuple, how about a subclass of tuple; is it
> like min() and max() in that *args works just as well as a
> single tuple argument; which python version implemented it, etc.).
I have used "isinstance" and never wondered about these technicalities,
so I guess the average user should not be more concerned with .endswith.
> -- It is a pain to keep the language consistent. Change endswith()
> and you should change startswith(). Change the string object and
> you should also change the unicode object and UserString and
> perhaps mmap. Update the docs for each and add test cases for
> each (including weird cases with zero-length tuples and such).
This is true for any modification of the language. One has to balance
costs and benefits. The balance is still largely subjective.
> -- The use case above encroaches on scanning patterns that are
> already efficiently implemented by the re module.
I think the general rule is to avoid regular expressions when possible.
> -- Worst of all, it increases the sum total of python language to be
> learned without providing much in return.
That is exactly what I am arguing *against*: no additional learning
effort is needed, since a similar feature is already present in
"isinstance", and a user could even be surprised that it is not
implemented in .endswith.
> -- In general, the language can be kept more compact, efficient, and
> maintainable by not trying to vectorize everything (the recent addition
> of the __builtin__.sum() is a rare exception that is worth it). It is
> better to use a general purpose vectorizing function (like map, filter,
> or reduce). This particular case is best implemented in terms of the
> some() predicate documented in the examples for the new itertools module
> (though any() might have been a better name for it):
>
> some(filename.endswith, ('.jpg','.jpeg','.gif','.png'))
Uhm... I don't like "some" or "any"; what about "the"?
import os
import itertools
the=lambda pred,seq: list(itertools.ifilter(pred,seq))
for filename in os.listdir('.'):
    if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
        print "This is a valid image"
That's readable enough for me, though still not completely obvious. The
first time, I got it wrong by defining "the=itertools.ifilter". I had the
idea that "ifilter" was acting just like "filter", which of course is not
the case in this example.
> The implementation of some() is better than the filter version because
> it provides an "early-out" upon the first successful hit.
No point against that.
>
> Raymond Hettinger
Michele Simionato
P.S. I am not going to pursue this further, since I quite like
if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()
Instead, I will suggest adding this example to the itertools
documentation ;)
I could also submit it as a cookbook recipe, since I think it is
quite a useful trick.
Also, it is good to make people aware of itertools goodies
(I myself have learned something in this thread).
Better still, define
the=lambda pred,seq: list(itertools.islice(itertools.ifilter(pred,seq),0,1))
so that we exit at the first hit; otherwise one could just use
the standard "filter".
not-yet-good-enough-with-itertools-but-improving-ly yours
Michele
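In Python 3, itertools.ifilter is gone because the built-in filter() is already lazy, so the corrected `the` translates directly. A sketch under that assumption: islice(..., 0, 1) stops pulling from filter() after the first match, so the rest of the sequence is never examined.

```python
from itertools import islice


def the(pred, seq):
    # islice(..., 0, 1) takes at most one item from the lazy filter(),
    # so scanning stops at the first element for which pred is true.
    return list(islice(filter(pred, seq), 0, 1))


extensions = ('.jpg', '.jpeg', '.gif', '.png')
print(the('holiday.gif'.endswith, extensions))   # ['.gif']
print(the('notes.txt'.endswith, extensions))     # []
```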
How about:
def the(pred,seq): return True in itertools.imap(pred,seq)
if you really want to use the name "the" ("any" makes much more sense to me).
Chris
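One caveat about `True in ...`: the in operator compares with ==, so it only recognizes results equal to True (True or 1). A predicate that returns some other true object, such as a match object from the re module, is missed, while any() tests truthiness. A sketch in Python 3, where itertools.imap has become the built-in map():

```python
import re

is_image = re.compile(r"\.(jpg|jpeg|gif|png)\Z").search   # match object or None
names = ['notes.txt', 'holiday.png']

print(True in map(is_image, names))   # False: a match object is not == True
print(any(map(is_image, names)))      # True: any() only asks whether it is true
```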
That's a good idea, indeed. BTW, in this context I feel that
if the(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()
is clearer than
if any(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()
which is confusing to me, since it seems "any" refers to "filename",
whereas it refers to the tuple elements.
M.S.
BTW, this suggests to me two short idioms for multiple "or" and multiple "and",
with short-circuit behavior:
def imultior(pred,iterable):
    return True in itertools.imap(pred,iterable)
def imultiand(pred,iterable):
    return not(False in itertools.imap(pred,iterable))
Nevertheless, they seem to be slower than the non-iterator-based
implementations :-( (at least in some preliminary profiling I did
using a list and a custom-defined predicate function)
def multiand(pred,iterable):
    istrue=True
    for item in iterable:
        istrue=istrue and pred(item)
        if not istrue: return False
    return True
def multior(pred,iterable):
    istrue=False
    for item in iterable:
        istrue=istrue or pred(item)
        if istrue: return True
    return False
M.
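The `istrue` accumulator in multiand/multior is not needed, since each function returns as soon as its answer is known. A simplified sketch, checked against the built-ins all() and any() (Python 2.5 and later); note that for an empty iterable "and" is vacuously true and "or" is false, which is also what the imultiand/imultior versions above return.

```python
def multiand(pred, iterable):
    """True if pred(item) is true for every item (vacuously true when empty)."""
    for item in iterable:
        if not pred(item):
            return False    # the first failure decides
    return True


def multior(pred, iterable):
    """True if pred(item) is true for at least one item (false when empty)."""
    for item in iterable:
        if pred(item):
            return True     # the first success decides
    return False


def is_even(n):
    return n % 2 == 0


# Cross-check against the built-ins on a few small inputs.
for data in ([], [2, 4], [2, 3], [1, 3]):
    assert multiand(is_even, data) == all(map(is_even, data))
    assert multior(is_even, data) == any(map(is_even, data))
```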
if any_true(filename.endswith, ('.jpg','.jpeg','.gif','.png')):
    dosomething()
I suspect it will more often make sense read aloud in the general
if any_true(pred, seq):
than
if the(pred, seq)
I guess the full set of functions might be
any_true, any_false, all_true, and all_false.
or maybe someone can think of a better short phrase?
Regards,
Bengt Richter
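A sketch of the four proposed names on top of the built-ins any() and all() (available from Python 2.5); each stops as soon as its answer is determined. The function names follow the message above and are not part of any standard library.

```python
def any_true(pred, seq):
    return any(pred(x) for x in seq)      # stops at the first true result


def all_true(pred, seq):
    return all(pred(x) for x in seq)      # stops at the first false result


def any_false(pred, seq):
    return not all_true(pred, seq)


def all_false(pred, seq):
    return not any_true(pred, seq)


exts = ('.jpg', '.jpeg', '.gif', '.png')
print(any_true("a.png".endswith, exts))    # True
print(all_false("a.txt".endswith, exts))   # True
print(all_true("a.png".endswith, exts))    # False
```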
> I suggest you submit a feature request to SF.
+1 from me :-)
This is a commonly used case. Using things like splitext() is only a
solution for this specific case, where filename extensions are matched.
Michele: I suggest mentioning this in the feature request, or simply using
a different example (one not based on filename extensions).
Regards
Hartmut Goebel
--
| Hartmut Goebel | IT-Security -- effizient |
| h.go...@goebel-consult.de | www.goebel-consult.de |
> I guess the full set of functions might be
> any_true, any_false, all_true, and all_false.
>
> or maybe someone can think of better short phrase?
>
'all_false(...)' is simply 'not any_true(...)'
'any_false(...)' is 'not all_true(...)'
So you could get by with just two of these functions, in which case
'any_of', and 'all_of' might be suitable names.
--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
>bo...@oz.net (Bengt Richter) wrote in news:bfi4ir$t21$0...@216.39.172.122:
>
>> I guess the full set of functions might be
>> any_true, any_false, all_true, and all_false.
>>
>> or maybe someone can think of better short phrase?
>>
>
>'all_false(...)' is simply 'not any_true(...)'
>'any_false(...)' is 'not all_true(...)'
>
>So you could get by with just two of these functions, in which case
>'any_of', and 'all_of' might be suitable names.
>
I don't think they're equivalent if they do short-circuiting.
Regards,
Bengt Richter
I think in the specific case I was talking about, "the" was quite
readable; however, I agree that in the general case "any_true" etc.
would be better.
I would not be opposed to adding these convenience functions to
itertools. The advantage is standardization (i.e. I don't have to
invent my own name, different from the name chosen by anybody else);
the disadvantage is more things to learn. However, with such
descriptive names, it would be difficult not to grasp what those
functions are doing, even without looking at the documentation.
Anyway, I am sure many will be opposed, saying that such functions
are so simple that they do not deserve to be in the library. This
would be a sensible opinion, BTW.
Michele
>>'all_false(...)' is simply 'not any_true(...)'
>>'any_false(...)' is 'not all_true(...)'
>>
>>So you could get by with just two of these functions, in which case
>>'any_of', and 'all_of' might be suitable names.
>>
> I don't think they're equivalent if they do short-circuiting.
>
any_true short circuits as soon as it finds one that is true.
all_false short circuits as soon as it finds one that is true.
all_true short circuits as soon as it finds one that is false.
any_false ditto.
Why aren't they equivalent?
>bo...@oz.net (Bengt Richter) wrote in news:bfjokm$kbc$0...@216.39.172.122:
>
>>>'all_false(...)' is simply 'not any_true(...)'
>>>'any_false(...)' is 'not all_true(...)'
>>>
>>>So you could get by with just two of these functions, in which case
>>>'any_of', and 'all_of' might be suitable names.
>>>
>> I don't think they're equivalent if they do short-circuiting.
>>
>
>any_true short circuits as soon as it finds one that is true.
>all_false short circuits as soon as it finds one that is true.
>
>all_true short circuits as soon as it finds one that is false.
>any_false ditto.
>
>Why aren't they equivalent?
>
Oops, d'oh ... well, they're not spelled the same ;-)
Regards,
Bengt Richter
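To make the short-circuit argument concrete: written directly, all_false stops at the first true result, exactly where any_true stops, so `not any_true(...)` calls the predicate on the same elements. A sketch that records the predicate calls (all names are illustrative):

```python
def any_true(pred, seq):
    for x in seq:
        if pred(x):
            return True
    return False


def all_false(pred, seq):
    for x in seq:
        if pred(x):
            return False   # one true item is enough to answer "no"
    return True


def counting(pred):
    """Wrap pred so that every argument it is called with is recorded."""
    calls = []

    def wrapper(x):
        calls.append(x)
        return pred(x)
    return wrapper, calls


data = [1, 3, 4, 5, 7]
is_even = lambda n: n % 2 == 0

p1, calls1 = counting(is_even)
p2, calls2 = counting(is_even)
print(all_false(p1, data), not any_true(p2, data))   # False False
print(calls1 == calls2 == [1, 3, 4])                 # True
```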
> I suspect it will more often make sense read aloud in the general
> if any_true(pred, seq):
> than
> if the(pred, seq)
> I guess the full set of functions might be
> any_true, any_false, all_true, and all_false.
> or maybe someone can think of better short phrase?
some, notevery, every, and notany, respectively. Just like in a
certain other language :-)
--
Just because we Lisp programmers are better than everyone else is no
excuse for us to be arrogant. -- Erann Gat
(setq reply-to
(concatenate 'string "Paul Foley " "<mycroft" '(#\@) "actrix.gen.nz>"))
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/212959
Thanks to everybody who gave feedback!
Michele
That's great, but your recipe contains a very serious error - you
spelled my name wrong ;)
Chris Perkins (not Perking)
Oops! It was a typo, I am going to correct it immediately!
Michele