Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Trouble sorting lists (unicode/locale related?)

4 views
Skip to first unread message

Erlend Fuglum

unread,
Sep 21, 2003, 7:32:44 AM9/21/03
to
Hi everyone,

I'm having some trouble sorting lists. I suspect this might have
something to do with locale settings and/or character
encoding/unicode.

Consider the following example, text containing norwegian special
characters æ, ø and å.

>>> liste = ["ola", "erlend", "trygve", "Ærlige anders", "Lars", "Øksemorderen", "Åsne", "Akrobatiske Anna", "leidulf"]
>>> liste.sort()
>>> liste
['Akrobatiske Anna', 'Lars', 'erlend', 'leidulf', 'ola', 'trygve',
'\xc5sne', '\xc6rlige anders', '\xd8ksemorderen']

There are a couple of issues for me here:
* The sorting method apparently places strings starting with uppercase
characters before strings staring with lowercase. I would like to
treat them them equally when sorting. OK, this could probably be fixed
by hacking with .toupper() or something, but isn't it possible to
achieve this in a more elegant way?

* The norwegian special characters are sorted in a wrong way.
According to our alphabet the correct order is (...) x, y, z, æ, ø å.
Python does it this way: (...) x, y, z, å, æ, ø ?

I would really appreciate any help and suggestions - I have been
fiddling with this mess for quite some time now :-)

Peter Otten

unread,
Sep 21, 2003, 8:10:16 AM9/21/03
to
Erlend Fuglum wrote:

Try setting the appropriate locale first:

import locale
locale.setlocale(locale.LC_ALL, ("no", None))

Then for a case-insensitive sort:

wordlist.sort(locale.strcoll)

should do (disclaimer: all untested).

Peter


Erlend Fuglum

unread,
Sep 21, 2003, 9:19:48 AM9/21/03
to
On Sun, 21 Sep 2003 14:10:16 +0200, Peter Otten <__pet...@web.de>
wrote:

>Erlend Fuglum wrote:
>
>> Hi everyone,
>>
>> I'm having some trouble sorting lists. I suspect this might have
>> something to do with locale settings and/or character
>> encoding/unicode.

>> (...)


>
>Try setting the appropriate locale first:
>
>import locale
>locale.setlocale(locale.LC_ALL, ("no", None))
>
>Then for a case-insensitive sort:
>
>wordlist.sort(locale.strcoll)
>
>should do (disclaimer: all untested).

Grrrrrreat, that did it! Thank you very much!

Erlend

Jeff Epler

unread,
Sep 21, 2003, 10:25:38 AM9/21/03
to
On Sun, Sep 21, 2003 at 02:10:16PM +0200, Peter Otten wrote:
> Try setting the appropriate locale first:
>
> import locale
> locale.setlocale(locale.LC_ALL, ("no", None))
>
> Then for a case-insensitive sort:
>
> wordlist.sort(locale.strcoll)
>
> should do (disclaimer: all untested).

If this did work, then you can use the DSU pattern like so:

decorated_wordlist = [(locale.strxfrm(w), w) for w in wordlist]
decorated_wordlist.sort()
wordlist[:] = [i[1] for i in decorated_wordlist]

from "man strxfrm"
DESCRIPTION
The strxfrm() function transforms the src string into a form such that
the result of strcmp() on two strings that have been transformed with
strxfrm() is the same as the result of strcoll() on the two strings
before their transformation. The first n characters of the transformed
string are placed in dest. The transformation is based on the pro-
gram’s current locale for category LC_COLLATE. (See setlocale(3)).

Jeff

John J. Lee

unread,
Sep 21, 2003, 4:54:29 PM9/21/03
to
Jeff Epler <jep...@unpythonic.net> writes:
> On Sun, Sep 21, 2003 at 02:10:16PM +0200, Peter Otten wrote:
[...]

> > locale.setlocale(locale.LC_ALL, ("no", None))
[...]

> If this did work, then you can use the DSU pattern like so:
[...]

The advantage of which is that you don't have to mess with the locale.


John

"Martin v. Löwis"

unread,
Sep 21, 2003, 4:59:34 PM9/21/03
to
John J. Lee wrote:

>>>locale.setlocale(locale.LC_ALL, ("no", None))
>
> [...]
>
>>If this did work, then you can use the DSU pattern like so:
>

> [...strxfrm...]


>
> The advantage of which is that you don't have to mess with the locale.

No, it doesn't. strxfrm requires that the locale is adjusted to the
desired LC_COLLATE facet. Otherwise, it will collate according to the
C locale (in which case strxfrm is the identity).

Regards,
Martin

John J. Lee

unread,
Sep 21, 2003, 9:27:08 PM9/21/03
to
"Martin v. Löwis" <mar...@v.loewis.de> writes:

> John J. Lee wrote:
>
> >>>locale.setlocale(locale.LC_ALL, ("no", None))
> > [...]
> >
> >>If this did work, then you can use the DSU pattern like so:
> > [...strxfrm...]
> > The advantage of which is that you don't have to mess with the
> > locale.
>
> No, it doesn't.

It does set the locale, you mean? So I guess there's no advantage
there at all?

[...]


> strxfrm requires that the locale is adjusted to the
> desired LC_COLLATE facet.

[...]


John

Martin v. Löwis

unread,
Sep 22, 2003, 1:47:19 AM9/22/03
to
j...@pobox.com (John J. Lee) writes:

> > >>If this did work, then you can use the DSU pattern like so:
> > > [...strxfrm...]
> > > The advantage of which is that you don't have to mess with the
> > > locale.
> >
> > No, it doesn't.
>
> It does set the locale, you mean?

Calling locale.strxfrm does not cause locale.setlocale to be called,
i.e. it does not set the locale. Likewise, calling locale.strcoll
does not cause setlocale to be called.

Instead, the application needs to call setlocale *explicitly* for
proper operation of either function.

> So I guess there's no advantage there at all?

Using strxfrm is an advantage if you have to collate the same string
multiple times, e.g. when sorting a list of strings. It is an
advantage because sorting will complete faster.

Regards,
Martin

Peter Otten

unread,
Sep 22, 2003, 4:49:33 AM9/22/03
to
Jeff Epler wrote:

> If this did work, then you can use the DSU pattern like so:
>
> decorated_wordlist = [(locale.strxfrm(w), w) for w in wordlist]
> decorated_wordlist.sort()
> wordlist[:] = [i[1] for i in decorated_wordlist]

Everytime that someone posts a naive list.sort(compare), the DSU pattern is
proposed to improve execution speed.

So maybe it's about time to change the sort() method to support a second
argument

list.sort(compare=None, mapping=None)

that, if provided, would perform the DSU magic. Or was that already proposed
and rejected?

Peter

Alex Martelli

unread,
Sep 22, 2003, 9:35:31 AM9/22/03
to
Peter Otten wrote:

I have not seen this proposed before, and I'm not very clear on what
the "compare" and "mapping" arguments are supposed to be in order to
let you specify any DSU. Basically it seems you would need two
callables, "decorate" to be called with each item in the list (to
return for each item the decorated tuple) and "undecorate" to be
called with each decorated tuple after the sort (to return the item
for the result). How do you turn that into "compare" and "mapping"?


Alex

Duncan Booth

unread,
Sep 22, 2003, 10:09:06 AM9/22/03
to
Alex Martelli <al...@aleax.it> wrote in
news:DUCbb.100826$hE5.3...@news1.tin.it:

>> So maybe it's about time to change the sort() method to support a
>> second argument
>>
>> list.sort(compare=None, mapping=None)
>>
>> that, if provided, would perform the DSU magic. Or was that already
>> proposed and rejected?
>
> I have not seen this proposed before, and I'm not very clear on what
> the "compare" and "mapping" arguments are supposed to be in order to
> let you specify any DSU. Basically it seems you would need two
> callables, "decorate" to be called with each item in the list (to
> return for each item the decorated tuple) and "undecorate" to be
> called with each decorated tuple after the sort (to return the item
> for the result). How do you turn that into "compare" and "mapping"?
>
>

You don't need two callables because the sort function would be doing
the decorating, so it knows also how to undecorate. I think the
suggestion is that the mapping argument returns something that can be
compared.

For example, here is a DSU function that does a not-in-place sort and
takes a suitable mapping argument. Changing it to in-place sort is,
of course, trivial.

>>> def DSU(aList, aMapping):
newList = [ (aMapping(item), index, item) for (index, item)
in enumerate(aList) ]
newList.sort()
return [ item[2] for item in newList ]

>>> print DSU(['Alex Martelli', 'Duncan Booth', 'Peter Otten', 'Jeff Epler'],
str.lower)
['Alex Martelli', 'Duncan Booth', 'Jeff Epler', 'Peter Otten']
>>> def lastFirst(name):
names = name.split()
names = names[-1:] + names[:-1]
return [name.lower() for name in names]

>>> print DSU(['Alex Martelli', 'Duncan Booth', 'Peter Otten', 'Jeff Epler'],
lastFirst)
['Duncan Booth', 'Jeff Epler', 'Alex Martelli', 'Peter Otten']


--
Duncan Booth
dun...@rcp.co.uk int month(char
*p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Peter Otten

unread,
Sep 22, 2003, 10:29:53 AM9/22/03
to al...@aleax.it
Alex Martelli wrote:

>> Everytime that someone posts a naive list.sort(compare), the DSU pattern
>> is proposed to improve execution speed.
>>
>> So maybe it's about time to change the sort() method to support a second
>> argument
>>
>> list.sort(compare=None, mapping=None)
>>
>> that, if provided, would perform the DSU magic. Or was that already
>> proposed and rejected?
>
> I have not seen this proposed before, and I'm not very clear on what
> the "compare" and "mapping" arguments are supposed to be in order to
> let you specify any DSU. Basically it seems you would need two
> callables, "decorate" to be called with each item in the list (to
> return for each item the decorated tuple) and "undecorate" to be
> called with each decorated tuple after the sort (to return the item
> for the result). How do you turn that into "compare" and "mapping"?

My first idea was to add a .dsu(mapping) where the tuples in the decoration
phase would be generated as (mapping(item), item).
But I would rather enhance the already-existing sort() to provide for both
the traditional and the dsu sorting. Then you have to detect if the
function passed to sort(fun) takes one or two arguments, which is not
reliable when default values come into play. So I resorted to named
arguments. Didn't make it any clearer?

So here is a pure python prototype to shed some light on the matter:

class dsu(list):
def sort(self, compare=None, mapping=None):
if mapping:
decorated = [(mapping(i), i) for i in self]
decorated.sort()
self[:] = [i[1] for i in decorated]
else:
list.sort(self, compare)

sample = dsu("ab Aa AC d e f".split())

# "traditional" sort
sample.sort(lambda s, t: -cmp(s, t))
print sample

# decorate-sort-undecorate
sample.sort(mapping=lambda s: s.lower())
print sample


Peter

Alex Martelli

unread,
Sep 22, 2003, 10:46:15 AM9/22/03
to
Duncan Booth wrote:
...

>>> So maybe it's about time to change the sort() method to support a
>>> second argument
>>>
>>> list.sort(compare=None, mapping=None)
>>>
>>> that, if provided, would perform the DSU magic. Or was that already
>>> proposed and rejected?
>>
>> I have not seen this proposed before, and I'm not very clear on what
>> the "compare" and "mapping" arguments are supposed to be in order to
>> let you specify any DSU. Basically it seems you would need two
>> callables, "decorate" to be called with each item in the list (to
>> return for each item the decorated tuple) and "undecorate" to be
>> called with each decorated tuple after the sort (to return the item
>> for the result). How do you turn that into "compare" and "mapping"?
>>
>>
> You don't need two callables because the sort function would be doing
> the decorating, so it knows also how to undecorate. I think the
> suggestion is that the mapping argument returns something that can be
> compared.
>
> For example, here is a DSU function that does a not-in-place sort and
> takes a suitable mapping argument. Changing it to in-place sort is,
> of course, trivial.
>
>>>> def DSU(aList, aMapping):
> newList = [ (aMapping(item), index, item) for (index, item)

Ah, I see -- the alleged "mapping" is in fact meant to be a
*CALLABLE*, NOT a mapping in the Python sense. Yes, you could
get away with just one callable in this way, of course. It
also seems to me that shoe-horning that second argument (in
practice mutually exclusive with the first one) to the sort
method, when there seem to be no real advantages to keeping
it a method, is not a great idea. Rather, a new built-in
function appears to me to be best here -- something like:

def sort(iterable, decorate=None):
if decorate is None:
aux = [ (item, index, item)
for index, item in enumerate(iterable) ]
else:
aux = [ (decorate(item), index, item)
for index, item in enumerate(iterable) ]
aux.sort()
return [ item for __, __, item in aux ]

so as to have no arbitrary constraints on the iterable being
a list, or the sort being necessarily in-place, when there
is no performance advantage in so doing -- and also serve the
frequent need of a non-in-place sort returning the sorted
list without any actual need for decoration. E.g., I often
use the equivalent of:

for key in sort(somesmalldict):
print key, somesmalldict[key]

and it would be handy to have that little 'sort' function
built-in for all such uses.

The only downside I can see to this proposal is that Python
isn't really all that good at passing the callable decorate
argument -- one would end up with a lot of little ad hoc
functions, or lambdas, and neither is a great solution. It's
the kind of situation where Smalltalk/Ruby shine by letting
an arbitrary block of code (albeit one only) be passed into
any method (in Ruby, this DSU-sort would probably become a
method of the Enumerable "module" [mix-in] -- a rather elegant
solution overall, "architecturally" nicer-looking than Python's,
although "pragmatically" we're probably abot even).


Alex

Duncan Booth

unread,
Sep 22, 2003, 11:35:12 AM9/22/03
to
Peter Otten <__pet...@web.de> wrote in
news:bkn12v$tns$06$1...@news.t-online.com:

> My first idea was to add a .dsu(mapping) where the tuples in the
> decoration phase would be generated as (mapping(item), item).

Note that if anyone proposes this seriously, it should generate a 3-tuple
(mapping(item), index, item) rather than the 2-tuple you suggest.

This is because the mapping function could reasonably be used to impose an
ordering on objects that have no natural order, so you need to be sure that
the comparison never falls through the the original object even where the
mapping compares equal. It also has the side effect of making the sort
stable (although if stability is a goal you might want another option to
reverse the sort which would use '-index' as the second element and call
.reverse() on the result).

FWIW, I think something like this belongs in the standard library rather
than as a method on lists or a new builtin.

Alex Martelli

unread,
Sep 22, 2003, 11:45:57 AM9/22/03
to
Duncan Booth wrote:
...

> FWIW, I think something like this belongs in the standard library rather
> than as a method on lists or a new builtin.

Definitely not a method on lists, we agree on that. Problem is, there
is no module in the standard library to collect "functions that are often
useful but not QUITE often enough to warrant being built-ins" (and if
there was, there are quite a few current built-ins I'd like to relegate
into such an auxiliary/utilities module), so that finding a good place,
other than built-ins, for such utility functions, is often a bother.


Alex

Shu-Hsien Sheu

unread,
Sep 22, 2003, 11:17:53 AM9/22/03
to
Hi,

I have a question about the comparison of efficiency of string slicing
and using string.startswith.
For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):


Thanks!

-shuhsien


Bob Gailer

unread,
Sep 22, 2003, 11:57:41 AM9/22/03
to

Here's one way to address this kind of question:

>>> import time
>>> def f():
... st = time.time()
... for i in range(500000):
... if aa.startswith('abc'):pass
... print time.time() - st
...
>>> f()
1.01200008392
>>> def f():
... st = time.time()
... for i in range(500000):
... if aa[:3] == 'abc':pass
... print time.time() - st
...
>>> f()
1.01100003719

Bob Gailer
bga...@alum.rpi.edu
303 442 2625

Duncan Booth

unread,
Sep 22, 2003, 12:25:58 PM9/22/03
to
Alex Martelli <al...@aleax.it> wrote in
news:VOEbb.101641$hE5.3...@news1.tin.it:

So someone needs to write a PEP proposing such a module.

Peter Hansen

unread,
Sep 22, 2003, 2:23:52 PM9/22/03
to

The latter is clearly more readable.

David Eppstein

unread,
Sep 22, 2003, 2:51:51 PM9/22/03
to
In article <3F6F3E38...@engcorp.com>,
Peter Hansen <pe...@engcorp.com> wrote:

> > For example, which one of the following would be more efficient, or ,
> > moreover, more pythonic?
> >
> > if aa[:3] == 'abc':
> >
> > vs
> >
> > if aa.startswith('abc'):
>
> The latter is clearly more readable.

More Pythonic, too, I think. "Readability counts," and "There should be
one-- and preferably only one --obvious way to do it." In this case,
startswith must be the one obvious way, or else why would it exist in
the standard library at all?

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

Batista, Facundo

unread,
Sep 22, 2003, 3:14:58 PM9/22/03
to

Sorry for not replying the original message.

See PEP 08:

    - Avoid slicing strings when checking for prefixes or suffixes.
      Use startswith() and endswith() instead, since they are faster,
      cleaner and less error prone.  E.g.:

        No:  if foo[:3] == 'bar':
        Yes: if foo.startswith('bar'):

      The exception is if your code must work with Python 1.5.2 (but
      let's hope not!).


.       Facundo


#- -----Mensaje original-----
#- De: David Eppstein [mailto:epps...@ics.uci.edu]
#- Enviado el: Lunes 22 de Septiembre de 2003 3:52 PM
#- Para: pytho...@python.org
#- Asunto: Re: Slicing vs .startswith
#-
#-
#- In article <3F6F3E38...@engcorp.com>,
#-  Peter Hansen <pe...@engcorp.com> wrote:
#-
#- > > For example, which one of the following would be more
#- efficient, or ,
#- > > moreover, more pythonic?
#- > >
#- > > if aa[:3] == 'abc':
#- > >
#- > > vs
#- > >
#- > > if aa.startswith('abc'):
#- >
#- > The latter is clearly more readable.
#-
#- More Pythonic, too, I think.  "Readability counts," and
#- "There should be
#- one-- and preferably only one --obvious way to do it."  In
#- this case,
#- startswith must be the one obvious way, or else why would it
#- exist in
#- the standard library at all?
#-
#- --
#- David Eppstein                      http://www.ics.uci.edu/~eppstein/
#- Univ. of California, Irvine, School of Information & Computer Science
#- --
#- http://mail.python.org/mailman/listinfo/python-list
#-





. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

ADVERTENCIA 

La información contenida en este mensaje y cualquier archivo anexo al mismo, son para uso exclusivo del destinatario y pueden contener información confidencial o propietaria, cuya divulgación es sancionada por la ley.

Si Ud. No es uno de los destinatarios consignados o la persona responsable de hacer llegar este mensaje a los destinatarios consignados, no está autorizado a divulgar, copiar, distribuir o retener información (o parte de ella) contenida en este mensaje. Por favor notifíquenos respondiendo al remitente, borre el mensaje original y borre las copias (impresas o grabadas en cualquier medio magnético) que pueda haber realizado del mismo.

Todas las opiniones contenidas en este mail son propias del autor del mensaje y no necesariamente coinciden con las de Telefónica Comunicaciones Personales S.A. o alguna empresa asociada.

Los mensajes electrónicos pueden ser alterados, motivo por el cual Telefónica Comunicaciones Personales S.A. no aceptará ninguna obligación cualquiera sea el resultante de este mensaje.

Muchas Gracias.

xtian

unread,
Sep 22, 2003, 5:47:51 PM9/22/03
to
David Eppstein <epps...@ics.uci.edu> wrote in message news:<eppstein-89A196...@news.service.uci.edu>...

> In article <3F6F3E38...@engcorp.com>,
> Peter Hansen <pe...@engcorp.com> wrote:
>
> > > For example, which one of the following would be more efficient, or ,
> > > moreover, more pythonic?
> > >
> > > if aa[:3] == 'abc':
> > >
> > > vs
> > >
> > > if aa.startswith('abc'):
> >
> > The latter is clearly more readable.
>
> More Pythonic, too, I think. "Readability counts," and "There should be
> one-- and preferably only one --obvious way to do it." In this case,
> startswith must be the one obvious way, or else why would it exist in
> the standard library at all?

It's also much more maintainable - if in the future that 'abc' needs
to change to 'abcdef', the slicing version requires changes in two
places (just asking for bugs), while startswith requires only one. Of
course, all these things (readability, maintainability and
pythonicism) are fairly closely interrelated.

xtian

Jeremy Fincher

unread,
Sep 22, 2003, 6:29:19 PM9/22/03
to
Shu-Hsien Sheu <sh...@bu.edu> wrote in message news:<mailman.1064245938...@python.org>...

> For example, which one of the following would be more efficient, or ,
> moreover, more pythonic?
>
> if aa[:3] == 'abc':
>
> vs
>
> if aa.startswith('abc'):

Python is about maintainability, and the latter is significantly more
maintainable than the former; if the string you're checking against
changes in size, using .startswith doesn't require you to change your
sourcecode anywhere else.

Jeremy

Paul

unread,
Sep 22, 2003, 6:39:00 PM9/22/03
to
However, what if you don't want case sensitivity? For example, to
check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
work with both foo.jpg and foo.JPG.

Is this slower than name.lower().endswith('jpg')? Is there a better
solution altogether?

Paul


Bob Gailer <bga...@alum.rpi.edu> wrote in message news:<mailman.1064246454...@python.org>...

> --

Alex Martelli

unread,
Sep 22, 2003, 6:52:12 PM9/22/03
to
Paul wrote:

> However, what if you don't want case sensitivity? For example, to
> check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
> work with both foo.jpg and foo.JPG.
>
> Is this slower than name.lower().endswith('jpg')? Is there a better
> solution altogether?

timeit.py gives me 0.9 microseconds for the less maintainable
name[:-3].lower()=='jpg' vs 1.7 for name.lower().endswith('jpg')
[and over 3 for re's]. Point is, _do I *care*_? How many millions
of filenames will I check, when the extra overhead of checking
a million filenames in the more maintainable way is less than a
second? How long will it have taken me to get those millions
of filenames into memory in the first place?

If this operation IS on the bottleneck, and a 0.8 microseconds
difference matters, I'll do the slicing -- but 99 times out of 100
I'll do the .endswith instead (I'll _start_ that way 100 times out
of 100, and optimize it iff profiling tells me that matters...).


Alex

Peter Hansen

unread,
Sep 22, 2003, 7:17:26 PM9/22/03
to
Paul wrote:
>
> However, what if you don't want case sensitivity? For example, to
> check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
> work with both foo.jpg and foo.JPG.
>
> Is this slower than name.lower().endswith('jpg')? Is there a better
> solution altogether?

Yes, of course. :-)

import os
if os.path.splitext(name)[1].lower() == 'jpg':
pass

That also handles the problem with files named "ThisFileIs.NotAjpg"
being mistreated, as the other solutions do. ;-)

-Peter

David Eppstein

unread,
Sep 22, 2003, 7:35:39 PM9/22/03
to
In article <3F6F8306...@engcorp.com>,
Peter Hansen <pe...@engcorp.com> wrote:

I was about to post the same answer. One minor nit, though: it should be

os.path.splitext(name)[1].lower() == '.jpg'

It may be a little longer than the other solutions, but it expresses the
meaning more clearly.

Terry Reedy

unread,
Sep 22, 2003, 7:57:49 PM9/22/03
to

"Shu-Hsien Sheu" <sh...@bu.edu> wrote in message
news:mailman.1064245938...@python.org...
> Hi,

>
> For example, which one of the following would be more efficient, or
,
> moreover, more pythonic?
>
> if aa[:3] == 'abc':

This creates and deletes a temporary object.

> if aa.startswith('abc'):

This makes a function call. As Bob showed for his system, the two
overheads are about the same for a three char prefix. If one were
checking a 30000 byte prefix, the call might win.

Terry J. Reedy


Peter Hansen

unread,
Sep 22, 2003, 8:19:59 PM9/22/03
to
David Eppstein wrote:
>
> In article <3F6F8306...@engcorp.com>,
> Peter Hansen <pe...@engcorp.com> wrote:
>
> > Paul wrote:
> > >
> > > However, what if you don't want case sensitivity? For example, to
> > > check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
> > > work with both foo.jpg and foo.JPG.
> > >
> > > Is this slower than name.lower().endswith('jpg')? Is there a better
> > > solution altogether?
> >
> > Yes, of course. :-)
> >
> > import os
> > if os.path.splitext(name)[1].lower() == 'jpg':
> > pass
> >
> > That also handles the problem with files named "ThisFileIs.NotAjpg"
> > being mistreated, as the other solutions do. ;-)
>
> I was about to post the same answer. One minor nit, though: it should be
>
> os.path.splitext(name)[1].lower() == '.jpg'

Oops, thanks! I _always_ make that mistake. It just seems that if
os.path.split() does not return any path separators in the components,
then os.path.splitext() shouldn't return the extension separator...

Luckily we have tests around here to catch that kind of thing. :-)

-Peter

Peter Otten

unread,
Sep 23, 2003, 4:27:09 AM9/23/03
to
Duncan Booth wrote:

> Note that if anyone proposes this seriously, it should generate a 3-tuple
> (mapping(item), index, item) rather than the 2-tuple you suggest.
>
> This is because the mapping function could reasonably be used to impose an
> ordering on objects that have no natural order, so you need to be sure
> that the comparison never falls through the the original object even where
> the mapping compares equal. It also has the side effect of making the sort
> stable (although if stability is a goal you might want another option to
> reverse the sort which would use '-index' as the second element and call
> .reverse() on the result).

So my demo implementation was faulty :-(
Let me try again:

def sort(self, cmpfunc=None, mapfunc=None):
if mapfunc:
list.sort(self, lambda x, y: cmp(mapfunc(x), mapfunc(y)))
else:
list.sort(self, cmpfunc)

Seriously, if sort() were to be thus extended, the dsu pattern would rather
act as a performance enhancement under the hood. I prefer plain C

struct {
PyObject *decorator;
int index; /* iff stability is not guaranteed by the sort algorithm */
PyObject *data;
}

over n-tuples and would hope to reuse the sorting infrastructure already
there in listobject.c to compare only the decorators. I am not familiar
with the C source, so be gentle if that is not feasible.

> FWIW, I think something like this belongs in the standard library rather
> than as a method on lists or a new builtin.

If you think of the comparison as a two-step process,

(1) extract or calculate an attribute
(2) wrap it into a comparison function

consider the following use cases:

(a) lambda x, y: cmp(y, x)
(b) lambda x, y: cmp(x.attr.lower(), y.attr.lower())

All code following the latter pattern could be rewritten (more clearly, I
think) as

alist.sort(mapfunc=lambda x: attr.lower())

and would automatically benefit from dsu (which could even be turned off
transparently for the client code for very large lists, where the memory
footprint becomes a problem).

The litmus test whether to put it into a utility function or into an already
existing method that needs only a minor and completely backwards-compatible
interface change would then be:

Are (b)-style comparison functions nearly as or more common than (a)-style
functions.

Peter

PS: You and Alex Martelli dismissed my somewhat unfocused idea faster than I
could sort it out. I hope you will find the above useful, though.

Shu-Hsien Sheu

unread,
Sep 23, 2003, 10:32:46 AM9/23/03
to
Great thanks for all the answers!
I really enjoy Python and learned a lot here.

-shuhsien


Gerrit Holl

unread,
Sep 23, 2003, 11:45:36 AM9/23/03
to
Shu-Hsien Sheu wrote:
> In my understanding, using try/except rather than if/else is more
> pythonic. However, sometimes it is difficult to use the later.
> For example, I want to search for a sub string in a list composed of
> strings. It is considered "possitive" if there is a match, no matter how
> many.
>
> my_test = ['something', 'others', 'still others']
>
> case 1: try/except
>
> hit = 0
> for i in my_test:
> try:
> i.index('some')
> hit = 1
> except ValueError:
> pass

> case 2: if/else
>
> hit = 0
> for i in my_test:
> if 'some' in i:
> hit = 1

Much faster would be:
def check():
for elem in my_test:
if 'some' in elem:
return True

...this way, it immediatly stops checking all following values once it finds
a single match.

> It seems to me that in a simple searching/matching, using if might be
> better and the code is smaller. Try/except would have its strengh on
> catching multiple errorrs.

Agreed.

> However, problems occur if the criteria is
> composed of "or" rather than "and". For instance:
>
> if (a in b) or (c in b):
> *do something
>
> try:
> b.index(a)
> b.index(c)
> *do something
> except ValueError:
> pass
>
> The above two are very different.

It would be more similar to use 'if (a in b) and (c in b)',
because that is what the try/except block does. If so, I
think it has the same effect.
I would absolutely prefer the former, because I don't like function
calls who neither change an object whose return value is
thrown away.

regards,
Gerrit.

--
243. As rent of herd cattle he shall pay three gur of corn to the
owner.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger Syndroom - een persoonlijke benadering:
http://people.nl.linux.org/~gerrit/
Het zijn tijden om je zelf met politiek te bemoeien:
http://www.sp.nl/

Batista, Facundo

unread,
Sep 23, 2003, 11:40:38 AM9/23/03
to

#- In my understanding, using try/except rather than if/else is more
#- pythonic. However, sometimes it is difficult to use the later.

It's more pythonic to use always the right tool for the problem.


#- For example, I want to search for a sub string in a list composed of
#- strings. It is considered "possitive" if there is a match,
#- no matter how
#- many.
#-
#- my_test = ['something', 'others', 'still others']
#-
#- case 1: try/except
#-
#- hit = 0
#- for i in my_test:
#- try:
#- i.index('some')
#- hit = 1
#- except ValueError:
#- pass
#-
#-
#-
#- case 2: if/else
#-
#- hit = 0
#- for i in my_test:
#- if 'some' in i:
#- hit = 1
#-

IMHO, better the case 2. One detail, stop the iteration when you have an ok
(code untested):

for i in my_test:
if 'some in i:
hit = 1

break
else:
hit = 0


. Facundo

Shu-Hsien Sheu

unread,
Sep 23, 2003, 11:10:49 AM9/23/03
to
Hi,

In my understanding, using try/except rather than if/else is more

pythonic. However, sometimes it is difficult to use the later.

For example, I want to search for a sub string in a list composed of

strings. It is considered "possitive" if there is a match, no matter how
many.

my_test = ['something', 'others', 'still others']

case 1: try/except

hit = 0
for i in my_test:
try:
i.index('some')
hit = 1
except ValueError:
pass

case 2: if/else

hit = 0


for i in my_test:
if 'some' in i:
hit = 1

It seems to me that in a simple searching/matching, using if might be
better and the code is smaller. Try/except would have its strengh on

catching multiple errorrs. However, problems occur if the criteria is

composed of "or" rather than "and". For instance:

if (a in b) or (c in b):
*do something

try:
b.index(a)
b.index(c)
*do something
except ValueError:
pass

The above two are very different.

Am I right here?

-shuhsien


Stephen Horne

unread,
Sep 23, 2003, 1:40:13 PM9/23/03
to
On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh...@bu.edu>
wrote:

>Hi,
>
>In my understanding, using try/except rather than if/else is more
>pythonic.

When to use an exception can be a difficult issue in any language.

A common suggestion as to when to use them is 'only for errors' - but
what exactly is an error? Really, its just a special case - and if you
are handling that special case it isn't an error any more.

For example, if your program runs out of memory and cannot complete a
requested task, as long as it handles that special case by reporting
that it run out of memory and not by crashing or whatever, it is
probably doing what any decent requirements document would specify -
in case of insufficient memory, report the problem and cleanly abort
the tasks that cannot be completed without terminating or crashing. In
terms of the requirements, no error has occurred - a special case has
simply been handled exactly as per the requirements.

Actually, I'm not really convinced by that argument either. I think I
first read it in Stroustrup, though I can't be bothered checking ATM.
Anyway, what I'm trying to express is that when to use an exception is
more about intuitions and common sense than hard-and-fast rules.

The nearest thing to a hard-and-fast rule, though, is that if you
throw an exception the condition should really be exceptional - a
special case as opposed to a normal case. try/except is not just an
alternative to if/else, and neither is it an alternative to 'break' in
loops (though Icon programmers may be tempted to use it that way).

One major benefit of exceptions is that you don't have to keep
checking 'has any error occured' in multiple if statements in a
function. Any function that does anything even remotely complicated
will have many different possible error conditions. Making sure that
normal-case code is not run when a failure has already occurred can be
a major headache. Enough of a headache that in C, many people think
error handling is the only case where a 'goto' is acceptable (so an
error can trigger a goto straight to the cleanup code at the end of a
function, keeping the rest of the function clean).

There are other issues, of course - exceptions allow your functions to
cope properly with errors that you don't even know about in functions
that you call, for instance.

But basically, if you handle exceptional conditions by raising
exceptions, your logic will get much clearer as for the most part it
only has to deal with the normal case. The exceptional cases get
handled by the exception handlers.

But if you use try/except as an alternative to if/else, people who
have to read your code may well take up dark magics purely so they can
curse you more effectively ;-)


--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk

Tim Rowe

unread,
Sep 23, 2003, 2:47:10 PM9/23/03
to
On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh...@bu.edu>
wrote:

>Hi,


>
>In my understanding, using try/except rather than if/else is more
>pythonic.

Rule of thumb: when the block of code is still doing what it's
supposed to do, use if/else. If it's failing to do what it's supposed
to do, use try/except. "except" should be an /exception/!

>However, sometimes it is difficult to use the later.
>For example, I want to search for a sub string in a list composed of
>strings. It is considered "possitive" if there is a match, no matter how
>many.
>
>my_test = ['something', 'others', 'still others']
>
>case 1: try/except
>
>hit = 0
>for i in my_test:
> try:
> i.index('some')
> hit = 1
> except ValueError:
> pass

I'd reckon that to be a bad use of try/except; the "exception" is a
perfectly normal case.

>case 2: if/else
>
>hit = 0
>for i in my_test:
> if 'some' in i:
> hit = 1

My /guess/ is that this would be faster than case 1, as well as
clearer!

>It seems to me that in a simple searching/matching, using if might be
>better and the code is smaller. Try/except would have its strengh on
>catching multiple errorrs. However, problems occur if the criteria is
>composed of "or" rather than "and". For instance:
>
>if (a in b) or (c in b):
> *do something
>
>try:
> b.index(a)
> b.index(c)
> *do something
>except ValueError:
> pass
>
>The above two are very different.

Yes. The first is clear and concise, the second is verbose and
unclear! Also the second could mask a genuine ValueError if a, b, or
c is an evaluation rather than a simple variable, so you'd think that
neither a nor c was in b when in fact you have no idea: something went
invisibly wrong and you never actually completed the search.

So try/except /only/ when something has gone wrong and you need to go
into some sort of recovery or termination, /not/ for routine tests.

Bob Gailer

unread,
Sep 24, 2003, 10:33:58 AM9/24/03
to
At 09:10 AM 9/23/2003, Shu-Hsien Sheu wrote:

>Hi,
>
>In my understanding, using try/except rather than if/else is more pythonic.

If/else and try/except are both Pythonic. In many cases you don't even get
to choose.

>However, sometimes it is difficult to use the later.
>For example, I want to search for a sub string in a list composed of
>strings. It is considered "possitive" if there is a match, no matter how many.
>
>my_test = ['something', 'others', 'still others']
>
>case 1: try/except
>
>hit = 0
>for i in my_test:
> try:
> i.index('some')
> hit = 1
> except ValueError:
> pass
>
>
>

>case 2: if/else
>
>hit = 0
>for i in my_test:
> if 'some' in i:
> hit = 1

Consider breaking out of the loop at the first success

for i in my_test:
if 'some' in i:
hit = 1

break
else:
hit = 0

Iist comprehension can also be an alternative:

hit = [i for i in my_test if 'some' in i]

>It seems to me that in a simple searching/matching, using if might be
>better and the code is smaller. Try/except would have its strengh on
>catching multiple errorrs. However, problems occur if the criteria is
>composed of "or" rather than "and". For instance:
>
>if (a in b) or (c in b):
> *do something
>
>try:
> b.index(a)
> b.index(c)
> *do something
>except ValueError:
> pass
>
>The above two are very different.
>

>Am I right here?

AFAIAC its a mater of style.

Fredrik Lundh

unread,
Sep 24, 2003, 5:48:44 PM9/24/03
to
RE: Slicing vs .startswithFacundo Batista wrote:

See PEP 08:
- Avoid slicing strings when checking for prefixes or suffixes.
Use startswith() and endswith() instead, since they are faster,
cleaner and less error prone. E.g.:
No: if foo[:3] == 'bar':
Yes: if foo.startswith('bar'):
The exception is if your code must work with Python 1.5.2 (but
let's hope not!).

the PEP isn't telling the truth:

> python timeit.py -s "s='egg&bacon'" "s[:3] == 'egg'"
100000 loops, best of 3: 4.43 usec per loop

> python timeit.py -s "s='egg&bacon'" "s.startswith('egg')"
100000 loops, best of 3: 7.6 usec per loop

</F>


Hung Jung Lu

unread,
Sep 27, 2003, 4:25:10 AM9/27/03
to
Tim Rowe <tim@remove_if_not_spam.digitig.co.uk> wrote in message news:<u941nvg94ae04ajnd...@4ax.com>...

> On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh...@bu.edu>
> wrote:
>
> >In my understanding, using try/except rather than if/else is more
> >pythonic.
>
> Rule of thumb: when the block of code is still doing what it's
> supposed to do, use if/else. If it's failing to do what it's supposed
> to do, use try/except. "except" should be an /exception/!
.....

> So try/except /only/ when something has gone wrong and you need to go
> into some sort of recovery or termination, /not/ for routine tests.

You have a valid point of view, which nonetheless is not shared by
everyone. This is a recurring subject in the newsgroup.

Python exceptions have been used for other purposes, as can be seen
from Python FAQ (e.g. "4.22 Why is there no goto?" in
http://www.python.org/doc/faq/general.html)

The "for" loop in Python is also implemented internally with
exceptions. E.g.: http://groups.google.com/groups?selm=mailman.1010884758.23378.python-list%40python.org&oe=UTF-8&output=gplain,
where it mentioned:

"... In some other languages, 'non failure' mode exceptions may be
unusual, but it's the normal idiom in Python."

regards,

Hung Jung

Hung Jung Lu

unread,
Sep 27, 2003, 5:01:19 AM9/27/03
to
Shu-Hsien Sheu <sh...@bu.edu> wrote in message news:<mailman.1064330017...@python.org>...

> catching multiple errorrs. However, problems occur if the criteria is
> composed of "or" rather than "and". For instance:
>
> if (a in b) or (c in b):
> *do something
>
> try:
> b.index(a)
> b.index(c)
> *do something
> except ValueError:
> pass

For the "or" case, actually the exception trick works rather well.
Usually people raise events by hand:

class Found(Exception): pass

try:
if a in b: raise Found
if c in b: raise Found
except Found:
do something

Of course, one can supply additional information, as in:

try:
if a in b: raise Found('a in b')
if c in b: raise Found('c in b')
except Found, e:
print str(e)

Hung Jung

Tim Rowe

unread,
Sep 29, 2003, 5:33:27 AM9/29/03
to
On 27 Sep 2003 01:25:10 -0700, hungj...@yahoo.com (Hung Jung Lu)
wrote:

>Tim Rowe <tim@remove_if_not_spam.digitig.co.uk> wrote in message news:<u941nvg94ae04ajnd...@4ax.com>...
>> On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh...@bu.edu>
>> wrote:
>>
>> >In my understanding, using try/except rather than if/else is more
>> >pythonic.
>>
>> Rule of thumb: when the block of code is still doing what it's
>> supposed to do, use if/else. If it's failing to do what it's supposed
>> to do, use try/except. "except" should be an /exception/!
>.....
>> So try/except /only/ when something has gone wrong and you need to go
>> into some sort of recovery or termination, /not/ for routine tests.
>
>You have a valid point of view, which nonetheless is not shared by
>everyone. This is a recurring subject in the newsgroup.

<examples snipped>

That's the nice thing about rules of thumb: they're not binding :-)

Internal constructs don't bother me -- ultimately it all comes down to
machine code which will be a mass of goto's. That doesn't mean that
my code should be (could be!) a mass of goto's.

And the last time I needed a goto for anything I was programming in
BASIC; I don't need to use exceptions to emulate it in a language with
decent flow control constructs (Python qualifies, of course!).

Stephen Horne

unread,
Sep 29, 2003, 2:08:23 PM9/29/03
to
On Mon, 29 Sep 2003 10:33:27 +0100, Tim Rowe
<tim@remove_if_not_spam.digitig.co.uk> wrote:

>Internal constructs don't bother me -- ultimately it all comes down to
>machine code which will be a mass of goto's. That doesn't mean that
>my code should be (could be!) a mass of goto's.
>
>And the last time I needed a goto for anything I was programming in
>BASIC; I don't need to use exceptions to emulate it in a language with
>decent flow control constructs (Python qualifies, of course!).

Ah - a good old goto debate ;-)

I have long claimed that goto is useful in rare circumstances, that it
can be clearer and more maintainable than structured code in those
rare cases, and that while anything can be abused it is still
perfectly possible for a good programmer to use it responsibly and to
refactor when the code becomes too messy. Structured, OO, and indeed
any style of programming can become unreadable and unmaintainable
(hence the fuss about refactoring).

That said, it was quite a shock when I actually found a case where I
needed a goto about six months ago. Quite literally I made several
attempts at writing a function in a structured way and got it wrong
each time. Each time I came back to it the previous version seemed a
horrible mess so I rewrote it. Eventually I figured this was stupid,
especially as it didn't seem so hard in my head. I put that mental
picture onto paper - and as I did so I realised it simply wasn't a
structured job. It was very much a state transition model. But because
it completes all its work in one go, and doesn't need to maintain the
state between calls, it simply hadn't occurred to me to think of it
that way.

I remember telling a college lecturer that if state transition models
are appropriate for some things then so must gotos be appropriate, and
that in a 'short running' state machine a goto would be the logical
way to implement the transitions. At the time, I didn't think it would
be so long before I found an example ;-)

Anyway, this example was in some fairly low level library code I was
writing in C++. I don't see any need for goto in Python, trust that it
will never be added, and don't like to see exceptions abused that way.

Just sharing a rather pointless anecdote.

0 new messages