Re: Pythonic way to determine if one char of many in a string

pyt...@bdurham.com

Feb 16, 2009, 12:48:34 AM

to Nicolas Dandrimont, pytho...@python.org

Nicolas,

> I would go for something like:
>
> for char in word:
>     if char in 'aeiouAEIOU':
>         char_found = True
>         break
> else:
>     char_found = False
>
> It is clear (imo), and it seems to be the intended idiom for a
> search loop that short-circuits as soon as a match is found.

Thank you - that looks much better than my overly complicated attempts.

Are there any reasons why I couldn't simplify your approach as follows?

for char in word:
    if char in 'aeiouAEIOU':
        return True
return False

Cheers,
Malcolm

Nicolas Dandrimont

Feb 16, 2009, 12:28:46 AM

to pyt...@bdurham.com, pytho...@python.org

* pyt...@bdurham.com <pyt...@bdurham.com> [2009-02-16 00:17:37 -0500]:

> I need to test strings to determine if one of a list of chars is
> in the string. A simple example would be to test strings to
> determine if they have a vowel (aeiouAEIOU) present.
> I was hopeful that there was a built-in method that operated
> similar to startswith where I could pass a tuple of chars to be
> tested, but I could not find such a method.
> Which of the following techniques is most Pythonic or are there
> better ways to perform this type of match?
> # long and hard coded but short circuits as soon as match found
> if 'a' in word or 'e' in word or 'i' in word or 'u' in word or
> ... :
> -OR-
> # flexible, but no short circuit on first match
> if [ char for char in word if char in 'aeiouAEIOU' ]:
> -OR-
> # flexible, but no short circuit on first match
> if set( word ).intersection( 'aeiouAEIOU' ):

I would go for something like:

for char in word:
    if char in 'aeiouAEIOU':
        char_found = True
        break
else:
    char_found = False

(No, I did not forget to indent the else statement, see
http://docs.python.org/reference/compound_stmts.html#for)

It is clear (imo), and it seems to be the intended idiom for a search
loop that short-circuits as soon as a match is found.
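
For readers who have not met the for/else form before, here is a minimal sketch of how the else clause behaves. The helper name find_first_vowel and the sample words are illustrative assumptions, not part of the original post; the code is written for Python 3.

```python
VOWELS = 'aeiouAEIOU'

def find_first_vowel(word):
    """Return the index of the first vowel in word, or -1 if there is none."""
    for index, char in enumerate(word):
        if char in VOWELS:
            break        # leaving the loop via break skips the else clause
    else:
        index = -1       # runs only when the loop ended without a break
    return index

print(find_first_vowel('rhythm'))   # -1
print(find_first_vowel('python'))   # 4
```

The else branch also runs for an empty word, because a loop that never iterates never reaches break.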

Cheers,
--
Nicolas Dandrimont

linux: the choice of a GNU generation
(k...@cis.ufl.edu put this on Tshirts in '93)


Chris Rebert

Feb 16, 2009, 12:56:00 AM

to pyt...@bdurham.com, pytho...@python.org

On Sun, Feb 15, 2009 at 9:17 PM, <pyt...@bdurham.com> wrote:
> I need to test strings to determine if one of a list of chars is in the
> string. A simple example would be to test strings to determine if they have
> a vowel (aeiouAEIOU) present.
>
> I was hopeful that there was a built-in method that operated similar to
> startswith where I could pass a tuple of chars to be tested, but I could not
> find such a method.
>
> Which of the following techniques is most Pythonic or are there better ways
> to perform this type of match?
>
> # long and hard coded but short circuits as soon as match found
> if 'a' in word or 'e' in word or 'i' in word or 'u' in word or ... :
>
> -OR-
>
> # flexible, but no short circuit on first match
> if [ char for char in word if char in 'aeiouAEIOU' ]:

Just use the fairly new builtin function any() to make it short-circuit:

if any(char.lower() in 'aeiou' for char in word):
    do_whatever()
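
To see the short-circuit in action, one option is to record which characters actually get inspected. This is only a sketch: the helper is_vowel, the list checked and the sample word are made up for illustration.

```python
checked = []

def is_vowel(char):
    checked.append(char)              # remember every character we looked at
    return char.lower() in 'aeiou'

word = 'strength'
found = any(is_vowel(char) for char in word)

# any() stops consuming the generator at the first True result,
# so the characters after 'e' are never inspected.
print(found, checked)   # True ['s', 't', 'r', 'e']
```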

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com

odeits

Feb 16, 2009, 2:31:42 AM

If you want to generalize it you should look at sets
http://docs.python.org/library/sets.html

It seems what you are actually testing for is if the intersection of
the two sets is not empty where the first set is the characters in
your word and the second set is the characters in your defined string.
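
A minimal sketch of that idea, using the built-in set type (available since Python 2.4) rather than the older sets module linked above. The function name shares_a_char is an assumption chosen for illustration.

```python
def shares_a_char(word, chars):
    """Return True if word and chars have at least one character in common."""
    # Both sets are built in full before they are intersected,
    # so this form does not stop early at the first match.
    return bool(set(word) & set(chars))

print(shares_a_char('rhythm', 'aeiouAEIOU'))     # False
print(shares_a_char('Umbrella', 'aeiouAEIOU'))   # True
```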

Nicolas Dandrimont

Feb 16, 2009, 1:09:28 PM

to pyt...@bdurham.com, pytho...@python.org

* pyt...@bdurham.com <pyt...@bdurham.com> [2009-02-16 00:48:34 -0500]:

> Nicolas,
>
> > I would go for something like:
> >
> > for char in word:
> >     if char in 'aeiouAEIOU':
> >         char_found = True
> >         break
> > else:
> >     char_found = False
> >
> > It is clear (imo), and it seems to be the intended idiom for a
> > search loop that short-circuits as soon as a match is found.
>
> Thank you - that looks much better than my overly complicated attempts.
>
> Are there any reasons why I couldn't simplify your approach as follows?
>
> for char in word:
>     if char in 'aeiouAEIOU':
>         return True
> return False

If you want to put this in its own function, this seems to be the way to go.
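
Wrapped in a function, the loop might look like the sketch below. The name has_vowel is an assumption; the thread does not name the function.

```python
def has_vowel(word):
    """Return True as soon as a vowel is found in word, otherwise False."""
    for char in word:
        if char in 'aeiouAEIOU':
            return True      # return ends the loop, so no flag or break is needed
    return False

print(has_vowel('rhythm'))   # False
print(has_vowel('Python'))   # True
```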

Cheers,
--
Nicolas Dandrimont

The nice thing about Windows is - It does not just crash, it displays a
dialog box and lets you press 'OK' first.
(Arno Schaefer's .sig)


J. Cliff Dyer

Feb 16, 2009, 1:34:17 PM

to Nicolas Dandrimont, python-list

On Mon, 2009-02-16 at 00:28 -0500, Nicolas Dandrimont wrote:
> * pyt...@bdurham.com <pyt...@bdurham.com> [2009-02-16 00:17:37 -0500]:
>
> > I need to test strings to determine if one of a list of chars is
> > in the string. A simple example would be to test strings to
> > determine if they have a vowel (aeiouAEIOU) present.
> > I was hopeful that there was a built-in method that operated
> > similar to startswith where I could pass a tuple of chars to be
> > tested, but I could not find such a method.
> > Which of the following techniques is most Pythonic or are there
> > better ways to perform this type of match?
> > # long and hard coded but short circuits as soon as match found
> > if 'a' in word or 'e' in word or 'i' in word or 'u' in word or
> > ... :
> > -OR-
> > # flexible, but no short circuit on first match
> > if [ char for char in word if char in 'aeiouAEIOU' ]:
> > -OR-
> > # flexible, but no short circuit on first match
> > if set( word ).intersection( 'aeiouAEIOU' ):
>
> I would go for something like:
>
> for char in word:
>     if char in 'aeiouAEIOU':
>         char_found = True
>         break
> else:
>     char_found = False
>
> (No, I did not forget to indent the else statement, see
> http://docs.python.org/reference/compound_stmts.html#for)
>
> It is clear (imo), and it seems to be the intended idiom for a search
> loop that short-circuits as soon as a match is found.

If performance becomes an issue, you can tune this very easily, so it
doesn't have to scan through the string 'aeiouAEIOU' every time, by
making a set out of that:

vowels = set('aeiouAEIOU')
for char in word:
    if char in vowels:
        return True
return False

Membership tests on a set take constant time on average.
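
As a complete function this could look like the sketch below. The names VOWELS and has_vowel are assumptions, and whether the set lookup is measurably faster than scanning a ten-character string is something to time rather than take for granted.

```python
VOWELS = frozenset('aeiouAEIOU')    # built once, when the module is loaded

def has_vowel(word, vowels=VOWELS):
    """Return True as soon as a character of word is found in vowels."""
    for char in word:
        if char in vowels:          # hash lookup instead of a scan of the string
            return True
    return False

print(has_vowel('rhythm'))      # False
print(has_vowel('crwth', 'w'))  # True: any container of characters can be passed
```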


Stefaan Himpe

Feb 16, 2009, 2:44:55 PM

An entirely different approach would be to use a regular expression:

import re

if re.search("[abc]", "nothing expekted"):
    print("a, b or c occurs in the string 'nothing expekted'")

if re.search("[abc]", "something expected"):
    print("a, b or c occurs in the string 'something expected'")
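
For an arbitrary, non-empty string of characters the character class can be built at run time. The sketch below assumes a helper called make_char_matcher, which is not from the post, and uses re.escape so that characters such as ']' or '^' are not read as regex syntax.

```python
import re

def make_char_matcher(chars):
    """Compile a pattern that matches any single character from chars.

    Assumes chars is non-empty; an empty class '[]' is not a valid pattern.
    """
    return re.compile('[' + re.escape(chars) + ']')

vowel_re = make_char_matcher('aeiouAEIOU')
print(bool(vowel_re.search('rhythm')))   # False
print(bool(vowel_re.search('Python')))   # True
```

Compiling once and reusing the pattern object avoids rebuilding the pattern string for every word.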

Best regards,
Stefaan.

Steven D'Aprano

Feb 17, 2009, 7:09:50 AM

Nicolas Dandrimont wrote:

> I would go for something like:
>
> for char in word:
>     if char in 'aeiouAEIOU':
>         char_found = True
>         break
> else:
>     char_found = False
>
> (No, I did not forget to indent the else statement, see
> http://docs.python.org/reference/compound_stmts.html#for)

That might be better written as:

char_found = False
for char in word:
    if char in 'aeiouAEIOU':
        char_found = True
        break

or even:

char_found = False
for char in word:
    if char.lower() in 'aeiou':
        char_found = True
        break

but if word is potentially very large, it's probably better to reverse the
test: rather than compare every char of word to see if it is a vowel, just
search word for each vowel:

char_found = any( vowel in word for vowel in 'aeiouAEIOU' )

This moves the for-loop out of slow Python into fast C and should be much,
much faster for very large input.
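
The two directions of the test can be put side by side as follows. The function names scan_word and scan_vowels are assumptions introduced here; the loop at the end only checks that both give the same answer, it says nothing about speed.

```python
VOWELS = 'aeiouAEIOU'

def scan_word(word):
    # One Python-level iteration per character of word.
    return any(char in VOWELS for char in word)

def scan_vowels(word):
    # At most ten Python-level iterations; each "vowel in word" is a
    # substring search carried out inside the interpreter's C code.
    return any(vowel in word for vowel in VOWELS)

for sample in ('', 'rhythm', 'g' * 1000 + 'U'):
    assert scan_word(sample) == scan_vowels(sample)
```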

--
Steven

Jervis Whitley

Feb 17, 2009, 3:08:04 PM

to pytho...@python.org

>
> This moves the for-loop out of slow Python into fast C and should be much,
> much faster for very large input.
>

_Should_ be faster.

Here is my test on an XP system Python 2.5.4. I had similar results on
python 2.7 trunk.

WORD = 'g' * 100
WORD2 = 'g' * 50 + 'U'

BIGWORD = 'g' * 10000 + 'U'

def any_test(word):
    return any(vowel in word for vowel in 'aeiouAEIOU')

def for_test(word):
    for vowel in 'aeiouAEIOU':
        if vowel in word:
            return True
    else:
        return False

**no vowels**
any: [0.36063678618957751, 0.36116506191682773, 0.36212355395824081]
for: [0.24044885376801672, 0.2417684017413404, 0.24084797257163482]

**vowel 'U' final char**
any: [0.38218764069443112, 0.38431925474244588, 0.38238668882188831]
for: [0.16398578356553717, 0.16433223810347286, 0.16593555537176385]

**BIG word vowel 'U' final char**
any: [8.0007259193539895, 7.9797344140269644, 7.8901742633514012]
for: [7.6664422372764101, 7.6784683633957584, 7.6683055766498001]
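
The timing code itself was not posted. A harness along the following lines would produce lists of three timings like those above; the helper time_it and the repeat and number values are assumptions, so absolute figures will not match the ones quoted.

```python
import timeit

WORD = 'g' * 100
WORD2 = 'g' * 50 + 'U'
BIGWORD = 'g' * 10000 + 'U'

def any_test(word):
    return any(vowel in word for vowel in 'aeiouAEIOU')

def for_test(word):
    for vowel in 'aeiouAEIOU':
        if vowel in word:
            return True
    return False

def time_it(func, word, number=10000, repeat=3):
    """Return `repeat` timings in seconds, each for `number` calls of func(word)."""
    timer = timeit.Timer(lambda: func(word))
    return timer.repeat(repeat=repeat, number=number)

cases = [('no vowels', WORD),
         ("vowel 'U' final char", WORD2),
         ("BIG word vowel 'U' final char", BIGWORD)]

for label, word in cases:
    print(label)
    print('  any:', time_it(any_test, word))
    print('  for:', time_it(for_test, word))
```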

Cheers,

Steven D'Aprano

Feb 18, 2009, 2:55:41 AM

On Wed, 18 Feb 2009 07:08:04 +1100, Jervis Whitley wrote:

>> This moves the for-loop out of slow Python into fast C and should be
>> much, much faster for very large input.
>>
>>
> _Should_ be faster.

Yes, Python's timing results are often unintuitive.

> Here is my test on an XP system Python 2.5.4. I had similar results on
> python 2.7 trunk.

...

> **no vowels**
> any: [0.36063678618957751, 0.36116506191682773, 0.36212355395824081]
> for: [0.24044885376801672, 0.2417684017413404, 0.24084797257163482]

I get similar results.

...

> **BIG word vowel 'U' final char**
> any: [8.0007259193539895, 7.9797344140269644, 7.8901742633514012]
> for: [7.6664422372764101, 7.6784683633957584, 7.6683055766498001]

Well, I did say "for very large input". 10000 chars isn't "very large" --
that's only 9K. Try this instead:

>>> BIGWORD = 'g' * 500000 + 'U' # less than 500K of text
>>>
>>> Timer("for_test(BIGWORD)", setup).repeat(number=1000)
[4.7292280197143555, 4.633030891418457, 4.6327309608459473]
>>> Timer("any_test(BIGWORD)", setup).repeat(number=1000)
[4.7717428207397461, 4.6366970539093018, 4.6367099285125732]

The difference is not significant. What about bigger?

>>> BIGWORD = 'g' * 5000000 + 'U' # less than 5MB
>>>
>>> Timer("for_test(BIGWORD)", setup).repeat(number=100)
[4.8875839710235596, 4.7698030471801758, 4.769787073135376]
>>> Timer("any_test(BIGWORD)", setup).repeat(number=100)
[4.8555209636688232, 4.8139419555664062, 4.7710208892822266]

It seems to me that I was mistaken -- for large enough input, the running
time of each version converges to approximately the same speed.

What happens when you have hundreds of megabytes, I don't know.

--
Steven

Jervis Whitley

Feb 18, 2009, 2:44:02 PM

to Steven D'Aprano, pytho...@python.org

>
> What happens when you have hundreds of megabytes, I don't know.
>
>

I hope I never have to test a word that is hundreds of megabytes long
for a vowel :)

Steve Holden

Feb 18, 2009, 3:05:47 PM

to pytho...@python.org

Jervis Whitley wrote:
>> What happens when you have hundreds of megabytes, I don't know.
>>
>>

> I hope I never have to test a word that is hundreds of megabytes long
> for a vowel :)

I see you don't speak German ;-)
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Peter Otten

Feb 18, 2009, 3:12:45 PM

Steven D'Aprano wrote:

> On Wed, 18 Feb 2009 07:08:04 +1100, Jervis Whitley wrote:
>
>
>>> This moves the for-loop out of slow Python into fast C and should be
>>> much, much faster for very large input.
>>>
>>>
>> _Should_ be faster.
>
> Yes, Python's timing results are often unintuitive.

Indeed.

> It seems to me that I was mistaken -- for large enough input, the running
> time of each version converges to approximately the same speed.

No, you were right. Both any_test() and for_test() use the improvement you
suggested, i.e. they loop over the vowels, not the characters of the word.
Here's the benchmark as it should have been:

$ python -m timeit -s'word = "g"*10000' 'any(v in word for v in "aeiouAEIOU")'
1000 loops, best of 3: 314 usec per loop
$ python -m timeit -s'word = "g"*10000' 'any(c in "aeiouAEIOU" for c in word)'
100 loops, best of 3: 3.48 msec per loop

Of course this shows only the worst-case behaviour. The results will vary
depending on the actual word, e.g. "Ug..." or "g...a".
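
To explore how the position of the vowel changes the picture, the same comparison can be run from a script instead of the command line. This is a sketch: the function names, the three sample words and the number=100 setting are assumptions, and no timings are claimed here.

```python
import timeit

VOWELS = 'aeiouAEIOU'

def scan_vowels(word):
    return any(v in word for v in VOWELS)     # loop over the vowels

def scan_word(word):
    return any(c in VOWELS for c in word)     # loop over the word

cases = {
    'no vowel':       'g' * 10000,
    'vowel at start': 'U' + 'g' * 10000,
    'vowel at end':   'g' * 10000 + 'a',
}

for label, word in cases.items():
    for func in (scan_vowels, scan_word):
        # Take the best of three runs, as "python -m timeit" does.
        best = min(timeit.repeat(lambda: func(word), number=100, repeat=3))
        print('%-15s %-12s %.6f s per 100 calls' % (label, func.__name__, best))
```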

Peter

Peter Otten

Feb 18, 2009, 3:22:45 PM

Steve Holden wrote:

> Jervis Whitley wrote:
>>> What happens when you have hundreds of megabytes, I don't know.
>>>
>>>
>> I hope I never have to test a word that is hundreds of megabytes long
>> for a vowel :)
>
> I see you don't speak German ;-)

I tried to come up with a funny way to point out that you're a fool.

But because I'm German I failed.

Peter


John Machin

Feb 19, 2009, 4:45:03 AM

On Feb 19, 6:47 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Wed, 18 Feb 2009 21:22:45 +0100, Peter Otten <__pete...@web.de>
> declaimed the following in comp.lang.python:

> Yeah... the proper language to bemoan is Welsh <G>
>
> Where "w" is vowel <G>

Better bemoaned is Czech which can go on and on using only L and R as
"sonorant consonants" instead of vowels.

MRAB

Feb 19, 2009, 7:39:42 AM

to pytho...@python.org

Dennis Lee Bieber wrote:
> On Wed, 18 Feb 2009 21:22:45 +0100, Peter Otten <__pet...@web.de>
> declaimed the following in comp.lang.python:
>
> Yeah... the proper language to bemoan is Welsh <G>
>
> Where "w" is vowel <G>

So is "y", but then it is in English too, sometimes.

Martin P. Hellwig

Feb 19, 2009, 7:50:57 AM

Heh, born and raised in Germany, moved to the Netherlands and now live
in the UK, speak a bit of French too. No wonder the only language that
actually makes sense to me is Python.

--
mph

odeits

Feb 20, 2009, 10:14:02 PM

On Feb 15, 11:31 pm, odeits <ode...@gmail.com> wrote:
> On Feb 15, 9:56 pm, Chris Rebert <c...@rebertia.com> wrote:
>
>
>
> > On Sun, Feb 15, 2009 at 9:17 PM, <pyt...@bdurham.com> wrote:
> > > I need to test strings to determine if one of a list of chars is in the
> > > string. A simple example would be to test strings to determine if they have
> > > a vowel (aeiouAEIOU) present.
>
> > > I was hopeful that there was a built-in method that operated similar to
> > > startswith where I could pass a tuple of chars to be tested, but I could not
> > > find such a method.
>
> > > Which of the following techniques is most Pythonic or are there better ways
> > > to perform this type of match?
>
> > > # long and hard coded but short circuits as soon as match found
> > > if 'a' in word or 'e' in word or 'i' in word or 'u' in word or ... :
>
> > > -OR-
>
> > > # flexible, but no short circuit on first match
> > > if [ char for char in word if char in 'aeiouAEIOU' ]:
>
> > Just use the fairly new builtin function any() to make it short-circuit:
>
> > if any(char.lower() in 'aeiou' for char in word):
> >     do_whatever()
>
> > Cheers,
> > Chris
>
> > --
> > Follow the path of the Iguana...
> > http://rebertia.com
>
> If you want to generalize it you should look at sets
> http://docs.python.org/library/sets.html
>
> It seems what you are actually testing for is if the intersection of
> the two sets is not empty where the first set is the characters in
> your word and the second set is the characters in your defined string.

To expand on what I was saying, I thought I should provide a code
snippet:

WORD = 'g' * 100
WORD2 = 'g' * 50 + 'U'

VOWELS = 'aeiouAEIOU'

BIGWORD = 'g' * 10000 + 'U'

def set_test(vowels, word):
    vowels = set(iter(vowels))
    letters = set(iter(word))

    if letters & vowels:
        return True
    else:
        return False

With Python 2.5 I got 1.30 usec/pass against BIGWORD.

Gabriel Genellina

Feb 21, 2009, 3:47:33 AM

to pytho...@python.org

You could make it slightly faster by removing the iter() call:
letters = set(word)
And (if vowels are really constant) you could pre-build the vowels set.
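
Applied to set_test, those two suggestions might look like the sketch below. The constant name VOWEL_SET and the second function set_test_disjoint are assumptions; the isdisjoint() variant is an alternative that was not proposed in the thread, shown because it does not need a set built from the word.

```python
VOWEL_SET = frozenset('aeiouAEIOU')     # pre-built once, as suggested

def set_test(word, vowels=VOWEL_SET):
    # set(word) already iterates over the characters; iter() is not needed.
    return bool(set(word) & vowels)

def set_test_disjoint(word, vowels=VOWEL_SET):
    # isdisjoint() accepts any iterable, so word is passed as it is.
    return not vowels.isdisjoint(word)

BIGWORD = 'g' * 10000 + 'U'
print(set_test(BIGWORD), set_test_disjoint(BIGWORD))   # True True
```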

--
Gabriel Genellina

odeits

Feb 21, 2009, 4:12:32 PM

On Feb 21, 12:47 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:

set(word) = set([word]), meaning a set with one element, the string;
the call to iter makes it a set of the letters making up the word.

rdmu...@bitdance.com

Feb 21, 2009, 5:24:51 PM

to pytho...@python.org

odeits <ode...@gmail.com> wrote:
> On Feb 21, 12:47 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
> > On Sat, 21 Feb 2009 01:14:02 -0200, odeits <ode...@gmail.com> wrote:
> >
> > > On Feb 15, 11:31 pm, odeits <ode...@gmail.com> wrote:
> > >> It seems what you are actually testing for is if the intersection of
> > >> the two sets is not empty where the first set is the characters in
> > >> your word and the second set is the characters in your defined string.
> >
> > > To expand on what I was saying I thought i should provide a code
> > > snippet:
> >
> > > WORD = 'g' * 100
> > > WORD2 = 'g' * 50 + 'U'
> > > VOWELS = 'aeiouAEIOU'
> > > BIGWORD = 'g' * 10000 + 'U'
> >
> > > def set_test(vowels, word):
> > >     vowels = set( iter(vowels))
> > >     letters = set( iter(word) )
> > >
> > >     if letters & vowels:
> > >         return True
> > >     else:
> > >         return False
> >
> > > with python 2.5 I got 1.30 usec/pass against the BIGWORD
> >
> > You could make it slightly faster by removing the iter() call:
> > letters = set(word)
> > And (if vowels are really constant) you could pre-build the vowels set.
>
> set(word) = set{[word]} meaning a set with one element, the string
> the call to iter makes it set of the letters making up the word.

Did you try it?

Python 2.6.1 (r261:67515, Jan 7 2009, 17:09:13)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> set('abcd')
set(['a', 'c', 'b', 'd'])
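
The same check can be written as a few assertions. This is a small sketch in Python 3 syntax, where set literals are written with braces; the variable name word is arbitrary.

```python
word = 'abcd'

# A string is iterable, so set() collects its characters with or without iter().
assert set(word) == set(iter(word)) == {'a', 'b', 'c', 'd'}

# A one-element set holding the whole string requires wrapping it in a list.
assert set([word]) == {'abcd'}

print(sorted(set(word)))   # ['a', 'b', 'c', 'd']
```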

--RDM

odeits

Feb 21, 2009, 7:53:25 PM

You are in fact correct. Thank you for pointing that out.