Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

nwaits

Oct 17, 2012, 10:31:42 AM
I'm very impressed with Python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (UTF-8), over the vowels?
Thank you.

Dave Angel

Oct 17, 2012, 11:00:11 AM
to nwaits, pytho...@python.org
On 10/17/2012 10:31 AM, nwaits wrote:
> I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
> Thank you.

If you can construct a list of "illegal" characters, then you can simply
check each character of the word against the list; if none of the
characters is in the list, the word is a winner.

If that's not fast enough, you can build a translation table from the
list of illegal characters, and use translate on each word. Then it
becomes a question of checking if the translated word is all zeroes.
More setup time, but much faster looping for each word.
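
A minimal sketch of both ideas, assuming Python 3 str words. The names
ACUTE_VOWELS, is_clean and is_clean_translate are hypothetical, and the
character list is only an illustration. Note that in Python 3 a
translation table built with str.maketrans can delete the illegal
characters outright, so this variant compares the translated word with
the original instead of checking for zeroes:

```python
import unicodedata

# Illustrative list of "illegal" characters: vowels with an acute accent.
# Normalizing to NFC guarantees one code point per accented vowel.
ACUTE_VOWELS = unicodedata.normalize("NFC", "áéíóúýÁÉÍÓÚÝ")
ILLEGAL = frozenset(ACUTE_VOWELS)

def is_clean(word):
    """Return True if no character of word is in the illegal set."""
    word = unicodedata.normalize("NFC", word)
    return not any(ch in ILLEGAL for ch in word)

# Translation table that deletes every illegal character (built once).
DELETE_ILLEGAL = str.maketrans("", "", ACUTE_VOWELS)

def is_clean_translate(word):
    """Return True if deleting illegal characters leaves word unchanged."""
    word = unicodedata.normalize("NFC", word)
    return word.translate(DELETE_ILLEGAL) == word

words = ["elephant", "éléphant", "café", "cafe", "über"]
clean = [w for w in words if is_clean(w)]
# clean keeps "elephant", "cafe" and "über"
```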

--

DaveA

wxjm...@gmail.com

Oct 17, 2012, 11:32:52 AM
to nwaits, pytho...@python.org, d...@davea.name
Lazy way.
Py3.2

>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
>>>

Should be ok for the CombiningDiacriticalMarks unicode range
(common diacritics)
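
One caveat with the length comparison: NFKD also expands compatibility
characters such as the "fi" ligature (U+FB01), so a word can change
length without carrying any diacritic. A sketch of an alternative that
decomposes with NFD and looks for specific combining marks, which is
closer to the original question about acute and grave accents. The
function names are hypothetical, not from this thread:

```python
import unicodedata

COMBINING_GRAVE = "\u0300"
COMBINING_ACUTE = "\u0301"

def has_combining_marks(word):
    """True if the NFD form contains any character with a nonzero
    canonical combining class (covers the common diacritics)."""
    decomposed = unicodedata.normalize("NFD", word)
    return any(unicodedata.combining(ch) for ch in decomposed)

def has_marks(word, marks=(COMBINING_ACUTE, COMBINING_GRAVE)):
    """True if the NFD form contains one of the given combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return any(ch in marks for ch in decomposed)

def words_without_marks(words, marks=(COMBINING_ACUTE, COMBINING_GRAVE)):
    """Keep only the words that carry none of the given marks."""
    return [w for w in words if not has_marks(w, marks)]
```

With the default marks, a word with only a diaeresis (for example
"über") is kept, while "éléphant" is filtered out.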

jmf

Ian Kelly

Oct 17, 2012, 1:07:11 PM
to Python
On Wed, Oct 17, 2012 at 9:32 AM, <wxjm...@gmail.com> wrote:
>>>> import unicodedata
>>>> def HasDiacritics(w):
> ...     w_decomposed = unicodedata.normalize('NFKD', w)
> ...     return 'no' if len(w) == len(w_decomposed) else 'yes'
> ...
>>>> HasDiacritics('éléphant')
> 'yes'
>>>> HasDiacritics('elephant')
> 'no'
>>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
> 'yes'
>>>> HasDiacritics('U')
> 'no'

Is there something wrong with True and False that you had to replace
them with strings?

"return len(w) != len(w_decomposed)" is all you need.

David Robinow

Oct 17, 2012, 1:16:43 PM
to Python
On Wed, Oct 17, 2012 at 1:07 PM, Ian Kelly <ian.g...@gmail.com> wrote:
> "return len(w) != len(w_decomposed)" is all you need.
Thanks for helping, but I already knew that.

wxjm...@gmail.com

Oct 17, 2012, 2:17:16 PM
to Python
Not at all, I knew this. In this I decided to program like
this.

Do you get it? Yes/No or True/False

jmf

Chris Angelico

Oct 17, 2012, 2:22:29 PM
to pytho...@python.org
On Thu, Oct 18, 2012 at 5:17 AM, <wxjm...@gmail.com> wrote:
> Not at all, I knew this. In this I decided to program like
> this.
>
> Do you get it? Yes/No or True/False

Yes, but why? When you're returning a boolean concept, why not return a
boolean value? You aren't even using a pair of values where one compares
as true and the other as false (for instance, you could write the
function so that it returns just the diacritic-containing characters,
meaning it'll return "" if there aren't any). To what benefit?
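
A rough sketch of that alternative, keeping the NFKD length test from
the function under discussion. The name diacritic_chars is hypothetical:

```python
import unicodedata

def diacritic_chars(w):
    """Return the characters of w whose NFKD form is longer than one
    code point, joined into a string ('' when there are none)."""
    return ''.join(
        ch for ch in w
        if len(unicodedata.normalize('NFKD', ch)) != 1
    )
```

An empty result is falsy and a non-empty one is truthy, so the return
value works directly in an if statement.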

Puzzled.

ChrisA

Ian Kelly

Oct 17, 2012, 2:27:12 PM
to Python
On Wed, Oct 17, 2012 at 12:17 PM, <wxjm...@gmail.com> wrote:
> Not at all, I knew this. In this I decided to program like
> this.
>
> Do you get it? Yes/No or True/False

It's just bad style, because both 'yes' and 'no' evaluate true.

if HasDiacritics('éléphant'):
    print('Correct!')

if HasDiacritics('elephant'):
    print('Error!')

Prints:

Correct!
Error!

You could replace the test with "if HasDiacritics('elephant') ==
'yes':", but why force the caller to write that out when the former
test is more natural and less prone to error (e.g. typoing 'yes')?
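
For comparison, a sketch of the boolean-returning version with the same
two checks. The snake_case name has_diacritics and the results list are
hypothetical, added only for illustration:

```python
import unicodedata

def has_diacritics(w):
    """True if NFKD decomposition changes the length of w."""
    w_decomposed = unicodedata.normalize('NFKD', w)
    return len(w) != len(w_decomposed)

results = []
if has_diacritics('éléphant'):
    results.append('Correct!')
if has_diacritics('elephant'):
    results.append('Error!')   # not reached: the function returns False

print(results)
```

Here only the first branch runs, because the return value is a real
bool rather than a non-empty string.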

wxjm...@gmail.com

Oct 17, 2012, 2:33:30 PM
to Python
I *know* all this. In my previous message, the goal was to emphasize the
use of *unicodedata.normalize()*.

jmf

Steven D'Aprano

Oct 17, 2012, 7:18:03 PM
David, Ian was directly responding to wxjm...@gmail.com, whose
suggestion included an entirely unnecessary conversion from a bool flag
to the strings 'yes' and 'no'. That can be seen in the part of Ian's post
that you deleted.

Regardless of whether *you personally* already knew that jmf's function
was unidiomatic and a poor design, you weren't directly the target of the
comment. I'm glad you already knew what Ian said, but you're not the only
person reading this thread.



--
Steven