I'm very impressed with Python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (UTF-8), over the vowels?
Thank you.
> I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
> Thank you.
If you can construct a list of "illegal" characters, then you can simply
check each character of the word against the list, and if none of the
characters is in the list, the word is a winner.
If that's not fast enough, you can build a translation table from the
list of illegal characters, and use translate on each word. Then it
becomes a question of checking if the translated word is all zeroes.
More setup time, but much faster looping for each word.
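Both ideas might look roughly like this in Python 3. This is only a sketch under assumptions: the ILLEGAL set and the names is_clean and is_clean_fast are made up for illustration, and rather than checking for "all zeroes" the second version deletes the illegal characters with str.translate and checks whether the word changed.

```python
# Hypothetical set of "illegal" characters: vowels with acute or grave
# accents, in precomposed form only (escapes used to avoid encoding issues).
ILLEGAL = set(
    "\u00e1\u00e0\u00e9\u00e8\u00ed\u00ec\u00f3\u00f2\u00fa\u00f9"  # lower case
    "\u00c1\u00c0\u00c9\u00c8\u00cd\u00cc\u00d3\u00d2\u00da\u00d9"  # upper case
)

def is_clean(word):
    """Check each character of the word against the illegal set."""
    return not any(ch in ILLEGAL for ch in word)

# One-time setup: a table that deletes every illegal character.
DELETE_ILLEGAL = str.maketrans("", "", "".join(ILLEGAL))

def is_clean_fast(word):
    """A word is clean if deleting illegal characters leaves it unchanged."""
    return word.translate(DELETE_ILLEGAL) == word

words = ["elephant", "\u00e9l\u00e9phant", "voil\u00e0", "plain"]
print([w for w in words if is_clean_fast(w)])
```

Words that store accents as separate combining characters would slip past both checks unless they are normalized first.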
> > I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
> > Thank you.
> if you can construct a list of "illegal" characters, then you can simply
> check each character of the word against the list, and if it succeeds
> for all of the characters, it's a winner.
> If that's not fast enough, you can build a translation table from the
> list of illegal characters, and use translate on each word. Then it
> becomes a question of checking if the translated word is all zeroes.
> More setup time, but much faster looping for each word.
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
This should be OK for the Combining Diacritical Marks Unicode block
(U+0300 to U+036F), which covers the common diacritics.
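The definition of HasDiacritics is not shown here. A guess at something that reproduces the session above, assuming it compares the length of the word with the length of its NFD-decomposed form (the w_decomposed name comes from a later reply in this thread, the rest is an assumption):

```python
import unicodedata

def HasDiacritics(w):
    # Assumed approach: NFD splits a precomposed character into a base
    # letter plus combining marks, so the decomposed string gets longer.
    w_decomposed = unicodedata.normalize("NFD", w)
    return "yes" if len(w) != len(w_decomposed) else "no"

print(HasDiacritics("\u00e9l\u00e9phant"))
print(HasDiacritics("elephant"))
```

Note that a word which is already stored in decomposed form would not change length, so this check would answer 'no' for it.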
On Thu, Oct 18, 2012 at 5:17 AM, <wxjmfa...@gmail.com> wrote:
> Not at all, I knew this. In this I decided to program like
> this.
> Do you get it? Yes/No or True/False
Yes, but why? When you're returning a boolean concept, why not return a
boolean value? You don't even use a pair of values where one compares
as true and the other compares as false (for instance, you could write
the function so that it returns just the diacritic-containing
characters, meaning it'll return "" if there aren't any). To what
benefit?
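For illustration, the "return just the diacritic-containing characters" variant might be sketched like this. The name diacritic_chars and the use of unicodedata.combining are assumptions for the example, not anything posted in the thread:

```python
import unicodedata

def diacritic_chars(word):
    """Return the characters of word that carry a combining mark, or ""."""
    found = []
    # Compose first so each accented letter is one character where possible.
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        # combining() is non-zero for combining marks such as U+0301.
        if any(unicodedata.combining(c) for c in decomposed):
            found.append(ch)
    return "".join(found)

# The empty string is false, so the result works directly in a test.
if diacritic_chars("\u00e9l\u00e9phant"):
    print("has diacritics")
if not diacritic_chars("elephant"):
    print("plain")
```

The caller gets a value that behaves like a boolean in an if statement and also says which characters were found.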
On Wed, Oct 17, 2012 at 12:17 PM, <wxjmfa...@gmail.com> wrote:
> Not at all, I knew this. In this I decided to program like
> this.
> Do you get it? Yes/No or True/False
It's just bad style, because both 'yes' and 'no' evaluate true.
    if HasDiacritics('éléphant'):
        print('Correct!')
    if HasDiacritics('elephant'):
        print('Error!')
Prints:
Correct!
Error!
You could replace the test with "if HasDiacritics('elephant') ==
'yes':", but why force the caller to write that out when the former
test is more natural and less prone to error (e.g. typoing 'yes')?
On Wed, 17 Oct 2012 13:16:43 -0400, David Robinow wrote:
> On Wed, Oct 17, 2012 at 1:07 PM, Ian Kelly <ian.g.ke...@gmail.com>
> wrote:
>> "return len(w) != len(w_decomposed)" is all you need.
> Thanks for helping, but I already knew that.
David, Ian was directly responding to wxjmfa...@gmail.com, whose suggestion included an entirely unnecessary conversion from a bool flag to the strings 'yes' and 'no'. That can be seen in the part of Ian's post that you deleted.
Regardless of whether *you personally* already knew that jmf's function was unidiomatic and a poor design, you weren't directly the target of the comment. I'm glad you already knew what Ian said, but you're not the only person reading this thread.