
How to check...


Lad

Feb 11, 2006, 7:14:54 AM
Hello,
How can I check that a string does NOT contain NON English characters?
Thanks
L.

augustu...@gmail.com

Feb 11, 2006, 7:48:33 AM
Hello,

try using regular expressions. I'm afraid I don't have any
documentation at hand, but I think that gives you a starting point for a
web search.

Greetings

augustu...@gmail.com

Feb 11, 2006, 7:53:14 AM
Additional info: You will find documentation in the Python help utility by
typing the module name 're' or 'sre'.
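
A minimal sketch of the regular-expression route, assuming that "English" simply means 7-bit ASCII and that the code runs on Python 3. The names _NON_ASCII and contains_only_ascii are invented for this example and do not come from the thread.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r'[^\x00-\x7f]')

def contains_only_ascii(s):
    """Return True if no character in s falls outside 7-bit ASCII."""
    return _NON_ASCII.search(s) is None

print(contains_only_ascii("plain text"))
print(contains_only_ascii("ångström"))
```

If a narrower definition is wanted (letters and digits only, say), the character class can be tightened accordingly.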

Daniel Marcel Eichler

Feb 11, 2006, 11:19:53 AM
to pytho...@python.org
Lad wrote:

>How can I check that a string does NOT contain NON English characters?

try:
    foobar.encode('ascii')
except UnicodeError:  # foobar contains non-ASCII characters
    bla

Or use string.ascii_letters and extend it as needed.
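
The encode idea can be wrapped in a small function. This is only a sketch, assuming Python 3, where encoding a str that contains non-ASCII characters raises UnicodeEncodeError; the name is_ascii_encodable is made up for illustration. On Python 2, a byte string with non-ASCII bytes would raise UnicodeDecodeError instead, so catching the common base class UnicodeError covers both cases.

```python
def is_ascii_encodable(s):
    """Return True if the text s can be encoded as ASCII."""
    try:
        s.encode('ascii')
    except UnicodeError:
        # Covers UnicodeEncodeError (and UnicodeDecodeError on Python 2).
        return False
    return True

print(is_ascii_encodable("How to check..."))
print(is_ascii_encodable("encyclopædia"))
```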


Regards,

Daniel

John Zenger

Feb 11, 2006, 11:37:13 AM
This should be just a matter of determining how your string is encoded
(ASCII, UTF, Unicode, etc.) and checking the ord of each character to
see if it is in the contiguous range of English characters for that
encoding. For example, if you know that the string is ASCII or UTF-8,
you could check ord for each character and confirm it is less than 128.
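
A minimal sketch of the ord check, assuming Python 3, where text is a str and encoded data is a bytes object. Both function names are invented for this example.

```python
def text_is_ascii(s):
    """Return True if every character of the str s has a code point below 128."""
    return all(ord(ch) < 128 for ch in s)

def utf8_bytes_are_ascii(data):
    """Return True if every byte of the UTF-8 encoded data is below 128.

    Iterating over a bytes object in Python 3 yields integers, so no ord()
    call is needed. In UTF-8, every non-ASCII character is encoded using
    bytes with the high bit set, so this check is equivalent.
    """
    return all(b < 128 for b in data)

print(text_is_ascii("English"))
print(utf8_bytes_are_ascii("coördinates".encode("utf-8")))
```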

Neil Hodgson

Feb 11, 2006, 7:38:22 PM
Lad:

> How can I check that a string does NOT contain NON English characters?

It depends on how you define the set of English characters, which is
as much a matter of opinion or authority as of fact. The following may be
regarded as English despite containing 8 (7 unique) non-ASCII characters:

The €200 encyclopædia defines the “coördinates” in ¼ ångströms.
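
To see which characters in that sentence a plain ASCII test would reject, something like the following could be used. It is a sketch for Python 3; the helper name non_ascii_chars is hypothetical, and unicodedata.name is only used to print a readable label for each character.

```python
import unicodedata

sentence = "The €200 encyclopædia defines the “coördinates” in ¼ ångströms."

def non_ascii_chars(text):
    """Return the non-ASCII characters of text, in order of appearance."""
    return [ch for ch in text if ord(ch) > 127]

# Print each distinct offending character with its Unicode name.
for ch in sorted(set(non_ascii_chars(sentence))):
    print(repr(ch), unicodedata.name(ch, "UNKNOWN"))
```

Whether any of these should count as "English" is exactly the matter of definition raised here; an allowed set can always be widened to include them.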

Neil

Steven D'Aprano

Feb 11, 2006, 8:24:06 PM
On Sat, 11 Feb 2006 04:48:33 -0800, augustus.kling wrote:

> Hello,
>
> try using regular expressions.

"Some people, when confronted with a problem, think 'I know, I'll use
regular expressions'. Now they have two problems." -- Jamie Zawinski

The original poster asked:

"How can I check that a string does NOT contain NON English characters?"

REs are rather overkill for something so simple, don't you think?

import string
english = string.printable  # is this what you want?
english = string.ascii_letters + string.digits  # or maybe this?
english = "abc..."  # or just manually set the characters yourself

for c in some_string:
    if c not in english:
        print("Not English!!!")
        break
else:
    # the else branch of a for loop runs only if no break occurred
    print("English!")

If you want it as a function, it is even more flexible:

def all_good(s, goodchars=None):
    if goodchars is None:
        goodchars = string.printable
    for c in s:
        if c not in goodchars:
            return False
    return True

--
Steven.

Scott David Daniels

Feb 12, 2006, 1:06:43 PM
If all you care about is ASCII vs. non-ASCII, you could use:
ord(max(string)) < 128
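
Two caveats on that one-liner: max() raises ValueError on an empty string, and the variable name string shadows the standard module of the same name. A hedged sketch that guards the empty case, assuming Python 3 (the function name is_ascii is invented here):

```python
def is_ascii(s):
    """Return True if every character of s is below code point 128."""
    # max() raises ValueError on an empty sequence, so handle "" first.
    return not s or ord(max(s)) < 128

# On Python 3.7 and later, the built-in str.isascii() gives the same answer.
print(is_ascii("plain text"), "plain text".isascii())
print(is_ascii("ångström"), "ångström".isascii())
```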

--
-Scott David Daniels
scott....@acm.org
