
problem matching accented chars on OS X


Alex Fenton

Jun 11, 2005, 7:08:30 AM
Hi

I'm finding words within strings in Western European languages, so I need to account for accented characters such as ê (e circumflex) and à (a grave). On Ruby 1.8.2 under Windows (MSW), the following works for me (simplified):

WORD_PATTERN = /^[\w\xC0-\xD6\xD8-\xF6\xF8-\xFF]+$/s

\w gets me a-z and A-Z (plus digits and underscore); the hex ranges are the positions of the accented letters in the ISO-8859-1 encoding. This seems to work, but when I run the same code on OS X, I get:

../lib/weft/backend/sqlite.rb:533: mismatch multibyte code length in
char-class range: /^[\w\xC0-\xD6\xD8-\xF6\xF8-\xFF]+$/ (SyntaxError)

Any pointers? I'm not sure what is going wrong.
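One plausible reading of the error, assumed here and not confirmed in the thread, is that the OS X Ruby compiled the pattern in a multibyte mode such as UTF-8, where bytes \xC0 to \xFF are lead bytes of multibyte sequences rather than whole characters. The sketch below uses a modern Ruby (2.4 or later, for `match?`), not the 1.8.2 discussed here; the Unicode-escape version of WORD_PATTERN is an illustration, not the poster's code.

```ruby
# Sketch (Ruby 2.4+): why single-byte ranges stop lining up with letters
# once text is UTF-8. Literals use \u escapes so they are UTF-8 regardless
# of the source file's encoding.

word   = "cr\u00EApe"                 # "crêpe"
latin1 = word.encode("ISO-8859-1")

# ê (U+00EA) is one byte in ISO-8859-1 but two bytes in UTF-8.
p latin1.bytes   # => [99, 114, 234, 112, 101]
p word.bytes     # => [99, 114, 195, 170, 112, 101]

# ISO-8859-1 bytes 0xC0-0xFF map one-to-one onto code points U+00C0-U+00FF,
# so the same ranges can be written as Unicode escapes and matched per
# character. The gaps skip U+00D7 (×) and U+00F7 (÷), as in the original.
# \A and \z anchor the whole string; in Ruby, ^ and $ match at line breaks.
WORD_PATTERN = /\A[\w\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+\z/

p WORD_PATTERN.match?(word)            # => true
p WORD_PATTERN.match?("na\u00EFve")    # "naïve" => true
p WORD_PATTERN.match?("two words")     # => false (the space is not matched)
```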

Is there a library that can help me match letter characters (ideally in a variety of code sets)? The [:alpha:] regex class seemed to be synonymous with \w, which doesn't match enough.

cheers
alex

Nura...@aol.com

Jun 11, 2005, 8:20:09 AM
Dear Alex,

You can test regexps in different encodings for equality in Ruby:

Equality: two regexps are equal if their patterns are identical, they have the same character set code, and their casefold? values are the same.

/abc/ == /abc/x   » true
/abc/ == /abc/i   » false
/abc/u == /abc/n  » false
The lang parameter enables multibyte support for the regexp: `n', `N' =
none, `e', `E' = EUC, `s', `S' = SJIS, `u', `U' = UTF-8.

That information is from "Programming Ruby: The Pragmatic Programmer's Guide".
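The equality rule and flags quoted above can be poked at in a modern Ruby (1.9 or later), where $KCODE is gone and each regexp carries an Encoding object instead. A small sketch; it deliberately sticks to cases whose results don't depend on the 1.8-era kcode behaviour.

```ruby
# Sketch (Ruby 1.9+): regexp equality, casefold?, and encoding flags.
plain  = /abc/
folded = /abc/i
utf8   = /abc/u   # `u' fixes the regexp's encoding to UTF-8

p plain == /abc/        # => true  (same pattern, same options)
p plain == folded       # => false (casefold? differs)
p folded.casefold?      # => true
p plain.encoding        # => #<Encoding:US-ASCII>
p utf8.encoding         # => #<Encoding:UTF-8>
p plain.fixed_encoding? # => false (adapts to the string it is matched against)
p utf8.fixed_encoding?  # => true
```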
Unfortunately, I couldn't test it on a Mac. There seems to be a special encoding for the Mac, but maybe you can force Ruby to use UTF-8 or something similar.
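In Ruby 1.8, one way to make regexps treat text as UTF-8 was to set $KCODE = 'u' (or run ruby -Ku). In Ruby 1.9 and later, encodings belong to strings, so one option is to transcode input to UTF-8 before matching. A sketch, assuming the raw bytes are known to be ISO-8859-1; the literal below is hypothetical input standing in for data read from a file or socket.

```ruby
# Hypothetical input: "crêpe à la carte" as raw ISO-8859-1 bytes
# (ê = 0xEA, à = 0xE0), tagged as binary like a binary-mode read would be.
raw = "cr\xEApe \xE0 la carte".b

# Label the bytes with their real encoding, then transcode to UTF-8.
text = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

p text.encoding         # => #<Encoding:UTF-8>
p text.valid_encoding?  # => true

# With UTF-8 text, a Unicode-aware class finds the accented words.
p text.scan(/\p{L}+/)   # => ["crêpe", "à", "la", "carte"]
```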

Hope that helps.

Best regards,

Axel

Nura...@aol.com

Jun 11, 2005, 8:33:34 AM
Dear Alex,

I just saw something that may help: there is a new regexp library for Ruby, Oniguruma, which supports regexps in even more encodings.
http://raa.ruby-lang.org/project/oniguruma/
There is a special version for Mac OS X 10.3, and I read that you have to download the correct one.
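Oniguruma later became Ruby's built-in regexp engine in Ruby 1.9 (Ruby 2.0 and later use its fork, Onigmo), so on current Ruby no separate library is needed. As a small sketch of what that engine offers beyond \w, Unicode properties can select letters by script and handle decomposed accents. This uses the built-in Regexp class, not the API of the RAA library linked above.

```ruby
# Sketch (Ruby 2.2+ built-in engine): letters by Unicode script.
text = "caf\u00E9 \u03B1\u03B2\u03B3 na\u00EFve"   # "café αβγ naïve"

p text.scan(/\p{Latin}+/)  # => ["café", "naïve"]
p text.scan(/\p{Greek}+/)  # => ["αβγ"]
p text.scan(/\p{L}+/)      # => ["café", "αβγ", "naïve"]

# A decomposed "café": plain e followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = "cafe\u0301"
p decomposed.scan(/\p{L}+/)         # U+0301 is a mark, not a letter: stops at "cafe"
p decomposed.scan(/[\p{L}\p{M}]+/)  # letters plus combining marks: whole word
p decomposed.unicode_normalize(:nfc) == "caf\u00E9"  # => true (composed form)
```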

Best regards,

Axel