Regular expressions: problems with Swedish characters

Toffe

Mar 31, 2005, 12:51:02 PM

Hi,

I've got a problem with regular expressions and strings containing
Swedish characters (åäö).

I basically have a PHP script that highlights certain words in a text. I
found the code below in the user-contributed notes of the manual at
php.net. It works great for all words that do not contain Swedish
characters, but words that contain åäö are not highlighted.

Can anyone suggest how I should change my regexp to fix this?

Thanks,
toffe

Code:
=============
function highlightErrors($text, $errors) {

    foreach($errors as $e) {
        $text = highlight_word($text,$e);
    }

    return $text;
}

function highlight_word($buff,$query) {

    $buff = preg_replace("/(^|[^A-ZåäöÅÄÖ]){1}(".preg_quote($query,"/").
        ")($|[^A-ZåäöÅÄÖ]){1}/i",
        "\\1<span class='highlight'>\\2</span>\\3", $buff);
    return $buff;
}

=========

R. Rajesh Jeba Anbiah

Apr 2, 2005, 2:03:40 PM

Q: How could I match the foreign characters like åäö in regular
expressions?
A: Use hexadecimal representation of those characters, like \xe1

Refer:
http://www.php.net/preg_match#42167
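A minimal sketch of the hex-escape advice applied to the highlight_word() function from the first post, assuming both the text and the query are ISO-8859-1 (Latin-1) byte strings, where å, ä, ö, Å, Ä, Ö are the single bytes 0xE5, 0xE4, 0xF6, 0xC5, 0xC4, 0xD6. Because "\xe5" in a double-quoted PHP string is that byte itself, the pattern no longer depends on the encoding the script file happens to be saved in. The name highlight_word_latin1() is hypothetical.

```php
<?php
// Hypothetical variant of highlight_word() from the first post.
// Assumes $buff and $query are ISO-8859-1 (Latin-1) byte strings.
function highlight_word_latin1($buff, $query) {
    // Latin-1 bytes for å ä ö Å Ä Ö, written as hex escapes so the pattern
    // does not depend on the encoding the script file was saved in.
    $letters = "A-Z\xe5\xe4\xf6\xc5\xc4\xd6";

    // Group 1: start of string or a non-letter; group 2: the word;
    // group 3: end of string or a non-letter (a hand-made word boundary).
    $pattern = "/(^|[^{$letters}])(" . preg_quote($query, "/") . ")($|[^{$letters}])/i";

    return preg_replace($pattern, "\\1<span class='highlight'>\\2</span>\\3", $buff);
}

// "Gr\xe4s och tr\xe4d" is "Gräs och träd" in Latin-1.
$out = highlight_word_latin1("Gr\xe4s och tr\xe4d", "tr\xe4d");
// $out === "Gr\xe4s och <span class='highlight'>tr\xe4d</span>"
```

Like the original, groups 1 and 3 consume the neighbouring characters, so two occurrences separated by a single character cannot both be highlighted in one pass.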

Toffe

Apr 3, 2005, 5:06:22 AM

R. Rajesh Jeba Anbiah wrote:
> Q: How could I match the foreign characters like åäö in regular
> expressions?
> A: Use hexadecimal representation of those characters, like \xe1
>
> Refer:
> http://www.php.net/preg_match#42167
>

Sorry for being ignorant and not reading the FAQ before posting; it won't
happen again...

Thanks a lot for the information!

-toffe

Toffe

Apr 3, 2005, 6:00:52 AM

R. Rajesh Jeba Anbiah wrote:

> Q: How could I match the foreign characters like åäö in regular
> expressions?
> A: Use hexadecimal representation of those characters, like \xe1
>
> Refer:
> http://www.php.net/preg_match#42167
>

Hi, thanks for the pointer.

It now works almost the way I want it to.
My script should highlight certain words in the text, but the text can
be a mix of upper- and lower-case letters. If $query below is hxllo
and $buff is HXLLO, where x and X are the lower- and upper-case forms of
some Swedish character, I still don't get a match.

Any suggestions for how I can fix this?

Thanks,
toffe

Code:
====

$buff = preg_replace("/(^|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}(".preg_quote($query,"/").
    ")($|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}/i",
    "\\1<SURROUNDING>\\2<TAG>\\3", $buff);

return $buff;
=========
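A different route, not one suggested in this thread: if the text and the query can be stored as UTF-8, PCRE's u modifier switches on UTF-8 mode, in which the i modifier also folds case for non-ASCII letters such as å/Å, ä/Ä and ö/Ö. The sketch below assumes UTF-8 input and a PCRE build with Unicode support (the usual case for PHP); the name highlight_word_utf8() is hypothetical.

```php
<?php
// Alternative (not from the thread): keep text and query in UTF-8 and let
// PCRE's /u mode do Unicode case folding, so /i also covers å/Å, ä/Ä, ö/Ö.
// Assumes valid UTF-8 input; preg_replace() returns null on invalid UTF-8.
function highlight_word_utf8($buff, $query) {
    // (?<!\p{L}) and (?!\p{L}) reject matches touching any Unicode letter,
    // replacing the hand-written [^A-Zåäö...] boundary classes.
    $pattern = '/(?<!\p{L})' . preg_quote($query, '/') . '(?!\p{L})/iu';
    return preg_replace($pattern, "<span class='highlight'>\$0</span>", $buff);
}

echo highlight_word_utf8("Säg HÄLLO till alla", "hällo"), "\n";
// Säg <span class='highlight'>HÄLLO</span> till alla
```

Because the lookarounds only inspect the neighbouring characters instead of consuming them, adjacent occurrences separated by a single space are all highlighted.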

R. Rajesh Jeba Anbiah

Apr 3, 2005, 8:29:05 AM

Toffe wrote:
> R. Rajesh Jeba Anbiah wrote:

> > Q: How could I match the foreign characters like åäö in regular
> > expressions?
> > A: Use hexadecimal representation of those characters, like \xe1
> >
> > Refer:
> > http://www.php.net/preg_match#42167
> >

> It works almost like I want it to now.
> My script should highlight certain words in the text, but the text could
> be a mix of upper and lower case letters, and if $query below is hxllo
> and $buff is HXLLO, where x and X is some Swedish character in its lower
> and upper cases, I still don't get a match.

<snip>

> Code:
> ====
>
> $buff =
> preg_replace("/(^|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}(".preg_quote($query,"/").
> ")($|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}/i",
> "\\1<SURROUNDING>\\2<TAG>\\3", $buff);
>
> return $buff;
> =========

IIRC, the /i modifier does not treat these foreign characters as
upper/lower-case pairs, so you may have to list both the upper- and
lower-case forms of those characters in the pattern yourself. You may
also want to look at
<http://in.php.net/ucwords#51137>
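One way to read "add both cases yourself", sketched under the thread's Latin-1 assumption: after preg_quote(), replace each Swedish letter in the query with a two-byte class such as [åÅ], so the match no longer relies on /i knowing their case. The helper names swedish_caseless() and highlight_word_caseless() are hypothetical.

```php
<?php
// Hypothetical helper: make å/ä/ö in an already preg_quote()d string match
// either case by replacing each with a two-character class.
// Assumes ISO-8859-1 (Latin-1) byte strings, as in the thread.
function swedish_caseless($quoted) {
    $map = array(
        "\xe5" => "[\xe5\xc5]", "\xc5" => "[\xe5\xc5]", // å Å
        "\xe4" => "[\xe4\xc4]", "\xc4" => "[\xe4\xc4]", // ä Ä
        "\xf6" => "[\xf6\xd6]", "\xd6" => "[\xf6\xd6]", // ö Ö
    );
    // strtr() with an array never rescans text it has already inserted,
    // so the new classes are not themselves rewritten.
    return strtr($quoted, $map);
}

function highlight_word_caseless($buff, $query) {
    $letters = "A-Z\xe5\xe4\xf6\xc5\xc4\xd6";
    // preg_quote() leaves bytes above 0x7F alone, so the expansion is safe
    // to apply after quoting; /i still handles the ASCII letters.
    $word = swedish_caseless(preg_quote($query, "/"));
    $pattern = "/(^|[^{$letters}])({$word})($|[^{$letters}])/i";
    return preg_replace($pattern, "\\1<span class='highlight'>\\2</span>\\3", $buff);
}

// Query "hällo" in lower case, text "HÄLLO" in upper case (Latin-1 bytes).
$out = highlight_word_caseless("H\xc4LLO", "h\xe4llo");
// $out === "<span class='highlight'>H\xc4LLO</span>"
```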

BTW, we don't have an FAQ yet; we're still compiling it, and this
question had been asked before.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/