So far I found no way to achieve that. I tried converting both strings
to the same case and using ~* , but neither worked.
Does anybody no a way to do this?
Thanks for the help.
Balázs
Oh, tricky. Firstly, case-insensetive means different things to
different locales. For example, in Turkish 'i' is not the lowecase
version of 'I'. Maybe you've chosen a locale that doesn't do UTF-8? You
don't specify a platform either. Locale support varies wildly by
platform.
What you probably want it some kind of accent-insensetive match that
mean that é, è, ë, e, É, È, E and Ë are all considered to match
eachother. The way you do that is by converting unicode to a particular
normal form and then comparing. Unfortunatly, I don't think PostgreSQL
supplies such a function right now.
However, some server-side procedural languages can do this. If you can
find one (possibly Perl) that can do the conversion, you can create a
function to do the mapping.
Have a nice day,
--
Martijn van Oosterhout <kle...@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
The way I imagine it now is a special case of the ~ function.
I create matchgroups in a table and check each character if it is in the
group. If it is I will replace the character with the group in [éÉE],
[oóOÓ??] and do a regexp with that.
What do you think?
B.
No need to reinvent the wheel. ICU provides a range of services to deal
with this. For example the following filter in ICU:
NFD; [:Nonspacing Mark:] Remove; NFC.
Will remove all accents from characters. And it works for all Unicode
characters. With a bit more thinking you can work with case variations
also.
There is also a locale-independant case-mapping module there plus
various locale specific ones also.
http://icu.sourceforge.net/userguide/Transform.html
http://icu.sourceforge.net/userguide/caseMappings.html
http://icu.sourceforge.net/userguide/normalization.html
Thanks
Balázs
There are some unofficial ICU patches but I doubt they're still
up-to-date. I don't personally use it though maybe someone else here
does...
I was looking into creating a Pg function wrapper to some of the ICU
stuff, but, to be perfectly honest, I couldn't find an actual API
reference for ICU.
In any case, you can do this with PL/Perl:
CREATE FUNCTION strip_nonspacing_marks ( text ) RETURNS text AS $func$
use Unicode::Normalize;
use Encode;
my $string = NFD( decode( utf8 => shift() ) );
$string =~ s/\p{Mn}+//ogsm;
return NFC($string);
$func$ LANGUAGE 'plperl' STRICT;
It's untested and won't be as fast as ICU, but it should get the job
done. Hope it helps!
>
> Thanks
> Balázs
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
--
Mike Rylander
mryl...@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
Do you have any suggestion?
Thanks
Balázs