On Fri, 2013-05-17, Olive wrote:
> One feature that seems to be missing in the re module (or any tools
> that I know for searching text) is "diacritic-insensitive search". I
> would like to have a match for something like this:
> re.match("franc", "français")
...
> The algorithm to write such a function is trivial but there are a
> lot of marks we can put on a letter. It would be necessary to have the
> list of "a"'s with something on it. i.e. "à,â,ä", etc. and this for
> every letter. Trying to make such a list by hand would inevitably lead
> to some symbols forgotten (and would be tedious).
Ok, but please remember that the diacriticals are of varying importance.
The English "naïve" is easily recognizable when written as "naive".
The Swedish word "får" cannot be spelled "far" and still be understood.
This is IMHO out of the scope of re, and perhaps case-insensitivity
should have been too. Perhaps it /would/ have been, if regular
expressions hadn't come from the ASCII world where these things are
easy.
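
For what it's worth, the list of marked letters probably does not need
to be written by hand. A minimal sketch, assuming Python 3 (where str is
Unicode) and the standard unicodedata module: decompose the text to NFD
and drop the combining marks before handing it to re. The names
strip_marks and match_ignoring_marks are made up for illustration; they
are not part of re.

```python
import re
import unicodedata


def strip_marks(text):
    """Return text with combining marks removed (hypothetical helper).

    NFD splits a precomposed letter such as U+00E7 into its base letter
    plus a combining mark; the marks are then filtered out.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_ignoring_marks(pattern, text, flags=0):
    """Hypothetical wrapper: run re.match against the mark-stripped text.

    Positions in the returned match object refer to the stripped string,
    not to the original one.
    """
    return re.match(pattern, strip_marks(text), flags)


print(strip_marks("français"))
print(strip_marks("naïve"))
print(strip_marks("får"))
print(bool(match_ignoring_marks("franc", "français")))
```

This also shows the problem described above: "får" comes out as "far",
which is a different word. And it is not complete either, since letters
such as "ø" have no canonical decomposition and pass through unchanged.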
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/
snipabacken.se> O o .