Convert accented characters to unaccented

Jerry Hayes

Aug 2, 2002, 4:15:49 PM

(Delphi 4)

Is there a way to convert accented characters to unaccented, (meaning ASCII
A-Z, a-z)?

Winston Kotzan

Aug 2, 2002, 6:51:03 PM

I am somewhat unsure of what you mean by "accented". From your example in
parentheses, I think you mean converting between lowercase / upper case.

To convert between lowercase and upper case, add or subtract 32 from the
ASCII value of the letter:

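function UpperCaseLtr(const Ch: Char): Char;
begin
  // Convert lower case letter to upper case ( a --> A );
  // any other character is returned unchanged
  if Ch in ['a'..'z'] then
    Result := Chr(Ord(Ch) - 32)
  else
    Result := Ch;
end;

function LowerCaseLtr(const Ch: Char): Char;
begin
  // Convert upper case letter to lower case ( A --> a );
  // any other character is returned unchanged
  if Ch in ['A'..'Z'] then
    Result := Chr(Ord(Ch) + 32)
  else
    Result := Ch;
end;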

... or if you are dealing with a string you can probably use the UpperCase()
or LowerCase() functions.

--
Winston Kotzan
www.wakproductions.com

"Jerry Hayes" <jhayes @ medware-inc . com> wrote in message
news:3d4ae7b6$1_2@dnews...

Robert

Aug 2, 2002, 8:59:23 PM

Do you mean some of the international characters (ascii >128) ?
I think there is a function, "OemConvert" or something, that converts some
of those.
--
Robert

"Jerry Hayes" <jhayes @ medware-inc . com> wrote in message
news:3d4ae7b6$1_2@dnews...

Chris Thornton

Aug 3, 2002, 7:16:37 AM

"Winston Kotzan" <reda...@wakproductions.com> wrote in message
news:3d4b0cd6_2@dnews...

> I am somewhat unsure of what you mean by "accented". From your example in
> parentheses, I think you mean converting between lowercase / upper case.

No, that was the range of characters he wants to convert into.
What he wants to do is convert accented characters to their unaccented
counterparts.

ex: José => Jose
ex: áéíöü => aeiou

Peter Below (TeamB)

Aug 3, 2002, 8:07:35 AM

In article <3d4ae7b6$1_2@dnews>, Jerry Hayes wrote:
> Is there a way to convert accented characters to unaccented, (meaning ASCII
> A-Z, a-z)?

The classical way is to have a conversion table and do a lookup in that
table. The problem with that is that the table is of course specific to a
certain charset (encoding), like Windows Latin-1. You could build a table
for a range of UNICODE (widechar) characters to get around this limitation
and convert the strings to widestrings before you do the accent removals.
The routine below uses ANSI characters with the Windows western (Latin-1)
encoding.

Function SimplifyChar (const _ch : char) : char;
Const
  Charmap: array [#128..#255] of Char = (
    #128 { € }, #129 {   }, #130 { ‚ }, #131 { ƒ }, #132 { „ },
    #133 { … }, #134 { † }, #135 { ‡ }, #136 { ˆ }, #137 { ‰ },
    #138 { Š }, #139 { ‹ }, #140 { Œ }, #141 {   }, #142 { Ž },
    #143 {   }, #144 {   }, #145 { ‘ }, #146 { ’ }, #147 { “ },
    #148 { ” }, #149 { • }, #150 { – }, #151 { — }, #152 { ˜ },
    #153 { ™ }, #154 { š }, #155 { › }, #156 { œ }, #157 {   },
    #158 { ž }, #159 { Ÿ }, #160 {   }, #161 { ¡ }, #162 { ¢ },
    #163 { £ }, #164 { ¤ }, #165 { ¥ }, #166 { ¦ }, #167 { § },
    #168 { ¨ }, #169 { © }, #170 { ª }, #171 { « }, #172 { ¬ },
    #173 {   }, #174 { ® }, #175 { ¯ }, #176 { ° }, #177 { ± },
    #178 { ² }, #179 { ³ }, #180 { ´ }, #181 { µ }, #182 { ¶ },
    #183 { · }, #184 { ¸ }, #185 { ¹ }, #186 { º }, #187 { » },
    #188 { ¼ }, #189 { ½ }, #190 { ¾ }, #191 { ¿ }, 'A' { À },
    'A' { Á }, 'A' { Â }, 'A' { Ã }, 'A' { Ä }, 'A' { Å },
    #198 { Æ }, #199 { Ç }, 'E' { È }, 'E' { É }, 'E' { Ê },
    'E' { Ë }, 'I' { Ì }, 'I' { Í }, 'I' { Î }, 'I' { Ï },
    #208 { Ð }, #209 { Ñ }, 'O' { Ò }, 'O' { Ó }, 'O' { Ô },
    'O' { Õ }, 'O' { Ö }, #215 { × }, #216 { Ø }, 'U' { Ù },
    'U' { Ú }, 'U' { Û }, 'U' { Ü }, #221 { Ý }, #222 { Þ },
    #223 { ß }, 'a' { à }, 'a' { á }, 'a' { â }, 'a' { ã },
    'a' { ä }, 'a' { å }, #230 { æ }, #231 { ç }, 'e' { è },
    'e' { é }, 'e' { ê }, 'e' { ë }, 'i' { ì }, 'i' { í },
    'i' { î }, 'i' { ï }, #240 { ð }, #241 { ñ }, 'o' { ò },
    'o' { ó }, 'o' { ô }, 'o' { õ }, 'o' { ö }, #247 { ÷ },
    #248 { ø }, 'u' { ù }, 'u' { ú }, 'u' { û }, 'u' { ü },
    #253 { ý }, #254 { þ }, #255 { ÿ }
  );
begin
  If _ch >= #128 Then
    Result := Charmap[_ch]
  Else
    Result := _ch;
end;
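
To strip accents from a whole string rather than a single character, the lookup can be applied to each position in turn. The following is only a minimal sketch: it assumes the SimplifyChar function above is in scope and that the string holds ANSI characters in the Windows western encoding. The name SimplifyString is made up for illustration and is not an RTL routine.

```pascal
{ Hypothetical helper (not part of the RTL): applies SimplifyChar,
  as defined above, to every character of S. }
function SimplifyString(const S: string): string;
var
  i: Integer;
begin
  Result := S;  { start from a copy so Result already has the right length }
  for i := 1 to Length(Result) do
    Result[i] := SimplifyChar(Result[i]);
end;
```

With the table as posted, SimplifyString('José') should return 'Jose'. Characters the table leaves unmapped (for example Ç, Ñ and ß) pass through unchanged.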

The charmap table was created by this little routine and then edited:

{ Needs Classes (TStringList), SysUtils (Format) and Clipbrd (Clipboard)
  in the uses clause. }
Procedure CreateCharacterMap( fromchar, tochar: Char );
  Function DisplayStr( const ch: Char ): String;
  Begin
    If ch < #32 Then
      Result := '^'+Chr(Ord('A')-1+Ord(ch))
    Else
      Result := ch;
  End;
Var
  sl: TStringlist;
  line, element: String;
  ch: char;
Begin
  Assert( fromchar <= tochar );
  sl:= Tstringlist.Create;
  Try
    sl.Add('Const');
    line:= Format(' Charmap: array [#%d..#%d] of Char = (',
      [Ord(fromchar), Ord(tochar)] );
    sl.Add(line);
    line := '';
    For ch:= fromchar to toChar Do Begin
      element := Format('#%3.3d { %s }', [Ord(ch), DisplayStr(ch )]);
      If (Length(line)+Length(element)) > 66 Then Begin
        sl.Add(' '+line);
        line := '';
      End; { If }
      line := line + element;
      If ch <> tochar Then
        line := line + ', ';
    End; { For }
    sl.Add(' '+line);
    sl.add(' );');
    Clipboard.AsText := sl.Text;
  Finally
    sl.Free
  End; { Finally }
End;
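
The widechar variant mentioned at the start of this post could look roughly like the sketch below. It is an assumption about how one might do it, not code from the original post: the names SimplifyWideChar and SimplifyWideString are hypothetical, and only the Latin-1 Supplement letters that the ANSI table above replaces are covered. It relies on the fact that Unicode code points U+00C0..U+00FF have the same numeric values as the corresponding Latin-1 bytes.

```pascal
{ Sketch only: hypothetical names, covering the same letters that the
  ANSI Charmap above replaces. Everything else is returned unchanged. }
function SimplifyWideChar(const wc: WideChar): WideChar;
begin
  case Ord(wc) of
    $00C0..$00C5: Result := 'A';  { À Á Â Ã Ä Å }
    $00C8..$00CB: Result := 'E';  { È É Ê Ë }
    $00CC..$00CF: Result := 'I';  { Ì Í Î Ï }
    $00D2..$00D6: Result := 'O';  { Ò Ó Ô Õ Ö }
    $00D9..$00DC: Result := 'U';  { Ù Ú Û Ü }
    $00E0..$00E5: Result := 'a';  { à á â ã ä å }
    $00E8..$00EB: Result := 'e';  { è é ê ë }
    $00EC..$00EF: Result := 'i';  { ì í î ï }
    $00F2..$00F6: Result := 'o';  { ò ó ô õ ö }
    $00F9..$00FC: Result := 'u';  { ù ú û ü }
  else
    Result := wc;
  end;
end;

{ Applies SimplifyWideChar to every character of S. }
function SimplifyWideString(const S: WideString): WideString;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    Result[i] := SimplifyWideChar(Result[i]);
end;
```

Because the case ranges are keyed on Unicode code points, this version does not depend on the ANSI code page the program runs under. Accented letters outside U+00C0..U+00FF would need further ranges.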

--
Peter Below (TeamB)
Use the newsgroup archives :
http://www.mers.com/searchsite.html
http://www.tamaracka.com/search.htm
http://groups.google.com
http://www.prolix.be

Winston Kotzan

Aug 3, 2002, 12:02:34 PM

Sorry, slight misunderstanding. :-(

"Chris Thornton" <ch...@thornsoft.com> wrote in message
news:3d4bbb83$1_1@dnews...