Problem with code.page and output.code.page using Extended Character Set


BugFix@AutoIt

Apr 29, 2021, 7:07:14 AM
to scite-interest
My system:
Windows 7 x64
Default System.Text.Encoding:  (Encoding name: UTF-8, CodePage: 65001)

I'm using SciTE with Lua and AutoIt scripts. AutoIt files are encoded as UTF-8 with BOM, Lua files as UTF-8 without BOM.

Example AutoIt
ConsoleWrite('Ë Ä ' & Chr(203) & ' ' & Chr(196) & @CRLF)
#cs
File encoding: UTF8BOM
code.page=65001
character.set=1000
output.code.page=0
-------------------------------
output (like expected): Ë Ä Ë Ä
#ce
Because the system code page (property value 0) is 65001, the result should be the same with output.code.page=65001. But it is not.
#cs
File encoding: UTF8BOM
code.page=65001
character.set=1000
output.code.page=65001
-------------------------------
output (wrong): xCB xC4 xCB xC4
#ce
But I can live with this, because the 1st set works.

Example Lua
print('Ë '..'Ä '..string.char(203)..' '..string.char(196))
--[[
File encoding: UTF8
code.page=65001
character.set=1000
output.code.page=0
------------------------------- 
output (wrong): Ã Ã Ë Ä
]]

--[[
File encoding: UTF8
code.page=65001
character.set=1000
output.code.page=65001
-------------------------------
output (wrong): Ë Ä xCB xC4
]]
Neither setting gives the correct output.

1. What can I do to get correct results with Lua?
2. Do I need different property settings for Lua and AutoIt? (I would implement this with the "OnOpen" event.)

Neil Hodgson

Apr 29, 2021, 7:31:49 PM
to scite-i...@googlegroups.com
BugFix@AutoIt:

> Example Lua
> print('Ë '..'Ä '..string.char(203)..' '..string.char(196))
> …
> --[[
> File encoding: UTF8
> code.page=65001
> character.set=1000
> output.code.page=65001
> -------------------------------
> output (wrong): Ë Ä xCB xC4
> ]]

string.char(203) is producing the byte 203 which is 0xCB, not the UTF-8 representation of U+00CB which is 0xC3 0x8B. If you want Unicode character U+00CB, then use Lua’s utf8.char function:
print('Ë '..'Ä '..utf8.char(203)..' '..utf8.char(196))
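
The byte-level difference can be checked with a minimal sketch like the one below. It assumes Lua 5.3 or later, where the utf8 library is available; hex_bytes is a hypothetical helper name introduced only for this illustration.

```lua
-- hex_bytes is a hypothetical helper for illustration:
-- it renders every byte of a string as two hex digits.
local function hex_bytes(s)
  local parts = {}
  for i = 1, #s do
    parts[#parts + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(parts, " ")
end

-- string.char emits the raw byte, which is not valid UTF-8 on its own.
print(hex_bytes(string.char(203)))  -- CB
-- utf8.char emits the UTF-8 encoding of the code point.
print(hex_bytes(utf8.char(203)))    -- C3 8B
print(hex_bytes(utf8.char(196)))    -- C3 84
```

With output.code.page=65001 the output pane expects UTF-8, so the lone byte CB cannot be decoded and appears as xCB, while the two-byte sequence C3 8B is shown as Ë.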

Neil