Problem with code.page and output.code.page using Extended Character Set

BugFix@AutoIt

Apr 29, 2021, 7:07:14 AM

to scite-interest

My system:

Windows 7 x64

Default System.Text.Encoding: (Encoding name: UTF-8, CodePage: 65001)

I'm using SciTE with Lua- and AutoIt- scripts. AutoIt files are UTF8BOM encoded and Lua files UTF8.

Example AutoIt

ConsoleWrite('Ë Ä ' & Chr(203) & ' ' & Chr(196) & @CRLF)

#cs

File encoding: UTF8BOM

code.page=65001

character.set=1000

output.code.page=0

-------------------------------

output (like expected): Ë Ä Ë Ä

#ce

Because the system code page (property value: 0) is 65001, the result should be the same with output.code.page=65001. But it is not.

#cs

File encoding: UTF8BOM

code.page=65001

character.set=1000

output.code.page=65001

-------------------------------

output (wrong): xCB xC4 xCB xC4

#ce

I can live with this, because the first set of properties works.

Example Lua

print('Ë '..'Ä '..string.char(203)..' '..string.char(196))

--[[

File encoding: UTF8

code.page=65001

character.set=1000

output.code.page=0

-------------------------------

output (wrong): Ã Ã Ë Ä

]]

--[[

File encoding: UTF8

code.page=65001

character.set=1000

output.code.page=65001

-------------------------------

output (wrong): Ë Ä xCB xC4

]]

Neither configuration produces correct output.

1. What can I do to get correct results with Lua?

2. Do I need different property settings for Lua and AutoIt? (I would implement this with the "OnOpen" event.)

Neil Hodgson

Apr 29, 2021, 7:31:49 PM

to scite-i...@googlegroups.com

BugFix@AutoIt:

> Example Lua
> print('Ë '..'Ä '..string.char(203)..' '..string.char(196))
> …
> --[[
> File encoding: UTF8
> code.page=65001
> character.set=1000

> output.code.page=65001
> -------------------------------
> output (wrong): Ë Ä xCB xC4
> ]]

string.char(203) is producing the byte 203 which is 0xCB, not the UTF-8 representation of U+00CB which is 0xC3 0x8B. If you want Unicode character U+00CB, then use Lua’s utf8.char function:
print('Ë '..'Ä '..utf8.char(203)..' '..utf8.char(196))

Neil
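
To make the difference between the two calls visible, here is a minimal sketch that dumps the bytes each one produces. It assumes Lua 5.3 or later (for the utf8 library); hex_bytes is a hypothetical helper written for this illustration, not part of Lua or SciTE. Numeric codes are used instead of literal characters so the result does not depend on how the script file itself is encoded.

```lua
-- Assumes Lua 5.3+ (the utf8 library does not exist in earlier versions).

-- Hypothetical helper: render every byte of a string as two hex digits.
local function hex_bytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format('%02X', s:byte(i))
  end
  return table.concat(out, ' ')
end

local raw = string.char(203)      -- a single byte with value 203
local encoded = utf8.char(203)    -- code point U+00CB encoded as UTF-8

print(hex_bytes(raw))             --> CB
print(hex_bytes(encoded))         --> C3 8B

-- utf8.len returns nil (plus a byte position) for invalid UTF-8,
-- so it can be used to check a string before printing it.
print(utf8.len(raw) ~= nil)       --> false
print(utf8.len(encoded) ~= nil)   --> true
```

A lone 0xCB byte is not a valid UTF-8 sequence, which is consistent with the output pane showing it as xCB when output.code.page=65001.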
