codepage little help


Luigi Ferraris

Nov 29, 2013, 5:09:43 AM
to harbou...@googlegroups.com
Hi guys, can someone help me understand code page handling? I have read a lot on this forum, but...

I'm Italian and I need to use symbols like these: èéàùò€

Here is a small program as a reference:
1  PROCEDURE Main()
2    LOCAL cFileName := hb_FnameMerge( hb_DirBase(), "prova", "txt" )
3    LOCAL cText
4    CLS
5    ? "current is " + hb_cdpSelect()
6    IF !hb_FileExists( cFileName )
7       cText := "I'm Italian so I need è €"  // utf8 latin_e_grave and Euro_Sign
8       hb_MemoWrit( cFileName, cText )
9    ENDIF
10   cText := hb_MemoRead( cFileName )
11   ? "text is :" + cText
12   INKEY( 0 )
13
14 RETURN

OUTPUT: the file is right, but the screen output is wrong. That is OK, because the current codepage is EN (as line 5 shows).

To solve my problem I modified the source code like this:
1  REQUEST HB_CODEPAGE_ITWIN
2  PROCEDURE Main()
3    LOCAL cFileName := hb_FnameMerge( hb_DirBase(), "prova", "txt" )
4    LOCAL cText
5    hb_CdpSelect( "ITWIN" )
6    hb_SetTermCP( "ITWIN", .T. )
7    CLS
8    ? "current is " + hb_cdpSelect()
9    IF !hb_FileExists( cFileName )
10      cText := "I'm Italian so I need è €"
11      hb_MemoWrit( cFileName, cText )
12    ENDIF
13   cText := hb_MemoRead( cFileName )
14   ? "text is :" + cText
15   INKEY( 0 )
16
17 RETURN


OUTPUT: the file is right, but the screen output is wrong. WHY? The current codepage is ITWIN (as line 8 shows).

Then I modified line 10:
10      cText := "I'm Italian so I need "+ hb_Uchar( 0xE8 ) + "::" + hb_Uchar( 0x20AC )

OUTPUT: the file is right; on screen I can see latin_e_grave (OK), but not the Euro sign. WHY? The current codepage is ITWIN (as line 8 shows).

I then tried two different functions at line 13:
13   cText := hb_Utf8ToStr( hb_MemoRead( cFileName ) )
13   cText := hb_Translate( hb_MemoRead( cFileName ), "UTF8", hb_CdpSelect() )
with or without REQUEST HB_CODEPAGE_UTF8


OUTPUT: the file is right, but the screen output is wrong. WHY? The current codepage is ITWIN (as line 8 shows).

AFAIK from this point on I can't use some CLIPPER functions and I need to use Harbour extensions instead (e.g. hb_utf8At or hb_utf8RAt).

So I modified the code in this way:
1  REQUEST HB_CODEPAGE_UTF8
5    hb_CdpSelect( "UTF8" )
6    hb_SetTermCP( "UTF8", .T. )
13   cText := hb_MemoRead( cFileName )


OUTPUT: the file is right, but the screen output is wrong. WHY? The current codepage is UTF8 (as line 8 shows).

I also modified my code to use UTF8EX.

OUTPUT: the file is right, but the screen output is wrong. WHY? The current codepage is UTF8EX (as line 8 shows).

Finally, I tried to use the hb_LangSelect() function while the codepage was "ITWIN":
a) hb_LangSelect( "it" )
b) hb_LangSelect( hb_UserLang() )
but I always get an RTE --> Error BASE/1303  Argument error: __HB_LANGSELECT

Where is my mistake?

TIA
Luigi Ferraris

Luigi Ferraris

Nov 29, 2013, 5:47:03 AM
to harbou...@googlegroups.com
Hi friends,
about this:

> At the end I have tried to use hb_LangSelect() function when codepage was "ITWIN"
> a) hb_LangSelect( "it" )
> b) hb_LangSelect( hb_UserLang() )
> but always I get RTE --> Error BASE/1303  Argument error: __HB_LANGSELECT

I found the answer myself: I need this at the top of my program:
#include "hbextlng.ch"

Cheers
Luigi Ferraris

Alain Aupeix

Nov 29, 2013, 3:43:32 PM
to harbou...@googlegroups.com
Have you tried hb_UTF8ToStr()?
With it and hb_StrToUTF8() I have no problem displaying UTF-8 strings.
hb_StrToUTF8() also lets you, for example, save text files in UTF-8.

A+
--

Alain Aupeix
http://jujuland.pagesperso-orange.fr/
http://pissobi-lacassagne.pagesperso-orange.fr/

U.buntu 12.04 | G.ramps 3.4.5-1 | H.arbour 3.2.0dev (2013-11-28 02:04) | HbIDE (Rev.264) | Five.Linux (r138) | Hw.Gui (2203)

Luigi Ferraris

Nov 29, 2013, 3:50:22 PM
to harbou...@googlegroups.com
On 29/11/2013 21:43, Alain Aupeix wrote:
> Have you tried to use utf8tostr()
> with these two functions I have no problem to display utf8 strings
> You have also the function strtoutf8 which allows for example to save
> in utf8 text files.
Hi Alain, I'm running some tests, but I think the problem may be related
to the OS code page (850; I'm on Windows XP).
As I understand it, 858 is described as "OEM - Multilanguage Latin I +
Euro" (it is installed on my system), but Harbour doesn't support that code page.
Anyway, I will keep testing, also with a GUI library (hbqt).

Cheers
Luigi Ferraris



Alain Aupeix

Nov 29, 2013, 5:24:26 PM
to harbou...@googlegroups.com
On 29/11/2013 21:50, Luigi Ferraris wrote:
>> Have you tried to use utf8tostr()
>> with these two functions I have no problem to display utf8 strings
>> You have also the function strtoutf8 which allows for example to save in utf8 text files.
> Hi Alain, I'm doing some test but I think the problem can be related with OS code page (850 I'm on Windows Xp).
> As I understand 858 is descripted as "OEM - Multilanguage Latin I + Euro" (installed on my system), but Harbour it doesn't support this code.

Try this:

PROCEDURE Main()
   LOCAL cFileName := hb_FNameMerge( hb_DirBase(), "prova", "txt" )
   LOCAL cText

   CLS
   ? "current is " + hb_cdpSelect()
   IF ! hb_FileExists( cFileName )
      cText := "I'm Italian so I need è €"   // (utf8 latin_e_grave and Euro_Sign)
      // convert from the current codepage to UTF-8 before writing
      hb_MemoWrit( cFileName, hb_StrToUTF8( cText ) )
   ENDIF
   // convert the UTF-8 file content back to the current codepage
   cText := hb_UTF8ToStr( hb_MemoRead( cFileName ) )
   ? "text is :" + cText
   Inkey( 0 )

   RETURN

I would be surprised if it doesn't work.

Klas Engwall

Nov 29, 2013, 6:49:14 PM
to harbou...@googlegroups.com
Hi Luigi,

>> Have you tried to use utf8tostr()
>> with these two functions I have no problem to display utf8 strings
>> You have also the function strtoutf8 which allows for example to save
>> in utf8 text files.
> Hi Alain, I'm doing some test but I think the problem can be related
> with OS code page (850 I'm on Windows Xp).
> As I understand 858 is descripted as "OEM - Multilanguage Latin I +
> Euro" (installed on my system), but Harbour it doesn't support this code.
> Anyway, I continue to do test and using a GUI library (hbqt).

So far you have mentioned CP850, 858, ITWIN and UTF8. That is a lot :-).
So I can't help wondering what your real goal is. Do you get UTF8
encoded files from external sources and have to convert them? Or is it
the other way around? Or are you just building an application with
normal keyboard input and want to save what the user types?

All the characters you mentioned in your original post exist in the
"Windows Western" character set as far as I can see (You can check that
in Accessories -> System Tools -> Character Set or some similar wording
in the start menu - select Lucida Console and Windows Western for
example). And Windows Western means a "WIN" or "ISO" codepage in
Harbour. I can understand why the guys in Eastern Europe are so fond of
UTF8, but IMHO the average user in Western Europe seldom needs it. So do
you have any special needs in addition to the Euro sign and the accented
characters (grave and acute) that you mentioned?

Also, as you are already aware, using UTF8 requires a different
approach to string handling with functions like the standard at()
function not working properly and requiring special UTF8 versions
instead. Saving UTF8 strings in dbfs is also a problem because you do
not know exactly how much more room you are going to need in the
character fields.
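
The size problem can be made concrete with a minimal sketch. It assumes the default byte-oriented HVM codepage (not UTF8EX) and builds the UTF-8 bytes from Unicode code points, so the result does not depend on how the source file is saved. In UTF-8, è occupies two bytes (C3 A8) and € three (E2 82 AC), so Len() and hb_utf8Len() disagree on the same string:

```harbour
PROCEDURE Main()

   // UTF-8 encoded è (U+00E8) followed by € (U+20AC)
   LOCAL cUtf8 := hb_UTF8Chr( 0xE8 ) + hb_UTF8Chr( 0x20AC )

   ? "bytes, as Len() sees them    :", Len( cUtf8 )
   ? "characters, via hb_utf8Len() :", hb_utf8Len( cUtf8 )
   ? "hex dump                     :", hb_StrToHex( cUtf8, " " )

   RETURN
```

A dbf character field is sized in bytes, which is why the room needed for UTF-8 text cannot be known from the number of characters alone.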

In most cases in Western Europe it should be enough to use the "xxWIN"
codepage for the "xx" language used in the country. Like this:

set( _SET_CODEPAGE, "ITWIN" )

in your case.
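
As a rough sketch of that single setting in a complete program (an illustration, not tested on an Italian Windows console): the REQUEST line links the codepage into the executable, and hb_UChar() builds the characters from Unicode code points so the example does not depend on the editor's encoding.

```harbour
REQUEST HB_CODEPAGE_ITWIN   // link the ITWIN codepage into the executable

PROCEDURE Main()

   Set( _SET_CODEPAGE, "ITWIN" )

   ? "current is " + hb_cdpSelect()
   // è (U+00E8) and € (U+20AC), expressed in the current HVM codepage
   ? "text is: " + hb_UChar( 0xE8 ) + " " + hb_UChar( 0x20AC )

   RETURN
```

Whether both characters show up correctly still depends on the console font and on the terminal codepage in effect.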

Next question: Is this a new application you are writing or do you have
a long history of CP850 coded dbf files that you must still support? You
can convert automatically between "ITWIN" in the HVM and "IT850" or
something like that in the dbf files with:

set( _SET_CODEPAGE, "ITWIN" )
set( _SET_DBCODEPAGE, "IT850" )

But I suspect that some of the accented characters will not convert
correctly. If so, using "ITWIN" in the dbf files is preferable ... if it
is possible to convert the old data.
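
One way to check in advance which characters would survive such a conversion is a round trip through hb_Translate(). This is only a sketch: it assumes that the RDD conversion behaves like hb_Translate() between the same two codepages, and the list of code points is just the set of characters from the original post.

```harbour
REQUEST HB_CODEPAGE_ITWIN
REQUEST HB_CODEPAGE_IT850

PROCEDURE Main()

   LOCAL aCode := { 0xE8, 0xE9, 0xE0, 0xF9, 0xF2, 0x20AC }   // è é à ù ò €
   LOCAL nCode, cWin, cBack

   Set( _SET_CODEPAGE, "ITWIN" )

   FOR EACH nCode IN aCode
      cWin  := hb_UChar( nCode )   // the character in ITWIN
      // ITWIN -> IT850 -> ITWIN; a character missing in IT850 does not come back
      cBack := hb_Translate( hb_Translate( cWin, "ITWIN", "IT850" ), "IT850", "ITWIN" )
      ? "U+" + hb_NumToHex( nCode, 4 ), iif( cBack == cWin, "round-trips", "is lost" )
   NEXT

   RETURN
```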

Also, looking at this line:

cText := "I'm Italian so I need è €" (utf8 latin_e_grave and Euro_Sign)

How is that text encoded? What is the codepage used by the editor? Is it
UTF8? It should be the same as the set( _SET_CODEPAGE ) codepage used in
the application.
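
A quick way to answer the encoding question is to dump the bytes of the literal. In this minimal sketch the string has to be typed and saved with the editor under test; the byte values in the comments are the standard encodings of è and € in each character set.

```harbour
PROCEDURE Main()

   // Type these two characters with your own editor, then save and compile.
   LOCAL cText := "è €"

   // E8 20 80          -> saved as Windows-1252 (matches "ITWIN")
   // C3 A8 20 E2 82 AC -> saved as UTF-8
   // 8A 20 ...         -> saved as CP850 (which has no Euro sign)
   ? hb_StrToHex( cText, " " )

   RETURN
```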

Regards,
Klas

Francesco Perillo

Nov 29, 2013, 7:32:19 PM
to harbou...@googlegroups.com

I believe the best choice for Western Europe is codepage CP1252, a superset of latin1/iso8859-1.
It has almost all the glyphs of cp850, but in different positions. Conversion should be OK.

Klas, can you please recap how the console, HVM and dbf codepages interact? And how to convert safely from cp850 to cp1252 for source, runtime, dbf...

Luigi Ferraris

Nov 30, 2013, 1:29:25 PM
to harbou...@googlegroups.com
On 30/11/2013 0:49, Klas Engwall wrote:
> So far you have mentioned CP850, 858, ITWIN and UTF8. That is a lot :-). So I can't help wondering what your real goal is. Do you get UTF8 encoded files from external sources and have to convert them? Or is it the other way around? Or are you just building an application with normal keyboard input and want to save what the user types?

1) It will be a new application, so I have no problems with existing code and data, but I want to start from a "good" POV when writing my code.
2) I don't want to use UTF8: it requires a different approach for some functions, so I could create other bugs :-) On the other hand, a Western Europe cdp is right for me.
3) I will use Hbqt for input/output, but ... I need several text files as "data", so see the next point.
4) I started by writing a simple text file and showing it on screen, BUT in a console mode program, using the CLIPPER command ? for output.
So I understand that I need to set the right code page, because without any settings I can't see accented letters. I get these values on my PC:
? hb_CdpSelect() ==> EN
? hb_LangSelect() ==> en
? hb_LangName() ==> Harbour Language: en English (English )
? hb_UserLang() ==> it-IT
? hb_CdpTerm() ==> IT850M
? hb_CdpOs() ==> ITWIN
? hb_CdpUniID() ==> cp437

So I have tried
hb_CdpSelect( "ITWIN" )
hb_LangSelect( "it")
? hb_CdpSelect() ==>ITWIN
? hb_LangSelect() ==> it.ITWIN
? hb_LangName() ==> Harbour Language: it.ITWIN Italian (Italiano)
? hb_UserLang() ==> it-IT
? hb_CdpTerm() ==> IT850M
? hb_CdpOs() ==> ITWIN
? hb_CdpUniID() ==> cp1252

In the text file ALL is OK (letters and sign), but (strangely) on screen I can see only the accented letters, not the Euro sign. Is this perhaps related to the OS code page?

So I added hb_SetTermCp( "ITWIN", .T. ) (AFAIK it sets the cdp for input and output), but I get the same result as before.
Question: is this function meant to
a) inform the Harbour program which cdp to use for I/O
OR
b) "change" the cdp at OS level?

Likewise, I think SET( _SET_OSCODEPAGE, hb_CdpOs() ) informs the Harbour program what the OS code page is ... or does it work like b)?

> Also, looking at this line:
>
> cText := "I'm Italian so I need è €" (utf8 latin_e_grave and Euro_Sign)
>
> How is that text encoded? What is the codepage used by the editor? Is it UTF8? It should be the same as the set( _SET_CODEPAGE ) codepage used in the application.
About the editor (ConText): it reports the program as type "DOS", so I think it could be cp1252 or at least cp858. AFAIK standard cp850 doesn't have the EURO sign, otherwise I couldn't insert it ... probably.

Many thanks for your answer and help
Luigi Ferraris






Klas Engwall

Nov 30, 2013, 7:45:54 PM
to harbou...@googlegroups.com
Hi Luigi,

> 1) It will be a new application so I haven't problem related with
> existing code and data, but I want start with a "good" POV to write my code

OK, then use only "ITWIN" and only set it with

set(_SET_CODEPAGE,"ITWIN")

That is all! It will be used on screen, for keyboard input, in dbf files
and in text files - everywhere! (Unless Italian Windows setup uses
some strange default setting for console windows that I am unaware of
and that Harbour is unable to change???)

> 2) I don't want use UTF8: it requires different approach related with
> some functions so I can create other bugs :-) on the other hand Western
> Europe cdp is right for me.

That sounds good

> 3) I will use Hbqt for input/output but ... I need several text file as
> "data" so see next point

But you are still using console, not HBQT, for your tests, right?

> 4) I was start to write a simple text file and output on video BUT
> using a console mode program using CLIPPER command ? for output.
> So I understand: I need to set the right code page because without any
> settings I can't see accented letters. I get these values on my PC:
> ? hb_CdpSelect() ==> EN
> ? hb_LangSelect() ==> en
> ? hb_LangName() ==> Harbour Language: en English (English )
> ? hb_UserLang() ==> it-IT
> ? hb_CdpTerm() ==> IT850M
> ? hb_CdpOs() ==> ITWIN
> ? hb_CdpUniID() ==> cp437
>
> So I have tried
> hb_CdpSelect( "ITWIN" )
> hb_LangSelect( "it")
> ? hb_CdpSelect() ==>ITWIN
> ? hb_LangSelect() ==> it.ITWIN
> ? hb_LangName() ==> Harbour Language: it.ITWIN Italian (Italiano)
> ? hb_UserLang() ==> it-IT
> ? hb_CdpTerm() ==> IT850M
> ? hb_CdpOs() ==> ITWIN
> ? hb_CdpUniID() ==> cp1252

I think you are making it too complicated for yourself :-)

For the moment, drop *all* those, and only set the codepage as I
suggested above. I suspect that the real cause of your problems, out of
all those settings, is the "IT850M" terminal codepage.

You only set the terminal (screen and keyboard) codepage if you want it
to be different from the internal codepage used by the HVM. And I see no
reason to do that. So let set(_SET_CODEPAGE, "ITWIN") do everything for you.

> On text file ALL is ok (letters and sign) but (strange) on video I can
> see only accented letters not Euro sign. Probably is related with OS
> code page?
>
> So I have add hb_SetTermCp( "ITWIN", .T. ) - AFAIK set cdp for input and
> output - but I get the same as previous.
> Question? But this function must be used to
> a) inform Harbour program to use for I/O that cdp
> OR
> b) to "change" (OS level) the cdp

Again, set the codepage only as I suggested. That should be enough.

> At the same time using I think SET( _SET_OSCODEPAGE, hb_CdpOs() ) will
> informs Harbour program: the OS code page is..... or like b)?

The OS codepage is used for conversion between the internal codepage and
the codepage in the OS for handling *file names*. AFAIK, that codepage
is detected automatically and should not need to be set manually in most
cases. Try it with accented file names to see if it works correctly. But
for the problems you have described so far, it is completely irrelevant.

> About Editor (ConText) it reports program as type "DOS" so I think can
> be cp1252 or at least Cp858.

I installed a copy of ConText (from http://www.contexteditor.org,
right?). I entered a Euro sign with <Alt><0128>, and it displayed
correctly. Then I saved it and looked at it with a hex editor, and it
said 0x80. So ConText uses a normal Windows codepage. "ITWIN" is what
you need in Harbour to match it. ConText will not handle CP850
correctly, although you can convert the entire file to "OEM Charset" in
the View menu - but only on screen as far as I can see. Context still
saves the file using the Windows codepage (1252 or ANSI or whatever we
want to call it) anyway. A strange menu option IMHO :-)

> AFAIK standard cp850 doesn't has EURO sign
> else I can't insert...probably.

I have not checked, but I suppose it doesn't, since CP850 is a lot older
than the EURO.

Regards,
Klas

Klas Engwall

Nov 30, 2013, 8:19:34 PM
to harbou...@googlegroups.com
Hi Francesco,

> I believe that for western Europa best choice is codepage CP1252,
> superset of latin1/iso8859-1.

Yes, I think so. And that means using the "xxWIN" or possibly the
"xxISO" codepage for the language used in the country in question.

> It has almost all glyphs of cp850 but in other position. Conversion
> should be ok.

With maybe a few unusual exceptions in Western Europe and probably many
exceptions in Eastern Europe.

> Klas, can you please recap how console, hvm and dbf codepages interact ?
> And how safely conver from cp850 to cp1252 source, runtime, dbf...

HVM is "the boss". Set that codepage (also called the internal one) with
set(_SET_CODEPAGE,x) and it will be used everywhere, unless you tell
Harbour differently.

If you need a different codepage in dbf files, set that codepage with
set(_SET_DBCODEPAGE,y)

Conversion between the HVM and the dbfs will be done automatically by
Harbour. But check that all characters used in the application exist in
both codepages. For example, the Swedish Clipper codepage supplied by
Nantucket in NTXSWE.OBJ a very long time ago (CP437) is the crappiest
codepage ever written :-) and will cause problems with certain accented
characters, and other characters too, while using "CPWIN" in the HVM.
Przemek did a great job emulating the crap :-) in the "SV437C" codepage
so Clipper and Harbour work perfectly together, but some valid accented
characters are not supported in either.

When I change set(_SET_CODEPAGE) it also flips the codepage in the
console window, so that should not be a problem (unless, as I said to
Luigi, there is something "funny" going on in console windows in certain
localized Windows versions that I am not aware of).

It is also possible to use different codepages internally versus on
screen/in keyboard input. HB_SETKEYCP() (input), HB_SETDISPCP() (output)
and HB_SETTERMCP() (both) exist for that, but they are only needed in
special cases.
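
For one of those special cases, a hedged sketch could look like the following. It assumes a console that really runs in CP850 while the application works internally in ITWIN; the first argument of hb_SetTermCP() is the terminal codepage and the second the host (internal) one. Treat it as an illustration of the idea, not as a recommended setup.

```harbour
REQUEST HB_CODEPAGE_ITWIN
REQUEST HB_CODEPAGE_IT850

PROCEDURE Main()

   Set( _SET_CODEPAGE, "ITWIN" )      // internal (HVM) codepage

   // Assumption: the console uses CP850, so translate keyboard input and
   // screen output between the terminal codepage and the internal one.
   hb_SetTermCP( "IT850", "ITWIN" )

   ? hb_UChar( 0xE8 )                 // è, stored as ITWIN internally

   RETURN
```

hb_SetKeyCP() and hb_SetDispCP() take the same terminal/host pair when only one direction needs translating.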

Source code must match the CP used internally, or the strings must be
converted.

Regards,
Klas

Luigi Ferraris

Dec 2, 2013, 4:02:16 AM
to harbou...@googlegroups.com
Hi Klas,
many thanks for all infos and suggestions: I will follow them.

On 01/12/2013 1:45, Klas Engwall wrote:
> I installed a copy of ConText (from http://www.contexteditor.org,
> right?). I entered a Euro sign with <Alt><0128>, and it displayed
> correctly. Then I saved it and looked at it with a hex editor, and it
> said 0x80. So ConText uses a normal Windows codepage. "ITWIN" is what
> you need in Harbour to match it. ConText will not handle CP850
> correctly, although you can convert the entire file to "OEM Charset"
> in the View menu - but only on screen as far as I can see. Context
> still saves the file using the Windows codepage (1252 or ANSI or
> whatever we want to call it) anyway. A strange menu option IMHO :-)
Yes, for this (and other) reasons I am thinking of switching to Notepad++.

Luigi Ferraris

Dec 5, 2013, 6:43:14 AM
to harbou...@googlegroups.com
Hi Klas,
I'm sorry if I'm boring you.

AFAIK hb_CdpUniID( cHarbourCodePage ) returns the standard / international codepage name, e.g. hb_CdpUniID( "ITWIN" ) returns "cp1252"; this sounds good.

I ran this test:

hb_CdpSelect( "UTF8" )
hb_LangSelect( "it" )
hb_CdpUniID( hb_CdpSelect() )
and the last line returns "cp437". It is the same with "UTF8EX".

THEN I ran this test:
hb_CdpSelect( "ITISO" )
hb_LangSelect( "it" )
hb_CdpUniID( hb_CdpSelect() )
and the last line returns "iso8859-1", which can be considered good, but if I want to work with "iso8859-15" I need to write

hb_CdpSelect( "SVISO" ) <<< Sweden???
hb_LangSelect( "it" )
hb_CdpUniID( hb_CdpSelect() )
and the last line returns "iso8859-15".

Is this the intended behaviour?

Best regards
Luigi Ferraris





Klas Engwall

Dec 5, 2013, 6:08:34 PM
to harbou...@googlegroups.com
Hi Luigi,

> AFAIK hb_CdpUniID( cHarbourCodePage ) returns the standard /
> international codepage code. ie hb_CdpUniID( "ITWIN") it returns
> "cp1252"; this sounds good
>
> I have done this as test:
> hb_CdpSelect( "UTF8" )
> hb_LangSelect( "it" )
> hb_CdpUniID( hb_CdpSelect() )
> and related with last row I get "cp437". Is the same using "UTF8EX"
>
> THEN I have done this as test:
> hb_CdpSelect( "ITISO" )
> hb_LangSelect( "it" )
> hb_CdpUniID( hb_CdpSelect() )
> and related with last row I get "iso8859-1" and can be considered good,
> but if I want work using "iso8859-15" I need to write
>
> hb_CdpSelect( "SVISO" ) <<< Sweden???
> hb_LangSelect( "it" )
> hb_CdpUniID( hb_CdpSelect() )
> and related with last row I get "iso8859-15"
>
> Are these the behaviours?

To begin with, hb_langselect() is irrelevant in this context. It has
nothing to do with codepages. Instead, it is about selecting the
language for names of months and weekdays etc (see src\lang\l_??.c).
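
A small sketch of what hb_langSelect() does affect, assuming that only the Italian module is wanted (REQUEST HB_LANG_IT links that single module, where including hbextlng.ch requests all of them):

```harbour
REQUEST HB_LANG_IT   // link only the Italian language module

PROCEDURE Main()

   ? CMonth( Date() ), CDoW( Date() )   // names from the default (English) module

   hb_langSelect( "it" )
   ? CMonth( Date() ), CDoW( Date() )   // names from the Italian module

   RETURN
```

The codepage selected with hb_cdpSelect() or Set( _SET_CODEPAGE ) is not touched by this call.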

I do not use UTF8, but I found this in src\codepage\cp_utf8.c:
#define HB_CP_ID UTF8EX
#define HB_CP_INFO "UTF-8 extended"
#define HB_CP_UNITB HB_UNITB_437
So that is probably where your "cp437" result is coming from.
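
To see which Unicode table each linked codepage maps to, something like the following sketch can be used. The REQUEST lines are only an example selection; hb_cdpList() returns whatever codepages are linked into the executable.

```harbour
REQUEST HB_CODEPAGE_ITWIN
REQUEST HB_CODEPAGE_ITISO
REQUEST HB_CODEPAGE_SVISO
REQUEST HB_CODEPAGE_UTF8EX

PROCEDURE Main()

   LOCAL cID

   // Print every linked codepage ID next to its Unicode table name
   FOR EACH cID IN hb_cdpList()
      ? PadR( cID, 10 ), hb_cdpUniID( cID )
   NEXT

   RETURN
```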

The Swedish codepage "SVISO" is the only codepage with 8859-15. Viktor
created it as a special case after I suggested to put as many
alphabetical characters as possible in the collation strings (also a
special case). If you compare the collation strings in
src\codepage\l_sv.h with any other of the l_??.h files you will find
that the Swedish strings are much more complete than the other ones (you
must use an editor that understands UTF8 when you do that). This means
that letters with accents will be sorted in their alphabetical context
rather than by ASCII value (which is more or less random, as you can see
in the character set utility in Windows' start menu, accessories, system
tools). Secondly, the tilde characters connecting for example a and its
accented forms mean that those characters are given an equal value when
indexing and seeking, so words beginning with any of those letters will
be found whether you include the accent or not in the dbseek() argument.
Using this feature, called HB_CDP_ACSORT_INTERLEAVED, is a matter of
practice in the language and locale in question, and the codepage
mirrors that practice.

When ISO-8859-15 (in general) was originally created, it was intended to
be more complete than 8859-1, but the new codepage never took off.
Microsoft uses codepage 1252 instead, and that is where most of the
western world users have been sitting for a long time.

If you want to make an Italian codepage with all accented characters
sorted alphabetically rather than by ASCII value, I don't think it would
be very difficult. I once made a codepage for testing purposes in only
around ten minutes. You can borrow the collation strings from l_sv.h and
use cpsviso.c and cpitiso.c as guides. You will have to give your
codepage a new name, such as "ITISOLUIGI" :-) just like there are DOS
codepages with a "C" suffix to the codepage name to make those codepages
compatible with Nantucket's less than perfect attempts rather than the
standard 850 or 437 codepages.

Regards,
Klas