Harbour 3.2 - breaking change with character encoding


Zeljko

Nov 11, 2015, 5:03:39 PM
to Harbour Users
I posted a similar question half a year ago but did not get an answer. Basically, my app is compiled with Harbour 3.0 using these settings for character encoding:

HB_CDPSELECT("EN")
HB_SETTERMCP( "HR852","HR646" )

The app works great and everything on the screen is displayed as expected, but when I recompile the same app with Harbour 3.2 I can't type any special characters ("ČĆŽŠĐ") into any GET box. Also, some box-drawing characters are wrong. The data from the DBF files is fetched with the right characters, but I can't enter any Croatian character into any GET field. There seems to be a breaking change in character encoding that I am not aware of. I did not try Harbour 3.1, so the breaking change might also be there. Any help appreciated.

Klas Engwall

Nov 11, 2015, 9:18:15 PM
to harbou...@googlegroups.com
Hi Zeljko,

> I posted a similar question half a year ago but did not get an
> answer. Basically, my app is compiled with Harbour 3.0 using these
> settings for character encoding:
>
> HB_CDPSELECT("EN")
> HB_SETTERMCP( "HR852","HR646" )
>
> The app works great and everything on the screen is displayed as
> expected, but when I recompile the same app with Harbour 3.2 I can't
> type any special characters ("ČĆŽŠĐ") into any GET box. Also, some
> box-drawing characters are wrong. The data from the DBF files is
> fetched with the right characters, but I can't enter any Croatian
> character into any GET field. There seems to be a breaking change in
> character encoding that I am not aware of. I did not try Harbour 3.1,
> so the breaking change might also be there. Any help appreciated.

I don't quite follow the logic in your codepage settings. What kind of
environment are you running in? Are you using terminals and a terminal
server?

You are setting the VM codepage to "EN", and the codepage of the DBF
files is also supposed to be "EN" since you do not set it to anything
else with Set(_SET_DBCODEPAGE). Then in hb_SetTermCp() you set the
internal (host) codepage to "HR646" and convert it to/from "HR852" for
input and output. I may be misunderstanding what you do, but isn't that
a contradiction? Wouldn't using the same codepage as internal codepage
in both functions be more straightforward? And for input/output too?

What is the reason for using "EN" in Croatia? What is the codepage of
the data in your DBF files "really"? Are there accented characters in
the files too, and how does indexing of those work for you? Do they
appear in the correct Croatian order?

I am really trying to understand your Croatian environment ...

For my language I set the VM codepage to a Windows codepage for both VM
and input/output with Set(_SET_CODEPAGE) and a different (437 based)
codepage for the DBF files with Set(_SET_DBCODEPAGE). I don't use
hb_SetTermCp().

Box characters are displayed using the "EN" codepage by default, but this
can be changed with hb_GtInfo(HB_GTI_BOXCP). You must display them
with one of the box functions, though; SAYing box characters will not
work correctly. With the box functions you can combine box characters
from the "EN" codepage or from Unicode with text from a different
codepage on the same screen.
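
A minimal sketch of the box-function point above. It assumes a GT driver that honours HB_GTI_BOXCP and uses "HR852" only as an example VM codepage; the coordinates and the text are made up for illustration.

```harbour
#include "box.ch"
#include "hbgtinfo.ch"

REQUEST HB_CODEPAGE_HR852

PROCEDURE Main()

   // VM codepage used for ordinary text (example choice, not a recommendation)
   hb_cdpSelect( "HR852" )

   // Box characters use "EN" by default; set here explicitly for clarity
   hb_gtInfo( HB_GTI_BOXCP, "EN" )

   CLS
   // Frame drawn through a box function, so the box codepage applies
   DispBox( 2, 5, 8, 50, B_SINGLE )
   // Text drawn through SAY, so the VM codepage applies
   @ 4, 8 SAY "Text in the VM codepage"
   Inkey( 0 )

   RETURN
```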

There was a big rewrite around 2012 when "everything" was internally
converted to Unicode. And current 3.2.0dev is a big step forward from
3.0 in many other areas too. The changes are huge.

Regards,
Klas

Zeljko

Nov 12, 2015, 12:46:48 AM
to Harbour Users
Hi Klas, 
thank you for taking an interest in my problem. I'll try to explain the state of my app. The DBF files are legacy files encoded in the default Clipper encoding (CP 437). As the files are rather large (gigabytes of data) and other apps that are not under my control use the data, I am not in a position to convert them to the standard OEM codepage used in Croatia (CP 852). The whole app (the source code) is also encoded in CP 437. The Windows OS itself runs under CP 852. So what I need is this:

The OS runs CP 852. Both code and DBF are encoded in CP 437.

The funny thing is that the current settings work great when compiled with Harbour 3.0, but not with Harbour 3.2.

Zeljko

Nov 12, 2015, 1:56:51 AM
to Harbour Users
To clarify one thing, these settings:

SET (_SET_DBCODEPAGE, "HR646")
SET (_SET_CODEPAGE, "HR852")

would work fine if I did not depend on other (FoxPro) apps using the data. The screen display is correct and the codepage conversion from and to the database works correctly. Even the sorting in the index is correct. But therein lies the problem. The original index uses the EN encoding, so the sorting should not be correct; it should follow the English collation. Obviously the index is written with the HR646 codepage. This leads to index corruption when the FoxPro app writes something to the index.

I tried these settings:

SET (_SET_DBCODEPAGE, "EN")
SET (_SET_CODEPAGE, "HR852")

Now there is no corruption, but there is no codepage translation going on when writing to or reading from the DBF files. Is there any way to force this translation from EN to HR852?


Qatan

Nov 12, 2015, 12:00:21 PM
to harbou...@googlegroups.com
Hello,
 
> HB_CDPSELECT("EN")
> HB_SETTERMCP( "HR852","HR646" )
 
What happens if you use it like this: hb_cdpSelect( "CS852" ) or hb_cdpSelect( "CSISO" )? Or even hb_cdpSelect( "ISO8859-2" )?
Why do you use hb_SetTermCP()? Try running the program without that line to see what happens, or try it like this: hb_SetTermCP( hb_cdpTerm() )
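
Before trying different combinations, it may help to print what Harbour itself reports. This is only a diagnostic sketch using functions named in this thread; the values it prints depend entirely on the machine and the build.

```harbour
PROCEDURE Main()

   // Current VM (internal) codepage; called without arguments it only queries
   ? "VM codepage:      ", hb_cdpSelect()
   // Codepage Harbour detects for the terminal
   ? "Terminal codepage:", hb_cdpTerm()
   // Codepage Harbour detects for the operating system
   ? "OS codepage:      ", hb_cdpOS()

   RETURN
```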
 
 
Qatan

Zeljko

Nov 12, 2015, 12:54:22 PM
to Harbour Users
Hi Qatan,

after a gzillion permutations, I finally got the winning combination. It goes as follows:

REQUEST HB_CODEPAGE_HR852
REQUEST HB_CODEPAGE_HR646

SET(_SET_DBCODEPAGE, "EN")
HB_CDPSELECT("HR646")
HB_SETTERMCP("HR852", "HR646")

USE E:/DBF/PKDATA/DOKTOR INDEX E:/DBF/PKDATA/DOKTOR
SET ORDER TO 1

BROWSE()

So, if anyone comes across this problem, the steps to tame this beast in Harbour 3.2 are:

1. Use     SET (_SET_DBCODEPAGE, "EN") to set the DBF codepage to English 437
2. Use     HB_CDPSELECT("HR646")  to set the desired internal codepage. This is an important difference from Harbour 3.0, where I set the codepage to "EN"
3. Use     HB_SETTERMCP("HR852", "HR646") to enable the translation

Note that in Harbour 3.0 step 2 is  HB_CDPSELECT("EN").
Note also that with cp HR646 I had to use slashes instead of backslashes to set the DBF location, as the backslash in this case is the Croatian character "Đ"
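
The backslash remark can be checked with a small experiment. This is a sketch only: it interprets the same path bytes once as "EN" and once as "HR646", converts each to UTF-8 and dumps the result as hex. The path is made up, and the exact mapping depends on how the Harbour build defines HR646, so no output is shown.

```harbour
REQUEST HB_CODEPAGE_HR646

PROCEDURE Main()

   // Made-up path containing one backslash (byte 0x5C)
   LOCAL cPath := "E:\DBF"

   // Same bytes, two interpretations; compare the hex around the backslash
   ? "EN:   ", hb_StrToHex( hb_StrToUTF8( cPath, "EN" ) )
   ? "HR646:", hb_StrToHex( hb_StrToUTF8( cPath, "HR646" ) )

   RETURN
```

If the two dumps differ at the backslash position, the byte is being treated as a national character under HR646, which would explain why forward slashes were needed.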

Klas Engwall

Nov 12, 2015, 1:44:08 PM
to harbou...@googlegroups.com
Hi Zeljko,

> after a gzillion permutations, I finally got the winning combination.

Great :-)

> It goes as follows:
>
> REQUEST HB_CODEPAGE_HR852
> REQUEST HB_CODEPAGE_HR646
>
> SET(_SET_DBCODEPAGE,"EN")
> HB_CDPSELECT("HR646")

So far I totally agree with you, even though it is strange with the
correct sorting of Croatian characters. That shouldn't be possible with
the "EN" codepage, as you also said in one of the other posts today. Did
FoxPro create the index files? Then we have something on the FoxPro side
that we do not have full control over on the Harbour side. On the other
hand, if the FoxPro index is not "EN", then you should get index
corruption with SET(_SET_DBCODEPAGE,"EN"), and you said you do not get
that. Hmm ...

> HB_SETTERMCP("HR852","HR646")

So, you have "HR646" as the internal codepage in the VM. What is the
purpose of using "HR852" for screen and keyboard? Is that related to the
backslash problem or to possible differences between the character sets?

> USE E:/DBF/PKDATA/DOKTOR INDEX E:/DBF/PKDATA/DOKTOR
> SET ORDER TO 1
> BROWSE()
>
> So, if anyone comes across this problem, the steps to tame this beast
> in Harbour 3.2 are:
>
> 1. Use SET (_SET_DBCODEPAGE, "EN") to set the DBF codepage to
> English 437
> 2. Use HB_CDPSELECT("HR646") to set the desired internal codepage.
> This is an important difference from Harbour 3.0, where I set the
> codepage to "EN"
> 3. Use HB_SETTERMCP("HR852", "HR646") to enable the translation
>
> Note that in Harbour 3.0 step 2 is HB_CDPSELECT("EN").
> Note also that with cp HR646 I had to use slashes instead of backslashes
> to set the DBF location, as the backslash in this case is the Croatian
> character "Đ"

Did you also try the

SET( _SET_OSCODEPAGE, hb_cdpOS() )

setting? It is needed for recognizing file names with national
characters. It can be used with a codepage name as a string like the
other codepage setting functions, or it can be used by querying the OS
with the hb_cdpOS() function. Could this maybe solve your backslash problem?

Regards,
Klas

Zeljko

Nov 12, 2015, 2:41:02 PM
to Harbour Users
Klas, 

1. The sorting is incorrect. It is sorted in the English collation, which is fine, as I use user-defined functions on the indexes to generate the correct sorting order.

2. As for the line HB_SETTERMCP("HR852","HR646"): wow, I did not expect this. I removed the line and the app still works. It seems to make no difference. This surprises me, as it was necessary in HB30 to make it work. Thanks for pointing it out.

3. I tried your suggestion with SET( _SET_OSCODEPAGE, hb_cdpOS() ), but the backslash is still interpreted as the Croatian letter "Đ".

Thanks for your help. If you have any other suggestion for the backslash problem, I am eager to try it out. I looked through my code and there are lots of backslashes in the locations of DBF and other files, so it is a PITA to change them.

Klas Engwall

Nov 12, 2015, 5:08:33 PM
to harbou...@googlegroups.com
Hi Zeljko,

> 1. The sorting is incorrect. It is sorted in the English collation,
> which is fine, as I use user-defined functions on the indexes to
> generate the correct sorting order.

OK, apparently I misread what you said. No unexplained magic then :-)

> 2. As for the line HB_SETTERMCP("HR852","HR646"): wow, I did not
> expect this. I removed the line and the app still works. It seems to
> make no difference. This surprises me, as it was necessary in HB30 to
> make it work. Thanks for pointing it out.

:-)

> 3. I tried your suggestion with SET( _SET_OSCODEPAGE, hb_cdpOS() ) , but
> the backslash is still interpreted as the Croatian letter "Đ".

I think that setting is still a bonus if you encounter files with
Croatian characters in the filename. Make a small test with Fopen() or
hb_MemoRead(), with and without setting the OS codepage, and try to use
files with Croatian filenames. But for correct handling of backslashes,
there are bigger problems to solve.
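
A sketch of such a test, under several assumptions: the source file is saved as UTF-8, "HR852" is the VM codepage, and a non-empty file with the made-up name below exists in the current directory.

```harbour
REQUEST HB_CODEPAGE_HR852

PROCEDURE Main()

   LOCAL cFile

   hb_cdpSelect( "HR852" )

   // Hypothetical file name; converted from UTF-8 source text to the VM codepage
   cFile := hb_UTF8ToStr( "Đuro.txt" )

   // First attempt: OS codepage not set
   ? "Without OS codepage:", ! Empty( hb_MemoRead( cFile ) )

   // Second attempt: tell Harbour which codepage the OS uses for file names
   Set( _SET_OSCODEPAGE, hb_cdpOS() )
   ? "With OS codepage:   ", ! Empty( hb_MemoRead( cFile ) )

   RETURN
```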

> Thanks for your help. If you have any other suggestion for the
> backslash problem, I am eager to try it out. I looked through my code
> and there are lots of backslashes in the locations of DBF and other
> files, so it is a PITA to change them.

Well, the 1250-based Croatian "HRWIN" or the 8859-2-based "HRISO"
codepage in the VM should solve that, I believe, judging from what
Wikipedia tells me. But the conversion of your source code files would
probably be a pretty big job. On the other hand, one day you will likely
have to break out of the restrictions of using a DOS codepage anyway. I
suggest you make another small test using that approach. It will give
you a taste of what you might accomplish if you go that route in the
future. Since the problem with translation when reading and writing DBF
files has been solved, converting to Windows or ISO in the VM and the
user interface will not give you any new problems in that area.

And regarding path separators in general, in my own code I have done
away with literal path separators entirely. The code I write nowadays is
as OS independent as I am able to create. That means using the hb_PS()
function and moving all file handling to special functions out of the
general business code. So I should be able to move to a different OS
that uses forward slashes with no changes in the source code. Old code
requires some work, though :-)
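
A minimal sketch of that idea. BuildPath() is a hypothetical helper, not a Harbour function, and the directory names are only examples. It is about portability in general and is not claimed to fix the "Đ" issue under HR646.

```harbour
PROCEDURE Main()

   // Prints the parts joined with the separator of the current OS
   ? BuildPath( { "DBF", "PKDATA", "DOKTOR.dbf" } )

   RETURN

// Hypothetical helper: joins path parts with the OS path separator
FUNCTION BuildPath( aParts )

   LOCAL cPath := ""
   LOCAL cPart

   FOR EACH cPart IN aParts
      IF ! Empty( cPath )
         cPath += hb_ps()
      ENDIF
      cPath += cPart
   NEXT

   RETURN cPath
```

Keeping all path building in one such function means the business code never contains a literal separator.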

Regards,
Klas

Zeljko

Nov 13, 2015, 11:21:57 AM
to Harbour Users
Klas, thanks once again,

seems there's some work for me ahead. 
While upgrading, I ran into another snag. All SAVE SCREEN - RESTORE SCREEN commands left me with garbled box characters. Luckily, searching this forum I found a simple solution. I use GTWVT, so I had to put this line of code at the start of my solution:
hb_gtInfo( HB_GTI_COMPATBUFFER, .F. )

which solved the problem. It seems Victor changed the default from .T. to .F. in Harbour 3.2 to be compatible with Clipper, which I support wholeheartedly. I thought this worth mentioning for others who want to upgrade. Here is the link: https://groups.google.com/forum/#!searchin/harbour-users/harbour$203.2$20save$20screen/harbour-users/JI4pSmMggTg/z75hdDgdqp0J
If I run into other problems I'll document them here for future upgraders.
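
For anyone who wants to try it in isolation, here is a minimal sketch. It assumes the program is built with GTWVT as in my case; the box coordinates are arbitrary.

```harbour
#include "box.ch"
#include "hbgtinfo.ch"

PROCEDURE Main()

   LOCAL cScreen

   // The setting from this post; place it before any screen is saved
   hb_gtInfo( HB_GTI_COMPATBUFFER, .F. )

   CLS
   DispBox( 2, 5, 8, 50, B_DOUBLE )

   // Save the whole screen, clear it, then restore it
   cScreen := SaveScreen( 0, 0, MaxRow(), MaxCol() )
   CLS
   RestScreen( 0, 0, MaxRow(), MaxCol(), cScreen )

   // Wait for a key so the restored box can be inspected
   Inkey( 0 )

   RETURN
```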

Klas Engwall

Nov 13, 2015, 6:57:19 PM
to harbou...@googlegroups.com
Hi Zeljko,

> seems there's some work for me ahead.

I didn't mean that you *have* to make those conversions I mentioned :-)
but I think that now is a good time to at least experiment with them to
get a solid basis for deciding about the future direction of your
project(s).

> While upgrading, I ran into another snag. All SAVE SCREEN - RESTORE
> SCREEN commands left me with garbled box characters. Luckily, searching
> this forum I found a simple solution. I use GTWVT, so I had to put
> this line of code at the start of my solution:
>
> hb_gtInfo(HB_GTI_COMPATBUFFER,.F.)
>
> which solved the problem.

Yes. Good job :-). And if you try to make a screendump as text to a
printer (and to a file too) you will find that the printer will not be
entirely happy about the mixed codepages between the SAY text and the
BOXed characters. So a little trick programming is required to fix that.

> It seems Victor changed the default from .T. to
> .F. in Harbour 3.2 to be compatible with Clipper, which I support
> wholeheartedly.

Yes, I agree that was a good move

> Thought this worth mentioning for others who want to
> upgrade. Here is the
> link: https://groups.google.com/forum/#!searchin/harbour-users/harbour$203.2$20save$20screen/harbour-users/JI4pSmMggTg/z75hdDgdqp0J
> If I run into other problems I'll document them here for future upgraders.

OK, good idea

Regards,
Klas