DBF and UTF8 issues

Mindaugas Kavaliauskas

Apr 4, 2018, 7:44:52 AM
to Harbour Developers, Jonas Gediminas
Hello,

We have a few issues with national/Unicode/UTF8 character set support.

1) The DBF RDD reports "corruption detected"

EXTERNAL DBFCDX, DBFFPT
FIELD AA, BB, CC

PROC main()
   RDDSETDEFAULT("DBFCDX")
#if 1
   DBCREATE("test378a", {{"AA", "Q:U", 16, 0}, {"BB", "M:U", 10, 0}, ;
                         {"CC", "M", 10, 0}},, .T., "")
   DBAPPEND()
   ? "a1"
   AA := "TestA"
   ? "2"
   BB := "TestB" // corruption detected
   ? "3"
#elif 1
   // BB is assigned without error if the type of field AA is changed!
   DBCREATE("test378b", {{"AA", "Q", 16, 0}, {"BB", "M:U", 10, 0}, ;
                         {"CC", "M", 10, 0}},, .T., "")
   DBAPPEND()
   ? "b1"
   AA := "TestA"
   ? "2"
   BB := "TestB"
   ? "3"
#elif 1
   // BB is also assigned without error if field CC is removed!
   DBCREATE("test378c", {{"AA", "Q:U", 16, 0}, {"BB", "M:U", 10, 0}},, ;
            .T., "")
   DBAPPEND()
   ? "c1"
   AA := "TestA"
   ? "2"
   BB := "TestB"
   ? "3"
#endif
   DBCLOSEALL()
RETURN

2) A single byte (half of a UTF8 character) is written to the database. Can
we have a setting (or default behaviour) that omits a truncated character
(replacing it with spaces if the field is fixed length rather than
varlength) in non-binary character fields?

EXTERNAL DBFCDX, HB_CODEPAGE_UTF8EX
FIELD AA

PROC main()
   LOCAL cText
   HB_CDPSELECT("UTF8EX")
   RDDSETDEFAULT("DBFCDX")
   cText := "Aš" // two characters, three bytes, source file is UTF8
   DBCREATE("test379", {{"AA", "C", 2, 0}},, .T., "")
   DBAPPEND()
   AA := cText
   ? "Text:", HB_StrToHex(cText), ASC(cText), ASC(SUBSTR(cText, 2))
   ? "Field:", HB_StrToHex(AA), ASC(AA), ASC(SUBSTR(AA, 2))
   DBCLOSEALL()
RETURN
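
Until the RDD handles this itself, the requested behaviour can be approximated at application level. The following is only a minimal sketch: Utf8FitBytes() is a hypothetical helper, not an existing Harbour function, and it assumes the field width is counted in bytes (as the test above suggests) and that the input is valid UTF8.

```harbour
// Hypothetical helper (not part of Harbour): fit a UTF8 string into
// nBytes bytes without leaving a partial character at the end.
FUNCTION Utf8FitBytes( cText, nBytes )
   LOCAL nLen := Min( hb_BLen( cText ), nBytes )

   // If the first dropped byte is a continuation byte (10xxxxxx), the cut
   // falls inside a character: step back to that character's lead byte.
   DO WHILE nLen > 0 .AND. nLen < hb_BLen( cText ) .AND. ;
         hb_bitAnd( hb_BCode( hb_BSubStr( cText, nLen + 1, 1 ) ), 0xC0 ) == 0x80
      nLen--
   ENDDO

   // Pad with spaces so a fixed-length field is still filled completely
   RETURN hb_BLeft( cText, nLen ) + Space( nBytes - nLen )
```

In the test above, AA := Utf8FitBytes( cText, 2 ) is meant to store "A" followed by one space instead of "A" plus the first byte of "š". A varlength field would presumably want the same cut without the padding.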


3) What is the expected behaviour of UPPER()/LOWER() in the UTF8 (not
UTF8EX) codepage? Should we add HB_BUPPER()/HB_UUPPER(), or should we change
the current UPPER()/LOWER() so that a custom function like MyUPPER() is not
needed? Or am I misunderstanding something?

EXTERNAL HB_CODEPAGE_UTF8EX, HB_CODEPAGE_LT775

PROC main()
   LOCAL cText
   HB_CDPSELECT("UTF8")
   HB_SETTERMCP("LT775")
   cText := "Aš" // two characters, three bytes, source file is UTF8
   ? UPPER(cText), HB_StrToHex(cText)
   ? MyUPPER(cText), HB_StrToHex(MyUPPER(cText))
RETURN

FUNC MyUPPER(cText)
   LOCAL cCP := HB_CDPSELECT("UTF8EX")
   cText := UPPER(cText)
   HB_CDPSELECT(cCP)
RETURN cText


Best regards,
Mindaugas
