Hi Aleksander and Marek,
To clarify.
Yes, it may help in such situations, but only if translation is
enabled.
In this example the test server uses UTF8 encoding as the default CDP
and the DBF file uses PLWIN (CP1250 encoding).
When the HVM uses a different CDP than the DBF, translation with the
fallback table is enabled (BTW, now all translations between different
CDPs use this fallback table, and neither the source nor the destination
CDP needs to use any variant of Unicode encoding). It means that during
indexing field values are translated to CP1250, and because the letter
'ư' does not exist in CP1250 it is stored in the index as a simple Latin
'u'. Then the same happens with the SEEK parameter, so the code works
quite nicely.
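A minimal sketch of the effect described above, not taken from the test code. It assumes hb_Translate() goes through the same fallback table as the RDD translation (as stated above for all CDP translations). The letter 'ư' is built with hb_utf8Chr() so the encoding of the source file does not matter, and the variable names are only illustrative.

```harbour
REQUEST HB_CODEPAGE_PLWIN

PROCEDURE Main()

   LOCAL cValue, cKey

   hb_cdpSelect( "UTF8" )                  // HVM side, as on the test server

   cValue := "th" + hb_utf8Chr( 0x01B0 )   // field value ending with U+01B0
   // Roughly what indexing (and later SEEK) does with the value:
   cKey   := hb_Translate( cValue, "UTF8", "PLWIN" )

   ? hb_StrToHex( cValue )   // UTF-8 bytes of the original value
   ? hb_StrToHex( cKey )     // CP1250 bytes; the last one shows the replacement
   ? cKey == "thu"           // should be .T. if the fallback acts as described

   RETURN
```

Because the SEEK parameter goes through the same translation, both sides end up with the same bytes and the key is found.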
Anyhow, the reverse translation will not make such a conversion. The
data read from Unicode fields in the DBF file can be well represented
in UTF8 encoding, so it contains the letter 'ư', not 'u'. In practice
it means that any filter expressions should compare values in PLWIN
context. This problem has existed from the beginning and the recent
modification has not changed it. When the translation is not fully
reversible, expressions evaluated by the HVM may give unexpected
results, so the programmer should keep it in mind. This is true in all
languages using such translations. Anyhow, after the recent
modification it can be masked by a special sorting/comparison order
which keeps the same weight for all characters using different variants
of the same Latin letter.
Maybe we should add it.
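The non-reversible round trip can be sketched as follows. This is only an illustration under the same assumption as before (hb_Translate() using the fallback table); cField and cRound are hypothetical names, and a real filter would work on actual field values.

```harbour
REQUEST HB_CODEPAGE_PLWIN

PROCEDURE Main()

   LOCAL cField, cRound

   hb_cdpSelect( "UTF8" )

   cField := "th" + hb_utf8Chr( 0x01B0 )   // as read from a Unicode field
   // UTF8 -> PLWIN -> UTF8: the replaced letter does not come back
   cRound := hb_Translate( hb_Translate( cField, "UTF8", "PLWIN" ), "PLWIN", "UTF8" )

   // Comparison in the HVM (UTF8) context
   ? cField == cRound

   // Comparison in PLWIN context: both sides get the same lossy translation
   ? hb_Translate( cField, "UTF8", "PLWIN" ) == ;
     hb_Translate( cRound, "UTF8", "PLWIN" )

   RETURN
```

If the fallback behaves as described, the first comparison fails and the second succeeds, which is why filter expressions are safer when both operands are brought to the PLWIN context first.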
Finally, I've seen on this list the discussion about GET and the "UTF8"
Harbour CDP, and it looks like I have to repeat some information.
It's related to character encoding, so I'll do it in this thread.
The UTF8 Harbour CDP is only for encoding. It does not introduce any
additional information about the encoded characters, so it will not
change the behavior of functions like UPPER(), LOWER(), ISALPHA() and
all string functions operating on character indexes, i.e. LEN(),
LEFT(), RIGHT(), SUBSTR(), STUFF(), STRTRAN(), PAD*() etc.
All of them will operate on the byte (binary) representation of the
strings in UTF8 encoding. Also the GET system, which uses the above
functions, will not work as you may expect. Anyhow, this CDP is useful
for translation and is extremely light, so it is present in all Harbour
static builds.
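A short sketch of what "operate on bytes" means in practice with the plain "UTF8" CDP. The sample word is only an illustration; hb_utf8Len() is used here as a CDP-independent way to count UTF-8 characters.

```harbour
PROCEDURE Main()

   LOCAL cText

   hb_cdpSelect( "UTF8" )

   // "zloty" with U+0142 in place of "l"; that letter takes 2 bytes in UTF-8
   cText := "z" + hb_utf8Chr( 0x0142 ) + "oty"

   ? Len( cText )                      // byte count under the "UTF8" CDP
   ? hb_utf8Len( cText )               // UTF-8 character count
   ? hb_StrToHex( Left( cText, 2 ) )   // cut falls inside the 2-byte letter

   RETURN
```

The GET system relies on the same byte-oriented functions, so cursor positions and picture handling can land in the middle of a multibyte character.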
But if someone wants to use UTF8 as a fully functional CDP which
affects all the above functions, sorting order, etc., then they should
use UTF8EX. It consumes more memory and increases the size of the
static executable due to the additional information about Unicode
characters, so it's not the default choice.
In short: the "bug" examples in code using the GET system
can be "fixed" by replacing:
hb_cdpSelect( "UTF8" )
with
REQUEST HB_CODEPAGE_UTF8EX
hb_cdpSelect( "UTF8EX" )
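As a complete program, the replacement could look like the sketch below. It repeats the same illustrative sample word and only prints values, so the difference from the plain "UTF8" CDP can be checked locally; no particular output is claimed here.

```harbour
REQUEST HB_CODEPAGE_UTF8EX

PROCEDURE Main()

   LOCAL cText

   hb_cdpSelect( "UTF8EX" )

   // "zloty" with U+0142 in place of "l"
   cText := "z" + hb_utf8Chr( 0x0142 ) + "oty"

   ? Len( cText )                      // character-based under UTF8EX
   ? hb_BLen( cText )                  // byte length, for comparison
   ? hb_StrToHex( Left( cText, 2 ) )   // should keep the 2-byte letter whole
   ? hb_StrToHex( Upper( cText ) )     // case conversion using UTF8EX data

   RETURN
```

The REQUEST line matters in static builds: without it the UTF8EX codepage is not linked into the executable.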
For more information about these changes in Harbour, see the ChangeLog
entry:
2012-04-20 17:52 UTC+0200 Przemyslaw Czerpak (druzus/at/poczta.onet.pl)
best regards,
Przemek
On 3.09.2025 at 20:03, Aleksander Czajczynski wrote: