CODEPAGE: why my DBF are created without CP?


TomFort

Jun 15, 2012, 4:16:07 AM
to Harbour Users
When my .PRG creates a DBF, the file has no codepage set...

Why?

I am a neophyte... and perhaps I don't understand how the DBCREATE()
function works, where the seventh parameter is precisely <cCodePage>...

What value should I enter? "ITWIN"? "1252" (Windows ANSI)? Something else?

OR

What is the sequence of commands to create a DBF with a particular CP?

I ran various tests but got no results...

and I can find no documentation...


Thanks

Tom

Massimo Belgrano

Jun 15, 2012, 6:34:37 AM
to harbou...@googlegroups.com

Here is Klas's post:

This is a followup to a thread about codepages that has been going on for a few days. The discussion in that thread was, among other things, about using different codepages in the db tables and in the rest of the application. This can be useful for keeping the old <XX>437 codepage in existing files and using for example an <XX>WIN codepage in the user interface.

If you have additional codepage info that might be useful to other developers, please post that info in this thread.

Here is what I found when digging into the changelog and reading probably a thousand newsgroup messages in the last few days. There are two methods that can be used to convert data between character fields with CP 437 encoding and some other codepage for say/get, and neither has been "actively marketed". So I felt it was time to do that.

#1: DBUSEAREA() has eight arguments in Harbour, two more than Clipper. The seventh is <cCodePage>. This is not documented in the docs for the function in doc\en\rdddb.txt, but it was added already in 2002. This can be used to specify a codepage upon opening a specific dbf.

#2: SET( _SET_DBCODEPAGE ) was added in 2009 and sets the default codepage for use with all subsequently opened dbf files. The codepage can be set once and then (mostly) forgotten about. Everything happens automatically.

If there are tables where character fields must not be converted from one codepage to another, for example fields with encrypted passwords or other binary data, it is important to turn off codepage conversion for that particular table, or to convert the field in question to hex once and for all and not worry about it again.

According to my tests, passing NIL as the <cCodePage> argument is not enough to turn off the codepage conversion in DBUSEAREA(). Pass a real codepage or unset SET( _SET_DBCODEPAGE ) temporarily:

cCp := SET( _SET_DBCODEPAGE, NIL )
USE ...
SET( _SET_DBCODEPAGE, cCp )

If the general codepage for the application, SET( _SET_CODEPAGE ) is set, for example, to a Windows codepage and SET( _SET_DBCODEPAGE ) is not set, and the tables are in 437 encoding, then the <cCodePage> argument must be added to every call to DBUSEAREA(). That can be a problem if those calls are hidden in library code, or 3rd party code.

Here is the complete list of arguments to the DBUSEAREA() function:

DBUSEAREA( [<lNewArea>], [<cDriver>], <cName>, [<xcAlias>],;
[<lShared>], [<lReadonly>], [<cCodePage>], [<nConnection>] ) -> <lSuccess>

The codepage can also be specified in the command version of DBUSEAREA(), like this:

USE TableName CODEPAGE <cCodepage>
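
As a concrete illustration of both forms, here is a minimal sketch. The table name "customer", the alias CUST and the codepage ID "SV437C" are placeholders chosen for this example; it assumes that customer.dbf already exists and that the codepage is available in your Harbour build.

```harbour
REQUEST HB_CODEPAGE_SV437C   // assumption: the codepage used inside the table

PROCEDURE Main()

   // Function form: <cCodePage> is the seventh argument.
   // NIL for <cDriver> selects the default RDD.
   IF dbUseArea( .T., NIL, "customer", "CUST", .T., .F., "SV437C" )
      ? "Opened with the function form:", Alias()
      dbCloseArea()
   ENDIF

   // Command form of the same call
   USE customer ALIAS CUST NEW SHARED CODEPAGE "SV437C"
   IF Used()
      ? "Opened with the command form:", Alias()
      dbCloseArea()
   ENDIF

   RETURN
```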

DBCREATE() has five extra arguments compared to Clipper (two of them are documented). The <cCodePage> argument is number seven:

DBCREATE( <cFile>, <aStruct>, <cRDD>, <lKeepOpen>, <cAlias>,;
<cDelimArg>, <cCodePage>, <nConnection> ) -> <lSuccess>

BTW, in Clipper DBUSEAREA() and DBCREATE() return NIL, in Harbour they return <lSuccess>, which is also not documented except in the changelog (added by Przemek 2007-05-03).

To use CP 437 in db tables and CP WIN in the application, here are the settings to use:
SET( _SET_CODEPAGE, <mywindowscodepage> )
SET( _SET_DBCODEPAGE, <my437codepage> )
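
For instance, with Swedish codepages the two settings might look like the sketch below. The IDs "SVWIN" and "SV437C" are examples only and assume that both codepages exist in your Harbour build (check the src\codepage directory); substitute the ones that match your own data.

```harbour
#include "set.ch"

// Codepages must be linked into the executable before they can be selected
REQUEST HB_CODEPAGE_SVWIN
REQUEST HB_CODEPAGE_SV437C

PROCEDURE Main()

   Set( _SET_CODEPAGE, "SVWIN" )      // codepage used by the application (VM)
   Set( _SET_DBCODEPAGE, "SV437C" )   // default codepage for tables opened from now on

   ? "VM codepage:", Set( _SET_CODEPAGE )
   ? "DB codepage:", Set( _SET_DBCODEPAGE )

   RETURN
```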

When SET( _SET_DBCODEPAGE ) was implemented on 2009-12-06 Viktor wrote the following description of what he wanted to do (with his own corrections integrated in the original message):

-----------------------------------------------------------

Hi All,

I'm faced with the situation where I'd like to detach the CP used in DB tables from the CP used in other parts of the system, f.e. the CP used to encode human readable strings in source code.

[ The reason is that I decided to make the CP transition gradual, without converting several gigabytes of database together with the app. Also, if I switch to UTF-8 for strings used in source, I wouldn't like to switch to UTF-8 for the database yet, because all strings would become variable length. ]

So, my current choice is to extend all DBCREATE(), DBUSEAREA(), HB_DBCREATETEMP() and __DBOPENSDF() calls with the (dirty) Harbour <cCodePage> parameter extension.

For one thing this is quite error prone, as I may miss a few occurrences (or ones stored in lib code f.e.) thus corrupting the database. Second: I prefer not to use such dirty extensions.

This matter could be solved quite easily and cleanly with an app-global Harbour setting, which would set the _default_ CP used in all DB operations (just as we have had the default shared mode set in _SET_EXCLUSIVE since Clipper times).

So, I'd like to add a _SET_DBCODEPAGE setting which would control this centrally and cleanly, eliminating the potential errors that come with mass source code modification. The default value for this setting would be a neutral value (NIL/NULL), which means Harbour would behave exactly as it does now if someone doesn't use the setting.

The modification looks quite easy to make, so I can implement it.

Any opinions or objection?

Brgds,
Viktor

-----------------------------------------------------------

Here is the relevant part of the commit message

-----------------------------------------------------------

Revision: 13145
Author:   vszakats
Date:     2009-12-06 19:01:01 +0000 (Sun, 06 Dec 2009)

Log Message:
-----------
2009-12-06 19:59 UTC+0100 Viktor Szakats (harbour.01 syenar.hu)
 * src/vm/set.c
 * src/rdd/dbcmd.c
 * include/set.ch
 * include/hbset.h
  + Added support for SET( _SET_DBCODEPAGE ). This will set the
    default codepage for RDD operation. It affects following
    functions and everything which is based on them:
       DBUSEAREA()
       DBCREATE()
       HB_DBCREATETEMP()
       __DBOPENSDF()
    IOW every function which accepts current "dirty" Harbour
    extension <cCodePage>.
    This new SET() is useful if someone wants to use a different
    than app codepage in tables, without modifying every above
    calls to pass the db CP as extra parameter (plus maintaining
    this global setting in app code).

-----------------------------------------------------------

And here is an excerpt from a followup message from Przemek:

-----------------------------------------------------------

[...]

Anyhow, users should not forget that enabling any automatic codepage
translation is possible only for tables containing text data. Using
codepage translation for standard Clipper DBF tables that store
binary data directly in the table or in memo files will corrupt this data,
so it should be used very carefully, when the user knows exactly what he
is doing and what is stored in the tables used.

-----------------------------------------------------------

Hope this post can contribute to removing some of the confusion about codepages.

Regards,
Klas 


--
Massimo Belgrano

Klas Engwall

Jun 15, 2012, 9:38:40 AM
to harbou...@googlegroups.com
Hi Tom,
The <cCodePage> argument is not about *creating* a dbf with a specific
codepage but about *using* that codepage if you start saving data in the
dbf immediately after creating it without closing it first and then
reopening it. There is no place where a reference to the codepage is
saved in the dbf header, so you have to specify it in the application.
As I wrote in the post Massimo quoted, there are two ways you can do this:

1) By passing the codepage on every call to DBUSEAREA()
2) By setting a default SET(_SET_DBCODEPAGE, <your_db_codepage>) at the
top of the application to be used with every dbf

The interesting thing about DBCREATE(), since Clipper 5.x but
undocumented, is that by passing the <lKeepOpen> argument you do not
need to call DBUSEAREA() after DBCREATE() so you can start using it
immediately. And for that to work correctly you need to also pass the
<cCodePage> argument just like you would for a DBUSEAREA() call -
unless you use the application default approach above.

In my case, the codepage used in dbfs is "SV437C" because the result
matches the codepage that I used for the already existing dbfs in
Clipper. Are you starting fresh or do you have existing dbfs where data
is saved using a codepage that was decided a long time ago? If the
latter, use the same codepage for new dbfs too (this is an absolute must,
or you will get index corruption and a general mess). Otherwise use
whatever you think is appropriate. It must be an existing Harbour
codepage, see the src\codepage directory.

Conversion between the VM codepage and the DB codepage is automatic. For
this to work correctly it is recommended that you always specify
SET(_SET_CODEPAGE,<your_vm_codepage>) and
SET(_SET_DBCODEPAGE,<your_db_codepage>) at the top of the application.

Returning to DBCREATE(), if you have specified a SET(_SET_DBCODEPAGE),
that codepage will be used by default, so you don't have to worry about
it after setting it once.

Another interesting thing that I found is that the <lKeepOpen> flag, if
you use it, does not have the normal .T./.F. options but .T./NIL (!). A
logical value will always be interpreted as .T. even if it is .F., which
is Clipper compatible but undocumented. So if you create a dbf that you
do not want to keep open, set the <lKeepOpen> argument to NIL or do not
specify any of the undocumented arguments at all.
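
A minimal sketch of the two cases described above, creating a table that stays open and creating one that does not. The file names, the alias PEOPLE and the codepage ID "SV437C" are made up for this example, and the codepage is assumed to be available in your build.

```harbour
REQUEST HB_CODEPAGE_SV437C   // assumption: the codepage you want for table data

PROCEDURE Main()

   LOCAL aStruct := { ;
      { "ID",   "N",  6, 0 }, ;
      { "NAME", "C", 30, 0 } }

   // Create the table and keep it open: <cCodePage> (argument seven)
   // then applies to the data written right away.
   IF dbCreate( "people.dbf", aStruct, NIL, .T., "PEOPLE", NIL, "SV437C" )
      dbAppend()
      PEOPLE->ID   := 1
      PEOPLE->NAME := "Test"
      dbCloseArea()
   ENDIF

   // Create only: leave out <lKeepOpen> and everything after it
   dbCreate( "archive.dbf", aStruct )

   RETURN
```

If SET( _SET_DBCODEPAGE ) has been set at the top of the application, the seventh argument in the first call can be left out as well.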

Regards,
Klas

TomFort

Jun 18, 2012, 5:40:09 AM
to Harbour Users
Hi.

Thanks!

But I found these documents:

http://www.dbf2002.com/dbf-file-format.html

byte 29 : "code page mark"

and this:
http://www.dbase.com/knowledgebase/int/db7_file_fmt.htm

byte 29: "Language driver ID"

I have to create DBFs to share with users whose software was written
in Visual FoxPro. That software generates DBFs with byte 29 set,
and it has a command that inspects this byte.

So, from what you say, the only way to set byte 29 is to write it
directly, because there is no Harbour command that does it.
Right?

Tom

Klas Engwall

Jun 18, 2012, 8:37:18 PM
to harbou...@googlegroups.com
Hi Tom,

> Thanks!
>
> But I found these documents:
>
> http://www.dbf2002.com/dbf-file-format.html
>
> byte 29 : "code page mark"
>
> and this:
> http://www.dbase.com/knowledgebase/int/db7_file_fmt.htm
>
> byte 29: "Language driver ID"

Yes, but Harbour is a Clipper workalike, not a (recent) dBase or FoxPro
workalike. And Clipper must be able to understand files created by
Harbour. So try this page instead: http://www.zelczak.com/clipp_en.htm

At the time when the first dBase clones were created, Ashton-Tate had
not quite realized that European countries were not satisfied with just
7-bit US-ASCII, so there was not yet a standard for codepages. And
Nantucket came up with their own system that the Clipper clones follow.

> I have to create DBFs to share with users whose software was written
> in Visual FoxPro. That software generates DBFs with byte 29 set,
> and it has a command that inspects this byte.

Well, you forgot to mention that you needed a FoxPro solution.

> So, from what you say, the only way to set byte 29 is to write it
> directly, because there is no Harbour command that does it.
> Right?

Yes, a simple fopen(); fseek(); fwrite(); fclose() sequence will do it.
You can even write your own FoxDbCreate() function that wraps DbCreate()
(without the <lKeepOpen> flag) and runs the fopen() ... sequence after
creating it.
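
The fopen() ... sequence and the wrapper idea could be sketched roughly as follows. DbfSetCpMark() and the four-argument FoxDbCreate() are hypothetical helpers written for this example, not Harbour functions. The mark value 3 is what the FoxPro file format documentation lists for Windows ANSI (1252), so verify it against the table for your own codepage. The file name must include the .dbf extension because FOpen() does not add one.

```harbour
#include "fileio.ch"

// Hypothetical helper: overwrite the byte at offset 29 of a closed dbf file
FUNCTION DbfSetCpMark( cFile, nMark )

   LOCAL lOk := .F.
   LOCAL nHandle := FOpen( cFile, FO_READWRITE + FO_EXCLUSIVE )

   IF nHandle != F_ERROR
      IF FSeek( nHandle, 29, FS_SET ) == 29
         // Assumption: Chr() yields exactly one byte (VM codepage is not UTF-8)
         lOk := ( FWrite( nHandle, Chr( nMark ), 1 ) == 1 )
      ENDIF
      FClose( nHandle )
   ENDIF

   RETURN lOk

// Hypothetical wrapper: create the dbf, then patch the header
FUNCTION FoxDbCreate( cFile, aStruct, cRDD, nMark )

   // <lKeepOpen> is omitted so the file is closed before it is patched
   IF dbCreate( cFile, aStruct, cRDD )
      RETURN DbfSetCpMark( cFile, nMark )
   ENDIF

   RETURN .F.

PROCEDURE Main()

   LOCAL aStruct := { { "NAME", "C", 30, 0 } }

   ? "Header patched:", FoxDbCreate( "share.dbf", aStruct, NIL, 3 )

   RETURN
```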

But I do not know if that will help or not. I suspect (although I am no
FoxPro user) that FoxPro might be expecting the rest of the extended
header for the codepage setting to work, so this is what might happen:
Harbour creates a dBaseIII/Clipper dbf file with the short
dBaseIII/Clipper header and sets the file type byte to dBase III. You
add the codepage flag in byte 29, but FoxPro ignores it because that
flag is not a part of the dBaseIII file format standard. The solution is
probably to do it the other way around.

There is one thing you can try, though. The file type byte, the very
first byte of the dbf file, is 03h for a dBaseIII/Clipper dbf with no
memo fields. A basic VFP file type byte is 30h. If you call
hb_rddinfo(RDDI_TABLETYPE,DB_DBF_VFP) before creating the dbf file, and
#include "dbinfo.ch" at the top of the source file, then a dbf file with
a 30h file type will be created (with a short header). But I don't know
if that is enough for FoxPro to recognize the codepage byte. Try it and
see what happens. Either way, in Harbour you will have to specify the
codepage in the source code.
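
A small sketch of that experiment: switch the table type, create a file, and read back its first byte to see what was written. The file name and structure are invented for the example, and whether FoxPro then honours byte 29 remains untested here.

```harbour
#include "dbinfo.ch"
#include "fileio.ch"

PROCEDURE Main()

   LOCAL aStruct := { { "NAME", "C", 30, 0 } }
   LOCAL cByte := Space( 1 )
   LOCAL nHandle

   // Ask the default RDD to create VFP-type tables from now on
   hb_rddInfo( RDDI_TABLETYPE, DB_DBF_VFP )

   IF dbCreate( "vfptest.dbf", aStruct )
      // Read the file type byte (the very first byte of the file)
      nHandle := FOpen( "vfptest.dbf", FO_READ )
      IF nHandle != F_ERROR
         FRead( nHandle, @cByte, 1 )
         FClose( nHandle )
         ? "File type byte (hex):", hb_NumToHex( Asc( cByte ) )
      ENDIF
   ENDIF

   RETURN
```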

And one more thing. As I said last time, if you share the dbf between
Harbour and any other xBase version it is an absolute must that you use
a codepage that is 100% compatible with that other xBase version. Use
the cpinfo.prg utility in the harbour\tests directory to compare your
chosen codepage with what the other xBase version creates. It is written
for Harbour and Clipper, so you may have to modify it, change some
function names for example, for Fox. If the match is not absolute, your
applications *will* blow up with index corruption.

Good luck :-)

Regards,
Klas