Lazarus + Firebird : question about charset

391 views
Skip to first unread message

Marc Lebrun

unread,
Feb 12, 2024, 8:54:27 AM2/12/24
to firebird-support

I have been asked to connect a Lazarus application to a Firebird 3.0 database. The problem is, the database has been around for twenty years, and it was created without specifying any default charset.

Using FlameRobin, I can read/write strings containing accented characters without any problem. In the database registration info, I just leave NONE as the charset for the connection, and as long as we all work from the same environment (Windows / European language) everything gets stored and retrieved in the same charset, so we have no problem at all.

The problem comes with Lazarus, I can't find a way to properly configure the TIBConnection component to display accented strings correctly. I tried leaving its charset property empty, or setting it to NONE, ISO8859_1, WIN1252, or event UTF8... each time, the accented characters are replaced with question marks.

It seems to me that when you write to a database without specifying a charset, the database doesn't do any conversion and writes the characters as it receives them. Because of this, we are obliged to reread them in exactly the same conditions. That's what I can't reproduce with Lazarus.

What can I do to solve this problem?

Thanks :-)

Dimitry Sibiryakov

unread,
Feb 12, 2024, 8:58:00 AM2/12/24
to firebird...@googlegroups.com
Marc Lebrun wrote 12.02.2024 14:50:
> Because of this, we are obliged to reread them in exactly the same conditions.
> That's what I can't reproduce with Lazarus.
>
> What can I do to solve this problem?

Lazarus strings are UTF-8 strings. Use explicit conversion from ANSI into
UTF-8 on data retrieval.

--
WBR, SD.

Tomasz Tyrakowski

unread,
Feb 12, 2024, 9:25:40 AM2/12/24
to firebird...@googlegroups.com
It depends in part which string type your Lazarus application uses. See
here: https://wiki.lazarus.freepascal.org/String. Using compiler
switches, you can make your application use either single-byte
AnsiStrings or UTF16 strings as the underlying string type (string is
actually just an alias, it's either AnsiString, ShortString or
UnicodeString).
In your database you've got strings stored in single byte encoding, so
there are some characters stored with codes above 127, which
interpretation (grphical representation) depends on the code page.
You've got no code page set in the database, so Firebird can't do any
automatic character translation for you (between database and client
character set).
You need to retrieve the text from the database (think of it not as
strings but just as simple byte arrays) and interpret the character
codes according to the code page of your choice (the one you set in FPC
RTL as the current code page).
Probably the simplest solution would be to use AnsiStrings as the
default in the application ({$H+}), leave the connection charset empty
and pray that the data in the DB is indeed stored in the code page you
believe it is (otherwise you may get invalid visual representation for a
given char code).
See also here:
https://www.freepascal.org/docs-html/rtl/system/unicodesupport.html
(maybe setting DefaultSystemCodePage will help).
The reason you're getting question marks now is that you most probably
use UnicodeString as your string type, in which case FPC assumes UTF-8
encoding. In normal cases you'd set UTF-8 in the connection encoding and
Firebird would translate the single byte CP-12xx strings from the tables
to UTF-8 strings wile sending them to you. But in your case Firebird is
not able to do the translation because it can't guess the interpretation
of the char codes stored in the DB (and translate them to proper UTF-8
codes) due to the lack of DB encoding settings.

regards
Tomasz

Marc Lebrun

unread,
Feb 12, 2024, 9:39:21 AM2/12/24
to firebird-support
Thanks Tomasz, your anwser gives me something to work on :-)

Marc Lebrun

unread,
Feb 13, 2024, 3:46:05 AM2/13/24
to firebird-support

Hello, I'm still stuck with my charset problem. I have tried several ways to convert the string, but none of these are working :

Edit1.Text := SQLQuery1.FieldByName('name').AsString;

Edit1.Text := Utf8ToAnsi(SQLQuery1.FieldByName('name').AsString);

Edit1.Text := AnsiToUtf8(SQLQuery1.FieldByName('name').AsString);

I'm trying to understand how the result string is encoded, so I display the hexadecimal value of each character. Here are two examples, where the question mark should be "è" (U+00E8) or "ë" (U+00EB). As you can see, this is exactly the values that I find in the hexadecimal codes :

Ad?le CROAIN
41 64 E8 6C 65 20 43 52 4F 41 49 4E 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 

Jo?l DEKEYSER
4A 6F EB 6C 20 44 45 4B 45 59 53 45 52 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

I've been told that when the string is received from "AsString" it's too late to convert it, the string is already spoiled, and I should do the conversion somewhere else. Do you know what it means ? I have not been able to find myself...

Thank you all if you can help me :-)

Tomasz Tyrakowski

unread,
Feb 13, 2024, 4:00:08 AM2/13/24
to firebird...@googlegroups.com
On 13.02.2024 at 09:46, Marc Lebrun wrote:
> Hello, I'm still stuck with my charset problem. I have tried several ways
> to convert the string, but none of these are working :
>
> Edit1.Text := SQLQuery1.FieldByName('name').AsString;
>
> Edit1.Text := Utf8ToAnsi(SQLQuery1.FieldByName('name').AsString);
>
> Edit1.Text := AnsiToUtf8(SQLQuery1.FieldByName('name').AsString);
>
[...]
>
> I've been told that when the string is received from "AsString" it's too
> late to convert it, the string is already spoiled, and I should do the
> conversion somewhere else. Do you know what it means ? I have not been able
> to find myself...

Browsing the docs of TIB_Connection I stumbled upon this:

https://lazarus-ccr.sourceforge.io/docs/fcl/ibconnection/tibconnection.useconnectioncharsetifnone.html

According to the docs, the meaning of the property is as follows:

"For string/blob fields with codepage none, use the connection character
set when copying data"

which looks like a solution to your exact problem.
So, setting the connection charset to your code page (e.g. WIN-1252) in
the TIB_Connection, and also turning on this property, might be the way
to go.
Worth giving it a shot.

regards
Tomasz

Mathias Pannier (unitel)

unread,
Feb 13, 2024, 4:03:36 AM2/13/24
to firebird...@googlegroups.com

Hello,

 

Did you have something like:

SQLQuery1.FieldByName('name').AsWideString

 

I’ve used FIB+ components for some time in Delphi. There was a difference using    FieldByName(‘fieldname’).AsString    or     FieldByName(‘fieldname’).AsWideString.

See also FieldType ftWideString: https://wiki.freepascal.org/Database_field_type

 

Regards

Mathias

 

Von: firebird...@googlegroups.com [mailto:firebird...@googlegroups.com] Im Auftrag von Marc Lebrun
Gesendet: Dienstag, 13. Februar 2024 09:46
An: firebird-support <firebird...@googlegroups.com>
Betreff: Re: [firebird-support] Lazarus + Firebird : question about charset

 

ACHTUNG: Diese E-Mail stammt von einem externen Absender. Bitte vermeiden Sie es, Anhänge oder externe Links zu öffnen.

 

--
You received this message because you are subscribed to the Google Groups "firebird-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebird-suppo...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/firebird-support/fb2ed8ff-db3f-49b6-b845-e5cb014cbf8an%40googlegroups.com.

ub.unitel GmbH, Schulstraße 16, 06792 Sandersdorf-Brehna
Geschaeftsfuehrung Klaus Richter, Olaf Meyer
Amtsgericht Stendal
HRB 26389 FA Bitterfeld Steuernr. 116/107/08597 Ust.identNr. DE815796778
Deutsche Bank IBAN DE53 86070024 0 6143234 00
Kreissparkasse Anhalt-Bitterfeld IBAN DE69 80053722 0 3050326 82
_____________________________________________________________________
Dieses E-Mail ist nur für den Empfänger bestimmt, an den es gerichtet
ist und kann vertrauliches bzw. unter das Berufsgeheimnis fallendes
Material enthalten. Jegliche darin enthaltene Ansicht oder Meinungs-
äußerung ist die des Autors und stellt nicht notwendigerweise die
Ansicht oder Meinung von ub.unitel GmbH dar.
Sind Sie nicht der Empfänger, so haben Sie diese E-Mail irrtümlich
erhalten und jegliche Verwendung, Veröffentlichung, Weiterleitung,
Abschrift oder jeglicher Druck dieser E-Mail ist strengstens untersagt.
_____________________________________________________________________

Ertan Küçükoglu

unread,
Feb 13, 2024, 4:35:41 AM2/13/24
to firebird...@googlegroups.com
Hi,

I believe below would work fine but I cannot test since I don't have TIBDAC.

Delphi: Set UseUnicode to true on TIBConnection, use WideString for field read/write.
Lazarus: Set UseUnicode to true on TIBConnection use WideString for all field read/write.


Marc Lebrun <marcl...@gmail.com>, 13 Şub 2024 Sal, 11:46 tarihinde şunu yazdı:

Marc Lebrun

unread,
Feb 16, 2024, 5:28:40 AM2/16/24
to firebird-support
Hello, there is some good news and bad news : reading works, but not writing.
The property UserConnectionCharSetIfNone has been very useful, now I can read the values correctly :-)
However when I want to write, the problem is still not solved.
Here's my code :

  IBConnection := TIBConnection.Create(self);
  try
    IBConnection.HostName     := 'localhost';
    IBConnection.DatabaseName := 'c:\path\to\my\database.fdb';
    IBConnection.CharSet      := 'WIN1252';
    IBConnection.UserName     := '... my user name ...';
    IBConnection.Password     := '... my password ...';
    IBConnection.UseConnectionCharSetIfNone := True;
    IBConnection.Open;

    SQLTransaction := TSQLTransaction.Create(self);
    try
      SQLTransaction.Database := IBConnection;
      SQLTransaction.StartTransaction;

      SQLQuery := TSQLQuery.Create(self);
      try
        SQLQuery.DataBase := IBConnection;
        SQLQuery.Transaction := SQLTransaction;
        SQLQuery.SQL.Add('update personnel set nom = :nom where id = :id');
        SQLQuery.ParamByName('id') .AsInteger := 5000;
        SQLQuery.ParamByName('nom').AsString := 'Béatrice';
        SQLQuery.ExecSQL;
      finally
        SQLQuery.Free;
      end;

    finally
      SQLTransaction.Commit;
      SQLTransaction.Free;
    end;

  finally
    IBConnection.Free;
  end;

So my connection is in charset WIN1252.
And I believe Lazarus uses UTF8.
The string 'Béatrice' is written as 'Béatrice', no matter how I try to convert.
I tried all combinations of .AsString .AsUnicodeString .AsAnsiString .AsWideString with convertions of the value with Utf8ToAnsi(), AnsiToUtf8(), etc...

Nothing is working.

Here are the hexadecimal values before and after, for comparison :
Béatrice :  42 C3 A9 61 74 72 69 63 65
Béatrice : 42 C3 83 C2 A9 61 74 72 69 63 65

So it looks like the byte C3 has been replaced with three bytes C3 83 C2
All the rest is unchanged

I'm about to give up...

Dimitry Sibiryakov

unread,
Feb 16, 2024, 5:32:16 AM2/16/24
to firebird...@googlegroups.com
Marc Lebrun wrote 16.02.2024 11:28:
> So it looks like the byte C3 has been replaced with three bytes C3 83 C2

No, it is just double ANSI->UTF-8 encode.
AFAIK Lazarus and all its components are open source so you can freely debug
them. All you need it to trace whole way of data from screen to Firebird API and
check every transformation as well as encoding of result.

--
WBR, SD.

Ertan Küçükoglu

unread,
Feb 16, 2024, 6:18:08 AM2/16/24
to firebird...@googlegroups.com
Hello,

'Béatrice' is an UTF8 text. So, you need to convert it accordingly.

What I believe is if there are international inputs in your database then your connection charset should be set to UTF8 on all connections. You also need to use AsWideString on your read/write.

Thanks & Regards,
Ertan

Marc Lebrun <marcl...@gmail.com>, 16 Şub 2024 Cum, 13:28 tarihinde şunu yazdı:

Tomasz Tyrakowski

unread,
Feb 16, 2024, 6:28:49 AM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 at 12:17, Ertan Küçükoglu wrote:
> 'Béatrice' is an UTF8 text. So, you need to convert it accordingly.
>
> What I believe is if there are international inputs in your database then
> your connection charset should be set to UTF8 on all connections. You also
> need to use AsWideString on your read/write.

I disagree. That would lead to the situation, in which some texts in the
database (the older ones or added by other, ANSI-based applications) are
ANSI-encoded and some are UTF-8, all of them mixed with each other in
columns with encoding NONE. That would be a complete mess as you'd have
no way of knowing whether a particular string coming from the database
is ANSI or UTF8.

regards
Tomasz

Virgo Pärna

unread,
Feb 16, 2024, 6:36:00 AM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 12:28, Marc Lebrun wrote:
>
> So my connection is in charset WIN1252.
> And I believe Lazarus uses UTF8.
> The string 'Béatrice' is written as 'Béatrice', no matter how I try to
> convert.
> I tried all combinations of .AsString .AsUnicodeString .AsAnsiString
> .AsWideString with convertions of the value with Utf8ToAnsi(),
> AnsiToUtf8(), etc...
>

What is actual database charset? Or at least charset of nom field?
On can you check, what is value of DefaultSystemCodePage in Lazarus at
runtime.


--
Virgo Pärna
Gaiasoft OÜ
vi...@gaiasoft.ee

Tomasz Tyrakowski

unread,
Feb 16, 2024, 6:36:51 AM2/16/24
to firebird...@googlegroups.com, Marc Lebrun
On 16.02.2024 at 11:28, Marc Lebrun wrote:
> Hello, there is some good news and bad news : reading works, but not
> writing.
> The property UserConnectionCharSetIfNone has been very useful, now I can
> read the values correctly :-)
> However when I want to write, the problem is still not solved.

First, you haven't said from the beginning, that you intended to write
to the database, just to display strings correctly ;)
Just joking, mean no offense.
I'll try to do some tests when I've got more time to spare, but I'm
gonna do excactly what Dimitry suggested to you: debug the string
assignment in TIB_Query step by step and see if there's a way to force
the ANSI-encoded string to be passed to the database as is, without
implicit conversion to UTF-8 by sql-db code. Perhaps you can do the same
but much sooner, considering you're the one most interested in the
outcome :)

regards
Tomasz

Marc Lebrun

unread,
Feb 16, 2024, 7:43:48 AM2/16/24
to firebird-support
Thanks Tomasz.
I tried to do it myself, but the components are compiled in the IDE, I'm not sure if the debug is possible.
I tried setting a breakpoint on the line where i do SQLQuery.ParamByName('nom').AsString := 'Béatrice';
Then F7 would normally step into the code of these functions but in this case it's actign as if I pressed F8 to step over.
So I don't know how I can debug it...

Dimitry Sibiryakov

unread,
Feb 16, 2024, 7:48:22 AM2/16/24
to firebird...@googlegroups.com
Marc Lebrun wrote 16.02.2024 13:43:
> So I don't know how I can debug it...

https://www.google.com/search?q=Lazarus+debug+RTL

--
WBR, SD.

Virgo Pärna

unread,
Feb 16, 2024, 7:52:24 AM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 14:43, Marc Lebrun wrote:
> Thanks Tomasz.
> I tried to do it myself, but the components are compiled in the IDE, I'm
> not sure if the debug is possible.
> I tried setting a breakpoint on the line where i do
> SQLQuery.ParamByName('nom').AsString := 'Béatrice';


There is additional question: what is codepage of string constant.
If source file is utf-8 encoded, then you can add
{$codepage UTF8}
at the beginning of file (just in case).
Also: what is the compiler mode you are compiling in?

Tomasz Tyrakowski

unread,
Feb 16, 2024, 7:53:52 AM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 at 13:43, Marc Lebrun wrote:
> Thanks Tomasz.
> I tried to do it myself, but the components are compiled in the IDE, I'm
> not sure if the debug is possible.

Free Pascal / Lazarus doesn't support dynamically linked, pre-compiled
component libraries (like BPL in Delphi) - all units are statically
linked into your executable, so installing in the IDE is one thing, and
linking in your project is another. However, it may be possible the
precompiled SQLdb ppu files (without debug info) get linked with your
exe, so try the procedure I described below.

> I tried setting a breakpoint on the line where i do
> SQLQuery.ParamByName('nom').AsString := 'Béatrice';
> Then F7 would normally step into the code of these functions but in this
> case it's actign as if I pressed F8 to step over.
> So I don't know how I can debug it...

Open Tools / Configure "Build lazarus" and choose "Debug IDE" instead of
"Optimized IDE" or "Normal IDE", then rebuild Lazarus, rebuild your app
and try debugging again.

regards
Tomasz

Marc Lebrun

unread,
Feb 16, 2024, 8:14:38 AM2/16/24
to firebird-support
Thanks I will try that.

In the meantime I was looking at Virgo's question about DefaultSystemCodePage.
I found two different values :
DefaultSystemCodePage = 65001
DefaultUnicodeCodePage = 1200
Which correspond to UTF8 and UTF16, according to "systemh.inc" :

const
  CP_ACP     = 0;     // default to ANSI code page
  CP_OEMCP   = 1;     // default to OEM (console) code page
  CP_UTF16   = 1200;  // utf-16
  CP_UTF16BE = 1201;  // unicodeFFFE
  CP_UTF7    = 65000; // utf-7
  CP_UTF8    = 65001; // utf-8
  CP_ASCII   = 20127; // us-ascii
  CP_NONE    = $FFFF; // rawbytestring encoding


Regarding the compiler mode, I have this line on top of all my units :
{$mode ObjFPC}{$H+}

Virgo Pärna

unread,
Feb 16, 2024, 8:27:04 AM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 15:14, Marc Lebrun wrote:
> Thanks I will try that.
>
> In the meantime I was looking at Virgo's question about
> DefaultSystemCodePage.
> I found two different values :
> DefaultSystemCodePage = 65001

So AnsiString is supposed to be UTF-8 encoded.

I know that I have in my lazarus program
if DefaultSystemCodePage = 65001 then
IBConnection.CharSet := 'UTF-8'
else
IBConnection.CharSet := 'WIN1252';

Database charset is actually 'WIN1252'
And then I can save accented characters in win1252 range cocrectly to
database


>
> Regarding the compiler mode, I have this line on top of all my units :
> {$mode ObjFPC}{$H+}
>

Ok. So not delphiunicode, which always forces source file to compiler
DefaultSystemCodePage.
{$codepage UTF8} would be probably good idea to make sure, that string
constants are also utf8 encoded (assuming file is actually utf8).

Dimitry Sibiryakov

unread,
Feb 16, 2024, 8:30:00 AM2/16/24
to firebird...@googlegroups.com
Virgo Pärna wrote 16.02.2024 14:27:
>     So AnsiString is supposed to be UTF-8 encoded.
>
>     I know that I have in my lazarus program
> if DefaultSystemCodePage = 65001 then
>   IBConnection.CharSet := 'UTF-8'
> else
>   IBConnection.CharSet := 'WIN1252';
>
> Database charset is actually 'WIN1252'
> And then I can save accented characters in win1252 range cocrectly to database

Unfortunately it won't help topic starter because they has fields with
charset NONE. Firebird cannot convert string from connection charset into field
charset in this case.
Perhaps there is a way to make AnsiString to be really ANSI in Lazarus.

--
WBR, SD.

Tomasz Tyrakowski

unread,
Feb 16, 2024, 8:53:01 AM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 at 14:29, 'Dimitry Sibiryakov' via firebird-support wrote:
>   Unfortunately it won't help topic starter because they has fields
> with charset NONE. Firebird cannot convert string from connection
> charset into field charset in this case.
>   Perhaps there is a way to make AnsiString to be really ANSI in Lazarus.
>

That's exactly the issue here. In normal case, you use AnsiString when
you need it to be, well... ANSI ;) and UnicodeString for wide
characters. The basic "string" type is just an alias to one or the other.
However, if SQLdb, and more precisely the fcl-db package from Free
Pascal (the library implementing TIB_Connection, TIB_Query, etc.)
converts internally all strings to UTF-8, it poses a problem.
But I don't believe it to be the case, because if all strings were
quietly converted to UTF-8 inside SQLdb, what would be the point in
letting the client code set the character set in TIB_Connection? UTF-8
would be the only viable option anyway.
So, I think there has to be a away to convince SQLDb to pass ANSI
strings to Firebird without messing with them in between.

regards
Tomasz

Virgo Pärna

unread,
Feb 16, 2024, 8:53:30 AM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 15:29, 'Dimitry Sibiryakov' via firebird-support wrote:
>
>   Unfortunately it won't help topic starter because they has fields
> with charset NONE. Firebird cannot convert string from connection
> charset into field charset in this case.
>   Perhaps there is a way to make AnsiString to be really ANSI in Lazarus.
>

So.. Maybe..

{$CODEPAGE 1252}
DefaultSystemCodePage := CP_ACP;

Tomasz Tyrakowski

unread,
Feb 16, 2024, 1:57:26 PM2/16/24
to firebird...@googlegroups.com
On 16.02.2024 at 14:14, Marc Lebrun wrote:
> Thanks I will try that.
>
> In the meantime I was looking at Virgo's question about
> DefaultSystemCodePage.
> I found two different values :
> DefaultSystemCodePage = 65001
> DefaultUnicodeCodePage = 1200
> Which correspond to UTF8 and UTF16, according to "systemh.inc" :
[...]
> Regarding the compiler mode, I have this line on top of all my units :
> {$mode ObjFPC}{$H+}

I (hopefully) recreated your environment (with the exception my CP is
1250, not 1251, but I don't think it makes any difference in principle -
it's still a specific ANSI code page).
So, I've created a databse with default character set NONE and a table
with a VARCHAR field in it (Firebird 4.0.4). I've inserted a string with
FlameRobin, with connection charset set to WIN1250 (that's my local ANSI
code page, I chose this one because I can easily enter CP-1250 accented
characters from my keyboard, unlike CP-1250 ones) and with some Polish
accented characters (which obviously would show garbage if you tried to
interpret them as UTF8).
Then I made a simple Lazarus app with TIB_Connection and TIB_Query, the
connection charset is set to 'WIN1250' and UseConnectionCharSetIfNone is
set to true. The compiler mode is set as yours, DefaultSystemCodePage
and DefaultUnicodeCodePage on my machine show the same values as yours.
Now, I can read rows from the table and display valid strings with
accented characters. Moreover, I can insert new rows into the table from
within the Lazarus app (with strings containing accented characters),
select them back and show them in the app, as well as select them in
FlameRobin (in both cases properly accented characters are shown). The
inserted strings come straight from a TEdit:

q.ParamByName('TXT').AsString := Edit1.Text;

and the strings being displayed come straight from the query:

s := q.FieldByName('TXT').AsString;
ShowMessage(s);

So, all in all, in my opinion it _does_ work as expected both ways
(UseConnectionCharSetIfNone seems to do the trick in both directions).
You must have made a mistake somewhere or there's something else in your
setup you haven't written about, which makes the difference in behavior.
I can send you a zipped fdb file + Lazarus project sources on PM if
you're interested (I'm not sure what the policy about substantial
attachments is on this list, so I don't want to violate rules or pollute
the list with a meg-or-so zip file).

regards
Tomasz

Reply all
Reply to author
Forward
0 new messages