EXECUTE STATEMENT and charset introducers

55 views
Skip to first unread message

Dmitry Kovalenko

unread,
Sep 20, 2023, 9:17:37 AM9/20/23
to firebir...@googlegroups.com
Hello,

I think it is a question for Adriano :)

I test the following scenario of using EXECUTE STATEMENT:

1. I insert into WIN1251-BLOB the text of the following SQL:

insert into TBL_CS__WIN1251 (COL_BLOB) values (_utf8 'тут будут UTF8-символы')

'тут будут UTF8-символы' - it is a text with russian symbols - 'here will be UTF8-symbols'

Note that I defined _utf8 before this text.

Because I write this text into WIN1251-BLOB, this text will consist only of WIN1251-symbols (not UTF8 symbols).

2. I want to execute this stored SQL in EXECUTE BLOCK.

execute block as declare STMT varchar(256);
begin for select COL_BLOB from TBL_CS__WIN1251 into :STMT DO EXECUTE STATEMENT :STMT; end

When I do it, I get the error "malformed string"

--------- SCRIPT
delete from TBL_CS__WIN1251;

insert into TBL_CS__WIN1251 (COL_BLOB) values (_win1251 'insert into TBL_CS__WIN1251 (COL_BLOB) values (_utf8 ''тут будут UTF8-символы'')');

select COL_BLOB from TBL_CS__WIN1251;

execute block as declare STMT varchar(256);
begin for select COL_BLOB from TBL_CS__WIN1251 into :STMT DO EXECUTE STATEMENT :STMT; end

---------
image.png
--------
Question

Should "EXECUTE STATEMENT" process _utf8 introducers in executed SQL ?

--------
Just for fun, I made a little gif with a demonstration of this problem in a similar script.

I used another insert statement here. 

insert into TBL_CS__WIN1251 (COL_BLOB) values (_utf8 'insert into TBL_CS__WIN1251 (COL_BLOB) values (_utf8 ''тут будут UTF8-символы'')');

This change does not matter.

2023_07_12--02--fb4_execute_stmt_and_cs_introducer-2.gif

Regards,
Dmitry Kovalenko

Dimitry Sibiryakov

unread,
Sep 20, 2023, 9:20:50 AM9/20/23
to firebir...@googlegroups.com
Dmitry Kovalenko wrote 20.09.2023 15:17:
> Because I write this text into WIN1251-BLOB, this text will consist only of
> WIN1251-symbols (not UTF8 symbols).

And that's the same error again: query text with introducers is strictly
binary, it cannot be stored or handled as any character string at all.
That's why indroducers is crap and must die.

--
WBR, SD.

Vlad Khorsun

unread,
Sep 20, 2023, 10:42:14 AM9/20/23
to firebir...@googlegroups.com
20.09.2023 16:17, Dmitry Kovalenko wrote:

> Just for fun, I made a little gif with a demonstration of this problem in a similar script.

NEVER add such huge attachments "for fun" !
Never add pictures into this list without abosolute need.

Thanks for understanding,
Vlad

Adriano dos Santos Fernandes

unread,
Sep 20, 2023, 11:02:13 AM9/20/23
to firebir...@googlegroups.com


Em qua., 20 de set. de 2023 10:17, Dmitry Kovalenko <dmitry....@gmail.com> escreveu:
Hello,

I think it is a question for Adriano :)

I test the following scenario of using EXECUTE STATEMENT:

Full metadata script and connection charset, please.


Adriano

Dmitry Kovalenko

unread,
Sep 20, 2023, 12:03:49 PM9/20/23
to firebir...@googlegroups.com
Hello

Full metadata script and connection charset, please.

This script is attached to this email (text has win1251 charset)

Top level insert does not use an introducer because isql does not support it.

SET NAMES WIN1251;
CONNECT inet4://localhost/d:\Database\FB_04_0_0\IBP_TEST_FB40_D3.GDB USER SYSDBA PASSWORD masterkey;
RECREATE TABLE TBL_CS__WIN1251 (COL_BLOB BLOB SUB_TYPE TEXT CHARACTER SET WIN1251);
INSERT INTO TBL_CS__WIN1251 (COL_BLOB) values ('insert into TBL_CS__WIN1251 (COL_BLOB) values (_utf8 ''тут будут UTF8-символы'')');
SELECT COL_BLOB FROM TBL_CS__WIN1251;
SET TERM !!;
EXECUTE BLOCK AS DECLARE STMT VARCHAR(256); BEGIN FOR SELECT COL_BLOB FROM TBL_CS__WIN1251 INTO :STMT DO EXECUTE STATEMENT :STMT; END!!
SET TERM ;!!

---- Console

d:\Users>chcp 1251
Текущая кодовая страница: 1251

d:\Users>d:\Program_Files\Firebird_4_0_x64\isql -i execute_statement_and_introducers.sql
Use CONNECT or CREATE DATABASE to specify a database

         COL_BLOB
=================
           1528:0
==============================================================================
COL_BLOB:

insert into TBL_CS__WIN1251 (COL_BLOB) values (_utf8 'тут будут UTF8-символы')
==============================================================================

Statement failed, SQLSTATE = 22000
Dynamic SQL Error
-SQL error code = -104
-Malformed string
-At block line: 1, col: 106
After line 11 in file execute_statement_and_introducers.sql


d:\Users>

Regards,
Dmitry Kovalenko
execute_statement_and_introducers.sql

Vlad Khorsun

unread,
Sep 20, 2023, 12:28:21 PM9/20/23
to firebir...@googlegroups.com
20.09.2023 19:03, Dmitry Kovalenko wrote:

> Top level insert does not use an introducer because isql does not support it.

You or me have wrong understanding of charset introducer's.

I always considered it as a *declaration* of charset of following literal.
And that charset could be not the same as script charset or connection charset.
I.e. following string *already* encoded using introducer charset and client
tools have nothing to do with it, engine does. Therefore nor isql, nor EXECUTE
STATEMENT have nothing to do with charset introducer, no special "support" is
required. Also, it is impossible to store correct string with _utf8 literal into
blob with WIN1251 encoding.

Regards,
Vlad

Adriano dos Santos Fernandes

unread,
Sep 20, 2023, 9:59:54 PM9/20/23
to firebir...@googlegroups.com
On 20/09/2023 13:28, Vlad Khorsun wrote:
> 20.09.2023 19:03, Dmitry Kovalenko wrote:
>
>> Top level insert does not use an introducer because isql does not
>> support it.
>
>   You or me have wrong understanding of charset introducer's.
>
>   I always considered it as a *declaration* of charset of following
> literal.
> And that charset could be not the same as script charset or connection
> charset.
> I.e. following string *already* encoded using introducer charset and client
> tools have nothing to do with it, engine does. Therefore nor isql, nor
> EXECUTE
> STATEMENT have nothing to do with charset introducer, no special
> "support" is
> required.

Correct.


> Also, it is impossible to store correct string with _utf8
> literal into
> blob with WIN1251 encoding.
>

Possible it is. But he then asks for it (the WIN1251 string) to be
interpreted as an UTF8, hence the malformed error.


Adriano


Mark Rotteveel

unread,
Sep 21, 2023, 3:19:10 AM9/21/23
to firebir...@googlegroups.com
It is a SQL standard feature, and when combined with hex-literals it is
pretty powerful (though that case is not a standard feature).

Mark
--
Mark Rotteveel

Mark Rotteveel

unread,
Sep 21, 2023, 3:26:35 AM9/21/23
to firebir...@googlegroups.com
On 20-09-2023 15:17, Dmitry Kovalenko wrote:
> I test the following scenario of using EXECUTE STATEMENT:
>
> 1. I insert into *WIN1251*-BLOB the text of the following SQL:
>
> insert into TBL_CS__WIN1251 (COL_BLOB) values (*_utf8* 'тут будут
> UTF8-символы')
>
> 'тут будут UTF8-символы' - it is a text with russian symbols - 'here
> will be UTF8-symbols'
>
> Note that I defined *_utf8* before this text.
>
> Because I write this text into WIN1251-BLOB, this text will consist only
> of WIN1251-symbols (not UTF8 symbols).

The mistake you make is thinking that an introducer is a cast, but it is
not. It is an instruction to interpret the *bytes* within the string
literal in the specified character set instead of the connection
character set. If those bytes are not valid characters in the introducer
character set, you'll get a malformed string error.

So, if the string was encoded using the connection character set, you
cannot use introducers this way (except if all characters are plain ASCII).

The only way you could make this work, is if you construct the statement
in binary form with multiple character sets, instead of a single
character set encoded with the connection character set.

Mark
--
Mark Rotteveel

Dimitry Sibiryakov

unread,
Sep 21, 2023, 4:52:52 AM9/21/23
to firebir...@googlegroups.com
'Mark Rotteveel' via firebird-devel wrote 21.09.2023 9:19:
> when combined with hex-literals it is pretty powerful (though that case is not a
> standard feature).

Yes, hex literal (or any other encoded binary literal) is the only safe way
of using them.
Frankly I have no idea what ANSI committee was thinking creating them in this
form.

--
WBR, SD.
Reply all
Reply to author
Forward
0 new messages