ISQL: SET SMARTTERM

76 views
Skip to first unread message

Adriano dos Santos Fernandes

unread,
Sep 2, 2023, 9:51:12 PM9/2/23
to firebir...@googlegroups.com
Hi!

I'm experimenting and this looks doable and very interesting for ISQL users.

The idea is to have a SET SMARTTERM command that switches handling of
SET TERM to something smart.

When it's ON, the terminator becomes semi-colon and ISQL do less
processing on command. It does not tries to understand comments nor strings.

When it see a new line and the line has a semicolon, it tries to execute
the statement.

But it does it with help of an enhanced engine's IAttachment::prepare2
method.

This method has a new parameter "unsigned* stopLength".

When stopLength is null, it works as standard existing
IAttachment::prepare method.

When it's not null, some different behaviors happens in engine:
- It does not tries to erase trailing semicolon before parse
- It parses statement requiring semicolon at end
- If statement is parsed but there are tokens left, parse is considered
successful and *stopLength is set to the number of used bytes of the
statement text

Using IAttachment::prepare2 and checking for
isc_command_end_err/isc_command_end_err2, ISQL is able to handle these
things:

- Statements with semicolon inside quotes or comments passed to the
engine and not parsed correctly
- Multiple statements passed as a single one to engine

So it becomes possible to run commands without care about a term, all
that using engine's rules instead of having a client parse that may not
match engine's one.

---
SQL> execute block returns (o1 integer, o2 integer)
CON> as
CON> begin
CON> o1 = 1;
CON> o2 = 2;
CON> suspend;
CON> end;

O1 O2
============ ============
1 2
---

---
SQL> select 1 from rdb$database; select 2 from rdb$database;

CONSTANT
============
1

CONSTANT
============
2
---


Adriano

Alex Peshkoff

unread,
Sep 3, 2023, 1:59:45 AM9/3/23
to firebir...@googlegroups.com
On 9/3/23 04:51, Adriano dos Santos Fernandes wrote:
> SQL> execute block returns (o1 integer, o2 integer)
> CON> as
> CON> begin
> CON> o1 = 1;

Am I right thinking that at this point client should be able to handle
some specific error code from prepare2() and (in ISQL case) proceed with
next line input? If yes - looks like it will be useful to have a key
combination (like correct ctrl-C handler) to interrupt that if user got
lost in a mix of semicolons, various quotes, comments, begin/end.


Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 7:57:45 AM9/3/23
to firebir...@googlegroups.com
That would be really desired and useful.


Adriano

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 8:01:13 AM9/3/23
to firebir...@googlegroups.com
On 02/09/2023 22:51, Adriano dos Santos Fernandes wrote:
> But it does it with help of an enhanced engine's IAttachment::prepare2
> method.
>
> This method has a new parameter "unsigned* stopLength".

One point I'm in doubt with this api is that in case of semantic errors
after parse.

If I try to compile:

select '1' + 1 from rdb$database; select 2 from rdb$database;

Engine must consume first statement and report its length.

But engine is going to report an error too.

Do we have any other API that return output parameters even in case of
error status?


Adriano

Dimitry Sibiryakov

unread,
Sep 3, 2023, 8:01:16 AM9/3/23
to firebir...@googlegroups.com
Why not just make Firebird to execute batches and feed whole script to
IAttachment::execute()?..

--
WBR, SD.

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 8:12:11 AM9/3/23
to firebir...@googlegroups.com
On 03/09/2023 09:01, 'Dimitry Sibiryakov' via firebird-devel wrote:
> Why not just make Firebird to execute batches and feed whole script to
> IAttachment::execute()?..
>

This looks out of context for interactive mode.


Adriano

Dimitry Sibiryakov

unread,
Sep 3, 2023, 8:27:48 AM9/3/23
to firebir...@googlegroups.com
Ugh, I missed that you suggested to feed input to engine until it forms
something meaningful.
I wonder how you are going to handle front-end commands like SHOW or OUTPUT.
On the base of syntax error returned? But the same error is returned from
incomplete EXECUTE BLOCK from your example.
And when an user must press "Ctrl-C" if they see that they type commands but
nothing happen (because of syntax error in the first line)?

--
WBR, SD.

Vlad Khorsun

unread,
Sep 3, 2023, 8:36:28 AM9/3/23
to firebir...@googlegroups.com
What if instead of attempt of preparing statement, this new API will attempt
to just parse it and return success/failure ? Then, on success, ISQL will
pass corresponding part of input to the usual sequence of prepare/describe/
execute/fetch... APIs.

I.e. introduce new parse() method that will return count of successfully
parsed bytes, if any. Return 0, if text contains no valid statement.

Also, it is not clear what ISQL should do if text entered contains both valid
and not complete statements, for example:

select 1 from rdbdatabase; select

would it drop the 2nd "select" ? How user should continue typing the incomplete
command ? I.e. should (s)he type the rest

2 from rdbdatabase;

or the full command

select 2 from rdbdatabase;



Regards,
Vlad

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 9:13:48 AM9/3/23
to firebir...@googlegroups.com
Em dom., 3 de set. de 2023 09:36, Vlad Khorsun <fbv...@gmail.com> escreveu:
03.09.2023 15:01, Adriano dos Santos Fernandes wrote:
> On 02/09/2023 22:51, Adriano dos Santos Fernandes wrote:
>> But it does it with help of an enhanced engine's IAttachment::prepare2
>> method.
>>
>> This method has a new parameter "unsigned* stopLength".
>
> One point I'm in doubt with this api is that in case of semantic errors
> after parse.
>
> If I try to compile:
>
> select '1' + 1 from rdb$database; select 2 from rdb$database;
>
> Engine must consume first statement and report its length.
>
> But engine is going to report an error too.
>
> Do we have any other API that return output parameters even in case of
> error status?

   What if instead of attempt of preparing statement, this new API will attempt
to just parse it and return success/failure ? Then, on success, ISQL will
pass corresponding part of input to the usual sequence of prepare/describe/
execute/fetch... APIs.

   I.e. introduce new parse() method that will return count of successfully
parsed bytes, if any. Return 0, if text contains no valid statement.


I initially thought on it but then go to prepare2 method as it's going to be subsequently called.

But this detail of semantic errors make it appears that parse method may be better.

If we go this way I'm not sure parse should return only stop length or detailed (tokens) result of parse.



   Also, it is not clear what ISQL should do if text entered contains both valid
and not complete statements, for example:

        select 1 from rdbdatabase; select

would it drop the 2nd "select" ? How user should continue typing the incomplete
command ? I.e. should (s)he type the rest

        2 from rdbdatabase;

This one.


Adriano

Dimitry Sibiryakov

unread,
Sep 3, 2023, 9:20:05 AM9/3/23
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 03.09.2023 15:13:
>    Also, it is not clear what ISQL should do if text entered contains both
> valid
> and not complete statements, for example:
>
>         select 1 from rdbdatabase; select
>
> would it drop the 2nd "select" ? How user should continue typing the incomplete
> command ? I.e. should (s)he type the rest
>
>         2 from rdbdatabase;
>
>
> This one.

But what shall the method return for your initial example?

execute block returns (o1 integer, o2 integer) as begin o1 = 1; o2

What shall it return if a valid complete statement is not terminated with
semicolon at all?

--
WBR, SD.

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 9:57:56 AM9/3/23
to firebir...@googlegroups.com
On 03/09/2023 10:20, 'Dimitry Sibiryakov' via firebird-devel wrote:
> Adriano dos Santos Fernandes wrote 03.09.2023 15:13:
>>         Also, it is not clear what ISQL should do if text entered
>> contains both
>>     valid
>>     and not complete statements, for example:
>>
>>              select 1 from rdbdatabase; select
>>
>>     would it drop the 2nd "select" ? How user should continue typing
>> the incomplete
>>     command ? I.e. should (s)he type the rest
>>
>>              2 from rdbdatabase;
>>
>>
>> This one.
>
>   But what shall the method return for your initial example?
>
> execute block returns (o1 integer, o2 integer) as begin o1 = 1; o2
>

isc_command_end_err2


>   What shall it return if a valid complete statement is not terminated
> with semicolon at all?
>

isc_command_end_err2

When stopLength address is passed, semicolon termination is required.

This would avoid this statement to be executed just after the first line
is entered:

SQL> select ';' from rdb$database
CON> where true;

As ISQL only checks for semicolon to avoid completely unnecessary engine
call, but does it in theh dumb possible way, letting the engine to be
"smart".


Adriano

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 10:02:44 AM9/3/23
to firebir...@googlegroups.com
On 03/09/2023 10:57, Adriano dos Santos Fernandes wrote:
>>
>>   But what shall the method return for your initial example?
>>
>> execute block returns (o1 integer, o2 integer) as begin o1 = 1; o2
>>
>
> isc_command_end_err2
>

If we go to parse method strategy, then this should not raise error, but
return *stopLength = 0 for incomplete but valid statement.


Adriano


Dimitry Sibiryakov

unread,
Sep 3, 2023, 10:30:25 AM9/3/23
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 03.09.2023 15:57:
> As ISQL only checks for semicolon to avoid completely unnecessary engine
> call, but does it in theh dumb possible way, letting the engine to be
> "smart".

And how shall it handle frontend commands then?

--
WBR, SD.

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 10:45:46 AM9/3/23
to firebir...@googlegroups.com
I see not much problem here, just like currently, looking at it in start
of commands.

They are already broken with comments in its middle.

Perhaps we still need to check comments to make them work before
frontend commands.

If we go with parse method, it may help with this too.


Adriano

Dimitry Sibiryakov

unread,
Sep 3, 2023, 11:58:39 AM9/3/23
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 03.09.2023 16:45:
> I see not much problem here, just like currently, looking at it in start
> of commands.
>
> They are already broken with comments in its middle.
>
> Perhaps we still need to check comments to make them work before
> frontend commands.

So finally we end up with ISQL still do parsing and server redo parsing at
every semicolon.
Wouldn't it be simpler to expand ISQL parser to PSQL code as it was done in
Interbase?

--
WBR, SD.

Jim Starkey

unread,
Sep 3, 2023, 12:32:51 PM9/3/23
to firebir...@googlegroups.com

Let me suggest an alternative to the "set terminator" mechanism.

Skipping the very long history of this problem, the AmorphousDB Interact class handles the problem by probing a prospective statement as it is built up line by line by doing a "scan only" parse of the command text.  The actual code in the parse looks like this:

bool Parse::isIncompleteStatement(const char* string)
{
    if (parser)
        return parser->isIncompleteStatement(string);

    upcaseIdentifiers = true;
    setString(string);
    scanOnly = true;

    try
        {
        parse();
        }
    catch (SyntaxException&)
        {
        return tokenType == tokenEOF;
        }

    return false;
}

If scanOnly is set, two things primary things happen.  First, the the parser doesn't allocate syntax nodes; all productions return NULL.  Second, the parse returns a distinctive SyntaxError exception.  As can be seen, a SyntaxError with a Lex token type means that the statement was complete.  In addition, there are a few places in the parse that need to check a sub-production, in which was "scanOnly" is temporarily turned off, the sub-production is parsed, checked, and deleted, and "scanOnly" reverted.

The test for "parser" is a hack for when the AmorphousDB parser has recognized that the statement in question is, in fact, SQL and needs to pass the function the SQL parser.

AmorphousDB Interact also has an explicit command continuation character that is used only is command files.

I deeply regret having used YACC/BISON for DSQL.  I viewed DSQL as a quickie hack for a single customer that thought experimenting with a "compiler compiler" would be a good experience.  I later tried to make a BISON parser for C++ and discovered, like many before and after, that it is not capable of parsing the full C++ grammar.  I reverted to hard coded recursive descent parsers every since.

I wouldn't much multiple statements on a single interactive line with a 10 foot pole.

--
Jim Starkey, AmorphousDB, LLC

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 1:02:38 PM9/3/23
to firebir...@googlegroups.com
On 03/09/2023 12:58, 'Dimitry Sibiryakov' via firebird-devel wrote:
> Adriano dos Santos Fernandes wrote 03.09.2023 16:45:
>> I see not much problem here, just like currently, looking at it in start
>> of commands.
>>
>> They are already broken with comments in its middle.
>>
>> Perhaps we still need to check comments to make them work before
>> frontend commands.
>
>   So finally we end up with ISQL still do parsing and server redo
> parsing at every semicolon.

If server reports startPos too (stripping spaces and leading comments),
we don't need to parse comments in ISQL.

If server returns tokens, frontend commands would work even with
embedded comments.


>   Wouldn't it be simpler to expand ISQL parser to PSQL code as it was
> done in Interbase?
>

I have no idea what Interbase did and for what.

I would love to have decoupled parser from engine that can also work in
client. Not easy and will not solve this problem. *1

Client parser cannot do it always correctly without INTL, for example,
with alternate string quoting and MBCS. *2


Adriano

PS *1: Also love to have ISQL completely rewritten, also creating
reusable script runner library.

PS *2: I see no problem to have INTL integrated in y-valve and be
available for client tools, but it would be optional and its charsets
may not match server ones.

Dimitry Sibiryakov

unread,
Sep 3, 2023, 3:08:10 PM9/3/23
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 03.09.2023 19:02:
> I have no idea what Interbase did and for what.

They did exactly what you suggest: a little bit smarter terminator search.
For that they threat BEGIN-END the same way as nested comments ignoring
semicolons inside. That's simple.

--
WBR, SD.

Adriano dos Santos Fernandes

unread,
Sep 3, 2023, 3:50:09 PM9/3/23
to firebir...@googlegroups.com
On 03/09/2023 16:08, 'Dimitry Sibiryakov' via firebird-devel wrote:
> They did exactly what you suggest: a little bit smarter terminator
> search. For that they threat BEGIN-END the same way as nested comments
> ignoring semicolons inside. That's simple.

ALTER DATABASE BEGIN BACKUP;

<wait for END foevever>


Adriano

Dmitry Kovalenko

unread,
Sep 3, 2023, 4:12:39 PM9/3/23
to firebird-devel
ALTER DATABASE BEGIN BACKUP;

<wait for END foevever>

Such queries will cause a problem only when use within stored procedure body.

As I understand,  operator "ALTER DATABASE BEGIN BACKUP" can not be executed directly within SP.

Dmitry Kovalenko.

Dmitry Kovalenko

unread,
Sep 4, 2023, 1:48:20 AM9/4/23
to firebird-devel
Hello,

 
ALTER DATABASE BEGIN BACKUP;

<wait for END foevever>

Such queries will cause a problem only when use within stored procedure body.

I have locked at my parser code for processing procedure body.

It does not process the pair begin/end. It is just looking for END with semicolon (the terminate symbol).

I remember that the previous implementation was processing these pairs, but the current implementation does not do it.

Of course, this code is applying only for procedures (functions, execute blocks).

All the code for processing such queries looks so:

1. Try to find keyword "AS"
2. Try to find "END;"

---------
bool ib_common__sql_pnode__ddl__db_code::on_found
                                           (processor_context_type context,
                                            const input_iterator&  DEBUG_CODE(base_pos),
                                            input_iterator&        beg_tail,
                                            const input_iterator&  end_tail)
{
 assert(base_pos<=beg_tail);
 assert(beg_tail<=end_tail);
 assert(context.m_ds.text_services);

 DEBUG_CODE(m_debug__ds.check_sql_parser_settings(context.m_ds);)

 typedef t_ibp_sql_parser_funcs parser_funcs;

 if(parser_funcs::find_keyword(context.m_ds.text_services,
                               &beg_tail,
                               end_tail,
                               _SQL_T("AS"),
                               context.get_stmt_terminator()))
 {
  parser_funcs::find_stmt_end__code
   (context.m_ds.text_services,
    &beg_tail,
    end_tail,
    context.get_stmt_terminator());
 }//if

 return true;
}//on_found

---------
void t_ibp_sql_parser_funcs::find_stmt_end__code
                                (db_obj::t_db_text_services* const text_services,
                                 iterator*                   const pBeg,
                                 iterator                    const End,
                                 const char_type*            const term)
{
 assert(text_services);
 assert(pBeg);
 assert((*pBeg)<=End);

 while((*pBeg)!=End)
 {
  if(self_type::find_keyword(text_services,pBeg,End,_SQL_T("END"),NULL))
  {
   text_services->skeep_empty_space((*pBeg),End);

   if(self_type::is_stmt_terminator((*pBeg),End,term))
    break;
  }
 }//while
}//t_ibp_sql_parser_funcs::find_stmt_end__code


------
Regards,
Dmitry Kovalenko

Dmitry Kovalenko

unread,
Sep 4, 2023, 1:51:50 AM9/4/23
to firebir...@googlegroups.com
I have locked at my parser code for processing procedure body.

I have looked at ...

Sorry :)
Dmitry Kovalenko.

Mark Rotteveel

unread,
Sep 4, 2023, 3:50:06 AM9/4/23
to firebir...@googlegroups.com
On 04-09-2023 07:48, Dmitry Kovalenko wrote:
> All the code for processing such queries looks so:
>
> 1. Try to find keyword "AS"
> 2. Try to find "END;"

That doesn't work unless you first identify that your identify you're in
a PSQL style statement.

And step 2 is too naive. The proper way to do that is

1) Identify if PSQL statement or not
2) If PSQL, count begin and ends, when count of begin = count of end,
you're done with the statement.

(this trick assumes there are no other PSQL statements that can contain
begin and end, otherwise you will need to at least identify the contexts
where begin is valid, and that end must be immediately preceded by
begin, or otherwise by a semicolon.

Mark
--
Mark Rotteveel

Dmitry Kovalenko

unread,
Sep 4, 2023, 4:01:50 AM9/4/23
to firebir...@googlegroups.com
And step 2 is too naive. The proper way to do that is

Mark, I described only a common idea for looking for the final END of SP declaration.

You can provide me with a test SQL. I will test it :)

If my provider fails with your example, it will be really interesting to me :)

Kovalenko Dmitry.

Алексей Чудайкин

unread,
Sep 4, 2023, 4:38:34 AM9/4/23
to firebird-devel
HI! Implemented the ability to work with PSQL without entering SET TERM. The algorithm is based on the parser of incoming tokens and the subsequent determination of the current cursor position. Main tokens "AS" "DECLARE", pair "BEGIN-END" and pair "CASE-END". Implemented the ability to export without SET TERM. Temporarily selected "-XN" option. The "-X" option has not changed. Checked the correct operation in 2 ways. 1) Exported the metadata with the -xn option from a real database (3GB) and imported it to a new database. 2) Metadata from "Interbase" imported to a new database. Then exported and compared with metadata from "Interbase".
Chudaykin Alexey, RedSoft.

Karol Bieniaszewski

unread,
Sep 4, 2023, 4:43:58 AM9/4/23
to firebir...@googlegroups.com

Probably fail will be with subrutines in psql?

And CASE statemet also contains END

 

Regards,

Karol Bieniaszewski

--
You received this message because you are subscribed to the Google Groups "firebird-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebird-deve...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/firebird-devel/CAHtTfeLOipCadT-AhiR0OQRrukLA5TDfkaMWx5193sT1BC_D7w%40mail.gmail.com.

 

Dmitry Kovalenko

unread,
Sep 4, 2023, 4:51:20 AM9/4/23
to firebird-devel

Probably fail will be with subrutines in psql?

And CASE statemet also contains END

Internal END does not have ';'

Am I right?

Dmitry Kovalenko.

Mark Rotteveel

unread,
Sep 4, 2023, 4:58:18 AM9/4/23
to firebir...@googlegroups.com
On 04-09-2023 10:51, Dmitry Kovalenko wrote:
> Probably fail will be with subrutines in psql?
>
> And CASE statemet also contains END
>
> Internal END does not have ';'
>
> Am I right?

No, you're not. For example this is a valid statement in PSQL:

x = case when y < 10 then 5 else 4 end;

Mark
--
Mark Rotteveel

Dmitry Kovalenko

unread,
Sep 4, 2023, 5:04:27 AM9/4/23
to firebir...@googlegroups.com
> Internal END does not have ';'
>
> Am I right?

No, you're not. For example this is a valid statement in PSQL:

x = case when y < 10 then 5 else 4 end;

Ok. Thank you. I will fix it.

Regards,
Dmitry Kovalenko

Adriano dos Santos Fernandes

unread,
Sep 4, 2023, 5:26:35 AM9/4/23
to firebir...@googlegroups.com


Em seg., 4 de set. de 2023 05:38, Алексей Чудайкин <chuday...@gmail.com> escreveu:
HI! Implemented the ability to work with PSQL without entering SET TERM. The algorithm is based on the parser of incoming tokens and the subsequent determination of the current cursor position. Main tokens "AS" "DECLARE", pair "BEGIN-END" and pair "CASE-END". Implemented the ability to export without SET TERM. Temporarily selected "-XN" option. The "-X" option has not changed. Checked the correct operation in 2 ways. 1) Exported the metadata with the -xn option from a real database (3GB) and imported it to a new database. 2) Metadata from "Interbase" imported to a new database. Then exported and compared with metadata from "Interbase".

How can it understand this string without intl library and not get stuck inside a string?

q'🙃...🙃'


Adriano

Dmitry Kovalenko

unread,
Sep 4, 2023, 6:25:11 AM9/4/23
to firebird-devel
Hello Karol. 

Probably fail will be with subrutines in psql?


I found this script in my tests:

recreate package PKG_DUMMY1
as
begin
 procedure sp1;end;

recreate package body PKG_DUMMY1
as
begin
 procedure sp1
 as
 declare function FUNC1 returns integer as begin return 0; end/*END;*/
 begin
 end
 procedure sp2
 as
 declare variable x1 integer;
 declare function FUNC1 returns integer as begin return 0; end/*END;*/
 declare variable x2 integer;
 begin
 end
end;

If I understand correctly - subroutines (FUNC1) does not require ';' after last end.

Did you say about this thing?

----
The problem with CASE, unfortunately, exists.

Dmitry Kovalenko.

Mark Rotteveel

unread,
Sep 4, 2023, 6:27:39 AM9/4/23
to firebir...@googlegroups.com
On 04-09-2023 09:49, 'Mark Rotteveel' via firebird-devel wrote:
> On 04-09-2023 07:48, Dmitry Kovalenko wrote:
>> All the code for processing such queries looks so:
>>
>> 1. Try to find keyword "AS"
>> 2. Try to find "END;"
>
> That doesn't work unless you first identify that your identify you're in
> a PSQL style statement.
>
> And step 2 is too naive. The proper way to do that is
>
> 1) Identify if PSQL statement or not
> 2) If PSQL, count begin and ends, when count of begin = count of end,
> you're done with the statement.

My suggestion is too naive too, as it won't work if subroutines are present.

Mark
--
Mark Rotteveel

Dmitry Kovalenko

unread,
Sep 4, 2023, 7:12:08 AM9/4/23
to firebird-devel
My suggestion is too naive too, as it won't work if subroutines are present.

It would be great if "END"+";" was the last operator in such statements.

Unfortunately, the real world is too cruel :))))

Dmitry Kovalenko.

Adriano dos Santos Fernandes

unread,
Sep 4, 2023, 7:53:23 AM9/4/23
to firebir...@googlegroups.com
Hi!

To make ISQL SET SMARTTERM and also allow tools (language services,
IDEs), I propose creation of parse "service" in engine.

This starts in engine, but in the future we may decouple parser from
engine code and also have this inside Util (i.e., as a client library
feature).

---
interface Attachment : ReferenceCounted
{
...

ParseResult parse(Status status, uint stmtLength, const string sqlStmt,
uint dialect);
}

interface ParseResult : Disposable {
uint getTokenCount();
LexToken getToken(uint index);

/* Statement must ends with semicolon to be considered parsed */
uint getParsedTokenCount();
uint getParsedTextLength();
}

interface LexToken : IVersioned {
const char* getRawText();
uint getFirstPosition();
uint getFirstLine();
uint getFirstColumn();
uint getLastPosition();
uint getLastLine();
uint getLastColumn();
}
---

Here is an example of results for:

/* comment */ select 1 + 'a' from rdb$database; select 2

parseResult:
- tokenCount: 9
- parsedTokenCount: 7
- parsedTextLength: 47

tokens rawText:
- 0: select
- 1: 1
- 2: +
- 3: 'a'
- 4: from
- 5: rdb$database
- 6: ;
- 7: select
- 8: 2


Adriano

Dimitry Sibiryakov

unread,
Sep 4, 2023, 8:04:23 AM9/4/23
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 04.09.2023 13:53:
> This starts in engine, but in the future we may decouple parser from
> engine code and also have this inside Util (i.e., as a client library
> feature).

Don't put the decoupling off: parsing must be performed by ISQL before attach
to server.

> interface LexToken : IVersioned {
> const char* getRawText();
> uint getFirstPosition();
> uint getFirstLine();
> uint getFirstColumn();
> uint getLastPosition();
> uint getLastLine();
> uint getLastColumn();
> }

Here missed functions to get token type (keyword, string literal, identifier,
etc) and methods to get token of definite type transformed per language rules
(identifier indelimited or uppercased, keyword uppercased, literal unquoted,
etc). Also standalone (?) methods for such transformations from RawText also
would be useful.
BTW, class Tokens already exists so anyone can use it directly. I see no need
in a new API interface.

--
WBR, SD.

Vlad Khorsun

unread,
Sep 4, 2023, 8:13:10 AM9/4/23
to firebir...@googlegroups.com
04.09.2023 14:53, Adriano dos Santos Fernandes wrote:
> Hi!
>
> To make ISQL SET SMARTTERM and also allow tools (language services,
> IDEs), I propose creation of parse "service" in engine.
>
> This starts in engine, but in the future we may decouple parser from
> engine code and also have this inside Util (i.e., as a client library
> feature).
>
> ---
> interface Attachment : ReferenceCounted
> {
> ...
>
> ParseResult parse(Status status, uint stmtLength, const string sqlStmt,
> uint dialect);
> }
>
> interface ParseResult : Disposable {
> uint getTokenCount();
> LexToken getToken(uint index);
>
> /* Statement must ends with semicolon to be considered parsed */

Hmm... why ? If we speak about interactive mode, user already pressed Enter and
i.e. user finished typing (probably). In script execution mode all of this
anyway can't work, AFAIU.

> uint getParsedTokenCount();
> uint getParsedTextLength();
> }
>
> interface LexToken : IVersioned {
> const char* getRawText();
> uint getFirstPosition();
> uint getFirstLine();
> uint getFirstColumn();
> uint getLastPosition();
> uint getLastLine();
> uint getLastColumn();
> }
> ---

What is the purpose of returning set of tokens ? How user application supposed
to use it ? I thought, we wanted to get statement(s), not tokens, isn't it ?
Even if one consider set of tokens as usable for some tasks, where the (set of)
statements ? ;)

> Here is an example of results for:
>
> /* comment */ select 1 + 'a' from rdb$database; select 2
>
> parseResult:
> - tokenCount: 9
> - parsedTokenCount: 7
> - parsedTextLength: 47
>
> tokens rawText:
> - 0: select
> - 1: 1
> - 2: +
> - 3: 'a'
> - 4: from
> - 5: rdb$database
> - 6: ;
> - 7: select
> - 8: 2

parsedTextLength points to the

/* comment */ select 1 + 'a' from rdb$database

which is valid statement, and it is good. But list of tokens is wider than
than parsed text and includes more tokens. It is a bit confused, as for me.
BTW, comments also could be returned as a token.

Regards,
Vlad

Dimitry Sibiryakov

unread,
Sep 4, 2023, 8:33:08 AM9/4/23
to firebir...@googlegroups.com
Vlad Khorsun wrote 04.09.2023 14:13:
> If we speak about interactive mode, user already pressed Enter and
> i.e. user finished typing (probably). In script execution mode all of this
> anyway can't work, AFAIU.

In script execution mode this parser also will have problem with legacy SET
TERM that allows to set as a terminator any sequence of bytes not containing
whitespaces, quotes, comments and the current terminator so unless you are going
to drop this support - following script cannot be parsed in any way:

SET TERM GO ;
SET TERM @#$%^+;This! GO
select 1 from rdb$database @#$%^+;This!

--
WBR, SD.

Aleksey Chudaykin

unread,
Sep 4, 2023, 8:40:25 AM9/4/23
to firebird-devel
Yes, it will get stuck. Viewing the isql code, there the char variable is used for the alternative qouted. If we're talking about the same thing.

понедельник, 4 сентября 2023 г. в 12:26:35 UTC+3, Adriano dos Santos Fernandes:

Adriano dos Santos Fernandes

unread,
Sep 4, 2023, 8:32:53 PM9/4/23
to firebir...@googlegroups.com, Aleksey Chudaykin
On 04/09/2023 09:40, Aleksey Chudaykin wrote:
> Yes, it will get stuck. Viewing the isql code, there the char variable
> is used for the alternative qouted. If we're talking about the same thing.
>

It doesn't matter the type, it cannot do it without intl library or
fixed heuristics.


Adriano


Adriano dos Santos Fernandes

unread,
Sep 4, 2023, 8:55:22 PM9/4/23
to firebir...@googlegroups.com
On 04/09/2023 09:13, Vlad Khorsun wrote:
>>     /* Statement must ends with semicolon to be considered parsed */
>
>   Hmm... why ? If we speak about interactive mode, user already pressed
> Enter and
> i.e. user finished typing (probably).

Without it, all sort of problem happens. Enter means nothing. Many
people do this:

select value
from table
where condition

Nothing requiring semicolon the statement will be evaluated without the
where, as in list of values dumb ISQL parser may see a semicolon inside
a string and try the parse service.


> In script execution mode all of this
> anyway can't work, AFAIU.
>

Why?

It should work the same way, as well the current code works in the same
way for interactive and script.


>>     uint getParsedTokenCount();
>>     uint getParsedTextLength();
>> }
>>
>> interface LexToken : IVersioned {
>>     const char* getRawText();
>>     uint getFirstPosition();
>>     uint getFirstLine();
>>     uint getFirstColumn();
>>     uint getLastPosition();
>>     uint getLastLine();
>>     uint getLastColumn();
>> }
>> ---
>
>   What is the purpose of returning set of tokens ? How user application
> supposed
> to use it ? I thought, we wanted to get statement(s), not tokens, isn't
> it ?

We have someone parsing for us, so we get tokens from it.

ISQL can use the token list to implement frontend parser, for example.

Language services / IDEs may use them (with improved information) for
syntax highlight, autocomplete.


> Even if one consider set of tokens as usable for some tasks, where the
> (set of)
> statements ? ;)
>

Set of statements may be interesting, but I would not like to change
parser to have this info.

Multiple calls to parse may be done to have this.


>> Here is an example of results for:
>>
>> /* comment */ select 1 + 'a' from rdb$database; select 2
>>
>> parseResult:
>> - tokenCount: 9
>> - parsedTokenCount: 7
>> - parsedTextLength: 47
>>
>> tokens rawText:
>> - 0: select
>> - 1: 1
>> - 2: +
>> - 3: 'a'
>> - 4: from
>> - 5: rdb$database
>> - 6: ;
>> - 7: select
>> - 8: 2
>
>   parsedTextLength points to the
>
> /* comment */ select 1 + 'a' from rdb$database
>
> which is valid statement, and it is good. But list of tokens is wider than
> than parsed text and includes more tokens. It is a bit confused, as for me.

It's not confusing if you think tokens from lexer feeds the parser.

The service returned parsed string and tokens.


> BTW, comments also could be returned as a token.
>

It is for some tools, really.

But is another example that I'm not sure we should add.

Comments are not tokens.

And in practice any positions outside the returned tokens are
whitespaces or comments, both easily distinguishable.


Adriano


Adriano dos Santos Fernandes

unread,
Sep 4, 2023, 8:58:20 PM9/4/23
to firebir...@googlegroups.com
I do not understand your point.

ISQL should continue work in the same way when not in "smart term" mode.

Its current code cannot die.

So you may do:

SET TERM GO ;
SET TERM @#$%^+;This! GO
select 1 from rdb$database @#$%^+;This!

SET SMARTERM ON @#$%^+;This!
select 1 from rdb$database;
SET SMARTTERM OFF;

select 1 from rdb$database @#$%^+;This!


Adriano

Adriano dos Santos Fernandes

unread,
Sep 4, 2023, 9:09:22 PM9/4/23
to firebir...@googlegroups.com
On 04/09/2023 09:04, 'Dimitry Sibiryakov' via firebird-devel wrote:
> Adriano dos Santos Fernandes wrote 04.09.2023 13:53:
>> This starts in engine, but in the future we may decouple parser from
>> engine code and also have this inside Util (i.e., as a client library
>> feature).
>
>   Don't put the decoupling off: parsing must be performed by ISQL before
> attach to server.
>

Good point.

But currently ISQL own parser cannot die anyway.


>> interface LexToken : IVersioned {
>>     const char* getRawText();
>>     uint getFirstPosition();
>>     uint getFirstLine();
>>     uint getFirstColumn();
>>     uint getLastPosition();
>>     uint getLastLine();
>>     uint getLastColumn();
>> }
>
>   Here missed functions to get token type (keyword, string literal,
> identifier, etc) and methods to get token of definite type transformed
> per language rules (identifier indelimited or uppercased, keyword
> uppercased, literal unquoted, etc). Also standalone (?) methods for such
> transformations from RawText also would be useful.

Would love to have all of this now.

But, first, interfaces are versioned, so things may be added later.

And second, we can't expose internal code numbers, so this is not easy
nor would be simple to maintain.


>   BTW, class Tokens already exists so anyone can use it directly.

No!


Adriano


Adriano dos Santos Fernandes

unread,
Sep 4, 2023, 9:24:24 PM9/4/23
to firebir...@googlegroups.com
On 04/09/2023 21:55, Adriano dos Santos Fernandes wrote:
>> Even if one consider set of tokens as usable for some tasks, where the
>> (set of)
>> statements ? 😉
>>
> Set of statements may be interesting, but I would not like to change
> parser to have this info.
>
> Multiple calls to parse may be done to have this.
>
>

One possibility is to have the parse method call the grammar multiple
times to return many parsed statements.

Then we don't need to change the grammar for that.

But it would make return data more complex.


Adriano

Dimitry Sibiryakov

unread,
Sep 5, 2023, 6:27:04 AM9/5/23
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 05.09.2023 2:55:
> We have someone parsing for us, so we get tokens from it.
>
> ISQL can use the token list to implement frontend parser, for example.

ISQL already using Tokens class for that. There is no point to duplicate
functionality.

> ISQL should continue work in the same way when not in "smart term" mode.
>
> Its current code cannot die.
>
> So you may do:
>
> SET TERM GO ;
> SET TERM @#$%^+;This! GO
> select 1 from rdb$database @#$%^+;This!
>
> SET SMARTERM ON @#$%^+;This!
> select 1 from rdb$database;
> SET SMARTTERM OFF;
>
> select 1 from rdb$database @#$%^+;This!

Are you seriously going to make it use two different parsers in different
modes? That's crazy and won't work.
Just try it. Pretend Tokens to be your smart parser (it has about the same
interface which you suggested) and modify ISQL to work with it. You'll see that
it is impossible.

--
WBR, SD.

Vlad Khorsun

unread,
Sep 5, 2023, 8:12:38 AM9/5/23
to firebir...@googlegroups.com
05.09.2023 3:55, Adriano dos Santos Fernandes wrote:
> On 04/09/2023 09:13, Vlad Khorsun wrote:
>>>     /* Statement must ends with semicolon to be considered parsed */
>>
>>   Hmm... why ? If we speak about interactive mode, user already pressed
>> Enter and
>> i.e. user finished typing (probably).
>
> Without it, all sort of problem happens. Enter means nothing. Many
> people do this:
>
> select value
> from table
> where condition
>
> Nothing requiring semicolon the statement will be evaluated without the
> where, as in list of values dumb ISQL parser may see a semicolon inside
> a string and try the parse service.

Here you right.

>> In script execution mode all of this
>> anyway can't work, AFAIU.
>>
>
> Why?
>
> It should work the same way, as well the current code works in the same
> way for interactive and script.

I thought that with SMARTTERM entered text will be passed to the parse() after
every newline, is it wrong ? Even if in non-interactive mode the whole script will
be passed into parse() - how many times it will happens ? Will it be passed once
as a whole, or after detecting (and executing) every single statement rest of the
script will be passed again into the parse() to get next statement ?

>>>     uint getParsedTokenCount();
>>>     uint getParsedTextLength();
>>> }
>>>
>>> interface LexToken : IVersioned {
>>>     const char* getRawText();
>>>     uint getFirstPosition();
>>>     uint getFirstLine();
>>>     uint getFirstColumn();
>>>     uint getLastPosition();
>>>     uint getLastLine();
>>>     uint getLastColumn();
>>> }
>>> ---
>>
>>   What is the purpose of returning set of tokens ? How user application
>> supposed
>> to use it ? I thought, we wanted to get statement(s), not tokens, isn't
>> it ?
>
> We have someone parsing for us, so we get tokens from it.
>
> ISQL can use the token list to implement frontend parser, for example.
>
> Language services / IDEs may use them (with improved information) for
> syntax highlight, autocomplete.
>
>
>> Even if one consider set of tokens as usable for some tasks, where the
>> (set of)
>> statements ? ;)
>>
>
> Set of statements may be interesting, but I would not like to change
> parser to have this info.

But parser anyway have this info, else how it returns getParsedTextLength() ?

> Multiple calls to parse may be done to have this.

This is acceptable in interactive mode, but I have a big doubts about script mode.

>>> Here is an example of results for:
>>>
>>> /* comment */ select 1 + 'a' from rdb$database; select 2
>>>
>>> parseResult:
>>> - tokenCount: 9
>>> - parsedTokenCount: 7
>>> - parsedTextLength: 47
>>>
>>> tokens rawText:
>>> - 0: select
>>> - 1: 1
>>> - 2: +
>>> - 3: 'a'
>>> - 4: from
>>> - 5: rdb$database
>>> - 6: ;
>>> - 7: select
>>> - 8: 2
>>
>>   parsedTextLength points to the
>>
>> /* comment */ select 1 + 'a' from rdb$database
>>
>> which is valid statement, and it is good. But list of tokens is wider than
>> than parsed text and includes more tokens. It is a bit confused, as for me.
>
> It's not confusing if you think tokens from lexer feeds the parser.

Let's assume for a moment that I know what is lexer, parser, token, etc.

> The service returned parsed string and tokens.

And it is still confusing - why parsed string doesn't contains all parsed tokens.

>> BTW, comments also could be returned as a token.
>>
>
> It is for some tools, really.
>
> But is another example that I'm not sure we should add.
>
> Comments are not tokens.

With proposed interface there is no tokens, strictly speaking. I see just a
set of lexemes, without its exact type (category) and meanings. And comments
*is* tokens (and lexeme, of course), here I not agree.

> And in practice any positions outside the returned tokens are
> whitespaces or comments, both easily distinguishable.

This is true. And, if consider comments as a white space or garbage - this is
acceptable. But what if one need to get comments, separately from white space ?
The goal is to avoid additional parsing, isn't is ?

Regards,
Vlad

Dimitry Sibiryakov

unread,
Sep 5, 2023, 8:41:02 AM9/5/23
to firebir...@googlegroups.com
Vlad Khorsun wrote 05.09.2023 14:12:
>   This is true. And, if consider comments as a white space or garbage - this is
> acceptable. But what if one need to get comments, separately from white space ?

Then it should be an option because generally one parse statement to do
something and the last thing they want to do is to envelope every token
retrieval in "skip comments" loop.
Consider this piece of code:

if (tokens[0] == "CREATE" && tokens[1] == "DATABASE")

--
WBR, SD.

Vlad Khorsun

unread,
Sep 5, 2023, 8:48:35 AM9/5/23
to firebir...@googlegroups.com
Agreed, there should be a way to skip comments when not needed.
But this could be a bit out of minimal interface, so "should be"
turns into "good to have" :)

Regards,
Vlad

Dimitry Sibiryakov

unread,
Sep 5, 2023, 8:56:12 AM9/5/23
to firebir...@googlegroups.com
Vlad Khorsun wrote 05.09.2023 14:48:
> But this could be a bit out of minimal interface, so "should be"
> turns into "good to have" :)

May be just an additional parameter for parse() function...

--
WBR, SD.

Karol Bieniaszewski

unread,
Sep 5, 2023, 2:52:52 PM9/5/23
to firebir...@googlegroups.com

The big question here when statement 1, 2, 3 from the script will be executed?

After ENTER? After „;” ?

Why i ask?

 

You can type SELECT * FROM TABLE ENTER

WHERE 😉

 

Regards,

Karol Bieniaszewski

 

Od: Vlad Khorsun
Wysłano: wtorek, 5 września 2023 14:12
Do: firebir...@googlegroups.com

--

You received this message because you are subscribed to the Google Groups "firebird-devel" group.

To unsubscribe from this group and stop receiving emails from it, send an email to firebird-deve...@googlegroups.com.

Adriano dos Santos Fernandes

unread,
Sep 6, 2023, 6:34:13 AM9/6/23
to firebir...@googlegroups.com, Aleksey Chudaykin
I see there is a prerequisite for "smart term".

It's smart error recovering, like described here in bison documentation:
https://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html

For example:

select from rdb$database; select 2 from rdb$database;

There is two statements, first one with syntax error.

But parser should be able to eat the string until it's end (semicolon).

But it's not about semicolon only, it should be smart. Here is another
example:

execute block
as
declare v integer;
begin
select from rdb$database into v;
end; select 2 from rdb$database;

Here parser should eat the erroneous statement until its "end;", not
until "into v;".

While this is a prerequisite for the "parser service", it's a thing on
its own for "prepare".

Like a language's compiler that report multiple errors instead of first
one only, Firebird should do it. Oracle does it.

Then it's not only about parse, but semantic errors as well, for example:

execute block
as
begin
select 1 from rdb$database into v1;
select 2 from rdb$database into v2;
end;

We currently return:

Statement failed, SQLSTATE = 42S22
Dynamic SQL Error
-SQL error code = -206
-Column unknown
-V1
-At line 4, column 35

But we should better return something like:

Statement failed, SQLSTATE = 42S22
Dynamic SQL Error
-SQL error code = -206
(4, 35): Column unknown: V1
(5, 35): Column unknown: V2

This is very desired feature for routines, and specially for packages.


Adriano

Jim Starkey

unread,
Sep 6, 2023, 10:40:57 AM9/6/23
to firebir...@googlegroups.com
Guys, this whole discussion is the height of insanity.  First, the idea
that a parser for a formal language can be extended to recognize
syntactically invalid statements but still successfully recognize the
correct end of statement is just crazy.  And if it could be done, it
shouldn't be done.  Are you going to document that various syntax errors
that the system can successfully handle?

The second problem is even more serious.  Continuing to process multiple
statements on a line after one has failed (for any reason!) utterly
ignores the most common case where a subsequent statement depends on the
successful execution a  preceding statement.  How is the poor user going
to figure out what even happened?  All he or she really knows is that
something didn't work.

I don't know that status of auto-commit in Firebird, but has anyone
given any thought to the interaction of auto-commit of multiple
statements per line?  One line, two statements.  The second statement
fails.  What gets rolled back?

Database languages are formal languages.  When something goes wrong,
stop, flush the input, and let a human sort it out.
--
Jim Starkey, AmorphousDB, LLC

Adriano dos Santos Fernandes

unread,
Sep 6, 2023, 11:04:24 AM9/6/23
to firebir...@googlegroups.com
Em qua., 6 de set. de 2023 11:40, Jim Starkey <j...@jimstarkey.net> escreveu:
Guys, this whole discussion is the height of insanity.  First, the idea
that a parser for a formal language can be extended to recognize
syntactically invalid statements but still successfully recognize the
correct end of statement is just crazy.  And if it could be done, it
shouldn't be done.  Are you going to document that various syntax errors
that the system can successfully handle?

I desire the tools you use to create your projects start reporting only the first error per build.

Insanity for me is the MVP you said to have created still do it.



The second problem is even more serious.  Continuing to process multiple
statements on a line after one has failed (for any reason!) utterly
ignores the most common case where a subsequent statement depends on the
successful execution a  preceding statement.  How is the poor user going
to figure out what even happened?  All he or she really knows is that
something didn't work.

I don't know that status of auto-commit in Firebird, but has anyone
given any thought to the interaction of auto-commit of multiple
statements per line?  One line, two statements.  The second statement
fails.  What gets rolled back?

You should start using the tools you said to have created then and see what they do, how they do and how they are coded.

Or perhaps say with some out of context code how your closed/unpublished "modern" projects do it.


Adriano

Dimitry Sibiryakov

unread,
Sep 6, 2023, 11:06:35 AM9/6/23
to firebir...@googlegroups.com
Adriano dos Santos Fernandes wrote 06.09.2023 17:04:
> I desire the tools you use to create your projects start reporting only the
> first error per build.

Yes, it would be good to get only relevant error instead of snowball.
Fortunately GCC for example has a special option for it.

--
WBR, SD.
Reply all
Reply to author
Forward
0 new messages