Distributed queries (Management of External Data) proposal

191 views
Skip to first unread message

Vasiliy Yashkov

unread,
Jan 21, 2025, 3:53:36 AMJan 21
to firebir...@googlegroups.com
Hello All! 

This proposal describes the implementation of distributed (heterogeneous) queries. These queries allow managing data stored in other databases. Those databases can be located on the same or different servers and use the same or different DBMS. Distributed queries allow to integrate and process data from different sources, providing developers with the ability to work with them as a single logical unit.
External data management is defined by ISO/IEC 9075-9:2023(E) (SQL/MED) standard. External data management implies both read and write access to external data. Access to foreign data is implemented using 4 objects: foreign-data wrapper, foreign server, user mapping, and foreign table.

1. Goals

1.1. Ability to use foreign-data wrappers (providers);
1.2. Ability to define a foreign server, that manages external data;
1.3. Ability to create user mappings for foreign connections, defines the connection parameters to be used by the specified user when accessing a foreign server;
1.4. Ability to create foreign table whose rows are supplied when needed by some foreign server;
1.5. Ability to execute the `EXECUTE STATEMENT` operator on a foreign server;
1.6. Optimization of foreign queries.

2. Syntax

2.1 Foreign-data wrapper

In Firebird, the foreign-data wrapper can be implemented as a provider (plugin). To use a foreign-data wrapper, it should be defined in plugins.conf, for example:

Plugin = JDBC {
    Module = $(dir_plugins)/jdbc_provider
    Config = JDBC_config
}

Config = JDBC_config {
    Dir = $(this)/jdbc
}

Implementation of 12.4 <foreign-data wrapper definition> of the SQL/MED standard does not seem necessary (yet).
To connect to Firebird databases, the Engine and Remote providers are available by default and require no additional definition in plugins.conf.

2.2. Foreign server

A foreign server is a logical representation of a remote data source inside the DBMS. The DBMS connects to the foreign server to execute queries. The foreign server provides metadata and connection information about the remote data source. Usually, the external server stores the connection parameters such as connection string, port, database type, etc.

2.2.1. CREATE SERVER definition

CREATE SERVER [IF NOT EXISTS] <foreign server name>
        [{FOREIGN DATA WRAPPER | USING PLUGIN} <provider (plugin)>] [OPTIONS (<option> [,<option> ...] )]
    <option> ::= {
        CONNECTION_STRING = '<connection string>'
        | USER = '<user name>'
        | PASSWORD = '<password>'
        | ROLE = '<role>'
        | <option name> = 'value'
        | <option name> 'value' }

The `SERVER` clause specifies the name of the server.
The `FOREIGN DATA WRAPPER` or `USING PLUGIN` clause specifies the provider to use to access foreign data. If no clause is specified, the providers specified in the Providers parameter of the firebird.conf configuration file will be tried for connection. The clause `USING PLUGIN` is not specified in the standard, but it is familiar to Firebird developers and users, so it is proposed as an extension to the standard definition of a foreign server.
The optional `OPTIONS` clause defines additional connection parameters to the foreign server. The required connection parameters depend on the provider (plugin) used to access the data. General options are:
  • CONNECTION_STRING - connection string to the foreign server (depends on the foreign server specification), if the option is not specified, an attempt will be made to connect to the current database;
  • USER - the user on whose name the connection will be established;
  • PASSWORD - user password;
  • ROLE - the role with which the specified user will connect to the foreign server.
The `CREATE SERVER` statement can be executed by administrators or users with the `CREATE SERVER` privilege.
Example:

CREATE SERVER TEST_SERVER
     FOREIGN DATA WRAPPER "Remote"
             OPTIONS (
               CONNECTION_STRING 
                 '172.17.0.2:/db/external.fdb',
               USER 'ext_user',
               PASSWORD 'ext_password',
               CUSTOM_OPTION = 'Custom');

2.2.2. ALTER SERVER definition

ALTER SERVER <foreign server name>
        [[DROP] FOREIGN DATA WRAPPER | [DROP] USING PLUGIN <provider(plugin)>] [OPTIONS(<option> [, <option> ...] )]
    <option> ::= {
        [DROP] <option name> [= 'value']
        | [DROP] <option name> ['value'] }

If existing server options are specified, they will be updated and the others will not be affected.
The `ALTER SERVER` statement can be executed by administrators, owner or users with the `ALTER SERVER` privilege.
Example:

ALTER SERVER TEST_SERVER OPTIONS(DROP USER, DROP PASSWORD);

2.2.3. DROP SERVER definition

DROP SERVER <foreign server name>

The `DROP SERVER` statement can be executed by administrators, owner or users with the `DROP SERVER` privilege. If the server has dependencies, the `DROP SERVER` statement will fail with an execution error.
Example:

DROP SERVER TEST_SERVER;

2.3. User mappings

User mappings for foreign connections define the connection parameters to be used by the specified user when accessing a foreign server. The user mapping provides security and access control by specifying the credentials to connect to the foreign server. User mapping maps local user accounts to remote server credentials and defines the login and password for the connection.

2.3.1. CREATE USER MAPPING FOR definition

CREATE USER MAPPING FOR <user name> SERVER <server name> [OPTIONS (<option> [, <option> ...] )]
    <option> ::= {
        CONNECTION_STRING = '<connection string>'
        | USER = '<user name>'
        | PASSWORD = '<password>'
        | ROLE = '<role>'
        | <option name> = 'value'
        | <option name> 'value' }

The general options are the same as for the `CREATE SERVER` statement. When connecting to a foreign server, the option values specified in the user mapping will be used. The connection parameters specified in the user mapping have a higher priority than those specified for the foreign server.
The `CREATE USER MAPPING FOR` statement can be executed by administrators and database owner.

Example:

CREATE USER MAPPING FOR SYSDBA SERVER TEST_SERVER OPTIONS (USER 'ext_user', PASSWORD 'ext_password');

2.3.2. ALTER USER MAPPING FOR definition

ALTER USER MAPPING FOR <user name> SERVER <server name> [OPTIONS (<option> [, <option> ...] )]
    <option> ::= {
        [DROP] <option name> [= 'value']
        | [DROP] <option name> ['value'] }

If existing server options are specified, they will be updated and the others will not be affected.

The `ALTER USER MAPPING FOR` statement can be executed by administrators and database owner.

Example:

ALTER USER MAPPING FOR SYSDBA SERVER TEST_SERVER OPTIONS (USER 'new_user', DROP PASSWORD);

2.3.3. DROP USER MAPPING FOR definition

DROP USER MAPPING FOR <user name> SERVER <server name>

The `DROP USER MAPPING FOR` statement can be executed by administrators and database owner.

Example:

DROP USER MAPPING FOR SYSDBA SERVER TEST_SERVER;

2.4. Access to foreign data

Foreign server data can be managed in two ways: using the `EXECUTE STATEMENT` statement or via a foreign table.

2.4.1. Foreign table

A foreign table is a table that is physically stored on a foreign server but is accessible by the local DBMS. The foreign table allows executing SQL queries to the data contained in the table as local ones. The foreign table includes the table structure description and column mapping between the foreign and local table.

2.4.1.1. CREATE FOREIGN TABLE definition

CREATE FOREIGN TABLE [ IF NOT EXISTS ] <table name>
            (<foreign column definition> [, {<foreign column definition> | <table constraint>}...])
            SERVER <server name> [OPTIONS (<option> [, <option> ...] )]
    <foreign column definition> ::= 
        <regular column definition>
        | <identity column definition>
        [OPTIONS (<option> [, <option> ...] )]
    <regular column definition> ::=
        <column name> { <data type> | <domain name>}
        [DEFAULT {<literal> | NULL | <context variable>}]
        [NOT NULL]
        [<column constraint>]
        [COLLATE <collation name>]
    <identity column definition> ::=
        <column name> [<data type>]
        GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [(<identity column option> [<identity column option>])]
        [<column constraint>]
    <identity column option> ::= START WITH <start value>
        | INCREMENT [BY] <increment value>
    <column constraint> ::=
        [CONSTRAINT <constraint name>] CHECK (<check condition>)
    <table constraint> ::=
        [CONSTRAINT <constraint name>] CHECK (<check condition>)
    <option> ::= {
        <option name> [= 'value']
        | <option name> ['value'] }
Rules:
  • The name of the foreign table must be unique;
  • The required parameter is the name of the foreign server;
  • If the specified foreign server does not exist, the creation of the foreign table will fail;
  • The table name and column names should be the same as the object names on the foreign server. Can be remapped using the `OPTIONS` clause;
  • A foreign column cannot be `UNIQUE`, `PRIMARY KEY`, or `FOREIGN KEY`;
  • A foreign column cannot be computed (`COMPUTED [BY] | GENERATED ALWAYS AS`);
  • Foreign columns can have `NOT NULL` and `CHECK` integrity constraints, but this does not guarantee that the foreign server will check these constraints;
  • Otherwise, the rules for creating a foreign table are the same as for a regular table.
The optional `OPTIONS` clause specifies additional foreign column options and table options. The required foreign column and table options depend on the provider (plugin) used to access table data. General table options are:
  • SCHEMA_NAME - the schema name where the table is located on the foreign server;
  • TABLE_NAME - name of the table on the foreign server.
  • General column options are:
  • COLUMN_NAME - the name of the column in the foreign table;
  • PRIMARY_KEY - an identifier explicitly specifying that the column is used in the primary key.
The table name should include the full schema name and the table name on the foreign server (if not specified in the options). The table names should match. Column names should match (if not specified in the options). The number of defined columns should be less than or equal to the columns in the table on the foreign server.
The `CREATE FOREIGN TABLE` statement can be executed by administrators and users with the `CREATE FOREIGN TABLE` privilege and. The user executing the `CREATE FOREIGN TABLE` statement becomes the owner of the foreign table.
Example:

CREATE FOREIGN TABLE TEST(ID BIGINT GENERATED ALWAYS AS IDENTITY OPTIONS(COLUMN_NAME 'F_ID'),
    VCHAR VARCHAR(10) NOT NULL OPTIONS(COLUMN_NAME = 'F_VCHAR'),
    FBLOB BLOB SUB_TYPE BINARY OPTIONS(COLUMN_NAME = 'F_BLOB'),
    FBOOL BOOLEAN OPTIONS(COLUMN_NAME 'F_BOOL'))
SERVER TEST_SERVER OPTIONS(SCHEMA_NAME = 'ORA', TABLE_NAME 'FT_TEST');

The example above is equivalent to the following foreign table creation statement:

CREATE FOREIGN TABLE "ORA.FT_TEST"(F_ID BIGINT GENERATED ALWAYS AS IDENTITY),
    F_VCHAR VARCHAR(10) NOT NULL,
    F_BLOB BLOB SUB_TYPE BINARY,
    F_BOOL BOOLEAN)
SERVER TEST_SERVER;

Note that a declaration of the form "<schema>.<...>.<table name>" should not exceed 63 characters. Options allow to avoid this limitation.

2.4.1.2. ALTER FOREIGN TABLE definition

ALTER FOREIGN TABLE <table name> <modification operation> [, <modification operation>...]
      [OPTIONS (<option> [, <option> ...] )]
    <modification operation> ::=
        ADD <foreign column definition>
        | ADD <table constraint>
        | DROP <column name>
        | DROP CONSTRAINT <column or table constraint>
        | ALTER [COLUMN] <column name> <foreign column modification>
        | [OPTIONS (<option> [, <option> ...] )]
    <foreign column definition> ::=
        <regular column definition>
        | <identity column definition>
        [OPTIONS (<option> [, <option> ...] )]
    <regular column definition> ::=
        <column name> { <data type> | <domain name>}
        [DEFAULT {<literal> | NULL | <context variable>}]
        [NOT NULL]
        [<column constraint>]
        [COLLATE <collation name>]
    <identity column definition> ::=
        <column name> [<data type>]
        GENERATED {ALWAYS|BY DEFAULT} AS IDENTITY [(<identity column option> [<identity column option>])]
    [<column constraint>]
    <foreign column modification> ::=
        TO <new column name>
        | POSITION <new position>
        | <regular column modification>
        | <identity column definition>
    <regular column modification> ::=
        TYPE { <data type> | <domain name> }
        | SET DEFAULT { <literal> | NULL | <context variable>}
        | DROP DEFAULT
        | SET NOT NULL
        | DROP NOT NULL
    <identity column definition> ::=
        <identity column option>
        | SET GENERATED {ALWAYS|BY DEFAULT} [ <identity column option> ...]
        | DROP IDENTITY
    <identity column option> ::= {
        RESTART [ WITH <start value> ]
        | SET INCREMENT [BY] <increment value> }
    <option> ::=
        [DROP] <option name> [= 'value']
        | [DROP] <option name> ['value']

The mandatory parameters are the table name and the modification operation definition. If new values of existing options of a foreign column or foreign table are specified, they will be updated and the others will not be affected. Otherwise, the rules for altering a foreign table are the same as for a regular table.
The `ALTER FOREIGN TABLE` statement can be executed by administrators, the owner of the table and users with the `ALTER FOREIGN TABLE` privilege.
Example:

ALTER FOREIGN TABLE TEST ALTER FVCHAR OPTIONS(DROP COLUMN_NAME);

2.4.1.3. DROP FOREIGN TABLE definition

DROP FOREIGN TABLE <table name>

A foreign table that is referenced in triggers cannot be deleted, except for triggers written by the user specifically for this table. Also, a foreign table that is used in a stored procedure or view cannot be deleted. When a foreign table is dropped, all its triggers and options (include column options) will be deleted as well.
The `DROP TABLE` statement can be executed by administrators, the owner of the table and users with the `DROP FOREIGN TABLE` privilege.
Example:

DROP FOREIGN TABLE TEST;

2.4.2. EXECUTE STATEMENT statement changes

<execute_statement> ::= EXECUTE STATEMENT <argument>
      [<option> ...]
      [INTO <variables>];
    <argument> ::= <paramless_stmt>
                | (<paramless_stmt>)
                | (<stmt_with_params>) (<param_values>)
    <param_values> ::= <named_values> | <positional_values>
    <named_values> ::= <named_value> [, <named_value> ...]
    <named_value> ::= [EXCESS] paramname := <value_expr>
    <positional_values> ::= <value_expr> [, <value_expr> ...]
    <option> ::=
        WITH {AUTONOMOUS | COMMON} TRANSACTION
      | WITH CALLER PRIVILEGES
      | AS USER user
      | PASSWORD password
      | ROLE role
      | ON EXTERNAL { [DATA SOURCE] <connection_string> | SERVER <server name> }
    <connection_string> ::=
      !! See <filespec> in the CREATE DATABASE syntax !!
    <variables> ::= [:]varname [, [:]varname ...]

The mandatory option for the `ON EXTERNAL SERVER` clause is the name of the foreign server. If the specified foreign server does not exist, creating an object with this `EXECUTE STATEMENT` will fail. The connection parameters specified in the `EXECUTE STATEMENT` statement have a higher priority than those specified in the user mapping or specified for the foreign server.

Example:

EXECUTE BLOCK RETURNS (EXT_ID BIGINT, EXT_VCHAR VARCHAR(10), EXT_BLOB VARCHAR(20),
EXT_BOOL BOOLEAN) AS
BEGIN
    FOR EXECUTE STATEMENT
        'SELECT F_ID, F_VCHAR, F_BLOB, F_BOOL FROM FT_TEST'
        ON EXTERNAL SERVER TEST_SERVER PASSWORD 'EXT_PASSWORD'
        INTO :EXT_ID, :EXT_VCHAR, :EXT_BLOB, :EXT_BOOL DO
    SUSPEND;
END

3. Implementation

3.1. System tables and indexes

RDB$FOREIGN_SERVERS - contains a description of all foreign servers defined in the database:

Field name

Type

Description

RDB$FOREIGN_SERVER_NAME

CHAR (63) 

Foreign server name

RDB$FOREIGN_SERVER_WRAPPER

CHAR (63)

Provider (plugin) used to connect to a foreign server

RDB$SECURITY_CLASS

CHAR (63)

Security class

RDB$OWNER_NAME

CHAR (63)

The username of the user who created the server originally


RDB$FOREIGN_SERVER_OPTIONS - contains a description of all options defined for the foreign servers in the database:

Field name

Type

Description

RDB$FOREIGN_SERVER_NAME

CHAR (63) 

Foreign server name

RDB$FOREIGN_OPTION_NAME

CHAR (63)

Option name
RDB$FOREIGN_OPTION_VALUEVARCHAR (32765)Option value

Information about user mappings to foreign servers is stored in system tables:

RDB$FOREIGN_USER_MAPPINGS - contains a description of all user mappings defined in the database:

Field name

Type

Description

RDB$USER

CHAR (63) 

The user name to map

RDB$FOREIGN_SERVER_NAME

CHAR (63)

Foreign server name to map

RDB$FOREIGN_MAPPING_OPTIONS - contains a description of all options defined for user mappings to foreign servers in the database:

Field name

Type

Description

RDB$USER

CHAR (63) 

The user name to map

RDB$FOREIGN_SERVER_NAME

CHAR (63)

Foreign server name

RDB$FOREIGN_OPTION_NAME

CHAR (63)

Option name

RDB$FOREIGN_OPTION_VALUE

VARCHAR (32765)

Option value


This table completely absorbs table `RDB$FOREIGN_USER_MAPPINGS`, but the ISO/IEC 9075-9:2023(E) (SQL/MED) standard requires a separate table for user mappings and user mapping options.
A new field is added for the `RDB$RELATIONS` table that stores the foreign server name:

Field name

Type

Description

RDB$FOREIGN_SERVER_NAME

CHAR (63) 

Foreign server name


If the `RDB$FOREIGN_SERVER_NAME` field contains a value other than NULL, the metadata is considered to match the foreign table.

RDB$FOREIGN_TABLE_FIELD_OPTIONS - contains a description of all options defined for foreign table columns in the database:

Field name

Type

Description

RDB$TABLE_NAME

CHAR (63) 

Foreign table name

RDB$FIELD_NAME

CHAR (63)

Foreign column name

RDB$FOREIGN_OPTION_NAME

CHAR (63)

Option name

RDB$FOREIGN_OPTION_VALUE

VARCHAR (32765)

Option value


RDB$FOREIGN_TABLE_OPTIONS - contains a description of all options defined for foreign tables in the database:

Field name

Type

Description

RDB$TABLE_NAME

CHAR (63) 

Foreign table name

RDB$FOREIGN_OPTION_NAME

CHAR (63)

Option name

RDB$FOREIGN_OPTION_VALUE

VARCHAR (32765)

Option value


For `RDB$FOREIGN_SERVERS` a unique index is created on the field `RDB$FOREIGN_SERVER_NAME`.
For `RDB$FOREIGN_SERVERS_OPTIONS` creates a unique index on the fields `RDB$FOREIGN_SERVER_NAME` and `RDB$FOREIGN_OPTION_NAME`.
For `RDB$FOREIGN_TABLE_OPTIONS` a unique index is created on the `RDB$TABLE_NAME` and `RDB$FOREIGN_OPTION_NAME` fields.
For `RDB$FOREIGN_USER_MAPPINGS` a unique index is created for the fields `RDB$USER` and `RDB$FOREIGN_SERVER_NAME`.
For `RDB$FOREIGN_MAPPING_OPTIONS` a unique index is created on the `RDB$USER`, `RDB$FOREIGN_SERVER_NAME`, and `RDB$FOREIGN_OPTION_NAME` fields.
For `RDB$FOREIGN_TABLE_FIELD_OPTIONS` a unique index is created for the `RDB$TABLE_NAME`, `RDB$FIELD_NAME`, and `RDB$FOREIGN_OPTION_NAME` fields.

3.2. Code highlights

3.2.1. Foreign server

class ForeignServer

{

public:

   explicit ForeignServer(MemoryPool& p, const MetaName& aName, const MetaName& aPlugin)

       : name(p, aName), plugin(p, aPlugin), options(p)

   {}

...

private:

   const MetaName name;

   const MetaName plugin;

   Firebird::GenericMap<MetaNameStringPair> options;

};

  • The declaration is in src/jrd/ForeignServer.h
  • Created in MET_scan_relation if RDB$RELATIONS.RDB$FOREIGN_SERVER_NAME is not NULL
  • Controlled by an object of class Jrd::ForeignTableAdapter
3.2.2. Foreign table

class ForeignTableAdapter

{

public:

    class ForeignField

    {

...

        MetaName name;

        Firebird::GenericMap<Firebird::Full<MetaName, Firebird::string> > options;

    };

...

    jrd_rel* m_relation;

    Firebird::AutoPtr<ForeignServer> m_server;

    ForeignTableProvider* m_provider = NULL;

    ForeignTableConnection* m_connection = NULL;

    Firebird::GenericMap<MetaNameStringPair> m_tableOptions;

    Firebird::LeftPooledMap<MetaName, ForeignField*> m_foreignFields;

...

};

  • The declaration is in src/jrd/ForeignServer.h
  • Created in MET_scan_relation if RDB$RELATIONS.RDB$FOREIGN_SERVER_NAME is not NULL
  • Controlled by an object of class Jrd::jrd_rel

    class jrd_rel : public pool_alloc<type_rel> {
    ForeignTableAdapter* rel_foreign_adapter; }


The foreign table adapter object creates a foreign connection to a foreign server. The existing external data source (EDS) subsystem is used to create provider, external connection and operator objects. The `IscProvider` and its implementations are used as the base provider.

class ForeignTableConnection : public EDS::IscConnection

{

    void getProviderInfo(thread_db* tdbb);

    bool testSqlFeature(info_sql_features value) const { return m_sqlFeatures[value]; }

};


class ForeignTableProvider : public EDS::IscProvider

{

    ForeignTableConnection* createForeignConnection(thread_db* tdbb, ForeignServer* server);

};


class ForeignTableStatement : public EDS::IscStatement

{

    void setSql(const Firebird::string& sql);

    void openInternal(thread_db* tdbb, EDS::IscTransaction* transaction);

    void executeInternal(thread_db* tdbb, EDS::IscTransaction* transaction, record_param* org_rpb, record_param* new_rpb, const Firebird::Array<int>* skippedParameters = nullptr);

    bool fetchInternal(thread_db* tdbb, Record* record);

};


The new record stream performs foreign data fetching:

class ForeignTableScan final : public RecordStream // src/jrd/recsrc/RecordSource.h

{

private:

    struct Impure : public RecordSource::Impure

    {

        EDS::Statement* statement;

    };

public:

    ForeignTableScan(CompilerScratch* csb, const Firebird::string& alias, StreamType stream, jrd_rel* relation);

...

private:

...

    jrd_rel* const m_relation;

    const Firebird::string m_alias;

    Firebird::Array<const BoolExprNode*> m_filterNodes; // Stores nodes will be converted to a condition

    const SortNode* m_sortNode; // Stores nodes will be converted to order by    

};


3.2.3. Data modification


The data change depends on the operation type. But in general changes look like this:

else if (relation->rel_foreign_adapter)

{

    if (!impure->statement)

    {

        impure->statement = relation->rel_foreign_adapter->createStatement(tdbb, ...);

        impure->statement->bindToRequest(request, &impure->statement);

    }

For INSERT:

relation->rel_foreign_adapter->execute(tdbb, impure->statement, NULL, rpb); // Store. src/dsql/StmtNodes.cpp

Or for UPDATE:
relation->rel_foreign_adapter->execute(tdbb, impure->statement, orgRpb, newRpb); // Modify. src/dsql/StmtNodes.cpp
Or for DELETE:

relation->rel_foreign_adapter->execute(tdbb, impure->statement, NULL, rpb); // Erase. src/dsql/StmtNodes.cpp

}


The statements are closed after the nodes have been processed:

// src/jrd/exe.cpp

// Close external statements

while (request->req_ext_stmt)

    request->req_ext_stmt->close(tdbb);
The lifetime of the foreign connection depends on the lifetime of the attachment. EDS subsystem is responsible for releasing resources.

3.3. Optimization

3.3.1. Local predicates optimization

The optimizer extracts local predicates to the foreign table and passes them to the `Jrd::RecordSource` object (ForeignTableScan), which analyzes them and converts them into a `WHERE` clause for a query to the foreign server. For example, `ComparativeNode(FieldNode, LiteralNode)` will be converted to `WHERE ID = 0`, etc. For predicates that do not exist in other DBMSs, the `Jrd::RecordSource` object does not perform the conversion. For example, `IS NOT DISTINCT` is not present in Oracle/MySQL, `GEN_UUID` is not a standard function, etc. Initial support implies conversion of conditions of the form `field <op> constant`. Later the list can be extended.
The method `bool Jrd::ForeignTableScan::applyBoolean(Jrd::thread_db *tdbb, const Jrd::BoolExprNode *boolean)` checks if an input parameter can be passed to a foreign server and if it can, adds a condition to the `WHERE` clause of the `SELECT` statement and returns true. The optimizer generates a list of candidate predicates and passes it to this method.

3.3.2. Sorting optimization

Sorting by a foreign table on a foreign server is possible only if only its fields are involved. In this case, an `ORDER BY` clause with selection conditions is added to the `SELECT` statement. Method
`bool Jrd::ForeignTableScan::applySort(Jrd::thread_db *tdbb, const Jrd::SortNode *sort)` for an input parameter checks if it can be passed to the foreign server as an `ORDER BY` clause of the `SELECT` statement. If all elements inside `const Jrd::SortNode *sort` can be passed, true is returned. The optimizer performs pre-preparation - it checks that the sort list contains only columns from the foreign table.

4. Examples


create database '172.17.0.2:/db/external.fdb' user SYSDBA password 'masterkey';

 

connect '172.17.0.2:/db/external.fdb' user SYSDBA password 'masterkey';


create table ft_test(

f_id bigint

f_vchar varchar(10), 

f_blob blob sub_type binary

f_bool boolean);

commit;


create user ext_user password 'ext_password';

grant all on ft_test to ext_user;

commit;


connect 'localhost:/db/local.fdb' user SYSDBA password 'masterkey';


create server test_server foreign data wrapper "Remote" options(connection_string '172.17.0.2:/db/external.fdb', user 'ext_user', password 'ext_password');


create foreign table test(id bigint options(column_name 'f_id'), 

vchar varchar(10) options(column_name = 'f_vchar'), 

fblob blob sub_type binary options(column_name = 'f_blob'),

fbool boolean options(column_name = 'f_bool')

server test_server options(table_name 'ft_test');


commit;


select id, vchar, cast(fblob as varchar(20)), fbool from test rows 5;


                   ID VCHAR      CAST                   FBOOL 

===================== ========== ==================== ======= 

                   11 htiVSHUpUa '.a....k..           <true>  

                   12 jOESzIJDvn Y.z.....tl.F........ <false> 

                   13 sBv2zl76sV D9.D.8 >..J..]

                                               .     <false> 

                   14 O6d2RWV2tG 5....m..A.           <true>  

                   15 bYuyWGUrjI #D.......7...=.xy.   <false> 


alter server test_server options(drop user, drop password);

commit;

select id, vchar, cast(fblob as varchar(20)), fbool from test rows 5;


                   ID VCHAR      CAST                   FBOOL 

===================== ========== ==================== ======= 

Statement failed, SQLSTATE = 42000

Execute statement error at attach :

335545106 : Error occurred during login, please check server rdbserver.log for details

Data source : Firebird::172.17.0.2:/db/external.fdb


create user mapping for sysdba

server test_server

options (user 'ext_user', password 'ext_password');

commit;

select id, vchar, cast(fblob as varchar(20)), fbool from test rows 5;


                   ID VCHAR      CAST                   FBOOL 

===================== ========== ==================== ======= 

                   11 htiVSHUpUa '.a....k..           <true>  

                   12 jOESzIJDvn Y.z.....tl.F........ <false> 

                   13 sBv2zl76sV D9.D.8 >..J..]

                                               .     <false> 

                   14 O6d2RWV2tG 5....m..A.           <true>  

                   15 bYuyWGUrjI #D.......7...=.xy.   <false>


alter user mapping for sysdba

server test_server

options (drop password);

commit;

select id, vchar, cast(fblob as varchar(20)), fbool from test rows 5;


                   ID VCHAR      CAST                   FBOOL 

===================== ========== ==================== ======= 

Statement failed, SQLSTATE = 42000

Execute statement error at attach :

335544472 : Your user name and password are not defined. Ask your database administrator to set up a Red Database login.

Data source : Firebird::172.17.0.2:/db/external.fdb


execute block returns (ext_id bigint, ext_vchar varchar(10), ext_blob varchar(20), ext_bool boolean) as

begin

 for execute statement

  'select f_id, f_vchar, f_blob, f_bool from ft_test'

  on external server test_server password 'ext_password'

  into :ext_id, :ext_vchar, :ext_blob, :ext_bool DO

   suspend;

end^


               EXT_ID EXT_VCHAR  EXT_BLOB             EXT_BOOL 

===================== ========== ==================== ======== 

                   11 htiVSHUpUa '.a....k..           <true>   

                   12 jOESzIJDvn Y.z.....tl.F........ <false>  

                   13 sBv2zl76sV D9.D.8 >..J..]

                                               .     <false>  

                   14 O6d2RWV2tG 5....m..A.           <true>   

                   15 bYuyWGUrjI #D.......7...=.xy.   <false>  

                   16 <null>     8....s...V.          <true>   

                   17 mU3TNIXXU9 .......[8%$....R   <true>   

                   18 vfRdSpZP1s ..h..."[E0RbXT..c..  <true>   

                   19 5VmDx2E0IK ...c3{...?}.:....    <false>  

                   20 TiEftYzQpF ...+..

                                       ...           <false>  

                   21 3ebLTJqnTZ .h .$.....           <true> 


5. Testing of external providers

We have implemented a JDBC provider (JAVA) that allows connection to various DBMSs via JDBC protocol. Below are the results of testing the embedded and external providers. JDBC provider (via modified Jaybird driver) and Magpie (via modified Firebird ODBC driver) were used as external providers.
Test table metadata:

CREATE TABLE TEST (ID1 BIGINT NOT NULL,
        ID2 BIGINT NOT NULL,
        ID3 BIGINT NOT NULL,
        ID4 INTEGER default 0 NOT NULL,
        F1 BIGINT NOT NULL,
        F2 INTEGER NOT NULL,
        F3 BIGINT NOT NULL,
        F4 BIGINT NOT NULL,
        F5 BIGINT NOT NULL,
        F6 TIMESTAMP NOT NULL,
        F7 BIGINT NOT NULL,
        F8 BIGINT,
        F9 DOUBLE PRECISION,
        F10 INTEGER,
        F11 TIMESTAMP,
        F12 VARCHAR(32760),
        F13 BIGINT,
        F14 TIMESTAMP,
        F15 TIMESTAMP,
        F16 SMALLINT,
        F17 INTEGER,
        F18 INTEGER,
        F20 NUMERIC(15, 4),
        F21 NUMERIC(15, 4),
        F22 INTEGER,
        F23 INTEGER NOT NULL,
        F24 INTEGER NOT NULL,
        F25 BIGINT,
        F26 VARCHAR(2048),
        F27 DOUBLE PRECISION,
        F28 VARCHAR(2048),
        F29 VARCHAR(24),
        F30 BIGINT,
        F31 BIGINT,
        F32 INTEGER,
        F33 INTEGER,
        F34 VARCHAR(64),
        F35 VARCHAR(64),
        F36 VARCHAR(1024),
        F37 INTEGER,
        F38 INTEGER,
        F39 INTEGER,
        F40 INTEGER,
        F41 BIGINT,
        F42 BIGINT,
        F43 BIGINT,
        F44 INTEGER,
        F45 SMALLINT,
        F46 INTEGER,
CONSTRAINT PK_TEST PRIMARY KEY (ID1, ID2, ID3, ID4));

SELECT * FROM TEST (47 302 883 rows (~14.9 GB))

embedded

localhost

remote-provider + localhost

magpie-provider + odbc-driver + localhost

jdbc-provider + jaybird-driver + localhost

Current memory (bytes)

8 496 256 

(8.5 MB)

8 515 680 

(8.5 MB)

283 039 856 

(283.0 MB)

551 530 272 

(526.0 MB)

283 039 936 

(283.04  MB)

Delta memory (bytes)

282 544 

(0.28 MB)

28 2544 

(0.28 MB)

532 240 

(0.53 MB)

-3664 

(-0.004 MB)

532 320

 (0.53 MB)

Max memory (bytes)

8 502 800 

(8.5 MB)

8 522 224 

(8.5 MB)

283 050 272 

(283.0 MB)

551 554 784 

(551.6 MB)

283 050 352

(283.05 MB)

Elapsed time sec

 (hh:mm:ss)

8186.777 

(2:16:26)

8392.270 

(2:19:52)

8597.089 

(2:23:17)

9126.435 

(2:32:06)

8777.087 

(2:26:28)

Cpu sec (mm:ss)

3228.180

(53:48)

3139.620

(52:19)

3172.610 

(52:52)

3226.290

(53:46)

3313.280

(55:22)

Buffers

400

400

32 768

32 768

32 768

Reads

681 158

681 157

44

0

45

Writes

0

0

1

1

1

Fetches

50 027 104

50 027 104

657

4

654

JDBC provider speed is ~2.1% slower than Remote provider speed. The relatively small performance lag indicates the optimized handling of external sources in the engine.

INSERT INTO TEST (2 000 000 rows (~630 MB))

embedded

localhost

remote-provider + localhost

magpie-provider + odbc-driver + localhost

jdbc-provider + jaybird-driver + localhost

Current memory (bytes)286 108 400

(286.11 MB)
286 108 608

(286.11 MB)
283 866 048

(283.87 MB)
-283 866 144

(283.87 MB)
Delta memory (bytes)2 850 592

 (2.8506 MB)
2 837 248

(2.8372 MB)
263 472

(0.263472 MB)
-263 472

(0.263472 MB)
Max memory (bytes)287 586 112

(287.59 MB)
287 586 320

(287.59 MB)
283 915 456

(283.92 MB)
-283 915 552

(283.92 MB)
Elapsed time (sec)

(hh:mm:ss)
1098.741

(00:18:18)
1102.345

(00:18:22)
1127.777

(00:18:47)
-1528.422

(00:25:47)
Cpu sec (sec)45.2200.0000.000-0.000
Buffers327683276832768-32768
Reads446-6
Writes454345361-1
Fetches4 362 8924 362 892780-780
JDBC provider speed is ~35% slower than Remote provider speed. The slower insertion is caused by a performance penalty in the JDBC driver (in the native implementation of the OO API).
It was not possible to test the insertion with the Magpie provider, because the ODBC driver does not support the Unicode API and requires more long term development.

6. Conclusion

This proposal describes the implementation of distributed (heterogeneous) queries for Firebird as close as possible to the "ISO/IEC 9075-9:2023(E) (SQL/MED)" standard. This proposal describes the implementation of foreign tables and the change to operator `EXECUTE STATEMENT`, but does not consider the implementation of "DATALINK" SQL type.

Although the "ISO/IEC 9075-9:2023(E) (SQL/MED)" standard does not prohibit storing login and password in plaintext (whether in ESOE or in external server or mapping options) - it is bad practice. We should also consider additional storage options (e.g., via an external file or environment variables).

Dimitry Sibiryakov

unread,
Jan 21, 2025, 5:45:34 AMJan 21
to firebir...@googlegroups.com
Vasiliy Yashkov wrote 21.01.2025 9:53:
> In Firebird, the foreign-data wrapper can be implemented as a provider (plugin).
> To use a foreign-data wrapper, it should be defined in plugins.conf, for example:

Why? Usually plugins don't need that.

> A foreign table is a table that is physically stored on a foreign server but is accessible by the local DBMS. The foreign table allows executing SQL queries to the data contained in the table as local ones. The foreign table includes the table structure description and column mapping between the foreign and local table.

What's the purpose of it? You can get whole table definition from foreign server.

> RDB$FOREIGN_SERVER_OPTIONS - contains a description of all options defined for the foreign servers in the database:

Usually such things are stored in BLOB field.

> 3.2. Code highlights

I see that whole implementation is built-in and uses internal engine
structures, so what meaning does plugin configuration above have?

> The optimizer extracts local predicates to the foreign table and passes them to the `Jrd::RecordSource` object (ForeignTableScan), which analyzes them and converts them into a `WHERE` clause for a query to the foreign server.

Bad idea. Generating of SQL (if necessary) is the job for FDW, not optimizer
because optimize lack of knowledge of target database's SQL dialect. You also
are loosing ability to work with data sources not supporting SQL.

> It was not possible to test the insertion with the Magpie provider, because the ODBC driver does not support the Unicode API and requires more long term development.

What do you mean? Which ODBC driver and foreign server you used? I would
suggest MySQL as the second most popular DBMS. And its drivers support Unicode
API well.


--
WBR, SD.

Vasiliy Yashkov

unread,
Jan 21, 2025, 7:29:21 AMJan 21
to firebir...@googlegroups.com
вт, 21 янв. 2025 г. в 13:45, 'Dimitry Sibiryakov' via firebird-devel <firebir...@googlegroups.com>:
Vasiliy Yashkov wrote 21.01.2025 9:53:
> In Firebird, the foreign-data wrapper can be implemented as a provider (plugin).
> To use a foreign-data wrapper, it should be defined in plugins.conf, for example:

   Why? Usually plugins don't need that.
You're right, it's optional to define plugins in plugins.conf. The definition example  is given for clarity.

> A foreign table is a table that is physically stored on a foreign server but is accessible by the local DBMS. The foreign table allows executing SQL queries to the data contained in the table as local ones. The foreign table includes the table structure description and column mapping between the foreign and local table.

   What's the purpose of it? You can get whole table definition from foreign server.
Table definition on foreign server may not always satisfy the requirements on the current server. An explicit definition allows you to manage data. But we are also considering getting a foreign table definition in future implementations.

> RDB$FOREIGN_SERVER_OPTIONS - contains a description of all options defined for the foreign servers in the database:

   Usually such things are stored in BLOB field.
A lot of overhead.

> 3.2. Code highlights

   I see that whole implementation is built-in and uses internal engine
structures, so what meaning does plugin configuration above have?
Plugins may have a more complex structure that requires more detailed configuration.

> The optimizer extracts local predicates to the foreign table and passes them to the `Jrd::RecordSource` object (ForeignTableScan), which analyzes them and converts them into a `WHERE` clause for a query to the foreign server.

   Bad idea. Generating of SQL (if necessary) is the job for FDW, not optimizer
because optimize lack of knowledge of target database's SQL dialect. You also
are loosing ability to work with data sources not supporting SQL.
In fact, before optimizing predicates, SQL features are requested from a foreign server (similar to info_features), based on which conditions are generated. In any case, you can customize the query via options.

> It was not possible to test the insertion with the Magpie provider, because the ODBC driver does not support the Unicode API and requires more long term development.

   What do you mean? Which ODBC driver and foreign server you used? I would
suggest MySQL as the second most popular DBMS. And its drivers support Unicode
API well.
 Firebird ODBC driver. This was discussed here https://github.com/aafemt/magpie/issues/2

Dimitry Sibiryakov

unread,
Jan 21, 2025, 7:37:25 AMJan 21
to firebir...@googlegroups.com
Vasiliy Yashkov wrote 21.01.2025 13:29:
>    Usually such things are stored in BLOB field.
>
> A lot of overhead

Overhead of parsing is considered to be smaller that overhead of reading
separate records. That's why ACL and record formats are in BLOBs. Just feed BLOB
content to existing Config class and yuo'll get ready to use IPluginConfig.

> Plugins may have a more complex structure that requires more detailed configuration.

But I see no usage of plugins in this proposal at all.

> In fact, before optimizing predicates, SQL features are requested from a foreign server (similar to info_features), based on which conditions are generated. In any case, you can customize the query via options.

Still it requires generation of text in the optimizer and parsing of this
text in FDW plugin. A lot of overhead.

> What do you mean? Which ODBC driver and foreign server you used? I would
> suggest MySQL as the second most popular DBMS. And its drivers support Unicode
> API well.
>
> Firebird ODBC driver. This was discussed here https://github.com/aafemt/magpie/issues/2

First, it is rather strange to test implementation of connection to
"different DBMS" using the same DBMS.
Second, this is UnixODBC bug only, as was said.

I really suggest to do the tests connecting to MySQL, Oracle or MS SQL.

--
WBR, SD.

Alex Peshkoff

unread,
Jan 21, 2025, 7:44:41 AMJan 21
to firebir...@googlegroups.com
On 1/21/25 15:29, Vasiliy Yashkov wrote:

> > A foreign table is a table that is physically stored on a
> foreign server but is accessible by the local DBMS. The foreign
> table allows executing SQL queries to the data contained in the
> table as local ones. The foreign table includes the table
> structure description and column mapping between the foreign and
> local table.
>
>    What's the purpose of it? You can get whole table definition
> from foreign server.
>
> Table definition on foreign server may not always satisfy the
> requirements on the current server. An explicit definition allows you
> to manage data. But we are also considering getting a foreign table
> definition in future implementations.

Why not start with it? Understandable by current server table
description should be produced by suggested new provider, btw JDBC
provides standard way to report database's metadata.

On my mind requirement to describe foreign tables explicitly is nonsense
- what if at some moment it will not match foreign server reality? (and
that will for sure happen)

>
> > RDB$FOREIGN_SERVER_OPTIONS - contains a description of all
> options defined for the foreign servers in the database:
>
>    Usually such things are stored in BLOB field.
>
> A lot of overhead.

Much less than gathering them from the table. See for example
RDB$RUNTIME in RDB$RELATIONS.

But even this last phraze is actually offtopic now - IMO before going to
code/config details we should decide about foreign objects description.


Dmitry Yemanov

unread,
Jan 21, 2025, 8:50:51 AMJan 21
to firebir...@googlegroups.com
21.01.2025 15:44, Alex Peshkoff wrote:

>>     > A foreign table is a table that is physically stored on a
>>     foreign server but is accessible by the local DBMS. The foreign
>>     table allows executing SQL queries to the data contained in the
>>     table as local ones. The foreign table includes the table
>>     structure description and column mapping between the foreign and
>>     local table.
>>
>>        What's the purpose of it? You can get whole table definition
>>     from foreign server.
>>
>> Table definition on foreign server may not always satisfy the
>> requirements on the current server. An explicit definition allows you
>> to manage data. But we are also considering getting a foreign table
>> definition in future implementations.
>
> Why not start with it?

Because the work is based on the SQL standard and it requires the
explicit declaration.

> On my mind requirement to describe foreign tables explicitly is nonsense
> - what if at some moment it will not match foreign server reality? (and
> that will for sure happen)

The same may happen for dynamic metadata linkage as well. Today the
provider reports you with T1(A INT) and you use this metadata inside a
local stored procedure but tomorrow it fails due to changed foreign
declaration.

The point is that once the foreign table is declared locally, all local
objects may reference it safely without re-validating it against the
foreign database every time. It's kinda public API interface that should
be obeyed by all local DSQL/PSQL developers. And if some field is e.g.
renamed inside the foreign database, the foreign table owner can alter
its declaration to fix the issue, and this is usually done without
invalidating all the existing local references (only the foreign part of
the definition is altered). With dynamic metadata linkage every PSQL
developer has to know the foreign tables structure and have access there
to be able to fix the things locally. This is what I'd call nonsense ;-)

>>     > RDB$FOREIGN_SERVER_OPTIONS - contains a description of all
>>     options defined for the foreign servers in the database:
>>
>>        Usually such things are stored in BLOB field.
>>
>> A lot of overhead.
>
> Much less than gathering them from the table. See for example
> RDB$RUNTIME in RDB$RELATIONS.

But we still have RDB$RELATION_FIELDS ;-)

I'd say this is not going to hurt much if options are also stored inside
the metadata cache.


Dmitry

Dmitry Yemanov

unread,
Jan 21, 2025, 9:05:39 AMJan 21
to firebir...@googlegroups.com
21.01.2025 15:37, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
>   But I see no usage of plugins in this proposal at all.

Provider is a plugin. Magpie is a plugin, isn't it?

>> In fact, before optimizing predicates, SQL features are requested from
>> a foreign server (similar to info_features), based on which conditions
>> are generated. In any case, you can customize the query via options.
>
>   Still it requires generation of text in the optimizer and parsing of
> this text in FDW plugin. A lot of overhead.

The optimizer does not generate any SQL text, it only suggests
expressions suitable for deeper optimization. They're converted into SQL
text a bit later. And only if the foreign database dialect supports them.

>   First, it is rather strange to test implementation of connection to
> "different DBMS" using the same DBMS.

This proposal is not only about "different" DBMS but also about the
Firebird itself (named external data sources are present in our
tracker). And it makes a lot of sense to compare the overhead against
other types of connections (and also API vs ODBC vs JDBC).


Dmitry

Dimitry Sibiryakov

unread,
Jan 21, 2025, 9:35:51 AMJan 21
to firebir...@googlegroups.com
Dmitry Yemanov wrote 21.01.2025 15:05:
>>    But I see no usage of plugins in this proposal at all.
>
> Provider is a plugin. Magpie is a plugin, isn't it?

Magpie is not Foreign Data Wrapper. It is a Firebird provider.
Description of implementation details in this topic doesn't include plugin
interfaces at all.

>> Still it requires generation of text in the optimizer and parsing of this text in FDW plugin. A lot of overhead.
>
> The optimizer does not generate any SQL text, it only suggests expressions suitable for deeper optimization. They're converted into SQL text a bit later. And only if the foreign database dialect supports them.

And what otherwise?..

Consider an obvious application fo FDW: a foreign table in CSV format. In the
current proposal the optimizer generates the expressions as a text and FDW is
supposed to implement a parser for these expressions. It is ineffective.
I would suggest to do a direct mapping of execution subtree using an
ORM-style API. In this case FDW is free to generate SQL text or execute the tree
immediately.

BTW,

> The optimizer extracts local predicates to the foreign table and passes them to the `Jrd::RecordSource` object (ForeignTableScan), which analyzes them and converts them into a `WHERE` clause for a query to the foreign server.

How about JOINs? You are not going to join two foreign tables using local
filtration, are you?..

> This proposal is not only about "different" DBMS but also about the Firebird itself (named external data sources are present in our tracker). And it makes a lot of sense to compare the overhead against other types of connections (and also API vs ODBC vs JDBC).

Yes, and exactly because of this it makes sense to test wider option instead
of limiting yourselves to Firebird alone. You are guaranteed to get unpleasant
surprises if you develop FDW using Firebird only as a target (trust me).


--
WBR, SD.

Dmitry Yemanov

unread,
Jan 21, 2025, 10:09:26 AMJan 21
to firebir...@googlegroups.com
21.01.2025 17:35, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
>> Provider is a plugin. Magpie is a plugin, isn't it?
>
>   Magpie is not Foreign Data Wrapper. It is a Firebird provider.

And this RFC mentions that we do not implement Foreign Data Wrappers as
a separate entity but use providers for this goal instead. So Magpie
currently *is* a FDW ;-)

>>>    Still it requires generation of text in the optimizer and parsing
>>> of this text in FDW plugin. A lot of overhead.
>>
>> The optimizer does not generate any SQL text, it only suggests
>> expressions suitable for deeper optimization. They're converted into
>> SQL text a bit later. And only if the foreign database dialect
>> supports them.
>
>   And what otherwise?..

The provider reads the whole table and it's filtered afterwards.

>   Consider an obvious application fo FDW: a foreign table in CSV
> format. In the current proposal the optimizer generates the expressions
> as a text and FDW is supposed to implement a parser for these
> expressions. It is ineffective.
>   I would suggest to do a direct mapping of execution subtree using an
> ORM-style API. In this case FDW is free to generate SQL text or execute
> the tree immediately.

Well, this is worth thinking about.

>> The optimizer extracts local predicates to the foreign table and
>> passes them to the `Jrd::RecordSource` object (ForeignTableScan),
>> which analyzes them and converts them into a `WHERE` clause for a
>> query to the foreign server.
>
>   How about JOINs? You are not going to join two foreign tables using
> local filtration, are you?..

We do it locally (yet) but only because this optimization had less
priority than the others already mentioned here. We cannot do everything
at once and this topic is exactly to define what should be changed
and/or implemented later.

>> This proposal is not only about "different" DBMS but also about the
>> Firebird itself (named external data sources are present in our
>> tracker). And it makes a lot of sense to compare the overhead against
>> other types of connections (and also API vs ODBC vs JDBC).
>
>   Yes, and exactly because of this it makes sense to test wider option
> instead of limiting yourselves to Firebird alone. You are guaranteed to
> get unpleasant surprises if you develop FDW using Firebird only as a
> target (trust me).

The functionality was tested with other RDBMS. Only performance tests
were done with Firebird only.


Dmitry

Dimitry Sibiryakov

unread,
Jan 21, 2025, 10:33:54 AMJan 21
to firebir...@googlegroups.com
Dmitry Yemanov wrote 21.01.2025 16:09:
>>>
>>> The optimizer does not generate any SQL text, it only suggests expressions
>>> suitable for deeper optimization. They're converted into SQL text a bit
>>> later. And only if the foreign database dialect supports them.
>>
>>    And what otherwise?..
>
> The provider reads the whole table and it's filtered afterwards.

Well, suppose FDW returned "ok" for filtering and you have a query that joins
a local table and a foreign table. How the optimizer will choose plan? Local
full scan + nested loop on foreign table with filter may end up wrong if
filtering in FDW performs full scan as well.
It means that foreign table API should have methods to get estimated cost for
operation, not just ability to perform it.

> We do it locally (yet) but only because this optimization had less priority than the others already mentioned here. We cannot do everything at once and this topic is exactly to define what should be changed and/or implemented later.

Good. It means that we can discuss not only your current implementation but
alternatives as well.

--
WBR, SD.

Dmitry Yemanov

unread,
Jan 21, 2025, 10:48:50 AMJan 21
to firebir...@googlegroups.com
21.01.2025 18:33, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
>>>> The optimizer does not generate any SQL text, it only suggests
>>>> expressions suitable for deeper optimization. They're converted into
>>>> SQL text a bit later. And only if the foreign database dialect
>>>> supports them.
>>>
>>>    And what otherwise?..
>>
>> The provider reads the whole table and it's filtered afterwards.
>
>   Well, suppose FDW returned "ok" for filtering and you have a query
> that joins a local table and a foreign table. How the optimizer will
> choose plan? Local full scan + nested loop on foreign table with filter
> may end up wrong if filtering in FDW performs full scan as well.

FDW cannot return OK for filtering here just because the optimizer does
not push the join condition there (only stream-local expressions
currently may be optimized remotely). Reason is simple - nobody knows
whether indexed loop join or hash join is going to be better. And
generally repeatable execution of remote queries is way more expensive
than local indexed scans, so existing cost algorithms do not fit.

> It means that foreign table API should have methods to get estimated
> cost for operation, not just ability to perform it.

And this would defer the feature forever. Let's be realistic and start
with the smaller steps.


Dmitry

Dimitry Sibiryakov

unread,
Jan 21, 2025, 11:02:14 AMJan 21
to firebir...@googlegroups.com
Dmitry Yemanov wrote 21.01.2025 16:48:
> Reason is simple - nobody knows whether indexed loop join or hash join is going
> to be better. And generally repeatable execution of remote queries is way more
> expensive than local indexed scans, so existing cost algorithms do not fit.

Not quite so: FDW provider can include round-trip cost into estimated cost so
the optimizer can compare costs of different plans with precision that is high
enough.

> And this would defer the feature forever. Let's be realistic and start with the smaller steps.

That's what I always say.
Let's start from a small step: divorce FDW provider's interface and client
interface so we don't have to keep backward compatibility and expose too much
internals on client side. In this case we can freely design unified "table"
interface that would work for the optimizer no matter if the table is local or
foreign.
Current implementation based on EDS provider hardly can fit.

--
WBR, SD.

Dimitry Sibiryakov

unread,
Jan 21, 2025, 11:08:06 AMJan 21
to firebir...@googlegroups.com
'Dimitry Sibiryakov' via firebird-devel wrote 21.01.2025 17:02:
>   That's what I always say.
>   Let's start from a small step: divorce FDW provider's interface and client
> interface so we don't have to keep backward compatibility and expose too much
> internals on client side. In this case we can freely design unified "table"
> interface that would work for the optimizer no matter if the table is local or
> foreign.

Or, perhaps, even smaller step: implement foreign servers with EDS support
now and put foreign tables off until design and optimization questions are settled.

--
WBR, SD.

Dimitry Sibiryakov

unread,
Jan 21, 2025, 11:59:52 AMJan 21
to firebir...@googlegroups.com
Vasiliy Yashkov wrote 21.01.2025 9:53:
> RDB$FOREIGN_SERVER_WRAPPER
>
> CHAR (63)
>
> Provider (plugin) used to connect to a foreign server

I suppose this is a Firebird provider's name such as "Engine13" or "Remote",
not EDS provider name such as "Internal" or "Firebird", right?

--
WBR, SD.

Denis Simonov

unread,
Jan 21, 2025, 12:20:02 PMJan 21
to firebird-devel
> The `FOREIGN DATA WRAPPER` or `USING PLUGIN` clause specifies the provider to use to access foreign data. If no clause is specified, the providers specified in the Providers parameter of the firebird.conf configuration file will be tried for connection. The clause `USING PLUGIN` is not specified in the standard, but it is familiar to Firebird developers and users, so it is proposed as an extension to the standard definition of a foreign server. The optional `OPTIONS` clause defines additional connection parameters to the foreign server. The required connection parameters depend on the provider (plugin) used to access the data. General options are:

Am I right in thinking that it is enough to specify the provider in `USING PLUGIN` and not specify it in the Providers configuration parameter (firebird.conf)? If so, that's great.

The thing is that I also made my own ODBCEngine and MySQLEngine providers. One of the main problems was the struggle with the connection string prefix and sequential enumeration of providers from the list in Providers. It was especially difficult to handle connection errors, because in this case Firebird silently redirected to the next provider in the list, as a result of which the user was given not the original error at all, but the error from the last provider in the list. I found a solution, but it is a hack.

I would also like to note that IProvider contains quite a large number of functions, many of which are redundant for this task, but in principle this is not a big deal, you can simply add stubs to methods that are not required.


> Although the "ISO/IEC 9075-9:2023(E) (SQL/MED)" standard does not prohibit storing login and password in plaintext (whether in ESOE or in external server or mapping options) - it is bad practice. We should also consider additional storage options (e.g., via an external file or environment variables).

External files and environment variables for obtaining passwords are good. You can also add an option to encrypt/decrypt the password. The encryption key can be located outside the DB.

Is it possible to touch the finished binary Windows/Linux assembly? I'm interested to see how my plugins will behave there.


> CREATE FOREIGN TABLE TEST(ID BIGINT GENERATED ALWAYS AS IDENTITY OPTIONS(COLUMN_NAME 'F_ID'),
>   VCHAR VARCHAR(10) NOT NULL OPTIONS(COLUMN_NAME = 'F_VCHAR'),
>   FBLOB BLOB SUB_TYPE BINARY OPTIONS(COLUMN_NAME = 'F_BLOB'),
>   FBOOL BOOLEAN OPTIONS(COLUMN_NAME 'F_BOOL'))
> SERVER TEST_SERVER OPTIONS(SCHEMA_NAME = 'ORA', TABLE_NAME 'FT_TEST');

Not bad for the initial implementation. True, I did not quite understand the meaning of what GENERATED ALWAYS AS IDENTITY gives? Let's say some restrictions can be checked on the Firebird side, but it is not very clear how the identity columns are used?

вторник, 21 января 2025 г. в 19:08:06 UTC+3, Dimitry Sibiryakov:

Denis Simonov

unread,
Jan 21, 2025, 12:46:18 PMJan 21
to firebird-devel
By the way, I remembered another problem with ODBCEngine and executing queries to it via EXECUTE STATEMENT. Not all drivers report the types of input parameters. However, our prepare prepares input and output messages. Then we can adjust them during execution, but only adjust them. For example, MySQL and its ODBC driver do not report the types of input parameters, but only their number. In this case, the ideal option would be for such drivers to simply pass the types of input parameters from the outside, that is, what is passed to EXECUTE STATEMENT. Has this problem been solved somehow?

вторник, 21 января 2025 г. в 20:20:02 UTC+3, Denis Simonov:

Dmitry Yemanov

unread,
Jan 21, 2025, 1:01:02 PMJan 21
to firebir...@googlegroups.com
21.01.2025 19:59, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
>   I suppose this is a Firebird provider's name such as "Engine13" or
> "Remote", not EDS provider name such as "Internal" or "Firebird", right?

Right. IIRC, this string is used to compose the "Providers=XYZ" option
in isc_dpb_config when attaching and may be skipped for native Firebird
providers.


Dmitry

Dimitry Sibiryakov

unread,
Jan 21, 2025, 3:01:59 PMJan 21
to firebir...@googlegroups.com
Denis Simonov wrote 21.01.2025 18:46:
> For example, MySQL and its ODBC driver do not report the types of input
> parameters, but only their number.

It Oracle does, not MySQL.

--
WBR, SD.

Dimitry Sibiryakov

unread,
Jan 21, 2025, 3:05:02 PMJan 21
to firebir...@googlegroups.com
Dmitry Yemanov wrote 21.01.2025 19:00:
> IIRC, this string is used to compose the "Providers=XYZ" option in
> isc_dpb_config when attaching and may be skipped for native Firebird providers.

Isn't call to getPlugins() directly with this string much more
straightforward way?

--
WBR, SD.

Dimitry Sibiryakov

unread,
Jan 21, 2025, 5:41:38 PMJan 21
to firebir...@googlegroups.com
Dmitry Yemanov wrote 21.01.2025 19:00:
>>    I suppose this is a Firebird provider's name such as "Engine13" or
>> "Remote", not EDS provider name such as "Internal" or "Firebird", right?
>
> Right. IIRC, this string is used to compose the "Providers=XYZ" option in
> isc_dpb_config when attaching and may be skipped for native Firebird providers.

Perhaps default plugins.conf should contain alias "Engine" for current engine
to avoid manual modification of foreign servers' declarations during upgrade.

--
WBR, SD.

Dmitry Yemanov

unread,
Jan 22, 2025, 1:19:09 AMJan 22
to firebir...@googlegroups.com
22.01.2025 01:41, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
>> Right. IIRC, this string is used to compose the "Providers=XYZ" option
>> in isc_dpb_config when attaching and may be skipped for native
>> Firebird providers.
>
> Perhaps default plugins.conf should contain alias "Engine" for
> current engine to avoid manual modification of foreign servers'
> declarations during upgrade.

The original idea is that you don't need to specify FDW/plugin name for
the native Firebird providers. This way there's no upgrade issues.
FDW/plugin name should be specified for 3rd party providers if you don't
want them inside firebird.conf and/or you don't want to have dedicated
connection string prefix for them (to be properly chained inside the
Providers list).


Dmitry

Dmitry Yemanov

unread,
Jan 22, 2025, 1:21:40 AMJan 22
to firebir...@googlegroups.com
21.01.2025 23:04, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
>> IIRC, this string is used to compose the "Providers=XYZ" option in
>> isc_dpb_config when attaching and may be skipped for native Firebird
>> providers.
>
>   Isn't call to getPlugins() directly with this string much more
> straightforward way?

Maybe. But we still allow Firebird providers being an implicit option so
that regular connections via the y-valve works as usual. And we don't
expect thousands remote connections per second to make the performance
difference visible.


Dmitry

Dmitry Yemanov

unread,
Jan 22, 2025, 2:04:31 AMJan 22
to firebir...@googlegroups.com
21.01.2025 19:02, 'Dimitry Sibiryakov' via firebird-devel wrote:
>
> Not quite so: FDW provider can include round-trip cost into estimated
> cost so the optimizer can compare costs of different plans with
> precision that is high enough.

I doubt this is easy, as the FDW provider has almost no info about the
underlying network protocol implemented by the driver (e.g. individual
vs batched fetches), so its round-trip estimation could be far from
being accurate.

>   Let's start from a small step: divorce FDW provider's interface and
> client interface so we don't have to keep backward compatibility and
> expose too much internals on client side. In this case we can freely
> design unified "table" interface that would work for the optimizer no
> matter if the table is local or foreign.
>   Current implementation based on EDS provider hardly can fit.

I agree the current implementation does not fit CSV and alike well. And
it wasn't designed for that, although it's surely possible to implement
such a provider. In RedDatabase, we have custom adapters feature for
external tables that allows to represent *local files* (CSV included) as
a table. While it could be possible to merge "external" and new
"foreign" table features into a single one, many hard questions arise
when we start thinking about optimization (see below). So we've kept
them separate.

The current implementation is primarily designed to deal with RDBMS as a
remote server. So it operates with databases / queries and the provider
interface (while being somewhat heavier than necessary) fits nicely. The
current implementation also allows to share the code between EDS (which
works with queries) and foreign tables. And it allows to implement
various remote optimizations easily (by composing different remote
queries under the hood).

Yes, it makes some optimizations hard without a dedicated
optimizer-level (cost/selectivity related) methods. But I can hardly
imagine how a provider can provide an accurate cardinality estimation
for your "table" interface. Executing select count(*) for that table is
not an option being terribly slow. In PGSQL I believe you may get that
value from the system optimizer statistics tables/view. But is it
possible for the every RDBMS in the world? And is it always possible to
return other metrics useful for the optimizer?

Next, imagine we need to join two foreign tables remotely. It's easily
doable with the current implementation. With CSV it's impossible and
local joining is the only option, this is OK. But for foreign RDBMS
you'd need to go further and implement also "InnerJoin", "OuterJoin",
"OrderBy", "GroupBy" and tons of other interfaces. And embed them
somehow inside the current optimizer which is not ideal itself. I don't
foresee it being doable in the next decade, sorry.


Dmitry

Denis Simonov

unread,
Jan 22, 2025, 2:16:50 AMJan 22
to firebird-devel

FOREIGN TABLE is a convenient thing for working with a table/view in another DBMS. They are good for import/export. I seriously doubt that anyone will use them in complex queries. After all, if you need to do something more complex on the remote side, you can always use EXECUTE STATEMENT.

But I have another question. These things allow you to import data from a remote DB quite effectively, but there are questions about export. Will there be any options to use BatchAPI? Transferring 100,000 rows to the remote side is a lot of round-trips. Are there any optimizations or special options planned?
среда, 22 января 2025 г. в 10:04:31 UTC+3, Dmitry Yemanov:

Denis Simonov

unread,
Jan 22, 2025, 2:20:27 AMJan 22
to firebird-devel
I mean queries like this

```
INSERT INTO FWD_TABLE (A, B, C)
SELECT A, B, C FROM FB_TABLE
```


среда, 22 января 2025 г. в 10:16:50 UTC+3, Denis Simonov:

Vasiliy Yashkov

unread,
Jan 22, 2025, 2:30:04 AMJan 22
to firebir...@googlegroups.com
вт, 21 янв. 2025 г. в 20:20, Denis Simonov <sim....@gmail.com>:
Am I right in thinking that it is enough to specify the provider in `USING PLUGIN` and not specify it in the Providers configuration parameter (firebird.conf)? If so, that's great.
Yes, you're right. If the server does not explicitly specify the provider, the providers from firebird.conf will be used.
 
Not bad for the initial implementation. True, I did not quite understand the meaning of what GENERATED ALWAYS AS IDENTITY gives? Let's say some restrictions can be checked on the Firebird side, but it is not very clear how the identity columns are used?
The identity columns are working as usual. If the value generated by the current server is already present in the table on the foreign server, error will be returned:

Statement failed, SQLSTATE = 42000
Execute statement error at isc_dsql_execute2 :
335544665 : violation of PRIMARY or UNIQUE KEY constraint "INTEG_82" on table "FT_TEST"
335545072 : Problematic key value is ("F_ID" = 8)
Statement : insert into ft_test(f_id, f_vchar, f_blob, f_bool) values(?, ?, ?, ?)
Data source : Firebird::localhost:/tmp/external.fdb

If the column in the table on the foreign server can be NULL, then the value in the foreign table will be 0. 
It may be worth not using identity columns in foreign tables. Although they are not prohibited by the MED standard.

Vasiliy Yashkov

unread,
Jan 22, 2025, 2:42:41 AMJan 22
to firebir...@googlegroups.com
вт, 21 янв. 2025 г. в 20:46, Denis Simonov <sim....@gmail.com>:
By the way, I remembered another problem with ODBCEngine and executing queries to it via EXECUTE STATEMENT. Not all drivers report the types of input parameters. However, our prepare prepares input and output messages. Then we can adjust them during execution, but only adjust them. For example, MySQL and its ODBC driver do not report the types of input parameters, but only their number. In this case, the ideal option would be for such drivers to simply pass the types of input parameters from the outside, that is, what is passed to EXECUTE STATEMENT. Has this problem been solved somehow?
There were no changes on the server side to pass the types from outside. This problem was partially solved in the JDBC provider by analyzing the query and retrieving metadata from the JDBC connection.

Dimitry Sibiryakov

unread,
Jan 22, 2025, 5:29:40 AMJan 22
to firebir...@googlegroups.com
Dmitry Yemanov wrote 22.01.2025 8:04:
> I doubt this is easy, as the FDW provider has almost no info about the
> underlying network protocol implemented by the driver (e.g. individual vs
> batched fetches), so its round-trip estimation could be far from being accurate.

It isn't easy but doable. Remote plugin already gather stats of sent/received
packets. Adding average response time to them is possible as well.

> The current implementation is primarily designed to deal with RDBMS as a remote server. So it operates with databases / queries and the provider interface (while being somewhat heavier than necessary) fits nicely. The current implementation also allows to share the code between EDS (which works with queries) and foreign tables. And it allows to implement various remote optimizations easily (by composing different remote queries under the hood).

In short, it is primarily designed to deal with remote Firebird. That's ok
and most likely it will be the main (practically only one) usage of it.

> Yes, it makes some optimizations hard without a dedicated optimizer-level (cost/selectivity related) methods. But I can hardly imagine how a provider can provide an accurate cardinality estimation for your "table" interface.

That's ok because the provider is not required to provide these data. It can
return some fixed bogus that would make the optimizer to choose non-optimal
plan, but it is not your problem.

> And is it always possible to return other metrics useful for the optimizer?

Yes, it is always possible to return something. As I said returned value
doesn't have to be precise, returning fixed high cost (low cardinality) is fine
because it simply force the optimizer to choose hash join/full scan for remote
tables.
And these metrics can be dynamic, i.e. if FDW provider make queries itself,
it can gather statistics about their execution time and this way find out data
needed to make optimization for following invocations.
For example you requested filtering by a field. For the first time fixed
cardinality corresponding to full scan is returned. Then when the query is
executed, the provider check response time and give this filter higher
selectivity if the time is found to be better than expected.

> Next, imagine we need to join two foreign tables remotely. It's easily doable with the current implementation. With CSV it's impossible and local joining is the only option, this is OK. But for foreign RDBMS you'd need to go further and implement also "InnerJoin", "OuterJoin", "OrderBy", "GroupBy" and tons of other interfaces. And embed them somehow inside the current optimizer which is not ideal itself. I don't foresee it being doable in the next decade, sorry.

Right. That's why moving by little steps is important. Implement one
method/interface in this version and another in the next. There is no need to
rush for ultimate solution right now as Adriano does.

--
WBR, SD.

Dimitry Sibiryakov

unread,
Jan 22, 2025, 5:36:09 AMJan 22
to firebir...@googlegroups.com
Vasiliy Yashkov wrote 22.01.2025 8:42:
> вт, 21 янв. 2025 г. в 20:46, Denis Simonov <sim....@gmail.com
> <mailto:sim....@gmail.com>>:
>
> By the way, I remembered another problem with ODBCEngine and executing
> queries to it via EXECUTE STATEMENT. Not all drivers report the types of
> input parameters. However, our prepare prepares input and output messages.
> Then we can adjust them during execution, but only adjust them. For example,
> MySQL and its ODBC driver do not report the types of input parameters, but
> only their number. In this case, the ideal option would be for such drivers
> to simply pass the types of input parameters from the outside, that is, what
> is passed to EXECUTE STATEMENT. Has this problem been solved somehow?
>
> There were no changes on the server side to pass the types from outside.

What do you mean? I see in code that it is quite possible to derive needed
types from EDS parameter's definition and pass these types to the statement's
InputMetadata.

--
WBR, SD.

Vasiliy Yashkov

unread,
Jan 22, 2025, 7:04:38 AMJan 22
to firebird-devel
среда, 22 января 2025 г. в 13:36:09 UTC+3, Dimitry Sibiryakov:
What do you mean? I see in code that it is quite possible to derive needed
types from EDS parameter's definition and pass these types to the statement's
InputMetadata.

I mean the current implementation requires the provider to return an input message. 
If some driver does not return parameter types, then in this case the input message 
will be incomplete, and you can get an internal error (typically in CVT_move_common). 

Dimitry Sibiryakov

unread,
Jan 22, 2025, 7:24:54 AMJan 22
to firebir...@googlegroups.com
Vasiliy Yashkov wrote 22.01.2025 13:04:
> I mean the current implementation requires the provider to return an input message.

That's why it should be changed as one of the first little steps.
EDS must request data in the format it needs. Format of source data should be
irrelevant.
I know that it was the initial design but this design wasn't good.

--
WBR, SD.

Denis Simonov

unread,
Jan 22, 2025, 8:33:06 AMJan 22
to firebird-devel
I mean the current implementation requires the provider to return an input message.  If some driver does not return parameter types, then in this case the input message  will be incomplete, and you can get an internal error (typically in CVT_move_common). 

This is exactly what I mean. But EDS knows the types of input parameters in the place where we call it. It would be possible to pass an additional option from the provider, for example in isc_feature, that the provider cannot provide full information about the types of input parameters, and then simply ignore the InputMetadata that is returned from it and replace it with information about the EDS parameters.
 

среда, 22 января 2025 г. в 15:24:54 UTC+3, Dimitry Sibiryakov:

Dimitry Sibiryakov

unread,
Jan 22, 2025, 8:37:38 AMJan 22
to firebir...@googlegroups.com
Denis Simonov wrote 22.01.2025 14:33:
> This is exactly what I mean. But EDS knows the types of input parameters in the
> place where we call it. It would be possible to pass an additional option from
> the provider, for example in isc_feature, that the provider cannot provide full
> information about the types of input parameters, and then simply ignore the
> InputMetadata that is returned from it and replace it with information about the
> EDS parameters.

What could be a reason for a provider to return "false" for this feature and
to prefer own definition of metadata instead of one supplied by EDS?
Requesting types of input parameters from server is always slower than using
of already given MessageMetadata.

--
WBR, SD.

Denis Simonov

unread,
Jan 22, 2025, 10:01:04 AMJan 22
to firebird-devel
> What could be a reason for a provider to return "false" for this feature and
> to prefer own definition of metadata instead of one supplied by EDS?
> Requesting types of input parameters from server is always slower than using
> of already given MessageMetadata.

Because the mechanism should be common. If Firebird Engine can return these parameters and does, then good. On the EDS side, there will simply be type conversions. You're not going to cut this mechanism out of the engine, are you?

And if the external provider can't do this, then it's logical to use the VARCHAR(32765) stuff directly and pass real types from the EDS side. It's just common in MySQL to specify parameter types and sizes from the client. They have a function in the API that, according to the description, should return information about the input parameters, but it's just a stub that doesn't do anything (like maybe they'll implement it in the future).
среда, 22 января 2025 г. в 16:37:38 UTC+3, Dimitry Sibiryakov:

Dimitry Sibiryakov

unread,
Jan 22, 2025, 10:13:39 AMJan 22
to firebir...@googlegroups.com
Denis Simonov wrote 22.01.2025 16:01:
> > What could be a reason for a provider to return "false" for this feature and
> > to prefer own definition of metadata instead of one supplied by EDS?
> > Requesting types of input parameters from server is always slower than using
> > of already given MessageMetadata.
>
> Because the mechanism should be common. If Firebird Engine can return these
> parameters and does, then good. On the EDS side, there will simply be type
> conversions. You're not going to cut this mechanism out of the engine, are you?

Which exactly mechanism? Firebird engine does type coercion anyway so there
is no sense to do it twice: on server side + on EDS side. (Actually trice
without merged PR#8145.)

> And if the external provider can't do this, then it's logical to use the
> VARCHAR(32765) stuff directly and pass real types from the EDS side. It's just
> common in MySQL to specify parameter types and sizes from the client. They have
> a function in the API that, according to the description, should return
> information about the input parameters, but it's just a stub that doesn't do
> anything (like maybe they'll implement it in the future).

Exactly because of this it is more robust to always request types that are
needed on client side than ask what types the server will provide and then
perform the coercion.

--
WBR, SD.

Vasiliy Yashkov

unread,
Mar 3, 2025, 7:19:12 AMMar 3
to firebird-devel
среда, 22 января 2025 г. в 16:33:06 UTC+3, Denis Simonov:
But EDS knows the types of input parameters in the place where we call it. It would be possible to pass an additional option from the provider, for example in isc_feature, that the provider cannot provide full information about the types of input parameters, and then simply ignore the InputMetadata that is returned from it and replace it with information about the EDS parameters.

I think it's a good idea to get information from the provider that it supports preparing input parameter types. But if there are reasons to use input parameter information from EDS, when creating a provider, an option can be added (with `OPTIONS` clause or something else) to always use EDS information, whether the provider can prepare the information itself or not.
Reply all
Reply to author
Forward
0 new messages