faster reads of huge table

149 views
Skip to first unread message

Hamish Moffatt

unread,
May 21, 2025, 10:08:35 PM5/21/25
to firebird-support
Hi,

I have a table (in Firebird 5.0.2) which is simply a pair of INTEGER
columns, with a primary key across both. I need to read the whole table
out, and there's 1.3 million records, and this is slow - it's taking
over 12 seconds on a reasonably fast PC

The issue seems to be just the number of calls to isc_dsql_fetch and the
logic to unpack the structures. I'm using Qt and its built-in driver for
Firebird.

The server is running on the same machine. I fiddled with
TcpRemoteBufferSize and Wirecrypt settings and didn't see any change there.

Is there any way to speed this up?

Crazy ideas include a stored proc or user function that returns as many
rows as possible packed into one, in JSON format, or a string list or
something. (or an ARRAY but that seems to be discouraged.)


thanks

Hamish

liviuslivius

unread,
May 22, 2025, 2:34:22 AM5/22/25
to firebird...@googlegroups.com

Oto poprawiona wersja tekstu w języku angielskim:


Hi,


I'm not sure about the use case, as retrieving 1 million rows is an unusual operation. Is it a frequent task? Why do you consider 12 seconds to be slow? Please describe the scenario.


However, if you're working directly on the server, you can use XNet or the embedded mode, which should be faster.


Regards,
Karol Bieniaszewski


-------- Oryginalna wiadomość --------
Od: 'Hamish Moffatt' via firebird-support <firebird...@googlegroups.com>
Data: 22.05.2025 04:08 (GMT+01:00)
Do: firebird-support <firebird...@googlegroups.com>
Temat: [firebird-support] faster reads of huge table

--
Support the ongoing development of Firebird! Consider donating to the Firebird Foundation and help ensure its future. Every contribution makes a difference. Learn more and donate here:
https://www.firebirdsql.org/donate
---
You received this message because you are subscribed to the Google Groups "firebird-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebird-suppo...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/firebird-support/87c5925d-2384-4fed-aa5f-f05b1dd2b7fe%40risingsoftware.com.

Tomasz Tyrakowski

unread,
May 22, 2025, 2:40:35 AM5/22/25
to firebird...@googlegroups.com
On 22.05.2025 at 04:08, 'Hamish Moffatt' via firebird-support wrote:
> I have a table (in Firebird 5.0.2) which is simply a pair of INTEGER
> columns, with a primary key across both. I need to read the whole table
> out, and there's 1.3 million records, and this is slow - it's taking
> over 12 seconds on a reasonably fast PC
>
> The issue seems to be just the number of calls to isc_dsql_fetch and the
> logic to unpack the structures. I'm using Qt and its built-in driver for
> Firebird.
>
> The server is running on the same machine. I fiddled with
> TcpRemoteBufferSize and Wirecrypt settings and didn't see any change there.
>
> Is there any way to speed this up?
How do you connect to your database? Via TCP/IP, XNET, or in embedded mode?
If the speed of fetching data is a crucial factor, use embedded mode -
this way all DB access code lives in your process' space and all data
transfer boils down to copying memory blocks.
If you need to go through the actual server process, try XNET -
https://github.com/FirebirdSQL/firebird/blob/master/doc/README.xnet.
I don't think serializing sequences of rows into one will speed things
up (but I don't have hard data - if you do some testing, please post
your results, I'm curious whether it's a viable approach at all) - the
code you'd need to parse it back into arrays will probably overweight
the code fbclient needs to unpack the simple message buffers.
Also, Firebird by default caches rows server-side and sends them in
batches, so even with TCP/IP it doesn't mean a network stack roundtrip
for every single row (unless you do SELECT FOR UPDATE).

regards
Tomasz

Dimitry Sibiryakov

unread,
May 22, 2025, 4:08:47 AM5/22/25
to firebird...@googlegroups.com
'liviuslivius' via firebird-support wrote 22.05.2025 8:34:
> However, if you're working directly on the server, you can use XNet or the
> embedded mode, which should be faster.

Embedded mode would be *slower* in this case.

--
WBR, SD.

Hamish Moffatt

unread,
May 22, 2025, 6:35:28 AM5/22/25
to firebird...@googlegroups.com
On 22/5/25 16:40, Tomasz Tyrakowski wrote:
On 22.05.2025 at 04:08, 'Hamish Moffatt' via firebird-support wrote:
I have a table (in Firebird 5.0.2) which is simply a pair of INTEGER columns, with a primary key across both. I need to read the whole table out, and there's 1.3 million records, and this is slow - it's taking over 12 seconds on a reasonably fast PC

The issue seems to be just the number of calls to isc_dsql_fetch and the logic to unpack the structures. I'm using Qt and its built-in driver for Firebird.

The server is running on the same machine. I fiddled with TcpRemoteBufferSize and Wirecrypt settings and didn't see any change there.

Is there any way to speed this up?
How do you connect to your database? Via TCP/IP, XNET, or in embedded mode?
If the speed of fetching data is a crucial factor, use embedded mode - this way all DB access code lives in your process' space and all data transfer boils down to copying memory blocks.
If you need to go through the actual server process, try XNET - https://github.com/FirebirdSQL/firebird/blob/master/doc/README.xnet.


I do need to use the server, as I have other applications connecting too. In production my application server runs in a different container to Firebird, so it will have to be TCP.

I did try XNET in my development environment just for interest's sake. Fetching all those rows was slightly faster, but not significantly. Possibly within the noise.


Interestingly when I force quit my application, the server logged

HAMISH-WIN10    Thu May 22 20:33:35 2025
        XNET error: Server shutdown detected

then connecting to it over XNET again hangs. And one time I got


HAMISH-WIN10    Thu May 22 19:44:48 2025
        XNET error: XNET server initialization failed. Probably another instance of server is already running.
        operating system directive CreateMutex failed
        Cannot create a file when that file already exists.



Hamish

Pieter Bas Hofstede

unread,
May 22, 2025, 10:49:14 AM5/22/25
to firebird-support
Does reading cached data makes a difference compared to non-cached data?
For kind of the same query (select first 1.3mill from ..) takes about 4.4seconds for me (fetch all)
Op donderdag 22 mei 2025 om 12:35:28 UTC+2 schreef ham...@risingsoftware.com:

Hamish Moffatt

unread,
May 22, 2025, 8:22:05 PM5/22/25
to firebird...@googlegroups.com
On 23/5/25 00:49, Pieter Bas Hofstede wrote:
> Does reading cached data makes a difference compared to non-cached data?
> For kind of the same query (select first 1.3mill from ..) takes about
> 4.4seconds for me (fetch all)


It's taking 11 seconds for me.

What sort of caching do you mean? I had it fetch the data twice on the
same connection, and it wasn't any quicker the second time.


Hamish


Hamish Moffatt

unread,
May 22, 2025, 9:48:26 PM5/22/25
to firebird...@googlegroups.com
On 22/5/25 12:08, 'Hamish Moffatt' via firebird-support wrote:
> Crazy ideas include a stored proc or user function that returns as
> many rows as possible packed into one, in JSON format, or a string
> list or something. (or an ARRAY but that seems to be discouraged.)


I gave this a try - a stored proc returning an INTEGER and a BLOB text
blob containing a JSON list of numbers - and it was slower - 11 seconds
grew to 20.

I wonder if it would be faster if using a VARBINARY instead of a blob.
Is there a way to pack the integers into 32-bit blocks in a VARBINARY? I
couldn't find the right conversion function or cast.


Hamish

Pieter Bas Hofstede

unread,
May 23, 2025, 2:22:39 AM5/23/25
to firebird-support
With caching I mean data cached by either Firebird(dbcachepages) of OS (filesystemcache) which will prevent going to disk in a consecutive action.

Things to look at:
- no where-clause in your SQL (you're looking up data natural instead of by accident using an index)
- no sorting in your SQL (could have the need of creating temp sortfiles which could slow things down)
- table formats / record versions / record fragments (check with gstat [server/dbname] -r -t [tablename]). If table/records have lots of versions / formats then backup + restore


Op vrijdag 23 mei 2025 om 03:48:26 UTC+2 schreef ham...@risingsoftware.com:

Hamish Moffatt

unread,
May 23, 2025, 2:29:44 AM5/23/25
to firebird...@googlegroups.com
On 23/5/25 16:22, Pieter Bas Hofstede wrote:
> With caching I mean data cached by either Firebird(dbcachepages) of OS
> (filesystemcache) which will prevent going to disk in a consecutive
> action.
>
> Things to look at:
> - no where-clause in your SQL (you're looking up data natural instead
> of by accident using an index)
> - no sorting in your SQL (could have the need of creating temp
> sortfiles which could slow things down)
> - table formats / record versions / record fragments (check with gstat
> [server/dbname] -r -t [tablename]). If table/records have lots of
> versions / formats then backup + restore


Thanks for this.

The plan shows the select is using the primary key. There's no where
clause. My testing is on a fresh restore of the db.

It seems to me the issue is just the sheer number of rows, with no easy
solution except to change the application not to require this all to be
read in at once.


Hamish


Vlad Khorsun

unread,
May 23, 2025, 1:26:45 PM5/23/25
to firebird-support
The plan shows the select is using the primary key. There's no where
clause. 

  ORDER BY PK ?
 
Regards,
Vlad

Hamish Moffatt

unread,
May 25, 2025, 1:45:10 AM5/25/25
to firebird...@googlegroups.com
On 24/5/25 03:26, Vlad Khorsun wrote:
The plan shows the select is using the primary key. There's no where
clause. 

  ORDER BY PK ?


Yes.


SQL> set planonly ;
SQL> select course_num,class_num from g_course_class order by 1,2 ;

PLAN (G_COURSE_CLASS ORDER PK_G_COURSE_CLASS)


Hamish

Mark Rotteveel

unread,
May 25, 2025, 4:26:56 AM5/25/25
to firebird...@googlegroups.com
On 23/05/2025 03:48, 'Hamish Moffatt' via firebird-support wrote:
> On 22/5/25 12:08, 'Hamish Moffatt' via firebird-support wrote:
>> Crazy ideas include a stored proc or user function that returns as
>> many rows as possible packed into one, in JSON format, or a string
>> list or something. (or an ARRAY but that seems to be discouraged.)
>
>
> I gave this a try - a stored proc returning an INTEGER and a BLOB text
> blob containing a JSON list of numbers - and it was slower - 11 seconds
> grew to 20.

For a single row, or for multiple rows? How did you generate the blob?

> I wonder if it would be faster if using a VARBINARY instead of a blob.
> Is there a way to pack the integers into 32-bit blocks in a VARBINARY? I
> couldn't find the right conversion function or cast.

Possibly, but once Firebird 5.0.3 is released, and you use a client that
supports protocol 19 with inline blobs, such an improvement will go
largely out the window because blobs (default: smaller than 64 KiB) will
be sent together with the query result, unless the bottleneck is
elsewhere (e.g. generating the blob).

Mark
--
Mark Rotteveel

Vlad Khorsun

unread,
May 25, 2025, 4:36:12 AM5/25/25
to firebird-support
  Could you try without ORDER BY  - to check if it is a reason of the slowness ?
Or, if it is absolute necessary, try to force sort:

select course_num + 0, class_num from g_course_class order by 1,2 ;

Regards,
Vlad

Hamish Moffatt

unread,
May 25, 2025, 7:38:24 AM5/25/25
to firebird...@googlegroups.com
On 25/5/25 18:26, 'Mark Rotteveel' via firebird-support wrote:
On 23/05/2025 03:48, 'Hamish Moffatt' via firebird-support wrote:
On 22/5/25 12:08, 'Hamish Moffatt' via firebird-support wrote:
Crazy ideas include a stored proc or user function that returns as many rows as possible packed into one, in JSON format, or a string list or something. (or an ARRAY but that seems to be discouraged.)


I gave this a try - a stored proc returning an INTEGER and a BLOB text blob containing a JSON list of numbers - and it was slower - 11 seconds grew to 20.

For a single row, or for multiple rows? How did you generate the blob?


The table links each COURSE_NUM to a list of CLASS_NUM, so I return a row for each COURSE_NUM with a JSON array of CLASS_NUM integers.


Table:

COURSE_NUM                      INTEGER Not Null
CLASS_NUM                       INTEGER Not Null DEFAULT 1
CONSTRAINT FK_G_COURSE_CLASS_CLASS:
  Foreign key (CLASS_NUM)    References G_CLASS (CLASS_NUM) On Update Cascade On Delete Cascade
CONSTRAINT FK_G_COURSE_CLASS_TEST:
  Foreign key (COURSE_NUM)    References G_COURSE (COURSE_NUM) On Update Cascade On Delete Cascade
CONSTRAINT PK_G_COURSE_CLASS:
  Primary key (COURSE_NUM, CLASS_NUM)


Procedure text:
=============================================================================
declare variable PREV_COURSE integer;
declare variable THIS_COURSE integer;
declare variable THIS_CLASS integer;
begin
    prev_course = -1;
    for select course_num, class_num
    from g_course_class
    order by course_num, class_num
    into :this_course, :this_class

    do
    begin
        if (prev_course != this_course) then
        begin
            if (prev_course <> -1) then
            begin
                classes = blob_append(classes, ']');
                course_num = prev_course;
                suspend;
            end

            prev_course = this_course;
            classes = blob_append(null, '[', this_class);

        end
        else
            classes = blob_append(classes, ',', this_class);

    end

    classes = blob_append(classes, ']');
    course_num = prev_course;
    suspend;

end




Hamish


Hamish Moffatt

unread,
May 25, 2025, 7:43:41 AM5/25/25
to firebird...@googlegroups.com
On 25/5/25 18:36, Vlad Khorsun wrote:

SQL> set planonly ;
SQL> select course_num,class_num from g_course_class order by 1,2 ;

PLAN (G_COURSE_CLASS ORDER PK_G_COURSE_CLASS)


  Could you try without ORDER BY  - to check if it is a reason of the slowness ?
Or, if it is absolute necessary, try to force sort:

select course_num + 0, class_num from g_course_class order by 1,2 ;


Yes, those are a bit faster. Testing from Python,

select COURSE_NUM, CLASS_NUM from G_COURSE_CLASS order by COURSE_NUM, CLASS_NUM - 12 seconds

select COURSE_NUM, CLASS_NUM from G_COURSE_CLASS - 9 seconds

select COURSE_NUM+0, CLASS_NUM from G_COURSE_CLASS order by 1, 2 - 10.2 seconds


My server version here is 5.0.1.


Hamish

Hamish Moffatt

unread,
May 25, 2025, 7:52:35 AM5/25/25
to firebird...@googlegroups.com
On 25/5/25 18:26, 'Mark Rotteveel' via firebird-support wrote:
> Possibly, but once Firebird 5.0.3 is released, and you use a client
> that supports protocol 19 with inline blobs, such an improvement will
> go largely out the window because blobs (default: smaller than 64 KiB)
> will be sent together with the query result, unless the bottleneck is
> elsewhere (e.g. generating the blob).


That'll be awesome. We have some tables with lots of rows with
potentially large text fields (but often not many chars actually used).
We had big VARCHARs and wasted a lot of disk space, but we tried moving
them all to blobs and found it was quite slow to read because of the
extra calls to read the blobs.

Does that just mean a new fbclient version or the application code also
needs to be adapted?


Hamish

Mark Rotteveel

unread,
May 25, 2025, 8:00:25 AM5/25/25
to firebird...@googlegroups.com
You'll need Firebird 5.0.3 or higher and a fbclient 5.0.3 or higher for
this. There is no need to adjust application code, unless you want to
configure the maximum inline blob size and the size of the
per-connection cache. If the application asks for a blob, the client
checks the cache, and if it's there serve it from the cache, otherwise
ask the server. If the cache is full (e.g. you query blobs, but don't
open and close them), received inline blobs are dropped, and will need
to be retrieved from the server if you do want to open one.

As an aside, the default of 65535 bytes I mentioned is also the maximum
(at least for now), configuring a higher value will have the same effect
as specifying 65535.

Mark
--
Mark Rotteveel

Mark Rotteveel

unread,
May 25, 2025, 8:10:37 AM5/25/25
to firebird...@googlegroups.com
On 25/05/2025 14:00, 'Mark Rotteveel' via firebird-support wrote:
> You'll need Firebird 5.0.3 or higher and a fbclient 5.0.3 or higher for
> this. There is no need to adjust application code, unless you want to
> configure the maximum inline blob size and the size of the per-
> connection cache. If the application asks for a blob, the client checks
> the cache, and if it's there serve it from the cache, otherwise ask the
> server. If the cache is full (e.g. you query blobs, but don't open and
> close them), received inline blobs are dropped, and will need to be
> retrieved from the server if you do want to open one.
To be explicit: using inline blobs or server-side blobs is transparent
to the client user, it is decided inside fbclient.

There is only one caveat: the application shouldn't use a blob parameter
buffer when opening the blob. If it does, fbclient assumes you're
requesting a sub type or character set conversion, and sends it to the
server, bypassing the cache.

Mark
--
Mark Rotteveel

Vlad Khorsun

unread,
May 27, 2025, 3:59:46 PM5/27/25
to firebird-support
  Tried with IBE,  connect using INET (localhost), table with two INTEGER fields, 1'300'000 records:

create table t2 (id int not null, f2 int);
commit;

execute block
as
declare v_id int = 0;
begin
  while (v_id < 1300000) do
  begin
    insert into t2 (id, f2) values (:v_id, mod(:v_id, 1000));
    v_id = v_id + 1;
  end
end

commit

alter table t2 add constraint pk_t2 primary key (id)
commit

All queries below was run with "fetch all":

query1:
select count(*) from t2

Plan
--------------------------------------------------------------------------------
PLAN (T2 NATURAL)

------ Performance info ------
Prepare time = 0ms
Execute time = 156ms
Avg fetch time = 156,00 ms
Current memory = 19 727 952
Max memory = 19 756 720
Memory buffers = 2 048
Reads from disk to cache = 9 565
Writes from cache to disk = 1
Fetches from cache = 1 338 242

query2:
select * from t2

PLAN (T2 NATURAL)
Execute time = 2s 500ms


query3:
select * from t2
order by id, f2

PLAN SORT (T2 NATURAL)
Execute time = 2s 828ms

query4:
select * from t2
order by id

PLAN (T2 ORDER PK_T2)
Execute time = 2s 438ms

Looks like problem with client app (python ?), not with Firebird.
One of the obvious suspect is how memory for resultset is allocated.

Regards,
Vlad

Hamish Moffatt

unread,
May 27, 2025, 7:53:00 PM5/27/25
to firebird...@googlegroups.com
On 28/5/25 05:59, Vlad Khorsun wrote:
  Tried with IBE,  connect using INET (localhost), table with two INTEGER fields, 1'300'000 records:


Thanks, I can also reproduce this in IBExpert, though my table has a primary key over both values, and both are also foreign keys.


CREATE TABLE G_COURSE_CLASS (
  COURSE_NUM INTEGER NOT NULL,
  CLASS_NUM INTEGER DEFAULT 1 NOT NULL,
  CONSTRAINT PK_G_COURSE_CLASS PRIMARY KEY (COURSE_NUM, CLASS_NUM),
  CONSTRAINT FK_G_COURSE_CLASS_TEST FOREIGN KEY (COURSE_NUM) REFERENCES G_COURSE (COURSE_NUM) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT FK_G_COURSE_CLASS_CLASS FOREIGN KEY (CLASS_NUM) REFERENCES G_CLASS (CLASS_NUM) ON DELETE CASCADE ON UPDATE CASCADE
);


select course_num, class_num
from g_course_class

PLAN (G_COURSE_CLASS NATURAL)
Execute time = 3s 641ms


select course_num+0, class_num


from g_course_class order by 1, 2

PLAN SORT (G_COURSE_CLASS NATURAL)
Execute time = 4s 500ms


select course_num, class_num


from g_course_class order by 1, 2

PLAN (G_COURSE_CLASS ORDER PK_G_COURSE_CLASS)
Execute time = 11s 563ms


Why is it much slower to use the primary key for the sort?


Looks like problem with client app (python ?), not with Firebird.
One of the obvious suspect is how memory for resultset is allocated.


I did the quick benchmarking in Python using fdb, and is much slower than the above.

My application is in Qt (C++). I made the query change above and I see it is much faster now from my C++ application, down from 12 seconds to 5.4.


Thanks,

Hamish


Reply all
Reply to author
Forward
0 new messages