Do empty varchar, integer and numeric fields degrade the Firebird operations and performance?

22 views
Skip to first unread message

Alexander Skara

unread,
Jul 15, 2022, 7:26:16 AMJul 15
to firebird-support
I have 2 tables: 1) for articles 2) for persons. And they are used in many different scenarios and applications. That is - I am using 1 table for the entire hierarchy of inheritance of article entity (JPA notions how inheritance is mapped to different ORM/SQL structures). I.e. my article table can contain datafields for the articles that are glasses and lenses (optic stores). My article table also contains fields for the articles that are wood products, beverages, jewellery etc. Lot of fields and most of them are empty and only some of them are filled and used for specific articles. Article table can contain 100K or more records.

My question - does such strategy, indiscrimante addition of new, mostly empty fields, does such strategy degrades performance or has other operational drawbacks?

As I understand, then there are schema ODS pages that contain the information about the fields and their datatypes and then there are data ODS pages that contain the data only. I.e. data ODS pages does not contain anything for the empty fields and hence - lot of empty fields should not increase the DB size or degrade the search/read/insert peformance.

BR
Alex

Stanislav Hruška

unread,
Jul 15, 2022, 7:57:53 AMJul 15
to firebird-support
I'm not an expert. I do not know answer. But it smells bad DB design to me.

Dátum: piatok 15. júla 2022, čas: 13:26:16 UTC+2, odosielateľ: alexand...@gmail.com

Mark Rotteveel

unread,
Jul 15, 2022, 7:59:51 AMJul 15
to firebird...@googlegroups.com
On 15-07-2022 13:26, Alexander Skara wrote:
> I have 2 tables: 1) for articles 2) for persons. And they are used in
> many different scenarios and applications. That is - I am using 1 table
> for the entire hierarchy of inheritance of article entity (JPA notions
> how inheritance is mapped to different ORM/SQL structures). I.e. my
> article table can contain datafields for the articles that are glasses
> and lenses (optic stores). My article table also contains fields for the
> articles that are wood products, beverages, jewellery etc. Lot of fields
> and most of them are empty and only some of them are filled and used for
> specific articles. Article table can contain 100K or more records.
>
> My question - does such strategy, indiscrimante addition of new, mostly
> empty fields, does such strategy degrades performance or has other
> operational drawbacks?

Obviously, not having those fields will perform better, as it saves
overhead on having additional columns, additional I/O (though this is
partially mitigated by RLE on-disk), additional CPU for
compressing/decompressing the RLE, additional memory for the record
image, additional I/O for sending data (though this is largely mitigated
by the null-bitset *if* you use NULL instead of, for example, empty
string or 0), and additional memory client-side as Jaybird will need to
create additional objects for handling those columns in the result set,
and your JPA objects are larger because they will have more instance
fields (roughly 4 bytes per field).

However, you would need to measure this in order to quantify that
impact. I guess in the grand scheme of things, it won't matter that
much, but the effect is non-zero.

> As I understand, then there are schema ODS pages that contain the
> information about the fields and their datatypes and then there are data
> ODS pages that contain the data only. I.e. data ODS pages does not
> contain anything for the empty fields and hence - lot of empty fields
> should not increase the DB size or degrade the search/read/insert
> peformance.

That is a misunderstanding (or at least an oversimplification):
additional columns, even when NULL, do require more space than not
having those columns. The impact of empty columns is partially mitigated
by the RLE compression that is done on the record, but having compressed
columns will still take more disk I/O (and CPU) than not having to
compress things because they aren't there.

For example, on the server, a NULL VARCHAR(50) still has an in-memory
image of 52 bytes (2-byte length with value zero, followed 50 0x00
bytes). On-disk, this is reduced to 2 bytes (0x34 0x00, or byte with
value 52, byte with value 0) (possibly more efficient if there are
multiple NULL records next to each other).

A NULL VARCHAR(1000) is in-memory 1002 bytes, but on-disk
16 bytes (7x (0x7f 0x00) + (0x71 0x00)) (ignoring possible efficiencies
of neighbouring NULLs).

Mark

--
Mark Rotteveel

Hamish Moffatt

unread,
Jul 18, 2022, 12:27:53 AMJul 18
to firebird...@googlegroups.com
On 15/7/22 21:59, Mark Rotteveel wrote:
> For example, on the server, a NULL VARCHAR(50) still has an in-memory
> image of 52 bytes (2-byte length with value zero, followed 50 0x00
> bytes). On-disk, this is reduced to 2 bytes (0x34 0x00, or byte with
> value 52, byte with value 0) (possibly more efficient if there are
> multiple NULL records next to each other).


But, as I recall from previous discussions, UTF-8 VARCHAR takes 4x the
space specified, and the RLE compression can do at best 4:1, so an empty
VARCHAR(4096) still takes 128 bytes per row.

We made the mistake of assuming a null varchar took basically no space.
But with tens to hundreds of thousands of records it starts to add up.



Hamish

Dmitry Yemanov

unread,
Jul 18, 2022, 1:01:56 AMJul 18
to firebird...@googlegroups.com
18.07.2022 07:28, 'Hamish Moffatt' via firebird-support wrote:
>
> But, as I recall from previous discussions, UTF-8 VARCHAR takes 4x the
> space specified, and the RLE compression can do at best 4:1

64:1


Dmitry

Hamish Moffatt

unread,
Jul 18, 2022, 2:42:45 AMJul 18
to firebird...@googlegroups.com
Thanks. I meant to type 128:1. But it's actually 128:2 ie 64:1, right?


Hamish

Reply all
Reply to author
Forward
0 new messages