On 15-07-2022 13:26, Alexander Skara wrote:
> I have 2 tables: 1) for articles 2) for persons. And they are used in
> many different scenarios and applications. That is - I am using 1 table
> for the entire hierarchy of inheritance of article entity (JPA notions
> how inheritance is mapped to different ORM/SQL structures). I.e. my
> article table can contain datafields for the articles that are glasses
> and lenses (optic stores). My article table also contains fields for the
> articles that are wood products, beverages, jewellery etc. Lot of fields
> and most of them are empty and only some of them are filled and used for
> specific articles. Article table can contain 100K or more records.
>
> My question - does such strategy, indiscrimante addition of new, mostly
> empty fields, does such strategy degrades performance or has other
> operational drawbacks?
Obviously, not having those fields will perform better, as it saves
overhead on having additional columns, additional I/O (though this is
partially mitigated by RLE on-disk), additional CPU for
compressing/decompressing the RLE, additional memory for the record
image, additional I/O for sending data (though this is largely mitigated
by the null-bitset *if* you use NULL instead of, for example, empty
string or 0), and additional memory client-side as Jaybird will need to
create additional objects for handling those columns in the result set,
and your JPA objects are larger because they will have more instance
fields (roughly 4 bytes per field).
However, you would need to measure this in order to quantify that
impact. I guess in the grand scheme of things, it won't matter that
much, but the effect is non-zero.
> As I understand, then there are schema ODS pages that contain the
> information about the fields and their datatypes and then there are data
> ODS pages that contain the data only. I.e. data ODS pages does not
> contain anything for the empty fields and hence - lot of empty fields
> should not increase the DB size or degrade the search/read/insert
> peformance.
That is a misunderstanding (or at least an oversimplification):
additional columns, even when NULL, do require more space than not
having those columns. The impact of empty columns is partially mitigated
by the RLE compression that is done on the record, but having compressed
columns will still take more disk I/O (and CPU) than not having to
compress things because they aren't there.
For example, on the server, a NULL VARCHAR(50) still has an in-memory
image of 52 bytes (2-byte length with value zero, followed 50 0x00
bytes). On-disk, this is reduced to 2 bytes (0x34 0x00, or byte with
value 52, byte with value 0) (possibly more efficient if there are
multiple NULL records next to each other).
A NULL VARCHAR(1000) is in-memory 1002 bytes, but on-disk
16 bytes (7x (0x7f 0x00) + (0x71 0x00)) (ignoring possible efficiencies
of neighbouring NULLs).
Mark
--
Mark Rotteveel