We plan to acquire storage for a system based on iRODS + Postgres.
Is there any iRODS documentation that would help us estimate the
iCAT storage required, based on the default iRODS iCAT schema?
For now, we will base our estimate on 300 million files.
Thank you.
Alinga
A PostgreSQL database may require up to five times the disk space needed to store the same data in a text file.
As an example, consider a file of 100,000 lines with an integer and text description on each line. Suppose the text string averages twenty bytes in length. The flat file would be 2.8 MB. The size of the PostgreSQL database file containing this data can be estimated as 5.2 MB:
    24 bytes: each row header (approximate)
    24 bytes: one int field and one text field
  +  4 bytes: pointer on page to tuple
  ----------------------------------------
    52 bytes per row
The data page size in PostgreSQL is 8192 bytes (8 KB), so:
    8192 bytes per page
    -------------------  =  157 rows per database page (rounded down)
      52 bytes per row

    100000 data rows
    ------------------  =  637 database pages (rounded up)
    157 rows per page

    637 database pages * 8192 bytes per page = 5,218,304 bytes (5.2 MB)
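The arithmetic above can be sketched as a small Python helper. This is only an illustration of the FAQ-style estimate, not an exact model of PostgreSQL's on-disk layout: it ignores the page header, alignment padding, and indexes. The function name `estimate_table_bytes` and its parameters are made up for this example. Integer division gives 8192 // 52 = 157 rows per page, hence 637 pages.

```python
import math

PAGE_SIZE = 8192  # PostgreSQL default data page size, in bytes


def estimate_table_bytes(rows, row_header=24, row_data=24, item_pointer=4,
                         page_size=PAGE_SIZE):
    """Rough table size estimate; ignores page headers, padding and indexes.

    Hypothetical helper: the defaults are the per-row figures quoted above.
    """
    bytes_per_row = row_header + row_data + item_pointer
    rows_per_page = page_size // bytes_per_row   # rounded down
    pages = math.ceil(rows / rows_per_page)      # rounded up
    return pages, pages * page_size


if __name__ == "__main__":
    pages, size = estimate_table_bytes(100_000)
    print(f"{pages} pages, {size:,} bytes ({size / 1e6:.1f} MB)")
```

Changing `row_data` to the average row payload of another table gives a comparable estimate for that table, under the same simplifications.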
Indexes do not require as much overhead, but do contain the data that is being indexed, so they can be large also.
NULLs are stored as bitmaps, so they use very little space.
Note that long values may be compressed transparently.
See also the presentation on this topic: "How Long Is a String" (PDF).
==============
Based on the above information, a rough estimate for 100M files would
be:
(size of (r_data_main) + size of (r_objt_access)) * 100M * 5
= (7618 B + 88 B) * 500M
= 3.853 TB
For an upper bound, if we assume having to create a collection for each
file, a rough estimate would be:
3.853 TB + (size of (r_coll_main) + size of (r_objt_access)) * 100M * 5
= 3.853 TB + (13662 B + 88 B) * 500M
= 3.853 TB + 6.875 TB
= 10.728 TB
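The same estimate can be parameterised by file count, for example to try the 300 million files mentioned in the original question. The sketch below only reproduces the arithmetic above: the per-row sizes (7618 B, 13662 B, 88 B) and the factor of 5 are taken from this thread, while the function name `estimate_icat_bytes` and its `collection_per_file` flag are hypothetical.

```python
R_DATA_MAIN_ROW = 7618     # bytes per row, figure used in this thread
R_COLL_MAIN_ROW = 13662    # bytes per row, figure used in this thread
R_OBJT_ACCESS_ROW = 88     # bytes per row, figure used in this thread
OVERHEAD_FACTOR = 5        # "up to five times" rule of thumb quoted above


def estimate_icat_bytes(n_files, collection_per_file=False):
    """Return the estimated iCAT size in bytes for n_files data objects."""
    per_file = R_DATA_MAIN_ROW + R_OBJT_ACCESS_ROW
    if collection_per_file:
        # Upper bound: one collection (plus its access row) per file.
        per_file += R_COLL_MAIN_ROW + R_OBJT_ACCESS_ROW
    return per_file * n_files * OVERHEAD_FACTOR


if __name__ == "__main__":
    for n in (100_000_000, 300_000_000):
        low = estimate_icat_bytes(n) / 1e12
        high = estimate_icat_bytes(n, collection_per_file=True) / 1e12
        print(f"{n:,} files: {low:.3f} TB to {high:.3f} TB")
```

For 300M files this gives roughly 11.6 TB to 32.2 TB, under the same assumptions as the 100M figures.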
create table R_DATA_MAIN
(
   ...
   data_name varchar(1000) not null,
   ...
   data_path varchar(2700) not null,
   ...
   data_checksum varchar(1000),
   ...
   r_comment varchar(1000),
   ...
);

create table R_COLL_MAIN
(
   ...
   parent_coll_name varchar(2700) not null,
   coll_name varchar(2700) not null,
   ...
   coll_inheritance varchar(1000),
   ...
   coll_info1 varchar(2700) DEFAULT '0',
   coll_info2 varchar(2700) DEFAULT '0',
   ...
   r_comment varchar(1000),
   ...
);
Could these fields be made smaller?
Alinga
Alinga: Just to point out, the iRODS iCAT does not store your physical files,
only the metadata for each file (e.g., the file's path, its size, ...)