Patches for iRods 2.1


mw...@diceresearch.org

Sep 4, 2009, 3:45:18 PM
to iROD-Chat
Hello,

I put together a set of patches for 2.1. It can be downloaded from the FTP site ftp.sdsc.edu at the path pub/outgoing/mwan/irods/patchFor2.1.tar. The file irods2.1patch.note describes the patches.


Mike

Alinga Yeung

Sep 16, 2009, 8:38:34 PM
to irod...@googlegroups.com
Hi,

We plan to acquire storage for a system based on iRODS + Postgres. Is
there any iRODS documentation that would help us estimate the iCAT
storage required, based on the default iRODS iCAT? For now, we shall
base our estimate on 300 million files.

Thank you.

Alinga

ali...@uvic.ca

Sep 16, 2009, 9:41:00 PM
to irod...@googlegroups.com
Sorry, I meant to say 100 million files.



Arcot (Raja) Rajasekar

Sep 17, 2009, 7:08:06 AM
to irod...@googlegroups.com, Alinga Yeung


Hi Alinga,
We have no system that has reached 300 million files, so we can only
guess. I suspect that such a system will need about 250 GB of disk space
just for the iCAT system metadata. It may be less, but this should be
adequate. For better performance, I would suggest a multi-core,
multi-CPU system with lots of RAM (64 GB??). Remember that this is a
large system.

Also, a very good idea would be to run Postgres on a cluster, if you
can, and use connection pooling and partitioning to get good
performance.

thanks
raja

--
Arcot Rajasekar
Professor
School of Information and Library Sciences
University of North Carolina, Chapel Hill, NC

schr...@diceresearch.org

Sep 17, 2009, 11:43:07 AM
to irod...@googlegroups.com, Alinga Yeung
Hi Alinga,

In addition to what Raja suggested, here are some details that might be
useful to you.

When an iRODS file is added, one row is added to the table r_data_main
and one to r_objt_access. If you add user-defined metadata, rows are
added to other tables as well, but those two hold the basic data-object
information. These tables are defined as:

create table R_DATA_MAIN
(
  data_id bigint not null,
  coll_id bigint not null,
  data_name varchar(1000) not null,
  data_repl_num INTEGER not null,
  data_version varchar(250) DEFAULT '0',
  data_type_name varchar(250) not null,
  data_size bigint not null,
  resc_group_name varchar(250),
  resc_name varchar(250) not null,
  data_path varchar(2700) not null,
  data_owner_name varchar(250) not null,
  data_owner_zone varchar(250) not null,
  data_is_dirty INTEGER DEFAULT 0,
  data_status varchar(250),
  data_checksum varchar(1000),
  data_expiry_ts varchar(32),
  data_map_id bigint DEFAULT 0,
  data_mode varchar(32),
  r_comment varchar(1000),
  create_ts varchar(32),
  modify_ts varchar(32)
);

And:

create table R_OBJT_ACCESS
(
  object_id bigint not null,
  user_id bigint not null,
  access_type_id bigint not null,
  create_ts varchar(32),
  modify_ts varchar(32)
);


Similarly, when collections are added, a row is added to r_coll_main and
one to r_objt_access.

So you might check with the Postgres folks and/or their documentation to
see if you can find recommendations for sizing for platforms and disks
supporting tables of similar structure and size. Please let us know
what you discover.

Thanks,

- Wayne -

Alinga Yeung

Sep 17, 2009, 6:25:06 PM
to iRODS
Hi Wayne,

I found the following in the PostgreSQL FAQ:
=============

How much database disk space is required to store data from a typical text file?

A PostgreSQL database may require up to five times the disk space to store data from a text file.

As an example, consider a file of 100,000 lines with an integer and text description on each line. Suppose the text string averages twenty bytes in length. The flat file would be 2.8 MB. The size of the PostgreSQL database file containing this data can be estimated as 5.2 MB:

 24 bytes: each row header (approximate)
 24 bytes: one int field and one text field
+ 4 bytes: pointer on page to tuple
----------------------------------------
 52 bytes per row

The data page size in PostgreSQL is 8192 bytes (8 KB), so:

8192 bytes per page
-------------------  =  158 rows per database page (rounded down)
  52 bytes per row

 100000 data rows
------------------  =  633 database pages (rounded up)
 158 rows per page

633 database pages * 8192 bytes per page  =  5,185,536 bytes (5.2 MB)

Indexes do not require as much overhead, but do contain the data that is being indexed, so they can be large also.

NULLs are stored as bitmaps, so they use very little space.

Note that long values may be compressed transparently.

See also this presentation on the topic: Image:How Long Is a String.pdf.
==============
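
The FAQ arithmetic quoted above can be reproduced with a minimal Python sketch (an illustration only, not part of the FAQ or the original post; the variable names are made up here). Note that integer division of 8192 by 52 gives 157 rows per page rather than the quoted 158, so the page count comes out slightly higher, although the total still rounds to 5.2 MB.

```python
import math

# Figures taken from the FAQ excerpt quoted above.
PAGE_BYTES = 8192            # PostgreSQL data page size
ROW_BYTES = 24 + 24 + 4      # row header + fields + page pointer = 52
ROWS = 100_000               # lines in the example text file

# Whole rows that fit on one page (rounded down).
rows_per_page = PAGE_BYTES // ROW_BYTES

# Pages needed to hold all rows (rounded up).
pages = math.ceil(ROWS / rows_per_page)

total_bytes = pages * PAGE_BYTES

print(f"{rows_per_page} rows per page")
print(f"{pages} pages")
print(f"{total_bytes:,} bytes ({total_bytes / 1e6:.1f} MB)")
```

This sketch follows the FAQ's simplification and ignores any per-page header overhead.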

Based on the above information, a rough estimate for 100M files, using the maximum declared row sizes, would be:
(size of (r_data_main) + size of (r_objt_access)) * 100M * 5
= (7618 B + 88 B) * 500M
= 3.853 TB
For an upper bound, if we assume that a collection is created for each file, a rough estimate would be:
3.853 TB + (size of (r_coll_main) + size of (r_objt_access)) * 100M * 5
= 3.853 TB + (13662 B + 88 B) * 500M
= 3.853 TB + 6.875 TB
= 10.728 TB
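
The per-row figures above can be checked with a minimal Python sketch (an illustration added here, not part of the original post). It assumes 8 bytes per bigint, 4 bytes per INTEGER, and counts each varchar(n) at its full declared length of n bytes, which appears to be how the 7618 B and 88 B figures were derived. The 13662 B figure for r_coll_main is taken from the post as given, since that table's full definition is not shown in the thread.

```python
# Assumed storage widths: bigint = 8 bytes, INTEGER = 4 bytes,
# varchar(n) counted at its full declared length n (worst case).
BIGINT, INTEGER = 8, 4

R_DATA_MAIN = {
    "data_id": BIGINT, "coll_id": BIGINT, "data_name": 1000,
    "data_repl_num": INTEGER, "data_version": 250, "data_type_name": 250,
    "data_size": BIGINT, "resc_group_name": 250, "resc_name": 250,
    "data_path": 2700, "data_owner_name": 250, "data_owner_zone": 250,
    "data_is_dirty": INTEGER, "data_status": 250, "data_checksum": 1000,
    "data_expiry_ts": 32, "data_map_id": BIGINT, "data_mode": 32,
    "r_comment": 1000, "create_ts": 32, "modify_ts": 32,
}

R_OBJT_ACCESS = {
    "object_id": BIGINT, "user_id": BIGINT, "access_type_id": BIGINT,
    "create_ts": 32, "modify_ts": 32,
}

data_main_bytes = sum(R_DATA_MAIN.values())
objt_access_bytes = sum(R_OBJT_ACCESS.values())
coll_main_bytes = 13662   # taken from the post; definition not shown here

FILES = 100_000_000       # 100M files
FACTOR = 5                # FAQ's "up to five times" multiplier
TB = 10**12               # decimal terabyte

files_estimate = (data_main_bytes + objt_access_bytes) * FILES * FACTOR
colls_estimate = (coll_main_bytes + objt_access_bytes) * FILES * FACTOR
upper_bound = files_estimate + colls_estimate

print(f"r_data_main row:   {data_main_bytes} B")
print(f"r_objt_access row: {objt_access_bytes} B")
print(f"files only:        {files_estimate / TB:.3f} TB")
print(f"with collections:  {upper_bound / TB:.3f} TB")
```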

This seems to be a large amount of disk space.
Some of the fields in R_DATA_MAIN and R_COLL_MAIN seem quite big. For example,
create table R_DATA_MAIN
 (
   ...
   data_name varchar(1000) not null,
   ...
   data_path varchar(2700) not null,
   ...
   data_checksum varchar(1000),
   ...
   r_comment varchar(1000),
   ...
 );

create table R_COLL_MAIN
(
  ...
  parent_coll_name varchar(2700) not null,
  coll_name varchar(2700) not null,
  ...
  coll_inheritance varchar(1000),
  ...
  coll_info1 varchar(2700) DEFAULT '0',
  coll_info2 varchar(2700) DEFAULT '0',
  ...
  r_comment varchar(1000),
  ...
);

Could these fields be made smaller?

Alinga

she...@diceresearch.org

Sep 17, 2009, 6:55:50 PM
to irod...@googlegroups.com, iRODS
Alinga: Just want to point out that the iRODS iCAT does not store your physical files,
just the metadata of each physical file (e.g., path of the file, size of the file, ...).

The information you found in the PostgreSQL FAQ is about storing the contents of a "typical text file"
in a Postgres database.

Alinga Yeung

Sep 17, 2009, 7:06:01 PM
to irod...@googlegroups.com
she...@diceresearch.org wrote:
> Alinga: Just want to point out that the iRODS iCAT does not store your physical files,
> just the metadata of each physical file (e.g., path of the file, size of the file, ...).

Yes, thank you for pointing this out. The factor of 5 does not apply.

Alinga
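
Dropping the factor of 5 from the earlier estimate can be sketched as follows (a minimal illustration, not part of the original thread). The per-row byte counts are the maximum declared widths quoted earlier in the thread. Since PostgreSQL stores only the actual length of a varchar value, real rows should be considerably smaller, while per-row overhead and indexes (not counted here) add to the total.

```python
# Maximum declared row widths in bytes, as quoted earlier in the thread.
R_DATA_MAIN_BYTES = 7618
R_OBJT_ACCESS_BYTES = 88
R_COLL_MAIN_BYTES = 13662

FILES = 100_000_000   # 100M files
TB = 10**12           # decimal terabyte

# One r_data_main row plus one r_objt_access row per file.
files_bytes = (R_DATA_MAIN_BYTES + R_OBJT_ACCESS_BYTES) * FILES

# Pessimistic case from the thread: one collection per file, each adding
# one r_coll_main row plus one r_objt_access row.
colls_bytes = (R_COLL_MAIN_BYTES + R_OBJT_ACCESS_BYTES) * FILES

total_bytes = files_bytes + colls_bytes

print(f"files only:       {files_bytes / TB:.3f} TB")
print(f"with collections: {total_bytes / TB:.3f} TB")
```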