Metadata sizing based on existing filesizes


jackc...@gmail.com

Nov 11, 2015, 12:41:09 PM
to beegfs-user
We're looking to trial moving from a small Gluster setup to BeeGFS.

We're planning to re-use some of the fairly powerful nodes, adding SSDs to create metadata stores. I know the typical recommendation would be 1-2% of your total storage size, but I wanted to check this would be reasonable using the numbers from our existing ~60 Terabytes. Here's an old-ish analysis of the data, but the file sizes will be very representative:

> nrow(sizes); summary(sizes)
[1] 5574378
       V1           
 Min.   :0.000e+00  
 1st Qu.:3.354e+04  
 Median :1.093e+05  
 Mean   :7.518e+06  
 3rd Qu.:2.077e+05  
 Max.   :3.788e+11  

That's an R analysis of the file sizes - essentially it's 5.5 million files with a mean size of ~7.5 megabytes; I'm hoping the quartile figures will give enough of a clue about the distribution of the file sizes!
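For anyone wanting to reproduce this kind of summary without R, here is a minimal Python sketch: walk a directory tree, collect file sizes, and print a five-number summary plus the mean. Note that statistics.quantiles() uses a slightly different quartile method than R's default summary(), so the figures may differ marginally.

```python
# Sketch: a rough Python equivalent of R's summary() over file sizes.
import os
import statistics

def collect_sizes(root):
    """Return a list of file sizes (in bytes) under `root`."""
    sizes = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                pass  # skip files that vanished or are unreadable
    return sizes

def summarize(sizes):
    """Five-number summary plus mean, in the spirit of R's summary()."""
    s = sorted(sizes)
    q1, median, q3 = statistics.quantiles(s, n=4)
    return {"min": s[0], "1st_qu": q1, "median": median,
            "mean": statistics.fmean(s), "3rd_qu": q3, "max": s[-1]}

# Example on a small synthetic sample (loosely echoing the quartiles above):
print(summarize([0, 33_540, 109_300, 207_700, 378_800_000_000]))
```

Run `summarize(collect_sizes("/path/to/data"))` against a real tree to get figures comparable to the R output above.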

I'll see if I can dig out the original figures as all I've got is this rather sparse summary.

thanks for any help; I'd like to find out where the metadata sizing theory comes from!

cheers
jack

Christian Goll

Nov 12, 2015, 1:57:17 AM
to fhgfs...@googlegroups.com
Hello Jack,
we have two big BeeGFS systems at our site. Here is a tabular overview:
name      size  used  metadata               meta used
------------------------------------------------------
fast      1.1P  660T  600G (300 mio inodes)  54%
universe  764T  693T  3T (1500 mio inodes)   19%

As you may notice, the bigger system has the smaller metadata capacity,
as its metadata server was designed for 150T. We are actually planning
to quadruple that capacity, as we have run out of inodes twice. So 0.1%
to 0.2% should be a good number for the ratio between metadata and data.
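(A quick back-of-the-envelope check, using only the numbers quoted in this thread: both systems above imply roughly 2 KiB of metadata per inode. Projecting that figure onto Jack's ~5.5 million files is an assumption drawn from these two data points, not a BeeGFS guarantee.)

```python
# Sketch: derive per-inode metadata cost from the two systems in the
# table above, then project it onto Jack's file count.
GIB = 1024**3
TIB = 1024**4

def per_inode_bytes(meta_bytes, inodes):
    """Average metadata bytes consumed per inode."""
    return meta_bytes / inodes

# fast: 600G metadata / 300 million inodes
# universe: 3T metadata / 1500 million inodes
print(f"fast:     ~{per_inode_bytes(600 * GIB, 300e6):,.0f} bytes/inode")
print(f"universe: ~{per_inode_bytes(3 * TIB, 1500e6):,.0f} bytes/inode")

# Projection for Jack's dataset at an assumed ~2 KiB per inode:
files = 5_574_378
est = files * 2048
print(f"estimated metadata for {files:,} files: ~{est / GIB:.1f} GiB")
```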

kind regards,
Christian
--
Dr. Christian Goll
HITS gGmbH
Schloss-Wolfsbrunnenweg 35
69118 Heidelberg
Germany
Phone: +49 6221 533 230
Fax: +49 6221 533 230

jackc...@gmail.com

Nov 12, 2015, 5:08:18 AM
to beegfs-user
Thanks for the reply Christian, much appreciated.

The BeeGFS FAQ (http://www.beegfs.com/wiki/FAQ) says "for a scratch filesystem the space needed for metadata is typically between 0.5% and 1% of the total storage capacity".

Do others concur that 0.1-0.2% is a better sizing estimate for "real" filesystems? 0.1% to 1% is quite a difference!

thanks
jack

(side note: I don't know about the posting etiquette of top-posting on google groups)

Bernd Lietzow

Nov 23, 2015, 11:04:35 AM
to fhgfs...@googlegroups.com
Hi!

The actual size of the metadata mostly depends on the number of files in
your system, so it's hard to give a space estimate based on the storage
capacity alone - the number of files you are expecting is the more
important factor (see the notes under
http://www.beegfs.com/wiki/FAQ#metadata_size ).

For most applications, 0.1% of the capacity *might* be enough, but since
this space can fill up quickly, mostly by running out of inodes, it's
recommended to leave a *lot* of headroom, to avoid having to go through
the trouble of increasing the metadata space in the future. Granted,
adding a metadata server is easy, but making more space on an existing
one can be a bit of work...
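The file-count-first advice above can be turned into a quick sizing check. A minimal sketch: the ~2 KiB-per-inode figure is inferred from Christian's table earlier in the thread, and the 4x headroom factor is just one way to put a number on "leave a lot of headroom" - neither value is an official BeeGFS recommendation.

```python
# Sketch: size a metadata target from an expected file count rather than
# from total storage capacity.
def metadata_target_bytes(expected_files, bytes_per_inode=2048, headroom=4):
    """Suggested metadata partition size: expected file count times an
    assumed per-inode cost, times a generous headroom factor."""
    return expected_files * bytes_per_inode * headroom

# Jack's ~5.5 million files:
suggested = metadata_target_bytes(5_574_378)
print(f"suggested metadata target: ~{suggested / 1024**3:.0f} GiB")
```

With these assumptions, 5.5 million files come out at a few tens of GiB - which illustrates the point: file count, not the 60 TB capacity, drives the number.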

Kind regards
Bernd Lietzow