max-obj-size vs average file size

Francisco Reyes

Oct 28, 2012, 5:20:31 PM

to s3...@googlegroups.com

From reading archives it seems object size is dynamic up to max-obj-size.

Should max-obj-size be as large as or larger than the average file size of a given data set?

Example:

If I had thousands of files and the average file size was 15MB, would I set max-obj-size to 15MB? Or would it be better to make it larger than the average file size, e.g. 20MB for a 15MB average?
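To make the question concrete, here is a minimal Python sketch of how a file might be split into storage objects. It assumes each full object is exactly max-obj-size bytes and the final object holds the remainder (one reading of "dynamic up to max-obj-size"). Compression, encryption and deduplication are ignored, "MB" is treated as MiB, and `block_sizes` is a hypothetical helper, not part of S3QL.

```python
MiB = 1024 * 1024  # treating "MB" in this thread as MiB (an assumption)


def block_sizes(file_size: int, max_obj_size: int) -> list[int]:
    """Hypothetical helper: object sizes for a file of `file_size` bytes.

    Every object except possibly the last is exactly `max_obj_size` bytes;
    the last one holds the remainder. Compression and dedup are ignored.
    """
    if file_size < 0 or max_obj_size <= 0:
        raise ValueError("sizes must be non-negative and max_obj_size positive")
    full, rest = divmod(file_size, max_obj_size)
    return [max_obj_size] * full + ([rest] if rest else [])


# Compare a 15 MiB file under three candidate limits.
for limit in (10, 15, 20):
    parts = block_sizes(15 * MiB, limit * MiB)
    print(f"max-obj-size {limit} MiB -> {[p // MiB for p in parts]} MiB")
```

Under these assumptions a 10 MiB limit splits a 15 MiB file into two objects (10 + 5 MiB), while 15 or 20 MiB stores it as a single 15 MiB object; raising the limit beyond the file's size changes nothing for that file.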

My scenario:

A client of mine develops mobile apps, so they often have a significant amount of multimedia assets. In addition, it is possible, even likely, that we will create different AWS accounts for easier billing (imagine one AWS account per client, each with its own S3). So when I am faced with a new data set, I could determine the average file size and, if it is greater than the 10MB default, increase max-obj-size if that would help.
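Measuring the average for a new data set is straightforward. The following minimal Python sketch walks a local directory tree and reports the file count, mean size and largest size. `size_stats` is a hypothetical helper, symlinks and special files are skipped, and the 10 MiB constant stands in for the 10MB default mentioned here.

```python
import os
import sys

MiB = 1024 * 1024
DEFAULT_MAX_OBJ_SIZE = 10 * MiB  # the 10MB default mentioned in this thread, read as MiB


def size_stats(root: str) -> tuple[int, float, int]:
    """Hypothetical helper: (file count, mean size, largest size) in bytes under root."""
    count, total, largest = 0, 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            count += 1
            total += size
            largest = max(largest, size)
    mean = total / count if count else 0.0
    return count, mean, largest


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    n, mean, largest = size_stats(root)
    print(f"{n} files, mean {mean / MiB:.1f} MiB, largest {largest / MiB:.1f} MiB")
    if mean > DEFAULT_MAX_OBJ_SIZE:
        print("average file is larger than the 10 MiB default")
```

Reporting the largest file alongside the mean is deliberate: for whole-file access the maximum, not the average, is what decides whether a file fits in one object.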

Francisco Reyes

Oct 28, 2012, 5:47:30 PM

to s3...@googlegroups.com

On Sunday, October 28, 2012 5:20:31 PM UTC-4, Francisco Reyes wrote:

From reading archives it seems object size is dynamic up to max-obj-size.

I just finished reading the "How do I choose an ideal block/max-object size?" thread.

This topic may be good for the FAQ.

To add another detail specific to my case: my initial users will always access entire files. So from reading that other thread, it would seem my best option is a max-obj-size larger than my average file size, since I would never read only part of a file.

Is this more or less the summary?

* Anyone who always accesses entire files should use a max-obj-size that is as big as the largest file(s) stored, or arbitrarily larger.

* Anyone reading only part of a file's contents should consider how much of each file will be read and set max-obj-size according to the amount of data most commonly read from a file.
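For the partial-read case, how much data a read pulls in depends on how it lines up with object boundaries. The Python sketch below computes which blocks overlap a byte range and how many bytes would be transferred, assuming every overlapped object is fetched whole. `blocks_for_range` and `bytes_fetched` are hypothetical names, compression is ignored, and sizes are in MiB.

```python
MiB = 1024 * 1024


def blocks_for_range(offset: int, length: int, file_size: int, block_size: int) -> range:
    """Indices of the fixed-size blocks overlapping bytes [offset, offset + length)."""
    end = min(offset + length, file_size)  # exclusive end, clipped to the file
    if offset >= end:
        return range(0)  # empty read, or read starting past end of file
    first = offset // block_size
    last = (end - 1) // block_size
    return range(first, last + 1)


def bytes_fetched(offset: int, length: int, file_size: int, block_size: int) -> int:
    """Bytes transferred if every overlapped block is fetched whole (no compression)."""
    total = 0
    for i in blocks_for_range(offset, length, file_size, block_size):
        start = i * block_size
        total += min(block_size, file_size - start)  # the last block may be short
    return total


size = 128 * MiB
print(bytes_fetched(0, 2 * MiB, size, 10 * MiB) // MiB)  # first 2 MiB, 10 MiB blocks
print(bytes_fetched(0, 2 * MiB, size, 1 * MiB) // MiB)   # first 2 MiB, 1 MiB blocks
print(bytes_fetched(0, size, size, 10 * MiB) // MiB)     # whole file, 10 MiB blocks
```

In this model, reading the first 2 MiB of a 128 MiB file transfers a full 10 MiB object with 10 MiB blocks but only 2 MiB with 1 MiB blocks, while reading the whole file transfers 128 MiB either way. That is the intuition behind the two bullet points.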

Is caching done at block level, or is there any kind of read-ahead configuration? Or should that be part of the max-obj-size consideration? That is, if one knows that one usually reads, say, 10MB, with a good chance of needing the data that sequentially follows those 10MB, then perhaps setting max-obj-size larger than 10MB may be beneficial (depending on how often one needs data beyond the 10MB).

Nikolaus Rath

Oct 29, 2012, 9:55:39 PM

to s3...@googlegroups.com

On 10/28/2012 05:47 PM, Francisco Reyes wrote:
> On Sunday, October 28, 2012 5:20:31 PM UTC-4, Francisco Reyes wrote:
>
> From reading archives it seems object size is dynamic up
> to max-obj-size.
>
>
> I just finished reading the "How do I choose an ideal block/max-object
> size?" thread.
> This topic may be good for the FAQ.

Feel free to write an entry. I'll be happy to add it :-)

> To add another detail specific to my case: my initial users will
> always access entire files. So from reading that other thread, it
> would seem my best option is a max-obj-size larger than my average
> file size, since I would never read only part of a file.
>
> Is this more or less the summary?
> * Anyone who always accesses entire files should use a max-obj-size
> that is as big as the largest file(s) stored, or arbitrarily larger.
> * Anyone reading only part of a file's contents should consider how
> much of each file will be read and set max-obj-size according to the
> amount of data most commonly read from a file.

Yes. But I would add an additional point:

* 94.8% of the time, it's really not worth worrying about this,
and the default is good enough.

> Is caching done at block level,

Yes.

> or is there any kind of read-ahead
> configuration?

No.
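Given block-level caching and no read-ahead setting, a larger max-obj-size can act as an implicit read-ahead. The toy Python model below counts backend downloads for two back-to-back 10 MiB sequential reads. It assumes every overlapped block is downloaded whole and then stays in an unbounded cache, and it ignores the end of the file; it is a simplification for illustration, not S3QL's cache implementation, and `count_fetches` is a hypothetical name.

```python
MiB = 1024 * 1024


def count_fetches(reads: list[tuple[int, int]], block_size: int) -> int:
    """Toy model: backend downloads for a sequence of (offset, length) reads.

    Assumes each overlapped block is downloaded whole and then stays cached
    (unbounded cache, no read-ahead). Not S3QL's actual cache logic.
    """
    cached: set[int] = set()
    fetches = 0
    for offset, length in reads:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        for block in range(first, last + 1):
            if block not in cached:
                cached.add(block)
                fetches += 1
    return fetches


# Read 10 MiB, then the 10 MiB that immediately follows it.
reads = [(0, 10 * MiB), (10 * MiB, 10 * MiB)]
for block_size in (10 * MiB, 20 * MiB):
    print(f"{block_size // MiB} MiB blocks: {count_fetches(reads, block_size)} downloads")
```

With 10 MiB blocks the two reads need two downloads; with 20 MiB blocks the first download already covers the second read. The same 20 MiB is transferred in both cases, so the gain is fewer requests when the follow-up read happens, paid for with extra transfer when it does not.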

Best,

-Nikolaus

--
“Time flies like an arrow, fruit flies like a Banana.”

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

Francisco Reyes

Nov 6, 2012, 10:59:46 PM

to s3...@googlegroups.com

On Monday, October 29, 2012 9:56:01 PM UTC-4, Nikolaus Rath wrote:

Yes. But I would add an additional point:
* 94.8% of the time, it's really not worth worrying about this,
and the default is good enough.

Isn't each block a single file? I thought I had read that, but I have a mount with over 40K files and only slightly over 10K blocks.

About 1.5K of those files are, on average, 128MB each. I am also wondering whether each of those would end up as 13 block files.
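The 13 comes from rounding up 128 / 10. The short Python calculation below reproduces it from the figures in this message, treating MB as MiB and the 10MB default as 10 MiB, and multiplies it out for about 1.5K such files while ignoring deduplication. Using the average size per file makes the total an approximation.

```python
MiB = 1024 * 1024

max_obj_size = 10 * MiB    # the 10MB default from this thread, read as 10 MiB
avg_file_size = 128 * MiB  # average size of the large files quoted above
n_large_files = 1500       # "about 1.5K" files

# Integer ceiling division: a partial final block still counts as a block.
blocks_per_file = (avg_file_size + max_obj_size - 1) // max_obj_size
print(blocks_per_file)                  # 13
print(blocks_per_file * n_large_files)  # 19500
```

That is roughly 19,500 blocks for those files alone before deduplication, compared with the slightly over 10K blocks reported for the whole mount.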

Nikolaus Rath

Nov 7, 2012, 8:39:27 AM

to s3...@googlegroups.com

On 11/06/2012 10:59 PM, Francisco Reyes wrote:
> On Monday, October 29, 2012 9:56:01 PM UTC-4, Nikolaus Rath wrote:
>
> Yes. But I would add an additional point:
> * 94.8% of the time, it's really not worth worrying about this,
> and the default is good enough.
>
> Isn't each block a single file? I thought I had read that, but I have a
> mount with over 40K files and only slightly over 10K blocks.

Every file occupies at least one block. However, identical blocks won't
be stored twice because of deduplication. In your case you probably have
several files with identical contents.
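Deduplication can be pictured as storing each distinct block only once, keyed by a hash of its contents. The Python sketch below splits byte strings into fixed-size blocks and counts total versus distinct blocks. Using SHA-256 as the key and the name `dedup_stats` are assumptions for illustration, not a description of S3QL's internals.

```python
import hashlib


def dedup_stats(files: list[bytes], block_size: int) -> tuple[int, int]:
    """Hypothetical helper: (total blocks, distinct blocks) across all files.

    Each file is cut into fixed-size blocks (the last may be shorter); blocks
    with identical contents share one SHA-256 digest and count once as distinct.
    """
    total = 0
    distinct: set[bytes] = set()
    for data in files:
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            distinct.add(hashlib.sha256(block).digest())
            total += 1
    return total, len(distinct)


# Two identical 25-byte files and one 7-byte file, with 10-byte blocks.
files = [b"A" * 25, b"A" * 25, b"B" * 7]
print(dedup_stats(files, 10))  # (7, 3)
```

Here 7 blocks collapse to 3 distinct ones: the second copy of the 25-byte file adds nothing new, which is the kind of effect that would keep the stored block count well below the file count.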

> About 1.5K of those files are, on average, 128MB each. I am also
> wondering whether each of those would end up as 13 block files.

Were it not for deduplication, then yes.
