Modulo size

Rick Weiser

Jun 13, 2016, 2:59:30 PM
to Pick and MultiValue Databases
Hi Guys,

I am having a slight brain fart (onset of old age :) calculating the modulo of a potential file. Here is what I think:

modulo = (records * size) / frame size

If frame size is 1k then use 1024.

Someone please verify this for me, it's killing me!!

Thanks,

Rick

Bob Rasmussen

Jun 13, 2016, 3:09:09 PM
to Pick and MultiValue Databases
In general math terms, a modulo is a remainder, not a quotient. Thus
a modulo b
is equal to
a - (truncate(a / b)) * b
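
In Python terms, that identity might look like this (a minimal sketch just to illustrate the arithmetic; the function name is my own):

    import math

    def modulo(a, b):
        # a modulo b = a - truncate(a / b) * b
        return a - math.trunc(a / b) * b

    print(modulo(17, 5))   # 2, same as Python's 17 % 5 for positive operands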

Regards,
....Bob Rasmussen, President, Rasmussen Software, Inc.

personal e-mail: r...@anzio.com
company e-mail: r...@anzio.com
voice: (US) 503-624-0360 (9:00-6:00 Pacific Time)
fax: (US) 503-624-0760
web: http://www.anzio.com
street address: Rasmussen Software, Inc.
10240 SW Nimbus, Suite L9
Portland, OR 97223 USA

Richard Lewis

Jun 13, 2016, 3:11:37 PM
to mvd...@googlegroups.com
The way I usually do it is:

recs_per_bucket = (frame_size * separation) / rec_size

est_modulo = num_records / recs_per_bucket

modulo = next prime number bigger than (est_modulo * growth_factor)

If I have to go to the trouble of resizing, then I usually make the growth factor at least 10%.
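
A minimal Python sketch of those steps, under the stated assumptions (sizes in bytes; all the helper names are mine, not anything platform-specific):

    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    def next_prime(n):
        # smallest prime strictly bigger than n
        p = n + 1
        while not is_prime(p):
            p += 1
        return p

    def suggest_modulo(num_records, rec_size, frame_size=2048,
                       separation=1, growth_factor=1.10):
        recs_per_bucket = (frame_size * separation) // rec_size
        est_modulo = num_records / recs_per_bucket
        return next_prime(int(est_modulo * growth_factor))

    # e.g. 20,000 records averaging 150 bytes, 2K frames, 10% growth:
    print(suggest_modulo(20000, 150))   # 1693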

I'm sure others will have other good and likely better methods.

Kind Regards,

Richard Lewis
Programmer/Analyst V
Nu Skin Enterprises



Kevin King

Jun 13, 2016, 6:15:47 PM
to mvd...@googlegroups.com
Somewhat depends on the platform. Rocket recommends 10 records per group for "optimum" something or other, so with Unidata we take the average record size + average key size + the standard deviation (if it's small), multiply by 10, and then pick the frame size that's bigger, whether 16k, 8k, 4k, 2k, or 1k. Then divide the file size by the frame size and round up to prime. It's impossible to be perfect, so we're typically shooting more for the "hand grenade" range.

And growth is always good to factor in. I used to do 10%; now I lean more towards 30%.
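
A rough Python sketch of that recipe (my reading of it, reusing next_prime from the earlier sketch; nothing here is Unidata syntax):

    STANDARD_FRAMES = (1024, 2048, 4096, 8192, 16384)

    def pick_frame_size(avg_rec_size, avg_key_size, std_dev):
        # aim for ~10 records per group, per the Rocket recommendation above
        target = (avg_rec_size + avg_key_size + std_dev) * 10
        for size in STANDARD_FRAMES:
            if size >= target:
                return size
        return STANDARD_FRAMES[-1]

    def suggest_modulo(file_bytes, frame_size, growth=1.30):
        # divide file size by frame size, allow for growth,
        # then round up to the next prime
        return next_prime(int(file_bytes * growth // frame_size))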
--
-K

Rick Weiser

Jun 13, 2016, 10:58:30 PM
to Pick and MultiValue Databases
Thanks for all the help.

Brian Speirs

Jun 14, 2016, 5:01:46 AM
to Pick and MultiValue Databases
Unless your files are different from mine, I would have thought you guys would run into overflow real quick!

In my experience, I find it difficult to get file utilisation above 70% without running into a reasonable amount of overflow. Maybe that is a function of uneven item sizes, or uneven hashing, but I find that is a pretty good rule of thumb.

Therefore, starting with the average item size (including @ID and delimiters), figure out your bucket size. I never use less than 2K, and if item sizes are quite variable I often find it useful to use a larger bucket size than is indicated just by the average item size. As Kevin says, aim for more than 10 items per group.

So, say we have 100-byte items and a 2K frame size; then by my rule of thumb, we should aim to use a bit over 1400 bytes per bucket (70% utilisation). That is 14 items per bucket. Now, if we are looking at 20,000 items, that suggests a modulo of 1429 - which by luck is a prime number. So, that would be my starting point. Put the data in and run FILE.STAT (or ISTAT on PICK), and then adjust as necessary.
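
The same arithmetic in Python, under exactly those assumptions (100-byte items including @ID and delimiters, 2K buckets, 70% target utilisation):

    frame_size  = 2048
    utilisation = 0.70
    item_size   = 100      # including @ID and delimiters
    num_items   = 20000

    items_per_bucket = int(frame_size * utilisation) // item_size    # 14
    modulo = -(-num_items // items_per_bucket)   # ceiling division -> 1429
    # 1429 happens to be prime, so it can be used as-is; otherwise
    # step up to the next prime before creating the file.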

HTH,

Brian

Anthonys Lists

Jun 14, 2016, 10:27:45 AM
to mvd...@googlegroups.com
Any reason for not using dynamic files? (Type 18 or 30.) Pr1me didn't allow you any choice of bucket size - you got 2K and that was that. And their dynamic files are set by default to split at 80%.

Your best option probably also depends on which database you're using ...

Cheers,
Wol

Kevin King

Jun 14, 2016, 11:23:35 AM
to mvd...@googlegroups.com

Yeah. Unidata dynamic files are awful in terms of performance.
