1) Don't use the default hashing algorithm; use the polynomial
distribution algorithm written by someone who must have been very
handsome and clever. You select it when you create the file, with the
HASHMETHOD=n option, where n is 1, 2, 3 or 4. You should really use
HASHMETHOD=2.
2) Use jrf to resize the file and change the hashing algorithm;
3) Change the environment so that HASHMETHOD=2 is the default;
4) Read the articles about CREATE-FILE in the jBASE knowledgebase;
5) Read the posting guidelines for the group and supply the operating
system, version of jBASE and so on ;-)
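For anyone unfamiliar with the idea, a polynomial hash treats the item id as the digits of a number in some base and then reduces it modulo the file's modulo to pick a group. The sketch below is a hypothetical illustration of that general technique, assuming base 31 and a 32-bit mask; it is not jBASE's actual HASHMETHOD=2 code, and the names poly_hash and group_for are made up.

```python
# Hypothetical polynomial string hash; NOT jBASE's actual HASHMETHOD=2 code.
def poly_hash(key: str, base: int = 31, mask: int = 0xFFFFFFFF) -> int:
    """Horner's-rule polynomial hash over the key's bytes, kept to 32 bits."""
    h = 0
    for byte in key.encode("utf-8"):
        # Each new byte shifts the previous value up one "digit" in the base.
        h = (h * base + byte) & mask
    return h


def group_for(key: str, modulo: int) -> int:
    """Map an item id to a group number in 0..modulo-1 (hypothetical scheme)."""
    return poly_hash(key) % modulo
```

For example, group_for("CUST1001", 4999) returns some group number between 0 and 4998. Because every character contributes at a different power of the base, ids that differ only in one position still tend to land in different groups.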
So, put this in your login script (.profile or whatever):
export JEDI_PREFILEOP="HASHMETHOD=2"
You can also use this command at the shell prompt. Then resize the file:
jrf MYFILENAME
jstat should now tell you that the bytes per group is closer to 4000, and
because the hashing algorithm is much better, you will find it has a
more even distribution. But with only 36K smallish records, it probably
won't make much difference performance-wise.
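The "bytes per group" figure is essentially total data divided by the number of groups. A rough back-of-the-envelope sketch, assuming a hypothetical average record size and ignoring per-record and per-group overhead (so this is not how jrf itself chooses a modulo):

```python
import math


def estimate_modulo(num_records: int, avg_record_bytes: int,
                    target_bytes_per_group: int = 4000) -> int:
    """Rough modulo for a target average fill; ignores storage overhead."""
    total_bytes = num_records * avg_record_bytes
    return max(1, math.ceil(total_bytes / target_bytes_per_group))


def bytes_per_group(num_records: int, avg_record_bytes: int, modulo: int) -> float:
    """Average data bytes per group for a given modulo (hypothetical helper)."""
    return num_records * avg_record_bytes / modulo
```

With 36,000 records of an assumed 100 bytes each, estimate_modulo(36000, 100) gives 900 groups, that is, about 4000 bytes per group on average. A better hash does not change that average; it only makes each group's fill closer to it.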
Jim
Jim
Try it with a realistic number of item ids relative to the modulo, and you
will find that HASHMETHOD=2 is better than 5.
> Whereas Hash method 5 gives an even spread of items throughout the
> Group for these 'real life' item ids
>
But you need to have 37,000 of them in a realistically sized file, not 19.
The perturbation is not optimized for unrealistic modulos :-) Try the
program I sent you with 1 million ids, for instance.
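The sample-size point can be checked with a small harness that hashes N synthetic ids into a fixed modulo and reports the worst group and a chi-square statistic against a perfectly uniform spread. This is a hypothetical sketch, not the program mentioned above: CRC-32 stands in for whichever HASHMETHOD the file actually uses, and the CUSTnnnnnnn ids and the modulo of 4999 are made up. With 19 ids the expected count per group is far below 1, and the chi-square approximation is generally considered unreliable when expected counts fall below about 5, so such a test says very little; with 37,000 or 1,000,000 ids the numbers start to mean something.

```python
import zlib
from collections import Counter


def group_counts(keys, modulo):
    """Count keys per group, using CRC-32 as a stand-in for the file's hash."""
    counts = Counter(zlib.crc32(k.encode("ascii")) % modulo for k in keys)
    return [counts.get(g, 0) for g in range(modulo)]


def chi_square(counts):
    """Pearson chi-square statistic against a perfectly uniform spread."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def report(num_keys, modulo):
    """Return (worst group size, mean per group, chi-square) for synthetic ids."""
    keys = [f"CUST{i:07d}" for i in range(num_keys)]  # hypothetical sequential ids
    counts = group_counts(keys, modulo)
    return max(counts), num_keys / modulo, chi_square(counts)


if __name__ == "__main__":
    modulo = 4999  # hypothetical modulo
    for n in (19, 37_000, 1_000_000):
        worst, mean, chi2 = report(n, modulo)
        print(f"{n:>9} keys: mean {mean:.3f}/group, worst group {worst}, "
              f"chi2 {chi2:.0f} (df {modulo - 1})")
```

To compare hash methods, swap the body of group_counts for each candidate hash and run the same key set through both. For a well-behaved hash, the chi-square value should sit near the degrees of freedom once the sample is large.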
Jim
pat wrote:
> And a scientific test, using the 19 item ids that hashed into group 4565
> in the original post, shows that hash method 2 (two) demonstrates exactly
> what the original poster referred to.

This isn't scientific at all. 19 item ids in a file that big give a
statistical spread that is pretty useless, mate :-( Try it with a realistic number relative to the