Default options values for BuildIndex


Laurent Kevers

Aug 6, 2015, 1:00:17 PM
to Semantic Vectors
Hi,

I'm a little confused about what the default values are for some of the options available for BuildIndex...

(1) For termweight : I thought that default value was IDF. It seems to be confirmed by the following line in FlagConfig.java :
        private TermWeight termweight = TermWeight.IDF;
 But the next (comment) line makes me hesitate...
        /** Term weighting used when constructing document vectors, default value {@link TermWeight#NONE} */

--> Is IDF or NONE the default value for termweighting ?


(2) For -elementalmethod flag, my impression was that BuildIndex uses "random generation" but I saw in FlagConfig.java the following line :
        private ElementalGenerationMethod elementalmethod = ElementalGenerationMethod.CONTENTHASH;

--> So is "Hashing generation" the default ?


And a last question : if I want to use deterministic elemental vectors and have reproducible results from run to run, do I need to use hashing generation or orthographic generation?


Thanks for the help!


Best regards,

Laurent

Dominic Widdows

Aug 6, 2015, 8:24:15 PM
to semanti...@googlegroups.com
Hi Laurent,

On Thu, Aug 6, 2015 at 2:14 AM, Laurent Kevers <laurentke...@gmail.com> wrote:
Hi,

I'm a little confused about what the default values are for some of the options available for BuildIndex...

(1) For termweight : I thought that default value was IDF. It seems to be confirmed by the following line in FlagConfig.java :
        private TermWeight termweight = TermWeight.IDF;
 But the next (comment) line makes me hesitate...
        /** Term weighting used when constructing document vectors, default value {@link TermWeight#NONE} */

--> Is IDF or NONE the default value for termweighting ?

It's IDF nowadays. The javadoc hadn't caught up. (I've updated the source but can't get the javadoc to regenerate properly right now, sorry, gotta go!)
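
To make the practical difference between the two settings concrete, here is a minimal sketch of what a NONE versus IDF weight means when a term is added into a document vector. It uses the textbook formula log(N / df); the class and method names are hypothetical, and the exact formula inside Semantic Vectors may differ.

```java
// Illustrative only: NOT the Semantic Vectors implementation.
// Assumes the textbook inverse document frequency, log(numDocs / docFreq).
class TermWeightSketch {
    enum Weighting { NONE, IDF }

    /** Weight applied to a term's vector before it is added to a document vector. */
    static double weight(Weighting weighting, int numDocs, int docFreq) {
        if (docFreq <= 0 || docFreq > numDocs) {
            throw new IllegalArgumentException("need 0 < docFreq <= numDocs");
        }
        switch (weighting) {
            case IDF:
                // Rare terms (small docFreq) get a larger weight.
                return Math.log((double) numDocs / docFreq);
            default:
                // NONE: every term counts equally.
                return 1.0;
        }
    }

    public static void main(String[] args) {
        System.out.println(weight(Weighting.NONE, 1000, 10));
        System.out.println(weight(Weighting.IDF, 1000, 10));
        System.out.println(weight(Weighting.IDF, 1000, 900));
    }
}
```

With IDF, a term that appears in almost every document contributes close to nothing, so results built with the two settings can look quite different.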
 

(2) For -elementalmethod flag, my impression was that BuildIndex uses "random generation" but I saw in FlagConfig.java the following line :
        private ElementalGenerationMethod elementalmethod = ElementalGenerationMethod.CONTENTHASH;

--> So is "Hashing generation" the default ?

Hashing has become the default, yes. (Several people requested determinism and nobody ever requested more randomness.) 
 

And a last question : if I want to use deterministic elemental vectors and have reproducible results from run to run, do I need to use hashing generation or orthographic generation?

Both will give deterministic results. Hashing will give more independent vectors. Orthographic generation will probably give some unlikely similarities (between terms that merely look alike), but it might be worth experimenting with.
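
As a rough illustration of why hashing gives reproducible vectors, the sketch below seeds a random number generator from the term itself, so the same term always yields the same sparse vector, while an unseeded generator does not. This is only a sketch under assumptions: the class, the helper names, and the use of String.hashCode() as the seed are hypothetical and are not the actual CONTENTHASH code in Semantic Vectors.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: NOT the Semantic Vectors implementation.
class ElementalVectorSketch {

    /** Sparse ternary vector with seedLength nonzero entries, alternating +1 and -1. */
    static int[] sparseTernary(int dimension, int seedLength, Random rng) {
        if (seedLength < 0 || seedLength > dimension) {
            throw new IllegalArgumentException("need 0 <= seedLength <= dimension");
        }
        int[] vector = new int[dimension];
        int placed = 0;
        while (placed < seedLength) {
            int pos = rng.nextInt(dimension);
            if (vector[pos] == 0) {            // skip positions already used
                vector[pos] = (placed % 2 == 0) ? 1 : -1;
                placed++;
            }
        }
        return vector;
    }

    /** Deterministic: the seed is derived from the term, so reruns agree. */
    static int[] hashedVector(String term, int dimension, int seedLength) {
        return sparseTernary(dimension, seedLength, new Random(term.hashCode()));
    }

    /** Non-deterministic: a fresh, unseeded generator on every call. */
    static int[] randomVector(int dimension, int seedLength) {
        return sparseTernary(dimension, seedLength, new Random());
    }

    public static void main(String[] args) {
        int[] first = hashedVector("semantic", 200, 10);
        int[] second = hashedVector("semantic", 200, 10);
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```

Two calls to randomVector will almost always differ, which is why indexes built with random elemental vectors are not reproducible from run to run.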

Best wishes,
Dominic
 



Laurent Kevers

Aug 7, 2015, 3:31:11 AM
to Semantic Vectors

Thank you Dominic for this answer!
I'm currently using the Semantic Vectors 5.8 JAR from the Maven repository. I just checked the source files, and it seems that in this version ElementalGenerationMethod is still set to RANDOM...
Now I understand my results better :-)

Best,

Laurent