Thank you all for the replies. I am still a bit confused about how to
integrate SV with Solr. What do you mean by "to work with a
Solr-controlled index you want to give the field name(s) on the
command line"?
Here is the process I've tried, unsuccessfully, so far. I am using the
demo data from Solr so we can all be on the same page. The fields I
have defined are:
<fields>
<field name="id" type="string" indexed="true" stored="true"
required="true" />
<field name="sku" type="textTight" indexed="true" stored="true"
omitNorms="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="nameSort" type="string" indexed="true"
stored="false"/>
<field name="alphaNameSort" type="alphaOnlySort" indexed="true"
stored="false"/>
<field name="manu" type="text" indexed="true" stored="true"
omitNorms="true"/>
<field name="cat" type="text_ws" indexed="true" stored="true"
multiValued="true" omitNorms="true" termVectors="true" />
<field name="features" type="text" indexed="true" stored="true"
multiValued="true"/>
<field name="includes" type="text" indexed="true" stored="true"/>
<field name="weight" type="sfloat" indexed="true" stored="true"/>
<field name="price" type="sfloat" indexed="true" stored="true"/>
<field name="popularity" type="sint" indexed="true" stored="true"
default="0"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="word" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false"
multiValued="true"/>
<field name="manu_exact" type="string" indexed="true"
stored="false"/>
<field name="timestamp" type="date" indexed="true" stored="true"
default="NOW" multiValued="false"/>
<field name="spell" type="textSpell" indexed="true" stored="true"
multiValued="true"/>
<dynamicField name="*_i" type="sint" indexed="true"
stored="true"/>
<dynamicField name="*_s" type="string" indexed="true"
stored="true"/>
<dynamicField name="*_l" type="slong" indexed="true"
stored="true"/>
<dynamicField name="*_t" type="text" indexed="true"
stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true"
stored="true"/>
<dynamicField name="*_f" type="sfloat" indexed="true"
stored="true"/>
<dynamicField name="*_d" type="sdouble" indexed="true"
stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true"
stored="true"/>
<dynamicField name="random*" type="random" />
</fields>
I then changed line 198 in BuildIndex.java to read:
String[] fieldsToIndex =
{"id","sku","name","nameSort","alphaNameSort","manu","cat","features","includes","weight","price","popularity","inStock","word","text","manu_exact","timestamp","spell"};
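In case it helps anyone reproduce this: rather than typing that list by
hand, it could be derived from the schema. Below is a minimal sketch,
not anything that ships with Solr or SV. The class name
SchemaFieldNames and the method indexedFieldNames are made up for
illustration, and the inline excerpt holds only the first two fields
from the list above. It collects the names of <field> elements with
indexed="true" and skips the <dynamicField> patterns.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr or Semantic Vectors.
class SchemaFieldNames {

    // Returns the names of all <field> elements with indexed="true",
    // in document order. <dynamicField> elements are not matched.
    static List<String> indexedFieldNames(String fieldsXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fieldsXml)));
            NodeList nodes = doc.getElementsByTagName("field");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                if ("true".equals(field.getAttribute("indexed"))) {
                    names.add(field.getAttribute("name"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse schema fragment", e);
        }
    }

    public static void main(String[] args) {
        // Shortened excerpt of the <fields> section, for illustration only.
        String excerpt = "<fields>"
                + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\" required=\"true\"/>"
                + "<field name=\"sku\" type=\"textTight\" indexed=\"true\" stored=\"true\" omitNorms=\"true\"/>"
                + "<dynamicField name=\"*_i\" type=\"sint\" indexed=\"true\" stored=\"true\"/>"
                + "</fields>";
        List<String> names = indexedFieldNames(excerpt);
        // Prints: String[] fieldsToIndex = {"id","sku"};
        System.out.println("String[] fieldsToIndex = {\""
                + String.join("\",\"", names) + "\"};");
    }
}
```

Whether every indexed field is actually worth handing to SV is a
separate question; the sketch only saves the typing.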
To build the solr index I run:
java -jar post.jar *.xml
This uploads all the exampledocs XML files that are included with
Solr. I've run a few searches and the index is fine. I then try to
build SV's index by running:
java pitt.search.semanticvectors.BuildIndex ~/solr/solr/data/index/
I get the following error:
seedLength = 20
Vector length = 200
Minimum frequency = 10
Populating basic sparse doc vector store, number of vectors: 26
Creating store of sparse vectors ...
Created 26 sparse random vectors.
Creating term vectors ...
There are 1133 terms (and 26 docs)
0 ... 1000 ...
Created 5 term vectors ...
Initializing document vector store ...
Building document vectors ...
0 ...
Normalizing doc vectors ...
Exception in thread "main" java.lang.NullPointerException
at pitt.search.semanticvectors.DocVectors.makeWriteableVectorStore(DocVectors.java:143)
at pitt.search.semanticvectors.BuildIndex.main(BuildIndex.java:231)
What exactly would I need to do to get SV to build an index from
Solr's default example XML set?
Thank you all again.