Hi Dominic,
That was a really quick reply. Many thanks.
Yes, I am using the same relative path with which Lucene indexed the files.
I am using Windows and Eclipse. I will outline the directory structure
and the commands I used, in case that helps to figure out where I am
mistaken:
Docs directory (files to index): C:\workspace\SV\src\docsDir
This directory contains both files (file1.txt and file2.txt).
Index directory: C:\workspace\SV\src\indexDir
Indexer: C:\workspace\SV\src\luceneInAction\Indexer.java, with which
Lucene indexed the two files above
I ran Indexer.java with these arguments in Eclipse:
/workspace/SV/src/indexDir /workspace/SV/src/docsDir
It printed this output:
Indexing C:\workspace\SV\src\docsDir\file1.txt
Indexing C:\workspace\SV\src\docsDir\file2.txt
Indexing 2 files took 281 milliseconds
In the index directory (C:\workspace\SV\src\indexDir), it generated
several Lucene index files.
Then I ran BuildIndex.java (C:\workspace\SV\src\pitt\search\semanticvectors)
with this argument in Eclipse: /workspace/SV/src/indexDir
INFO: Seedlength = 10
Dimension = 200
Minimum frequency = 0
Maximum frequency = 2147483647
Number non-alphabet characters = 0
Contents fields are: [contents]
23.05.2011 16:18:10 pitt.search.semanticvectors.BuildIndex main
INFO: Creating elemental document vectors ...
23.05.2011 16:18:10 pitt.search.semanticvectors.TermVectorsFromLucene
createTemVectorsFromLuceneImpl
INFO: Populating basic sparse doc vector store, number of vectors: 2
…..
…..
…..
It generated termvectors.bin and docvectors.bin in “C:\workspace\SV”.
Then I ran CompareTerms.java with these arguments:
-queryvectorfile docvectors.bin \workspace\SV\src\docsDir\file1.txt
\workspace\SV\src\docsDir\file2.txt
It generated this output:
23.05.2011 16:21:13 pitt.search.semanticvectors.CompareTerms main
INFO: Opened query vector store from file: docvectors.bin
23.05.2011 16:21:13 pitt.search.semanticvectors.CompareTerms main
INFO: Couldn't open Lucene index at
23.05.2011 16:21:13 pitt.search.semanticvectors.CompareTerms main
INFO: No Lucene index for query term weighting, so all query terms
will have same weight.
23.05.2011 16:21:13
pitt.search.semanticvectors.VectorStoreReaderLucene getVector
INFO: Didn't find vector for '\workspace\SV\src\docsDir\file1.txt'
23.05.2011 16:21:13 pitt.search.semanticvectors.CompoundVectorBuilder
getAdditiveQueryVector
WARNUNG: No vector for \workspace\SV\src\docsDir\file1.txt
23.05.2011 16:21:13
pitt.search.semanticvectors.VectorStoreReaderLucene getVector
INFO: Didn't find vector for '\workspace\SV\src\docsDir\file2.txt'
23.05.2011 16:21:13 pitt.search.semanticvectors.CompoundVectorBuilder
getAdditiveQueryVector
WARNUNG: No vector for \workspace\SV\src\docsDir\file2.txt
23.05.2011 16:21:13 pitt.search.semanticvectors.CompareTerms main
INFO: Outputting similarity of "\workspace\SV\src\docsDir\file1.txt"
with "\workspace\SV\src\docsDir\file2.txt" ...
0.0
Strangely, the output says it couldn’t open the Lucene index, and it also
does not find vectors for file1.txt and file2.txt. When I copied the
Lucene index files next to termvectors.bin and docvectors.bin
(C:\workspace\SV) and ran the program again, the “Couldn't open Lucene
index” message went away, but SV still could not locate vectors for
either file.
I tried other path variants as follows, but nothing worked; the message
was always that no vector could be found for file1(.txt) and
file2(.txt):
-queryvectorfile docvectors.bin C:\workspace\SV\src\docsDir\file1.txt
C:\workspace\SV\src\docsDir\file2.txt
or…
C:\workspace\SV\src\docsDir\file1 C:\workspace\SV\src\docsDir\file2
or…
C:\workspace\SV\src\indexDir\file1 C:\workspace\SV\src\indexDir\file2
or…
\workspace\SV\src\indexDir\file1 \workspace\SV\src\indexDir\file2
and a few more.
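A minimal sketch of why the spelling of the path might matter, assuming the document vectors are looked up by the exact string that was stored for each document at indexing time. That is an assumption, not something the logs confirm. The class PathKeyLookup, its methods, and the stored keys are hypothetical and are not part of SemanticVectors; which string was actually stored for file1.txt and file2.txt is exactly what is unknown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PathKeyLookup {
    // Hypothetical stand-in for a document vector store. The keys are
    // made-up examples of what might have been stored at indexing time.
    static Map<String, float[]> buildStore() {
        Map<String, float[]> store = new LinkedHashMap<>();
        store.put("C:\\workspace\\SV\\src\\docsDir\\file1.txt", new float[] {1f, 0f});
        store.put("C:\\workspace\\SV\\src\\docsDir\\file2.txt", new float[] {0f, 1f});
        return store;
    }

    // Exact string match: no path normalisation of any kind is applied.
    static boolean hasVector(Map<String, float[]> store, String key) {
        return store.containsKey(key);
    }

    public static void main(String[] args) {
        Map<String, float[]> store = buildStore();
        String[] candidates = {
            "C:\\workspace\\SV\\src\\docsDir\\file1.txt", // same spelling as the stored key
            "\\workspace\\SV\\src\\docsDir\\file1.txt",   // drive letter missing
            "/workspace/SV/src/docsDir/file1.txt",        // forward slashes
            "C:\\workspace\\SV\\src\\docsDir\\file1"      // extension missing
        };
        for (String candidate : candidates) {
            String result = hasVector(store, candidate) ? "found" : "no vector";
            System.out.println(candidate + " -> " + result);
        }
    }
}
```

Under that assumption, only a query string identical to the stored key would match, so the useful thing to find out is which strings the store actually contains.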
Comparing "terms", however, always works, for example with the arguments:
food chicken
23.05.2011 16:31:29 pitt.search.semanticvectors.CompareTerms main
INFO: Opened query vector store from file: termvectors.bin
23.05.2011 16:31:30 pitt.search.semanticvectors.CompareTerms main
INFO: Outputting similarity of "food" with "chicken" ...
0.9892034
I also tried to retrieve the documents for a term, like this:
Search.java -searchvectorfile docvectors.bin chicken
Here it displayed the degree of similarity, but didn’t list any document
name:
23.05.2011 16:34:04 pitt.search.semanticvectors.Search RunSearch
INFO: Opening query vector store from file: termvectors.bin
23.05.2011 16:34:04 pitt.search.semanticvectors.Search RunSearch
INFO: Opening search vector store from file: docvectors.bin
23.05.2011 16:34:04 pitt.search.semanticvectors.Search RunSearch
INFO: Searching term vectors, searchtype SUM ...
23.05.2011 16:34:04 pitt.search.semanticvectors.Search main
INFO: Search output follows ...
0.25289524:
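To make the result line above easier to read, here is a small sketch that splits such a line at the first colon, assuming the format is a score followed by a colon and then the name of the matched item. That format is inferred from this one line and is an assumption; the class ResultLine is hypothetical and not part of SemanticVectors.

```java
public class ResultLine {
    // Splits "score:name" at the first colon and returns {score, name}.
    // Splitting at the first colon keeps names such as "C:\..." intact.
    static String[] parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return new String[] {line.trim(), ""};
        }
        String score = line.substring(0, colon).trim();
        String name = line.substring(colon + 1).trim();
        return new String[] {score, name};
    }

    public static void main(String[] args) {
        String[] parts = parse("0.25289524:");
        System.out.println("score = " + parts[0]);
        System.out.println("name is empty = " + parts[1].isEmpty());
    }
}
```

If the format assumption holds, the name part of the line is an empty string, which would mean a document was matched but no name was available for it.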
I have tried to post every step I went through so that it can be
figured out whether I am making a mistake in a path somewhere.
I would really appreciate your help.
Best Regards,
Deswick