NullPointerException while creating index from Lucene Index 3.6


Tech Cool

Mar 12, 2014, 7:21:44 AM
to semanti...@googlegroups.com
Hello,
I am trying to use SemanticVectors on top of my Lucene index. I am running the BuildIndex command to create an index from a Lucene 3.6 index, and I am getting a NullPointerException. Can anybody please help me?

java -cp semanticvectors-5.4.jar pitt.search.semanticvectors.BuildIndex -luceneindexpath ca03122014/
Seedlength: 10, Dimension: 200, Vector type: REAL, Minimum frequency: 0, Maximum frequency: 2147483647, Number non-alphabet characters: 2147483647, Contents fields are: [contents]
Creating term vectors as superpositions of elemental document vectors ...
Creating semantic term vectors ...Exception in thread "main" java.lang.NullPointerException
    at pitt.search.semanticvectors.TermVectorsFromLucene.trainTermVectors(TermVectorsFromLucene.java:134)
    at pitt.search.semanticvectors.TermVectorsFromLucene.createTermVectorsFromLuceneImpl(TermVectorsFromLucene.java:123)
    at pitt.search.semanticvectors.TermVectorsFromLucene.createTermVectorsFromLucene(TermVectorsFromLucene.java:97)
    at pitt.search.semanticvectors.BuildIndex.main(BuildIndex.java:109)

Dominic

Mar 12, 2014, 11:54:02 AM
to semanti...@googlegroups.com
Hi there,

Looks like it may be a mismatch between Lucene and SV versions.

Please see https://code.google.com/p/semanticvectors/wiki/LuceneCompatibility for the list of which versions of SV have been tested with which versions of Lucene.

Best wishes,
Dominic

Tech Cool

Mar 12, 2014, 12:26:27 PM
to semanti...@googlegroups.com
Thanks Dominic, it worked with version 3.8. BuildIndex now generates termvectors.bin and docvectors.bin, but when I search using these files it returns 0 documents. I am using this command:
java -cp semanticvectors-3.8.jar:lucene-core-3.6.0.jar pitt.search.semanticvectors.Search -queryvectorfile termvectors.bin -searchvectorfile docvectors.bin java

I am not sure whether I am running the command correctly. Shouldn't I provide the Lucene index path to search? Please help.

Dominic Widdows

Mar 12, 2014, 12:30:07 PM
to semanti...@googlegroups.com
What output do you see when you run the command?

Are you intending "java" to be a query term? Is it part of your corpus?

Best wishes,
Dominic



Tech Cool

Mar 13, 2014, 6:06:24 AM
to semanti...@googlegroups.com
Hi Dominic,
The Lucene index has that term, and I assume the generated term vectors should contain the term "java". I get the following result:
java -cp semanticvectors-3.8.jar:lucene-core-3.6.0.jar pitt.search.semanticvectors.Search -queryvectorfile termvectors.bin -searchvectorfile docvectors.bin concept
Opening query vector store from file: termvectors.bin
Opening search vector store from file: docvectors.bin
Searching term vectors, searchtype SUM
Didn't find vector for 'concept'
No vector for 'concept'
No search output.
Is there any way I can check whether the term is there or not?

Dominic Widdows

Mar 13, 2014, 12:16:30 PM
to semanti...@googlegroups.com
Yes - easy ways to see what terms are in your termvectors.bin file include:

- Examine it using "more termvectors.bin" - you can usually see the strings pretty clearly even in the binary format.
- To get an entirely text-based format, rebuild using "-indexfileformat text", or use the VectorStoreTranslater to translate your .bin files to .txt files.

Best wishes,
Dominic 
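
As a rough illustration of the two checks suggested above, the sketch below scans a vector store for a term from the shell. The helper names has_term_bin and has_term_txt are hypothetical and not part of SemanticVectors. The binary check is only a byte-substring match, so "concept" would also match "concepts". The text check assumes each entry is a line of the form "term|v1|v2|...", which should be confirmed by looking at the first few lines of the .txt file. The -a option is a grep extension available in GNU and BSD grep.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for checking whether a term is present
# in a vector store file. Neither function is part of SemanticVectors.

# Binary store: succeed if the raw bytes of the term occur anywhere in the file.
# -a treats the binary file as text, -F matches the term literally,
# -q suppresses output so only the exit status is used.
has_term_bin() {
  LC_ALL=C grep -a -q -F -e "$1" "$2"
}

# Text store (ASSUMED layout: one "term|v1|v2|..." entry per line).
# Succeed only if some line's first "|"-separated field equals the term exactly.
has_term_txt() {
  awk -F'|' -v t="$1" '$1 == t { found = 1; exit } END { exit !found }' "$2"
}
```

For example, `has_term_bin concept termvectors.bin && echo present` reports a hit in the binary file, and `has_term_txt concept termvectors.txt || echo missing` reports an exact-term miss in a text-format file.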