Re: Errors when implementing search using API instead of command lines


Dominic Widdows

Jul 20, 2016, 6:16:58 PM
to semanti...@googlegroups.com
Hello there,


Semantic Vectors is not yet up to date with Lucene 6.1.0, as I've just found out by replicating your error. But it works fine with 5.0.

Also, the last error shows up only at runtime, not at compile time. Strange.

What Lucene version are you using? Are you sure there are not incompatible versions of Lucene at different places in your classpath?
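
One way to answer the classpath question is to ask the JVM where it actually loaded a class from. The following is a minimal sketch, not part of Semantic Vectors: the class name `WhichJar` and the helper `locationOf` are made up for illustration, and the class names it probes are simply the ones that appear in the stack traces in this thread.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the jar or directory a class was loaded from, or a short note if unknown. */
    public static String locationOf(String className) {
        try {
            // Load without initializing, so static initializers cannot interfere.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK itself report no code source.
            return source == null ? "(bootstrap / JDK)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        } catch (LinkageError e) {
            return "(found, but could not be linked: " + e + ")";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.lucene.index.DirectoryReader",  // named in the VerifyError
            "org.apache.lucene.store.FSDirectory",      // named in the NoSuchMethodError
            "pitt.search.semanticvectors.LuceneUtils"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

If the Lucene classes and the Semantic Vectors classes report different jars, that would suggest the two were not built against each other.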

Best wishes,
Dominic

On Wed, Jul 20, 2016 at 10:19 AM, Hadeel Maryoosh <hadeel....@gmail.com> wrote:
Hello,

I'm new to Java and to this package.

I have code that generates an index from a small data set (just for testing for now), and I'm trying to use BuildIndex to produce the termvector.bin file and then use the rest of the code from the GitHub example above. But when I add the line
BuildIndex.main(new String[] {"-luceneindexpath", "SemVec","-dimension","5"});

I'm getting this error:

Seedlength: 10, Dimension: 5, Vector type: REAL, Minimum frequency: 0, Maximum frequency: 2147483647, Number non-alphabet characters: 2147483647, Contents fields are: [contents]
Exception in thread "main" java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    pitt/search/semanticvectors/LuceneUtils.<init>(Lpitt/search/semanticvectors/FlagConfig;)V @77: putfield
  Reason:
    Type 'org/apache/lucene/index/DirectoryReader' (current frame, stack[1]) is not assignable to 'org/apache/lucene/index/BaseCompositeReader'
  Current Frame:
    bci: @77
    flags: { }
    locals: { 'pitt/search/semanticvectors/LuceneUtils', 'pitt/search/semanticvectors/FlagConfig' }
    stack: { 'pitt/search/semanticvectors/LuceneUtils', 'org/apache/lucene/index/DirectoryReader' }
  Bytecode:
    0x0000000: 2ab7 0001 2abb 0002 59b7 0003 b500 042a
    0x0000010: bb00 0259 b700 03b5 0005 2a01 b500 062a
    0x0000020: 01b5 0007 2bb6 0008 b600 0999 000d bb00
    0x0000030: 0a59 120b b700 0cbf 2ab8 000d 2bb6 0008
    0x0000040: 03bd 000e b600 0fb8 0010 b800 11b5 0012
    0x0000050: 2a2a b400 12b8 0013 b500 142a b400 12b8
    0x0000060: 0015 572a 2bb5 0016 2bb6 0017 b600 099a
    0x0000070: 000b 2a2b b600 17b6 0018 2bb6 0019 b600
    0x0000080: 099a 000b 2a2b b600 19b6 001a bb00 1b59
    0x0000090: b700 1c12 1db6 001e 2bb6 0008 b600 1e12
    0x00000a0: 1fb6 001e b600 20b8 0021 b1            
  Stackmap Table:
    full_frame(@56,{Object[#177],Object[#178]},{})
    same_frame_extended(@122)
    same_frame(@140)

at pitt.search.semanticvectors.BuildIndex.main(BuildIndex.java:96)
at com.tutorialspoint.lucene.LuceneTester.createIndex(LuceneTester.java:57)
at com.tutorialspoint.lucene.LuceneTester.main(LuceneTester.java:39)



My source code:

package com.tutorialspoint.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;
import pitt.search.semanticvectors.*;
import pitt.search.semanticvectors.vectors.RealVector;
import pitt.search.semanticvectors.vectors.ZeroVectorException;
import java.io.File;
import java.io.IOException;
import java.util.LinkedList;
import java.util.Scanner;


public class LuceneTester {
   String indexDir = "C:\\Users\\A6B0SZZ\\Downloads\\Index";
   String dataDir = "C:\\Users\\A6B0SZZ\\Downloads\\Data";
   public static String DEFAULT_VECTOR_FILE = "C:\\Users\\A6B0SZZ\\Downloads\\termvectors.bin";
   Indexer indexer;
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
   
         tester = new LuceneTester();
       
         tester.createIndex();
         tester.search("liver  heart");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
      System.out.println("HELLO WORLD");
   }

      
   private void createIndex() throws IOException{
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      File indexDir = new File("SemVec");
      BuildIndex.main(new String[] {"-luceneindexpath", "SemVec","-dimension","5"});
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");
   }

   private void search(String searchQuery) throws IOException, ParseException{
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      TopDocs hits = searcher.search(searchQuery);
      long endTime = System.currentTimeMillis();
   
      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime));
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
            System.out.println("File: "
            + doc.get(LuceneConstants.FILE_PATH));
      }
      searcher.close();
   }
}




Indexer class:

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

   private IndexWriter writer;

   
   //  directory that will have the index ((indexDirectoryPath))
   public Indexer(String indexDirectoryPath) throws IOException{
      //this directory will contain the indexes
      Directory indexDirectory = 
         FSDirectory.open(new File(indexDirectoryPath));

      //create the indexer
      writer = new IndexWriter(indexDirectory, 
         new StandardAnalyzer(Version.LUCENE_36),true, // Using regular analyzing!!!!!
         IndexWriter.MaxFieldLength.UNLIMITED); /// We are writing the index file!!!!!
   }

   public void close() throws CorruptIndexException, IOException{
      writer.close();
   }

   /// How to retrieve things from the index file!!!!!!
   private Document getDocument(File file) throws IOException{
      Document document = new Document();

      //index file contents
      Field contentField = new Field(LuceneConstants.CONTENTS, 
         new FileReader(file));
      //index file name
      Field fileNameField = new Field(LuceneConstants.FILE_NAME,
         file.getName(),
         Field.Store.YES,Field.Index.NOT_ANALYZED);
      //index file path
      Field filePathField = new Field(LuceneConstants.FILE_PATH,
         file.getCanonicalPath(),
         Field.Store.YES,Field.Index.NOT_ANALYZED);
 // Adding all those to document !!!!!!!!!!!!!!!!!
      document.add(contentField);
      document.add(fileNameField);
      document.add(filePathField);

      return document;
   }   

   private void indexFile(File file) throws IOException{
      System.out.println("Indexing "+file.getCanonicalPath());
      Document document = getDocument(file);
      writer.addDocument(document);
   }

   
   
   // Get all these indexes here
   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException{
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.numDocs();
   }
}



Searcher:


package com.tutorialspoint.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lenya.lucene.index.PersonalSimilarity;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
   IndexSearcher indexSearcher;
   QueryParser queryParser;
   Query query;
   
   public Searcher(String indexDirectoryPath) 
      throws IOException{
      Directory indexDirectory = 
         FSDirectory.open(new File(indexDirectoryPath));
      indexSearcher = new IndexSearcher(indexDirectory);
      queryParser = new QueryParser(Version.LUCENE_36,
         LuceneConstants.CONTENTS,
         new StandardAnalyzer(Version.LUCENE_36));
 
   }
   //QueryParser parser=new QueryParser(Version.LUCENE_30,langCode,this.getAnalyzer());
   // Query query=parser.parse(queryString);
  // int maxSearchLength=1000;
  // TopDocs topDocs=searcher.search(query,null,maxSearchLength);
   public TopDocs search( String searchQuery) 
      throws IOException, ParseException{
      query = queryParser.parse(searchQuery);
  //    PersonalSimilarity ds = new PersonalSimilarity(); 
      
      //query = queryParser.escape(searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   
   Similarity similarity = new DefaultSimilarity() {

       @Override
       public float lengthNorm(FieldInvertState state) {
           return 1.0f;
       }

       @Override
       public float coord(int overlap, int maxOverlap) {
           return 1.0f;
       }

       @Override
       public float idf(long docFreq, long numDocs) {
           return 1.0f;
       }

       @Override
       public float queryNorm(float sumOfSquaredWeights) {
           return 1.0f;
       }   

       @Override
       public float tf(float freq) {
           return freq == 0f ? 0f : 1f;
       }
};



  //  get this document 
   public Document getDocument(ScoreDoc scoreDoc) 
      throws CorruptIndexException, IOException{
      return indexSearcher.doc(scoreDoc.doc);
   }

   public void close() throws IOException{
      indexSearcher.close();
   }
}

I also tried downloading the termvector.bin from the example and running the example on its own, but I'm not sure why I'm getting this error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.store.FSDirectory.open(Ljava/nio/file/Path;)Lorg/apache/lucene/store/FSDirectory;
at pitt.search.semanticvectors.VectorStoreReaderLucene.<init>(VectorStoreReaderLucene.java:90)
at pitt.search.semanticvectors.VectorStoreReader.openVectorStore(VectorStoreReader.java:60)
at pitt.search.semanticvectors.VectorStoreRAM.initFromFile(VectorStoreRAM.java:97)
at pitt.search.semanticvectors.VectorStoreRAM.readFromFile(VectorStoreRAM.java:91)
at com.tutorialspoint.lucene.ExampleVectorSearcherClient.main(ExampleVectorSearcherClient.java:35)
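
The NoSuchMethodError above names `FSDirectory.open(java.nio.file.Path)`, while the Indexer and Searcher code in this message calls `FSDirectory.open(new File(...))`. That is consistent with a Lucene jar on the classpath that only offers the `File` overload. A hedged sketch of how one might probe this with reflection follows; `MethodProbe` and `probe` are hypothetical names, and the program only reports what it finds on whatever classpath it is run with.

```java
import java.io.File;
import java.nio.file.Path;

public class MethodProbe {
    /** Reports whether a public method with the given parameter types is visible at runtime. */
    public static String probe(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class<?> cls = Class.forName(className, false, MethodProbe.class.getClassLoader());
            cls.getMethod(methodName, paramTypes);
            return "present";
        } catch (ClassNotFoundException e) {
            return "class not found";
        } catch (NoSuchMethodException e) {
            return "method missing";
        } catch (LinkageError e) {
            return "class could not be linked";
        }
    }

    public static void main(String[] args) {
        String fsDirectory = "org.apache.lucene.store.FSDirectory";
        // The overload that the Semantic Vectors code in the stack trace expects.
        System.out.println("open(Path): " + probe(fsDirectory, "open", Path.class));
        // The overload that the tutorial code in this message calls.
        System.out.println("open(File): " + probe(fsDirectory, "open", File.class));
    }
}
```

A result of "method missing" for `open(Path)` would point to a Lucene version on the classpath that differs from the one Semantic Vectors was compiled against.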


Dominic Widdows

Jul 21, 2016, 12:25:56 PM
to semanti...@googlegroups.com
You're right that SV 5.8 is built with Lucene 5.0.0 and ships with this version bundled into it. So if you still have other versions in your classpath, that could cause problems. I'm looking into upgrading the SV dependency, but it appears not to be a simple operation (in other words, I haven't got it working yet!).
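
Because the SV jar carries its own Lucene classes, a second Lucene jar on the classpath means the same class name can exist in two places. Here is a sketch of how one might list every copy, using only the JDK; `DuplicateClassFinder` and `copiesOf` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {
    /** Lists every classpath location that contains the class file for the given class name. */
    public static List<URL> copiesOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // getResources returns all matches, not just the first one the loader would use.
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.DirectoryReader";
        List<URL> copies = copiesOf(name);
        System.out.println(name + ": " + copies.size() + " location(s) on the classpath");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```

More than one location for a Lucene class would suggest that an extra Lucene jar is still on the classpath alongside the bundled one.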

-Dominic

On Thu, Jul 21, 2016 at 8:41 AM, Hadeel Maryoosh <hadeel....@gmail.com> wrote:


Well, I have semanticvectors-5.8.jar with Lucene 3.6.2. I didn't upgrade Lucene because I thought it was already included in the SV package. But going through the link you posted, if I have Lucene installed already, then I need to follow the old instructions.

Now that I have semanticvectors-5.8.jar, what Lucene version do I need? My current version of Lucene was working fine before I started using the SV package.

Dominic Widdows

Jul 21, 2016, 1:38:52 PM
to semanti...@googlegroups.com
"while" is a very general term, and the test vectors are from a very small example.

If you try "wine" you'll likely get better results, including "drunk" and "firkins" (relevant, albeit somewhat antiquated). To get results you're actually happy with, you'll want to work with a much bigger corpus.
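
To see why a tiny corpus gives coarse results, it may help to look at how such scores can arise. The sketch below assumes the scores are cosine similarities between real-valued vectors (the log earlier in the thread reports "Vector type: REAL"); it is an illustration with made-up 5-dimensional vectors, not the Semantic Vectors implementation, and `CosineDemo` is a hypothetical name.

```java
public class CosineDemo {
    /** Cosine similarity of two equal-length vectors; 0 if either vector is all zeros. */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Made-up vectors: two rare terms that ended up with the same vector.
        double[] query = {0.9, 0.1, 0.4, 0.0, 0.2};
        double[] rareTermA = {0.5, 0.0, 0.5, 0.0, 0.0};
        double[] rareTermB = {0.5, 0.0, 0.5, 0.0, 0.0};

        System.out.println("query vs query:     " + cosine(query, query));
        System.out.println("query vs rareTermA: " + cosine(query, rareTermA));
        System.out.println("query vs rareTermB: " + cosine(query, rareTermB));
    }
}
```

Terms with the same vector necessarily receive the same score, which is consistent with the long run of identical values in the output quoted in this thread. With more text, terms are less likely to end up with identical vectors.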

Best wishes,
Dominic

On Thu, Jul 21, 2016 at 9:26 AM, Hadeel Maryoosh <hadeel....@gmail.com> wrote:

So I have installed Lucene version 5 with SV 5.8 and it works fine now.

I compiled the ExampleVectorSearcherClient example, but I got weird results, something like:

Enter a query term:
while
1.0:while
0.9089492092180987:little
0.8462886084393697:shew
0.8201901099000826:proverb
0.8201901099000826:travail
0.8201901099000826:needest
0.8201901099000826:sorrow
0.8201901099000826:synagogues
0.8201901099000826:tribulation
0.8201901099000826:offended



These terms don't seem related to me. Is that normal?


Dominic Widdows

Jul 22, 2016, 1:19:36 PM
to semanti...@googlegroups.com
BuildIndex is a class, not a method. BuildIndex.main is a standard Java main method. It doesn't return a vector store to the API caller; it writes the vector store to disk.
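
The pattern can be sketched without Semantic Vectors on the classpath. In the example below, `FakeBuilder` is a hypothetical stand-in for a command-line class such as BuildIndex, and the output file name is passed in only for the sake of the demo. The point is that `main` returns `void`, so the caller runs it as a statement and then opens the file it wrote.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MainWritesToDisk {
    /** Hypothetical stand-in for a command-line tool whose result is a file on disk. */
    static class FakeBuilder {
        public static void main(String[] args) throws IOException {
            Path out = Paths.get(args[0]);
            Files.write(out, "placeholder vectors".getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Runs the builder for its side effect, then returns the path of the file it wrote. */
    public static Path buildThenLocate(Path outputFile) throws IOException {
        // Does not compile: String f = FakeBuilder.main(...);  (main returns void)
        FakeBuilder.main(new String[] {outputFile.toString()});
        if (!Files.exists(outputFile)) {
            throw new IOException("expected output file was not written: " + outputFile);
        }
        return outputFile;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("vectors");
        Path file = buildThenLocate(dir.resolve("termvectors.bin"));
        System.out.println("Vector store written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

Applied to the code quoted below, this would mean calling `BuildIndex.main(...)` on a line of its own and then setting the vector file name separately. Where the real tool writes its output, and under what name, is an assumption to check against the Semantic Vectors documentation. Note also that the quoted call passes the literal string "indexDir" rather than the value of the `indexDir` variable.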

-Dominic

On Fri, Jul 22, 2016 at 9:52 AM, Hadeel Maryoosh <hadeel....@gmail.com> wrote:
OK, so it looks like BuildIndex is a method that returns nothing. So I'm not sure how to build the termvector.bin with it then. I'd appreciate help on how to use the BuildIndex method.


On Thursday, July 21, 2016 at 3:15:14 PM UTC-6, Hadeel Maryoosh wrote:
Thanks, it makes sense to try it on my test data, since I was using that with pure Lucene.
This might be a stupid question, but I have been trying to pass the index to BuildIndex and I'm getting Java errors along the lines of "cannot convert void to String" (I'm still new to Java). I'm trying to point BuildIndex at my index file path like this:


public class ExampleVectorSearcherClient {
  String indexDir = "C:\\Users\\Downloads\\Index";

  /**
   * Opens vector store from given arg, or if this is empty uses {@link #DEFAULT_VECTOR_FILE}.
   * Then prompts the user for terms to search for.
   *
   * @throws IOException If vector store can't be found.
   */
  public static void main(String[] args) throws IOException {
 

    String DEFAULT_VECTOR_FILE = BuildIndex.main(new String[] {"-luceneindexpath", "indexDir","-dimension","5"});
    String vectorStoreName;
    
    if (args.length == 0) {
      vectorStoreName = DEFAULT_VECTOR_FILE;
    } else {
      vectorStoreName = args[0];
    }



Is this line wrong? String DEFAULT_VECTOR_FILE = BuildIndex.main(new String[] {"-luceneindexpath", "indexDir","-dimension","5"});


I appreciate the help.