Precision and Recall Lucene Java

Gianluca Di Vincenzo

Sep 19, 2013, 4:26:23 PM
to lucene-a...@googlegroups.com
Hi, I'm a new and inexperienced Lucene developer, and I'm turning to you after a series of failed attempts.
My problem is to write a simple Java program that computes the precision and recall of a series of documents.
On the file system I have a collection of documents that I index from the command prompt with the command:

java -cp C:/lucene/jar/lucene-core-4.3.1.jar;C:/lucene/jar/lucene-analyzers-common-4.3.1.jar;C:/lucene/jar/lucene-demo-4.3.1.jar org.apache.lucene.demo.IndexFiles -index C:/lucene/indice -docs C:/lucene/confusion_track/original
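
For context, if I read the demo source correctly, org.apache.lucene.demo.IndexFiles stores each file with a "path" field (the file path) and a "contents" field (the file text), roughly like this (my sketch, not the exact demo code):

    Document doc = new Document();
    doc.add(new StringField("path", file.getPath(), Field.Store.YES));
    doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));
    writer.addDocument(doc);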

Here's an example of one of the topics in the file C:/lucene/topics.txt:
<top>
<num> Number: CF11
<title> apache
<desc> Description:
I am looking for a document about the dismissal of a lawsuit involving
Adventist Health Systems.
<narr> Narrative:
</top>
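
And for reference, as far as I understand, TrecJudge expects the qrels file (here C:/lucene/confusion.known_items) to contain one judgment per line in the form "topic-number 0 document-name relevance", for example (the document name below is just made up):

    CF11 0 FR940104-0-00001 1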

I'm also including the Java code, to help you easily identify my error:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.PrintWriter;

import org.apache.lucene.benchmark.quality.Judge;
import org.apache.lucene.benchmark.quality.QualityBenchmark;
import org.apache.lucene.benchmark.quality.QualityQuery;
import org.apache.lucene.benchmark.quality.QualityQueryParser;
import org.apache.lucene.benchmark.quality.QualityStats;
import org.apache.lucene.benchmark.quality.trec.TrecJudge;
import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;
import org.apache.lucene.benchmark.quality.utils.SimpleQQParser;
import org.apache.lucene.benchmark.quality.utils.SubmissionReport;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class PrecisionRecall {

    public static void main(String[] args) throws Exception {
        File topicsFile = new File("C:/lucene/topics.txt");
        File qrelsFile = new File("C:/lucene/confusion.known_items");

        // open the index created by the IndexFiles demo
        Directory fsDir = FSDirectory.open(new File("C:/lucene/indice"));
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(fsDir));

        // stored field that identifies each document in the index
        String docNameField = "Federal";

        PrintWriter logger = new PrintWriter(System.out, true);

        TrecTopicsReader qReader = new TrecTopicsReader(); // #1 read the TREC topics
        QualityQuery[] qqs = qReader.readQueries(new BufferedReader(new FileReader(topicsFile)));
        Judge judge = new TrecJudge(new BufferedReader(new FileReader(qrelsFile))); // #2 read the judgments
        judge.validateData(qqs, logger); // #3 check that topics and judgments match

        // #4 build each query from the topic <title>, searching the "description" index field
        QualityQueryParser qqParser = new SimpleQQParser("title", "description");
        QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, docNameField);
        SubmissionReport submitLog = null;
        QualityStats[] stats = qrun.execute(judge, submitLog, logger); // #5 run the benchmark
        QualityStats avg = QualityStats.average(stats); // #6 average precision/recall over all topics
        avg.log("SUMMARY", 2, logger, "  ");

        fsDir.close();
    }
}

The result obtained is the following:

CF11  -  description:apache

CF11 Stats:
  Search Seconds:         0.014
  DocName Seconds:        0.000
  Num Points:             0.000
  Num Good Points:        0.000
  Max Good Points:        1.000
  Average Precision:      0.000
  MRR:                    0.000
  Recall:                 0.000

CF12  -  description:apache

CF12 Stats:
  Search Seconds:         0.000
  DocName Seconds:        0.000
  Num Points:             0.000
  Num Good Points:        0.000
  Max Good Points:        1.000
  Average Precision:      0.000
  MRR:                    0.000
  Recall:                 0.000

CF13  -  description:apache

CF13 Stats:
  Search Seconds:         0.001
  DocName Seconds:        0.000
  Num Points:             0.000
  Num Good Points:        0.000
  Max Good Points:        1.000
  Average Precision:      0.000
  MRR:                    0.000
  Recall:                 0.000

SUMMARY
Search Seconds:         0.003
DocName Seconds:        0.000
Num Points:             0.000
Num Good Points:        0.000
Max Good Points:        1.000
Average Precision:      0.000
MRR:                    0.000
Recall:                 0.000

Thank you all for your availability and your help.

Fabio Grucci

Oct 12, 2013, 1:43:31 PM
to lucene-a...@googlegroups.com
Dear Gianluca,
this group was created to answer specific LAE questions/problems. I think your question should be posted elsewhere, above all to help grow the right open-source community. That said, there are many resources on the web that explain how to do this, for example the StackOverflow question "Precision recall in lucene java"; otherwise you can try this or this.

Thanks for making me discover the org.apache.lucene.benchmark package :-)