Hello all!
I'm Sean Gallagher, and I develop Watsonsim, a class project at UNC Charlotte targeting Jeopardy questions. We're on GitHub and we have some status updates. As it stands right now, we have 35.5% accuracy and about 52% recall, but it's been changing regularly (e.g., our accuracy yesterday was 31.2%). Our system is a bit different and tries to be conceptually lean; it does not use UIMA, but it does use Bing. Its main drawbacks right now are that it's slow (it takes about a minute to answer a question) and that it is too difficult to set up; in particular, we have trouble distributing the requisite data because of the bandwidth required. We have made inroads on speed; only a few weeks ago it averaged around 150 sec/question. There are some oddities though; in our experience, logistic regression is not nearly as effective as an RBF SVM.
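For anyone unsure how to read those two numbers, here is a minimal sketch. It assumes the common QA-pipeline convention (not spelled out above) that accuracy is the fraction of questions whose top-ranked candidate is correct, and recall is the fraction whose correct answer appears anywhere in the candidate list. The function name and the toy data are hypothetical, not taken from Watsonsim.

```python
def accuracy_and_recall(results):
    """Compute (accuracy, recall) over a list of questions.

    results: list of (ranked_candidates, correct_answer) pairs.
    Assumed definitions (illustrative only):
      accuracy = share of questions whose first candidate is correct
      recall   = share of questions whose correct answer is in the list at all
    """
    total = len(results)
    if total == 0:
        raise ValueError("no results to score")
    top1 = sum(1 for cands, gold in results if cands and cands[0] == gold)
    found = sum(1 for cands, gold in results if gold in cands)
    return top1 / total, found / total


# Hypothetical toy data: four questions with ranked candidate answers.
sample = [
    (["Paris", "Lyon"], "Paris"),    # correct at rank 1
    (["Mars", "Venus"], "Venus"),    # correct, but not at rank 1
    (["Nile"], "Amazon"),            # correct answer never retrieved
    (["Bach", "Handel"], "Bach"),    # correct at rank 1
]

acc, rec = accuracy_and_recall(sample)
print(f"accuracy={acc:.1%} recall={rec:.1%}")
```

Under these definitions recall is an upper bound on accuracy, so the gap between the two reflects ranking errors rather than retrieval misses.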
I do have some questions though. According to my advisor, when IBM was using Indri, its query latency was within a few percent of Lucene's. But for all the tweaking I have done, I cannot seem to make that happen. Indri queries once took 15 seconds per question on a ~28 million record index, and I have been able to get that down to 3 seconds per query, but in about the same amount of time Lucene performs about 60 to 120 queries. Has anyone else experienced this?
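To put that gap in per-query terms, here is a rough back-of-the-envelope sketch. It assumes "the same amount of time" means the 3-second window of one Indri query, and the function name is just for illustration.

```python
def per_query_latency_ms(window_seconds, queries_in_window):
    """Average latency per query, in milliseconds, given how many
    queries complete within a time window."""
    return window_seconds * 1000 / queries_in_window


INDRI_WINDOW_S = 3.0            # one Indri query takes about 3 seconds
LUCENE_QUERY_COUNTS = (60, 120) # Lucene queries completed in that window

indri_ms = per_query_latency_ms(INDRI_WINDOW_S, 1)
for count in LUCENE_QUERY_COUNTS:
    lucene_ms = per_query_latency_ms(INDRI_WINDOW_S, count)
    slowdown = indri_ms / lucene_ms
    print(f"Lucene at {count} queries: {lucene_ms:.0f} ms/query, "
          f"Indri is about {slowdown:.0f}x slower")
```

Under that assumption Lucene is at roughly 25 to 50 ms per query, so the difference is one to two orders of magnitude rather than a few percent.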