By which I mean an overview of the SolrSherlock project, which led me to recognize that there were already projects 'out there' that fall into a general category I call Open DeepQA, alongside Open Access, Open Science, and so forth.
SolrSherlock started life as SolrWatson, but Jim Spohrer at IBM pointed out that the Watson project has its intellectual and IP roots in the name Watson, after an early chairman of IBM; he suggested SolrDrWatson, and Tom Munnecke rounded that out with SolrSherlock, which slides off one's tongue quite nicely! The term Solr in SolrSherlock comes from the original project's core persistence/index mechanism: Apache Solr. Thinking ahead, it's not clear that Solr is the only approach, so a daughter project envisions other back ends, and I've named that OpenSherlock. For now, my own work develops around the Solr platform.
The notion that Watson itself will go open source is not well founded; some places on the web suggest that students at rpi.edu might replace internal code with code that could be open sourced, but that remains to be seen. In any case, IBM has already gifted the world with UIMA, a core feature of the Watson architecture.
In the case of SolrSherlock, the project started out as a merge agent for a topic map platform. In some topic maps, merge decisions are based on comparing certain topic property values, much as one determines sameness in an OWL ontology by comparing the object's RDF-ID. But what about topics which do not carry the same, say, RDF-ID, but which are really about the same subject? In the database world, this is a richly studied problem, known variously as record reconciliation, database merge, database federation, and so forth. Merge processes are complex; as I worked on merging the truly complex objects I described in my thesis proposal, I began to realize that I was working on a project which would exhibit behaviors not unlike those demonstrated by Watson. Thus, SolrSherlock is an outgrowth of topic map merging agent development; serving as a merge agent is one of its use cases. There, it must answer this question: "Have I seen this before?"
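To make the distinction concrete, here is a minimal Python sketch of the two kinds of merge decision described above: an exact identifier comparison, and a fallback for topics that carry no shared identifier but may still be about the same subject. This is not SolrSherlock's code. The `Topic` class, the `name_overlap` and `seen_before` functions, the example.org identifiers, and the 0.5 threshold are all hypothetical, and simple name overlap stands in for the far richer comparisons a real merge agent would need.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """A hypothetical, much-simplified topic: a label, an optional
    subject identifier (think RDF-ID), and a set of alternate names."""
    label: str
    subject_id: Optional[str] = None
    names: set = field(default_factory=set)


def normalized_names(topic):
    """All names of a topic, including its label, trimmed and lowercased."""
    return {n.strip().lower() for n in topic.names | {topic.label}}


def name_overlap(a, b):
    """Jaccard similarity of the two topics' name sets (0.0 to 1.0)."""
    na, nb = normalized_names(a), normalized_names(b)
    return len(na & nb) / len(na | nb)


def seen_before(candidate, known, threshold=0.5):
    """Answer "Have I seen this before?" for one candidate topic.

    Returns (matching_topic_or_None, reason). The threshold is an
    arbitrary illustrative value, not a recommended setting.
    """
    # Case 1: identity-based merge, both topics carry the same identifier.
    for topic in known:
        if candidate.subject_id and candidate.subject_id == topic.subject_id:
            return topic, "identifier match"
    # Case 2: no shared identifier, so fall back to comparing names.
    best = max(known, key=lambda t: name_overlap(candidate, t), default=None)
    if best is not None and name_overlap(candidate, best) >= threshold:
        return best, "name overlap"
    return None, "no match"


if __name__ == "__main__":
    known = [
        Topic("Apache Solr", "http://example.org/solr", {"Solr"}),
        Topic("UIMA", "http://example.org/uima"),
    ]
    for candidate in (
        Topic("Solr search server", "http://example.org/solr"),
        Topic("Solr", None, {"Apache Solr"}),
        Topic("Lucene"),
    ):
        match, reason = seen_before(candidate, known)
        print(candidate.label, "->", match.label if match else None, f"({reason})")
```

The second case is where the hard work lives: once identifiers disagree or are missing, "sameness" becomes a judgment over evidence rather than a lookup, which is the record reconciliation problem in miniature.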
I first described SolrSherlock on my blog. The project's primary thinking document is at DebateGraph. Code for the core project resides at GitHub.