[SolrSherlock] Project Update

Jack Park

Apr 18, 2014, 2:24:11 PM
to qa-...@googlegroups.com
Tomorrow, April 19, I will be giving a talk at a BigData Sciences
meetup; my slides for that talk are now online at
http://www.slideshare.net/jackpark/big-datasci20140419

The slides introduce a new concept to the SolrSherlock conversation;
after many conversations with Patrick Durusau, Mark Szpakowski, and
Sherry Jones, we decided to give this new concept the name
HyperMembrane.

The term is inspired by a paper by Ted Nelson:
A COSMOLOGY FOR A DIFFERENT COMPUTER UNIVERSE:
Data Model, Mechanisms, Virtual Machine and Visualization Infrastructure
http://xanadu.com/zigzag/ZZdnld/zzRefDef/

where there is an intersection between the very notion of cosmology in
our context, and the topological nature of harvested information
resources. Ted talks about his ZigZag architecture, which, in short,
is like a beads-on-a-string representation of information resources,
where, technically, each topic has just one bead (node). That means
that each bead (node) will have many "strings" passing through it,
depending on context.
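
As a rough illustration of the beads-on-a-string idea described above, here is a minimal Python sketch. It is not the SolrSherlock code and not Ted Nelson's ZigZag implementation; the class name `Fabric`, its methods, and the sample sentences are all hypothetical, chosen only to show one bead per topic with several "strings" (contexts) passing through it.

```python
from collections import defaultdict


class Fabric:
    """Toy beads-on-strings store (hypothetical): one bead per topic."""

    def __init__(self):
        # context name -> ordered list of topic names (the "string")
        self.strings = defaultdict(list)
        # topic name -> set of context names passing through that bead
        self.beads = defaultdict(set)

    def thread(self, context, topics):
        """Run the string for `context` through the given topics, in order."""
        for topic in topics:
            self.strings[context].append(topic)
            self.beads[topic].add(context)

    def contexts_of(self, topic):
        """Return the sorted names of all strings passing through a bead."""
        return sorted(self.beads[topic])

    def neighbors(self, topic, context):
        """Return the (previous, next) beads of `topic` along one string."""
        string = self.strings[context]
        i = string.index(topic)
        prev_bead = string[i - 1] if i > 0 else None
        next_bead = string[i + 1] if i + 1 < len(string) else None
        return prev_bead, next_bead


# Made-up sample data: two contexts that share the bead "blood viscosity".
fabric = Fabric()
fabric.thread("sentence-1", ["fish oil", "reduces", "blood viscosity"])
fabric.thread("sentence-2", ["blood viscosity", "relates to", "Raynaud's syndrome"])

print(fabric.contexts_of("blood viscosity"))
print(fabric.neighbors("blood viscosity", "sentence-1"))
```

The shared bead is stored once, and each context contributes its own ordering around it, which is the sense in which many strings pass through a single node.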

In a sense, that is an "information fabric", but one of potentially
high dimensionality. A membrane is a 2-dimensional sheet; a
hypermembrane is a sheet of many dimensions. In another
vernacular, one might think of the framework as that of intersecting
manifolds -- topology at work.

I will have much more to say about all of that soon; at this moment,
the code necessary to build and maintain that structure is partially
running. It uses a link-grammar parser, and relies on a topic map to
maintain identity of and relationships among those nodes (beads).

Primary nodes are those of nouns and noun phrases, and verbs and verb
phrases; the fabric ignores what would otherwise be called "stop
words", though they are kept around internally.

Another topic raised in my slides is that of Literature-based
Discovery. There's a huge literature on that topic; my slides present
a simplified version of Don R. Swanson's Fish Oil, Raynaud's Syndrome
discovery, the paper that started that field: undiscovered public
knowledge.
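
For readers new to literature-based discovery, the pattern is often summarized as an "ABC" chain: if A is linked to B in one body of literature and B is linked to C in another, an unreported A-to-C link is a candidate for discovery. The Python sketch below is only an illustration of that pattern under stated assumptions; the function name `abc_candidates` and the tiny list of links are hypothetical, and it says nothing about how the fabric itself will do this.

```python
from collections import defaultdict


def abc_candidates(assertions, start):
    """Return {C: [B, ...]} for terms C reachable from `start` through some
    intermediate B, where no direct start-C link is already recorded."""
    linked = defaultdict(set)
    for a, b in assertions:
        # Treat each recorded link as undirected co-occurrence.
        linked[a].add(b)
        linked[b].add(a)

    direct = set(linked[start])
    found = defaultdict(set)
    for b in direct:
        for c in linked[b]:
            if c != start and c not in direct:
                found[c].add(b)
    return {c: sorted(bs) for c, bs in found.items()}


# Hypothetical, hand-written links standing in for harvested literature.
links = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
]

print(abc_candidates(links, "fish oil"))
```

A real system would need term normalization and some ranking of the intermediate B terms; this sketch only shows the shape of the inference.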

Before I make the source code for this work available on GitHub, it
should be able to demonstrate literature-based discovery from
resources captured in the fabric.

More soon!

Leonid Boytsov

Apr 16, 2015, 9:53:02 PM
to qa-...@googlegroups.com
Hi, I checked your slides, but I still don't get why the project is called SolrSherlock. Is there a description of the Solr-specific part?

Thanks!