I have pushed a fairly large set of commits with the following features:
* Much more stable when dealing with large numbers of documents (James
- I think this will fix your problem from the other day).
* Improved performance through using a rolling hash and other optimisations.
* A query interface for documents which paves the way for powerful
selective associating. Currently only used on the documents list page,
but it will shortly be rolled out to the search and association API calls. E.g.:
http://127.0.0.1:8080/document/1;3/
http://127.0.0.1:8080/document/1-2;4-5/
http://127.0.0.1:8080/document/?order_by=characters
http://127.0.0.1:8080/document/?order_by=-title&limit=10
http://127.0.0.1:8080/document/?limit=10&order_by=title&cursor=merchant_of_venice.txt:3:3
* Better search results. Previously the associations tended towards
longer documents. Now they are based on the total number of matches.
* A simple load.sh helper script that loads all text files in a
folder structure, with a doctype per directory and added metadata based
on each document's parent folder.
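For anyone not familiar with the rolling hash idea mentioned in the list above, here is a minimal Python sketch of the general technique (Rabin-Karp style). It is not the code from the commits: the base, modulus, window size and the function names rolling_hashes and common_windows are all assumptions chosen for illustration.

```python
# Polynomial rolling hash over fixed-size character windows.
# BASE and MOD are illustrative assumptions, not values from the project.
BASE = 257
MOD = (1 << 61) - 1


def rolling_hashes(text, window):
    """Yield (offset, hash) for every window-sized substring of text."""
    if window <= 0 or len(text) < window:
        return
    # Weight of the character that leaves the window on each slide.
    high = pow(BASE, window - 1, MOD)
    h = 0
    for ch in text[:window]:
        h = (h * BASE + ord(ch)) % MOD
    yield 0, h
    for i in range(window, len(text)):
        # Drop the outgoing character, then shift in the incoming one.
        h = (h - ord(text[i - window]) * high) % MOD
        h = (h * BASE + ord(text[i])) % MOD
        yield i - window + 1, h


def common_windows(a, b, window):
    """Return (offset_a, offset_b) pairs where the two texts share a window."""
    seen = {}
    for off_a, h in rolling_hashes(a, window):
        seen.setdefault(h, []).append(off_a)
    matches = []
    for off_b, h in rolling_hashes(b, window):
        for off_a in seen.get(h, []):
            # Compare the actual text to rule out hash collisions.
            if a[off_a:off_a + window] == b[off_b:off_b + window]:
                matches.append((off_a, off_b))
    return matches
```

The point of the technique is that each slide of the window costs constant time, rather than rehashing the whole window, which is where the saving on large documents would come from.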
The next two things which I aim to implement very shortly are
multi-threaded associations, which should scale linearly with cores,
as we are CPU-bound, and JSON templates.
Any questions or bugs, let me know!
Cheers,
Donny.
It's mostly just template stuff documented here:
http://google-ctemplate.googlecode.com/svn/trunk/doc/guide.html
It's quite a traditional templating system in the sense that it
doesn't allow any logic in the template whatsoever!! But that's quite
reassuring in a way. I'm fairly certain it is the code that delivers
the Google search results, especially when you look at the examples!
Have been thinking about how to make interesting use of the JSON
output. I'm sure you'll have come across d3.js. I really like this
example:
http://mbostock.github.com/d3/ex/splom.html
Especially the multiple selection when you drag across an individual
graph. Have thought about a set of concentric rings where each ring
represents a document, and common sections of text are highlighted as
a sectioned arc. Hovering over an arc highlights all the other
matching arcs. Would make it very quick and easy to see clusters of
matches between multiple documents at once. Need to do a demo...
A bit like this, but not quite:
http://mbostock.github.com/d3/ex/sunburst.html
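As a rough sketch of the concentric rings idea above, this is how matched sections could be turned into arc descriptions ready to be dumped as JSON. Everything here is hypothetical: the function name ring_arcs, the input shapes and the default radii are made up for illustration, and the field names are simply chosen to resemble the ones d3's arc generator reads by default.

```python
import json
import math


def ring_arcs(doc_lengths, matches, inner_radius=40.0, ring_width=20.0):
    """Map matched character spans onto arcs of concentric rings.

    doc_lengths: dict of doc id -> length in characters (one ring per doc,
                 in insertion order, innermost first).
    matches:     list of (match_id, doc_id, start, end) character spans.
    """
    ring_index = {doc: i for i, doc in enumerate(doc_lengths)}
    arcs = []
    for match_id, doc, start, end in matches:
        length = doc_lengths[doc]
        r0 = inner_radius + ring_index[doc] * ring_width
        arcs.append({
            "match": match_id,  # arcs sharing this id highlight together
            "doc": doc,
            "innerRadius": r0,
            "outerRadius": r0 + ring_width,
            # A full document spans the whole circle (0 to 2*pi radians).
            "startAngle": 2 * math.pi * start / length,
            "endAngle": 2 * math.pi * end / length,
        })
    return arcs


if __name__ == "__main__":
    demo = ring_arcs(
        {"a.txt": 100, "b.txt": 200},
        [("m1", "a.txt", 0, 50), ("m1", "b.txt", 100, 200)],
    )
    print(json.dumps(demo, indent=2))
```

Hovering would then just be a matter of selecting every arc with the same "match" value on the client side.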
Cheers,
Donny.