Donovan Hide

Dec 5, 2011, 8:48:32 AM
Sorry (again!) for the radio silence. I have been very busy coming up
with an optimal JSON output format for the search and document
browsing responses. Here's a sample of the current format:


This format is used in the UI shown in the attached screenshot, which
shows a search of Dickens's "Old Curiosity Shop" against a plagiarism
corpus. The interface uses ExtJS and seems to work well for quickly
browsing through each matched document and looking at the discovered
fragments. I'm doing the same for the document browsing page and hope
to have something that you can play with soon.

A lot of this work can be reused directly in the Churnalism browser
extension, in a browser-neutral fashion, so it should speed that part
of the work up too. How much processing of the fragments to do on the
server versus the client is an open question. A great many interesting
things can be done with the fragments list, and so far I've tended to
do that work in the client.

The Python client could follow the same model. I'm not sure, so any
feedback is appreciated!



Tom Lee

Dec 5, 2011, 11:49:46 AM
This is very exciting!  Thanks for the update, Donny -- I'm looking forward to digging into this.

Martin Moore

Dec 5, 2011, 12:36:36 PM
Well done Donny - sorry I've been so hard to get hold of. Ought to be a little more accessible this week (I hope!).
