[OpenSherlock] Harvesting workflows


Jack Park

Aug 1, 2015, 11:27:30 AM
to qa-...@googlegroups.com
I have written a document [1] describing work I am now doing on OpenSherlock: specifically, changing the way OpenSherlock acquires the PubMed abstracts it harvests.

As I type, the system I am describing has, in the last 2 hours, clustered and harvested some 30,000 PubMed abstracts, a process that used to take nearly an entire day.

The document describes a project, which I will put up on GitHub shortly, that uses the Carrot2 engine internally to run batch-mode queries over a list of query terms I supply (something I used to do by hand with the Carrot2 Workbench).
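A minimal sketch of the batch-mode driver described above, in Python. It assumes a hypothetical `search_and_cluster(term)` callable that wraps the search and the Carrot2 clustering step and returns clusters as (label, PubMed IDs) pairs. None of these names come from the project itself. The point is only the shape of the loop that replaces manual runs in the Carrot2 Workbench: read the term list, cluster each term once, and collect the distinct PubMed IDs to harvest.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed shape of one cluster: (label, PubMed IDs of its member abstracts).
# This is illustrative only and is not Carrot2's actual result type.
Cluster = Tuple[str, List[str]]


def batch_cluster(
    terms: Iterable[str],
    search_and_cluster: Callable[[str], List[Cluster]],
) -> Dict[str, List[Cluster]]:
    """Run one search-and-cluster pass per query term, with no manual steps."""
    results: Dict[str, List[Cluster]] = {}
    for raw in terms:
        term = raw.strip()
        if not term or term in results:  # skip blank lines and repeated terms
            continue
        results[term] = search_and_cluster(term)
    return results


def ids_to_harvest(results: Dict[str, List[Cluster]]) -> List[str]:
    """Return each distinct PubMed ID once, in the order it was first seen."""
    seen: Dict[str, None] = {}  # dicts keep insertion order (Python 3.7+)
    for clusters in results.values():
        for _label, pmids in clusters:
            for pmid in pmids:
                seen.setdefault(pmid, None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical stand-in for the real search + Carrot2 clustering call.
    def fake_search_and_cluster(term: str) -> List[Cluster]:
        return [(term + " A", ["101", "102"]), (term + " B", ["102", "103"])]

    terms = ["asthma", "  ", "eczema", "asthma"]
    results = batch_cluster(terms, fake_search_and_cluster)
    print(list(results))            # ['asthma', 'eczema']
    print(ids_to_harvest(results))  # ['101', '102', '103']
```

Deduplicating IDs across terms matters because overlapping query terms tend to pull in the same abstracts, and each one only needs to be harvested once.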

The document also sketches planned enhancements that would let the project work together with OpenSherlock automatically: OpenSherlock would send it new terms to query and cluster, and the project would return the results to OpenSherlock for harvesting.
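One plausible shape for that planned round trip is a worker that reads terms from an inbox queue and posts clustered results to an outbox that OpenSherlock drains for harvesting. This is a minimal in-process sketch, assuming a hypothetical `search_and_cluster(term)` callable that returns (label, PubMed IDs) pairs. `clustering_worker` and `STOP` are likewise hypothetical names, not OpenSherlock's design.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

# Assumed shape of one cluster: (label, PubMed IDs). Illustrative only.
Cluster = Tuple[str, List[str]]
STOP = None  # sentinel placed on the inbox to shut the worker down


def clustering_worker(
    inbox: "queue.Queue[Optional[str]]",
    outbox: "queue.Queue[Tuple[str, List[Cluster]]]",
    search_and_cluster: Callable[[str], List[Cluster]],
) -> None:
    """Take terms sent by OpenSherlock, cluster them, and post results back."""
    while True:
        term = inbox.get()
        try:
            if term is STOP:
                return
            outbox.put((term, search_and_cluster(term)))
        finally:
            inbox.task_done()


if __name__ == "__main__":
    inbox: "queue.Queue[Optional[str]]" = queue.Queue()
    outbox: "queue.Queue[Tuple[str, List[Cluster]]]" = queue.Queue()

    # Hypothetical stand-in for the real search + clustering call.
    def fake_search_and_cluster(term: str) -> List[Cluster]:
        return [(term + " cluster", ["101"])]

    worker = threading.Thread(
        target=clustering_worker,
        args=(inbox, outbox, fake_search_and_cluster),
        daemon=True,
    )
    worker.start()

    # "OpenSherlock" side: send new terms, then tell the worker to stop.
    for term in ["asthma", "eczema"]:
        inbox.put(term)
    inbox.put(STOP)
    worker.join()

    # "OpenSherlock" side: drain the outbox and hand results to the harvester.
    while not outbox.empty():
        term, clusters = outbox.get()
        print(term, clusters)
```

The queues decouple the two sides, so OpenSherlock can keep sending terms while earlier ones are still being clustered. Between separate processes, a message broker or an HTTP endpoint would more likely take the place of these in-memory queues. This sketch does not attempt that.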



Cheers,
Jack