[OpenSherlock] Harvesting workflows

Jack Park

Aug 1, 2015, 11:27:30 AM

to qa-...@googlegroups.com

I have created a document [1] that describes work I am now doing on OpenSherlock. Specifically, it is about changing how OpenSherlock acquires the PubMed abstracts it harvests.

As I type, the system I am describing has clustered and harvested some 30,000 PubMed abstracts in the last 2 hours, a process that used to take nearly an entire day.

The document describes a project, which I will put up on GitHub shortly, that uses the Carrot2 engine internally to run batch-mode queries over a list of query terms I supply (something I used to do by hand with the Carrot2 Workbench).
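For readers unfamiliar with the workflow, here is a minimal sketch of the batch-mode loop: read a list of query terms, run one query-and-cluster pass per term, and collect the deduplicated PubMed IDs (PMIDs) to hand off for harvesting. This is not the project's code. `search_and_cluster` is a hypothetical stand-in for the Carrot2 query and clustering step, the term-list format (one term per line, `#` comments) is an assumption, and `fake_search_and_cluster` returns canned data purely for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical shape of one clustering result: cluster label -> PMIDs in that cluster.
Clusters = Dict[str, List[str]]


def read_terms(lines: Iterable[str]) -> List[str]:
    """Parse a term list (assumed format): one term per line, skip blanks and '#' comments."""
    terms = []
    for line in lines:
        term = line.strip()
        if term and not term.startswith("#"):
            terms.append(term)
    return terms


def batch_cluster(terms: List[str],
                  search_and_cluster: Callable[[str], Clusters]) -> Dict[str, Clusters]:
    """Run one query+cluster pass per term, replacing manual Workbench sessions."""
    return {term: search_and_cluster(term) for term in terms}


def pmids_to_harvest(results: Dict[str, Clusters]) -> List[str]:
    """Collect PMIDs across all terms and clusters, deduplicated, in first-seen order."""
    seen: Set[str] = set()
    ordered: List[str] = []
    for clusters in results.values():
        for pmids in clusters.values():
            for pmid in pmids:
                if pmid not in seen:  # clusters (and terms) often overlap
                    seen.add(pmid)
                    ordered.append(pmid)
    return ordered


def fake_search_and_cluster(term: str) -> Clusters:
    """Hypothetical stand-in for Carrot2: canned clusters with made-up PMIDs."""
    canned = {
        "p53": {"apoptosis": ["111", "222"], "tumor suppressor": ["222", "333"]},
        "BRCA1": {"breast cancer": ["333", "444"]},
    }
    return canned.get(term, {})


if __name__ == "__main__":
    term_list = ["# query terms", "p53", "", "BRCA1"]
    results = batch_cluster(read_terms(term_list), fake_search_and_cluster)
    print(pmids_to_harvest(results))  # ['111', '222', '333', '444']
```

Deduplicating before harvesting matters because the same abstract can land in several clusters, and under several query terms, in one batch run.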

The document also sketches planned enhancements so that the project works together with OpenSherlock automatically: OpenSherlock sends it new terms to query and cluster, and the project returns the results to OpenSherlock for harvesting.

Cheers,

Jack

[1] https://docs.google.com/document/d/1IOAGg2oxZg6xHYlni_OCJNQnhKcZf5XdwwIXys9gcbc/
