Behemoth - an open source platform for large scale document analysis


julien nioche

Nov 11, 2009, 9:03:44 AM
to DigitalPebble
Hello,

Very early days, but I wanted to tell you about a project that we are
working on and which will be released under an open source license once
it reaches maturity. Behemoth will make it possible to deploy GATE or UIMA
applications over a Hadoop cluster in order to do very large scale
document analysis. It uses a simple representation format which will
serve as a common ground between UIMA- and GATE-generated
annotations, hence achieving compatibility between both systems. Since
it is Hadoop-based, it benefits from all of Hadoop's features, namely
scalability, fault tolerance and, most notably, the backing of a
thriving open source community. Quite a few Apache resources will fit
into it: Nutch, Tika, Mahout, HBase, etc.

I will keep you updated as things progress. In the meantime, we are
interested in hearing about use cases and comments, so feel free to get
in touch.

Best,

Julien
http://www.digitalpebble.com

julien nioche

Nov 23, 2009, 5:56:16 AM
to DigitalPebble
You can now get the code and follow the progress of Behemoth on
http://code.google.com/p/behemoth-pebble/
