Hello,
It is very early days, but I wanted to tell you about a project that we
are working on and which will be released under an open source license
once it reaches maturity. Behemoth will make it possible to deploy GATE or UIMA
applications over a Hadoop cluster in order to do very large scale
document analysis. It uses a simple representation format which will
be used as a common ground between UIMA and GATE-generated
annotations, hence achieving compatibility between the two systems. Since
it is Hadoop-based, it benefits from Hadoop's features, namely
scalability, fault tolerance and, most notably, the backing of a
thriving open source community. Quite a few Apache projects should fit
in well with it: Nutch, Tika, Mahout, HBase, etc.
I will keep you updated as things progress. In the meantime, we are
interested in hearing about use cases and comments, so feel free to get
in touch.
Best,
Julien
http://www.digitalpebble.com