Behemoth Beginners Guide?
From: alexmc <alex.mclint...@gmail.com>
Date: Mon, 6 Dec 2010 08:23:10 -0800 (PST)
Local: Mon, Dec 6 2010 11:23 am
Subject: Behemoth Beginners Guide?
May I ask a dumb question? What would be the best way of getting me up
to speed with Behemoth?

I am trying to do some focussed crawling with Nutch or Bixo, want to
do some Named Entity Recognition (which sounds to me like UIMA or
Gate). Now I have been aware of Behemoth for some time as a tool which
helps do UIMA stuff on Hadoop but am only just getting round to
installing it. However the documentation is a bit thin on the ground?

Should I learn everything I need about connecting Nutch to UIMA now -
or will Behemoth help me with that? Do I just need to delve in and
understand all the code?

Cheers

Alex


 
From: DigitalPebble <jul...@digitalpebble.com>
Date: Mon, 6 Dec 2010 17:38:15 +0000
Local: Mon, Dec 6 2010 12:38 pm
Subject: Re: Behemoth Beginners Guide?

Hi Alex,

> May I ask a dumb question? What would be the best way of getting me up
> to speed with Behemoth?

See http://github.com/jnioche/behemoth/wiki/papers-and-talks for
introductions / overview

> I am trying to do some focussed crawling with Nutch or Bixo, want to
> do some Named Entity Recognition (which sounds to me like UIMA or
> Gate). Now I have been aware of Behemoth for some time as a tool which
> helps do UIMA stuff on Hadoop but am only just getting round to
> installing it. However the documentation is a bit thin on the ground?

Did you have a look at http://github.com/jnioche/behemoth/wiki/howto ?
I've changed quite a few things in the way Behemoth's code is managed and
will update the wiki soon; however, with some knowledge of Hadoop you should
be able to run the examples.

> Should I learn everything I need about connecting Nutch to UIMA now -
> or will Behemoth help me with that?

Behemoth will convert the Nutch segments into its own representation which
will then be used as an input for GATE or UIMA (or whatever). It does not
connect them as such.
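The "convert, then process" split can be illustrated with a small, self-contained Java sketch. It is a hypothetical model, not Behemoth's code: the names `CrawlRecord`, `Doc` and `Processor` are invented for illustration. Real Behemoth stores its documents as `BehemothDocument` values in a Hadoop SequenceFile. What the sketch shows is the decoupling. Crawler output is mapped once onto a neutral document type, and any processor (a GATE or UIMA wrapper in practice) only ever sees that type, never the crawler's own format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "convert once, process anywhere".
// None of these names are Behemoth's real API.
public class PipelineSketch {

    // Hypothetical stand-in for one parsed record from a crawler.
    static final class CrawlRecord {
        final String url;
        final String mimeType;
        final String parsedText;

        CrawlRecord(String url, String mimeType, String parsedText) {
            this.url = url;
            this.mimeType = mimeType;
            this.parsedText = parsedText;
        }
    }

    // Crawler-neutral document, loosely in the spirit of BehemothDocument.
    static final class Doc {
        final String url;
        final String contentType;
        final String text;
        final Map<String, String> metadata = new LinkedHashMap<>();
        final List<String> annotations = new ArrayList<>();

        Doc(String url, String contentType, String text) {
            this.url = url;
            this.contentType = contentType;
            this.text = text;
        }
    }

    // Any downstream processor works on Doc only.
    interface Processor {
        void process(Doc doc);
    }

    // Step 1: map the crawler-specific record onto the neutral representation.
    static Doc convert(CrawlRecord r) {
        Doc d = new Doc(r.url, r.mimeType, r.parsedText);
        d.metadata.put("source", "crawler");
        return d;
    }

    // Step 2: convert every record, then hand each Doc to the processor.
    static List<Doc> run(List<CrawlRecord> records, Processor processor) {
        List<Doc> docs = new ArrayList<>();
        for (CrawlRecord r : records) {
            Doc d = convert(r);
            processor.process(d);
            docs.add(d);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<CrawlRecord> input = List.of(
                new CrawlRecord("http://example.com/a", "text/html", "Hello world"));
        // Trivial processor: records the text length as an annotation.
        Processor lengthTagger = doc -> doc.annotations.add("length=" + doc.text.length());
        for (Doc d : run(input, lengthTagger)) {
            System.out.println(d.url + " " + d.annotations); // prints http://example.com/a [length=11]
        }
    }
}
```

Swapping GATE for UIMA in this model means writing another `Processor`. The conversion step stays the same, which is the role Behemoth plays between Nutch and the NLP framework.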

> Do I just need to delve in and
> understand all the code?

Re NER: I'd suggest that you look at GATE (http://gate.ac.uk), play with the
GUI a bit and follow the tutorials there. It has far more available
resources than UIMA and is IMHO more flexible. In particular it comes with
ANNIE, which is a simple application for NER that is often used as a
starting point by GATE users to build their own pipeline.

On the Behemoth front: a good way to start would be to look at how
Behemoth converts the Nutch segments into a SequenceFile of
BehemothDocument objects, use the CorpusReader
(https://github.com/jnioche/behemoth/blob/master/modules/core/src/main...) to
see what the content looks like, then try processing your corpus with the
GATE application included in the tests, following the instructions on the wiki.

Now that the main refactoring of the code is finished, I'll probably spend
more time on the documentation; any contributions, suggestions or questions
are welcome.

HTH

Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com
http://www.digitalpebble.com


 