linguistic preprocessing/stemming algorithm in Go for IR

Warren Bare

Jun 28, 2013, 2:36:32 PM
to golan...@googlegroups.com
Hi Guys,

Does anyone know of a linguistic preprocessing (stemming and such) routine in Go for use in information retrieval?  I like the included tokenizer, and I've taken a look at godoc/index.go, but since that is geared toward programming languages as opposed to human language, it does not need preprocessing.

There is an old golang-nuts post on Lucene, but I have not found a Go port of that.  If someone knows of a Go Lucene, that would have what I'm looking for.  The project "golucene" is empty.

(You know somewhere inside Google there is a killer Go package for this :-)

Many Thanks!

W

Rodrigo Kochenburger

Jun 28, 2013, 2:48:27 PM
to golan...@googlegroups.com
An option would be to use elasticsearch (which runs on top of lucene) and interface w/ it through the REST api. Unless you wanna be able to tweak and customize the algorithms, it's probably way easier and faster.

Warren Bare

Jul 4, 2013, 12:12:44 PM
to Miki Tebeka, golan...@googlegroups.com
Hi Miki,

Thanks very much for the link for snowball.  That helps a lot.  The tebeka package looks great.

Thanks also to you and others who suggested the Lucene interface.  The project I'm working on is not really indexing...  it is more a research project on the relationships between words in a certain context. While we could use Lucene just to get a term stream, for a variety of reasons it made more sense just to do that in our own software.  I've used Lucene in the past.  It is a great project.

Thanks,
W



On Tue, Jul 2, 2013 at 1:20 PM, Miki Tebeka <miki....@gmail.com> wrote:

On Friday, June 28, 2013 11:48:27 AM UTC-7, Rodrigo Kochenburger wrote:
>> There is an old golang-nuts post on Lucene, but I have not found a Go port of that.  If someone knows of a Go Lucene, that would have what I'm looking for.  The project "golucene" is empty.
> An option would be to use elasticsearch (which runs on top of lucene) and interface w/ it through the REST api. Unless you wanna be able to tweak and customize the algorithms, it's probably way easier and faster.

I second that.

Another option will be to look at http://godoc.org/ for NLP packages (like https://bitbucket.org/tebeka/snowball ;)

