Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
12 messages
Matt Mower  
Newsgroups: comp.lang.ruby
From: Matt Mower <matt.mo...@gmail.com>
Date: Fri, 15 Apr 2005 16:56:45 +0900
Local: Fri, Apr 15 2005 3:56 am
Subject: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Hi folks,

I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
classifier written in Python. Bishop-0.3.0 is available as a Gem and
from RubyForge

http://rubyforge.org/projects/bishop/

Bishop is a reasonably direct port of the original Python code.  Bug
reports and suggestions for improving the structure of the code would
be welcome.

Bishop includes both Robinson and Robinson-Fisher algorithms for
classification.  It is presumed that they were correctly implemented
in Reverend.  I aim to test this in my own use of the code.
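
For readers unfamiliar with the Robinson method mentioned above, here is a minimal sketch of the geometric-mean combination as it is commonly described. It is not taken from Bishop's source: the method name `robinson_combine` and the plain array-of-floats input are assumptions made for illustration.

```ruby
# Sketch of Robinson's geometric-mean combination (not Bishop's actual code).
# `probs` is an array of per-word probabilities, each between 0.0 and 1.0.
def robinson_combine(probs)
  return 0.5 if probs.empty? # no evidence: stay neutral

  nth = 1.0 / probs.size
  # P: combined evidence that the item belongs to the pool
  p_score = 1.0 - probs.map { |p| 1.0 - p }.reduce(1.0, :*)**nth
  # Q: combined evidence that it does not belong
  q_score = 1.0 - probs.reduce(1.0, :*)**nth
  s = (p_score - q_score) / (p_score + q_score)
  (1.0 + s) / 2.0
end

puts robinson_combine([0.5, 0.5]) # neutral words give a neutral score
puts robinson_combine([0.9, 0.9]) # strongly indicative words give a high score
```

The result stays between 0.0 and 1.0, with 0.5 meaning "no opinion".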

Support is included for saving/loading the trained classifier to/from YAML.

An example of using Bishop:

  require 'bishop'

  b = Bishop::Bayes.new
  b.train( "ham", "a great message from a close friend" )
  b.train( "spam", "buy viagra here" )
  puts b.guess( "would a friend send you a viagra advert?" )

  => [ [ "ham", <prob> ], [ "spam", <prob> ] ]

Bishop defaults to using the Robinson algorithm.  To use a different
algorithm, construct the classifier with a block that calls the
chosen algorithm:

  Bishop::Bayes.new { |probs,ignore| Bishop::robinson_fisher( probs, ignore ) }
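
As a rough illustration of how a constructor can accept such a block and fall back to a default when none is given, here is a small self-contained sketch. `PluggableClassifier` and its placeholder default (a plain average) are hypothetical and do not reflect Bishop's internals.

```ruby
# Hypothetical stand-in for a classifier that accepts a combiner block.
class PluggableClassifier
  # The default simply averages the probabilities; it is a placeholder,
  # not the Robinson algorithm.
  DEFAULT_COMBINER = ->(probs, _ignore) { probs.sum / probs.size.to_f }

  def initialize(&combiner)
    # Keep the caller's block if one was given, else fall back to the default.
    @combiner = combiner || DEFAULT_COMBINER
  end

  def combine(probs, ignore = nil)
    @combiner.call(probs, ignore)
  end
end

default = PluggableClassifier.new
custom  = PluggableClassifier.new { |probs, _ignore| probs.max }

puts default.combine([0.2, 0.8]) # average of the two values
puts custom.combine([0.2, 0.8])  # largest value
```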

To save to a YAML file:

  b.save "myclassifier.yaml"

To load from a YAML file:

  b.load "myclassifier.yaml"
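
The thread does not show what Bishop writes to disk, so the following is only a sketch of the general save/load round trip using Ruby's standard YAML library. The `pools` hash and its layout are assumptions.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical training data: pool name => { word => count }.
pools = {
  'ham'  => { 'friend' => 2, 'message' => 1 },
  'spam' => { 'viagra' => 3 }
}

file = Tempfile.new(['classifier', '.yaml'])
begin
  # Save: serialize the nested hash to YAML text and write it out.
  File.write(file.path, YAML.dump(pools))

  # Load: read the text back; safe_load suffices for strings and integers.
  restored = YAML.safe_load(File.read(file.path))
  puts restored == pools
ensure
  file.close
  file.unlink
end
```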

You can uniquely identify training items:

  b.train( "ham", "friends don't let friends develop on shared hosting",
           "<xyz.323-ON8.002-802....@ham.com>" )

And you can untrain items:

  b.untrain( <pool>, <item>[, <uid> ] )

I'm using this in a project of my own and would welcome any feedback
or suggested improvements.

Regards,

Matt

--
Matt Mower :: http://matt.blogs.it/


 
Peña, Botp  
Newsgroups: comp.lang.ruby
From: "Peña, Botp" <b...@delmonte-phil.com>
Date: Fri, 15 Apr 2005 18:29:42 +0900
Local: Fri, Apr 15 2005 5:29 am
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Matt Mower [mailto:matt.mo...@gmail.com]

#I've recently released a Ruby port "Bishop" of the "Reverend"
#bayesian classifier written in Python. Bishop-0.3.0 is
#available as a Gem and from RubyForge
#
# http://rubyforge.org/projects/bishop/

hmmm, another cool filter. very small, took me less than 5 seconds to
install the gem remotely.

btw, matt, how difficult or easy is it to port the bishop database to a db
like postgres? I am asking since i may be querying/archiving more than
10_000 entries...

kind regards - botp


 
Matt Mower  
Newsgroups: comp.lang.ruby
From: Matt Mower <matt.mo...@gmail.com>
Date: Fri, 15 Apr 2005 18:42:39 +0900
Local: Fri, Apr 15 2005 5:42 am
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Hi Botp,

On 4/15/05, "Peña, Botp" <b...@delmonte-phil.com> wrote:

> Matt Mower [mailto:matt.mo...@gmail.com]

> #I've recently released a Ruby port "Bishop" of the "Reverend"
> #bayesian classifier written in Python. Bishop-0.3.0 is
> #available as a Gem and from RubyForge
> #
> # http://rubyforge.org/projects/bishop/

> hmmm, another cool filter. very small, took me less than 5 seconds to
> install the gem remotely.

> btw, matt, how difficult or easy is it to port the bishop database to a db
> like postgres? I am asking since i may be querying/archiving more than
> 10_000 entries...

This is an excellent question.  I want to use the classifier within a
Rails based information aggregator I am writing to allow
classification of interesting/uninteresting information and perhaps
for automatic labelling.

The problem I have is that the classifier will need to be available to
process each request classifying an item and each request that sorts
items, i.e. quite often.  This probably means initializing it once and
storing it in a session variable.  Since there is no concept of a
session expiry callback (Rails is not an app server), the question is
"How do I checkpoint the classifier as it is trained?"

At the moment I can serialize it to YAML but that takes a little time
and will get slower as the training set increases.  Doing the YAML
conversion and a SQL update on each request is prohibitive.

I've been considering whether to have the classifier exist in a
separate thread or process and allow it to checkpoint itself
automatically at intervals, independent of the user's session behaviour.
Another option was to convert the code so that everything (or nearly
everything) operated directly out of the database.
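
One way to picture checkpointing at intervals rather than on every request is the sketch below. It is a simplification under stated assumptions: it is single-threaded and counts training updates instead of measuring wall-clock time, and `CheckpointedCounts` with its writer block is a hypothetical name, not part of Bishop or Rails.

```ruby
require 'yaml'

# Hypothetical wrapper that checkpoints after every `interval` updates
# instead of on every request.
class CheckpointedCounts
  attr_reader :saves

  def initialize(interval:, &writer)
    @interval = interval
    @writer   = writer # called with the YAML text when a save is due
    @counts   = {}
    @dirty    = 0      # updates since the last checkpoint
    @saves    = 0
  end

  def train(word)
    @counts[word] = @counts.fetch(word, 0) + 1
    @dirty += 1
    checkpoint if @dirty >= @interval
  end

  def checkpoint
    @writer.call(YAML.dump(@counts))
    @dirty = 0
    @saves += 1
  end
end

written = []
counts = CheckpointedCounts.new(interval: 3) { |yaml| written << yaml }
7.times { |i| counts.train("word#{i}") }
puts counts.saves # 7 updates at an interval of 3 gives 2 checkpoints
```

The trade-off is that up to `interval - 1` updates can be lost if the process dies between checkpoints.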

Representing the pools and training data via SQL would be simple
enough since it's just (word,count) tuples.  Basing the code on a SQL
variant might be quite attractive.  The issue would be making it
portable.
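
To make the (word,count) idea concrete, here is a sketch that flattens in-memory pools into rows and rebuilds them. The three-column row shape `[pool, word, count]` is an assumption about how such a table might look, and no database is involved.

```ruby
# Hypothetical in-memory pools: pool name => { word => count }.
pools = {
  'ham'  => { 'friend' => 2, 'great' => 1 },
  'spam' => { 'viagra' => 3 }
}

# Flatten to rows of the form [pool, word, count], one per table row.
rows = pools.flat_map do |pool, words|
  words.map { |word, count| [pool, word, count] }
end

# Rebuild the nested hash from the rows, as a loader would.
rebuilt = rows.each_with_object({}) do |(pool, word, count), acc|
  (acc[pool] ||= {})[word] = count
end

rows.each { |row| puts row.inspect }
puts rebuilt == pools
```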

Since I'm using Rails anyway I could certainly attempt an ActiveRecord
based variant, which should satisfy the Postgres requirement also.

Could be an interesting experiment.  What do you think?

M

--
Matt Mower :: http://matt.blogs.it/


 
Peña, Botp  
Newsgroups: comp.lang.ruby
From: "Peña, Botp" <b...@delmonte-phil.com>
Date: Fri, 15 Apr 2005 19:16:38 +0900
Local: Fri, Apr 15 2005 6:16 am
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python

Matt Mower [mailto:matt.mo...@gmail.com] wrote:

#Since I'm using Rails anyway I could certainly attempt an
#ActiveRecord based variant which should satisfy the Postgres
#requirement also.
#
#Could be an interesting experiment.  What do you think?

interesting indeed, and useful too.
thanks and kind regards -botp

#
#M
#
#--
#Matt Mower :: http://matt.blogs.it/
#


 
Douglas Livingstone  
Newsgroups: comp.lang.ruby
From: Douglas Livingstone <ramp...@gmail.com>
Date: Fri, 15 Apr 2005 19:42:51 +0900
Local: Fri, Apr 15 2005 6:42 am
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
On 4/15/05, Matt Mower <matt.mo...@gmail.com> wrote:

> Hi folks,

> I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
> classifier written in Python. Bishop-0.3.0 is available

Could this be combined with http://rubyforge.org/projects/classifier/ ?

It looks like they both have a similar syntax:

classifier.train :symbol, "content"

Would the method_missing syntax be easy to add to Bishop? Would
untrain be easy to add to projects/classifier? From what I've seen of
them so far, it sounds like the answer to both would be yes. If they
had the same API, they could go in the same module so that swapping
filter types would be as simple as changing the Classifier::XXX.new
line.
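
For illustration, a method_missing front end over an explicit train method might look roughly like this. `DynamicTrainer` is a hypothetical class; it shows the general Ruby technique and is not code from Bishop or Classifier.

```ruby
# Hypothetical classifier showing a method_missing front end over train.
class DynamicTrainer
  attr_reader :trained

  def initialize
    @trained = Hash.new { |hash, pool| hash[pool] = [] }
  end

  # The explicit form: train(pool, text).
  def train(pool, text)
    @trained[pool.to_sym] << text
  end

  # Route calls such as train_spam("...") to train(:spam, "...").
  def method_missing(name, *args, &block)
    match = name.to_s.match(/\Atrain_(\w+)\z/)
    return super unless match

    train(match[1], *args)
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?('train_') || super
  end
end

c = DynamicTrainer.new
c.train(:ham, 'a great message from a close friend')
c.train_spam('buy viagra here')
puts c.trained.inspect
```

Both call styles end up in the same place, so a shared API could offer either one.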

Cheers,
Douglas


 
gabriele renzi  
Newsgroups: comp.lang.ruby
From: gabriele renzi <surrender...@remove-yahoo.it>
Date: Fri, 15 Apr 2005 23:37:23 GMT
Local: Fri, Apr 15 2005 7:37 pm
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Douglas Livingstone wrote:

> On 4/15/05, Matt Mower <matt.mo...@gmail.com> wrote:

>>Hi folks,

>>I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
>>classifier written in Python. Bishop-0.3.0 is available

> Could this be combined with http://rubyforge.org/projects/classifier/ ?

+1 on this question/suggestion.
  There may be reasons to have two different libraries, but IMVHO it
would be better to have one slightly bigger library sharing APIs,
services and keeping the useful differences.

 
Jaypee  
Newsgroups: comp.lang.ruby
From: Jaypee <rf.ooda...@sd.eepyaj>
Date: Tue, 19 Apr 2005 00:46:47 +0200
Local: Mon, Apr 18 2005 6:46 pm
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Matt Mower wrote:
> Hi folks,

> I've recently released a Ruby port "Bishop" of the "Reverend" bayesian
> classifier written in Python. Bishop-0.3.0 is available as a Gem and
> from RubyForge
...

> Regards,

> Matt

Hello Matt,

Thank you for this useful library.
I am trying to use it to analyse the draft text of the European
constitution (Is it social? liberal? respectful of human rights?). I am
doing this for myself, just out of curiosity; there is no responsibility
or liability involved in the usage of the classifier or in the result.
I'd like to know what the behaviour of the training of a classifier is
when two different sets of words are submitted in two successive "train"
method invocations for a given category. Does the second invocation
reset the training, or does it accumulate the "experience" progressively?

Thanks again ...
Jean-Pierre


 
Matt Mower  
Newsgroups: comp.lang.ruby
From: Matt Mower <matt.mo...@gmail.com>
Date: Tue, 19 Apr 2005 18:08:37 +0900
Local: Tues, Apr 19 2005 5:08 am
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
On 4/16/05, gabriele renzi <surrender...@remove-yahoo.it> wrote:

I thought it was about time I responded to this.

If I had known Lucas was working on his classifier library before I
did the port of Reverend I probably wouldn't have bothered.  However I
have done it and am using it in another project of my own and have had
some ideas about possible future developments.

One example is to build a version which runs directly from a SQL
database (possibly using ActiveRecord).  I'm also interested in new
algorithms and possible improvements to support classifying RSS items
within a tag space.

None of which precludes rolling Bishop and Classifier into one project.

However right now I'd like to keep control of Bishop and not be
constrained from making possibly incompatible changes to the API or
implementation.  Similarly Lucas may have his own plans for how he
wants to see Classifier develop.

I don't see the harm in having two projects, and what I've suggested to
Lucas is that we should compare notes periodically and see if it makes
sense to merge the projects.  I guess also that if a lot of users of the
libraries made a fuss this would affect my opinion.

Regards,

Matt

---
Matt Mower :: http://matt.blogs.it/


 
Matt Mower  
Newsgroups: comp.lang.ruby
From: Matt Mower <matt.mo...@gmail.com>
Date: Tue, 19 Apr 2005 18:48:56 +0900
Local: Tues, Apr 19 2005 5:48 am
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Hi Jean-Pierre,

On 4/18/05, Jaypee <rf.ooda...@sd.eepyaj> wrote:

> Thank you for this useful library.

You're welcome.

> I am trying to use it to analyse the draft text of the European
> constitution (Is it social? liberal? respectful of human rights?)
> [..snip..]
> I'd like to know what the behaviour of the training of a classifier is
> when two different sets of words are submitted in two successive "train"
> method invocations for a given category. Does the second invocation
> reset the training, or does it accumulate the "experience" progressively?

You're right when you say it accumulates.  Further training supplies
more evidence to the classifier about which words are associated with
which categories.  It uses this evidence to work out conditional
probabilities which are then combined to make a guess about the
appropriate category for an item.

There is an #untrain method if you want to remove previously trained
information.
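
The accumulate-then-untrain behaviour described above can be pictured with a toy word-count model. This is a sketch only: `CountingPools`, its tokenizer, and the sample sentences are hypothetical and are not Bishop's implementation.

```ruby
# Hypothetical word-count pools; not Bishop's implementation.
class CountingPools
  def initialize
    @pools = Hash.new { |hash, pool| hash[pool] = Hash.new(0) }
  end

  # Each call adds to the existing counts; nothing is reset.
  def train(pool, text)
    tokens(text).each { |word| @pools[pool][word] += 1 }
  end

  # Subtract the counts a previous train call added, dropping empty entries.
  def untrain(pool, text)
    tokens(text).each do |word|
      next unless @pools[pool].key?(word)

      @pools[pool][word] -= 1
      @pools[pool].delete(word) if @pools[pool][word] <= 0
    end
  end

  def count(pool, word)
    @pools[pool].fetch(word, 0)
  end

  private

  def tokens(text)
    text.downcase.scan(/[a-z']+/)
  end
end

pools = CountingPools.new
pools.train('social', 'workers rights and solidarity')
pools.train('social', 'rights of citizens')
puts pools.count('social', 'rights') # 2: the second call accumulated
pools.untrain('social', 'rights of citizens')
puts pools.count('social', 'rights') # 1: back to the first call's evidence
```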

Regards,

Matt

--
Matt Mower :: http://matt.blogs.it/


 
gabriele renzi  
Newsgroups: comp.lang.ruby
From: gabriele renzi <surrender...@remove-yahoo.it>
Date: Tue, 19 Apr 2005 16:15:27 GMT
Local: Tues, Apr 19 2005 12:15 pm
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Matt Mower wrote:

<snip all>
thanks for taking time to answer, I can understand your reasons and I'm
glad to know there is at least some contact between different hackers on
similar projects, thanks both :)


 
Jaypee  
Newsgroups: comp.lang.ruby
From: Jaypee <rf.ooda...@sd.eepyaj>
Date: Tue, 19 Apr 2005 20:37:07 +0200
Local: Tues, Apr 19 2005 2:37 pm
Subject: Re: [ANN] Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
Matt Mower wrote:

Thank you,
Jean-Pierre

 
Lucas Carlson  
Newsgroups: comp.lang.ruby
From: "Lucas Carlson" <lu...@rufy.com>
Date: 21 Apr 2005 02:17:13 -0700
Local: Thurs, Apr 21 2005 5:17 am
Subject: Re: Bishop 0.3.0 - bayesian classifier for Ruby ported from Python
The subversion trunk of projects/classifier (see
http://rufy.com/svn/classifier/trunk) has the untrain method in it.
This will be released soon under Classifier 1.2.

 