Naive Bayesian Classifier for Golang


Jake

Nov 23, 2011, 12:23:16 AM
to golang-nuts
Comments welcome!

https://github.com/jbrukh/bayesian

Thanks,
Jake.

Dmitry Chestnykh

Nov 23, 2011, 6:19:56 AM
to golan...@googlegroups.com
Hi Jake,

Nice!

Style suggestion: please don't use "this" to name struct references. This makes code less readable (and unidiomatic). For example, I first looked at Score function, where you have:

this.classes

then I looked at getWordProb function where you have:

this.freq

So, naturally, I assumed that "this" here and there are the same object. It turned out that was not the case:

func (this *classData) getWordProb(...
func (this *Classifier) Score(...

Suggested names:

func (d *classData) getWordProb(...
func (cls *Classifier) Score(...

Or something like this.

On the code: I'm not totally sure, but I think there's a float underflow hiding somewhere.

-Dmitry
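To illustrate the kind of underflow suspected here: multiplying many small per-word probabilities together can drive the product to exactly zero, at which point every class scores the same. This is only a sketch. The function names and the numbers (0.01, 30, 200) are made up for illustration and are not taken from the bayesian package.

```go
package main

import "fmt"

// productFloat32 multiplies p into a running float32 product n times.
// (Hypothetical helper, for illustration only.)
func productFloat32(p float32, n int) float32 {
	result := float32(1)
	for i := 0; i < n; i++ {
		result *= p
	}
	return result
}

// productFloat64 does the same with float64.
func productFloat64(p float64, n int) float64 {
	result := 1.0
	for i := 0; i < n; i++ {
		result *= p
	}
	return result
}

func main() {
	// 0.01^30 is about 1e-60, far below what float32 can represent,
	// so the float32 product collapses to 0.
	fmt.Println(productFloat32(0.01, 30) == 0) // true
	// float64 still holds 1e-60 comfortably...
	fmt.Println(productFloat64(0.01, 30) == 0) // false
	// ...but 0.01^200 is about 1e-400, which underflows float64 too.
	fmt.Println(productFloat64(0.01, 200) == 0) // true
}
```

A wider float type moves the limit further out but does not remove it: a long enough document still reaches zero.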

John Asmuth

Nov 23, 2011, 8:39:03 AM
to golan...@googlegroups.com
On Wednesday, November 23, 2011 6:19:56 AM UTC-5, Dmitry Chestnykh wrote:
> Style suggestion: please don't use "this" to name struct references. This makes code less readable (and unidiomatic).

I disagree. To me, "this" clearly refers to whatever the current receiver is. I actually use "me" in recent code.

roger peppe

Nov 23, 2011, 8:52:45 AM
to golan...@googlegroups.com

you might use that, but it's certainly not idiomatic go.

i like the fact that if one is using a normal identifier, refactoring
between function and method is trivial.

func X(me *Foo)

doesn't look so good.

i wish it was possible to use gofmt to rename single letter variables.

Jan Mercl

Nov 23, 2011, 8:54:36 AM
to golan...@googlegroups.com
Recently switched to Visual Basic? ;-)

Rick

Nov 23, 2011, 9:52:22 AM
to golang-nuts
Hi Dmitry,
I'm not sure if short variable names (shrtVarNms) are considered
idiomatic Go, but for the sake of a few characters, I much prefer

func (data *classData) getWordProb(...

func (classifier *Classifier) Score(...

- Rick (not Rck ;-)

John Asmuth

Nov 23, 2011, 11:22:11 AM
to golan...@googlegroups.com


On Wednesday, November 23, 2011 8:52:45 AM UTC-5, rog wrote:
> you might use that, but it's certainly not idiomatic go.

Where can I find this description of idiomatic go, anyway?

I'd say that methodology rather than choice of identifiers is what should or should not fall under "idiomatic go".

> i like the fact that if one is using a normal identifier, refactoring
> between function and method is trivial.
>
> func X(me *Foo)
>
> doesn't look so good.

This seems irrelevant.

John Asmuth

Nov 23, 2011, 11:22:43 AM
to golan...@googlegroups.com
I've never even seen visual basic code. I guess "me" is used there?

Jake

Nov 23, 2011, 11:41:07 AM
to golang-nuts
Thanks for your comments. In terms of style, I can definitely see both
sides of the argument -- to me, using "this" or using short
abbreviated names like "d" each are confusing in their own way. :) In
terms of functionality, do you recommend going to float64, then?

Thanks,
Jake

John Asmuth

Nov 23, 2011, 11:46:12 AM
to golan...@googlegroups.com
On Wednesday, November 23, 2011 11:41:07 AM UTC-5, Jake wrote:
> In terms of functionality, do you recommend going to float64, then?

Definitely.

roger peppe

Nov 23, 2011, 11:55:19 AM
to golan...@googlegroups.com
On 23 November 2011 16:22, John Asmuth <jas...@gmail.com> wrote:
> On Wednesday, November 23, 2011 8:52:45 AM UTC-5, rog wrote:
>>
>> you might use that, but it's certainly not idiomatic go.
>
> Where can I find this description of idiomatic go, anyway?
> I'd say that methodology rather than choice of identifiers is what should or
> should not fall under "idiomatic go".

the Go libraries are the best expression of idiomatic Go that i know.
you won't find a single instance of "me" or "this" as a receiver
identifier there. that's a pretty good hint IMHO. of course, it *is*
a fairly minor point, but consistency of style is useful.

in Go, the receiver is just a normal variable like any other - why
should it be named differently because it's an implicit
argument?

>> i like the fact that if one is using a normal identifier, refactoring
>> between function and method is trivial.
>>
>> func X(me *Foo)
>>
>> doesn't look so good.
>
> This seems irrelevant.

it's relevant when refactoring

func (f *Foo) X()
to
func X(f *Foo)

or

func (f *Foo) DoWithBar(b *Bar)
to
func (b *Bar) DoWithFoo(f *Foo)
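The refactoring point can be sketched in runnable form. Foo, its field n, and the Double/DoubleFunc names below are hypothetical stand-ins. With an ordinary receiver name, moving between method and function changes only the signature, never the body.

```go
package main

import "fmt"

// Foo is a hypothetical type standing in for the Foo in the message above.
type Foo struct{ n int }

// Double is written as a method with a short, ordinary receiver name.
func (f *Foo) Double() int { return f.n * 2 }

// DoubleFunc is the same body moved into a plain function: the receiver
// becomes the first parameter and the body needs no renaming.
func DoubleFunc(f *Foo) int { return f.n * 2 }

func main() {
	f := &Foo{n: 21}
	fmt.Println(f.Double())   // method call
	fmt.Println(DoubleFunc(f)) // function call, identical body

	// A method expression makes the same point from the language side:
	// (*Foo).Double is a function whose first argument is the receiver.
	asFunc := (*Foo).Double
	fmt.Println(asFunc(f))
}
```

Had the receiver been called "this" or "me", the function version would either keep an odd parameter name or require renaming every use in the body.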

Jonathan Amsterdam

Nov 23, 2011, 5:19:48 PM
to golang-nuts
You can rewrite this:

_, present := data.freqs[word]
if !present {
    data.freqs[word] = 1
} else {
    data.freqs[word]++
}

as this:

data.freqs[word]++
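The shortcut works because indexing a Go map with a missing key yields the value type's zero value (0 for an int), so the increment starts from zero. Here is a small sketch comparing both forms. The countLong and countShort helpers are hypothetical and only mirror the snippet above.

```go
package main

import "fmt"

// countLong uses the explicit presence check from the original code.
func countLong(words []string) map[string]int {
	freqs := make(map[string]int)
	for _, word := range words {
		if _, present := freqs[word]; !present {
			freqs[word] = 1
		} else {
			freqs[word]++
		}
	}
	return freqs
}

// countShort relies on a missing key reading as the zero value 0,
// so the first increment stores 1.
func countShort(words []string) map[string]int {
	freqs := make(map[string]int)
	for _, word := range words {
		freqs[word]++
	}
	return freqs
}

func main() {
	words := []string{"spam", "ham", "spam"}
	long, short := countLong(words), countShort(words)
	fmt.Println(long["spam"], short["spam"]) // both counts agree
	fmt.Println(long["ham"], short["ham"])
}
```

The map itself must still be initialized (for example with make), since writing to a nil map panics.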

Jake

Nov 23, 2011, 10:48:02 PM
to golang-nuts
Great point, thanks!

Dmitry Chestnykh

Nov 24, 2011, 6:50:50 AM
to golan...@googlegroups.com
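On Wednesday, November 23, 2011 5:41:07 PM UTC+1, Jake wrote:
> In terms of functionality, do you recommend going to float64, then?

This would help a bit until you underflow float64 :-)

Here are some resources:

1. https://en.wikipedia.org/wiki/Bayesian_spam_filtering#Other_expressio...
2. http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-cla...
3. http://stackoverflow.com/questions/2691021/problem-with-precision-flo...

-Dmitry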

Jake

Nov 24, 2011, 12:22:35 PM
to golang-nuts
Hey Dmitry, thanks for the resources, I'll look into implementing one
of these methods.


On Nov 24, 6:50 am, Dmitry Chestnykh <dch...@gmail.com> wrote:
> On Wednesday, November 23, 2011 5:41:07 PM UTC+1, Jake wrote:
>
> > In terms of functionality, do you recommend going to float64, then?
>
> This would help a bit until you underflow float64 :-)
>
> Here are some resources:
>
> 1. https://en.wikipedia.org/wiki/Bayesian_spam_filtering#Other_expressio...
> 2. http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-cla...
> 3. http://stackoverflow.com/questions/2691021/problem-with-precision-flo...
>
> -Dmitry
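One common remedy for the underflow discussed above is to add logarithms of probabilities instead of multiplying the probabilities themselves. The sketch below is a minimal illustration of that idea, not the bayesian package's code. logScore and its parameters are hypothetical, and every probability is assumed to be strictly positive (for example after smoothing), since math.Log(0) is -Inf.

```go
package main

import (
	"fmt"
	"math"
)

// logScore returns log(prior) + sum of log(p) over the per-word
// probabilities. Comparing these sums across classes gives the same
// ranking as comparing the raw products, without the underflow.
// (Hypothetical helper; assumes prior > 0 and every p > 0.)
func logScore(prior float64, wordProbs []float64) float64 {
	score := math.Log(prior)
	for _, p := range wordProbs {
		score += math.Log(p)
	}
	return score
}

func main() {
	// 200 words at probability 0.01 each: the raw product would be
	// about 1e-400, which is 0 in float64.
	unlikely := make([]float64, 200)
	likely := make([]float64, 200)
	for i := range unlikely {
		unlikely[i] = 0.01
		likely[i] = 0.02
	}

	a := logScore(0.5, unlikely) // a finite value near -921.7
	b := logScore(0.5, likely)
	fmt.Println(a, b)
	fmt.Println(b > a) // the classes can still be ranked
}
```

Because the logarithm is monotonic, picking the class with the largest log score selects the same class as picking the largest product would, had the product been representable.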

Johan Liebert

Jan 9, 2023, 2:56:26 PM
to golang-nuts

Naive Bayes classifiers are a family of classification algorithms based on Bayes' Theorem, rather than a single algorithm. They all share one assumption: the features being classified are treated as independent of each other, given the class.
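Written out, that independence assumption lets the class posterior factor into per-feature terms. The symbols are chosen here for illustration: c is a class and w_1, ..., w_n are the features (for text, the words) of the item being classified.

```latex
P(c \mid w_1, \dots, w_n) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c)
\qquad\Longrightarrow\qquad
\hat{c} \;=\; \arg\max_{c} \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big]
```

The second form takes logarithms of the first, which leaves the winning class unchanged and avoids multiplying many small numbers.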