Regarding precision & recall [Classification]


Shubham Khandelwal

Sep 1, 2016, 6:56:15 AM
to fastText library
Hi Everyone,

Thank you for fastText.

I have a doubt; maybe I'm missing something.

The precision (P@1) and recall (R@1) come out the same when running ./classification-results.sh, for every dataset:

./fasttext test model.bin test.txt k


Can you tell me what the significance of 'k' is here? I think precision and recall cannot always be the same.

Thank you.

Shubham Khandelwal

Sep 1, 2016, 8:24:55 AM
to fastText library
Hi,

I understand now why precision and recall are the same: with k = 1 and exactly one true label per example, the number of predicted labels equals the number of true labels, so P@1 and R@1 coincide.
But I still don't understand what "k" means here.

Edouard G.

Sep 5, 2016, 12:21:01 PM
to fastText library
The argument k of "fasttext test" and "fasttext predict" is the number of labels which are predicted by the model for each test example. This option is useful for multilabel settings.
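
(To make the role of k concrete, here is a tiny illustrative sketch in Python; it is not fastText's actual implementation. The model scores every label, and predicting with k simply means keeping the k highest-scoring ones.)

    # Illustrative only: top-k label selection for one test example.
    # 'scores' maps each label to the model's probability for that example.
    def predict_top_k(scores, k):
        # Return the k labels with the highest scores, best first.
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:k]

    scores = {"__label__sport": 0.72, "__label__crime": 0.18, "__label__economy": 0.10}
    print(predict_top_k(scores, 1))  # ['__label__sport']
    print(predict_top_k(scores, 2))  # ['__label__sport', '__label__crime']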

Aziz Alto

Sep 8, 2016, 2:36:17 PM
to Shubham Khandelwal, fastText library
k refers to the k most likely labels for each sample. So, with k = 3 the system will return the top 3 labels (those most likely to be correct) for each sample. Thus, P@3 means the precision score computed over all three returned labels.
Check out a better description on this wiki page.
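
(For completeness, here is a small sketch, in illustrative Python with made-up data rather than fastText's code, of how P@k and R@k are typically computed over a test set. It also shows why P@1 and R@1 coincide when every example carries exactly one true label.)

    # Sketch of precision@k and recall@k over a test set (illustrative only).
    # Each item pairs the true label set with the model's ranked predictions.
    def precision_recall_at_k(examples, k):
        correct = predicted = relevant = 0
        for true_labels, ranked in examples:
            correct += sum(1 for label in ranked[:k] if label in true_labels)
            predicted += min(k, len(ranked))  # labels actually returned
            relevant += len(true_labels)      # labels that should be found
        return correct / predicted, correct / relevant

    # One true label per example: with k = 1, predicted == relevant, so P@1 == R@1.
    examples = [
        ({"__label__sport"},   ["__label__sport", "__label__crime"]),
        ({"__label__economy"}, ["__label__crime", "__label__economy"]),
    ]
    print(precision_recall_at_k(examples, 1))  # (0.5, 0.5) -- identical
    print(precision_recall_at_k(examples, 2))  # (0.5, 1.0) -- they diverge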

--
Aziz


ben...@googlemail.com

Sep 22, 2016, 1:00:24 AM
to fastText library
On Tuesday, September 6, 2016 at 4:21:01 AM UTC+12, Edouard G. wrote:
The argument k of "fasttext test" and "fasttext predict" is the number of labels which are predicted by the model for each test example. This option is useful for multilabel settings.

Is a fixed k really all that useful for multilabel prediction/testing?

I'm looking into using fastText to classify newspaper articles by topic (e.g. "education", "crime", "economy", "sport", etc.).
I've got about 20 labels in total in my training set, and any given article might have anywhere from 0 to 10 relevant labels, say.

So:

    fasttext predict mymodel.bin articles.txt 10



returns 10 labels for _every_ article - even if they don't really deserve that many labels (most articles should really have only 2 or 3 labels at most).
Using "predict-prob" instead gives me probabilities with which I can cull out the obviously-wrong ones, which is fine, albeit a little more cumbersome..

But it seems to me that instead of having a fixed k, it'd be more useful to have a probability threshold instead.

(And forgive me if I'm missing some basic bit of text-classification lore here - I'm new to this stuff :- )

Ben.

Edouard G.

Sep 23, 2016, 11:22:09 AM
to fastText library, ben...@googlemail.com
Hi,

You can also use the command

./fasttext predict-prob mymodel.bin articles.txt 10 

which will also output the score for each label. You can then threshold the predicted labels based on their score (instead of predicting a fixed number of labels).
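
(A rough sketch of that thresholding step in Python, assuming the usual predict-prob output of alternating label/probability pairs on each line; the 0.5 cutoff below is an arbitrary placeholder.)

    # Sketch: keep only labels whose predict-prob score clears a threshold.
    # Assumes each output line looks like: __label__A 0.91 __label__B 0.04 ...
    import subprocess

    THRESHOLD = 0.5  # arbitrary cutoff; tune on held-out data

    proc = subprocess.run(
        ["./fasttext", "predict-prob", "mymodel.bin", "articles.txt", "10"],
        capture_output=True, text=True, check=True,
    )
    for line in proc.stdout.splitlines():
        tokens = line.split()
        pairs = zip(tokens[0::2], map(float, tokens[1::2]))
        print([label for label, prob in pairs if prob >= THRESHOLD])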

Best,
Edouard.

ben...@googlemail.com

Sep 24, 2016, 3:55:41 PM
to fastText library, ben...@googlemail.com
On Saturday, September 24, 2016 at 3:22:09 AM UTC+12, Edouard G. wrote:
Hi,

You can also use the command

./fasttext predict-prob mymodel.bin articles.txt 10 

which will also output the score for each label. You can then threshold the predicted labels based on their score (instead of predicting a fixed number of labels).

Yes - that's the way I ended up going.
But I think it still leaves the "test" option mostly useless on multi-label data, right? (or at least data sets which can have a _variable_ number of labels assigned).
I don't have any ideas on how to make "test" work better in such cases, other than filtering the returned labels using some kind of score threshold... which could get pretty fiddly. Endless tuning to find the right cutoff point for particular data sets... ugh
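
(In case it helps, that tuning can at least be mechanized. A sketch with made-up validation data, not a fastText feature: score a validation set once with predict-prob, then sweep cutoffs and keep whichever maximizes micro-F1.)

    # Sketch: pick a probability cutoff by sweeping thresholds on validation data.
    # Each item pairs the true label set with (label, probability) predictions.
    def micro_f1(examples, threshold):
        tp = fp = fn = 0
        for true_labels, scored in examples:
            kept = {label for label, prob in scored if prob >= threshold}
            tp += len(kept & true_labels)
            fp += len(kept - true_labels)
            fn += len(true_labels - kept)
        if tp == 0:
            return 0.0
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    validation = [
        ({"__label__sport"}, [("__label__sport", 0.8), ("__label__crime", 0.3)]),
        ({"__label__crime", "__label__economy"},
         [("__label__crime", 0.6), ("__label__economy", 0.4)]),
    ]
    best = max((t / 20 for t in range(1, 20)), key=lambda t: micro_f1(validation, t))
    print("best threshold:", best)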

Thanks,
Ben.

ber...@spaziodati.eu

Sep 30, 2016, 12:31:09 PM
to fastText library
Hi, I only just read this thread. Yesterday I posted an issue on this topic on GitHub: https://github.com/facebookresearch/fastText/issues/93
I am now computing precision and recall on my own, using only the predict function on the test-set samples.
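
(For anyone doing the same, the computation is short. A sketch in illustrative Python with made-up labels: micro-averaged precision and recall over predicted vs. gold label sets.)

    # Sketch: micro-averaged precision/recall from predicted vs. gold label sets.
    def precision_recall(gold, predicted):
        tp = sum(len(g & p) for g, p in zip(gold, predicted))
        n_pred = sum(len(p) for p in predicted)  # total labels predicted
        n_gold = sum(len(g) for g in gold)       # total true labels
        return tp / n_pred, tp / n_gold

    gold = [{"__label__sport"}, {"__label__crime", "__label__economy"}]
    predicted = [{"__label__sport", "__label__crime"}, {"__label__crime"}]
    print(precision_recall(gold, predicted))  # (0.666..., 0.666...)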

Cheers