Computation of error measures in evaluate.py


Jana

Apr 7, 2016, 5:10:16 AM
to bob-devel
Hello,

I'm trying to compare speaker ID results from bob.spear with some of my earlier results. I would normally take the highest-scoring speaker model for each of my test files (those in for_probes.lst) and check whether its likelihood score is above a threshold. If it is above the threshold and the identity of the test file matches the model, it's a true positive; if the identities don't match, it's a false positive. Similarly, if the score is below the threshold, it's a true or a false negative.

Looking at the code in evaluate.py, it seems that it doesn't look for the highest-scoring model for each test file but always uses the results from all models.

For example, a result file like this gives 25% HTER, even though testFileA scored highest on the correct model:
speaker1 speaker1 testFileA 0.8
speaker2 speaker1 testFileA 0.7
speaker1 speaker2 testFileB 0.1
speaker2 speaker2 testFileB 0.4

If I add more models that testFileA doesn't match (with low scores), the error rate goes down; this file gives an HTER of 12.5%:
speaker1 speaker1 testFileA 0.8
speaker2 speaker1 testFileA 0.7
speaker3 speaker1 testFileA 0.1
speaker4 speaker1 testFileA 0.2
speaker1 speaker2 testFileB 0.1
speaker2 speaker2 testFileB 0.4

Could you please verify whether this is intentional?

Thanks!
Jana

Tiago Freitas Pereira

Apr 7, 2016, 8:43:05 AM
to bob-...@googlegroups.com
Hi Jana,

I'm not sure if I understood your question, but I have the feeling that you are mixing up verification with identification (the closed-set case).

For verification (a 1:1 comparison) you must define a decision threshold.
In bob.spear, the threshold corresponding to the EER is usually used.

If you run the code below (which uses the scores from your first example), you will see that the decision threshold is 0.55. This gives you an HTER of 50% (0.5 in the print):

import bob.measure
import numpy

# genuine and impostor scores from your first example
genuines  = numpy.array([0.8, 0.4])
impostors = numpy.array([0.7, 0.1])

# decision threshold at the equal error rate, then FAR/FRR at that threshold
THRESHOLD = bob.measure.eer_threshold(impostors, genuines)
far, frr  = bob.measure.farfrr(impostors, genuines, THRESHOLD)

HTER = (far + frr) / 2.
print(HTER)


On the other hand, for identification (closed set; 1:N), the procedure to get the accuracy (or the Rank-1 identification rate) is as you described (by taking the highest score). If you want to get this behaviour using the evaluate.py script, set the --rr option.

Sorry if I misunderstood.

Cheers




--
Tiago

Jana

Apr 7, 2016, 10:18:30 AM
to bob-devel
Hi Tiago,

Thanks for your quick reply. I think what I am looking for is a mix of the two error measures you describe. I have a set of known speakers with models, and test speakers which might be out of set. For every test utterance, I want to know the identity if the speaker is in the set, or be told that it is an unknown speaker. So I need the highest-scoring model, but also a threshold, as my test speaker could be out of set.

The HTER as implemented, and related measures, obviously don't work for this scenario. The recognition rate also doesn't work, as it doesn't take out-of-set utterances into account (and therefore doesn't use a threshold either).

I can write my own evaluation procedure if this type of error measure doesn't exist in bob. I had assumed it was fairly standard and didn't check the details of the EER computation properly, so I was surprised; my fault!

Out of interest, do you know of any data sets or publications for my type of scenario, i.e. recognition with out-of-set test utterances?

Thanks again,
Jana

Tiago Freitas Pereira

Apr 7, 2016, 12:11:31 PM
to bob-...@googlegroups.com
Hi Jana,

So you are working with open-set identification.

The evaluate.py script in bob.bio.spear doesn't provide this out of the box, but you can do it with bob.

You can write a script yourself: load the scores with the function `bob.measure.load.cmc_four_column` (http://pythonhosted.org/bob.measure/py_api.html#bob.measure.load.cmc_four_column) and use the function `bob.measure.recognition_rate` (http://pythonhosted.org/bob.measure/py_api.html#bob.measure.recognition_rate), passing a threshold as an argument.
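
Something like this minimal sketch should work ("scores-dev" is just a placeholder file name, and I am assuming that your bob.measure version accepts a threshold argument for recognition_rate; double-check the parameter order in the version you have installed):

import bob.measure
import bob.measure.load

# one (negatives, positives) score pair per probe
cmc_scores = bob.measure.load.cmc_four_column("scores-dev")

# closed-set rank-1 identification rate
print(bob.measure.recognition_rate(cmc_scores))

# the same, but only accepting the top match if its score exceeds a threshold
print(bob.measure.recognition_rate(cmc_scores, threshold=0.5))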

About your second question, what is your scenario?

Cheers






--
Tiago

Manuel Günther

Apr 7, 2016, 12:13:15 PM
to bob-devel
Dear Jana,

in fact, there is no standard measure for the scenario that you describe, which we would call an open set recognition problem. I just talked to a guy from that area, and it seems that people are working on such a measure, but there is no standard for that yet. 
@all: Please correct me if I am wrong with that.

A simple solution would be to add a threshold to the command line options of evaluate.py. After reading all scores, you might filter them by the threshold, before you hand them over to the recognition method. Be aware that this method might fail when you provide empty lists (i.e., when you have filtered out all scores).
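
A rough, hypothetical sketch of that filtering (assuming the (negatives, positives) pairs loaded from the score file are numpy arrays; this is just an illustration, not the actual evaluate.py code):

def filter_by_threshold(cmc_scores, threshold):
    # keep only scores at or above the threshold in each (negatives, positives) pair;
    # note that a pair may become empty, which the recognition method might not handle
    filtered = []
    for negatives, positives in cmc_scores:
        filtered.append((negatives[negatives >= threshold],
                         positives[positives >= threshold]))
    return filtered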

Note also that bob.spear is outdated, and you might want to switch to its successor bob.bio.spear (http://pythonhosted.org/bob.bio.spear), which is now better integrated into the bob.bio framework: http://pythonhosted.org/bob.bio.base
However, an open-set metric is not implemented in the evaluate.py inside bob.bio either. If you want to and feel able to do that, you are welcome to implement your own metric and push it to the bob.bio.base GitHub repository.

Manuel

Tiago Freitas Pereira

Apr 7, 2016, 12:33:35 PM
to bob-...@googlegroups.com
Hi,

Just to complement my answer (I don't know if I was clear), the function `bob.measure.recognition_rate` computes the recognition rate in a closed-set scenario under a decision threshold.

To complement @Manuel's answer: in the NIST reports, for an open-set problem, they compute the TPIR (True Positive Identification Rate) at a certain threshold (usually set to reach a target FPIR, the False Positive Identification Rate).








--
Tiago

Manuel Günther

Apr 7, 2016, 1:18:17 PM
to bob-devel
I have just googled for it (as I will need to implement an open set recognition rate in the near future, too), and I found that "The Handbook of Face Recognition" already had a section on "Open Set Identification" in 2005, which basically thresholds the similarity score.

I will implement that in bob.measure and provide a parameter for evaluate.py. I'll let you know once I am finished.

Manuel

Manuel Günther

Apr 7, 2016, 3:02:07 PM
to bob-devel
I have just implemented a --thresholds parameter in evaluate.py, which will threshold the recognition scores to compute an open set recognition rate (which is called the "Detection and Identification Rate" in "The Handbook of Face Recognition"). Apparently, the threshold was already implemented in bob.measure, so I didn't have to change that package (yet).

However, the book also proposes a plot of the "Detection and Identification Rate" versus the "False Acceptance Rate", i.e., similar to the ROC curve. I will investigate that a bit more, most probably add the calculation to bob.measure, and later add an option to evaluate.py to produce that plot.

A new version of both bob.measure and bob.bio.base will follow once everything is well integrated and tested.

Jana

Apr 8, 2016, 10:26:41 AM
to bob-devel
Hi all,

Thanks for all your replies. The background for my setting is a large archive in which I want to find known speakers. I have models for a limited number of speakers that I want to find, but most of the speakers in the archive are unknown/out of set. Using just the recognition rate on my known speakers gives overly optimistic results, as I have to threshold quite heavily to avoid a large number of false positives from my out-of-set speakers.

So far I've used the F-measure, but computed only on the highest-scoring model for each test utterance, and thresholded to avoid too many false positives. I also found it useful to fix a minimum precision and then see what recall can be achieved; for example, requiring at least 90% precision (i.e. at most 1 in 10 results may be wrong) and then comparing recalls. That is much easier to explain to non-technical people than the F-measure, and it relates more directly to the quality perceived by a user. From what Tiago writes, it seems that's what NIST is doing as well, so I will have a closer look at that. Looking at the evaluation measures from the face recognition community is a good hint too, thanks.
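
In case it is useful, here is a rough numpy sketch of the "recall at a fixed minimum precision" idea (the names are made up for illustration: scores holds the top-model score per test utterance, and labels is a boolean array that is True when that top model is the correct identity):

import numpy

def recall_at_precision(scores, labels, min_precision=0.9):
    # sweep all observed score values as candidate thresholds and return the best
    # recall that still satisfies the minimum precision, together with its threshold
    total_positives = numpy.count_nonzero(labels)
    if total_positives == 0:
        return 0.0, None
    best_recall, best_threshold = 0.0, None
    for threshold in numpy.unique(scores):
        accepted = scores >= threshold
        true_pos = numpy.count_nonzero(labels & accepted)
        false_pos = numpy.count_nonzero(~labels & accepted)
        if true_pos + false_pos == 0:
            continue
        precision = true_pos / float(true_pos + false_pos)
        recall = true_pos / float(total_positives)
        if precision >= min_precision and recall > best_recall:
            best_recall, best_threshold = recall, threshold
    return best_recall, best_threshold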

Currently I have some not very pretty evaluation scripts in Matlab and Python. I'm happy to cross-check against spear, and also happy to help implement it directly if I find the time (which is a bit limited, as this is only a side project for me at the moment).

Thanks again,
Jana


P.S. I am using the newer bob.bio.spear; I was just sloppy writing it down, sorry.





Manuel Günther

Apr 8, 2016, 7:50:13 PM
to bob-devel
I have just implemented the "Detection and Recognition Rate Curve" that was proposed by Jonathan Phillips in the "Handbook of Face Recognition" into bob.measure: https://github.com/bioidiap/bob.measure/tree/DIR
This curve plots the open set recognition rate over a range of false accept rates (false positive rates), so that you can read off the performance of your system based on the (relative) number of falsely accepted speakers. Additionally, this plot can be created for various ranks; by default it uses rank 1.

@Jana: I think this might be better suited for your problem. You might want to check out the "DIR" branch of bob.measure to see if the plot fits your needs.

@Others: Could you have a look into this plot and tell me if you think that it is implemented correctly? The first sanity check (i.e., that the DIR at 100% FAR equals the rank 1 recognition rate) is at least fulfilled. 
If yes, I would merge it to the master branch and push a new version of bob.measure. I would also incorporate the new plot into the evaluate.py of bob.bio.base.

Jana

Apr 12, 2016, 1:03:36 PM
to bob-devel
Thanks Manuel!

Can you give me a hint how best to link to a separately checked-out version of bob.measure? Currently I have the bob.bio.spear package, which has bob.measure as a dependency.

Side question about buildout.cfg: it seems to be missing its gmm egg, which was easy to fix. The version history shows that bob.bio.gmm was removed in the last update; was that on purpose for some reason unclear to me, or an accident?

Jana

Manuel Günther

Apr 12, 2016, 1:30:07 PM
to bob-devel
Dear Jana,

we have taken out the bob.bio.gmm egg as -- theoretically -- you can run speaker verification experiments without using GMM-based algorithms. In fact, usually you use bob.bio.spear and other bob.bio packages in a separate package, as documented here: http://pythonhosted.org/bob.bio.base/installation.html

There is an easy way to check out a package from GitHub and overwrite the ones in the eggs directory using buildout. Simply add mr.developer as an extension and have bob.measure with branch DIR checked out automatically, e.g.:

eggs = ...
       bob.measure

extensions = bob.buildout
             mr.developer
auto-checkout = *
develop = src/bob.measure
          ...
          .

; options for bob.buildout
debug = true
verbose = true
newest = false

[sources]
bob.measure = git https://github.com/bioidiap/bob.measure branch=DIR
...

[scripts]
recipe = bob.buildout:scripts
dependent-scripts = true

I hope this helps.

Jana

Apr 22, 2016, 11:21:13 AM
to bob-devel
Hi Manuel,

Sorry for my late reply, busy with other stuff... I managed to check out the branch, thanks.

I think there might be a bug with passing the threshold value to recognition_rate(). The second parameter is the optional rank, so in src/bob.bio.base/bob/bio/base/script/evaluate.py, calling it as
rr = bob.measure.recognition_rate(cmcs_dev[i], args.thresholds[i])
means the threshold value gets mistaken for the rank, and the threshold stays at its default of None. I didn't immediately see where the proper rank parameter is stored, so I did a quick fix by always supplying the default value of 1 (but of course this will break if someone does want to change the rank):
rr = bob.measure.recognition_rate(cmcs_dev[i], 1, args.thresholds[i])
Like this I could get the threshold to work, and the results on within-set speakers are as expected.

As far as I can see, out-of-set speakers still get ignored; I'm not sure whether this is on purpose.
My toy test file looks like this:

speaker1 speaker1 testFileA 0.8
speaker2 speaker1 testFileA 0.7
speaker1 speaker2 testFileB 0.1
speaker2 speaker2 testFileB 0.4
speaker1 speaker3 testFileC 0.9
speaker2 speaker3 testFileC 0.8

I get a 100% recognition rate for a threshold of 0.3, and 50% for a threshold of 0.5. But the fact that testFileC / speaker3 produces a false positive is not reflected in the results. I definitely need the out-of-set speakers (I know they cause lots of trouble and mess up my results by scoring highly on random models); did you mean to exclude them?
As far as I can see, to include them the file loading routine bob.measure.load.cmc_four_column would need to be adjusted, or maybe even completely rewritten; I'm not sure the current data structure is actually suitable.

You also mentioned that you're planning to work with an open set; is this a publicly available one? It would be great to have a proper test set to compare with others (unfortunately, due to copyright, I can't currently share mine).

Thanks again,
Jana

Manuel Günther

Apr 22, 2016, 11:44:55 AM
to bob-devel
Dear Jana,

first, I guess I wrote the code a bit in a hurry. I will correct the parameters of recognition_rate upstream. I will also add a --rank parameter, which is currently not available (I had an implementation in a branch, but never found the time to merge it).

For the rest, I see what your problem is. Indeed, in the current implementation of the recognition rate, probes with no corresponding gallery entries are ignored (you should get a warning message when you run with the -vv option). I will need to think about a solution. I could simply assign these to a None user, so that I don't need to change the data structure. However, I will need to do more tests in order to see whether this is possible / sufficient.

As I am doing face recognition (and not speaker recognition) in an open set scenario, there is little chance that we can compare our results. We are planning to publish the code open source, but we are far from having anything stable by now.

Thanks for your detailed report.
Manuel

Manuel Günther

Apr 25, 2016, 5:15:46 PM
to bob-devel
I have a short question on how to evaluate the open set recognition rate correctly, i.e., in two cases which were ignored so far:

1. For a given probe sample with only positive scores (all gallery items this probe was tested against came from the same person), I assume that the sample is recognized correctly when the score is above the decision threshold.
2. For a given probe sample with only negative scores (this person is not in the gallery), I will count it as mis-detected only if a score is above the threshold. Otherwise the probe is completely ignored, i.e., it does not count towards the total number of probe samples, i.e., the denominator of the recognition rate:
rr = (number of correctly classified samples) / (number of probe samples)

For 1, I think that should be the obvious solution, though this should not happen in reality.
For 2, I wonder if this is the correct approach, though. With this approach, a recognition rate of 100% is theoretically possible even when there are out-of-gallery probe samples, which seems reasonable to me.
Is there any theoretical background on which I could base this approach?
In the Handbook of Face Recognition (mentioned above), there is never any modification made to the denominator. In fact, there is only a definition for a "detection and identification rate", which cannot reach 100% when there are out-of-gallery probes. However, this seems counter-intuitive to me: when I correctly reject a probe sample as belonging to no known user, why does it still count negatively towards the error rate?

If you guys from Idiap have any more insight into that, could you please enlighten me?

Manuel Günther

Apr 26, 2016, 4:13:08 PM
to bob-devel
Alright, I have implemented a couple of things in the DIR branch of bob.measure.
First, I used the implementation of the open set recognition rate as defined above, which apparently does not correspond to any standard measure. The recognition rate can be used with both closed-set and open-set scores.

Additionally, I also implemented the standard measures: a related pair of measures defined in chapter 14 of the Handbook of Face Recognition.
These rates are the Detection and Identification Rate, which is computed only on the in-set scores, and the False Alarm Rate, which uses the out-of-set scores only.
Both measures work only on open set identification protocols, and for both a threshold must be defined.

Additionally, I have implemented the Detection and Identification Curve (also defined in that book chapter), which plots the Detection and Identification Rate over the False Alarm Rate.
Please note that the order of the parameters might have changed again, in order to be compatible with older releases of bob.measure.
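
To give an idea of the intended usage, here is a sketch based on the DIR branch as it stands right now (the function names and parameter order may still change, and "scores-dev" is only a placeholder):

import bob.measure
import bob.measure.load

cmc_scores = bob.measure.load.cmc_four_column("scores-dev")
tau = 0.5

# rate of in-set probes that score above the threshold and are correctly identified at rank 1
dir_rate = bob.measure.detection_identification_rate(cmc_scores, threshold=tau)

# rate of out-of-set probes whose highest score still exceeds the threshold
far = bob.measure.false_alarm_rate(cmc_scores, threshold=tau)

print(dir_rate, far)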

BTW: I had to read the chapter again to understand that the Detection and Identification Rate is computed only on the in-set scores, and that the denominator is actually the number of probes for which a gallery item exists.
My claim from above,
"In the Handbook of Face Recognition (mentioned above), there is never any modification made to the denominator. In fact, there is only a definition for a 'detection and identification rate', which cannot reach 100% when there are out-of-gallery probes."
is incorrect.

Let me know if you still have issues.
Manuel

Tiago Freitas Pereira

Apr 27, 2016, 5:11:42 AM
to bob-...@googlegroups.com
Hi Manuel,

First, thanks for the handbook reference; it is clearer than the NIST report that I was using as a guide (the IJB-A Face Identification Challenge Performance Report).

I've debugged the DIR branch and it seems correct to me.

Some time ago I pushed 3 open-set score files with 9 probes, in which 7 probes are from identities in the gallery and 2 probes are from identities out of the gallery.
The first one has no classification error (scores-cmc-4col-open-set.txt);
the second one has one classification error (scores-cmc-4col-open-set-one-error.txt); and
the third one has two classification errors, with one error from a probe in the gallery and one out of the gallery (scores-cmc-4col-open-set-two-errors.txt).

Using equation (1) in chapter 14 of the handbook and \tau = 0.5, we have a detection and identification rate of

 - 7/7 for scores-cmc-4col-open-set.txt
 - 6/7 for scores-cmc-4col-open-set-one-error.txt
 - 6/7 for scores-cmc-4col-open-set-two-errors.txt

Using equation (2) in chapter 14 of the handbook and \tau = 0.5, we have a false alarm rate of
 - (1 - 2/2) for scores-cmc-4col-open-set.txt
 - (1 - 2/2) for scores-cmc-4col-open-set-one-error.txt
 - (1 - 1/2) for scores-cmc-4col-open-set-two-errors.txt


Honestly, when I first read the NIST report I misunderstood the False Alarm Rate computation (equation (1) in the NIST report).
Instead of considering only the probes out of the gallery, I was considering all probes in my computation.
After reading the handbook and your code, things became clearer to me.

Regarding your question here:
https://github.com/bioidiap/bob.measure/issues/10#issuecomment-214786866
"you do not specify a threshold, but you still assume that all probes are classified correctly (which is not true)".
The test that I made is correct, because bob.measure.load.cmc_four_column will load only the scores that have positive probes. For this particular test, setting \tau to 0.5 will output the same recognition rate as using no \tau.

Just to be clear, what I implemented there (https://github.com/bioidiap/bob.measure/blob/master/bob/measure/__init__.py#L82) is the closed-set identification rate under a threshold.
According to the handbook (chapter 14, section 1.3) such a measure doesn't exist, because the threshold (\tau) is always "set" to -inf.

Thanks




--
Tiago

Manuel Günther

Apr 27, 2016, 12:09:44 PM
to bob-devel
Dear Tiago,

indeed, the recognition rate under a threshold is simply a closed-set recognition rate with a threshold, which does not meet any standards. Also, as pointed out above and as clearly documented, the new implementation of the recognition rate in the DIR branch is not anything standard -- at least not under a threshold or with open-set scores. I tried to implement what I thought might be interesting. Maybe we can try to push a paper with this new measure :-)

The recognition rate is only valid for closed-set scores with no threshold. I am not even sure whether the Handbook of Face Recognition uses this term at all.
I thought that, in the master branch, you were implementing another measure -- the detection_identification_rate in the DIR branch -- as an open set adaptation of the recognition rate. I thought that because the name of your test indicated open set, and it turned out to be that when using the new score IO; see below.

Also, as you pointed out, negative scores with no corresponding positive scores are not read by the cmc_four_column function. However, as the original post from Jana requested to change that, I have implemented this in the DIR branch, where now all scores for all probes are read and returned. When only positive or negative scores exist for a given probe, the other element of the pair is simply None (an empty array should also work). In this case, your test for the open set recognition rate failed -- as it was assuming something different. Now, any of the new measures can use this fact and sort out the score pairs that they need:
* only closed-set scores for the detection_identification_rate
* only open-set scores for the false_alarm_rate
* both types for the open set recognition rate under a threshold (which is the non-standard method)

I have seen your test cases, and I have adapted them to work with the new measures:
1) equation (1) -- the detection_identification_rate with threshold 0.5 (there is no case with no threshold any more)
* 7/7 for scores-cmc-4col-open-set.txt
* 6/7 for scores-cmc-4col-open-set-one-error.txt
* 6/7 for scores-cmc-4col-open-set-two-errors.txt

2) equation (2) -- the false_alarm_rate with threshold 0.5 (this is an error rate, lower values are better):
* 0/2 for scores-cmc-4col-open-set.txt
* 0/2 for scores-cmc-4col-open-set-one-error.txt
* 1/2 for scores-cmc-4col-open-set-two-errors.txt

3) no equation, just my random implementation of the open set recognition_rate under threshold 0.5:
* 7/7 for scores-cmc-4col-open-set.txt
* 6/7 for scores-cmc-4col-open-set-one-error.txt (all open set scores filtered by the threshold)
* 6/8 for scores-cmc-4col-open-set-two-errors.txt (one open-set score was not filtered by the threshold)

4) and without a threshold (all open set scores count as mis-recognized):
* 7/9 for scores-cmc-4col-open-set.txt
* 6/9 for scores-cmc-4col-open-set-one-error.txt
* 6/9 for scores-cmc-4col-open-set-two-errors.txt

@all: For me this seems reasonable, but I can understand if people don't like the denominator changing in 3). Let me know your opinion. The Travis builds for this branch are green, so I could merge it into the master branch if there are no objections.

Tiago Freitas Pereira

Apr 28, 2016, 10:26:08 AM
to bob-...@googlegroups.com
Hi Manuel,

I'm a bit confused.

In 3) (third bullet point), I understand that the denominator is 8 because of this line, `3 5 probe_9 10`, in scores-cmc-4col-open-set-two-errors.txt.
Basically we have one out-of-gallery probe with a score higher than the threshold.
I guess this is a bit confusing, because this measure is kind of a blend between the recognition rate and the false alarm rate.
So what exactly does this recognition rate tell us?


In 4) I understand that it will never be possible to reach 9/9.
So I guess this one doesn't make much sense; what do you think?


Thanks and cheers






--
Tiago

Manuel Günther

Apr 28, 2016, 11:44:24 AM
to bob-devel
Tiago,

it is true that the metric is a bit confusing. In fact, the "open set recognition rate under a threshold" (I would call it that) is a mix between the "detection and identification rate" and the "false alarm rate". There is no formal definition of the open set recognition rate (under a threshold), so I was trying to come up with one.
I agree that it might be a bit confusing and I can see many reasons why people wouldn't like it. The recognition rate tells us: "How many of the probes that are above the threshold are correctly identified?"

Indeed, in 4) there is absolutely no way to reach a 100% recognition rate. As there is no threshold defined, all probes are considered to be above the threshold, and thus so are the open set scores. Hence, they always get misclassified.

Maybe we should remove this implementation (it is already marked as non-standard), only allow recognition rates to be computed for closed-set scenarios (without a threshold), and leave all open-set recognition experiments to DIR and FAR.
This is already true for the CMC curve, which requires closed-set scores, while the detection and identification curve plots only open set scores.

What do you think?

Tiago Freitas Pereira

Apr 28, 2016, 11:49:38 AM
to bob-...@googlegroups.com
Manuel, for me it is perfect.

@all, any objection?

Cheers




--
Tiago