A new performance metric for the LVIS dataset


Emre Akbas

Dec 4, 2020, 12:36:14 PM12/4/20
to lvis-d...@googlegroups.com, Kemal Öksüz, Barış Can Çam, Sinan Kalkan

Dear maintainers of the LVIS API,


With this mail, we would like to inform you about our new performance metric, “Localisation Recall Precision (LRP) Error”, which can evaluate different visual detection tasks such as object detection, keypoint detection, instance segmentation and panoptic segmentation.


LRP Error was first published at ECCV 2018 (https://arxiv.org/abs/1807.01696); recently, we released an extended version (“One Metric to Measure Them All: Localisation Recall Precision (LRP) for Evaluating Visual Detection Tasks”, currently under review at TPAMI) with thorough analyses of the drawbacks of Average Precision (AP), including COCO-style AP, and empirical comparisons of LRP and AP on all four aforementioned visual detection tasks. The preprint is available at: https://arxiv.org/abs/2011.10772
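
To make the definition concrete, below is a minimal sketch of the headline LRP formula from the papers above. It is written by us for this mail and assumes detections have already been matched to the ground truth; the matching procedure and the per-component outputs in our released API are more complete:

    # Minimal sketch of the total LRP Error (lower is better, in [0, 1]).
    # `ious`: IoU of each true positive with its matched ground truth;
    # `num_fp`, `num_fn`: false-positive and false-negative counts;
    # `tau`: the IoU threshold used for matching (0.5 by default).
    def lrp_error(ious, num_fp, num_fn, tau=0.5):
        num_tp = len(ious)
        total = num_tp + num_fp + num_fn
        if total == 0:
            return 0.0  # nothing to evaluate
        # Each TP contributes its normalised localisation error in [0, 1];
        # each FP and each FN contributes the maximum error of 1.
        loc_error = sum((1.0 - iou) / (1.0 - tau) for iou in ious)
        return (loc_error + num_fp + num_fn) / total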


We believe that our preprint is highly relevant to you, and we hope that you will consider incorporating LRP Error into the official lvisapi for the following reasons:


  • We present three important features that a performance measure for evaluating an object detector should have (Section 1.1). These features are easy to agree on: completeness (all performance aspects, i.e. the localisation error and the false positive and false negative rates, are considered precisely), interpretability (the resulting value points to the strengths and weaknesses of the detector) and practicality (the measure has no significant usage limitations).

  • Then, we analyse AP in terms of these features and show that AP has none of them (Section 3.2). For example, AP (including COCO-style AP) considers localisation only loosely, and a resulting AP value is difficult to interpret.

  • Moreover, in terms of practicality, AP for rare classes can be very sensitive (up to a 20% superficial relative difference) to how the PR curve is interpolated; a toy illustration follows this list. This can especially affect evaluation results on the LVIS dataset due to its long-tailed nature, with a median of 9 instances per class in the validation set (if requested, we can provide an analysis for LVIS).

  • On the other hand, LRP Error has all of these features: it considers all performance aspects precisely, it is easy to interpret (also through its components), and it has no significant practical drawbacks; for example, its computation is exact.
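
To make the interpolation sensitivity above concrete, here is a toy, self-contained example of our own (not the analysis from the preprint): a hypothetical rare class with only two ground-truth instances, for which the interpolation choice alone moves AP from roughly 0.58 to roughly 0.67, i.e. about a 14% relative difference:

    import numpy as np

    def average_precision(tp_flags, num_gt, interpolate):
        # AP for one class from score-ranked detections (1 = TP, 0 = FP).
        flags = np.asarray(tp_flags)
        tp = np.cumsum(flags)
        fp = np.cumsum(1 - flags)
        precision = tp / (tp + fp)
        recall = tp / num_gt
        if interpolate:
            # Use the monotone precision envelope, as interpolated AP
            # variants do: each precision is replaced by the maximum
            # precision attained at any higher recall.
            precision = np.maximum.accumulate(precision[::-1])[::-1]
        # Step-integrate precision over the recall increments.
        deltas = np.diff(np.concatenate(([0.0], recall)))
        return float(np.sum(precision * deltas))

    # Two ground-truth instances; the top-scoring detection is a FP.
    flags, num_gt = [0, 1, 1], 2
    print(average_precision(flags, num_gt, interpolate=False))  # ~0.583
    print(average_precision(flags, num_gt, interpolate=True))   # ~0.667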


Currently, we host the COCO and LVIS APIs equipped with LRP Error on our project page (https://github.com/kemaloksuz/LRP-Error), where we report LRP Error in addition to the standard AP-based measures for these datasets. Consistent with the official lvisapi, our repository yields the LRP-based metrics alongside the AP-based measures. The LRP output for the example inputs provided by the official lvisapi (i.e. lvis_val_100.json and lvis_results_100.json) can be found at https://drive.google.com/file/d/1zjgych0uL_1zNjk0kqBskUqhTGA7zfVh/view?usp=sharing. For clarity, LRP Error in this example is computed only for the “all” area range with “300” maximum detections, broken down by the number of examples per class (i.e. rare, common and frequent); LRP Error and its components can similarly be obtained for all area criteria in your standard output.
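
For reference, a sketch of how the evaluation can be invoked; this follows the interface of the official lvisapi, and the exact entry points of our LRP-equipped fork may differ slightly:

    from lvis import LVISEval

    # Paths as in the official lvisapi examples.
    lvis_eval = LVISEval("lvis_val_100.json", "lvis_results_100.json",
                         iou_type="bbox")
    lvis_eval.run()            # evaluate, accumulate and summarise
    lvis_eval.print_results()  # AP-based measures; plus LRP in our fork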


In short, as one of the leading datasets driving progress in the field, LVIS could be the pioneering dataset to add support for LRP Error, with its many theoretical and practical benefits. If you agree, we can implement LRP Error following your requirements and submit a pull request to the official lvisapi.


Thank you in advance,

Sincerely yours,


K. Oksuz, B. C. Cam, S. Kalkan, E. Akbas


Ross Girshick

Dec 7, 2020, 2:59:57 PM12/7/20
to LVIS Dataset
Thanks for raising our awareness of your work on LRP and for the pointer to the extended version of your article. We plan to look into it carefully.

Best,
Ross
