Robustness Scores


Ross

Apr 25, 2019, 9:18:24 PM
to VOT Challenge technical support
Hi, 

The robustness scores reported in the tables of the report generated by the toolkit
are just the weighted average number of failures. However, many papers (and
the official VOT presentation) report scores computed with the exp(-(100 or 30) * failures) formula.
See for example Table 2 in https://arxiv.org/pdf/1812.11703.pdf

My question is: where in the VOT report can we get these numbers, so that we can build
tables similar to the ones in the paper?

thanks,
Ross

Jonathan Tompson

May 11, 2019, 8:14:15 PM
to VOT Challenge technical support
+1

I'd also like some clarification on this point. Can one of the VOT developers please comment?  Why are the metrics reported in papers not consistent with the output of VOT's toolkit?

alan lukežič

May 14, 2019, 3:19:42 AM
to VOT Challenge technical support
Hi,

in the VOT18 Challenge results paper, the raw robustness is reported. You can convert the toolkit output into the paper's scores as follows:
paper's robustness = (pooled robustness / (total number of frames in the dataset = 21356)) * 100
This can be interpreted as the average number of failures per 100 frames. The pooled robustness score in the toolkit report is the total number of failures over the whole dataset.
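A minimal sketch of this conversion in Python (the pooled failure count of 64 is purely illustrative, and the function name is mine, not the toolkit's):

```python
TOTAL_FRAMES = 21356  # total frames in the VOT2018 dataset, as stated above

def failure_rate_per_100(pooled_failures, total_frames=TOTAL_FRAMES):
    """Paper-style robustness: average failures per 100 frames (lower is better)."""
    return 100.0 * pooled_failures / total_frames

# A tracker that failed 64 times over the whole dataset (illustrative number):
print(round(failure_rate_per_100(64), 3))
```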

Best,
Alan


alan lukežič

Sep 18, 2019, 7:11:03 AM
to VOT Challenge technical support
Hi, 

Just to clarify the terminology around the robustness of a tracker: VOT has used these three terms in the papers over the years:
- failures: the total number of failures over the whole dataset (found in the robustness table generated by the VOT toolkit, under the column "pooled") [lower is better]
- failure rate: the total number of failures normalized by the number of sequences or frames. In recent years the VOT papers use this metric in the final results table to show how often a tracker fails. It is calculated as 100 * (total number of failures) / (total number of frames in the dataset) and can be interpreted as the average number of failures per 100 frames. [lower is better]
- robustness: the probability that a tracker will successfully track a video segment S frames long without a failure. This measure is normalized to the interval [0, 1] and is calculated as exp(-S * M), where M = (total number of failures) / (total number of frames in the dataset) is the per-frame failure rate. This measure is used in the A-R plots. [higher is better]
You can find more information about the measures in this paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6836055 
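The three measures above can be sketched in Python as follows (the per-sequence failure counts and the sensitivity S are illustrative, and the function names are mine, not the toolkit's):

```python
import math

def total_failures(per_sequence_failures):
    """Failures: total failure count pooled over the whole dataset (lower is better)."""
    return sum(per_sequence_failures)

def failure_rate(failures, total_frames):
    """Failure rate: average number of failures per 100 frames (lower is better)."""
    return 100.0 * failures / total_frames

def robustness(failures, total_frames, s=100):
    """Robustness: probability of tracking an S-frame segment without a failure,
    exp(-S * per-frame failure rate), in [0, 1] (higher is better)."""
    return math.exp(-s * failures / total_frames)

fails = total_failures([3, 0, 5, 2])  # hypothetical per-sequence failure counts
print(failure_rate(fails, 21356))
print(robustness(fails, 21356, s=100))
```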

Best,
Alan

Ross

Sep 19, 2019, 5:18:08 PM
to VOT Challenge technical support
Thank you for your reply. I understand what the measures mean; however, I do not know how to get the robustness number from the VOT toolkit to report in our paper.
The toolkit reports the weighted mean of the number of failures, and there is no obvious way to convert this number to robustness.

Luka Čehovin Zajc

Sep 20, 2019, 3:10:14 AM
to Ross, VOT Challenge technical support
Hi, the robustness as defined in the TIP paper was primarily considered a visualization aid in the toolkit. As such, it is computed in plot_ar.m around line 102, right before the data is plotted, but you can reuse that code to compute it elsewhere. Unfortunately, it was never needed anywhere else, so we do not have a handy function that computes it.

cheers,



--
Luka Čehovin Zajc
http://luka.tnode.com

Jonathan Tompson

Sep 20, 2019, 5:26:35 PM
to Luka Čehovin Zajc, Ross, VOT Challenge technical support
Thanks for the update Luka.

The thing that puzzles me is that I tried using plot_ar.m to calculate this quantity, and I compared the values that people report in the papers with the line-102 value, using the model output the toolkit provides for existing trackers (SiamRPN in particular). Those numbers aren't close; they're not even in the same range.

Are you sure this is the line the community is using to calculate these quantities? It's very odd that the values reported by the VOT toolkit aren't the metric used to report performance on VOT by the rest of the community.
Jonathan Tompson | Research Scientist | tom...@google.com | 617-308-7164

