I’m implementing the “sensitivity at 5% positive rate” metric for our submission. I’d like to confirm exactly how your compute_challenge_score(labels, outputs, max_fraction_positive=0.05) function is used during scoring:
Does it internally choose the decision threshold on the held-out (test) set so that no more than 5% of records are classified as positive?
Or does it simply compute sensitivity at a fixed threshold that participants must choose beforehand (e.g., on a validation set during training) and supply to the scoring pipeline?
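To make the two readings concrete, here is a minimal Python sketch of each. Both helpers, sensitivity_top_fraction and sensitivity_fixed_threshold, are hypothetical illustrations I wrote for this question, not the challenge’s compute_challenge_score; the floor rounding of the 5% budget and the tie-breaking rule are my own assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sensitivity_top_fraction(labels: Sequence[int], outputs: Sequence[float],
                             max_fraction_positive: float = 0.05) -> float:
    """Reading 1 (hypothetical): the scorer sets the threshold on the test set.

    Records are ranked by output, the top floor(max_fraction_positive * n)
    are called positive, and sensitivity = TP / (number of positive labels).
    Ties keep their original order because Python's sort is stable.
    """
    num_positive = sum(labels)
    if num_positive == 0:
        return float("nan")  # sensitivity is undefined without positives
    k = math.floor(max_fraction_positive * len(labels))  # assumed rounding
    ranked = sorted(range(len(labels)), key=lambda i: outputs[i], reverse=True)
    true_positives = sum(labels[i] for i in ranked[:k])
    return true_positives / num_positive


def sensitivity_fixed_threshold(labels: Sequence[int], outputs: Sequence[float],
                                threshold: float) -> Tuple[float, float]:
    """Reading 2 (hypothetical): participants fix the threshold in advance.

    Returns (sensitivity, observed positive rate); nothing here forces the
    positive rate on the test set to stay at or below 5%.
    """
    predicted = [1 if o >= threshold else 0 for o in outputs]
    num_positive = sum(labels)
    positive_rate = sum(predicted) / len(labels)
    if num_positive == 0:
        return float("nan"), positive_rate
    true_positives = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    return true_positives / num_positive, positive_rate


if __name__ == "__main__":
    # Toy data: 40 records, 4 positives, one confidently scored negative.
    labels: List[int] = [1, 1, 1, 1] + [0] * 36
    outputs: List[float] = [0.95, 0.60, 0.30, 0.10] + [0.90] + [0.20] * 35
    print(sensitivity_top_fraction(labels, outputs))          # 0.25
    print(sensitivity_fixed_threshold(labels, outputs, 0.5))  # (0.5, 0.075)
```

On this toy data the two readings disagree: ranking on the test set flags exactly 2 of 40 records, while a threshold of 0.5 fixed in advance flags 3 of 40 (7.5%), overshooting the 5% limit. That difference is why I’d like to know which one the scorer does.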
Thank you in advance for clarifying whether threshold selection happens inside the challenge scorer or must be fixed in our submitted code.
Best,
Aditya