Hi,
it seems that the format is [ <phone_index> <log_posterior> ].
If the scores were well calibrated, the default threshold p = 0.5 would correspond to a log-posterior of ln(0.5) ≈ -0.693 (phones scoring higher would be accepted as correct, those scoring lower rejected as incorrect).
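A minimal sketch of reading such entries and applying the default threshold. It assumes each entry appears in the text literally as `[ <phone_index> <log_posterior> ]`, possibly several per line, with natural-log posteriors. The regex, the function names `parse_entries`/`classify` and the sample line are hypothetical illustrations, not part of any toolkit.

```python
import math
import re

# Assumption: entries look like "[ <phone_index> <log_posterior> ]", possibly
# several per line; the surrounding file layout is not known.
ENTRY = re.compile(r"\[\s*(\d+)\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*\]")

# Natural log of the default probability threshold p = 0.5 (about -0.693).
LOG_THRESHOLD = math.log(0.5)


def parse_entries(text):
    """Return (phone_index, log_posterior) pairs found in a line of text."""
    return [(int(idx), float(score)) for idx, score in ENTRY.findall(text)]


def classify(entries, log_threshold=LOG_THRESHOLD):
    """Mark a phone as correct (True) if its log-posterior is >= the threshold."""
    return [(idx, score >= log_threshold) for idx, score in entries]


if __name__ == "__main__":
    line = "[ 17 -0.05 ] [ 42 -1.20 ] [ 8 -0.69 ]"  # hypothetical example input
    for idx, is_correct in classify(parse_entries(line)):
        print(idx, "correct" if is_correct else "incorrect")
```

Comparing in the log domain avoids exponentiating every score; `math.exp(score) >= 0.5` would give the same decision.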
In practice, you may need to calibrate these scores further by passing them through a logistic regression trained on labelled in-domain data.
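A minimal sketch of such a calibration, assuming a one-feature logistic regression p(correct | s) = sigmoid(a·s + b) fitted by plain batch gradient descent on log-loss. In practice a library implementation such as scikit-learn's `LogisticRegression` would do the same job. The function names, learning rate, epoch count and the toy labelled data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_calibration(scores, labels, lr=0.1, epochs=2000):
    """Fit p(correct | s) = sigmoid(a * s + b) by batch gradient descent on log-loss.

    scores: raw log-posteriors; labels: 1 for a correct phone, 0 for incorrect.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log-loss w.r.t. a*s + b
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_prob(score, a, b):
    """Map a raw log-posterior to a calibrated probability of being correct."""
    return sigmoid(a * score + b)


if __name__ == "__main__":
    # Hypothetical labelled in-domain data: log-posteriors and 1 = correct / 0 = incorrect.
    scores = [-0.1, -0.2, -0.3, -0.5, -1.0, -0.8, -1.5, -2.0, -0.4, -1.2]
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    a, b = fit_calibration(scores, labels)
    # Raw score at which the calibrated probability equals 0.5.
    print("a=%.3f b=%.3f threshold=%.3f" % (a, b, -b / a))
```

After fitting, the p = 0.5 decision boundary in the raw score domain is simply s = -b / a.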
Or, at the very least, you can plot histograms of the scores for correct and incorrect phones and set the threshold based on them.
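A minimal sketch of the histogram approach, assuming you already have the scores split into hand-labelled correct and incorrect phones. It prints text histograms and, as one possible rule (an assumption; picking the crossing point of the two histograms or an equal-error point are alternatives), chooses the observed score that minimizes the total number of misclassified phones. Bin ranges, function names and the toy data are hypothetical.

```python
def histogram(scores, lo=-5.0, hi=0.0, n_bins=10):
    """Count scores in n_bins equal-width bins over [lo, hi]; outliers go to the end bins."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in scores:
        i = int((s - lo) / width)
        counts[min(max(i, 0), n_bins - 1)] += 1
    return counts


def print_histograms(correct, incorrect, lo=-5.0, hi=0.0, n_bins=10):
    """Print the two histograms side by side as text bars."""
    width = (hi - lo) / n_bins
    for i, (c, w) in enumerate(zip(histogram(correct, lo, hi, n_bins),
                                   histogram(incorrect, lo, hi, n_bins))):
        left = lo + i * width
        print("%6.2f  correct %-12s incorrect %s" % (left, "#" * c, "#" * w))


def best_threshold(correct, incorrect):
    """Pick the observed score t minimizing errors when 'score >= t' means correct."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(correct) | set(incorrect)):
        errors = sum(s < t for s in correct) + sum(s >= t for s in incorrect)
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err


if __name__ == "__main__":
    # Hypothetical scores of phones labelled correct / incorrect by hand.
    correct = [-0.1, -0.2, -0.3, -0.5, -1.0]
    incorrect = [-0.8, -1.5, -2.0, -0.4, -1.2]
    print_histograms(correct, incorrect, lo=-2.5, hi=0.0, n_bins=5)
    print(best_threshold(correct, incorrect))  # (-1.0, 2) for this toy data
```

If missing a mispronunciation costs more than a false alarm (or vice versa), the two error counts in `best_threshold` can be weighted differently.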
Best,
Karel
On Saturday, May 11, 2024 at 4:01:25 UTC+2, 李俊廷(ヤンヤン) wrote: