I was wondering which metric is used to evaluate the results. I was hoping to do this without the A.jar evaluation application and compute the score myself.
But which metric is it? Is it simply the plain average of the per-class F1 scores, or does A.jar compute micro-, macro-, or weighted-averaged F1 for the final evaluation?
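For reference, this is roughly how I would compute the candidates myself. It is only a sketch, assuming single-label classification (exactly one gold and one predicted label per item). The function name `f1_scores` is my own, and I don't know whether A.jar computes the score this way. Note that the "plain average of per-class F1" is what is usually called macro-F1, and in the single-label case micro-F1 equals accuracy.

```python
from collections import Counter

def f1_scores(gold, pred):
    """Hypothetical helper: per-class F1 plus micro-, macro- and
    support-weighted F1 for single-label classification."""
    # Use every label seen in either gold or predictions.
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # wrongly predicted class p
            fn[g] += 1          # missed class g

    def f1(t, f_p, f_n):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there is nothing to count.
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(gold)     # number of gold items per class

    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: per-class F1 weighted by gold support.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(gold)
    # Micro: pool TP/FP/FN over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, micro, macro, weighted

gold = ["a", "a", "a", "b", "c"]
pred = ["a", "a", "b", "b", "c"]
per_class, micro, macro, weighted = f1_scores(gold, pred)
print(per_class, micro, macro, weighted)
```

On this toy example micro-F1 is 0.8 (the same as accuracy), macro-F1 is about 0.822 and weighted-F1 about 0.813, so the choice of averaging does change the number.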
I could not find anything about this in the papers or in the provided material.
Thanks in advance!
BR; Stefan