Hi everyone,
My team and I have been working through the Task 2 YOLOv11 baseline and wanted to raise a few quick questions regarding the documentation and reproducibility.
1. Website Documentation Clarification

On the task description page, the detailed image breaking down performance across eight datasets (plus a "Final" row) appears to evaluate the training set, not the validation set; Table 2 reports the actual validation-set scores (Recall 0.32, Precision 0.67). It might be helpful to clearly label that image as "Training Set Results" on the website to avoid confusion for newcomers!
2. Missing Reproducibility Parameters

When running our evaluation on the 3 held-out validation datasets, our macro-average is close to Table 2 (Recall ~0.26, Precision ~0.41), but a gap remains. Unlike other DCASE challenges, where the exact baseline training configurations are provided, we noticed the repository is missing the exact hyperparameters used to achieve the published 0.32/0.67 scores.
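For transparency, this is how we compute our macro-average: an unweighted mean of each metric over the three validation datasets. The dataset names and per-dataset scores below are illustrative placeholders, not our actual numbers.

```python
# Sketch of our macro-averaging; per-dataset values are placeholders only.
per_dataset = {
    "dataset_a": {"recall": 0.21, "precision": 0.38},
    "dataset_b": {"recall": 0.30, "precision": 0.45},
    "dataset_c": {"recall": 0.27, "precision": 0.40},
}

# Macro-average: unweighted mean across datasets, regardless of how many
# annotated events each dataset contains.
n = len(per_dataset)
macro_recall = sum(d["recall"] for d in per_dataset.values()) / n
macro_precision = sum(d["precision"] for d in per_dataset.values()) / n
print(round(macro_recall, 2), round(macro_precision, 2))  # 0.26 0.41
```

If the official scores are instead micro-averaged (pooling all events before computing precision/recall), that alone could explain part of the gap, so confirming the averaging scheme would also help.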
Could the organizers share the specific parameters used for the official YOLOv11 baseline? Specifically:
- Number of training epochs
- The exact confidence threshold (conf) and NMS settings used during inference
- Any specific YOLO augmentation modifications
Having these would help us (and other teams) ensure we have perfectly replicated the baseline before building our custom architectures.
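For concreteness, here is the shape of the parameter set we are hoping to see confirmed. Every value below is an assumption on our part (mostly Ultralytics defaults that we fell back to in our reproduction attempt), not the official baseline configuration.

```python
# Placeholder reproduction settings -- every value is an assumption
# (largely Ultralytics defaults), NOT the confirmed Task 2 baseline config.
baseline_config = {
    "epochs": 100,   # training epochs: unknown; 100 is the Ultralytics default
    "conf": 0.25,    # confidence threshold: Ultralytics predict default
    "iou": 0.7,      # NMS IoU threshold: Ultralytics predict default
    "augment_overrides": {  # augmentation knobs that may have been modified
        "mosaic": 1.0,
        "mixup": 0.0,
    },
}

# With the real values filled in, our inference call would look roughly like:
#   from ultralytics import YOLO
#   model = YOLO("best.pt")  # hypothetical checkpoint name
#   results = model.predict(source, conf=baseline_config["conf"],
#                           iou=baseline_config["iou"])
```

Even a dump of the `args.yaml` that Ultralytics writes alongside the training run would resolve all three bullet points at once.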
Best, Matthias