Hi everyone,
I am a Bachelor’s student working on Task 2 of the BioDCASE 2026 Challenge as part of a university course. I’ve successfully set up the baseline repository and run the full training/prediction pipeline, but I am seeing a discrepancy in the results and wanted to check if I might have missed a specific configuration.
When evaluating on Blue Whale calls (bmabz), my Recall matches the official baseline almost perfectly (~0.093). However, my Precision is significantly lower (~0.047 compared to the reported ~0.89) due to a very high number of False Positives.
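To quantify the gap: since Recall matches, the true-positive count should be roughly the same in both runs, so the precision difference directly implies how many extra false positives my run produces. A quick back-of-the-envelope check (the TP count of 100 below is purely hypothetical, just to show the ratio):

```python
# Sanity check: with recall (and thus the TP count) matching the baseline,
# the precision gap implies a large difference in false-positive counts.
def implied_fp(tp: int, precision: float) -> float:
    """False positives implied by a TP count and a precision value:
    precision = TP / (TP + FP)  =>  FP = TP * (1/precision - 1)."""
    return tp * (1.0 / precision - 1.0)

tp = 100  # hypothetical TP count, identical in both runs since recall matches
print(implied_fp(tp, 0.89))   # baseline precision: ~12 false positives
print(implied_fp(tp, 0.047))  # my run: ~2028 false positives
```

So for the same number of correct detections, my run seems to emit well over a hundred times as many false positives as the reported baseline, which is why I suspect a configuration difference rather than noise.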
I used the default settings and scripts provided in the repository. Since I couldn't find specific random seeds or hyperparameter files for the training phase, I was wondering:
1. Are there specific "gold" hyperparameters or seeds that were used for the reported Table 2 results?
2. Has anyone else encountered this high False Positive rate when running the YOLO baseline out-of-the-box?
Being new to this challenge, I want to ensure I haven't overlooked a simple configuration step in the preprocessing or prediction scripts.
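One thing I plan to check is the detection confidence threshold, since a threshold left very low would produce exactly this pattern (unchanged recall on confident calls, but a flood of false positives). This is only a sketch of the idea, not the repository's actual API, and the flag name and default value are my assumptions:

```python
# Sketch (not the baseline repo's actual API): post-hoc filtering of
# YOLO-style detections by confidence. A threshold left near zero keeps
# low-confidence detections and inflates the false-positive count.
from typing import NamedTuple

class Detection(NamedTuple):
    label: str        # e.g. "bmabz"
    confidence: float # model score in [0, 1]

def filter_by_confidence(dets: list[Detection], conf_thresh: float = 0.25) -> list[Detection]:
    """Keep only detections at or above the confidence threshold.
    0.25 is an assumed default; the repo's setting may differ."""
    return [d for d in dets if d.confidence >= conf_thresh]

dets = [Detection("bmabz", 0.91), Detection("bmabz", 0.08), Detection("bmabz", 0.30)]
print(filter_by_confidence(dets, 0.25))  # the 0.08 detection is dropped
```

If the reported Table 2 results were produced with a specific threshold, knowing that value would help a lot.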
Thank you for your help!
Best regards,
Matthias Nagl