Hi all,
I’m currently working on a project using CoralNet to train a machine learning model that annotates species within Monterey’s rocky intertidal community. I have a question about how the training process handles differences in annotation counts across species.
As I understand it, CoralNet randomly splits the annotations, using about 7/8 for training and 1/8 for testing. The issue I’m running into is that some species (e.g., seagrass) have far more annotations than others (e.g., barnacles). I’m wondering:
Is there a way to adjust the training to account for this imbalance in annotations?
Does CoralNet provide source code or customization options that would let me modify how the training subset is selected?
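To make the question concrete, here is a rough Python sketch of the kind of adjustment I have in mind. It is not CoralNet's code: the function names (stratified_split, inverse_frequency_weights) are hypothetical, and the annotation counts are made up for illustration. It shows two common ways of handling imbalance: splitting 7/8 to 1/8 within each species rather than across the whole pool, and weighting each species inversely to its annotation count.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=7 / 8, seed=0):
    """Split annotation indices so each label keeps roughly train_fraction in training."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Hold out at least one annotation per label, unless the label has only one.
        if len(indices) > 1:
            n_test = max(1, round(len(indices) * (1 - train_fraction)))
        else:
            n_test = 0
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def inverse_frequency_weights(labels):
    """Weight each label by n / (k * count), so rare labels count for more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Made-up example: 80 seagrass annotations and 8 barnacle annotations.
labels = ["seagrass"] * 80 + ["barnacle"] * 8
train_idx, test_idx = stratified_split(labels)
weights = inverse_frequency_weights(labels)
print(len(train_idx), len(test_idx))
print(weights)
```

With a split like this, barnacles would always show up in both the training and test sets, and the weights could be passed to a classifier's loss if the training code allows it. I don't know whether CoralNet exposes either of these, which is really what I'm asking.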
Thanks in advance for any guidance!
Gabe