Source and Classifier training

Francesco Pelizza

Jan 22, 2025, 10:00:34 AM
to CoralNet Users
Hello,

Thank you for your amazing work and the incredible skills you dedicate to creating CoralNet. My name is Francesco Pelizza, and I am a PhD student in Italy. I am currently exploring CoralNet with the idea of applying it to future monitoring projects in the Maldives.

I have some questions about the functionality of CoralNet. Specifically, I am unclear about how sources, the automated annotation tool, and classifier training work.

Let’s suppose I have created an account on CoralNet. I understand that I need to either create a new source or join an existing one. I noticed that there are roughly 20 sources related to the Maldives. If I join one of these sources, will the other members of that source be able to see my data?

If I prefer to keep my data private, I assume I need to create a new source. My main concern is how long it will take for my source to achieve accurate automated annotation. When a source is created, is any form of automated annotation performed based on a larger dataset that initially trained CoralNet? Or will I need to train the software from scratch? If so, how many photos are required for effective training?

Thank you again for your incredible work!

Stephen Chan

Jan 24, 2025, 4:31:42 PM
to CoralNet Users
Hello, thanks for the kind words!

Data visibility is controlled on a source-by-source basis. So if you joined an existing source and added data to that source, then yes, anyone who can see that source can also see the data you added.
Note that this generally means that you will be in close collaboration with the source owner(s) and that they trust you. You need at least Edit permissions to a source to upload data, and that same permission level also allows you to delete the source's data.

Yes, if you're not sharing with anyone yet since you're still figuring things out, feel free to create a new private source. Note that you can always toggle the source from private to public later.

CoralNet strikes a balance between site-wide and source-specific training: feature extraction (an image-preprocessing step) is based on training across hundreds of CoralNet sources, while the image classification process itself (or more precisely, classifying the features extracted from the images) is based on training within only one source.
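
To make that split concrete, here is a toy sketch in Python. It is not CoralNet's actual code: `extract_features` and `SourceClassifier` are hypothetical names, the "features" are just two simple statistics standing in for a feature extractor trained across many sources, and nearest-centroid is a stand-in for whatever classifier CoralNet really trains per source.

```python
from collections import defaultdict
from math import dist


def extract_features(patch):
    """Hypothetical stand-in for the shared, site-wide feature extractor.

    Every source would reuse this same step. Here it just returns the
    mean and the range of the pixel values in a patch.
    """
    return (sum(patch) / len(patch), max(patch) - min(patch))


class SourceClassifier:
    """Hypothetical per-source classifier (nearest centroid on features)."""

    def __init__(self):
        self.centroids = {}

    def train(self, patches, labels):
        # Group the extracted features by the label annotated in this source.
        grouped = defaultdict(list)
        for patch, label in zip(patches, labels):
            grouped[label].append(extract_features(patch))
        # One centroid per label: the column-wise mean of its feature vectors.
        self.centroids = {
            label: tuple(sum(col) / len(col) for col in zip(*feats))
            for label, feats in grouped.items()
        }

    def predict(self, patch):
        # Classify the extracted features, not the raw pixels.
        feats = extract_features(patch)
        return min(self.centroids, key=lambda lbl: dist(feats, self.centroids[lbl]))


# Made-up annotated patches from a single source.
clf = SourceClassifier()
clf.train(
    [[200, 210, 190], [205, 195, 215], [40, 60, 50], [30, 55, 45]],
    ["Sand", "Sand", "Coral", "Coral"],
)
print(clf.predict([198, 202, 207]))  # prints "Sand"
```

The point of the sketch is only that the feature step is shared, while each source trains its own small classifier on its own annotations.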

Your source can either use a classifier trained on your source's data, or use a classifier from any other source as long as you have access to that source - meaning either the source is public or you're a member of the source.
So, you may be interested in the latter option if you can find an existing source which has a good amount of data and is based in the Maldives (or an ecologically similar area).
Otherwise, the amount of data you'll need to train a classifier yourself can vary greatly, depending on image quality and uniformity, labelset complexity, your accuracy expectations, and other factors. But you should definitely expect to need hundreds of images at least, and many studies prefer to have thousands.