point annotation and automatic annotation : how many pixels are used ?


Inès Lukasik

Mar 20, 2024, 5:16:00 AM
to CoralNet Users
Hi,

I'm new to CoralNet, so I apologize for my very basic question. For my Master's thesis, I need to evaluate the cover of benthic substrates in many images (source name: CARMUHAM). Currently, I'm training the classifier by manually annotating 30 random points. I wanted to know: when CoralNet suggests an annotation for a point, is the suggestion based on the single pixel directly under the point's location, or on the surrounding pixels as well?

For example, here, the point seems to be located on a branching acropora. However, upon zooming in, the point is actually just below the coral, in a darker area. Nevertheless, the AI's first suggestion is branching acropora. So, is the suggested annotation for a point based on many surrounding pixels?

If that's the case, I might need to change the way I annotate: I wasn't taking the pixels around a point into account, and was instead zooming in to the pixel level to be sure of the substrate directly below the crosshair.

Thank you for your help!

Inès

[Attachment: Capture d’écran 2024-03-20 120843.png]

Stephen Chan

Mar 25, 2024, 12:14:27 AM
to CoralNet Users
Hi Inès,

It's actually a good question, and there isn't necessarily a straightforward answer.

CoralNet classifiers do look at the pixels around the center of the point, not just the exact center pixel of the point. However, the classifier might give more consideration to the pixels closer to the center.
The part that isn't so straightforward is saying to what extent "more consideration" is given. In theory, since your source's classifier learns from your annotations, the extent should be somewhat influenced by the way you annotate.

No matter how you interpret your point annotations, I imagine that the edge cases will be a common source of error for your classifier. If you consider the center pixel, and a branch of coral ends extremely close to that center pixel, that's your edge case. If you consider a circular or square area around the center, and that area contains 52% Acropora and 48% substrate, that's your edge case. So I personally think there will always be some tricky cases like that, and there isn't a clearly "correct" way. But I think the important thing is that you are consistent and stick to the same rule when you annotate.
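To make the patch-around-a-point idea concrete, here is a minimal sketch (illustrative only, not CoralNet's actual code) of cropping a fixed-size patch centered on an annotated point, zero-padding at the image borders so the point stays at the patch center:

```python
import numpy as np

def crop_patch(image, row, col, size=224):
    """Crop a size x size patch centered on (row, col), zero-padding
    at the image borders so the point stays at the patch center."""
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="constant")
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so this slice places it exactly at the patch center.
    return padded[row:row + size, col:col + size]

# Example: a point near the top-left corner of a 2000x3000 RGB image.
img = np.zeros((2000, 3000, 3), dtype=np.uint8)
patch = crop_patch(img, row=10, col=10)
print(patch.shape)  # (224, 224, 3)
```

The classifier then sees the whole patch, which is why nearby context (like a coral branch just above the crosshair) can influence the suggestion even when the center pixel is dark substrate.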

Ivor Williams

Nov 13, 2024, 3:29:20 AM
to CoralNet Users
I was very interested in this question and answer .. as I have a related question.

Specifically, as I understand it, CoralNet uses a 224x224-pixel box around each point to extract features and classify. Most of our images are about 3000x2000 pixels, so each 224x224 box has a side a little more than 1/10th of the height of the image.
For consistent training of the classifier, it seems like we would be better off annotating with a box around each point rather than a cross hair.
From the messages above, I assume there is not a simple answer, but is it possible to give any guidance on this? And if a box might be a good idea, how big should that box be? (I see you mention that the classifier may give more weight to pixels close to the center, so probably the box should be smaller than 224x224.)

Thanks
Ivor

Stephen Chan

Nov 13, 2024, 3:08:21 PM
to CoralNet Users
Hi Ivor,

Indeed, I'd only be guessing if I threw out a number for this question; someone would have to run experiments to get us a meaningful answer. On top of that, dividing the 224x224 pixels into "high weight" and "low weight" regions would be a subjective judgment.

Jordan Pierce - NOAA Affiliate

Nov 13, 2024, 3:20:54 PM
to CoralNet Users
Hi Ivor, Stephen,

I know that Oscar did an analysis in his original PhD paper(s) and settled on 224 x 224, as opposed to other crop sizes, because it produced the best results on his dataset (MLC). Another paper came out (MDNet) that also used the MLC dataset, but cropped multiple patches around the same point and fed them into the model as a single multi-scale input, and its results were better, suggesting that a single fixed patch size is not necessarily the best option (though not the worst). One thing that could potentially be done in CoralNet without modifying too much code (I assume) is varying the data augmentation workflow: for a given point, crop multiple patch sizes, resize each to 224 x 224, and then feed them to the model. If you wanted to get fancy, you could also vary the number of patch copies per class, oversampling the less frequent categories as a way of addressing class imbalance.
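That multi-scale augmentation idea can be sketched roughly as follows (an illustrative sketch, not CoralNet's pipeline; the nearest-neighbour resize helper just keeps it dependency-free):

```python
import numpy as np

def resize_nn(patch, out):
    """Nearest-neighbour resize to out x out (simple stand-in for a
    proper image-resize routine)."""
    rows = np.arange(out) * patch.shape[0] // out
    cols = np.arange(out) * patch.shape[1] // out
    return patch[rows][:, cols]

def multiscale_patches(image, row, col, sizes=(112, 168, 224), out=224):
    """Crop several patch sizes centered on the same point, then resize
    each to out x out, so the model sees the point at multiple scales."""
    patches = []
    for size in sizes:
        half = size // 2
        padded = np.pad(image, ((half, half), (half, half), (0, 0)))
        crop = padded[row:row + size, col:col + size]
        patches.append(resize_nn(crop, out))
    return patches

img = np.zeros((2000, 3000, 3), dtype=np.uint8)
stack = multiscale_patches(img, row=1000, col=1500)
print([p.shape for p in stack])  # three (224, 224, 3) patches
```

Whether the stack is fed as a multi-input batch or averaged at prediction time is a separate design choice; the point is that each crop shares the same center pixel.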

Regardless, any fixed patch size will have issues, as it's impossible to know a priori whether a patch will accidentally encompass more than one class category; it also depends on the resolution of the camera and the distance from the camera to the reef.

Inès: an interesting thing to look at would be instance segmentation, as opposed to (or in addition to) patch-based image classification. CoralNet doesn't support polygon annotations, but TagLab and CoralNet-Toolbox support both polygons and points. Alternatively, you can look at the top-5 predictions of the CoralNet model instead of just the top-1; that might give you more interesting results.

Best,

Jordan

David Kriegman

Nov 19, 2024, 7:35:41 PM
to CoralNet Users
To follow up on this with more details and precision:
1. When we developed the new feature extractor for CoralNet using EfficientNet, we experimented with a few different patch sizes and found that accuracy degraded when the patch size was smaller; e.g., reducing the patch size from 224x224 to 168x168 led to a 1% absolute reduction in accuracy, consistently across different versions of EfficientNet and ResNet. So we stuck with 224x224 for all the subsequent research.
2. Re box vs. cross hair: the training semantics are that the label applies at the point of the cross hair, and the box is there to provide the context of the region around it. The meaning is not the "average of the region."
If the region is 40% sand but the center is acropora, the label is acropora. The context around a point is crucial for determining what's at that point, and the network clearly weighs the central pixels more heavily than the outer ones. A larger context does mean that we can't label points within a border of 224/2 pixels from the edge.
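Under the border constraint described above (a full 224x224 context window must fit inside the image), a quick check for whether a point can carry a label might look like this hypothetical helper:

```python
def patch_fits(height, width, row, col, patch=224):
    """True if a full patch x patch context window centered on (row, col)
    fits inside an image of the given size, i.e. the point is at least
    patch/2 pixels from every edge (no border padding needed)."""
    half = patch // 2
    return (half <= row <= height - half) and (half <= col <= width - half)

# On a 2000x3000 image, points must sit at least 112 px from every edge.
print(patch_fits(2000, 3000, 100, 1500))   # False: too close to the top
print(patch_fits(2000, 3000, 1000, 1500))  # True
```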
3. There are other ways to establish context in computer vision beyond patches, but ultimately there's always some region of the image that supports a decision.
4. Right from the start of CoralNet, there was an idea to rescale images so that a pixel corresponds to a standard size on the benthic surface (e.g., one pixel is 0.5mm). If this could be applied over all training data and inference, it would make classification easier. To do this successfully, one would need to know the field of view of the camera lens (easy) and the distance to the surface for every pixel (hard), and even an approximate distance (e.g., the transect was taken at 1 meter) is just not widely available. We opted for better coverage of CoralNet and simply trained over a wide range of conditions, enjoying the benefit of lots of data and deep neural networks.
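The rescaling idea can be illustrated with a little pinhole-camera arithmetic, assuming a flat surface viewed head-on (a strong assumption underwater, which is part of why the per-pixel distance is the hard part):

```python
import math

def pixels_per_mm(fov_deg, distance_m, image_width_px):
    """Ground resolution of a flat surface viewed head-on through a
    pinhole camera: width covered = 2 * d * tan(FOV / 2)."""
    width_mm = 2 * distance_m * 1000 * math.tan(math.radians(fov_deg) / 2)
    return image_width_px / width_mm

def rescale_factor(fov_deg, distance_m, image_width_px, target_mm_per_px=0.5):
    """Scale factor that maps the image to the target physical pixel size."""
    current_mm_per_px = 1 / pixels_per_mm(fov_deg, distance_m, image_width_px)
    return current_mm_per_px / target_mm_per_px

# e.g., a 3000 px wide image, 60-degree horizontal FOV, shot 1 m from the
# reef covers about 1155 mm, i.e. roughly 0.385 mm/px; scaling the image
# by ~0.77 would bring it to the 0.5 mm/px standard.
print(rescale_factor(60, 1.0, 3000))
```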

I hope this provides more context about CoralNet. More details can be found in:

David

Ivor Williams

Nov 27, 2024, 9:28:54 AM
to CoralNet Users
Thanks for the responses on this.
I get that the point at the cross hair is what the robot is classifying .. very clear now ... but I am still not sure what the ramifications are of using the larger area for context, and whether there could still be an argument for a human analyst to also account for that context. For example, it's not that unusual in my experience for the image to be a bit blurry or shaded right at the cross hair, but for it to be very obvious what a slightly wider area around that cross hair is over. Actually, I suspect a lot of human analysts are already somewhat incorporating that context into their own classification.
Quite likely I am overthinking it, and probably there isn't any good standard way of accounting for this.

Anyway thanks. As always .. CoralNet is amazingly useful for us.

Cheers
Ivor