My question is how you obtained the mask annotations for the 144 videos, not the gap between the annotated frames.
For example, were they annotated manually, or generated with SAM?
Also, were the 4 mot-test videos annotated the same way as the 144 videos?