Hi Authors,
I am wondering whether there is a way to reproduce the tags shown in the paper. Specifically, I want the tags to be person1, person2, etc. when there are multiple instances of the same object class. Do these tags change every time the detection module is run?
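To make the request concrete, here is a minimal sketch of the numbering I have in mind. It assumes the detector returns a flat list of class labels in a fixed order; the function name `index_tags` is hypothetical and is not taken from the paper's code. For simplicity it numbers every detection, including classes that appear only once.

```python
from collections import defaultdict


def index_tags(labels):
    """Append a per-class running index to each label, in detection order.

    Hypothetical helper: not part of the paper's released code.
    """
    counts = defaultdict(int)  # how many times each class has been seen so far
    tags = []
    for label in labels:
        counts[label] += 1
        tags.append(f"{label}{counts[label]}")
    return tags


print(index_tags(["person", "dog", "person"]))  # ['person1', 'dog1', 'person2']
```

With a scheme like this, the tags would stay the same across runs only if the detector returns its detections in the same order each time, which is what I am asking about.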
Also, I want to check whether my understanding of the detection module, and of how you match its output with the answer choices (Appendix Section C, Aligning Detections), is correct. Are you saying that the model does not output tags with indices, but that this does not really matter because the model will select the most similar option from the answer choices anyway?
Thanks,
Andrew