Good morning,
I am a researcher at LINKS Foundation and a PhD student at Politecnico di Torino. My colleague (in cc) and I would like to submit a 4-page Position or Perspective paper.
Our paper introduces a new validation dataset for composed image-text retrieval, aimed at evaluating whether joint encoding of multimodal content is more informative than unimodal encoding. Specifically, we investigate whether representing each (image, text) pair as a single joint vector offers more value than using separate vectors for image-only or text-only inputs.
We would like to request a short extension of a few days to complete our submission.
Best regards,
Federico D'Asaro