CFP: CVPR Workshop "O-DRUM": Open-Domain Retrieval under Multi-Modal settings

Feb 13, 2022, 10:16:52 PM
to Machine Learning News
Hello everyone, 

We are inviting submissions to the O-DRUM workshop at CVPR 2022. Details are included below and on the website: 

O-DRUM @ CVPR 2022
Workshop on Open-Domain Retrieval Under a Multi-Modal Setting
CVPR 2022, New Orleans, June 20 

Description: Information Retrieval (IR) is an essential aspect of the internet era, and improvements in IR algorithms directly lead to a better search experience for the end-user. IR also serves as a vital component in many natural language processing tasks, such as open-domain question answering and knowledge- and commonsense-based question answering, while image retrieval has become a vital part of knowledge-based and commonsense visual question answering. Many datasets and IR algorithms have been developed for input queries from a single modality, such as document retrieval from text queries, image retrieval from text queries, text retrieval from video queries, etc. However, in many cases the query may be multi-modal: for instance, an image of a milkshake paired with the complementary textual description "restaurants near me" should return nearby restaurants serving milkshakes. Similarly, sick patients may be able to input their signs and symptoms (for instance, photographs of swelling and natural language descriptions of fever) to retrieve more information about their condition. Such functionality is desirable in situations where each modality communicates partial, yet vital, information about the required output. O-DRUM 2022 seeks to address this emerging topic area of research. The workshop aims to bring together researchers from information retrieval, natural language processing, computer vision, and knowledge representation and reasoning to address information retrieval with queries that may come from multiple modalities (such as text, images, videos, audio, etc.) or multiple formats (paragraphs, tables, charts, etc.).

Call for Papers: We invite submissions related to the broad topic area of multi-modal retrieval, including but not limited to the following topic areas:
- Retrieval from multi-modal queries or retrieval of multi-modal information.
- New datasets or task design for open-domain retrieval from multi-modal queries, and multi-modal reasoning requiring external knowledge.
- Modification or augmentation of existing benchmarks such as OK-VQA, VisualNews, Web-QA, etc.
- Commentary and analysis on evaluation metrics in IR tasks, and proposals for new evaluation metrics.
- New methods and empirical results for multi-modal retrieval.
- Faster, more efficient, or more scalable algorithms for retrieval.
- Methods which learn from web data and knowledge bases by retrieval, rather than from fixed sources.
- Retrieval methods aiding other tasks such as image and video captioning, visual grounding, VQA, image generation, graphics, etc.
- Use of retrieval as a means for data augmentation/data generation in unsupervised/few-shot/zero-shot learning.

We encourage submissions of two types:
1. Extended abstracts of novel or previously published work (4 pages including references),
2. Long papers (8 pages + references) previously accepted at venues such as CVF conferences, *ACL, NeurIPS, ICML, ICLR, AAAI, IJCAI, etc., including papers accepted to CVPR 2022.
All submissions should be formatted using the CVPR 2022 template. Accepted papers will be presented as posters during the workshop, where attendees, invited speakers, and organizers can engage in discussion. Accepted papers are considered non-archival and can be submitted to later conferences; they will be distributed in the CVPR workshop proceedings and on this website. We plan to highlight the three best papers via spotlight talks during the workshop session.

Important Dates:
 Submission Deadline: March 18, 2022 (Friday), 23:59 PDT
 Notification of Decision: March 31, 2022
 Camera Ready Deadline: April 08, 2022 (Friday), 23:59 PDT

Invited Speakers: Aniruddha Kembhavi (Allen AI),  Danqi Chen (Princeton),  Diane Larlus (NAVER Labs),  Mohit Bansal (UNC),  Xin (Eric) Wang (UCSC)

For questions, please contact Man Luo or Tejas Gokhale.
