-----------------------------
Spoken-Query Cross-Lingual Information Retrieval for the Indic Languages (SqCLIR 2025)
Website: https://sites.google.com/view/sqclir-2025
To be organized in conjunction with FIRE 2025 (fire.irsi.org.in)
17th-20th December 2025, Indian Institute of Technology (BHU), Varanasi, India
------------------------------
India is known for its linguistic diversity. The Constitution of India recognizes 22 languages under the Eighth Schedule: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Sindhi, Tamil, Telugu, Urdu, Bodo, Santhali, Maithili, and Dogri. Building a retrieval system that accepts spoken queries in any of these languages and locates relevant documents in a large knowledge base is a multifaceted and complex problem. To our knowledge, spoken-query retrieval is a relatively underexplored area of information retrieval and natural language processing, and its multilingual variant, which includes under-resourced languages, has received even less attention.
To advance research in this direction, we are organizing the second edition of this shared task at FIRE 2025. The task invites participants to develop and evaluate systems that accept a spoken query as input and retrieve relevant information from a document collection. It aims to foster innovation in speech-based retrieval methods while supporting India's diverse linguistic landscape.
Overview of the Task
Participants are provided with queries in both text and spoken form. They may either use the provided spoken queries or generate a new set of spoken queries from the text queries under different environments, and must then complete the following two tasks:
Task 1: Spoken Query Ad-Hoc Retrieval - Monolingual Task
Participants are required to develop a Spoken Query Retrieval System that handles monolingual queries. In this task, the spoken queries and the corpus are in the same language, which makes retrieval more straightforward than in the cross-lingual setting. The system should accurately interpret spoken queries and retrieve relevant documents from a corpus in the same language. This year, the languages involved are Gujarati, Hindi, Bengali, and Kannada.
Task 2: Spoken Query Cross-Lingual Retrieval
Participants are required to develop a Spoken Query Retrieval System capable of handling cross-lingual queries. In this task, the spoken queries and the corpus are in different languages, which adds complexity to the retrieval process. The system should accurately interpret spoken queries in one language and retrieve the most relevant documents from a corpus in another language. This year, the task involves English, Gujarati, Hindi, Bengali, and Kannada. The query and corpus languages may form any pair of these languages, allowing participants to address a variety of cross-lingual retrieval challenges.
Tentative Timeline
30th June - Data Released and Registrations open
20th August - Run Submission Deadline
30th August - Results Declared
15th September - Working notes due
30th September - Camera-ready submissions due
17th-20th December - FIRE 2025 at Varanasi, India
Organizers
----------------
Bhargav Dave, DAU, Gandhinagar, India
Debasis Ganguly, University of Glasgow, Scotland
Evangelos Kanoulas, University of Amsterdam, the Netherlands
Prasenjit Majumder, DAU, Gandhinagar, India
For regular updates, subscribe to our mailing list: sqclir@googlegroups.com