CFP: 1st Shared Task on Spoken-Query Cross-Lingual Information Retrieval for the Indic Languages (SqCLIR 2025)


Bhargav Dave

Jul 3, 2025, 5:08:07 AM

Apologies for the multiple postings.

-----------------------------

Spoken-Query Cross-Lingual Information Retrieval for the Indic Languages (SqCLIR 2025)

Website: https://sites.google.com/view/sqclir-2025


To be organized in conjunction with FIRE 2025 (fire.irsi.org.in)

17th-20th December 2025, Indian Institute of Technology (BHU), Varanasi, India

------------------------------

India is known for its linguistic diversity. The Constitution of India recognizes 22 languages under the Eighth Schedule: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Sindhi, Tamil, Telugu, Urdu, Bodo, Santhali, Maithili, and Dogri. Building a retrieval system that accepts spoken queries in any of these languages and locates relevant documents in a large knowledge base is a complex, multifaceted problem. To our knowledge, spoken-query retrieval is a relatively underexplored area in information retrieval and natural language processing. This shared task addresses a multilingual version of the problem that includes under-resourced languages.

To advance research in this direction, we introduce the second iteration of a novel shared task at FIRE 2025. This task invites participants to develop and evaluate systems capable of accepting a spoken query as input and retrieving relevant information from a document collection. The shared task aims to foster innovation in speech-based retrieval methods while promoting support for India's diverse linguistic landscape. 

Overview of Task

Participants are provided with a text query and a spoken query. They may either use the provided spoken query or generate a new set of spoken queries from the text query in different environments, and are required to complete the following two tasks:

Task 1: Spoken Query Ad-Hoc Retrieval - Monolingual Task

Participants are required to develop a Spoken Query Retrieval System that handles monolingual queries. In this task, the spoken queries and the corpus are in the same language, which makes the retrieval process more straightforward. The system should accurately interpret spoken queries and retrieve relevant documents from the corpus. This year, the languages involved in this task are Gujarati, Hindi, Bengali, and Kannada.

Task 2: Spoken Query Cross-Lingual Retrieval

Participants are required to develop a Spoken Query Retrieval System capable of handling cross-lingual queries. In this task, the spoken queries and the corpus are in different languages, which adds complexity to the retrieval process. The system should accurately interpret spoken queries in one language and retrieve the most relevant documents from a corpus in another language. This year, the task will involve English, Gujarati, Hindi, Bengali, and Kannada. The query and corpus languages may be any combination of these languages, allowing participants to address various cross-lingual retrieval challenges.
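
For planning purposes, the possible Task 2 settings can be enumerated with a short script. This is only a sketch: it assumes that every ordered pairing of two different languages from the list above is allowed, which the organizers have not confirmed here, and the function name cross_lingual_pairs is hypothetical.

```python
from itertools import permutations

# Languages named in the Task 2 description.
LANGUAGES = ["English", "Gujarati", "Hindi", "Bengali", "Kannada"]


def cross_lingual_pairs(languages):
    """Return every ordered (query_language, corpus_language) pair.

    Assumption: the query and corpus languages must differ, and all
    such ordered pairs are permitted.
    """
    return list(permutations(languages, 2))


pairs = cross_lingual_pairs(LANGUAGES)
print(f"{len(pairs)} candidate language pairs")
for query_lang, corpus_lang in pairs:
    print(f"{query_lang} spoken queries -> {corpus_lang} corpus")
```

Under that assumption, five languages give 5 x 4 = 20 ordered pairs. Please check the task website for the pairs actually covered by the released data.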

Tentative Timeline

30th June - Data Released and Registrations open

20th August - Run Submission Deadline

30th August - Results Declared

15th September - Working notes due

30th September - Camera Ready Submissions due

17th-20th December - FIRE 2025 at Varanasi, India


Organizers

----------------

Bhargav Dave, DAU, Gandhinagar, India

Debasis Ganguly, University of Glasgow, Scotland

Evangelos Kanoulas, University of Amsterdam

Prasenjit Majumder, DAU, Gandhinagar, India


For regular updates, subscribe to our mailing list: sqclir@googlegroups.com




--
With Regards,
Bhargav Dave
