2nd BabyLM Workshop - Call for Papers

Mustafa Omer Gul

Mar 17, 2026, 10:43:38 AM
to SIGARAB: Special Interest Group on Arabic Natural Language Processing
Hi SIGARAB Community,

We are delighted to announce the 2nd BabyLM Workshop, to be co-located with EMNLP 2026 in Budapest, Hungary. The full call for papers can be found here: https://arxiv.org/abs/2602.20092v2. A summary follows.

The goal of BabyLM is to bring together multiple disciplines to investigate an enduring question: how can a computational system learn language from a developmentally plausible quantity of input? BabyLM encourages the integration of insights from cognitive science into the design of more sample-efficient language models, while also using advances in language modeling architectures to generate new hypotheses and experimental paradigms for cognitive science.

This year, the theme of the workshop is *Going beyond English*. Previous iterations of BabyLM have focused primarily on English; with the introduction of the new Multilingual track (see BabyLM Challenge below), we aim to inspire submissions for other languages. We hope the BabyBabelLM dataset can be a starting point for this, but also encourage submissions that introduce new resources that will foster progress on data-efficient modeling across diverse languages.

We will accept two types of submission: challenge submissions and workshop submissions.

=== Workshop Topics ===
We invite submissions on topics including but not limited to the following:
  • Data-efficient architectures and training techniques.
  • Data curation for efficient training.
  • Cognitively and linguistically inspired language modeling and evaluation.
  • Small models (and scale comparisons).
  • Relevant aspects of multimodality.
  • Interaction with or feedback from teacher models during training.
  • Second language acquisition, bilingualism or multilingualism.


=== BabyLM Challenge ===
The BabyLM Challenge, now in its fourth iteration, invites participants to train language models on human-sized training corpora of up to 100 million words. This year’s iteration will remain largely the same as previous ones, except:
  • We are debuting a new multilingual track, in which participants are tasked with training models on a trilingual split of the BabyBabelLM dataset.
  • We continue to offer the strict and strict-small tracks.
  • Last year’s multimodal and interactive tracks have been folded into these two tracks.


=== Key Dates ===
We will accept submissions through ACL Rolling Review (ARR) or directly to the workshop via OpenReview. Paper submissions to the workshop can ignore competition entry deadlines. Our tentative timeline (subject to ARR and conference deadlines yet to be released) is as follows:

  • February: Call for papers and training data released
  • Early April: evaluation pipeline and baselines released
  • May 25: ARR submission deadline
  • Mid-July: Direct submissions deadline
  • Early August: Direct submission reviews due; ARR commitment deadline
  • Mid-August: Decisions released
  • Early September: Camera-ready due
  • 24-29 October: Workshop @ EMNLP in Budapest (exact date TBA)


=== Contact ===
If you have any questions, please join the BabyLM participants’ Slack. See the link on the BabyLM website, or join directly at: https://join.slack.com/t/babylmchallenge/shared_invite/zt-3r0lfjm6d-9bZZIfiVGFndKnwat5O35g

Please also feel free to reach out to the organizers directly: Leshem Chosen (leshem....@mail.huji.ac.il), Aaron Mueller (amue...@bu.edu), or Alex Warstadt (alexwa...@gmail.com).

Best regards,
The BabyLM 2026 Organizing Committee
Leshem Chosen, Ryan Cotterell, Mustafa Omer Gul, Jaap Jumelet, Tal Linzen, Aaron Mueller, Suchir Salhan, Raj Sanjay Shah, Alex Warstadt, Ethan Gotlieb Wilcox
