*** Apologies for possible cross-posting ***
First Call for Papers
The 4th Workshop on Advances in Language and Vision Research (ALVR 2026)
Co-located with ACL 2026 in San Diego, California, United States & Online | July 2–7, 2026
Website: https://alvr-workshop.github.io/
Contact email: alvr_worksh...@googlegroups.com
Language & Vision research has rapidly evolved in recent years, driven by the emergence of large vision-language models (LVLMs). The 4th Workshop on Advances in Language and Vision Research (ALVR) will bring together researchers to explore the frontier of multimodal learning, foster collaboration, and identify pressing open challenges. We will feature keynote talks, spotlight paper presentations, poster sessions, and a panel discussion.
Following the success of previous ALVR editions in 2020, 2021 and 2024, this fourth edition will be held as a hybrid full-day workshop at ACL 2026 in San Diego.
Important dates:
Direct submission deadline: March 5, 2026
Pre-reviewed (ARR) commitment deadline: March 24, 2026
Notification of acceptance: April 28, 2026
Camera-ready paper due: May 12, 2026
Workshop date: July 2nd or 3rd, 2026
All deadlines are 11:59 pm UTC-12 (anywhere on Earth).
We accept the following types of submissions:
Long papers: up to 8 pages (+ references and appendix)
Short papers: up to 4 pages (+ references and appendix)
Final versions will be given one additional page of content so that reviewers' comments can be taken into account.
Authors will have the option to provide a link to the relevant arXiv paper.
All papers must be submitted anonymously and will undergo double-blind peer review by at least three reviewers.
We are also including a non-archival track to allow dual submission of work to ALVR 2026 and other conferences/journals. Space permitting, authors of these submissions will still be able to present their work at the workshop, and the papers will be hosted on the workshop website, but they will not be included in the official proceedings.
Please use the ACL format and submit through OpenReview, indicating at the bottom of the submission form that this is a cross-submission (non-archival).
Call for reviewers: If you have published in the field previously and are interested in joining the program committee as a reviewer, please email us at alvr_worksh...@googlegroups.com.
This workshop covers (but is not limited to) the following topics:
Self-supervised vision and language pre-training;
New tasks and datasets that provide real-world solutions in language and vision;
Text-to-image/video generation and text-guided image/video editing;
3D/spatial reasoning and inference with language and vision;
Multimodal agents and language-grounded embodied agents;
Visually-grounded natural language understanding and generation;
Culturally-aware LVLMs and LVLMs for underrepresented cultures;
Multilingual LVLMs;
External knowledge integration in visual and language understanding;
Shortcomings of existing LVLMs on downstream tasks, and potential solutions;
Training efficiency and optimization of LVLMs;
Post-training frameworks for LVLMs, including alignment and reasoning;
Ethics and bias in LVLMs;
Multidisciplinary studies involving linguistics, cognitive science, robotics, etc.;
Practical applications of LVLMs;
Explainability and interpretability of LVLMs.
Organizers:
Qianqi (Jackie) Yan (UC Santa Barbara)
Syrielle Montariol (EPFL, UC Berkeley)
Yue Fan (UC Santa Cruz)
Jing Gu (xAI)
Jiayi Pan (xAI)
Manling Li (Northwestern University)
Parisa Kordjamshidi (Michigan State University)
Alane Suhr (UC Berkeley)
Xin Eric Wang (UC Santa Barbara)